Logics for the Semantic Web
Total Page:16
File Type:pdf, Size:1020Kb
LOGICS FOR THE SEMANTIC WEB Pascal Hitzler, Jens Lehmann, Axel Polleres Reader: Patrick Hayes 1 INTRODUCTION A major international research effort is currently under way to improve the exist- ing World Wide Web (WWW), with the intention to create what is often called the Semantic Web [Berners-Lee et al., 2001; Hitzler et al., 2010]. Driven by the World Wide Web Consortium (W3C) and its director Sir Tim Berners-Lee (inven- tor of the WWW), and heavily funded by many national and international research funding agencies, Semantic Web has become an established field of research. It integrates methods and expertise from many subfields of Computer Science and Artificial Intelligence [Studer, 2006], and it has now reached sufficient maturity for first industrial scale applications [Hamby, 2012; Hermann, 2010]. Correspondingly, major IT companies are starting to roll out applications involving Semantic Web technologies; these include Apple's Siri, IBM's Watson system, Google's Knowedge Graph, Facebook's Open Graph Protocol, and schema.org as a collaboration be- tween major search engine providers including Microsoft, Google, and Yahoo!. The Semantic Web field is driven by the vision to develop powerful methods and technologies for the reuse and integration of information on the Web. While current information on the Web is mainly made for human consumption, it shall in the future be made available for automated processing by intelligent systems. This vision is based on the idea of describing the meaning|or semantics|of data on the Web using metadata|data that describes other data|in the form of so-called ontologies [Hitzler et al., 2010]. Ontologies are essentially knowledge bases rep- resented using logic-based knowledge representation languages. This shall enable access to implicit knowledge through logical deduction [Hitzler and van Harme- len, 2010], and its use for search, integration, browsing, organization, and reuse of information. Of course, the idea of adopting knowledge representation languages raises the question which of the many approaches discussed in the literature should be adopted and promoted to Web standards (officially called W3C Recommenda- tions). In this chapter, we give an overview of the most important present stan- dards as well as their origins and history. The idea that the World Wide Web shall have capabilities to convey information for processing by intelligent systems, and not only by humans, has already been part of its original design [Berners-Lee, 1996]. The World Wide Web was initiated 2 Pascal Hitzler, Jens Lehmann, Axel Polleres in 1990, and immediately showed exponential growth [Berners-Lee, 1996]. In the meantime, it has become a very significant technological infrastructure of modern society. In the 1990s, the Semantic Web vision1 was mainly driven by the W3C Meta- data Activity [W3C Metadata, revision of 23 August 2002] which produced the first version of the Resource Description Framework (RDF) which we will discuss in Section 2. The Semantic Web Activity [W3C Semantic Web, revision of 19 June 2013] replaced the Metadata Activity in 2001, and has installed several stan- dards for representing knowledge on the Web, most noteably two revisions of RDF, the Web Ontology Language (OWL) discussed in Section 3, and the Rule Inter- change Format (RIF) discussed in Section 4. In Section 5, we discuss some of the particular challenges which must be faced when adopting logic-based knowledge representation languages for the Semantic Web, and in Section 6 we discuss some of the more recent research developments and questions. Note that we give a more detailed technical account for RDF than for OWL and RIF, because the latter are closely related to description logics and logic programming, respectively, and the reader is refered to the corresponding chapters in this volume for additional background and introductions. 2 RDF AND RDF SCHEMA The Resource Description Framework (RDF) [Manola et al., 2004; Hayes, 2004] comprises a simple data format as well as a basic schema language, called RDF Schema [Brickley and Guha, 2004]. While historically often termed a \medadata" standard, that is, an exchange format for data about documents and resources, in the meantime, RDF has been well established as a universal data exchange format for classical data integration scenarios, and particularly for publishing and exchanging structured data on the Web [Polleres et al., 2011]. Informally, all RDF data can be understood as a set of subject{predicate{object triples, where all subjects and predicates are Uniform Resource Identifiers (URIs)2 [Berners-Lee et al., 2005], and in the object position both URIs and literal values (such as numbers, strings, etc.) are allowed. Such a simple, triple based format was chosen since on the one hand, it can accomodate for any kind of metadata in the form of predicate-value pairs, and on the other hand, any more complex relational or object-oriented data can be decomposed into such triples in a fairly straightforward manner [Berners-Lee, 2006].3 Last, but not least, since URIs are being used as constant symbols in the lan- guage of RDF, any RDF triple may likewise be viewed as a generalization of 1The term Semantic Web became popular in the aftermath of the widely cited popular science article [Berners-Lee et al., 2001]. We were able to trace the term Semantic, in relation to Web, back to a 1994 presentation by Tim Berners-Lee [Berners-Lee, 1994]. 2URIs are a generalization of URLs. 3See also the discussion of semantic networks in the chapter on description logics, in this volume. Logics for the Semantic Web 3 http://www.w3.org <DesignIssues/TimBook-old/History.html> dc:creator <People/Berners-Lee/card#i> Figure 1. An RDF triple a \link" on the Web; as opposed to plain links on the traditional Web, on the Semantic Web, links can be associated with an arbitrary binary relation, which again is represented by a URI. We note that this idea of \typed" links was already part of Tim-Berners Lee's original design ideas of the Web [Berners-Lee, 1993; Berners-Lee and Fischetti, 1999]. For instance, on the website of the W3C (http: //www.w3.org, if one wants to state that the page with the URI http://www.w3. org/DesignIssues/TimBook-old/History.html was created by Tim Berners- Lee, this fact may be viewed as such a typed link, and consequently as an RDF subject-predicate-object triple, as shown in Fig. 1. Here, the URI dc:creator is used to denote the has-creator relation between a resource and its creator. This common view of typed links as labeled edges linking between resources also leads to sets of RDF triples often being called \RDF graphs." A distinguished relation within RDF, represented by the URI rdf:type is the is-a relation, that allows to denote membership to a certain class, where classes are again represented by URIs, for instance the class foaf:Person. Another important feature of RDF is that so called \blank nodes" can be used in the subject or object positions of triples to denote unnamed or unknown resources. This allows to model incomplete information in the form of existentials. For instance, the RDF graph in Fig. 2 extends the information in Fig. 1 by the fact that Tim Berners-Lee is a Person, is named \Timothy Berners-Lee" and knows some person named \Dan Brickley". In the following, after giving a brief history of the RDF standard (Section 2.1), we will present the RDF Data model along with a short discussion of different syntactic representations and the semantics of RDF (Section 2.2). We continue with a discussion of RDF Schema in Section 2.3 and the query language SPARQL in Section 2.4. 4 Pascal Hitzler, Jens Lehmann, Axel Polleres http://www.w3.org <DesignIssues/TimBook-old/History.html> dc:creator <People/Berners-Lee/card#i> foaf:name rdf:type foaf:knows foaf:Person “Timothy Berners-Lee” rdf:type foaf:name “Dan Brickley” Figure 2. An RDF graph with a \blank" (anonymous) node 2.1 Brief History of RDF The standardisation of RDF has been preceded by two earlier proposals for meta- data standards in the form of W3C member submissions, namely (i) the Chan- nel Definition Format [Ellerman, 1997] and (ii) the Meta Content Framework (MCF) [Guha and Bray, 1997]. While the former comprised an XML format with a fixed term of metadata properties for describing information channels on the Web, (somewhat similar to RSS nowadays), the latter (MCF) was strictly extensible and evolved into the first version of RDF [Lassila and Swick, 1999], published in 1999. Another important metadata initiative from the digital libraries community, Dublin Core [Nilsson et al., 2008], which started around the same time but outside of W3C, later on adopted RDF as a representation syntax, becoming one of the most prominent RDF vocabularies, see Section 2.3 below. The first official standard recommendation of RDF from 1999 was extended in 2004 by a formal definition of the Semantics of RDF [Hayes, 2004], decoupling the syntactical representation in XML from the RDF data model. Since then RDF has been used in various contexts and experienced wide adop- tion. In 2009 the W3C held a workshop on future directions of RDF [Herman, 2009], discussing several extensions but also simplifications of the standard. These extensions are currently under discussion in the ongoing W3C RDF 1.1 working group.4 4See http://www.w3.org/2011/rdf-wg/. Logics for the Semantic Web 5 2.2 Different Syntactic Representations and Semantics There are various serialisations for RDF. Fig. 3 shows different syntactic represen- tations of the six triples in the RDF graph from Fig. 2 in some of these serialisation syntaxes: N-Triples [Beckett and McBride, 2004a], cf.