Framework for the Semantic Web: an RDF Tutorial
Total Page:16
File Type:pdf, Size:1020Kb
Framework for the SPOTLIGHT SPOTLIGHT Semantic Web: An RDF Tutorial STEFAN DECKER, PRASENJIT MITRA, AND SERGEY MELNIK Stanford University RDF provides a data model that supports fast integration of data sources by bridging semantic differences. he current Web supports mainly human browsing information, as it provides the least common denomina- and searching of textual content. This model has tor for all information models. T become less and less adequate as the mass of avail- able information increases. What is required instead is a Core of the Model model that supports integrated and uniform access to The RDF data model vaguely resembles an object-orient- information sources and services as well as intelligent appli- ed data model. It consists of entities, represented by unique cations for information processing on the Web. Such a identifiers, and binary relationships, or statements, between model will require standard mechanisms for interchanging those entities. In a graphical representation of an RDF data and handling different data semantics. statement, the source of the relationship is called the sub- The Resource Description Framework is a step in this ject, the labeled arc is the predicate (also called property), direction. RDF provides a data model that supports fast and the relationship’s destination is the object. Both state- integration of data sources by bridging semantic differences. ments and predicates are first-class objects, which means It is often used (and was initially designed) for representing they can be used as the subjects or objects of other state- metadata about other Web resources such as XML files. ments (see the section on reifying statements). However, representing metadata1 about the Web is not dif- The RDF data model distinguishes between resources, ferent from representing data generally. Thus, RDF can be which are object identifiers represented by URIs, and lit- used as a general framework for data exchange on the Web. erals, which are just strings. The subject and the predicate of a statement are always resources, while the object can be The RDF Data Model a resource or a literal. In RDF diagrams, resources are The Resource Description Framework Model and Syntax always drawn as ovals, and literals are drawn as boxes. Specification,2 which became a World Wide Web Consor- Figure 1 shows an example statement, which can be read tium (W3C) Recommendation in February 1999, defines as: “The resource http://www.daml.org/projects/#11 has a the RDF data model and an XML-based serialization syn- property hasHomepage (described in http://www.semantic- tax. By separating the data model from the syntax needed web.org/schema-daml01/#hasHomepage) the value of to transport RDF data in a network, the specification which is the resource http://www-db.stanford.edu/Onto- allows an RDF-aware system to access a new non-RDF Agents.” The three parts of this statement are the subject data source. To do this, an RDF adapter first assigns unique http://www.daml.org/projects/#11; the predicate http:// resource identifiers (URIs) to resources in the non-RDF www.semanticweb.org/schema-daml01/#hasHomepage; and data source (when URIs are not already available) and then the object http://www-db.stanford.edu/OntoAgents. A set of generates statements describing resource properties. The statements can be visualized as a graph. The graph in Fig- RDF data model’s use of URIs to unambiguously denote ure 2 extends the statement in Figure 1 by adding the prop- objects, and of properties to describe relationships between erty http://purl.org/dc/elements/1.1/Creator with value Ste- objects, distinguish it fundamentally from XML’s tree- fan Decker (a literal). based data model.This simple but effective mechanism It might seem strange that predicates are also resources supports a general approach to representing and integrating with URI labels, but it eliminates ambiguities that arise 68 NOVEMBER • DECEMBER 2000 http://computer.org/internet/ 1089-7801/00/$10.00 ©2000 IEEE IEEE INTERNET COMPUTING SPOTLIGHT ON RDF Subject Predicate Object http://www.SemanticWeb.org/schema-daml01/#hasHomepage http://www.daml.org/projects/#11 http://www-db.stanford.edu/OntoAgents Figure 1. A simple RDF statement. The resource “http://www.daml.org/projects/#11” is a subject with a property “http://www.SemanticWeb.org/schema-daml-01/#hasHomepage.” The value of the property is the object “http://www- db.stanford.edu/OntoAgents.” http://www.SemanticWeb.org/schema-daml01/#hasHomepage http://www.daml.org/projects/#11 http://www-db.stanford.edu/OntoAgents http://purl.org/dc//elements/1.1/Creator "Stefan Decker" Figure 2. RDF data model graph. The property “http://purl.org/DC/#Creator” with value “Stefan Decker” (a literal) is joined with the simple statement in Figure 1 to form a graph. from using only word identifiers. For example, Representing collections. We often need to make vocabulary providers can define different versions statements about collections of resources, for exam- of the predicate hasHomepage. URIs provide ple, the members of a project. We could make sin- unique identifiers for each version. gle statements about each element of the collection; however, if we wish to make a statement about all Additional Vocabulary members of a project (where the individual mem- The XML-namespace syntax3 is used to abbreviate bers might change), we must use a container. URIs in statements. For instance, we can define RDF defines three types of containers that can the substitution of the namespace-prefix sw for represent collections of resources or literals: http://www.SemanticWeb.org/schema-daml01/#, and then write simply sw:hasHomepage. In this I Bags are unordered lists, for example, a class ros- tutorial, we use the namespace prefix rdf for ter where the order of student names is not vocabulary defined in the RDF model and syntax important. Bags don’t enforce set semantics, so specification.2 The prefix can be expanded to a value can appear several times in a Bag. http://www.w3.org/1999/02/22-rdf-syntax-ns#, I Sequences are ordered lists. Sequences represent, which is the namespace defined for RDF-specific for example, a batch of jobs submitted to a vocabulary. processor, where the job order is important. Like The rdf namespace defines the property Bags, Sequences permit duplicate values. “rdf:type” to denote type relationships between two I Alternatives are lists from which the property resources. The rdf:type property is particularly use- can use only one value. We could use alterna- ful in conjunction with the Class and subClassOf tives to express, for example, the options of fly- vocabulary defined in the RDF Schema Specifica- ing or driving from San Francisco to San Diego, tion,4 where rdf:type is used like the instanceOf rela- and an application could choose a suitable alter- tionship in object-oriented languages. native. IEEE INTERNET COMPUTING http://computer.org/internet/ NOVEMBER • DECEMBER 2000 69 TUTORIAL http://www.daml.org/projects/#11 rdf:Bag rdf:type sw:hasMembers "Gio Wiederhold" rdf:_1 "Stefan Decker" rdf:_2 bagid1 "Sergey Melnik" rdf:_3 rdf:_4 "Prasenjit Mitra" Figure 3. An rdf:Bag specifying the members of the DAML project. In Figure 3, the resource http://www.daml.org/ The only thing missing now is the statement projects/#11 has the property sw:hasMembers, whose about the reified statement: “All members of the value is a container of the type bag. The bag is repre- OntoAgents project oppose the original statement.” sented by an intermediate node. To refer to the inter- To model this, we introduce an additional arc, mediate node, you can give it an identifier like labeled sw:opposedBy, which ends at the rdf:Bag bagid1. An rdf:type arc specifies bagid1 as an rdf:Bag, resource defined in Figure 3. and ordinal properties (rdf:_1 through rdf:_4) point to literals representing members’ names. RDF Schema The RDF Schema Specification,4 which became a Reifying statements. In RDF, we want to be able to W3C candidate recommendation in March 2000, make statements about statements. To refer to a is an RDF application that introduces an object- statement, we need to treat it like a resource. The oriented, extensible type system to RDF. RDF process of associating a statement and a specific Schema provides means to define property domains resource representing the statement is formally and ranges, and class and subclass hierarchies. called reification. The statement that has been mod- In RDF Schema a property is not “local” to a eled as a resource is called a reified statement. class, as an attribute usually is in object-oriented Consider, for example, the following statement: languages. Instead, properties are global and “http://www.daml.org/projects/#11 is an NSF-fund- described in terms of the classes they connect. To ed project.” We want to oppose this statement illustrate the use of RDF Schema, we will now because it is not true (the project is funded by define the vocabulary used in the previous exam- DARPA). To represent the statement, “The members ples. We use the namespace prefix rdfs to abbre- of the project oppose the statement that viate the RDF Schema namespace identifier http://www.daml.org/projects/#11 is an NSF-funded http://www.w3.org/2000/01/RDF schema#. project,” we must reify it by introducing a new Figure 5 (on page 72) depicts an RDF schema resource (node in the graph) to represent the original that defines the class sw:Project and two proper- statement. We type the new resource using the RDF- ties, sw:hasHomepage and sw:hasMembers. The defined resource rdf:Statement, and then model the class is defined by typing it with the resource statement’s subject, object, and predicate with the rdfs:Class, which is defined as a metaclass in the special properties (rdf:subject, rdf:object, and rdf:pred- RDF Schema Specification.