Topics

• Finish up XML.
• What is RDF?
• Why is it interesting?
• SPARQL: the language for querying RDF.
• Learning Objectives:
  • Identify data management problems for which RDF is a desirable representation.
  • Design RDF stores for a given data management problem.
  • Write queries in SPARQL.

4/22/10 1

XQuery Evaluation Model

• The FOR clause acts more or less like a FROM clause in SQL.
• You can have multiple variables and paths:
    for $v1 in path1, $v2 in path2
• The query iterates over all combinations of values from the component XPath expressions.
  • Variables $v1 and $v2 are assigned one value at a time.
  • The WHERE and RETURN clauses are applied to each combination.
• The LET clause is applied to each combination of values produced by the FOR clause.
  • It assigns to a variable the complete set of values produced by its path expression.
• The RETURN clause is similar to a SELECT clause.
  • It identifies the pieces that you actually want to return.
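The evaluation model above can be imitated in a few lines of Python. This is only a sketch of the iteration order, not how an XQuery engine is implemented; the lists path1 and path2 are made-up stand-ins for the values that two XPath expressions would produce.

```python
from itertools import product

# Hypothetical stand-ins for the values produced by two XPath expressions.
path1 = [1, 2, 3]
path2 = [10, 20]

results = []
# FOR: iterate over every combination of values, one binding at a time.
for v1, v2 in product(path1, path2):
    # LET: bind a variable to a whole sequence, once per combination.
    multiples = [x for x in path2 if x % v1 == 0]
    # WHERE: keep only the combinations that satisfy the condition.
    if v1 * 10 == v2:
        # RETURN: build the output for this combination.
        results.append((v1, v2, len(multiples)))

print(results)
```

Note that the LET binding is recomputed for each (v1, v2) pair and holds an entire list, while the FOR variables hold one value at a time.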

Berkeley DB XML

• What is it?
  • A native XML database built atop Berkeley DB.
  • Think of BDB as the storage layer and XML/XQuery as the schema and query layers.
• Concepts
  • Document: a well-formed XML document, treated like a row/tuple in an RDBMS.
  • Container: a collection of related documents (kind of like a table).
    • You can combine different document types in a container (unlike a relational table).



• Containers include documents and meta-data:
  • Indices
  • Dictionaries mapping element and attribute names to ID numbers
  • Statistics to aid in optimization
  • Document meta-data
• Two flavors of containers:
  • Whole-document containers: each key/data pair contains an entire document.
  • Node containers: key/data pairs correspond to elements.

Indices

• Users can specify indices on containers.
• Three types of indices:
  • Presence: keeps track of the locations of elements or attributes with a specified name.
    • Useful for queries that access elements or attributes of a particular name, regardless of value.
  • Equality: indexes the values of all elements or attributes with a specified name.
    • Useful for queries on value equality or range.
  • Substring: indexes substrings of the values of all elements or attributes with a specified name.
    • Useful for queries using the XQuery contains() function (comparable to LIKE in SQL).
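To make the three index types concrete, here is a toy in-memory version in Python. It is a sketch under stated assumptions, not a description of how Berkeley DB XML stores its indices: the sample entries are invented, and the choice of three-character substrings as index keys is an assumption made for illustration.

```python
from collections import defaultdict

# Hypothetical data: (document id, element name, element value).
entries = [
    (1, "title", "Casablanca"),
    (1, "year", "1942"),
    (2, "title", "Vertigo"),
    (3, "year", "1958"),
]

presence = defaultdict(set)   # element name -> documents containing it
equality = defaultdict(set)   # (element name, value) -> documents
substring = defaultdict(set)  # (element name, 3-char substring) -> documents

for doc_id, name, value in entries:
    presence[name].add(doc_id)
    equality[(name, value)].add(doc_id)
    # Assumption: index every lower-cased 3-character window of the value.
    for i in range(len(value) - 2):
        substring[(name, value[i:i + 3].lower())].add(doc_id)

print(presence["title"])            # documents that have a title at all
print(equality[("year", "1958")])   # documents whose year equals 1958
print(substring[("title", "ert")])  # documents whose title contains "ert"
```

A hash map like this only answers equality lookups; supporting the range queries mentioned above would need an ordered structure such as a B-tree.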


Using DBXML to Query

• DBXML is case sensitive and wants keywords in lower case (select, not SELECT).
• Getting Started:
  • Copy imdb.dbxml into your current directory.
  • Start dbxml:
      % dbxml
  • Open the container:
      dbxml> openContainer imdb.dbxml
  • Specify the collection to use with:
      collection("imdb.dbxml")/
  • You must use the query command to issue an XQuery:
      dbxml> query 'for $m in collection("imdb.dbxml")/movie return $m/title'
  • Use print to display results:
      dbxml> print

What is RDF?

• Resource Description Framework
• Grew out of the Semantic Web effort as a way of attaching meta-data to the web.
• The fundamental idea is to attach labels to the edges between nodes.
• Structure is represented as a collection of triples: node, edge, node.
• Nodes are represented by URIs (uniform resource identifiers).
• Can be used to represent both structured and semi-structured data.
• Perhaps more interestingly, it is a natural representation for graphs.


RDF Concepts

• Things are called resources.
• Resources have properties.
• Properties have values.
• You can describe a resource by making statements about it; these statements define its properties.
• In RDF, we use the terms subject, predicate, and object to describe the pieces of these statements.
• The statement “Margo has brown hair” has:
  • Subject = resource: Margo
  • Predicate = property: hair
  • Object = value: brown

Naming

• RDF uses URI references (URIrefs) to identify any of subject, predicate, and object.
• A URIref is a URI with an optional fragment identifier (signified by the # symbol).
• Examples:
  • URI: http://www.eecs.harvard.edu/~margo/somepage
  • URIref: http://www.eecs.harvard.edu/~margo/somepage#foo
• Although RDF was invented for the web and is specified in terms of URIs, you can use it to store any data and refer to subjects, predicates, and objects by any ID of your choosing.
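The relationship between a URI and a URIref can be checked with Python's standard library, which splits a reference at the # symbol. This is just an illustration using the example address from the slide.

```python
from urllib.parse import urldefrag

uriref = "http://www.eecs.harvard.edu/~margo/somepage#foo"

# urldefrag separates the URI from its optional fragment identifier.
uri, fragment = urldefrag(uriref)

print(uri)       # the part before '#'
print(fragment)  # the fragment identifier, empty if there is none
```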


The Model

• Triples
  • That’s right: the entire data model is described by triples of subject, predicate, and object, and by that mechanism you can build up arbitrarily complex objects and relationships.
• Think of RDF as describing a graph where the subjects and objects are nodes and the predicates are edges.
• So our “Margo has brown hair” statement would be represented as (margo, hair, brown) and look like:

[Figure: node Margo with an edge labeled hair pointing to the literal “brown”]
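The same idea in code: a minimal sketch that stores statements as Python tuples and derives the graph view from them. The second triple is an extra statement added here for illustration.

```python
# Each statement is a (subject, predicate, object) triple.
triples = [
    ("margo", "hair", "brown"),
    ("margo", "name", "Margo"),  # hypothetical extra statement
]

# Graph view: each subject node maps to its outgoing (edge label, target) pairs.
graph = {}
for subject, predicate, obj in triples:
    graph.setdefault(subject, []).append((predicate, obj))

print(graph)
```

Nothing beyond the list of triples is needed to rebuild the graph, which is the sense in which triples are the whole data model.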

Drawing Pictures of Data

• Although there are XML representations of RDF, it is by far most natural to think of RDF visually as a graph.
• When we think about the graph structure, we draw URIs as circular nodes and literal values as rectangles.
• So our previous example might look like:

[Figure: node MyURI with an edge labeled hair to the literal “brown” and an edge labeled name to the literal “Margo”]


Let’s Draw our (tired) Products

[Figure: graph of the product data. Recoverable labels: itemname, sku-value, item, sku, Product, inventory, instock, #, $$$, vendor, val, Vendor, curr, name, currency]

Making this more RDF-like

[Figure: the product graph redrawn in RDF style. Recoverable labels: itemname, #, name, instock, SKU, inventory, price, val, $$$, vendor, curr, currency, products, name, address]

Queries over RDF: SPARQL

• Yet another query language.
• Like many others, it looks a lot like SQL:
  • Basic structure: SELECT FROM WHERE
• Variables
  • Name is preceded by a ? (or $- in Jena)
• Predicates are expressed as pattern-matching on triples.

WHERE Clauses

• WHERE { triple . triple . … };
  • A triple is a combination of variables, URIs, and constants.
  • The “.” means AND.
• Example: find all products sold by the vendor whose ID is vend001:
    SELECT ?name WHERE { ?p . ?p ?name };
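Pattern matching on triples can be sketched in plain Python. The following is a toy evaluator, not how a SPARQL engine works internally; the predicate names vendor and name and the sample data are assumptions made for illustration.

```python
def match(pattern, triple, binding):
    """Extend binding so that pattern matches triple, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):      # variable: bind it, or check the old binding
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:           # constant: must match exactly
            return None
    return extended


def query(patterns, triples):
    """Join all patterns; the '.' between them acts as AND."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Hypothetical product data.
triples = [
    ("sku1", "vendor", "vend001"),
    ("sku1", "name", "widget"),
    ("sku2", "vendor", "vend002"),
    ("sku2", "name", "gadget"),
    ("sku3", "vendor", "vend001"),
    ("sku3", "name", "gizmo"),
]

rows = query([("?p", "vendor", "vend001"), ("?p", "name", "?name")], triples)
names = [row["?name"] for row in rows]
print(names)
```

Because ?p appears in both patterns, the second pattern only matches triples about a product that already satisfied the first one; that shared variable is what turns two patterns into a join.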



• If the “.” operator is AND, how do you specify OR?
• Find all products priced in either US or Canadian dollars:
    SELECT ?name WHERE { ?sku ?inv . ?inv ?price . {{ ?price "USD" } UNION { ?price "CAD" }} . ?sku ?name };
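One way to picture UNION is that each branch is evaluated on its own and the results are concatenated. The sketch below uses a deliberately simplified, made-up schema in which the currency and the item name hang directly off the same node, so it is shorter than the query on the slide.

```python
# Hypothetical triples; the schema is flattened for brevity.
triples = [
    ("price1", "currency", "USD"),
    ("price2", "currency", "CAD"),
    ("price3", "currency", "EUR"),
    ("price1", "itemname", "widget"),
    ("price2", "itemname", "gadget"),
    ("price3", "itemname", "gizmo"),
]


def subjects_with(predicate, obj):
    """Subjects that have the given predicate with the given object."""
    return [s for s, p, o in triples if p == predicate and o == obj]


# Each branch of the UNION is evaluated separately; the results are combined.
matched = subjects_with("currency", "USD") + subjects_with("currency", "CAD")

# The remaining pattern is then ANDed with the combined result.
names = [o for s, p, o in triples if p == "itemname" and s in matched]
print(names)
```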

Optional Matches

• Let’s say that we want all items and their vendors, but if there is no vendor specified for an item, we want to see the item anyway.
• The obvious query doesn’t quite work, because it drops items without a vendor:
    select ?name ?vendor where { ?product ?vendor . ?product ?name };
• Use OPTIONAL to return items without a vendor:
    select ?name ?vendor where { ?product ?name . OPTIONAL { ?product ?vendor } };
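The difference between the two queries resembles an inner join versus a left outer join. Here is a small Python sketch with invented data, assuming each product has at most one name and at most one vendor.

```python
# Hypothetical triples; sku2 has no vendor recorded.
triples = [
    ("sku1", "name", "widget"),
    ("sku1", "vendor", "vend001"),
    ("sku2", "name", "gadget"),
]

names = {s: o for s, p, o in triples if p == "name"}
vendors = {s: o for s, p, o in triples if p == "vendor"}

# Both patterns required: products without a vendor disappear.
required = [(names[s], vendors[s]) for s in names if s in vendors]

# OPTIONAL: every product is kept; a missing vendor is left unbound (None).
optional = [(names[s], vendors.get(s)) for s in names]

print(required)
print(optional)
```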



• FILTER lets you impose constraints on values that match a pattern.
• Syntax: part of the WHERE clause (add it like you would add a pattern):
  • FILTER expression
• Example: find all products priced over $1.
    select ?name ?val where { ?product ?name . ?product ?inv . ?inv ?price . ?price ?val . FILTER (?val > "1.0") };
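Conceptually, a filter is a predicate applied to each set of variable bindings that the patterns produce. A minimal Python sketch, with made-up bindings and the assumption that prices are stored as strings and converted to numbers before comparing:

```python
# Hypothetical bindings, as the triple patterns might have produced them.
bindings = [
    {"?name": "widget", "?val": "0.50"},
    {"?name": "gadget", "?val": "2.25"},
    {"?name": "gizmo", "?val": "10.00"},
]

# The filter keeps only the bindings for which the expression is true.
# Assumption: values are strings, so convert them for a numeric comparison.
over_one = [(b["?name"], b["?val"]) for b in bindings if float(b["?val"]) > 1.0]

print(over_one)
```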

Functions on which you can filter

• There are many functions on which you can filter:
  • Regular binary and unary operators:
    • ! (not), || (or), && (and)
    • Comparators (numeric and string): <, >, =, !=, <=, >=
  • REGEX: regular expressions
  • BOUND: tells you if a variable is bound to a value.
• Regular Expressions
  • regex(string, pattern)
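Rough Python analogues of BOUND and REGEX, applied to invented bindings. The helper names bound and regex are defined here only for illustration, and Python's re module is used as a stand-in; its pattern syntax is not guaranteed to match SPARQL's in every detail.

```python
import re

# Hypothetical bindings; the second one has no ?vendor (as after an OPTIONAL).
bindings = [
    {"?name": "widget", "?vendor": "vend001"},
    {"?name": "gadget"},
    {"?name": "Wonder gizmo", "?vendor": "vend002"},
]


def bound(binding, var):
    """True if the variable has a value in this binding."""
    return var in binding


def regex(string, pattern):
    """True if the pattern matches anywhere in the string."""
    return re.search(pattern, string) is not None


# Combine the functions with ordinary boolean operators, as in a FILTER.
no_vendor = [b["?name"] for b in bindings if not bound(b, "?vendor")]
starts_with_w = [b["?name"] for b in bindings if regex(b["?name"], "^[Ww]")]

print(no_vendor)
print(starts_with_w)
```

Testing for an unbound variable is the usual companion to OPTIONAL: it picks out exactly the rows where the optional pattern found nothing.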
