
Learning SPARQL INFX 532 | UW iSchool

References: Cambridge Semantics SPARQL 101; SPARQL Nuts & Bolts; SPARQL by Example

Wonghong Jang | Sam Oh

Introduction

• SPARQL is a recursive acronym, which stands for SPARQL Protocol and RDF Query Language.

• SPARQL is used to query RDF data.

• Why SPARQL?
– Pull values from structured and semi-structured data
– Explore data by querying unknown relationships
– Perform complex joins of disparate data sources in a single, simple query
– Transform RDF data from one vocabulary to another
– Develop higher-level cross-platform applications

RDF’s SPO

• Statement: "The author of http://www.w3schools.com/rdf is Jan Egil Refsnes".

• The subject of the statement above is: http://www.w3schools.com/rdf

• The predicate is: author

• The object is: Jan Egil Refsnes

[Figure: the statement drawn as a graph, with nodes labeled "http://www.w3schools.com" and "Jan Egil Refsnes"]
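The subject/predicate/object split above can be sketched as a small data structure. This is only an illustration: the `Triple` class and its field names are assumptions made for this example, not part of RDF or of any library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement (hypothetical helper type for this sketch)."""
    subject: str
    predicate: str
    obj: str  # named "obj" to avoid shadowing Python's built-in "object"


# The statement from the slide. The predicate is kept as the bare word
# "author"; in real RDF data it would normally be a URI as well.
statement = Triple(
    subject="http://www.w3schools.com/rdf",
    predicate="author",
    obj="Jan Egil Refsnes",
)

print(statement.subject)    # the resource being described
print(statement.predicate)  # the property of that resource
print(statement.obj)        # the value of the property
```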

SPARQL Landscape

• SPARQL 1.0 became a standard in January 2008, and included:
– SPARQL 1.0 Query Language
– SPARQL 1.0 Protocol
– SPARQL Results XML Format

• SPARQL 1.1 became a W3C Recommendation in March 2013, and includes:
– Updated 1.1 versions of SPARQL Query and SPARQL Protocol
– SPARQL 1.1 Update
– SPARQL 1.1 Graph Store HTTP Protocol
– SPARQL 1.1 Service Description
– SPARQL 1.1 Entailment Regimes
– SPARQL 1.1 Federated Query

SQL vs. SPARQL


Anatomy of a SPARQL query

Declare prefix shortcuts (optional):
    PREFIX foo: <…>
    PREFIX bar: <…>
    …
Query result clause:
    SELECT …
Define the dataset (optional):
    FROM <…>
    FROM NAMED <…>
Triple patterns:
    WHERE { … }
Query modifiers (optional):
    ORDER BY …
    LIMIT …
    OFFSET …

4 Types of SPARQL Queries

SELECT queries
Project out specific variables and expressions:
    SELECT ?c ?cap (1000 * ?people AS ?pop)
Project out all variables:
    SELECT *
Project out distinct combinations only:
    SELECT DISTINCT ?country
Results in a table of values (in XML or JSON):
    ?c          ?cap        ?pop
    ex:France   ex:Paris    63,500,000
    ex:Canada   ex:Ottawa   32,900,000
    ex:Italy    ex:Rome     58,900,000

CONSTRUCT queries
Construct RDF triples/graphs:
    CONSTRUCT {
      ?country a ex:HolidayDestination ;
               ex:arrive_at ?capital ;
               ex:population ?population .
    }
Results in RDF triples (in any RDF serialization):
    ex:France a ex:HolidayDestination ;
              ex:arrive_at ex:Paris ;
              ex:population 63500000 .
    ex:Canada a ex:HolidayDestination ;
              ex:arrive_at ex:Ottawa ;
              ex:population 32900000 .

ASK queries
Ask whether or not there are any matches:
    ASK
Result is either "true" or "false" (in XML or JSON):
    true, false

DESCRIBE queries
Describe the resources matched by the given variables:
    DESCRIBE ?country
Result is RDF triples (in any RDF serialization):
    ex:France a geo:Country ;
              ex:continent geo:Europe ;
              ex:flag <…> ;
              …

SPARQL Query: Syntax

    PREFIX foaf: <…>
    SELECT ?name
    FROM <…>
    WHERE { ?x foaf:name ?name . }
    ORDER BY ?name

• PREFIX: The PREFIX keyword describes prefix declarations for abbreviating URIs.

• SELECT: The SELECT keyword is the most popular of the 4 possible return clauses (more on the others later). If you've used SQL, SELECT serves very much the same function in SPARQL, which is simply to return data matching some conditions.

• FROM: The FROM keyword defines the RDF dataset which is being queried. There is an optional clause, FROM NAMED, which is used when you want to query a named graph.

• WHERE: The WHERE clause specifies the query graph pattern to be matched. This is the heart of the query. A graph pattern, as mentioned above, is, in essence, RDF with variables.

• ORDER BY: ORDER BY is one of the several possible solution modifiers, which are used to rearrange the query results. Other solution modifiers are LIMIT and OFFSET.

Return Clauses

• In addition to SELECT, there are three other very important return clauses that you can use: ASK, DESCRIBE, and CONSTRUCT.

1. ASK: Check if there is at least one result for a given query pattern. The result is true or false.

2. DESCRIBE: Returns an RDF graph that describes a resource. The implementation of this return form is up to each query engine, so you won't see it used nearly as often as the other return clauses.

3. CONSTRUCT: Returns an RDF graph that is created from a template specified as part of the query itself. CONSTRUCT is used to transform RDF data (for example into a different graph structure and with a different vocabulary than the source data).

SPARQL Architecture & Endpoint

• SPARQL queries are executed against RDF datasets, consisting of RDF graphs. (More on this later.)

• A SPARQL endpoint accepts queries and returns results via HTTP.
– Generic endpoints will query any Web-accessible RDF data.
– Specific endpoints are hardwired to query against particular datasets.

• The results of SPARQL queries can be returned and/or rendered in a variety of formats:
– XML. SPARQL specifies an XML vocabulary for returning tables of results.
– JSON. A JSON "port" of the XML vocabulary, particularly useful for Web applications.
– RDF. Certain SPARQL result clauses trigger RDF responses, which in turn can be serialized in a number of ways (RDF/XML, N-Triples, etc.).
– HTML. Used when working with SPARQL queries through an interactive form. Often implemented by applying an XSL transform to XML results.
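As a rough sketch of the HTTP exchange described above, the snippet below encodes a query as a GET URL (the SPARQL Protocol passes the query text in a `query` parameter) and flattens a JSON results document into rows. Nothing is sent over the network. The endpoint URL `http://example.org/sparql`, the helper names, and the sample response are assumptions made up for illustration.

```python
import json
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical endpoint, used only to show the URL shape.
ENDPOINT = "http://example.org/sparql"


def build_query_url(endpoint, query):
    """Encode a SPARQL query as an HTTP GET URL via the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})


def rows_from_json(text):
    """Flatten a SPARQL JSON results document into one dict per result row."""
    doc = json.loads(text)
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in doc["results"]["bindings"]
    ]


query = "SELECT ?s WHERE { ?s ?p ?o } LIMIT 1"
url = build_query_url(ENDPOINT, query)

# Hand-written sample in the SPARQL JSON results layout (not real endpoint output).
sample = """
{"head": {"vars": ["s"]},
 "results": {"bindings": [
   {"s": {"type": "uri", "value": "http://example.org/a"}}
 ]}}
"""
rows = rows_from_json(sample)
print(url)
print(rows)
```

A real client would also send an Accept header such as `application/sparql-results+json` to ask for the JSON format.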

About DBpedia

• DBpedia is an RDF version of information from Wikipedia.

• DBpedia contains data derived from Wikipedia's infoboxes, category hierarchy, article abstracts, and various external links.

• DBpedia contains over 100 million triples.

DBpedia SPARQL Endpoint

• http://dbpedia.org/sparql

Basic Graph Patterns

• This query returns all of the URIs that identify cities that are of type "Cities in Texas".

[Query 1]
    PREFIX rdf: <…>
    SELECT * WHERE {
      ?city rdf:type <…>
    }

• This query returns the cities that are of type "Cities in Texas" as well as their total populations.

[Query 2]
    PREFIX rdf: <…>
    PREFIX rdfs: <…>
    PREFIX dbp: <…>
    SELECT * WHERE {
      ?city rdf:type <…> .
      ?city dbp:populationTotal ?popTotal .
    }
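To make the idea of a basic graph pattern concrete, here is a toy matcher in Python. It is a sketch of the general technique (match each triple pattern in turn, carrying variable bindings forward), not how DBpedia or any real engine is implemented. The `ex:` names, the population figure, and the helper functions are all invented for this example.

```python
# Made-up data: names and numbers are illustrative only.
TRIPLES = [
    ("ex:Austin", "rdf:type", "ex:CityInTexas"),
    ("ex:Austin", "dbp:populationTotal", 900000),
    ("ex:Marfa", "rdf:type", "ex:CityInTexas"),
    ("ex:Boston", "rdf:type", "ex:CityInMassachusetts"),
    ("ex:Boston", "dbp:populationTotal", 650000),
]


def is_var(term):
    """Variables are written like SPARQL variables: strings starting with '?'."""
    return isinstance(term, str) and term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            # Bind the variable, or check it against an earlier binding.
            if new.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return new


def match_bgp(patterns, triples):
    """Return every binding that satisfies all triple patterns at once."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            ext
            for sol in solutions
            for triple in triples
            if (ext := match_pattern(pattern, triple, sol)) is not None
        ]
    return solutions


# Same shape as Query 2: a type pattern plus a population pattern on ?city.
results = match_bgp(
    [
        ("?city", "rdf:type", "ex:CityInTexas"),
        ("?city", "dbp:populationTotal", "?popTotal"),
    ],
    TRIPLES,
)
print(results)
```

Because both patterns share `?city`, a city only appears in the results if it has both a matching type and a population; `ex:Marfa` is dropped since the sample data gives it no population.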

• This query returns the cities that are of type "Cities in Texas" with their total populations and metro populations.

[Query 3]
    PREFIX rdf: <…>
    PREFIX rdfs: <…>
    PREFIX dbp: <…>
    SELECT * WHERE {
      ?city rdf:type <…> ;
            dbp:populationTotal ?popTotal ;
            dbp:populationMetro ?popMetro .
    }

Dealing with Missing or Sparse Data using OPTIONAL

• This query returns the cities that are of type "Cities in Texas" and their total population and optionally the metro population, if it exists.

[Query 4]
    PREFIX rdf: <…>
    PREFIX rdfs: <…>
    PREFIX dbp: <…>
    SELECT * WHERE {
      ?city rdf:type <…> ;
            dbp:populationTotal ?popTotal .
      OPTIONAL { ?city dbp:populationMetro ?popMetro . }
    }

Solution Modifiers: ORDER BY, LIMIT, OFFSET

• This query returns the cities that are of type "Cities in Texas", their total population, and optionally their metro populations. The results are returned in descending order of total population (so big cities like Houston would be the first results).

[Query 5]
    PREFIX rdf: <…>
    PREFIX rdfs: <…>
    PREFIX dbp: <…>
    SELECT * WHERE {
      ?city rdf:type <…> ;
            dbp:populationTotal ?popTotal .
      OPTIONAL { ?city dbp:populationMetro ?popMetro . }
    }
    ORDER BY DESC(?popTotal)
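The behaviour of OPTIONAL and of the solution modifiers can be approximated in a few lines of Python. This is a sketch under simplifying assumptions: the city identifiers and numbers are invented, `None` stands in for an unbound variable, and the `page` helper is a hypothetical name for this example.

```python
# Made-up data: total population for every city, metro population for some.
total = {"ex:A": 300, "ex:B": 100, "ex:C": 200, "ex:D": 400}
metro = {"ex:A": 900, "ex:D": 1200}

# OPTIONAL behaves like a left join: every city is kept, and popMetro is
# simply unbound (None here) when there is no matching triple.
rows = [
    {"city": city, "popTotal": pop, "popMetro": metro.get(city)}
    for city, pop in total.items()
]

# ORDER BY DESC(?popTotal)
ordered = sorted(rows, key=lambda r: r["popTotal"], reverse=True)


def page(rows, limit, offset):
    """LIMIT/OFFSET: skip `offset` rows, then return at most `limit` rows."""
    return rows[offset:offset + limit]


second_page = page(ordered, limit=2, offset=1)
print([r["city"] for r in ordered])
print([r["city"] for r in second_page])
```

Note that OFFSET counts rows to skip, so `offset=1` starts at the second row of the ordered results.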

• This query returns the cities that are of type "Cities in Texas", their total population, and optionally their metro populations. The results are returned in descending order of total population (so big cities would be the top results). At most 10 results are returned, after skipping the first 5 (OFFSET 5), so the page starts with the 6th result.

[Query 6]
    PREFIX rdf: <…>
    PREFIX rdfs: <…>
    PREFIX dbp: <…>
    SELECT * WHERE {
      ?city rdf:type <…> ;
            dbp:populationTotal ?popTotal .
      OPTIONAL { ?city dbp:populationMetro ?popMetro . }
    }
    ORDER BY DESC(?popTotal)
    LIMIT 10
    OFFSET 5

Remove results using FILTER

• A FILTER clause restricts which results are returned.

• With graph patterns and filters, SPARQL becomes a very powerful language for selecting only data that matches very specific criteria.

• The following operators and functions can be used in FILTER expressions:
– Logical: &&, ||, !
– Mathematical: +, -, *, /
– Comparison: =, !=, <, >, <=, >=
– SPARQL tests: isURI, isBlank, isLiteral, bound
– SPARQL accessors: str, lang, datatype
– Other: sameTerm, langMatches, regex
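A FILTER can be thought of as a boolean test applied to each candidate result. The sketch below combines a comparison, a logical AND, and a regular expression in Python. The rows and population figures are invented for illustration, and `keep` is a hypothetical helper name.

```python
import re

# Made-up result rows; the numbers are illustrative only.
rows = [
    {"name": "El Paso", "popTotal": 680000},
    {"name": "Austin", "popTotal": 960000},
    {"name": "El Campo", "popTotal": 12000},
]


def keep(row):
    """Roughly: FILTER (?popTotal > 50000 && regex(str(?name), "El"))."""
    big_enough = row["popTotal"] > 50000
    # Like SPARQL's regex without flags, re.search is case-sensitive
    # and matches anywhere in the string.
    has_el = re.search("El", row["name"]) is not None
    return big_enough and has_el


filtered = [r for r in rows if keep(r)]
print(filtered)
```

Rows that fail any part of the expression are removed from the results; nothing is changed in the underlying data.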


• This query returns only cities that have a total population of more than 50,000.

[Query 7]
    PREFIX rdf: <…>
    PREFIX rdfs: <…>
    PREFIX dbp: <…>
    SELECT * WHERE {
      ?city rdf:type <…> ;
            dbp:populationTotal ?popTotal .
      OPTIONAL { ?city dbp:populationMetro ?popMetro . }
      FILTER (?popTotal > 50000)
    }
    ORDER BY DESC(?popTotal)

• This query is the same as query 7, but brings back the human readable name of each city with the results. rdfs:label is an RDFS predicate commonly used to represent the human-readable name of a resource.

[Query 8]
    PREFIX rdf: <…>
    PREFIX rdfs: <…>
    PREFIX dbp: <…>
    SELECT * WHERE {
      ?city rdf:type <…> ;
            dbp:populationTotal ?popTotal ;
            rdfs:label ?name .
      OPTIONAL { ?city dbp:populationMetro ?popMetro . }
      FILTER (?popTotal > 50000)
    }
    ORDER BY DESC(?popTotal)

• Since we don't need all the results for all languages, we can simplify the query by requesting that only the English values be returned. In this way, RDF and SPARQL naturally support internationalization.

• [Query 9] requests only English labels for the matching patterns.

[Query 9]
    PREFIX rdf: <…>
    PREFIX rdfs: <…>
    PREFIX dbp: <…>
    SELECT * WHERE {
      ?city rdf:type <…> ;
            dbp:populationTotal ?popTotal ;
            rdfs:label ?name .
      OPTIONAL { ?city dbp:populationMetro ?popMetro . }
      FILTER (?popTotal > 50000 && langMatches(lang(?name), "EN"))
    }
    ORDER BY DESC(?popTotal)

• The previous query [Query 9] can be rewritten without the langMatches operator, using "=" and "en" (lowercase) instead of "EN" (uppercase). The two forms return the same results only when the labels are tagged exactly "en": langMatches is case-insensitive and also accepts subtags such as "en-US", whereas "=" requires an exact match.

    PREFIX rdf: <…>
    PREFIX rdfs: <…>
    PREFIX dbp: <…>
    SELECT * WHERE {
      ?city rdf:type <…> ;
            dbp:populationTotal ?popTotal ;
            rdfs:label ?name .
      OPTIONAL { ?city dbp:populationMetro ?popMetro . }
      FILTER (?popTotal > 50000 && lang(?name) = "en")
    }
    ORDER BY DESC(?popTotal)
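The difference between matching a language range and comparing tags with "=" can be sketched in Python. The function below is an approximation of basic language-range filtering written for this example (the name `lang_matches` is hypothetical); it is not taken from any SPARQL engine.

```python
def lang_matches(tag, lang_range):
    """Approximate langMatches: case-insensitive, and a range also
    matches tags that extend it with a subtag (for example en-US)."""
    tag, lang_range = tag.lower(), lang_range.lower()
    if lang_range == "*":
        # "*" matches any literal that has a language tag at all.
        return tag != ""
    return tag == lang_range or tag.startswith(lang_range + "-")


for tag in ["en", "en-US", "fr", ""]:
    # Second column: range matching; third column: plain equality with "en".
    print(repr(tag), lang_matches(tag, "EN"), tag == "en")
```

With range matching, both "en" and "en-US" are accepted for the range "EN"; with plain equality, only the exact tag "en" passes.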

• This query shows how to use regular expression filters. It is the same as Query 9, but matching only cities with "El" in their names.

[Query 10]
    PREFIX rdf: <…>
    PREFIX rdfs: <…>
    PREFIX dbp: <…>
    SELECT * WHERE {
      ?city rdf:type <…> ;
            dbp:populationTotal ?popTotal ;
            rdfs:label ?name .
      OPTIONAL { ?city dbp:populationMetro ?popMetro . }
      FILTER (?popTotal > 50000 && langMatches(lang(?name), "EN") && regex(str(?name), "El"))
    }
    ORDER BY DESC(?popTotal)

Negation: Where is the NOT Operator?

• This query is the same as Query 9, except that it returns only cities that do not have a metro population. SPARQL 1.0 has no NOT operator; instead, an OPTIONAL pattern followed by FILTER(!bound(...)) keeps only the results where the optional variable was left unbound.

[Query 11]
    PREFIX rdf: <…>
    PREFIX rdfs: <…>
    PREFIX dbp: <…>
    SELECT * WHERE {
      ?city rdf:type <…> ;
            dbp:populationTotal ?popTotal ;
            rdfs:label ?name .
      OPTIONAL { ?city dbp:populationMetro ?popMetro . }
      FILTER (?popTotal > 50000 && langMatches(lang(?name), "EN"))
      FILTER (!bound(?popMetro))
    }
    ORDER BY DESC(?popTotal)

Union (= OR) Query

• This is much the same as the queries that we've been seeing, only it returns cities that are of type "Cities in Texas" or of type "Cities in California".

[Query 12]
    PREFIX rdf: <…>
    PREFIX rdfs: <…>
    PREFIX dbp: <…>
    SELECT * WHERE {
      {
        ?city rdf:type <…> ;
              dbp:populationTotal ?popTotal ;
              rdfs:label ?name .
        OPTIONAL { ?city dbp:populationMetro ?popMetro . }
        FILTER (?popTotal > 50000 && langMatches(lang(?name), "EN"))
      }
      UNION
      {
        ?city rdf:type <…> ;
              dbp:populationTotal ?popTotal ;
              rdfs:label ?name .
        OPTIONAL { ?city dbp:populationMetro ?popMetro . }
        FILTER (?popTotal > 50000 && langMatches(lang(?name), "EN"))
      }
    }
    ORDER BY DESC(?popTotal)
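UNION can be pictured as evaluating each branch separately and concatenating the solutions. The Python sketch below shows that idea on invented data; the `ex:` names and the `cities_of_type` helper are assumptions for this example only.

```python
# Made-up data: one type per city, for illustration only.
city_type = {
    "ex:Austin": "ex:CityInTexas",
    "ex:Fresno": "ex:CityInCalifornia",
    "ex:Boston": "ex:CityInMassachusetts",
}


def cities_of_type(wanted):
    """Solutions for the single pattern: ?city rdf:type <wanted>."""
    return [{"city": c} for c, t in city_type.items() if t == wanted]


# UNION keeps every solution from the left branch and every solution
# from the right branch (duplicates are not removed).
union = cities_of_type("ex:CityInTexas") + cities_of_type("ex:CityInCalifornia")
print(union)
```

A city matching neither branch, such as `ex:Boston` in this sample, never appears in the combined results.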

• The following query is equivalent to the previous one but simpler:

[Query 12] Simplified
    PREFIX rdf: <…>
    PREFIX rdfs: <…>
    PREFIX dbp: <…>
    SELECT * WHERE {
      ?city dbp:populationTotal ?popTotal ;
            rdfs:label ?name .
      OPTIONAL { ?city dbp:populationMetro ?popMetro . }
      FILTER (?popTotal > 50000 && langMatches(lang(?name), "EN"))
      { ?city rdf:type <…> . }
      UNION
      { ?city rdf:type <…> . }
    }
    ORDER BY DESC(?popTotal)

Ask Query

• An ASK query checks if there is at least one result for a given query pattern. The result is true or false.

• This query asks if Austin is a city in Texas.

[Query 13]
    PREFIX rdf: <…>
    ASK WHERE {
      <…> rdf:type <…> .
    }

Complicated Ask Query

• This query asks if there exists a city in Texas that has a total population greater than 600,000 and a metro population less than 1,800,000.

[Query 14]
    PREFIX rdf: <…>
    PREFIX dbp: <…>
    ASK WHERE {
      ?city rdf:type <…> ;
            dbp:populationTotal ?popTotal ;
            dbp:populationMetro ?popMetro .
      FILTER (?popTotal > 600000 && ?popMetro < 1800000)
    }

Describe Query

• A DESCRIBE query returns an RDF graph that describes a resource. The implementation of this return form is up to each query engine.

• This query returns an RDF graph that describes Austin.

[Query 15]
    DESCRIBE <…>

• This query returns an RDF graph that describes all the cities in Texas that have a total population greater than 600,000 and a metro population less than 1,800,000.

[Query 16]
    PREFIX rdf: <…>
    PREFIX dbp: <…>
    DESCRIBE ?city WHERE {
      ?city rdf:type <…> ;
            dbp:populationTotal ?popTotal ;
            dbp:populationMetro ?popMetro .
      FILTER (?popTotal > 600000 && ?popMetro < 1800000)
    }

Construct Query

• A CONSTRUCT query returns an RDF graph that is created from a graph template specified in the CONSTRUCT query. More specifically, the result RDF graph is created by taking the results of a query pattern and filling in the values of variables that occur in the construct template.

• This query constructs a new RDF graph for cities in Texas that have a total population greater than 500,000.

[Query 17]
    PREFIX rdf: <…>
    PREFIX rdfs: <…>
    PREFIX dbp: <…>
    CONSTRUCT {
      ?city rdf:type <…> ;
            <…> ?name ;
            <…> ?popTotal ;
            <…> ?popMetro .
    }
    WHERE {
      ?city rdf:type <…> ;
            dbp:populationTotal ?popTotal ;
            rdfs:label ?name ;
            dbp:populationMetro ?popMetro .
      FILTER (?popTotal > 500000 && langMatches(lang(?name), "EN"))
    }
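The "fill in the template" step of CONSTRUCT can be sketched in Python as follows. This is a simplified illustration: the solutions, the `ex:` predicate names, and the population figures are invented, and the `construct` helper is a hypothetical name, not an API of any RDF library.

```python
# Made-up query solutions (one dict of variable bindings per result row).
solutions = [
    {"?city": "ex:Houston", "?name": "Houston", "?popTotal": 2300000},
    {"?city": "ex:Dallas", "?name": "Dallas", "?popTotal": 1300000},
]

# Construct template: triples that may contain variables.
template = [
    ("?city", "ex:name", "?name"),
    ("?city", "ex:population", "?popTotal"),
]


def is_var(term):
    return isinstance(term, str) and term.startswith("?")


def construct(template, solutions):
    """Instantiate the template once per solution and collect the triples."""
    graph = set()
    for sol in solutions:
        for pattern in template:
            triple = tuple(sol.get(term, term) for term in pattern)
            # Skip triples that still contain an unbound variable.
            if not any(is_var(term) for term in triple):
                graph.add(triple)
    return graph


graph = construct(template, solutions)
for triple in sorted(graph, key=str):
    print(triple)
```

Using a set for the output reflects that an RDF graph is a set of triples, so identical triples produced by different solutions appear only once.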