SPARQL INFX 532 | UW Ischool
Learning SPARQL
INFX 532 | UW iSchool
References: Cambridge Semantics SPARQL 101; SPARQL Nuts & Bolts; SPARQL by Example
Wonghong Jang | Sam Oh

Introduction
• SPARQL is a recursive acronym, which stands for SPARQL Protocol and RDF Query Language.
• SPARQL is used to query RDF data.
• Why SPARQL?
– Pull values from structured and semi-structured data
– Explore data by querying unknown relationships
– Perform complex joins of disparate databases in a single, simple query
– Transform RDF data from one vocabulary to another
– Develop higher-level cross-platform applications

RDF’s SPO
• Statement: "The author of http://www.w3schools.com/rdf is Jan Egil Refsnes".
• The subject of the statement above is: http://www.w3schools.com/rdf
• The predicate is: author
• The object is: Jan Egil Refsnes

<?xml version="1.0"?>
<RDF>
  <Description about="http://www.w3schools.com/rdf">
    <author>Jan Egil Refsnes</author>
    <homepage>http://www.w3schools.com</homepage>
  </Description>
</RDF>

SPARQL Landscape
• SPARQL 1.0 became a standard in January 2008, and included:
– SPARQL 1.0 Query Language
– SPARQL 1.0 Protocol
– SPARQL Results XML Format
• SPARQL 1.1 became a W3C Recommendation in March 2013, and includes:
– Updated 1.1 versions of SPARQL Query and SPARQL Protocol
– SPARQL 1.1 Update
– SPARQL 1.1 Graph Store HTTP Protocol
– SPARQL 1.1 Service Description
– SPARQL 1.1 Entailment Regimes
– SPARQL 1.1 Federated Query

SQL vs. SPARQL

Anatomy of a SPARQL query
• Declare prefix shortcuts (optional):
    PREFIX foo: <…>
    PREFIX bar: <…>
    …
• Query result clause:
    SELECT …
• Define the dataset (optional):
    FROM <…>
    FROM NAMED <…>
• Triple patterns:
    WHERE { … }
• Query modifiers (optional):
    ORDER BY …
    LIMIT …
    OFFSET …

4 Types of SPARQL Queries

SELECT queries
• Project out specific variables and expressions:
    SELECT ?c ?cap (1000 * ?people AS ?pop)
• Project out all variables:
    SELECT *
• Project out distinct combinations only:
    SELECT DISTINCT ?country
• Results in a table of values (in XML or JSON):
    ?c          ?cap        ?pop
    ex:France   ex:Paris    63,500,000
    ex:Canada   ex:Ottawa   32,900,000
    ex:Italy    ex:Rome     58,900,000

CONSTRUCT queries
• Construct RDF triples/graphs:
    CONSTRUCT {
      ?country a ex:HolidayDestination ;
               ex:arrive_at ?capital ;
               ex:population ?population .
    }
• Results in RDF triples (in any RDF serialization):
    ex:France a ex:HolidayDestination ;
              ex:arrive_at ex:Paris ;
              ex:population 63500000 .
    ex:Canada a ex:HolidayDestination ;
              ex:arrive_at ex:Ottawa ;
              ex:population 32900000 .

ASK queries
• Ask whether or not there are any matches:
    ASK
• Result is either “true” or “false” (in XML or JSON):
    true, false

DESCRIBE queries
• Describe the resources matched by the given variables:
    DESCRIBE ?country
• Result is RDF triples (in any RDF serialization):
    ex:France a geo:Country ;
              ex:continent geo:Europe ;
              ex:flag <http://…/flag-france.png> ;
              …

SPARQL Query: Syntax
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?name
    FROM <http://example.com/dataset.rdf>
    WHERE { ?x foaf:name ?name . }
    ORDER BY ?name
• PREFIX: The PREFIX keyword introduces prefix declarations for abbreviating URIs.
• SELECT: The SELECT keyword is the most popular of the 4 possible return clauses (more on the others later). If you've used SQL, SELECT serves very much the same function in SPARQL, which is simply to return data matching some conditions.
• FROM: The FROM keyword defines the RDF dataset which is being queried. There is an optional clause, FROM NAMED, which is used when you want to query a named graph.
• WHERE: The WHERE clause specifies the query graph pattern to be matched. This is the heart of the query. A graph pattern is, in essence, RDF with variables.
• ORDER BY: ORDER BY is one of several possible solution modifiers, which are used to rearrange the query results. Other solution modifiers are LIMIT and OFFSET.

Return Clauses
• In addition to SELECT, there are three other very important return clauses that you can use: ASK, DESCRIBE, and CONSTRUCT.
1. ASK: Check if there is at least one result for a given query pattern.
   The result is true or false.
2. DESCRIBE: Returns an RDF graph that describes a resource. The implementation of this return form is up to each query engine, so you won't see it used nearly as often as the other return clauses.
3. CONSTRUCT: Returns an RDF graph that is created from a template specified as part of the query itself. CONSTRUCT is used to transform RDF data (for example, into a different graph structure and with a different vocabulary than the source data).

SPARQL Architecture & Endpoint
• SPARQL queries are executed against RDF datasets, consisting of RDF graphs. (More on this later.)
• A SPARQL endpoint accepts queries and returns results via HTTP.
– Generic endpoints will query any Web-accessible RDF data
– Specific endpoints are hardwired to query against particular datasets
• The results of SPARQL queries can be returned and/or rendered in a variety of formats:
– XML. SPARQL specifies an XML vocabulary for returning tables of results.
– JSON. A JSON "port" of the XML vocabulary, particularly useful for Web applications.
– RDF. Certain SPARQL result clauses trigger RDF responses, which in turn can be serialized in a number of ways (RDF/XML, N-Triples, Turtle, etc.).
– HTML. Used when working with SPARQL queries through an interactive form; often implemented by applying an XSL transform to XML results.

About DBpedia
• DBpedia is an RDF version of information from Wikipedia.
• DBpedia contains data derived from Wikipedia's infoboxes, category hierarchy, article abstracts, and various external links.
• DBpedia contains over 100 million triples.

DBpedia SPARQL Endpoint
• http://dbpedia.org/sparql

Basic Graph Patterns
• This query returns all of the URIs that identify cities that are of type "Cities in Texas".
[Query 1]
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    SELECT *
    WHERE {
      ?city rdf:type <http://dbpedia.org/class/yago/CitiesInTexas>
    }

Basic Graph Patterns
• This query returns the cities that are of type "Cities in Texas" as well as their total populations.
[Query 2]
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX dbp: <http://dbpedia.org/ontology/>
    SELECT *
    WHERE {
      ?city rdf:type <http://dbpedia.org/class/yago/CitiesInTexas> .
      ?city dbp:populationTotal ?popTotal .
    }

Basic Graph Patterns
• This query returns the cities that are of type "Cities in Texas" with their total populations and metro populations. Cities that lack either value are not returned.
[Query 3]
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX dbp: <http://dbpedia.org/ontology/>
    SELECT *
    WHERE {
      ?city rdf:type <http://dbpedia.org/class/yago/CitiesInTexas> ;
            dbp:populationTotal ?popTotal ;
            dbp:populationMetro ?popMetro .
    }

Dealing with Missing or Sparse Data using OPTIONAL
• This query returns the cities that are of type "Cities in Texas" and their total population and, optionally, the metro population if it exists.
[Query 4]
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX dbp: <http://dbpedia.org/ontology/>
    SELECT *
    WHERE {
      ?city rdf:type <http://dbpedia.org/class/yago/CitiesInTexas> ;
            dbp:populationTotal ?popTotal .
      OPTIONAL { ?city dbp:populationMetro ?popMetro . }
    }

Solution Modifiers: ORDER BY, LIMIT, OFFSET
• This query returns the cities that are of type "Cities in Texas", their total population, and optionally their metro populations. The results are returned in descending order of total population (so big cities like Houston would be the first results).
[Query 5]
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX dbp: <http://dbpedia.org/ontology/>
    SELECT *
    WHERE {
      ?city rdf:type <http://dbpedia.org/class/yago/CitiesInTexas> ;
            dbp:populationTotal ?popTotal .
      OPTIONAL { ?city dbp:populationMetro ?popMetro . }
    }
    ORDER BY desc(?popTotal)

Solution Modifiers: ORDER BY, LIMIT, OFFSET
• This query returns the cities that are of type "Cities in Texas", their total population, and optionally their metro populations. The results are returned in descending order of total population (so big cities would be the top results). At most 10 results will be returned, starting with the 6th result, because OFFSET 5 skips the first five.
[Query 6]
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX dbp: <http://dbpedia.org/ontology/>
    SELECT *
    WHERE {
      ?city rdf:type <http://dbpedia.org/class/yago/CitiesInTexas> ;
            dbp:populationTotal ?popTotal .
      OPTIONAL { ?city dbp:populationMetro ?popMetro . }
    }
    ORDER BY desc(?popTotal)
    LIMIT 10
    OFFSET 5

Remove results using FILTER
• A FILTER clause restricts which results are returned.
• With graph patterns and filters, SPARQL becomes a very powerful language for selecting only data that matches very specific criteria.
• The following operators and functions can be used in FILTER expressions:
– Logical: &&, ||, !
– Mathematical: +, -, *, /
– Comparison: =, !=, <, >, <=, >=
– SPARQL tests: isURI, isBlank, isLiteral, bound
– SPARQL accessors: str, lang, datatype
– Other: sameTerm, langMatches, regex

Remove results using FILTER
• This query returns only cities that have a total population of more than 50,000.
[Query 7]
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX dbp: <http://dbpedia.org/ontology/>
    SELECT *
    WHERE {
      ?city rdf:type <http://dbpedia.org/class/yago/CitiesInTexas> ;
            dbp:populationTotal ?popTotal .
      OPTIONAL { ?city dbp:populationMetro ?popMetro . }
      FILTER (?popTotal > 50000)
    }
    ORDER BY desc(?popTotal)

Remove results using