RDF Triplestores and SPARQL Endpoints

RDF triplestores and SPARQL endpoints Lecturer: Mathias Bonduel [email protected] LDAC summer school 2019 – Lisbon, Portugal Lecture outline • Storing RDF data: RDF triplestores o Available methods to store RDF data o RDF triplestores o Triplestore applications – databases – default graph – named graphs o List of triplestore applications o Comparing triplestores o Relevant triplestore settings o Communication with triplestores • Distributing RDF data: SPARQL endpoints o Available methods to distribute RDF data o SPARQL endpoints o Reuse of SPARQL queries o SPARQL communication protocol: requests and responses June 18, 2019 RDF triplestores and SPARQL endpoints | Mathias Bonduel 2 Storing RDF data June 18, 2019 RDF triplestores and SPARQL endpoints | Mathias Bonduel 3 Available methods to store RDF data • In-memory storage (local RAM) o Working memory of application (e.g. client side web app, desktop app) o Frameworks/libraries: RDFLib (Python), rdflib.js (JavaScript), N3 (JavaScript), rdfstore-js (JavaScript), Jena (Java), RDF4J (Java), dotNetRDF (.NET), etc. o Varied support for SPARQL querying • Persistent storage (storage drive) o RDF file/dump (diff. RDF serializations): TTL, RDF/XML, N-Quads, JSON-LD, N-triples, TriG, N3, TriX, RDFa (RDF embedded in HTML), etc. o RDF triplestore o (ontology editing applications: Protégé, Topbraid Composer, etc.) June 18, 2019 RDF triplestores and SPARQL endpoints | Mathias Bonduel 4 RDF triplestores “a database to store and query RDF triples” • Member of the family of graph/NoSQL databases • Data structure: RDF • Main query language: SPARQL standards • Oftentimes support for RDFS/OWL/rules reasoning • Data storage is typically persistent June 18, 2019 RDF triplestores and SPARQL endpoints | Mathias Bonduel 5 Triplestore applications – databases - default graph - named graphs • An RDF triplestore instance (application) can have one or multiple databases (repositories) • Each database has one default graph and zero or more named graphs o a good practice is to place TBox in a separate named graph. Most triplestores’ reasoning engines consider all TBox statements in the database o You can also use the TBox without reasoning => query directly June 18, 2019 RDF triplestores and SPARQL endpoints | Mathias Bonduel 6 Instance of an RDF triplestore application (e.g. an Apache Jena Fuseki instance) Named graph 1 Default graph Default graph Named graph n Named graph 2 Database 1 Database 2 Database m June 18, 2019 RDF triplestores and SPARQL endpoints | Mathias Bonduel 7 List of triplestore applications • Apache Jena: o TDB o Fuseki • Blazegraph DB • Stardog • Strabon • Apache Marmotta/KiWi • Ontotext GraphDB (former OWLIM) • Openlink Virtuoso • Cambridge Semantics AnzoGraph • Amazon Neptune • Franz AllegroGraph • BBN Parliament • ClioPatria • ... June 18, 2019 RDF triplestores and SPARQL endpoints | Mathias Bonduel 8 List of triplestore applications • Apache Jena Fuseki • Stardog • Ontotext GraphDB • BBN Parliament June 18, 2019 RDF triplestores and SPARQL endpoints | Mathias Bonduel 9 Comparing triplestores (1/4) • Hardware sizing: recommended RAM, disk space, etc. • Efficiency (speed and memory usage) o RDF loading/import efficiency o Querying efficiency o Reasoning efficiency • License options o open source/proprietary o free/educational/commercial • Security: SSH, access roles June 18, 2019 RDF triplestores and SPARQL endpoints | Mathias Bonduel 10 Comparing triplestores (2/4) • API programming language support • SPARQL 1.1 support + SPARQL extensions (spatial, temporal, PATHS, etc.) • Reasoning: support for RDFS, OWL flavors, rule languages, SHACL • Validity/conformity check of input data • Documentation (or the lack of it) • (Commercial) support • Extras: o GraphQL, support for Apache TinkerPop API (e.g. Gremlin), etc. o plug-and-play cloud database o full-text search, machine learning o virtual graphs June 18, 2019 RDF triplestores and SPARQL endpoints | Mathias Bonduel 11 Disclaimer: no liability regarding Comparing triplestores (3/4) completeness and correctness Name GeoSPARQL Multiple repositories Named Supported reasoning Licensing options Bulk triplestore querying per DB instance graphs schemes loading Apache Jena No* Yes Yes RDFS/OWL rule Open source (Apache Yes Fuseki engines** License v2.0) Stardog Yes Yes Yes RDFS, OWL DL/QL/RL, Proprietary: Enterprise Yes SL (includes SWRL), licenses (60 day trial), SHACL free academic license Ontotext No*** Yes Yes RDFS, OWL QL/RL, Proprietary: Enterprise Yes GraphDB custom rulesets license (60 day trial), standard license (60 day trial), free license BBN Yes No Yes Selection of RDFS and Open source (BSD) Yes Parliament OWL rules (config), SWRL * Limited custom geospatial queries possible over WGS84 and GeoSPARQL WKT data (newest versions might include GeoSPARQL querying) ** Jena modules can be connected for extensions: http://jena.apache.org/documentation/inference/ *** Limited custom geospatial queries possible over WGS84 data (GraphDB functions) June 18, 2019 RDF triplestores and SPARQL endpoints | Mathias Bonduel 12 Disclaimer: no liability regarding Comparing triplestores (3/4) completeness and correctness Name Support Programming Extras: rule languages, validity check, built-in security triplestore interface Apache Jena (Public Java (Jena) Support for rules, authentication (Apache Shiro), extensions via Jena Fuseki mailinglist) (advanced reasoning, geospatial querying, full text search, etc.) Stardog Commercial Java (SNARL Validation (SHACL/ICV), rules (SWRL and Stardog Rules), PATHS queries, support, (native API), virtual graphs (RDBMS, CSV/JSON/XML, MongoDB, Cosmos DB, Stardog forum Sesame, RDF4J, Cassandra, etc. using R2RML or Stardog mapping syntax), GraphQL, Jena) property graph model (Tinkerpop), built-in authentication, full-text search, NLP, query/reasoning(inconsistency/inference)/validation explanations, machine learning functions Ontotext Commercial Java (RDF4J, Jena) Validity check, separate plug and play cloud database, full-text search, GraphDB support, RDFRank (interconnectedness), built-in authentication, custom email, Stack GraphDB Rules, query explanations Overflow BBN (Github) Java, C++ Support for SWRL rules, temporal indexing Parliament June 18, 2019 RDF triplestores and SPARQL endpoints | Mathias Bonduel 13 Relevant triplestore configuration settings (1/2) • SPARQL endpoint host and port • Query timeout RAM • Heap size JVM Heap Off-Heap OS • Security and authorisation o Connection methods to HTTP server: enable/require SSL encryption o users: name and password o roles connected to users o rights for user/role: read/write RDF content, create/delete databases, adjust settings/users, grant/revoke permissions • Treatment of literals: strict parsing, canonicalization Settings with potential impact on loading/querying efficiency June 18, 2019 RDF triplestores and SPARQL endpoints | Mathias Bonduel 14 Relevant triplestore configuration settings (1/2) • Indexing: regular, geospatial, temporal, text (literals) • Query scope: default graph, union of all named graphs, union of all named graphs and default graph • Active reasoning profiles: RDFS, OWL RL, OWL QL, etc. • Active rule engines • Enable spatial querying, reasoning, etc. • ... Settings with potential impact on loading/querying efficiency June 18, 2019 RDF triplestores and SPARQL endpoints | Mathias Bonduel 15 Communication with triplestores • SPARQL over HTTP request: SPARQL protocol (code, cURL, Postman, etc.): o SPARQL endpoint URL: read update o Options: query, authentication, inference, input/output format o standardized • Native API of triplestore (own code, commandline): o Bulk loading RDF files o Management of databases: (all/active) databases, start, stop, drop, settings, roles, etc. o Direct queries • Web/desktop client: o Triplestore specific client: via SPARQL protocol or native API o Generic client: via SPARQL protocol June 18, 2019 RDF triplestores and SPARQL endpoints | Mathias Bonduel 16 Links to installation guidelines • Apache Jena Fuseki: o Visual interface: Fuseki web app on localhost:3030 (comes with triplestore) o Triplestore: all OS • Stardog: o Visual interface: Stardog Studio (and web app) o Triplestore: Linux and Mac OS / Docker / Windows • GraphDB: o Visual interface: GraphDB workbench web app on localhost:7200 (comes with triplestore) o Triplestore: Linux / Mac OS / Windows • Parliament: o Visual interface: Parliament web app on localhost:8089/parliament (comes with triplestore) o Triplestore: all OS (download from Github repository, old website is depreciated) June 18, 2019 RDF triplestores and SPARQL endpoints | Mathias Bonduel 17 Distributing RDF data June 18, 2019 RDF triplestores and SPARQL endpoints | Mathias Bonduel 18 Available methods to distribute RDF data • SPARQL endpoint • Triple Pattern Fragment in-memory RDF store server endpoint (local access only) • RDF files: o dump file o subject page o RDF embedded in HTML (i.e. RDFa) Source: http://linkeddatafragments.org/concept/ June 18, 2019 RDF triplestores and SPARQL endpoints | Mathias Bonduel 19 SPARQL endpoint (1/4) “An entry point for HTTP access to shared RDF data using SPARQL as a query language” ( standardized SPARQL communication protocol) June 18, 2019 RDF triplestores and SPARQL endpoints | Mathias Bonduel 20 SPARQL endpoint (2/4) Triplestores RDF files HTTP request with SPARQL query RDBMS NoSQL HTTP response Virtual Client Server graphs XML files JSON files CSV files Triple Pattern Fragment Servers June 18, 2019 RDF triplestores and SPARQL endpoints | Mathias

Load more