Querying N-Triples from an Extensible Functional DBMS

IT 10 012 Examensarbete 15 hp April 2010 Querying N-triples from an extensible functional DBMS Martynas Mickevicius Institutionen för informationsteknologi Department of Information Technology Abstract Querying N-triples from an extensible functional DBMS Martynas Mickevicius Teknisk- naturvetenskaplig fakultet UTH-enheten The goal of this project is to provide an N-triple wrapper for the extensible DBMS Amos II. This enables search and filter through the N-Triple streams using the Besöksadress: AmosQL query language to return relevant triples back as a result stream. Ångströmlaboratoriet Lägerhyddsvägen 1 To implement this wrapper, first N-triples streams are parsed using the Bison/Flex Hus 4, Plan 0 parser generator. The parsed triples are streamed to Amos II using its foreign function interface. The query processing kernel of Amos II then enables general Postadress: queries to N-triple data sources. The report includes a performance evaluation for Box 536 751 21 Uppsala the wrapper. Telefon: 018 – 471 30 03 Telefax: 018 – 471 30 00 Hemsida: http://www.teknat.uu.se/student Handledare: Silvia Stefanova Ämnesgranskare: Tore Risch Examinator: Anders Jansson IT 10 012 Tryckt av: Reprocentralen ITC 1. Introduction ....................................................................................... 7 2. Background ........................................................................................ 7 2.1. Resource Description Framework .................................................. 7 2.1.1. URIs ......................................................................................... 8 2.1.2. Literals...................................................................................... 8 2.1.3. Blank node ............................................................................... 8 2.1.4. N-Triples file format ................................................................ 8 2.2. Database Management Systems ..................................................... 9 2.3. Amos II ........................................................................................... 9 2.4. Bison/Flex .................................................................................... 10 3. The N-Triple wrapper ...................................................................... 10 3.1. The parser ..................................................................................... 10 3.2. RDF types and functions .............................................................. 12 3.3. Encoders and decoders ................................................................. 12 4. Implementation ................................................................................ 13 4.1. RDFResource objects ................................................................... 13 4.1.1. Internal structure .................................................................... 13 4.1.2. Registering custom type to Amos II ...................................... 14 4.1.3. Internal functions ................................................................... 15 4.1.4. Creating RDFResource object ............................................... 15 4.1.5. Printing ................................................................................... 16 4.2. Encoders and decoders ................................................................. 17 4.3. External AmosQL functions......................................................... 18 4.4. External ALisp functions ............................................................. 20 4.5. The lexer ....................................................................................... 20 4.6. The parser ..................................................................................... 20 5. Evaluation ........................................................................................ 21 5.1. Conformance ................................................................................ 21 5.2. Performance .................................................................................. 21 6. Summary and conclusions ............................................................... 21 7. Examples ......................................................................................... 22 8. References ....................................................................................... 22 9. Table of figures................................................................................ 23 5 1. Introduction Due to the always decreasing costs of managing distributed systems and continuously increasing speed of interconnectivity computer systems have become highly heterogeneous. However from the user point of view there is still a need to access and search different kinds of data. These problems can be solved by using a mediator, which is a system to enable combined search in different kinds of data sources. There are very many different kinds of data models and standards for one database system to cope with. To alleviate this, the World Wide Web Consortium has introduced a new set of data representation languages called the Semantic web [2]. These languages are based on new standard called Resource Description Framework (RDF) that enables to link data of different kinds. In this work the functional DBMS Amos II [1] is extended with a wrapper interface to allow database queries to streams of RDF triples represented in the N-Triples [4] format. The overall result of the implemented wrapper is a function that takes a stream as a parameter and returns a stream of objects which can be further manipulated in the system. The wrapper also includes primitives to create and access object properties. 2. Background The following technologies are involved in this project: Resource Description Framework Database Management Systems Amos II Bison/Flex parser generator 2.1. Resource Description Framework Resource Description Framework (RDF) is a method to model any kind of information. It is based on modeling data and relationships between data entities as graphs. RDF has quite a few serialization formats, such as XML and Notation 3 (N3) [3]. A subset of the latter, the N-Triples format [4] is the one being used in this project work. It has some advantages over the other formats, such as being human readable and having minimal overhead. An N-Triples data file consists of a set of triples. N-Triples represent the same data relationships as RDF statements do. For example, the following three N-Triples define three RDF statements: 7 <http://www.w3.org/> <http://purl.org/creator> "Dave" . <http://www.w3.org/> <http://purl.org/creator> "Art" . <http://www.w3.org/> <http://purl.org/publisher> <http://www.w3.org/> . These RDF statements can also be expressed using the more complex RDF- XML format [12]: <rdf:RDF xmlns:rdf=http://www.w3.org/1999/02/22-rdf-syntax-ns# xmlns:dc="http://purl.org/"> <rdf:Description rdf:about="http://www.w3.org/"> <dc:creator>Art</dc:creator> <dc:creator>Dave</dc:creator> <dc:publisher rdf:resource="http://www.w3.org/"/> </rdf:Description> </rdf:RDF> The triples are defined using three kinds of resources: URIs, literals and blank nodes. They are represented in files using different notations. 2.1.1. URIs A uniform resource identifier, URI, represents an identifier of globally unique resource on the web, e.g. a URL or an image. It is enclosed in less than (<) and greater than (>) signs, e.g.: <http://example.org/resource1> 2.1.2. Literals A literal is used to represent values such as numbers or dates by means of a lexical representation. They are specified enclosed in double quotes (“), optionally followed by type or language definitions, e.g.: "chat1"^^<http://example.org/datatype1> "chat2"@en-us Note that a literal string is followed by a double caret sign (^^) and another URI resource representing the datatype. 2.1.3. Blank node A blank node is an internal node which is not identified by a universal URI and is not literal. It starts with an underscore and a colon (_:) and then is followed by its name. _:anon 2.1.4. N-Triples file format The RDF statements in an N-Triples file are defined as triples of resources, for example: <http://www.w3.org/> <http://purl.org/creator> "Dave" . 8 Each RDF statement consists of subject, predicate, and object. The subject is the resource that is being described. It can be a URI or a blank node. The predicate defines the property for the subject. It is always a URI. The object is referred as a property value for the subject. It can be either of three kinds of resources. This gives a total combination of six kinds of triples. In the N- triples format each triple is followed by dot (.) and an end-of-line character. Triple can be one of the following: URI URI URI . URI URI Literal . URI URI Blank . Blank URI URI . Blank URI Literal . Blank URI Node . This specifies the complete grammar of N-Triples as Extended Backus- Naur Form (EBNF) Grammar [6]. This project implements a parser for RDF triples in the N-Triples format using this grammar. 2.2. Database Management Systems A database management system (DBMS) is a set of programs which are responsible for storage, management, and retrieval of data from a database. A DBMS accepts queries for data from an external application program and delivers matching data back to the application. The most common query language is SQL. A data model is the kind of data types used to model data in a DBMS. The most common data model is the relational model where data is represented as tables consisting of rows and attributes (columns). These

Querying N-Triples from an Extensible Functional DBMS

Semantics Developer's Guide

Binary RDF for Scalable Publishing, Exchanging and Consumption in the Web of Data

Rdfa in XHTML: Syntax and Processing Rdfa in XHTML: Syntax and Processing

Exercise Sheet 1

An Introduction to RDF

On Blank Nodes

Common Sense Reasoning with the Semantic Web

RDF—The Basis of the Semantic Web 3

Resource Description Framework Technologies in Chemistry Egon L Willighagen1* and Martin P Brändle2

Inscribing the Blockchain with Digital Signatures of Signed RDF Graphs. a Worked Example of a Thought Experiment

Connecting RDA and RDF Linked Data for a Wide World of Connected Possibilities

Semantic Integration and Analysis of Clinical Data-1024