RAL: an Algebra for Querying RDF

Flavius Frasincar Geert-Jan Houben Richard Vdovjak Peter Barna Eindhoven University of Technology

PO Box 513, NL-5600 MB Eindhoven, the Netherlands

flaviusf, houben, richardv, pbarna ¡ @win.tue.nl

Abstract ject, predicate, object) or in an XML syntax. Compared to XML, which is document-oriented, RDF To make the WWW machine-understandable there is a takes into consideration a knowledge-oriented approach de- strong demand both for languages describing metadata and signed specifically for the Web. One of the advantages of for languages querying metadata. The Resource Descrip- RDF over XML is that an RDF graph depicts in a unique tion Framework (RDF), a language proposed by W3C, can form the information to be conveyed while there are sev- be used for describing metadata about (Web) resources. eral XML documents to represent the same semantic graph. RDF Schema (RDFS) extends RDF by providing means RDF’s strength is based on its ability to define statements for creating application specific vocabularies (ontologies). that assign values to resource properties. While the two above languages are widely acknowledged While object-oriented systems are object-centered (prop- as a standard means for describing Web metadata, a stan- erties are defined in a class context), RDF is property- dardized language for querying RDF metadata is still an centric, which makes it easy for anyone to “say anything open issue. Research groups coming from both industry and about anything” [4], an architecture principle of the SW. In academia are presently involved in proposing several RDF RDF, concepts from E-R modeling are being reused for web query languages. Due to the lack of an RDF algebra such ontology modeling, a common understanding of topics that query languages use APIs to describe their semantics and allows application interoperability [12]. optimization issues are mostly neglected. This paper pro- RDF Schema (RDFS) [24] can be used to define ap- poses RAL, an RDF algebra suitable for defining (and com- plication specific vocabularies. These vocabularies define paring) the semantics of different RDF query languages and taxonomies of resources and properties used by specific (at a future stage) for performing algebraic optimizations. RDF descriptions. RDFS is designed as a flexible lan- After the definition of the data model we present the op- guage to support distributed description models. Unlike erators by means of which the model can be manipulated. XML DTD/Schema, RDFS does not impose a strict typing The operators come in three flavors: extraction operators on descriptions (e.g. one can use new properties that were retrieve the needed resources from the input RDF model, not present in the schema, a resource can be an instance of loop operators support repetition, and construction opera- more than one class etc.). The set of primitive data types tors build the resulting RDF model. in RDF is left on purpose poorly defined as it is envisaged that RDFS will reuse the work done for data typing in XML Schema [6]. We do hope that future versions of RDFS will 1. Introduction bring clarification regarding this subject and will eliminate several other shortcomings of the present specification (e.g. The next-generation Web, called the Semantic Web missing set collection, difficult literals handling etc.). (SW), aims at making its content not only machine-readable In the XML world there is already a winner in the but also machine-understandable [5]. The final goal of the quest for the most appropriate XML query language, i.e. SW is to make machines fully exploit the Web knowledge XQuery [7]. As the SW initiative started not too long ago, in an uncentralized manner. The W3C standard called Re- its supporting technologies are still in their infancy. Re- source Description Framework (RDF) [8, 24] is intended to search groups coming from both industry and academia serve as a metadata language for the Web and together with are presently involved in proposing several RDF query lan- its extensions lays a foundation for the SW. It has a graph guages. Due to the lack of an RDF algebra such query lan- notation, which can be serialized in a triple notation (sub- guages use APIs to describe their semantics and the opti- mization issues are mostly neglected. This paper proposes information in an RDF graph. The location step and the fil- RAL, an RDF algebra suitable for defining (and compar- ter constructs were present also in XPath, but the primary ing) the semantics of different RDF query languages and selection construct is new. With the RDF graph being a (at a future stage) for performing algebraic optimizations. forest, one needs to specify from which trees the selection will be made. RDFT is an RDF declarative transformation 2. Related Work language a la XSLT [21], while RQuery, an RDF query lan- guage, is obtained by replacing XPath [3] with RDFPath in XQuery [7]. However this approach is not using the features At present there already exist a few RDF query lan- specific for RDF, as the RDF Schema is being completely guages but to our knowledge there is no full-fledged RDF neglected. algebra. The only algebraic description of RDF that we en- The third approach (coming from ICS-FORTH in countered so far is the RDF data model specification [28] Greece) uses the RDF Graph Model for defining the RDF done by Sergey Melnik (Stanford). It is based on triples and query language RQL [20]. It extends previous work on provides a formal definition of resources, literals, and state- semistructured query languages (e.g. path expressions, fil- ments. Despite being nicely defined, the specification does tering capabilities etc.) [14] with RDF peculiarities. Its not include URIs, neglects the RDF graph structure, and strength lies in the ability to uniformly query both RDF does not provide operations for manipulating RDF models. descriptions and schemas. Compared to the previous ap- Another formal approach, which aims not only at formaliz- proach it exploits the inference given in the RDF Schema ing the RDF data model but also at associating a formal se- (e.g. multiple classification of resources, taxonomies of mantics to it, is the RDF Model Theory (RMT) [18]. It does classes/properties etc.) being the most advanced RDF query not however, qualify as an algebraic approach but rather, as language proposed so far. the name suggests, a model theoretic one. As RMT is cur- Other query languages have been proposed during the rently being considered a main reference when it comes to last years: Algae [31] (W3C) and rdfDB query language RDF semantics, we tried to make our algebra (especially the [16] (Netscape) as graph matching query languages. RDF data model part) compatible with RMT. query languages similar to rdfDB are: RDFQL [19], David As implementation of RDF toolkits started before having Allsop’s RDF query language [1], SquishQL [30], and an RDF query language, there are a lot of RDF APIs present RDQL [32] (HP Labs) an implementation of SquishQL on today. Three main approaches for querying RDF (meta)data top of the Jena RDF API [27] of Brian McBride (HP Labs). have been proposed. Some other proposed RDF APIs are: Wilbur [23] (Nokia), The first approach (supported in the W3C working group the RDF API of Sergey Melnik [29] (Stanford), and Red- by Stanford) is to view RDF data as a knowledge base of land [2]. DQL, the query language for DAML+OIL Web triples. Triple [33], the successor of SiLRI (Simple Logic- ontology language [9] (built on top of RDF), is currently based RDF Interpreter) [11], maps RDF metadata to a Horn under development. Logic (replacing Frame Logic) knowledge base. A similar All the proposed approaches disregard the reconstruc- approach is taken in Metalog [26], which matches triples to tion of the output: they leave the output as a “flat” RDF Datalog predicates (a subset of Horn logic). In this way one container of input resources. An RDF algebra needs to take can query RDF descriptions at a high level of abstraction, into account also the construction part, as the resulting RDF i.e. at a logical layer that supports inference [17]. graph can contain new vertices and edges not present in the The second approach proposed by IBM builds upon the original RDF graph. This construction part is not only nec- XML serialization of RDF. In the “RDF for XML” project essary to express RDF queries, but also for their optimiza- (recently removed), an RDF API is proposed on top of the tion. Query optimization can be achieved not only in the IBM AlphaWork’s XML 4 Java parser. In the frame of the extraction part but also in the construction part when the same project a declarative query language for RDF (RDF actual output is going to be produced. Query) [25] was created for which both input and output are resource containers. One of the nice features of this query language is that it proposes operators similar to the 3. Data Model relational algebra, leaving the possibility to reuse some of the 25 years experience with relational databases. Unfortu- An RDF model is similar to a directed labeled graph nately it fails to include the inference rules specific to RDF (DLG). However, it differs from a classical DLG definition Schema, loosing description semantics. since it allows for multiple edges between two nodes. It Stefan Kokkelink goes even further with the second ap- also differs from a multigraph because the different edges proach proposing RDF query and transformation languages between two nodes are not allowed to share the same label. that extend existing XML technologies. With this respect The graph is not necessarily connected and it can contain he defines RDFPath [22], similarly to XPath, for locating cycles. The graph nodes represent resources or literals (strings). unless they represent items in a sequence container. Not Resources can be further classified as URI references or having the burden of preserving element order eases the def- blank nodes. Each blank node (also called an anonymous inition of algebra operators and their laws. resource) is unique in the graph despite the fact that it has Each node has three basic properties as described in Ta-

no label associated to it. Non-blank nodes are labeled with ble 1. The q of a node represents the label associated to resource identifiers or string values. The graph edges repre- it. The nodes from the subset of resources that represent the

sent properties and are labeled by property names. Edges blank nodes do not have an q associated to them. There are

s:V¢:}% {b rs:= ¥~qrn oGN between different pairs of nodes may share the same la- two % s of nodes: and . The

bel and the same property can be applied repetitively on a Z=€G gives the unique internal identifier of each node in q certain resource. This RDF feature enables multiple clas- the graph. Z=€G has the same value as the for the sification of resources, multiple inheritance for classes, and nodes that have a label, but in addition it gives a unique

multiple domains/ranges for properties. Both resources and identifier to the blank nodes. The internal identifier ZG€= properties are first class citizens in the proposed RDF data is not available for external use, i.e. it is not disclosed for

model. querying. £

We identify the following sets: ¢ (set of resources), ¥ (set of URI references), ¤ (set of blank nodes), (set of lit-

erals), and ¦ (set of properties). At RDF level the following Table 1. Basic properties for nodes ¦   !£

holds for the above sets: ¢¨§©£ ¤ , , Basic property Result Result

 %& '¦ £ ¤ ¥

¦#"$¢ , , and , , and are pair-wise for resources for literals

NQPl./}v2b

disjoint. q

% ¢‚:} {g ¥~qrn oƒN % The property  defines the type of a particular

resource instance. At RDF level any resource can be the Z=€G internal ID internal ID 

target of  . RDF supports multiple classification of  resources because  , (as any other property) can be repeated on a particular resource. Each edge has three basic properties as described in Ta- ble 2. Compared with nodes, which have unique identifiers,

Definition 1. An RDF model ( is a finite set of triples

edges have a ZoGz1 (label), which may be not unique. There (statements)

can be several edges sharing the same ZoGz  but connecting ()"*¢,+-£©+!./¢0 1¥32

different pairs of vertices. The ZoGz  of an edge is (lexi-

cally) identified with the q of the resource corresponding Definition 2. The set of properties of an RDF model ( is

to that property. The :e}v„F ={U of an edge gives the resource

?„n G{U

G<7?2@ A(6BC.DE

an RDF model ( is J

§6./K-<7LM

using the following construction mechanism: for each

ZoGz  NQSY.D2b[<7Z>\4 ]K

.;:=

:e}s„n G{U <7 l ¢

§_ NQP`.aZ>[b2c§d: NQP`.aZ>\e2c§X gf0 hL

: ^ ) with , , and add

?„F ={U ƒ

Z>\ NQSi.;gfV2j§k

as a directed edge between Z>[ and with . In

l A¤ NQPl./Z>[b2 NQPl./Z>\g2

case that : and/or then and/or are not 2 defined. The function NQPR.Fm is an injective partial function, while NQSj.nm2 is a (possibly non-injective) total function. Definition 3. Two non-blank nodes are considered to be We use quotes for strings that represent literal equal if they have the same q . Two blank nodes are con- nodes to make a syntactical distinction between them sidered to be equal if they have the same (RDF) properties

and URI nodes. A URI can be expressed by quali- and the corresponding (RDF) property values are equal. ¦poGqrZsqtZvu

fied names (e.g. :V ) or in absolute form (e.g.

w w

xVx?y%oGzlN/Gm {bzAxV:{ z oH|R¦poGqrZsqrZvu QE ). Blank nodes All equal non-blank nodes are internally mapped into do not have a proper identifier which implies that they can one node in the graph. be queried only through a property pointing to them. Com- pared to XML, which defines an order between subele- Definition 4. Two graphs are considered to be equal if they

ments, in RDF the properties of a resource are unordered differ only by re-naming the Z=€G of their blank nodes. Note that two graphs that have all their nodes equal (node We extend the data model with the set C (set of

equality) may be not equal (graph inequality) if some corre- classes). At RDFS level the following holds: ‡Š"‹¢ ,

¦ %  ‡ s:=D‡N/o‰::-

sponding non-blank nodes have different properties and/or rs:=¢:e} {bŒ ‡ , ,

s:= ¥Tqrne oƒN ‡ different property values. ‡ , and . RDF Schema (RDFS) provides a richer modeling lan- guage on top of RDF. RDFS adds new modeling primitives Definition 5. The set of classes of an RDF model M is

by introducing RDF resources with an additional semantics.

%G

els can be viewed as (plain) RDF models. Figure 1 depicts ¢:e} {b The most general types are s:V and graphically the RDF/RDFS primitives.

rs:=¥~qrn oƒN which represent all resources and literals re- spectively. According to the data model these types are dis-

rdfs:Resource rdfs:Literal

¢:} {b s:VD‡N;oƒ::

junctive. Subclasses of rs:= are and s:=D‡N/o‰:: rs:=¦ %  , representing all types (already

stated above) and rs:=¦ %  containing all properties.

rdf:Property rdfs:Class The distinction between properties and resources is not a clear cut one as properties are resources with some addi- tional (edge) semantics associated to them. A property can be used repetitively between nodes (somehow similar to re-

rdf:type rdfs:subClassOf rdfs:domain peating a particular type in the definition of its instances)

which justifies the existence of an y%neZs function (defined rdfs:subPropertyOf rdfs:range later on) for properties (as well as for classes). Moreover

property instances can have the s:VD:e}s„U¦ % ‰ˆp de-

rdfs:subClassOf fined in the same way as one can use the s:=:e}s„b‡N/o‰::ˆp rdf:type for classes.

From all RDFS properties the most important ones are:

s:= :e}s„U¦ % ‰ˆp s:==z oGqtZ rs:=D:e}v„g‡N/o‰::ˆp , , ,

Figure 1. RDF/RDFS primitives

oGZvu‰ r¦ % 

and s:= (all being instances of ). s:=D:g}s„U¦ % Hˆp rs:=D:e}v„g‡N/o‰::ˆp and are used to RDFS supports taxonomies of resources/properties us- define inheritance relationships between classes and prop-

ing the inheritance mechanism at both class level (us- erties respectively. According to the RDF Test Cases [15]

rD:e}s„b‡N/o‰::ˆp rD:e}s„U¦ % ‰ˆp ing the property s:=D:g}s„b‡N;oƒ::ˆp ) and property level and can produce

(using the property s:=:g}s„U¦ % Hˆp ). It also de- cycles, a useful mechanism if we think about class or prop-

fines constraints (e.g. names to be used for properties, do- erty equivalence. A resource of type ¦ %  may

rs:= oGZvuƒ main and range for properties etc.) that need to be ful- define the s:VGz1oGqrZ and the associated to filled by RDF descriptions (later on called instances) in or- that property (i.e. the type of the subject/object nodes of the

der to validate them according to the associated schema. property edge in an RDF instance). Inspired by ontology

Gz o=qrZ s:= oGZvu‰

RDFS provides a type system built on the following languages, like OWL [10], s:= and

¦   s:=‡N;o‰::

primitives: s:=¢‚:} {g , , , can be multiply defined for one particular property and will

¥~qrn oƒN s:VD:e}s„b‡N/o‰::ˆp rs:=D:e}v„b¦  Hˆp

s:= , , , have conjunctive semantics.

s:= =z oGqtZ s:= oGZvu‰  , and The distinction between There is one particular class called s:=¥~qrn oƒN that

and s: namespaces to be used for different resources is represents all strings. Note that the RDF Model Theory more due to historical reasons (RDF was developed before [18] provides a more complex definition of the literal as a

RDFS) than due to semantical ones. triplet (bit, character string, language tag), where the bit is % Every resource that has the r equal to used to flag if the character string represents an XML frag-

s:=‡N/o‰:: , represents a type (or class) in the RDF(S) ment or not. In the data model we simplify the literal def-

type system. Types can be classified as primitive types: inition considering just the character string from the above

¢‚:} {g r ¦ %  s:=‡N;o‰:: s:=¥~qrn oƒN s:= , , , triple. Note that literals are not resources, i.e. one cannot

or user-defined types (resources defined explicitly by a associate properties to them. On the other hand there are re-

 rs:=¥~qrn oƒN

particular RDF model to have the  equal to sources that have type and thus can have prop- rs:=D‡N/o‰::

s:=‡N/o‰:: ). The type of the resource is de- erties attached to them. Nevertheless one cannot say which rs:=D‡N/o‰::

fined reflexively to be rs:=D‡N/o‰:: . contains all literal this resource denotes. RDF defines also container

¤Ro=u ‘YN/ the types, which is not the same thing as saying that it in- classes: rDŽ , , and to model ordered cludes all the values (instances) represented by these types. sequences, sets with duplicates, and value alternatives. The

r ’   “ r  ” ¢* 1¥ y˜nZs properties  , , etc. refer to the to . In this way one can define the of an RDF container members. property even if the property is not explicitly defined in a Each node representing a class has three schema prop- schema. In a schemaless RDF graph all resources are as-

erties as shown in Table 3. Schema properties associated sumed to be of type rs:=¢:e} {b . to nodes are short notations (like a macro) for expressions The RDF Model Theory [18] defines the RDF-closure

doing the same computation based only on basic proper- and RDFS-closure of a certain model ( by adding new

 ‡N/o‰:: ties. The of a class node is . The set of super- triples to the model ( according to some given infer-

classes (classes from which the current class node is inher- ence rules. We call the original model ( the exten- iting properties) is given by :e}v„g‡N/o‰::ˆp . RDFS allows sional data and the newly generated triples we call inten- multiple inheritance for classes because s:VD:e}s„b‡N/o‰::ˆp sional data. There are two inference rules for RDF-closure

(as any other property) can be repeated on a particular class. and nine inference rules for RDFS-closure. The inference

ey%nZs  The of a class node is the set of all instances of this rules for RDF-closure are adding the  (pointing

class. to r¦ %  ) for all properties in the model. Exam-

ples of inference rules for RDFS-closure are: transitivity of s:VD:e}s„U¦ % ‰ˆp

Table 3. Schema properties for class nodes rs:=D:e}v„g‡N/o‰::ˆp , transitivity of ,

% r % Schema property Result r inference based on a edge followed by

an s:=:e}s„b‡N/o‰::ˆp edge etc. One should note that the ‡N/o‰::

 output of these inference rules may trigger other rules. Nev- Ž•<Ž–" ‡

:e}s„b‡N/o‰::ˆp ertheless the rules will terminate for any RDF input model

y%neZs ¢—<7¢—" ¢

( , as there is only a finite number of triples that can be

formed with the vocabulary of ( . Each node representing a property has five schema prop-

Definition 6. An RDF model ( is complete if it contains erties as shown in Table 4. The % of a property node is both its RDF-closure and RDFS-closure.

¦   . The set of superproperties (properties which the

current one is specializing) is given by :e}v„b¦  Hˆp . In the proposed data model we neglect reification and

Note that the domain or range of a superproperty should

s:= qn:e›qrZG¤œ built-in properties like s:=:‘YNt:e , ,

be superclasses for the domain or range, respectively, of s:=N/o‰„UN

rs:={bz1z1Zs , and without loosing generality. oGZvu‰ the current property. The =z oGqtZ and return sets of classes that represent the domain and the range, respec- tively, of the property node. The y˜nZs of a node is the 4. RAL set of resource pairs linked by the current property (which is a subset of the Cartesian product between the associated domain and range extents). The purpose of defining RAL is twofold: to support the formal specification of an RDF query language and to en- able algebraic manipulations for query optimization. RAL Table 4. Schema properties for property is an algebra for RDF defined from a database perspective, nodes some of its operators being inspired by their relational alge-

Schema property Result bra counterparts. We used a similar approach in developing ¦ % 

% XAL [13], an algebra for XML query optimization. Ž•

:e}v„b¦  Hˆp During the presentation of RAL operators we will use A<7™"&‡N/o‰::

Gz1oGqrZ the RDF data from the example in Figure 2 as input. It is ¢`<7¢6"&‡N/o‰::

oGZvuƒ assumed that all operators know about the complete RDF LM<7Lš" =z oGqrZŒ+1 o=Zvu‰ y˜nZs model as it was defined in Section 3. That means that they all have the complete knowledge (extensional and inten- sional data) present in the given model. Variants of the pro-

One should note that we assume in the data model that posed operators can be defined using the suffix “  ” which

there can be several edges having the same ZoGz1 but link- will make the operators neglect the intensional data, i.e. data ing different pairs of resources. All these properties can derived by applying RDF(S) inference rules to the input be seen as “instances” (abusing the “instance” term, previ- model is neglected (similar to the RQL ‘strict interpreta- ously referring to resource instances of a particular class) of tion’). In order to simplify Figure 2 we chose to present

the property node with the q equal to their common name. only the extensional data and just one intensional data ele-

% ž

In the absence of a schema, all RDF properties are of ment given by the inferred edge  between and

¢ ‡‚ oGn type ¦ %  with domain equal to and range equal . Literal Literal the z o operator, to compute the result. In the operator’s name year general form,  is optional. One should note that an n-ary tname exemplified_by created_by cname Literal Technique Artifact Creator Literal exemplifies creates operator can be reduced to a unary one by having as input a sequence type. RAL operators are defined to work on any RDF de-

painted_by scriptions, with or without an explicit schema. Note Painting Painter paints that implicitly there is always a default schema based

image

¢:} {b ¦   on the RDFS primitives rs:= , ,

Image

s:V ¥~qrn oƒN schema rs:=D‡N/o‰:: , and . These RDFS primitives can

instance be used to retrieve a particular schema in case that such in- formation is not known in advance. Once the application tname cname "Chiaroscuro" r1 r4 "Rembrandt" schema is known, one can formulate queries to return in- exemplified_by exemplified_by paints paints stances from the input RDF model.

image image http://example.com/sb.jpg r2 r3 http://example.com/sp.jpg 4.1. Extraction Operators name year name year "Stone Bridge" "1638" "Self Portrait" "1628" The extraction operators retrieve the resources/literals of interest from the input collection of nodes. If the operator is name rdf:Property not defined on nodes that represent literals, these nodes are rdfs:subClassOf simply neglected. rdfs:subPropertyOf In the examples that illustrate the operators we will use rdf:type

inferred rdf:type expressions representing resources from the example RDF { model z of Figure 2. The expression represents the col-

lection (set) of all resources present in model z . Figure 2. Example schema and instance

Projection ¦

Figure 2 is an excerpt from the RDF schema and RDF

 ZoGz e¢r.;œVyV% ::eqZ>2 instance of some Web data describing different paint- ¡

ing techniques. For reasons of simplicity we consider The input of the projection is a collection of nodes (spec-

w

 qo= ?:{U}  only one painting technique ( ŸO‡ ), one painter ified by the expression ) and the projection operator com-

( ŸO¢‚zA„ oGZVn ), and two paintings of the same painter putes the values (objects) of the properties with a name

 ZoGz  ŸOŽNtv¦p  oGqrn

( ŸŽ•nZ¤œ q?u‰ and ). The schema given by the regular expression over strings. The

¦

| ¢:}% {b

does not present the RDFS primitives s:= , string that matches all names is denoted by .

¦ %  rs:=D‡N/o‰:: rs:= ¥~qrn oGN

 , , and from which

v2noGqrZsb¡ :g¢/|œ¢r./ žƒ2 Example 1. ¡9.;¦c8 returns the collection of all the resources and literals are derived.

resources in our RDF graph painted by ž (Rembrandt) (i.e.

We define RDF collections to be sets of nodes (re-

¦ ” ?“ and ).

sources/literals). All RAL operators are closed for collec-

 %e¢r./ žG2

tions, which implies that RAL expressions can be easily Example 2. ¡ returns the resource(s) repre- ¦poGqrZsn composed. The operators come in three flavors: extraction senting the type of ž (i.e. ). operators retrieve the needed resources from the input RDF Note that in these examples we talk about resources, model, loop operators support repetition, and construction where actually the output of the operators are collections operators build the resulting RDF model. of resources. The general form of the operators is

Selection § ‰¡ %¢.ayv£?2 {gZVqrqZv¢r.;œVy=˜ ::eqZ>2

Informally, this means the following. For each bind- ¡ {bZ=qtqZ

ing of y to an element in the input collection, given by The is a Boolean function that uses as

y <7y

¤ ¥ £ , is computed. is a function that allows constants URIs and/or strings. The operators allowed in

to refer to basic/derived properties or to one of the proposed the {gZVqrqZ are RAL operators, the usual comparison §l

operators. Based on the semantics of operator  a partial operators ( ), and logical operators

•./y2 oGZ%<7 <7Z result for the application of  to is computed. The op- ( ). The input of the selection is a collection of erator result is obtained by combining (set union) all partial nodes and the operator selects only the nodes that fulfill the

results. All unary operators use this implicit mechanism, {bZ=qtqZ .

§ ¦

w

¡ ZoGz g¢A§«ŸO‡ qoG ?:{U}  ¢./{g2 Example 3. ¡ is a se- Union

lection operation applied to the collection { of all re- sources from Figure 2. The expression returns the re- .ay VyV˜ ::eqrZ>2s ±./›VyV% ::eqZ>2

source(s) representing the painting technique with the name The union operator combines two input collections of

w

§ ¦

qroG ?:{U}  ƒ’

ŸO‡ (i.e. ). nodes reflecting the set-theoretical union.

¡ r %e¢3§¬‡‚ o=n ¢./{g2

Example 4. The expression ¡

 ‡‚ oGn

returns resources with the value of  being Difference

¦po=qrZsn ¦poGqrZsne

(i.e. ž , a resource of type with being a

.ay VyV˜ ::eqrZ>2²0./›VyV% ::eqZ>2

§ ¦

subclass of ‡‚ o=n ).

¡  g¢s§k‡‚ o=n ¢./{g2 Example 5. The expression V¡ The difference operator returns the nodes present in the (different from the selection in the previous example, as first input collection but not in the second input collection.

it asks for the use of only the extensional data) returns

 ž the empty collection, as the inferred  of (i.e. Intersection

‡‚ oGn ) from Figure 2 will not be available to the oper- ator. .ay VyV˜ ::eqrZ>2s³±./›VyV% ::eqZ>2 The intersection operator returns the nodes present in

Cartesian Product both input collections. .ay-=ey=˜ ::gqZ>2C+!.a†VyV% ::eqZ>2 The Cartesian product takes as input two collections of 4.2. Loop Operators nodes on which it performs the set-theoretical Cartesian product. Each pair of nodes builds an anonymous resource Loop operators are used to control the repetitive applica- that has all the properties of the original resources. Thus, tion of a certain function or operator. They express repeti- such a resource will have all the types of the original two tion at input or function/operator level. resources (RDF multiple classification of resources). The

final output is the collection of all the anonymous resources. Map ¦

Note how for binary operators§ we also use an infix notation.

z1oE¡ %¢./œ=ey=˜ ::gqZ>2

w

§ ¦

¡ %e¢§k­Y{ Zsqr}ve¢

Example 6. The expression ¡

¡ ¡ rg¢E§©¦po=qrZsn ¢r.;{g2

./{g2j+ returns an anonymous The map operator is defined as ž

resource having all the properties of ƒ’ and . As a conse-

w

C.;•./?£g2U

and ¦po=qrZsn .

?£

of applying the function/operator  to each element in the {bZVqrqZv¢.a†VyV˜ ::eqrZ>2 .ay VyV˜ ::eqrZ>2-®;¯°¡ input collection are combined (set union) to obtain the final result. All unary construction operators have an implicit The join expression is defined to be a Cartesian product map operator associated with them.

followed by a selection,§ so equivalent to qrV¢./{g2

Example 8. z oE¡ computes the labels of all the non-

¡ {bZ=qrqrZv¢.ay +†H2 blank nodes in the (input) model, i.e. of all resources having The expression has as input two collections of resources an qr property. that have their elements paired only if they fulfill the

{bZVqrqZ (relating the left and right operands). Anony-

mous resources are built for each such pair. The output is Kleene Star µ

¦ ¦

the collection of all§ the anonymous resources.

¡ %¢./œVyV˜ ::eqrZ>2

w

¦ § ¦

¡ ¡  e¢s§k­Y{ Zsq}ve¢r.;{g2F2®t¯c¡ ¡  Example 7. .ay- -

The Kleene star operator is defined as

y%zlNaqnq „G¢r./y2p§ ¡ oGqtZsF:g¢.aH2r¢.a! ¡ ¡ %e¢~§

¦po=qrZsn ¢r.;{g2F2 returns an anonymous resource having all the

C †•.;2> -m´m9m•.;•.Fm9m´m¶.;•.;•.;2F2F272F2E m´m´m ž

properties of ƒ’ and . In case that there would have

w qo= ?:{U}  been painters who didn’t use the ŸO‡ tech- So, the Kleene star operator expresses repetition at func- nique, these painters would not be paired with this tech- tion/operator level. It repeats the application of the func-

nique in the resulting anonymous resources. tion/operator  on the given input possibly infinite number ¦poGqtZsn 

of times. For each iteration the result is obtained by com- Example 10. Z=ƒ¡ creates a blank node of ¥~qrn oGNF

cation on the input with the input. If after an iteration the a Literal node representing the string ŸO‡oG oG·Go=uƒqr . result is the same as the input, a fixed point is reached and the repetition stops. In order to ensure termination, a variant

of this operator that specifies the number of iterations Z is Create Edge

defined below:

µ

Vuƒƒ¡ ZoGz1G2

¡ 2 The create edge operator adds new edges (properties) to Note that the map operator does not include the input in the graph. The name (label) of the edges, as specified by

the result, while the Kleeneµ star operator does.

¦ ¦ % 

ZoGz1 , is the id of a resource of type (property re- ?„n G{b

source). The :e}v„F ={U and the must have types com-

q?¢. ¡ ¡ rs:=D:e}v„g‡N/o‰::ˆp%¢´¢./¦poGqrZsqtZvuH2F2 Example 9. z oE¡ plying with the domain and the range of the associated (by gives the q of all ancestor classes in the type hierarchy

name) property resource. The :e}v„F ={U is a node (or single- starting with ¦poGqrZsqtZvu . For our example the result will

ton collection) in the graph and the ?„n G{b is a collection of contain three resources representing the types ‘Y qnvoƒ{U ,

nodes. The edges are between the :e}s„n G{U node and each of

s:= ¢:}% {b ¦po=qrZsqrZvu , and . If there were loops made

the nodes in the ?„F ={U collection. The create edge operator by the s:=D:e}v„b‡N;o‰::ˆp property in the input model, the returns the subject node (a collection containing one node). above example would still terminate. The fact that the input

model has a finite number of classes implies that at a certain Z>“

moment a fixed point is reached (we obtain the same output Example 11. If Z’ and are the two nodes constructed Z>“

collection as for the previous iteration) and thus the Kleene in Example 10, Z’ denoting the blank node and denot- ZoGz G“=2

star operator terminates. ing the literal node, ?u‰ƒ¡ creates an edge

Z’ Z>“ labeled ZoGz1 between the nodes and . 4.3. Construction Operators 5. Conclusions Querying an RDF model implies not only extracting in- teresting resources/literals from the input but also building RAL is an RDF algebra defined to support the formal new resources/literals and associating new properties be- specification of an RDF query language. It presents a set of tween resources/properties. operations to be used in both the extraction and construction Before inserting new nodes/edges, the RDF constraints parts of a formally defined RDF query language. It is one are checked. If these constraints are not met, the operation of the first RDF algebras developed from a database per- aborts. Examples of RDF constraints are: the uniqueness

spective. Compared with existing RDF query languages,  of resource identifiers, the value of  cannot be a the construction phase is not neglected and is part of the literal, literals cannot have properties etc. language specification. As future work we will analyze the expressive power of Create Node RAL with respect to existing RDF query languages and its

completeness. Comparing the expressive power of RAL to %G

The type of the node, specified by  , is a resource of guages (implementations) for RDF: RQL is the prime can- q type ‡N/o‰:: . The is a resource identifier if the node rep- didate. resents a resource, or a string if the node represents a literal. We would like also to investigate optimization laws that

As a side effect, an edge representing the % property is will enable algebraic manipulations for query optimization. added between the created resource and its associated type The lack of order (between resources) in RDF models and resource. All resources and literals need to define their type RAL collections, as well as the simplicity and composabil-

(even if it is one of the RDFS primitives s:=¢‚:} {g ity of RAL operators (similar to the relational algebra ones) q or s:=¥~qrn oƒN ). If the operand is empty a blank node seem to foster the definition of RAL optimization laws. A is constructed. For these blank nodes the system assigns translator from a popular RDF query language (e.g. RQL)

unique ZG€G s. The create node operator returns the cre- to RAL and a RAL engine will enable us to experiment with ated node (a collection containing one node). different aspects of RDF query optimization. References [16] R. V. Guha. Rdfdb query language. http://web1. guha.com/rdfdb/query.html. [17] R. V. Guha, O. Lassila, E. Miller, and D. Brickley. Enabling [1] D. Allsopp, P. Beautement, J. Carson, and M. Kirton. inferencing. In The W3C Query Languages Workshop, Towards semantic interoperability in agent-based coalition 1998. http://www.w3.org/TandS/QL/QL98/pp/ command systems. In The Emerging Semantic Web - Se- enabling.. lected Papers From The First Semantic Web Working Sym- [18] P. Hayes. Rdf model theory. W3C Working Draft 29 April posium. IOS Press, 2002. 2002. http://www.w3.org/TR/rdf-mt. http: [2] D. Beckett. Redland rdf application framework. [19] Intellidimension Inc. Rdfql query language refer- //www.redland.opensource.ac.uk . ence. http://www.intellidimension.com/ [3] A. Berglund, S. Boag, D. Chamberlin, M. F. Fernandez, RDFGateway/Docs/querying.asp. M. Kay, J. Robie, and J. Simeon. Xml path language [20] G. Karvounarakis, V. Christophides, D. Plexousakis, and () 2.0. W3C Working Draft 16 August 2002. http: S. Alexaki. Querying rdf descriptions for community web //www.w3.org/TR/xpath20/. portals. In 17iemes Journees Bases de Donnees Avancees, [4] T. Berner-Lee. What the semantic web can represent. pages 133–144, 2001. W3C 1998. http://www.w3.org/DesignIssues/ [21] M. Kay. Xsl transformations () version 2.0. W3C Work- RDFnot.html. ing Draft 16 August 2002. http://www.w3.org/TR/ [5] T. Berners-Lee. Weaving the Web. Harper San Francisco, xslt20/. 1999. [22] S. Kokkelink. Transforming rdf with rdfpath. Work- [6] P. V. Biron and A. Malhotra. Xml schema part 2: Datatypes. ing Draft, 2001. http://zoe.mathematik. W3C Recommendation 02 May 2001. http://www.w3. Uni-Osnabrueck.DE/QAT/Transform/ org/TR/xmlschema-2/. RDFTransform.pdf. [7] S. Boag, D. Chamberlin, M. F. Fernandez, D. Florescu, [23] O. Lassila. Enabling semantic web programming by inte- J. Robie, and J. Simeon. Xquery 1.0: An query lan- grating rdf and common lisp. In The First Semantic Web guage. W3C Working Draft 16 August 2002. http: Working Symposium, pages 403–410. Stanford, 2001. //www.w3.org/TR/xquery/. [24] O. Lassila and R. R. Swick. Resource description frame- [8] D. Brickley and R. V. Guha. Rdf vocabulary description work (rdf) model and syntax specification. W3C Recom- language 1.0: Rdf schema. W3C Working Draft 30 April mendation 22 February 1999. http://www.w3.org/ 2002. http://www.w3.org/TR/rdf-schema/. TR/1999/REC-rdf-syntax-19990222. [9] D. Connolly, F. van Harmelen, J. Horrocks, D. L. McGuin- [25] A. Malhotra and N. Sundaresan. Rdf query speci- ness, P. F. Patel-Schneider, and L. A. Stein. Daml+oil fication. In The W3C Query Languages Workshop, (march 2001) reference description. W3C Note 18 1998. http://www.w3.org/TandS/QL/QL98/pp/ December 2001. http://www.w3.org/TR/daml+ rdfquery.html. oil-reference. [26] M. Marchiori and J. Saarela. Query + metadata + logic [10] M. Dean, D. Connoly, F. van Harmelen, J. Hendler, J. Hor- = metalog, 1998. http://www.w3.org/TandS/QL/ rocks, D. L. McGuinness, P. F. Patel-Schneider, and L. A. QL98/pp/metalog.html. Stein. Owl 1.0 reference. W3C [27] B. McBride. Jena: Implementing the rdf model and syntax Working Draft 29 July 2002. http://www.w3.org/ specification. In Second International Workshop on the Se- TR/2002/WD-owl-ref-20020729/. mantic Web, 2001. http://SunSITE.Informatik. [11] S. Decker, D. Brickley, J. Saarela, and J. Angele. A query RWTH-Aachen.DE/Publications/CEUR-WS/ and inference service for rdf. In The W3C Query Languages Vol-40/mc%bride.pdf. [28] S. Melnik. Algebraic specification for rdf models. Working Workshop, 1998. http://www.w3.org/TandS/QL/ Draft, 1999. http://www-diglib.stanford.edu/ QL98/pp/queryservice.html. diglib/ginf/WD/rdf-alg/rdf-alg.pdf. [12] S. Decker, S. Melnik, F. Van Harmelen, D. Fensel, M. Klein, [29] S. Melnik. Rdf api draft, 2001. http://www-db. J. Broekstra, M. Erdmann, and J. Harrocks. The semantic stanford.edu/˜melnik/rdf/api.html. web: The roles of xml and rdf. IEEE Internet Computing, [30] L. Miller. Inkling: Rdf query using squishql, 2002. http: 4(5):63–74, 2000. //swordfish.rdfweb.org/rdfquery. [13] F. Frasincar, G.-J. Houben, and C. Pau. Xal: an alge- [31] E. Prud’hommeaux. Algae howto. W3C, 2002. bra for xml query optimization. In Database Technologies http://www.w3.org/1999/02/26-modules/ 2002, Thirteenth Australasian Database Conference, vol- User/Algae-HOWTO.html. ume 5 of Conferences in Research and Practice in Informa- [32] A. Seaborne. Rdql - a data oriented query language for tion Technology, pages 49–56. Australian Computer Society rdf models. HP Labs. http://www.hpl.hp.com/ Inc., 2002. semweb/rdql.html. [14] C. R. G. G., B. Douglas, B. Mark, E. Jeff, J. David, R. Craig, [33] M. Sintek and S. Decker. Triple - an rdf query, inference, S. Olaf, S. Torsten, and V. Fernando. The Object Data Stan- and transformation language. In The Semantic Web - ISWC dard: ODMG 3.0. Morgan Kaufmann, 2000. 2002, First International Semantic Web Conference, volume [15] J. Grant and D. Beckett. Rdf test cases. W3C Work- 2342 of Lecture Notes in Computer Science, pages 364–378. ing Draft 29 April 2002. http://www.w3.org/TR/ Springer, 2002. rdf-testcases/.