Annual Meeting 2006 March 21-24, 2006

Extended Abstracts and Overview of Demos

Project Number: 506779
Project Title: Reasoning on the Web with Rules and Semantics
Acronym: REWERSE
Reporting period: March 2005 - February 2006
Project Coordinator: Dr. François Bry, professor
Project Manager: Dr. Uta Schwertel
Organisation: Institute for Informatics, Oettingenstr. 67, D-80538 München
Phone: +49-89-2180-9016
Fax: +49-89-2180-9017
E-mail: [email protected]
Project Web site: http://rewerse.net
Document Date: March 14, 2006

REWERSE

Workpackage I1

Rule Markup Languages

1 Demos

1.1 Strelka - A Visual Rule Modeling Tool

The languages used in the communication between domain analysts and domain experts for analyzing and documenting system requirements should not be 'technical', but should allow visual and/or natural-language-like vocabulary and rule expressions that can be understood by domain experts without extensive technical training.

The UML offers a visual language for specifying vocabularies. Some rule types, for example integrity constraints (invariants) and derivation rules, can be represented in UML models by means of the Object Constraint Language (OCL). The OCL, however, is a formal language that is difficult to understand for people without a technical background.

In order to simplify rule modeling, the REWERSE Working Group I1 has developed

• a UML-based Rule Modeling Language (URML), which extends UML class models by adding rules, and
• Strelka, a tool for creating URML models.

Strelka is implemented as a plug-in for the Fujaba Tool Suite, an open source UML CASE tool.

Detailed demo description available.

Contact: Sergey Lukichev (Cottbus)

2 Extended Abstracts

2.1 R2ML - The REWERSE I1 Rule Markup Language

Gerd Wagner, Adrian Giurca, Sergey Lukichev

Strelka - A Visual Rule Modeling Tool

1 Introduction

The languages used in the communication between domain analysts and domain experts for analyzing and documenting system requirements should not be 'technical', but should allow visual and/or natural-language-like vocabulary and rule expressions that can be understood by domain experts without extensive technical training. The UML offers a visual language for specifying vocabularies. Some rule types, for example integrity constraints (invariants) and derivation rules, can be represented in UML models by means of the Object Constraint Language (OCL). The OCL, however, is a formal language that is difficult to understand for people without a technical background. In order to simplify rule modeling, the REWERSE Working Group I1 has developed

• a UML-based Rule Modeling Language (URML), which extends UML class models by adding rules, and
• Strelka, a tool for creating URML models.

Strelka is implemented as a plug-in for the Fujaba Tool Suite, an open source UML CASE tool1.

2 URML – A UML-Based Metamodel and a Visual Notation for Rule Modeling

URML supports modeling of derivation rules, production rules and reaction rules. A rule is represented graphically as a circle with a rule identifier. Incoming arrows represent rule conditions or triggering events, outgoing arrows represent rule conclusions or produced actions.

Language Elements

Condition arrows refer to a conditioned model element, which is a classifier such as a class or an association. A condition arrow may come with a filter expression selecting instances from the extension of the condition classifier, and with an explicit object variable (or object variable tuple, in the case of an association) ranging over the resulting instance collection. Negated condition arrows are crossed at their origin; they denote a negated condition, which has to be conjoined with one or more positive condition arrows such that its variables are covered by them.

Derivation rules are represented graphically as a circle with an internal label "DR" and a rule identifier attached to it. Incoming arrows represent conditions, outgoing arrows represent conclusions. Conclusion arrows also refer to a classifier model element; their meaning is that the predicate represented by the conclusion classifier applies to any instance that satisfies all rule conditions.

Production rules are represented graphically as a circle with an internal label "PR" and a rule identifier attached to it. Incoming arrows represent conditions, outgoing arrows with a double arrowhead represent actions.
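As a reading aid, the graphical elements just described can be rendered as a small data model. The following Python sketch is purely illustrative (the class and field names are our own, not Strelka's internal API); it encodes the derivation rule of Figure 2 under that assumption:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ConditionArrow:
    classifier: str                    # conditioned model element: a class or an association
    filter_expr: Optional[str] = None  # optional filter over the classifier's instances
    negated: bool = False              # crossed at the origin = negated condition

@dataclass
class ConclusionArrow:
    classifier: str  # predicate applying to any instance satisfying all conditions

@dataclass
class Rule:
    kind: str        # "DR" (derivation), "PR" (production), "RR" (reaction)
    rule_id: str
    conditions: List[ConditionArrow] = field(default_factory=list)
    conclusions: List[ConclusionArrow] = field(default_factory=list)

# The derivation rule of Figure 2: stored at a branch, not assigned to a
# rental, not scheduled for service => available at the branch.
rule = Rule(kind="DR", rule_id="1",
            conditions=[ConditionArrow("isStoredAt"),
                        ConditionArrow("isAssignedTo", negated=True),
                        ConditionArrow("RentalCarScheduledForService", negated=True)],
            conclusions=[ConclusionArrow("isAvailableAt")])
```

Note how the two negated condition arrows are conjoined with the positive isStoredAt arrow, as the covering constraint above requires.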

1Fujaba Tool Suite http://www.fujaba.de

Figure 1: Production rule, create action: If a customer has no items of type CD in her shopping cart, then add a CD page link to the customer page.

Action arrows refer either to a class (in the case of a create, delete, assign or invoke action) or to an activity. Reaction rules are represented graphically as a circle with an internal label "RR" and a rule identifier attached to it. There are two kinds of incoming arrows (condition arrows and event arrows) and two kinds of outgoing arrows (action arrows and postcondition arrows, both having a double arrowhead). Event arrows may refer to a class.

A more detailed description of URML is available on the website of the Working Group I12.

3 Functionality of Strelka

To create a rule, the user first creates a new class diagram. All rule modeling operations are accessible from the pop-up menu. The rule-specific menu items are context-dependent: their availability depends on what is currently selected in the diagram. To create a rule, the user right-clicks and selects "New derivation rule" or "New production rule".

To create a rule condition arrow, the user selects a condition classifier (a class or an association) and a rule circle; the selection is made by holding the Ctrl key while clicking the items with the mouse. Once the selection is complete, the user right-clicks on the classifier and selects "Create new condition". To add a filter to a condition, the user right-clicks on the condition arrow and selects "Edit filter"; a dialog pops up for adding, deleting or modifying the filter expression. Currently, filters are represented as plain strings; syntax validation and consistency checking are not yet supported.

To create a rule conclusion arrow, the user selects a conclusion classifier and a rule, right-clicks on the classifier and selects the appropriate conclusion type (classification, association or attribution). A property value statement is used to specify the value of the derived attribute in an attribution conclusion. To edit a conclusion property value statement, the user right-clicks on the conclusion arrow and selects "Edit property value statement".

To create a production rule action, the user selects a production rule and a classifier for the action (a class or an association), then right-clicks on the classifier and selects the action type in the pop-up menu. Visually, action types are distinguished by a capital letter near the arrowhead: A for the assign action, C for create, D for delete, and I for the invoke action.

2REWERSE Working Group I1 http://www.rewerse.net/I1

Figure 2: If a rental car is stored at a branch, is not assigned to a rental and is not scheduled for service, then the rental car is available at the branch.

The model can be saved using the standard Fujaba saving function available in the File menu. A model can be exported in any of the graphics formats PDF, SVG, JPG, PNG and GIF. Two examples, a production rule and a derivation rule created with Strelka, are depicted in Figures 1 and 2. A user manual for Strelka is available on the website of the Working Group I13.

4 Future Work

The current work of I1 is focused on the serialization of rule models in the R2ML format4. More information about the functionality of the tool, download instructions and sample rule models are available from the Working Group I1 website http://www.rewerse.net/I1 (go to the Projects section).

3REWERSE Working Group I1 http://www.rewerse.net/I1
4G. Wagner, A. Giurca, S. Lukichev (2005). R2ML: A General Approach for Marking up Rules, Dagstuhl Seminar Proceedings 05371, in F. Bry, F. Fages, M. Marchiori, H. Ohlbach (Eds.), Principles and Practices of Semantic Web Reasoning, http://drops.dagstuhl.de/opus/volltexte/2006/479/

3 R2ML - The REWERSE I1 Rule Markup Language∗

Gerd Wagner, Adrian Giurca, Sergey Lukichev
Institute of Informatics, Brandenburg University of Technology at Cottbus
Walther-Pauer-Str. 2, 03046 Cottbus, Germany
[email protected], [email protected], [email protected]

ABSTRACT

This study concerns R2ML (the REWERSE I1 Rule Markup Language), an interchange format for rules that integrates the Rule Markup Language (RuleML) with the Semantic Web Rule Language (SWRL) as well as the Object Constraint Language (OCL). These languages provide a rich syntax for expressing rules. This means that they support conceptual distinctions, such as distinguishing different types of terms and atoms, which are not present in standard predicate logic. The interchange format is usable in the sense that it allows structure-preserving markup of all constructs of these different languages and does not force users to translate their rule expressions into a completely different language paradigm, such as having to transform a function into a functional predicate. R2ML is also the serialization language for the visual tool Strelka, built on top of the Fujaba environment.

Keywords: Rules, rule markup languages, integrity rules, derivation rules, production rules, rule metamodels, OCL, RuleML, SWRL.

In this report we consider three kinds of rules in R2ML: integrity rules, derivation rules and production rules. We define the rule concepts with the help of MOF/UML, a subset of the UML class modeling language proposed by the Object Management Group (OMG).

1. BASIC CONTENT VOCABULARY

The user-defined content vocabulary includes:

• user-defined object names;
• user-defined object function names, comprising role function names and object property names;
• user-defined data function names, comprising attributes and data operations;
• user-defined noun concept names, standing for general noun concepts, among which we distinguish object types ('classes') and 'datatypes';
• user-defined verb concept names (called 'predicate symbols' in traditional logic), standing for general verb concepts, or predicates, among which we distinguish properties and associations; properties are either attributes, if they are data-valued, or reference properties, if they are object-valued.

In Web languages such as RDF and OWL, all these names are globally unique standard identifiers in the form of URI references. One of the goals of R2ML is to comply with important Semantic Web standards like RDF(S) and OWL. In particular, R2ML accommodates the datatype concept of RDF.

User-defined noun concepts comprise classes (or object types). Usually, any object or object variable belongs to a class. A class in R2ML is a URI reference. A class is a type entity for R2ML objects and object variables.

The datatype language consists of a set of predefined datatype names, including the name rdfs:Literal, standing for the generic built-in datatype of all Unicode strings. Each predefined datatype name is associated with

• a set of data literals, which are Unicode strings;
• a set of datatype function names;
• a set of datatype predicate names.

∗This research has been funded by the European Commission and by the Swiss State Secretariat for Education and Research within the 6th Framework Programme project REWERSE no. 506779.

A datatype in R2ML is either rdfs:Literal or a user-defined datatype (a subclass of rdf:TypedLiteral), referenced by a URI reference. A datatype is a type entity for R2ML data values and data variables. User-defined verb concepts comprise properties and associations: properties are either attributes, if they are data-valued, or reference properties, if they are object-valued.

Notice that we use an attribute name both as the name of a function and as the name of the corresponding functional predicate. Likewise, we use a reference property name both as the name of a property predicate and as the name of the corresponding role function. This naming liberty, which is supported by RDF and Common Logic, helps to switch between functional and relational languages.

A reference property in R2ML is a URI reference. It corresponds to the rdf:predicate property from RDF and is used in the R2ML ReferencePropertyAtom. A datatype predicate in R2ML is a URI reference; it accommodates the SWRL concept of a built-in predicate (the predicates from the http://www.w3.org/2003/11/swrlb namespace).

In R2ML, individual terms are either object terms, standing for objects, or data terms, standing for data values. The concrete syntax of first-order non-Boolean OCL expressions can be directly mapped to our abstract concepts of ObjectTerm and DataTerm, which can be viewed as a predicate-logic-based reconstruction of the standard OCL abstract syntax.

3. FUNCTIONAL TERMS

A data term is either a data variable, a data value, or a data function term, which can be of three different types:

1. A datatype function term, formed with the help of a datatype function that comes with the corresponding datatype.
2. An attribute function term, formed with the help of a user-defined attribute.
3. A data operation term, formed with the help of a user-defined data operation.

User-defined data functions are either attributes or data operations; they are used in data terms. R2ML defines AttributeFunctionTerm, DataOperationTerm and DatatypeFunctionTerm accordingly. User-defined object functions are either role functions or object operations; they are used in object terms. An object operation is a special type of user-defined function that corresponds to an object-valued operation in a UML class model. For example, in Figure ??, the operation getLastRental() defines an object operation.
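The term distinctions introduced so far can be summarized as a class hierarchy. The sketch below is a hypothetical Python rendering: the leaf names (ObjectName, DataValue, AttributeFunctionTerm, DataOperationTerm, DatatypeFunctionTerm) are taken from this abstract, while the intermediate grouping classes are our own:

```python
class Term: ...

class ObjectTerm(Term): ...   # stands for objects
class DataTerm(Term): ...     # stands for data values

class ObjectVariable(ObjectTerm): ...
class ObjectName(ObjectTerm): ...     # an individual, optionally typed by a class

class DataVariable(DataTerm): ...
class DataValue(DataTerm): ...        # a lexical value plus a datatype

# The three kinds of data function terms:
class DataFunctionTerm(DataTerm): ...
class DatatypeFunctionTerm(DataFunctionTerm): ...   # built-in datatype function
class AttributeFunctionTerm(DataFunctionTerm): ...  # user-defined attribute
class DataOperationTerm(DataFunctionTerm): ...      # user-defined data operation
```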

As in RuleML [2], R2ML provides the concept of a variable but, as in most programming languages, distinguishes between object variables and data variables, i.e. between references that are instantiated with objects and references that are instantiated with data values.

4. ATOMS

The basic constituent of a rule is the atom. In R2ML we define metamodels for atoms which are compatible with all important concepts of OWL, SWRL and RuleML.

As in RuleML, R2ML defines the concept of an individual (constant), but distinguishes between objects and data values. Following the terminology of UML, we define the concepts of an object name and a data value. An ObjectName contains an optional reference to a class, which is its type; R2ML allows both typed and untyped individuals. As in RDF, a DataValue consists of a lexical value and a type, which is an RDF datatype or a user-defined datatype (a subclass of rdfs:Literal).

4.1 Basic Atoms

An object classification atom refers to a class and consists of an object term. A data classification atom consists of a data term and refers to a datatype. As in SWRL [3], R2ML supports the concepts of equality and inequality atoms.

An equality atom or inequality atom is composed of two or more object terms. In order to support common fact types of natural language directly, it is important to have n-ary predicates (for n > 2).
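A minimal sketch of such atoms in Python (hypothetical; the class names follow the abstract, the field names and the example URI ex:rents are our own inventions for illustration):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class EqualityAtom:
    object_terms: List[str]   # two or more object terms asserted to be equal

@dataclass
class AssociationAtom:
    predicate: str            # n-ary association predicate (a URI reference)
    data_arguments: List[str]
    object_arguments: List[str]

# A ternary fact type such as "a customer rents a car at a branch":
atom = AssociationAtom("ex:rents", data_arguments=[],
                       object_arguments=["customer", "car", "branch"])
eq = EqualityAtom(object_terms=["bill", "william"])
```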

An association predicate in R2ML is a URI reference. An association predicate corresponds to the standard Datalog predicates; it is used in the AssociationPredicateAtom. A built-in data predicate atom refers to a datatype predicate and consists of a number of data terms.

4.2 Relational Atoms

An attribution atom consists of an object term as "subject" and a data term as "value". A reference property atom associates object terms as "subjects" with other object terms as "objects". An association atom is constructed using an n-ary predicate as association predicate, a collection of data terms as "data arguments" and a collection of object terms as "object arguments".

Following RDF and OWL, we adopt the concept of an object description atom. An object description atom refers to a class as a base type and to zero or more classes as categories, and consists of a number of property/term pairs (ReferencePropertyObjectTermPair and/or AttributeDataTermPair). Any instance is referenced by an objectID, if it is not anonymous.

2. FUNCTIONAL CONTENT VOCABULARY

In R2ML a datatype function is a function symbol identified by a URI reference. A data operation is a special type of user-defined function that corresponds to a data-valued operation in a UML class model. An attribute is a special type of user-defined function that corresponds to a data-valued property in a UML class model. In Figure ??, reservationDate is an attribute of the class Rental.

5. FORMULAS

R2ML provides two abstract concepts for formulas: the concept of AndOrNafNegFormula, which corresponds to a general quantifier-free logical formula with weak and strong negations, and the concept of LogicalFormula, which corresponds to a general quantified first-order formula.

6. ACTIONS

The REWERSE Rule Markup Language offers support both for production rules and for reaction rules. In this respect it defines the concept of an action. Following the OMG Production Rule Representation submission, an action is either an InvokeActionExpression, an AssignActionExpression, a CreateActionExpression or a DeleteActionExpression.

An InvokeAction refers to a UML Operation and contains a list of arguments; this action invokes an operation with a list of arguments. An AssignAction refers to a UML Property and contains a DataTerm; this action assigns a value to a property. A DeleteAction refers to a UML Class and contains an ObjectTerm.

7.1 Integrity Rules

An integrity rule consists in a constraint that is a general LogicalFormula.

Example 1. We use examples from the EU-Rent case study1: If a rental is not a one-way rental, then the return branch of the rental must be the same as the pick-up branch of the rental.

1EU-Rent case study at the European Business Rules Conference web site http://www.eurobizrules.org/eurentcs/eurent.htm

7.2 Derivation Rules

Derivation rules have "conditions" and "conclusions". In the R2ML framework the conditions of a derivation rule are AndOrNafNegFormula; conclusions are restricted to AndOrNegFormula, i.e. without NAF.

Example 2. If a rental car has a last maintenance date older than 90 days or a service reading greater than 5000 km, then it is a rental car scheduled for service.

7.3 Production Rules

Production rules have "conditions" and "post-conditions". The conditions and post-conditions of a production rule are LogicalFormula. A production rule may execute an Action.

Example 3. Assign Discount: If the customer order has a value greater than 1000 and the customer is not a "gold" customer, then assign a discount of 10%.

8. COMPARISON WITH SWRL

Tables 1 and 2 compare R2ML and SWRL by the supported terms and atoms.

Table 1: Comparison by supported terms

R2ML                       | SWRL
object variable            | individual variable
object name                | individual (owl:Individual)
role function term         | n/a
data variable              | data variable
data value (rdfs:Literal)  | data literal (owlx:datatype)
data operation term        | n/a
attribute function term    | n/a
datatype function term     | n/a

Table 2: Comparison by supported atoms

R2ML                       | SWRL
ObjectClassificationAtom   | classAtom
AttributionAtom            | datavaluedPropertyAtom
ReferencePropertyAtom      | individualPropertyAtom
DataClassificationAtom     | datarangeAtom
EqualityAtom               | sameIndividualAtom
InequalityAtom             | differentIndividualsAtom
DataPredicateAtom          | builtinAtom
AssociationAtom            | n/a
ObjectDescriptionAtom      | n/a

9. FUTURE WORK

Next steps in the development of R2ML concern:

• Finalization of the schema.
• XSLT and CSS for rule publishing on the web.
• Integrating rules in server-side programming.
• Complete integration with URML and Strelka (as a serialization language).
• Extending R2ML for supporting other rule languages developed in REWERSE.

10. REFERENCES

[1] Wagner, G.: How to Design a General Rule Markup Language. In Proceedings of the Workshop XML Technologies for the Semantic Web (XSW 2002), Lecture Notes in Informatics, Gesellschaft für Informatik.
[2] The Abstract Syntax of RuleML - Towards a General Web Rule Language Framework. Rule Markup Initiative (RuleML), http://www.ruleml.org
[3] Horrocks, I., Patel-Schneider, P. F., Boley, H., Tabet, S., Grosof, B., Dean, M.: SWRL: A Semantic Web Rule Language Combining OWL and RuleML. W3C Member Submission, 21 May 2004, http://www.w3.org/Submission/SWRL/
[4] W3C Workgroup on RIF, Charter, http://www.w3.org/2005/rules/wg/charter
[5] Wagner, G., Giurca, A., Lukichev, S. (2005): R2ML: A General Approach for Marking up Rules. Dagstuhl Seminar Proceedings 05371, in F. Bry, F. Fages, M. Marchiori, H. Ohlbach (Eds.): Principles and Practices of Semantic Web Reasoning, http://drops.dagstuhl.de/opus/volltexte/2006/479/
[6] EU-Rent case study at the European Business Rules Conference web site, http://www.eurobizrules.org/eurentcs/eurent.htm

REWERSE

Workpackage I2

Policy Specification, Composition, and Conformance

1 Demos

1.1 Bidirectional Mapping between ACE and OWL DL

We show how Attempto Controlled English (ACE) can be used as a natural language front-end for producing OWL DL ontologies. We demonstrate the mapping from a subset of ACE to OWL DL and from OWL DL to a subset of ACE. Contact: Kaarel Kaljurand (Zurich)

1.2 AceWiki – Breaching the Semantic Barrier

We show AceWiki, a prototype of a semantic wiki that contains articles written in Attempto Controlled English. AceWiki has been designed with a strong focus on biomedical knowledge. The preloaded ontology is one of protein interactions. Contact: Tobias Kuhn (Zurich) & Loic Royer (Dresden)

1.3 Recent Developments of the Attempto System

We will demonstrate version 4 of Attempto Controlled English (ACE) and Attempto tools like the Attempto Parsing Engine (APE), the Attempto Paraphraser (DRACE) and the Attempto Reasoner (RACE). Contact: Norbert E. Fuchs (Zurich)

2 Extended Abstracts

2.1 From ACE to OWL and from OWL to ACE

Author: Kaarel Kaljurand

2.2 Attempto Controlled English as Ontology Language

Author: Tobias Kuhn

From ACE to OWL and from OWL to ACE

Kaarel Kaljurand

University of Zurich, University of Tartu [email protected]

Abstract. We describe ongoing work on a bidirectional mapping between Attempto Controlled English (ACE) and OWL DL. ACE is a well-studied controlled language, with a parser that converts ACE texts into Discourse Representation Structures (DRS). We show how ACE can be translated into OWL DL (by using the DRS as an interlingua) and how OWL DL can be verbalized in ACE. This mapping renders ACE an interesting companion to existing OWL front-ends.

1 Introduction

Existing OWL tools (Protégé1, SWOOP2, etc.) are user-friendly graphical editors, but for complex class descriptions they require the user to possess considerable knowledge of Description Logics (DL). E.g. [7] lists the problems that users encounter when working with OWL DL and expresses the need for a "pedantic but explicit" paraphrase language.

To answer this need, we envision a text-based system that allows the users to express the ontologies in the most natural way — in natural language. Such a system would provide a natural syntax for logical constructions such as disjointness or transitivity, i.e. it would not use keywords but instead a syntactic structure to represent those complex concepts. It would also hide the sometimes artificial distinction between an ontology language and a rule language. The system would be tightly integrated with an OWL DL reasoner, but the output of the reasoner (if expressed in OWL DL as a modification of the ontology) would again be verbalized in natural language, so that all user interaction takes place in natural language and the central role in the system is carried by plain text.

As the basis of the natural language, we have chosen Attempto Controlled English (ACE), a subset of English that can be converted through its DRS representation into a first-order logic representation and automatically reasoned about (see [1] for more information). The current version of ACE offers language constructs like countable and mass nouns; collective and distributive plurals; generalized quantifiers; indefinite pronouns; negation, conjunction and disjunction of noun phrases, verb phrases and sentences; and anaphoric references to noun phrases through proper names, definite noun phrases, pronouns, and variables.

The intention behind ACE is to minimize the number of syntax and interpretation rules needed to predict the resulting DRS, or, for the end-user, the reasoning results. At the same time, the expressivity of ACE must not suffer. The small number of ACE function words have a clear and predictable meaning, and the remaining content words are classified only as verbs, nouns, adjectives and adverbs. Still, ACE has a relatively complex syntax compared to the OWL representation, e.g. in the OWL Abstract Syntax specification ([6]); but as ACE is based on English, its grammar rules are intuitive (already known to English speakers) and experiments show that ACE can be learned in a few days.

Some existing results show the potential of and the need for a natural language based interface to OWL. [3] paraphrase OWL class hierarchies, but their target language is not a controlled language and cannot be edited and parsed back into a standard OWL representation. [8] propose writing ontologies in a controlled language, but do not provide a natural syntax for writing TBoxes. Our work tries to overcome these shortcomings and addresses the following issues:

1 http://protege.stanford.edu
2 http://www.mindswap.org/2004/SWOOP/

– Show that there is a mapping from a subset of ACE (which we call OWL ACE) into a syntactic subset of OWL DL (i.e. a subset which does not use all the syntactic constructs of OWL DL but is still capable of expressing everything that OWL DL can express).
– Show that the two involved subsets and the mapping from one to the other are easy to explain to the users. This means that the entailment and consistency results given by the OWL DL reasoners "make sense" on the ACE level.
– Show that there is a mapping from the syntactic subset of OWL DL into OWL ACE. This mapping (which can be called a verbalization) must, again, be easily explainable.
– Implement a converter from OWL DL to the chosen syntactic subset of OWL DL. By this, we will be able to handle all OWL DL ontologies on the web.
– If needed, extend ACE to provide a more natural syntax or more syntactic variety for expressing the OWL DL constructs.
– Extend the verbalization process to target a richer syntactic subset of OWL ACE.
– Extend all the aspects of this mapping in order to be compatible with future standards of OWL DL, e.g. OWL 1.1 ([5]), or extensions of it, e.g. SWRL ([4]).

So far, we have focused on the first 3 steps. In the following, we describe a mapping from OWL ACE to OWL DL (in RDF/XML syntax), the problems encountered, the OWL ACE subset and the verbalization of OWL DL.

2 From ACE to OWL

The following figure shows the DRS corresponding to the ACE text “Bill who is a man likes himself. Bill is William. Every businessman who owns at least 3 things is a self-made-man or employs a programmer who knows Bill.” (Note that the example is somewhat artificial to demonstrate concisely the features of OWL DL as expressed in ACE.)

[A, B, C, D, E, F]
  object(A, atomic, named_entity, person, cardinality, count_unit, eq, 1)-1
  named(A, Bill)-1
  object(C, atomic, man, person, cardinality, count_unit, eq, 1)-1
  predicate(E, state, be, A, C)-1
  predicate(B, unspecified, like, A, A)-1
  object(D, atomic, named_entity, person, cardinality, count_unit, eq, 1)-2
  named(D, William)-2
  predicate(F, state, be, A, D)-2

  [G, H, I]
    object(H, atomic, businessman, person, cardinality, count_unit, eq, 1)-3
    object(G, group, thing, object, cardinality, count_unit, geq, 3)-3
    predicate(I, unspecified, own, H, G)-3
  =>
    [J, K]
      object(J, atomic, self_made_man, person, cardinality, count_unit, eq, 1)-3
      predicate(K, state, be, H, J)-3
    v
    [L, M, N]
      object(L, atomic, programmer, person, cardinality, count_unit, eq, 1)-3
      predicate(N, unspecified, employ, H, L)-3
      predicate(M, unspecified, know, L, A)-3

The DRS makes use of a small number of predicates, most importantly object, derived from nouns, and predicate, derived from verbs. The predicates share information by means of discourse referents (denoted by capital letters) and are further grouped by embedded DRS-boxes that represent implication (derived from 'every' or 'if ... then ...'), negation (derived from various forms of English negation), and disjunction (derived from 'or'). Conjunction — derived from relative clauses, explicit 'and', or the sentence end symbol — is represented by co-occurrence in the same DRS-box.

The mapping to OWL DL does not modify the existing DRS construction algorithm but only the interpretation of the DRS. It considers everything in the toplevel DRS to denote individuals (typed to belong to a certain class), or to denote relations between individuals. Individuals are introduced by nouns, so that proper names map to individuals with type owl:Thing and common nouns to an anonymous individual with the type derived from the corresponding noun (e.g. the class Man). Properties are derived from transitive verbs and transitive adjectives ('fond of', 'taller than'). A special meaning is assigned to the copula 'be', which introduces an equality between individuals.

An embedded implication-box introduces a subClassOf relation between classes — the head of the implication maps to the class description, the body to its superclass description. Transitive verbs and adjectives introduce a property restriction with someValuesFrom a class denoted by the object of the verb or adjective, and the copula introduces a class restriction. Co-occurrence of predicates maps to intersectionOf. Negation and disjunction boxes introduce complementOf and unionOf, respectively; any embedding of them is allowed. The plural form of the word 'thing' allows one to define cardinality restrictions.
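The toplevel interpretation just described can be sketched in a few lines of Python. This is not the actual implementation: the DRS conditions are simplified to tuples, and the naming scheme (capitalizing the noun to obtain the class name) is our assumption:

```python
# Simplified toplevel DRS conditions for "Bill who is a man likes himself.":
# object(Ref, Noun), named(Ref, Name), predicate(Verb, Subject, Object)
conditions = [
    ("object", "A", "named_entity"), ("named", "A", "Bill"),
    ("object", "C", "man"), ("predicate", "be", "A", "C"),
    ("predicate", "like", "A", "A"),
]

types, equalities, properties = {}, [], []
for cond in conditions:
    if cond[0] == "object":
        ref, noun = cond[1], cond[2]
        # proper names map to individuals of type owl:Thing, common nouns
        # to an anonymous individual typed by the class derived from the noun
        types[ref] = "owl:Thing" if noun == "named_entity" else noun.capitalize()
    elif cond[0] == "predicate" and cond[1] == "be":
        equalities.append((cond[2], cond[3]))   # copula: equality of individuals
    elif cond[0] == "predicate":
        properties.append(cond[1:])             # transitive verb: property assertion
```

After the loop, types maps the referent C to the class Man, equalities records that Bill equals the anonymous man, and properties holds the likes(bill, bill)-style assertion.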
Thus our DRS has the following meaning (in Description Logics notation):

bill ∈ ⊤    m1 ∈ Man    william ∈ ⊤
bill = m1    bill = william    likes(bill, bill)

Businessman ⊓ (≥ 3 owns) ⊑ SelfMadeMan ⊔ ∃employs.(Programmer ⊓ ∃knows.{bill})

Note that an ACE construct like "A man who owns a dog likes an animal." describes relationships between individuals and not classes, since the corresponding DRS does not have any embedded DRSs. In full English, this sentence is ambiguous, also having a reading which relates classes. In ACE, one would have to use 'every' instead of 'a' to get this reading.

OWL ACE allows properties to have superproperties. A superproperty (e.g. 'likes') for a given property (e.g. 'loves') can be defined as: Everybody who loves somebody likes him/her.

Describing the transitivity of properties and inverse properties is quite "mathematical" in ACE, but there does not seem to be a better way in natural languages, unless one defines a keyword such as 'transitive', which then has to be explained to the average user. Consider e.g.

If a thing A is taller than a thing B and B is taller than a thing C then A is taller than C.

If a thing A is taller than a thing B then B is shorter than A.
If a thing A is shorter than a thing B then B is taller than A.

Note that property definitions make use of indefinite pronouns ('everybody', 'somebody') or the noun 'thing', which all map to owl:Thing. The current mapping does not target all the syntactic variety defined in the OWL DL specification; e.g. elements like disjointWith or equivalentProperty cannot be directly expressed in ACE, but their semantically equivalent constructs can be generated.

3 Problems

Now we look at some of the problems that we have encountered when implementing the mapping from ACE to OWL DL. Complex class descriptions as arguments to someValuesFrom are difficult to map to OWL DL, since the DRS representation resembles a rule language more than a DL-style property restriction. Also, allValuesFrom cannot be created in the most natural way because ACE lacks function words like 'only' or 'nothing but'.

∗Every carnivore eats only meat.
∗Every carnivore eats nothing but meat.

To express this meaning, the user can use double negation or a rule-like construction.

Every carnivore does not eat something that is not a meat.
If a carnivore eats something then it is a meat.
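Both workarounds rest on the standard equivalence between a universal restriction and a negated existential restriction, which is why they capture the intended allValuesFrom semantics:

```latex
\mathit{Carnivore} \sqsubseteq \forall\,\mathit{eats}.\mathit{Meat}
\quad\Longleftrightarrow\quad
\mathit{Carnivore} \sqsubseteq \lnot\exists\,\mathit{eats}.\lnot\mathit{Meat}
```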

The ACE negation does not generate an implication-box, but for class descriptions like "No man is a woman." it would be desirable. Therefore, we first convert the negation-box into an implication-box (containing a negated then-part). The fact that inverseOf is symmetrical is also difficult to implement, because the ACE way of expressing this creates two implication-boxes which have to be handled as one unit in the mapping. Currently, there is no support for enumerations (oneOf). One possibility would be to extend ACE with NP disjunction.

Every student is John or Mary or Bill.
Everybody likes John or Mary or likes John or Bill.
Everybody who is John or Bill is a man and is a student.

Also, at this point, ACE has no support for datatype properties. And finally, metalevel constructions such as URIs, imports, annotation properties, versioning, etc., which essentially make OWL DL a Semantic Web language, cannot be cleanly expressed in ACE.

4 Explaining OWL ACE

Some restrictions of OWL ACE compared to full ACE are easy to explain: there is no support for intransitive and ditransitive verbs, prepositional phrases, adverbs, intransitive adjectives, and most forms of plurals. In addition, there are constraints on the DRS structure which might be difficult to explain to the average user. Currently, an implication-box is only allowed to occur at a toplevel DRS. Disjunction, however, is not allowed to occur at the toplevel. Negation at the toplevel is handled by converting it first into an implication, or alternatively, as a negation of the equivalence of individuals. A further restriction requires the predicates in the implication-box to share one common discourse referent as the subject argument, unless the subject is directly or indirectly an object of a predicate that binds it to the common subject. This allows us to exclude sentences like "If a man sees a dog then a cat sees a mouse." but to include sentences like "If a man sees a dog that sees a cat then the man sees a mouse." The first sentence does not seem to map nicely to OWL DL but rather to a more powerful rule language (such as SWRL). Also, no subject can occur as an object in an embedded if-then box ("Every man hates a dog that bites the man.").

5 From OWL to ACE

The mapping in the opposite direction must handle all OWL DL constructs, some of which the ACE-to-OWL mapping does not produce. Also, it involves parsing RDF, which is the normative syntax for OWL DL. Another issue is raised by the naming conventions used for OWL classes and properties. E.g. OWL ontologies can contain class names like SpicyPizza, MotherWith3Children and property names like accountName, brotherOf, isWrittenBy. OWL ACE, on the other hand, would prefer classes to be named by singular nouns and properties by transitive verbs or adjectives. So far, we have implemented a simple prototype in XSLT, which directly generates ACE. An alternative would target the DRS instead, and use an existing general mapping from the DRS to a canonical ACE form [2]. Currently, the ACE representation ends up being quite repetitive and unordered. For large ontologies this might become a problem and a more complex strategy is needed. Consider e.g. the following sentences.

Every professor who teaches a class is a teacher.
If there is a professor and he teaches a class then he is a teacher.
If there is a professor P and P teaches a class then P is a teacher.

Those sentences are equivalent, as far as ACE is concerned. Still, one could argue that some of those sentences are more readable than others, e.g. the every-construction with a relative clause is more readable than the if-then constructions with full clauses. On the other hand, relative clauses cannot express more complex structures (without causing ambiguity in the output), thus the more general if-then construction must be used. A flexible ACE generation system could use relative clauses in case they allow to correctly express all the references in the DRS, and revert to using if-then sentences in case a more flexible reference system is needed.
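The renaming issue can be illustrated with a small, hypothetical helper (not part of the XSLT prototype) that splits the common camel-case naming convention into word lists, from which singular nouns or verb phrases could then be assembled:

```python
import re

def split_identifier(name):
    """Split camel-case OWL identifiers like 'MotherWith3Children' into
    lower-case words, as a first step towards ACE-friendly names."""
    parts = re.findall(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]

print(split_identifier("SpicyPizza"))           # ['spicy', 'pizza']
print(split_identifier("MotherWith3Children"))  # ['mother', 'with', '3', 'children']
print(split_identifier("isWrittenBy"))          # ['is', 'written', 'by']
```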
Note also that a variety of different verbalizations can be achieved by changing the input ontology with a reasoner which restructures the ontology and/or modifies it by adding/removing certain (possibly redundant) information. I.e., we could provide a relatively direct OWL-to-ACE mapping, but use a reasoner to customize the verbalization procedure for our needs.

6 Future work

The current mapping lacks support for datatype properties and enumerations. Also, allValuesFrom cannot be directly generated, but its semantics can be achieved by using double negation. We will add support for those constructs along with support for proposed extensions to the current version of OWL DL, such as qualified cardinality and local reflexivity restrictions. Some of those changes require modification of the ACE syntax. ACE also needs support for namespaces, at least at the tokenizer level, to be called a Semantic Web language. We will also study whether the OWL ACE subset is easier or harder to teach to users than full ACE.

References

1. Attempto project. Attempto website, 2006. http://www.ifi.unizh.ch/attempto.
2. Norbert E. Fuchs, Kaarel Kaljurand, and Gerold Schneider. Deliverable I2-D5. Verbalising Formal Languages in Attempto Controlled English I. Technical report, REWERSE, 2005. http://rewerse.net/deliverables.html.
3. Daniel Hewlett, Aditya Kalyanpur, Vladimir Kolovski, and Chris Halaschek-Wiener. Effective Natural Language Paraphrasing of Ontologies on the Semantic Web. In End User Semantic Web Interaction Workshop (ISWC 2005), 2005.
4. Ian Horrocks, Peter F. Patel-Schneider, Harold Boley, Said Tabet, Benjamin Grosof, and Mike Dean. SWRL: A Semantic Web Rule Language Combining OWL and RuleML. W3C Member Submission 21 May 2004. Technical report, W3C, 2004. http://www.w3.org/Submission/2004/SUBM-SWRL-20040521/.
5. Peter F. Patel-Schneider. The OWL 1.1 Extension to the W3C OWL Web Ontology Language. Editor's Draft of 19 December 2005. Technical report, 2005. http://www-db.research.bell-labs.com/user/pfps/owl/overview.html.
6. Peter F. Patel-Schneider, Patrick Hayes, and Ian Horrocks. OWL Web Ontology Language Semantics and Abstract Syntax. W3C Recommendation 10 February 2004. Technical report, W3C, 2004. http://www.w3.org/TR/owl-semantics/.
7. Alan L. Rector, Nick Drummond, Matthew Horridge, Jeremy Rogers, Holger Knublauch, Robert Stevens, Hai Wang, and Chris Wroe. OWL Pizzas: Practical Experience of Teaching OWL-DL: Common Errors & Common Patterns. In Enrico Motta, Nigel Shadbolt, Arthur Stutt, and Nicholas Gibbins, editors, Engineering Knowledge in the Age of the Semantic Web, 14th International Conference, EKAW 2004, volume 3257 of Lecture Notes in Computer Science, pages 63–81. Springer, October 5–8 2004.
8. Rolf Schwitter. Controlled Natural Language as Interface Language to the Semantic Web. In 2nd Indian International Conference on Artificial Intelligence (IICAI-05), Pune, India, December 20–22 2005.

Attempto Controlled English as Ontology Language

Tobias Kuhn Department of Informatics University of Zurich Switzerland Email: [email protected]

Abstract— Using the domain of protein interactions we introduce Attempto Controlled English (ACE) as an ontology language. We then use this ontology in two example applications. In the first example, we propose to summarize the results of scientific papers in ACE. These summaries would make scientific results accessible by computers and thus directly usable for any reasoning task. In the second example we present a novel kind of wiki that allows users to extend and to maintain ACE ontologies. This work is a cooperation of Zurich (working group I2) and Dresden (working group A2).

I. INTRODUCTION

A major drawback of most ontology languages is their formal appearance. Domain specialists are often not familiar with formal languages and they do not want to learn a complicated formal knowledge representation language. Attempto Controlled English (ACE) could close this gap. ACE is a formal language that looks like natural English. Every ACE text can be translated unambiguously into first-order logic.

Table I shows how the fact 'everything that is a protein has a terminus' is expressed in different knowledge representation languages: first-order logic (FOL), OWL, UML, and ACE. The OWL representation (using the RDF/XML syntax) is the most verbose and, from the human perspective, the least readable one. The representations in first-order logic and UML are more concise, but they are still not understandable for people who are not familiar with formal notations. The ACE representation, in contrast, should be immediately understandable for any English-speaking person. It looks perfectly like natural English and thus the reader might not even recognize that it is a formal language.

We present now how ACE can be used as an ontology language, and afterwards we will show two possible applications. In both cases we focus on the domain of protein interactions, and all the examples in this text are chosen from this domain.

TABLE I: EXAMPLE IN DIFFERENT KNOWLEDGE REPRESENTATION LANGUAGES

FOL: ∀X(protein(X) → ∃Y(terminus(Y) ∧ has(X,Y)))
OWL: (verbose RDF/XML serialization; not preserved in this extraction)
UML: (class diagram: Protein associated with Terminus, multiplicity 1..*)
ACE: Every protein has a terminus.

II. ACE AS AN ONTOLOGY LANGUAGE

In order to provide basic structures for ontologies in ACE, we adopt the elements from Description Logics: individuals, concepts, and roles. We call them ontology elements, and use them to represent the foundation of ontologies in ACE.

Individuals stand for single objects of the domain. They are represented in ACE as proper names like 'IRAK2' or 'Alzheimer'.

Concepts stand for classes of objects and there are two possibilities for expressing them in ACE. Common nouns are the most straightforward way. The noun 'protein', for example, can stand for the concept of all proteins. As a second possibility we can use the positive form of adjectives. The adjective 'organic', for example, might be used for the concept of all organic substances.

Roles stand for binary relations between objects, and they can be expressed in four different ways. First of all, we can use transitive verbs for expressing roles. For example, we can use 'interacts-with' to express a relationship between proteins. Next, we can combine transitive verbs with adverbs. For example, we can use the adverb 'directly' together with the transitive verb 'interacts-with' to express the role 'directly interacts-with'. As a third possibility we can use of-constructs like 'is a part of'. Due to the syntax of ACE, 'of' is the only allowed preposition for nouns. Finally, we can use constructs with the comparative form of adjectives like 'is larger than'.

The examination of statements about protein interactions showed that normal roles are often not sufficient to express the needed information. We can express simple statements like 'P1 interacts-with P2', but we cannot express statements with contextual information like 'P1 interacts-with P2 in Yeast' or 'P1 interacts-with P2 in Microfilament for Motor-Activity'. In order to be able to express such statements, we want to allow roles to have such additional information. In natural English we usually express such information with prepositional phrases, and this is exactly the way we do it in ACE.

Using these ontology elements, we can express for example the sentence

P1 is a protein and directly interacts-with P2 in Yeast.

where 'P1', 'P2', and 'Yeast' are individuals, 'protein' stands for a concept, and 'directly interacts-with' stands for a role. The phrase 'is a' is used to assign the individual 'P1' to the concept 'protein'. The conjunction 'and' connects the statements flanking left and right, and the preposition 'in', finally, connects to the context 'Yeast'.

In the following we present two example applications of an ACE ontology. First, we show how ACE can be used to express the results of scientific papers in formal abstracts, in order to make text mining more powerful. Second, we present a novel kind of wiki that contains articles written in ACE. Extending and maintaining an ontology becomes (almost) as easy as editing a common wiki.

III. FORMAL SUMMARIES FOR SCIENTIFIC PAPERS

Biomedical scientists are challenged by an ever-increasing amount of scientific papers. The indexing service PubMed (http://www.pubmed.gov) shows the huge quantity of literature that the scientists have to face. It contains at the moment 16 million articles and grows every year by over 600'000 articles. All these biomedical articles are written in natural language. That means that we cannot easily process them with computers. But, facing the quantity of literature, it is clear that we need computational support in order to manage the contained knowledge.

Instead of using natural language processing, we suggest an alternative approach: the authors of scientific articles formally summarize their own results in ACE. Such formal summaries are added to the articles, which makes them processable by computers. ACE is on the one hand easy to learn and understand, and on the other hand it is expressive enough to represent even complicated scientific results.

It is clear that this approach is not applicable for papers that have been written without the formal summaries, and that means that we still need natural language processing or manual extraction for such papers. Thus it is rather a concept for the future than a solution for today's problems.

Figure 1 shows how the frontpage of an article with an ACE summary could look like. In order to allow the researchers to write such formal summaries, we support them with an authoring tool that hides the technical details. It helps the researchers to choose the right words from the lexicon, to use these words as intended by the ontology, and to write texts that are compliant with the ACE syntax and with the ontology.

IV. ACE FOR SEMANTIC WIKI

If the extension and maintenance of an ontology is as easy as editing a wiki, then it would be much easier for a scientific community to provide a consistent and complete knowledge base. Each member would be able to contribute, without learning a complicated language. We built the prototype ACEWIKI that shows how a semantic wiki using ACE could look like. Figure 2 shows a screenshot.

Like common wikis, ACEWIKI consists of articles. Every article is attached to exactly one ontology element. By clicking the add-button, the user can create ACE sentences, and the words of the sentence are automatically linked to the existing articles.

For the creation of ACE sentences, the user is supported by a special editor that shows step by step the possible continuations of the sentence. Thus the user does not need to know the details of the ACE syntax. The words have to be chosen by navigating through the concept- and role-hierarchies. If the user wants to use a word that does not yet exist, he can create it on-the-fly. The editor is aware of the underlying ontology and thus it allows only to create sentences that comply with that ontology. If the role 'localizes-to', for example, has the range 'cellular-component', then we would not be able to create a sentence like 'Act1 localizes-to a protein', since this would violate the range-condition.

There exist many other attempts for semantic wikis (e.g. IkeWiki, http://ikewiki.salzburgresearch.at). ACEWIKI is different in that the formal semantics are not just attached to the articles, but the contents of the articles themselves are completely formal. Nevertheless ACEWIKI looks almost as natural as a normal wiki.

V. CONCLUSIONS

Formal summaries for biomedical papers could be a first step towards better communication and persistence of biomedical knowledge. In the case of wikis, ACE could move them to a higher level, providing a logical representation of the contained knowledge.

Altogether, we believe that controlled natural languages like ACE can combine the power of ontologies with the convenience of natural language. ACE can make scientific results readable by humans (since it looks like natural English) and processable by machines (since it is automatically translatable into first-order logic). Humans and machines could work hand in hand.

REFERENCES

[1] Corinne Cayrol, Céline Cougoule, Michel Wright. The β-adaptin clathrin adaptor interacts with the mitotic checkpoint kinase BubR1. In "Biochemical and Biophysical Research Communications", Volume 298, Issue 5, Pages 720–730, 2002.

The β-adaptin clathrin adaptor interacts with the mitotic checkpoint kinase BubR1

Corinne Cayrol, Céline Cougoule, Michel Wright

Abstract

The adaptor AP2 is a heterotetrameric complex that associates with clathrin and regulatory proteins to mediate rapid endocytosis from the plasma membrane. Here, we report the identification of ...

Keywords: Protein interactions; Two-hybrid; Vesicular traffic; Adaptor protein; Protein kinase; Mitotic checkpoint.

ACE Summary: Beta2-Adaptin binds BubR1 in Yeast-Two-Hybrid. A trunk-domain of Beta2-Adaptin interacts-with BubR1. Bub1 interacts-with the trunk-domain of Beta2-Adaptin. Bub1 interacts-with every beta-sheet of AP and BubR1 interacts-with every beta-sheet of AP.

Fig. 1. Article with ACE summary (see [1])

Fig. 2. ACEWIKI

REWERSE

Workpackage I3

Composition and Typing 1 Demos

1.1 A prototype implementation of a descriptive type system for Xcerpt

A descriptive type system for Xcerpt has been proposed in deliverable I3-D4 and other papers. The intended application is (statically) finding (a certain kind of) errors in programs. Here we present a prototype implementation of the type system. The implementation is able to (statically) check correctness of an Xcerpt program with respect to a type specification. A type specification describes a set of possible databases to which the program is to be applied and an expected set of results. Program correctness means that all its results are in the expected set. Failure of the correctness check suggests an error in the program. Under certain conditions such a failure indeed means that the program is incorrect. The current implementation works for a restricted, but interesting subset of Xcerpt. In particular, only programs consisting of one rule are dealt with. Additionally, the system provides (approximations of) the set of program results and the sets of values of program variables; this information is useful for programmers (and is produced even when the specification of the expected set of results is not given). The prototype is available at http://www.ida.liu.se/~artwi/XcerptT.

Contact: Artur Wilk (Linköping)
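The correctness criterion itself reduces to set inclusion. The following toy sketch illustrates it with invented example "types" as finite Python sets, unlike the system's finite type specifications that describe infinite sets of terms:

```python
# Toy illustration of the correctness criterion: a program is correct with
# respect to a specification if every computed result lies in the expected
# result set; any result outside it suggests a program error.

def check_correctness(computed_results, expected_results):
    """Return the offending results; an empty set suggests correctness."""
    return set(computed_results) - set(expected_results)

expected = {"book", "article", "inproceedings"}
errors = check_correctness({"book", "articel"}, expected)
print(errors)  # {'articel'} -- the typo is flagged as a potential error
```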

1.2 Applying TOM to implement a prescriptive type system

The prescriptive typing approach has been adopted by Nancy to develop a set of typing rules that capture the types of forward path expressions in XQuery. The algorithm builds an environment with type information provided by the schema definition of the source data being queried, and relies on the static semantics of XQuery (http://www.w3.org/TR/xquery-semantics/) for the other constructs of the language. The typing rules were implemented using TOM (http://tom.loria.fr), which demonstrates its suitability for encoding formal systems (besides other papers, see "documentation" on its web site). The prototype is able to indicate the resulting type for a considerable set of path expressions. Contact: Jakob Henriksson (Dresden)
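The environment-based typing of forward path expressions can be pictured with a toy Python model. The mini-schema and function are invented for this sketch; the actual rules are TOM rewrite rules following the XQuery static semantics:

```python
# Toy sketch: each element name maps to the set of element types its content
# may contain; typing a child-path walks this environment and an empty result
# signals a statically detectable error (the path can never match).

env = {"bib": {"book"}, "book": {"title", "author"}, "author": {"name"}}

def type_of_path(start, steps, env):
    """Compute the set of element types a path like bib/book/author may yield."""
    current = {start}
    for step in steps:
        current = {t for ctx in current for t in env.get(ctx, set()) if t == step}
        if not current:
            return set()  # type error: no document can satisfy this path
    return current

print(type_of_path("bib", ["book", "author"], env))  # {'author'}
print(type_of_path("bib", ["title"], env))           # set() -> type error
```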

1.3 Prototypical composition system

We aim at creating a composition system according to the ideas of invasive software composition. A vital part of any composition system is the component model. A component model describes what components look like and how they are interfaced to create more complex software entities. Deliverable I3-D1 described a method of how to automatically generate a component model from a core-language description in the form of a meta-model. We extend this meta-model with hook constructs, each corresponding to an existing construct in the core language. This extended meta-model, the component model, defines a reuse-language in which reuse-aware components can be written. The reuse-language thus contains constructs (hooks) for explicitly defining the interfaces to other components. We present two systems which follow two closely related tracks.

One system uses the Eclipse Modeling Framework (EMF) as a backbone to generate tool support for composition, e.g. parsing and abstract syntax tree tooling. Ecore (the modeling language of EMF) is used to describe the core languages and the generated reuse languages. The tool integrates into the Eclipse Platform. The system processes the components written in a reuse-language by merging them into a complete program of the core language. The resulting code can thus be processed by already existing tools developed for the core language, e.g. interpreters or compilers. The system demonstrates composition using two different core languages: a simple imperative programming language called Graal and the XML query language Xcerpt developed by WG I4. In the other track we work with a system targeted towards OWL as meta-modeling language. Contact: Jakob Henriksson (Dresden)
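The merging step can be pictured with a deliberately simplified sketch. It works at the string level with an invented `<<hook>>` syntax; the systems described above operate on the meta-model/AST level instead:

```python
import re

# Toy sketch of invasive composition: a reuse-language component contains
# explicit hooks; composition fills the hooks with core-language fragments,
# yielding a plain core-language program for existing tools to process.

def compose(component, bindings):
    """Replace every hook <<name>> with the bound core-language fragment."""
    def fill(match):
        name = match.group(1)
        if name not in bindings:
            raise KeyError(f"unbound hook: {name}")
        return bindings[name]
    return re.sub(r"<<(\w+)>>", fill, component)

component = "while (<<condition>>) { <<body>> }"
program = compose(component, {"condition": "i < 10", "body": "i = i + 1;"})
print(program)  # while (i < 10) { i = i + 1; }
```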

1.4 Personalised Service Discovery and Composition

PreDiCtS is a framework for personalised service discovery and composition. The underlying idea behind PreDiCtS is that similar service composition problems can be tackled in a similar manner by reusing past composition experience. The stored experience has to be useful and at the same time flexible enough to allow for adaptations to new problems. For this reason we have opted to use template-based composition information. PreDiCtS's retrieval and refinement technique is based on conversational case-based reasoning (CCBR) [1] and makes use of a core OWL ontology called CCBROnto [2] for case representations. The PreDiCtS prototype allows for the creation and retrieval (and soon also adaptation) of cases. Each case includes knowledge about a particular problem, in the form of a set of question-answer pairs, and its relevant solution, which is a placeholder for a composition template. Case retrieval is based on an adaptation of the taxonomic CCBR algorithm [3]. At present these composition templates are defined in OWL-S, but we are experimenting with the possibility of using the component composition technology coming out of I3.

[1] Aha, D.W., Breslow, L.A., & Muñoz-Avila, H. Conversational case-based reasoning. 2000.
[2] http://www.semantech.org/ontologies/CCBROnto.owl
[3] Gupta, K.M. Taxonomic Conversational Case-Based Reasoning. In Case-Based Reasoning Research and Development (Eds. D.W. Aha & I. Watson), Springer, Berlin, Germany, 2001.

Contact: Charlie Abela (Malta)
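The retrieval idea can be sketched as follows. This is a naive similarity over question-answer pairs with invented case data; the actual PreDiCtS algorithm is an adaptation of taxonomic CCBR [3] over CCBROnto case representations:

```python
# Toy sketch of conversational case retrieval: cases carry question-answer
# pairs; the case whose pairs best overlap with the answers gathered so far
# is suggested, and its solution points to a composition template.

def retrieve(cases, answered):
    """Rank stored cases by overlap with the question-answer pairs so far."""
    def score(case):
        qa = case["qa"]
        shared = sum(1 for q, a in answered.items() if qa.get(q) == a)
        return shared / len(qa)
    return max(cases, key=score)

cases = [
    {"solution": "flight+hotel template", "qa": {"travel?": "yes", "stay?": "yes"}},
    {"solution": "flight-only template",  "qa": {"travel?": "yes", "stay?": "no"}},
]
best = retrieve(cases, {"travel?": "yes", "stay?": "yes"})
print(best["solution"])  # flight+hotel template
```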

2 Extended Abstracts

REWERSE

Workpackage I4

Reasoning Aware Querying 1 Demos

1.1 Effective and Efficient Data Access in the Versatile Web Query Lan- guage Xcerpt

Access to Web data has become an integral part of many applications and services. In the past, such data has usually been accessed through human-tailored HTML interfaces. Nowadays, rich client interfaces in desktop applications or, increasingly, in browser-based clients ease data access and allow more complex client processing based on XML or RDF data retrieved through Web service interfaces. Convenient specifications of the data processing on the client and flexible, expressive service interfaces for data access become essential in this context. Web query languages such as XQuery, XSLT, SPARQL, or Xcerpt have been tailored specifically for such a setting: declarative and efficient access and processing of Web data. Xcerpt stands apart among these languages by its versatility, i.e., its ability to access not just one Web format but many. In this demonstration, two aspects of Xcerpt are illustrated in detail: The first part of the demonstration focuses on Xcerpt's pattern matching constructs and rules to enable effective and versatile data access. It uses a concrete practical use case from bibliography management to illustrate these language features. Xcerpt's visual companion language visXcerpt is used to provide an intuitive interface to both data and queries. The second part of the demonstration shows recent advancements in Xcerpt's implementation focusing on experimental evaluation of recent complexity results and optimization techniques, as well as scalability over a number of usage scenarios and input sizes.

Detailed demo description: [pdf]

Contact: Sacha Berger, Francois Bry, Tim Furche, Benedikt Linse, Andreas Schroeder (Munich)

2 Extended Abstracts

2.1 Efficient Querying with Xcerpt: Theory, Complexity, and Algorithms

Authors: Andreas Schroeder, Tim Furche, François Bry

2.2 dlvhex: Current Status and Future Issues

Authors: Thomas Eiter, Giovambattista Ianni, Roman Schindlauer, and Hans Tompits

BERGER et al.: EFFECTIVE AND EFFICIENT DATA ACCESS IN XCERPT. REWERSE annual meeting, 2006 1

Effective and Efficient Data Access in the Versatile Web Query Language Xcerpt

Sacha Berger, François Bry, Tim Furche, Benedikt Linse, Andreas Schroeder

Abstract— Access to Web data has become an integral part of many applications and services. In the past, such data has usually been accessed through human-tailored HTML interfaces. Nowadays, rich client interfaces in desktop applications or, increasingly, in browser-based clients ease data access and allow more complex client processing based on XML or RDF data retrieved through Web service interfaces. Convenient specifications of the data processing on the client and flexible, expressive service interfaces for data access become essential in this context. Web query languages such as XQuery, XSLT, SPARQL, or Xcerpt have been tailored specifically for such a setting: declarative and efficient access and processing of Web data. Xcerpt stands apart among these languages by its versatility, i.e., its ability to access not just one Web format but many. In this demonstration, two aspects of Xcerpt are illustrated in detail: The first part of the demonstration focuses on Xcerpt's pattern matching constructs and rules to enable effective and versatile data access. It uses a concrete practical use case from bibliography management to illustrate these language features. Xcerpt's visual companion language visXcerpt is used to provide an intuitive interface to both data and queries. The second part of the demonstration shows recent advancements in Xcerpt's implementation focusing on experimental evaluation of recent complexity results and optimization techniques, as well as scalability over a number of usage scenarios and input sizes.

Index Terms— Xcerpt, evaluation, versatility, visXcerpt, rules, complexity, XML, RDF

I. Introduction

WEB QUERYING has received considerable attention from academia and industry, culminating in the recent development of the W3C Web query languages XQuery and SPARQL. These mainstream languages, however, focus only on one of the different data formats available on the Web. Integration of data from different sources and in different formats becomes a daunting task that requires knowledge of several query languages and to overcome the impedance mismatch between the query paradigms of the different languages. Xcerpt [8], [9] addresses this issue by gearing the entire language towards versatility in format, representation, and schema of the data, cf. [5]. It is a semi-structured query language, but very much unique among such languages (for an overview see [1]):

(1) In its use of a graph data model, it stands more closely to semi-structured query languages like Lorel than to recent mainstream XML query languages.
(2) In its aim to address all specificities of XML, it resembles more mainstream XML query languages such as XSLT or XQuery.
(3) In using (slightly enriched) patterns (or templates or examples) of the sought-for data for querying, it resembles more the "query-by-example" paradigm [10] than mainstream XML query languages using navigational access.
(4) In offering a consistent extension of XML, it is able to incorporate access to data represented in richer data representation formats. Instances of such features are element content where the order is irrelevant, and non-hierarchical relations.
(5) In providing (syntactical) extensions for querying, among others, RDF, Xcerpt becomes a versatile query language, cf. [5].
(6) In its strict separation of querying and construction in rules, it makes programs more readable and optimization over intermediary results feasible.

visXcerpt [2] is Xcerpt's visual companion language, related to it in an unusual way: visXcerpt is a visual query language obtained by mere rendering of Xcerpt without changing the language constructs or the runtime system for query evaluation. This rendering is mainly achieved via CSS styling of Xcerpt's constructs. The authors believe that this approach is promising, as it makes those languages easy to learn and easy to develop.

This demonstration is split in two parts: first, the novel language constructs for versatile pattern matching and rule-based data integration are illustrated along a practical demonstrator application using visXcerpt. Xcerpt's core features, especially the pattern-oriented queries and answer-constructors, its rules or views, and its specific language constructs for incomplete specifications are emphasized in this application. It is demonstrated (a) how incomplete specifications are essential for retrieving semi-structured data, (b) how access to both Web and Semantic Web data in the same query program is achieved, and (c) how visXcerpt complements and integrates with Xcerpt. Special emphasis is placed on recent advancements in language constructs and concepts.

The second part of this demonstration focuses on the evaluation and optimization of Xcerpt queries. In particular, it shows experimental confirmation of recent complexity results for various Xcerpt subsets. Furthermore, an impression of the effects of recent optimizations of complex queries involving negated or optional subterms is given.

This research has been funded by the European Commission and by the Swiss Federal Office for Education and Science within the 6th Framework Programme project REWERSE number 506779 (cf. http://www.rewerse.net/).

II. Part 1: Language Features and visXcerpt

Setting of the Demonstrator

Excerpts from DBLP (http://www.informatik.uni-trier.de/~ley/db/) and from a computer science taxonomy form the base for the scenario considered in the application. DBLP is a collection of bibliographic entries for articles, books, etc. in the field of Computer Science. DBLP data is representative of standard Web data using a mixture of rather regular XML content combined with free-form, HTML-like information. A small Computer Science taxonomy has been built for the purpose of this demonstration. Very much in the spirit of SKOS, this is a lightweight ontology based on RDF and RDFS. Combining such an ontology as metadata with the XML data of DBLP is a foundation for applications such as community-based classification and analysis of bibliographic information using interrelations between researchers and research fields. Realizing such applications is eased by using the integrated Web and Semantic Web query language (vis)Xcerpt that also allows reasoning using rules.

Realizing Versatility

Query and construction patterns in (vis)Xcerpt are used both for binding variables in query terms and for reassembling the variables in so-called construct terms. The variable binding paradigm is that of Datalog: the programmer specifies patterns including variables.

Fig. 1. Exemplary visXcerpt query patterns, with a breakdown of the language constructs used:

• Terms as formulas: terms may contain boolean connectives, variables, negation, etc.
• Subterm negation: some subterms may be required not to occur in matching data.
• Optional subterms: a local form of disjunction, essential for variable-schema data.
• Value joins: expressed through multiple occurrences of the same variable.
• Optional construction: a limited form of conditional construction based on variable bindings.
• Accessing Web resources: arbitrary XML documents can be accessed using their URL.
• Incomplete patterns in depth: descendant allows additional intermediary elements.
• Incomplete patterns in breadth: partial patterns allow additional child elements.
• Grouping: collects alternative bindings for variables; essential for structural assembly.
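The two kinds of incompleteness listed for Figure 1, in breadth (partial patterns tolerate additional child elements) and in depth (descendant tolerates additional intermediary elements), can be sketched as follows (hypothetical Python helpers over (label, [children]) tuples; not Xcerpt syntax):

```python
# Sketch of incomplete query patterns (hypothetical helpers, not real
# Xcerpt syntax): partial patterns may skip sibling elements
# (incompleteness in breadth), and `desc` finds matching subterms at
# any depth (incompleteness in depth).

def matches_partial(labels_wanted, children):
    """Breadth-incomplete matching: every wanted label occurs among the
    children; additional children are allowed."""
    present = {label for (label, _) in children}
    return all(w in present for w in labels_wanted)

def desc(term, label):
    """Depth-incomplete matching: yield subterms with the given label,
    however deeply nested."""
    t_label, children = term
    if t_label == label:
        yield term
    for child in children:
        if isinstance(child, tuple):
            yield from desc(child, label)

doc = ("bib", [("article", [("title", ["A"]), ("year", ["2004"]),
                            ("note", ["demo"])])])
article = next(desc(doc, "article"))
# The partial pattern {title, year} matches although a `note` child
# is also present:
ok = matches_partial(["title", "year"], article[1])
```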

Interactive behavior of variables in visXcerpt highlights the relation between variables in query and construct terms. Arguably, pattern-based querying and constructing, together with the variable binding paradigm, make complex queries easier to specify and read.

To cope with the semistructured nature of Web data, (vis)Xcerpt query patterns use a notion of incomplete term specifications with optional or unordered content specification. This feature distinguishes (vis)Xcerpt from query languages like Datalog and query interfaces like QBE [10]. Simple, yet powerful textual and visual constructs of incompleteness are presented in the demonstrator application, cf. Figure 1, which shows two exemplary visual query patterns and a breakdown of the language constructs used.

An important characteristic of (vis)Xcerpt is its rule-based nature: (vis)Xcerpt provides rules very similar to SQL views. Arguably, rules or views are convenient for logically structuring complex queries. Thus, in specifying a complex query, it eases the programming and improves the program readability to specify (abstract) rules as intermediate steps—very much like procedures in conventional programming. Another aspect of rules is the ability to solve simple reasoning tasks.

Referential transparency and answer-closedness are essential properties of Xcerpt and visXcerpt, surfacing in various parts of the demonstration. They are two precisely defined traits of the rather vague notion of "declarativity". Referential transparency means that within a definition scope all occurrences of an expression have the same value, i.e., denote the same data. Answer-closedness means that replacing a sub-query in a compound query by a possible single answer always yields a syntactically valid query. Referentially transparent and answer-closed programs are easy to understand (and therefore easy to develop and to maintain), as the unavoidable shift in syntax from the data sought for to the query specifying this data is minimized.

III. Part 2: Effectiveness and Efficiency

Currently, two main threads are pursued in the Xcerpt project: (1) A careful review of language constructs is underway that aims at improved effectiveness for query authoring, cf. [6]. Related is better support for RDF, including proper handling of b-nodes in results and incomplete data specifications. Furthermore, a type system [3] for Xcerpt is under development that eases error detection and recovery. (2) Novel evaluation methods for Xcerpt, enabled by high-level query constructs, are being investigated. Xcerpt's pattern matching is based on simulation unification. An efficient algorithm for simulation unification that is competitive with current mainstream Web query languages in both worst-case complexity and practical performance is described in [4]. The demonstration shows that the employed evaluation algorithm, called the "matrix method", scales over a large set of query scenarios, empirically confirming the theoretical complexity derived in [4]. Furthermore, the scalability of basic pattern queries over a broad range of data sizes is illustrated. Finally, the effect of several advanced query constructs is investigated: it is shown that constructs such as optional or qualified descendant not only make queries easier to express and understand, but in many practical cases also more efficient to evaluate. Effects of optionality, injectivity, order, totality, and subterm negation are shown in detailed evaluations.

In further work, optimizations of the rule chaining algorithm are investigated, partially based on dependency analysis provided by the above-mentioned type system. Furthermore, rule unfolding and algebraic optimization beyond intermediary construction, similar to the optimization of nested construction in languages such as XQuery, are investigated; cf. [7] for details on the relation of the two.

References
[1] J. Bailey, F. Bry, T. Furche, and S. Schaffert, "Web and Semantic Web Query Languages: A Survey," in Reasoning Web Summer School 2005, J. Maluszynski and N. Eisinger, Eds. Springer-Verlag, 2005.
[2] S. Berger, F. Bry, and S. Schaffert, "A Visual Language for Web Querying and Reasoning," in Proc. Workshop on Principles and Practice of Semantic Web Reasoning, ser. LNCS, vol. 2901. Springer-Verlag, 2003.
[3] S. Berger, E. Coquery, W. Drabent, and A. Wilk, "Descriptive Typing Rules for Xcerpt," in Proc. Workshop on Principles and Practice of Semantic Web Reasoning. REWERSE, 2005.
[4] F. Bry, A. Schroeder, T. Furche, and B. Linse, "Efficient Evaluation of n-ary Queries over Trees and Graphs," Submitted for publication, 2006.
[5] F. Bry, T. Furche, L. Badea, C. Koch, S. Schaffert, and S. Berger, "Querying the Web Reconsidered: Design Principles for Versatile Web Query Languages," Journal of Semantic Web and Information Systems, vol. 1, no. 2, 2005.
[6] T. Furche, F. Bry, and S. Schaffert, "Initial Draft of a Language Syntax," REWERSE, Deliverable I4-D6, 2006. [Online]. Available: http://rewerse.net/deliverables/m18/i4-d6.pdf
[7] B. Linse, "Automatic Translation between XQuery and Xcerpt," Diplomarbeit/Master thesis, Institute for Informatics, University of Munich, 2006. [Online]. Available: http://www.pms.ifi.lmu.de/publikationen#DA_Benedikt.Linse
[8] S. Schaffert, "Xcerpt: A Rule-Based Query and Transformation Language for the Web," Dissertation/Ph.D. thesis, University of Munich, 2004. [Online]. Available: http://www.pms.ifi.lmu.de/publikationen/
[9] S. Schaffert and F. Bry, "Querying the Web Reconsidered: A Practical Introduction to Xcerpt," in Proc. Extreme Markup Languages, 2004.
[10] M. M. Zloof, "Query By Example: A Data Base Language," IBM Systems Journal, vol. 16, no. 4, pp. 324–343, 1977.

SCHROEDER et al.: EFFICIENT QUERYING WITH XCERPT. REWERSE annual meeting, 2006

Efficient Querying with Xcerpt: Theory, Complexity, and Algorithms

Andreas Schroeder, Tim Furche, François Bry

Abstract—Applications and services that access Web data are becoming increasingly useful and wide-spread. Web query languages provide efficient and effective means to access and process data published on the Web. Xcerpt is a particular breed of Web query language tailored to versatile data access for RDF, XML, and other Web representation formats. Its rich constructs and reasoning capabilities make it a convenient language for a broad range of use cases, but also make efficient and scalable evaluation difficult to achieve. This talk presents both theoretical complexity results over different settings and first results on improved practical performance of Xcerpt.

Index Terms—Xcerpt, complexity, algebra, evaluation, efficiency, XML

I. Introduction

Xcerpt [10], [11] is a versatile query language for reasoning-aware querying on the Web, currently developed after principles described in [2] as part of the REWERSE work package I4. Xcerpt uses a tailor-made unification, called "simulation unification" [3]. Where canonical unification can be seen as a surjective and monotonic mapping between terms, simulation unification introduces several aspects to unification, such as injective, non-surjective, and non-monotonic mappings, subterm optionality, and subterm negation. These new aspects of unification are essential for querying semi-structured data, but pose significant challenges for efficient evaluation. In this talk, the theoretical complexity properties as well as first optimization approaches for simulation unification are investigated.

II. Complexity of the Xcerpt Core

To this aim, we base querying with Xcerpt on the theoretical foundation of conjunctive queries [1]. Conjunctive queries are often used in the formalization of relational queries, as well as of queries against tree- and graph-shaped data, e.g., in the context of XPath [4], [5]. First, a formalization of a "reduced" core of Xcerpt queries as conjunctive queries is investigated. The reduced Xcerpt query core consists of comparisons of nodes (both based on node value and node identity) and of child and descendant queries, but lacks means to specify order and injectivity among siblings. The analysis of the reduced Xcerpt core allows us to relate Xcerpt queries to other approaches such as constraint solving and relational query evaluation. We show that the evaluation of a subset of Xcerpt queries, so-called tree queries, is polynomial in combined complexity (i.e., query and data complexity), even on graph-shaped data. This result has been derived both from known results in constraint solving and from a direct complexity analysis of the evaluation algorithm for Xcerpt.

Xcerpt's evaluation algorithm, called here the "matrix method", has been introduced in [10]. It is a rather simple, yet powerful and practically efficient algorithm for simulation unification. Recently, it has been refined to avoid exponential behavior in fringe cases. The revised version exhibits the above-mentioned complexity results. The evaluation of full Xcerpt queries (which include, e.g., value and identity joins) is obviously NP-complete over graph-shaped data, as can be deduced from the evaluation of conjunctive queries over relational data. Even over tree-shaped data, their evaluation is still NP-complete, as shown in recent work [6], [8] by Meuss, Schulz, and Gottlob.

Despite the worst-case combined complexity of full Xcerpt queries, the reduced Xcerpt query core can be evaluated in polynomial combined complexity (to be precise, O(q·e²), where q is the size of the query and e is the number of edges in the data). Indeed, it has been shown that the matrix method scales over different query classes: it exhibits polynomial complexity if used to evaluate tree queries. In practical cases, even for queries using the full expressiveness of Xcerpt, the worst-case behavior occurs seldom, and query evaluation time usually scales merely polynomially in data and query size.

Where the original matrix method is a top-down algorithm with fixed access paths, practical evaluation might be improved by allowing more flexible access paths, thus giving the query optimizer a better chance to select appropriate indices. A naive bottom-up algorithm employing at most basic relational indices, though it exhibits the same worst-case complexity, in practical cases exhibits average-case behavior close to the worst-case complexity. More involved algorithms that mix bottom-up and top-down evaluation and make use of specific indices for semi-structured query evaluation might change this picture dramatically, however.

III. Extensions

A first extension of the Xcerpt query core is the introduction of mapping properties. Xcerpt's four different kinds of subterm specifications (partial vs. total, unordered vs. ordered) are known to require specific properties of subterm mappings. Total subterm specifications (using single query braces, i.e., f[t1,..., tn]) require mapping surjectivity, ordered subterm specifications (using square braces, i.e., f[[t1,..., tn]]) require mapping monotonicity, and unordered subterm specifications (using curly braces, i.e., f{t1,..., tn}) require mapping injectivity. Mapping monotonicity does not affect query complexity. Introducing mapping injectivity, however, adds a complexity factor of O(q^(3/2)·v), where v denotes the number of data vertices. This factor emerges from the fact that injectivity can be enforced using the "alldifferent" global constraint, which has the above complexity [9].

The Xcerpt query core is secondly extended by adding subterm optionality and negation. Here, interesting interaction effects with mapping properties arise: monotonicity, injectivity, and surjectivity can be used to reduce the number of considered mappings for subterm lists in the presence of optional subterms. With surjective mappings, for example, the number of matching optional subterms is determined by the arity of the data term. More complex considerations apply for monotonic and injective mappings; for details refer to [12]. Negated subterms profit from the observation that injectivity and monotonicity need not be respected between several negated subterms, so that a slightly modified greedy algorithm can reduce the number of considered subterm mappings for n subsequent negated subterms from O(2^n) to O(n); for details refer again to [12].

Although the presented techniques constitute a visible improvement of Xcerpt's runtime behavior, there are still several further fields to investigate for the optimization of query evaluation. It may be fruitful to carry over optimization techniques from other related areas, such as constraint solving and relational query optimization, to Xcerpt.

IV. Conclusion

We studied the complexity of a small core of Xcerpt queries and extended it step by step to cover almost all special query constructs. With every extension step, the complexity of the presented solutions is investigated and shown. Although the new constructs of Xcerpt are very powerful, we found techniques and algorithms with satisfactory computational complexity that legitimate their addition to the language. Querying semistructured data with Xcerpt and all supported special constructs is very efficient in almost all common cases, even though there are some fringe cases where the evaluation time degrades.

Though the results reported on in this talk are very encouraging, they cover only the evaluation of Xcerpt queries, not full programs involving several rules and, possibly, recursion. Further work will investigate the efficient evaluation of programs, including dependency analysis based on work from REWERSE work package I3 on typing, as well as rule unfolding to algebraic expressions with construction, using recent work on nested querying and construction [7].

References
[1] S. Abiteboul, R. Hull, and V. Vianu, Foundations of Databases. Boston, MA, USA: Addison-Wesley, 1995.
[2] F. Bry, T. Furche, L. Badea, C. Koch, S. Schaffert, and S. Berger, "Querying the Web Reconsidered: Design Principles for Versatile Web Query Languages," Journal of Semantic Web and Information Systems, vol. 1, no. 2, 2005.
[3] F. Bry, T. Furche, S. Schaffert, and A. Schröder, "Simulation Unification," REWERSE, Deliverable I4-D5, 2005. [Online]. Available: http://rewerse.net/deliverables/m18/i4-d5.pdf
[4] G. Gottlob, C. Koch, and R. Pichler, "Efficient Algorithms for Processing XPath Queries," ACM Transactions on Database Systems, 2005.
[5] G. Gottlob, C. Koch, R. Pichler, and L. Segoufin, "The Complexity of XPath Query Evaluation and XML Typing," Journal of the ACM, 2005.
[6] G. Gottlob, C. Koch, and K. Schulz, "Conjunctive Queries over Trees," in Proc. Symposium on Principles of Database Systems, 2004, pp. 189–200.
[7] B. Linse, "Automatic Translation between XQuery and Xcerpt," Diplomarbeit/Master thesis, Institute for Informatics, University of Munich, 2006. [Online]. Available: http://www.pms.ifi.lmu.de/publikationen#DA_Benedikt.Linse
[8] H. Meuss and K. U. Schulz, "Complete Answer Aggregates for Treelike Databases: A Novel Approach to Combine Querying and Navigation," ACM Transactions on Information Systems, vol. 19, no. 2, pp. 161–215, 2001.
[9] J.-C. Régin, "A Filtering Algorithm for Constraints of Difference in CSPs," in Proc. Conf. on Artificial Intelligence (AAAI), 1994, pp. 362–367.
[10] S. Schaffert, "Xcerpt: A Rule-Based Query and Transformation Language for the Web," Dissertation/Ph.D. thesis, University of Munich, 2004. [Online]. Available: http://www.pms.ifi.lmu.de/publikationen/
[11] S. Schaffert and F. Bry, "Querying the Web Reconsidered: A Practical Introduction to Xcerpt," in Proc. Extreme Markup Languages, 2004.
[12] A. Schroeder, "An Algebra and Optimization Techniques for Simulation Unification," Diplomarbeit/Master thesis, Institute for Informatics, University of Munich, 2005. [Online]. Available: http://www.pms.ifi.lmu.de/publikationen/#DA_Andreas.Schroeder
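The injectivity requirement discussed in this abstract, which the "alldifferent" global constraint enforces, amounts to finding an injective mapping from query subterms to candidate data subterms, i.e., a bipartite matching. A minimal augmenting-path sketch (hypothetical names; not the matrix-method implementation) illustrates the idea:

```python
# Sketch: injectivity of subterm mappings as bipartite matching
# (illustrative only, not the Xcerpt/matrix-method implementation).
# Each query subterm has a set of candidate data positions; an
# injective mapping exists iff every query subterm can be matched
# to a distinct data position.

def injective_mapping(candidates):
    """candidates: list of sets; candidates[i] holds the data positions
    that query subterm i may map to. Returns an injective mapping
    {query_subterm: data_position} or None if none exists."""
    match_of = {}                      # data position -> query subterm

    def try_assign(i, seen):
        # Classic augmenting-path step: try each candidate position,
        # displacing a previous assignment if it can be re-routed.
        for d in candidates[i]:
            if d in seen:
                continue
            seen.add(d)
            if d not in match_of or try_assign(match_of[d], seen):
                match_of[d] = i
                return True
        return False

    for i in range(len(candidates)):
        if not try_assign(i, set()):
            return None
    return {i: d for d, i in match_of.items()}

# Two query subterms both match position 0, but the second also
# matches 1: an injective mapping exists only via positions 0 and 1.
m = injective_mapping([{0}, {0, 1}])
```

Dedicated filtering algorithms for alldifferent (such as Régin's, cited above as [9]) achieve the stated O(q^(3/2)·v) behavior; this naive sketch only shows what is being computed, not how to compute it efficiently.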

dlvhex: Current Status and Future Issues

Thomas Eiter‡, Giovambattista Ianni‡†, Roman Schindlauer‡, and Hans Tompits‡

‡Institut für Informationssysteme, Technische Universität Wien, Favoritenstraße 9-11, A-1040 Vienna, Austria
{eiter,ianni,roman,tompits}@kr.tuwien.ac.at
†Dipartimento di Matematica, Università della Calabria, 87030 Rende (CS), Italy
[email protected]

Abstract—The Semantic Web vision needs formalisms for the Rule Layer that guarantee transparent interoperability with the Ontology Layer, clear semantics, and full declarativity. hex-programs are logic programs featuring higher-order atoms, external atoms, and negation-as-failure, whose semantics is based on the notion of answer sets. They are aimed at providing a suitable tool that aids in building the Rule Layer. Full declarativity, decidability, nondeterminism, nonmonotonicity, a non-finite universe of individuals, and smooth interfacing with the Ontology Layer are the features which hex-programs foster. This talk reports on the ongoing effort of developing dlvhex, a reasoning engine for hex-programs based on the DLV answer set programming engine. Thanks to its current external interfaces to RDF data sets and OWL-DL knowledge bases, dlvhex provides a tool which supports the access and integration of knowledge and information in different standard formats on the Web. Thanks to its openness and the possibility of extension through user-defined plugins, dlvhex may be used as a backbone for reasoning-aware querying.

I. Introduction

Nonmonotonic semantics is often requested by Semantic-Web designers in cases where the reasoning capabilities of the Ontology Layer of the Semantic Web turn out to be too limiting, since they are based on monotonic logics. The widely acknowledged answer-set semantics of nonmonotonic logic programs [10], which is arguably the most important instance of the answer-set programming (ASP) paradigm, is a natural host for giving nonmonotonic semantics to the Rules, Logic, and Proof layers of the Semantic Web. However, for important issues such as meta-reasoning in the context of the Semantic Web, no adequate answer-set engines have been available so far. Motivated by this fact and by the observation that, furthermore, interoperability with other software is an important issue (not only in this context), in previous work [6] the answer-set semantics has been extended to hex-programs, which are higher-order logic programs (which accommodate meta-reasoning through higher-order atoms) with external atoms for software interoperability.

Intuitively, a higher-order atom allows one to quantify values over predicate names, and to freely exchange predicate symbols with constant symbols, as in the rule

  C(X) ← subClassOf(D, C), D(X).

An external atom facilitates the assignment of a truth value to an atom through an external source of computation. For instance, the rule

  t(Sub, Pred, Obj) ← &RDF[uri](Sub, Pred, Obj)

computes the predicate t taking values from the predicate &RDF. The latter extracts RDF statements from the set of URIs specified by the extension of the predicate uri; this task is delegated to an external computational source (e.g., an external deduction system, an execution library, etc.). External atoms allow for a bidirectional flow of information to and from external sources of computation such as description logics reasoners. By means of hex-programs, powerful meta-reasoning becomes available in a decidable setting, e.g., not only for Semantic-Web applications, but also for meta-interpretation techniques in ASP itself, or for defining policy languages.

Other logic-based formalisms, like TRIPLE [16] or F-Logic [13], also feature higher-order predicates for meta-reasoning in Semantic-Web applications. Our formalism is fully declarative and offers the possibility of nondeterministic predicate definitions with higher complexity in a decidable setting. This has already proved useful for a range of applications with inherent nondeterminism, such as ontology merging (cf. [17]) or matchmaking, and thus provides a rich basis for integrating these areas with meta-reasoning.

II. Current Issues

The presence of external and higher-order atoms in hex-programs raises some technical difficulties for building implemented systems, in particular if the following design goals are to be kept:
1) Full declarativity. The user must be enabled to exploit external calls while ignoring the exact moment at which an evaluation algorithm will invoke an external reasoner. So external calls must be, although parametric, stateless.
2) Potentially infinite universe of individuals. Current ASP solvers work under the assumption of a given, finite universe of constants. This ensures termination of evaluation algorithms (which are based on grounding), but is an impractical setting if actual external knowledge must be brought inside the Rule Layer. Therefore, suitable methods must be devised for bringing finite amounts of new symbols into play while keeping the formalism decidable.
3) Expressive external atoms. Interfacing external sources should support (at least) the exchange of predicates, and not only of constants (i.e., individuals). The generic notion of external atom in [6], however, permits its evaluation to depend on the predicate interpretation as a whole; for a practical realization, this quickly gets infeasible. Therefore, restricted yet still expressive classes of external atoms need to be identified.

These problems are nontrivial and require careful attention. In our recent work, we have addressed them as follows [8].
• Emerging from reasonable (syntactic and semantic) conditions, meaningful classes of hex-programs are considered, leading to a categorization of hex-programs. They include a notion of stratification, which is more liberal than previous proposals for fragments of the language (e.g., as for HiLog programs [15]), as well as syntactic restrictions in terms of novel safety conditions for the rules. Furthermore, we consider restricted external predicates with additional semantic annotation, which includes types of arguments and properties such as monotonicity, anti-monotonicity, or linearity.
• A method for decomposing hex-programs into separate modules with distinct features regarding their evaluation algorithm has been devised.

• Strategies for computing the models of hex-programs by hierarchically evaluating their decomposed modules are explored.

These results are important steps towards the effective realization of hex-programs. While targeted at the hex framework, the methods and techniques might be applied to other, similar languages and frameworks as well. Indeed, hex-programs model various formalisms in different domains [6], and special external atoms (inspired by [9]) are important features of other recent declarative rule formalisms for the Semantic Web [11], [17], [18].

III. Current Prototype

Our prototype implementation of hex-programs, dlvhex, uses the answer-set solver DLV [14] for the core reasoning process (and DLT [12] if F-Logic syntax is used). It is a command-line application that takes one or more hex-programs as input and directly prints the resultant models as output. Both input and output are given in classical textual logic-programming notation. A simple Web interface for dlvhex is available at http://www.kr.tuwien.ac.at/staff/roman/dlvhex. It allows for entering a hex-program and filter predicates, and displays the resulting models.

The evaluation principle of dlvhex is to split the program, according to its dependency graph (generalized to hex-programs), into components, and to alternately call an answer-set solver (DLV) and the external atom functions for the respective subprograms. The framework takes care of traversing the tree of components in the right order and of combining their resulting models. Composing the initial dependency graph from a nonground program is not a trivial task, since higher-order atoms as well as the input list of an external atom have to be considered. To this end, we defined a novel notion of atom dependency, which extends the traditional understanding of dependencies within a logic program. This leads to the novel types of stratification mentioned above, which help in splitting a hex-program and in choosing the suitable model generation strategies.

Further methods of increasing the efficiency of computation include a general classification of external atoms regarding their functional properties. For instance, their evaluation functions may be monotonic or linear with respect to a given input. Formalizing such knowledge allows for an intelligent caching algorithm and thus for a reduction of interactions with the external computation source. Latest developments also include a directive to syntactically handle namespaces, and an algorithm for traversing the component graph of disjunctive programs, eventually implementing the full hex-program semantics.

To keep the development and usage of external atoms as flexible as possible, we decided to embed them into plug-ins, i.e., libraries that define and provide one or more external atoms. Such plug-ins are implemented as shared libraries, which link dynamically to the main application at runtime. A lean, object-oriented interface reduces the effort of developing custom plug-ins to a minimum.

A. Available Plugins

The current implementation features DL-atoms and RDF-atoms for accessing OWL and RDF ontologies, respectively, but also provides a toolkit for programming customized external predicates. The prototype actually subsumes our dl-program prototype [9].

1) The RDF Plug-In: RDF (Resource Description Framework) is a language for representing information about resources in the World-Wide Web and is intended to represent machine-readable and -processable meta-data about Web resources. RDF is based on the idea of identifying objects using Web identifiers (called Uniform Resource Identifiers, or URIs) and of describing resources in terms of simple properties and property values. The RDF plug-in provides a single external atom, the &RDF atom, which enables the user to import RDF triples from any RDF knowledge base. It takes a single constant as input, which denotes the RDF source (a file path or Web address).

2) The Description-Logics Plug-In: Description logics are an important class of formalisms for expressing knowledge about concepts and concept hierarchies (often denoted as ontologies). The basic building blocks are concepts, roles, and individuals. Concepts describe the common properties of a collection of individuals and can be considered as unary predicates interpreted as sets of objects. Roles are interpreted as binary relations between objects. In previous work [9], we introduced dl-programs as a method to interface description-logic knowledge bases with answer-set programs, allowing a bidirectional flow of information. To model dl-programs in terms of hex-programs, we developed the description-logics plug-in, which includes three external atoms (these atoms—in accord with the semantics of dl-programs—also allow for extending the description-logic knowledge base prior to the actual query by means of the atoms' input parameters):
• the &dlC atom, which queries a concept (specified by an input parameter of the atom) and retrieves its individuals,
• the &dlR atom, which queries a role and retrieves its individual pairs, and
• the &dlConsistent atom, which tests the (possibly extended) description-logic knowledge base for consistency.
The description-logics plug-in can access OWL ontologies, i.e., description-logic knowledge bases in the language SHOIN(D), utilizing the RACER reasoning engine.

We also supply on the prototype webpage a toolkit for developing custom plug-ins, embedded in the GNU autotools environment, which takes care of the low-level, system-specific build process and lets the plug-in author concentrate his or her efforts on the implementation of the plug-in's actual core functionality. More details on the prototype can be found in [7].

IV. Future Issues

There are a number of issues which we have on our agenda and which we plan to address in future work.
• RuleML integration. We are developing an XML-based markup language for specifying hex-programs, based on and extending RuleML (Rule Markup Language) [4]; see http://www.kr.tuwien.ac.at/staff/roman// for more information. We intend to integrate RuleML import and export mechanisms into dlvhex, as well as to provide a Web-Service interface through standardized access mechanisms such as SOAP (Simple Object Access Protocol).
• Further external atoms. Another issue is to build other external atoms, and thus to broaden the range of applicability. In particular, interfacing with Xcerpt [3], [5] and similar formalisms is of interest.
• Optimization issues and full implementation. The current prototype implements some optimization techniques (e.g., the usage of stratification and splitting sets), but obviously there is room for further optimization. In particular, for programs which make use of unstratified negation, the current implementation uses (refined) guess-and-check as a basic approach. Furthermore, not all types of external atoms are currently faithfully supported (in particular, non-monotonic atoms), and the current implementation of plugins may be enhanced and refined.

• Extension of the core language. The definition of hex programs was concentrated on the core aspects of a non-monotonic logic programming language. Numerous extensions to this core language exist, cf. [19]. A very useful and promising extension, which is not mere syntactic sugar, are weak (soft) constraints as available in DLV [14]. These are, in the simplest instance, expressions of the form

:~ L1, ..., Lk. [C]

where L1, ..., Lk are literals and C is an integer. Informally, the joint truth of L1, ..., Lk in a model comes at a cost of C, and the overall cost of all weak constraints is minimized. In this way, optimization problems (e.g., the Traveling Salesperson problem) can be conveniently expressed.

• Applications. As for applications, there are different target areas. One is reasoning on top of ontologies which involves non-monotonicity (e.g., abduction, or the enhancement of ontology reasoning with simple defaults). Another interesting area is personalization [1], which may be explored in future work in connection with WG A3, in particular the Personal Publication Reader [2]. Yet another interesting application area are security policies, where dlvhex might be used for evaluating security policies. This should be explored in cooperation with WG I2.

References

[1] G. Antoniou, M. Baldoni, C. Baroglio, R. Baumgartner, F. Bry, T. Eiter, N. Henze, M. Herzog, W. May, V. Patti, S. Schaffert, R. Schindlauer, and H. Tompits. Reasoning methods for personalization on the semantic web. Annals of Mathematics, Computing and Teleinformatics, 2(1):1–24, 2004. ISSN 1109-9305. Invited paper.
[2] R. Baumgartner, N. Henze, and M. Herzog. The personal publication reader: Illustrating web data extraction, personalization and reasoning for the semantic web. In A. Gómez-Pérez and J. Euzenat, editors, The Semantic Web: Research and Applications, Second European Semantic Web Conference, ESWC 2005, Heraklion, Crete, Greece, May 29 - June 1, 2005, Proceedings, volume 3532 of Lecture Notes in Computer Science, pages 515–530. Springer, 2005.
[3] S. Berger, F. Bry, O. Bolzer, T. Furche, S. Schaffert, and C. Wieser. Xcerpt and XChange: Twin query languages for the semantic web. In Online Proceedings International Conference on Semantic Web (ISWC), 2004. Demo paper.
[4] H. Boley, S. Tabet, and G. Wagner. Design rationale for RuleML: A markup language for Semantic Web rules. In Proceedings SWWS-2001, pages 381–401, 2001.
[5] F. Bry, P.-L. Patranjan, and S. Schaffert. Xcerpt and XChange - logic programming languages for querying and evolution on the web. In B. Demoen and V. Lifschitz, editors, Logic Programming, 20th International Conference, ICLP 2004, Saint-Malo, France, September 6-10, 2004, Proceedings, pages 450–451, 2004.
[6] T. Eiter, G. Ianni, R. Schindlauer, and H. Tompits. A uniform integration of higher-order reasoning and external evaluations in answer set programming. In Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI-05). Morgan Kaufmann, 2005.
[7] T. Eiter, G. Ianni, R. Schindlauer, and H. Tompits. dlvhex: A system for integrating multiple semantics in an answer-set programming framework. In M. Fink, H. Tompits, and S. Woltran, editors, Proceedings 20th Workshop on Logic Programming and Constraint Systems (WLP '06), pages 206–210. TU Wien, Inst. f. Informationssysteme, TR 1843-06-02, 2006.
[8] T. Eiter, G. Ianni, R. Schindlauer, and H. Tompits. Towards efficient evaluation of HEX programs. In Proceedings 3rd European Semantic Web Conference (ESWC 2006), 2006. To appear.
[9] T. Eiter, T. Lukasiewicz, R. Schindlauer, and H. Tompits. Combining answer set programming with description logics for the Semantic Web. In Proceedings KR-2004, pages 141–151, 2004. Extended Report RR-1843-03-13, Institut für Informationssysteme, TU Wien, 2003.
[10] M. Gelfond and V. Lifschitz. Classical Negation in Logic Programs and Disjunctive Databases. New Generation Computing, 9:365–385, 1991.
[11] S. Heymans, D. V. Nieuwenborgh, and D. Vermeir. Preferential reasoning on a web of trust. In Y. Gil, E. Motta, V. R. Benjamins, and M. A. Musen, editors, Proceedings 4th International Semantic Web Conference (ISWC 2005), Galway, Ireland, November 6-10, 2005, volume 3729 of Lecture Notes in Computer Science, pages 368–382. Springer, 2005.
[12] G. Ianni, G. Ielpa, A. Pietramala, M. C. Santoro, and F. Calimeri. Enhancing Answer Set Programming with Templates. In J. P. Delgrande and T. Schaub, editors, 10th International Workshop on Non-Monotonic Reasoning (NMR 2004), Whistler, Canada, June 6-8, 2004, Proceedings, pages 233–239, 2004.
[13] M. Kifer, G. Lausen, and J. Wu. Logical foundations of object-oriented and frame-based languages. J. ACM, 42(4):741–843, 1995.
[14] N. Leone, G. Pfeifer, W. Faber, T. Eiter, G. Gottlob, S. Perri, and F. Scarcello. The DLV System for Knowledge Representation and Reasoning. ACM Transactions on Computational Logic, 2006. To appear. Available via http://www.arxiv.org/ps/cs.AI/0211004.
[15] K. A. Ross. On Negation in HiLog. Journal of Logic Programming, 18(1):27–53, 1994.
[16] M. Sintek and S. Decker. Triple - a query, inference, and transformation language for the semantic web. In International Semantic Web Conference, pages 364–378, 2002.
[17] K. Wang, G. Antoniou, R. W. Topor, and A. Sattar. Merging and Aligning Ontologies in dl-Programs. In A. Adi, S. Stoutenburg, and S. Tabet, editors, Proceedings First International Conference on Rules and Rule Markup Languages for the Semantic Web (RuleML 2005), Galway, Ireland, November 10-12, 2005, volume 3791 of LNCS, pages 160–171. Springer, 2005.
[18] K. Wang, D. Billington, J. Blee, and G. Antoniou. Combining description logic and defeasible logic for the semantic web. In G. Antoniou and H. Boley, editors, Proceedings RuleML 2004 Workshop, ISWC Conference, Hiroshima, Japan, November 2004, number 3323 in LNCS, pages 170–181. Springer, 2004.
[19] S. Woltran. Answer Set Programming: Model Applications and Proofs-of-Concept. Technical Report WP5, Working Group on Answer Set Programming (WASP, IST-FET-2001-37004), July 2005. Available at http://www.kr.tuwien.ac.at/projects/WASP/report.html.
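The cost-based semantics of the weak constraints discussed in this abstract can be illustrated with a small sketch (a toy Python model over explicitly given candidate models; this is not how DLV evaluates programs, and the data structures are invented for illustration):

```python
# Toy illustration of weak-constraint semantics: a weak constraint
# ":~ L1, ..., Lk. [C]" charges cost C to every candidate model that
# contains all of L1, ..., Lk; optimal models minimize the total cost.
# (Candidate models are given explicitly here; DLV computes them itself.)

def cost(model, weak_constraints):
    """Total penalty of a candidate model (a set of literals)."""
    return sum(c for body, c in weak_constraints if set(body) <= model)

def optimal_models(models, weak_constraints):
    """Keep only the candidate models with minimal total cost."""
    best = min(cost(m, weak_constraints) for m in models)
    return [m for m in models if cost(m, weak_constraints) == best]

models = [{"a", "b"}, {"a", "c"}]
weak = [(["a", "b"], 2), (["c"], 1)]                  # two weak constraints
assert optimal_models(models, weak) == [{"a", "c"}]   # cost 1 beats cost 2
```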

REWERSE

Workpackage I5

Evolution and Reactivity

1 Demos

1.1 Demonstration of the Language XChange: Propagation of Updates on the Web

Many resources on the Web and the Semantic Web are dynamic in the sense that they can change their content over time. The need for changing (updating) data on the Web has several reasons: new information comes in, calling for insertions of new data; information is out-of-date, calling for deletions and replacements of data. Such changes need to be mirrored by other Web resources whose data depends on the initial changes. In other words, updates need to be propagated over related Web resources. The language XChange has been developed to respond to this need for evolution of data and reactivity on the Web.

Our use case demonstrates corresponding features of XChange, considering as example several distributed Web resources of the Eighteenth Century Studies Society (ECSS), a fictitious historical scientific community. This scenario resembles the "REWERSE Information System and Portal" as specified in deliverable I5-D2. Similar to REWERSE, the ECSS is subdivided into participants, i.e., universities, working groups, and a central node. Each node has its own, locally administered Web site. Events that occur in this community include changes in the personal data of members, changes to the inventory of the community-owned library, or simply the announcement of information from email newsletters to interested working groups. These events require reactions such as updates, deletions, alterations, or propagation of data, which are implemented using XChange rules. The rules run locally at the different Web nodes of the community, allowing for the processing of local and remote events.

While a similar behavior could be obtained with conventional means of programming, XChange provides an elegant and easy way to arrange for evolution of data and reactivity on the Web using readable and intuitive ECA rules. Moreover, by employing and extending Xcerpt as a query language, XChange integrates reactivity to events, querying of Web resources, and updating those resources in a single language.
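The propagation behavior described above can be imitated in a few lines of ordinary code; the following is a hypothetical miniature in plain Python (not XChange; node, event, and field names are invented) showing a central node forwarding a member-data change to a dependent site, which mirrors it locally:

```python
# Hypothetical miniature of ECA-style update propagation between Web
# nodes (plain Python, not XChange; all names here are invented).

class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}        # the node's locally administered "Web data"
        self.rules = []       # ECA-like rules: (event name, action) pairs

    def on(self, event_name, action):
        self.rules.append((event_name, action))

    def receive(self, event_name, payload):
        # React to an incoming event with every matching local rule.
        for name, action in self.rules:
            if name == event_name:
                action(self, payload)

central, member = Node("central"), Node("member")

# The member site mirrors address changes into its local data ...
member.on("address-changed",
          lambda node, p: node.data.update({p["who"]: p["address"]}))
# ... and the central node propagates the event to the member site.
central.on("address-changed",
           lambda node, p: member.receive("address-changed", p))

central.receive("address-changed", {"who": "Smith", "address": "Munich"})
assert member.data == {"Smith": "Munich"}   # update propagated and mirrored
```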

Contact: Michael Eckert (Munich)

1.2 Prototype of a general ECA framework for the Web

The prototype of the "General ECA Framework" approach described in I5-D4, Chapter 3, is under implementation towards I5-D4 (month 30). The current demo shows the present state of an implementation of the central ECA functionality in combination with a simple event detection service and XML databases. We will illustrate the communication mechanisms and the execution semantics by monitoring the execution of some sample rules.

Further information can be found at http://www.dbis.informatik.uni-goettingen.de/rewerse/eca/

Contact: José Júlio Alves Alferes (Lisbon), Wolfgang May (Göttingen)

2 Extended Abstracts

2.1 Active Rules in the Semantic Web

Authors: Wolfgang May, José Júlio Alferes, Ricardo Amador

2.2 XChange: A Reactive, Rule-Based Language for the Web

Authors: François Bry, Michael Eckert, Paula-Lavinia Pătrânjan

2.3 Supporting ECA rule markup in ruleCore

Authors: Mikael Berndtsson and Marco Seiriö

Active Rules in the Semantic Web: An Ontology-Based Approach (Extended Abstract)

Wolfgang May1, José Júlio Alferes2, and Ricardo Amador2

1 Institut für Informatik, Universität Göttingen 2 Centro de Inteligência Artificial - CENTRIA, Universidade Nova de Lisboa

1 Introduction

The goal of the Semantic Web is to bridge the heterogeneity of data formats, schemas, languages, and ontologies used in the Web to provide unified view(s) on the Web, as an extension of today's portals. In this scenario, XML (as a format for storing and exchanging data), RDF (as an abstract data model for states), OWL (as an additional framework for state theories), and XML-based communication (Web Services, SOAP, WSDL) provide the natural underlying concepts. In contrast to the current Web, the Semantic Web should be able not only to support querying, but also to propagate knowledge and changes in a semantic way. This evolution and behavior depends on the cooperation of nodes.

In the same way as the main driving forces for XML and the Semantic Web idea were the heterogeneity of the underlying data and the lack of access to its semantics, the heterogeneity of concepts for expressing behavior calls for appropriate handling on the semantic level. Since the contributing nodes are based on different concepts such as data models and languages, it is important that frameworks for the Semantic Web are modular, and that the concepts and the actual languages are independent.

Here, reactivity and its formalization as Event-Condition-Action (ECA) rules offer a suitable common model because they provide a modularization into clean concepts with a well-defined information flow. An important advantage is that the content of a rule (event, condition, and action specifications) is separated from the generic semantics of the ECA rules themselves, which provides a well-understood formal meaning: when an event (atomic or composite) occurs, evaluate a condition (possibly after querying for some extra (static) data), and if the condition is satisfied then execute an action (or a sequence of actions, a program, a transaction, or even start a process).
ECA rules provide a generic, uniform framework for specifying and implementing communication, local evolution, policies and strategies, and, altogether, global evolution in the Semantic Web.

In this presentation, we briefly describe an ontology-based approach for specifying (reactive) behavior in the Web and evolution of the Web that follows the ECA paradigm. We propose a modular framework for composing languages for events, queries, conditions, and actions by separating the ECA semantics from the underlying semantics of events, queries, and actions. This modularity allows for high flexibility wrt. these sublanguages, while exploiting and supporting their meta-level homogeneity on the way to the Semantic Web.

Moreover, the ECA rules do not only operate on the Semantic Web, but are themselves also part of it. In general, especially if one wants to reason about evolution, ECA rules (and their components) must be communicated between different nodes, and may themselves be subject to being updated. For that, the ECA rules themselves must be represented as data in the (Semantic) Web. This calls for, besides the ontology, an (XML) markup language for ECA rules. Details about the markup, the semantics of rule execution, a service-oriented architecture proposal, and implementation issues can be found in the papers [BFMS06,MAA05a,MAA05b].

2 Active Rules, Rule Components and Languages

An ECA concept for supporting interoperability in the Semantic Web needs to be flexible and adapted to the "global" environment. Since the Semantic Web is a world-wide living organism, nodes "speaking different languages" should be able to interoperate. So, different "local" languages, be it condition (query) languages, action languages, or event languages/event algebras, have to be integrated in a common framework. There is a more succinct separation between the event, condition, and action parts, which are possibly (i) given in separate languages, and (ii) evaluated/executed in different places. For this, an (extendible) ontology for rules, events, and actions that allows for interoperability is needed, which can be combined with an infrastructure that turns the instances of these concepts into objects of the Semantic Web itself.

Components of Active Rules in the Semantic Web. A basic form of active rules is that of the well-known database triggers, e.g., in SQL, of the form ON database-update WHEN condition BEGIN pl/sql-fragment END. In SQL, the condition can only use very restricted information about the immediate database update. In case an action should only be executed under certain conditions which involve a (local) database query, this is done in a procedural way in the pl/sql-fragment. This has the drawback of not being declarative; reasoning about the actual effects would require analyzing the program code of the pl/sql-fragment. Additionally, in the distributed environment of the Web, the query is probably (i) not local, and (ii) heterogeneous in language: queries against different nodes may be expressed in different languages.
For our framework, we prefer a declarative approach with a clean, declarative design as a "normal form": detect just the dynamic part of a situation (event), then check whether something has to be done, by first obtaining additional information via a query and then evaluating a boolean test, and, if so, actually do something, as shown in Figure 1. With this separation of tasks, we obtain the following structure: every rule uses an event language, one or more query languages, a test language, and an action language for the respective components. Each of these languages and their constructs is described by metadata and an ontology of its semantics and of its nature as a language, e.g., associating it with a processor. Moreover, a well-defined interface for communication between the event, query and test, and action components via variables must be defined.

Fig. 1. Components and Phases of Evaluating an ECA Rule (event: dynamic; condition: static query and test; action: dynamic; phases: collect, test, act)
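The separation into event, query, test, and action components described above can be sketched in a few lines of code (illustrative Python; plain callables stand in for the component languages, and all names are invented, not the framework's actual interface):

```python
# Illustrative sketch of the event / query / test / action normal form
# of an ECA rule (hypothetical Python, not the framework's interface).

from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class ECARule:
    event: Callable[[dict], Optional[dict]]  # detect the event, bind variables
    query: Callable[[dict], dict]            # collect additional static data
    test: Callable[[dict], bool]             # boolean condition over bindings
    action: Callable[[dict], None]           # finally, do something

    def fire(self, incoming: dict) -> bool:
        bindings = self.event(incoming)
        if bindings is None:                    # event part did not match
            return False
        bindings.update(self.query(bindings))   # "collect" phase
        if not self.test(bindings):             # "test" phase
            return False
        self.action(bindings)                   # "act" phase
        return True

log = []
rule = ECARule(
    event=lambda e: {"flight": e["flight"]} if e.get("kind") == "delay" else None,
    query=lambda b: {"passengers": 3},          # stand-in for a remote query
    test=lambda b: b["passengers"] > 0,
    action=lambda b: log.append("notify passengers of " + b["flight"]),
)
assert rule.fire({"kind": "delay", "flight": "UA917"})
assert log == ["notify passengers of UA917"]
```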

Sublanguages and Interoperability. For expressing and applying such rules in the Semantic Web, a uniform handling of the event, querying, testing, and action sublanguages is required. Rules and their components are objects of the Semantic Web, i.e., subject to a generic rule ontology as shown in the UML model in Figure 2. The ontology then splits into a structural, application-oriented part and an infrastructure part about the language itself. In this presentation, we restrict ourselves to the issues of the ECA language structure.

Fig. 2. ECA Rule Components and Corresponding Languages (UML model: an ECARule, or an OpaqueECARule given as text, consists of Event, Condition, and Action Components, the Condition comprising Query and Test Components; each component uses a Language, which is implemented by a Processor with a service/plugin URI and a syntax definition)

Hierarchical Structure of Languages. The framework defines a hierarchical structure of language families (wrt. embedding of language expressions) as shown in Figure 3. As described so far, there is an ECA language, and there are (heterogeneous) event, query, test, and action languages. Rules combine one or more languages of each of these families. In general, each such language consists of an own, application-independent syntax and semantics (e.g., event algebras, query languages, boolean tests, process algebras, or programming languages) that is then applied to a domain (e.g., travelling, banking, universities, etc.). The domain ontologies define the static and dynamic notions of the application domain, i.e., predicates or literals (for queries and conditions), and events and actions (e.g., events of train schedule changes, actions of reserving tickets, etc.). Additionally, there are domain-independent languages that provide primitives (with arguments), like general communication, e.g., received message(M) (where M in turn contains domain-specific content), or transactional languages with an action commit(A) and an event committed(A), where A is a domain-specific action.

Fig. 3. Hierarchy of Languages (the ECA language embeds event, query, test, and action languages; these in turn embed languages for application-independent domains, such as communication/messages and transactions, as well as application-domain languages, which contribute the atomic events, literals, and atomic actions the expressions talk about)

3 Common Structure of Component Languages

The four types of rule components use specialized types of languages that, although dealing with different notions, share the same algebraic language structure:

– event languages: every expression gives a description of a (possibly composite) event. Expressions are built by composers of an event algebra, where the leaves are atomic events of the underlying application;
– query languages: expressions of an algebraic query language;
– test languages: these are in fact formulas of some logic over literals (of that logic) and an underlying domain (that determines the predicate and function symbols, or class symbols, etc., depending on the logic);
– action languages: every expression describes a (possibly composite) activity. Here, algebraic languages (like process algebras) or "classical" programming languages (that nevertheless consist of expressions) can be used. Again, the atomic items are actions of the application domain.

As shown in Figure 4, all such component languages consist of an algebraic language defining a set of composers, and embed atomic elements (events, literals, actions) that are contributed by domain languages, either for specific applications or application-independent ones (e.g., messaging). Expressions of the language are then (i) atomic elements, or (ii) composite expressions recursively obtained by applying composers to expressions. Due to their structure, these languages are called algebraic languages, as e.g. used in event algebras, algebraic query languages, and process algebras. Each composer has a given cardinality that denotes the number of expressions (of the same type of language, e.g., events) it can compose, and (optionally) a sequence of parameters (that come from another ontology, e.g., time intervals) that determines its arity (see Figures 4 and 5).

Fig. 4. Notions of an Algebraic Language (a ComponentLanguage comprises an AlgebraicLanguage, with named Composers having a cardinality, Parameters, and a derived arity, and embeds Primitives of DomainLanguages; languages are implemented by Processors and DomainEngines)
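A minimal model of such algebraic languages, with composers of fixed cardinality over atomic domain elements, can be sketched as follows (illustrative Python; the class, composer, and domain names are hypothetical, not the ontology's actual vocabulary):

```python
# Minimal model of an algebraic component language (hypothetical Python):
# composers with a fixed cardinality combine subexpressions, and the
# leaves are atomic elements contributed by a domain language.

class Atomic:
    def __init__(self, domain, name):
        self.domain, self.name = domain, name    # e.g. a travel-domain event

class Composite:
    def __init__(self, language, composer, cardinality, *subexprs):
        if len(subexprs) != cardinality:         # composer arity check
            raise ValueError(f"{composer} expects {cardinality} subexpressions")
        self.language, self.composer = language, composer
        self.subexprs = subexprs

def leaves(expr):
    """Collect the atomic leaves: the domain-level events/literals/actions."""
    if isinstance(expr, Atomic):
        return [expr.name]
    return [leaf for sub in expr.subexprs for leaf in leaves(sub)]

# A binary "andthen" composer from some event algebra over two domain events:
e = Composite("event-algebra-A", "andthen", 2,
              Atomic("travel", "flight-cancelled"),
              Atomic("travel", "rebooking-requested"))
assert leaves(e) == ["flight-cancelled", "rebooking-requested"]
```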

Thus, language expressions are in fact trees which are marked up accordingly. The markup elements are provided by the definitions of the individual languages, "residing" in, and distinguished by, the appropriate namespaces: the expression "structure" inside each component is built from elements of the algebraic language. An expression is either an atomic one (atomic event, literal, action) that belongs to a domain language, or an opaque one that is a code fragment of some event/query/logic/action language, or a composite expression that consists of a composer (that belongs to a language) and several subexpressions (where each recursively also belongs to a language). The leaves of the expression trees are the atomic events, literals, or actions, contributed by the application domains (and residing in the domain's ontology and namespace); they may again have an internal structure in the domain's namespace.

Fig. 5. Syntactical Structure of Expressions of an Algebraic Language (a RuleComponent contains an Expression, which is an AtomicExpr, an OpaqueSpec, a Variable, or a CompositeExpr built by a named Composer with Parameters; composers belong to an AlgebraicLanguage, atomic expressions to a DomainLanguage)

Note that this proposal does not exclude nesting composers and expressions from different languages of the same kind (e.g., an event sequence where the first part is described in event algebra A and the second in another algebra B), distinguishing them by the namespaces they use. Thus, languages are not only associated once at the rule component level; this can also be done at the expression level.

References

[BFMS06] E. Behrends, O. Fritzen, W. May, and D. Schubert. An ECA Engine for Deploying Heterogeneous Component Languages in the Semantic Web. In Workshop "Reactivity on the Web", Springer, 2006 (to appear).
[MAA05a] W. May, J. J. Alferes, and R. Amador. An Ontology- and Resources-Based Approach to Evolution and Reactivity in the Semantic Web. In Ontologies, Databases and Semantics (ODBASE), pages 1553–1570. LNCS 3761, Springer, 2005.
[MAA05b] W. May, J. J. Alferes, and R. Amador. Active Rules in the Semantic Web: Dealing with Language Heterogeneity. In Rules and Rule Markup Languages for the Semantic Web (RuleML'05), pages 30–44. LNCS 3791, Springer, 2005.

XChange: A Reactive, Rule-Based Language for the Web

François Bry, Michael Eckert, Paula-Lavinia Pătrânjan
University of Munich, Institute for Informatics
Oettingenstr. 67, D-80538 München
Email: [email protected]fi.lmu.de, [email protected]fi.lmu.de, [email protected]fi.lmu.de

Abstract— Reactivity on the Web is an emerging research issue covering: updating data on the Web, exchanging information about events (such as executed updates) between Web sites, and reacting to combinations of such events. Reactivity plays an important role for upcoming Web systems such as online marketplaces, adaptive Web and Semantic Web systems, as well as Web services and Grids. This article introduces the high-level language XChange for programming reactive behavior and distributed applications on the Web.

I. INTRODUCTION

Many resources on the Web and Semantic Web are dynamic in the sense that they can change their content over time as new information comes in or information becomes out-of-date. Often, changes must be mirrored by other Web resources by propagating updates.

Reactivity on the Web is the ability of Web sites to detect happenings, or events, of interest that have occurred on the Web and to automatically react to them through reactive programs. Events may have various levels of abstraction, ranging from low-level ones, such as insertions into XML or RDF documents, to high-level, application-dependent ones. For example, in a tourism application, events of interest include delays or cancellations of flights, and new discounts for flights offered by an airline. Reactions to such events include notifying colleagues about delays, looking for and booking another flight, or booking flights from a particular airline.

This article presents the reactive, rule-based language XChange [BBEP05], [BEP06a], [BEP06b], [XCh], which provides the following benefits over the conventional approach of using general-purpose programming languages to implement reactive behavior on the Web:
• XChange reactive rules are highly declarative. They allow programming on a high abstraction level, and are easy to analyze for both humans and machines.
• The various parts of a rule all follow the same paradigm of specifying patterns for XML data, thus making XChange an elegant, easy-to-learn language.
• Both atomic and composite events can be detected and relevant data extracted from events. Composite events, temporal combinations of events, are an important requirement in composing an application from different services.
• XChange embeds an XML query language, Xcerpt [SB04], [Xce], allowing to access and reason with Web resources.
• XChange provides an integrated XML update language for modifying Web resources.
• XChange reactive rules enforce a clear separation of persistent data (Web resources) and volatile data (events). The distinction is important for programmers: the former relates to state, while the latter reflects changes in state.
• XChange's high abstraction level and its powerful constructs allow for short and compact code.

The remainder of this article is structured as follows. We first introduce the paradigms driving the design of XChange (Section 2). Next we give a flavor of XChange programs (Section 3). We conclude the article with an outlook on future research directions (Section 4).

II. PARADIGMS

Clear paradigms that a programming language follows provide a better language understanding and ease programming. This section introduces the paradigms upon which XChange is designed.

Event vs. Event Query: An event is a happening to which each Web site may decide to react in a particular way or not to react at all. In order to notify Web sites about events and to process event data, events need to have a data representation. In XChange, events are represented as XML documents. Event queries are queries against event data; they serve a double purpose: detecting events of interest and (through composite event queries) temporal combinations of them, and selecting data items from the representation of events.

Volatile vs. Persistent Data: XChange reflects a novel view over Web data that differentiates between volatile data (event data communicated on the Web between XChange programs) and persistent data (data of Web resources such as XML or HTML documents). This clear distinction between volatile and persistent data aims at easing programming and avoiding the emergence of a parallel "Web of events."

Rule-Based Language: An XChange program is located at one Web site and contains reactive rules, more precisely Event-Condition-Action rules (ECA rules) of the form Event Query — Web Query — Action. Such an ECA rule has the following meaning: when events answering the event query are received and the Web query is successfully evaluated, the action is performed. Both the event query and the Web query can extract data through variable bindings, which can then be used in the action. XChange embeds the Web query language Xcerpt (developed in the REWERSE working group I4 on "Reasoning-aware Querying") for expressing Web queries and for specifying deductive rules (e.g., for views over heterogeneous data, and advanced reasoning tasks).

ON xchange:event {{
     flight-cancellation {{
       flight-number { var N },
       passenger {{ name { "Christina Smith" } }}
     }}
   }}
FROM in {
       resource { "http://www.example.com/lufthansa.xml", "xml" },
       flights {{ flight {{ number { var N } }} }}
     }
DO xchange:event [
     xchange:recipient [ "http://sms-gateway.org/us/206-240-1087/" ],
     text-message [ "Your flight", var N, "has been cancelled." ]
   ]
END

Fig. 1. ECA rule to notify Mrs. Smith of a flight cancellation

Example: The site http://airline.com has been told to notify Mrs. Smith's travel organizer by SMS of delays or cancellations of flights she travels with. This can be expressed as the XChange rule in Figure 1. Note that both data from the event (the number of the canceled flight) and an XML document on the Web are accessed.

Pattern-Based Approach: Event queries, Web queries, and actions follow the same approach of specifying patterns for data queried, updated, or constructed.

Push Strategy for Event Communication: XChange uses a push strategy for communicating events to interested Web sites. This has several advantages over pull communication: it allows faster reaction, avoids unnecessary network traffic through periodic polling, and saves local resources.

Processing of Events: Event queries are evaluated locally at each Web site against the stream of incoming events. For efficiency reasons, an incremental evaluation is used for (composite) event queries.

Bounded Event Lifespan: Event queries are such that no data on any event has to be kept forever in memory, that is, the event lifespan is bounded. Hence, the design enforces that volatile data remains volatile.

III. A FLAVOR OF XCHANGE

An XChange program is located at one Web site and consists of one or more (re)active rules of the form Event query — Web query — Action. Every incoming event is queried using the event query (introduced by keyword ON). If an answer is found and the Web query (introduced by keyword FROM) also has an answer, then the specified action (introduced by keyword DO) is executed.

Rule parts communicate through variable substitutions. Substitutions obtained by evaluating the event query can be used in the Web query and the action part; those obtained by evaluating the Web query can be used in the action part.

We now look closer at the three parts of an XChange rule, starting, for ease of presentation, with Web queries in the condition part.

Web Queries: The condition part of XChange rules queries data from Web resources such as XML documents or RDF documents. Such Web queries are expressed in Xcerpt, a Web and Semantic Web query language. Xcerpt has query patterns, called query terms, for querying Web resources, and construction patterns, called construct terms, for re-assembling data selected by queries into new data items. Only query terms are used in the condition part of XChange ECA rules; however, XChange programs can contain deductive rules expressed as CONSTRUCT construct-term FROM query-term END. Such deductive rules are similar to views in relational databases, and data derived by them can "feed" into other deductive rules and into the condition part of ECA rules.

For conciseness, XChange and Xcerpt represent data, query patterns, and construction patterns in a term-like syntax. The following depicts an XML document with information about flights, its representation as data term, and a query extracting a flight number:

<flight-cancellation>
  <number>UA917</number>
  <date>2006-02-20</date>
</flight-cancellation>

flight-cancellation [
  number [ "UA917" ],
  date [ "2006-02-20" ]
]

flight-cancellation {{
  number [ var N ]
}}

in { resource { "http://airline.com" },
     flights {{
       last-change { var L replaceby "2006-02-20" },
       flight {{
         number { var N },
         delete departure-time {{ }},
         delete arrival-time {{ }},
         insert news { "Flight has been cancelled!!" }
       }}
     }} }

Fig. 2. Update to the flight database at http://airline.com

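The matching of query terms against data terms shown above can be mimicked in a few lines (a simplified Python model with invented tuple encodings; the actual Xcerpt/XChange matching is considerably richer, e.g., it also distinguishes ordered from unordered specifications, which this sketch ignores):

```python
# Simplified model of matching query terms against data terms (hypothetical
# Python encoding; real Xcerpt/XChange matching is richer).
# Data term: (label, [children]). Query term: (label, [subqueries], partial),
# where partial=True allows extra, unmatched children. ("var", name) binds.

def match(query, data, bindings):
    if query[0] == "var":                    # variable: bind the whole subterm
        bindings[query[1]] = data
        return True
    label, subqueries, partial = query
    dlabel, children = data
    if label != dlabel:
        return False
    if not partial and len(subqueries) != len(children):
        return False                         # total spec: no extra children
    remaining = list(children)
    for sq in subqueries:                    # every subquery needs some match
        for child in remaining:
            if match(sq, child, bindings):
                remaining.remove(child)
                break
        else:
            return False
    return True

data = ("flight-cancellation",
        [("number", [("UA917", [])]), ("date", [("2006-02-20", [])])])
partial_query = ("flight-cancellation",
                 [("number", [("var", "N")], False)], True)
bindings = {}
assert match(partial_query, data, bindings)   # partial spec tolerates "date"
assert bindings["N"] == ("UA917", [])         # N bound to the flight number
```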
In the term syntax, square brackets denote that the order of the children of an XML element is relevant; curly braces denote that the order is not relevant.

Both partial (i.e., incomplete) and total (i.e., complete) query patterns can be specified. A query term t using a partial specification (denoted by double square brackets [[ ]] or curly braces {{ }}) for its subterms matches all terms that (1) contain matching subterms for all subterms of t and that (2) might contain further subterms without corresponding subterms in t. In contrast, a query term t using a total specification (denoted by single square brackets [ ] or curly braces { }) does not match terms that contain additional subterms without corresponding subterms in t. Query terms contain variables for selecting subterms of data terms that are bound to the variables. The results of a query are bindings for the free variables in that query. In the example, N is bound to "UA917".

Event Queries: Events are represented as XML messages in XChange, and these event messages are exchanged between Web sites. Each Web site monitors the incoming event messages to check if they match an event query of one of its XChange rules.

Atomic event queries detect occurrences of single, atomic events. They are query patterns for the XML representation of the events. This means that the same pattern-based approach is used to query Web data and data in atomic events.

Often, situations that require a reaction by a rule are not given by a single atomic event, but by a temporal combination of events, leading to the notion of composite events and composite event queries. Support for composite events is very important for the Web: In carefully developed applications, designers have the freedom to choose events according to their goal. They can thus often do with only atomic events, by representing events which might be conceptually composite with a single atomic event. In the Web's open world, however, many different applications which have not been engineered together are integrated and have to cooperate. Situations that require a reaction might not have been considered in the original design of the applications and thus have to be inferred from many atomic events.

A composite event query consists of (1) a connection of (atomic or composite) event queries with event composition operators and (2) an optional temporal range limiting the time interval in which events are relevant.

Composition operators are denoted with keywords such as and (both events have to happen), andthen (the events have to happen in sequence), or (either event can happen), and without (non-occurrence of the event in a given time frame). Limiting temporal ranges can be specified with keywords such as before (all events have to happen before a certain time point), in (all events have to happen in an absolute time interval), and within (all events have to happen within a given length of time).

Actions: The Web is a dynamic, state-changing system. To act in this world, XChange rules support the following primitive actions: executing simple updates to persistent Web data (such as the insertion of an XML element) and raising new events (i.e., sending a new event message to a remote Web site or oneself). To specify more complex actions, compound actions can be constructed from the primitive actions.

Updating Web Data: An XChange update term is a (possibly incomplete) pattern for the data to be updated, augmented with the desired update operations. An update term may contain different types of update operations: an insertion operation specifies an Xcerpt construct term that is to be inserted, a deletion operation specifies an Xcerpt query term for deleting all data terms matching it, and a replace operation specifies an Xcerpt query term to determine data terms to be modified and an Xcerpt construct term as their new value. The example in Figure 2 updates the flight timetable at http://airline.com (in reaction to the flight cancellation event seen earlier). The variable N is already bound to the flight number of the canceled flight.

Raising New Events: Events to be raised are specified as (complete) patterns for the event messages, called event terms. An event term is simply an Xcerpt construct term restricted to having a root labeled event and at least one sub-term recipient specifying the URI of the recipient. The following is an example of an event term to notify a passenger on his PDA about the flight cancellation:

xchange:event {
  xchange:recipient {
    "http://www.johnqpublic.com/pda/" },
  cancellation-notification { var N }
}

Specifying Complex Actions: The primitive actions described by update terms and event terms alone do not let you do very much; only in combination do they become powerful. XChange hence allows specifying complex actions as combinations of (primitive and complex) actions. Such a combination of actions is to be executed in a transactional, all-or-nothing manner.

Actions can be combined with disjunctions and conjunctions. Disjunctions specify alternatives: only one of the specified actions is to be performed successfully. (Note that actions such as updates can be unsuccessful, i.e., fail.) Conjunctions in turn specify that all actions need to be performed. The combinations are indicated by the keywords or and and, followed by a list of the actions enclosed in braces or brackets. If the list of actions is ordered (indicated by square brackets [ ]), their execution order is specified to be relevant. If the actions are unordered (indicated by curly braces { }), their execution order is specified as irrelevant, thus giving more freedom for parallelization.

Putting It All Together: An Example: Having seen the three parts of an XChange rule (events, conditions, and actions), Figure 3 gives an example of a complete XChange rule. The rule reacts upon a flight cancellation (event), extracts, via a query to Web data, the affected passengers (condition), and notifies the affected passengers (event-raising action) as well as placing them on a waiting list (update action).

ON flight-cancellation {{ number [ var N ] }}
FROM in {
       resource { "http://airline.com/passengers.xml" },
       passengers {{
         booked-for { var N },
         name { var P },
         contact { var C }
       }}
     }
DO and {
     in { resource { "http://airline.com/waitinglist.xml" },
          waitinglist {{
            flight {{
              replaces { var N },
              insert all passenger { var P }
            }}
          }} },
     xchange:event {
       all xchange:recipient { var C },
       message [ "Your flight ", var N, " has been cancelled. ",
                 "You have been placed on the waiting list." ],
       waitinglist { "http://airline.com/waitinglist.xml" }
     }
   }
END

Fig. 3. Example of a complete XChange rule

IV. CONCLUSIONS

This article has presented the high-level language XChange for realizing reactivity on the Web. XChange introduces a novel view over Web data by stressing a clear separation between persistent data (data of Web resources, such as XML or HTML documents) and volatile data (event data communicated on the Web between XChange programs). XChange's language design enforces this clear separation.

XChange is an ongoing research project [XCh]. The design, the core language constructs, and the semantics are completed, and a proof-of-concept prototype has been implemented. An implementation of use cases with XChange indicates the language's applicability and relative ease of use [Rom06]. Issues deserving further attention in XChange are automatic generation of ECA rules (e.g., from data dependency specifications), efficient evaluation of rule sets and in particular event queries, visual rendering of XChange programs, and means to structure and organize large rule programs [BE06].

REFERENCES

[BBEP05] James Bailey, François Bry, Michael Eckert, and Paula-Lavinia Pătrânjan. Flavours of XChange, a rule-based reactive language for the (Semantic) Web. In Proc. Int. Conf. on Rules and Rule Markup Languages for the Semantic Web, number 3791 in LNCS, pages 187–192. Springer, 2005.
[BE06] François Bry and Michael Eckert. Twelve theses on reactive rules for the Web. In Workshop "Reactivity on the Web" at Int. Conf. on Extending Database Technology, 2006. (Invited paper).
[BEP06a] François Bry, Michael Eckert, and Paula-Lavinia Pătrânjan. Querying composite events for reactivity on the Web. In Proc. Intl. Workshop on XML Research and Applications, number 3842 in LNCS, pages 38–47. Springer, 2006.
[BEP06b] François Bry, Michael Eckert, and Paula-Lavinia Pătrânjan. Reactivity on the Web: Paradigms and applications of the language XChange. Journal of Web Engineering, 5(1):3–24, 2006.
[Rom06] Inna Romanenko. Use cases for reactivity on the Web: Using ECA rules for business process modeling. Master's thesis, Institute for Informatics, University of Munich, 2006.
[SB04] Sebastian Schaffert and François Bry. Querying the Web reconsidered: A practical introduction to Xcerpt. In Proc. of Extreme Markup Languages Conf., 2004.
[Xce] Xcerpt. http://xcerpt.org.
[XCh] XChange. http://www.pms.ifi.lmu.de/projekte/xchange.

Supporting ECA rule markup in ruleCore

Mikael Berndtsson and Marco Seiriö

University of Skövde, Sweden, [email protected], http://www.his.se/berk
Analog Software, Sweden, [email protected], http://www.rulecore.com

1 Introduction

Event Condition Action (ECA) rules were first proposed in the late 1980s and extensively explored during the 1990s within the active database community for monitoring state changes in database systems [11, 10, 14]. Briefly, ECA rules have the following semantics: when an event occurs, evaluate a condition, and if the condition is satisfied then execute an action. Recently, the concept of ECA rules has been transferred to the Web community to support ECA rules for XML data; see [2] for an overview. We see a great need for an ECA rule markup language, in terms of having a suitable format for exchanging ECA rules between different applications and platforms. The need for an ECA rule markup language has also been identified elsewhere, e.g. [1]:

"... ECA rules themselves must be represented as data in the (Semantic) Web. This need calls for a (XML) Markup Language of ECA Rules."

Existing work on ECA rule markup languages is still very much in its initial phase; for example, the RuleML [4] standardization initiative has no markup for events, only for conditions and actions. In addition, related work on ECA rules for XML, e.g. [3, 5], usually adopts an XML programming style for specifying ECA rules, rather than specifying ECA rules in a markup language. We address the above need for an ECA rule markup language and the lack of markup for events in the existing literature by presenting the design and implementation of the ECA rule engine ruleCore [12] and the ruleCore Markup Language (rCML). The rCML language has a clear separation between the specification of the three different parts (event, condition, action). In addition, it also supports the specification of an extensive set of composite events.

(ruleCore is a registered trademark of MS Analog Software kb. rCML is a trademark of MS Analog Software kb.)

2 ruleCore

ruleCore [12] is implemented using Python, Qt, and XML, and it supports ECA rules and event monitoring in heterogeneous environments. For example, a broker system can be used to integrate heterogeneous systems; ruleCore can be attached to such a broker system and react to events that are sent through the broker.

Fig. 1. ruleCore architecture

The ruleCore engine is built around a concept of loosely coupled components and is internally event driven. Components communicate indirectly using events and the publish/subscribe event passing model. The functionality of the ECA rules is provided by a number of components working in concert, where each component provides functionality in a small, well-defined area. As the components are not aware of the recipients of the events they publish, it is easy to reconfigure the engine to experiment with other models besides the more well-known ECA model. For example, one could insert an additional processing step between any of the event, condition or action steps. All internal and external events are stored in a relational database (PostgreSQL). Storing the event occurrences in a database implies that traditional database tools can be used for off-line analysis, visualization, simulation and reporting. At the core of ruleCore lies a component framework, see Figure 1. The framework provides services for loading, initializing, starting and stopping components. It also handles persistence for the components and manages automatic crash recovery for the engine. The crash recovery mechanism is fully automatic and restores the last known state of the engine at startup in case the engine was not shut down properly. The recovery mechanism uses the transaction management features of the PostgreSQL database to roll forward transactions to keep the internal state of the engine consistent at all times. The temporal features of the engine are fully integrated into the recovery process, and thus all time dependencies are managed in a best effort manner even in case of engine failure or downtime. The components that provide the functionality can broadly be divided into three groups.

– Input components are responsible for implementing support for a specific transport protocol and accepting events through it.
Currently, support exists for receiving events with XML-RPC, TCP/IP sockets, SOAP, IBM WebSphere MQ, and TIBCO Rendezvous.
– Rule components provide the functionality of the rules. The rule manager, condition manager and action manager components work together using event passing to implement the ECA rule execution model.
– Support components provide functionality that is directly or indirectly used by other components. In this group we find components for event routing, event flow management, persistent state management and management of the configuration of the engine.
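The indirect, publish/subscribe style of component communication described above can be sketched as follows. This is an illustrative Python sketch, not ruleCore's actual code; the topic name and event payload are invented for the example:

```python
from collections import defaultdict

class EventBus:
    """Minimal publish/subscribe channel: components never address each other directly."""
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, payload):
        # The publisher does not know who (if anyone) receives the event.
        for handler in self._subscribers[topic]:
            handler(payload)

# Wiring a hypothetical rule-manager component to a hypothetical input component:
bus = EventBus()
received = []
bus.subscribe("event.incoming", received.append)          # e.g. the rule manager
bus.publish("event.incoming", {"type": "E1", "id": 42})   # e.g. an XML-RPC input component
```

Because subscriptions are the only coupling, inserting an extra processing step between event, condition and action stages amounts to re-wiring topics rather than changing components.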

3 The ruleCore Markup Language (rCML)

3.1 Design Principles

Each rule, event, condition, or action in rCML has a unique name. This design decision reuses previous design solutions from the active database community. In essence, it means that rules, events, conditions, and actions are treated as first-class objects. Some advantages are that, e.g., events can be related to other objects and can have attributes, and they can be added, deleted and updated like any other object. If events are not treated as first-class objects, the event information needs to be duplicated in each ECA rule that is triggered by the event, which can quickly lead to maintenance problems.
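The first-class-object idea can be sketched as follows. This is an illustrative Python sketch, not part of ruleCore; the registry API and the names E1, Rule1, Rule2 are invented for the example:

```python
class Registry:
    """Rules, events, conditions, and actions as uniquely named first-class objects."""
    def __init__(self):
        self._objects = {}

    def add(self, name, obj):
        if name in self._objects:
            raise ValueError(f"duplicate name: {name}")
        self._objects[name] = obj

    def get(self, name):
        return self._objects[name]

    def remove(self, name):
        del self._objects[name]

registry = Registry()
registry.add("E1", {"kind": "event", "attrs": {"source": "broker"}})
# Two rules can now *reference* E1 by name instead of duplicating its definition:
registry.add("Rule1", {"kind": "rule", "event": "E1"})
registry.add("Rule2", {"kind": "rule", "event": "E1"})
```

Updating the definition of E1 in one place is then visible to every rule that references it, which is exactly the maintenance advantage argued for above.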

3.2 ECA Rules

ECA rules are specified inside a rules container element, and each individual ECA rule is described with a rule element. A rule element has the following subelements:

– A description of the rule. This subelement is only for user convenience and is not used by the engine.
– A triggering event: a reference to the composite event that triggers the rule. The main target applications for rCML are applications that react on composite events. However, a simple basic event can also act as the triggering event for a rule, by constructing a composite event with only one sub-event.
– A condition: a reference to a condition definition element that specifies a condition that is evaluated when the rule is triggered by its event.
– An action: a reference to an action element that is executed if the rule condition evaluates to true.
– A failure action: a reference to an action element that is executed if the triggering event can never be detected.
– An instance limit, used to limit the number of rule instances for each type of rule. Possible values are: 1. None 2. An integer specifying the maximum number of rule instances.

Below is an example of an ECA rule in rCML, with the following element values: description "A description of this rule", triggering event E1, condition Condition12, action Action1, failure action Action2, and instance limit None.

The triggering event, condition, and action elements all contain an attribute called enabled, with the possible values yes or no. Thus, a rule whose condition should always evaluate to true is specified by setting enabled to no on its condition element.
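One possible XML rendering of such a rule, parsed here with Python's standard library, might look as follows. The element and attribute names are hypothetical illustrations of the structure described above, not necessarily the literal rCML vocabulary:

```python
import xml.etree.ElementTree as ET

# Hypothetical rendering of the example rule; element names are illustrative only.
rule_xml = """<rule name="Rule1">
  <description>A description of this rule</description>
  <triggering-event enabled="yes">E1</triggering-event>
  <condition enabled="yes">Condition12</condition>
  <detected-action enabled="yes">Action1</detected-action>
  <not-detected-action>Action2</not-detected-action>
  <instance-limit>None</instance-limit>
</rule>"""

rule = ET.fromstring(rule_xml)
# Each part references a separately defined first-class object by name:
event_ref = rule.findtext("triggering-event")
condition_ref = rule.findtext("condition")
```

A markup representation of this kind is what makes the rules exchangeable between applications and platforms, the motivation stated in the introduction.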

3.3 Event Types

Two different event types are supported in rCML: basic events and composite events. A simple composite event can be defined by using two basic events. However, more complex composite events are defined by building composite events out of other composite events. The following event operators are supported:

– Conjunction. Similar event operators are supported in Snoop [7], Ode [9], and SAMOS [8].
– Disjunction. Similar event operators are supported in Snoop [7], Ode [9], and SAMOS [8].
– Sequence. The sequence operator is perhaps most useful when the sub-events are basic events. Similar event operators are supported in Snoop [7], Ode [9], and SAMOS [8].
– Prior sequence. The prior sequence operator behaves like the sequence operator when all of its sub-events are basic events. However, when the sub-events are composite events, the semantics of the composite event detection are as follows: the terminating event in a sub-composite event must occur before the terminating event in the following sub-composite event occurs. The semantics of the prior sequence operator in rCML is similar to the semantics of the prior event operator as defined in Ode [9].
– Relative sequence. The relative sequence operator behaves like the prior sequence operator and the sequence operator when all of its sub-events are basic events. The relative sequence operator requires that the terminating event in a sub-composite event is detected before the detection of the initiating event in the following sub-composite event. The semantics of the relative sequence operator in rCML is similar to the semantics of the relative event operator as defined in Ode [9].
– Any. A similar event operator is found in Snoop [7]. The any event operator detects when any m events out of the n specified sub-events have occurred, where m ≤ n. The order of detection of the sub-events is not important. Thus, the semantics of the any operator is similar to that of the conjunction operator, but with the difference that the user can choose that only a limited number of the sub-events need to be detected.
– Between. The between event operator uses one initiating event and one terminating event to detect the composite event. Any number of events can occur between the initiating event and the terminating event. All the events that occur between the initiating event and the terminating event can be stored for condition evaluation. The between event operator is useful when the initiating and terminating events of a composite event are known, but not how many events will occur in between them.
– Not. The classical semantics of the NOT operator when specifying composite events for ECA rules is that an event E is not detected during an interval specified by two events. For example, a composite event NOT E3 (E1, E2) is detected if event E3 is not detected between the detection of E1 and E2. Previous systems [6, 8] have restricted the use of the NOT operator to: (i) a conjunction [6], i.e., event E3 should not occur between E1 and E2, or (ii) a time interval [8], i.e., event E3 should not occur between 18:00 and 20:00. The approach taken in rCML generalizes the usage of the NOT operator to any type of event interval. Thus, the NOT operator extends previous usage of the NOT operator for specifying composite events for ECA rules.
– Count. The count event operator is used to count how many times its only sub-event is detected within an interval. The interval is configured in such a way that the count operator knows when it should start and stop counting event occurrences. Thus, a count operator is either in an open state (counting) or in a closed state (not counting).
– Time. The time event operator supports the specification of absolute, relative, and periodic time events. Similar events have been proposed by the active database community.
– State gate. The state gate operator can be used to detect whether an object is in a particular state, for example, between 12:00 and 13:00 the object is in the "LUNCH" state.
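The detection of one of the simpler operators above, the sequence, can be sketched as follows. This is a minimal illustrative Python sketch; real engines such as ruleCore additionally handle event parameters, intervals, and the detection contexts discussed in the literature cited above:

```python
class SequenceDetector:
    """Detects a composite event whose sub-events must occur in the given order."""
    def __init__(self, *names):
        self.names = names
        self.position = 0  # index of the next expected sub-event

    def feed(self, event_name):
        """Feed one incoming event; returns True when the full sequence is detected."""
        if event_name == self.names[self.position]:
            self.position += 1
            if self.position == len(self.names):
                self.position = 0  # reset so the composite event can fire again
                return True
        return False

seq = SequenceDetector("E1", "E2")
seq.feed("E1")   # E1 initiates the composite event
seq.feed("E3")   # unrelated events are ignored
detected = seq.feed("E2")   # E1 followed by E2: composite event detected
```

The other operators (conjunction, any, between, not, count) differ mainly in the bookkeeping each detector keeps between `feed` calls.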

4 The Future

The rule engine is currently being extended to support dynamic ECA rule management, i.e., adding ECA rules on the fly. This feature is mandatory if we are to support ECA rules that integrate parts from different ECA systems. For additional details about ruleCore and rCML, please see [12, 13].

References

1. J. J. Alferes, R. Amador, and W. May. A general language for Evolution and Reactivity in the Semantic Web. In Proceedings of the 3rd Workshop on Principles and Practice of Semantic Web Reasoning, 2005.
2. J. J. Alferes, J. Bailey, M. Berndtsson, F. Bry, J. Dietrich, A. Kozlenkov, W. May, P. L. Patranjan, A. Pinto, M. Schroeder, and G. Wagner. State-of-the-art on Evolution and Reactivity. Technical Report REWERSE deliverable I5-D1, 2004.
3. J. Bailey, A. Poulovassilis, and P. T. Wood. An Event Condition Action Language for XML. In Proceedings of WWW'2002, pages 486–495, 2002.
4. H. Boley, B. Grosof, M. Sintek, S. Tabet, and G. Wagner. RuleML Design. RuleML Initiative, http://www.ruleml.org/, 2002.
5. F. Bry and P.-L. Patranjan. Reactivity on the Web: Paradigms and Applications of the Language XChange. In Proceedings of the 20th Annual ACM Symposium on Applied Computing SAC'2005, 2005.
6. S. Chakravarthy, E. Anwar, L. Maugis, and D. Mishra. Design of Sentinel: An Object-Oriented DBMS with Event-Based Rules. Information and Software Technology, 36(9):559–568, 1994.
7. S. Chakravarthy, V. Krishnaprasad, E. Anwar, and S. K. Kim. Composite Events for Active Databases: Semantics, Contexts and Detection. In Proceedings of the 20th International Conference on Very Large Data Bases, pages 606–617, September 1994.
8. K. R. Dittrich, H. Fritschi, S. Gatziu, A. Geppert, and A. Vaduva. SAMOS in hindsight: experiences in building an active object-oriented DBMS. Information Systems, 28(5):369–392, July 2003.
9. N. Gehani, H. V. Jagadish, and O. Shmueli. Event specification in an active object-oriented database. In Proceedings of the 1992 ACM SIGMOD International Conference on Management of Data, pages 81–90, 1992.
10. N. W. Paton, editor. Active Rules in Database Systems. Monographs in Computer Science. Springer, 1999. ISBN 0-387-98529-8.
11. N. W. Paton and O. Diaz. Active Database Systems. ACM Computing Surveys, 31(1):63–103, 1999.
12. ruleCore. The ruleCore home page: http://www.rulecore.com/.
13. M. Seiriö and M. Berndtsson. Design and Implementation of an ECA Rule Markup Language. In Proceedings of the 4th International Conference on Rules and Rule Markup Languages for the Semantic Web (RuleML-2005), volume 3791 of Lecture Notes in Computer Science, pages 98–112. Springer, 2005. ISBN 3-540-29922-X.
14. J. Widom and S. Ceri, editors. Active Database Systems: Triggers and Rules For Advanced Database Processing. Morgan Kaufmann, 1996. ISBN 1-55860-304-2.

REWERSE

Workpackage A1

Web-based Decision Support for Event, Temporal, and Geographical Data

1 Demos

1.1 Local Data Stream Management System

L-DSMS facilitates filtering and transformation of XML data streams. In this demo we process an XML stream of RDS/TMC data (traffic information broadcast via radio signals).

Contact: Bernhard Lorenz (Munich)

1.2 KML Traffic Information Service

As an application which consumes streamed XML data, we show how the TMC data provided by the L-DSMS are cached, transformed to the Keyhole Markup Language (KML), and made available on a web server for display in the Google Earth client.

Contact: Bernhard Lorenz (Munich)

1.3 TransRoute: A Framework for Graphs

This demo shows the basic features of TransRoute, a framework for representing, processing and querying graphs. The street network of Munich serves as a testbed, consisting of around 10000 vertices and 15000 edges.

Contact: Bernhard Lorenz (Munich)

2 Extended Abstracts

2.1 Geospatial Information Processing in WG A1

Authors: Bernhard Lorenz, Hans Jürgen Ohlbach, Edgar-Philipp Stoffel
(Ohlbach et al., Geospatial Information Processing in WG A1, REWERSE annual meeting, 2006)

I.INTRODUCTION The systems we are developing win WG A1 try to address all these aspects: WG A1 has two main working hypotheses with respect to geospatial information processing for the Semantic Web. 1) representation and computation with static and dynamic geospatial information; 1) In order to give information on the Web its real seman- 2) combination of detailed geographic computation with tics, i.e. to realize the semantic web, we must develop abstract spatial reasoning; models of a considerable part of the reality. In our case 3) the possibility to specify application dependent geospa- this amounts to modelling geographic and other spatial tial notions in a suitable specification language. information in such a way that if can be be used in a 4) Finally, it is necessary for testing the components, and flexible way by many different applications. eventually for many applications, to visualise geospatial 2) The Semantic Web will not be confined to desktop information or to output it in another way (written computers in offices and homes. There is an enor- or spoken language, for example). Therefore rendering mous potential of new applications when Ubiquitous components of different kinds are also an important Computing and the Semantic Web eventually merge. aspect in our work. The Semantic Web will be accessible from Personal Digital Assistents (PDAs), mobile phones, wearables and many other computing devices. Since most of them II.LOCAL DATA STREAM MANAGEMENT SYSTEM are mobile, it is of particular importance to provide them The traditional way in which data is managed and made with a geospatial model of the reality. Therefore certain available for processing is via (relational) Database Manage- developments in WG A1 target applications which run ment Systems (DBMS), which expect data to be put into the on mobile devices. form of persistent data sets. Whereas for many applications The main bulk of geospatial information is static. 
Road and this constitutes a suitable form of data storage, there exists train networks, for example, do not change that often. There a growing number of applications which require data to be is, however, a small, but important part of dynamic geospatial treated and processed as a continuous stream [1]. Currently information. News about traffic jams, train delays, etc. can a number of different Data Stream Management Systems be very important and therefore must not be neglected. A (DSMS) are developed to facilitate access to data streams. comprehensive geospatial model must therefore combine static The Local Data Stream Management System (L-DSMS) and dynamic information. which was developed by WG A1 is local in the sense that Another important aspect is the granularity of the geospatial it facilitates the specification and construction of a single Java information. In order to navigate a car through a city, it is program which consists of a network of nodes for processing necessary to have a very detailed road map. If, on the other streams of data. Each such node receives data from one or hand, one want to list all major cities in Europe, it is sufficient several data sources, processes them in a certain way, and to know that, in particular, Munich is in Germany, Germany delivers the processed data to one or more data drains. A data is in Europe and ‘is in’ is transitive. Since Semantic Web drain can be the data source for the next processing node in applications may be of almost any type, the whole range the network, or it can be the end application in the whole from very detailed coordinate computations up to very abstract processing chain. One of the components of L-DSMS is the spatial reasoning should be available. SPEX XML–filtering system [2], [3]. It processes XPath [4] Many geographic notions are very much application depen- queries on a stream of XML data and can be used to extract dent. 
“City A is between city B and city C”, for example, may interesting information from XML streams. be true if you consider a train network, where the train line L-DSMS has been developed and tested in an application for makes a detour through city A. It may be false if you consider processing dynamic traffic information. The traffic information a highway network where the closest highway is very far from comes from RDS-TMC receivers, is converted to an XML city A. Notions like “between” and many others can therefore stream, is subsequently processed in several steps by the L- not be hard coded into a geospatial system. Instead, it must be DSMS and then delivered to other application systems, one possible to specify them in a suitable specification language of which is described in section III. Further details about L- such that one can use them for computation and reasoning DSMS can be found in [5]. tasks. Fig. 1 shows (from left to right) different examples of This research has been co-funded by the European Commission and by the configurations of node networks, which (1) receive the stream Swiss Federal Office for Education and Science within the 6th Framework Programme project REWERSE number 506779 (cf. http://rewerse. from a socket connection for output on a screen and logging on net) disk, (2) are directly connected to an RDS/TMC receiver and OHLBACH et al.: GEOSPATIAL INFORMATION PROCESSING IN WG A1. REWERSE annual meeting, 2006 2

WAN development of TIS. The databases store static information, DB such as the Location- and Event Code Lists needed for TMC. The Cache Drain, closely connected to L-DSMS, serves to TCP/IP FM/RDS FM/RDS DB Source Source Source Source keep track of current TTI messages. This is necessary because the TMC system is based on the recurring broadcast of traffic Node Node Join Node Node Node DB messages, including massive repetitions in order to reduce latency and improve the timeliness of the messages. The cache

Node OTN Join Drain Node drain retains a list of all currently active messages, adds new messages as they come in via the XML stream, and gets rid of

Console File TCP/IP Drain Drain Drain outdated messages which either reach the end of their life cycle

graphical Visualisation or are explicitly invalidated by special cancellation messages. The KML server waits for external requests coming in via WAN the web server and generates KML documents from the cached TMC data on-the-fly. These documents are then made available Fig. 1. Examples of L-DSMS configurations via the web server. In a closely related project we collect TTI over longer periods of time for the purpose of developing statistical data enrich the stream by data from a relational database before about traffic networks. This is rather important for routing providing output on a graphical map in a browser window, applications such as described in section V because the and (3) join a stream from a receiver with an artificial stream throughput in traffic networks is highly depending on the generated from data in a relational database and providing individual load on each of its connections, i.e. the different the joint stream for use by another application via a socket states of congestion. Time dependent data about congestion connection. enables the application for example to propose a different routes between identical locations A and B at certain times, III.KMLTRAFFIC INFORMATION SERVICE for example during and outside of rush-hours, on holidays, This project serves to join several components into an etc. This is possible because the applications can statistically application providing real time traffic information for display “predict” that on a certain connection there is a high chance in an online map system, in this case Google Earth. The name for congestion between e.g. 7am and 9am and again between of the service is derived from Traffic Message Channel (TMC), 4pm and 6pm. an ISO standard for traffic information and the source for the transformation, and the Keyhole Markup Language (KML), IV. 
MULTI PARADIGM LOCATION LANGUAGE a language developed by Keyhole Inc., now part of Google, Geographical distances among geospatial objects are ex- which is used to provide data for the Google Earth client. pressed by proximity relations, such as “A is near B”, “C is far As sketched above, Traffic and Travel Information (TTI) is from D”. There exist a number of approaches to model prox- collected via suitable RDS/TMC receivers and transformed imity relations in the geospatial domain. Within the semantic into an enriched XML stream. This stream can optionally be web context we propose an approach based on multi modal transformed and filtered by L-DSMS before it is read by the path planning, since computing semantically correct distances TMC-KML transformation system. almost always amount to path planning problems. This is due to a number of reasons, chiefly because L1 distances do http://... Webserver Google not reflect people’s intuitive understanding of distances. See Earth KML section V for more detail on our approach.

SOAP A fuzzy logic approach for proximity reasoning for instance

KML Server is proposed by [6]. Proximity expressions such as far or close to have a corresponding fuzzy membership function. Using

XML this model, the query “Which shopping centres are close to TMC Cache R” takes the following form:

TMC TMC/ Receiver XML−stream Close : O = CloseT o{o : O, {o},R, {x1, y1, x2, y2}, DistanceMethod, C} (1) Fig. 2. TIS System Architecture O is the object type (shopping centre), CloseT o is a fuzzy set membership function, and DistanceMethod is a As shown in fig. 2 the Traffic Information Service (TIS) distance calibration method (e.g. absolute or relative distance). consists of three main components: a TMC Cache Drain, Object o is an object of type O, R is the reference location, the KML server, including a relational database for static and {x1, y1, x2, y2} serves to represent scale by denotes the information, and a web server. The shaded grey components area. Since proximity depends on contextual information, the are parts of L-DSMS which were developed prior to the context C also has to be given. It includes factors such as OHLBACH et al.: GEOSPATIAL INFORMATION PROCESSING IN WG A1. REWERSE annual meeting, 2006 3 mode of transportation, user profile and preferences, possibly decorated by according attributes. For this purpose, the open- device characteristics and more. See below for more detail on source Java Universal Network and Graph framework (JUNG) context and user modelling. Since most transportation modes [8] based upon the object-oriented paradigm and its relevant are based on network like structures, they have a significant design patterns has been extended with respect to following impact on the notion of proximity. Therefore, two objects functionalities: might be regarded as near to each other in one context, but far in another. In our example, the result of the query above A. Persistence of Data is a set of objects that is close to R and is of type O. In MPLL spatial relations cannot be entirely hardcoded. 
On the contrary, spatial relations have to be user definable, so that from a set of predefined as well as user-defined relations new relations can be composed. MPLL will therefore provide basic quantitative relations, such as those based on angular and distal values, as well as qualitative relations and interpretations thereof. Based on angular values, the relation "between" for example could be defined as follows, θAB being the bearing between points A and B. If

|θAB − θAC| = 180° ± 5°    (2)

then the statement "A is located between B and C" is true¹. Optionally, this relation can be extended to support fuzzy logic by mapping the (limited) deviation from the core value 180° (i.e. the interval of the support values around the core value) to values in the interval [0, 1].

Fig. 3. 2-dimensional Fuzzy Distance

The same approach can be used for distal relations. An omnidirectional fuzzy distribution around a point in space is shown in Fig. 3. For each point in planar space, i.e. each pair of coordinates, the value on the z axis represents the fulfilment of the distal relation.

¹This refers to a navigational context, i.e. regarding free movement in space. In a network-based environment, as mentioned in the introduction, this approach might not be suitable.

V. TRANSROUTE: A FRAMEWORK FOR GRAPHS

Focal points of this component are both representation and manipulation of graphs in a programming environment. A large variety of algorithms, ranging from connectivity and clustering over flows to shortest-path computation, are provided by TransRoute [7]. In particular, graphs are considered as first-class generic data types as part of a graph library, each of which can be instantiated with concepts of a domain ontology.

For any application dealing with high volumes of data, storing these solely in transient memory is not only unsatisfactory but inconceivable. Instead, making them persistent is the viable solution. In order to overcome this problem in TransRoute, the Graph eXchange Language (GXL) [9], defined in an XML vocabulary, is employed for exchanging graphs. Consequently, both storing and retrieving graphs from a persistent repository are possible. The key advantages of using GXL revolve around its advanced data model, which includes e.g. nested graphs and attributes of various basic datatypes.

B. Hierarchic Graphs

Special emphasis is put on cluster-like hierarchic structures [10], [11], [12], [13], [14], which essentially compose geospatial networks. The basic idea is that graphs of a particular level are condensed to one node of the level above. This can also be seen in terms of clustering, i.e. compressing a graph to one node is a natural grouping into a logical (or sometimes physical) cluster. By adopting a high-level viewpoint of transport networks (represented as a vertex on this abstract level), they can not only be matched to regions, but also to network types indicating the modes of transportation which may be used. The essential idea focuses on employing one network for each mode of transportation as a separate graph, which fits well into our hierarchic concept.

C. Ontologies

Ontologies have proven to be well-suited to describe complex entities and features of a modelled domain in a machine-processable way. For our purpose, TransRoute uses the Ontology for Transport Networks (OTN) [15], which has been developed in the Web Ontology Language (OWL) [16]. OTN is used for the abstract modelling layer for representing domain concepts. Graphs, vertices and edges can be created by specifying the domain concept they ultimately stand for, thus opening the bridge to domain-specific knowledge.

D. Interfaces for Algorithms

Interfaces are an elegant way of conveying the idea of design by contract: since there are situations in which one can choose from different algorithms solving the same problem, the strategy pattern has been applied: all algorithms (plug-ins) realise the same contract imposed by the interface IAlgorithm. Hence, properties of different algorithms can be compared.

OHLBACH et al.: GEOSPATIAL INFORMATION PROCESSING IN WG A1. REWERSE annual meeting, 2006
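The angular "between" relation of equation (2) and its fuzzy extension can be sketched in a few lines. This is a minimal Python illustration, not MPLL itself; the (x, y) point representation, the atan2-based bearing, and the linear membership ramp are assumptions.

```python
import math

def bearing(p, q):
    """Bearing in degrees [0, 360) from point p to point q; points are (x, y) pairs."""
    return math.degrees(math.atan2(q[1] - p[1], q[0] - p[0])) % 360.0

def between_crisp(a, b, c, tol=5.0):
    """Crisp 'A is located between B and C': the bearings from A to B and
    from A to C differ by 180 degrees, up to the tolerance of equation (2)."""
    diff = abs(bearing(a, b) - bearing(a, c))
    diff = min(diff, 360.0 - diff)          # smallest angular difference
    return abs(diff - 180.0) <= tol

def between_fuzzy(a, b, c, support=5.0):
    """Fuzzy variant: map the deviation from the core value 180 degrees
    linearly onto [0, 1]; 1 at the core, 0 at or beyond the support bound."""
    diff = abs(bearing(a, b) - bearing(a, c))
    diff = min(diff, 360.0 - diff)
    return max(0.0, 1.0 - abs(diff - 180.0) / support)
```

For a point exactly between two others the fuzzy membership is 1.0; it decays linearly to 0 as the angular deviation approaches the support bound.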

E. Application

In a first case study based on the metropolitan street network data of Munich, encoded in an OWL instance file containing a total of 13300 vertices and 19887 edges, the algorithms of TransRoute could be tested on real data. These preliminary results are promising, yet further data of a larger magnitude (for Germany) will be taken into account for testing efficiency in practical application.

REFERENCES

[1] S. Babu and J. Widom, "Continuous Queries over Data Streams," SIGMOD Record, vol. 30, no. 3, pp. 109–120, 2001.
[2] D. Olteanu, "Evaluation of XPath Queries against XML Streams," Ph.D. thesis, Institute for Informatics, University of Munich, 2005. [Online]. Available: http://www.pms.ifi.lmu.de/publikationen/#PMS-DISS-2005-1
[3] D. Olteanu, T. Furche, and F. Bry, "An efficient single-pass query evaluator for XML data streams," in SAC, 2004, pp. 627–631.
[4] J. Clark and S. DeRose, "XML Path Language (XPath) Version 1.0," W3C Recommendation, 1999, http://www.w3.org/TR/xpath
[5] H. J. Ohlbach and B. Lorenz, "Dynamic Data for Geospatial Reasoning - A Local Data Stream Management System (L-DSMS) and a Case Study with RDS-TMC," 2006.
[6] M. Gahegan, "Proximity operators for qualitative spatial reasoning," in COSIT, 1995, pp. 31–44.
[7] E.-P. Stoffel, "A research framework for graph theory in routing applications," Diploma thesis, Institute of Computer Science, LMU, Munich, 2005. [Online]. Available: http://www.pms.ifi.lmu.de/publikationen/#DA Edgar.Stoffel
[8] "JUNG: Java Universal Network/Graph Framework." [Online]. Available: http://jung.sourceforge.net
[9] "GXL: Graph eXchange Language." [Online]. Available: http://www.gupro.de/GXL/
[10] A. Car, H. Mehner, and G. Taylor, "Experimenting with hierarchical wayfinding," 1999. [Online]. Available: citeseer.ist.psu.edu/car99experimenting.html
[11] P. Eades and Q.-W. Feng, "Multilevel visualization of clustered graphs," in Proc. Graph Drawing, GD, no. 1190. Berlin, Germany: Springer-Verlag, 1996, pp. 101–112. [Online]. Available: citeseer.ist.psu.edu/eades97multilevel.html
[12] D.-I. M. Rose, "Modeling of freeway traffic," in 10th International Conference on Computing in Civil Engineering, Weimar, June 02–04, 2004. [Online]. Available: http://www.bauinf.uni-hannover.de/publikationen/ICCCEBPaperRose.pdf
[13] B. Riedhofer, "Hierarchische Straßengraphen," Master's thesis, University of Stuttgart, Faculty of Computer Science, 1997. [Online]. Available: elib.uni-stuttgart.de/opus/volltexte/1999/7/
[14] G. Busatto, "An abstract model of hierarchical graphs and hierarchical graph transformation," Ph.D. dissertation, University of Paderborn, 2002.
[15] F. Ipfelkofer, "Basisontologie und Anwendungs-Framework für Visualisierung und Geospatial Reasoning," Diploma thesis, Institute of Computer Science, LMU, Munich, 2004. [Online]. Available: http://www.pms.ifi.lmu.de/publikationen/#DA Frank.Ipfelkofer
[16] "W3C OWL: Web Ontology Language." [Online]. Available: http://www.w3.org/TR/owl-features/

REWERSE

Workpackage A2

Towards a Bioinformatics Semantic Web

1 Demos

A detailed description of the below A2 demos is available in Deliverable A2-D4.

1.1 GoProteins

GoProteins allows users to browse the GeneOntology and retrieve protein structure data from the web. It uses ontologies, deduction rules, reaction rules, databases and XML.

1.2 Sambo

Sambo integrates bio-ontologies such as the GeneOntology and MeSH. It deploys novel algorithms for concept mapping.

1.3 BioRevise

BioRevise reasons over metabolic networks using techniques from belief revision.

1.4 EarthFeed

EarthFeed reads RSS feeds on the web, extracts locations and maps them onto a map of the world. The demonstrator uses XML and reactivity.

1.5 GoPubMed

GoPubMed allows users to explore PubMed search results with the GeneOntology. The demonstrators can be shown at the annual review meeting.

2 Extended Abstracts

2.1 Analyzing Gene Expression Profiles with Semantic Web Reasoning

Authors: Liviu Badea, Doina Tilivea, Anca Hotaran

Analyzing Gene Expression Profiles with Semantic Web Reasoning (Extended Abstract)

Liviu Badea 1, Doina Tilivea 1, Anca Hotaran 1

1AI Lab, National Institute for Research and Development in Informatics 8-10 Averescu Blvd., Bucharest, Romania [email protected]

Abstract. We argue that Semantic Web reasoning is an ideal tool for analyzing gene expression profiles and the resulting sets of differentially expressed genes produced by high-throughput microarray experiments, especially since this involves combining not only very large, but also semantically and structurally complex data and knowledge sources that are inherently distributed on the Web. In this paper, we describe an initial implementation of a full-fledged system for integrated reasoning about biological data and knowledge using Semantic Web reasoning technology and apply it to the analysis of a public pancreatic cancer dataset produced in the Pollack lab at Stanford.

1 Introduction

The recent breakthroughs in genomics have allowed new rational approaches to the diagnosis and treatment of complex diseases such as cancer or type 2 diabetes. The role of bioinformatics in this domain has become essential, not just for managing the huge amounts of diverse data available, but also for extracting biological meaning out of heterogeneous data produced by different labs using widely different experimental techniques.

The study of complex diseases has been revolutionized by the advent of whole-genome measurements of gene expression using microarrays. These allow the determination of gene expression levels of virtually all genes of a given organism in a variety of different samples, for example coming from normal and diseased tissues. However, the initial enthusiasm related to such microarray data has been tempered by the difficulty of their interpretation. It has become obvious that additional available knowledge has to be somehow used in the data analysis process. However, the complexity of the types of knowledge involved renders any known data analysis algorithm inapplicable. Thus, we need to integrate at a deep semantic level the existing domain knowledge with the partial results from data analysis.

Semantic Web technology, and especially the reasoning facilities that it will offer, turns out to be indispensable in the biological domain at all levels:
- At the lower data access level, we are dealing with huge data and knowledge bases that are virtually impossible to duplicate on a local server. A mediator-type architecture [15] would therefore be useful for integrating the various resources and for bridging their heterogeneity.
- At the level of data schemas, we frequently encounter in this domain very complex semi-structured data sources – accessing their contents at a semantic level requires precise machine-interpretable descriptions of the schemas.
- Finally, the data and knowledge refer to complex conceptual constructions, which require the use of common domain ontologies for bridging the semantic heterogeneities of the sources.

In the following we describe an initial attempt at developing a full-fledged system for integrated reasoning about biological data and knowledge using Semantic Web reasoning technology. The system is designed as an open system, able to quickly accommodate various data sources of virtually all types (semi-structured, textual, databases, etc.). At this time, we are using the state-of-the-art XML query language XQuery [8] for implementing the wrappers to the Web-based sources (either in XML or possibly non-well-formed HTML), the Flora2 [9] F-logic implementation for reasoning, and a Tomcat-based implementation of the Web application server.

2 The pancreatic cancer dataset

In the following we describe an application of the technology to the analysis of a public pancreatic cancer dataset produced in the Pollack lab at Stanford [1]. Bashyam et al. [1] have performed simultaneous array Comparative Genomic Hybridization (array-CGH) and microarray expression measurements on a set of 23 human pancreatic cell lines (with two additional normal-normal reference array-CGH measurements) using cDNA microarrays containing 39632 human cDNAs (representing about 26000 named human genes). Array-CGH measurements involved co-hybridizing Cy5-labeled genomic DNA from each cell line along with Cy3-labeled sex-matched normal leukocyte DNA. Expression profiling was performed with reference RNA derived from 11 different human cell lines. We retrieved the normalized intensity ratios from the Stanford Microarray Database [4] and used the CGH-Miner software as described in [1] to identify DNA copy number gains and losses.

Expression ratios were called significant if they either exceeded the threshold θEXPR+ = 2, or were below θEXPR− = 0.5. Since for certain microarray spots expression ratios may be poorly defined (mainly due to low intensities in one of the two channels), we only retained genes whose expression ratios were well measured in at least 14 of the 23 samples. Unlike Bashyam et al., who performed mean centering of the (log-)expression ratios of the genes (to emphasize their relative levels among samples), we avoid mean-centering or variance normalization of the ratios, since we are interested in identifying systematically over-/under-expressed genes, the expression level being important for this purpose. We constructed the following two lists of "common" up- and respectively down-regulated genes: Common+ and Common−.
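The gene selection criteria described above can be sketched as follows. This is an illustrative Python fragment, not the authors' code; in particular the `min_fraction` majority cut-off for calling a gene "commonly" regulated and the use of `None` for poorly measured spots are assumptions.

```python
# Thresholds and the well-measured cut-off are taken from the text above;
# the majority fraction used to call a gene "common" is an assumption.
THETA_UP, THETA_DOWN = 2.0, 0.5
MIN_MEASURED = 14

def classify(gene_ratios, min_fraction=0.75):
    """gene_ratios: list of expression ratios over the 23 samples;
    None marks a poorly measured spot. Returns 'up', 'down', or None."""
    measured = [r for r in gene_ratios if r is not None]
    if len(measured) < MIN_MEASURED:
        return None                      # too few well-measured spots
    up = sum(r >= THETA_UP for r in measured)
    down = sum(r <= THETA_DOWN for r in measured)
    if up >= min_fraction * len(measured):
        return 'up'
    if down >= min_fraction * len(measured):
        return 'down'
    return None
```

Genes classified 'up' would populate Common+ and genes classified 'down' would populate Common−, under the stated assumption about the commonness criterion.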

3 The data sources

The architecture of the application is presented in Figure 2 in the Appendix. The application uses various data and knowledge sources, ranging from semi-structured data to databases of literature-based paper abstracts. We initially integrated the following sources:

NCBI/Gene. The e-utilities [10] interface to the NCBI Gene database [11] returns gene-centred information in XML format. Using an XQuery wrapper, we extracted gene symbols, names, descriptions, domains (originating from Pfam or CCD), and literature references. We also extracted the Gene Ontology (GO) [12] annotations of the genes, as well as the pathways1 and interactions2 in which these are known to be involved.

TRED. The Transcriptional Regulatory Element Database TRED [7] contains knowledge about transcription factor binding sites in gene promoters. Such information is essential for determining potentially co-expressed genes and for linking them to signaling pathways.

Biocarta [6] is a pathway repository containing mostly graphical representations of pathways contributed by an open community of researchers. We have developed an XQuery wrapper that currently extracts the lists of genes involved in the various pathways.

Pubmed. Literature references to genes and their interactions extracted from Pubmed abstracts [13] will also be integrated into the system.

The above sources contain complementary information about the genes, their interactions and pathways, none of which can be exploited to its full potential in isolation. For example, the GO annotations of genes can be used to extract the main functional roles of the genes involved in the disease under study. Many such genes are receptors or their ligands, intra-cellular signal transducers, transcription factors, etc.

1 Originating from KEGG or Reactome.
2 Taken e.g. from BIND or HPRD.

And although many of these genes are known to be involved in cancer (as oncogenes or tumor suppressors), the GO annotations will not allow us to determine their interactions and pathway membership. These can only be extracted from explicit interaction or pathway data sources, such as TRED, BIND, Biocarta, etc.

4 A unified model of the data sources

In order to be able to jointly query the data sources, a unified model is required. We used the prototype system described in [16] to implement a mediator over the above-mentioned data sources. The system uses F-logic [9] for describing the content of information sources as well as the domain ontology. However, we also consider the possibility of using Xcerpt [17] at this level.

4.1 Mapping rules

Since the sources are heterogeneous, we use so-called "mapping rules" to describe their content in terms of a common representation or ontology. For example, we can retrieve direct interactions either from the gene-centred NCBI Gene database, or from TRED:

    di(I):direct_interaction[gene->G1, other_gene->G2, int_type->IntType,
                             source->'ncbi_gene', description->Desc, pubmed->PM] :-
        query_source('ncbi_gene_interactions', 'bashyam')@query,
        I:interaction[gene->G1, other_gene->G2, description->Desc,
                      pubs->PM]@'ncbi_gene_interactions',
        if (str_sub('promoter',Desc,_)@prolog(string))
        then IntType = 'p-d'
        else IntType = 'p-p'.

    di(I):direct_interaction[gene->G1, other_gene->G2, int_type->IntType,
                             source->'tred'] :-
        query_source('tred', 'bashyam')@query,
        I:interaction[tf->G1, gene->G2]@'tred',
        IntType = 'p-d'.

The common representation refers to direct interactions by the direct_interaction Flora2 object. We distinguish between two types of interactions:
- protein-to-DNA ('p-d'), which refers to transcription regulatory influences between a protein and a target gene, and
- protein-to-protein ('p-p'), which comprises all other types of interactions.

The distinction is important since the gene expression data analyzed reveals only changes in expression levels. Thus, while the protein-to-DNA interactions could in principle be checked against the expression data, the protein-to-protein interactions are complementary to the expression data3 and could reveal the cellular functions of the associated proteins. While certain types of knowledge are more or less explicit in the sources (for example, the interaction type is 'p-d' if the description of the interaction contains the substring 'promoter'), in other cases we may have to describe implicit knowledge about sources (i.e. knowledge that applies to the source but cannot be retrieved from it – for example, the TRED database contains only interactions of type 'p-d', but this is nowhere explicitly recorded in the data).
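The same mapping logic can be sketched in Python. This is illustrative only, not the system's code: the record field names follow the Flora2 rules above, but the dictionary-based common representation is an assumption.

```python
def map_ncbi_interaction(rec):
    """Map an NCBI Gene interaction record to the common representation.
    As in the first mapping rule, the type is 'p-d' if the description
    mentions a promoter, and 'p-p' otherwise."""
    int_type = 'p-d' if 'promoter' in rec.get('description', '') else 'p-p'
    return {'gene': rec['gene'], 'other_gene': rec['other_gene'],
            'int_type': int_type, 'source': 'ncbi_gene',
            'description': rec.get('description'), 'pubmed': rec.get('pubs')}

def map_tred_interaction(rec):
    """Mirror of the second rule: TRED holds only transcription-factor /
    target-gene pairs, so the type is implicitly 'p-d' (knowledge about
    the source that is not recorded in its data)."""
    return {'gene': rec['tf'], 'other_gene': rec['gene'],
            'int_type': 'p-d', 'source': 'tred'}
```

Downstream rules then operate on the unified dictionaries regardless of which source produced them.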

4.2 Model rules

Although in principle the wrappers and the mapping rules are sufficient for being able to formulate and answer any query to the sources, it is normally convenient to construct a more complex model that is as close as possible to the conceptual model of the users (molecular biologists/geneticists in our case). This is achieved using so-called "model rules", which refer to the common representation extracted by the mapping rules to define the conceptual view (model) of the problem.

3 i.e. cannot be derived from it.

For example, we may want to query the system about "functional" interactions (which are not necessarily direct interactions). More precisely, a functional interaction between two genes can be either due to a direct interaction, or to membership in the same pathway, or to their co-reference in some literature abstract from Pubmed:

    pi(I1,I2):pathway_interaction[gene->G1, other_gene->G2, int_type->IntType,
                                  source->[Src1,Src2], pathway->P,
                                  role(G1)->R1, role(G2)->R2] :-
        I1:pathway[name->P, gene->G1, gene_description->GN1, role(G1)->R1, source->Src1],
        I2:pathway[name->P, gene->G2, gene_description->GN2, role(G2)->R2, source->Src2],
        interaction_type(R1,R2,IntType).

    interaction_type(target_gene, target_gene, coexpression) :- !.
    interaction_type(target_gene, Role2, transcriptional) :- Role2 \= target_gene, !.
    interaction_type(Role1, target_gene, transcriptional) :- Role1 \= target_gene, !.
    interaction_type(Role1, Role2, same_pathway) :-
        Role1 \= target_gene, Role2 \= target_gene, !.

    fi(I):functional_interaction[gene->G1, other_gene->G2, int_type->IntType, source->Src] :-
        I:direct_interaction[gene->G1, other_gene->G2, int_type->IntType, source->Src]
        ; I:pathway_interaction[gene->G1, other_gene->G2, int_type->IntType, source->Src]
        ; I:literature_interaction[gene->G1, other_gene->G2, int_type->IntType, source->Src].

We may also define classes of genes based on their GO annotations. For example, the following rules extract receptors, ligands and respectively transcription regulators:

    r(I):gene_role[gene->G, category->C, role->receptor, source->Src] :-
        I:gene_category[gene->G, category->C, source->Src],
        str_sub('receptor',C,_)@prolog(string),
        str_sub('activity',C,_)@prolog(string).

    r(I):gene_role[gene->G, category->C, role->ligand, source->Src] :-
        I:gene_category[gene->G, category->C, source->Src],
        str_sub('receptor',C,_)@prolog(string),
        ( str_sub('binding',C,_)@prolog(string)
        ; str_sub('ligand',C,_)@prolog(string) ).

    r(I):gene_role[gene->G, category->C, role->transcription_regulator, source->Src] :-
        I:gene_category[gene->G, category->C, source->Src],
        ( str_sub('DNA binding',C,_)@prolog(string)
        ; str_sub('transcription',C,_)@prolog(string) ).

Such classes of genes can be used to "fill in" templates of signaling chains, such as ligand → receptor → signal transducer → … → transcription factor, which could in principle be reconstructed using knowledge about interactions:

    generic_signaling_chain_interaction(ligand, receptor, 'p-p').
    generic_signaling_chain_interaction(receptor, signal_transducer, 'p-p').
    generic_signaling_chain_interaction(signal_transducer, signal_transducer, 'p-p').
    generic_signaling_chain_interaction(signal_transducer, transcription_factor, 'p-p').
    generic_signaling_chain_interaction(transcription_factor, target_gene, 'p-d').
    generic_signaling_chain_interaction(modulator, receptor, 'p-p').
    generic_signaling_chain_interaction(modulator, signal_transducer, 'p-p').
    generic_signaling_chain_interaction(modulator, transcription_factor, 'p-p').

    signaling_chain(sig_chain(G), G, Role) :-
        Role = receptor,
        _:gene_role[gene->G, role->Role].
    signaling_chain(S, G2, Role2) :-
        signaling_chain(S, G1, Role1),
        generic_signaling_chain_interaction(Role1, Role2, IntType),
        _:direct_interaction[gene->G1, other_gene->G2, int_type->IntType, source->Src],
        _:gene_role[gene->G2, role->Role2].

Note that the signaling chains are initialized with receptors, since these are the starting points of signaling cascades and are typically affected in most cancer samples (including our pancreatic cancer dataset).

In our cancer dataset analysis application, the transcription factors play an important role, since the co-expression of their gene targets can reveal the groups of genes that are differentially co-regulated in the disease:

    tf_binding(G1, G2, IntType) :-
        _:gene_role[gene->G1, category->C1, role->transcription_regulator],
        _:direct_interaction[gene->G1, other_gene->G2, int_type->IntType, source->Src],
        _:gene_list[gene->G2, list->common].

Figure 1 below shows the graph generated by the system in response to the following query (Cytoscape [18] is used for visualization):

    ?- show_graph(${tf_binding(TF,G,IntType)}, [TF,G,IntType]).

5 Conclusions and future work

Our initial experiments confirmed the feasibility of our approach and led to a number of interesting observations. Although all processing was performed in-memory, the system was able to deal with the complete data sources mentioned above for the selection of "common" genes (359 genes):
- NCBI Gene interactions: 2239
- TRED interactions: 10717
- Biocarta gene to pathway membership relations: 5493
- NCBI gene to pathway membership relations: 622
- Other pathway membership relations: 5095
- GO annotations: 2394
- Domains: 614.

From a certain perspective, the approach is a combination of remote-source mediation and data-warehousing. As in a mediation approach, only the relevant entries of remote data sources are retrieved, but these are stored in a local warehouse by the wrappers (in XML format) to avoid repetitive remote accesses over the Web. Such exploratory queries involving large datasets and combinatorial reasoning typically have slow response times (seconds to minutes if the relevant sources have been accessed previously and are therefore in the local warehouse; if not, response times depend on the size of the data to be transferred from remote sources and on the connection speed). However, as far as we know, other existing approaches are either slower4 or cannot deal with such datasets at all.

Finally, there are certain technical issues whose improvement would lead to a significantly better Semantic Web reasoning system:
- Query planning
- Streaming
- Source capabilities
- Support for (semi-)automated development of wrappers.
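The mediation-plus-warehousing idea described above can be sketched as follows. This is a minimal Python sketch assuming a file-per-entry JSON cache; the `CachingWrapper` class, the `fetch_fn` hook, and the cache layout are invented for illustration (the actual system stores retrieved entries as XML).

```python
import json
import os

class CachingWrapper:
    """Fetch an entry from a remote source at most once, then serve it
    from a local, file-based 'warehouse' on every later query."""
    def __init__(self, name, fetch_fn, cache_dir='warehouse'):
        self.name, self.fetch_fn, self.cache_dir = name, fetch_fn, cache_dir
        os.makedirs(cache_dir, exist_ok=True)

    def _path(self, key):
        return os.path.join(self.cache_dir, f'{self.name}_{key}.json')

    def get(self, key):
        path = self._path(key)
        if os.path.exists(path):          # warehouse hit: no remote access
            with open(path) as f:
                return json.load(f)
        data = self.fetch_fn(key)         # slow remote retrieval
        with open(path, 'w') as f:        # store for later queries
            json.dump(data, f)
        return data
```

This reproduces the observed behaviour: the first exploratory query over a source is slow, while repeated queries hit only the local warehouse.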

4 In the case of systems based on plain Prolog (with no tabling or other similar optimizations).

Figure 1. Transcription regulatory relationships among "common" genes in the Bashyam et al. pancreatic cancer dataset (arrows: 'p-d', undirected edges: 'p-p' interactions)

From the biological point of view, the system has proved to be very useful for creating a global "picture" of the interactions among the genes differentially expressed in pancreatic cancer. The large number (359) of these genes would have made the task extremely difficult, if not impossible, for a human exploration of the data sources. For example, note the involvement of:
- the Epidermal Growth Factor Receptor EGFR, known to be involved in many cancers
- BCL2, a gene involved in the apoptotic response of cells (note that the down-regulation of BCL2 in pancreatic cancer is quite unusual for an anti-apoptotic gene, since it is normally over-expressed in other tumor types [14])
- the transcription factors FOS, MYB, LEF1
- the metalloproteinases MMP3 and MMP7 (involved in tissue remodeling, invasion, tumor progression, metastasis and, in the case of MMP3, tumor initiation)
- the nuclear receptor PPARG, a regulator of differentiation known to be involved in cancer, and PPARGC1A, its coactivator.

The biological interpretation of the results is outside the scope of this paper and will be discussed elsewhere in a specialized paper.

6 References

1. Bashyam MD et al. Array-based comparative genomic hybridization identifies localized DNA amplifications and homozygous deletions in pancreatic cancer. Neoplasia. 2005 Jun;7(6):556-62.
2. Westphal S, Kalthoff H. Apoptosis: targets in pancreatic cancer. Mol Cancer. 2003 Jan 7;2:6. Review.
3. Lipson D, et al. Joint Analysis of DNA Copy Numbers and Gene Expression Levels. Proceedings of Algorithms in Bioinformatics: 4th International Workshop, WABI 2004, Bergen, Norway, September 17-21, 2004, Lecture Notes in Computer Science (LNCS), Vol. 3240/2004, p. 135, Springer 2004.
4. The Stanford Microarray Database. http://genome-www5.stanford.edu
5. Bhattacharjee et al. Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. Proc. Natl. Acad. Sci. USA. 2001 Nov 20;98(24):13790-5.
6. Biocarta. www.biocarta.com
7. TRED. http://rulai.cshl.edu/cgi-bin/TRED/tred.cgi?process=home
8. Qizxopen. http://www.xfra.net/qizxopen/
9. Flora2. http://flora.sourceforge.net/
10. NCBI e-utilities. http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html
11. NCBI Gene. http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene
12. Gene Ontology. http://www.geneontology.org/
13. Pubmed. http://www.ncbi.nih.gov/entrez/query.fcgi?db=PubMed
14. Westphal S, Kalthoff H. Apoptosis: targets in pancreatic cancer. Mol Cancer. 2003 Jan 7;2:6. Review.
15. Wiederhold G. Mediators in the architecture of future information systems. IEEE Computer 25(3), 1992, 38-49.
16. Liviu Badea, Doina Tilivea, Anca Hotaran. Semantic Web Reasoning for Ontology-Based Integration of Resources. Proc. PPSWR 2004, pp. 61-75, Springer Verlag.
17. Berger S., Bry F., Schaffert S., Wieser C. Xcerpt and visXcerpt: From Pattern-Based to Visual Querying of XML and Semistructured Data. Proceedings VLDB03, Berlin, September 2003, http://www.xcerpt.org/
18. Cytoscape. http://www.cytoscape.org

Appendix.

Figure 2. The architecture of the pancreatic cancer dataset analysis application

REWERSE

Workpackage A3

Personalised Information Systems 1 Demos

1.1 The Personal Reader Framework: Personalization Services for the Semantic Web

This application demonstrates how to provide personalized, syndicated views on distributed Web data using Semantic Web technologies. The application comprises four steps: the information gathering step, in which information from distributed, heterogeneous sources is extracted and enriched with machine-readable semantics; the operation step for timely and up-to-date extractions; the reasoning step, in which rules reason about the created semantic descriptions and additional knowledge bases like ontologies and user profile information; and the user interface creation step, in which the RDF descriptions resulting from the reasoning step are interpreted and translated into an appropriate, personalized user interface. We have developed this application for solving the following real-world problem: we provide personalized, syndicated views on the publications of a large European research project with more than twenty geographically distributed partners and embed this information with contextual information on the project, its working groups, information about the authors, related publications, etc.

Contact: Nicola Henze (Hannover)
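The gather/reason/render flow of the four steps described above can be sketched as a pipeline. This is a deliberately toy Python sketch: all function bodies, the triple encoding, and the user-profile format are invented stand-ins for the actual Personal Reader services, and the operation step (scheduled re-extraction) is omitted.

```python
def gather(sources):
    """Information gathering: extract data from distributed sources and
    enrich it with machine-readable semantics (stub returns RDF-like dicts)."""
    return [{'subject': s, 'predicate': 'hasPublication', 'object': f'pub-of-{s}'}
            for s in sources]

def reason(triples, user_profile):
    """Reasoning step: rules combine the semantic descriptions with user
    profile information (here: keep only triples about followed partners)."""
    return [t for t in triples if t['subject'] in user_profile['follows']]

def render(triples):
    """UI creation: translate the resulting RDF descriptions into a
    personalized view (here: plain strings)."""
    return [f"{t['subject']} -> {t['object']}" for t in triples]

view = render(reason(gather(['Hannover', 'Torino', 'München']),
                     {'follows': {'Hannover', 'Torino'}}))
```

The point of the sketch is the separation of concerns: each step consumes and produces semantic descriptions, so the reasoning rules and the UI generation can be swapped independently.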

2 Extended Abstracts

2.1 Verifying web service conformance and interoperability w.r.t. a global choreography

Authors: Matteo Baldoni, Cristina Baroglio, Alberto Martelli, Viviana Patti

2.2 The Personal Publication Reader

Authors: Fabian Abel, Robert Baumgartner, Adrian Brooks, Christian Enzi, Georg Gottlob, Nicola Henze, Marcus Herzog, Matthias Kriesell, Wolfgang Nejdl, and Kai Tomaschewski

Verifying web service conformance and interoperability w.r.t. a global choreography

Matteo Baldoni, Cristina Baroglio, Alberto Martelli, Viviana Patti
Dipartimento di Informatica, Università degli Studi di Torino
C.so Svizzera, 185 — I-10149 Torino (Italy)
Email: {baldoni,baroglio,mrt,patti}@di.unito.it

Abstract— Global choreographies define the rules that peers should respect in their interaction, with the aim of guaranteeing interoperability. An abstract choreography can be seen as a protocol specification; it does not refer to specific peers and, especially in an open application domain, it might be necessary to retrieve a set of web services that fit in it. A crucial issue, which is attracting attention, is verifying whether the business processes of some peers, in particular the parts that encode the communicative behavior, will produce interactions which are conformant to the agreed protocol (legality issue). This issue is tackled by the so-called conformance test, which is a means for certifying the capability of interacting of the involved parts: two peers that are proved conformant to a same protocol will actually interoperate by producing a legal conversation. This work proposes an approach to the verification of a priori conformance of a business process to a protocol, which is based on the theory of formal languages and guarantees the interoperability of peers that are individually proved conformant.

I. INTRODUCTION

The next Web generation promises to deliver Semantic Web Services that can be retrieved and combined in a way that satisfies the user. It opens the way to many forms of service-oriented personalization. Indeed web services provide a suitable infrastructure for constructing Plug&Play-like environments, where the user can select and combine the kinds of services s/he prefers. Personalization can be obtained by taking different approaches, e.g. by developing services that offer personalization functionalities, by personalizing the way in which services are discovered, selected, invoked and composed so as to meet specific user's requirements, or by customizing the composition of different services offering personalization.

A prerequisite to this is the definition of an infrastructure for semantic interoperability of web services, provided by the evolution of the Semantic Web initiatives. Indeed functionalities for performing personalization require a machine-processable knowledge layer that is not supplied by the current web. Web services should be augmented with public machine-interpretable semantic descriptions of their capabilities, such that a rational inspection of their behavior is enabled and new applications encapsulating personalization functionalities can be developed on this basis.

So far, most of the current standard technologies for web services (e.g. WSDL [16], BPEL4WS [9], WS-CDL [33]) provide descriptions of the service capabilities, business process orchestration and choreography at the syntactic level. Such descriptions do not rely on well-founded models that make it possible to define access and usage mechanisms without necessitating human intervention and to perform the analysis of the described process. But the capability of performing this analysis is fundamental to the real implementation of those sophisticated forms of flexibility and composition that one expects in the context of personalization on the web. To meet this requirement, one possibility is to focus on giving the standard languages a formal semantics, by translation into formal models. Part of the formal methods community has focused its attention on capturing the behavior of BPEL and WS-CDL in a formal way, and many translations of such languages into models supporting analysis and verification (process algebras, Petri nets, finite state machines) are currently under investigation [11], [10].

In parallel to the industrial standards, some proposals of standards for describing Semantic Web Services have been developed within the Semantic Web Initiative [15]. In this area we can distinguish two main approaches: IRS III [27], which is based on a knowledge-oriented approach and relies on the WSMO ontology [34], and OWL-S [30], which is based on an agent-oriented approach. The common goal of such proposals is augmenting web services with semantic descriptions that enable some kind of automatic service discovery and composition. Such semantic descriptions may concern service goals and capabilities as well as the possible compositions or choreographies.

Let us focus on the description of the choreography of a set of services. Recently W3C proposed the choreography language WS-CDL [33], well-characterized and distinguished from languages for business process representation, like BPEL. Choreographies aim at expressing global interaction protocols, i.e. rules that define the global behavior of a system of cooperating parties. The respect of these rules guarantees the interoperability of the parties (i.e. the capability of actually producing an interaction), and that the interactions will satisfy given requirements.

In the context of personalization, semantic descriptions of choreographies can provide a basis for enabling service discovery/composition personalized w.r.t. the user's goal, or for developing semantic personalization services that can be combined or customized w.r.t. the user's requirements. This is the idea behind the approach taken in [4] (see Section II). Services are augmented with a high-level description of their global interaction protocols, about which agent applications can reason so as to personalize the selection and the composition of services and meet specific user's requirements.

However, one key issue for enabling open and interoperable personalization functionalities is the development of formal methods for verifying if the behavior of a service respects a choreography. This kind of verification is known in the literature as the conformance test. The applications would be various. A choreography could be used at design time (a priori) for verifying if the internal processes of a new personalization service enable it to participate appropriately in the interaction with other personalization services, interaction that is specified by the choreography. At run-time, choreographies could be used to verify if everything is proceeding according to the current agreements. Section III reports our results concerning the proof of interoperability and conformance of services to a global description of their interaction.

II. PERSONALIZATION OF THE INTERACTION WITH WEB SERVICES

One of the needs that have inspired recent research [8] is the study of declarative descriptions of web services, aimed at allowing forms of automated interoperation that include the automation of matchmaking, of execution, and of service selection and composition, in a way that is customized w.r.t. the user's goals and needs; indeed, a form of personalization [2]. In particular, selection and composition are not always to be performed on the sole basis of general properties of the services themselves, such as their category or their functional compositionality; they should also take into account the user's intentions and purposes. As a quick example, consider a service that allows buying products, alternatively paying cash or by credit card: a user might have preferences on the form of payment to enact. This information should be

agent communication languages (ACL), e.g. FIPA [23] and KQML [22]. Recently, most of the efforts have been devoted to the definition of formal models of interaction, based on conversation protocols. Protocols improve the interoperability of the various components (often separately developed) and allow the verification of compliance to the desired standards.

The basic idea is to consider a service as a software agent, and the problem of composing a set of web services as the problem of making a set of software agents interact and cooperate within a MAS. This interpretation is quite natural, and shared in proposals that are closer to the agent research community or more properly set in the Semantic Web research field [12], [31]. Among the others, let us recall the OWL-S [30] experience. In [12] the goal of providing greater expressiveness to service description in a way that can be reasoned about has been pursued by exploiting agent technologies based on the action metaphor; in particular, at the level of the process model, a service is described in a way inspired by the agent language GOLOG and its extensions [28], [24], [29]. Reasoning techniques supported by the language are used to produce composite and customized services.

On this line, we have studied the benefits provided by a declarative description of the communicative behavior, in terms of personalization of the service selection and composition. A better personalization can be achieved by focussing on the abstraction of web services as entities that communicate by following predefined, public and sharable interaction protocols, and by allowing agents to reason about high-level descriptions of the interaction protocols followed by web services. We model the interaction protocols provided by web services by a set of logic clauses, using an extension of the agent programming language DyLOG [7], [4].

Having a logic specification of the protocol, it is possible to reason about the effects of engaging in specific conversations. We have proposed to use techniques for reasoning about actions for performing the selection and composition of web services, in a way that is customized w.r.t. the user's request.
Commu- taken into account when buying at an e-shop, singleing out nication can, in fact, be considered as the behavior resulting a specific course of interaction that allows buying cash. This from the application of a special kind of actions: speech acts. form of personalization can be obtained by applying reasoning The reasoning problem to face can intuitively be described techniques on a description of the service process, that has a as looking for a an answer to the question “Is it possible to well-defined meaning for all the parties involved. In this issue make a deal with this service respecting the user’s goals?”. it is possible to distinguish three necessary components: Given a logic-based representation of the service policies and • web services capabilities must be represented according a representation of the customer’s needs as abstract goals, to some declarative formalism with a well-defined seman- expressed by a logic formula, logic programming reasoning tics, as also recently observed by van der Aalst [32]; techniques are used for understanding if the constraints of the • automated tools for reasoning about such a description customer fit in with the policy of the service. must be developed; Our proposal can be considered as an approach based on • in order to gain flexibility, reasoning tools should repre- the process ontology, a white box approach in which part sent such requests as abstract goals. of the behavior of the services is available for a rational The approach that we propose in [4] inherits from the inspection. A description of the communicative behavior by experience of the research community that studies MAS and, policies is definitely richer than the list of input and output, in particular, logic-based formalizations of interaction aspects. precondition and effect properties usually taken into account Indeed, communication has intensively been studied in the for the matchmaking. 
Actually, the approach can be considered context of formal theories of agency [18], [17] and a great as a second step in the matchmaking process, which narrows a deal of attention has been devoted to the definition of standard set of already selected services and performs a customization of the interaction with them. these rules guarantees the interoperability of the parties (i.e. Present web service technology is quite primitive w.r.t. the the capability of actually producing an interaction), and that framework we propose, and the realization of the picture the interactions will satisfy given requirements. sketched above requires an integration, in the current tools In this context, a crucial problem is the development of for web service representation and handling, of knowledge formal methods for verifying if a service respects a chore- representation languages –in the line of those developed in ography. The applications would be various. A choreography the Semantic Web area– and of techniques for reasoning and could be used at design time for verifying if the internal dealing with communication, inspired by those studied in the processes of a service enable it to participate appropriately in area of MAS. Even if the integration is still far from being real the interaction. At run-time, choreographies could be used to let us describe our vision of the steps to be taken towards the verify if everything is proceeding according to the agreements. realization. Public descriptions of interaction protocols should A choreography could also be used unilaterally to detect be mapped to public descriptions of choreographies (e.g. WS- exceptions (e.g. a message was expected but not received) or CDL-like descriptions). A choreography defines a global view help a participant in sending messages in the right order and of the protocol followed by a certain service, e.g. a cinema ser- at the right time. 
vice, for accomplishing the cooperative task of booking a cin- In the last years the agent community already started to face ema ticket. A costumer service, that in principle is interested to the two above mentioned kinds of conformance w.r.t. MAS participate to the cooperation for booking a ticket for its user, [25] (e.g. see [19], [20], [5], [3] for a priori conformance, and translates such a description in the declarative language and [1] for run-time conformance). In the web service community uses reasoning techniques, supported by the language, plus the problem of conformance is arising only recently [13] its local knowledge on the user’s preferences for checking because so far the focus has been posed on the specification of whether the contract, defined by the choreography, satisfies single services and on standards for their remote invocation. its user. The outcome is meaningful under the following The new interest is emerging due to the growing need of assumption: the implementation of the cinema service behavior making services, that are heterogeneous (in kind of platform (that could be written in an execution language like BPEL) or in language implementation), to interoperate. Therefore, must be conformant w.r.t. the choreography specification that there is a need of giving more abstract representations of the is used. Verifying the conformance and the interoperability of interactions that allow to perform reasoning in order to select web services to a global protocol definition (to be provided at and compose services disregarding the specific implementation the choreography level) is definitely one of the hot topics at details. Given our experience in the area of MASs, where the the moment and is the issue addressed by the next section. 
heterogeneity of the components is a fundamental characteris- tic, we agree with the observation by van der Aalst [32] that III.WEBSERVICEINTEROPERABILITY there is a need for a more declarative representation of the According to Agent-Oriented Software Engineering [26], behaviour of services. a distinction is made between the global and the individ- In this line, the work in [5], [3] about conformance of ual points of view of interaction. The global viewpoint is agent implementations w.r.t. protocol specifications has been captured by an abstract protocol, expressed by formalisms adapted to the case of web services in [6]. In particular, in like AUML, automata or Petri Nets. The local viewpoint, [6] we focus on testing a priori conformance and develop instead, is captured by tthe agent’s policy; being part of the a framework based on the use of formal languages. In this agent’s implementation, the policy is usually written in some framework a global interaction protocol (a choreography), is executable language. Having these two levels of description it represented as a finite state automaton, whose alphabet is is possible to decide whether an agent can take a role in an the set of messages exchanged among services. It specifies interaction: this problem can be read as proving if the agent’s permitted conversations. Atomic services, that have to be policy conforms to the abstract protocol specification. composed according to the choreography, are described as A similar need of distinguishing a global and a local view of finite state automata as well. Given such a representation we the interaction is recently emerging also in the area of Service capture a concept of conformance that answers positively to Oriented Architectures. 
Here a distinction is made between all these questions: is it possible to verify that a service, the choreography of a set of services, a global specification playing a role in a given global protocol, produces at least of the desired interaction, and the concept of behavioral those conversations which guarantee interoperability with interface, seen as the specification of the interaction from other conformant service? Will such a service always follow the point of view of the individual service. The recent W3C one of these conversations when interacting with the other proposal of the choreography language WS-CDL [33], well- parties in the context of the protocol? Will it always be able to characterized and distinguished from languages for business conclude the legal conversations it is involved in? Technically, process representation, like BPEL, is emblematic. the conformance test is based on the acceptance of both the Taking this perspective, choreographies and agent commu- service behavior and the global protocol by a special finite nication protocols undoubtedly share a common purpose. In state automaton. Briefly, at every point of a conversation, we fact, they both aim at expressing the rules that define the global expect that a conformant policy never utters speech acts that behavior of a system of cooperating parties. The respect of are not expected, according to the protocol, and we also expect it to be able to handle any message that can possibly be [8] A. Barros, M. Dumas, and P. Oaks, “A critical overview of the web received, once again according to the protocol. However, the services choreography description language(ws-cdl),” Business Process Trends, 2005, http://www.bptrends.com. policy is not obliged to foresee (at every point of conversation) [9] BPEL4WS, “http://www-106.ibm.com/developerworks/library/ws-bpel,” an outgoing message for every alternative included in the 2003. protocol (but it must foresee at least one of them). [10] M. Bravetti, L. 
Kloul, and G. Zavattaro, Eds., Proc. of the 2nd Interna- tional Workshop on Web Services and Formal Methods (WS-FM 2005), The interesting characteristic of this test is that it guarantees ser. LNCS, no. 3670. Springer, 2005. the interoperability of services that are proved conformant [11] M. Bravetti and G. Zavattaro, Eds., Proc. of the 1st Int. Workshop on Web individually and independently from one another. By inter- Services and Formal Methods (WS-FM 2004). Elsevier Science Direct, 2004, vol. 105 of Electronic Notes in Theoretical Computer Science. operability we mean the capability of an agent of actually [12] J. Bryson, D. Martin, S. McIlraith, and L. A. Stein, “Agent-based producing a conversation when interacting with another. The composite services in DAML-S: The behavior-oriented design of an conformance test has been proved decidable when the lan- intelligent semantic web,” 2002. [Online]. Available: citeseer.nj.nec. com/bryson02agentbased.html guages used to represent all the possible conversations w.r.t. [13] N. Busi, R. Gorrieri, C. Guidi, R. Lucchi, and G. Zavattaro, “Chore- the policy and w.r.t. the protocol are regular. ography and Orchestration: a synergic approach for system design,” in The application of our approach is particularly easy in Proc. the 3rd Int. Conf. on Service Oriented Computing, 2005. [14] L. Cabac and D. Moldt, “Formal semantics for auml agent interaction case a logic-based declarative language is used to implement protocol diagrams,” in Proc. of AOSE 2004, 2004, pp. 47–61. the policies. In logic languages indeed policies are usually [15] T. Cabral, J. Domingue, E. Motta, T. Payne, and F. Hakimpour, expressed by Prolog-like rules, which can be easily converted “Approaches to semantic web services: An overview andcompar- isons,” in proceedings of the First European Semantic Web Symposium in a formal language representation. In [3] we show this by (ESWS2004), Heraklion, Crete, Greece, 2004. 
means of a concrete example where the language DyLOG [16] R. Chinnici, M. Gudgin, J. J. Moreau, and S. Weerawarana, “Web [7], based on computational logic, is used for implementing Services Pescription Language (WSDL) version 1.2,” 2003, working Draft. the agents’ policies. On the side of the protocol specification [17] F. Dignum, Ed., Advances in agent communication languages, ser. languages, currently there is a great interest in using informal, LNAI, vol. 2922. Springer-Verlag, 2004. graphical languages (e.g. UML-based) for specifying protocols [18] F. Dignum and M. Greaves, “Issues in agent communication,” in Issues in Agent Communication, ser. LNCS, vol. 1916. Springer, 2000, pp. and in the translation of such languages in formal languages 1–16. [14], [21]. By this translation it is, in fact, possible to prove [19] U. Endriss, N. Maudet, F. Sadri, and F. Toni, “Protocol conformance for properties that the original representation does not allow. In logic-based agents,” in Proc. of the 18th International Joint Conference on Artificial Intelligence (IJCAI-2003), G. Gottlob and T. Walsh, Eds. this context, in [5] we have shown an easy algorithm for Morgan Kaufmann Publishers, August 2003, pp. 679–684. translating AUML sequence diagrams to finite state automata [20] ——, “Logic-based agent communication protocols,” in Advances in thus enabling the verification of conformance. Of course, agent communication languages, ser. LNAI, vol. 2922. Springer-Verlag, 2004, pp. 91–107, invited contribution. having a declarative representation of the choreographies as [21] R. Eshuis and R. Wieringa, “Tool support for verifying UML activity well, would help the proof of these properties in the context diagrams,” IEEE Trans. on Software Eng., vol. 7, no. 30, 2004. of the web services. [22] T. Finin, Y. Labrou, and J. Mayfield, “KQML as an Agent Communi- cation Language,” in Software Agents, J. Bradshaw, Ed. MIT Press, 1995. 
REFERENCES [23] FIPA, “Communicative act library specification,” FIPA (Foundation for [1] M. Alberti, M. Gavanelli, E. Lamma, P. Mello, and P. Torroni, “Spec- Intelligent Physical Agents), Tech. Rep., 2002. ification and verification of agent interactions using social integrity [24] G. D. Giacomo, Y. Lesperance, and H. Levesque, “Congolog, a concur- constraints,” in Proc. of the Workshop on Logic and Communication rent programming language based on the situation calculus,” Artificial in Multi-Agent Systems, LCMAS 2003, ser. ENTCS, W. van der Hoek, Intelligence, vol. 121, pp. 109–169, 2000. A. Lomuscio, E. de Vink, and M. Wooldridge, Eds., vol. 85(2). Eind- [25] F. Guerin and J. Pitt, “Verification and Compliance Testing,” in Com- hoven, the Netherlands: Elsevier, 2003. munication in Multiagent Systems, ser. LNAI, M. Huget, Ed., vol. 2650. [2] M. Baldoni, C. Baroglio, and N. Henze, “Personalization for the Seman- Springer, 2003, pp. 98–112. tic Web,” in Reasoning Web, ser. LNCS Tutorial, vol. 3564. Springer, [26] M. P. Huget and J. Koning, “Interaction Protocol Engineering,” in 2005, pp. 173–212. Communication in Multiagent Systems, ser. LNAI, H. Huget, Ed., vol. [3] M. Baldoni, C. Baroglio, A. Martelli, and V. Patti, “Verification of 2650. Springer, 2003, pp. 179–193. protocol conformance and agent interoperability,” in Pre-proc. of Sixth [27] I. III, “http://kmi.open.ac.uk/projects/irs/.” International Workshop on Computational Logic in Multi-Agent Systems, [28] H. J. Levesque, R. Reiter, Y. Lesperance,´ F. Lin, and R. B. Scherl, CLIMA VI, 2005, pp. 12–27. “GOLOG: A Logic Programming Language for Dynamic Domains,” J. [4] ——, “Reasoning about interaction protocols for customizing web of Logic Programming, vol. 31, pp. 59–83, 1997. service selection and composition,” Journal of Logic and Algebraic [29] S. McIlraith and T. Son, “Adapting Golog for Programmin the Semantic Programming, special issue on Web Services and Formal Methods, 2006, Web,” in 5th Int. Symp. 
on Logical Formalization of Commonsense to appear. Reasoning, 2001, pp. 195–202. [5] M. Baldoni, C. Baroglio, A. Martelli, V. Patti, and C. Schifanella, “Ver- [30] OWL-S, “http://www.daml.org/services/owl-s/1.1/,” 2004. ifying protocol conformance for logic-based communicating agents,” [31] K. Sycara, “Brokering and matchmaking for coordination of agent in Proc. of 5th Int. Workshop on Computational Logic in Multi-Agent societies: A survey,” in Coordination of Internet Agents, A. O. et al., Systems, CLIMA V, ser. LNCS, no. 3487, 2005, pp. 192–212. Ed. Springer, 2001. [6] ——, “Verifying the conformance of web services to global interaction [32] W. M. P. van der Aalst, M. Dumas, A. H. M. ter Hofstede, N. Russell, protocols: a first step,” in Proc. of 2nd Int. Workshop on Web Services H. M. W. Verbeek, and P. Wohed, “Life after BPEL?” in Proc. of WS- and Formal Methods, WS-FM 2005, ser. LNCS, no. 3670, 2005, pp. FM’05, ser. LNCS, vol. 3670. Springer, 2005, pp. 35–50, invited 257–271. speaker. [7] M. Baldoni, L. Giordano, A. Martelli, and V. Patti, “Programming [33] WS-CDL, “http://www.w3.org/tr/2004/wd-ws-cdl-10-20041217/,” 2004. Rational Agents in a Modal Action Logic,” Annals of Mathematics and [34] WSMO, “http://www.wsmo.org/,” 2005. Artificial Intelligence, Special issue on Logic-Based Agent Implementa- tion, vol. 41, no. 2–4, pp. 207–257, 2004. The Personal Publication Reader

Fabian Abel1, Robert Baumgartner2,3, Adrian Brooks3, Christian Enzi2, Georg Gottlob2,3, Nicola Henze1, Marcus Herzog2,3, Matthias Kriesell4, Wolfgang Nejdl1, and Kai Tomaschewski1

1 Research Center L3S & Information Systems Institute, University of Hannover, {abel,henze,nejdl,tomaschewski}@kbs.uni-hannover.de 2 DBAI, Institute of Information Systems, Vienna University of Technology {baumgart,enzi,gottlob,herzog}@dbai.tuwien.ac.at 3 Lixto Software GmbH, Donau-City-Strasse 1/Gate 1, 1220 Vienna, Austria {baumgartner,brooks,gottlob,herzog}@lixto.com 4 Inst. f. Math. (A), University of Hannover [email protected]

Abstract. This application demonstrates how to provide personalized, syndicated views on distributed Web data using Semantic Web technologies. The application comprises four steps: the information gathering step, in which information from distributed, heterogeneous sources is extracted and enriched with machine-readable semantics; the operation step for timely and up-to-date extractions; the reasoning step, in which rules reason about the created semantic descriptions and additional knowledge bases like ontologies and user profile information; and the user interface creation step, in which the RDF descriptions resulting from the reasoning step are interpreted and translated into an appropriate, personalized user interface. We have developed this application for solving the following real-world problem: we provide personalized, syndicated views on the publications of a large European research project with more than twenty geographically distributed partners and embed this information with contextual information on the project, its working groups, information about the authors, related publications, etc. Keywords: web data extraction, web data syndication, personalized views.

Introduction

In today's information society, the World Wide Web plays a prominent role in disseminating and retrieving information: lots of useful information can be found on the web, from train departure tables to consultation hours, from scientific data to online auctions, and so on. While this information is already available for consumption by human users, we lack applications that can collect, evaluate, combine, and re-evaluate it. Currently, users retrieve online content in separate steps, one step for each information request, and evaluate the information chunks afterwards according to their needs: e.g. the user compares the train arrival time with the starting time of the meeting he is requested to participate in. Another common scenario for researchers is that a user reads some scientific publication, gets curious about the authors, other work of the authors, or related work targeting similar research questions. Linking these information chunks together is a task that currently cannot be performed by machines. In our application, we show how to solve this information integration problem for the latter "researcher scenario". We show how to
1. extract information from distributed and inhomogeneous sites, and create semantic descriptions of the extracted information chunks,
2. maintain the Web data extraction to ensure up-to-date information and semantic descriptions,
3. reason about the created semantic descriptions and additional, ontological knowledge, and
4. create syndicated, personalized views on Web information.
The Personal Publication Reader extends the idea of Semantic Portals like e.g. SEAL [3] with the capability of extracting and syndicating Web data from various, distributed sites or portals which do not belong to the ownership of the application itself.
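The four steps enumerated above can be pictured as a simple pipeline. The following sketch is purely illustrative: all function names, record fields, and sample data are assumptions made here for exposition, not part of the actual Personal Publication Reader implementation (which uses the Lixto Suite for extraction and a TRIPLE-based reasoner).

```python
# Illustrative sketch of the four-step pipeline (names and data are hypothetical).

def extract(sources):
    """Step 1: wrap heterogeneous publication pages into uniform records."""
    return [{"title": t, "author": a, "origin": o} for (t, a, o) in sources]

def maintain(records):
    """Step 2: harmonize records so they fit a common structure."""
    return [dict(r, author=r["author"].strip().title()) for r in records]

def reason(records, interests):
    """Step 3: stand-in for rule-based reasoning, selecting records
    that match the user's interests."""
    return [r for r in records if any(i in r["title"].lower() for i in interests)]

def render(records):
    """Step 4: produce a (here: textual) personalized view."""
    return [f'{r["author"]}: {r["title"]} ({r["origin"]})' for r in records]

sources = [("Personalized Views on Web Data", "nicola henze", "Hannover"),
           ("Visual Web Information Extraction", "robert baumgartner", "Vienna")]
view = render(reason(maintain(extract(sources)), ["personalized"]))
```

In the real system each stage is, of course, far richer; the sketch only shows how the four stages compose.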

1 Information Extraction and Annotation with Semantic Descriptions

In our application, the web pages from which we extract the information are maintained by partners of the research project REWERSE; thus the sources of the information are distributed and belong to different owners, which provide their information in various ways and formats (HTML, JavaScript, PHP-generated pages, etc.). Moreover, in each list, authors, titles and other entities are potentially characterized in a different way, and different ordering criteria are enforced (e.g. by year or by name). Such a web presentation is well suited for human consumption, but hardly usable for automatic processing. Nevertheless, the web is the most valuable information resource in this scenario. In order to access and understand these heterogeneous information sources one has to apply web extraction techniques. The idea of our application is to "wrap" these heterogeneous sources into a formal representation based on Semantic Web standards. In this way, each institution can still maintain its own publication list, and at the same time we can offer an integrated and personalized view on this data by regularly extracting Web data from all member sites. This application is open in the sense that it can be extended in an easy way, i.e. by connecting additional web sources. For instance, abstracts from www.researchindex.com can be queried for each publication lacking this information and joined to each entry. Moreover, using text categorization tools one can rate and classify the contents of the abstracts. Another possibility is to extract organization and person data from the institutions' web pages to inform the ontology about the class in the taxonomy to which an author belongs (such as full professor). Web extraction and annotation in the Personal Publication Reader is performed by the Lixto Suite.
First, with the Lixto Visual Wrapper [1], a so-called wrapper is created for each type of web site; the application designer visually and semi-automatically defines the characteristics of publication elements on particular web sites, based on characteristics of the particular HTML presentation and possibly some domain knowledge. After a wrapper has been generated, it can be applied to a given web site (e.g. the publications of the University of Munich) to generate an "XML companion" that contains the relevant information stored in XML, using XML tags that are meaningful in this application context.
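To illustrate, an XML companion produced by such a wrapper might look as follows. All element and attribute names here are hypothetical, since the actual tag set is defined per application by the wrapper designer:

```xml
<!-- Hypothetical XML companion for one extracted publication list.
     Element and attribute names are illustrative assumptions. -->
<publications origin="University of Munich">
  <publication year="2005">
    <title>Sample Publication Title</title>
    <author>A. Author</author>
  </publication>
</publications>
```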

2 Extraction Maintenance

In the next step, in the Lixto Transformation Server, the application designer visually composes the information flow from the web sources to an RDF presentation that is handed over to the Personal Publication Reader once a week. The application designer then defines a schedule determining how often each Web source is queried and how often the information flow is executed. Additionally, Deep Web navigation macros, possibly containing logins and forms, are created. As a next step in the data flow, the data is harmonized to fit into a common structure: e.g. an attribute "origin" containing the institution's name is added, and author names are harmonized by being mapped to a list of names known by the system. Finally, the XML data structure is mapped to a pre-defined RDF schema structure. Once the wrappers are in place, the complete application runs without further human interference and takes care of publication updates. In case future extractions fail, the application designers receive a notification.
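The harmonization step described above can be sketched in a few lines; the mapping table, field names, and name variants below are assumptions for illustration only, not the system's actual data.

```python
# Illustrative sketch of harmonization: add an "origin" attribute and map
# extracted author-name variants to a canonical list of known names.

KNOWN_NAMES = {  # hypothetical mapping table maintained by the system
    "R. Baumgartner": "Robert Baumgartner",
    "Baumgartner, R.": "Robert Baumgartner",
    "N. Henze": "Nicola Henze",
}

def harmonize(record, institution):
    """Return a copy of the extracted record fitted into the common structure."""
    out = dict(record)
    out["origin"] = institution                                   # add provenance
    out["author"] = KNOWN_NAMES.get(record["author"], record["author"])
    return out

r = harmonize({"author": "Baumgartner, R.",
               "title": "Visual Web Information Extraction"},
              "Vienna University of Technology")
```

In the actual system this mapping feeds the subsequent XML-to-RDF transformation, so that one author resource is produced per person rather than per name variant.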

3 Reasoning for Syndicated & Personalized Views on Distributed Web Data

In addition to the extracted dynamic information, we maintain data about the members of the research project from the members' corner of the REWERSE project web site. We have constructed an ontology for describing researchers and their involvement in scientific projects like REWERSE, which extends the known Semantic Web Research Community Ontology (http://ontobroker.semanticweb.org/ontos/swrc.html) with some project-specific aspects. Personalization rules reason about all this dynamic and static data in order to create syndicated and personalized views. As an example, the following rule (using the TRIPLE [4] syntax) determines all authors of a publication:

FORALL A, P authors(A, P) <- P[dc:creator -> A]@'http:..':publications.

In this rule, @'http:..':publications is the name of the model which contains the RDF descriptions of the extracted publication information. Further rules combine information on these authors from the researcher ontology with the author information. E.g. the following rule determines the employer of a project member, which might be a company, a university, or, in general, some instance of a subclass of an organization (see the rdfs:subClassOf condition below: here, we query for some subclass, direct or inferred, of the class "Organization"):

FORALL A, I works_at(A, I) <- EXISTS A_id, X (
    name(A_id, A) AND
    ont:A_id[ont:involvedIn -> ont:I]@'http:...#':researcher AND
    ont:X[rdfs:subClassOf -> ont:Organization]@rdfschema('..':researcher) AND
    ont:I[rdf:type -> ont:X]@'http:...#':researcher).

Disambiguation of results – here especially resource identification problems caused by varying author names – is achieved by an additional name identification step. For a user with specific interests, for example "interest in personalized information systems", information on respective research groups in the project, on persons working in this field, on their publications, etc., is syndicated.
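A syndication rule of the kind just described could, for instance, select publications matching a user's interests. The following rule is a hypothetical illustration in the same TRIPLE style: the predicate relevant, the properties ont:hasTopic and ont:interest, and the model name ':profile' are assumptions made here, not taken from the actual rule base.

```
FORALL U, P relevant(U, P) <- EXISTS T (
    P[ont:hasTopic -> T]@'http:..':publications AND
    U[ont:interest -> T]@'http:..':profile).
```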

4 User Interface Provision

We run the Personal Publication Reader within our Personal Reader framework for designing, implementing and maintaining personal Web Content Readers [2]. These personal Web Content Readers allow a user to browse information (the Reader part), and to access personal recommendations and contextual information on the currently regarded Web resource (the Personal part). For the Personal Publication Reader, we instantiated a personalization Web service in our Personal Reader framework which holds the above mentioned rules. An appropriate visualization Web service for displaying the results of the reasoning step (which are provided as RDF documents and refer to an ontology of personalization functionality) has been implemented.

Availability of the Personal Publication Reader

The concept of the Personal Publication Reader and its functionality are summarized in a video, and so are the Web data extraction and maintenance tasks. All demonstration videos and access to the application itself are available via http://www.personal-reader.de/semwebchallenge/sw-challenge.html.

References

1. R. Baumgartner, S. Flesca, and G. Gottlob. Visual Web Information Extraction with Lixto. In Proc. of VLDB, 2001.
2. N. Henze and M. Kriesell. Personalization Functionality for the Semantic Web: Architectural Outline and First Sample Implementation. In 1st Int. Workshop on Engineering the Adaptive Web (EAW 2004), Eindhoven, The Netherlands, 2004.
3. A. Maedche, S. Staab, N. Stojanovic, and R. Studer. Semantic portal - the SEAL approach. In D. Fensel, J. Hendler, H. Lieberman, and W. Wahlster, editors, Spinning the Semantic Web, pages 317–359. MIT Press, 2003.
4. M. Sintek and S. Decker. TRIPLE - an RDF Query, Inference, and Transformation Language. In International Semantic Web Conference (ISWC), Sardinia, Italy, 2002.

REWERSE

Workpackage ET

Education and Training

1 Description of Activities

1.1 Towards a graduate curriculum for the Semantic Web education

Authors: Jan Maluszynski, Jörg Diederich, Luis Moniz Pereira, Norbert Eisinger and Artur Wilk

Towards a graduate curriculum for the Semantic Web education

Jan Maluszynski1, Jörg Diederich2, Luis Moniz Pereira3, Norbert Eisinger4, and Artur Wilk1

1 Linköping University, Department of Computer and Information Science, 581 83 Linköping, Sweden [email protected], [email protected] 2 L3S Research Center, Hannover, Germany [email protected] 3 Departamento de Informática, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, Portugal [email protected] 4 Ludwig-Maximilians-Universität München, Germany [email protected]

1 Introduction

One of the objectives of the REWERSE Education and Training activity is to develop a curriculum recommendation for graduate Semantic Web education. Understanding the structure of this rapidly developing field is essential for providing recommendations for higher education curricula at M.Sc. and Ph.D. levels, as well as for industrial courses and for supporting Semantic Web education with learning materials. An important aspect of the structuring effort is the identification of relations between the Semantic Web and the existing body of knowledge in Computer Science. This extended abstract summarizes the existing draft proposal for such a structuring (REWERSE deliverable E-D7) with the objective of stimulating further discussion. It also refers to the IEEE/ACM Computer Science Curriculum CC2001 in an attempt to identify the undergraduate learning units which may be required as prerequisites in some options of graduate Semantic Web education. The long-range objective of the work is to develop recommendations for the structure and options of graduate Semantic Web education. The present version of the proposed structure builds upon the previous work:

– REWERSE deliverable E-D1 presented information on already offered university courses relevant for the Semantic Web.
– Analysis of the information in E-D1 resulted in a preliminary structure presented in REWERSE deliverable E-D5, which was the subject of a discussion initiated by Hannover and involving members of Knowledge Web. The result of this discussion is a refined draft structure at https://wiki-sop.inria.fr/wiki/bin/view/Acacia/KnowledgeWeb

The proposed structure is used in the joint Knowledge Web and REWERSE educational infrastructure REASE (http://rease.semanticweb.org) for classification of learning materials.

2 Specifying Prerequisites

Different options of graduate Semantic Web education may require different background knowledge. This section attempts to identify prerequisites which may be needed in some options but not necessarily in all of them.

2.1 Relevant learning units in CC2001

We take the IEEE/ACM CC2001 as the standard reference for undergraduate curricula in Computer Science and refer to the areas and units specified therein, using the terminology and the unit codes of CC2001. As general guidance for the identification of prerequisites for graduate Semantic Web education we take the introductory sentence on the main page of the W3C Semantic Web Activity (http://www.w3.org/2001): “The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries.” and the statement by Tim Berners-Lee, James Hendler and Ora Lassila quoted therein: “The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation”. Thus the prerequisites for Semantic Web education should include topics in the following areas of the CS Body of Knowledge as defined in CC2001:

Information Management (IM) The following IM topics, as specified in CC2001, seem to be relevant in Semantic Web education:

IM1 Information models and systems,
IM2 Database systems,
IM3 Data modeling, including conceptual models (entity-relationship and UML), relational data models, object-oriented models and semistructured data,
IM5 Database query languages.

Generally, the field of Information Management is very broad, and its structuring in CC2001 may not be fully adequate for the purposes of Semantic Web education. However, some topics, like conceptual modelling or semistructured data, are of direct importance for Semantic Web education.

Intelligent Systems (IS) The following IS topics, as specified in CC2001, are of particular importance:
IS3 Knowledge representation and reasoning (including review of propositional and predicate logic, resolution and theorem proving),
IS5 Advanced knowledge representation and reasoning (with focus on description logics, on nonmonotonic reasoning and on reasoning on action and change),
IS6 Agents,
IS7 Natural language processing.
Courses on Knowledge Representation and Reasoning, Agents, and Natural Language Processing offered in undergraduate curricula may not be sufficient for graduate Semantic Web education. In particular, we note that, due to an unfortunate decision, CC2001 does not include logic programming, thus neglecting its importance in Knowledge Representation and Reasoning. These topics are very relevant for graduate Semantic Web education, among others as a prerequisite for studying rules on the Semantic Web. Thus a graduate Semantic Web program should offer specialized advanced courses on relevant topics not covered by undergraduate curricula.

Net-centric Computing (NC) The following NC topics, as specified in CC2001, are relevant as prerequisites for Semantic Web education:
NC1 Introduction to net-centric computing,
NC2 Communication and networking,
NC3 Network security,
NC5 Building web applications.

2.2 Other Foundational Topics

The CC2001 topics in undergraduate education listed above give a general background for Semantic Web education. Some of them may need additional advanced courses. Also, some advanced foundational topics relevant for the Semantic Web are not covered by CC2001. The following list, reflecting previous discussions in REWERSE and in Knowledge Web, includes foundational topics from both categories mentioned above:

– Knowledge Engineering and Ontology Engineering
  • Methodologies,
  • Ontology population/generation,
  • Maintenance and versioning (dynamics),
  • Mapping/translation/matching/aligning (heterogeneity),
  • Validation,
  • Interoperability/Integration,
  • Modularization and Composition,
  • Tools;
– Web information technologies
  • XML (including Namespaces, Schema Languages, XML query and transformation languages, XML programming techniques),
  • Web data integration,
  • Security,
  • Web services,
  • Personalization techniques,
  • Web data extraction/information extraction,
  • Architecture of Web Information Systems.

Notice that the above topics do not explicitly address the Semantic Web. However, the development of the Semantic Web relies to a large extent on the use of ontologies and of the Web technologies listed above.

3 Structuring of the Semantic Web body of knowledge

This section presents the proposed structure of the Semantic Web body of knowledge, which should be used as a basis for developing a recommendation for a graduate Semantic Web curriculum and options. The structure is already used in REASE for classification of learning units. The REWERSE deliverable E-D7 shows how the topics in this structure are covered by the learning units in REASE and by the Semantic Web courses offered in the ERASMUS MUNDUS supported European Master Program in Computational Logic.

i. Knowledge Engineering / Ontology Engineering
   1. Methodologies
   2. Ontology population / generation
   3. Maintenance and versioning (dynamics)
   4. Mapping / translation / matching / aligning
   5. Validation
   6. Interoperability / Integration
   7. Modularization and Composition
   8. Tools
ii. Knowledge Representation and Reasoning
   1. Logic
   2. Logic Programming
   3. Reasoning
iii. Basic Web information technologies
   1. XML
   2. Web data integration
   3. Security
   4. Web services
   5. Personalization techniques
   6. Web data extraction
   7. Architecture of Web Information Systems
iv. Resource Description Framework / RDF Schema
v. Semantic Web Query and Update Languages
   1. Query Languages
   2. Update Languages
vi. Ontologies for the Semantic Web
   1. Ontology representation / Ontology languages / OWL
   2. Ontology Engineering
   3. Ontology reasoners
vii. Semantic Web Rules + Logic
   1. Rule languages
   2. Rule Markup
   3. Reasoning languages
   4. Rule reasoners
viii. Proof in the Semantic Web
ix. Security / trust / privacy in the Semantic Web
x. Semantic Web Applications
   1. Knowledge Management
   2. e-learning
   3. Bioinformatics
   4. Multimedia
   5. e-health
   6. e-business
   7. Law
   8. Engineering
   9. e-government
xi. Semantic Web Special Topics
   1. Natural language processing / human language technologies
   2. Social impact of the Semantic Web
   3. Social networks and Semantic Web
   4. Peer-to-peer and Semantic Web
   5. Agents and Semantic Web
   6. Semantic Grid
   7. Semantic Web Services
   8. Outreach to industry
   9. Benchmarking and scalability
   10. Design and testbed case studies

4 Conclusions and Future Work

We have attempted to identify a body of knowledge in the field of the Semantic Web, to structure it, and to identify its links to Computer Science. We referred to the IEEE/ACM CC2001 document for identifying prerequisites for graduate Semantic Web education. The proposed structure is preliminary and will be the subject of further discussion in REWERSE and in Knowledge Web. Also, revisions reflecting the development of the Semantic Web will certainly be needed. We encourage all REWERSE members to participate in the discussion.

We notice that, according to the information at hand, some of the topics in the structure are not yet supported by existing learning units/learning material in REASE. In some cases this may be caused by insufficient development of the field. For example, this seems to apply to some of the listed topics in the area “x. Semantic Web Applications”, or to the area “vii. Semantic Web Rules”, which only recently became a subject of W3C activities. Also, the importance of some topics in the foundational areas “i. Knowledge Engineering/Ontology Engineering” and “iii. Basic Web information technologies” might not yet have been sufficiently explored in the context of the Semantic Web. The “proof level” (“viii. Proof”) postulated in the original Semantic Web vision of Tim Berners-Lee has not yet been sufficiently explained: proofs are inherent at both the ontology level and the rule level, and there may be no need for a special proof level.

The future work includes:

– Refinement of the proposed structure. The discussion preceding the preparation of this deliverable will be continued, taking into account recent developments in the field, especially W3C activities, and new contributions to REASE. This may lead to a revision of the proposed structure. The structure will be refined by suggesting the recommended content for the foundational topics not covered by CC2001 and for the core topics of the Semantic Web. In particular, the recommendation should consider different options in graduate Semantic Web education, identifying the prerequisites and the elements of the structure to be covered by a defined option.
– Supporting the structure by new learning units in REASE. This applies first of all to the topics in our structure which are not covered by the existing units, and/or to the topics which are closely related to REWERSE research, with particular focus on recent developments. These criteria have been taken into account while preparing the programme of the REWERSE Summer School 2006 (see REWERSE deliverable E-D8-1). The materials of the school will be uploaded to REASE. Among others, they will address the following topics in the structure:
  • Bioinformatics and the Semantic Web,
  • Semantic Web query and update languages, with particular focus on recent developments,
  • Rule languages for the Semantic Web, with particular focus on W3C and on the integration of rules and ontologies postulated in the original vision of the Semantic Web architecture,
  • Outreach to industry, not yet addressed by REWERSE, which will be the subject of two contributions.

REWERSE will also encourage uploading to REASE the learning materials of the Erasmus Mundus supported European Master Program in Computational Logic.