<<

SomeWhere: a scalable peer-to-peer infrastructure for querying distributed ontologies

Marie-Christine Rousset

Joint work with Philippe Adjiman, Philippe Chatalic, 1 François Goasdoué, Gia-Hien Nguyen, Laurent Simon Focus of this talk How to make semantic approaches scalable to the web ?

ò A data centered vision of the Semantic Web – viewed as a huge semantic and distributed data management system

ò SomeWhere – a peer to peer infrastructure – based on simple personalized ontologies and mappings distributed at large scale

2 P2P Data Management Systems

ò Logical network of peers (≠ physical network) – each peer is characterized by ò its physical address (IP) ò a description of the stored resources ò its neighbors in the network – the peers to which it can transmit messages (queries,...) ò Various topologies – random and dynamic (Gnutella) – fixed (Chord, Hypercube) – guided by the semantics ò SON, Edutella, Piazza, DRAGO, coDB, Somewhere

3 SomeWhere logical networks ò The topology is not fixed ò Guided by mappings – A peer ò joins by declaring mappings between its ontology and the ontologies of some peers that it knows ò leaves by removing the mappings with its acquaintances in the network

4 SomeWhere in a nutshell

ò Simple data model based on a propositional language of classes – for defining ontologies, mappings, and queries – a sublanguage of OWL DL (W3C)

ò Scales up to one thousand peers – logical network : « small world »

5 SomeWhere Data Model

a+Data Schem A0

A1 A2

A3 A4 A5

St_A3 St_A5

Data Data

Storage description: Oenxteonlsoiognya:l chlaiessraesrchy of inMteornet icoonmaplle cx liancslusseiosn 6 statement: St_A1 A1 ¬A2 SomeWhere Data Model

QMuaepripeisn:gs: A LQog1icalQ c2o mbination of class A A literals: A1 ¬1 A3 2

A3

A B B A A1 (A2 1 A3)3 B13 ¬B13

B

Q B1 B2

B3 7 Semantics

ò Standard FO logical semantics – one single domain of interpretation – a distributed set of formulas interpreted in the same way as if they were not distributed – in contrast with some other approaches ò coDB: epistemic logic ò DRAGO: distributed semantics of DDL or DFOL based on a collection of domains of interpretations ò Our assumption: – the objects have a unique URI – objects stored at different peers and having the same URI are interpreted as being the same

8 Data model: example

P1 Musique Radio P2

Rock Pop Classique Mouv Rock Mouv Nostalgie

Français US St_pop Tchaikovsky St_Mouv St_Nostalgie

St_Français St_US St_Tchaikovsky

Rock Pop Pop_Rock Music Tchai Tchaikovsky Tchaikovsky Tchai Pop_Rock Classical

St_Pop_Rock Ru It

St_Ru Tchai 9 St_Tchai Query answering : illustration

P1 Musique Radio P2

Rock Pop Classique Mouv Rock Mouv Nostalgie

Français US St_pop Tchaikovsky St_Mouv St_Nostalgie Mouv? St_Français St_US St_Tchaikovsky St_Mouv

Pop? Rock ? St_Français SStt__PPoopp St_US

Rock Pop Pop_Rock Tchai Tchaikovsky Music Rewritings Tchaikovsky Tchai Pop_Rock Classical St_Pop_Rock St_Pop_Rock Ru It St_Francais St_Pop St_US St_Pop St_Ru Tchai St_Mouv St_Pop 10 Pop_Rock ? St_Tchai Query rewriting in SomeWhere

ò Decomposition of queries/recombination of rewritings – only atomic queries are transmitted to peers ò a complex query is splitted into atomic queries – each solicited peer processes a given atomic query q and incrementally sends back intentional answers for it ò (conjunction of) extensional classes that are rewritings of q – intentional answers of different atomic queries resulting from the split of a complex query must be recombined ò intentional answers can combine extensional classes of different peers ò Can be reduced to a consequence finding problem in distributed clausal propositional theories – Ontologies and mappings are encoded as clauses – The maximal conjunctive rewritings of a conjunctive query Q correspond exactly to the negation of the clauses that are proper prime implicates of the negation of Q w.r.t the union of the local theories and the mappings 11 Query rewriting algorithm

ò Message based local algorithm running on each peer – query, answer, and termination messages

ò Global properties – soundness – completeness – termination (even for cyclic networks)

12 http://www.lri.fr/~adjiman/somewhere/ Flash demo of the SomeWhere extension

13 P1

MyBookmarks

DVD Action Suspense Action Suspense Animation Cartoons

P P MyBookmarks 4 MyBookmarks 3

Comedy Humor DVD Movies Humor Humor Romance BruceWillis Action Comedy Thriller Cartoons Adult

Friends Humor Drama Thriller BenStiller Comedy

P MyBookmarks 6 P2 P5 MyBookmarks MyBookmarks IMDB

Genre Actors DramaComedy Drama T V Shows Drama Comedy BruceWillis BenStiller JuliaRoberts Drama Fantastic Gore

Friends Alias P1

Classes extensions Action Suspense Thriller Animation Cartoons

P4 P3

Comedy Humor Humor Comedy

BruceWillis Action

Friends Humor Drama Thriller BenStiller Comedy

P6

P2 P5

DramaComedy Drama P1

MyBookmarks Q1: Thriller ?

DVD Q2: NOT Adult ? Action Suspense Thriller Animation Q3: Thriller AND Comedy ? Action Suspense Animation Cartoons

P4 P3 MyBookmarks MyBookmarks

Comedy Humor DVD Movies Humor Comedy Humor Romance BruceWillis Action Comedy Thriller Cartoons Adult

Friends Humor Drama Thriller BenStiller Comedy

P MyBookmarks 6 P2 P5 MyBookmarks MyBookmarks IMDB

Genre Actors DramaComedy Drama T V Shows Drama Comedy BruceWillis BenStiller JuliaRoberts Drama Fantastic Gore

Friends Alias P1 Thriller ? MyBookmarks Rewritings:

DVD P3:Thriller P :Action P :Suspense Action Suspense Thriller 1 1 Animation P5:Drama Action Suspense Animation Cartoons P6:DramaComedy P2:BruceWillis P1:Suspense

P4 P3 MyBookmarks MyBookmarks

Comedy Humor DVD Movies Humor Comedy Humor Romance BruceWillis Action Comedy Thriller Cartoons Adult

Friends Humor Drama Thriller BenStiller Comedy

P MyBookmarks 6 P2 P5 MyBookmarks MyBookmarks IMDB

Genre Actors DramaComedy Drama T V Shows Drama Comedy BruceWillis BenStiller JuliaRoberts Drama Fantastic Gore

Friends Alias Rewritings of Thriller: evaluation

P3:Thriller Local

P1:Action P1:Suspence Navigational

P5:Drama Navigational

P6:DramaComedy Navigational P :BruceWillis P :Suspense 2 1 Integration

P3 P5

P1

P6

P5 SomeWhere infrastructure

N machines 1 peer per machine

N machines K peers per machine 1 machine N peers 19 SomeWhere infrastructure Zoom on one machine

100 % JAVA 1.5 somewhere.jar ~ 250 Ko

20 Scalability experiments [IJCAI 05]

ò on randomly generated networks – 1000 peers deployed on a cluster of 75 machines – small world topology ò Close to the topology of the web ò peers – ontologies ò random clauses of length 2 – mappings ò random clauses of length 2 or 3

21 Varying topologies

P: probability of redirecting an edge

1000 peers Ring, 10 neighbours/peer

P = 0.01 Model of Watts and Strogatz

P = 0.1 P = 1 22 Small world Random graph Scalability results

Varying parameters ò Number of mappings between peers ò complexity of mappings – ratio of clauses of length 3 (0%, 20%, 100%)

timeout : 30 s/query

Depth of query processing ò Small depth (less than 7) even on the hard cases

Time to produce a number of answers ò In 90% cases, the first answer is produced within 2 seconds ò Easy cases (simple mappings): – few answers per query (5 on average) – very fast (less than 0.1s) to compute all the answers without timeouts ò Hard cases (complex and more mappings per edge) – around 1000 answers per query (but > 30% queries not complete : timeouts) 23 – quite fast to obtain them (less than 20s) Handling inconsistencies

ò How to define them ? – unsatisfiability => derivation of the empty clause – empty classes => derivation of unit negative clauses

A’ B’ B’ ¬A’ A B A B

there exists A such that A is empty in every model: S |= ¬A ò How to detect inconsistencies? – at each join of a new peer ò How to deal with inconsistencies? – avoid them when reasoning 24 illustration

path m1: 2005 ArtiAclIePubli is a subclass of P1 m0 P2 Conf. AIPubli Theory AIPubli BDPubli Theory path Emxp0e-> m2: AIPublic is a subclass of Journal.

Conf and Journal are 2005 Conf Theory Journal disjoint, therefore AIPUbli is necessarily m1 m2 empty

Publi P3 inconsistencies are caused by mappings.

Conf >--< Journal 25 P2P detecting of inconsistencies

¬AIPubli v 2005 ¬AIPubli v Theory ¬Theory v Article ¬BDPubli v 2005 ¬Expe v Article

¬2005 v Conf ¬Conf v Publi ¬Theory v Journal m1 ¬Journal v Publi m2 ¬Journal v ¬Conf

° Propagation of m2: ° Propagation of m1: { ¬Theory v Journal; { ¬AIPubli v Conf; ¬AIPubli v Journal; ¬AIPubli v Publi; …..; ¬AIPubli v ¬Journal; ; ¬BDPubli v Conf; ¬AIPubli …; ¬BDPubli v Publi; ¬BDPubli v ¬Journal }. ¬AIPubli v ¬Conf}.

No production of unit clause Production of a unit clause No inconsistency Inconsistency {m1,m2} is a NoGood stored at P3 26 Distributed storage of the NoGoods

{ } M*1

M*2 { … } M* { } n

27 P2P well-founded reasoning

ò Principle: – avoid the inconsistencies when constructing answers ò Semantics of « well-founded » answer: – obtained from a consistent subset of formulas ò Algorithm: – for each answer, ò build its set of mapping supports and return the set of NoGoods encountered during the reasoning, ò discard the mapping supports including a NoGood – return the answers having a not empty set of mapping supports

It has been presented at ECAI 06 28 SomeRDFS

ò Extending the data model to RDF(S) – W3C recommendation for describing web resources – Classes and (binary) relations between objects – each object is identified by a URI

MuseumName http://www.louvre.fr "Le Louvre"

Located CityName http://www.paris.fr " Paris"

Triple notation: Relational notation: property(resource, value) 29 RDFS

CulturalPlace

Is-a Contains MuseumName MadeBy Literal Museum Work Artist W orkName Located ArtistName Is-a Is-a Literal CityName Literal City Literal

ModernMuseum ArcheologyMuseum

30 SomeRDFS: data model a simple fragment of RDFS distributed through simple mappings (using the same constructors)

Q(X,Y): P2.Work(X)∧P2.refersTo(X,Y) 31 Query rewriting in SomeRDFS

ò Propositionalisation of the RDFS statements and the query: removing the variables

dom Ω dom range Ω range rel Ω rel C1 C2 C1 C2 P1 P2

Prel Ω Cdom Prel Ω Crange ò Propositional query rewriting using SomeWhere ò Building the relational rewritings by adding the variables at the right place.

32 To appear in Journal of Data Semantics illustration

Q(X,Y): P2.Work(X)∧P2.refersTo(X,Y)

P2.Workdom P2.Workrange P2.refersTorel

SomeWhere SomeWhere SomeWhere rewriting rewriting rewriting

P2.Paintingdom … P1.Paintsrel … P1.belongsTorel …

P2.Painting(X) P1.Paints(Z,X) P1.belongsTo(X,Y)

R1(X,Y): P2.Painting(X)∧P1.belongsTo(X,Y) 33 illustration

Q(X,Y): P2.Work(X)∧P2.refersTo(X,Y)

P2.Workdom P2.Workrange P2.refersTorel

SomeWhere SomeWhere SomeWhere rewriting rewriting rewriting

P2.Paintingdom … P1.Paintsrel … P1.belongsTorel …

P2.Painting(X) P1.Paints(Z,X) P1.belongsTo(X,Y)

R2(X,Y): P1.Paints(Z,X)∧P1.belongsTo(X,Y) 34 Ongoing work

ò Coupling SomeWhere to a DHT for optimizing lookup queries

ò Adapting the SomeWhere algorithm to support the epistemic semantics

ò Modeling and handling trust in P2P Semantic overlay networks – based on a logical approach

ò P2P discovery and composition of smart devices – based on a semantic description of the functionality, inputs and outputs of devices

35