Conjunctive Query Answering in the Description Logic EL Using a Relational Database System

Carsten Lutz - Universität Bremen, Germany
David Toman - University of Waterloo, Canada
Frank Wolter - University of Liverpool, UK
Presented by: Jasmeet Jagdev

Background
• One of the main applications of ontologies is data access
• Ontologies formalize conceptual information about the data
• Calvanese et al. have argued that true scalability of conjunctive query (CQ) answering over description logic (DL) ontologies can only be achieved by making use of relational database management systems (RDBMSs)
• This is not straightforward!
• RDBMSs are unaware of TBoxes (the DL mechanism for storing conceptual information) and adopt the closed-world semantics
• In contrast, ABoxes (the DL mechanism for storing data) and the associated ontologies employ the open-world semantics

Introduction
• An approach for using RDBMSs for CQ answering over DL ontologies
• We apply it to an extension of the EL family of DLs
• The EL family is widely used as ontology languages for large-scale biomedical ontologies such as SNOMED CT and NCI
• EL allows
  – concept intersection
  – existential restrictions
• $\mathcal{ELH}^{dr}_{\bot}$ = EL + bottom concept, role inclusions, and domain and range restrictions

Main Idea
• Incorporate the consequences of the TBox $\mathcal{T}$ into the relational instance corresponding to the given ABox $\mathcal{A}$
• Introduce the idea of combined first-order (FO) rewritability
• This is possible only if:
  – $\mathcal{A}$ and $\mathcal{T}$ can be rewritten into an FO structure
  – $q$ and $\mathcal{T}$ can be rewritten into an FO query $q^*$
• Properties of this approach:
  – It applies to DLs for which the data complexity of CQ answering is PTIME-complete, such as $\mathcal{ELH}^{dr}_{\bot}$
  – For $\mathcal{ELH}^{dr}_{\bot}$, the rewriting step can be carried out in polynomial time and produces only a polynomial blowup

Notations
• In $\mathcal{ELH}^{dr}_{\bot}$, concepts are formed according to the rule
    $C ::= A \mid \top \mid \bot \mid C \sqcap D \mid \exists r.C$
  – $A$: concept names, taken from the set $N_C$
  – $r$: role names, taken from the set $N_R$
  – $C, D$: concepts
• A TBox is a finite set of
  – concept inclusions: $C \sqsubseteq D$
  – role inclusions: $r \sqsubseteq s$
  – domain restrictions: $\mathsf{dom}(r) \sqsubseteq C$
  – range restrictions: $\mathsf{ran}(r) \sqsubseteq C$
• An ABox is a finite set of
  – concept assertions: $A(a)$
  – role assertions: $r(a, b)$
• $a, b$ are individual names taken from the set $N_I$
• A knowledge base is a pair $(\mathcal{T}, \mathcal{A})$ with a TBox $\mathcal{T}$ and an ABox $\mathcal{A}$
• The sets $N_V$ (of variables) and $N_I$ (of individual names) together form the set $N_T$ of terms
• A first-order (FO) query $q$ is a first-order formula built from $N_T$ and the unary and binary predicates from $N_C$ and $N_R$
• $q = \varphi(\vec{v})$, where $\varphi$ is an FO formula with free variables $\vec{v} = v_1, \dots, v_k$
• $q$ is $k$-ary if there are $k$ answer variables
• A conjunctive query (CQ) is an FO query of the form $q = \exists \vec{u}.\psi(\vec{u}, \vec{v})$, where
  – $\psi$ is a conjunction of concept atoms $A(t)$ and role atoms $r(t, t')$
  – $\vec{u}$ are the quantified variables and $\vec{v}$ the answer variables of $q$
• We use:
  – $\mathsf{var}(q)$: the set of all variables in $\vec{u}$ and $\vec{v}$
  – $\mathsf{qvar}(q)$: the set of quantified variables
  – $\mathsf{avar}(q)$: the set of answer variables
  – $\mathsf{term}(q)$: the set of terms in $q$

Prime Concept
• For a given $\mathcal{ELH}^{dr}_{\bot}$ knowledge base $\mathcal{K}$ and a $k$-ary CQ $q$, $\mathsf{cert}(q, \mathcal{K})$ is the set of all certain answers: the tuples $(a_1, \dots, a_k)$ such that
  – $a_1, \dots, a_k$ occur in $\mathcal{K}$, and
  – $\mathcal{I} \models q[a_1, \dots, a_k]$ for each model $\mathcal{I}$ of $\mathcal{K}$, i.e. $\mathcal{I}$ satisfies $q$ with $v_i$ assigned to $a_i^{\mathcal{I}}$, $1 \le i \le k$

ABox Rewriting
• Extension of the ABox to a canonical model of the knowledge base
  – $\mathcal{ELH}^{dr}_{\bot}$ KB $\mathcal{K} = (\mathcal{T}, \mathcal{A})$
  – $\mathsf{sub}(\mathcal{T})$ = sub-concepts of $\mathcal{T}$
  – $\mathsf{rol}(\mathcal{T})$ = role names in $\mathcal{T}$
  – $\mathsf{Ind}(\mathcal{A})$ = individual names in $\mathcal{A}$
  – $\mathsf{ran}_{\mathcal{T}}(r)$ denotes the (unique) concept $C$ with $\mathsf{ran}(r) \sqsubseteq C \in \mathcal{T}$

Canonical Model $\mathcal{I}_{\mathcal{K}}$
[definition and figure not preserved in the extracted text]

Problems
• This situation can be detected by observing that the object $x_{T,A}$ is not reachable from $a$ by a role chain in $\mathcal{I}_{\mathcal{K}}$
• This deficiency of $\mathcal{I}_{\mathcal{K}}$ is easily repaired by restricting it to the elements reachable from some $a^{\mathcal{I}_{\mathcal{K}}}$ with $a \in \mathsf{Ind}(\mathcal{A})$
• A path in $\mathcal{I}$ is a finite sequence $d_0 r_1 d_1 \cdots r_n d_n$, $n \ge 0$, where $d_0 \in \mathsf{Ind}(\mathcal{A})^{\mathcal{I}}$ and $(d_i, d_{i+1}) \in r_{i+1}^{\mathcal{I}}$ for all $i < n$
• $\mathcal{I}^r_{\mathcal{K}}$ denotes the restriction of $\mathcal{I}_{\mathcal{K}}$ to those $d \in \Delta^{\mathcal{I}_{\mathcal{K}}}$ for which there exists a $p \in \mathsf{paths}_{\mathcal{A}}(\mathcal{I}_{\mathcal{K}})$ such that $d = \mathsf{tail}(p)$
• Then $\mathcal{I}^r_{\mathcal{K}}$ provides the correct certain answers to the query $q$
• This problem can be overcome by replacing $\mathcal{I}^r_{\mathcal{K}}$ with its unraveling into a less constrained, tree-like model
• $\mathcal{U}_{\mathcal{K}}$ = the $(A,R)$-unraveling $\mathcal{J}$ of $\mathcal{I}$, defined as follows: [definition not preserved in the extracted text]
• $\mathcal{U}_{\mathcal{K}}$ becomes infinite
• We therefore work with $\mathcal{I}^r_{\mathcal{K}}$ and instead rewrite the query to $q^*$ such that, for all $a_1, \dots, a_k \in \mathsf{Ind}(\mathcal{A})$: $(a_1, \dots, a_k) \in \mathsf{cert}(q, \mathcal{K})$ iff $\mathcal{I}^r_{\mathcal{K}} \models q^*[a_1, \dots, a_k]$

Query Rewriting
• $q^*_R$ contains one additional unary predicate, Aux
• Aux(x) identifies the auxiliary elements of $\mathcal{I}^r_{\mathcal{K}}$, i.e. those that are not individuals from $\mathsf{Ind}(\mathcal{A})$
• $\sim_q$ denotes the smallest relation on $\mathsf{term}(q)$ that includes the identity relation, is transitive, and satisfies the following closure condition: [condition not preserved in the extracted text]
• For any equivalence class $\zeta$ of $\sim_q$:
  – $\mathsf{Fork}_{=}$ is the set of pairs $(\mathsf{pre}(\zeta), \zeta)$
  – $\mathsf{Fork}_{\neq}$ is the set of variables $v \in \mathsf{qvar}(q)$ such that there is no implicant of $\mathsf{in}([v])$
• $q^*_R$ is defined as: [formula not preserved in the extracted text]

Transformed Queries
[slide content not preserved in the extracted text]

Implementation and Experiments
• Based on the NCI thesaurus, a well-known ontology from the biomedical domain
• An EL TBox was extracted that contains
  – 65K primitive concept names
  – 70 primitive roles
  – 70K concept inclusions and concept definitions
• The auxiliary part of the canonical model, which is independent of the ABox, consists of
  – 702K concept assertions
  – 171K role assertions
• The IBM DB2 DBMS was used
• Only two relations are used to represent $\mathcal{I}^r_{\mathcal{K}}$:
  – acbox (conceptid, indid)
  – arbox (roleid, domain-indid, range-indid)
• Where
  – conceptid and roleid are numerical identifiers for concept names and role names
  – indid, domain-indid, and range-indid are numerical identifiers for individuals from $N_I \cup N_I^{aux}$
  – acbox represents concept memberships
  – arbox represents role memberships
• Sample query: $q = \mathsf{Nerve}(x) \wedge \neg \mathsf{Aux}(x)$
• Equivalent SQL statement:
    select indid from acbox where conceptid=141723 and indid > 0

Results
• Simple chains (Q1)
• Star queries (Q2, Q3, Q4)
• Cyclic queries (Q5)

Conclusion
• A novel approach to CQ answering in DLs using RDBMSs is provided
• It can be used for DLs for which the data complexity is between LOGSPACE and PTIME
• One drawback of this approach is the blowup of the data, which is polynomial but still considerable on large data sets
• Future work could reduce this blowup by incorporating the TBox partly into the data and partly into the query
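
The two-relation encoding from the Implementation and Experiments slides can be tried out on a toy instance. The following is a minimal sketch, not the authors' implementation: it uses Python's built-in sqlite3 module instead of IBM DB2, renames the columns domain-indid and range-indid to domain_indid and range_indid so that they are valid unquoted SQL identifiers, and assumes (as the sample statement `indid > 0` suggests) that auxiliary individuals carry non-positive identifiers. Only the concept identifier 141723 for Nerve comes from the slides; the role identifier, the other concept identifier, and all rows are hypothetical.

```python
import sqlite3

NERVE = 141723      # concept id for Nerve, taken from the slides
OTHER_CONCEPT = 99  # hypothetical concept id
PART_OF = 7         # hypothetical role id


def build_demo_db():
    """Create an in-memory database with the acbox/arbox layout and toy rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE acbox (conceptid INTEGER, indid INTEGER);
        CREATE TABLE arbox (roleid INTEGER,
                            domain_indid INTEGER,
                            range_indid INTEGER);
        """
    )
    # Assumption: ids > 0 are ABox individuals, ids <= 0 are auxiliary elements.
    conn.executemany(
        "INSERT INTO acbox VALUES (?, ?)",
        [(NERVE, 1), (NERVE, 2), (NERVE, -5), (OTHER_CONCEPT, 3)],
    )
    conn.executemany(
        "INSERT INTO arbox VALUES (?, ?, ?)",
        [(PART_OF, 1, -5), (PART_OF, 3, 2)],
    )
    return conn


def nerve_individuals(conn):
    """Nerve(x) AND NOT Aux(x): the sample statement from the slides."""
    rows = conn.execute(
        "SELECT indid FROM acbox "
        "WHERE conceptid = ? AND indid > 0 ORDER BY indid",
        (NERVE,),
    )
    return [row[0] for row in rows]


def part_of_some_nerve(conn):
    """q(x) = EXISTS y. partOf(x, y) AND Nerve(y), with x restricted to
    non-auxiliary individuals. The quantified variable y may be auxiliary."""
    rows = conn.execute(
        "SELECT DISTINCT r.domain_indid "
        "FROM arbox r JOIN acbox c ON c.indid = r.range_indid "
        "WHERE r.roleid = ? AND c.conceptid = ? AND r.domain_indid > 0 "
        "ORDER BY r.domain_indid",
        (PART_OF, NERVE),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    db = build_demo_db()
    print(nerve_individuals(db))   # [1, 2]
    print(part_of_some_nerve(db))  # [1, 3]
```

In the second query the witness for individual 1 is the auxiliary element -5, which illustrates why only answer variables are filtered with the Aux test. For queries with forks or cycles the rewriting adds further conditions that this sketch does not model.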
