Overview of Part 1: First-order queries Knowledge Bases and Databases Part 1: First-Order Queries 1 First-order logic 1 Syntax of first-order logic Diego Calvanese 2 Semantics of first-order logic 3 First-order logic queries Faculty of Computer Science Master of Science in Computer Science 2 First-order query evaluation A.Y. 2007/2008 1 Query evaluation problem 2 Complexity of query evaluation

3 Conjunctive queries 1 Evaluation of conjunctive queries 2 Containment of conjunctive queries 3 Unions of conjunctive queries

unibz.it

D. Calvanese Part 1: First-Order Queries KBDB – 2007/2008 (2/66)

Syntax of first-order logic Semantics of first-order logic First-order logic queries Syntax of first-order logic Semantics of first-order logic First-order logic queries

Chap. 1: First-Order Logic Chap. 1: First-Order Logic Outline

Chapter I 1 Syntax of first-order logic

First-Order Logic 2 Semantics of first-order logic

3 First-order logic queries

unibz.it unibz.it

D. Calvanese Part 1: First-Order Queries KBDB – 2007/2008 (3/66) D. Calvanese Part 1: First-Order Queries KBDB – 2007/2008 (4/66) Syntax of first-order logic Semantics of first-order logic First-order logic queries Syntax of first-order logic Semantics of first-order logic First-order logic queries

Chap. 1: First-Order Logic Chap. 1: First-Order Logic Outline First-order logic

1 Syntax of first-order logic First-order logic (FOL) is the logic to speak about objects, which are the domain of discourse or universe. 2 Semantics of first-order logic FOL is concerned about properties of these objects and relations over objects (resp., unary and n-ary predicates). 3 First-order logic queries FOL also has functions including constants that denote objects.

unibz.it unibz.it

D. Calvanese Part 1: First-Order Queries KBDB – 2007/2008 (5/66) D. Calvanese Part 1: First-Order Queries KBDB – 2007/2008 (6/66)

Syntax of first-order logic Semantics of first-order logic First-order logic queries Syntax of first-order logic Semantics of first-order logic First-order logic queries

Chap. 1: First-Order Logic Chap. 1: First-Order Logic FOL syntax – Terms FOL syntax – Formulas

Def.: The set of Formulas is defined inductively as follows: We first introduce: k If t1, . . . , tk Terms and P is a k-ary predicate, then A set Vars = x1, . . . , xn of individual variables (i.e., variables ∈ { } P k(t , . . . , t ) Formulas (atomic formulas). that denote single objects). 1 k ∈ If t , t Terms, then t = t Formulas. A set of functions symbols, each of given arity 0. 1 2 ∈ 1 2 ∈ ≥ If ϕ Formulas and ψ Formulas then Functions of arity 0 are called constants. ∈ ∈ ϕ Formulas ϕ¬ ∈ψ Formulas ∧ ∈ Def.: The set of Terms is defined inductively as follows: ϕ ψ Formulas ϕ ∨ ψ∈ Formulas Vars Terms; → ∈ ⊆ If ϕ Formulas and x Vars then k ∈ ∈ If t1, . . . , tk Terms and f is a k-ary function symbol, then x.ϕ Formulas k ∈ ∃ ∈ f (t1, . . . , tk) Terms; x.ϕ Formulas ∈ ∀ ∈ Nothing else is in Terms. Nothing else is in Formulas.

unibz.it Note: a predicate of arity 0 is a of propositional logic. unibz.it D. Calvanese Part 1: First-Order Queries KBDB – 2007/2008 (7/66) D. Calvanese Part 1: First-Order Queries KBDB – 2007/2008 (8/66) Syntax of first-order logic Semantics of first-order logic First-order logic queries Syntax of first-order logic Semantics of first-order logic First-order logic queries

Chap. 1: First-Order Logic Chap. 1: First-Order Logic Outline Interpretations

Given an alphabet of predicates P1,P2,... and functions f1, f2,..., each with an associated arity, a FOL interpretation is: 1 Syntax of first-order logic = (∆I ,P I ,P I , . . . , f I , f I ,...) I 1 2 1 2 2 Semantics of first-order logic where:

∆I is the domain (a set of objects) 3 First-order logic queries if P is a k-ary predicate, then P ∆ ∆ (k times) i iI ⊆ I × · · · × I if f is a k-ary function, then f : ∆ ∆ ∆ (k times) i iI I × · · · × I −→ I if f is a constant (i.e., a 0-ary function), then f : () ∆ i iI −→ I (i.e., fi denotes exactly one object of the domain)

unibz.it unibz.it

D. Calvanese Part 1: First-Order Queries KBDB – 2007/2008 (9/66) D. Calvanese Part 1: First-Order Queries KBDB – 2007/2008 (10/66)

Syntax of first-order logic Semantics of first-order logic First-order logic queries Syntax of first-order logic Semantics of first-order logic First-order logic queries

Chap. 1: First-Order Logic Chap. 1: First-Order Logic Assignment Truth in an interpretation wrt an assignment

We define when a FOL formula ϕ is true in an interpretation wrt an Let Vars be a set of (individual) variables. I assignment α, written , α = ϕ: I | Def.: Given an interpretation , an assignment is a function , α = P (t , . . . , t ) if (ˆα(t ),..., αˆ(t )) P I I | 1 k 1 k ∈ I , α = t = t if αˆ(t ) =α ˆ(t ) I | 1 2 1 2 α : Vars ∆I , α = ϕ if , α = ϕ −→ I | ¬ I 6| that assigns to each variable x Vars an object α(x) ∆ . , α = ϕ ψ if , α = ϕ and , α = ψ I I | ∧ I | I | ∈ ∈ , α = ϕ ψ if , α = ϕ or , α = ψ I | ∨ I | I | , α = ϕ ψ if , α = ϕ implies , α = ψ I | → I | I | It is convenient to extend the notion of assignment to terms. We can do , α = x.ϕ if for some a ∆ we have , α[x a] = ϕ I | ∃ ∈ I I 7→ | so by defining a function αˆ : Terms ∆I inductively as follows: , α = x.ϕ if for every a ∆ we have , α[x a] = ϕ −→ I | ∀ ∈ I I 7→ | αˆ(x) = α(x), if x Vars ∈ Here, α[x a] stands for the new assignment obtained from α as αˆ(f(t1, . . . , tk)) = f I (ˆα(t1),..., αˆ(tk)) 7→ follows: α[x a](x) = a 7→ Note: for constants αˆ(c) = cI . α[x a](y) = α(y) for y = x unibz.it 7→ 6 unibz.it D. Calvanese Part 1: First-Order Queries KBDB – 2007/2008 (11/66) D. Calvanese Part 1: First-Order Queries KBDB – 2007/2008 (12/66) Syntax of first-order logic Semantics of first-order logic First-order logic queries Syntax of first-order logic Semantics of first-order logic First-order logic queries

Chap. 1: First-Order Logic Chap. 1: First-Order Logic Open vs. closed formulas Outline

Definitions A variable x in a formula ϕ is free if x does not occur in the scope of any quantifier, otherwise it is bounded. 1 Syntax of first-order logic An open formula is a formula that has some free variable. A closed formula, also called sentence, is a formula that has no free 2 Semantics of first-order logic variables.

For closed formulas (but not for open formulas) we can define what it 3 First-order logic queries means to be true in an interpretation, written = ϕ, without I | mentioning the assignment, since the assignment α does not play any role in verifying , α = ϕ. I | Instead, open formulas are strongly related to queries — cf. relational

databases. unibz.it unibz.it

D. Calvanese Part 1: First-Order Queries KBDB – 2007/2008 (13/66) D. Calvanese Part 1: First-Order Queries KBDB – 2007/2008 (14/66)

Syntax of first-order logic Semantics of first-order logic First-order logic queries Syntax of first-order logic Semantics of first-order logic First-order logic queries

Chap. 1: First-Order Logic Chap. 1: First-Order Logic FOL queries FOL boolean queries

Def.: A FOL query is an (open) FOL formula. Def.: A FOL boolean query is a FOL query without free variables.

When ϕ is a FOL query with free variables (x1, . . . , xk), then we sometimes write it as ϕ(x1, . . . , xk), and say that ϕ has arity k. Hence, the answer to a boolean query ϕ() is defined as follows: Given an interpretation , we are interested in those assignments that I map the variables x1, . . . , xk (and only those). We write an assignment ϕ()I = () , = ϕ() { | I hi | } α s.t. α(x ) = a , for i = 1, . . . , k, as a , . . . , a . i i h 1 ki Def.: Given an interpretation , the answer to a query ϕ(x , . . . , x ) is I 1 k Such an answer is ϕ(x1, . . . , xk)I = (a1, . . . , ak) , a1, . . . , ak = ϕ(x1, . . . , xk) (), if = ϕ { | I h i | } I | , if = ϕ. Note: We will also use the notation ϕI , which keeps the free variables ∅ I 6| implicit, and ϕ( ) making apparent that ϕ becomes a functions from As an obvious convention we read () as “true” and as “false”. I ∅ interpretations to set of tuples. unibz.it unibz.it

D. Calvanese Part 1: First-Order Queries KBDB – 2007/2008 (15/66) D. Calvanese Part 1: First-Order Queries KBDB – 2007/2008 (16/66) Syntax of first-order logic Semantics of first-order logic First-order logic queries Syntax of first-order logic Semantics of first-order logic First-order logic queries

Chap. 1: First-Order Logic Chap. 1: First-Order Logic FOL formulas: logical tasks FOL queries – Logical tasks

Validity: if ϕ is valid, then ϕ = ∆ ∆ for all , i.e., the I I × · · · × I I Definitions query always returns all the tuples of . I Validity: ϕ is valid iff for all and α we have that , α = ϕ. Satisfiability: if ϕ is satisfiable, then ϕ = for some , i.e., the I 6 ∅ I I I | query returns at least one tuple. Satisfiability: ϕ is satisfiable iff there exists an and α such that Logical implication: if ϕ logically implies ψ, then ϕ ψ for all I I ⊆ I , α = ϕ, and unsatisfiable otherwise. , written ϕ ψ, i.e., the answer to ϕ is contained in that of ψ in I | I ⊆ every interpretation. This is called query containment. Logical implication: ϕ logically implies ψ, written ϕ = ψ iff for all | Logical equivalence: if ϕ is logically equivalent to ψ, then ϕI = ψI and α, if , α = ϕ then , α = ψ. for all , written ϕ ψ, i.e., the answer to the two queries is the I I | I | I ≡ same in every interpretation. This is called query equivalence and Logical equivalence: ϕ is logically equivalent to ψ, iff for all and I corresponds to query containment in both directions. α, we have that , α = ϕ iff , α = ψ (i.e., ϕ = ψ and ψ = ϕ). I | I | | | Note: These definitions can be extended to the case where we have

unibz.it axioms, i.e., constraints on the admissible interpretations. unibz.it

D. Calvanese Part 1: First-Order Queries KBDB – 2007/2008 (17/66) D. Calvanese Part 1: First-Order Queries KBDB – 2007/2008 (18/66)

Query evaluation problem Complexity of query evaluation Query evaluation problem Complexity of query evaluation

Chap. 2: First-Order Query Evaluation Chap. 2: First-Order Query Evaluation Outline

Chapter II 4 Query evaluation problem

First-Order Query Evaluation 5 Complexity of query evaluation

unibz.it unibz.it

D. Calvanese Part 1: First-Order Queries KBDB – 2007/2008 (19/66) D. Calvanese Part 1: First-Order Queries KBDB – 2007/2008 (20/66) Query evaluation problem Complexity of query evaluation Query evaluation problem Complexity of query evaluation

Chap. 2: First-Order Query Evaluation Chap. 2: First-Order Query Evaluation Outline Query evaluation

Let us consider: a finite alphabet, i.e., we have a finite number of predicates and functions, and 4 Query evaluation problem a finite interpretation , i.e., an interpretation (over the finite I 5 Complexity of query evaluation alphabet) for which ∆I is finite.

Then we can consider query evaluation as an algorithmic problem, and study its computational properties.

Note: To study the computational complexity of the problem, we need to define a corresponding decision problem.

unibz.it unibz.it

D. Calvanese Part 1: First-Order Queries KBDB – 2007/2008 (21/66) D. Calvanese Part 1: First-Order Queries KBDB – 2007/2008 (22/66)

Query evaluation problem Complexity of query evaluation Query evaluation problem Complexity of query evaluation

Chap. 2: First-Order Query Evaluation Chap. 2: First-Order Query Evaluation Query evaluation problem Query evaluation algorithm

We define now an algorithm that computes the function Truth( , α, ϕ) Definitions I Query answering problem: given a finite interpretation and a FOL in such a way that Truth( , α, ϕ) = true iff , α = ϕ. I I I | query ϕ(x1, . . . , xk), compute We make use of an auxiliary function TermEval( , α, t) that, given an I ϕI = (a , . . . , a ) , a , . . . , a = ϕ(x , . . . , x ) interpretation and an assignment α, evaluates a term t returning an { 1 k | I h 1 ki | 1 k } I object o ∆I : Recognition problem (for query answering): given a finite ∈ interpretation , a FOL query ϕ(x1, . . . , xk), and a tuple ∆I TermEval( ,α,t){ I if (t is xI Vars) (a1, . . . , ak), with ai ∆I , check whether (a1, . . . , ak) ϕI , i.e., ∈ ∈ return α∈(x); whether if (t is f(t 1, . . . , t k)) , a1, . . . , ak = ϕ(x1, . . . , xk) I h i | return f I (TermEval( ,α,t 1),...,TermEval( ,α,t k)); } I I Note: The recognition problem for query answering is the decision Then, Truth( , α, ϕ) can be defined by structural recursion on ϕ. problem corresponding to the query answering problem. unibz.it I unibz.it D. Calvanese Part 1: First-Order Queries KBDB – 2007/2008 (23/66) D. Calvanese Part 1: First-Order Queries KBDB – 2007/2008 (24/66) Query evaluation problem Complexity of query evaluation Query evaluation problem Complexity of query evaluation

Chap. 2: First-Order Query Evaluation Chap. 2: First-Order Query Evaluation Query evaluation algorithm (cont’d) Query evaluation – Results

boolean Truth( ,α,ϕ){ if (ϕ is t 1I = t 2) return TermEval( ,α,t 1) = TermEval( ,α,t 2); if (ϕ is P (t 1, . . . ,I t k)) I Theorem (Termination of Truth( , α, ϕ)) return P I (TermEval( ,α,t 1),...,TermEval( ,α,t k)); I if (ϕ is ψ) I I The algorithm Truth terminates. return ¬Truth( ,α,ψ); if (ϕ is ¬ψ ψ ) I ◦ 0 Proof. Immediate. return Truth( ,α,ψ) Truth( ,α,ψ0); if (ϕ is x.ψ){I ◦ I boolean∃ b = false; for all (a ∆I ) Theorem (Correctness) b = b ∈Truth( ,α[x a],ψ); ∨ I 7→ The algorithm Truth is sound and complete, i.e., , α = ϕ if and only if return b; I | } Truth( , α, ϕ) = true. if (ϕ is x.ψ){ I ∀ boolean b = true; Proof. Easy, since the algorithm is very close to the semantic definition for all (a ∆I ) b = b ∈Truth( ,α[x a],ψ); of , α = ϕ. return b;∧ I 7→ I | } } unibz.it unibz.it D. Calvanese Part 1: First-Order Queries KBDB – 2007/2008 (25/66) D. Calvanese Part 1: First-Order Queries KBDB – 2007/2008 (26/66)

Query evaluation problem Complexity of query evaluation Query evaluation problem Complexity of query evaluation

Chap. 2: First-Order Query Evaluation Chap. 2: First-Order Query Evaluation Outline Query evaluation – Time complexityI

Theorem (Time complexity of Truth( , α, ϕ)) I The time complexity of Truth( , α, ϕ) is ( + α + ϕ ) ϕ , i.e., I |I| | | | | | | 4 Query evaluation problem polynomial in the size of and exponential in the size of ϕ. I

5 Complexity of query evaluation Proof.

f I (of arity k) can be represented as k-dimensional array, hence accessing the required element can be done in time linear in . |I| TermEval(...) visits the term, so it generates a polynomial number of recursive calls, hence is time polynomial in ( + α + ϕ ). |I| | | | |

unibz.it unibz.it

D. Calvanese Part 1: First-Order Queries KBDB – 2007/2008 (27/66) D. Calvanese Part 1: First-Order Queries KBDB – 2007/2008 (28/66) Query evaluation problem Complexity of query evaluation Query evaluation problem Complexity of query evaluation

Chap. 2: First-Order Query Evaluation Chap. 2: First-Order Query Evaluation Query evaluation – Time complexityII Query evaluation – Space complexityI

P I (of arity k) can be represented as k-dimensional boolean array, Theorem (Space complexity of Truth( , α, ϕ)) hence accessing the required element can be done in time linear in I The space complexity of Truth( , α, ϕ) is ϕ ( ϕ log ), i.e., . I | | · | | · |I| |I| logarithmic in the size of and polynomial in the size of ϕ. I Truth(...) for the boolean cases simply visits the formula, so Proof. generates either one or two recursive calls. f I (...) can be represented as k-dimensional array, hence accessing the required element requires O(log ); Truth(...) for the quantified cases x.ϕ and x.ψ involves looping |I| ∃ ∀ for all elements in ∆I and testing the resulting assignments. TermEval(...) simply visits the term, so it generates a polynomial number of recursive calls. Each activation record has a constant The total number of such testings is O( ]Vars ). size, and we need O( ϕ ) activation records; |I| | | P I (...) can be represented as k-dimensional boolean array, hence Hence the claim holds. accessing the required element requires O(log ); |I| unibz.it unibz.it

D. Calvanese Part 1: First-Order Queries KBDB – 2007/2008 (29/66) D. Calvanese Part 1: First-Order Queries KBDB – 2007/2008 (30/66)

Query evaluation problem Complexity of query evaluation Query evaluation problem Complexity of query evaluation

Chap. 2: First-Order Query Evaluation Chap. 2: First-Order Query Evaluation Query evaluation – Space complexityII Query evaluation – Complexity measures [Var82]

Truth(...) for the boolean cases simply visits the formula, so Definition (Combined complexity) generates either one or two recursive calls, each requiring constant The combined complexity is the complexity of , α, ϕ , α = ϕ , size; {hI i | I | } i.e., interpretation, tuple, and query are all considered part of the input. Truth(...) for the quantified cases x.ϕ and x.ψ involves looping ∃ ∀ for all elements in ∆I and testing the resulting assignments; The total number of activation records that need to be at the same Definition (Data complexity) time on the stack is O(]Vars) O( ϕ ). The data complexity is the complexity of , α , α = ϕ , i.e., the ≤ | | {hI i | I | } query ϕ is fixed (and hence not considered part of the input). Hence the claim holds.

Note: the worst case form for the formula is Definition (Query complexity) The query complexity is the complexity of α, ϕ , α = ϕ , i.e., the x1. x2. xn 1. xn.P (x1, x2, . . . , xn 1, xn). {h i | I | } ∀ ∃ · · · ∀ − ∃ − interpretation is fixed (and hence not considered part of the input). I unibz.it unibz.it

D. Calvanese Part 1: First-Order Queries KBDB – 2007/2008 (31/66) D. Calvanese Part 1: First-Order Queries KBDB – 2007/2008 (32/66) Query evaluation problem Complexity of query evaluation Evaluation of conjunctive queries Containment of conjunctive queries Unions of conjunctive queries

Chap. 2: First-Order Query Evaluation Chap. 3: Conjunctive Queries Query evaluation – Combined, data, query complexity

Theorem (Combined complexity of query evaluation) The complexity of , α, ϕ , α = ϕ is: {hI i | I | } time: exponential space: PSpace-complete — see [Var82] for hardness Chapter III

Theorem (Data complexity of query evaluation) Conjunctive Queries The complexity of , α , α = ϕ is: {hI i | I | } time: polynomial space: in LogSpace Theorem (Query complexity of query evaluation) The complexity of α, ϕ , α = ϕ is: {h i | I | } time: exponential space: PSpace-complete — see [Var82] for hardness unibz.it unibz.it

D. Calvanese Part 1: First-Order Queries KBDB – 2007/2008 (33/66) D. Calvanese Part 1: First-Order Queries KBDB – 2007/2008 (34/66)

Evaluation of conjunctive queries Containment of conjunctive queries Unions of conjunctive queries Evaluation of conjunctive queries Containment of conjunctive queries Unions of conjunctive queries

Chap. 3: Conjunctive Queries Chap. 3: Conjunctive Queries Outline Outline

6 Evaluation of conjunctive queries 6 Evaluation of conjunctive queries

7 Containment of conjunctive queries 7 Containment of conjunctive queries

8 Unions of conjunctive queries 8 Unions of conjunctive queries

unibz.it unibz.it

D. Calvanese Part 1: First-Order Queries KBDB – 2007/2008 (35/66) D. Calvanese Part 1: First-Order Queries KBDB – 2007/2008 (36/66) Evaluation of conjunctive queries Containment of conjunctive queries Unions of conjunctive queries Evaluation of conjunctive queries Containment of conjunctive queries Unions of conjunctive queries

Chap. 3: Conjunctive Queries Chap. 3: Conjunctive Queries Conjunctive queries (CQs) Conjunctive queries and SQL – Example

Def.: A conjunctive query (CQ) is a FOL query of the form Relational alphabet: Person(name, age), Lives(person, city), Manages(boss, employee) ~y.conj (~x,~y) ∃ Query: return name and age of all persons that live in the same city as where conj (~x,~y) is a conjunction (i.e., an “and”) of atoms and their boss. equalities, over the free variables ~x, the existentially quantified Expressed in SQL: variables ~y, and possibly constants. SELECT P.name, P.age FROM Person P, Manages M, Lives L1, Lives L2 Note: WHERE P.name = L1.person AND P.name = M.employee AND CQs contain no disjunction, no negation, no universal M.boss = L2.person AND L1.city = L2.city quantification, and no function symbols besides constants. Expressed as a CQ: Hence, they correspond to select-project- b, e, p1, c1, p2, c2.Person(n,a ) Manages(b, e) Lives(p1, c1) Lives(p2, c2) (SPJ) queries. ∃ n = p1 n∧= e b = p2 ∧ c1 = c2 ∧ ∧ ∧ ∧ ∧ CQs are the most frequently asked queries. Or simpler: b, c.Person(n,a ) Manages(b,n ) Lives(n, c) Lives(b, c) unibz.it ∃ ∧ ∧ ∧ unibz.it D. Calvanese Part 1: First-Order Queries KBDB – 2007/2008 (37/66) D. Calvanese Part 1: First-Order Queries KBDB – 2007/2008 (38/66)

Evaluation of conjunctive queries Containment of conjunctive queries Unions of conjunctive queries Evaluation of conjunctive queries Containment of conjunctive queries Unions of conjunctive queries

Chap. 3: Conjunctive Queries Chap. 3: Conjunctive Queries notation for CQs Conjunctive queries – Example

A CQ q = ~y.conj (~x,~y) can also be written using datalog notation as ∃ Consider an interpretation = (∆I ,EI ), where EI is a binary I q(~x ) conj 0(~x , ~y ) – note that such interpretation is a (directed) graph. 1 ← 1 1 The followingCQ q returns all nodes that participate to a triangle where conj0(~x1, ~y1) is the list of atoms in conj (~x,~y) obtained by equating the variables ~x, ~y according to the equalities in conj (~x,~y). in the graph: As a result of such an equality elimination, we have that ~x and ~y can y, z.E(x, y) E(y, z) E(z,x ) 1 1 ∃ ∧ ∧ contain constants and multiple occurrences of the same variable. The query q in datalog notation becomes:

Def.: In the above query q, we call: q(x) E(x, y),E(y, z),E(z,x ) ← q(~x1) the head; The query q in SQL is (we use Edge(f,s) for E(x, y): conj 0(~x1, ~y1) the body; SELECT E1.f the variables in ~x the distinguished variables; 1 FROM Edge E1, Edge E2, Edge E3 the variables in ~y the non-distinguished variables. 1 unibz.it WHERE E1.s = E2.f AND E2.s = E3.f AND E3.s = E1.f unibz.it

D. Calvanese Part 1: First-Order Queries KBDB – 2007/2008 (39/66) D. Calvanese Part 1: First-Order Queries KBDB – 2007/2008 (40/66) Evaluation of conjunctive queries Containment of conjunctive queries Unions of conjunctive queries Evaluation of conjunctive queries Containment of conjunctive queries Unions of conjunctive queries

Chap. 3: Conjunctive Queries Chap. 3: Conjunctive Queries Nondeterministic evaluation of CQs Nondeterministic CQ evaluation algorithm

Since a CQ contains only existential quantifications, we can evaluate it boolean Truth( ,α,ϕ){ I by: if (ϕ is t 1 = t 2) 1 guessing a truth assignment for the non-distinguished variables; return TermEval( ,α,t 1) = TermEval( ,α,t 2); I I 2 evaluating the resulting formula (that has no quantifications). if (ϕ is P (t 1, . . . , t k)) return P (TermEval( ,α,t 1),...,TermEval( ,α,t k)); I I I if (ϕ is ψ ψ ) ∧ 0 boolean ConjTruth( ,α, ~y.conj(~x,~y)){ return Truth( ,α,ψ) Truth( ,α,ψ0); I ∃ I ∧ I GUESS assignment α[~y ~a] { } 7→ return Truth( ,α[~y ~a],conj (~x,~y)); I 7→ } ∆I TermEval( ,α,t){ I if (t is a variable x) return α(x); where Truth( , α, ϕ) is defined as for FOL queries, considering only the if (t is a constant c) return cI ; I required cases. }

unibz.it unibz.it

D. Calvanese Part 1: First-Order Queries KBDB – 2007/2008 (41/66) D. Calvanese Part 1: First-Order Queries KBDB – 2007/2008 (42/66)

Evaluation of conjunctive queries Containment of conjunctive queries Unions of conjunctive queries Evaluation of conjunctive queries Containment of conjunctive queries Unions of conjunctive queries

Chap. 3: Conjunctive Queries Chap. 3: Conjunctive Queries CQ evaluation – Combined, data, and query complexity 3-colorability

Theorem (Combined complexity of CQ evaluation) A graph is k-colorable if it is possible to assign to each node one of k , α, q , α = q is NP-complete — see below for hardness. {hI i | I | } colors in such a way that every two nodes connected by an edge have time: exponential different colors. space: polynomial Def.: 3-colorability is the following decision problem Theorem (Data complexity of CQ evaluation) Given a graph G = (V,E), is it 3-colorable? , α , α = q is in LogSpace {hI i | I | } time: polynomial space: logarithmic Theorem Theorem (Query complexity of CQ evaluation) 3-colorability is NP-complete. α, q , α = q is NP-complete — see below for hardness. {h i | I | } time: exponential We exploit 3-colorability to show NP-hardness of conjunctive query space: polynomial evaluation. unibz.it unibz.it

D. Calvanese Part 1: First-Order Queries KBDB – 2007/2008 (43/66) D. Calvanese Part 1: First-Order Queries KBDB – 2007/2008 (44/66) Evaluation of conjunctive queries Containment of conjunctive queries Unions of conjunctive queries Evaluation of conjunctive queries Containment of conjunctive queries Unions of conjunctive queries

Chap. 3: Conjunctive Queries Chap. 3: Conjunctive Queries Reduction from 3-colorability to CQ evaluation NP-hardness of CQ evaluation

Let G = (V,E) be a graph. We define: The previous reduction immediately gives us the hardness for combined An Interpretation: = (∆I ,EI ) where: complexity. I ∆I = r, g, b { } Theorem EI = (r, g), (g, r), (r, b), (b, r), (g, b), (b, g) { } A conjunctive query: Let V = x , . . . , x , then consider the CQ evaluation is NP-hard in combined complexity. { 1 n} boolean conjunctive query defined as: Note: in the previous reduction, the interpretation does not depend on q = x , . . . , x . E(x , x ) E(x , x ) G ∃ 1 n i j ∧ j i the actual graph. Hence, the reduction provides also the lower-bound (x ,x ) E i ^j ∈ for query complexity. Theorem Theorem CQ evaluation is NP-hard in query (and combined) complexity. G is 3-colorable iff = q . I | G

unibz.it unibz.it

D. Calvanese Part 1: First-Order Queries KBDB – 2007/2008 (45/66) D. Calvanese Part 1: First-Order Queries KBDB – 2007/2008 (46/66)

Evaluation of conjunctive queries Containment of conjunctive queries Unions of conjunctive queries Evaluation of conjunctive queries Containment of conjunctive queries Unions of conjunctive queries

Chap. 3: Conjunctive Queries Chap. 3: Conjunctive Queries Outline Homomorphism

Let = (∆ ,P , . . . , c ,...) and = (∆ ,P , . . . , c ,...) be two I I I I J J J J interpretations over the same alphabet (for simplicity, we consider only constants as functions). 6 Evaluation of conjunctive queries Def.: A homomorphism from to I J is a mapping h : ∆I ∆J such that: 7 Containment of conjunctive queries → h(cI ) = cJ

h(P I (a1, . . . , ak)) = P J (h(a1), . . . , h(ak)) 8 Unions of conjunctive queries

Note: An isomorphism is a homomorphism that is one-to-one and onto. Theorem FOL is unable to distinguish between interpretations that are isomorphic.

unibz.it Proof. See any standard book on logic. unibz.it D. Calvanese Part 1: First-Order Queries KBDB – 2007/2008 (47/66) D. Calvanese Part 1: First-Order Queries KBDB – 2007/2008 (48/66) Evaluation of conjunctive queries Containment of conjunctive queries Unions of conjunctive queries Evaluation of conjunctive queries Containment of conjunctive queries Unions of conjunctive queries

Chap. 3: Conjunctive Queries Chap. 3: Conjunctive Queries Recognition problem and boolean query evaluation Canonical interpretation of a (boolean) CQ

Let q be a conjunctive query x , . . . , x .conj Consider the recognition problem associated to the evaluation of a query ∃ 1 n q of arity k. Then Def.: The canonical interpretation associated with q Iq q q q is the interpretation q = (∆I ,P I , . . . , cI ,...), where , α = q(x1, . . . , xk) iff α,~c = q(c1, . . . , ck) I I | I | ∆ q = x , . . . , x c c constant occurring in q , I { 1 n} ∪ { | } where is identical to but includes new constants c1, . . . , c that i.e., all the variables and constants in q; Iα,~c I k Iα,~c q are interpreted as ci = α(xi). cI = c, for each constant c in q; (t , . . . , t ) P q iff the atom P (t , . . . , t ) occurs in q. 1 k ∈ I 1 k That is, we can reduce the recognition problem to the evaluation of a boolean query. Sometimes the procedure for obtaining the canonical interpretation is called freezing of q.

unibz.it unibz.it

D. Calvanese Part 1: First-Order Queries KBDB – 2007/2008 (49/66) D. Calvanese Part 1: First-Order Queries KBDB – 2007/2008 (50/66)

Evaluation of conjunctive queries Containment of conjunctive queries Unions of conjunctive queries Evaluation of conjunctive queries Containment of conjunctive queries Unions of conjunctive queries

Chap. 3: Conjunctive Queries Chap. 3: Conjunctive Queries Canonical interpretation of a (boolean) CQ – Example Canonical interpretation and (boolean) CQ evaluation

Consider the boolean query q Theorem ([CM77]) q(c) E(c, y),E(y, z),E(z, c) For boolean CQs, = q iff there exists a homomorphism from to . ← I | Iq I

Then, the canonical interpretation is defined as Proof. Iq “ ” Let = q, let α be an assignment to the existential variables that q q q ⇒ I | q = (∆I ,EI , cI ) makes q true in , and let αˆ be its extension to constants. Then αˆ is a I I homomorphism from to . where Iq I

q ∆I = y, z, c “ ” Let h be a homomorphism from to . Then restricting h to { } ⇐ Iq I E q = (c, y), (y, z), (z, c) the variables only we obtain an assignment to the existential variables I { } q that makes q true in . cI = c I

unibz.it unibz.it

D. Calvanese Part 1: First-Order Queries KBDB – 2007/2008 (51/66) D. Calvanese Part 1: First-Order Queries KBDB – 2007/2008 (52/66) Evaluation of conjunctive queries Containment of conjunctive queries Unions of conjunctive queries Evaluation of conjunctive queries Containment of conjunctive queries Unions of conjunctive queries

Chap. 3: Conjunctive Queries Chap. 3: Conjunctive Queries Canonical interpretation and (boolean) CQ evaluation Query containment

Def.: Query containment The previous result can be rephrased as follows: Given two FOL queries ϕ and ψ of the same arity, ϕ is contained in ψ, denoted ϕ ψ, if for all interpretations and all assignments α we ⊆ I (The recognition problem associated to) query evaluation can be have that reduced to finding a homomorphism. , α = ϕ implies , α = ψ I | I | (In logical terms: ϕ = ψ.) | Note: Query containment is of special interest in query optimization. Finding a homomorphism between two interpretations (aka relational structures) is also known as solving a Constraint Satisfaction Problem (CSP), a problem well-studied in AI – see also [KV98]. Theorem For FOL queries, query containment is undecidable.

Proof.: Reduction from FOL logical implication. unibz.it unibz.it

D. Calvanese Part 1: First-Order Queries KBDB – 2007/2008 (53/66) D. Calvanese Part 1: First-Order Queries KBDB – 2007/2008 (54/66)

Evaluation of conjunctive queries Containment of conjunctive queries Unions of conjunctive queries Evaluation of conjunctive queries Containment of conjunctive queries Unions of conjunctive queries

Chap. 3: Conjunctive Queries Chap. 3: Conjunctive Queries Query containment for CQs Reducing containment of CQs to CQ evaluation

For CQs, query containment q (~x) q (~x) can be reduced to query 1 ⊆ 2 Theorem ([CM77]) evaluation. For CQs, q (~x) q (~x) iff = q (~c), where ~c are new constants. 1 ⊆ 2 Iq1(~c) | 2 1 Freeze the free variables, i.e., consider them as constants. Proof. This is possible, since q1(~x) q2(~x) iff ⊆ “ ” Assume that q1(~x) q2(~x). , α = q1(~x) implies , α = q2(~x), for all and α; or equivalently ⇒ ⊆ I | I | I Since = q (~c) it follows that = q (~c). = q (~c) implies = q (~c), for all , where ~c are new q1(~c) 1 q1(~c) 2 Iα,~c | 1 Iα,~c | 2 Iα,~c I | I | constants, and α,~c extends to the new constants with “ ” Assume that q1(~c) = q2(~c). α,~c I I ⇐ I | cI = α(x). By [CM77] on hom., for every such that = q (~c) there exists a I I | 1 homomorphism h from q1(~c) to . 2 I I Construct the canonical interpretation q1(~c) of the CQ q1(~c) on the I On the other hand, since q1(~c) = q2(~c), again by [CM77] on hom., there left hand side... I | exists a homomorphism h0 from to . Iq2(~c) Iq1(~c) The mapping h h0 (obtained by composing h and h0) is a homomorphism 3 . . . and evaluate on the CQ q (~c) on the right hand side, ◦ q1(~c) 2 from to . Hence, once again by [CM77] on hom., = q (~c). I Iq2(~c) I I | 2 i.e., check whether q1(~c) = q2(~c). I | unibz.it So we can conclude that q (~c) q (~c), and hence q (~x) q (~x). unibz.it 1 ⊆ 2 1 ⊆ 2 D. Calvanese Part 1: First-Order Queries KBDB – 2007/2008 (55/66) D. Calvanese Part 1: First-Order Queries KBDB – 2007/2008 (56/66) Evaluation of conjunctive queries Containment of conjunctive queries Unions of conjunctive queries Evaluation of conjunctive queries Containment of conjunctive queries Unions of conjunctive queries

Chap. 3: Conjunctive Queries Chap. 3: Conjunctive Queries Query containment for CQs Query containment for CQs – Complexity

For CQs, we also have that (boolean) query evaluation = q can be I| reduced to query containment. From the previous results and NP-completenss of combined complexity of CQ evaluation, we immediately get: Let = (∆ ,P , . . . , c ,...). I I I I We construct the (boolean) CQ q as follows: Theorem I q has no existential variables (hence no variables at all); Containment of CQs is NP-complete. I the constants in q are the elements of ∆I ; I for each relation P interpreted in and for each fact Since CQ evaluation is NP-complete even in query complexity, the I (a1, . . . , ak) P I , q contains one atom P (a1, . . . , ak) (note that above result can be strengthened: ∈ I each ai ∆I is a constant in q ). ∈ I Theorem Containment q (~x) q (~x) of CQs is NP-complete, even when q is 1 ⊆ 2 1 Theorem considered fixed. For CQs, = q iff q q. I| I ⊆ unibz.it unibz.it

D. Calvanese Part 1: First-Order Queries KBDB – 2007/2008 (57/66) D. Calvanese Part 1: First-Order Queries KBDB – 2007/2008 (58/66)

Evaluation of conjunctive queries Containment of conjunctive queries Unions of conjunctive queries Evaluation of conjunctive queries Containment of conjunctive queries Unions of conjunctive queries

Chap. 3: Conjunctive Queries Chap. 3: Conjunctive Queries Outline Union of conjunctive queries (UCQs)

Def.: A union of conjunctive queries (UCQ) is a FOL query of the form 6 Evaluation of conjunctive queries ~y .conj (~x,~y ) ∃ i i i i=1,...,n 7 Containment of conjunctive queries _ where each conj i(~x,~yi) is a conjunction of atoms and equalities with 8 Unions of conjunctive queries free variables ~x and ~yi, and possibly constants.

Note: Obviously, each conjunctive query is also a of union of conjunctive queries.

unibz.it unibz.it

D. Calvanese Part 1: First-Order Queries KBDB – 2007/2008 (59/66) D. Calvanese Part 1: First-Order Queries KBDB – 2007/2008 (60/66) Evaluation of conjunctive queries Containment of conjunctive queries Unions of conjunctive queries Evaluation of conjunctive queries Containment of conjunctive queries Unions of conjunctive queries

Chap. 3: Conjunctive Queries Chap. 3: Conjunctive Queries Datalog notation for UCQs Evaluation of UCQs

A union of conjunctive queries From the definition of FOL query we have that:

q = ~yi.conj i(~x,~yi) ∃ , α = ~yi.conj i(~x,~yi) i=1_,...,n I | ∃ i=1_,...,n is written in datalog notation as if and only if

q(~x) conj 10 (~x,~y10) { ←. , α = ~yi.conj i(~x,~yi) for some i 1, . . . , n . . I | ∃ ∈ { } q(~x) conj n0 (~x,~yn0) ← } Hence to evaluate a UCQ q, we simply evaluate a number (linear in the where each element of the set is the datalog expression corresponding to size of q) of conjunctive queries in isolation. the conjunctive query q = ~y .conj (~x,~y ). i ∃ i i i Hence, evaluating UCQs has the same complexity as evaluating CQs. Note: in general, we omit the set brackets. unibz.it unibz.it

D. Calvanese Part 1: First-Order Queries KBDB – 2007/2008 (61/66) D. Calvanese Part 1: First-Order Queries KBDB – 2007/2008 (62/66)

Evaluation of conjunctive queries Containment of conjunctive queries Unions of conjunctive queries Evaluation of conjunctive queries Containment of conjunctive queries Unions of conjunctive queries

Chap. 3: Conjunctive Queries Chap. 3: Conjunctive Queries UCQ evaluation – Combined, data, and query complexity Query containment for UCQs

Theorem (Combined complexity of UCQ evaluation) Theorem , α, q , α = q is NP-complete. For UCQs, q , . . . , q q , . . . , q iff for each q there is a q such {hI i | I | } { 1 k}⊆{ 10 n0 } i j0 time: exponential that qi qj0 . space: polynomial ⊆ Proof. Theorem (Data complexity of UCQ evaluation) “ ” Obvious. ⇐ , q , α = q is in LogSpace (query q fixed). {hI i | I | } “ ” If the containment holds, then we have time: polynomial ⇒ q1(~c), . . . , qk(~c) q10 (~c), . . . , qn0 (~c) , where ~c are new constants: space: logarithmic { }⊆{ } Now consider . We have = q (~c), and hence Iqi(~c) Iqi(~c) | i Theorem (Query complexity of UCQ evaluation) q (~c) = q1(~c), . . . , qk(~c) . I i | { } By the containment, we have that = q (~c), . . . , q (~c) . I.e., α, q , α = q is NP-complete (interpretation fixed). Iqi(~c) | { 10 n0 } {h i | I | } I there exists a q (~c) such that = q (~c). time: exponential j0 Iqi(~c) | j0 Hence, by [CM77] on containment of CQs, we have that q q . space: polynomial i ⊆ j0 unibz.it unibz.it

D. Calvanese Part 1: First-Order Queries KBDB – 2007/2008 (63/66) D. Calvanese Part 1: First-Order Queries KBDB – 2007/2008 (64/66) Evaluation of conjunctive queries Containment of conjunctive queries Unions of conjunctive queries Evaluation of conjunctive queries Containment of conjunctive queries Unions of conjunctive queries

Chap. 3: Conjunctive Queries Chap. 3: Conjunctive Queries Query containment for UCQs – Complexity References

[CM77] A. K. Chandra and P. M. Merlin. Optimal implementation of conjunctive queries in relational data bases. From the previous result, we have that we can check In Proc. of the 9th ACM Symp. on Theory of Computing (STOC’77), pages q1, . . . , q q0 , . . . , q0 by at most k n CQ containment checks. 77–90, 1977. { k}⊆{ 1 n} · [KV98] P. G. Kolaitis and M. Y. Vardi. We immediately get: Conjunctive-query containment and constraint satisfaction. In Proc. of the 17th ACM SIGACT SIGMOD SIGART Symp. on Principles of Theorem Database Systems (PODS’98), pages 205–213, 1998. Containment of UCQs is NP-complete. [Var82] M. Y. Vardi. The complexity of relational query languages. In Proc. of the 14th ACM SIGACT Symp. on Theory of Computing (STOC’82), pages 137–146, 1982.

unibz.it unibz.it

D. Calvanese Part 1: First-Order Queries KBDB – 2007/2008 (65/66) D. Calvanese Part 1: First-Order Queries KBDB – 2007/2008 (66/66)