Logical foundations of databases
ESSLLI 2016, day 2, Bolzano, Italy

Diego Figueira, Gabriele Puppis
CNRS, LaBRI

Recap

• Relational model (tables)

• Relational Algebra (union, product, difference, selection, projection)

• SQL (SELECT … FROM … WHERE …)

• RA ≈ basic SQL

• First-order logic (syntax, semantics)

• Expressiveness: FO =* RA

Formulas as queries


FO can serve as a declarative query language on relational databases: we express the properties of the answer.

Tables = Relations
Rows = Tuples
Queries = Formulas

RA =* FO (How = What)

RA and FO logic have roughly* the same expressive power! [E.F. Codd 1972]

*FO without functions, with equality, on finite domains, …

RA ⊆ FO

• R1 × R2 ⤳ R1(x1, …, xn) ⋀ R2(xn+1, …, xm)

• R1 ∪ R2 ⤳ R1(x1, …, xn) ∨ R2(x1, …, xn)

• σ{i1=j1,…,in=jn}(R) ⤳ R(x1, …, xm) ⋀ (xi1=xj1)⋀ ··· ⋀ (xin=xjn)

• π{i1,…,in}(R) ⤳ ∃({x1,…,xm} \ {xi1,…,xin}). R(x1, …, xm)

• R1 \ R2 ⤳ R1(x1, …, xn) ⋀ ¬R2(x1, …, xn)

• …
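The correspondence above can be tried out on small relations. The following is a minimal sketch, assuming relations are Python sets of tuples and columns are numbered from 1 as on the slides; the function names (`ra_product`, `ra_select`, ...) and the sample relations are illustrative assumptions, not part of the lecture.

```python
from itertools import product as cartesian


def ra_product(r1, r2):
    """R1 × R2: concatenate every tuple of r1 with every tuple of r2."""
    return {t1 + t2 for t1, t2 in cartesian(r1, r2)}


def ra_union(r1, r2):
    """R1 ∪ R2 (both relations are assumed to have the same arity)."""
    return r1 | r2


def ra_difference(r1, r2):
    """R1 \\ R2 (both relations are assumed to have the same arity)."""
    return r1 - r2


def ra_select(r, equalities):
    """σ{i=j,...}(R): keep tuples whose columns i and j agree (1-indexed)."""
    return {t for t in r if all(t[i - 1] == t[j - 1] for i, j in equalities)}


def ra_project(r, columns):
    """π{i1,...,in}(R): keep only the listed columns (1-indexed)."""
    return {tuple(t[i - 1] for i in columns) for t in r}


# Hypothetical sample data.
R1 = {(1, 2), (2, 3)}
R2 = {(2, 3), (3, 3)}

# RA expression: π{1,4}(σ{2=3}(R1 × R2))
joined = ra_select(ra_product(R1, R2), [(2, 3)])
result = ra_project(joined, [1, 4])

# The same answer read off the formula ∃x2,x3 . R1(x1,x2) ⋀ R2(x3,x4) ⋀ x2=x3
fo_result = {(x1, x4) for (x1, x2) in R1 for (x3, x4) in R2 if x2 == x3}
```

Both `result` and `fo_result` describe the same set of pairs, which is the point of the RA ⊆ FO translation.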


FO ⊆ RA does not hold in general!


FO ⊈ RA

“the complement of R” ∉ RA, but ∈ FO: ¬R(x)

⇝ We restrict variables to range over the active domain (the elements in the relations)

FOact = FO restricted to active domain

Example on a graph G with nodes v1, v2, v3, v4 (figure not reproduced):

φ1(x) = ∀y E(y,x)      φ1(G) = {v2}
φ2(x,y) = ¬E(x,y)      φ2(G) = {(v1,v1),(v3,v1),(v2,v3)}

First-order logic restricted to active domain

Formal Semantics of FOact

G ⊧α ∃x φ iff for some v ∈ ACT(G) and α' = α ∪ {x ↦ v} we have G ⊧α' φ

G ⊧α ∀x φ iff for every v ∈ ACT(G) and α' = α ∪ {x ↦ v} we have G ⊧α' φ

G ⊧α φ⋀ψ iff G ⊧α φ and G ⊧α ψ

G ⊧α ¬φ iff it is not true that G ⊧α φ

G ⊧α x=y iff α(x)=α(y)

G ⊧α E(x,y) iff (α(x),α(y)) ∈ E

ACT(G) = {v | for some v': (v,v') ∈ E or (v',v) ∈ E}
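The semantics above translates almost line by line into a recursive evaluator. This is a minimal sketch, assuming formulas are encoded as nested tuples such as `("forall", "y", ("E", "y", "x"))`; that encoding, the function names and the sample edge set are assumptions made for illustration (the sample graph is not the one from the slide's figure).

```python
from itertools import product


def act(E):
    """ACT(G): every node that occurs in some edge."""
    return {v for edge in E for v in edge}


def holds(phi, E, alpha):
    """Decide G ⊧α φ, with quantifiers ranging over ACT(G) only."""
    op = phi[0]
    if op == "E":
        return (alpha[phi[1]], alpha[phi[2]]) in E
    if op == "eq":
        return alpha[phi[1]] == alpha[phi[2]]
    if op == "not":
        return not holds(phi[1], E, alpha)
    if op == "and":
        return holds(phi[1], E, alpha) and holds(phi[2], E, alpha)
    if op == "exists":
        return any(holds(phi[2], E, {**alpha, phi[1]: v}) for v in act(E))
    if op == "forall":
        return all(holds(phi[2], E, {**alpha, phi[1]: v}) for v in act(E))
    raise ValueError(f"unknown connective: {op}")


def answers(phi, free_vars, E):
    """φ(G): all bindings of the free variables (over ACT(G)) satisfying φ."""
    dom = sorted(act(E))
    return {
        vals
        for vals in product(dom, repeat=len(free_vars))
        if holds(phi, E, dict(zip(free_vars, vals)))
    }


# Hypothetical graph: every active node has an edge into node 2.
E = {(1, 2), (2, 2), (3, 2)}
phi1 = ("forall", "y", ("E", "y", "x"))   # φ1(x) = ∀y E(y,x)
phi2 = ("not", ("E", "x", "y"))           # φ2(x,y) = ¬E(x,y)
```

Here `answers(phi1, ["x"], E)` contains only node 2, and `answers(phi2, ["x", "y"], E)` is finite because both variables are confined to the active domain.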

FOact ⊆ RA

Assume:
1. φ is in normal form: (∃* (¬∃)*)* followed by a quantifier-free ψ(x1,…,xn)
2. φ has n variables

Example: ∃x1 ∃x2 ¬∃x3 ∃x4 . ( E(x1,x3) ⋀ ¬E(x4,x2) ) ⋁ (x1=x3)

Adom = RA expression for the active domain = “π1(E) ∪ π2(E)”

Translation φ ⤳ φ✢:

• (R(xi1,…,xit))✢ ⤳ R, i.e. π{1,…,n}(σ{i1=n+1,…,it=n+t}(Adom^n × R))

• (xi = xj)✢ ⤳ σ{i=j}(Adom × ··· × Adom), i.e. σ{i=j}(Adom^n)

• (ψ1 ⋀ ψ2)✢ ⤳ ψ1✢ ∩ ψ2✢, where A ∩ B = ((A ∪ B) \ (A \ B)) \ (B \ A)

• (¬ψ)✢ ⤳ (Adom × ··· × Adom) \ ψ✢, i.e. Adom^t \ ψ✢ if t is the arity of ψ✢

• (∃xi φ(xi1,…,xit))✢ ⤳ π{i1,…,it}\{i}(φ✢)

Corollary

FOact is equivalent to RA
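Three ingredients of the translation (the Adom expression, negation as a complement inside Adom^t, and intersection written with union and difference only) can be checked on a toy relation. This is a hedged sketch rather than the full translation; the helper names and the sample edge set are assumptions.

```python
from itertools import product


def adom(E):
    """Adom = π1(E) ∪ π2(E), as a unary relation (a set of 1-tuples)."""
    return {(a,) for a, _ in E} | {(b,) for _, b in E}


def adom_power(E, t):
    """Adom × ··· × Adom (t times)."""
    values = [a for (a,) in adom(E)]
    return set(product(values, repeat=t))


def intersection(A, B):
    """A ∩ B expressed as ((A ∪ B) \\ (A \\ B)) \\ (B \\ A)."""
    return ((A | B) - (A - B)) - (B - A)


def negation(R, E, t):
    """(¬ψ)✢ = Adom^t \\ ψ✢, where R plays the role of ψ✢ and has arity t."""
    return adom_power(E, t) - R


# Hypothetical edge relation with active domain {1, 2, 3}.
E = {(1, 2), (2, 3)}
not_E = negation(E, E, 2)
```

The complement `not_E` stays finite because it is taken relative to Adom^2, which is exactly why the active-domain restriction makes ¬ expressible in RA.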


Question 1: How is π2(σ1=3(R1 × R2)) expressed in FO?
Remember: R1, R2 are binary.

Answer: ∃x1,x3,x4 (R1(x1,x2) ⋀ R2(x3,x4) ⋀ x1 = x3)

Question 2: How is ∃y,z . (R1(x,y) ⋀ R1(y,z) ⋀ x≠z) expressed in RA?
Remember: The signature is the same as before (R1, R2 binary). The RA operators are:

• R1 ∪ R2

• R1 × R2

• R1 \ R2

• σ{i1=j1,…,in=jn}(R) ≔ {(x1, …, xm) ∈ R | (xi1=xj1) ⋀ ··· ⋀ (xin=xjn)} (each = may also be ≠)

• π{i1,…,in}(R) ≔ {(xi1,…,xin) | (x1, …, xm) ∈ R}

Answer: π1(σ{2=3,1≠4}(R1 × R1))
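Both answers can be sanity-checked by evaluating the RA expression and the FO formula on the same small instance. The sketch below assumes relations are sets of tuples with 1-indexed columns; the helper names and the sample relations are made up for the check.

```python
from itertools import product


def ra_product(r1, r2):
    return {t1 + t2 for t1, t2 in product(r1, r2)}


def ra_select(r, conds):
    """conds: list of (i, op, j) with op "=" or "!=", columns 1-indexed."""
    def ok(t):
        return all((t[i - 1] == t[j - 1]) == (op == "=") for i, op, j in conds)
    return {t for t in r if ok(t)}


def ra_project(r, cols):
    return {tuple(t[i - 1] for i in cols) for t in r}


# Hypothetical binary relations.
R1 = {(1, 2), (2, 3), (3, 1), (2, 2)}
R2 = {(1, 5), (3, 3)}
dom = sorted({v for t in R1 | R2 for v in t})

# Question 1: π2(σ1=3(R1 × R2)) versus ∃x1,x3,x4 (R1(x1,x2) ⋀ R2(x3,x4) ⋀ x1=x3)
ra_q1 = ra_project(ra_select(ra_product(R1, R2), [(1, "=", 3)]), [2])
fo_q1 = {
    (x2,)
    for x2 in dom
    if any((x1, x2) in R1 and (x3, x4) in R2 and x1 == x3
           for x1, x3, x4 in product(dom, repeat=3))
}

# Question 2: ∃y,z (R1(x,y) ⋀ R1(y,z) ⋀ x≠z) versus π1(σ{2=3,1≠4}(R1 × R1))
fo_q2 = {
    (x,)
    for x in dom
    if any((x, y) in R1 and (y, z) in R1 and x != z
           for y, z in product(dom, repeat=2))
}
ra_q2 = ra_project(ra_select(ra_product(R1, R1), [(2, "=", 3), (1, "!=", 4)]), [1])
```

Agreement on one instance is of course not a proof of equivalence, only a quick check that the column bookkeeping is right.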

FO = RA = SQL

• FO: logic (over active domain)
• RA: algebra (on finite domains)
• SQL: programming language (very basic)

Algorithmic problems for query languages


Evaluation problem: Given a query Q, a database instance db, and a tuple t, is t ∈ Q(db)?
⇝ How hard is it to retrieve data?

Emptiness problem: Given a query Q, is there a database instance db such that Q(db) ≠ ∅?
⇝ Does Q make sense? Is it a contradiction? (Query optimization)

Equivalence problem: Given queries Q1, Q2, is Q1(db) = Q2(db) for all database instances db?
⇝ Can we safely replace a query with another? (Query optimization)

Complexity theory


What can be mechanized? ⤳ decidable/undecidable (undecidable examples: Hilbert’s 10th problem, Domino, PCP, K, …)
How hard is it to mechanize? ⤳ complexity classes
Usage of resources:
• time
• memory

Algorithm Alg is TIME-bounded (resp. SPACE-bounded) by a function f : N ⟶ N if Alg(input) uses less than f(|input|) units of TIME (resp. SPACE).

LOGSPACE ⊆ PTIME ⊆ PSPACE ⊆ EXPTIME ⊆ ···

• LOGSPACE: SPACE-bounded by log(n)
• PTIME: TIME-bounded by a polynomial
• PSPACE: SPACE-bounded by a polynomial

Algorithmic problems for FO


Evaluation problem: Given a FO formula φ(x1, …, xn), a graph G, and a binding α, does G ⊧α φ?
DECIDABLE ⇝ foundations of the database industry

Satisfiability problem: Given a FO formula φ, is there a graph G and a binding α such that G ⊧α φ?
UNDECIDABLE ⇝ both for ⊧ and ⊧finite

Equivalence problem: Given FO formulae φ, ψ, is G ⊧α φ iff G ⊧α ψ for all graphs G and bindings α?
UNDECIDABLE ⇝ by reduction from the satisfiability problem


Satisfiability problem: Given a FO formula φ, is there a graph G and a binding α such that G ⊧α φ?
UNDECIDABLE ⇝ both for ⊧ and ⊧finite [Trakhtenbrot ’50]

Proof: By reduction from the Domino (aka Tiling) problem.

Reduction from P to P': an algorithm that solves P using an O(1) procedure “P'(x)” that returns the truth value of P'(x).

The (undecidable) Domino problem


Domino
Input: 4-sided dominos (figure not reproduced)

Output: Is it possible to form a white-bordered rectangle (of any size)?

Rules: sides must match, you can’t rotate the dominos, but you can ‘clone’ them.


Domino - Why is it undecidable?

It can easily encode halting computations of Turing machines (tiling figure not reproduced). The dominos used in the encoding are of the following kinds:

• head is elsewhere, symbol is not modified
• head is here, symbol is rewritten, head moves right
• head is here, symbol is rewritten, head moves left
• initial configuration
• halting configuration

Domino ⇝ Sat-FO (domino has a solution iff φ satisfiable)


1. There is a grid: H( , ) and V( , ) are relations representing bijections such that… (grid figure not reproduced)

2. Assign one domino to each node: a unary relation Da(x) for each domino a

3. Match the sides: ∀x,y, if H(x,y), then Da(x) ⋀ Db(y) for some dominos a, b that ‘match’ horizontally (idem vertically)

4. Borders are white.

Algorithmic problems for FO


Equivalence problem: Given FO formulae φ, ψ, is G ⊧α φ iff G ⊧α ψ for all graphs G and bindings α?
UNDECIDABLE ⇝ by reduction from the satisfiability problem

φ is satisfiable iff φ is not equivalent to ⊥
Satisfiability problem undecidable ⇝ Equivalence problem undecidable

Actually, there are reductions in both senses:

φ(x1,…,xn) and ψ(y1,…,ym) are equivalent iff

• n=m

• (x1=y1) ⋀ ··· ⋀ (xn=yn) ⋀ φ(x1,…,xn) ⋀ ¬ψ(y1,…,yn) is unsatisfiable

• (x1=y1) ⋀ ··· ⋀ (xn=yn) ⋀ ψ(x1,…,xn) ⋀ ¬φ(y1,…,yn) is unsatisfiable


Evaluation problem for FO


Input: φ(x1,…,xn), G = (V,E), α : {x1,…,xn} ⟶ V
Output: G ⊧α φ ?

Encoding of G = (V, E)

• each node is coded with a bit string of size log(|V|),
• the edge set is encoded by its tuples, e.g. (100,101), (010,010), …

Cost of coding: ||G|| = |E|·2·log(|V|) ≈ |V| (mod a polynomial)

Encoding of α : {x1,…,xn} ⟶ V

• each node is coded with a bit string of size log(|V|)

Cost of coding: ||α|| = n·log(|V|)


• If φ(x1,…,xn) = E(xi,xj): answer YES iff (α(xi),α(xj)) ∈ E
  ⇝ LOGSPACE (use 4 pointers)

• If φ(x1,…,xn) = ψ(x1,…,xn) ⋀ ψ'(x1,…,xn): answer YES iff G ⊧α ψ and G ⊧α ψ'
  ⇝ MAX( SPACE(G ⊧α ψ), SPACE(G ⊧α ψ') )

• If φ(x1,…,xn) = ¬ψ(x1,…,xn): answer NO iff G ⊧α ψ
  ⇝ SPACE(G ⊧α ψ)

• If φ(x1,…,xn) = ∃y . ψ(x1,…,xn,y): answer YES iff for some v ∈ V and α' = α ∪ {y↦v} we have G ⊧α' ψ
  ⇝ 2·log(|G|) + SPACE(G ⊧α' ψ)

Question: How much space does it take?
2·log(|G|) + ··· + 2·log(|G|) (≤ |φ| times) + k·log(|α|+|G|) space, so the evaluation problem for FO is in PSPACE.

Evaluation problem for FO is PSPACE-complete


PSPACE-complete problem: QBF (satisfaction of Quantified Boolean Formulas)
QBF = a boolean formula with quantification over the truth values (T,F),
e.g. ∃p ∀q . (p ⋁ ¬q), where p, q range over {T,F}

Theorem: Evaluation for FO is PSPACE-complete (combined complexity)

Polynomial reduction QBF ⤳ FO:

1. Given ψ ∈ QBF, let ψ'(x) be the replacement of each ‘p’ with ‘p=x’ in ψ.
   E.g. ψ'(x) = ∃p ∀q . ( (p=x) ⋁ ¬(q=x) )

2. Note: ∃x ψ' holds in a 2-element graph iff ψ is QBF-satisfiable.
   E.g. ∃x ∃p ∀q . ( (p=x) ⋁ ¬(q=x) )

3. Test if G ⊧∅ ∃x ψ' for G=({v,v'},{})
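The three steps of the reduction can be replayed on small QBF instances. This is a sketch under stated assumptions: formulas are nested tuples, the QBF is closed and does not use a variable named `x`, and the function names are hypothetical. It evaluates a QBF directly and via the FO translation on the 2-element graph, so the two answers can be compared.

```python
def eval_qbf(psi, env=None):
    """Evaluate a closed QBF; variables range over {True, False}."""
    env = env or {}
    op = psi[0]
    if op == "var":
        return env[psi[1]]
    if op == "not":
        return not eval_qbf(psi[1], env)
    if op == "or":
        return eval_qbf(psi[1], env) or eval_qbf(psi[2], env)
    if op == "and":
        return eval_qbf(psi[1], env) and eval_qbf(psi[2], env)
    if op == "exists":
        return any(eval_qbf(psi[2], {**env, psi[1]: b}) for b in (True, False))
    if op == "forall":
        return all(eval_qbf(psi[2], {**env, psi[1]: b}) for b in (True, False))
    raise ValueError(f"unknown connective: {op}")


def to_fo(psi):
    """Step 1: replace each propositional variable p by the atom p = x."""
    op = psi[0]
    if op == "var":
        return ("eq", psi[1], "x")
    if op == "not":
        return ("not", to_fo(psi[1]))
    if op in ("or", "and"):
        return (op, to_fo(psi[1]), to_fo(psi[2]))
    return (op, psi[1], to_fo(psi[2]))  # quantifiers are kept as they are


def eval_fo(phi, V, alpha=None):
    """Evaluate an FO formula using only equality, over the node set V."""
    alpha = alpha or {}
    op = phi[0]
    if op == "eq":
        return alpha[phi[1]] == alpha[phi[2]]
    if op == "not":
        return not eval_fo(phi[1], V, alpha)
    if op == "or":
        return eval_fo(phi[1], V, alpha) or eval_fo(phi[2], V, alpha)
    if op == "and":
        return eval_fo(phi[1], V, alpha) and eval_fo(phi[2], V, alpha)
    if op == "exists":
        return any(eval_fo(phi[2], V, {**alpha, phi[1]: v}) for v in V)
    if op == "forall":
        return all(eval_fo(phi[2], V, {**alpha, phi[1]: v}) for v in V)
    raise ValueError(f"unknown connective: {op}")


def qbf_via_fo(psi):
    """Steps 2 and 3: test ∃x ψ' on the graph G = ({v, v'}, {})."""
    return eval_fo(("exists", "x", to_fo(psi)), {"v", "v'"})


# ∃p ∀q . (p ⋁ ¬q), the example from the slide
psi1 = ("exists", "p", ("forall", "q", ("or", ("var", "p"), ("not", ("var", "q")))))
# ∀p ∃q . (p ⋀ q), an additional example
psi2 = ("forall", "p", ("exists", "q", ("and", ("var", "p"), ("var", "q"))))
```

The element bound to x plays the role of T and the other element plays the role of F, which is why two nodes suffice.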


Combined, Query, and Data complexities [Vardi, 1982]

Problem: Usual scenario in databases:
• a database of size 10^6
• a query of size 100

Input: query + database

But we don’t distinguish this in the analysis: TIME(2^|query| + |data|) = TIME(|query| + 2^|data|)

Query and data play very different roles.

Separation of concerns: how the resources grow with respect to
• the size of the data
• the query size


Combined complexity: input size is |query| + |data|

Query complexity (|data| fixed): input size is |query|

Data complexity (|query| fixed): input size is |data|

O(2^|query| + |data|) is
• exponential in combined complexity
• exponential in query complexity
• linear in data complexity

O(|query| + 2^|data|) is
• exponential in combined complexity
• linear in query complexity
• exponential in data complexity
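To get a feeling for how different the two bounds are in the usual scenario (a query of size 100, a database of size 10^6), one can compare the magnitudes directly. The snippet below is only an arithmetic illustration with hypothetical variable names; it compares bit lengths instead of printing the numbers, since the second one has about a million bits.

```python
query_size = 100
data_size = 10**6

# Number of steps under each bound (constants ignored).
cost_exp_in_query = 2**query_size + data_size   # O(2^|query| + |data|)
cost_exp_in_data = query_size + 2**data_size    # O(|query| + 2^|data|)

# Compare magnitudes through the number of bits needed to write each cost.
bits_exp_in_query = cost_exp_in_query.bit_length()
bits_exp_in_data = cost_exp_in_data.bit_length()
```

The first cost fits in about a hundred bits, the second needs about a million, even though both bounds are "exponential in combined complexity".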

Question


What is the data, query and combined complexity for the evaluation problem for FO?

Remember:
• data complexity, input size: |data|
• query complexity, input size: |query|
• combined complexity, input size: |data| + |query|

|φ| · 2 · log(|G|) + k·log(|α|+|G|) space, where φ is the query and G is the data

O(log(|data|)·|query|) space:
• PSPACE combined and query complexity
• LOGSPACE data complexity

Recap

• UNDECIDABLE: Equivalence-RA, Equivalence-SQL, Equivalence-FO, Sat-FO, Domino
• PSPACE: Eval-FO (combined), QBF
• LOGSPACE: Eval-FO (data)

Trading expressiveness for efficiency


Alternation of quantifiers significantly affects complexity (recall that evaluation of QBF is PSPACE-complete: ∀x ∃y ∀z ∃w … φ).

What happens if we disallow ∀ and ¬ ?

The class NP


LOGSPACE ⊆ PTIME ⊆ NP ⊆ PSPACE ⊆ EXPTIME

NP = Problems whose solutions can be witnessed by a certificate to be guessed and checked in polynomial time (e.g. a colouring)

Examples:

• 3-COLORABILITY: Given a graph G, can we assign a colour from {R,G,B} to each node so that adjacent nodes always have different colours?

• SAT: Given a propositional formula, e.g. (p ⋁ ¬q ⋁ r) ⋀ (¬p ⋁ s) ⋀ (¬s ⋁ ¬p), can we assign a truth value to each variable so that the formula becomes true?

• MONEY-CHANGE: Given an amount of money A and a set of coins {B1, …, Bn}, can we find a subset S ⊆ {B1, …, Bn} such that ∑ S = A?
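The "guess and check" reading of NP can be illustrated on 3-COLORABILITY: checking a proposed colouring (the certificate) is cheap, whereas the naive deterministic search tries exponentially many colourings. A minimal sketch, with hypothetical function names and sample graphs:

```python
from itertools import product

COLOURS = ("R", "G", "B")


def is_valid_colouring(edges, colouring):
    """Polynomial-time check of a certificate: adjacent nodes differ."""
    return (
        all(c in COLOURS for c in colouring.values())
        and all(colouring[u] != colouring[v] for u, v in edges)
    )


def find_colouring(edges):
    """Deterministic simulation of the guess: try all 3^|V| colourings."""
    nodes = sorted({v for edge in edges for v in edge})
    for guess in product(COLOURS, repeat=len(nodes)):
        colouring = dict(zip(nodes, guess))
        if is_valid_colouring(edges, colouring):
            return colouring
    return None


# Hypothetical instances: a triangle (3-colourable) and the complete graph
# on four nodes (not 3-colourable).
triangle = [(1, 2), (2, 3), (1, 3)]
k4 = [(a, b) for a in range(1, 5) for b in range(a + 1, 5)]
```

Only `is_valid_colouring` corresponds to the polynomial-time part of the definition; `find_colouring` replaces the non-deterministic guess by exhaustive enumeration.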


Non-deterministic computation (figure not reproduced): starting from the initial configuration, non-deterministic transitions give rise to many paths, each of length bounded by a polynomial, ending in final configurations. A solution exists if there is at least one successful path.

Question


Consider: Positive FO = FO without ∀,¬

E.g. φ = ∃x ∃y ∃z . (E(x, y) ⋁ E(y, z)) ⋀ (y=z ⋁ E(x, z))

What is the complexity of evaluating Positive FO on graphs?

Solution

This is in NP: Given φ and G=(V, E), it suffices to guess a binding α : {x, y, z, …} → V and then verify that the formula holds.
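The NP argument for Positive FO can be made concrete: the certificate is the binding α, and verifying it only requires evaluating the quantifier-free part. The sketch below replaces the guess by exhaustive enumeration; the tuple encoding of formulas, the function names and the sample graph are assumptions for illustration.

```python
from itertools import product


def check(matrix, E, alpha):
    """Polynomial-time verification of a quantifier-free positive formula."""
    op = matrix[0]
    if op == "E":
        return (alpha[matrix[1]], alpha[matrix[2]]) in E
    if op == "eq":
        return alpha[matrix[1]] == alpha[matrix[2]]
    if op == "and":
        return check(matrix[1], E, alpha) and check(matrix[2], E, alpha)
    if op == "or":
        return check(matrix[1], E, alpha) or check(matrix[2], E, alpha)
    raise ValueError(f"unknown connective: {op}")


def find_witness(variables, matrix, V, E):
    """Deterministic stand-in for the guess: try every binding α."""
    for values in product(sorted(V), repeat=len(variables)):
        alpha = dict(zip(variables, values))
        if check(matrix, E, alpha):
            return alpha
    return None


# φ = ∃x ∃y ∃z . (E(x,y) ⋁ E(y,z)) ⋀ (y=z ⋁ E(x,z)), the example from the slide
matrix = (
    "and",
    ("or", ("E", "x", "y"), ("E", "y", "z")),
    ("or", ("eq", "y", "z"), ("E", "x", "z")),
)

# Hypothetical graph with a single edge.
V = {1, 2, 3}
E = {(1, 2)}
witness = find_witness(["x", "y", "z"], matrix, V, E)
```

Any binding returned by `find_witness` is a certificate that `check` accepts in time polynomial in |φ| and |G|.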


Conjunctive Queries

Def. CQ = FO without ∀,¬,⋁

Normal form: “∃x1, …, xn . φ(x1, …, xn)”, where φ is quantifier-free and has no equalities!

Eg: φ(x, y) = ∃ z . (Parent(x, z) ⋀ Parent(z, y))

Usual notation: “Grandparent(X,Y) : – Parent(X,Z), Parent(Z,Y)”
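The Grandparent query can be evaluated either by reading the formula directly (a join on the shared variable z) or as a π-σ-× expression. A minimal sketch, with a made-up Parent relation and hypothetical function names:

```python
from itertools import product


def grandparent_fo(parent):
    """φ(x, y) = ∃z . (Parent(x, z) ⋀ Parent(z, y)), read as a join on z."""
    return {(x, y) for (x, z) in parent for (z2, y) in parent if z == z2}


def grandparent_ra(parent):
    """The same query as π{1,4}(σ{2=3}(Parent × Parent))."""
    pairs = {t1 + t2 for t1, t2 in product(parent, parent)}
    selected = {t for t in pairs if t[1] == t[2]}
    return {(t[0], t[3]) for t in selected}


# Hypothetical data.
parent = {("ann", "bob"), ("bob", "cid"), ("bob", "dee")}
```

Both functions return the pairs (ann, cid) and (ann, dee) on this instance.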

It corresponds to positive “SELECT-FROM-WHERE” SQL queries:
Select ... From ... Where Z (no negation)

It corresponds to “π-σ-×” RA queries:
πX(σZ(R1 ×···× Rn)) (no negation or disjunction)

Bibliography

Abiteboul, Hull, Vianu, “Foundations of Databases”, Addison-Wesley, 1995.

(freely available at http://webdam.inria.fr/Alice/)

Chapters 1, 2, 3
