Approximations of Consistent Query Answers 1
Jef Wijsen
UMONS
DaQuaTa International Workshop 2016 Lyon, 12–13 December 2016
1 Joint work with Floris Geerts, Paris Koutris, and Fabian Pijcke
Jef Wijsen (UMONS) Approximations of CQA DaQuaTa 2016
Outline
1 Motivation
2 On the Complexity of Embracing Primary Key Violations
3 First-Order Under-Approximations Of Consistent Query Answers
4 Beyond (Un)certainty: Counts and Probabilities
5 Attack Graphs, a Complexity Classification Tool
6 Final Thoughts
Consistent and Complete Relational Database
Integrity constraints are satisfied. The database contains all (and only) the facts that are true (Closed-World Assumption). No missing values.
Dealing with Imperfect Data
It is common to have inconsistent data, incomplete data, missing data, uncertain data... What can we do with this data?
Some data quality problems already have principled solutions: incomplete data [IJ84], NULLs in SQL [Lib16]...
Embrace Imperfections
Example: relation P

PID  FirstName  LastName  BloodType  Gender  ···
1    John       Adams     NULL       M       ···
2    Jan        Peeters   A+         M       ···
3    Jean       Dubois    A+         M       ···
3    Jean       Dubois    AB+        M       ···

Caveat: If we embrace imperfections, we should rethink query answering. For example, what should be the answer to the following queries?

SELECT COUNT(DISTINCT PID) FROM P WHERE BloodType = 'A+';
SELECT COUNT(DISTINCT PID) FROM P WHERE BloodType <> 'A+';
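To see why the question matters, the two queries can be run as-is against the example relation. The following is only a sketch using Python's built-in sqlite3 module; the column types and the helper name count_pids are assumptions made for illustration, not part of the slides.

```python
import sqlite3

# The example relation P, restricted to the columns shown on the slide.
ROWS = [
    (1, "John", "Adams", None, "M"),   # missing blood type (NULL)
    (2, "Jan", "Peeters", "A+", "M"),
    (3, "Jean", "Dubois", "A+", "M"),  # PID 3 occurs twice with
    (3, "Jean", "Dubois", "AB+", "M"), # conflicting blood types
]


def count_pids(condition: str) -> int:
    """Run SELECT COUNT(DISTINCT PID) FROM P WHERE <condition> on a fresh copy of P."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE P (PID INTEGER, FirstName TEXT, LastName TEXT, "
        "BloodType TEXT, Gender TEXT)"
    )
    conn.executemany("INSERT INTO P VALUES (?, ?, ?, ?, ?)", ROWS)
    (n,) = conn.execute(
        f"SELECT COUNT(DISTINCT PID) FROM P WHERE {condition}"
    ).fetchone()
    conn.close()
    return n


a_plus = count_pids("BloodType = 'A+'")
not_a_plus = count_pids("BloodType <> 'A+'")
print(a_plus, not_a_plus)
```

Under standard SQL semantics, PID 3 is counted by both queries (it has an A+ row and an AB+ row), while PID 1 is counted by neither (a comparison with NULL is never true).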
Embrace Primary Key Violations
Data model: We allow (primary) key violations.

Example (the key of WorksFor is Agent; the key of ManagedBy is Dept)

WorksFor
Agent     Dept
Sherlock  MI6
- - - - - - - -
James     CIA
James     MI6

ManagedBy
Dept  Mgr   Budget
CIA   John  60M
- - - - - - - - - -
MI6   Alex  60M

⟹ James works for either CIA or MI6.

Definition (Block): A block is a maximal set of tuples of the same relation with the same value for the key. (Blocks are separated by dashed lines.)
Certainty Semantics
Definition (Repair and Certainty): A repair is obtained by selecting exactly one tuple from each block. A Boolean query is certain if it is true in all repairs.

Certainty semantics

WorksFor
Agent     Dept
Sherlock  MI6
- - - - - - - -
James     CIA
James     MI6

ManagedBy
Dept  Mgr   Budget
CIA   John  60M
- - - - - - - - - -
MI6   Alex  60M

Is the budget of James’ department equal to 60M?
∃d∃m (WorksFor(‘James’, d) ∧ ManagedBy(d, m, ‘60M’)) is certain.
Is James’ department managed by Alex?
∃d∃b (WorksFor(‘James’, d) ∧ ManagedBy(d, ‘Alex’, b)) is not certain.
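The definition can be checked by brute force on this small database: enumerate every repair and evaluate the query in each. This is a minimal sketch, not an efficient method; the dictionary layout (key value mapped to its block) and all function names are assumptions made for illustration.

```python
from itertools import product

# Each relation maps a key value to its block (the tuples sharing that key).
WORKS_FOR = {
    "Sherlock": [("Sherlock", "MI6")],
    "James": [("James", "CIA"), ("James", "MI6")],
}
MANAGED_BY = {
    "CIA": [("CIA", "John", "60M")],
    "MI6": [("MI6", "Alex", "60M")],
}


def repairs(relation):
    """Yield every repair of one relation: exactly one tuple from each block."""
    for choice in product(*relation.values()):
        yield set(choice)


def is_certain(query):
    """A Boolean query is certain if it is true in all repairs."""
    return all(
        query(wf, mb)
        for wf in repairs(WORKS_FOR)
        for mb in repairs(MANAGED_BY)
    )


def budget_is_60m(wf, mb):
    # ∃d∃m (WorksFor('James', d) ∧ ManagedBy(d, m, '60M'))
    return any(
        a == "James" and d == d2 and b == "60M"
        for (a, d) in wf
        for (d2, m, b) in mb
    )


def managed_by_alex(wf, mb):
    # ∃d∃b (WorksFor('James', d) ∧ ManagedBy(d, 'Alex', b))
    return any(
        a == "James" and d == d2 and m == "Alex"
        for (a, d) in wf
        for (d2, m, b) in mb
    )


certain_budget = is_certain(budget_is_60m)
certain_alex = is_certain(managed_by_alex)
print(certain_budget, certain_alex)
```

The second query fails in the repair where James works for CIA, since CIA is managed by John.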
The Computational Complexity of Deciding Certainty I
Relation with exponentially many repairs

WorksFor
Agent  Dept
1      MI6
1      CIA
2      MI6
2      CIA
...    ...
n      MI6
n      CIA

This WorksFor relation contains 2n tuples and has 2^n = (√2)^(2n) distinct repairs.
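The count follows from the definition of a repair: one independent choice per block, so the number of repairs is the product of the block sizes. A small sketch, with hypothetical helper names, that confirms this for small n:

```python
from itertools import product
from math import prod


def works_for(n):
    """Build the relation from the slide: agents 1..n, each in a block of two tuples."""
    return {agent: [(agent, "MI6"), (agent, "CIA")] for agent in range(1, n + 1)}


def count_repairs(relation):
    """Number of repairs = product of the block sizes."""
    return prod(len(block) for block in relation.values())


def enumerate_repairs(relation):
    """Explicit enumeration; only feasible for small n."""
    return [frozenset(choice) for choice in product(*relation.values())]


counts = [count_repairs(works_for(n)) for n in range(1, 6)]
print(counts)
```

Enumerating repairs explicitly is therefore exponential in the size of the database, which is why the complexity of deciding certainty is not obvious.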
The Computational Complexity of Deciding Certainty II
Example of Low Complexity
Let
q1 = ∃d∃b (WorksFor(‘James’, d) ∧ ManagedBy(d, ‘Alex’, b))
For example, q1 is certain in the following database:

ManagedBy
Dept  Mgr   Budget
CIA   Alex  50M
CIA   Alex  60M
- - - - - - - - - -
MI6   Alex  60M

WorksFor
Agent  Dept
James  CIA
James  MI6

One can verify that q1 is certain iff the following query is true:
∃d WorksFor(‘James’, d) ∧
∀d (WorksFor(‘James’, d) → ∃m∃b [ManagedBy(d, m, b) ∧
    ∀m∀b (ManagedBy(d, m, b) → m = ‘Alex’)])
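The first-order rewriting can be evaluated directly on the inconsistent database, without looking at any repair. The sketch below evaluates it and, as a sanity check, compares it with brute-force enumeration of repairs. It assumes that the first attribute of each relation is its key; the function names are illustrative only.

```python
from itertools import product

WORKS_FOR = [("James", "CIA"), ("James", "MI6")]
MANAGED_BY = [("CIA", "Alex", "50M"), ("CIA", "Alex", "60M"), ("MI6", "Alex", "60M")]


def rewriting_holds(works_for, managed_by):
    """Evaluate the first-order rewriting of CERTAINTY(q1) on the database itself."""
    james_depts = [d for (a, d) in works_for if a == "James"]
    if not james_depts:  # ∃d WorksFor('James', d)
        return False
    for d in james_depts:  # ∀d (WorksFor('James', d) → ...)
        managers = [m for (d2, m, b) in managed_by if d2 == d]
        # d must have a ManagedBy tuple, and all its managers must be Alex.
        if not managers or any(m != "Alex" for m in managers):
            return False
    return True


def repairs(tuples):
    """All repairs of a relation whose key is assumed to be the first attribute."""
    blocks = {}
    for t in tuples:
        blocks.setdefault(t[0], []).append(t)
    return [set(choice) for choice in product(*blocks.values())]


def certain_by_enumeration(works_for, managed_by):
    """Check q1 in every repair (exponential; for comparison only)."""
    return all(
        any(a == "James" and d == d2 and m == "Alex"
            for (a, d) in wf for (d2, m, b) in mb)
        for wf in repairs(works_for)
        for mb in repairs(managed_by)
    )


print(rewriting_holds(WORKS_FOR, MANAGED_BY),
      certain_by_enumeration(WORKS_FOR, MANAGED_BY))
```

Adding a tuple such as ("MI6", "John", "60M") makes both checks fail, because some repair then has James in a department not managed by Alex.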
The Computational Complexity of Deciding Certainty III
Example of Higher Complexity
Is some department self-managed (i.e., managed by an agent of the department)? Let
q_self_managed = ∃d∃m∃b (ManagedBy(d, m, b) ∧ WorksFor(m, d))
For example, q_self_managed is certain in the following database:

ManagedBy
Dept  Mgr       Budget
CIA   James     60M
- - - - - - - - - - - -
MI6   James     60M
MI6   Cherlock  60M

WorksFor
Agent     Dept
James     CIA
James     MI6
- - - - - - - -
Cherlock  MI6

No first-order query can decide whether q_self_managed is certain [Wij10]. Intuition: neither d nor m can be “skolemized.”
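One way to get a feel for the intuition is to list, for every repair of this database, the (d, m) pairs that witness the query. The sketch below does this by brute force; it assumes the first attribute of each relation is its key, and the helper names are illustrative.

```python
from itertools import product

MANAGED_BY = [("CIA", "James", "60M"), ("MI6", "James", "60M"), ("MI6", "Cherlock", "60M")]
WORKS_FOR = [("James", "CIA"), ("James", "MI6"), ("Cherlock", "MI6")]


def repairs(tuples):
    """All repairs of a relation whose key is assumed to be the first attribute."""
    blocks = {}
    for t in tuples:
        blocks.setdefault(t[0], []).append(t)
    return [set(choice) for choice in product(*blocks.values())]


def witnesses(mb, wf):
    """(d, m) pairs with ManagedBy(d, m, b) and WorksFor(m, d) in the given repair."""
    return {(d, m) for (d, m, b) in mb if (m, d) in wf}


per_repair = [
    witnesses(mb, wf)
    for mb in repairs(MANAGED_BY)
    for wf in repairs(WORKS_FOR)
]
is_certain = all(per_repair)             # every repair has at least one witness
common = set.intersection(*per_repair)   # witnesses shared by all repairs
print(is_certain, common)
```

On this database the query is certain, yet no single (d, m) pair is a witness in every repair: which department is self-managed depends on the repair.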
The Computational Complexity of Deciding Certainty IV
Definition: For every Boolean first-order query q, the problem CERTAINTY(q) is the following:
Input: A database instance (possibly with key violations)
Question: Is q certain?

Complexity Classification Task
Input: A Boolean first-order query q
Question: What complexity classes does CERTAINTY(q) belong to?

Complexity classes of interest:
FO ⊆ P ⊆ coNP
Complexity in FO is of interest to database practitioners, because it allows for implementation in SQL.
Main Result
We solved the aforementioned complexity classification task when the input queries q are conjunctive and self-join-free (i.e., no relation name occurs more than once in q):

Theorem (Complexity Classification): For every self-join-free Boolean conjunctive query q, the following hold:
1 CERTAINTY(q) is either in P or coNP-complete (and the dichotomy is decidable);
2 it can be decided whether CERTAINTY(q) is in FO; and
3 if CERTAINTY(q) is in FO, then its first-order definition can be computed effectively.

The theorem settles a conjecture that had been open for 10 years. ACM SIGMOD Research Highlight Award 2015 was awarded to [KW15].
The Geography of coNP (assuming P ≠ coNP)
[Figure: the class coNP, with regions labelled FO, P, coNP-intermediate, and coNP-complete.]
Examples of Different Complexities
Example
q1 = ∃d∃b (WorksFor(‘James’, d) ∧ ManagedBy(d, ‘Alex’, b))
q_self_managed = ∃d∃m∃b (ManagedBy(d, m, b) ∧ WorksFor(m, d))
q3 = ∃d∃m∃b∃a (ManagedBy(d, m, b) ∧ WorksFor(a, m)) (a meaningless query for our example database)
Our results allow us to tell that
CERTAINTY(q1) is in FO;
CERTAINTY(q_self_managed) is in P but not in FO; and
CERTAINTY(q3) is coNP-complete.
Open Problems
Conjecture For every Boolean conjunctive query q, CERTAINTY(q) is in P or coNP-complete.
Conjecture For every query q that is a finite disjunction of Boolean conjunctive queries, CERTAINTY(q) is in P or coNP-complete.
Caveat It is known [Fon13] that the latter conjecture implies Bulatov’s complexity dichotomy theorem for conservative CSP [Bul11], the proof of which is very involved (the full paper contains 66 pages).
CQA for Open Queries
Definition (Consistent query answer): Let q be an open query (i.e., containing at least one free variable). Given a database db, the consistent answer to q is defined by
⋂ { q(r) | r is a repair of db }.
We write ⌊q⌋ for the query that maps db to the consistent answer to q.

Example
q = {x | WorksFor(x, ‘MI6’)}
db =
WorksFor
Agent     Dept
Sherlock  MI6
- - - - - - - -
James     CIA
James     MI6

q returns ‘Sherlock’ and ‘James’; ⌊q⌋ returns only ‘Sherlock’.
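The definition translates directly into an intersection over all repairs. The following is a brute-force sketch of the example, assuming Agent is the key of WorksFor; the function names are illustrative.

```python
from itertools import product

WORKS_FOR = [("Sherlock", "MI6"), ("James", "CIA"), ("James", "MI6")]


def repairs(tuples):
    """All repairs of a relation whose key is assumed to be the first attribute."""
    blocks = {}
    for t in tuples:
        blocks.setdefault(t[0], []).append(t)
    return [set(choice) for choice in product(*blocks.values())]


def q(works_for):
    # q = {x | WorksFor(x, 'MI6')}
    return {agent for (agent, dept) in works_for if dept == "MI6"}


# q evaluated on the inconsistent database itself.
plain_answer = q(WORKS_FOR)

# Consistent answer: intersection of q(r) over all repairs r.
consistent_answer = set.intersection(*(q(r) for r in repairs(WORKS_FOR)))

print(plain_answer, consistent_answer)
```

‘James’ is dropped from the consistent answer because the repair that keeps (James, CIA) does not place him in MI6.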
Free Variables Can be Treated as Constants
Note: For every relational calculus query q(x⃗) and database db,
c⃗ ∈ ⌊q⌋(db) ⟺ the Boolean query q[x⃗ ↦ c⃗] is certain in db.

Carry Over of Complexity
For every self-join-free conjunctive query q(x⃗) and sequence c⃗ of constants (of the same length as x⃗):
CERTAINTY(q[x⃗ ↦ c⃗]) is in FO ⟺ ⌊q⌋ can be expressed in relational calculus (or SQL);
CERTAINTY(q[x⃗ ↦ c⃗]) is in P ⟺ ⌊q⌋ can be computed in polynomial time; and
CERTAINTY(q[x⃗ ↦ c⃗]) is coNP-hard ⟹ ⌊q⌋ cannot be computed in polynomial time (unless P = coNP).
Restricted Setting for CQA II
Setting
Database owner Bob answers queries on his database db, subject to two postulates:
Consistent query answering: Inconsistencies must not be divulged.
Upper bounded complexity: Only queries with low data complexity (in FO) will be answered.
Database querier Alice can post-process query answers by means of (finitely many) first-order operations.

Strategy
Suppose Alice wants to ask query q. What is her best strategy to get a maximal subset of q(db), without false positives?
Bob’s Interface
Bob only returns consistent query answers computable with low complexity.
[Diagram: a self-join-free conjunctive query q is sent to the interface, which performs CQA on db. If ⌊q⌋ is in FO, the interface returns ⌊q⌋(db); else it rejects.]
Strategy Examples I
WorksFor
Agent     Dept
James     CIA
James     MI6
- - - - - - - -
Cherlock  MI6

Example
Alice wants to answer q = {a | ∃d1∃d2 (WorksFor(a, d1) ∧ WorksFor(a, d2) ∧ d1 ≠ d2)}.
Alice: q1 = {a | ∃d WorksFor(a, d)}
Bob: ‘Cherlock’ and ‘James’.
Alice: q2 = {⟨a, d⟩ | WorksFor(a, d)}
Bob: ⟨‘Cherlock’, ‘MI6’⟩.
Alice: The answer to q is ‘James’.
Alice’s strategy can be summarized as follows: {a | ⌊q1⌋ ∧ ¬∃d ⌊q2⌋}
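The dialogue can be replayed in code: Bob's side computes consistent answers by brute force over repairs, and Alice's side only combines what Bob returns. This is a sketch under the assumption that Agent is the key of WorksFor; all names are illustrative.

```python
from itertools import product

WORKS_FOR = [("James", "CIA"), ("James", "MI6"), ("Cherlock", "MI6")]


def repairs(tuples):
    """All repairs of a relation whose key is assumed to be the first attribute."""
    blocks = {}
    for t in tuples:
        blocks.setdefault(t[0], []).append(t)
    return [set(choice) for choice in product(*blocks.values())]


def consistent_answer(query, tuples):
    """Bob's side: intersection of the query answers over all repairs."""
    return set.intersection(*(query(r) for r in repairs(tuples)))


def q1(wf):
    # q1 = {a | ∃d WorksFor(a, d)}
    return {(a,) for (a, d) in wf}


def q2(wf):
    # q2 = {<a, d> | WorksFor(a, d)}
    return set(wf)


bob_q1 = consistent_answer(q1, WORKS_FOR)
bob_q2 = consistent_answer(q2, WORKS_FOR)

# Alice's side: {a | ⌊q1⌋ ∧ ¬∃d ⌊q2⌋}, using only Bob's two answers.
strategy = {a for (a,) in bob_q1 if all(a != a2 for (a2, d) in bob_q2)}

# For comparison: q evaluated directly on the inconsistent database.
direct = {
    a for (a, d1) in WORKS_FOR for (a2, d2) in WORKS_FOR
    if a == a2 and d1 != d2
}
print(strategy, direct)
```

On this database the strategy returns exactly the agents with conflicting departments, even though each individual answer from Bob is a consistent query answer.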
Strategy Examples II
Example
Alice wants to get budgets of self-managed departments:
q_self_managed = {b | ∃d∃m (ManagedBy(d, m, b) ∧ WorksFor(m, d))}.
Since ⌊q_self_managed⌋ is not in FO, the query q_self_managed will be rejected. The following queries will not be rejected:
q0 = {d, m, b | ManagedBy(d, m, b)} and q1 = {m, d | WorksFor(m, d)}
q2 = {d, b | ∃m (ManagedBy(d, m, b) ∧ WorksFor(m, d))}
q3 = {a, b | ∃d (WorksFor(a, d) ∧ ManagedBy(d, a, b))}
Some strategies:
s01 = {b | ∃d∃m (⌊q0⌋ ∧ ⌊q1⌋)}
s2 = {b | ∃d ⌊q2⌋}
s3 = {b | ∃a ⌊q3⌋}
Since s01 ⊆ s2 and s01 ⊆ s3, the strategy s2 ∪ s3 seems optimal.
Strategy Examples III
Example (Continued)
q2 = {d, b | ∃m (ManagedBy(d, m, b) ∧ WorksFor(m, d))}
q3 = {a, b | ∃d (WorksFor(a, d) ∧ ManagedBy(d, a, b))}
On the next db, ⌊q2⌋ returns {⟨‘CIA’, ‘60M’⟩}, while ⌊q3⌋ returns ∅.

ManagedBy                WorksFor
Dept  Mgr   Budget       Agent  Dept
CIA   John  60M          John   CIA
CIA   Alex  60M          Alex   CIA

On the next db, ⌊q3⌋ returns {⟨‘James’, ‘50M’⟩}, while ⌊q2⌋ returns ∅.

WorksFor           ManagedBy
Agent  Dept        Dept  Mgr    Budget
James  MI6         MI6   James  50M
James  MI7         MI7   James  50M

No perfect strategy
Since every strategy is in FO, but ⌊q_self_managed⌋ is not in FO, no strategy is equivalent to ⌊q_self_managed⌋.
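The two databases above can be checked by enumerating repairs. This is a minimal sketch, assuming that Dept is the key of ManagedBy and Agent is the key of WorksFor; `repairs`, `certain`, `managed_by`, and `s2_union_s3` are hypothetical helper names. It only confirms the answers on these two instances and says nothing about optimality in general.

```python
from itertools import product

def repairs(db):
    """db: list of (relation, fact) pairs; assumption: the first attribute is the key."""
    blocks = {}
    for rel, fact in db:
        blocks.setdefault((rel, fact[0]), []).append((rel, fact))
    for choice in product(*blocks.values()):
        yield set(choice)

def certain(query, db):
    """Consistent answers: tuples returned by `query` on every repair."""
    return set.intersection(*(query(r) for r in repairs(db)))

def managed_by(r):
    return [fact for rel, fact in r if rel == "ManagedBy"]

def q2(r):  # {d, b | ∃m (ManagedBy(d, m, b) ∧ WorksFor(m, d))}
    return {(d, b) for (d, m, b) in managed_by(r) if ("WorksFor", (m, d)) in r}

def q3(r):  # {a, b | ∃d (WorksFor(a, d) ∧ ManagedBy(d, a, b))}
    return {(a, b) for (d, a, b) in managed_by(r) if ("WorksFor", (a, d)) in r}

def q_self_managed(r):  # {b | ∃d∃m (ManagedBy(d, m, b) ∧ WorksFor(m, d))}
    return {(b,) for (d, m, b) in managed_by(r) if ("WorksFor", (m, d)) in r}

db1 = [("ManagedBy", ("CIA", "John", "60M")), ("ManagedBy", ("CIA", "Alex", "60M")),
       ("WorksFor", ("John", "CIA")), ("WorksFor", ("Alex", "CIA"))]
db2 = [("WorksFor", ("James", "MI6")), ("WorksFor", ("James", "MI7")),
       ("ManagedBy", ("MI6", "James", "50M")), ("ManagedBy", ("MI7", "James", "50M"))]

def s2_union_s3(db):
    """Strategy s2 ∪ s3: project Bob's answers to q2 and q3 onto the budget."""
    return ({(b,) for (_, b) in certain(q2, db)}
            | {(b,) for (_, b) in certain(q3, db)})

for db in (db1, db2):
    print(certain(q2, db), certain(q3, db), s2_union_s3(db))
```

On both instances the union s2 ∪ s3 recovers the budget that one of the two strategies misses.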
Wrap Up
CQAFO
If q is a self-join-free conjunctive query such that ⌊q⌋ is in FO, then ⌊q⌋ is an (atomic) CQAFO query.
CQAFO is closed under first-order operations (∧, ∨, ¬, ∃, ∀).

Open problem
Input: Self-join-free conjunctive query q
Question: Construct a CQAFO query ϕ such that
  Under-Approximation: ϕ ⊆ ⌊q⌋; and
  Maximality: for every CQAFO query ϕ′ such that ϕ ⊆ ϕ′ ⊆ ⌊q⌋, we have ϕ ≡ ϕ′.

Studied in [GPW16] for the case where post-processing uses only ∨ and ∃.
Outline
1 Motivation
2 On the Complexity of Embracing Primary Key Violations
3 First-Order Under-Approximations Of Consistent Query Answers
4 Beyond (Un)certainty: Counts and Probabilities
5 Attack Graphs, a Complexity Classification Tool
6 Final Thoughts
Counting Semantics
♯CERTAINTY(q)
For a Boolean first-order query q, the counting problem ♯CERTAINTY(q) is:
INPUT: A database instance db (possibly with key violations)
QUESTION: How many repairs of db satisfy q?

Counting semantics

T                             R
Town     Country  Michelin    Conf  Year  Town
Mons     Belgium  ∗           EDBT  2015  Mons
Mons     Belgium  ∗∗          EDBT  2015  Brussel
Brussel  Belgium  ∗∗

q1 = ∃y T(‘Mons’, y, ‘∗∗’) ⟹ true in 2 repairs
q2 = ∃x∃y∃z (R(‘EDBT’, x, y) ∧ T(y, ‘Belgium’, z)) ⟹ true in 4 repairs
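The two counts can be reproduced by enumerating repairs. This is a brute-force sketch, assuming (the keys are not spelled out in the text) that Town is the key of T and (Conf, Year) is the key of R; the helper names are hypothetical and ASCII `*` stands in for the Michelin stars.

```python
from itertools import product

# Assumed keys: Town for T, (Conf, Year) for R.
T = [("Mons", "Belgium", "*"), ("Mons", "Belgium", "**"), ("Brussel", "Belgium", "**")]
R = [("EDBT", 2015, "Mons"), ("EDBT", 2015, "Brussel")]

def blocks(facts, key_len):
    """Group facts that agree on their first key_len attributes."""
    groups = {}
    for fact in facts:
        groups.setdefault(fact[:key_len], []).append(fact)
    return list(groups.values())

def repairs():
    """Yield (T-repair, R-repair) pairs: one fact chosen from each block."""
    t_blocks, r_blocks = blocks(T, 1), blocks(R, 2)
    for choice in product(*(t_blocks + r_blocks)):
        yield set(choice[:len(t_blocks)]), set(choice[len(t_blocks):])

def q1(t, r):
    # ∃y T('Mons', y, '**')
    return any(town == "Mons" and stars == "**" for (town, _, stars) in t)

def q2(t, r):
    # ∃x∃y∃z (R('EDBT', x, y) ∧ T(y, 'Belgium', z))
    return any(conf == "EDBT" and town == t_town and country == "Belgium"
               for (conf, _, town) in r for (t_town, country, _) in t)

total = sum(1 for _ in repairs())
count_q1 = sum(q1(t, r) for t, r in repairs())
count_q2 = sum(q2(t, r) for t, r in repairs())
print(total, count_q1, count_q2)  # 4 2 4
```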
Complexity Dichotomies for ♯CERTAINTY(q)
Theorem ([MW13])
For every self-join-free Boolean conjunctive query q, ♯CERTAINTY(q) is either in FP or ♯P-complete, and it is decidable which of the two cases applies.
Theorem ([MW14])
For every Boolean conjunctive query q in which all relation names are simple-key, ♯CERTAINTY(q) is either in FP or ♯P-complete, and it is decidable which of the two cases applies.
Note
The previous theorem is the only general result for queries with self-joins.
Complexity Classes
NP, the class of decision problems whose “yes” instances have succinct certificates that can be verified in deterministic polynomial time.
FP, the class of function problems that can be solved in deterministic polynomial time.
♯P, the class of counting problems associated with decision problems in NP. Given an instance of a decision problem in NP, the associated counting problem instance asks to determine the number of succinct certificates of its being a “yes” instance.

By Toda’s theorem (stating PH ⊆ P^♯P), ♯P-complete problems appear to be extremely hard. ♯P-completeness suggests a higher level of intractability than NP-completeness, insofar as decision problems and counting problems can be compared.
The Two Sides of the Dichotomy (by Size)

Tractability is Rare
Paths in directed graphs
Let C be the class of Boolean queries of the form ∃x⃗ (R(x1, x2) ∧ R(x3, x4) ∧ · · · ∧ R(x_{2n−1}, x_{2n})) with n ≠ 0 and x1, x2, ..., x_{2n} (not necessarily distinct) variables. Queries in C ask for the existence of paths and cycles of fixed length.
Let
q0 = ∃x R(x, x) ⟹ “cycle of length 1?”
q1 = ∃x∃y R(x, y) ⟹ “path of length 1?”
q2 = ∃x∃y∃z (R(x, y) ∧ R(y, z)) ⟹ “path of length 2?”

R
From  To
a     a       How many repairs satisfy q0? 1
a     b       How many repairs satisfy q1? 3
a     c       How many repairs satisfy q2? 2
c     d

Counting is hard for most queries
For every q ∈ C, if q ≡ q0, q ≡ q1, or q ≡ q2, then ♯CERTAINTY(q) is in FP; otherwise ♯CERTAINTY(q) is ♯P-complete.
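The three counts for the example relation R can be verified by enumeration. This is a brute-force sketch, assuming that From is the primary key of R; the function `repairs` is a hypothetical helper. It illustrates the counting semantics only and is not the polynomial-time algorithm that membership in FP refers to.

```python
from itertools import product

# R(From, To); assumption: From is the primary key, so the three facts
# starting in 'a' form one block and ('c', 'd') forms another.
R = [("a", "a"), ("a", "b"), ("a", "c"), ("c", "d")]

def repairs(facts):
    """Yield every repair: exactly one fact from each block."""
    blocks = {}
    for fact in facts:
        blocks.setdefault(fact[0], []).append(fact)
    for choice in product(*blocks.values()):
        yield set(choice)

def q0(r):
    # ∃x R(x, x): cycle of length 1
    return any(x == y for (x, y) in r)

def q1(r):
    # ∃x∃y R(x, y): path of length 1
    return len(r) > 0

def q2(r):
    # ∃x∃y∃z (R(x, y) ∧ R(y, z)): path of length 2 (variables need not be distinct)
    return any(y == y2 for (_, y) in r for (y2, _) in r)

counts = [sum(q(r) for r in repairs(R)) for q in (q0, q1, q2)]
print(counts)  # [1, 3, 2]
```

Note that the repair {(a, a), (c, d)} satisfies q2 because x, y, and z may all be mapped to a.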
Block-Independent-Disjoint Probabilistic Databases
BID probabilistic database

T                                    R
      Town     Country  Michelin           Conf  Year  Town
(t1)  Mons     Belgium  ∗            (r1)  EDBT  2015  Mons
(t2)  Mons     Belgium  ∗∗           (r2)  EDBT  2015  Brussel
(t3)  Brussel  Belgium  ∗∗

A possible world selects at most one fact from each block.
Assume the following probability distribution Pr:
{r1, t1, t3} ↦ 0.18
{r1, t2, t3} ↦ 0.12
{r2, t1, t3} ↦ 0.36
{r2, t2, t3} ↦ 0.24
{t1, t3} ↦ 0.06
{t2, t3} ↦ 0.04
other possible worlds ↦ 0

Since the “block-independence” assumption holds, the following BID specification does not lose information:

T                                         R
      Town     Country  Michelin  Pr            Conf  Year  Town     Pr
(t1)  Mons     Belgium  ∗         0.6     (r1)  EDBT  2015  Mons     0.3
(t2)  Mons     Belgium  ∗∗        0.4     (r2)  EDBT  2015  Brussel  0.6
(t3)  Brussel  Belgium  ∗∗        1.0
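That the per-fact probabilities determine the distribution over possible worlds can be checked directly. The following sketch assumes the usual BID reading: blocks are independent, and a block whose probabilities sum to less than 1 contributes no fact with the remaining probability. The names `blocks` and `worlds` are hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction as F
from itertools import product

# One dict per block, mapping fact identifiers to their probabilities.
blocks = [
    {"t1": F(6, 10), "t2": F(4, 10)},  # T block for Town = Mons
    {"t3": F(1)},                      # T block for Town = Brussel
    {"r1": F(3, 10), "r2": F(6, 10)},  # R block for (EDBT, 2015)
]

def worlds(blocks):
    """Return {world: probability} for all worlds with nonzero probability."""
    options = []
    for block in blocks:
        opts = list(block.items())
        # Choosing no fact from the block has the leftover probability.
        opts.append((None, 1 - sum(block.values())))
        options.append(opts)
    dist = {}
    for choice in product(*options):
        p = F(1)
        for _, pr in choice:
            p *= pr  # block independence: multiply per-block probabilities
        if p > 0:
            world = frozenset(name for name, _ in choice if name is not None)
            dist[world] = p
    return dist

dist = worlds(blocks)
for world, p in sorted(dist.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(world), float(p))
```

The six worlds produced coincide with the six nonzero entries of Pr listed above, and their probabilities sum to 1.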
Probabilistic Query Answering
Probabilistic query evaluation

T                                         R
      Town     Country  Michelin  Pr            Conf  Year  Town     Pr
(t1)  Mons     Belgium  ∗         0.6     (r1)  EDBT  2015  Mons     0.3
(t2)  Mons     Belgium  ∗∗        0.4     (r2)  EDBT  2015  Brussel  0.6
(t3)  Brussel  Belgium  ∗∗        1.0