Approximations of Consistent Query Answers

Jef Wijsen, UMONS
DaQuaTa International Workshop 2016, Lyon, 12-13 December 2016
Joint work with Floris Geerts, Paris Koutris, and Fabian Pijcke.
Jef Wijsen (UMONS) Approximations of CQA DaQuaTa 2016 1 / 54

Outline
1 Motivation
2 On the Complexity of Embracing Primary Key Violations
3 First-Order Under-Approximations of Consistent Query Answers
4 Beyond (Un)certainty: Counts and Probabilities
5 Attack Graphs, a Complexity Classification Tool
6 Final Thoughts

1 Motivation

Data Quality

Consistent and Complete Relational Database
All integrity constraints are satisfied.
The database contains all (and only) the facts that are true (Closed-World Assumption).
There are no missing values.

Dealing with Imperfect Data
It is common to have inconsistent, incomplete, missing, or uncertain data. What can we do with such data?
Some data quality problems already have principled solutions: incomplete data [IJ84] and NULLs in SQL [Lib16].

Embrace Imperfections
Example
P
PID | FirstName | LastName | BloodType | Gender | ···
1   | John      | Adams    | NULL      | M      | ···
2   | Jan       | Peeters  | A+        | M      | ···
3   | Jean      | Dubois   | A+        | M      | ···
3   | Jean      | Dubois   | AB+       | M      | ···
Caveat
If we embrace imperfections, we should rethink query answering. For example, what should be the answer to the following queries?
SELECT COUNT(DISTINCT PID)
FROM P
WHERE BloodType='A+';

SELECT COUNT(DISTINCT PID)
FROM P
WHERE BloodType<>'A+';
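
To make the caveat concrete, the following Python sketch (standard `sqlite3` module; the table layout, the `run` and `repairs` helpers, and the choice to model only the columns shown are illustrative assumptions) evaluates both queries with ordinary SQL semantics, and then on each way of keeping a single tuple for PID 3 (what the following slides call a repair).

```python
import sqlite3
from itertools import product

# Table P from the example. PID 3 appears twice (a primary-key violation);
# John's BloodType is NULL (None in Python).
ROWS = [
    (1, "John", "Adams", None, "M"),
    (2, "Jan", "Peeters", "A+", "M"),
    (3, "Jean", "Dubois", "A+", "M"),
    (3, "Jean", "Dubois", "AB+", "M"),
]

Q_EQ = "SELECT COUNT(DISTINCT PID) FROM P WHERE BloodType = 'A+'"
Q_NE = "SELECT COUNT(DISTINCT PID) FROM P WHERE BloodType <> 'A+'"


def run(rows):
    """Evaluate both queries with plain SQL semantics on an in-memory table."""
    con = sqlite3.connect(":memory:")
    # No PRIMARY KEY is declared, so SQLite accepts the duplicate PID.
    con.execute("CREATE TABLE P (PID INTEGER, FirstName TEXT, LastName TEXT, "
                "BloodType TEXT, Gender TEXT)")
    con.executemany("INSERT INTO P VALUES (?, ?, ?, ?, ?)", rows)
    result = (con.execute(Q_EQ).fetchone()[0], con.execute(Q_NE).fetchone()[0])
    con.close()
    return result


def repairs(rows):
    """Yield every way of keeping exactly one row per PID."""
    blocks = {}
    for row in rows:
        blocks.setdefault(row[0], []).append(row)
    for choice in product(*blocks.values()):
        yield list(choice)


print("SQL on the inconsistent table:", run(ROWS))                 # (2, 1)
print("Per repair:", sorted({run(r) for r in repairs(ROWS)}))      # [(1, 1), (2, 0)]
```

Under plain SQL, PID 3 is counted by both queries while PID 1 is counted by neither, because `NULL <> 'A+'` evaluates to unknown. Keeping one tuple per PID instead yields the pair (2, 0) or (1, 1), depending on which tuple for PID 3 survives.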

2 On the Complexity of Embracing Primary Key Violations

Embrace Primary Key Violations
Data model
We allow (primary) key violations.
Example (keys are underlined in the original slides; here the key is the first attribute of each relation)
WorksFor
Agent    | Dept
Sherlock | MI6
- - - - - - - -
James    | CIA
James    | MI6

ManagedBy
Dept | Mgr  | Budget
CIA  | John | 60M
- - - - - - - - -
MI6  | Alex | 60M

⇒ James works for either CIA or MI6.
Definition (Block)
A block is a maximal set of tuples of the same relation that have the same value for the key. (Blocks are separated by dashed lines.)
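
The block definition can be sketched in a few lines of Python, assuming each relation is a list of tuples and that the primary key is a prefix of the attributes (the first attribute, as in the example). The names `WORKS_FOR`, `MANAGED_BY`, `blocks`, and `violates_key` are hypothetical.

```python
from collections import defaultdict

# The example relations; the key is the first attribute of each relation.
WORKS_FOR = [("Sherlock", "MI6"), ("James", "CIA"), ("James", "MI6")]
MANAGED_BY = [("CIA", "John", "60M"), ("MI6", "Alex", "60M")]


def blocks(relation, key_len=1):
    """Group tuples into blocks: maximal sets of tuples that agree on the key.

    key_len is the number of leading attributes forming the primary key
    (an assumption of this sketch: keys are a prefix of the attributes).
    """
    groups = defaultdict(list)
    for t in relation:
        groups[t[:key_len]].append(t)
    return dict(groups)


def violates_key(relation, key_len=1):
    """True iff some block holds more than one tuple."""
    return any(len(b) > 1 for b in blocks(relation, key_len).values())
```

Here `blocks(WORKS_FOR)` returns a singleton block for Sherlock and a two-tuple block for James, which is exactly the key violation that makes James's department uncertain; every block of `MANAGED_BY` is a singleton.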

Certainty Semantics
Definition (Repair and Certainty)
A repair is obtained by selecting exactly one tuple from each block. A Boolean query is certain if it is true in every repair.
Certainty semantics
On the WorksFor and ManagedBy relations of the previous example:
Is the budget of James's department equal to 60M?
$\exists d \exists m\, (\mathit{WorksFor}(\text{`James'}, d) \land \mathit{ManagedBy}(d, m, \text{`60M'}))$ is certain: James works for CIA or MI6, and both departments have budget 60M.
Is James's department managed by Alex?
$\exists d \exists b\, (\mathit{WorksFor}(\text{`James'}, d) \land \mathit{ManagedBy}(d, \text{`Alex'}, b))$ is not certain: in the repair that keeps WorksFor(James, CIA), the department is managed by John.

The Computational Complexity of Deciding Certainty I
Relation with exponentially many repairs
WorksFor
Agent | Dept
1     | MI6
1     | CIA
- - - - - -
2     | MI6
2     | CIA
- - - - - -
⋮     | ⋮
- - - - - -
n     | MI6
n     | CIA
This WorksFor relation contains 2n tuples and has $2^n$ distinct repairs (n blocks, each offering a choice between two tuples).

The Computational Complexity of Deciding Certainty II
Example of Low Complexity
Let $q_1 = \exists d \exists b\, (\mathit{WorksFor}(\text{`James'}, d) \land \mathit{ManagedBy}(d, \text{`Alex'}, b))$.
For example, $q_1$ is certain in the following database:
ManagedBy
Dept | Mgr  | Budget
CIA  | Alex | 50M
CIA  | Alex | 60M
- - - - - - - - -
MI6  | Alex | 60M

WorksFor
Agent | Dept
James | CIA
James | MI6

One can verify that $q_1$ is certain if and only if the following first-order query is true:
$$\exists d\, \mathit{WorksFor}(\text{`James'}, d) \land \forall d \Big(\mathit{WorksFor}(\text{`James'}, d) \rightarrow \exists m \exists b \big[\mathit{ManagedBy}(d, m, b) \land \forall m \forall b\, (\mathit{ManagedBy}(d, m, b) \rightarrow m = \text{`Alex'})\big]\Big)$$
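
The certainty semantics and the first-order test above can be cross-checked with a brute-force Python sketch: enumerate every repair (one tuple per block, keys assumed to be the first attribute), evaluate the Boolean query in each, and compare with evaluating the formula for $q_1$ once on the inconsistent database. The helper names (`repairs`, `is_certain`, `q1_rewriting`, `count_repairs`) and the `DB1`/`DB2` encodings of the two example databases are illustrative assumptions, not part of the source.

```python
from collections import defaultdict
from itertools import product


def repairs(relation, key_len=1):
    """Yield every repair of one relation: exactly one tuple from each block."""
    groups = defaultdict(list)
    for t in relation:
        groups[t[:key_len]].append(t)
    for choice in product(*groups.values()):
        yield set(choice)


def is_certain(query, works_for, managed_by):
    """Brute force (exponential): a Boolean query is certain iff true in every repair."""
    return all(query(wf, mb)
               for wf in repairs(works_for)
               for mb in repairs(managed_by))


def q_budget_60m(wf, mb):
    # exists d, m: WorksFor('James', d) and ManagedBy(d, m, '60M')
    return any(a == "James" and d == d2 and budget == "60M"
               for (a, d) in wf for (d2, _m, budget) in mb)


def q1(wf, mb):
    # exists d, b: WorksFor('James', d) and ManagedBy(d, 'Alex', b)
    return any(a == "James" and d == d2 and m == "Alex"
               for (a, d) in wf for (d2, m, _b) in mb)


def q1_rewriting(wf, mb):
    """The first-order formula for q1, evaluated once on the inconsistent database."""
    depts = [d for (a, d) in wf if a == "James"]  # exists d: WorksFor('James', d)
    return bool(depts) and all(
        any(d2 == d for (d2, _, _) in mb)                     # exists m, b: ManagedBy(d, m, b)
        and all(m == "Alex" for (d2, m, _) in mb if d2 == d)  # forall m, b: ... -> m = 'Alex'
        for d in depts
    )


def count_repairs(relation):
    return sum(1 for _ in repairs(relation))


# The two example databases, encoded as (WorksFor, ManagedBy).
DB1 = ([("Sherlock", "MI6"), ("James", "CIA"), ("James", "MI6")],
       [("CIA", "John", "60M"), ("MI6", "Alex", "60M")])
DB2 = ([("James", "CIA"), ("James", "MI6")],
       [("CIA", "Alex", "50M"), ("CIA", "Alex", "60M"), ("MI6", "Alex", "60M")])

if __name__ == "__main__":
    print(is_certain(q_budget_60m, *DB1), is_certain(q1, *DB1))  # True False
    print(is_certain(q1, *DB2), q1_rewriting(*DB2))              # True True
    # n = 10 agents, each in a block {MI6, CIA}: 2^10 repairs
    print(count_repairs([(i, d) for i in range(1, 11) for d in ("MI6", "CIA")]))  # 1024
```

The brute-force check grows with the number of repairs ($2^n$ for the relation above), whereas `q1_rewriting` inspects pairs of tuples directly and runs in polynomial time; on both example databases the two approaches agree.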
