On the Complexity of Deriving Schema Mappings from Database Instances
Total Page:16
File Type:pdf, Size:1020Kb
On the Complexity of Deriving Schema Mappings from Database Instances Pierre Senellart1;2 Georg Gottlob3 1 3 2 30 November 2007, Gemo Seminar Jeffrey D. Ullman List of publications from the DBLP Bibliography Server ǖ FAQ Coauthor Index ǖ Ask others: ACM DL /Guide ǖ CiteSeer ǖ CSB ǖ Google ǖ MSN ǖ Yahoo Different sources organize the same data differently Home Page 2007 240 EE Foto N. Afrati , Chen Li , Jeffrey D. Ullman: Using views to generate efficient evaluation plans for queries. J. Comput. Syst. Sci. 73 (5): 703ǖ724 (2007) 2005 239 EE Jeffrey D. Ullman: Gradiance OnǖLine Accelerated Learning. ACSC 2005 : 3ǖ6 238 EE Serge Abiteboul , Rakesh Agrawal , Philip A. Bernstein , Michael J. Carey , Stefano Ceri , W. Bruce Croft , David J. DeWitt , Michael J. Franklin , Hector GarciaǖMolina , Dieter Gawlick , Jim Gray , Laura M. Haas , Alon Y. Halevy , Joseph M. Hellerstein , Yannis E. Ioannidis , Martin L. Kersten , Michael J. Pazzani , Michael Lesk , David Maier , Jeffrey F. Naughton , HansǖJörg Schek , Timos K. Sellis , Avi Silberschatz , Michael Stonebraker , Richard T. Snodgrass , Jeffrey D. Ullman, Gerhard Weikum , Jennifer Widom , Stanley B. Zdonik : The Lowell database research selfǖassessment. Commun. ACM 48 (5): 111ǖ118 (2005) 237 EE Serge Abiteboul , Richard Hull , Victor Vianu , Sheila A. Greibach , Michael A. Harrison , Ellis Horowitz , Daniel J. Rosenkrantz , Jeffrey D. Ullman, Moshe Y. Vardi : In memory of Seymour Ginsburg 1928 ǖ 2004. SIGMOD Record 34 (1): 5ǖ12 (2005) 2003 236 EE Jeffrey D. Ullman: A Survey of New Directions in Database System. DASFAA 2003 : 3ǖ 235 EE Jeffrey D. Ullman: Improving the Efficiency of DatabaseǖSystem Teaching. SIGMOD Conference 2003 : 1ǖ3 234 EE Jim Gray , HansǖJörg Schek , Michael Stonebraker , Jeffrey D. Ullman: The Lowell Report. SIGMOD Conference 2003 : 680 233 EE Serge Abiteboul , Rakesh Agrawal , Philip A. Bernstein , Michael J. Carey , Stefano Ceri , W. Bruce Croft , David J. DeWitt , Michael J. Franklin , Hector GarciaǖMolina , Dieter Gawlick , Jim Gray , Laura M. Haas , Alon Y. Halevy , Joseph M. Hellerstein , Yannis E. Ioannidis , Martin L. Kersten , Michael J. Pazzani , Michael Lesk , David Maier , Jeffrey F. Naughton , HansǖJörg Schek , Timos K. Sellis , Avi Silberschatz , Michael Stonebraker , Richard T. Snodgrass , Jeffrey D. Ullman, Gerhard Weikum , Jennifer Widom , Stanley B. Zdonik : The Lowell Database Research Self Assessment CoRR cs.DB/0310006 : (2003) 232 EE Anand Rajaraman , Jeffrey D. Ullman: Querying websites using compact skeletons. J. Comput. Syst. Sci. 66 (4): 809ǖ851 (2003) 2001 231 EE Chen Li , Mayank Bawa , Jeffrey D. Ullman: Minimizing View Sets without Losing QueryǖAnswering Power. ICDT 2001 : 99ǖ113 230 EE Anand Rajaraman , Jeffrey D. Ullman: Querying Websites Using Compact Skeletons. PODS 2001 229 EE Foto N. Afrati , Chen Li , Jeffrey D. Ullman: Generating Efficient Plans for Queries Using Views. SIGMOD Conference 2001 : 319ǖ330 228 EE Edith Cohen , Mayur Datar , Shinji Fujiwara , Aristides Gionis , Piotr Indyk , Rajeev Motwani , Jeffrey D. Ullman, Cheng Yang : Finding Interesting Associations without Support Pruning. IEEE Trans. Knowl. Data Eng. 13 (1): 64ǖ78 (2001) 2000 227 Hector GarciaǖMolina , Jeffrey D. Ullman, Jennifer Widom : Database System Implementation PrenticeǖHall 2000 226 EE Jeffrey D. Ullman: A Survey of AssociationǖRule Mining. Discovery Science 2000 : 1ǖ14 225 EE Edith Cohen , Mayur Datar , Shinji Fujiwara , Aristides Gionis , Piotr Indyk , Rajeev Motwani , Jeffrey D. Ullman, Cheng Yang : Finding Interesting Associations without Support Pruning. ICDE 2000 : 489ǖ499 224 EE Shinji Fujiwara , Jeffrey D. Ullman, Rajeev Motwani : Dynamic MissǖCounting Algorithms: Finding Advanced Scholar Search Scholar Preferences ǖ Scholar Help Different sources organize the same data differently Scholar All articles ǖ Recent articles Results 1 ǖ 10 of about 12 for author:"jd ullman" . ( 0.07 seconds) Querying websites using compact skeletons ǖ all 11 versions » jd ullman A Rajaraman, JD Ullman ǖ Journal of Computer and System Sciences, 2003 ǖ Elsevier J Ullman Several commercial applications, such as online comparison shopping and process J Hopcroft automation, require integrating information that is scattered across multiple A Rajaraman w ebsites or XML documents. Much research has been devoted to this problem, ... Cited by 13 ǖ Related Articles ǖ Web Search B Konikow ska A Aho [BOOK] Wprowadzenie do teorii automatów, jezyków i obliczen JE Hopcroft, JD Ullman , B Konikow ska ǖ 2003 ǖ Wydaw . Naukow e PWN Cited by 15 ǖ Related Articles ǖ Web Search Improving the efficiency of databaseǖsystem teaching ǖ all 3 versions » JD Ullman ǖ Proceedings of the 2003 ACM SIGMOD international conference …, 2003 ǖ portal.acm.org ABSTRACT The education industry has a very poor record of producǖ tivity gains. In this brief article, I outline some of the w ays the teaching of a college course in database systems could be made more ecient, and sta time used ... Cited by 4 ǖ Related Articles ǖ Web Search A survey of new directions in database systems ǖ all 5 versions » JD Ullman ǖ Database Systems for Advanced Applications, 2003.(DASFAA …, 2003 ǖ ieeexplore.ieee.org A survey of new directions in database systems. Ullman, JD Stanford University; This paper appears in: Database Systems for Advanced Applications, 2003. (DASFAA 2003). Proceedings. Eighth International ... Cited by 3 ǖ Related Articles ǖ Web Search [CITATION] ???? AV Aho, R Sethi, JD Ullman ǖ 2003 ǖ ??: ??????? Cited by 6 ǖ Related Articles ǖ Web Search [BOOK] Automi, linguaggi e calcolabilità … Hopcroft, R Motw ani, JD Ullman , L Bernardinello, L … ǖ 2003 ǖ Pearson Education Italia Cited by 5 ǖ Related Articles ǖ Web Search [CITATION] ??????? H GarciaǖMolina, JD Ullman , J Widom ǖ 2003 ǖ ??: ??????? Cited by 4 ǖ Related Articles ǖ Web Search [BOOK] Implementacja systemów baz danych H GarciaǖMolina, J Widom, M Jurkiew icz, JD Ullman ǖ 2003 ǖ Wydaw nictw a Naukow oǖTechniczne Cited by 3 ǖ Related Articles ǖ Web Search [BOOK] Projektowanie i analiza algorytmów: klasyczna praca z teorii algorytmów komputerowych AV Aho, JE Hopcroft, JD Ullman , W Derechow ski ǖ 2003 ǖ Helion Cited by 2 ǖ Related Articles ǖ Web Search [CITATION] ??????? AV AHO, JE HOPCROFT, JD ULLMAN ǖ 2003 ǖ ??: ??????? Cited by 1 ǖ Related Articles ǖ Web Search Result Page: 1 2 Next Google Home ǖ About Google ǖ About Google Scholar ©2007 Google Motivation Context Multiple data sources containing information about similar entities, with some redundancy (e.g., sources of the deep Web). Several different ways to present this information, i.e., several different schemata. No a priori information about (some of) these schemata. How to know the relationships between these schemata, by just looking at the instances? Other way to see this problem: Match operator on schema mappings, in the setting of data exchange. Motivation Context Multiple data sources containing information about similar entities, with some redundancy (e.g., sources of the deep Web). Several different ways to present this information, i.e., several different schemata. No a priori information about (some of) these schemata. How to know the relationships between these schemata, by just looking at the instances? Other way to see this problem: Match operator on schema mappings, in the setting of data exchange. Motivation Context Multiple data sources containing information about similar entities, with some redundancy (e.g., sources of the deep Web). Several different ways to present this information, i.e., several different schemata. No a priori information about (some of) these schemata. How to know the relationships between these schemata, by just looking at the instances? Other way to see this problem: Match operator on schema mappings, in the setting of data exchange. Problem definition Problem Given two (relational) database instances I and J with different schemata, what is the optimal description ¦ of J knowing I (with ¦ a finite set of formulas in some logical language)? What does optimal implies: Conciseness of description. Validity of facts predicted by I and ¦. All facts of J explained by I and ¦. (Note the asymmetry between I and J; context of data exchange where J is computed from I and ¦). Problem definition Problem Given two (relational) database instances I and J with different schemata, what is the optimal description ¦ of J knowing I (with ¦ a finite set of formulas in some logical language)? What does optimal implies: Conciseness of description. Validity of facts predicted by I and ¦. All facts of J explained by I and ¦. (Note the asymmetry between I and J; context of data exchange where J is computed from I and ¦). Outline Source-to-target tuple-generating dependencies Definition (Source-to-target tgd) First-order formula of the form: 8x '(x) ! 9y (x; y) with: ' conjunction of source relation atoms; conjunction of target relation atoms; all variables of x bound in '. Example 0 8x18x2 R1(x1; x2) ^ R2(x2) ! 9y R (x1; y) Particular tgds We consider two ways of having simpler tgds: Disallow existential quantifiers on the right hand-side: full tgds. Disallow cycles on both left- and right-hand sides: acyclic tgds. (Classical notion of acyclicity on hypergraphs extending the basic notion of acyclicity on graphs.) Examples 0 8x18x28x3 R1(x1; x2) ^ R2(x2; x3) ^ R3(x3; x1) ! R (x1) is cyclic (and full). 0 8x18x28x3 R1(x1; x2) ^ R2(x2; x3) ! R (x1) is acyclic (and full). We then consider the languages: Ltgd: arbitrary source-to-target tgds; Lfull: full tgds; Lacyc: acyclic tgds; Lfacyc: full and acyclic tgds. Particular tgds We consider two ways of having simpler tgds: Disallow existential quantifiers on the right hand-side: full tgds. Disallow cycles on both left- and right-hand sides: acyclic tgds. (Classical notion of acyclicity on hypergraphs extending the basic notion of acyclicity on graphs.) Examples 0 8x18x28x3 R1(x1; x2) ^ R2(x2; x3) ^ R3(x3; x1) ! R (x1) is cyclic (and full). 0 8x18x28x3 R1(x1; x2) ^ R2(x2; x3) ! R (x1) is acyclic (and full). We then consider the languages: Ltgd: arbitrary source-to-target tgds; Lfull: full tgds; Lacyc: acyclic tgds; Lfacyc: full and acyclic tgds.