On the Complexity of Deriving Schema Mappings from Database Instances

Total Page:16

File Type:pdf, Size:1020Kb

On the Complexity of Deriving Schema Mappings from Database Instances On the Complexity of Deriving Schema Mappings from Database Instances Pierre Senellart1;2 Georg Gottlob3 1 3 2 30 November 2007, Gemo Seminar Jeffrey D. Ullman List of publications from the DBLP Bibliography Server ǖ FAQ Coauthor Index ǖ Ask others: ACM DL /Guide ǖ CiteSeer ǖ CSB ǖ Google ǖ MSN ǖ Yahoo Different sources organize the same data differently Home Page 2007 240 EE Foto N. Afrati , Chen Li , Jeffrey D. Ullman: Using views to generate efficient evaluation plans for queries. J. Comput. Syst. Sci. 73 (5): 703ǖ724 (2007) 2005 239 EE Jeffrey D. Ullman: Gradiance OnǖLine Accelerated Learning. ACSC 2005 : 3ǖ6 238 EE Serge Abiteboul , Rakesh Agrawal , Philip A. Bernstein , Michael J. Carey , Stefano Ceri , W. Bruce Croft , David J. DeWitt , Michael J. Franklin , Hector GarciaǖMolina , Dieter Gawlick , Jim Gray , Laura M. Haas , Alon Y. Halevy , Joseph M. Hellerstein , Yannis E. Ioannidis , Martin L. Kersten , Michael J. Pazzani , Michael Lesk , David Maier , Jeffrey F. Naughton , HansǖJörg Schek , Timos K. Sellis , Avi Silberschatz , Michael Stonebraker , Richard T. Snodgrass , Jeffrey D. Ullman, Gerhard Weikum , Jennifer Widom , Stanley B. Zdonik : The Lowell database research selfǖassessment. Commun. ACM 48 (5): 111ǖ118 (2005) 237 EE Serge Abiteboul , Richard Hull , Victor Vianu , Sheila A. Greibach , Michael A. Harrison , Ellis Horowitz , Daniel J. Rosenkrantz , Jeffrey D. Ullman, Moshe Y. Vardi : In memory of Seymour Ginsburg 1928 ǖ 2004. SIGMOD Record 34 (1): 5ǖ12 (2005) 2003 236 EE Jeffrey D. Ullman: A Survey of New Directions in Database System. DASFAA 2003 : 3ǖ 235 EE Jeffrey D. Ullman: Improving the Efficiency of DatabaseǖSystem Teaching. SIGMOD Conference 2003 : 1ǖ3 234 EE Jim Gray , HansǖJörg Schek , Michael Stonebraker , Jeffrey D. Ullman: The Lowell Report. SIGMOD Conference 2003 : 680 233 EE Serge Abiteboul , Rakesh Agrawal , Philip A. Bernstein , Michael J. Carey , Stefano Ceri , W. Bruce Croft , David J. DeWitt , Michael J. Franklin , Hector GarciaǖMolina , Dieter Gawlick , Jim Gray , Laura M. Haas , Alon Y. Halevy , Joseph M. Hellerstein , Yannis E. Ioannidis , Martin L. Kersten , Michael J. Pazzani , Michael Lesk , David Maier , Jeffrey F. Naughton , HansǖJörg Schek , Timos K. Sellis , Avi Silberschatz , Michael Stonebraker , Richard T. Snodgrass , Jeffrey D. Ullman, Gerhard Weikum , Jennifer Widom , Stanley B. Zdonik : The Lowell Database Research Self Assessment CoRR cs.DB/0310006 : (2003) 232 EE Anand Rajaraman , Jeffrey D. Ullman: Querying websites using compact skeletons. J. Comput. Syst. Sci. 66 (4): 809ǖ851 (2003) 2001 231 EE Chen Li , Mayank Bawa , Jeffrey D. Ullman: Minimizing View Sets without Losing QueryǖAnswering Power. ICDT 2001 : 99ǖ113 230 EE Anand Rajaraman , Jeffrey D. Ullman: Querying Websites Using Compact Skeletons. PODS 2001 229 EE Foto N. Afrati , Chen Li , Jeffrey D. Ullman: Generating Efficient Plans for Queries Using Views. SIGMOD Conference 2001 : 319ǖ330 228 EE Edith Cohen , Mayur Datar , Shinji Fujiwara , Aristides Gionis , Piotr Indyk , Rajeev Motwani , Jeffrey D. Ullman, Cheng Yang : Finding Interesting Associations without Support Pruning. IEEE Trans. Knowl. Data Eng. 13 (1): 64ǖ78 (2001) 2000 227 Hector GarciaǖMolina , Jeffrey D. Ullman, Jennifer Widom : Database System Implementation PrenticeǖHall 2000 226 EE Jeffrey D. Ullman: A Survey of AssociationǖRule Mining. Discovery Science 2000 : 1ǖ14 225 EE Edith Cohen , Mayur Datar , Shinji Fujiwara , Aristides Gionis , Piotr Indyk , Rajeev Motwani , Jeffrey D. Ullman, Cheng Yang : Finding Interesting Associations without Support Pruning. ICDE 2000 : 489ǖ499 224 EE Shinji Fujiwara , Jeffrey D. Ullman, Rajeev Motwani : Dynamic MissǖCounting Algorithms: Finding Advanced Scholar Search Scholar Preferences ǖ Scholar Help Different sources organize the same data differently Scholar All articles ǖ Recent articles Results 1 ǖ 10 of about 12 for author:"jd ullman" . ( 0.07 seconds) Querying websites using compact skeletons ǖ all 11 versions » jd ullman A Rajaraman, JD Ullman ǖ Journal of Computer and System Sciences, 2003 ǖ Elsevier J Ullman Several commercial applications, such as online comparison shopping and process J Hopcroft automation, require integrating information that is scattered across multiple A Rajaraman w ebsites or XML documents. Much research has been devoted to this problem, ... Cited by 13 ǖ Related Articles ǖ Web Search B Konikow ska A Aho [BOOK] Wprowadzenie do teorii automatów, jezyków i obliczen JE Hopcroft, JD Ullman , B Konikow ska ǖ 2003 ǖ Wydaw . Naukow e PWN Cited by 15 ǖ Related Articles ǖ Web Search Improving the efficiency of databaseǖsystem teaching ǖ all 3 versions » JD Ullman ǖ Proceedings of the 2003 ACM SIGMOD international conference …, 2003 ǖ portal.acm.org ABSTRACT The education industry has a very poor record of producǖ tivity gains. In this brief article, I outline some of the w ays the teaching of a college course in database systems could be made more ecient, and sta time used ... Cited by 4 ǖ Related Articles ǖ Web Search A survey of new directions in database systems ǖ all 5 versions » JD Ullman ǖ Database Systems for Advanced Applications, 2003.(DASFAA …, 2003 ǖ ieeexplore.ieee.org A survey of new directions in database systems. Ullman, JD Stanford University; This paper appears in: Database Systems for Advanced Applications, 2003. (DASFAA 2003). Proceedings. Eighth International ... Cited by 3 ǖ Related Articles ǖ Web Search [CITATION] ???? AV Aho, R Sethi, JD Ullman ǖ 2003 ǖ ??: ??????? Cited by 6 ǖ Related Articles ǖ Web Search [BOOK] Automi, linguaggi e calcolabilità … Hopcroft, R Motw ani, JD Ullman , L Bernardinello, L … ǖ 2003 ǖ Pearson Education Italia Cited by 5 ǖ Related Articles ǖ Web Search [CITATION] ??????? H GarciaǖMolina, JD Ullman , J Widom ǖ 2003 ǖ ??: ??????? Cited by 4 ǖ Related Articles ǖ Web Search [BOOK] Implementacja systemów baz danych H GarciaǖMolina, J Widom, M Jurkiew icz, JD Ullman ǖ 2003 ǖ Wydaw nictw a Naukow oǖTechniczne Cited by 3 ǖ Related Articles ǖ Web Search [BOOK] Projektowanie i analiza algorytmów: klasyczna praca z teorii algorytmów komputerowych AV Aho, JE Hopcroft, JD Ullman , W Derechow ski ǖ 2003 ǖ Helion Cited by 2 ǖ Related Articles ǖ Web Search [CITATION] ??????? AV AHO, JE HOPCROFT, JD ULLMAN ǖ 2003 ǖ ??: ??????? Cited by 1 ǖ Related Articles ǖ Web Search Result Page: 1 2 Next Google Home ǖ About Google ǖ About Google Scholar ©2007 Google Motivation Context Multiple data sources containing information about similar entities, with some redundancy (e.g., sources of the deep Web). Several different ways to present this information, i.e., several different schemata. No a priori information about (some of) these schemata. How to know the relationships between these schemata, by just looking at the instances? Other way to see this problem: Match operator on schema mappings, in the setting of data exchange. Motivation Context Multiple data sources containing information about similar entities, with some redundancy (e.g., sources of the deep Web). Several different ways to present this information, i.e., several different schemata. No a priori information about (some of) these schemata. How to know the relationships between these schemata, by just looking at the instances? Other way to see this problem: Match operator on schema mappings, in the setting of data exchange. Motivation Context Multiple data sources containing information about similar entities, with some redundancy (e.g., sources of the deep Web). Several different ways to present this information, i.e., several different schemata. No a priori information about (some of) these schemata. How to know the relationships between these schemata, by just looking at the instances? Other way to see this problem: Match operator on schema mappings, in the setting of data exchange. Problem definition Problem Given two (relational) database instances I and J with different schemata, what is the optimal description ¦ of J knowing I (with ¦ a finite set of formulas in some logical language)? What does optimal implies: Conciseness of description. Validity of facts predicted by I and ¦. All facts of J explained by I and ¦. (Note the asymmetry between I and J; context of data exchange where J is computed from I and ¦). Problem definition Problem Given two (relational) database instances I and J with different schemata, what is the optimal description ¦ of J knowing I (with ¦ a finite set of formulas in some logical language)? What does optimal implies: Conciseness of description. Validity of facts predicted by I and ¦. All facts of J explained by I and ¦. (Note the asymmetry between I and J; context of data exchange where J is computed from I and ¦). Outline Source-to-target tuple-generating dependencies Definition (Source-to-target tgd) First-order formula of the form: 8x '(x) ! 9y (x; y) with: ' conjunction of source relation atoms; conjunction of target relation atoms; all variables of x bound in '. Example 0 8x18x2 R1(x1; x2) ^ R2(x2) ! 9y R (x1; y) Particular tgds We consider two ways of having simpler tgds: Disallow existential quantifiers on the right hand-side: full tgds. Disallow cycles on both left- and right-hand sides: acyclic tgds. (Classical notion of acyclicity on hypergraphs extending the basic notion of acyclicity on graphs.) Examples 0 8x18x28x3 R1(x1; x2) ^ R2(x2; x3) ^ R3(x3; x1) ! R (x1) is cyclic (and full). 0 8x18x28x3 R1(x1; x2) ^ R2(x2; x3) ! R (x1) is acyclic (and full). We then consider the languages: Ltgd: arbitrary source-to-target tgds; Lfull: full tgds; Lacyc: acyclic tgds; Lfacyc: full and acyclic tgds. Particular tgds We consider two ways of having simpler tgds: Disallow existential quantifiers on the right hand-side: full tgds. Disallow cycles on both left- and right-hand sides: acyclic tgds. (Classical notion of acyclicity on hypergraphs extending the basic notion of acyclicity on graphs.) Examples 0 8x18x28x3 R1(x1; x2) ^ R2(x2; x3) ^ R3(x3; x1) ! R (x1) is cyclic (and full). 0 8x18x28x3 R1(x1; x2) ^ R2(x2; x3) ! R (x1) is acyclic (and full). We then consider the languages: Ltgd: arbitrary source-to-target tgds; Lfull: full tgds; Lacyc: acyclic tgds; Lfacyc: full and acyclic tgds.
Recommended publications
  • Understanding the Hidden Web
    Introduction General Framework Different Modules Conclusion Understanding the Hidden Web Pierre Senellart Athens University of Economics & Business, 28 July 2008 Introduction General Framework Different Modules Conclusion Simple problem Contact all co-authors of Serge Abiteboul. It’s easy! You just need to: Find all co-authors. For each of them, find their current email address. Introduction General Framework Different Modules Conclusion Advanced Scholar Search Advanced Search Tips | About Google Scholar Find articles with all of the words with the exact phrase with at least one of the words without the words where my words occur Author Return articles written by e.g., "PJ Hayes" or McCarthy Publication Return articles published in e.g., J Biol Chem or Nature Date Return articles published between — e.g., 1996 Subject Areas Return articles in all subject areas. Return only articles in the following subject areas: Biology, Life Sciences, and Environmental Science Business, Administration, Finance, and Economics Chemistry and Materials Science Engineering, Computer Science, and Mathematics Medicine, Pharmacology, and Veterinary Science Physics, Astronomy, and Planetary Science Social Sciences, Arts, and Humanities ©2007 Google Introduction General Framework Different Modules Conclusion Advanced Scholar Search Advanced Search Tips | About Google Scholar Find articles with all of the words with the exact phrase with at least one of the words without the words where my words occur Author Return articles written by e.g., "PJ Hayes" or McCarthy Publication
    [Show full text]
  • The Best Nurturers in Computer Science Research
    The Best Nurturers in Computer Science Research Bharath Kumar M. Y. N. Srikant IISc-CSA-TR-2004-10 http://archive.csa.iisc.ernet.in/TR/2004/10/ Computer Science and Automation Indian Institute of Science, India October 2004 The Best Nurturers in Computer Science Research Bharath Kumar M.∗ Y. N. Srikant† Abstract The paper presents a heuristic for mining nurturers in temporally organized collaboration networks: people who facilitate the growth and success of the young ones. Specifically, this heuristic is applied to the computer science bibliographic data to find the best nurturers in computer science research. The measure of success is parameterized, and the paper demonstrates experiments and results with publication count and citations as success metrics. Rather than just the nurturer’s success, the heuristic captures the influence he has had in the indepen- dent success of the relatively young in the network. These results can hence be a useful resource to graduate students and post-doctoral can- didates. The heuristic is extended to accurately yield ranked nurturers inside a particular time period. Interestingly, there is a recognizable deviation between the rankings of the most successful researchers and the best nurturers, which although is obvious from a social perspective has not been statistically demonstrated. Keywords: Social Network Analysis, Bibliometrics, Temporal Data Mining. 1 Introduction Consider a student Arjun, who has finished his under-graduate degree in Computer Science, and is seeking a PhD degree followed by a successful career in Computer Science research. How does he choose his research advisor? He has the following options with him: 1. Look up the rankings of various universities [1], and apply to any “rea- sonably good” professor in any of the top universities.
    [Show full text]
  • Essays Dedicated to Peter Buneman ; [Colloquim in Celebration
    Val Tannen Limsoon Wong Leonid Libkin Wenfei Fan Wang-Chiew Tan Michael Fourman (Eds.) In Search of Elegance in the Theory and Practice of Computation Essays Dedicated to Peter Buneman 4^1 Springer Table of Contents Models for Data-Centric Workflows 1 Serge Abiteboul and Victor Vianu Relational Databases and Bell's Theorem 13 Samson Abramsky High-Level Rules for Integration and Analysis of Data: New Challenges 36 Bogdan Alexe, Douglas Burdick, Mauricio A. Hernandez, Georgia Koutrika, Rajasekar Krishnamurthy, Lucian Popa, Ioana R. Stanoi, and Ryan Wisnesky A New Framework for Designing Schema Mappings 56 Bogdan Alexe and Wang-Chiew Tan User Trust and Judgments in a Curated Database with Explicit Provenance 89 David W. Archer, Lois M.L. Delcambre, and David Maier An Abstract, Reusable, and Extensible Programming Language Design Architecture 112 Hassan Ait-Kaci A Discussion on Pricing Relational Data 167 Magdalena Balazinska, Bill Howe, Paraschos Koutris, Dan Suciu, and Prasang Upadhyaya Tractable Reasoning in Description Logics with Functionality Constraints 174 Andrea Call, Georg Gottlob, and Andreas Pieris Toward a Theory of Self-explaining Computation 193 James Cheney, Umut A. Acar, and Roly Perera To Show or Not to Show in Workflow Provenance 217 Susan B. Davidson, Sanjeev Khanna, and Tova Milo Provenance-Directed Chase&Backchase 227 Alin Deutsch and Richard Hull Data Quality Problems beyond Consistency and Deduplication 237 Wenfei Fan, Floris Geerts, Shuai Ma, Nan Tang, and Wenyuan Yu X Table of Contents 250 Hitting Buneman Circles Michael Paul Fourman 259 Looking at the World Thru Colored Glasses Floris Geerts, Anastasios Kementsietsidis, and Heiko Muller Static Analysis and Query Answering for Incomplete Data Trees with Constraints 273 Amelie Gheerbrant, Leonid Libkin, and Juan Reutter Using SQL for Efficient Generation and Querying of Provenance Information 291 Boris Glavic, Renee J.
    [Show full text]
  • INF2032 2017-2 Tpicos Em Bancos De Dados III ”Foundations of Databases”
    INF2032 2017-2 Tpicos em Bancos de Dados III "Foundations of Databases" Profs. Srgio Lifschitz and Edward Hermann Haeusler August-December 2017 1 Motivation The motivation of this course is to put together theory and practice of database and knowledge base systems. The main content includes a formal approach for the underlying existing technology from the point of view of logical expres- siveness and computational complexity. New and recent database models and systems are discussed from theory to practical issues. 2 Goal of the course The main goal is to introduce the student to the logical and computational foundations of database systems. 3 Sylabus The course in divided in two parts: Classical part: relational database model 1. Revision and formalization of relational databases. Limits of the relational data model; 2. Relational query languages; DataLog and recursion; 3. Logic: Propositional, First-Order, Higher-Order, Second-Order, Fixed points logics; 4. Computational complexity and expressiveness of database query lan- guages; New approaches for database logical models 1. XML-based models, Monadic Second-Order Logic, XPath and XQuery; 2. RDF-based knowledge bases; 3. Decidable Fragments of First-Order logic, Description Logic, OWL dialects; 4. NoSQL and NewSQL databases; 4 Evaluation The evaluation will be based on homework assignments (70 %) and a research manuscript + seminar (30%). 5 Supporting material Lecture notes, slides and articles. Referenced books are freely available online. More details at a wiki-based web page (available from August 7th on) References [ABS00] Serge Abiteboul, Peter Buneman, and Dan Suciu. Data on the Web: From Relations to Semistructured Data and XML.
    [Show full text]
  • Author Template for Journal Articles
    Mining citation information from CiteSeer data Dalibor Fiala University of West Bohemia, Univerzitní 8, 30614 Plzeň, Czech Republic Phone: 00420 377 63 24 29, fax: 00420 377 63 24 01, email: [email protected] Abstract: The CiteSeer digital library is a useful source of bibliographic information. It allows for retrieving citations, co-authorships, addresses, and affiliations of authors and publications. In spite of this, it has been relatively rarely used for automated citation analyses. This article describes our findings after extensively mining from the CiteSeer data. We explored citations between authors and determined rankings of influential scientists using various evaluation methods including cita- tion and in-degree counts, HITS, PageRank, and its variations based on both the citation and colla- boration graphs. We compare the resulting rankings with lists of computer science award winners and find out that award recipients are almost always ranked high. We conclude that CiteSeer is a valuable, yet not fully appreciated, repository of citation data and is appropriate for testing novel bibliometric methods. Keywords: CiteSeer, citation analysis, rankings, evaluation. Introduction Data from CiteSeer have been surprisingly little explored in the scientometric lite- rature. One of the reasons for this may have been fears that the data gathered in an automated way from the Web are inaccurate – incomplete, erroneous, ambiguous, redundant, or simply wrong. Also, the uncontrolled and decentralized nature of the Web is said to simplify manipulating and biasing Web-based publication and citation metrics. However, there have been a few attempts at processing the Cite- Seer data which we will briefly mention. Zhou et al.
    [Show full text]
  • Bibliography
    Bibliography [57391] ISO/IEC JTC1/SC21 N 5739. Database language SQL, April 1991. [69392] ISO/IEC JTC1/SC21 N 6931. Database language SQL (SQL3), June 1992. [A+76] M. M. Astrahan et al. System R: a relational approach to data management. ACM Trans. on Database Systems, 1(2):97–137, 1976. [AA93] P. Atzeni and V. De Antonellis. Relational Database Theory. Benjamin/Cummings Publishing Co., Menlo Park, CA, 1993. [AABM82] P. Atzeni, G. Ausiello, C. Batini, and M. Moscarini. Inclusion and equivalence between relational database schemata. Theoretical Computer Science, 19:267–285, 1982. [AB86] S. Abiteboul and N. Bidoit. Non first normal form relations: An algebra allowing restructuring. Journal of Computer and System Sciences, 33(3):361–390, 1986. [AB87a] M. Atkinson and P. Buneman. Types and persistence in database programming languages. ACM Computing Surveys, 19(2):105–190, June 1987. [AB87b] P. Atzeni and M. C. De Bernardis. A new basis for the weak instance model. In Proc. ACM Symp. on Principles of Database Systems, pages 79–86, 1987. [AB88] S. Abiteboul and C. Beeri. On the manipulation of complex objects. Technical Report, INRIA and Hebrew University, 1988. (To appear, VLDB Journal.) [AB91] S. Abiteboul and A. Bonner. Objects and views. In Proc. ACM SIGMOD Symp. on the Management of Data, 1991. [ABD+89] M. Atkinson, F. Bancilhon, D. DeWitt, K. Dittrich, D. Maier, and S. Zdonik. The object- oriented database system manifesto. In Proc. of Intl. Conf. on Deductive and Object-Oriented Databases (DOOD), pages 40–57, 1989. [ABGO93] A. Albano, R. Bergamini, G. Ghelli, and R. Orsini.
    [Show full text]
  • Formal Languages and Logics Lecture Notes
    Formal Languages and Logics Jacobs University Bremen Herbert Jaeger Course Number 320211 Lecture Notes Version from Oct 18, 2017 1 1. Introduction 1.1 What these lecture notes are and aren't This lecture covers material that is relatively simple, extremely useful, superbly described in a number of textbooks, and taught in almost precisely the same way in hundreds of university courses. It would not make much sense to re-invent the wheel and re-design this lecture to make it different from the standards. Thus I will follow closely the classical textbook of Hopcroft, Motwani, and Ullman for the first part (automata and formal languages) and, somewhat more freely, the book of Schoening in the second half (logics). Both books are available at the IRC. These books are not required reading; the lecture notes that I prepare are a fully self-sustained reference for the course. However, I would recommend to purchase a personal copy of these books all the same for additional and backup reading because these are highly standard reference books that a computer scientist is likely to need again later in her/his professional life, and they are more detailed than this lecture or these lecture notes. There are several sets of course slides and lecture notes available on the Web which are derived from the Hopcroft/Motwani/Ullman book. At http://www- db.stanford.edu/~ullman/ialc/win00/win00.html, you will find lecture notes prepared by Jeffrey Ullman himself. Given the highly standardized nature of the material and the availablity of so many good textbooks and online lecture notes, these lecture notes of mine do not claim originality.
    [Show full text]
  • Data on the Web: from Relations to Semistructured Data and XML
    Data on the Web: From Relations to Semistructured Data and XML Serge Abiteboul Peter Buneman Dan Suciu February 19, 2014 1 Data on the Web: from Relations to Semistructured Data and XML Serge Abiteboul Peter Buneman Dan Suciu An initial draft of part of the book Not for distribution 2 Contents 1 Introduction 7 1.1 Audience . 8 1.2 Web Data and the Two Cultures . 9 1.3 Organization . 15 I Data Model 17 2 A Syntax for Data 19 2.1 Base types . 21 2.2 Representing Relational Databases . 21 2.3 Representing Object Databases . 22 2.4 Specification of syntax . 26 2.5 The Object Exchange Model, OEM . 26 2.6 Object databases . 27 2.7 Other representations . 30 2.7.1 ACeDB . 30 2.8 Terminology . 32 2.9 Bibliographic Remarks . 34 3 XML 37 3.1 Basic syntax . 39 3.1.1 XML Elements . 39 3.1.2 XML Attributes . 41 3.1.3 Well-Formed XML Documents . 42 3.2 XML and Semistructured Data . 42 3.2.1 XML Graph Model . 43 3.2.2 XML References . 44 3 4 CONTENTS 3.2.3 Order . 46 3.2.4 Mixing elements and text . 47 3.2.5 Other XML Constructs . 47 3.3 Document Type Declarations . 48 3.3.1 A Simple DTD . 49 3.3.2 DTD's as Grammars . 49 3.3.3 DTD's as Schemas . 50 3.3.4 Declaring Attributes in DTDs . 52 3.3.5 Valid XML Documents . 54 3.3.6 Limitations of DTD's as schemas .
    [Show full text]
  • O. Peter Buneman Curriculum Vitæ – Jan 2008
    O. Peter Buneman Curriculum Vitæ { Jan 2008 Work Address: LFCS, School of Informatics University of Edinburgh Crichton Street Edinburgh EH8 9LE Scotland Tel: +44 131 650 5133 Fax: +44 667 7209 Email: [email protected] or [email protected] Home Address: 14 Gayfield Square Edinburgh EH1 3NX Tel: 0131 557 0965 Academic record 2004-present Research Director, Digital Curation Centre 2002-present Adjunct Professor of Computer Science, Department of Computer and Information Science, University of Pennsylvania. 2002-present Professor of Database Systems, Laboratory for the Foundations of Computer Science, School of Informatics, University of Edinburgh 1990-2001 Professor of Computer Science, Department of Computer and Information Science, University of Pennsylvania. 1981-1989 Associate Professor of Computer Science, Department of Computer and Information Science, University of Pennsylvania. 1981-1987 Graduate Chairman, Department of Computer and Information Science, University of Pennsyl- vania 1975-1981 Assistant Professor of Computer Science at the Moore School and Assistant Professor of Decision Sciences at the Wharton School, University of Pennsylvania. 1969-1974 Research Associate and subsequently Lecturer in the School of Artificial Intelligence, Edinburgh University. Education 1970 PhD in Mathematics, University of Warwick (Supervisor E.C. Zeeman) 1966 MA in Mathematics, Cambridge. 1963-1966 Major Scholar in Mathematics at Gonville and Caius College, Cambridge. Awards, Visting positions Distinguished Visitor, University of Auckland, 2006 Trustee, VLDB Endowment, 2004 Fellow of the Royal Society of Edinburgh, 2004 Royal Society Wolfson Merit Award, 2002 ACM Fellow, 1999. 1 Visiting Resarcher, INRIA, Jan 1999 Visiting Research Fellowship sponsored by the Japan Society for the Promotion of Science, Research Institute for the Mathematical Sciences, Kyoto University, Spring 1997.
    [Show full text]
  • SIGMOD Record’S Series of Interviews with Distinguished Members of the Database Community
    Serge Abiteboul Speaks Out on Building a Research Group in Europe, How He Got Involved in a Startup, Why Systems Papers Shouldn’t Have to Include Measurements, the Value of Object Databases, and More by Marianne Winslett http://www-rocq.inria.fr/~abitebou/ Welcome to this installment of ACM SIGMOD Record’s series of interviews with distinguished members of the database community. I’m Marianne Winslett, and today we are at the SIGMOD 2006 conference in Chicago, Illinois. I have here with me Serge Abiteboul, who is a senior researcher at INRIA and the manager of the Gemo Database Group. Serge’s research interests are in databases, web data, and database theory. He received the SIGMOD Innovation Award in 1998, and he is a cofounder of Xyleme, a company that provides XML-based content management. His PhD is from the University of Southern California. So, Serge, welcome! Serge, you have built what may be the most successful database group in Europe. How did you do it? And are the challenges for building a group different in Europe than they would be in the US? Many people deserve credit for the group’s success. It is a continuation of a research group named Verso that was sponsored by Francois Bancilhon and Michel Scholl; so when I arrived, the group was already existing. My contribution was to bring in some new fresh very good scientists --- Luc Segoufin, Ioana Manolescu, Sophie Cluet. More recently, we moved to a new research unit of INRIA. The idea was to get closer to the university, so now we are at the University of Orsay; we merged with the knowledge representation group that was managed by Christina Rousset.
    [Show full text]
  • Example-Guided Synthesis of Relational Queries
    Example-Guided Synthesis of Relational Queries Aalok Thakkar Aaditya Naik Nathaniel Sands University of Pennsylvania University of Pennsylvania University of Southern California Philadelphia, USA Philadelphia, USA Los Angeles, USA [email protected] [email protected] [email protected] Rajeev Alur Mayur Naik Mukund Raghothaman University of Pennsylvania University of Pennsylvania University of Southern California Philadelphia, USA Philadelphia, USA Los Angeles, USA [email protected] [email protected] [email protected] Abstract 1 Introduction Program synthesis tasks are commonly specified via input- Program synthesis aims to automatically synthesize a pro- output examples. Existing enumerative techniques for such gram that meets user intent. While the user intent is classi- tasks are primarily guided by program syntax and only make cally described as a correctness specification, synthesizing indirect use of the examples. We identify a class of synthesis programs from input-output examples has gained much trac- algorithms for programming-by-examples, which we call tion, as evidenced by the many applications of programming- Example-Guided Synthesis (EGS), that exploits latent struc- by-example and programming-by-demonstration, such as ture in the provided examples while generating candidate spreadsheet programming [25], relational query synthesis [51, programs. We present an instance of EGS for the synthesis 57], and data wrangling [19, 33]. Nevertheless, their scalabil- of relational queries and evaluate it on 86 tasks from three ity remains an important challenge, and often hinders their application domains: knowledge discovery, program analy- application in the field [5]. sis, and database querying. Our evaluation shows that EGS Existing synthesis tools predominantly adapt a syntax- outperforms state-of-the-art synthesizers based on enumer- guided approach to search through the space of candidate ative search, constraint solving, and hybrid techniques in programs.
    [Show full text]
  • Continuance and Transience of Lifetime Co-Authorships
    Open Archive TOULOUSE Archive Ouverte ( OATAO ) OATAO is an open access repository that collects the work of Toulouse researchers and makes it freely available over the web where possible. This is an author-deposited version published in : http://oatao.univ-toulouse.fr/ Eprints ID : 13197 To link to this article : doi:10.1007/s11192-014-1426-0 URL : http://dx.doi.org/10.1007/s11192-014-1426-0 To cite this version : Cabanac, Guillaume and Hubert, Gilles and Milard, Béatrice Academic careers in Computer Science: Continuance and transience of lifetime co-authorships. (2015) Scientometrics, vol. 102 (n° 1). pp. 135-150. ISSN 0138-9130 Any correspondance concerning this service should be sent to the repository administrator: [email protected] DOI 10.1007/s11192-014-1426-0 Academic careers in Computer Science: Continuance and transience of lifetime co-authorships Guillaume Cabanac · Gilles Hubert · Béatrice Milard Abstract Scholarly publications reify fruitful collaborations between co-authors. A branch of research in the Science Studies focuses on analyzing the co-authorship networks of established scientists. Such studies tell us about how their collaborations developed through their careers. This paper updates previous work by reporting a transversal and a longitudinal studies spanning the lifelong careers of a cohort of researchers from the DBLP bibliographic database. We mined 3,860 researchers’ publication records to study the evolution patterns of their co-authorships. Two features of co-authors were considered: 1) their expertise, and 2) the history of their partnerships with the sampled researchers. Our findings reveal the ephemeral nature of most collaborations: 70% of the new co-authors were only one-shot partners since they did not appear to collaborate on any further publications.
    [Show full text]