Semistructured Data

Total Page:16

File Type:pdf, Size:1020Kb

Semistructured Data Edinburgh Research Explorer Semistructured Data Citation for published version: Buneman, P 1997, Semistructured Data. in PODS '97 Proceedings of the sixteenth ACM SIGACT- SIGMOD-SIGART symposium on Principles of database systems. ACM, pp. 117-121. https://doi.org/10.1145/263661.263675 Digital Object Identifier (DOI): 10.1145/263661.263675 Link: Link to publication record in Edinburgh Research Explorer Document Version: Early version, also known as pre-print Published In: PODS '97 Proceedings of the sixteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems General rights Copyright for the publications made accessible via the Edinburgh Research Explorer is retained by the author(s) and / or other copyright owners and it is a condition of accessing these publications that users recognise and abide by the legal requirements associated with these rights. Take down policy The University of Edinburgh has made every reasonable effort to ensure that Edinburgh Research Explorer content complies with UK legislation. If you believe that the public display of this file breaches copyright please contact [email protected] providing details, and we will remove access to the work immediately and investigate your claim. Download date: 27. Sep. 2021 Semistructured Data Peter Buneman Department of Computer and Information Science University of Pennsylvania Philadelphia PA p etercisup ennedu Some data really is unstructured Abstract The most obvious motivation comes from the need to bring In semistructured data the information that is normally as new forms of data into the ambit of conventional database so ciated with a schema is contained within the data which is technology Some of these such as do cuments with struc sometimes called selfdescribing In some forms of semi tured text and data formats while they may structured data there is no separate schema in others it call for increasingly expressive query languages and new op exists but only places lo ose constraints on the data Semi timization techniques only require mild extensions to the structured data has recently emerged as an imp ortant topic existing notion of data mo dels such as ODMG How of study for a variety of reasons First there are data sources ever these extensions still require the prior imp osition of such as the Web which we would like to treat as databases structure on the data and there are some forms of data for but which cannot b e constrained by a schema Second it which this is genuinely dicult may b e desirable to have an extremely exible format for The most immediate example of data that cannot b e con data exchange b etween disparate databases Third even strained by a schema is the WorldWideWeb As database when dealing with structured data it may b e helpful to view researchers we would like to think of this as a database but it as semistructured for the purp oses of browsing This tu to what extent are database to ols available for querying or torial will cover a numb er of issues surrounding such data maintaining the web Most web queries exploit information nding a concise formulation building a suciently expres retrieval techniques to retrieve individual pages from their sive language for querying and transformation and opti contents but there is little available that allows us to use mization problems the structure of the web in formulating queries and since the web do es not obviously conform to any standard data The motivation mo del we need a metho d of describing its structure Another example little known to the database commu The topic of semistructured data also called unstructured nity but resp onsible for piquing the authors interest in this data is relatively recent and a tutorial on the topic may topic is the database management system ACeDB which well b e premature It represents if anything the conver is p opular with biologists Sup ercially it lo oks like gence of a numb er of lines of thinking ab out new ways to an ob jectoriented database system for it has a schema represent and query data that do not completely t with language that resembles that of an ob jectoriented DBMS conventional data mo dels The purp ose of this tutorial is but this schema imp oses only lo ose constraints on the data to to describ e this motivation and to suggest areas in which Moreover the relationship b etween data and schema is not further research may b e fruitful For a similar exp osition easily describ ed in ob jectoriented terms and there are struc the reader is referred to Serge Abiteb ouls recent survey pa tures that are naturally expressed in ACeDB such as trees of p er arbitrary depth that cannot b e queried using conventional The slides for this tutorial will b e made available from a techniques section of the Penn database home page httpwwwcisupennedudb Data Integration This work was partly supp orted by the Army Research Oce A second motivation is that of data exchange and transfor DAAH and the National Science Foundation CCR mation which is the starting p oint for the Tsimmis pro ject at Stanford The rationale here is that none of the existing data mo dels is allembracing so that it is dicult to build software that will easily convert b etween two dis parate mo dels The Ob ject Exchange Mo del OEM oers a highly exible data structure that may b e used to cap ture most kinds of data and provides a substrate in which almost any other data structure may b e represented In ef fect OEM is an internal data structure for exchange of data b etween DBMSs but having such a structure invites the idea of querying data in OEM format directly Entry Entry Entry References Movie Movie TV Show Is referenced in Title Cast Director Title Cast Director Title Cast Episode “Play it again, Sam” “Casablanca” “Bogart” “Bacall” Credit Actors 1 2 3 1.2E6 Actors “Allen” Special Guests Director “Allen” Figure An example movie database Browsing b oth with data of typ es such as int and string and p ossi bly other base or external abstract typ es video audio etc A nal motivation is that of browsing Generally sp eaking Edges are also with names such as Movie and Title that a user cannot write a database query without knowledge would normally b e used for attribute or class names We of the schema However schemas may have opaque termi shall refer to such lab els as symbols Internally they are rep nology and the rationale for the design is often dicult to resented as strings Note that arrays may b e represented by understand It may help in understanding the schema to b e lab eling internal edges with integers We can formulate the able to query data without full knowledge of the schema typ e of this kind of lab eled tree as For example the queries type label int j string j j symbol Where in the database is the string Casablanca to type tree setlabel tree b e found The rst line describ es a tagged union or variant the 16 Are there integers in the database greater than second says that a tree is a set of lab eltree pairs The edges out of no des in our trees are assumed to b e unordered What ob jects in the database have an attribute name There are a numb er of variations on this basic mo del that starts with act and it is worth briey reviewing them In leaf no des Such questions cannot b e answered in any generic fashion are lab eled with data internal no des are not lab eled with by standard relational or ob jectoriented query languages meaningful data and edges are lab eled only with symb ols While languages have b een prop osed that allow schema and type base int j string j data to b e queried simultaneously in the context of type tree base j setsymbol tree relational and ob jectoriented database systems these lan guages do not have the exibility to express complex con The dierences b etween the two mo dels are minor and straints on paths and it is not clear how their implementa give rise to minor dierences in the query language It is tion will work on the structures describ ed b elow easy to dene mappings in b oth directions Another p ossibility is to allow lab els on internal no des The Mo del for example type base int j string j j symbol The unifying idea in semistructured data is the representa type tree label setlabel tree tion of data as some kind of graphlike or treelike structure Although we shall allow cycles in the data we shall gener The problem with using this representation directly is ally refer to these graphs as trees The example in gure is that it makes the op eration of taking the union of two trees taken from in which the data mo del is formalized as an dicult to dene However by intro ducing extra edges edge labeled graph The structure is taken with some inac this represaentation can b e converted into one of the edge curacies from a wellknown web database that provides lab elled representations ab ove a go o d example of semistructured data There are several A nal and more complex issue is that of ob ject identity things to note ab out it If one connes ones attention to the by which we mean no de lab els or p ossibly edge lab els parts of the database b elow Movie edges the data app ears that apart from an equality test are not observable in fairly regular except that there are two ways of representing the query language In OEM ob ject identities are used as a cast That is the data do es not quite t with some re no de lab els and placeholders to dene trees While ob ject lational or ob jectoriented presentation Edges are lab eled identities provide an ecient way to dene and test equality The select
Recommended publications
  • Probabilistic Databases
    Series ISSN: 2153-5418 SUCIU • OLTEANU •RÉ •KOCH M SYNTHESIS LECTURES ON DATA MANAGEMENT &C Morgan & Claypool Publishers Series Editor: M. Tamer Özsu, University of Waterloo Probabilistic Databases Probabilistic Databases Dan Suciu, University of Washington, Dan Olteanu, University of Oxford Christopher Ré,University of Wisconsin-Madison and Christoph Koch, EPFL Probabilistic databases are databases where the value of some attributes or the presence of some records are uncertain and known only with some probability. Applications in many areas such as information extraction, RFID and scientific data management, data cleaning, data integration, and financial risk DATABASES PROBABILISTIC assessment produce large volumes of uncertain data, which are best modeled and processed by a probabilistic database. This book presents the state of the art in representation formalisms and query processing techniques for probabilistic data. It starts by discussing the basic principles for representing large probabilistic databases, by decomposing them into tuple-independent tables, block-independent-disjoint tables, or U-databases. Then it discusses two classes of techniques for query evaluation on probabilistic databases. In extensional query evaluation, the entire probabilistic inference can be pushed into the database engine and, therefore, processed as effectively as the evaluation of standard SQL queries. The relational queries that can be evaluated this way are called safe queries. In intensional query evaluation, the probabilistic Dan Suciu inference is performed over a propositional formula called lineage expression: every relational query can be evaluated this way, but the data complexity dramatically depends on the query being evaluated, and Dan Olteanu can be #P-hard. The book also discusses some advanced topics in probabilistic data management such as top-kquery processing, sequential probabilistic databases, indexing and materialized views, and Monte Carlo databases.
    [Show full text]
  • Understanding the Hidden Web
    Introduction General Framework Different Modules Conclusion Understanding the Hidden Web Pierre Senellart Athens University of Economics & Business, 28 July 2008 Introduction General Framework Different Modules Conclusion Simple problem Contact all co-authors of Serge Abiteboul. It’s easy! You just need to: Find all co-authors. For each of them, find their current email address. Introduction General Framework Different Modules Conclusion Advanced Scholar Search Advanced Search Tips | About Google Scholar Find articles with all of the words with the exact phrase with at least one of the words without the words where my words occur Author Return articles written by e.g., "PJ Hayes" or McCarthy Publication Return articles published in e.g., J Biol Chem or Nature Date Return articles published between — e.g., 1996 Subject Areas Return articles in all subject areas. Return only articles in the following subject areas: Biology, Life Sciences, and Environmental Science Business, Administration, Finance, and Economics Chemistry and Materials Science Engineering, Computer Science, and Mathematics Medicine, Pharmacology, and Veterinary Science Physics, Astronomy, and Planetary Science Social Sciences, Arts, and Humanities ©2007 Google Introduction General Framework Different Modules Conclusion Advanced Scholar Search Advanced Search Tips | About Google Scholar Find articles with all of the words with the exact phrase with at least one of the words without the words where my words occur Author Return articles written by e.g., "PJ Hayes" or McCarthy Publication
    [Show full text]
  • The Best Nurturers in Computer Science Research
    The Best Nurturers in Computer Science Research Bharath Kumar M. Y. N. Srikant IISc-CSA-TR-2004-10 http://archive.csa.iisc.ernet.in/TR/2004/10/ Computer Science and Automation Indian Institute of Science, India October 2004 The Best Nurturers in Computer Science Research Bharath Kumar M.∗ Y. N. Srikant† Abstract The paper presents a heuristic for mining nurturers in temporally organized collaboration networks: people who facilitate the growth and success of the young ones. Specifically, this heuristic is applied to the computer science bibliographic data to find the best nurturers in computer science research. The measure of success is parameterized, and the paper demonstrates experiments and results with publication count and citations as success metrics. Rather than just the nurturer’s success, the heuristic captures the influence he has had in the indepen- dent success of the relatively young in the network. These results can hence be a useful resource to graduate students and post-doctoral can- didates. The heuristic is extended to accurately yield ranked nurturers inside a particular time period. Interestingly, there is a recognizable deviation between the rankings of the most successful researchers and the best nurturers, which although is obvious from a social perspective has not been statistically demonstrated. Keywords: Social Network Analysis, Bibliometrics, Temporal Data Mining. 1 Introduction Consider a student Arjun, who has finished his under-graduate degree in Computer Science, and is seeking a PhD degree followed by a successful career in Computer Science research. How does he choose his research advisor? He has the following options with him: 1. Look up the rankings of various universities [1], and apply to any “rea- sonably good” professor in any of the top universities.
    [Show full text]
  • Essays Dedicated to Peter Buneman ; [Colloquim in Celebration
    Val Tannen Limsoon Wong Leonid Libkin Wenfei Fan Wang-Chiew Tan Michael Fourman (Eds.) In Search of Elegance in the Theory and Practice of Computation Essays Dedicated to Peter Buneman 4^1 Springer Table of Contents Models for Data-Centric Workflows 1 Serge Abiteboul and Victor Vianu Relational Databases and Bell's Theorem 13 Samson Abramsky High-Level Rules for Integration and Analysis of Data: New Challenges 36 Bogdan Alexe, Douglas Burdick, Mauricio A. Hernandez, Georgia Koutrika, Rajasekar Krishnamurthy, Lucian Popa, Ioana R. Stanoi, and Ryan Wisnesky A New Framework for Designing Schema Mappings 56 Bogdan Alexe and Wang-Chiew Tan User Trust and Judgments in a Curated Database with Explicit Provenance 89 David W. Archer, Lois M.L. Delcambre, and David Maier An Abstract, Reusable, and Extensible Programming Language Design Architecture 112 Hassan Ait-Kaci A Discussion on Pricing Relational Data 167 Magdalena Balazinska, Bill Howe, Paraschos Koutris, Dan Suciu, and Prasang Upadhyaya Tractable Reasoning in Description Logics with Functionality Constraints 174 Andrea Call, Georg Gottlob, and Andreas Pieris Toward a Theory of Self-explaining Computation 193 James Cheney, Umut A. Acar, and Roly Perera To Show or Not to Show in Workflow Provenance 217 Susan B. Davidson, Sanjeev Khanna, and Tova Milo Provenance-Directed Chase&Backchase 227 Alin Deutsch and Richard Hull Data Quality Problems beyond Consistency and Deduplication 237 Wenfei Fan, Floris Geerts, Shuai Ma, Nan Tang, and Wenyuan Yu X Table of Contents 250 Hitting Buneman Circles Michael Paul Fourman 259 Looking at the World Thru Colored Glasses Floris Geerts, Anastasios Kementsietsidis, and Heiko Muller Static Analysis and Query Answering for Incomplete Data Trees with Constraints 273 Amelie Gheerbrant, Leonid Libkin, and Juan Reutter Using SQL for Efficient Generation and Querying of Provenance Information 291 Boris Glavic, Renee J.
    [Show full text]
  • INF2032 2017-2 Tpicos Em Bancos De Dados III ”Foundations of Databases”
    INF2032 2017-2 Tpicos em Bancos de Dados III "Foundations of Databases" Profs. Srgio Lifschitz and Edward Hermann Haeusler August-December 2017 1 Motivation The motivation of this course is to put together theory and practice of database and knowledge base systems. The main content includes a formal approach for the underlying existing technology from the point of view of logical expres- siveness and computational complexity. New and recent database models and systems are discussed from theory to practical issues. 2 Goal of the course The main goal is to introduce the student to the logical and computational foundations of database systems. 3 Sylabus The course in divided in two parts: Classical part: relational database model 1. Revision and formalization of relational databases. Limits of the relational data model; 2. Relational query languages; DataLog and recursion; 3. Logic: Propositional, First-Order, Higher-Order, Second-Order, Fixed points logics; 4. Computational complexity and expressiveness of database query lan- guages; New approaches for database logical models 1. XML-based models, Monadic Second-Order Logic, XPath and XQuery; 2. RDF-based knowledge bases; 3. Decidable Fragments of First-Order logic, Description Logic, OWL dialects; 4. NoSQL and NewSQL databases; 4 Evaluation The evaluation will be based on homework assignments (70 %) and a research manuscript + seminar (30%). 5 Supporting material Lecture notes, slides and articles. Referenced books are freely available online. More details at a wiki-based web page (available from August 7th on) References [ABS00] Serge Abiteboul, Peter Buneman, and Dan Suciu. Data on the Web: From Relations to Semistructured Data and XML.
    [Show full text]
  • Author Template for Journal Articles
    Mining citation information from CiteSeer data Dalibor Fiala University of West Bohemia, Univerzitní 8, 30614 Plzeň, Czech Republic Phone: 00420 377 63 24 29, fax: 00420 377 63 24 01, email: [email protected] Abstract: The CiteSeer digital library is a useful source of bibliographic information. It allows for retrieving citations, co-authorships, addresses, and affiliations of authors and publications. In spite of this, it has been relatively rarely used for automated citation analyses. This article describes our findings after extensively mining from the CiteSeer data. We explored citations between authors and determined rankings of influential scientists using various evaluation methods including cita- tion and in-degree counts, HITS, PageRank, and its variations based on both the citation and colla- boration graphs. We compare the resulting rankings with lists of computer science award winners and find out that award recipients are almost always ranked high. We conclude that CiteSeer is a valuable, yet not fully appreciated, repository of citation data and is appropriate for testing novel bibliometric methods. Keywords: CiteSeer, citation analysis, rankings, evaluation. Introduction Data from CiteSeer have been surprisingly little explored in the scientometric lite- rature. One of the reasons for this may have been fears that the data gathered in an automated way from the Web are inaccurate – incomplete, erroneous, ambiguous, redundant, or simply wrong. Also, the uncontrolled and decentralized nature of the Web is said to simplify manipulating and biasing Web-based publication and citation metrics. However, there have been a few attempts at processing the Cite- Seer data which we will briefly mention. Zhou et al.
    [Show full text]
  • Data on the Web: from Relations to Semistructured Data and XML
    Data on the Web: From Relations to Semistructured Data and XML Serge Abiteboul Peter Buneman Dan Suciu February 19, 2014 1 Data on the Web: from Relations to Semistructured Data and XML Serge Abiteboul Peter Buneman Dan Suciu An initial draft of part of the book Not for distribution 2 Contents 1 Introduction 7 1.1 Audience . 8 1.2 Web Data and the Two Cultures . 9 1.3 Organization . 15 I Data Model 17 2 A Syntax for Data 19 2.1 Base types . 21 2.2 Representing Relational Databases . 21 2.3 Representing Object Databases . 22 2.4 Specification of syntax . 26 2.5 The Object Exchange Model, OEM . 26 2.6 Object databases . 27 2.7 Other representations . 30 2.7.1 ACeDB . 30 2.8 Terminology . 32 2.9 Bibliographic Remarks . 34 3 XML 37 3.1 Basic syntax . 39 3.1.1 XML Elements . 39 3.1.2 XML Attributes . 41 3.1.3 Well-Formed XML Documents . 42 3.2 XML and Semistructured Data . 42 3.2.1 XML Graph Model . 43 3.2.2 XML References . 44 3 4 CONTENTS 3.2.3 Order . 46 3.2.4 Mixing elements and text . 47 3.2.5 Other XML Constructs . 47 3.3 Document Type Declarations . 48 3.3.1 A Simple DTD . 49 3.3.2 DTD's as Grammars . 49 3.3.3 DTD's as Schemas . 50 3.3.4 Declaring Attributes in DTDs . 52 3.3.5 Valid XML Documents . 54 3.3.6 Limitations of DTD's as schemas .
    [Show full text]
  • O. Peter Buneman Curriculum Vitæ – Jan 2008
    O. Peter Buneman Curriculum Vitæ { Jan 2008 Work Address: LFCS, School of Informatics University of Edinburgh Crichton Street Edinburgh EH8 9LE Scotland Tel: +44 131 650 5133 Fax: +44 667 7209 Email: [email protected] or [email protected] Home Address: 14 Gayfield Square Edinburgh EH1 3NX Tel: 0131 557 0965 Academic record 2004-present Research Director, Digital Curation Centre 2002-present Adjunct Professor of Computer Science, Department of Computer and Information Science, University of Pennsylvania. 2002-present Professor of Database Systems, Laboratory for the Foundations of Computer Science, School of Informatics, University of Edinburgh 1990-2001 Professor of Computer Science, Department of Computer and Information Science, University of Pennsylvania. 1981-1989 Associate Professor of Computer Science, Department of Computer and Information Science, University of Pennsylvania. 1981-1987 Graduate Chairman, Department of Computer and Information Science, University of Pennsyl- vania 1975-1981 Assistant Professor of Computer Science at the Moore School and Assistant Professor of Decision Sciences at the Wharton School, University of Pennsylvania. 1969-1974 Research Associate and subsequently Lecturer in the School of Artificial Intelligence, Edinburgh University. Education 1970 PhD in Mathematics, University of Warwick (Supervisor E.C. Zeeman) 1966 MA in Mathematics, Cambridge. 1963-1966 Major Scholar in Mathematics at Gonville and Caius College, Cambridge. Awards, Visting positions Distinguished Visitor, University of Auckland, 2006 Trustee, VLDB Endowment, 2004 Fellow of the Royal Society of Edinburgh, 2004 Royal Society Wolfson Merit Award, 2002 ACM Fellow, 1999. 1 Visiting Resarcher, INRIA, Jan 1999 Visiting Research Fellowship sponsored by the Japan Society for the Promotion of Science, Research Institute for the Mathematical Sciences, Kyoto University, Spring 1997.
    [Show full text]
  • SIGMOD Record’S Series of Interviews with Distinguished Members of the Database Community
    Serge Abiteboul Speaks Out on Building a Research Group in Europe, How He Got Involved in a Startup, Why Systems Papers Shouldn’t Have to Include Measurements, the Value of Object Databases, and More by Marianne Winslett http://www-rocq.inria.fr/~abitebou/ Welcome to this installment of ACM SIGMOD Record’s series of interviews with distinguished members of the database community. I’m Marianne Winslett, and today we are at the SIGMOD 2006 conference in Chicago, Illinois. I have here with me Serge Abiteboul, who is a senior researcher at INRIA and the manager of the Gemo Database Group. Serge’s research interests are in databases, web data, and database theory. He received the SIGMOD Innovation Award in 1998, and he is a cofounder of Xyleme, a company that provides XML-based content management. His PhD is from the University of Southern California. So, Serge, welcome! Serge, you have built what may be the most successful database group in Europe. How did you do it? And are the challenges for building a group different in Europe than they would be in the US? Many people deserve credit for the group’s success. It is a continuation of a research group named Verso that was sponsored by Francois Bancilhon and Michel Scholl; so when I arrived, the group was already existing. My contribution was to bring in some new fresh very good scientists --- Luc Segoufin, Ioana Manolescu, Sophie Cluet. More recently, we moved to a new research unit of INRIA. The idea was to get closer to the university, so now we are at the University of Orsay; we merged with the knowledge representation group that was managed by Christina Rousset.
    [Show full text]
  • Example-Guided Synthesis of Relational Queries
    Example-Guided Synthesis of Relational Queries Aalok Thakkar Aaditya Naik Nathaniel Sands University of Pennsylvania University of Pennsylvania University of Southern California Philadelphia, USA Philadelphia, USA Los Angeles, USA [email protected] [email protected] [email protected] Rajeev Alur Mayur Naik Mukund Raghothaman University of Pennsylvania University of Pennsylvania University of Southern California Philadelphia, USA Philadelphia, USA Los Angeles, USA [email protected] [email protected] [email protected] Abstract 1 Introduction Program synthesis tasks are commonly specified via input- Program synthesis aims to automatically synthesize a pro- output examples. Existing enumerative techniques for such gram that meets user intent. While the user intent is classi- tasks are primarily guided by program syntax and only make cally described as a correctness specification, synthesizing indirect use of the examples. We identify a class of synthesis programs from input-output examples has gained much trac- algorithms for programming-by-examples, which we call tion, as evidenced by the many applications of programming- Example-Guided Synthesis (EGS), that exploits latent struc- by-example and programming-by-demonstration, such as ture in the provided examples while generating candidate spreadsheet programming [25], relational query synthesis [51, programs. We present an instance of EGS for the synthesis 57], and data wrangling [19, 33]. Nevertheless, their scalabil- of relational queries and evaluate it on 86 tasks from three ity remains an important challenge, and often hinders their application domains: knowledge discovery, program analy- application in the field [5]. sis, and database querying. Our evaluation shows that EGS Existing synthesis tools predominantly adapt a syntax- outperforms state-of-the-art synthesizers based on enumer- guided approach to search through the space of candidate ative search, constraint solving, and hybrid techniques in programs.
    [Show full text]
  • Paris C. Kanellakis
    In Memoriam Paris C Kanellakis Our colleague and dear friend Paris C Kanellakis died unexp ectedly and tragically on De cemb er together with his wife MariaTeresa Otoya and their b eloved young children Alexandra and Stephanos En route to Cali Colombia for an annual holiday reunion with his wifes family their airplane strayed o course without warning during the night minutes b efore an exp ected landing and crashed in the Andes As researchers we mourn the passing of a creative and thoughtful colleague who was resp ected for his manycontributions b oth technical and professional to the computer science research community As individuals wegrieveover our tragic lossof a friend who was regarded with great aection and of a happy thriving family whose warmth and hospitalitywere gifts appreciated by friends around the world Their deaths create for us a void that cannot b e lled Paris left unnished several pro jects including a pap er on database theory intended for this sp ecial issue of Computing Surveys to b e written during his holiday visit to Colombia Instead we wish to oer here a brief biographyofParis and a description of the research topics that interested Paris over the last few years together with the contributions that he made to these areas It is not our intention to outline denitive surveys or histories of these areas but rather to honor the signicantcontemp orary research of our friend Paris was b orn in Greece in to Eleftherios and Argyroula Kanellakis In he received the Diploma in Electrical Engineering from the National
    [Show full text]
  • Foundations of Data Management
    Report from Dagstuhl Perspectives Workshop 16151 Foundations of Data Management Edited by Marcelo Arenas1, Richard Hull2, Wim Martens3, Tova Milo4, and Thomas Schwentick5 1 Pontificia Universidad Catolica de Chile, CL, [email protected] 2 IBM TJ Watson Research Center – Yorktown Heights, US, [email protected] 3 Universität Bayreuth, DE, [email protected] 4 Tel Aviv University, IL, [email protected] 5 TU Dortmund, DE, [email protected] Abstract In this Perspectives Workshop we have explored the degree to which principled foundations are crucial to the long-term success and effectiveness of the new generation of data management paradigms and applications, and investigated what forms of research need to be pursued to develop and advance these foundations. The workshop brought together specialists from the existing database theory community, and from adjoining areas, particularly from various subdisciplines within the Big Data community, to understand the challenge areas that might be resolved through principled foundations and mathematical theory. Perspectives Workshop April 10–15, 2016 – http://www.dagstuhl.de/16151 1998 ACM Subject Classification H.2 Database Management Keywords and phrases Foundations of data management, Principles of databases Digital Object Identifier 10.4230/DagRep.6.4.39 Edited in cooperation with Pablo Barceló 1 Executive Summary Marcelo Arenas Richard Hull Wim Martens Tova Milo Thomas Schwentick License Creative Commons BY 3.0 Unported license © Marcelo Arenas, Richard Hull, Wim Martens, Tova Milo, and Thomas Schwentick The focus of Foundations of Data Management (traditionally termed Database Theory) is to provide the many facets of data management with solid and robust mathematical foundations.
    [Show full text]