Type-Based Detection of XML Query-Update Independence

Total Page:16

File Type:pdf, Size:1020Kb

Type-Based Detection of XML Query-Update Independence Type-Based Detection of XML Query-Update Independence Nicole Bidoit-Tollu Dario Colazzo Federico Ulliana Universite Paris Sud & Universite Paris Sud & Universite Paris Sud & INRIA Saclay INRIA Saclay INRIA Saclay [email protected] [email protected] [email protected] ABSTRACT Schema languages, while in other contexts quite precise schemas, This paper presents a novel static analysis technique to detect XML in the form of a DTD, can be automatically inferred, by using accu- query-update independence, in the presence of a schema. Rather rate and efficient existing techniques like the one proposed by Bex than types, our system infers chains of types. Each chain rep- et al. in [8]. resents a path that can be traversed on a valid document during query/update evaluation. The resulting independence analysis is State of the Art precise, although it raises a challenging issue: recursive schemas Schema-based detection of XML query-update independence has may lead to inference of infinitely many chains. been recently investigated. The state of the art technique has been A sound and complete approximation technique ensuring a finite presented by Benedikt and Cheney in [6]. This technique infers analysis in any case is presented, together with an efficient imple- from the schema the set of node types traversed by the query, and mentation performing the chain-based analysis in polynomial space the set of node types impacted by the update. The query and the and time. update are then deemed as independent if the two sets do not over- lap. This technique is effective since the static analysis i) is able to manage a wide class of XQuery queries and updates, ii) can be 1. INTRODUCTION performed in a negligible time, and iii) as a consequence, even on A query and an update are independent when the query result is small documents, can avoid expensive query re-computation when not affected by update execution, on any possible input database. independence wrt an update is detected. However, the technique Detecting query-update independence is of crucial importance in has some weaknesses. As illustrated in [6], in some cases, inde- many contexts: i) to minimize view re-materialization; ii) to ensure pendence is not detected, due to some over-approximation made isolation, when queries and updates are executed concurrently; iii) by the type inference rules. as outlined in [6], to enforce access control policies, when the query For example, this technique cannot detect independence between is used to define the part of the database that must not be changed the query q1===a==c and the update u1=delete ==b==c, when by a user update. the schema enforces that c descendants of b nodes are never descen- In all these contexts, benefits are amplified when query-update dants of a nodes. This is because the type inference technique of [6] independence can be checked statically. In order to be useful, ev- infers the type c both for the query path and the update path, with- ery static analysis technique must be sound: if query-update inde- out considering contextual information about the inferred types. pendence is statically detected, then independence does hold. The Since the query and update types overlap, independence is wrongly inverse implication (completeness) cannot be ensured in the general excluded. Indeed, the technique is not precise enough when ances- case, since static independence detection is undecidable (see [6]). tor or descendant axes are used in queries and updates. This means that if a static analyzer is used, for instance, in a view The way XPath axes are typed is not the only source of low pre- maintenance system, sometimes views are re-materialized after up- cision of this technique. Consider documents typed by the well dates even if not needed, because the analysis has not been smart known bibliographic DTD used in [1], the query q2===title and enough to statically detect a view-update independence. Useless the update u2=for x in ==book return insert <author=> view re-materialization frequently occurs if a static analyzer with into x. The technique of [6] infers bib, book and title as types low precision is adopted. This can lead to great waste of time, since traced by q2, and book as type impacted by u2. According to this view materialization cost can be proportional to the database size. technique, the two expressions share the type book, hence indepen- High precision of static independence analysis can be ensured dence is not detected, while it holds. by taking into account schema information. In many contexts, In none of the above examples, independence can be detected schemas are defined by users, mainly by means of the DTD or XML by techniques ignoring schema information like the path-based ap- proach proposed by Ghelli et al.1 [15] and the recent destabilizers- Permission to make digital or hard copies of all or part of this work for based approach proposed by Benedikt and Cheney [5]. Follow- personal or classroom use is granted without fee provided that copies are ing these approaches, for the example q1-u1, the paths ==a==c and not made or distributed for profit or commercial advantage and that copies ==b==c are deemed as overlapping since, for instance, documents bear this notice and the full citation on the first page. To copy otherwise, to matching the path =a=b=c match both paths and similarly, for ex- republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Articles from this volume were invited to present ample q2-u2 and the paths ==title and ==book. their results at The 38th International Conference on Very Large Data Bases, 1 August 27th - 31st 2012, Istanbul, Turkey. This technique deals with update-commutativity detection for a Proceedings of the VLDB Endowment, Vol. 5, No. 9 language with side effects and can be directly extended to query- Copyright 2012 VLDB Endowment 2150-8097/12/05... $ 10.00. update independence detection without side effects. 872 ∗ Contributions sd=doc d(doc)=(a j b) d(a)=c d(b)=c This paper proposes a novel schema-based approach for detecting <doc> XML query-update independence. Differently from [6, 10, 11], <a><c=><=a> a DTD d our system infers sequences of labels (hereafter called chains). In- <a><c=><=a> 8 tuitively, for each node that can be selected by a query/update path <b><c=><=b> > lt doc[l1; l2; l3; l4] <a><c=><=a> <> l a[l0 ] l a[l0 ] in a schema instance, the system infers a chain recording i) all la- σ = 1 1 2 2 <=doc> 0 0 bels that are encountered from the root to the node, ii) in the order > l3 b[l3] l4 a[l4] :> 0 of traversal. This information is at the basis of a precise static inde- li c[]; i = 1::4 a document t valid wrt d pendence analysis. For instance, for q ===a==c and u ===b==c 1 1 the store (σ; l ) over the schema f doc (ajb)∗; a c; b c g, the chains doc:a:c t and doc:b:c are inferred for the query and the update, respectively. Disjointness of these two chains allows us to statically derive the Figure 1: Document and store independence for q1-u1. For the DTD of the XQuery Use Cases [1] before discussed, the chains bib:book:title and bib:book:author, respectively inferred for q2 and u2, diverge after the book sym- 2. PRELIMINARIES bol; this allows us to conclude independence for q2-u2. These two examples highlight that chain inference provides a more precise in- Data model. We represent an instance of the XML data model as dependence analysis than that of [6, 15, 5]. a store σ, which is an environment associating each node location The main contribution of this work is a precise algorithm to de- (or identifier) l with either an element node a[L] or a text node s. tect independence for a query-update pair q-u knowing that docu- In a[L], a is the element tag, while L=(l ; : : : ; l ) is the ordered ments are valid wrt a DTD d. It strongly relies on the following 1 n list of children locations in σ. A tree is a pair t=(σ; l ), where l is developments. t t its root location. dom(σ) denotes the set of locations of σ, while • Chain-based independence for q-u, a static notion, is the foun- σ@l denotes the subtree of σ rooted at l whose domain is limited dation of our algorithm: starting from the set Cd of all possi- to locations connected to l. See Figure 1 for a small document ble chains associated with the DTD d, our inference system together with its store. extracts subsets of chains Cq and Cu which soundly approx- imate the navigation through valid documents made by the DTDs. A DTD is a 3-tuple (Σ; sd; d) where: Σ is a finite alphabet evaluation of the query q and the update u, respectively. Note for element tags, denoted by a; b; c; sd2Σ is the start symbol; d is that our inference system (Section 3) is cautiously specified a function from Σ to regular expressions over Σ[fSg, where S for dealing with all XPath axes. Chain-based independence denotes the string type. For simplicity, next we often use only the is the result of the absence of overlapping pair of chains in Cq d component to specify a DTD. and Cu. Chain-based independence is proved to be sound wrt A tree t=(σ; lt) is valid wrt d, denoted t2d, iff there exists a the semantics notion of query-update independence (Section mapping ν : dom(t) 7! Σ[fSg such that: ν(lt)=sd; ν(l)=S 4).
Recommended publications
  • Bibliography of Erik Wilde
    dretbiblio dretbiblio Erik Wilde's Bibliography References [1] AFIPS Fall Joint Computer Conference, San Francisco, California, December 1968. [2] Seventeenth IEEE Conference on Computer Communication Networks, Washington, D.C., 1978. [3] ACM SIGACT-SIGMOD Symposium on Principles of Database Systems, Los Angeles, Cal- ifornia, March 1982. ACM Press. [4] First Conference on Computer-Supported Cooperative Work, 1986. [5] 1987 ACM Conference on Hypertext, Chapel Hill, North Carolina, November 1987. ACM Press. [6] 18th IEEE International Symposium on Fault-Tolerant Computing, Tokyo, Japan, 1988. IEEE Computer Society Press. [7] Conference on Computer-Supported Cooperative Work, Portland, Oregon, 1988. ACM Press. [8] Conference on Office Information Systems, Palo Alto, California, March 1988. [9] 1989 ACM Conference on Hypertext, Pittsburgh, Pennsylvania, November 1989. ACM Press. [10] UNIX | The Legend Evolves. Summer 1990 UKUUG Conference, Buntingford, UK, 1990. UKUUG. [11] Fourth ACM Symposium on User Interface Software and Technology, Hilton Head, South Carolina, November 1991. [12] GLOBECOM'91 Conference, Phoenix, Arizona, 1991. IEEE Computer Society Press. [13] IEEE INFOCOM '91 Conference on Computer Communications, Bal Harbour, Florida, 1991. IEEE Computer Society Press. [14] IEEE International Conference on Communications, Denver, Colorado, June 1991. [15] International Workshop on CSCW, Berlin, Germany, April 1991. [16] Third ACM Conference on Hypertext, San Antonio, Texas, December 1991. ACM Press. [17] 11th Symposium on Reliable Distributed Systems, Houston, Texas, 1992. IEEE Computer Society Press. [18] 3rd Joint European Networking Conference, Innsbruck, Austria, May 1992. [19] Fourth ACM Conference on Hypertext, Milano, Italy, November 1992. ACM Press. [20] GLOBECOM'92 Conference, Orlando, Florida, December 1992. IEEE Computer Society Press. http://github.com/dret/biblio (August 29, 2018) 1 dretbiblio [21] IEEE INFOCOM '92 Conference on Computer Communications, Florence, Italy, 1992.
    [Show full text]
  • Basex Server
    BaseX Documentation Version 7.8 BaseX Documentation: Version 7.8 Content is available under Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0). Table of Contents 1. Main Page .............................................................................................................................. 1 Getting Started .................................................................................................................. 1 XQuery Portal .................................................................................................................... 1 Advanced User's Guide ..................................................................................................... 2 I. Getting Started ........................................................................................................................ 3 2. Command-Line Options ................................................................................................... 4 BaseX Standalone ..................................................................................................... 4 BaseX Server ............................................................................................................ 6 BaseX Client ............................................................................................................. 7 BaseX HTTP Server .................................................................................................. 9 BaseX GUI .............................................................................................................
    [Show full text]
  • Supporting SPARQL Update Queries in RDF-XML Integration *
    Supporting SPARQL Update Queries in RDF-XML Integration * Nikos Bikakis1 † Chrisa Tsinaraki2 Ioannis Stavrakantonakis3 4 Stavros Christodoulakis 1 NTU Athens & R.C. ATHENA, Greece 2 EU Joint Research Center, Italy 3 STI, University of Innsbruck, Austria 4 Technical University of Crete, Greece Abstract. The Web of Data encourages organizations and companies to publish their data according to the Linked Data practices and offer SPARQL endpoints. On the other hand, the dominant standard for information exchange is XML. The SPARQL2XQuery Framework focuses on the automatic translation of SPARQL queries in XQuery expressions in order to access XML data across the Web. In this paper, we outline our ongoing work on supporting update queries in the RDF–XML integration scenario. Keywords: SPARQL2XQuery, SPARQL to XQuery, XML Schema to OWL, SPARQL update, XQuery Update, SPARQL 1.1. 1 Introduction The SPARQL2XQuery Framework, that we have previously developed [6], aims to bridge the heterogeneity issues that arise in the consumption of XML-based sources within Semantic Web. In our working scenario, mappings between RDF/S–OWL and XML sources are automatically derived or manually specified. Using these mappings, the SPARQL queries are translated on the fly into XQuery expressions, which access the XML data. Therefore, the current version of SPARQL2XQuery provides read-only access to XML data. In this paper, we outline our ongoing work on extending the SPARQL2XQuery Framework towards supporting SPARQL update queries. Both SPARQL and XQuery have recently standardized their update operation seman- tics in the SPARQL 1.1 and XQuery Update Facility, respectively. We have studied the correspondences between the update operations of these query languages, and we de- scribe the extension of our mapping model and the SPARQL-to-XQuery translation algorithm towards supporting SPARQL update queries.
    [Show full text]
  • XML Transformations, Views and Updates Based on Xquery Fragments
    Faculteit Wetenschappen Informatica XML Transformations, Views and Updates based on XQuery Fragments Proefschrift voorgelegd tot het behalen van de graad van doctor in de wetenschappen aan de Universiteit Antwerpen, te verdedigen door Roel VERCAMMEN Promotor: Prof. Dr. Jan Paredaens Antwerpen, 2008 Co-promotor: Dr. Ir. Jan Hidders XML Transformations, Views and Updates based on XQuery Fragments Roel Vercammen Universiteit Antwerpen, 2008 http://www.universiteitantwerpen.be Permission to make digital or hard copies of portions of this work for personal or classroom use is granted, provided that the copies are not made or distributed for profit or commercial advantage and that copies bear this notice. Copyrights for components of this work owned by others than the author must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission of the author. Research funded by a Ph.D. grant of the Institute for the Promotion of Innovation through Science and Technology in Flan- ders (IWT-Vlaanderen). { Onderzoek gefinancierd met een specialisatiebeurs van het Instituut voor de Aanmoediging van Innovatie door Wetenschap en Technologie in Vlaanderen (IWT-Vlaanderen). Grant number / Beurs nummer: 33581. http://www.iwt.be Typesetting by LATEX Acknowledgements This thesis is the result of the contributions of many friends and colleagues to whom I would like to say \thank you". First and foremost, I want to thank my advisor Jan Paredaens, who gave me the opportunity to become a researcher and teached me how good research should be performed. I had the honor to write several papers in collaboration with him and will always remember the discussions and his interesting views on research, politics and gastronomy.
    [Show full text]
  • Access Control Models for XML
    Access Control Models for XML Abdessamad Imine Lorraine University & INRIA-LORIA Grand-Est Nancy, France [email protected] Outline • Overview on XML • Why XML Security? • Querying Views-based XML Data • Updating Views-based XML Data 2 Outline • Overview on XML • Why XML Security? • Querying Views-based XML Data • Updating Views-based XML Data 3 What is XML? • eXtensible Markup Language [W3C 1998] <files> "<record>! ""<name>Robert</name>! ""<diagnosis>Pneumonia</diagnosis>! "</record>! "<record>! ""<name>Franck</name>! ""<diagnosis>Ulcer</diagnosis>! "</record>! </files>" 4 What is XML? • eXtensible Markup Language [W3C 1998] <files>! <record>! /files" <name>Robert</name>! <diagnosis>! /record" /record" Pneumonia! </diagnosis> ! </record>! /name" /diagnosis" <record …>! …! </record>! Robert" Pneumonia" </files>! 5 XML for Documents • SGML • HTML - hypertext markup language • TEI - Text markup, language technology • DocBook - documents -> html, pdf, ... • SMIL - Multimedia • SVG - Vector graphics • MathML - Mathematical formulas 6 XML for Semi-Structered Data • MusicXML • NewsML • iTunes • DBLP http://dblp.uni-trier.de • CIA World Factbook • IMDB http://www.imdb.com/ • XBEL - bookmark files (in your browser) • KML - geographical annotation (Google Maps) • XACML - XML Access Control Markup Language 7 XML as Description Language • Java servlet config (web.xml) • Apache Tomcat, Google App Engine, ... • Web Services - WSDL, SOAP, XML-RPC • XUL - XML User Interface Language (Mozilla/Firefox) • BPEL - Business process execution language
    [Show full text]
  • Web and Semantic Web Query Languages: a Survey
    Web and Semantic Web Query Languages: A Survey James Bailey1, Fran¸coisBry2, Tim Furche2, and Sebastian Schaffert2 1 NICTA Victoria Laboratory Department of Computer Science and Software Engineering The University of Melbourne, Victoria 3010, Australia http://www.cs.mu.oz.au/~jbailey/ 2 Institute for Informatics,University of Munich, Oettingenstraße 67, 80538 M¨unchen, Germany http://pms.ifi.lmu.de/ Abstract. A number of techniques have been developed to facilitate powerful data retrieval on the Web and Semantic Web. Three categories of Web query languages can be distinguished, according to the format of the data they can retrieve: XML, RDF and Topic Maps. This ar- ticle introduces the spectrum of languages falling into these categories and summarises their salient aspects. The languages are introduced us- ing common sample data and query types. Key aspects of the query languages considered are stressed in a conclusion. 1 Introduction The Semantic Web Vision A major endeavour in current Web research is the so-called Semantic Web, a term coined by W3C founder Tim Berners-Lee in a Scientific American article describing the future of the Web [37]. The Semantic Web aims at enriching Web data (that is usually represented in (X)HTML or other XML formats) by meta-data and (meta-)data processing specifying the “meaning” of such data and allowing Web based systems to take advantage of “intelligent” reasoning capabilities. To quote Berners-Lee et al. [37]: “The Semantic Web will bring structure to the meaningful content of Web pages, creating an environment where software agents roaming from page to page can readily carry out sophisticated tasks for users.” The Semantic Web meta-data added to today’s Web can be seen as advanced semantic indices, making the Web into something rather like an encyclopedia.
    [Show full text]
  • Declarative Access to Filesystem Data New Application Domains for XML Database Management Systems
    Declarative Access to Filesystem Data New application domains for XML database management systems Alexander Holupirek Dissertation zur Erlangung des akademischen Grades Doktor der Naturwissenschaften (Dr. rer. nat.) Fachbereich Informatik und Informationswissenschaft Mathematisch-Naturwissenschaftliche Sektion Universität Konstanz Referenten: Prof. Dr. Marc H. Scholl Prof. Dr. Marcel Waldvogel Tag der mündlichen Prüfung: 17. Juli 2012 Abstract XML and state-of-the-art XML database management systems (XML-DBMSs) can play a leading role in far more application domains as it is currently the case. Even in their basic configuration, they entail all components necessary to act as central systems for complex search and retrieval tasks. They provide language-specific index- ing of full-text documents and can store structured, semi-structured and binary data. Besides, they offer a great variety of standardized languages (XQuery, XSLT, XQuery Full Text, etc.) to develop applications inside a pure XML technology stack. Benefits are obvious: Data, logic, and presentation tiers can operate on a single data model, and no conversions have to be applied when switching in between. This thesis deals with the design and development of XML/XQuery driven informa- tion architectures that process formerly heterogeneous data sources in a standardized and uniform manner. Filesystems and their vast amounts of different file types are a prime example for such a heterogeneous dataspace. A new XML dialect, the Filesystem Markup Language (FSML), is introduced to construct a database view of the filesystem and its contents. FSML provides a uniform view on the filesystem’s contents and allows developers to leverage the complete XML technology stack on filesystem data.
    [Show full text]
  • Standardised Xquery 3.0 Annotations for REST
    RESTful XQuery Standardised XQuery 3.0 Annotations for REST Adam Retter Adam Retter Consulting <[email protected]> January, 2012 Abstract Whilst XQuery was originally envisaged and designed as a query language for XML, it has been adopted by many as a language for application development This, in turn, has encouraged additional and diverse extensions, many of which could not easily have been foreseen. This paper examines how XQuery has been used for Web Application development, current implementation approaches for executing XQuery in a Web context, and subsequently presents a proposal for a standard approach to RESTful XQuery through the use of XQuery 3.0 Annotations. Keywords: XQuery 3.0, Annotations, REST, HTTP, Standard 1. Introduction 1.1 Background XML Query Language (XQuery) was originally born from several competing query languages for XML[1]. All of these languages had in common the noble yet limited goal of querying XML. They focused on XML as a read-only store for data. In addition, whilst several of these predecessors recognised the Web as a critical factor, like their successor XQuery, none of them attempted to implement constructs in the language that supported use as a (Web) server-side processing language. With the adoption and use of XQuery, because of its functional nature and module system which permit the organisation of code units, people attempted to write complex processing applications in XQuery. As the limits of what was achievable in XQuery were tested, real world scenarios emerged which called for additional XQuery facilities, resulting in extension standards: XPath and XQuery Update[2] and XQuery Full-Text[3].
    [Show full text]
  • Using Map and Reduce for Querying Distributed XML Data
    Using Map and Reduce for Querying Distributed XML Data Lukas Lewandowski Master Thesis in fulfillment of the requirements for the degree of Master of Science (M.Sc.) Submitted to the Department of Computer and Information Science at the University of Konstanz Reviewer: Prof. Dr. Marc H. Scholl Prof. Dr. Marcel Waldvogel Abstract Semi-structured information is often represented in the XML format. Although, a vast amount of appropriate databases exist that are responsible for efficiently storing semi- structured data, the vastly growing data demands larger sized databases. Even when the secondary storage is able to store the large amount of data, the execution time of complex queries increases significantly, if no suitable indexes are applicable. This situation is dramatic when short response times are an essential requirement, like in the most real-life database systems. Moreover, when storage limits are reached, the data has to be distributed to ensure availability of the complete data set. To meet this challenge this thesis presents two approaches to improve query evaluation on semi- structured and large data through parallelization. First, we analyze Hadoop and its MapReduce framework as candidate for our distributed computations and second, then we present an alternative implementation to cope with this requirements. We introduce three distribution algorithms usable for XML collections, which serve as base for our distribution to a cluster. Furthermore, we present a prototype implementation using a current open source database, named BaseX, which serves as base for our comprehensive query results. iii Acknowledgments I would like to thank my advisors Professor Marc H. Scholl and Professor Marcel Wald- vogel, who have supported me with advice and guidance throughout my master thesis.
    [Show full text]
  • Xpath, Xquery and XSLT
    XPath, XQuery and XSLT Timo Lilja February 25, 2010 Timo Lilja () XPath, XQuery and XSLT February 25, 2010 1 / 24 Introduction When processing, XML documents are organized into a tree structure in the memory All XML Query and transformation languages operate on this tree XML Documents consists of tags and attributes which form logical constructs called nodes The basic operation unit of these query/processing languages is the node though access to substructures is provided Timo Lilja () XPath, XQuery and XSLT February 25, 2010 2 / 24 XPath XPath XPath 2.0 [4] is a W3C Recomendation released on January 2007 XPath provides a way to query the nodes and their attributes, tags and values XPath type both dynamic and static typing if the XML Scheme denition is present, it is used for static type checking before executing a query otherwise nodes are marked for untyped and they are coerced dynamically during run-time to suitable types Timo Lilja () XPath, XQuery and XSLT February 25, 2010 3 / 24 XPath Timo Lilja () XPath, XQuery and XSLT February 25, 2010 4 / 24 XPath Path expressions Path expressions provide a way to choose all matching branches of the tree representation of the XML Document Path expressions consist of one or more steps separated by the path operater/. A step can be an axis which is a way to reference a node in the path. a node-test which corresponds to a node in the actual XML document zero or more predicates Syntax for the path expressions: /step/step Syntaxc for a step: axisname::nodestest[predicate] Timo Lilja () XPath, XQuery and XSLT February 25, 2010 5 / 24 XPath Let's assume that we use the following XML Docuemnt <bookstore> <book category="COOKING"> <title lang="it">Everyday Italian</title> <author>Giada De Laurentiis</author> <year>2005</year> <price>30.00</price> </book> <book category="CHILDREN"> <title lang="en">Harry Potter</title> <author>J K.
    [Show full text]
  • Evolutionary Tree-Structured Storage: Concepts, Interfaces, and Applications
    Evolutionary Tree-Structured Storage: Concepts, Interfaces, and Applications Dissertation submitted for the degree of Doctor of Natural Sciences Presented by Marc Yves Maria Kramis at the Faculty of Sciences Department of Computer and Information Science Date of the oral examination: 22.04.2014 First supervisor: Prof. Dr. Marcel Waldvogel Second supervisor: Prof. Dr. Marc Scholl i Abstract Life is subdued to constant evolution. So is our data, be it in research, business or personal information management. From a natural, evolutionary perspective, our data evolves through a sequence of fine-granular modifications resulting in myriads of states, each describing our data at a given point in time. From a technical, anti- evolutionary perspective, mainly driven by technological and financial limitations, we treat the modifications as transient commands and only store the latest state of our data. It is surprising that the current approach is to ignore the natural evolution and to willfully forget about the sequence of modifications and therefore the past state. Sticking to this approach causes all kinds of confusion, complexity, and performance issues. Confusion, because we still somehow want to retrieve past state but are not sure how. Complexity, because we must repeatedly work around our own obsolete approaches. Performance issues, because confusion times complexity hurts. It is not surprising, however, that intelligence agencies notoriously try to collect, store, and analyze what the broad public willfully forgets. Significantly faster and cheaper random-access storage is the key driver for a paradigm shift towards remembering the sequence of modifications. We claim that (1) faster storage allows to efficiently and cleverly handle finer-granular modifi- cations and (2) that mandatory versioning elegantly exposes past state, radically simplifies the applications, and effectively lays a solid foundation for backing up, distributing and scaling of our data.
    [Show full text]
  • Database Architecture Evolution: Mammals Flourished Long Before Dinosaurs Became Extinct
    Database Architecture Evolution: Mammals Flourished long before Dinosaurs became Extinct Stefan Manegold Martin L. Kersten Peter Boncz CWI, Amsterdam, The Netherlands {manegold,mk,boncz}@cwi.nl ABSTRACT building block for realizing more complex structures (such as ta- The holy grail for database architecture research is to find a solution bles, but also objects and even trees or graphs). that is Scalable & Speedy, to run on anything from small ARM In this paper, we summarize the quest to renovate database ar- processors up to globally distributed compute clusters, Stable & chitecture with the goal of addressing data management needs with Secure, to service a broad user community, Small & Simple, to be a common core rather than bolting on new subsystems [38], and comprehensible to a small team of programmers, Self-managing, to pushing through the functionality plateau, especially using self- let it run out-of-the-box without hassle. adaptation, exploiting resource optimization opportunities as they In this paper, we provide a trip report on this quest, covering appear dynamically, rather than by static prediction. This quest both past experiences, ongoing research on hardware-conscious al- covers the major design space choices, experiences gained and on- gorithms, and novel ways towards self-management specifically fo- going research around the MonetDB system. cused on column store solutions. In all, this story demonstrates the viability of new self-adaptive life-forms that may be very successful yet are very different from the long existing species. In this, the MonetDB strain forms an 1. INTRODUCTION open platform available for all who want to explore further evolu- Forty years of relational database technology seemingly con- tion in the data management ecosystem.
    [Show full text]