Querying NoSQL with Deep Learning to Answer Natural Language Questions


The Thirty-First AAAI Conference on Innovative Applications of Artificial Intelligence (IAAI-19)

Sebastian Blank (1), Florian Wilhelm (1), Hans-Peter Zorn (1), Achim Rettinger (2)
(1) inovex GmbH, (2) Karlsruhe Institute of Technology
[email protected], [email protected], [email protected], [email protected]

Copyright © 2019, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Abstract

Almost all of today's knowledge is stored in databases and thus can only be accessed with the help of domain-specific query languages, strongly limiting the number of people who can access the data. In this work, we demonstrate an end-to-end trainable question answering (QA) system that allows a user to query an external NoSQL database using natural language. A major challenge of such a system is the non-differentiability of database operations, which we overcome by applying policy-based reinforcement learning. We evaluate our approach on Facebook's bAbI Movie Dialog dataset and achieve a competitive score of 84.2% compared to several benchmark models. We conclude that our approach excels in real-world scenarios where knowledge resides in external databases and intermediate labels are too costly to gather for non-end-to-end trainable QA systems.

Introduction

Almost all of today's knowledge is stored and organized in various types of databases, but the ability to access this knowledge is limited to those who have mastered the corresponding query language. Conversational agents promise to overcome this barrier by allowing humans to query these resources with natural language, thus fostering the democratization of knowledge. The broad domain of conversational agents can be further subdivided by looking at the complexity of the human-machine interaction. Goal-oriented dialog systems pursue the goal of gaining information through conversation in order to complete a specific task. Hence, they are typically applied in short, domain-specific dialogs. In contrast, non-goal-oriented agents are designed to have extended conversations without any restriction to a specific domain.

In our opinion, both categories represent long-term goals in artificial intelligence, but they depend on the practical feasibility of another, much simpler type of natural language processing (NLP) task. We define a KBQueryBot as an agent that translates a query, posed in natural language, into the domain-specific query language of an external knowledge base (KB) in order to retrieve factoid answers in a single-shot manner. This task can also be seen as a semantic parsing problem.

To enable the application of a KBQueryBot in real-world use cases, it needs to satisfy two conditions. Firstly, it needs to be trainable in an end-to-end fashion. Labelled data is always a scarce resource, and any kind of additional intermediate labelling is too costly for practical applications since it requires expert knowledge about the query language and scales poorly (Liang et al. 2016). On the other hand, judging the correctness of a retrieved result requires only domain knowledge, so even active-learning-powered techniques can be applied.

Typically, there are two different ways to interact with a KB. Agents can either query an external KB with the corresponding query language or use an internal memory that allows for more flexible means of lookup, e.g. probabilistic lookup. The latter requires converting the externally available database into the internal memory, rendering the user management and access control capabilities of modern databases useless. Lacking these capabilities results in severe issues regarding data security and privacy. Therefore, our second requirement is that a KBQueryBot needs to cope with today's prevalent databases such as Elasticsearch, MongoDB and PostgreSQL.

This work contributes to engineering efforts to establish a standard machine learning model and training procedure for all kinds of structured database queries via natural language questions. We propose a KBQueryBot which relies on a sequence-to-sequence model (Sutskever, Vinyals, and Le 2014) that is augmented with pointer attention (Bahdanau, Cho, and Bengio 2014; Vinyals, Fortunato, and Jaitly 2015). We apply the REINFORCE algorithm (Williams 1992), which enables us to overcome the problem of non-differentiable database operations and to train our model on question-answer pairs. Our approach is closely related to Zhong, Xiong, and Socher (2017) and also uses policy gradients, but differs in avoiding the use of intermediate labels. Our model integrates with Elasticsearch (ES), one of the most popular NoSQL databases (see https://db-engines.com/en/ranking) and dominant in search-related applications. Typically, users query ES with the help of graphical user interfaces designed for the specific use case. However, we notice an increased interest of business customers in making their KBs accessible by natural language input to provide an improved user experience.
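To make this target representation concrete, the following sketch shows the kind of Elasticsearch request such a system could emit for the running example question. The index name (movies) and field names (movie_title, director) are illustrative assumptions and are not taken from the paper; the _search REST endpoint, the match query and _source filtering are standard ES features.

```python
import requests

# Hypothetical ES index and field names for a movie KB; the actual schema
# used for the bAbI Movie Dialog data may differ.
ES_URL = "http://localhost:9200/movies/_search"

# Query produced for: "Who was the director of Gone With the Wind?"
# A match query on the entity field, returning only the requested column.
query_body = {
    "query": {"match": {"movie_title": "gone with the wind"}},
    "_source": ["director"],
    "size": 1,
}

response = requests.post(ES_URL, json=query_body, timeout=10)
hits = response.json()["hits"]["hits"]
if hits:
    # For this question, the returned value would be the director's name.
    print(hits[0]["_source"]["director"])
```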
Related Work

Since the task of a KBQueryBot is to reformulate a single user question into a corresponding structured query, it can be regarded as a specialised QA system that does not take any dialog history into account. As the focus of our work lies on the retrieval of data stored in an external KB, it can clearly be separated from QA systems that only leverage dialog history to answer a question (Eric and Manning 2017).

Recent QA systems that incorporate external knowledge can be distinguished by the way they perform the actual lookup. Dhingra et al. (2016) proposed the term soft-KB lookup for using an attention mechanism, as introduced by Bahdanau, Cho, and Bengio (2014), to compute a probability distribution over the index of the KB. The index with the highest probability is then retrieved and the corresponding data presented as the result. In contrast, generating a structured query which is then executed is known as hard-KB lookup.

Soft-KB lookup. Memory Networks (MemNN) are a neural architecture extended with an internal memory component (Weston, Chopra, and Bordes 2014; Sukhbaatar et al. 2015). Bordes et al. (2015) proposed their application for soft-KB lookup by ingesting the external KB into the internal memory in a pre-processing step. They showed promising performance against multiple benchmark models, second only to a subgraph-embedding-based QA system by Bordes, Chopra, and Weston (2014), on the large-scale and domain-specific Movie Dialog dataset (Dodge et al. 2015). Miller et al. (2016) confirmed the qualities of MemNN in another benchmark on a modified Movie Dialog dataset, where MemNN even outperformed the QA system by Bordes, Chopra, and Weston (2014). Due to these promising results, we use MemNNs as a benchmark for our hard-KB approach on the same dataset.

Dhingra et al. (2016) introduced KB-InfoBot, which also performs a soft-KB lookup, but its lookup policy is trained with reinforcement learning, similar to our work. They observed that their model tends to fail from random initialization and overcame this problem by using imitation learning to mimic hand-crafted agents. This required human interaction diminishes the advantage of end-to-end trainability.

Hard-KB lookup. Non-differentiable database operations are the major shortcoming of hard-KB lookup, since they complicate the end-to-end trainability of the corresponding models.

In the same manner, Zhong, Xiong, and Socher (2017) assume the availability of the query ground truth (intermediate labels) and the database response. They propose Seq2SQL, a modular approach that translates natural language questions into SQL queries and generalizes across different table schemas. Seq2SQL consists of three modules. The first module is responsible for identifying an aggregator, e.g. count or max, while the second module identifies the column to be used in the select operator. Both modules are trained on question-query pairs. The third module produces the where-clause. Since arguments in the where-clause can be swapped, this ambiguity is handled by policy-based reinforcement learning trained directly on question-answer pairs. Our work resembles their architecture but uses only reinforcement learning to achieve end-to-end trainability.

The work of Williams and Zweig (2016) applies policy-based reinforcement learning without intermediate labels in a task-oriented dialog system. It translates human commands into API calls, also taking the dialog history into account. Our work differs in that no separately trained entity extraction is needed, and the resulting query is based solely on the user's input without any additional predefined business logic.
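As a rough illustration of the soft-KB lookup idea discussed in this section (a conceptual sketch, not the MemNN or KB-InfoBot implementations), the snippet below scores every KB entry against an encoded question, turns the scores into a probability distribution over the KB index, and retrieves the most probable entry; the embeddings are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder embeddings: one vector per KB entry and one for the encoded question.
kb_embeddings = rng.normal(size=(1000, 64))   # 1000 KB entries, 64-dim
question_vec = rng.normal(size=64)

# Soft-KB lookup: attention scores -> probability distribution over the KB index.
scores = kb_embeddings @ question_vec
probs = np.exp(scores - scores.max())
probs /= probs.sum()

# The entry with the highest probability is returned as the answer candidate.
best_entry = int(np.argmax(probs))
print(best_entry, float(probs[best_entry]))
```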
Model

The overall architecture of our SeqPolicyNet model, which solves the introduced KBQueryBot task, is illustrated in Figure 1. The core component is an attention-based pointer network (Sutskever, Vinyals, and Le 2014) which fills the slots of a predefined ES query template. The fixed-length output sequence is composed of elements which either select a column name of the KB or point to a token of the question for entity extraction. Thus, we will refer to them as selection or extraction outputs. With the help of two extraction outputs acting as start and end pointer, it is possible to extract an entity composed of a span of tokens. For instance, given the question "Who was the director of Gone With the Wind?", the start pointer would be Gone and the end pointer Wind. Our model can thus be seen as a pointer network for entity extraction (Zhong, Xiong, and Socher 2017) with reduced complexity.

Pointer Network

As input, SeqPolicyNet receives the natural language question x_q and the columns x_c of the KB, which are lowercased
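To picture the slot-filling step described in the Model section: selection outputs choose KB columns, the two extraction outputs mark a token span in the question, and the resulting values are placed into a fixed ES query template. The column names and the template layout in this sketch are assumptions for illustration, not the paper's actual template.

```python
from typing import Dict, List


def fill_query_template(question_tokens: List[str],
                        select_column: str,
                        entity_column: str,
                        start: int,
                        end: int) -> Dict:
    """Build an ES-style query from the model's selection and extraction outputs.

    select_column: the KB column whose value answers the question.
    entity_column: the KB column matched against the extracted entity.
    start, end:    inclusive pointers into the question marking the entity span.
    """
    entity = " ".join(question_tokens[start:end + 1])
    return {
        "query": {"match": {entity_column: entity}},
        "_source": [select_column],
        "size": 1,
    }


# "Who was the director of Gone With the Wind?"
tokens = "who was the director of gone with the wind ?".split()
# Hypothetical model outputs: select "director", match on "movie_title",
# start pointer at "gone" (index 5), end pointer at "wind" (index 8).
query = fill_query_template(tokens, "director", "movie_title", 5, 8)
print(query)
```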
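Because executing the generated query against ES is non-differentiable, training relies on the REINFORCE algorithm mentioned in the Introduction. The following minimal policy-gradient sketch uses a toy categorical policy over a single slot and a hypothetical 0/1 reward standing in for whether the retrieved answer matches the gold answer; it is not the paper's exact objective.

```python
import torch

torch.manual_seed(0)

# Toy stand-in for the policy over one query slot (4 possible values).
logits = torch.nn.Parameter(torch.zeros(4))
dist = torch.distributions.Categorical(logits=logits)

action = dist.sample()                       # sampled slot value (one piece of the query)
reward = 1.0 if action.item() == 2 else 0.0  # hypothetical: 1 if the DB returned the gold answer

loss = -reward * dist.log_prob(action)       # REINFORCE surrogate loss
loss.backward()                              # gradients reach only the policy parameters
print(action.item(), reward, logits.grad)
```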