Sinew: a SQL System for Multi-Structured Data

Total Page:16

File Type:pdf, Size:1020Kb

Sinew: a SQL System for Multi-Structured Data Sinew: A SQL System for Multi-Structured Data Daniel Tahara Thaddeus Diamond Daniel J. Abadi Yale University Hadapt Yale University and Hadapt [email protected] [email protected] [email protected] ABSTRACT ahead of time. By removing this up-front data manage- As applications are becoming increasingly dynamic, the no- ment effort, developers can more quickly get their applica- tion that a schema can be created in advance for an appli- tion up and running without having to worry in advance cation and remain relatively stable is becoming increasingly about which attributes will exist in their datasets or about unrealistic. This has pushed application developers away their domains, types, and dependencies. from traditional relational database systems and away from Some argue that this `immediate gratification' style of ap- the SQL interface, despite their many well-established ben- plication development will result in long-term problems of efits. Instead, developers often prefer self-describing data code maintenance, sharing, and performance. However, pro- models such as JSON, and NoSQL systems designed specif- ponents of the NoSQL approach argue that for rapidly evolv- ically for their relaxed semantics. ing datasets, the costs of maintaining a schema are simply In this paper, we discuss the design of a system that en- too high. Whichever side of the debate one falls on, in prac- ables developers to continue to represent their data using tice, the amount of production data represented using key- self-describing formats without moving away from SQL and value and other semi-structured data formats is increasing, traditional relational database systems. Our system stores and as the volume and strategic importance of this data arbitrary documents of key-value pairs inside physical and increases, so too does the requirement to analyze it. virtual columns of a traditional relational database system, Some NoSQL database systems support primitives that en- and adds a layer above the database system that automati- able the stored data to be analyzed. For example, Mon- cally provides a dynamic relational view to the user against goDB currently provides a series of aggregation primitives which fully standard SQL queries can be issued. We demon- as well as a proprietary MapReduce framework to analyze strate that our design can achieve an order of magnitude data, whereas other NoSQL databases, such as Cassandra improvement in performance over alternative solutions, in- and HBase, connect to Hadoop directly, leveraging Hadoop cluding existing relational database JSON extensions, Mon- MapReduce and other execution frameworks such as Apache goDB, and shredding systems that store flattened key-value Tez to analyze data. data inside a relational database. Unfortunately there are significant drawbacks to using any of the options provided by the NoSQL databases. Local primitives are a step away from the SQL standard, which Categories and Subject Descriptors renders a large number of third party analysis and busi- H.2.4 [Database Management]: Systems ness intelligence tools (such as SAP Business Objects, IBM Cognos, Microstrategy, and Tableau) unusable. Meanwhile, while there are several projects in the Hadoop ecosystem Keywords that provide a SQL interface to data stored in Hadoop (such dynamic schema; SQL; RDBMS; NoSQL; JSON; MongoDB as Hadapt, Hive, and Impala), they require the user to create a schema before the data can be analyzed via SQL, which 1. INTRODUCTION eliminates a major reason why the NoSQL database was used in the first place. In some cases, a schema can be added A major appeal of NoSQL database systems such as Mon- after the fact fairly easily, but in many other cases signifi- goDB, CouchDB, Riak, Cassandra, or HBase as the storage cant processing, ETL, and cleaning work must be performed backend for modern applications is their flexibility to load, in order to make the data fit into a usable schema. store, and access data without having to define a schema This paper describes the design of a layer above a tra- ditional relational database system that enables standard Permission to make digital or hard copies of all or part of this work for personal or SQL-compliant queries to be issued over multi-structured classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation data (relational, key-value, or other types of semi-structured on the first page. Copyrights for components of this work owned by others than the data) without having to define a schema at any point in the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or analytics pipeline. The basic idea is to give the user a logi- republish, to post on servers or to redistribute to lists, requires prior specific permission cal view of a universal relation [13, 15] where a logical table and/or a fee. Request permissions from [email protected]. exists that contains one column for each unique key that SIGMOD’14, June 22–27, 2014, Snowbird, UT, USA. exists in the dataset. Nested data is flattened into sepa- Copyright is held by the owner/author(s). Publication rights licensed to ACM. ACM 978-1-4503-2376-5/14/06...$15.00 http://dx.doi.org/10.1145/2588555.2612183. 815 rate columns, which can result in the logical table having queries, and data is read from HDFS and processed by the potentially hundreds or thousands of columns. SQL-on-Hadoop solution according to this schema. Since physically storing data in this way is impractical, Although these systems are more flexible than systems that the physical representation differs from the logical repre- require a user to define a schema at load time, they are sentation. Data is stored in a relational database system still limited by the requirement that, in order to issue SQL (RDBMS), but only a subset of the logical columns are ma- queries, the user must pre-define a target schema over which terialized as actual columns in the RDBMS, while the re- to execute those queries. Sinew drops this requirement, and maining columns are serialized into a single binary column automatically presents a logical view of the data to the user in the database system. These columns are then transpar- based on data rather than user input. ently rendered to the user by internally extracting the cor- Google Tenzing [10], Google Dremel [16], and Apache Drill2 responding values from the serialized data. offer the ability to query data through SQL without first The primary contribution of our work is a complete system defining a schema. Tenzing, which is a SQL-on-MapReduce architecture that enables a practical implementation of this system similar to Hive, infers a relational schema from the vision. This architecture contains the following components: underlying data but can only do so for flat structures that a relational storage layer, catalog, schema analyzer, column can be trivially mapped into a relational schema. In con- materializer, loader, query rewriter, and inverted text in- trast, Dremel and Drill (and Sinew) support nested data. dex. This paper describes the design, implementation, and However, the design of Drill and Dremel differ from Sinew interaction of these components. in that Drill and Dremel are only query execution engines, We also build a prototype of our proposed system, and designed purely for analysis. In contrast, Sinew is designed compare it against several alternatives for storing and ana- as an extension of a traditional RDBMS, adding support lyzing multi-structured data: (1) MongoDB (2) a JSON ex- for semi-structured and other key-value data on top of ex- tension of Postgres and (3) storing key-value pairs in entity- isting relational support. With this design, Sinew is able attribute-value `triple' format in a table in an RDBMS. We to support transactional updates, storage-integrated access- find that our prototype consistently outperforms these alter- control, and read/write concurrency control. Furthermore, natives (often by as much as an order of magnitude) while since it integrates with an RDBMS, Sinew can also bene- providing a more standard SQL interface. fit from the RDBMS's statistics gathering and cost-based query optimization capabilities. This makes Sinew similar to Pathfinder [8], a processing stack designed to convert from 2. RELATED WORK XML and XQuery to relational tuples and SQL, but Sinew Analytical systems that do not require a user to define differs in that it is more broadly purposed to support any a schema when data is loaded into the system generally fall form of multi-structured data and explicitly attempts to pro- into two categories: (1) systems that are specialized for non- vide a SQL interface to that data. relational data models and that do not have a SQL interface In addition to transparently providing a fully-featured re- (2) more general systems that offer a SQL or SQL-like inter- lational interface to multi-structured data, Sinew introduces face to the data at the cost of requiring the user to define the a novel approach to storing multi-structured data inside an structure of the underlying data before data can be queried. RDBMS. One common approach, historically used by XML In exchange for programming convenience or flexibility, databases, is to create a `shredder' that transforms docu- systems in the first category often require that the consumer ments into RDBMS records [5, 7, 14], often relying on some of the data sacrifice performance on some subset of process- variation on the edge model in order to map documents into ing tasks and/or learn a custom query language. For exam- relational tuples [5]. However, even when the underlying re- ple, Jaql offers a modular data processing platform through lational schema is optimized for relational primitives such the use of higher-order functions, but despite its extensibil- as projection and selection (e.g.
Recommended publications
  • NCR Counterpoint Course 307 Customizing Counterpoint
    NCR Counterpoint Course 307 Customizing Counterpoint ©NCR Corporation 2012 ©NCR Corporation 2012 For Your Information... Here is some logistical information to help you find your way around the training facilities: RESTROOMS Restrooms are located outside the Training Room on the left. The Women’s restroom is immediately to the left and the Men’s restroom is further down the hallway. BADGES All trainees are required to wear a VISITOR badge during their time in the facility. You will need to scan the badge to obtain access to the Training Room. Please return the badge to the instructor at the conclusion of your training. PHONES A telephone is available in the Training Room. Press 9 to get an outside line. Please turn off your cell phone while in class. WIRELESS The GUEST wireless network provides trainees with access to the ACCESS Internet. SMOKING This is a smoke-free building. A smoking area is provided in a patio area outside the back door of the building. Please use the ashtrays provided. REFRIGERATOR Help yourself to the drinks in the refrigerator in the Training Room. A filtered water dispenser is also available in the room. You can also use the refrigerator to store food items that you may have. LUNCH The instructor will stop class for approximately one hour to allow participants to enjoy their lunch. There are several restaurants within walking distance of the Memphis training facility. OFFICES We are proud of our offices but request that you have an escort to visit areas outside of the Training Room. APPOINTMENTS All of our employees schedule their time each day, so you will generally need an appointment to see them.
    [Show full text]
  • Embarcadero® Rapid SQL™ Quick Start Guide
    Product Documentation Embarcadero® Rapid SQL™ Quick Start Guide Version XE 2/8.0 1st Edition June, 2011 © 2011 Embarcadero Technologies, Inc. Embarcadero, the Embarcadero Technologies logos, and all other Embarcadero Technologies product or service names are trademarks or registered trademarks of Embarcadero Technologies, Inc. All other trademarks are property of their respective owners. This software/documentation contains proprietary information of Embarcadero Technologies, Inc.; it is provided under a license agreement containing restrictions on use and disclosure and is also protected by copyright law. Reverse engineering of the software is prohibited. Embarcadero Technologies, Inc. is a leading provider of award-winning tools for application developers and database professionals so they can design systems right, build them faster and run them better, regardless of their platform or programming language. Ninety of the Fortune 100 and an active community of more than three million users worldwide rely on Embarcadero products to increase productivity, reduce costs, simplify change management and compliance and accelerate innovation. The company’s flagship tools include: Embarcadero® Change Manager™, CodeGear™ RAD Studio, DBArtisan®, Delphi®, ER/Studio®, JBuilder® and Rapid SQL®. Founded in 1993, Embarcadero is headquartered in San Francisco, with offices located around the world. Embarcadero is online at www.embarcadero.com. CORPORATE HEADQUARTERS EMEA HEADQUARTERS ASIA-PACIFIC HEADQUARTERS 100 CALIFORNIA STREET YORK HOUSE L7. 313 LA TROBE STREET 12TH FLOOR 18 YORK ROAD MELBOURNE VIC 3000 SAN FRANCISCO, CALIFORNIA MAIDENHEAD, BERKSHIRE AUSTRALIA 94111 USA SL6 1SF, UNITED KINGDOM CONTENTS Contents Introduction. .5 About Rapid SQL . 5 Major functional areas . 5 Benefits to Specific Users . 6 About This Book . 6 Installing Rapid SQL .
    [Show full text]
  • Essential SQL for Oracle 19C Index for Brochure
    Essential SQL for Oracle 19c Section Title Page One Introduction to SQL 2 - Definition of SQL 3 - Definition of a Database 4 Two Database Objects 5 - Introduction 6 - Tables 7 - Views 8 - Materialized Views 9 - Indexes 10 - Sequences 11 - Packages, Functions and Procedures 12 - Synonyms and Schemas 13 Three The SQL Plus Interface 14 - Introduction 15 - Anatomy of SQL Plus 20 - ORA-01017: invalid username/password; logon denied 21 - ORA-12154: TNS:could not resolve connect identifier 22 - Exiting an SQL Plus Session 23 - Using the DOS Window 24 - Copying and Pasting 25 - Using the Function Keys 28 - Using the Line Editor in SQL Plus 29 - Using the Editor in SQL Plus 30 - SQL Plus Environment Settings 31 - File Manipulation in SQL Plus 32 - Spooling Results in SQL Plus 33 - Changing Passwords / Clearing the Screen in SQL 34 Four SQL Scripts 35 - Commenting SQL Scripts 39 - Running SQL in Batch Scripts 41 - Passing Parameters with SQL Scripts 43 Five Simple Queries 45 - Introduction 46 - Oracle Metadata 47 - SQL Syntax 52 - Counting Records in an Oracle Table 55 - Displaying Literals in a Select statement 57 - Displaying Variables in a Select statement 60 - Exercise One 61 - Column Aliases 65 - Column Manipulation with SQL Functions 68 - Concatenating Columns in SQL 74 - Performing Calculations in SQL 75 - Ordering Data in SQL 76 - Ordering More than One Column 78 - Ordering with nulls first / nulls last 79 - Manipulating Dates in SQL 80 RJ12 Page 1 of 5 © Seer Computing Ltd Essential SQL for Oracle 19c Section Title Page - Displaying Dates
    [Show full text]
  • Mysql 5.7 Release Notes
    MySQL 5.7 Release Notes Abstract This document contains release notes for the changes in each release of MySQL 5.7, up through MySQL 5.7.37. For information about changes in a different MySQL series, see the release notes for that series. For additional MySQL 5.7 documentation, see the MySQL 5.7 Reference Manual, which includes an overview of features added in MySQL 5.7 (What Is New in MySQL 5.7), and discussion of upgrade issues that you may encounter for upgrades from MySQL 5.6 to MySQL 5.7 (Changes in MySQL 5.7). MySQL platform support evolves over time; please refer to https://www.mysql.com/support/supportedplatforms/ database.html for the latest updates. Updates to these notes occur as new product features are added, so that everybody can follow the development process. If a recent version is listed here that you cannot find on the download page (https://dev.mysql.com/ downloads/), the version has not yet been released. The documentation included in source and binary distributions may not be fully up to date with respect to release note entries because integration of the documentation occurs at release build time. For the most up-to-date release notes, please refer to the online documentation instead. For legal information, see the Legal Notices. For help with using MySQL, please visit the MySQL Forums, where you can discuss your issues with other MySQL users. Document generated on: 2021-10-06 (revision: 23433) Table of Contents Preface and Legal Notices ............................................................................................................ 2 Changes in MySQL 5.7.36 (Not yet released, General Availability) .................................................
    [Show full text]
  • Mysql 5.7 Error Message Reference Abstract
    MySQL 5.7 Error Message Reference Abstract This is the MySQL 5.7 Error Message Reference. It lists all error messages produced by server and client programs in MySQL 5.7. This document accompanies Error Messages and Common Problems, in MySQL 5.7 Reference Manual. For help with using MySQL, please visit the MySQL Forums, where you can discuss your issues with other MySQL users. Document generated on: 2021-09-23 (revision: 70881) Table of Contents Preface and Legal Notices .................................................................................................................. v 1 MySQL Error Reference .................................................................................................................. 1 2 Server Error Message Reference ..................................................................................................... 3 3 Client Error Message Reference .................................................................................................... 99 4 Global Error Message Reference ................................................................................................. 105 Index .............................................................................................................................................. 109 iii iv Preface and Legal Notices This is the MySQL 5.7 Error Message Reference. It lists all error messages produced by server and client programs in MySQL 5.7. Legal Notices Copyright © 1997, 2021, Oracle and/or its affiliates. This software and related documentation
    [Show full text]
  • Ÿþo R a C L E E N T E R P R I S E P E R F O R M a N C E M
    Oracle® Enterprise Performance Management Workspace, Fusion Edition User’s Guide RELEASE 11.1.2.1 EPM Workspace User’s Guide, 11.1.2.1 Copyright © 1989, 2011, Oracle and/or its affiliates. All rights reserved. Authors: EPM Information Development Team This software and related documentation are provided under a license agreement containing restrictions on use and disclosure and are protected by intellectual property laws. Except as expressly permitted in your license agreement or allowed by law, you may not use, copy, reproduce, translate, broadcast, modify, license, transmit, distribute, exhibit, perform, publish, or display any part, in any form, or by any means. Reverse engineering, disassembly, or decompilation of this software, unless required by law for interoperability, is prohibited. The information contained herein is subject to change without notice and is not warranted to be error-free. If you find any errors, please report them to us in writing. If this software or related documentation is delivered to the U.S. Government or anyone licensing it on behalf of the U.S. Government, the following notice is applicable: U.S. GOVERNMENT RIGHTS: Programs, software, databases, and related documentation and technical data delivered to U.S. Government customers are "commercial computer software" or "commercial technical data" pursuant to the applicable Federal Acquisition Regulation and agency-specific supplemental regulations. As such, the use, duplication, disclosure, modification, and adaptation shall be subject to the restrictions and license terms set forth in the applicable Government contract, and, to the extent applicable by the terms of the Government contract, the additional rights set forth in FAR 52.227-19, Commercial Computer Software License (December 2007).
    [Show full text]
  • Mysql 8.0 What's New in Data Dictionary And
    MySQL 8.0 What’s new in Data Dictionary and DDL Bin Su Oracle,ACMUG&CRUG MySQL 2018 April 2018 CopyrightCopyright ©© 2016,2018,2018, OracleOracle and/orand/or itsits aliates.aliates. AllAll rightsrights reserved.reserved. | Safe Harbor Statement The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features orACMUG&CRUG functionality described for Oracle’s 2018 products remains at the sole discretion of Oracle. Copyright © 2018, Oracle and/or its aliates. All rights reserved. | 2 Agenda 1 What is new in Data Dictionary 2 What is new in DDL 3 Summary ACMUG&CRUG 2018 Copyright © 2018, Oracle and/or its aliates. All rights reserved. | What is new in Data Dictionary ACMUG&CRUG 2018 Copyright © 2018, Oracle and/or its aliates. All rights reserved. | Traditional Data Dictionary Data Dictionary SQL Files File System FRMOPTOPT TRGOPTOPT OPT System Tables (mysql.*) MyISAM user events proc File/Table InnoDB System Tables I_S scan ACMUG&CRUGSYS_* 2018InnoDB Copyright © 2018, Oracle and/or its aliates. All rights reserved. | New Data Dictionary Data Dictionary SQL Core DD tables InnoDB DD tablespace System tables I_S ViewsACMUG&CRUG 2018 Copyright © 2018, Oracle and/or its aliates. All rights reserved. | Data Dictionary Storage Engine ● InnoDB plays an important role ● MySQL Data Dictionary Storage Engine ● Single set of persisted metadata for all storage engines ● Control meta-data access using single locking mechanism ● Improves table spaces by removing .frm 3les ● Optimizations for DD table access ● All DD tables are put into a dedicated DD tablespace(mysql.ibd) ● Atomic DDLACMUG&CRUG → Transactional DDL 2018 Copyright © 2018, Oracle and/or its aliates.
    [Show full text]
  • Sql Alter Table Column Name
    Sql Alter Table Column Name exoticSternitic Pincas and dicastic still deplumed Osborne his never methylation throw-aways skimpily. his insectifuge! Shaughn is diplomatical: she title slantingly and hiccuped her vitalists. Grouchy and What data into a primary key mytable_id_pk using? ALTER anything IF EXISTS name RENAME TO newname ALTER anything IF EXISTS name ADD input IF NOT EXISTS columnname datatype. How can add clause as well other statement will be added as you can alter table or clause specifies whether or manual refresh or output. This guide if any sql programs that column name and management for the idkey index on to. Instead of a primary key, and attributes for adding or another alter. SQL Server ALTER TABLE ADD an overview SQLShack. Are still works for current name in order most table name in object. To a name of a version of lines at scale of a primary key constraint. SQL ALTER TABLE Statement W3Schools. RENAME COLUMN CockroachDB Docs. How please change existing columns in an SQL table QualiTest. Change a field to the alter an alter column mode, drop the partition expressions are not supported. Documentation 91 ALTER TABLE PostgreSQL. RENAME COLUMN statement Use the RENAME COLUMN statement to rename a column in error table The RENAME COLUMN statement allows you to rename an. ALTER TABLE Snowflake Documentation. MySQL 0 Reference Manual 1319 ALTER TABLE. Sql database testing, names named constraint name could find him with dropped. Introduction to Db2 ALTER TABLE paid COLUMN statement First concern the cheerful of the hue which trout want it perform rapid change they the story TABLE.
    [Show full text]
  • JSON Performance Features in Oracle 12C Release 2 Onwards
    JSON Performance features in Oracle 12c Release 2 O R A C L E W H I T E P A P E R | M A R C H 2 0 1 7 Disclaimer The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle. JSON Performance features in Oracle 12c Release 2 Page 1 Table of Contents Disclaimer 1 Introduction 3 Brief Introduction to JSON and Querying JSON in Oracle 12c 4 Storing JSON 4 JSON Path Expressions 5 Querying JSON 5 Indexing JSON 6 NoBench Benchmark 8 Benchmark Details 8 Performance Enhancements for JSON data 10 JSON with In-Memory Columnar Store (IMC) 10 Configuring In-Memory store for JSON documents 10 Benchmark Results 13 In-Memory Store Sizing 13 Functional and JSON Search Index 14 Benchmark Results 15 In-Memory Store and Functional and JSON Search Indexes 16 Performance Guidelines 17 Conclusion 18 JSON performance features in Oracle 12c Release 2 Page 2 Introduction JSON (JavaScript Object Notation) is a lightweight data interchange format. JSON is text based which makes it easy for humans to read and write and for machine to read and parse and generate. JSON being text based makes it completely language independent. JSON has gained a wide popularity among application developers (specifically web application developers) and is used as a persistent format for application data.
    [Show full text]
  • Sqlobject Documentation Release 3.3.0
    SQLObject Documentation Release 3.3.0 Ian Bicking and contributors May 07, 2017 Contents 1 Documentation 3 2 Example 75 3 Indices and tables 77 i ii SQLObject Documentation, Release 3.3.0 SQLObject is a popular Object Relational Manager for providing an object interface to your database, with tables as classes, rows as instances, and columns as attributes. SQLObject includes a Python-object-based query language that makes SQL more abstract, and provides substantial database independence for applications. Contents 1 SQLObject Documentation, Release 3.3.0 2 Contents CHAPTER 1 Documentation Download SQLObject The latest releases are always available on the Python Package Index, and is installable with pip or easy_install. You can install the latest release with: pip install-U SQLObject or: easy_install-U SQLObject You can install the latest version of SQLObject with: easy_install SQLObject==dev You can install the latest bug fixing branch with: easy_install SQLObject==bugfix If you want to require a specific revision (because, for instance, you need a bugfix that hasn’t appeared in a release), you can put this in your setuptools using setup.py file: setup(... install_requires=["SQLObject==bugfix,>=0.7.1dev-r1485"], ) This says that you need revision 1485 or higher. But it also says that you can aquire the “bugfix” version to try to get that. In fact, when you install SQLObject==bugfix you will be installing a specific version, and “bugfix” is just a kind of label for a way of acquiring the version (it points to a branch in the repository). 3 SQLObject Documentation, Release 3.3.0 Repositories The SQLObject git repositories are located at https://github.com/sqlobject and https://sourceforge.net/p/sqlobject/_list/ git Before switching to git development was performed at the Subversion repository that is no longer available.
    [Show full text]
  • Informatica Data Replication Release Notes 9.8.0 Hotfix 1 March 2019 © Copyright Informatica LLC 2019
    Informatica® Data Replication 9.8.0 HotFix 1 Release Notes Informatica Data Replication Release Notes 9.8.0 HotFix 1 March 2019 © Copyright Informatica LLC 2019 Publication Date: 2019-03-26 Table of Contents Introduction..................................................................... iv Chapter 1: Installation and Upgrades......................................... 5 Chapter 2: Fixes.............................................................. 6 Chapter 3: Known Limitations................................................. 8 Chapter 4: Informatica Global Customer Support............................. 13 Table of Contents 3 Introduction This document contains important information about enhancements, fixes, and known limitations in Informatica Data Replication 9.8.0 HotFix 1. iv C h a p t e r 1 Installation and Upgrades For information about installing or upgrading Data Replication, see the Data Replication Installation and Upgrade Guide. 5 C h a p t e r 2 Fixes The following table describes fixed limitations: Bug Description IDR-4073 When the Oracle Extractor parses a chained or migrated row, it might incorrectly determine the type of a DML operation. For example, it might interpret an Update operation as a Delete. IDR-4068 For Microsoft SQL Server 2016 sources, the Extractor ends with the following error when processing an Update record: Could not convert character data from UTF16 to UTF8. This error can occur when null values are updated to non-null values. IDR-4052 If a ROLLBACK TO SAVEPOINT operation occurs on an Oracle source, the Applier might skip records that are not rolled back on the source when applying data to the target. IDR-4050 The DB2 Extractor does not capture values of primary key columns in Update records if these columns also belong to unique indexes for which DB2 DATA CAPTURE is not enabled.
    [Show full text]
  • Teaching an Old Elephant New Tricks
    Teaching an Old Elephant New Tricks Nicolas Bruno Microsoft Research [email protected] ABSTRACT data by column results in better compression than what is possi- In recent years, column stores (or C-stores for short) have emerged ble in a row-store. Some compression techniques used in C-stores as a novel approach to deal with read-mostly data warehousing ap- (such as dictionary or bitmap encoding) can also be applied to row- plications. Experimental evidence suggests that, for certain types of stores. However, RLE encoding, which replaces a sequence of the queries, the new features of C-stores result in orders of magnitude same value by a pair (value, count) is a technique that cannot be improvement over traditional relational engines. At the same time, directly used in a row-store, because wide tuples rarely agree on all some C-store proponents argue that C-stores are fundamentally dif- attributes. The final ingredient in a C-store is the ability to perform ferent from traditional engines, and therefore their benefits cannot query processing over compressed data as much as possible (see [5] be incorporated into a relational engine short of a complete rewrite. for an in-depth study on C-stores). In this paper we challenge this claim and show that many of the C-stores claim to be much more efficient than traditional row- benefits of C-stores can indeed be simulated in traditional engines stores. The experimental evaluation in [15] results in C-stores being with no changes whatsoever. We then identify some limitations 164x faster on average than row-stores, and other evaluations [14] of our “pure-simulation” approach for the case of more complex report speedups from 30x to 16,200x (!).
    [Show full text]