Query Processing for SQL Updates

Total Page:16

File Type:pdf, Size:1020Kb

Query Processing for SQL Updates Query Processing for SQL Updates Cesar´ A. Galindo-Legaria Stefano Stefani Florian Waas [email protected] [email protected][email protected] Microsoft Corp., One Microsoft Way, Redmond, WA 98052 ABSTRACT tradeoffs between plans that do serial or random I/Os. This moti- A rich set of concepts and techniques has been developed in the vates the integration of update processing in the general framework context of query processing for the efficient and robust execution of query processing. To be successful, this integration needs to of queries. So far, this work has mostly focused on issues related model update processing in a suitable way and consider the special to data-retrieval queries, with a strong backing on relational alge- requirements of updates. One of the few papers in this area is [2], bra. However, update operations can also exhibit a number of query which deals with delete operations, but there is little documented processing issues, depending on the complexity of the operations on the integration of updates with query processing, to the best of and the volume of data to process. Such issues include lookup and our knowledge. matching of values, navigational vs. set-oriented algorithms and In this paper, we present an overview of the basic concepts used trade-offs between plans that do serial or random I/Os. to support SQL Data Manipulation Language (DML) by the query In this paper we present an overview of the basic techniques used processor in Microsoft SQL Server. We focus on the query process- to support SQL DML (Data Manipulation Language) in Microsoft ing aspects of the problem, how data is modeled, primitive opera- SQL Server. Our focus is on the integration of update operations tors and different execution plans. There are a number of other as- into the query processor, the query execution primitives required to pects required in a complete implementation, notably concurrency support updates, and the update-specific considerations to analyze control and locking, and algorithms for the manipulation of B-trees. and execute update plans. Full integration of update processing in Those subjects are outside of the scope of the present paper. the query processor provides a robust and flexible framework and leverages existing query processing techniques. 2. SQL UPDATES The concepts presented in this paper can be applied to a num- ber of data modification scenarios, but we focus on the basic op- 1. INTRODUCTION erations and functionality provided by SQL. This functionality is Data update operations are a necessary part of customer applica- briefly reviewed in this section. Examples throughout the paper tions, and their efficient processing is as important as that of data- use the well-known TPC-H schema [4]. retrieval operations. When data is inserted, deleted or modified in The basic data-modification statements provided by SQL are IN- a database, the DBMS (Database Management System) needs to SERT, DELETE, and UPDATE. Their target is a single table, al- validate the changes against declared constraints and maintain the though multiple tables may be modified as a result of the changes, underlying storage structures to preserve a consistent and correct as we shall see later. INSERT takes a collection of rows computed representation of the data. using an arbitrary SQL query and adds those rows to the target ta- Some aspects of update processing have been thoroughly studied ble. DELETE and UPDATE use a WHERE condition to qualify and documented in the past. Transactions and concurrency control the rows of interest, either to be removed from the table or to as- issues [1] are examples where formal models, algorithms, and per- sign values to their columns. It is valid to use subqueries in either formance analyses have been developed over time and contribute the qualifying predicate or the assignment values, so multiple tables to our understanding of the problem space. It has also led to suc- and arbitrarily complex queries are often part of data modification cessful and robust commercial implementations. statements. Depending on the complexity of the operations and the volume The following simple queries show deletion and update of a row of data to process, standard query processing issues also surface in the ORDERS table: during the execution of updates. These issues include lookup and matching of values, navigational vs. set-oriented algorithms and DELETE ORDERS WHERE O ORDERKEY = 3271 UPDATE ORDERS SET O PRIORITY = URGENT Permission to make digital or hard copies of all or part of this work for WHERE O ORDERKEY = 3271 personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage, and that copies Update operations need to be validated against declared con- bear this notice and the full citation on the first page. To copy otherwise, to straints. Several forms of constraints are supported by SQL. republish, to post on servers or to redistribute to lists, requires prior specific Check constraints are predicates that enforce the domain of a permission and/or a fee. SIGMOD 2004 June 13-18, 2004, Paris, France. column, or relationships between columns of a single table. For ex- Copyright 2004 ACM 1-58113-859-8/04/06 . $5.00. ample, a constraint can be O ORDERSTATUS IN (“O”, “F”, “P”). When inserting a row or updating column values, the DBMS re- jects the operation if the column values fail to satisfy the check constraint. A common special case of this form of constraint is to declare a column NOT NULL. UPD(S) secondary application Referential integrity (RI) constraints establish a relationship be- optional secondary ∆−Stream tween tables, indicating that the values appearing in one of the tables must be found in the key of another table. For example, the values of column O CUSTKEY in ORDERS must be found in C CUSTKEY of CUSTOMER. The column C CUSTKEY is called the referenced key, and the column O CUSTKEY of OR- UPD(T) primary application DERS is called the foreign key. primary ∆−Stream Finally, uniqueness constraints indicate that the value of one of more columns cannot have duplicates within the table, such as Read−Query C CUSTKEY in CUSTOMER. In addition to validation, update operations need to maintain the underlying storage structures. From a query processing point of T view, maintenance consists of finding all the underlying structures that represent the logical data and issuing the necessary modifica- Figure 1: General form for update plan. tions to keep them consistent and accurate for the data represented. In practice, this means updating all necessary indices and material- ized views that depend on the changed data. Due to space limita- Figure 1 shows a general template for the execution plan of up- tions, we will focus on index maintenance issues in this paper. date statements. It comprises two components. The first is read- Cascading RI constraints combine validation with maintenance only and responsible for delivering a delta stream that specifies the of structures. They specify, for example, that if the value of changes to be applied to the target table. The second component C CUSTKEY is changed then the system needs to automatically consumes the delta stream, applying the changes to the base table, change the value of the referencing O CUSTKEY to preserve the and then performs all the actions that the DML statement implicitly constraint. This is one example where multiple tables are modified fires. The list of actions to perform is determined by enumerating as a result of an update statement, even though its target is a single all the active dependencies against the target table and deciding base table. which of these are required by the current statement. 3. MODELING UPDATES IN QUERY PRO- 4. INDEX MAINTENANCE CESSING 4.1 Indices in Microsoft SQL Server The query processor of Microsoft SQL Server is based on an In Microsoft SQL Server, each base table has a “primary access algebraic, extensible model that supports the addition of new oper- path” storing the table contents. Such primary path can be either ators as necessary. Both in query optimization and in query execu- a clustered index, if organized as a B-tree, or a heap otherwise. tion, the framework provides a contract for operators and manipu- In either case, there is always a locator consisting of one or more lates trees of those operators. This general philosophy is formulated columns whose value uniquely identifies a row within the table and in [3] and it serves as the background for the work described here. can be used to locate it efficiently. A heap is not ordered; however, The following concepts make up the backbone of our update pro- a row is uniquely identified by a physical locator called RID. In a cessing approach: clustered index, a row can be uniquely identified with the values for the index keys, if the index is unique. If the clustered index is non- Delta stream. A delta stream is a collection of rows encoding unique, an extra, hidden column is added to the index keys, called changes to a particular base table. This is simply a relation uniquifier. The uniquifier column is used to distinguish between with a well-defined schema and as such it can be processed two rows with identical clustered index keys, and is automatically by any relational operator. Two different relational encod- assigned by the storage engine when populating or maintaining the ings of delta streams will be detailed later. B-tree. Secondary indices, also called non-clustered indices in Microsoft Application of a delta stream. A side-effecting operator called SQL Server, can also be created for the base table and they also StreamUpdate issues a data-modification command to the correspond to B-trees.
Recommended publications
  • SAQE: Practical Privacy-Preserving Approximate Query Processing for Data Federations
    SAQE: Practical Privacy-Preserving Approximate Query Processing for Data Federations Johes Bater Yongjoo Park Xi He Northwestern University University of Illinois (UIUC) University of Waterloo [email protected] [email protected] [email protected] Xiao Wang Jennie Rogers Northwestern University Northwestern University [email protected] [email protected] ABSTRACT 1. INTRODUCTION A private data federation enables clients to query the union of data Querying the union of multiple private data stores is challeng- from multiple data providers without revealing any extra private ing due to the need to compute over the combined datasets without information to the client or any other data providers. Unfortu- data providers disclosing their secret query inputs to anyone. Here, nately, this strong end-to-end privacy guarantee requires crypto- a client issues a query against the union of these private records graphic protocols that incur a significant performance overhead as and he or she receives the output of their query over this shared high as 1,000× compared to executing the same query in the clear. data. Presently, systems of this kind use a trusted third party to se- As a result, private data federations are impractical for common curely query the union of multiple private datastores. For example, database workloads. This gap reveals the following key challenge some large hospitals in the Chicago area offer services for a cer- in a private data federation: offering significantly fast and accurate tain percentage of the residents; if we can query the union of these query answers without compromising strong end-to-end privacy. databases, it may serve as invaluable resources for accurate diag- To address this challenge, we propose SAQE, the Secure Ap- nosis, informed immunization, timely epidemic control, and so on.
    [Show full text]
  • Support Aggregate Analytic Window Function Over Large Data by Spilling
    Support Aggregate Analytic Window Function over Large Data by Spilling Xing Shi and Chao Wang Guangdong University of Technology, Guangzhou, Guangdong 510006, China North China University of Technology, Beijing 100144, China Abstract. ​Analytic function, also called window function, is to query the aggregation of data over a sliding window. For example, a simple query over the online stock platform is to return the average price of a stock of the last three days. These functions are commonly used features in SQL databases. They are supported in most of the commercial databases. With the increasing usage of cloud data infra and machine learning technology, the frequency of queries with analytic window functions rises. Some analytic functions only require const space in memory to store the state, such as SUM, AVG, while others require linear space, such as MIN, MAX. When the window is extremely large, the memory space to store the state may be too large. In this case, we need to spill the state to disk, which is a heavy operation. In this paper, we proposed an algorithm to manipulate the state data in the disk to reduce the disk I/O to make spill available and efficiency. We analyze the complexity of the algorithm with different data distribution. 1. Introducion In this paper, we develop novel spill techniques for analytic window function in SQL databases. And discuss different various types of aggregate queries, e.g., COUNT, AVG, SUM, MAX, MIN, etc., over a relational table. Aggregate ​analytic function, also called aggregate window function, is to query the aggregation of data over a sliding window.
    [Show full text]
  • Veritas: Shared Verifiable Databases and Tables in the Cloud
    Veritas: Shared Verifiable Databases and Tables in the Cloud Lindsey Alleny, Panagiotis Antonopoulosy, Arvind Arasuy, Johannes Gehrkey, Joachim Hammery, James Huntery, Raghav Kaushiky, Donald Kossmanny, Jonathan Leey, Ravi Ramamurthyy, Srinath Settyy, Jakub Szymaszeky, Alexander van Renenz, Ramarathnam Venkatesany yMicrosoft Corporation zTechnische Universität München ABSTRACT A recent class of new technology under the umbrella term In this paper we introduce shared, verifiable database tables, a new "blockchain" [11, 13, 22, 28, 33] promises to solve this conundrum. abstraction for trusted data sharing in the cloud. For example, companies A and B could write the state of these shared tables into a blockchain infrastructure (such as Ethereum [33]). The KEYWORDS blockchain then gives them transaction properties, such as atomic- ity of updates, durability, and isolation, and the blockchain allows Blockchain, data management a third party to go through its history and verify the transactions. Both companies A and B would have to write their updates into 1 INTRODUCTION the blockchain, wait until they are committed, and also read any Our economy depends on interactions – buyers and sellers, suppli- state changes from the blockchain as a means of sharing data across ers and manufacturers, professors and students, etc. Consider the A and B. To implement permissioning, A and B can use a permis- scenario where there are two companies A and B, where B supplies sioned blockchain variant that allows transactions only from a set widgets to A. A and B are exchanging data about the latest price for of well-defined participants; e.g., Ethereum [33]. widgets and dates of when widgets were shipped.
    [Show full text]
  • SQL and Management of External Data
    SQL and Management of External Data Jan-Eike Michels Jim Melton Vanja Josifovski Oracle, Sandy, UT 84093 Krishna Kulkarni [email protected] Peter Schwarz Kathy Zeidenstein IBM, San Jose, CA {janeike, vanja, krishnak, krzeide}@us.ibm.com [email protected] SQL/MED addresses two aspects to the problem Guest Column Introduction of accessing external data. The first aspect provides the ability to use the SQL interface to access non- In late 2000, work was completed on yet another part SQL data (or even SQL data residing on a different of the SQL standard [1], to which we introduced our database management system) and, if desired, to join readers in an earlier edition of this column [2]. that data with local SQL data. The application sub- Although SQL database systems manage an mits a single SQL query that references data from enormous amount of data, it certainly has no monop- multiple sources to the SQL-server. That statement is oly on that task. Tremendous amounts of data remain then decomposed into fragments (or requests) that are in ordinary operating system files, in network and submitted to the individual sources. The standard hierarchical databases, and in other repositories. The does not dictate how the query is decomposed, speci- need to query and manipulate that data alongside fying only the interaction between the SQL-server SQL data continues to grow. Database system ven- and foreign-data wrapper that underlies the decompo- dors have developed many approaches to providing sition of the query and its subsequent execution. We such integrated access. will call this part of the standard the “wrapper inter- In this (partly guested) article, SQL’s new part, face”; it is described in the first half of this column.
    [Show full text]
  • SQL Stored Procedures
    Agenda Key:31MA Session Number:409094 DB2 for IBM i: SQL Stored Procedures Tom McKinley ([email protected]) DB2 for IBM i consultant IBM Lab Services 8 Copyright IBM Corporation, 2009. All Rights Reserved. This publication may refer to products that are not currently available in your country. IBM makes no commitment to make available any products referred to herein. What is a Stored Procedure? • Just a called program – Called from SQL-based interfaces via SQL CALL statement • Supports input and output parameters – Result sets on some interfaces • Follows security model of iSeries – Enables you to secure your data – iSeries adopted authority model can be leveraged • Useful for moving host-centric applications to distributed applications 2 © 2009 IBM Corporation What is a Stored Procedure? • Performance savings in distributed computing environments by dramatically reducing the number of flows (requests) to the database engine – One request initiates multiple transactions and processes R R e e q q u u DB2 for i5/OS DB2DB2 for for i5/OS e e AS/400 s s t t SP o o r r • Performance improvements further enhanced by the option of providing result sets back to ODBC, JDBC, .NET & CLI clients 3 © 2009 IBM Corporation Recipe for a Stored Procedure... 1 Create it CREATE PROCEDURE total_val (IN Member# CHAR(6), OUT total DECIMAL(12,2)) LANGUAGE SQL BEGIN SELECT SUM(curr_balance) INTO total FROM accounts WHERE account_owner=Member# AND account_type IN ('C','S','M') END 2 Call it (from an SQL interface) over and over CALL total_val(‘123456’, :balance) 4 © 2009 IBM Corporation Stored Procedures • DB2 for i5/OS supports two types of stored procedures 1.
    [Show full text]
  • SQL Database Management Portal
    APPENDIX A SQL Database Management Portal This appendix introduces you to the online SQL Database Management Portal built by Microsoft. It is a web-based application that allows you to design, manage, and monitor your SQL Database instances. Although the current version of the portal does not support all the functions of SQL Server Management Studio, the portal offers some interesting capabilities unique to SQL Database. Launching the Management Portal You can launch the SQL Database Management Portal (SDMP) from the Windows Azure Management Portal. Open a browser and navigate to https://manage.windowsazure.com, and then login with your Live ID. Once logged in, click SQL Databases on the left and click on your database from the list of available databases. This brings up the database dashboard, as seen in Figure A-1. To launch the management portal, click the Manage icon at the bottom. Figure A-1. SQL Database dashboard in Windows Azure Management Portal 257 APPENDIX A N SQL DATABASE MANAGEMENT PORTAL N Note You can also access the SDMP directly from a browser by typing https://sqldatabasename.database. windows.net, where sqldatabasename is the server name of your SQL Database. A new web page opens up that prompts you to log in to your SQL Database server. If you clicked through the Windows Azure Management Portal, the database name will be automatically filled and read-only. If you typed the URL directly in a browser, you will need to enter the database name manually. Enter a user name and password; then click Log on (Figure A-2).
    [Show full text]
  • Case in Insert Statement Sql
    Case In Insert Statement Sql Unreleased Randal disbosoms: he despond his lordolatry negligibly and connectively. Is Dale black-and-white when Willi intertraffic bimanually? Goddard still spirit ideographically while untenable Vernor belove that banquettes. This case statement in sql case insert into a safe place. For sql server database must be inserted row to retain in tables created in other hand side of a rating from a real work properly. Getting rows of specific columns from existing table by using CASE statement with ORDER BY clause. FYI, your loan may vary. Given a sql users view to update and inserts. As shown in excel above denote, the insertion of deceased in permanent new ship from the existing table was successful. Is used to query techniques, though an interval to their firms from emp_master table? By inserting rows to insert a single value in for a equality expressions, we have inserted into another table variables here, false predicate is true. Count function can actually gain a dress in gates the join produces a founder of consent. Migration solutions for only those values? Instead of in case insert statement sql sql for each programming. Salesforce logos and inserts new row. In PROC SQL, you can do the same with CREATE TABLE and INSERT INTO statement. Sometimes goods might develop to focus during a portion of the Publishers table, such trust only publishers that register in Vancouver. Net mvc with this article has, and you may then correspond to. Please leave your head yet so unsure if. If ELSE was not offend and none set the Boolean_expression return which, then Null will be displayed.
    [Show full text]
  • 2.2 Update, Insert, Delete Stored Procedure Examples
    SYSTEMS DESIGN / CAPSTONE PROJECT MIS 413 User Guide 2.2 Creating Update, Insert and Delete Stored Procedures This guide will demonstrate building Insert, Update and Delete Stored Procedures with a few input parameters using the Query Editor Tool. They all assume you have a person table with a personID (auto number), FN, LN, Email, PWD attributes. Example A: Update Stored Procedure The easiest manner to create a stored procedure is to let the wizard in SQL assist you. 1) Open SQL and the proper database 2) Expand the ‘programmability’ tab under your database 3) Right click on stored procedures and click ‘new stored procedure’ 4) Consistent and proper naming of your stored procedure will help you and future developers understand their purpose. A guideline for this class would be: Name of the Table, the Operation (select, update, etc.), Optional Word(s) as in: personUpdate 5) Add your name of the procedure (without the < > brackets) as in: CREATE PROCEDURE personUpdate in the first row of the stored procedure. 6) To build your first UPDATE, use the Design in Query Editor Option. Delete the following row in your Stored Procedure: SELECT <@Param1, sysname, @p1>, <@Param2, sysname, @p2> 7) Then right click in the blank area and select DESIGN IN QUERY EDITOR A dialog box will appear and select the name of the table to be updated. A general rule is you will only UPDATE one table at a time. Do not try to UPDATE 2 or more tables with one UPDATE statement. If you need to update more than one table, you can do two separate UPDATE statements in the same Stored Procedure.
    [Show full text]
  • SQL/PSM Stored Procedures Basic PSM Form Parameters In
    Stored Procedures PSM, or “persistent, stored modules,” SQL/PSM allows us to store procedures as database schema elements. PSM = a mixture of conventional Procedures Stored in the Database statements (if, while, etc.) and SQL. General-Purpose Programming Lets us do things we cannot do in SQL alone. 1 2 Basic PSM Form Parameters in PSM CREATE PROCEDURE <name> ( Unlike the usual name-type pairs in <parameter list> ) languages like C, PSM uses mode- <optional local declarations> name-type triples, where the mode can be: <body>; IN = procedure uses value, does not Function alternative: change value. CREATE FUNCTION <name> ( OUT = procedure changes, does not use. <parameter list> ) RETURNS <type> INOUT = both. 3 4 1 Example: Stored Procedure The Procedure Let’s write a procedure that takes two CREATE PROCEDURE JoeMenu ( arguments b and p, and adds a tuple IN b CHAR(20), Parameters are both to Sells(bar, beer, price) that has bar = IN p REAL read-only, not changed ’Joe’’s Bar’, beer = b, and price = p. ) Used by Joe to add to his menu more easily. INSERT INTO Sells The body --- VALUES(’Joe’’s Bar’, b, p); a single insertion 5 6 Invoking Procedures Types of PSM statements --- (1) Use SQL/PSM statement CALL, with the RETURN <expression> sets the return name of the desired procedure and value of a function. arguments. Unlike C, etc., RETURN does not terminate Example: function execution. CALL JoeMenu(’Moosedrool’, 5.00); DECLARE <name> <type> used to declare local variables. Functions used in SQL expressions wherever a value of their return type is appropriate.
    [Show full text]
  • Sql Statement Used to Update Data in a Database
    Sql Statement Used To Update Data In A Database Beaufort remains foresightful after Worden blotted drably or face-off any wodge. Lyophilised and accompanied Wait fleyed: which Gonzales is conceived enough? Antibilious Aub never decolorising so nudely or prickles any crosiers alluringly. Stay up again an interval to rome, prevent copying data source code specifies the statement used to sql update data in database because that can also use a row for every row deletion to an answer The alias will be used in displaying the name. Which SQL statement is used to update data part a database MODIFY the SAVE draft UPDATE UPDATE. Description of the illustration partition_extension_clause. Applicable to typed views, the ONLY keyword specifies that the statement should apply only future data use the specified view and rows of proper subviews cannot be updated by the statement. This method allows us to copy data from one table then a newly created table. INSERT specifies the table or view that data transfer be inserted into. If our users in the update statement multiple users to update, so that data to in sql update statement used a database to all on this syntax shown below so that they are. Already seen an account? DELETE FROM Employees; Deletes all the records from master table Employees. The type for the queries to design window that to sql update data in a database company is sql statement which will retrieve a time, such a new value. This witch also allows the caller to store turn logging on grid off. Only one frame can edit a dam at original time.
    [Show full text]
  • Plan Stitch: Harnessing the Best of Many Plans
    Plan Stitch: Harnessing the Best of Many Plans Bailu Ding Sudipto Das Wentao Wu Surajit Chaudhuri Vivek Narasayya Microsoft Research Redmond, WA 98052, USA fbadin, sudiptod, wentwu, surajitc, [email protected] ABSTRACT optimizer chooses multiple query plans for the same query, espe- Query performance regression due to the query optimizer select- cially for stored procedures and query templates. ing a bad query execution plan is a major pain point in produc- The query optimizer often makes good use of the newly-created indexes and statistics, and the resulting new query plan improves tion workloads. Commercial DBMSs today can automatically de- 1 tect and correct such query plan regressions by storing previously- in execution cost. At times, however, the latest query plan chosen executed plans and reverting to a previous plan which is still valid by the optimizer has significantly higher execution cost compared and has the least execution cost. Such reversion-based plan cor- to previously-executed plans, i.e., a plan regresses [5, 6, 31]. rection has relatively low risk of plan regression since the deci- The problem of plan regression is painful for production applica- sion is based on observed execution costs. However, this approach tions as it is hard to debug. It becomes even more challenging when ignores potentially valuable information of efficient subplans col- plans change regularly on a large scale, such as in a cloud database lected from other previously-executed plans. In this paper, we service with automatic index tuning. Given such a large-scale pro- propose a novel technique, Plan Stitch, that automatically and op- duction environment, detecting and correcting plan regressions in portunistically combines efficient subplans of previously-executed an automated manner becomes crucial.
    [Show full text]
  • Exploring Query Re-Optimization in a Modern Database System
    Radboud University Nijmegen Institute of Computing and Information Sciences Exploring Query Re-Optimization In a Modern Database System Master Thesis Computing Science Supervisor: Prof. dr. ir. Arjen P. Author: de Vries BSc Laurens N. Kuiper Second reader: Prof. dr. ir. Djoerd Hiemstra May 2021 Abstract The standard approach to query processing in database systems is to optimize before executing. When available statistics are accurate, optimization yields the optimal plan, and execution is as quick as it can be. However, when queries become more complex, the quality of statistics degrades, which leads to sub-optimal query plans, sometimes up to several orders of magnitude worse. Improving statistics for these queries is infeasible because the cardinality estimation error propagates exponentially through the plan. This problem can be alleviated through re-optimization, which is to progressively optimize the query, executing parts of the plan at a time, improving plan quality at the cost of additional overhead. In this work, re-optimization is implemented and simulated in DuckDB, a main-memory database system designed for analytical workloads, and evaluated on the Join Order Benchmark. Total plan cost of the benchmark is reduced by up to 44% using this method, and a reduction in end-to-end query latency of up to 20% is observed using only simple re-optimization schemes, showing the potential of this approach. Acknowledgements The past year has been a challenging and exciting year. From breaking my leg to starting a job, and finally finishing my thesis. I would like to thank everyone who supported me during this period, and helped me get here.
    [Show full text]