Logical Plan 1 Example Query

Total Page:16

File Type:pdf, Size:1020Kb

Logical Plan 1 Example Query Review: Query Evaluation Steps SQL query CSE 444: Database Internals Parse & Rewrite Query Logical Select Logical Plan Query plan optimization Lecture 10 Select Physical Plan Query Optimization (part 1) Physical plan Query Execution Magda Balazinska - CSE 444, Spring 2012 1 Disk 2 What We Already Know… Example Query: Logical Plan 1 Supplier(sno,sname,scity,sstate) π Part(pno,pname,psize,pcolor) sname Supply(sno,pno,price) For each SQL query…. σ sscity=‘Seattle’ ∧sstate=‘WA’ ∧ pno=2 SELECT S.sname FROM Supplier S, Supply U WHERE S.scity='Seattle' AND S.sstate='WA’ AND S.sno = U.sno sno = sno AND U.pno = 2 There exist many logical query plan… Supplier Supply Magda Balazinska - CSE 444, Spring 2012 3 Magda Balazinska - CSE 444, Spring 2012 4 Example Query: Logical Plan 2 What We Also Know • For each logical plan… π sname • There exist many physical plans sno = sno σ sscity=‘Seattle’ ∧sstate=‘WA’ σ pno=2 Supplier Supply Magda Balazinska - CSE 444, Spring 2012 5 Magda Balazinska - CSE 444, Spring 2012 6 1 Example Query: Physical Plan 1 Example Query: Physical Plan 2 (On the fly) π sname (On the fly) π sname (On the fly) (On the fly) σ scity=‘Seattle’ ∧sstate=‘WA’ ∧ pno=2 σ scity=‘Seattle’ ∧sstate=‘WA’ ∧ pno=2 (Nested loop) (Index nested loop) sno = sno sno = sno Supplier Supply Supplier Supply (File scan) (File scan) (File scan) (Index scan) Magda Balazinska - CSE 444, Spring 2012 7 Magda Balazinska - CSE 444, Spring 2012 8 Query Optimizer Overview Today • Input: A logical query plan Estimating the cost of a query plan • Output: A good physical query plan • Basic query optimization algorithm – Enumerate alternative plans (logical and physical) – Compute estimated cost of each plan • Compute number of I/Os • Optionally take into account other resources – Choose plan with lowest cost – This is called cost-based optimization Magda Balazinska - CSE 444, Spring 2012 9 Magda Balazinska - CSE 444, Spring 2012 10 Estimating Cost of a Query Plan Access Path • We already know how to • Access path: a way to retrieve tuples from a table – Compute the cost of different operations – A file scan • In terms of number IOs – An index plus a matching selection condition • Normally should also consider CPU and network • We still need to • Index matches selection condition if it can be used – Compute cost of retrieving tuples from disk to retrieve just tuples that satisfy the condition • We can use different access paths – Example: Supplier(sid,sname,scity,sstate) • Queries have more sophisticated predicates than equality – B+-tree index on (scity,sstate) – Compute cost of a complete plan • matches scity=‘Seattle’ • does not match sid=3, does not match sstate=‘WA’ Magda Balazinska - CSE 444, Spring 2012 11 Magda Balazinska - CSE 444, Spring 2012 12 2 Access Path Selection Access Path Selectivity • Supplier(sid,sname,scity,sstate) • Access path selectivity is the number of pages retrieved if we use this access path • Selection condition: sid > 300 ∧ scity=‘Seattle’ – Most selective retrieves fewest pages • As we saw earlier, for equality predicates • Indexes: B+-tree on sid and B+-tree on scity – Selection on equality: σa=v(R) – V(R, a) = # of distinct values of attribute a – 1/V(R,a) is thus the reduction factor • Which access path should we use? – Clustered index on a: cost B(R)/V(R,a) – Unclustered index on a: cost T(R)/V(R,a) • We should pick the most selective access path – (we are ignoring I/O cost of index pages for simplicity) Magda Balazinska - CSE 444, Spring 2012 13 Magda Balazinska - CSE 444, Spring 2012 14 Selectivity for Range Predicates Back to Our Example • Selection condition: sid > 300 ∧ scity=‘Seattle’ Selection on range: σa>v(R) – Index I1: B+-tree on sid clustered – Index I2: B+-tree on scity unclustered • How to compute the selectivity? • Let’s assume • Assume values are uniformly distributed – V(Supplier,scity) = 20 • Reduction factor X – Max(Supplier, sid) = 1000, Min(Supplier,sid)=1 • X = (Max(R,a) - v) / (Max(R,a) - Min(R,a)) – B(Supplier) = 100, T(Supplier) = 1000 • Clustered index on a: cost B(R)*X • Cost I1: B(R) * (Max-v)/(Max-Min) = 100*700/999 ≈ 70 • Unclustered index on a: cost T(R)*X • Cost I2: T(R) * 1/V(Supplier,scity) = 1000/20 = 50 Magda Balazinska - CSE 444, Spring 2012 15 Magda Balazinska - CSE 444, Spring 2012 16 Selectivity with Back to Estimating Multiple Conditions Cost of a Query Plan What if we have an index on multiple attributes? • We already know how to • Ex: selection σa=v1 ∧ b= v2(R) and index on <a,b> – Compute the cost of different operations (last week) – Compute cost of retrieving tuples from disk with How to compute the selectivity? different access paths • Assume attributes are independent • X = 1 / (V(R,a) * V(R,b)) • We still need to – Compute cost of a complete plan • Clustered index on <a,b>: cost B(R)*X • Unclustered index on <a,b>: cost T(R)*X Magda Balazinska - CSE 444, Spring 2012 17 Magda Balazinska - CSE 444, Spring 2012 18 3 Computing the Cost of a Plan Projection: Cardinality of Result • Estimate cardinality in a bottom-up fashion • Output cardinality same as input cardinality – Cardinality is the size of a relation (nb of tuples) – Same number of input tuples – Compute size of all intermediate relations in plan – But tuples are smaller! • Estimate cost by using the estimated cardinalities • We learned how to compute the cost last week • But how do we compute cardinalities? Magda Balazinska - CSE 444, Spring 2012 19 Magda Balazinska - CSE 444, Spring 2012 20 Selection: Cardinality of Result Join: Cardinality of Result • Multiply input cardinality by reduction factor • For joins R ⋈ S – Similar to estimating access path selectivity – In fact, we also say operator/predicate selectivity – Take product of cardinalities of relations R and S – Condition is a = c /* value selection on R */ – Apply reduction factors for each term in join condition • Selectivity = 1/V(R,a) – Condition is a < v /* range selection on R */ – Terms are of the form: column1 = column2 • Selectivity = (v - Min(R, a))/(Max(R,a) - Min(R,a)) – Reduction: 1/ ( MAX( V(R,column1), V(S,column2)) – Multiple conditions: assume independence • Assumes each value in smaller set has a matching value • Use product of the reduction factors for the terms in the larger set (more on next slide) • Condition is a=v1 ∧ b= v2 • Selectivity = 1 / (V(R,a) * V(R,b)) Magda Balazinska - CSE 444, Spring 2012 21 Magda Balazinska - CSE 444, Spring 2012 22 Assumptions Selectivity of R ⨝A=B S • Containment of values: if V(R,A) <= V(S,B), then Assume V(R,A) <= V(S,B) the set of A values of R is included in the set of • Each tuple t in R joins with T(S)/V(S,B) tuple(s) in S B values of S – Note: this indeed holds when A is a foreign key in R, • Hence T(R ⨝A=B S) = T(R) T(S) / V(S,B) and B is a key in S In general: T(R ⨝A=B S) = T(R) T(S) / max(V(R,A),V(S,B)) • Preservation of values: for any other attribute C, V(R ⨝A=B S, C) = V(R, C) (or V(S, C)) Magda Balazinska - CSE 444, Spring 2012 23 Magda Balazinska - CSE 444, Spring 2012 24 4 T(Supplier) = 1000 B(Supplier) = 100 V(Supplier,scity) = 20 M = 11 T(Supply) = 10,000 B(Supply) = 100 V(Supplier,state) = 10 Complete Example V(Supply,pno) = 2,500 Physical Query Plan 1 SELECT sname Supplier(sid, sname, scity, sstate) (On the fly) π sname Selection and project on-the-fly FROM Supplier x, Supply y Supply(sid, pno, quantity) -> No additional cost. WHERE x.sid = y.sid (On the fly) • Some statistics and y.pno = 2 – T(Supplier) = 1000 records and x.scity = ‘Seattle’ σ scity=‘Seattle’ ∧sstate=‘WA’ ∧ pno=2 and x.sstate = ‘WA’ – T(Supply) = 10,000 records Total cost of plan is thus cost of join: – B(Supplier) = 100 pages = B(Supplier)+B(Supplier)*B(Supply)/(M-1) – B(Supply) = 100 pages (Nested loop) sno = sno = 100 + 100 * 100/10 – V(Supplier,scity) = 20, V(Suppliers,state) = 10 = 1,100 I/Os – V(Supply,pno) = 2,500 – Both relations are clustered Supplier Supply • M = 11 (File scan) (File scan) Magda Balazinska - CSE 444, Spring 2012 25 Magda Balazinska - CSE 444, Spring 2012 26 T(Supplier) = 1000 B(Supplier) = 100 V(Supplier,scity) = 20 M = 11 M = 11 T(Supply) = 10,000 B(Supply) = 100 V(Supplier,state) = 10 V(Supply,pno) = 2,500 Plan 2 with Different Numbers Physical Query Plan 2 Total cost Total cost (On the fly) π sname (d) What if we had: π sname (d) = 100 + 100 * 1/20 * 1/10 (a) = 10000 + 50 (a) 10K pages of Supplier + 100 + 100 * 1/2500 (b) + 10000 + 4 (b) 10K pages of Supply + 2 (c) + 3*50 + 4 (c) (Sort-merge join) (c) (Sort-merge join) (c) sno = sno + 0 (d) sno = sno + 0 (d) (Scan Total cost ≈ 204 I/Os (Scan Total cost ≈ 20,208 I/Os write to T1) (Scan write to T1) (Scan (a) σ write to T2) (a) σ write to T2) scity=‘Seattle’ ∧sstate=‘WA’ (b) σ pno=2 scity=‘Seattle’ ∧sstate=‘WA’ (b) σ pno=2 Need to do a two- Supplier Supply Supplier Supply pass sort algorithm (File scan) (File scan) (File scan) (File scan) Magda Balazinska - CSE 444, Spring 2012 27 Magda Balazinska - CSE 444, Spring 2012 28 T(Supplier) = 1000 B(Supplier) = 100 V(Supplier,scity) = 20 M = 11 T(Supply) = 10,000 B(Supply) = 100 V(Supplier,state) = 10 V(Supply,pno) = 2,500 Simplifications Physical Query Plan 3 (On the fly) d Total cost ( ) π sname • In the previous examples, we assumed that all = 1 (a) (On the fly) + 4 (b) index pages were in memory (c) σ scity=‘Seattle’ ∧sstate=‘WA’ + 0 (c) + 0 (d) • When this is not the case, we need to add the Total cost ≈ 5 I/Os (b) cost of fetching index pages from disk sno = sno (Index nested loop) (Use hash index) 4 tuples (a) σ pno=2 Supply Supplier (Hash index on pno ) (Hash index on sno) 29 Magda
Recommended publications
  • SAQE: Practical Privacy-Preserving Approximate Query Processing for Data Federations
    SAQE: Practical Privacy-Preserving Approximate Query Processing for Data Federations Johes Bater Yongjoo Park Xi He Northwestern University University of Illinois (UIUC) University of Waterloo [email protected] [email protected] [email protected] Xiao Wang Jennie Rogers Northwestern University Northwestern University [email protected] [email protected] ABSTRACT 1. INTRODUCTION A private data federation enables clients to query the union of data Querying the union of multiple private data stores is challeng- from multiple data providers without revealing any extra private ing due to the need to compute over the combined datasets without information to the client or any other data providers. Unfortu- data providers disclosing their secret query inputs to anyone. Here, nately, this strong end-to-end privacy guarantee requires crypto- a client issues a query against the union of these private records graphic protocols that incur a significant performance overhead as and he or she receives the output of their query over this shared high as 1,000× compared to executing the same query in the clear. data. Presently, systems of this kind use a trusted third party to se- As a result, private data federations are impractical for common curely query the union of multiple private datastores. For example, database workloads. This gap reveals the following key challenge some large hospitals in the Chicago area offer services for a cer- in a private data federation: offering significantly fast and accurate tain percentage of the residents; if we can query the union of these query answers without compromising strong end-to-end privacy. databases, it may serve as invaluable resources for accurate diag- To address this challenge, we propose SAQE, the Secure Ap- nosis, informed immunization, timely epidemic control, and so on.
    [Show full text]
  • Veritas: Shared Verifiable Databases and Tables in the Cloud
    Veritas: Shared Verifiable Databases and Tables in the Cloud Lindsey Alleny, Panagiotis Antonopoulosy, Arvind Arasuy, Johannes Gehrkey, Joachim Hammery, James Huntery, Raghav Kaushiky, Donald Kossmanny, Jonathan Leey, Ravi Ramamurthyy, Srinath Settyy, Jakub Szymaszeky, Alexander van Renenz, Ramarathnam Venkatesany yMicrosoft Corporation zTechnische Universität München ABSTRACT A recent class of new technology under the umbrella term In this paper we introduce shared, verifiable database tables, a new "blockchain" [11, 13, 22, 28, 33] promises to solve this conundrum. abstraction for trusted data sharing in the cloud. For example, companies A and B could write the state of these shared tables into a blockchain infrastructure (such as Ethereum [33]). The KEYWORDS blockchain then gives them transaction properties, such as atomic- ity of updates, durability, and isolation, and the blockchain allows Blockchain, data management a third party to go through its history and verify the transactions. Both companies A and B would have to write their updates into 1 INTRODUCTION the blockchain, wait until they are committed, and also read any Our economy depends on interactions – buyers and sellers, suppli- state changes from the blockchain as a means of sharing data across ers and manufacturers, professors and students, etc. Consider the A and B. To implement permissioning, A and B can use a permis- scenario where there are two companies A and B, where B supplies sioned blockchain variant that allows transactions only from a set widgets to A. A and B are exchanging data about the latest price for of well-defined participants; e.g., Ethereum [33]. widgets and dates of when widgets were shipped.
    [Show full text]
  • SQL Database Management Portal
    APPENDIX A SQL Database Management Portal This appendix introduces you to the online SQL Database Management Portal built by Microsoft. It is a web-based application that allows you to design, manage, and monitor your SQL Database instances. Although the current version of the portal does not support all the functions of SQL Server Management Studio, the portal offers some interesting capabilities unique to SQL Database. Launching the Management Portal You can launch the SQL Database Management Portal (SDMP) from the Windows Azure Management Portal. Open a browser and navigate to https://manage.windowsazure.com, and then login with your Live ID. Once logged in, click SQL Databases on the left and click on your database from the list of available databases. This brings up the database dashboard, as seen in Figure A-1. To launch the management portal, click the Manage icon at the bottom. Figure A-1. SQL Database dashboard in Windows Azure Management Portal 257 APPENDIX A N SQL DATABASE MANAGEMENT PORTAL N Note You can also access the SDMP directly from a browser by typing https://sqldatabasename.database. windows.net, where sqldatabasename is the server name of your SQL Database. A new web page opens up that prompts you to log in to your SQL Database server. If you clicked through the Windows Azure Management Portal, the database name will be automatically filled and read-only. If you typed the URL directly in a browser, you will need to enter the database name manually. Enter a user name and password; then click Log on (Figure A-2).
    [Show full text]
  • Plan Stitch: Harnessing the Best of Many Plans
    Plan Stitch: Harnessing the Best of Many Plans Bailu Ding Sudipto Das Wentao Wu Surajit Chaudhuri Vivek Narasayya Microsoft Research Redmond, WA 98052, USA fbadin, sudiptod, wentwu, surajitc, [email protected] ABSTRACT optimizer chooses multiple query plans for the same query, espe- Query performance regression due to the query optimizer select- cially for stored procedures and query templates. ing a bad query execution plan is a major pain point in produc- The query optimizer often makes good use of the newly-created indexes and statistics, and the resulting new query plan improves tion workloads. Commercial DBMSs today can automatically de- 1 tect and correct such query plan regressions by storing previously- in execution cost. At times, however, the latest query plan chosen executed plans and reverting to a previous plan which is still valid by the optimizer has significantly higher execution cost compared and has the least execution cost. Such reversion-based plan cor- to previously-executed plans, i.e., a plan regresses [5, 6, 31]. rection has relatively low risk of plan regression since the deci- The problem of plan regression is painful for production applica- sion is based on observed execution costs. However, this approach tions as it is hard to debug. It becomes even more challenging when ignores potentially valuable information of efficient subplans col- plans change regularly on a large scale, such as in a cloud database lected from other previously-executed plans. In this paper, we service with automatic index tuning. Given such a large-scale pro- propose a novel technique, Plan Stitch, that automatically and op- duction environment, detecting and correcting plan regressions in portunistically combines efficient subplans of previously-executed an automated manner becomes crucial.
    [Show full text]
  • Exploring Query Re-Optimization in a Modern Database System
    Radboud University Nijmegen Institute of Computing and Information Sciences Exploring Query Re-Optimization In a Modern Database System Master Thesis Computing Science Supervisor: Prof. dr. ir. Arjen P. Author: de Vries BSc Laurens N. Kuiper Second reader: Prof. dr. ir. Djoerd Hiemstra May 2021 Abstract The standard approach to query processing in database systems is to optimize before executing. When available statistics are accurate, optimization yields the optimal plan, and execution is as quick as it can be. However, when queries become more complex, the quality of statistics degrades, which leads to sub-optimal query plans, sometimes up to several orders of magnitude worse. Improving statistics for these queries is infeasible because the cardinality estimation error propagates exponentially through the plan. This problem can be alleviated through re-optimization, which is to progressively optimize the query, executing parts of the plan at a time, improving plan quality at the cost of additional overhead. In this work, re-optimization is implemented and simulated in DuckDB, a main-memory database system designed for analytical workloads, and evaluated on the Join Order Benchmark. Total plan cost of the benchmark is reduced by up to 44% using this method, and a reduction in end-to-end query latency of up to 20% is observed using only simple re-optimization schemes, showing the potential of this approach. Acknowledgements The past year has been a challenging and exciting year. From breaking my leg to starting a job, and finally finishing my thesis. I would like to thank everyone who supported me during this period, and helped me get here.
    [Show full text]
  • Queryguard: Privacy-Preserving Latency-Aware Query Optimization for Edge Computing
    QueryGuard: Privacy-preserving Latency-aware Query Optimization for Edge Computing Runhua Xu ∗, Balaji Palanisamy y and James Joshi z School of Computing and Information University of Pittsburgh, Pittsburgh, PA USA ∗[email protected], [email protected], [email protected] Abstract—The emerging edge computing paradigm has en- adopting distributed database techniques in edge computing abled applications having low response time requirements to brings new challenges and concerns. First, in contrast to meet the quality of service needs of applications by moving the cloud data centers where cloud servers are managed through computations to the edge of the network that is geographically closer to the end-users and end-devices. Despite the low latency strict and regularized policies, edge nodes may not have the advantages provided by the edge computing model, there are same degree of regulatory and monitoring oversight. This significant privacy risks associated with the adoption of edge may lead to higher privacy risks compared to that in cloud computing services for applications dealing with sensitive data. servers. In particular, when dealing with a join query, tra- In contrast to cloud data centers where system infrastructures are ditional distributed query processing techniques sometimes managed through strict and regularized policies, edge computing nodes are scattered geographically and may not have the same ship selected or projected data to different nodes, some of degree of regulatory and monitoring oversight. This can lead which may be untrusted or semi-trusted. Thus, such techniques to higher privacy risks for the data processed and stored at may lead to greater disclosure of private information within the edge nodes, thus making them less trusted.
    [Show full text]
  • Lesson 4: Optimize a Query Using the Sybase IQ Query Plan
    Author: Courtney Claussen – SAP Sybase IQ Technical Evangelist Contributor: Bruce McManus – Director of Customer Support at Sybase Getting Started with SAP Sybase IQ Column Store Analytics Server Lesson 4: Optimize a Query using the SAP Sybase IQ Query Plan Copyright (C) 2012 Sybase, Inc. All rights reserved. Unpublished rights reserved under U.S. copyright laws. Sybase and the Sybase logo are trademarks of Sybase, Inc. or its subsidiaries. SAP and the SAP logo are trademarks or registered trademarks of SAP AG in Germany and in several other countries all over the world. All other trademarks are the property of their respective owners. (R) indicates registration in the United States. Specifications are subject to change without notice. Table of Contents 1. Introduction ................................................................................................................1 2. SAP Sybase IQ Query Processing ............................................................................2 3. SAP Sybase IQ Query Plans .....................................................................................4 4. What a Query Plan Looks Like ................................................................................5 5. What a Query Plan Will Tell You ............................................................................7 5.1 Query Tree Nodes and Node Detail ......................................................................7 5.1.1 Root Node Details ..........................................................................................7
    [Show full text]
  • Query Processing for SQL Updates
    Query Processing for SQL Updates Cesar´ A. Galindo-Legaria Stefano Stefani Florian Waas [email protected] [email protected][email protected] Microsoft Corp., One Microsoft Way, Redmond, WA 98052 ABSTRACT tradeoffs between plans that do serial or random I/Os. This moti- A rich set of concepts and techniques has been developed in the vates the integration of update processing in the general framework context of query processing for the efficient and robust execution of query processing. To be successful, this integration needs to of queries. So far, this work has mostly focused on issues related model update processing in a suitable way and consider the special to data-retrieval queries, with a strong backing on relational alge- requirements of updates. One of the few papers in this area is [2], bra. However, update operations can also exhibit a number of query which deals with delete operations, but there is little documented processing issues, depending on the complexity of the operations on the integration of updates with query processing, to the best of and the volume of data to process. Such issues include lookup and our knowledge. matching of values, navigational vs. set-oriented algorithms and In this paper, we present an overview of the basic concepts used trade-offs between plans that do serial or random I/Os. to support SQL Data Manipulation Language (DML) by the query In this paper we present an overview of the basic techniques used processor in Microsoft SQL Server. We focus on the query process- to support SQL DML (Data Manipulation Language) in Microsoft ing aspects of the problem, how data is modeled, primitive opera- SQL Server.
    [Show full text]
  • CPS216: Advanced Database Systems Notes 03:Query Processing
    CPS216: Advanced Database Systems Notes 03:Query Processing (Overview, contd.) Shivnath Babu Overview of SQL query Query parse Processing parse tree Query rewriting Query statistics logical query plan Optimization Physical plan generation physical query plan execute Query Execution result SQL query Initial logical plan parse parse tree Rewrite rules Query rewriting Logical plan statistics logical query plan Physical plan generation “Best” logical plan physical query plan execute result Query Rewriting π π B,D B,D σ σ R.C = S.C R.A = “c” Λ R.C = S.C σ R.A = “c” X X RS RS We will revisit it towards the end of this lecture SQL query parse parse tree Query rewriting statistics Best logical query plan Physical plan generation Best physical query plan execute result Physical Plan Generation π B,D Project Natural join Hash join σ R.A = “c” S Index scan Table scan R RS Best logical plan SQL query parse parse tree Enumerate possible physical plans Query rewriting statistics Best logical query plan Find the cost of each plan Physical plan generation Best physical query plan Pick plan with execute minimum cost result Physical Plan Generation Logical Query Plan Physical P1 P2 …. Pn plans C1 C2 …. Cn Costs Pick minimum cost one Plans for Query Execution • Roadmap – Path of a SQL query – Operator trees – Physical Vs Logical plans – Plumbing: Materialization Vs pipelining Modern DBMS Architecture Applications SQL DBMS Parser Logical query plan Query Optimizer Physical query plan Query Executor Access method API calls Storage Manager Storage system API calls File system API calls OS Disk(s) Logical Plans Vs.
    [Show full text]
  • Protecting User Privacy Using Declarative Preferences During Distributed Query Processing
    Don’t Reveal My Intension: Protecting User Privacy Using Declarative Preferences during Distributed Query Processing Nicholas L. Farnan1,AdamJ.Lee1, Panos K. Chrysanthis1, and Ting Yu2 1 Department of Computer Science, University of Pittsburgh 2 Department of Computer Science, North Carolina State University Abstract. In a centralized setting, the declarative nature of SQL is a major strength: a user can simply describe what she wants to retrieve, and need not worry about how the resulting query plan is actually gener- ated and executed. However, in a decentralized setting, two query plans that produce the same result might actually reveal vastly different infor- mation about the intensional description of a user’s query to the servers participating its evaluation. In cases where a user considers portions of her query to be sensitive, this is clearly problematic. In this paper, we address the specification and enforcement of querier privacy constraints on the execution of distributed database queries. We formalize a notion of intensional query privacy called (I,A)-privacy, and extend the syn- tax of SQL to allow users to enforce strict (I,A)-privacy constraints or partially ordered privacy/performance preferences over the execution of their queries. 1 Introduction Applications are increasingly becoming more decentralized, relying on data ex- change between autonomous and distributed data stores. This reliance is in some cases a design decision motivated by the need for scalability; for example, when user demand for a service exceeds what a system at a single site would be able to provide and replication becomes a necessity. In other instances, it is a fact of life that must be dealt with.
    [Show full text]
  • Anylog: a Grand Unification of the Internet of Things
    AnyLog: a Grand Unification of the Internet of Things Daniel Abadi Owen Arden University of Maryland University of California, [email protected] Santa Cruz [email protected] Faisal Nawab Moshe Shadmon University of California, AnyLog Santa Cruz [email protected] [email protected] ABSTRACT specify the permissions of that data. Some data will be published AnyLog is a decentralized platform for data publishing, sharing, with open access—in which case it will be queryable by any user of and querying IoT (Internet of Things) data that enables an unlim- AnyLog. Other data will be published in encrypted form, in which ited number of independent participants to publish and access the case only users with access to the decryption key may access it. contents of IoT datasets stored across the participants. AnyLog AnyLog is designed to provide a powerful query interface to the provides decentralized publishing and querying functionality over entire wealth of data produced by IoT devices. Questions such as: structured data in an analogous fashion to how the world wide “What was the maximum temperature reported in Palo Alto on June web (WWW) enables decentralized publishing and accessing of 21, 2008?” or “What was the difference in near accidents between unstructured data. However, AnyLog differs from the traditional self-driving cars that used deep-learning model X vs. self-driving WWW in the way that it provides incentives and financial reward cars that used deep-learning model Y?” or “How many cars passed for performing tasks that are critical to the well-being of the system the toll bridge in the last hour?” or “How many malfunctions were as a whole, including contribution, integration, storing, and pro- reported by a turbine of a particular model in all deployments in cessing of data, as well as protecting the confidentiality, integrity, the last year?” can all be expressed using clean and clearly spec- and availability of that data.
    [Show full text]
  • HP Nonstop SQL/MX Release 3.1 Query Guide
    TP663851.fm Page 1 Monday, October 17, 2011 11:48 AM HP NonStop SQL/MX Release 3.1 Query Guide Abstract This guide describes how to understand query execution plans and write optimal queries for an HP NonStop™ SQL/MX database. It is intended for database administrators and application developers who use NonStop SQL/MX to query an SQL/MX database and who have a particular interest in issues related to query performance. Product Version NonStop SQL/MX Releases 3.1 Supported Release Version Updates (RVUs) This publication supports J06.12 and all subsequent J-series RVUs and H06.23 and all subsequent H-series RVUs, until otherwise indicated by its replacement publications. Part Number Published 663851-001 October 2011 TP663851.fm Page 2 Monday, October 17, 2011 11:48 AM Document History Part Number Product Version Published 640323-001 NonStop SQL/MX Release 3.0 February 2011 663851-001 NonStop SQL/MX Release 3.1 October 2011 TP663851.fm Page 1 Monday, October 17, 2011 11:48 AM Legal Notices Copyright 2011 Hewlett-Packard Development Company L.P. Confidential computer software. Valid license from HP required for possession, use or copying. Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor's standard commercial license. The information contained herein is subject to change without notice. The only warranties for HP products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty.
    [Show full text]