SQL++ Query Language

9/9/2015 SQL++ Query Language and its use in the FORWARD framework with the support of National Science Foundation, Google, Informatica, Couchbase Semistructured Databases have Arrived SQL-on-Hadoop NoSQL CQL PIG N1QL SQL-on-Hadoop Jaql Others AQL in the form of JSON… { location: 'Alpine', readings: [ { time: timestamp('2014-03-12T20:00:00'), ozone: 0.035, no2: 0.0050 }, { time: timestamp('2014-03-12T22:00:00'), ozone: 'm', co: 0.4 } ] } 1 9/9/2015 Semistructured Query Languages come along… SQL: JSON: Expressive Semistructured Power and Data Adoption but in a Tower of Babel… SELECT AVG(temp) AS tavg db.readings.aggregate( FROM readings {$group: {_id: "$sid", GROUP BY sid tavg: {$avg:"$temp"}}}) SQL MongoDB readings -> group by sid = $.sid a = LOAD 'readings' AS into { tavg: avg($.temp) }; (sid:int, temp:float); Jaql b = GROUP a BY sid; for $r in collection("readings") c = FOREACH b GENERATE group by $r.sid AVG(temp); return { tavg: avg($r.temp) } DUMP c; JSONiq Pig with limited query abilities… Growing out of key-value databases or SQL with special type UDFs Far from the full-fledged power of prior non-relational query languages – OQL, Xquery *academic projects being the exception (eg, ASTERIX, FORWARD) 2 9/9/2015 in lieu of formal semantics… Query languages were not meant for Ninjas Outline • What is SQL++ ? • How we use SQL++ in UCSD’s FORWARD ? • Introduction to the formal survey of NoSQL, NewSQL and SQL-on-Hadoop SQL++ : Data model + query language for querying both structured data (SQL) and semistructured data (native JSON) Formal Schemaless Full fledged, Semantics: adjusting tuple declarative , calculus to the aggressively backwards- richness of JSON compatible with SQL Configurable SQL++: Unifying: Formal configuration 11 popular databases/QLs options towards choosing captured by appropriate semantics, understanding settings of Configurable repercussions SQL++ 3 9/9/2015 How is it used in the Big Data industry • ASTERIXDB • CouchDB heading towards SQL++ How do we use it in UCSD Automatic Incremental View Maintenance Middleware Query Optimization & Execution: FORWARD virtual database query plans based on SQL++ algebra based on a schemaless SQL++ algebra SQL++ Data Model (also JSON++ data model) A superset of SQL and JSON • Extend SQL with arrays + nesting + heterogeneity { location: 'Alpine', readings: [ • Array nested { inside a tuple time: timestamp('2014-03-12T20:00:00'), ozone: 0.035, • Heterogeneous no2: 0.0050 tuples in }, collections { time: timestamp('2014-03-12T22:00:00'), • Arbitrary ozone: 'm', compositions of co: 0.4 array, bag, tuple } ] } 4 9/9/2015 SQL++ Data Model (also JSON++ data model) A superset of SQL and JSON • Extend SQL with arrays + nesting + heterogeneity + permissive structures [ [5, 10, 3], [21, 2, 15, 6], [4, {lo: 3, exp: 4, hi: 7}, 2, 13, 6] ] • Arbitrary compositions of array, bag, tuple SQL++ Data Model (also JSON++ data model) A superset of SQL and JSON • Extend SQL with arrays + nesting + heterogeneity + permissive structures • Extend JSON with bags and enriched types { location: 'Alpine', readings: {{ { time: timestamp('2014-03-12T20:00:00'), ozone: 0.035, • Bags {{ … }} no2: 0.0050 }, • Enriched types { time: timestamp('2014-03-12T22:00:00'), ozone: 'm', co: 0.4 } }} } Exercise: Why the SQL++ data model is a superset of the SQL data model? • a SQL table corresponds to a JSON bag that has homogeneous tuples 5 9/9/2015 SQL++ Data Model (also JSON++ data model) A superset of SQL and JSON • Extend SQL with arrays + nesting + heterogeneity + permissive structures + permissive schema • Extend JSON with bags and enriched types { { location: 'Alpine', location: string, readings: {{ readings: {{ { { time: timestamp('2014-03-12T20:00:00'), time: timestamp, • With schema ozone: 0.035, ozone: any, no2: 0.0050 * • Or, schemaless }, } { }} • Or, partial schema time: timestamp('2014} -03-12T22:00:00'), ozone: 'm', co: 0.4 } }} } From SQL to SQL++ • Goal: Queries that input/output any JSON type, yet backwards compatible with SQL Backwards Compatibility with SQL readings : {{ { sid: 2, temp: 70.1 }, {{ { sid: 2, temp: 49.2 }, { sid : 2 } { sid: 1, temp: null } SELECT DISTINCT r.sid }} }} FROM readings AS r WHERE r.temp < 50 Find sensors that recorded a temperature below 50 sid sid temp 2 2 70.1 2 49.2 1 null 6 9/9/2015 From SQL to SQL++ • Goal: Queries that input/output any JSON type, yet backwards compatible with SQL • Adjust SQL syntax/semantics for heterogeneity, complex input/output values • Achieved with few extensions and mostly by SQL limitation removals From Tuple Variables to Element Variables SELECT r AS co readings : FROM readings AS r [ [ 1.3, WHERE r < 1.0 { co: 0.8 }, 0.7, ORDER BY r DESC { co: 0.7 } 0.3, LIMIT 2 ] 0.8 ] Find the highest two sensor readings that are below 1.0 Element Variables FROM readings AS r out in B FROM = B WHERE = {{ ⟨ r : 1.3 ⟩, ⟨ r : 0.7 ⟩, ⟨ r : 0.3 ⟩, ⟨ r : 0.8 ⟩ }} WHERE r < 1.0 : readings Bout = Bin = {{ ⟨ r : 0.7 ⟩, [ WHERE ORDERBY ⟨ r : 0.3 ⟩, [ 1.3, ⟨ r : 0.8 ⟩ }} { co: 0.8 }, 0.7, { co: 0.7 } 0.3, ORDER BY r DESC ] 0.8 Bout = Bin = [ ⟨ r : 0.8 ⟩, ] ORDERBY LIMIT ⟨ : ⟩, r 0.7 ⟨ r : 0.3 ⟩ ] LIMIT 2 out in B LIMIT = B SELECT = [ ⟨ r : 0.8 ⟩ ⟨ r : 0.7 ⟩ ] SELECT r AS co 7 9/9/2015 Correlated Subqueries Everywhere SELECT r AS co sensors : FROM sensors AS s, [ s AS r [ [1.3, 2] WHERE r < 1.0 { co: 0.9 }, [0.7, 0.7, 0.9] ORDER BY r DESC { co: 0.8 } [0.3, 0.8, 1.1] LIMIT 2 ] [0.7, 1.4] ] Find the highest two sensor readings that are below 1.0 Correlated Subqueries Everywhere FROM sensors AS s, s AS r out in B FROM = B WHERE = {{ ⟨ s : [1.3, 2] , r : 1.3 ⟩, ⟨ s : [1.3, 2] , r : 2 ⟩, ⟨ s : [0.7, 0.7, 0.9] , r : 0.7 ⟩, ⟨ s : [0.7, 0.7, 0.9] , r : 0.7 ⟩, sensors : [ ⟨ s : [0.7, 0.7, 0.9] , r : 0.9 ⟩, [1.3, 2] ⟨ s : [0.3, 0.8, 1.1] , r : 0.3 ⟩, [0.7, 0.7, 0.9] ⟨ s : [0.3, 0.8, 1.1] , r : 0.8 ⟩, [0.3, 0.8, 1.1] ⟨ s : [0.3, 0.8, 1.1] , r : 1.1 ⟩, [0.7, 1.4] ⟨ s : [0.7, 1.4] , r : 0.7 ⟩, ] ⟨ s : [0.7, 1.4] , r : 1.4 ⟩ }} WHERE r < 1.0 ORDER BY r DESC LIMIT 2 SELECT r AS co Correlated Subqueries Everywhere FROM sensors AS s, s AS r WHERE r < 1.0 out in B WHERE = B ORDERBY = {{ ⟨ s : [0.7, 0.7, 0.9] , r : 0.7 ⟩, ⟨ s : [0.7, 0.7, 0.9] , r : 0.7 ⟩, ⟨ s : [0.7, 0.7, 0.9] , r : 0.9 ⟩, : sensors [ ⟨ s : [0.3, 0.8, 1.1] , r : 0.3 ⟩, [1.3, 2] ⟨ s : [0.3, 0.8, 1.1] , r : 0.8 ⟩, [0.7, 0.7, 0.9] ⟨ s : [0.7, 1.4] , r : 0.7 ⟩ [0.3, 0.8, 1.1] }} [0.7, 1.4] ] ORDER BY r DESC LIMIT 2 SELECT r AS co 8 9/9/2015 Correlated Subqueries Everywhere FROM sensors AS s, s AS r WHERE r < 1.0 ORDER BY r DESC out in B ORDERBY = B LIMIT = [ ⟨ s : [0.7, 0.7, 0.9] , r : 0.9 ⟩, sensors : [ ⟨ s : [0.3, 0.8, 1.1] , r : 0.8 ⟩, [1.3, 2] ⟨ s : [0.7, 0.7, 0.9] , r : 0.7 ⟩, [ [0.7, 0.7, 0.9] ⟨ s : [0.7, 0.7, 0.9] , r : 0.7 ⟩, { co: 0.9 }, [0.3, 0.8, 1.1] ⟨ s : [0.7, 1.4] , r : 0.7 ⟩, { co: 0.8 } [0.7, 1.4] ⟨ s : [0.3, 0.8, 1.1] , r : 0.3 ⟩ ] ] ] LIMIT 2 out in B LIMIT = B SELECT = [ ⟨ s : [0.7, 0.7, 0.9] , r : 0.9 ⟩, ⟨ s : [0.3, 0.8, 1.1] , r : 0.8 ⟩ ] SELECT r AS co Utilization of Input Order x y SELECT r AS co, [ rp AS x, { co: 0.9, sensors : sp AS y x: 3, [ FROM sensors AS s AT sp, y: 2 [1.3, 2] s AS r AT rp }, [0.7, 0.7, 0.9] WHERE r < 1.0 [0.3, 0.8, 1.1] ORDER BY r DESC { co: 0.8, [0.7, 1.4] LIMIT 2 x: 2, ] y: 3 } Find the highest two sensor readings that are ] below 1.0 Constructing Nested Results – Subqueries Again {{ ℾ = ⟨ 0 { sensors : {{ sensor : 1, {sensor: 1}, FROM sensors AS s readings: {{ {sensor: 2} Bout = Bin = {{ {co:0.4}, }} , FROM SELECT ⟨ s : {sensor: 1} ⟩, {co:0.2} ⟨ s : {sensor: 2} ⟩ }} logs: {{ }} }, {sensor: 1, co: 0.4}, SELECT { {sensor: 1, s.sensor AS sensor, sensor : 2, co: 0.2}, ( readings: {{ {sensor: 2, SELECT l.co AS co {co:0.3} co: 0.3}, FROM logs AS l }} }} WHERE l.sensor = s.sensor } ⟩ ) AS readings }} 9 9/9/2015 Constructing Nested Results FROM logs AS l ℾ1 = ⟨ s : {sensor : 1}, out in B FROM = B SELECT = {{ sensors : {{ ⟨ l : {sensor: 1, co: 0.4} ⟩, {sensor: 1}, ⟨ l : {sensor: 1, co: 0.2} ⟩, {sensor: 2} ⟨ l : {sensor: 2, co: 0.3} ⟩ }} , }} {{ {co:0.4}, logs: {{ WHERE l.sensor = s.sensor {co:0.2} {sensor: 1, }} co: 0.4}, Bout = Bin = {{ {sensor: 1, FROM SELECT ⟨ l : {sensor: 1, co: 0.4} ⟩, co: 0.2}, ⟨ l : {sensor: 1, co: 0.2} ⟩ {sensor: 2, }} co: 0.3}, }} ⟩ SELECT l.co AS co Iterating over Attribute-Value pairs INNER Vs OUTER Correlation: Capturing Outerjoins, OUTER FLATTEN FROM [ readings AS {g:v}, { v AS n gas: "co", num: null ℾ = ⟨ }, { readings : { gas: "no2", co : [], num: 0.7 {{ no2: [0.7], }, { ⟨ g : "no2", v : [0.7], n : 0.7 ⟩, so2: [0.5, 2] gas: "so2", ⟨ g : "so2", v : [0.5, 2], n : 0.5 ⟩, } num: 0.5, ⟨ g : "co", v : [0.5, 2], n : 2 ⟩ }} ⟩ }, { gas: "so2", SELECT ELEMENT num: 2 { gas: g, } val: v } ] 10 9/9/2015 INNER Vs OUTER Correlation: Capturing Outerjoins, OUTER FLATTEN FROM [ readings AS {g:v} { INNER CORRELATE gas: "co", v AS n num: null ℾ = ⟨ }, { readings : { gas: "no2", co : [], num: 0.7 {{ no2: [0.7], }, { ⟨ g : "no2", v : [0.7], n : 0.7 ⟩, so2: [0.5, 2] gas: "so2", ⟨ g : "so2", v : [0.5, 2], n : 0.5 ⟩, } num: 0.5, ⟨ g : "co", v : [0.5, 2], n : 2 ⟩ }} ⟩ }, { gas: "so2", SELECT ELEMENT num: 2 { gas: g, } val: v } ] INNER Vs OUTER Correlation: Capturing Outerjoins, OUTER FLATTEN FROM [ readings AS {g:v} { OUTER CORRELATE gas: "co", v AS n num: null ℾ = ⟨ }, { readings : { gas: "no2", co : [], num: 0.7 {{ ⟨ g : "co", v : [], n : missing ⟩, no2: [0.7], }, { ⟨ g : "no2", v : [0.7], n : 0.7 ⟩, so2: [0.5, 2] gas: "so2", ⟨ g : "so2", v : [0.5, 2], n : 0.5 ⟩, } num: 0.5, ⟨ g : "co", v : [0.5, 2], n : 2 ⟩ }} ⟩ }, { gas: "so2", SELECT ELEMENT num: 2 { gas: g, } val: v } ] The SQL++ Extensions to SQL • Goal: Queries that input/output any JSON type, yet backwards compatible with SQL • Adjust SQL syntax/semantics for heterogeneity, complex values • Achieved with few extensions and SQL limitation removals • Element Variables • Correlated Subqueries (in FROM and SELECT clauses) • Optional utilization of Input Order • Grouping indendent of Aggregation Functions • ..

SQL++ Query Language

Details

Download

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

Support