Still using Windows 3.1?
So why stick with SQL-92?
@ModernSQL - https://modern-sql.com/ @MarkusWinand
SQL:1999 LATERAL LATERAL Before SQL:1999
Select-list sub-queries must be scalar[0]: (an atomic quantity that can hold only one value at a time[1])
SELECT … , (SELECT column_1 FROM t1 WHERE t1.x = t2.y ) AS c FROM t2 …
[0] Neglecting row values and other workarounds here; [1] https://en.wikipedia.org/wiki/Scalar LATERAL Before SQL:1999
Select-list sub-queries must be scalar[0]: (an atomic quantity that can hold only one value at a time[1])
SELECT … , (SELECT column_1 , column_2 FROM t1 ✗ WHERE t1.x = t2.y ) AS c More than FROM t2 one column? … ⇒Syntax error
[0] Neglecting row values and other workarounds here; [1] https://en.wikipedia.org/wiki/Scalar LATERAL Before SQL:1999
Select-list sub-queries must be scalar[0]: (an atomic quantity that can hold only one value at a time[1])
SELECT … More than , (SELECT column_1 , column_2 one row? ⇒Runtime error! FROM t1 ✗ WHERE t1.x = t2.y } ) AS c More than FROM t2 one column? … ⇒Syntax error
[0] Neglecting row values and other workarounds here; [1] https://en.wikipedia.org/wiki/Scalar LATERAL Since SQL:1999
Lateral derived queries can see table names defined before:
SELECT * FROM t1 CROSS JOIN LATERAL (SELECT * FROM t2 WHERE t2.x = t1.x ) derived_table ON (true) LATERAL Since SQL:1999
Lateral derived queries can see table names defined before:
SELECT * FROM t1 Valid due to CROSS JOIN LATERAL (SELECT * LATERAL FROM t2 keyword WHERE t2.x = t1.x ) derived_table ON (true) LATERAL Since SQL:1999
Lateral derived queries can see table names defined before:
SELECT * FROM t1 Valid due to CROSS JOIN LATERAL (SELECT * LATERAL FROM t2 keyword WHERE t2.x = t1.x Useless, but ) derived_table still required ON (true) LATERAL Since SQL:1999
Lateral derived queries can see table names defined before:
SELECT * FROM t1 Valid due to CROSS JOIN LATERAL (SELECT * LATERAL keyword Use FROM t2 CROSS JOIN to WHERE t2.x = t1.x omit the ON clause ) derived_table ON (true) But WHY? LATERAL Use-Cases LATERAL Use-Cases
‣ Top-N per group
inside a lateral derived table FETCH FIRST (or LIMIT, TOP) applies per row from left tables. LATERAL Use-Cases
‣ Top-N per group FROM t
inside a lateral derived table CROSS JOIN LATERAL ( SELECT … (or , ) FETCH FIRST LIMIT TOP FROM … applies per row from left tables. WHERE t.c=… ORDER BY … LIMIT 10 ) derived_table LATERAL Use-Cases
‣ Top-N per group FROM t
inside a lateral derived table CROSS JOIN LATERAL ( SELECT … (or , ) FETCH FIRST LIMIT TOP FROM … applies per row from left tables. WHERE t.c=… ORDER BY … Add proper index LIMIT 10 for Top-N query ) derived_table
https://use-the-index-luke.com/sql/partial-results/top-n-queries LATERAL Use-Cases
‣ Top-N per group FROM t
inside a lateral derived table CROSS JOIN LATERAL ( SELECT … (or , ) FETCH FIRST LIMIT TOP FROM … applies per row from left tables. WHERE t.c=… ORDER BY … ‣ Also useful to find most recent LIMIT 10 news from several subscribed ) derived_table topics (“multi-source top-N”). LATERAL Use-Cases
‣ Top-N per group FROM t
inside a lateral derived table CROSS JOIN LATERAL ( SELECT … (or , ) FETCH FIRST LIMIT TOP FROM … applies per row from left tables. WHERE t.c=… ORDER BY … ‣ Also useful to find most recent LIMIT 10 news from several subscribed ) derived_table topics (“multi-source top-N”).
‣ Table function arguments FROM t (TABLE often implies LATERAL) JOIN TABLE (your_func(t.c)) LATERAL Availability
1999 2001 2003 2005 2007 2009 2011 2013 2015 2017 5.1 MariaDB 8.0.14 MySQL 9.3 PostgreSQL SQLite 9.1 DB2 LUW 11gR1[0] 12cR1 Oracle 2005[1] SQL Server [0]Undocumented. Requires setting trace event 22829. [1]LATERAL is not supported as of SQL Server 2016 but [CROSS|OUTER] APPLY can be used for the same effect. WITH RECURSIVE (Common Table Expressions) WITH RECURSIVE CREATE TABLE t ( id INTEGER, parent INTEGER) WITH RECURSIVE CREATE TABLE t ( id INTEGER, parent INTEGER) WITH RECURSIVE CREATE TABLE t ( id INTEGER, parent INTEGER) WITH RECURSIVE Since SQL:1999
WITH RECURSIVE Since SQL:1999
SELECT t.id, t.parent FROM t WHERE t.id = ?
WITH RECURSIVE Since SQL:1999
SELECT t.id, t.parent FROM t WHERE t.id = ? UNION ALL SELECT t.id, t.parent FROM t WHERE t.parent = ? WITH RECURSIVE Since SQL:1999
SELECT t.id, t.parent FROM t WHERE t.id = ? UNION ALL SELECT t.id, t.parent FROM t WHERE t.parent = ? WITH RECURSIVE Since SQL:1999
SELECT t.id, t.parent FROM t WHERE t.id = ? UNION ALL SELECT t.id, t.parent FROM t WHERE t.parent = ? WITH RECURSIVE Since SQL:1999
WITH RECURSIVE prev (id, parent) AS ( SELECT t.id, t.parent FROM t WHERE t.id = ? UNION ALL SELECT t.id, t.parent FROM t WHERE t.parent = ? ) SELECT * FROM prev WITH RECURSIVE Since SQL:1999
WITH RECURSIVE prev (id, parent) AS ( SELECT t.id, t.parent FROM t WHERE t.id = ? UNION ALL
SELECT t.id, t.parent FROM t JOIN prev ON t.parent = prev.id ) SELECT * FROM prev WITH RECURSIVE Since SQL:1999
WITH RECURSIVE prev (id, parent) AS ( SELECT t.id, t.parent FROM t WHERE t.id = ? UNION ALL
SELECT t.id, t.parent FROM t JOIN prev ON t.parent = prev.id ) SELECT * FROM prev WITH RECURSIVE Since SQL:1999
WITH RECURSIVE prev (id, parent) AS ( SELECT t.id, t.parent FROM t WHERE t.id = ? UNION ALL
SELECT t.id, t.parent FROM t JOIN prev ON t.parent = prev.id ) SELECT * FROM prev WITH RECURSIVE Use Cases WITH RECURSIVE Use Cases
‣ Processing graphs “[…] for certain classes of graphs, solutions utilizing relational database technology […] can offer Shortest route from person A performance superior to that of the dedicated graph to B in LinkedIn/Facebook/… databases.” event.cwi.nl/grades2013/07-welc.pdf WITH RECURSIVE Use Cases
‣ Processing graphs “[…] for certain classes of graphs, solutions utilizing relational database technology […] can offer Shortest route from person A performance superior to that of the dedicated graph to B in LinkedIn/Facebook/… databases.” event.cwi.nl/grades2013/07-welc.pdf
https://wiki.postgresql.org/wiki/Loose_indexscan ‣ Finding distinct values
in n*log(N)† time complexity
† n … # distinct values, N … # of table rows. Suitable index required WITH RECURSIVE Use Cases
‣ Processing graphs “[…] for certain classes of graphs, solutions utilizing relational database technology […] can offer Shortest route from person A performance superior to that of the dedicated graph to B in LinkedIn/Facebook/… databases.” event.cwi.nl/grades2013/07-welc.pdf
https://wiki.postgresql.org/wiki/Loose_indexscan ‣ Finding distinct values
† WITH RECURSIVE gen (n) AS ( in n*log(N) time complexity SELECT 1 UNION ALL SELECT n+1 ‣ Row generators FROM gen
WHERE n < ? To fill gaps (e.g., in time ) series), generate test data SELECT * FROM gen
† n … # distinct values, N … # of table rows. Suitable index required WITH RECURSIVE Availability
1999 2001 2003 2005 2007 2009 2011 2013 2015 2017 5.1 10.2 MariaDB 8.0 MySQL 8.4 PostgreSQL 3.8.3[0] SQLite 7.0 DB2 LUW 11gR2 Oracle 2005 SQL Server [0]Only for top-level SELECT statements GROUPING SETS GROUPING SETS Before SQL:1999
Only one GROUP BY operation at a time: GROUPING SETS Before SQL:1999
Only one GROUP BY operation at a time: Monthly revenue SELECT year , month , sum(revenue) FROM tbl GROUP BY year, month GROUPING SETS Before SQL:1999
Only one GROUP BY operation at a time: Monthly revenue Yearly revenue SELECT year SELECT year , month , sum(revenue) , sum(revenue) FROM tbl FROM tbl GROUP BY year, month GROUP BY year GROUPING SETS Before SQL:1999
SELECT year , month , sum(revenue) FROM tbl GROUP BY year, month
SELECT year
, sum(revenue) FROM tbl GROUP BY year GROUPING SETS Before SQL:1999
SELECT year , month , sum(revenue) FROM tbl GROUP BY year, month UNION ALL SELECT year , null , sum(revenue) FROM tbl GROUP BY year GROUPING SETS Since SQL:1999
SELECT year SELECT year , month , month , sum(revenue) , sum(revenue) FROM tbl FROM tbl GROUP BY year, month GROUP BY UNION ALL GROUPING SETS ( SELECT year (year, month) , null , (year) , sum(revenue) ) FROM tbl GROUP BY year GROUPING SETS Availability
1999 2001 2003 2005 2007 2009 2011 2013 2015 2017 5.1[0] MariaDB 5.0[1] MySQL 9.5 PostgreSQL SQLite 5 DB2 LUW 9iR1 Oracle 2008 SQL Server [0]Only ROLLUP (properitery syntax). [1]Only ROLLUP (properitery syntax). GROUPING function since MySQL 8.0. SQL:2003 OVER and PARTITION BY OVER (PARTITION BY) The Problem
Two distinct concepts could not be used independently: OVER (PARTITION BY) The Problem
Two distinct concepts could not be used independently: ‣ Merge rows with the same key properties ‣ GROUP BY to specify key properties ‣ DISTINCT to use full row as key OVER (PARTITION BY) The Problem
Two distinct concepts could not be used independently: ‣ Merge rows with the same key properties ‣ GROUP BY to specify key properties ‣ DISTINCT to use full row as key ‣ Aggregate data from related rows ‣ Requires GROUP BY to segregate the rows ‣ COUNT, SUM, AVG, MIN, MAX to aggregate grouped rows OVER (PARTITION BY) The Problem OVER (PARTITION BY) The Problem
SELECT c1 , c2 FROM t OVER (PARTITION BY) The Problem
SELECT c1
No , c2 ⇢
FROM t Merge rows Merge
⇠ Yes Yes OVER (PARTITION BY) The Problem
SELECT c1
No , c2 ⇢
FROM t
SELECT DISTINCT Merge rows Merge c1 ⇠ , c2
Yes Yes FROM t OVER (PARTITION BY) The Problem
No ⇠ Aggregate ⇢ Yes
SELECT c1
No , c2 ⇢
FROM t
SELECT DISTINCT Merge rows Merge c1 ⇠ , c2
Yes Yes FROM t OVER (PARTITION BY) The Problem
No ⇠ Aggregate ⇢ Yes
SELECT c1
No , c2 ⇢
FROM t
SELECT DISTINCT SELECT c1 Merge rows Merge c1 , SUM(c2) tot ⇠ , c2 FROM t
Yes Yes FROM t GROUP BY c1 OVER (PARTITION BY) The Problem
No ⇠ Aggregate ⇢ Yes
SELECT c1 SELECT c1
No , c2 , c2 ⇢
FROM t FROM t
SELECT DISTINCT SELECT c1 Merge rows Merge c1 , SUM(c2) tot ⇠ , c2 FROM t
Yes Yes FROM t GROUP BY c1 OVER (PARTITION BY) The Problem
No ⇠ Aggregate ⇢ Yes
SELECT c1 SELECT c1
No , c2 , c2 ⇢
FROM t FROM t JOIN ( ) ta ON (t.c1=ta.c1)
SELECT DISTINCT SELECT c1 Merge rows Merge c1 , SUM(c2) tot ⇠ , c2 FROM t
Yes Yes FROM t GROUP BY c1 OVER (PARTITION BY) The Problem
No ⇠ Aggregate ⇢ Yes
SELECT c1 SELECT c1
No , c2 , c2 ⇢
FROM t FROM t SELECT c1 , SUM(c2) tot FROM t JOIN ( ) ta GROUP BY c1 ON (t.c1=ta.c1)
SELECT DISTINCT SELECT c1 Merge rows Merge c1 , SUM(c2) tot ⇠ , c2 FROM t
Yes Yes FROM t GROUP BY c1 OVER (PARTITION BY) The Problem
No ⇠ Aggregate ⇢ Yes
SELECT c1 SELECT c1
No , c2 , c2 , tot ⇢
FROM t FROM t SELECT c1 , SUM(c2) tot FROM t JOIN ( ) ta GROUP BY c1 ON (t.c1=ta.c1)
SELECT DISTINCT SELECT c1 Merge rows Merge c1 , SUM(c2) tot ⇠ , c2 FROM t
Yes Yes FROM t GROUP BY c1 OVER (PARTITION BY) The Problem
No ⇠ Aggregate ⇢ Yes
SELECT c1 SELECT c1
No , c2 , c2 , tot ⇢
FROM t FROM t SELECT c1 , SUM(c2) tot FROM t JOIN ( ) ta GROUP BY c1 ON (t.c1=ta.c1)
SELECT DISTINCT SELECT c1 Merge rows Merge c1 , SUM(c2) tot ⇠ , c2 FROM t
Yes Yes FROM t GROUP BY c1 OVER (PARTITION BY) Since SQL:2003
No ⇠ Aggregate ⇢ Yes
SELECT c1 SELECT c1
No , c2 , c2 ⇢
FROM t FROM tFROM t
SELECT DISTINCT SELECT c1 Merge rows Merge c1 , SUM(c2) tot ⇠ , c2 FROM t
Yes Yes FROM t GROUP BY c1 OVER (PARTITION BY) Since SQL:2003
No ⇠ Aggregate ⇢ Yes
SELECT c1 SELECT c1
No , c2 , c2 ⇢
FROM t FROM t, SUM(c2) OVER (PARTITION BY c1) FROM t
SELECT DISTINCT SELECT c1 Merge rows Merge c1 , SUM(c2) tot ⇠ , c2 FROM t
Yes Yes FROM t GROUP BY c1 OVER (PARTITION BY) How it works
dep salary SELECT dep, 1 1000 6000 salary, 22 1000 6000 SUM(salary) 22 1000 6000 OVER() FROM emp 333 1000 6000 333 1000 6000 333 1000 6000 OVER (PARTITION BY) How it works
dep salary SELECT dep, Look here 1 1000 6000 salary, 22 1000 6000 SUM(salary) 22 1000 6000 OVER() FROM emp 333 1000 6000 333 1000 6000 333 1000 6000 OVER (PARTITION BY) How it works
dep salary SELECT dep, 1 1000 6000 salary, 22 1000 6000 SUM(salary) 22 1000 6000 OVER() FROM emp 333 1000 6000 333 1000 6000 333 1000 6000 OVER (PARTITION BY) How it works
dep salary SELECT dep, 1 1000 6000 salary, 22 1000 6000 SUM(salary) 22 1000 6000 OVER() FROM emp 333 1000 6000 333 1000 6000 333 1000 6000 OVER (PARTITION BY) How it works
dep salary SELECT dep, 1 1000 6000 salary, 22 1000 6000 SUM(salary) 22 1000 6000 OVER() FROM emp 333 1000 6000 333 1000 6000 333 1000 6000 OVER (PARTITION BY) How it works
dep salary SELECT dep, 1 1000 6000 salary, 22 1000 6000 SUM(salary) 22 1000 6000 OVER() FROM emp 333 1000 6000 333 1000 6000 333 1000 6000 OVER (PARTITION BY) How it works
dep salary SELECT dep, 1 1000 6000 salary, 22 1000 6000 SUM(salary) 22 1000 6000 OVER() ) FROM emp 333 1000 6000 333 1000 6000 333 1000 6000 OVER (PARTITION BY) How it works
dep salary SELECT dep, 1 1000 1000 salary, 22 1000 2000 SUM(salary) 22 1000 2000 OVER() ) FROM emp 333 1000 3000 333 1000 3000 333 1000 3000 OVER (PARTITION BY) How it works
dep salary SELECT dep, 1 1000 1000 salary, 22 1000 2000 SUM(salary) 22 1000 2000 OVER() PARTITION BY dep) FROM emp 333 1000 3000 333 1000 3000 333 1000 3000 OVER (PARTITION BY) How it works
dep salary SELECT dep, 1 1000 1000 salary, 22 1000 2000 SUM(salary) 22 1000 2000 OVER() PARTITION BY dep) FROM emp 333 1000 3000 333 1000 3000 333 1000 3000 OVER (PARTITION BY) How it works
dep salary SELECT dep, 1 1000 1000 salary, 22 1000 2000 SUM(salary) 22 1000 2000 OVER() PARTITION BY dep) FROM emp 333 1000 3000 333 1000 3000 333 1000 3000 OVER and ORDER BY (Framing & Ranking) OVER (ORDER BY) The Problem
acnt id value balance SELECT id, value, 1 1 +10 +10 FROM transactions t 22 2 +20 +30 22 3 -10 +20 333 4 +50 +70 333 5 -30 +40 333 6 -20 +20 OVER (ORDER BY) The Problem
acnt id value balance SELECT id, value, 1 1 +10 +10 FROM transactions t 22 2 +20 +30 22 3 -10 +20 333 4 +50 +70 333 5 -30 +40 333 6 -20 +20 OVER (ORDER BY) The Problem
acnt id value balance SELECT id, value, 1 1 +10 +10 FROM transactions t 22 2 +20 +30 22 3 -10 +20 333 4 +50 +70 333 5 -30 +40 333 6 -20 +20 OVER (ORDER BY) The Problem
acnt id value balance SELECT id, value, 1 1 +10 +10 (SELECT SUM(value) 22 2 +20 +30 FROM transactions t2 22 3 -10 +20 WHERE t2.id <= t.id) FROM transactions t 333 4 +50 +70 333 5 -30 +40 333 6 -20 +20 OVER (ORDER BY) The Problem
acnt id value balance SELECT id, value, 1 1 +10 +10 (SELECT SUM(value) 22 2 +20 +30 FROM transactions t2 22 3 -10 +20 WHERE t2.id <= t.id) FROM transactions t 333 4 +50 +70 333 5 -30 +40 Range segregation (<=) not possible with 333 6 -20 +20 GROUP BY or PARTITION BY OVER (ORDER BY) Since SQL:2003
acnt id value balance SELECT id, value, 1 1 +10 +10 (SELECT SUM(value) 22 2 +20 +30 FROM transactions t2 22 3 -10 +20 WHERE t2.id <= t.id) FROM transactions t 333 4 +50 +70 333 5 -30 +40 333 6 -20 +20 OVER (ORDER BY) Since SQL:2003
acnt id value balance SELECT id, value, 1 1 +10 +10 22 2 +20 +30 22 3 -10 +20 333 4 +50 +70 333 5 -30 +40 333 6 -20 +20 FROM transactions t OVER (ORDER BY) Since SQL:2003
acnt id value balance SELECT id, value, 1 1 +10 +10 SUM(value) 22 2 +20 +30 OVER ( 22 3 -10 +20 333 4 +50 +70 333 5 -30 +40 333 6 -20 +20 ) FROM transactions t OVER (ORDER BY) Since SQL:2003
acnt id value balance SELECT id, value, 1 1 +10 +10 SUM(value) 22 2 +20 +30 OVER ( 22 3 -10 +20 ORDER BY id 333 4 +50 +70 333 5 -30 +40 333 6 -20 +20 ) FROM transactions t OVER (ORDER BY) Since SQL:2003
acnt id value balance SELECT id, value, 1 1 +10 +10 SUM(value) 22 2 +20 +30 OVER ( 22 3 -10 +20 ORDER BY id ROWS BETWEEN 333 4 +50 +70 UNBOUNDED PRECEDING 333 5 -30 +40 333 6 -20 +20 ) FROM transactions t OVER (ORDER BY) Since SQL:2003
acnt id value balance SELECT id, value, 1 1 +10 +10 SUM(value) 22 2 +20 +30 OVER ( 22 3 -10 +20 ORDER BY id ROWS BETWEEN 333 4 +50 +70 UNBOUNDED PRECEDING 333 5 -30 +40 AND CURRENT ROW 333 6 -20 +20 ) FROM transactions t OVER (ORDER BY) Since SQL:2003
acnt id value balance SELECT id, value, 1 1 +10 +10 SUM(value) 22 2 +20 +30 OVER ( 22 3 -10 +20 ORDER BY id ROWS BETWEEN 333 4 +50 +70 UNBOUNDED PRECEDING 333 5 -30 +40 AND CURRENT ROW 333 6 -20 +20 ) FROM transactions t OVER (ORDER BY) Since SQL:2003
acnt id value balance SELECT id, value, 1 1 +10 +10 SUM(value) 22 2 +20 +30 OVER ( 22 3 -10 +20 ORDER BY id ROWS BETWEEN 333 4 +50 +70 UNBOUNDED PRECEDING 333 5 -30 +40 AND CURRENT ROW 333 6 -20 +20 ) FROM transactions t OVER (ORDER BY) Since SQL:2003
acnt id value balance SELECT id, value, 1 1 +10 +10 SUM(value) 22 2 +20 +30 OVER ( 22 3 -10 +20 ORDER BY id ROWS BETWEEN 333 4 +50 +70 UNBOUNDED PRECEDING 333 5 -30 +40 AND CURRENT ROW 333 6 -20 +20 ) FROM transactions t OVER (ORDER BY) Since SQL:2003
acnt id value balance SELECT id, value, 1 1 +10 +10 SUM(value) 22 2 +20 +30 OVER ( 22 3 -10 +20 ORDER BY id ROWS BETWEEN 333 4 +50 +70 UNBOUNDED PRECEDING 333 5 -30 +40 AND CURRENT ROW 333 6 -20 +20 ) FROM transactions t OVER (ORDER BY) Since SQL:2003
acnt id value balance SELECT id, value, 1 1 +10 +10 SUM(value) 22 2 +20 +30 OVER ( 22 3 -10 +20 ORDER BY id ROWS BETWEEN 333 4 +50 +70 UNBOUNDED PRECEDING 333 5 -30 +40 AND CURRENT ROW 333 6 -20 +20 ) FROM transactions t OVER (ORDER BY) Since SQL:2003
acnt id value balance SELECT id, value, 1 1 +10 +10 SUM(value) 22 2 +20 +30 OVER ( 22 3 -10 +20 ORDER BY id ROWS BETWEEN 333 4 +50 +70 UNBOUNDED PRECEDING 333 5 -30 +40 AND CURRENT ROW 333 6 -20 +20 ) FROM transactions t OVER (ORDER BY) Since SQL:2003
acnt id value balance SELECT id, value, 1 1 +10 +10 SUM(value) 22 2 +20 +30 OVER ( 22 3 -10 +20 ORDER BY id ROWS BETWEEN 333 4 +50 +70 UNBOUNDED PRECEDING 333 5 -30 +40 AND CURRENT ROW 333 6 -20 +20 ) FROM transactions t OVER (ORDER BY) Since SQL:2003
acnt id value balance SELECT id, value, 1 1 +10 SUM(value) 22 2 +20 OVER ( 22 3 -10 ORDER BY id ROWS BETWEEN 333 4 +50 UNBOUNDED PRECEDING 333 5 -30 AND CURRENT ROW 333 6 -20 ) FROM transactions t OVER (ORDER BY) Since SQL:2003
acnt id value balance SELECT id, value, 1 1 +10 +10 SUM(value) 22 2 +20 +20 OVER ( PARTITION BY acnt 22 3 -10 +10 ORDER BY id ROWS BETWEEN 333 4 +50 +50 UNBOUNDED PRECEDING 333 5 -30 +20 AND CURRENT ROW
333 6 -20 . 0 ) FROM transactions t OVER (ORDER BY) Since SQL:2003
With OVER (ORDER BY n) a new type of functions make sense:
n ROW_NUMBER RANK DENSE_RANK PERCENT_RANK CUME_DIST A 1 1 1 0 0.25 B 2 2 2 0.33… 0.75 B 3 2 2 0.33… 0.75 X 4 4 3 1 1 OVER (SQL:2003) Use Cases
‣ Aggregates without GROUP BY OVER (SQL:2003) Use Cases
‣ Aggregates without GROUP BY
‣ Running totals, AVG(…) OVER(ORDER BY … ROWS BETWEEN 3 PRECEDING moving averages AND 3 FOLLOWING) moving_avg OVER (SQL:2003) Use Cases
‣ Aggregates without GROUP BY
‣ Running totals, AVG(…) OVER(ORDER BY … ROWS BETWEEN 3 PRECEDING moving averages AND 3 FOLLOWING) moving_avg ‣ Ranking OVER (SQL:2003) Use Cases
‣ Aggregates without GROUP BY
‣ Running totals, AVG(…) OVER(ORDER BY … ROWS BETWEEN 3 PRECEDING moving averages AND 3 FOLLOWING) moving_avg
‣ Ranking SELECT * FROM (SELECT ROW_NUMBER() ‣ Top-N per Group OVER(PARTITION BY … ORDER BY …) rn , t.* FROM t) numbered_t WHERE rn <= 3 OVER (SQL:2003) Use Cases
‣ Aggregates without GROUP BY
‣ Running totals, AVG(…) OVER(ORDER BY … ROWS BETWEEN 3 PRECEDING moving averages AND 3 FOLLOWING) moving_avg
‣ Ranking SELECT * FROM (SELECT ROW_NUMBER() ‣ Top-N per Group OVER(PARTITION BY … ORDER BY …) rn , t.* Avoiding self-joins FROM t) numbered_t ‣ WHERE rn <= 3 OVER (SQL:2003) Use Cases
‣ Aggregates without GROUP BY
‣ Running totals, AVG(…) OVER(ORDER BY … ROWS BETWEEN 3 PRECEDING moving averages AND 3 FOLLOWING) moving_avg
‣ Ranking SELECT * FROM (SELECT ROW_NUMBER() ‣ Top-N per Group OVER(PARTITION BY … ORDER BY …) rn , t.* Avoiding self-joins FROM t) numbered_t ‣ WHERE rn <= 3 [… many more …] OVER (SQL:2003) Availability
1999 2001 2003 2005 2007 2009 2011 2013 2015 2017 5.1 10.2 MariaDB 8.0 MySQL 8.4 PostgreSQL 3.25.0 SQLite 7.0 DB2 LUW 8i Oracle 2005[0] 2012 SQL Server [0]No framing Impala OVER (SQL:2003) Availability Hive Spark BigQuery
NuoDB 1999 2001 2003 2005 2007 2009 2011 2013 2015 2017 5.1 10.2 MariaDB 8.0 MySQL 8.4 PostgreSQL 3.25.0 SQLite 7.0 DB2 LUW 8i Oracle 2005[0] 2012 SQL Server [0]No framing Inverse Distribution Functions (percentiles) Inverse Distribution Functions The Problem
Grouped rows cannot be ordered prior to aggregation. (how to get the middle value (median) of a set) Inverse Distribution Functions The Problem
Grouped rows cannot be ordered prior to aggregation. (how to get the middle value (median) of a set) SELECT d1.val FROM data d1 JOIN data d2 ON (d1.val < d2.val OR (d1.val=d2.val AND d1.id Grouped rows cannot be ordered prior to aggregation. (how to get the middle value (median) of a set) SELECT d1.val Number rows FROM data d1 JOIN data d2 ON (d1.val < d2.val OR (d1.val=d2.val AND d1.id Grouped rows cannot be ordered prior to aggregation. (how to get the middle value (median) of a set) SELECT d1.val Number rows FROM data d1 JOIN data d2 ON (d1.val < d2.val OR (d1.val=d2.val AND d1.id Grouped rows cannot be ordered prior to aggregation. (how to get the middle value (median) of a set) SELECT d1.val Number rows FROM data d1 JOIN data d2 ON (d1.val < d2.val OR (d1.val=d2.val AND d1.id Grouped rows cannot be ordered prior to aggregation. (how to get the middle value (median) of a set) SELECT d1.val Number rows FROM data d1 JOIN data d2 ON (d1.val < d2.val OR (d1.val=d2.val AND d1.id SELECT PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY val) FROM data Inverse Distribution Functions Since SQL:2003 Median SELECT PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY val) FROM data Inverse Distribution Functions Since SQL:2003 Median SELECT PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY val) FROM data Which value? Inverse Distribution Functions Since SQL:2003 SELECT PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY val) FROM data Inverse Distribution Functions Since SQL:2003 SELECT PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY val) FROM data Two variants: ‣ for discrete values (categories) ‣ for continuous values (linear interpolation) Inverse Distribution Functions Since SQL:2003 SELECT PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY val) FROM data Two variants: 1 ‣ for discrete values 2 (categories) 3 ‣ for continuous values (linear interpolation) 4 Inverse Distribution Functions Since SQL:2003 SELECT PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY val) FROM data 0 0.25 0.5 0.75 1 Two variants: 1 ‣ for discrete values 2 (categories) 3 ‣ for continuous values (linear interpolation) 4 Inverse Distribution Functions Since SQL:2003 SELECT PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY val) FROM data 0 0.25 0.5 0.75 1 Two variants: 1 PERCENTILE_DISC ‣ for discrete values 2 (categories) 3 ‣ for continuous values (linear interpolation) 4 Inverse Distribution Functions Since SQL:2003 SELECT PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY val) FROM data 0 0.25 0.5 0.75 1 Two variants: 1 PERCENTILE_DISC ‣ for discrete values 2 (categories) 3 ‣ for continuous values (linear interpolation) 4 Inverse Distribution Functions Since SQL:2003 SELECT PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY val) FROM data 0 0.25 0.5 0.75 1 Two variants: 1 PERCENTILE_DISCPERCENTILE_DISC(0.5) ‣ for discrete values 2 (categories) 3 ‣ for continuous values (linear interpolation) 4 Inverse Distribution Functions Since SQL:2003 SELECT PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY val) FROM data 0 0.25 0.5 0.75 1 Two variants: 1 PERCENTILE_DISCPERCENTILE_DISC(0.5) ‣ for discrete values 2 PERCENTILE_CONT (categories) 3 ‣ for continuous values (linear interpolation) 4 Inverse Distribution Functions Since SQL:2003 SELECT PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY val) FROM data 0 0.25 0.5 0.75 1 Two variants: 1 PERCENTILE_DISCPERCENTILE_DISC(0.5) ‣ for discrete values 2 PERCENTILE_CONTPERCENTILE_CONT(0.5) (categories) 3 ‣ for continuous values (linear interpolation) 4 Inverse Distribution Functions Availability 1999 2001 2003 2005 2007 2009 2011 2013 2015 2017 5.1 10.3[0] MariaDB MySQL 9.4 PostgreSQL SQLite 11.1 DB2 LUW 9iR1 Oracle 2012[0] SQL Server [0]Only as window function (requires OVER clause) TABLESAMPLE TABLESAMPLE Since SQL:2003 SELECT * FROM tbl TABLESAMPLE system(10) REPEATABLE (0) User provided seed value TABLESAMPLE Since SQL:2003 Or: bernoulli(10) better sampling, but slow. SELECT * FROM tbl TABLESAMPLE system(10) REPEATABLE (0) User provided seed value TABLESAMPLE Since SQL:2003 Or: bernoulli(10) better sampling, but slow. SELECT * FROM tbl TABLESAMPLE system(10) REPEATABLE (0) User provided seed value TABLESAMPLE Availability 1999 2001 2003 2005 2007 2009 2011 2013 2015 2017 5.1 MariaDB MySQL 9.5[0] PostgreSQL SQLite 8.2[0] DB2 LUW 8i[0] Oracle 2005[0] SQL Server [0]Not for derived tables SQL:2008 MERGE MERGE The Problem Copying rows from another table is easy: INSERT INTO Copying rows from another table is easy: INSERT INTO Deleting rows that exist in another table is also possible: DELETE FROM Deleting rows that exist in another table is also possible: DELETE FROM Updating rows from another table is awkward: UPDATE Updating rows from another table is awkward: Requires a name UPDATE Updating rows from another table is awkward: UPDATE SQL:2008 introduced merge to improve this situation: MERGE INTO ‣It has always two tables in scope, the source can be a derived table (subquery). ‣It can do insert, update, or delete in one go. MERGE Since SQL:2008 SQL:2008 introduced merge to improve this situation: MERGE INTO ‣It has always two tables in scope, the source can be a derived table (subquery). ‣It can do insert, update, or delete in one go. MERGE Since SQL:2008 SQL:2008 introduced merge to improve this situation: MERGE INTO ‣It has always two tables in scope, the source can be a derived table (subquery). ‣It can do insert, update, or delete in one go. MERGE Since SQL:2008 1999 2001 2003 2005 2007 2009 2011 2013 2015 2017 5.1 MariaDB MySQL PostgreSQL SQLite 9.1 DB2 LUW 9iR1[0] Oracle 2008 SQL Server [0]No AND condition. FETCH FIRST FETCH FIRST The Problem Limit the result to a number of rows. (LIMIT, TOP and ROWNUM are all proprietary) FETCH FIRST The Problem Limit the result to a number of rows. (LIMIT, TOP and ROWNUM are all proprietary) SELECT * FROM (SELECT * , ROW_NUMBER() OVER(ORDER BY x) rn FROM data) numbered_data WHERE rn <=10 FETCH FIRST The Problem Limit the result to a number of rows. (LIMIT, TOP and ROWNUM are all proprietary) SELECT * FROM (SELECT * , ROW_NUMBER() OVER(ORDER BY x) rn FROM data) numbered_data WHERE rn <=10 SQL:2003 introduced ROW_NUMBER() to number rows. But this still requires wrapping to limit the result. And how about databases not supporting ROW_NUMBER()? FETCH FIRST The Problem Limit the result to a number of rows. (LIMIT, TOP and ROWNUM are all proprietary) SELECT * Dammit! FROM (SELECT * Let's take , ROW_NUMBER() OVER(ORDER BY x) rn LIMIT FROM data) numbered_data WHERE rn <=10 SQL:2003 introduced ROW_NUMBER() to number rows. But this still requires wrapping to limit the result. And how about databases not supporting ROW_NUMBER()? FETCH FIRST Since SQL:2008 SQL:2008 introduced the FETCH FIRST … ROWS ONLY clause: SELECT * FROM data ORDER BY x FETCH FIRST 10 ROWS ONLY FETCH FIRST Availability 1999 2001 2003 2005 2007 2009 2011 2013 2015 2017 5.1 MariaDB 3.19.3[0] MySQL 6.5[1] 8.4 PostgreSQL 2.1.0[1] SQLite 7.0 DB2 LUW 12cR1 Oracle 7.0[2] 2012 SQL Server [0]Earliest mention of LIMIT. Probably inherited from mSQL [1]Functionality available using LIMIT [2]SELECT TOP n ... SQL Server 2000 also supports expressions and bind parameters SQL:2011 OFFSET OFFSET The Problem How to fetch the rows after a limit? (pagination anybody?) SELECT * FROM (SELECT * , ROW_NUMBER() OVER(ORDER BY x) rn FROM data) numbered_data WHERE rn > 10 and rn <= 20 OFFSET Since SQL:2011 SQL:2011 introduced OFFSET, unfortunately! SELECT * FROM data ORDER BY x OFFSET 10 ROWS FETCH NEXT 10 ROWS ONLY OFFSET Since SQL:2011 SQL:2011 introduced OFFSET, unfortunately! SELECT * FROM data OFFSET ORDER BY x OFFSET 10 ROWS FETCH NEXT 10 ROWS ONLY OFFSET Since SQL:2011 SQL:2011 introduced OFFSET, unfortunately! SELECT * FROM data OFFSET ORDER BY x OFFSET 10 ROWS FETCH NEXT 10 ROWS ONLY Grab coasters & stickers! https://use-the-index-luke.com/no-offset OFFSET Since SQL:2011 1999 2001 2003 2005 2007 2009 2011 2013 2015 2017 5.1 MariaDB 3.20.3[0] 4.0.6[1] MySQL 6.5 PostgreSQL 2.1.0 SQLite 9.7[2] 11.1 DB2 LUW 12cR1 Oracle 2012 SQL Server [0]LIMIT [offset,] limit: "With this it's easy to do a poor man's next page/previous page WWW application." [1]The release notes say "Added PostgreSQL compatible LIMIT syntax" [2]Requires enabling the MySQL compatibility vector: db2set DB2_COMPATIBILITY_VECTOR=MYS OVER OVER (SQL:2011) The Problem Direct access of other rows of the same window is ugly. (E.g., calculate the difference to the previous rows) balance … 50 … SELECT balance 90 … , balance - COALESCE (MAX(balance) 70 … OVER(ORDER BY … 30 … ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) , 0) diff FROM t OVER (SQL:2011) The Problem Direct access of other rows of the same window is ugly. (E.g., calculate the difference to the previous rows) balance … diff 50 … +50 SELECT balance 90 … +40 , balance - COALESCE (MAX(balance) 70 … -20 OVER(ORDER BY … 30 … -40 ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) , 0) diff FROM t OVER (SQL:2011) The Problem Direct access of other rows of the same window is ugly. (E.g., calculate the difference to the previous rows) balance … diff 50 … +50 SELECT balance 90 … +40 , balance - COALESCE (MAX(balance) 70 … -20 OVER(ORDER BY … 30 … -40 ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) , 0) diff FROM t OVER (SQL:2011) The Problem Direct access of other rows of the same window is ugly. (E.g., calculate the difference to the previous rows) balance … diff 50 … +50 SELECT balance 90 … +40 , balance - COALESCE (MAX(balance) 70 … -20 OVER(ORDER BY … 30 … -40 ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) , 0) diff FROM t OVER (SQL:2011) The Problem Direct access of other rows of the same window is ugly. (E.g., calculate the difference to the previous rows) balance … diff 50 … +50 SELECT balance 90 … +40 , balance - COALESCE (MAX(balance) 70 … -20 OVER(ORDER BY … 30 … -40 ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) , 0) diff FROM t OVER (SQL:2011) The Problem Direct access of other rows of the same window is ugly. (E.g., calculate the difference to the previous rows) balance … diff 50 … +50 SELECT balance 90 … +40 , balance - COALESCE (MAX(balance) 70 … -20 OVER(ORDER BY … 30 … -40 ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) , 0) diff FROM t OVER (SQL:2011) Since SQL:2011 SQL:2011 introduced LEAD, LAG, NTH_VALUE, … for that: OVER (SQL:2011) Since SQL:2011 SQL:2011 introduced LEAD, LAG, NTH_VALUE, … for that: SELECT *, balance - COALESCE( LAG(balance) OVER(ORDER BY x) , 0) FROM t OVER (SQL:2011) Since SQL:2011 SQL:2011 introduced LEAD, LAG, NTH_VALUE, … for that: SELECT *, balance - COALESCE( LAG(balance) OVER(ORDER BY x) , 0) FROM t Available functions: LEAD / LAG FIRST_VALUE / LAST_VALUE NTH_VALUE(col, n) FROM FIRST/LAST RESPECT/IGNORE NULLS OVER (LEAD, LAG, …) Since SQL:2011 1999 2001 2003 2005 2007 2009 2011 2013 2015 2017 5.1 10.2[0] MariaDB 8.0[0] MySQL 8.4[0] PostgreSQL 3.25.0[0] SQLite 9.5[1] 11.1 DB2 LUW 8i[1] 11gR2 Oracle 2012[1] SQL Server [0]No IGNORE NULLS and FROM LAST [1]No NTH_VALUE System Versioning (Time Traveling) System Versioning The Problem INSERT UPDATE DELETE are DESTRUCTIVE System Versioning Since SQL:2011 Table can be system versioned, application versioned or both. System Versioning Since SQL:2011 Table can be system versioned, application versioned or both. CREATE TABLE t (..., System Versioning Since SQL:2011 Table can be system versioned, application versioned or both. CREATE TABLE t (..., start_ts TIMESTAMP(9) GENERATED ALWAYS AS ROW START, System Versioning Since SQL:2011 Table can be system versioned, application versioned or both. CREATE TABLE t (..., start_ts TIMESTAMP(9) GENERATED ALWAYS AS ROW START, end_ts TIMESTAMP(9) GENERATED ALWAYS AS ROW END, System Versioning Since SQL:2011 Table can be system versioned, application versioned or both. CREATE TABLE t (..., start_ts TIMESTAMP(9) GENERATED ALWAYS AS ROW START, end_ts TIMESTAMP(9) GENERATED ALWAYS AS ROW END, PERIOD FOR SYSTEM_TIME (start_ts, end_ts) System Versioning Since SQL:2011 Table can be system versioned, application versioned or both. CREATE TABLE t (..., start_ts TIMESTAMP(9) GENERATED ALWAYS AS ROW START, end_ts TIMESTAMP(9) GENERATED ALWAYS AS ROW END, PERIOD FOR SYSTEM_TIME (start_ts, end_ts) ) WITH SYSTEM VERSIONING System Versioning Since SQL:2011 INSERT ... (ID, DATA) VALUES (1, 'X') ID Data start_ts end_ts 1 X 10:00:00 UPDATE ... SET DATA = 'Y' ... ID Data start_ts end_ts 1 X 10:00:00 11:00:00 1 Y 11:00:00 DELETE ... WHERE ID = 1 System Versioning Since SQL:2011 INSERT ... (ID, DATA) VALUES (1, 'X') ID Data start_ts end_ts 1 X 10:00:00 UPDATE ... SET DATA = 'Y' ... ID Data start_ts end_ts 1 X 10:00:00 11:00:00 1 Y 11:00:00 DELETE ... WHERE ID = 1 System Versioning Since SQL:2011 INSERT ... (ID, DATA) VALUES (1, 'X') ID Data start_ts end_ts 1 X 10:00:00 UPDATE ... SET DATA = 'Y' ... ID Data start_ts end_ts 1 X 10:00:00 11:00:00 1 Y 11:00:00 DELETE ... WHERE ID = 1 INSERT ... (ID, DATA) VALUES (1, 'X') ID Data start_ts end_ts System Versioning1 X 10:00:00 Since SQL:2011 UPDATE ... SET DATA = 'Y' ... ID Data start_ts end_ts 1 X 10:00:00 11:00:00 1 Y 11:00:00 DELETE ... WHERE ID = 1 ID Data start_ts end_ts 1 X 10:00:00 11:00:00 1 Y 11:00:00 12:00:00 System Versioning Since SQL:2011 ID Data start_ts end_ts 1 X 10:00:00 11:00:00 1 Y 11:00:00 12:00:00 Although multiple versions exist, only the “current” one is visible per default. After 12:00:00, SELECT * FROM t doesn’t return anything anymore. System Versioning Since SQL:2011 ID Data start_ts end_ts 1 X 10:00:00 11:00:00 1 Y 11:00:00 12:00:00 With FOR … AS OF you can query anything you like: SELECT * FROM t FOR SYSTEM_TIME AS OF TIMESTAMP '2019-05-08 10:30:00' ID Data start_ts end_ts 1 X 10:00:00 11:00:00 System Versioning Since SQL:2011 1999 2001 2003 2005 2007 2009 2011 2013 2015 2017 5.1 10.3 MariaDB MySQL PostgreSQL SQLite 10.1 DB2 LUW 10gR1[0] 11gR1[1] Oracle 2016 SQL Server [0]Short term using Flashback. [1]Flashback Archive. Proprietery syntax. SQL:2016 (released: 2016-12-15) JSON_TABLE JSON Since SQL:2016 JSON Since SQL:2016 [ { "id": 42, "a1": "foo" }, { "id": 43, "a1": "bar" } ] JSON Since SQL:2016 [ { "id": 42, "a1": "foo" id a1 }, 42 foo { 43 bar "id": 43, "a1": "bar" } ] JSON_TABLE Since SQL:2016 [ SELECT * { "id": 42, FROM JSON_TABLE "a1": "foo" }, ( ? { "id": 43, , '$[*]' "a1": "bar" COLUMNS } ( id INT PATH '$.id' ] , a1 VARCHAR(…) PATH '$.a1' id a1 ) 42 foo ) r 43 bar JSON_TABLE Since SQL:2016 [ SELECT * { "id": 42, FROM JSON_TABLE "a1": "foo" }, ( ? { "id": 43, , '$[*]' Bind "a1": "bar" Parameter COLUMNS } ( id INT PATH '$.id' ] , a1 VARCHAR(…) PATH '$.a1' id a1 ) 42 foo ) r 43 bar JSON_TABLE Since SQL:2016 SQL/JSON Path [ SELECT * { ‣ Query language to "id": 42, FROM JSON_TABLE select elements from "a1": "foo" a JSON document }, ( ? { ‣ Defined in the , '$[*]' Bind "id": 43, SQL standard "a1": "bar" Parameter COLUMNS } ( id INT PATH '$.id' ] , a1 VARCHAR(…) PATH '$.a1' id a1 ) 42 foo ) r 43 bar JSON_TABLE Since SQL:2016 SQL/JSON Path [ SELECT * { ‣ Query language to "id": 42, FROM JSON_TABLE select elements from "a1": "foo" a JSON document }, ( ? { ‣ Defined in the , '$[*]' Bind "id": 43, SQL standard "a1": "bar" Parameter COLUMNS } ( id INT PATH '$.id' ] , a1 VARCHAR(…) PATH '$.a1' id a1 ) 42 foo ) r 43 bar JSON_TABLE Since SQL:2016 SQL/JSON Path [ SELECT * { ‣ Query language to "id": 42, FROM JSON_TABLE select elements from "a1": "foo" a JSON document }, ( ? { ‣ Defined in the , '$[*]' Bind "id": 43, SQL standard "a1": "bar" Parameter COLUMNS } ( id INT PATH '$.id' ] , a1 VARCHAR(…) PATH '$.a1' id a1 ) 42 foo ) r 43 bar JSON_TABLE Since SQL:2016 SELECT * FROM JSON_TABLE ( ? , '$[*]' COLUMNS ( id INT PATH '$.id' , a1 VARCHAR(…) PATH '$.a1' ) ) r JSON_TABLE Since SQL:2016 INSERT INTO target_table SELECT * FROM JSON_TABLE ( ? , '$[*]' COLUMNS ( id INT PATH '$.id' , a1 VARCHAR(…) PATH '$.a1' ) ) r JSON_TABLE Availability 1999 2001 2003 2005 2007 2009 2011 2013 2015 2017 MariaDB 8.0 MySQL PostgreSQL SQLite DB2 LUW 12cR1 Oracle SQL Server MATCH_RECOGNIZE (Row Pattern Matching) Row Pattern Matching The Problem Example: Logfile Row Pattern Matching The Problem Example: Logfile 30 minutes Time Row Pattern Matching The Problem Example: Logfile 30 minutes Time Row Pattern Matching The Problem Example: Logfile 30 minutes Time Row Pattern Matching The Problem Example: Logfile 30 minutes Session 3 Session 1 Session 2 Time Session 4 Row Pattern Matching The Problem 30 minutes Time Row Pattern Matching The Problem 30 minutes Time SELECT count(*) sessions, avg(duration) avg_duration FROM (SELECT MAX(ts) - MIN(ts) duration FROM (SELECT ts, SUM(grp_start) OVER(ORDER BY ts) session_no FROM (SELECT ts, CASE WHEN ts >= LAG( ts, 1, DATE'1900-01-01' ) OVER( ORDER BY ts ) + INTERVAL '30' minute THEN 1 ELSE 0 END grp_start FROM log ) tagged ) numbered GROUP BY session_no ) grouped Row Pattern Matching The Problem 30 minutes Time SELECT count(*) sessions, avg(duration) avg_duration FROM (SELECT MAX(ts) - MIN(ts) duration FROM (SELECT ts, SUM(grp_start) OVER(ORDER BY ts) session_no FROM (SELECT ts, CASE WHEN ts >= LAG( ts, 1, DATE'1900-01-01' ) OVER( ORDER BY ts ) + INTERVAL '30' minute THEN 1 ELSE 0 END grp_start Start-of-group FROM log ) tagged tags ) numbered GROUP BY session_no ) grouped Row Pattern Matching The Problem 30 minutes Time SELECT count(*) sessions, avg(duration) avg_duration FROM (SELECT MAX(ts) - MIN(ts) duration FROM (SELECT ts, SUM(grp_start) OVER(ORDER BY ts) session_no FROM (SELECT ts, CASE WHEN ts >= LAG( ts, 1, DATE'1900-01-01' ) OVER( ORDER BY ts ) + INTERVAL '30' minute THEN 1 ELSE 0 END grp_start Start-of-group FROM log ) tagged tags ) numbered GROUP BY session_no ) grouped Row Pattern Matching The Problem 30 minutes Time number SELECT count(*) sessions, avg(duration) avg_duration sessions FROM (SELECT MAX(ts) - MIN(ts) duration FROM (SELECT ts, SUM(grp_start) OVER(ORDER BY ts) session_no FROM (SELECT ts, CASE WHEN ts >= LAG( ts, 1, DATE'1900-01-01' ) OVER( ORDER BY ts ) + INTERVAL '30' minute THEN 1 ELSE 0 END grp_start FROM log ) tagged ) numbered GROUP BY session_no ) grouped Row Pattern Matching The Problem 30 minutes 22 222 2 3 33 3 444 4 1 Time number SELECT count(*) sessions, avg(duration) avg_duration sessions FROM (SELECT MAX(ts) - MIN(ts) duration FROM (SELECT ts, SUM(grp_start) OVER(ORDER BY ts) session_no FROM (SELECT ts, CASE WHEN ts >= LAG( ts, 1, DATE'1900-01-01' ) OVER( ORDER BY ts ) + INTERVAL '30' minute THEN 1 ELSE 0 END grp_start FROM log ) tagged ) numbered GROUP BY session_no ) grouped Row Pattern Matching The Problem 30 minutes 22 222 2 3 33 3 444 4 1 Time SELECT count(*) sessions, avg(duration) avg_duration FROM (SELECT MAX(ts) - MIN(ts) duration FROM (SELECT ts, SUM(grp_start) OVER(ORDER BY ts) session_no FROM (SELECT ts, CASE WHEN ts >= LAG( ts, 1, DATE'1900-01-01' ) OVER( ORDER BY ts ) + INTERVAL '30' minute THEN 1 ELSE 0 END grp_start FROM log ) tagged ) numbered GROUP BY session_no ) grouped Row Pattern Matching Regular Expressions .\S* Row Pattern Matching Regular Expressions Regular Expression .\S* Row Pattern Matching Regular Expressions .\S* { any character non-white space Rail track diagram by regexper.com Row Pattern Matching Regular Expressions .\S* { { any character non-white space Rail track diagram by regexper.com Row Pattern Matching Regular Expressions .\S*{ { { any character non-white space Rail track diagram by regexper.com Row Pattern Matching MATCH_RECOGNIZE applies regular expressions on rows. Row Pattern Matching MATCH_RECOGNIZE applies regular expressions on rows. ‣Define the pattern variables (“characters”): DEFINE deleted AS status = 99 You can also look at other rows for that: DEFINE continuation AS ts < PREV(ts) + INTERVAL '30' MINUTE Row Pattern Matching MATCH_RECOGNIZE applies regular expressions on rows. ‣Define the pattern variables (“characters”): DEFINE deleted AS status = 99 You can also look at other rows for that: DEFINE continuation AS ts < PREV(ts) + INTERVAL '30' MINUTE Row Pattern Matching MATCH_RECOGNIZE applies regular expressions on rows. ‣Define the pattern variables (“characters”): DEFINE deleted AS status = 99 You can also look at other rows for that: DEFINE continuation AS ts < PREV(ts) + INTERVAL '30' MINUTE ‣Use this alphabet in an regular expression: PATTERN (deleted continuation*) Row Pattern Matching Since SQL:2016 30 minutes Time Row Pattern Matching Since SQL:2016 30 minutes Time SELECT COUNT(*) sessions , AVG(duration) avg_duration FROM log MATCH_RECOGNIZE( ORDER BY ts MEASURES LAST(ts) - FIRST(ts) AS duration ONE ROW PER MATCH PATTERN ( any cont* ) DEFINE cont AS ts < PREV(ts) + INTERVAL '30' minute ) t Oracle doesn’t support avg on intervals — query doesn’t work as shown Row Pattern Matching Since SQL:2016 30 minutes Time SELECT COUNT(*) sessions , AVG(duration) avg_duration FROM log MATCH_RECOGNIZE( ORDER BY ts MEASURES LAST(ts) - FIRST(ts) AS duration ONE ROW PER MATCH PATTERN ( any cont* ) DEFINE cont AS ts < PREV(ts) define + INTERVAL '30' minute continued ) t Oracle doesn’t support avg on intervals — query doesn’t work as shown Row Pattern Matching Since SQL:2016 30 minutes Time SELECT COUNT(*) sessions , AVG(duration) avg_duration FROM log MATCH_RECOGNIZE( ORDER BY ts MEASURES LAST(ts) - FIRST(ts) AS duration ONE ROW PER MATCH PATTERN ( any cont* ) DEFINE cont AS ts < PREV(ts) undefined + INTERVAL '30' minute pattern variable: ) t matches any row Oracle doesn’t support avg on intervals — query doesn’t work as shown Row Pattern Matching Since SQL:2016 30 minutes Time SELECT COUNT(*) sessions , AVG(duration) avg_duration FROM log MATCH_RECOGNIZE( ORDER BY ts MEASURES any number LAST(ts) - FIRST(ts) AS duration ONE ROW PER MATCH of “cont” PATTERN ( any cont* ) rows DEFINE cont AS ts < PREV(ts) + INTERVAL '30' minute ) t Oracle doesn’t support avg on intervals — query doesn’t work as shown Row Pattern Matching Since SQL:2016 30 minutes Time SELECT COUNT(*) sessions , AVG(duration) avg_duration FROM log MATCH_RECOGNIZE( ORDER BY ts Very much MEASURES like GROUP LAST(ts) - FIRST(ts) AS duration BY ONE ROW PER MATCH PATTERN ( any cont* ) DEFINE cont AS ts < PREV(ts) + INTERVAL '30' minute ) t Oracle doesn’t support avg on intervals — query doesn’t work as shown Row Pattern Matching Since SQL:2016 30 minutes Time SELECT COUNT(*) sessions , AVG(duration) avg_duration FROM log MATCH_RECOGNIZE( Very much ORDER BY ts like SELECT MEASURES LAST(ts) - FIRST(ts) AS duration ONE ROW PER MATCH PATTERN ( any cont* ) DEFINE cont AS ts < PREV(ts) + INTERVAL '30' minute ) t Oracle doesn’t support avg on intervals — query doesn’t work as shown Row Pattern Matching Since SQL:2016 30 minutes Time SELECT COUNT(*) sessions , AVG(duration) avg_duration FROM log MATCH_RECOGNIZE( ORDER BY ts MEASURES LAST(ts) - FIRST(ts) AS duration ONE ROW PER MATCH PATTERN ( any cont* ) DEFINE cont AS ts < PREV(ts) + INTERVAL '30' minute ) t Oracle doesn’t support avg on intervals — query doesn’t work as shown Row Pattern Matching Availability 1999 2001 2003 2005 2007 2009 2011 2013 2015 2017 MariaDB MySQL PostgreSQL SQLite DB2 LUW 12cR1 Oracle SQL Server https://www.flickr.com/photos/mfoubister/25367243054/ A lot has happened since SQL-92 https://www.flickr.com/photos/mfoubister/25367243054/ A lot has happened since SQL-92 SQL has evolved beyond the relational idea https://www.flickr.com/photos/mfoubister/25367243054/ A lot has happened since SQL-92 SQL has evolved beyond the relational idea If you use SQL for CRUD operations only, you are doing it wrong https://www.flickr.com/photos/mfoubister/25367243054/ A lot has happened since SQL-92 SQL has evolved beyond the relational idea If you use SQL for CRUD operations only, https://modern-sql.com you are doing it wrong @ModernSQL by @MarkusWinand https://www.flickr.com/photos/mfoubister/25367243054/