Still using Windows 3.1?

So why stick with SQL-92?

@ModernSQL - https://modern-sql.com/ @MarkusWinand

SQL:1999 LATERAL LATERAL Before SQL:1999

Select-list sub-queries must be scalar[0]: (an atomic quantity that can hold only one value at a time[1])

SELECT … , (SELECT column_1 FROM t1 WHERE t1.x = t2.y ) AS c FROM t2 …

[0] Neglecting row values and other workarounds here; [1] https://en.wikipedia.org/wiki/Scalar LATERAL Before SQL:1999

Select-list sub-queries must be scalar[0]: (an atomic quantity that can hold only one value at a time[1])

SELECT … , (SELECT column_1 , column_2 FROM t1 ✗ WHERE t1.x = t2.y ) AS c More than FROM t2 one ? … ⇒Syntax error

[0] Neglecting row values and other workarounds here; [1] https://en.wikipedia.org/wiki/Scalar LATERAL Before SQL:1999

Select-list sub-queries must be scalar[0]: (an atomic quantity that can hold only one value at a time[1])

SELECT … More than , (SELECT column_1 , column_2 one row? ⇒Runtime error! FROM t1 ✗ WHERE t1.x = t2.y } ) AS c More than FROM t2 one column? … ⇒Syntax error

[0] Neglecting row values and other workarounds here; [1] https://en.wikipedia.org/wiki/Scalar LATERAL Since SQL:1999

Lateral derived queries can see names defined before:

SELECT * FROM t1 CROSS JOIN LATERAL (SELECT * FROM t2 WHERE t2.x = t1.x ) derived_table ON (true) LATERAL Since SQL:1999

Lateral derived queries can see table names defined before:

SELECT * FROM t1 Valid due to CROSS JOIN LATERAL (SELECT * LATERAL FROM t2 keyword WHERE t2.x = t1.x ) derived_table ON (true) LATERAL Since SQL:1999

Lateral derived queries can see table names defined before:

SELECT * FROM t1 Valid due to CROSS JOIN LATERAL (SELECT * LATERAL FROM t2 keyword WHERE t2.x = t1.x Useless, but ) derived_table still required ON (true) LATERAL Since SQL:1999

Lateral derived queries can see table names defined before:

SELECT * FROM t1 Valid due to CROSS JOIN LATERAL (SELECT * LATERAL keyword Use FROM t2 CROSS JOIN to WHERE t2.x = t1.x omit the ON clause ) derived_table ON (true) But WHY? LATERAL Use-Cases LATERAL Use-Cases

‣ Top-N per group

inside a lateral derived table FETCH FIRST (or LIMIT, TOP) applies per row left tables. LATERAL Use-Cases

‣ Top-N per group FROM t

inside a lateral derived table CROSS JOIN LATERAL ( SELECT … (or , ) FETCH FIRST LIMIT TOP FROM … applies per row from left tables. WHERE t.c=… … LIMIT 10 ) derived_table LATERAL Use-Cases

‣ Top-N per group FROM t

inside a lateral derived table CROSS JOIN LATERAL ( SELECT … (or , ) FETCH FIRST LIMIT TOP FROM … applies per row from left tables. WHERE t.c=… ORDER BY … Add proper index LIMIT 10 for Top-N query ) derived_table

https://use-the-index-luke.com/sql/partial-results/top-n-queries LATERAL Use-Cases

‣ Top-N per group FROM t

inside a lateral derived table CROSS JOIN LATERAL ( SELECT … (or , ) FETCH FIRST LIMIT TOP FROM … applies per row from left tables. WHERE t.c=… ORDER BY … ‣ Also useful to find most recent LIMIT 10 news from several subscribed ) derived_table topics (“multi-source top-N”). LATERAL Use-Cases

‣ Top-N per group FROM t

inside a lateral derived table CROSS JOIN LATERAL ( SELECT … (or , ) FETCH FIRST LIMIT TOP FROM … applies per row from left tables. WHERE t.c=… ORDER BY … ‣ Also useful to find most recent LIMIT 10 news from several subscribed ) derived_table topics (“multi-source top-N”).

‣ Table function arguments FROM t (TABLE often implies LATERAL) JOIN TABLE (your_func(t.c)) LATERAL Availability

1999 2001 2003 2005 2007 2009 2011 2013 2015 2017 5.1 MariaDB 8.0.14 MySQL 9.3 PostgreSQL SQLite 9.1 DB2 LUW 11gR1[0] 12cR1 Oracle 2005[1] SQL Server [0]Undocumented. Requires setting trace event 22829. [1]LATERAL is not supported as of SQL Server 2016 but [CROSS|OUTER] APPLY can be used for the same effect. WITH RECURSIVE (Common Table Expressions) WITH RECURSIVE CREATE TABLE t ( id INTEGER, parent INTEGER) WITH RECURSIVE CREATE TABLE t ( id INTEGER, parent INTEGER) WITH RECURSIVE CREATE TABLE t ( id INTEGER, parent INTEGER) WITH RECURSIVE Since SQL:1999

WITH RECURSIVE Since SQL:1999

SELECT t.id, t.parent FROM t WHERE t.id = ?

WITH RECURSIVE Since SQL:1999

SELECT t.id, t.parent FROM t WHERE t.id = ? UNION ALL SELECT t.id, t.parent FROM t WHERE t.parent = ? WITH RECURSIVE Since SQL:1999

SELECT t.id, t.parent FROM t WHERE t.id = ? UNION ALL SELECT t.id, t.parent FROM t WHERE t.parent = ? WITH RECURSIVE Since SQL:1999

SELECT t.id, t.parent FROM t WHERE t.id = ? UNION ALL SELECT t.id, t.parent FROM t WHERE t.parent = ? WITH RECURSIVE Since SQL:1999

WITH RECURSIVE prev (id, parent) AS ( SELECT t.id, t.parent FROM t WHERE t.id = ? UNION ALL SELECT t.id, t.parent FROM t WHERE t.parent = ? ) SELECT * FROM prev WITH RECURSIVE Since SQL:1999

WITH RECURSIVE prev (id, parent) AS ( SELECT t.id, t.parent FROM t WHERE t.id = ? UNION ALL

SELECT t.id, t.parent FROM t JOIN prev ON t.parent = prev.id ) SELECT * FROM prev WITH RECURSIVE Since SQL:1999

WITH RECURSIVE prev (id, parent) AS ( SELECT t.id, t.parent FROM t WHERE t.id = ? UNION ALL

SELECT t.id, t.parent FROM t JOIN prev ON t.parent = prev.id ) SELECT * FROM prev WITH RECURSIVE Since SQL:1999

WITH RECURSIVE prev (id, parent) AS ( SELECT t.id, t.parent FROM t WHERE t.id = ? UNION ALL

SELECT t.id, t.parent FROM t JOIN prev ON t.parent = prev.id ) SELECT * FROM prev WITH RECURSIVE Use Cases WITH RECURSIVE Use Cases

‣ Processing graphs “[…] for certain classes of graphs, solutions utilizing technology […] can offer Shortest route from person A performance superior to that of the dedicated graph to B in LinkedIn/Facebook/… .” event.cwi.nl/grades2013/07-welc.pdf WITH RECURSIVE Use Cases

‣ Processing graphs “[…] for certain classes of graphs, solutions utilizing relational database technology […] can offer Shortest route from person A performance superior to that of the dedicated graph to B in LinkedIn/Facebook/… databases.” event.cwi.nl/grades2013/07-welc.pdf

https://wiki.postgresql.org/wiki/Loose_indexscan ‣ Finding distinct values

in n*log(N)† time complexity

† n … # distinct values, N … # of table rows. Suitable index required WITH RECURSIVE Use Cases

‣ Processing graphs “[…] for certain classes of graphs, solutions utilizing relational database technology […] can offer Shortest route from person A performance superior to that of the dedicated graph to B in LinkedIn/Facebook/… databases.” event.cwi.nl/grades2013/07-welc.pdf

https://wiki.postgresql.org/wiki/Loose_indexscan ‣ Finding distinct values

† WITH RECURSIVE gen (n) AS ( in n*log(N) time complexity SELECT 1 UNION ALL SELECT n+1 ‣ Row generators FROM gen

WHERE n < ? To fill gaps (e.g., in time ) series), generate test data SELECT * FROM gen

† n … # distinct values, N … # of table rows. Suitable index required WITH RECURSIVE Availability

1999 2001 2003 2005 2007 2009 2011 2013 2015 2017 5.1 10.2 MariaDB 8.0 MySQL 8.4 PostgreSQL 3.8.3[0] SQLite 7.0 DB2 LUW 11gR2 Oracle 2005 SQL Server [0]Only for top-level SELECT statements GROUPING SETS GROUPING SETS Before SQL:1999

Only one GROUP BY operation at a time: GROUPING SETS Before SQL:1999

Only one GROUP BY operation at a time: Monthly revenue SELECT year , month , sum(revenue) FROM tbl GROUP BY year, month GROUPING SETS Before SQL:1999

Only one GROUP BY operation at a time: Monthly revenue Yearly revenue SELECT year SELECT year , month , sum(revenue) , sum(revenue) FROM tbl FROM tbl GROUP BY year, month GROUP BY year GROUPING SETS Before SQL:1999

SELECT year , month , sum(revenue) FROM tbl GROUP BY year, month

SELECT year

, sum(revenue) FROM tbl GROUP BY year GROUPING SETS Before SQL:1999

SELECT year , month , sum(revenue) FROM tbl GROUP BY year, month UNION ALL SELECT year , , sum(revenue) FROM tbl GROUP BY year GROUPING SETS Since SQL:1999

SELECT year SELECT year , month , month , sum(revenue) , sum(revenue) FROM tbl FROM tbl GROUP BY year, month GROUP BY UNION ALL GROUPING SETS ( SELECT year (year, month) , null , (year) , sum(revenue) ) FROM tbl GROUP BY year GROUPING SETS Availability

1999 2001 2003 2005 2007 2009 2011 2013 2015 2017 5.1[0] MariaDB 5.0[1] MySQL 9.5 PostgreSQL SQLite 5 DB2 LUW 9iR1 Oracle 2008 SQL Server [0]Only ROLLUP (properitery syntax). [1]Only ROLLUP (properitery syntax). GROUPING function since MySQL 8.0. SQL:2003 OVER and BY OVER (PARTITION BY) The Problem

Two distinct concepts could not be used independently: OVER (PARTITION BY) The Problem

Two distinct concepts could not be used independently: ‣ Merge rows with the same key properties ‣ GROUP BY to specify key properties ‣ DISTINCT to use full row as key OVER (PARTITION BY) The Problem

Two distinct concepts could not be used independently: ‣ Merge rows with the same key properties ‣ GROUP BY to specify key properties ‣ DISTINCT to use full row as key ‣ Aggregate data from related rows ‣ Requires GROUP BY to segregate the rows ‣ COUNT, SUM, AVG, MIN, MAX to aggregate grouped rows OVER (PARTITION BY) The Problem OVER (PARTITION BY) The Problem

SELECT c1 , c2 FROM t OVER (PARTITION BY) The Problem

SELECT c1

No , c2 ⇢

FROM t Merge rows Merge

⇠ Yes Yes OVER (PARTITION BY) The Problem

SELECT c1

No , c2 ⇢

FROM t

SELECT DISTINCT Merge rows Merge c1 ⇠ , c2

Yes Yes FROM t OVER (PARTITION BY) The Problem

No ⇠ Aggregate ⇢ Yes

SELECT c1

No , c2 ⇢

FROM t

SELECT DISTINCT Merge rows Merge c1 ⇠ , c2

Yes Yes FROM t OVER (PARTITION BY) The Problem

No ⇠ Aggregate ⇢ Yes

SELECT c1

No , c2 ⇢

FROM t

SELECT DISTINCT SELECT c1 Merge rows Merge c1 , SUM(c2) tot ⇠ , c2 FROM t

Yes Yes FROM t GROUP BY c1 OVER (PARTITION BY) The Problem

No ⇠ Aggregate ⇢ Yes

SELECT c1 SELECT c1

No , c2 , c2 ⇢

FROM t FROM t

SELECT DISTINCT SELECT c1 Merge rows Merge c1 , SUM(c2) tot ⇠ , c2 FROM t

Yes Yes FROM t GROUP BY c1 OVER (PARTITION BY) The Problem

No ⇠ Aggregate ⇢ Yes

SELECT c1 SELECT c1

No , c2 , c2 ⇢

FROM t FROM t JOIN ( ) ta ON (t.c1=ta.c1)

SELECT DISTINCT SELECT c1 Merge rows Merge c1 , SUM(c2) tot ⇠ , c2 FROM t

Yes Yes FROM t GROUP BY c1 OVER (PARTITION BY) The Problem

No ⇠ Aggregate ⇢ Yes

SELECT c1 SELECT c1

No , c2 , c2 ⇢

FROM t FROM t SELECT c1 , SUM(c2) tot FROM t JOIN ( ) ta GROUP BY c1 ON (t.c1=ta.c1)

SELECT DISTINCT SELECT c1 Merge rows Merge c1 , SUM(c2) tot ⇠ , c2 FROM t

Yes Yes FROM t GROUP BY c1 OVER (PARTITION BY) The Problem

No ⇠ Aggregate ⇢ Yes

SELECT c1 SELECT c1

No , c2 , c2 , tot ⇢

FROM t FROM t SELECT c1 , SUM(c2) tot FROM t JOIN ( ) ta GROUP BY c1 ON (t.c1=ta.c1)

SELECT DISTINCT SELECT c1 Merge rows Merge c1 , SUM(c2) tot ⇠ , c2 FROM t

Yes Yes FROM t GROUP BY c1 OVER (PARTITION BY) The Problem

No ⇠ Aggregate ⇢ Yes

SELECT c1 SELECT c1

No , c2 , c2 , tot ⇢

FROM t FROM t SELECT c1 , SUM(c2) tot FROM t JOIN ( ) ta GROUP BY c1 ON (t.c1=ta.c1)

SELECT DISTINCT SELECT c1 Merge rows Merge c1 , SUM(c2) tot ⇠ , c2 FROM t

Yes Yes FROM t GROUP BY c1 OVER (PARTITION BY) Since SQL:2003

No ⇠ Aggregate ⇢ Yes

SELECT c1 SELECT c1

No , c2 , c2 ⇢

FROM t FROM tFROM t

SELECT DISTINCT SELECT c1 Merge rows Merge c1 , SUM(c2) tot ⇠ , c2 FROM t

Yes Yes FROM t GROUP BY c1 OVER (PARTITION BY) Since SQL:2003

No ⇠ Aggregate ⇢ Yes

SELECT c1 SELECT c1

No , c2 , c2 ⇢

FROM t FROM t, SUM(c2) OVER (PARTITION BY c1) FROM t

SELECT DISTINCT SELECT c1 Merge rows Merge c1 , SUM(c2) tot ⇠ , c2 FROM t

Yes Yes FROM t GROUP BY c1 OVER (PARTITION BY) How it works

dep salary SELECT dep, 1 1000 6000 salary, 22 1000 6000 SUM(salary) 22 1000 6000 OVER() FROM emp 333 1000 6000 333 1000 6000 333 1000 6000 OVER (PARTITION BY) How it works

dep salary SELECT dep, Look here 1 1000 6000 salary, 22 1000 6000 SUM(salary) 22 1000 6000 OVER() FROM emp 333 1000 6000 333 1000 6000 333 1000 6000 OVER (PARTITION BY) How it works

dep salary SELECT dep, 1 1000 6000 salary, 22 1000 6000 SUM(salary) 22 1000 6000 OVER() FROM emp 333 1000 6000 333 1000 6000 333 1000 6000 OVER (PARTITION BY) How it works

dep salary SELECT dep, 1 1000 6000 salary, 22 1000 6000 SUM(salary) 22 1000 6000 OVER() FROM emp 333 1000 6000 333 1000 6000 333 1000 6000 OVER (PARTITION BY) How it works

dep salary SELECT dep, 1 1000 6000 salary, 22 1000 6000 SUM(salary) 22 1000 6000 OVER() FROM emp 333 1000 6000 333 1000 6000 333 1000 6000 OVER (PARTITION BY) How it works

dep salary SELECT dep, 1 1000 6000 salary, 22 1000 6000 SUM(salary) 22 1000 6000 OVER() FROM emp 333 1000 6000 333 1000 6000 333 1000 6000 OVER (PARTITION BY) How it works

dep salary SELECT dep, 1 1000 6000 salary, 22 1000 6000 SUM(salary) 22 1000 6000 OVER() ) FROM emp 333 1000 6000 333 1000 6000 333 1000 6000 OVER (PARTITION BY) How it works

dep salary SELECT dep, 1 1000 1000 salary, 22 1000 2000 SUM(salary) 22 1000 2000 OVER() ) FROM emp 333 1000 3000 333 1000 3000 333 1000 3000 OVER (PARTITION BY) How it works

dep salary SELECT dep, 1 1000 1000 salary, 22 1000 2000 SUM(salary) 22 1000 2000 OVER() PARTITION BY dep) FROM emp 333 1000 3000 333 1000 3000 333 1000 3000 OVER (PARTITION BY) How it works

dep salary SELECT dep, 1 1000 1000 salary, 22 1000 2000 SUM(salary) 22 1000 2000 OVER() PARTITION BY dep) FROM emp 333 1000 3000 333 1000 3000 333 1000 3000 OVER (PARTITION BY) How it works

dep salary SELECT dep, 1 1000 1000 salary, 22 1000 2000 SUM(salary) 22 1000 2000 OVER() PARTITION BY dep) FROM emp 333 1000 3000 333 1000 3000 333 1000 3000 OVER and ORDER BY (Framing & Ranking) OVER (ORDER BY) The Problem

acnt id value balance SELECT id, value, 1 1 +10 +10 FROM transactions t 22 2 +20 +30 22 3 -10 +20 333 4 +50 +70 333 5 -30 +40 333 6 -20 +20 OVER (ORDER BY) The Problem

acnt id value balance SELECT id, value, 1 1 +10 +10 FROM transactions t 22 2 +20 +30 22 3 -10 +20 333 4 +50 +70 333 5 -30 +40 333 6 -20 +20 OVER (ORDER BY) The Problem

acnt id value balance SELECT id, value, 1 1 +10 +10 FROM transactions t 22 2 +20 +30 22 3 -10 +20 333 4 +50 +70 333 5 -30 +40 333 6 -20 +20 OVER (ORDER BY) The Problem

acnt id value balance SELECT id, value, 1 1 +10 +10 (SELECT SUM(value) 22 2 +20 +30 FROM transactions t2 22 3 -10 +20 WHERE t2.id <= t.id) FROM transactions t 333 4 +50 +70 333 5 -30 +40 333 6 -20 +20 OVER (ORDER BY) The Problem

acnt id value balance SELECT id, value, 1 1 +10 +10 (SELECT SUM(value) 22 2 +20 +30 FROM transactions t2 22 3 -10 +20 WHERE t2.id <= t.id) FROM transactions t 333 4 +50 +70 333 5 -30 +40 Range segregation (<=) not possible with 333 6 -20 +20 GROUP BY or PARTITION BY OVER (ORDER BY) Since SQL:2003

acnt id value balance SELECT id, value, 1 1 +10 +10 (SELECT SUM(value) 22 2 +20 +30 FROM transactions t2 22 3 -10 +20 WHERE t2.id <= t.id) FROM transactions t 333 4 +50 +70 333 5 -30 +40 333 6 -20 +20 OVER (ORDER BY) Since SQL:2003

acnt id value balance SELECT id, value, 1 1 +10 +10 22 2 +20 +30 22 3 -10 +20 333 4 +50 +70 333 5 -30 +40 333 6 -20 +20 FROM transactions t OVER (ORDER BY) Since SQL:2003

acnt id value balance SELECT id, value, 1 1 +10 +10 SUM(value) 22 2 +20 +30 OVER ( 22 3 -10 +20 333 4 +50 +70 333 5 -30 +40 333 6 -20 +20 ) FROM transactions t OVER (ORDER BY) Since SQL:2003

acnt id value balance SELECT id, value, 1 1 +10 +10 SUM(value) 22 2 +20 +30 OVER ( 22 3 -10 +20 ORDER BY id 333 4 +50 +70 333 5 -30 +40 333 6 -20 +20 ) FROM transactions t OVER (ORDER BY) Since SQL:2003

acnt id value balance SELECT id, value, 1 1 +10 +10 SUM(value) 22 2 +20 +30 OVER ( 22 3 -10 +20 ORDER BY id ROWS BETWEEN 333 4 +50 +70 UNBOUNDED PRECEDING 333 5 -30 +40 333 6 -20 +20 ) FROM transactions t OVER (ORDER BY) Since SQL:2003

acnt id value balance SELECT id, value, 1 1 +10 +10 SUM(value) 22 2 +20 +30 OVER ( 22 3 -10 +20 ORDER BY id ROWS BETWEEN 333 4 +50 +70 UNBOUNDED PRECEDING 333 5 -30 +40 AND CURRENT ROW 333 6 -20 +20 ) FROM transactions t OVER (ORDER BY) Since SQL:2003

acnt id value balance SELECT id, value, 1 1 +10 +10 SUM(value) 22 2 +20 +30 OVER ( 22 3 -10 +20 ORDER BY id ROWS BETWEEN 333 4 +50 +70 UNBOUNDED PRECEDING 333 5 -30 +40 AND CURRENT ROW 333 6 -20 +20 ) FROM transactions t OVER (ORDER BY) Since SQL:2003

acnt id value balance SELECT id, value, 1 1 +10 +10 SUM(value) 22 2 +20 +30 OVER ( 22 3 -10 +20 ORDER BY id ROWS BETWEEN 333 4 +50 +70 UNBOUNDED PRECEDING 333 5 -30 +40 AND CURRENT ROW 333 6 -20 +20 ) FROM transactions t OVER (ORDER BY) Since SQL:2003

acnt id value balance SELECT id, value, 1 1 +10 +10 SUM(value) 22 2 +20 +30 OVER ( 22 3 -10 +20 ORDER BY id ROWS BETWEEN 333 4 +50 +70 UNBOUNDED PRECEDING 333 5 -30 +40 AND CURRENT ROW 333 6 -20 +20 ) FROM transactions t OVER (ORDER BY) Since SQL:2003

acnt id value balance SELECT id, value, 1 1 +10 +10 SUM(value) 22 2 +20 +30 OVER ( 22 3 -10 +20 ORDER BY id ROWS BETWEEN 333 4 +50 +70 UNBOUNDED PRECEDING 333 5 -30 +40 AND CURRENT ROW 333 6 -20 +20 ) FROM transactions t OVER (ORDER BY) Since SQL:2003

acnt id value balance SELECT id, value, 1 1 +10 +10 SUM(value) 22 2 +20 +30 OVER ( 22 3 -10 +20 ORDER BY id ROWS BETWEEN 333 4 +50 +70 UNBOUNDED PRECEDING 333 5 -30 +40 AND CURRENT ROW 333 6 -20 +20 ) FROM transactions t OVER (ORDER BY) Since SQL:2003

acnt id value balance SELECT id, value, 1 1 +10 +10 SUM(value) 22 2 +20 +30 OVER ( 22 3 -10 +20 ORDER BY id ROWS BETWEEN 333 4 +50 +70 UNBOUNDED PRECEDING 333 5 -30 +40 AND CURRENT ROW 333 6 -20 +20 ) FROM transactions t OVER (ORDER BY) Since SQL:2003

acnt id value balance SELECT id, value, 1 1 +10 +10 SUM(value) 22 2 +20 +30 OVER ( 22 3 -10 +20 ORDER BY id ROWS BETWEEN 333 4 +50 +70 UNBOUNDED PRECEDING 333 5 -30 +40 AND CURRENT ROW 333 6 -20 +20 ) FROM transactions t OVER (ORDER BY) Since SQL:2003

acnt id value balance SELECT id, value, 1 1 +10 SUM(value) 22 2 +20 OVER ( 22 3 -10 ORDER BY id ROWS BETWEEN 333 4 +50 UNBOUNDED PRECEDING 333 5 -30 AND CURRENT ROW 333 6 -20 ) FROM transactions t OVER (ORDER BY) Since SQL:2003

acnt id value balance SELECT id, value, 1 1 +10 +10 SUM(value) 22 2 +20 +20 OVER ( PARTITION BY acnt 22 3 -10 +10 ORDER BY id ROWS BETWEEN 333 4 +50 +50 UNBOUNDED PRECEDING 333 5 -30 +20 AND CURRENT ROW

333 6 -20 . 0 ) FROM transactions t OVER (ORDER BY) Since SQL:2003

With OVER (ORDER BY n) a new type of functions make sense:

n ROW_NUMBER RANK DENSE_RANK PERCENT_RANK CUME_DIST A 1 1 1 0 0.25 B 2 2 2 0.33… 0.75 B 3 2 2 0.33… 0.75 X 4 4 3 1 1 OVER (SQL:2003) Use Cases

‣ Aggregates without GROUP BY OVER (SQL:2003) Use Cases

‣ Aggregates without GROUP BY

‣ Running totals, AVG(…) OVER(ORDER BY … ROWS BETWEEN 3 PRECEDING moving averages AND 3 FOLLOWING) moving_avg OVER (SQL:2003) Use Cases

‣ Aggregates without GROUP BY

‣ Running totals, AVG(…) OVER(ORDER BY … ROWS BETWEEN 3 PRECEDING moving averages AND 3 FOLLOWING) moving_avg ‣ Ranking OVER (SQL:2003) Use Cases

‣ Aggregates without GROUP BY

‣ Running totals, AVG(…) OVER(ORDER BY … ROWS BETWEEN 3 PRECEDING moving averages AND 3 FOLLOWING) moving_avg

‣ Ranking SELECT * FROM (SELECT ROW_NUMBER() ‣ Top-N per Group OVER(PARTITION BY … ORDER BY …) rn , t.* FROM t) numbered_t WHERE rn <= 3 OVER (SQL:2003) Use Cases

‣ Aggregates without GROUP BY

‣ Running totals, AVG(…) OVER(ORDER BY … ROWS BETWEEN 3 PRECEDING moving averages AND 3 FOLLOWING) moving_avg

‣ Ranking SELECT * FROM (SELECT ROW_NUMBER() ‣ Top-N per Group OVER(PARTITION BY … ORDER BY …) rn , t.* Avoiding self-joins FROM t) numbered_t ‣ WHERE rn <= 3 OVER (SQL:2003) Use Cases

‣ Aggregates without GROUP BY

‣ Running totals, AVG(…) OVER(ORDER BY … ROWS BETWEEN 3 PRECEDING moving averages AND 3 FOLLOWING) moving_avg

‣ Ranking SELECT * FROM (SELECT ROW_NUMBER() ‣ Top-N per Group OVER(PARTITION BY … ORDER BY …) rn , t.* Avoiding self-joins FROM t) numbered_t ‣ WHERE rn <= 3 [… many more …] OVER (SQL:2003) Availability

1999 2001 2003 2005 2007 2009 2011 2013 2015 2017 5.1 10.2 MariaDB 8.0 MySQL 8.4 PostgreSQL 3.25.0 SQLite 7.0 DB2 LUW 8i Oracle 2005[0] 2012 SQL Server [0]No framing Impala OVER (SQL:2003) Availability Hive Spark BigQuery

NuoDB 1999 2001 2003 2005 2007 2009 2011 2013 2015 2017 5.1 10.2 MariaDB 8.0 MySQL 8.4 PostgreSQL 3.25.0 SQLite 7.0 DB2 LUW 8i Oracle 2005[0] 2012 SQL Server [0]No framing Inverse Distribution Functions (percentiles) Inverse Distribution Functions The Problem

Grouped rows cannot be ordered prior to aggregation. (how to get the middle value (median) of a set) Inverse Distribution Functions The Problem

Grouped rows cannot be ordered prior to aggregation. (how to get the middle value (median) of a set) SELECT d1.val FROM data d1 JOIN data d2 ON (d1.val < d2.val OR (d1.val=d2.val AND d1.id

Grouped rows cannot be ordered prior to aggregation. (how to get the middle value (median) of a set) SELECT d1.val Number rows FROM data d1 JOIN data d2 ON (d1.val < d2.val OR (d1.val=d2.val AND d1.id

Grouped rows cannot be ordered prior to aggregation. (how to get the middle value (median) of a set) SELECT d1.val Number rows FROM data d1 JOIN data d2 ON (d1.val < d2.val OR (d1.val=d2.val AND d1.id

Grouped rows cannot be ordered prior to aggregation. (how to get the middle value (median) of a set) SELECT d1.val Number rows FROM data d1 JOIN data d2 ON (d1.val < d2.val OR (d1.val=d2.val AND d1.id

Grouped rows cannot be ordered prior to aggregation. (how to get the middle value (median) of a set) SELECT d1.val Number rows FROM data d1 JOIN data d2 ON (d1.val < d2.val OR (d1.val=d2.val AND d1.id

SELECT PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY val) FROM data Inverse Distribution Functions Since SQL:2003

Median SELECT PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY val) FROM data Inverse Distribution Functions Since SQL:2003

Median SELECT PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY val) FROM data Which value? Inverse Distribution Functions Since SQL:2003

SELECT PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY val) FROM data Inverse Distribution Functions Since SQL:2003

SELECT PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY val) FROM data

Two variants: ‣ for discrete values (categories) ‣ for continuous values (linear interpolation) Inverse Distribution Functions Since SQL:2003

SELECT PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY val) FROM data

Two variants: 1

‣ for discrete values 2 (categories) 3 ‣ for continuous values (linear interpolation) 4 Inverse Distribution Functions Since SQL:2003

SELECT PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY val) FROM data 0 0.25 0.5 0.75 1 Two variants: 1

‣ for discrete values 2 (categories) 3 ‣ for continuous values (linear interpolation) 4 Inverse Distribution Functions Since SQL:2003

SELECT PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY val) FROM data 0 0.25 0.5 0.75 1 Two variants: 1 PERCENTILE_DISC

‣ for discrete values 2 (categories) 3 ‣ for continuous values (linear interpolation) 4 Inverse Distribution Functions Since SQL:2003

SELECT PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY val) FROM data 0 0.25 0.5 0.75 1 Two variants: 1 PERCENTILE_DISC

‣ for discrete values 2 (categories) 3 ‣ for continuous values (linear interpolation) 4 Inverse Distribution Functions Since SQL:2003

SELECT PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY val) FROM data 0 0.25 0.5 0.75 1 Two variants: 1 PERCENTILE_DISCPERCENTILE_DISC(0.5)

‣ for discrete values 2 (categories) 3 ‣ for continuous values (linear interpolation) 4 Inverse Distribution Functions Since SQL:2003

SELECT PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY val) FROM data 0 0.25 0.5 0.75 1 Two variants: 1 PERCENTILE_DISCPERCENTILE_DISC(0.5)

‣ for discrete values 2 PERCENTILE_CONT (categories) 3 ‣ for continuous values (linear interpolation) 4 Inverse Distribution Functions Since SQL:2003

SELECT PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY val) FROM data 0 0.25 0.5 0.75 1 Two variants: 1 PERCENTILE_DISCPERCENTILE_DISC(0.5)

‣ for discrete values 2 PERCENTILE_CONTPERCENTILE_CONT(0.5) (categories) 3 ‣ for continuous values (linear interpolation) 4 Inverse Distribution Functions Availability

1999 2001 2003 2005 2007 2009 2011 2013 2015 2017 5.1 10.3[0] MariaDB MySQL 9.4 PostgreSQL SQLite 11.1 DB2 LUW 9iR1 Oracle 2012[0] SQL Server [0]Only as (requires OVER clause) TABLESAMPLE TABLESAMPLE Since SQL:2003

SELECT * FROM tbl TABLESAMPLE system(10) REPEATABLE (0)

User provided seed value TABLESAMPLE Since SQL:2003

Or:

bernoulli(10) better sampling, but slow. SELECT * FROM tbl TABLESAMPLE system(10) REPEATABLE (0)

User provided seed value TABLESAMPLE Since SQL:2003

Or:

bernoulli(10) better sampling, but slow. SELECT * FROM tbl TABLESAMPLE system(10) REPEATABLE (0)

User provided seed value TABLESAMPLE Availability

1999 2001 2003 2005 2007 2009 2011 2013 2015 2017 5.1 MariaDB MySQL 9.5[0] PostgreSQL SQLite 8.2[0] DB2 LUW 8i[0] Oracle 2005[0] SQL Server [0]Not for derived tables SQL:2008 MERGE MERGE The Problem

Copying rows from another table is easy: INSERT INTO SELECT … FROM WHERE NOT EXISTS (SELECT * FROM WHERE … ) MERGE The Problem

Copying rows from another table is easy: INSERT INTO SELECT … FROM WHERE NOT EXISTS (SELECT * FROM WHERE … ) Both, and are in scope here. MERGE The Problem

Deleting rows that exist in another table is also possible:

DELETE FROM WHERE EXISTS (SELECT * FROM WHERE … ) MERGE The Problem

Deleting rows that exist in another table is also possible:

DELETE FROM WHERE EXISTS (SELECT * FROM WHERE … ) Both, and are in scope here. MERGE The Problem

Updating rows from another table is awkward:

UPDATE SET … WHERE … MERGE The Problem

Updating rows from another table is awkward: Requires a name UPDATE (table or updatable ) SET … but not a subquery WHERE … MERGE The Problem

Updating rows from another table is awkward:

UPDATE SET (col1, col2) = (SELECT col1, col2 SET … FROM WHERE … WHERE … ) WHERE … Both, and are in scope here. MERGE Since SQL:2008

SQL:2008 introduced to improve this situation:

MERGE INTO USING ON WHEN MATCHED [AND ] THEN [UPDATE…|DELETE…] … WHEN NOT MATCHED [AND ] THEN INSERT…

‣It has always two tables in scope, the source can be a derived table (subquery). ‣It can do , , or in one go. MERGE Since SQL:2008

SQL:2008 introduced merge to improve this situation:

MERGE INTO USING ON WHEN MATCHED [AND ] THEN [UPDATE…|DELETE…] WHEN NOT MATCHED [AND ] THEN INSERT…

‣It has always two tables in scope, the source can be a derived table (subquery). ‣It can do insert, update, or delete in one go. MERGE Since SQL:2008

SQL:2008 introduced merge to improve this situation:

MERGE INTO USING ON WHEN MATCHED [AND ] THEN [UPDATE…|DELETE…] WHEN NOT MATCHED [AND ] THEN INSERT…

‣It has always two tables in scope, the source can be a derived table (subquery). ‣It can do insert, update, or delete in one go. MERGE Since SQL:2008

1999 2001 2003 2005 2007 2009 2011 2013 2015 2017 5.1 MariaDB MySQL PostgreSQL SQLite 9.1 DB2 LUW 9iR1[0] Oracle 2008 SQL Server [0]No AND condition. FETCH FIRST FETCH FIRST The Problem

Limit the result to a number of rows. (LIMIT, TOP and ROWNUM are all proprietary) FETCH FIRST The Problem

Limit the result to a number of rows. (LIMIT, TOP and ROWNUM are all proprietary)

SELECT * FROM (SELECT * , ROW_NUMBER() OVER(ORDER BY x) rn FROM data) numbered_data WHERE rn <=10 FETCH FIRST The Problem

Limit the result to a number of rows. (LIMIT, TOP and ROWNUM are all proprietary)

SELECT * FROM (SELECT * , ROW_NUMBER() OVER(ORDER BY x) rn FROM data) numbered_data WHERE rn <=10 SQL:2003 introduced ROW_NUMBER() to number rows. But this still requires wrapping to limit the result. And how about databases not supporting ROW_NUMBER()? FETCH FIRST The Problem

Limit the result to a number of rows. (LIMIT, TOP and ROWNUM are all proprietary)

SELECT * Dammit! FROM (SELECT * Let's take , ROW_NUMBER() OVER(ORDER BY x) rn LIMIT FROM data) numbered_data WHERE rn <=10 SQL:2003 introduced ROW_NUMBER() to number rows. But this still requires wrapping to limit the result. And how about databases not supporting ROW_NUMBER()? FETCH FIRST Since SQL:2008

SQL:2008 introduced the FETCH FIRST … ROWS ONLY clause:

SELECT * FROM data ORDER BY x FETCH FIRST 10 ROWS ONLY FETCH FIRST Availability

1999 2001 2003 2005 2007 2009 2011 2013 2015 2017 5.1 MariaDB 3.19.3[0] MySQL 6.5[1] 8.4 PostgreSQL 2.1.0[1] SQLite 7.0 DB2 LUW 12cR1 Oracle 7.0[2] 2012 SQL Server [0]Earliest mention of LIMIT. Probably inherited from mSQL [1]Functionality available using LIMIT [2]SELECT TOP n ... SQL Server 2000 also supports expressions and bind parameters SQL:2011 OFFSET OFFSET The Problem

How to fetch the rows after a limit? (pagination anybody?)

SELECT * FROM (SELECT * , ROW_NUMBER() OVER(ORDER BY x) rn FROM data) numbered_data WHERE rn > 10 and rn <= 20 OFFSET Since SQL:2011

SQL:2011 introduced OFFSET, unfortunately!

SELECT * FROM data ORDER BY x OFFSET 10 ROWS FETCH NEXT 10 ROWS ONLY OFFSET Since SQL:2011

SQL:2011 introduced OFFSET, unfortunately!

SELECT * FROM data OFFSET ORDER BY x OFFSET 10 ROWS FETCH NEXT 10 ROWS ONLY OFFSET Since SQL:2011

SQL:2011 introduced OFFSET, unfortunately!

SELECT * FROM data OFFSET ORDER BY x OFFSET 10 ROWS FETCH NEXT 10 ROWS ONLY Grab coasters & stickers! https://use-the-index-luke.com/no-offset OFFSET Since SQL:2011

1999 2001 2003 2005 2007 2009 2011 2013 2015 2017 5.1 MariaDB 3.20.3[0] 4.0.6[1] MySQL 6.5 PostgreSQL 2.1.0 SQLite 9.7[2] 11.1 DB2 LUW 12cR1 Oracle 2012 SQL Server [0]LIMIT [offset,] limit: "With this it's easy to do a poor man's next page/previous page WWW application." [1]The release notes say "Added PostgreSQL compatible LIMIT syntax" [2]Requires enabling the MySQL compatibility vector: db2set DB2_COMPATIBILITY_VECTOR=MYS OVER OVER (SQL:2011) The Problem Direct access of other rows of the same window is ugly. (E.g., calculate the difference to the previous rows)

balance … 50 … SELECT balance 90 … , balance - COALESCE (MAX(balance) 70 … OVER(ORDER BY … 30 … ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) , 0) diff FROM t OVER (SQL:2011) The Problem Direct access of other rows of the same window is ugly. (E.g., calculate the difference to the previous rows)

balance … diff 50 … +50 SELECT balance 90 … +40 , balance - COALESCE (MAX(balance) 70 … -20 OVER(ORDER BY … 30 … -40 ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) , 0) diff FROM t OVER (SQL:2011) The Problem Direct access of other rows of the same window is ugly. (E.g., calculate the difference to the previous rows)

balance … diff 50 … +50 SELECT balance 90 … +40 , balance - COALESCE (MAX(balance) 70 … -20 OVER(ORDER BY … 30 … -40 ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) , 0) diff FROM t OVER (SQL:2011) The Problem Direct access of other rows of the same window is ugly. (E.g., calculate the difference to the previous rows)

balance … diff 50 … +50 SELECT balance 90 … +40 , balance - COALESCE (MAX(balance) 70 … -20 OVER(ORDER BY … 30 … -40 ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) , 0) diff FROM t OVER (SQL:2011) The Problem Direct access of other rows of the same window is ugly. (E.g., calculate the difference to the previous rows)

balance … diff 50 … +50 SELECT balance 90 … +40 , balance - COALESCE (MAX(balance) 70 … -20 OVER(ORDER BY … 30 … -40 ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) , 0) diff FROM t OVER (SQL:2011) The Problem Direct access of other rows of the same window is ugly. (E.g., calculate the difference to the previous rows)

balance … diff 50 … +50 SELECT balance 90 … +40 , balance - COALESCE (MAX(balance) 70 … -20 OVER(ORDER BY … 30 … -40 ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) , 0) diff FROM t OVER (SQL:2011) Since SQL:2011

SQL:2011 introduced LEAD, LAG, NTH_VALUE, … for that: OVER (SQL:2011) Since SQL:2011

SQL:2011 introduced LEAD, LAG, NTH_VALUE, … for that: SELECT *, balance - COALESCE( LAG(balance) OVER(ORDER BY x) , 0) FROM t OVER (SQL:2011) Since SQL:2011

SQL:2011 introduced LEAD, LAG, NTH_VALUE, … for that: SELECT *, balance - COALESCE( LAG(balance) OVER(ORDER BY x) , 0) FROM t Available functions: LEAD / LAG FIRST_VALUE / LAST_VALUE NTH_VALUE(col, n) FROM FIRST/LAST RESPECT/IGNORE NULLS OVER (LEAD, LAG, …) Since SQL:2011

1999 2001 2003 2005 2007 2009 2011 2013 2015 2017 5.1 10.2[0] MariaDB 8.0[0] MySQL 8.4[0] PostgreSQL 3.25.0[0] SQLite 9.5[1] 11.1 DB2 LUW 8i[1] 11gR2 Oracle 2012[1] SQL Server [0]No IGNORE NULLS and FROM LAST [1]No NTH_VALUE System Versioning (Time Traveling) System Versioning The Problem INSERT UPDATE DELETE are DESTRUCTIVE System Versioning Since SQL:2011

Table can be system versioned, application versioned or both. System Versioning Since SQL:2011

Table can be system versioned, application versioned or both.

CREATE TABLE t (..., System Versioning Since SQL:2011

Table can be system versioned, application versioned or both.

CREATE TABLE t (..., start_ts TIMESTAMP(9) GENERATED ALWAYS AS ROW START, System Versioning Since SQL:2011

Table can be system versioned, application versioned or both.

CREATE TABLE t (..., start_ts TIMESTAMP(9) GENERATED ALWAYS AS ROW START, end_ts TIMESTAMP(9) GENERATED ALWAYS AS ROW END, System Versioning Since SQL:2011

Table can be system versioned, application versioned or both.

CREATE TABLE t (..., start_ts TIMESTAMP(9) GENERATED ALWAYS AS ROW START, end_ts TIMESTAMP(9) GENERATED ALWAYS AS ROW END,

PERIOD FOR SYSTEM_TIME (start_ts, end_ts) System Versioning Since SQL:2011

Table can be system versioned, application versioned or both.

CREATE TABLE t (..., start_ts TIMESTAMP(9) GENERATED ALWAYS AS ROW START, end_ts TIMESTAMP(9) GENERATED ALWAYS AS ROW END,

PERIOD FOR SYSTEM_TIME (start_ts, end_ts) ) WITH SYSTEM VERSIONING System Versioning Since SQL:2011

INSERT ... (ID, DATA) VALUES (1, 'X')

ID Data start_ts end_ts 1 X 10:00:00

UPDATE ... SET DATA = 'Y' ...

ID Data start_ts end_ts 1 X 10:00:00 11:00:00 1 Y 11:00:00

DELETE ... WHERE ID = 1 System Versioning Since SQL:2011

INSERT ... (ID, DATA) VALUES (1, 'X')

ID Data start_ts end_ts 1 X 10:00:00

UPDATE ... SET DATA = 'Y' ...

ID Data start_ts end_ts 1 X 10:00:00 11:00:00 1 Y 11:00:00

DELETE ... WHERE ID = 1 System Versioning Since SQL:2011

INSERT ... (ID, DATA) VALUES (1, 'X')

ID Data start_ts end_ts 1 X 10:00:00

UPDATE ... SET DATA = 'Y' ...

ID Data start_ts end_ts 1 X 10:00:00 11:00:00 1 Y 11:00:00

DELETE ... WHERE ID = 1 INSERT ... (ID, DATA) VALUES (1, 'X')

ID Data start_ts end_ts System Versioning1 X 10:00:00 Since SQL:2011

UPDATE ... SET DATA = 'Y' ...

ID Data start_ts end_ts 1 X 10:00:00 11:00:00 1 Y 11:00:00

DELETE ... WHERE ID = 1

ID Data start_ts end_ts 1 X 10:00:00 11:00:00 1 Y 11:00:00 12:00:00 System Versioning Since SQL:2011

ID Data start_ts end_ts 1 X 10:00:00 11:00:00 1 Y 11:00:00 12:00:00 Although multiple versions exist, only the “current” one is visible per default.

After 12:00:00, SELECT * FROM t doesn’t return anything anymore. System Versioning Since SQL:2011

ID Data start_ts end_ts 1 X 10:00:00 11:00:00 1 Y 11:00:00 12:00:00 With FOR … AS OF you can query anything you like: SELECT * FROM t FOR SYSTEM_TIME AS OF TIMESTAMP '2019-05-08 10:30:00'

ID Data start_ts end_ts 1 X 10:00:00 11:00:00 System Versioning Since SQL:2011

1999 2001 2003 2005 2007 2009 2011 2013 2015 2017 5.1 10.3 MariaDB MySQL PostgreSQL SQLite 10.1 DB2 LUW 10gR1[0] 11gR1[1] Oracle 2016 SQL Server [0]Short term using Flashback. [1]Flashback Archive. Proprietery syntax. SQL:2016 (released: 2016-12-15) JSON_TABLE JSON Since SQL:2016 JSON Since SQL:2016

[ { "id": 42, "a1": "foo" },

{ "id": 43, "a1": "bar" } ] JSON Since SQL:2016

[ { "id": 42, "a1": "foo" id a1 }, 42 foo { 43 bar "id": 43, "a1": "bar" } ] JSON_TABLE Since SQL:2016

[ SELECT * { "id": 42, FROM JSON_TABLE "a1": "foo" }, ( ? { "id": 43, , '$[*]' "a1": "bar" COLUMNS } ( id INT PATH '$.id' ] , a1 VARCHAR(…) PATH '$.a1' id a1 ) 42 foo ) r 43 bar JSON_TABLE Since SQL:2016

[ SELECT * { "id": 42, FROM JSON_TABLE "a1": "foo" }, ( ? { "id": 43, , '$[*]' Bind "a1": "bar" Parameter COLUMNS } ( id INT PATH '$.id' ] , a1 VARCHAR(…) PATH '$.a1' id a1 ) 42 foo ) r 43 bar JSON_TABLE Since SQL:2016

SQL/JSON Path [ SELECT * { ‣ to "id": 42, FROM JSON_TABLE elements from "a1": "foo" a JSON document }, ( ? { ‣ Defined in the , '$[*]' Bind "id": 43, SQL standard "a1": "bar" Parameter COLUMNS } ( id INT PATH '$.id' ] , a1 VARCHAR(…) PATH '$.a1' id a1 ) 42 foo ) r 43 bar JSON_TABLE Since SQL:2016

SQL/JSON Path [ SELECT * { ‣ Query language to "id": 42, FROM JSON_TABLE select elements from "a1": "foo" a JSON document }, ( ? { ‣ Defined in the , '$[*]' Bind "id": 43, SQL standard "a1": "bar" Parameter COLUMNS } ( id INT PATH '$.id' ] , a1 VARCHAR(…) PATH '$.a1' id a1 ) 42 foo ) r 43 bar JSON_TABLE Since SQL:2016

SQL/JSON Path [ SELECT * { ‣ Query language to "id": 42, FROM JSON_TABLE select elements from "a1": "foo" a JSON document }, ( ? { ‣ Defined in the , '$[*]' Bind "id": 43, SQL standard "a1": "bar" Parameter COLUMNS } ( id INT PATH '$.id' ] , a1 VARCHAR(…) PATH '$.a1' id a1 ) 42 foo ) r 43 bar JSON_TABLE Since SQL:2016

SELECT * FROM JSON_TABLE ( ? , '$[*]' COLUMNS ( id INT PATH '$.id' , a1 VARCHAR(…) PATH '$.a1' ) ) r JSON_TABLE Since SQL:2016

INSERT INTO target_table SELECT * FROM JSON_TABLE ( ? , '$[*]' COLUMNS ( id INT PATH '$.id' , a1 VARCHAR(…) PATH '$.a1' ) ) r JSON_TABLE Availability

1999 2001 2003 2005 2007 2009 2011 2013 2015 2017 MariaDB 8.0 MySQL PostgreSQL SQLite DB2 LUW 12cR1 Oracle SQL Server MATCH_RECOGNIZE (Row Pattern Matching) Row Pattern Matching The Problem

Example: Logfile Row Pattern Matching The Problem

Example: Logfile

30 minutes

Time Row Pattern Matching The Problem

Example: Logfile

30 minutes

Time Row Pattern Matching The Problem

Example: Logfile

30 minutes

Time Row Pattern Matching The Problem

Example: Logfile

30 minutes Session 3

Session 1 Session 2 Time Session 4 Row Pattern Matching The Problem

30 minutes

Time Row Pattern Matching The Problem

30 minutes

Time

SELECT count(*) sessions, avg(duration) avg_duration FROM (SELECT MAX(ts) - MIN(ts) duration FROM (SELECT ts, SUM(grp_start) OVER(ORDER BY ts) session_no FROM (SELECT ts, CASE WHEN ts >= LAG( ts, 1, DATE'1900-01-01' ) OVER( ORDER BY ts ) + INTERVAL '30' minute THEN 1 ELSE 0 END grp_start FROM log ) tagged ) numbered GROUP BY session_no ) grouped Row Pattern Matching The Problem

30 minutes

Time

SELECT count(*) sessions, avg(duration) avg_duration FROM (SELECT MAX(ts) - MIN(ts) duration FROM (SELECT ts, SUM(grp_start) OVER(ORDER BY ts) session_no FROM (SELECT ts, CASE WHEN ts >= LAG( ts, 1, DATE'1900-01-01' ) OVER( ORDER BY ts ) + INTERVAL '30' minute THEN 1 ELSE 0 END grp_start Start-of-group FROM log ) tagged tags ) numbered GROUP BY session_no ) grouped Row Pattern Matching The Problem

30 minutes

Time

SELECT count(*) sessions, avg(duration) avg_duration FROM (SELECT MAX(ts) - MIN(ts) duration FROM (SELECT ts, SUM(grp_start) OVER(ORDER BY ts) session_no FROM (SELECT ts, CASE WHEN ts >= LAG( ts, 1, DATE'1900-01-01' ) OVER( ORDER BY ts ) + INTERVAL '30' minute THEN 1 ELSE 0 END grp_start Start-of-group FROM log ) tagged tags ) numbered GROUP BY session_no ) grouped Row Pattern Matching The Problem

30 minutes

Time number SELECT count(*) sessions, avg(duration) avg_duration sessions FROM (SELECT MAX(ts) - MIN(ts) duration FROM (SELECT ts, SUM(grp_start) OVER(ORDER BY ts) session_no FROM (SELECT ts, CASE WHEN ts >= LAG( ts, 1, DATE'1900-01-01' ) OVER( ORDER BY ts ) + INTERVAL '30' minute THEN 1 ELSE 0 END grp_start FROM log ) tagged ) numbered GROUP BY session_no ) grouped Row Pattern Matching The Problem

30 minutes 22 222 2 3 33 3 444 4 1 Time number SELECT count(*) sessions, avg(duration) avg_duration sessions FROM (SELECT MAX(ts) - MIN(ts) duration FROM (SELECT ts, SUM(grp_start) OVER(ORDER BY ts) session_no FROM (SELECT ts, CASE WHEN ts >= LAG( ts, 1, DATE'1900-01-01' ) OVER( ORDER BY ts ) + INTERVAL '30' minute THEN 1 ELSE 0 END grp_start FROM log ) tagged ) numbered GROUP BY session_no ) grouped Row Pattern Matching The Problem

30 minutes 22 222 2 3 33 3 444 4 1 Time

SELECT count(*) sessions, avg(duration) avg_duration FROM (SELECT MAX(ts) - MIN(ts) duration FROM (SELECT ts, SUM(grp_start) OVER(ORDER BY ts) session_no FROM (SELECT ts, CASE WHEN ts >= LAG( ts, 1, DATE'1900-01-01' ) OVER( ORDER BY ts ) + INTERVAL '30' minute THEN 1 ELSE 0 END grp_start FROM log ) tagged ) numbered GROUP BY session_no ) grouped Row Pattern Matching Regular Expressions

.\S* Row Pattern Matching Regular Expressions

Regular Expression .\S* Row Pattern Matching Regular Expressions

.\S* {

any character non-white space

Rail track diagram by regexper.com Row Pattern Matching Regular Expressions

.\S* { {

any character non-white space

Rail track diagram by regexper.com Row Pattern Matching Regular Expressions

.\S*{ { {

any character non-white space

Rail track diagram by regexper.com Row Pattern Matching

MATCH_RECOGNIZE applies regular expressions on rows. Row Pattern Matching

MATCH_RECOGNIZE applies regular expressions on rows.

‣Define the pattern variables (“characters”):

DEFINE deleted AS status = 99

You can also look at other rows for that:

DEFINE continuation AS ts < PREV(ts) + INTERVAL '30' MINUTE Row Pattern Matching

MATCH_RECOGNIZE applies regular expressions on rows.

‣Define the pattern variables (“characters”):

DEFINE deleted AS status = 99

You can also look at other rows for that:

DEFINE continuation AS ts < PREV(ts) + INTERVAL '30' MINUTE Row Pattern Matching

MATCH_RECOGNIZE applies regular expressions on rows.

‣Define the pattern variables (“characters”):

DEFINE deleted AS status = 99

You can also look at other rows for that:

DEFINE continuation AS ts < PREV(ts) + INTERVAL '30' MINUTE

‣Use this alphabet in an regular expression:

PATTERN (deleted continuation*) Row Pattern Matching Since SQL:2016

30 minutes

Time Row Pattern Matching Since SQL:2016

30 minutes

Time

SELECT COUNT(*) sessions , AVG(duration) avg_duration FROM log MATCH_RECOGNIZE( ORDER BY ts MEASURES LAST(ts) - FIRST(ts) AS duration ONE ROW PER MATCH PATTERN ( any cont* ) DEFINE cont AS ts < PREV(ts) + INTERVAL '30' minute ) t

Oracle doesn’t support avg on intervals — query doesn’t work as shown Row Pattern Matching Since SQL:2016

30 minutes

Time

SELECT COUNT(*) sessions , AVG(duration) avg_duration FROM log MATCH_RECOGNIZE( ORDER BY ts MEASURES LAST(ts) - FIRST(ts) AS duration ONE ROW PER MATCH PATTERN ( any cont* ) DEFINE cont AS ts < PREV(ts) define + INTERVAL '30' minute continued ) t

Oracle doesn’t support avg on intervals — query doesn’t work as shown Row Pattern Matching Since SQL:2016

30 minutes

Time

SELECT COUNT(*) sessions , AVG(duration) avg_duration FROM log MATCH_RECOGNIZE( ORDER BY ts MEASURES LAST(ts) - FIRST(ts) AS duration ONE ROW PER MATCH PATTERN ( any cont* ) DEFINE cont AS ts < PREV(ts) undefined + INTERVAL '30' minute pattern variable: ) t matches any row

Oracle doesn’t support avg on intervals — query doesn’t work as shown Row Pattern Matching Since SQL:2016

30 minutes

Time

SELECT COUNT(*) sessions , AVG(duration) avg_duration FROM log MATCH_RECOGNIZE( ORDER BY ts MEASURES any number LAST(ts) - FIRST(ts) AS duration ONE ROW PER MATCH of “cont” PATTERN ( any cont* ) rows DEFINE cont AS ts < PREV(ts) + INTERVAL '30' minute ) t

Oracle doesn’t support avg on intervals — query doesn’t work as shown Row Pattern Matching Since SQL:2016

30 minutes

Time

SELECT COUNT(*) sessions , AVG(duration) avg_duration FROM log MATCH_RECOGNIZE( ORDER BY ts Very much MEASURES like GROUP LAST(ts) - FIRST(ts) AS duration BY ONE ROW PER MATCH PATTERN ( any cont* ) DEFINE cont AS ts < PREV(ts) + INTERVAL '30' minute ) t

Oracle doesn’t support avg on intervals — query doesn’t work as shown Row Pattern Matching Since SQL:2016

30 minutes

Time

SELECT COUNT(*) sessions , AVG(duration) avg_duration FROM log MATCH_RECOGNIZE( Very much ORDER BY ts like SELECT MEASURES LAST(ts) - FIRST(ts) AS duration ONE ROW PER MATCH PATTERN ( any cont* ) DEFINE cont AS ts < PREV(ts) + INTERVAL '30' minute ) t

Oracle doesn’t support avg on intervals — query doesn’t work as shown Row Pattern Matching Since SQL:2016

30 minutes

Time

SELECT COUNT(*) sessions , AVG(duration) avg_duration FROM log MATCH_RECOGNIZE( ORDER BY ts MEASURES LAST(ts) - FIRST(ts) AS duration ONE ROW PER MATCH PATTERN ( any cont* ) DEFINE cont AS ts < PREV(ts) + INTERVAL '30' minute ) t

Oracle doesn’t support avg on intervals — query doesn’t work as shown Row Pattern Matching Availability

1999 2001 2003 2005 2007 2009 2011 2013 2015 2017 MariaDB MySQL PostgreSQL SQLite DB2 LUW 12cR1 Oracle SQL Server https://www.flickr.com/photos/mfoubister/25367243054/ A lot has happened since SQL-92

https://www.flickr.com/photos/mfoubister/25367243054/ A lot has happened since SQL-92

SQL has evolved beyond the relational idea

https://www.flickr.com/photos/mfoubister/25367243054/ A lot has happened since SQL-92

SQL has evolved beyond the relational idea

If you use SQL for CRUD operations only, you are doing it wrong

https://www.flickr.com/photos/mfoubister/25367243054/ A lot has happened since SQL-92

SQL has evolved beyond the relational idea

If you use SQL for CRUD operations only, https://modern-sql.com you are doing it wrong @ModernSQL by @MarkusWinand

https://www.flickr.com/photos/mfoubister/25367243054/