MonetDB/SQL
Niels Nes

Outline
● SQL Features
● MonetDB
● Relational Mapping
● ACID
● Index
● Optimizers
● Query Cache
● Future extensions

Projects
● SkyServer
● GeoInfoNed

SQL Features
● SQL'99 -> SQL'03
  – Constraints (NULL, KEYS)
  – UNION, EXCEPT, INTERSECT
  – JOIN including outer
  – Correlated subqueries
  – Sequence numbers (auto_increment, serial)
  – boolean, char, int, float, decimal, blob, temporal
  – Recursive Transactions

SQL Features
● Recent Additions
  – Create table .. as (like) with (out) data
  – With .. as (inlined views)
  – Triggers
  – Persistent Stored Modules
  – Window functions (over partition by order by)
● Missing Features
  – Cursors
  – Collation (only UTF-8)
  – Domain
  – Cascading (drops, deletes)
  – UDT (no OO)
  – Collection types (array)
  – Multi sets
  – XML
● How difficult is it to add them?

PSM
● If then else end if;
● While do end while;
● Case when then else end case
● Declare
● Return
● Select into var
● Table functions
● Is SQL a programming language?

PSM example (1)

    Create Function my_complex_declare(v int) returns int
    begin
        declare x, y, z int;
        declare a int, b, c varchar(10);
        set x = v;
        return x;
    end;

    select my_complex_declare(1);

PSM example (2)

    CREATE FUNCTION fWedgeV3(x1 float, y1 float, z1 float,
                             x2 float, y2 float, z2 float)
    RETURNS TABLE (x float, y float, z float)
    RETURN TABLE(SELECT (y1*z2 - y2*z1) as x,
                        (x2*z1 - x1*z2) as y,
                        (x1*y2 - x2*y1) as z);

    select * from fWedgeV3(cast (1.0 as float), ....
                           cast (1.0 as float), cast (1.0 as float)) fla;

External Modules

    CREATE module url EXTERNAL NAME url
    CREATE TYPE url EXTERNAL NAME url;
    CREATE function getAnchor( theUrl url ) RETURNS STRING EXTERNAL NAME 'getAnchor';
    END MODULE;

Implementation Overview
● Relational mapping
● Index
● System overview
● Multi backend system
● Relational version
● Query cache

MonetDB
● Binary Tables
● Binary Algebra
● Full materialization
● Main Memory System
● Cache optimized
● Multi frontend system (SQL, XQuery)
● Interpreted language (MAL)

Relational Mapping
● N columns
● N stable bats
● N insert bats (i)
● N update bats (u)
● 1 deleted bat (d)
● Doesn't it lead to many small objects to be managed?

Example column

    C := bind(column, 0);
    I := bind(column, 1);
    U := bind(column, 2);
    D := bind(table);

    X := kunion(C, I);
    X := kdiff(X, U);
    X := kunion(X, U);
    X := kdiff(X, D);

ACID
● Atomic
● Concurrent
● Isolation (by copies of (i,u,d))
● Durability (read log after crash)

Optimistic CC
● Multi Version time stamping
  – Keep read, write and transaction start time stamps
  – 3 phases
    ● Query
    ● Validate
    ● Apply changes or abort
  – Pos: No locking, no deadlock detection
  – Neg: Timestamps dictate order, currently table granularity

Index
● Hash key (for primary key constraints)
  – Multi attribute
  – Use hash shift xor
  – Only used for updates
● Join index (for foreign key constraints)
  – Is an n-1 mapping
  – Stored on foreign key side (not on unique side)
  – Used by optimizer

System Overview
● SQL scanner-parser
● Syntax tree
● Algebra
● Optimizer
● Code Generation
● MAL optimizers
● Execution

Multiple backends
● M4 produces MIL strings
● M5 produces MAL blocks (no MAL parsing)
● MonetDB/X100 possible
● M5 has backend optimizers
  – Crackers (tables are reorganized based on queries)
  – Dead code elimination
  – Empty set (sometimes requires recompile)
  – Garbage collection

TPC-H Q4

    select o_orderpriority, count(*) as order_count
    from orders
    where o_orderdate >= date '1993-07-01'
      and o_orderdate < date '1993-07-01' + interval '3' month
      and exists (
            select * from lineitem
            where l_orderkey = o_orderkey
              and l_commitdate < l_receiptdate)
    group by o_orderpriority
    order by o_orderpriority;

MIL

    s17 := mvc_bind(myc,"sys","orders","o_orderdate");
    s18 := date("1993-07-01");
    s20 := date("1993-07-01");
    s19 := addmonths(s20,3);
    s16 := s17.uselect(s18, s19, TRUE, FALSE);
    s32 := mvc_bind(myc, "sys","orders","o_orderkey");
    s11 := semijoin(s12, s16);
    s10 := mark(s11, 0@0);
    s9 := s10.reverse();

MAL

    _1:bat[:void,:date] := sql.bind("sys","orders","o_orderdate");
    _8 := calc.addmonths(1993-07-01,3);
    _9 := algebra.uselect(_1,19930701,_8,true,false);
    _24:bat[:void,:int] := sql.bind("sys","orders","o_orderkey");
    _34 := algebra.semijoin(_24,_9);
    _36 := algebra.markT(_34,0@0);
    _37 := bat.reverse(_36);

Why such a low level language?

Operations
● Select D from R where x=1 and y=2
  – T1 := uselect(Rx,1);
  – T2 := uselect(Ry,2)
  – Pivot := semijoin(T2,T1)
● Join a=b,c=d (A,B)
  – J := join(hash(a,c),hash(b,d).reverse())
  – Am := reverse(j.mark());
  – Bm := j.reverse().mark().reverse();
  – R := [=](Am.join(a),Bm.join(c)).uselect(true)
  – Am := Am.semijoin(r), Bm := Bm.semijoin(r);

Relational Overview
● SQL scanner-parser
● Syntax tree
● R-Algebra (select only)
● R-Optimizer
● b-algebra/b-optimizer
● Backend Code Generation
● Optional Backend optimizers
● Execution

TPC-H Q1

    select l_returnflag, l_linestatus,
           sum(l_quantity) as sum_qty,
           count(*) as count_order
    from lineitem
    where l_shipdate <= date '1998-12-01' - interval '90' day (3)
    group by l_returnflag, l_linestatus
    order by l_returnflag, l_linestatus;

Relational Q1 (X100)

    order by (
      project (
        group by (
          select (
            table(sys.lineitem) as lineitem
          ) [ convert(lineitem.l_shipdate) <= ('1998-12-01' - 7776000) ]
        ) [ lineitem.l_returnflag, lineitem.l_linestatus ]
          [ 'sum' unique [ lineitem.l_quantity ] as L1, 'count' as L2 ]
      ) [ lineitem.l_returnflag, lineitem.l_linestatus, L1 as sum_qty, L2 as count_order ]
    ) [ lineitem.l_returnflag, 1, lineitem.l_linestatus, 1 ]

Optimizers
● Sequence of optimizers
● Rewriter: MAL in, MAL out
● Sequential MAL block (mb)
  – 2 arrays (stmt, var)
  – (v0,..vn) := m.f(vn+1,..,vm)
● Recognize module and function
● Unused stmts, vars, stmt/var dependencies
● Side effects, barriers

SQL Optimizer
● Select push down through joins etc.
● Join order rewrites based on select count
● Combine selects (overlapping ranges)
● Common expression
● Push through delta columns (union, diff)
● Use join index
● Why not a cost-based optimizer?

MAL optimizer
● Statistics
● Cost Model
● Coercions
● Aliases
● Common terms
● Accumulators
● Dead code
● Garbage collector

Query Cache
● Goal
  – Reduced compilation time
● How (tricks in parser)
  – Filter out atomic values
  – Calculate hash over syntax tree
● Cached object
  – Hash key, syntax tree
  – Backend code
● On second call we replace arguments

SkyServer
✔ Large schema
✔ Tables with > 400 columns
➢ Large number of functions (lots of typing)
➢ HTM (external C++?)
✗ 2T
  – Distribution/replication
  – ColumnBM
  – Multi bat columns

Multi bat columns
● Column N bats
● Where 1-bat into n-bat translation
● How many i, u and d bats
● How to handle the joins efficiently
● Optimize Aggregation
● Why not ColumnBM/X100?

GeoInfoNed
● Spatial data types (point, line, polyline)
● Spatial index (rtree?)
● SQL/XML reporting (xmlelement, xmlattributes..)
● SQL/XML xml data type (xquery)

Crackers/Armada
● Crackers and Updates
● Armada extensions
  – Catalog name cat.schema.table.column
  – Constraints with option (split functions)
● Replication/Distribution
  – Access remote catalogs
  – Remote execution

Optimization
● Rewriters (not plan generators)
● Too many optimizers
● Relational optimizer
● Cost Model
● Push join through c,i,u,d bats
● Too many scans
  – Use hash index
  – Crackers

Future
● Larger databases (SkyServer 2T)
● Extensions (spatial/xml reporting)
● Distribution/Replication
● More optimizers
● Wishes?
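
The "Example column" slide rebuilds the visible state of a column from its stable, insert, update and deleted bats. A minimal Python sketch of that sequence follows, assuming each bat can be modelled as a dict from row id to value and that kunion and kdiff operate on keys only. The helper `current_column` and the sample data are hypothetical and are not MonetDB code.

```python
def kunion(left, right):
    """Key union: keep every entry of left, add right entries whose key is new."""
    out = dict(left)
    for key, value in right.items():
        out.setdefault(key, value)
    return out


def kdiff(left, right):
    """Key difference: drop entries of left whose key also occurs in right."""
    return {key: value for key, value in left.items() if key not in right}


def current_column(stable, inserts, updates, deleted):
    """Follow the four steps of the slide (hypothetical helper name)."""
    x = kunion(stable, inserts)   # X := kunion(C, I)
    x = kdiff(x, updates)         # X := kdiff(X, U), removes outdated values
    x = kunion(x, updates)        # X := kunion(X, U), adds the new values
    x = kdiff(x, deleted)         # X := kdiff(X, D), hides deleted rows
    return x


# Assumed sample data: row id -> value; the deleted bat only needs its keys.
stable = {0: "a", 1: "b", 2: "c"}
inserts = {3: "d"}
updates = {1: "B"}
deleted = {2: None}

visible = current_column(stable, inserts, updates, deleted)
```

In this model row 1 shows its updated value, row 2 is hidden, and row 3 appears from the insert bat, so a transaction only needs private copies of the small i, u and d bats.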

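The "Query Cache" slide describes filtering atomic values out of a query, hashing what remains, and replacing the arguments on a later call. The Python sketch below illustrates the idea under strong assumptions: a regular expression stands in for the parser, the hash is taken over the query text rather than a syntax tree, and the cached "plan" is a placeholder string. `normalize` and `QueryCache` are hypothetical names.

```python
import hashlib
import re

# Assumption: a regex stands in for the parser; it matches quoted strings
# and numeric literals, which play the role of the "atomic values".
_LITERAL = re.compile(r"'[^']*'|\b\d+(?:\.\d+)?\b")


def normalize(query):
    """Split a query into a cache key, a literal-free template and its arguments."""
    args = _LITERAL.findall(query)
    template = _LITERAL.sub("?", query)
    key = hashlib.sha1(template.encode("utf-8")).hexdigest()
    return key, template, args


class QueryCache:
    """Maps the hash of a literal-free query to a compiled plan (placeholder)."""

    def __init__(self):
        self.plans = {}
        self.compilations = 0

    def lookup(self, query):
        key, template, args = normalize(query)
        if key not in self.plans:
            # Stand-in for the expensive compile step.
            self.compilations += 1
            self.plans[key] = "PLAN<" + template + ">"
        # On a hit only the arguments differ from the earlier call.
        return self.plans[key], args


cache = QueryCache()
plan1, args1 = cache.lookup("select * from orders where o_orderkey = 1")
plan2, args2 = cache.lookup("select * from orders where o_orderkey = 42")
```

Both calls share one cached plan, and only the extracted argument lists differ, which is where the reduction in compilation time would come from.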