MonetDB: the Challenges of a Scientific

Milena Ivanova, Niels Nes, Romulo Goncalves, Martin Kersten CWI, Amsterdam

SkyServer Schema

446 columns >585 million rows

6 columns > 20 Billion rows

M. Ivanova et al., CWI

Outline

• MonetDB/SQL
• SkyServer porting lessons
• Query log lessons
• Recycling
• Evaluation
• Outlook

MonetDB Background

PhotoObjAll

Ra      Dec     U         ...
0.0645  1.2079  14.70872  …
0.1433  1.0662  11.71277  …
0.2811  1.2495  12.02889  …
…       …       …

Ra BAT           Dec BAT          U BAT
H    T           H    T           H    T
0@0  0.0645      0@0  1.2079      0@0  14.70872
1@0  0.1433      1@0  1.0662      1@0  11.71277
2@0  0.2811      2@0  1.2495      2@0  12.02889
…                …                …
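The decomposition of PhotoObjAll into one BAT per column can be illustrated with a small sketch. It is only a toy model under stated assumptions: a BAT is represented as a Python list of (head, tail) pairs with dense integer heads, and the helper names `decompose` and `reconstruct` are hypothetical, not part of MonetDB.

```python
# Toy model of vertical decomposition into BATs (binary association tables).
# Assumption: heads are dense row identifiers 0, 1, 2, ... as on the slide.

def decompose(rows, column_names):
    """Split a list of row tuples into one (head, tail) list per column."""
    bats = {name: [] for name in column_names}
    for oid, row in enumerate(rows):
        for name, value in zip(column_names, row):
            bats[name].append((oid, value))
    return bats


def reconstruct(bats, column_names, oid):
    """Rebuild one row by looking up the same head in every column BAT."""
    # Dense heads allow positional access: entry number oid has head oid.
    return tuple(bats[name][oid][1] for name in column_names)


columns = ["Ra", "Dec", "U"]
photoobjall = [
    (0.0645, 1.2079, 14.70872),
    (0.1433, 1.0662, 11.71277),
    (0.2811, 1.2495, 12.02889),
]
bats = decompose(photoobjall, columns)
print(bats["Dec"])
print(reconstruct(bats, columns, 1))
```

A query that touches only Ra and Dec then reads two of these lists and never the remaining columns of the wide table.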

MonetDB Architecture

select count(*) from photoobjall;

Diagram labels: SQL and XQuery front ends, MAL, Tactical Optimizer, MAL, MonetDB Kernel

function user.s3_1():void;
    X1:bat[:oid,:lng] := sql.bind("sys","photoobjall","objid",0);
    X6:bat[:oid,:lng] := sql.bind("sys","photoobjall","objid",1);
    X9:bat[:oid,:lng] := sql.bind("sys","photoobjall","objid",2);
    X13:bat[:oid,:oid] := sql.bind_dbat("sys","photoobjall",1);
    X8 := algebra.kunion(X1,X6);
    X11 := algebra.kdifference(X8,X9);
    X12 := algebra.kunion(X11,X9);
    X14 := bat.reverse(X13);
    X15 := algebra.kdifference(X12,X14);
    X16 := calc.oid(0@0);
    X18 := algebra.markT(X15,X16);
    X19 := bat.reverse(X18);
    X20 := aggr.count(X19);
    sql.exportValue(1,"sys.","count_","int",32,0,6,X20,"");
end s3_1;

MonetDB Server

SkyServer with MonetDB

Goal: To provide a SkyServer mirror with similar functionality using MonetDB

Three phases: 1%, 10%, entire SDSS data set

Can we
• Do better in terms of performance and functionality?
• Improve query processing by novel parallelism and query cracking techniques?

Portability Lessons

• Need for rich SQL environment (PSM)
• Cast to SQL:2003 standard
  – Replacement of data types and operations
  – Specific extensions ignored or replaced
• Avoid data redundancy
  – Auxiliary tables replaced by views: 10% size reduction

Spatial Search Lesson

• HTM (Hierarchical Triangular Mesh)
  – Implemented in C++, C#
  – Good for point-near-point and point-in-region queries
• Zones
  – Implemented in SQL
  – Good for point-near-point (x3)
  – Efficient for batch-oriented spatial join (x32)
  – Enables SQL optimizer usage

Query Log Lessons

• Query logs important for both application and science
• Analysed 1.2M queries, August 2006
• Spatial access prevails (83%)
• Small core of photo and spectro tables accessed
  – 64% photo, 44% spectro, 27% both

Common Patterns

• Limited number of query patterns
  – Correlation to web site interface
• Most popular query (25%)

SELECT top 10 p.objID, p.run, p.rerun, p.camcol, p.field, p.obj, p.type,
       p.ra, p.dec, p.u, p.g, p.r, p.i, p.z,
       p.Err_u, p.Err_g, p.Err_r, p.Err_i, p.Err_z
FROM fGetNearbyObjEq(195,2.5,3) n, PhotoPrimary p
WHERE n.objID = p.objID;
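The most popular query is a point-near-point search: it asks for objects around ra = 195, dec = 2.5 within a radius of 3, which the sketch below takes to be arcminutes. A zone-based search of the kind named in the spatial search lesson can be outlined roughly as follows. This is a simplified illustration, not the SkyServer or MonetDB implementation: the zone height, the helper names (`zone_id`, `build_zone_index`, `nearby`) and the in-memory index are assumptions, and wrap-around at ra = 0/360 is ignored.

```python
import math
from bisect import bisect_left

ZONE_HEIGHT_DEG = 0.5  # hypothetical zone height; a tuning parameter


def zone_id(dec_deg, zone_height=ZONE_HEIGHT_DEG):
    """Map a declination to the horizontal stripe (zone) that contains it."""
    return int(math.floor((dec_deg + 90.0) / zone_height))


def angular_distance_deg(ra1, dec1, ra2, dec2):
    """Great-circle distance in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def build_zone_index(objects):
    """objects: iterable of (obj_id, ra_deg, dec_deg); returns zone -> rows."""
    index = {}
    for obj_id, ra, dec in objects:
        index.setdefault(zone_id(dec), []).append((ra, dec, obj_id))
    for rows in index.values():
        rows.sort()  # ordered by ra so a range scan can start with a binary search
    return index


def nearby(index, ra, dec, radius_arcmin):
    """Return sorted ids of objects within radius_arcmin of (ra, dec)."""
    radius = radius_arcmin / 60.0
    # Widen the ra window away from the equator; near a pole scan whole zones.
    if abs(dec) + radius < 90.0:
        ra_half_width = radius / math.cos(math.radians(abs(dec) + radius))
    else:
        ra_half_width = 360.0
    hits = []
    for zone in range(zone_id(dec - radius), zone_id(dec + radius) + 1):
        rows = index.get(zone, [])
        start = bisect_left(rows, (ra - ra_half_width,))
        for row_ra, row_dec, obj_id in rows[start:]:
            if row_ra > ra + ra_half_width:
                break
            # Cheap zone and ra filters first, exact distance test last.
            if angular_distance_deg(ra, dec, row_ra, row_dec) <= radius:
                hits.append(obj_id)
    return sorted(hits)


objects = [(1, 195.0, 2.5), (2, 195.02, 2.52), (3, 195.5, 2.5), (4, 195.0, 3.2)]
index = build_zone_index(objects)
print(nearby(index, 195, 2.5, 3))
```

Because the zone and ra filters are plain range predicates, the same idea can be written as a relational join, which is what lets a SQL optimizer handle it.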

Spatial Overlap

• 24% of queries overlap
• Mean sequence length of 9.4, max of 6200
• Overlap and equality patterns for script-based interaction
• Zoom in/zoom out patterns for manual interaction

Evaluation on 100GB

• ‘Color-cut’ for low-z quasars

SELECT g, run, rerun, camcol, field, objID
FROM Galaxy
WHERE ( (g <= 22)
    and (u - g >= -0.27) and (u - g < 0.71)
    and (g - r >= -0.24) and (g - r < 0.35)
    and (r - i >= -0.27) and (r - i < 0.57)
    and (i - z >= -0.35) and (i - z < 0.7) );

• Moving asteroids

SELECT objID, sqrt(power(rowv,2) + power(colv,2)) as velocity
FROM PhotoObj
WHERE power(rowv,2) + power(colv,2) > 50
  and rowv >= 0 and colv >= 0;

Staircase to the sky

• Status Nov 2008
  – 1GB: done
  – 100GB: done
  – 2.7 TB DR6

• Platform
  – Dual quad-core 2.4 GHz, 64 GB, 6 TB RAID 5

• Web site

Moving ahead

• Progress 2009
  – Download DR 7 and installation
  – Development of export/attach functionality
  – Development of partial result recycler

MonetDB Background

• Tuple-at-a-time pipelined execution
  – Materialized views and caches
  – Semi-automatic
• Operator-at-a-time
  – Materialized intermediates
  – Automatic management and low cost

Self-organizing cache of intermediates to speed up query streams

MonetDB Architecture

Diagram labels: SQL and XQuery front ends; MAL; Tactical Optimizer with Recycler Optimizer; MAL; MonetDB Kernel with Run-time Support (Admission & Eviction)

The same MAL plan appears at both MAL layers of the diagram:

function user.s1_2(A0:date, ...):void;
    X5 := sql.bind("sys","lineitem",...);
    X10 := algebra.select(X5,A0);
    X12 := sql.bindIdx("sys","lineitem",...);
    X15 := algebra.join(X10,X12);
    X25 := mtime.addmonths(A1,A2);
    ...

Recycle Pool
MonetDB Server

Instruction Matching

Run time comparison of
• instruction types
• argument values

Y3 := sql.bind("sys","orders","o_orderdate",0);
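The run-time comparison of instruction types and argument values can be sketched as a lookup keyed on the operation and its arguments. The code below is a minimal illustration under assumptions, not the recycler's actual data structure: the pool is a plain dictionary, `Var` and `match_plan` are hypothetical names, and arguments that refer to earlier results are rewritten to pool variable names before the lookup, so an instruction matches only if its inputs matched as well.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """Reference to the result of an earlier instruction."""
    name: str


def match_plan(pool, plan):
    """Match plan instructions against the pool, bottom-up.

    pool: dict mapping (operation, args) -> pool variable name
    plan: list of (target, operation, args) in execution order
    Returns a dict mapping matched plan variables to pool variables.
    """
    alias = {}
    for target, operation, args in plan:
        resolved = []
        for arg in args:
            if isinstance(arg, Var):
                if arg.name not in alias:
                    resolved = None  # an input was not recycled, so lineage differs
                    break
                arg = Var(alias[arg.name])
            resolved.append(arg)
        if resolved is None:
            continue
        hit = pool.get((operation, tuple(resolved)))
        if hit is not None:
            alias[target] = hit  # reuse the stored intermediate
    return alias


# Hypothetical pool content and query plan, loosely following the slides.
pool = {
    ("sql.bind", ("sys", "orders", "o_orderdate", 0)): "X1",
    ("algebra.select", (Var("X1"), 10, 80)): "X3",
}
plan = [
    ("Y3", "sql.bind", ("sys", "orders", "o_orderdate", 0)),
    ("Y4", "algebra.select", (Var("Y3"), 10, 80)),
    ("Y5", "algebra.select", (Var("Y3"), 20, 45)),
]
print(match_plan(pool, plan))
```

In this sketch Y3 and Y4 find exact matches, while Y5 has no identical entry and would have to be computed or served by a subsuming intermediate.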

Exact matching

X1 := sql.bind("sys","orders","o_orderdate",0);
…

Name  Value     Data type          Size
X1    10        :bat[:oid,:date]
T1    “sys”     :str
T2    “orders”  :str
…

Instruction Subsumption

Y3 := algebra.select(X1,20,45);
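Subsumption for range selections can be sketched as follows: a stored selection over the same input can answer a new selection if its range covers the requested one, and among several candidates the smallest intermediate is the cheapest to rescan. The sketch assumes inclusive bounds and uses a hypothetical `PoolSelect` record and `find_subsuming` helper; the ranges and sizes are the ones shown for X3 and X5 on this slide.

```python
from typing import NamedTuple, Optional, Sequence


class PoolSelect(NamedTuple):
    """A stored result of algebra.select(source, low, high)."""
    name: str
    source: str
    low: int
    high: int
    size: int  # number of tuples in the stored intermediate


def find_subsuming(pool: Sequence[PoolSelect], source: str,
                   low: int, high: int) -> Optional[PoolSelect]:
    """Return the smallest stored selection whose range covers [low, high]."""
    candidates = [
        entry for entry in pool
        if entry.source == source and entry.low <= low and high <= entry.high
    ]
    return min(candidates, key=lambda entry: entry.size, default=None)


pool = [
    PoolSelect("X3", "X1", 10, 80, 700),
    PoolSelect("X5", "X1", 20, 60, 350),
]

# Y3 := algebra.select(X1,20,45) can be evaluated over X5 instead of X1.
best = find_subsuming(pool, "X1", 20, 45)
print(best.name if best else "no subsuming intermediate")
```

With the sizes from the slide, the rewritten selection scans 350 tuples instead of the 2000 in X1.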

X3 := algebra.select(X1,10,80);
…
X5 := algebra.select(X1,20,60);

Name  Value  Data type         Size
X1    10     :bat[:oid,:int]   2000
X3    130    :bat[:oid,:int]   700
X5    150    :bat[:oid,:int]   350
…

a Cache with Lineage

Q1 …

Query graph nodes: algebra.join, algebra.select, sql.bind("C1"), sql.bind("C2")

Recycle pool:
X4 := algebra.join(X2,X3)
X3 := sql.bind("C2")
X2 := algebra.select(X1)
X1 := sql.bind("C1")

a Cache with Lineage

Q2 …

Query graph nodes: algebra.join, algebra.join, sql.bind("C3"), algebra.select, sql.bind("C1"), sql.bind("C2")

Recycle pool (entries X1 to X4 are marked in the diagram):
X4 := algebra.join(X2,X3)
X3 := sql.bind("C2")
X2 := algebra.select(X1)
X1 := sql.bind("C1")

Mismatching

Q2 …

Query graph nodes: algebra.join, algebra.join, sql.bind("C3"), algebra.select, sql.bind("C1"), sql.bind("C2")

Query instructions:
Y4 := algebra.join(Y2,Y3)
Y3 := sql.bind("C2")
Y2 := algebra.select(Y1)
Y1 := sql.bind("C1")

Recycle pool:
X4 := algebra.join(X2,X3)
X3 := sql.bind("C2")
X2 := algebra.select(X1)
X1 := sql.bind("C1")

Diagram annotations: Y2 != X2, Y3 != X3

Admission Policies

Decide about storing the results
• KEEPALL
  – all instructions advised by the optimizer
• CREDIT
  – instructions supplied with credits
  – storage ‘paid’ with 1 credit
  – reuse returns credits
  – lack of reuse limits admission and resource claims

Cache Policies
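The CREDIT admission policy and the eviction rules of the cache policies can be sketched together in a few lines. Everything below is an illustration under assumptions rather than the recycler's implementation: the initial credit count, the rule that each reuse returns exactly one credit, and the utility formula used for BENEFIT (cost times one plus the number of reuses) are hypothetical choices, as are the names `CreditAdmission`, `PoolEntry` and `pick_victim`.

```python
from dataclasses import dataclass, field


class CreditAdmission:
    """Each instruction starts with a few credits; storing a result costs one."""

    def __init__(self, initial_credits=2):
        self.initial_credits = initial_credits
        self.credits = {}  # instruction -> remaining credits

    def admit(self, instruction):
        left = self.credits.setdefault(instruction, self.initial_credits)
        if left <= 0:
            return False  # never reused so far: stop caching its results
        self.credits[instruction] = left - 1
        return True

    def reward(self, instruction):
        """Called on reuse; assumption: one credit comes back per reuse."""
        self.credits[instruction] = self.credits.get(
            instruction, self.initial_credits) + 1


@dataclass
class PoolEntry:
    name: str
    cost: float        # time spent computing the intermediate
    last_used: int     # logical clock of computation or last reuse
    reuses: int = 0
    dependents: set = field(default_factory=set)  # pool entries built on it


def pick_victim(entries, policy="LRU"):
    """Choose an eviction victim among entries without dependents."""
    leaves = [entry for entry in entries if not entry.dependents]
    if not leaves:
        return None
    if policy == "LRU":
        return min(leaves, key=lambda entry: entry.last_used)
    if policy == "BENEFIT":
        # Hypothetical utility: expensive and frequently reused entries stay.
        return min(leaves, key=lambda entry: entry.cost * (1 + entry.reuses))
    raise ValueError("unknown policy: " + policy)


entries = [
    PoolEntry("X1", cost=5.0, last_used=1, dependents={"X2"}),
    PoolEntry("X2", cost=10.0, last_used=5),
    PoolEntry("X4", cost=2.0, last_used=3, reuses=9),
]
print(pick_victim(entries, "LRU").name, pick_victim(entries, "BENEFIT").name)
```

Restricting eviction to entries without dependents keeps the lineage in the pool intact: an intermediate is removed only after everything derived from it is gone.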

• Decide about eviction of intermediates
• Filter ‘top’ instructions without dependents
• Pick instructions with smallest utility
  – LRU: time of computation or last reuse
  – BENEFIT: estimated contribution to performance: CPU and I/O costs, recycling
• Triggered by resource limitations (memory or entries)

SkyServer Evaluation

• 100 GB subset of DR4
• 100-query batch from January 2008 log
• 1.5GB intermediates, 99% reuse
• Join intermediates major contributor to savings

Status Aug 2009

• DR 7 fully loaded
  – Loading and integrity checking
  – Queries ran and traces collected
  – 01: real 1m43.142s 5 rows
  – 02: real 0m10.836s 310 rows
  – 03: real 9m55.870s 7805794 rows
  – 04: real 3m46.905s 2088794 rows
  – 05: real 3m54.591s 264954 rows
  – 06: real 7m11.867s 584884 rows
  – 07: real 0m0.501s 1148 rows
  – 08: real 1m43.935s 58599 rows
  – 09: real 26m23.968s 33086 rows
  – 11: real 13m25.067s 11 rows
  – 12: real 0m0.914s 44 rows
  – 13: real 0m0.893s 4 rows
  – 14: real 5m18.018s 207 rows

Query traces

[ 21472701 usec @0 _55[357175411] := algebra.uselect(_53=[585634220],1); ]
[ 49 usec @0 _53 := nil:BAT; ]
[ 38150716 usec @0 _56[357175411] := algebra.semijoin(_45=[585634220],_55=[357175411]); ]
[ 38 usec @0 _45 := nil:BAT; ]
[ 284808 usec @0 _55 := nil:BAT; ]
[ 31309800 usec @0 _57[180377011] := algebra.uselect(_56=[357175411],6:sht); ]
[ 2059575 usec @0 _56 := nil:BAT; ]
[ 32340510 usec @0 _58[43684077] := algebra.semijoin(_36=[132480668],_57=[180377011]); ]
[ 91705 usec @0 _36 := nil:BAT; ]
[ 229621 usec @0 _59[585634220] := batcalc.flt(_26=[585634220]); ]
[ 25123250 usec @0 _60[409973371] := algebra.thetauselect(_59=[585634220],A1=22.2999992,">"); ]
[ 1208014 usec @0 _59 := nil:BAT; ]
[ 30692604 usec @0 _61[121305794] := algebra.semijoin(_60=[409973371],_57=[180377011]); ]
[ 295106 usec @0 _60 := nil:BAT; ]
[ 4440129 usec @0 _57 := nil:BAT; ]
[ 19408689 usec @0 _62[132511910] := algebra.kunion(_58=[43684077],_61=[121305794]); ]
[ 105538 usec @0 _58 := nil:BAT; ]
[ 320537 usec @0 _61 := nil:BAT; ]
[ 49 usec @0 _64[132511910] := algebra.markT(_62=[132511910],0@0); ]
[ 6 usec @0 _62 := nil:BAT; ]
[ 5 usec @0 _65[132511910] := bat.reverse(_64=[132511910]); ]
[ 3

• 2264769425 32 algebra.join
• 1535721272 285 algebra.leftjoin
• 636128509 76 algebra.semijoin
• 306371507 59 algebra.uselect
• 112269101 10 batcalc.*
• 106087174 36 batcalc.-
• 101124840 28 algebra.thetauselect
• 54344621 2 user.getnearbyobjectsmode
• 19421076 391 algebra.kunion
• 2577559 25 batcalc.flt
• 1242179 8 batcalc.int
• 1105366 9 batcalc.+
• 657450 30 bat.append
• 402815 570 sql.bind
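If the three columns of the list above are read as total microseconds, number of calls, and operation name (an assumption; the slide does not label them), such a summary could be derived from trace lines in the format of the query traces slide. The sketch below is one possible way to do it; the regular expression and the `summarize` helper are hypothetical and only cover the line shapes visible in the trace excerpt.

```python
import re
from collections import defaultdict

# Matches lines such as
#   [ 21472701 usec @0 _55[357175411] := algebra.uselect(_53=[585634220],1); ]
# and skips bookkeeping lines such as "[ 49 usec @0 _53 := nil:BAT; ]".
TRACE_LINE = re.compile(
    r"\[\s*(\d+)\s+usec\s+@\d+\s+\S+\s*:=\s*([A-Za-z_]\w*\.[^\s(]+)\("
)


def summarize(trace_text):
    """Return (total_usec, calls, operation) tuples, most expensive first."""
    totals = defaultdict(lambda: [0, 0])
    for usec, operation in TRACE_LINE.findall(trace_text):
        totals[operation][0] += int(usec)
        totals[operation][1] += 1
    rows = [(total, calls, op) for op, (total, calls) in totals.items()]
    return sorted(rows, reverse=True)


sample = """
[ 21472701 usec @0 _55[357175411] := algebra.uselect(_53=[585634220],1); ]
[ 49 usec @0 _53 := nil:BAT; ]
[ 38150716 usec @0 _56[357175411] := algebra.semijoin(_45=[585634220],_55=[357175411]); ]
[ 31309800 usec @0 _57[180377011] := algebra.uselect(_56=[357175411],6:sht); ]
"""
for total, calls, operation in summarize(sample):
    print(total, calls, operation)
```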

Summary

• Database architecture augmented with recycling intermediates
• Self-organizing technique
• Extension to MonetDB transforming materialization overhead into benefit

Future Work

• Refining cache policies
• Opportunities by query class recognition
• Automatic switch to suitable policies
• Automatic database replication
• Distributed processing (Octopus)

Recycling Is Green

An Architecture for Recycling Intermediates
M. Ivanova, M. L. Kersten, N. Nes, R. Goncalves
SIGMOD'09, Providence, RI, 30/06/2009

Inspirations

• Self-organization vs. hard-coded zoning
  – Adaptive segmentation (ICDE’08)
  – Adaptive replication (EDBT’08)
• Results caching and reuse
• Workload-driven optimization
