A Glance over MonetDB
CWI
November 22nd 2011 at Nopsar

[email protected]
[email protected]
Centrum Wiskunde & Informatica

Background
History of MonetDB

• 1979-1992 – first relational kernel
• 1993-1995 – BAT-based kernel
• 1996-2003 – spin-off Data Distilleries
• 2003-2007 – Open Source (v4)
• 2008-20?? – MonetDB v5

MonetDB Principles

• Full vertical fragmentation: always! (BATs)
• RISC approach to
• Optimised for in-memory processing
• Operator-at-a-time bulk processing
• CPU and memory cache optimised

Traditional DBMSs

• Row-based
• Buffer managers, pages, ...
• (Magnetic) disk I/O conscious
• Tuple-at-a-time Volcano-style processing
• Mostly disk (index) bound (for speed)

Research Nature

• MonetDB is different by design
• Educated guesses have proven useful
• Open Architecture (pluggable/extensible)
• Experimental research additions

The Vision

Column-store long before column-stores became known – a pioneer in the field

“We can’t solve problems by using the same kind of thinking we used when we created them.”

Problems Seen

• From OLTP to OLAP, BI, ...
• DBMSs on modern processors: 60–90% idle
• Waiting for data to arrive from memory at the CPU
• Non-utilised caches and CPU features

CPU niceness
CPU usage

Why are we waiting?

• CPU is 60%-90% idle, waiting for memory:
  • L1 data stalls
  • L1 instruction stalls
  • L2 data stalls
  • TLB stalls
  • Branch mispredictions
  • Resource stalls

Memory Wall

Trip to memory = 1000s of instructions!

Memory Hierarchy
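To put a rough number on why trips to memory dominate, the sketch below counts the cache lines touched when one 4-byte attribute is read for every tuple, once from a row layout and once from a column. The 64-byte line size, the function names and the contiguous, line-aligned layout are assumptions made for illustration, not figures from the slides.

```c
#include <stddef.h>

/* Assumed cache-line size in bytes; 64 is common but not universal. */
#define CACHE_LINE 64

/* Cache lines touched when one 4-byte attribute is read from every tuple
 * of a row layout with tuples of `tuple_width` bytes. Simplification:
 * tuples are contiguous and the table starts on a line boundary. */
size_t lines_row_scan(size_t n_tuples, size_t tuple_width)
{
    if (tuple_width >= CACHE_LINE)
        return n_tuples;   /* each attribute lands in a different line */
    /* Narrow tuples: every line of the table holds a wanted attribute. */
    return (n_tuples * tuple_width + CACHE_LINE - 1) / CACHE_LINE;
}

/* Cache lines touched when the same attribute is stored as one
 * contiguous array of 4-byte values. */
size_t lines_column_scan(size_t n_tuples)
{
    return (n_tuples * 4 + CACHE_LINE - 1) / CACHE_LINE;
}
```

Under these assumptions, scanning the attribute for one million 64-byte tuples touches 1,000,000 lines in the row layout against 62,500 in the column layout.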

Processing

Simple hardcoded semantics:

    void batcalc_minus_int(int* res, int* col, int val, int n) {
        /* one tight loop over the whole column */
        for (int i = 0; i < n; i++)
            res[i] = col[i] - val;
    }

CPU: give it “nice” code!
- few dependencies (control, data)
- CPU gets out-of-order execution
- compiler can e.g. generate SIMD
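The difference between interpreting an expression per tuple and running one loop per column can be sketched as follows. The expression-tree interpreter is a hypothetical stand-in for a tuple-at-a-time engine, and all names here are invented for illustration. Both functions compute the same result.

```c
#include <stddef.h>

/* Hypothetical expression node for a tuple-at-a-time interpreter. */
enum op { OP_COL, OP_CONST, OP_MINUS };

struct expr {
    enum op op;
    int value;                 /* used by OP_CONST */
    const struct expr *left;   /* used by OP_MINUS */
    const struct expr *right;  /* used by OP_MINUS */
};

/* Tuple-at-a-time: the tree is walked again for every single tuple. */
static int eval_tuple(const struct expr *e, int col_value)
{
    switch (e->op) {
    case OP_COL:   return col_value;
    case OP_CONST: return e->value;
    case OP_MINUS: return eval_tuple(e->left, col_value)
                        - eval_tuple(e->right, col_value);
    }
    return 0;
}

void minus_interpreted(int *res, const int *col, int val, size_t n)
{
    const struct expr c = { OP_COL, 0, NULL, NULL };
    const struct expr k = { OP_CONST, val, NULL, NULL };
    const struct expr m = { OP_MINUS, 0, &c, &k };
    for (size_t i = 0; i < n; i++)
        res[i] = eval_tuple(&m, col[i]);   /* calls and branches per tuple */
}

/* Column-at-a-time: semantics are fixed, so the loop body is one subtraction. */
void minus_bulk(int *res, const int *col, int val, size_t n)
{
    for (size_t i = 0; i < n; i++)
        res[i] = col[i] - val;
}
```

The bulk version has no calls or data-dependent branches inside the loop, which is the kind of code a compiler can unroll or vectorise.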

One loop for an entire column:
- no per-tuple interpretation
- arrays: no record navigation
- better instruction cache locality

Internals
Software Stack

• GDK (BAT Kernel)
• MonetDB 5
• MAL
• Optimiser stack
• Execution/scheduler
• SQL to MAL translator
• MonetDB daemon

A Row-store

Early 80s: tuple storage structures were simple

OK  John  32  Houston
OK  Mary  31  Houston

Disk Pages

32  John  Houston
31  Mary  Houston

A Column-store

A column orientation is simple and acts like an array

Attributes of a tuple are correlated by offset

Binary Association Tables
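A minimal sketch of the two layouts, and of what "correlated by offset" means in practice: the same index is used in every column to reassemble a tuple. The struct and function names are hypothetical and not taken from MonetDB.

```c
#include <stddef.h>

/* Row-store view: one record per tuple. */
struct person_row {
    const char *name;
    int         age;
    const char *city;
};

/* Column-store view: one array per attribute (hypothetical names). */
struct person_columns {
    const char **name;
    const int   *age;
    const char **city;
    size_t       count;
};

/* Reassemble tuple i: the same offset i is used in every column. */
struct person_row tuple_at(const struct person_columns *t, size_t i)
{
    struct person_row r;
    r.name = t->name[i];
    r.age  = t->age[i];
    r.city = t->city[i];
    return r;
}

/* Scanning one attribute only walks that attribute's array. */
long sum_age(const struct person_columns *t)
{
    long sum = 0;
    for (size_t i = 0; i < t->count; i++)
        sum += t->age[i];
    return sum;
}
```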

Figure labels: row-store, column-store

Data Organisation
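As a rough sketch of why a dense head is convenient: if the head values form a dense sequence, a lookup by head value reduces to plain array indexing into the tail. The struct below is a simplified, hypothetical model rather than GDK's actual BAT descriptor, and it assumes that only the first head value needs to be remembered.

```c
#include <stddef.h>
#include <stdbool.h>

/* Hypothetical, simplified BAT with a dense head. */
struct dense_bat {
    size_t     head_base;  /* head value of the first entry, e.g. 100 */
    const int *tail;       /* fixed-width tail values */
    size_t     count;      /* number of entries */
};

/* Look up the tail value for a head value by position arithmetic.
 * Returns false when the head value is outside the dense range. */
bool bat_find(const struct dense_bat *b, size_t head, int *value)
{
    if (head < b->head_base || head - b->head_base >= b->count)
        return false;
    *value = b->tail[head - b->head_base];
    return true;
}
```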

head  tail
100   10
101   11
102   12
103   14
104   18

• head: dense sequence
• head and tail stored as separate files, memory mapped
• head and tail columns are in fact array-like (fixed width)

Tail Heaps
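A minimal sketch of a variable-width tail: the fixed-width tail column stores offsets into a byte heap, and a small hash table catches repeated strings on a best-effort basis. The heap capacity, table size, hash function and all names are assumptions for illustration and do not reflect GDK's actual heap format.

```c
#include <stddef.h>
#include <string.h>

#define HEAP_CAP  1024            /* heap size in bytes (arbitrary) */
#define SLOTS     64              /* dedup table size (arbitrary) */
#define NO_OFFSET ((size_t)-1)

struct str_heap {
    char   bytes[HEAP_CAP];
    size_t used;
    size_t slot[SLOTS];  /* last offset stored per hash slot, or NO_OFFSET */
};

void heap_init(struct str_heap *h)
{
    h->used = 0;
    for (size_t i = 0; i < SLOTS; i++)
        h->slot[i] = NO_OFFSET;
}

static size_t hash_str(const char *s)
{
    size_t x = 5381;                        /* djb2-style string hash */
    while (*s)
        x = x * 33 + (unsigned char)*s++;
    return x % SLOTS;
}

/* Returns the heap offset to store in the tail column for s,
 * or NO_OFFSET when the heap is full. */
size_t heap_put(struct str_heap *h, const char *s)
{
    size_t k = hash_str(s);
    size_t len = strlen(s) + 1;

    if (h->slot[k] != NO_OFFSET && strcmp(h->bytes + h->slot[k], s) == 0)
        return h->slot[k];                  /* duplicate caught: reuse offset */
    if (len > HEAP_CAP - h->used)
        return NO_OFFSET;
    memcpy(h->bytes + h->used, s, len);
    h->slot[k] = h->used;                   /* may evict an older entry */
    h->used += len;
    return h->slot[k];
}
```

Because a colliding string simply overwrites the slot, a later duplicate of the evicted string is stored again. That is what makes the elimination best effort rather than guaranteed.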

head  tail   heap
100   0x01   John
101   0x04
102   0x08   Mary
103   0x04
104   0x25

Best-effort duplicate elimination

Accelerators

head  tail
100   10
101   11
102   12
103   14
104   18

Column properties:
• key-ness
• non-null
• dense
• ordered

Hash-based access

GDK processing model

• Bulk processing (full materialisation)
• Binary algebra core
  • select, join, semijoin, outerjoin, union, intersection, diff, group, count, max, min, sum, avg, reverse, mirror, mark
• Runtime operational optimisation

GDK
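A minimal sketch of a bulk selection that fully materialises its result and picks an implementation at runtime from a column property. The `sorted` flag, the function names and the two variants are hypothetical. They stand in for the idea of many specialised routines and are not GDK code.

```c
#include <stddef.h>
#include <stdbool.h>

/* Unsorted column: scan without a data-dependent branch in the loop body.
 * The candidate position is always written and the output cursor advances
 * by 0 or 1. `out` needs room for n entries. */
static size_t select_gt_scan(const int *col, size_t n, int val, size_t *out)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        out[k] = i;
        k += (size_t)(col[i] > val);
    }
    return k;
}

/* Sorted column: binary search for the first value greater than val;
 * every later position qualifies. */
static size_t select_gt_sorted(const int *col, size_t n, int val, size_t *out)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (col[mid] > val)
            hi = mid;
        else
            lo = mid + 1;
    }
    size_t k = 0;
    for (size_t i = lo; i < n; i++)
        out[k++] = i;
    return k;
}

/* Runtime choice of routine, driven by a column property. Returns the
 * number of qualifying positions written to out. */
size_t select_gt(const int *col, size_t n, bool sorted, int val, size_t *out)
{
    return sorted ? select_gt_sorted(col, n, val, out)
                  : select_gt_scan(col, n, val, out);
}
```

Whether the scan really compiles to branch-free machine code depends on the compiler and target; the source form only makes that outcome likely.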

• Heavy use of code expansion
• Fast, branchless code paths
• ~1500 selection routines
• Runtime selection of the best routine for the current situation

Maintenance
Knoblessness

• MonetDB is host-oriented
• We follow the no-knobs principle
• MonetDB aims to maintain its own databases
• TODO: we need vacuum for deletes
• Upgrades on new releases are in-place

Backups

• dbfarm can be copied verbatim
  • as long as the server won’t change it
  • which means either it’s stopped or suspended
  • only works on same architecture (rarely a problem these days, most is x86_64)
• Data can be dumped to SQL

Deletes
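A minimal sketch of why space is only reclaimed by reloading: if a delete merely marks entries, the column keeps its full size until the surviving values are copied into a fresh column. The delete mask and all names below are a hypothetical model, not MonetDB's actual storage layout.

```c
#include <stddef.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical column with a delete mask. */
struct column {
    int   *values;
    bool  *deleted;
    size_t count;    /* slots in use, including deleted ones */
};

/* "DELETE WHERE value < val": entries are only marked, count is unchanged.
 * Returns the number of newly marked entries. */
size_t mark_deleted_lt(struct column *c, int val)
{
    size_t marked = 0;
    for (size_t i = 0; i < c->count; i++) {
        if (!c->deleted[i] && c->values[i] < val) {
            c->deleted[i] = true;
            marked++;
        }
    }
    return marked;
}

/* Reload: copy surviving values into a freshly allocated column.
 * Returns false on allocation failure. The caller frees fresh's arrays. */
bool reload(const struct column *old, struct column *fresh)
{
    size_t live = 0;
    for (size_t i = 0; i < old->count; i++)
        live += !old->deleted[i];

    size_t slots = live ? live : 1;   /* avoid zero-sized allocations */
    fresh->values = malloc(slots * sizeof *fresh->values);
    fresh->deleted = calloc(slots, sizeof *fresh->deleted);
    if (fresh->values == NULL || fresh->deleted == NULL) {
        free(fresh->values);
        free(fresh->deleted);
        return false;
    }

    size_t k = 0;
    for (size_t i = 0; i < old->count; i++)
        if (!old->deleted[i])
            fresh->values[k++] = old->values[i];
    fresh->count = live;
    return true;
}
```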

• A DELETE does not remove for real! :(
• On frequent DELETE scenarios, tables need to be reloaded to free up space
  • dump/restore
  • CREATE TABLE ... AS SELECT ... WITH DATA
• delete by dropping entire tables

Code generation
Inspection from SQL

• Prefix a query by:
  • PLAN – to get the relational plan, independent of data
  • EXPLAIN – to get the MAL plan, what will be really executed
  • TRACE – to see each instruction of the MAL plan prefixed by its execution time in microseconds

Optimisers

Strategic optimizer (SQL):
– Exploit the semantics of the language
– Rely on heuristics

Tactical MAL optimizer (MAL):
– No changes in front-ends and no direct human guidance
– Minimal changes in the engine

    x1:bat[:oid,:dbl] := .bind("sys","photoobjall","ra",0);
    x14 := algebra.uselect(x1,A0,A1);

    y1:bat[:oid,:dbl] := bpm.take("sys_photoobjall_ra");
    y2 := bpm.new(:oid,:oid);
    barrier rs := bpm.newIterator(y1,A0,A1);
        t1 := algebra.uselect(rs,A0,A1);
        bpm.addSegment(y2,t1);
    redo rs := bpm.hasMoreElements(y1,A0,A1);
    exit rs;

Operational optimizer (MonetDB Kernel):
– Exploit everything you know at runtime
– Re-organize if necessary

Figure labels: SQL, MAL, Tactical Optimizer, MAL, MonetDB Kernel, MonetDB Server

Examples

• Code Inliner
• Code Parallelizer
• Constant Expression Evaluator
• Replication Manager
• Accumulator Evaluations
• Result Recycler
• Strength Reduction
• Dynamic Query Scheduler
• Common Term Optimizer
• Alias Removal
• Join Path Optimiser
• Dead Code Removal
• Ranges Propagation
• Garbage Collector
• Operator Cost Reduction
• Foreign Key handling
• Aggregate Groups

Usage
Research Usage

• CWI, Amsterdam
  • Core DBMS Research
  • TIJAH: Multi-Media IR
  • Data Mining, GIS, Astronomy, RDF/SPARQL, Streams, ...
• Universität Tübingen (with UTwente & CWI)
  • Pathfinder: XQuery compiler
• Knowledge Discovery Lab, UMass, Amherst
  • Proximity: OpenSource relational knowledge

Commercial Usage

• Data Distilleries (CWI Spin-Off, now part of SPSS -> IBM), Amsterdam
  • Commercial Data-Mining & CRM Software
  • Many banks & insurance companies in NL
• Pentaho
  • MonetDB as supported analytic platform
• Coupling to Infobright

Extendability
MonetDB Sources

• Using Mercurial, distributed VCS at http://dev.monetdb.org/hg/MonetDB/
• Release Managers create release branches (Aug2011, Dec2011, ...) and tags (SP-1, ...)
• Commits are propagated from/to stable, candidate and development branches
• Nightly regression testing of branches

Release Cycle

• Deliver releases at regular intervals
• Predictable for both devs and users
• Keeps the “gap” between devs and users small

Classic School Theory

Figure labels: fixes, branch, slowdown, release, unlock tree, test & fix

OpenBSD Release Cycle

Figure labels: branch, lock tree, slowdown, unlock tree, release, API/ABI locks, big commits now, everyone tests, normal development, next cycle

OpenBSD?

• OS with strong focus on security and stability
• their release cycle might be too rigid for us (e.g. treelocks, single version development)
• good things:
  • features are committed at the start
  • the final release process should be minor

MonetDB Cycle

Figure labels: big commits now, fixes, initial release, release N+1, release N, SP1, SP2, normal development

Branching

• One of the strong points of DVCSs
• Hg allows easy “merging”
• Each “clone” is a branch itself
• Extremely simple to keep changes local

Mercurial

• Each clone contains full history/data
• Pulling changes as well as pushing
• Local in-house “master” clone pulling in changes from e.g. dev.monetdb.org
• Staged/selected pushes/pull requests of fixes back to monetdb.org (or sent through hg email my-bugfix-rev)

Considerations

• MonetDB devs stop bug-fixes on a branch when the follow-up one becomes a release
• Release branches maintain API/ABI compatibility
• Database format upgrade path only supported from previous release
• It is best to stay with current release branches

MonetDB Team Spirit

• Bring all fixes back to monetdb.org codebase
• Help develop solutions, have primitives available on monetdb.org codebase
• Share our minds to help find good (code) solutions, or migrations