A Glance over MonetDB
CWI, November 22nd 2011, at Nopsar
Centrum Wiskunde & Informatica

Background

History of MonetDB
• 1979-1992 – first relational kernel
• 1993-1995 – BAT-based kernel
• 1996-2003 – spin-off Data Distilleries
• 2003-2007 – Open Source (v4)
• 2008-20?? – MonetDB v5

MonetDB Principles
• Full vertical fragmentation: always! (BATs)
• RISC approach to databases
• Optimised for in-memory processing
• Operator-at-a-time bulk processing
• CPU and memory cache optimised

Traditional DBMSs
• Row-based
• Buffer managers, pages, ...
• (Magnetic) disk I/O conscious
• Tuple-at-a-time, Volcano-style processing
• Speed mostly bound by disk (index) access

Research Nature
• MonetDB is different by design
• Educated guesses have proven useful
• Open architecture (pluggable/extensible)
• Experimental research additions

The Vision
Column-store long before column-stores became known – a pioneer in the field

“We can’t solve problems by using the same kind of thinking we used when we created them.”

Problems Seen
• From OLTP to OLAP, BI, Data Mining
• DBMSs on modern processors: 60–90% idle
• Waiting for memory to arrive at CPU
• Non-utilised caches and CPU features

CPU niceness
[Figure: CPU usage]

Why are we waiting?
• CPU is 60%–90% idle, mostly waiting for memory; typical stall sources:
  – L1 data stalls
  – L1 instruction stalls
  – L2 data stalls
  – TLB stalls
  – Branch mispredictions
  – Resource stalls

Memory Wall
Trip to memory = 1000s of instructions!

Memory Hierarchy
Processing
Simple, hard-coded semantics:

    void batcalc_minus_int(int *res, int *col, int val, int n)
    {
        /* subtract a constant from every value of the column */
        for (int i = 0; i < n; i++)
            res[i] = col[i] - val;
    }
CPU: give it “nice” code!
- few dependencies (control, data)
- CPU gets out-of-order execution
- compiler can e.g. generate SIMD
One loop for an entire column
- no per-tuple interpretation
- arrays: no record navigation
- better instruction cache locality

Internals

Software Stack
• GDK (BAT Kernel)
• MonetDB 5
• MAL interpreter
• Optimiser stack
• Execution/scheduler
• SQL to MAL translator
• MonetDB daemon

A Row-store
Early 80s: tuple storage structures were simple

    OK | John | 32 | Houston
    OK | Mary | 31 | Houston
Disk Pages
[Figure: disk pages holding the rows 32 | John | Houston and 31 | Mary | Houston]

A Column-store
A column orientation is simple and acts like an array.
Attributes of a tuple are correlated by offset.

Binary Association Tables
[Figure: row-store vs. column-store]

Data Organisation
    head  tail
    100   10
    101   11
    102   12
    103   14
    104   18

• head and tail are stored as separate, memory-mapped files
• the head is a dense sequence
• fixed-width head and tail columns are in fact C-like arrays

Tail Heaps
    head  tail
    100   0x01
    101   0x04
    102   0x08
    103   0x04
    104   0x25

heap: John, Mary, ...
• the tail holds offsets into the heap
• best-effort duplicate elimination: equal values share one heap entry (e.g. 101 and 103 both point to 0x04)

Accelerators
    head  tail
    100   10
    101   11
    102   12
    103   14
    104   18

Column properties:
• key-ness
• non-null
• dense
• ordered
Hash-based access

GDK processing model
• Bulk processing (full materialisation)
• Binary algebra core
  – select, join, semijoin, outerjoin, union, intersection, diff, group, count, max, min, sum, avg, reverse, mirror, mark
• Runtime operational optimisation

GDK algorithms
• Heavy use of code expansion
• Fast, branchless code paths
• ~1500 selection routines
• Runtime selection of the best algorithm for the current situation

Maintenance

Knoblessness
• MonetDB is host-oriented
• We follow the no-knobs principle: MonetDB aims to maintain its own databases
• TODO: we need vacuum for deletes
• Upgrades on new releases are in-place

Backups
• The dbfarm can be copied verbatim
  – as long as the server won’t change it, which means it is either stopped or suspended
  – only works on the same architecture (rarely a problem these days, since most machines are x86_64)
• Data can be dumped to SQL

Deletes
• A DELETE does not remove data for real! :(
• In scenarios with frequent DELETEs, tables need to be reloaded to free up space:
  – dump/restore
  – CREATE TABLE ... AS SELECT ... WITH DATA
  – delete by dropping entire tables

Code generation

Inspection from SQL
• Prefix a query by:
  – PLAN: get the relational plan, independent of data
  – EXPLAIN: get the MAL plan, i.e. what will really be executed
  – TRACE: see each instruction of the MAL plan, prefixed by its execution time in microseconds

Optimisers
Strategic optimizer (SQL):
  – Exploit the semantics of the language
  – Rely on heuristics
Tactical optimizer (MAL to MAL):
  – No changes in front-ends and no direct human guidance
  – Minimal changes in the engine
Operational optimizer (MonetDB kernel):
  – Exploit everything you know at runtime
  – Re-organize if necessary
[Figure: optimizer layers in the MonetDB server: SQL, MAL, tactical optimizer, MAL, MonetDB kernel]

Example MAL plan for a range selection:

    x1:bat[:oid,:dbl] := sql.bind("sys","photoobjall","ra",0);
    x14 := algebra.uselect(x1,A0,A1);

rewritten to process the column segment by segment:

    y1:bat[:oid,:dbl] := bpm.take("sys_photoobjall_ra");
    y2 := bpm.new(:oid,:oid);
    barrier rs := bpm.newIterator(y1,A0,A1);
        t1 := algebra.uselect(rs,A0,A1);
        bpm.addSegment(y2,t1);
        redo rs := bpm.hasMoreElements(y1,A0,A1);
    exit rs;

Examples
• Code Inliner
• Code Paralleliser
• Constant Expression Evaluator
• Replication Manager
• Accumulator Evaluations
• Result Recycler
• Strength Reduction
• Dynamic Query Scheduler
• Common Term Optimizer
• Alias Removal
• Join Path Optimiser
• Dead Code Removal
• Ranges Propagation
• Garbage Collector
• Operator Cost Reduction
• Foreign Key handling
• Aggregate Groups

Usage

Research Usage
• CWI, Amsterdam
  – Core DBMS Research
  – TIJAH: Multi-Media IR
  – Data Mining, GIS, Astronomy, RDF/SPARQL, Streams, ...
• Universität Tübingen (with UTwente & CWI)
  – Pathfinder: XQuery compiler
• Knowledge Discovery Lab, UMass, Amherst
  – Proximity: OpenSource relational knowledge

Commercial Usage
• Data Distilleries (CWI spin-off, now part of SPSS, itself acquired by IBM), Amsterdam
  – Commercial Data-Mining & CRM Software
  – Many banks & insurance companies in NL
• Pentaho
  – MonetDB as supported analytic database platform
• Coupling to Infobright

Extendability

MonetDB Sources
• Using Mercurial, a distributed VCS, at http://dev.monetdb.org/hg/MonetDB/
• Release managers create release branches (Aug2011, Dec2011, ...) and tags (SP-1, ...)
• Commits are propagated from/to stable, candidate and development branches
• Nightly regression testing of branches

Release Cycle
• Deliver releases at regular intervals
• Predictable for both devs and users
• Keeps the “gap” between devs and users small

Classic School Theory
[Figure: release timeline with the phases fixes, branch, slowdown, release, unlock tree, test & fix]

OpenBSD Release Cycle
[Figure: release timeline with normal development, big commits now, API/ABI locks, lock tree, slowdown, everyone tests, branch, release, unlock tree (next cycle)]

OpenBSD?
• OS with a strong focus on security and stability
• Their release cycle might be too rigid for us (e.g. tree locks, single-version development)
• Good things:
  – features are committed at the start of the cycle
  – the final release process should be minor

MonetDB Cycle
[Figure: timeline of release N and release N+1 with normal development, big commits now, fixes, initial release, SP1, SP2]

Branching
• One of the strong points of DVCSs
• Hg allows easy “merging”
• Each “clone” is a branch itself
• Extremely simple to keep changes local

Mercurial
• Each clone contains the full history/data
• Pulling changes as well as pushing
• Local in-house “master” clone pulling in changes from e.g. dev.monetdb.org
• Staged/selected pushes/pull requests of fixes back to monetdb.org (or sent through hg email my-bugfix-rev)

Considerations
• MonetDB devs stop bug-fixing a branch once its follow-up branch becomes a release
• Release branches maintain API/ABI compatibility
• The database format upgrade path is only supported from the previous release
• It is best to stay with current release branches

MonetDB Team Spirit
• Bring all fixes back to the monetdb.org codebase
• Help develop solutions, have primitives available on the monetdb.org codebase
• Share our minds to help find good (code) solutions, or migrations