Script ( PDF , 8 MB )

Main Memory DBMS G. Moerkotte February 4, 2019 2 Contents 1 Introduction9 2 Hardware 11 2.1 Memory................................ 11 2.1.1 Aligned vs. Unaligned Access................ 11 2.1.2 Address Translation (Virtual Memory)........... 11 2.1.3 Caches and the Memory Hierachie............. 15 2.1.4 Some Numbers of Some Processors............. 18 2.1.5 Prefetching.......................... 18 2.2 CPU.................................. 19 2.2.1 Pipelining........................... 19 2.2.2 Out-Of-Order-Execution................... 20 2.2.3 (Cost of) Branch (Mis-) Prediction............. 21 2.2.4 SIMD............................. 23 2.2.5 Simultaneous multithreading (SMT)............ 30 2.3 Cache Coherence........................... 30 2.4 Synchronization Primitives..................... 32 2.5 NUMA................................. 32 2.6 Performance Monitoring Unit.................... 34 2.7 References............................... 35 3 Operating System 37 4 Hash Tables 39 4.1 Hash Functions............................ 39 4.1.1 Why hash-functions matter................. 40 4.1.2 Properties........................... 42 4.1.3 Uniformity.......................... 42 4.1.4 Average-case Search Length................. 42 4.1.5 Expected Length of the Longest Probe Sequence (llps).. 42 4.1.6 Universality.......................... 43 4.1.7 k-Universal.......................... 46 4.1.8 (c; k)-Universal........................ 46 4.1.9 Dietzfelbinger: Universality without Primes........ 47 4.1.10 k-universal Hash Functions................. 47 4.1.11 Tabulation Hashing..................... 48 3 4 CONTENTS 4.1.12 Hashing string values.................... 48 4.2 Hash Table Organization....................... 49 4.2.1 Two versions of Chained Hashtable............. 49 4.2.2 Cuckoo-Hashing....................... 49 4.2.3 Robin-Hood-Hashing..................... 50 4.2.4 Hopscotch-Hashing...................... 50 5 Compression 51 6 Storage Layout 53 6.1 Row stores and Column Stores................... 53 6.1.1 Row format (NSM)...................... 53 6.1.2 Column format (DSM).................... 53 6.1.3 Hybrid Storage Model (PDSM)............... 55 6.1.4 Cache Lines in Row and Column Format......... 55 6.2 Organization on Pages........................ 56 6.3 Row Layouts............................. 57 6.4 Column Layouts........................... 60 6.4.1 BitPackingH......................... 60 6.4.2 BitSliceH........................... 61 6.4.3 BitSliceV........................... 64 6.4.4 ByteSliceV.......................... 64 6.5 DB2 BLU............................... 64 6.5.1 Column Groups........................ 64 6.5.2 Compression......................... 65 6.5.3 Cell/Region.......................... 65 6.5.4 Page Format......................... 66 6.5.5 Page Compression...................... 67 6.5.6 Small Materialized Aggregates (SMA)........... 67 6.5.7 Global Code......................... 68 6.5.8 Scan.............................. 68 6.6 SQL Server.............................. 68 6.6.1 Apollo............................. 68 6.6.2 Hekaton............................ 69 6.7 Large Objects............................. 72 7 Physical Algebra: Processing Modes 73 7.1 Pull Algebra............................. 73 7.2 Push Algebra............................. 73 7.2.1 Interface............................ 73 7.2.2 Scan.............................. 73 7.2.3 Select............................. 74 7.2.4 Simplest Hash-Join...................... 74 7.3 Materialization Granularity/Call Granularity........... 77 7.3.1 Tuple-wise (single tuple materialization).......... 77 7.3.2 Complete (full materialization)............... 77 7.3.3 Blockwise (partial materialization)............. 77 CONTENTS 5 8 Expression Evaluation 79 8.1 Introduction.............................. 79 8.2 Result Representation........................ 80 8.3 Interpretation: Operator Tree.................... 80 8.4 Interpretation: A Virtual Machine (AVM)............. 80 8.4.1 AVM: row: single tuple................... 81 8.4.2 AVM: row: vectorized.................... 83 8.4.3 AVM: col: single....................... 84 8.4.4 AVM: col: vectorized..................... 85 8.4.5 AVM: col: vectorized with SIMD.............. 85 8.5 Compilation.............................. 86 8.6 Comparison: simple map program................. 86 8.7 AVM: col: Vectorized SIMD: selection............... 88 9 Physical Algebra: Implementation 91 9.1 General Implementation Techniques................ 91 9.1.1 Blocking/Tiling........................ 92 9.1.2 Partitioning.......................... 92 9.1.3 Extraction........................... 93 9.1.4 Loop Fusion.......................... 93 9.2 Scan/Select.............................. 93 9.3 Join.................................. 93 9.3.1 Simple Hash Table...................... 94 9.3.2 Extraction........................... 94 9.3.3 Partitioning.......................... 94 9.3.4 Software Prefetching..................... 95 9.3.5 Group Prefetchin....................... 95 9.3.6 Software-Pipelined Prefetching............... 95 9.3.7 Rolling Prefetching...................... 96 9.3.8 Asynchronous Memory Access Chaining (AMAC)..... 99 9.4 Partitioning.............................. 101 9.5 Sort Operator............................. 105 9.6 Grouping and Aggregation...................... 105 10 Indexing 107 10.1 T-Tree................................. 107 10.2 Cache Conscious B+-Tree...................... 107 10.3 Skip Lists............................... 108 10.4 ART.................................. 108 11 Boolean Expressions 109 12 Cardinality Estimation 111 12.1 Sketches for Joins........................... 111 12.1.1 Tug-Of-War (AGMS).................... 111 12.1.2 FastAGMS.......................... 115 12.1.3 FastCount........................... 115 6 CONTENTS 12.1.4 CountMin........................... 115 13 Parallelism 117 13.1 Amdahl's Law............................. 117 13.2 Parallelization Constructs, Frameworks, and Libraries...... 117 13.3 Kinds of Parallelism......................... 118 13.4 Morsel-Driven Parallelism...................... 118 13.5 Synchronization of Index Structures................ 123 14 Memory Management 125 15 Thread Architecture 127 16 Transaction Processing 129 16.1 Handling updates........................... 129 16.2 Lock Manager............................. 129 16.2.1 The Gray/Reuter Lock Manager.............. 130 16.2.2 Starburst Lock Manager................... 135 16.2.3 Starburst MM Lock Manager [104]............. 136 16.2.4 Fekete Lock Manager.................... 137 16.2.5 Pointers to Literature.................... 137 16.3 Snapshot isolation.......................... 138 16.4 Logging................................ 138 17 The End 139 A Tools and System Calls 141 B Pointers to the Literature 143 B.1 Overview Papers/Books....................... 143 B.2 Timeline (Milestones)........................ 145 B.3 Storage Layouts............................ 146 B.4 Memory Management........................ 146 B.5 Hashing................................ 146 B.6 Compression............................. 147 B.7 Expression Evaluation Techniques................. 147 B.8 Indexes................................ 148 B.9 Physical Algebra........................... 148 B.10 Prefetching.............................. 149 B.11 Instruction-Level Parallelism (ILP, SIMD)............. 150 B.12 Thread-Level Parallelism (TLP).................. 150 B.13 NUMA................................. 150 B.14 Cost Models.............................. 150 B.15 Code Generation........................... 150 B.16 Buffer Management.......................... 151 B.16.1 Buffer Manager........................ 151 B.16.2 Buffering Without Buffermanager............. 152 B.17 Recovery/Checkpointing....................... 152 CONTENTS 7 B.18 Storage Manager........................... 152 B.19 System Overviews.......................... 153 B.20 todo.................................. 153 8 CONTENTS Chapter 1 Introduction The holy grail for a DBMS is one that is [183]: • Scalable & Speedy, to run on anything from small ARM processors up to globally distributed compute clusters, • Stable & Secure, to service a broad user community, • Small & Simple, to be comprehensible to a small team of programmers, • Self-managing, to let it run out-of-the-box without hassle. We will have a more limited view, shared with Stonebraker: There are three important things in databases: 1. performance, 2. performance, and 3. performance. We concentrate on the architecture and the implementation of Query Execu- tion Engines (QEE) focussing on data layout and (algebraic, simple) expression evaluation. We start out with a short refresh on hardware and a reminder that operating systems are our friends. Then, we recall some stuff on hash tables and drop the names of light-weight compression techniques that are applied in database systems. The first true chapter is Chap.6. The chapter on memory management is quite short yet. Larger is the chapter on the principal workings of a physical algebra (Chap.7). Very large is (or will be) the chapter on expression evaluation (Chap.8). OTHER CHAPTERS TODO 9 10 CHAPTER 1. INTRODUCTION Chapter 2 Hardware 2.1 Memory 2.1.1 Aligned vs. Unaligned Access Accessing a data item d at memory address a is aligned if a mod jdj = 0 if jdj is the size of the data item in bytes. Fig. 2.1 illustrates aligned vs. unaligned access of a 4-byte integer. Unaligned access maybe more expensive or even impossible on some architectures. 2.1.2 Address Translation (Virtual Memory)

Script ( PDF , 8 MB )

SAHA: a String Adaptive Hash Table for Analytical Databases

Chapter 5 Hashing

Double Hashing

Chapter 5. Hash Tables

Lock-Free Hopscotch Hashing

SIMD Vectorized Hashing for Grouped Aggregation

Implementation and Performance Analysis of Hash Functions and Collision Resolutions

Robin Hood Hashing

Practical Survey on Hash Tables

Hashing: Part 2 332

Fundamental Data Structures

SCALABLE GRAPH PROCESSING on RECONFIGURABLE SYSTEMS by ROBERT G. KIRCHGESSNER a DISSERTATION PRESENTED to the GRADUATE SCHOOL OF