Architecture and Implementation of Database Systems
Introduction
Preliminaries and Organizational Matters
Architecture and Implementation of Database Systems, 2013-2014
http://adrem.ua.ac.be/advanced-database-systems

Welcome all ...
... to this course, whose lectures are primarily about digging in the mud of database system internals.
• While others talk about SQL and graphical query interfaces, we will
  1 learn how DBMSs can access files on hard disks without paying too much for I/O traffic,
  2 see how to organize data on disk and which kind of “maps” for huge amounts of data we can use to avoid getting lost,
  3 assess what it means to sort/combine/filter data volumes that exceed main memory size by far, and
  4 learn how user queries are represented and executed inside the database kernel.

Architecture of a DBMS / Course Outline
[Figure: layered DBMS architecture. Web forms, applications, and the SQL interface issue SQL commands to the parser, optimizer, operator evaluator, and executor. Below these sit files and access methods, the buffer manager, and the disk space manager, flanked by the transaction manager, lock manager, and recovery manager. The DBMS operates on the database (data files, indices, ...). A "this course" label marks the part of the stack covered here.]
Figure inspired by Ramakrishnan/Gehrke: “Database Management Systems”, McGraw-Hill 2003.

Reading Material
• Raghu Ramakrishnan and Johannes Gehrke. Database Management Systems. McGraw-Hill.
• ... in fact, any book about advanced database topics and internals will do; pick your favorite.
Here and there, pointers to specific research papers will be given, and you are welcome to search for additional background reading. Use Google Scholar or similar search engines.
Exam: Open book; example exams are on the website.

These Slides ...
• ... prepared/updated throughout the semester; watch out for bugs and please let me know. Thanks.
• Posted to course web home on the day before the lecture.

Example
Open Issues/Questions: Take notes.
Code Snippets, Algorithms
IBM DB2 Specifics: If possible and insightful, discuss how IBM DB2 does things.
PostgreSQL Specifics: Ditto, but related to the glorious PostgreSQL system.

Before We Begin
Questions? Comments? Suggestions?

Storage
Disks, Buffer Manager, Files

Outline: Magnetic Disks, Access Time, Sequential vs. Random Access, I/O Parallelism, RAID Levels 1, 0, and 5, Alternative Storage Techniques, Solid-State Disks, Network-Based Storage, Managing Space, Free Space Management, Buffer Manager, Pinning and Unpinning, Replacement Policies, Databases vs. Operating Systems, Files and Records, Heap Files, Free Space Management, Inside a Page, Alternative Page Layouts, Recap

Database Architecture
[Figure: the DBMS architecture diagram from the introduction, shown again.]

The Memory Hierarchy

                         capacity           latency
CPU (with registers)     bytes              < 1 ns
caches                   kilo-/megabytes    < 10 ns
main memory              gigabytes          70–100 ns
hard disks               terabytes          3–10 ms
tape library             petabytes          varies

• Fast, but expensive and small, memory close to the CPU
• Larger, slower memory at the periphery
• DBMSs try to hide latency by using the fast memory as a cache.

Magnetic Disks
[Figure: magnetic disk with labels for rotation, arm, track, block, sector, platter, and heads. Photo: http://www.metallurgy.utah.edu/]
• A stepper motor positions an array of disk heads on the requested track
• Platters (disks) steadily rotate
• Disks are managed in blocks: the system reads/writes data one block at a time

Access Time
Data blocks can only be read and written if disk heads and platters are positioned accordingly.
• This design has implications on the access time to read/write a given block:

Definition (Access Time)
  1 Move disk arms to desired track (seek time $t_s$)
  2 Disk controller waits for desired block to rotate under disk head (rotational delay $t_r$)
  3 Read/write data (transfer time $t_{tr}$)
⇒ access time: $t = t_s + t_r + t_{tr}$

Example: Seagate Cheetah 15K.7 (600 GB, server-class drive)
• Seagate Cheetah 15K.7 performance characteristics:
  • 4 disks, 8 heads, avg. 512 kB/track, 600 GB capacity
  • rotational speed: 15 000 rpm (revolutions per minute)
  • average seek time: 3.4 ms
  • transfer rate ≈ 163 MB/s

What is the access time to read an 8 kB data block?

average seek time                     $t_s = 3.40\,\mathrm{ms}$
average rotational delay              $t_r = \frac{1}{2} \cdot \frac{1}{15\,000\,\mathrm{min}^{-1}} = 2.00\,\mathrm{ms}$
transfer time for 8 kB                $t_{tr} = \frac{8\,\mathrm{kB}}{163\,\mathrm{MB/s}} = 0.05\,\mathrm{ms}$
access time for an 8 kB data block    $t = 5.45\,\mathrm{ms}$

Sequential vs. Random Access
Example (Read 1 000 blocks of size 8 kB)
• random access:
  $t_{rnd} = 1\,000 \cdot 5.45\,\mathrm{ms} = 5.45\,\mathrm{s}$
• sequential read of adjacent blocks:
  $t_{seq} = t_s + t_r + 1\,000 \cdot t_{tr} + 16 \cdot t_{s,\text{track-to-track}}$
  $\phantom{t_{seq}} = 3.40\,\mathrm{ms} + 2.00\,\mathrm{ms} + 50\,\mathrm{ms} + 3.2\,\mathrm{ms} \approx 58.6\,\mathrm{ms}$
The Seagate Cheetah 15K.7 stores an average of 512 kB per track, with a 0.2 ms track-to-track seek time; our 8 kB blocks are spread across 16 tracks.

⇒ Sequential I/O is much faster than random I/O
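The arithmetic above can be checked with a short script. This is only a sketch of the slide's cost model, not vendor data: the function and constant names are made up for illustration, 1 MB is taken as 1 000 kB, and one track-to-track seek is charged per track touched, as on the slide. Because the slide rounds the transfer time to 0.05 ms before multiplying, the script's totals come out slightly lower (roughly 5.45 s and just under 58 ms).

```python
import math

# Figures quoted on the slide for the Seagate Cheetah 15K.7.
# Names are illustrative and not taken from the slides.
SEEK_MS = 3.4              # average seek time t_s
RPM = 15_000               # rotational speed
TRANSFER_MB_PER_S = 163    # transfer rate (assuming 1 MB = 1000 kB)
TRACK_KB = 512             # average track capacity
TRACK_TO_TRACK_MS = 0.2    # track-to-track seek time


def rotational_delay_ms(rpm: float) -> float:
    """Average rotational delay t_r: half a revolution, in ms."""
    return 0.5 * 60_000 / rpm


def transfer_ms(block_kb: float) -> float:
    """Transfer time t_tr for one block, in ms."""
    return block_kb / (TRANSFER_MB_PER_S * 1000) * 1000


def random_read_ms(n_blocks: int, block_kb: float) -> float:
    """Random access: every block pays t_s + t_r + t_tr."""
    per_block = SEEK_MS + rotational_delay_ms(RPM) + transfer_ms(block_kb)
    return n_blocks * per_block


def sequential_read_ms(n_blocks: int, block_kb: float) -> float:
    """Sequential access: one seek and one rotational delay, then pure
    transfer plus one track-to-track seek per track touched."""
    tracks = math.ceil(n_blocks * block_kb / TRACK_KB)
    return (SEEK_MS + rotational_delay_ms(RPM)
            + n_blocks * transfer_ms(block_kb)
            + tracks * TRACK_TO_TRACK_MS)


t_rnd = random_read_ms(1000, 8)
t_seq = sequential_read_ms(1000, 8)
print(f"random: {t_rnd:.0f} ms, sequential: {t_seq:.1f} ms, "
      f"ratio: {t_seq / t_rnd:.2%}")
```

Changing `SEEK_MS`, `RPM`, or the block size shows how quickly the gap between the two access patterns widens or narrows for other drives.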
⇒ Avoid random I/O whenever possible
⇒ As soon as we need at least $58.6\,\mathrm{ms} / 5\,450\,\mathrm{ms} \approx 1.07\,\%$ of a file, we had better read the entire file sequentially

Performance Tricks
• Disk manufacturers play a number of tricks to improve performance:

track skewing
  Align sector 0 of each track to avoid rotational delay during longer sequential scans

request scheduling
  If multiple requests have to be served, choose the one that requires the smallest arm movement (SPTF: shortest positioning time first, elevator algorithms)

zoning
  Outer tracks are longer than the inner ones. Therefore, divide outer tracks into more sectors than inner tracks

Evolution of Hard Disk Technology
Disk seek and rotational latencies have only marginally improved over the last years (≈ 10 % per year)
But:
• Throughput (i.e., transfer rate) improves by ≈ 50 % per year
• Hard disk capacity grows by ≈ 50 % every year
Therefore:
• Random access cost hurts even more as time progresses

Example (5 Years Ago: Seagate Barracuda 7200.7)
Read 1 000 blocks of 8 kB sequentially/randomly: 397 ms / 12 800 ms

Ways to Improve I/O Performance
The latency penalty is hard to avoid
But:
• Throughput can be increased rather easily by exploiting parallelism
• Idea: Use multiple disks and access them in parallel, try to hide latency

TPC-C: An industry benchmark for OLTP
A recent #1 system (IBM DB2 9.5 on AIX) uses
• 10,992 disk drives (73.4 GB each, 15,000 rpm) (!) plus 8 × 146.8 GB internal SCSI drives,