CSE 132C Database System Implementation
Spring 2021
Arun Kumar

About myself
2009: Bachelors in CSE from IIT Madras (Summer: 110F!)
2009-16: MS and PhD in CS from UW-Madison (Winter: -40F!)
2016-Now: Asst. Prof. at UC San Diego CSE
2019-Now: Asst. Prof. at UC San Diego HDSI (Ahh… :))

What is this course about? Why take it?

Large-scale data management systems are the cornerstone of many digital applications, both traditional and modern.

Our primary focus will be the relational data model and Relational Database Management Systems (RDBMSs).

Relational model in a nutshell
Basically, Relation:Table :: Pilot:Driver (okay, a bit more)

RatingID  Rating  Date      UserID  MovieID
1         3.5     08/27/15  23294   20
2         4.0     07/20/15  4232    293
3         2.5     08/02/15  54551   846
…         …       …         …       …

The model formalizes "operations" to manipulate relations.

Relational model in a nutshell
Invented by E. F. Codd in the 1970s at IBM Research in California.

Relational DBMS in a nutshell
First RDBMSs: System R (IBM) and Ingres (Berkeley) in the 1970s.
A rare photo of the original System R manual.
Mike Stonebraker won the Turing Award in 2015!
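To make "operations to manipulate relations" a bit more concrete, here is a minimal, hypothetical C++ sketch of two relational algebra operators, selection (σ) and projection (π), applied to the three sample Ratings tuples from the relational model slide. The struct layout, the in-memory std::vector representation, and the names `selection`, `projectUserMovie`, and `sampleRatings` are assumptions for illustration only; this is not BadgerDB code or any API used in the course.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// One tuple of the Ratings relation. Field types are assumptions for this sketch.
struct Rating {
    int ratingId;
    double rating;
    std::string date;  // kept as text, e.g. "08/27/15"
    int userId;
    int movieId;
};

// Hypothetical choice: a relation is an in-memory bag of tuples.
// A real RDBMS stores relations in pages on disk.
using Relation = std::vector<Rating>;

// Selection (sigma): keep only the tuples that satisfy the predicate.
Relation selection(const Relation& r,
                   const std::function<bool(const Rating&)>& pred) {
    Relation out;
    for (const Rating& t : r) {
        if (pred(t)) out.push_back(t);
    }
    return out;
}

// Projection (pi) onto (UserID, MovieID). Duplicates are kept (bag semantics,
// like SQL without DISTINCT); set-based relational algebra would remove them.
std::vector<std::pair<int, int>> projectUserMovie(const Relation& r) {
    std::vector<std::pair<int, int>> out;
    for (const Rating& t : r) out.emplace_back(t.userId, t.movieId);
    return out;
}

// The three sample tuples shown on the slide.
Relation sampleRatings() {
    return {
        {1, 3.5, "08/27/15", 23294, 20},
        {2, 4.0, "07/20/15", 4232, 293},
        {3, 2.5, "08/02/15", 54551, 846},
    };
}
```

For example, selecting tuples with Rating ≥ 3.5 keeps RatingIDs 1 and 2, and projecting that result onto (UserID, MovieID) yields (23294, 20) and (4232, 293). How an RDBMS evaluates such operators efficiently over data on disk is what the rest of this course covers.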
CSE 132 Database Course Series
❖ CSE 132A: Relational algebra (σ, π, ⋈, γ, ×, ∪, − also written \)
❖ CSE 132B: SQL (SELECT … FROM … WHERE)
❖ CSE 132C: RDBMS and Database implementation

Relational DBMS in a nutshell
A software system that implements the relational model, i.e., enables users to store and process relational databases.

Relational DBMS in a nutshell
RDBMS software is now a USD 20+ billion/year industry; many open source RDBMSs also exist.
People still start companies around what are basically RDBMSs!

CSE 132C will get you to think critically about the implementation of database systems
1. How are large structured datasets stored and organized?
   Data storage, file organization, and buffer management
2. How are queries handled?
   Indexing, sorting, relational operator implementation, and query processing
3. How to make it faster and more scalable?
   Query optimization; parallel RDBMSs
4. Recent developments and trends
   Dataflow Systems; NoSQL systems; ML systems; ML for RDBMSs

(Tentative) Course Schedule
Week  Topic
1     Introduction; Recap of Relational Algebra and SQL
1-2   Data Storage; Buffer Management; File Organization
3-4   Indexing (B+ Tree; Hash Index)
4-5   External Sorting
5     Midterm Exam on May 4
6-7   Relational Operator Implementations; Query Processing
7-8   Query Optimization
9     ML for RDBMSs
10    Parallel DBMSs and Dataflow Systems
10    Optional: Key-value stores, Graph DBMSs, ML systems
11    Final Exam on Jun 8 (Tue)
There will also be 2 invited guest lectures from industry.

Course Textbook
Prescribed Textbook: "Database Management Systems", 3rd Edition, Raghu Ramakrishnan and Johannes Gehrke
Aka the "Cow Book". Which cow are you?
Optional Textbook: "Database Systems: The Complete Book", H. Garcia-Molina, J. D. Ullman, and J. Widom

Prerequisites
❖ CSE 132A or equivalent introductory DB course or DSC 102
❖ Waivers possible; email me with justification (e.g., other DB course elsewhere or real-world experience with RDBMSs)
❖ CSE 120 or CSE 132B may be helpful but not required
❖ C++ is necessary for programming projects; check course webpage for resources
http://cseweb.ucsd.edu/classes/sp21/cse132C-a/

Components and Grading
❖ Project 1 (Buffer Manager): 10%
❖ Project 2 (B+ Tree Index): 25%
❖ Midterm Exam on May 4: 15%
❖ Cumulative Final Exam on Jun 8: 25%
❖ Quizzes: 20% (5% x 4)
❖ Peer Evaluation Activities: 5%
❖ Quizzes, Exams, and Peer Activities are open book/notes/Web, via Canvas with time windows; similar details for DSC 102 below
http://cseweb.ucsd.edu/~arunkk/dsc102_winter21/

Grading Scheme
Hybrid of relative and absolute; your grade is the better of the two.
Grade  Relative Bin (use strictest)  Absolute Cutoff (>=)
A+     Highest 5%                    95
A      Next 10% (5-15)               90
A-     Next 15% (15-30)              85
B+     Next 15% (30-45)              80
B      Next 15% (45-60)              75
B-     Next 15% (60-75)              70
C+     Next 5% (75-80)               65
C      Next 5% (80-85)               60
C-     Next 5% (85-90)               55
D      Next 5% (90-95)               50
F      Lowest 5%                     < 50
Example: Score 82 but 33rd percentile (i.e., in the 60-75% bin counted from the top); Relative: B-; Absolute: B+; so, B+.

Programming Projects
❖ BadgerDB: An RDBMS skeleton in C++ from UW-Madison
❖ Project 1: Buffer Manager
  ❖ Implement the "clock algorithm" for buffer replacement
❖ Project 2: B+ Tree Index
  ❖ Implement the data structure; insert/update/delete ops
❖ Teams of 2 or 1 only. No sharing of code across teams; don't post/search online; I take Academic Integrity very seriously!
❖ No late days! Plan well ahead of time and execute.
❖ See course projects page for details

Course Administrivia
❖ 3 key tools: Canvas + Zoom (mandatory); Piazza (optional)
❖ Lecture Slot: TueThu 2:00-3:20pm PT
  ❖ Live via Zoom with Q&A, breakout rooms, etc.
  ❖ Recordings posted to Canvas Media Gallery afterward
❖ Discussion Slot: Wed 2:00-2:50pm PT
  ❖ For review discussion; TA talks; NOT all slots will be used
❖ Canvas: Announcements, links to course webpages, all Zoom links (do NOT make them public!), Discussions, etc.
❖ Piazza: Discussion board; Canvas Discussions also fine; pitch in to discuss peers' questions

General Dos and Do NOTs
Do:
❖ Try to join the synchronous Zoom lectures and ask doubts/questions that the instructor can answer live
❖ View and review video lectures asynchronously yourself
❖ Participate in class discussions on Piazza
❖ Follow all announcements on Canvas
❖ Use "CSE132C:" as subject prefix for all emails to me/TA
Do NOT:
❖ Record any Zoom session without explicit permission of the instructor or other participating