Databases & External Memory

Databases & External Memory

Databases & External Memory Indexes, Write Optimization, and Crypto-searches Michael A. Bender Martin Farach-Colton Tokutek & Stony Brook Tokutek & Rutgers What’s a Database? DBs are systems for: • Storing data • Querying data. 9 count (*) where a<120; 5 809 99 20 8 202 16 1 14 STOC ’12 Tutorial: Databases and External Memory What’s a Database? DBs are systems for: • Storing data • Querying data. DBs can have SQL interface or something else. • We’ll talk some about so-called NoSQL systems. Data consists of a set of key,value pairs. • Each value can consist of a well defined tuple, where each component is called a field (e.g. relational DBs). Big Data. • Big data = bigger than RAM. STOC ’12 Tutorial: Databases and External Memory What’s this database tutorial? Traditionally: • Fast queries ⇒ sophisticated indexes and slow inserts. • Fast inserts ⇒ slow queries. 9 count (*) where a<120; 5 809 99 20 8 202 16 1 14 We want fast data ingestion + fast queries. STOC ’12 Tutorial: Databases and External Memory What’s this database tutorial? Database tradeoffs: • There is a query/insertion tradeoff. ‣ Algorithmicists mean one thing by this claim. ‣ DB users mean something different. • We’ll look at both. Overview of Tutorial: • Indexes: The DB tool for making queries fast • The difficulty of indexing under heavy data loads • Theory and Practice of write optimization • From data structure to database • Algorithmic challenges from maintaining two indexes STOC ’12 Tutorial: Databases and External Memory Row, Index, and Table a b c Row 100 5 45 • Key,value pair 101 92 2 • key = a, value = b,c 156 56 45 Index 165 6 2 • Ordering of rows by key 198 202 56 • Used to make queries fast 206 23 252 256 56 2 Table 412 43 45 • Set of indexes create table foo (a int, b int, c int, primary key(a)); STOC ’12 Tutorial: Databases and External Memory An index can select needed rows a b c 100 5 45 101 92 2 156 56 45 165 6 2 198 202 56 206 23 252 256 56 2 412 43 45 count (*) where a<120; STOC ’12 Tutorial: Databases and External Memory An index can select needed rows a b c 100 5 45 100 5 45 101 92 2 101 92 2 } 156 56 45 165 6 2 198 202 56 206 23 252 2 256 56 2 412 43 45 count (*) where a<120; STOC ’12 Tutorial: Databases and External Memory No good index means slow table scans a b c 100 5 45 101 92 2 156 56 45 165 6 2 198 202 56 206 23 252 256 56 2 412 43 45 count (*) where b>50 and b<100; STOC ’12 Tutorial: Databases and External Memory No good index means slow table scans a b c 100 5 45 100 5 45 101 92 2 101 92 2 156 56 45 156 56 45 165 6 2 165 6 2 1230 198 202 56 198 202 56 206 23 252 206 23 252 256 56 2 256 56 2 412 43 45 412 43 45 count (*) where b>50 and b<100; STOC ’12 Tutorial: Databases and External Memory You can add an index a b c b a 100 5 45 5 100 101 92 2 6 165 156 56 45 23 206 165 6 2 43 412 198 202 56 56 156 206 23 252 56 256 256 56 2 92 101 412 43 45 202 198 alter table foo add key(b); STOC ’12 Tutorial: Databases and External Memory A selective index speeds up queries a b c b a 100 5 45 5 100 101 92 2 6 165 156 56 45 23 206 165 6 2 43 412 198 202 56 56 156 206 23 252 56 256 256 56 2 92 101 412 43 45 202 198 count (*) where b>50 and b<100; STOC ’12 Tutorial: Databases and External Memory A selective index speeds up queries a b c b a 100 5 45 5 100 101 92 2 6 165 156 56 45 23 206 165 6 2 43 412 56 156 198 202 56 56 156 56 256 206 23 252 56 156256 } 92 101 256 56 2 5692 256101 412 43 45 20292 101198 3 count (*) where b>50 and b<100; STOC ’12 Tutorial: Databases and External Memory Selective indexes can still be slow a b c b a 100 5 45 5 100 101 92 2 6 165 156 56 45 23 206 165 6 2 43 412 198 202 56 56 156 206 23 252 56 256 256 56 2 92 101 412 43 45 202 198 sum(c) where b>50; STOC ’12 Tutorial: Databases and External Memory Selective indexes can still be slow 56 156 56 256 a b c b a 92 101 Selecting 100 5 45 5 100 202 198 fast b: on 101 92 2 6 165 156 56 45 23 206 165 6 2 43 412 198 202 56 56 156 206 23 252 56 256 256 56 2 92 101 412 43 45 202 198 sum(c) where b>50; STOC ’12 Tutorial: Databases and External Memory Selective indexes can still be slow 56 156 56 256 a b c b a 92 101 Selecting 100 5 45 5 100 202 198 fast b: on 101 92 2 6 165 156 56 45 23 206 165 6 2 43 412 198 202 56 56 156 206 23 252 56 256 256 56 2 92 101 summing c: slow c: summing 412 43 45 202 198 for info Fetching sum(c) where b>50; STOC ’12 Tutorial: Databases and External Memory Selective indexes can still be slow 56 156 56 256 a b c b a 92 101 Selecting 100 5 45 5 100 202 198 fast b: on 101 92 2 6 165 156 56 45 23 206 156 56 45 165 6 2 43 412 198 202 56 56 156 206 23 252 56 256 256 56 2 92 101 summing c: slow c: summing 412 43 45 202 198 for info Fetching sum(c) where b>50; STOC ’12 Tutorial: Databases and External Memory Selective indexes can still be slow 56 156 56 256 a b c b a 92 101 Selecting 100 5 45 5 100 202 198 fast b: on 101 92 2 6 165 156 56 45 23 206 156 56 45 165 6 2 43 412 198 202 56 56 156 206 23 252 56 256 256 56 2 92 101 summing c: slow c: summing 412 43 45 202 198 for info Fetching sum(c) where b>50; STOC ’12 Tutorial: Databases and External Memory Selective indexes can still be slow 56 156 56 256 a b c b a 92 101 Selecting 100 5 45 5 100 202 198 fast b: on 101 92 2 6 165 156 56 45 23 206 156 56 45 165 6 2 43 412 256 56 2 198 202 56 56 156 206 23 252 56 256 256 56 2 92 101 summing c: slow c: summing 412 43 45 202 198 for info Fetching sum(c) where b>50; STOC ’12 Tutorial: Databases and External Memory Selective indexes can still be slow 56 156 56 256 a b c b a 92 101 Selecting 100 5 45 5 100 202 198 fast b: on 101 92 2 6 165 156 56 45 23 206 156 56 45 165 6 2 43 412 256 56 2 198 202 56 56 156 101 92 2 Poor data locality data Poor 206 23 252 56 256 198 202 56 256 56 2 92 101 summing c: slow c: summing 412 43 45 202 198 for info Fetching 105 sum(c) where b>50; STOC ’12 Tutorial: Databases and External Memory Covering indexes speed up queries a b c b,c a 100 5 45 5,45 100 101 92 2 6,2 165 156 56 45 23,252 206 165 6 2 43,45 412 198 202 56 56,2 256 206 23 252 56,45 156 256 56 2 92,2 101 412 43 45 202,56 198 alter table foo add key(b,c); sum(c) where b>50; STOC ’12 Tutorial: Databases and External Memory Covering indexes speed up queries 56,56,22 256 a b c b,c a 56,56,4545 156 100 5 45 5,45 100 92,92,22 101 101 92 2 6,2 165 202,202,5656 198 156 56 45 23,252 206 165 6 2 43,45 412 198 202 56 56,2 256 105 206 23 252 56,45 156 256 56 2 92,2 101 412 43 45 202,56 198 alter table foo add key(b,c); sum(c) where b>50; STOC ’12 Tutorial: Databases and External Memory DB Performance and Indexes Read performance depends on having the right indexes for a query workload. • We’ve scratched the surface of index selection. • And there’s interesting query optimization going on. Write performance depends on speed of maintaining those indexes. Next: how to implement indexes and tables. STOC ’12 Tutorial: Databases and External Memory An index is a dictionary Dictionary API: maintain a set S subject to • insert(x): S ← S ∪ {x} • delete(x): S ← S - {x} • search(x): is x ∊ S? • successor(x): return min y > x s.t. y ∊ S • predecessor(y): return max y < x s.t. y ∊ S STOC ’12 Tutorial: Databases and External Memory A table is a set of indexes A table is a set of indexes with operations: • Add index: add key(f1,f2,...); • Drop index: drop key(f1,f2,...); • Add column: adds a field to primary key value.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    74 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us