LECTURE #08:QUERY EXECUTION Administrivia

DATA ANALYTICS USING DEEP LEARNING GT 8803 // FALL 2019 // JOY ARULRAJ LECTURE #08:QUERY EXECUTION administrivia • Reminders – Sign up for discussion slots on Thursday – Proposal presentations on next Wednesday – 2 page document + 5 minute presentation – Assignment 1 due on Wednesday GT 8803 // Fall 2019 2 LAST CLASS • Storage models: NSM, DSM, and FSM FLEXIBLE STORAGE MODEL ID University Enrollment City 1 Georgia Tech 15000 Atlanta 2 Wisconsin 30000 Madison 3 Carnegie Mellon 6000 Pittsburgh 4 UC Berkeley 30000 Berkeley GT 8803 // Fall 2019 3 LAST CLASS • Compression – Zone maps – Dictionary encoding Original Data val 100 200 300 400 400 GT 8803 // Fall 2019 4 LAST CLASS • Compression – Zone maps – Dictionary encoding Original Data Zone Map val type val 100 MIN 100 200 MAX 400 300 AVG 280 400 SUM 1400 400 COUNT 5 GT 8803 // Fall 2019 5 LAST CLASS • Compression SELECT * FROM table – Zone maps WHERE val > 600 – Dictionary encoding Original Data Zone Map val type val 100 MIN 100 200 MAX 400 300 AVG 280 400 SUM 1400 400 COUNT 5 GT 8803 // Fall 2019 6 LAST CLASS • Visual Storage Engine – Convolutional Auto Encoder COMPRESSED DATA ENCODER DECODER (CONVOLUTIONS) (DECONVOLUTIONS) GT 8803 // Fall 2019 7 TODAY’s AGENDA • Query Processing Models • Access Methods • Expression Evaluation • Visual Query Execution Engine GT 8803 // Fall 2019 8 QUERY PROCESSING MODELS GT 8803 // Fall 2018 9 ANATOMY OF A DATABASE SYSTEM Connection Manager + Admission Control Process Manager Query Parser Query Optimizer Query Processor Query Executor Query Lock Manager (Concurrency Control) Access Methods (or Indexes) Transactional Storage Manager Buffer Pool Manager Log Manager Memory Manager + Disk Manager Shared Utilities Networking Manager Source: Anatomy of a Database System GT 8803 // Fall 2019 10 QUERY PLAN • The operators are arranged in a tree. – Data flows from the leaves toward the root. – Output of the root node is the result of the query. A.id, B.value SELECT A.id, B.value p FROM A, B WHERE A.id = B.id ⨝ A.id=B.id AND B.value > 100 s value>100 AB GT 8803 // Fall 2019 11 QUERY PROCESSING MODELS • A DBMS's processing model defines how the system executes a query plan. – Different trade-offs for different workloads. – Top-down or Bottom-up execution – Determines what data is sent between operators • Three approaches: – Tuple-at-a-time Model – Operator-at-a-time Model – Vector-at-a-time Model GT 8803 // Fall 2019 12 #1: TUPLE-AT-A-TIME MODEL • Each query plan operator implements a next function. – On each invocation, the operator returns either a single tuple or a null marker if there are no more tuples. – The operator implements a loop that calls next on its children to retrieve their tuples and then process them. • Top-down plan processing. • Also called Iterator / Volcano / Pipeline model. GT 8803 // Fall 2019 13 #1: TUPLE-AT-A-TIME MODEL SELECT A.id, B.value FROM A, B WHERE A.id = B.id AND B.value > 100 p A.id, B.value ⨝ A.id=B.id s value>100 AB GT 8803 // Fall 2018 14 #1: TUPLE-AT-A-TIME MODEL SELECT A.id, B.value for t in child.Next(): emit(projection(t)) FROM A, B WHERE A.id = B.id AND for t1 in left.Next(): B.value > 100 buildHashTable(t1) for t in right.Next(): 2 p A.id, B.value if probe(t2): emit(t1⨝t2) for t in child.Next(): ⨝ A.id=B.id if evalPred(t): emit(t) s value>100 for t in A: for t in B: emit(t) emit(t) AB GT 8803 // Fall 2018 15 #1: TUPLE-AT-A-TIME MODEL SELECT A.id, B.value for t in child.Next(): emit(projection(t)) FROM A, B WHERE A.id = B.id AND for t1 in left.Next(): B.value > 100 buildHashTable(t1) for t in right.Next(): 2 p A.id, B.value if probe(t2): emit(t1⨝t2) for t in child.Next(): ⨝ A.id=B.id if evalPred(t): emit(t) s value>100 for t in A: for t in B: emit(t) emit(t) AB GT 8803 // Fall 2018 16 #1: TUPLE-AT-A-TIME MODEL SELECT A.id, B.value for t in child.Next(): FROM 1 emit(projection(t)) A, B WHERE A.id = B.id AND for t1 in left.Next(): B.value > 100 buildHashTable(t1) for t in right.Next(): 2 p A.id, B.value if probe(t2): emit(t1⨝t2) for t in child.Next(): ⨝ A.id=B.id if evalPred(t): emit(t) s value>100 for t in A: for t in B: emit(t) emit(t) AB GT 8803 // Fall 2018 17 #1: TUPLE-AT-A-TIME MODEL SELECT A.id, B.value for t in child.Next(): FROM 1 emit(projection(t)) A, B WHERE A.id = B.id AND for t1 in left.Next(): B.value > 100 buildHashTable(t1) for t in right.Next(): 2 p A.id, B.value if probe(t2): emit(t1⨝t2) for t in child.Next(): ⨝ A.id=B.id if evalPred(t): emit(t) s value>100 for t in A: for t in B: emit(t) emit(t) AB GT 8803 // Fall 2018 18 #1: TUPLE-AT-A-TIME MODEL SELECT A.id, B.value for t in child.Next(): FROM 1 emit(projection(t)) A, B WHERE A.id = B.id AND for t1 in left.Next(): B.value > 100 buildHashTable(t ) 2 1 for t in right.Next(): 2 p A.id, B.value if probe(t2): emit(t1⨝t2) for t in child.Next(): ⨝ A.id=B.id if evalPred(t): emit(t) s value>100 for t in A: for t in B: emit(t) emit(t) AB GT 8803 // Fall 2018 19 #1: TUPLE-AT-A-TIME MODEL SELECT A.id, B.value for t in child.Next(): FROM 1 emit(projection(t)) A, B WHERE A.id = B.id AND for t1 in left.Next(): B.value > 100 buildHashTable(t ) 2 1 for t in right.Next(): 2 p A.id, B.value if probe(t2): emit(t1⨝t2) for t in child.Next(): ⨝ A.id=B.id if evalPred(t): emit(t) s value>100 for t in A: for t in B: 3 emit(t) emit(t) AB GT 8803 // Fall 2018 20 #1: TUPLE-AT-A-TIME MODEL SELECT A.id, B.value for t in child.Next(): FROM 1 emit(projection(t)) A, B WHERE A.id = B.id AND for t1 in left.Next(): B.value > 100 buildHashTable(t ) 2 1 for t in right.Next(): 2 p A.id, B.value if probe(t2): emit(t1⨝t2) for t in child.Next(): ⨝ A.id=B.id if evalPred(t): 4 emit(t) s value>100 for t in A: for t in B: 3 emit(t) emit(t) 5 AB GT 8803 // Fall 2018 21 #1: TUPLE-AT-A-TIME MODEL • This is used in almost every DBMS. – Allows for tuple pipelining. • Some operators will block until children emit all of their tuples. – Joins, Subqueries, Order By – Known as pipeline breakers • Output control works easily with this approach. – Limit GT 8803 // Fall 2019 22 #2: OPERATOR-AT-A-TIME MODEL • Each operator processes its input all at once and then emits its output all at once. – The operator "materializes" it output as a single result. – The DBMS can push down hints into to avoid scanning too many tuples. • Bottom-up plan processing. GT 8803 // Fall 2019 23 #2: OPERATOR-AT-A-TIME MODEL out = { } SELECT A.id, B.value for t in child.Output(): FROM A, B out.add(projection(t)) WHERE A.id = B.id out = { } AND B.value > 100 for t1 in left.Output(): buildHashTable(t ) 1 p A.id, B.value for t2 in right.Output(): if probe(t2): out.add(t1⨝t2) out = { } ⨝ A.id=B.id for t in child.Output(): if evalPred(t): out.add(t) s value>100 out = { } out = { } for t in A: for t in B: out.add(t) out.add(t) AB GT 8803 // Fall 2018 24 #2: OPERATOR-AT-A-TIME MODEL out = { } SELECT A.id, B.value for t in child.Output(): FROM A, B out.add(projection(t)) WHERE A.id = B.id out = { } AND B.value > 100 for t1 in left.Output(): buildHashTable(t ) 1 p A.id, B.value for t2 in right.Output(): if probe(t2): out.add(t1⨝t2) out = { } ⨝ A.id=B.id for t in child.Output(): if evalPred(t): out.add(t) s value>100 out = { } out = { } 1 for t in A: for t in B: out.add(t) out.add(t) AB GT 8803 // Fall 2018 25 #2: OPERATOR-AT-A-TIME MODEL out = { } SELECT A.id, B.value for t in child.Output(): FROM A, B out.add(projection(t)) WHERE A.id = B.id out = { } AND B.value > 100 for t1 in left.Output(): buildHashTable(t ) 1 p A.id, B.value for t2 in right.Output(): if probe(t2): out.add(t1⨝t2) out = { } ⨝ A.id=B.id for t in child.Output(): if evalPred(t): out.add(t) s value>100 out = { } out = { } 1 for t in A: for t in B: out.add(t) out.add(t) AB GT 8803 // Fall 2018 26 #2: OPERATOR-AT-A-TIME MODEL out = { } SELECT A.id, B.value for t in child.Output(): FROM A, B out.add(projection(t)) WHERE A.id = B.id out = { } AND B.value > 100 for t1 in left.Output(): buildHashTable(t ) 1 p A.id, B.value for t2 in right.Output(): if probe(t2): out.add(t1⨝t2) out = { } ⨝ A.id=B.id for t in child.Output(): if evalPred(t): out.add(t) s value>100 out = { } out = { } 1 for t in A: for t in B: 2 out.add(t) out.add(t) AB GT 8803 // Fall 2018 27 #2: OPERATOR-AT-A-TIME MODEL out = { } SELECT A.id, B.value for t in child.Output(): FROM A, B out.add(projection(t)) WHERE A.id = B.id out = { } AND B.value > 100 for t1 in left.Output(): buildHashTable(t ) 1 p A.id, B.value for t2 in right.Output(): if probe(t2): out.add(t1⨝t2) out = { } ⨝ A.id=B.id for t in child.Output(): 3 if evalPred(t): out.add(t) s value>100 out = { } out = { } 1 for t in A: for t in B: 2 out.add(t) out.add(t) AB GT 8803 // Fall 2018 28 #2: OPERATOR-AT-A-TIME MODEL out = { } SELECT A.id, B.value for t in child.Output(): FROM A, B out.add(projection(t)) WHERE A.id = B.id out = { } AND B.value > 100 for t in left.Output(): 4 1 buildHashTable(t ) 1 p A.id, B.value for t2 in right.Output(): if probe(t2): out.add(t1⨝t2) out = { } ⨝ A.id=B.id for t in child.Output(): 3 if evalPred(t): out.add(t) s value>100 out = { } out = { } 1 for t in A: for t in B: 2 out.add(t) out.add(t) AB GT 8803 // Fall 2018 29 #2: OPERATOR-AT-A-TIME MODEL out = { } SELECT A.id, B.value 5 for t in child.Output(): FROM A, B out.add(projection(t)) WHERE A.id = B.id out = { } AND B.value > 100 for t in left.Output(): 4 1 buildHashTable(t ) 1 p A.id, B.value for t2 in right.Output(): if probe(t2): out.add(t1⨝t2) out = { } ⨝ A.id=B.id for t in child.Output(): 3 if evalPred(t): out.add(t) s value>100 out = { } out = { } 1 for t in A: for t in B: 2 out.add(t) out.add(t) AB GT 8803 // Fall 2018 30 #2: OPERATOR-AT-A-TIME MODEL • Better for OLTP workloads – Transactions typically only access a small number of tuples at a time.

Load more