Relational Algebra 2 & Query Optimization

L22: The Relational Model (continued)
CS3200 Database design (sp18 s2)
https://course.ccs.neu.edu/cs3200sp18s2/
4/5/2018

Announcements!
• Please pick up your exam if you have not done so yet
• HW6 will include Redis and MongoDB exercises with minimal install overhead (still in preparation)
• Final class calendar
• Outline today
  - Relational algebra
  - Optimization
  - NoSQL (start, continuing next time)

RDBMS Architecture
• How does a SQL engine work?

    SQL Query → Relational Algebra (RA) Plan → Optimized RA Plan → Execution

  - SQL Query: declarative query (from the user)
  - Relational Algebra (RA) Plan: translate to a relational algebra expression
  - Optimized RA Plan: find a logically equivalent, but more efficient, RA expression
  - Execution: execute each operator of the optimized plan!

Relational Algebra allows us to translate declarative (SQL) queries into precise and optimizable expressions!

Relational Algebra (RA)
• Five basic operators:
  1. Selection: σ
  2. Projection: Π
  3. Cartesian Product: ×
  4. Union: ∪
  5. Difference: −
• We'll look at the first three of these first!
• Derived or auxiliary operators:
  - Intersection, complement
  - Joins (natural, equi-join, theta join, semi-join)
  - Renaming: ρ
  - Division
• We'll also look at one example of a derived operator (natural join) and a special operator (renaming)

Keep in mind: RA operates on sets!
• RDBMSs use multisets; however, in the relational algebra formalism we will consider sets!
• Also: we will consider the named perspective, where every attribute must have a unique name
  - → attribute order does not matter…
Now on to the basic RA operators…

1. Selection (σ)
Students(sid,sname,gpa)
• Returns all tuples which satisfy a condition
• Notation: σ_c(R)
• Examples
  - σ_{Salary > 40000}(Employee)
  - σ_{name = "Smith"}(Employee)
• The condition c can use =, <, ≤, >, ≥, <>

SQL:
    SELECT *
    FROM Students
    WHERE gpa > 3.5;
RA: σ_{gpa > 3.5}(Students)

Another example:

Employee
    SSN      Name   Salary
    1234545  John   200000
    5423341  Smith  600000
    4352342  Fred   500000

σ_{Salary > 400000}(Employee)
    SSN      Name   Salary
    5423341  Smith  600000
    4352342  Fred   500000

2. Projection (Π)
Students(sid,sname,gpa)
• Eliminates columns, then removes duplicates
• Notation: Π_{A1,…,An}(R)
• Example: project social-security number and names:
  - Π_{SSN,Name}(Employee)
  - Output schema: Answer(SSN, Name)

SQL:
    SELECT DISTINCT sname, gpa
    FROM Students;
RA: Π_{sname,gpa}(Students)

Another example:

Employee
    SSN      Name  Salary
    1234545  John  200000
    5423341  John  600000
    4352342  John  200000

Π_{SSN}(Employee)
    SSN
    1234545
    5423341
    4352342

Π_{Name,Salary}(Employee)
    Name  Salary
    John  200000
    John  600000

Note that RA Operators are Compositional!
Students(sid,sname,gpa)

SQL:
    SELECT DISTINCT sname, gpa
    FROM Students
    WHERE gpa > 3.5;

How do we represent this query in RA?
    Π_{sname,gpa}(σ_{gpa > 3.5}(Students))
    σ_{gpa > 3.5}(Π_{sname,gpa}(Students))
Are these logically equivalent?

3. Cross-Product (×)
Students(sid,sname,gpa)
People(ssn,pname,address)
• Pairs each tuple in R1 with each tuple in R2
• Notation: R1 × R2
• Example:
  - Employee × Dependents
• Rare in practice; mainly used to express joins

SQL:
    SELECT *
    FROM Students, People;
RA: Students × People

Another example:

People
    ssn      pname  address
    1234545  John   216 Rosse
    5423341  Bob    217 Rosse

Students
    sid  sname  gpa
    001  John   3.4
    002  Bob    1.3

Students × People
    ssn      pname  address    sid  sname  gpa
    1234545  John   216 Rosse  001  John   3.4
    5423341  Bob    217 Rosse  001  John   3.4
    1234545  John   216 Rosse  002  Bob    1.3
    5423341  Bob    217 Rosse  002  Bob    1.3

4. Renaming (ρ)
Students(sid,sname,gpa)
• Changes the schema, not the instance
• A 'special' operator: neither basic nor derived
• Notation: ρ_{B1,…,Bn}(R)
• Note: this is shorthand for the proper form (since names, not order, matter!):
  - ρ_{A1→B1,…,An→Bn}(R)

SQL:
    SELECT sid AS studId,
           sname AS name,
           gpa AS gradePtAvg
    FROM Students;
RA: ρ_{studId,name,gradePtAvg}(Students)

We care about this operator because we are working in a named perspective

Another example:

Students
    sid  sname  gpa
    001  John   3.4
    002  Bob    1.3

ρ_{studId,name,gradePtAvg}(Students)
    studId  name  gradePtAvg
    001     John  3.4
    002     Bob   1.3

5. Natural Join (⋈)
Students(sid,name,gpa)
People(ssn,name,address)
• Notation: R1 ⋈ R2
• Joins R1 and R2 on equality of all shared attributes
  - If R1 has attribute set A, and R2 has attribute set B, and they share attributes A ∩ B = C, can also be written: R1 ⋈_C R2
• Our first example of a derived RA operator:
  - Meaning: R1 ⋈ R2 = Π_{A ∪ B}(σ_{C=D}(ρ_{C→D}(R1) × R2))
  - Where:
    • The rename ρ_{C→D} renames the shared attributes in one of the relations
    • The selection σ_{C=D} checks equality of the shared attributes
    • The projection Π_{A ∪ B} eliminates the duplicate common attributes

SQL:
    SELECT DISTINCT sid, S.name, gpa, ssn, address
    FROM Students S, People P
    WHERE S.name = P.name;
RA: Students ⋈ People

Another example:

Students S
    sid  S.name  gpa
    001  John    3.4
    002  Bob     1.3

People P
    ssn      P.name  address
    1234545  John    216 Rosse
    5423341  Bob     217 Rosse

Students ⋈ People
    sid  S.name  gpa  ssn      address
    001  John    3.4  1234545  216 Rosse
    002  Bob     1.3  5423341  217 Rosse

Natural Join practice
• Given schemas R(A, B, C, D), S(A, C, E), what is the schema of R ⋈ S?
• Given R(A, B, C), S(D, E), what is R ⋈ S?
• Given R(A, B), S(A, B), what is R ⋈ S?

Example: Converting SFW Query -> RA
Students(sid,sname,gpa)
People(ssn,sname,address)

SQL:
    SELECT DISTINCT gpa, address
    FROM Students S, People P
    WHERE gpa > 3.5 AND S.sname = P.sname;

How do we represent this query in RA?
    Π_{gpa,address}(σ_{gpa > 3.5}(S ⋈ P))
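The operators covered so far can be tried out in a few lines of Python. The following is only a minimal sketch, not how an RDBMS implements them: it assumes a relation is a Python set of rows, each row a frozenset of (attribute, value) pairs, which matches the set semantics and the named perspective used in these slides. The helper names (relation, select, project, cross, rename, natural_join) and the temporary "_r1" suffix are illustrative choices, and the 3.0 threshold in the final query is an assumption made so that the small example data returns a row.

```python
def relation(rows):
    """Build a relation (a set of rows) from an iterable of dicts."""
    return {frozenset(r.items()) for r in rows}

def attributes(rel):
    """Attribute names that occur in the relation's rows."""
    return {a for row in rel for a, _ in row}

def select(cond, rel):
    """sigma_cond(rel): keep the rows for which cond(row_as_dict) holds."""
    return {row for row in rel if cond(dict(row))}

def project(attrs, rel):
    """Pi_attrs(rel): drop other attributes; the set removes duplicates."""
    return {frozenset((a, v) for a, v in row if a in attrs) for row in rel}

def cross(r1, r2):
    """r1 x r2: assumes the two relations share no attribute names."""
    return {a | b for a in r1 for b in r2}

def rename(mapping, rel):
    """rho_{old->new}(rel): change attribute names, never the values."""
    return {frozenset((mapping.get(a, a), v) for a, v in row) for row in rel}

def natural_join(r1, r2):
    """Derived operator: Pi_{A u B}(sigma_{C=D}(rho_{C->D}(r1) x r2))."""
    shared = attributes(r1) & attributes(r2)
    # Hypothetical temporary names; assumes no attribute already ends in "_r1".
    tmp = {c: c + "_r1" for c in shared}
    paired = cross(rename(tmp, r1), r2)
    matched = select(lambda t: all(t[tmp[c]] == t[c] for c in shared), paired)
    return project(attributes(r1) | attributes(r2), matched)

students = relation([
    {"sid": "001", "name": "John", "gpa": 3.4},
    {"sid": "002", "name": "Bob", "gpa": 1.3},
])
people = relation([
    {"ssn": "1234545", "name": "John", "address": "216 Rosse"},
    {"ssn": "5423341", "name": "Bob", "address": "217 Rosse"},
])
employees = relation([
    {"ssn": "1234545", "name": "John", "salary": 200000},
    {"ssn": "5423341", "name": "John", "salary": 600000},
    {"ssn": "4352342", "name": "John", "salary": 200000},
])

joined = natural_join(students, people)
name_salary = project({"name", "salary"}, employees)
# Pi_{gpa,address}(sigma_{gpa > 3.0}(Students join People))
answer = project({"gpa", "address"}, select(lambda t: t["gpa"] > 3.0, joined))
```

Because every operator takes relations and returns a relation, calls nest freely, which is the compositionality the slides point out. Representing rows as frozensets of pairs is what makes attribute order irrelevant and lets projection eliminate duplicates with no extra work.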
Logical Equivalence of RA Plans
• Given relations R(A,B) and S(B,C):
  - Here, projection & selection commute:
    • σ_{A=5}(Π_A(R)) = Π_A(σ_{A=5}(R))
  - What about here?
    • σ_{A=5}(Π_B(R)) ?= Π_B(σ_{A=5}(R))
We'll look at this in more depth in query optimization…

RDBMS Architecture
• How does a SQL engine work?
  - We saw how we can transform declarative SQL queries into precise, compositional RA plans
  - We'll look at how to then optimize these plans
• How is the RA "plan" executed?
  - We have already seen how to execute a few basic operators!

RA Plan Execution
• Natural Join / Join:
  - We saw how to use memory & IO cost considerations to pick the correct algorithm to execute a join with BNLJ or SMJ (we skipped HJ)
• Selection:
  - We saw how to use indexes to aid selection
  - Can always fall back on scan / binary search as well
• Projection:
  - The main operation here is finding distinct values of the projected tuples; we briefly discussed how to do this with sorting (we skipped hashing)
We already know how to execute all the basic operators!

3. Advanced Relational Algebra (very brief)

What we will briefly cover next
• Set Operations in RA
• Extensions & Limitations

Relational Algebra (RA)
• Five basic operators:
  1. Selection: σ
  2. Projection: Π
  3. Cartesian Product: ×
  4. Union: ∪
  5. Difference: −
• We'll now look at the last two of these (union and difference)
• Derived or auxiliary operators:
  - Intersection, complement
  - Joins (natural, equi-join, theta join, semi-join)
  - Renaming: ρ
  - Division

1. Union (∪) and 2. Difference (−)
• R1 ∪ R2
• Example:
  - ActiveEmployees ∪ RetiredEmployees
• R1 − R2
• Example:
  - AllEmployees − RetiredEmployees

What about Intersection (∩)?
• It is a derived operator
• R1 ∩ R2 = R1 − (R1 − R2)
• Also expressed as a join!
• Example
  - UnionizedEmployees ∩ RetiredEmployees

RA Expressions Can Get Complex!
[Figure: RA expression tree. Root: Π_name. Joins on buyer-ssn = ssn, pid = pid, and seller-ssn = ssn. Other inner operators: Π_ssn, Π_pid, σ_{name=fred}, σ_{name=gizmo}. Leaves: Person, Purchase, Person, Product.]

Operations on Multisets
• All RA operations need to be defined carefully on bags
  - σ_C(R): preserve the number of occurrences
  - Π_A(R): no duplicate elimination
  - Cross-product, join: no duplicate elimination
This is important: relational engines work on multisets, not sets!

RA has Limitations!
• Cannot compute "transitive closure"

    Name1  Name2  Relationship
    Fred   Mary   Father
    Mary   Joe    Cousin
    Mary   Bill   Spouse
    Nancy  Lou    Sister

• Find all direct and indirect relatives of Fred
• Cannot be expressed in RA!
  - Need to write a C program, use a graph engine, or modern SQL…

Activity-45.ipynb as part of HW6

L22: Query Optimization

Logical vs. Physical Optimization
• Logical optimization (RA Plan → Optimized RA Plan):
  - Find equivalent plans that are more efficient
  - Intuition: Minimize # of tuples at each step by changing the order of RA operators
• Physical optimization (Optimized RA Plan → Execution):
  - Find the algorithm with the lowest IO cost to execute our plan
  - Intuition: Calculate based on physical parameters (buffer size, etc.) and estimates of data size (histograms)
• We only discuss logical optimization today

1. Logical Optimization
1) Optimization of RA Plans
2) ACTIVITY: RA Plan Optimization
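The logical-optimization intuition (minimize the number of tuples at each step by reordering RA operators) can be illustrated with a small selection-pushdown sketch. It compares two logically equivalent plans, σ_{gpa > 3.0}(Students ⋈ People) and σ_{gpa > 3.0}(Students) ⋈ People, and counts how many tuple pairs the join has to compare. Everything here is an assumption made for illustration: the plain list-of-dicts representation, the nested-loop join, the helper names, and the 3.0 threshold are not taken from any particular engine.

```python
def join_on_name(left, right):
    """Nested-loop join on the shared attribute "name".

    Returns the joined rows and the number of tuple pairs compared.
    """
    rows = [{**l, **r} for l in left for r in right if l["name"] == r["name"]]
    return rows, len(left) * len(right)

def select_gpa(rows, threshold):
    """sigma_{gpa > threshold}(rows)."""
    return [r for r in rows if r["gpa"] > threshold]

students = [
    {"sid": "001", "name": "John", "gpa": 3.4},
    {"sid": "002", "name": "Bob", "gpa": 1.3},
]
people = [
    {"ssn": "1234545", "name": "John", "address": "216 Rosse"},
    {"ssn": "5423341", "name": "Bob", "address": "217 Rosse"},
]

# Plan 1: join first, then select.
joined, plan1_pairs = join_on_name(students, people)
plan1 = select_gpa(joined, 3.0)

# Plan 2: push the selection below the join.
filtered = select_gpa(students, 3.0)
plan2, plan2_pairs = join_on_name(filtered, people)

print(plan1 == plan2, plan1_pairs, plan2_pairs)
```

Both plans return the same rows, but the second feeds fewer tuples into the join. The rewrite is safe here because the selection condition only mentions gpa, an attribute of Students alone.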
