Relational algebra
Introduction to Database Design 2012, Lecture 5
Rasmus Ejlers Møgelberg
Overview
• Use of logic in constructing queries
• Relational algebra
• Translating queries to relational algebra
• Equations expressed in relational algebra
Use of logic in constructing queries
• Consider the following problem:
Find all students who have taken all courses offered by the biology department
• Expressed more formally
Find all students s such that for all courses c, if c is offered by ‘Biology’ then s has taken c
• Translate to SQL:
  select * from student where [???]
Use of logic in constructing queries
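Find all students s such that for all courses c, if c is offered by ‘Biology’ then s has taken c
• The problem is not directly suitable for SQL, because it uses ‘for all’ and ‘if ... then ...’: SQL has ‘not exists’, but no general ‘for all’ over rows and no ‘if ... then ...’ connective
• So reformulate: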
Find all students s such that there is no course c such that c is offered by ‘Biology’ and s has not taken c
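• (This step uses classical logic: ‘for all c, if P(c) then Q(c)’ is equivalent to ‘there is no c such that P(c) and not Q(c)’, i.e. $\forall c\,(P(c) \Rightarrow Q(c)) \iff \neg \exists c\,(P(c) \wedge \neg Q(c))$)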
Use of logic in constructing queries
Find all students s such that there is no course c such that c is offered by ‘Biology’ and s has not taken c
• This can be formulated in SQL:
  select * from student
  where not exists
    (select * from course
     where dept_name = 'Biology'
       and course_id not in
         (select course_id from takes where takes.id = student.id))
• The middle subquery finds all courses offered by Biology that are not taken by the student
• The innermost subquery finds all courses taken by the student
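As a sanity check of the reformulation, here is a small sketch that runs the ‘not exists’ query in SQLite, through Python's standard `sqlite3` module, and compares the answer with a direct Python test of the original ‘for all’ condition. The tables and rows are hypothetical toy data, cut down to the columns the query needs, and SQLite stands in for MySQL.

```python
import sqlite3

# Hypothetical toy data, reduced to the columns the query needs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table student (id text primary key, name text);
    create table course (course_id text primary key, dept_name text);
    create table takes (id text, course_id text);
    insert into student values ('1', 'Ann'), ('2', 'Bob'), ('3', 'Cy');
    insert into course values ('BIO-101', 'Biology'), ('BIO-301', 'Biology'),
                              ('CS-101', 'Comp. Sci.');
    insert into takes values ('1', 'BIO-101'), ('1', 'BIO-301'),
                             ('2', 'BIO-101'), ('3', 'CS-101');
""")

# Reformulated version: there is no Biology course the student has not taken.
query = """
    select name from student
    where not exists
      (select * from course
       where dept_name = 'Biology'
         and course_id not in
           (select course_id from takes where takes.id = student.id))
"""
not_exists_result = {name for (name,) in conn.execute(query)}

# Original "for all" version, checked directly in Python.
biology = {c for (c,) in conn.execute(
    "select course_id from course where dept_name = 'Biology'")}
taken = {}
for sid, cid in conn.execute("select id, course_id from takes"):
    taken.setdefault(sid, set()).add(cid)
for_all_result = {name for sid, name in conn.execute("select id, name from student")
                  if all(c in taken.get(sid, set()) for c in biology)}

print(not_exists_result, for_all_result)  # {'Ann'} {'Ann'}
```

Bob is missing BIO-301 and Cy has taken no Biology course, so both formulations select only Ann.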
More logic
• A similar analysis is needed for the challenging exercises this week
• You will see more logic in the course Foundations of Computing
Relational algebra
Relational algebra
• A language for expressing basic operations in the relational model
• Two purposes:
  - Express the meaning of queries
  - Express execution plans in DBMSs
• SQL is declarative (what)
• Relational algebra is procedural (how)
Relational algebra in DBMSs
[Illustration from the book]
Projection
• In SQL:
  select name, salary from instructor;
• In relational algebra:
  $\Pi_{name,\ salary}(instructor)$
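Here is a minimal Python sketch of projection. It assumes that a relation is a list of rows and that each row is a dict from attribute name to value. This encoding is for illustration only and is not how a DBMS stores data. Relational algebra works on sets, so the sketch drops duplicate rows.

```python
def project(attrs, relation):
    """Π_attrs(relation): keep only the listed attributes.

    Relational algebra works on sets, so duplicate rows are dropped;
    the order of first occurrence is kept for readability."""
    result = []
    for row in relation:
        projected = {a: row[a] for a in attrs}
        if projected not in result:
            result.append(projected)
    return result

# Three rows from the instructor instance used on these slides.
instructor = [
    {"ID": "10101", "name": "Srinivasan", "dept_name": "Comp. Sci.", "salary": 65000},
    {"ID": "12121", "name": "Wu", "dept_name": "Finance", "salary": 90000},
    {"ID": "45565", "name": "Katz", "dept_name": "Comp. Sci.", "salary": 75000},
]

print(project(["name", "salary"], instructor))
print(project(["dept_name"], instructor))
# [{'dept_name': 'Comp. Sci.'}, {'dept_name': 'Finance'}]
```

Projecting onto dept_name keeps only one ‘Comp. Sci.’ row. The SQL query `select dept_name from instructor` would return that value twice unless written with `select distinct`.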
Selection
• In SQL:
  select * from instructor where salary > 90000;
• In relational algebra:
  $\sigma_{salary > 90000}(instructor)$
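Selection uses the same illustrative encoding, with rows as dicts, and takes the condition as a Python function. The rows come from the example instructor instance on these slides.

```python
def select(predicate, relation):
    """σ_predicate(relation): keep the rows for which predicate(row) is true."""
    return [row for row in relation if predicate(row)]

# Three rows from the instructor instance used on these slides.
instructor = [
    {"ID": "12121", "name": "Wu", "dept_name": "Finance", "salary": 90000},
    {"ID": "22222", "name": "Einstein", "dept_name": "Physics", "salary": 95000},
    {"ID": "83821", "name": "Brandt", "dept_name": "Comp. Sci.", "salary": 92000},
]

high_paid = select(lambda row: row["salary"] > 90000, instructor)
print([row["name"] for row in high_paid])  # ['Einstein', 'Brandt']
```

Wu earns exactly 90000, so the strict condition excludes that row. Selection also keeps every attribute of the rows it returns.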
Combining selection and projection
• In SQL:
  select name, dept_name from instructor where salary > 90000;
• In relational algebra:
  $\Pi_{name,\ dept\_name}(\sigma_{salary > 90000}(instructor))$
Translating SQL into relational algebra
• The expression
  select name, dept_name from instructor where salary > 90000;
• is translated to
  $\Pi_{name,\ dept\_name}(\sigma_{salary > 90000}(instructor))$
• The relational algebra expression says:
  - First do the selection
  - Then do the projection
• Relational algebra is procedural
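We can check the translation on a small example. This sketch uses the list-of-dicts encoding for the algebra side and Python's built-in `sqlite3` module, not MySQL, for the SQL side. The three rows come from the example instance on these slides. As the expression prescribes, the sketch applies the selection first and the projection second.

```python
import sqlite3

def select(predicate, relation):
    """σ: keep the rows satisfying the predicate."""
    return [row for row in relation if predicate(row)]

def project(attrs, relation):
    """Π: keep the listed attributes; returns a set of tuples (set semantics)."""
    return {tuple(row[a] for a in attrs) for row in relation}

# Three rows from the instructor instance used on these slides.
columns = ("ID", "name", "dept_name", "salary")
rows = [("22222", "Einstein", "Physics", 95000),
        ("33456", "Gold", "Physics", 87000),
        ("83821", "Brandt", "Comp. Sci.", 92000)]
instructor = [dict(zip(columns, r)) for r in rows]

# Relational algebra: first the selection, then the projection.
algebra = project(["name", "dept_name"],
                  select(lambda r: r["salary"] > 90000, instructor))

# The same query in SQL, on an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("create table instructor (ID text, name text, dept_name text, salary integer)")
conn.executemany("insert into instructor values (?, ?, ?, ?)", rows)
sql = set(conn.execute(
    "select name, dept_name from instructor where salary > 90000"))

print(algebra == sql)  # True
```

Converting the SQL result to a set hides duplicates. That is harmless here because no two selected instructors share a name and a department.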
Syntax trees
• The syntax tree

        *
       / \
      +   5
     / \
    3   4

• represents the expression (3+4)*5
• Trees grow downwards in computer science!
• Evaluation is bottom-up
• A useful graphical way of representing evaluation order (no need for parentheses)
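Here is a small sketch of bottom-up evaluation. It encodes the tree as nested Python tuples `(operator, left, right)` with numbers as leaves. This encoding is hypothetical and chosen for brevity.

```python
import operator

# Hypothetical encoding: (operator, left subtree, right subtree); leaves are numbers.
# This tree encodes (3 + 4) * 5.
tree = ("*", ("+", 3, 4), 5)

OPS = {"+": operator.add, "*": operator.mul}

def evaluate(node):
    """Evaluate bottom-up: both subtrees first, then the operator at the root."""
    if isinstance(node, tuple):
        op, left, right = node
        return OPS[op](evaluate(left), evaluate(right))
    return node

print(evaluate(tree))  # 35
```

The nesting of the tuples plays the role of the parentheses, which is why the tree form needs none.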
Syntax trees for relational algebra

    Π name, dept_name
            |
    σ salary > 90000
            |
        instructor

• represents $\Pi_{name,\ dept\_name}(\sigma_{salary > 90000}(instructor))$
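The same idea applies to relational algebra trees. The node encoding below uses `"relation"`, `"select"` and `"project"` tuples. It is hypothetical and covers only the two operators in this tree. The rows come from the example instance on these slides.

```python
# Hypothetical encoding of relational algebra syntax trees as nested tuples:
#   ("relation", name) | ("select", predicate, child) | ("project", attrs, child)
tree = ("project", ["name", "dept_name"],
        ("select", lambda row: row["salary"] > 90000,
         ("relation", "instructor")))

# Three rows from the instructor instance used on these slides.
database = {
    "instructor": [
        {"ID": "15151", "name": "Mozart", "dept_name": "Music", "salary": 40000},
        {"ID": "22222", "name": "Einstein", "dept_name": "Physics", "salary": 95000},
        {"ID": "83821", "name": "Brandt", "dept_name": "Comp. Sci.", "salary": 92000},
    ]
}

def evaluate(node, db):
    """Evaluate bottom-up: a node's child is computed before the node itself."""
    kind = node[0]
    if kind == "relation":
        return db[node[1]]
    if kind == "select":
        _, predicate, child = node
        return [row for row in evaluate(child, db) if predicate(row)]
    if kind == "project":
        _, attrs, child = node
        result = []
        for row in evaluate(child, db):
            projected = {a: row[a] for a in attrs}
            if projected not in result:  # set semantics: drop duplicates
                result.append(projected)
        return result
    raise ValueError(f"unknown node type: {kind}")

print(evaluate(tree, database))
# [{'name': 'Einstein', 'dept_name': 'Physics'}, {'name': 'Brandt', 'dept_name': 'Comp. Sci.'}]
```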
Cartesian products
mysql> select * from instructor, department;
+-------+------------+------------+----------+------------+----------+-----------+
| ID    | name       | dept_name  | salary   | dept_name  | building | budget    |
+-------+------------+------------+----------+------------+----------+-----------+
| 10101 | Srinivasan | Comp. Sci. | 65000.00 | Biology    | Watson   |  90000.00 |
| 10101 | Srinivasan | Comp. Sci. | 65000.00 | Comp. Sci. | Taylor   | 100000.00 |
| 10101 | Srinivasan | Comp. Sci. | 65000.00 | Elec. Eng. | Taylor   |  85000.00 |
| 10101 | Srinivasan | Comp. Sci. | 65000.00 | Finance    | Painter  | 120000.00 |
| 10101 | Srinivasan | Comp. Sci. | 65000.00 | History    | Painter  |  50000.00 |
| 10101 | Srinivasan | Comp. Sci. | 65000.00 | Music      | Packard  |  80000.00 |
| 10101 | Srinivasan | Comp. Sci. | 65000.00 | Physics    | Watson   |  70000.00 |
| 12121 | Wu         | Finance    | 90000.00 | Biology    | Watson   |  90000.00 |
| 12121 | Wu         | Finance    | 90000.00 | Comp. Sci. | Taylor   | 100000.00 |
| 12121 | Wu         | Finance    | 90000.00 | Elec. Eng. | Taylor   |  85000.00 |
| 12121 | Wu         | Finance    | 90000.00 | Finance    | Painter  | 120000.00 |
| 12121 | Wu         | Finance    | 90000.00 | History    | Painter  |  50000.00 |
| 12121 | Wu         | Finance    | 90000.00 | Music      | Packard  |  80000.00 |
| 12121 | Wu         | Finance    | 90000.00 | Physics    | Watson   |  70000.00 |
| 15151 | Mozart     | Music      | 40000.00 | Biology    | Watson   |  90000.00 |
| 15151 | Mozart     | Music      | 40000.00 | Comp. Sci. | Taylor   | 100000.00 |
| 15151 | Mozart     | Music      | 40000.00 | Elec. Eng. | Taylor   |  85000.00 |
| 15151 | Mozart     | Music      | 40000.00 | Finance    | Painter  | 120000.00 |
| 15151 | Mozart     | Music      | 40000.00 | History    | Painter  |  50000.00 |
...
+-------+------------+------------+----------+------------+----------+-----------+
84 rows in set (0.01 sec)
Products
• In SQL:
  select * from instructor, department;
• In relational algebra:
  $instructor \times department$
• Syntax tree:

            ×
           / \
  instructor   department
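Here is a sketch of the Cartesian product. It assumes that rows are plain tuples, so the attribute names are implicit in the column positions. The full example instance has 12 instructors and 7 departments, which gives 12 × 7 = 84 rows, the row count in the MySQL output. The sketch uses small excerpts of that instance.

```python
from itertools import product

def cartesian_product(r, s):
    """R × S: every row of r concatenated with every row of s.

    Rows are tuples, so an attribute occurring in both relations
    (here dept_name) simply appears twice."""
    return [row_r + row_s for row_r, row_s in product(r, s)]

# Excerpts of instructor(ID, name, dept_name, salary)
# and department(dept_name, building, budget) from the slides.
instructor = [("10101", "Srinivasan", "Comp. Sci.", 65000),
              ("12121", "Wu", "Finance", 90000)]
department = [("Biology", "Watson", 90000),
              ("Comp. Sci.", "Taylor", 100000),
              ("Finance", "Painter", 120000)]

pairs = cartesian_product(instructor, department)
print(len(pairs))  # 6, i.e. 2 * 3
print(pairs[0])    # ('10101', 'Srinivasan', 'Comp. Sci.', 65000, 'Biology', 'Watson', 90000)
```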
Relational model: natural join
mysql> select * from instructor natural join department;
+------------+-------+------------+----------+----------+-----------+
| dept_name  | ID    | name       | salary   | building | budget    |
+------------+-------+------------+----------+----------+-----------+
| Comp. Sci. | 10101 | Srinivasan | 65000.00 | Taylor   | 100000.00 |
| Finance    | 12121 | Wu         | 90000.00 | Painter  | 120000.00 |
| Music      | 15151 | Mozart     | 40000.00 | Packard  |  80000.00 |
| Physics    | 22222 | Einstein   | 95000.00 | Watson   |  70000.00 |
| History    | 32343 | El Said    | 60000.00 | Painter  |  50000.00 |
| Physics    | 33456 | Gold       | 87000.00 | Watson   |  70000.00 |
| Comp. Sci. | 45565 | Katz       | 75000.00 | Taylor   | 100000.00 |
| History    | 58583 | Califieri  | 62000.00 | Painter  |  50000.00 |
| Finance    | 76543 | Singh      | 80000.00 | Painter  | 120000.00 |
| Biology    | 76766 | Crick      | 72000.00 | Watson   |  90000.00 |
| Comp. Sci. | 83821 | Brandt     | 92000.00 | Taylor   | 100000.00 |
| Elec. Eng. | 98345 | Kim        | 80000.00 | Taylor   |  85000.00 |
+------------+-------+------------+----------+----------+-----------+
12 rows in set (0.01 sec)
• Can be computed as: first the Cartesian product, then a selection (matching dept_name values), then a projection (keeping one copy of dept_name)
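Join in relational algebra
• Join can be defined using the other constructors:

    Π dept_name, ID, . . . , budget
                  |
    σ instructor.dept_name = department.dept_name
                  |
                  ×
                 / \
        instructor   department

• That is, $instructor \bowtie department = \Pi_{dept\_name,\ ID,\ \ldots,\ budget}(\sigma_{instructor.dept\_name = department.dept\_name}(instructor \times department))$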
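The three steps of the definition can be traced in Python on excerpts of the example instance, again with rows as tuples. The index arithmetic belongs to this sketch only. It finds the two copies of dept_name in each product row and keeps one of them, placed first as in the MySQL natural join output.

```python
from itertools import product

instructor_attrs = ("ID", "name", "dept_name", "salary")
department_attrs = ("dept_name", "building", "budget")
# Excerpts of the example instance, rows as tuples.
instructor = [("10101", "Srinivasan", "Comp. Sci.", 65000),
              ("12121", "Wu", "Finance", 90000)]
department = [("Biology", "Watson", 90000),
              ("Comp. Sci.", "Taylor", 100000),
              ("Finance", "Painter", 120000)]

# Positions of instructor.dept_name and department.dept_name in a product row.
i_dept = instructor_attrs.index("dept_name")
d_dept = len(instructor_attrs) + department_attrs.index("dept_name")

# Step 1: Cartesian product.
prod = [r + s for r, s in product(instructor, department)]
# Step 2: selection instructor.dept_name = department.dept_name.
selected = [row for row in prod if row[i_dept] == row[d_dept]]
# Step 3: projection keeping one copy of dept_name, placed first.
joined = [(row[i_dept],) + row[:i_dept] + row[i_dept + 1:d_dept] + row[d_dept + 1:]
          for row in selected]

for row in joined:
    print(row)
# ('Comp. Sci.', '10101', 'Srinivasan', 65000, 'Taylor', 100000)
# ('Finance', '12121', 'Wu', 90000, 'Painter', 120000)
```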
Computation of joins
• In practice joins are not always computed this way
• Consider e.g.
• The relevant entry on the right-hand side can often be found quickly (for example using an index), without having to construct the Cartesian product
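One common way to avoid the Cartesian product is a hash join. The join first builds a lookup table on the join attribute for one input. It then probes that table once for each row of the other input. The sketch below illustrates the technique with rows as dicts. It makes no claim about how MySQL or any particular DBMS executes this join. The last lines check the result against the product-then-select definition.

```python
from itertools import product

def hash_join(left, right, attr):
    """Equi-join on one attribute without building the Cartesian product.

    Build phase: index the right-hand rows by their attr value.
    Probe phase: look up each left-hand row's attr value once."""
    index = {}
    for row in right:
        index.setdefault(row[attr], []).append(row)
    result = []
    for row in left:
        for match in index.get(row[attr], []):
            result.append({**row, **match})  # the shared attr has equal values
    return result

# Rows from the example instance on these slides.
instructor = [
    {"ID": "10101", "name": "Srinivasan", "dept_name": "Comp. Sci.", "salary": 65000},
    {"ID": "12121", "name": "Wu", "dept_name": "Finance", "salary": 90000},
    {"ID": "45565", "name": "Katz", "dept_name": "Comp. Sci.", "salary": 75000},
]
department = [
    {"dept_name": "Biology", "building": "Watson", "budget": 90000},
    {"dept_name": "Comp. Sci.", "building": "Taylor", "budget": 100000},
    {"dept_name": "Finance", "building": "Painter", "budget": 120000},
]

fast = hash_join(instructor, department, "dept_name")
# Reference: the definition, i.e. Cartesian product followed by selection.
slow = [{**i, **d} for i, d in product(instructor, department)
        if i["dept_name"] == d["dept_name"]]
print(fast == slow, len(fast))  # True 3
```

The hash join touches each input row a constant number of times on average, while the product-based version inspects all 3 × 3 pairs.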
Expressing execution plans
• DBMSs use a variant of relational algebra for this
• Still, basic relational algebra is a good way of understanding the meaning of queries
General joins
• Define
  $R \bowtie_{\Theta} S = \sigma_{\Theta}(R \times S)$
• For example,
  select * from student join advisor on s_ID = ID
• is translated to relational algebra as
  $student \bowtie_{ID = s\_ID} advisor$
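Here is a sketch of the general (theta) join as a selection on a Cartesian product. The condition Θ is passed as a Python function of the two rows. The student rows come from the example instance on these slides, and the advisor rows are hypothetical. The sketch assumes the two relations share no attribute names, so merging the row dicts loses nothing.

```python
from itertools import product

def theta_join(theta, r, s):
    """R ⋈_Θ S = σ_Θ(R × S): Cartesian product, then selection.

    Assumes r and s share no attribute names, so merging row dicts is safe."""
    return [{**row_r, **row_s}
            for row_r, row_s in product(r, s) if theta(row_r, row_s)]

# Student rows from the example instance (subset of columns).
student = [{"ID": "00128", "name": "Zhang"},
           {"ID": "12345", "name": "Shankar"},
           {"ID": "70557", "name": "Snow"}]
# Hypothetical advisor rows: student ID and instructor ID.
advisor = [{"s_ID": "00128", "i_ID": "45565"},
           {"s_ID": "12345", "i_ID": "10101"}]

result = theta_join(lambda st, ad: st["ID"] == ad["s_ID"], student, advisor)
print([row["name"] for row in result])  # ['Zhang', 'Shankar']
```

Snow has no advisor row in this data, so Snow drops out of the join result. Outer joins keep such rows.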
Set operations
• The usual set operations in relational algebra: $R \cup S$, $R \cap S$, $R \setminus S$
• These are only allowed between relations with the same set of attributes!
• Warning:
  - The book treats relational algebra with set semantics (no duplicate rows)
  - It might have been better to use multiset relational algebra, since SQL tables and query results may contain duplicate rows
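Here is a sketch of the set operations using Python sets of tuples. A compatibility check raises an error when the attributes differ. The course_id values are illustrative. The last lines use `collections.Counter` as a multiset to show how bag semantics differs: duplicates are kept, as with SQL's `union all`.

```python
from collections import Counter

def check_compatible(r_attrs, s_attrs):
    """Set operations need the same attributes (in this tuple encoding, also the same order)."""
    if tuple(r_attrs) != tuple(s_attrs):
        raise ValueError(f"incompatible attributes: {r_attrs} vs {s_attrs}")

# Two illustrative relations over the single attribute course_id.
attrs = ("course_id",)
r = {("CS-101",), ("CS-315",), ("BIO-101",)}
s = {("CS-101",), ("CS-347",)}
check_compatible(attrs, attrs)

print(sorted(r | s))  # R ∪ S
print(sorted(r & s))  # R ∩ S: [('CS-101',)]
print(sorted(r - s))  # R \ S: [('BIO-101',), ('CS-315',)]

try:
    check_compatible(("ID", "name"), ("course_id",))
    compatible = True
except ValueError:
    compatible = False
print(compatible)  # False

# Multiset (bag) union keeps duplicates, like SQL's "union all".
bag_r = Counter(["CS-101", "CS-101", "CS-315"])
bag_s = Counter(["CS-101"])
print((bag_r + bag_s)["CS-101"])  # 3
```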
Using left outer join
mysql> select * from student natural left outer join takes;
+-------+----------+------------+----------+-----------+--------+----------+------+-------+
| ID    | name     | dept_name  | tot_cred | course_id | sec_id | semester | year | grade |
+-------+----------+------------+----------+-----------+--------+----------+------+-------+
| 00128 | Zhang    | Comp. Sci. |      102 | CS-101    | 1      | Fall     | 2009 | A     |
| 00128 | Zhang    | Comp. Sci. |      102 | CS-347    | 1      | Fall     | 2009 | A-    |
| 12345 | Shankar  | Comp. Sci. |       32 | CS-101    | 1      | Fall     | 2009 | C     |
| 12345 | Shankar  | Comp. Sci. |       32 | CS-190    | 2      | Spring   | 2009 | A     |
| 12345 | Shankar  | Comp. Sci. |       32 | CS-315    | 1      | Spring   | 2010 | A     |
| 12345 | Shankar  | Comp. Sci. |       32 | CS-347    | 1      | Fall     | 2009 | A     |
| 19991 | Brandt   | History    |       80 | HIS-351   | 1      | Spring   | 2010 | B     |
| 23121 | Chavez   | Finance    |      110 | FIN-201   | 1      | Spring   | 2010 | C+    |
| 44553 | Peltier  | Physics    |       56 | PHY-101   | 1      | Fall     | 2009 | B-    |
| 45678 | Levy     | Physics    |       46 | CS-101    | 1      | Fall     | 2009 | F     |
| 45678 | Levy     | Physics    |       46 | CS-101    | 1      | Spring   | 2010 | B+    |
| 45678 | Levy     | Physics    |       46 | CS-319    | 1      | Spring   | 2010 | B     |
| 54321 | Williams | Comp. Sci. |       54 | CS-101    | 1      | Fall     | 2009 | A-    |
| 54321 | Williams | Comp. Sci. |       54 | CS-190    | 2      | Spring   | 2009 | B+    |
| 55739 | Sanchez  | Music      |       38 | MU-199    | 1      | Spring   | 2010 | A-    |
| 70557 | Snow     | Physics    |        0 | NULL      | NULL   | NULL     | NULL | NULL  |
| 76543 | Brown    | Comp. Sci. |       58 | CS-101    | 1      | Fall     | 2009 | A     |
| 76543 | Brown    | Comp. Sci. |       58 | CS-319    | 2      | Spring   | 2010 | A     |
| 76653 | Aoi      | Elec. Eng. |       60 | EE-181    | 1      | Spring   | 2009 | C     |
| 98765 | Bourikas | Elec. Eng. |       98 | CS-101    | 1      | Fall     | 2009 | C-    |
| 98765 | Bourikas | Elec. Eng. |       98 | CS-315    | 1      | Spring   | 2010 | B     |
| 98988 | Tanaka   | Biology    |      120 | BIO-101   | 1      | Summer   | 2009 | A     |
| 98988 | Tanaka   | Biology    |      120 | BIO-301   | 1      | Summer   | 2010 | NULL  |
+-------+----------+------------+----------+-----------+--------+----------+------+-------+
23 rows in set (0.00 sec)
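Outer join
• Left outer join is defined as

  $student ⟕ takes = (student \bowtie takes) \cup \big((student - \Pi_{student}(student \bowtie takes)) \times \{(null, \ldots, null)\}\big)$

• $student \bowtie takes$: students who have taken a course
• $student - \Pi_{student}(student \bowtie takes)$: students who have not taken a course, padded with nulls (here $\Pi_{student}$ projects onto the attributes of student)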
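This sketch builds the left outer join directly from the definition in three parts: the natural join, the students without a match (set difference after a projection), and null padding. Python's `None` stands in for null. Rows are dicts, and the data are two students and two takes rows from the example instance.

```python
def natural_join(r, s, common):
    """Rows of r and s that agree on all common attributes, merged into one dict."""
    return [{**a, **b} for a in r for b in s if all(a[c] == b[c] for c in common)]

def left_outer_join(r, s, r_attrs, s_attrs):
    common = [a for a in r_attrs if a in s_attrs]
    joined = natural_join(r, s, common)
    # Π_student(student ⋈ takes): the left-hand rows that found a match.
    matched = [{a: row[a] for a in r_attrs} for row in joined]
    # student − Π_student(...): the left-hand rows without a match.
    unmatched = [row for row in r if row not in matched]
    # × {(null, ..., null)}: pad the other attributes of s with None.
    padding = {a: None for a in s_attrs if a not in common}
    # The two parts are disjoint, so concatenation implements the union.
    return joined + [{**row, **padding} for row in unmatched]

student_attrs = ("ID", "name", "dept_name", "tot_cred")
takes_attrs = ("ID", "course_id", "sec_id", "semester", "year", "grade")
student = [{"ID": "00128", "name": "Zhang", "dept_name": "Comp. Sci.", "tot_cred": 102},
           {"ID": "70557", "name": "Snow", "dept_name": "Physics", "tot_cred": 0}]
takes = [{"ID": "00128", "course_id": "CS-101", "sec_id": "1",
          "semester": "Fall", "year": 2009, "grade": "A"},
         {"ID": "00128", "course_id": "CS-347", "sec_id": "1",
          "semester": "Fall", "year": 2009, "grade": "A-"}]

result = left_outer_join(student, takes, student_attrs, takes_attrs)
for row in result:
    print(row["name"], row["course_id"], row["grade"])
# Zhang CS-101 A
# Zhang CS-347 A-
# Snow None None
```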
Generalised projections
• Projections can be combined with basic operations on numbers, dates, booleans or strings, e.g.
  $\Pi_{flight\_num,\ capacity - reservations}(\ldots)$
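Here is a sketch of a generalised projection. Each output column is given as a name plus a function of the row. The flight data and the output name `free_seats` are hypothetical. In relational algebra, naming the computed column would need the renaming operator. This sketch does not remove duplicate rows, which matches SQL rather than set-based relational algebra.

```python
def generalised_project(columns, relation):
    """Π with computed columns: each output column is a (name, function of the row) pair.

    Duplicate rows are not removed in this sketch."""
    return [{name: f(row) for name, f in columns} for row in relation]

# Hypothetical flights; attribute names follow the expression above.
flights = [{"flight_num": "F100", "capacity": 180, "reservations": 172},
           {"flight_num": "F200", "capacity": 186, "reservations": 120}]

free_seats = generalised_project(
    [("flight_num", lambda row: row["flight_num"]),
     ("free_seats", lambda row: row["capacity"] - row["reservations"])],  # hypothetical name
    flights)
print(free_seats)
# [{'flight_num': 'F100', 'free_seats': 8}, {'flight_num': 'F200', 'free_seats': 66}]
```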
Renaming
• It is often necessary to rename a relation
• The expression
  $\rho_{R(a, \ldots, b)}(S)$
• renames relation S to R and the attributes of S to a, ..., b
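Here is a sketch of renaming. It encodes a relation as a triple: name, attribute names, rows. A typical reason to rename is combining a relation with itself, because after renaming the two copies have distinct attribute names. The instructor rows come from the example instance. The relation name `T` and its attribute names are hypothetical.

```python
def rename(new_name, new_attrs, relation):
    """ρ_{R(a,...,b)}(S): same rows, new relation name, new attribute names.

    A relation is encoded here as a triple (name, attribute names, rows)."""
    _, attrs, rows = relation
    if len(new_attrs) != len(attrs):
        raise ValueError("rename needs exactly one new name per attribute")
    mapping = dict(zip(attrs, new_attrs))
    return (new_name, tuple(new_attrs),
            [{mapping[a]: v for a, v in row.items()} for row in rows])

instructor = ("instructor", ("ID", "name", "salary"),
              [{"ID": "10101", "name": "Srinivasan", "salary": 65000},
               {"ID": "12121", "name": "Wu", "salary": 90000}])

r_name, r_attrs, r_rows = rename("T", ("t_ID", "t_name", "t_salary"), instructor)
print(r_name, r_attrs)  # T ('t_ID', 't_name', 't_salary')

# After renaming, rows of instructor and T can be combined without name clashes,
# e.g. pairs where the first instructor earns more than the second.
higher = [(i["name"], t["t_name"]) for i in instructor[2] for t in r_rows
          if i["salary"] > t["t_salary"]]
print(higher)  # [('Wu', 'Srinivasan')]
```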
Aggregation
• Special symbol for aggregation
• SQL:
  mysql> select avg(salary), dept_name from instructor
      -> group by dept_name;
• Relational algebra:
  ${}_{dept\_name}\mathcal{G}_{average(salary)}(instructor)$
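Here is a sketch of grouping with aggregation in the list-of-dicts encoding. `average` is a small helper defined here, and the output column is named after it. The rows are a subset of the example instance, so the averages are not those of the full table.

```python
from collections import defaultdict

def average(values):
    return sum(values) / len(values)

def group_aggregate(group_attr, agg, agg_attr, relation):
    """{group_attr} G {agg(agg_attr)} (relation): one output row per group."""
    groups = defaultdict(list)
    for row in relation:
        groups[row[group_attr]].append(row[agg_attr])
    return [{group_attr: key, f"{agg.__name__}({agg_attr})": agg(values)}
            for key, values in groups.items()]

# A subset of the instructor instance used on these slides.
instructor = [
    {"name": "Einstein", "dept_name": "Physics", "salary": 95000},
    {"name": "Wu", "dept_name": "Finance", "salary": 90000},
    {"name": "Gold", "dept_name": "Physics", "salary": 87000},
    {"name": "Singh", "dept_name": "Finance", "salary": 80000},
    {"name": "Mozart", "dept_name": "Music", "salary": 40000},
]

result = group_aggregate("dept_name", average, "salary", instructor)
for row in result:
    print(row)
# {'dept_name': 'Physics', 'average(salary)': 91000.0}
# {'dept_name': 'Finance', 'average(salary)': 85000.0}
# {'dept_name': 'Music', 'average(salary)': 40000.0}
```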
Aggregation with having
• First group, then select groups
  mysql> select avg(salary), dept_name from instructor
      -> group by dept_name
      -> having count(ID) > 1;
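Having
• Recall that having is just selection on the group level
• Translate
  mysql> select avg(salary), dept_name from instructor
      -> group by dept_name
      -> having avg(salary) > 80000;
• as

    σ avg_salary > 80000
              |
    ρ R(avg_salary, dept_name)
              |
    dept_name G average(salary)
              |
          instructor

• i.e. $\sigma_{avg\_salary > 80000}(\rho_{R(avg\_salary,\ dept\_name)}({}_{dept\_name}\mathcal{G}_{average(salary)}(instructor)))$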
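This Python sketch follows the tree from the leaf upwards: aggregate, then rename, then select. The rows are dicts taken from the example instance. Because only a subset is used, the averages differ from those of the full table.

```python
from collections import defaultdict

# A subset of the instructor instance used on these slides.
instructor = [
    {"name": "Srinivasan", "dept_name": "Comp. Sci.", "salary": 65000},
    {"name": "Wu", "dept_name": "Finance", "salary": 90000},
    {"name": "Einstein", "dept_name": "Physics", "salary": 95000},
    {"name": "Gold", "dept_name": "Physics", "salary": 87000},
    {"name": "Katz", "dept_name": "Comp. Sci.", "salary": 75000},
    {"name": "Singh", "dept_name": "Finance", "salary": 80000},
    {"name": "Brandt", "dept_name": "Comp. Sci.", "salary": 92000},
]

# Step 1 (leaf upwards): dept_name G average(salary) (instructor)
groups = defaultdict(list)
for row in instructor:
    groups[row["dept_name"]].append(row["salary"])
aggregated = [{"dept_name": dept, "average(salary)": sum(s) / len(s)}
              for dept, s in groups.items()]

# Step 2: ρ R(avg_salary, dept_name), giving the computed column a usable name.
renamed = [{"avg_salary": row["average(salary)"], "dept_name": row["dept_name"]}
           for row in aggregated]

# Step 3: σ avg_salary > 80000, i.e. the having clause as a selection on groups.
result = [row for row in renamed if row["avg_salary"] > 80000]
print(result)
# [{'avg_salary': 85000.0, 'dept_name': 'Finance'}, {'avg_salary': 91000.0, 'dept_name': 'Physics'}]
```

Comp. Sci. averages about 77333 here, so the final selection removes that group.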
Subqueries
• Example:
  mysql> select name from instructor,
      -> (select max(salary) as max_salary from instructor) as S
      -> where instructor.salary = S.max_salary;
• Insert the tree from the subquery into the tree from the outer query
• (details on blackboard)
• Nested queries in the where clause are more involved:
  mysql> select name from instructor
      -> where salary >= all (select salary from instructor);
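The slide leaves the details to the blackboard. One possible translation of the first query is Π_name(σ_{instructor.salary = S.max_salary}(instructor × ρ_{S(max_salary)}(G_{max(salary)}(instructor)))). The sketch below evaluates that tree bottom-up on three rows from the example instance. It then checks that the `>= all` formulation gives the same answer. Treat it as an illustration, not as the lecture's own derivation.

```python
# Three rows from the instructor instance used on these slides.
instructor = [{"name": "Srinivasan", "salary": 65000},
              {"name": "Einstein", "salary": 95000},
              {"name": "Brandt", "salary": 92000}]

# Subquery tree: ρ_S(max_salary)( G_max(salary) (instructor) ), a one-row relation.
S = [{"max_salary": max(row["salary"] for row in instructor)}]

# Outer tree with the subquery tree plugged in as a leaf:
# Π_name( σ_{instructor.salary = S.max_salary}( instructor × S ) )
product_rows = [{**i, **s} for i in instructor for s in S]
from_subquery = {row["name"] for row in product_rows
                 if row["salary"] == row["max_salary"]}

# The nested "salary >= all (...)" version, checked directly.
from_all = {i["name"] for i in instructor
            if all(i["salary"] >= j["salary"] for j in instructor)}

print(from_subquery, from_all)  # {'Einstein'} {'Einstein'}
```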
Equations
Equations
• Many different relational algebra expressions compute the same thing
• When evaluating queries, the DBMS will
  - generate many different relational algebra expressions computing the query
  - choose the one it thinks is most efficient
• Here we see some basic equalities between expressions
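Equalities (examples)
• Selection is commutative:
  $\sigma_{\Theta_1}(\sigma_{\Theta_2}(R)) = \sigma_{\Theta_2}(\sigma_{\Theta_1}(R)) = \sigma_{\Theta_1 \text{ and } \Theta_2}(R)$
• Join is commutative:
  $R_1 \bowtie R_2 = R_2 \bowtie R_1$
• (the only difference is the order of attributes)
• Join is associative:
  $R_1 \bowtie (R_2 \bowtie R_3) = (R_1 \bowtie R_2) \bowtie R_3$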
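The equalities can be spot-checked on data. The sketch below compares relations as sets of rows and ignores attribute order, in line with the remark that the two join orders differ only in attribute order. The R3 rows (a teaches-like relation) are hypothetical. A check on one small instance illustrates the equalities but does not prove them.

```python
from itertools import product

def select(theta, relation):
    return [row for row in relation if theta(row)]

def natural_join(r, s):
    """Join on all attribute names the two relations share."""
    return [{**a, **b} for a, b in product(r, s)
            if all(a[k] == b[k] for k in a.keys() & b.keys())]

def as_set(relation):
    """Compare relations as sets of rows, ignoring attribute order."""
    return {frozenset(row.items()) for row in relation}

R1 = [{"ID": "22222", "name": "Einstein", "dept_name": "Physics", "salary": 95000},
      {"ID": "10101", "name": "Srinivasan", "dept_name": "Comp. Sci.", "salary": 65000},
      {"ID": "33456", "name": "Gold", "dept_name": "Physics", "salary": 87000}]
R2 = [{"dept_name": "Physics", "building": "Watson"},
      {"dept_name": "Comp. Sci.", "building": "Taylor"}]
R3 = [{"ID": "22222", "course_id": "PHY-101"},   # hypothetical teaches rows
      {"ID": "10101", "course_id": "CS-101"}]

theta1 = lambda row: row["salary"] > 70000
theta2 = lambda row: row["dept_name"] == "Physics"

# Selection is commutative, and two selections equal one selection with "and".
assert as_set(select(theta1, select(theta2, R1))) == as_set(select(theta2, select(theta1, R1)))
assert as_set(select(theta1, select(theta2, R1))) == as_set(
    select(lambda r: theta1(r) and theta2(r), R1))
# Join is commutative (up to attribute order) and associative.
assert as_set(natural_join(R1, R2)) == as_set(natural_join(R2, R1))
assert as_set(natural_join(R1, natural_join(R2, R3))) == as_set(
    natural_join(natural_join(R1, R2), R3))
print("all equalities hold on this instance")
```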
More equalities
• Suppose $\Theta_1$ only talks about attributes of $R_1$ and, similarly, $\Theta_2$ only talks about attributes of $R_2$
• Then
  $\sigma_{\Theta_1 \text{ and } \Theta_2}(R_1 \bowtie R_2) = \sigma_{\Theta_1}(R_1) \bowtie \sigma_{\Theta_2}(R_2)$
• The right-hand side is often much less expensive to compute, because the selections shrink both inputs before the join (DBMSs make such optimizations automatically)
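Here is a sketch of this rewrite on a few rows from the example instance. Θ1 is a condition on instructor (salary > 80000) and Θ2 is a condition on department (building = 'Watson'). The pair count at the end assumes a naive nested-loop join that inspects every combination of input rows. Real DBMSs have cheaper join methods, but pushing the selections down still shrinks their inputs.

```python
from itertools import product

def select(theta, relation):
    return [row for row in relation if theta(row)]

def natural_join(r, s):
    """Nested-loop join on all shared attribute names."""
    return [{**a, **b} for a, b in product(r, s)
            if all(a[k] == b[k] for k in a.keys() & b.keys())]

def as_set(relation):
    return {frozenset(row.items()) for row in relation}

# R1 = instructor, R2 = department: rows from the example instance on these slides.
instructor = [
    {"name": "Srinivasan", "dept_name": "Comp. Sci.", "salary": 65000},
    {"name": "Wu", "dept_name": "Finance", "salary": 90000},
    {"name": "Einstein", "dept_name": "Physics", "salary": 95000},
    {"name": "Gold", "dept_name": "Physics", "salary": 87000},
    {"name": "Crick", "dept_name": "Biology", "salary": 72000},
]
department = [
    {"dept_name": "Biology", "building": "Watson"},
    {"dept_name": "Comp. Sci.", "building": "Taylor"},
    {"dept_name": "Finance", "building": "Painter"},
    {"dept_name": "Physics", "building": "Watson"},
]

theta1 = lambda row: row["salary"] > 80000        # mentions only instructor attributes
theta2 = lambda row: row["building"] == "Watson"  # mentions only department attributes

lhs = select(lambda row: theta1(row) and theta2(row), natural_join(instructor, department))
rhs = natural_join(select(theta1, instructor), select(theta2, department))
print(as_set(lhs) == as_set(rhs))  # True

# Row pairs a nested-loop join inspects without and with the pushed-down selections.
print(len(instructor) * len(department),
      len(select(theta1, instructor)) * len(select(theta2, department)))  # 20 6
```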
Summary
• After this lecture you should be able to
  - Translate simple queries to relational algebra
  - Draw the syntax tree of relational algebra expressions
• Future goal:
  - Judge which relational algebra expression represents the most efficient evaluation plan for a query