Free University of Bolzano–Database 2. Lecture III, 2003/2004 – A.Artale (1)

Database 2 Lecture III

Alessandro Artale Faculty of Computer Science – Free University of Bolzano Room: 221 [email protected] http://www.inf.unibz.it/∼artale/

2003/2004 – First Semester

Summary of Lecture III

• Introduction to Query Processing
  – An Overview of Relational Algebra
  – Relational Algebra and SQL

Query Processing

• Query Processor: Component of a DBMS that translates a query (or a generic SQL command) into a sequence of executable operations on the database.

• Query processing involves translating an SQL query into an equivalent relational algebra expression.

• A logical query plan is the result of the Query Compilation phase where each relational algebra operator is annotated with:
  – The algorithm to be used;
  – The particular index to be used.

[Figure: query processing pipeline with the stages Query, Query Compilation, Query Plan, Query Execution, Data]

Relational Algebra

• The relational algebra is the query language for the relational model. SQL can be viewed as an implementation of the relational algebra.

• The Relational Algebra provides a collection of operations that can be used to construct new relations from old ones.

• The relational algebra operations are divided into two groups:

Set-theoretic                  Relational
Name               Symbol      Name          Symbol
Union              ∪           Projection    π_A(R)
Intersection       ∩           Selection     σ_C(R)
Difference         −           Natural Join  ⋈
Cartesian Product  ×           Theta-Join    ⋈_C

A Company Database

EMPLOYEE
FNAME     MINIT  LNAME    SSN        BDATE      ADDRESS                   SEX  SALARY  SUPERSSN   DNO
John      B      Smith    123456789  09-JAN-55  731 Fondren, Houston, TX  M    30000   333445555  5
Franklin  T      Wong     333445555  08-DEC-45  638 Voss, Houston, TX     M    40000   888665555  5
Alicia    J      Zelaya   999887777  19-JUL-58  3321 Castle, Spring, TX   F    25000   987654321  4
Jennifer  S      Wallace  987654321  20-JUN-31  291 Berry, Bellaire, TX   F    43000   888665555  4
Ramesh    K      Narayan  666884444  15-SEP-52  975 Fire Oak, Humble, TX  M    38000   333445555  5
Joyce     A      English  453453453  31-JUL-62  5631 Rice, Houston, TX    F    25000   333445555  5
Ahmad     V      Jabbar   987987987  29-MAR-59  980 Dallas, Houston, TX   M    25000   987654321  4
James     E      Borg     888665555  10-NOV-27  450 Stone, Houston, TX    M    55000   null       1

DEPARTMENT
DNAME           DNUMBER  MGRSSN     MGRSTARTDATE
Research        5        333445555  22-MAY-78
Administration  4        987654321  01-JAN-85
Headquarters    1        888665555  19-JUN-71

DEPT_LOCATIONS
DNUMBER  DLOCATION
1        Houston
4        Stafford
5        Bellaire
5        Sugarland
5        Houston

WORKS_ON
ESSN       PNO  HOURS
123456789  1    32.5
123456789  2    7.5
666884444  3    40.0
453453453  1    20.0
453453453  2    20.0
333445555  2    10.0
333445555  3    10.0
333445555  10   10.0
333445555  20   10.0
999887777  30   30.0
999887777  10   10.0
987987987  10   35.0
987987987  30   5.0
987654321  30   20.0
987654321  20   15.0
888665555  20   null

PROJECT
PNAME            PNUMBER  PLOCATION  DNUM
ProductX         1        Bellaire   5
ProductY         2        Sugarland  5
ProductZ         3        Houston    5
Computerization  10       Stafford   4
Reorganization   20       Houston    1
Newbenefits      30       Stafford   4

DEPENDENT
ESSN       DEPENDENT_NAME  SEX  BDATE      RELATIONSHIP
333445555  Alice           F    05-APR-76  DAUGHTER
333445555  Theodore        M    25-OCT-73  SON
333445555  Joy             F    03-MAY-48  SPOUSE
987654321  Abner           M    29-FEB-32  SPOUSE
123456789  Michael         M    01-JAN-78  SON
123456789  Alice           F    31-DEC-78  DAUGHTER
123456789  Elizabeth       F    05-MAY-57  SPOUSE

The Benjamin/Cummings Publishing Company, Inc. 1994, Elmasri/Navathe, Fundamentals of Database Systems, Second Edition

The Selection Operation

• The Selection operation is denoted as follows:

σ_<selection condition>(R)

• The selection operation, when applied to a relation R, produces a new relation which contains only the subset of the tuples of R that satisfy the selection condition.

• The selection operation produces a relation with the same attributes as the input relation.

NOTE: Do not confuse the selection operation with the SELECT command in SQL.

The Selection Operation: Examples

1. σ_{DNUM=4}(PROJECT)

PNAME            PNUMBER  PLOCATION  DNUM
Computerization  10       Stafford   4
Newbenefits      30       Stafford   4

2. σ_{DNAME='Research' ∨ DNAME='Administration'}(DEPARTMENT)

DNAME           DNUMBER  MGRSSN     MGRSTARTDATE
Research        5        333445555  22-MAY-78
Administration  4        987654321  01-JAN-85
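The first selection example can be reproduced with a short Python sketch. It is only an illustration, not how a DBMS evaluates σ: the representation of a relation as a list of dictionaries and the helper name `select` are assumptions made for this example.

```python
# A relation is modelled as a list of dicts, one dict per tuple (an assumption
# of this sketch). The data is the PROJECT relation of the company database.
PROJECT = [
    {"PNAME": "ProductX", "PNUMBER": 1, "PLOCATION": "Bellaire", "DNUM": 5},
    {"PNAME": "ProductY", "PNUMBER": 2, "PLOCATION": "Sugarland", "DNUM": 5},
    {"PNAME": "ProductZ", "PNUMBER": 3, "PLOCATION": "Houston", "DNUM": 5},
    {"PNAME": "Computerization", "PNUMBER": 10, "PLOCATION": "Stafford", "DNUM": 4},
    {"PNAME": "Reorganization", "PNUMBER": 20, "PLOCATION": "Houston", "DNUM": 1},
    {"PNAME": "Newbenefits", "PNUMBER": 30, "PLOCATION": "Stafford", "DNUM": 4},
]


def select(relation, condition):
    """Hypothetical helper for σ: keep the tuples that satisfy the condition."""
    return [t for t in relation if condition(t)]


# σ_{DNUM=4}(PROJECT): same attributes as PROJECT, fewer tuples.
dept4_projects = select(PROJECT, lambda t: t["DNUM"] == 4)
for t in dept4_projects:
    print(t["PNAME"], t["PNUMBER"], t["PLOCATION"], t["DNUM"])
```

The result keeps all four attributes of PROJECT, which matches the remark that selection never changes the schema.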

The Project Operation

• The projection operation is used to produce from a relation R a new relation S that has only some of R’s columns.

• The general form of a projection operation is:

π_{A1,A2,...,An}(R)

• The resulting relation is: S(A1, A2, . . . , An).

The Project Operation: An Example

π_{LNAME,FNAME,SALARY}(EMPLOYEE)

LNAME    FNAME     SALARY
Smith    John      30000
Wong     Franklin  40000
Zelaya   Alicia    25000
Wallace  Jennifer  43000
Narayan  Ramesh    38000
English  Joyce     25000
Jabbar   Ahmad     25000
Borg     James     55000
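A projection can be sketched in Python as follows. The list-of-dicts representation and the helper name `project` are assumptions of this example, and only three EMPLOYEE tuples (with a subset of their attributes) are included to keep it short.

```python
# Three tuples of EMPLOYEE, restricted to a few attributes for brevity.
EMPLOYEE = [
    {"FNAME": "John", "LNAME": "Smith", "SALARY": 30000, "DNO": 5},
    {"FNAME": "Franklin", "LNAME": "Wong", "SALARY": 40000, "DNO": 5},
    {"FNAME": "Alicia", "LNAME": "Zelaya", "SALARY": 25000, "DNO": 4},
]


def project(relation, attributes):
    """Hypothetical helper for π: keep only the listed attributes, in order.

    Duplicates are not removed, so this is the bag version of projection.
    """
    return [{a: t[a] for a in attributes} for t in relation]


# π_{LNAME,FNAME,SALARY}(EMPLOYEE)
result = project(EMPLOYEE, ["LNAME", "FNAME", "SALARY"])
for t in result:
    print(t["LNAME"], t["FNAME"], t["SALARY"])
```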

Set Theoretic Operations

• As relations are sets of tuples, the operations UNION (∪), INTERSECTION (∩) and DIFFERENCE (−) can be applied.

• For the result to make sense, two relations R and S participating in a union, intersection or difference operation must have schemas with identical sets of attributes. The result relation has the same set of attributes.

Set Theoretic Operations: Examples

R             S             R ∪ S
A   B   C     A   B   C     A   B   C
a1  b1  c1    a1  b1  c1    a1  b1  c1
a1  b2  c3    a1  b1  c2    a1  b1  c2
a2  b1  c2    a1  b2  c3    a1  b2  c3
              a3  b2  c3    a2  b1  c2
                            a3  b2  c3

R ∩ S         R − S         S − R
A   B   C     A   B   C     A   B   C
a1  b1  c1    a2  b1  c2    a1  b1  c2
a1  b2  c3                  a3  b2  c3
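The tables above can be checked with Python's built-in set type. This is a sketch under the set semantics used on this slide: each tuple is a Python tuple whose positions stand for the attributes A, B, C.

```python
# R and S over the same schema (A, B, C), as in the example tables.
R = {("a1", "b1", "c1"), ("a1", "b2", "c3"), ("a2", "b1", "c2")}
S = {("a1", "b1", "c1"), ("a1", "b1", "c2"), ("a1", "b2", "c3"), ("a3", "b2", "c3")}

union = R | S          # R ∪ S
intersection = R & S   # R ∩ S
r_minus_s = R - S      # R − S
s_minus_r = S - R      # S − R

for name, rel in [("R ∪ S", union), ("R ∩ S", intersection),
                  ("R − S", r_minus_s), ("S − R", s_minus_r)]:
    print(name, sorted(rel))
```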

The Cartesian Product Operation

• If R and S are relations with schemas

R(A1, A2, . . . , An) and S(B1, B2, . . . , Bm) then the Cartesian product R × S is a relation Q with schema:

Q(A1, A2, . . . , An, B1, B2, . . . , Bm)

• Tuples are formed by combining every tuple of R with every tuple of S.

• If NR, NS are the number of tuples in R and S, then there are NR ∗ NS tuples in R × S.
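The definition can be illustrated with `itertools.product` from the Python standard library. This is a sketch on two small relations, with R over (A, B, C) and S over (D, E, F); tuples are modelled positionally as Python tuples, which is an assumption of this example.

```python
from itertools import product

R = [("a1", "b1", "c1"), ("a1", "b2", "c3"), ("a2", "b1", "c2")]  # schema (A, B, C)
S = [("a1", "b1", "c2"), ("a3", "b2", "c3")]                      # schema (D, E, F)

# Every tuple of R is concatenated with every tuple of S.
r_times_s = [r + s for r, s in product(R, S)]

for t in r_times_s:
    print(t)

# The number of result tuples is NR * NS.
print(len(r_times_s) == len(R) * len(S))
```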

The Cartesian Product Operation: An Example

R             S
A   B   C     D   E   F
a1  b1  c1    a1  b1  c2
a1  b2  c3    a3  b2  c3
a2  b1  c2

R × S
A   B   C   D   E   F
a1  b1  c1  a1  b1  c2
a1  b1  c1  a3  b2  c3
a1  b2  c3  a1  b1  c2
a1  b2  c3  a3  b2  c3
a2  b1  c2  a1  b1  c2
a2  b1  c2  a3  b2  c3

The Theta-Join Operation

• The Theta-Join operation combines related tuples from two relations into single tuples. It is an important operation because it allows us to process relationships among relations.

• Definition. If R and S are relations with schemas R(A1, A2, . . . , An) and S(B1, B2, . . . , Bm) then the theta-join, R ⋈_C S, where C is the join condition, is a relation Q with schema:

Q(A1, A2, . . . , An, B1, B2, . . . , Bm)

• The tuples of Q are formed by:
  1. Taking the Cartesian product R × S;
  2. Selecting from the product only the tuples that satisfy the join condition.

• If a relation R has NR tuples and a relation S has NS tuples then the theta-join of R and S will have between 0 and NR ∗ NS tuples.

• A theta-join using only the equality comparison operator is called an equijoin.
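The two-step definition (product, then selection) translates directly into a Python sketch. The helper name `theta_join` and the list-of-dicts representation are assumptions of this example; it follows the definition literally and is not meant as an efficient join algorithm. The data is a cut-down version of DEPARTMENT and EMPLOYEE from the company database.

```python
from itertools import product

DEPARTMENT = [
    {"DNAME": "Research", "DNUMBER": 5, "MGRSSN": "333445555"},
    {"DNAME": "Administration", "DNUMBER": 4, "MGRSSN": "987654321"},
    {"DNAME": "Headquarters", "DNUMBER": 1, "MGRSSN": "888665555"},
]
EMPLOYEE = [
    {"LNAME": "Smith", "SSN": "123456789"},
    {"LNAME": "Wong", "SSN": "333445555"},
    {"LNAME": "Wallace", "SSN": "987654321"},
    {"LNAME": "Borg", "SSN": "888665555"},
]


def theta_join(r, s, condition):
    """Hypothetical helper: Cartesian product followed by a selection.

    Assumes the two schemas have no attribute names in common.
    """
    return [{**tr, **ts} for tr, ts in product(r, s) if condition(tr, ts)]


# Equijoin: DEPARTMENT joined with EMPLOYEE on MGRSSN = SSN.
managers = theta_join(DEPARTMENT, EMPLOYEE, lambda d, e: d["MGRSSN"] == e["SSN"])
for t in managers:
    print(t["DNAME"], t["MGRSSN"], t["LNAME"], t["SSN"])
```

The size of the result lies between 0 and NR ∗ NS; here it is 3, one tuple per department.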

Equijoin Example

DEPARTMENT ⋈_{MGRSSN=SSN} EMPLOYEE

DNAME           DNUMBER  MGRSSN     . . .  FNAME     MINIT  LNAME    SSN        . . .
Research        5        333445555  . . .  Franklin  T      Wong     333445555  . . .
Administration  4        987654321  . . .  Jennifer  S      Wallace  987654321  . . .
Headquarters    1        888665555  . . .  James     E      Borg     888665555  . . .

• Note: The result of the equijoin shows pairs of attributes with the same value. In the above example, the values of the attributes MGRSSN and SSN are identical.

• The Natural Join operation has been created to get rid of superfluous attributes.

Natural Join: An Example

DEPARTMENT ⋈ DEPT_LOCATIONS

DNAME           DNUMBER  MGRSSN     MGRSTARTDATE  DLOCATION
Headquarters    1        888665555  19-JUN-71     Houston
Administration  4        987654321  01-JAN-85     Stafford
Research        5        333445555  22-MAY-78     Bellaire
Research        5        333445555  22-MAY-78     Sugarland
Research        5        333445555  22-MAY-78     Houston
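A natural join can be sketched as an equijoin on all attributes the two relations share, keeping each shared attribute only once. The helper name `natural_join` is an assumption of this example, and the schema is read off the first tuple, so the sketch only handles non-empty inputs properly.

```python
DEPARTMENT = [
    {"DNAME": "Research", "DNUMBER": 5, "MGRSSN": "333445555"},
    {"DNAME": "Administration", "DNUMBER": 4, "MGRSSN": "987654321"},
    {"DNAME": "Headquarters", "DNUMBER": 1, "MGRSSN": "888665555"},
]
DEPT_LOCATIONS = [
    {"DNUMBER": 1, "DLOCATION": "Houston"},
    {"DNUMBER": 4, "DLOCATION": "Stafford"},
    {"DNUMBER": 5, "DLOCATION": "Bellaire"},
    {"DNUMBER": 5, "DLOCATION": "Sugarland"},
    {"DNUMBER": 5, "DLOCATION": "Houston"},
]


def natural_join(r, s):
    """Hypothetical helper: join on equality of all common attributes."""
    if not r or not s:
        return []
    common = set(r[0]) & set(s[0])  # here: {"DNUMBER"}
    # Merging the dicts keeps a single copy of each common attribute.
    return [{**tr, **ts} for tr in r for ts in s
            if all(tr[a] == ts[a] for a in common)]


joined = natural_join(DEPARTMENT, DEPT_LOCATIONS)
for t in joined:
    print(t["DNAME"], t["DNUMBER"], t["MGRSSN"], t["DLOCATION"])
```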

Relational Expressions

Relational Algebra operators can be combined giving rise to Relational Algebra Expressions.

• A Relational Algebra Expression is a relational algebra operator applied either to a given relation or to another relational algebra expression.

Example: “Retrieve the full name and salary of all employees working in department 4”

π_{FNAME,LNAME,SALARY}(σ_{DNO=4}(EMPLOYEE))

Expression Trees

• Relational Algebra Expressions can be depicted using Expression Trees.

• Leaves of Expression Trees stand for relations, internal nodes for operators.

• Expression Trees are useful for designing complex queries.

Expression Trees: An Example

Example.
MovieStar(Name,Address,Gender,Birthdate)
StarIn(Title,Year,StarName)

Query: “Find the birthdate and the movie title for those female stars who appeared in movies in 1996.”

SELECT Title, Birthdate
FROM MovieStar, StarIn
WHERE Year=1996 AND Gender='F' AND Name=StarName;

Two equivalent expression trees for this query (children are indented under their parent):

π_{Title,Birthdate}
    σ_{Year=1996 ∧ Gender='F' ∧ StarName=Name}
        ×
            MovieStar
            StarIn

π_{Title,Birthdate}
    ⋈_{StarName=Name}
        σ_{Gender='F'}
            MovieStar
        σ_{Year=1996}
            StarIn
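That the two trees compute the same answer can be checked on a small instance. The sketch below evaluates both bottom-up in Python; all names, addresses, titles and dates in the sample data are hypothetical and serve only to exercise the two plans.

```python
from itertools import product

# Hypothetical sample data for MovieStar and StarIn.
MovieStar = [
    {"Name": "Alice Example", "Address": "1 Main St", "Gender": "F", "Birthdate": "1970-01-01"},
    {"Name": "Bob Example", "Address": "2 Main St", "Gender": "M", "Birthdate": "1965-05-05"},
    {"Name": "Carol Example", "Address": "3 Main St", "Gender": "F", "Birthdate": "1980-03-03"},
]
StarIn = [
    {"Title": "Movie A", "Year": 1996, "StarName": "Alice Example"},
    {"Title": "Movie B", "Year": 1996, "StarName": "Bob Example"},
    {"Title": "Movie C", "Year": 1995, "StarName": "Carol Example"},
    {"Title": "Movie D", "Year": 1996, "StarName": "Carol Example"},
]

# First tree: product, then one big selection, then projection.
pairs = [{**m, **s} for m, s in product(MovieStar, StarIn)]
plan1 = [(t["Title"], t["Birthdate"]) for t in pairs
         if t["Year"] == 1996 and t["Gender"] == "F" and t["StarName"] == t["Name"]]

# Second tree: selections pushed down to the leaves, then a join, then projection.
female_stars = [m for m in MovieStar if m["Gender"] == "F"]
movies_1996 = [s for s in StarIn if s["Year"] == 1996]
plan2 = [(s["Title"], m["Birthdate"]) for m in female_stars for s in movies_1996
         if s["StarName"] == m["Name"]]

print(sorted(plan1))
print(sorted(plan1) == sorted(plan2))
```

On this instance the first plan builds 12 intermediate pairs, while the second compares only 2 female stars with 3 movies from 1996, which hints at why the choice between equivalent trees matters for query processing.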

Summary

• Introduction to Query Processing
  – An Overview of Relational Algebra
  – Relational Algebra and SQL

Relational Algebra and SQL

1. SQL queries can be expressed using relational algebra operators with the addition of aggregation-based operators: group-by, order-by.

2. SQL has a bag or multiset semantics: the same tuple can appear more than once in an SQL relation.

• Assumption. We consider relational algebra as an Algebra on Bags.

• When computing relational operations we stick to their bag form, i.e., we don’t eliminate duplicate tuples.

Relational Algebra and SQL (cont.)

Algebra                Symbol   SQL
Union                  ∪        UNION
Intersection           ∩        INTERSECT
Difference             −        EXCEPT/NOT IN
Cartesian Product      ×        FROM
Projection             π_A(R)   SELECT
Selection              σ_C(R)   WHERE
Natural Join           ⋈        NATURAL JOIN
Theta-Join             ⋈_C      FROM+WHERE
Duplicate Elimination  δ        DISTINCT
Grouping               γ        GROUP BY, SUM, MIN,...
Sorting                τ        ORDER BY

Duplicate Elimination and Sorting

• Duplicate Elimination, δ(R). Operator that converts a bag to a set, eliminating from the result all the duplicate tuples.

• Sorting, τ_L(R). Operator that sorts a relation, R, w.r.t. the attributes indicated in the list L.

Example. Given R(A, B, C), then τ_{C,B}(R) orders the tuples of R by the value of the attribute C. Tuples with the same value for C are ordered w.r.t. the value for B. Tuples that agree on both C and B are ordered arbitrarily.
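Both operators can be sketched in a few lines of Python. The helper names `delta` and `tau` and the sample bag R are assumptions of this example; attributes are addressed by position (A = 0, B = 1, C = 2).

```python
# A bag over schema (A, B, C); the tuple (1, "y", 2) occurs twice.
R = [(1, "y", 2), (2, "x", 1), (1, "y", 2), (3, "x", 2)]


def delta(relation):
    """Hypothetical helper for δ: drop duplicates, keeping first occurrences."""
    return list(dict.fromkeys(relation))


def tau(relation, positions):
    """Hypothetical helper for τ_L: sort by the attributes at the given positions."""
    return sorted(relation, key=lambda t: tuple(t[i] for i in positions))


deduplicated = delta(R)
sorted_by_c_then_b = tau(R, [2, 1])  # τ_{C,B}(R)

print(deduplicated)
print(sorted_by_c_then_b)
```

Python's `sorted` is stable, so tuples that agree on both C and B keep their input order; that is one of the arbitrary orders the definition allows.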

Grouping and Aggregation in SQL

• Grouping. GROUP BY A1, ..., An. Group a relation according to the values of the attributes listed.

• Aggregation Operators. AVG, SUM, COUNT, MIN, MAX apply to an attribute, present in the SELECT clause.

• Having. A HAVING clause must follow a GROUP BY clause. It filters the groups of tuples based on some aggregate property of the group itself. The syntax is: HAVING <condition>, where <condition> is a Boolean expression formed by comparison conditions as in the WHERE clause.

Grouping and Aggregation in SQL: An Example

Example. StarIn(Title,Year,StarName). Query: “Find for each star who appeared in at least three movies, the earliest year in which they appeared.”

SELECT StarName, MIN(Year) as minYear
FROM StarIn
GROUP BY StarName
HAVING COUNT(Title) >= 3;

Grouping and Aggregation in Relational Algebra

We extend standard relational algebra with a single operator for grouping and aggregation: γ_L(R). “L” is a list where each element can be:

1. An attribute of the relation R, corresponding to one of the attributes in the GROUP BY clause – Grouping attribute.

2. An aggregation operator (AVG, SUM, COUNT, MIN, MAX) applied to an attribute of R – aggregated attribute.

Grouping and Aggregation in Relational Algebra (cont.)

Example. StarIn(Title,Year,StarName).

SELECT StarName, MIN(Year) as minYear
FROM StarIn
GROUP BY StarName
HAVING COUNT(Title) >= 3;

Expression tree (children are indented under their parent):

π_{StarName,minYear}
    σ_{ctTitle≥3}
        γ_{StarName, MIN(Year)→minYear, COUNT(Title)→ctTitle}
            StarIn
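The tree can be evaluated bottom-up in Python: γ first, then σ on the count, then π. The sample StarIn tuples are hypothetical, and `len` stands in for COUNT(Title) under the assumption that no title is null.

```python
from collections import defaultdict

# Hypothetical sample data for StarIn(Title, Year, StarName).
StarIn = [
    ("Movie A", 1994, "Star One"),
    ("Movie B", 1990, "Star One"),
    ("Movie C", 1996, "Star One"),
    ("Movie D", 1995, "Star Two"),
    ("Movie E", 1996, "Star Two"),
]

# γ_{StarName, MIN(Year)→minYear, COUNT(Title)→ctTitle}(StarIn)
groups = defaultdict(list)
for title, year, star in StarIn:
    groups[star].append((title, year))
gamma = [
    {"StarName": star,
     "minYear": min(year for _, year in rows),
     "ctTitle": len(rows)}  # COUNT(Title), assuming no null titles
    for star, rows in groups.items()
]

# σ_{ctTitle≥3}: the algebraic counterpart of the HAVING clause.
selected = [g for g in gamma if g["ctTitle"] >= 3]

# π_{StarName,minYear}: drop the helper attribute ctTitle.
result = [(g["StarName"], g["minYear"]) for g in selected]
print(result)
```

The attribute ctTitle exists only so that the selection can refer to the count; the final projection removes it, just as the SQL query does not return COUNT(Title).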

Summary of Lecture III

• Introduction to Query Processing
  – An Overview of Relational Algebra
  – Relational Algebra and SQL