<<

Cleveland State University CIS611 – Relational Systems Lecture Notes Prof. Victor Matos

RELATIONAL

V. Matos - CIS611_LECTURE_NOTES_ALGEBRA.docx 1 THE RELATIONAL (RM) and the Relational Algebra

The of data (RM) was introduced by Dr. E. Codd (CACM, June 1970).

The RM is simpler and more uniform than the preceding Network and Hierarchical model.

S.Todd, (IBM 1976) presented PRTV the first implementation of a relational algebra DBMS.

A. Klug added summary functions for statistical computing (ACM SIGMOD 1982).

Roth, Oszoyoglu et al. (1987) extended the model to allow nested data structures

Clifford, Tansel, Navathe, and others have added time especifications into the model

Current research is aimed at extending the model to support complex data objects, multimedia mgnt, hyperdata, geographical, temporal and logical processing.

V. Matos - CIS611_LECTURE_NOTES_ALGEBRA.docx 2 THE RELATIONAL DATA MODEL (RM) and the Relational Algebra

A is a collection of relations

A is a 2-dimensional , in which each represents a collection of related data values

The values in a relation can be interpreted as a fact describing an entity or a relationship

Relation name Attributes

STUDENT Name SSN Address GPA Mary Poppins 111-22-3333 77 Picadilly St 4.00 Pepe Gonzalez 123-45-6789 123 Bonita Rd. 3.09 V. Sundarabatharan 999-88-7777 105 Calcara Ave. 3.87 Shi-Wua Yan 881-99-0101 778 Tienamen Sq. 3.88

V. Matos - CIS611_LECTURE_NOTES_ALGEBRA.docx 3 THE RELATIONAL DATA MODEL (RM) and the Relational Algebra

Domains, Tuples, Attributes, and Relations

A domain D is a of atomic values. A domain is given a name, data type and format.

A relation schema R, denoted R(A1, A2, ...An), is a set of attributes ( names). The degree of a relation is the number of attributes of its relation scheme

Each attribute Ai is the name of a role played by

some domain D in R(A1...An). D is denoted the

domain of Ai and is denoted by dom(Ai).

A relation r defined on schema R(A1...An), also

denoted by r(R), is a set of n-tuples r= { t1, t2, ...tm}

Each n- t is an ordered list of n values t=

, where each value vi is an element

of dom(Ai) or a special value.

A relation r(R) is a subset of the cartesian

product of the domains dom(Ai ) that define R.

V. Matos - CIS611_LECTURE_NOTES_ALGEBRA.docx 4 THE RELATIONAL DATA MODEL (RM) and the Relational Algebra

Characteristics of Relations

Relations are defined as a (mathematical) set of tuples.

Duplicate tuples are not allowed

Order of tuples inside a relation is immaterial

Ordering of values within a tuple is irrelevant; therefore the column ordering is not important.

Each value in a tuple is atomic (not divisible)

Recent research is oriented toward removing the atomicity of First Normal Form

V. Matos - CIS611_LECTURE_NOTES_ALGEBRA.docx 5 THE RELATIONAL DATA MODEL (RM) and the Relational Algebra

Key Attributes of a Relation

A SK of relation r(R) is a group of attributes which uniquely identifies all the other attributes of r(R).

A key K of a relation schema R is a minimal superkey of R.

A relation schema R may have more than one key. Each of those is called a .

It is common to one of the candidate keys and elevate it to Primary Key.

Convention: The attributes representing the primary key of schema R are underlined. Example: EMPLOYEE (SSN, Name, Address, Salary) STOCK (PartNum, SupNum, Quantity)

V. Matos - CIS611_LECTURE_NOTES_ALGEBRA.docx 6 THE RELATIONAL DATA MODEL (RM) and the Relational Algebra

Integrity Constrains

Integrity constrains are rules specified on the database and are expected to hold on every instance of that schema.

1. Key constraints specify the candidate keys of each relation scheme R.

2. Entity integrity constraints state that no primary key value can be null.

3. constraints are specified between two tables and is used to maintain the consistency among tuples of the two relations. (s) of one relation are used to refer to primary key values in the other relation.

V. Matos - CIS611_LECTURE_NOTES_ALGEBRA.docx 7 RELATIONAL QUERY LANGUAGES

Classification based on the underlying language „model‟

Model Example

1. Pure Algebraic ISBL-IBM Info. Syst. Base Lang.

2. Pure Predicate Calculus Tuple Oriented Type QUEL Ingres Domain Oriented QBE, STBE

3. Mixed Algebra-Calculus SQL

4. Object Oriented DB4O

5. Associative Sentences - LazySoft

6. Other Cache (Object-Relational) ….

V. Matos - CIS611_LECTURE_NOTES_ALGEBRA.docx 8 THE RELATIONAL DATA MODEL (RM) and the Relational Algebra

Relational Algebra

Collection of operators which are used to manipulate entire relations.

The result of each operation is a new relation.

Consists of two grups: operations on sets and operations specifaclly designed to manipulate relational databases

Operations on sets: DIFFERENCE INTERSECTION

Operations on databases SELECT PROJECT JOIN AGGREGATE DIVISION RENAME

V. Matos - CIS611_LECTURE_NOTES_ALGEBRA.docx 9 RELATIONAL ALGEBRA

UNION

The result of this operation, denoted (r s ) or ( r + s ), is a relation that includes all tuples that either are in r or s or both in r and s. Duplicate tuples are eliminated. Combined relations must be union-compatible r + s = { t / t r or t s }

r A B C s A B C 1 1 1 1 2 3 2 2 2 1 1 1 3 3 3 3 2 1

r + s A B C 1 1 1 2 2 2 3 3 3 1 2 3 3 2 1

V. Matos - CIS611_LECTURE_NOTES_ALGEBRA.docx 10 RELATIONAL ALGEBRA

DIFFERENCE

The result of this operation, denoted ( r - s ), is a relation that includes all tuples that are in r but not in s. Participating relations must be union-compatible r - s = { t / t r and t s }

Example

r A B C s A B C 1 1 1 1 2 3 2 2 2 1 1 1 3 3 3 3 2 1

r - s A B C 2 2 2 3 3 3

NOTE: The difference operator is not commutative, that is ( in general ) r - s s - r

V. Matos - CIS611_LECTURE_NOTES_ALGEBRA.docx 11 RELATIONAL ALGEBRA

INTERSECTION

The result of this operation, denoted ( r s ), is a relation that includes all tuples that are present in both r and s. Participating relations must be union-compatible r s = { t / t r and t s }

Example

r A B C s A B C 1 1 1 1 2 3 2 2 2 1 1 1 3 3 3 3 2 1

r s A B C 1 1 1

V. Matos - CIS611_LECTURE_NOTES_ALGEBRA.docx 12 RELATIONAL ALGEBRA

CARTESIAN PRODUCT

The operation, denoted ( r s ), is also known as the cross product or cross . The purpose of the operator is to concatenate rows from two relations, making all possible combinations of rows.

Consider relation schemas r(A1,A2,...An ) and s(B1,B2,...Bm ) Relations r and s, do not have to be union-compatible If r has n tuples, and s has m tuples, then (r s) will have a total of (n * m) tuples

The resulting relation schema is ( A1,A2,...An, B1,...,Bm )

Example

r2 A B C r2 x A B C D E s2 1 1 1 1 1 1 10 a 2 2 2 1 1 1 20 b 3 3 3 2 2 2 10 a 2 2 2 20 b s2 D E 3 3 3 10 a 10 a 3 3 3 20 b 20 b

V. Matos - CIS611_LECTURE_NOTES_ALGEBRA.docx 13 RELATIONAL ALGEBRA

PROJECTION

The project operator extracts certain columns from the table and discards the other columns. Syntax: Result= (Table) Col where Col is the list of columns to be extracted from the Table Duplicate tuples in the resulting table are eliminated

EXAMPLE Evaluate the expressions Temp1= (r ) A

Temp2= ( r ) B,C

r A B C Temp1 A Temp2 B C 1 610 3 1 610 3 1 620 3 2 620 3 1 600 2 600 2 1 650 2 650 2 2 610 3 634 4 2 634 4

V. Matos - CIS611_LECTURE_NOTES_ALGEBRA.docx 14 RELATIONAL ALGEBRA

SELECTION

The selection operator extracts certain rows from the table and discards the others. Retrieved tuples must satisfy a given filtering condition.

Syntax: Result = cond (table)

where Cond is a logical expression containing and, or, not operators on clauses of the form (table.column value) or (table.col1 table.col2) and = { =, >, >=, <, <=, <> } Entire rows (with all of their columns) are retrieved when the condition is met.

EXAMPLE Evaluate the expression

Temp1 = (B >=620) and (c<4) (r)

r A B C Temp A B C 1 1 610 3 1 620 3 1 620 3 1 650 2 1 600 2 1 650 2 2 610 3 2 634 4

V. Matos - CIS611_LECTURE_NOTES_ALGEBRA.docx 15 RELATIONAL ALGEBRA

RENAME

In some cases, we may want to rename the attributes of a relation or the relation name or both. The rename operator is useful to avoid situations in which a query produces columns with the same name (perhaps different meaning).

Syntax: Result = oldName ← newName (table)

where oldName is a column in the table and newName is the new identification for the column. Only the column name is changed, data remains intact.

EXAMPLE Evaluate the expression

Temp1 = A ← Section, B ← Course, B ← Credits (r)

r A B C Temp Sectio Cours Credits 1 n e 1 610 3 1 610 3 1 620 3 1 620 3 1 600 2 1 600 2 1 650 2 1 650 2 2 610 3 2 610 3 2 634 4 2 634 4

V. Matos - CIS611_LECTURE_NOTES_ALGEBRA.docx 16 RELATIONAL ALGEBRA

JOIN

The join operation, denoted by (Tab1 Tab2), is < join condition> used to combine related tuples from two relations

Join-condition format is: (Table1.Col1 Table2.Col2), where could be { =, >, >=, <, <=, <> } Restrictions of the form (Table.Col Value), can be and-ed, or or-ed to the joining condition.

EXAMPLE

Consider relation schemas r(A,B,C ) and s(D,E ), and the expression: Temp1= ( r s ) (C D)

r A B C 1 1 1 2 2 2 3 3 3 Temp1 A B C D E 1 1 1 1 a s D E 2 2 2 2 b 1 a 2 2 2 2 c 2 b 2 c

V. Matos - CIS611_LECTURE_NOTES_ALGEBRA.docx 17 RELATIONAL ALGEBRA

NATURAL JOIN

The natural join operation, denoted by (Tab1 * Tab2 ), is used to combine tuples of two relations under an equi-join. Related columns must have the same name & domain

Implicit Join-Conditions are: (Table1.Col1 = Table2.Col2)

EXAMPLE

Consider relation schemas r(A,B,C ) and s(B,C,D ), and the expression: Temp= ( r * s )

r A B C 1 1 1 2 1 0 4 3 2 Temp A B C D 1 1 1 a s B C D 4 3 2 c 1 1 a 1 2 b 3 2 c 4 3 d

The implicit join-condition is (r.B=s.B) and (r.C=s.C)

V. Matos - CIS611_LECTURE_NOTES_ALGEBRA.docx 18

RELATIONAL ALGEBRA

LEFT OUTER JOIN

The left outer join operation, denoted by (r  s ), is a special < join condition> case of the general join. LOJ keeps in the resulting table representation from every tuple that appears in the first (or left) relation If no matching value for r is found in s, then the attributes of s appear in the result as null values

EXAMPLE

Consider relation schemas r(A,B,C ) and s(D,E ), and the expression: Temp1= ( r  s ) ( A D )

r A B C 1 1 1 2 2 2 3 3 3 Temp1 A B C D E 1 1 1 1 a s D E 2 2 2 2 b 1 a 2 2 2 2 c 2 b 3 3 3 null null 2 c

NOTE: Outer-join is not a primitive operator. It could be expressed as follows: ( r s ) = r( s ( ( r ) ( s )) Null L ) AD YY

Where: Y= Schema(r) Schema(s), and L = degree(s) - |Y|

V. Matos - CIS611_LECTURE_NOTES_ALGEBRA.docx 19

RELATIONAL ALGEBRA

AGGREGATE FUNCTIONS

Originally proposed by A. Klug (1982) to extend the scope of relational algebra allowing mathematical computations of summary functions.

Syntax: ( ) Common functions are: MAX, MIN, AVG, SUM, COUNT Grouping attributes force a fragmentation of the relation, the function is computed in each independent group. Output consists of the grouping attributes and the result of the summary functions on each group If no grouping field(s) is given the function(s) applies on the entire table

EXAMPLE

Compute Temp= A Sum(B), Max(C) ( r )

r A B C 1 10 1 Group-by Summary field Data 1 2 5 Temp A Sum_B Max_C 2 3 3 1 12 5 3 6 10 2 3 3 3 5 7 3 11 10

V. Matos - CIS611_LECTURE_NOTES_ALGEBRA.docx 20 RELATIONAL ALGEBRA

DIVISION

The division operation, denoted by (r / s) is useful when you need a mechanism to identify the tuples of some table that are related to each and every one of the tuples of a second group.

EXAMPLE

Consider relation schemas r(A,B ) and s(B ), and the expression: Temp1= ( r / s )

r A B Temp A 1 1 1 1 1 2 1 3 the algebraic expression 1 4 r[A,B ] / s[B ] 2 1 selects the A -values from the dividend table 2 3 r[A,B], whose B-values are a super-set of those 3 3 B-values held in the divisor table s[B].

s B 1 2 3

NOTE Division is not a primitive operation it could be expressed as: r / s = A for table schemes r(A,B) and s(B)

V. Matos - CIS611_LECTURE_NOTES_ALGEBRA.docx 21 RELATIONAL ALGEBRA

PACK

Assume A is an attribute in Schema(r). The PackA(r) operator transforms the A-values into a ‘nested’ representation. A Nested field is a set of related atomic values.

EXAMPLE Consider relation schemas r(A,B,C) and the expression: temp1 = PackC(r) r A B C temp1 A B C b1 40 a1 b1 40 {a1, a2}

b1 40 a2 Packc(r) b2 50 {a3}

b2 50 a3 b3 60 {a2, a4, a5}

b3 60 a4 b4 60 {a6}

b3 60 a2

b3 60 a5

b4 60 a6

Consider relation schemas s(A,B,C) and the expression: temp2 = PackC(s)

s A B C temp2 A B C PackC(s) m1 1 {a1, a2} m1 1 {a1, a2, a3}

m1 1 {a3} m2 2 {a4, a7, a8}

m2 2 { a4 } m2 1 {a5, a6} m2 1 {a5, a6} m2 2 {a4, a7, a8}

V. Matos - CIS611_LECTURE_NOTES_ALGEBRA.docx 22 RELATIONAL ALGEBRA

PACK (cont…)

Let A be one of the n attributes in Schema (A1…An). Assume the relation r is defined over the Schema (A1…An).

Let CA = Schema (A1…An) – {A}. Therefore |CA| = n-1

For each (n-1)-tuple gr() we define the sets Wg[CA] and Wg[A] as CA follows:

Wg[A] = { t[A] / t and t[CA]= g } if A is atomic, and

Wg[A] = { x / t) t otherwise.

Then PackA(r) = { Wg / gr() } CA Therefore, the Pack operator converts sets of r-tuples whose (n-1) attributes for CA are the same into a single tuple.

V. Matos - CIS611_LECTURE_NOTES_ALGEBRA.docx 23 RELATIONAL ALGEBRA

UNPACK

Unpack is the counterpart of the Pack operator. When applied on the set- valued attribute A of a relation r, this operator transforms the single non- atomic version of the tuple into a group of records in which the attribute A is atomic. EXAMPLE Consider relation schemas r(A,B,C) and the expression: temp1 = UnpackC(r) r A B C temp1 A B C b1 1 {a1, a2} b1 1 a1

b2 2 {a3} b1 1 a2 b2 2 {a2, a4, a5} b2 2 a3 b4 3 {a6} b3 2 a4 UnpackC(r) b3 2 a2

b3 2 a5

b4 3 a6

Let A be one of the n attributes in Schema (A1…An). Assume the relation r is defined over the Schema (A1…An).

Let CA = Schema (A1…An) – {A}. Therefore |CA| = n-1

UPA ( {t} ) = { t[A] } if A is atomic, and

UPA( {t} ) = { t’ / (t’[A] t[A]) and (t’[CA] = t[CA]) } otherwise.

Then PackA(r) = { Wg / (({})UPA t } tr If A is atomic then UPA(r)= r, otherwise UPA(r) maps each tuple t in r into a set of (decompressed) tuples such that each element in t[A] becomes the atomic A-value of a new decompressed tuple.

V. Matos - CIS611_LECTURE_NOTES_ALGEBRA.docx 24

RELATIONAL ALGEBRA

EXAMPLE QUERIES

QUERY 1. Retrieve the name and address of all employees who work in the 'Research' department.

QUERY 2. For every project located in 'Cleveland', list the project number, the controlling department number, and the department manager's last name, address, and birthdate.

QUERY 3. Find the name of employees who work on all projects controlled by department number 5.

QUERY 4. Make a list of project numbers for projects that involve an employee whose last name is 'Smith', either as a worker or as a manager of the department that controls the project.

QUERY 5. List the name of all employees with two or more dependents.

QUERY 6. Retrieve the name of employees who have no dependents.

QUERY 7. List the name of managers who have at least one dependent.

V. Matos - CIS611_LECTURE_NOTES_ALGEBRA.docx 25 Company Database

PROJECT WORKS_ON PNAME ESSN PNUMBER PNO PLOCATION Hours DNUM

EMPLOYEE FNAME MINIT DEPARTMENT LNAME DEPENDENT DNAME SSN ESSN DNUMBER BDATE DEPENDENT_NAME MGRSSN ADDRESS SEX MGRSTARTDATE SEX BDATE SALARY RELATIONSHIP SUPERSSN DNO

DEPT_LOCATIONS DNUMBER DLOCATION

V. Matos - CIS611_LECTURE_NOTES_ALGEBRA.docx 26

V. Matos - CIS611_LECTURE_NOTES_ALGEBRA.docx 27 IST 331 Brief Notes on Relational Algebra V. Matos.

Consider the relation schema of the COMPANY database given below

EMPLOYEE (fmane, minit, lname, ssn, birthdate, address, sex, salary, superssn, dno) DEPARTMENT (dname, dnumber, mgrssn, mgrstartdate) PROJECT (pname, pnumber, plocation, dnum) WORKS_ON (essn, pno, hours) DEPENDENT (essn, dependent-name, sex, bdate, relationship)

Operator Example Comments

Answer Find the female employees Selection ()Employee earning at least $25K. (sex ' F ') and ( salary 25000)

Get the Social Sec. No. and first Answer() Employee ssn, Fname name of each employee Answer Merge employee and department Join Employee Department records according to matching (..)L dno R dnumber dept. numbers. Get the male employees earning Union Answer()() Employee Employee at least $35K and the female (sex ' F ') and ( salary 25000) ( sex ' M ') and ( salary 35000) employees whose salary is exactly $25K Minus Answer Employee() Employee Get the employees who do not dno 4 work for department No. 4 Intersection Answer()() Employee Employee Get all the female employees sex' F ' dno 4 who work for dept. 4 Find the average salary of Aggregatio employees grouping by sex and Answerdno,() sexF average salary () Employee dept. no. Put the results in the n table defined as: Answer(dno,sex,average_salary) Answer( WorksOn ) ( ( Project )) Get the SSN of employees Division plocation'' Clev working in each one of the essn, pno Pnumber projects located in Cleveland Change the labels “fname” and Rename Answer() Employee “lname” in the Employee table to fname,, lname First Last “First”, and “Last”.

Q01. Get the SSN and Last name of each of the female managers

Answer1 ( ( Employee ) Department ) 1 (..)L ssn R MgrSsn ssn, lname sex'' F

Q02. Get the Social Sec. No. of those employees who are married.

Answer2 ( ( Dependent )) essn ('')relationship Spouse

1 Observation: The notation (ee 1 2) merges the tables produced by the expressions e1 and e2. The match is dictated by the joining (..)L a R b condition (L.a=R..b). The fragment L.a identifies the a-column as part of the table produced by the table/expression e1.The L and R qualifications indicate whether the source columns are located to the left or right side of the join-operator .

V. Matos - CIS611_LECTURE_NOTES_ALGEBRA.docx 28 Q03. Get the last name of the married female managers. Rename the column to “Mgr Name”

Answer3 ( ( Answer 1 Answer 2)) (..)L ssn R essn lname"" Mgr Name Lname

Q04. Get the last name of married employees who have at least one daughter and one son.

Boys( ( Dependent )) essn (relationship ' Son ') Girls(()) Dependent essn (relationship ' Daughter ')

Married(()) Dependent essn (relationship ' Spouse ')

TheSsn Married Boys Girls

Answer() Employee TheSsn L.. ssn R essn Lname

Q05. Get the last name of married employees who have no children.

TheSsn Married() Boys Girls

Answer() Employee TheSsn L.. ssn R essn Lname

Q06. Get the last name of married employees who only have daughters.

TheSsn Married() Girls Boys

Answer() Employee TheSsn L.. ssn R essn Lname

Q07. Get the last name and salary of each employee as well as that of their corresponding (direct) supervisor.

Answer(()()) Employee Employee EmpSsn ssn,,,, salary EmpSsn EmpSalary(..)L superSsn R BossSsn ssn salary BossSsn BossSalary EmpSalary BossSsn BossSalary

Q08. Get the last name of employees who work on five or more projects.

theTallyessnF count() pno () worksOn

Answer(()) Employee theTally Lname L.. ssn R essn (count _ pno 5)

V. Matos - CIS611_LECTURE_NOTES_ALGEBRA.docx 29 Relational Algebra – Practice Test

Last Name: ______First Name:______

Consider the relation schema of the COMPANY database given below EMPLOYEE (fmane, minit, lname, ssn, birthdate, address, sex, salary, superssn, dno) KEY: ssn

DEPARTMENT (dname, dnumber, mgrssn, mgrstartdate) KEY: dnumber.

PROJECT (pname, pnumber, plocation, dnum) KEY: pnumber.

WORKS_ON (essn, pno, hours) KEY: (essn, pno)

DEPENDENT (essn, dependent-name, sex, bdate, relationship) KEY: (essn, dependent-name)

Formulate the following question in Relational Algebra : 1. Give the last name of those employees who work in any project(s) where there are more female than male employees.

2. Give the last name of those female managers who work in each of the projects located in Cleveland.

V. Matos - CIS611_LECTURE_NOTES_ALGEBRA.docx 30