Relational Algebra
Cleveland State University
CIS611 – Relational Database Systems
Lecture Notes
Prof. Victor Matos

RELATIONAL ALGEBRA

V. Matos - CIS611_LECTURE_NOTES_ALGEBRA.docx

THE RELATIONAL DATA MODEL (RM) and the Relational Algebra

The relational model of data (RM) was introduced by Dr. E. F. Codd (CACM, June 1970).
The RM is simpler and more uniform than the preceding Network and Hierarchical models.
S. Todd (IBM, 1976) presented PRTV, the first implementation of a relational algebra DBMS.
A. Klug added summary functions for statistical computing (ACM SIGMOD 1982).
Roth, Ozsoyoglu et al. (1987) extended the model to allow nested data structures.
Clifford, Tansel, Navathe, and others have added time specifications to the model.
Current research is aimed at extending the model to support complex data objects, multimedia management, hyperdata, and geographical, temporal, and logical processing.

A relational database is a collection of relations.
A relation is a 2-dimensional table in which each row represents a collection of related data values.
The values in each row can be interpreted as a fact describing an entity or a relationship.

Relation name: STUDENT
Attributes: the column names (Name, SSN, Address, GPA)
Tuples: the rows of the table

STUDENT
Name                SSN          Address            GPA
Mary Poppins        111-22-3333  77 Picadilly St    4.00
Pepe Gonzalez       123-45-6789  123 Bonita Rd.     3.09
V. Sundarabatharan  999-88-7777  105 Calcara Ave.   3.87
Shi-Wua Yan         881-99-0101  778 Tienamen Sq.   3.88

Domains, Tuples, Attributes, and Relations

A domain D is a set of atomic values. A domain is given a name, a data type, and a format.
A relation schema R, denoted R(A1, A2, ..., An), is a set of attributes (column names).
The degree of a relation is the number of attributes of its relation schema.
Each attribute Ai is the name of a role played by some domain D in R(A1, ..., An).
D is called the domain of Ai and is denoted by dom(Ai).
A relation r defined on schema R(A1, ..., An), also denoted by r(R), is a set of n-tuples r = { t1, t2, ..., tm }.
Each n-tuple t is an ordered list of n values t = <v1, v2, ..., vn>, where each value vi is an element of dom(Ai) or a special null value.
A relation r(R) is a subset of the Cartesian product of the domains dom(Ai) that define R.

Characteristics of Relations

Relations are defined as (mathematical) sets of tuples.
Duplicate tuples are not allowed.
The order of tuples inside a relation is immaterial.
The ordering of values within a tuple is irrelevant as long as each value stays associated with its attribute; therefore the column ordering is not important.
Each value in a tuple is atomic (not divisible).
Recent research is oriented toward removing the atomicity requirement of First Normal Form databases.

Key Attributes of a Relation

A superkey SK of relation r(R) is a set of attributes whose values uniquely identify each tuple of r(R): no two distinct tuples agree on all attributes of SK.
A key K of a relation schema R is a minimal superkey of R: removing any attribute from K leaves a set that is no longer a superkey.
A relation schema R may have more than one key. Each of those is called a Candidate Key.
It is common to select one of the candidate keys and elevate it to Primary Key.
Convention: The attributes representing the primary key of schema R are underlined.
Example:
EMPLOYEE (SSN, Name, Address, Salary)
STOCK (PartNum, SupNum, Quantity)

Integrity Constraints

Integrity constraints are rules specified on a database schema that are expected to hold on every instance of that schema.
1. Key constraints specify the candidate keys of each relation schema R.
2. Entity integrity constraints state that no primary key value can be null.
3. Referential integrity constraints are specified between two relations and are used to maintain consistency among the tuples of the two relations. Foreign key(s) of one relation are used to refer to primary key values in the other relation.

RELATIONAL QUERY LANGUAGES

Classification based on the underlying language 'model'

Model                          Example
1. Pure Algebraic              ISBL (IBM Information System Base Language)
2. Pure Predicate Calculus
     Tuple Oriented Type       QUEL (Ingres)
     Domain Oriented Type      QBE, STBE
3. Mixed Algebra-Calculus      SQL
4. Object Oriented             DB4O
5. Associative                 Sentences - LazySoft
6. Other                       Cache (Object-Relational), ...

Relational Algebra

A collection of operators used to manipulate entire relations.
The result of each operation is a new relation.
It consists of two groups: operations on sets, and operations specifically designed to manipulate relational databases.
Operations on sets: UNION, DIFFERENCE, INTERSECTION, CARTESIAN PRODUCT
Operations on databases: SELECT, PROJECT, JOIN, AGGREGATE, DIVISION, RENAME

RELATIONAL ALGEBRA: UNION

The result of this operation, denoted (r ∪ s) or (r + s), is a relation that includes all tuples that are in r, in s, or in both r and s.
Duplicate tuples are eliminated.
The combined relations must be union-compatible.

r + s = { t | t ∈ r or t ∈ s }

Example

r              s
A  B  C        A  B  C
1  1  1        1  2  3
2  2  2        1  1  1
3  3  3        3  2  1

r + s
A  B  C
1  1  1
2  2  2
3  3  3
1  2  3
3  2  1

RELATIONAL ALGEBRA: DIFFERENCE

The result of this operation, denoted (r - s), is a relation that includes all tuples that are in r but not in s.
The participating relations must be union-compatible.

r - s = { t | t ∈ r and t ∉ s }

Example

r              s
A  B  C        A  B  C
1  1  1        1  2  3
2  2  2        1  1  1
3  3  3        3  2  1

r - s
A  B  C
2  2  2
3  3  3

NOTE: The difference operator is not commutative; that is, in general, r - s ≠ s - r.
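The UNION and DIFFERENCE examples can be reproduced with a short Python sketch. It is only an illustration under stated assumptions: the `Relation` type and the helper names (`relation`, `union`, `difference`, `check_union_compatible`) are hypothetical and not part of the lecture notes, and union-compatibility is simplified to "same attribute names in the same order".

```python
from typing import NamedTuple


class Relation(NamedTuple):
    """Hypothetical representation: attribute names plus a set of value tuples."""
    attrs: tuple      # e.g. ("A", "B", "C")
    rows: frozenset   # a set, so duplicates and ordering disappear automatically


def relation(attrs, rows):
    """Build a Relation from any iterable of attribute names and rows."""
    return Relation(tuple(attrs), frozenset(tuple(row) for row in rows))


def check_union_compatible(r, s):
    # Simplifying assumption: identical attribute lists stand in for
    # "same degree and compatible domains".
    if r.attrs != s.attrs:
        raise ValueError("relations are not union-compatible")


def union(r, s):
    """r + s: tuples that are in r, in s, or in both."""
    check_union_compatible(r, s)
    return Relation(r.attrs, r.rows | s.rows)


def difference(r, s):
    """r - s: tuples that are in r but not in s."""
    check_union_compatible(r, s)
    return Relation(r.attrs, r.rows - s.rows)


r = relation("ABC", [(1, 1, 1), (2, 2, 2), (3, 3, 3)])
s = relation("ABC", [(1, 2, 3), (1, 1, 1), (3, 2, 1)])

print(sorted(union(r, s).rows))       # 5 tuples; (1, 1, 1) appears only once
print(sorted(difference(r, s).rows))  # [(2, 2, 2), (3, 3, 3)]
print(sorted(difference(s, r).rows))  # [(1, 2, 3), (3, 2, 1)]
```

The last two lines illustrate that difference is not commutative: r - s and s - r contain different tuples.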
RELATIONAL ALGEBRA: INTERSECTION

The result of this operation, denoted (r ∩ s), is a relation that includes all tuples that are present in both r and s.
The participating relations must be union-compatible.

r ∩ s = { t | t ∈ r and t ∈ s }

Example

r              s
A  B  C        A  B  C
1  1  1        1  2  3
2  2  2        1  1  1
3  3  3        3  2  1

r ∩ s
A  B  C
1  1  1

RELATIONAL ALGEBRA: CARTESIAN PRODUCT

The operation, denoted (r × s), is also known as the cross product or cross join.
The purpose of the operator is to concatenate rows from two relations, making all possible combinations of rows.
Consider relation schemas r(A1, A2, ..., An) and s(B1, B2, ..., Bm).
Relations r and s do not have to be union-compatible.
If r has p tuples and s has q tuples, then (r × s) has a total of (p * q) tuples.
The resulting relation schema is (A1, A2, ..., An, B1, ..., Bm).

Example

r2             s2
A  B  C        D   E
1  1  1        10  a
2  2  2        20  b
3  3  3

r2 × s2
A  B  C  D   E
1  1  1  10  a
1  1  1  20  b
2  2  2  10  a
2  2  2  20  b
3  3  3  10  a
3  3  3  20  b

RELATIONAL ALGEBRA: PROJECTION

The project operator extracts certain columns from the table and discards the other columns.
Syntax: Result = π_Col (Table)
where Col is the list of columns to be extracted from the Table.
Duplicate tuples in the resulting table are eliminated.

EXAMPLE
Evaluate the expressions
Temp1 = π_A (r)
Temp2 = π_{B,C} (r)

r                Temp1    Temp2
A  B    C        A        B    C
1  610  3        1        610  3
1  620  3        2        620  3
1  600  2                 600  2
1  650  2                 650  2
2  610  3                 634  4
2  634  4

RELATIONAL ALGEBRA: SELECTION

The selection operator extracts certain rows from the table and discards the others.
Retrieved tuples must satisfy a given filtering condition.
Syntax: Result = σ_cond (table)
where cond is a logical expression containing and, or, not operators on clauses of the form
(table.column θ value) or (table.col1 θ table.col2)
and θ ∈ { =, >, >=, <, <=, <> }.
Entire rows (with all of their columns) are retrieved when the condition is met.

EXAMPLE
Evaluate the expression
Temp1 = σ_{(B >= 620) and (C < 4)} (r)

r                Temp1
A  B    C        A  B    C
1  610  3        1  620  3
1  620  3        1  650  2
1  600  2
1  650  2
2  610  3
2  634  4

RELATIONAL ALGEBRA: RENAME

In some cases, we may want to rename the attributes of a relation, the relation name, or both.
The rename operator is useful to avoid situations in which a query produces columns with the same name (and perhaps different meanings).
Syntax: Result = ρ_{oldName ← newName} (table)
where oldName is a column in the table and newName is the new identification for the column.
Only the column name is changed; the data remains intact.

EXAMPLE
Evaluate the expression
Temp1 = ρ_{A ← Section, B ← Course, C ← Credits} (r)

r                Temp1
A  B    C        Section  Course  Credits
1  610  3        1        610     3
1  620  3        1        620     3
1  600  2        1        600     2
1  650  2        1        650     2
2  610  3        2        610     3
2  634  4        2        634     4
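As a rough illustration, the CARTESIAN PRODUCT, PROJECTION, SELECTION, and RENAME examples can be sketched in Python. The `Relation` type and the function names (`relation`, `product`, `project`, `select`, `rename`) are assumptions made for this sketch, not notation from the lecture notes; the selection condition is passed as an ordinary Python function over a row viewed as a dictionary, and the rename maps columns A, B, C to Section, Course, Credits.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    """Hypothetical representation: attribute names plus a set of value tuples."""
    attrs: tuple      # attribute (column) names
    rows: frozenset   # set of value tuples: no duplicates, no ordering


def relation(attrs, rows):
    return Relation(tuple(attrs), frozenset(tuple(row) for row in rows))


def product(r, s):
    """Cartesian product: every row of r concatenated with every row of s."""
    return Relation(r.attrs + s.attrs,
                    frozenset(x + y for x in r.rows for y in s.rows))


def project(r, cols):
    """Keep only the listed columns; the set removes duplicate rows."""
    idx = [r.attrs.index(c) for c in cols]
    return Relation(tuple(cols),
                    frozenset(tuple(row[i] for i in idx) for row in r.rows))


def select(r, cond):
    """Keep whole rows for which cond({attribute: value}) is true."""
    return Relation(r.attrs,
                    frozenset(row for row in r.rows
                              if cond(dict(zip(r.attrs, row)))))


def rename(r, mapping):
    """Rename columns using {old_name: new_name}; the data is untouched."""
    return Relation(tuple(mapping.get(a, a) for a in r.attrs), r.rows)


r = relation("ABC", [(1, 610, 3), (1, 620, 3), (1, 600, 2),
                     (1, 650, 2), (2, 610, 3), (2, 634, 4)])
r2 = relation("ABC", [(1, 1, 1), (2, 2, 2), (3, 3, 3)])
s2 = relation("DE", [(10, "a"), (20, "b")])

crossed = product(r2, s2)
temp1 = project(r, ["A"])
temp2 = project(r, ["B", "C"])
selected = select(r, lambda t: t["B"] >= 620 and t["C"] < 4)
renamed = rename(r, {"A": "Section", "B": "Course", "C": "Credits"})

print(crossed.attrs, len(crossed.rows))  # ('A', 'B', 'C', 'D', 'E') 6
print(sorted(temp1.rows))                # [(1,), (2,)]
print(len(temp2.rows))                   # 5, since (610, 3) occurs twice in r
print(sorted(selected.rows))             # [(1, 620, 3), (1, 650, 2)]
print(renamed.attrs)                     # ('Section', 'Course', 'Credits')
```

Storing rows in a `frozenset` is a deliberate choice for this sketch: it makes duplicate elimination in the projection automatic, matching the set-based definition of a relation.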