
CS 2451 [...] Systems: Relational Algebra & [...]
http://www.seas.gwu.edu/~bhagiweb/cs2541
Spring 2020
Instructor: Dr. Bhagi Narahari

Codd’s Relational Algebra (RA)
§ Data is stored as a set of relations
• Relations implemented as tables
• Tuple in a relation is a row in the table
• Attribute (from domain) in relation is a column in the table
§ RA = A set of mathematical operators that compose, modify, and combine tuples within different relations
§ Relations are sets.
• Mapping: maps elements in one set to another
§ Relational algebra operations operate on relations and produce relations (“closure”)
f: Relation → Relation
f: Relation × Relation → Relation

Based on slides © Ramakrishnan & Gehrke, Dr. R. Lawrence, UBC

Preliminaries
§ A query language is used to update and retrieve data that is stored in a [...].
§ Relational algebra is a set of relational operations for retrieving data.
• Just like algebra with numbers, relational algebra consists of operands (which are relations) and a set of operators.
§ Every relational operator takes as input one or more relations and produces a relation as output.
• Closure property - input is relations, output is relations
• Unary operations - operate on one relation
• Binary operations - have two relations as input
§ A sequence of relational algebra operators is called a relational algebra expression.

Relational Algebra
§ A query is applied to relation instances, and the result of a query is also a relation instance.
• Schemas of input relations for a query are fixed (but the query will run regardless of instance!)
• The schema for the result of a given query is also fixed, determined by the definition of the query language constructs.
§ Notation: Positional ( R[0] ) vs. named-field notation ( R.name ):
• Positional notation is easier for formal definitions, named-field notation is more readable.
• We will use named-field notation R.name
• Both used in SQL


Relational Algebra Operators
§ Basic operations:
• Selection ( σ ) Selects a subset of rows from relation.
• Projection ( π ) Deletes unwanted columns from relation.
• Cross-product ( × ) Allows us to combine two relations.
• Set-difference ( – ) Tuples in relation 1, but not in relation 2.
• Union ( ∪ ) Tuples in relation 1 or in relation 2.
§ Note: cross-product, set-difference, union are the set operations you have seen before
§ Additional operations:
• Intersection, join, assignment, division, renaming: Not essential, but (very!) useful.
§ Since each operation returns a relation, operations can be composed! (Algebra is “closed”.)

Relational Algebra Expression: Syntax
§ RA operators operate on relations and produce relations – closed algebra
• Defined recursively
§ (Basis) A basic expression consists of a relation in the schema or a constant relation
• What is a constant relation?
§ (Recursion) Let E1 and E2 be RA expressions, then


RA expressions (contd.)
• (E1 ∪ E2) is an RA expression
• (E1 – E2) is an RA expression
• (E1 × E2) is an RA expression
• σ_P(E1) is an RA expression
  Where P is a predicate (conditional statement)
• π_S(E1) is an RA expression
  Where S is a subset of the attributes in the schema
• ρ_R(E1) is an RA expression
§ Operations can be composed
• If R1, R2 are relations (sets), then R1 op R2 is also a relation (set), for any binary operator op
• Closed algebra – how is closure defined?
§ Operations are defined as set operations
• Input is a set, output is a set
• SQL allows duplicates, RA does not
§ The above definition can be used to define a syntax and construct a parser

Operator Precedence
§ Just like mathematical operators, the relational operators have precedence.
§ The precedence of operators from highest to lowest is:
• unary operators - σ, π, ρ
• Cartesian product and joins - ×, ⨝, division
• intersection
• union and set difference
§ Parentheses can be used to change the order of operations.
§ It is a good idea to always use parentheses around the argument for both unary and binary operators.

Data Instance for Mini-Banner Example

STUDENT
sid  name
1    Jill
2    Matt
3    Jack
4    Maury

Takes
sid  exp-grade  cid
1    A          550-0103
1    A          700-1003
3    A          700-1003
3    C          500-0103
4    C          500-0103

COURSE
cid       subj  sem
550-0103  DB    F03
700-1003  Math  S03
501-0103  Arch  F03

PROFESSOR
fid  name
1    Narahari
2    Youssef
8    Choi

Teaches
fid  cid
1    550-0103
2    700-1003
8    501-0103

Projection Operator
§ We want to query the database and fetch only some columns/attributes from the relation
§ Example: We want student name only from students table
π_name(Student)


Projection Operation Formal Definition
§ Given a list of column names a and a relation R, π_a(R) extracts the columns in a from the relation.
§ The projection operation on relation R with output attributes A1,…,Am is denoted by π_{A1,…,Am}(R).
π_{A1,…,Am}(R) = { t[A1,…,Am] | t ∈ R }
where
• R is a relation, t is a tuple variable
• {A1,…,Am} is a subset of the attributes of R over which the projection will be performed.
• Order of A1,…,Am is significant in the result.
• Cardinality of π_{A1,…,Am}(R) is not necessarily the same as R because of duplicate removal.

Projection Example

Emp Relation
eno  ename      title  salary
E1   J. Doe     EE     30000
E2   M. Smith   SA     50000
E3   A. Lee     ME     40000
E4   J. Miller  PR     20000
E5   B. Casey   SA     50000
E6   L. Chu     EE     30000
E7   R. Davis   ME     40000
E8   J. Jones   SA     50000

π_{eno,ename}(Emp)
eno  ename
E1   J. Doe
E2   M. Smith
E3   A. Lee
E4   J. Miller
E5   B. Casey
E6   L. Chu
E7   R. Davis
E8   J. Jones
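The definition above can be tried out in a few lines of Python. This is only a sketch under an assumed representation (attribute names in a tuple, rows in a set of tuples); the helper name `project` is hypothetical and not part of the slides. It shows that projection keeps the listed columns in the listed order and that duplicates disappear because the result is a set.

```python
# Assumed representation: attribute names in a tuple, rows in a set of tuples.
EMP_ATTRS = ("eno", "ename", "title", "salary")
EMP = {
    ("E1", "J. Doe", "EE", 30000),
    ("E2", "M. Smith", "SA", 50000),
    ("E3", "A. Lee", "ME", 40000),
    ("E4", "J. Miller", "PR", 20000),
    ("E5", "B. Casey", "SA", 50000),
    ("E6", "L. Chu", "EE", 30000),
    ("E7", "R. Davis", "ME", 40000),
    ("E8", "J. Jones", "SA", 50000),
}


def project(attrs, rows, wanted):
    """pi_wanted(R): keep the wanted columns in the order given.

    Building a set removes duplicate result tuples automatically.
    """
    positions = [attrs.index(name) for name in wanted]
    return {tuple(row[i] for i in positions) for row in rows}


eno_ename = project(EMP_ATTRS, EMP, ("eno", "ename"))
title_salary = project(EMP_ATTRS, EMP, ("title", "salary"))
print(len(EMP), len(eno_ename), len(title_salary))  # prints: 8 8 4
```

Projecting on (title, salary) yields 4 tuples from 8 input rows, which illustrates why the cardinality of the result can be smaller than that of R.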


Recap - Projection, π_a
§ Given a list of column names a and a relation R, π_a(R) extracts the columns in a from the relation.
• Output is a relation…so it removes duplicates!
§ Example: find sid and grade from enrollment table
π_{sid,exp-grade}(Takes)

Takes
sid  exp-grade  cid
1    A          550-0103
1    A          700-1003
3    A          700-1003
3    C          500-0103
4    C          500-0103

π_{sid,exp-grade}(Takes)
sid  exp-grade
1    A
3    A
3    C
4    C

Note: duplicate elimination. In contrast, SQL returns by default a multiset and duplicates must be explicitly removed.

Selection Operation
§ We want to query a table and fetch only the rows that satisfy some condition
• Example: Students whose grade = B
§ The selection operation σ (sigma) is a unary operation that takes in a relation as input and returns a new relation as output that contains a subset of the tuples of the input relation.
• The output relation has the same number of columns as the input relation, but may have fewer rows.
§ To determine which tuples are in the output, the selection operation has a specified condition, called a predicate, that tuples must satisfy to be in the output.
• The predicate is similar to a condition in an if statement.
§ Selection σ_θ(R) takes a relation R and extracts those rows from it that satisfy the condition θ

Select Operation σ
§ The selection operation on relation R with predicate F is denoted by σ_F(R).
σ_F(R) = { t | t ∈ R and F(t) is true }
where
• R is a relation, t is a tuple variable
• F is a formula (predicate) consisting of
  operands that are constants or attributes
  comparison operators: <, >, =, ≠, ≤, ≥
  logical operators: AND, OR, NOT
§ Similarity between formal definition of operators in RA and the syntax of Relational Calculus

Select Operation σ
§ Notation: σ_predicate(Relation)
§ Relation: can be a table or the result of another query
§ Predicate:
1. Simple predicate
• Attribute1 = attribute2
• Attribute = constant (also >, <, >=, <=, <>)
2. Complex predicate
• Predicate AND predicate
• Predicate OR predicate
• NOT predicate
§ Idea: select rows from a Relation based on a predicate


Complex Predicate Conditions
§ Conditions are built up from boolean-valued operations on the field names.
• exp-grade <> “A”, name = “Jill”, STUDENT.sid = Takes.sid
§ RA allows comparison predicates on attributes
• =, not=, >, <, >=, <=
§ Larger predicates can be formed using logical connectives – or (∨), and (∧), and not (¬)
§ Selection predicate can include comparison between attributes
§ We don't lose any expressive power if we don't have complex predicates in the language, but they are convenient and useful in practice.

Selection Example

Emp Relation
eno  ename      title  salary
E1   J. Doe     EE     30000
E2   M. Smith   SA     50000
E3   A. Lee     ME     40000
E4   J. Miller  PR     20000
E5   B. Casey   SA     50000
E6   L. Chu     EE     30000
E7   R. Davis   ME     40000
E8   J. Jones   SA     50000

σ_{title = 'EE'}(Emp)
eno  ename      title  salary
E1   J. Doe     EE     30000
E6   L. Chu     EE     30000

σ_{salary > 35000 ∨ title = 'PR'}(Emp)
eno  ename      title  salary
E2   M. Smith   SA     50000
E3   A. Lee     ME     40000
E4   J. Miller  PR     20000
E5   B. Casey   SA     50000
E7   R. Davis   ME     40000
E8   J. Jones   SA     50000

Logic operators: ∧ AND, ∨ OR, ¬ NOT
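The two selections in the example can be reproduced with a short Python sketch. The representation (a set of named tuples) and the helper name `select` are assumptions made for illustration; the predicate plays the role of F in σ_F(R) and is passed as an ordinary function.

```python
from collections import namedtuple

# Assumed representation: each tuple of Emp is a named tuple.
Emp = namedtuple("Emp", ["eno", "ename", "title", "salary"])

EMP = {
    Emp("E1", "J. Doe", "EE", 30000),
    Emp("E2", "M. Smith", "SA", 50000),
    Emp("E3", "A. Lee", "ME", 40000),
    Emp("E4", "J. Miller", "PR", 20000),
    Emp("E5", "B. Casey", "SA", 50000),
    Emp("E6", "L. Chu", "EE", 30000),
    Emp("E7", "R. Davis", "ME", 40000),
    Emp("E8", "J. Jones", "SA", 50000),
}


def select(rows, predicate):
    """sigma_F(R) = {t | t in R and F(t) is true}."""
    return {t for t in rows if predicate(t)}


ee = select(EMP, lambda t: t.title == "EE")
paid_or_pr = select(EMP, lambda t: t.salary > 35000 or t.title == "PR")

print(sorted(t.eno for t in ee))          # ['E1', 'E6']
print(sorted(t.eno for t in paid_or_pr))  # ['E2', 'E3', 'E4', 'E5', 'E7', 'E8']
```

Every output tuple keeps all four columns; only the number of rows changes.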


Union ∪ (same as set union operation)
§ If two relations have the same structure (Database terminology: are union-compatible. Programming language terminology: have the same type) we can perform set operations.
§ All persons who are students or faculty

STUDENT
sid  name
1    Darby
2    Matt
3    Dan
4    Maury

FACULTY
fid  name
1    Darby
12   Youssef
18   Choi

STUDENT ∪ FACULTY
id   name
1    Darby
2    Matt
3    Dan
4    Maury
12   Youssef
18   Choi

Union Example

Emp
eno  ename      title  salary
E1   J. Doe     EE     30000
E2   M. Smith   SA     50000
E3   A. Lee     ME     40000
E4   J. Miller  PR     20000
E5   B. Casey   SA     50000
E6   L. Chu     EE     30000
E7   R. Davis   ME     40000
E8   J. Jones   SA     50000

WorksOn
eno  pno  resp      dur
E1   P1   Manager   12
E2   P1   Analyst   24
E2   P2   Analyst   6
E3   P4   Engineer  48
E5   P2   Manager   24
E6   P4   Manager   48
E7   P3   Engineer  36
E7   P5   Engineer  23

π_eno(Emp) ∪ π_eno(WorksOn)
eno
E1
E2
E3
E4
E5
E6
E7
E8


Type Matching for Set operations
§ Same number of attributes
§ Same type of attributes
• Each position must match domain
• Real systems sometimes allow sub-types: CHAR(2) and CHAR(20)

Set Difference
§ Set difference is a binary operation that takes two relations R and S as input and produces an output relation that contains all the tuples of R that are not in S.
§ General form:
R – S = { t | t ∈ R and t ∉ S }
where R and S are relations, t is a tuple variable.
§ Note that:
• R – S ≠ S – R (in general)
• R and S must be union compatible.


Difference –
§ Another set operator. Example:

STUDENT
sid  name
1    Sarah
2    Matt
3    Dan
4    Maury

FACULTY
sid  name
1    Sarah
12   Youssef
18   Choi

STUDENT – FACULTY
sid  name
2    Matt
3    Dan
4    Maury

Set Difference Example

Emp Relation
eno  ename      title  salary
E1   J. Doe     EE     30000
E2   M. Smith   SA     50000
E3   A. Lee     ME     40000
E4   J. Miller  PR     20000
E5   B. Casey   SA     50000
E6   L. Chu     EE     30000
E7   R. Davis   ME     40000
E8   J. Jones   SA     50000

WorksOn Relation
eno  pno  resp      dur
E1   P1   Manager   12
E2   P1   Analyst   24
E2   P2   Analyst   6
E3   P4   Engineer  48
E5   P2   Manager   24
E6   P4   Manager   48
E7   P3   Engineer  36
E7   P5   Engineer  23

π_eno(Emp) – π_eno(WorksOn)
eno
E4
E8

Question 1: What is the meaning of this query?
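Because relations are sets, union and set difference map directly onto Python's built-in set operators. The sketch below assumes relations are stored as sets of tuples and reuses the Emp and WorksOn data of the examples; the variable names are chosen only for illustration.

```python
# pi_eno(Emp): one-column tuples E1..E8, as in the Emp relation of the example.
emp_eno = {(f"E{i}",) for i in range(1, 9)}

WORKS_ON = {
    ("E1", "P1", "Manager", 12),
    ("E2", "P1", "Analyst", 24),
    ("E2", "P2", "Analyst", 6),
    ("E3", "P4", "Engineer", 48),
    ("E5", "P2", "Manager", 24),
    ("E6", "P4", "Manager", 48),
    ("E7", "P3", "Engineer", 36),
    ("E7", "P5", "Engineer", 23),
}

# pi_eno(WorksOn): both operands now have one attribute of the same type,
# so they are union-compatible.
works_on_eno = {(row[0],) for row in WORKS_ON}

union = emp_eno | works_on_eno       # pi_eno(Emp) U pi_eno(WorksOn)
difference = emp_eno - works_on_eno  # pi_eno(Emp) - pi_eno(WorksOn)
reverse = works_on_eno - emp_eno     # operands swapped

print(sorted(difference))  # [('E4',), ('E8',)]
print(reverse)             # set()
```

The two differences are not equal, which matches the note that R – S ≠ S – R in general. One reading of the first difference is "employees that do not appear in WorksOn at all".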


Intersection??
§ People who are both Students and Faculty??
§ Do we need an Intersection operator?

Deriving Intersection using Union and Difference
Intersection: as with set operations, derivable from difference
A ∩ B ≡ (A – (A – B)) = (B – (B – A))
[Venn diagram: overlapping sets A and B, with the regions A – B and B – A labeled]
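The identity can be checked on a concrete instance. This minimal sketch assumes relations are Python sets of tuples and uses the STUDENT and FACULTY rows from the union example.

```python
STUDENT = {(1, "Darby"), (2, "Matt"), (3, "Dan"), (4, "Maury")}
FACULTY = {(1, "Darby"), (12, "Youssef"), (18, "Choi")}

# A - (A - B): remove from A everything that is only in A.
via_student = STUDENT - (STUDENT - FACULTY)
# B - (B - A): the symmetric form gives the same tuples.
via_faculty = FACULTY - (FACULTY - STUDENT)

print(via_student)  # {(1, 'Darby')}
```

Both derived forms agree with the built-in intersection `STUDENT & FACULTY`, so an intersection operator is convenient but adds no expressive power.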


Operators that ‘combine’ relations
§ Thus far, only operated on a single relation
§ How to connect two relations?
• To find name of students taking a specific course with cid, we need to look at both students and Takes (enrolled) tables
§ Operator(s) that produce a relation (set of tuples) after combining two different relations
§ Set theory provides us with the cartesian product operator (between two sets; but can be applied to product of any number of sets – to get a k-tuple)

Product ×
§ “Join” is a generic term for a variety of operations that connect two relations. The basic operation is the cartesian product, R × S, which concatenates every tuple in R with every tuple in S. Example:

STUDENT
sid  name
1    Jill
2    Matt

SCHOOL
school
UPenn
GWU

STUDENT × SCHOOL
sid  name  school
1    Jill  UPenn
2    Matt  UPenn
1    Jill  GWU
2    Matt  GWU


Cartesian Product Example

Emp Relation
eno  ename      title  salary
E1   J. Doe     EE     30000
E2   M. Smith   SA     50000
E3   A. Lee     ME     40000
E4   J. Miller  PR     20000

Proj Relation
pno  pname        budget
P1   Instruments  150000
P2   DB Develop   135000
P3   CAD/CAM      250000

Emp × Proj
eno  ename      title  salary  pno  pname        budget
E1   J. Doe     EE     30000   P1   Instruments  150000
E2   M. Smith   SA     50000   P1   Instruments  150000
E3   A. Lee     ME     40000   P1   Instruments  150000
E4   J. Miller  PR     20000   P1   Instruments  150000
E1   J. Doe     EE     30000   P2   DB Develop   135000
E2   M. Smith   SA     50000   P2   DB Develop   135000
E3   A. Lee     ME     40000   P2   DB Develop   135000
E4   J. Miller  PR     20000   P2   DB Develop   135000
E1   J. Doe     EE     30000   P3   CAD/CAM      250000
E2   M. Smith   SA     50000   P3   CAD/CAM      250000
E3   A. Lee     ME     40000   P3   CAD/CAM      250000
E4   J. Miller  PR     20000   P3   CAD/CAM      250000

Product/Join
§ Tuple in R1 × R2 constructed by associating a tuple from R1 with every tuple in R2
§ If R1 has n1 tuples and R2 has n2 tuples, how many does R1 × R2 have?
§ ‘Same’ attribute can appear in both tables?
• This is the “link” between the two tables
§ Rather than write Product followed by Selection predicate, some RA versions give us join operators that perform multiple operations
• Theta join: product followed by selection operator
• Equijoin: when selection is an equality predicate
• Natural Join…in addition, apply projection operator
• Outer join, etc.
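The counting question can be answered by building the product for the example data. This is a sketch under the assumption that tuples are Python tuples and that concatenating two tuples models "associating" them; `itertools.product` from the standard library enumerates all pairs.

```python
from itertools import product

EMP = [
    ("E1", "J. Doe", "EE", 30000),
    ("E2", "M. Smith", "SA", 50000),
    ("E3", "A. Lee", "ME", 40000),
    ("E4", "J. Miller", "PR", 20000),
]
PROJ = [
    ("P1", "Instruments", 150000),
    ("P2", "DB Develop", 135000),
    ("P3", "CAD/CAM", 250000),
]

# Emp x Proj: every Emp tuple concatenated with every Proj tuple.
emp_x_proj = {e + p for e, p in product(EMP, PROJ)}

print(len(EMP), len(PROJ), len(emp_x_proj))  # prints: 4 3 12
```

With n1 = 4 and n2 = 3 the product has n1 * n2 = 12 tuples, each with 4 + 3 = 7 attributes.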


(Theta) Join, ⋈_θ: A Combination of Product and Selection
§ Example: Find students (id and name) and courses they took with grades and cid
σ_{STUDENT.sid=Takes.sid}(STUDENT × Takes) = (STUDENT ⋈_{STUDENT.sid=Takes.sid} Takes)
(the subscript of ⋈ is the join condition)

sid:1  name   sid:2  exp-grade  cid
1      Jill   1      A          550-0103
1      Jill   1      A          700-1003
3      Jack   3      A          700-1003
3      Jack   3      C          500-0103
4      Maury  4      C          500-0103

Types of Joins
§ The θ-join is a general join in that it allows any expression in the condition F. However, there are more specialized joins that are frequently used.
§ An equijoin is a theta-join that only contains the equality operator (=) in formula F.
• e.g. WorksOn ⨝_{WorksOn.pno = Proj.pno} Proj
§ A natural join over two relations R and S, denoted by R ⨝ S, is the equijoin of R and S over a set of attributes common to both R and S.
• It removes the “extra copies” of the join attributes.
• The attributes must have the same name in both relations.


θ-Join Example

WorksOn Relation
eno  pno  resp      dur
E1   P1   Manager   12
E2   P1   Analyst   24
E2   P2   Analyst   6
E3   P4   Engineer  48
E5   P2   Manager   24
E6   P4   Manager   48
E7   P3   Engineer  36
E7   P4   Engineer  23

Proj Relation
pno  pname        budget
P1   Instruments  150000
P2   DB Develop   135000
P3   CAD/CAM      250000
P4   Maintenance  310000
P5   CAD/CAM      500000

WorksOn ⨝_{dur*10000 > budget} Proj
eno  pno  resp      dur  P.pno  pname        budget
E2   P1   Analyst   24   P1     Instruments  150000
E2   P1   Analyst   24   P2     DB Develop   135000
E3   P4   Engineer  48   P1     Instruments  150000
E3   P4   Engineer  48   P2     DB Develop   135000
E3   P4   Engineer  48   P3     CAD/CAM      250000
E3   P4   Engineer  48   P4     Maintenance  310000
E5   P2   Manager   24   P1     Instruments  150000
E5   P2   Manager   24   P2     DB Develop   135000
E6   P4   Manager   48   P1     Instruments  150000
E6   P4   Manager   48   P2     DB Develop   135000
E6   P4   Manager   48   P3     CAD/CAM      250000
E6   P4   Manager   48   P4     Maintenance  310000
E7   P3   Engineer  36   P1     Instruments  150000
E7   P3   Engineer  36   P2     DB Develop   135000
E7   P3   Engineer  36   P3     CAD/CAM      250000
E7   P3   Engineer  36   P4     Maintenance  310000
E7   P4   Engineer  23   P1     Instruments  150000
E7   P4   Engineer  23   P2     DB Develop   135000

Natural Joins
§ Example: Find students (id and name) and courses they took with grades and cid
σ_{STUDENT.sid=Takes.sid}(STUDENT × Takes) = (STUDENT ⋈_{STUDENT.sid=Takes.sid} Takes)

sid:1  name   sid:2  exp-grade  cid
1      Jill   1      A          550-0103
1      Jill   1      A          700-1003
3      Jack   3      A          700-1003
3      Jack   3      C          500-0103
4      Maury  4      C          500-0103

sid:1 and sid:2 are duplicate information…
Do we need two columns?
Why not project only one of them?
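The θ-join example can be recomputed as a product followed by a selection. The sketch below assumes tuples are Python tuples in the column order of the WorksOn and Proj tables; the helper name `theta_join` is hypothetical. Note that the condition dur*10000 > budget also pairs (E7, P3, Engineer, 36) with project P4, since 360000 > 310000.

```python
WORKS_ON = [
    ("E1", "P1", "Manager", 12),
    ("E2", "P1", "Analyst", 24),
    ("E2", "P2", "Analyst", 6),
    ("E3", "P4", "Engineer", 48),
    ("E5", "P2", "Manager", 24),
    ("E6", "P4", "Manager", 48),
    ("E7", "P3", "Engineer", 36),
    ("E7", "P4", "Engineer", 23),
]
PROJ = [
    ("P1", "Instruments", 150000),
    ("P2", "DB Develop", 135000),
    ("P3", "CAD/CAM", 250000),
    ("P4", "Maintenance", 310000),
    ("P5", "CAD/CAM", 500000),
]


def theta_join(r, s, condition):
    """R join_F S = sigma_F(R x S): keep the concatenated pairs that satisfy F."""
    return {a + b for a in r for b in s if condition(a, b)}


# F: dur * 10000 > budget (dur is column 3 of WorksOn, budget is column 2 of Proj)
joined = theta_join(WORKS_ON, PROJ, lambda w, p: w[3] * 10000 > p[2])
print(len(joined))  # prints: 18
```

No tuple of E1 appears in the result because 12 * 10000 is below every budget, and no project is ever paired with P5.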


“Natural” Join, ⋈
§ A common join to do is an equality join of two relations on commonly named fields, and to leave one copy of those fields in the resulting relation. Example:
STUDENT ⋈ Takes = π_{sid:1, name, exp-grade, cid}(STUDENT ⋈_{STUDENT.sid=Takes.sid} Takes)

sid  name   exp-grade  cid
1    Jill   A          550-0103
1    Jill   A          700-1003
3    Jack   A          700-1003
3    Jack   C          500-0103
4    Maury  C          500-0103

What if all the field names are the same in the two relations?
What if the field names are all disjoint?

Equijoin Example

WorksOn Relation
eno  pno  resp      dur
E1   P1   Manager   12
E2   P1   Analyst   24
E2   P2   Analyst   6
E3   P4   Engineer  48
E5   P2   Manager   24
E6   P4   Manager   48
E7   P3   Engineer  36
E7   P4   Engineer  23

Proj Relation
pno  pname        budget
P1   Instruments  150000
P2   DB Develop   135000
P3   CAD/CAM      250000
P4   Maintenance  310000
P5   CAD/CAM      500000

WorksOn ⨝_{WorksOn.pno = Proj.pno} Proj
eno  pno  resp      dur  P.pno  pname        budget
E1   P1   Manager   12   P1     Instruments  150000
E2   P1   Analyst   24   P1     Instruments  150000
E2   P2   Analyst   6    P2     DB Develop   135000
E3   P4   Engineer  48   P4     Maintenance  310000
E5   P2   Manager   24   P2     DB Develop   135000
E6   P4   Manager   48   P4     Maintenance  310000
E7   P3   Engineer  36   P3     CAD/CAM      250000
E7   P4   Engineer  23   P4     Maintenance  310000
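A natural join can be sketched generically by matching on whatever attribute names the two relations share. The representation (a tuple of attribute names plus a set of rows) and the function name `natural_join` are assumptions for illustration, using the Mini-Banner STUDENT and Takes instance.

```python
STUDENT_ATTRS = ("sid", "name")
STUDENT = {(1, "Jill"), (2, "Matt"), (3, "Jack"), (4, "Maury")}

TAKES_ATTRS = ("sid", "exp-grade", "cid")
TAKES = {
    (1, "A", "550-0103"),
    (1, "A", "700-1003"),
    (3, "A", "700-1003"),
    (3, "C", "500-0103"),
    (4, "C", "500-0103"),
}


def natural_join(attrs_r, rows_r, attrs_s, rows_s):
    """Equijoin on all commonly named attributes, keeping one copy of each."""
    common = [a for a in attrs_r if a in attrs_s]
    extra = [a for a in attrs_s if a not in attrs_r]
    out = set()
    for r in rows_r:
        r_named = dict(zip(attrs_r, r))
        for s in rows_s:
            s_named = dict(zip(attrs_s, s))
            # With no common attributes, all() of nothing is True.
            if all(r_named[a] == s_named[a] for a in common):
                out.add(r + tuple(s_named[a] for a in extra))
    return tuple(attrs_r) + tuple(extra), out


attrs, rows = natural_join(STUDENT_ATTRS, STUDENT, TAKES_ATTRS, TAKES)
print(attrs)      # ('sid', 'name', 'exp-grade', 'cid')
print(len(rows))  # 5
```

The same function hints at the two questions on the slide: when the attribute names are all disjoint, every pair matches and the result is the Cartesian product; when all names are the same, only tuples present in both relations survive, which is the intersection.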

Natural join Example

WorksOn Relation
eno  pno  resp      dur
E1   P1   Manager   12
E2   P1   Analyst   24
E2   P2   Analyst   6
E3   P4   Engineer  48
E5   P2   Manager   24
E6   P4   Manager   48
E7   P3   Engineer  36
E7   P4   Engineer  23

Proj Relation
pno  pname        budget
P1   Instruments  150000
P2   DB Develop   135000
P3   CAD/CAM      250000
P4   Maintenance  310000
P5   CAD/CAM      500000

WorksOn ⨝ Proj
eno  pno  resp      dur  pname        budget
E1   P1   Manager   12   Instruments  150000
E2   P1   Analyst   24   Instruments  150000
E2   P2   Analyst   6    DB Develop   135000
E3   P4   Engineer  48   Maintenance  310000
E5   P2   Manager   24   DB Develop   135000
E6   P4   Manager   48   Maintenance  310000
E7   P3   Engineer  36   CAD/CAM      250000
E7   P4   Engineer  23   Maintenance  310000

Natural join is performed by comparing pno in both relations.

Combining Operations….assignment operator
§ Relational algebra operations can be combined in one expression by nesting them:
π_{eno,pno,dur}(σ_{ename='J. Doe'}(Emp) ⨝ σ_{dur>16}(WorksOn))
• Return the eno, pno, and duration for employee 'J. Doe' when he has worked on a project for more than 16 months.
§ Operations also can be combined by using temporary relation variables to hold intermediate results.
• We will use the assignment operator ← for indicating that the result of an operation is assigned to a temporary relation.
empdoe ← σ_{ename='J. Doe'}(Emp)
wodur ← σ_{dur>16}(WorksOn)
empwo ← empdoe ⨝ wodur
result ← π_{eno,pno,dur}(empwo)
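The step-by-step version with temporary relations translates almost line for line into Python. This is a sketch assuming tuples in the column order of the Emp and WorksOn tables; the function `query` and its parameter are hypothetical, added so the same steps can be run for different employee names.

```python
EMP = {
    ("E1", "J. Doe", "EE", 30000),
    ("E2", "M. Smith", "SA", 50000),
    ("E3", "A. Lee", "ME", 40000),
    ("E4", "J. Miller", "PR", 20000),
    ("E5", "B. Casey", "SA", 50000),
    ("E6", "L. Chu", "EE", 30000),
    ("E7", "R. Davis", "ME", 40000),
    ("E8", "J. Jones", "SA", 50000),
}
WORKS_ON = {
    ("E1", "P1", "Manager", 12),
    ("E2", "P1", "Analyst", 24),
    ("E2", "P2", "Analyst", 6),
    ("E3", "P4", "Engineer", 48),
    ("E5", "P2", "Manager", 24),
    ("E6", "P4", "Manager", 48),
    ("E7", "P3", "Engineer", 36),
    ("E7", "P4", "Engineer", 23),
}


def query(ename):
    # empdoe <- sigma_{ename=...}(Emp)
    empdoe = {e for e in EMP if e[1] == ename}
    # wodur <- sigma_{dur>16}(WorksOn)
    wodur = {w for w in WORKS_ON if w[3] > 16}
    # empwo <- empdoe natural-join wodur (eno is the only common attribute)
    empwo = {e + w[1:] for e in empdoe for w in wodur if e[0] == w[0]}
    # result <- pi_{eno,pno,dur}(empwo); empwo columns are
    # (eno, ename, title, salary, pno, resp, dur)
    return {(t[0], t[4], t[6]) for t in empwo}


print(query("J. Doe"))    # set()
print(query("M. Smith"))  # {('E2', 'P1', 24)}
```

On this particular instance the query for 'J. Doe' returns an empty relation, because his only WorksOn tuple has dur = 12. The result is still a relation, which is what closure requires.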


and lastly… Referencing the same relation twice
§ We want to find the student IDs of students who have the same name
§ Ambiguity in the statement
• Student.name = student.name ??
§ The easiest approach is to create a copy of Student with a different name

Rename Operator, ρ_a(R)
§ The rename operator can be expressed several ways:
• One definition: ρ_a(x) takes the relation x and returns a copy of the relation with the name a
• General definition: can rename attribute list with new names b
• Rename isn’t all that useful, except if you join a relation with itself
• ρ_Person(STUDENT) = copy of STUDENT with table name Person
• Find pairs of student IDs who have the same name:
π_{STUDENT.sid, Person.sid}(STUDENT ⋈_{STUDENT.name=Person.name} ρ_Person(STUDENT))


Rename Operator
§ General definition allows renaming specific attributes
• ρ_{X(C,D)}(R(A,B))
  Relation R renamed to X
  Fields A,B in R are now renamed to C,D in X
• ρ_{Person(idnum, who)}(STUDENT(sid, name))
§ Find pairs of student IDs who have the same name:
π_{Student.sid, Person.idnum}(Student ⋈_{Student.name=Person.who} ρ_{Person(idnum,who)}(Student))
§ Note: not necessary to rename the attributes …below will also work:
π_{Student.sid, Person.sid}(Student ⋈_{Student.name=Person.name} ρ_Person(Student))

Rename Operation
§ Renaming can be applied when assigning a result:
result(EmployeeNum, ProjectNum, Duration) ← π_{eno,pno,dur}(empwo)
§ Or by using the rename operator ρ (rho):
ρ_{result(EmployeeNum, ProjectNum, Duration)}(empwo)
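The self-join with rename can be sketched as follows. Sets of tuples stand in for relations, and the function `rename` is a hypothetical stand-in for ρ_Person: in this representation a renamed relation is simply a second handle on the same tuples. The row (5, "Jill") is an invented addition to the STUDENT instance so that two students share a name.

```python
# STUDENT(sid, name); (5, "Jill") is a hypothetical extra row for illustration.
STUDENT = {(1, "Jill"), (2, "Matt"), (3, "Jack"), (4, "Maury"), (5, "Jill")}


def rename(rows):
    """rho_Person(STUDENT): same tuples, referred to under a new relation name."""
    return set(rows)


person = rename(STUDENT)

# pi_{STUDENT.sid, Person.sid}(STUDENT join_{STUDENT.name = Person.name} Person)
pairs = {(s[0], p[0]) for s in STUDENT for p in person if s[1] == p[1]}

# Optional extra condition STUDENT.sid <> Person.sid drops the self-pairs.
distinct_pairs = {(a, b) for (a, b) in pairs if a != b}

print(len(pairs))              # prints: 7
print(sorted(distinct_pairs))  # [(1, 5), (5, 1)]
```

As written on the slide, the query also pairs every student with itself, since a name always equals itself; adding a condition such as STUDENT.sid <> Person.sid to the join predicate would remove those pairs.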


Completeness of Relational Algebra Operators
§ It has been shown that the relational operators {σ, π, ×, ∪, –} form a complete set of operators.
• That is, any of the other operators can be derived from a combination of these 5 basic operators.
§ Examples:
• Intersection: R ∩ S ≡ (R ∪ S) – ((R – S) ∪ (S – R))
• We have also seen how a join is a combination of a Cartesian product followed by a selection.

Other Relational Algebra Operators
§ There are other relational algebra operators that we will not discuss. Most notably, we often need aggregate operations that compute functions on the data.
§ For example, given the current operators, we cannot answer the queries:
• What is the total number of students enrolled in a course?
• What is the total number of employees in department 5?
§ We will see how to answer these queries when we study SQL.


How to write an RA query?
§ Find out which tables you need to access
• Compute × of these tables
§ What are the conditions/predicates you need to apply?
• Determines what select σ operators you need to apply
§ What attributes/columns are needed in the result?
• Determines what project π operators you need
Project ( Select ( Product ) )

Modifying the Database
§ Need to insert, delete, update tuples in the database
§ What is insert?
• Add a new tuple to existing set = Union
§ What is delete?
• Remove a tuple from existing set = Set difference
§ How to update attribute to new value?
• Need new operator: δ


Schema of Bank DB
§ Customer (CustID, Name, street, city, zip)
• Customer ID, Name, and Address info: street, city, zip
§ Deposit (CustID, Acct-num, balance, Branch-name)
• Customer ID, Account number, Balance in account, name of branch where account is held
§ Loan (CustID, Loan-num, Amount, Branch-name)
• Customer ID, loan number, amount of loan…
§ Branch (Branch-name, assets, Branch-city)

Modifying Database
§ Delete all accounts of Customer with CustID=3333
• Deposit ← Deposit – (tuples of CustID 3333)
§ Insert tuple (4444, Downtown, 1000, 1234)
• Deposit ← Deposit ∪ {(4444, Downtown, 1000, 1234)}
§ Update: δ_{A←E}(R)
• Update attribute A to E for tuples in relation R
• δ_{balance←1.05*balance}(Deposit): updates balances
• Can also specify a selection condition on Deposit
  Update balances only for customers with CustID=1234
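The three modifications can be sketched on a small Deposit instance. Everything here is illustrative: the rows are hypothetical, the tuples follow the schema order Deposit(CustID, Acct-num, balance, Branch-name) (so the inserted tuple is written in that order), and `update_balance` is a made-up helper standing in for δ with an optional selection condition.

```python
# Deposit(CustID, Acct-num, balance, Branch-name); all rows are hypothetical.
deposit = {
    (3333, 1001, 500, "Downtown"),
    (3333, 1002, 2000, "Uptown"),
    (1234, 1003, 1000, "Downtown"),
}

# Delete: Deposit <- Deposit - sigma_{CustID=3333}(Deposit)
deposit = deposit - {t for t in deposit if t[0] == 3333}

# Insert: Deposit <- Deposit U {new tuple}
deposit = deposit | {(4444, 1234, 1000, "Downtown")}


def update_balance(rows, new_balance, condition=lambda t: True):
    """delta_{balance <- E}: rewrite balance for tuples satisfying the condition."""
    return {
        (t[0], t[1], new_balance(t[2]), t[3]) if condition(t) else t
        for t in rows
    }


def raise_5_percent(balance):
    # Rounded to cents to avoid floating-point noise.
    return round(balance * 1.05, 2)


all_raised = update_balance(deposit, raise_5_percent)
only_1234 = update_balance(deposit, raise_5_percent, lambda t: t[0] == 1234)

print(sorted(all_raised))
print(sorted(only_1234))
```

Each operation returns a new set rather than changing tuples in place, which mirrors the assignment form Deposit ← … used on the slide.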

