Algebraic Operations on Tabular Data Context 6.1 Basic Idea of Relational
Total Page:16
File Type:pdf, Size:1020Kb
Context Database Design: 6 The Relational Data Model: e s - developing a relational a Requirements analysis b database schema Algebraic operations on tabular data a t Design: a d - formal theory 6.1 Basic idea of relational languages Conceptual Design g n i Data handling in a rela- s 6.2 Relational Algebra operations u tional database Schema design d 6.3 Relational Algebra: Syntax and Semantics n - Algebra,-calculus -SQL -logical a (“create tables”) g Extensions 6.4. More Operators n i n g Using Databases 6.5 Special Topics of RA i s from application progs 6.5.1 Relational algebra operators in SQL e D : Special topics, 6.5.2 Relational completeness 1 Schema design t Physical Schema, 6.5.3 What is missing in RA? r a -physical P 6.5.4 RA operator trees Part 2: (“create access path”) Implementation of DBS Loading, administration, Transaction, Kemper / Eickler: 3.4, Elmasri /Navathe: chap. 74-7.6, tuning, maintenance, Concurrency control Garcia-Molina, Ullman, Widom: chap. 5 reorganization Recovery 6.1 Basic idea of relational languages Relational Languages • Data Model: Important concepts Goal of language design Language for definition and Given a relational database like the Video shop DB handling (manipulation ) of data Design a language, which allows to express queries – Languages for handling data: like: • Relational Algebra (RA) as a semantically well defined - Customers who rented videos for more than 100 $ last month applicative language - List of all movies no copy of which have been on loan since 2 month - List the total sales volume of each movie within the last year • Relational tuple calculus (domain calculus): predicate logic - Is there anybody whose rented movies all have category "horror"? interpretation of data and queries y? ……. ac riv • SQL / DML (or simply SQL) Language should be declarative ("descriptive") P – Kernel of SQL built upon RA as well as calculus, Historically: “Make query formulation ‘as easy as in natural language’ “ extended by operations like arithmetic expressions not available in RA or calculus HS / DBS05-08-RDML1 3 HS / DBS05-08-RDML1 4 Relational Algebra Basic Operations informally (repeated from chapter 4) Projection • Idea of Relational Algebra: – Given relations R(a1,...,an), S(b1,...,bm) Selection – Define operators which transform one or more tables into a result table. Example Union ∪ Tape ( id movieId ) Movie ( m_Id title ) 1 'B' Difference 'A' "Frenzy" - 5 'A' Join tables! 'B' "Matrix" 6 'B' A I Cartesian Project A A II B B I How could we find the number of tapes of "Matrix?" C B II C I I C II HS / DBS05-08-RDML1 5 (Renaming) II HS / DBS05-08-RDML1 6 1 Relational Algebra Basics 6.2 Relational Algebra operations • Why “algebra”? Terminological update – Mathematically, algebraic structures basically defined Let A be a set universal of attributes by a base set S of values and operations which map one or more elements of S to S and obey certain laws –A Relation Schema is a named n-tuple of attributes (e.g. groups, lattices, …) RS = R (a1,...,an), {a1,…an} ⊆ A – The base set of Relational Algebra is the set of all –RA is the set {a1,...,an} of attributes (columns) of RS relations (tables) with attributes from a given set A of called the type signature of R attributes. – The operation Σ applied to a relation R results in the – Operations on tables type signature of R: Σ ( R ) = RA projection, cartesian product, join, .... as introduced intuitively above –A Relational Database Schema is a set of relation schemas –A Database Relation R (conforming to Relation – Note: Result of an operation is time dependent Schema RS) is a subset of D(a1) X ...X D(an), the HS / DBS05-08-RDML1 7 cross product of the domains of the HSattributes / DBS05-08-RDML1 of 8R Set operations Relational Algebra Basic Operations Tape ( id movieId ) Movie ( movieId title ) • Cartesian (Cross) Product 1 'B' ∪ 'A' "Frenzy" ? – Cross product of two sets R and S: 5 'A' 'B' "Matrix" a set of pairs with type signature Σ(R) ⊆ A and Σ(S) ⊆ A 6 'B' – Result relation T should be a relation over R, S relations, Σ(R) ∪Σ(S) = A' ⊆ A ( assumed Σ(R) ∩Σ(S) = ∅) R and S are called union-compatible R ( a1 a2 ) S ( b1 a2 ) if the domains of Σ ( R ) and Σ ( S ) are pair wise the same or: two tables are union-compatible if they have 1 'A' X 3 'A' the same number of columns and have the same 5 'Z' 1 'B' domains in corresponding colums T ((a1 , a2) (b1 a2) ) R and S union-compatible, then set union and set difference Not a relation = (1 'A') (3 'A') R ∪ S and R \ S are defined as usual on mathematical sets schema in 1NF. (5 'Z') (3 'A') Therefore NOT Other set operations may be easily defined using ∪ and \ (1 'A' ) (1 'B') an operation of Relational Algebra HS / DBS05-08-RDML1 9 (5 'Z') (1 'B') HS / DBS05-08-RDML1 10 Relational Algebra Basic Operations Projection π Extended cross product X Let Σ(R) = B’, B ⊆ B’ Projection πB (R ) of R on B: Let R and S be relations, Σ(R) ={a1,...,an} ⊆ A , Set of rows from R with the columns not in B eliminated B' Σ(S) = {b1,...,bm} ⊆ A, Σ(R) ∩Σ(S) = ∅ then B – Schema Σ(R X S): project {R.a1,...R.an, S.b1, …,S.bm} = {R.a | a ∈ A} ∪ {S.b | b ∈ A} Omit relation qualifiers “R.” and “S.” - no naming conflict. – Extended cross product R X S : R X S = – No duplicates in πB (R ) (in theory!) {(a1,...an,b1,...,bm) | (a1,...,an) ∈ R, (b1,...,bm) ∈ S} πB (R ) = {r restricted to B | r ∈ R} Renaming, if Σ(R) ∩Σ(S) != ∅ : = {r’ | there is a tuple r ∈ R such that ρ <attrname> ← <newAttrname> (<relname>) r’ is the restriction of r to the attributes in B} HS / DBS05-08-RDML1 11 HS / DBS05-08-RDML1 12 2 Relational Algebra Basic Operations Selection σ "Find movies directed by Billy Wilder made 1960 or later" Parent (id, mother, father) πmother (Parent) = Mary 25 Mary, Paul select 47 Mary, John 55 Mary, Paul A relation with only one column and one row (no duplicates) Selection of tuples from a table R according to a predicate defined on R • Property of projection: B contains a key of R fl πB (R ) contains as many Predicate P :: R -> {TRUE, FLASE} tuples as R: | πB (R ) | = |R| defined on tuples of R. For each r ∈ R : P(r) = TRUE | FALSE • Useful for estimating the size of query results • Important for optimization HS / DBS05-08-RDML1 13 HS / DBS05-08-RDML1 14 Primitive predicates Row predicates Boolean row predicates Primitive (simple) predicates: Row predicates combine primitive (simple) predicates by and, or, not and parenthesis '(', ')' Let a, b be attributes, w value from dom (a) a θ b and a θ w are primitive predicates Inductive definition of syntax for (row) predicates where θ∈{ =, !=, <, <=, >, >=} ....as usual: - Primitive predicates are predicates Primitive predicates compare either an attribute and a - If Q, Q’ are predicates, then Q ∧ Q’ , Q ∨ Q’ and ¬Q value or two attributes are predicates - Operator preference and brackets as usual - There are no other predicates e.g. Movie.director = ‘Wilder, Billy' Movies directed by Spielberg before 1999 or an entertainment movie : or Rental.until > Rental.from + 1 month movie.director='Spielberg' ∧ (year <= TO_DATE('1999', 'yyyy') ∨ cat = 'entertainment ) HS / DBS05-08-RDML1 15 HS / DBS05-08-RDML1 16 Propositional semantics Selection of rows Semantics of predicates: Selection σ Let σ (R) = {r | r ∈ R and P(r) = TRUE } a, b be attributes of table R, r ∈ R, P P the predicate a θ v , Q is the predicate a θ b where P is a row predicate r(a) : value for attribute a of tuple r Then Note: P(r) := r(a) θ v - Selection operator selects the row with all attributes: Q(r) := r(a) θ r(b) Σ(R) = Σ (σ P (R) ) - size of result depends on selectivity of P S ≡ P ∧ Q : S(r,t) := P(r) ∧ Q(r) according to ∧ semantics selectivity := | σ P (R) | / | R | important for optimization ¬, ∨ and preference as usual in propositional logic Frequently, θ is equality predicate (=) HS / DBS05-08-RDML1 17 HS / DBS05-08-RDML1 18 3 Relational Algebra Basic Operations Relational Algebra: combining operators Example π title (σ P ( Movie) ) "Movies directed by Spielberg produced 1997 or later ! " where P = "director = 'Spielberg' and year >= 1997" Movie(mId,title,...,director, year) Find the actors performing in movie directed by σ ( Movie) P Spielberg where P = "director = 'Spielberg' and year >= 1997" M_ID TITLE CAT YEAR DIRECTOR PRICE_DAY LENGTH ---- ----- ---------- ------- ---------- -------- -------- Select Q Select P +Proj. 2 Amistat Drama 01.04.97 Spielberg 1 120 4 Minority Report Drama 01.04.02 Spielberg 2 145 2 Zeilen ausgewählt. π stage_name (σ P (σ Q ( Movie) X Starring ) where P = "Movie.id = Starring.movieId " Q = "director = 'Spielberg' Wrong result…? "Truth" defined relative to contents of DB Closed world assumption: for all predicates P r ∉R ⇒ P(r) = False HS / DBS05-08-RDML1 19 HS / DBS05-08-RDML1 20 Renaming Renaming Renaming of relations ρ <newname> (<relname>) Relation <relname> is renamed to <newname> in the Example: Employee( id, name, boss, …) context of expression Find subordinates of 'Miller' ρ <attrname> ← <newAttrname> (<relname>) Attribute <attrname> of relation <relname> is renamed to <newAttrname> in the context of expression π name (σ P (σ Q ( Employee) X Employee ))) where P = "Employee.name = 'Miller' " Q = "Employee.boss = Employee.id " π Sub.name (σ Q (σ P ( Employee X ( ρSub (Employee))))) where P = "Employee.name = 'Miller' " Q = "Sub.boss = Employee.id " HS / DBS05-08-RDML1 21 HS / DBS05-08-RDML1 22 Evalution example: one table – two roles 6.3 Relational Algebra: Syntax and Semantics Employee EmployeeSub Syntax of (simple) Relational Algebra defined id name boss id name boss NULL ρ 001 Abel NULL 001 Abel Sub (Employee) inductively : 002 Bebel 005 002 Bebel 005 004 Cebel 005 004 Cebel 005 005 Miller 001 005 Miller 001 (1) Each table identifier is a RA expression 006 Debel 001 006 Debel 001 ...