
Outline

• The world of databases
• The relational model
• Normalization
  – Normal forms
  – Decomposition
• Relational algebra
  – Elegant theoretical framework

What is a DBMS?

• A Database Management System (DBMS) manages very large amounts of data and provides:
  – persistent storage,
  – efficient access,
  – concurrent access,
  – secure, atomic access.

Examples

• (Almost) Everything is a database!
  – Banking systems
  – Reservation systems
  – Libraries
  – The Web
• Varying degrees of structure, organization, and efficiency.

Relational Model

• Based on tables, as:

  acct#   name    balance
  12345   Kate    100000.21
  76543   Rick    89.01
  23500   Tom     555999.02
  34567   Alice   285.48
  …       …       …

• Today used in most DBMS's.

Database Marketplace

• Market size: $7.8B in 2004.
• Major vendors:
  – IBM: DB2, Informix – 34.1%
  – Oracle – 33.7%
  – Microsoft: SQL Server, Access – 20.0%
• Open-source databases are growing in popularity:
  – MySQL, PostgreSQL
Source: Gartner

Relational DBMS Market

• Market size: $15B in 2004 ($13.4B in 2003).
• Market shares:
  – Oracle 41.3%
  – IBM 30.6%
  – Microsoft 13.4%
  – Sybase 3.1%
  – NCR (Teradata) 2.6%
  – Other 9.0%
Source: IDC

Relational Model

• Table = relation.
• Column headers = attributes.
• Row = tuple.
• Beers example:

  name          manf
  Honkers Ale   Goose Island
  BudLite       A.B.
  …             …

Keys

• Need to distinguish tuples.
  – No pointers or IDs
• K is a key for relation R if:
  1. K determines all attributes of R.
  2. For no proper subset of K is (1) true.
• Consequences:
  – The value of a key is unique within a relation, i.e., no two tuples agree on all key attributes.
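The uniqueness consequence can be checked mechanically on a relation instance. The sketch below is an illustration only and is not part of the original slides: `is_unique_on` is a hypothetical helper, and the third Beers row (Bud, A.B.) is an assumption added so that one attribute value repeats. Passing the check on a single instance does not prove that K is a key, since a key is a property of the schema; failing it does show that K is not a key.

```python
def is_unique_on(rows, key_attrs):
    """Return True if no two rows agree on every attribute in key_attrs."""
    seen = set()
    for row in rows:
        key_value = tuple(row[a] for a in key_attrs)
        if key_value in seen:
            return False  # two tuples agree on the candidate key
        seen.add(key_value)
    return True


# Beers instance; the "Bud" row is an assumption added for illustration.
beers = [
    {"name": "Honkers Ale", "manf": "Goose Island"},
    {"name": "BudLite", "manf": "A.B."},
    {"name": "Bud", "manf": "A.B."},
]

print(is_unique_on(beers, ["name"]))  # every name value is distinct
print(is_unique_on(beers, ["manf"]))  # "A.B." appears twice
```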

Relational Model

• Relation schema:
  – name (attributes)
  – other structure info., e.g., keys, other constraints.
• Relation instance is the current set of rows for a relation schema.
• Database schema is a collection of relation schemas.
• "A Relational Model of Data for Large Shared Data Banks" by E. F. Codd in Communications of the ACM, Vol. 13, No. 6, June 1970
• E. F. Codd received the Turing Award in 1981.

Why Relations?

• Very simple model.
• Often a good match for the way we think about our data.
• Abstract model that underlies SQL, the most important language in DBMS's today.
  – But SQL uses bags while the abstract relational model is set-oriented.

Relational Design

• Model the real world scenario with diagrams
  – Entity-Relationship (E-R) diagrams
• Translate diagram to relations
• Normalize relations
  – Eliminate redundancies, other anomalies
• Implement in SQL
• Load data

Example

Drinkers(name, addr, beersLiked, manf, favoriteBeer)

  name   addr           beersLiked   manf   favoriteBeer
  Mike   111 E Ohio     Bud          A.B.   Blonde Ale
  Mike   111 E Ohio     Blonde Ale   G.I.   Blonde Ale
  Nick   2000 W North   Bud          A.B.   Corona
  Anna   123 W Grand    BudLite      A.B.   BudLite

• What is wrong with this relation?

Example

  name   addr           beersLiked   manf   favoriteBeer
  Mike   111 E Ohio     Bud          A.B.   Blonde Ale
  Mike   ???            Blonde Ale   G.I.   ???
  Nick   2000 W North   Bud          ???    Corona
  Anna   123 W Grand    BudLite      A.B.   BudLite

• ???'s are redundant, since we can figure them out from the FD's.
• Update anomalies: If Mike moves, we need to change addr in each of his tuples.
• Deletion anomalies: If nobody likes Bud, we lose track of Bud's manufacturer.

Normalization

• Improve the schema by decomposing relations and removing anomalies.
• Boyce-Codd Normal Form (BCNF): all dependencies follow from keys of relations.
• Third Normal Form (3NF): all dependencies follow from keys or determine parts of keys.
• Other normal forms:
  – 1NF, 2NF, 4NF, 5NF
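The BCNF condition can be tested with the standard attribute-closure computation. The slides mention "the FD's" without listing them, so the sketch below assumes two dependencies for Drinkers: name → addr favoriteBeer and beersLiked → manf. Both the FDs and the helper names `closure` and `bcnf_violations` are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD when its left side is covered and it adds something.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Return the nontrivial FDs whose left side is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != schema
    ]


schema = {"name", "addr", "beersLiked", "manf", "favoriteBeer"}
# Assumed FDs (not listed on the slides).
fds = [
    (frozenset({"name"}), frozenset({"addr", "favoriteBeer"})),
    (frozenset({"beersLiked"}), frozenset({"manf"})),
]

# {name, beersLiked} determines every attribute, so it is a superkey.
print(closure({"name", "beersLiked"}, fds) == schema)
# Neither left side alone is a superkey, so both assumed FDs violate BCNF.
print(bcnf_violations(schema, fds))
```

Under these assumed FDs, each violating dependency suggests a split of the relation, which is consistent with the decomposition shown on the following slides.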

Decomposition Properties

1. We should be able to recover from the decomposed relations the data of the original.
   – Recovery involves projection and join.
2. We should be able to check that the dependencies for the original relation are satisfied by checking the projections of those dependencies in the decomposed relations.

3NF vs. BCNF

• A relation can always be decomposed into BCNF and satisfy (1).
• A relation can always be decomposed into 3NF and satisfy both (1) and (2).
• But it is not always possible to decompose into BCNF and get both (1) and (2).
  – Street-city-zip is a counterexample.

Example (1/4)

  name   addr           beersLiked   manf   favoriteBeer
  Mike   111 E Ohio     Bud          A.B.   Blonde Ale
  Mike   111 E Ohio     Blonde Ale   G.I.   Blonde Ale
  Nick   2000 W North   Bud          A.B.   Corona
  Anna   123 W Grand    BudLite      A.B.   BudLite

• Drinkers can be decomposed as:

  Drinkers
    Drinkers1
    Drinkers2
      Drinkers3
      Drinkers4

Example (2/4)

Drinkers(name, addr, beersLiked, manf, favoriteBeer)
Drinkers1(name, addr, favoriteBeer)
Drinkers2(name, beersLiked, manf)
Drinkers3(beersLiked, manf)
Drinkers4(name, beersLiked)

• How would you rename these relations?

Example (3/4)

Project onto Drinkers1(name, addr, favoriteBeer):

  name   addr           favoriteBeer
  Mike   111 E Ohio     Blonde Ale
  Nick   2000 W North   Corona
  Anna   123 W Grand    BudLite

Project onto Drinkers3(beersLiked, manf):

  beersLiked   manf
  Bud          A.B.
  Blonde Ale   G.I.
  BudLite      A.B.

Example (4/4)

Project onto Drinkers4(name, beersLiked):

  name   beersLiked
  Mike   Bud
  Mike   Blonde Ale
  Nick   Bud
  Anna   BudLite
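Decomposition property (1), recovery by projection and join, can be checked on this instance. The sketch below is a minimal illustration with hypothetical helpers `project` and `natural_join`; each tuple is stored as a frozenset of (attribute, value) pairs, which is a representation chosen here and not something the slides prescribe.

```python
def project(relation, attrs):
    """Project each tuple onto attrs; the set removes duplicate tuples."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in relation}


def natural_join(r1, r2):
    """Combine tuples that agree on every attribute name they share."""
    joined = set()
    for t1 in r1:
        d1 = dict(t1)
        for t2 in r2:
            d2 = dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                joined.add(t1 | t2)
    return joined


ATTRS = ("name", "addr", "beersLiked", "manf", "favoriteBeer")
drinkers = {
    frozenset(zip(ATTRS, values))
    for values in [
        ("Mike", "111 E Ohio", "Bud", "A.B.", "Blonde Ale"),
        ("Mike", "111 E Ohio", "Blonde Ale", "G.I.", "Blonde Ale"),
        ("Nick", "2000 W North", "Bud", "A.B.", "Corona"),
        ("Anna", "123 W Grand", "BudLite", "A.B.", "BudLite"),
    ]
}

drinkers1 = project(drinkers, {"name", "addr", "favoriteBeer"})
drinkers3 = project(drinkers, {"beersLiked", "manf"})
drinkers4 = project(drinkers, {"name", "beersLiked"})

# Join the three projections back together and compare with the original.
recovered = natural_join(natural_join(drinkers1, drinkers4), drinkers3)
print(recovered == drinkers)
```

Equality on one instance is only a spot check; whether a decomposition is lossless in general depends on the dependencies that hold for the schema.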

Core Relational Algebra

• A small set of operators that allows us to manipulate relations in limited but useful ways.
  1. Union, intersection, and difference: the usual set operators.
     • Relation schemas must be the same.
  2. Selection: Pick certain rows from a relation.
  3. Projection: Pick certain columns.
  4. Products and joins: Combine relations in useful ways.
  5. Renaming of relations and their attributes.

Selection

• R1 = σ_C(R2)
  – where C is a condition involving the attributes of relation R2.
• Example:

  Relation Sells:

  bar       beer       price
  Spoon     Amstel     4
  Spoon     Guinness   7
  Whiskey   Guinness   7
  Whiskey   Bud        5

  SpoonMenu = σ_{bar=Spoon}(Sells)

  bar       beer       price
  Spoon     Amstel     4
  Spoon     Guinness   7
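The SpoonMenu selection can be sketched in a few lines. This is an illustrative sketch, not the slides' own code: the `select` function and the `SellsRow` named tuple are hypothetical names, and the condition C is passed as a Python predicate.

```python
from collections import namedtuple

SellsRow = namedtuple("SellsRow", ["bar", "beer", "price"])


def select(relation, condition):
    """sigma_C(R): keep the tuples of relation for which condition holds."""
    return {t for t in relation if condition(t)}


sells = {
    SellsRow("Spoon", "Amstel", 4),
    SellsRow("Spoon", "Guinness", 7),
    SellsRow("Whiskey", "Guinness", 7),
    SellsRow("Whiskey", "Bud", 5),
}

# SpoonMenu = sigma_{bar=Spoon}(Sells)
spoon_menu = select(sells, lambda t: t.bar == "Spoon")
for row in sorted(spoon_menu):
    print(row.bar, row.beer, row.price)
```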

Projection

• R1 = π_L(R2)
  – where L is a list of attributes from the schema of R2.
• Example: π_{beer,price}(Sells)

  beer       price
  Amstel     4
  Guinness   7
  Bud        5

• Notice elimination of duplicate tuples.

Product

• R = R1 × R2
  – pairs each tuple t1 of R1 with each tuple t2 of R2 and puts in R a tuple t1t2.
• Theta-Join: R = R1 ⋈_C R2
  – is equivalent to R = σ_C(R1 × R2).
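The projection example, including the duplicate elimination it points out, can be sketched as follows. The `project` helper and the positional schema list are assumptions made for this illustration.

```python
def project(rows, schema, attrs):
    """pi_L(R): keep only the columns named in attrs; the set drops duplicates."""
    positions = [schema.index(a) for a in attrs]
    return {tuple(row[i] for i in positions) for row in rows}


sells_schema = ["bar", "beer", "price"]
sells = [
    ("Spoon", "Amstel", 4),
    ("Spoon", "Guinness", 7),
    ("Whiskey", "Guinness", 7),
    ("Whiskey", "Bud", 5),
]

beer_prices = project(sells, sells_schema, ["beer", "price"])
# Four input tuples; (Guinness, 7) occurs twice and is kept once.
print(len(sells), len(beer_prices))
```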

Example

  Sells =                          Bars =
  bar       beer       price       name      addr
  Spoon     Amstel     4           Spoon     Wells
  Spoon     Guinness   7           Whiskey   Rush
  Whiskey   Guinness   7
  Whiskey   Bud        5

  BarInfo = Sells ⋈_{Sells.bar=Bars.name} Bars

  bar       beer       price   name      addr
  Spoon     Amstel     4       Spoon     Wells
  Spoon     Guinness   7       Spoon     Wells
  Whiskey   Guinness   7       Whiskey   Rush
  Whiskey   Bud        5       Whiskey   Rush

Natural Join

• R = R1 ⋈ R2
  – Equivalent to:
    1. theta-join of R1 and R2 with the condition that all attributes of the same name be equated.
    2. one column for each pair of equated attributes is projected out.
• What is the formula?
• Example:
  – Suppose the attribute name in relation Bars was changed to bar, to match the bar name in Sells.
  – BarInfo = Sells ⋈ Bars
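The theta-join example with the condition Sells.bar = Bars.name can be computed directly from its definition as a selection over a product. The sketch below uses hypothetical helpers `cross` and `theta_join` and refers to attributes by position in the concatenated tuple, which is a simplification chosen for this illustration.

```python
from itertools import product


def cross(r1, r2):
    """R1 x R2: pair every tuple of r1 with every tuple of r2."""
    return {t1 + t2 for t1, t2 in product(r1, r2)}


def theta_join(r1, r2, condition):
    """Theta-join computed as sigma_C(R1 x R2)."""
    return {t for t in cross(r1, r2) if condition(t)}


sells = {
    ("Spoon", "Amstel", 4),
    ("Spoon", "Guinness", 7),
    ("Whiskey", "Guinness", 7),
    ("Whiskey", "Bud", 5),
}
bars = {("Spoon", "Wells"), ("Whiskey", "Rush")}

# In a concatenated tuple, index 0 is Sells.bar and index 3 is Bars.name.
bar_info = theta_join(sells, bars, lambda t: t[0] == t[3])
for row in sorted(bar_info):
    print(row)
```

The product has 4 × 2 = 8 tuples, and the condition keeps the 4 whose bar and name columns match.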

Natural Join Example

• BarInfo = Sells ⋈ Bars

  bar       beer       price   addr
  Spoon     Amstel     4       Wells
  Spoon     Guinness   7       Wells
  Whiskey   Guinness   7       Rush
  Whiskey   Bud        5       Rush

Renaming

• ρ_{S(A1,…,An)}(R) produces a relation identical to R but named S and with attributes, in order, named A1,…,An.
• Example: ρ_{R(bar,addr)}(Bars) =

  bar       addr
  Spoon     Wells
  Whiskey   Rush

• The name of the second relation is R.
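Renaming and natural join combine to reproduce the BarInfo table with a single bar column. The sketch below follows the two-step description of natural join (equate same-named attributes, then drop one column per equated pair). The helpers `rename` and `natural_join` are hypothetical; because tuples are stored positionally here, renaming only changes the schema list.

```python
def rename(schema, new_attrs):
    """rho: the tuples are unchanged; only the attribute names change."""
    if len(schema) != len(new_attrs):
        raise ValueError("renaming must keep the number of attributes")
    return list(new_attrs)


def natural_join(schema1, rows1, schema2, rows2):
    """Equate same-named attributes, then keep one column per equated pair."""
    common = [a for a in schema1 if a in schema2]
    keep2 = [i for i, a in enumerate(schema2) if a not in common]
    out_schema = schema1 + [schema2[i] for i in keep2]
    out_rows = set()
    for t1 in rows1:
        for t2 in rows2:
            if all(t1[schema1.index(a)] == t2[schema2.index(a)] for a in common):
                out_rows.add(t1 + tuple(t2[i] for i in keep2))
    return out_schema, out_rows


sells_schema = ["bar", "beer", "price"]
sells = {
    ("Spoon", "Amstel", 4),
    ("Spoon", "Guinness", 7),
    ("Whiskey", "Guinness", 7),
    ("Whiskey", "Bud", 5),
}
bars = {("Spoon", "Wells"), ("Whiskey", "Rush")}

# Rename Bars(name, addr) to (bar, addr) so the join attribute names match.
bars_schema = rename(["name", "addr"], ["bar", "addr"])
info_schema, bar_info = natural_join(sells_schema, sells, bars_schema, bars)
print(info_schema)
for row in sorted(bar_info):
    print(row)
```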

Combining Operations

• Any algebra is defined as:
  – basis arguments
  – ways of constructing expressions
• For relational algebra:
  – Arguments = variables standing for relations + finite, constant relations.
  – Expressions constructed by applying one of the operators + parentheses.
• Query = expression of relational algebra.

Example

• Find the bars that are either on Wells Street or sell Bud for less than $6.

  Sells(bar, beer, price)
  Bars(name, addr)
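One possible expression for this query is π_name(σ_{addr=Wells}(Bars)) ∪ ρ_{R(name)}(π_bar(σ_{beer=Bud AND price<6}(Sells))). The slides state only the question, so this expression and the sketch below are one reading of it, evaluated on the Sells and Bars instances from the join examples. Each set comprehension plays the role of a selection followed by a projection, and the two results are unioned.

```python
sells = {
    ("Spoon", "Amstel", 4),
    ("Spoon", "Guinness", 7),
    ("Whiskey", "Guinness", 7),
    ("Whiskey", "Bud", 5),
}
bars = {("Spoon", "Wells"), ("Whiskey", "Rush")}

# pi_name(sigma_{addr=Wells}(Bars))
on_wells = {(name,) for (name, addr) in bars if addr == "Wells"}

# pi_bar(sigma_{beer=Bud AND price<6}(Sells)); with positional tuples the
# renaming of bar to name needs no extra work.
cheap_bud = {(bar,) for (bar, beer, price) in sells if beer == "Bud" and price < 6}

# Union requires both operands to have the same one-attribute schema.
answer = on_wells | cheap_bud
print(sorted(answer))
```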

Bag Semantics

• A relation (in SQL, at least) is really a bag.
• It may contain the same tuple more than once, although there is no specified order (unlike a list).
• Example: {1,2,1,3} is a bag and not a set.
• Select, project, and join work for bags as well as sets.
  – Just work on a tuple-by-tuple basis, and don't eliminate duplicates.

Bag Operations

• Union: sum the times an element appears in the two bags.
• Example: {1,2,1} ∪ {1,2,3,3} = {1,1,1,2,2,3,3}.
• Intersection: take the minimum of the number of occurrences in each bag.
• Example: {1,2,1} ∩ {1,2,3,3} = {1,2}.
• Difference: subtract the number of occurrences in the second bag from the number in the first, never going below zero.
• Example: {1,2,1} – {1,2,3,3} = {1}.
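Python's `collections.Counter` behaves as a bag, and its `+`, `&`, and `-` operators match the three definitions above (sum of counts, minimum of counts, and subtraction that discards non-positive counts). The sketch below reproduces the three examples; using `Counter` is a choice made for this illustration, not something the slides specify.

```python
from collections import Counter

a = Counter([1, 2, 1])
b = Counter([1, 2, 3, 3])

bag_union = a + b         # counts are added
bag_intersection = a & b  # minimum of the two counts
bag_difference = a - b    # counts are subtracted; results below 1 are dropped

print(sorted(bag_union.elements()))
print(sorted(bag_intersection.elements()))
print(sorted(bag_difference.elements()))
```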

Different Laws for Bags

• Some familiar laws continue to hold for bags.
  – Examples: union and intersection are still commutative and associative.
• But other laws that hold for sets do not hold for bags!

Example

• R ∩ (S ∪ T) ≡ (R ∩ S) ∪ (R ∩ T) holds for sets but not for bags!
• Let R, S, and T each be the bag {1}.
• Left side: S ∪ T = {1,1}; R ∩ (S ∪ T) = {1}.
• Right side: R ∩ S = R ∩ T = {1}; (R ∩ S) ∪ (R ∩ T) = {1} ∪ {1} = {1,1} ≠ {1}.
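The counterexample can be replayed with `collections.Counter` standing in for bags (an assumption of this sketch, with `+` as bag union and `&` as bag intersection) and with ordinary Python sets for comparison.

```python
from collections import Counter

# Bags: R, S, and T are each the bag {1}.
R = Counter([1])
S = Counter([1])
T = Counter([1])

left = R & (S + T)         # intersection with the bag {1,1}
right = (R & S) + (R & T)  # bag union of {1} and {1}
print(left == right)

# Sets: the same law with set union and intersection.
r, s, t = {1}, {1}, {1}
print(r & (s | t) == (r & s) | (r & t))
```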

Extended Relational Algebra

• Adds features needed for SQL, bags.
• Duplicate-elimination operator δ.
• Extended projection.
• Sorting operator τ.
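Two of these operators are easy to sketch on a bag stored as a Python list. The slides name δ and τ without defining their arguments, so the signatures below are assumptions: `delta` keeps the first copy of each tuple, and `tau` takes a key function that stands in for a list of sort attributes. The input is the bag projection of Sells onto (beer, price), with duplicates kept.

```python
def delta(bag):
    """Duplicate elimination: keep the first copy of each tuple."""
    return list(dict.fromkeys(bag))


def tau(bag, key):
    """Sorting: return the tuples as a list ordered by the given key."""
    return sorted(bag, key=key)


# Bag projection of Sells onto (beer, price); (Guinness, 7) appears twice.
beer_prices = [("Amstel", 4), ("Guinness", 7), ("Guinness", 7), ("Bud", 5)]

distinct = delta(beer_prices)
by_price = tau(distinct, key=lambda t: t[1])
print(distinct)
print(by_price)
```

Unlike the other operators, τ yields a list rather than a set or bag, since its result depends on order.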
