<<

Definition

CSE 444: Internals • Database is collection of relations

• Relation R is subset of S1 x S2 x … x Sn

– Where Si is the domain of attribute i Lecture 2 – n is number of attributes of the relation Review of the and SQL • Relation is basically a with rows & columns – SQL uses word table to refer to relations

Magda Balazinska - CSE 444, Spring 2011 1 Magda Balazinska - CSE 444, Spring 2011 2

Properties of a Relation More Definitions • Each represents an n- of R • Ordering of rows is immaterial • Relation schema: describes heads • All rows are distinct – Relation name • Ordering of columns is significant – Name of each field (or column, or attribute) – Because two columns can have same domain – Domain of each field – But columns are labeled so – Applications need not worry about order – They can simply use the names • Degree (or arity) of relation: nb attributes • Domain of each column is a primitive type

• Relation consists of a relation schema and instance • : set of all relation schemas

Magda Balazinska - CSE 444, Spring 2011 3 Magda Balazinska - CSE 444, Spring 2011 4

Even More Definitions Example • Relation schema • Relation instance: concrete table content Supplier(sno: integer, sname: string, scity: string, sstate: string) – Set of (also called records) matching the schema • Relation instance

• Cardinality of relation instance: nb tuples sno sname scity sstate 1 s1 city 1 WA • Database instance: set of all relation 2 s2 city 1 WA instances 3 s3 city 2 MA 4 s4 city 2 MA

Magda Balazinska - CSE 444, Spring 2011 5 Magda Balazinska - CSE 444, Spring 2011 6

1 Integrity Constraints Key Constraints • Key constraint: “certain minimal subset of fields • Integrity constraint is a unique identifier for a tuple” – Condition specified on a database schema – Restricts data that can be stored in db instance • – Minimal set of fields • DBMS enforces integrity constraints – That uniquely identify each tuple in a relation – Ensures only legal database instances exist • • Simplest form of constraint is domain constraint – One candidate key can be selected as primary key – Attribute values must come from attribute domain

Magda Balazinska - CSE 444, Spring 2011 7 Magda Balazinska - CSE 444, Spring 2011 8

Foreign Key Constraints Key Constraint SQL Examples • A relation can refer to a tuple in another relation CREATE TABLE Part ( pno integer, • pname varchar(20), – Field that refers to tuples in another relation psize integer, pcolor varchar(20), – Typically, this field refers to the primary key of other PRIMARY KEY (pno) relation ); – Can pick another field as well

Magda Balazinska - CSE 444, Spring 2011 9 Magda Balazinska - CSE 444, Spring 2011 10

Key Constraint SQL Examples Key Constraint SQL Examples

CREATE TABLE Supply( CREATE TABLE Supply( sno integer, sno integer, pno integer, pno integer, qty integer, qty integer, price integer price integer, ); PRIMARY KEY (sno,pno) );

Magda Balazinska - CSE 444, Spring 2011 11 Magda Balazinska - CSE 444, Spring 2011 12

2 Key Constraint SQL Examples Key Constraint SQL Examples

CREATE TABLE Supply( CREATE TABLE Supply( sno integer, sno integer, pno integer, pno integer, qty integer, qty integer, price integer, price integer, PRIMARY KEY (sno,pno), PRIMARY KEY (sno,pno), FOREIGN KEY (sno) REFERENCES Supplier, FOREIGN KEY (sno) REFERENCES Supplier FOREIGN KEY (pno) REFERENCES Part ON DELETE NO ACTION, ); FOREIGN KEY (pno) REFERENCES Part ON DELETE CASCADE );

Magda Balazinska - CSE 444, Spring 2011 13 Magda Balazinska - CSE 444, Spring 2011 14

General Constraints Relational Queries • Table constraints serve to express complex constraints over a single table • Query inputs and outputs are relations

CREATE TABLE Part ( pno integer, • Query evaluation pname varchar(20), – Input: instances of input relations psize integer, – Output: instance of output relation pcolor varchar(20), PRIMARY KEY (pno), CHECK ( psize > 0 ) );

Note: Also possible to create constraints over many tables Magda Balazinska - CSE 444, Spring 2011 15 Magda Balazinska - CSE 444, Spring 2011 16

Relational Algebra Relational Operators • associated with relational model • Selection: σcondition(S) – Condition is Boolean combination (∧,∨) of terms • Queries specified in an operational manner – Term is: attr. op constant, attr. op attr. – A query gives a step-by-step procedure – Op is: <, <=, =, ≠, >=, or >

• Projection: πlist-of-attributes(S) • Relational operators • Union (∪), Intersection (∩), Set difference (–), – Take one or two relation instances as argument • Cross-product or cartesian product (×) – Return one relation instance as result • Join: R S = σ (R × S) – Easy to compose into expressions θ θ • Division: R/S, Rename ρ(R(F),E) Magda Balazinska - CSE 444, Spring 2011 17 Magda Balazinska - CSE 444, Spring 2011 18

3 Selection & Projection Examples Relational Operators

Patient πzip,disease(Patient) no name zip disease zip disease • Selection: σcondition(S) 1 p1 98125 flu 98125 flu – Condition is Boolean combination (∧,∨) of terms 2 p2 98125 heart 98125 heart – Term is: attr. op constant, attr. op attr. 3 p3 98120 lung 98120 lung – Op is: <, <=, =, ≠, >=, or > 4 p4 98120 heart 98120 heart • Projection: πlist-of-attributes(S) • Union (∪), Intersection (∩), Set difference (–), σdisease=‘heart’(Patient) πzip (σdisease=‘heart’(Patient)) no name zip disease zip • Cross-product or cartesian product (×) 2 p2 98125 heart 98120 • Join: R θ S = σθ(R × S) 4 p4 98120 heart 98125 • Division: R/S, Rename ρ(R(F),E) Magda Balazinska - CSE 444, Spring 2011 19 Magda Balazinska - CSE 444, Spring 2011 20

Cross-Product Example Different Types of Join AnonPatient P Voters V • Theta-: R θ S = σθ(R x S) age zip disease name age zip – Join of R and S with a join condition θ 54 98125 heart p1 54 98125 – Cross-product followed by selection 20 98120 flu p2 20 98120 θ • Equijoin: R θ S = πA (σθ(R x S)) P V – Join condition θ consists only of equalities

P.age P.zip disease name V.age V.zip – Projection πA drops all redundant attributes 54 98125 heart p1 54 98125 • Natural join: R S = πA (σθ(R x S)) 54 98125 heart p2 20 98120 – Equijoin 20 98120 flu p1 54 98125 20 98120 flu p2 20 98120 – Equality on all fields with same name in R and in S

Magda Balazinska - CSE 444, Spring 2011 21 Magda Balazinska - CSE 444, Spring 2011 22

Theta-Join Example Equijoin Example AnonPatient P Voters V AnonPatient P Voters V age zip disease name age zip age zip disease name age zip 54 98125 heart p1 54 98125 54 98125 heart p1 54 98125 20 98120 flu p2 20 98120 20 98120 flu p2 20 98120

P V P P.age=V.age ∧ P.zip=A.zip ∧ P.age < 50 V P.age=V.age P.age P.zip disease name V.age V.zip age P.zip disease name V.zip 20 98120 flu p2 20 98120 54 98125 heart p1 98125

20 98120 flu p2 98120

Magda Balazinska - CSE 444, Spring 2011 23 Magda Balazinska - CSE 444, Spring 2011 24

4 Natural Join Example More Joins AnonPatient P Voters V age zip disease name age zip • Outer join 54 98125 heart p1 54 98125 – Include tuples with no matches in the output 20 98120 flu p2 20 98120 – Use NULL values for missing attributes

P V • Variants age zip disease name – Left outer join 54 98125 heart p1 – Right outer join

20 98120 flu p2 – Full outer join

Magda Balazinska - CSE 444, Spring 2011 25 Magda Balazinska - CSE 444, Spring 2011 26

Outer Join Example Example of Algebra Queries AnonPatient P Voters V age zip disease name age zip Q1: Names of patients who have heart disease 54 98125 heart p1 54 98125 πname(Voter (σdisease=‘heart’ (AnonPatient)) 20 98120 flu p2 20 98120 33 98120 lung age zip disease name

54 98125 heart p1 P o V 20 98120 flu p2

33 98120 lung

Magda Balazinska - CSE 444, Spring 2011 27 Magda Balazinska - CSE 444, Spring 2011 28

Extended Operators More Examples of Relational Algebra Relations • Duplicate elimination (δ) Supplier(sno,sname,scity,sstate)! – Since commercial DBMSs operate on multisets !Part(pno,pname,psize,pcolor)! not sets !Supply(sno,pno,qty,price)! • Aggregate operators (γ) Q2: Name of supplier of parts with size greater than 10 – Min, max, sum, average, count (Supplier Supply ( (Part)) πsname σpsize>10 • Grouping operators (γ) – Partitions tuples of a relation into “groups” Q3: Name of supplier of red parts or parts with size greater than 10 – Aggregates can then be applied to groups πsname(Supplier Supply (σpsize>10 (Part) ∪ σpcolor=‘red’ (Part) ) ) • Sort operator (τ) (Many more examples in the book) Magda Balazinska - CSE 444, Spring 2011 29 Magda Balazinska - CSE 444, Spring 2011 30

5 Structured Query Language: SQL SQL Query

• Influenced by (see 344) Basic form: (plus many many more bells and whistles) • Declarative query language SELECT • Multiple aspects of the language FROM WHERE • Statements to create, modify tables and views – Data manipulation language • Statements to issue queries, insert, delete data – More

Magda Balazinska - CSE 444, Spring 2011 31 Magda Balazinska - CSE 444, Spring 2011 32

Simple SQL Query Simple SQL Query

Product PName Price Category Manufacturer Product PName Price Category Manufacturer Gizmo $19.99 Gadgets GizmoWorks Gizmo $19.99 Gadgets GizmoWorks Powergizmo $29.99 Gadgets GizmoWorks Powergizmo $29.99 Gadgets GizmoWorks SingleTouch $149.99 Photography Canon SingleTouch $149.99 Photography Canon MultiTouch $203.99 Household Hitachi MultiTouch $203.99 Household Hitachi

SELECT * SELECT PName, Price, Manufacturer FROM Product FROM Product WHERE category=‘Gadgets’ WHERE Price > 100

PName Price Category Manufacturer PName Price Manufacturer Gizmo $19.99 Gadgets GizmoWorks “selection” and SingleTouch $149.99 Canon Powergizmo $29.99 Gadgets GizmoWorks MultiTouch $203.99 Hitachi “selection” “projection” Magda Balazinska - CSE 444, Spring 2011 33 Magda Balazinska - CSE 444, Spring 2011 34

Details Eliminating Duplicates

• Case insensitive: Category SELECT DISTINCT category Gadgets – Same: SELECT Select select FROM Product Photography – Same: Product product Household – Different: ‘Seattle’ ‘seattle’ Compare to:

• Constants: Category – ‘abc’ - yes Gadgets – “abc” - no SELECT category Gadgets FROM Product Photography Household

Magda Balazinska - CSE 444, Spring 2011 35 Magda Balazinska - CSE 444, Spring 2011 36

6 Ordering the Results Joins

Product (pname, price, category, manufacturer) SELECT pname, price, manufacturer Company (cname, stockPrice, country) FROM Product • WHERE category=‘gizmo’ AND price > 50 Find all products under $200 manufactured in Japan; ORDER BY price, pname return their names and prices.

Ties are broken by the second attribute on the ORDER BY list, etc. SELECT PName, Price FROM Product, Company Ordering is ascending, unless you specify the DESC keyword. WHERE Manufacturer=CName AND Country=‘Japan’ AND Price <= 200

Magda Balazinska - CSE 444, Spring 2011 37 Magda Balazinska - CSE 444, Spring 2011 38

Tuple Variables Nested Queries

Person(pname, address, worksfor) • Nested query Company(cname, address) Which – Query that has another query embedded within it address ? – The embedded query is called a subquery SELECT DISTINCT pname, address FROM Person, Company • Why do we need them? WHERE worksfor = cname – Enables to refer to a table that must itself be computed SELECT DISTINCT Person.pname, Company.address • Subqueries can appear in FROM Person, Company – WHERE clause (common) WHERE Person.worksfor = Company.cname – FROM clause (less common) SELECT DISTINCT x.pname, y.address – HAVING clause (less common) FROM Person AS x, Company AS y WHERE x.worksforMagda Balazinska = - CSEy.cname 444, Spring 2011 39 Magda Balazinska - CSE 444, Spring 2011 40

Subqueries Returning Relations Subqueries Returning Relations Company(name, city) Product(pname, maker) You can also use: s > ALL R Purchase(id, product, buyer) s > ANY R Return cities where one can find companies that manufacture EXISTS R products bought by Joe Blow Product ( pname, price, category, maker) SELECT Company.city Find products that are more expensive than all those produced FROM Company By “Gizmo-Works” WHERE Company.name IN SELECT name (SELECT Product.maker FROM Product FROM Purchase , Product WHERE price > ALL (SELECT price WHERE Product.pname=Purchase.product FROM Purchase Magda AND Balazinska Purchase - CSE 444, Spring .buyer 2011 = ‘Joe Blow‘); Magda Balazinska WHERE - CSE 444, Spring maker=‘Gizmo-Works’) 2011

7 Correlated Queries Aggregation

Movie (title, year, director, length) SELECT avg(price) SELECT count(*) Find movies whose title appears more than once. FROM Product FROM Product correlation WHERE maker=“Toyota” WHERE year > 1995 SELECT DISTINCT title FROM Movie AS x SQL supports several aggregation operations: WHERE year <> ANY sum, count, min, max, avg (SELECT year

FROM Movie Except count, all aggregations apply to a single attribute WHERE title = x.title); Note (1) scope of variables (2) this can still be expressed as single SFW

Magda Balazinska - CSE 444, Spring 2011 43 Magda Balazinska - CSE 444, Spring 2011 44

Grouping and Aggregation

SELECT S

FROM R1,…,Rn WHERE C1

GROUP BY a1,…,ak HAVING C2

Conceptual evaluation steps: 1. Evaluate FROM-WHERE, apply condition C1

2. Group by the attributes a1,…,ak 3. Apply condition C2 to each group (may have aggregates) 4. Compute aggregates in S and return the result Read more about it in the book...

Magda Balazinska - CSE 444, Spring 2011 45

8