Review

SELECT [DISTINCT]… Step 5. π SQL FROM … Step 1. × σ CPS 216 WHERE … Step 2. Advanced Systems GROUP BY … Step 3. Grouping

HAVING …; Step 4. Another σ 2

ORDER BY ORDER BY example

• SELECT [DISTINCT] E1, E2, E3... • List all students, sort them by GPA (descending) FROM…WHERE…GROUP BY…HAVING… and then name (ascending) ORDER BY E [ASC | DESC], – SELECT SID, name, age, GPA i1 E [ASC | DESC], …; FROM Student i2 • ASC = ascending, DESC = descending ORDER BY GPA DESC, name; – ASC is the default option • Operational semantics – Technically, only output columns can appear in – After SELECT list has been computed and optional ORDER BY clause (some DBMS support more) duplicate elimination has been carried out, – Can use output index instead sort the output according to ORDER BY specification ORDER BY 4 DESC, 2; 3 4

Data modification: INSERT Data modification: DELETE

• Insert one row • Delete everything – DELETE FROM Enroll; Example: Student 456 takes CPS 216 • Delete according to a WHERE – INSERT INTO Enroll VALUES (456, ’CPS 216’); Example: Student 456 drops CPS 216 • Insert the result of a query – DELETE FROM Enroll Example: Force everybody to take CPS 216 WHERE SID = 456 AND CID = ’CPS 216’; Example: Drop students with GPA lower than 1.0 – INSERT INTO Enroll all CPS classes (SELECT SID, ’CPS 216’ FROM Student – DELETE FROM Enroll WHERE SID NOT IN (SELECT SID FROM Enroll WHERE SID IN (SELECT SID FROM Student WHERE CID = ’CPS 216’)); WHERE GPA < 1.0) AND CID LIKE ’CPS%’; 5 6

1 Data modification: UPDATE Views

• Example: Student 142 changes name to “Barney” • A is like a virtual table – UPDATE Student – Defined by a query, which describes how to compute SET name = ’Barney’ the view contents on the fly WHERE SID = 142; – DBMS stores the view definition query instead of • Example: Let’s be “fair”? view contents – UPDATE Student – Can be used in queries just like a regular table SET GPA = (SELECT AVG(GPA) FROM Student); – Update of every row causes average GPA to change – Average GPA is computed over the old Student table

7 8

Creating and dropping views Using views in queries

• Example: CPS 216 roster • Example: find the average GPA of CPS 216 – CREATE VIEW CPS216Roster AS students SELECT SID, name, age, GPA – SELECT AVG(GPA) FROM CPS216Roster; FROM Student – To process the query, replace the reference to the WHERE SID IN (SELECT SID FROM Enroll view by its definition WHERE CID = ’CPS 216’); – SELECT AVG(GPA) • To drop a view (or table) FROM (SELECT SID, name, age, GPA FROM Student –DROP VIEW view_name; WHERE SID IN (SELECT SID – DROP TABLE table_name; FROM Enroll WHERE CID = ’CPS 216’)); 9 10

Why use views? Modifying views

• To hide data from users • Doesn’t seems to make sense since views are • To hide complexity from users virtual • Logical data independence • But does make sense if that’s how users view the – If applications deal with views, we can change the database underlying schema without affecting applications • Goal: modify the base tables such that the – Recall physical data independence: change the modification would appear to have been physical organization of data without affecting accomplished on the view applications • Real database applications use tons of views

11 12

2 A simple case An impossible case

CREATE VIEW StudentGPA AS CREATE VIEW HighGPAStudent AS SELECT SID, GPA FROM Student; SELECT SID, GPA FROM Student WHERE GPA > 3.7; DELETE FROM StudentGPA WHERE SID = 123; INSERT INTO HighGPAStudent translates to: VALUES(987, 2.5); • No matter what you do on the student table, DELETE FROM Student WHERE SID = 123; the inserted tuple won’t be in HighGPAStudent

13 14

A case with too many possibilities SQL92 updatable views

• Single-table SFW CREATE VIEW AverageGPA(GPA) AS – No aggregation SELECT AVG(GPA) FROM Student; – No subqueries – Note that you can rename columns in view definition UPDATE AverageGPA SET GPA = 2.5; • Overly restrictive • Set everybody’s GPA to 2.5? • Still gets it wrong in some cases • Adjust everybody’s GPA by the same amount? – See the slide titled “An impossible case” • Just lower Bart’s GPA?

15 16

Incomplete information Solution 1

• Example: Student (SID, name, age, GPA) • A dedicated special value for each domain – GPA cannot be –1, so use –1 as a special value • Value unknown – SELECT AVG(GPA) FROM Student; • Oh no, it’s lower than I expected! – We don’t know Nelson’s age – SELECT AVG(GPA) FROM Student • Value not applicable WHERE GPA <> –1; – Nelson hasn’t taken any classes yet; what’s his GPA? • Complicates applications – Remember the pre-Y2K bug? • 09/09/99 was used as an invalid or missing date value • It’s tricky to make these assumptions! 17 18

3 Solution 2 SQL’s solution

• A valid-bit column for every real column – Student (SID, name, name_is_valid, • A special value NULL age, age_is_valid, – Same for every domain GPA, GPA_is_valid) – Special rules for dealing with NULLs – Too much overhead – SELECT AVG(GPA) FROM Student WHERE GPA_valid; • Example: Student (SID, name, age, GPA) • Still complicates applications – <789, ’Nelson’, NULL, NULL>

19 20

Computing with NULLs Three-valued logic • TRUE = 1, FALSE = 0, UNKNOWN = 0.5 • When we operate on a NULL and another value • x AND y = min(x, y) (including another NULL) using +, –, etc., the x OR y = max(x, y) result is NULL NOT(x) = 1 – x • When we compare a NULL with another value (including another NULL) using =, >, etc., the • Aggregate functions ignore NULL, except result is UNKNOWN COUNT(*) • WHERE and HAVING clauses only tuples if the condition evaluates to TRUE – UNKNOWN is insufficient 21 22

Unfortunate consequences Another problem

• select avg(GPA) from Student; • Example: Who has NULL GPA values? select sum(GPA) / count(*) from Student; – select * from Student GPA = NULL; – Not equivalent • Won’t work; never returns anything! – avg(GPA) = sum(GPA) / count(GPA) still holds – (select * from Student) except all • select * from Student; (select * from Student where GPA = 0 OR GPA<>0); •Ugly! select * from Student – New built-in predicates IS NULL and IS NOT NULL where GPA > 3.0 or GPA <= 3.0; select * from Student where GPA is ; – Not equivalent • Be careful: NULL breaks many equivalences

23 24

4 Recap Constraints

• Covered • Restrictions on allowable data in a database –ORDER BY – In addition to the simple structure and type – Data modification statements restrictions imposed by the table definitions –Views – Declared as part of the schema – NULLs – Enforced by the DBMS • Skipped – Outerjoin • Why use constraints? – Alternative syntax – Protect data integrity (catch errors) – Schema modification statements – Tell the DBMS about the data (so it can optimize •Next better)

– Constraints 25 26

Types of constraints NOT NULL constraint example

• create table Student • NOT NULL (SID integer not null, name varchar(30) not null, •Key email varchar(30), • Referential integrity age integer, GPA float); • General assertion • create table Course (CID char(10) not null, • Tuple- and attribute-based CHECKs title varchar(100) not null); • create table Enroll (SID integer not null, CID char(10) not null);

27 28

Key declaration Key declaration examples

• At most one PRIMARY KEY per table • create table Student – Typically implies a primary index (SID integer not null primary key, name varchar(30) not null, Works on Oracle – Rows are stored inside the index, typically sorted by email varchar(30) unique, but not DB2: primary key value DB2 requires UNIQUE age integer, GPA float); key columns • Any number of UNIQUE keys per table • create table Course to be NOT NULL – Typically implies a secondary index (CID char(10) not null primary key, – Pointers to rows are stored inside the index title varchar(100) not null); • create table Enroll (SID integer not null, CID char(10) not null, primary key(SID, CID)); 29 30

5 Referential integrity example Referential integrity in SQL

– Enroll.SID references Student.SID • Referenced column must be PRIMARY KEY – Enroll.CID references Course.CID • Referencing column is called FOREIGN KEY – If an SID appears in Enroll, it must appear in Student • Example declaration – If a CID appears in Enroll, it must appear in Course – create table Enroll – That is, no “dangling pointers” (SID integer not null references Student(SID), Student Enroll Course CID char(10) not null, SID name … SID CID CID title primary key(SID, CID), 142 Bart … 142 CPS 216 CPS 216 Advanced Data… foreign key CID references Course(CID)); 123 Milhouse … 142 CPS 214 CPS 130 Analysis of Algo… 857 Lisa … 123 CPS 216 CPS 214 Computer Net… 456 Ralph … 857 CPS 216 ...... 857 CPS 130 ...... 31 32

Enforcing referential integrity Deferred constraint checking Example: Enroll.SID references Student.SID • No-chicken-no-egg problem • Insert or a Enroll tuple so it refers to a – create table Dept non-existent SID (name char(20) not null primary key, chair char(30) not null references Prof(name)); – Reject create table Prof • Delete or update a Student tuple whose SID is (name char(30) not null primary key, referenced by some Enroll tuple dept char(20) not null references Dept(name)); – Reject – The first INSERT will always violate a constraint – Cascade: ripple changes to all referring tuples • Deferred constraint checking is necessary – Set NULL: set all references to NULL – Check only at the end of a transaction – All three options can be specified in SQL – Allowed in SQL as an option 33 34

General assertion Tuple- and attribute-based CHECKs

• CREATE ASSERTION assertion_name • Associated with a single table CHECK assertion_condition; • Only checked when a tuple or an attribute is • assertion_condition is checked for each inserted or updated modification that could potentially violate it • Example: • Example: Enroll.SID references Student.SID – CREATE TABLE Enroll – CREATE ASSERTION EnrollStudentRefIntegrity (SID integer not null CHECK (NOT EXISTS (SELECT * FROM Enroll CHECK (SID IN (SELECT SID FROM Student)), WHERE SID NOT IN CID …); (SELECT SID FROM Student))); – Is it a referential integrity constraint? • SQL3, but not all (perhaps no) DBMS supports it – Not quite; not checked when Student is modified 35 36

6 Next time

Transactions!

37

7