COS 597D: Principles of Database and Information Systems
SQL: Overview and highlights
Based on slides for Database Management Systems by R. Ramakrishnan and J. Gehrke

The SQL

• Structured Query Language
• Developed by IBM (System R) in the 1970s
• Need for a standard since it is used by many vendors
  – ANSI (American National Standards Institute)
  – ISO (International Organization for Standardization)
• Standards:
  – SQL-86
  – SQL-92 (major revision)
  – SQL-99 (major extensions)
  – SQL 2003 (XML ↔ SQL)
  – SQL 2008
  – SQL 2011
  – continued enhancements

Creating Relations in SQL

• CREATE TABLE Movie (
      name CHAR(30),
      producer CHAR(30),
      rel_date CHAR(8),
      rating CHAR,
      PRIMARY KEY (name, producer, rel_date) )
• CREATE TABLE Employee (
      SS# CHAR(9),
      name CHAR(30),
      addr CHAR(50),
      startYr INT,
      PRIMARY KEY (SS#) )
• CREATE TABLE Assignment (
      position CHAR(20),
      SS# CHAR(9),
      managerSS# CHAR(9),
      PRIMARY KEY (position),
      FOREIGN KEY (SS#) REFERENCES Employee,
      FOREIGN KEY (managerSS#) REFERENCES Employee )

Observe:
• type (domain) of each attribute specified
• type enforced by DBMS whenever tuples are added or modified.

Referential Integrity in SQL

• SQL-92 and later support all 4 options on deletes and updates.
  – Default is NO ACTION (delete/update is rejected)
  – CASCADE (also delete all tuples that refer to deleted tuple)
  – SET NULL / SET DEFAULT (sets foreign key value of referencing tuple)

    CREATE TABLE Acct
      (bname CHAR(20) DEFAULT 'main',
       acctn CHAR(20),
       bal REAL,
       PRIMARY KEY (acctn),
       FOREIGN KEY (bname) REFERENCES Branch
           ON DELETE SET DEFAULT )

• BUT individual implementations may NOT support all of these options
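The ON DELETE SET DEFAULT behaviour of the Acct table can be tried out with a small sketch. It assumes SQLite (through Python's standard sqlite3 module) as the DBMS, which is not part of the slides; the Branch rows and the 'nowhere' branch name are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: SQLite only enforces foreign keys once this pragma is on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Branch (bname CHAR(20) PRIMARY KEY, bcity CHAR(30), assets REAL);
CREATE TABLE Acct (
    bname CHAR(20) DEFAULT 'main',
    acctn CHAR(20),
    bal   REAL,
    PRIMARY KEY (acctn),
    FOREIGN KEY (bname) REFERENCES Branch ON DELETE SET DEFAULT
);
-- Illustrative rows; a 'main' branch must exist for the default to be valid.
INSERT INTO Branch VALUES ('main', 'Pton', 100), ('pu', 'Pton', 10);
INSERT INTO Acct VALUES ('pu', '33', 356);
""")

# An account at a branch that does not exist violates the foreign key.
try:
    conn.execute("INSERT INTO Acct VALUES ('nowhere', '99', 1)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Deleting the referenced branch resets bname in the referencing tuple.
conn.execute("DELETE FROM Branch WHERE bname = 'pu'")
after_delete = conn.execute("SELECT bname, acctn FROM Acct").fetchall()
print(rejected, after_delete)
```

With NO ACTION in place of SET DEFAULT, the DELETE itself would be rejected while account 33 still refers to the branch.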

Primary and Candidate Keys in SQL

• Possibly many candidate keys (specified using UNIQUE), one of which is chosen as the primary key.
• At most one book with a given title and edition – date, publisher and isbn are determined
• Used carelessly, can prevent the storage of database instances that arise in practice! Title and ed suffice? UNIQUE (title, ed, pub)?

    CREATE TABLE Book
      (isbn CHAR(10),
       title CHAR(100),
       ed INTEGER,
       pub CHAR(30),
       date INTEGER,
       PRIMARY KEY (isbn),
       UNIQUE (title, ed ))

Basic SQL Query

    SELECT [DISTINCT] select-list
    FROM from-list
    WHERE qualification

• from-list: A list of relation names (possibly with a range-variable after each name).
• select-list: A list of attributes of relations in from-list
• qualification: Comparisons (Attr op const or Attr1 op Attr2, where op is one of <, >, =, ≤, ≥, ≠) combined using AND, OR and NOT.
• DISTINCT is an optional keyword indicating that the answer should not contain duplicates. Default is that duplicates are not eliminated!

Conceptual Evaluation Strategy

• Semantics of an SQL query defined in terms of the following conceptual evaluation strategy:
  – Compute the cross-product of from-list.
  – Discard resulting tuples if they fail qualifications.
  – Delete attributes that are not in select-list.
  – If DISTINCT is specified, eliminate duplicate rows.
• This strategy is probably the least efficient way to compute a query! An optimizer will find more efficient strategies to compute the same answers.

Example Instances

• We will use these instances of the Acct and Branch relations in our examples.

instance of Branch:

    bname    bcity  assets
    pu       Pton   10
    nyu      nyc    20
    time sq  nyc    30

instance of Acct:

    bname  acctn  bal
    pu     33     356
    nyu    45     500
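The four steps of the conceptual evaluation strategy can be followed literally in a few lines of Python and compared with what a DBMS returns. This is only a sketch: the query (SELECT DISTINCT bcity FROM Branch, Acct WHERE bal > 400) is a made-up example over the Branch and Acct instances, and SQLite via the standard sqlite3 module is assumed as the DBMS.

```python
import sqlite3
from itertools import product

branch = [("pu", "Pton", 10), ("nyu", "nyc", 20), ("time sq", "nyc", 30)]
acct = [("pu", 33, 356), ("nyu", 45, 500)]

# Step 1: cross-product of the from-list (every Branch tuple with every Acct tuple).
cross = [b + a for b, a in product(branch, acct)]
# Step 2: discard tuples that fail the qualification bal > 400 (bal is column 5).
kept = [t for t in cross if t[5] > 400]
# Step 3: delete attributes not in the select-list (keep bcity, column 1).
projected = [(t[1],) for t in kept]
# Step 4: DISTINCT eliminates duplicate rows (sorted only for a stable comparison).
distinct = sorted(set(projected))

# The same hypothetical query run by SQLite.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Branch (bname, bcity, assets)")
conn.execute("CREATE TABLE Acct (bname, acctn, bal)")
conn.executemany("INSERT INTO Branch VALUES (?, ?, ?)", branch)
conn.executemany("INSERT INTO Acct VALUES (?, ?, ?)", acct)
from_sql = conn.execute(
    "SELECT DISTINCT bcity FROM Branch, Acct WHERE bal > 400 ORDER BY bcity"
).fetchall()
print(len(cross), projected, distinct, from_sql)
```

Without DISTINCT the projected list keeps 'nyc' twice, which is the default behaviour noted for the basic query form.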

Example of Conceptual Evaluation

    SELECT acctn
    FROM Branch, Acct
    WHERE Branch.bname=Acct.bname AND assets<20

Cross-product of Branch and Acct:

    bname    bcity  assets  bname  acctn  bal
    pu       Pton   10      pu     33     356
    pu       Pton   10      nyu    45     500
    nyu      nyc    20      pu     33     356
    nyu      nyc    20      nyu    45     500
    time sq  nyc    30      pu     33     356
    time sq  nyc    30      nyu    45     500

Expressions and Strings

    SELECT name, age=2011-yrofbirth
    FROM Alumni
    WHERE dept LIKE 'C%S'

• Illustrates use of arithmetic expressions and string pattern matching: Find pairs (Alumnus(a) name and age defined by year of birth) for alums whose dept. begins with "C" and ends with "S".
• LIKE is used for string matching. `_' stands for any one character and `%' stands for 0 or more arbitrary characters.
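A small sketch of the two LIKE wildcards and of a computed column, assuming SQLite through Python's sqlite3 module. The Alumni rows are invented for illustration, and the computed column is named with AS because SQLite does not accept the age=... form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Alumni (name CHAR(30), dept CHAR(10), yrofbirth INTEGER)")
# Hypothetical sample data, not taken from the slides.
conn.executemany(
    "INSERT INTO Alumni VALUES (?, ?, ?)",
    [("Ann", "COS", 1980), ("Bob", "CS", 1975), ("Cy", "MOL", 1990), ("Di", "CBE", 1985)],
)

# '%' matches zero or more characters: both 'COS' and 'CS' qualify.
percent = conn.execute(
    "SELECT name, 2011 - yrofbirth AS age FROM Alumni "
    "WHERE dept LIKE 'C%S' ORDER BY name"
).fetchall()

# '_' matches exactly one character: 'COS' qualifies, 'CS' does not.
underscore = conn.execute(
    "SELECT name FROM Alumni WHERE dept LIKE 'C_S' ORDER BY name"
).fetchall()
print(percent, underscore)
```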

Schema used in the examples:

    CREATE TABLE Acct
      (bname CHAR(20),
       acctn CHAR(20),
       bal REAL,
       PRIMARY KEY (acctn),
       FOREIGN KEY (bname) REFERENCES Branch )

    CREATE TABLE Branch
      (bname CHAR(20),
       bcity CHAR(30),
       assets REAL,
       PRIMARY KEY (bname) )

    CREATE TABLE Cust
      (name CHAR(20),
       street CHAR(30),
       city CHAR(30),
       PRIMARY KEY (name) )

    CREATE TABLE Owner
      (name CHAR(20),
       acctn CHAR(20),
       FOREIGN KEY (name) REFERENCES Cust,
       FOREIGN KEY (acctn) REFERENCES Acct )

Range Variables

• Refer to tuples from a relation
• Really needed only if the same relation appears twice in the FROM clause.

    SELECT acctn
    FROM Branch, Acct
    WHERE Branch.bname=Acct.bname AND assets<20

OR

    SELECT R.acctn
    FROM Branch S, Acct R
    WHERE S.bname=R.bname AND assets<20

OR

    SELECT R.acctn
    FROM Branch as S, Acct as R
    WHERE S.bname=R.bname AND assets<20

Nested Queries

Find names of all branches with accts of cust. who live in Rome

    SELECT A.bname
    FROM Acct A
    WHERE A.acctn IN (SELECT D.acctn
                      FROM Owner D, Cust C
                      WHERE D.name = C.name AND C.city='Rome')

• A very powerful feature of SQL: a WHERE clause can itself contain an SQL query! (Actually, so can FROM and HAVING clauses.)
• What do we get if we use NOT IN?
• To understand semantics of nested queries, think of a nested loops evaluation: For each Acct tuple, check the qualification by computing the subquery.

Nested Queries

Result = names of all branches with accts of cust. who live in Rome (the query above).

Princeton branch with two accts –
  – acct A has two owners, neither living in Rome
  – acct B has one owner living in Rome and one not living in Rome

• Is Princeton branch in Result?
• Is Princeton branch in (SELECT bname FROM Acct) - Result ?
• Is Princeton branch in result of replacing IN with NOT IN above?
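The three Princeton questions can be checked mechanically. The sketch below assumes SQLite via Python's sqlite3 module; the owner names and the non-Rome cities are invented, and only the columns the query needs are declared.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Acct  (bname CHAR(20), acctn CHAR(20) PRIMARY KEY, bal REAL);
CREATE TABLE Cust  (name CHAR(20) PRIMARY KEY, city CHAR(30));
CREATE TABLE Owner (name CHAR(20), acctn CHAR(20));
-- Princeton branch with two accounts, A and B.
INSERT INTO Acct VALUES ('Princeton', 'A', 100), ('Princeton', 'B', 200);
-- Hypothetical owners: o1, o2 own A (neither in Rome); o3 (Rome) and o4 own B.
INSERT INTO Cust VALUES ('o1', 'Paris'), ('o2', 'Oslo'), ('o3', 'Rome'), ('o4', 'Paris');
INSERT INTO Owner VALUES ('o1', 'A'), ('o2', 'A'), ('o3', 'B'), ('o4', 'B');
""")

# The nested query from the slide, with the set operator left as a parameter.
TEMPLATE = """
SELECT A.bname FROM Acct A
WHERE A.acctn {op} (SELECT D.acctn FROM Owner D, Cust C
                    WHERE D.name = C.name AND C.city = 'Rome')
"""

result = conn.execute(TEMPLATE.format(op="IN")).fetchall()
not_in = conn.execute(TEMPLATE.format(op="NOT IN")).fetchall()
# (SELECT bname FROM Acct) - Result, written with EXCEPT.
difference = conn.execute(
    "SELECT bname FROM Acct EXCEPT " + TEMPLATE.format(op="IN")
).fetchall()
print(result, not_in, difference)
```

Princeton is in Result because of acct B and in the NOT IN answer because of acct A, yet it is absent from the set difference, so NOT IN does not compute the complement of Result.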

Nested Queries with Correlation

Find acct no.s whose owners own at least one acct with a balance over 1000

    SELECT D.acctn
    FROM Owner D
    WHERE EXISTS (SELECT *
                  FROM Owner E, Acct R
                  WHERE R.bal>1000 AND R.acctn=E.acctn
                        AND E.name=D.name)

• EXISTS set comparison operator, like IN, tests not empty set
• UNIQUE set operator checks for duplicate tuples
  – If UNIQUE used, and * replaced by E.name, finds acct no.s whose owners own no more than one acct with a balance over 1000.
• Illustrates why, in general, the subquery must be re-computed for each Owner tuple.

More on Set-Comparison Operators

• We've already seen IN, EXISTS and UNIQUE. Can also use NOT IN, NOT EXISTS and NOT UNIQUE.
• Also available: op ANY, op ALL, with op from >, <, =, ≥, ≤, ≠
• Find names of branches with assets at least as large as the assets of some NYC branch:

    SELECT B.bname
    FROM Branch B
    WHERE B.assets >= ANY (SELECT Q.assets
                           FROM Branch Q
                           WHERE Q.bcity='NYC')

• Includes NYC branches?
• note: key word SOME is interchangeable with ANY - ANY easily confused with ALL
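To see how ANY differs from ALL, both comparisons can be rewritten with correlated EXISTS / NOT EXISTS subqueries. This is a sketch under two assumptions: SQLite (via Python's sqlite3 module) is the DBMS, and since SQLite has no ANY/ALL operators, the rewriting is a substitute for the slide's syntax rather than the slide's query itself. The city is spelled 'nyc' to match the example Branch instance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Branch (bname CHAR(20) PRIMARY KEY, bcity CHAR(30), assets REAL)")
conn.executemany(
    "INSERT INTO Branch VALUES (?, ?, ?)",
    [("pu", "Pton", 10), ("nyu", "nyc", 20), ("time sq", "nyc", 30)],
)

# B.assets >= ANY (...): some nyc branch has assets no larger than B's.
at_least_some = conn.execute("""
    SELECT B.bname FROM Branch B
    WHERE EXISTS (SELECT * FROM Branch Q
                  WHERE Q.bcity = 'nyc' AND B.assets >= Q.assets)
    ORDER BY B.bname
""").fetchall()

# B.assets >= ALL (...): no nyc branch has assets larger than B's.
at_least_all = conn.execute("""
    SELECT B.bname FROM Branch B
    WHERE NOT EXISTS (SELECT * FROM Branch Q
                      WHERE Q.bcity = 'nyc' AND B.assets < Q.assets)
    ORDER BY B.bname
""").fetchall()
print(at_least_some, at_least_all)
```

The ANY form does include NYC branches, since each one is at least as large as itself. In both forms the subquery mentions B, so it is conceptually re-evaluated for every outer tuple.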

Division in SQL

Find tournament winners who have won all tournaments.

    CREATE TABLE Winners
      (wname CHAR(30),
       tourn CHAR(30),
       year INTEGER)

    SELECT R.wname
    FROM Winners R
    WHERE NOT EXISTS
      ((SELECT S.tourn
        FROM Winners S)
       EXCEPT
       (SELECT T.tourn
        FROM Winners T
        WHERE T.wname=R.wname))

Division in SQL – simple template

Schemas
• WholeRelation: (r1, r2, …, rm, q1, q2, …, qn)
• DivisorRelation: (q1, q2, …, qn)
• WholeRelation ÷ DivisorRelation: (r1, r2, …, rm)

    SELECT R.r1, R.r2, …, R.rm
    FROM WholeRelation R
    WHERE NOT EXISTS
      ( (SELECT *
         FROM DivisorRelation Q )
        EXCEPT
        (SELECT T.q1, T.q2, …, T.qn
         FROM WholeRelation T
         WHERE R.r1 = T.r1 AND R.r2 = T.r2 AND … AND R.rm = T.rm ) )

Aggregate Operators

• Significant extension of relational algebra.

    COUNT (*)
    COUNT ( [DISTINCT] A)
    SUM ( [DISTINCT] A)
    AVG ( [DISTINCT] A)
    MAX (A)
    MIN (A)

  (A is a single column)

Example: Find name and assets of the poorest branch

• The first query is illegal!

    SELECT S.bname, MIN (S.assets)
    FROM Branch S

• Is it poorest branch or poorest branches?

    SELECT S.bname, S.assets
    FROM Branch S
    WHERE S.assets =
      (SELECT MIN (T.assets)
       FROM Branch T)

Division in SQL – general template

    SELECT …
    FROM …
    WHERE NOT EXISTS
      ((SELECT …
        FROM …
        WHERE … )
       EXCEPT
       (SELECT …
        FROM …
        WHERE … ))

• can do projections and other predicates within nested selects
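The NOT EXISTS ... EXCEPT pattern for division can be tried on a tiny Winners instance. This sketch assumes SQLite via Python's sqlite3 module. The rows are invented, and the parentheses around the two operands of EXCEPT are dropped because SQLite's grammar does not accept them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Winners (wname CHAR(30), tourn CHAR(30), year INTEGER)")
# Hypothetical data: ann has won every tournament, bob only one of them.
conn.executemany(
    "INSERT INTO Winners VALUES (?, ?, ?)",
    [("ann", "open", 2001), ("ann", "masters", 2002), ("bob", "open", 2003)],
)

DIVISION = """
SELECT {distinct} R.wname
FROM Winners R
WHERE NOT EXISTS (SELECT S.tourn FROM Winners S
                  EXCEPT
                  SELECT T.tourn FROM Winners T WHERE T.wname = R.wname)
"""

# As written on the slide: one answer row per qualifying Winners tuple.
as_written = conn.execute(DIVISION.format(distinct="")).fetchall()
# With DISTINCT each qualifying winner appears once.
deduplicated = conn.execute(DIVISION.format(distinct="DISTINCT")).fetchall()
print(as_written, deduplicated)
```

For ann the set "all tournaments minus ann's tournaments" is empty, so NOT EXISTS holds; for bob it still contains 'masters', so bob is excluded.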

GROUP BY and HAVING

• Sometimes, we want to apply aggregate operators to each of several groups of tuples.

Find the maximum assets of all branches in a city for each city containing at least one branch.

    SELECT B.bcity, MAX(B.assets)
    FROM Branch B
    GROUP BY B.bcity

• for each city - one name - aggregate assets

Queries With GROUP BY and HAVING

    SELECT [DISTINCT] select-list
    FROM from-list
    WHERE qualification
    GROUP BY grouping-list
    HAVING group-qualification

• The select-list contains (i) attribute names (ii) terms with aggregate operations (e.g., MIN (S.age)).
  – The attribute list (i) must be a subset of grouping-list. Intuitively, each answer tuple corresponds to a group, and these attributes must have a single value per group. (A group is a set of tuples that have the same value for all attributes in grouping-list.)
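A minimal sketch of a grouped query on the three-row Branch instance, assuming SQLite via Python's sqlite3 module. The aliases max_assets and n are illustrative names introduced with AS; they are not from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Branch (bname CHAR(20) PRIMARY KEY, bcity CHAR(30), assets REAL)")
conn.executemany(
    "INSERT INTO Branch VALUES (?, ?, ?)",
    [("pu", "Pton", 10), ("nyu", "nyc", 20), ("time sq", "nyc", 30)],
)

# bcity is in the grouping-list, so it has a single value per group;
# every other select-list term is an aggregate over the group.
cur = conn.execute("""
    SELECT B.bcity, MAX(B.assets) AS max_assets, COUNT(*) AS n
    FROM Branch B
    GROUP BY B.bcity
    ORDER BY B.bcity
""")
aliases = [col[0] for col in cur.description][1:]  # names given with AS
rows = cur.fetchall()
print(aliases, rows)
```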


Conceptual Evaluation

• Compute cross-product of from-list
• Discard tuples that fail qualification (WHERE)
• Delete `unnecessary' attributes
  – What attributes are unnecessary?
  – ↓ What attributes are necessary: Exactly those mentioned in SELECT, GROUP BY or HAVING clauses
• Partition remaining tuples into groups by the value of attributes in grouping-list.
• Apply group-qualification to eliminate some groups (HAVING). Expressions in group-qualification must have a single value per group!
  – In effect, an attribute in group-qualification that is not an argument of an aggregate op also appears in grouping-list. (SQL does not exploit primary key semantics here!)
• Generate one answer tuple per qualifying group.
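The grouping steps can be followed by hand in Python and compared with a DBMS. The sketch assumes SQLite via Python's sqlite3 module and uses a five-row Branch instance with a query that asks for the maximum assets per city, restricted to cities with at least two branches.

```python
import sqlite3
from collections import defaultdict

branch = [("pu", "Pton", 10), ("pmc", "Pton", 8), ("nyu", "nyc", 20),
          ("time sq", "nyc", 30), ("upenn", "phili", 50)]

# Cross-product and WHERE are trivial here (one relation, empty WHERE).
# Delete unnecessary attributes: only bcity and assets are mentioned.
needed = [(bcity, assets) for _, bcity, assets in branch]
# Partition the remaining tuples by the grouping-list (bcity).
groups = defaultdict(list)
for bcity, assets in needed:
    groups[bcity].append(assets)
# Apply the group-qualification HAVING COUNT(*) > 1.
qualifying = {city: vals for city, vals in groups.items() if len(vals) > 1}
# Generate one answer tuple per qualifying group.
conceptual = sorted((city, max(vals)) for city, vals in qualifying.items())

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Branch (bname, bcity, assets)")
conn.executemany("INSERT INTO Branch VALUES (?, ?, ?)", branch)
from_sql = conn.execute("""
    SELECT B.bcity, MAX(B.assets)
    FROM Branch B
    GROUP BY B.bcity
    HAVING COUNT(*) > 1
    ORDER BY B.bcity
""").fetchall()
print(conceptual, from_sql)
```

The phili group has a single branch, so HAVING eliminates it before any answer tuple is generated.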

Find the maximum assets of all branches in a city for each city containing at least two branches.

    SELECT B.bcity, MAX(B.assets)
    FROM Branch B
    GROUP BY B.bcity
    HAVING COUNT(*) >1

Branch instance:

    bname    bcity  assets
    pu       Pton   10
    pmc      Pton   8
    nyu      nyc    20
    time sq  nyc    30
    upenn    phili  50

empty WHERE

    bcity  assets
    Pton   10
    Pton   8
    nyc    20
    nyc    30

Result:

    bcity
    Pton   10
    nyc    30

2nd column of result is unnamed. (Use AS to name it.)

Joins in SQL

• SQL has both inner joins and outer joins
• Use in "FROM … " portion of query
• Inner join variations
  – NATURAL INNER JOIN
  – Generalized versions
• Outer join includes tuples that don't match
  – fill in with nulls
  – 3 varieties: left, right, full

Outer Joins

• Left outer join of S and R:
  – take inner join of S and R (with whatever qualification)
  – add tuples of S that are not matched in inner join, filling in attributes coming from R with "null"
• Right outer join:
  – as for left, but fill in tuple of R
• Full outer join:
  – both left and right

Example

Given Tables:

    sid  residence        sid  dept
    77   GC               77   ELE
    35   Lawrence         21   COS
    21   Butler           42   MOL

NATURAL INNER JOIN:

    77   GC        ELE
    21   Butler    COS

NATURAL LEFT OUTER JOIN add:

    35   Lawrence  null

NATURAL RIGHT OUTER JOIN add:

    42   null      MOL

NATURAL FULL OUTER JOIN add both
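The join example can be reproduced with a short sketch, assuming SQLite via Python's sqlite3 module. The table names S and R are assumptions (the slide leaves the two tables unnamed). Older SQLite versions lack RIGHT and FULL OUTER JOIN, so the right outer join is written as a left outer join with the operands swapped and the full outer join as the UNION of the two.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE S (sid INTEGER, residence CHAR(20));  -- hypothetical table names
CREATE TABLE R (sid INTEGER, dept CHAR(3));
INSERT INTO S VALUES (77, 'GC'), (35, 'Lawrence'), (21, 'Butler');
INSERT INTO R VALUES (77, 'ELE'), (21, 'COS'), (42, 'MOL');
""")

COLS = "SELECT sid, residence, dept FROM "
inner = conn.execute(COLS + "S NATURAL INNER JOIN R ORDER BY sid").fetchall()
left = conn.execute(COLS + "S NATURAL LEFT OUTER JOIN R ORDER BY sid").fetchall()
# Right outer join of S and R, expressed as a left outer join of R and S.
right = conn.execute(COLS + "R NATURAL LEFT OUTER JOIN S ORDER BY sid").fetchall()
# Full outer join: union of the left and right outer joins.
full = conn.execute(
    COLS + "S NATURAL LEFT OUTER JOIN R UNION "
    + COLS + "R NATURAL LEFT OUTER JOIN S ORDER BY sid"
).fetchall()
print(inner, left, right, full, sep="\n")
```

Python shows SQL nulls as None, so the unmatched tuples appear as (35, 'Lawrence', None) and (42, None, 'MOL').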

General form SQL Query

• Now seen all major components

Structure of Query:

    SELECT select-list
    FROM from-list
    WHERE qualification
    GROUP BY grouping-list
    HAVING group-qualification
    UNION or INTERSECT or EXCEPT
    SELECT select-list
    FROM from-list
    WHERE qualification
    GROUP BY grouping-list
    HAVING group-qualification
    … continuing general query form

• Three set operations
• Only these combine separate SELECT statements.
• All other SELECTs nested.
• Scope of range variable: within SELECT … FROM … and nested subqueries in it

Example Query

jobs: (position, division, SS#, managerSS#)
study: (SS#, academic_dept., adviser)

    SELECT DISTINCT M.academic_dept., J.division
    FROM study M NATURAL LEFT OUTER JOIN jobs J

What does this produce?

Null Values

• represent unknown value or inapplicable attribute
• can test attribute value IS NULL or IS NOT NULL
• need a 3-valued logic (true, false and unknown) to deal with null values in predicates.
  – comparisons with null evaluate to unknown
  – Boolean operations on unknown depend on truth table
  – can test IS UNKNOWN and IS NOT UNKNOWN
• meaning of constructs must be defined carefully
  – Example: WHERE clause eliminates rows that don't evaluate to true
  – aggregations, except COUNT(*), ignore nulls

Integrity Constraints (Review)

• An IC describes conditions that every legal instance of a relation must satisfy.
  – Inserts/deletes/updates that violate IC's are disallowed.
  – Can be used to ensure application semantics (e.g., sid is a key), or prevent inconsistencies (e.g., sname has to be a string, age must be < 200)
• Types of IC's: Domain constraints, primary key constraints, … constraints, foreign key constraints, general constraints.
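The three-valued logic described under Null Values can be observed directly. This is a sketch assuming SQLite via Python's sqlite3 module; the table Person and its rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (name CHAR(10), age INTEGER)")  # hypothetical table
conn.executemany("INSERT INTO Person VALUES (?, ?)",
                 [("a", 30), ("b", None), ("c", 50)])

def names(condition):
    """Names of the rows for which the WHERE condition evaluates to true."""
    sql = "SELECT name FROM Person WHERE " + condition + " ORDER BY name"
    return [row[0] for row in conn.execute(sql)]

over_40 = names("age > 40")            # for b the comparison is unknown
not_over_40 = names("NOT (age > 40)")  # NOT unknown is still unknown
is_null = names("age IS NULL")         # the explicit test does find b

# Aggregates other than COUNT(*) ignore nulls.
count_all, count_age, avg_age = conn.execute(
    "SELECT COUNT(*), COUNT(age), AVG(age) FROM Person"
).fetchone()
print(over_40, not_over_40, is_null, count_all, count_age, avg_age)
```

Row b satisfies neither age > 40 nor its negation, which is why a condition and its complement need not partition a table once nulls are present.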

General Constraints

• Useful when more general ICs than keys are involved.

    CREATE TABLE GasStation
      ( name CHAR(30),
        street CHAR(40),
        city CHAR(30),
        st CHAR(2),
        type CHAR(4),
        PRIMARY KEY (name, street, city, st),
        CHECK ( type='full' OR type='self' ),
        CHECK (st <>'nj' OR type='full') )

More General Constraints

• Can use queries to express constraint.
• Constraints can be named.
• Constraints can use other tables ⇒ Must check if other table modified

    CREATE TABLE FroshSemEnroll
      ( sid CHAR(10),
        sem_title CHAR(40),
        PRIMARY KEY (sid, sem_title),
        FOREIGN KEY (sid) REFERENCES Students,
        CONSTRAINT froshonly
          CHECK (2017 =
            ( SELECT S.classyear
              FROM Students S
              WHERE S.sid=sid) ) )
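The two CHECK constraints on GasStation can be exercised with a short sketch, assuming SQLite via Python's sqlite3 module and made-up station rows. Only the GasStation table is shown, because SQLite does not allow subqueries inside CHECK, so the FroshSemEnroll constraint cannot be declared there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE GasStation (
    name CHAR(30), street CHAR(40), city CHAR(30), st CHAR(2), type CHAR(4),
    PRIMARY KEY (name, street, city, st),
    CHECK (type = 'full' OR type = 'self'),
    CHECK (st <> 'nj' OR type = 'full')
)""")

def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO GasStation VALUES (?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

# Hypothetical rows, chosen to hit each constraint.
outcomes = [
    try_insert(("Shell", "1 Main St", "Princeton", "nj", "full")),  # accepted
    try_insert(("Exxon", "2 Broadway", "New York", "ny", "self")),  # accepted
    try_insert(("Gulf", "3 Elm St", "Trenton", "nj", "self")),      # nj must be full
    try_insert(("Mobil", "4 Oak St", "Albany", "ny", "auto")),      # unknown type
]
print(outcomes)
```

The second CHECK reads as an implication: if st is 'nj' then type must be 'full'.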

Constraints Over Multiple Relations

Number of bank branches in a city is less than 3 or the population of the city is greater than 100,000

• Cannot impose as CHECK on each table. If either table is empty, the CHECK is satisfied
• Is conceptually wrong to associate with individual tables
• ASSERTION is the right solution; not associated with either table.

    CREATE ASSERTION branchLimit
    CHECK
      ( NOT EXISTS (
          (SELECT C.name, C.state
           FROM Cities C
           WHERE C.pop <=100000 )
          INTERSECT
          ( SELECT D.name, D.state
            FROM Cities D
            WHERE 3 <= (SELECT COUNT (*)
                        FROM Branches B
                        WHERE B.bcity=D.name ) ) ) )

Summary

• SQL an important factor in the early acceptance of the relational model
  – more natural than earlier, procedural query languages.
• Significantly more expressive power than fundamental relational model
  – Blend of relational algebra and calculus - plus extensions
• Many alternative ways to write a query
  – optimizer should look for most efficient evaluation plan
  – when efficiency counts, users need to be aware of how queries are optimized and evaluated for best results
• SQL allows specification of rich integrity constraints
  – But often the DB system does not support them