COS 597D: Principles of Database and Information Systems
SQL: Overview and highlights
Based on slides for Database Management Systems by R. Ramakrishnan and J. Gehrke

The SQL Query Language
• Structured Query Language
• Developed by IBM (System R) in the 1970s
• Need for a standard since it is used by many vendors
  – ANSI (American National Standards Institute)
  – ISO (International Organization for Standardization)
• Standards:
  – SQL-86
  – SQL-92 (major revision)
  – SQL-99 (major extensions)
  – SQL 2003 (XML ↔ SQL)
  – SQL 2008
  – SQL 2011
  – continuing enhancements

Creating Relations in SQL
• CREATE TABLE Movie (
      name CHAR(30),
      producer CHAR(30),
      rel_date CHAR(8),
      rating CHAR,
      PRIMARY KEY (name, producer, rel_date) )
• CREATE TABLE Employee (
      SS# CHAR(9),
      name CHAR(30),
      addr CHAR(50),
      startYr INT,
      PRIMARY KEY (SS#) )
• CREATE TABLE Assignment (
      position CHAR(20),
      SS# CHAR(9),
      managerSS# CHAR(9),
      PRIMARY KEY (position),
      FOREIGN KEY (SS#) REFERENCES Employee,
      FOREIGN KEY (managerSS#) REFERENCES Employee )
Observe:
• the type (domain) of each attribute is specified
• the type is enforced by the DBMS whenever tuples are added or modified.

Referential Integrity in SQL
• SQL-92 and later support all 4 options on deletes and updates.
  – Default is NO ACTION (delete/update is rejected)
  – CASCADE (also delete all tuples that refer to deleted tuple)
  – SET NULL / SET DEFAULT (sets foreign key value of referencing tuple)
    CREATE TABLE Acct
      (bname CHAR(20) DEFAULT 'main',
       acctn CHAR(20),
       bal REAL,
       PRIMARY KEY (acctn),
       FOREIGN KEY (bname) REFERENCES Branch
           ON DELETE SET DEFAULT )
BUT individual implementations may NOT support all of these options.
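
The ON DELETE SET DEFAULT behaviour described above can be tried out with a small sketch. It assumes Python's built-in sqlite3 module as the DBMS (an assumption of this example, not something the slides prescribe), a one-column Branch table, and made-up rows. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON, which illustrates the caveat that implementations differ.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this pragma is switched on.
con.execute("PRAGMA foreign_keys = ON")

# Minimal Branch table (assumed for this sketch) plus the Acct table from the slide.
con.execute("CREATE TABLE Branch (bname CHAR(20), PRIMARY KEY (bname))")
con.execute("""
    CREATE TABLE Acct (
        bname CHAR(20) DEFAULT 'main',
        acctn CHAR(20),
        bal   REAL,
        PRIMARY KEY (acctn),
        FOREIGN KEY (bname) REFERENCES Branch ON DELETE SET DEFAULT
    )
""")
con.executemany("INSERT INTO Branch VALUES (?)", [("main",), ("pu",)])
con.execute("INSERT INTO Acct VALUES ('pu', '33', 356)")

# Deleting the referenced branch resets the referencing foreign key to its default.
con.execute("DELETE FROM Branch WHERE bname = 'pu'")
after_delete = con.execute("SELECT bname, acctn FROM Acct").fetchall()

# An insert that refers to a nonexistent branch is rejected.
try:
    con.execute("INSERT INTO Acct VALUES ('nowhere', '99', 0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(after_delete, rejected)
```

The default value 'main' must itself exist in Branch; otherwise the SET DEFAULT action would violate the same foreign key it is trying to repair.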

Primary and Candidate Keys in SQL
• Possibly many candidate keys (specified using UNIQUE), one of which is chosen as the primary key.
• At most one book with a given title and edition: date, publisher and isbn are determined.
• Used carelessly, can prevent the storage of database instances that arise in practice! Title and ed suffice? UNIQUE (title, ed, pub)?
    CREATE TABLE Book
      (isbn CHAR(10),
       title CHAR(100),
       ed INTEGER,
       pub CHAR(30),
       date INTEGER,
       PRIMARY KEY (isbn),
       UNIQUE (title, ed) )

Basic SQL Query
    SELECT [DISTINCT] select-list
    FROM from-list
    WHERE qualification
• from-list: A list of relation names (possibly with a range-variable after each name).
• select-list: A list of attributes of relations in from-list.
• qualification: Comparisons (Attr op const or Attr1 op Attr2, where op is one of <, >, =, ≤, ≥, ≠) combined using AND, OR and NOT.
• DISTINCT is an optional keyword indicating that the answer should not contain duplicates. Default is that duplicates are not eliminated!

Conceptual Evaluation Strategy
• Semantics of an SQL query defined in terms of the following conceptual evaluation strategy:
  – Compute the cross-product of from-list.
  – Discard resulting tuples if they fail qualifications.
  – Delete attributes that are not in select-list.
  – If DISTINCT is specified, eliminate duplicate rows.
• This strategy is probably the least efficient way to compute a query! An optimizer will find more efficient strategies to compute the same answers.

Example Instances
• We will use these instances of the Acct and Branch relations in our examples.

  instance of Branch
    bname     bcity   assets
    pu        Pton    10
    nyu       nyc     20
    time sq   nyc     30

  instance of Acct
    bname   acctn   bal
    pu      33      356
    nyu     45      500

Example of Conceptual Evaluation
    SELECT acctn
    FROM Branch, Acct
    WHERE Branch.bname=Acct.bname AND assets<20

  Cross-product of Branch and Acct:
    bname     bcity   assets   bname   acctn   bal
    pu        Pton    10       pu      33      356
    pu        Pton    10       nyu     45      500
    nyu       nyc     20       pu      33      356
    nyu       nyc     20       nyu     45      500
    time sq   nyc     30       pu      33      356
    time sq   nyc     30       nyu     45      500

Expressions and Strings
    SELECT name, age=2011-yrofbirth
    FROM Alumni
    WHERE dept LIKE 'C%S'
• Illustrates use of arithmetic expressions and string pattern matching: Find pairs (alumnus/alumna name, and age defined by year of birth) for alums whose dept. begins with "C" and ends with "S".
• LIKE is used for string matching. '_' stands for any one character and '%' stands for 0 or more arbitrary characters.
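
As a rough sketch of the conceptual evaluation strategy, the snippet below walks through the three steps (cross-product, qualification, projection) in plain Python on the example instances and compares the outcome with what an actual engine returns. Using Python's built-in sqlite3 module as the engine is an assumption of this example.

```python
import sqlite3
from itertools import product

# Example instances from the slides.
branch = [("pu", "Pton", 10), ("nyu", "nyc", 20), ("time sq", "nyc", 30)]
acct = [("pu", "33", 356), ("nyu", "45", 500)]

# Step 1: cross-product of the from-list (3 x 2 = 6 combined tuples).
cross = list(product(branch, acct))

# Step 2: discard tuples that fail Branch.bname = Acct.bname AND assets < 20.
kept = [(b, a) for b, a in cross if b[0] == a[0] and b[2] < 20]

# Step 3: delete attributes that are not in the select-list (keep acctn only).
conceptual = [a[1] for _b, a in kept]

# The same query handed to a real engine, which is free to pick a better plan.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Branch (bname CHAR(20) PRIMARY KEY, bcity CHAR(30), assets REAL)")
con.execute("CREATE TABLE Acct (bname CHAR(20), acctn CHAR(20) PRIMARY KEY, bal REAL)")
con.executemany("INSERT INTO Branch VALUES (?, ?, ?)", branch)
con.executemany("INSERT INTO Acct VALUES (?, ?, ?)", acct)
engine = [row[0] for row in con.execute(
    "SELECT acctn FROM Branch, Acct "
    "WHERE Branch.bname = Acct.bname AND assets < 20")]

print(len(cross), conceptual, engine)
```

Only the pu branch has assets below 20, so account 33 is the single answer under both evaluations.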

Schemas used in the examples
    CREATE TABLE Acct
      (bname CHAR(20),
       acctn CHAR(20),
       bal REAL,
       PRIMARY KEY (acctn),
       FOREIGN KEY (bname) REFERENCES Branch )
    CREATE TABLE Branch
      (bname CHAR(20),
       bcity CHAR(30),
       assets REAL,
       PRIMARY KEY (bname) )
    CREATE TABLE Cust
      (name CHAR(20),
       street CHAR(30),
       city CHAR(30),
       PRIMARY KEY (name) )
    CREATE TABLE Owner
      (name CHAR(20),
       acctn CHAR(20),
       FOREIGN KEY (name) REFERENCES Cust,
       FOREIGN KEY (acctn) REFERENCES Acct )

Range Variables
• Refer to tuples from a relation
• Really needed only if the same relation appears twice in the FROM clause.
    SELECT acctn
    FROM Branch, Acct
    WHERE Branch.bname=Acct.bname AND assets<20
  OR
    SELECT R.acctn
    FROM Branch S, Acct R
    WHERE S.bname=R.bname AND assets<20
  OR
    SELECT R.acctn
    FROM Branch as S, Acct as R
    WHERE S.bname=R.bname AND assets<20

Nested Queries
Find names of all branches with accts of cust. who live in Rome
    SELECT A.bname
    FROM Acct A
    WHERE A.acctn IN (SELECT D.acctn
                      FROM Owner D, Cust C
                      WHERE D.name = C.name AND C.city='Rome')
• A very powerful feature of SQL: a WHERE clause can itself contain an SQL query! (Actually, so can FROM and HAVING clauses.)
• What do we get if we use NOT IN?
• To understand semantics of nested queries, think of a nested loops evaluation: For each Acct tuple, check the qualification by computing the subquery.

Nested Queries
Result = names of all branches with accts of cust. who live in Rome (the query above).
Princeton branch with two accts:
  – acct A has two owners, neither living in Rome
  – acct B has one owner living in Rome and one not living in Rome
• Is Princeton branch in Result?
• Is Princeton branch in (SELECT bname FROM Acct) - Result ?
• Is Princeton branch in result of replacing IN with NOT IN above?
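
One way to explore the Princeton questions above is to build that instance and run the three variants. The sketch below assumes Python's built-in sqlite3 module and invented customer names (ann, bob, cat, dan); only the branch, the two accounts, and the pattern of owners come from the slide.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Acct  (bname CHAR(20), acctn CHAR(20), PRIMARY KEY (acctn));
    CREATE TABLE Cust  (name CHAR(20), city CHAR(30), PRIMARY KEY (name));
    CREATE TABLE Owner (name CHAR(20), acctn CHAR(20));
    -- Princeton branch with two accounts, A and B.
    INSERT INTO Acct  VALUES ('Princeton', 'A'), ('Princeton', 'B');
    -- Hypothetical customers: only cat lives in Rome.
    INSERT INTO Cust  VALUES ('ann', 'Paris'), ('bob', 'Paris'),
                             ('cat', 'Rome'),  ('dan', 'Paris');
    -- A has two non-Rome owners; B has one Rome owner and one non-Rome owner.
    INSERT INTO Owner VALUES ('ann', 'A'), ('bob', 'A'), ('cat', 'B'), ('dan', 'B');
""")

# Subquery shared by all three variants: accounts with an owner living in Rome.
ROME_ACCTS = ("SELECT D.acctn FROM Owner D, Cust C "
              "WHERE D.name = C.name AND C.city = 'Rome'")

def branches(sql):
    """Run a query returning branch names and return them as a sorted list."""
    return sorted({row[0] for row in con.execute(sql)})

result = branches(
    f"SELECT A.bname FROM Acct A WHERE A.acctn IN ({ROME_ACCTS})")
difference = branches(
    "SELECT bname FROM Acct EXCEPT "
    f"SELECT A.bname FROM Acct A WHERE A.acctn IN ({ROME_ACCTS})")
not_in = branches(
    f"SELECT A.bname FROM Acct A WHERE A.acctn NOT IN ({ROME_ACCTS})")

print(result, difference, not_in)
```

On this instance Princeton is in Result (through acct B), is absent from the set difference, and is present again under NOT IN (through acct A). NOT IN therefore asks "some account has no Rome owner", which is not the complement of the original query.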

Nested Queries with Correlation
Find acct no.s whose owners own at least one acct with a balance over 1000
    SELECT D.acctn
    FROM Owner D
    WHERE EXISTS (SELECT *
                  FROM Owner E, Acct R
                  WHERE R.bal>1000 AND R.acctn=E.acctn
                        AND E.name=D.name)
• EXISTS is a set comparison operator that, like IN, tests whether a set is non-empty.
• UNIQUE is a set operator that checks for duplicate tuples.
  – If UNIQUE is used, and * is replaced by E.name, the query finds acct no.s whose owners own no more than one acct with a balance over 1000.
• Illustrates why, in general, the subquery must be re-computed for each Owner tuple.

More on Set-Comparison Operators
• We've already seen IN, EXISTS and UNIQUE. Can also use NOT IN, NOT EXISTS and NOT UNIQUE.
• Also available: op ANY, op ALL, with op from >, <, =, ≥, ≤, ≠
• Find names of branches with assets at least as large as the assets of some NYC branch:
    SELECT B.bname
    FROM Branch B
    WHERE B.assets >= ANY (SELECT Q.assets
                           FROM Branch Q
                           WHERE Q.bcity='NYC')
  Includes NYC branches?
• Note: the keyword SOME is interchangeable with ANY, since ANY is easily confused with ALL.

Division in SQL
Find tournament winners who have won all tournaments.
    CREATE TABLE Winners
      (wname CHAR(30),
       tourn CHAR(30),
       year INTEGER)

    SELECT R.wname
    FROM Winners R
    WHERE NOT EXISTS
      ((SELECT S.tourn
        FROM Winners S)
       EXCEPT
       (SELECT T.tourn
        FROM Winners T
        WHERE T.wname=R.wname))

Division in SQL – simple template
Schemas
• WholeRelation: (r1, r2, …, rm, q1, q2, …, qn)
• DivisorRelation: (q1, q2, …, qn)
• WholeRelation ÷ DivisorRelation: (r1, r2, …, rm)
    SELECT R.r1, R.r2, …, R.rm
    FROM WholeRelation R
    WHERE NOT EXISTS
      ((SELECT *
        FROM DivisorRelation Q)
       EXCEPT
       (SELECT T.q1, T.q2, …, T.qn
        FROM WholeRelation T
        WHERE R.r1 = T.r1 AND R.r2 = T.r2 AND … AND R.rm = T.rm))

Aggregate Operators
• Significant extension of relational algebra.
    COUNT (*)
    COUNT ( [DISTINCT] A)
    SUM ( [DISTINCT] A)
    AVG ( [DISTINCT] A)
    MAX (A)
    MIN (A)
  where A is a single column.
• Example: Find name and assets of the poorest branch
    SELECT S.bname, MIN (S.assets)
    FROM Branch S
  The first query is illegal!
    SELECT S.bname, S.assets
    FROM Branch S
    WHERE S.assets =
          (SELECT MIN (T.assets)
           FROM Branch T)
  Is it poorest branch or poorest branches?

Division in SQL – general template
    SELECT
    FROM
    WHERE NOT EXISTS
      ((SELECT
        FROM
        WHERE )
       EXCEPT
       (SELECT
        FROM
        WHERE ))
• Can do projections and other predicates within the nested selects.
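
The division pattern can be checked on a tiny instance. This is a sketch under stated assumptions: Python's built-in sqlite3 module as the engine, three invented Winners rows, DISTINCT added so each winner is listed once, and the inner parentheses around the two SELECTs dropped because SQLite does not accept parenthesized operands of EXCEPT.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Winners (wname CHAR(30), tourn CHAR(30), year INTEGER)")
# Hypothetical data: two tournaments exist; Ann has won both, Bob only one.
con.executemany("INSERT INTO Winners VALUES (?, ?, ?)", [
    ("Ann", "Open", 2001),
    ("Ann", "Masters", 2002),
    ("Bob", "Open", 2003),
])

# "All tournaments" EXCEPT "tournaments won by R" must be empty for R to qualify.
division_sql = """
    SELECT DISTINCT R.wname
    FROM Winners R
    WHERE NOT EXISTS (
        SELECT S.tourn FROM Winners S
        EXCEPT
        SELECT T.tourn FROM Winners T WHERE T.wname = R.wname)
"""
won_all = [row[0] for row in con.execute(division_sql)]

# Cross-check with plain set arithmetic in Python.
rows = con.execute("SELECT wname, tourn FROM Winners").fetchall()
all_tourns = {t for _w, t in rows}
check = sorted(w for w in {w for w, _t in rows}
               if {t for x, t in rows if x == w} == all_tourns)

print(won_all, check)
```

The subquery is correlated through R.wname, so conceptually it is re-evaluated once per candidate winner.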

GROUP BY and HAVING
• Sometimes, we want to apply aggregate operators to each of several groups of tuples.
Find the maximum assets of all branches in a city for each city containing at least one branch.
    SELECT B.bcity, MAX(B.assets)
    FROM Branch B
    GROUP BY B.bcity
• for each city: one name, aggregate assets

Queries With GROUP BY and HAVING
    SELECT [DISTINCT] select-list
    FROM from-list
    WHERE qualification
    GROUP BY grouping-list
    HAVING group-qualification
• The select-list contains (i) attribute names (ii) terms with aggregate operations (e.g., MIN (S.age)).
  – The attribute list (i) must be a subset of grouping-list. Intuitively, each answer tuple corresponds to a group, and these attributes must have a single value per group. (A group is a set of tuples that have the same value for all attributes in grouping-list.)

Conceptual Evaluation
• Compute cross-product of from-list
• Discard tuples that fail qualification (WHERE)
• Delete 'unnecessary' attributes
• Partition remaining tuples into groups by the value of attributes in grouping-list.
• Apply group-qualification to eliminate some groups (HAVING). Expressions in group-qualification must have a single value per group!
  – In effect, an attribute in group-qualification that is not an argument of an aggregate op also appears in grouping-list. (SQL does not exploit primary key semantics here!)
• Generate one answer tuple per qualifying group.

What attributes are unnecessary?
• What attributes are necessary: exactly those mentioned in SELECT, GROUP BY or HAVING clauses.
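
The grouping steps above can be mirrored in a few lines of Python and compared with an engine's answer. This sketch assumes Python's built-in sqlite3 module and a five-branch Branch instance; the column alias max_assets and the ORDER BY clause are additions made here so the output is named and deterministic.

```python
import sqlite3
from collections import defaultdict

branch = [("pu", "Pton", 10), ("pmc", "Pton", 8), ("nyu", "nyc", 20),
          ("time sq", "nyc", 30), ("upenn", "phili", 50)]

# Conceptual evaluation: empty WHERE, keep only (bcity, assets),
# partition by bcity, apply HAVING COUNT(*) > 1, emit one tuple per group.
groups = defaultdict(list)
for _bname, bcity, assets in branch:
    groups[bcity].append(assets)
conceptual = sorted((city, max(vals)) for city, vals in groups.items()
                    if len(vals) > 1)

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Branch (bname CHAR(20) PRIMARY KEY, bcity CHAR(30), assets REAL)")
con.executemany("INSERT INTO Branch VALUES (?, ?, ?)", branch)

# One answer tuple per city.
per_city = con.execute("""
    SELECT B.bcity, MAX(B.assets) AS max_assets
    FROM Branch B
    GROUP BY B.bcity
    ORDER BY B.bcity
""").fetchall()

# Only cities with at least two branches survive the HAVING clause.
two_plus = con.execute("""
    SELECT B.bcity, MAX(B.assets) AS max_assets
    FROM Branch B
    GROUP BY B.bcity
    HAVING COUNT(*) > 1
    ORDER BY B.bcity
""").fetchall()

print(per_city, two_plus, conceptual)
```

The phili group has a single branch, so it appears in the plain GROUP BY answer but is eliminated by HAVING COUNT(*) > 1.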

Find the maximum assets of all branches in a city for each city containing at least two branches.
    SELECT B.bcity, MAX(B.assets)
    FROM Branch B
    GROUP BY B.bcity
    HAVING COUNT(*) >1

  Branch (empty WHERE):
    bname     bcity   assets
    pu        Pton    10
    pmc       Pton    8
    nyu       nyc     20
    time sq   nyc     30
    upenn     phili   50

  Tuples in qualifying groups:
    bcity   assets
    Pton    10
    Pton    8
    nyc     20
    nyc     30

  Result:
    bcity
    Pton    10
    nyc     30
  2nd column of result is unnamed. (Use AS to name it.)

Joins in SQL
• SQL has both inner joins and outer joins
• Use in "FROM …" portion of query
• Inner join variations
  – NATURAL INNER JOIN
  – Generalized versions
• Outer join includes tuples that don't match
  – fill in with nulls
  – 3 varieties: left, right, full

Outer Joins
• Left outer join of S and R:
  – take inner join of S and R (with whatever qualification)
  – add tuples of S that are not matched in inner join, filling in attributes coming from R with "null"
• Right outer join:
  – as for left, but add the unmatched tuples of R, filling in attributes coming from S with "null"
• Full outer join:
  – both left and right

Example
  Given tables:
    sid   residence        sid   dept
    77    GC               77    ELE
    35    Lawrence         21    COS
    21    Butler           42    MOL

  NATURAL INNER JOIN:
    77    GC         ELE
    21    Butler     COS
  NATURAL LEFT OUTER JOIN adds:
    35    Lawrence   null
  NATURAL RIGHT OUTER JOIN adds:
    42    null       MOL
  NATURAL FULL OUTER JOIN adds both.
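
The join example above can be reproduced with a short sketch. It assumes Python's built-in sqlite3 module and the hypothetical table names Housing and Major (the slides leave the two tables unnamed). Because older SQLite versions lack RIGHT and FULL OUTER JOIN, the right outer join is emulated here by swapping the operands of a left outer join.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Housing (sid INTEGER, residence CHAR(20));
    CREATE TABLE Major   (sid INTEGER, dept CHAR(3));
    INSERT INTO Housing VALUES (77, 'GC'), (35, 'Lawrence'), (21, 'Butler');
    INSERT INTO Major   VALUES (77, 'ELE'), (21, 'COS'), (42, 'MOL');
""")

def run(from_clause):
    """Evaluate the given FROM clause and return (sid, residence, dept) rows."""
    sql = f"SELECT sid, residence, dept FROM {from_clause} ORDER BY sid"
    return con.execute(sql).fetchall()

inner = run("Housing NATURAL INNER JOIN Major")
left = run("Housing NATURAL LEFT OUTER JOIN Major")
# Right outer join of Housing and Major, written as a left outer join
# with the operands swapped.
right = run("Major NATURAL LEFT OUTER JOIN Housing")

print(inner)
print(set(left) - set(inner))   # the tuple added by the left outer join
print(set(right) - set(inner))  # the tuple added by the right outer join
```

SQL nulls come back as Python None, so the padded tuples show None where the slide writes null.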

General form SQL Query
Now seen all major components. Structure of Query:
    SELECT select-list
    FROM from-list
    WHERE qualification
    GROUP BY grouping-list
    HAVING group-qualification
  UNION or INTERSECT or EXCEPT
    SELECT select-list
    FROM from-list
    WHERE qualification
    GROUP BY grouping-list
    HAVING group-qualification
  … continuing general query form
• Three set operations
• Only these combine separate SELECT statements.
• All other SELECTs are nested.
• Scope of a range variable: within its SELECT … FROM … and the subqueries nested in it.

Example Query
  jobs: (position, division, SS#, managerSS#)
  study: (SS#, academic_dept., adviser)
    SELECT DISTINCT M.academic_dept., J.division
    FROM study M NATURAL LEFT OUTER JOIN jobs J
  What does this produce?

Null Values
• represent unknown value or inapplicable attribute
• can test attribute value IS NULL or IS NOT NULL
• need a 3-valued logic (true, false and unknown) to deal with null values in predicates.
  – comparisons with null evaluate to unknown
  – Boolean operations on unknown depend on truth table
  – can test IS UNKNOWN and IS NOT UNKNOWN
• meaning of constructs must be defined carefully
  – Example: WHERE clause eliminates rows that don't evaluate to true
  – aggregations, except COUNT(*), ignore nulls

Integrity Constraints (Review)
• An IC describes conditions that every legal instance of a relation must satisfy.
  – Inserts/deletes/updates that violate IC's are disallowed.
  – Can be used to ensure application semantics (e.g., sid is a key), or prevent inconsistencies (e.g., sname has to be a string, age must be < 200)
• Types of IC's: Domain constraints, primary key constraints, candidate key constraints, foreign key constraints, general constraints.
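
Returning to the null-value rules listed above, the following sketch shows them in action. It assumes Python's built-in sqlite3 module and adds one hypothetical account (acctn 46) with a null balance to the Acct example instance.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Acct (bname CHAR(20), acctn CHAR(20), bal REAL, PRIMARY KEY (acctn))")
con.executemany("INSERT INTO Acct VALUES (?, ?, ?)", [
    ("pu", "33", 356),
    ("nyu", "45", 500),
    ("nyu", "46", None),   # hypothetical account whose balance is unknown
])

def accts(where):
    """Return the account numbers of rows for which the WHERE clause is true."""
    sql = f"SELECT acctn FROM Acct WHERE {where} ORDER BY acctn"
    return [row[0] for row in con.execute(sql)]

over = accts("bal > 400")            # comparison is unknown for account 46
not_over = accts("NOT (bal > 400)")  # NOT unknown is still unknown
is_null = accts("bal IS NULL")       # the explicit test does find it

# COUNT(*) counts rows; the other aggregates skip the null balance.
count_all, count_bal, total, average = con.execute(
    "SELECT COUNT(*), COUNT(bal), SUM(bal), AVG(bal) FROM Acct").fetchone()

print(over, not_over, is_null)
print(count_all, count_bal, total, average)
```

Account 46 appears in neither the "over 400" answer nor its negation, because WHERE keeps only rows whose qualification evaluates to true.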

General Constraints
    CREATE TABLE GasStation
      ( name CHAR(30),
        street CHAR(40),
        city CHAR(30),
        st CHAR(2),
        type CHAR(4),
        PRIMARY KEY (name, street, city, st),
        CHECK ( type='full' OR type='self' ),
        CHECK ( st <> 'nj' OR type='full' ) )
• Useful when more general ICs than keys are involved.

More General Constraints
    CREATE TABLE FroshSemEnroll
      ( sid CHAR(10),
        sem_title CHAR(40),
        PRIMARY KEY (sid, sem_title),
        FOREIGN KEY (sid) REFERENCES Students,
        CONSTRAINT froshonly
        CHECK (2017 =
               ( SELECT S.classyear
                 FROM Students S
                 WHERE S.sid=sid ) ) )
• Can use queries to express a constraint.
• Constraints can be named.
• Constraints can use other tables ⇒ must check if the other table is modified.
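
To see the two GasStation CHECK constraints at work, here is a minimal sketch assuming Python's built-in sqlite3 module and four invented station rows. The second constraint reads as "if st is 'nj' then type must be 'full'".

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE GasStation (
        name CHAR(30), street CHAR(40), city CHAR(30), st CHAR(2), type CHAR(4),
        PRIMARY KEY (name, street, city, st),
        CHECK (type = 'full' OR type = 'self'),
        CHECK (st <> 'nj' OR type = 'full')
    )
""")

def try_insert(row):
    """Return True if the DBMS accepts the row, False if a constraint rejects it."""
    try:
        con.execute("INSERT INTO GasStation VALUES (?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

# Hypothetical stations chosen to exercise each constraint.
accepted = [try_insert(row) for row in [
    ("A", "1 Main St", "Princeton", "nj", "full"),  # satisfies both checks
    ("B", "2 Main St", "Princeton", "nj", "self"),  # self-service in nj: rejected
    ("C", "3 Main St", "Yardley",   "pa", "self"),  # self-service outside nj: fine
    ("D", "4 Main St", "Yardley",   "pa", "none"),  # type not in {full, self}: rejected
]]

print(accepted)
```

The FroshSemEnroll constraint cannot be tried the same way in SQLite, which does not accept subqueries inside CHECK.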

Constraints Over Multiple Relations
Number of bank branches in a city is less than 3 or the population of the city is greater than 100,000
• Cannot impose as CHECK on each table. If either table is empty, the CHECK is satisfied.
• Is conceptually wrong to associate with individual tables.
• ASSERTION is the right solution; not associated with either table.

    CREATE ASSERTION branchLimit
    CHECK
    ( NOT EXISTS
      ( (SELECT C.name, C.state
         FROM Cities C
         WHERE C.pop <= 100000 )
        INTERSECT
        (SELECT D.name, D.state
         FROM Cities D
         WHERE 3 <= (SELECT COUNT (*)
                     FROM Branches B
                     WHERE B.bcity=D.name ) ) ) )

Summary
• SQL was an important factor in the early acceptance of the relational model
  – more natural than earlier, procedural query languages.
• Significantly more expressive power than the fundamental relational model
  – Blend of relational algebra and calculus, plus extensions
• Many alternative ways to write a query
  – optimizer should look for most efficient evaluation plan
  – when efficiency counts, users need to be aware of how queries are optimized and evaluated for best results
• SQL allows specification of rich integrity constraints
  – But often the DB system does not support them