<<

CS 5614: Basic Definition and Modification in SQL 67

Introduction to SQL

† Structured (‘Sequel’) † Serves as DDL as well as DML

† Declarative † Say what you want without specifying how to do it † One of the main reasons for commercial success of DBMSs

† Many standards and implementations † ANSI SQL † SQL-92/SQL-2 (Null operations, Outerjoins etc.) † SQL3 (Recursion, Triggers, Objects) † Vendor specific implementations

† “Bag Semantics” instead of “ Semantics” † Used in commercial RDBMSs

CS 5614: Basic Data Definition and Modification in SQL 68

Example:

† Create a / in SQL

CREATE TABLE Students (sid CHAR(9), name VARCHAR(20), login CHAR(8), age INTEGER, gpa REAL);

† Support for Basic Data Types † CHAR(n) † VARCHAR(n) † BIT(n) † BIT VARYING(n) † INT/INTEGER † FLOAT † REAL, DOUBLE PRECISION † DECIMAL(p,d) † DATE, TIME etc.

CS 5614: Basic Data Definition and Modification in SQL 69

More Examples

† And one for Courses

CREATE TABLE Courses (courseid CHAR(6), department CHAR(20));

† And one for their relationship!

CREATE TABLE takes (sid CHAR(9), courseid CHAR(6)); † Why?

† Can also provide default values

CREATE TABLE Students (sid CHAR(9), .... age INTEGER DEFAULT 21, gpa REAL);

CS 5614: Basic Data Definition and Modification in SQL 70

Examples Contd.

† DATE and TIME † Implementations vary widely † Typically treated as strings of a special form † Allows comparisons of an ordinal nature (<, > etc.)

† DATE Example † ‘1999-03-03’ (No Y2K problems)

† TIME Examples † ‘15:30:29’ † ‘15:30:29.3875’

† Deleting a Relation/Table in SQL

DROP TABLE Students;

CS 5614: Basic Data Definition and Modification in SQL 71

Modifying Relation Schemas

† ‘Drop’ an attribute ()

ALTER TABLE Students DROP login;

† ‘Add’ an attribute (column)

ALTER TABLE Students ADD phone CHAR(7);

† What happens to the new entry for the old records? † Default is ‘NULL’ or say

ALTER TABLE Students ADD phone CHAR(7) DEFAULT ‘unknown’;

† Always begin with ‘ALTER TABLE

† Can use DEFAULT even with regular definition (as in Slide 69)

CS 5614: Basic Data Definition and Modification in SQL 72

How do you enter/modify data?

† INSERT command

INSERT INTO Students VALUES (‘53688’,’Mark’,’mark2345’,23,3.9)

† Cumbersome (use bulk loading; described later)

† DELETE command

DELETE FROM Students S WHERE S.name = ‘Smith’

† UPDATE command

UPDATE Students S SET S.age=S.age+1, S.gpa=S.gpa-1 WHERE S.sid = ‘53688’

CS 5614: Basic Data Definition and Modification in SQL 73

Domains

† Domains: Similar to Structs and other user-defined types

CREATE DOMAIN Email AS CHAR(8) DEFAULT ‘unknown’; .... login Email // instead of login CHAR(8) DEFAULT ‘unknown’

† Advantages: can be reused

junkaddress Email, fromaddress Email, toaddress Email, ....

† Can DROP DOMAINS too!

DROP DOMAIN Email; † Affects only future declarations

CS 5614: Basic Data Definition and Modification in SQL 74

Keys

† To Specify Keys † Use or UNIQUE † Declare alongside attribute † For multiattribute keys, declare as a separate line

CREATE TABLE takes ( sid CHAR(9), courseid CHAR(6), PRIMARY KEY (sid,courseid) );

† Whats the difference between PRIMARY KEY and UNIQUE? † Typically only one PRIMARY KEY but any number of UNIQUE keys † Implementor allowed to attach special significance

CS 5614: Basic Data Definition and Modification in SQL 75

Creating Indices/Indexes

† Why? † Speeds up query processing time

† For Students

CREATE INDEX indexone ON Students(sid); CREATE INDEX indextwo ON Students(login);

† How to decide attributes to place indices on? † One is (typically) created by default on PRIMARY KEY † Creation of indices on UNIQUE attributes is implementation-dependent † In general, physical /tuning is very difficult! † Use Tools: Microsoft SQLServer has an index selection Wizard

† Why not place indices on all attributes? † Too cumbersome for insertions/deletions/updates

† Like all things in computer science, there is a tradeoff! :-)

CS 5614: Basic Data Definition and Modification in SQL 76

Other Properties

† ‘NOT NULL’ instead of DEFAULT

CREATE TABLE Students (sid CHAR(9), name VARCHAR(20), login CHAR(8), age INTEGER, gpa REAL);

† Can a tuple without a value for gpa † NULL will be inserted

† If we had specified

gpa REAL NOT NULL);

† insert cannot be made!

¢¡¤£¦¥¨§ ©   ¦¦

¤ "!$#&%('*),+.-&/10(2$35462748%(0(09':!;/&-<=!$#&>?2@2A-&%CBD2@>?2$ ;!FEG/&2$>:N2$O&>N2$'?2$ ;!;IP!;%C+L &'Q/&'N2$-%( SR$+L JT

/& &RV!;%(+J W48%X!;#W!$#&2 >?2$0CIY!;%C+L &IJ0L)"+.-&2$0CZ\[]#&2 ^&>N'!_%('9>N2$0(IP!;%C+L &IJ0.IL0(KJ2$`&>NIL3GIL QIL0CKL2$`&>NIL%CR_IJ &-AO1>?+.$2$-1/&>?IJ0

U

46I¢]R$>?2@IY!;%C &Kc &2N4d>?2@0(IY!$%(+J &']ab>?+J)eKL%gfD2$ +L 12$'?Zh[]#&2Q'?2$R@+L &-¨%(']i*IY!;IJ0(+JKL3548#&%(RY#¨%(']0(+JKL%CR$IL0jIL &-

-&2$R@0(IL>NIY!$%XfD2h%( ¨ &IY!$/&>?2,k:'!;/1-&2$ ;!;']abIL)"%(0C%(IJ>l48%X!$#F!;#&2Qm n]oQp9o8qrO&>?+JKL>NIL)"),%C &K,0CIL &KJ/&ILKJ2h48%C0(0¦^1 &-

!;#1%('\IJ0(0!;+.+c &IP!;/&>NIL0(sNZt ¤ abIJRN!;3!;#&2@>?2uIJ>?2uR$0C+L'N2uR$+L>N>?2$'NOv+L &-&2@ &R$2$'w`D2V!V4u2$2$ -1IY!;IJ`&IL'N2Q'<¦'!;2@),'wIL &-

m n]oQp9o8qQZwm_n]oQp9oQqxR$IL y`v2z!;#1+L/&KJ# !=+Ja]IL'AIS-&IY!$IL`&IJ'?2¨'<¦'!$2$){48#&2$>N2,IJ0(0|!;#12-&IY!$I}^J!$'A%C !$+

)"IL%( ~)"2$)"+L><.Z ¤ €R@+L ;!;>?IJ'!$3uI-1%('!$%( &KJ/&%('N#&%( 1Kab2$IY!$/&>?2S+Launhiu‚]ƒ„&'c%C'Q!$#&IY!"!;#&2V< +LOv2$>NIY!;2+J

'?2@R$+L &-1IL><€'!$+L>NILKL2@Z†oW!$#&2$>W!;#&IJ ‡!;#&IP!ykIJ &-~'?+J),2-1%(Bv2$>?2@ &R$2$'Q%( yEG/&2$><€O&>N+.R$2$'?'N%( &KJs?3ˆ!$#&2$>?2"IL>N2

2N‰¦R$2$0C0(2$ ;!7IL 1IL0(+JKL'w!$+c`v+Y!;#SR$/&0g!;/&>N2$'?Z]‚]+P!;#¨m n]o8p9oQqŠIJ &-¨ntiu‚]ƒ„1']IL>?2Q-&2@R$0(IJ>?IY!$%XfD2$ZŒ‹#&IY!]4u2

Ž

&+¢4!;+"`v2A>?2@0(IY!$%(+J &']IL>N2A>?2@ab2$>?>N2$-z!;+"IL'*(O&>N2$-&%(R@IY!;2@'?‘¦%C ¨m_n]oQp9oQqQZ“’”!$/&O&0(2Q%C']R$IL0C0(2$-¨I"KL>?+J/& &-

abILRV!F%C }m n]oQp9o8qQZ•’ !;IJ`&0(2z%('hR$IL0C0(2$-¨IL C2N‰G!;2$ &'N%(+J &IL0l-&2$^& 1%X!;%C+L &‘9%C }m n]oQp9oQq–IJ &-'?++J &Z—k:iu+

Ž

&+P!8KL2N!u`D+JKLKJ2$-}-&+¢48 `;<"!;#&2$'N2u'?Ov2$R@%(^&R$'N˜G4u2t%C &R$0(/1-&2 !$#&2$)™#&2$>N2 /&'!u'?+*!$#&IY!w

U

R$+J & &2$RN!$%(+J &3•%(a|?2cIL0C>?2$IJ-J<›abIJ),%C0(%CIL>w48%g!;#m n]oQp9o8qQZ œ 0('N2$3• &+P!;#&%C &KF!;+F46+J>?><.Z(s[]#&2*!$#&%(>N-

EG/&2$><=>N2$O&>N2$'?2$ ;!;IP!;%C+L "%('?3j+La9R$+J/&>?'N2$3¦„1ž8pš!$#&IY!w46IJ']%( ;!;>?+.-1/&R$2$-"2$IL>N0(%C2$>?Zw V 6!;#12u>?2$)"IL%C &-&2$> +Ja1!;#&%C'

-&+.R$/1),2$ ;!;3ˆ4u2F48%(0(0 %C !$>?+.-&/&R@2=`1IL'?%CR¨+LOv2$>?IP!;%C+L &'"IL &-~)"IL &%CO&/&0CIY!;%C+L &'*!$#&IY!c4u2¨R$IL ~Ov2$>?ab+J>?)Ÿ+J

>?2@0(IY!$%(+J &'?Z" v+L>Q2$ILRY#'?/1R¡#`&IJ'?%CRc+LOv2$>?IP!;%C+L &3Œ462*48%(0C0“'N#&+¢4¢#&+¢4–%X!c%('Q>?2$O1>?2$'N2$ !$2$-}%C 2$ILRY#+La|!$#&2

!;#1>?2$2Q-&%CBD2@>?2$ ;!6 &+P!;IP!;%(+J &'?Z

Zu¤¨¥l¦¨§v¥©§vª]«¬G­¨®&¯@¦¨§v¥l° ±\[]#&2z/& &%C+L ¨+Laj!¤4u+c>?2$0CIY!;%C+L &'7²IL &-=³†%('!;#&2Q'?2V!8+Lal2$0(2$)"2$ ;!;'Œ!;#&IP!

£

Ž

IJ>?2u%C š²+J>\%C —³´+L>]`v+P!;#&Zˆ‹€2uIL'N'?/&)"2w!$#&IY!w!;#&2h'?RY#&2$)"IL'w+LaŒ²IJ &-—³›IL>N2uIL0C% 28k+Ja9R$+L/1>?'?2@s

Ž

IJ &-z!;#1IY! !;#&2@%(>•R$+J0(/&)" &' IL>N2tIJ0('N+A+L>N-&2$>N2$-,IJ0(% 26k+Ja¦R$+J/&>?'N2$3vILKJIL%( 1s?Z‹”2] &+¢4KJ%XfD2ˆ!;#12 !$#&>?2$2

>N2$O&>?2@'?2$ ;!;IP!;%(+J &']+La5!;#128/1 &%(+J }>?2@0(IY!$%(+J &±

² ³k'?%C),O10(2$3j>?%(KJ# !$µ s

¶·?¸ ¹?º ¹?»¹¢¼¦½~¾À¿›Á ·?¸¹¢º¹¢»¹?¼|½ˆÂ

¶·?¸ ¹?º ¹?»¹¢¼¦½~¾À¿›Ã ·?¸¹¢º¹¢»¹?¼|½ˆÂ

Ä*+Y!;%CR$2S!$#&IY!!$#&2SfÀIL>N%(IL`10(2$' IJ>?2)"2$>?2@0X<(O10(ILR@2$#&+L0C-&2$>N'?‘Q/&'?2@-ab+J>O&IY!N!;2$>N

¸¹¢º¹?» ¹?¼

)"IY!$R¡#1%( &KJ˜1462uR$+J/&0(-¨#&I¢fD2W48>?%g!?!$2$ F!;#&2QIL`v+¢fD2W!¤46+cIL'N±

¶·?Å ¹?Æ ¹;Ǧ¹¢È¦½~¾À¿›Á ·?Ź¢Æ¹ Ǧ¹?È|½ˆÂ

¶·?É ¹?Ê ¹?˹¢Ì¦½~¾À¿›Ã ·?ɹ¢Ê¹¢Ë¹?Ì|½ˆÂ

·?ÃGÍÀÎGÍÀÏG¶ÑÐ

Ò¤ÓlÔ\Õ Ö¨ÔØ×_Ù ÚÜۈÝLÔYÚÜÙNÞ?ß9àáÔYÖ9âYãÛ¤ßߨÚäÙâˆ×_åVæ]×ەßç×_Ú Þ?èéå ÛVêWÞ?Ô]ë]ê ÚÜìLÛ¤ÖÛVÙVÞ_êÔ@èVã í]Û¤ÙNÞ\ßîÞbæ.ïÜەàîÔ¡ÖjðQÔ$ê ãïäەñ¡ò óLó

ôGÁ¦õ ö©Á¦½

÷Gø¦ùÀõ;ø

·?ÃGÍÀÎGÍÀÏG¶ÑÐ

ôGÁ¦õ ö©Ã¦½

Zuû¨¦ü_¬Gý ¬À¥lþ.¬"§Dªw«¬G­¨®&¯@¦¨§v¥l° ±\[]#&2*-&%(Bv2$>N2$ &R$2t² ³~+Ja1!¤46+A>N2$0(IP!;%C+L &'\²ÿIJ &-—³›%('Œ!;#&2*'?2N!u+La

ú

2$0C2$)"2$ !$'l!$#&IY!AIL>N2A%( =²`&/J!A &+Y! %C =³ Zh’u'h/&'?/1IL0(35462QIJ'?'?/1),27!;#&IP!t!$#&2A'NR¡#&2@),IJ']+La²IL &-

Ž Ž

³ ³›IL>N2uIL0C% 2hIL &-F!$#&IY!w!;#&2$%C> R$+L0C/&), 1'\IL>N2uIL0C'?+c+J>?-&2$>N2$-IJ0(% 2@Z\ƒ+L>N2$+¢fD2$>?31 &+Y!$%(R$2w!$#&IY!6²

%C'] &+Y!—kKJ2$ &2$>NIL0(0g<|sŒ!;#&2Q'NIL)"28IJ'*³ ²*Z

² ³

¶·?¸ ¹?º ¹?»¹¢¼¦½~¾À¿›Á ·?¸¹¢º¹¢»¹?¼|½D¹ø|õ;¶©Ã ·?¸¹¢º¹¢»¹?¼|½Â

·?ÃGÍÀÎGÍÀÏG¶ÑÐ

ôGÁ¦õ ö©Á¦½

Í¢¡GÏÀÍ¢£À¶

·?ÃGÍÀÎGÍÀÏG¶ÑÐ

ôGÁ¦õ ö©Ã¦½

Z¦¥Y¥¦¯$¬Gý °$¬Gþ.¯$¦¨§v¥§vª «¬G­¨®&¯@¦¨§v¥l° ± []#&2]%( ;!;2$>N'?2$RV!;%(+J F² ³´+La!¤46+Q>?2$0CIY!$%(+L 1'² IJ &-,³´%('j!$#&2]'?2V!

¤

+Ja“2$0C2$)"2$ !$'l!$#&IY!8IJ>?2A%C ¨§ ©  š²IL &-}³ Z]’*KLIL%C &31462*IL'?'N/&)"2h!;#1IY!]!;#128'NR¡#12$),IJ']+La²ÑIJ &-‡³

Ž Ž

IJ>?2\IJ0(% 2_IJ &-*!;#1IY!•!$#&2$%(>9R@+L0(/1), &'lIL>N2]IL0C'?+h+L>?-12$>?2$-cIJ0(% 2$Z•Äu+P!;%(R@29!$#&IY!7² ³² k¨² ³sNZ

² ³

¶·?¸ ¹?º ¹?»¹¢¼¦½~¾À¿›Á ·?¸¹¢º¹¢»¹?¼|½D¹à ·?¸ ¹?º¹¢»¹¢¼D½Â

·?ÃGÍÀÎGÍÀÏG¶ÑÐ

ôGÁ¦õ ö©Á¦½

ù;øG¶ÀÍGÁÀÃGÍGÏÀ¶

·?ÃGÍÀÎGÍÀÏG¶ÑÐ

ôGÁ¦õ ö©Ã¦½

Z¦Qý § ¬Àþ.¯$¦§v¥•±*o8Ov2$>NIY!$2$'u+L yI'N%( &KJ0(2A>N2$0(IP!;%C+L }IJ &->?2@),+¢fD2$'h'?+J),2z+La|!;#&2cR@+L0(/1), &'NZu'N2$ab/&0



ab+J>,>N2$'!$>?%(RV!;%( 1K}%( 1ab+L>?)"IY!$%(+J &Z©’u'?'N/&)"2c!$#&IY!"462F4uIL ;!=+L &0g<!;#&2¨ &IJ),2¨IJ &-€IJ-&-&>?2@'?'"ab>?+L)

>N2$0(IP!;%(+J =²*Z

²

 ! #"$"&%')(*(

¶·?¸ ¹?º|½©¾G¿›Á ·?¸ ¹?º¹¢»¹¢¼D½Â

[]#G/&' `D2@R$+L)"2]%(>N>?2$0C2NfÀIL ;!WIP!?!;>N%(`&/!;2$'NZl‹€2]R$+L/&0C-,%C &'!$2$IL-c>N2$%( 1ab+L>?R@2ˆ!;#&%C'•` ?%g!;%( 1K

»¹¢¼

¶·?¸ ¹?º|½©¾G¿›Á ·?¸ ¹?º¹,+¦¹,+.½Â

ÃGÍGÎÀÍGÏÀ¶.-GÅ¢/Àɹ—ÅGÈÀÈ¢0GÉ21¢1

ôGÁ¦õ ö©Á

Z¦4l¬À­¬Gþ.¯$¦¨§v¥ˆ±\oQOv2$>?IP!;2$'h+L SIc'?%C &KL0C28>N2$0(IP!;%C+L ¨IL &-S>?2$)"+¢fD2$']'?+J),2Q+Jaj!;#&2Q>N+ 48'NZt[]#12A>?2@),+¢fÀIL0

3

%C'8`1IL'?2@-+L 'N+L)"2,R@+L &-&%g!;%C+L y'?Ov2$R$%C^&2$-y` <!$#&2,/1'?2$>NZ D+J>Q2N‰|IJ),O10(2$3 '?/1O&Ov+L'?28462646IL ;!—IJ0(0

!$#&2W!;/&O&0C2$']ab>?+J)¢² 48#12$>?2W!;#128 1IL)"28%C'](ƒ%CR¡#&IJ2$0(‘CZ

ó65

²

7  8:9<;>=

¶·?¸ ¹?º ¹?»¹¢¼¦½~¾À¿›Á ·?¸¹¢º¹¢»¹?¼|½D¹¸FEHG¢öJIGÇ ÌGÅÀÉKJL¦Â

ÃGÍGÎÀÍGÏÀ¶Ð

ôGÁ¦õ ö©Á

M¢NGÍÀÁGÍ.-ÀÅ¢/GÉ.EOG?ö2IGÇ;ÌÀÅGÉ2KJL

Z¦Qÿ§vý ¬R4l¬À­¬Gþ.¯$¦¨§v¥l°TSH„&2$0C2$RN!;%C+L &'c`v2$R@+L)"2}R$+J),O&0C%(R$IP!;2$-‡48#&2@ y!;#12}R$+J &-&%X!$%(+J &',KJ2N!}0(+L 1KL2$>N3

P

O&IJ>!$%(R$/&0CIL>N0X<¨48%X!$#}iuIP!;IL0C+LKkç!$#&2A+P!;#&2@>w!¤46+ab+J>?)"'tIJ>?2cO&>N2N!?!¤<'!$>?IL%CKL#;!;ab+J>46IJ>?-&sNZVU]+L &'N%(-&2@>

ab+J>_2V‰|IJ),O&0C2$3;48#&2$ *462ˆ4uIL ;!7IL0C0J!;#12ˆ!;/&O&0C2$'•ab>N+L)dn´48#12$>?2ˆ!$#&2] &IL)"2]%('ˆ(ƒ%(RY#&IJ2$0(‘.oQny48#12$

!$#&2QKL2$ &-12$>]%(']Cƒ‘(ZG‹€2W48>?%g!;2w!$#&%(']%C =i*IY!;IJ0(+JKIL'N±

¶·¢¸¹?º ¹?» ¹?¼¦½†¾G¿›Á ·?¸ ¹?º¹¢»¹¢¼D½¦¹,¸FEHG?ö2IGÇ ÌGÅGÉ2KJLDÂ

¶·¢¸¹?º ¹?» ¹?¼¦½†¾G¿›Á ·?¸ ¹?º¹¢»¹¢¼D½¦¹,»FEHG?öWL¦Â

Äu+P!;%CR$2h!$#&IY!Q462cC'?O&0C%X!$‘j!;#&2cR@+L &-&%g!;%C+L ILR@>?+L'N'h!¤46+š>?/&0C2$'?3 /1'!,IL'W462"-&+¨%( š!$#&2c/& &%C+L yR$IL'N2

U

Ž

kXU]+L)"2z!;+ !;#&%C +Lah%X!$3l!;#12,oQn™%('A%C &-&2$2@-¨!$#&2,/1 &%(+J †+Jaˆ!¤46+}R@+L &-&%g!;%C+L &'Ns?Zy„&%()"%(0CIL>?0g<3j!$#&2

R$+J),)"I%C }2$IJR¡#S+La|!;#&2cIJ`v+ fD2c-1IY!;IJ0(+JK=>N/&0(2@'t)"+.-&2$0C'w!;#&2c’*ÄuidR$+J &-&%g!;%(+J &Zcp92N!$‘('hR$+L &'N%(-&2@>

I)"+L>N2AR$+J),O10(%(R@IY!;2@-=R@+L &-&%g!;%C+L &±Q„&2$0C2$RN!8IJ0(05!;#&27!;/&O&0C2$'hab>?+L)xn !;#&IP! IL>N2A &2$%g!;#&2@>t)"IL0C2A &+J>

#&I¢fD2w!$#&2Q &IL)"2Q( v+¢‰|‘CZ

¶·¢¸¹?º ¹?» ¹?¼¦½†¾G¿›Á ·?¸ ¹?º¹¢»¹¢¼D½¦¹,» ¾¢YZG?öWLG¹=¸d¾FY[G¢ôJ\;¸WL¦Â

‹#&%C0(2Q'?2@0(2$RN!$%( &K*!$#&2h!$/&O&0C2$']ab>?+J) n”!$#&IY! IJ>?2z &+Y! `v+P!;#)"IL0(2QIJ &-#&I¢fD2W!;#&2z &IL)"2Q( v+¢‰|‘9%C'

IJR¡#&%C2NfD2$-"` <†kç48#;<|µ s?±

¶·¢¸¹?º ¹?» ¹?¼¦½†¾G¿›Á ·?¸ ¹?º¹¢»¹¢¼D½¦¹,» ¾¢YZG?öWL¦Â

¶·¢¸¹?º ¹?» ¹?¼¦½†¾G¿›Á ·?¸ ¹?º¹¢»¹¢¼D½¦¹,¸ ¾¢YZG?ô2\;¸HL|Â

Z¦]c®&ý ¯$¬G°$¦®&¥^Qý§:_ `lþ.¯ ±c[]#&%('z%('w!$#&2"'?2N!,+Law(O&IJ%(>N'?‘ ab+L>N),2$-y`;<›RY#&+.+L'N%( &K !;#&2"^&>?':!,2$0(2$)"2$ ;!

ó

ab>N+L)¢²ÑIL 1-"!;#12u'?2$R@+L &-¨2$0C2$),2@ !tab>?+J)¢³Z] ¤ R$IL'N2Q+LalR$+L &ab/1'?%(+J &'hIL)"+L &K"IY!N!;>?%C`&/J!$2u &IL)"2$'N3

-&%C'?IJ)c`&%(KJ/&IY!$2h!;#12$)x` <O&>N2$^J‰¦%( &K6!;#&2@)Š48%g!;# !;#&2c>N2$0(IP!;%(+J } &IJ),2@Z8[]#12cR$IL>:!;2$'N%(IL SO&>?+.-1/&RN!

+Jav!¤46+">?2@0(IY!$%(+J &'7²ÑIL &-}³ %('hKL%gfD2$ `;<|±

² ³

¶·?¸2a¦¹¢ºJa¦¹¢»Ja|¹V¼JaD¹?¸Fbˆ¹?ºFbˆ¹?»¢bˆ¹¢¼ b¦½~¾À¿›Á ·?¸Ja|¹?º2a¦¹?»2a¦¹V¼2a.½¦¹,à ·?¸¢b ¹?ºFb¹?»Fb¹V¼Fb¦½ˆÂ

ÃGÍGÎÀÍGÏÀ¶©ÁhÂ*-ÀÅ¢/Àɹ—ÁhÂÅÀÈGÈF0GÉJ1F1D¹ÁtÂËÀÉ¢-GÈÀÉ¢0 ¹,Áh¨ÆJIc0FdGÌÀÈGÅ¢d.É ¹

ÃhÂ*-ÀÅ¢/Àɹ—ÃhÂÅÀÈGÈF0GÉJ1F1D¹ÃtÂËÀÉ¢-GÈÀÉ¢0 ¹,Ãh¨ÆJIc0FdGÌÀÈGÅ¢d.É

ôGÁ¦õ ö©Á¹¢Ã

Äu+P!;%CR$2Q#&+¢4462z-&%('NIL)c`&%CKL/&IP!;2AIP!?!$>?%(`1/J!;2$'h%( !;#&2Q„&žQp‡fD2$>N'?%(+J &Zu’u0C'?+" &+Y!$%(R$27!;#&IP!F%Ca²r#1IL'

³H48%C0(0¦#1I fD2 f‡!;/1O&0(2$'NZ !;/&O&0C2$']IL 1-‡³ #1IL'Wf‡!;/&O&0C2$'?31!$#&2$ =²

e e

ó6g

Ž

Z¦hji“¬G¯$®lk#m¦§v¦¨¥ˆ±¨[]#&%('c%C' /&'!=0C% 2*!$#&2¨R$IL>:!;2$'N%(IL ~O&>N+.-&/&RN!=`1/J!=KL+.2$'"I'!$2$O~ab/&>!$#&2$>NZ†’*aä!;2$>

5

U

ab+J>?)"%( &K*!$#&2 f‡!;/&O10(2$'N3¦%X!u'?2$0C2$RN!$'\+L 10XN39`&IL'N2$-

e

+J 'N+L)"2R$+L &-1%X!;%C+L &ZŠ[]#G/&'N3h!;#12š!;#&2N!$IYT +L%C ©+Lah!¤4u+>?2$0CIY!$%(+L 1'#&IL'8!;#&2'NIL)"2 À/&)c`v2$>š+La

U

R$+J0(/&)" &'hIL'!;#&2zR$IL>:!;2$'N%(IJ }O&>N+.-&/&RN!A`&/J!A &+Y!c &2$R$2$'N'?IJ>?%(0g<!;#128'NIL)"2A G/&)c`v2$>]+La >N+ 48'Ak'N+L)"2

+Ja]!$#&2}>N+¢48'Q48%C0(0h`D2S>?2$)"+¢fD2$-~`v2$R$IL/1'?2š!;#12N< -&%C'?'?IP!;%C'?aä<Ñ'N+L)"2}R$+J &-&%g!;%(+J &s?Z^U]+J &'?%C-&2$>ab+J>

2N‰¦IL)"O&0C2$3 !;#&IP!]462W46IJ !w!;+c^& &-¨CO&IL%C>?'?‘9+Ja“':!;/&-12$ !$'\'N/&R¡#F!$#&IY!]!$#&2Q^&>?':!8OD2@>?'?+J ¨%( F!;#&2*O&IL%C>

%C']IL0X4uI¢<|']Cƒ%(RY#&IL2$0C‘(Z|‹€28KJ2N!;±

²on)p6qsr ³

t#u#8lv ;>=

¶·?¸2a¦¹¢ºJa¦¹¢»Ja|¹V¼JaD¹?¸Fbˆ¹?ºFbˆ¹?»¢bˆ¹¢¼ b¦½~¾À¿›Á ·?¸Ja|¹?º2a¦¹?»2a¦¹V¼2a.½¦¹,à ·?¸¢b ¹?ºFb¹?»Fb¹V¼Fb¦½D¹,¸2acEHG¢öJIÀÇ;ÌGÅÀÉJKL|Â

ÃGÍGÎÀÍGÏÀ¶©ÁhÂ*-ÀÅ¢/Àɹ—ÁhÂÅÀÈGÈF0GÉJ1F1D¹ÁtÂËÀÉ¢-GÈÀÉ¢0 ¹,Áh¨ÆJIc0FdGÌÀÈGÅ¢d.É ¹

ÃhÂ*-ÀÅ¢/Àɹ—ÃhÂÅÀÈGÈF0GÉJ1F1D¹ÃtÂËÀÉ¢-GÈÀÉ¢0 ¹,Ãh¨ÆJIc0FdGÌÀÈGÅ¢d.É

ôGÁ¦õ ö©Á¹¢Ã

M¢NGÍÀÁGÍ©ÁtÂ*-GÅF/GÉwExG?öJIÀÇ;ÌÀÅGÉJK2L

Äu+P!;%CR$2|!;#&IP!“4u2_%C !$>?+.-&/1R$2l!$#&2 (`v+¢4]!;%(2@‘.'<¦)c`D+J0yn*pz'?/lz6‰ÀT?2$-z` “%C &-&%CR$IY!$%( &K

!$#&2W!;#&2N!$IYT +L%C &Z]’u0('N+L39>N2$IL0C%|{$2w!$#&IY!

U

²}n*p~´³ ~]k² ³s

7

Z¦€S®&¯`lý®1­m¦§v¦¥•±S[]#&%('z%(' /&':!—IR$0C2NfD2$>N2$>w46I¢<!$+R$+L)c`1%( &26!¤46+}>N2$0(IP!;%C+L &'c%C !$+}+L 12$Z~[]#&2

g

U

`&IJ'?%CR,%C-&2$I¨%('W!;#1IY!—%Ca|!;#&26!¤46+¨>?2$0CIY!$%(+L 1'8#1I fD2"'?+J),2"R$+J0(/&)" Œk'NsQ%( yR$+L)"),+J &3|!;#&2@ ¨4u2,R$IJ

CR$+L0C0(ILO1'?2$‘j!$#&2$) %C !$+"!;#12,'NIL)"2R$+L0C/&)" %C ¨!$#&2,^1 &IL0w+L/J!$O&/J!;Zyƒ+L>N2$+¢fD2$>?3|4u2,R$IJ -&+ !;#&%C'

+J &0X<}%(a!;#&2w!¤4u+h!;/1O&0(2$' ab>N+L)Ñ!;#&2w!¤4u+8>N2$0(IP!;%C+L &'wILKL>N2$2u%C z!;#&+J'?2uR@+L)"),+J R$+L0C/&)" &'?Zw[]#À/&'N3¦%g!

%C'u'?%C),%C0(IL>ˆ!$+z!;#12cR$IL>:!;2$'N%(IL O1>?+.-&/&RV!;3•`&/!u462z +L%C &‘l+L &0g<}!;#1+L'?2cO1IL%(>N'w!;#1IY! )"IY!$R¡#S%( š!;#12$%(>

U

R$+J),)"+L "IY!N!;>N%(`&/J!$2$'?Z‚U]+L &'N%(-&2@>l!;#1IY! 46246IJ ! !;+Q^& 1-z!;#&2h &IJ),2$3vIJ-&-&>?2@'?'?3vKJ2$ &-&2$>N3vKLO&IzIL &-

`&%C>!$#&-&IY!$2A+Ja•'!$/&-&2$ ;!;'h%( ¨I'N%( &KJ0(2Q>?2$0CIY!$%(+L 1ZuÄu+Y!$%(R$2_!;#&IP! KLO&I"%('hI¢fÀIL%(0CIL`&0C2Qab>?+L)„ƒ”`&/!t!$#&2

+P!;#&2$> ab+J/&>•IY!N!;>N%(`&/J!$2$' IL>N2]O&>?2$'N2$ ;!7%( ²*Z]„1+L3 462] &2@2$-cIW46I¢

!¤46+c>N2$0(IP!;%C+L &'N±

²on)p ƒ

¶·?¸ ¹?º ¹?»¹!†¹¢¼D½~¾G¿~Á·?¸ ¹?º ¹?»¹¢¼¦½|¹,η¢¸¹?º ¹†|½Â

ÃGÍGÎÀÍGÏÀ¶©ÁhÂ*-ÀÅ¢/Àɹ—ÁhÂ"ÅÀÈGÈF0GÉJ1F1¦¹,ÁtÂËÀÉ¢-GÈÀÉ¢0 ¹ÎtÂË¢‡ÀŹ—Áh¨ÆJIc0FdGÌÀÈGÅ¢d.É

ôGÁ¦õ ö©Á¹¢Î

M¢NGÍÀÁGÍ©ÁtÂ*-GÅF/GÉFEGÎhÂ)-.Å¢/ÀÉ

ˆ

ø¢‰Áh¨ÅGÈGÈF0GÉ21¢1ŠEÎhÂÅÀÈGÈF0GÉJ1F1

Z8«¬G¥l®lŒ¦¥uj±\[]#1%('h%(' /&':!FIcR$+.+J0v!$#&%( 1KL39%( ¨R$IJ'?2W462Q#&I¢fD2W!;+.+")"IL ;

U

£‹

R$+J &ab/&'N%(+L 1'uIL>N%('?%C &KLZ7‹”2cR@IL }/1'?2*!;#1%('h+LOv2$>?IP!;+J>]!$+>?2@ &IL)"2AI>N2$0(IP!;%C+L &‘C't &IJ),2zIL &-1ML+L>Q+J &2

Ž

+J>t)"+L>N28+Ja %X!;']IP!?!$>?%(`1/J!;2$'NZu v+L>]2N‰¦IL)"O&0C2$39IL'N'?/&)"2h4u2]4uIL ;!t!$+,>N2$ &IJ),28² !$+ IJ &-})"I 2

%g!;'hR$+L0C/&), 1'ˆ!;+`v2AR@IL0(0C2$- 3 IL 1- ZQ[]#&%C't%C't)"+L':!F/&'N2$ab/&0548%X!$#=>N2$0(IP!;%C+L &IJ0“IJ0(KL2@`&>?IJ3

ú ¤

£

e e e

Ž

0C% 2h'?+L±

k¨²*s

‘

;“’”–•& ‚—# s˜t™

5 ‹ CS 5614: Misc. SQL Stuff, Safety in Queries 81

What is still to be covered (and will be)

† Declaring constraints † Domain Constraints † (Foreign Keys)

† More SQL Stuff † Subqueries † Aggregation

† SQL Peculiarities † Strange Phenomena † More on Bag Semantics † Ifs and Buts

† Embedding SQL in a Programming Environment † Accessing DBs from within a PL † (will be covered in Module 3)

CS 5614: Misc. SQL Stuff, Safety in Queries 82

What will be mentioned (but not covered in detail)

† Triggers † Read Cow Book or Boat Book

† More SQL Gory Details

† Recursive Queries (SQL3) † Why do we need these?

† Security

† Authorization and Privacy

† Trends towards Object Oriented DBMSs

CS 5614: Misc. SQL Stuff, Safety in Queries 83

Tuple-Based Domain Constraints

† Already Seen † NOT NULL † UNIQUE, PRIMARY KEY etc.

† In General

CREATE TABLE Students (sid CHAR(9), name VARCHAR(20), login CHAR(8), age INTEGER, gpa REAL, CHECK (gpa >= 0.0) );

† Note: Implementations vary, but this is the general idea

† Other Complicated Forms † Constraints on whole relations, Assertions

CS 5614: Misc. SQL Stuff, Safety in Queries 84

Referential Integrity Constraints

† Foreign Keys † An attribute a of R1 is a if it “references” the primary key (say b) of another relation R2 † In addition, there is a ref. integrity constraint from R1 to R2.

† Example † login is a FOREIGN KEY for Students

CREATE TABLE Students (sid CHAR(9) PRIMARY KEY, name VARCHAR(20), login CHAR(8) REFERENCES Accounts(acct), age INTEGER, gpa REAL );

CREATE TABLE Accounts ( acct CHAR(8) PRIMARY KEY );

CS 5614: Misc. SQL Stuff, Safety in Queries 85

Alternatively

† Can use “FOREIGN KEY” construct

CREATE TABLE Students (sid CHAR(9) PRIMARY KEY, name VARCHAR(20), login CHAR(8), age INTEGER, gpa REAL, FOREIGN KEY login REFERENCES Accounts(acct) );

CREATE TABLE Accounts ( acct CHAR(8) PRIMARY KEY );

† Note: acct should be declared as PRIMARY KEY for Accounts! † in both cases

CS 5614: Misc. SQL Stuff, Safety in Queries 86

SQL Subqueries

† Given Students(sid,name,login,age,gpa) HasCar(sid,carname) † Find † the car of the student with login=”mark”

† Traditional Way

SELECT carname FROM Students, HasCar WHERE Students.login=’mark’ AND Students.sid=HasCar.sid;

† The ‘Subway’

SELECT carname FROM HasCar WHERE sid= (SELECT sid FROM Students WHERE login=’mark’);

CS 5614: Misc. SQL Stuff, Safety in Queries 87

Aggregation

† Given Students(sid,name,login,age,gpa)

† Find † the average of the ages of all the students

† Solution

SELECT AVG(age) FROM Students;

† Other Operations † SUM (summation of all the values in a column) † MIN (least value) † MAX (highest value) † COUNT (the number of values), e.g.

SELECT COUNT(*) FROM Students;

† COUNTs the number of Students!

CS 5614: Misc. SQL Stuff, Safety in Queries 88

Ordering

† Given Students(sid,name,login,age,gpa)

† List † the students in (ascending) alphabetical order of name

† Solution

SELECT * FROM Students ORDER BY name;

† and that for DESCending ORDER is

SELECT * FROM Students ORDER BY name DESC;

† Default is ASC

CS 5614: Misc. SQL Stuff, Safety in Queries 89

Grouping

† Given Students(sid,name,login,age,gpa)

† Find † the names of students with gpa=4.0 and † group people with like ages together

† Solution

SELECT name FROM Students WHERE gpa=4.0 GROUP BY name;

CS 5614: Misc. SQL Stuff, Safety in Queries 90

More on Grouping

† Given Students(sid,name,login,age,gpa)

† Find † the names of students with gpa=4.0 and † group people with like ages together and † show only those groups that have more than 2 students in it

† Solution

SELECT name FROM Students WHERE gpa=4.0 GROUP BY name HAVING COUNT(*) > 2;

CS 5614: Misc. SQL Stuff, Safety in Queries 91

Summary of SQL Syntax

† General Form

SELECT FROM WHERE GROUP BY HAVING

† Order of Execution † FROM † WHERE † GROUP BY † HAVING † SELECT

CS 5614: Misc. SQL Stuff, Safety in Queries 92

Views

† Can be viewed as temporary relations † do not exist physically BUT † can be queried and modified (sometimes) just like normal relations

† Example:

CREATE VIEW GoodStudents(id,name) AS SELECT sid,name FROM Students WHERE gpa=4.0;

SELECT * FROM GoodStudents WHERE name=’Mark’;

† Can we the original relation using the GoodStudents VIEW?

CS 5614: Misc. SQL Stuff, Safety in Queries 93

Beginning of Wierd Stuff

† SQL uses Bag Semantics † meaning: does not normally eliminate duplicates † e.g. the SELECT clause

† BUT (a big BUT)

† this doesn’t apply to † UNION, INTERSECT and DIFFERENCE

† Either way, it provides facilities to do whatever we want

† If you want duplicates eliminated in SELECT clause † use SELECT DISTINCT .....

† If you want to prevent elimination of duplicates in UNION etc. † use (SELECT ...) UNION ALL (SELECT ...) † Likewise for INTERSECT and DIFFERENCE

CS 5614: Misc. SQL Stuff, Safety in Queries 94

... and that’s just the tip of the iceberg

† What happens with the following code?

SELECT R.A FROM R,S,T WHERE R.A = S.A or R.A = T.A

† when R(A) has {2,3}, S(A) has {3,4} and T(A) is {}

† Confusion Reigns!

CS 5614: Misc. SQL Stuff, Safety in Queries 95

Safety in Queries

† Some queries are inherently “unsafe” † should not be permitted in DB access

† Example † Given only the following relation

Students(id)

† Find all those who are not students

† Easy to distinguish unsafe queries via common-sense † Final result is not closed † Is there an automatic way to determine “safety”?

CS 5614: Misc. SQL Stuff, Safety in Queries 96

Answer: Yes!

† Easiest to spot when written in Datalog

Answer(id) <- NOTStudents(id).

† Golden Rule † Any variable that appears anywhere must also appear in a non-negated body part † In this case, id causes the query to be unsafe

† Example of a Safe Query

Answer(id) <- People(id), NOT Students(id)

† This produces all those people who are NOT students † safe because the People relation provides a reference point † id which appears in a negated body part also appears non-negated

CS 5614: Misc. SQL Stuff, Safety in Queries 97

More Dangers

† Problem not restricted to negated body parts † occurs even with arithmetic body parts (why?)

† Given † only the following relation

Students(id,age)

† Find all those numbers that are greater than the age of some student

Answer(x) <- Student(id,age), x>age.

† Extension to previous rule † Any variable that appears anywhere must also appear in a non-negated, non-arithmetic body part † In this case, x causes the query to be unsafe bcoz it doesn’t appear in a non-negated, non-arithmetic part

CS 5614: Misc. SQL Stuff, Safety in Queries 98

One More Example

† Given † a relation Composite(x) † which lists all the composite numbers

† Write a query to find † the prime numbers

† Wrong Way † Prime(x) <- NOT Composite(x).

† Right Way † Prime(x) <- Number(x), NOT Composite(x).

† Safety in Other Notations † : via the subtraction operator † SQL: via the EXCEPT construct

† Notice how SQL and Relational Algebra do not allow unsafe queries † because there is no way to write such queries with the given constructs † how clever, eh? :-) † It is always amazing how “languages” force you to think in a certain manner † a problem long studied by philosophers

CS 5614: Misc. SQL Stuff, Safety in Queries 99

Recursion in Queries

† Used to specify an indefinite number of “applications” of a relation

† Example † Given only the following relation

Person(name,parent)

† Find all the ancestors of “Mark”

† Easy to find an ancestor at a predefined level † parent: Use Person † grandparent: Join Person with Person † great-grandparent: Join Person with Person with Person † and so on.

† To find an ancestor at no predefined level † Need to Person with Person an “indefinite” number of times

† SQL3 provides support for recursive definitions

CS 5614: Misc. SQL Stuff, Safety in Queries 100

Solution in Datalog

† First, the base case

Ancestor(x,y) <- Person(x,y).

† Then, the inductive step

Ancestor(x,y) <- Person(x,z), Ancestor(z,y).

† Can also write the previous rule as

Ancestor(x,y) <- Ancestor(x,z), Ancestor(z,y).

† why?

CS 5614: Misc. SQL Stuff, Safety in Queries 101

Recursion in SQL3

† Use the “WITH RECURSIVE ... SELECT” construct

† Example

WITH RECURSIVE Ancestor(name,ans) AS

(SELECT * FROM Person) UNION (SELECT Person.name,Ancestor.ans FROM Person, Ancestor WHERE Person.parent=Ancestor.name)

SELECT * FROM Ancestor;

† Use with caution: Some kinds of recursive queries will not be allowed! † example: the following Datalog query might not be allowed in SQL3 Ancestor(x,y) <- Ancestor(x,z), Ancestor(z,y).

† because the rule involves 2 applications of the recursively defined predicate † “Linear recursion” allows only one (as in the SQL code above)

CS 5614: Misc. SQL Stuff, Safety in Queries 102

Final Example

† Be careful when combining negation, aggregation and recursion † perfect recipe for disaster!

† Mutual Recursion † Odd(x) <- Number(x), NOT Even(x). † Even(x) <- Number(x), NOT Odd(x).

† What are the problems? † Notice that the query appears “safe” (per Slide 96) † cycles indefinitely!; no proper base cases

† Illegal in SQL3 † not because of mutual recursion † but due to the fact that there is no “unique interpretation” to the query † Eg: 6 could be either in Odd or in Even; both are acceptable!

† Sometimes mutual recursion is good and fruitful, if written properly † with proper limiting constraints and base cases

CS 5614: Misc. SQL Stuff, Safety in Queries 103

Introduction to Deductive DBMSs

† Intersection of traditional RDBMSs and Logic Programming

† Example Systems † CORAL (Univ. Wisc.) † LDL++ (MCC) † XSB Systems (SUNY, Stony Brook)

† Can be viewed as † extending PROLOG-type systems with secondary storage † extending RDBMSs with deductive functionality

† Mappings: Commonalities between PROLOG and DBMSs † Predicate: Relation † Argument: Attribute † Ground Fact: Tuple † Extensional Definition: Table (defined by data) † Intensional Definition: Table (defined by a )

CS 5614: Misc. SQL Stuff, Safety in Queries 104

PROLOG vs. RDBMSs

† Characteristics of PROLOG † Tuple-at-a-time † Backward Chaining † Top-Down † Goal-Oriented † Fixed-Evaluation Strategy (Depth-First)

† Characteristics of RDBMSs † Set-at-a-time (recall relational algebra) † Forward Chaining † Bottom-Up † Query Optimizer figures a good evaluation strategy

† Example † ancestor(X,X). parent(amy,bob). † ancestor(X,Y) <- parent(X,Z), ancestor(Z,Y).

† Query † Find the ancestors of bob: ancestor(X,bob)?

CS 5614: Misc. SQL Stuff, Safety in Queries 105

PROLOG Pitfalls

† Previous Example † Linear Recursion † Tail Recursion

† What if we reverse the order of clauses in † ancestor(X,Y) <- parent(X,Z), ancestor(Z,Y). † PROLOG goes into an infinite loop (why?)

† What if we make it † ancestor(X,Y) <- ancestor(X,Z), ancestor(Z,Y). † “Not Linear” Recursion

† Inference = Resolution + Unification † Entailment in First Order Logic is Semi-decidable

CS 5614: Misc. SQL Stuff, Safety in Queries 106

Example of Deductive

† Same-Generation: Hello World of DDBMSs † sg(X,Y) <- flat(X,Y). † sg(X,Y) <- up(X,U),sg(U,V),down(V,Y).

† Magic: A Rewriting Technique † Rewrite query such that advantages of bottom-up evaluation goal-oriented behavior are combined

† Example: For the query † sg(john,Z)?

† Magic produces † sg(X,Y) <- magic_sg(X),flat(X,Y). † sg(X,Y) <- magic_sg(X),up(X,U),sg(U,V),down(V,Y). † magic_sg(john).

† How do you know when to stop? † Iterative Fixpoint Evaluation (when the answer stops changing)

CS 5614: Misc. SQL Stuff, Safety in Queries 107

SQL in a Programming Environment

† Incorporating SQL in a complete application

† Why? † There are some things we cannot do with SQL alone † e.g. preserving complex states, looping, branching etc. † Typically embed SQL in a host-language interface

† Problems: Impedance Mismatch † SQL operates on sets of tuples † Languages such as C, C++ operate on an individual basis

† Solution † easy when SELECT returns only one

† When more than one row is returned † design an iterator to “run” over the results † called a “

CS 5614: Misc. SQL Stuff, Safety in Queries 108

How are these implemented?

† Vendor-Specific Implementations † ORACLE: PL/SQL (procedural extensions to SQL)

† Open Database Connectivity Standard † Provides a standard API for transparent database access † used when “database independence” is important † used when required to “connect” to diverse data sources

CS 5614: Misc. SQL Stuff, Safety in Queries 109

Tradeoffs

† ODBC † originated by Microsoft in 1991 † adds one more abstraction layer † not as fast as a native API (does not exploit “special features”) † least-common denominator approach † constantly evolving

† PL/SQL etc. † “tailored” to the details of the underlying DBMS † might not extend to heterogeneous domains † modeled after a specific programming language (e.g. Ada for Pl/SQL)

CS 5614: Misc. SQL Stuff, Safety in Queries 110

In Between: Stored Procedures

† Used for developing “tightly-coupled” applications † ”push computations” selectively into the database system † avoid performance degradation † work in database address space instead of application address space

† Advantages † No sending SQL statements to and fro † eliminate pre-processing † speedup by an order of magnitude

† Example Applications † Database Adminstration † Integrity Maintenance and Checks † Database Mining

† Disadvantages † Non-standard implementation † Difficult to enforce transactional synchronization † Without traditional SQL optimization, can lead to performance degradation

CS 5614: Misc. SQL Stuff, Safety in Queries 111

Introduction to Query Optimization

† Helps attain declarativeness of RDBMSs † One of the main reasons for commercial success of DBMSs

† A motivating example † Find all students with 4.0 gpa enrolled in CS5614

SELECT name FROM Students, Classroll WHERE Students.name = Classroll.studentname AND Students.gpa = 4.0 AND Classroll.coursename = ‘CS5614’

† Two Strategies † Do join and then filter out the ones with gpa <> 4.0 and course <> CS5614 † Filter first the ones with gpa <> 4.0 and course <> CS5614 and then Join

† Which is Better? † Always good to “push selections” as far down into the query parse tree

CS 5614: Misc. SQL Stuff, Safety in Queries 112

How does a Query Optimizer work?

† Three Requirements † A Search Space of “Plans” † A Cost Model (for Plan evaluation) † An Enumeration Algorithm

† Ideally † Search Space: contains both good and efficient plans † Cost Models: cheap to compute and accurate † Enumeration Algorithm: efficient (not a monkey-typewriter algorithm)

† Example of a Search Space † See Previous Slide

† Examples of Cost Models † #(tuples) evaluation † #(main memory locations) etc.

† Example of an enumeration algorithm † Sequential enumeration of a lattice of plans † Dynamic Programming vs. Greedy Approaches

CS 5614: Misc. SQL Stuff, Safety in Queries 113

A Simple Measure of “Cost”

† #(Tuples) in a query

† Easiest to compute for † Cartesian Product: #(R X S) = #(R)#(S) † Projection:#(Pi(R)) = #(R)

† A Notation for Other Operations † V(R,A) = Number of distinct values of attribute “A” in R † formulas assume that all values of “A” are equally likely in R † Holds in average case for most distributions (e.g. Zipf)

† Selectivity Factors for Selection Operations † Equality Tests: Use 1/V(R,A) † < or > Tests: Use 1/3 † “<>” Test: Use (V(R,A)-1)/V(R,A) † AND Conditions: Multiply Selectivity Factors † OR Conditions: Three Choices † Sum of results from individual selectivity factors † Max(sum,total size of relation): why? † n(1-(1-m1/n)(1-m2/n)) formula : most accurate

CS 5614: Misc. SQL Stuff, Safety in Queries 114

Estimating the Size of a Join

† Assume: R(X,Y) Join S(Y,Z)

† Range of Values † Minimum: 0 † In-between: #(R) (if Y is a foreign key for R and a key for S) † Maximum: #(R)#(S) (if Y’s in R and S are all the same)

† Assumptions for Join Size Estimation † Containment of Value Sets † Preservation of Value Sets

† Containment of Value Sets † If V(R,Y) <= V(S,Y) then the Y’s in R are a subset of the Y’s in S † Satisfied when Y is a foreign key in R and a key in S

† Preservation of Value Sets † #(R Join S,X) = #(R,X) † #(R Join S,Z) = #(S,Z) † why is this reasonable?

CS 5614: Misc. SQL Stuff, Safety in Queries 115

The Actual Estimate

† Assume that V(R,Y) <= V(S,Y) † Every tuple in R has a chance of 1/V(S,Y) of joining with a tuple of S † Every tuple in R has a chance of #(S)/V(S,Y) of joining with S † All tuples in R have a chance of #(R)#(S)/V(S,Y) of joining with S

† What if V(S,Y) <= V(R,Y) † Answer: #(R)#(S)/V(R,Y) † In general: #(R)#(S)/(max (V(S,Y),V(R,Y)))

† What if there are multiple join attributes † Have a “max” factor in the denominator for each such attribute!

† How to Estimate #(R Join S Join T)? † Does it matter which we do first?

† Surprise! † Estimation formula preserves associativity of Joins! † In other words, “it takes care of itself!”

† Thus, for a Join attribute appearing > 2 times † 3 times: Use two highest values † 4 times: Use three highest values etc.

CS 5614: Misc. SQL Stuff, Safety in Queries 116

More on Join Associativity and Commutativity

† Which is better: (R Join S) or (S Join R) † Good to put the smaller relation on the left † Why? Most Join algorithms are assymmetric

† Example: † Construct a “good query tree” for the following

SELECT movietitle FROM Actors,ActedIn WHERE Actors.name = ActedIn.actorname AND Actors.age = 23

† Number of Possible Trees of n Attributes † Arises from the shape of the trees: T(n) † Arises from permuting the leaves: n! † Total choices: n! T(n)

CS 5614: Misc. SQL Stuff, Safety in Queries 117

What is T(n)?

† Sample Values † 1: 1 † 2: 1 † 3: 2 † 4: 5 † 5: 14

† A formula † T(1) = 1 (Basis) † T(n) = T(1)T(n-1) + T(2)T(n-2) + ..... + T(n-1)T(1)

† Classifications † Left-Deep Trees: All right children are leaves † Right-Deep Trees: All left children are leaves † Bushy Trees: Neither Left-Deep nor Right-Deep

† Choosing a Join Order: Restricted to Left-Deep Trees † By Dynamic Programming: O(n!) † Greedy Approach: Make local selections

CS 5614: Misc. SQL Stuff, Safety in Queries 118

Example

† Consider † R(a,b): #(R) = 1000, V(R,a) = 100, V(R,b) = 200 † S(b,c): #(S) = 1000, V(S,b) = 100, V(S,c) = 500 † T(c,d): #(T) = 1000, V(T,c) = 20, V(T,d) = 50

† Possible Join Orders † (R Join S) Join T † (S Join R) Join T (same as above; why?) † (R Join T) Join S † (T Join R) Join S (same as above) † (S Join T) Join R † (T Join S) Join R (same as above)

† Cost Estimation = Sizes of Intermediate Relations † (R Join S) Join T: 5000 † (R Join T) Join S: 1000000 † (S Join T) Join R: 2000

† Best Plan = (S Join T) Join R

CS 5614: Misc. SQL Stuff, Safety in Queries 119

Database Tuning: Why?

† Two Families of Queries † OLTP (Access small number of records) † OLAP (Summarize from a large number of records)

† Sources of Poor Performance † Imprecise Data Searches † Random vs. Sequential Disk Accesses † Short Bursts of Database Interaction † Delays due to Multiple Transactions

† What can be done? † Tune Hardware Architecture † Tune OS † Tune Data Structures and Indices

CS 5614: Misc. SQL Stuff, Safety in Queries 120

Examples of Database Tuning

† To normalize or not to † Sacrificing Redundancy Elimination † Sacrificing Dependency Elimination

† Several Choices of Normalized Schemas † Vertical Partitioning Applications

† Recomputing Indices † Histograms etc. might be outdated

† Restricting Uses of Subqueries † Unnesting query blocks by Joins

† Declining the Use of Indices † Table Scans for Small Tables † Rule-based optimization: Rewrite A=6 as “A+0=6”

† Provide Redundant Tables † Decision-Support/ Data Mining Queries