University of Warwick s1

CS2520

UNIVERSITY OF WARWICK
Department of Computer Science
Second Year Winter Examinations 2008/09
Fundamentals of Relational Databases
FEEDBACK FOR STUDENTS by Hugh Darwen
Time allowed: 1.5 hours

This is a closed book exam. No information sources and communication devices are allowed. Illegible text will not be evaluated.

Answer Question 1 and ANY TWO out of the remaining three questions.

Read carefully the instructions on the answer book and make sure that the particulars required are entered on each answer book.

General: Where you were required to write expressions or statements in Tutorial D or SQL we have not penalised minor syntax errors, nor have we penalised solutions that are over-elaborate but still give the right result.

1. Consider the following Tutorial D script, referred to as Script1:

    VAR xy BASE RELATION { x CHAR, y INTEGER } KEY { x }
        INIT ( RELATION { TUPLE { x 'C1', y 3 } } );
    INSERT xy RELATION { TUPLE { x 'C4', y 4 },
                         TUPLE { y 5, x 'C5' } } ,
    r4 := xy JOIN ((r5 WHERE y > 3 AND z > y){y, z}) ;

(a) Describe what you see in Script1, answering the following questions:

(i) How many statements does it contain? [3]

The correct answer is 2, given by the number of semicolons. You get some credit for the answer 3 (where you have wrongly interpreted the invocations of INSERT and := as full statements in their own right), but not for any other answer.

(ii) What invocations of read-only operators are contained in the last line? In each case, name the operator being invoked and state whether the invocation denotes a relation, a truth value, or something else. [6]

There are six (the invocations of JOIN, WHERE, AND, >, > again, projection), so it was basically one mark each. Note that > counts twice because there are two invocations of it. We have accepted brief answers such as “JOIN, relation”. Common mistakes concerning the values denoted by the invocations were truth value instead of relation for WHERE and, curiously, relation instead of truth value for AND or >.

(iii) Write down the only constraint definition contained in Script1. [3]


This is the KEY constraint on relvar xy. The WHERE clause specifies a restriction condition and is not a constraint definition.

(iv) Write down the only heading definition contained in Script1. [3]

The heading is just {x CHAR, y INTEGER}. It is part of the declared type RELATION {x CHAR, y INTEGER}, which in turn is part of the variable declaration for xy. We have accepted extraneous text to the left of the heading but not inclusion of the KEY specification or INIT clause.

(v) What update operators are invoked in Script1, and what are the arguments to those invocations that are in substitution for parameters subject to update? [5]

The invocations we expected were those of INSERT (on xy as target) and := (on r4 as target). We didn’t think of including the INIT clause of the VAR declaration but if you did we were happy to accept that!

(b) A predicate for r4 is "y people like x, which at least 4 people like, and there exists a w with spare capacity, having room for z people but containing only y". What predicates for xy and r5 can be deduced from that? [3]

xy: “y people like x”
r5: “w has room for z people and contains y people”

An occasional mistake was to add something to the given predicate for xy. A much more common one was to retain “there exists” in front of w. That’s a mistake because that quantifier arises from the projection, {y, z}, that excludes w. Note that it is also somewhat inappropriate to retain the words “spare” and “only” in the predicate for r5, because those arise from the restriction condition used in the query. For all we know, r5 even allows room w to contain more people than it has capacity for!

(c) Write down a Tutorial D definition for the heading of r4. [3]

Again we allowed extraneous text as in (a)(iv), but a heading does include attribute types and you were penalised if you left those out.

(d) What is the degree of r4? [1]

The correct answer is 3 but your answer was assessed for its consistency with your answer to (c): you were being tested on your understanding of the term degree.

(e) What is the value of COUNT(r4) immediately after execution of Script1? [2]

The correct answer is zero, because, under the rules for multiple assignment, xy contains only its initial tuple when the right-hand side of the assignment to r4 is evaluated, and that tuple cannot possibly match any tuples in the other JOIN operand under the given WHERE condition. If you got this wrong, because you hadn’t taken on board the full implications of multiple assignment, then you scored zero for this part but your answer to the next part, (f), was marked according to consistency with your misunderstanding.

(f) Explain your answer to (e). [4]

If you wrote “I don’t know” (or similar) for (e) and your explanation here was based on an assumption that xy included the tuples inserted in the first part of the multiple assignment and r5{y,z} might have any number of matching tuples, then you scored full marks on this part.
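The reasoning behind (e) and (f) can be checked with a small simulation. The following is only a sketch, written in Python rather than Tutorial D: relations are modelled as sets of attribute/value pairs, the helper names are illustrative, and the contents of r5 are hypothetical because the exam does not give any.

```python
# Relations are modelled as sets of frozensets of (attribute, value) pairs.
def rel(*tuples):
    return {frozenset(t.items()) for t in tuples}

def natural_join(r, s):
    """Combine tuples that agree on all common attributes."""
    out = set()
    for a in r:
        for b in s:
            da, db = dict(a), dict(b)
            if all(da[k] == db[k] for k in da.keys() & db.keys()):
                out.add(frozenset({**da, **db}.items()))
    return out

def restrict(r, pred):
    return {t for t in r if pred(dict(t))}

def project(r, attrs):
    return {frozenset((k, v) for k, v in t if k in attrs) for t in r}

# State after the VAR statement: xy holds only its INIT tuple.
xy = rel({"x": "C1", "y": 3})

# Hypothetical contents for r5 (attributes w, y, z); not given in the exam.
r5 = rel({"w": "W1", "y": 4, "z": 6},
         {"w": "W2", "y": 5, "z": 9},
         {"w": "W3", "y": 3, "z": 2})

def r5_operand():
    """(r5 WHERE y > 3 AND z > y){y, z}"""
    kept = restrict(r5, lambda t: t["y"] > 3 and t["z"] > t["y"])
    return project(kept, {"y", "z"})

# Multiple assignment: every right-hand side is evaluated against the OLD
# state, and only then are all the targets updated together.
new_xy = xy | rel({"x": "C4", "y": 4}, {"y": 5, "x": "C5"})
new_r4 = natural_join(xy, r5_operand())      # still sees the old xy
xy, r4 = new_xy, new_r4
count_r4 = len(r4)

# For contrast: the result if the two parts were wrongly treated as separate
# statements, so that the JOIN saw the already-updated xy.
sequential_r4 = natural_join(xy, r5_operand())
```

With these made-up r5 rows the multiple assignment leaves r4 empty, whereas the sequential misreading would produce two tuples, which is the misunderstanding the marking of (f) allowed for.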


(g) Assume that SQL base tables equivalent to relvars r4 and r5 are defined. Write a sequence of SQL statements that when executed will have the same effect as Script1. [7]

Important points here were:

(a) To execute DELETE FROM r4 (or TRUNCATE TABLE r4) before using INSERT on r4, because INSERT retains existing rows and so is equivalent to := only when the target is empty.

(b) To insert the initial tuple into xy before populating r4 but to insert the other two tuples afterwards.

However, point (b) is a consequence of the semantics of multiple assignment, so we charitably marked your answer here for consistency with your solution to parts (a), (e) and (f).
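One possible statement sequence that respects both points is sketched below. It is not the marking scheme's model answer: the column types, the r5 rows, and the stale row placed in r4 are assumptions made for illustration, and the statements are run through Python's sqlite3 module only so that the effect can be observed.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Tables the question says already exist (column types are guesses).
con.executescript("""
    CREATE TABLE r5 (w VARCHAR(20) NOT NULL, y INTEGER NOT NULL,
                     z INTEGER NOT NULL, PRIMARY KEY (w, y, z));
    CREATE TABLE r4 (x VARCHAR(20) NOT NULL, y INTEGER NOT NULL,
                     z INTEGER NOT NULL, PRIMARY KEY (x, y, z));
    INSERT INTO r5 VALUES ('W1', 4, 6), ('W2', 5, 9);   -- hypothetical rows
    INSERT INTO r4 VALUES ('old', 1, 2);                -- hypothetical stale row
""")

# A sequence with the same effect as Script1.
con.executescript("""
    CREATE TABLE xy (x VARCHAR(20) NOT NULL, y INTEGER NOT NULL,
                     PRIMARY KEY (x));
    INSERT INTO xy VALUES ('C1', 3);

    -- Point (a): empty the target so that INSERT behaves like :=
    DELETE FROM r4;
    INSERT INTO r4
        SELECT xy.x, xy.y, t.z
        FROM xy JOIN (SELECT DISTINCT y, z FROM r5
                      WHERE y > 3 AND z > y) AS t
             ON xy.y = t.y;

    -- Point (b): the other two tuples go in only after r4 is populated
    INSERT INTO xy VALUES ('C4', 4), ('C5', 5);
""")

count_r4 = con.execute("SELECT COUNT(*) FROM r4").fetchone()[0]
count_xy = con.execute("SELECT COUNT(*) FROM xy").fetchone()[0]
```

Moving the last INSERT in front of the DELETE would let the two new xy rows take part in the join, which is exactly the sequential reading discussed under (e).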


2. A company runs courses. Every student attending a course is invited to give an evaluation, from 1 to 10, for the presentation of the course that they attend. The company engages the services of an SQL expert, Joe, to design a database to record the students' ratings. From the answers to his questions, Joe determines that the following business rules apply:

BR1: every presentation of a course runs for exactly one day
BR2: every presentation is given by one teacher
BR3: every presentation is given in a particular room
BR4: no more than one course can be given in the same room on the same day
BR5: a teacher cannot teach more than one course on the same day
BR6: a student cannot attend more than one course on the same day
BR7: a student cannot evaluate more than one presentation of the same course

Joe is given some sample data to help with his design and he places this data in a single table, RATING, with columns S (student), C (course), R (room), D (date), T (teacher), and E (evaluation). This data is shown in Figure 1.

RATING

S         C   R   D           T       E
Anne      C1  R1  15/09/2008  Will    9
Anne      C2  R1  17/09/2008  Will    8
Anne      C3  R2  18/09/2008  Xavier  5
Boris     C1  R1  15/09/2008  Will    7
Boris     C2  R3  17/09/2008  Yvonne  6
Cindy     C1  R1  20/09/2008  Xavier  9
Devinder  C3  R2  15/09/2008  Zöe     7
Devinder  C1  R2  20/09/2008  Will    7
Devinder  C2  R3  17/09/2008  Yvonne  6
Devinder  C4  R2  16/09/2008  Zöe     4

Figure 1: Sample data for RATING.

(a) Complete the following definition: "A relvar R is in Boyce-Codd Normal Form (BCNF) if and only if …" [3]

“… for every nontrivial functional dependency A → B that holds in R, A is a superkey of R.”

One or two of you omitted either or both of “nontrivial” and “super-”. Because some textbooks do the same we have turned a blind eye to those minor errors! Curiously, a very large number of you omitted the word “for” and got the rest right. But the sentence isn’t even grammatical if you leave out “for”! We did not penalise you for this; we wondered if you were simply copying a misprint from somewhere.

(b) Which two of the following functional dependencies (FDs) are inconsistent with the sample data given in Figure 1 and therefore do not hold in RATING? [3]

FD1: {R,D} → {C,T}
FD2: {T,C} → {R}
FD3: {S,D} → {R}
FD4: {S,C} → {D,E}
FD5: {T,D} → {R}
FD6: {R,D} → {S}

We made a slight mistake here as there are actually three FDs in the given list that are inconsistent with the stated business rules: FD2, FD3, and FD6. Like us, very few of you spotted FD3 (the mistake is in the last two rows of Figure 1, which violate BR6), so not much harm was done.

(c) Each of the four FDs given in (b) that do hold in RATING relates to one or more of Joe's business rules. For each one, list the relevant rule number(s). (Note: Not every business rule is related to an FD.) [4]

FD1 relates to BR4, BR5
FD3 relates to BR6 (and no others)
FD4 relates to BR7 (and no others)
FD5 relates to BR5 and, arguably, BR3.

If you got (b) wrong then you obviously had to get (c) wrong too, and here you will have lost further marks because you will have revealed misunderstanding of the connections between the FDs and the BRs. You were not required to include BR3 with FD5, but you did have to give both BR4 and BR5 for FD1 and you were penalised for any extra BRs on FD3 or FD4.

(d) Which two determinants in the listed FDs are superkeys of RATING? [2]

{S,D} and {S,C}

Intuitively a (super)key must include S (student), because every evaluation is by a particular student. You are told that two of the determinants are superkeys, so it must be the two that include S. You might have worked this out by a formal proof, but note that part (b) doesn’t say that the given list includes all the nontrivial FDs that hold in RATING (as well as the three that don’t hold). In particular, it doesn’t include the FD {T,D} → {C}, relating to BR1.

(e) Choose one of your answers to (d) and show how it can be proved from the given FDs, using the theorems of functional dependency (self-determination, left augmentation, and transitivity). [4]

Hardly anybody scored any marks for this part. It does take quite a bit of thinking time and we perhaps didn’t pay enough attention to such exercises in the worksheets and exercises included in the lecture notes for CS252, so perhaps it was a bit unfair to include this part. For the record, here are our solutions, first for the key {S,D}:

{S, D}. We need to show that every attribute of RATING is functionally dependent on {S, D}. Clearly S and D are, by self-determination; and R is by the given FD3. That leaves C, T, and E.

{S, D} → {R, D} (FD3 + self-determination)
{R, D} → {C, T} (FD1), therefore {S, D} → {C, T} (transitivity)

It remains to show that {S,D} → {E}.


{S, D} → {S, C} (already proved)
{S, C} → {E} (FD4)
therefore {S, D} → {E} (transitivity)

{S,C}. We need to show that every attribute of RATING is functionally dependent on {S,C}. Clearly S and C are, by self-determination; and D and E are by the given FD4. That leaves T and R.

{S, C} → {S, D} (FD4 + self-determination)
{S, D} → {R} (FD3), therefore {S, C} → {R} (transitivity)

It remains to show that {S,C} → {T}.

{S, C} → {R, D} (R already proved, D from FD4)
{R, D} → {T} (FD1), therefore {S, C} → {T} (transitivity)

(f) Give two ways of decomposing RATING into an equivalent pair of relvars such that each is in BCNF. For each relvar, give its attribute names followed by one or more KEY declarations as required. For example, RV1{S,C,R,D} KEY{S,C} KEY{R}, if you think that should be one of your BCNF relvars. [4]

Our intended solution:

1. RV1{T, D, C, R} KEY{T, D} KEY{D, R}
   RV2{S, D, T, E} KEY{S, D}
2. RV1{T, D, C, R} KEY{T, D} KEY{D, R}
   RV3{S, D, R, E} KEY{S, D}

But some of you gave decompositions such as {S,C,D,E} and {S,C,R,T}, where one of the keys mentioned in (e) becomes a key for each of the new relvars and the nonkey attributes are distributed in a way to guarantee BCNF. These decompositions are bad, because they “lose” FDs that don’t need to be lost, but they are clearly valid solutions to the exercise as worded and so we have accepted them. We have not penalised you if your decomposition allows a tutor to be teaching several courses on the same day, because of our failure, noted under (b), to include an FD to prohibit that.

(g) Assuming the relvars you identified in (f) are defined as SQL tables, for each of your two decompositions state which of its two tables requires a FOREIGN KEY declaration, and write out that FOREIGN KEY declaration in full. [4]

Our intended solution, in keeping with our solution to (f):

1. In RV2, FOREIGN KEY (T, D) REFERENCES RV1(T, D)
2. In RV3, FOREIGN KEY (R, D) REFERENCES RV1(R, D)

Note that SQL requires the referenced columns to constitute a key of the referenced table, and for the foreign key to contain a corresponding column for each referenced column. Solutions that give a single-column FK referencing a table with a composite key don’t make sense. If you chose a decomposition such as {S,C,D,E} and {S,C,R,T}, then strictly speaking each relvar needs a foreign key referencing the other, but as the question strongly suggests that only one FK per decomposition is expected, we have allowed you to give just one of them.

(h) For each of your two decompositions, identify an FD that holds in RATING but does not hold in either of the two relvars. In each case list the business rule(s) with which the database might become inconsistent unless a constraint is declared to compensate for the lost FD. [2]

Our intended solution, in keeping with our solution to (f):

1. FD3. BR6 exposed
2. FD4. BR7 exposed

If you chose the decomposition {S,C,D,E} and {S,C,R,T}, then you should have stated FD1 (BR4, BR5), FD3 (BR6), and FD5 (BR5 and arguably BR3).

(i) Choose one of the decompositions and give a Tutorial D CONSTRAINT statement to address the problem arising from the lost FD you identified in (h) for that decomposition. [4]

You were taught a trick using COUNT for such purposes but hardly anybody seemed to remember that one. Here are our solutions, in keeping with our solution to (f):

1. CONSTRAINT SD_KEY WITH RV1 JOIN RV2 AS RATING:
   COUNT(RATING) = COUNT(RATING{S,D});
2. CONSTRAINT SC_KEY WITH RV1 JOIN RV3 AS RATING:
   COUNT(RATING) = COUNT(RATING{S,C});
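Two parts of this question lend themselves to a mechanical cross-check: the superkey derivations in (e) and the COUNT trick in (i). The Python sketch below is an illustration under stated assumptions rather than part of the marking scheme. It takes the four FDs treated as holding in (c), copies the rows of Figure 1 as printed, and applies the COUNT comparison directly to that sample relation instead of to a join of decomposed relvars; the function names are made up.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of determinant, dependant)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

HEADING = set("SCRDTE")
FDS = [
    (set("RD"), set("CT")),   # FD1
    (set("SD"), set("R")),    # FD3
    (set("SC"), set("DE")),   # FD4
    (set("TD"), set("R")),    # FD5
]

# A determinant is a superkey when its closure is the whole heading.
superkeys = [sorted(lhs) for lhs, _ in FDS if closure(lhs, FDS) == HEADING]

# Rows of Figure 1, in column order S, C, R, D, T, E.
COLS = "SCRDTE"
RATING = [
    ("Anne", "C1", "R1", "15/09/2008", "Will", 9),
    ("Anne", "C2", "R1", "17/09/2008", "Will", 8),
    ("Anne", "C3", "R2", "18/09/2008", "Xavier", 5),
    ("Boris", "C1", "R1", "15/09/2008", "Will", 7),
    ("Boris", "C2", "R3", "17/09/2008", "Yvonne", 6),
    ("Cindy", "C1", "R1", "20/09/2008", "Xavier", 9),
    ("Devinder", "C3", "R2", "15/09/2008", "Zöe", 7),
    ("Devinder", "C1", "R2", "20/09/2008", "Will", 7),
    ("Devinder", "C2", "R3", "17/09/2008", "Yvonne", 6),
    ("Devinder", "C4", "R2", "16/09/2008", "Zöe", 4),
]

def count_check(rows, attrs):
    """COUNT(r) = COUNT(r{attrs}): true exactly when attrs is unique in r."""
    idx = [COLS.index(a) for a in attrs]
    body = set(rows)                                   # a relation has no duplicates
    projection = {tuple(r[i] for i in idx) for r in body}
    return len(body) == len(projection)

sd_ok = count_check(RATING, "SD")   # the SD_KEY comparison
sc_ok = count_check(RATING, "SC")   # the SC_KEY comparison
rd_ok = count_check(RATING, "RD")   # {R,D} is not unique in the sample
```

The closure loop is a shortcut for the chains of self-determination, augmentation, and transitivity steps written out in (e): each pass adds whatever one more application of an FD would justify.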


3. In each part you are given a script that is in either Tutorial D or SQL. Some scripts, such as (a), consist of imperatives; others are expressions. For each script, if it is in Tutorial D, give an equivalent script in SQL; and if it is in SQL, give an equivalent script in Tutorial D. In each case an informal description of the script is given to assist you.

(a) Create a pair of base tables, P and PC, for genealogical research purposes. P is for recording people's genders. PC is for recording instances of parenthood as pairs, meaning "P is a parent of C". References to PC and P in part (b) onwards are to these tables or their corresponding relvars in Tutorial D.

    CREATE TABLE P (
        n VARCHAR(20) NOT NULL,
        g VARCHAR(20) NOT NULL,
        CONSTRAINT MorF CHECK (g IN ('M', 'F')),
        PRIMARY KEY (n) );
    CREATE TABLE PC (
        p VARCHAR(20) NOT NULL,
        c VARCHAR(20) NOT NULL,
        PRIMARY KEY (p, c),
        FOREIGN KEY (p) REFERENCES P,
        FOREIGN KEY (c) REFERENCES P );
[9]

Our solution:

    VAR P BASE RELATION { n CHAR, g CHAR } KEY {n};
    CONSTRAINT MorF IS_EMPTY(P WHERE g<>'M' AND g<>'F');
    VAR PC BASE RELATION { p CHAR, c CHAR } KEY {ALL BUT};
    CONSTRAINT FK_p IS_EMPTY (PC RENAME (p AS n) NOT MATCHING P);
    CONSTRAINT FK_c IS_EMPTY (PC RENAME (c AS n) NOT MATCHING P);

We didn’t bother to represent the SQL length constraint, (20), on the CHAR attributes (and nor did you!). Important points:

• Constraint MorF has to be written as a standalone constraint, not part of the definition of P. That required the use of negation (<> for “not equal to”). Some of you who got that much right went and spoiled things slightly by writing OR instead of AND. We didn’t mind if you chose your own way of spelling “not equal to” instead of TD’s, within reason.
• The constraints corresponding to the foreign keys need the use of RENAME: in each case we need a match with attribute n of P.

(b) Get all recorded parenthoods.

    SELECT * FROM PC
[1]

The simple solution is just PC, but obviously we had to allow PC{p,c} and PC{ALL BUT} too.

(c) Get pairs such that x is either a parent of y or a child of y.

    PC RENAME ( p AS x, c AS y ) UNION PC RENAME ( p AS y, c AS x )
[4]

Our solution:


    SELECT p AS x, c AS y FROM PC
    UNION
    SELECT c AS x, p AS y FROM PC

We deliberately set a nasty trap here, as we have in previous years. You have to remember that SQL’s UNION makes columns correspond by ordinal position rather than by name, so “p AS y, c AS x” has to become “c AS x, p AS y” in the SQL version. You still got some credit if you overlooked this but your solution was otherwise correct.

(d) Check that nobody has more than one parent of each gender.

    NOT EXISTS ( SELECT c FROM PC, P WHERE PC.p = P.n
                 GROUP BY c, g HAVING COUNT(*) > 1 )
[4]

Our solution:

    IS_EMPTY( SUMMARIZE (PC RENAME (p AS n) JOIN P) BY {c,g}
              ADD (COUNT() AS Parent_ct)
              WHERE Parent_ct > 1 )

(e) Who are Anne's grandparents?

    ((PC WHERE c = 'Anne') RENAME (p AS x, c AS y)
     JOIN (PC RENAME (p AS gp, c AS x))){gp}
[4]

Our solution:

    SELECT PC2.p AS gp FROM PC PC1, PC PC2
    WHERE PC1.c = 'Anne' AND PC1.p = PC2.c

If you prefer to use NATURAL JOIN, then something like:

    SELECT gp FROM PC NATURAL JOIN (SELECT p AS gp, c AS p FROM PC)
    WHERE c = 'Anne'

(f) Who are Boris's siblings or half-siblings?

    SELECT DISTINCT B.c FROM PC A, PC B
    WHERE A.c = 'Boris' AND A.p = B.p AND NOT (B.c = 'Boris')
[4]

Our solution:

    (PC RENAME (c AS Boris) WHERE Boris = 'Boris'
     JOIN (PC WHERE c <> 'Boris')){c}

(g) How many children do Anne and Boris have?

    COUNT ( (PC WHERE p = 'Anne') RENAME (p AS m)
            JOIN (PC WHERE p = 'Boris') RENAME (p AS f) )
[4]

A few of you decided to include people who have either Anne or Boris as a parent. We think that’s a slightly perverse interpretation but in any case you were supposed to be translating the given Tutorial D expression, so you lost 2 marks for the misinterpretation. Also, if you did interpret the question that way you might have fallen into the trap of counting twice those who have both Anne and Boris as parents, in which case you lost another mark.

Our solution:

    SELECT COUNT(*) FROM PC A, PC B
    WHERE A.p = 'Anne' AND B.p = 'Boris' AND A.c = B.c

Solutions using NATURAL JOIN are acceptable, of course, so long as you look after those RENAMEs properly.
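The SQL solutions to (c), (e) and (g) can be tried out on a toy family tree. The sketch below runs them through Python's sqlite3 module; every name and parenthood row in it is invented for illustration, and SQLite stands in for whatever SQL product you have to hand. It also runs the careless translation of (c) so that the positional behaviour of UNION is visible.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE P  (n VARCHAR(20) NOT NULL, g VARCHAR(20) NOT NULL,
                     CONSTRAINT MorF CHECK (g IN ('M', 'F')),
                     PRIMARY KEY (n));
    CREATE TABLE PC (p VARCHAR(20) NOT NULL, c VARCHAR(20) NOT NULL,
                     PRIMARY KEY (p, c),
                     FOREIGN KEY (p) REFERENCES P,
                     FOREIGN KEY (c) REFERENCES P);

    -- Hypothetical people and parenthoods, for illustration only.
    INSERT INTO P VALUES ('Anne','F'), ('Boris','M'), ('Carl','M'),
                         ('Dora','F'), ('Eve','F'),   ('Fred','M'),
                         ('Gus','M'),  ('Hana','F'),  ('Ivan','M');
    INSERT INTO PC VALUES ('Carl','Anne'),  ('Dora','Anne'),
                          ('Eve','Carl'),   ('Fred','Carl'),
                          ('Anne','Gus'),   ('Boris','Gus'),
                          ('Anne','Hana'),  ('Boris','Hana'),
                          ('Boris','Ivan');
""")

# (c) Correct translation: columns line up by position, so swap them.
pairs = con.execute("""
    SELECT p AS x, c AS y FROM PC
    UNION
    SELECT c AS x, p AS y FROM PC
""").fetchall()

# (c) The trap: renaming alone changes nothing, because UNION ignores names.
naive_pairs = con.execute("""
    SELECT p AS x, c AS y FROM PC
    UNION
    SELECT p AS y, c AS x FROM PC
""").fetchall()

# (e) Anne's grandparents.
grandparents = sorted(row[0] for row in con.execute("""
    SELECT PC2.p AS gp FROM PC PC1, PC PC2
    WHERE PC1.c = 'Anne' AND PC1.p = PC2.c
"""))

# (g) Children that Anne and Boris have together.
shared_children = con.execute("""
    SELECT COUNT(*) FROM PC A, PC B
    WHERE A.p = 'Anne' AND B.p = 'Boris' AND A.c = B.c
""").fetchone()[0]
```

On this data the correct UNION returns each parenthood in both directions, while the naive version returns only the original nine rows, which is how the mistake would show up in practice.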


4. Consider the following small database that records students (s), the bars (ba) they visit, the beers (be) they like, and the beers served in bars, as follows.

    CREATE TABLE likes (
        s  VARCHAR(20) NOT NULL,
        be VARCHAR(20) NOT NULL,
        PRIMARY KEY (s, be) );
    CREATE TABLE visits (
        s  VARCHAR(20) NOT NULL,
        ba VARCHAR(20) NOT NULL,
        PRIMARY KEY (s, ba) );
    CREATE TABLE serves (
        ba VARCHAR(20) NOT NULL,
        be VARCHAR(20) NOT NULL,
        PRIMARY KEY (ba, be) );

(a) Write down an SQL ORACLE query that retrieves all students in the database. [3]

Our solution:

    SELECT s FROM likes
    UNION
    SELECT s FROM visits

A common error was to assume that all students are represented in likes, overlooking that some might appear in visits but not in likes.

(b) Write down an SQL ORACLE query that lists the top three beers (i.e., starting with the beer that is liked by most students and the ones following). For each beer, also mention the number of students liking it and its rank. [6]

Our solution:

    SELECT be, count, ROWNUM
    FROM (SELECT be, COUNT(be) AS count FROM likes
          GROUP BY be ORDER BY count DESC)
    WHERE ROWNUM <= 3

If you used TOP 3 on the SELECT clause or LIMIT 3 on ORDER BY, that was accepted as a good alternative. Note that ROWNUM, TOP, LIMIT, and the ability to write ORDER BY in a subquery are all nonstandard SQL, proprietary to certain particular products.

(c) Write down an SQL ORACLE query that retrieves beers that no student likes (and that are present in the database). [3]

Our solution:

    SELECT be FROM serves
    MINUS
    SELECT be FROM likes

Alternatives:

    SELECT DISTINCT be FROM serves
    WHERE be NOT IN (SELECT be FROM likes)

and:

    SELECT DISTINCT be FROM serves
    WHERE NOT EXISTS (SELECT be FROM likes WHERE likes.be = serves.be)

Note carefully the need for DISTINCT in the alternatives. It’s not needed in our solution because MINUS, like UNION, removes redundant duplicate rows (unless ALL is specified).

(d) Translate the following SQL Oracle query into normal English.

    SELECT DISTINCT s FROM likes
    WHERE be IN ((SELECT be FROM likes) MINUS (SELECT be FROM serves));
[4]

Nearly everybody got this one: Show students who like a beer that is served in no bar.

(e) Write down an SQL ORACLE query that retrieves students that visit a bar that serves at least one beer they like. [3]

Our solution:

    SELECT DISTINCT s
    FROM serves NATURAL JOIN visits NATURAL JOIN likes

or

    SELECT DISTINCT s FROM serves se, visits v, likes l
    WHERE se.ba = v.ba AND v.s = l.s AND l.be = se.be

(f) Write down an SQL query that shows, for each student visiting bars, and each bar he is visiting, the beers that should be served by that bar in order to satisfy the student, or no beers if the student doesn’t like any beer. A student would be satisfied if a bar were to serve precisely the beers he likes. [3]

Our solution:

    SELECT s, ba, be FROM visits NATURAL LEFT JOIN likes

or:

    SELECT vv.s, vv.ba, l.be
    FROM visits vv LEFT JOIN likes l ON vv.s = l.s

(g) Write down Codd’s completeness criterion. Briefly show based on it the claim of SQL to completeness. Mention at least one reason why the completeness claim of SQL doesn’t entirely hold. [8]

Codd gave several different definitions over the years and it appears that some textbooks have even misquoted him, so we have marked this one rather leniently. The relational operators TIMES (product), restriction (WHERE), projection, difference, and union are sufficient. Some texts misquote this list to include JOIN instead of restriction, so we have allowed that.

Which of SQL’s various defects with respect to relational theory actually cause SQL to be relationally incomplete is a debatable matter, especially if the reference is to Codd’s work only (he was somewhat unclear on several issues). He was clear that attribute order is insignificant but he never seemed to buy fully into the consequence of that point regarding attribute/column naming. Some of you cited flaws such as NULL and duplicate rows as contravening relational completeness. It could be argued that these are just additional “features” but we have given you some credit for mentioning them. At least one student cited SQL’s incorrect implementation of “=”. We have gladly accepted that because in CS253 we describe this error and claim that it does interfere with completeness.
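To see why JOIN need not appear in the list of sufficient operators, it can be built from the others. The Python sketch below is an illustration only: it models relations as sets of attribute/value pairs, uses invented sample tuples, and assumes that attribute renaming is available alongside the listed operators, which the list above does not mention.

```python
# Relations are modelled as sets of frozensets of (attribute, value) pairs.
def rel(*tuples):
    return {frozenset(t.items()) for t in tuples}

def times(r, s):
    """Cartesian product; the two headings must have no attribute in common."""
    return {a | b for a in r for b in s}

def restrict(r, pred):
    return {t for t in r if pred(dict(t))}

def project(r, attrs):
    return {frozenset((k, v) for k, v in t if k in attrs) for t in r}

def rename(r, old, new):
    return {frozenset((new if k == old else k, v) for k, v in t) for t in r}

# Hypothetical sample relations with the common attribute ba.
visits = rel({"s": "Ann", "ba": "Duck"}, {"s": "Bob", "ba": "Swan"})
serves = rel({"ba": "Duck", "be": "Ale"},
             {"ba": "Swan", "be": "Stout"},
             {"ba": "Swan", "be": "Porter"})

# visits JOIN serves, expressed without a join operator:
# rename the common attribute, take the product, keep the matching tuples,
# then project away the renamed copy.
product = times(visits, rename(serves, "ba", "ba2"))
matching = restrict(product, lambda t: t["ba"] == t["ba2"])
joined = project(matching, {"s", "ba", "be"})

expected = rel({"s": "Ann", "ba": "Duck", "be": "Ale"},
               {"s": "Bob", "ba": "Swan", "be": "Stout"},
               {"s": "Bob", "ba": "Swan", "be": "Porter"})
```

Restriction, by contrast, cannot be recovered from the remaining operators in this way, which is one way to see why swapping it out for JOIN is a misquotation.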

(End)
