Fundamentals/ICY: Databases May/June 2010

Fundamentals/ICY: Databases – May/June 2010

ANSWERS / CREATIVITY / LEARNING OUTCOMES (minor corrections done 1 June 2010)

Coverage of Learning Outcomes by the Exam (Note: all are covered by the Continuous Assessment)

On successful completion of this module, the student should be Assessed by: able to:

1 Use SQL to create, modify and query databases. Q1(c), Q2(d), Q3(c), Q4(b)

2 Analyse a real-world scenario and perform a conceptual Q3(a,d) database design for it.

3 Take a conceptual data design model and translate it into the Q1(a,b), Q2(a,b), relational model. Q3(c)

4 Use an existing database system.

5 Apply relational algebra and the mathematical theory of Q1(d), Q2(c), relations to describe databases, queries, and consistency Q3(b), Q4(c) conditions.

Creative Elements

Q2(a) to the extent that the student needs to invent a suitable example if he/she cannot remember the (moderately complex) one shown in a lecture

Q2(d): ideally the student should realise that only one extra table is needed. An example was given during the module that implies this but the point was not stressed.

Q4(a) to the extent that the student needs to invent a suitable example if he/she cannot remember the one briefly mentioned in a lecture. Answers / Answer Notes

Note: In various places I allow for extra points to be given for a part of a Question, though the total for each Question is capped at the totals specified.

Question 1 [total points: 30]

(a) The method is the introduction of a bridging (= linking, or composite) entity type with 1:M relationships coming into to it from the two original entity types, and no direct relationship between those two types. The ERD fragment should show the three types, the two relationships, their 1:M cardinality and their mandatoriness in the direction away from the bridging type. [4 points; extra 1 for stating the bridging-type attributes] (b) The student should briefly show that there is a danger of at least one of the tables for the two types to need to have more than one row for the same entity, with different rows linking to different entities in the other table. The question then is whether to duplicate the remaining information RI for that entity, other than its PK values, into the various rows, or to have it just in one of the rows (with nulls or some other dummy values in the RI cells of the other rows). If duplicated, we get standard potential data-anomaly problems, update inefficiency problems, and wastage of space. If not duplicated, then queries seeking part of the RI about the entity need to be made more complicated to avoid those rows containing all nulls/dummies in the RI attributes. This problem is worsened when some, and especially when all, of the RI attributes are allowed to be null anyway. There is also considerable complication when deleting a link from the table to the other table, because there is a danger of deleting the row that contains the RI.info. [8 points; a few extra points occasionally given for the fuller answers] (c) Possible answers include those below. The problem is quite difficult to get quite right in the ten miuntes or so available.

select distinct painter_id from bridging where painting_id in (select painting_id from bridging group by painting_id having count(*) <> 1 );

select distinct painter_id from bridging as b where (select count(*) from bridging where painting_id = b.painting_id ) > 1;

[12 points] (d) The relation from either of the current entity sets to the other one at any given moment may fail to be a partial function (or: may fail to be “functional”; or: may be “multivalued”). [6 points; extra 1 for making clear that the mathematical existing at any given moment is dependent on which individual entities are currently present] Question 2 [total points: 24]

(a) The relevant problem is caused by trying to implement multivalued dependencies within a single table, in cases when it is not appropriate to use a fixed set of columns per value for each such dependency. The special difficulties that arise in were discussed during the module in particular practical situations. They are ones of wasted space, redundancy, and/or complexity in table query and update operations (and rise especially strongly when there are multiple mulivalued dependencies). Credit still given for talking about multivalued attributes rather than multivalued dependencies, as the textbook is prevaricatory and unclear on this point. Partial credit given for talking fro partial or transitive dependencies instead of multivalued ones, as 4NF does preclude the former as well. [6 points; up to extra 3 for the fuller answers] (b) Each offending multivalued dependency should be reimplemented by means of linking to a separate table or tables, in one of a couple of ways shown in the module, depending on the exact nature of the dependency. [6 points]

(c) Let the dependency be that of attribute Y on attributes X1 to Xn. Then in any possible state of the table, the induced relation from the value-domains (plus NULL) for the Xi into the value domain (plus NULL) for Y is a partial function that is total on the combinations of Xi values in the table. (Alternatively: the relation is a function from those combinations into the value domain plus NULL for Y.) The function may be different for different states of the table (as there is no requirement for the association between X and Y values to be constant). [6 points] (d) select People.*, Age, StarSign from People, V where People.DateofBirth = V.DateofBirth;

where V is a table with attributes Date of Birth, Age, Star Sign, and Date of Birth as PK. The query may not give exactly the original People table because it may have the columns in a different order. Also, cases where the original People table contains NULL dates of birth are not preserved. [6 points; extra 2 for good statement re divergence from original People; extra 2 for realising that only one extra table is needed, not two] Question 3 [total points: 26]

(a) The expense-account information could be in (i) the overall employee type (the supertype), in (ii) the manager and abroad-employee subtypes, or in (iii) a separate type. The third possibility avoids wasted-space/redundancy problems with the other two, and the student should explain these. [6 points] (b) In cases (i) and (ii) in part (a) the answer can just do the natural join of the employee subtype table E, the manager subtype M and the abroad-employee type A:

E ⋈ M ⋈ A In case (iii) the additional table also needs to be joined in by extending the above expression. Partial credit for giving an appropriate SQL query rather than a relational-algebra expression. [6 points] (c) The tables will normally have the same PKs (explicitly specified by PRIMARY KEY constraints), and these serve to link the tables. The subtype-to-supertype relationship is mandatory. Information common to all represented subtypes should be kept in attributes in the supertype, not individual subtypes. [8 points; extra 2 for REFERENCE constraint from subtype PK to supertype PK; extra 2 for associated ON DELETE CASCADE constraint] (d) Each subtype/supertype relationship is strong in both directions (assuming that the PK of the supertype is used as the PK for the subtypes and for the linking). Each subtype is existence-dependent on the supertype (i.e., the subtype-to-supertype relationships are mandatory), so each subtype is weak. [6 points; extra 2 for supertype not being forced to be weak by the situation] Question 4 [total points: 20] (a) The example in the textbook is as follows. Each customer entity links to an agent by an agent_code attribute, but a customer may not have been assigned an agent and correspondingly the customer’s agent_code value can be null. In the agent entity type there is a phone-number attribute. Suppose we wish to construct a table showing for each customer the phone-number of his/her agent if any, but noting the lack of a phone number for the customers that do not have an agent specified. This is achieved (partly) by a left outer join of customer with agent. [6 points] (b) select customer.id, agent_phone from customer, agent

where customer.agent_code = agent.agent_code union select id, NULL from customer where customer.agent_code = NULL;

That answer is adequate for the above example as stated. However, a more general answer is to replace the =NULL test in the last line by NOT IN (select agent_code from agent)

[8 points] (c) Let the relations currently existing in the input tables be L and R. Form a new relation J by concatenating the tuples of L with those of R, keeping only those that match on the values in the tuple positions corresponding to the join atrributes, and then shortening the tuples by removing the components of the R tuples at the join- attribute positions. (This can also be expressed as forming a “flattened” Cartesian product of L and R and then reducing the resulting relation and shortening its tuples.) Also form a relation N consisting of every tuple of L that does not match a tuple of R on the join attributes, and then extending those tuples by adding NULL values until they are the same length as the tuples of J. Finally, form the union of J and N. [6 points]