Fall, 2009 CIS 550

Database and Information Systems

Homework 3 Due on October 21, 2009

Problem 1 [15 points]: Consider a R with four attributes ABCD. You are given the following (possibly incomplete) set of dependencies: A → BC, C → D, CD → AB.

1. List all keys for R (other than superkeys). C,A

2. Is R in 3NF? Why or why not? Yes, it is in BCNF.

3. Is R in BCNF? Why or why not? Yes, C, A, CD are all keys.

Problem 2 [30 points]: The task is to design a social network schema, roughly along the lines of the example schemas that we have been using (but with enhancements). We will have users who may create accounts, and will add profile information and connections to other users (friends, partners). Users might also post messages that can be viewed by other users or by everyone. These messages will be searchable by keyword.

• Each user will have a unique numeric ID. Additionally, the first and last names, email address, age, gender, and physical address will need to be stored. Users may be categorized as Students or Workers depending on education and employment status (described below).

• Each user will have one or more educational experiences, at some institution, some level (e.g., grade school, university), and with a start and end time.

• Each user will have zero or more interests.

• Each user will have zero or more employment experiences, with a job title, employer, employer address, and salary.

• Each user will be connected to zero or more friends (who are also users).

• Users can assign to friends different access levels that will control how much personal infor- mation they can see.

1 • Users have personal relationships with other users (e.g., dating, marriage, etc.).

• Users have mailboxes with IDs.

• Each mailbox has zero or more posts, with the option of being publicly viewable, or only viewable by the recipient.

Draw an ER diagram for the social network system. The ER diagram should include various at- tributes, keys, participation constraints, overlap and covering constraints.

Problem 3 [25 points]: Consider a relation R with five attributes ABCDE and the FD set F = {AB → E, AC → B,E → D,D → E}. Let F + denote the closure set of F .

1. For each of the following attribute sets, do the following: (i) write down a minimal cover of the subset of F + that holds over the set; (ii) name the strongest normal form that is not violated by the relation containing these attributes; (iii) decompose it into a collection of BCNF relations if it is not already in BCNF.

(a) ABDE (i) {AB → E,E → D,D → E} (ii) ABDE has a single key AB. Thus, neither E → D or D → E satisfy the conditions for 3NF. However, neither of these is a partial dependency, and so the relation is 2NF. (iii) ABE, DE (b) ABCD (i){AB → D, AC → B} (ii) Corrected: ABCD has key AC. Therefore it is not in BCNF, since AB is not a , nor is it in 3NF since both B and D are not in AC. (iii) ABC, ABD

2. For each of the following decompositions of the above R = ABCDE, with the same set of functional dependencies F , say whether the decomposition is (i) dependency preserving, and (ii) lossless join.

(a) {ABC, ABE, ED} (i) FABC contains {AC → B}, FABE contains {AB → E}, and FDE contains E → D,D → E, + + so F = (FABC ∪ FABE ∪ FDE) and so the decomposition is dependency preserving. (ii) ABC 1 ABE is lossless since F + contains AB → E. ABCE 1 ED is lossless since F + contains E → D. Thus, the decomposition is lossless. (b) {ABDE, ABC} (i) FABDE contains {AB → E,E → D,D → E} and FABC contains {AC → B}, so the decomposition is dependency preserving. (ii) F + contains AB → C so the decomposition is lossless.

Problem 4 [20 points]: Suppose you are given a relation R(A, B, C, D). For each of the following (complete) sets of FDs, (i) identify the candidate key(s) for R, and (ii) state whether or not the proposed decomposition of R into smaller relations is a “good” decomposition and briefly explain why or why not.

2 1. A → B, B → CD. Decompose into AB, BCD. (i) R has one candidate key, A. (ii) The decomposition is good. First off, it is in BCNF, eliminating redundancy. Furthermore, it a lossless join decomposition (F + contains B → CD), and dependency preserving A → B is cointained in FAB and B → CD is contained in FBCD. 2. D → A, B → C. Decompose into AD and BC. (i) R has one candidate key, BD. (ii) The decomposition is “bad” in the sense that it is lossy.

Problem 5 [15 points]: What is the difference between an FD and a candidate key? Do candidate keys capture all possible FDs?

FDs are more general than candidate keys. A key is just one type of . Candidate keys do not capture all possible FDs. For example, ABC might have the FDs A → BC and B → C. A is a candidate key. However, this does not capture the fact that a relation instance cannot, for example, contain (a1, b1, c1), (a2, b1, c2).

3