I Agree That My Grades Are Posted Using the Last 4 Digits of My Ssn

Final Exam COSC 6340 (Data Management) May 9, 2000

Your Name: Your SSN:

I agree that my grades are posted using the last 4 digits of my SSN: ………………… (signature, if you would like us to post your grades)

Problem 1 [18]:
Problem 2 [10]:
Problem 3 [8]:
Problem 4 [9]:
Problem 5 [14]:
Problem 6 [22]:
Problem 7 [7]:
Problem 8 [12]:

:

Grade:

The exam is “open book”, and you have 145 minutes to complete it.

1) Relational Database Design [18]

Assume we have a relation R(A,B,C,D,E) with the following dependencies:
(1) AB → CDE
(2) CD → ABE
(3) E DB

Answer the following questions, giving reasons for your answers:

a) Is R in BCNF? [4]

b) Does ABE → D hold for R? [2]

c) Does CD → B hold for R? [2]

d) Does ED hold for R (either show that this dependency can be inferred from the given 3 dependencies, or give a counter example of a relation that satisfies (1), (2), (3) but violates ED)? ** [10]

2) Design an ODL Schema [10]

Assume that the following E/R schema is given:

[E/R diagram not reproduced; labels recoverable from the figure: entity types PERSON, MALE, FEMALE, and PRIEST; attributes ssn, name, city, and date; relationship WEDDING with roles husband (0,1), wife (0,1), and performs (0,n).]

Transform the E/R schema into an equivalent ODL schema. Give the schema using ODL syntax (not a diagram!). Assume that name and city are of type STRING, ssn is of type INTEGER, and date is of type DATE. Define an inverse relationship for every relationship you define in your ODL schema!

3) Write SQL Queries [8]

The following relational schema is given: person(ssn, name, address), works-for(employee, employer, salary), and company(C#, c-name, location), with ssn being used as a foreign key for employee in works-for and C# being used as a foreign key for employer in works-for; name in person stores a person’s last name.

Create a table that gives the social security number, name, and number of employments for each person who has at least 2 employments (if you prefer, you can write a sequence of SQL queries instead of a single query)! [8]

4) Transaction Management [9]

a) What is isolation of transactions and why is it important for modern database systems? What techniques do database systems employ to guarantee isolation? [5]

b) Assume we have two transactions T1 and T2:
T1: A=A+10; B=B-10;
T2: A=A-30; B=B+30;

Give a schedule that interleaves the execution of T1 and T2 that is not serializable! [4]

5) Data Mining [14]

a) What is the purpose of clustering? Why do scientists employ clustering (what do they try to find out)? [5]

b) Assume you have to apply the APRIORI algorithm assuming that the minimum support is 40% (4 out of 10) to the following set of 10 transactions that involve purchases of items A, B, C, D, E, F, G.

T1={A, D, E}
T2={A, D, F}
T3={A, E, F}
T4={A, B, D, F}
T5={B, D, F}
T6={A, D, E, G}
T7={A, B, D, F}
T8={A, B, D, F, G}
T9={B, D, E, G}
T10={A, D, E}

Indicate how Apriori’s Large Item Set Generation algorithm works for the example. Indicate what candidate itemsets will be generated in each pass, and which remain in the candidate set after pruning (use notations on page 11 of the survey paper and assume A>B>C>D>E>F>G). [9]
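The level-wise idea behind Apriori can be sketched in a few lines. The code below is an illustrative sketch, not the notation of the survey paper: it generates candidates by taking unions of large itemsets and then prunes every candidate that has a non-large subset, which yields the same surviving candidates as the usual prefix-based join. The names `support` and `apriori` are hypothetical.

```python
from itertools import combinations

ITEMS = "ABCDEFG"
TRANSACTIONS = [
    set("ADE"), set("ADF"), set("AEF"), set("ABDF"), set("BDF"),
    set("ADEG"), set("ABDF"), set("ABDFG"), set("BDEG"), set("ADE"),
]
MIN_SUPPORT = 4  # 40% of 10 transactions


def support(itemset, transactions):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)


def apriori(transactions, min_support, items=ITEMS):
    """Return a list of levels; level k-1 holds the large k-itemsets."""
    candidates = [frozenset(i) for i in items]
    levels = []
    k = 1
    while candidates:
        # Counting pass: keep candidates that reach the minimum support.
        large = {c for c in candidates
                 if support(c, transactions) >= min_support}
        if not large:
            break
        levels.append(large)
        k += 1
        # Join step (simplified): unions of two large itemsets of size k.
        joined = {a | b for a in large for b in large if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be large.
        candidates = [c for c in joined
                      if all(frozenset(s) in large
                             for s in combinations(c, k - 1))]
    return levels


if __name__ == "__main__":
    for size, level in enumerate(apriori(TRANSACTIONS, MIN_SUPPORT), start=1):
        print(size, sorted("".join(sorted(s)) for s in level))
```

Printing the intermediate `joined` and `candidates` collections inside the loop shows which itemsets are generated in each pass and which are removed by pruning.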

6) Decision Support [22]

a) What are the features of the multi-dimensional data model? Why is it quite popular for writing OLAP queries; why do decision makers prefer the multi-dimensional data model over the SQL/relational data model? [8]

b) Give an example of a Top-N query. Why is it difficult to implement Top-N queries using SQL? [4]

c) Now assume you are the leader of the ORACLE development team and your task is to extend ORACLE to provide a better solution for the Top-N query problem. What approach would you suggest your programmers use? Describe your ideas in sufficient detail. [10 + up to 4 extra points] **

More Space for Problem 6

7) Internet Databases/Information Retrieval [7]

What are signature files and how can they be used to index free text documents on the web? What is the purpose and justification of using bit strings and of using a hashing function in signature files? [7]
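One common form of signature file uses superimposed coding: each word is hashed to a few bit positions in a fixed-width bit string, and the word signatures of a document are OR-ed together. The sketch below illustrates that idea only; the width of 64 bits, the 3 bits per word, the use of SHA-256 as the hash function, and all function names are assumptions chosen for illustration.

```python
import hashlib

SIGNATURE_BITS = 64   # assumed signature width
BITS_PER_WORD = 3     # assumed number of bits set per word


def word_signature(word):
    """Hash a word to a bit string with at most BITS_PER_WORD bits set."""
    sig = 0
    for i in range(BITS_PER_WORD):
        digest = hashlib.sha256(f"{i}:{word.lower()}".encode("utf-8")).digest()
        position = int.from_bytes(digest[:4], "big") % SIGNATURE_BITS
        sig |= 1 << position
    return sig


def document_signature(text):
    """Superimpose (bitwise OR) the signatures of all words in the text."""
    sig = 0
    for word in text.split():
        sig |= word_signature(word)
    return sig


def may_contain(doc_sig, query_words):
    """True if every query bit is set in doc_sig (false positives possible)."""
    query_sig = 0
    for word in query_words:
        query_sig |= word_signature(word)
    return (doc_sig & query_sig) == query_sig


if __name__ == "__main__":
    sig = document_signature("static hashing is used to implement index structures")
    print(f"{sig:064b}")
    print(may_contain(sig, ["hashing", "index"]))
```

A document that contains all query words always passes the test, while a document that passes may still lack a word, so matches have to be verified against the document text.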

8) Physical Database Design [12]

Assume a relation R(A, B, C, D) is given; R is stored as an unordered file and contains 1000000 (1 million) tuples. Attributes A, B, C, D need 4 bytes of storage each, and blocks have a size of 4096 bytes. Moreover, we assume that static hashing is used to implement index structures, and that index pointers require 4 bytes of storage; furthermore, you can assume that pages of index blocks are 80% full and do not contain any overflow pages. What index structures would you create to speed up the following 3 queries?

Q1: Select A, C from R where B=12; (returns 200 answers)
Q2: Select D from R where C=12; (returns 30000 answers)
Q3: Select sum(R.D) from R where C=12; (returns one answer)

Describe which index structures you would create (justify your design!), how they would be stored, and compute the cost for executing Q1, Q2, and Q3 for your chosen design (Hint: look for unusual solutions!).
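A cost computation of this kind typically starts from the number of blocks occupied by the data file and by a candidate index. The sketch below derives those baseline figures from the numbers stated in the problem; it does not select a design. The 8-byte index entry (one 4-byte key plus one 4-byte pointer) is an assumption, as are all variable names.

```python
import math

TUPLES = 1_000_000
ATTRIBUTES = 4
ATTRIBUTE_BYTES = 4
BLOCK_BYTES = 4096
POINTER_BYTES = 4
FILL_PERCENT = 80  # index blocks are assumed to be 80% full

# Data file: 4 attributes of 4 bytes each per tuple.
tuple_bytes = ATTRIBUTES * ATTRIBUTE_BYTES
tuples_per_block = BLOCK_BYTES // tuple_bytes
data_blocks = math.ceil(TUPLES / tuples_per_block)

# Hash index on one attribute (assumption: entry = key + pointer).
entry_bytes = ATTRIBUTE_BYTES + POINTER_BYTES
entries_per_index_block = (BLOCK_BYTES // entry_bytes) * FILL_PERCENT // 100
index_blocks = math.ceil(TUPLES / entries_per_index_block)

if __name__ == "__main__":
    print("tuples per data block:", tuples_per_block)
    print("data blocks (full scan cost):", data_blocks)
    print("entries per index block:", entries_per_index_block)
    print("index blocks:", index_blocks)
```

Changing `entry_bytes` models index entries that store additional attribute values next to the search key.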
