I a Suppose You Want to Build a Video Site Similar to Youtube and Keep Data in File- Processing System. Discuss the Relevance Of
Total Page:16
File Type:pdf, Size:1020Kb
October 2017 Database Management Systems Solution Set I a Suppose you want to build a video site similar to YouTube and keep data in file- processing system. Discuss the relevance of each of the following points to the storage of actual video data, and to metadata about the video, such as title, the user who uploaded it, tags, and which users viewed it. i. Data redundancy and inconsistency ii. Difficulty in accessing data iii. Data isolation iv. Integrity problems v. Atomicity problems vi. Concurrent system anomalies vii. Security problems Ans i. Data redundancy and inconsistency. This would be relevant to metadata to 1a some extent, although not to the actual video data, which is not updated. There are very few relationships here, and none of them can lead to redundancy. ii. Difficulty in accessing data. If video data is only accessed through a few predefined interfaces, as is done in video sharing sites today, this will not be a problem. However, if an organization needs to find video data based on specific search conditions (beyond simple keyword queries) if meta data were stored in files it would be hard to find relevant data without writing application programs. Using a database would be important for the task of finding data. iii. Data isolation. Since data is not usually updated, but instead newly created, data isolation is not a major issue. Even the task of keeping track of who has viewed what videos is (conceptually) append only, again making isolation not a major issue. However, if authorization is added, there may be some issues of concurrent updates to authorization information. iv. Integrity problems. It seems unlikely there are significant integrity constraints in this application, except for primary keys. if the data is distributed, there may be issues in enforcing primary key constraints. Integrity problems are probably not a major issue. v. Atomicity problems. When a video is uploaded, metadata about the video and the video should be added atomically, otherwise there would be an inconsistency in the data. An underlying recovery mechanism would be required to ensure atomicity in the event of failures. vi. Concurrent-access anomalies. Since data is not updated, concurrent access anomalies would be unlikely to occur. vii. Security problems. These would be a issue if the system supported authorization. 1 October 2017 Database Management Systems Solution Set b State the advantages and disadvantages of the following data models: Hierarchical, Network, Relational, Entity Relationship, Object Oriented and NoSQL. State if the models support data and structural independence. 2 October 2017 Database Management Systems Solution Set 1 c State and explain the twelve Codd’s rules for relational databases. 3 October 2017 Database Management Systems Solution Set 1 d What is Unified modelling language? What are its parts? Show the ER diagram notations and equivalent notations in UML. Ans 1 d The Unified Modeling Language (UML) is a standard developed under the auspices of the Object Management Group (OMG) for creating specifications of various components of a software system. Some of the parts of UML are: • Class diagram: A class diagram is similar to an E-R diagram. • Use case diagram: Use case diagrams show the interaction between users and the system, in particular the steps of tasks that users perform (such as withdrawing money or registering for a course). • Activity diagram: Activity diagrams depict the flow of tasks between various components of a system. • Implementation diagram: Implementation diagrams show the system components and their interconnections, both at the software component level and the hardware component level. 4 October 2017 Database Management Systems Solution Set 1 e Construct an E-R diagram for a car insurance company whose customers own one or more cars each. Each car has associated with it zero to any number of recorded accidents. Each insurance policy covers one or more cars, and has one or more premium payments associated with it. Each payment is for a particular period of time, and has an associated due date, and the date when the payment was received. Ans 1 e The E-R diagram is shown in Figure below. Payments are modeled as weak entities since they are related to a specific policy. Note that the participation of accident in the relationship participated is not total, since it is possible that there is an accident report where the participating car is unknown. 1 f i. Design an E-R diagram for keeping track of the exploits of your favourite sports team. You should store the matches played, the scores in each match, the players in each match, and individual player statistics for each match. Summary statistics should be modelled as derived attributes. 5 October 2017 Database Management Systems Solution Set ii. Consider an E-R diagram in which the same entity set appears several times, with its attributes repeated in more than one occurrence. Why is allowing this redundancy a bad practice that one should avoid? The different occurrences of an entity may have different sets of attributes, leading to an inconsistent diagram. Instead, the attributes of an entity should be specified only once. All other occurrences of the entity should omit attributes. Since it is not possible to have an entity without any attributes, an occurrence of an entity without attributes clearly indicates that the attributes are specified elsewhere. 2 a The natural outer-join operations extend the natural-join operation so that tuples from the participating relations are not lost in the result of the join. Describe how the theta join operation can be extended so that tuples from the left, right, or both relations are not lost from the result of a theta join. 2 b Given the following relational schemas: 푅 = (퐴, 퐵, 퐶) 푆 = (퐷, 퐸, 퐹) Suppose the relations r(R) and s(S) are defined. Write the expressions in tuple relational calculus equivalent to each of the following: i. ∏퐴(푟) ii. 휎퐵=17(푟) iii. 푟 × 푠 iv. ∏퐴,퐹(휎퐶=퐷(푟 × 푠)) Ans 2b i. {t | ∃ q ∈ r (q[A] = t[A])} ii. {t | t ∈ r ∧ t[B] = 17} iii. {t | ∃ p ∈ r ∃ q ∈ s (t[A] = p[A] ∧ t[B] = p[B] ∧ t[C] = p[C] ∧ t[D] = q[D] ∧ t[E] = q[E] ∧ t[F] = q[F])} iv. {t | ∃ p ∈ r ∃ q ∈ s (t[A] = p[A] ∧ t[F] = q[F] ∧ p[C] = q[D]} 2 c Consider the relational database below, where primary keys are underlined. employee (person name, street, city) works (person name, company name, salary) company (company name, city) manages (person name, manager name) Give an expression in tuple relational calculus for each of the following queries: i. Find all employees who work directly for “Jones.” ii. Find all cities of residence of all employees who work directly for “Jones.” 6 October 2017 Database Management Systems Solution Set iii. Find the name of the manager of the manager of “Jones.” iv. Find those employees who earn more than all employees living in the city “Mumbai.” Ans 2c i. {t | ∃ m ∈ manages (t[person name] = m[person name] ∧ m[manager name] = ’Jones’)} ii. {t | ∃ m ∈ manages ∃e ∈ employee(e[person name] = m[person name] ∧ m[manager name] = ’Jones’ ∧ t[city] = e[city])} iii. {t | ∃ m1 ∈ manages ∃m2 ∈ manages(m1[manager name] = m2[person name] ∧ m1[person name] = ’Jones’ ∧ t[manager name] = m2[manager name])} iv. {t | ∃ w1 ∈ works ¬∃w2 ∈ works(w1[salary] < w2[salary] ∃e2 ∈ employee (w2[person name] = e2[person name] ∧ e2[city] = ’Mumbai’))} 2 d What is normalization? What is its objective? Give a distinguishing characteristic of 1NF, 2NF, 3NF, 4NF and BCNF. Ans 2d Normalization is a process for evaluating and correcting table structures to minimize data redundancies, thereby reducing the likelihood of data anomalies. The normalization process involves assigning attributes to tables based on the concept of determination. The objective of normalization is to ensure that each table conforms to the concept of well-formed relations—in other words, tables that have the following characteristics: • Each table represents a single subject • No data item will be unnecessarily stored in more than one table (in short, tables have minimum controlled redundancy). The reason for this requirement is to ensure that the data is updated in only one place. • All nonprime attributes in a table are dependent on the primary key—the entire primary key and nothing but the primary key. The reason for this requirement is to ensure that the data is uniquely identifiable by a primary key value. • Each table is void of insertion, update, or deletion anomalies, which ensures the integrity and consistency of the data. 7 October 2017 Database Management Systems Solution Set i. Using the INVOICE table structure shown in table below, write the relational schema, draw its dependency diagram and identify all dependencies (including all partial and transitive dependencies). You can assume that the table does not contain repeating groups and that any invoice number may reference more than one product. (Hint: This table uses a composite primary key.) Attribute Name Sample Value Sample Value Sample Sample Value Sample Value Value INV_NUM 211347 211347 211347 211348 211349 PROD_NUM AA-E3422QW QD-300932X RU-995748G AA-E3422QW GH-778345P SALE_DATE 15-Jan-2016 15-Jan-2016 15-Jan-2016 15-Jan-2016 16-Jan-2016 PROD_LABEL Rotary sander 0.25-in. drill bit Band saw Rotary sander Power drill VEND_CODE 211 211 309 211 157 VEND_NAME NeverFail, Inc. NeverFail, Inc. BeGood, Inc. NeverFail, Inc. ToughGo, Inc. QUANT_SOLD 1 8 1 2 1 PROD_PRICE ₹4995 ₹345 ₹3999 ₹4995 ₹8775 ii. Using the initial dependency diagram drawn in question i, remove all partial dependencies, draw the new dependency diagrams, and identify the normal forms for each table structure you created.