Designing a Week 10: Database Schema Design Relational (1,N) (1,N) Part(Name,Description,Part#) Part orders Customer Supplier(Name, Addr) (1,1) Customer(Name, Addr) Date Supplies(Name,Part#, Date) From an ER Schema to a Relational supplies Orders(Name,Part#) One (1,N) Supplier Restructuring an ER schema Network Performance Analysis Hierarchical Analysis of Redundancies, Removing Part Part Customer Generalizations name Translation into a Relational Schema Supplier Customer Supplier

CSC343 – Introduction to Database Design — 1 CSC343 – Introduction to Databases Database Design — 2

(Relational) Database Design  Given a (ER, but could also be Database a UML), generate a logical (relational) schema. Design  This is not just a simple translation from one model to another for two main reasons: Process o not all the constructs of the Entity-Relationship model can be translated naturally into the ; o the schema must be restructured in such a way as to make the execution of the projected operations as efficient as possible.  The topic is covered in Section 3.5 of the textbook.

CSC343 – Introduction to Databases Database Design — 3 CSC343 – Introduction to Databases Database Design — 4

1 Performance Analysis Logical Design Steps o It is helpful to divide the design into  An ER schema is restructured to optimize: two steps:  Cost of an operation (evaluated in terms of the number of occurrences of entities and relationships o Restructuring of the Entity- that are visited during the execution of an Relationship schema, based on operation); criteria for the optimization of the  Storage requirements (evaluated in terms of schema and the simplification of the number of bytes necessary to store the data following step; described by the schema). o Translation into the logical  In order to study these parameters, we need to know: model, based on the features of the  Projected volume of data; logical model (in our case, the  Projected operation characteristics. relational model).

CSC343 – Introduction to Databases Database Design — 5 CSC343 – Introduction to Databases Database Design — 6

Cost Model  The cost of an operation is measured in terms of Employee-Department Example the number of disk accesses required. A disk access is, generally, orders of magnitude more expensive than in-memory accesses, or CPU operations.  For a coarse estimate of cost, we assume that a Read operation (for one entity or relationship) requires 1 disk access; A Write operation (for one entity or relationship) requires 2 disk accesses (read from disk, change, write back to disk).  There are many other cost models depending on use and type of DB Warehouse (OLAP - On-Line Analysis Processing) CSC343O – Inptroedurcatiotn ito Dnataabals esDB (OLTP - On-Line TransacDtaitaobanse Design — 7 CSC343 – Introduction to Databases Database Design — 8 Processing)

2 Typical Operations Workload Design

 Operation 1: Assign an employee to a project.  During initial design and requirements  Operation 2: Find an employee record, including her analysis department, and the projects she works for.  Operation 3: Find records of employees for a Estimate operations and frequency department. Gross estimates at best  Operation 4: For each branch, retrieve its  After database is operational departments, and for each department, retrieve the last names of their managers, and the list of their Tools record actual workload employees. characteristics  Need operations and their volume/frequency

CSC343 – Introduction to Databases Database Design — 9 CSC343 – Introduction to Databases Database Design — 10

Analysis of Redundancies Analysis Steps A redundancy in a conceptual schema corresponds to a piece of information that can be derived (that is, obtained through a series of retrieval operations) from other data in the database. An Entity-Relationship schema may contain various forms of redundancy.

CSC343 – Introduction to Databases Database Design — 11 CSC343 – Introduction to Databases Database Design — 12

3 Deciding About Redundancies Examples of Redundancies  The presence of a redundancy in a database may be  an advantage: a reduction in the number of accesses necessary to obtain the derived information;  a disadvantage: because of larger storage requirements, (but, usually at negligible cost) and the necessity to carry out additional operations in order to keep the derived data consistent.  The decision to maintain or eliminate a redundancy is made by comparing the cost of operations that involve the redundant information and the storage needed, in the case of presence or absence of redundancy.

CSC343 – Introduction to Databases Database Design — 13 CSC343 – Introduction to Databases Database Design — 14

Removing Generalizations Cost Comparison: An Example  The relational model does not allow direct representation of generalizations that may be present in an E-R diagram.  For example, here is an ER schema with generalizations:

In this schema the attribute NumberOfInhabitants is redundant.

CSC343 – Introduction to Databases Database Design — 15 CSC343 – Introduction to Databases Database Design — 16

4 Option 1 Option 3 Note! Possible ...Two Restructuring More... s

Option 2 Option 4 (combination)

CSC343 – Introduction to Databases Database Design — 17 CSC343 – Introduction to Databases Database Design — 18

General Rules For Removing Generalization Partitioning and Merging of  Option 1 is convenient when the operations involve Entities and Relationships the occurrences and the attributes of E0, E1 and E2  Entities and relationships of an E-R schema can be more or less in the same way. partitioned or merged to improve the efficiency of  Option 2 is possible only if the generalization operations, using the following principle: satisfies the coverage constraint (i.e., every Accesses are reduced by separating instance of E0 is either an instance of E1 or E2) and attributes of the same concept that are is useful when there are operations that apply only accessed by different operations and to occurrences of E1 or E2. by merging attributes of different  Option 3 is useful when the generalization is not concepts that are accessed by the coverage-compliant and the operations refer to same operations. either occurrences and attributes of E (E ) or of  The same criteria with those discussed for 1 2 redundancies are valid in making a decision about E0, and therefore make distinctions between child and parent entities. this type of restructuring.  Available options can be combined (see option 4)

CSC343 – Introduction to Databases Database Design — 19 CSC343 – Introduction to Databases Database Design — 20

5 Example of Partitioning Deletion of Multi-Valued Attribute

Recall this is a weak entity

CSC343 – Introduction to Databases Database Design — 21 CSC343 – Introduction to Databases Database Design — 22

Merging Entities Partitioning of a Relationship

Suppose that composition represents Are these equivalent? current and past compositions of a team

Two candidate keys

CSC343 – Introduction to Databases Database Design — 23 CSC343 – Introduction to Databases Database Design — 24

6 Selecting a Primary Key Translation into a  Every must have a unique primary key.  The second step of logical design consists of a translation between different data models.  The criteria for this decision are as follows:  Starting from an E-R schema, an equivalent Attributes with values cannot form primary keys; relational schema is constructed. By “equivalent”, we mean a schema capable of One/few attributes is preferable to many representing the same information. attributes; Internal key preferable to external ones (weak  We will deal with the translation problem entity); systematically, beginning with the fundamental case, that of entities linked by A key that is used by many operations to access the instances of an entity is preferable many-to-many relationships. to others.  At this stage, if none of the candidate keys satisfies the above requirements, it may be CSbC3e43 s– Itnt rotdouct ioinn to tDrataobadsesuce a new attribute (e.g.,D astaboasce Diaesigln — 25 CSC343 – Introduction to Databases Database Design — 26 insurance #, student #,…)

Many-to-Many Relationships Many-to-Many Recursive Relationships

Employee(Number, Surname, Salary) Project(Code, Name, Budget) Participation(Number, Code, StartDate) Product(Code, Name, Cost) Composition(Part, SubPart, Quantity)

CSC343 – Introduction to Databases Database Design — 27 CSC343 – Introduction to Databases Database Design — 28

7 Ternary Relationships One-to-Many Relationships

Player(Surname, DateOfBirth, Position) Team(Name, Town, TeamColours) Contract(PlayerSurname, PlayerDateOfBirth, Supplier(SupplierID, SupplierName) Team, Salary) Product(Code, Type) OR Player(Surname, DateOfBirth, Position, Department(Name, Telephone) TeamName, Salary) Supply(Supplier, Product, Department, Quantity) Team(Name, Town, TeamColours)

CSC343 – Introduction to Databases Database Design — 29 CSC343 – Introduction to Databases Database Design — 30

Weak Entities One-to-One Relationships

Head(Number, Name, Salary, Department, StartDate) Department(Name, Telephone, Branch) Student(RegistrationNumber, University, OR Surname, EnrolmentYear) Head(Number, Name, Salary, StartDate) University(Name, Town, Address) Department(Name, Telephone, HeadNumber, Branch)

CSC343 – Introduction to Databases Database Design — 31 CSC343 – Introduction to Databases Database Design — 32

8 Optional One-to-One Relationships A Sample ER Schema

Employee(Number, Name, Salary) Department(Name, Telephone, Branch, Head, StartDate) Or, if both entities are optional Employee(Number, Name, Salary) Department(Name, Telephone, Branch) Management(Head, Department, StartDate)

CSC343 – Introduction to Databases Database Design — 33 CSC343 – Introduction to Databases Database Design — 34

1-1 and Optional 1-1 Relationships Entities with Internal Identifiers R3

E5 R4 E6

E5 E6 R5

1-1 or optional 1-1 relationships can E3(A31, A32) lead to messy E4(A41, A42) E4 transformations E5(A51, A52) E3 E4 E6(A61, A62, A63) E3 E5(A51, A52, A61R3, A62R3, AR3, A61R4, A62R4, A61R5, A62R5, AR5) CSC343 – Introduction to Databases Database Design — 35 CSC343 – Introduction to Databases Database Design — 36

9 Weak Entities Many-to-Many Relationships

R3 R3

E1 R1 E5 R4 E6 E1 R1 E5 R4 E6

R5 R5 R6 R6

E1(A11, A51, A12) E2 E2 E2(A21, A11, A51, A22) R2 E4 E4 E3 E3 R2(A21, A11, A51, A31, A41, AR21, AR22)

CSC343 – Introduction to Databases Database Design — 37 CSC343 – Introduction to Databases Database Design — 38

Homework - Fill in a Translation Summary of Transformation Rules

E1(A11, A51, A12) E2(A21, A11, A51, A22) E3(A31, A32) E4(A41,A42) E5(A51, A52, A61R3, A62R3, AR3, A61R4, A62R4, A61R5, A62R5, AR5) E6(A61, A62, A63) R2(A21, A11, A51, A31, A41, AR21, AR22)

CSC343 – Introduction to Databases Database Design — 39 CSC343 – Introduction to Databases Database Design — 40

10 ...More Rules... …Even More Rules...

CSC343 – Introduction to Databases Database Design — 41 CSC343 – Introduction to Databases Database Design — 42

…and the Last One... The Training Company Revisited

CSC343 – Introduction to Databases Database Design — 43 CSC343 – Introduction to Databases Database Design — 44

11 Logical Design Using CASE Tools  The logical design phase is partially supported by database design tools: Logical  the translation to the relational model is carried out Design by such tools semi-automatically; with a  the restructuring step is difficult to automate and CASE tools provide little or no support for it. CASE  Most commercial CASE tools will generate automatically Tool SQL code for the creation of the database.  Some tools allow direct connection with a DBMS and can construct the corresponding database automatically.  [CASE = Computer-Aided Software Engineering]

CSC343 – Introduction to Databases Database Design — 45 CSC343 – Introduction to Databases Database Design — 46

12