Logical Schema Design: The Relational Data Model

Basics of the Relational Model
From Conceptual to Logical Schema

Logical Schema Design

Select data model
- Hierarchical data model: hierarchies of record types; mainframe oldie, still in use, outdated
- Network data model: graph-like data structures; still in use, outdated
- Relational data model: the most important one today
- Object-oriented data model: more flexible data structures, less powerful data manipulation language
- Object-relational: the best of two worlds?

Transform conceptual model into logical schema of data model
- Easy for the Relational Data Model (RDM)
- Can be performed automatically (e.g. by Oracle Designer)

FU-Berlin, DBS I 2006, Hinze / Scholz / Hinze 2006, FU-Berlin, DBS I

Logical Schema Design: Relational Data Model

The Relational Data Model
- Introduced in 1970 by E. F. Codd, who was honoured with the Turing Award (Codd: A Relational Model of Data for Large Shared Data Banks. Communications of the ACM 13(6): 377-387, 1970)

Basic ideas:
- A database is a collection of relations
- Relation R = set of n-tuples
- Relation schema R(A1, A2, …, An)
- Attributes Ai take atomic values from the domain Di = dom(Ai)

Student relation (table); each column is an attribute, each row is a tuple:

FName   Name     Email         SID
Tina    Müller   mueller@...   13555
Anna    Katz     katz@...      12555
Carla   Maus     piep@...      11222

Relation schema: Student(Fname, Name, Email, SID)
or Student(Fname:string, Name:string, Email:string, SID:number)

Logical Schema Design: Relational Data Model

Relation Schema R(A1, A2, …, An)

notation sometimes R:{[A1, A2, …, An]}

Relation R ⊆ dom(A1) x dom(A2) x … x dom(An)

Attribute set R = {A1, A2, …, An}

Degree of a relation: number of attributes

Example: Student(Fname, Name, Email, SID)
Student ⊆ string x string x string x number
R(Student) = {Fname, Name, Email, SID}

A database schema is a set of relation schemas

Logical Schema Design: Relational Data Model

Time-variant relations
- Relations have a state at each point in time.
- Integrity constraints on the state are part of the DB schema

Tuples (rows, records)
- Not ordered
- No duplicate tuples (relations are sets)
- Null values for some attributes possible
- Distinguishable based on tuple values (key concept)

Different from object identification in o-o languages: there, each object has an implicit identity, usually completely unrelated to its fields
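The set semantics of relations can be illustrated with a small Python sketch. It models the Student relation as a set of tuples and checks that each tuple lies in the cross product of the domains. Using Python types as stand-ins for domains, and the helper names `degree` and `conforms`, are assumptions made for illustration only.

```python
# Relation schema as (attribute, domain) pairs; Python types stand in for domains.
STUDENT_SCHEMA = (("Fname", str), ("Name", str), ("Email", str), ("SID", int))

# A relation state is a set of n-tuples: unordered and free of duplicates.
student = {
    ("Tina", "Müller", "mueller@...", 13555),
    ("Anna", "Katz", "katz@...", 12555),
    ("Carla", "Maus", "piep@...", 11222),
}


def degree(schema):
    """Degree of a relation = number of attributes."""
    return len(schema)


def conforms(relation, schema):
    """True if every tuple lies in dom(A1) x ... x dom(An)."""
    return all(
        len(t) == len(schema)
        and all(isinstance(v, dom) for v, (_, dom) in zip(t, schema))
        for t in relation
    )


# Adding a tuple that is already present does not change the relation.
student_again = student | {("Anna", "Katz", "katz@...", 12555)}
```

Because `student` is a set, the union with an already present tuple yields the same relation, which mirrors the "no duplicate tuples" rule.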


Logical Schema Design: Relational Data Model

Superkey: subset of attributes of a relation schema R for which no two tuples have the same values.

- Every relation has at least one superkey (the set of all its attributes).
- Example: Student(Fname, Name, Email, SID)
  Superkeys: {Fname, Name, Email, SID}, {Name, Email, SID}, {Fname, Email, SID}, {Fname, Name, SID}, {Fname, SID}, {Name, SID}, {Fname, Name, Email}, {Fname, Email}, {Name, Email}, {Email, SID}, {Email}, {SID}
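The superkey test can be sketched as a uniqueness check on a projection. A superkey is a property of the schema, so it must hold in every valid state; checking one state can only rule candidates out. The fourth tuple below is a hypothetical namesake added for illustration, so that names alone do not identify a student.

```python
from itertools import combinations

ATTRS = ("Fname", "Name", "Email", "SID")

# The three tuples of the Student example plus one hypothetical namesake.
students = [
    ("Tina", "Müller", "mueller@...", 13555),
    ("Anna", "Katz", "katz@...", 12555),
    ("Carla", "Maus", "piep@...", 11222),
    ("Tina", "Müller", "tina2@...", 14001),  # hypothetical tuple (assumption)
]


def is_superkey(attr_subset, rows):
    """True if no two rows agree on all attributes in attr_subset."""
    idx = [ATTRS.index(a) for a in attr_subset]
    projected = [tuple(row[i] for i in idx) for row in rows]
    return len(set(projected)) == len(projected)


# Enumerate all non-empty attribute subsets that pass the check in this state.
superkeys = [
    set(c)
    for n in range(1, len(ATTRS) + 1)
    for c in combinations(ATTRS, n)
    if is_superkey(c, students)
]
```

With this state, exactly the attribute sets containing Email or SID survive, which matches the twelve superkeys listed in the example.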

Logical Schema Design: Relational Data Model

Candidate Key: superkey K of relation R such that if any attribute A ∈ K is removed, the remaining set of attributes K \ {A} is not a superkey of R (a minimal superkey).

- Example: Student(Fname, Name, Email, SID)
  Candidate keys: {Email}, {SID}

Primary Key: arbitrarily designated candidate key

- Example: Student(Fname, Name, Email, SID)
  Primary key: {SID}
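The minimality condition can be written down directly. The sketch below assumes, as in the Student example, that Email and SID each identify a student; the function names are illustrative.

```python
from itertools import combinations

ATTRS = ("Fname", "Name", "Email", "SID")


def is_superkey(attrs):
    # Assumption taken from the example: Email and SID each identify a student.
    return "Email" in attrs or "SID" in attrs


def is_candidate_key(attrs):
    """Superkey that stops being a superkey when any one attribute is removed."""
    attrs = frozenset(attrs)
    return is_superkey(attrs) and all(
        not is_superkey(attrs - {a}) for a in attrs
    )


candidate_keys = [
    set(c)
    for n in range(1, len(ATTRS) + 1)
    for c in combinations(ATTRS, n)
    if is_candidate_key(c)
]

# The primary key is a designated choice among the candidate keys.
primary_key = {"SID"}
```

A set such as {Email, SID} is a superkey but not a candidate key, because dropping either attribute still leaves a superkey.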


Logical Schema Design: Relational Data Model

Foreign Key: set of attributes FK in relation schema R1 that references relation R2 if:
- the attributes of FK have the same domains as the attributes of the primary key K2 of R2
- a value of FK in a tuple t1 of R1 either occurs as a value of K2 for some tuple t2 in R2 or is null.

- Example:
  R1: Exercise(EID, Tutor, ExerciseHours, Lecture)
  R2: Student(Fname, Name, Email, SID)
  K1 = {EID}, K2 = {SID}
  Tutor is a foreign key in Exercise that references Student.
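The two foreign key conditions can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch: the column types and the lecture name 'DBS I' are assumptions, and SQLite only enforces foreign keys once `PRAGMA foreign_keys = ON` has been issued.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE Student (
        Fname TEXT, Name TEXT, Email TEXT, SID INTEGER PRIMARY KEY
    );
    CREATE TABLE Exercise (
        EID INTEGER PRIMARY KEY,
        Tutor INTEGER REFERENCES Student(SID),  -- foreign key, may be NULL
        ExerciseHours TEXT,
        Lecture TEXT
    );
""")

conn.execute("INSERT INTO Student VALUES ('Anna', 'Katz', 'katz@...', 12555)")
conn.execute("INSERT INTO Exercise VALUES (1, 12555, NULL, 'DBS I')")  # existing SID
conn.execute("INSERT INTO Exercise VALUES (2, NULL, NULL, 'DBS I')")   # NULL is allowed

try:
    # 99999 does not occur as SID in Student, so the insert must fail.
    conn.execute("INSERT INTO Exercise VALUES (3, 99999, NULL, 'DBS I')")
    dangling_rejected = False
except sqlite3.IntegrityError:
    dangling_rejected = True

exercise_count = conn.execute("SELECT COUNT(*) FROM Exercise").fetchone()[0]
```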

Logical Schema Design: Transformation

1. Select data model → relational data model
2. Transform conceptual model into logical schema of relational data model

- Define relational schema, table names, attributes and types, invariants
- Design steps:
  - Translate entities into relations
  - Translate relationships into relations
  - Simplify the design
  - Define tables in SQL
  - Define additional invariants
  - (Formal analysis of the schema)

Logical Schema Design: Entities

Step 1: Transform entities
- For each entity E create a relation R with all attributes of E
- Key attributes of E become keys of the relation

[ER diagram: entity Movie with attributes id, title, category, year, director, Price_per_day, length]

Relational Schema:
Movie(id, title, category, year, director, Price_per_day, length)
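As a sketch of how the Movie relation might later be declared as a table, the following uses Python's `sqlite3` module. The column types are assumptions, as is the choice of `id` as the key attribute; the schema itself only names the attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Movie (
        id            INTEGER PRIMARY KEY,  -- assumed key attribute of the entity
        title         TEXT,
        category      TEXT,
        year          INTEGER,
        director      TEXT,
        price_per_day REAL,
        length        INTEGER
    )
""")

# PRAGMA table_info yields one row per column: (cid, name, type, notnull, default, pk).
info = conn.execute("PRAGMA table_info(Movie)").fetchall()
columns = [row[1] for row in info]
key_columns = [row[1] for row in info if row[5]]
```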

Logical Schema Design: Weak Entities

Step 2: Transform weak entities
- For each weak entity WE create a relation R
- Include all attributes of WE
- Add the key of the identifying entity to the weak entity's key
- This part of the key is a foreign key

[ER diagram: Employee (eid) (0,*) has (1,1) Child (cid)]

Relational Schema:
Employee(eid)
Child(cid, eid)

Note: the weak entity and its defining relationship are transformed together!
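The Employee/Child schema can be sketched in SQLite as follows. The integer types, the sample values and the helper `rejected` are assumptions for illustration. The composite primary key contains the employee's key, and that part is also a foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for FK enforcement in SQLite
conn.executescript("""
    CREATE TABLE Employee (eid INTEGER PRIMARY KEY);
    CREATE TABLE Child (
        cid INTEGER,
        eid INTEGER REFERENCES Employee(eid),  -- key of the identifying entity
        PRIMARY KEY (cid, eid)
    );
    INSERT INTO Employee VALUES (1), (2);
    INSERT INTO Child VALUES (1, 1), (1, 2);  -- same cid under two employees
""")


def rejected(sql):
    """Run an INSERT and report whether a constraint rejected it."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


children_with_cid_1 = conn.execute(
    "SELECT COUNT(*) FROM Child WHERE cid = 1"
).fetchone()[0]
duplicate_rejected = rejected("INSERT INTO Child VALUES (1, 1)")  # same (cid, eid)
orphan_rejected = rejected("INSERT INTO Child VALUES (5, 3)")     # no employee 3
```

The value `cid = 1` may occur once per employee, because a child is only identified together with its employee.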

Logical Schema Design: Weak Entities

Step 2: Transform weak entities

Example:

[ER diagram: Delivery (did, date) contains Order (oid, date, contact); Order has Item (iid); cardinalities shown: (1,*), (1,*), (1,1), (1,1)]

Relational Schema:
Delivery(did, date)
Order(oid, did, date, contact)
Item(iid, oid, did)

Logical Schema Design: Relationships

Step 3: Transform Relationships
- For each 1:1-relationship between E1, E2 create a relation R
- Include all key attributes
- Define as key: key of entity E1 or of entity E2
- Choose the entity with total participation for the key (e.g. driving licence)

[ER diagram: Country (CName) (1,1) has (1,1) President (PName)]

Relational Schema:
Country(CName)
President(PName)
has(CName, PName) with key CName, or has(CName, PName) with key PName
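A possible SQLite rendering of the 1:1 case is sketched below. CName is chosen as the key of `has`. Declaring PName as UNIQUE in addition is an assumption that goes beyond the rule above; it expresses that the other side also occurs at most once. All inserted values are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Country   (CName TEXT PRIMARY KEY);
    CREATE TABLE President (PName TEXT PRIMARY KEY);
    CREATE TABLE has (
        CName TEXT PRIMARY KEY REFERENCES Country(CName),        -- chosen key
        PName TEXT NOT NULL UNIQUE REFERENCES President(PName)   -- other side, also unique
    );
    INSERT INTO Country VALUES ('C1'), ('C2');
    INSERT INTO President VALUES ('P1'), ('P2');
    INSERT INTO has VALUES ('C1', 'P1');
""")


def rejected(sql):
    """Run an INSERT and report whether a constraint rejected it."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


second_president_rejected = rejected("INSERT INTO has VALUES ('C1', 'P2')")
second_country_rejected = rejected("INSERT INTO has VALUES ('C2', 'P1')")
```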


Logical Schema Design: Relationships

Step 3: Transform Relationships
- For each 1:N-relationship between E1, E2 create a relation R
- Include all key attributes
- Define as key: key of the N-side of the relationship

[ER diagram: Lecture (lnr) (1,1) hold (0,*) Lecturer (lid, name)]

Relational Schema:
Lecture(lnr)
Lecturer(lid, name)
hold(lnr, lid)
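The effect of choosing the N-side key can be sketched in SQLite. Here `lnr` alone is the key of `hold`, so a lecture has exactly one lecturer while a lecturer may hold several lectures. Types and sample values are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Lecture  (lnr INTEGER PRIMARY KEY);
    CREATE TABLE Lecturer (lid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE hold (
        lnr INTEGER PRIMARY KEY REFERENCES Lecture(lnr),  -- key of the N-side
        lid INTEGER NOT NULL REFERENCES Lecturer(lid)
    );
    INSERT INTO Lecture VALUES (101), (102);
    INSERT INTO Lecturer VALUES (1, 'A'), (2, 'B');
    INSERT INTO hold VALUES (101, 1), (102, 1);  -- one lecturer, two lectures
""")

try:
    # A second lecturer for lecture 101 violates the key on lnr.
    conn.execute("INSERT INTO hold VALUES (101, 2)")
    second_lecturer_rejected = False
except sqlite3.IntegrityError:
    second_lecturer_rejected = True

lectures_of_lecturer_1 = conn.execute(
    "SELECT COUNT(*) FROM hold WHERE lid = 1"
).fetchone()[0]
```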

Logical Schema Design: Relationships

Step 3: Transform Relationships
- For each N:M-relationship create a relation R
- Include all key attributes
- Define as key: keys from both entities

[ER diagram: User_account (username) has emailMessage (id, from); cardinalities (0,*) and (1,*)]

Relational Schema:
User_account(username)
emailMessage(id, from)
has(username, messageid)
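For the N:M case, a sketch in SQLite uses a composite key made of both entity keys. Types and sample values are assumptions; `from` is quoted because it is an SQL keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE User_account (username TEXT PRIMARY KEY);
    CREATE TABLE emailMessage (id INTEGER PRIMARY KEY, "from" TEXT);
    CREATE TABLE has (
        username  TEXT    REFERENCES User_account(username),
        messageid INTEGER REFERENCES emailMessage(id),
        PRIMARY KEY (username, messageid)  -- keys from both entities
    );
    INSERT INTO User_account VALUES ('u1'), ('u2');
    INSERT INTO emailMessage VALUES (1, 'a@...'), (2, 'b@...');
    INSERT INTO has VALUES ('u1', 1), ('u1', 2), ('u2', 1);
""")

try:
    # Repeating an existing (username, messageid) pair violates the key.
    conn.execute("INSERT INTO has VALUES ('u1', 1)")
    duplicate_pair_rejected = False
except sqlite3.IntegrityError:
    duplicate_pair_rejected = True

pair_count = conn.execute("SELECT COUNT(*) FROM has").fetchone()[0]
```

Each user may appear with several messages and each message with several users; only a repeated pair is rejected.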


Logical Schema Design: Relationships

Step 3: Transform Relationships
- Include all relationship attributes in relation R

[ER diagram: Lecture (lnr) (1,1) hold (0,N) Lecturer (lid, name); relationship attribute: date]

Relational Schema:
Lecture(lnr)
Lecturer(lid, name)
hold(lnr, lid, date)

- Rename keys in recursive relationships

[ER diagram: Lecture (lnr) require Lecture, with roles predecessor and successor; cardinalities (0,1) and (0,*)]

Relational Schema:
require(preLNR, succLNR)

Logical Schema Design: Relationships

Step 3: Transform Relationships
- For each n-ary relationship (n>2) create a relation R
- Include all key attributes
- Define as key: keys of all entities that take part with multiplicity "many" (*)

[ER diagram: Lecturer (lid, name), Lecture (lnr) and Textbook (ISBN, title, author) connected by recommend; relationship attribute: rating; cardinalities shown: (0,*), (1,*), (0,*)]

Relational Schema:
Lecture(lnr)
Lecturer(lid, name)
Textbook(ISBN, title, author)
recommend(lid, lnr, ISBN, rating)

Logical Schema Design: Generalization

Step 4: Generalization
- 3 variants for transformation

[ER diagram: Person (pid, Name, FName, Email) with is-a specializations Student (SID) and Lecturer (Contract)]

(1)
Person(pid, name, fname, email)
Student(pid, sid)
Lecturer(pid, contract)

(2)
Student(pid, name, fname, email, sid)
Lecturer(pid, name, fname, email, contract)

(3)
Person(pid, name, fname, email, sid, contract)


Logical Schema Design: Generalization

(1) Separate relation for each entity
- Key in specialized relations: foreign key to general relation

[ER diagram: A (aid, a) with is-a specializations B (b) and C (c)]

A(aid, a)
B(aid, b)
C(aid, c)

- Gathering data from different tables is time consuming
- Appropriate specialization prevents unnecessary data access


Logical Schema Design: Generalization

(2) Relations for specializations
- Separate tables which include A's attributes
- Separate keys for relations

[ER diagram: A (aid, a) with is-a specializations B (b) and C (c)]

AB(aid, a, b)
AC(aid, a, c)

- Only valid for exhaustive specialization (no data in A only)
- Time consuming data retrieval for non-disjoint (overlapping) specializations


Logical Schema Design: Generalization

(3) One relation for all entities

[ER diagram: A (aid, a) with is-a specializations B (b) and C (c)]

ABC(aid, a, b, c)

Annotations on table ABC:
- NULLs, if subtypes are disjoint
- Empty, if subtypes are exhaustive

- Removes generalization!
- Relation may contain many NULL values
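The trade-off between variant (1) and variant (3) can be made concrete with the Person/Student/Lecturer example. The sketch below is only an illustration: the lecturer row, the contract value and the table name `PersonAll` are hypothetical, and the single table is derived from variant (1) purely for comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Variant (1): one relation per entity
    CREATE TABLE Person   (pid INTEGER PRIMARY KEY, name TEXT, fname TEXT, email TEXT);
    CREATE TABLE Student  (pid INTEGER PRIMARY KEY REFERENCES Person(pid), sid INTEGER);
    CREATE TABLE Lecturer (pid INTEGER PRIMARY KEY REFERENCES Person(pid), contract TEXT);

    INSERT INTO Person VALUES (1, 'Katz', 'Anna', 'katz@...'),
                              (2, 'Example', 'Ada', 'ada@...');  -- hypothetical lecturer
    INSERT INTO Student  VALUES (1, 12555);
    INSERT INTO Lecturer VALUES (2, 'permanent');                -- hypothetical value

    -- Variant (3): a single relation, built here from variant (1) for comparison
    CREATE TABLE PersonAll AS
        SELECT p.pid, p.name, p.fname, p.email, s.sid, l.contract
        FROM Person p
        LEFT JOIN Student  s ON s.pid = p.pid
        LEFT JOIN Lecturer l ON l.pid = p.pid;
""")

# Variant (1) needs a join to collect all data about a student.
student_rows = conn.execute(
    "SELECT p.fname, p.name, s.sid FROM Person p JOIN Student s ON s.pid = p.pid"
).fetchall()

# Variant (3) needs no join but stores NULLs for attributes of the other subtype.
rows_with_nulls = conn.execute(
    "SELECT COUNT(*) FROM PersonAll WHERE sid IS NULL OR contract IS NULL"
).fetchone()[0]
```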


Logical Schema Design: Example

[ER diagram: entities Format (name, charge), Tape (id), Movie (id, title, category, year, director, Price_per_day, length), Actor (stage_name, real_name, birthday), Customer (Mem_no, Last_name, First_name, Address, Telephone) and Rental (from, until); relationships belong_to, hold, play, is_in and have]

Logical Schema Design: Example

Format(name, charge)
Tape(id)
Movie(id, title, category, year, director, price_per_day, length)
Actor(stage_name, real_name, birthday)
Customer(mem_no, last_name, first_name, address, telephone)
Rental(tape_id, member, from, until)

Belong_to(formatname, tape_id)
Hold(tape_id, movie_id)
Play(movie_id, actor_name)

Reduce the number of relations by simplification!

Logical Schema Design: Simplification

Collect those relation schemas with the same key attributes into one relation schema
- The semantics of the attributes must match, not the literal name
- Do not destroy generalization!

[ER diagram: Lecture (LID, title, hours), Person (pid, FName, Name, Email) with is-a specializations Student (SID) and Lecturer (Contract); relationships hold and attend; cardinalities shown: (1,1), (3,N), (0,N), (1,N)]

Lecture(lid, title, hours)
Person(pid, fname, name, email)
Student(pid, sid)
Lecturer(pid, contract)
Attend(lid, pid)
Hold(lid, pid)

Lecture and Hold have the same key lid and are merged into:
Lecture(lid, title, hours, lecturer)
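Folding Hold into Lecture can be sketched as a join on the shared key. This is an illustration in SQLite with hypothetical rows; the table name `LectureSimplified` is an assumption, and `CREATE TABLE ... AS` copies only the data, not the key declaration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Lecture (lid INTEGER PRIMARY KEY, title TEXT, hours INTEGER);
    CREATE TABLE Hold    (lid INTEGER PRIMARY KEY, pid INTEGER);  -- key: lid

    INSERT INTO Lecture VALUES (1, 'DBS I', 4), (2, 'Algorithms', 4);  -- hypothetical rows
    INSERT INTO Hold    VALUES (1, 10), (2, 11);

    -- Both relations have the key lid, so they fold into one relation.
    CREATE TABLE LectureSimplified AS
        SELECT l.lid, l.title, l.hours, h.pid AS lecturer
        FROM Lecture l JOIN Hold h ON h.lid = l.lid;
""")

merged = conn.execute("SELECT * FROM LectureSimplified ORDER BY lid").fetchall()
```

Attend(lid, pid) is not folded, because its key consists of both attributes and therefore differs from the key of Lecture.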

Logical Schema Design: Example

Format(name, charge)
Tape(id)
Movie(id, title, category, year, director, price_per_day, length)
Actor(stage_name, real_name, birthday)
Customer(mem_no, last_name, first_name, address, telephone)
Rental(tape_id, member, from, until)

Belong_to(formatname, tape_id)
Hold(tape_id, movie_id)
Play(movie_id, actor_name)

Simplified relation from Tape/Belong_to/Hold:
Tape(id, format, movie_id)

No simplification based on partial keys!

Logical Schema Design: Simplification

Restrictions on schema simplification
- Simplification (folding) of relations may result in NULL values in some tables

[ER diagram: Student (id, …) (0,1) get (0,1) Grant (gid, duration, amount); relationship attributes: since, responsible]

Student(id, …)
Grant(gid, duration, amount)
Get(student, gid, since, responsible)

Folded:
Grant(gid, duration, amount, student, since, responsible)

Consequence:
- Optional relationships should not be folded if many NULLs are to be expected
- May lead to time consuming retrieval
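The NULL problem of folding an optional relationship can be observed in a small SQLite sketch. All rows are hypothetical, as is the table name `GrantFolded`; only one of three grants is assigned to a student.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "Grant" (gid INTEGER PRIMARY KEY, duration INTEGER, amount INTEGER);
    CREATE TABLE Get (
        student INTEGER, gid INTEGER PRIMARY KEY, since TEXT, responsible TEXT
    );

    INSERT INTO "Grant" VALUES (1, 12, 500), (2, 6, 300), (3, 24, 800);  -- hypothetical
    INSERT INTO Get VALUES (12555, 1, '2006-04-01', 'office A');          -- hypothetical

    -- Folding Get into Grant: grants without a student receive NULLs.
    CREATE TABLE GrantFolded AS
        SELECT g.gid, g.duration, g.amount, x.student, x.since, x.responsible
        FROM "Grant" g LEFT JOIN Get x ON x.gid = g.gid;
""")

total_rows = conn.execute("SELECT COUNT(*) FROM GrantFolded").fetchone()[0]
rows_with_nulls = conn.execute(
    "SELECT COUNT(*) FROM GrantFolded WHERE student IS NULL"
).fetchone()[0]
```

In this state two of the three folded rows carry NULLs in student, since and responsible, which is the situation the restriction warns about.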

Logical Schema Design: Short summary

Relational data model
- Representation of data in relations (tables)
- The most important data model today

Important terms & concepts
- Relation: set of n-tuples
- Relation schema defines structure
- Attribute: property of relation, atomic values
- Superkey, candidate key, primary key identify tuples
- Transformation rules for entities, relationships
- Simplification

Open aspects of logical schema design
- DDL for defining and changing relation schemas
- Definition of constraints (min-cardinalities, value restrictions for attributes, …)