Conceptual Design and The Entity-Relationship Model
CS 186 Spring 2006
Lectures 19 & 20
R & G - Chapter 2

"A relationship, I think, is like a shark, you know? It has to constantly move forward or it dies. And I think what we got on our hands is a dead shark."

Woody Allen (from Annie Hall, 1979)

Steps in Database Design
• Requirements Analysis
  – user needs; what must the database do?
• Conceptual Design
  – high-level description (often done w/ ER model)
• Logical Design
  – translate ER into DBMS data model
• Schema Refinement
  – consistency, normalization
• Physical Design - indexes, disk layout
• Security Design - who accesses what, and how

Databases Model the Real World
• "Data Model" allows us to translate real world things into structures computers can store
• Many models: Relational, E-R, O-O, Network, Hierarchical, etc.
• Relational
  – Rows & Columns
  – Keys & Foreign Keys to link Relations

Students
sid    name   login       age  gpa
53666  Jones  jones@cs    18   3.4
53688  Smith  smith@eecs  18   3.2
53650  Smith  smith@math  19   3.8

Enrolled
sid    cid          grade
53666  Carnatic101  C
53666  Reggae203    B
53650  Topology112  A
53666  History105   B

Conceptual Design
• What are the entities and relationships in the enterprise?
• What information about these entities and relationships should we store in the database?
• What are the integrity constraints or business rules that hold?
• A database 'schema' in the ER Model can be represented pictorially (ER diagrams).
• Can then map an ER diagram into a relational schema.
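The Students and Enrolled instances from the "Databases Model the Real World" slide can be loaded into a relational DBMS to show keys and foreign keys linking relations. This is a minimal sketch, not part of the course material: it assumes Python's standard sqlite3 module with an in-memory database, and the column types are guesses, since the slide shows only data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when this is on

# Students: sid is the key (column types are assumed for illustration)
conn.execute("""CREATE TABLE Students (
    sid INTEGER PRIMARY KEY, name TEXT, login TEXT, age INTEGER, gpa REAL)""")
# Enrolled: sid is a foreign key linking each enrollment to a student
conn.execute("""CREATE TABLE Enrolled (
    sid INTEGER REFERENCES Students(sid), cid TEXT, grade TEXT,
    PRIMARY KEY (sid, cid))""")

conn.executemany("INSERT INTO Students VALUES (?, ?, ?, ?, ?)", [
    (53666, "Jones", "jones@cs", 18, 3.4),
    (53688, "Smith", "smith@eecs", 18, 3.2),
    (53650, "Smith", "smith@math", 19, 3.8),
])
conn.executemany("INSERT INTO Enrolled VALUES (?, ?, ?)", [
    (53666, "Carnatic101", "C"),
    (53666, "Reggae203", "B"),
    (53650, "Topology112", "A"),
    (53666, "History105", "B"),
])

# Follow the foreign key: which courses did the student with login jones@cs take?
jones_courses = sorted(row[0] for row in conn.execute("""
    SELECT E.cid FROM Students S JOIN Enrolled E ON S.sid = E.sid
    WHERE S.login = 'jones@cs'"""))
print(jones_courses)  # ['Carnatic101', 'History105', 'Reggae203']
```

The name "Smith" alone would not identify a student here; the key sid does, which is why Enrolled stores sid rather than name.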

ER Model Basics
[Figure: entity set Employees with attributes ssn (key), name, lot]
• Entity: Real-world object, distinguishable from other objects. An entity is described using a set of attributes.
• Entity Set: A collection of similar entities. E.g., all employees.
  – All entities in an entity set have the same set of attributes. (Until we consider hierarchies, anyway!)
  – Each entity set has a key (underlined).
  – Each attribute has a domain.

ER Model Basics (Contd.)
[Figure: Employees (ssn, name, lot) related to Departments (did, dname, budget) through the relationship set Works_In, which has attribute since]
• Relationship: Association among two or more entities. E.g., Attishoo works in Pharmacy department.
  – Relationships can have their own attributes.
• Relationship Set: Collection of similar relationships.
  – An n-ary relationship set R relates n entity sets E1 ... En; each relationship in R involves entities e1 ∈ E1, ..., en ∈ En.

ER Model Basics (Cont.)
[Figure: Employees participating in Reports_To twice, in the roles supervisor and subordinate; Employees related to Departments through Works_In (since)]
• Same entity set can participate in different relationship sets, or in different "roles" in the same set.

Key Constraints
[Figure: Employees related to Departments through Works_In and through Manages, each with attribute since; cardinality patterns: Many-to-Many, 1-to-Many, 1-to-1]
• An employee can work in many departments; a dept can have many employees.
• In contrast, each dept has at most one manager, according to the key constraint on Manages.
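A relationship set can be viewed as a set of tuples with one entity per participating entity set (the e1 ∈ E1, ..., en ∈ En reading above). Under that view, a key constraint says an entity in a given position appears in at most one tuple. The helper satisfies_key_constraint below is hypothetical and the (ssn, did) pairs are illustrative; this is a sketch of the idea, not how a DBMS stores relationships.

```python
def satisfies_key_constraint(rel: set[tuple], pos: int) -> bool:
    """True if every entity in position `pos` takes part in at most one relationship."""
    seen = set()
    for r in rel:
        if r[pos] in seen:
            return False  # this entity already appears in another relationship
        seen.add(r[pos])
    return True

DID = 1  # position of the Departments entity in each (ssn, did) tuple

# Illustrative relationship instances, written as (ssn, did) pairs
works_in = {("123-22-3666", 51), ("123-22-3666", 56), ("231-31-5368", 51)}
manages = {("123-22-3666", 51), ("231-31-5368", 56)}

print(satisfies_key_constraint(works_in, DID))  # False: dept 51 has two employees
print(satisfies_key_constraint(manages, DID))   # True: each dept has at most one manager
```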

Participation Constraints
• Does every employee work in a department?
• If so, this is a participation constraint
  – the participation of Employees in Works_In is said to be total (vs. partial)
  – What if every department has an employee working in it?
• Basically means "at least one"
[Figure: Employees (ssn, name, lot) related to Departments (did, dname, budget) through Manages and Works_In, each with attribute since; the Manages link is annotated Means: "exactly one"]

Weak Entities
A weak entity can be identified uniquely only by considering the primary key of another (owner) entity.
  – Owner entity set and weak entity set must participate in a one-to-many relationship set (one owner, many weak entities).
  – Weak entity set must have total participation in this identifying relationship set.
[Figure: owner entity set Employees (ssn, name, lot) related through the identifying relationship Policy (cost) to the weak entity set Dependents (pname, age)]
Weak entities have only a "partial key" (dashed underline).
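In the same tuple-set view, total participation means every entity in the set appears in at least one relationship, and "exactly one" combines that with the key constraint. The helpers has_total_participation and exactly_one and the department numbers are hypothetical; this sketch only illustrates the definitions.

```python
from collections import Counter

def has_total_participation(entities: set, rel: set[tuple], pos: int) -> bool:
    """Every entity appears in position `pos` of at least one relationship."""
    return entities <= {r[pos] for r in rel}

def exactly_one(entities: set, rel: set[tuple], pos: int) -> bool:
    """Total participation plus key constraint: each entity appears exactly once."""
    counts = Counter(r[pos] for r in rel)
    return all(counts[e] == 1 for e in entities)

DID = 1  # position of the Departments entity in each (ssn, did) tuple
works_in = {("123-22-3666", 51), ("123-22-3666", 56), ("231-31-5368", 51)}
manages = {("123-22-3666", 51), ("231-31-5368", 56)}

print(has_total_participation({51, 56}, works_in, DID))      # True
print(has_total_participation({51, 56, 60}, works_in, DID))  # False: dept 60 has nobody
print(exactly_one({51, 56}, manages, DID))                   # True
print(exactly_one({51, 56}, works_in, DID))                  # False: dept 51 appears twice
```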

Binary vs. Ternary Relationships
[Figure, Bad design: ternary relationship Covers relating Employees (ssn, name, lot), Dependents (pname, age) and Policies (policyid, cost)]
If each policy is owned by just 1 employee: a key constraint on Policies would mean a policy can only cover 1 dependent!
• Think through all the constraints in the 2nd diagram!
[Figure, Better design: Employees related to Policies (policyid, cost) through Purchaser; Dependents (pname, age) related to Policies through Beneficiary]

Binary vs. Ternary Relationships (Contd.)
• Previous example illustrated a case when two binary relationships were better than one ternary.
• An example in the other direction: a ternary relation Contracts relates entity sets Parts, Departments and Suppliers, and has descriptive attribute quantity.
  – No combination of binary relationships is an adequate substitute.
[Figure: Contract (quantity) relating Parts, Departments and Suppliers]

Binary vs. Ternary Relationships (Contd.)
[Figure: ternary Contract (quantity) relating Parts, Departments and Suppliers, VS. three binary relationships: Departments needs Parts, Suppliers can-supply Parts, Departments deals-with Suppliers]
  – S "can-supply" P, D "needs" P, and D "deals-with" S does not imply that D has agreed to buy P from S.
  – How do we record qty?

Aggregation
Used to model a relationship involving a relationship set.
Allows us to treat a relationship set as an entity set for purposes of participation in (other) relationships.
[Figure: Employees (ssn, name, lot) related through Monitors (until) to the aggregate formed by Projects (pid, started_on, pbudget), Sponsors (since) and Departments (did, dname, budget)]
Aggregation vs. ternary relationship?
  – Monitors is a distinct relationship, with a descriptive attribute.
  – Also, can say that each sponsorship is monitored by at most one employee.
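The claim that can-supply, needs and deals-with together do not imply a contract can be checked on a toy instance. The sketch below, with hypothetical supplier, part and department names, projects a ternary Contract set onto the three binary relationships and then joins them back; the join produces a triple that was never agreed.

```python
# Hypothetical Contract instance: (supplier, part, department) triples
contracts = {("S1", "P1", "D2"), ("S1", "P2", "D1"), ("S2", "P1", "D1")}

# Replace the ternary relationship by three binary ones (projections)
can_supply = {(s, p) for s, p, d in contracts}
needs = {(d, p) for s, p, d in contracts}
deals_with = {(d, s) for s, p, d in contracts}

# Try to recover the contracts from the binary relationships alone
reconstructed = {
    (s, p, d)
    for s, p in can_supply
    for d, p2 in needs
    if p2 == p and (d, s) in deals_with
}
spurious = reconstructed - contracts
print(sorted(spurious))  # [('S1', 'P1', 'D1')]
```

The binary sets also have no place to hold quantity, which belongs to a whole (S, P, D) triple rather than to any pair.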

ISA ('is a') Hierarchies
[Figure: Employees (ssn, name, lot) ISA Hourly_Emps (hourly_wages, hours_worked) and Contract_Emps (contractid)]
As in C++, or other PLs, attributes are inherited.
If we declare A ISA B, every A entity is also considered to be a B entity.
• Overlap constraints: Can Simon be an Hourly_Emps as well as a Contract_Emps entity? (Allowed/disallowed)
• Covering constraints: Does every Employees entity also have to be an Hourly_Emps or a Contract_Emps entity? (Yes/no)
• Reasons for using ISA:
  – To add descriptive attributes specific to a subclass.
    • i.e., not appropriate for all entities in the superclass
  – To identify entities that participate in a particular relationship
    • i.e., not all superclass entities participate

Review - Our Basic ER Model
• Entities and Entity Set (boxes)
• Relationships and Relationship sets (diamonds)
  – binary
  – n-ary
• Key constraints (1-1, 1-M, M-M, arrows on 1 side)
• Participation constraints (bold for Total)
• Weak entities - require strong entity for key
• Aggregation - an alternative to n-ary relationships
• ISA hierarchies - abstraction and inheritance
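The programming-language analogy on the ISA slide can be sketched with Python dataclasses. The class names HourlyEmp and ContractEmp and the sample values are assumptions chosen to mirror the diagram; this is an analogy, not a database mapping.

```python
from dataclasses import dataclass

@dataclass
class Employee:
    ssn: str
    name: str
    lot: int

@dataclass
class HourlyEmp(Employee):   # Hourly_Emps ISA Employees: inherits ssn, name, lot
    hourly_wages: float
    hours_worked: float

@dataclass
class ContractEmp(Employee):  # Contract_Emps ISA Employees
    contractid: int

h = HourlyEmp("123-22-3666", "Attishoo", 48, hourly_wages=10.0, hours_worked=40.0)
print(isinstance(h, Employee))  # True: every Hourly_Emps entity is also an Employees entity
print(h.ssn, h.hourly_wages)    # inherited key plus a subclass-specific attribute
```

The analogy is imperfect: a Python object has exactly one class, so overlap is disallowed unless some class inherits from both subclasses, and since Employee can be instantiated directly, covering is not enforced.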

Conceptual Design Using the ER Model
• ER modeling can get tricky!
• Design choices:
  – Should a concept be modeled as an entity or an attribute?
  – Should a concept be modeled as an entity or a relationship?
  – Identifying relationships: Binary or ternary? Aggregation?
• Note constraints of the ER Model:
  – A lot of data semantics can (and should) be captured.
  – But some constraints cannot be captured in ER diagrams.
• We'll refine things in our logical (relational) design

Entity vs. Attribute
• Should address be an attribute of Employees or an entity (related to Employees)?
• Depends upon how we want to use address information, and the semantics of the data:
  • If we have several addresses per employee, address must be an entity (since attributes cannot be set-valued).
  • If the structure (city, street, etc.) is important, address must be modeled as an entity (since attribute values are atomic).

Entity vs. Attribute (Cont.)
[Figure: Employees (ssn, name, lot) related to Departments (did, dname, budget) through Works_In2 (from, to); alternative: Works_In3 relating Employees, Departments and a Duration entity (from, to)]
• Works_In2 does not allow an employee to work in a department for two or more periods.
• Similar to the problem of wanting to record several addresses for an employee: we want to record several values of the descriptive attributes for each instance of this relationship.

Entity vs. Relationship
[Figure: Employees (ssn, name, lot) related to Departments (did, dname, budget) through Manages2 (since, dbudget)]
• OK as long as a manager gets a separate discretionary budget (dbudget) for each dept.
• What if a manager's dbudget covers all managed depts? (We can repeat the value, but such redundancy is problematic.)
[Figure: Employees related through is_manager to a Mgr_Appts entity (apptnum, since, dbudget), which is related through managed_by to Departments]
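The Works_In2 limitation can be observed directly in SQL. In this sketch, which assumes Python's sqlite3 module and an in-memory database, the period columns are renamed from_date and to_date because FROM is a reserved word, and Works_In3 is approximated by folding the Duration entity's (from, to) into the key; both choices are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Works_In2: from/to are descriptive attributes; the key is (ssn, did) only
conn.execute("""CREATE TABLE Works_In2 (
    ssn TEXT, did INTEGER, from_date TEXT, to_date TEXT,
    PRIMARY KEY (ssn, did))""")
# Works_In3 (approximation): the period is part of the key, as a Duration entity would be
conn.execute("""CREATE TABLE Works_In3 (
    ssn TEXT, did INTEGER, from_date TEXT, to_date TEXT,
    PRIMARY KEY (ssn, did, from_date, to_date))""")

# Two separate periods for the same employee in the same department
periods = [("123-22-3666", 51, "1991-01-01", "1992-12-31"),
           ("123-22-3666", 51, "1995-01-01", "1996-12-31")]

def try_insert(table: str) -> int:
    """Insert both periods into `table`; return how many rows were accepted."""
    accepted = 0
    for row in periods:
        try:
            # table is one of our own fixed names, so formatting it in is safe here
            conn.execute(f"INSERT INTO {table} VALUES (?, ?, ?, ?)", row)
            accepted += 1
        except sqlite3.IntegrityError:
            pass  # duplicate key: the second period is rejected
    return accepted

print(try_insert("Works_In2"))  # 1
print(try_insert("Works_In3"))  # 2
```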

These things get pretty hairy!
• Many E-R diagrams cover entire walls!
• A modest example:

A Cadastral E-R Diagram
[Figure: cadastral E-R diagram]

cadastral: showing or recording property boundaries, subdivision lines, buildings, and related details
Source: US Dept. Interior Bureau of Land Management, Federal Geographic Data Committee Cadastral Subcommittee, http://www.fairview-industries.com/standardmodule/cad-erd.htm

Logical DB Design: ER to Relational
• Entity sets to tables.
[Figure: entity set Employees (ssn, name, lot)]

ssn          name       lot
123-22-3666  Attishoo   48
231-31-5368  Smiley     22
131-24-3650  Smethurst  35

CREATE TABLE Employees
  (ssn CHAR(11),
   name CHAR(20),
   lot INTEGER,
   PRIMARY KEY (ssn))

Relationship Sets to Tables
• In translating a many-to-many relationship set to a relation, attributes of the relation must include:
  1) Keys for each participating entity set (as foreign keys). This set of attributes forms a superkey for the relation.
  2) All descriptive attributes.

CREATE TABLE Works_In(
  ssn CHAR(11),
  did INTEGER,
  since DATE,
  PRIMARY KEY (ssn, did),
  FOREIGN KEY (ssn)
    REFERENCES Employees,
  FOREIGN KEY (did)
    REFERENCES Departments)

ssn          did  since
123-22-3666  51   1/1/91
123-22-3666  56   3/3/93
231-31-5368  51   2/2/92

Review: Key Constraints
• Each dept has at most one manager, according to the key constraint on Manages.
[Figure: Employees (ssn, name, lot) related to Departments (did, dname, budget) through Manages (since); cardinality patterns: 1-to-1, 1-to-Many, Many-to-1, Many-to-Many]
Translation to relational model?
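The Employees and Works_In translations above run essentially unchanged in SQLite. This sketch assumes Python's sqlite3 module, a Departments(did, dname, budget) table (the slides never define one), a hypothetical name for department 56, hypothetical budgets, and ISO dates in place of the 1/1/91 style.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # foreign keys are off by default in SQLite
conn.executescript("""
CREATE TABLE Employees (ssn CHAR(11), name CHAR(20), lot INTEGER,
                        PRIMARY KEY (ssn));
-- Departments is not defined on the slides; (did, dname, budget) is assumed
CREATE TABLE Departments (did INTEGER, dname CHAR(20), budget REAL,
                          PRIMARY KEY (did));
CREATE TABLE Works_In (ssn CHAR(11), did INTEGER, since DATE,
                       PRIMARY KEY (ssn, did),
                       FOREIGN KEY (ssn) REFERENCES Employees,
                       FOREIGN KEY (did) REFERENCES Departments);
""")
conn.executemany("INSERT INTO Employees VALUES (?, ?, ?)", [
    ("123-22-3666", "Attishoo", 48),
    ("231-31-5368", "Smiley", 22),
    ("131-24-3650", "Smethurst", 35),
])
# Hypothetical department rows
conn.executemany("INSERT INTO Departments VALUES (?, ?, ?)",
                 [(51, "Pharmacy", 100000.0), (56, "Toys", 50000.0)])
conn.executemany("INSERT INTO Works_In VALUES (?, ?, ?)", [
    ("123-22-3666", 51, "1991-01-01"),
    ("123-22-3666", 56, "1993-03-03"),
    ("231-31-5368", 51, "1992-02-02"),
])

# The foreign keys reject a Works_In row for an employee who does not exist
try:
    conn.execute("INSERT INTO Works_In VALUES ('999-99-9999', 51, '2000-01-01')")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

REFERENCES Employees with no column list refers to the parent's primary key, which SQLite accepts, so the slide's DDL needs no change.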

Translating ER with Key Constraints
[Figure: Employees (ssn, name, lot) related to Departments (did, dname, budget) through Manages (since)]
• Since each department has a unique manager, we could instead combine Manages and Departments.

CREATE TABLE Manages(
  ssn CHAR(11),
  did INTEGER,
  since DATE,
  PRIMARY KEY (did),
  FOREIGN KEY (ssn) REFERENCES Employees,
  FOREIGN KEY (did) REFERENCES Departments)

Vs.

CREATE TABLE Dept_Mgr(
  did INTEGER,
  dname CHAR(20),
  budget REAL,
  ssn CHAR(11),
  since DATE,
  PRIMARY KEY (did),
  FOREIGN KEY (ssn) REFERENCES Employees)

Review: Participation Constraints
• Does every department have a manager?
  – If so, this is a participation constraint: the participation of Departments in Manages is said to be total (vs. partial).
    • Every did value in Departments table must appear in a row of the Manages table (with a non-null ssn value!)
[Figure: Employees related to Departments through Manages (since) and Works_In (since)]
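Why Dept_Mgr captures the key constraint can be checked in a short sketch: with PRIMARY KEY (did), a department has at most one row and hence at most one manager, while ssn remains free to repeat. The setup assumes SQLite through Python's sqlite3 module, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Employees (ssn CHAR(11), name CHAR(20), lot INTEGER, PRIMARY KEY (ssn));
CREATE TABLE Dept_Mgr (did INTEGER, dname CHAR(20), budget REAL,
                       ssn CHAR(11), since DATE,
                       PRIMARY KEY (did),
                       FOREIGN KEY (ssn) REFERENCES Employees);
INSERT INTO Employees VALUES ('123-22-3666', 'Attishoo', 48), ('231-31-5368', 'Smiley', 22);
INSERT INTO Dept_Mgr VALUES (51, 'Pharmacy', 100000, '123-22-3666', '1991-01-01');
""")

# A second manager for dept 51 would need a second row with did = 51,
# which PRIMARY KEY (did) forbids: this is the key constraint on Manages.
try:
    conn.execute("INSERT INTO Dept_Mgr VALUES "
                 "(51, 'Pharmacy', 100000, '231-31-5368', '1992-02-02')")
    second_manager_allowed = True
except sqlite3.IntegrityError:
    second_manager_allowed = False
print(second_manager_allowed)  # False

# One employee may still manage several departments (ssn is not a key here)
conn.execute("INSERT INTO Dept_Mgr VALUES (56, 'Toys', 50000, '123-22-3666', '1993-03-03')")
```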

Participation Constraints in SQL
• We can capture participation constraints involving one entity set in a binary relationship, but little else (without resorting to CHECK constraints).

CREATE TABLE Dept_Mgr(
  did INTEGER,
  dname CHAR(20),
  budget REAL,
  ssn CHAR(11) NOT NULL,
  since DATE,
  PRIMARY KEY (did),
  FOREIGN KEY (ssn) REFERENCES Employees
    ON DELETE NO ACTION)

Review: Weak Entities
• A weak entity can be identified uniquely only by considering the primary key of another (owner) entity.
  – Owner entity set and weak entity set must participate in a one-to-many relationship set (1 owner, many weak entities).
  – Weak entity set must have total participation in this identifying relationship set.
[Figure: Employees (ssn, name, lot) related through Policy (cost) to Dependents (pname, age)]
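A sketch of what NOT NULL and ON DELETE NO ACTION contribute to Dept_Mgr, again assuming SQLite through Python's sqlite3 module and hypothetical rows. SQLite checks a non-deferred NO ACTION foreign key at the end of each statement, so deleting a current manager is refused.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Employees (ssn CHAR(11), name CHAR(20), lot INTEGER, PRIMARY KEY (ssn));
CREATE TABLE Dept_Mgr (did INTEGER, dname CHAR(20), budget REAL,
                       ssn CHAR(11) NOT NULL, since DATE,
                       PRIMARY KEY (did),
                       FOREIGN KEY (ssn) REFERENCES Employees
                         ON DELETE NO ACTION);
INSERT INTO Employees VALUES ('123-22-3666', 'Attishoo', 48);
INSERT INTO Dept_Mgr VALUES (51, 'Pharmacy', 100000, '123-22-3666', '1991-01-01');
""")

def rejected(sql: str) -> bool:
    """Run one statement; report whether SQLite refused it."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# NOT NULL: a department row cannot exist without a manager (total participation)
print(rejected("INSERT INTO Dept_Mgr (did, dname, budget) VALUES (56, 'Toys', 50000)"))  # True
# NO ACTION: a manager cannot be deleted while still managing a department
print(rejected("DELETE FROM Employees WHERE ssn = '123-22-3666'"))  # True
```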

Translating Weak Entity Sets
• Weak entity set and identifying relationship set are translated into a single table.
  – When the owner entity is deleted, all owned weak entities must also be deleted.

CREATE TABLE Dep_Policy (
  pname CHAR(20),
  age INTEGER,
  cost REAL,
  ssn CHAR(11) NOT NULL,
  PRIMARY KEY (pname, ssn),
  FOREIGN KEY (ssn) REFERENCES Employees
    ON DELETE CASCADE)

Review: ISA Hierarchies
[Figure: Employees (ssn, name, lot) ISA Hourly_Emps (hourly_wages, hours_worked) and Contract_Emps (contractid)]
As in C++, or other PLs, attributes are inherited.
If we declare A ISA B, every A entity is also considered to be a B entity.
• Overlap constraints: Can Joe be an Hourly_Emps as well as a Contract_Emps entity? (Allowed/disallowed)
• Covering constraints: Does every Employees entity also have to be an Hourly_Emps or a Contract_Emps entity? (Yes/no)
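The ON DELETE CASCADE clause of Dep_Policy can be exercised the same way, assuming SQLite through Python's sqlite3 module. The dependent names, ages and costs are hypothetical; two dependents share the partial key pname under different owners, which the composite key (pname, ssn) permits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for ON DELETE CASCADE in SQLite
conn.executescript("""
CREATE TABLE Employees (ssn CHAR(11), name CHAR(20), lot INTEGER, PRIMARY KEY (ssn));
CREATE TABLE Dep_Policy (pname CHAR(20), age INTEGER, cost REAL,
                         ssn CHAR(11) NOT NULL,
                         PRIMARY KEY (pname, ssn),
                         FOREIGN KEY (ssn) REFERENCES Employees
                           ON DELETE CASCADE);
INSERT INTO Employees VALUES ('123-22-3666', 'Attishoo', 48),
                             ('231-31-5368', 'Smiley', 22);
-- The partial key pname is unique per owner, not globally
INSERT INTO Dep_Policy VALUES ('Alex', 10, 50.0, '123-22-3666'),
                              ('Alex', 7, 50.0, '231-31-5368'),
                              ('Sam', 12, 50.0, '123-22-3666');
""")

# Deleting the owner removes all of that owner's weak entities
conn.execute("DELETE FROM Employees WHERE ssn = '123-22-3666'")
remaining = conn.execute("SELECT pname, ssn FROM Dep_Policy").fetchall()
print(remaining)  # [('Alex', '231-31-5368')]
```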

Translating ISA Hierarchies to Relations
• General approach:
  – 3 relations: Employees, Hourly_Emps and Contract_Emps.
    • Hourly_Emps: Every employee is recorded in Employees. For hourly emps, extra info recorded in Hourly_Emps (hourly_wages, hours_worked, ssn); must delete Hourly_Emps tuple if referenced Employees tuple is deleted.
    • Queries involving all employees easy, those involving just Hourly_Emps require a join to get some attributes.
• Alternative: Just Hourly_Emps and Contract_Emps.
  – Hourly_Emps: ssn, name, lot, hourly_wages, hours_worked.
  – Each employee must be in one of these two subclasses.

Now you try it
University database:
• Courses, Students, Teachers
• Courses have ids, titles, credits, …
• Courses have multiple sections that have time/rm and exactly one teacher
• Must track students' course schedules and transcripts including grades, semester taken, etc.
• Must track which classes a professor has taught
• Database should work over multiple semesters
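The general approach for ISA hierarchies (three relations) can be sketched as follows, assuming SQLite via Python's sqlite3 module. Using ON DELETE CASCADE on each subclass table is one way, assumed here, to delete the Hourly_Emps tuple when its Employees tuple is deleted; the wage and contract values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Employees (ssn CHAR(11), name CHAR(20), lot INTEGER, PRIMARY KEY (ssn));
-- Each subclass table holds only its extra attributes plus the inherited key
CREATE TABLE Hourly_Emps (ssn CHAR(11), hourly_wages REAL, hours_worked REAL,
                          PRIMARY KEY (ssn),
                          FOREIGN KEY (ssn) REFERENCES Employees ON DELETE CASCADE);
CREATE TABLE Contract_Emps (ssn CHAR(11), contractid INTEGER,
                            PRIMARY KEY (ssn),
                            FOREIGN KEY (ssn) REFERENCES Employees ON DELETE CASCADE);
INSERT INTO Employees VALUES ('123-22-3666', 'Attishoo', 48),
                             ('231-31-5368', 'Smiley', 22);
INSERT INTO Hourly_Emps VALUES ('123-22-3666', 12.5, 40);
INSERT INTO Contract_Emps VALUES ('231-31-5368', 7);
""")

# All employees: a single-table query
all_names = [r[0] for r in conn.execute("SELECT name FROM Employees ORDER BY name")]
# Hourly employees' names and wages: needs a join back to the superclass table
hourly = conn.execute("""SELECT E.name, H.hourly_wages
                         FROM Employees E JOIN Hourly_Emps H ON E.ssn = H.ssn""").fetchall()
print(all_names)  # ['Attishoo', 'Smiley']
print(hourly)     # [('Attishoo', 12.5)]
```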

Other SQL DDL Facilities
• Integrity Constraints (ICs) - Review
• An IC describes conditions that every legal instance of a relation must satisfy.
  – Inserts/deletes/updates that violate ICs are disallowed.
  – Can be used to ensure application semantics (e.g., sid is a key), or prevent inconsistencies (e.g., sname has to be a string, age must be < 200)
• Types of ICs: Domain constraints, primary key constraints, foreign key constraints, general constraints.
  – Domain constraints: Field values must be of right type. Always enforced.
  – Primary key and foreign key constraints: you know them.

General Constraints
• Useful when more general ICs than keys are involved.
• Can use queries to express constraint.
• Checked on insert or update.
• Constraints can be named.

CREATE TABLE Sailors
  ( sid INTEGER,
    sname CHAR(10),
    rating INTEGER,
    age REAL,
    PRIMARY KEY (sid),
    CHECK ( rating >= 1
            AND rating <= 10 ))

CREATE TABLE Reserves
  ( sname CHAR(10),
    bid INTEGER,
    day DATE,
    PRIMARY KEY (bid, day),
    CONSTRAINT noInterlakeRes
    CHECK ('Interlake' <>
           ( SELECT B.bname
             FROM Boats B
             WHERE B.bid = Reserves.bid)))

Constraints Over Multiple Relations
Number of boats plus number of sailors is < 100

CREATE TABLE Sailors
  ( sid INTEGER,
    sname CHAR(10),
    rating INTEGER,
    age REAL,
    PRIMARY KEY (sid),
    CHECK
    ( (SELECT COUNT (S.sid) FROM Sailors S)
    + (SELECT COUNT (B.bid) FROM Boats B) < 100 ))

• Awkward and wrong!
• Only checks sailors!
• Only required to hold if the associated table is non-empty.
• ASSERTION is the right solution; not associated with either table.

CREATE ASSERTION smallClub
CHECK
( (SELECT COUNT (S.sid) FROM Sailors S)
+ (SELECT COUNT (B.bid) FROM Boats B) < 100 )

• Unfortunately, not supported in many DBMS.
• Triggers are another solution.

Or, Use a Trigger
• Trigger: procedure that starts automatically if specified changes occur to the DBMS
• Three parts:
  – Event (activates the trigger)
  – Condition (tests whether the trigger should run)
  – Action (what happens if the trigger runs)
• Triggers (in some form) are supported by most DBMSs; Assertions are not.
• Support for triggers is defined in the SQL:1999 standard.
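SQLite accepts the rating CHECK on Sailors, but it does not allow subqueries inside CHECK and has no CREATE ASSERTION. The sketch below therefore emulates smallClub with one trigger per table, following the "Triggers are another solution" bullet. It assumes Python's sqlite3 module; the Boats schema, the trigger names and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sailors (sid INTEGER, sname CHAR(10), rating INTEGER, age REAL,
                      PRIMARY KEY (sid),
                      CHECK (rating >= 1 AND rating <= 10));
-- Boats schema is assumed: the slides only use B.bid and B.bname
CREATE TABLE Boats (bid INTEGER PRIMARY KEY, bname CHAR(10));

-- Emulate smallClub: before any insert into either table, abort if the
-- new row would bring the total number of sailors plus boats to 100.
CREATE TRIGGER smallClub_sailors BEFORE INSERT ON Sailors
BEGIN
  SELECT RAISE(ABORT, 'smallClub violated')
  WHERE (SELECT COUNT(*) FROM Sailors) + (SELECT COUNT(*) FROM Boats) >= 99;
END;
CREATE TRIGGER smallClub_boats BEFORE INSERT ON Boats
BEGIN
  SELECT RAISE(ABORT, 'smallClub violated')
  WHERE (SELECT COUNT(*) FROM Sailors) + (SELECT COUNT(*) FROM Boats) >= 99;
END;
""")

def accepted(sql: str, params=()) -> bool:
    """Run one statement; report whether SQLite accepted it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.DatabaseError:  # CHECK and RAISE failures surface as IntegrityError, a subclass
        return False

# The CHECK constraint is evaluated on every insert
print(accepted("INSERT INTO Sailors VALUES (1, 'Dustin', 7, 45.0)"))   # True
print(accepted("INSERT INTO Sailors VALUES (2, 'Lubber', 11, 55.5)"))  # False: rating > 10

# Fill up to 99 rows in total (1 sailor + 98 boats), then try one more
for bid in range(1, 99):
    conn.execute("INSERT INTO Boats VALUES (?, ?)", (bid, f"boat{bid}"))
print(accepted("INSERT INTO Sailors VALUES (3, 'Rusty', 10, 35.0)"))  # False: total would be 100
```

Unlike the assertion, these triggers only guard inserts; an update that changes no row counts cannot break the rule, so that is enough here, but a more general assertion would need triggers on every relevant event.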

Triggers
CREATE TRIGGER trigger_name ON table
{FOR {[INSERT][,][UPDATE][,][DELETE]}
  [WITH APPEND]
  AS
  sql-statements}
• Cannot be called directly – initiated by events on the database.
• Can be synchronous or asynchronous with respect to the transaction that causes it to be fired.

Triggers: Example (SQL Server Transact-SQL syntax)
CREATE TRIGGER member_delete
ON member FOR DELETE
AS
IF (SELECT COUNT (*) FROM loan INNER JOIN deleted
    ON loan.member_no = deleted.member_no) > 0
BEGIN
  PRINT 'ERROR - member has books on loan.'
  ROLLBACK TRANSACTION
END
ELSE
  DELETE reservation
  FROM reservation INNER JOIN deleted
    ON reservation.member_no = deleted.member_no
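The member_delete example is Transact-SQL. A rough SQLite adaptation, assuming minimal hypothetical member, loan and reservation tables and Python's sqlite3 module, uses a row-level BEFORE DELETE trigger in which OLD plays the role of the deleted pseudo-table and RAISE(ABORT, ...) stands in for PRINT plus ROLLBACK TRANSACTION.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE member (member_no INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE loan (isbn TEXT, member_no INTEGER);
CREATE TABLE reservation (isbn TEXT, member_no INTEGER);

-- Row-level version of member_delete: OLD is the member row being deleted
CREATE TRIGGER member_delete BEFORE DELETE ON member
BEGIN
  SELECT RAISE(ABORT, 'ERROR - member has books on loan.')
  WHERE EXISTS (SELECT 1 FROM loan WHERE loan.member_no = OLD.member_no);
  DELETE FROM reservation WHERE reservation.member_no = OLD.member_no;
END;

INSERT INTO member VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO loan VALUES ('isbn-1', 1);
INSERT INTO reservation VALUES ('isbn-2', 1), ('isbn-3', 2);
""")

try:
    conn.execute("DELETE FROM member WHERE member_no = 1")  # has a loan: aborted
except sqlite3.DatabaseError as e:  # IntegrityError is a subclass
    print(e)  # the trigger's error message

conn.execute("DELETE FROM member WHERE member_no = 2")  # no loans: reservations go too
print(conn.execute("SELECT * FROM reservation").fetchall())  # [('isbn-2', 1)]
```

The event, condition and action parts from the "Or, Use a Trigger" slide map onto BEFORE DELETE, the WHERE EXISTS test and the trailing DELETE respectively.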

Summary of Conceptual Design
• Conceptual design follows requirements analysis
  – Yields a high-level description of data to be stored
• ER model popular for conceptual design
  – Constructs are expressive, close to the way people think about their applications.
  – Note: There are many variations on ER model
    • Both graphically and conceptually
• Basic constructs: entities, relationships, and attributes (of entities and relationships).
• Some additional constructs: weak entities, ISA hierarchies, and aggregation.

Summary: Triggers, Assertions, Constraints
• Very vendor-specific (although a standard has been developed).
• Triggers vs. Constraints and Assertions:
  – Triggers are "operational", others are declarative.
• Triggers can make the system hard to understand if not used with caution.
  – ordering of multiple triggers
  – recursive/chain triggers
• Triggers can be hard to optimize.
• But, triggers are also very powerful.
• Use to create high-performance, "active" databases.

Summary of ER (Cont.)
• Several kinds of integrity constraints:
  – key constraints
  – participation constraints
  – overlap/covering for ISA hierarchies.
• Some foreign key constraints are also implicit in the definition of a relationship set.
• Many other constraints (notably, functional dependencies) cannot be expressed.
• Constraints play an important role in determining the best database design for an enterprise.

Summary of ER (Cont.)
• ER design is subjective. There are often many ways to model a given scenario!
• Analyzing alternatives can be tricky, especially for a large enterprise. Common choices include:
  – Entity vs. attribute, entity vs. relationship, binary or n-ary relationship, whether or not to use ISA hierarchies, aggregation.
• Ensuring good database design: resulting relational schema should be analyzed and refined further.
  – Functional Dependency information and normalization techniques are especially useful.