The Relational Model
What DB models exist?
Most widely current model: Relational Model Vendors: IBM DB2, Microsoft, Oracle, Sybase, etc… “Legacy systems” have older models E.g., IBM’s IMS (hierarchical), CODASYL network model Recent and future competitors: object-oriented model: ObjectStore, Versant Object-relational model: Oracle, DB2 Semi-structured: XML Integrated in all major relational database systems Native XML database systems
421B: Database Systems - The Relational Model 2
1 Definitions
Relational Database: a set of relations Relation: Consists of two parts: Schema: specifies name of relation, plus a set of attributes, plus the domain/type of each attribute E.g., Students(sid:string, name:string, login:string, faculty:string, gpa:real) Instance: set of tuples (all tuples are distinct). A relation can be seen as a table: Column headers = attribute names, rows = tuples/records, columns/fields = attribute values #Rows = cardinality #Fields = degree / arity If clear from context, we say instead of “instance of a relation” simply “relation” Database Schema: collection of relation schemas
421B: Database Systems - The Relational Model 3
Example of Students Relation
sid name login faculty gpa Column headers = attributes 53666 Bartoli bartoli@cs Science 3.4 53688 Chang chang@eecs Eng 3.2 Cardinality = 3 53650 Chang chang@math Science 3.8 Degree = 5 ...
All rows are distinct (set-oriented) Rows are not ordered (a permutation of rows represents still the same table) Columns are per definition not ordered but in practice we often assume a fixed order with this, a single tuple can be represented as (53666, Bartoli, bartoli@cs, Science, 3.4)
421B: Database Systems - The Relational Model 4
2 Relational DDL and DML
Data Definition Language (DDL): defines the schema of a database Data Manipulation Language (DML): “manipulates” the data, i.e., the instances of the relations Insert, update, delete tuples “Query” the relations: retrieve tuples that fulfill certain criteria (hence, often called “query language”) The Relational Model offers simple and powerful querying of data with precise semantics independent of how data is stored or whether changes in the physical structure are made (physical data independence)
421B: Database Systems - The Relational Model 5
The SQL Query Language
Developed by IBM (system R) in the 1970s Need for a standard since it is used by many vendors Standards: SQL-86 … SQL-99 / SQL3 (adds object-relational features) SQL:2003 (adds XML features)
421B: Database Systems - The Relational Model 6
3 SQL Data Types
All attributes must have a data type. SQL supports several basic data types Character and string types CHAR(n) denotes a character string of fixed length (containing trailing blanks for padding if necessary). VARCHAR2(n) denotes a string of up to n characters (between 0 and n characters). SQL permits reasonable coercion between values of character- string types Integer Types INT or INTEGER (names are synonyms) SHORTINT
421B: Database Systems - The Relational Model 7
Data Types (contd.)
Floating point numbers FLOAT or REAL (names are synonyms) DOUBLE PRECISION DECIMAL(n,d): real number with fixed decimal point. Value consists of n digits, with the decimal point d positions from the right. Dates and time: DATE: has the form ‘YYYY-MM-DD’ TIME: has the form ‘15:00:02’ or ‘15:00:02.5’ May be compared and converted to string types Bit strings User defined domains New name for a data type Possibility to define restrictions on values of domain (< 10)
421B: Database Systems - The Relational Model 8
4 Data Definition: Table Creation
Defines all attributes of the CREATE TABLE Students relation (sid CHAR(20), The type/domain of each name VARCHAR2(20), attribute is specified DBMS enforce correct type login CHAR(10), whenever a tuple is added or faculty VARCHAR(20), modified gpa REAL SQL is case insensitive DEFAULT 0.0) It is possible to define default values CREATE TABLE Enrolled Special NULL value: ‘unknown’ (sid CHAR(20), cid CHAR(20), grade CHAR(2))
421B: Database Systems - The Relational Model 9
Data Definition: Destroying and Altering Relations
DROP TABLE Students Destroys the relation Students. The schema information and the tuples are deleted.
ALTER TABLE Students ADD COLUMN firstYear:integer The schema of students is altered by adding a new field; every existing tuple in the current instance is extended with a null value in the new field.
421B: Database Systems - The Relational Model 10
5 Data Manipulation: insert, delete, update
Insert a single tuple using: INSERT INTO Students (sid,name,login,faculty,gpa) VALUES(53688,’Chang’,’cheng@eecs’,’Eng’,3.2) INSERT INTO Students (sid,name,login,faculty) VALUES(53688,’Chang’,NULL,’Eng’) Can delete all tuples satisfying some condition DELETE FROM Students WHERE name = ‘Chang’ Can update all tuples satisfying some condition
UPDATE Students SET gpa = 3.4 WHERE sid = 53688
421B: Database Systems - The Relational Model 11
Querying the Data
Find the names and gpa of all students with gpa less than 3.5 SELECT name, gpa FROM Students WHERE gpa < 3.5
sid name login faculty gpa name gpa 53666 Bartoli bartoli@cs Science 3.4 53688 Chang chang@eecs Eng 3.2 Bartoli 3.4 53650 Chang chang@math Science 3.8 Chang 3.2
421B: Database Systems - The Relational Model 12
6 Integrity Constraints (ICs)
Integrity Constraints must be true for any instance of the database; e.g., domain constraints ICs are specified when schema is defined ICs are checked when relations are modified A legal instance of a relation is one that satisfies all specified ICs. DBMS should not allow illegal instances If DBMS checks ICs, stored data is more faithful to real-world meaning (also checks for entry errors) Of course, DBMS can only check what is specified in the schema
421B: Database Systems - The Relational Model 13
Primary Key Constraints
A set of fields is a key for a relation if No two distinct tuples can have same values in all key fields, and This is not true for any subset of the key. If there are two or more keys, one of the candidates is chosen to be the primary key. The primary key attributes of a tuple may not be NULL. A set of fields that contains a subset of fields fulfilling the key constraint is called a superkey E.g. sid is a key for Students. (What about name?). The set (sid,gpa) is a superkey
421B: Database Systems - The Relational Model 14
7 Primary and Candidate Keys in SQL
Possibly many candidate keys exist, one of which is chosen as CREATE TABLE Students the primary key (sid CHAR(20) PRIMARY KEY, Each student has a unique id. name VARCHAR2(20), For a given student and course, … there is a single grade; further, no two students in a course receive the same grade CREATE TABLE Enrolled Application dependent (sid CHAR(20), Defined carelessly, an IC can cid CHAR(20), prevent the storage of database grade CHAR(2), instances that arise in practice! PRIMARY KEY (sid,cid), UNIQUE (cid,grade))
421B: Database Systems - The Relational Model 15
Foreign Key
Foreign Key: Set of fields in one relation that is used to “refer” to a tuple in another relation. Must correspond to the primary key of the second relation. Represents a “logical pointer”. Examples in relation Enrolled, sid is a foreign key referring to Students: Students(sid:CHAR(20),name:VARCHAR(20),login:CHAR(10), faculty:VARCHAR(20), gpa:REAL) Enrolled(sid:CHAR(20),cid:CAHR(20),grade:CHAR(2))
421B: Database Systems - The Relational Model 16
Foreign Key Constraint: the foreign key value of a tuple must represent an existing tuple in the referred relation Enrollment may only contain a tuple of a student who exists in the Students relation If all foreign key constraints are enforced, referential integrity is achieved, i.e., no dangling references
421B: Database Systems - The Relational Model 17
Foreign Keys in SQL
CREATE TABLE Enrolled Only students listed in (sid CHAR(20), the Students relation cid CHAR(20), should be allowed to grade CHAR(2), enroll for courses PRIMARY KEY (sid,cid) FOREIGN KEY (sid) REFERENCES Students)
sid cid grade sid name login faculty gpa 53666 Topology112 C 53666 Reggae203 B 53666 Bartoli bartoli@cs Science 3.4 53688 Chang chang@eecs Eng 3.2 53650 Topology112 A 53650 Chang chang@math Science 3.8 53668 History105 B
421B: Database Systems - The Relational Model 18
9 Enforcing Referential Integrity
An Enrolled tuple with a sid is inserted such that no tuple with this sid exists in Students Disallow insertion A Students tuple is deleted Delete all Enrolled tuples that refer to it Disallow the deletion of a Students tuple to which Enrolled tuples point Set sid in Enrolled tuples that refer to it to “default sid” (in SQL set sid in Enrolled tuples that refer to it to NULL value) The primary key of a Students tuple is changed Update the sid of all Enrolled tuples that refer to the original value Further options similar to delete
421B: Database Systems - The Relational Model 19
Referential Integrity in SQL/92
SQL standard supports all CREATE TABLE Enrolled 4 options on deletes and (sid CHAR(20), updates cid CHAR(20), Default is NO ACTION grade CHAR(2), delete/update is rejected PRIMARY KEY (sid,cid) CASCADE FOREIGN KEY (sid) also delete/update all REFERENCES Students tuples that refer to the deleted/updated tuple ON DELETE CASCADE SET NULL / SET DEFAULT ON UPDATE SET NULL) Set foreign key value of referencing tuple to NULL / given default
421B: Database Systems - The Relational Model 20
10 Where do ICs come from?
ICs are based on the semantics of the real-world enterprise that is being described in the database relations We can check a database instance to see if an IC is violated, but we can NEVER infer that an IC is true by looking at an instance An IC is a statement about all possible instances For example, we might think that the student name is a key because in a given instance there do not exist two tuples with the same name; but it might happen in the future. Key and foreign key ICs are the most common; more general ICs supported, too. (Discussed later in the course)
421B: Database Systems - The Relational Model 21
Logical Design: ER to Relational
Employees(eid, name, salary) Entity sets to tables CREATE TABLE Employees (eid CHAR(11), name name VARCHAR(20), eid salary salary REAL, PRIMARY KEY (eid)) Employees Departments(did, dname, budget) CREATE TABLE Departments (did INTEGER, dname CHAR(20), Departments budget REAL, PRIMARY KEY (did)) did budget dname
421B: Database Systems - The Relational Model 22
11 Many-to-many Relationship Sets
Map relationship set to table. Attributes of the table must include Keys for each participating entity set (as foreign keys) This set of attributes forms the key for the relation All descriptive attributes
Works_in(eid, did, since)
eid name salary CREATE TABLE Works_In (eid CHAR(11), Employees did INTEGER, since DATE, since Works_in PRIMARY KEY (eid,did), FOREIGN KEY (eid) Departments REFERENCES Employees, FOREIGN KEY (did) did dname budget REFERENCES Departments)
421B: Database Systems - The Relational Model 23
Relationships Sets with Key Constraints
Alternative 1: map relationship set to table Many-one from entity set E1 to entity set E2: key of E1 i.e., key of entity-set with the key constraint is the key for the new relationship table (did is now the key) One-one: key of either entity set Separate tables for entity sets (Employees and Departments)
eid name salary Manages(eid, did, since)
Employees CREATE TABLE Manages (eid CHAR(11), did INTEGER, Manages since since DATE, PRIMARY KEY (did), FOREIGN KEY (eid) Departments REFERENCES Employees, FOREIGN KEY (did) did dname budget REFERENCES Departments)
421B: Database Systems - The Relational Model 24
12 Relationships Sets with Key Constraints (contd.) Alternative II: include relationship set in table of the entity set with the key constraint Possible because there is at most one relationship per entity Not useful if many entities do not have a relationship (wasted space, many not filled values)
DepartmentsM(did, dname, budget, eidmgr, since) eid name salary CREATE TABLE DepartmentsM Employees (did INTEGER, dname CHAR(20), Manages since budget REAL, eidmgr CHAR(11), since DATE, Departments PRIMARY KEY (did), FOREIGN KEY (eidmgr) did dname budget REFERENCES EMPLOYEES(eid)) 421B: Database Systems - The Relational Model 25
Relationship Sets with Participation Constraints and Key Constraints Include relationship set in table of the entity set with the key constraint and the participation constraint We can capture participation constraints involving one entity set in a binary relationship if it also has a key constraint, but little else (at least within the table definitions)
DepartmentsM(did, dname, budget, eidmgr, since) eid name salary
Employees CREATE TABLE DepartmentsM (did INTEGER, dname CHAR(20), Manages since budget REAL, eidmgr CHAR(11) NOT NULL, Departments since DATE, PRIMARY KEY (did), did dname budget FOREIGN KEY (eidmgr) REFERENCES EMPLOYEES(eid)) 421B: Database Systems - The Relational Model 26
13 Renaming
In the case the keys of the participating entity sets have the same names we must rename attributes accordingly
name Reports_To(super-eid, sub-eid) eid salary CREATE TABLE Reports_To Employees (supervisor_eid CHAR(11), subordinate_eid CHAR(11), supervisor subordinate PRIMARY KEY (supervisor_eid, Reports_To subordinate_eid), FOREIGN KEY (supervisor_eid) REFERENCES Employees(eid), FOREIGN KEY (subordinate_eid) REFERENCES Employees(eid))
421B: Database Systems - The Relational Model 27
Translating Weak Entity Sets
Weak entity set and identifying relationship set are translated into a single table When the owner entity is deleted, all owned weak entities must also be deleted Dependants_Policy(pname, dob, eid name salary cost, eid)
Employees CREATE TABLE Dependants_Policy (pname CHAR(20), dob DATE, Policy cost cost REAL, eid CHAR(11), Dependants PRIMARY KEY (pname,eid), FOREIGN KEY (eid) pname dateofbirth REFERENCES Employees, ON DELETE CASCADE) 421B: Database Systems - The Relational Model 28
14 Translating ISA Hierarchies
General Approach: distribute information among relations Relation of superclass stores the general attributes and defines key Relations of subclasses have key of superclass and addit. attributes when a tuple in super-relation is deleted, corresponding tuples in sub-relation must be deleted Employees(eid, name, salary) Contract_Emps(contract_id,eid)
CREATE TABLE Employees eid name salary (eid CHAR(20), name CHAR(20),salary REAL, Employees PRIMARY KEY (ssn)) CREATE TABLE Contract_Emps hours_worked Contract_id (contract_id INTEGER, hourly_wages ISA eid CHAR(11), PRIMARY KEY (eid), Hourly_Emps Contract_Emps FOREIGN KEY (eid) REFERENCES Employees, ON DELETE CASCADE) 421B: Database Systems - The Relational Model 29
Translating ISA Hierarchies (contd.)
Object-oriented approach: Sub-classes have all attributes; Employees(eid, name, salary) if an entity is in a sub-class it does not Hourly_Emps(eid,name,salary, appear in the super-class relation; hourly_wages,hours_worked) Pro/Contra: Contract_Emps(eid, name, + A query asking for all hourly employees salary,contract_id) only has to go to one relation (in general approach it has to read two relations) CREATE TABLE Employees - Query on general attributes of all (eid CHAR(20) PRIMARY KEY, employees has to read all three tables name CHAR(20),salary REAL) - If an entity is both Hourly_emps and CREATE TABLE Hourly_Emps Contract_emps, name and salary are (eid CHAR(20) PRIMARY KEY, stored twice => undesired redundancy name CHAR(20),salary REAL, hourly_wages REAL, hours_worked REAL) CREATE TABLE Contract_Emps (eid CHAR(20) PRIMARY KEY, name CHAR(20),salary REAL, contract_id INTEGER)
421B: Database Systems - The Relational Model 30
15 Translating ISA Hierarchies (contd.)
Last Alternative: one big Employees(eid,name,salary, relation hourly_wages,hours_worked, contract_id) Create only one relation for the root entity set CREATE TABLE Employees with all attributes found (eid CHAR(20), anywhere in its network of name CHAR(20), subclasses. salary REAL, Put NULL in attributes not hourly_wages REAL, relevant to a given entity hourly_worked REAL, contract_id INTEGER, PRIMARY KEY (eid))
421B: Database Systems - The Relational Model 31
Translating Aggregation
No key constraints Key constraint from Projects(pid,started_on,pbudget) Projects to Departments Projects(pid, started_on, Departments(did,dname,budget) pbudget, did, since) Employees(eid,name,salary) No Sponsors Sponsors(pid,did,since) Monitors(pid, eid, until) Monitors(pid,did,eid,until) eid Key constraint from Sponsorsname Employees to Employees salary Sponsors(pid,did,eid,since,until) No Monitors Monitors until
since Started_on did dname budget pid pbudget Projects Sponsors Departments
421B: Database Systems - The Relational Model 32
16 Relational Model: Summary
A tabular representation of data Simple and intuitive, currently the most widely used model. Integrity constraints can be specified based on application semantics (up to a certain degree). DBMS checks for violations Two important ICs : primary and foreign keys In addition, we always have domain constraints Powerful and natural query languages exist Rules to translate ER to relational model
421B: Database Systems - The Relational Model 33
Review: Binary vs. Ternary
Putting any constraints inssn name lot pname dob upper picture: Key constraint on Policies Employees covers Dependants (to guarantee that each pid policy only owned by one cost employee), would also mean Policies that the policy can only cover one dependent Constraints of lower pname ssn name dob picture: lot Dependants Each dependant Employees determined by one policy Each policy is owned by beneficiary one employee purchaser
cost Policies pid 421B: Database Systems - The Relational Model 34
17 Binary vs. Ternary
Policies(pid, cost, eid) Key constraints allow us to Dependants(pname, dob, pid)
combine Purchaser with CREATE TABLE Policies Policies and Beneficiary with (pid INTEGER, Dependants cost REAL, eid CHAR(11) NOT NULL, Participation constraints PRIMARY KEY (pid), lead to NOT NULL FOREIGN KEY (eid) constraints (or primary key REFERENCES Employees in case of weak entity) ON DELETE CASCADE) CREATE TABLE Dependants What if policy is a weak (pname CHAR(20), entity? dob DATE, pid INTEGER, PRIMARY KEY (pname,pid), FOREIGN KEY (pid) REFERENCES Policies, ON DELETE CASCADE) 421B: Database Systems - The Relational Model 35
18