The Relational Model
 What is the Relational Model
  ▪ Relations
  ▪ Domain Constraints
 SQL
 Integrity Constraints
 Translating an ER diagram to the Relational Model and SQL
 Views

 A relational database consists of a collection of tables
  ▪ Each table has a unique name
  ▪ Each row represents a single entity or relationship
 A table corresponds to the mathematical concept of a relation
  ▪ Note that the definition of a relation* is not the same as the usual mathematical definition
  ▪ Relations correspond to tables
  ▪ Tuples correspond to rows
* per E.F. Codd: a relation is a set of tuples (d1, d2, ..., dn), where each element dj is a member of Dj, a data domain

 The key construct in the relational model is a relation
  ▪ A DB is a collection of relations
 Relations consist of a relation instance and a relation schema
  ▪ A relation instance corresponds to an actual table with a set of rows
  ▪ A relation schema consists of the column headings and domains of the table
 A schema consists of
  ▪ The relation's name
  ▪ The name of each column in the table, also known as a field, or attribute
  ▪ The domain of each field
  ▪ Each domain has a set of associated values, like a type
 For example, the Customer relation
  ▪ Customer = (sin, firstName, lastName, age, income)
  ▪ The schema also specifies the domain of each attribute

Relation schema: Customer = (sin, firstName, lastName, age, income)
Each field (attribute) has a domain: char(11), char(20), char(20), integer, real

Relation instance:
sin   firstName   lastName   age   income
111   Buffy       Summers    23    43000.00
222   Xander      Harris     22    6764.87
333   Rupert      Giles      47    71098.65
444   Dawn        Summers    17    4033.32

An instance is a (small) subset of the Cartesian product of the domains

 A domain of a field is similar to the type of a variable
  ▪ The values that can appear in a field are restricted to the field's domain
 A relation instance is a subset of the Cartesian product of the list of domains
  ▪ If a relation schema has domains D1 to Dn
  ▪ Where the domains are associated with fields f1 to fn
  ▪ A relation must be a subset of D1 × D2 × … × Dn-1 × Dn
  ▪ In practice a relation instance is usually some very small subset of the Cartesian product
 A relation instance is a set of tuples, or records
  ▪ Each tuple has the same number of fields as the schema
 No two tuples are identical
  ▪ Relations are defined to be sets of unique tuples
  ▪ A set is a collection of distinct values
  ▪ In practice a DBMS allows tables to have duplicate records in some circumstances
 The order of the tuples is not important
 The degree of a relation is the number of fields
  ▪ Also referred to as arity
  ▪ The degree of the Customer relation is 5
 The cardinality of a relation instance is the number of tuples it contains
  ▪ The cardinality of the Customer relation instance is 4
 A relational database is a collection of relations with distinct relation names
  ▪ A relational database schema is the collection of schemas for all relations in the database

Relational model      Database terminology
Relation schema       Table schema
Relation instances    Tables (relations)
Fields                Columns or attributes
Tuples                Records or rows
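The definitions above (instance as a subset of the Cartesian product, degree, cardinality, no duplicate tuples) can be checked with a small sketch. The two domains below are hypothetical and deliberately tiny so that the full product can be enumerated; a real domain such as char(11) is far too large for that.

```python
from itertools import product

# Hypothetical, deliberately tiny domains so the Cartesian product stays small.
sin_domain = {"111", "222"}
age_domain = {22, 23}

# The Cartesian product D1 x D2 contains every possible (sin, age) pair.
cartesian = set(product(sin_domain, age_domain))

# A relation instance is a set of tuples drawn from that product.
instance = {("111", 23), ("222", 22)}

degree = len(next(iter(instance)))  # number of fields in each tuple
cardinality = len(instance)         # number of tuples in the instance

print(instance <= cartesian)                # True: the instance is a subset
print(len(cartesian), degree, cardinality)  # 4 2 2

# Adding an identical tuple changes nothing, because a set holds distinct values.
instance.add(("111", 23))
print(len(instance))  # 2
```

Using a Python set mirrors the formal definition; as noted above, an actual DBMS table may behave like a bag and keep duplicates.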

 SQL stands for Structured Query Language
 SQL is divided into two parts
  ▪ Data Manipulation Language (DML) which allows users to create, modify and query data
  ▪ Data Definition Language (DDL) which is used to define external and conceptual schemas
 The DDL supports the creation, deletion and modification of tables
  ▪ Including the specification of domain constraints and other constraints
 To create a table use the CREATE TABLE statement
  ▪ Specify the table name, field names and domains

CREATE TABLE Customer (
  sin CHAR(11),
  firstName CHAR(20),
  lastName CHAR(20),
  age INTEGER,
  income REAL )

Question – is SQL case sensitive?
Answer – SQL keywords (create and table for example) are not case sensitive. Named objects (tables, columns etc.) may be.

 To insert a record into an existing table use the INSERT statement
  ▪ The list of column names is optional
  ▪ If omitted the values must be in the same order as the columns

INSERT INTO Customer(sin, firstName, lastName, age, income)
VALUES ('111', 'Sam', 'Spade', 23, 65234)

 To delete a record use the DELETE statement
  ▪ The WHERE clause specifies the record(s) to be deleted

DELETE FROM Customer WHERE sin = '111'
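The CREATE TABLE, INSERT and DELETE statements above can be tried end to end with a minimal sketch. It assumes Python's built-in sqlite3 module and an in-memory SQLite database; SQLite treats declared types such as CHAR(11) loosely, so other systems may enforce domains more strictly. The second customer is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("""
    CREATE TABLE Customer (
        sin       CHAR(11),
        firstName CHAR(20),
        lastName  CHAR(20),
        age       INTEGER,
        income    REAL)
""")

# Column list given explicitly.
conn.execute(
    "INSERT INTO Customer(sin, firstName, lastName, age, income) "
    "VALUES ('111', 'Sam', 'Spade', 23, 65234)")
# Column list omitted, so the values must follow the column order of the table.
conn.execute("INSERT INTO Customer VALUES ('222', 'Miles', 'Archer', 40, 50000)")

count_before = conn.execute("SELECT COUNT(*) FROM Customer").fetchone()[0]

# Only the record matching the WHERE clause is removed.
conn.execute("DELETE FROM Customer WHERE sin = '111'")
remaining = conn.execute("SELECT sin, lastName FROM Customer").fetchall()

print(count_before)  # 2
print(remaining)     # [('222', 'Archer')]
```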

 Be careful, the following SQL statement deletes all the records in a table

DELETE FROM Customer

 Use the UPDATE statement to modify a record, or records, in a table
  ▪ Note that the WHERE clause is evaluated before the SET clause
 As with DELETE, the WHERE clause specifies which records are to be updated

UPDATE Customer SET age = 37 WHERE sin = '111'

 To delete a table use the DROP TABLE statement
  ▪ This not only deletes all of the records but also deletes the table schema

DROP TABLE Customer

  ▪ http://xkcd.com/327/
 Columns can be added to or removed from tables using the ALTER TABLE statement
  ▪ ADD to add a column and
  ▪ DROP to remove a column

ALTER TABLE Customer ADD height INTEGER

ALTER TABLE Customer DROP height

 An integrity constraint restricts the data that can be stored in a DB
  ▪ To prevent invalid data being added to the DB
  ▪ e.g. two people with the same SIN or
  ▪ someone with a negative age
 When a DB schema is defined, the associated integrity constraints should also be specified
 A DBMS checks every update to ensure that it does not violate any integrity constraints

 Domain Constraints
  ▪ Specified when tables are created by selecting the type of the data
  ▪ e.g. age INTEGER
 Key Constraints
  ▪ Identify primary keys and other candidate keys
 Foreign Key Constraints
  ▪ Reference primary keys of other tables
 General constraints

 A key constraint states that a minimal subset of the fields in a relation is unique
  ▪ i.e. a candidate key
 SQL allows two sorts of key constraints
 A UNIQUE constraint identifies a candidate key
  ▪ Records may not have duplicate values for the set of attributes specified in the UNIQUE constraint
 A PRIMARY KEY constraint identifies the primary key
  ▪ Records may not have duplicate primary keys and
  ▪ May not have null values for primary key attributes
 A primary key identifies the unique set of attributes that will be used to identify a record in a table
  ▪ In some cases there are obvious primary keys in the problem domain (e.g. SIN)
  ▪ In other cases, a primary key may be generated by the system
 Candidate keys are identified to record the fact that a set of attributes should be unique
  ▪ Preventing duplicate data being entered into the table for this set of attributes

 Assume that a patient can be uniquely identified by either SIN or MSP number
  ▪ SIN is chosen as the primary key

CREATE TABLE Patient (
  sin CHAR(11),
  msp CHAR(15),
  firstName CHAR(20),
  lastName CHAR(20),
  age INTEGER,
  CONSTRAINT unique_msp UNIQUE (msp),
  PRIMARY KEY (sin) )

 Consider the abbreviated ERD shown below
  ▪ Takes is a many-to-many relationship, so a database table must be created for it (more on this later)
  ▪ It makes no sense for the Takes table to be allowed to contain the student ID of students who do not exist
 The student ID in Takes should be identified as a foreign key that references Student
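Returning to the Patient table defined earlier, the following is a minimal sketch of how its UNIQUE and PRIMARY KEY constraints reject bad rows. It assumes Python's sqlite3 module; the patient data and the helper try_insert are made up. One SQLite-specific assumption is labelled in the code: SQLite does not imply NOT NULL for an ordinary PRIMARY KEY, so it is written out here to get the behaviour the SQL standard describes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Patient (
        sin       CHAR(11) NOT NULL,  -- explicit because SQLite does not imply it for PRIMARY KEY
        msp       CHAR(15),
        firstName CHAR(20),
        lastName  CHAR(20),
        age       INTEGER,
        CONSTRAINT unique_msp UNIQUE (msp),
        PRIMARY KEY (sin))
""")
conn.execute("INSERT INTO Patient VALUES ('111', 'M-1', 'Sam', 'Spade', 23)")


def try_insert(row):
    """Hypothetical helper: return 'ok', or 'rejected: ...' if a constraint fails."""
    try:
        conn.execute("INSERT INTO Patient VALUES (?, ?, ?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


results = [
    try_insert(("111", "M-2", "Effie", "Perine", 30)),  # duplicate primary key
    try_insert(("222", "M-1", "Miles", "Archer", 40)),  # duplicate candidate key (msp)
    try_insert((None, "M-3", "Joel", "Cairo", 35)),     # null primary key
    try_insert(("333", "M-4", "Iva", "Archer", 33)),    # violates nothing
]
row_count = conn.execute("SELECT COUNT(*) FROM Patient").fetchone()[0]

for outcome in results:
    print(outcome)
print(row_count)  # 2: the original row plus the one valid insert
```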

[Figure: ERD in which Student and Course are linked by the takes relationship]

 A foreign key constraint references the primary key of another relation
 The value of the foreign key attribute(s) must match a primary key in the referenced table

[Figure: Student, Takes and Course, labelled "referencing relation" and "referenced relation"; Takes references Student]

 The attributes of the foreign key and the referenced primary key must be consistent
  ▪ The same number of attributes, with
  ▪ Compatible attribute types, but
  ▪ The attribute names may be different
 A foreign key references the entire primary key
  ▪ Where the primary key is compound the foreign key must also be compound
  ▪ And not multiple single attribute foreign keys

[Figure: ERD, Customer owns Account; an account must be owned by a single customer]

CREATE TABLE Account (
  accountNumber INTEGER,
  type CHAR(5),
  balance REAL,
  custSIN CHAR(11),
  PRIMARY KEY (accountNumber),
  CONSTRAINT fk_customer FOREIGN KEY (custSIN) REFERENCES Customer )

 A DB may require constraints other than primary keys, foreign keys and domain constraints
  ▪ Limiting domain values to subsets of the domain
  ▪ e.g. limit grade to the set of legal grades
  ▪ Or other constraints involving multiple attributes
 SQL supports two kinds of general constraint
  ▪ Table constraints associated with a single table
  ▪ Assertions which may involve several tables and are checked when any of these tables are modified
 These will be covered later in the course

 Whenever a table with a constraint is modified the constraint must be checked
  ▪ A modification can be a deletion, a change (update) or an insertion of a record
  ▪ If a transaction violates a constraint what happens?
 Primary key constraints
  ▪ A primary key constraint can be violated in two ways
  ▪ A record can be changed or inserted so that it duplicates the primary key of another record in the table or
  ▪ A record can be changed or inserted so that (one of) the primary key attribute(s) is null
  ▪ In either case the transaction is rejected

 Consider these schemas (the domains have been omitted for brevity)
  ▪ Customer = (sin, firstName, lastName, age, income)
  ▪ Account = (accountNumber, balance, type, sin)
  ▪ sin in Account is a foreign key referencing Customer

[Figure: ERD, Customer (sin, firstName, lastName, age, income) owns Account (accountNumber, balance, type)]

Account
number   balance    type   sin
761      904.33     CHQ    111
856      1011.45    CHQ    333
903      12.05      CHQ    222
1042     10000.00   SAV    333

Customer
sin   first    last      age   income
111   Buffy    Summers   23    43000.00
222   Xander   Harris    22    6764.87
333   Rupert   Giles     47    71098.65
444   Dawn     Summers   17    4033.32

 Inserting {409, 0, CHQ, 555} into Account violates the foreign key on sin as there is no sin 555 in Customer
  ▪ The insertion is rejected
  ▪ To process the insertion a Customer with a sin of 555 must first be inserted into the Customer table
 Changing the sin of an existing Account record to 555 violates the foreign key, again leading to the transaction being rejected
 Deleting Buffy violates the foreign key, because a record with that sin exists in the Account table
 Updating Buffy's sin to 666 also violates the foreign key, because a record with the old sin is in the Account table

 A deletion or update transaction in the referenced table may violate a foreign key
 Different responses can be specified in SQL
  ▪ Reject the transaction (the default) – NO ACTION
  ▪ Delete or update the referencing record – CASCADE
  ▪ Set the referencing record's foreign key attribute(s) to null – SET NULL
  ▪ Set the referencing record's foreign key attribute(s) to a default value – SET DEFAULT
  ▪ The default value must be declared for the foreign key column(s)
  ▪ Standard SQL allows each response for both deletions and updates, although not every DBMS supports all combinations
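The four violations described above, and the CASCADE response, can be reproduced with a short sketch. It assumes Python's sqlite3 module, where foreign keys are only enforced after PRAGMA foreign_keys = ON. The Customer table is trimmed to two columns, and the Card table and the helper attempt are hypothetical, added only to contrast ON DELETE CASCADE with the default NO ACTION.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript("""
    CREATE TABLE Customer (sin CHAR(11) PRIMARY KEY, firstName CHAR(20));
    CREATE TABLE Account (
        accountNumber INTEGER PRIMARY KEY,
        balance REAL,
        type CHAR(5),
        sin CHAR(11),
        FOREIGN KEY (sin) REFERENCES Customer);   -- default response: NO ACTION
    CREATE TABLE Card (                           -- hypothetical table, used for CASCADE only
        cardNumber INTEGER PRIMARY KEY,
        sin CHAR(11),
        FOREIGN KEY (sin) REFERENCES Customer ON DELETE CASCADE);
    INSERT INTO Customer VALUES ('111', 'Buffy'), ('222', 'Xander');
    INSERT INTO Account VALUES (761, 904.33, 'CHQ', '111');
    INSERT INTO Card VALUES (1, '222');
""")


def attempt(sql):
    """Hypothetical helper: run one statement and report whether a constraint rejected it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


# Changes to the referencing table: there is no Customer with sin 555.
insert_orphan = attempt("INSERT INTO Account VALUES (409, 0, 'CHQ', '555')")
update_orphan = attempt("UPDATE Account SET sin = '555' WHERE accountNumber = 761")

# Changes to the referenced table: account 761 still refers to sin 111.
delete_parent = attempt("DELETE FROM Customer WHERE sin = '111'")
update_parent = attempt("UPDATE Customer SET sin = '666' WHERE sin = '111'")

# Customer 222 has no accounts, and Card rows are removed by the cascade.
cascade_delete = attempt("DELETE FROM Customer WHERE sin = '222'")
cards_left = conn.execute("SELECT COUNT(*) FROM Card").fetchone()[0]

print(insert_orphan, update_orphan, delete_parent, update_parent)  # all rejected
print(cascade_delete, cards_left)  # ok 0
```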

 The ER model is a high level design which can be translated into a relational schema
  ▪ And therefore into SQL
 Each entity and relationship set can be represented by a unique relation
  ▪ Rows in an entity set table represent unique entities
  ▪ Rows in a relationship set table represent associations between entities
  ▪ Some relationship sets do not need to be represented by separate relations
  ▪ SQL constraints should be created to represent constraints identified in the ERD

 A table that represents an entity set should have the following characteristics
  ▪ One column for each attribute
  ▪ The domain of each attribute should be known and specified in the table
  ▪ The primary key should be specified in the table
  ▪ Other constraints that have been identified outside the ER model should also be created where possible

CREATE TABLE Customer (
  sin CHAR(11),
  firstName CHAR(20),
  lastName CHAR(20),
  age INTEGER,
  income REAL,
  PRIMARY KEY (sin) )

[Figure: Customer entity set with attributes sin, firstName, lastName, age, income]

 The attributes of a relationship set are
  ▪ The primary keys of the participating entity sets, and
  ▪ Any descriptive attributes of the relationship set
 The mapping cardinalities of the entities involved in a relationship determine
  ▪ The primary key of the relationship set and
  ▪ Whether or not the relationship set needs to be represented as a separate table
 Foreign keys should be created for the attributes derived from the participating entity sets

CREATE TABLE WorksIn (
  sin CHAR(11),
  branchName CHAR(30),
  startDate DATETIME,
  FOREIGN KEY (sin) REFERENCES Employee,
  FOREIGN KEY (branchName) REFERENCES Branch,
  PRIMARY KEY (sin, branchName) )

[Figure: ERD, Employee (sin, name, salary) worksIn (startDate) Branch (branchName, city, budget)]

 A separate table is required to represent a relationship set with no cardinality constraints
  ▪ i.e. relationships that are many to many
 The primary key of the relationship set is the union of the attributes derived from its entity sets
  ▪ A compound key made up of the primary keys of the participating entity sets
 Attributes derived from its entity sets should be declared as foreign keys

CREATE TABLE Manages (
  managerSIN CHAR(11),
  subordinateSIN CHAR(11),
  FOREIGN KEY (managerSIN) REFERENCES Employee,
  FOREIGN KEY (subordinateSIN) REFERENCES Employee,
  PRIMARY KEY (managerSIN, subordinateSIN) )
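As an illustration of the many-to-many case, here is a minimal sketch of the WorksIn table with its compound primary key. It assumes Python's sqlite3 module; the Employee and Branch tables are trimmed to a few columns and all of the data is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript("""
    CREATE TABLE Employee (sin CHAR(11) PRIMARY KEY, name CHAR(40));
    CREATE TABLE Branch (branchName CHAR(30) PRIMARY KEY, city CHAR(30));
    CREATE TABLE WorksIn (
        sin CHAR(11),
        branchName CHAR(30),
        startDate DATETIME,
        FOREIGN KEY (sin) REFERENCES Employee,
        FOREIGN KEY (branchName) REFERENCES Branch,
        PRIMARY KEY (sin, branchName));
    INSERT INTO Employee VALUES ('111', 'Sam'), ('222', 'Effie');
    INSERT INTO Branch VALUES ('Downtown', 'Vancouver'), ('Uptown', 'Vancouver');
    -- Many to many: one employee in two branches, one branch with two employees.
    INSERT INTO WorksIn VALUES ('111', 'Downtown', '2020-01-01');
    INSERT INTO WorksIn VALUES ('111', 'Uptown',   '2021-06-01');
    INSERT INTO WorksIn VALUES ('222', 'Downtown', '2019-03-15');
""")

# The compound key only forbids repeating the same (sin, branchName) pair.
try:
    conn.execute("INSERT INTO WorksIn VALUES ('111', 'Downtown', '2022-01-01')")
    duplicate_pair = "ok"
except sqlite3.IntegrityError:
    duplicate_pair = "rejected"

rows = conn.execute("SELECT COUNT(*) FROM WorksIn").fetchone()[0]
print(duplicate_pair, rows)  # rejected 3
```

Had the primary key been sin alone, the second row for employee 111 would have been rejected, which is the behaviour wanted for a many-to-one relationship rather than a many-to-many one.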

[Figure: ERD in which Employee (sin, firstName, lastName, salary) is related to itself by manages, in the roles manager and subordinate]

 Determine attributes for the table as for a relationship set without cardinality constraints
 To determine the primary key of a binary relationship set
  ▪ If the relationship is one to one then the primary key of either entity set can be its primary key
  ▪ But it must be only one of the two primary keys
  ▪ If the relationship is many to one, or one to many, the primary key of the many entity set is its primary key
  ▪ That is, the entity set whose entities can only appear once in the relationship set; the entity set with the key constraint

 If the relationship set has no cardinality constraints the primary key is a compound key
  ▪ Union of the primary keys of the participating entity sets
 If the relationship set has at least one cardinality constraint
  ▪ The primary key is taken from one of the entity sets with a many relationship
  ▪ That is, one of the entity sets whose entities can be involved in at most one relationship

 A binary relationship does not require a table if at least one of its entity sets has a key constraint
 Add the relationship set's attributes to the table for the entity set that provided the primary key
  ▪ If the entity's participation in the relationship is partial, some rows may not have data for these attributes
  ▪ Such rows will contain null for these fields, which wastes space, and has other undesirable properties
 The attributes from the other entity sets involved in the relationship are specified as foreign keys

CREATE TABLE Employee (
  sin CHAR(11),
  name CHAR(40),
  salary REAL,
  branchName CHAR(30),
  startDate DATETIME,
  FOREIGN KEY (branchName) REFERENCES Branch,
  PRIMARY KEY (sin) )

Since an employee can only work in one branch there is still only one row for each employee in the Employee table
No table is created for the worksIn relationship

[Figure: ERD, Employee (sin, name, salary) worksIn (startDate) Branch (branchName, address, budget)]

 Where possible participation constraints should be included in a table specification
  ▪ By declaring the attributes on which there is a foreign key as NOT NULL
  ▪ This approach only works when a relationship set is not represented in a separate table
  ▪ i.e. when the relationship is represented as a foreign key in a table that represents an entity set
  ▪ Specifying an attribute as NOT NULL forces every record of that entity set to have a value for the attribute
 Some participation constraints must be modeled using assertions, triggers or other methods

CREATE TABLE Owns (
  sin CHAR(11),
  accountNumber INTEGER,
  FOREIGN KEY (sin) REFERENCES Customer,
  FOREIGN KEY (accountNumber) REFERENCES Account,
  PRIMARY KEY (sin, accountNumber) )

sin and accountNumber cannot be null but the participation constraint is not represented. Why not?

[Figure: ERD, Customer (sin, firstName, lastName, birthdate, income) owns Account (accountNumber, type, balance)]

CREATE TABLE Employee (
  sin CHAR(11),
  name CHAR(40),
  salary REAL,
  branchName CHAR(30) NOT NULL,
  startDate DATETIME,
  FOREIGN KEY (branchName) REFERENCES Branch,
  PRIMARY KEY (sin) )

Making branchName NOT NULL ensures that every Employee must work in a branch

[Figure: ERD, Employee (sin, name, salary) worksIn (startDate) Branch (branchName, address, budget)]

CREATE TABLE WorksIn (
  sin CHAR(11),
  branchID INTEGER,
  -- FROM and TO are reserved words in SQL, so these column names are quoted
  "from" DATETIME,
  "to" DATETIME,
  FOREIGN KEY (sin) REFERENCES Employee,
  FOREIGN KEY (branchID) REFERENCES Branch,
  PRIMARY KEY (sin, branchID, "from", "to") )

Note that a table is not required for Duration as it would be redundant
All durations must appear in worksIn and Duration only has primary key attributes

[Figure: ERD of a ternary worksIn relationship among Employee (sin, firstName, lastName, salary), Branch (branchID, branchName, budget) and Duration (from, to)]

This ERD requires the following tables

[Figure: ERD, Customer (sin, firstName, lastName, birthdate, income) owns Account (accountNumber, type, balance); Customer is also linked to Employee (sin, name, salary) by the personal banker relationship]

  ▪ Account, with a NOT NULL foreign key, sin, that references Customer and represents Owns
  ▪ Customer, with foreign key empSIN to represent PersonalBanker
  ▪ And an Employee table

 A weak entity set has the following characteristics
  ▪ Total participation in the identifying relationship
  ▪ A cardinality constraint with the identifying relationship
  ▪ A partial key
 Therefore the weak entity set should:
  ▪ Include a foreign key (to its owner entity set), the attributes of which are part of the WES' primary key
  ▪ Which prevents these attributes from being null
  ▪ Usually specify the foreign key as ON DELETE CASCADE

CREATE TABLE Room (
  name CHAR(10),
  squareFeet REAL,
  taxNumber INTEGER,
  PRIMARY KEY (name, taxNumber),
  FOREIGN KEY (taxNumber) REFERENCES Building
    ON DELETE CASCADE )

[Figure: ERD, House (taxNumber, street, number, city) contains Room (name, squareFeet); Room is a weak entity set]

 There are two basic approaches to translating class hierarchies to tables
 Create separate tables for the superclass and for each subclass
  ▪ The superclass table only contains attributes of the superclass entity
  ▪ The subclass tables contain their own attributes, and the primary key attributes of the superclass
  ▪ The superclass primary key attributes are declared as both the primary key and a foreign key in the subclasses
  ▪ Cascade deletion of superclass records to subclasses
 Or …
 Create tables for the subclasses only
  ▪ The subclass tables contain their attributes, and all of the attributes of the superclass
  ▪ The primary key of the subclass is the primary key of the superclass
 This assumes that there are no entities in the superclass that are not entities in a subclass
  ▪ i.e. that a coverage constraint exists
 Otherwise coverage and overlap constraints must be specified using assertions

[Figure: ISA hierarchy, Building (taxNumber, city, street, number, tax) with subclasses RentalProperty (units, rate) and HeritageBuilding (buildYear)]

First approach, separate tables for the superclass and each subclass:
  ▪ Building = (taxNumber, city, street, number, tax)
  ▪ RentalProperty = (taxNumber (FK), units, rate)
  ▪ HeritageBuilding = (taxNumber (FK), buildYear)

Second approach, tables for the subclasses only:
  ▪ RentalProperty = (taxNumber, city, street, number, tax, units, rate)
  ▪ HeritageBuilding = (taxNumber, city, street, number, tax, buildYear)

 An aggregate entity is represented by the table defining the relationship set in the aggregation
 The relationship set between the aggregate entity and the other entity has the following attributes:
  ▪ The primary key of the participating entity set, and
  ▪ The primary key of the relationship set that defines the aggregate entity, and
  ▪ Its own descriptive attributes, if any
 The normal rules for determining primary keys and omitting tables apply

[Figure: ERD in which Department sponsors Project; the sponsors relationship is aggregated and linked to Employee by monitors]

In this example, each department, project pair is monitored by an employee
  ▪ Create tables for the entity sets: Project, Department, Employee
  ▪ And the relationship sets: sponsors, monitors

 There is a case where no table is required for an aggregate entity
  ▪ Even where there are no cardinality constraints
  ▪ If there is total participation between the aggregate entity and its relationship, and
  ▪ If the aggregate entity does not have any descriptive attributes
  ▪ Insert the attributes of the aggregate entity into the table representing the relationship with that entity

[Figure: the same ERD, with total participation of the aggregated sponsors relationship in monitors]

A project, department pair must be monitored by an employee, so each such pair must appear in monitors
  ▪ Create tables for the entity sets: Project, Department, Employee
  ▪ If sponsors has no descriptive attributes, create a monitors table with the primary keys of Employee, Project and Department

 Views are not explicitly stored in a DB but are created as required from a view definition
  ▪ At least conceptually, if not necessarily in practice
 Once defined, views may be referred to in the same way as tables
 Views are useful for
  ▪ Convenience – users can access the data they require without referring to many tables
  ▪ Security – users can only access appropriate data
  ▪ Independence – views can help mask changes in the conceptual schema

 Consider a DB containing these two tables:
  ▪ Branch = (branchName, managerSIN, budget)
  ▪ Employee = (sin, name, salary, birthDate, branchName)
  ▪ Where branchName in Employee is a foreign key
 A view is needed to access the employee's SIN, name and manager's SIN

CREATE VIEW Emp_Manager (sin, name, managerSIN) AS
  SELECT E.sin, E.name, B.managerSIN
  FROM Branch B, Employee E
  WHERE B.branchName = E.branchName

 Users should be able to interact with views as if they were base tables
  ▪ And without being aware of the distinction
  ▪ This goal is achieved when running queries on views
 However updates to views may cause problems
  ▪ Particularly if a view is derived from multiple tables and does not include the primary keys of all tables
 Generally, updates are only allowed on views derived from a single table
  ▪ And, even then, there are further restrictions
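The Emp_Manager view described above can be exercised with a small sketch, assuming Python's sqlite3 module (with a bundled SQLite of version 3.9 or later, which accepts a column list in CREATE VIEW). The rows are made up. SQLite happens to reject every UPDATE on a view, which is stricter than the general rule above, so the last step only illustrates that updates through a multi-table view can fail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Branch (
        branchName CHAR(30) PRIMARY KEY, managerSIN CHAR(11), budget REAL);
    CREATE TABLE Employee (
        sin CHAR(11) PRIMARY KEY, name CHAR(40), salary REAL, birthDate DATE,
        branchName CHAR(30) REFERENCES Branch);
    INSERT INTO Branch VALUES ('Downtown', '111', 100000);
    INSERT INTO Employee VALUES ('111', 'Sam', 50000, '1980-01-01', 'Downtown');
    INSERT INTO Employee VALUES ('222', 'Effie', 40000, '1985-05-05', 'Downtown');
    CREATE VIEW Emp_Manager (sin, name, managerSIN) AS
        SELECT E.sin, E.name, B.managerSIN
        FROM Branch B, Employee E
        WHERE B.branchName = E.branchName;
""")

# A view is queried exactly like a base table.
rows = conn.execute(
    "SELECT sin, name, managerSIN FROM Emp_Manager ORDER BY sin").fetchall()

# The view is not stored: a change to a base table shows up on the next query.
conn.execute("UPDATE Branch SET managerSIN = '222' WHERE branchName = 'Downtown'")
manager_after = conn.execute(
    "SELECT managerSIN FROM Emp_Manager WHERE sin = '111'").fetchone()[0]

# Updating through the multi-table view is refused.
try:
    conn.execute("UPDATE Emp_Manager SET name = 'Samuel' WHERE sin = '111'")
    view_update = "ok"
except sqlite3.Error:
    view_update = "rejected"

print(rows)           # [('111', 'Sam', '111'), ('222', 'Effie', '111')]
print(manager_after)  # 222
print(view_update)    # rejected
```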

 Two distinct tuples in a relation instance cannot have identical values in all the fields of a superkey
  ▪ Or if two entities have the same superkey values, they must have the same values for their other attributes
  ▪ So must, in fact, be the same entity
 We can describe this idea formally in this way
  ▪ A subset K of the attributes of a relation R is a superkey of R if for all pairs of tuples t1 and t2, if t1 ≠ t2 then t1[K] ≠ t2[K]
  ▪ That is, where K is a set of attributes of a relation R
  ▪ K is a superkey for R if there are no two different tuples with the same values for K

 The notion of a superkey can be generalized and applied to smaller sets of attributes
  ▪ e.g. Vehicle = {vin, colour, model, capacity}
  ▪ vin (vehicle identification number) is a superkey
  ▪ We can say that capacity is functionally determined by vin as there can only be one capacity for a particular vin
  ▪ The model and the capacity have the same relationship
  ▪ A particular model can only have one capacity
  ▪ However, unlike the vin, the model does not functionally determine the colour
 We call this relationship functional dependency

 Denote functional dependencies as follows
  ▪ If R is a relation and if α ⊆ R and β ⊆ R we can express a functional dependency as α → β
  ▪ (⊆ means subset, ⊂ proper subset)
  ▪ α → β holds on R if for all pairs of tuples t1 and t2, if t1[α] = t2[α] then t1[β] = t2[β]
  ▪ A subset K of the attributes of a relation R is a superkey if K → R
 In the vehicle example
  ▪ vin → {vin, colour, model, capacity}
  ▪ model → {capacity}

 Why use functional dependencies?
  ▪ To specify constraints on relations
  ▪ To determine if a relation is legal given a set of functional dependencies
  ▪ To reason about whether or not the set of tables in a database is appropriate
  ▪ We will use functional dependencies to decompose tables into normal forms
 We will discuss functional dependencies in more detail later in the course
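The definition of a functional dependency translates almost directly into a check over a relation instance. The sketch below is illustrative only: the function name holds and the vehicle rows are made up, and each tuple is represented as a Python dictionary keyed by attribute name.

```python
def holds(relation, alpha, beta):
    """Return True if alpha -> beta is not violated by any pair of tuples in relation."""
    seen = {}  # maps each alpha-value to the beta-value first seen with it
    for t in relation:
        key = tuple(t[a] for a in alpha)
        value = tuple(t[b] for b in beta)
        # Two tuples that agree on alpha must also agree on beta.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Made-up instance of Vehicle = {vin, colour, model, capacity}.
vehicles = [
    {"vin": "V1", "colour": "red",  "model": "Civic",   "capacity": 5},
    {"vin": "V2", "colour": "blue", "model": "Civic",   "capacity": 5},
    {"vin": "V3", "colour": "red",  "model": "Odyssey", "capacity": 8},
]
all_attributes = ["vin", "colour", "model", "capacity"]

vin_is_superkey = holds(vehicles, ["vin"], all_attributes)
model_determines_capacity = holds(vehicles, ["model"], ["capacity"])
model_determines_colour = holds(vehicles, ["model"], ["colour"])

print(vin_is_superkey)            # True
print(model_determines_capacity)  # True
print(model_determines_colour)    # False: two Civics have different colours
```

A check like this can only show that an instance violates a dependency. Passing it does not prove that the dependency holds on the schema, since a functional dependency is a constraint on every legal instance.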