
CSC4480: Principles of Systems

Lecture 4: Relational Model

• Conceptual (High-level)
  – How users perceive the data
  – Entity, Attribute, Relationship
  – ER (Entity-Relationship) Model

• Implementation (Representational)
  – Has direct implementation in DBMS, but hides details of disk storage
  – Relational Data model – Ch. 5
  – SQL – Ch. 6 & 7

• Physical (Low-level)
  – How data is stored on the physical media, invisible to most users
  – File structure, hashing, indexing – Ch. 16 & 17

Slide 2 – First Look

• Represents the db as a collection of relations
• But what’s a relation?
  – Like a table: each row represents an instance of an entity, columns give meaning to the values in the row
• Each row is called a tuple and each column is an attribute
• Each attribute has a domain of valid values

Slide 3 The attributes and tuples of a relation STUDENT. Figure 5.1

Slide 4 Relational Model – Formal Definitions

• Relation Schema with name R is denoted by R(A1, A2, A3…An) where Ai are attributes

• Each attribute Ai has values in the domain D, denoted by dom(Ai)

• Degree of a relation is the number of attributes in its relation schema

• Ex: STUDENT(Name, Ssn, Home_phone, Address, Office_phone, Age, Gpa) is a relation schema named STUDENT of degree 7

Slide 5 Relational Model – Formal Definitions

• Relation r of the relation schema R(A1, A2, A3…An), denoted by r(R), is a set of n-tuples r = {t1, t2, t3…tm}, where m is the number of tuples (independent of the degree n)

• Each n-tuple t is an ordered list of n values t=< v1, v2, v3…vn >, where each value vi (1<=i<=n) is an element of dom(Ai) or is NULL

• The i-th value in tuple t, which corresponds to the attribute Ai, can be referred to in several equivalent ways: t[Ai], t.Ai, or t[i]
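The three notations can be mimicked in Python with a named tuple. This is only a sketch: the `Student` type and the sample values are made up for illustration, and `None` stands in for NULL.

```python
from collections import namedtuple

# Hypothetical type mirroring STUDENT(Name, Ssn, Home_phone, Address,
# Office_phone, Age, Gpa); the attribute order fixes the tuple positions.
Student = namedtuple(
    "Student",
    ["Name", "Ssn", "Home_phone", "Address", "Office_phone", "Age", "Gpa"],
)

# Illustrative values only; None plays the role of NULL.
t = Student("Alice Example", "111-22-3333", None, "1 Main St", None, 19, 3.5)

by_name = getattr(t, "Age")  # t[Ai]: look the value up by attribute name
by_dot = t.Age               # t.Ai
by_pos = t[5]                # t[i]: Age is the 6th attribute, index 5 in 0-based Python

degree = len(Student._fields)  # number of attributes in the schema
```

All three lookups return the same value, and `degree` matches the degree of the schema.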

Slide 6 Relational Model – Formal Definitions

• Given the prior formal definitions, a relation can be formally defined as follows

• A relation r(R) is a mathematical relation of degree n on the domains dom(A1), dom(A2) … dom(An), which is a subset of the Cartesian product of the domains that define R

r(R) ⊆ dom(A1) × dom(A2) × … × dom(An)

• The Cartesian product specifies all possible combinations of values from the underlying domains.
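A small sketch of the subset relationship, using two tiny made-up domains (the names `dom_name` and `dom_age` are assumptions for illustration, not part of the lecture):

```python
from itertools import product

# Two deliberately tiny, hypothetical domains.
dom_name = {"Ann", "Bob"}
dom_age = {19, 20, 21}

# The Cartesian product lists every possible (name, age) combination.
cartesian = set(product(dom_name, dom_age))

# A relation state is just some subset of those combinations.
r = {("Ann", 19), ("Bob", 21)}

is_valid_state = r <= cartesian            # subset test
out_of_domain = ("Ann", 22) in cartesian   # 22 is not in dom_age
```

With 2 names and 3 ages the product holds 6 combinations; any set of tuples drawn from them is a possible relation state.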

Slide 7 Definition Summary – Fill in the Formal Terms

Informal Terms                 Formal Terms
Table                          ____
Column Header                  ____
All possible Column Values     ____
Row                            ____
Table Definition               ____
Populated Table                ____

Slide 8 Definition Summary – Fill in the Formal Terms

Informal Terms                 Formal Terms
Table                          Relation
Column Header                  Attribute
All possible Column Values     Domain
Row                            Tuple
Table Definition               Schema of a Relation
Populated Table                State of the Relation

Slide 9 Ordering

Ordering of Tuples in a Relation
• A relation is a set of tuples
• A set has no inherent ordering
• Therefore, two relations with the same collection of tuples would be equal, regardless of the order in which the tuples are presented

Ordering of Attributes in a Single Tuple
• Our formal definition of an n-tuple stated “ordered list of n values”
• In an abstract sense, though, we could consider a tuple to be a set of (attribute, value) pairs (in which case order doesn’t matter)
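Both points about ordering can be illustrated with Python's built-in `set` and `dict`. The data is made up, and modelling a tuple as a dict is just one way to represent a set of (attribute, value) pairs.

```python
# Relations as sets of tuples: the order of presentation is irrelevant.
r1 = {("Ann", 19), ("Bob", 21)}
r2 = {("Bob", 21), ("Ann", 19)}
same_relation = (r1 == r2)

# A set also cannot hold duplicate tuples.
no_duplicates = len({("Ann", 19), ("Ann", 19)})

# A tuple viewed as (attribute, value) pairs: attribute order is irrelevant.
t_a = {"Name": "Ann", "Age": 19}
t_b = {"Age": 19, "Name": "Ann"}
same_tuple = (t_a == t_b)

# A tuple viewed as an ordered list of values: position matters.
ordered_differs = ("Ann", 19) != (19, "Ann")
```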

Slide 10 Relational Model Constraints

1. Model-based constraints – characteristics of relations
   Ex: A relation cannot have duplicate tuples
2. Schema-based constraints – constraints that can be explicitly expressed in the data model (via DDL)
3. Application-based constraints – cannot be expressed in the data model, and so must be enforced by the application, a.k.a. business rules

Slide 11 Domain Constraints

• Attribute value must be atomic and from its allowed domain

• Data type specified for each attribute – What data types can you think of?

• Values can be limited to a subset of those allowed by the data type or an enumeration
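As a rough sketch, a domain can be modelled as a predicate on values: a data type, a restricted range within that type, or an enumeration. The attribute names and limits below are assumptions chosen for illustration, and NULL handling is left out.

```python
# Hypothetical domains: each maps an attribute to a membership test.
DOMAINS = {
    # data type plus a length limit
    "Name": lambda v: isinstance(v, str) and len(v) <= 30,
    # data type restricted to a sub-range (bool is excluded explicitly
    # because Python treats it as a kind of int)
    "Age": lambda v: isinstance(v, int) and not isinstance(v, bool) and 15 <= v <= 80,
    # enumeration of allowed values
    "Class": lambda v: v in {"Freshman", "Sophomore", "Junior", "Senior"},
}


def domain_violations(row):
    """Return the attribute names whose values fall outside their domain."""
    return [attr for attr, in_domain in DOMAINS.items() if not in_domain(row[attr])]


ok_row = {"Name": "Ann", "Age": 19, "Class": "Junior"}
bad_row = {"Name": "Bob", "Age": "nineteen", "Class": "Grad"}
```

Here `domain_violations(ok_row)` is empty, while `bad_row` fails on both `Age` (wrong type) and `Class` (not in the enumeration).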

Slide 12 Keys

In the ER model, we saw that an entity can have one or more keys, and keys may be single-valued or composite

More definitions about keys:
• Key vs. Superkey
• Candidate Keys vs. Primary Key
• Surrogate vs. Natural Key

Slide 13 Key Constraints

• Superkey of R:
  – Is a set of attributes SK of R with the following condition:
    • No two tuples in any valid relation state r(R) will have the same value for SK
    • That is, for any distinct tuples t1 and t2 in r(R), t1[SK] ≠ t2[SK]
    • This condition must hold in any valid state r(R)
• Key of R:
  – A "minimal" superkey
  – That is, a key is a superkey K such that removal of any attribute from K results in a set of attributes that is not a superkey (does not possess the superkey uniqueness property)
• A Key is a Superkey but not vice versa

Slide 14 Key Constraints (continued)

• Example: Consider the CAR relation schema:
  – CAR(State, Reg#, SerialNo, Make, Model, Year)
  – CAR has two keys:
    • Key1 = {State, Reg#}
    • Key2 = {SerialNo}
  – Both are also superkeys of CAR
  – {SerialNo, Make} is a superkey but not a key.
• In general:
  – Any key is a superkey (but not vice versa)
  – Any set of attributes that includes a key is a superkey
  – A minimal superkey is also a key
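The uniqueness and minimality tests can be sketched against one relation state. The rows below are invented for illustration. Note the limitation: a key is a property of the schema that must hold in every valid state, so checking a single state can only refute a proposed key, never prove it.

```python
from itertools import combinations

ATTRS = ("State", "Reg#", "SerialNo", "Make", "Model", "Year")

# One made-up state of CAR; the values are illustrative only.
cars = [
    ("TX", "ABC-739", "A69352", "Ford", "Mustang", 2002),
    ("FL", "TVP-347", "B43696", "Oldsmobile", "Cutlass", 2005),
    ("NY", "MPO-22", "X83554", "Oldsmobile", "Delta", 2001),
    ("TX", "MPO-22", "C43742", "Ford", "Mustang", 2002),
]


def is_superkey(attrs, rows):
    """True if no two rows agree on all of the given attributes (in this state)."""
    idx = [ATTRS.index(a) for a in attrs]
    projected = [tuple(row[i] for i in idx) for row in rows]
    return len(set(projected)) == len(projected)


def is_key(attrs, rows):
    """True if attrs is a superkey and dropping any one attribute breaks uniqueness."""
    if not is_superkey(attrs, rows):
        return False
    smaller = combinations(attrs, len(attrs) - 1)
    return not any(is_superkey(sub, rows) for sub in smaller)
```

On this state, `{State, Reg#}` and `{SerialNo}` both pass `is_key`, while `{SerialNo, Make}` passes `is_superkey` but fails `is_key` because `Make` is redundant.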

Slide 15

• SSN is a key
• {SSN, Name, Age} is a superkey
• {SSN, Name} and {SSN, Age} are also superkeys
• Remove the redundant information from a superkey to form a key

Slide 16 Key Constraints (continued)

• If a relation has several candidate keys, one is chosen arbitrarily to be the primary key.
  – The primary key attributes are underlined.
• Example: Consider the CAR relation schema:
  – CAR(State, Reg#, SerialNo, Make, Model, Year)
  – We chose SerialNo as the primary key
• The primary key value is used to uniquely identify each tuple in a relation
  – Provides the tuple identity
• Also used to reference the tuple from another tuple
  – General rule: Choose as primary key the smallest of the candidate keys (in terms of size)
  – Not always applicable – choice is sometimes subjective

Slide 17 CAR table with two candidate keys – LicenseNumber chosen as Primary Key

Slide 18 Schema

• Relational Database Schema:
  – A set S of relation schemas that belong to the same database.
  – S is the name of the whole database schema
  – S = {R1, R2, ..., Rn} and a set IC of integrity constraints.
  – R1, R2, …, Rn are the names of the individual relation schemas within the database S
• Following slide shows a COMPANY database schema with 6 relation schemas

Slide 19 COMPANY Database Schema

Slide 20 Relational Database State

• A relational database state DB of S is a set of relation states DB = {r1, r2, ..., rn} such that each ri is a state of Ri and such that the ri relation states satisfy the integrity constraints specified in IC.

• A relational database state is sometimes called a relational database snapshot or instance.

• We will not use the term instance since it also applies to single tuples.

• A database state that does not meet the constraints is an invalid state

Slide 21 Populated database state

• Each relation will have many tuples in its current relation state

• The relational database state is a union of all the individual relation states

• Whenever the database is changed, a new state arises

• Basic operations for changing the database:
  – INSERT a new tuple in a relation
  – DELETE an existing tuple from a relation
  – MODIFY an attribute of an existing tuple

• Next slide (Fig. 5.6) shows an example state for the COMPANY database schema shown in Fig. 5.5.

Slide 22 Populated database state for COMPANY

Slide 23 Integrity

• Entity integrity
• Referential integrity

Slide 24 Entity Integrity

• Entity Integrity:
  – The primary key attributes PK of each relation schema R in S cannot have null values in any tuple of r(R).
    • This is because primary key values are used to identify the individual tuples.
    • t[PK] ≠ null for any tuple t in r(R)
    • If PK has several attributes, null is not allowed in any of these attributes
  – Note: Other attributes of R may be constrained to disallow null values, even though they are not members of the primary key.
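A minimal sketch of the entity integrity check, with `None` standing in for null. The relation and attribute names (`works_on`, `Essn`, `Pno`, `Hours`) and the rows are assumptions made up for the example.

```python
def entity_integrity_violations(rows, pk):
    """Return the rows in which any primary key attribute is null (None)."""
    return [row for row in rows if any(row[attr] is None for attr in pk)]


# Hypothetical relation with a composite primary key (Essn, Pno).
works_on = [
    {"Essn": "123456789", "Pno": 1, "Hours": 32.5},
    {"Essn": "123456789", "Pno": None, "Hours": 7.5},   # null in part of the PK
    {"Essn": "666884444", "Pno": 3, "Hours": None},     # null outside the PK
]

bad = entity_integrity_violations(works_on, pk=("Essn", "Pno"))
```

Only the second row is reported: a null in any part of a composite key is a violation, while a null in a non-key attribute such as `Hours` is allowed unless a separate NOT NULL constraint forbids it.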

Slide 25 Referential Integrity

• A constraint involving two relations – The previous constraints involve a single relation.

• Used to specify a relationship among tuples in two relations: – The referencing relation and the referenced relation.

Slide 26 Referential Integrity

• Tuples in the referencing relation R1 have attributes FK (called foreign key attributes) that reference the primary key attributes PK of the referenced relation R2.
  – A tuple t1 in R1 is said to reference a tuple t2 in R2 if t1[FK] = t2[PK].

• A referential integrity constraint can be displayed in a relational database schema as a directed arc from R1.FK to R2.

Slide 27 Referential Integrity (or foreign key) Constraint

• Statement of the constraint
  – The value in the foreign key column (or columns) FK of the referencing relation R1 can be either:
    • (1) the value of an existing primary key PK in the referenced relation R2, or
    • (2) a null.
• In case (2), the FK in R1 should not be a part of its own primary key.
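The two allowed cases translate into a short check. This is a sketch under stated assumptions: rows are dicts, `None` stands in for null, the sample data is invented, and a composite foreign key with a null in any component is treated as case (2).

```python
def fk_violations(referencing, fk, referenced, pk):
    """Return rows of the referencing relation whose FK matches no PK value."""
    pk_values = {tuple(row[a] for a in pk) for row in referenced}
    bad = []
    for row in referencing:
        value = tuple(row[a] for a in fk)
        if any(v is None for v in value):
            continue  # case (2): a null foreign key references nothing
        if value not in pk_values:
            bad.append(row)  # neither case (1) nor case (2)
    return bad


# Hypothetical data: departments 1, 4 and 5 exist; department 42 does not.
department = [{"Dnumber": 1}, {"Dnumber": 4}, {"Dnumber": 5}]
employee = [
    {"Ssn": "123456789", "Dno": 5},
    {"Ssn": "987654321", "Dno": None},
    {"Ssn": "453453453", "Dno": 42},
]

dangling = fk_violations(employee, ("Dno",), department, ("Dnumber",))
```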

Slide 28 Displaying a relational database schema and its constraints

• Each relation schema can be displayed as a row of attribute names
• The name of the relation is written above the attribute names
• The primary key attribute (or attributes) will be underlined
• A foreign key (referential integrity) constraint is displayed as a directed arc (arrow) from the foreign key attributes to the referenced table
  – Can also point to the primary key of the referenced relation for clarity
• Next slide shows the COMPANY relational schema diagram with referential integrity constraints

Slide 29 Referential Integrity Constraints for COMPANY database

Slide 30 Primary Key vs Foreign Key

• Primary key is what uniquely identifies each row in a table

• A foreign key is used to maintain referential integrity between two or more tables that have relationships with each other

• Referential integrity is the concept that the tables must be consistent with each other, so if one table refers to another, that reference must exist

Slide 31 PK/FK Example

• If employee Bob works for Department 42, then Department 42 must exist for the referential integrity constraint to be satisfied
• EMPLOYEE.DNO is an FK to DEPARTMENT.DNUMBER
• DEPARTMENT.DNUMBER is the PK of the DEPARTMENT table
• If EMPLOYEE.DNO = 42, then there must be a PK DEPARTMENT.DNUMBER = 42
• If no such DEPARTMENT.DNUMBER existed, then there would be an inconsistency in the database
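The Bob example can be tried with Python's built-in `sqlite3` module. This is a sketch, not the COMPANY schema from the textbook: the table definitions are cut down to the columns mentioned above, and the SSN and department name are made up. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.execute("CREATE TABLE DEPARTMENT (DNUMBER INTEGER PRIMARY KEY, DNAME TEXT)")
conn.execute(
    """CREATE TABLE EMPLOYEE (
           SSN  TEXT PRIMARY KEY,
           NAME TEXT,
           DNO  INTEGER REFERENCES DEPARTMENT(DNUMBER))"""
)


def try_insert_bob():
    """Attempt to insert Bob into department 42 and report the outcome."""
    try:
        conn.execute("INSERT INTO EMPLOYEE VALUES ('123-45-6789', 'Bob', 42)")
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


before = try_insert_bob()  # department 42 does not exist yet
conn.execute("INSERT INTO DEPARTMENT VALUES (42, 'Research')")
after = try_insert_bob()   # now the referenced PK value exists
```

The first attempt is rejected because no DEPARTMENT row has DNUMBER = 42; once that row exists, the same insert is accepted.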

Slide 32 [Figure with PK and FK labels]

Slide 33 Operations

• Insert • Update • Delete • Transactions • Dealing with constraint violations

Slide 34 Insert

• Adds a new tuple t to the relation R

• Must include a list of values for the attributes

• Example: Want to add a new tuple to the Employee relation

• What values would we need?

Slide 35 Delete

• Remove a tuple t from the relation R

• Example: Want to remove a tuple from the Employee relation with SSN = 123-45-6789

Slide 36 Update

• Modify an existing tuple t in the relation R by changing the value of one or more of its attributes

• Example: Change the manager of the Employee tuple with SSN = 123-45-6789 to Mgr_Ssn = 111-22-3344

Slide 37 Operations on Relations

• Operations on Relations can violate constraints
• In case of integrity violation, several actions can be taken:
  – Cancel the operation that causes the violation (RESTRICT or REJECT option)
  – Perform the operation but inform the user of the violation
  – Trigger additional updates so the violation is corrected (CASCADE option, SET NULL option)
  – Execute a user-specified error-correction routine

Slide 38 Possible violations for each operation

• INSERT may violate any of the constraints:
  – Domain constraint:
    • if one of the attribute values provided for the new tuple is not of the specified attribute domain
  – Key constraint:
    • if the value of a key attribute in the new tuple already exists in another tuple in the relation
  – Referential integrity:
    • if a foreign key value in the new tuple references a primary key value that does not exist in the referenced relation
  – Entity integrity:
    • if the primary key value is null in the new tuple

Slide 39 Possible violations for each operation

• DELETE may violate only referential integrity:
  – If the primary key value of the tuple being deleted is referenced from other tuples in the database
    • Can be remedied by several actions: RESTRICT, CASCADE, SET NULL (see Chapter 6 for more details)
      – RESTRICT option: reject the deletion
      – CASCADE option: propagate the deletion by deleting tuples that reference the tuple that is being deleted
      – SET NULL option: set the foreign keys of the referencing tuples to NULL (related option is SET DEFAULT, which sets the FKs to point to another tuple)
  – One of the above options must be specified during database design for each foreign key constraint
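The three options can be compared side by side using `sqlite3`, which accepts `ON DELETE RESTRICT`, `ON DELETE CASCADE` and `ON DELETE SET NULL` in a foreign key clause. The helper `delete_department` and the two-column tables are assumptions made for this sketch.

```python
import sqlite3


def delete_department(on_delete):
    """Delete department 5 under the given ON DELETE action and report the result."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE DEPARTMENT (DNUMBER INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE EMPLOYEE (SSN TEXT PRIMARY KEY, "
        f"DNO INTEGER REFERENCES DEPARTMENT(DNUMBER) ON DELETE {on_delete})"
    )
    conn.execute("INSERT INTO DEPARTMENT VALUES (5)")
    conn.execute("INSERT INTO EMPLOYEE VALUES ('123-45-6789', 5)")
    try:
        conn.execute("DELETE FROM DEPARTMENT WHERE DNUMBER = 5")
        outcome = "deleted"
    except sqlite3.IntegrityError:
        outcome = "rejected"
    # Inspect what happened to the referencing tuple.
    employees = conn.execute("SELECT SSN, DNO FROM EMPLOYEE").fetchall()
    conn.close()
    return outcome, employees


results = {opt: delete_department(opt) for opt in ("RESTRICT", "CASCADE", "SET NULL")}
```

Under RESTRICT the delete is refused and the employee row is untouched; under CASCADE the employee row disappears with the department; under SET NULL the employee row survives with a null DNO.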

Slide 40 Possible violations for each operation

• UPDATE may violate domain constraint and NOT NULL constraint on an attribute being modified
• Any of the other constraints may also be violated, depending on the attribute being updated:
  – Updating the primary key (PK):
    • Similar to a DELETE followed by an INSERT
    • Need to specify similar options to DELETE
  – Updating a foreign key (FK):
    • May violate referential integrity
  – Updating an ordinary attribute (neither PK nor FK):
    • Can only violate domain constraints

Slide 41 Transaction

• Operations can be grouped together into a logical “chunk” of behavior
• This is called a transaction
• At the end of executing all the operations in a transaction, the db must be left in a valid state with no constraints violated
• Typically, either all operations succeed or the whole transaction is rolled back (canceled)
• We will talk about transactions more later in the semester
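The all-or-nothing behavior can be sketched with `sqlite3`, where using the connection as a context manager commits on success and rolls back if an exception escapes the block. The tables and values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE DEPARTMENT (DNUMBER INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE EMPLOYEE (SSN TEXT PRIMARY KEY, "
    "DNO INTEGER REFERENCES DEPARTMENT(DNUMBER))"
)
conn.execute("INSERT INTO DEPARTMENT VALUES (5)")
conn.commit()

try:
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute("INSERT INTO EMPLOYEE VALUES ('111-22-3344', 5)")   # valid on its own
        conn.execute("INSERT INTO EMPLOYEE VALUES ('123-45-6789', 42)")  # no department 42
except sqlite3.IntegrityError:
    pass  # the whole transaction was rolled back

count = conn.execute("SELECT COUNT(*) FROM EMPLOYEE").fetchone()[0]
```

Even though the first insert was valid, `count` is 0 afterwards: the violation in the second insert cancels the whole group, leaving the database in its previous valid state.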
