Normalization and Transactions
LECTURE 8
Dr. Philipp Leitner, @xLeitix

Some SQL Query Examples

EMPLOYEE(id, fname, lname, bday, location, manager) manager -> EMPLOYEE.id

DEPENDENT(name, relationship, employee) employee -> EMPLOYEE.id

PROJECT(id, name, department)

WORKS_ON(employee, project) employee -> EMPLOYEE.id project -> PROJECT.id

6/1/16 Chalmers

Show all employees and their dependents.

select * from employee, dependent where dependent.employee = employee.id;

Which employee has no manager?

select * from employee where manager is null;

Show the names of all employees and the names of the projects they work on. Sort by the employee last name.

select fname, lname, name from employee, works_on, project where employee.id = works_on.employee and works_on.project = project.id order by lname;

Or: select fname, lname, name from employee inner join works_on on employee.id = works_on.employee inner join project on works_on.project = project.id order by lname;

6/1/16 Chalmers 9 Some SQL Query Examples

Show the last names of all employees and the last names of their managers. Employees with no managers should also be contained in the result.

6/1/16 Chalmers 10 Some SQL Query Examples

Show the last names of all employees and the last names of their managers. Employees with no managers should also be contained in the result.

select e.lname as Employee, m.lname as Manager from employee e left outer join employee m on e.manager = m.id;
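
A minimal, runnable sketch of the "no manager" query and the outer join above. It assumes Python's built-in sqlite3 module as a stand-in for Postgres, a reduced EMPLOYEE table, and two hypothetical sample rows; none of these come from the lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        id INTEGER PRIMARY KEY,
        fname TEXT,
        lname TEXT,
        manager INTEGER REFERENCES employee(id)
    );
    -- Hypothetical sample rows: Berg has no manager, Carlsson reports to Berg.
    INSERT INTO employee VALUES (1, 'Anna', 'Berg', NULL);
    INSERT INTO employee VALUES (2, 'Bo', 'Carlsson', 1);
""")

# Which employee has no manager? NULL must be tested with IS NULL, not "=".
no_manager = conn.execute(
    "SELECT lname FROM employee WHERE manager IS NULL"
).fetchall()

# An inner self-join silently drops employees without a manager ...
inner_rows = conn.execute("""
    SELECT e.lname, m.lname FROM employee e
    INNER JOIN employee m ON e.manager = m.id
    ORDER BY e.lname
""").fetchall()

# ... while the left outer join keeps them, with NULL (None) as the manager.
outer_rows = conn.execute("""
    SELECT e.lname AS Employee, m.lname AS Manager FROM employee e
    LEFT OUTER JOIN employee m ON e.manager = m.id
    ORDER BY e.lname
""").fetchall()

print(no_manager)  # [('Berg',)]
print(inner_rows)  # [('Carlsson', 'Berg')]
print(outer_rows)  # [('Berg', None), ('Carlsson', 'Berg')]
```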

LECTURE 8

Covers …

Database Normalization (Chapter 14)
Transactions (Chapter 20)

Assignments

No directly matching assignment tasks - use the time to finish working on the SQL tasks in Assignment 2

What constitutes a “good” relational design?

Informal criteria for “good” design

Clear semantics (mapping to the real world)

Minimal (controlled) redundancy

Avoidance of NULL values

Support for arbitrary queries

Efficiency

An example of a model that does not allow arbitrary queries

Original (good) mapping: EMPLOYEE(SSN, Fname, Minit, Lname, Bdate, Address, Sex, Salary, Super_ssn, Dno)

Bad mapping: EMPLOYEE(SSN, Fname, Minit, Lname, Bdate, Address, Sex, Salary)

(EMPLOYEE lacks the foreign keys Super_ssn and Dno, so we can no longer find out which department an employee works in or who their manager is)

An example of redundancy

Original (good) mapping: WORKS_ON(ESSN, Pno, Hours)

Bad mapping (redundant): WORKS_ON(Emp#, Proj#, Ename, Pname, No_hours)

(the employee and project name are unnecessarily redundant, because we could also get them via a join to the employee and project relations)

Informally: Redundancy occurs when information is stored explicitly that could also be derived from some other place in the database.

On redundancy

Two problems of redundancy:

(1) It wastes space (the same information is stored multiple times). Nowadays this is rarely the biggest problem.

(2) Data can easily become corrupted when updating (update anomalies)
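
An update anomaly can be reproduced in a few lines. The sketch below assumes Python's built-in sqlite3 module, a simplified version of the redundant WORKS_ON relation, and hypothetical sample rows; it only illustrates the problem and is not taken from the lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (ssn INTEGER PRIMARY KEY, ename TEXT);
    -- Redundant design: ename is copied into every WORKS_ON row.
    CREATE TABLE works_on (essn INTEGER, pno INTEGER, ename TEXT, hours REAL);
    INSERT INTO employee VALUES (1, 'Smith');
    INSERT INTO works_on VALUES (1, 10, 'Smith', 20.0);
    INSERT INTO works_on VALUES (1, 20, 'Smith', 15.0);
""")

# The employee changes name, but the update misses one of the copies.
conn.execute("UPDATE employee SET ename = 'Jones' WHERE ssn = 1")
conn.execute("UPDATE works_on SET ename = 'Jones' WHERE essn = 1 AND pno = 10")

# The same employee now appears under two different names.
names = conn.execute(
    "SELECT DISTINCT ename FROM works_on WHERE essn = 1 ORDER BY ename"
).fetchall()
print(names)  # [('Jones',), ('Smith',)]
```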

However:

There is usually a trade-off, and in some database designs developers will choose redundancy to improve query times.

Joins, and even more so calculations, are expensive, but projections are cheap.

Alternatively: Using database views
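
As a rough illustration of the view alternative: the base tables stay free of redundancy, and a view presents the joined, "wide" shape to queries. The sketch assumes Python's built-in sqlite3 module; the view name works_on_named and the sample rows are hypothetical. Note that a plain view still executes the join at query time, so it gives convenience rather than speed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (ssn INTEGER PRIMARY KEY, ename TEXT);
    CREATE TABLE project (pnumber INTEGER PRIMARY KEY, pname TEXT);
    CREATE TABLE works_on (essn INTEGER, pno INTEGER, hours REAL);
    INSERT INTO employee VALUES (1, 'Smith');
    INSERT INTO project VALUES (10, 'ProductX');
    INSERT INTO works_on VALUES (1, 10, 20.0);

    -- Hypothetical view that looks like the redundant WORKS_ON mapping,
    -- but derives the names instead of storing them a second time.
    CREATE VIEW works_on_named AS
        SELECT w.essn, w.pno, e.ename, p.pname, w.hours
        FROM works_on w
        JOIN employee e ON e.ssn = w.essn
        JOIN project p ON p.pnumber = w.pno;
""")

# A single update in EMPLOYEE is immediately visible through the view.
conn.execute("UPDATE employee SET ename = 'Jones' WHERE ssn = 1")
view_rows = conn.execute(
    "SELECT ename, pname, hours FROM works_on_named"
).fetchall()
print(view_rows)  # [('Jones', 'ProductX', 20.0)]
```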

Functional dependencies

Functional dependencies (FDs):
• Are used to specify formal measures of the "goodness" of relational designs
• Together with keys, are used to define normal forms for relations
• Are constraints that are derived from the meaning and interrelationships of the data attributes

A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y

Mathematically: X → Y (“X implies Y”)

Or in RA form, for any two tuples t1 and t2 of the relation: t1[X] = t2[X] → t1[Y] = t2[Y]
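
The tuple-based definition translates almost directly into code. The following is a minimal sketch in plain Python; the helper name fd_holds and the sample rows are hypothetical and only serve to illustrate the definition.

```python
def fd_holds(rows, x, y):
    """Return True if X -> Y holds in the given rows (a list of dicts).

    Implements the definition: whenever two tuples agree on X,
    they must also agree on Y.
    """
    seen = {}  # maps each X value to the Y value first observed with it
    for row in rows:
        x_val = tuple(row[a] for a in x)
        y_val = tuple(row[a] for a in y)
        if seen.setdefault(x_val, y_val) != y_val:
            return False  # same X value, different Y value
    return True


# Hypothetical relation state.
works_on = [
    {"ssn": 1, "pnumber": 10, "hours": 20},
    {"ssn": 1, "pnumber": 20, "hours": 15},
    {"ssn": 2, "pnumber": 10, "hours": 20},
]

key_fd = fd_holds(works_on, ["ssn", "pnumber"], ["hours"])
partial_fd = fd_holds(works_on, ["ssn"], ["hours"])
accidental_fd = fd_holds(works_on, ["hours"], ["pnumber"])
print(key_fd, partial_fd, accidental_fd)  # True False True
```

The last result is worth a second look: HOURS → PNUMBER happens to hold in this particular state, although it makes no sense as a rule about the data. A state can therefore refute an FD, but never prove one.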

Some examples

Social security number determines employee name: SSN → ENAME

Project number determines project name and location: PNUMBER → {PNAME, PLOCATION}

Employee ssn and project number determine the hours per week that the employee works on the project: {SSN, PNUMBER} → HOURS

FDs and keys

FDs are not bidirectional: SSN → ENAME holds, but the inverse does not

A key functionally determines every attribute in the relation: if you know the key, you can determine all other attribute values

Redundancies formulated as FDs

Note that data redundancies can be formulated as (unwanted) functional dependencies:

WORKS_ON(Emp#, Proj#, Ename, Pname, No_hours)

WORKS_ON[Ename] → EMPLOYEE[Ename]
WORKS_ON[Pname] → PROJECT[Pname]

These are said to be redundant FDs, as opposed to the expected FDs between primary / foreign keys. Colloquially, we can state that we need to maintain multiple pairs of FDs where one would do.

Acceptable FDs

Acceptable FDs always involve keys

X[x] → Y[y] is ok if (and only if):

(1) x or a subset of x is a key (primary or unique) and X==Y, or
(2) x or a subset of x is a foreign key pointing at a key of Y

Finding FDs

We cannot define functional dependencies without knowing what the attributes in our data mean

However, given a valid database state, we can rule out certain FDs
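
One way to rule out a candidate FD directly in SQL is to group by the left-hand side and look for groups with more than one distinct right-hand value. The sketch below assumes Python's built-in sqlite3 module and a hypothetical EMPLOYEE state; it is one possible check, not a procedure from the lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (ssn INTEGER PRIMARY KEY, ename TEXT);
    -- Hypothetical state: two different employees share the name Smith.
    INSERT INTO employee VALUES (1, 'Smith');
    INSERT INTO employee VALUES (2, 'Smith');
    INSERT INTO employee VALUES (3, 'Wong');
""")

# Candidate FD: ENAME -> SSN. Any row returned here is a counterexample.
ename_to_ssn_violations = conn.execute("""
    SELECT ename FROM employee
    GROUP BY ename
    HAVING COUNT(DISTINCT ssn) > 1
""").fetchall()

# Candidate FD: SSN -> ENAME. No counterexample in this state, which
# means the FD is not ruled out (it does not prove that it holds).
ssn_to_ename_violations = conn.execute("""
    SELECT ssn FROM employee
    GROUP BY ssn
    HAVING COUNT(DISTINCT ename) > 1
""").fetchall()

print(ename_to_ssn_violations)  # [('Smith',)]
print(ssn_to_ename_violations)  # []
```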

6/1/16 Chalmers 27 In-Class Exercise - Functional Dependencies

STUDENT(personnr, name, birthday)

ENROLLMENT(id, student_id, course_name, student_name) student_id -> STUDENT.personnr

• Identify one (ok) functional dependency in STUDENT
• Identify one (ok) functional dependency that spans both relations
• Identify one unwanted likely functional dependency

6/1/16 Chalmers 28 In-Class Exercise - Functional Dependencies

HOTEL(id, name, age, type) ROOM(name, hotel_id, hotel_name, size, price) hotel_id -> HOTEL.id

• Identify one (ok) functional dependency in ROOM • Identify one unwanted likely functional dependency

6/1/16 Chalmers 29 Normal Forms and Normalization

Normalization is the process of identifying FDs and refactoring a so that (1) keys are properly identified and (2) unwanted FDs are removed.

Normal Forms: A database is said to be in normal form after normalization. Typically: 1NF, 2NF, 3NF (there are more which we won’t cover)

6/1/16 Chalmers 30 (1NF)

Disallows Composite attributes Multivalued attributes Nested relations

This basically comes automatically with the RM (RM does not support models that are not in 1NF)

6/1/16 Chalmers 31 (2NF)

Requires that every attribute that is not part of a is fully functionally dependent on the primary key.

That is, no attribute should have a FD on only a part of a composed primary key.
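
To make the 2NF rule concrete, the sketch below starts from a relation with the composite key (essn, pno) in which pname depends on pno alone, and splits it into two relations. It assumes Python's built-in sqlite3 module; the table names and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Not in 2NF: the key is (essn, pno), but pname depends on pno alone.
    CREATE TABLE works_on_flat (
        essn INTEGER, pno INTEGER, pname TEXT, hours REAL,
        PRIMARY KEY (essn, pno)
    );
    INSERT INTO works_on_flat VALUES (1, 10, 'ProductX', 20.0);
    INSERT INTO works_on_flat VALUES (2, 10, 'ProductX', 5.0);
    INSERT INTO works_on_flat VALUES (1, 20, 'ProductY', 15.0);

    -- 2NF decomposition: move the partially dependent attribute to a
    -- relation whose key is exactly the part it depends on.
    CREATE TABLE project (pno INTEGER PRIMARY KEY, pname TEXT);
    CREATE TABLE works_on (
        essn INTEGER, pno INTEGER REFERENCES project(pno), hours REAL,
        PRIMARY KEY (essn, pno)
    );
    INSERT INTO project SELECT DISTINCT pno, pname FROM works_on_flat;
    INSERT INTO works_on SELECT essn, pno, hours FROM works_on_flat;
""")

original = conn.execute("""
    SELECT essn, pno, pname, hours FROM works_on_flat ORDER BY essn, pno
""").fetchall()

# Joining the two new relations gives back exactly the original rows.
rejoined = conn.execute("""
    SELECT w.essn, w.pno, p.pname, w.hours
    FROM works_on w JOIN project p ON p.pno = w.pno
    ORDER BY w.essn, w.pno
""").fetchall()

# Each project name is now stored once instead of once per assignment.
project_count = conn.execute("SELECT COUNT(*) FROM project").fetchone()[0]
print(original == rejoined, project_count)  # True 2
```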

6/1/16 Chalmers 32 (3NF)

Requires that no non-key attribute has a functional dependency on another non-key attribute.

Basically: Avoid unwanted redundancies

6/1/16 Chalmers 33 More on normal forms

In the book you find (much) more formal definitions and concrete algorithms for normalization. Plus: more normal forms

In practice you will be fine if you follow these informal guidelines:
• Define relations / attributes in a way that makes domain sense
• Avoid redundancy
• Define primary keys

6/1/16 Chalmers 34 Kahoot Quiz

6/1/16 Chalmers 35 What we will be covering

Transactions

6/1/16 Chalmers 36 Multi-User

Single-user DBMS At most one user at a time can use the system So far we have used our DBMS in this way

Multi-user DBMS Many users can access the DBMS concurrently

So far we did not think about the potential of other users running competing updates in parallel

6/1/16 Chalmers 37 Concurrency in DBMSs DBMSs are typically designed to support multiple concurrent users Transactions are way to ensure consistency of interleaved processes

6/1/16 Chalmers 38 Example

6/1/16 Chalmers 39 Transactions

A transaction is an atomic set of statements:
- Either all statements should be executed, or none of them
- No other statements should be executed between statements in a transaction

6/1/16 Chalmers 40 Transactions Boundaries

Transactions are typically defined through transaction boundaries

Three kinds of markings: BEGIN TRANSACTION TRANSACTION (save changes to database) ROLLBACK TRANSACTION (discard changes)

6/1/16 Chalmers 41 Transactions in SQL

BEGIN used to start a new transaction

COMMIT used to save changes to disk

ROLLBACK used to undo changes

6/1/16 Chalmers 42 In Postgres For example:

BEGIN; INSERT INTO … INSERT INTO … INSERT INTO … COMMIT;
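
The all-or-nothing behaviour of BEGIN / COMMIT / ROLLBACK can be tried out without a Postgres server. The sketch below assumes Python's built-in sqlite3 module (opened with isolation_level=None so that the transaction boundaries are issued explicitly), plus a hypothetical account table and transfer helper that are not part of the lecture.

```python
import sqlite3

# isolation_level=None: the module opens no implicit transactions,
# so BEGIN / COMMIT / ROLLBACK below are fully under our control.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES (1, 100), (2, 0)")


def transfer(amount):
    """Move amount from account 1 to account 2 as one transaction."""
    conn.execute("BEGIN")
    try:
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE id = 2", (amount,)
        )
        # Violates the CHECK constraint if account 1 would go negative.
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE id = 1", (amount,)
        )
        conn.execute("COMMIT")
        return True
    except sqlite3.IntegrityError:
        # Undo the first UPDATE as well: all or nothing.
        conn.execute("ROLLBACK")
        return False


ok = transfer(30)       # both updates succeed and are committed
failed = transfer(500)  # second update fails, first one is rolled back
balances = conn.execute(
    "SELECT id, balance FROM account ORDER BY id"
).fetchall()
print(ok, failed, balances)  # True False [(1, 70), (2, 30)]
```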

6/1/16 Chalmers 43 ACID Properties of SQL Transactions Atomicity All changes are applied, or nothing is applied Consistency Database is always in a consistent state (no “in-between” time) Isolation - guarantees that concurrent transactions are not interfering Durability Once a change is applied, it remains even through failures

6/1/16 Chalmers 44 Some Transactional Problems: Lost Updates

6/1/16 Chalmers 45 Some Transactional Problems: Temporary Updates

6/1/16 Chalmers 46 Some Transactional Problems: Nonrepeatable Reads

Transaction T reads the same item twice Value is changed by another transaction T′ between the two reads

T receives different values for the two reads of the same item

6/1/16 Chalmers 47 Database Isolation Violations Dirty read Read operations can return uncommited data of other transactions (temporary updates) Nonrepeatable reads Reading the same twice within a transaction may show different attribute values Phantom reads Executing the same query twice within a transaction may lead to a differing number of results

6/1/16 Chalmers 48 Database Isolation Levels

6/1/16 Chalmers 49 In Postgres

SET TRANSACTION isolation level ;

Where is one of: SERIALIZABLE REPEATABLE READ READ COMMITED (default in Postgres, as in most SQL DBs) READ UNCOMMITED

6/1/16 Chalmers 50 Key Takeaways

Normal forms and normalization: Functional dependencies Normal forms (1/2/3NF)

Transactions are bundles of statements that should be executed in an all-or-nothing fashion ACID (Atomicity, Consistency, Isolation, Durability) Database Isolation Levels
