Normalization and Transactions
LECTURE 8
Dr. Philipp Leitner
[email protected]
@xLeitix
Some SQL Query Examples
EMPLOYEE(id, fname, lname, bday, location, manager)
    manager -> EMPLOYEE.id
DEPENDENT(name, relationship, employee)
    employee -> EMPLOYEE.id
PROJECT(id, name, department)
WORKS_ON(employee, project)
    employee -> EMPLOYEE.id
    project -> PROJECT.id
6/1/16 Chalmers 2 Some SQL Query Examples
Show all employees and their dependents.
select * from employee, dependent where dependent.employee = employee.id;
Which employee has no manager?
select * from employee where manager is null;
Show the names of all employees and the names of the projects they work on. Sort by the employee last name.
select fname, lname, name from employee, works_on, project where employee.id = works_on.employee and works_on.project = project.id order by lname;
Or: select fname, lname, name from employee inner join works_on on employee.id = works_on.employee inner join project on works_on.project = project.id order by lname;
Show the last names of all employees and the last names of their managers. Employees with no managers should also be contained in the result.
select e.lname as Employee, m.lname as Manager from employee e left outer join employee m on e.manager = m.id;
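The example queries can be tried out on a small in-memory database. The following is a minimal sketch using Python's built-in sqlite3 module rather than the course DBMS. The table and column names follow the schema above, while all rows are invented sample data, and an ORDER BY was added to the last query only to make the output deterministic.

```python
import sqlite3

# In-memory database; schema follows the lecture, rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (
    id INTEGER PRIMARY KEY,
    fname TEXT, lname TEXT, bday TEXT, location TEXT,
    manager INTEGER REFERENCES employee(id)
);
CREATE TABLE dependent (
    name TEXT, relationship TEXT,
    employee INTEGER REFERENCES employee(id)
);
CREATE TABLE project (id INTEGER PRIMARY KEY, name TEXT, department TEXT);
CREATE TABLE works_on (
    employee INTEGER REFERENCES employee(id),
    project INTEGER REFERENCES project(id)
);
INSERT INTO employee VALUES (1, 'Ada', 'Berg', NULL, 'Gothenburg', NULL);
INSERT INTO employee VALUES (2, 'Bo', 'Ek', NULL, 'Gothenburg', 1);
INSERT INTO dependent VALUES ('Li', 'child', 2);
INSERT INTO project VALUES (10, 'Website', 'IT');
INSERT INTO works_on VALUES (2, 10);
""")

# Employees together with their dependents.
dependents = conn.execute("""
    SELECT employee.lname, dependent.name
    FROM employee, dependent
    WHERE dependent.employee = employee.id
""").fetchall()

# Employees without a manager.
no_manager = conn.execute(
    "SELECT lname FROM employee WHERE manager IS NULL"
).fetchall()

# Employee and project names via explicit inner joins.
assignments = conn.execute("""
    SELECT fname, lname, name
    FROM employee
    INNER JOIN works_on ON employee.id = works_on.employee
    INNER JOIN project ON works_on.project = project.id
    ORDER BY lname
""").fetchall()

# Self-join: the left outer join keeps employees whose manager is NULL.
managers = conn.execute("""
    SELECT e.lname AS Employee, m.lname AS Manager
    FROM employee e LEFT OUTER JOIN employee m ON e.manager = m.id
    ORDER BY e.lname
""").fetchall()

print(dependents)   # [('Ek', 'Li')]
print(no_manager)   # [('Berg',)]
print(assignments)  # [('Bo', 'Ek', 'Website')]
print(managers)     # [('Berg', None), ('Ek', 'Berg')]
```

Note that the manager column of the employee without a manager comes back as None, which is how sqlite3 represents SQL NULL.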
LECTURE 8
Covers …
Database Normalization (Chapter 14)
Transactions (Chapter 20)
Assignments
No directly matching assignment tasks - use the time to finish working on the SQL tasks in Assignment 2
What constitutes a “good” relational design?
Informal criteria for “good” design
Clear semantics: mapping to the real world
Minimal (controlled) redundancy
Avoidance of NULL values
Support for arbitrary queries
Efficiency
An example of a model that does not allow arbitrary queries
Original (good) mapping: EMPLOYEE(SSN, Fname, Minit, Lname, Bdate, Address, Sex, Salary, Super_ssn, Dno)
Bad mapping: EMPLOYEE(SSN, Fname, Minit, Lname, Bdate, Address, Sex, Salary)
(EMPLOYEE is missing the foreign keys, so we can no longer find out which department an employee works in or who their manager is)
An example of redundancy
Original (good) mapping: WORKS_ON(ESSN, Pno, Hours)
Bad mapping (redundant): WORKS_ON(Emp#, Proj#, Ename, Pname, No_hours)
(the employee and project names are redundant, because we could also get them via joins with the EMPLOYEE and PROJECT relations)
Informally: Redundancy occurs when information is stored explicitly even though it could be derived from somewhere else in the database.
On redundancy
Two problems of redundancy:
(1) It wastes space (the same information is stored multiple times). Nowadays this is no longer the biggest problem.
(2) Data can easily become corrupted when updating (update anomalies).
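An update anomaly can be reproduced in a few lines. The sketch below uses Python's sqlite3 module and a table modeled on the redundant WORKS_ON mapping; the table name works_on_redundant, the column spellings, and the rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE works_on_redundant (
    emp_no INTEGER, proj_no INTEGER,
    ename TEXT, pname TEXT, no_hours REAL,
    PRIMARY KEY (emp_no, proj_no)
);
INSERT INTO works_on_redundant VALUES (1, 10, 'Smith', 'Website', 12.0);
INSERT INTO works_on_redundant VALUES (1, 20, 'Smith', 'Payroll', 8.0);
""")

# Rename the employee, but only in one of the rows that store the name.
conn.execute(
    "UPDATE works_on_redundant SET ename = 'Smith-Jones' "
    "WHERE emp_no = 1 AND proj_no = 10"
)

# The same employee number now maps to two different names.
names = conn.execute(
    "SELECT DISTINCT ename FROM works_on_redundant "
    "WHERE emp_no = 1 ORDER BY ename"
).fetchall()
print(names)  # [('Smith',), ('Smith-Jones',)]
```

Nothing in the schema prevents this state: the database cannot tell which of the two names is the correct one.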
However:
There is usually a trade-off, and in some database designs developers will choose redundancy to improve query times.
Joins, and even more so calculations, are expensive, but projections are cheap.
Alternatively: use database views.
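As a sketch of the view alternative: a view can present the same columns as the redundant WORKS_ON mapping without storing the names twice. The example uses Python's sqlite3 module; the view name works_on_named and the sample rows are hypothetical. A plain view like this still executes the joins at query time, so it removes the redundancy but not the join cost.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (ssn TEXT PRIMARY KEY, ename TEXT);
CREATE TABLE project (pno INTEGER PRIMARY KEY, pname TEXT);
CREATE TABLE works_on (
    essn TEXT REFERENCES employee(ssn),
    pno INTEGER REFERENCES project(pno),
    hours REAL,
    PRIMARY KEY (essn, pno)
);
-- The view looks like the redundant relation but stores no extra data.
CREATE VIEW works_on_named AS
    SELECT w.essn, w.pno, e.ename, p.pname, w.hours
    FROM works_on w
    JOIN employee e ON e.ssn = w.essn
    JOIN project p ON p.pno = w.pno;
INSERT INTO employee VALUES ('123', 'Smith');
INSERT INTO project VALUES (10, 'Website');
INSERT INTO works_on VALUES ('123', 10, 12.0);
""")

# A rename touches exactly one row in EMPLOYEE; the view follows automatically.
conn.execute("UPDATE employee SET ename = 'Smith-Jones' WHERE ssn = '123'")
rows = conn.execute(
    "SELECT ename, pname, hours FROM works_on_named"
).fetchall()
print(rows)  # [('Smith-Jones', 'Website', 12.0)]
```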
Functional dependencies
Functional dependencies (FDs):
- Are used to specify formal measures of the "goodness" of relational designs
- Together with keys, are used to define normal forms for relations
- Are constraints derived from the meaning and interrelationships of the data attributes
A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y
Mathematically: X → Y (“X functionally determines Y”)
In tuple notation, for any two tuples t1 and t2 of the relation: t1[X] = t2[X] ⇒ t1[Y] = t2[Y]
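The tuple condition translates directly into a check over a set of rows. The following is a minimal Python sketch; the function name fd_holds and the sample rows are made up for illustration.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y for a new x, otherwise returns the stored value.
        if seen.setdefault(x, y) != y:
            return False
    return True

# Invented sample state: two employees share the same name.
rows = [
    {"ssn": "1", "ename": "Smith", "pnumber": 10, "hours": 12},
    {"ssn": "1", "ename": "Smith", "pnumber": 20, "hours": 8},
    {"ssn": "2", "ename": "Smith", "pnumber": 10, "hours": 5},
]
print(fd_holds(rows, ["ssn"], ["ename"]))             # True
print(fd_holds(rows, ["ename"], ["ssn"]))             # False
print(fd_holds(rows, ["ssn", "pnumber"], ["hours"]))  # True
```

A result of True only means that this particular state does not contradict the FD. Whether the FD really holds is a statement about the meaning of the attributes, not about one sample of data.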
Some examples
Social security number determines employee name: SSN → ENAME
Project number determines project name and location: PNUMBER → {PNAME, PLOCATION}
Employee SSN and project number determine the hours per week that the employee works on the project: {SSN, PNUMBER} → HOURS
FDs and keys
FDs are not bidirectional: SSN → ENAME holds, but the inverse (ENAME → SSN) does not.
A key automatically has an FD on every attribute in the relation: if you know the key, you can determine all other attribute values.
Redundancies formulated as FDs
Note that data redundancies can be formulated as (unwanted) functional dependencies:
WORKS_ON(Emp#, Proj#, Ename, Pname, No_hours)
WORKS_ON[Ename] → EMPLOYEE[Ename]
WORKS_ON[Pname] → PROJECT[Pname]
These are said to be redundant FDs, as opposed to the expected FDs between primary and foreign keys. Colloquially, we have to maintain multiple pairs of FDs where one would do.
Acceptable FDs
Acceptable FDs always involve keys
X[x] → Y[y] is ok if (and only if):
(1) x or a subset of x is a key (primary or unique) and X = Y, or
(2) x or a subset of x is a foreign key pointing at a key of Y
Finding FDs
We cannot define functional dependencies without knowing what the attributes in our data mean
However, given a valid database state, we can rule out certain FDs
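Ruling out FDs from a given state can be automated for the simple case of single attributes on both sides. The sketch below is written in Python; the function name violated_fds and the two sample rows are assumptions for illustration.

```python
from itertools import permutations

def violated_fds(rows, attributes):
    """Return single-attribute FDs (a, b), read a -> b, that the state contradicts."""
    ruled_out = []
    for a, b in permutations(attributes, 2):
        mapping = {}
        for row in rows:
            # Two rows with the same a-value but different b-values refute a -> b.
            if mapping.setdefault(row[a], row[b]) != row[b]:
                ruled_out.append((a, b))
                break
    return ruled_out

# Invented state: two different students happen to share a name.
rows = [
    {"personnr": 1, "name": "Kim", "birthday": "1995-01-01"},
    {"personnr": 2, "name": "Kim", "birthday": "1996-02-02"},
]
ruled_out = violated_fds(rows, ["personnr", "name", "birthday"])
print(ruled_out)  # [('name', 'personnr'), ('name', 'birthday')]
```

The FDs that survive (for example birthday → name in this tiny state) are not thereby confirmed. They are merely not contradicted by these two rows.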
In-Class Exercise - Functional Dependencies
STUDENT(personnr, name, birthday)
ENROLLMENT(id, student_id, course_name, student_name)
    student_id -> STUDENT.personnr
• Identify one (ok) functional dependency in STUDENT
• Identify one (ok) functional dependency that spans both relations
• Identify one likely unwanted functional dependency
In-Class Exercise - Functional Dependencies
HOTEL(id, name, age, type)
ROOM(name, hotel_id, hotel_name, size, price)
    hotel_id -> HOTEL.id
• Identify one (ok) functional dependency in ROOM
• Identify one likely unwanted functional dependency
Normal Forms and Normalization
Normalization is the process of identifying FDs and refactoring a database schema so that (1) keys are properly identified and (2) unwanted FDs are removed.
Normal Forms: A database is said to be in a normal form after normalization. Typical normal forms: 1NF, 2NF, 3NF (there are more, which we won’t cover)
First Normal Form (1NF)
Disallows:
- Composite attributes
- Multivalued attributes
- Nested relations
This basically comes automatically with the relational model (RM), which does not support models that are not in 1NF.
Second Normal Form (2NF)
Requires that every attribute that is not part of a primary key is fully functionally dependent on the primary key.
That is, no attribute should have an FD on only a part of a composite primary key.
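Given a primary key and a list of known FDs, the 2NF rule can be checked mechanically. The following Python sketch follows the informal definition above and only inspects the FDs it is given (it does not compute implied FDs). The function name partial_dependencies and the FD list, written for the redundant WORKS_ON relation from earlier, are assumptions.

```python
def partial_dependencies(primary_key, fds):
    """Return FDs whose left side is a proper subset of the primary key
    and whose right side contains at least one non-key attribute."""
    key = frozenset(primary_key)
    found = []
    for lhs, rhs in fds:
        lhs, rhs = frozenset(lhs), frozenset(rhs)
        # "lhs < key" is Python's proper-subset test on sets.
        if lhs < key and rhs - key:
            found.append((sorted(lhs), sorted(rhs - key)))
    return found

# Assumed FDs for WORKS_ON(Emp#, Proj#, Ename, Pname, No_hours),
# with the composite primary key {emp, proj}.
fds = [
    ({"emp", "proj"}, {"no_hours"}),
    ({"emp"}, {"ename"}),
    ({"proj"}, {"pname"}),
]
violations = partial_dependencies({"emp", "proj"}, fds)
print(violations)  # [(['emp'], ['ename']), (['proj'], ['pname'])]
```

Both reported FDs depend on only half of the key, so this relation is not in 2NF. Moving ename and pname to the employee and project relations removes the violations.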
Third Normal Form (3NF)
Requires that no non-key attribute has a functional dependency on another non-key attribute.
Basically: Avoid unwanted redundancies
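The informal 3NF rule can be sketched the same way: look for FDs whose left side consists only of non-key attributes. The Python function below is illustrative and only inspects the FDs it is given. The function name nonkey_dependencies is hypothetical, and the FDs for the ENROLLMENT relation from the in-class exercise (assuming id is its primary key) are an assumption about the intended meaning of the attributes.

```python
def nonkey_dependencies(primary_key, fds):
    """Return FDs in which non-key attributes determine other non-key attributes."""
    key = frozenset(primary_key)
    found = []
    for lhs, rhs in fds:
        lhs, rhs = frozenset(lhs), frozenset(rhs)
        determined = rhs - key - lhs
        # Left side must be non-empty and must not contain any key attribute.
        if lhs and not (lhs & key) and determined:
            found.append((sorted(lhs), sorted(determined)))
    return found

# Assumed FDs for ENROLLMENT(id, student_id, course_name, student_name).
fds = [
    ({"id"}, {"student_id", "course_name", "student_name"}),
    ({"student_id"}, {"student_name"}),
]
violations = nonkey_dependencies({"id"}, fds)
print(violations)  # [(['student_id'], ['student_name'])]
```

The reported FD is exactly the redundancy: student_name could be looked up through student_id and does not need to be stored in ENROLLMENT.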
More on normal forms
In the book you find (much) more formal definitions and concrete algorithms for normalization. Plus: more normal forms
In practice you will be fine if you follow these informal guidelines:
- Define relations / attributes in a way that makes domain sense
- Avoid redundancy
- Define primary keys
Kahoot Quiz
What we will be covering
Transactions
Multi-User Databases
Single-user DBMS: At most one user at a time can use the system. So far we have used our DBMS in this way.
Multi-user DBMS: Many users can access the DBMS concurrently.
So far we did not think about the potential of other users running competing updates in parallel
Concurrency in DBMSs
DBMSs are typically designed to support multiple concurrent users.
Transactions are a way to ensure the consistency of interleaved processes.
Transactions
A transaction is an atomic set of statements:
- Either all statements are executed, or none of them
- No other statements are executed between the statements of a transaction
Transaction Boundaries
Transactions are typically defined through transaction boundaries
Three kinds of markings:
- BEGIN TRANSACTION
- COMMIT TRANSACTION (save changes to database)
- ROLLBACK TRANSACTION (discard changes)
Transactions in SQL
BEGIN is used to start a new transaction
COMMIT is used to save changes to disk
ROLLBACK is used to undo changes
In Postgres
For example:
BEGIN;
INSERT INTO …
INSERT INTO …
INSERT INTO …
COMMIT;
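The all-or-nothing behaviour can be observed with a failing statement in the middle of a transaction. The sketch below uses Python's sqlite3 module with SQLite instead of Postgres, so details may differ from the course setup. The account table, the CHECK constraint, and the transfer function are invented for illustration.

```python
import sqlite3

# isolation_level=None puts the sqlite3 module in autocommit mode, so the
# explicit BEGIN / COMMIT / ROLLBACK statements below control the transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES (1, 100), (2, 0)")

def transfer(amount):
    """Move amount from account 1 to account 2 as one transaction."""
    conn.execute("BEGIN")
    try:
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE id = 2", (amount,)
        )
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE id = 1", (amount,)
        )
        conn.execute("COMMIT")
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed: undo the first UPDATE as well.
        conn.execute("ROLLBACK")
        return False

ok_small = transfer(30)    # succeeds and is committed
ok_large = transfer(500)   # second UPDATE fails, whole transfer is rolled back
balances = conn.execute("SELECT balance FROM account ORDER BY id").fetchall()
print(ok_small, ok_large, balances)  # True False [(70,), (30,)]
```

Without the ROLLBACK, account 2 would keep the 500 credited by the first UPDATE even though the matching debit never happened.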
ACID Properties of SQL Transactions
Atomicity: All changes are applied, or nothing is applied
Consistency: Database is always in a consistent state (no “in-between” time)
Isolation: Concurrency control guarantees that concurrent transactions do not interfere
Durability: Once a change is applied, it remains even through failures
Some Transactional Problems: Lost Updates
Some Transactional Problems: Temporary Updates
Some Transactional Problems: Nonrepeatable Reads
Transaction T reads the same item twice. The value is changed by another transaction T′ between the two reads.
T receives different values for the two reads of the same item.
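The interleaving can be simulated without a DBMS. The following Python sketch uses a dictionary as a toy database with a single item x. The function name run_schedule is hypothetical, and the private copy taken at the start of T is only one possible remedy, shown for contrast; it is not meant as a description of how a particular DBMS implements isolation.

```python
db = {"x": 100}  # toy shared database with a single item

def run_schedule(use_snapshot):
    """Run T (two reads of x) interleaved with T' (one update of x)."""
    db["x"] = 100
    # T starts; optionally it works on a private copy taken at its start.
    view = dict(db) if use_snapshot else db
    first = view["x"]        # T: first read
    db["x"] = db["x"] - 30   # T': update, committed immediately
    second = view["x"]       # T: second read of the same item
    return first, second

print(run_schedule(use_snapshot=False))  # (100, 70)
print(run_schedule(use_snapshot=True))   # (100, 100)
```

In the first schedule T sees two different values for x within the same transaction, which is exactly the nonrepeatable read.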
Database Isolation Violations
Dirty read: Read operations can return uncommitted data of other transactions (temporary updates)
Nonrepeatable read: Reading the same row twice within a transaction may show different attribute values
Phantom read: Executing the same query twice within a transaction may return a different number of results
Database Isolation Levels
In Postgres
SET TRANSACTION ISOLATION LEVEL level;
Where level is one of SERIALIZABLE, REPEATABLE READ, READ COMMITTED, or READ UNCOMMITTED.
Key Takeaways
Normal forms and normalization:
- Functional dependencies
- Normal forms (1/2/3NF)
Transactions are bundles of statements that should be executed in an all-or-nothing fashion:
- ACID (Atomicity, Consistency, Isolation, Durability)
- Database Isolation Levels