CISC 7610 Lecture 2: Review of relational databases
Topics:
● Relational database management systems
● Example data modeling problem
● Entity-relationship diagrams
● Structured query language
A relational database management system (RDBMS)
● Uses relational data structures
● Has a declarative data manipulation language at least as powerful as the relational algebra
● Not required, but typically also
  – Supports ACID transactions
  – Uses SQL as the data manipulation language
Uses relational data structures
● Relation: table with rows and columns
● Attribute: column
● Tuple: row
● Key: combination of attributes that uniquely identifies each row
● Integrity rules: Constraints imposed upon the database
Has a declarative data manipulation language
● Declarative: says what, not how to manipulate data
● Relational algebra
  – Selection: extract a subset of tuples
  – Projection: extract a subset of attributes
  – Cartesian product: extract all combinations of pairs of tuples from two relations
  – Union: combine two sets of tuples
  – Set difference: remove one set of tuples from another
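The relational algebra operators listed above can be tried out on small in-memory relations. The following is a minimal sketch, assuming each relation is a Python set of tuples with a separate tuple of attribute names; the helper names `select`, `project`, and `product` are hypothetical, and the sample rows are a subset of the music collection used later in the lecture.

```python
# Relations modeled as sets of tuples; attribute names are kept separately.
# Helper names are illustrative, not part of any library.

ARTIST_ATTRS = ("id", "name")
artists = {(1, "David Bowie"), (2, "Queen")}

ALBUM_ATTRS = ("id", "name", "released", "artist_id")
albums = {
    (1, "Space Oddity", 1969, 1),
    (3, "Best of Bowie", 2002, 1),
    (4, "Hot Space", 1982, 2),
}


def select(relation, predicate):
    """Selection: keep only the tuples that satisfy the predicate."""
    return {row for row in relation if predicate(row)}


def project(relation, attrs, keep):
    """Projection: keep only the named attributes (duplicates collapse)."""
    idx = [attrs.index(a) for a in keep]
    return {tuple(row[i] for i in idx) for row in relation}


def product(r1, r2):
    """Cartesian product: every pairing of a tuple from r1 with one from r2."""
    return {a + b for a in r1 for b in r2}


bowie_albums = select(albums, lambda row: row[3] == 1)
before_2000 = select(albums, lambda row: row[2] < 2000)
release_years = project(albums, ALBUM_ATTRS, ["released"])
pairs = product(artists, albums)

# Union and set difference map directly onto Python's set operators.
either = bowie_albums | before_2000
bowie_since_2000 = bowie_albums - before_2000
```

Because relations are sets, projection removes duplicate tuples automatically, which is one of the places where SQL (which keeps duplicates unless told otherwise) departs from the algebra.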
Supports ACID transactions
● Transaction: A sequence of DB operations that represents a single real-world operation
● ACID properties
  – Guaranteed by RDBMSs
  – Atomicity: all operations happen or none do
  – Consistency: a transaction moves the DB from one state that meets the integrity constraints to another
  – Isolation: concurrent transactions have the same effect as if they had run serially
  – Durability: once committed, a transaction’s effects are permanent
● Example: bank account transfer
● Relaxed by NoSQL databases in various combinations
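The bank account transfer can be sketched with Python's built-in `sqlite3` module. This is only an illustration under stated assumptions: the `accounts` table, the `transfer` and `balances` helpers, and the starting balances are all hypothetical, and a `CHECK` constraint stands in for the integrity rule that a balance may not go negative. The credit is deliberately applied before the debit so that a failed transfer shows atomicity: the credit is rolled back along with everything else.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"  # integrity rule
)
conn.executemany(
    "INSERT INTO accounts (id, balance) VALUES (?, ?)", [(1, 100), (2, 50)]
)
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one transaction; return True on success."""
    try:
        with conn:  # commits if the block succeeds, rolls back on an exception
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the debit; the credit was undone too.
        return False


def balances(conn):
    """Return {account id: balance} for all accounts."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


ok = transfer(conn, 1, 2, 30)
after_ok = balances(conn)
failed = transfer(conn, 1, 2, 500)
after_failed = balances(conn)
```

After the second call the balances are the same as before it, even though the first `UPDATE` inside that transaction had already run.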
Structured query language (SQL)
● Data definition language
  – Define relational schemata (plural of schema)
  – Create/alter/drop tables and their attributes
● Data manipulation language
  – Insert/delete/modify tuples in relations
  – Query one or more tables
● Can implement relational algebra, but also takes some liberties with it
Example data: Music collection
● Artists: Name
● Albums: Name, Release date
● Tracks: Name, Duration, Number
● Each album has one artist
● Tracks can appear on multiple albums (compilations)
Schema normalization: Unnormalized data
Artist      | Album                | Released | Track Num | Track            | Dur
David Bowie | Space Oddity         | 1969     | 1         | Space Oddity     | 5:15
David Bowie | … Ziggy Stardust ... | 1972     | 10        | Suffragette city | 3:25
David Bowie | Best of Bowie        | 2002     | 1         | Space Oddity     | 5:15
David Bowie | Best of Bowie        | 2002     | 8         | Suffragette city | 3:25
Queen       | Hot space            | 1982     | 11        | Under pressure   | 4:02
Entity-relationship diagrams
[Figure: ER diagram notation, with labeled elements Attribute, Entity, Cardinality, Relationship, Cardinality2, Entity2]
Do: Draw ER diagram for example data
● Artists: Name
● Albums: Name, Release date
● Tracks: Name, Duration, Number
● Each album has one artist
● Tracks can appear on multiple albums (compilations)
Translating ER diagrams to schema
● Entities become tables
● Attributes become their columns
● Many-to-many relationships become join tables
  – Can have additional attributes
● Other relationships become foreign keys
  – One-to-one, many-to-one, one-to-many
  – The relationship's attributes are added to the table that holds the foreign key
Do: Translate ER diagram to schema for example data
SQL CREATE statement
CREATE TABLE table_name (
    column_name1 data_type(size),
    column_name2 data_type(size),
    column_name3 data_type(size),
    ....
);
Do: Create tables for example data
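One possible answer, run through Python's built-in `sqlite3` module, follows the translation rules: each entity becomes a table, the many-to-one link from albums to artists becomes a foreign key, and the many-to-many link between albums and tracks becomes a join table that carries the track number. Several choices here are assumptions rather than part of the exercise: the integer `Id` keys, storing durations as text, the helper `rejects_orphan_album`, and naming the release column `Released` (as in the unnormalized table) because `RELEASE` is a keyword in SQLite.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Artists (
    Id   INTEGER PRIMARY KEY,
    Name TEXT NOT NULL
);
CREATE TABLE Albums (
    Id       INTEGER PRIMARY KEY,
    Name     TEXT NOT NULL,
    Released INTEGER,
    ArtistId INTEGER NOT NULL REFERENCES Artists(Id)  -- each album has one artist
);
CREATE TABLE Track (
    Id       INTEGER PRIMARY KEY,
    Name     TEXT NOT NULL,
    Duration TEXT
);
CREATE TABLE AlbumsHaveTracks (  -- join table for the many-to-many relationship
    AlbumId INTEGER NOT NULL REFERENCES Albums(Id),
    TrackId INTEGER NOT NULL REFERENCES Track(Id),
    Number  INTEGER,             -- attribute of the relationship itself
    PRIMARY KEY (AlbumId, TrackId)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript(SCHEMA)

tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]


def rejects_orphan_album(conn):
    """Return True if an album pointing at a missing artist is refused."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO Albums (Id, Name, Released, ArtistId) "
                "VALUES (1, 'Orphan', 2000, 99)"
            )
        return False
    except sqlite3.IntegrityError:
        return True


orphan_rejected = rejects_orphan_album(conn)
```

The track number lives in `AlbumsHaveTracks` rather than in `Track` because the same track can sit at different positions on different albums.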
SQL INSERT statement
INSERT INTO table_name (column1, column2, column3, ...)
VALUES (value1, value2, value3, ...);
Do: Populate tables with example data
Artists
Id | Name
1  | David Bowie
2  | Queen

Albums
Id | Name                 | Release | ArtistId
1  | Space oddity         | 1969    | 1
2  | … Ziggy Stardust ... | 1972    | 1
3  | Best of Bowie        | 2002    | 1
4  | Hot space            | 1982    | 2

Track
Id | Name             | Duration
1  | Space oddity     | 5:15
2  | Suffragette city | 3:25
3  | Under pressure   | 4:02

AlbumsHaveTracks
AlbumId | TrackId | Number
1       | 1       | 1
2       | 2       | 10
3       | 1       | 1
3       | 2       | 8
4       | 3       | 11
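The rows above can be loaded with parameterized `INSERT` statements. The sketch below uses Python's built-in `sqlite3` module and a compact version of the schema; the column name `Released` (chosen because `RELEASE` is a keyword in SQLite) and the text type for durations are assumptions, and the elided album title is copied exactly as it appears on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Artists (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Albums (Id INTEGER PRIMARY KEY, Name TEXT, Released INTEGER,
                     ArtistId INTEGER REFERENCES Artists(Id));
CREATE TABLE Track (Id INTEGER PRIMARY KEY, Name TEXT, Duration TEXT);
CREATE TABLE AlbumsHaveTracks (AlbumId INTEGER REFERENCES Albums(Id),
                               TrackId INTEGER REFERENCES Track(Id),
                               Number INTEGER);
""")

with conn:  # one transaction: either every row is loaded or none is
    conn.executemany(
        "INSERT INTO Artists (Id, Name) VALUES (?, ?)",
        [(1, "David Bowie"), (2, "Queen")],
    )
    conn.executemany(
        "INSERT INTO Albums (Id, Name, Released, ArtistId) VALUES (?, ?, ?, ?)",
        [
            (1, "Space oddity", 1969, 1),
            (2, "… Ziggy Stardust ...", 1972, 1),  # title elided on the slide
            (3, "Best of Bowie", 2002, 1),
            (4, "Hot space", 1982, 2),
        ],
    )
    conn.executemany(
        "INSERT INTO Track (Id, Name, Duration) VALUES (?, ?, ?)",
        [
            (1, "Space oddity", "5:15"),
            (2, "Suffragette city", "3:25"),
            (3, "Under pressure", "4:02"),
        ],
    )
    conn.executemany(
        "INSERT INTO AlbumsHaveTracks (AlbumId, TrackId, Number) VALUES (?, ?, ?)",
        [(1, 1, 1), (2, 2, 10), (3, 1, 1), (3, 2, 8), (4, 3, 11)],
    )

row_counts = {
    table: conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    for table in ("Artists", "Albums", "Track", "AlbumsHaveTracks")
}
```

Each track is stored once in `Track`, and the five rows of `AlbumsHaveTracks` record every place a track appears.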
Schema normalization: Anomalies in unnormalized data
● The unnormalized example table can suffer from three types of “anomalies”
  – Update anomaly: repeated data could be inconsistent between rows
  – Insertion anomaly: can’t add info on an artist or album without a track
  – Deletion anomaly: deleting an album's last track also deletes the album, and possibly the artist
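The update and deletion anomalies can be reproduced on a few rows of the unnormalized table. This is a small illustration in plain Python, assuming each row is a dictionary; the helper `release_years` and the changed year 2003 are made up for the demonstration.

```python
# Four rows of the unnormalized table, one dictionary per row.
rows = [
    {"artist": "David Bowie", "album": "Space Oddity", "released": 1969,
     "num": 1, "track": "Space Oddity", "dur": "5:15"},
    {"artist": "David Bowie", "album": "Best of Bowie", "released": 2002,
     "num": 1, "track": "Space Oddity", "dur": "5:15"},
    {"artist": "David Bowie", "album": "Best of Bowie", "released": 2002,
     "num": 8, "track": "Suffragette city", "dur": "3:25"},
    {"artist": "Queen", "album": "Hot space", "released": 1982,
     "num": 11, "track": "Under pressure", "dur": "4:02"},
]


def release_years(table, album):
    """Collect every release year recorded for the given album."""
    return {r["released"] for r in table if r["album"] == album}


# Update anomaly: the album's year is repeated, so changing one copy
# leaves the table disagreeing with itself.
updated = [dict(r) for r in rows]
updated[1]["released"] = 2003
conflicting_years = release_years(updated, "Best of Bowie")

# Deletion anomaly: removing Queen's only track removes Queen entirely.
remaining = [r for r in rows if r["track"] != "Under pressure"]
artists_left = {r["artist"] for r in remaining}
```

In the normalized schema the release year is stored once per album and artists have their own table, so neither problem can arise this way.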
Schema normalization: Normal forms
● Schema normalization factors logically independent data into independent relations
● And links them using foreign key relationships
● Projection is the process of factoring an unnormalized relation into separate normalized relations
● Boyce-Codd normal form: every non-trivial functional dependency goes from a superkey (a set of attributes that uniquely identifies each row) to other attributes
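Functional dependencies and superkeys can be checked against sample rows. The sketch below is an assumption-laden illustration in plain Python: `holds` and `is_superkey` are hypothetical helpers, and a check on sample data can only refute a dependency, never prove that it holds for every possible row of the schema.

```python
rows = [
    {"artist": "David Bowie", "album": "Space Oddity", "released": 1969,
     "num": 1, "track": "Space Oddity", "dur": "5:15"},
    {"artist": "David Bowie", "album": "Best of Bowie", "released": 2002,
     "num": 1, "track": "Space Oddity", "dur": "5:15"},
    {"artist": "David Bowie", "album": "Best of Bowie", "released": 2002,
     "num": 8, "track": "Suffragette city", "dur": "3:25"},
    {"artist": "Queen", "album": "Hot space", "released": 1982,
     "num": 11, "track": "Under pressure", "dur": "4:02"},
]


def holds(table, lhs, rhs):
    """True if equal lhs values always come with equal rhs values (lhs -> rhs)."""
    seen = {}
    for row in table:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


def is_superkey(table, attrs):
    """True if no two rows share the same values for attrs."""
    keys = [tuple(row[a] for a in attrs) for row in table]
    return len(keys) == len(set(keys))


album_determines_year = holds(rows, ["album"], ["released"])
artist_determines_album = holds(rows, ["artist"], ["album"])
album_is_superkey = is_superkey(rows, ["album"])
album_and_num_is_superkey = is_superkey(rows, ["album", "num"])
```

Here `album -> released` holds while `album` alone is not a superkey, which is the pattern that violates Boyce-Codd normal form and motivates moving albums into their own relation.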
Schema normalization: Unnormalized data
Artist      | Album                | Released | Track Num | Track            | Dur
David Bowie | Space Oddity         | 1969     | 1         | Space Oddity     | 5:15
David Bowie | … Ziggy Stardust ... | 1972     | 10        | Suffragette city | 3:25
David Bowie | Best of Bowie        | 2002     | 1         | Space Oddity     | 5:15
David Bowie | Best of Bowie        | 2002     | 8         | Suffragette city | 3:25
Queen       | Hot space            | 1982     | 11        | Under pressure   | 4:02
Schema normalization: Normalized data
Artists
Id | Name
1  | David Bowie
2  | Queen

Albums
Id | Name                 | Release | ArtistId
1  | Space oddity         | 1969    | 1
2  | … Ziggy Stardust ... | 1972    | 1
3  | Best of Bowie        | 2002    | 1
4  | Hot space            | 1982    | 2

Track
Id | Name             | Duration
1  | Space oddity     | 5:15
2  | Suffragette city | 3:25
3  | Under pressure   | 4:02

AlbumsHaveTracks
AlbumId | TrackId | Number
1       | 1       | 1
2       | 2       | 10
3       | 1       | 1
3       | 2       | 8
4       | 3       | 11
Reminder: Main question of course
How can systems process and store multimedia data so that users can find what they are looking for in the future?
Queries: find what they are looking for
● Search through the data
● Search through complex relationships
● Aggregate over the data for reporting
● And do all of this efficiently...
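Aggregation for reporting is typically written in SQL with `GROUP BY`. As a minimal sketch, assuming the normalized music tables from earlier (reduced to the columns these reports need) and Python's built-in `sqlite3` module, the two queries below count tracks per album and albums per artist. The constant names and the `TrackCount` and `AlbumCount` aliases are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Artists (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Albums (Id INTEGER PRIMARY KEY, Name TEXT, ArtistId INTEGER);
CREATE TABLE AlbumsHaveTracks (AlbumId INTEGER, TrackId INTEGER, Number INTEGER);
INSERT INTO Artists VALUES (1, 'David Bowie'), (2, 'Queen');
INSERT INTO Albums VALUES (1, 'Space oddity', 1), (2, '… Ziggy Stardust ...', 1),
                          (3, 'Best of Bowie', 1), (4, 'Hot space', 2);
INSERT INTO AlbumsHaveTracks VALUES (1, 1, 1), (2, 2, 10), (3, 1, 1),
                                    (3, 2, 8), (4, 3, 11);
""")

# Number of tracks recorded for each album.
TRACKS_PER_ALBUM = """
SELECT al.Id, COUNT(*) AS TrackCount
FROM Albums AS al, AlbumsHaveTracks AS aht
WHERE aht.AlbumId = al.Id
GROUP BY al.Id
ORDER BY al.Id
"""

# Number of albums recorded for each artist.
ALBUMS_PER_ARTIST = """
SELECT ar.Name, COUNT(*) AS AlbumCount
FROM Artists AS ar, Albums AS al
WHERE al.ArtistId = ar.Id
GROUP BY ar.Id, ar.Name
ORDER BY ar.Name
"""

tracks_per_album = conn.execute(TRACKS_PER_ALBUM).fetchall()
albums_per_artist = conn.execute(ALBUMS_PER_ARTIST).fetchall()
```

The counts reflect only the rows loaded here, not the artists' full discographies.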
SQL SELECT, single table
SELECT attribute1, attribute2
FROM relation
WHERE attribute1 = 'condition'
ORDER BY attribute2;
Do: Write a select query to answer
What is the duration of “Suffragette City”?
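One possible answer to the question above, run through Python's built-in `sqlite3` module against the `Track` table from the normalized example. The helper `durations_of` is hypothetical, and `COLLATE NOCASE` is an assumption added so that the capitalization in the question ("Suffragette City") still matches the stored name ("Suffragette city").

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Track (Id INTEGER PRIMARY KEY, Name TEXT, Duration TEXT)")
conn.executemany(
    "INSERT INTO Track (Id, Name, Duration) VALUES (?, ?, ?)",
    [
        (1, "Space oddity", "5:15"),
        (2, "Suffragette city", "3:25"),
        (3, "Under pressure", "4:02"),
    ],
)


def durations_of(conn, track_name):
    """Return (Name, Duration) rows for tracks with the given name."""
    cursor = conn.execute(
        "SELECT Name, Duration "
        "FROM Track "
        "WHERE Name = ? COLLATE NOCASE "  # case-insensitive match on the name
        "ORDER BY Duration",
        (track_name,),
    )
    return cursor.fetchall()


answer = durations_of(conn, "Suffragette City")
```

Passing the name as a `?` parameter rather than pasting it into the SQL string avoids quoting problems and SQL injection.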
SQL SELECT, multiple tables
SELECT r1.attribute1, r2.attribute1
FROM relation1 AS r1, relation2 AS r2
WHERE r1.attribute1 = 'condition'
  AND r1.attribute1 = r2.attribute2
ORDER BY r1.attribute1;
Do: Write a select query to answer
Find the AlbumIds of all of David Bowie's albums
Do: Write a select query to answer
Find the TrackIds of all of David Bowie's tracks
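The two David Bowie questions above can be answered with the multi-table pattern: join on the foreign keys and filter on the artist's name. The following is one possible sketch using Python's built-in `sqlite3` module, with the normalized tables reduced to the columns these queries need; the constant names `ALBUMS_SQL` and `TRACKS_SQL` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Artists (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Albums (Id INTEGER PRIMARY KEY, Name TEXT, ArtistId INTEGER);
CREATE TABLE AlbumsHaveTracks (AlbumId INTEGER, TrackId INTEGER, Number INTEGER);
INSERT INTO Artists VALUES (1, 'David Bowie'), (2, 'Queen');
INSERT INTO Albums VALUES (1, 'Space oddity', 1), (2, '… Ziggy Stardust ...', 1),
                          (3, 'Best of Bowie', 1), (4, 'Hot space', 2);
INSERT INTO AlbumsHaveTracks VALUES (1, 1, 1), (2, 2, 10), (3, 1, 1),
                                    (3, 2, 8), (4, 3, 11);
""")

# Albums: join Artists to Albums through the ArtistId foreign key.
ALBUMS_SQL = """
SELECT al.Id
FROM Artists AS ar, Albums AS al
WHERE ar.Name = ?
  AND al.ArtistId = ar.Id
ORDER BY al.Id
"""

# Tracks: extend the join through the AlbumsHaveTracks join table.
# DISTINCT is needed because a track can appear on several albums.
TRACKS_SQL = """
SELECT DISTINCT aht.TrackId
FROM Artists AS ar, Albums AS al, AlbumsHaveTracks AS aht
WHERE ar.Name = ?
  AND al.ArtistId = ar.Id
  AND aht.AlbumId = al.Id
ORDER BY aht.TrackId
"""

album_ids = [row[0] for row in conn.execute(ALBUMS_SQL, ("David Bowie",))]
track_ids = [row[0] for row in conn.execute(TRACKS_SQL, ("David Bowie",))]
```

Without `DISTINCT`, tracks 1 and 2 would each be returned twice, once for the original album and once for the compilation.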
Do: Write a select query to answer
● Find all songs containing David Bowie's vocals
● Find all songs at 120 beats per minute
● Find all songs sampled by other artists
These all require further modeling or analysis of the audio...
How do we make databases that are
● Effective (correct, durable, coherent, ...)
  – Transactions
● Efficient
  – Concurrency
  – Memory hierarchy
  – Indexing
  – Query optimization