
CISC 7610 Lecture 2: Review of relational databases

Topics:
● Relational database management systems
● Example data modeling problem
● Entity-relationship diagrams
● Structured query language (SQL)

A relational database management system (RDBMS)
● Uses relational data structures
● Has a declarative data manipulation language at least as powerful as the relational algebra
● Not required, but typically also
  – Supports ACID transactions
  – Uses SQL as the data manipulation language

Uses relational data structures

● Relation: a table with rows and columns

● Attribute: a named column of a relation

● Tuple: a row of a relation

● Key: combination of attributes that uniquely identifies each row

● Integrity rules: Constraints imposed upon the database
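As a rough illustration of keys and integrity rules, the sketch below uses Python's built-in sqlite3 module as a stand-in RDBMS (an assumption; the slides do not name a system). The table and column names follow the example data used later in the lecture, and the helper `try_insert` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign-key checks off by default

conn.executescript("""
CREATE TABLE Artists (
    Id   INTEGER PRIMARY KEY,   -- key: uniquely identifies each row
    Name TEXT NOT NULL          -- integrity rule: a name is required
);
CREATE TABLE Albums (
    Id       INTEGER PRIMARY KEY,
    Name     TEXT NOT NULL,
    ArtistId INTEGER NOT NULL REFERENCES Artists(Id)  -- referential integrity
);
""")
conn.execute("INSERT INTO Artists (Id, Name) VALUES (1, 'David Bowie')")


def try_insert(sql):
    """Run one INSERT and report whether the DBMS accepted it (hypothetical helper)."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError as err:
        return f"rejected: {err}"


# Reuses key 1, so the primary-key rule should reject it.
duplicate_key = try_insert("INSERT INTO Artists (Id, Name) VALUES (1, 'Queen')")
# Points at artist 99, which does not exist.
dangling_ref = try_insert("INSERT INTO Albums (Id, Name, ArtistId) VALUES (1, 'Hot space', 99)")
# Satisfies every constraint.
valid_row = try_insert("INSERT INTO Albums (Id, Name, ArtistId) VALUES (2, 'Space oddity', 1)")

print(duplicate_key, dangling_ref, valid_row, sep="\n")
```

The point of the sketch is that the constraints are declared once in the schema and the DBMS enforces them on every statement, rather than each application re-checking them.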

Has a declarative data manipulation language
● Declarative: says what, not how to manipulate data
● Relational algebra
  – Selection: extract a subset of tuples
  – Projection: extract a subset of attributes
  – Cartesian product: extract all combinations of pairs of tuples from two relations
  – Union: combine two sets of tuples
  – Set difference: remove one set of tuples from another
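The five operations above each have a common SQL counterpart. The sketch below is one way to see them side by side, assuming Python's sqlite3 module and two small hypothetical relations `R` and `S` with the same attributes; results are collected into Python sets to mirror the set semantics of the algebra.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE R (name TEXT, year INTEGER);
CREATE TABLE S (name TEXT, year INTEGER);
INSERT INTO R VALUES ('Space oddity', 1969), ('Hot space', 1982);
INSERT INTO S VALUES ('Hot space', 1982), ('Best of Bowie', 2002);
""")


def rows(sql):
    """Return a query result as a set of tuples."""
    return set(conn.execute(sql).fetchall())


selection = rows("SELECT * FROM R WHERE year < 1980")         # subset of tuples
projection = rows("SELECT DISTINCT name FROM R")              # subset of attributes
product = rows("SELECT * FROM R CROSS JOIN S")                # every pair of tuples
union = rows("SELECT * FROM R UNION SELECT * FROM S")         # tuples in either relation
difference = rows("SELECT * FROM R EXCEPT SELECT * FROM S")   # tuples in R but not in S

for name, result in [("selection", selection), ("projection", projection),
                     ("product", product), ("union", union), ("difference", difference)]:
    print(name, sorted(result))
```

Each query states which tuples are wanted and leaves the evaluation strategy to the DBMS, which is what "declarative" means here.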

Supports ACID transactions

● Transaction: A sequence of DB operations that represents a single real-world operation

● ACID properties
  – Guaranteed by RDBMSs
  – Atomicity: all operations happen or none do
  – Consistency: a transaction moves the DB from one state that meets the integrity constraints to another
  – Isolation: concurrent transactions have the same effect as if they had run serially
  – Durability: once committed, a transaction’s effects are permanent

● Example: bank account transfer

● Relaxed by NoSQL databases in various combinations
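The bank account transfer example can be sketched as follows, assuming Python's sqlite3 module. The `Accounts` table, its balances, and the `transfer` helper are hypothetical. The sketch illustrates atomicity and consistency only; isolation and durability would need concurrent connections and an on-disk database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Accounts (
    Id      INTEGER PRIMARY KEY,
    Balance INTEGER NOT NULL CHECK (Balance >= 0)  -- integrity constraint
);
INSERT INTO Accounts VALUES (1, 100), (2, 50);
""")


def transfer(src, dst, amount):
    """Move amount from src to dst as one transaction; return True if it committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            conn.execute("UPDATE Accounts SET Balance = Balance + ? WHERE Id = ?", (amount, dst))
            conn.execute("UPDATE Accounts SET Balance = Balance - ? WHERE Id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


def balances():
    """Return the current balances as {account id: balance}."""
    return dict(conn.execute("SELECT Id, Balance FROM Accounts"))


ok_transfer = transfer(1, 2, 30)       # both updates succeed and commit together
failed_transfer = transfer(1, 2, 500)  # the debit violates the CHECK, so the credit is undone too
final = balances()
print(ok_transfer, failed_transfer, final)
```

In the failed transfer the credit to account 2 has already been applied when the debit fails; rolling back the whole transaction is what keeps money from being created.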

Structured query language (SQL)

● Data definition language
  – Define relational schemata (plural of schema)
  – Create/alter/delete tables and their attributes

● Data manipulation language
  – Insert/delete/modify tuples in relations
  – Query one or more tables

● Can implement relational algebra, but also takes some liberties with it

Example data: Music collection

● Artists: Name

● Albums: Name, Release date

● Tracks: Name, Duration, Number

● Each album has one artist

● Tracks can appear on multiple albums (compilations)

Schema normalization: Unnormalized data

Artist       Album                 Released  Track Num  Track             Dur
David Bowie  Space Oddity          1969      1          Space Oddity      5:15
David Bowie  … Ziggy Stardust ...  1972      10         Suffragette city  3:25
David Bowie  Best of Bowie         2002      1          Space Oddity      5:15
David Bowie  Best of Bowie         2002      8          Suffragette city  3:25
Queen        Hot space             1982      11         Under pressure    4:02

Entity-relationship diagrams

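[Figure: ER diagram notation. An Entity, with an Attribute attached, is connected through a Relationship to Entity2; a cardinality is marked at each end of the relationship (Cardinality, Cardinality2).]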

Do: Draw ER diagram for example data

● Artists: Name

● Albums: Name, Release date

● Tracks: Name, Duration, Number

● Each album has one artist

● Tracks can appear on multiple albums (compilations)

Translating ER diagrams to schema

● Entities become tables

● Entity attributes become the tables' attributes (columns)

● Many-to-many relationships become join tables
  – Can have additional attributes

● Other relationships become foreign keys
  – One-to-one, many-to-one, one-to-many
  – Attributes added to table

Do: Translate ER diagram to schema for example data

SQL CREATE statement

CREATE TABLE table_name (
    column_name1 data_type(size),
    column_name2 data_type(size),
    column_name3 data_type(size),
    ....
);

Do: Create tables for example data
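One possible answer, sketched with Python's sqlite3 module (an assumption; any SQL DBMS would do). The table and column names follow the example data; the column types, the `NOT NULL` rules, and the composite key on the join table are guesses, not something the slides specify. `Release` is quoted because it is also an SQL keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Artists (
    Id   INTEGER PRIMARY KEY,
    Name TEXT NOT NULL
);
CREATE TABLE Albums (
    Id        INTEGER PRIMARY KEY,
    Name      TEXT NOT NULL,
    "Release" INTEGER,                                   -- release year
    ArtistId  INTEGER NOT NULL REFERENCES Artists(Id)    -- each album has one artist
);
CREATE TABLE Track (
    Id       INTEGER PRIMARY KEY,
    Name     TEXT NOT NULL,
    Duration TEXT                                        -- kept as 'm:ss' text, as on the slides
);
CREATE TABLE AlbumsHaveTracks (                          -- join table for the many-to-many relationship
    AlbumId INTEGER NOT NULL REFERENCES Albums(Id),
    TrackId INTEGER NOT NULL REFERENCES Track(Id),
    Number  INTEGER,                                     -- track number on that album
    PRIMARY KEY (AlbumId, TrackId)
);
""")

# List the tables that now exist in the schema.
table_names = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(table_names)
```

The track number lives on the join table rather than on `Track` because the same track can have a different number on each album it appears on.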

SQL INSERT statement

INSERT INTO table_name (column1,column2,column3,...) VALUES (value1,value2,value3,...);

Do: Populate tables with example data
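A possible way to populate the tables, again assuming Python's sqlite3 module and a simplified schema without constraints. The rows are the ones from the example data; the elided album title is kept exactly as it appears on the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Artists (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Albums (Id INTEGER PRIMARY KEY, Name TEXT, "Release" INTEGER, ArtistId INTEGER);
CREATE TABLE Track (Id INTEGER PRIMARY KEY, Name TEXT, Duration TEXT);
CREATE TABLE AlbumsHaveTracks (AlbumId INTEGER, TrackId INTEGER, Number INTEGER);
""")

with conn:  # one transaction: either every row is inserted or none is
    conn.executemany(
        "INSERT INTO Artists (Id, Name) VALUES (?, ?)",
        [(1, "David Bowie"), (2, "Queen")])
    conn.executemany(
        'INSERT INTO Albums (Id, Name, "Release", ArtistId) VALUES (?, ?, ?, ?)',
        [(1, "Space oddity", 1969, 1), (2, "… Ziggy Stardust ...", 1972, 1),
         (3, "Best of Bowie", 2002, 1), (4, "Hot space", 1982, 2)])
    conn.executemany(
        "INSERT INTO Track (Id, Name, Duration) VALUES (?, ?, ?)",
        [(1, "Space oddity", "5:15"), (2, "Suffragette city", "3:25"),
         (3, "Under pressure", "4:02")])
    conn.executemany(
        "INSERT INTO AlbumsHaveTracks (AlbumId, TrackId, Number) VALUES (?, ?, ?)",
        [(1, 1, 1), (2, 2, 10), (3, 1, 1), (3, 2, 8), (4, 3, 11)])

# Count the rows that ended up in each table.
row_counts = {
    table: conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    for table in ("Artists", "Albums", "Track", "AlbumsHaveTracks")
}
print(row_counts)
```

The `?` placeholders pass values separately from the SQL text, which avoids quoting problems with names such as the elided album title.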

Artists
Id  Name
1   David Bowie
2   Queen

Albums
Id  Name                  Release  ArtistId
1   Space oddity          1969     1
2   … Ziggy Stardust ...  1972     1
3   Best of Bowie         2002     1
4   Hot space             1982     2

Track
Id  Name              Duration
1   Space oddity      5:15
2   Suffragette city  3:25
3   Under pressure    4:02

AlbumsHaveTracks
AlbumId  TrackId  Number
1        1        1
2        2        10
3        1        1
3        2        8
4        3        11

Schema normalization: Anomalies in unnormalized data
● The unnormalized example table (one row per track appearance) can suffer from three types of “anomalies”
  – Update anomaly: repeated data can become inconsistent between rows
  – Insertion anomaly: can’t add info on an artist or album without a track
  – Deletion anomaly: deleting the last track also deletes its album or artist
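The three anomalies can be reproduced on a single-table version of the example data. This is a minimal sketch assuming Python's sqlite3 module. The table name `Collection`, the `NOT NULL` rule on `Track`, the corrected duration '5:18', and the "Hypothetical" artist and album are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Collection (
    Artist TEXT, Album TEXT, Released INTEGER,
    TrackNum INTEGER, Track TEXT NOT NULL, Dur TEXT)""")
conn.executemany("INSERT INTO Collection VALUES (?, ?, ?, ?, ?, ?)", [
    ("David Bowie", "Space Oddity", 1969, 1, "Space Oddity", "5:15"),
    ("David Bowie", "Best of Bowie", 2002, 1, "Space Oddity", "5:15"),
    ("David Bowie", "Best of Bowie", 2002, 8, "Suffragette city", "3:25"),
    ("Queen", "Hot space", 1982, 11, "Under pressure", "4:02"),
])

# Insertion anomaly: an album with no tracks yet cannot be recorded.
try:
    conn.execute("INSERT INTO Collection (Artist, Album, Released) "
                 "VALUES ('Hypothetical Artist', 'Hypothetical Album', 2000)")
    insert_result = "accepted"
except sqlite3.IntegrityError:
    insert_result = "rejected"

# Update anomaly: changing the duration in one row leaves the other copy stale.
conn.execute("UPDATE Collection SET Dur = '5:18' "
             "WHERE Track = 'Space Oddity' AND Album = 'Space Oddity'")
durations = {row[0] for row in conn.execute(
    "SELECT Dur FROM Collection WHERE Track = 'Space Oddity'")}

# Deletion anomaly: removing Queen's only track also removes the artist and album.
conn.execute("DELETE FROM Collection WHERE Track = 'Under pressure'")
artists_left = {row[0] for row in conn.execute("SELECT DISTINCT Artist FROM Collection")}

print(insert_result, durations, artists_left)
```

In the normalized schema each fact is stored once, so the same three operations would touch a single row in `Track`, `Albums`, or `Artists` without these side effects.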

Schema normalization: Normal forms

● Schema normalization factors logically independent data into independent relations

● And links them using relationships

● Projection is the process of factoring an unnormalized relation into separate normalized relations

● Boyce-Codd normal form: every non-trivial functional dependency goes from a superkey (a set of attributes that uniquely identifies each tuple) to other attributes
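To make the Boyce-Codd condition concrete, the sketch below checks functional dependencies against the rows of the unnormalized example table. The helpers `holds` and `is_superkey` are hypothetical, and the check is only against one instance: a sample of rows can refute a dependency but cannot prove that it holds for the schema in general.

```python
# Rows of the unnormalized example table, one dict per tuple.
ROWS = [
    {"Artist": "David Bowie", "Album": "Space Oddity", "Released": 1969,
     "Num": 1, "Track": "Space Oddity", "Dur": "5:15"},
    {"Artist": "David Bowie", "Album": "… Ziggy Stardust ...", "Released": 1972,
     "Num": 10, "Track": "Suffragette city", "Dur": "3:25"},
    {"Artist": "David Bowie", "Album": "Best of Bowie", "Released": 2002,
     "Num": 1, "Track": "Space Oddity", "Dur": "5:15"},
    {"Artist": "David Bowie", "Album": "Best of Bowie", "Released": 2002,
     "Num": 8, "Track": "Suffragette city", "Dur": "3:25"},
    {"Artist": "Queen", "Album": "Hot space", "Released": 1982,
     "Num": 11, "Track": "Under pressure", "Dur": "4:02"},
]


def holds(rows, lhs, rhs):
    """True if lhs -> rhs holds in these rows: equal lhs values imply equal rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


def is_superkey(rows, attrs):
    """True if attrs determine every attribute of these rows."""
    return holds(rows, attrs, list(rows[0].keys()))


track_determines_dur = holds(ROWS, ["Track"], ["Dur"])      # Track -> Dur
track_is_superkey = is_superkey(ROWS, ["Track"])            # but Track alone is not a superkey
album_track_is_superkey = is_superkey(ROWS, ["Album", "Track"])

print(track_determines_dur, track_is_superkey, album_track_is_superkey)
```

A dependency such as Track -> Dur whose left-hand side is not a superkey is exactly what Boyce-Codd normal form rules out, which is why the normalized schema moves track name and duration into their own table.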

Schema normalization: Unnormalized data

Artist       Album                 Released  Track Num  Track             Dur
David Bowie  Space Oddity          1969      1          Space Oddity      5:15
David Bowie  … Ziggy Stardust ...  1972      10         Suffragette city  3:25
David Bowie  Best of Bowie         2002      1          Space Oddity      5:15
David Bowie  Best of Bowie         2002      8          Suffragette city  3:25
Queen        Hot space             1982      11         Under pressure    4:02

Schema normalization: Normalized data

Artists
Id  Name
1   David Bowie
2   Queen

Albums
Id  Name                  Release  ArtistId
1   Space oddity          1969     1
2   … Ziggy Stardust ...  1972     1
3   Best of Bowie         2002     1
4   Hot space             1982     2

Track
Id  Name              Duration
1   Space oddity      5:15
2   Suffragette city  3:25
3   Under pressure    4:02

AlbumsHaveTracks
AlbumId  TrackId  Number
1        1        1
2        2        10
3        1        1
3        2        8
4        3        11

Reminder: Main question of course

How can systems process and store multimedia data so that users can find what they are looking for in the future?

Queries: find what they are looking for

● Search through the data

● Search through complex relationships

● Aggregate over the data for reporting

● And do all of this efficiently...
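As one example of aggregating for reporting, the sketch below counts albums per artist with `GROUP BY`. It assumes Python's sqlite3 module and a trimmed version of the example tables; the report itself is an illustration, not something the slides ask for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Artists (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Albums (Id INTEGER PRIMARY KEY, Name TEXT, ArtistId INTEGER REFERENCES Artists(Id));
INSERT INTO Artists VALUES (1, 'David Bowie'), (2, 'Queen');
INSERT INTO Albums VALUES
    (1, 'Space oddity', 1), (2, '… Ziggy Stardust ...', 1),
    (3, 'Best of Bowie', 1), (4, 'Hot space', 2);
""")

# One output row per artist, with the number of albums linked to that artist.
report = conn.execute("""
    SELECT ar.Name, COUNT(*) AS NumAlbums
    FROM Artists AS ar JOIN Albums AS al ON al.ArtistId = ar.Id
    GROUP BY ar.Id, ar.Name
    ORDER BY NumAlbums DESC
""").fetchall()
print(report)
```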

SQL SELECT, single table

SELECT attribute1, attribute2
FROM relation
WHERE attribute1 = 'condition'
ORDER BY attribute2;

Do: Write a select query to answer

What is the duration of “Suffragette City”?
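One possible answer, sketched with Python's sqlite3 module against the normalized `Track` table. The `COLLATE NOCASE` clause is an added assumption: the stored name is "Suffragette city" while the question capitalizes "City", and SQLite compares text case-sensitively by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Track (Id INTEGER PRIMARY KEY, Name TEXT, Duration TEXT);
INSERT INTO Track VALUES
    (1, 'Space oddity', '5:15'),
    (2, 'Suffragette city', '3:25'),
    (3, 'Under pressure', '4:02');
""")

# Single-table SELECT: project Duration, select the row whose Name matches.
duration = conn.execute(
    "SELECT Duration FROM Track WHERE Name = ? COLLATE NOCASE",
    ("Suffragette City",),
).fetchone()[0]
print(duration)
```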

SQL SELECT, multiple tables

SELECT r1.attribute1, r2.attribute1
FROM relation1 AS r1, relation2 AS r2
WHERE r1.attribute1 = 'condition'
  AND r1.attribute1 = r2.attribute2
ORDER BY r1.attribute1;

Do: Write a select query to answer

Find the AlbumIds of all of David Bowie's albums
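One possible answer, assuming Python's sqlite3 module and the normalized `Artists` and `Albums` tables. It follows the two-table pattern shown above: one condition picks the artist, and one links each album to its artist through the foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Artists (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Albums (Id INTEGER PRIMARY KEY, Name TEXT, ArtistId INTEGER);
INSERT INTO Artists VALUES (1, 'David Bowie'), (2, 'Queen');
INSERT INTO Albums VALUES
    (1, 'Space oddity', 1), (2, '… Ziggy Stardust ...', 1),
    (3, 'Best of Bowie', 1), (4, 'Hot space', 2);
""")

album_ids = [row[0] for row in conn.execute("""
    SELECT al.Id
    FROM Artists AS ar, Albums AS al
    WHERE ar.Name = 'David Bowie'   -- selection on the artist
      AND al.ArtistId = ar.Id       -- join condition
    ORDER BY al.Id
""")]
print(album_ids)
```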

Do: Write a select query to answer

Find the TrackIds of all of David Bowie's tracks
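One possible answer, assuming Python's sqlite3 module. It extends the two-table join with the `AlbumsHaveTracks` join table, and uses `DISTINCT` because a track that appears on several of the artist's albums would otherwise be listed once per album.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Artists (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Albums (Id INTEGER PRIMARY KEY, Name TEXT, ArtistId INTEGER);
CREATE TABLE AlbumsHaveTracks (AlbumId INTEGER, TrackId INTEGER, Number INTEGER);
INSERT INTO Artists VALUES (1, 'David Bowie'), (2, 'Queen');
INSERT INTO Albums VALUES
    (1, 'Space oddity', 1), (2, '… Ziggy Stardust ...', 1),
    (3, 'Best of Bowie', 1), (4, 'Hot space', 2);
INSERT INTO AlbumsHaveTracks VALUES (1, 1, 1), (2, 2, 10), (3, 1, 1), (3, 2, 8), (4, 3, 11);
""")

track_ids = [row[0] for row in conn.execute("""
    SELECT DISTINCT aht.TrackId
    FROM Artists AS ar, Albums AS al, AlbumsHaveTracks AS aht
    WHERE ar.Name = 'David Bowie'
      AND al.ArtistId = ar.Id      -- artist to albums
      AND aht.AlbumId = al.Id      -- albums to tracks
    ORDER BY aht.TrackId
""")]
print(track_ids)
```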

Do: Write a select query to answer

● Find all songs containing David Bowie's vocals

● Find all songs at 120 beats per minute

● Find all songs sampled by other artists
  – These all require further modeling or analysis of the audio...

How do we make databases that are

● Effective (correct, durable, coherent, ...)
  – Transactions
● Efficient
  – Concurrency
  – Memory hierarchy
  – Indexing
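As a small preview of indexing, the sketch below asks SQLite how it would run the same query before and after an index is created. It assumes Python's sqlite3 module; the 1000 placeholder rows, the index name `idx_track_name`, and the `plan` helper are made up for illustration. The exact wording of the plan text varies between SQLite versions, so no expected output is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Track (Id INTEGER PRIMARY KEY, Name TEXT, Duration TEXT)")
conn.executemany(
    "INSERT INTO Track (Name, Duration) VALUES (?, ?)",
    [(f"track {i}", "3:00") for i in range(1000)])  # synthetic placeholder rows

QUERY = "SELECT Duration FROM Track WHERE Name = 'track 500'"


def plan(sql):
    """Return SQLite's description of how it would execute sql (hypothetical helper)."""
    return " / ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


plan_before = plan(QUERY)  # without an index on Name, the table has to be scanned
conn.execute("CREATE INDEX idx_track_name ON Track (Name)")
plan_after = plan(QUERY)   # the planner can now look the name up in the index

print("before:", plan_before)
print("after: ", plan_after)
```

The query text is identical in both cases; only the physical access path changes, which is the separation between "what" and "how" that a declarative language allows.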