
database

Comp 205 advanced web programming

Definition

Database - a collection of structured information for one or more specific purposes – often represented by a cylinder

Relational Database - information is stored in a set of related tables

Table – organizational structure used to represent an entity
– Columns are attributes of the entity, also called fields
  • Name & Type
– Rows are instances of the entity, also called records

Music Database - 1st Attempt

We can store our data in a delimited text file
– Represents One- Solution
– Problems?

Teenage Dream, Katy Perry, Teenage Dream, 2010
Viva la Vida, Coldplay, Death and All His Friends, 2009
Stronger, Kanye West, Graduation, 2007
…

Attributes: Name | Artist | Album | Year
Instances:
Teenage Dream | Katy Perry | Teenage Dream | 2010
Viva la Vida | Coldplay | Death and All His Friends | 2009
Stronger | Kanye West | Graduation | 2007
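One way to explore the "Problems?" question is to try reading the delimited file with a few lines of Python. This is only a sketch: the field names and the last record (a made-up title that contains a comma) are assumptions added for illustration, not part of the slides.

```python
# Records from the slide, one per line: name, artist, album, year.
DATA = """Teenage Dream, Katy Perry, Teenage Dream, 2010
Viva la Vida, Coldplay, Death and All His Friends, 2009
Stronger, Kanye West, Graduation, 2007"""

# Assumed field names, matching the attribute row of the table above.
FIELDS = ("name", "artist", "album", "year")


def parse(text):
    """Split each line on commas and pair the pieces with the field names."""
    records = []
    for line in text.splitlines():
        parts = [part.strip() for part in line.split(",")]
        records.append(dict(zip(FIELDS, parts)))
    return records


songs = parse(DATA)

# Every question we ask is a hand-written scan over all records.
from_2007 = [song["name"] for song in songs if song["year"] == "2007"]

# Hypothetical record whose title contains the delimiter: every field shifts.
broken = parse("Song, With Comma, Some Artist, Some Album, 2000")[0]
```

In the last record the artist field ends up holding part of the title, and nothing in the file format detects it. Repeated artist and album names are also stored once per song, which is the kind of duplication normalization is meant to remove.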

Normal Forms

Normalization - process to develop a clean DB design

Normal Forms
– incremental set of DB designs
– increases the number of tables & attributes
– try to use the simplest form possible
– goal is to have the greatest access to all data with the fewest operations

There are three main reasons to normalize a database:

1. to minimize duplicate data, 2. to minimize or avoid data modification issues, and 3. to simplify queries.

Reasons for Normalization

The first thing to notice is this table serves many purposes including:

1. Identifying the organization’s salespeople
2. Listing the sales offices and phone numbers
3. Associating a salesperson with a sales office
4. Showing each salesperson’s customers

1. Insert Anomaly. We cannot record a new sales office until we also know the salesperson.
   a. In order to create the record, we need to provide a primary key. In our case this is the EmployeeID.

2. Update Anomaly. The same information is recorded in multiple rows.
   a. If the office number changes, then multiple updates need to be made across all rows.

3. Deletion Anomaly. Deletion of a row can cause more than one set of facts to be removed.
   a. If John Hunt retires, then deleting that row causes us to lose information about the New York office.

First Normal Form (1NF)
1. All attributes are “single-valued”
2. All instances have a unique identifier

The repeating groups of columns now become separate rows in the Customer table linked by the EmployeeID foreign key. A foreign key is a value which matches back to another table’s primary key. This design is superior to our original table in several ways:

The original design limited each SalesStaffInformation entry to three customers. In the new design, the number of customers associated with each entry is practically unlimited.

It was nearly impossible to sort the original data by customer. Now, it is simple to sort customers.

The insert and deletion anomalies for Customer have been eliminated. You can delete all the customers for a SalesPerson without having to delete the entire SalesStaffInformation row.
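As a minimal sketch of the 1NF design described above, the following Python snippet builds the two tables in an in-memory SQLite database. The column types, the composite key on Customer, the customer names, and the office number are assumptions made for illustration; only the table names, EmployeeID, and John Hunt's New York office come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE SalesStaffInformation (
    EmployeeID   INTEGER PRIMARY KEY,
    SalesPerson  TEXT NOT NULL,
    SalesOffice  TEXT,
    OfficeNumber TEXT
);
-- One row per customer instead of Customer1/Customer2/Customer3 columns.
CREATE TABLE Customer (
    CustomerID   INTEGER NOT NULL,
    EmployeeID   INTEGER NOT NULL
                 REFERENCES SalesStaffInformation(EmployeeID),
    CustomerName TEXT NOT NULL,
    PRIMARY KEY (CustomerID, EmployeeID)
);
""")

# Hypothetical sample data: one salesperson with more than three customers.
conn.execute(
    "INSERT INTO SalesStaffInformation VALUES (1, 'John Hunt', 'New York', '555-0100')"
)
conn.executemany(
    "INSERT INTO Customer (CustomerID, EmployeeID, CustomerName) VALUES (?, ?, ?)",
    [(1, 1, "Delta Co"), (2, 1, "Alpha Co"), (3, 1, "Charlie Co"), (4, 1, "Bravo Co")],
)

# Sorting by customer is now a single ORDER BY.
sorted_names = [
    row[0]
    for row in conn.execute("SELECT CustomerName FROM Customer ORDER BY CustomerName")
]

# Deleting a salesperson's customers leaves the salesperson row untouched.
conn.execute("DELETE FROM Customer WHERE EmployeeID = 1")
staff_left = conn.execute("SELECT COUNT(*) FROM SalesStaffInformation").fetchone()[0]
customers_left = conn.execute("SELECT COUNT(*) FROM Customer").fetchone()[0]
```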

Does this 1NF work for our Music DB?
– No, collaborations between artists

Song: Name, Artist, Album, Year, Genre, Record Label

Multiple Tables

Multiple-value attributes should be removed by adding multiple tables

Song: Name, Album, Year, Genre, Record Label
Artist: Name, Country, Country Abbr

Unique Identifiers

We want a way to uniquely identify each song
– covers, remakes, songs with same name

Solution: create an artificial ID for each instance in each table
– auto-incrementing integer

Song: ID, Name, Album, Year, Genre, Record Label
Artist: ID, Name, Country, Country Abbr

Turnbull - CS205 - Topic 11

Relationships

Three types of relationships
– one-to-one - can usually merge two tables
– one-to-many - most common
– many-to-many - most complex

What is the relationship between the song and artist tables?

Song: ID, Name, Album, Year, Genre, Record Label
Artist: ID, Name, Country, Country Abbr
Relationship: Song M2M Artist

2nd Normal Form (2NF)
• Everything from 1NF
• Non-identifying attributes should be moved
• Idea: if the same value appears multiple times for an attribute, it should be another entity

All the non-key columns are dependent on the table’s primary key.

The primary key uniquely identifies each row in a table. All columns must depend on the primary key:

in order to find a particular value, such as the color of Kris’ hair, you would first have to know the primary key, such as an EmployeeID, to look up the answer.

Once you identify a table’s purpose, then look at each of the table’s columns and ask yourself,

“Does this serve to describe what the primary key identifies?”

If you answer “yes,” then the column is dependent on the primary key and belongs in the table.

If you answer “no,” then the column should be moved to a different table.

When a table is in second normal form, it has a single purpose, such as storing employee information.

The first issue is the SalesStaffInformation table has two columns which aren’t dependent on the EmployeeID.

The second issue is that there are several attributes which don’t completely rely on the entire Customer table primary key.

Since the columns identified in red aren’t completely dependent on the table’s primary key, they belong elsewhere. In both cases, the columns are moved to new tables.

In the case of SalesOffice and OfficeNumber, a SalesOffice table was created. A foreign key was then added to SalesStaffInformation so we can still describe in which office a salesperson is based.

The changes to make Customer a second normal form table are trickier.

Rather than move the offending columns CustomerName, CustomerCity, and CustomerPostalCode to a new table, recognize that the issue is EmployeeID! The three columns don’t depend on this part of the key.

So remove EmployeeID from the table.

Now create a table named SalesStaffCustomer to describe which customers a sales person calls upon.

This table has two columns CustomerID and EmployeeID.

Together, they form a primary key.

Separately, they are foreign keys to the Customer and SalesStaffInformation tables respectively.

You can now eliminate all the sales people, yet retain customer records. Also, if all the SalesOffices close, it doesn’t mean you have to delete the records containing sales people.

The SalesStaffCustomer table is all keys!

This type of table is called an intersection table. An intersection table is useful when you need to model a many-to-many relationship.
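The 2NF result for the sales example can be sketched with Python's sqlite3 module. Table and column names follow the slides where they are given; SalesOfficeID, the column types, the ON DELETE CASCADE clauses, and all sample rows other than John Hunt and New York are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE SalesOffice (
    SalesOfficeID INTEGER PRIMARY KEY,
    SalesOffice   TEXT NOT NULL,
    OfficeNumber  TEXT
);
CREATE TABLE SalesStaffInformation (
    EmployeeID    INTEGER PRIMARY KEY,
    SalesPerson   TEXT NOT NULL,
    SalesOfficeID INTEGER REFERENCES SalesOffice(SalesOfficeID)
);
CREATE TABLE Customer (
    CustomerID         INTEGER PRIMARY KEY,
    CustomerName       TEXT NOT NULL,
    CustomerCity       TEXT,
    CustomerPostalCode TEXT
);
-- Intersection table: all keys, two foreign keys forming one primary key.
CREATE TABLE SalesStaffCustomer (
    CustomerID INTEGER NOT NULL
               REFERENCES Customer(CustomerID) ON DELETE CASCADE,
    EmployeeID INTEGER NOT NULL
               REFERENCES SalesStaffInformation(EmployeeID) ON DELETE CASCADE,
    PRIMARY KEY (CustomerID, EmployeeID)
);
-- Hypothetical sample rows.
INSERT INTO SalesOffice VALUES (1, 'New York', '555-0100');
INSERT INTO SalesStaffInformation VALUES (1, 'John Hunt', 1), (2, 'Second Person', 1);
INSERT INTO Customer VALUES (1, 'Alpha Co', 'Sample City', '00000');
INSERT INTO SalesStaffCustomer VALUES (1, 1), (1, 2);
""")

# Many-to-many: one customer is called upon by two salespeople.
callers = [
    row[0]
    for row in conn.execute("""
        SELECT s.SalesPerson
        FROM SalesStaffInformation AS s
        JOIN SalesStaffCustomer AS sc ON sc.EmployeeID = s.EmployeeID
        WHERE sc.CustomerID = 1
        ORDER BY s.SalesPerson
    """)
]

# Removing every salesperson drops the links but keeps the customer records.
conn.execute("DELETE FROM SalesStaffInformation")
customers_left = conn.execute("SELECT COUNT(*) FROM Customer").fetchone()[0]
links_left = conn.execute("SELECT COUNT(*) FROM SalesStaffCustomer").fetchone()[0]
```

The cascade is one possible design choice; without it, the links in SalesStaffCustomer would have to be deleted explicitly before the salespeople.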

2nd Normal Form (2NF): Music DB

Album: ID, Name, Year
Song: ID, Name
Artist: ID, Name, Country, Country Abbr
Genre: ID, Name
Relationships: Album O2M Song; Song M2M Artist; Genre O2M Song

3rd Normal Form (3NF)
• Everything from 2NF
• No Attribute Dependencies
• Idea: don’t allow bad data entry to corrupt the DB

Album: ID, Name, Year
Song: ID, Name
Artist: ID, Name
Genre: ID, Name
Country: ID, Name, Abbr
Relationships: Album O2M Song; Song M2M Artist; Genre O2M Song; Country O2M Artist

transitive dependence: a column’s value relies upon another column through a second intermediate column.

see https://www.essentialsql.com/get-ready-to-learn-sql-11-database-third-normal-form-explained-in-simple-english/
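In the music example, the country abbreviation depends on the country, which in turn depends on the artist, so it is a transitive dependence. A minimal sqlite3 sketch of the 3NF fix follows; the column types and the abbreviation values are assumptions, and CountryID follows the final schema later in the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The abbreviation is stored once per country, not once per artist.
CREATE TABLE Country (
    ID   INTEGER PRIMARY KEY AUTOINCREMENT,
    Name TEXT NOT NULL UNIQUE,
    Abbr TEXT NOT NULL
);
CREATE TABLE Artist (
    ID        INTEGER PRIMARY KEY AUTOINCREMENT,
    Name      TEXT NOT NULL,
    CountryID INTEGER REFERENCES Country(ID)
);
INSERT INTO Country (Name, Abbr)
VALUES ('United States', 'US'), ('United Kingdom', 'UK');
INSERT INTO Artist (Name, CountryID)
VALUES ('Katy Perry', 1), ('Kanye West', 1), ('Coldplay', 2);
""")

# Changing an abbreviation is a single-row update; no artist row can disagree.
conn.execute("UPDATE Country SET Abbr = 'USA' WHERE Name = 'United States'")

rows = conn.execute("""
    SELECT a.Name, c.Abbr
    FROM Artist AS a
    JOIN Country AS c ON c.ID = a.CountryID
    ORDER BY a.Name
""").fetchall()
```

With Country and Country Abbr kept as two columns on Artist, a typo in one row could leave two artists from the same country with different abbreviations, which is the kind of bad data entry the slide warns about.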

Six Important Concepts

1. Entities are tables
2. Attributes or Fields are columns of tables
3. Each attribute has a data type (int, string, date)
4. Instances or Records are rows of a table
5. Unique ID for an instance is called the “primary key”
6. Relationships are encoded as “foreign keys”

Foreign Keys

For a one-to-many relationship, we add a “foreign key” to the “many” table.

Before:
Album: ID, Name, Year
Song: ID, Name
Relationship: Album O2M Song

After:
Album: ID, Name, Year
Song: ID, Name, AlbumID
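A small sqlite3 sketch of the Album-to-Song foreign key, assuming the column types shown here. "Good Life" is added as a second song on the album purely for illustration, and the "Orphan" row is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Album (
    ID   INTEGER PRIMARY KEY AUTOINCREMENT,
    Name TEXT NOT NULL,
    Year INTEGER
);
-- The "many" side (Song) carries the foreign key to the "one" side (Album).
CREATE TABLE Song (
    ID      INTEGER PRIMARY KEY AUTOINCREMENT,
    Name    TEXT NOT NULL,
    AlbumID INTEGER REFERENCES Album(ID)
);
""")

album_id = conn.execute(
    "INSERT INTO Album (Name, Year) VALUES (?, ?)", ("Graduation", 2007)
).lastrowid
conn.executemany(
    "INSERT INTO Song (Name, AlbumID) VALUES (?, ?)",
    [("Stronger", album_id), ("Good Life", album_id)],
)

# Follow the foreign key to list every song on one album.
songs = [
    row[0]
    for row in conn.execute(
        """
        SELECT Song.Name
        FROM Song JOIN Album ON Album.ID = Song.AlbumID
        WHERE Album.Name = ?
        ORDER BY Song.Name
        """,
        ("Graduation",),
    )
]

# A song pointing at an album that does not exist is rejected.
try:
    conn.execute("INSERT INTO Song (Name, AlbumID) VALUES (?, ?)", ("Orphan", 999))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```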

Many-To-Many

We can implement M2M by adding “join tables”
– sometimes called junctions
– Idea: M2M ≈ M2O + O2M

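Before:
Song: ID, Name
Artist: ID, Name
Relationship: Song M2M Artist

After:
Song: ID, Name
SongToArtist: ID, SongID, ArtistID
Artist: ID, Name
Relationships: Song O2M SongToArtist; Artist O2M SongToArtist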
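The join-table idea can be tried out with sqlite3. This is a sketch under assumed column types; "E.T." is included only as an illustrative collaboration between two of the artists used in the slides, and the helper names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Song   (ID INTEGER PRIMARY KEY AUTOINCREMENT, Name TEXT NOT NULL);
CREATE TABLE Artist (ID INTEGER PRIMARY KEY AUTOINCREMENT, Name TEXT NOT NULL);
-- Join table: each row links one song to one artist.
CREATE TABLE SongToArtist (
    ID       INTEGER PRIMARY KEY AUTOINCREMENT,
    SongID   INTEGER NOT NULL REFERENCES Song(ID),
    ArtistID INTEGER NOT NULL REFERENCES Artist(ID)
);
INSERT INTO Song (Name) VALUES ('Teenage Dream'), ('Stronger'), ('E.T.');
INSERT INTO Artist (Name) VALUES ('Katy Perry'), ('Kanye West');
INSERT INTO SongToArtist (SongID, ArtistID) VALUES (1, 1), (2, 2), (3, 1), (3, 2);
""")


def artists_for(song_name):
    """Names of all artists linked to a song, in alphabetical order."""
    rows = conn.execute(
        """
        SELECT Artist.Name
        FROM Artist
        JOIN SongToArtist ON SongToArtist.ArtistID = Artist.ID
        JOIN Song ON Song.ID = SongToArtist.SongID
        WHERE Song.Name = ?
        ORDER BY Artist.Name
        """,
        (song_name,),
    )
    return [row[0] for row in rows]


def songs_for(artist_name):
    """Names of all songs linked to an artist, in alphabetical order."""
    rows = conn.execute(
        """
        SELECT Song.Name
        FROM Song
        JOIN SongToArtist ON SongToArtist.SongID = Song.ID
        JOIN Artist ON Artist.ID = SongToArtist.ArtistID
        WHERE Artist.Name = ?
        ORDER BY Song.Name
        """,
        (artist_name,),
    )
    return [row[0] for row in rows]


collaborators = artists_for("E.T.")
katy_songs = songs_for("Katy Perry")
```

The same join table answers the question in both directions, which is what "M2M ≈ M2O + O2M" amounts to in practice.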

Putting it all together

Album: ID, Name, Year
Song: ID, Name, AlbumID, GenreID
SongToArtist: ID, SongID, ArtistID
Artist: ID, Name, CountryID
Genre: ID, Name
Country: ID, Name, Abbr
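The full schema above might be declared as follows. Table and column names are taken from the diagram; the SQL types and NOT NULL constraints are assumptions, and `foreign_keys` is a hypothetical helper that reads the relationships back from SQLite.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Genre   (ID INTEGER PRIMARY KEY AUTOINCREMENT, Name TEXT NOT NULL);
CREATE TABLE Country (ID INTEGER PRIMARY KEY AUTOINCREMENT, Name TEXT NOT NULL, Abbr TEXT);
CREATE TABLE Album   (ID INTEGER PRIMARY KEY AUTOINCREMENT, Name TEXT NOT NULL, Year INTEGER);
CREATE TABLE Artist (
    ID        INTEGER PRIMARY KEY AUTOINCREMENT,
    Name      TEXT NOT NULL,
    CountryID INTEGER REFERENCES Country(ID)
);
CREATE TABLE Song (
    ID      INTEGER PRIMARY KEY AUTOINCREMENT,
    Name    TEXT NOT NULL,
    AlbumID INTEGER REFERENCES Album(ID),
    GenreID INTEGER REFERENCES Genre(ID)
);
CREATE TABLE SongToArtist (
    ID       INTEGER PRIMARY KEY AUTOINCREMENT,
    SongID   INTEGER NOT NULL REFERENCES Song(ID),
    ArtistID INTEGER NOT NULL REFERENCES Artist(ID)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)


def foreign_keys(table):
    """Return sorted (column, referenced table) pairs for one table."""
    rows = conn.execute(f"PRAGMA foreign_key_list({table})").fetchall()
    # Each row is (id, seq, table, from, to, on_update, on_delete, match).
    return sorted((row[3], row[2]) for row in rows)


song_keys = foreign_keys("Song")
join_keys = foreign_keys("SongToArtist")
artist_keys = foreign_keys("Artist")
```

Each O2M arrow in the diagrams shows up as one foreign key column, and the single M2M relationship shows up as the two foreign keys in SongToArtist.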

Summary: DB Schema Creation Algorithm

1. Identify Major Entities
   – draw a box for each table
2. Figure out attributes for each entity
   – add integer id
   – name & data type
3. Figure out relationship between each pair of entities
   – O2O – combine entities
   – O2M – add foreign key to the “many” table
   – M2M – create a new join table

Exercise

Design a database schema for keeping track of class rosters (e.g., Homer):

Hints:
– Consider students, courses and professors
– Assume each course has at most one professor

Next time

We will introduce you to SQL
– Structured Query Language
– Designed to directly encode semantics of DB
  • “Select all songs by Kanye West from 2007”
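As a preview, the example request could be written roughly like this. It is a sketch against a simplified version of the music schema (Genre and Country are left out), with assumed column types and only the three songs from the first slides as data; `songs_by` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Album  (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL, Year INTEGER);
CREATE TABLE Artist (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Song (
    ID      INTEGER PRIMARY KEY,
    Name    TEXT NOT NULL,
    AlbumID INTEGER REFERENCES Album(ID)
);
CREATE TABLE SongToArtist (
    ID       INTEGER PRIMARY KEY,
    SongID   INTEGER NOT NULL REFERENCES Song(ID),
    ArtistID INTEGER NOT NULL REFERENCES Artist(ID)
);
INSERT INTO Album VALUES
    (1, 'Teenage Dream', 2010),
    (2, 'Death and All His Friends', 2009),
    (3, 'Graduation', 2007);
INSERT INTO Artist VALUES (1, 'Katy Perry'), (2, 'Coldplay'), (3, 'Kanye West');
INSERT INTO Song VALUES
    (1, 'Teenage Dream', 1), (2, 'Viva la Vida', 2), (3, 'Stronger', 3);
INSERT INTO SongToArtist VALUES (1, 1, 1), (2, 2, 2), (3, 3, 3);
""")


def songs_by(artist, year):
    """Select all songs by one artist whose album is from the given year."""
    rows = conn.execute(
        """
        SELECT Song.Name
        FROM Song
        JOIN Album        ON Album.ID = Song.AlbumID
        JOIN SongToArtist ON SongToArtist.SongID = Song.ID
        JOIN Artist       ON Artist.ID = SongToArtist.ArtistID
        WHERE Artist.Name = ? AND Album.Year = ?
        ORDER BY Song.Name
        """,
        (artist, year),
    )
    return [row[0] for row in rows]


kanye_2007 = songs_by("Kanye West", 2007)
```

The WHERE clause reads almost like the English request, while the JOINs follow the foreign keys introduced by normalization.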


Pop Quiz

Design a database schema for keeping track of class rosters (e.g., Homer):

Hints:
– Consider students, courses and professors
– Assume each course has at most one professor

IC Database Schema

Student: id, firstname, lastname, gpa, email, major
StudentToCourse: id, studentID, courseID
Course: id, courseNumber, days, time, room, instructorID
Instructor: id, firstname, lastname, email

Rules:
1) Tables are capitalized Camelback
2) Properties are (lowercase) Camelback
3) First attribute is always the “id”
4) Join tables are called “Table1ToTable2”
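One way to check the roster schema is to create it in SQLite and ask for a class list. Table and column names follow the schema above; the SQL types, the sample people and courses, and the `roster` helper are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Instructor (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    firstname TEXT, lastname TEXT, email TEXT
);
CREATE TABLE Course (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    courseNumber TEXT, days TEXT, time TEXT, room TEXT,
    instructorID INTEGER REFERENCES Instructor(id)  -- at most one professor
);
CREATE TABLE Student (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    firstname TEXT, lastname TEXT, gpa REAL, email TEXT, major TEXT
);
-- Join table named by the "Table1ToTable2" rule.
CREATE TABLE StudentToCourse (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    studentID INTEGER NOT NULL REFERENCES Student(id),
    courseID  INTEGER NOT NULL REFERENCES Course(id)
);
-- Hypothetical sample rows.
INSERT INTO Instructor (firstname, lastname) VALUES ('Pat', 'Teacher');
INSERT INTO Course (courseNumber, instructorID)
VALUES ('COMP 205', 1), ('COMP 101', 1);
INSERT INTO Student (firstname, lastname)
VALUES ('Ada', 'Young'), ('Ben', 'Adams'), ('Cy', 'Moore');
INSERT INTO StudentToCourse (studentID, courseID) VALUES (1, 1), (2, 1), (3, 2);
""")


def roster(course_number):
    """Return (lastname, firstname) for every student enrolled in a course."""
    rows = conn.execute(
        """
        SELECT Student.lastname, Student.firstname
        FROM Student
        JOIN StudentToCourse ON StudentToCourse.studentID = Student.id
        JOIN Course ON Course.id = StudentToCourse.courseID
        WHERE Course.courseNumber = ?
        ORDER BY Student.lastname, Student.firstname
        """,
        (course_number,),
    )
    return rows.fetchall()


comp205 = roster("COMP 205")
```

Because a course has at most one professor, the instructor is a plain foreign key on Course, while the student-to-course relationship is many-to-many and needs the join table.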

Why databases?

Make it easy to relate, store, and retrieve data

[Diagram: client, web server, server-side program, and database, with request and response arrows]

SQL

Structured Query Language
standard for most DBs
– e.g., sqlite3, postgres

Uses:
– create database “schema”
– insert, update, delete data
– “query” the database for information

Learn by doing…