database
Comp 205 advanced web programming
Definition
Database - a collection of structured information for one or more specific purposes, often represented by a cylinder
Relational Database - information is stored in a set of related tables
Table – organizational structure used to represent an entity
– Columns are attributes of the entity, also called fields
  • Name & Data Type
– Rows are instances of the entity, also called records
Music Database - 1st Attempt
We can store our data in a delimited text file
– Represents One-Table Solution
– Problems?

Teenage Dream, Katy Perry, Teenage Dream, 2010
Viva la Vida, Coldplay, Death and All His Friends, 2009
Stronger, Kanye West, Graduation, 2007
…

Attributes (columns): Name, Artist, Album, Year
Instances (rows):

Name           Artist       Album                       Year
Teenage Dream  Katy Perry   Teenage Dream               2010
Viva la Vida   Coldplay     Death and All His Friends   2009
Stronger       Kanye West   Graduation                  2007
Normal Forms
Normalization - process to develop a clean DB design

Normal Forms
– incremental set of DB designs
– each form increases the number of tables & attributes
– try to use the simplest form possible
– goal is to have the greatest access to all data with the fewest operations

There are three main reasons to normalize a database:
1. to minimize duplicate data,
2. to minimize or avoid data modification issues, and
3. to simplify queries.
Reasons for Normalization

The first thing to notice is that this table serves many purposes, including:
1. Identifying the organization’s salespeople
2. Listing the sales offices and phone numbers
3. Associating a salesperson with a sales office
4. Showing each salesperson’s customers
1. Insert Anomaly. We cannot record a new sales office until we also know the salesperson.
   a. In order to create the record, we need to provide a primary key. In our case this is the EmployeeID.
2. Update Anomaly. The same information is recorded in multiple rows.
   a. If the office number changes, then the same update has to be made in every row that records it.
3. Deletion Anomaly. Deletion of a row can cause more than one set of facts to be removed.
   a. If John Hunt retires, then deleting that row causes us to lose information about the New York office.

First Normal Form (1NF)
1. All attributes are “single-valued”
2. All instances have a unique identifier
The repeating groups of columns now become separate rows in the Customer table, linked by the EmployeeID foreign key. A foreign key is a value which matches back to another table’s primary key. This design is superior to our original table in several ways:

– The original design limited each SalesStaffInformation entry to three customers. In the new design, the number of customers associated with each salesperson is practically unlimited.
– It was nearly impossible to sort the original data by customer. Now, it is simple to sort customers.
– The insert and deletion anomalies for Customer have been eliminated. You can delete all the customers for a salesperson without having to delete the entire SalesStaffInformation row.
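The 1NF benefits listed above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the slides' exact design: the SalesPerson column, the customer names, and the single-column CustomerID key are assumptions made for the example.

```python
import sqlite3

# In-memory database; columns beyond EmployeeID/CustomerName are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SalesStaffInformation (
    EmployeeID  INTEGER PRIMARY KEY,
    SalesPerson TEXT NOT NULL
);
CREATE TABLE Customer (
    CustomerID   INTEGER PRIMARY KEY,
    EmployeeID   INTEGER NOT NULL REFERENCES SalesStaffInformation(EmployeeID),
    CustomerName TEXT NOT NULL
);
""")

conn.execute("INSERT INTO SalesStaffInformation VALUES (1, 'John Hunt')")
# Four customers for one salesperson: more than three repeating column groups could hold.
conn.executemany(
    "INSERT INTO Customer (EmployeeID, CustomerName) VALUES (?, ?)",
    [(1, "Delta Co"), (1, "Acme Co"), (1, "Cobalt Co"), (1, "Birch Co")],
)

# Sorting by customer is now a plain ORDER BY.
sorted_names = [row[0] for row in conn.execute(
    "SELECT CustomerName FROM Customer ORDER BY CustomerName")]

# Deleting every customer of a salesperson leaves the salesperson row in place.
conn.execute("DELETE FROM Customer WHERE EmployeeID = 1")
staff_left = conn.execute(
    "SELECT COUNT(*) FROM SalesStaffInformation").fetchone()[0]
customers_left = conn.execute("SELECT COUNT(*) FROM Customer").fetchone()[0]
print(sorted_names, staff_left, customers_left)
```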
Does this 1NF work for our Music DB?
– No, collaborations between artists make Artist a multi-valued attribute

Song (Name, Artist, Album, Year, Genre, Record Label)
Multiple Tables
Multi-valued attributes should be removed by adding multiple tables

Song (Name, Album, Year, Genre, Record Label)
Artist (Name, Country, Country Abbr)
Unique Identifiers
We want a way to uniquely identify each song
– covers, remakes, songs with the same name
Solution: create an artificial ID for each instance in each table
– auto-incrementing integer

Song (ID, Name, Album, Year, Genre, Record Label)
Artist (ID, Name, Country, Country Abbr)
Turnbull - CS205 - Topic 11

Relationships
Three types of relationships
– one-to-one - can usually merge two tables
– one-to-many - most common
– many-to-many - most complex
What is the relationship between the song and artist tables?

Song (ID, Name, Album, Year, Genre, Record Label)
Artist (ID, Name, Country, Country Abbr)
Relationship: Song M2M Artist

2nd Normal Form (2NF)
• Everything from 1NF
• Non-identifying attributes should be moved
• Idea: if the same value appears multiple times for an attribute, it should be another entity
All the non-key columns are dependent on the table’s primary key.
The primary key uniquely identifies each row in a table. All columns must depend on the primary key: in order to find a particular value, such as the color of Kris’ hair, you would first have to know the primary key, such as an EmployeeID, to look up the answer.
Once you identify a table’s purpose, then look at each of the table’s columns and ask yourself,
“Does this column serve to describe what the primary key identifies?”
If you answer “yes,” then the column is dependent on the primary key and belongs in the table.
If you answer “no,” then the column should be moved to a different table.
When a table is in second normal form, it has a single purpose, such as storing employee information.
The first issue is that the SalesStaffInformation table has two columns which aren’t dependent on the EmployeeID.
The second issue is that there are several attributes which don’t completely rely on the entire Customer table primary key.
Since the columns identified in red aren’t completely dependent on the table’s primary key, they belong elsewhere. In both cases, the columns are moved to new tables.
In the case of SalesOffice and OfficeNumber, a SalesOffice table was created. A foreign key was then added to SalesStaffInformation so we can still describe in which office a salesperson is based.
The changes to make Customer a second normal form table are trickier.
Rather than move the offending columns CustomerName, CustomerCity, and CustomerPostalCode to a new table, recognize that the issue is EmployeeID! The three columns don’t depend on this part of the key.
So remove EmployeeID from the table.
Now create a table named SalesStaffCustomer to describe which customers a sales person calls upon.
This table has two columns CustomerID and EmployeeID.
Together, they form a primary key.
Separately, they are foreign keys to the Customer and SalesStaffInformation tables respectively.
You can now eliminate all the sales people, yet retain customer records. Also, if all the SalesOffices close, it doesn’t mean you have to delete the records containing sales people.
The SalesStaffCustomer table is all keys!
This type of table is called an intersection table. An intersection table is useful when you need to model a many-to-many relationship.
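A minimal sketch of the SalesStaffCustomer intersection table, using Python's built-in sqlite3 module. The table and key names come from the slides; the SalesPerson and CustomerName columns, the second salesperson, and the customer names are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE SalesStaffInformation (EmployeeID INTEGER PRIMARY KEY, SalesPerson TEXT);
CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT);
-- All keys: two foreign keys that together form the primary key.
CREATE TABLE SalesStaffCustomer (
    CustomerID INTEGER NOT NULL REFERENCES Customer(CustomerID),
    EmployeeID INTEGER NOT NULL REFERENCES SalesStaffInformation(EmployeeID),
    PRIMARY KEY (CustomerID, EmployeeID)
);
""")
conn.executemany("INSERT INTO SalesStaffInformation VALUES (?, ?)",
                 [(1, "John Hunt"), (2, "Mary Smith")])  # second name is hypothetical
conn.executemany("INSERT INTO Customer VALUES (?, ?)",
                 [(1, "Acme Co"), (2, "Birch Co")])      # hypothetical customers
# Many-to-many: customer 1 is called upon by both salespeople.
conn.executemany("INSERT INTO SalesStaffCustomer VALUES (?, ?)",
                 [(1, 1), (2, 1), (1, 2)])

# The composite primary key rejects a duplicate (CustomerID, EmployeeID) pair.
try:
    conn.execute("INSERT INTO SalesStaffCustomer VALUES (1, 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

johns_customers = [row[0] for row in conn.execute("""
    SELECT CustomerName
    FROM Customer JOIN SalesStaffCustomer USING (CustomerID)
    WHERE EmployeeID = 1
    ORDER BY CustomerName""")]

# Remove every salesperson (links first), yet keep the customer records.
conn.execute("DELETE FROM SalesStaffCustomer")
conn.execute("DELETE FROM SalesStaffInformation")
customers_left = conn.execute("SELECT COUNT(*) FROM Customer").fetchone()[0]
print(duplicate_rejected, johns_customers, customers_left)
```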
2nd Normal Form (2NF) applied to the Music DB
Album and Genre become their own entities

Album (ID, Name, Year)
Song (ID, Name)
Artist (ID, Name, Country, Country Abbr)
Genre (ID, Name)

Relationships: Album O2M Song; Song M2M Artist; Genre is attached by an O2M relationship
3rd Normal Form (3NF)
• Everything from 2NF
• No Attribute Dependencies
• Idea: don’t allow bad data entry to corrupt the DB

transitive dependence: a column’s value relies upon another column through a second intermediate column.
Country and Country Abbr move out of Artist into a new Country table.

Album (ID, Name, Year)
Song (ID, Name)
Artist (ID, Name)
Genre (ID, Name)
Country (ID, Name, Abbr)

Relationships: Album O2M Song; Song M2M Artist; Country O2M Artist; Genre is attached by an O2M relationship

see https://www.essentialsql.com/get-ready-to-learn-sql-11-database-third-normal-form-explained-in-simple-english/
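A small sketch of why the Country table helps, using Python's built-in sqlite3 module. The abbreviation depends on the country, not on the artist, so it is stored once; the CountryID column name and the abbreviation values are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Country (ID INTEGER PRIMARY KEY, Name TEXT, Abbr TEXT);
CREATE TABLE Artist (
    ID        INTEGER PRIMARY KEY,
    Name      TEXT,
    CountryID INTEGER REFERENCES Country(ID)  -- assumed foreign key column name
);
INSERT INTO Country (ID, Name, Abbr)
    VALUES (1, 'United States', 'USA'), (2, 'United Kingdom', 'UK');
INSERT INTO Artist (Name, CountryID)
    VALUES ('Katy Perry', 1), ('Kanye West', 1), ('Coldplay', 2);
""")

# The abbreviation lives in exactly one row, so one UPDATE fixes it for every artist.
conn.execute("UPDATE Country SET Abbr = 'US' WHERE ID = 1")

rows = conn.execute("""
    SELECT Artist.Name, Country.Abbr
    FROM Artist JOIN Country ON Artist.CountryID = Country.ID
    ORDER BY Artist.ID""").fetchall()
print(rows)
```

Had Country Abbr stayed on Artist, the same change would need one update per artist, and a missed row would leave the data inconsistent.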
Six Important Concepts

1. Entities are tables
2. Attributes or Fields are columns of tables
3. Each attribute has a data type (int, string, date)
4. Instances or Records are rows of a table
5. The unique ID for an instance is called the “primary key”
6. Relationships are encoded as “foreign keys”
Foreign Keys
For a one-to-many relationship, we add a “foreign key” to the “many” table.

Before:
Album (ID, Name, Year) O2M Song (ID, Name)

After:
Album (ID, Name, Year) O2M Song (ID, Name, AlbumID)
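The Album/Song foreign key can be sketched with Python's built-in sqlite3 module. The column types and the second song are assumptions; SQLite also needs the foreign_keys pragma switched on before it enforces the constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Album (
    ID   INTEGER PRIMARY KEY AUTOINCREMENT,
    Name TEXT NOT NULL,
    Year INTEGER
);
CREATE TABLE Song (
    ID      INTEGER PRIMARY KEY AUTOINCREMENT,
    Name    TEXT NOT NULL,
    AlbumID INTEGER REFERENCES Album(ID)  -- foreign key on the "many" side
);
""")

cur = conn.execute("INSERT INTO Album (Name, Year) VALUES (?, ?)", ("Graduation", 2007))
album_id = cur.lastrowid  # auto-incremented ID of the new album
conn.executemany("INSERT INTO Song (Name, AlbumID) VALUES (?, ?)",
                 [("Stronger", album_id), ("Homecoming", album_id)])

# A song pointing at a non-existent album is rejected.
try:
    conn.execute("INSERT INTO Song (Name, AlbumID) VALUES ('Orphan', 999)")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True

songs_on_album = [row[0] for row in conn.execute("""
    SELECT Song.Name
    FROM Song JOIN Album ON Song.AlbumID = Album.ID
    WHERE Album.Name = ?
    ORDER BY Song.ID""", ("Graduation",))]
print(album_id, fk_enforced, songs_on_album)
```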
Many-To-Many
We can implement M2M by adding “join tables”
– sometimes called junctions
– Idea: M2M ≈ M2O + O2M

Before:
Song (ID, Name) M2M Artist (ID, Name)

After:
Song (ID, Name)
SongToArtist (ID, SongID, ArtistID)
Artist (ID, Name)
Relationships: Song O2M SongToArtist; Artist O2M SongToArtist
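A minimal sketch of the SongToArtist join table with Python's built-in sqlite3 module. The helper functions artists_for and songs_for are hypothetical names, and the rows are illustrative data only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Song   (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Artist (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE SongToArtist (
    ID       INTEGER PRIMARY KEY,
    SongID   INTEGER NOT NULL REFERENCES Song(ID),
    ArtistID INTEGER NOT NULL REFERENCES Artist(ID)
);
INSERT INTO Song (ID, Name) VALUES (1, 'E.T.'), (2, 'Stronger');
INSERT INTO Artist (ID, Name) VALUES (1, 'Katy Perry'), (2, 'Kanye West');
-- One row per (song, artist) pair: song 1 is a collaboration.
INSERT INTO SongToArtist (SongID, ArtistID) VALUES (1, 1), (1, 2), (2, 2);
""")

def artists_for(song_name):
    """Hypothetical helper: all artists credited on a song."""
    return [row[0] for row in conn.execute("""
        SELECT Artist.Name
        FROM Song
        JOIN SongToArtist ON SongToArtist.SongID = Song.ID
        JOIN Artist ON Artist.ID = SongToArtist.ArtistID
        WHERE Song.Name = ?
        ORDER BY Artist.Name""", (song_name,))]

def songs_for(artist_name):
    """Hypothetical helper: all songs an artist is credited on."""
    return [row[0] for row in conn.execute("""
        SELECT Song.Name
        FROM Artist
        JOIN SongToArtist ON SongToArtist.ArtistID = Artist.ID
        JOIN Song ON Song.ID = SongToArtist.SongID
        WHERE Artist.Name = ?
        ORDER BY Song.Name""", (artist_name,))]

print(artists_for("E.T."), songs_for("Kanye West"))
```

Each direction of the M2M relationship is two one-to-many hops through SongToArtist, which is the "M2M ≈ M2O + O2M" idea.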
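Putting it all together

Album (ID, Name, Year)
Song (ID, Name, AlbumID)
SongToArtist (ID, SongID, ArtistID)
Artist (ID, Name, CountryID)
Genre (ID, Name)
Country (ID, Name, Abbr)

Foreign keys: AlbumID → Album; SongID → Song; ArtistID → Artist; CountryID → Country; GenreID → Genre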
Summary: DB Schema Creation Algorithm

1. Identify Major Entities
   – draw a box for each table
2. Figure out attributes for each entity
   – add integer id
   – name & data type
3. Figure out the relationship between each pair of entities
   – O2O – combine entities
   – O2M – add foreign key to the “many” table
   – M2M – create a new join table
Exercise

Design a database schema for keeping track of class rosters (e.g., Homer):
Hints:
– Consider students, courses and professors
– Assume each course has at most one professor
Next time
We will introduce you to SQL
– Structured Query Language
– Designed to directly encode semantics of DB
  • “Select all songs by Kanye West from 2007”
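As a preview, the example request could be written roughly as follows against the music schema, run here through Python's built-in sqlite3 module. This is a sketch under assumptions: Genre and Country are left out for brevity, the year is taken from the album, and the extra 2005 song is added only to show the filter working.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Album  (ID INTEGER PRIMARY KEY, Name TEXT, Year INTEGER);
CREATE TABLE Song   (ID INTEGER PRIMARY KEY, Name TEXT,
                     AlbumID INTEGER REFERENCES Album(ID));
CREATE TABLE Artist (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE SongToArtist (
    ID       INTEGER PRIMARY KEY,
    SongID   INTEGER REFERENCES Song(ID),
    ArtistID INTEGER REFERENCES Artist(ID)
);
INSERT INTO Album (ID, Name, Year) VALUES
    (1, 'Teenage Dream', 2010), (2, 'Graduation', 2007), (3, 'Late Registration', 2005);
INSERT INTO Song (ID, Name, AlbumID) VALUES
    (1, 'Teenage Dream', 1), (2, 'Stronger', 2), (3, 'Gold Digger', 3);
INSERT INTO Artist (ID, Name) VALUES (1, 'Katy Perry'), (2, 'Kanye West');
INSERT INTO SongToArtist (SongID, ArtistID) VALUES (1, 1), (2, 2), (3, 2);
""")

# "Select all songs by Kanye West from 2007"
query = """
    SELECT Song.Name
    FROM Song
    JOIN Album        ON Song.AlbumID = Album.ID
    JOIN SongToArtist ON SongToArtist.SongID = Song.ID
    JOIN Artist       ON Artist.ID = SongToArtist.ArtistID
    WHERE Artist.Name = ? AND Album.Year = ?
    ORDER BY Song.Name
"""
kanye_2007 = [row[0] for row in conn.execute(query, ("Kanye West", 2007))]
print(kanye_2007)
```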
sql
COMP 205 advanced web programming
Pop Quiz

Design a database schema for keeping track of class rosters (e.g., Homer):
Hints:
– Consider students, courses and professors
– Assume each course has at most one professor
IC Database Schema

Student (id, firstname, lastname, email, gpa, major)
StudentToCourse (id, studentID, courseID)
Course (id, courseNumber, days, time, room, instructorID)
Instructor (id, firstname, lastname, email)

Rules:
1) Tables are Capitalized Camelback
2) Properties are (lowercase) Camelback
3) First Attribute is always the “id”
4) Join Tables are called “Table1ToTable2”
Why Databases?
Make it easy to relate, store, and retrieve data

[Diagram: client, request, response, server (web server, server-side program), database]

SQL
Structured Query Language
standard for most DBs
– mysql, sqlite3, postgres

Uses:
– create database “schema”
– insert, update, delete data
– “query” the database for information
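Each of these uses can be tried from Python with the built-in sqlite3 module. The sketch below assumes a Student table shaped like the one in the IC schema; the column types and the two student records are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# 1. Create the schema.
conn.execute("""
CREATE TABLE Student (
    id        INTEGER PRIMARY KEY AUTOINCREMENT,
    firstname TEXT,
    lastname  TEXT,
    email     TEXT,
    gpa       REAL,
    major     TEXT
)""")

# 2. Insert data (placeholders keep values separate from the SQL text).
conn.executemany(
    "INSERT INTO Student (firstname, lastname, email, gpa, major) VALUES (?, ?, ?, ?, ?)",
    [("Ada", "Lovelace", "ada@example.edu", 3.9, "Math"),
     ("Alan", "Turing", "alan@example.edu", 3.7, "CS")],
)

# 3. Update data.
conn.execute("UPDATE Student SET gpa = ? WHERE firstname = ?", (3.8, "Alan"))

# 4. Delete data.
conn.execute("DELETE FROM Student WHERE firstname = ?", ("Ada",))

# 5. Query the database.
remaining = conn.execute(
    "SELECT firstname, gpa FROM Student ORDER BY id").fetchall()
print(remaining)
```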
Learn by doing…