
Computing B P. Spaelti

Actors Database

Links
• IMDb (Internet Movie Database): http://www.imdb.com/
• Oracle of Bacon: http://oracleofbacon.org/index.php

Goal

We want to be able to ask questions like these:
• What movies did Johnny Depp star in?
• Who starred in the movie The Green Mile?
• Who has Johnny Depp starred together with, and in what movie?
• How many movies has Johnny Depp starred in?
• How old is Johnny Depp?
• etc.

Process: Part I

1. Actors Database

Name | Birthdate | Movies
Johnny Depp | June 9, 1963 | …, Alice in Wonderland, …

2. Data collection: group work, then exchange data to combine our databases

3. Using the data
• How old is X?
• How many actors were born in March?
• Order actors by first name / last name
• How many movies does each actor have?
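The questions in step 3 can already be answered from the single-table design. The following is a minimal Python sketch. It assumes each row is a dict holding a `date` birthdate and all of the actor's movies in one comma-separated text field. The Depp movie list is taken from the "Starred-in" table later in this handout. The dict layout and the function names (`age`, `count_born_in`, `sort_by_first_name`, `sort_by_last_name`, `movie_count`) are hypothetical:

```python
from datetime import date

# Hypothetical rows in the single-table design: all of an actor's movies
# are stored together in one comma-separated text field.
actors = [
    {"name": "Johnny Depp", "birthdate": date(1963, 6, 9),
     "movies": "Edward Scissorhands, Pirates of the Caribbean, Alice in Wonderland"},
    {"name": "Keira Knightley", "birthdate": date(1985, 3, 26),
     "movies": "Pirates of the Caribbean"},
]


def age(birthdate, today):
    """Age in whole years on the day `today`."""
    # Subtract one year if this year's birthday has not happened yet.
    had_birthday = (today.month, today.day) >= (birthdate.month, birthdate.day)
    return today.year - birthdate.year - (0 if had_birthday else 1)


def count_born_in(rows, month):
    """How many actors were born in the given month (1 = January)."""
    return sum(1 for row in rows if row["birthdate"].month == month)


def sort_by_first_name(rows):
    # Assumes every name is "First Last"; the first word is the first name.
    return sorted(rows, key=lambda row: row["name"].split()[0])


def sort_by_last_name(rows):
    # Assumes every name is "First Last"; the last word is the last name.
    return sorted(rows, key=lambda row: row["name"].split()[-1])


def movie_count(row):
    # Counting movies means splitting the text field on commas.
    return len([title for title in row["movies"].split(",") if title.strip()])


print(age(date(1963, 6, 9), date(2024, 1, 1)))  # 60
print(count_born_in(actors, 3))  # 1
print([movie_count(row) for row in actors])  # [3, 1]
```

Passing `today` in explicitly keeps the result reproducible. In practice you would pass `date.today()`. Note that `movie_count` only works if the titles are separated by commas and no title contains a comma itself.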


Process: Part II

Problem: The database has one big problem: the list of movies is very hard to work with. Sometimes there is just one movie, and sometimes many. For one actor a given movie name comes first in the list, for another actor it comes last or somewhere in the middle. This makes it very hard to tell whether two actors starred in the same film. Another problem is that the movie names are repeated many times in the ‘Movies’ column. This leads to mistakes, since we might spell the same movie name differently in different rows. A third problem is that we have no place to put information about the movies themselves. For example, we might want to include the year each movie was released.
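The following small Python sketch shows the first two problems in action. The field values and the function names are hypothetical. Before two actors can be compared, every ‘Movies’ field has to be split apart. After that, a single spelling difference is enough to hide a film they share:

```python
# Hypothetical 'Movies' fields for two actors in the single-table design.
# The shared film is listed in a different position and is misspelled once.
depp_movies = "Alice in Wonderland, Pirates of the Caribbean, Edward Scissorhands"
knightley_movies = "Pirates of the Carribean"  # deliberate typo: "Carribean"


def parse_movies(field):
    """Split a comma-separated 'Movies' field into a set of trimmed titles."""
    return {title.strip() for title in field.split(",") if title.strip()}


def shared_movies(field_a, field_b):
    """Titles that appear, spelled exactly the same, in both fields."""
    return parse_movies(field_a) & parse_movies(field_b)


print(shared_movies(depp_movies, knightley_movies))  # set()
print(shared_movies(depp_movies, "Pirates of the Caribbean"))  # {'Pirates of the Caribbean'}
```

Using a set makes the order of the titles irrelevant, but nothing can rescue a misspelled title. The comma splitting also breaks for any title that itself contains a comma.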

Solution: Split the database

We must try to make sure that ‘important’ information is entered only once in the database. The solution is to make separate Actors and Movie databases.

Actors Database

Name | Birthdate
Johnny Depp | June 9, 1963

Movie Database

Title | Year released
Edward Scissorhands | 1990

Now we can link the two databases using a new table called ‘starred-in’:

“Starred-in” Database

Actor Name | Movie Title
Johnny Depp | Edward Scissorhands
Johnny Depp | Pirates of the Caribbean
Johnny Depp | Alice in Wonderland
Keira Knightley | Pirates of the Caribbean
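With one row per role, several questions from the Goal list become simple filters over the link table. The following is a minimal Python sketch. It assumes the "Starred-in" table is held in memory as a list of (actor name, movie title) tuples. The function names `movies_of`, `cast_of` and `co_stars` are hypothetical:

```python
# The "Starred-in" table above: one (actor name, movie title) row per role.
starred_in = [
    ("Johnny Depp", "Edward Scissorhands"),
    ("Johnny Depp", "Pirates of the Caribbean"),
    ("Johnny Depp", "Alice in Wonderland"),
    ("Keira Knightley", "Pirates of the Caribbean"),
]


def movies_of(actor):
    """What movies did this actor star in?"""
    return [movie for name, movie in starred_in if name == actor]


def cast_of(title):
    """Who starred in this movie?"""
    return [name for name, movie in starred_in if movie == title]


def co_stars(actor):
    """Who has this actor starred with, and in what movie?"""
    return [(other, movie)
            for movie in movies_of(actor)
            for other in cast_of(movie)
            if other != actor]


print(len(movies_of("Johnny Depp")))  # 3
print(co_stars("Johnny Depp"))  # [('Keira Knightley', 'Pirates of the Caribbean')]
```

Each title now appears once per role instead of inside a free-form list, so "did these two actors share a film?" is an exact comparison of two values.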


Problem 2: Another problem with our database is that sometimes the information can be ambiguous. For example, there is more than one movie called ‘Alice in Wonderland’: one was made in 2010 and one in 1951. Johnny Depp only starred in the 2010 movie.

Solution: Unique ID

We use a number to distinguish items that have the same information. This number must be unique for each item. Such a number is often called a KEY.

Movie Database

ID | Title | Year released
1 | Edward Scissorhands | 1990
2 | Alice in Wonderland | 2010
3 | Alice in Wonderland | 1951
4 | Pirates of the Caribbean | 2003
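A Python dict is one simple way to picture this. The ID is the dictionary key, so looking up a KEY always gives exactly one movie, while searching by title can give several. The dict layout and the helper names `ids_with_title` and `add_movie` are assumptions for illustration:

```python
# The Movie Database above, keyed by ID: each ID maps to (title, year).
movies = {
    1: ("Edward Scissorhands", 1990),
    2: ("Alice in Wonderland", 2010),
    3: ("Alice in Wonderland", 1951),
    4: ("Pirates of the Caribbean", 2003),
}


def ids_with_title(title):
    """All IDs whose title matches; a title alone may match several movies."""
    return [movie_id for movie_id, (t, _year) in movies.items() if t == title]


def add_movie(movie_id, title, year):
    """Add a movie, refusing an ID (KEY) that is already in use."""
    if movie_id in movies:
        raise ValueError(f"ID {movie_id} is already used")
    movies[movie_id] = (title, year)


print(ids_with_title("Alice in Wonderland"))  # [2, 3]
print(movies[2])  # ('Alice in Wonderland', 2010)
```

The explicit check in `add_movie` matters because assigning to an existing dict key would silently overwrite the old movie instead of rejecting the duplicate ID.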

Actors Database

ID | Name | Birthdate
1 | Johnny Depp | June 9, 1963
2 | Keira Knightley | March 26, 1985

“Starred-in” Database

Actor_ID | Actor Name | Movie_ID | Movie Title
1 | Johnny Depp | 1 | Edward Scissorhands
1 | Johnny Depp | 4 | Pirates of the Caribbean
1 | Johnny Depp | 2 | Alice in Wonderland
2 | Keira Knightley | 4 | Pirates of the Caribbean
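The same three tables can be built in a real database system. The following sketch uses SQLite through Python's built-in `sqlite3` module. The table and column names are assumptions, and birthdates are stored as ISO text (YYYY-MM-DD). The link table keeps only the two ID columns, because the actor name and the movie title can be looked up through the keys. Joining on the IDs then answers the Goal questions without any doubt about which ‘Alice in Wonderland’ is meant:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a throwaway in-memory database
conn.execute("PRAGMA foreign_keys = ON")  # make SQLite enforce REFERENCES
conn.executescript("""
CREATE TABLE actor (id INTEGER PRIMARY KEY, name TEXT, birthdate TEXT);
CREATE TABLE movie (id INTEGER PRIMARY KEY, title TEXT, year INTEGER);
-- The link table: one row per (actor, movie) pair, identified by their keys.
CREATE TABLE starred_in (
    actor_id INTEGER REFERENCES actor(id),
    movie_id INTEGER REFERENCES movie(id),
    PRIMARY KEY (actor_id, movie_id)
);
INSERT INTO actor VALUES
    (1, 'Johnny Depp', '1963-06-09'),
    (2, 'Keira Knightley', '1985-03-26');
INSERT INTO movie VALUES
    (1, 'Edward Scissorhands', 1990),
    (2, 'Alice in Wonderland', 2010),
    (3, 'Alice in Wonderland', 1951),
    (4, 'Pirates of the Caribbean', 2003);
INSERT INTO starred_in VALUES (1, 1), (1, 4), (1, 2), (2, 4);
""")

# What movies did Johnny Depp star in? Follow the keys from actor to movie.
depp_movies = conn.execute("""
    SELECT m.title, m.year
    FROM actor AS a
    JOIN starred_in AS s ON s.actor_id = a.id
    JOIN movie AS m ON m.id = s.movie_id
    WHERE a.name = ?
    ORDER BY m.year
""", ("Johnny Depp",)).fetchall()

# Who has actor 1 starred with, and in what movie?
co_stars = conn.execute("""
    SELECT costar.name, m.title
    FROM starred_in AS s1
    JOIN starred_in AS s2
        ON s2.movie_id = s1.movie_id AND s2.actor_id <> s1.actor_id
    JOIN actor AS costar ON costar.id = s2.actor_id
    JOIN movie AS m ON m.id = s1.movie_id
    WHERE s1.actor_id = ?
""", (1,)).fetchall()

print(depp_movies)
# [('Edward Scissorhands', 1990), ('Pirates of the Caribbean', 2003), ('Alice in Wonderland', 2010)]
print(co_stars)  # [('Keira Knightley', 'Pirates of the Caribbean')]
```

The composite `PRIMARY KEY (actor_id, movie_id)` also stops the same actor from being linked to the same movie twice.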
