Building a Predictor for Movie Ratings
Final Report
Haowen Cao, Daniel Holstein, Casatrina Lee
Abstract

We investigate the degree to which movie popularity is a function of cast and genre. We mined the open-source IMDb dataset and restructured it into two bipartite graphs (a graph of movies to actors, and a graph of movies to genres). Using the HITS algorithm, we assign each actor and director a hub score and an authority score based on the popularity of past projects, in order to predict the popularity of future projects with other collaborators. Using a similar technique, we also use the ratings of the movie to predict the popularity of a particular genre. We also used PageRank on 2 bipartite graphs (movies to actors, and movies to directors). While PageRank did not produce another rating prediction method, it did yield in-

directors who are involved in it. We modeled the database as a directed bipartite graph with actors, directors and movies as nodes. We chose to focus primarily on directors and actors based on the assumption that those two roles would be the most visible to audiences and therefore most influential in determining the popularity of the movie. Edges are drawn from actors and directors to movies when the particular person has worked on that particular movie.

Using the HITS algorithm, we can assign each actor and director a hub score and an authority score based on the popularity of their past projects, and therefore train a model to predict the popularity of future projects with other collaborators. Using a similar technique, we also use the ratings of the movie to predict the popularity of a particular genre.
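To make the graph construction concrete, the following is a minimal sketch of plain HITS on a directed bipartite graph with edges from people to movies. The edge list, node names, and the `hits` helper are hypothetical and for illustration only; they are not taken from the IMDb data or from the implementation used in this report. Note that in this plain form, people only accumulate hub scores and movies only accumulate authority scores, and ratings are not used; weighting by ratings would be an extension of this sketch.

```python
from math import sqrt

# Hypothetical toy edges (person -> movie); not real IMDb data.
edges = [
    ("actor_a", "movie_1"),
    ("actor_a", "movie_2"),
    ("actor_b", "movie_2"),
    ("director_x", "movie_1"),
    ("director_x", "movie_2"),
    ("director_x", "movie_3"),
]


def normalize(scores):
    """Scale a score dictionary to unit Euclidean length."""
    norm = sqrt(sum(v * v for v in scores.values()))
    return {k: v / norm for k, v in scores.items()}


def hits(edges, iterations=50):
    """Power-iteration HITS on a person -> movie edge list."""
    people = {p for p, _ in edges}
    movies = {m for _, m in edges}
    hub = {p: 1.0 for p in people}
    auth = {m: 1.0 for m in movies}
    for _ in range(iterations):
        # Authority of a movie: sum of hub scores of people pointing to it.
        auth = {m: 0.0 for m in movies}
        for p, m in edges:
            auth[m] += hub[p]
        auth = normalize(auth)
        # Hub of a person: sum of authority scores of their movies.
        hub = {p: 0.0 for p in people}
        for p, m in edges:
            hub[p] += auth[m]
        hub = normalize(hub)
    return hub, auth


hub, auth = hits(edges)
for person in sorted(hub, key=hub.get, reverse=True):
    print(person, round(hub[person], 3))
```

In this toy graph the person connected to the most movies ends up with the largest hub score, and the movie with the most collaborators ends up with the largest authority score, which is the structural behavior HITS is expected to show before any rating information is added.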