A Restaurant Recommendation Web Application Using Machine Learning and Yelp
CALIFORNIA STATE UNIVERSITY, NORTHRIDGE

Eat-Smart: A Restaurant Recommendation Web Application Using Machine Learning and Yelp Dataset

A graduate thesis project submitted in partial fulfillment of the requirements
For the degree of Master of Science in Computer Science

By
Frank Fan Cao

May 2018

The thesis of Frank Fan Cao is approved:

_____________________________________    _______________
Dr. Adam Kaplan                          Date

_____________________________________    _______________
Dr. Robert McIlhenny                     Date

_____________________________________    _______________
Dr. Jeff Wiegley, Chair                  Date

California State University, Northridge

Acknowledgements

I would like to thank my committee chair, Dr. Jeff Wiegley, for providing support, motivation, and expert knowledge throughout the thesis process. I would like to express my gratitude to Dr. Adam Kaplan and Dr. Robert McIlhenny for their help and guidance throughout the program. Lastly, I would like to thank the professors of all my courses. I am grateful for the knowledge you have shared with me.

Table of Contents

Signature Page
Acknowledgements
List of Tables
List of Figures
Abstract
1. Introduction
2. Data Collection and Preprocessing
3. Recommendation Model
4. Model Testing and Optimization
5. System Design
6. Recommendation Web Service Implementation
7. Web Application Implementation Using Laravel
8. Web Application Features
9. Conclusion and Future Work
Reference

List of Tables

Table 1 Contents of Yelp Data Files
Table 2 Business Data Distribution
Table 3 Collaborative Filtering Algorithm Analysis
Table 4 ALS Parameters
Table 5 Grid Search Results for Rank and Maximum Iterations
Table 6 RMSE Results from Grid Search in Regularization Parameters
Table 7 List of Views in Eat-Smart
Table 8 List of Controller Methods

List of Figures

Figure 1 Machine Learning Workflow
Figure 2 Yelp Dataset Overview
Figure 3 MySQL Database Schema
Figure 4 Python Script for Filtering Business Data
Figure 5 Python Script for Filtering Review Data
Figure 6 Comparison of Storage File Format
Figure 7 Collaborative Filtering Example
Figure 8 Python Script for Spark Model Creation
Figure 9 System Architecture Diagram
Figure 10 Web Service Workflow
Figure 11 App.py Content
Figure 12 Laravel MVC
Figure 13 Eloquent Model Definition
Figure 14 Master Blade Template
Figure 15 Child Template
Figure 16 Laravel Controller Example
Figure 17 Laravel Routing Example
Figure 18 Web Application Routes
Figure 19 Home Page for Guest
Figure 20 Home Page for Authenticated Users
Figure 21 User Reviews Page
Figure 22 Business Detail Page
Figure 23 Add Review Page
Figure 24 Confirmation Page for Successful Review Submission
Figure 25 Top Recommendation Page

Abstract

Eat-Smart: A Restaurant Recommendation Web Application using Machine Learning and Yelp Data

By
Frank Fan Cao
Master of Science in Computer Science

People increasingly rely on reviews from other people to decide which items to buy, which movies to watch, which books to read, and where to eat. Traditionally, these questions were answered through peer recommendations (word of mouth, blog posts, and reviews) or expert advice (columnists, librarians). Crowd-sourced business review platforms are now ubiquitous. Apps such as Yelp, TripAdvisor, and Google provide a wide range of information about local businesses, especially restaurants. The seemingly unlimited options for food and services contribute to information overload, and users often struggle to make informed choices that cater to their individual wants and needs. One solution to the problem is a recommender system that provides accurate and personalized recommendations.
Such a system greatly reduces the effort and time needed to discover new restaurants.

The core of the application is a recommendation engine with an appropriate prediction model. The prediction models are built using data from the Yelp Challenge Dataset, which contains detailed information on 1.1M businesses, 4.1M reviews, and 947K tips by 1M users. Various machine learning algorithms, such as K-Means, SVM, and Collaborative Filtering, are investigated and benchmarked. Ultimately, Collaborative Filtering using Alternating Least Squares (ALS) is chosen because it offers a good balance of accuracy and performance. The second part of the application is a web application built around the recommendation engine using the newest web technologies and frameworks available. The application provides personalized and relevant recommendations to users with high prediction accuracy.

1. Introduction

1.1 Background

A recommender system is a set of methods that filters through large observation and information spaces to make predictions in the parts of the information space for which users do not yet have any observations. In simpler terms, a recommender system predicts ratings for items that users have not yet rated. Recommender systems have long existed in various fields and disciplines. The online retailer Amazon uses a recommender system to suggest new products after users purchase or search for certain items. Social networking applications such as LinkedIn use recommender systems to suggest new connections. Streaming services such as Netflix use similar systems to recommend movies and music based on users' previous choices and search history. Following this trend, the project focuses on creating a recommendation system for Yelp users to accurately predict potential food choices based on their