
Vol 11, Issue 8, August/ 2020 ISSN NO: 0377-9254

Restaurant Review Classification and Analysis

Dhiraj Kumar1, Gopesh2, Avinash Choubey3, Ms. Pratibha Singh4

1Student, Computer Science & Engineering Department, ABES Engineering College, Ghaziabad, U.P.
2Student, Computer Science & Engineering Department, ABES Engineering College, Ghaziabad, U.P.
3Student, Computer Science & Engineering Department, ABES Engineering College, Ghaziabad, U.P.
4Faculty, Computer Science & Engineering Department, ABES Engineering College, Ghaziabad, U.P.

Abstract- Restaurants nowadays prefer taking online orders. This not only helps in getting effective customer feedback but is also useful for managing orders easily. We are moving towards an automated and digital world. Having a significant online presence is necessary for any restaurant to be successful and prosperous. Getting customer feedback and analyzing it in an effective manner makes the difference. This study analyses restaurant reviews and presents useful information that the ratings do not consider or overlook. Combined research is done using two different datasets of restaurant reviews in this paper. Machine learning algorithms like Naïve Bayes and Logistic Regression are used, first for classifying the reviews into proper aspects and then for performing sentiment analysis on them. Summarization is done using gensim and results are displayed using effective visualization techniques. Future work is also discussed so that an efficient analysis system can be developed utilizing the potential of reviews.

Keywords- Support Vector Machine (SVM), Naive Bayes classifier, Sentiment Analysis, Topic Modelling, aspect classification.

1. Problem Introduction

For years, food and hospitality businesses have been running on the assumption that good food and service is the way to attract more customers. But the advent of science and technology and, more importantly, the data created by the use of online platforms has pointed towards new findings and opened new doors: most consumers nowadays rate a product online, over one third of them write reviews, and nearly 88% of people trust online reviews. Online review services provide customers and businesses a way to interact with one another. Reviews and ratings are useful sources of information, but significant problems exist in extracting relevant information and predicting the future through analysis and correlation of the existing data. Each day thousands of restaurants and businesses are reviewed by customers.

The main objective of the work proposed in this paper is to enhance the user experience by analyzing the reviews of restaurants and categorizing them into aspects so that a user can easily learn about a restaurant. Restaurants are not able to utilize reviews for their businesses. We want to use the aspects that are important in the food and service industry so that we can analyze the sentiment of text reviews and help restaurants improve their businesses. This research work can be applied to many other industries related to food and hospitality.

www.jespublication.com . Page No:169

2. Related Previous Work

When diving into this research work, we found that a large amount of relevant work has already been done in this field, but most of it was not industry oriented. We tried to incorporate the most noticeable findings of these works as our base so that we could build upon the work already done. Most of the work focuses on improving the models for classification.

This research paper will help researchers to learn and to take up further implementations and improvements. New methods like semantic orientation have also been discussed. The fakeness of reviews is a common problem that arises. Some deep learning techniques have also been compared with classical techniques.

Some of these works are summarized in the subsequent sections.

2.1. A survey of Sentiment Analysis Challenges

The consequences of challenges in the area of sentiment analysis [1] have been discussed. In the first comparison, sentiment review structure is compared with sentiment analysis challenges. The result of this comparison shows that domain dependence [2] is an important part of sentiment challenges. The second comparison deals with the accuracy of sentiment analysis models based on the challenges. Structured [3], semi-structured [4], and unstructured [5] are the three types of review structures that were used for the first comparison. Theoretical and technical are the two types of sentiment analysis challenges. The challenges include domain dependence, negation [6], bipolar words [7], entity feature/keyword extraction [8], spam or fake reviews [9], and NLP overheads (short abbreviations, ambiguity, emotions, sarcasm). Parts-of-speech (POS) tagging [10] gives highly accurate results for the theoretical types of challenges. The phrases and expressions of n-grams [11] give it an edge over all other techniques used for the technical set of challenges. The results explained the effectiveness of sentiment analysis challenges for improving the accuracy of the model [12].

2.2. Aspect based Sentiment Oriented Summarization of Hotel Reviews

Because the size of review dimensions and customer-produced content is unstable, different text analytic approaches like opinion mining [13], sentiment analysis, topic modeling [14], and aspect classification play a significant role in analyzing the content. Topic modelling can find diverse topics in a corpus of text because of its statistical nature. For every aspect type there is a certain opinion linked to it, and sentiment analysis can effectively bring out these emotions. Whether it is a business intelligence problem or a case of unstructured document categorization, sentiment analysis is useful in most cases. It has emerged as the most important aspect of the information retrieval process. Strategies for text summarization [15] can boost sentiment analysis research. The opinion mining of the hotel reviews is done using the SentiWordNet [16] library. The reviews were summarized on different aspects and sentiment analysis was performed [17].

2.3. Assessing the Helpfulness of Online Hotel Reviews

A review can be considered useful for decision-making purposes only when it is thoughtful and insightful. The indicators representing the importance of reviews are different for diverse research areas due to ease of access. In the case of travel and hospitality, the reviews with the maximum votes are considered to be more informative and useful for consumers. Using feature engineering, this can help in optimizing the cost of the search for most consumers [18].

2.4. A framework for Fake Review Detection in Online

This paper gives an idea of the challenges that we could face in our research. Given the significance of online feedback for different types of industries and the difficulty of procuring and maintaining a favorable reputation on the Internet, diverse methods have been used to enhance digital presence, including unethical practices. Fake reviews are one of the most preferred unethical methods and exist on sites such as Yelp or TripAdvisor. In response, the Fake Feature Framework (F3) helps to assemble and organize features for fake review detection. F3 considers information obtained from both the user (personal profile, reviewing activity, trusting information, and social interactions) and the review elements (review text) [19].

2.5. Sentiment Analysis of Hotel Reviews

It is observed that semantic orientation can also be used as a sentiment analysis model to classify reviews as 0 or 1, representing negative and positive respectively. It is possible to classify a review on the basis of the average semantic orientation of the phrases in the review that contain adverbs and adjectives. It is expected that there will be an efficient value when we merge semantic orientation [20] with sentiments. A review is classified as recommended only if the mean is positive; otherwise it is classified as not recommended. The Naive Bayes model generally performs better than SVM [21].

2.6. Deep Recurrent Neural Network Vs. Support Vector

In this paper, it was discussed that SVM, which is one of the supervised learning techniques, performs better than other approaches and models. This is because the features are carefully hand-crafted to train the SVM model. Its effectiveness in the classification of binary targets is also an important trait that supports this performance. After analysis of different datasets, it is observed that SVM performed better in almost all of the aspect analysis problems [23] consisting of small datasets.

Generally, aspect-based sentiment analysis is divided into 4 main parts:

1. Extraction of aspect-terms
2. Recognition of polarity of aspect-terms
3. Recognition of aspect-category
4. Recognition of polarity of aspect-category [22]

3. FOLLOWING TWO DATASETS ARE USED FOR REVIEW ANALYSIS

3.1 SemEval-2014 Task-4 Dataset

For aspect classification, the dataset is taken from SemEval-2014 Task-4. Our dataset consists of over 3 thousand English sentences from different restaurant reviews. The dataset is in XML format and consists of reviews with their aspects and polarity. The terms on which the aspects are based are also mentioned in the dataset. Some of the reviews are related to more than one aspect [24].

3.2 Bangalore Restaurant Dataset

The Zomato Bangalore restaurant dataset was uploaded in 2019. The data was scraped from Zomato for educational purposes only. The dataset consists of 17 columns with 51717 unique URLs and 8792 unique restaurants. Some of the fields in location and phone are missing. We ignore the phone number column. The review list column contains the reviews for a specific restaurant. The analysis of the restaurants and the sentiment classification are done using this dataset [25].

Fig. 1. Flow diagram of the proposed method

4. Proposed Methodology

The three phases shown in Fig. 1 are explained below:

Phase 1: This phase deals with the collection of the right dataset for classification and sentiment analysis. The dataset for aspect classification has been collected from the SemEval-2014 competition and contains the reviews and the labels associated with the aspects. The labeled dataset for sentiment analysis is taken from Kaggle and has been extracted from Zomato using the Zomato API.

Phase 2: This phase deals with the aspect classification and the preprocessing for sentiment analysis. Preprocessing includes stopword removal, POS tagging, lemmatization, stemming, etc. Feature generation is performed using TF-IDF (Term Frequency-Inverse Document Frequency), which normalizes the document-term matrix; it is the product of TF and IDF. Finally, the dataset is converted into vectors using the BOW vectorizer for sentiment prediction.

Phase 3: This is the final phase, in which we do the sentiment prediction on the preprocessed data obtained from phase 2. The sentiments are predicted for each aspect and analyzed. The analyzed results and sentiments are to be displayed using front-end technology. During the whole process, we keep updating the review database.

5. Exploratory Data Analysis

We create a dataset for aspect classification using the Restaurant_Train.xml file. The XML file is converted to a data frame for aspect classification using pandas. We have taken the Zomato review data and performed the analysis. We show some graphs and charts for a better understanding of the dataset.

It is observed that BTM has the greatest number of restaurants and New Bel Road has a small number of restaurants, but if we combine all blocks of Koramangala then it exceeds BTM (as shown in fig. 2). This gives an idea about the restaurants in a particular city. If we look at this data and compare the orders that restaurants receive in these areas, we get an idea of where to improve. Such information helps restaurant owners, or anyone who wants to open a new branch of a restaurant, in targeting specific areas.

Fig. 2. Number of restaurants in each city.

As per the graph (as shown in fig. 3), nearly 58.9% of restaurants take online orders while 41.1% do not. Whether a restaurant takes online orders is a necessary parameter for analyzing it, because this is very important in times of crisis like COVID-19, when people do not want to leave their houses and will order food from restaurants instead. Moreover, people who work in companies find it easier to order online. In such a situation, taking online orders is beneficial for restaurants (as shown in fig. 3). Feedback is one of the most important factors for a restaurant to improve its services: when people order online, they provide feedback on the food quality and service.

Fig. 3. Pie plot representing total restaurants taking online orders or not.

BTM and Koramangala are among the areas with the highest number of restaurants taking online orders, i.e. 2100 and 1750 respectively. It has been observed that the success rate of restaurants increases when they take online orders and feedback from customers.

Fig. 4. Bar plot of average cost vs city.

Church Street is the most expensive and Banashankari the most pocket-friendly among the areas of Bangalore city. The price is distributed around an average cost of 400 for two persons (as shown in fig. 4). A restaurant should keep its average cost at around 500 or less. Most people go to affordable places, so restaurant owners should keep in mind the average cost and a high percentage of discounts.

Biryani is the most sought-after dish and a good option for a restaurant to have on the menu. Fig. 5 shows the importance of the type of food item in the success of a restaurant. For the owner of a restaurant, it is important to have the top dishes on the menu. It is clear that people in Bangalore like biryani most and that it is one of the top dishes in Bangalore city, although dishes like Paratha and Masala Dosa are also liked by people in the morning.

Fig. 5. Top 10 dishes liked by the people of Bangalore.

From the cleaned dataset, potential features are extracted and converted to a word vector representation. Vectorization techniques are used to convert textual data to a numerical format. Using vectorization, a matrix is created in which each column represents a review feature and each row represents an individual review.

Fig. 6. POS tagging to know the essence of a sentence.

Here we have done POS tagging to identify the grammatical parts of a given review, such as nouns, pronouns, verbs, adjectives, adverbs, etc. (as shown in fig. 6). The figure shows the parts of speech attached to the different words in the sentence "But the staff was so horrible to us." POS tags are very useful for reducing words to their root form. POS tagging not only makes a sentence easier to understand but also brings out the sense of, and the relationships between, its words. Parts of speech are context-dependent; here the context is criticizing the staff of the restaurant and giving a negative review. POS tagging is used in language modelling and chatbot creation.

Finally, we create word clouds for the restaurants Donne Biriyani Mane and Kanti Sweets (as shown in fig. 7). For Donne Biriyani Mane, words like "chicken" and "Biryani" have a high frequency, whereas for Kanti Sweets words like "Sweet" and "fruitbite" have a high frequency. Other common words are "fruit", "good", "place", etc.

Fig. 7. Word cloud of Donne Biriyani Mane and Kanti Sweets

The word cloud in fig. 7 illustrates the words used in the reviews. A word cloud is generated by counting the number of occurrences of each word in the text and then displaying the words in an organized manner, with the word that has the highest count shown in the largest font. One can clearly understand the restaurant type in this case by just looking at the word clouds.

5. Validation and Model Comparison

Table 1. Classification report of models.

The first column of table 1 contains a classification report of the three different classification models used for aspect classification in our paper. The second column contains a vertical bar chart displaying the performance of the models on the basis of precision, recall, F1 score, and the time it takes to fit a dataset. The comparison shows that the MultinomialNB model performs with an accuracy of 86.6%, the SGD Classifier performs with an accuracy of 85.8%, and Random Forest performs with 82.5%. The MultinomialNB classifier is better than the other classifiers in almost every evaluation metric. It takes very little time (0.271) to fit and predict the output, while Random Forest takes much longer (2.68), which is why MultinomialNB can be used for real-time classification systems. The comparison is done on the basis of performance measures on the Restaurant.xml dataset for aspect classification. These models are chosen for comparison because they perform well on multilabel classification.

Fig. 8. Comparison between different sentiment analysis models.

Fig. 9. Fitting time comparison of models
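To make the aspect classification step concrete, the following is a minimal from-scratch sketch of a multinomial Naive Bayes classifier with Laplace smoothing, together with a simple fitting-time measurement. It is not the implementation used in the paper; the class name `TinyMultinomialNB`, the helper `tokenize`, and the six toy reviews with their aspect labels are hypothetical and serve only as an illustration.

```python
import math
import re
import time
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a review and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class TinyMultinomialNB:
    """Multinomial Naive Bayes over raw word counts (hypothetical helper)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # word counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        tokens = [t for t in tokenize(text) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_counts):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            # log prior + sum of smoothed log likelihoods
            score = math.log(self.class_counts[label] / total_docs)
            for token in tokens:
                score += math.log((counts[token] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data (invented for this sketch, not taken from SemEval-2014).
train_texts = [
    "The pasta was delicious and the pizza was fresh",
    "Tasty biryani and great chicken",
    "The staff was rude and the waiter ignored us",
    "Friendly staff and quick service",
    "The prices are too high for the portion",
    "Cheap and affordable menu prices",
]
train_labels = ["food", "food", "service", "service", "price", "price"]

start = time.perf_counter()
model = TinyMultinomialNB().fit(train_texts, train_labels)
fit_seconds = time.perf_counter() - start  # fitting time, as compared in fig. 9

for review in ["the waiter was friendly", "delicious chicken biryani"]:
    print(review, "->", model.predict(review))
print(f"fitting time: {fit_seconds:.6f} s")
```

Because fitting only requires counting words per class, the fitting time stays small, which is consistent with the observation above that a Naive Bayes model suits real-time use.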

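The evaluation metrics used in the model comparison (accuracy, precision, recall, and F1 score) can be computed per aspect class as sketched below. This is an illustrative sketch only: the function name `classification_report` and the example label lists are assumptions made for the illustration, not the paper's data.

```python
def classification_report(y_true, y_pred):
    """Return per-class precision, recall and F1, plus overall accuracy."""
    pairs = list(zip(y_true, y_pred))
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    return report, accuracy


# Hypothetical true and predicted aspect labels for six reviews.
y_true = ["food", "food", "service", "service", "price", "food"]
y_pred = ["food", "service", "service", "service", "price", "food"]

report, accuracy = classification_report(y_true, y_pred)
for label, scores in report.items():
    print(label, {name: round(value, 3) for name, value in scores.items()})
print("accuracy:", round(accuracy, 3))
```

F1 is the harmonic mean of precision and recall, so a model is rewarded only when both are high.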

In fig. 8, both tabular and graphical representations of the different sentiment analysis models are shown. The blue bars represent the logistic regression model, red represents the decision tree, and yellow represents the naive Bayes model. The x-axis shows the evaluation metrics and the y-axis shows values from 0 to 1.

The accuracy, precision, recall, and F1 score of the decision tree are higher than those of logistic regression, but the decision tree is nowhere near logistic regression when it comes to fitting time. The fitting time is the time taken by a model to fit the data and learn from it; during this time the training data is fed to the model with target values.

The scoring time, on the other hand, is the prediction time: the time the model takes to predict the output when a string is given to the trained model. Logistic regression is the better algorithm in terms of both fitting and scoring time, so it should be used for real-time applications.

The red portion of the pie chart in fig. 9 represents the time taken by the decision tree model to fit the training data, the yellow portion represents the fitting time of the Naive Bayes algorithm, and the smallest portion, representing the best fitting time, belongs to the logistic regression model. Pie charts are mostly used for percentage distributions, so if we take the total time to be 100 s, then logistic regression takes 1.7 seconds to fit the data whereas the decision tree takes 94 seconds. So, a fast algorithm is preferred over a more accurate one for real-time applications.

6. Results

Fig. 10. Result of McDonald's

We can represent our results graphically (as shown in fig. 10 and fig. 11) to show the positive and negative sentiments for each aspect of a particular restaurant. McDonald's has positive sentiments in almost every aspect (as shown in fig. 10), which shows that it has a good number of positive reviews.

In the food industry, it is very important for a restaurant to have good-quality food. People can compromise on the other aspects but not on the quality of food. KFC has a very high value for the food aspect compared to the other aspects (as shown in fig. 11), which shows that it has a good number of positive reviews.
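The per-aspect positive and negative counts that such result charts display can be tallied from the predicted (aspect, sentiment) pairs. The sketch below assumes that each review sentence has already been assigned an aspect and a polarity of either "positive" or "negative"; the function name `aspect_sentiment_summary` and the sample predictions are hypothetical.

```python
from collections import defaultdict


def aspect_sentiment_summary(predictions):
    """Count positive/negative predictions per aspect and the positive share.

    `predictions` is an iterable of (aspect, polarity) pairs, where polarity
    is expected to be either "positive" or "negative".
    """
    counts = defaultdict(lambda: {"positive": 0, "negative": 0})
    for aspect, polarity in predictions:
        counts[aspect][polarity] += 1
    summary = {}
    for aspect, c in counts.items():
        total = c["positive"] + c["negative"]
        summary[aspect] = {
            "positive": c["positive"],
            "negative": c["negative"],
            "positive_share": c["positive"] / total,
        }
    return summary


# Invented predictions for one restaurant (illustration only).
predictions = [
    ("food", "positive"),
    ("food", "positive"),
    ("food", "negative"),
    ("service", "positive"),
    ("service", "negative"),
    ("price", "negative"),
    ("ambience", "positive"),
]

summary = aspect_sentiment_summary(predictions)
for aspect, stats in summary.items():
    print(aspect, stats)
```

The resulting dictionary can be passed directly to a plotting library to draw one pair of bars per aspect.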


Fig. 11. Result of KFC.

This concludes that in the fast-food industry, the quality and taste of food are much more important than other aspects. This might not be true for a costly restaurant: as the service cost increases, people will expect all-round service as well. Different insights can be drawn for different kinds of businesses.

7. Summarization and Keywords

In summarization, we create a short summary of all the reviews, which helps people or customers decide which restaurant is best for their requirements.

Fig. 12. Summary of Kabab Ghar restaurant.

Fig. 12 shows the text for the 'Kabab Ghar' restaurant, from which the summary and keywords are extracted. The set of keywords for this restaurant is: chicken, food, good taste, ordered biryani, non-veg, order, orders, totally, add double. These results simplify the process of understanding a large corpus of reviews. People are very busy nowadays, so presenting them with concise and relevant information can make a big difference in quality and productivity.

The Gensim tool is used to generate the summary and extract keywords. It is an open-source library for unsupervised topic modeling and natural language processing. It combines the reviews and generates a summary by extracting important sentences and keywords from the reviews (as shown in fig. 12). The algorithm it uses is TextRank. It generates a short summary of sentences.

8. Conclusion

After processing a large corpus of reviews, we reach the conclusion that the MultinomialNB model performs better than the other algorithms in almost every evaluation metric. It takes very little time to fit and predict the output, which is why it can be used for real-time classification systems. Although the accuracy, precision, recall, and F1 score of the decision tree are better than those of the other models, its fitting time is much higher (as shown in fig. 9). Some other observations: taking online orders and table bookings increases the number of customers. Restaurants should collaborate with delivery sites like Zomato to grow their business. The average cost per person should not be high. There must be quick bites in the areas where MNCs are situated, as people there do not want to spend much time in restaurants.

9. Future Scope

The work carried out here suggests the usefulness of online reviews in transforming businesses using NLP and text analysis techniques. In the future, a real-time application can be built so that people can use it to analyze reviews more effectively and grow their businesses. Neural network architectures can be used to make the summarization of text more accurate, readable, and to the point. Summarization of text reviews and the power to get insights from them can open up a whole new world in the field of analytics and in how we use data in businesses. There is a vast array of techniques left to explore, which can make this an even more exciting area to study and improve. Not only is there vast room for future research, but this study can also be performed for various other industries like transportation, hospitality, healthcare, and education.

References

1. Basant, A., M. Namita, B. Pooja, and Sonal Garg. "Sentiment analysis using common-sense and context information." (2015).
2. Bykh, Serhiy, and Detmar Meurers. "Native Language Identification using Recurring n-grams – Investigating Abstraction and Domain Dependence." In Proceedings of COLING 2012, pp. 425-440. (2012).
3. Westerski, Adam, Carlos Angel Iglesias Fernandez, and Fernando Tapia Rico. "Linked opinions: Describing sentiments on the structured web of data." (2011).
4. Yi, Jeonghee, and Neel Sundaresan. "A classifier for semi-structured documents." In KDD, pp. 340-344. (2000).
5. Moghaddam, Samaneh, and Martin Ester. "Opinion digger: an unsupervised opinion miner from unstructured product reviews." In Proceedings of the 19th ACM international conference on Information and knowledge management, pp. 1825-1828. ACM, (2010).
6. Wiegand, Michael, Alexandra Balahur, Benjamin Roth, Dietrich Klakow, and Andrés Montoyo. "A survey on the role of negation in sentiment analysis." In Proceedings of the workshop on negation and speculation in natural language processing, pp. 60-68. (2010).
7. Flekova, Lucie, Daniel Preoţiuc-Pietro, and Eugen Ruppert. "Analysing domain suitability of a sentiment lexicon by identifying distributionally bipolar words." In Proceedings of the 6th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pp. 77-84. (2015).
8. Asghar, Muhammad Zubair, Aurangzeb Khan, Shakeel Ahmad, and Fazal Masud Kundi. "A review of feature extraction in sentiment analysis." Journal of Basic and Applied Scientific Research 4, no. 3 (2014): 181-186.
9. Lin, Yuming, Tao Zhu, Hao Wu, Jingwei Zhang, Xiaoling Wang, and Aoying Zhou. "Towards online anti-opinion spam: Spotting fake reviews from the review sequence." In 2014 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2014), pp. 261-264. IEEE, (2014).
10. Gimpel, Kevin, Nathan Schneider, Brendan O'Connor, Dipanjan Das, Daniel Mills, Jacob Eisenstein, Michael Heilman, Dani Yogatama, Jeffrey Flanigan, and Noah A. Smith. "Part-of-speech tagging for twitter: Annotation, features, and experiments." Carnegie-Mellon Univ Pittsburgh Pa School of Computer Science, (2010).
11. Cavnar, William B., and John M. Trenkle. "N-gram-based text categorization." In Proceedings of SDAIR-94, 3rd annual symposium on document analysis and information retrieval, pp. 161-175. (1994).
12. Hussein, Doaa Mohey El-Din Mohamed. "A survey on sentiment analysis challenges." Journal of King Saud University-Engineering Sciences 30, no. 4 (2018): 330-338.
13. Anwer, Naveed, Ayesha Rashid, and Syed Hassan. "Feature based opinion mining of online free format customer reviews using frequency distribution and Bayesian statistics." In The 6th International Conference on Networked Computing and Advanced Information Management, pp. 57-62. IEEE, (2010).
14. Patil, Pratik P., Shraddha Phansalkar, and Victor V. Kryssanov. "Topic Modelling for Aspect-Level Sentiment Analysis." In Proceedings of the 2nd International Conference on Data Engineering and Communication Technology, pp. 221-229. Springer, Singapore, (2019).
15. Aggarwal, Charu C. "Text Summarization." In Machine Learning for Text, pp. 361-380. Springer, Cham, (2018).
16. Fikri, Mohammad, and Riyanarto Sarno. "A Comparative Study of Sentiment Analysis using SVM and SentiWordNet." Indonesian Journal of Electrical Engineering and Computer Science 13, no. 3 (2019): 902-909.
17. Akhtar, Nadeem, Nashez Zubair, Abhishek Kumar, and Tameem Ahmad. "Aspect based sentiment oriented summarization of hotel reviews." Procedia Computer Science 115 (2017): 563-571.
18. Lee, Pei-Ju, Ya-Han Hu, and Kuan-Ting Lu. "Assessing the helpfulness of online hotel reviews: A classification-based approach." Telematics and Informatics 35, no. 2 (2018): 436-445.
19. Barbado, Rodrigo, Oscar Araque, and Carlos A. Iglesias. "A framework for fake review detection in online consumer electronics retailers." Information Processing & Management 56, no. 4 (2019): 1234-1244.
20. Turney, P. D. "Thumbs Up Thumbs Down." (2020).
21. Priyantina, Reza Amalia, and Riyanarto Sarno. "Sentiment Analysis of Hotel Reviews." International Journal of Intelligent Engineering and Systems 12, no. 4 (2019): 142-155.
22. Al-Smadi, M., O. Qawasmeh, B. Talafha, and M. Quwaider. "Human annotated Arabic dataset of book reviews for aspect based sentiment analysis." In 3rd International Conference on Future Internet of Things and Cloud (FiCloud), pp. 726-730. IEEE, (2015).
23. Al-Smadi, Mohammad, Omar Qawasmeh, Mahmoud Al-Ayyoub, Yaser Jararweh, and Brij Gupta. "Deep Recurrent neural network vs. support vector machine for aspect-based sentiment analysis of Arabic hotels' reviews." Journal of Computational Science 27 (2018): 386-393.
24. SemEval-2014 Task-4 dataset for aspect classification. Available at: http://alt.qcri.org/semeval2014/task4/ (2014).
25. Zomato Bangalore Restaurant Dataset. Available at: https://www.kaggle.com/himanshupoddar/zomato-bangalore-restaurants (2019).
