Sentiments and Opinions From Twitter About Peruvian Touristic Places Using Correspondence Analysis
Luis Cajachahua Indira Burga Universidad Nacional de Ingeniería Universidad de Ingeniería y Tecnología [email protected] [email protected]
Abstract very valuable, because it allows to discover and develop the main attributes of touristic places in Tourism in Perú has become very im- Perú, considering not only the place itself, but all portant, since there is a growing number of the ecosystem: hotels, touristic agencies, handi- tourists arriving each year. This paper fo- craft stores, transportation, public services, etc. cus in understand what do speaking- english tourists have in consideration For that reason, we downloaded 192,525 when they visit Perú. We obtained all the tweets, published during 2016, considering only tweets published in english during the year english language. After a careful data preparation 2016, filtered by touristic places visited. In and feature extraction, we applied some sentiment total, more than 192 thousand tweets were analysis techniques to tag each tweet using the collected. We performed different analysis following eight sentiments: anger, fear, sadness, to describe the data, including correspond- disgust, surprise, anticipation, trust and joy, pro- ence analysis, a statistical technique which posed by Plutchik (Mohammad and Turney, is normally applied to categorical data. 2010). Using correspondence analysis, we discov- The goal was to understand the sentiments ered some associations between touristic places and opinions expressed in those tweets. visited and sentiments expressed in the extracted Keywords: Twitter, Tourism, Sentiment tweets. Analysis, Text Mining, Correspondence In addition, we used other text mining tech- Analysis, Perú. niques to determine some general concepts men- tioned on tweets and identify the association be- 1 Introduction tween the places visited and these concepts.
Tourism is an important source of economic 2 Background growth in Perú. In 2015, 3.5 million of tourists visited Perú, leaving more than USD 4,151 mil- Text mining has become a useful tool to deal lion that correspond to the 3.75% of the gross do- with the data deluge. With few and simple tools, mestic product (Promperú, 2016). For this reason, we can now extract valuable insights from a big all the government tourism bureaus, specially amount of unstructured data available today in the Promperú, are working hard to keep those num- internet. For example, one of the first studies of bers growing on Figure 1. sentiment analysis in Twitter was developed in 2011 by Agarwal et al. They built a formula to calculate the polarity of a comment, if the tweet is positive or negative. They developed a tree kernel approach to improve the classification (Agarwal, et al., 2011). In the same year, Dodds et al analyzed a huge list of tweets, around 4.6 billion, posted by over 63 million individual users and extracted in a 33- month span (Dodds, et al., 2011). The main goal Figure 1: Tourist arrivals to Perú in the last 10 was to develop a dictionary optimized to measure years (Promperú, 2016). the level of happiness of a given text, e.g. another tweet. They named their solution as “The Hedom- In this context, every initiative oriented to un- eter” derstand the tourist preferences and sentiments is
178
One of the first studies about tourism using Being Twitter an internet social network, all tweets was developed by Antoniadis. This study the users must be registered and logged to was focused in verifying the correlation between use it. But not every person or tourist in the Twitter performance and tourism competitive in- world has a Twitter account. For this reason, dex for 38 destination management organizations results are not generalizable/representative (DMO’s). They found that Twitter use was in ac- for the tourists’ population. cordance with countries’ tourism performance (Antoniadis, et al., 2014). We acquired all the tweets directly from Oku et al, analyzed 4.5 million tweets to find Twitter. We didn’t used streaming or scrap- the location where a tourist posted a photo. They ing methods to download the tweets. But, used only geotagging information from the pic- we used many filters to select tweets re- tures, leaving text data out of the analysis (Oku, et ferred to touristic places. In consequence, we could have selected a non-representative al., 2015). sample of tweets. In addition, Bassolas performed an interesting analysis about the 20 most interesting touristic We considered only original posted tweets. places around the world, including Machu Picchu. We are not considering retweets. But, there They tried to measure the “attractiveness” of each are many platforms that post news or copies place, considering two metrics: a) Radius, defined of tweets, e.g. IFTTT. Therefore, we could as the average distance between the places of resi- find the exact same tweet published by dence and the touristic site and b) Coverage, the many different users. area covered by the users’ places of residence computed as the number of distinct zones (or Some Tweets (10%) include the location of countries) of residence (Bassolas, et al., 2016). the user but this information is not always All these studies were considered to develop a real which makes classification by origin different approach, focused on finding the emo- difficult. Besides, the georeferenced infor- tions expressed by the tourists in the visited plac- mation is not representative. es. 3 Methodology 2.1 Objectives Having reviewed the references and settled the This research was focused in five goals: scope of the study, we selected some tools and Understand the sentiment expressed in the techniques available to analyze the information. tweet (positive or negative) about the place. 3.1 Scope Tag each tweet using the sentiments that the The population considered for this study was user is expressing in the tweet formed by 192,525 tweets downloaded directly from Twitter. The distribution of tweets is shown Identify the main qualitative attributes men- in the Figure 2. tioned in the tweets, related to the touristic
place.