(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 11, No. 9, 2020 Netnography and Text Mining to Understand Perceptions of Indian Travellers using Online Services

Dashrath Mane1, Dr. Prateek Srivastava2 Department of CSE, Sir Padampat Singhania University [SPSU] Udaipur,

Abstract—Advancements in the electronic commerce industry The best costs and cashback bargains on air are assured with have helped online travel services (OTA) in many ways. The .com. paper examines the overall impact of traveller’s using online services and their sentiments derived from a collection of reviews online travel service provider offers internet for online travel service providers known as online travel agents booking for train and flight tickets for domestic and (OTA) in India. Customer reviews from different identified international bookings. Goibibo.com is another name which sources are collected and the satisfaction of travellers using offers the best arrangements/bookings in trains, flight. From various online travel services is analyzed using netnographic booking universal/local occasions to encouraging Foreign analysis and text mining. This paper also covers a detailed Exchange, Visa, Passport, protection for movement purposes. process of data collection, analysis using netnography and text Thomas cook does this everything for the travellers. mining methods which helps us for the analysis and deriving Travelguru.com, regularly appraised as a standout amongst the sentiments from collected reviews. Various results obtained are most went by Indian travel destinations on the web, is a presented as part of token lists, keyword analysis, and service- fortune place of treats with regards to booking air tickets, specific analysis. The statistical analysis of different results is lodging offices. tested to understand the relationship between various services and OTA. Netnography and Tourism industry: Netnography is Method used for studying culture and communities online. It Keywords—Consumer; travellers; netnography; text mining; is a tool to understand social interaction in digital OTA; sentiment; perception communications [6]. Netnography is a known tool accepted as a virtual ethnography tool, is being explored along with text I. INTRODUCTION mining. The approach used in this paper also produces an A. Tourism Market in India analysis of perceptions of travellers in India using text mining techniques. India, the country is known for its rich history, cultures. India placed 40th out of 136 nations as per the report of Travel Phases in the Netnography Process [11]: Fig. 1 explains & Tourism Competitiveness for the year 2017 [1]. The various phases in the process of netnographic analysis; below tourism industry in India is playing a significant role in the are the various phases in it: economy [2]. 1) Research planning: Research planning [3] is the initial The current scenario shows that India has a bright future in phase in netnographic studies. This phase speaks about the tourism sector as lots of development taking place in the defining the problem, defining the research objectives, should Indian tourism industry. In the last few years, the Ministry of talk about translating the research objectives into a specific set Tourism, Government of India, has initiated various attempts of questions. to lift tourism in a country that has opened research opportunities in it. The plans like PRASAD, e-tourist Visa, 2) Entrée: In this phase, online communities, blogs, Mobile applications for tourists are helping out to gain a new groups are identified and allows an ethnographic to enter into tourist footprint [3]. the communities to get a better knowledge of cultures and communities. B. Online travel services in India 3) Data collection: Data collection is a very vital phase in It is expected that by the end of 2020, the count of the this type of study. In this phase, Netnographer collects data bookings made online is likely to reach 4 billion in India.[4] from Internet data, interview data, and field notes. Travellers‘ Makemytrip.com has been positioning high on the rundown of reviews downloaded using various rating & referral sites like movement sites that fill in as a one-stop look for the whole mouthshut.com and consumeraffairs.com. These are the group. The online booking travel related firm is a commonly known name these days & it has been collaborating with websites that allow the user to collect millions of reviews imperative players in the industry. Yatra.com[5] is the best for posted by travellers in India. The whole reviews were movement specialists, long-standing corporate customers, and collected using systematics steps in the netnographic easygoing hikers alike & is an excellent travel site in India. process[7].

559 | P a g e www.ijacsa.thesai.org (IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 11, No. 9, 2020

4) Interpretations/Data analysis: This makes use of One this online travel-related online services in India. Fig. 3 shows of the data analysis technique known as Analytical coding, the overall rating for Cleartrip on mouthshut.com. Renal data analysis consist of coding, noting, abstracting, Fig. 4 represents the sample to review and rating for checking and refining, generalizing and theorizing as shown Cleartrip [7]. In addition to actual comment, Reviews contain in Fig. 2. values from 1 (low) to 5(high) for all the website components. 5) The data collection process was performed from Fig. 5 is a sample of the overall rating for MakeMyTrip. sources like mouthshut.com [8] and Consumer Affairs.com for Referring to the Fig. 5, it is very much clear that for all five this study. More than 2000 reviews from these platforms have website components, the overall rating for MakeMyTrip is 2 been collected and maintained in an excel sheet. [8]. Data collected from the identified referral, rating sites is stored in the Excel sheet. This data contains various fields as 1) Date of review, 2) Reviewer name, 3) Gender, 4) Age, 5) Location, 6) Review, 7) Source, 8) Review rating, 9) Service and support, 10) Information depth, 11) Content, 12) User-friendly, 13) Time to load. Fig. 1. Phases in Netnography Process [11]. Fig. 6 shows an Excel document of reviews collected from different sources of online travel agents. A. Text Pre-Processing The reviews were collected from various online platforms & referral rating sites need to have a series of text preprocessing before referring them for analysis using Text mining. The concept of text preprocessing consists of a series Fig. 2. Analytical Data Analysis [11]. of stages, which include spelling in normalization, filtering, lemmatization. The various text preprocessing tasks are being II. RESEARCH METHODOLOGY essential and this includes content cleanup, tokenization, The online travel agents considered in this research work grammatical form tagging [9]. includes MakeMyTrip, Goibibo, Cleartrip, redBus, and Yatra. Table I represent various phases in determining positive A good number of reviews for each of these agents [12] negative reviews the process of determining sentiments selected are evaluated using netnographic studies. The consist of some of the essential processes like splitting words, analysis performed as netnographic analysis is used to carry POS tagging, lemmatization, joining and then sentiment out the sentiments, the levels of sentiments of travellers using analysis.

Fig. 3. The Overall Rating for Cleartrip (Source: Mouthshut.com).

Fig. 4. Sample Review and Rating for Cleartrip (Source: Mouthshut.com).

560 | P a g e www.ijacsa.thesai.org (IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 11, No. 9, 2020

Fig. 5. Overall Rating for MMT (Source: Mouthshut.com).

Fig. 6. Different Fields in Format of Reviews Collected.

TABLE I. VARIOUS PHASES IN DETERMINING POSITIVE-NEGATIVE REVIEWS

Process Output Horrible experience, hotels don't provide basic amenities and charge for everything. Initial Review Customer service part is worst. Lowercase Horrible experience, hotels don't provide basic amenities and charge for everything. Customer service part is worst. Punctuation Horrible experience hotels don't provide basic amenities and charge for everything customer service part is worst Split words ['horrible', 'experience', 'hotels', "don't", 'provide', 'basic', 'amenities', 'and', 'charge', 'for', 'everything', 'customer', 'service', 'part', 'is', 'worst'] [('horrible', 'JJ'), ('experience', 'NN'), ('hotels', 'NNS'), ("don't", 'VBP'), ('provide', 'VB'), ('basic', 'JJ'), ('amenities', 'NNS'), ('and', 'CC'), POS tagging ('charge', 'NN'), ('for', 'IN'), ('everything', 'NN'), ('customer', 'NN'), ('service', 'NN'), ('part', 'NN'), ('is', 'VBZ'), ('worst', 'JJ')] Lemmatization ['horrible', 'experience', 'hotel', "don't", 'provide', 'basic', 'amenity', 'and', 'charge', 'for', 'everything', 'customer', 'service', 'part', 'be', 'bad'] Joining horrible experience hotel don't provide basic amenity and charge for everything customer service part be bad Sentiment Analysis {'negative': 0.333, 'neutral': 0.667, 'positive': 0.0, 'compound': -0.7906} Sentiment Positive / Negative Fig. 7 and 8 graphically represents the top 5 tokens in III. RESULTS AND DISCUSSIONS positive and negative category respectively The results obtained from this research are presented in this section. 2) Gender-specific (Male) Analysis of tokens: In the gender-specific category of tokens obtained, the Table III 1) Token identification: Various subsections are being represents the top tokens based on their frequency of identified based on the results. After processing the textual occurrence in reviews that are listed. It is also identified from reviews collected using text mining techniques following are the Table III that when it comes to male support, care, friend, the detailed list of tokens identified. The below section help these are the frequently used words while expressing in represents the top 25 tokens from the overall volume of the form of review. And it denotes the satisfaction of the reviews processed and also online agent-specific [10]. travellers the category of male. Tokens identified are used for deriving the overall Similarly, for or negative categories in males, the perception of the travellers for the overall. Table II represents commonly used words based on the results obtained are a the top 25 tokens from both the categories positive and problem, fraud, Waste, mistake. This represents the negative. The top 25 tokens based on the frequency of their dissatisfaction of travellers regarding online travel-related occurrence has shown in the Table II For overall positive services. Tables III and IV represents the top 5 positive and tokens care, support, help, these are the frequently used words top 5 negative tokens for males. Overall male analysis by the reviewers while posting the reviews. Similarly, in the performed and results obtained indicate that usually while negative tokens category words like problems, fraud, cheat, expressing on social platforms about satisfaction, travellers and mistakes are some of the commonly used words by the have used quite similar words. reviews while expressing on social platforms.

561 | P a g e www.ijacsa.thesai.org (IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 11, No. 9, 2020

TABLE II. OVERALL POSITIVE-NEGATIVE TOKENS

Overall Positive Tokens Overall Negative Tokens

Sr. No Words Frequency Polarity Sr. No Words Frequency Polarity

1 Care 489 0.49 1 Problem 244 -0.4019

2 Support 454 0.40 2 Fraud 153 -0.5859

3 Friend 267 0.49 3 Cheat 70 -0.4588

4 Please 191 0.32 4 Mistake 65 -0.34

5 Help 189 0.40 5 Cheater 60 -0.5423

6 holiday 133 0.40 6 Fault 57 -0.4019

7 thanks 109 0.44 7 Waste 50 -0.4215

8 credit 90 0.38 8 Emergency 47 -0.3818

9 Kind 88 0.53 9 Delay 45 -0.3182

10 trust 65 0.51 10 Error 35 -0.4019

11 thank 59 0.36 11 Loss 33 -0.3182

12 solution 42 0.32 12 Complain 27 -0.3612

13 promise 40 0.32 13 Trouble 25 -0.4019

14 value 37 0.34 14 Penalty 25 -0.4588

15 hope 30 0.44 15 Shock 22 -0.3818

16 comfort 29 0.36 16 Hell 21 -0.6808

17 profit 26 0.44 17 Fool 20 -0.4404

18 resolve 25 0.38 18 inconvenience 19 -0.3612

19 hand 23 0.49 19 Doubt 19 -0.3612

20 satisfaction 22 0.44 20 Rude 16 -0.4588

Overall male analysis performed and results obtained indicate that usually while expressing on social platforms about dissatisfaction, travellers have used quite similar words.

TABLE III. TOP FIVE POSITIVE MALE

Overall Male Positive Sr. No Words Frequency polarity 1 Support 327 0.40 2 Care 284 0.49

3 Friend 199 0.49 Fig. 7. Overall Positive Tokens (Top Five). 4 Help 125 0.40 5 please 116 0.32

TABLE IV. TOP FIVE NEGATIVE MALE

Overall Male Negative Sr. No words Frequency Polarity 1 problem 173 -0.4019 2 Fraud 89 -0.5859 3 Waste 34 -0.4215 4 mistake 33 -0.34 5 emergency 33 -0.3818 Fig. 8. Overall Negative Tokens (Top Five).

562 | P a g e www.ijacsa.thesai.org (IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 11, No. 9, 2020

3) Gender-specific (Female) Analysis of tokens: This study. Fig. 9 explains lines of code referred from Python- section describes various tokens used in the form of reviews based implementation which has helped to get the overall while expressing on social platforms by female travellers. The graphical representation of keywords and its analysis. analysis carried out in below Table V and VI represents the Fig. 10 represents the first Visualization of keywords for top 5 tokens based on their value of frequency. It also the real data values collected for the research purpose. From represents their importance while deriving positive and the results obtained, it very much clear that words like Hotel, negative sentiments from the reviews hosted by female book, ticket, service, call are the most critical words based on travellers. In a positive category, the frequently used word by their appearances. female Travellers is Care, support, please, and so on. In Fig. 11 and 12 represents the essential keywords in negative categories, the most frequently identified words are positive and negative category for the entire study. From the problem fraud, cheat, a mistake. overall positive category, it is very much clear that words like Tables V and VI are the tabular representation of the top 5 best, awesome, trust, kind, super are the critical words reviews from gender-specific analysis of tokens in female describing positive sentiments in this category. In contrast, in traveller. overall negative word analysis. Words like fraud, bad, horrible, pathetic, failed is the crucial words that frequently Overall female analysis performed and results obtained occurred while expressing sentiments by the Travellers. indicate that usually while expressing on social platforms about dissatisfaction, travellers have used quite similar words. TABLE V. TOP FIVE POSITIVE FEMALE

The analysis carried out in table VI represents the top 5 Overall Female Positive tokens based on their value of frequency for the positive female category. Sr. No Words Frequency polarity 1 Care 193 0.49 Table VII represents a comparison matrix with the top ten tokens in each OTA. Based on the results shown, it is very 2 support 115 0.40 much clear that satisfied travellers are frequently expressing 3 please 73 0.32 with words like support, help, kind, comfort, and dissatisfied 4 friend 63 0.49 travellers are using words like problem, fraud, cheater, 5 Help 62 0.40 mistake while expressing on social platforms.

The comparison matrix shows the frequency of particular TABLE VI. TOP FIVE POSITIVE FEMALE words when Traveller post their comments reviews on various sites. This comparison matrix also helps us understand Overall Female Negative comparative analysis between the keywords in the form of Sr. No words Frequency Polarity tokens for all the five online travel agents chosen in this 1 problem 68 -0.4019 research. Words like problem, help, and support are common 2 Fraud 60 -0.5859 in all the OTA. 3 Cheat 34 -0.4588 4) Keyword Analysis: Following part of the research 4 mistake 31 -0.34 paper presents various keyword visualizations to understand 5 cheater 29 -0.5423 more importance of each of the keyword plays in the entire

TABLE VII. COMPARISON MATRIX FOR OTA‘S AGAINST TOP 10 TOKENS

Sr. Yatra redBus MMT Goibibo Cleartrip No Positive Negative Positive Negative Positive Negative Positive Negative Positive Negative 1 care problem Care problem support problem care problem support fraud 2 please fraud Support fraud care cheat support fraud care cheater 3 support waste Friend mistake friend fraud friend cheat friend problem 4 holiday cheat Please delay holiday mistake please cheater help cheat 5 help emergency Help emergency help error help waste please fault 6 thanks mistake kind fault please emergency thanks fault credit delay 7 friend penalty thanks waste kind loss credit mistake thanks mistake 8 credit fraudsters comfort trouble thanks fault holiday fake thank cheating 9 kind Cheater thank cheater promise penalty trust loss trust doubt 10 value Delay trust rude credit cheater kind fool holiday scam

563 | P a g e www.ijacsa.thesai.org (IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 11, No. 9, 2020

wordcloud = WordCloud( background_color ='white') wordcloud.generate_from_frequencies(frequencies=d1) plt.figure() plt.imshow(wordcloud, interpolation="bilinear") plt.axis("off") plt.savefig(abc+'_overall.png', dpi=600) plt.show() Fig. 9. Keyword Analysis LOC.

Fig. 10. Visualization of Overall Word Cloud. Fig. 14. Cleartrip Dissatisfied Travellers Word Cloud.

6) Goibibo Keyword analysis: Fig. 15 and 16, covers keywords based on their frequent occurrence for Goibibo. Based on the analysis carried out for Goibibo keywords excellent, best, great, trust, free are the crucial words in the positive category for Goibibo. Whereas worst, bad, fraud, hate is the commonly observed keywords in the negative category for Goibibo. Fig. 11. Overall Satisfied Travellers. 7) MMT Keyword Analysis: MakeMyTrip (MMT) keyword analysis is expressed in Fig. 17 and 18 in graphical representation. In a positive category, best, free, kind, happy, wonderful are critical words for MMT. In contrast, when it comes to negative category pathetic, bad, worst, horrible, fraud, these are the words carrying much importance in MMT keyword analysis.

Fig. 12. Overall Dissatisfied Travellers.

The section below describes the online travel agent (OTA) Specific keyword analysis. 5) Cleartrip Keyword analysis: Fig. 13 and 14 depicts the visualization of keywords in Cleartrip. Based on the results obtained, it is very much clear that words like best, trust, Fig. 15. Goibibo Satisfied Travellers Word Cloud. great, happy are the most critical in a positive category for Cleartrip. In contrast, Fig. 13 shows bad, fraud, pathetic, fail other common words in negative categories for Cleartrip.

Fig. 16. Goibibo Dissatisfied Travellers Word Cloud.

Fig. 13. Cleartrip Satisfied Travellers Word Cloud.

564 | P a g e www.ijacsa.thesai.org (IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 11, No. 9, 2020

Fig. 17. MMT Satisfied Travellers Word Cloud. Fig. 22. Yatra Dissatisfied Travellers Word Cloud.

10) Analysis based on Rating: Each of the review collected from the Mouthshut.com has a rating value attached to it ranging from 1 as Low to 5 as High. Following is the OTA‗s and reviews based on rating value percentage Fig. 23 covers lines of code for getting the statistical value of review ratings for all the reviews which are being collected and processed Fig. 18. MMT Dissatisfied Travellers Word Cloud. and relevant graphical representations are obtained with the 8) redBus Keyword Analysis: Similar to other online help of similar lines of code. travel agents, keyword analysis is performed for redBus. The Table VIII represents a statistical analysis of reviews Fig. 19 and 20 depicts various important words in positive and regarding the overall rating given by the reviewer on the scale negative categories for redBus. Best, happy, kind, comfortable of 1 to 5, where 1 is low, and 5 is high. The table shows a these are the top-rated words in the positive category, and bad, good number of reviews for each of the online travel agents as worst, horrible these are some of the highly-rated words in the review rating low, which means dissatisfaction of travellers is negative category for redBus. low. At the same time the volume of reviews classified under 5 represents the high level of satisfaction of travellers using an 9) Yatra Keyword Analysis: Fig. 21 and 22 is the word online travel agent and there relevant services MakeMyTrip, count analysis done on online reviews collected for Yatra Goibibo and Cleartrip are having good volume of reviews online travel agent. The outcome of the analysis demonstrates under the highest rating. the words like best, great, kind, splendid, love is the essential words in the positive category for the Yatra. In the negative sum = [] category, analysis says the words like bad, fraud, worst, percent = [] unprofessional, horrible are some of the words that play an essential role in Yatra. for i in range(5): sumval = 0 for j in range(5): sumval += len(ratingList[j][i]) sum.append(sumval)

Fig. 19. redBus Satisfied Travellers Word Cloud. for i in range(5): percent = [] for j in range(5): percent.append(round(((len(ratingList[j][i]) * 100) / sum[i] ) , 1 ))

worksheet.write(i, j, round(((len(ratingList[j][i]) * 100) / Fig. 20. redBus Dissatisfied Travellers Word Cloud. sum[i] ) , 1 )) worksheet.write(i, 5, sum[i]) percent.append(sum[i]) print(percent) workbook.close() Fig. 23. Rating based Analysis LOC.

Fig. 21. Yatra Satisfied Travellers Word Cloud.

565 | P a g e www.ijacsa.thesai.org (IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 11, No. 9, 2020

TABLE VIII. OTA'S REVIEW RATING refine and get a meaningful text for further processing. Sentiment score stage, the sentiments of each word, and then Total Review Rating 1 2 3 4 5 Reviews for the entire review are obtained.in this stage, using the Vader Cleartrip 236 13 12 33 57 351 sentiment lexicon dictionary. The sentiments are derived as positive, negative, neutral—compound sentiment score help to Goibibo 321 33 36 54 75 519 classify the review into positive or negative. In the review OTA's MMT 281 18 48 90 94 531 classification stage, based on the compound sentiment scores redBus 292 38 45 46 48 469 with more than 0.05 compounded courses are classified as positive, and the sentiment scores between -0.05 and 0.0 are Yatra 172 17 15 40 21 265 considered neutral review. The review has a compounded score of less than -0.05 classified as negative reviews. In the Table IX represents the statistical analysis of review visualization stage, based on the review, the various analysis ratings for all the OTA in percentage collected and processed has been carried out in the graphical representation has been for each of these online travel agents. From the table, it is very generated in the form of the word cloud, pie charts. much clear that MakeMyTrip is having the highest percentage of review rating value with 5. It means the satisfaction level of travellers using various services offered by MakeMyTrip is also high in comparison to other online travel agents. Cleartrip has 16.2% reviews based on the overall rating given to the review this percentage and it‘s second after MakeMyTrip. Goibibo is third in the list with the highest percentage under 5. It is very much clear from table values that Yatra is having low preference under the top-level, which means satisfaction is very low for the Yatra.

TABLE IX. OTA'S REVIEW RATING PERCENTAGE Total Review Rating 1 2 3 4 5 volume Cleartrip 67.2 3.7 3.4 9.4 16.2 351 Goibibo 61.8 6.4 6.9 10.4 14.5 519 OTA MMT 52.9 3.4 9 16.9 17.7 531 redBus 62.3 8.1 9.6 9.8 10.2 469 Fig. 24. Architectural view of Approach used for Visualization.

Yatra 64.9 6.4 5.7 15.1 7.9 265 V. STATISTICAL ANALYSIS OF RESULT Looking at Table IX for Yatra, 64.9 percent of Travellers In this section, the statistical analysis results obtained have given Low review rating, represent the percentage of the using the Chi-square test are discussed. A Chi-square test for satisfaction of Travellers using services offered by Yatra. various parameters is performed to understand and test the relationship between various variables based on the results of IV. ARCHITECTURAL VIEW OF MODEL the test acceptance or rejection of the hypothesis is being In this research work, netnography and Text mining chosen. Performed Chi-square test on the following scenarios: techniques are used as an integrated approach. The following 1) Scenario 1(OTA vs. Gender): Here, the Chi-square are the various stages in the architectural view of the basic statistical test is used to test whether two variables are model in this study. independent or not. Chi-square hypothesis testing is used to Fig. 24 represents the architectural view of the platform understand whether there is any relationship between the used in the text mining process. As mentioned in the diagram, online travel agent and the gender of travellers. the following are the steps that help get a visualized a) Null hypothesis (H0): No association/relationship representation of specific results. The first stage in the between gender of traveller and OTA. architectural view is inputting of the review. In this research study, the reviews or posts used are collected from various b) The alternate hypothesis (H1): Gender of traveller online platforms using Netnographic guidelines. Collected and OTA has an association/relationship reviews are maintained in a repository. The second stage is the Table X(a) and X(b) shows the calculations of test preprocessing stage. In this stage, preprocessing of the text statistics. reviews collected is done using various techniques of splitting the words, converting the words, checking for the missing P-value obtained is .00000009307. With the degree of values is performed. The next stage is cleaning the reviews; in freedom 4 and at 5% level of significance, the critical this process of cleaning of the reviews, it is very much value/tabular value is 9.49. The calculated chi-square essential that some of the processes which help the text to value(33.54) is greater than the critical or table value(9.49).

566 | P a g e www.ijacsa.thesai.org (IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 11, No. 9, 2020

TABLE X. (A) TEST STATISTICS FOR OTA VS. GENDER

Gender Male Female Total OTA Observed Expected % Observed Expected % Observed % Cleartrip 242 233.38 16.74 110 118.62 14.97 352 16.14 Goibibo 429 385.87 29.67 153 196.13 20.82 582 26.69 MMT 325 341.44 22.48 190 173.56 25.85 515 23.61 redBus 306 310.95 21.16 163 158.05 22.18 469 21.5 Yatra 144 174.37 9.96 119 88.63 16.19 263 12.06 Total 1446 100 735 100 2181 100

(b) Test Statistics for OTA vs. Gender

Observed Expected (obs-exp)^2 (obs-exp)^2/ Exp 242 233.38 74.30 0.32 429 385.87 1860.20 4.82 325 341.44 270.27 0.79 306 310.95 24.50 0.08 144 174.37 922.34 5.29 110 118.62 74.30 0.63 153 196.13 1860.20 9.48 190 173.56 270.27 1.56 163 158.05 24.50 0.16 119 88.63 922.34 10.41 Calculated Chi-Square Value 33.54 relationship between OTA Sentiment against the total number There is enough of the statistical evidence to reject the null of reviews for the online travel agents. hypothesis and to accept the fact that there is an association or relationship between various OTAs and the gender of 3) Statistical Analysis of Positive negative ratio Vs. travellers. Gender: In the below section, a detailed analysis of the Since 33.52 > 9.48 or our P-value < 0.05 relationship between positive-negative sentiment ratio and gender. Based on the data shown in the Table, P-value using The alternate hypothesis is accepted. Thus there is an the Chi-square statistical test is 0.00. The following are the association between gender and people using OTA. null and alternate hypotheses framed to identify whether there Fig. 25 is the graphical representation of observed values is any relationship between the positive-negative sentiment for males and females for all the online travel agents against ratio and gender. the total volume of reviews collected. a) Null hypothesis (H0): No association between 2) OTA vs. Positive Negative Sentiments: In this section, Sentiment and Gender the Chi-square statistical test performed to understand the b) Alternate Hypothesis (H1): Association between relationship between positive negative sentiments of travellers Sentiment and Gender and various online travel-related service providers (OTA). Referring to Table XII, The Chi-square value calculated a) Null hypothesis (H0): No association between using P-value is 12.96. And with a 5% level of significance Sentiment and people using OTA. and 4 degrees of freedom, the tabular value for chi-square is 9.48. Since the calculated value, 12.96 is more than the tabular b) Alternate Hypothesis (H1): Association between value for chi-square 9.48. There is enough evidence to reject Sentiment and people using OTA. the null hypothesis and accept the alternate hypothesis, which From Table XI, P-value obtained using the chi-square test says there is a relationship between Sentiment and gender. is 0.01, chi-square value from P-value is 12.78; critical or Since 12.95 > 9.48, Alternate hypothesis is accepted. tabulated chi-square value at the degree of freedom 4 and level of significance 5% is 9.49; since 12.78 > 9.48 An alternate Fig. 27 represents a detailed classification of the total hypothesis is accepted; it means an association or relationship volume of reviews and sentiments as a positive and negative between sentiments and people using OTA. Fig. 26 is a gender-wise. The finding from the analyses says that the male graphical representation of about table values representing the and female total percentage of negative review sentiment in male is less than female.

567 | P a g e www.ijacsa.thesai.org (IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 11, No. 9, 2020

VI. SERVICE SPECIFIC ANALYSIS Online travel agents used in this research provides a various set of services to the users, customers, travellers. These services include hotel booking, flight booking, bus booking. In this section, the results obtained regarding these services are evaluated, and reasonable interpretations are derived. Fig. 28 shows lines of code help to do the analysis with reference to sentiments for each of the OTA specific to services identified.

Fig. 25. OTA Specific Gender Values. xpos = np.arange(len(OTA))

TABLE XI. TEST STATISTICS FOR OTA VS. POSITIVE NEGATIVE plt.bar(xpos-0.2,posi, width=0.4, label="Positive") SENTIMENTS plt.bar(xpos+0.2,negi, width=0.4,label="Negative") Positive Negative Total Gender plt.xticks(xpos,OTA) Observe Expecte Observe Expecte Observe OTA % % d d d d d plt.ylabel("Number of "+word+" reviews") 14.9 Cleartrip 165 178.5 187 173.5 17.4 352 2 plt.title("Sentiment of "+word+" reviews") 25.9 Goibibo 303 295.14 27.4 279 286.86 582 5 plt.legend() 20.8 MMT 291 261.16 26.3 224 253.84 515 4 plt.savefig(word+"_OTA") 20.3 redBus 225 237.83 244 231.17 22.7 469 Fig. 28. Service Specific Sentiment Analysis LOC. 4 11.0 13.1 Yatra 122 133.37 141 129.63 263 The graphical representation below Fig. 29 clearly 3 2 indicates for bus-related bookings our services Travellers have Total 1106 100 1075 100 2181 preferred redBus.

Fig. 26. Sentiments and Gender Relationship for OTA.

TABLE XII. RELATIONSHIP BETWEEN SENTIMENTS AND GENDER Fig. 29. Sentiments of Bus Reviews for OTA.

Observed Expected Gender After redBus Travellers preferred Goibibo and then Female Total Sentiment Male Male % Female Male Female Makemytrip, the finding here is for bus-related services, % redBus is on top of all. Positive 773 69.89 673 62.60 733.28 712.72 1446 Flight-related services are offered by the following online Negative 333 30.11 402 37.40 372.72 362.28 735 travel agents chosen for the study, Cleartrip, Goibibo, Total 1106 100 1075 100 2181 MakeMyTrip, and Yatra. The graphical representation Fig. 30 shows that for the flight-related services, the volume of negative reviews and sentiments are more with Cleartrip than other service providers. For Goibibo, the positive sentiments are little more than the negative sentiments which we can interpret in a manner that Travellers or customers are a little happier regarding flight-related services offered by Goibibo. Similarly, for MakeMyTrip, positive sentiments are more than negative sentiments for all the possible reviews processed for flight-related services which also represents the satisfaction of travellers is more than dissatisfaction for Makemytrip. For Yatra, the positive sentiments are less than the negative sentiments. Also, the volume of reviews for Yatra regarding Fig. 27. Classification Reviews and Sentiments. flight-related services is low; it represents the satisfaction is

568 | P a g e www.ijacsa.thesai.org (IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 11, No. 9, 2020 lower than the dissatisfaction and preference for flight-related Cleartrip, Goibibo, redBus analysis is performed using text services is not towards Yatra. mining as an integrated approach. Based on results obtained regarding service-specific analysis, it is possible to conclude that for flight booking, related services traveller's satisfaction is more with MMT, Goibibo. Consumers have preferred MakeMyTrip as the first option and Goibibo as Second and followed by Yatra. For Hotel booking related services satisfaction of travellers using services from MMT, Goibibo is high compared to others. For bus booking related services, users are more happy and satisfied with redBus then Goibibo and MMT.

This research work has used various approaches Fig. 30. The Sentiment of Flight Reviews for OTA's. concerning the understanding of satisfaction, dissatisfaction, or the perception of travellers using selective travel-related The following are the online travel agents that offer hotel online service providers in India to build a competitive booking related services online Cleartrip, Goibibo, advantage to customers and company too. Online travel MakeMyTrip, Yatra. Fig. 31 clearly explains the results of the service providers in India can make appropriate decisions analysis done regarding Hotel related services offered by these according to the results obtained from the study regarding online travel agents in India. When it comes to Cleartrip Hotel customer perception and create a competitive advantage. Also, related services, Fig. 31 represents the negative sentiments are consumers can choose a particular online travel-related service more than positive it means when it comes to hotel-related provider based on the result of the study for improved services services, online dissatisfaction is more observed in customers and traveller satisfaction. or travellers about Cleartrip. Goibibo looks at the top choice REFERENCES for Hotel related services online. Again based on the analysis, [1] C. Science and M. Studies, ―A Study on Tourist Satisfaction towards it is proved that positive sentiments are more than negative Hotel Related Services in Gujarat Tourism,‖ vol. 4, no. 6, pp. 105–110, sentiments. It means the satisfaction of travellers using Hotel 2016. related services offered by Goibibo is high. MakeMyTrip also [2] ―Internet users in India to cross 500 mn in 2016: Prasad,‖ May 2016. follows the line of Goibibo when it comes to hotel-related [3] D. Mane and D. Bhatnagar, ―Integrating Netnographic Analysis and services. It has become the second preference for Hotel related Text Mining for Assessing Satisfaction of Travellers Visiting to India - services after Goibibo. It is also clearly indicated that Yatra is A Review of Literature,‖ 2020, pp. 564–572. rarely preferred by travellers when it comes to online Hotel [4] ―Top Online Travel Agencies (OTAs) in India - THE ‗PERFECT‘ related services. HOTEL.‖ [Online]. Available: http://whiteskyhospitality.co.uk/top- online-travel-agencies-otas-in-india/. [Accessed: 28-Jul-2020]. [5] J. Chakravarthi and V. Gopal, ―Comparison of traditional and online travel services: A concept note,‖ IUP J. Bus. Strateg., vol. 9, no. 1, pp. 45–59, 2012. [6] D. Mane., ―Text Mining to Understand Major Keywords Explaining Sentiments of Travelers Using Travel Related Online,‖ books.google.com. [7] ―CLEARTRIP.COM Reviews, Feedback, Complaint, Experience, Customer Care Number - MouthShut.com.‖ [Online]. Available: https://www.mouthshut.com/websites/Cleartrip-com-reviews- 925062909-srch. [Accessed: 28-Aug-2020]. [8] ―MAKEMYTRIP.COM Reviews, Feedback, Complaint, Experience, Customer Care Number - MouthShut.com.‖ [Online]. Available: https://www.mouthshut.com/websites/MakeMyTrip-com-reviews- 925031929-srch. [Accessed: 28-Aug-2020]. Fig. 31. The Sentiment of Hotel Reviews for OTA's. [9] P. Gupta, R. Tiwari, and N. Robert, ―Sentiment Analysis and Text Summarization of Online Reviews : A Survey,‖ 2016 Int. Conf. VII. CONCLUSION Commun. Signal Process., pp. 241–245, 2016. Customer is King for a successful business, customer [10] Prameswari P., Surjandari I., and Laoh E.. Opinion Mining from Online Reviews in Bali Tourist Area. 226–230. (2017). satisfaction is considered to be a prime attribute. This research [11] Kozinets R., Netnography: doing ethnographic research online. Sage helps in consumer decision making while choosing various Publication, London (2010b). travel-related services and also businesses to choose their [12] Top 10 travel Companies in India|TopEcommerceStartups., Retrieved strategies to improve on services offered based on listening to from http://topecommercestartups.com/top-10-travel-companies-in- the consumers. india/Travel& (2014). Netnography and Text mining techniques used to perform analysis and processing of reviews and comments collected from various online platforms. The review collection has followed all the guidelines suggested by netnographic studies for five online travel agents i.e. MakeMyTrip (MMT), Yatra,

569 | P a g e www.ijacsa.thesai.org