SOCIAL MEDIA SENTIMENT ANALYSIS for FIRM's REVENUE PREDICTION Ioanna Dimadi
Subject: Information Systems
Corresponds to: 30 hp
Presented: VT 2018
Supervisor: Simone Callegari
Examiner: Görkem Paçaci
Department of Informatics and Media

Abstract

The advent of the Internet and its social media platforms has affected people's daily lives. More and more people use these platforms to communicate, exchange opinions and share information with others. However, the platforms are used not only for socializing but also for expressing product preferences. The wide spread of social networking sites has enabled companies to use them as an important channel for approaching their target audience. This thesis studies the influence of social media platforms on the revenue of a single organization, Nike, that uses them actively. Facebook and Twitter, two widely used social media platforms, were investigated by gathering the tweets and comments produced in consumers' online discussions on brand-hosted pages. These unstructured social media data were collected from 26 official Nike pages, 13 fan pages from each platform, and their sentiment was analyzed. The comments were classified using the Valence Aware Dictionary and Sentiment Reasoner (VADER), a lexicon-based approach designed for social media analysis. After Nike's revenue for a five-year period had been gathered, the degree to which it could be affected by the classified data was examined using stepwise multiple linear regression analysis. The findings showed that the positive/total fraction for both Facebook and Twitter explained 84.6% of the variance in revenue. Fitting these data to the multiple regression model, Nike's revenue could be forecast with a root mean square error of around 287 billion.
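The pipeline summarized in the abstract (classify each text, compute the positive/total fraction per platform, regress revenue on those fractions, report the root mean square error) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the thesis implementation: the compound scores are assumed to have been produced by VADER already, the ±0.05 cut-offs are VADER's commonly cited convention and may differ from the thresholds used in the thesis, the stepwise variable selection is omitted in favor of a plain least-squares fit on two predictors, and every number and function name below is hypothetical.

```python
from math import sqrt


def label(compound, threshold=0.05):
    """Map a compound score to a class (assumed VADER-style cut-offs)."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def positive_fraction(compounds):
    """Share of texts labelled positive among all texts of one quarter."""
    labels = [label(c) for c in compounds]
    return labels.count("positive") / len(labels)


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(xs, ys):
    """Least-squares coefficients [b0, b1, ...] via the normal equations."""
    rows = [[1.0] + list(x) for x in xs]  # prepend the intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)


def predict(coef, x):
    """Evaluate b0 + b1*x1 + b2*x2 + ... for one observation."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))


def rmse(actual, predicted):
    """Root mean square error between actual and predicted values."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical compound scores for one quarter: two of the four are positive.
quarter_scores = [0.62, 0.10, -0.40, 0.00]
share = positive_fraction(quarter_scores)

# Hypothetical (Facebook, Twitter) positive/total fractions per quarter and
# hypothetical revenue in arbitrary units; these are NOT the thesis data.
fractions = [(0.50, 0.40), (0.55, 0.42), (0.60, 0.50), (0.58, 0.47), (0.65, 0.55)]
revenue = [7.8, 8.1, 8.6, 8.4, 9.0]

coef = fit_ols(fractions, revenue)
fitted = [predict(coef, x) for x in fractions]
error = rmse(revenue, fitted)
print("positive/total share:", share)
print("coefficients:", coef)
print("in-sample RMSE:", error)
```

The regression is solved in pure Python so that the sketch has no dependencies. In practice a statistics package would also provide the significance tests, confidence intervals and autocorrelation checks that the thesis reports.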
Keywords: Social Media, Facebook, Twitter, Sentiment Analysis, VADER, Linear Regression

Table of Contents

Abstract
1. Introduction
1.1. Background
1.2. Research Problem
1.3. Research question and purpose
1.4. Delimitations
1.5. Outline
2. Related work
3. Literature Review
3.1. Fundamental Concepts
3.1.1. Social Media and Brand Pages
3.1.2. Financial Performance
3.1.3. Sentiment Analysis
3.2. Development tools
3.2.1. Python
3.2.2. VADER
4. Methodology
4.1. Research Approach
4.2. Data Collection
4.2.1. Nike Inc. Revenue
4.2.2. Social media data
4.3. Sentiment Analysis process
4.3.1. Data Preprocessing
4.3.2. Sentiment Classification of the Data
4.4. Prediction Model
4.4.1. Finding the Independent variables
4.4.2. Model Creation
4.4.3. Evaluating the model
4.5. Critical Reflection
5. Results and Discussion
5.1. Collected Data
5.2. Analyzed Data
5.3. Data Classification
5.4. Analyzing the multiple linear regression model
5.4.1. Identification of the independent variables
5.4.2. Multiple linear regression deduction
5.4.2.1. Coefficient and Multicollinearity
5.4.2.2. Examining the statistical significance
5.4.2.3. Examining Confidence Interval for B and Coefficient
5.4.2.4. Testing Autocorrelation
5.4.3. Prediction Process
5.4.3.1. Predicted Model
5.4.3.2. Model Reliability
5.4.3.3. Evaluation
6. Concluding Remarks and Future Perspective
6.1. Summarizing the findings
6.2. Limitations
6.3. Future Research
6.4. Conclusions
7. References

List of Figures

Figure 1. Converting resources into capabilities
Figure 2. Sentiment Analysis methods
Figure 3. Steps of the thesis
Figure 4. VADER Steps
Figure 5. Total comments and tweets
Figure 6. Analyzed comments and tweets
Figure 7. Classified positive comments and tweets
Figure 8. Classified negative comments and tweets
Figure 9. Classified total comments and tweets
Figure 10. Classified positive/negative comments and tweets
Figure 11. Classified positive/total comments and tweets
Figure 12. Normal P-P Plot of Regression Standardized Residual
Figure 13. Standardized Residuals
Figure 14. Actual and Predicted quarterly revenue

List of Tables

Table 1. Fan pages from Facebook and Twitter
Table 2. Excluded variables
Table 3. Coefficients
Table 4. A. Coefficients, B. ANOVA
Table 5. A. Confidence Interval, B. Model Summary
Table 6. Autocorrelation

List of Equations

1. Annual Revenue
2. Normalization Model
3. Multiple Regression