Multi-factor Sentiment Analysis for Gauging Investors Fear

by Ichihan Tai

B.S. in Engineering in Computer Science, May 2012, University of Michigan
M.S. in Engineering in Industrial and Operations Engineering, December 2012, University of Michigan

A Praxis submitted to

The Faculty of The School of Engineering and Applied Science of The George Washington University in partial fulfillment of the requirements of the degree of Doctor of Engineering

August 31, 2018

Praxis directed by

Amir H. Etemadi, Assistant Professor of Engineering and Applied Science

Ebrahim Malalla, Visiting Associate Professor of Engineering and Applied Science

The School of Engineering and Applied Science of The George Washington University certifies that Ichihan Tai has passed the Final Examination for the degree of Doctor of Engineering as of the date of praxis defense, July 19, 2018. This is the final and approved form of the praxis.

Multi-factor Sentiment Analysis for Gauging Investors Fear

Ichihan Tai

Praxis Research Committee:

Amir H. Etemadi, Assistant Professor of Engineering and Applied Science, Praxis Co-Director

Ebrahim Malalla, Visiting Associate Professor of Engineering and Applied Science, Praxis Co-Director

Takeo Mitsuhashi, Chief Executive Officer & Chief Investment Officer, Committee Member


© Copyright 2018 by Ichihan Tai
All rights reserved


Acknowledgements

I am grateful to my academic advisors Dr. Amir H. Etemadi and Dr. Ebrahim Malalla, and to my previous academic advisors Dr. Bill Olson and Dr. Paul Blessner, for all the support and guidance throughout my doctoral study. I appreciate my employer Takeo Mitsuhashi and Satoshi Tazaki for their financial support of and accommodations to my doctoral study. I am thankful to my previous superiors Sunny Park, Andrew Ray, and David Carlebach for all the encouragement and support that got me started on this doctoral journey. I am grateful to my mentor Dr. Peng Zang for his guidance and inspiration.

Finally, I thank my family members for continuous support throughout the program.


Abstract

Multi-factor Sentiment Analysis for Gauging Investors Fear

The Chicago Board Options Exchange Volatility Index (VIX) is widely used to gauge investor fear and measures the “insurance premiums” of the stock market. Traditionally, VIX prediction has been done using time-series analysis models (e.g., GARCH), and some attempts have been made to predict it using sentiment analysis approaches. However, traditional sentiment analysis focuses on authors’ sentiment (sentiment expressed in news authors’ word choices), and this Praxis will demonstrate that adding other factors (e.g., similarity, readability) to the traditional authors’ sentiment model improves VIX prediction results. This Praxis research leverages natural language processing (NLP) and machine learning (ML) techniques to build a VIX prediction model.


Table of Contents

Acknowledgements
Abstract
List of Figures
List of Tables
List of Acronyms
Chapter 1-Introduction
1.1 Background
1.2 Problem statement
1.3 Thesis statement
1.4 Research objectives
1.5 Research questions
1.6 Research hypotheses
1.7 Research limitations
1.8 Organization of praxis
Chapter 2-Literature Review
2.1 Problems
2.1.1 VIX Index Prediction
2.1.2 Modeling Financial Markets
2.1.3 Managing crises
2.1.4 Engineering Management
2.2 Data
2.2.1 Data Hierarchy
2.2.2 Textual Data: Metadata
2.2.3 Textual Data: Contents
2.3 Methodologies
2.3.1 Textual Analysis
2.3.2 Textual Data Preprocessing
2.3.3 Feature Engineering
2.3.4 Machine Learning
2.3.5 Evaluation
2.4 Gaps in the existing literature
Chapter 3-Methods
3.1 Data
3.2 Pre-processing
3.3 Feature Engineering
3.3.1 Sentiment
3.3.2 Similarity
3.3.3 Readability
3.3.4 Topic Frequency
3.4 Machine Learning
3.5 Evaluation
3.6 Benchmark
Chapter 4-Results
4.1 Pre-processing
4.2 Machine learning
Chapter 5-Discussion and Conclusions
5.1 Discussion (interpretation of results)
5.2 Conclusions
5.3 Contributions
5.4 Future research directions
Appendices
Appendix A: Factor Library
Appendix B: Optimum number of topics for Latent Dirichlet Allocation
Appendix C: Main Multi-Factor Sentiment Analysis Code
Appendix D: News Data (Sample)
Appendix E: VIX Index Data (Sample)


List of Figures

Figure 1: Illustration of human logic
Figure 2: Illustration of MSA logic
Figure 3: High-level construction of MSA
Figure 4: Determination of economic uncertainty
Figure 5: Cumulative distribution of VIX change
Figure 6: Sentiment example
Figure 7: High similarity example
Figure 8: Low similarity example
Figure 9: Readability example
Figure 10: Results from maximization algorithms for finding natural number of topics
Figure 11: Results from minimization algorithms for finding natural number of topics


List of Tables

Table 1: Factors generated in feature engineering
Table 2: Topics and associated keywords
Table 3: Performance of MSA and benchmarks
Table 4: Performance of MSA and benchmarks
Table 5: Performance of MSA framework using other machine learning algorithms
Table 6: Best performing standalone factor


List of Acronyms

MSA Multi-factor Sentiment Analysis

VIX Chicago Board Options Exchange Volatility Index

CAPM Capital Asset Pricing Model

GARCH Generalized Autoregressive Conditional Heteroskedasticity

LM Loughran and McDonald Sentiment Dictionary

GI Harvard General Inquirer Sentiment Dictionary

LDA Latent Dirichlet Allocation

NLP Natural Language Processing


Chapter 1-Introduction

1.1 Background

Volatility, as a key factor in equity and option markets, has always been a central focus of academic research and investment practice. Since its launch, the Chicago Board Options Exchange Volatility Index (VIX) has been widely used to gauge investor fear and to measure the expected volatility of the stock market (Brenner, 1989).

“Things happen” is a centuries-old problem that perhaps all investment management teams face at some point. The volatility of financial markets has made it inevitable that a mature investment management team must constantly model and predict detrimental systematic market events and develop timely hedging plans. As a result, the ability to predict market volatility, i.e., the VIX, has profound impacts on the long-term success of the investment management process.

The traditional INCOSE risk management systems have formally defined risk management, and more specifically crisis detection, as part of the risk and opportunity management framework (International Council on Systems Engineering, 2011). Such risk management systems specifically call for attention to “ambient risk,” which is risk caused and created by the surrounding environment (ambience) of the portfolio. This includes risks created by external environments that are often beyond the control of the portfolio management team. The INCOSE framework seeks to convert risks into desirable opportunities or to eliminate undesirable problems. From an empirical point of view, portfolio managers, who have access to machine learning tools from the knowledge discovery and decision support literatures (Coussement & Van den Poel, 2008; Lavrenko et al., 2000; Schumaker & Chen, 2009), are well equipped to capture and predict risk events and take proactive action to minimize their impact on the investment portfolio.

There are two types of problems in engineering management: anticipated problems and unplanned events. Of these two categories, continuously monitoring and catching unplanned events in a timely manner is the main challenge and purpose of any problem management system. From an investment management perspective, unplanned events in the traditional systems engineering sense are both dangerous and extremely valuable. Such unplanned events, which can generally be labelled as market surprises, are important sources of extraordinary returns for an investment portfolio. Market surprises, by definition, are occasions where the majority of market participants fail to position their portfolios in the right direction, which creates exceptional return opportunities for an investment manager who takes the opposite position against the majority of the market.

However, as one can imagine, this also creates risks of tremendous losses. As a result, the ability to forecast unplanned events is critical to the success of portfolio management, even more so than in the traditional systems engineering framework. On the other hand, the difficulty of this exception-catching process means that the process is resource-intensive by nature (B. A. Olson, Mazzuchi, Sarkani, & Forsberg, 2012).

How should a problem management system be structured so that it can promptly identify unplanned events and notify the portfolio management team, and at the same time achieve these goals in a cost-efficient manner? What data should be fed into a problem management system, and how should they be processed? What types of signals should the problem management system be designed to detect? These are all critical questions that need to be answered when developing an effective problem management system.

A large portion of exploitable information is now embedded within texts rather than quantitative data. With the development of the internet, the amount of textual data that is readily available to the public has been increasing over the past decade (Chan & Franklin, 2011). To identify or even predict unplanned events, textual data sources, such as internal emails, external news, and regulatory filings, can provide an informational edge over quantitative data, as textual data may contain forward-looking information that resists quantification.

At a microeconomic level, a portfolio management team may need to monitor any unplanned events related to credit risks. If an investment portfolio has large holdings of corporate bonds, then assessing the credit rating of the counterparty is critical to ensuring that any unplanned events are captured instantly. For example, if the company that issued the corporate bonds has ongoing lawsuits or regulatory investigations that may require cash settlements, its ability to make future payments or deliverables may be significantly affected. To capture such information, news data has advantages over other data sources. Although detailed information about a company’s operational and financial results and associated risks can be found in regulatory filings for publicly traded companies, such data is generally published infrequently (e.g., quarterly or annually), and therefore cannot satisfy the requirements of a problem management system.

At a macroeconomic level, information on geo-political risks and general economic risks should also be captured by a problem management system in a timely fashion. Such risks, also referred to as systematic risks, despite their low probability of occurrence, can have a destructive impact on any investment portfolio. For example, during the 2008 financial crisis, although the crisis started within the US financial sector, it soon spread across various industries and evolved into a worldwide economic crisis. As most companies rely on the banking industry to finance their day-to-day operations, when consecutive market downturns lead to a liquidity drain on the major banks, companies in other industries will likely run into the same issue (Calomiris & Nissim, 2014). As a result, a problem management system that captures overall market performance and risks can serve as an early warning system for investment management teams.

From an event management perspective, quantitative data lacks the timeliness required by the continuous risk management goals of portfolio management teams. It also lacks the information richness found in textual data, especially when analyzing macroeconomic trends. While quantitative market data reflect the public’s realized sentiment toward overall market performance, they cannot be relied upon as the only source for predicting drastic market movements in the near future, such as a financial crisis. Researchers have shown that textual data contain more relevant information in this domain (Chan & Franklin, 2011).

This praxis applies an engineering technique, namely text mining, to monitor and manage events that are relevant to portfolio management teams. The formal adoption of the text mining framework as part of the portfolio risk and opportunity management process can help investment management teams detect problems and uncertainties quickly and take proactive actions.


1.2 Problem statement

Traditional sentiment analysis for VIX prediction leverages only the sentiment expressed in news authors’ word choices, resulting in a lower F1 score (less accurate prediction).

1.3 Thesis statement

By combining similarity, readability, topic, and sentiment methods, we can increase the F1 score of VIX prediction over the traditional sentiment-only approach.

1.4 Research objectives

In seeking information specifically about the state of the economy and society, people traditionally rely on printed news, but this requires significant time and effort (Mullainathan & Shleifer, 2005). In the finance industry, numerous equity research analysts are hired by large financial institutions to manually read and analyze corporate filings and news related to a particular industry or company. Researchers have found that white-collar workers spend 30 to 40 percent of their worktime navigating through documents, and that the amount of data stored in an unstructured format accounts for more than 85 percent of all available data (Blumberg & Atre, 2003). Managers often say that they do not have the right toolset to analyze large volumes of text data (Blumberg & Atre, 2003).

In addition to the fact that having human readers filter through textual data may not be cost efficient, it is also difficult to standardize how human readers filter textual data and what information they derive from them. Therefore, an engineering solution is required so that portfolio management teams can monitor economic uncertainties in a uniform and cost-efficient manner.

In the academic literature, there are many studies that portfolio management teams can use to help them manage internal uncertainties, such as an automated system developed by Cheung and Wang for reading emails and extracting knowledge from them (Cheung, Lee, Wang, Wang, & Yeung, 2011; Wang, Cheung, Lee, & Kwok, 2008). However, there is no engineering management solution available to monitor external uncertainties, especially those related to economic uncertainty events.

This praxis uses market volatility to define economic crises, an approach also widely used by previous researchers (Ahn, Oh, Kim, & Kim, 2011; Kim, Oh, Sohn, & Hwang, 2004). Market volatility can be observed in the financial market in various forms, and perhaps the most popular measure used by practitioners is the Chicago Board Options Exchange Volatility Index (VIX Index). When market participants’ level of fear increases, they purchase more options, which act as insurance that protects the buyer from market crashes in exchange for a premium. The VIX index is derived from the option prices of the S&P 500 Index, which is a popular index for the broader US stock market. Therefore, the VIX index can capture the general market-wide sentiment, measured subjectively, in a timely manner.

Successful management of economic uncertainty events has been carried out using numerical data, and researchers have applied machine learning techniques to economic and pricing data to develop early warning systems (Ahn et al., 2011; Nassirtoussi, Aghabozorgi, Wah, & Ngo, 2014; Oh, Kim, & Kim, 2006). On the other hand, financial market predictions have also been successfully conducted with unstructured textual data using text mining techniques (Nassirtoussi, Aghabozorgi, Wah, & Ngo, 2015).

Since far more data is stored in unstructured formats such as text than is stored structurally in a numerical format (Exchange, 2009), the ability to leverage unstructured data is highly desirable. In the recent history of quantitative research on modeling market volatility, there has been an increased focus on leveraging textual data.

To understand the risk and return of the stock market, researchers first applied statistical techniques to the most accessible and typical data about stocks, the pricing data, leading to the famous Capital Asset Pricing Model (Sharpe, 1964). A few decades later, researchers started to use reference data, namely fundamental accounting data, to model the stock risk and return that could not be modelled by pricing data alone. This resulted in the famous Fama-French three-factor model (Fama & French, 1993), which has been widely used in the finance industry. Additionally, researchers have employed more reference data to develop more sophisticated models that include additional variables, such as the Fama-French five-factor model (Fama & French, 2015).

Over the past decade, researchers have started to pay greater attention to unstructured textual data and have successfully shown that news data can model additional stock market risk and return that could not be modelled using traditional data (i.e., structured pricing data and even accounting data) (Calomiris & Mamaysky, 2017).

This trend of incorporating textual data into quantitative analysis exists not only in the quantitative finance literature but also in the economics literature. Techniques used to measure economic conditions have shifted from applying traditional statistical approaches to structured numerical data (i.e., interest rates and asset prices) (Hatzius, Hooper, Mishkin, Schoenholtz, & Watson, 2010) to applying text mining techniques to monitor what is in unstructured textual data (Baker, Bloom, & Davis, 2016; Bholat, Hansen, Santos, & Schonhardt-Bailey, 2015; Calomiris & Mamaysky, 2017). Therefore, it seems a natural progression for the traditional problem management framework in engineering management to evolve into a form that embraces modern text mining techniques and leverages unstructured textual data.

This praxis will demonstrate that engineering management as a field is ready for text mining techniques. A problem management system focused on managing an economic uncertainty event is presented with text mining as a core capability.

The contribution of this praxis is a robust text mining system that combines knowledge from both the finance and AI literatures to automatically read news and predict spikes in the VIX Index. The system is tailored to the needs of portfolio management teams who need to detect the market-wide macroeconomic uncertainties that impact all companies. The system developed in this praxis allows users to understand the dynamic drivers behind recent spikes in the VIX Index as it makes predictions.

1.5 Research questions

• Does MSA provide higher VIX prediction power than traditional sentiment analysis?

• Is random forest the best performing machine learning algorithm under the MSA framework?

• Do change-in-sentiment, similarity, readability, change-in-readability, topic-frequency, and change-in-topic-frequency based predictors provide results more accurate than sentiment-based predictors?

1.6 Research hypotheses

H1: In predicting VIX, MSA provides a higher F1 score than traditional sentiment analysis.

H2: In predicting VIX, random forest provides a higher F1 score than Decision Tree, Naïve Bayes, SVM, Neural Network, Logistic Regression, Nearest Neighbors, and Voting in the MSA framework.

H3: In predicting VIX, change-in-sentiment, similarity, readability, change-in-readability, topic-frequency, and change-in-topic-frequency based predictors provide a higher F1 score than sentiment-based predictors.

1.7 Research limitations

This praxis does not aim to find factors that always work over time; that is usually the aim of the finance literature, which uses rigorous linear regression methods. Instead, this praxis aims to gain domain insights from the finance literature about which factors can potentially work, and to feed them as factors into prediction methods from the AI literature (i.e., machine learning). This praxis holds the fundamental belief that some factors will work at some times but not at others; therefore, machine learning should be utilized to examine the data and determine the factors that work at a given point in time. It is meant to be a dynamic and ever-evolving process.


1.8 Organization of praxis

The rest of this praxis is organized as follows.

Chapter 2 reviews the relevant previous studies. The chapter can be further divided into three sections that discuss previous literature on similar research problems, news data analysis and the related text mining methodologies.

The first section is focused on problems solved by previous researchers. This section includes related literature from the engineering management fields, as well as finance and economics. The second section is focused on textual data. This section focuses on what types of data are commonly researched and the important properties inherent to the data.

The third section is focused on methodologies. This praxis surveys related literature and studies what methodologies and models other researchers have used to solve similar problems.

Chapter 3 focuses on the actual construction of the system. This chapter is focused on the methodologies used. The methodologies are divided into data, pre-processing, machine learning, and evaluation methodologies.

Chapter 4 discusses the output from MSA in detail. First, the output from the topic model is interpreted qualitatively using text mining techniques, along with the effectiveness of filtering through topics. Second, the effectiveness of sentiment analysis is discussed. Finally, the result from the machine learning step is discussed together with the overall effectiveness of MSA.

Chapter 5 concludes the study and discusses potential future research directions.


Chapter 2-Literature Review

The literature review section is divided into three sections that discuss previous literature on similar research problems, news data analysis and the related text mining methodologies.

2.1 Problems

This section surveys existing research related to VIX prediction and text mining systems, including systems that model the financial market, systems that warn against crises, and systems that are applied to broader fields of engineering management.

2.1.1 VIX Index Prediction

In predicting the VIX Index, researchers have traditionally used time series models in the ARCH (AutoRegressive Conditional Heteroskedasticity) family, such as Generalized ARCH (GARCH), on historical numerical data (Ahoniemi, 2008; Liu, Guo, & Qiao, 2015; Majmudar & Banerjee, 2004). This family of econometric models attempts to capture the irregularity of error variation (i.e., heteroskedasticity), which is characteristic of stock market volatility. However, by definition, ARCH-family models build predictions from past data. The use of historical data alone for future prediction makes strong assumptions about autocorrelation and has been criticized as “driving with the rearview mirror.”

In recent years, a new concept called “quantamental” has emerged (Gray). Traditionally, financial predictions were divided into two styles: quantitative and fundamental. The quantamental approach aims to combine the quantitative approach with the fundamental approach. GARCH and related approaches are classic examples of the quantitative approach. In recent years, however, more quantamental approaches have been taken in the VIX prediction literature as well.

Researchers have used sentiment analysis on news to predict spikes in the VIX Index and found that negative sentiment is a good predictor of such spikes (Smales, 2014). However, that research focused on traditional sentiment analysis only and used linear regression as the prediction model. More sophisticated sentiment analysis and prediction models can be used to improve prediction results.

The “original” VIX Index is derived from the implied volatility in exchange-traded option prices. Researchers have developed an alternative index, called the NVIX Index, that derives investor fear from news data instead of from option prices (Manela & Moreira, 2017). Although this research is not about predicting the VIX index itself, it shows the growing interest in connecting unstructured data with the VIX index in the recent academic literature.

2.1.2 Modeling Financial Markets

Monitoring and capturing economic uncertainties is critical for finance and investment professionals. In the finance and economics literature, researchers have tried many approaches to model the risk and return of stock markets. Sharpe’s Capital Asset Pricing Model (CAPM) modeled the stock market as a whole and explained individual stock returns as having a certain degree of exposure to the return of the entire stock market plus residuals (Sharpe, 1964). Sharpe introduced the parameter β to represent the sensitivity of a stock to the market return and ε to represent the idiosyncratic movement of its price.
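
In its standard single-index regression form, the relationship described above can be written as

    R_i = α_i + β_i · R_m + ε_i,    where β_i = Cov(R_i, R_m) / Var(R_m),

with R_i the return of stock i, R_m the return of the market, and ε_i the idiosyncratic residual. This is the standard textbook statement of the model, not a formula specific to this praxis.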


In 1993, Fama and French went further and included in the market return model the book value relative to price and the market capitalization of a company. In 2015 they added investment aggressiveness and operating profitability to explain the risk and return left unexplained by the CAPM (Fama & French, 1993; Fama & French, 2015).

Fama and French derived these features from accounting data and incorporated them into pricing data, whereas the CAPM involves pricing data only. Their findings succeeded in explaining more of the risk and return of stocks than the single-factor CAPM, first with the introduction of the three-factor model in 1993, and still more with the introduction of the five-factor model in 2015. In other words, the original ε shrank further and further as different types of datasets were introduced into the process. This is a classic example of using structured reference data to enhance original structured data.

However, a drawback of this approach is that the meaning of the data is difficult for non-investment professionals (e.g., portfolio managers outside of the finance industry) to interpret. Without detailed knowledge of finance and accounting, it is generally difficult to navigate the massive amount of accounting data available and to use the specific variables that Fama and French used.

As a recent development of the Fama-French model, Calomiris and Mamaysky have used news data and its content to explain the risk and returns of the financial markets that cannot be explained by traditional variables (Calomiris & Mamaysky, 2017). They emphasized their use of a theory-neutral approach, whereas the target-phrase approach used by prior researchers such as Baker relied on prior domain knowledge (Baker et al., 2016; Calomiris & Mamaysky, 2017). They constructed this theory-neutral approach by applying topic modelling to news data. In this approach, news articles are broken down into different topic categories, and each topic is considered to have a certain level of impact on the ultimate market reaction to the news information. The model and results from such an approach, unlike the traditional Fama-French model, do not require any domain-specific knowledge to interpret.

Throughout this finance literature, modeling approaches, regardless of data type, can be separated into two categories: statistical and rule-based. Statistical models include the CAPM and the topic model used by Calomiris, which utilized unstructured news data. Rule-based models include Fama-French’s extended model using structured accounting data and the target-phrase model developed by Baker using unstructured news data.

As rule-based models require significantly more specialized knowledge for defining features and rules than statistical approaches do, it makes more sense to implement a statistical approach for MSA. Because the financial market is highly dynamic and evolves over time, it is virtually impossible for the intended users of MSA, portfolio managers, to have enough domain knowledge to know when an underlying change to the financial market or accounting rules requires an upgrade to the information filtering mechanism. Therefore, the approach of MSA needs to be not only statistical but also highly adaptive. This requirement naturally results in the architectural choice of incorporating machine learning to construct MSA.

Text mining can be considered a type of machine learning applied to a specific type of data, i.e., text. Text mining techniques have been used by various researchers to analyze news data and develop prediction systems for many different types of financial instruments, including equity stock prices (Hollum, Mosch, & Szlavik, 2013), exchange rates (Jin et al., 2013), and commodity (e.g., gold) price volatilities (Onsumran, Thammaboosadee, & Kiattisin, 2015). As with all other textual data, researchers using news data face the major challenge of resolving the high dimensionality problem. The approaches used by previous researchers generally fall into two categories: semantic analysis and sentiment analysis.

Semantic analysis focuses on the meanings of textual data. To model and extract semantics from a news article, a common technique is to use topic models. A typical topic model classifies a group of text documents (also known as a “corpus”) into a given number of topics by assuming a certain statistical distribution of the corpus over these topics. To model and predict financial markets, researchers have leveraged topic models to break down or classify a given news article into various topics and assess how individual topics relate to the subsequent market reaction after the news release (Hollum et al., 2013; Izumi, Goto, & Matsui, 2010; Izumi, Goto, & Matsui, 2011; Mahajan, Dey, & Haque, 2008; Nguyen & Shirai, 2015).

Sentiment analysis is similar to semantic analysis in the sense that the techniques are used to represent a textual document with only a small number of variables. However, instead of capturing the meanings of a textual document, sentiment analysis methods attempt to capture its tone. The underlying idea is simple. Given a piece of news that is expected to impact the financial markets, if it conveys a positive sentiment the market prices tend to rise, and vice versa.

Unlike the topic models used for semantic analysis, which assume a statistical distribution of words in a given textual dataset, common methods used for sentiment analysis generally focus on the tone indication of specific words or word lists (also commonly referred to as dictionaries). A sentiment word list or dictionary categorizes words into different sub-lists based on their sentiment indication (positive, negative, uncertain, etc.). The simplest version of sentiment analysis based on a word-list approach is to count the appearances of positive and negative words (Loughran & McDonald, 2011).

The more advanced models under both sentiment analysis and semantic analysis are reviewed and compared in more detail in the second half of the literature review section.

2.1.3 Managing crises

MSA is intended to monitor and capture spikes in the VIX index, which represent economic uncertainty events, and thereby to serve as a risk management tool. As a result, the focus of MSA is not to predict stock prices but rather to monitor stock market volatility.

Previous studies have commonly used market volatility as a proxy for market panic or highly pessimistic investor sentiment. This is what the Chicago Board Options Exchange (CBOE) attempted to measure when it developed the Volatility Index (VIX) (Exchange, 2009). Throughout the literature review process of this study, no existing text mining systems were found that make binary predictions related to the VIX.

In addition, prediction methods for economic uncertainties (market volatilities) are significantly different from those used for market prices. First, there are major differences between the probability distribution of market prices and that of market volatilities. When trying to predict market prices, systems are generally built to predict the up-or-down direction of the market, and each direction has almost equal probability. However, in an uncertainty (market volatility) prediction system, users expect the system to predict only rare market events, which by definition have low probability.

Second, the severity of the consequences of wrong recommendations differs between market price prediction systems and market volatility prediction systems. In the former scenario, a prediction system indicating the market price will increase when it actually decreases is just as harmful as the system indicating the market price will decrease when it actually increases. However, in the latter scenario, an uncertainty event monitoring system can do more damage by failing to capture a rare but highly impactful event than by generating false alarms. This is similar to the functionality of a fire alarm: it is more harmful if a fire alarm is not triggered when there is a fire than if it gives a false alarm.

Many previous studies have attempted to develop market volatility prediction systems (also referred to as early warning systems) by applying machine learning techniques to economic and price data (Ahn et al., 2011; Kim et al., 2004; Oh et al., 2006), and text mining techniques have been used to predict the financial markets with unstructured text data (Nassirtoussi et al., 2014). However, no prediction system for the broader equity market leveraging unstructured data has yet been developed. As a result, MSA will be an important complement to the numerical data-based systems.

The difference between a warning system and a market prediction system is how often the system is expected to provide actionable insights for users. A warning system is focused on predicting events with high impact but low occurrence (e.g. the bankruptcy of a company), while a prediction system deals with more frequent events with less impact during each occurrence, such as the price of a single stock going up or down.


In this praxis, the definition of the success of a warning system must be extended beyond prediction accuracy. To illustrate this point, an earthquake warning system will have extremely high accuracy if it keeps predicting “no earthquake,” but such a metric does not serve the purpose of an earthquake warning system. Instead, an earthquake detection system should be evaluated based on how well it predicted the rare earthquakes that did occur.

To evaluate a prediction system, the most important question to ask is:

1. How many correct predictions did the system make?

In comparison, to evaluate a warning system, all the following questions must be considered.

1. How many times did the system give warnings?

2. How many of the actual occurrences of the rare events did the system capture?

3. How far in advance did the system give warnings for the rare events?

This praxis is focused on building a warning system based on unstructured textual data.

In the financial and economics literature, the volatility index published by the CBOE has been the standard for measuring market fear and market stress. Following an initiative by the Bank of England to encourage the use of text mining in 2015 (Bholat et al., 2015), researchers have attempted to develop an alternative measure of market stress using textual data (Baker et al., 2016). Although many of the underlying techniques are shared between previous economics studies and the research conducted in this praxis, the existing research focuses on measuring economic uncertainties, whereas the research conducted in this praxis focuses on constructing a system to warn against economic uncertainties.

2.1.4 Engineering Management

MSA is built to help portfolio management teams measure and manage uncertainties in the overall economy using a text mining approach. In previous studies, text mining techniques have been applied in the domain of crisis management.

Notable research focuses in this field include the detection of natural disasters by analyzing unusual topics in real-time Twitter postings (Chae et al., 2014; Paul & Dredze, 2012; Sakaki, Okazaki, & Matsuo, 2010) and the detection of malware using API call data (Sundarkumar, Ravi, Nwogu, & Govindaraju, 2015). Researchers have also attempted to predict student grades or class fail rates by mining online class discussion forums (Ming & Ming, 2012).

In addition to applications in crisis management, textual analysis has been widely applied in the field of engineering management. Examples include automated knowledge acquisition for lessons-learned systems (Strait, Haynes, & Foltz, 2000), product design aid using social media textual data (Fan & Gordon, 2014), software lifecycle management (Hindle, Ernst, Godfrey, & Mylopoulos, 2011), helping requirements engineers read policy documents (Massey, Eisenstein, Anton, & Swire, 2013), and security risk assessment for software product evaluation (R. Das, Sarkani, & Mazzuchi, 2012).


2.2 Data

2.2.1 Data Hierarchy

The hierarchy chart below summarizes the features of textual data.

- Data
  - Structured data
  - Unstructured data
    - Picture
    - Text
      - Metadata
        - Source
          - Who
          - Credibility
        - Time
          - When
          - Novelty
        - Location
        - Accessibility
        - Attention
      - Content
        - Sentiment
        - Semantic
          - Entity
            - Relevance
            - Attribute
            - Relationship
          - Event
            - Relevance
            - Impact

At the highest level, data can be divided into structured and unstructured data (Katal, Wazid, & Goudar, 2013). One major form of structured data is quantitative data. Most existing scientific studies and widely used scientific methodologies are built around the analysis of structured data, and the field of engineering management is no exception.

The key objectives of this praxis are to expand the use of unstructured data in the field of engineering management and to build an engineering system to advance the risk management capability available to portfolio managers.


Among unstructured data, there are two main categories: pictures and text (Govers & Go, 2004). Picture data include static images and videos, which are pictures shown in rapid succession. Textual data include news data, regulatory filing data, internet postings (e.g., Twitter, blogs, etc.), and emails.

2.2.2 Textual Data: Metadata

All textual data have two main properties: metadata and content (Mihaila, Raschid, & Vidal, 2000). Metadata, also known as “the data of data,” constitute a group of attributes that describes the features of the data, while “content” is the textual data itself.

Metadata have five important constituents: source, time, location, accessibility, and attention.

• Source concerns where the textual data come from and how credible the origin of the information is.

• Time relates to when the textual data are published and how new the information is, as some news items are original articles published for the first time, while others simply report the same content at a later time.

• Location is about where the source of the textual data is located, which is an important part of Twitter data. As previously discussed, many crisis detection systems are based on Twitter’s reliable location data in addition to the contents of the Tweets. Studies have even found Twitter’s location data to be more informative than Twitter’s content data, because Twitter content is noisy and difficult to clean up.

• Accessibility concerns to whom the information is available. Not all textual data are available to the public, e.g., emails.

• Attention is a subjective measure of the impact of the textual data. Typical measures of attention include the number of views of a news item and the number of times a news article is retweeted on Twitter.

With regard to information sources, in the financial context common sources include widespread news media; the management of companies, who release earnings and associated company strategy information; and required regulatory filings, such as quarterly and annual filings for publicly traded companies in the U.S. (Kearney & Liu, 2014). In addition, equity research analysts at investment banks publish reports on major public companies, which are another major source of textual data (Kearney & Liu, 2014).

The credibility of news data can be broken down into four levels: news from established media; pre-news, which is the raw source material that reporters research; rumors published on less credible sites; and social media, where anyone can broadcast any information (Mitra & Mitra, 2010). Intuitively, the last two categories are the least credible (Mitra & Mitra, 2010).

2.2.3 Textual Data: Contents

Two types of information can be extracted from the content of textual data: sentiment and semantics (Nassirtoussi et al., 2014). For methodologies of information extraction, please refer to Section 2.3.2.


Sentiment relates to how positive or negative the content of the textual data appears to be, whereas semantic information can be further broken down into entities and events. An entity is a subject such as a person, a company, or a location, while an event is an action, such as an acquisition, an oil spill, or an accounting scandal. For each entity, there are different degrees to which the textual data is related to it, and there are separate pieces of information that describe what the entity is and how entities are related.

For example, in the sentence “Tokyo Marine buys HCC,” there are two entities, “Tokyo Marine” and “HCC.” “Buys” is the event, which refers to an acquisition of HCC by Tokyo Marine. For attributes, “Tokyo Marine” is a Japanese insurance company and “HCC” is a US insurance company. This information may or may not be presented directly in the same textual document but can be extracted from other sources. Additionally, the two companies are related through an acquirer-acquiree relationship.

Another example may be seen in the news headline “S&P upgrades the credit rating of General Motors.” Although there are two entities, the rating agency “S&P” and the auto maker “General Motors,” and an event, “credit rating upgrade,” the news is more about “General Motors” than “S&P.” Therefore, in such a case the news is more relevant to “General Motors” than to “S&P.”
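
As a brief illustration of how such entity extraction can be automated, the sketch below runs a pre-trained named entity recognizer over the second example. It is a minimal sketch assuming the spaCy library and its small English model (en_core_web_sm) are installed; it is not a component of MSA itself.

    # Minimal named-entity extraction sketch; assumes spaCy plus its small
    # English model (python -m spacy download en_core_web_sm).
    import spacy

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("S&P upgrades the credit rating of General Motors.")

    # Print each detected entity and its predicted type (e.g., ORG).
    for ent in doc.ents:
        print(ent.text, ent.label_)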

2.3 Methodologies

2.3.1 Textual Analysis

Two mainstream textual analysis approaches are semantic analysis and sentiment analysis. Semantic analysis aims to capture the events that occur in a textual document. In the semantic space, each word can be identified as a feature and a distinct variable that must be addressed. This is the origin of the high dimensionality issue of textual data. As each word becomes a variable, a textual document naturally has a larger number of variables than traditional machine learning or statistical models can capture. As a result, researchers have used various techniques to reduce the dimensionality of a semantic structure at both the word level and the document level.

At the word level, a “bag of words” approach, which assumes that word order does not change the underlying meaning, is popular among researchers (Nassirtoussi et al., 2014). To organize words by their semantics, a common approach is to group words based on thesauruses, word roots, and sentiment dictionaries (Cheung et al., 2011).
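
To make the bag-of-words representation concrete, the sketch below converts two short documents into word-count vectors in which word order is discarded. It is a minimal sketch using scikit-learn, offered purely for illustration.

    # Minimal bag-of-words sketch using scikit-learn: each column is a
    # vocabulary word, each row a document, and word order is discarded.
    from sklearn.feature_extraction.text import CountVectorizer

    docs = ["markets fall as fear rises", "fear of falling markets rises"]
    vectorizer = CountVectorizer()
    counts = vectorizer.fit_transform(docs)

    print(vectorizer.get_feature_names_out())  # learned vocabulary
    print(counts.toarray())                    # word counts per document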

At the document level, topic models are used to extract the features of a given document. These models capture the word or topic distributions in a document. Common topic models include:

(1) latent semantic indexing (Deerwester, Dumais, Furnas, Landauer, & Harshman, 1990), which assumes that words that are close in meaning will occur in similar pieces of text, and calculates correlations of words as indicators to group words;

(2) probabilistic latent semantic indexing (Hofmann, 1999), which was developed on top of the latent semantic indexing model and assumes that documents are distributions over topics and topics are distributions over words; and

(3) latent Dirichlet allocation (Blei, Ng, & Jordan, 2003), which is conceptually similar to probabilistic latent semantic indexing and assumes that the document-topic distribution and the topic-word distribution both follow a Dirichlet process.

Among these different topic models, latent Dirichlet allocation has been shown to be suitable for large datasets. However, for probabilistic models such as latent Dirichlet allocation, the number of topics must be given as an input to the model. There are various methodologies for finding the optimal number of topics for a dataset (Arun, Suresh, Madhavan, & Murthy, 2010; Cao, Xia, Li, Zhang, & Tang, 2009; Deveaud, SanJuan, & Bellot, 2014; Griffiths & Steyvers, 2004).
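
The sketch below illustrates fitting a latent Dirichlet allocation model. It is a minimal sketch assuming the gensim library and a list of tokenized documents named tokenized_docs; note that the topic count must be chosen up front, exactly as discussed above.

    # Minimal LDA sketch; assumes the gensim library and tokenized_docs,
    # a list of token lists such as [["markets", "fall", ...], ...].
    from gensim.corpora import Dictionary
    from gensim.models import LdaModel

    def fit_lda(tokenized_docs, num_topics=10):
        dictionary = Dictionary(tokenized_docs)
        # Represent each document as sparse (word_id, count) pairs.
        corpus = [dictionary.doc2bow(doc) for doc in tokenized_docs]
        # num_topics must be supplied in advance, as noted above.
        lda = LdaModel(corpus=corpus, id2word=dictionary,
                       num_topics=num_topics, random_state=0)
        return lda.print_topics()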

Sentiment analysis is used to determine the severity of the event detected. In the sentiment space, each word or sequence of words can reflect sentiment, and this becomes an additional feature. Researchers have used various techniques to reduce the dimensionality of sentiment. These techniques first quantify the sentiment on a simple scale (e.g., positive vs. negative), so that simple arithmetic (e.g., the average value) can be applied to reduce the dimensions (e.g., to a single number). The Harvard GI and DICTION are the most popular general databases used to access lists of words determined to be positive or negative (Kearney & Liu, 2014). However, researchers have argued that the use of a general dictionary for sentiment analysis can sometimes be misleading, because words that are negative in a general context (e.g., “liability”) are not negative in a financial context (Loughran & McDonald, 2011). To address such concerns, researchers have developed a financial domain-specific dictionary (Loughran & McDonald, 2011).
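
The simplest word-list scoring described above can be sketched in a few lines. The two tiny word sets below are hypothetical stand-ins for a real dictionary such as Loughran and McDonald's, and the length normalization is likewise an illustrative choice.

    # Minimal dictionary-based sentiment sketch; the word lists are
    # hypothetical stand-ins for a real dictionary (e.g., Loughran-McDonald).
    POSITIVE = {"gain", "growth", "improve", "profit"}
    NEGATIVE = {"loss", "decline", "bankruptcy", "lawsuit"}

    def sentiment_score(tokens):
        pos = sum(1 for t in tokens if t in POSITIVE)
        neg = sum(1 for t in tokens if t in NEGATIVE)
        # Net tone on a single simple scale, normalized by length.
        return (pos - neg) / max(len(tokens), 1)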

2.3.2 Textual Data Preprocessing

Before further text mining approaches can be applied to news data, raw textual data generally need to be transformed into a structured format so that they can be fed into traditional machine learning algorithms (Oh et al., 2006). Common approaches for textual data pre-processing include tokenization, stop word removal, and stemming. These are explained in more detail below.

Tokenization is the process of breaking a document down into individual words. This is generally the first step in textual data preprocessing for studies using the “bag-of-words” approach. However, if each individual word is treated as a feature or variable in the subsequent analysis models, there will be too many variables for a common statistical or machine learning model to capture. As a result, the preprocessing steps after tokenization aim to reduce this high dimensionality problem.

After tokenization, the initial textual document can be represented by a sequence of individual words. At this stage, one way to represent the document is to count how many times each word occurs in the document, regardless of the order of appearance. In this process, stop words, which are common English words that appear in most documents and do not carry specific sentiment or semantic meanings, such as “the,” “and,” “this,” etc., are normally removed from the tokenized textual documents to reduce the number of potential variables.

Another common dimension reduction approach is to group words based on their word stems. For example, in some regular stemming algorithms, “go,” “goes,” “went,” and “gone” are considered to be the same word since they share the same word root. This approach has proven effective in reducing dimensions (Harrag, El-Qawasmah, & Al-Salman, 2011).
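
Combining the three steps above, a minimal preprocessing pipeline might look like the following sketch, which assumes the NLTK library with its "punkt" and "stopwords" resources downloaded.

    # Minimal preprocessing sketch; assumes NLTK with the "punkt" and
    # "stopwords" resources fetched via nltk.download().
    from nltk.corpus import stopwords
    from nltk.stem import PorterStemmer
    from nltk.tokenize import word_tokenize

    STOP_WORDS = set(stopwords.words("english"))
    STEMMER = PorterStemmer()

    def preprocess(text):
        tokens = word_tokenize(text.lower())              # tokenization
        tokens = [t for t in tokens
                  if t.isalpha() and t not in STOP_WORDS] # stop word removal
        return [STEMMER.stem(t) for t in tokens]          # stemming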

2.3.3 Feature Engineering


The finance literature has demonstrated various ways of analyzing unstructured data, such as regulatory filings and news, for market prediction.

Tetlock found that sentiment predicts stock returns (Tetlock, Saar‐Tsechansky, & Macskassy, 2008). He used the Harvard General Inquirer to model the sentiment embedded in news and modelled its relationship with companies’ accounting earnings and stock returns.

Chouliaras applied sentiment analysis to 10-K filings and found that the change in sentiment is a better predictor of stock returns than sentiment itself (Chouliaras, 2015). He used the Loughran and McDonald dictionary to determine sentiment scores.

Loughran and McDonald analyzed 10-K filings with various readability measures and found relationships between improved readability and increased trading volume (Loughran & McDonald, 2009). They used the Fog Index, the Flesch Reading Ease score, and a scoring framework they created based on SEC guidelines to measure readability.
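
For illustration, the Fog Index combines average sentence length with the share of complex (three-or-more-syllable) words. The sketch below uses a crude vowel-group heuristic as a hypothetical stand-in for a proper syllable counter.

    # Minimal Fog Index sketch; the syllable counter is a crude heuristic
    # (vowel groups), a hypothetical stand-in for a proper counter.
    import re

    def count_syllables(word):
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    def fog_index(text):
        sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
        words = re.findall(r"[A-Za-z]+", text)
        complex_words = [w for w in words if count_syllables(w) >= 3]
        # Fog = 0.4 * (average sentence length + percent complex words)
        return 0.4 * (len(words) / len(sentences)
                      + 100.0 * len(complex_words) / len(words))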

Cohen et al. analyzed 10-K filings and found that changes in documents predict stock returns: firms that changed more of their text underperformed firms that changed less (Cohen, Malloy, & Nguyen, 2016). They used cosine distance, Jaccard distance, and a simple edit distance to measure document similarity.
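
For illustration, the cosine similarity between two versions of a document can be computed as in the sketch below, which uses scikit-learn TF-IDF vectors; this is a generic sketch, not the exact specification used by Cohen et al.

    # Minimal document-similarity sketch using TF-IDF vectors and cosine
    # similarity (generic; not the exact specification of Cohen et al.).
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def document_similarity(doc_a, doc_b):
        vectors = TfidfVectorizer().fit_transform([doc_a, doc_b])
        # 1.0 means identical term weights; 0.0 means no shared terms.
        return cosine_similarity(vectors[0], vectors[1])[0, 0]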

Antweiler and Frank analyzed online stock message boards and found that the number of message board postings in which a company is mentioned predicts its stock volatility (Antweiler & Frank, 2004).

Mamaysky and Glasserman analyzed news data and found that unusual news with negative sentiment predicts stock market volatility (Mamaysky & Glasserman, 2016). They used n-grams and conditional probability to determine the unusualness of a sentence and the Loughran and McDonald sentiment dictionary for sentiment analysis.

The finance literature provides a variety of features that predict the stock market. However, there are a couple of gaps in the research:

• The majority of methodologies are applied to regulatory filing data (e.g., 10-K filings); it is not clear whether such methodologies also work on news data.

• To demonstrate the robustness of their methodologies, researchers usually used linear regressions for modeling; more advanced modeling techniques (e.g., machine learning) were not used.

• Although these studies attempt to solve very similar problems (e.g., predicting the financial market), comparisons of results across studies, or the use of many of these features in combination, are missing from the body of knowledge.

Therefore, MSA will harness the wisdom of the finance literature as part of the feature engineering step and use more advanced modeling techniques (i.e., machine learning) to make predictions. Since MSA uses news data to predict VIX index spikes, features derived from the finance literature are strong candidates for the feature set.

2.3.4 Machine Learning

While textual analysis approaches such as sentiment analysis and semantic analysis model and quantify textual data, additional quantitative models may still need to be applied to arrive at conclusions for particular research goals. In this section, common machine learning models that have been used in studies of market prediction are reviewed and discussed.

Different machine learning algorithms have been used in market prediction by previous researchers. Popular algorithms include decision trees (Huang, Liao, Yang, Chang, & Luo, 2010; Peramunetilleke & Wong, 2002; Quinlan, 1993), naïve Bayes (Antweiler & Frank, 2004; Groth & Muntermann, 2011; John & Langley, 1995; Li, 2010), neural networks (Bollen, Mao, & Zeng, 2011; Evans, Pappas, & Xhafa, 2013; Pal & Mitra, 1992), logistic regression (Huang, Yang, & Chuang, 2008; Le Cessie & Van Houwelingen, 1992; D. L. Olson, Delen, & Meng, 2012), random forest (Breiman, 2001; Kumar & Thenmozhi, 2006; Patel, Shah, Thakkar, & Kotecha, 2015), and combination algorithms such as voting (S. R. Das & Chen, 2007; Kittler, Hatef, Duin, & Matas, 1998; Mahajan et al., 2008).

As previously discussed, there are different evaluation methods for estimating market return and market volatility. Two important indicators of prediction accuracy are false positives and false negatives. In the context of an economic uncertainty prediction system, a false positive occurs when the system signals a crisis when there is no such event. A false negative happens when the system fails to signal a crisis when one actually happens. Obviously, in a crisis warning system a false negative is more harmful to users than a false positive.

While the popular machine learning algorithms used by previous researchers maximize accuracy by default, they tend not to address the asymmetric nature of false positives and false negatives as described above. For any prediction system that aims to catch rare but severe events, this asymmetry must be addressed. To do so, instead of using an equal-weighted penalty setting, previous bankruptcy prediction systems have used the cost-sensitive settings of machine learning algorithms (Chen, Ribeiro, Vieira, Duarte, & Neves, 2011).
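
In scikit-learn, for example, such cost sensitivity can be approximated with class weights that penalize missing the rare class more heavily. The 10:1 weight below is an arbitrary illustrative choice, not a tuned value from this praxis.

    # Minimal cost-sensitive sketch using scikit-learn class weights; the
    # 10:1 penalty on the rare class is an arbitrary illustration.
    from sklearn.ensemble import RandomForestClassifier

    # Class 1 = rare uncertainty event (e.g., a VIX spike); class 0 = normal.
    # Weighting class 1 more heavily makes false negatives costlier.
    clf = RandomForestClassifier(
        n_estimators=100,
        class_weight={0: 1, 1: 10},
        random_state=0,
    )
    # Usage: clf.fit(X_train, y_train); clf.predict(X_test)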

2.3.5 Evaluation

Early warning systems were commonly evaluated qualitatively and manually, based on whether the system had captured significant, human-defined economic uncertainty events in actual financial markets (Ahn et al., 2011; Kim et al., 2004). Such a method lacks scalability and repeatability when monitoring the ongoing performance of a system. In comparison, MSA deals with those problems by utilizing a scalable, generalized, quantitative, and automated framework for system evaluation.

The availability of large amounts of market data allows the data to be split into a training dataset, which is used to train the prediction model, and a testing dataset, which is held out of the training sample, remains invisible to the model during the training phase, and is used to evaluate the model.

As market prediction systems predict the direction of the financial market, accuracy is the most widely used evaluation metric. Practitioners have found a system that is accurate more than 50% of the time to be useful (Nassirtoussi et al., 2015), so market prediction systems aim for that level of accuracy. However, for systems that deal with rare events such as economic crises, this cannot be the only evaluation standard, because such systems can easily achieve higher than 50% accuracy by always predicting that no rare event will occur. Therefore, additional performance measures are needed in addition to this metric.


Because MSA shares some characteristics with traditional knowledge elicitation systems, as both extract important concepts from large amounts of information (e.g., Cheung et al., 2011; Wang et al., 2008), it is important to study their evaluation methodologies in the design of MSA. In traditional knowledge elicitation systems, researchers have used the concepts of recall and precision for evaluation (Cheung et al., 2011). A previous concept elicitation system achieved a 68% recall rate with 9% precision, or a 26% recall rate with 17% precision, when extracting concepts from email data (Cheung et al., 2011); recall and precision had sharp trade-offs. When making comparisons based on both precision and recall, the F1 score is a useful measure that combines the two (Goutte & Gaussier, 2005).

The formal definitions of the F1 score, recall, and precision used by MSA are discussed in Chapter 3.
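
For reference, the standard definitions, stated in terms of true positives (TP), false positives (FP), and false negatives (FN), are

    precision = TP / (TP + FP)
    recall    = TP / (TP + FN)
    F1        = 2 · precision · recall / (precision + recall)

so the F1 score is the harmonic mean of precision and recall and is high only when both are high.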

2.4 Gaps in the existing literature

Dr. Katsuhiko Okada pointed out that differences in culture between AI researchers and finance researchers have limited the advance of the market prediction text mining literature (Okada, 2017; Okada, 2018). Among the differences he noted between AI researchers and finance researchers are the following:

- AI researchers:
  o focus on applying new prediction methodologies to market prediction problems
  o work with limited lengths of data
  o are not concerned with actual long-term performance in practice
- Finance researchers:
  o are obsessed with linear regression
  o apply 60 years of data for robustness

The observations made by Dr. Okada align with my own observations from conducting the literature review for this praxis. In this praxis's view, the right way to fit those two streams together is to pull the strengths of each and combine them. Features engineering is where domain knowledge becomes crucial; therefore, it makes sense to leverage methodologies found in the finance literature. On the other hand, the literature by AI researchers is heavily focused on prediction methodologies (i.e., machine learning); therefore, the prediction model parts will be drawn from the AI literature.


Chapter 3-Methods

This section of the praxis discusses the actual implementation of MSA. In an ideal world, there would be an unlimited number of human assistants always reading news on behalf of an intended user and warning him or her when there is a significant event in the news (data). A human would normally understand whether there is a significant event in the news by understanding either the underlying event itself (semantics) or how the event is described by the author of the news (sentiment). In order to derive such an understanding of the news content, there are several pre-processing steps that a human would conduct at a high level.

First, a human would ignore common words ("the," etc.) that are not highly meaningful for comprehension of the news article, since they appear so often in almost all articles. As we have seen, these words are called stop words in the natural language processing literature, and they are often removed as part of the pre-processing stage.

Second, a human would group together words with similar meanings to fit what he or she sees into a finite number of topics; for example, meanings of the words “bankruptcy” and “bankrupt” will be interpreted similarly when they appear in a news article. This process is called dimension reduction in the natural language processing literature.

Third, a human would reconcile each topic or event with his or her memory to determine whether each topic or event matters. For example, generally speaking, the bankruptcy of a company may matter more to the stock market than gossip from a company's employee. In similar studies, people have evaluated the validity of each topic or event by applying a manual filter or by running each topic or event against historical stock market movements. The former involves extensive manual effort and the latter involves handling abundant time series data.

Fourth, the author's sentiment is taken into account, as certain topics can have a positive or a negative interpretation. For example, if stock markets have gone up, the author may describe this as going dangerously high (negative) or as gaining momentum (positive); it is reasonable to assume that how the author describes the context (sentiment) will have a major impact on the readers. Authors often use different word choices to express their sentiment regarding the matter. In the natural language processing literature, a sentiment model based on a group of word choices is called the bag of words approach, and it is a highly popular sentiment model.

Lastly, a human would see contradicting information, good news and bad news, and rank the importance of different news items in his or her head based on experience; a highly positive news item about an executive saying something good about his or her company (good news) is negligible if there is news about the same company committing accounting fraud (very bad news). This complex tug of war between observed pieces of information, on which judgment is made through empirical evidence, can be modelled using machine learning.

Since market participants use a similar thinking process, making sense of what is going on in the world by reading news and taking action accordingly, the ability to monitor this news interpretation process can greatly improve the filtering power of MSA. A typical human's logic for interpreting a news item and taking a corresponding action is illustrated in Figure 1. Semantics (events) and sentiments are two important components of people's mental modeling of a news article.


Figure 1: Illustration of human logic

MSA aims to imitate this human thinking process through a combination of natural language processing techniques. The equivalent logic of MSA is shown in Figure 2 and is designed to mirror that in Figure 1. Just like a human would, MSA takes in the same news that the human reads and interprets it. Furthermore, MSA reconciles the historical news with historical prices, which are the result of collective actions taken by humans after reading news. Finally, MSA makes a prediction about the consequences of news based on its understanding of human market participants' collective actions in response to the news.


Figure 2: Illustration of MSA logic

This chapter provides a detailed explanation of how MSA works. The focus is on the methodologies used in MSA, in the sequence of data, pre-processing, features engineering, machine learning, and evaluation. The high-level construction of MSA is shown in Figure 3, and a more detailed explanation is given later in this chapter.


Figure 3: High-level construction of MSA

3.1 Data

Two types of data, unstructured textual data and structured numerical data, are used by MSA. The unstructured textual part of the data is the raw news data. In this case, The Wall Street Journal (WSJ) data, which can be downloaded from Factiva, is used as the main source of the data used to construct features representing the state of the world. The structured numerical data, the market price of the VIX index, comes from the exchange's (CBOE) official website and is directly used as a representation of the actual consequences of events.

Capturing significant events within news data is an information retrieval problem, and there are two common ways of evaluating how well the desired information is retrieved. One way is by expert opinion, where a group of experts select what they consider to be important events and use these as the "perfect answer"; important events retrieved by a system are then evaluated against the group's answer. The other way is a market response approach, which assumes the market is efficient as in the Efficient Market Hypothesis (EMH) and uses market participants' collective response to an event as the gauge of how important it is. MSA uses the latter approach, which is why it requires structured numerical financial market data as an input to create the gauge that represents the collective reaction of market participants to events.

There are two main advantages of using this approach. First, the market response approach is much more scalable, reproducible, and efficient than the expert opinion approach. Inviting a group of experts to participate in a study requires extensive time, and whom one invites can significantly change the results. Additionally, historical market data is a hard historical fact that can be reproduced by other researchers. Second, this praxis argues that the market response approach has no look-ahead bias, unlike the expert opinion approach. When training data and testing data are split based on time (for example, using data from 2010 to 2014 to train MSA and data from 2015 onward to test MSA), it is not possible to have experts make judgments based only on knowledge from before 2015. Because the experts themselves have already experienced 2015, it is physically impossible to eliminate the look-ahead bias introduced by their experience. Perceptions in the financial world change drastically; for example, researchers have shown that the term "leverage" was considered positive prior to the financial crisis but turned into a highly negative word after the financial crisis (Calomiris & Nissim, 2007; Calomiris & Nissim, 2014). To build on this example, experts may not consider the term "leverage" important based on their knowledge prior to the financial crisis, but their experience of the financial crisis can turn "leverage" into an important concept; this is the look-ahead bias that can be introduced by the expert opinion approach.


For the news data, The Wall Street Journal was chosen because previous researchers have suggested that it is not only highly reputable among investors but also has the largest circulation among daily financial publications in the United States (Tetlock, 2007). The Wall Street Journal has many sections, and not all sections are fed into MSA. Since previous researchers have suggested that using only financial news and only headlines can reduce noise (Huang et al., 2010), only the "What's News: Business & Finance" section of The Wall Street Journal is used for MSA. This section lists the important business and finance headlines for each day. Therefore, it is a convenient place to access multiple important headlines for the day simultaneously.

The actual news data comes from the Factiva database, a popular database for researchers to obtain historical news data (Moniz, 2016). However, previous researchers have reported that the academic edition of Factiva can pull only 100 news articles at a time (Moniz, 2016). Constrained by this practical scalability issue, MSA limits its scope to news from 2010 to Q1 of 2018. On the other hand, use of the "What's News: Business & Finance" section enables MSA to pull multiple news headlines from the same date at a time, and hence hundreds of news headlines across 100 days at once. The main body of each day's news data, which consists of a list of headlines, is split into individual headlines and passed on to the next steps of MSA with a date attribute. In other words, although Factiva provides a significant amount of metadata, such as entities and topics, only the date is used by MSA.

Since MSA aims to capture macro events related to the entire economy, the entity tags assigned by Factiva are ignored. Regarding topics, MSA self-generates topics based on statistical models using the methodologies described below, so the topic tags provided by Factiva are also ignored.

For the structured numerical portion, the financial market data are used to construct a flag that indicates the impact of the events described in the financial news data. The classic theory in modern finance, Eugene Fama's EMH, states that all available information is quickly reflected in the market price (Malkiel & Fama, 1970). Therefore, the use of market data in evaluating an event's impact is a common practice (Campbell, Lo, & MacKinlay, 1997; MacKinlay, 1997). Additionally, the CBOE VIX index is chosen because previous researchers have shown that the use of volatility as an uncertainty measure is a common practice (Oh et al., 2006), and the CBOE VIX index is a readily available and popular volatility measure. CBOE VIX index data covering the same 2010 to Q1 2018 period as the news data are used for MSA.

The CBOE VIX index, a continuous numerical dataset, is an implied volatility measure calculated from S&P 500 options (Exchange, 2009). For the purposes of MSA, the value of the CBOE VIX index is transformed into a binary flag reflecting whether there was market uncertainty, based on whether the change in the CBOE VIX index value exceeds a certain threshold. Although MSA allows users to define the threshold flexibly, this praxis uses an arbitrary threshold of a 10% weekly change for illustrative purposes. When the weekly change in the CBOE VIX index exceeds 10%, the uncertainty flag is recorded as True for the day; otherwise it is False.
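As a minimal sketch of this transformation (assuming the VIX history is available as a CSV file with hypothetical "Date" and "Close" columns):

```python
# Sketch: daily VIX closes to a binary weekly-change uncertainty flag.
# The CSV path and column names ("Date", "Close") are hypothetical.
import pandas as pd

vix = pd.read_csv("vix_history.csv", parse_dates=["Date"]).set_index("Date")

# Weekly change approximated as the percent change over 5 trading days.
vix["weekly_change"] = vix["Close"].pct_change(periods=5)

# Flag days on which the VIX rose by more than the user-defined threshold.
THRESHOLD = 0.10
vix["economic_uncertainty"] = vix["weekly_change"] > THRESHOLD
```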

The higher the threshold is set, the fewer events MSA captures, displaying warnings only for the more important ones. MSA gives users the flexibility to define the threshold. A 10% threshold is chosen here based on the assumption that portfolio managers, unlike professional traders, only care to know when there are events that could lead to major economic uncertainty in the market. To put the threshold in perspective, when Lehman Brothers announced bankruptcy on September 15, 2008, the CBOE VIX index was observed at 31.7 (Exchange, 2009). The bankruptcy event was reflected in the market price by the end of September 15, 2008, consistent with the EMH. Since the CBOE VIX index is observed based on S&P 500 options that are traded on the market, the change is calculated as the difference from the prior trading day's level on Friday, September 12, 2008, as September 15 falls on a Monday (Exchange, 2009). The CBOE VIX index was at 25.66 on September 12, 2008, and increased 24% on September 15, 2008, within one business day (Exchange, 2009). This type of event qualifies as an economic uncertainty under the illustrative MSA, as it exceeds the 10% threshold. Therefore, the sample event would appear in a database with three columns (Date, News, Economic Uncertainty) with values (Sep 15 2008, article X, True), where article X is the headline about the collapse of Lehman Brothers, as illustrated in Figure 4. The issue with this representation is that there are multiple headlines representing multiple events on the same date, but the VIX spike flag driven by one or more headlines is attached to all the headlines on that date. Therefore, instead of "does this particular event cause the VIX index to spike?," the relevant question is "does any of the date's events cause the VIX index to spike?" MSA uses machine learning approaches, as described below, to sort through multiple events from each date historically and determine the likelihood that the inclusion of certain events will cause the VIX index to spike.


Figure 4: Determination of economic uncertainty

The distribution of VIX index data (in the case of daily change) is shown in Figure 5; days below the 10% threshold account for 94% of all days, which effectively targets the warning to appear about once a month. By comparison, setting the threshold to 5% generates a warning about once a week, while setting the threshold to 20% generates a warning about once a quarter.

Figure 5: Cumulative distribution of VIX change


MSA is designed to capture the "most significant" events. Prior to the improved data processing method, the careful selection of the data source serves as a crucial step in the process. The Lehman Brothers bankruptcy would be included in the "What's News: Business & Finance" section of The Wall Street Journal, but the bankruptcy of a small startup might not. The choice of the "What's News: Business & Finance" section of The Wall Street Journal over, for example, a local newspaper assumes that the section functions as a natural initial filter, picking up significant events involving significant entities that are more likely to have an impact on the entire economy. The later processes then use sophisticated natural language processing and machine learning approaches to compare the events captured historically and produce an assessment of whether an event may cause economic uncertainty.

3.2 Pre-processing

The purpose of the pre-processing and features engineering (next section) steps is to transform unstructured data into a structured form that can be consumed by machine learning algorithms. In this praxis, the pre-processing step refers to the step that processes raw unstructured data into a cleaner and more normalized format; features extraction and dimension reduction are also done at this step. The actual feature representations, such as representing news in terms of topics and sentiment, are done at the later features engineering step.

Before features engineering techniques such as sentiment models can be applied to a headline, natural language processing techniques such as tokenization, stop word removal, and various normalizations must be applied. When a headline (usually a sentence) enters MSA, it is broken down into small pieces called tokens. MSA only uses unigrams (single words) to represent tokens. Alternatively, other systems may use bigrams (two words), trigrams (three words), or n-grams (n words) to represent tokens. Higher dimensional n-grams such as "corporate actions" (a bigram) have different meanings from the unigrams "corporate" and "actions"; therefore, incorporating higher dimensional n-grams can capture more semantic nuance. However, since the sentiment analysis portion of MSA is based on word dictionaries, which contain only unigrams, MSA also uses only unigrams. The implementation of MSA's sentiment analysis is explained in detail in the following section.

In addition to defining tokens as a word or a group of words, it is also possible to define tokens at a higher level, such as a sentence. However, the input data of this research is a headline, which is typically already a single sentence; therefore, it is not meaningful to define tokens at this level.

Stop word removal is a process whereby common words such as "a," "an," and "the" are identified and removed from analysis. Which words are considered stop words is language-dependent, and there are two general types of approaches: a statistical approach and a dictionary approach. A statistical approach, such as term frequency-inverse document frequency (TF-IDF), uses pools of general documents to benchmark how frequently each word usually appears in a document, and uses the relative frequency of words in the target document against the benchmark to identify common words. On the other hand, a dictionary approach simply provides a precompiled list of words that are common and should be removed from analysis. In this study, preference is given to a dictionary approach, as dictionary approaches generally provide better transparency than statistical approaches. This research uses the English stop word list provided by scikit-learn. Scikit-learn is used as a library in Python, as the topic modeling which leverages the LDA portion of MSA is also implemented in Python.
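A minimal sketch of these pre-processing steps, using scikit-learn's built-in English stop word list (the example headline is hypothetical):

```python
# Sketch: unigram tokenization, lower-casing, and dictionary-based stop word
# removal with scikit-learn's built-in English stop word list.
import re
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS

def preprocess(headline):
    # Lower-case and keep alphabetic tokens only (drops numbers and punctuation).
    tokens = re.findall(r"[a-z]+", headline.lower())
    # Dictionary approach: drop tokens found in the precompiled stop word list.
    return [t for t in tokens if t not in ENGLISH_STOP_WORDS]

print(preprocess("The Dow fell 500 points as investors feared a crisis."))
# -> ['dow', 'fell', 'points', 'investors', 'feared', 'crisis']
```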

The normalization step further reduces the dimensionality of features, which in this research means tokens or words. MSA treats upper-case words and lower-case words the same by converting all words into lower case.

Another popular normalization approach is called stemming. Stemming is a process that turns different forms of a word, such as "ran," "runs," "running," and "run," all into "run." However, since MSA uses word dictionaries for sentiment analysis, and those dictionaries already contain the different forms of a word (for example, "abandon," "abandoned," "abandoning," "abandonment," "abandonments," and "abandons" are all in the dictionaries), MSA omits the stemming step.

After MSA cleans the sentences that enter the system, it applies further text mining techniques that classify the cleaned tokens based on their semantics to further reduce dimensionality. There are again two philosophically distinct approaches: statistical and dictionary. In this context, a statistical approach, such as the topic models approach used by MSA, leverages statistical techniques to cluster words into a number of clusters based on hidden meanings derived statistically from sentences.

There are two kinds of dictionary approaches. The first leverages a thesaurus to derive the semantic meaning behind words and clusters them based on the relationships between words defined in the thesaurus. The second uses sentence templates, where the list of events of concern is first identified, then various language rules are templated for each identified event.

MSA uses a topic model, which is a statistical approach. In this approach, words are clustered into a set number of groups based on their meaning. As discussed in the literature review section, there are various ways to conduct topic modeling. Topic models have evolved over time from latent semantic analysis to probabilistic latent semantic indexing to latent Dirichlet allocation (LDA); MSA uses LDA to model topics. Before LDA can be used, it requires the number of topics as an input. For determining the number of topics, there are four common approaches, developed by Griffiths, Deveaud, Cao, and Arun: Griffiths' and Deveaud's approaches are maximization algorithms, while those of Cao and Arun are minimization algorithms. MSA uses all four methods and compares their results to find the optimal number of topics.

After determining the optimal number of topics, the news data are split into two datasets: a training dataset and a testing dataset. The training dataset (in-sample data) consists of the data from 2010 to 2014, and the testing dataset (out-of-sample data) comprises the data from 2015 to Q1 2018. The training dataset is used to train the LDA model, and topics are constructed based on what is observed in the news data between 2010 and 2014. Although each date is assigned only one economic uncertainty flag, there are several news headers for each date. It is believed that each news header contains a distinct topic. Therefore, when training the LDA model, MSA models one topic per sentence by processing the group of tokens that appear in the same sentence as one record.
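A minimal sketch of this split and LDA training, assuming a hypothetical `headlines` DataFrame with `date` and `headline` columns (scikit-learn's LDA implementation is used here purely for illustration):

```python
# Sketch: time-based split and 40-topic LDA training, one headline per record.
# "headlines" is a hypothetical DataFrame with "date" and "headline" columns.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

train = headlines[(headlines["date"] >= "2010-01-01") & (headlines["date"] <= "2014-12-31")]
test = headlines[headlines["date"] >= "2015-01-01"]

# Lower-casing, unigram tokens, and English stop word removal mirror the
# pre-processing described above.
vectorizer = CountVectorizer(lowercase=True, stop_words="english")
X_train = vectorizer.fit_transform(train["headline"])

lda = LatentDirichletAllocation(n_components=40, random_state=0)
lda.fit(X_train)  # topics are constructed from in-sample (2010-2014) data only

# Each headline maps to a probability distribution over the 40 latent topics;
# for any single headline the topic probabilities sum to 1.
topic_dist_train = lda.transform(X_train)                                # in-sample
topic_dist_test = lda.transform(vectorizer.transform(test["headline"]))  # out-of-sample
```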


After the LDA model is trained, each header from both the training dataset and the testing dataset is entered into the model to generate a probability distribution over the latent topics generated by the LDA model. For each headline, the topic probabilities sum to 1. Since the LDA model is trained using the training dataset, the generated probability distribution is in-sample (data between 2010 and 2014) for the training dataset and out-of-sample (data between 2015 and Q1 2018) for the testing dataset. The distinction between in-sample and out-of-sample is important, as MSA is meaningful only if it can increase out-of-sample prediction power.

In addition to knowing which topic each header belongs to, it is also important to identify what each latent topic means. Latent means hidden, and the topics are referred to this way because they are statistically generated and require humans to investigate their content to assign an economic meaning. For each topic, the list of words that describes the topic can be extracted from the LDA model. Researchers evaluate the list of words and manually assign a name to the topic, and some researchers have used visual aids such as word clouds to assist in identifying topics.

The purpose of MSA is to identify macroeconomic uncertainties, and not all topics discussed in news headlines are about the entire economy. Some microeconomic topics that only impact individual companies, such as one company buying another or the naming of a CEO, should matter less in macroeconomic prediction. Some researchers (Jin et al., 2013) manually looked at each topic, determined whether the topic was relevant, and used the LDA model as a news filtering mechanism. However, MSA tries to eliminate such human judgment to avoid biases by deferring this analysis to be done automatically at the machine learning step.


3.3 Features Engineering

At the features engineering step, various factors are developed as inputs to the machine learning model. Table 1 shows a high-level summary of the methodologies used to create the factors. Methodology-wise, there are mainly dictionary-based methodologies and statistical methodologies; this praxis calls this highest-level grouping the approach. Each approach can be further split into categories. The main categories are sentiment, similarity, readability, and topic frequency. Changes in those categories usually constitute another category (e.g., change in sentiment from yesterday to today), except for similarity, as it already reflects change. The details of each category are discussed further in the following sub-sections.

Below category, there are method, metrics, and feature levels that further break down the methodologies. In total, there are 28 features. In the previous pre-processing section, the topic model was used to break news down into 40 topics. Adding the unbroken-down form, there are 41 content forms (1 aggregate form plus 40 topic forms). Each of the 41 forms can be combined with each of the 28 features to form 1,148 factors (e.g., Change in Sentiment - LM - Positive on Topic 12).


Approach | Category | Method | Metrics | Feature
Dictionary | Sentiment | LM | Positive | Sentiment-LM-Positive
Dictionary | Sentiment | LM | Negative | Sentiment-LM-Negative
Dictionary | Sentiment | LM | Uncertainty | Sentiment-LM-Uncertainty
Dictionary | Sentiment | LM | Litigious | Sentiment-LM-Litigious
Dictionary | Sentiment | LM | Constraining | Sentiment-LM-Constraining
Dictionary | Sentiment | LM | Superfluous | Sentiment-LM-Superfluous
Dictionary | Sentiment | LM | Net | Sentiment-LM-Net
Dictionary | Sentiment | GI | Positive | Sentiment-GI-Positive
Dictionary | Sentiment | GI | Negative | Sentiment-GI-Negative
Dictionary | Sentiment | GI | Net | Sentiment-GI-Net
Dictionary | Change in Sentiment | LM | Positive | Change in Sentiment-LM-Positive
Dictionary | Change in Sentiment | LM | Negative | Change in Sentiment-LM-Negative
Dictionary | Change in Sentiment | LM | Uncertainty | Change in Sentiment-LM-Uncertainty
Dictionary | Change in Sentiment | LM | Litigious | Change in Sentiment-LM-Litigious
Dictionary | Change in Sentiment | LM | Constraining | Change in Sentiment-LM-Constraining
Dictionary | Change in Sentiment | LM | Superfluous | Change in Sentiment-LM-Superfluous
Dictionary | Change in Sentiment | LM | Net | Change in Sentiment-LM-Net
Dictionary | Change in Sentiment | GI | Positive | Change in Sentiment-GI-Positive
Dictionary | Change in Sentiment | GI | Negative | Change in Sentiment-GI-Negative
Dictionary | Change in Sentiment | GI | Net | Change in Sentiment-GI-Net
Statistical | Similarity | Jaccard | - | Similarity-Jaccard
Statistical | Similarity | Cosine | - | Similarity-Cosine
Statistical | Readability | Fog | - | Readability-Fog
Statistical | Readability | Flesch | - | Readability-Flesch
Statistical | Change in Readability | Fog | - | Change in Readability-Fog
Statistical | Change in Readability | Flesch | - | Change in Readability-Flesch
Statistical | Topic Frequency | LDA | - | Topic Frequency-LDA
Statistical | Change in Topic Frequency | LDA | - | Change in Topic Frequency-LDA

Table 1: Factors generated in features engineering


3.3.1 Sentiment

Sentiment is perhaps one of the most common features in the natural language processing literature. MSA uses two different dictionaries: one is a financial services industry-specific dictionary and the other is a general dictionary. The industry-specific dictionary is the Loughran and McDonald dictionary (Loughran & McDonald, 2011) and the general dictionary is the Harvard General Inquirer (Stone & Hunt, 1963).

There are different types of sentiment. Six types are taken from the Loughran and McDonald dictionary (LM): "Positive", "Negative", "Uncertainty", "Litigious", "Constraining", and "Superfluous". A seventh sentiment, "Net", is derived from the difference between the "Positive" and "Negative" sentiments. In addition, two types of sentiment are taken from the Harvard General Inquirer (GI), "Positive" and "Negative", and a "Net" sentiment is calculated similarly to that of LM.

The scoring mechanism is generally the number of words in a specific dictionary category (e.g., LM positive) divided by the total number of words. Figure 6 illustrates an example of how sentiment scores are calculated in MSA. An unstructured textual news headline (a sentence) goes through the pre-processing step (e.g., removal of numbers, stop words, and punctuation) to form a list of words. That list of words is then scored against each type of dictionary.


Figure 6: Sentiment example

In addition to the different types of sentiment themselves, changes in sentiment are another form of feature, where, for example, the negative sentiment of today is compared against the negative sentiment of yesterday (i.e., the difference).
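A minimal sketch of this scoring, with a tiny hypothetical stand-in for a full dictionary category and assumed token lists from the pre-processing step:

```python
# Sketch: dictionary-based sentiment scoring. The word set below is a tiny
# hypothetical stand-in for a full dictionary category (e.g., LM negative);
# "tokens_today" and "tokens_yesterday" are assumed outputs of pre-processing.
LM_NEGATIVE = {"bankruptcy", "crisis", "loss", "fraud", "abandoned"}

def sentiment_score(tokens, dictionary):
    # Number of dictionary words divided by the total number of words.
    return sum(t in dictionary for t in tokens) / len(tokens) if tokens else 0.0

negative_today = sentiment_score(tokens_today, LM_NEGATIVE)
negative_yesterday = sentiment_score(tokens_yesterday, LM_NEGATIVE)
change_in_negative = negative_today - negative_yesterday  # "change" feature family
```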

3.3.2 Similarity

Similarity is a measure that describes how different a sentence is from a previous sentence. In the regulatory filing text mining literature, researchers argued that people are lazy and do not change texts until they have to due to big underlying events (Cohen et al., 2016). This praxis argues that the same principle may also apply to news data. News reporters have to write something about the financial market every day. This observation is especially true of topic 2, which is about stock indexes (see 4.1 for a detailed discussion of the topics generated). On the majority of days the stock indexes move around without a strong driver, and news reporters often just report the phenomenon without giving much explanation; this high similarity scenario is shown in Figure 7. On the other hand, when the market moves with a strong driver, news reporters report the stock index movements at much greater length, as they are now able to add greater context to why the market moved; this low similarity scenario is shown in Figure 8.

Figure 7: High similarity example


Figure 8: Low similarity example

The basic principle behind MSA is not to find factors that will always work and use those as features. Instead, MSA is focused on finding factors that may work according to the domain knowledge of the finance literature and adding them as features, and then letting the machine learning and the data decide which factors work at a given point in time. A big assumption made here is that the factors that work will change over time. Although similarity is in theory a promising factor in news, MSA defers the judgement of its usefulness to the data.

There are many ways of calculating similarity in the literature, but MSA uses the two similarity measures used by Cohen (Cohen et al., 2016): Jaccard distance and cosine distance.

Since similarity, unlike the other methods, is already a comparison between now and the past, it would not make sense to include a change in similarity.
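A minimal sketch of the two measures (a set-based Jaccard score and a count-based cosine score), written from their standard definitions rather than from Cohen et al.'s code:

```python
# Sketch: Jaccard and cosine similarity between two token lists (e.g., today's
# and the prior day's topic-2 text), from their standard definitions.
from collections import Counter
import math

def jaccard_similarity(tokens_a, tokens_b):
    a, b = set(tokens_a), set(tokens_b)
    return len(a & b) / len(a | b) if (a | b) else 0.0

def cosine_similarity(tokens_a, tokens_b):
    ca, cb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```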


3.3.3 Readability

Readability is a measure that aims to capture the meaning behind how someone writes. There are multiple ways of measuring readability, and this praxis uses the same two measures (Flesch Index and Fog Index) as previous researchers (Loughran & McDonald, 2009). Both measures share two ideas: how wordy the sentences are and how complex the words used are. When complex and impactful events occur, it is fair to assume that it takes more words for news reporters to describe those events and that they use less common words. Examples of high readability and low readability are shown in Figure 9.

Figure 9: Readability example
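As a rough sketch of the two published formulas applied to a tokenized headline treated as a single sentence (the syllable counter is a crude vowel-group heuristic, so the scores are approximations):

```python
# Sketch: approximate Fog and Flesch scores for a tokenized headline treated
# as a single sentence. The syllable counter is a crude vowel-group heuristic.
import re

def count_syllables(word):
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(words, sentences=1):
    n = len(words)
    syllables = sum(count_syllables(w) for w in words)
    complex_words = sum(count_syllables(w) >= 3 for w in words)  # 3+ syllables
    fog = 0.4 * (n / sentences + 100.0 * complex_words / n)
    flesch = 206.835 - 1.015 * (n / sentences) - 84.6 * (syllables / n)
    return fog, flesch
```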

3.3.4 Topic Frequency


Topic frequency refers to how many times each topic is mentioned in the headlines for the day. When certain macroeconomic topics (e.g., the eurozone crisis) are heavily discussed in the headlines, as opposed to when no such topics are discussed, there may be different implications for VIX movement prediction. Therefore, the frequencies of the topics generated in the previous pre-processing step are also included as factors in the model.
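A minimal sketch of this aggregation, reusing the hypothetical names from the LDA sketch above and assigning each headline to its most probable topic:

```python
# Sketch: daily topic frequency. Reuses the hypothetical "topic_dist_train"
# and "train" objects from the LDA sketch; each headline is assigned its
# most probable topic and counts are aggregated per day.
import numpy as np
import pandas as pd

assigned_topic = np.argmax(topic_dist_train, axis=1)  # dominant topic per headline
daily = pd.DataFrame({"date": train["date"].values, "topic": assigned_topic})

# Rows: dates; columns: topics 0..39; values: mentions of each topic that day.
topic_frequency = pd.crosstab(daily["date"], daily["topic"])
change_in_topic_frequency = topic_frequency.diff()  # the "change" variant
```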

3.4 Machine Learning

The unstructured textual news data is transformed into a list of structured features that are fed into the machine learning model as independent variables, while the VIX data is transformed into labels and serves as the dependent variable.

Machine learning is the step where cleaned and organized data enters the system and patterns in the data are recognized. Machine learning algorithms are divided into two categories: supervised learning and unsupervised learning. Supervised learning is a machine learning method with labels; during the training stage, data enters the system with labels, and when a new set of data is entered during the testing stage, the algorithm makes a judgment about which label the data should receive. Unsupervised learning is a machine learning method without labels; during the training stage, data is divided into a given number of groups, and when new data enters the system, the algorithm assigns the group the data belongs to.

During the pre-processing stage, MSA uses the LDA model to group news headers in the training dataset into a given number of topics. Then, when news headers in the testing dataset enter the system, the LDA model makes a judgment regarding which topic each news header belongs to. Therefore, the LDA model is an unsupervised learning approach and is also a machine learning algorithm.

However, for a typical market prediction system, the machine learning step usually refers to the step where a prediction about the market is made. Typically, a supervised learning algorithm is used and a certain reaction of the market, e.g. up or down, is used as a label, and each observation, which is a group of input variables, is associated with a label. In statistical terms, these groups of input variables are independent variables and the label is the dependent variable. MSA uses supervised learning during the machine learning stage, like most other market prediction systems.

A supervised learning algorithm can take two forms: regression and classification. A supervised learning algorithm using regression is trained on and predicts numerical data, such as a price or an expected return. A supervised learning algorithm conducting classification is trained on and predicts categorical data, such as movement up or down.

MSA predicts a Boolean result of True or False regarding whether there will be an economic uncertainty; therefore, MSA uses classification-based machine learning algorithms.

In addition to using regular classification-based supervised learning algorithms, MSA is an alert system that deals with prediction outcomes with high polarity: days on which the VIX index does not spike are far more frequent than days on which it does. This prediction problem is similar to those encountered in bankruptcy prediction, and the machine learning used in those problems is cost-sensitive (Chen et al., 2011). MSA also uses cost-sensitive machine learning. Cost-sensitive machine learning algorithms penalize one prediction outcome more than another. In MSA, a missed alert (false negative) is penalized more heavily than a false alert (false positive), as the consequences of not receiving an alert about actual economic uncertainty are more severe than the wasted effort of a false alert.
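One way to express this asymmetric penalty in scikit-learn is through class weights; the 5:1 ratio below is a hypothetical illustration, not the tuning used in this praxis:

```python
# Sketch: cost-sensitive classification via class weights. Penalizing the
# True (uncertainty) class more makes missed alerts costlier than false
# alerts; the 5:1 ratio is a hypothetical setting.
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(
    n_estimators=100,
    class_weight={False: 1, True: 5},  # missed spikes cost five times more
    random_state=0,
)
```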

Since cost-sensitivity and classification are just properties of certain types of supervised machine learning algorithms, the actual algorithms used are described in more detail below. Some of the most popular supervised learning algorithms used by researchers in market prediction problems are the decision tree (Quinlan, 1993), naïve Bayes (John & Langley, 1995), neural network (Pal & Mitra, 1992), logistic regression (Le Cessie & Van Houwelingen, 1992), random forest (Breiman, 2001), SVM, and nearest neighbors. Additionally, all seven machine learning algorithms are applied at the same time to the factors and combined using the Vote algorithm, which renders a judgment based on all seven algorithms and aggregates the results by majority vote.

The baseline model used by MSA is random forest, as it can score which factors were most useful when the model was built, providing transparency. This is an important metric that MSA reports to show users what is driving the market. Other machine learning models are also applied to the same data for comparison purposes.

A training dataset for a machine learning algorithm is prepared on an annual basis with five years' worth of past data on a rolling basis. For example, to predict data in 2015, the training dataset spans 2010 to 2014.
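A minimal sketch of this rolling re-training, assuming a hypothetical date-indexed `features` DataFrame of factors and a matching `labels` series, and reusing the cost-sensitive model from the sketch above; the random forest importances provide the transparency discussed earlier:

```python
# Sketch: annual rolling re-training on five years of history. "features" is
# a hypothetical date-indexed DataFrame of the 1,148 factors and "labels" the
# matching uncertainty flags; "model" is the cost-sensitive forest above.
for year in range(2015, 2019):
    train_mask = (features.index.year >= year - 5) & (features.index.year < year)
    test_mask = features.index.year == year

    model.fit(features[train_mask], labels[train_mask])
    predictions = model.predict(features[test_mask])

    # Transparency: rank the factors driving the market in this window.
    top_factors = sorted(
        zip(features.columns, model.feature_importances_),
        key=lambda pair: pair[1],
        reverse=True,
    )[:10]
```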


3.5 Evaluation

Market prediction systems are usually evaluated in one of three ways. The first approach is to build a trading strategy and see how well it performs; this is not a suitable approach for MSA, which is intended to make market predictions for warning, not trading. The second approach is to run a regression and evaluate in-sample metrics from the regression, such as R-squared. The third approach, the one used most widely in machine learning, is to make out-of-sample predictions and generate metrics based on those predictions. In evaluating MSA, the third approach is used.

In out-of-sample evaluation of regression machine learning algorithms, the mean squared error is widely used. For classification machine learning algorithms, alternative measures such as accuracy are used. Accuracy is defined in Equation (1).

\[ \text{Accuracy} = \frac{\text{True Positive} + \text{True Negative}}{\text{True Positive} + \text{False Positive} + \text{True Negative} + \text{False Negative}} \tag{1} \]

However, for warning systems such as MSA, accuracy is not sufficient. Since there is a limited number of Positives, a system that always predicts Negative will have close to perfect accuracy. Rewarding such behavior would defeat the purpose of a warning system.

Therefore, additional measures such as recall and precision need to be introduced (Goutte & Gaussier, 2005).

Recall is how many Positives were captured out of the total number of actual events, as described in Equation (2).

\[ \text{Recall} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Negative}} \tag{2} \]

Precision is how many Positives were captured out of the total number of prediction attempts made, as described in Equation (3).

\[ \text{Precision} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Positive}} \tag{3} \]

Since recall is about capturing more events and precision is about not making mistakes, they are somewhat in a trade-off relationship. Being more aggressive about prediction attempts will improve recall but hurt precision, and being more conservative will improve precision but hurt recall. Therefore, an integrated measure, the F1 score, is used to evaluate the prediction performance of the algorithms.

\[ \text{F1 Score} = \frac{2(\text{Precision} \times \text{Recall})}{\text{Precision} + \text{Recall}} \tag{4} \]
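These four metrics can be computed directly with scikit-learn, given arrays of actual flags and out-of-sample predictions (the names below are hypothetical):

```python
# Sketch: computing the metrics of Equations (1) through (4) with
# scikit-learn; "actual" and "predictions" are hypothetical boolean arrays.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

accuracy = accuracy_score(actual, predictions)
precision = precision_score(actual, predictions)
recall = recall_score(actual, predictions)
f1 = f1_score(actual, predictions)
```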

As a reference point, a previous concept elicitation system achieved a 68% recall rate while maintaining a 9% precision rate (f1 score of 16%), or a 26% recall rate with a 17% precision rate (f1 score of 21%) (Cheung et al., 2011). As another illustration of how difficult it is to maintain both high recall and high precision, a market prediction system much closer to MSA is also used as a reference. A currency forecast system, which aimed to forecast currency movements across multiple currency pairs, yielded f1 scores between 0.28 and 0.50 with limited data. However, the highest f1 score of 0.50 was achieved with only 1 Positive in 3 attempts (a 0.33 precision rate and a 1.00 recall rate).

3.6 Benchmark

Previous researchers have found that negative sentiment can best be used to estimate spikes in the VIX index, but they used commercially derived sentiment data from news (Smales, 2014). Two dictionary-based sentiment models are used here to simulate traditional sentiment analysis. The objective is to see whether MSA can outperform the benchmarks on f1 score. The proposed model is MSA based on the 1,148 features developed in this praxis and random forest. The first benchmark uses only the negative sentiment portion of Loughran and McDonald (Benchmark-LM) and the second benchmark uses only the negative sentiment portion of the Harvard General Inquirer (Benchmark-GI) as features; everything else is the same as MSA.


Chapter 4-Results

4.1 Pre-processing

Topic modeling is conducted during the pre-processing stage. For topic modeling, MSA uses latent Dirichlet allocation, which requires the number of topics as an input. Four algorithms are applied to the training dataset to make independent judgements about the optimal number of topics. The identification of this number is conducted in R using the "ldatuning" package (Murzintcev, 2015).

Figures 10 and 11 show the results from applying the four algorithms to the training dataset, which comprises 2010 to 2014. Griffiths and Deveaud present maximization algorithms, where higher values are better (Deveaud et al., 2014; Griffiths & Steyvers, 2004). Meanwhile, CaoJuan and Arun provide minimization algorithms, where lower values are better (Arun et al., 2010; Cao et al., 2009). The results show that the optimal number of topics for this dataset is around 40, which is also not far from previous studies (Izumi et al., 2010; Izumi et al., 2011; Jin et al., 2013; Mahajan et al., 2008). Therefore, MSA uses 40 topics for its LDA model in this research.

Figure 10: Results from maximization algorithms for finding natural number of topics


Figure 11: Results from minimization algorithms for finding natural number of topics

Table 2 shows the 40 topics and the top 5 keywords for each topic generated by the LDA model from the 2010-2014 training dataset. Since the topics created in this step are latent and only relevant keywords for each topic are shown, it is necessary to look at the keywords and manually interpret the topics. Since the results of such manual interpretation can be highly subjective, researchers have developed tools such as word clouds to aid interpretation of the LDA model. Fortunately, the keywords generated from the LDA model are relatively straightforward, and the topics are easily identifiable from the top 5 keywords for most topics.

It is worth noting that not all keywords have a direct meaning that can be used to identify the topic. Some keywords are very general; for example, "new" and "said" appear in many of the 40 topics and therefore provide limited insight. During the stop word removal process, such general keywords should have been removed based on the intention of the process. Since keywords such as "new" or "said" are not stop words in a general sense, they can only be captured and removed by an industry-specific dictionary. Just as sentiment dictionaries have general versions and industry-specific versions, and industry-specific versions are known to perform better (Loughran & McDonald, 2011), it may make sense to have an industry-specific stop word dictionary as well. However, there is no known widely used industry-specific stop word dictionary for finance or economics.

# | Topic Name | Keyword 1 | Keyword 2 | Keyword 3 | Keyword 4 | Keyword 5
1 | Automobile Industry | auto | maker | annual | chrysler | new
2 | Stock Indexes | dow | points | stocks | industrials | fell
3 | Corporate Reform | boeing | pressure | changes | industry | global
4 | Economic Growth | economy | growth | prices | economic | year
5 | Innovations | new | york | mart | wal | billion
6 | Strategic Partnerships | oil | toyota | wireless | crude | safety
7 | Budgets | billion | investors | bonds | companies | corporate
8 | Energy Industry | oil | gas | energy | fannie | natural
9 | New Product | google | apple | service | online | new
10 | Mortgage | credit | mortgage | settlement | backed | billion
11 | Financial Loss | financial | berkshire | crisis | galleon | buffett
12 | Legal Settlements | pay | million | agreed | settle | allegations
13 | Bankruptcy | bankruptcy | protection | said | filed | billion
14 | Corporate Executives | chief | ceo | executive | board | company
15 | European Banks | euro | banks | european | zone | spain
16 | Ownership Change | stake | firm | billion | sell | buy
17 | Fund | fund | sec | hedge | funds | firm
18 | Legal Opinions | news | department | corp | justice | said
19 | Eurozone Crisis | debt | euro | greece | zone | bailout
20 | Interest Rates | fed | rates | bond | low | buying
21 | Housing | home | market | housing | prices | sales
22 | Financial Transactions | morgan | year | public | offering | stock
23 | Merger & Acquisitions | track | year | bought | bank | billion
24 | Court Cases | court | merger | judge | foreign | ruled
25 | Wall Street | street | wall | financial | meeting | markets
26 | Technology Industry | new | microsoft | samsung | operating | nokia
27 | Business Contracts | food | said | network | airbus | boeing
28 | IPO | ipo | internet | fcc | new | gave
29 | Financial Regulations | bank | financial | regulators | new | probe
30 | Central Bank | bank | central | china | banks | japan
31 | Product | apple | oil | estate | real | iphone
32 | Earnings | profit | quarter | sales | posted | earnings
33 | Corporate Strategy | year | morgan | ceo | apple | firm
34 | Criminal Activities | trading | insider | drug | criminal | regulators
35 | Investor Sentiment | investors | bond | buying | market | fears
36 | Health Care | health | obama | workers | care | firms
37 | Buyout | billion | deal | buy | private | bid
38 | Stop Loss | said | noble | barnes | electric | car
39 | Labors | america | overhaul | labor | north | workers
40 | Financial Services Industry | goldman | financial | stepping | company | chairman

Table 2: Topics and associated keywords


4.2 Machine learning

MSA not only outperformed both benchmarks in terms of f1 score, but also did better on both precision and recall. To put this in perspective, previous concept elicitation systems designed for engineering management achieved a 68% recall rate with a 9% precision rate (f1 score of 16%) and a 26% recall rate with a 17% precision rate (f1 score of 21%) (Cheung et al., 2011); therefore, MSA performs highly competitively in comparison with its peers.

Model | F1 Score | Precision | Recall
Benchmark-LM | 13% | 18% | 10%
Benchmark-GI | 16% | 22% | 12%
MSA | 21% | 23% | 18%

Table 3: Performance of MSA and benchmarks

One of the benefits of using a random forest model is its ability to display the most important variables at a given point in time. When predicting data for 2015, the training dataset is prepared from data from 2010 to 2014, and the most important factors for that period can be extracted based on their explanatory power. The ten most important variables, based on average importance across the 2015 to 2018 predictions, are shown in Table 4. There are a couple of key observations about which kinds of factors have the greatest explanatory power over time:

 Sentiment-related factors showed better explanatory power.

 Instead of sentiment itself, the change in sentiment showed better explanatory power.

 Instead of sentiment on individual topics, sentiment on all headlines without applying the topic model showed better explanatory power.

 Instead of the general dictionary (GI), the industry-specific dictionary (LM) showed better explanatory power. This differs from the standalone benchmark models, where the GI-based benchmark performed better than the LM-based benchmark.

 The explanatory power of factors changed dramatically from year to year.

Factor | 2015 | 2016 | 2017 | 2018 | avg
All - Change in Sentiment - LM - negative | 0.71 | 1.49 | 0.88 | 0.87 | 0.99
All - Change in Sentiment - LM - positive | 0.87 | 0.66 | 0.70 | 0.57 | 0.70
All - Change in Sentiment - LM - litigious | 0.85 | 0.46 | 0.40 | 0.92 | 0.66
All - Sentiment - LM - negative | 0.35 | 0.67 | 0.67 | 0.48 | 0.54
All - Change in Sentiment - LM - net | 0.63 | 0.76 | 0.53 | 0.21 | 0.53
15 - Change in Sentiment - GI - negative | 0.46 | 0.64 | 0.38 | 0.61 | 0.52
8 - Change in Sentiment - LM - uncertainty | 0.90 | 0.55 | 0.23 | 0.34 | 0.51
24 - Similarity - Jaccard | 0.74 | 0.99 | 0.08 | 0.13 | 0.48
All - Sentiment - GI - negative | 0.18 | 0.26 | 0.41 | 1.03 | 0.47
11 - Change in Sentiment - GI - net | 0.60 | 0.53 | 0.27 | 0.47 | 0.47

Table 4: Ten most important factors by average random forest importance, 2015-2018


Chapter 5-Discussion and Conclusions

5.1 Discussion (interpretation of results)

In the previous chapter, it was shown that MSA provides higher VIX prediction power than traditional sentiment analysis, as measured by f1 score. The baseline MSA model leverages a variety of factors for features engineering and random forest for the prediction model. In this section, this result is broken down to investigate whether the choice of the particular machine learning model or the inclusion of a single factor drives the entire result.

The results of the MSA framework using other machine learning algorithms instead of random forest are shown in Table 5. Generally speaking, simpler models performed better than more complex models. The Voting algorithm, which is the combination of all the other algorithms, was among the worst performers. Additionally, even this praxis's baseline model, random forest, which is a collection of decision trees, performed only about as well as a single decision tree. The best performer of all the algorithms was a simple SVM.

Algorithm | F1 Score | Precision | Recall
Random Forest | 21% | 23% | 18%
Decision Tree | 23% | 18% | 30%
Naïve Bayes | 21% | 17% | 27%
SVM | 25% | 20% | 34%
Neural Network | 23% | 24% | 21%
Logistic Regression | 13% | 17% | 10%
Nearest Neighbors | 8% | 14% | 5%
Voting | 13% | 21% | 9%

Table 5: Performance of MSA framework using other machine learning algorithms


Table 4 showed the relative factor importance over time with all the factors present together. In contrast, Table 6 shows the performance when single factors are applied to the random forest model one by one; the benchmark models are subsets of this exercise.

Contrary to popular belief (Smales, 2014), negative sentiment is actually not the best predictor of spikes in the VIX index. Key observations are below.

 Change measures (i.e., change in sentiment) showed better prediction power.

 All headlines (before the topic model) showed better prediction power.

 The general GI dictionary showed better prediction power than the industry-specific LM dictionary when used standalone, which is the opposite of the results in Table 4, where the factors were used in combination.

Factor | F1 score
All - Change in Sentiment - GI - net | 23%
All - Change in Sentiment - GI - positive | 22%
All - Change in Sentiment - GI - negative | 21%
All - Sentiment - GI - net | 20%
All - Change in Sentiment - LM - positive | 19%
All - Change in Sentiment - LM - uncertainty | 19%
All - Similarity - Jaccard | 18%
All - Similarity - Cosine | 17%
All - Change in Sentiment - LM - litigious | 17%
All - Change in Sentiment - LM - net | 16%
All - Readability - Flesch | 16%
All - Sentiment - GI - negative | 16%
All - Change in Sentiment - LM - negative | 15%
All - Sentiment - LM - litigious | 15%
36 - Change in Readability - Flesch | 15%
All - Change in Readability - Flesch | 15%
All - Sentiment - GI - positive | 14%
All - Sentiment - LM - net | 14%
All - Change in Sentiment - LM - constraining | 14%
15 - Similarity - Cosine | 14%
29 - Change in Readability - Flesch | 14%
All - Sentiment - LM - negative | 13%
1 - Change in Readability - Flesch | 13%
3 - Change in Sentiment - LM - net | 12%
36 - Readability - Flesch | 12%
1 - Similarity - Jaccard | 12%
36 - Change in Sentiment - GI - positive | 11%
31 - Change in Readability - Flesch | 11%
36 - Similarity - Jaccard | 11%
3 - Change in Readability - Flesch | 11%

Table 6: Best performing standalone factors

5.2 Conclusions

It is important for portfolio managers to be continuously aware of micro and macro market risks. Micro market risks are individual companies' credit risks that impact any financial instruments they issue (e.g., corporate debt instruments, public equity, etc.). Macro market risks are economic or political events that have profound impacts across the entire financial markets.


MSA was created to help portfolio managers cope with economic uncertainties by enabling them not only to stay on top of the news but also to be informed of VIX variations in a timely manner. The time freed up from reading news can be given back to their main task of managing investment portfolios. MSA is designed in a flexible manner to account for the fact that different portfolio managers and different investment management companies have different levels of sensitivity to economic uncertainty and therefore will define economic uncertainty at different levels. The CBOE VIX is used as a quantitative measurement of economic uncertainty, and portfolio managers can set different thresholds on it to define their own levels of economic uncertainty.

MSA reads one of the most widely circulated newspapers, The Wall Street Journal, on behalf of portfolio managers. The "What's News: Business & Finance" section, which lists the most important news headlines of the day, is incorporated by MSA historically to investigate the relationship with economic uncertainties linked to increases in the VIX level. MSA is built on the idea of the Efficient Market Hypothesis, in that it assumes that the market price reflects the interpretation by market participants as a whole of the information published in the news. The benefits of this approach include the ability to train MSA in an unsupervised manner.

First, MSA takes in news headlines, cleans the raw textual data, and assigns each headline to one of 40 topics. In the second step, MSA relies on various natural language processing techniques found in the finance literature (e.g., sentiment, similarity, readability) to construct 1,148 factors on the news headlines as features for machine learning. Finally, machine learning algorithms are used to identify the subtle relationships between the 1,148 factors and spikes in the VIX index.


5.3 Contributions

In a sample study, the VIX spike threshold is set to 10%, which means that when a weekly increase in the VIX index is greater than 10%, the day is considered to be of interest. With an f1 score of 21%, MSA (i.e., with 1,148 factors) offered significantly better prediction results than benchmark models based on the popular belief that negative sentiment predicts spikes in the VIX index (13% and 16% f1 scores based on two different methodologies).

This praxis offers a multi-factor sentiment analysis method that is superior to traditional sentiment analysis methods. The methodology proposed in this praxis not only provides better prediction results, but also provides users with ongoing transparency into the underlying market drivers.

5.4 Future research directions

For future research, MSA can be applied to other areas and enhanced with different methodologies.

The ability to monitor uncertainties at the company level would be highly valuable to portfolio managers. Investment portfolios often include large amounts of financial instruments issued by individual companies (e.g., corporate debt instruments and public equities). Traditionally, large financial institutions employ an entire department of risk analytics staff to assess the credit risk of individual companies. This approach, although proven effective, is very expensive, and the risk assessment results generated by different people can be inconsistent. Therefore, developing an intelligent system is a more scalable, more cost-aware, and more effective solution to the problem of monitoring related companies. When monitoring companies, the dataset can be extended beyond news data, and it may be interesting to explore regulatory filing data (less timely but more formal) and Twitter data (more timely but less formal).

Additionally, MSA can be applied outside of financial prediction. Many companies hold a lot of internal unstructured textual data, such as emails and short message service (SMS) messages. Instead of surveying people about internal measures such as employee satisfaction, text mining emails and SMS messages using MSA may be able to report on those internal measures on an ongoing basis. Such a text mining based approach has the potential to provide more accurate measures, since it is based on all written communications, without incurring the extra work of repeatedly surveying people.

When it comes to methodology, other techniques can be considered for inclusion, both from the features engineering perspective and from the prediction model perspective. In the finance literature, there are more techniques yet to be explored under the MSA framework, such as Glasserman's unusualness measure. At the same time, there are advanced machine learning techniques, such as deep learning, that would be interesting to explore to enhance the MSA framework.

Although the current MSA and the underlying literature are for processing the English language, the statistical features engineering techniques (e.g., similarity) have the potential to extend to data in other languages. For other languages, such as Asian languages, new techniques are needed during the pre-processing step, such as tokenization and stemming. However, when it comes to features engineering and machine learning, many components of the existing MSA may work with limited modifications.


References

Ahn, J. J., Oh, K. J., Kim, T. Y., & Kim, D. H. (2011). Usefulness of support vector machine to develop an early warning system for financial crisis. Expert Systems with Applications, 38(4), 2966-2973.

Ahoniemi, K. (2008). Modeling and forecasting the VIX index. Retrieved June 22, 2018, from https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1033812

Antweiler, W., & Frank, M. Z. (2004). Is all that talk just noise? The information content of internet stock message boards. The Journal of Finance, 59(3), 1259-1294.

Arun, R., Suresh, V., Madhavan, C. V., & Murthy, M. N. (2010). On finding the natural number of topics with latent Dirichlet allocation: Some observations. Paper presented at the Pacific-Asia Conference on Knowledge Discovery and Data Mining, 391-402.

Baker, S. R., Bloom, N., & Davis, S. J. (2016). Measuring economic policy uncertainty. The Quarterly Journal of Economics, 131(4), 1593-1636.

Bholat, D. M., Hansen, S., Santos, P. M., & Schonhardt-Bailey, C. (2015). Text mining for central banks. Retrieved May 15, 2018, from https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2624811

Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3(Jan), 993-1022.

Blumberg, R., & Atre, S. (2003). The problem with unstructured data. DM Review, 13(42-49), 62.

Bollen, J., Mao, H., & Zeng, X. (2011). Twitter mood predicts the stock market. Journal of Computational Science, 2(1), 1-8.

Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5-32.

Brenner, M., & Galai, D. (1989). New financial instruments for hedging changes in volatility. Financial Analysts Journal, 45(4), 61-65.

Brenner, M., & Galai, D. Hedging volatility in foreign currencies. 1(1), 53-58.

Calomiris, C. W., & Mamaysky, H. (2017). How news and its context drive risk and returns around the world. Retrieved August 7, 2017, from http://www.nber.org/papers/w24430

Calomiris, C. W., & Nissim, D. (2007). Activity-based valuation of bank holding companies. Retrieved December 12, 2017, from http://www.nber.org/papers/w12918

Calomiris, C. W., & Nissim, D. (2014). Crisis-related shifts in the market valuation of banking activities. Journal of Financial Intermediation, 23(3), 400-435.

Campbell, J. Y., Lo, A. W., & MacKinlay, A. C. (1997). The econometrics of financial markets. Princeton, NJ: Princeton University Press.

Cao, J., Xia, T., Li, J., Zhang, Y., & Tang, S. (2009). A density-based method for adaptive LDA model selection. Neurocomputing, 72(7), 1775-1781.

Chae, J., Thom, D., Jang, Y., Kim, S., Ertl, T., & Ebert, D. S. (2014). Public behavior response analysis in disaster events utilizing visual analytics of microblog data. Computers & Graphics, 38, 51-60.

Chan, S. W., & Franklin, J. (2011). A text-based decision support system for financial sequence prediction. Decision Support Systems, 52(1), 189-198.

Chen, N., Ribeiro, B., Vieira, A. S., Duarte, J., & Neves, J. C. (2011). A genetic algorithm-based approach to cost-sensitive bankruptcy prediction. Expert Systems with Applications, 38(10), 12939-12945.

Cheung, C. F., Lee, W. B., Wang, W. M., Wang, Y., & Yeung, W. M. (2011). A multi-faceted and automatic knowledge elicitation system (MAKES) for managing unstructured information. Expert Systems with Applications, 38(5), 5245-5258.

Chouliaras, A. (2015). The pessimism factor: SEC EDGAR Form 10-K textual analysis and stock returns. Retrieved February 25, 2018, from https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2627037

Cohen, L., Malloy, C. J., & Nguyen, Q. H. (2016). Lazy prices. Retrieved May 5, 2018, from https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1658471

Coussement, K., & Van den Poel, D. (2008). Improving customer complaint management by automatic email classification using linguistic style features as predictors. Decision Support Systems, 44(4), 870-882.

Das, R., Sarkani, S., & Mazzuchi, T. A. (2012). Software selection based on quantitative security risk assessment. IJCA Special Issue on Computational Intelligence & Information Security CIIS, (1), 45-56.

Das, S. R., & Chen, M. Y. (2007). Yahoo! for Amazon: Sentiment extraction from small talk on the web. Management Science, 53(9), 1375-1388.

Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., & Harshman, R. (1990). Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6), 391.

Deveaud, R., SanJuan, E., & Bellot, P. (2014). Accurate and effective latent concept modeling for ad hoc information retrieval. Document Numerique, 17(1), 61-84.

Evans, C., Pappas, K., & Xhafa, F. (2013). Utilizing artificial neural networks and genetic algorithms to build an algo-trading model for intra-day foreign exchange speculation. Mathematical and Computer Modelling, 58(5), 1249-1266.

Exchange, C. B. O. (2009). The CBOE volatility index - VIX. White Paper, 1-23. Retrieved June 5, 2017, from https://www.cboe.com/micro/vix/vixwhite.pdf

Fama, E. F., & French, K. R. (1993). Common risk factors in the returns on stocks and

bonds. Journal of Financial Economics, 33(1), 3-56.

Fama, E. F., & French, K. R. (2015). A five-factor asset pricing model. Journal of

Financial Economics, 116(1), 1-22.

75

Fan, W., & Gordon, M. D. (2014). The power of social media analytics. Communications

of the ACM, 57(6), 74-81.

Goutte, C., & Gaussier, E. (2005). (2005). A probabilistic interpretation of precision,

recall and F-score, with implication for evaluation. Paper presented at the European

Conference on Information Retrieval, 345-359.

Govers, R., & Go, F. M. (2004). Projected destination image online: Website content

analysis of pictures and text. Information Technology & Tourism, 7(2), 73-89.

Gray, W.Behavioral finance and investing: Are you trying too hard? Retrieved June 20,

2018, from https://alphaarchitect.com/2014/05/13/behavioral-finance-and-investing-

are-you-trying-too-hard/

Griffiths, T. L., & Steyvers, M. (2004). Finding scientific topics. Proceedings of the

National Academy of Sciences, 101(suppl 1), 5228-5235.

Groth, S. S., & Muntermann, J. (2011). An intraday market risk management approach

based on textual analysis. Decision Support Systems, 50(4), 680-691.

Harrag, F., El-Qawasmah, E., & Al-Salman, A. M. S. (2011). (2011). Stemming as a

feature reduction technique for arabic text categorization. Paper presented at the

Programming and Systems (ISPS), 2011 10th International Symposium On, 128-133.

Hatzius, J., Hooper, P., Mishkin, F. S., Schoenholtz, K. L., & Watson, M. W. (2010).

Financial conditions indexes: A fresh look after the financial crisis. Retrieved March

5, 2018, from http://www.nber.org/papers/w16150

76

Hindle, A., Ernst, N. A., Godfrey, M. W., & Mylopoulos, J. (2011). (2011). Automated

topic naming to support cross-project analysis of software maintenance activities.

Paper presented at the Proceedings of the 8th Working Conference on Mining

Software Repositories, 163-172.

Hofmann, T. (1999). (1999). Probabilistic latent semantic indexing. Paper presented at

the Proceedings of the 22nd Annual International ACM SIGIR Conference on

Research and Development in Information Retrieval, 50-57.

Hollum, A. T. G., Mosch, B. P., & Szlavik, Z. (2013). (2013). Economic sentiment: Text-

based prediction of stock price movements with machine learning and wordnet.

Paper presented at the International Conference on Industrial, Engineering and

Other Applications of Applied Intelligent Systems, 322-331.

Huang, C., Liao, J., Yang, D., Chang, T., & Luo, Y. (2010). Realization of a news

dissemination agent based on weighted association rules and text mining techniques.

Expert Systems with Applications, 37(9), 6409-6413.

Huang, C., Yang, D., & Chuang, Y. (2008). Application of wrapper approach and

composite classifier to the stock trend prediction. Expert Systems with Applications,

34(4), 2870-2878.

International Council on Systems Engineering. (2011). Systems engineering handbook: A

guide for system life cycle processes and activities International Council of Systems

Engineering.

77

Izumi, K., Goto, T., & Matsui, T. (2010). Analysis of financial markets' fluctuation by

textual information. Transactions of the Japanese Society for Artificial Intelligence,

25, 383-387.

Izumi, K., Goto, T., & Matsui, T. (2011). Implementation tests of financial market

analysis by text mining. Retrieved November 20, 2017, from

https://philpapers.org/rec/IZUITO

Jin, F., Self, N., Saraf, P., Butler, P., Wang, W., & Ramakrishnan, N. (2013). (2013).

Forex-foreteller: Currency trend modeling using news articles. Paper presented at the

Proceedings of the 19th ACM SIGKDD International Conference on Knowledge

Discovery and Data Mining, 1470-1473.

John, G. H., & Langley, P. (1995). (1995). Estimating continuous distributions in

bayesian classifiers. Paper presented at the Proceedings of the Eleventh Conference

on Uncertainty in Artificial Intelligence, 338-345.

Katal, A., Wazid, M., & Goudar, R. H. (2013). (2013). Big data: Issues, challenges, tools

and good practices. Paper presented at the Contemporary Computing (IC3), 2013

Sixth International Conference On, 404-409.

Kearney, C., & Liu, S. (2014). Textual sentiment in finance: A survey of methods and

models. International Review of Financial Analysis, 33, 171-185.

78

Kim, T. Y., Oh, K. J., Sohn, I., & Hwang, C. (2004). Usefulness of artificial neural

networks for early warning system of economic crisis. Expert Systems with

Applications, 26(4), 583-590.

Kittler, J., Hatef, M., Duin, R. P., & Matas, J. (1998). On combining classifiers. IEEE

Transactions on Pattern Analysis and Machine Intelligence, 20(3), 226-239.

Kumar, M., & Thenmozhi, M. (2006). Forecasting stock index movement: A comparison

of support vector machines and random forest. Retrieved April 3, 2018, from

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=876544

Lavrenko, V., Schmill, M., Lawrie, D., Ogilvie, P., Jensen, D., & Allan, J. (2000).

(2000). Mining of concurrent text and time series. Paper presented at the KDD-2000

Workshop on Text Mining, , 2000 37-44.

Le Cessie, S., & Van Houwelingen, J. C. (1992). Ridge estimators in logistic regression.

Applied Statistics, , 191-201.

Li, F. (2010). The information content of forward‐looking statements in corporate

filings—A naïve bayesian machine learning approach. Journal of Accounting

Research, 48(5), 1049-1102.

Liu, Q., Guo, S., & Qiao, G. (2015). VIX forecasting and variance risk premium: A new

GARCH approach. The North American Journal of Economics and Finance, 34,

314-322.

79

Loughran, T., & McDonald, B. (2009). Plain english, readability, and 10-K filings.

Retrieved June 20, 2018, from

https://www.researchgate.net/profile/Bill_Mcdonald/publication/228458241_Plain_

English_Readability_and_10-K_Filings/links/5772a80d08aeeec3895410b0.pdf

Loughran, T., & McDonald, B. (2011). When is a liability not a liability? textual analysis,

dictionaries, and 10‐Ks. The Journal of Finance, 66(1), 35-65.

MacKinlay, A. C. (1997). Event studies in economics and finance. Journal of Economic

Literature, 35(1), 13-39.

Mahajan, A., Dey, L., & Haque, S. M. (2008). (2008). Mining financial news for major

events and their impacts on the market. Paper presented at the Proceedings of the

2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent

Agent Technology-Volume 01, 423-426.

Majmudar, U., & Banerjee, A. (2004). Vix forecasting. Retrieved October 19, 2017, from

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=533583

Malkiel, B. G., & Fama, E. F. (1970). Efficient capital markets: A review of theory and

empirical work. The Journal of Finance, 25(2), 383-417.

Mamaysky, H., & Glasserman, P. (2016). Does unusual news forecast market stress? The

Office of Financial Research Working Paper, (16-04)

Manela, A., & Moreira, A. (2017). News implied volatility and disaster concerns. Journal

of Financial Economics, 123(1), 137-162.

80

Massey, A. K., Eisenstein, J., Anton, A. I., & Swire, P. P. (2013). (2013). Automated text

mining for requirements analysis of policy documents. Paper presented at the

Requirements Engineering Conference (RE), 2013 21st IEEE International, 4-13.

Mihaila, G. A., Raschid, L., & Vidal, M. (2000). (2000). Using quality of data metadata

for source selection and ranking. Paper presented at the WebDB (Informal

Proceedings), 93-98.

Ming, N., & Ming, V. (2012). (2012). Predicting student outcomes from unstructured

data. Paper presented at the UMAP Workshops,

Mitra, L., & Mitra, G. (2010). Applications of news analytics in finance: A review (tech.

rep.). optirisk-systems. com/papers. Opt0014.Pdf: OptiRisk Systems,

Moniz, A. (2016). Textual analysis of intangible information. Retrieved July 15, 2017,

from https://repub.eur.nl/pub/93001/

Mullainathan, S., & Shleifer, A. (2005). The market for news. The American Economic

Review, 95(4), 1031-1053.

Murzintcev, N. (2015). Ldatuning: Tuning of the latent dirichlet allocation (LDA) models

prameters. R Package,

Nassirtoussi, A. K., Aghabozorgi, S., Wah, T. Y., & Ngo, D. C. L. (2014). Text mining

for market prediction: A systematic review. Expert Systems with Applications,

41(16), 7653-7670.

81

Nassirtoussi, A. K., Aghabozorgi, S., Wah, T. Y., & Ngo, D. C. L. (2015). Text mining of

news-headlines for FOREX market prediction: A multi-layer dimension reduction

algorithm with semantics and sentiment. Expert Systems with Applications, 42(1),

306-324.

Nguyen, T. H., & Shirai, K. (2015). (2015). Topic modeling based sentiment analysis on

social media for stock market prediction. Paper presented at the Acl (1), 1354-1364.

Oh, K. J., Kim, T. Y., & Kim, C. (2006). An early warning system for detection of

financial crisis using financial market volatility. Expert Systems, 23(2), 83-98.

Okada, K. (2017). Application of AI techniques in finance - considerations. Journal of

Securities Analysts Journal, 55, 68-73.

Okada, K. (2018). Application of AI techniques in finance. Monthly Capital Market,

(393), 16-25.

Olson, B. A., Mazzuchi, T. A., Sarkani, S., & Forsberg, K. (2012). Problem management

process, filling the gap in the systems engineering processes between the risk and

opportunity processes. Systems Engineering, 15(3), 275-286.

Olson, D. L., Delen, D., & Meng, Y. (2012). Comparative analysis of data mining

methods for bankruptcy prediction. Decision Support Systems, 52(2), 464-473.

Onsumran, C., Thammaboosadee, S., & Kiattisin, S. (2015). Gold price volatility

prediction by text mining in economic indicators news. Journal of Advances in

Information Technology Vol, 6(4)

82

Pal, S. K., & Mitra, S. (1992). Multilayer perceptron, fuzzy sets, and classification. IEEE

Transactions on Neural Networks, 3(5), 683-697.

Patel, J., Shah, S., Thakkar, P., & Kotecha, K. (2015). Predicting stock market index

using fusion of machine learning techniques. Expert Systems with Applications,

42(4), 2162-2172.

Paul, M. J., & Dredze, M. (2012). A model for mining public health topics from twitter.

Health, 11, 16.

Peramunetilleke, D., & Wong, R. K. (2002). Currency exchange rate forecasting from

news headlines. Australian Computer Science Communications, 24(2), 131-139.

Quinlan, J. R. (1993). C4. 5: Programming for machine learning. Morgan Kauffmann, 38

Sakaki, T., Okazaki, M., & Matsuo, Y. (2010). (2010). Earthquake shakes twitter users:

Real-time event detection by social sensors. Paper presented at the Proceedings of

the 19th International Conference on World Wide Web, 851-860.

Schumaker, R. P., & Chen, H. (2009). Textual analysis of stock market prediction using

breaking financial news: The AZFin text system. ACM Transactions on Information

Systems (TOIS), 27(2), 12.

Sharpe, W. F. (1964). Capital asset prices: A theory of market equilibrium under

conditions of risk. The Journal of Finance, 19(3), 425-442.

83

Smales, L. A. (2014). News sentiment and the investor fear gauge. Finance Research

Letters, 11(2), 122-130.

Stone, P. J., & Hunt, E. B. (1963). (1963). A computer approach to content analysis:

Studies using the general inquirer system. Paper presented at the Proceedings of the

may 21-23, 1963, Spring Joint Computer Conference, 241-256.

Strait, M. J., Haynes, J. A., & Foltz, P. W. (2000). (2000). Applications of latent semantic

analysis to lessons learned systems. Paper presented at the Intelligent Lessons

Learned Systems: Papers from the AAAI Workshop, 51-53.

Sundarkumar, G. G., Ravi, V., Nwogu, I., & Govindaraju, V. (2015). (2015). Malware

detection via API calls, topic models and machine learning. Paper presented at the

Automation Science and Engineering (CASE), 2015 IEEE International Conference

On, 1212-1217.

Tetlock, P. C. (2007). Giving content to investor sentiment: The role of media in the

stock market. The Journal of Finance, 62(3), 1139-1168.

Tetlock, P. C., Saar‐Tsechansky, M., & Macskassy, S. (2008). More than words:

Quantifying language to measure firms' fundamentals. The Journal of Finance,

63(3), 1437-1467.

Wang, W. M., Cheung, C. F., Lee, W. B., & Kwok, S. K. (2008). Self-associated concept

mapping for representation, elicitation and inference of knowledge. Knowledge-

Based Systems, 21(1), 52-61.


Appendices

Appendix A: Factor Library

Various natural language processing functions used to build the features (multi-factors)

# Sentiment
import pandas as pd, s3fs, re
from sklearn.feature_extraction import stop_words

fs = s3fs.S3FileSystem()

# import and organize the Loughran and McDonald dictionary
lm = pd.read_csv('LoughranMcDonald_MasterDictionary_2016.csv')
sentimentLM = lm[['Word']].assign(
    Negative = lambda x: x.Word.where(lm.Negative > 0),
    Positive = lambda x: x.Word.where(lm.Positive > 0),
    Uncertainty = lambda x: x.Word.where(lm.Uncertainty > 0),
    Litigious = lambda x: x.Word.where(lm.Litigious > 0),
    Constraining = lambda x: x.Word.where(lm.Constraining > 0),
    Superfluous = lambda x: x.Word.where(lm.Superfluous > 0)
).drop('Word', axis=1).dropna(how='all')
del(lm)

# import and organize the Harvard General Inquirer dictionary
gi = pd.read_excel('inquirerbasic.xls')
sentimentGI = gi[['Entry']].assign(
    Negative = lambda x: x.Entry.where(~gi.Negativ.isna()),
    Positive = lambda x: x.Entry.where(~gi.Positiv.isna())
).drop('Entry', axis=1).dropna(how='all')
# strip the '#' word-sense markers from General Inquirer entries
sentimentGI.Positive = [i.split('#')[0] if type(i) is str else None for i in sentimentGI.Positive]
sentimentGI.Negative = [i.split('#')[0] if type(i) is str else None for i in sentimentGI.Negative]
sentimentGI = sentimentGI.drop_duplicates()
del(gi)

def getSentiment(txt, method='LM', metrics='net'):
    """
    Provide a sentiment score based on the Loughran and McDonald (LM)
    dictionary or the Harvard General Inquirer (GI) dictionary.

    Formula
    -------
    # of words in the specific dictionary (e.g., LM positive) / total # of words

    Parameters
    ----------
    txt : list of string
        input tokens (lowercased)
    method : string ('LM' or 'GI'), optional
        select a dictionary to use
    metrics : string ('positive', 'negative', 'net', 'uncertainty',
        'litigious', 'constraining', 'superfluous'), optional
        select a metric to generate (the last four are LM only)

    Returns
    -------
    score : double
        sentiment score based on the selected method
    """
    if method == 'LM':
        sentiment = sentimentLM.copy()
    elif method == 'GI':
        sentiment = sentimentGI.copy()
    else:
        raise ValueError('Wrong method')

    t = pd.Series(txt)
    positive = sum(t.isin(sentiment.Positive.str.lower())) / t.size
    negative = sum(t.isin(sentiment.Negative.str.lower())) / t.size
    net = positive - negative
    if method == 'LM':
        uncertainty = sum(t.isin(sentiment.Uncertainty.str.lower())) / t.size
        litigious = sum(t.isin(sentiment.Litigious.str.lower())) / t.size
        constraining = sum(t.isin(sentiment.Constraining.str.lower())) / t.size
        superfluous = sum(t.isin(sentiment.Superfluous.str.lower())) / t.size
    if metrics == 'positive': return positive
    elif metrics == 'negative': return negative
    elif metrics == 'net': return net
    elif method == 'LM' and metrics == 'uncertainty': return uncertainty
    elif method == 'LM' and metrics == 'litigious': return litigious
    elif method == 'LM' and metrics == 'constraining': return constraining
    elif method == 'LM' and metrics == 'superfluous': return superfluous
    else:
        raise ValueError('Wrong metrics')
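A brief usage sketch, assuming the dictionaries above have been loaded (the token list is illustrative, not from the praxis data):

# hypothetical example: lowercased tokens from a pre-processed headline
tokens = ['economy', 'slump', 'fears', 'weigh', 'markets']
print(getSentiment(tokens, method='LM', metrics='negative'))  # fraction of LM-negative words
print(getSentiment(tokens, method='GI', metrics='net'))       # GI positive share minus negative share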

# Similarity
def getSimilarity(x, y, method='cosine'):
    """
    Provide a similarity score between two token lists.

    Formula
    -------
    cosine : count for each word in x (dot product) count for each word in y
        / (number of words in x * number of words in y)
    jaccard : unique words in x (intersection) unique words in y
        / unique words in x (union) unique words in y

    Parameters
    ----------
    x : list of string
        input tokens
    y : list of string
        input tokens
    method : string ('cosine' or 'jaccard'), optional
        select a method to use

    Returns
    -------
    score : double
        similarity score based on the selected method
    """
    if method == 'cosine':
        s1 = pd.DataFrame(x, columns=['s1']).groupby('s1').s1.count()
        s2 = pd.DataFrame(y, columns=['s2']).groupby('s2').s2.count()
        ss = s1.to_frame().join(s2, how='outer').fillna(0).assign(ss=lambda x: x.s1 * x.s2).sum()
        return ss.ss / (ss.s1 * ss.s2)
    elif method == 'jaccard':
        return len(set(x).intersection(set(y))) / len(set(x).union(set(y)))
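A brief usage sketch with illustrative token lists:

a = ['rates', 'rise', 'markets', 'fall']
b = ['rates', 'rise', 'stocks', 'fall']
print(getSimilarity(a, b, method='jaccard'))  # 3 shared / 5 unique words = 0.6
print(getSimilarity(a, b, method='cosine'))   # dot product of counts / (4 * 4 words) = 3/16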

# Readability
from textstat.textstat import textstat

def getReadability(txt, method='fog'):
    """
    Provide a readability score based on the package textstat.

    Formula
    -------
    fog : 0.4 * (average # of words per sentence + percent of complex words)
    flesch : 206.835 - (1.015 * average # of words per sentence)
        - (84.6 * average # of syllables per word)

    Parameters
    ----------
    txt : list of string
        input tokens
    method : string ('fog' or 'flesch'), optional
        select a method to use

    Returns
    -------
    score : double
        readability score based on the selected method
    """
    # note: despite the 'fog' label and the formula documented above, this
    # implementation calls textstat.smog_index (the SMOG grade) rather than
    # textstat.gunning_fog; both are grade-level readability measures
    if method == 'fog': return textstat.smog_index(' '.join(txt))
    elif method == 'flesch': return textstat.flesch_reading_ease(' '.join(txt))

# registry of all implemented factors: level metrics, change metrics,
# similarity metrics, and readability metrics
nlp_metrics = {
    'Sentiment - LM - positive': {'function': getSentiment, 'method': 'LM', 'metrics': 'positive', 'isChange': False},
    'Sentiment - LM - negative': {'function': getSentiment, 'method': 'LM', 'metrics': 'negative', 'isChange': False},
    'Sentiment - LM - net': {'function': getSentiment, 'method': 'LM', 'metrics': 'net', 'isChange': False},
    'Sentiment - LM - uncertainty': {'function': getSentiment, 'method': 'LM', 'metrics': 'uncertainty', 'isChange': False},
    'Sentiment - LM - litigious': {'function': getSentiment, 'method': 'LM', 'metrics': 'litigious', 'isChange': False},
    'Sentiment - LM - constraining': {'function': getSentiment, 'method': 'LM', 'metrics': 'constraining', 'isChange': False},
    'Sentiment - LM - superfluous': {'function': getSentiment, 'method': 'LM', 'metrics': 'superfluous', 'isChange': False},
    'Sentiment - GI - positive': {'function': getSentiment, 'method': 'GI', 'metrics': 'positive', 'isChange': False},
    'Sentiment - GI - negative': {'function': getSentiment, 'method': 'GI', 'metrics': 'negative', 'isChange': False},
    'Sentiment - GI - net': {'function': getSentiment, 'method': 'GI', 'metrics': 'net', 'isChange': False},
    'Change in Sentiment - LM - positive': {'function': getSentiment, 'method': 'LM', 'metrics': 'positive', 'isChange': True},
    'Change in Sentiment - LM - negative': {'function': getSentiment, 'method': 'LM', 'metrics': 'negative', 'isChange': True},
    'Change in Sentiment - LM - net': {'function': getSentiment, 'method': 'LM', 'metrics': 'net', 'isChange': True},
    'Change in Sentiment - LM - uncertainty': {'function': getSentiment, 'method': 'LM', 'metrics': 'uncertainty', 'isChange': True},
    'Change in Sentiment - LM - litigious': {'function': getSentiment, 'method': 'LM', 'metrics': 'litigious', 'isChange': True},
    'Change in Sentiment - LM - constraining': {'function': getSentiment, 'method': 'LM', 'metrics': 'constraining', 'isChange': True},
    'Change in Sentiment - LM - superfluous': {'function': getSentiment, 'method': 'LM', 'metrics': 'superfluous', 'isChange': True},
    'Change in Sentiment - GI - positive': {'function': getSentiment, 'method': 'GI', 'metrics': 'positive', 'isChange': True},
    'Change in Sentiment - GI - negative': {'function': getSentiment, 'method': 'GI', 'metrics': 'negative', 'isChange': True},
    'Change in Sentiment - GI - net': {'function': getSentiment, 'method': 'GI', 'metrics': 'net', 'isChange': True},
    'Similarity - Jaccard': {'function': getSimilarity, 'method': 'jaccard'},
    'Similarity - Cosine': {'function': getSimilarity, 'method': 'cosine'},
    'Readability - Fog': {'function': getReadability, 'method': 'fog', 'isChange': False},
    'Readability - Flesch': {'function': getReadability, 'method': 'flesch', 'isChange': False},
    'Change in Readability - Fog': {'function': getReadability, 'method': 'fog', 'isChange': True},
    'Change in Readability - Flesch': {'function': getReadability, 'method': 'flesch', 'isChange': True}
}

def getNLPMetrics(newText, oldText=None):
    """
    Provide scores based on all the implemented NLP metrics.

    Formula
    -------
    see individual metrics

    Parameters
    ----------
    newText : string
        input text
    oldText : string, optional
        input text, needed if similarity and change metrics are desired

    Returns
    -------
    result : dict
        scores based on all the implemented NLP metrics
    """
    minNWords = 0
    if newText is None or type(newText) is not str:
        return None

    result = {}
    # lowercase, strip non-letter characters, collapse spaces, and drop English stop words
    newTextList = [i for i in re.sub('(?s) +', ' ', re.sub('(?s)[^a-z| +]', '', newText.lower())).split(' ')
                   if i not in stop_words.ENGLISH_STOP_WORDS]
    newTextList = [i for i in newTextList if i != '']
    if len(newTextList) <= minNWords:
        return None

    if oldText is not None:
        oldTextList = [i for i in re.sub('(?s) +', ' ', re.sub('(?s)[^a-z| +]', '', oldText.lower())).split(' ')
                       if i not in stop_words.ENGLISH_STOP_WORDS]
        oldTextList = [i for i in oldTextList if i != '']

    for k in nlp_metrics.keys():
        mt = nlp_metrics[k]
        if mt['function'] == getSentiment:
            if mt['isChange']:
                if oldText is not None and len(oldTextList) > minNWords:
                    result.update({k: mt['function'](newTextList, mt['method'], mt['metrics'])
                                   - mt['function'](oldTextList, mt['method'], mt['metrics'])})
            else:
                result.update({k: mt['function'](newTextList, mt['method'], mt['metrics'])})
        elif mt['function'] == getSimilarity and oldText is not None and len(oldTextList) > minNWords:
            result.update({k: mt['function'](newTextList, oldTextList, mt['method'])})
        elif mt['function'] == getReadability:
            if mt['isChange']:
                if oldText is not None and len(oldTextList) > minNWords:
                    result.update({k: mt['function'](newTextList, mt['method'])
                                   - mt['function'](oldTextList, mt['method'])})
            else:
                result.update({k: mt['function'](newTextList, mt['method'])})
    return result
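A usage sketch of the combined factor library (the sentences are illustrative):

today = 'Stocks tumbled as uncertainty over trade policy rattled investors.'
yesterday = 'Stocks rallied as trade talks progressed smoothly.'
metrics = getNLPMetrics(today, yesterday)  # dict of factor name -> score
for name, score in sorted(metrics.items()):
    print(name, round(score, 4))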


Appendix B: Optimum Number of Topics for Latent Dirichlet Allocation

Four algorithms to find the optimum number of topics for Latent Dirichlet Allocation

library("ldatuning")
library("RTextTools")

data <- read.csv('train_news.csv')
mxTrain <- create_matrix(data$News, language = "english",
                         removeNumbers = TRUE, stemWords = TRUE)
k <- FindTopicsNumber(mxTrain,
                      topics = seq(2, 100, 1),
                      metrics = c("Griffiths2004", "CaoJuan2009", "Arun2010", "Deveaud2014"),
                      mc.cores = 4)
write.csv(k, file = "k.csv", row.names = FALSE)
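The resulting k.csv can then be inspected to choose the number of topics: in the ldatuning package, the Griffiths2004 and Deveaud2014 metrics are to be maximized while the CaoJuan2009 and Arun2010 metrics are to be minimized, and FindTopicsNumber_plot(k) plots all four against the candidate topic counts.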


Appendix C: Main Multi-Factor Sentiment Analysis Code

Main script to prepare data, build features, and apply machine learning algorithms

import pandas as pd, numpy as np
from nlp_factors import *
from matplotlib import pyplot as plt
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn import linear_model
from sklearn.metrics import accuracy_score, precision_score, recall_score, confusion_matrix, roc_curve, f1_score
import warnings
warnings.filterwarnings('ignore')

seed = 19900727

# load the news and VIX data
news = pd.read_csv('wsj.csv', encoding='latin-1', index_col=0, parse_dates=True)
vix = pd.read_csv('vixcurrent.csv', index_col=0, parse_dates=True, skiprows=1)
news.loc[:'2014-12-31'].to_csv('train_news.csv', index=False)

# fit LDA (40 topics) on the training period and assign each article its top topic
vct = CountVectorizer(stop_words='english', token_pattern=r'[a-zA-Z]{3,}')
dt = vct.fit_transform(news.News)
lda = LatentDirichletAllocation(40, learning_method='batch', random_state=seed)
lda.fit(dt[:news.loc[:'2014-12-31'].index.size])
master = news.assign(Topic=[lda.transform(dt.getrow(i))[0].argsort()[-1] for i in range(news.index.size)])

# concatenate each day's articles per topic, plus an 'All' column across topics
master_data = master.pivot_table(index='Date', columns='Topic', values='News', aggfunc='sum')\
    .join(master.groupby('Date').News.sum().rename('All'))

def applyMethods(data):
    # compute all NLP metrics for one day, comparing against the prior day's text
    y = pd.DataFrame()
    for section in list(np.arange(0, 40)) + ['All']:
        try:
            x = getNLPMetrics(data[1][section],
                              master_data.iloc[list(master_data.index).index(data[0]) - 1][section])
            y = y.append(pd.DataFrame(list(x.values()),
                                      index=[str(section) + ' - ' + i for i in x.keys()],
                                      columns=[data[0]]))
        except:
            pass
    return y.T

# daily topic frequencies plus all NLP metrics form the feature matrix
full_metrics = master.pivot_table(index='Date', columns='Topic', values='News', aggfunc=len)
full_metrics.columns = [str(i) + ' - Topic Frequency' for i in full_metrics.columns]
full_metrics = full_metrics.join(pd.concat([applyMethods(dt) for dt in master_data.iterrows()]))

from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import BernoulliNB
from sklearn.svm import LinearSVC
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import VotingClassifier

def runML(data_in, sft=5):
    # label: 1 when the VIX close sft trading days ahead exceeds 1.1x today's close
    predicted = []
    data = data_in.join(vix.assign(chg=lambda x: x['VIX Close'].shift(-sft) / x['VIX Close'])[['chg']] > 1.1,
                        how='inner')\
        .fillna(0).reset_index('Date').assign(year=lambda x: [i.year for i in x.Date]).set_index(['year', 'Date'])

    important = pd.DataFrame()

    # walk-forward validation: train on the trailing five years, test on the next year
    for yr in range(2015, 2019):
        train = data.loc[(yr - 5):(yr - 1)]
        test = data.loc[yr]

        train = train.fillna(0)
        cw = {0: train.query('not chg').size / train.size, 1: train.query('chg').size / train.size}
        cw2 = 'balanced'
        rf1 = RandomForestClassifier(random_state=seed, class_weight=cw)
        rf2 = DecisionTreeClassifier(random_state=seed, class_weight=cw)
        rf3 = BernoulliNB()
        rf4 = LinearSVC(random_state=seed)
        rf5 = MLPClassifier(random_state=seed)
        rf6 = LogisticRegression(random_state=seed)
        rf7 = KNeighborsClassifier()
        rf = VotingClassifier([('1', rf1), ('2', rf2), ('3', rf3), ('4', rf4), ('5', rf5), ('6', rf6), ('7', rf7)],
                              voting='hard')
        rf = rf1  # the random forest alone is used; the voting ensemble above is overridden
        rf.fit(X=train.drop('chg', axis=1), y=train.chg)
        predicted += rf.predict(test.drop('chg', axis=1)).tolist()
        important = important.append(pd.DataFrame(rf.feature_importances_, columns=[yr],
                                                  index=test.drop('chg', axis=1).columns).T)

    return f1_score(data.loc[2015:].chg, predicted),\
        precision_score(data.loc[2015:].chg, predicted),\
        recall_score(data.loc[2015:].chg, predicted),\
        accuracy_score(data.loc[2015:].chg, predicted),\
        important

# evaluate single-factor baselines, then the full multi-factor model
print(runML(full_metrics[['All - Sentiment - LM - negative']])[0])
print(runML(full_metrics[['All - Sentiment - GI - negative']])[0])
print(runML(full_metrics)[0])

# rank individual factors by F-score
pd.DataFrame([runML(full_metrics[[i]])[0] for i in full_metrics.columns],
             index=full_metrics.columns, columns=['f_score']).sort_values('f_score', ascending=False)

# top 10 features by average random forest importance (in percent)
runML(full_metrics)[-1].T.assign(avg=lambda x: x.mean(axis=1)).sort_values('avg', ascending=False).head(10) * 100
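In summary, runML labels each day as positive when the VIX close sft trading days ahead (five by default) exceeds 1.1 times the current close, trains on the trailing five years, tests on the following year in walk-forward fashion from 2015 through 2018, and reports the F-score, precision, recall, accuracy, and random forest feature importances.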


Appendix D: News Data (Sample)

Date News

1/1/2018 The unemployment rate in some metro areas stands near or even below 3%, and the tighter labor market is leading firms to raise pay to attract employees.

1/1/2018 Higher minimum wages will take effect in 18 states and almost two dozen municipalities this year.

1/1/2018 Retirement systems that manage money for public workers aren't pulling back on costly market bets.

1/1/2018 Proposed changes to offshore oil-drilling rules are raising questions over the role of safety regulators at the Interior Department.

1/1/2018 France's minister for economy and finance said Paris is looking to China and Russia to act as a counterweight to trade relations with the U.S. and Britain.

1/1/2018 The death of Playboy founder Hefner is ushering in a new era for the adult-entertainment enterprise, including possibly closing the U.S. print edition.

1/1/2018 The parent of Sears and Kmart hasn't run paid national television commercials since late November.

1/1/2018 A federal judge ruled in favor of financier Tilton in a racketeering lawsuit brought by managers of the Zohar investment funds.

1/1/2018 PwC was negligent in connection with one of the biggest bank failures of the financial crisis, a federal judge ruled.

1/1/2018 The price of Ripple, a digital currency, surged 50%, pushing its market valuation to $85 billion.

1/1/2018 PricewaterhouseCoopers was found negligent in connection with one of the biggest bank failures of the financial crisis, a federal judge has ruled, opening the auditor to the potential of millions of dollars in damages.

1/1/2018 Sears Holdings hasn't paid for any national TV spots for its struggling Sears and Kmart chains since late November, as its CEO shifts advertising to digital channels.

1/1/2018 The Trump administration's proposed changes to offshore oil drilling rules are raising fundamental questions over whether safety regulators at the Interior Department should also be concerned with promoting oil and gas production.


Appendix E: VIX Index Data (Sample)

Date VIX Open VIX High VIX Low VIX Close

1/2/2004 17.96 18.68 17.54 18.22

1/5/2004 18.45 18.49 17.44 17.49

1/6/2004 17.66 17.67 16.19 16.73

1/7/2004 16.72 16.75 15.5 15.5

1/8/2004 15.42 15.68 15.32 15.61

1/9/2004 16.15 16.88 15.57 16.75

1/12/2004 17.32 17.46 16.79 16.82

1/13/2004 16.6 18.33 16.53 18.04

1/14/2004 17.29 17.3 16.4 16.75

1/15/2004 17.07 17.31 15.49 15.56

1/16/2004 15.4 15.44 14.9 15

1/20/2004 15.77 16.13 15.09 15.21

1/21/2004 15.63 15.63 14.24 14.34

1/22/2004 14.2 14.87 14.01 14.71

1/23/2004 14.73 15.05 14.56 14.84

1/26/2004 15.78 15.78 14.52 14.55

1/27/2004 15.28 15.44 14.74 15.35

1/28/2004 15.37 17.06 15.29 16.78

1/29/2004 16.88 17.66 16.79 17.14

1/30/2004 16.55 17.35 16.55 16.63

2/2/2004 17.45 17.56 16.67 17.11
