Individual Investors, Social Media and Chinese Stock Market: a Correlation Study

By
Yonghui Wu
B.E., Shanghai Jiao Tong University, 2007
M.E., Shanghai Jiao Tong University, 2010

SUBMITTED TO THE MIT SLOAN SCHOOL OF MANAGEMENT IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE IN MANAGEMENT STUDIES AT THE MASSACHUSETTS INSTITUTE OF TECHNOLOGY

JUNE 2016

© 2016 Yonghui Wu. All rights reserved.

The author hereby grants to MIT permission to reproduce and to distribute publicly paper and electronic copies of this thesis document in whole or in part in any medium now known or hereafter created.

Signature of Author: [signature redacted], MIT Sloan School of Management, May 6, 2016
Certified by: [signature redacted], Erik Brynjolfsson, Schussel Family Professor, Thesis Supervisor
Accepted by: [signature redacted], Rodrigo S. Verdi, Associate Professor of Accounting, Program Director, M.S. in Management Studies Program, MIT Sloan School of Management

Submitted to the MIT Sloan School of Management on May 6, 2016 in partial fulfillment of the requirements for the degree of Master of Science in Management Studies.

ABSTRACT

The Chinese stock market is a unique financial market with heavy involvement of individual investors. This article explores how the sentiment expressed on social media is correlated with the stock market in China. Textual analysis of posts from one of the most popular social media platforms in China is conducted using Hownet and NTUSD, two of the most commonly used Chinese sentiment dictionaries. The correlation matrices and regressions between sentiment ratios and returns over 9 holding periods for all 30 sample securities reveal that investor sentiment on social media is correlated with the future returns of the Chinese stock market.
In addition, I find that the negative sentiment ratio is superior to the positive sentiment ratio, and that the correlation of the sentiment ratio with returns persists in future holding periods. Also, by comparing different stocks and indices, I find that well-established market indices correlate better with social media sentiment than individual stocks do, and that well-known 'star' stocks correlate better with social media than other stocks. However, when I test a VAR model on the Shanghai Composite Index, I find that the model is stable but shows no Granger causality. Better data and improved analysis are needed to predict the stock market with social media.

Thesis Supervisor: Erik Brynjolfsson
Title: Schussel Family Professor

Acknowledgements

I feel grateful and privileged to have worked with my thesis advisor, Professor Erik Brynjolfsson. I would like to thank him for guiding me through the thesis process, and for his prompt feedback and suggestions regarding the directions and resources of this study. I would also like to thank Professor Marshall Van Alstyne and my fellow students in the class Economics of Digitalization for their valuable comments and encouragement on this research.

This study was very new and challenging for me because I had little prior experience in programming. This thesis would not have been possible without the help of my friend Lerith Tian. Lerith has helped me tremendously with Python programming and textual analysis. I am very grateful for his help and have learned a lot from his patient guidance.

I also benefited a lot from my other friends. Shuyi Yu provided me with many valuable suggestions on statistical analysis. Shan Huang helped me narrow down the research scope at the very beginning. Alora Chen, Jin Jing Liu and Liam O'Dea greatly supported me during my preparation for this thesis. I am indebted to these dear friends of mine.
Last but not least, I would like to thank my parents Shunfeng Wu and Ganying Deng as well as my sister Yonghong Wu. Thank you for always believing in me and standing behind all my endeavors.

Contents

1 Introduction
  1.1 Literature review on investor sentiment and the stock market
  1.2 Social media and stock markets in China
  1.3 Literature review on Chinese NLP
  1.4 Summary
2 Data
  2.1 Guba
    2.1.1 Guba as a social media in China
    2.1.2 Posts and samples
  2.2 Financial data
3 Method
  3.1 Dictionaries and word list
  3.2 Segmenting and parsing the posts
  3.3 Quantifying the positive and negative sentiment
  3.4 Regression methods
4 Results
  4.1 Correlation Analysis
    4.1.1 Positive ratio v.s. Negative ratio
    4.1.2 Different Holding Periods and securities
  4.2 Regression Analysis
    4.2.1 Positive ratio v.s. negative ratio
    4.2.2 Difference between stocks
    4.2.3 Difference between stocks and indices
  4.3 Time-series Analysis
    4.3.1 Lag selection
    4.3.2 VAR Model
    4.3.3 Stability test
    4.3.4 Granger causality analysis
5 Conclusion

List of Figures

1 Social network penetration in China from 2012 to 2018
2 Domestic market capitalization of stock exchanges in the world in 2014
3 Accumulated Number of Individual A Share Account in China
4 Trading Volume of Different Investor Type in China (2011, 2012)
5 Timespan and Posts Under the Selected 30 sections
6 Company Information of the 28 Selected Stocks
7 Summary of Method
8 List of Positive & Negative Words for Stock Market
9 Denotations of Returns
10 Correlation of Negative Ratio and Returns of Sample Stocks and Indices
11 Correlation of Positive Ratio to Returns of Sample Stocks and Indices
12 Average of Correlation Coefficients for Positive and Negative Ratio
13 Negative Correlations with Different Returns for Sample Securities
14 Positive Correlation with Different Holding Period Return
15 Sample Securities Ranked by Posts per Day
16 Insignificant Positive Ratio and Significant Negative Ratio for 11 Stocks
17 Positive Ratio and Negative Ratio are Both Significant for 17 stocks
18 Outliers in Regression Coefficients
19 Coefficients of Positive Ratio and Negative Ratio for Stocks
20 Regression Results for Indices
21 Coefficients for Positive Ratio: Indices v.s. Stocks
22 Coefficients for Negative Ratio: Indices v.s. Stocks
23 Lag Length Selection
24 VAR Model
25 Unit Root Check
26 Granger Causality Test

1 Introduction

1.1 Literature review on investor sentiment and the stock market

Behavioral science tells us that emotions can influence people's decisions.
In finance, many researchers in behavioral finance have found that stock performance is affected by investor behavior and sentiment. The standard finance model assumes that unemotional investors always force capital market prices to equal the rational present value of expected future cash flows; behavioral finance has grown substantially in the past decade to augment this standard model. The first dimension of behavioral finance concerns behavior patterns. Many behavior patterns and biases have been discovered. For example, M. Seasholes and N. Zhu (2010) have found that individuals tilt their portfolios towards locally headquartered firms, and that this local bias does not bring them superior returns. J. Engelberg and C. Parsons (2011) have identified the causal effect of the local media on trading behavior: all else equal, local press coverage increases the daily trading volume of local retail investors. The second dimension of behavioral finance is related to sentiment. One of the most important assumptions in behavioral finance is that investors are subject to sentiment. Investor sentiment, defined broadly, is a belief about future cash flows and investment risks that is not justified by the facts at hand (Baker and Wurgler 2007). The question is no longer whether investor sentiment affects stock prices, but rather how to measure investor sentiment and quantify its effects. Many measurements have been developed, such as Investor Surveys (Qiu and Welch, 2004), Investor Mood (Kamstra, Kramer and Levi 2003), Retail Investor Trades (Barber, Odean, and Zhu, 2003), IPO First-Day Returns, and Option Implied Volatility (the Market Volatility Index, or VIX, which measures the implied volatility of options on the Standard and Poor's 100 stock index). However, those measurements are only proxies for investor sentiment, not direct measurements. In addition, data avail-