How Google Search Trends Can Be Used As Technical Indicators for the S&P500-Index

DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS STOCKHOLM, SWEDEN 2018 How Google Search Trends Can Be Used as Technical Indicators for the S&P500-Index A Time Series Analysis Using Granger’s Causality Test ALBIN GRANELL FILIP CARLSSON KTH ROYAL INSTITUTE OF TECHNOLOGY SCHOOL OF ENGINEERING SCIENCES How Google Search Trends Can Be Used as Technical Indicators for the S&P500-Index A Time Series Analysis Using Granger’s Causality Test ALBIN GRANELL FILIP CARLSSON Degree Projects in Applied Mathematics and Industrial Economics Degree Programme in Industrial Engineering and Management KTH Royal Institute of Technology year 2018 Supervisors at KTH: Jörgen Säve-Söderbergh, Julia Liljegren Examiner at KTH: Henrik Hult TRITA-SCI-GRU 2018:182 MAT-K 2018:01 Royal Institute of Technology School of Engineering Sciences KTH SCI SE-100 44 Stockholm, Sweden URL: www.kth.se/sci Abstract This thesis studies whether Google search trends can be used as indicators for movements in the S&P500 index. Using Granger's causality test, the level of causality between movements in the S&P500 index and Google search volumes for certain keywords is analyzed. The result of the analysis is used to form an investment strategy entirely based on Google search volumes, which is then backtested over a five year period using historic data. The causality tests show that 8 of 30 words indicate causality at a 10% level of significance, where one word, mortgage, indicates causality at a 1% level of significance. Several investment strategies based on search volumes yield higher returns than the index itself over the considered five year period, where the best performing strategy beats the index with over 60 percentage units. 1 Hur Google-söktrenderkan användassom tekniska indikatorer för SP500-indexet: en tidsserieanalys med hjälpav Grangers kausalitetstest Sammanfattning Denna uppsats studerar huruvida Google-söktrenderkan användas som indikatorer förrörelseri S&P500-indexet. Genom Grangers kausalitetstest studeras kausalitetsniv˚anmellan rörelseri S&P500 och Google- sökvolymer försärskilltutvalda nyckelord. Resultaten i denna analys används i sin tur föratt utforma en investeringsstrategi enbart baserad p˚aGoogle-sökvolymer, som med hjälpav historisk data prövas över en fem˚arsperiod. Resultaten av kausalitetstestet visar att 8 av 30 ord in- dikerar en kausalitet p˚aen 10%-ig signifikansniv˚a,varav ett av orden, mortgage, p˚avisarkausalitet p˚aen 1%-ig signifikansniv˚a.Flera invester- ingsstrategier baserade p˚asökvolymer genererar högreavkastning änin- dexet självtöver den prövade fem˚arsperioden, därden bästastrategin sl˚ar index med över 60 procentenheter. 3 Acknowledgements We would like to thank our supervisors at the Royal Institute of Tech- nology (KTH), PärJörgenSäve-Söderbergh and Julia Liljegren for their support before and throughout the study. 5 Contents 1 Introduction 9 1.1 Background . .9 1.2 Objective . .9 1.3 Problem Statement . 10 1.4 Limitations . 10 1.5 Previous Research . 10 2 Theoretical Framework 12 2.1 Technical Indicators . 12 2.2 Financial Theory . 12 2.2.1 Efficient Market Hypothesis (EMH) . 12 2.2.2 Behavioural Finance . 13 2.3 Mathematical Framework . 13 2.3.1 Vector Autoregression (VAR) . 13 2.3.2 VAR Order Selection . 14 2.3.3 Stable VAR process . 15 2.3.4 Stationarity . 16 2.3.5 Augmented Dickey-Fuller Test . 16 2.3.6 OLS Estimation of VAR Parameters . 16 2.3.7 Breusch-Godfrey test . 17 2.3.8 Granger-Causality . 17 2.3.9 F-Statistics for Granger-Causality . 18 3 Method 19 3.1 Word selection . 19 3.2 Data collection . 21 3.2.1 Search data . 21 3.2.2 S&P500-Index . 22 3.3 Investment Strategies . 22 3.4 Outline . 24 4 Results 26 4.1 Transformation of Data . 26 4.2 Selection of lag order . 26 4.3 Model Validation . 27 4.4 Granger-Causality Tests . 28 4.5 Backtesting Investment Strategies . 29 4.5.1 Strategy 1 . 29 4.5.2 Strategy 2 . 30 4.5.3 Strategy 3 . 31 5 Discussion 32 5.1 Interpretation of Results . 32 5.1.1 Granger-Causality Test . 32 5.1.2 Investment Strategies . 33 5.1.3 Comparison to Previous Findings . 34 5.1.4 Financial Implications . 35 5.2 Sources of Errors . 36 7 5.2.1 Mathematical Sources of Errors . 36 5.2.2 Errors From Data Collection . 36 5.2.3 Nature of the Financial Market . 37 5.3 Further Research . 37 5.4 Conclusion . 38 References 40 A Appendix 41 A.1 Augmented Dickey-Fuller test . 41 A.2 Strategy 1 Returns . 43 A.3 Strategy 2 Returns . 44 A.4 Strategy 3 Returns . 45 8 1 Introduction 1.1 Background In the beginning of the 21st century, papers, books, tv broadcasting and radio were the main sources of information. Today, this has changed as the Internet has developed and changed our way of living. Nowadays, top news are shown as a pop-up notification in smartphones only minutes, sometimes even seconds after the occurrence and information is never more than an online search away. Simultaneously with this rapid change Google has become the number one search engine worldwide with trillions of searches every year and a 91% online search market share by February 2018.[1] In 2010 Google's Executive Chairman Eric Schmidt claimed that the information gathered over two days, equals the accumulated amount from the dawn of mankind up to 2003.[2] The new era of big data creates new possibilities and several businesses see it as the holy grail for finally being able to predict who, where and when customers will buy their products.[3] Despite the emergence of big data, the increase of information used does not compare as only about one percent of the data collected is analyzed.[4] Thus, there is a lot of unexplored possibilities in the new era of big data. Today's most commonly used technical trading indicators have not been influenced by the increase in big data, as they are still mainly based on momentum calculated from trading volumes, volatility and historical returns of the considered asset.[5] Such indicators are used by investors in order to analyze price charts of financial assets to be able to, for example, predict future stock price movements. Unlike fundamental analysis, in which investors try to determine whether a company is under- or overvalued, technical analysis does not consider the fundamental value of the stock. Instead indicators are used to identify patterns and in that way predict short term movements in the price of the considered asset.[6] 1.2 Objective The thesis investigates whether there exists a causal relationship between online search activity and the overall performance of the stock market. Today, many investors base their trading on technical indicators or key performance indicators, such as price-earnings ratios, earnings per shares, historic returns etc. However, as a result of the Internet's, Google's in particular, increasing influence on peoples day-to-day life it is reasonable to believe that data from online activity potentially could reflect the overall state of the economy. As further discussed in section 1.6, there is no prevailing consensus on the topic, as previous studies come to different conclusions using various methods. The objective of the thesis is to find mathematically substantiated evidence, through Granger-causality tests, that Google search volumes can be used as a technical indicator for movements on the S&P500 index. Furthermore, based on the results of the causality tests, the thesis aims to find a trading algorithm using Google search volumes that, using a backtest strategy, can be 9 shown to give a higher return than the index itself over a 5-year period. 1.3 Problem Statement The problem statement is broken down into two general questions underlying the thesis: • Can Google search volumes be used as a technical indicator for the S&P500 index? • Can a successful investment strategy be based on these potential findings? 1.4 Limitations The thesis only considers the S&P500 index and 30 selected keywords. Search volumes and index prices are limited to the period of March 24th 2013 to March 24th 2018. As the stock markets and overall economy in different countries may vary it is not reasonable to assume that there is an overall global trend in the economy, in the sense of cause-effect mechanisms from Google searches. Thus, in order for the search data to best represent the trend of the American stock market (i.e. the S&P500 index), the search data is geographically limited to within the United States. 1.5 Previous Research Previous research on Google search trends and their predictive ability on the financial market has been conducted in different scale, using various approaches, leading to different conclusions. This section presents a selection of the studies, the tests conducted and their findings. Several studies have managed to prove the predictable properties of Google search volumes on different economic and social indicators. Varian et al. showed how Google trends can be used to forecast near term economic indicators such as automobile sales, travel destinations, and unemployment rates.[7] Four years later L. Kristoufek et al. used an autoregressive approach to show that Google search volumes also significantly increases the accuracy of the prediction of suicide rates in the UK, compared to only using historical rates for forecasting.[8] In 2013 Moat, Preis et al. empirically studied the relationship between Google search trends and the financial market with a quantifying approach. By analyzing the search volumes the study identified patterns that could be interpreted as \early warning signs" of upcoming stock market moves.

Load more