UNIVERSITEIT GENT

FACULTY OF ECONOMICS AND BUSINESS ADMINISTRATION

ACADEMIC YEAR 2015 – 2016

Predictive power of Google Trends analysis on Euronext Brussels stock performance

Master's dissertation submitted to obtain the degree of

Master of Science in Applied Economics: Business Engineering

Sam De Smet

under the supervision of

Prof. Dirk Van den Poel


PERMISSION

The undersigned declares that the contents of this master's dissertation may be consulted and/or reproduced, provided the source is acknowledged.

Sam De Smet


Abstract (translated from Dutch)

Thanks to the growth of the internet, we have access to a wealth of data at enormous speed. At the same time, virtually all of our own online actions are tracked, so that internet users in turn generate a stream of valuable information. A well-known example is gaining insight into consumer behavior through social networks such as Twitter and Facebook. Another way to map this online behavior is through the search terms entered into search engines such as Google and Yahoo.

This paper investigates the correlation between search engine query volumes and the performance of financial markets. We examine whether the number of times a given set of words is entered into a search engine has predictive power for financial market performance, based on a set of trading rules. We then investigate the possibility of developing a profitable strategy based on these search volumes.

First, this study tests the trading rules for each word in the set separately, using the BEL 20 market index as the traded financial product, over the period from 01/01/2004 to 01/03/2016. The results confirm that for a number of words a correlation exists between the relative changes in search volumes and the performance of the market index. When the trading rules are applied to these words, we obtain a cumulative return significantly above the market's. Over the full period, some words achieve a return of no less than 200% above the market return. The results further confirm that financially relevant words generate higher returns.

Second, this paper develops a model that combines the trading rules with predictive methods. The model is then tested on the BEL 20, as well as on six major international indices: AEX, CAC 40, DAX 30, DJIA, FTSE 100 and S&P 500. We find that for every market index the model yields an average return significantly above the average market return. Across the indices considered, the model's average return ranges between 0.18% and 0.61% per four-week period, and the cumulative returns over the period from 01/01/2004 to 01/03/2016 range from 110% to 380%.

This research shows that it is possible to design a profitable investment model based on data generated by search engines. The results underline the enormous potential of analyzing the traces that internet users leave online, in particular in the context of financial markets.


Preface

‘New Study Says You Can Predict The Stock Market With Google!’

That is the title of an article from Business Insider that somehow popped up on my Facebook feed in October 2014. The exact title, indeed, including the exclamation mark. With a mix of curiosity and skepticism I started reading. It marked the beginning of my master's thesis.

Nineteen months later I am proud of what I have accomplished. Doing the research and writing the paper has been a challenging and exciting project. I want to thank Prof. Dr. Van den Poel for giving me the opportunity to write my thesis on this interesting topic. I also want to thank Bram Steurtewagen for his insightful solutions and support.

This thesis is also the culmination of five years of hard work, which makes it a perfect moment to thank the people who made this possible. I want to thank my girlfriend and friends, who have been there for me along the entire road. Finally, a special word of thanks goes to my parents, whose support enabled me to always perform to the best of my abilities.


Table of contents

INTRODUCTION

1. LITERATURE

1.1. FINANCIAL MARKET HYPOTHESES

1.1.1. EFFICIENT MARKET HYPOTHESIS

1.1.2. ARE MARKETS EFFICIENT?

1.1.3. CONCLUSION

1.2. BEHAVIORAL ASPECTS

1.2.1. DECISION MAKING

1.2.2. ATTENTION

1.3. GOOGLE TRENDS AND SEARCH VOLUME INDICES

1.3.1. FUNCTIONALITIES

1.3.2. SEARCH VOLUME INDICES

1.3.3. TRANSLATED SEARCH TERMS

1.4. USING SEARCH ENGINE DATA

1.4.1. PREIS ET AL. (2013)

1.4.2. COMMENTS BY CHALLET & BEL HADJ AYED (2014)

1.4.3. RETURN PREDICTION

1.4.4. VOLATILITY AND VOLUME PREDICTION

1.5. PREDICTIVE ANALYSIS

1.5.1. TRAINING AND TESTING THE MODEL

1.6. MODELING ENVIRONMENT

1.6.1. R PROGRAMMING

1.6.2. GOOGLE TRENDS IN R: THE GTRENDSR PACKAGE

1.6.3. FINANCE IN R: THE QUANTMOD PACKAGE

2. RESEARCH

2.1. DATA

2.1.1. HISTORICAL CLOSING PRICES AND SEARCH VOLUMES

2.1.2. WORD SET

2.2. METHODOLOGY

2.2.1. RETURNS

2.2.2. TRADING RULES

2.2.3. MODEL

2.2.4. EXAMPLE

2.2.5. TIME PERIODS

2.3. TRANSACTION COSTS

2.4. RESULTS

2.4.1. SINGLE WORD RESULTS

2.4.2. MODEL RESULTS

3. CONCLUSIONS

4. LIMITATIONS AND FURTHER RESEARCH

4.1. DATA GRANULARITY

4.2. SVIS AND RETAIL INVESTORS

4.3. WORD SETS

4.4. TIES

4.5. DATA AVAILABILITY

REFERENCES

APPENDIX


List of figures

FIGURE 1. THE DECISION MAKING PROCESS (SIMON, 1955)
FIGURE 2. GOOGLE TRENDS OPTION BAR
FIGURE 3. INTERFACE OF GOOGLE TRENDS
FIGURE 4. RAW GOOGLE TRENDS DATA
FIGURE 5. GOOGLE TRENDS RELATED SEARCHES
FIGURE 6. GOOGLE TRENDS TOPIC SEARCHES
FIGURE 7. SVI OF COMPONENT PARTS OF HOUSE
FIGURE 8. SVI OF COMBINED SEARCH TERM HOUSE
FIGURE 9. DEFINITION OF ΔN
FIGURE 10. TRAINING AND TESTING OF THE MODEL
FIGURE 11. ROLLING-WINDOW PRINCIPLE
FIGURE 12. TIME PERIOD OF THE EXAMPLE
FIGURE 13. APPLYING THE STRATEGY OVER THE TOTAL TIME RANGE
FIGURE 14. DEFINITION OF THE TIME FRAME
FIGURE 15. RETURNS OF THE GT STRATEGY USING 'MARRIAGE'
FIGURE 16. CUMULATIVE RETURN FOR METHOD L ON BEL 20


List of tables

TABLE 1. TRANSLATIONS OF SEARCH TERMS
TABLE 2. WEIGHTS PER LANGUAGE
TABLE 3. TRADING SIGNALS BASED ON ΔN
TABLE 4. REVERSED GOOGLE TRENDS STRATEGY
TABLE 5. WORD SET COMPOSITION METHODS
TABLE 6. RETURNS FOR THE TRAINING PERIOD
TABLE 7. RETURNS FOR THE TESTING PERIOD
TABLE 8. CORRELATION BETWEEN FINANCIAL RELEVANCE AND SEARCH WORD PERFORMANCE
TABLE 9. AVERAGE FOUR-WEEKLY RETURNS WITHOUT TRANSACTION COSTS
TABLE 10. STATISTICAL SIGNIFICANCE WITHOUT TRANSACTION COSTS
TABLE 11. AVERAGE FOUR-WEEK RETURNS WITH TRANSACTION COSTS
TABLE 12. SIGNIFICANT RETURNS WITH TRANSACTION COSTS INCLUDED


List of terms

GT: Google Trends
SVI: Search volume index
ETF: Exchange traded fund
BEL 20: Leading index of the Brussels stock exchange
AEX: Leading index of the Amsterdam stock exchange
CAC 40: Leading index of the Paris stock exchange
DAX 30: Leading index of the Frankfurt stock exchange
DJIA: Dow Jones Industrial Average; leading index of Nasdaq and NYSE
FTSE 100: Leading index of the London Stock Exchange
S&P 500: Standard & Poor's 500; leading index of Nasdaq and NYSE


Introduction

The internet has changed the way we look at our world. As internet connections have become widespread, online sources have grown into the most important sources of information. Billions of websites with data on the most diverse subjects can be reached within milliseconds using search engines such as Google and Yahoo. News from across the world spreads at unprecedented speed through social networks such as Twitter and Facebook. The same services enable people from across the world to communicate with each other. But the internet can also help us change the way we look at ourselves. Many of our actions online, if not all of them, leave digital traces. Through their online behavior, people reveal useful information about their needs, wants, interests, and concerns. One way to analyze this behavior is to look at the search terms that are entered into search engines. Each time we use a search engine, we leave information about our interests, codified as search terms.

Research has shown that this search engine traffic can be analyzed to forecast trends in the real world. Nowcasting, as Castle et al. (2009) call it, refers to predicting a metric of a current state whose exact figure only becomes known at the end of a certain period (Challet & Bel Hadj Ayed, 2014). Patterns have been discovered showing that changes in the volumes of certain search terms correlate with disease infection rates (Ginsberg et al., 2008), GDP estimates (Castle et al., 2009), quarterly company earnings (Da et al., 2011) and US unemployment rates (Choi & Varian, 2011).

The financial markets, too, have become a research target for search query analysis. This is not surprising, as stock market prediction has always attracted a lot of attention from scientists. Many answers have been formulated to the question of whether the stock market can be predicted. The Efficient Market Hypothesis (EMH) states that investors behave rationally, that stock prices fully reflect all available information, and that changes in stock prices are driven only by new information. Recent studies, however, have shown that information freely available on the internet in some cases has predictive power for stock market volumes and returns.

The purpose of the research presented in this paper is twofold. First, we test the correlation between search volumes and market index returns. Next, we analyze whether search volume data can be used to construct a profitable trading strategy. Using a methodology introduced by Preis et al. (2013), we further investigate the possibility of discovering insights into market dynamics through the analysis of search volumes.

The paper is organized as follows. Section one outlines the context of the research: the theoretical background of financial markets is explained, together with its implications, an overview of the existing literature on the topic is provided, and the principles of Google search queries are introduced. Section two discusses our own research: the trading model is explained and the results are presented. The conclusions of our research are provided in section three, and its limitations and potential avenues for further research in section four. The appendices can be found at the end of the paper.

1. Literature

1.1. Financial market hypotheses

Since we will be working in the field of stock markets, an introduction to the basic underlying principles of financial markets is given below. This includes a primer on the Efficient Market Hypothesis and its theoretical and empirical consequences. The structure of this section is based on Frömmel (2013).

1.1.1. Efficient market hypothesis

The Efficient Market Hypothesis (EMH) was formulated by Fama (1970) and built upon concepts developed in the 19th century. The hypothesis states that financial asset prices follow a random walk [1] and fully reflect all information. Therefore, the quoted stock prices are always correct. As all information is already incorporated in the stock price, the analysis of any available data does not generate excess returns over an entirely random diversified investment strategy. This implies that stock market movements are driven by news, which is by definition unpredictable. In the EMH, news spreads immediately and is incorporated into financial asset prices without delay. The difference between the price at a certain moment and the price at any other moment depends entirely on the news that reached investors in the period between those dates.

Fama (1970) distinguishes three degrees of market efficiency: weak form efficiency, semi-strong form efficiency and strong form efficiency.

If current financial asset prices reflect all information from historical prices, the market is weak-form efficient. This means that an investment strategy based on the analysis of historical prices does not earn excess returns in the long run, which implies that technical analysis should be unprofitable (Frömmel, 2013). Also, no patterns or serial dependencies between financial asset prices can exist during a certain time period. Asset prices follow a random walk, and are therefore by no means predictable.

A market shows semi-strong form efficiency if asset prices reflect all publicly available information. This means that next to historical price movements, all company-specific data (e.g. earnings, sales figures), trading data (e.g. volume, momentum) and macro-economic data (e.g. unemployment figures, central bank policy) are included in the asset price. In a semi-strong efficient market, fundamental analysis does not earn excess returns, except when the investor has access to private information not available to the public. In many countries this would be considered a case of illegal insider trading [2].

[1] A time series follows a random walk if subsequent changes in a variable represent random deviations from the past changes in that variable. In the case of asset prices this translates to a 'price series where all subsequent price changes represent random departures from previous prices' (Malkiel, 2003).

In the strictest sense of efficiency, strong-form efficiency, no investor can consistently outperform the market over a long period, not even with access to private information. This form of efficiency is impossible as long as legal barriers prevent private information from becoming public, which is indeed the case.

Grossman and Stiglitz (1980) point out that an efficient market implies that no information processing takes place, as financial institutions have no incentive to incur that cost. However, this absence of information processing in turn means that the market cannot be efficient. Jensen (1978) had already formulated a weaker definition of market efficiency, taking into account the costs of information processing and of transactions, a definition ultimately supported by Fama (1991). This weakened definition states that financial asset prices reflect all available information as long as the marginal profit made from this information exceeds the marginal cost of processing it.

1.1.2. Are markets efficient?

The question whether the EMH holds is important, as it determines whether the processing of existing information can possibly earn excess returns. This has been the subject of countless research papers challenging the theory empirically and theoretically. Research has discovered a large number of market anomalies that contradict the EMH; for more detail we refer to Malkiel (2003). Furthermore, research analyzing the profitability of technical analysis (TA) shows that certain methods indeed enable investors to earn excess returns.

Following the null hypothesis that the EMH holds, our research should show that a trading strategy based on publicly available information cannot generate excess returns. If our results suggest that it is possible to create a strategy that beats the market using our trading rules, then we have found an anomaly in the EMH.

H0: the EMH holds

H1: there is an anomaly in the EMH

[2] Illegal insider trading is defined by the US Securities and Exchange Commission (SEC) as 'buying or selling a security, in breach of a fiduciary duty or other relationship of trust and confidence, while in possession of material, nonpublic information about the security.'

1.1.3. Conclusion

Malkiel (2003) concludes that the collective judgment of investors sometimes makes mistakes and that those investors are not always driven by purely rational behavior. As a result, incorrect asset prices and predictable patterns in stock returns can appear over time and even persist for short periods. This is in line with one of the justifications for the profitability of technical analysis formulated by Menkhoff and Taylor (2007). As mentioned by Grossman and Stiglitz (1980), the market cannot be perfectly efficient, as there would then be no incentive for professionals to uncover and process information. Malkiel (2003) predicts that researchers and practitioners will find further inconsistencies with the EMH, driven by the increase in available data and data processing capabilities. However, Malkiel's conclusion is that these patterns and anomalies will not persist and will not earn investors excess returns in the long run.

The EMH is often illustrated by a story of a finance professor and a student walking along the street. The student notices a $100 bill lying on the ground, and obviously stops to pick it up. ‘Don’t bother trying to pick it up,’ is the professor’s reaction. ‘If it was really a $100 bill, it wouldn’t be there.’ Referring to this well-known story, Malkiel (2003) concludes his research by stating: ‘If any $100 bills are lying around the stock exchanges of the world, they will not be lying around for long.’

1.2. Behavioral aspects

Some of the findings that stand in contrast to the EMH refer to behavioral aspects of financial decision making. With regard to our research, we focus on the importance of intelligence gathering in the decision making process, as well as on two theories concerning investors' attention and its impact on the markets.

1.2.1. Decision making

Stock prices and stock price movements are formed by the aggregated investment decisions of market participants. Investment decisions, just like any other decision, follow a stepwise process. A number of decision making models have been developed in the literature, but one step most of them have in common concerns information gathering. We illustrate the principle with the straightforward three-step decision making process identified by Simon (1955).

1. Intelligence: collect data and understand the problem
2. Design: find and/or generate alternatives
3. Choice: evaluate and select an alternative

Figure 1. The decision making process (Simon, 1955)

Figure 1 shows that market participants begin a decision making process by gathering information and understanding the problem. Due to the growth of the internet, these information gathering processes mainly happen online, where investors can find a wealth of reliable information in a split second. Analyzing search volumes from online search engines such as Google might thus generate new insights into this first step of the investment decision.

1.2.2. Attention

Two related theories link the attention of investors to their actions, and thus indirectly to share prices. The first is the investor recognition hypothesis, formulated by Merton (1987) and empirically confirmed by a number of studies (e.g. Lehavy & Sloan, 2008; Bodnaruk & Östberg, 2004). This hypothesis stipulates that individual investors only buy and hold shares of companies about which they have enough information. The definition can be expanded from company shares to virtually any financial asset. In this view, higher visibility of a financial asset leads to a higher degree of investor recognition, which in turn increases the probability that an investor buys the asset.

The second theory, the price-pressure hypothesis or attention theory, was formulated by Barber & Odean (2008). It claims that individual investors are net buyers of stocks that receive a lot of attention. This implies that an increase in attention to a company increases the probability that an investor buys the company's stock, which in turn results in a temporary rise in the stock price.

1.3. Google Trends and search volume indices

The search engine used in our research is Google. The most recent figures [3] (ranging from March 2015 until March 2016) show that Google has an average market share of 96.4% of all Belgian web searches, with Microsoft's Bing and Yahoo at 2.9% and 0.4%, respectively. It is therefore a fair assumption that the search volumes submitted through Google are representative of all search engines.

Google Trends (GT) is a freely available online service provided by Google, which allows internet users to analyze the relative search volume of different search terms. The results go back to 01/01/2004, and can be broken down by countries, regions, cities and language.

1.3.1. Functionalities

We illustrate the functionalities of GT using two straightforward search queries: pharmaceutical company Omega Pharma, and its founder Marc Coucke.

[3] The market share of search engines in Belgium can be obtained from HowWeBrowse.com, an open source project by Brussels-based digital advertising and business intelligence agency Semetis.


The user interface in Figure 2 shows how GT allows us to set the geographical location where we want to analyze the results. Next to that we have to set the time range. Using the categories option, we can select search categories (e.g. Business and Industrial) and subcategories (e.g. Pharmaceuticals & Biotech). The final menu gives us the option to analyze pure text web searches, or other searches like image searches or YouTube searches.

Figure 2. Google Trends option bar

1.3.2. Search volume indices

The results of Google Trends come in the form of relative search frequencies or search volume indices (SVIs) rather than absolute search frequencies, probably for privacy reasons. The SVI of a search word at a moment t is an integer between 0 and 100, calculated as the search volume V_t of the word at that moment t divided by the maximum search volume V_max,Δt of that word during a time period Δt. For the remainder of this research the SVI at a moment t will be denoted n_t.

SVI = n_t = V_t / V_max,Δt

With GT it is also possible to enter more than one search term at a time. If a user enters multiple search terms, the results for each word are benchmarked against the highest value among all words entered during the examined time period Δt. In our case we enter Omega Pharma and Marc Coucke simultaneously. The corresponding SVIs are given in Figure 3. The highest absolute search volume is reached for Omega Pharma in April 2012, which results in an SVI of 100 at that moment. As explained above, all other search volumes for Omega Pharma are then compared to that reference value to calculate the n_t for Omega Pharma. Since we entered the two search terms simultaneously, the SVI of Marc Coucke is also compared to the maximum volume among all search words; in this case, the same April 2012 peak of Omega Pharma.
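The normalization described above can be sketched numerically. The thesis itself works in R (Section 1.6); the sketch below uses Python, and the absolute volumes are invented for illustration, since Google does not publish them.

```python
# Sketch of Google Trends-style normalization: volumes are scaled to
# integers in [0, 100] relative to a benchmark maximum. All numbers
# are hypothetical.

def svi(volumes, benchmark_max):
    """Scale absolute volumes to SVIs relative to benchmark_max."""
    return [round(100 * v / benchmark_max) for v in volumes]

# Hypothetical absolute weekly volumes for two terms over the same weeks.
omega_pharma = [400, 1000, 250]   # peaks at 1000
marc_coucke = [100, 300, 500]

# Entered separately: each term is scaled to its own maximum.
print(svi(omega_pharma, max(omega_pharma)))  # [40, 100, 25]
print(svi(marc_coucke, max(marc_coucke)))    # [20, 60, 100]

# Entered simultaneously: both terms are scaled to the overall maximum,
# so the less popular term no longer reaches 100.
joint_max = max(omega_pharma + marc_coucke)
print(svi(marc_coucke, joint_max))           # [10, 30, 50]
```

Note how the same marc_coucke series yields different SVIs depending on whether the term is entered alone or together with a more popular term.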

The graph in Figure 4, generated from the extracted SVIs, shows a different pattern than the one generated by Google. This is explained by the fact that Google aggregates its results on a monthly basis on the website, but on a weekly basis when we extract the data. The monthly results peaked in April 2012 for Omega Pharma because of four successive weeks of high search volumes. However, the weekly maximum belongs to Marc Coucke, during the second week of September 2015. To avoid any inconsistencies, we will use R to process our data (see Section 1.6.).
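This aggregation effect can be shown with a toy example. The weekly values below are invented (they merely mimic the situation described above), and the sketch is in Python rather than the thesis's R:

```python
# Why weekly and monthly views of the same data can peak at different
# moments: four moderately high weeks can outweigh a month containing
# one extreme week once volumes are summed per month.

weekly = {
    "2012-04": [80, 85, 82, 88],   # steady interest, no extreme week
    "2015-09": [20, 95, 15, 10],   # one extreme week
}

# Monthly aggregation: sum the weeks of each month.
monthly_total = {month: sum(v) for month, v in weekly.items()}
print(monthly_total)               # {'2012-04': 335, '2015-09': 140}

peak_month = max(monthly_total, key=monthly_total.get)
peak_week_month = max(weekly, key=lambda m: max(weekly[m]))
print(peak_month)                  # 2012-04 (monthly view peaks here)
print(peak_week_month)             # 2015-09 (weekly view peaks here)
```

The month with the highest total differs from the month containing the highest single week, which is exactly the discrepancy between the website's monthly graph and the extracted weekly data.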

In our research we will extract search volume data from 01/01/2004 until 01/03/2016. When extracting search volumes over a long period Δt, the volumes are by default aggregated on a weekly scale, which means we cannot gain insight into inter- or intraday search volume trends and volatility. To solve this problem, we can download data for smaller time periods and join the resulting time series. Another remark regarding the data is that small changes in the Google Trends data are hidden by the rounding process. Next to that, every new maximum that is reached decreases the precision of the entire time series (and thus increases the granularity). Note also that Google Trends only provides data for search terms whose traffic exceeds a certain threshold, so that words with low search volumes will not appear. The Google Trends system also eliminates repeated queries from a single user over a short period of time, so that the level of interest is not artificially inflated by such behavior.
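Joining downloads for smaller time periods requires some care, because each download is rescaled to its own maximum. One plausible chaining rule (an illustrative assumption, not taken from the thesis, which does its processing in R) is to let consecutive windows share an overlapping observation and rescale the second window onto the first window's scale:

```python
# Sketch: chain two SVI windows via one shared (overlapping) observation.
# Each window is internally consistent but on its own 0-100 scale.

def chain(first, second):
    """Rescale `second` so that its first value matches the last value
    of `first`, then append the remaining observations."""
    factor = first[-1] / second[0]           # scale ratio at the overlap
    rescaled = [v * factor for v in second]
    return first + rescaled[1:]              # drop the duplicated point

window_a = [40.0, 80.0, 50.0]    # scaled to window A's own maximum
window_b = [100.0, 60.0, 20.0]   # first value = last week of window A
joined = chain(window_a, window_b)
print(joined)                    # [40.0, 80.0, 50.0, 30.0, 10.0]
```

The joined series is consistent relative to window A's scale; the rounding of SVIs to integers means such a reconstruction is approximate in practice.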

Figure 3. Interface of Google Trends

Figure 4. Raw Google Trends data (weekly SVIs of marc coucke and omega pharma)

Google further provides us with a list of related search terms (Figure 5) for each term we enter. In this case, popular search terms related to Omega Pharma refer to the professional cycling team of which the pharmaceutical company is the main sponsor. This example illustrates a caveat: GT cannot distinguish between the different contexts in which a search term is used. A similar problem occurs when looking at the search volumes of apple, referring to both the fruit and the company, or nest, again referring to both the structure built by birds and the company selling intelligent thermostats.


Figure 5. Google Trends related searches

In December 2013 Google added to GT a beta version allowing more refined searches. Using topics, Google's algorithms count all the search queries that may relate to the same topic, even including misspelled queries. When we apply this to Apple (Technology company), Apple (Fruit) and apple (Search term), we get a more detailed view (Figure 6). This example once again shows the wealth of data at your fingertips when using Google Trends. For the technology company we see distinct peaks each year in September, when the company's new iPhones are released. The search volume of the fruit shows a recurring trend as well, curiously around the same time as the technology company. Is the internet users' interest in apples spurred by the fact that they just bought a new iPhone? The explanation is probably a little less influenced by high tech, and a little more by mother nature: apples are usually harvested between August and October.

Figure 6. Google Trends topic searches


1.3.3. Translated search terms

1.3.3.1. Approach

An interesting characteristic of the model is related to language. Belgium is a trilingual state (Dutch, French, German), and English is also used frequently in both professional and personal environments. Therefore, if we want to analyze the search volumes of a certain word, we can combine the volumes of the four translations of that word. Below we construct an approach that enables us to combine the results of a word in the four languages. As an example we take the search volume of house, translated into the four languages used in Belgium. To get an insight into the popularity of house at a certain moment t we have to look at the search volume of house in the four different languages (Table 1).

Translated search terms

l   Language   Translation
D   Dutch      huis
F   French     maison
G   German     haus
E   English    house

Table 1. Translations of search terms

In Section 1.3.1. a distinction is made between entering one word in GT and entering multiple words simultaneously. We define the separate results as the SVIs of a word Q when Q is entered alone in Google Trends. The combined results are the SVIs of Q when it is entered simultaneously with other words; in this case the other words are its translations. We define:

n^c_{t,l} = SVI for language l in the combined results at moment t
n^s_{t,l} = SVI for language l in the separate results at moment t

The separate results for our example are shown graphically in Figure 7.

Figure 7. SVI of component parts of house (weekly SVIs of huis, maison, haus and house)


1.3.3.2. Aggregated SVI

Our goal is to calculate the aggregated SVI n_t of a word. To calculate n_t we need both the combined values n^c_{t,l} and the separate values n^s_{t,l}. The aggregated SVI n_t is the sum of the separate search indices n^s_{t,l}, weighted per language. With the weights w_l for each language l defined below, this translates to:

n_t = w_D × n^s_{t,D} + w_F × n^s_{t,F} + w_G × n^s_{t,G} + w_E × n^s_{t,E}

or, more compactly:

n_t = Σ_{l=D,F,G,E} w_l × n^s_{t,l}

The weights w_l are determined using the combined SVIs n^c_{t,l}; more specifically, we use their averages over the examined period. We define:

n̄^c_l = average SVI in the combined results for language l over period Δt

n̄^c_l = (1/Δt) × Σ_{τ=t}^{t+Δt} n^c_{τ,l}

The weight w_l for a word in language l is then calculated as the average combined SVI n̄^c_l for that language, divided by the sum of the average combined SVIs over all languages:

w_l = n̄^c_l / Σ_{l=D,F,G,E} n̄^c_l

Table 2 summarizes the results of applying this method to our example of house.

Weights w_l

l       n̄^c_l     w_l
D       39.35     27.96%
F       65.20     46.32%
G       1.80      1.28%
E       34.41     24.45%
Total   140.76    100%

Table 2. Weights per language
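The weight calculation behind Table 2 can be reproduced in a few lines. The average combined SVIs below are taken from the table; the sketch is in Python (the thesis itself uses R), and the separate SVIs at the end are hypothetical values added only to show the final aggregation step:

```python
# Reproduce the weights of Table 2 from the average combined SVIs,
# using w_l = avg_combined[l] / sum of avg_combined over all languages.

avg_combined = {"D": 39.35, "F": 65.20, "G": 1.80, "E": 34.41}

total = sum(avg_combined.values())
weights = {l: v / total for l, v in avg_combined.items()}

print(round(total, 2))   # 140.76
print({l: round(100 * w, 2) for l, w in weights.items()})
# {'D': 27.96, 'F': 46.32, 'G': 1.28, 'E': 24.45}

# The aggregated SVI for one period is then the weighted sum of the
# separate per-language SVIs (hypothetical values here):
separate = {"D": 60, "F": 40, "G": 10, "E": 55}
n_t = sum(weights[l] * separate[l] for l in weights)
```

The computed percentages match Table 2, confirming the formula above.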


Using these weights w_l we can now calculate the aggregated SVI n_t for each period. The results are plotted in Figure 8.

Figure 8. SVI of combined search term house

For the remainder of this research we will focus on the analysis of n_t as the SVI of stand-alone words that have not been translated. The approach to translating words described in this section is tested as a side step in the results.

1.4. Stock market prediction using search engine data

A number of academic papers have tested the possibility of predicting stock market volume or returns based on search engine data. Here we focus on the part of the literature that looks at predicting returns. We start by elaborating on the research of Preis et al. (2013), which provides the methodology used in our research, and on the comments by Challet & Bel Hadj Ayed (2014). After that, a number of important insights into the relationship between search engine data and stock market returns are explained.

1.4.1. Preis et al. (2013)

1.4.1.1. Approach

Preis et al. (2013) conducted research on the predictive power of Google Trends data on market index returns, in order to find patterns that may be interpreted as early warning signs of market index moves. This would imply that GT data not only reflect the current state of the stock market, but may also be able to predict certain future trends. The research was performed on the Dow Jones Industrial Average (DJIA).

Preis et al. (2013) develop a methodology to generate trading signals (buy or sell signals) based on changes in the search volume index (SVI) of a search term. These trading rules are based on the assumption that notable drops in the financial market, and thus in the market index, are preceded by periods of investor concern. The three-step decision making process of Simon (1955) suggests an explanation for this behavior (Section 1.2.1.). In the first step of the decision making process, Intelligence, market participants start gathering information and identifying all the opportunities. In periods preceding a large drop in financial asset prices, investors might indeed do just that: they might search for more information about the market conditions. Since the internet is one of the main information sources for investors, this behavior will result in a higher SVI for words related to market conditions. An increase in price, on the other hand, is assumed to be preceded by a decrease in search volume.

This results in a double hypothesis:

a. A decrease in the price of an asset is preceded by an increase in search volume for certain financially relevant terms;
b. An increase in the price of an asset is preceded by a decrease in search volume for certain financially relevant terms.

Based on this the authors create a hypothetical investment strategy called the Google Trends strategy (GT strategy). The methodology is further clarified in Section 2.2.

1.4.1.2. Results

Preis et al. analyze search volume data and financial data from January 2004 until February 2011. The GT strategy is executed for each word independently, each time over the entire period. The returns are overall significantly higher than those of random strategies and buy-and-hold strategies. However, the performance of the strategy depends on the particular search term chosen. For example, the top performing search term debt yields a return of no less than 326% between 2004 and 2011. It was also found that the differences in performance between search terms are correlated with the extent to which the terms are financially relevant. These results suggest that the trading rules can be used to construct a profitable trading model.

1.4.2. Comments by Challet & Bel Hadj Ayed (2014)

In a paper by Challet & Bel Hadj Ayed (2014) a number of remarks are formulated on the methodology and results of Preis et al. (2013). Our research takes the following remarks into account.

1.4.2.1. Trading strategy

Challet & Bel Hadj Ayed (2014) state that there is no reason why the hypothesized relationship between search volume and closing prices should hold for every word, nor why the relationship should remain the same for a word during the entire period. We cover this issue by also designing a strategy that is based on the exact opposite hypothesis. We then let the model decide which of the two strategies to use (see Section 2.2.2.).

1.4.2.2. No out-of-sample testing

Although Preis et al. (2013) conclude with the suggestion that search engine data ‘could have been exploited in the construction of profitable trading strategies’, this is by no means possible using the methodology described in the paper. The performance of the independent words is only assessed at the end of the time period. The results cannot be controlled for robustness or consistency, as the performance of the words has not been tested on an independent set of data (out-of-sample testing). We refer the reader to Leinweber’s article Stupid data mining tricks: Over-fitting the S&P 500 (2007) for the full story on the pitfalls of research without out-of-sample periods.

In our research this issue is handled by making use of principles from the field of predictive analytics, namely training and testing a model (see Section 1.5.).

1.4.2.3. No transaction fees

The absence of transaction costs results in unrealistically high returns. In our model the transaction costs are included as the sum of all costs and taxes related to trading on the financial market, resulting in more realistic model results (see Section 2.3.).

1.4.3. Return prediction

Da, Engelberg & Gao (2011) analyze the possibility of using Google Trends data to measure investor attention. Contrary to Preis et al. (2013), their research focuses on individual stocks instead of market indices and uses ticker symbols instead of words. However, their results lead to a number of interesting conclusions that are relevant to our model.

First, there is strong evidence that search volume data mostly capture the information gathering behavior of retail investors, who are generally less sophisticated. Since institutional investors have access to more specialized (and more expensive) information through services like Reuters or Bloomberg, it also makes sense intuitively that search queries from less sophisticated investors are disproportionately represented in SVIs.

Second, stocks whose ticker symbol’s SVI has increased during a week on average outperform the market by 0.3% over the following two weeks. This is especially true for small stocks and stocks frequently traded by less sophisticated investors. Looking at returns in the long run, the paper notes that the initial outperformance is entirely reversed within one year. These findings are also confirmed by Joseph et al. (2011) and provide strong support for the attention theory of Barber & Odean (2008).

1.4.4. Volatility and volume prediction

A number of studies research the relationship between search volumes and trading volumes and/or volatility. The results are largely consistent across studies, namely that an increase in search volume precedes an increase in trading activity and volatility in the market. Preis et al. (2010) find clear evidence of a correlation between weekly transaction volumes of companies that are part of the S&P 500 and weekly search volumes of the corresponding company names. Bank et al. (2011) confirm these results for the German market, Vlastakis & Markellos (2012) do the same for major US stocks, and Takeda & Wakao (2013) find similar results in Japan. There is however some disagreement over the length of influence, as Bordino et al. (2012) find that there is only an impact within a one-to-three-day lag period, and no impact on a weekly basis.

Many of the papers mentioned above fail to identify an impact of search volumes on market performance, as it is found that neither buying nor selling transactions are preferred. While the correlation between search volume and trading volume is strong, the correlation with stock returns is quite often only weakly positive. Challet and Bel Hadj Ayed (2014) summarize that ‘using this kind of data to predict volume or volatility is relatively easy, but the correlation with future price returns is much weaker’.

1.5. Predictive analysis

1.5.1. Training and testing the model

The terms training and testing are used frequently in data science. A training set is a set of data used to discover possibly predictive relationships between variables. Using the test set, the discovered relations are tested on a different dataset to see whether they actually have predictive power.

The terms training and testing loosely correspond to the terms formation and holding frequently used in financial portfolio strategies. Since datasets in finance are time series, the sets can also be referred to as periods. During the formation period the performance of a set of assets is analyzed, and a portfolio of financial assets is formed based on predefined criteria. During the holding period the performance of the selected portfolio is tested.

For reasons of clarity, we will use the terms training and testing period throughout this research when referring to this methodology.

1.6. Modeling environment

1.6.1. R Programming

The model has been developed in the R programming language. R is widely used by statisticians and data scientists for data analytics purposes. Over the years a large community has developed many add-ons (called packages), which allow users to simplify their code by using predefined functions. The programming code used in this research can be found in Appendix (4.).

1.6.2. Google Trends in R: the gtrendsR package

The gtrendsR package developed by Philippe Massicotte and Dirk Eddelbuettel was released in November 2015. The package allows users to retrieve and display information returned online by Google Trends. Using the functions’ arguments, the search can be restricted to geographic locations and time periods. Next to the SVIs, the functions in the package return all of the detailed information Google Trends provides in the online version. This includes geographical details (subregions and cities) as well as related searches. This package is used in our research to retrieve search engine data.

1.6.3. Finance in R: the quantmod package

The quantmod package was developed by Jeffrey Ryan, Joshua Ulrich and Wouter Thielen. It is widely used to specify, build, trade, and analyze quantitative financial trading strategies in R. The package is used in our model to retrieve financial data from Yahoo Finance and Oanda.


2. Research

2.1. Data

Three types of data sets are used in our research: historical search volumes, historical closing prices and a set of search terms. The search volumes and closing prices are time series starting at 01/01/2004 and ending at 01/03/2016. This is a time span of 12 years and 2 months, or 634 weeks in total.

2.1.1. Historical closing prices and search volumes

Historical search volume data are retrieved from Google Trends through the gtrendsR package in R (see Section 1.6.2.). The quantmod package in R (see Section 1.6.3.) is used to retrieve weekly closing prices for financial assets from Yahoo Finance (for stocks and indices) and Oanda (for exchange rates).

2.1.2. Word set

Preis et al. (2013) identified a set of 98 search terms. The dataset consists of words related to stock markets and finance in general (e.g. stocks, debt), combined with words suggested by Google Sets. The full dataset can be found in Appendix (2.1.).

Google Sets was a beta-service developed by Google that allowed users to search for semantically related keywords. If a user typed, for example, Renault and BMW, Google Sets returned a list of other car manufacturers. Together with Google Labs the service was discontinued in 2011, but it remained accessible through the spreadsheet application of Google until 2014.

The search term debt has an obvious semantic connection to the world of finance, while some of the other words in the data set are definitely less financially relevant. Examples are religion, color or cancer. After composing the set of search terms, the authors calculated the normalized financial relevance R_f of each term based on the occurrence of the term in the Financial Times, using the following formula:

R_f = (number of hits in Financial Times) / (number of hits on Google)

The results of the financial relevance of the words are taken from Preis et al. (2013) and can be found in Appendix (2.2.).

In our research the same word set is used, expanded with translations of the search terms into the three official languages of Belgium, using the online versions of Van Dale (Dutch), Larousse (French) and Beolingus (German). The translated word set can be found in Appendix (2.1.).


2.2. Methodology

2.2.1. Returns

The return of any investment strategy or portfolio can be expressed using either percentages or natural logarithms. We distinguish:

p_t = asset price at time t
r_% = portfolio return expressed in percentages
r_ln = portfolio return expressed in natural logarithms

These are defined as follows:

r_% = (p_t − p_{t−1}) / p_{t−1}

r_ln = ln(p_t / p_{t−1}) = ln(p_t) − ln(p_{t−1})

The percentage returns are calculated as the difference between the buying price and the selling price, divided by the buying price. The logarithmic returns are calculated as the difference between the natural logarithms of the two prices.

As suggested by Frömmel (2013) we will work with logarithmic returns instead of percentage returns. Our research contains a model which aggregates a number of small returns to determine the overall return, and an important advantage of the logarithmic notation is that it enables us to easily aggregate returns. The difference in returns between both methods is negligible in the case of those small returns, as in this case the logarithmic function can be approximated by a linear function:

r_% ≈ r_ln
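To make the aggregation property concrete, here is a minimal Python sketch with hypothetical weekly prices (the thesis model itself is written in R; this only illustrates the arithmetic):

```python
import math

# Hypothetical weekly closing prices
prices = [100.0, 101.0, 99.5, 100.5]

# Percentage and logarithmic weekly returns
r_pct = [(prices[t] - prices[t - 1]) / prices[t - 1] for t in range(1, len(prices))]
r_log = [math.log(prices[t] / prices[t - 1]) for t in range(1, len(prices))]

# Log returns aggregate by simple summation: the sum of the weekly
# log returns equals the log return over the whole period.
total = sum(r_log)
overall = math.log(prices[-1] / prices[0])

# For small returns, r_pct is close to r_log (first-order
# approximation of the logarithm around 1).
```

Summing the percentage returns, by contrast, would only approximate the overall return, which is why the logarithmic notation is used throughout.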

To evaluate the performance of a trading strategy, the excess return over a benchmark is calculated. Excess returns tell us whether a strategy beats the market or not, by comparing its performance with the performance of another strategy or the market. For example, when the 10-year DJIA return is 50% and a certain strategy X yields a return of 60% over 10 years, strategy X is profitable and beats the market. The excess return is 10%. On the other hand, a strategy Y that yields a return of 40% is still profitable but does not generate above-market returns. In this case, an investor who buys and holds the DJIA over 10 years has a 10% better result than an investor using strategy Y.

2.2.2. Trading rules

Based on the hypotheses developed in Section 1.4.1., Preis et al. (2013) designed a set of two trading rules:

a. An increase in search volume triggers a sell signal;
b. A decrease in search volume triggers a buy signal.


In this section the trading rules are explained. In Section 2.2.3. these rules will act as building blocks that are combined into our trading model.

2.2.2.1. Change in search volume

We define the relative change ∆n_t in SVI during week t using the formula below. It is the difference between the search volume n_t for week t and the average search volume N_{t−1,∆t} of the previous ∆t weeks. This is an unweighted average, since the SVIs of each week are considered to be of equal importance.

t = index of a week
∆t = number of weeks looking back
n_t = search volume index in week t
∆n_{t,∆t} = change in search volume index n during week t compared to the previous ∆t weeks

∆n_{t,∆t} = n_t − N_{t−1,∆t}

Where N_{t−1,∆t} is the unweighted average search volume of the previous ∆t weeks:

N_{t−1,∆t} = (n_{t−1} + n_{t−2} + ⋯ + n_{t−∆t}) / ∆t

Schematically we can look at ∆n during week t as the difference between the search volume index n_t in week t and the average of the search volume indices of the previous ∆t weeks.

Weeks:  t−∆t  …  t−2  t−1  t
SVI:    n_{t−∆t}  …  n_{t−2}  n_{t−1}  n_t

The previous ∆t values average to N_{t−1,∆t}, and ∆n_{t,∆t} = n_t − N_{t−1,∆t}.

Figure 9. Definition of ∆n

The trading signals based on the relative change ∆n_t in SVI are summarized in Table 3.

Trading signals

∆n_t    ∆p_{t+1}    signal
< 0     > 0         buy
> 0     < 0         sell

Table 3. Trading signals based on ∆n

We call the parameter ∆t the lag parameter. It is defined as the number of previous weeks that are considered in the calculation of the relative change in search volume ∆n during week t. In the paper by Preis et al. (2013) a lag parameter between 1 and 6 is used.
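The computation of ∆n_{t,∆t} can be sketched as follows (a hypothetical Python helper, not the thesis's R code; the SVI series is a plain list and weeks are 0-indexed):

```python
def delta_n(svi, t, lag):
    """Relative change in SVI for week t: this week's SVI minus the
    unweighted average of the previous `lag` weeks."""
    if t < lag:
        raise ValueError("not enough history for the chosen lag parameter")
    avg = sum(svi[t - lag:t]) / lag
    return svi[t] - avg

# Example: SVI history 10, 20, 30, then 40 in the current week.
# With lag = 3 the average of the previous weeks is 20, so delta_n is 20.
```

A positive result corresponds to rising search interest relative to the recent past, a negative result to falling interest.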


2.2.2.2. Strategies

In a first strategy, called the long strategy, long positions are taken following a decrease in search volume, and no short positions are taken. In a second strategy, the short strategy, short positions are taken following an increase in search volume, and no long positions are taken. An important condition for these strategies is that it must be possible to sell (or short) something that you do not own, which is indeed the case in financial markets.

In our research we will use a third strategy, the Google Trends strategy (GT strategy), which combines the long and the short strategy into a set of trading rules depending on the change in search volume ∆n. The GT strategy distinguishes two situations, namely an increase and a decrease in search volume.

An increase in search volume results in a positive ∆푛:

∆n_{t,∆t} > 0

This situation triggers two trading signals:

a. Sell the asset at the closing price p_t of the last trading day of week t;

b. Buy the asset at the closing price p_{t+1} of the last trading day of the following week (t + 1).

The second signal, which in this case means buying the asset at the end of week (t + 1), serves to close the position each week.

The return of the strategy using the search term Q during week t in case of a positive ∆n_{t,∆t} is r_{t,Q}^+. It is defined as the natural logarithm of this week’s closing price p_t minus the natural logarithm of next week’s closing price p_{t+1}.

r_{t,Q}^+ = return in week t of word Q if ∆n_{t,∆t} > 0
p_t = closing price in week t
p_{t+1} = closing price in week (t + 1)

r_{t,Q}^+ = ln(p_t) − ln(p_{t+1})

A decrease in search volume results in a negative ∆n:

∆n_{t,∆t} < 0

This situation triggers two trading signals:

a. Buy the asset at the closing price p_t of the last trading day of week t;

b. Sell the asset at the closing price p_{t+1} of the last trading day of the following week (t + 1).

Here again the position is closed at the end of week (t + 1), in this case by selling the asset.

The return of the strategy using the search term Q during week t in case of a negative ∆n_{t,∆t} is r_{t,Q}^−. It is defined as the natural logarithm of next week’s closing price p_{t+1} minus the natural logarithm of this week’s closing price p_t.

r_{t,Q}^− = return in week t of word Q if ∆n_{t,∆t} < 0
p_t = closing price in week t
p_{t+1} = closing price in week (t + 1)

r_{t,Q}^− = ln(p_{t+1}) − ln(p_t)

In the research of Preis et al. (2013) the financial asset on which the strategy is performed is the DJIA, a leading US market index. In practice, investing in a market index can be realized by investing in exchange-traded funds (ETFs).
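Combining the ∆n signal with the return definitions above, one week of the GT strategy can be sketched as follows (again a hypothetical Python helper; the actual model in this research is implemented in R):

```python
import math

def gt_weekly_return(prices, svi, t, lag):
    """Log return of the GT strategy for week t (0-indexed lists):
    dn > 0: sell at p_t, buy back at p_{t+1} (short position);
    dn < 0: buy at p_t, sell at p_{t+1} (long position)."""
    dn = svi[t] - sum(svi[t - lag:t]) / lag
    if dn > 0:
        return math.log(prices[t]) - math.log(prices[t + 1])
    if dn < 0:
        return math.log(prices[t + 1]) - math.log(prices[t])
    return 0.0  # no position when the SVI change is exactly zero
```

For example, an SVI jump from an average of 10 to 30 followed by a price drop from 100 to 90 yields a positive return, in line with hypothesis (a).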

2.2.2.3. Reverse strategies

The double hypothesis put forward by Preis et al. (2013) is only a hypothesis. Challet and Bel Hadj Ayed (2014) point out that there is no reason why the relationship between increasing (decreasing) SVIs and decreasing (increasing) returns should hold over all periods and/or for all financial assets. It is possible that for a certain word the relationship changes over time; for example, the connotation of a word could change because of news during the period. It is also possible to identify words that consistently demonstrate the opposite relationship between changes in SVI and returns.

This comment is backed by the attention theory by Barber & Odean (2008) and the paper by Da et al. (2011), whose findings suggest that an increase in returns is preceded by an increase in search volume, albeit for search words referring to individual stocks rather than words referring to market conditions.

This insight led to the creation of the Reverse Google Trends strategy (RGT strategy), which uses the exact opposite hypothesis compared to the Google Trends strategy. In the reverse strategy, a decrease in n_t is assumed to be followed by a decrease in p_{t+1}, resulting in a sell signal. An increase in n_t is assumed to be followed by an increase in p_{t+1}, resulting in a buy signal.

Reversed trading signals

∆n_t    ∆p_{t+1}    signal
< 0     < 0         sell
> 0     > 0         buy

Table 4. Reversed Google Trends strategy

This implies that the RGT strategy uses the exact opposite trading signals, and thus generates exactly opposite returns.

2.2.3. Model

The trading rules from the GT strategy and the RGT strategy are now used to develop an active trading model. Our model consists of a training period of one month and a testing period of one month. The process consists of five major steps:


1. Identification of the traded asset and geography;
2. Definition of a relevant word set;
3. Training period;
4. Testing period;
5. Application of the strategy over the entire period.

Next we will zoom in on each of these steps.

2.2.3.1. Identification of the traded asset and geography

The model uses the closing prices on the last day of the trading week of an exchange-traded asset. Since the financial data are loaded using Yahoo Finance or Oanda, the selection of assets is limited to the assets covered by these platforms. Within these constraints, the model can be used on the following exchange-traded assets:

- Company stock quotes (Yahoo Finance);
- Stock market indices (Yahoo Finance);
- Foreign exchange rates (Oanda).

Once an asset is chosen, we select the appropriate geographical area of Google search queries. Next we continue to the definition of the relevant word set.

2.2.3.2. Definition of a relevant word set

In our model the relevant word set is defined as the word set used by Preis et al. (2013). The complete set can be found in Appendix (2.1.).

2.2.3.3. Training period

During the training period, the performance of each word in the word set is calculated using the GT strategy. The total return r_{p,Q} of a word Q over a period p is the sum of the individual returns r_{t,Q} during the weeks t. In our model the period p is defined as four consecutive weeks.

r_{t,Q} = return in week t of word Q
r_{p,Q} = return over period p of word Q

r_{p,Q} = ∑_{t}^{t+3} r_{t,Q}

Next, the words are ordered according to the return realized during that month. Based on this ranking, the set of words that will be used during the testing period is composed. In our model we construct a set of ten words.


We can distinguish three different methods to compose this set. We can select according to the following criteria:

a. The top performing words;
b. The worst performing words;
c. A combination of the top performing and the worst performing words.

Based on Section 2.2.2. we can distinguish two exactly opposite strategies for each word:

a. The Google Trends strategy;
b. The Reversed Google Trends strategy.

This results in a total of eight possible methods to construct the word set, as summarized in Table 5.

Methods

ID      Strategy                                 Explanation
F       First ten                                Take the ten best performing words, execute the GT strategy
FR      First ten reversed                       Take the ten best performing words, execute the RGT strategy
L       Last ten                                 Take the ten worst performing words, execute the GT strategy
LR      Last ten reversed                        Take the ten worst performing words, execute the RGT strategy
FL      First five, last five                    Take the five best and the five worst performing words, execute the GT strategy
FLR     First five, last five reversed           Take the five best and the five worst performing words, execute the GT strategy for the five best and the RGT strategy for the five worst
FRL     First five reversed, last five           Take the five best and the five worst performing words, execute the RGT strategy for the five best and the GT strategy for the five worst
FRLR    First five reversed, last five reversed  Take the five best and the five worst performing words, execute the RGT strategy

Table 5. Word set composition methods
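The eight methods of Table 5 can be expressed as a small selection function. This is a hypothetical Python sketch (the `ranked` argument is the word list ordered best to worst by training-period return; the boolean flag marks words traded with the RGT strategy):

```python
def compose_word_set(ranked, method):
    """Return a list of (word, use_reverse) pairs for one of the
    eight methods in Table 5."""
    best10, worst10 = ranked[:10], ranked[-10:]
    best5, worst5 = ranked[:5], ranked[-5:]
    methods = {
        "F":    [(w, False) for w in best10],
        "FR":   [(w, True)  for w in best10],
        "L":    [(w, False) for w in worst10],
        "LR":   [(w, True)  for w in worst10],
        "FL":   [(w, False) for w in best5 + worst5],
        "FLR":  [(w, False) for w in best5] + [(w, True) for w in worst5],
        "FRL":  [(w, True)  for w in best5] + [(w, False) for w in worst5],
        "FRLR": [(w, True)  for w in best5 + worst5],
    }
    return methods[method]

# Example with a hypothetical ranking of 20 words:
ranked = [f"word{i:02d}" for i in range(20)]
flr = compose_word_set(ranked, "FLR")
# five best traded with the GT strategy, five worst with the RGT strategy
```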

2.2.3.4. Testing period

The testing period of four weeks follows immediately after the training period. In this phase the ten selected words are used in the GT or the RGT strategy, as determined by Table 5. Figure 10 illustrates this principle.

[Figure: an eight-week period, with the selection of the ten words taking place between the training weeks and the testing weeks]

Figure 10. Training and testing of the model


The performances of the individual words r_{p,Q} over a period p are summed to calculate the performance of the strategy during the period, r_p.

r_{p,Q} = return over period p of word Q
r_p = total return over period p

r_p = ∑_Q r_{p,Q}

This sum can also be decomposed on a weekly basis. The return r_t of the entire set of words in week t is the sum of the individual returns r_{t,Q} of the ten words in that week.

r_{t,Q} = return in week t of word Q
r_t = total return in week t

r_t = ∑_Q r_{t,Q}

To calculate the performance of the strategy during the four-week period, r_p, we take the sum of the returns r_t of the set of ten words over the four weeks that constitute the period.

r_t = total return in week t
r_p = total return over period p

r_p = ∑_{t}^{t+3} r_t
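The two decompositions are equivalent: summing per-word period returns and summing weekly portfolio returns give the same r_p. A quick numerical check with hypothetical weekly returns for a two-word set:

```python
# weekly_returns[word] = four weekly log returns for that word (hypothetical)
weekly_returns = {
    "debt":   [0.01, -0.02, 0.03, 0.00],
    "stocks": [0.02, 0.01, -0.01, 0.01],
}

# Per word over the period, then summed over the word set:
r_p_by_word = sum(sum(r) for r in weekly_returns.values())

# Per week over the word set, then summed over the four weeks:
r_t = [sum(rs[t] for rs in weekly_returns.values()) for t in range(4)]
r_p_by_week = sum(r_t)

# Both orderings of the double sum give the same period return.
```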

When we look at this sum from a practical point of view, we can see that the total return in a certain week t is the aggregated return of the words Q during that week. This is an important insight because it tells us something about the number of transactions that have to be executed during any given week t. As discussed before (see Section 2.2.2.), two trading possibilities arise based on the ∆n_{t,∆t} of a word Q during a week t. Instead of executing the orders for each word with an equal sign of ∆n_{t,∆t} independently, we can aggregate the orders so that the number of transactions decreases. We will use an example to clarify this step.

In our case of a set of ten words, imagine that we use the F method from Table 5 and seven words have a positive ∆n_{t,∆t}. This means that each of those seven words triggers two trading signals for the financial asset:

a. Sell the asset at the closing price p_t of the last trading day of week t;

b. Buy the asset at the closing price p_{t+1} of the last trading day of the following week (t + 1).

This would result in seven selling orders of a certain invested amount I at the end of week t and seven buying orders of the same amount I at the end of week (t + 1), or a total of 14 transactions. However, we can aggregate the signals by executing one selling order for an invested amount of seven times I, and do the same for the buying order. This way we reduce the number of transactions for the words with a positive ∆n_{t,∆t} during week t to just two.

During that same week t, three words have a negative ∆n_{t,∆t}. Since we use the F method, the following trading signals are triggered for each word:

a. Buy the asset at the closing price p_t of the last trading day of week t;

b. Sell the asset at the closing price p_{t+1} of the last trading day of the following week (t + 1).

Here again we follow the same procedure. Instead of having three buying orders of amount I and three selling orders of amount I, we end up with one buying order for an amount of three times I and one selling order for an amount of three times I.

This way we have reduced the number of transactions from 20 to just four per week. However, notice that the two possible values of ∆n_{t,∆t} (positive or negative) yield returns that are each other’s exact opposites. Remember that the return r_t of the entire portfolio of words in week t is the sum of the individual returns r_{t,Q} of the ten words in that week. We define k_t^+ as the number of words that have a positive ∆n_{t,∆t} during week t and k_t^− as the number of words that have a negative ∆n_{t,∆t} during week t.

k_t^+ = number of words with a positive ∆n_{t,∆t} during week t
k_t^− = number of words with a negative ∆n_{t,∆t} during week t

In our example we chose k_t^+ = 7 and k_t^− = 3. Since the word set contains ten words:

k_t^+ + k_t^− = 10

Combining this with the equations from Section 2.2.2.2.:

r_{t,Q}^+ = ln(p_t) − ln(p_{t+1})   if ∆n_{t,∆t} > 0
r_{t,Q}^− = ln(p_{t+1}) − ln(p_t)   if ∆n_{t,∆t} < 0

And with the equation for the total weekly return r_t:

r_t = ∑_Q r_{t,Q}

We get:

r_t = k_t^+ × r_{t,Q}^+ + k_t^− × r_{t,Q}^−
⟺ r_t = k_t^+ × [ln(p_t) − ln(p_{t+1})] + k_t^− × [ln(p_{t+1}) − ln(p_t)]
⟺ r_t = (k_t^+ − k_t^−) × [ln(p_t) − ln(p_{t+1})]
⟺ r_t = (k_t^+ − k_t^−) × r_{t,Q}^+


We can conclude that the total return r_t of the ten words in week t is the difference between k_t^+, the number of words with a positive ∆n_{t,∆t} during week t, and k_t^−, the number of words with a negative ∆n_{t,∆t} during week t, multiplied by the return of one word with a positive ∆n_{t,∆t} during that week. Going back to our example, this means that the returns of the three words with a negative ∆n_{t,∆t} are canceled out by the returns of three words with a positive ∆n_{t,∆t}. The final return r_t is in this case generated by the remaining four words with a positive ∆n_{t,∆t}. As a result, if we were to implement this model in practice, we would execute two orders each week regardless of the number of words in the word set. This dramatic decrease in transactions has a positive effect on the transaction costs (see Section 2.3.).
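The netting argument can be verified numerically: the sum of the ten individual word returns equals (k_t^+ − k_t^−) times the return of one short position. A sketch with hypothetical prices:

```python
import math

p_t, p_next = 100.0, 96.0
r_plus = math.log(p_t) - math.log(p_next)   # return per word if its dn > 0 (short)
r_minus = -r_plus                           # return per word if its dn < 0 (long)

k_plus, k_minus = 7, 3  # words with positive / negative dn this week

# Summing the ten per-word returns ...
r_t_per_word = k_plus * r_plus + k_minus * r_minus
# ... equals the netted form, with three long words canceling three shorts:
r_t_netted = (k_plus - k_minus) * r_plus
```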

2.2.3.5. Application of the strategy over the entire period

To evaluate the trading strategy, we apply it on a rolling-window basis during the entire time frame from 01/01/2004 to 01/03/2016. The rolling-window principle means that a new strategy with a duration of eight weeks (i.e. four weeks of training and four weeks of testing) is started each week. This is illustrated in Figure 11.

[Figure: periods 1 to 5 over weeks 1 to 12, each period consisting of four training weeks followed by four testing weeks, with a new period starting every week]

Figure 11. Rolling-window principle
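The rolling-window schedule can be sketched as a generator of week ranges (hypothetical Python; weeks are 1-indexed and a new eight-week period starts every week):

```python
def rolling_periods(n_weeks, train=4, test=4):
    """Yield (training_weeks, testing_weeks) for each period; a new
    eight-week period starts every week (1-indexed weeks)."""
    length = train + test
    for start in range(1, n_weeks - length + 2):
        training = list(range(start, start + train))
        testing = list(range(start + train, start + length))
        yield training, testing

periods = list(rolling_periods(12))
# Period 1 trains on weeks 1-4 and tests on weeks 5-8;
# Period 5 trains on weeks 5-8 and tests on weeks 9-12.
```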

The average performance of our model, r_m, is calculated by averaging the returns r_p of each period over the total number of periods.

r_p = total return over period p
r_m = model performance

r_m = (1 / #periods) × ∑_{p=1}^{#periods} r_p

The number of times we can run the eight-week model depends on a number of constraints in our model (see Section 2.2.5.2.).


2.2.4. Example

To further clarify this algorithmic trading model, an example with fictitious data is provided below.

2.2.4.1. Identification of traded asset and geography

Assume that we are interested in the correlation of a set of words with the FTSE 100, based on their search volume in the United Kingdom.

2.2.4.2. Definition of a relevant word set

In our research the word set suggested by Preis et al. (2013) is used. For this example, however, we suggest a different method to construct the word set: the 100 words most commonly used on the website of The Economist (excluding stop words like is, a, etc.).

2.2.4.3. Training period

Assume that we start using our model in week one. The training period will consist of weeks five to eight, while the testing period starts in week nine and lasts until week twelve. If we set our lag parameter ∆t to four, this means that we need Google Trends data starting from week one.

[Figure: weeks 1 to 12 of Period 1, with weeks 1-4 providing Google Trends data, weeks 5-8 forming the training period and weeks 9-12 forming the testing period]

Figure 12. Time period of the example

We calculate and list the return r_{p,Q} for every word at the end of week eight. Assume that we use the F method, which means that the ten best performing words are selected. This top-10 list of words is presented in Table 6.

Training period

Index   Word        Excess return
1       finance     4.56%
2       banks       4.37%
3       fish        4.12%
4       chips       3.97%
5       mortgage    3.56%
6       derivative  3.55%
7       future      3.38%
8       gold        3.29%
9       commodity   3.11%
10      euro        3.04%

Table 6. Returns for the training period


The search term finance in the GT strategy yields an excess return of 4.56% over the FTSE 100. It is closely followed by the search term banks, and the top-10 is closed by euro. These results indicate that this set of words performs best in the current market environment. Following the F method, these words will form the set of ten words used in the testing period.

2.2.4.4. Testing period

At the end of week eight we know which words to use in weeks nine to twelve, and we are ready for the testing period. Each trading week a number of buy and sell signals come from the Google Trends analysis. We follow our selected strategy and at the end of the four-week period analyze the results (Table 7).

Testing period

Index   Word        Excess return
1       commodity   5.89%
2       mortgage    4.44%
3       banks       3.38%
4       derivative  3.23%
5       chips       2.16%
6       gold        0.87%
7       future      -0.71%
8       finance     -1.54%
9       euro        -3.65%
10      fish        -4.26%
Portfolio           9.81%

Table 7. Returns for the testing period

The set of the ten best performing words in the training period yields an excess return of 9.81% in the testing period. As we can see with euro or finance, this does not mean that the return of each individual word has to be positive. It is the combination of these ten words that beats the FTSE 100 by almost 10%.

2.2.4.5. Application of the strategy over the entire period

Each week a new period is started. This means that in the second week, one week after we started Period 1, a subsequent Period 2 is started. This time the training period starts in week six, while the testing period starts in week ten.

[Figure: Periods 1 and 2 over weeks 1 to 13, with the Trends, training and testing windows of Period 2 each shifted one week later than those of Period 1]

Figure 13. Applying the strategy over the total time range


This process continues in a similar way as described above for the following weeks. At any given time, the model’s performance is calculated as the average of the returns during each period.

2.2.5. Time periods

2.2.5.1. Definition of a week

The start and end dates of a week on Google Trends and on the financial market do not exactly match. Prices of financial assets are quoted on a Monday-to-Friday basis, while Google defines a week as running from Sunday to the following Saturday. We follow Google’s week format.

[Figure 14. Definition of the time frame: a financial markets week runs from Monday to Friday, while a Google Trends week runs from Sunday to Saturday.]

2.2.5.2. Tails of the time series As mentioned before, weekly data are used from 01/01/2004 until 01/03/2016. This results in 634 data points. The total number of times the model can be executed is lower than this. Both at the beginning and at the end, a couple of data points cannot be fully used by the model.

The first model return is recorded in week 12. The last model return, on the other hand, is recorded in week 634, but is based on GT data from week 623. Any run that starts after week 623 cannot complete the full analysis, consisting of four weeks of GT data, four weeks of training and four weeks of testing. Since eleven weeks are lost at the beginning and eleven at the end, the model can be run 612 times in total.
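The arithmetic of the lost weeks can be checked directly (illustrative sketch):

```python
total_weeks = 634        # weekly data points, 01/01/2004 - 01/03/2016
cycle = 12               # 4 weeks GT data + 4 training + 4 testing
lost_start = cycle - 1   # the first model return only arrives in week 12
lost_end = cycle - 1     # the last start that still completes is week 623
runs = total_weeks - lost_start - lost_end
print(runs)  # 612
```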

2.3. Transaction costs As remarked by Challet and Bel Hadj Ayed (2014), Preis et al. (2013) do not take transaction costs into account. In our research the transaction cost c_t of an order is defined as the total cost an investor incurs when placing an order. Including these costs in the model leads to more realistic results.

In line with the methodology, a distinction can be made between the training period and the testing period. During the training period, the impact of transaction costs on our model is zero. Due to the transaction costs, the performance during that period for any given word will indeed be lower. However, these costs will not impact the ranking, since they are independent of the word used. The ten words that are finally selected to form the word set will be the same, regardless of what method is followed.


In the testing period, on the other hand, transaction costs have an important impact on the performance of the model. During the four weeks of the testing period, a number of transactions are executed. As mentioned in Section 2.2.3.5., the number of orders can in practice be lowered to two per week. For each four-week period this gives us:

c_t = transaction cost per order
C_t = total transaction costs per period of four weeks

C_t = Σ_{t}^{t+3} (c_t × 2)

Assuming that transaction costs remain constant over all periods, this results in:

C_t = c_t × 8

The total transaction costs C_t for a period of four weeks amount to eight times the cost of an individual order.

The cost per order c_t depends on the amount invested per order, and can be quantified as a percentage of this investment. For the model to generate a profit during a four-week period, the following condition has to be met:

r_p − C_t ≥ 0
⇔ r_p − c_t × 8 ≥ 0
⇔ r_p ≥ c_t × 8
⇔ c_t ≤ r_p / 8

Our model yields a profit if the cost per order is lower than the period return r_p divided by eight. Assume that the transaction cost per trade amounts to 2 basis points (bps, i.e. 0.02%), as suggested by Challet and Bel Hadj Ayed (2014). For the model to be profitable during a period, the period return then has to satisfy:

r_p ≥ 0.02% × 8 = 0.16%

In other words, the model is only profitable during a period if the period return exceeds 0.16%.
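The break-even condition can be verified with a small sketch (Python for illustration; working in integer basis points avoids floating-point rounding issues):

```python
def net_period_return_bps(r_p_bps, c_t_bps, orders_per_week=2, weeks=4):
    """Period return after total transaction costs, all in basis points.

    Total costs per period: C_t = c_t * orders_per_week * weeks = 8 * c_t.
    """
    return r_p_bps - c_t_bps * orders_per_week * weeks

print(net_period_return_bps(16, 2))  # 0 -> break-even at r_p = 0.16%
print(net_period_return_bps(20, 2))  # 4 -> profitable
```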

For reasons of flexibility, the transaction cost is implemented in the model as a parameter.


2.4. Results In the first section the analysis of Preis et al. (2013) is repeated for the Belgian stock index BEL 20 using Belgian search engine data. We will call these the single word results. The analysis is also executed for the words translated into the four languages used in Belgium.

In the second section the model explained in Section 2.2.4. is executed. We start again by testing the strategy on the BEL 20 using Belgian search volumes. Next to that, we test the strategy on six other major stock market indices, namely the AEX, CAC 40, DAX 30, DJIA, FTSE 100 and the S&P 500.

2.4.1. Single word results

2.4.1.1. Not translated First we look at the results of the GT strategy using the individual words for the period between 01/01/2004 and 01/03/2016. The Google search words are entered in Belgium and the asset used in the strategy is the BEL 20 stock index. The words are the ones used by Preis et al. (2013), in this case not translated, and lag periods of one to six weeks are considered. The complete results can be found in Appendix (3.1.).

For each lag period we find that the cumulative returns of certain words are significantly higher than the market returns. The total cumulative return of a buy-and-hold strategy of the BEL 20 during the period 2004-2016 amounts to 46%. The top performing words over the six lag periods generate a return between 158% and 231%, which corresponds to an excess return between 112% and 185%. We also find that the performance of a word depends on the lag period, as the ranking of the words is not the same for every lag period.

Let’s zoom in on an example. For a lag period of three weeks, the top performing search term in this word set is marriage. The strategy using marriage results in a strategy return over the investment period of 225%, which beats the market by a full 180%. Also, when compared to the results of Preis et al. (2013), marriage confirms its predictive potential as the word secures a spot in the upper quartile of the word set. Interestingly, one would not immediately associate this word with anything financial. The financial relevance parameter indeed ranks the term only in the bottom quartile of the word set. Figure 15 illustrates the weekly returns (a) and cumulative returns (b) of the GT strategy using the word marriage.

Next in ranking after this remarkable result, comes a set of words that do have economic or financial significance, according to common sense as well as according to the financial relevance parameter (see Appendix (2.2.)). With cash, housing, hedging, banking, hedge, inflation and invest, we identify seven words that generate an excess return of at least 100% over the buy-and-hold strategy of the BEL 20.


[Figure 15. Returns of the GT strategy using 'marriage': (a) weekly returns and (b) cumulative returns, 2004-2016. The strategy ends at a cumulative return of 225%, versus 46% for a buy-and-hold of the BEL 20.]

Based on the approach of Preis et al. (2013) we investigate whether the difference in return between the search terms can be partially explained by their financial relevance. Using Kendall’s rank correlation coefficient, we find that the return associated with a word is indeed correlated with its financial relevance score. The Spearman rank correlation coefficient confirms this conclusion. Repeating this test for the entire range of lag periods leads to consistent results. This is an important insight, as it means that a word with higher financial relevance on average performs better in the model than a word with a lower financial relevance. These results are consistent with the findings of Preis et al. (2013).

Rank correlation (N=98)

Lag Δt  Method    Coefficient  p
1       Kendall   τ = 0.1822   0.00807
1       Spearman  ρ = 0.2624   0.00904
2       Kendall   τ = 0.1801   0.00883
2       Spearman  ρ = 0.2596   0.00983
3       Kendall   τ = 0.1877   0.00635
3       Spearman  ρ = 0.2809   0.00507
4       Kendall   τ = 0.1450   0.03495
4       Spearman  ρ = 0.2317   0.02169
5       Kendall   τ = 0.1522   0.02687
5       Spearman  ρ = 0.2429   0.01593
6       Kendall   τ = 0.1687   0.01418
6       Spearman  ρ = 0.2515   0.01248

Table 8. Correlation between financial relevance and search word performance
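For readers unfamiliar with the statistic in Table 8: Kendall's τ compares concordant versus discordant pairs of rankings. A minimal Python sketch with toy data (not the thesis data) illustrates the computation; in practice one would use a statistics library that also returns p-values:

```python
from itertools import combinations

def kendall_tau(x, y):
    """Kendall rank correlation: (concordant - discordant) / total pairs."""
    concordant = discordant = 0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        s = (xi - xj) * (yi - yj)
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs

# Toy data: higher financial relevance tends to mean a higher return.
relevance = [1, 2, 3, 4, 5]
returns = [0.01, 0.03, 0.02, 0.05, 0.08]
print(kendall_tau(relevance, returns))  # 0.8
```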


2.4.1.2. Translated Referring to Section 1.3.3., it seems interesting to incorporate translations of the search words as a side step in the analysis. However, when executing the GT strategy on the translated search terms we encounter a technical difficulty: the R package gtrendsR used to retrieve GT data throws an error. For more information we refer to Section 4.5. Due to this technical error, only the results for lag parameters of one week and two weeks are available. In addition, only 58 of the 98 search terms return an SVI that is significantly different from zero for each of the languages. The limited results can be found in Appendix (3.1.).

2.4.2. Model results We use a lag period Δt of 4 weeks, a training period of 4 weeks and a testing period of 4 weeks. The model is executed on the 634 weeks between 01/01/2004 and 01/03/2016. The search volumes used in the remainder of this section refer to search terms that are not translated.

2.4.2.1. Transaction costs not included Table 9 summarizes the average four-weekly returns of the model using the seven major indices without transaction costs, for each of the eight methods of word set composition (see Section 2.2.3.3.). The detailed results can be found in Appendix (3.2.1.).

Average four-weekly returns without transaction costs

Index     F       FR      L       LR      FL      FRLR    FLR     FRL
BEL 20    0.17%  -0.17%   0.63%  -0.63%   0.35%  -0.35%  -0.05%   0.05%
AEX       0.57%  -0.57%   0.65%  -0.65%   0.61%  -0.61%  -0.03%   0.03%
CAC 40    0.21%  -0.21%   0.44%  -0.44%   0.38%  -0.38%   0.07%  -0.07%
DAX 30    0.28%  -0.28%   0.70%  -0.70%   0.27%  -0.27%  -0.16%   0.16%
DJIA      0.18%  -0.18%   0.77%  -0.77%   0.44%  -0.44%  -0.36%   0.36%
FTSE 100  0.34%  -0.34%   0.46%  -0.46%   0.45%  -0.45%  -0.05%   0.05%
S&P 500   0.29%  -0.29%   0.74%  -0.74%   0.41%  -0.41%  -0.34%   0.34%
AVG       0.29%  -0.29%   0.63%  -0.63%   0.41%  -0.41%  -0.13%   0.13%

Table 9. Average four-weekly returns without transaction costs

A couple of interesting findings can be derived from this table. Without transaction costs, the returns of the methods using the GT strategy and those using the RGT strategy are each other's exact opposite. It is striking that over all the indices, the reverse strategies generate a negative average return, while the regular strategies generate a positive return. The signs only differ once, namely for the CAC 40 using the FLR and the FRL methods. This is not so surprising, as the absolute values of the returns for these two methods are significantly lower than for the other methods. A possible explanation is that in both FLR and FRL the word set using the regular GT strategy yields a positive return, while the word set using the reversed strategy yields a negative return.


Over all the indices, the L method, which applies the GT strategy to the ten lowest performing words of the training period, emerges as the highest performer, without exception. The overall highest performance is recorded for the L method on the DJIA, with an average four-weekly return of 0.77%. Consequently the LR method, as the reverse of L, yields the overall lowest performance with -0.77%. The second best performing method is the FL method, with one exception: on the DAX 30 the F method outperforms the FL method by one basis point on average.

Next the returns are tested for statistical significance. To find out whether the model beats the market, we compare the returns of the model to the market returns of the respective market index using a T-test. The market returns are equivalent to the returns from a buy-and-hold strategy on the index. In a buy-and-hold strategy an investor buys the index at the beginning of each four-week period, and sells it again at the end.
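The comparison can be sketched as a t-test on the paired differences between buy-and-hold and model returns (an illustrative Python sketch with invented numbers; the sign convention mirrors Table 10, where outperformance appears as a negative T value):

```python
import math

def paired_t_stat(model_returns, market_returns):
    """t statistic on the paired differences (market minus model)."""
    d = [b - a for a, b in zip(model_returns, market_returns)]
    n = len(d)
    mean = sum(d) / n
    var = sum((x - mean) ** 2 for x in d) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n)

# Toy data: the model beats the market in most periods -> negative t.
model = [0.006, 0.004, 0.008, -0.002, 0.007]
market = [0.001, -0.003, 0.002, 0.001, 0.000]
print(paired_t_stat(model, market) < 0)  # True
```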

Table 10 summarizes the findings. For the BEL 20 only the L method proves to significantly outperform the average market return on a four-weekly basis. The same is true for the DJIA and the S&P 500. For the Amsterdam AEX both the L method and the FL method prove to significantly beat the market. Applying the model to the CAC 40, the DAX 30 or the FTSE 100 does not generate a significantly positive return. We conclude that it is possible to design a profitable trading strategy using certain stock indices, albeit without taking into account transaction costs.

Statistical significance (no transaction costs)

Index     Method           T          p
BEL 20    L               -2.424709   0.00777
AEX       L               -1.786558   0.03713
AEX       FL              -1.778148   0.03781
CAC 40    not significant
DAX 30    not significant
DJIA      L               -2.313127   0.01044
FTSE 100  not significant
S&P 500   L               -1.849789   0.0323

Table 10. Statistical significance without transaction costs

2.4.2.2. Transaction costs included Table 11 summarizes the average four-weekly returns of our model on the seven major indices with transaction costs included. The only difference between Table 9 and Table 11 is that each return with transaction costs is exactly 0.16% lower than the corresponding return without them. Apart from that, all of the findings in 2.4.2.1. still hold. Running the model on the DJIA using the L method yields an average four-weekly return of 0.61%, or 61 basis points. On the BEL 20 the L method results in an average four-weekly profit of about half a percentage point (0.48%, or 48 basis points).

Average four-weekly returns with transaction costs

Index     F       FR      L       LR      FL      FRLR    FLR     FRL
BEL 20    0.01%  -0.33%   0.48%  -0.79%   0.19%  -0.51%  -0.20%  -0.11%
AEX       0.41%  -0.73%   0.49%  -0.81%   0.45%  -0.77%  -0.19%  -0.13%
CAC 40    0.05%  -0.37%   0.28%  -0.60%   0.22%  -0.54%  -0.09%  -0.23%
DAX 30    0.12%  -0.44%   0.54%  -0.86%   0.11%  -0.43%  -0.32%   0.00%
DJIA      0.02%  -0.34%   0.61%  -0.93%   0.28%  -0.60%  -0.52%   0.20%
FTSE 100  0.18%  -0.50%   0.30%  -0.62%   0.29%  -0.61%  -0.21%  -0.11%
S&P 500   0.13%  -0.45%   0.58%  -0.90%   0.25%  -0.57%  -0.50%   0.18%
AVG       0.13%  -0.45%   0.47%  -0.79%   0.26%  -0.57%  -0.29%  -0.03%

Table 11. Average four-weekly returns with transaction costs

Testing the returns for statistical significance, we again compare them to the market returns generated by a buy-and-hold strategy. Transaction costs are also included for the buy-and-hold strategy, which implies two transactions per four-week period. Remarkably, with transaction costs included it is possible to create a trading strategy for each market index that beats the market at the 95% confidence level. The results are summarized in Table 12. The average returns on a four-weekly basis range from 0.18%, or 18 basis points, for the FTSE 100 to 0.61%, or 61 basis points, for the DJIA. The methods F, L and FL are most common. One exception is the significant outperformance of the model using the FRL method on the DJIA.

Significant returns (transaction costs included)

Index     Method  AVG    CUM    T          p
BEL 20    L       0.48%  295%   -3.46184   0.000283
BEL 20    FL      0.19%  119%   -2.33936   0.009779
AEX       F       0.41%  257%   -2.17203   0.01503
AEX       L       0.49%  301%   -2.53974   0.005609
AEX       FL      0.45%  277%   -2.58197   0.004969
CAC 40    L       0.28%  177%   -2.04753   0.02041
CAC 40    FL      0.22%  136%   -1.97727   0.02412
DAX 30    L       0.54%  336%   -1.66241   0.048346
DJIA      L       0.61%  380%   -3.29485   0.000507
DJIA      FL      0.28%  175%   -2.18499   0.014537
DJIA      FRL     0.20%  126%   -1.88053   0.030135
FTSE 100  F       0.18%  111%   -1.70346   0.044374
FTSE 100  L       0.30%  187%   -2.24927   0.012338
FTSE 100  FL      0.29%  177%   -2.26378   0.011881
S&P 500   L       0.58%  357%   -2.76136   0.002923
S&P 500   FL      0.25%  154%   -1.77528   0.038048

Table 12. Significant returns with transaction costs included


The cumulative results of the returns that significantly outperform the market are also given in Table 12. The cumulative results from 2004 to 2016 range from 111% for the FTSE 100 using the F method to as much as 380% for the DJIA using the L method. As an illustration, Figure 16 shows the cumulative return of the L method on the BEL 20. An investor who started using the model on January 1st, 2004 and used it until February 29th, 2016 would have realized a cumulative return of 295%. By comparison: an investor who bought the BEL 20 index on the first date and sold it on the second would have realized a return of 46%. The complete results are given in Appendix (3.2.2.).
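For clarity, a cumulative return over several periods is obtained by compounding the period returns, as in this generic sketch (illustrative numbers, not the thesis series):

```python
def cumulative_return(period_returns):
    """Compound a sequence of period returns into one cumulative return."""
    growth = 1.0
    for r in period_returns:
        growth *= 1.0 + r
    return growth - 1.0

print(round(cumulative_return([0.10, -0.05, 0.20]), 4))  # 0.254
```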

[Figure 16. Cumulative return for method L on the BEL 20, 2004-2016: the strategy ends at 295%, versus 46% for a buy-and-hold of the BEL 20.]


3. Conclusions

The purpose of this paper is to research the potential of using internet search engine data in creating a profitable trading strategy. First we test the correlation between search volumes and market index returns. Next we analyze whether search volume data can be used to construct a profitable trading model.

Using a methodology introduced by Preis et al. (2013), a set of trading rules is developed. First we test the trading rules for each word separately. Using the BEL 20 index as traded asset, we confirm that there is a correlation between changes in search volumes and market index returns for a number of words. This supports the findings by Preis et al. (2013). For each lag period we find that the cumulative returns of certain words are significantly higher than the market returns. The highest performing words beat the market by about 200% over the course of twelve years. It is also confirmed that the performance of a word is positively correlated with its degree of financial relevance.

Secondly, the model proves that using search engine data for investment decisions can generate a return that significantly outperforms the market. Using techniques from the field of predictive analytics we construct a model. The model includes a training period of four weeks, to determine the composition of the word set, and a testing period of four weeks, to determine our model’s performance. It is then tested on the BEL 20 together with six other major stock indices, namely the AEX, CAC 40, DAX 30, DJIA, FTSE 100 and the S&P 500. Taking into account transaction costs, for each market index our model results in an average four-weekly return that significantly outperforms the respective market return. The average returns on a four-weekly basis range from 0.18% or 18 basis points for the FTSE 100 to 0.61% or 61 basis points for the DJIA. The cumulative results from 2004 to 2016 range from 110% for the FTSE 100 to as much as 380% for the DJIA.

Constructing a profitable trading strategy using search engine data has thus been shown to be possible. Using publicly available information in the form of search volumes, we are able to generate returns that consistently outperform the market. These findings appear to contradict the semi-strong form of market efficiency: a market is semi-strong-form efficient if asset prices reflect all publicly available information. Since the model is based on publicly available search engine data, its profitability contradicts this form of market efficiency.

We conclude that the results further illustrate the opportunities that are offered by the enormous amount of information that is captured by internet users’ online behavior, especially in the context of financial markets.


4. Limitations and further research 4.1. Data granularity In our model we retrieve the Google Trends data using the gtrendsR package in R. The SVIs for each word are downloaded for the entire time range under consideration (in this case from 01/01/2004 until 01/03/2016), which results in search volume data on a weekly basis only; a finer granularity is not available for a request spanning the full range. A solution to this issue is to retrieve the data on a monthly or quarterly basis, which yields finer-grained values, and to aggregate the results.
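The proposed aggregation step could look as follows (a Python sketch with hypothetical SVI values; in practice Google normalizes each request separately, so sub-period series would first need to be rescaled to a common base):

```python
# Roll hypothetical daily SVIs (retrieved per shorter window) up to
# weekly means, matching the weekly granularity used in the model.
daily_svi = [52, 55, 51, 60, 58, 57, 54,   # week 1 (Sun-Sat)
             61, 63, 59, 62, 65, 64, 60]   # week 2

weekly = [sum(daily_svi[i:i + 7]) / 7 for i in range(0, len(daily_svi), 7)]
print([round(w, 2) for w in weekly])  # [55.29, 62.0]
```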

4.2. SVIs and retail investors It has been confirmed that SVIs mostly capture the information gathering behavior of less sophisticated investors; professional traders at large institutions use more sophisticated services such as Bloomberg or Reuters. Search volumes from Bloomberg could give an insight into the information gathering processes of institutional investors, just like Google Trends does for retail investors. Using the same methodology as described above, a trading strategy could be designed to test the predictive power of these search volumes on stock market returns. Whether Bloomberg would be willing to release this data is an entirely different question.

4.3. Word sets In this research the word set of Preis et al. (2013) is used to illustrate the potential to build a profitable trading strategy based on Google Trends. Instead of using Google Sets to fill in the list of search terms, a list of 100 or 200 financially relevant words could be composed. In Section 2.2.4. the example is given of the top-100 words on the website of The Economist. For Belgium this list can be composed by searching for the number of hits of all the words used in the online version of the specialized financial papers De Tijd (for the Dutch speaking part) and L’Echo (for the French speaking part). Using a formula similar to the one in Section 2.1.2., a financial relevance score could then decide which set of words will be in the strategy.

4.4. Ties The final result of the training period of the model is a set of words that consists of the ten best or worst performers, or the five best and five worst performers. It is important to note that ties between words during the training period are broken in a random way. A potential solution to this issue would be to base the ranking on a secondary criterion. In this case the ties between two equally performing words could be broken by looking at their performance in the model during a previous period.
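The proposed tie-breaking rule amounts to sorting on a secondary key, as in this illustrative sketch (Python, invented numbers):

```python
# Rank by training-period return, break ties by the word's performance
# in a previous period instead of randomly.
words = [
    ("gold", 0.031, 0.012),   # (word, training return, previous-period return)
    ("debt", 0.031, 0.020),
    ("euro", 0.045, 0.001),
]
ranked = sorted(words, key=lambda w: (w[1], w[2]), reverse=True)
print([w[0] for w in ranked])  # ['euro', 'debt', 'gold']
```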

4.5. Data availability When running certain words through our model it returns the following error message: Not enough search volume. Please change your search terms.


This error comes from Google Trends itself, and indicates that the search volume for that particular word is not sufficiently high. This limitation was also reported by Da et al. (2011).

However, the error also occurs when a large number of search terms is passed to the functions of the gtrendsR package. The problem has been reported by a number of users4 and acknowledged by the authors. In our research the problem only occurs when trying to run the translated model. This is not remarkable, as we pass four times as many search terms to the gtrendsR package functions as when we use the words that are not translated.

4 Several threads on GitHub referring to the issue can be found at: https://github.com/PMassicotte/gtrendsR/issues/


REFERENCES

Bank, M., Larch, M. & Peter, G. (2011). Google search volume and its influence on liquidity and returns of German stocks. Financial Markets and Portfolio Management, 25(3), 239-264.

Barber, B. & Odean, T. (2008). All that glitters: The effect of attention and news on the buying behavior of individual and institutional investors. Review of Financial Studies, 21, 785–818.

Bodnaruk, A., Ostberg, P. (2008). Does investor recognition predict returns? Journal of Financial Economics, 91(2), 208–226.

Bollen, J., Mao, H. & Zeng, X. (2011). Twitter mood predicts the stock market. Journal of Computational Science, 2(1), 1–8.

Bordino, I., Battiston, S., Caldarelli, G., Cristelli, M., Ukkonen, A. & Weber, I. (2012). Web Search Queries Can Predict Stock Market Volumes. PLoS One 7, e40014.

Castle, J., Fawcett, N. & Hendry, D. (2009). Nowcasting is not just contemporaneous forecasting. National Institute Economic Review, 210(1), 71–89.

Challet, D. & Bel Hadj Ayed, A. (2014). Do Google Trend data contain more predictability than price returns?

Challet, D. & Bel Hadj Ayed, A. (2014). Predicting financial markets with Google Trends and not so random keywords.

Choi, H. & Varian, H. (2012). Predicting the present with Google Trends. The Economic Record, 88, 2–9.

Da, Z., Engelberg, J. & Gao, P. (2011). In search of attention. The Journal of Finance, 66(5), 1461–1499.

Fama, E. (1970). Efficient capital markets: a review of theory and empirical work. Journal of Finance, 25, 383–417.

Fama, E. (1991). Efficient Capital Markets: II, Journal of Finance, 45:5, 1575-1617.

Frömmel, M. (2013). Portfolios and Investments (3rd ed.). Norderstedt: BoD – Books on Demand.

Ginsberg, J., Mohebbi, M., Patel, R., Brammer, L., Smolinski, M., & Brilliant, L. (2009). Detecting influenza epidemics using search engine query data. Nature, 457, 1012–1014.

Grossman, S. & Stiglitz, J. (1980). On the impossibility of informationally efficient markets, American Economic Review, 70:3, 393-408.

Jensen, M. (1978). Some anomalous evidence regarding market efficiency. Journal of Financial Economics, 6, 95-101.


Joseph, K., Wintoki, M. B., and Zhang, Z. (2011). Forecasting abnormal stock returns and trading volume using investor sentiment: Evidence from online search. International Journal of Forecasting, 27, 1116 – 1127.

King, G. (2011). Ensuring the Data-Rich Future of the Social Sciences. Science, 331, 719–721.

Kristoufek, L. (2013). Can Google Trends search queries contribute to risk diversification? Sci. Rep., 3, 2713.

Lehavy, R., Sloan, R. (2008). Investor recognition and stock returns. Review of Accounting Studies, 13(2), 327-361.

Leinweber, D. (2007). Stupid data miner tricks: overfitting the S&P 500. The Journal of Investing, 16(1), 15–22

Malkiel, B. (2003). The Efficient Market Hypothesis and its critics. Journal of Economic Perspectives, 17, 159-182.

Menkhoff, L. & Taylor, M. (2007). The obstinate passion of foreign exchange professionals: Technical Analysis. Journal of Economic Literature, 45:4, 936-972.

Preis, T., Moat, H.S. & Stanley, H.E. (2013). Quantifying trading behavior in financial markets using Google Trends. Sci. Rep., 3, 1684.

Preis, T., Reith, D. & Stanley, H. (2010). Complex dynamics of our economic life on different scales: insights from search engine query data. Phil. Trans. R. Soc. A, 368, 5707–5719.

Simon, H. (1955). A Behavioral Model of Rational Choice. Quarterly Journal of Economics, 69, 99–118.

Takeda, F. & Wakao, T. (2013). Google search intensity and its relationship with returns and trading volume of Japanese stocks. Tokyo: University of Tokyo.

Vlastakis, N. & Markellos, R. (2012). Information demand and stock market volatility. Journal of Banking and Finance, 36(6), 1808-1821.



APPENDIX


1. R Functions Manual
   1.1. Single word results
   1.2. Model results
2. Word set
   2.1. Translated word set (Preis et al., 2013)
   2.2. Financial relevance (Preis et al., 2013)
3. Returns
   3.1. Single word returns
   3.2. Model returns
4. R code

1. R Functions Manual For calculating the profitability of the presented trading strategy based on Google Trends data, a range of functions was developed in R.

1.1. Single word results

1.1.1. GTS() The GTS() function is defined as follows:

GTS <- function(Q="hello", translate=FALSE, loc="BE", index="BEL20", startIN="2004-01-01", endIN="2016-02-27", outsample=FALSE, startOUT="2014-01-01", endOUT="2016-02-27", lag=3, PW=TRUE)

With the following parameters:

Q          The Google search query whose performance in the investment strategy you want to investigate. Individual search queries as well as vectors of queries can be entered. The default value is "hello".
translate  Whether or not to run the analysis on the translated Q (using the algorithm described in XX). The default value is FALSE.
loc        Country code of the country where the queries are entered in Google. The default value is Belgium: "BE".
index      The index used in the investment strategy. The default value is "BEL20". For now only the BEL20 is possible; the goal is to program a quantmod method that allows the user to calculate the strategy's performance on foreign stock indices (e.g. DJIA, CAC40), individual stock quotes (e.g. GOOG, GIMB) and (possibly) foreign exchange rates.
startIN    Start of the in-sample period, in the "YYYY-MM-DD" format. The default value is "2004-01-01".
endIN      End of the in-sample period, in the "YYYY-MM-DD" format. The default value is "2016-02-27".
outsample  Whether or not an out-of-sample analysis should be included. The default value is FALSE. For now only in-sample is possible.
lag        Lag parameter indicating the number of previous weeks considered in the calculation of the relative change in query volume Δn during week t (see XX). The default value is 3.
PW         Whether or not the entered query is part of the word set used by Preis et al. (2013). The default value is TRUE.

The output of the GTS() function is threefold:
1. A plot of the weekly returns of the investment strategy;
2. A plot of the cumulative returns of the investment strategy and of a buy-and-hold strategy on the BEL20;
3. The cumulative return of the strategy at the end of the investment period.


1.1.2. ReturnList() The ReturnList() function is used to run GTS() through a list of words. Next to the plots returned by GTS() the function returns an ordered list of the returns realized using the investment strategy of each query, which is further exported to a .csv-file.

1.2. Model results

1.2.1. GWCP() The function GWCP(), named after Get Weekly Closing Prices, is defined as follows:

GWCP <- function(asset="^GDAXI", type="stock", startIN="2004-01-01", endIN="2016-02-27")

asset    The symbol for the financial asset. This symbol depends on the source of the financial data, i.e. Yahoo Finance or Oanda. The codes used in Yahoo Finance are:
         - BEL 20: "^BFX"
         - AEX: "^AEX"
         - CAC 40: "^FCHI"
         - DAX 30: "^GDAXI"
         - DJIA: "^DJI"
         - FTSE 100: "^FTSE"
         - S&P 500: "^GSPC"
type     When the financial asset is a stock or market index, the parameter value "stock" should be used. When it is an exchange rate, the parameter value "forex" should be used.
startIN  Start of the sample period, in the "YYYY-MM-DD" format. The default value is "2004-01-01".
endIN    End of the sample period, in the "YYYY-MM-DD" format. The default value is "2016-02-27".

The function makes use of the getSymbols() function from the quantmod package. The output of GWCP() is a vector of weekly returns for the given financial asset and the given time period.


1.2.2. GGTD() The function GGTD(), named after Get Google Trends Data, is defined as follows:

GGTD <- function(Q="hello", translated=FALSE, loc="BE", startIN="2004-01-01", endIN="2016-02-27", PW=FALSE, ...)

Q           The Google search word for which you want to retrieve search volumes. Individual search words as well as vectors of search words can be entered. The default value is "hello".
translated  Whether or not to run the analysis on the translated Q (using the algorithm described in XX). The default value is FALSE.
loc         Country code of the country where the queries are entered in Google. The default value is Belgium: "BE".
startIN     Start of the sample period, in the "YYYY-MM-DD" format. The default value is "2004-01-01".
endIN       End of the sample period, in the "YYYY-MM-DD" format. The default value is "2016-02-27".
PW          Only relevant if the word set is translated. It determines whether or not the entered word set is organized in a way similar to the word set used by Preis et al. (2013). The default value is FALSE.
lag         Lag parameter indicating the number of previous weeks considered in the calculation of the relative change in query volume Δn during week t (see XX). The default value is 3.

The function makes use of the gtrends() function from the gtrendsR package. The output of GGTD() is a vector of weekly search volumes for the given search word Q over the period between startIN and endIN.

1.2.3. LWRAW() The function LWRAW(), named after List Weekly Returns of All Words, is defined as follows:

LWRAW <- function(tcosts=0)

tcosts The transaction cost per four-weekly period

This is the main function that executes our model as described in Section 2.2.3. It incorporates the results from GWCP() and GGTD(), and calculates the average four-weekly return of the model.


2. Word set 2.1. Translated word set (Preis et al., 2013) The 98 terms used as queries in the paper by Preis et al. (2013), together with the translations from Van Dale (D), Larousse (F) and Beolingus (G).

Dataset: Preis et al. (2013) Q D F G E arts kunst art kunst arts banking bank banque bankwesen banking bonds obligaties bons Verzinsliches Wertpapier bonds bubble zeepbel bulle seifenblase bubble buy kopen acheter kaufen buy cancer kanker cancer krebs cancer car auto voiture auto car cash cash espèces cash cash chance toeval hasard zufall chance color kleur couleur farbe color conflict conflict conflit konflikt conflict consume consumeren consommer konsumieren consume consumption consumptie consommation konsum consumption crash crash crash zusammenbruch crash credit krediet crédit kredit credit crisis crisis crise krise crisis culture cultuur culture kultur culture debt schulden dette schuld debt default wanbetaling défaut de paiement Nichtzahlung default derivatives derivaten dérivées derivaten derivatives dividend dividend dividende dividende dividend dow jones dow jones dow jones dow jones dow jones earnings inkomsten revenus einkünfte earnings economics economie économie wirtschaft economics economy economie économie wirtschaft economy energy energie énergie energie energy environment milieu environnement umwelt environment fed fed fed fed fed finance financiën finance finanz finance financial markets financiële markten marchés financiers Finanzmärkte financial markets fine boete amende busse fine fond dierbaar précieux teuer fond

food | voedsel | nourriture | nahrung | food
forex | wisselkoers | change | wechselkurs | forex
freedom | vrijheid | liberté | freiheit | freedom
fun | plezier | plaisir | Vergnügen | fun
gain | stijging | gain | ansteig | gain
gains | winst | profit | gewinn | gains
garden | tuin | jardin | garten | garden
gold | goud | or | gold | gold
greed | hebzucht | avidité | habsucht | greed
growth | groei | croissance | wachstum | growth
happy | gelukkig | heureux | glücklich | happy
headlines | hoofdpunten | titres | Schlagzeile | headlines
health | gezondheid | santé | Gesundheit | health
hedge | indekken | couvrir | absichern | hedge
hedging | indekken | couvrir | absichern | hedging
holiday | vakantie | vacances | Urlaub | holiday
home | thuis | maison | Haus | home
house | huis | maison | Haus | house
housing | huisvesting | logement | Unterkunft | housing
inflation | inflatie | inflation | inflation | inflation
invest | investeren | inverstir | investieren | invest
investment | investering | investissement | investierung | investment
kitchen | keuken | cuisine | küche | kitchen
labor | werk | travail | arbeit | labor
leverage | hefboom | effet de levier | hebel | leverage
lifestyle | levenswijze | mode de vie | lebensweise | lifestyle
loss | verlies | perte | verlust | loss
markets | markten | marchés | markte | markets
marriage | huwelijk | mariage | heirat | marriage
metals | metalen | métal | metalle | metals
money | geld | argent | geld | money
movie | film | film | film | movie
nasdaq | nasdaq | nasdaq | nasdaq | nasdaq
nyse | nyse | nyse | nyse | nyse
office | kantoor | bureau | büro | office
oil | olie | pétrole | öl | oil
opportunity | kans | occasion | chance | opportunity
ore | erts | minerai | erz | ore
politics | politiek | politique | politik | politics
portfolio | portfolio | portefeuille | portefeuille | portfolio

present | cadeau | cadeau | geschenk | present
profit | winst | profit | nutzen | profit
rare earths | zeldzame aarden | terre rare | Metalle der Seltenen Erden | rare earths
religion | religie | religion | religion | religion
restaurant | restaurant | restaurant | restaurant | restaurant
return | rendement | rendement | ertrag | return
returns | rendementen | rendements | ertragen | returns
revenue | omzet | chiffre d'affaires | umsatz | revenue
rich | rijk | riche | reich | rich
ring | ring | bague | ring | ring
risk | risico | risque | risiko | risk
sell | verkopen | vendre | verkaufen | sell
short selling | short gaan | vente à découvert | leerverkauf | short selling
society | maatschappij | société | gesellschaft | society
stock market | beurs | bourse | börse | stock market
stocks | aandelen | actions | anteilen | stocks
success | succes | succès | erfolg | success
tourism | toerisme | tourisme | tourismus | tourism
trader | handelaar | contrepartiste | händler | trader
train | trein | train | zug | train
transaction | transactie | transaction | transaktion | transaction
travel | reizen | voyager | reisen | travel
unemployment | werkloosheid | chômage | Arbeitslosigkeit | unemployment
war | oorlog | guerre | krieg | war
water | water | eau | wasser | water
world | wereld | monde | welt | world


2.2. Financial relevance (Preis et al., 2013)


3. Returns

3.1. Single word returns

3.1.1. Not translated, lag Δt = 1

Returns of the Google Trends strategy using Preis' (2013) search queries

Q (Preis, 2013) | Return of GT strategy | Excess return over BEL20 B&H

Outperforming the BEL20 buy-and-hold strategy:
inflation | 183% | 138%
food | 178% | 133%
risk | 172% | 127%
housing | 155% | 109%
lifestyle | 149% | 103%
consume | 146% | 101%
dow jones | 143% | 97%
credit | 142% | 97%
growth | 141% | 96%
fed | 138% | 93%
money | 136% | 90%
fond | 135% | 89%
bonds | 133% | 88%
marriage | 132% | 87%
economy | 130% | 85%
hedge | 125% | 79%
society | 124% | 78%
stocks | 123% | 78%
tourism | 123% | 77%
water | 122% | 77%
crisis | 121% | 76%
debt | 120% | 75%
nasdaq | 116% | 71%
invest | 114% | 68%
finance | 113% | 67%
culture | 108% | 62%
train | 107% | 61%
banking | 107% | 61%
cancer | 103% | 57%
success | 96% | 51%
financial markets | 96% | 50%
labor | 95% | 50%
ore | 95% | 49%
markets | 93% | 47%
economics | 92% | 47%
investment | 91% | 46%
short selling | 91% | 45%
unemployment | 91% | 45%
arts | 87% | 41%
home | 84% | 38%
environment | 82% | 37%
dividend | 82% | 36%
fine | 78% | 32%
freedom | 78% | 32%
returns | 74% | 28%
present | 74% | 28%
office | 72% | 27%
health | 71% | 26%
forex | 71% | 25%
bubble | 65% | 19%
return | 61% | 15%
buy | 60% | 14%
cash | 60% | 14%
opportunity | 57% | 12%
war | 57% | 12%
leverage | 57% | 11%
gold | 53% | 7%
greed | 50% | 5%
holiday | 50% | 4%
stock market | 49% | 4%
gains | 49% | 4%
color | 48% | 3%
conflict | 45% | 0%

Not outperforming the BEL20 buy-and-hold strategy:
loss | 44% | -2%
headlines | 43% | -2%
car | 43% | -3%
travel | 40% | -5%
religion | 38% | -7%
profit | 38% | -8%
earnings | 34% | -11%
chance | 32% | -13%
happy | 32% | -13%
portfolio | 31% | -14%
transaction | 31% | -15%
oil | 28% | -18%
metals | 28% | -18%
energy | 23% | -23%
garden | 20% | -26%
revenue | 19% | -26%
politics | 12% | -34%
sell | 12% | -34%
movie | 9% | -37%
world | 8% | -37%
gain | 4% | -41%
hedging | 4% | -42%
consumption | 3% | -42%
nyse | 0% | -45%

Unprofitable trading strategy:
derivatives | -4% | -49%
fun | -5% | -51%
rich | -7% | -53%
default | -8% | -53%
crash | -8% | -54%
ring | -11% | -57%
restaurant | -12% | -57%
trader | -27% | -73%
house | -28% | -74%
kitchen | -42% | -88%
rare earths | -71% | -117%

3.1.2. Not translated, lag Δt = 2

Returns of the Google Trends strategy using Preis' (2013) search queries Q Preis (2013) Returns of GT strategy Excess return over BEL20 B&H debt 158% 113% lifestyle 148% 102% credit 146% 100% banking 145% 99% home 141% 95% growth 137% 91% water 134% 88% fond 133% 87% freedom 131% 86% crisis 126% 81% housing 126% 80% economics 126% 80% returns 125% 79% loss 121% 75% cash 116% 71% hedge 114% 68% dow jones 111% 65% marriage 110% 65% invest 105% 59% default 104% 59%

consume 102% 57% strategy buy-and-hold BEL20 the Outperforming inflation 102% 56% stock market 101% 56% headlines 97% 51% cancer 97% 51% health 93% 48% sell 92% 47% unemployment 90% 45% tourism 90% 45% money 90% 44% bonds 90% 44% short selling 88% 42% ore 87% 42% war 86% 41% oil 83% 37% leverage 81% 36% financial markets 81% 36% nasdaq 80% 34% dividend 79% 33% stocks 73% 28% color 73% 27% travel 72% 26% metals 70% 25% finance 70% 25% risk 69% 24% return 65% 20% fine 65% 20% labor 65% 20% transaction 65% 19% office 62% 17% society 62% 17% chance 62% 16% environment 60% 15% greed 60% 14% holiday 56% 10% economy 54% 8% food 50% 4% nyse 48% 3% investment 46% 1% hedging 39% - 6% strategy buy-and-hold BEL20 the outperforming Not house 38% - 8% earnings 34% - 12% train 34% - 12% fed 33% - 13% gold 29% - 16% culture 29% - 17% politics 27% - 18% profit 25% - 21% success 21% - 25% forex 20% - 26% markets 18% - 28% arts 17% - 29% gain 16% - 29% rich 13% - 33% movie 7% - 39% rare earths 1% - 45% religion 1% - 45% world - 2% - 48% gains - 3% - 48% bubble - 5% - 51% portfolio - 8% - 53% crash - 9% - 54% present - 11% - 56% strategy trading Unprofitable derivatives - 12% - 58% car - 13% - 58% kitchen - 14% - 60% conflict - 18% - 64% consumption - 19% - 65% fun - 23% - 68% buy - 29% - 75% ring - 37% - 83% happy - 38% - 84% energy - 42% - 88% opportunity - 43% - 89% revenue - 55% - 101% trader - 65% - 110% restaurant - 77% - 123% garden - 115% - 160%


3.1.3. Not translated, lag Δt = 3

Returns of the Google Trends strategy using Preis' (2013) search queries Q Preis (2013) Returns of GT strategy Excess return over BEL20 B&H marriage 225% 180% cash 182% 136% housing 180% 135% hedging 163% 118% banking 163% 118% hedge 162% 117% inflation 162% 116% invest 146% 100% freedom 141% 96% health 136% 91% leverage 135% 89% metals 128% 83% debt 125% 79% returns 125% 79% credit 122% 77% food 121% 75% water 120% 74% movie 116% 70% unemployment 114% 68% buy 113% 67% crisis 112% 67% economics 112% 66% strategy buy-and-hold BEL20 the Outperforming stock market 111% 65% loss 108% 63% growth 101% 56% ore 101% 55% bonds 100% 54% short selling 95% 49% dividend 94% 49% color 92% 47% society 91% 46% home 89% 43% dow jones 89% 43% war 87% 41% finance 85% 40% religion 83% 38% chance 82% 37% greed 80% 34% nasdaq 74% 29% default 73% 27% headlines 72% 27% labor 72% 27% portfolio 71% 26% risk 71% 26% cancer 71% 25% travel 70% 24% investment 69% 24% return 69% 23% lifestyle 68% 22% tourism 62% 16% gold 61% 15% earnings 60% 15% markets 59% 14% fine 58% 12% stocks 55% 10% crash 53% 7% arts 51% 6% sell 51% 6% nyse 50% 5% transaction 46% 1% gains 43% - 2%

car 43% - 3% strategy buy-and-hold BEL20 the outperforming Not bubble 40% - 5% economy 39% - 6% kitchen 39% - 7% oil 36% - 9% holiday 35% - 10% profit 34% - 12% politics 33% - 12% fun 31% - 15% derivatives 30% - 15% fond 28% - 18% opportunity 19% - 26% environment 17% - 29% fed 16% - 30% garden 16% - 30% consumption 13% - 32% gain 13% - 33% money 12% - 33% forex 6% - 39% train 6% - 40% office 2% - 43% house - 3% - 49% energy - 3% - 49%

culture - 11% - 57% strategy trading Unprofitable success - 12% - 58% restaurant - 19% - 64% financial markets - 19% - 65% happy - 22% - 67% consume - 24% - 70% ring - 32% - 78% rich - 40% - 85% revenue - 57% - 103% present - 87% - 133% conflict - 97% - 142% trader - 101% - 147% world - 113% - 158% rare earths - 139% - 185%


3.1.4. Not translated, lag ∆푡 = 4 Returns of the Google Trends strategy using Preis' (2013) search queries Q Preis (2013) Returns of GT strategy Excess return over BEL20 B&H marriage 207% 161% cash 199% 153% color 152% 106% lifestyle 143% 98% bonds 140% 94% ore 136% 91% inflation 131% 86% consume 131% 85% housing 128% 82% buy 124% 78% food 121% 75% dow jones 113% 67% crisis 112% 66% returns 104% 58% metals 103% 58% growth 103% 57% portfolio 101% 56% invest 101% 56% economics 100% 55% stock market 100% 55% strategy buy-and-hold BEL20 the Outperforming health 97% 51% nasdaq 96% 51% stocks 93% 47% debt 90% 44% banking 89% 43% hedge 87% 41% fed 84% 39% fine 82% 37% earnings 82% 37% leverage 79% 34% finance 75% 30% movie 74% 29% financial markets 74% 28% cancer 71% 25% credit 70% 24% water 68% 23% default 68% 22% sell 67% 22% opportunity 66% 21% freedom 65% 19% kitchen 63% 17% risk 61% 16% economy 60% 14% profit 59% 13% travel 58% 13% society 57% 11% nyse 56% 10% loss 56% 10% home 55% 10% fun 54% 9% short selling 52% 7% money 48% 3% crash 47% 1% greed 46% 0% return 46% 0% bubble 45% 0% transaction 45% - 1% markets 43% - 3%

chance 42% - 4% Not outperforming the BEL20 buy-and-hold strategy buy-and-hold BEL20 the outperforming Not holiday 41% - 4% war 41% - 5% politics 37% - 9% ring 35% - 11% oil 34% - 11% gold 31% - 15% dividend 30% - 15% gains 30% - 16% train 29% - 16% energy 26% - 19% environment 20% - 26% hedging 17% - 28% headlines 16% - 29% investment 15% - 30% forex 15% - 31% derivatives 11% - 35% conflict 9% - 36% rich 7% - 39% arts 2% - 44% tourism 1% - 45% culture 0% - 46% office 0% - 46% rare earths - 1% - 46% religion - 2% - 48% happy - 2% - 48% consumption - 4% - 50% strategy trading Unprofitable labor - 7% - 53% gain - 7% - 53% revenue - 8% - 53% unemployment - 14% - 59% fond - 14% - 60% present - 24% - 70% garden - 27% - 72% trader - 46% - 91% success - 46% - 92% car - 58% - 104% restaurant - 65% - 110% house - 66% - 112% world - 87% - 132% 12

3.1.5. Not translated, lag ∆푡 = 5 Returns of the Google Trends strategy using Preis' (2013) search queries Q Preis (2013) Returns of GT strategy Excess return over BEL20 B&H marriage 231% 185% hedge 163% 118% health 163% 117% metals 158% 112% bonds 144% 98% lifestyle 135% 89% portfolio 131% 85% debt 125% 80% inflation 123% 77% nasdaq 120% 74% loss 119% 74% food 119% 74% ore 118% 73% housing 117% 71% credit 116% 71%

cash 111% 66% strategy buy-and-hold BEL20 the Outperforming investment 109% 64% banking 109% 64% movie 107% 61% financial markets 106% 60% stock market 106% 60% invest 99% 54% fed 95% 50% default 88% 42% economics 88% 42% growth 86% 40% dow jones 85% 40% water 84% 39% returns 83% 38% culture 83% 37% leverage 80% 34% fine 79% 33% stocks 79% 33% money 78% 33% chance 75% 29% economy 73% 27% nyse 72% 26% consume 72% 26% environment 63% 18% finance 60% 14% success 57% 11% travel 56% 11% cancer 56% 10% profit 55% 10% holiday 53% 7% train 52% 7% hedging 52% 6% earnings 47% 2% return 47% 1% headlines 44% - 2% buy 39% - 7% arts 38% - 8% bubble 36% - 10% forex 35% - 10% happy 35% - 11% unemployment 33% - 12%

energy 31% - 15% strategy buy-and-hold BEL20 the outperforming Not fun 31% - 15% color 30% - 15% markets 30% - 15% gains 30% - 16% religion 29% - 17% short selling 28% - 17% revenue 28% - 18% risk 28% - 18% opportunity 26% - 19% transaction 25% - 21% conflict 21% - 25% office 18% - 28% kitchen 16% - 30% crash 16% - 30% home 16% - 30% freedom 15% - 30% society 15% - 31% present 14% - 32% war 13% - 32% ring 12% - 34% crisis 11% - 34% oil 11% - 35% world 9% - 36% dividend 7% - 38% consumption 5% - 40% politics 5% - 40% gain - 1% - 46% garden - 3% - 48% rare earths - 3% - 49% strategy trading Unprofitable rich - 17% - 62% tourism - 19% - 64% labor - 22% - 67% gold - 22% - 68% greed - 31% - 77% sell - 33% - 78% fond - 37% - 83% house - 38% - 84% derivatives - 40% - 85% car - 59% - 104% restaurant - 62% - 108% trader - 76% - 121%


3.1.6. Not translated, lag ∆푡 = 6 Returns of the Google Trends strategy using Preis' (2013) search queries Q Preis (2013) Returns of GT strategy Excess return over BEL20 B&H marriage 225% 179% bonds 167% 121% lifestyle 166% 121% hedge 152% 106% stock market 150% 104% financial markets 125% 80% movie 124% 78% ore 118% 72% nyse 117% 72% finance 113% 68% environment 109% 64% credit 106% 60% unemployment 102% 57% money 100% 54% banking 98% 52% leverage 97% 51% stocks 95% 49% nasdaq 95% 49% strategy buy-and-hold BEL20 the Outperforming portfolio 92% 46% metals 91% 45% inflation 91% 45% fine 87% 42% opportunity 86% 41% consume 84% 39% cancer 83% 37% cash 79% 34% dow jones 79% 34% food 79% 34% invest 77% 32% investment 76% 30% fed 75% 29% returns 74% 28% color 73% 28% earnings 70% 24% chance 70% 24% debt 68% 22% health 66% 20% travel 64% 18% economy 64% 18% growth 61% 15% water 59% 14% economics 58% 13% culture 55% 10% hedging 52% 7% housing 49% 4% success 49% 4% energy 48% 2% profit 48% 2% loss 47% 1% return 47% 1% fun 47% 1% crisis 46% 0% greed 44% - 2% society 42% - 3%

bubble 42% - 4% strategy buy-and-hold BEL20 the outperforming Not happy 37% - 9% revenue 32% - 14% train 32% - 14% politics 32% - 14% sell 30% - 16% forex 27% - 19% transaction 27% - 19% gains 25% - 20% markets 23% - 23% war 22% - 23% arts 22% - 24% risk 22% - 24% ring 21% - 25% home 20% - 26% oil 17% - 28% freedom 16% - 29% religion 13% - 33% gain 12% - 34% consumption 9% - 36% default 6% - 40% buy 0% - 45% labor - 1% - 47% short selling - 6% - 51% headlines - 6% - 52% office - 11% - 57% present - 13% - 59%

garden - 15% - 61% strategy trading Unprofitable rich - 17% - 62% rare earths - 17% - 63% crash - 19% - 65% fond - 21% - 67% dividend - 24% - 70% kitchen - 30% - 76% holiday - 31% - 77% house - 35% - 80% conflict - 41% - 87% tourism - 49% - 95% restaurant - 56% - 101% derivatives - 57% - 102% car - 62% - 108% gold - 67% - 113% trader - 97% - 142% world - 97% - 143% 14

3.1.7. Not translated, average over all lag periods Returns of the Google Trends strategy using Preis' (2013) search queries Q Preis (2013) Returns of GT strategy Excess return over BEL20 B&H marriage 188% 143% lifestyle 135% 89% hedge 134% 88% inflation 132% 86% bonds 129% 83% housing 126% 80% cash 124% 79% banking 118% 73% credit 117% 71% debt 114% 69% food 111% 66% ore 109% 64% invest 107% 61% growth 105% 59% health 104% 59% dow jones 103% 58% stock market 103% 57% water 98% 52% returns 97% 52% strategy buy-and-hold BEL20 the Outperforming nasdaq 97% 51% metals 96% 51% economics 96% 50% leverage 88% 43% crisis 88% 42% stocks 86% 41% finance 86% 40% consume 85% 40% loss 82% 37% cancer 80% 34% color 78% 33% money 77% 32% financial markets 77% 32% fine 75% 29% freedom 74% 29% fed 74% 28% movie 73% 27% risk 71% 25% economy 70% 24% portfolio 70% 24% unemployment 70% 24% investment 68% 22% home 67% 22% society 65% 20% chance 60% 15% travel 60% 15% environment 59% 13% short selling 58% 12% nyse 57% 12% return 56% 10% default 55% 10% earnings 55% 9% hedging 55% 9% war 51% 6% buy 51% 5% dividend 45% - 1% headlines 44% - 1% markets 44% - 1% culture 44% - 2% train 43% - 2%

profit 43% - 3% Not outperforming the BEL20 buy-and-hold strategy buy-and-hold BEL20 the outperforming Not greed 41% - 4% transaction 40% - 6% fond 37% - 8% bubble 37% - 9% sell 37% - 9% arts 36% - 10% opportunity 35% - 10% oil 35% - 11% tourism 35% - 11% holiday 34% - 12% labor 34% - 12% gains 29% - 17% forex 29% - 17% success 27% - 18% religion 27% - 19% politics 24% - 21% office 24% - 22% fun 22% - 23% gold 14% - 32% energy 14% - 32% crash 13% - 32% happy 7% - 39% gain 6% - 39% kitchen 5% - 40% consumption 1% - 44% ring - 2% - 48% revenue - 7% - 52% strategy trading Unprofitable present - 8% - 54% rich - 10% - 56% derivatives - 12% - 57% conflict - 13% - 59% car - 18% - 63% garden - 21% - 66% house - 22% - 68% rare earths - 38% - 84% world - 47% - 92% restaurant - 48% - 94% trader - 69% - 114% 15

3.1.8. Translated, lag Δt = 1

Returns of the Google Trends strategy using Preis' (2013) search queries Q Preis (2013) Returns of GT strategy Excess return over BEL20 B&H

train 129% 84% strategy buy-and-hold BEL20 the Outperforming holiday 113% 68% economy 113% 68% crisis 108% 63% water 103% 57% society 102% 57% money 97% 51% cash 96% 51% forex 81% 36% garden 76% 31% office 73% 28% credit 69% 24% economics 60% 14% inflation 59% 14% oil 58% 12% freedom 57% 12% housing 48% 2% color 47% 1%

arts 47% 1% Not outperforming the BEL20 BEL20 the outperforming Not happy 42% - 4% finance 40% - 5% strategy buy-and-hold culture 40% - 6% derivatives 38% - 7% car 30% - 16% fed 23% - 22% risk 11% - 34% fine 8% - 37% home 3% - 43% buy 3% - 43% cancer 1% - 45% debt - 1% - 47% politics - 2% - 47% rich - 8% - 53% stock market - 10% - 56% environment - 16% - 62% nasdaq - 17% - 63% opportunity - 18% - 64% fond - 22% - 68%

nyse - 24% - 70% strategy trading Unprofitable world - 25% - 71% dow jones - 29% - 75% house - 29% - 75% travel - 32% - 77% present - 32% - 78% war - 40% - 85% movie - 40% - 86% ring - 44% - 90% dividend - 52% - 97% portfolio - 60% - 106% sell - 64% - 110% labor - 64% - 110% religion - 71% - 116% success - 76% - 122% profit - 88% - 134% gold - 105% - 150% health - 112% - 157% tourism - 152% - 197% restaurant - 176% - 221%


3.1.9. Translated, lag Δt = 2

Returns of the Google Trends strategy using Preis' (2013) search queries Q Preis (2013) Returns of GT strategy Excess return over BEL20 B&H

credit 138% 93% buy-and-hold BEL20 Outperforming the the Outperforming crisis 132% 87%

inflation 100% 54% strategy office 99% 54% finance 98% 52% cash 92% 46% risk 80% 35% holiday 48% 2% stock market 48% 2%

money 43% - 2% buy-and- BEL20 the outperforming Not housing 40% - 6% dow jones 32% - 14% economy 30% - 16% nasdaq 28% - 18% war 27% - 19% strategy hold freedom 26% - 20% derivatives 23% - 22% color 23% - 23% travel 22% - 24% culture 18% - 28% forex 15% - 31% oil 14% - 32% fed 8% - 38% buy 6% - 40% present 1% - 45% fond 0% - 45% car - 1% - 47% success - 1% - 47% debt - 3% - 48% world - 6% - 52% profit - 7% - 53% environment - 8% - 54% dividend - 11% - 57% garden - 11% - 57% society - 13% - 59% movie - 16% - 62% water - 17% - 62% strategy trading Unprofitable cancer - 19% - 65% happy - 21% - 67% arts - 26% - 72% train - 29% - 75% politics - 34% - 80% economics - 38% - 83% portfolio - 39% - 84% nyse - 42% - 88% restaurant - 43% - 89% gold - 52% - 97% religion - 54% - 99% house - 54% - 99% home - 54% - 100% opportunity - 60% - 106% ring - 63% - 108% labor - 101% - 147% fine - 101% - 147% sell - 103% - 149% health - 110% - 156% tourism - 115% - 160% rich - 128% - 174%


3.2. Model returns

3.2.1. Average returns (without transaction costs)

The plots in this section are included as an illustration of the pirate plot data visualization. For each of the seven market indices, the average four-weekly returns are plotted using the pirateplot() function of the yarrr package. In this illustration we use the returns without transaction costs.
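For readers who want to reproduce this kind of plot, a minimal pirateplot() call might look as follows. The data frame, its column names and the method labels are invented here purely for illustration; the thesis plots use the actual four-weekly model returns.

```r
# Minimal pirate plot sketch: average four-weekly returns per method.
# The data below are randomly generated, not the thesis results.
library(yarrr)  # provides pirateplot()

set.seed(1)
returns <- data.frame(
  method = rep(c("F", "L", "FL"), each = 50),
  fourweekly_return = rnorm(150, mean = 0.01, sd = 0.05)
)

# one "bean" per method, showing raw points, density and a central tendency line
pirateplot(formula = fourweekly_return ~ method,
           data = returns,
           xlab = "method", ylab = "average four-weekly return")
```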

3.2.1.1. BEL 20

3.2.1.2. AEX


3.2.1.3. CAC 40

3.2.1.4. DAX 30


3.2.1.5. DJIA

3.2.1.6. FTSE 100


3.2.1.7. S&P 500

3.2.2. Cumulative returns (including transaction costs)

Below are the graphs of the methods that significantly outperform the market indices. The returns are calculated including transaction costs (as defined in Section 2.3).
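How transaction costs enter the cumulative return can be illustrated with a small self-contained computation. The return series and cost level below are invented toy numbers; the actual cost assumptions are the ones defined in Section 2.3.

```r
# Toy illustration: cumulative log return of a strategy with a fixed
# transaction cost charged once per four-weekly trading period.
fourweekly_returns <- c(0.03, -0.01, 0.02, 0.04, -0.02)  # invented returns
tcosts <- 0.005                                          # cost per period

net_returns <- fourweekly_returns - tcosts  # subtract the cost each period
cumulative  <- cumsum(net_returns)          # running cumulative net return
cumulative[length(cumulative)]              # total net cumulative return: 0.035
```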

3.2.2.1. BEL 20

[Figure: Cumulative returns on BEL 20, 2004-2014. Methods shown: L and FL; labelled cumulative returns: 295% and 119%. Y-axis: cumulative return, -100% to 400%.]


3.2.2.2. AEX

[Figure: Cumulative returns on AEX, 2004-2014. Methods shown: F, L and FL; labelled cumulative returns: 301%, 277% and 257%. Y-axis: cumulative return, 0% to 500%.]

3.2.2.3. CAC 40

[Figure: Cumulative returns on CAC 40, 2004-2014. Methods shown: L and FL; labelled cumulative returns: 177% and 136%. Y-axis: cumulative return, 0% to 400%.]

3.2.2.4. DAX 30

[Figure: Cumulative returns on DAX 30, 2004-2014. Method shown: L; labelled cumulative return: 336%. Y-axis: cumulative return, 0% to 500%.]

22

3.2.2.5. DJIA

[Figure: Cumulative returns on DJIA, 2004-2014. Methods shown: L, FL and FLR; labelled cumulative returns: 380%, 175% and 126%. Y-axis: cumulative return, -100% to 400%.]

3.2.2.6. FTSE 100

[Figure: Cumulative returns on FTSE 100, 2004-2014. Methods shown: F, L and FL; labelled cumulative returns: 187%, 177% and 111%. Y-axis: cumulative return, -100% to 400%.]

3.2.2.7. S&P 500

[Figure: Cumulative returns on S&P 500, 2004-2014. Methods shown: L and FL; labelled cumulative returns: 357% and 154%. Y-axis: cumulative return, -100% to 400%.]


4. R code

The R code for the functions defined in Section 1 is given below. The code was exported using GitHub Gist.

GTS()

GTS <- function(Q="hello", translated=TRUE, loc="BE", index="BEL20", startIN="2004-01-01", endIN="2016-02-27", lag=1, PW=TRUE){

## 1. GETGT(Q, loc, startIN, endIN)
if(translated==FALSE){
  GT <- gtrends(Q, location=loc, start_date=startIN, end_date=endIN)
  TRENDS <- GT$trend
} else {
  if(PW==TRUE){
    GT <- gtrends(as.vector(t(Preiswords[Q,])), location=loc, start_date=startIN, end_date=endIN)
    TRENDS <- GT$trend

    mqD <- mean(as.vector(t(TRENDS[3])))
    mqF <- mean(as.vector(t(TRENDS[4])))
    mqG <- mean(as.vector(t(TRENDS[5])))
    mqE <- mean(as.vector(t(TRENDS[6])))

    sumM <- (mqD+mqF+mqG+mqE)

    wqD <- mqD/sumM
    wqF <- mqF/sumM
    wqG <- mqG/sumM
    wqE <- mqE/sumM

    #calculate combined query SVI
    COMBINED <- (
      wqD*as.vector(t(gtrends(as.vector(t(Preiswords[Q,"qD"])), location=loc, start_date=startIN, end_date=endIN)$trend[3])) +
      wqF*as.vector(t(gtrends(as.vector(t(Preiswords[Q,"qF"])), location=loc, start_date=startIN, end_date=endIN)$trend[3])) +
      wqG*as.vector(t(gtrends(as.vector(t(Preiswords[Q,"qG"])), location=loc, start_date=startIN, end_date=endIN)$trend[3])) +
      wqE*as.vector(t(gtrends(as.vector(t(Preiswords[Q,"qE"])), location=loc, start_date=startIN, end_date=endIN)$trend[3])))

    #combine to dataframe and name columns
    TRENDS <- cbind(TRENDS[1], TRENDS[2], COMBINED)
  } else {
    # TODO: find a way to translate the non-Preis words via R!
  }
}

colnames(TRENDS)[1] <- "Sunday"
colnames(TRENDS)[2] <- "Saturday"
colnames(TRENDS)[3] <- Q
TRENDS <- TRENDS[-c(635, 636, 637, 638), ]

l <- 634
GT_lag1 <- head(c(0, TRENDS[[3]]), l)
GT_lag2 <- head(c(0, 0, TRENDS[[3]]), l)
GT_lag3 <- head(c(0, 0, 0, TRENDS[[3]]), l)
GT_lag4 <- head(c(0, 0, 0, 0, TRENDS[[3]]), l)
GT_lag5 <- head(c(0, 0, 0, 0, 0, TRENDS[[3]]), l)
GT_lag6 <- head(c(0, 0, 0, 0, 0, 0, TRENDS[[3]]), l)

TRENDS <- cbind(TRENDS, GT_lag1, GT_lag2, GT_lag3, GT_lag4, GT_lag5, GT_lag6)

## 2. STRATEGY (lag, index)
if (lag==6){
  TRENDS$DeltaN <- (TRENDS[[3]] - (TRENDS$GT_lag1+TRENDS$GT_lag2+TRENDS$GT_lag3+TRENDS$GT_lag4+TRENDS$GT_lag5+TRENDS$GT_lag6)/lag)
} else if (lag==5){
  TRENDS$DeltaN <- (TRENDS[[3]] - (TRENDS$GT_lag1+TRENDS$GT_lag2+TRENDS$GT_lag3+TRENDS$GT_lag4+TRENDS$GT_lag5)/lag)
} else if (lag==4){
  TRENDS$DeltaN <- (TRENDS[[3]] - (TRENDS$GT_lag1+TRENDS$GT_lag2+TRENDS$GT_lag3+TRENDS$GT_lag4)/lag)
} else if (lag==3){
  TRENDS$DeltaN <- (TRENDS[[3]] - (TRENDS$GT_lag1+TRENDS$GT_lag2+TRENDS$GT_lag3)/lag)
} else if (lag==2){
  TRENDS$DeltaN <- (TRENDS[[3]] - (TRENDS$GT_lag1+TRENDS$GT_lag2)/lag)
} else {
  TRENDS$DeltaN <- (TRENDS[[3]] - TRENDS$GT_lag1)
}

#profit from strategy
#TRENDS$profit <- ifelse(TRENDS$DeltaN>0, (BEL20$Fr_closing-BEL20$Fr_closing_lead1)/BEL20$Fr_closing_lead1, (BEL20$Fr_closing_lead1-BEL20$Fr_closing)/BEL20$Fr_closing)
TRENDS$profit <- ifelse(TRENDS$DeltaN>0, log(BEL20$Fr_closing)-log(BEL20$Fr_closing_lead1), log(BEL20$Fr_closing_lead1)-log(BEL20$Fr_closing))

#cumulative profit over holding period
TRENDS$totalprofit[4:634] = cumsum(TRENDS$profit[4:634])
totalcumprofit <- TRENDS$totalprofit[634]
#print(paste("The cumulative return of the GT strategy using '", Q, "' is", totalcumprofit))

#################################
## 3. PLOT
#
#if(translate==FALSE){
#
#  #plot the weekly profit and save it under the correct name
#  dest1 <- file.path("C:", "Users", "Sam", "Documents", "Sam", "UGent", "2015-2016", "Thesis", "Rplots", "Weekly return not translated", paste("WeeklyReturn_", Q, ".wmf", sep=""))
#  win.metafile(file=dest1)
#  plot(TRENDS$profit, type="p", pch=19, main=paste("Weekly returns of GT strategy using '", Q, "'"), xlab="weeks", ylab="weekly return", ylim=c(-0.3, 0.3))
#  dev.off()
#
#  #plot the cumulative profit and save it under the correct name
#  dest2 <- file.path("C:", "Users", "Sam", "Documents", "Sam", "UGent", "2015-2016", "Thesis", "Rplots", "Cum return not translated", paste("CumReturn_", Q, ".wmf", sep=""))
#  win.metafile(file=dest2)
#  plot(TRENDS$totalprofit, type="l", main=paste("Cumulative return of GT strategy using '", Q, "'"), xlab="weeks", ylab="cumulative return", col="red", ylim=c(-1.5, 2))
#  lines(BEL20$Return_to_date, col="blue")
#  abline(h=0, col="black", lty="dotted")
#  dev.off()
#
#} else if(translate==TRUE){
#
#  #plot the weekly profit and save it under the correct name
#  dest1 <- file.path("C:", "Users", "Sam", "Documents", "Sam", "UGent", "2015-2016", "Thesis", "Rplots", "Weekly return translated", paste("WeeklyReturn_", Q, "_lag", lag, ".wmf", sep=""))
#  win.metafile(file=dest1)
#  plot(TRENDS$profit, type="p", pch=19, main=paste("Weekly returns of GT strategy using '", Q, "'"), xlab="weeks", ylab="weekly return", ylim=c(-0.3, 0.3))
#  dev.off()
#
#  #plot the cumulative profit and save it under the correct name
#  dest2 <- file.path("C:", "Users", "Sam", "Documents", "Sam", "UGent", "2015-2016", "Thesis", "Rplots", "Cum return translated", paste("CumReturn_", Q, "_lag", lag, ".wmf", sep=""))
#  win.metafile(file=dest2)
#  plot(TRENDS$totalprofit, type="l", main=paste("Cumulative return of GT strategy using '", Q, "'"), xlab="weeks", ylab="cumulative return", col="red", ylim=c(-1.5, 2))
#  lines(BEL20$Return_to_date, col="blue")
#  abline(h=0, col="black", lty="dotted")
#  dev.off()
#
#}
######################################################################

## 4. RETURN TOTALCUMPROFIT
return(totalcumprofit)
}

ReturnList()

ReturnList <- function (keus="GTSTRAT", lijst="Preiswords"){

if(identical(keus, "GTSTRAT")){
  #ReturnList for GTSTRAT
  ReturnList <- as.data.frame(cbind(Q = Preiswordsadjusted$qE, returns = 0))
  ReturnList$returns <- sapply(Preiswordsadjusted$qE, GTSTRAT)

} else if (identical(keus, "GTS")){
  #ReturnList for GTS
  ReturnList <- as.data.frame(cbind(Q = Preiswords$qE, returns = 0))
  ReturnList$returns <- sapply(Preiswords$qE, GTS)

}

ReturnList <- ReturnList[order(ReturnList$returns, decreasing=TRUE), ]
write.csv2(ReturnList, file=paste("ReturnList_Translated_", lijst, "_lag1", ".csv", sep=""))

return(ReturnList)

}


GWCP()

GWCP <- function(asset="^GDAXI", type="stock"){

# requires: the quantmod package
# goal: load the weekly closing prices of a financial asset so that the GT strategy can be applied to it

## 1. Get financial data (closing prices) from quantmod
if(type=="stock"){
  closing <- getSymbols(asset, from="2004-01-01", to="2016-02-28", warnings=FALSE, env=NULL)
} else {
  closing <- getFX(asset, from="2004-01-01", to="2016-02-28", warnings=FALSE, env=NULL)
}

## 2. Make it weekly closing prices
weeklyclosing <- to.weekly(closing)
weeklyclosing <- weeklyclosing[,4]
colnames(weeklyclosing)[1] <- "Fr_closing"

weeklyclosing <- weeklyclosing[-1,]

## 3. Add lag1 and lead1
Fr_closing_lag1 <- c(NA, as.vector(weeklyclosing$Fr_closing))
Fr_closing_lag1 <- Fr_closing_lag1[-635]
# extraweek <- as.vector(getSymbols(Symbol=asset, from="2016-02-29", to="2016-03-05", warnings=FALSE, env=NULL)[5,4])

if(type=="stock"){
  extraweek <- as.vector(getSymbols(asset, from="2016-02-29", to="2016-03-05", warnings=FALSE, env=NULL)[5,4])
} else {
  extraweek <- as.vector(getFX(asset, from="2016-02-29", to="2016-03-05", warnings=FALSE, env=NULL)[5,4])
}
Fr_closing_lead1 <- c(as.vector(weeklyclosing$Fr_closing[-1,]), as.vector(extraweek))

weeklyclosing <- cbind(weeklyclosing, Fr_closing_lag1, Fr_closing_lead1)
colnames(weeklyclosing) <- c("Fr_closing", "Fr_closing_lag1", "Fr_closing_lead1")

## 4. Return the weekly closing prices
write.csv2(weeklyclosing, file=paste("Weekly RETURNS of DJI", ".csv", sep=""))
return(weeklyclosing)
}


GGTD()

GGTD <- function(Q="hello", translated=FALSE, loc="DE", startIN="2004-01-01", endIN="2016-02-27", PW=FALSE, ...){

## 1. GETGT(Q, loc, startIN, endIN)
if(translated==FALSE){
  GT <- gtrends(Q, location=loc, start_date=startIN, end_date=endIN)
  TRENDS <- GT$trend

} else {
  if(PW==TRUE){
    GT <- gtrends(as.vector(t(Preiswords[Q,])), location="", start_date=startIN, end_date=endIN)
    TRENDS <- GT$trend

    mqD <- mean(as.vector(t(TRENDS[3])))
    mqF <- mean(as.vector(t(TRENDS[4])))
    mqG <- mean(as.vector(t(TRENDS[5])))
    mqE <- mean(as.vector(t(TRENDS[6])))

    sumM <- (mqD+mqF+mqG+mqE)

    wqD <- mqD/sumM
    wqF <- mqF/sumM
    wqG <- mqG/sumM
    wqE <- mqE/sumM

    #calculate combined query SVI
    COMBINED <- (
      wqD*as.vector(t(gtrends(as.vector(t(Preiswords[Q,"qD"])), location="", start_date=startIN, end_date=endIN)$trend[3])) +
      wqF*as.vector(t(gtrends(as.vector(t(Preiswords[Q,"qF"])), location="", start_date=startIN, end_date=endIN)$trend[3])) +
      wqG*as.vector(t(gtrends(as.vector(t(Preiswords[Q,"qG"])), location="", start_date=startIN, end_date=endIN)$trend[3])) +
      wqE*as.vector(t(gtrends(as.vector(t(Preiswords[Q,"qE"])), location="", start_date=startIN, end_date=endIN)$trend[3])))

    #combine to dataframe and name columns
    TRENDS <- cbind(TRENDS[1], TRENDS[2], COMBINED)
  } else {
    # TODO: find a way to translate the non-Preis words via R!
  }
}

colnames(TRENDS)[1] <- "Sunday"
colnames(TRENDS)[2] <- "Saturday"
colnames(TRENDS)[3] <- Q

TRENDS <- TRENDS[-c(635, 636, 637, 638), ]

l <- 634

GT_lag1 <- head(c(0, TRENDS[[3]]), l)
GT_lag2 <- head(c(0, 0, TRENDS[[3]]), l)
GT_lag3 <- head(c(0, 0, 0, TRENDS[[3]]), l)
GT_lag4 <- head(c(0, 0, 0, 0, TRENDS[[3]]), l)

TRENDS <- cbind(TRENDS, GT_lag1, GT_lag2, GT_lag3, GT_lag4)

## 2. CALCULATE DELTA N
lag <- 4
TRENDS$DeltaN <- (TRENDS[[3]] - (TRENDS$GT_lag1+TRENDS$GT_lag2+TRENDS$GT_lag3+TRENDS$GT_lag4)/lag)

## 3. CALCULATE RETURNS
#GWCP
#weeklyclosing <- BEL20
#weeklyclosing is loaded at the start of LWRAW and stored under the name weeklyclosing

#profit from strategy
TRENDS$profit <- ifelse(TRENDS$DeltaN>0, log(weeklyclosing$Fr_closing)-log(weeklyclosing$Fr_closing_lead1), log(weeklyclosing$Fr_closing_lead1)-log(weeklyclosing$Fr_closing))

WeeklyReturns <- TRENDS$profit
WeeklyReturns[1:4] <- 0 # neutralize the first 4 weeks

## 4. RETURN WEEKLY RETURNS
return(WeeklyReturns)
}


LWRAW()

LWRAW <- function(tcosts=0){ ## 0. SET SEED set.seed(7)

## 1. WEEKLY RETURNS #load financial data weeklyclosing <<- GWCP(asset="^GDAXI", type="stock")

#create empty data frame WEEKLYRETURNS <- data.frame(c(1:634))

#fill each column with weekly returns of one of the words for(i in Preiswords$qE){ WeeklyReturns <- GGTD(i) WEEKLYRETURNS[i] <- WeeklyReturns }

#clean up the output WEEKLYRETURNS[1] <- NULL colnames(WEEKLYRETURNS) <- Preiswords$qE

#export output to csv write.csv2(WEEKLYRETURNS, file=paste("Weekly RETURNS of all words_GDAXI", ".csv", sep=""))

## 2. 4-WEEKLY RETURNS #create empty data frame FOURWEEKLYRETURNS <- data.frame(c(1:634))

#fill each column with 4-weekly returns of one of the words for(i in Preiswords$qE){ FOURWEEKLYRETURNS[i] <- rollapply(WEEKLYRETURNS[i], 4, sum, align="right", fill=0) }

#clean up the output #weeks become periods FOURWEEKLYRETURNS[1] <- NULL FOURWEEKLYRETURNS <- FOURWEEKLYRETURNS[-c(1,2,3,4),] #the first 4 weeks are lag period (only GT data) FOURWEEKLYRETURNS <- FOURWEEKLYRETURNS[-c(5,6,7),] #the next 3 4-weekly periods still use weeks from the lag period colnames(FOURWEEKLYRETURNS) <- Preiswords$qE

## 3. TRAINING PERIOD: DETERMINE WINNERS AND LOSERS


n <- nrow(FOURWEEKLYRETURNS)
c <- ncol(FOURWEEKLYRETURNS)

#determine the 10 highest performances over each 4-week period (= over each row)
for (i in 1:n){
  ordered <- FOURWEEKLYRETURNS[i, 1:c][order(FOURWEEKLYRETURNS[i, 1:c], sample(1:c), decreasing=TRUE)]
  FOURWEEKLYRETURNS$first[i] <- colnames(ordered)[1]
  FOURWEEKLYRETURNS$second[i] <- colnames(ordered)[2]
  FOURWEEKLYRETURNS$third[i] <- colnames(ordered)[3]
  FOURWEEKLYRETURNS$fourth[i] <- colnames(ordered)[4]
  FOURWEEKLYRETURNS$fifth[i] <- colnames(ordered)[5]
  FOURWEEKLYRETURNS$sixth[i] <- colnames(ordered)[6]
  FOURWEEKLYRETURNS$seventh[i] <- colnames(ordered)[7]
  FOURWEEKLYRETURNS$eighth[i] <- colnames(ordered)[8]
  FOURWEEKLYRETURNS$ninth[i] <- colnames(ordered)[9]
  FOURWEEKLYRETURNS$tenth[i] <- colnames(ordered)[10]
}

#determine the 10 lowest performances over each 4-week period
for (i in 1:n){
  ordered <- FOURWEEKLYRETURNS[i, 1:c][order(FOURWEEKLYRETURNS[i, 1:c], sample(1:c), decreasing=FALSE)]
  FOURWEEKLYRETURNS$last[i] <- colnames(ordered)[1]
  FOURWEEKLYRETURNS$secondtolast[i] <- colnames(ordered)[2]
  FOURWEEKLYRETURNS$thirdtolast[i] <- colnames(ordered)[3]
  FOURWEEKLYRETURNS$fourthtolast[i] <- colnames(ordered)[4]
  FOURWEEKLYRETURNS$fifthtolast[i] <- colnames(ordered)[5]
  FOURWEEKLYRETURNS$sixthtolast[i] <- colnames(ordered)[6]
  FOURWEEKLYRETURNS$seventhtolast[i] <- colnames(ordered)[7]
  FOURWEEKLYRETURNS$eighthtolast[i] <- colnames(ordered)[8]
  FOURWEEKLYRETURNS$ninthtolast[i] <- colnames(ordered)[9]
  FOURWEEKLYRETURNS$tenthtolast[i] <- colnames(ordered)[10]
}

#put the lagged winners/losers on the same row as their returns one period later
m <- nrow(FOURWEEKLYRETURNS)

FOURWEEKLYRETURNS$first_lag <- c(0,0,0,0, FOURWEEKLYRETURNS$first[1:(m-4)])
FOURWEEKLYRETURNS$second_lag <- c(0,0,0,0, FOURWEEKLYRETURNS$second[1:(m-4)])
FOURWEEKLYRETURNS$third_lag <- c(0,0,0,0, FOURWEEKLYRETURNS$third[1:(m-4)])
FOURWEEKLYRETURNS$fourth_lag <- c(0,0,0,0, FOURWEEKLYRETURNS$fourth[1:(m-4)])
FOURWEEKLYRETURNS$fifth_lag <- c(0,0,0,0, FOURWEEKLYRETURNS$fifth[1:(m-4)])
FOURWEEKLYRETURNS$sixth_lag <- c(0,0,0,0, FOURWEEKLYRETURNS$sixth[1:(m-4)])
FOURWEEKLYRETURNS$seventh_lag <- c(0,0,0,0, FOURWEEKLYRETURNS$seventh[1:(m-4)])
FOURWEEKLYRETURNS$eighth_lag <- c(0,0,0,0, FOURWEEKLYRETURNS$eighth[1:(m-4)])
FOURWEEKLYRETURNS$ninth_lag <- c(0,0,0,0, FOURWEEKLYRETURNS$ninth[1:(m-4)])
FOURWEEKLYRETURNS$tenth_lag <- c(0,0,0,0, FOURWEEKLYRETURNS$tenth[1:(m-4)])
FOURWEEKLYRETURNS$last_lag <- c(0,0,0,0, FOURWEEKLYRETURNS$last[1:(m-4)])
FOURWEEKLYRETURNS$secondtolast_lag <- c(0,0,0,0, FOURWEEKLYRETURNS$secondtolast[1:(m-4)])
FOURWEEKLYRETURNS$thirdtolast_lag <- c(0,0,0,0, FOURWEEKLYRETURNS$thirdtolast[1:(m-4)])
FOURWEEKLYRETURNS$fourthtolast_lag <- c(0,0,0,0, FOURWEEKLYRETURNS$fourthtolast[1:(m-4)])
FOURWEEKLYRETURNS$fifthtolast_lag <- c(0,0,0,0, FOURWEEKLYRETURNS$fifthtolast[1:(m-4)])
FOURWEEKLYRETURNS$sixthtolast_lag <- c(0,0,0,0, FOURWEEKLYRETURNS$sixthtolast[1:(m-4)])
FOURWEEKLYRETURNS$seventhtolast_lag <- c(0,0,0,0, FOURWEEKLYRETURNS$seventhtolast[1:(m-4)])
FOURWEEKLYRETURNS$eighthtolast_lag <- c(0,0,0,0, FOURWEEKLYRETURNS$eighthtolast[1:(m-4)])
FOURWEEKLYRETURNS$ninthtolast_lag <- c(0,0,0,0, FOURWEEKLYRETURNS$ninthtolast[1:(m-4)])
FOURWEEKLYRETURNS$tenthtolast_lag <- c(0,0,0,0, FOURWEEKLYRETURNS$tenthtolast[1:(m-4)])

#export output to csv
write.csv2(FOURWEEKLYRETURNS, file=paste("4-Weekly RETURNS of all words_GDAXI", ".csv", sep=""))

## 4. TESTING PERIOD: SEE WHAT WINNERS AND LOSERS ARE DOING 4 WEEKS LATER
FOURWEEKLYRETURNS$returns_first <- NA
FOURWEEKLYRETURNS$returns_second <- NA
FOURWEEKLYRETURNS$returns_third <- NA
FOURWEEKLYRETURNS$returns_fourth <- NA
FOURWEEKLYRETURNS$returns_fifth <- NA
FOURWEEKLYRETURNS$returns_sixth <- NA
FOURWEEKLYRETURNS$returns_seventh <- NA
FOURWEEKLYRETURNS$returns_eighth <- NA
FOURWEEKLYRETURNS$returns_ninth <- NA
FOURWEEKLYRETURNS$returns_tenth <- NA
FOURWEEKLYRETURNS$returns_last <- NA
FOURWEEKLYRETURNS$returns_secondtolast <- NA
FOURWEEKLYRETURNS$returns_thirdtolast <- NA
FOURWEEKLYRETURNS$returns_fourthtolast <- NA
FOURWEEKLYRETURNS$returns_fifthtolast <- NA
FOURWEEKLYRETURNS$returns_sixthtolast <- NA
FOURWEEKLYRETURNS$returns_seventhtolast <- NA
FOURWEEKLYRETURNS$returns_eighthtolast <- NA
FOURWEEKLYRETURNS$returns_ninthtolast <- NA
FOURWEEKLYRETURNS$returns_tenthtolast <- NA

for (i in 8:m){
  FOURWEEKLYRETURNS$returns_first[i] <- FOURWEEKLYRETURNS[i, FOURWEEKLYRETURNS$first_lag[i]]
  FOURWEEKLYRETURNS$returns_second[i] <- FOURWEEKLYRETURNS[i, FOURWEEKLYRETURNS$second_lag[i]]
  FOURWEEKLYRETURNS$returns_third[i] <- FOURWEEKLYRETURNS[i, FOURWEEKLYRETURNS$third_lag[i]]
  FOURWEEKLYRETURNS$returns_fourth[i] <- FOURWEEKLYRETURNS[i, FOURWEEKLYRETURNS$fourth_lag[i]]
  FOURWEEKLYRETURNS$returns_fifth[i] <- FOURWEEKLYRETURNS[i, FOURWEEKLYRETURNS$fifth_lag[i]]
  FOURWEEKLYRETURNS$returns_sixth[i] <- FOURWEEKLYRETURNS[i, FOURWEEKLYRETURNS$sixth_lag[i]]
  FOURWEEKLYRETURNS$returns_seventh[i] <- FOURWEEKLYRETURNS[i, FOURWEEKLYRETURNS$seventh_lag[i]]
  FOURWEEKLYRETURNS$returns_eighth[i] <- FOURWEEKLYRETURNS[i, FOURWEEKLYRETURNS$eighth_lag[i]]
  FOURWEEKLYRETURNS$returns_ninth[i] <- FOURWEEKLYRETURNS[i, FOURWEEKLYRETURNS$ninth_lag[i]]
  FOURWEEKLYRETURNS$returns_tenth[i] <- FOURWEEKLYRETURNS[i, FOURWEEKLYRETURNS$tenth_lag[i]]
  FOURWEEKLYRETURNS$returns_last[i] <- FOURWEEKLYRETURNS[i, FOURWEEKLYRETURNS$last_lag[i]]
  FOURWEEKLYRETURNS$returns_secondtolast[i] <- FOURWEEKLYRETURNS[i, FOURWEEKLYRETURNS$secondtolast_lag[i]]
  FOURWEEKLYRETURNS$returns_thirdtolast[i] <- FOURWEEKLYRETURNS[i, FOURWEEKLYRETURNS$thirdtolast_lag[i]]
  FOURWEEKLYRETURNS$returns_fourthtolast[i] <- FOURWEEKLYRETURNS[i, FOURWEEKLYRETURNS$fourthtolast_lag[i]]
  FOURWEEKLYRETURNS$returns_fifthtolast[i] <- FOURWEEKLYRETURNS[i, FOURWEEKLYRETURNS$fifthtolast_lag[i]]
  FOURWEEKLYRETURNS$returns_sixthtolast[i] <- FOURWEEKLYRETURNS[i, FOURWEEKLYRETURNS$sixthtolast_lag[i]]
  FOURWEEKLYRETURNS$returns_seventhtolast[i] <- FOURWEEKLYRETURNS[i, FOURWEEKLYRETURNS$seventhtolast_lag[i]]
  FOURWEEKLYRETURNS$returns_eighthtolast[i] <- FOURWEEKLYRETURNS[i, FOURWEEKLYRETURNS$eighthtolast_lag[i]]
  FOURWEEKLYRETURNS$returns_ninthtolast[i] <- FOURWEEKLYRETURNS[i, FOURWEEKLYRETURNS$ninthtolast_lag[i]]
  FOURWEEKLYRETURNS$returns_tenthtolast[i] <- FOURWEEKLYRETURNS[i, FOURWEEKLYRETURNS$tenthtolast_lag[i]]
}

#clear first rows
FOURWEEKLYRETURNS$returns_first <- c(0,0,0,0,0,0,0, FOURWEEKLYRETURNS$returns_first[8:m])
FOURWEEKLYRETURNS$returns_second <- c(0,0,0,0,0,0,0, FOURWEEKLYRETURNS$returns_second[8:m])
FOURWEEKLYRETURNS$returns_third <- c(0,0,0,0,0,0,0, FOURWEEKLYRETURNS$returns_third[8:m])
FOURWEEKLYRETURNS$returns_fourth <- c(0,0,0,0,0,0,0, FOURWEEKLYRETURNS$returns_fourth[8:m])
FOURWEEKLYRETURNS$returns_fifth <- c(0,0,0,0,0,0,0, FOURWEEKLYRETURNS$returns_fifth[8:m])
FOURWEEKLYRETURNS$returns_sixth <- c(0,0,0,0,0,0,0, FOURWEEKLYRETURNS$returns_sixth[8:m])
FOURWEEKLYRETURNS$returns_seventh <- c(0,0,0,0,0,0,0, FOURWEEKLYRETURNS$returns_seventh[8:m])
FOURWEEKLYRETURNS$returns_eighth <- c(0,0,0,0,0,0,0, FOURWEEKLYRETURNS$returns_eighth[8:m])
FOURWEEKLYRETURNS$returns_ninth <- c(0,0,0,0,0,0,0, FOURWEEKLYRETURNS$returns_ninth[8:m])
FOURWEEKLYRETURNS$returns_tenth <- c(0,0,0,0,0,0,0, FOURWEEKLYRETURNS$returns_tenth[8:m])
FOURWEEKLYRETURNS$returns_last <- c(0,0,0,0,0,0,0, FOURWEEKLYRETURNS$returns_last[8:m])
FOURWEEKLYRETURNS$returns_secondtolast <- c(0,0,0,0,0,0,0, FOURWEEKLYRETURNS$returns_secondtolast[8:m])
FOURWEEKLYRETURNS$returns_thirdtolast <- c(0,0,0,0,0,0,0, FOURWEEKLYRETURNS$returns_thirdtolast[8:m])
FOURWEEKLYRETURNS$returns_fourthtolast <- c(0,0,0,0,0,0,0, FOURWEEKLYRETURNS$returns_fourthtolast[8:m])
FOURWEEKLYRETURNS$returns_fifthtolast <- c(0,0,0,0,0,0,0, FOURWEEKLYRETURNS$returns_fifthtolast[8:m])
FOURWEEKLYRETURNS$returns_sixthtolast <- c(0,0,0,0,0,0,0, FOURWEEKLYRETURNS$returns_sixthtolast[8:m])
FOURWEEKLYRETURNS$returns_seventhtolast <- c(0,0,0,0,0,0,0, FOURWEEKLYRETURNS$returns_seventhtolast[8:m])
FOURWEEKLYRETURNS$returns_eighthtolast <- c(0,0,0,0,0,0,0, FOURWEEKLYRETURNS$returns_eighthtolast[8:m])
FOURWEEKLYRETURNS$returns_ninthtolast <- c(0,0,0,0,0,0,0, FOURWEEKLYRETURNS$returns_ninthtolast[8:m])
FOURWEEKLYRETURNS$returns_tenthtolast <- c(0,0,0,0,0,0,0, FOURWEEKLYRETURNS$returns_tenthtolast[8:m])

## 5. TRANSACTION COSTS

#total transaction cost = # of weeks * 2 *

#weeklytcost <-

## 6. DIFFERENT STRATEGIES

## the strategies are defined as:
# F: first ten
# L: last ten
# FR: first ten reversed
# LR: last ten reversed
# FL: first five, last five
# FLR: first five, last five reversed
# FRL: first five reversed, last five
# FRLR: first five reversed, last five reversed
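As a compact cross-check of the eight strategy averages computed below, the same period returns can be written with `rowSums` over column-name vectors. This is a sketch, not part of the thesis code; it assumes the returns_* columns built in step 4 are present.

```r
# sketch: compact equivalent of the periodreturn* columns computed below
top    <- paste0("returns_", c("first", "second", "third", "fourth", "fifth",
                               "sixth", "seventh", "eighth", "ninth", "tenth"))
bottom <- paste0("returns_", c("last", "secondtolast", "thirdtolast",
                               "fourthtolast", "fifthtolast", "sixthtolast",
                               "seventhtolast", "eighthtolast", "ninthtolast",
                               "tenthtolast"))

periodreturnF  <- rowSums(FOURWEEKLYRETURNS[top]) / 5                       # F
periodreturnL  <- rowSums(FOURWEEKLYRETURNS[bottom]) / 5                    # L
periodreturnFL <- rowSums(FOURWEEKLYRETURNS[c(top[1:5], bottom[1:5])]) / 5  # FL
# the reversed variants (FR, LR, FLR, FRL, FRLR) flip the sign of the
# corresponding group of five words
```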


FOURWEEKLYRETURNS$periodreturnF <-
  (FOURWEEKLYRETURNS$returns_first + FOURWEEKLYRETURNS$returns_second +
   FOURWEEKLYRETURNS$returns_third + FOURWEEKLYRETURNS$returns_fourth +
   FOURWEEKLYRETURNS$returns_fifth)/5 +
  (FOURWEEKLYRETURNS$returns_sixth + FOURWEEKLYRETURNS$returns_seventh +
   FOURWEEKLYRETURNS$returns_eighth + FOURWEEKLYRETURNS$returns_ninth +
   FOURWEEKLYRETURNS$returns_tenth)/5

FOURWEEKLYRETURNS$periodreturnFR <-
  -(FOURWEEKLYRETURNS$returns_first + FOURWEEKLYRETURNS$returns_second +
    FOURWEEKLYRETURNS$returns_third + FOURWEEKLYRETURNS$returns_fourth +
    FOURWEEKLYRETURNS$returns_fifth)/5 -
  (FOURWEEKLYRETURNS$returns_sixth + FOURWEEKLYRETURNS$returns_seventh +
   FOURWEEKLYRETURNS$returns_eighth + FOURWEEKLYRETURNS$returns_ninth +
   FOURWEEKLYRETURNS$returns_tenth)/5

FOURWEEKLYRETURNS$periodreturnL <-
  (FOURWEEKLYRETURNS$returns_last + FOURWEEKLYRETURNS$returns_secondtolast +
   FOURWEEKLYRETURNS$returns_thirdtolast + FOURWEEKLYRETURNS$returns_fourthtolast +
   FOURWEEKLYRETURNS$returns_fifthtolast)/5 +
  (FOURWEEKLYRETURNS$returns_sixthtolast + FOURWEEKLYRETURNS$returns_seventhtolast +
   FOURWEEKLYRETURNS$returns_eighthtolast + FOURWEEKLYRETURNS$returns_ninthtolast +
   FOURWEEKLYRETURNS$returns_tenthtolast)/5

FOURWEEKLYRETURNS$periodreturnLR <-
  -(FOURWEEKLYRETURNS$returns_last + FOURWEEKLYRETURNS$returns_secondtolast +
    FOURWEEKLYRETURNS$returns_thirdtolast + FOURWEEKLYRETURNS$returns_fourthtolast +
    FOURWEEKLYRETURNS$returns_fifthtolast)/5 -
  (FOURWEEKLYRETURNS$returns_sixthtolast + FOURWEEKLYRETURNS$returns_seventhtolast +
   FOURWEEKLYRETURNS$returns_eighthtolast + FOURWEEKLYRETURNS$returns_ninthtolast +
   FOURWEEKLYRETURNS$returns_tenthtolast)/5

FOURWEEKLYRETURNS$periodreturnFL <-
  (FOURWEEKLYRETURNS$returns_first + FOURWEEKLYRETURNS$returns_second +
   FOURWEEKLYRETURNS$returns_third + FOURWEEKLYRETURNS$returns_fourth +
   FOURWEEKLYRETURNS$returns_fifth)/5 +
  (FOURWEEKLYRETURNS$returns_last + FOURWEEKLYRETURNS$returns_secondtolast +
   FOURWEEKLYRETURNS$returns_thirdtolast + FOURWEEKLYRETURNS$returns_fourthtolast +
   FOURWEEKLYRETURNS$returns_fifthtolast)/5

FOURWEEKLYRETURNS$periodreturnFLR <-
  (FOURWEEKLYRETURNS$returns_first + FOURWEEKLYRETURNS$returns_second +
   FOURWEEKLYRETURNS$returns_third + FOURWEEKLYRETURNS$returns_fourth +
   FOURWEEKLYRETURNS$returns_fifth)/5 -
  (FOURWEEKLYRETURNS$returns_last + FOURWEEKLYRETURNS$returns_secondtolast +
   FOURWEEKLYRETURNS$returns_thirdtolast + FOURWEEKLYRETURNS$returns_fourthtolast +
   FOURWEEKLYRETURNS$returns_fifthtolast)/5

FOURWEEKLYRETURNS$periodreturnFRL <-
  -(FOURWEEKLYRETURNS$returns_first + FOURWEEKLYRETURNS$returns_second +
    FOURWEEKLYRETURNS$returns_third + FOURWEEKLYRETURNS$returns_fourth +
    FOURWEEKLYRETURNS$returns_fifth)/5 +
  (FOURWEEKLYRETURNS$returns_last + FOURWEEKLYRETURNS$returns_secondtolast +
   FOURWEEKLYRETURNS$returns_thirdtolast + FOURWEEKLYRETURNS$returns_fourthtolast +
   FOURWEEKLYRETURNS$returns_fifthtolast)/5

FOURWEEKLYRETURNS$periodreturnFRLR <-
  -(FOURWEEKLYRETURNS$returns_first + FOURWEEKLYRETURNS$returns_second +
    FOURWEEKLYRETURNS$returns_third + FOURWEEKLYRETURNS$returns_fourth +
    FOURWEEKLYRETURNS$returns_fifth)/5 -
  (FOURWEEKLYRETURNS$returns_last + FOURWEEKLYRETURNS$returns_secondtolast +
   FOURWEEKLYRETURNS$returns_thirdtolast + FOURWEEKLYRETURNS$returns_fourthtolast +
   FOURWEEKLYRETURNS$returns_fifthtolast)/5

#prepare the returns, net of transaction costs
modelreturnsF <- FOURWEEKLYRETURNS$periodreturnF[8:m] - tcosts
modelreturnsFR <- FOURWEEKLYRETURNS$periodreturnFR[8:m] - tcosts
modelreturnsL <- FOURWEEKLYRETURNS$periodreturnL[8:m] - tcosts
modelreturnsLR <- FOURWEEKLYRETURNS$periodreturnLR[8:m] - tcosts
modelreturnsFL <- FOURWEEKLYRETURNS$periodreturnFL[8:m] - tcosts
modelreturnsFLR <- FOURWEEKLYRETURNS$periodreturnFLR[8:m] - tcosts
modelreturnsFRL <- FOURWEEKLYRETURNS$periodreturnFRL[8:m] - tcosts
modelreturnsFRLR <- FOURWEEKLYRETURNS$periodreturnFRLR[8:m] - tcosts

MODELRETURNS <- cbind(modelreturnsF, modelreturnsFR, modelreturnsL, modelreturnsLR,
                      modelreturnsFL, modelreturnsFLR, modelreturnsFRL, modelreturnsFRLR)

modelperformanceF <- mean(modelreturnsF)
modelperformanceFR <- mean(modelreturnsFR)
modelperformanceL <- mean(modelreturnsL)
modelperformanceLR <- mean(modelreturnsLR)
modelperformanceFL <- mean(modelreturnsFL)
modelperformanceFLR <- mean(modelreturnsFLR)
modelperformanceFRL <- mean(modelreturnsFRL)
modelperformanceFRLR <- mean(modelreturnsFRLR)

#write to disk
write.csv2(MODELRETURNS, file=paste("MODELRETURNS_GDAXI", ".csv", sep=""))

modelperformance <- c(modelperformanceF, modelperformanceFR, modelperformanceL,
                      modelperformanceLR, modelperformanceFL, modelperformanceFLR,
                      modelperformanceFRL, modelperformanceFRLR)

## 7. PLOT THIS!

#boxplot(FOURWEEKLYRETURNS$periodreturnMM, col="royalblue2", ylab ="4 weeks return", xlab ="Google Trends strategy ") #works


#beanplot(FOURWEEKLYRETURNS$periodreturnMM, ylab="4-weekly returns", main="Return distribution", col=c("royalblue2", "red", "black", "black"), what=c(1,1,1,0)) #works

#pirate plot
F <- cbind(modelreturnsF, "F")
FR <- cbind(modelreturnsFR, "FR")
L <- cbind(modelreturnsL, "L")
LR <- cbind(modelreturnsLR, "LR")
FL <- cbind(modelreturnsFL, "FL")
FLR <- cbind(modelreturnsFLR, "FLR")
FRL <- cbind(modelreturnsFRL, "FRL")
FRLR <- cbind(modelreturnsFRLR, "FRLR")

plotprep <- as.data.frame(rbind(F, FR, L, LR, FL, FLR, FRL, FRLR), stringsAsFactors=FALSE)

colnames(plotprep) <- c("periodreturn", "method")

plotprep$periodreturn <- as.numeric(plotprep$periodreturn)

str(plotprep)

pirateplot(formula = periodreturn ~ method, data = plotprep,
           xlab = "Method", ylab = "4-weekly return", ylim = c(-1, 1),
           main = "Return distribution", theme.o = 2, pal = "southpark")

## 8. RETURN

return(modelperformance)
}

