How Google Search Trends Can Be Used As Technical Indicators for the S&P500-Index


How Google Search Trends Can Be Used as Technical Indicators for the S&P500-Index
A Time Series Analysis Using Granger's Causality Test

DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS
STOCKHOLM, SWEDEN 2018

ALBIN GRANELL, FILIP CARLSSON

Degree Projects in Applied Mathematics and Industrial Economics
Degree Programme in Industrial Engineering and Management
KTH Royal Institute of Technology, School of Engineering Sciences, 2018
Supervisors at KTH: Jörgen Säve-Söderbergh, Julia Liljegren
Examiner at KTH: Henrik Hult
TRITA-SCI-GRU 2018:182, MAT-K 2018:01
KTH SCI, SE-100 44 Stockholm, Sweden. URL: www.kth.se/sci

Abstract

This thesis studies whether Google search trends can be used as indicators for movements in the S&P500 index. Using Granger's causality test, the level of causality between movements in the S&P500 index and Google search volumes for certain keywords is analyzed. The result of the analysis is used to form an investment strategy based entirely on Google search volumes, which is then backtested over a five-year period using historical data. The causality tests show that 8 of 30 words indicate causality at a 10% level of significance, of which one word, mortgage, indicates causality at a 1% level of significance. Several investment strategies based on search volumes yield higher returns than the index itself over the considered five-year period, with the best-performing strategy beating the index by over 60 percentage points.
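The core test in the abstract — whether lagged search volumes add predictive power for index movements beyond the index's own history — can be sketched as a Granger-style F-test. The sketch below uses purely synthetic series and an arbitrary lag order, not the thesis's actual data or chosen lags:

```python
import numpy as np

def granger_f_stat(y, x, p):
    """F-statistic for whether lags of x improve an order-p autoregression of y.

    Compares the residual sum of squares of a restricted model (lags of y only)
    against an unrestricted model (lags of y and lags of x).
    """
    n = len(y)
    Y = y[p:]
    lags_y = np.column_stack([y[p - i:n - i] for i in range(1, p + 1)])
    lags_x = np.column_stack([x[p - i:n - i] for i in range(1, p + 1)])
    ones = np.ones((n - p, 1))
    X_r = np.hstack([ones, lags_y])           # restricted: y's own lags
    X_u = np.hstack([ones, lags_y, lags_x])   # unrestricted: plus x's lags

    def rss(X):
        beta = np.linalg.lstsq(X, Y, rcond=None)[0]
        return np.sum((Y - X @ beta) ** 2)

    rss_r, rss_u = rss(X_r), rss(X_u)
    df_num = p
    df_den = (n - p) - X_u.shape[1]
    return ((rss_r - rss_u) / df_num) / (rss_u / df_den)

# Synthetic example where x genuinely Granger-causes y
rng = np.random.default_rng(0)
x = rng.standard_normal(500)
y = np.zeros(500)
for t in range(1, 500):
    y[t] = 0.3 * y[t - 1] + 0.8 * x[t - 1] + 0.5 * rng.standard_normal()

print(granger_f_stat(y, x, p=2))  # large F: lags of x help predict y
print(granger_f_stat(x, y, p=2))  # small F: lags of y do not help predict x
```

In the thesis's setting, y would be the (stationarized) index series and x a keyword's search-volume series; the F-statistic is then compared against the appropriate F-distribution critical value, as outlined in section 2.3.9.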
Sammanfattning

This thesis studies whether Google search trends can be used as indicators of movements in the S&P500 index. Using Granger's causality test, the level of causality between movements in the S&P500 and Google search volumes for specially selected keywords is studied. The results of this analysis are in turn used to design an investment strategy based solely on Google search volumes, which is backtested over a five-year period using historical data. The results of the causality test show that 8 of 30 words indicate causality at a 10% significance level, of which one word, mortgage, shows causality at a 1% significance level. Several investment strategies based on search volumes generate higher returns than the index itself over the tested five-year period, with the best strategy beating the index by over 60 percentage points.

Acknowledgements

We would like to thank our supervisors at the Royal Institute of Technology (KTH), Pär Jörgen Säve-Söderbergh and Julia Liljegren, for their support before and throughout the study.

Contents

1 Introduction
  1.1 Background
  1.2 Objective
  1.3 Problem Statement
  1.4 Limitations
  1.5 Previous Research
2 Theoretical Framework
  2.1 Technical Indicators
  2.2 Financial Theory
    2.2.1 Efficient Market Hypothesis (EMH)
    2.2.2 Behavioural Finance
  2.3 Mathematical Framework
    2.3.1 Vector Autoregression (VAR)
    2.3.2 VAR Order Selection
    2.3.3 Stable VAR Process
    2.3.4 Stationarity
    2.3.5 Augmented Dickey-Fuller Test
    2.3.6 OLS Estimation of VAR Parameters
    2.3.7 Breusch-Godfrey Test
    2.3.8 Granger-Causality
    2.3.9 F-Statistics for Granger-Causality
3 Method
  3.1 Word Selection
  3.2 Data Collection
    3.2.1 Search Data
    3.2.2 S&P500-Index
  3.3 Investment Strategies
  3.4 Outline
4 Results
  4.1 Transformation of Data
  4.2 Selection of Lag Order
  4.3 Model Validation
  4.4 Granger-Causality Tests
  4.5 Backtesting Investment Strategies
    4.5.1 Strategy 1
    4.5.2 Strategy 2
    4.5.3 Strategy 3
5 Discussion
  5.1 Interpretation of Results
    5.1.1 Granger-Causality Test
    5.1.2 Investment Strategies
    5.1.3 Comparison to Previous Findings
    5.1.4 Financial Implications
  5.2 Sources of Errors
    5.2.1 Mathematical Sources of Errors
    5.2.2 Errors From Data Collection
    5.2.3 Nature of the Financial Market
  5.3 Further Research
  5.4 Conclusion
References
A Appendix
  A.1 Augmented Dickey-Fuller Test
  A.2 Strategy 1 Returns
  A.3 Strategy 2 Returns
  A.4 Strategy 3 Returns

1 Introduction

1.1 Background

At the beginning of the 21st century, newspapers, books, TV broadcasting and radio were the main sources of information. Today this has changed, as the Internet has developed and reshaped our way of living. Top news now appears as pop-up notifications on smartphones within minutes, sometimes even seconds, of an event, and information is never more than an online search away. Alongside this rapid change, Google has become the number one search engine worldwide, with trillions of searches every year and a 91% online search market share as of February 2018.[1] In 2010, Google's Executive Chairman Eric Schmidt claimed that the information gathered over two days equalled the accumulated amount from the dawn of mankind up to 2003.[2] The new era of big data creates new possibilities, and several businesses see it as the holy grail for finally being able to predict who will buy their products, and where and when.[3] Despite the emergence of big data, the amount of information actually used has not kept pace: only about one percent of the data collected is analyzed.[4] Thus, there are many unexplored possibilities in the new era of big data.
Today's most commonly used technical trading indicators have not been influenced by the rise of big data: they are still mainly based on momentum calculated from trading volumes, volatility and historical returns of the considered asset.[5] Investors use such indicators to analyze price charts of financial assets in order, for example, to predict future stock price movements. Unlike fundamental analysis, in which investors try to determine whether a company is under- or overvalued, technical analysis does not consider the fundamental value of the stock. Instead, indicators are used to identify patterns and thereby predict short-term movements in the price of the considered asset.[6]

1.2 Objective

This thesis investigates whether there exists a causal relationship between online search activity and the overall performance of the stock market. Today, many investors base their trading on technical indicators or key performance indicators such as price-earnings ratios, earnings per share and historical returns. However, given the increasing influence of the Internet, and of Google in particular, on people's day-to-day lives, it is reasonable to believe that data from online activity could reflect the overall state of the economy. As discussed further in section 1.5, there is no prevailing consensus on the topic, as previous studies reach different conclusions using various methods. The objective of the thesis is to find mathematically substantiated evidence, through Granger-causality tests, that Google search volumes can be used as a technical indicator for movements in the S&P500 index. Furthermore, based on the results of the causality tests, the thesis aims to find a trading algorithm using Google search volumes that, in a backtest, can be shown to give a higher return than the index itself over a five-year period.
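As a toy illustration of the kind of backtest just described — not the thesis's actual strategies, which are defined later in section 3.3 — consider a rule in the spirit of earlier search-volume studies: go long for the coming week when last week's search volume for a keyword fell, and short when it rose. All returns and volumes below are synthetic placeholders:

```python
import numpy as np

rng = np.random.default_rng(1)
weeks = 260                                   # roughly five years of weekly data
index_ret = rng.normal(0.0015, 0.02, weeks)   # hypothetical weekly index returns
search = rng.normal(50, 10, weeks)            # hypothetical search volumes (0-100 scale)

# The position held in week t is decided from the search-volume change in week t-1:
# falling searches -> long the index (+1), rising searches -> short (-1).
delta = np.diff(search)
position = np.where(delta < 0, 1.0, -1.0)
strat_ret = position * index_ret[1:]

# Compare cumulative returns over the same weeks.
buy_hold = np.prod(1 + index_ret[1:]) - 1
strategy = np.prod(1 + strat_ret) - 1
print(f"buy & hold: {buy_hold:+.1%}, strategy: {strategy:+.1%}")
```

On real data one would instead use the keywords that pass the Granger-causality tests, and the sign convention (long on falling versus rising volumes) would itself be chosen from the estimated lag coefficients rather than assumed.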
1.3 Problem Statement

The problem statement is broken down into two general questions underlying the thesis:

• Can Google search volumes be used as a technical indicator for the S&P500 index?
• Can a successful investment strategy be based on these potential findings?

1.4 Limitations

The thesis only considers the S&P500 index and 30 selected keywords. Search volumes and index prices are limited to the period March 24th, 2013 to March 24th, 2018. As stock markets and the overall economy vary between countries, it is not reasonable to assume an overall global trend in the economy in the sense of cause-effect mechanisms from Google searches. Thus, in order for the search data to best represent the trend of the American stock market (i.e. the S&P500 index), the search data is geographically limited to the United States.

1.5 Previous Research

Previous research on Google search trends and their predictive ability on the financial market has been conducted at different scales, using various approaches and reaching different conclusions. This section presents a selection of the studies, the tests conducted and their findings. Several studies have demonstrated the predictive properties of Google search volumes for different economic and social indicators. Varian et al. showed how Google Trends can be used to forecast near-term economic indicators such as automobile sales, travel destinations and unemployment rates.[7] Four years later, Kristoufek et al. used an autoregressive approach to show that Google search volumes also significantly increase the accuracy of predictions of suicide rates in the UK, compared to forecasting from historical rates alone.[8] In 2013, Moat, Preis et al. empirically studied the relationship between Google search trends and the financial market with a quantifying approach.
By analyzing the search volumes, the study identified patterns that could be interpreted as "early warning signs" of upcoming stock market moves.