Nowcasting Indian GDP with Google Search Data

Nowcasting Indian GDP with Google Search data

Eric Qian

March 19, 2018

Abstract Gross domestic product is an important indicator for measuring overall economic health, and accurate forecasts are essential in crafting monetary and economic policy. Developing economies are constrained by the availability and quality of economic time series data. With increasing global Internet penetration, search data is an appealing source for supplementing traditional economic data. Using India as a case study, this paper uses a dynamic factor model with with a Kalman ﬁlter to generate predictions for GDP. This paper has two contributions to the Indian context. First, a multifactor model yielded better predictions than a single factor. Second, incorporating Google Search data did not improve prediction accuracy.

1 Introduction

Monetary and economic policy heavily rely on accurate and timely information. How- ever, since data releases are often delayed, current values are often not known until later months. This produces a need for short run predictions. Academics and central banks often use an approach called “nowcasting,”, which gives a framework for forecasting the past, present, and near-future. This approach is especially important in predicting low frequency macroeconomic variables with long publication lags, and is often used for pro- ducing forecasts for GDP. A key advantage to this approach is the ability to incorporate and interpret data releases in real-time. Much of the work is based on the model described in Banbura, Giannone and Reichlin (2011). The growth in search engines and data collection has made Internet data an attractive source for supplementing traditional economic time series. As highlighted in the seminal work by Choi and Varian (2012), aggregated Google search data provides a timely and

1 unfiltered measure for people’s preferences. Data are available in near real-time, and people are unlikely to filter their search queries. These features are especially attractive for macroeconomic prediction. As described in Vosen and Schmidt (2011), Google Search can be used to measure economic sentiment ahead of traditional economic surveys. Its direct connection to consumer sentiment makes Search data particularly appealing for forecasting prices (Pan, Chenguang Wu and Song, 2012; Goldfarb, Greenstein and Tucker, 2015). More generally, Google data has also been used to study the Great Re- cession (Suhoy, 2009; Schmidt and Vosen, 2012). Despite its advantages, Search has more limited power when data quality is high and publication delays are low. For example, Li (2016) found that in forecasting US initial jobless claims, incorporating Search data did not improve forecasts. The comparative advantage to using search data is lower in contexts where conventional data sources sufficiently capture consumer sentiment. With the constraints in data timeliness and quality, there is promise in using Internet data for emerging markets. Existing data sources are often not published at the same standards as in developed economies. Swallow and Labbé(2010) and Seabold and Coppola (2015) showcase the potential for Internet data in improving predictions by supplementing traditional sources. In places where survey data are unavailable, search data can act as a substitute for capturing consumer sentiment. This advantage has become more salient over time, as mobile phone usage and Internet penetration have increased (World Bank, 2017). The goal of this paper is to apply the nowcasting framework to the development context and to study whether Internet data can be used as a solution for the imperfect data problem. In particular, I aim to forecast Real Gross Domestic Product for India. India gives a case study to creating predictions under imperfect conditions. Compared to developed economies fewer economic time series are available and existing data are often lower quality and more volatile. Moreover, high Internet penetration gives conditions where Google Search data can quantify consumer sentiment.

2 2 Data

This paper incorporates two main sources of data. I ﬁrst describe the traditional data sources, which are series typically used in economic forecasting applications. I then discuss the aggregated Google Search data.

2.1 Traditional data sources

The dataset containing traditional data sources was aggregated and collected from Bloomberg and contains monthly time series data from January 2000 to August 2017. The series were selected to give a full representation of the Indian economy and to mimic how market par- ticipants respond to information. The choice of series is partially taken from Bragoli and Fosten (2017) and (Giannone, Agrippino and Modugno, 2013a) which examine Indian and Chinese GDP respectively. Because of the relative scarcity of qualitative sentiment data, both papers primarily use hard data. While these series are less noisy, these are subject to greater publication lag. Different from developed economies, data quality is a substantial concern in the India context. Even though the service sector represents approximately 58 % of the economy, activity is undercounted in the economic census and through the unorganized sector (Bard- han, 2013). Moreover, due to issues with survey coverage, price-indexing is often applied inconsistently, with underrepresentation of low income groups. As a result, in the prediction context, forecasting is a greater challenge. Economic time series information is less consistent and noisier, so there is greater variation in the predictions themselves. GDP in particular is subject to some uncertainty. Similar to other series, inputs to publishing official GDP figures by the Central Statistics Organization (CSO) have quality problems. As a result, published figures are subject to substantial data revisions. In some cases, revisions between initial and final yearly growth rates were up to 2.5% (Sapre and Sengupta, 2017). As a result, prediction values are also noisier. The series fall into two primary categories: hard data and financial data. The hard data are quantitative economic indicators. These series are meant to capture economic activity in India broadly. Several series capture domestic production (India Industrial Production, Crude Oil Production, Steel Production). Considering the connectedness of the broader Indian economy, several several of the series focus on trade (Indian imports and Indian exports). Similarly, several international series are included (US Industrial Production,

3 Figure 1: GDP plot: Raw and transformed series Indian GDP in chained (2000) rupees

30 25 20 15

Rupee (trillions) 10 2000 2005 2010 2015 Date

0.10

0.05 YoY pct. change YoY 0.00 2000 2005 2010 2015 Date

European Industrial Production). Financial variables are available at a higher frequency (some are available at the hourly level). For this paper, these are aggregated to the monthly level. Similar to the hard indicators, these are intended to give a broad economic view. BSE 30 and Asia Sentix are indicators of the Indian and Asian markets respectively. The exchange rate gives a hard indicator for global trade. A few considerations were made for the data to enter the model. First, the data were transformed to ensure stationarity just as in Bragoli and Fosten (2017). The GDP series found in ﬁgure 1 illustrates a sample transformation. The top graph shows the untrans- formed series with a clear positive trend. The bottom shows the result after applying a year over year transformation. We see that the positive trend disappears. Using year-over-year changes diﬀers from some of the nowcasting literature (Li, 2016; Banbura, Giannone and Reichlin, 2011; Giannone, Reichlin and Small, 2008). While many papers focus on developed economies with high quality timely data, the developing context

4 Table 1: Series, transformations, and loadings Name Mnemonic Transformation Frequency Common Hard Financial India Industrial Production INPIINDU pcy M 1 1 0 India Exports INMTEXU pcy M 1 1 0 India Imports INMTIMU pcy M 1 1 0 India PMI Services MPMIINSA pcy M 1 1 0 India PMI Manufacturing MPMIINMA pcy M 1 1 0 India Crude Oil Production INRGGTGT pcy M 1 1 0 India Steel Production INFRSTEL pcy M 1 1 0 India Electricity Generation INFRELEC pcy M 1 0 1 Indian Money Supply (M1) INMSM1 pcy M 1 0 1 India Exchange Rate (Rupee/US$, EOP) USDINR pcy M 1 0 1 5 India 91-Day Treasury Bill IYTB3M pcy M 1 0 1 India Stock Prices Sensex/BSE 30 SENSEX pcy M 1 0 1 India Industrial Workers Consumer Price INCPIIND pcy M 1 1 0 India Agricultural Laborers Consumer Price INCPAGLB pcy M 1 1 0 India Rural Laborers Consumer Price INCPRULB pcy M 1 1 0 India Consumer Price INFUTOT pcy M 1 1 0 India Wholesale Price Index (All) INFINF pcy M 1 1 0 India Wholesale Price Index (Fuel Power) INFIFUE pcy M 1 1 0 US Industrial Production Indx IP pcy M 1 1 0 US ISM Manufacturing PMI NAPMPMI lvl M 1 1 0 Euro 19 Industrial Production EUITEMUM lvl M 1 1 0 Asia (ex Japan) Sentix overall Index SNTEASGX pcy M 1 0 1 Current Price Gross Domestic Product in India INDGDPNQDSMEI pcy Q 1 1 0 differs. Most time series lack seasonal adjustment, so including raw series in the model would primarily capture seasonal variation rather than underlying trends. Taking a yearly view addresses these concerns, and has previously been used in the development context (Swallow and Labbé,2010; Giannone, Agrippino and Modugno, 2013a). While some papers have adjusted for seasonality upon input (such as implementing the US Census Bureau’s X-13ARIMA-SEATS seasonal adjustment approach), this approach is less salient for the GDP prediction, as two-sided filters are less context appropriate (Swallow and Labbé,2010; Vosen and Schmidt, 2011; Li, 2016).

2.2 Google Search data

I include Google Search data, which are publicly available through the Google Trends interface 1. Each series gives relative search volume for a particular term, where values are scaled so that the maximum search volume is 100. As a consequence, for some volatile terms, the lowest level is 0 (even when raw search volume itself is nonzero). Users can adjust search scope — globally, by country, or by state — and series frequency — hourly, daily, weekly, and monthly. For this study, I examine search volume in India by monthly frequency. Its relatively high usage in India allows for Google search to be used to approximate consumer sentiment. Figure 2 gives Google’s search market share in India over time. By traffic volume, Google commands 96-97% of the market from 2009-2018 StatCounter (2018). One explanation to Google’s spread is English’s widespread role, especially among younger people. While other languages are are very regional (e.g. Hindi in the north, Bengali in the east, Telugu in the south, Marathi in the west), English plays a large role in schools nationally as its second official national language (Census of India, 2001). This gives Google an advantage over regional language search variants, and provides a means to measure preferences nationally. Several studies have used Search data to capture consumer sentiment. In this context, search data acts as a signal for people’s unfiltered expectations. The advantage to this approach is in timeliness; search data are available in real-time. The terms are categorized into three categories: economy, jobs, and household. These are intended to holistically capture people’s feelings about the economy as a whole, similar

1https://trends.google.com/trends/

6 Figure 2: Google Search has maintained high market share over time Indian search engine market share 100

75 Search engine Bing 50 Google Other 25 Market share (%) Market

0 2010 2012 2014 2016 2018 Year to the approach found in Seabold and Coppola (2015). Search terms related to the economy help indicate economic anxiety generally, where people search terms like “recession” and “petrol prices” when nervous about the future. Similarly, in the “jobs” category, people are likely to search for terms such as “salary” and “raise” for similar reasons. Finally, the “household” category is intended to capture consumer aspirations, with searches such as “cars” appearing during more positive economic outlooks. Over its history, Google has provide varying degrees of language support for India. Despite the increase in support, traﬃc volume is largely in English. Figure 3 gives an illustration. Over time, English search volume is several times greater than that of mulya. There are several approaches used for incorporating Google Trends data into the forecasting framework. The primary hurdle is extracting signal from many time series. Several papers suggest creating composite indices from the Search terms, eﬀectively reducing mul- tiple time series to a single indicator via least squares (Vosen and Schmidt, 2011; Seabold and Coppola, 2015). An advantage to this approach is its ease for being incorporated directly into VAR models. Considering choice of evaluation framework, factor models allow these series to be directly inputted just as in Li (2016). This approach is consistent with how the traditional data sources are incorporated. The factor model itself provides a form

7 Figure 3: Search volume for “price” is greater than that of mulya Relative search volume for 'price'

100

75 Terms

50 price (English) price (Hindi)

25 Search volume (max = 100) Search volume 0 2005 2010 2015 Year

Table 2: Google Search terms and categories Category Terms Economy price, prices, cost, inﬂation, recession, petrol price Jobs salary, raise, unemployment, bankruptcy, insolvency, job, jobs, job vacancy, jobs in Household rent, rent in, job salary, cars, rent house of dimension reduction. One shortcoming to Search data is the lack of seasonal adjustment. Consistent with the other economic time series, I transform the series using a year-over-year percent change transformation. Figure 4 displays a sample transformation. Qualitatively, the untrans- formed series appears to be increasing over time. After performing the transformation, the resulting series appears to be stationary.

8 Figure 4: Google Search: Example of year-over-year percent change transformation Google Search: "Salary" 100

Index (max = 100) Index 2005 2010 2015 Date

0.8

0.4

0.0 Transformed (% YoY) Transformed 2005 2010 2015 Date

3 Model

A dynamic factor model with Kalman filtering was used to model the series dynamics (Banbura, Giannone and Reichlin, 2011; Giannone, Reichlin and Small, 2008). This setup is common in the economic forecasting context, with past works focusing on different countries (China, India, U.K., Eurozone, United States) and for different time series (Li, 2016; Bragoli and Fosten, 2017; Swallow and Labbé,2010; Giannone, Agrippino and Modugno, 2013b). The model is driven by two relationships. First, the monthly series have the follow the following factor structure: M xt = µ + Λft + εt (1)

where xt gives the transformed data series (yearly growth rate or differences), µ gives a constant, ft is a r × 1 vector containing r unobserved factors, and Λ gives the factor coefficients. εt gives the idiosyncratic error. Call this factor structure the observation equation. This approach effectively is a means for dimension reduction, as each data series can be decomposed into several common components. In practice, from transformations, the constant µ = 0.

9 Second, the factors themselves can be modeled as an AR(1) process where

ft = Aft−1 + ut (2) for i.i.d. ut ∼ N(0,Q). Matrix A is r × r and contains autoregressive coeﬃcients. This relationship is called the transition equation, which can be understood as determining the factor relationship across time. Intuitively, for each period t, the value for each time series (e.g. Consumer Price Index) is determined by the value of the unobserved factors fi. The values fi change over time according to the dynamics of the transition equation. The idiosyncratic component can also be modeled as an AR(1) process where

εi,t = αiεi,t−1 + ei,t (3)

2 for i.i.d ei,t ∼ N(0, σi ). For this model, consistent with existing literature, we further impose structure onto the factors driven by economic theory (Giannone, Reichlin and Small, 2008; Giannone, Agrippino and Modugno, 2013a). In our case, we take number of factors r = 3 where G H F each factor can be written as ft = (ft , ft , ft ) representing the “global”, “hard”, and “ﬁnancial” factors. Furthermore, the loadings matrix Λ is structured so that the global factor loads onto each series. Depending on the type of data, each series also takes on H F either a hard ft or ﬁnancial ft loading: ! Λ Λ 0 Λ = H,G H,H . ΛF,G 0 ΛF,F

For the models incorporating Google Trends, four factors are used: “global”, “hard”, “ﬁnancial”, and “soft” where Trends data are assigned to the soft factor. For the transition equation, factor dynamics are related to each factor’s own previous state, so   Ai,G 0 0   Ai =  0 A 0  .  i,H  0 0 Ai,F In order to incorporate mixed frequency data, quarterly data can be modeled as the

10 sum of monthly data. By convention, these series are placed at the end of the quarter while preserving monthly indexing t. Using the approach found in Mariano and Murasawa (2003), the quarterly series can be modeled as

Q yt = yt + 2yt−1 + 3yt−2 + 2yt−3 + yt−4.

With the monthly analogue, the quarterly series can enter the transition and observation equations using the same dynamics. The model is estimated using an expectation maximization algorithm Giannone, Reich- lin and Small (2008). This approach is often used in situations where missing observations make direct maximization of the likelihood function computationally intractable. Broadly, there are several steps. Factor estimates are ﬁrst initialized using principal components. Then, these initial conditions enter the expectation-maximization loop. In the expectation step, the log-likelihood conditional on the data is computed based on values from the previous iteration. In the maximization step, new parameters are estimated by maximizing the expected log-likelihood. Expectation and maximization are looped until estimates con- verge, measured by the change in log-likelihood. Details on this procedure can be found in Doz, Giannone and Reichlin (2006); Ba´nburaand Modugno (2014).

11 Figure 5: Forecasting error by month number

10-3 RMSFE by month (difference from first month) 0

-0.2

-0.4

-0.6

-0.8 RMSFE (difference from first month)

-1 1 2 3 Month

4 Results

To evaluate model performance, I use a pseudo-data vintage approach. The goal is to simulate the real-world forecast creation process; as more information become available, the forecasts are updated. Each month, a prediction for current quarter GDP is created using information available at that month. Therefore, three predictions are generated for the current quarter GDP (month 1, month 2, and month 3). This simulates the publication delay for GDP ﬁgures, but abstracts away irregular release intervals (monthly series are not updated concurrently) and data revisions (estimates are often revised after initial estimates are published). While there is substantial literature taking data revisions and release schedules into account, this will not be the focus of this paper (Giannone, Reichlin and Small, 2008; Giannone, Agrippino and Modugno, 2013c). Figure 5 illustrates improvements in model performance as more information becomes available. On average, as time progresses, the forecasting error decreases. Month 1 has the highest error, month 2 has lower error, and month 3 has the lowest error. Figure 6 gives an illustration of the model’s mechanics. Each transformed series is standardized and plotted (the colored series) alongside the common factor (in black). No- tice that the common factor captures some comovement for the each of the series, which

12 Figure 6: Standardized data and common factor plot Standardized data and common factor 6

-2

-4

-6

-8

-10

-12 2000 2002 2004 2006 2008 2010 2012 2014 2016 2018

is especially apparent during and after the Great Recession. During the downturn, many economic indicators downturns decreased in value, and afterwards, many decreased. This figure does not give predictions: it illustrates that series have some degree of commonality. Figures 7 and 8 display the four prediction models over time. On inspection, the four approaches roughly track GDP throughout the period and perform best during low volatility intervals. All four models perform relatively well from 2006-2008 and from 2012- 2015. The four models share similar characteristics during high volatility periods. The models overpredict GDP during the beginning of the Great Recession and underpredict its recovery afterwards. Similarly, GDP is slightly overpredicted during the 2012 slowdown. Moreover, all four slightly underpredict GDP from 2016 onwards, despite being a low volatility period. One potential reason is a change in methodology in price measurement. In 2016, the Cen- tral Statistics Office modified estimates for Wholesale Price Index and Index of Industrial Production, which are used to compute GDP in constant prices. These changes resulted in an upwards GDP revision of .5%. It is possible that the model provides predictions without accounting for changes in methodology. Figure 9 gives a more detailed account for prediction error over time. While the move-

13 Figure 7: Within quarter predictions (no search data)

Current quarter predictions (baseline)

0.16 Data Three factors 0.14 One factor 0.12

0.1

0.08

YoY GDP 0.06

0.04

0.02

-0.02 2004 2006 2008 2010 2012 2014 Date

Figure 8: Within quarter predictions (with search data)

Current quarter predictions (Google Trends)

0.16 Data Four factors 0.14 One factor 0.12

0.1

0.08

YoY GDP 0.06

0.04

0.02

-0.02 2004 2006 2008 2010 2012 2014 Date

14 Figure 9: Prediction error over time (no search data)

Residuals for 3 factor and 1 factor models (no Google trends) 0.04

3 factors 1 factor 0.02

-0.02

-0.04

-0.06 2006 2008 2010 2012 2014 2016

Figure 10: GDP Root Mean Square Error Series 3 factors 1 factor 4 factors (with trends) 1 factor (with trends) RMSE 0.0139 0.0157 0.0143 0.0165 ments for the multifactor and single factor models are similar, the multifactor model is more effective in predicting the higher volatility periods. For a more formal comparison of forecasting performance, figure 10 shows root mean square error. For each set of data, the single factor model performs worse than the multifactor model. Comparing the Trends and non-Trends model, it appears that adding the Search data does not improve predictions. From figure 11, notice that these performance differences are constant overtime. Displaying the RMSFE, the three factor no Trends model consistently outperforms the other three models from 2006-2018. Similarly, the other three models maintain the same relative position, with all four models moving together. Next, I generated predictions for the Indian Consumer Price Index series. This exercise gives insight to the Google Trends “consumer sentiment” measurement hypothesis. Prices are tied more closely to movements in sentiment than for GDP. If the Trends data is a good indicator for sentiment, we would expect Trends data to increase the performance of

15 Figure 11: Models perform similarly over time Root mean squared forecast error over time 0.025

0.02

0.015 RMSFE 0.01

3 factor 0.005 1 factor 4 factor (Trends) 1 factor (Trends)

0 2006 2008 2010 2012 2014 2016 Year the model. Note that CPI is a monthly, not a quarterly series, so predictions are created for the current month. Figures 12 and 13 display predictions for CPI. The four models predict CPI fairly well, as the prediction error is low. The improved prediction performance indicates that forecasting CPI is an easier prediction problem than GDP. CPI’s relatively low volatility and higher frequency make for better predictions. Monthly values make the series more predictable. Moreover, the higher performance in the baseline models using only economic data give little room for performance improvements in including Trends data. Figure 14 gives root mean squared error for CPI predictions. Just as before, the Google Trends models perform slightly worse than the traditional economic models.

16 Figure 12: CPI predictions using economic data Current quarter predictions 0.18

Data 0.16 Baseline (1 factor) Baseline (3 factors) 0.14

0.12

0.1 YoY CPI 0.08

0.06

0.04

0.02 2004 2006 2008 2010 2012 2014 Date

Figure 13: CPI predictions using economic data plus search data Current quarter predictions 0.18

Data 0.16 Trends (four factors) Trends (one factor) 0.14

0.12

0.1 YoY CPI 0.08

0.06

0.04

0.02 2004 2006 2008 2010 2012 2014 Date

Figure 14: Root mean squared error for CPI Series 3 factors 1 factor 4 factors (with trends) 1 factor (with trends) RMSE 0.0048 0.0059 0.0050 0.0048

17 5 Discussion

For Indian GDP, incorporating search data into the prediction procedure provides no im- provement over the baseline model. However, incorporating a multifactor setup appears to improve forecasts. These differences reflect the difficulty of the problem. In some cases, the GDP figure itself changed by greater than 1% (Sapre and Sengupta, 2017). Advanced estimates can differ substantially from final numbers. As a result, forecasting can suffer from imprecision, complementing existing problems with noisy and unreliable data. Also, the current does not take into account the effect of staggered data releases on changes to prediction. While releasing data concurrently is parsimonious, a more realistic approach would be to add data according to a release calendar. Instead of modifying the dataset each month, data can added on their release dates. This would allow for a more careful study for how particular series releases move forecasts. Perhaps more importantly, following a release calendar can help take timeliness into account. Release sequencing is especially important for studying qualitative data. While soft indicators generally are noisier, they are subject to shorter publication lag. These series give an early signal (albeit less precise) to the direction of the target series. Without release sequencing, these advantages disappear. The current simulation does not give an environment for seeing the strengths of the Google Search data. For the future, the study can be improved by more closely examining the mechanism for measuring consumer sentiment. While this study includes a broad selection of terms, more careful consideration for cultural and linguistic norms would likely aid the analysis. India has the second most English speakers, but more closely examining ground-level search patterns would shed light onto the specific transmission of economic attitudes.

18 References

Ba´nbura,Marta, and Michele Modugno. 2014. “Maximum likelihood estimation of factor models on datasets with arbitrary pattern of missing data.” Journal of Applied Econometrics, 29(1): 133–160.

Banbura, Marta, Domenico Giannone, and Lucrezia Reichlin. 2011. “Nowcasting.” Oxford Handbook of Economic Forecasting.

Bardhan, Pranab. 2013. “The State of Indian Economic Statistics : Data Quantity and Quality Issues.”

Bragoli, Daniela, and Jack Fosten. 2017. “Nowcasting Indian GDP.” Oxford Bulletin of Economics and Statistics (forthcoming).

Census of India. 2001. “Population by Bilingualism, Trilingualism, Age and Sex.”

Choi, Hyunyoung, and Hal Varian. 2012. “Predicting the Present with Google Trends.” Economic Record, 88(SUPPL.1): 2–9.

Doz, Catherine, Domenico Giannone, and Lucrezia Reichlin. 2006. “A Quasi Max- imum Likelihood Approach for Large Approximate Dynamic Factor Models.” European Central Bank Working Paper Series, , (674).

Giannone, Domenico, Lucrezia Reichlin, and David Small. 2008. “Nowcasting: The real-time informational content of macroeconomic data.” Journal of Monetary Eco- nomics, 55(4): 665–676.

Giannone, Domenico, Silvia Miranda Agrippino, and Michele Modugno. 2013a. “Nowcasting China Real GDP.”

Giannone, Domenico, Silvia Miranda Agrippino, and Michele Modugno. 2013b. “Nowcasting China Real GDP.”

Giannone, Domenico, Silvia Miranda Agrippino, and Michele Modugno. 2013c. “Nowcasting China Real GDP Forecast-Nowcast-Backcast : Q3-2013.” , (February): 1– 26.

19 Goldfarb, Avi, Shane M Greenstein, and Catherine E Tucker. 2015. Economic Analysis of the Digital Economy Volume.

Li, Xinyuan. 2016. “Nowcasting with Big Data : is Google useful in Presence of other Information ?” 1–41.

Mariano, Roberto S., and Yasutomo Murasawa. 2003. “A new coincident index of business cycles based on monthly and quarterly series.” Journal of Applied Econometrics, 18(4): 427–443.

Pan, Bing, Doris Chenguang Wu, and Haiyan Song. 2012. Forecasting hotel room demand using search engine data. Vol. 3.

Sapre, Amey, and Rajeswari Sengupta. 2017. “An analysis of revisions in Indian GDP data.” , (September).

Schmidt, Torsten, and Simeon Vosen. 2012. “Using internet data to account for special events in economic forecasting.” Ruhr Economic Paper, , (382).

Seabold, Skipper, and Andrea Coppola. 2015. “Nowcasting Prices Using Google Trends An Application to Central America.” World Bank Policy Research Working Pa- per, , (7398).

StatCounter. 2018. “Search Engine Market Share India.”

Suhoy, Tanya. 2009. “Query Indices and a 2008 Downturn: Israeli Data.” Bank of Israel Discussion Paper Series, , (6).

Swallow, Yan Carri`ere,and Felipe Labb´e. 2010. “Nowcasting with Google Trends in an Emerging Market.” Documentos de Trabajo (Banco Central de Chile), 298(588): 1.

Vosen, Simeon, and Torsten Schmidt. 2011. “Forecasting private consumption: Survey-based indicators vs. Google trends.” Journal of Forecasting, 30(6): 565–578.

World Bank. 2017. “World Development Indicators.”

20 A Structure of model

In practice, the monthly, quarterly, and idiosyncratic errors are stacked through block matrices. The stacked version of the observation equation is displayed below. Notice that the monthly series are ordered before the quarterly series by convention, and that the monthly to quarterly interpolation scheme is listed.

  ft   ft−1   f   t−2   ft−3     ! ! ! ft−4 xt µ Λ 0 0 0 0 In 0 0 0 0 0   = +  εt  (4) yQ µ˜ Λ 2Λ 3Λ 2Λ Λ 0 1 2 3 2 1   t Q Q Q Q Q Q  Q   εt    εQ   t−1  Q  εt−2   εQ   t−3 Q εt−4

Similarly, the idiosyncratic and factor relationships are stacked for the transition equation. To be consistent with the quarterly to monthly interpolation scheme, ﬁve period for factors f are used.

21       ft   ft−1 ut   A1 0 0 0 0 0 0 0 0 0 0     ft−1   ft−2  0     Ir 0 0 0 0 0 0 0 0 0 0     f    f   0   t−2    t−3      0 Ir 0 0 0 0 0 0 0 0 0     ft−3   ft−4  0     0 0 I 0 0 0 0 0 0 0 0        r      ft−4   ft−5  0     0 0 0 Ir 0 0 0 0 0 0 0      εt  =   εt−1 +  et     0 0 0 0 0 diag(α , . . . , α ) α 0 0 0 0      Q   1 n Q   Q   Q  εt    εt−1 et     00000 0 10000     εQ    εQ   0   t−1    t−2    Q   00000 0 01000  Q    εt−2   εt−3  0     00000 0 00100     εQ    εQ   0   t−3  t−4   Q 00000 0 00010 Q εt−4 εt−5 0 (5)