Tilburg University

Essays on the role and effects of advertising He, Chen

Publication date: 2018

Document Version Publisher's PDF, also known as Version of record

Link to publication in Tilburg University Research Portal

Citation for published version (APA): He, C. (2018). Essays on the role and effects of advertising. CentER, Center for Economic Research.


Essays on the Role and Effects of Advertising

Chen He

October 30, 2018

Essays on the Role and Effects of Advertising

DISSERTATION to obtain the degree of doctor at Tilburg University, on the authority of the rector magnificus, prof.dr. E.H.L. Aarts, to be defended in public before a committee appointed by the doctorate board, in the Ruth First Room of the University, on Tuesday 30 October 2018 at 16:00, by

CHEN HE

born on 16 July 1988 in Gansu, China

DOCTORAL COMMITTEE:

SUPERVISOR: prof.dr. Bart J. Bronnenberg
CO-SUPERVISOR: dr. Tobias J. Klein
OTHER MEMBERS: prof.dr. Jaap H. Abbring
dr. George Knox
dr. Jason M.T. Roos


Copyright © 2018 Chen He

All rights reserved.

Acknowledgements

My nine years in Tilburg have been a very exciting and fulfilling journey. During this journey, I was surrounded by many people without whose support this dissertation would never have been possible.

First and foremost, I would like to express my deepest gratitude to my supervisors, Tobias Klein and Bart Bronnenberg, for their insightful guidance and continued support. Tobi, you are the person who taught me how to motivate and conduct high-quality research and how to present the research output in well-written papers. You encouraged me to think and read broadly, which has shaped my research interests towards the interaction of several sub-fields in economics and marketing. The journey to a Ph.D. is never flat. Your positive, encouraging and inspiring attitude has supported me every time I got stuck in my research. Moreover, thank you for your valuable guidance and suggestions during our meetings both before and after my job market. Those have greatly relieved my stress and given me the confidence to find a job. Bart, thank you for your valuable and strong support when I was on the market. I benefited a lot from your information on both the marketing and the economics job market. I also learned a lot from you in improving the quality of my job market paper. Thank you for your detailed and insightful feedback on my work. Your attitude towards research and your great attention to detail have not only helped me find a job but also set a great example for me of what it means to be a qualified researcher. It has been a privilege for me to learn from you.

I would like to thank Jaap Abbring, who is not only a member of my doctoral committee but also the person who taught me the introductory econometrics course during my undergraduate study in Economics. Jaap, thank you for introducing me to a wonderful area in which I myself have become a researcher. Thank you also for your strong support on my job market and your feedback on my thesis. Also, thank you for teaching me to conduct research when I was your research assistant.

I would like to express my gratitude to the remaining two members of my doctoral committee, George Knox and Jason Roos. I greatly benefited from their expertise and their great comments and suggestions on my thesis.

I benefited a lot from many other faculty members. I would like to thank Rik Pieters for his great comments on my work and his support on my job market. Many thanks to Hannes Datta, who also helped me on the market and gave me constructive comments on how to prepare my job talk. I would like to thank the members of the CentER Ph.D. Job Placement Committee, Otilia Boldea and Martin Salm, for their great feedback and help. I also want to thank Cecile de Bruijn for her excellent support on the job market.

It was my great honor to be part of the excellent Structural Econometrics Group, of which I was also an organizer for one year. I would like to thank all of the group participants for their valuable discussions on my work and also on many other great topics from which I have benefited a lot. Special thanks to Yufeng Huang, Yan Xu and Yifan Yu, who generously shared their research experience and information about the job market. I would like to express my gratitude to Ittai Shacham, who is a co-author of one chapter in this dissertation. I would like to thank Elisabeth Beusch, Roxana Fernandez, Bas van Heiningen, Emanuel Marcu, Renata Rabovic, Laura Capera Romero, Moritz Suppliet and Bert Willems for their nice comments on my work.

My Ph.D. journey at Tilburg would not have been fun without many colleagues and friends. First, I would like to thank Xingang Wen, Xue Xu and Kun Zheng for the great times we spent together. Special thanks to Yi Zhang, with whom I have been to many great restaurants during the weekends. I want to thank my two roommates, Stefan Hubner and Stefania Basiglio, for the time that we spent together. In addition, I want to thank the many friends that I have encountered. They are Khulan Altangerel, Hasan Apakan, Shuai Chen, Ruonan Fu, Di Gong, Tao Han, Yi He, Hao Hu, Michal Kobielarz, Xu Lang, Lei Lei, Hong Li, Jing Li, Hao Liang, Manwei Liu, Shuo Liu, Ahmadreza Marandi, Zilong Niu, Anderson Grajales Olarte, Christos Revelas, Jop Schouten, Lingbo Shen, Lei Shu, Vatsalya Srivastava, Chen Sun, Loes Verstegen, Ruixin Wang, Xiaoyu Wang, Yadi Yang, Yuxin Yao, Wencheng Yu, Wanqing Zhang, Jianzhe Zhen, Yeqiu Zheng and Bo Zhou.

I would like to extend my sincere gratitude to our secretaries of EOR, Anja Heijeriks, Lenie Laurijssen, Anja Manders-Struijs and Heidi Ket-van Veen, and also to our management assistant Korine Bor, for creating a supportive, collaborative and efficient working environment.

Last but not least, I would also like to express my deepest love to my parents who always stand by me. I could never thank you enough for being there for me without a doubt.

Chen He
Tilburg, August 2, 2018

Contents

List of Tables
List of Figures

1 Introduction

2 Optimizing Online Sales using Targeted Advertising
  2.1 Introduction
  2.2 The market for lottery tickets in the Netherlands
  2.3 Data and descriptive statistics
    2.3.1 Overview
    2.3.2 Descriptive evidence
  2.4 Effect of advertising
    2.4.1 Effect of advertising on visits and sales
    2.4.2 Effect of advertising on online conversion rate
    2.4.3 Effect of advertising across channels
  2.5 A model of lottery ticket demand
    2.5.1 Motivation of estimating a model
    2.5.2 General structure
    2.5.3 Advertising
    2.5.4 Consumers
    2.5.5 Discussion
    2.5.6 Solving the model
    2.5.7 Empirical implementation
  2.6 Results
    2.6.1 Parameter estimates and fit
    2.6.2 Decompose probabilities
    2.6.3 Elasticities of advertising
    2.6.4 The proposed model vs. the model with no correlation between e-s
  2.7 Counterfactual experiments
    2.7.1 Setup
    2.7.2 Results
  2.8 Summary and concluding remarks
  2.A Additional tables and figures
  2.B Computing conditional choice probability
  2.C The probit model: a more general case
    2.C.1 Purchasing stage decision
    2.C.2 Visiting stage decision
  2.D Details on the econometric implementation
    2.D.1 Empirical setup
    2.D.2 Method of simulated moments
    2.D.3 Moments and weighting matrix

3 Advertising as a reminder: Evidence from the Dutch State Lottery
  3.1 Introduction
  3.2 The market for lottery tickets in the Netherlands
  3.3 Data and descriptive statistics
    3.3.1 Overview
    3.3.2 Descriptive evidence
  3.4 Evidence on the effect of advertising
    3.4.1 Direct evidence for big advertisements
    3.4.2 Evidence from a distributed lag model
    3.4.3 The dependence of the effect of advertising on time
  3.5 A model of lottery ticket demand
    3.5.1 General structure
    3.5.2 Consideration
    3.5.3 Purchase decision
    3.5.4 Expectations
    3.5.5 Solving the model
    3.5.6 Empirical implementation
  3.6 Results
    3.6.1 Parameter estimates and fit
    3.6.2 The dependence of advertising effects on time
  3.7 Counterfactual experiments
  3.8 Summary and concluding remarks
  3.A Details on the econometric implementation
    3.A.1 Empirical setup
    3.A.2 Method of simulated moments
    3.A.3 Moments and weighting matrix
    3.A.4 Smoothing
  3.B Robustness
    3.B.1 Assumption on market size
    3.B.2 A model with serially correlated viewership
  3.C Additional tables and figures

4 Advertising Match Values and Viewership Demand
  4.1 Introduction
  4.2 Conceptual framework
  4.3 A short history of Israeli television
  4.4 Data and descriptive statistics
    4.4.1 Data description
    4.4.2 Sample selection
    4.4.3 Descriptive statistics
  4.5 Empirical strategy
  4.6 Results
  4.7 Counterfactuals
  4.8 Concluding remarks
  4.A Additional tables and figures

Bibliography

List of Tables

2.1 Rescaled percentiles for GRP’s at the minute level
2.2 The effect of advertising
2.3 The effect of advertising on conversion rate
2.4 Estimation results: key parameters
2.5 Estimation results: draw fixed effect
2.6 Elasticities of advertising
2.7 Comparison estimates: key parameters
2.8 Comparison estimates: draw fixed effect
2.9 Effect of various advertising strategies
2.12 Effect of TV and radio advertising from different channels
2.10 Definition of TV channels and groups
2.11 Definition of radio channels and groups

3.1 The effect of advertising on sales
3.2 Parameter estimates
3.3 Effect of various advertising strategies
3.4 Robustness checks: parameter estimates when we double the market size and allow for serially correlated viewership
3.5 Differences across draws
3.6 Effect of TV and radio advertising
3.7 Evidence from a distributed lag model at the hourly level

4.1 Number and length of advertisements by industry
4.2 Summary statistics of program genre
4.3 Summary statistics of commercial breaks
4.4 Effect of re-ordering on predicted GRP’s
4.5 Effect of re-ordering on predicted GRP’s for different sub-viewers
4.6 Effect number of ad by age
4.7 Effect number of ad by income
4.8 Viewer segments just before the commercial break across program genres by age
4.9 Viewer segments just before the commercial break across program genres by income
4.10 Effect number of ad by age
4.11 Effect number of ad by income

List of Figures

2.1 Distribution of GRP’s across channels
2.2 Share of GRP’s across channels
2.3 GRP’s and visits at the minute-level for a regular draw
2.4 GRP’s and sales at the minute-level for a regular draw
2.5 GRP’s and visits at the minute-level for a short time window
2.6 GRP’s and sales at the minute-level for a short time window
2.7 Effect of a 10-GRP advertisement on site visits and online sales
2.8 Effect of a 5-GRP advertisement on site visits and online sales for different channels
2.9 Model summary
2.10 Option value
2.11 Model fit
2.12 Decompose probabilities
2.13 Option value: all other cases

3.1 Cumulative sales for selected draws
3.2 Advertising and sales during the day
3.3 GRP’s at the minute-level for a regular draw
3.4 GRP’s at the minute-level for a short time window
3.5 The effect of advertising on sales for big advertisements
3.6 Effect of timing
3.7 Model fit
3.8 Dependence of predicted effect of advertising on timing
3.9 Effect of different advertising strategies
3.10 Cumulative sales for remaining draws
3.11 Advertising and sales during the day of the draw
3.12 GRP’s at the minute-level for a special draw
3.13 Advertisements that were used to construct Figure 3.5
3.14 Expectations

4.1 Viewer segments across program genres
4.2 Effect of advertisement number
4.3 Effect of advertisement industry
4.4 Effect of advertisement number by age and income
4.5 Effect of advertisement industry by age and income
4.6 Distribution of commercial breaks

Chapter 1

Introduction

This Ph.D. dissertation consists of three essays on the role and effects of advertising. This topic is relevant for both industrial organization and marketing. From the marketing side, advertising is one of the most important instruments for firms to market their products. In 2016, global advertising spending amounted to 493 billion US dollars (Letang and Stillman, 2016). It is therefore important to ask how empirical work can guide firms that wish to improve their advertising strategy. From the viewpoint of industrial organization, advertisements are welfare-improving. Although advertising historically served mainly as a means to fund TV shows, the range of services funded by advertising today goes far beyond TV shows. In fact, much of the free content that consumers benefit from is directly financed by advertising. Therefore, it remains important to further our understanding of whether, and if so how, advertising can causally affect individual behavior.

Chapter 2, “Optimizing Online Sales using Targeted Advertising”, studies how reallocating advertising budgets can increase online sales. I find empirical evidence that advertising causally increases not only online traffic but also online conversion rates. I then point out that the observed increase in the conversion rate could be due to the fact that those who are motivated to visit the website through advertisements are different from those who usually visit. I find that the former have a higher probability of buying given that they visit. Ignoring this and studying consideration and conversion separately could result in using an underestimated conversion rate and thus a suboptimal advertising strategy, in particular when advertising on different channels reaches different audiences. Motivated by this, I propose and estimate a new integrated model of consideration and conversion.
My estimates show that one would overestimate both the effects of advertising and the cost of visiting the website if one ignored this selection. Finally, I show in my counterfactual experiments that shifting advertising across channels could lead to increased sales.

Advertisements can fulfill a number of roles (Bagwell, 2007). They can be informative, persuasive, or seen as a complement to actual consumption. The informative role asserts that advertising serves as a means through which firms convey information about the existence, location, price and other characteristics of products. Persuasive advertising affects consumer preferences before buying the product, while complementary advertising makes consuming a product more enjoyable. The exact role advertising plays is highly context-specific and is related to whether consumers benefit or suffer from it.

Chapter 3, “Advertising as a Reminder: Evidence from the Dutch State Lottery”, studies the dynamic effects of advertising. The central idea is that advertisements can also remind consumers to buy. We find the effects of advertising to be strong and to last up to about 4 hours. They are larger the less time remains until the draw. Based on these findings, we point out a tradeoff the firm faces. On the one hand, if it allocates all of the advertising budget very late, then it may not reach certain consumers, for instance because they do not watch TV on these days; on the other hand, if it spreads advertising expenditures out over time in order to reach more consumers, then it may forego the possibility to spend the money effectively at later points in time. This means that total sales will crucially depend on the dynamic advertising strategy and that it would be valuable to assess the dependence of sales on counterfactual advertising strategies. For this, we develop a tractable dynamic structural model of consumer behavior. Our counterfactual experiments suggest that the firm puts too much weight on advertising early and on spreading advertisements over time. Shifting advertising expenditures to the days before the draw could be a strategy to increase sales for a given advertising budget.

When studying the role of advertising, it is important to keep in mind that the exact same advertisement can be informative for some consumers, persuasive or a complement for others, and a nuisance for yet another group of consumers.
This will manifest itself in heterogeneous responses of consumers to the advertisement: when an advertisement that is aired on television is a nuisance rather than useful, consumers will be more inclined to tune away; conversely, when it is useful to consumers, they will tend to stay tuned. The primary aim of Chapter 4, “Advertising Match Values and Viewership Demand”, is to characterize this heterogeneity. The idea is that pairs of advertisements and consumers can be thought of as being associated with a pair-specific match value, which expresses itself in the response to the advertisement. Following this idea, we estimate heterogeneous match values using high frequency data that contain information on viewing behavior for different types of consumers at the level of a position within the commercial break. We flexibly control for channel, show and time effects, as well as the group-specific baseline viewing pattern within a commercial break, which is driven by advertising aversion and the desire to return to the program before the commercial break ends. Our results imply that there are rich patterns of dynamic selection that are driven by heterogeneity across groups and the order in which different types of advertisements are aired. These affect the total number of advertising impressions that can be sold at the group level and for a given total length of the commercial break. We quantify the magnitude of this effect in a set of counterfactual experiments. Once we see viewership response to advertising as a revealed preference measure, this suggests that the overall welfare effects of reordering advertisements could be positive, in line with the view that more targeted advertising can be welfare-improving.

To sum up, in this dissertation, I measure the causal effects of advertising on consumer behavior. I study the reminder role through which advertising affects consumers. I characterize the heterogeneous effects of advertising and show how firms could improve their advertising strategy. These three pieces of work help further our understanding of whether, how and why advertisements improve our daily lives.

Chapter 2

Optimizing Online Sales using Targeted Advertising

2.1 Introduction

Advertising has long been one of the most important instruments for firms to market their products and, as a result, firms spend a large amount of money on advertising. The advertising industry has even been expanding in recent years. Globally, media owners' advertising revenues grew by 5.7% in 2016, to $493 billion (Letang and Stillman, 2016). TV advertising revenues grew by nearly 4% in 2016, reaching a market share of 38% (Letang and Stillman, 2016). Moreover, the trend shows that the industry is gradually but surely shifting from traditional “pay-per-impression” ads to more Internet-based targeted ads and the “pay-per-click” model for online advertising. According to the latest industry forecast, digital-based ad sales will become the number one media category in 2017, reaching a market share of 40% (Letang and Stillman, 2016).

Though important, understanding how a firm should allocate its advertising budget remains a difficult question. Broadly speaking, the firm needs to consider at least two aspects when allocating its advertising budget: when to advertise and where to advertise. Regarding the first aspect, He and Klein (2018) document that the timing of advertising is important and that, by advertising at the “right time”, the firm could improve its profit for a given advertising budget. This paper addresses the second aspect: on which channel should the firm air its advertisement? This is important because audiences differ across channels and therefore respond differently to advertising. I study this question in the context of online sales, where TV and radio advertising can affect individual behavior in two stages, i.e., it can affect website traffic and conversion (the probability of buying conditional on visiting). I use high frequency data on TV and radio advertising from different channels together with online sales and website visits data to measure the effects of advertising.
The high frequency nature of the data allows me to cleanly identify these effects and to show that they depend on the channel on which the firm advertises.

The contributions of this paper are threefold. First, I show that advertising causally increases online traffic as well as the online conversion rate and that the effects depend on the channel on which the advertising is aired. This implies that targeting indeed matters. Second, I point out that the observed increase in the conversion rate could be due to the fact that those who are motivated to visit the website through advertisements are different from those who usually visit. The former have a higher probability of buying given that they visit. Ignoring this and studying consideration and conversion separately could result in an underestimated conversion rate and thus a suboptimal advertising strategy, in particular when advertising on different channels reaches different audiences. Motivated by this, third, I specify and estimate a new integrated model of consideration and conversion, in which the effects of advertising are channel-specific.

Crucially, simultaneously modeling the individual decisions on visiting and purchasing in an integrated two-stage model allows me to distinguish the conversion rate between two groups: those who are motivated to visit by advertisements and those who usually visit. When the manager lays out a one-stage model of the purchasing decision only and estimates the conversion rate using only data on those who visit, what she obtains is the average conversion rate, that is, the average over those who visit the website anyway and those who visit because they see an ad. The rate that matters for the manager, however, is the one for those who visit the website because they see an ad, since the other group's visiting decision is not affected by advertisements. Only simultaneously modeling the two-stage decisions allows me to study the effects of ads on the conversion rate for the treatment group.
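
The difference between the average conversion rate and the conversion rate of ad-driven visitors can be made concrete with a small numerical sketch. The Python snippet below is only an illustration: the function name, the visitor counts and the two conversion rates are hypothetical assumptions chosen for exposition, not quantities taken from the data or from the model estimated in this chapter.

```python
def pooled_conversion_rate(n_regular, cr_regular, n_ad_driven, cr_ad_driven):
    """Return the conversion rate averaged over both visitor groups."""
    total_visits = n_regular + n_ad_driven
    total_sales = n_regular * cr_regular + n_ad_driven * cr_ad_driven
    return total_sales / total_visits


# Hypothetical inputs (assumptions for illustration, not estimates from the data).
n_regular, cr_regular = 900, 0.10      # visitors who come regardless of ads
n_ad_driven, cr_ad_driven = 100, 0.30  # visitors who come because they saw an ad

pooled = pooled_conversion_rate(n_regular, cr_regular, n_ad_driven, cr_ad_driven)

# Sales attributed to the ad when the ad-driven visits are valued at each rate.
sales_at_pooled_rate = n_ad_driven * pooled
sales_at_group_rate = n_ad_driven * cr_ad_driven

print(f"pooled conversion rate:  {pooled:.3f}")
print(f"ad sales at pooled rate: {sales_at_pooled_rate:.1f}")
print(f"ad sales at group rate:  {sales_at_group_rate:.1f}")
```

With these hypothetical inputs the pooled conversion rate is 0.12, whereas the ad-driven group converts at 0.30. Valuing the ad-driven visits at the pooled rate would therefore understate the sales generated by the advertisement, which is the sense in which a one-stage analysis can lead to an underestimated conversion rate.
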
In the model, consumers first decide whether or not to visit the website. This decision is driven by an option value. Importantly, this option value is allowed to depend on unobserved consumer characteristics. Therefore, unlike standard discrete choice models, the model allows consumers who visited the website to have a higher probability of buying than those who did not visit the website, and consequently it could generate the observed pattern in the data even if advertising had no direct effect on consumers once they visit the website. My estimates show that one would overestimate both the effects of advertising and the cost of visiting the website if one ignored this selection. Using the model structure and the obtained estimates, I demonstrate that shifting advertising across channels could lead to increased sales.

The empirical context of this article is the market of the Dutch State Lottery. Studying the effects of advertising in the context of lotteries is promising. A lottery ticket is a simple product whose attributes are generally known or easy to describe. In addition, the market for lottery tickets in the Netherlands is very concentrated. It is therefore not unreasonable to assume that the firm acts as a monopoly in this market, which allows me to abstract from supply-side complications. Moreover, the website is very simple. It essentially offers only one product. This means that consumers can obtain little value from window shopping and thus I can reasonably assume that if a consumer visits the website because of watching the ad, she seriously considers buying a ticket. This is in contrast to more general websites (e.g., Amazon) that offer many products and are thus still worth visiting because of the value obtained from window shopping, even without purchasing.

This study is most closely related to He and Klein (2018). There are two major differences. First, He and Klein (2018) answer the question of when the firm should advertise. For that purpose, they estimate a dynamic model that abstracts from targeting and in which advertising only affects consideration but not conversion. Here, I instead develop a static model of consumer choice and study on which channel the firm should air its advertisements. Second, I simultaneously study the effects of advertising on both consideration and conversion. The decomposition of online sales into the two sub-stages (visit and purchase given visit) is useful for marketers. The first stage is informative about how many consumers the advertisements can attract. The second stage answers the question of whether advertising can attract the right consumers.

Besides He and Klein (2018), this study is also related to several strands of the literature. The marketing literature has described consideration and conversion as different stages of the purchase funnel. Hoban and Bucklin (2015) show that display advertising positively affects visitation to the firm's website for users in most stages of the purchase funnel, but not for those who previously visited the site without creating an account. Lodish et al. (1995) show that TV advertising could increase sales, but not always. Sherman and Deighton (2001) show that banner advertising leads to more site visits. The authors also show that targeting can improve the click-through rate (CTR). Manchanda et al. (2006) show that banner advertising increases online purchases. Haans et al.
(2013) find that click-through rates are higher for advertisements involving expert evidence and statistical evidence than for those involving causal evidence, but the latter leads to a higher conversion rate. In the world of TV advertising, Kitts et al. (2014) show that TV advertising can increase the number of new visitors to a brand's website. More closely related, Liaukonyte et al. (2015) use a quasi-experimental design to show that TV advertising triggers website visitation and online shopping, and that the effect crucially depends on the content and media placement of the advertisement. In terms of comparing the relative effectiveness of advertising channels, Danaher and Dagger (2013) find that catalogs, television, and direct mail most strongly influence sales and profit, followed by radio and newspaper. The findings in my study are consistent with those in Tellis et al. (2000) and Chandy et al. (2001). The authors find that TV advertisements lead to more consumer telephone calls, but their effects dissipate very rapidly. My study contributes to this strand of literature by investigating the effects of ads on consideration and conversion simultaneously in an integrated dataset. This is possible since, in my data, website visits and online sales come from the same group of consumers.

From a modeling perspective, this study is related to the literature on the effect of advertising on choice sets. The way in which advertising affects consumer choice is related to Dubé et al. (2005), who model advertising as entering the flow utility of consumers non-linearly, with diminishing marginal returns to additional units of advertising. I generalize their model by allowing the effect of advertising to differ across channels. My study is also related to the search and choice set literature because visiting the website can be viewed as a proxy for including the product in the choice set. The literature has found different ways to address the challenge that choice sets of consumers are usually unobserved (to the researcher), which can be seen as a missing data problem. Bronnenberg and Vanhonacker (1996), Ackerberg (2001) and Albuquerque and Bronnenberg (2009), among others, use auxiliary information such as past purchases. Kim et al. (2010, 2016) treat the choice set as the result of a process of sequential search. In their model, the consumer will search an additional option if the marginal benefit of searching is larger than the marginal cost of searching. My study shares the same spirit as theirs. In my model, consumers will visit the website if and only if doing so is better than not visiting. Roberts and Lattin (1991) develop a two-stage model of consideration and choice. However, their model does not feature advertising. Sovinsky Goeree (2008) directly augments the product choice model with a model of choice set formation. In her model, the probability that a consumer considers a given brand is a function of her demographics and advertising. Since she cannot observe the choice set, she estimates the model using simulated choice sets. Draganska and Klapper (2011) combine micro-level survey data on brand awareness with demand and advertising data to estimate an aggregate discrete choice model. They use consumer survey data on brand awareness to construct the choice set. They find evidence that advertising has a direct effect on the probability of inclusion in the choice set in addition to its effects on consumers' preferences. Their paper is one of the very few cases where choice sets are observed by researchers. Somewhat differently, Clark et al. (2009) find that advertising has a significant positive effect on brand awareness but no significant effect on perceived quality.
One of the main topics of this article is targeted advertising. The previous literature has investigated this topic using descriptive approaches. Goldfarb and Tucker (2011a) use a large-scale field experiment to show that targeted online advertisements increase purchase intent. In a related study, Lambrecht and Tucker (2013) find that online targeted advertisements have a positive effect on consumers with narrowly construed preferences. From the supply side, Goldfarb and Tucker (2011b) show that advertisers are willing to pay more for targeted search advertisements. In addition to developing a model, my study contributes to this strand of the literature by showing how targeted offline advertising can help improve online sales.

The rest of this paper is structured as follows. Section 2.2 gives a brief overview of the background information for lottery tickets in the Netherlands. Section 2.3 describes the data and shows descriptive evidence. Section 2.4 shows additional descriptive evidence on the effect of advertising on visits and sales. Section 2.5 develops the model of lottery ticket demand with advertising effects. Section 2.6 presents the results. Section 2.7 performs counterfactual experiments for the supply side. Finally, Section 2.8 concludes.

2.2 The market for lottery tickets in the Netherlands

The market for lottery tickets in the Netherlands is very concentrated, with three organizations conducting different types of lotteries. First, the Stichting Exploitatie Nederlandse Staatsloterij, from which the data is received, offers lottery tickets for The Dutch State Lottery (in Dutch: Staatsloterij) and the Millions Game (Miljoenenspel). Staatsloterij has a history going back to the year 1726 and is run by the government. It is by far the biggest of its kind in the Netherlands. The second player is De Lotto. It offers the Lotto Game (Lottospel), which is comparable but much smaller in size, next to other games such as Eurojackpot, Scratch Tickets (Krasloten) and sports betting. In 2016, these two organizations merged. The third player is Nationale Goede Doelen Loterijen, offering a ZIP Code Lottery (Postcodeloterij), whose main purpose is to donate money to charity. For that reason, it is not directly comparable to the other two lotteries. The lottery run by Staatsloterij is classical. A ticket has a combination of numbers and Arabic letters and a consumer can choose some of them. The size of the prize then depends on how many numbers and letters of a ticket match the ones of the winning combination. On top of that, there is a jackpot whose size varies over time. For all draws but the very last one in a year, consumers can choose between a full ticket that costs 15 euros and multiples of one fifth of a ticket. For the last draw, the price of a ticket is 15 euros and consumers can buy multiples of one half of a ticket. Winning amounts are then scaled accordingly. The tickets can be purchased in two ways: they can either be purchased online via the official website of Staatsloterij, or offline, for example, in a supermarket or a gas station. About 80 percent of the sales are offline. There are 16 draws in a calendar year. 12 of them are regular draws and 4 of them are special draws. Regular draws take place on the 10th of every month.
The dates of 4 additional special draws vary slightly from year to year. In 2014 (the year for which we have data), the 4 special draws were on April 26 (King’s day in the Netherlands), on June 24, October 1 and on December 31 (the new year’s eve draw). All draws but the last in a year take place at 8pm (Central European Time). From 6pm onward, no more tickets can be bought for that draw.

2.3 Data and descriptive statistics

2.3.1 Overview

The data consists of 3 parts: the number of visits to the seller’s website, the number of online sales, and TV and radio advertising. All of them are measured at the minute level. The advertising data is measured in terms of Gross Rating Points (GRP’s), separately for each TV and radio channel. GRP’s measure impressions relative to the target population. More specifically, GRP’s are defined as the percentage of people that have been reached times the average frequency of the reach. For example, 5 GRP’s mean that 5 percent of the target population (in this case the general population) are reached by the ads. It can also mean that 2.5 percent of the target population is reached twice. This is a standard measure in the advertising industry. Measuring ads using GRP’s has an advantage over the traditional measure of advertising in terms of monetary expenditures, since it gives a precise measure of the percentage of the target population that has been reached. Related to my identification strategy, the GRP’s in the data are the actually delivered GRP’s, rather than the contracted GRP’s. I will return to this in Section 2.4.

Besides, I observe the jackpot size for the 12 regular draws in 2014. There is no information on jackpot size for the 4 special draws, as more involved rules apply to them. This makes it difficult to calculate an equivalent one-shot jackpot size. For example, on the drawing day, every 15 minutes consumers can win an additional 100,000 euros. In the empirical analysis, I will capture differences across draws in a flexible way. Throughout the paper, I am not allowed to report levels of visits, sales and advertising. Therefore, I will only present relative numbers and (semi-)elasticities in the tables and figures below, and some vertical axes will have no units of measurement.

Figure 2.1: Distribution of GRP’s across channels (upper panel: percentage of total TV GRP’s, by TV channel; lower panel: percentage of total radio GRP’s, by radio channel)

Notes: This figure shows the distribution of GRP’s across different channels. TV (radio) GRP’s are computed as percentage of total TV (radio) GRP’s.
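The definition of GRP’s given in the overview can be illustrated with a small calculation. The following is a minimal sketch; the function name `grps` and its arguments are hypothetical and only restate the definition (percentage of the target population reached times the average frequency of the reach).

```python
def grps(reach_percent, avg_frequency):
    """Gross Rating Points: percent of the target population reached
    times the average number of times each reached person saw the ad."""
    return reach_percent * avg_frequency


# The two examples from the text both correspond to 5 GRP's.
print(grps(5.0, 1.0))  # 5 percent of the population reached once
print(grps(2.5, 2.0))  # 2.5 percent of the population reached twice
```

The sketch makes explicit that GRP’s alone do not distinguish between broad reach and repeated exposure of a smaller audience.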

2.3.2 Descriptive evidence

Figure 2.1 shows the distribution of GRP’s across different channels. One can see that the firm diversifies its budget over a large number of channels and that there are many channels on which the firm spends only a few GRP’s. This motivates me to classify all channels into groups based on channel characteristics. I assign all channels to 5 groups: Group 1 consists of public and regional TV stations. Group 2 is commercial TV stations. Group 3 is the public radio stations. Group 4 consists of regional radio stations. Finally, group 5 is commercial radio stations.1 Two criteria are used in this classification. First, after classification, the audiences of different channel groups should differ in their interest in lottery tickets, while the audiences within one channel group should be similar. Second, after classification, the total GRP’s should not differ too much across groups. I do not have data on channel-specific viewership and thus cannot verify the first criterion. I verify the second criterion using Figure 2.2, which shows the share of GRP’s after the combination of channels. One can see that the firm spends most of the budget on commercial TV and radio channels.

1 The reason that I do not separate public and regional TV stations into 2 groups is that the GRP’s spent on regional TV stations are very few compared with other groups. I thus put public and regional TV stations into one group.

Figure 2.2: Share of GRP’s across channels (pie chart with segments National and Regional TV, Commercial TV, National Radio, Regional Radio, Commercial Radio)

Table 2.1 shows the mean and various percentiles of GRP’s for each channel group. These numbers are rescaled by the overall average GRP of an ad. One can see that the mean size of GRP for an ad is larger on TV and national radio channels than on the other two channel groups. This means that advertising on TV and national radio channels is more effective in reaching people.

Table 2.1: Rescaled percentiles for GRP’s at the minute level

channel                        5th    25th   50th   75th   95th   max     mean
national & local TV channel    0.07   0.28   0.63   1.74   6.74   41.74   1.67
commercial TV channel          0.07   0.35   0.92   1.81   4.86   29.79   1.46
national radio channel         0.21   0.63   1.25   2.71   4.58   14.72   1.74
local radio channel            0.07   0.07   0.07   0.21   1.67   5.76    0.35
commercial radio channel       0.07   0.21   0.49   1.04   2.85   6.25    0.83

Figure 2.3 shows GRP’s and visits at the minute level for one representative regular draw. First, notice that I disregard data for the first 3 days since the last draw. This is because the number of visits is extremely large in the first 3 days after the last draw. These site visits are generated by those who check online whether they have won in the last lottery. Clearly, those visits have nothing to do with advertising and thus are disregarded. In the reduced-form regressions and structural estimation, I also disregard the first 3 days of data for each draw. Second, we see that the firm only starts advertising on the 17th day after the last regular draw. Figure 2.4 shows GRP’s and sales at the minute level for the same draw. Unlike site visits, most of the sales occur during the last few days. Next, Figures 2.5 and 2.6 zoom in further and show the pattern for one of the days in Figures 2.3 and 2.4. In both figures, the lower part depicts the GRP’s. The higher the GRP spikes, the more people are reached. In particular, each color represents a different group of channels. It is interesting to notice that the raw data presented in Figures 2.5 and 2.6 already show some evidence of short-run site visits and sales responding to advertising. For example, there are some spikes of GRP’s just before 20:50, followed by spikes of visits and sales several minutes later.

Finally, an interesting question is whether, given that consumers have visited the website, advertising could increase their probability of buying a ticket. That is, whether advertising could convert website traffic into sales. The variable of interest here is the probability of buying a ticket conditional on visiting. At first glance, it is tempting to measure this conditional probability by the number of sales over the number of visits at each minute. However, taking a closer look at Figure 2.3 and 2.4, one notices that the number of visits always responds to advertising faster than the number of sales. This is especially the case for the spikes around 22:13. In other words, there is a delay in the sales response after advertising compared to visits. Moreover, such a delay is “random”, which makes the ratio of sales over visits within the same minute meaningless. In the following section, I investigate the effect of advertising on the conversion rate, taking into account the delay between the number of visits and sales.

Figure 2.3: GRP’s and visits at the minute-level for a regular draw

Figure 2.4: GRP’s and sales at the minute-level for a regular draw

Figure 2.5: GRP’s and visits at the minute-level for a short time window

Figure 2.6: GRP’s and sales at the minute-level for a short time window

2.4 Effect of advertising

Motivated by the descriptive statistics, I now characterize the short-term effects of advertising more systematically using regressions. I control for draw, time of the day and days until the draw fixed effects. The effect of advertising in this section has a causal interpretation. The establishment of causality relies on the industry practice called “make-goods”, as explained in Dubé et al. (2005). The idea is that there is a difference between the contracted GRP’s and the actually delivered GRP’s. Although it is possible that the firm strategically chooses its desired GRP levels, which makes contracted GRP’s endogenous, the actually delivered GRP’s are random after controlling for draw, time of the day and days until the draw fixed effects. The intuition is that at a given minute in time, the instantaneous viewing rate for a particular show on a particular channel is random. There is a related approach by Liaukonyte et al. (2015), who reconstruct the baseline at the level of ads and then attribute the systematic differences between the pre- and post-ad windows to the ad insertion. The difference between their approach and mine is that I control for baseline effects using regression, while Liaukonyte et al. (2015) reconstruct the baseline at the level of ads (like a matching estimator).

2.4.1 Effect of advertising on visits and sales

Throughout, I use a common regression framework: the distributed lag model. A distributed lag model is a model in which I regress variables of interest on lagged amounts of advertising. A nice feature of the distributed lag model is that it imposes little structure. I control for the draw, time of the day and days until the draw fixed effects. More precisely, I specify

\[
y_t = \beta_0 + \underbrace{\sum_{i=1}^{N} \beta_{1i} \cdot grp_{t-i+1}}_{\text{lags of GRP's}} + \underbrace{x_t' \beta_2}_{\text{time and draw dummies}} + \varepsilon_t, \tag{2.1}
\]

where $x_t$ is a vector of dummy variables including draw, hour of the day and days until the draw dummies. The dependent variable $y_t$ is the log of one plus the number of visits (sales). Notice that I do not distinguish GRP’s from different channels in this specification. That is, I treat every unit of GRP’s the same. The main interest here is to measure the effect of advertising, no matter which channel it comes from, on the number of visits and the number of sales. Table 2.2 summarizes the result. Column (1) shows the effect of advertising on visits. The main effect is observed in the first hour, but there are effects thereafter. The maximal effect is an increase in the number of visits of about 2.9 percent for each additional GRP of advertising, between 5 and 9 minutes after the advertisement was aired. Next, Column (2) reports the effect of advertising on sales. Compared with column (1), I find that the maximal effect is an increase in sales of about 3.8 percent for each additional GRP of advertising, between 10 and 14 minutes after the advertisement was aired. Notice that the regressions show evidence of the delay in the sales response: the maximal effect on sales is between 10 and 14 minutes after the advertisement was aired, whereas it is between 5 and 9 minutes after the advertisement was aired for the number of visits. This means that, on average, there is a delay of 5 minutes between visits and sales. To summarize, I find that advertising has a significant positive effect on both site visits and online sales. The effect of advertising on sales has an average delay of 5 minutes compared to visits.

Table 2.2: The effect of advertising

                                   (1)                      (2)
                                   log(1+visits)            log(1+sales)
GRP between 0 and 4 minutes ago    0.0241***  (0.000918)    0.0167***   (0.00106)
5 and 9 minutes                    0.0286***  (0.000832)    0.0352***   (0.00106)
10 and 14 minutes                  0.0106***  (0.000650)    0.0382***   (0.000923)
15 and 19 minutes                  0.00931*** (0.000661)    0.0286***   (0.000966)
20 and 24 minutes                  0.00918*** (0.000682)    0.0239***   (0.000969)
25 and 29 minutes                  0.00908*** (0.000708)    0.0209***   (0.00105)
0.5 and 1 hour                     0.00727*** (0.000295)    0.0164***   (0.000420)
1 and 1.5 hours                    0.00635*** (0.000297)    0.0111***   (0.000413)
1.5 and 2 hours                    0.00479*** (0.000292)    0.00871***  (0.000423)
2 and 2.5 hours                    0.00345*** (0.000301)    0.00310***  (0.000366)
2.5 and 3 hours                    0.000884** (0.000297)    -0.000795*  (0.000349)
3 and 3.5 hours                    -0.000898** (0.000287)   -0.00565*** (0.000322)
3.5 and 4 hours                    -0.00499*** (0.000286)   -0.00940*** (0.000322)
draw dummies                       Yes                      Yes
days to draw dummies               Yes                      Yes
hour dummies                       Yes                      Yes
Observations                       441223                   441223
R2                                 0.841                    0.655

Standard errors in parentheses. * p < 0.05, ** p < 0.01, *** p < 0.001

Notes: This table shows the results of regressions of the log of one plus sales/visits on GRP’s of advertising and lags thereof. Regressions were carried out at the minute level and standard errors are robust to heteroskedasticity.

Figure 2.7: Effect of a 10-GRP advertisement on site visits and online sales (left panel: visits relative to the baseline before the advertisement; right panel: sales relative to the baseline before the advertisement; horizontal axes: minutes relative to the time of the advertisement, from 0 to 250)
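The lagged advertising regressors behind Table 2.2 can be sketched as follows. This is a minimal illustration, not the estimation code used for the table: the helper names `binned_lag_grp`, `regressor_row` and `percent_effect` are hypothetical, and I assume that each bin (for example “5 and 9 minutes”) is the sum of the GRP’s aired in that window, which the text does not state explicitly.

```python
import math


def binned_lag_grp(grp, t, lo, hi):
    """Sum of GRP's aired between `lo` and `hi` minutes before minute t
    (inclusive). Assumption: bins are sums, not averages."""
    total = 0.0
    for lag in range(lo, hi + 1):
        if t - lag >= 0:  # ignore lags that reach before the sample start
            total += grp[t - lag]
    return total


def regressor_row(grp, t, bins):
    """One row of lagged-advertising regressors for minute t."""
    return [binned_lag_grp(grp, t, lo, hi) for lo, hi in bins]


def percent_effect(coef):
    """Percent change implied by a coefficient in a log-outcome regression."""
    return 100.0 * (math.exp(coef) - 1.0)


# Toy minute-level GRP series (made-up numbers): two ads, at minutes 3 and 10.
grp = [0.0] * 20
grp[3] = 2.0
grp[10] = 1.5
bins = [(0, 4), (5, 9), (10, 14)]

print(regressor_row(grp, 12, bins))
print(round(percent_effect(0.0286), 1))  # coefficient for visits, 5 to 9 minutes
```

Stacking such rows over all minutes, together with the draw, hour and days-to-draw dummies, would give the design matrix of a regression in the spirit of (2.1).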

2.4.2 Effect of advertising on online conversion rate

As documented in the previous subsection, there is a “random” delay between the time when consumers visit the website and the time when they actually make a purchase. This makes regressing the sales-visits ratio on the distributed lags of GRP’s meaningless. To measure the effect of advertising on the online conversion rate, ideally one needs stand-alone advertisements with no advertising before and after them. This is crucial since the effects of advertisements would otherwise overlap with each other in the presence of multiple advertisements. Unfortunately, I do not have many stand-alone advertisements in the data.2 Most of the advertisements stand close to each other, as can be seen from Figure 2.5. Thus, I measure the effect of advertising on the online conversion rate in a counterfactual setting using the estimates in Table 2.2. More specifically, I first compute the effect of a stand-alone advertisement of 10 GRP’s on the number of visits. The interpretation of this number is the total number of extra visits due to the GRP’s compared to baseline visits (without GRP’s). On the left side of Figure 2.7, this is the total area under the impulse response curve. Then, I compute a similar number for sales, which is depicted on the right side of Figure 2.7. The effect of advertising on conditional sales is then measured by the ratio of the two. Comparing this conversion rate with the average conversion rate over all periods, I find that the conversion rate is higher than that without advertising.3

As an alternative approach, I characterize the effect of ads on the conversion rate using the same regression as in (2.1), with $y_t$ being the conversion rate. To overcome the aforementioned delay problem, I aggregate the data to the hourly level. The underlying idea is that the delay problem is eliminated after aggregation. As can be seen from Table 2.3, advertising increases the conversion rate in the present and one hour in the future. The maximal effect is an increase of about 0.06 percentage points for each additional GRP. Again, this difference in conversion rate is between those who are motivated to visit the website through advertisements and those who usually visit.

2 Regression with only stand-alone advertising would be subject to a serious sample selection issue.

Table 2.3: The effect of advertising on conversion rate

                         (1)
                         conversion rate
GRP the current hour     0.0237**   (0.00898)
GRP one hour ago         0.0624***  (0.0101)
GRP two hours ago        0.0108     (0.00909)
GRP three hours ago      0.0157     (0.00836)
GRP four hours ago       -0.00415   (0.00738)
draw dummies             Yes
days to draw dummies     Yes
hour dummies             Yes
Observations             7531
R2                       0.773

Standard errors in parentheses. * p < 0.05, ** p < 0.01, *** p < 0.001

Notes: This table shows the results of regressions of the conversion rate on GRP’s of advertising and lags thereof. Regressions were carried out at the hourly level and standard errors are robust to heteroskedasticity.
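The counterfactual comparison of areas under the two impulse response curves can be approximated from the published coefficients. The sketch below is a back-of-the-envelope illustration under stated assumptions, not the author’s computation: it treats each coefficient as a constant effect on the log outcome within its bin, ignores the “one plus” inside the logarithm, and expresses areas in “baseline-minutes” because levels are confidential. The names `TABLE_2_2` and `cumulative_relative_effect` are hypothetical.

```python
import math

# (bin length in minutes, visits coefficient, sales coefficient),
# transcribed from Table 2.2.
TABLE_2_2 = [
    (5, 0.0241, 0.0167), (5, 0.0286, 0.0352), (5, 0.0106, 0.0382),
    (5, 0.00931, 0.0286), (5, 0.00918, 0.0239), (5, 0.00908, 0.0209),
    (30, 0.00727, 0.0164), (30, 0.00635, 0.0111), (30, 0.00479, 0.00871),
    (30, 0.00345, 0.00310), (30, 0.000884, -0.000795),
    (30, -0.000898, -0.00565), (30, -0.00499, -0.00940),
]


def cumulative_relative_effect(grp, column):
    """Area under the relative impulse response of a stand-alone ad of
    size `grp`, in baseline-minutes. column 0 = visits, column 1 = sales."""
    return sum(length * (math.exp(grp * coefs[column]) - 1.0)
               for length, *coefs in TABLE_2_2)


extra_visits = cumulative_relative_effect(10, 0)
extra_sales = cumulative_relative_effect(10, 1)

# Conversion rate of ad-induced visits relative to the baseline conversion
# rate: (extra sales / baseline sales) / (extra visits / baseline visits).
relative_conversion = extra_sales / extra_visits
print(relative_conversion > 1.0)
```

Under these assumptions the ratio exceeds one, which is in line with the finding in the text that the conversion rate after advertising is higher than the average conversion rate.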

2.4.3 Effect of advertising across channels

To study the heterogeneous effect of advertising across channels, I extend the distributed lag model in (2.1) with interaction terms of channel dummies with GRP’s and their lags, while controlling for the same set of dummy variables. More precisely, I estimate

3 I do not report the overall conversion rate because of confidentiality.

\[
y_t = \beta_0 + \underbrace{\sum_{i=1}^{N} \sum_{j=1}^{5} \beta_{1ji} \cdot grp_{j,t-i+1}}_{\text{interaction between GRP's and channels}} + \underbrace{x_t' \beta_2}_{\text{time and draw dummies}} + \varepsilon_t, \tag{2.2}
\]

where $grp_{jt}$ is GRP’s from channel group $j$ at time $t$. Figure 2.8 summarizes the result. First, channel group 5 (commercial radio channels) is the most effective in attracting both online traffic and online sales. However, channel group 3 (national radio channels) has the highest conversion rate. One can find the full table in Appendix 2.A.

Figure 2.8: Effect of a 5-GRP advertisement on site visits and online sales for different channels (bars: predicted increase in visits and predicted increase in sales, by channel group: National & Regional TV, Commercial TV, National Radio, Regional Radio, Commercial Radio)

Notes: Group 1=National & Local TV; 2=Commercial TV; 3=National Radio; 4=Local Radio; 5=Commercial Radio.

2.5 A model of lottery ticket demand

2.5.1 Motivation of estimating a model

There are three reasons that motivate me to estimate a model. First, as the descriptive evidence suggests, the online conversion rate increases after ads. This could be due to the fact that those who are motivated to visit the website through advertisements are different from those who usually visit. The former have a higher probability of buying given that they visit. Spelling out a two-stage model with a correlated structure allows me to capture this selection. Therefore, the effects of ads on the conversion rate that the model captures are the difference between the two groups of visitors. Second, I can use the model to rigorously describe the idea that the firm faces a tradeoff: advertising on the more effective channels has a higher return, but at the same time the marginal return of an additional unit of advertising is diminishing. Last but not least, once the structural parameters are estimated, I can evaluate various counterfactual targeted advertising strategies.

2.5.2 General structure

There are 5 channel groups. Within each channel group j = 1,2,...,5, consumers are homogeneous in observed characteristics but heterogeneous in their unobserved (to the econometrician) taste shocks. Moreover, consumers are assumed to watch one channel group. Consumers differ across channel groups in two ways. First, they differ in how much advertising has reached them. This can be seen directly from the data since each channel group has different GRP’s. Second, for a given amount of advertising, individuals from different groups react differently to advertisements. There are N consumers who maximize expected discounted utility. Each consumer i comes from one of the 5 channel groups. Time t = 1,2,...,T is discrete and finite and measured at the hourly level. T is the last hour of the draw. In every hour, each individual has to make two sequential decisions. She first decides whether or not to visit the website. If she does, then she pays the cost of visiting the website (e.g., the time cost of opening the website on a computer or smartphone). Otherwise, she receives the utility of the outside option, continues to the next period and has the option of visiting the website then. After the individual has visited the website, she decides whether or not to purchase a lottery ticket. If she does, then she receives a one-off flow of utility. Otherwise, she receives the utility of the outside option and continues to the next period.

2.5.3 Advertising

The flow utility from visiting the website and buying a ticket (described below) is modeled to depend on the advertising goodwill stock. Loosely speaking, the advertising goodwill stock summarizes how many ads the consumer has been exposed to, and it increases if the individual is exposed to an advertisement. The size of the increase depends on GRP’s. Moreover, the goodwill stock depreciates over time.

Figure 2.9: Model summary (decision tree). Visit or not: if no, the consumer receives $u_{ijt0} = \varepsilon_{ijt0}$; if yes, she receives $u_{ijt1} = -c + \Gamma_1(g_{jt}) + E[\max\{u_{ijt10}, u_{ijt11}\} \mid \varepsilon_{ijt0}]$ and decides whether to buy. Buy or not: if no, $u_{ijt10} = \varepsilon_{ijt1}$; if yes, $u_{ijt11} = -p + \delta^{T-t}\psi + \Gamma_2(g_{jt}) + \varepsilon_{ijt2}$.

More specifically, let the goodwill stock on channel j at the beginning of period t be denoted by $g_{jt}$. The firm has the opportunity to increase the goodwill stock by purchasing GRP’s. The goodwill stock will depreciate at the beginning of the next period. Let $\lambda$ denote the depreciation rate and assume that the initial goodwill stock is 0. The law of motion for the goodwill stock is

\[
g_{jt} = \lambda g_{jt-1} + GRP_{jt}
\]

and

\[
g_{j0} = 0 \quad \forall j.
\]

The specification is similar to that in Dubé et al. (2005). There are two differences. First, unlike Dubé et al. (2005), who model the GRP’s to enter the goodwill stock non-linearly, I specify a linear goodwill stock production function. Second, I allow the goodwill stock to be different for each channel group j.
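The law of motion for the goodwill stock is a simple linear recursion and can be traced out directly. The following is a minimal sketch for one channel group; the function name `goodwill_path` and the numbers are hypothetical, and `lam` plays the role of the parameter that multiplies last period’s stock in the law of motion.

```python
def goodwill_path(grp_series, lam, g0=0.0):
    """Goodwill stock g_t = lam * g_{t-1} + GRP_t for one channel group,
    starting from the initial stock g0 (zero in the text)."""
    path = []
    g = g0
    for grp in grp_series:
        g = lam * g + grp
        path.append(g)
    return path


# Made-up hourly GRP's: an ad in hour 1, nothing for two hours, then another ad.
print(goodwill_path([4.0, 0.0, 0.0, 2.0], lam=0.5))  # [4.0, 2.0, 1.0, 2.5]
```

In the example, the stock built up by the first ad decays geometrically until the second ad adds to what is left.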

2.5.4 Consumers

Consumer i from channel group j at time t decides whether or not to visit the website of lottery tickets. Visiting the website yields flow utility

\[
u_{ijt1} = \underbrace{-c}_{\text{cost of visiting the website}} + \underbrace{\Gamma_1(g_{jt})}_{\text{effect of ads on visiting}} + \underbrace{I_{ijt}}_{\text{option value}}, \tag{2.3}
\]

where $c$ is the cost of visiting the website, $g_{jt}$ is the advertising goodwill stock and $I_{ijt}$ is the option value of purchasing the lottery ticket, which will be explained later. As in Dubé et al. (2005), $g_{jt}$ enters the consumer’s flow utility non-linearly. In particular, I specify

\[
\Gamma_1(g_{jt}) = \gamma_{j1} \log(1 + \gamma_3 \cdot g_{jt}). \tag{2.4}
\]

The functional form of $\Gamma_1(\cdot)$ is motivated as follows. In particular, $g_{jt}$ measures to what extent the consumers from channel j have been exposed to ads. The coefficient $\gamma_{j1}$ captures that consumers from different groups react differently to ads. Notice that the log functional form of $\Gamma_1(\cdot)$ implies diminishing marginal returns of advertising at a given point in time. This means that the firm faces a tradeoff between a larger value of $\gamma_{j1}$ and the diminishing marginal return of the goodwill stock implied by the log functional form of $\Gamma_1(\cdot)$. The parameter in front of the goodwill stock, $\gamma_3$, affects the curvature of the log function. The larger this parameter, the stronger the diminishing marginal return of an additional unit of goodwill stock. If a consumer chooses not to visit the website, she receives the utility of the outside option, which is normalized to 0, plus an unobserved taste shock $\varepsilon_{ijt0}$:

\[
u_{ijt0} = 0 + \varepsilon_{ijt0}.
\]
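The diminishing marginal return implied by the log specification of the advertising effect can be made explicit by differentiating it with respect to the goodwill stock. The short derivation below is a sketch that assumes $\gamma_{j1} > 0$, $\gamma_3 > 0$ and $g_{jt} \geq 0$; these sign restrictions are assumptions for the illustration and are not stated in the text.

```latex
\begin{align*}
\Gamma_1(g_{jt}) &= \gamma_{j1} \log(1 + \gamma_3 \, g_{jt}), \\
\frac{\partial \Gamma_1}{\partial g_{jt}} &= \frac{\gamma_{j1} \gamma_3}{1 + \gamma_3 \, g_{jt}} > 0, \\
\frac{\partial^2 \Gamma_1}{\partial g_{jt}^2} &= -\frac{\gamma_{j1} \gamma_3^2}{(1 + \gamma_3 \, g_{jt})^2} < 0.
\end{align*}
```

The marginal return at a stock of $g_{jt}$ relative to the marginal return at a stock of zero is $1 / (1 + \gamma_3 g_{jt})$, so a larger $\gamma_3$ makes the return to an additional unit of goodwill fall off faster, while $\gamma_{j1}$ scales the effect for channel group j.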

Visiting the website gives the consumer the option to buy a ticket. First, suppose that she chooses not to buy a ticket after visiting the website. She then again receives the flow utility of the outside option

\[
u_{ijt10} = \varepsilon_{ijt1}.
\]

Now suppose that she instead purchases a ticket. She then receives flow utility

\[
u_{ijt11} = -p + \delta^{T-t} \psi + \Gamma_2(g_{jt}) + \varepsilon_{ijt2},
\]

where $p$ is the price of the ticket, $\delta$ is the hourly discount factor and $\psi$ is the value of holding a ticket at the time of the draw. Notice that $\psi$ is fixed for each draw. This is because, as a rule of the Staatsloterij, the size of the jackpot for each draw is fixed from the beginning and is known to everyone. Moreover, there is no extra information on how many tickets have been sold during the entire period of the draw. This is in contrast to, e.g., a football lottery, in which case the size of the jackpot changes over time. Therefore, the effect of jackpot size is captured by draw fixed effects. As in the case of the flow utility of visiting the website, $g_{jt}$ enters the consumer’s flow utility non-linearly:

\[
\Gamma_2(g_{jt}) = \gamma_{j2} \log(1 + \gamma_3 \cdot g_{jt}).
\]

Notice that I allow for different effects of advertising on the flow utility of visiting and of buying given visiting. The taste shocks in the flow utilities, $\varepsilon_{ijt0}$, $\varepsilon_{ijt1}$ and $\varepsilon_{ijt2}$, are assumed to be jointly normally distributed:

\[
\{\varepsilon_{ijt0}, \varepsilon_{ijt1}, \varepsilon_{ijt2}\} \sim N(0, \Sigma)
\]

with

\[
\Sigma = \begin{bmatrix} 1 & \sigma_{01} & 0 \\ \sigma_{10} & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.
\]

The covariance between $\varepsilon_{ijt0}$ and $\varepsilon_{ijt1}$, $\sigma_{01}$, is what makes this model non-standard: it allows the taste shock in the visiting stage to be correlated with that in the purchasing stage.4 From the consumer’s perspective, this means that the taste shock of the outside option at the visiting stage is informative about that at the purchasing stage. From the modeling perspective, those who enter the purchasing stage decision are selected by their taste shock. It will become clear later that only those with a small enough $\varepsilon_{ijt0}$ choose to visit the website. Consequently, those who visited the website have a large probability of purchasing given visiting, which, in turn, generates the spikes observed in sales data. To see this more formally, consider the option value of visiting the website, $I_{ijt}$. By definition, it is the expected value of the maximum of the flow utilities of buying a ticket and not buying one:

\[
I_{ijt} = E_{\varepsilon_{ijt1}, \varepsilon_{ijt2}} \left[ \max\{u_{ijt10}, u_{ijt11}\} \mid \varepsilon_{ijt0}, g_{jt}, T - t \right]. \tag{2.5}
\]

Unlike in standard models, where the expectation operator is taken over the taste shocks unconditionally, here, because of the positive $\sigma_{01}$, the expectation is taken conditional on the visiting stage shock $\varepsilon_{ijt0}$. In other words, the option value, $I_{ijt}$, is a function of $\varepsilon_{ijt0}$. The main motivation for the distributional assumption is computational: it can be shown that if the taste shocks are type-I extreme value distributed, then the conditional expectation has no closed form solution.5 The option value (2.5) has an economic interpretation: the consumer takes into account the likelihood of buying a ticket when she decides whether or not to visit the website. If she knows for sure that she would not purchase a lottery ticket, then there is also little reason for her to visit the website. Put differently, those who are motivated to visit the website through advertisements are different from those who usually visit: they have a higher probability of buying given that they visit. In the model, this fact is captured by the correlation between $\varepsilon_{ijt0}$ and $\varepsilon_{ijt1}$ through the option value $I_{ijt}$. Only those consumers that belong to the set $\{\varepsilon_{ijt0} \mid u_{ijt1} > u_{ijt0}\}$ will visit the website. The key elements of the full model are summarized in Figure 2.9.

2.5.5 Discussion

Having spelled out the model, I give a short discussion of what distinguishes the model from the literature and why the distinction is important. The literature has modeled the effects of advertising in two different ways. Advertising can affect the consideration stage (Sovinsky Goeree, 2008) or it can have an impact on the purchase stage (Draganska and Klapper, 2011). In this paper, I choose to model advertising as affecting the consumer decision at both the consideration stage and the purchase stage, captured by $\Gamma_1(\cdot)$ and $\Gamma_2(\cdot)$. What distinguishes the model from the literature is that, on top of the effects of advertising, there is a selection mechanism captured by the correlated error terms: those who visit the website because of an ad are also more likely to purchase compared to those who do not visit the website. This is motivated by the empirical fact that the response rate for lottery tickets is low: most consumers will ignore the advertisement when they see it. Put differently, those who respond to the advertisement are modeled to have a higher probability of purchasing given that they visit. Accounting for the selection mechanism when one estimates the effects of advertising on conversion is particularly important for products that have a low response rate, such as lottery tickets. One would otherwise attribute the observed increase in the conversion rate fully to the effects of advertising, which may result in an over-estimation. This means that the estimated effects of advertising in this paper are conservative. Moreover, the effectiveness of advertising is channel specific. This is the source of variation in the effectiveness across channels.

4 Notice that since I normalize the variance of the taste shocks to 1, the covariance $\sigma_{01}$ becomes the correlation coefficient and thus it can never be larger than 1 or smaller than -1.

5 If $\varepsilon_{ijt1}$ and $\varepsilon_{ijt2}$ are drawn from the type-I extreme value distribution and $I_{ijt}$ is taken unconditionally over $\varepsilon_{ijt0}$, then it has the familiar log sum closed form solution.

2.5.6 Solving the model

I now describe how to solve the model for given values of the parameters, which I then vary in the outer loop of the estimation procedure. In the following, I discuss the solution of the model in backward order. That is, I first discuss the solution of the model in the purchase stage, given that the consumer has visited the website. I then discuss the decision that the consumer faces in the visiting stage. The key insight for the purchasing stage decision is that, conditional on having visited the website, the decision between buying and not buying a ticket is a binary probit choice. To see this, notice that it follows from a standard result for the multivariate normal distribution that, conditional on $\varepsilon_{ijt0}$, the joint distribution of $\{\varepsilon_{ijt1}, \varepsilon_{ijt2}\}$ is also (bivariate) normal:

\[
\{\varepsilon_{ijt1}, \varepsilon_{ijt2}\} \mid \varepsilon_{ijt0} \sim N\left( \begin{bmatrix} \sigma_{01} \varepsilon_{ijt0} \\ 0 \end{bmatrix}, \Sigma_{12} \right),
\]

with

\[
\Sigma_{12} = \begin{bmatrix} 1 - \sigma_{01}^2 & 0 \\ 0 & 1 \end{bmatrix}.
\]

Also note that, conditional on $\varepsilon_{ijt0}$, $\Sigma_{12}$ implies that $\varepsilon_{ijt1}$ is independent of $\varepsilon_{ijt2}$. Joint normality of the error terms means that the purchasing stage decision is a standard binary probit model with uncorrelated error terms. This result dramatically reduces the computational burden of the model since it implies that, conditional on $\varepsilon_{ijt0}$, the choice stage decision can be solved without simulating an integral. To see this, define $V_0 \equiv 0$ and $V_1 \equiv -p + \delta^{T-t} \psi + \Gamma_2(\cdot)$ so that $u_{ijt10} = V_0 + \varepsilon_{ijt1}$ and $u_{ijt11} = V_1 + \varepsilon_{ijt2}$. Let $\tilde{u}_{ijt} \equiv V_1 - V_0$. The consumer buys if and only if $\varepsilon_{ijt1} - \varepsilon_{ijt2} < \tilde{u}_{ijt}$, so the probability of buying a ticket, given having visited the website and $\varepsilon_{ijt0}$, is given by

\[
P(\text{buy} \mid \text{visit}, \varepsilon_{ijt0}) = \tilde{\Phi}(\tilde{u}_{ijt}), \tag{2.6}
\]

where $\tilde{\Phi}(\cdot)$ is the cdf of the normal distribution with mean $\sigma_{01} \varepsilon_{ijt0}$ and variance $2 - \sigma_{01}^2$.
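The conditional purchase probability in (2.6) only requires a normal cdf and can be evaluated without simulation. The following is a minimal sketch using the standard library; the function names `normal_cdf` and `prob_buy_given_visit` and all parameter values are hypothetical and chosen only for illustration.

```python
import math


def normal_cdf(x, mean=0.0, var=1.0):
    """cdf of a normal distribution with the given mean and variance."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))


def prob_buy_given_visit(v1, eps0, sigma01, v0=0.0):
    """P(buy | visit, eps0) as in (2.6): a normal cdf with mean
    sigma01 * eps0 and variance 2 - sigma01**2, evaluated at v1 - v0."""
    u_tilde = v1 - v0
    return normal_cdf(u_tilde, mean=sigma01 * eps0, var=2.0 - sigma01 ** 2)


# Illustrative values: with a positive correlation, a more negative
# visiting-stage shock raises the probability of buying.
p_low_shock = prob_buy_given_visit(v1=-0.5, eps0=-1.0, sigma01=0.5)
p_high_shock = prob_buy_given_visit(v1=-0.5, eps0=1.0, sigma01=0.5)
print(p_low_shock > p_high_shock)  # True
```

With `sigma01` set to zero the visiting-stage shock drops out, and the expression reduces to an ordinary probit probability with error variance 2.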

One can see more clearly from (2.6) how the selection procedure is embedded in the model. The smaller the outside option value is (the more negative $\varepsilon_{ijt0}$ is), the lower the mean of $\tilde{F}(\cdot)$ and hence the higher $P(\text{buy} \mid \text{visit}, \varepsilon_{ijt0})$ is. Moreover, the larger $\sigma_{01}$ is, the smaller the variance is. It is this correlated structure that generates the spikes observed in sales. The correlated structure has an economic interpretation in terms of the role of advertising in the consumer's decision process: consumers visit the website because of the advertisements. Moreover, since they take into account the possibility of buying, they are more likely to buy given that they visit.

Having discussed how to solve the purchase stage decision, I now turn to the upper layer of the model, the visiting stage decision. The key challenge in this stage is to evaluate the option value term, $I_{ijt}$, in (2.5). Notice that $I_{ijt} = E[\max\{V_0 + \varepsilon_{ijt1}, V_1 + \varepsilon_{ijt2}\} \mid \varepsilon_{ijt0}, g_{jt}, T - t]$, where $V_0$ and $V_1$ are defined above. It follows immediately from the independence of $\varepsilon_{ijt1}$ and $\varepsilon_{ijt2}$ that $V_0 + \varepsilon_{ijt1} \mid \varepsilon_{ijt0} \sim N(V_0 + \sigma_{01}\varepsilon_{ijt0}, 1 - \sigma_{01}^2)$ and $V_1 + \varepsilon_{ijt2} \sim N(V_1, 1)$. Using a result from Nadarajah and Kotz (2008), it follows that

$$
\begin{aligned}
I_{ijt} = {} & (V_0 + \sigma_{01}\varepsilon_{ijt0})\,\Phi\!\left(\frac{V_0 - V_1 + \sigma_{01}\varepsilon_{ijt0}}{\sqrt{2 - \sigma_{01}^2}}\right) \\
& + V_1\,\Phi\!\left(\frac{V_1 - V_0 - \sigma_{01}\varepsilon_{ijt0}}{\sqrt{2 - \sigma_{01}^2}}\right) \\
& + \sqrt{2 - \sigma_{01}^2}\;\phi\!\left(\frac{V_0 - V_1 + \sigma_{01}\varepsilon_{ijt0}}{\sqrt{2 - \sigma_{01}^2}}\right),
\end{aligned} \tag{2.7}
$$

where $\phi(\cdot)$ and $\Phi(\cdot)$ denote, respectively, the probability density function (pdf) and the cumulative distribution function (cdf) of the standard normal distribution. (2.7) has a structure that is easy to interpret. Indeed, if one ignores the last term for the moment, the option value is the weighted average of the two expected flow utilities in the purchasing stage, with the cdf values being the weights.

Figure 2.10 gives a graphical illustration of the option value. The red line is the 45-degree line and the blue line represents $u_{ijt1}$, which is the option value plus a scalar utility shifter, $-c + \Gamma_1(\cdot)$. The individual will pay a visit to the website if $u_{ijt1} > \varepsilon_{ijt0}$. Notice that the flat part of the blue curve corresponds to the utility of visiting and buying, $u_{ijt11}$. The positively sloped part of the blue curve corresponds to the utility of visiting but not buying, which is a function of $\varepsilon_{ijt0}$. The intuition is as follows. If the individual has a small $\varepsilon_{ijt0}$ draw (the taste shock when not visiting), then, because of the positive correlation, she expects to also get a small draw of $\varepsilon_{ijt1}$ (the taste shock when visiting but not buying). Thus, she expects that $u_{ijt10} = \varepsilon_{ijt1} < u_{ijt11} = -p + \delta^{T-t}\psi + \Gamma_2(g_{jt}) + \varepsilon_{ijt2}$, and the option value equals the flow utility of visiting and buying, which is the flat part of the blue curve. Conversely, if the individual has a large $\varepsilon_{ijt0}$ draw, she expects to also get a large draw of $\varepsilon_{ijt1}$. Consequently, she expects that $u_{ijt10} = \varepsilon_{ijt1} > u_{ijt11} = -p + \delta^{T-t}\psi + \Gamma_2(g_{jt}) + \varepsilon_{ijt2}$, and the option value equals the flow utility of visiting but not buying, which is the positively sloped part of the blue curve.

[Figure 2.10: Option value. Legend: inclusive value, epsilon0. Horizontal axis: epsilon0, from -3 to 3. Vertical axis: inclusive value.]
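The closed form in (2.7) is the expected maximum of two independent normal variables, one with mean $V_0 + \sigma_{01}\varepsilon_{ijt0}$ and variance $1 - \sigma_{01}^2$, the other with mean $V_1$ and variance 1. The sketch below evaluates it with the standard library only. The function name and the numerical inputs are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for Phi and phi


def option_value(v0: float, v1: float, eps0: float, sigma01: float) -> float:
    """Expected maximum of V0 + eps1 and V1 + eps2 given eps0, as in (2.7)."""
    mu0 = v0 + sigma01 * eps0           # conditional mean of the no-buy utility
    theta = sqrt(2.0 - sigma01 ** 2)    # std. dev. of the utility difference
    z = (mu0 - v1) / theta
    return (mu0 * STD_NORMAL.cdf(z)
            + v1 * STD_NORMAL.cdf(-z)
            + theta * STD_NORMAL.pdf(z))


# Illustrative values only: the option value is flat (close to V1) for very
# negative eps0 and rises with slope at most sigma01 for large eps0.
flat_part = option_value(0.0, -1.0, -50.0, 0.24)
sloped_part = option_value(0.0, -1.0, 50.0, 0.24)
```

Because the slope of the option value in $\varepsilon_{ijt0}$ is bounded by $\sigma_{01} < 1$, the curve can cross the 45-degree line only once, which matches the single-crossing shape described for Figure 2.10.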

Figure 2.10 implies that an individual will only visit the website when she has a small draw of $\varepsilon_{ijt0}$. Denoting the threshold point below which she visits by $\varepsilon_{t0}^{\min}$, the probability of visiting the website is given by

$$
P_{jt}(\text{visit}) = F_{\varepsilon_{ijt0}}(\varepsilon_{t0}^{\min}), \tag{2.8}
$$

where $F_{\varepsilon_{ijt0}}(\cdot)$ denotes the cdf of the normal distribution with mean 0 and variance 1.

Figure 2.10 has a meaningful economic interpretation. It says that those with a small value of $\varepsilon_0$ will visit the website. Moreover, given that they visit the website, they also have a large probability of purchasing.
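One way to obtain the threshold in (2.8) numerically is to search for the point where the utility of visiting (option value plus the scalar shifter) equals $\varepsilon_{ijt0}$. The sketch below uses bisection, which is my choice for illustration and not necessarily the grid-based procedure used in the estimation. The argument `shifter` stands for the scalar utility shifter as a single number, and all names and values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def option_value(v0, v1, eps0, sigma01):
    """Closed-form option value from (2.7)."""
    mu0 = v0 + sigma01 * eps0
    theta = sqrt(2.0 - sigma01 ** 2)
    z = (mu0 - v1) / theta
    return mu0 * STD_NORMAL.cdf(z) + v1 * STD_NORMAL.cdf(-z) + theta * STD_NORMAL.pdf(z)


def visit_threshold(v0, v1, sigma01, shifter, iterations=200):
    """Find eps0 where shifter + option value equals eps0 (single crossing).

    Assumes 0 <= sigma01 < 1, so that the gap below is strictly decreasing.
    """
    def gap(eps0):
        return shifter + option_value(v0, v1, eps0, sigma01) - eps0

    lo, hi = -1.0, 1.0
    while gap(lo) <= 0.0:   # widen the bracket until visiting is preferred
        lo *= 2.0
    while gap(hi) >= 0.0:   # widen the bracket until not visiting is preferred
        hi *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def prob_visit(v0, v1, sigma01, shifter):
    """P(visit) as in (2.8): standard normal cdf at the threshold."""
    return STD_NORMAL.cdf(visit_threshold(v0, v1, sigma01, shifter))


# Illustrative values only.
threshold = visit_threshold(0.0, -1.0, 0.24, -1.9)
p_visit = prob_visit(0.0, -1.0, 0.24, -1.9)
```

Bisection is safe here only because of the single-crossing property. In the un-normalized case with zero or two intersections, a bracketing search of this kind would need to be replaced by a search over a grid.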

Figure 2.10 shows that the option value crosses $\varepsilon_{ijt0}$ only once. This follows from the fact that the variance of the $\varepsilon$'s has been normalized to 1 and thus $\sigma_{01}$ can never be larger than 1. This implies that the slope of the option value $I_{ijt}$ can never be larger than 1. This is also the reason that I have normalized the variance, so that we have a simple case. In general, depending on the slope of $\varepsilon_{ijt0}$ and the variance of $\varepsilon_{ijt1}$ and $\varepsilon_{ijt2}$, there could be 0, 1 or 2 intersections with $\varepsilon_{ijt0}$. In Appendix 2.C, I provide a more general case with un-normalized variance of $\varepsilon_{ijt1}$ and $\varepsilon_{ijt2}$. In that case, the option value need not cross $\varepsilon_{ijt0}$ only once.

Having discussed the option value and the probability of visiting the website, I am ready to derive the other two choice probabilities. Consider the probability of buying a ticket conditional on visiting the website and $\varepsilon_{ijt0}$, given by (2.6). To get the conditional probability of buying given visiting without conditioning on $\varepsilon_{ijt0}$, one simply needs to integrate out $\varepsilon_{ijt0}$. Notice however that one needs to integrate out $\varepsilon_{ijt0}$ not over the entire real line, but only over those regions where consumers would visit the website. In Figure 2.10, this would be the region below the minimum threshold. Mathematically, it follows that

$$
P_{jt}(\text{buy} \mid \text{visit}) = \int \tilde{F}(\tilde{u})\, d\bar{F}(\varepsilon_{ijt0}), \tag{2.9}
$$

where $\bar{F}(\varepsilon_{ijt0})$ is the cdf of the normal distribution "truncated" to those regions. Details on computing the conditional probability can be found in Appendix 2.B. Finally, the probability of buying a ticket is given by [6]

$$
P_{jt}(\text{buy}) = P_{jt}(\text{visit})\, P_{jt}(\text{buy} \mid \text{visit}). \tag{2.10}
$$
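The integral in (2.9) and the product in (2.10) can be sketched as a sample average over draws of $\varepsilon_{ijt0}$ from the truncated normal. The sketch below covers only the single-threshold case (visit when $\varepsilon_{ijt0}$ is below the threshold) and uses inverse-cdf sampling, which is an assumption on my part rather than the mapping given in Appendix 2.B. The uniform draws are generated once and reused, so that repeated evaluations differ only through the parameters. All names and values are illustrative.

```python
import random
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

# Fixed uniform draws in (0, 1], created once outside any estimation loop.
_rng = random.Random(12345)
FIXED_UNIFORMS = [1.0 - _rng.random() for _ in range(2000)]


def prob_buy_given_visit(u_tilde, sigma01, threshold, uniforms):
    """Sample-average version of (2.9) for the region eps0 <= threshold."""
    mass = STD_NORMAL.cdf(threshold)          # probability mass of the region
    scale = sqrt(2.0 - sigma01 ** 2)
    total = 0.0
    for s in uniforms:
        eps0 = STD_NORMAL.inv_cdf(s * mass)   # truncated draw via inverse cdf
        total += NormalDist(sigma01 * eps0, scale).cdf(u_tilde)  # (2.6)
    return total / len(uniforms)


def prob_buy(u_tilde, sigma01, threshold, uniforms):
    """(2.10): P(buy) = P(visit) * P(buy | visit)."""
    p_visit = STD_NORMAL.cdf(threshold)
    return p_visit * prob_buy_given_visit(u_tilde, sigma01, threshold, uniforms)


# Illustrative values only.
p_selected = prob_buy_given_visit(-1.0, 0.24, -2.0, FIXED_UNIFORMS)
p_no_selection = prob_buy_given_visit(-1.0, 0.0, -2.0, FIXED_UNIFORMS)
```

In this toy setting the conversion rate with positive correlation exceeds the one with zero correlation, which illustrates why ignoring the correlation forces other parameters to compensate.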

2.5.7 Empirical implementation

There is an inner and an outer loop. In the inner loop, I solve for the consumer's choice probabilities for given values of the parameters and compute the value of a method of simulated moments (SMM) objective function. In the outer loop, I then estimate the parameters. The moments I use are related to visits and sales at a given point in time given the advertising activity before that, and the evolution of cumulative visits and sales. I assume the market size for the Dutch online lottery ticket market is 250,000 and I set the market size in the model, denoted by $M$, to 1000. [7] Thus each consumer in the model represents 250 real consumers. To implement this, I take aggregate sales and site visits and divide them by 250. In addition, since I do not observe the market share of each channel group, I assume each channel group has the same market share: $0.2M$. [8] Finally, in the estimation, one time unit is equal to one hour and I count the time between midnight and 7 am as 1 hour. This is a compromise between computational burden and how realistic the model is. In the data, I only observe that a consumer has bought a ticket, but not which one. I assume that the price of the tickets bought is 3 euros. The key simplifying assumption is that everybody buys the same ticket (and not that some consumers buy multiple ones, for instance).

To estimate the parameters, I first compute the option value in every period numerically on a grid. This gives me, at each point in time, the threshold point that defines the truncation region in $\bar{F}(\cdot)$ and thus $P_{jt}(\text{visit})$. Next, I compute $P_{jt}(\text{buy} \mid \text{visit})$ by integrating out the $\varepsilon_{t0}$'s over the truncation region obtained from the previous step. The simulated aggregate demand is then given by $0.2M \cdot \sum_j P_j(\text{visit})$ and $0.2M \cdot \sum_j P_j(\text{buy})$. I then match those two demands to actual aggregated visits and sales. Further details are provided in Appendix 3.A.

[6] Formally, it should be $P_{jt}(\text{buy \& visit}) = P_{jt}(\text{visit}) P_{jt}(\text{buy} \mid \text{visit})$. But since I assume that the decisions are made sequentially, it follows that $P(\text{buy \& visit}) = P(\text{buy})$.

[7] I experimented with different market sizes and found that the results of the counterfactual simulations are not very sensitive to it.

[8] I plan to use data on the average market share of the channels in terms of the audience to refine the equal market share assumption in the future.

Table 2.4: Estimation results: key parameters

| parameter | estimate | std.err. |
|---|---|---|
| depreciation rate goodwill stock ($\lambda$) | 0.450 | 0.104 |
| hourly discount factor ($\delta$) | 0.986 | 0.004 |
| covariance between taste shock ($\sigma_{01}$) | 0.240 | 0.008 |
| curvature parameter ($\gamma_3$) | 3.000 | 0.174 |
| channel specific effects on flow utility of buy | | |
| national & local TV channel ($\gamma_{12}$) | 0.055 | 0.084 |
| commercial TV channel ($\gamma_{22}$) | 0.101 | 0.041 |
| national radio channel ($\gamma_{32}$) | 0.200 | 0.140 |
| local radio channel ($\gamma_{42}$) | 0.245 | 0.131 |
| commercial radio channel ($\gamma_{52}$) | 0.255 | 0.062 |
| channel specific effects on flow utility of visit | | |
| national & local TV channel ($\gamma_{11}$) | 0.035 | 0.026 |
| commercial TV channel ($\gamma_{21}$) | 0.035 | 0.019 |
| national radio channel ($\gamma_{31}$) | 0.030 | 0.046 |
| local radio channel ($\gamma_{41}$) | 0.037 | 0.052 |
| commercial radio channel ($\gamma_{51}$) | 0.037 | 0.026 |

Notes: Structural estimates. Obtained using the method of simulated moments. See Sections 3.5.6 and Appendix 3.A for details.
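The scaling between the model and the data described above can be made concrete with a few lines of arithmetic. The sketch below assumes the stated values (250,000 real consumers, $M = 1000$, equal channel shares of $0.2M$). The function names and the example probabilities are hypothetical.

```python
REAL_MARKET_SIZE = 250_000
MODEL_MARKET_SIZE = 1000                                    # M in the text
CONSUMERS_PER_AGENT = REAL_MARKET_SIZE // MODEL_MARKET_SIZE  # 250
CHANNEL_SHARE = 0.2                                         # equal share per group


def to_model_units(observed_count: float) -> float:
    """Rescale an observed aggregate (visits or sales) to model units."""
    return observed_count / CONSUMERS_PER_AGENT


def simulated_demand(probabilities_by_channel) -> float:
    """0.2 * M * sum_j P_j, for visits or for purchases."""
    return CHANNEL_SHARE * MODEL_MARKET_SIZE * sum(probabilities_by_channel)


# Hypothetical choice probabilities for the five channel groups.
example_visits = simulated_demand([0.02, 0.02, 0.02, 0.02, 0.02])
example_target = to_model_units(5000)
```

In an SMM objective, quantities like `example_visits` would be compared with rescaled data such as `example_target`. The exact moments and weighting are not reproduced here.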

2.6 Results

In this section, I present the estimation results. After showing the parameter estimates and fit in subsection 2.6.1, I decompose choice probabilities in subsection 2.6.2. Next, in subsection 2.6.3, I calculate the elasticities of advertising implied by the parameter estimates. The section concludes with a comparison between the estimates from the proposed model and those from the model with no correlation between the $\varepsilon$'s.

2.6.1 Parameter estimates and fit

In this section, I present my estimation results and assess the fit of the model. Table 2.4 shows the estimates of the key parameters of the model. The effect of advertising depreciates quickly, at an hourly rate of about 55 (= 1 - 0.450) percent. The hourly discount factor is estimated to be 0.986. This means that one month before a draw, the value that consumers attach to a ticket is less than 1% of the value on the day of the draw. Next, the covariance between the taste shocks ($\sigma_{01}$) is estimated to be 0.240. Recall that I have normalized the variance of the taste shocks to 1. This implies that $\sigma_{01}$ is also the correlation coefficient. The significantly positive correlation shows that the visiting and purchasing stages are indeed positively correlated. That is, those who visit the website are also more likely to purchase.

Now I come to the effect of advertising on visits and sales. The effect of the goodwill stock on the flow utility of buying a ticket ($\gamma_{j2}$) differs from channel to channel. Channel 5, the commercial radio channel group, is the most effective. Its effect is estimated to be 0.255. In contrast, channel 1, the national and regional TV group, is the least effective channel. The result is in line with the descriptive evidence in Figure 2.8. Next, the effect of the goodwill stock on the flow utility of visiting the website ($\gamma_{j1}$) is much smaller. The reason lies in the specification of the flow utility: the spikes in the visits data are captured by the sum of two parts, the common part, $\Gamma_1(\cdot)$, and the individual specific option value $I_{ijt}$. The small estimated value of $\gamma_{j1}$ implies that the option value accounts for most of the spikes in the visits data. The economic interpretation is that advertising generates online visits mainly through informing high-value consumers. Once the effect of advertising on "high-value" consumers is taken out, the remaining effect on an average consumer is small.

Apart from the key parameters, I have also estimated the fixed effects for each draw. These are the cost of visiting the website ($c$) and the value of holding a ticket at the time of the draw ($\psi$). Table 2.5 presents the results. In general, a larger number of visits in the month implies a lower cost of visiting and, similarly, larger sales imply higher estimates of the draw fixed effects for sales. Moreover, one can see that draws with a short time period are estimated to have a lower cost of visiting the website. Figure 2.11 shows the model fit. With only a few parameters, the model arguably fits the patterns in the data relatively well.

Table 2.5: Estimation results: draw fixed effect

| parameter | estimate | std.err. |
|---|---|---|
| cost of visiting the website | | |
| 10 January, 2014 | 1.750 | 0.023 |
| 10 February, 2014 | 2.030 | 0.025 |
| 10 March, 2014 | 2.020 | 0.021 |
| 10 April, 2014 | 2.020 | 0.016 |
| 26 April, 2014 (King's Day) | 1.690 | 0.032 |
| 10 May, 2014 | 1.900 | 0.026 |
| 10 June, 2014 | 2.040 | 0.017 |
| 24 June, 2014 (Orange draw) | 1.620 | 0.018 |
| 10 July, 2014 | 1.800 | 0.040 |
| 10 August, 2014 | 1.990 | 0.020 |
| 10 September, 2014 | 2.020 | 0.020 |
| 1 October, 2014 (special 1 October draw) | 1.880 | 0.023 |
| 10 October, 2014 | 1.800 | 0.029 |
| 10 November, 2014 | 2.020 | 0.032 |
| 10 December, 2014 | 1.920 | 0.030 |
| 31 December, 2014 (New year's eve draw) | 1.620 | 0.060 |
| value to having a ticket on the day of the draw | | |
| 10 January, 2014 | 0.020 | 0.205 |
| 10 February, 2014 | 0.250 | 0.510 |
| 10 March, 2014 | 0.190 | 0.505 |
| 10 April, 2014 | 0.260 | 0.256 |
| 26 April, 2014 (King's Day) | 0.930 | 0.337 |
| 10 May, 2014 | 0.460 | 0.233 |
| 10 June, 2014 | 0.410 | 0.327 |
| 24 June, 2014 (Orange draw) | 0.170 | 0.191 |
| 10 July, 2014 | 0.400 | 0.274 |
| 10 August, 2014 | 0.210 | 0.381 |
| 10 September, 2014 | 0.460 | 0.413 |
| 1 October, 2014 (special 1 October draw) | 0.220 | 0.462 |
| 10 October, 2014 | 0.390 | 0.217 |
| 10 November, 2014 | 0.420 | 0.464 |
| 10 December, 2014 | 0.270 | 0.483 |
| 31 December, 2014 (New year's eve draw) | 1.690 | 0.154 |

Notes: Structural estimates. Obtained using the method of simulated moments. See Sections 3.5.6 and Appendix 3.A for details.
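The statement that a ticket is worth less than 1% of its draw-day value one month before the draw can be checked directly from the estimated hourly discount factor of 0.986. The sketch below assumes a 30-day month and uses the convention from the empirical implementation that the hours between midnight and 7 am count as a single period, giving 18 model periods per day. Both the function name and the 30-day month are my assumptions.

```python
HOURLY_DISCOUNT_FACTOR = 0.986       # estimate of delta from Table 2.4
MODEL_PERIODS_PER_DAY = 24 - 7 + 1   # midnight to 7 am counted as one period


def relative_ticket_value(days_before_draw: int,
                          delta: float = HOURLY_DISCOUNT_FACTOR,
                          periods_per_day: int = MODEL_PERIODS_PER_DAY) -> float:
    """Discounted value of a ticket relative to its value on the draw day."""
    return delta ** (days_before_draw * periods_per_day)


one_month_model_time = relative_ticket_value(30)
one_month_clock_time = relative_ticket_value(30, periods_per_day=24)
```

Under either way of counting hours, the relative value one month ahead is well below 0.01, in line with the claim in the text.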

2.6.2 Decompose probabilities

One of the advantages of estimating a structural model is that, once estimated, one can decompose the probability of sales into the probability of visiting and the probability of buying given a visit. As a result, one can study the effect of advertising on these three probabilities. Figure 2.12 plots these probabilities for one typical draw. Two things are worth noticing. First, the conditional probability of buying a ticket, given that the consumer has visited the website, increases over time. This means that the closer the deadline, the more consumers with a high probability of buying enter the pool of visitors. [9] Second, it is clear from the figure that advertising does have a positive effect on the conditional probability of buying, that is, the conversion rate.

[9] In my model, those consumers with a high probability of buying, given visiting the website, are those with low $\varepsilon_0$.

[Figure 2.11: Model fit. Top panel: cumulative sales for each draw, model prediction and data, against time in days since January 1. Bottom panel: cumulative visits for each draw, model prediction and data, against time in days since January 1.]

[Figure 2.12: Decompose probabilities. Top panel: conditional prob buy given visit, prob visit and prob buy (vertical axis: probabilities) against time in hours. Bottom panel: GRP for channels 1 to 5 (vertical axis: GRP) against time in hours.]

Table 2.6: Elasticities of advertising

| channel | prob(visit) | prob(buy \| visit) | prob(buy) |
|---|---|---|---|
| national & local TV channel | 0.091 | 0.042 | 0.132 |
| commercial TV channel | 0.090 | 0.084 | 0.174 |
| national radio channel | 0.108 | 0.167 | 0.276 |
| local radio channel | 0.126 | 0.203 | 0.329 |
| commercial radio channel | 0.138 | 0.209 | 0.347 |

2.6.3 Elasticities of advertising

In this subsection, I calculate the elasticities of advertising implied by the model parameter estimates. The elasticity is defined as $\text{Elasticity} = \left[\frac{P(1.01 \cdot GRP)}{P(GRP)} - 1\right] \cdot 100$, where $GRP$ is the average GRP's over the last two weeks before the draw and $P(\cdot)$ is the choice probability. I calculate the elasticity in the following way: I increase GRP for only one channel at a time. I do this for each hour during the last 5 days before the draw, using the average value for the cost of visiting the website and the value of holding a ticket, and then take the average of the elasticities across these hours. Table 2.6 shows the result. For example, 0.174 in row 2, column 3 means that a 1% increase in GRP on the commercial TV channel will increase prob(buy) by 0.174%. The other numbers are interpreted in the same way. These elasticities are more or less in line with those found in the literature. [10]
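The elasticity definition above amounts to a 1% bump in GRP and a percentage change in the choice probability. The sketch below implements that definition for an arbitrary probability function. The function name and the toy response curve are assumptions for illustration and are not the model's choice probability.

```python
def advertising_elasticity(choice_prob, grp: float, bump: float = 0.01) -> float:
    """Percentage change in choice_prob when GRP rises by `bump` (1% by default)."""
    return (choice_prob((1.0 + bump) * grp) / choice_prob(grp) - 1.0) * 100.0


def toy_choice_prob(grp: float) -> float:
    """Hypothetical constant-elasticity response, used only as a check."""
    return 0.01 * grp ** 0.174


toy_elasticity = advertising_elasticity(toy_choice_prob, grp=5.0)
```

For the toy curve, the computed value is close to the exponent 0.174, so the definition reads as a percentage response to a 1% increase in GRP. In the paper, this calculation is repeated hour by hour for one channel at a time and then averaged.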

2.6.4 The proposed model vs. the model with no correlation between e-s

In this subsection, I compare the estimates from two models. The first one is the proposed model with correlation between the taste shocks. The second model is the one without this correlation. Apart from the correlation, the two models are identical. Tables 2.7 and 2.8 show the results. First, Table 2.7 shows that if one ignored the correlation between taste shocks, one would get much larger estimates for the effect of advertising on purchasing. Second, Table 2.8 shows that the estimated cost of visiting the website would be much larger. The intuition is as follows: without selection, both the serious buyers and the "average" visitors (those with a low probability of buying) will visit the website. Consequently, for given parameters, the model would predict a much lower conversion rate. As a result, in order to get a higher conversion rate that fits the data, the cost of visiting the website must be larger (so fewer people will visit) and the effect of ads on purchasing a ticket must be larger (so more people will buy).

[10] Sethuraman et al. (2011) summarize the elasticities of advertising from different studies.

Table 2.7: Comparison estimates: key parameters

| Comparison | with correlation | without correlation |
|---|---|---|
| depreciation rate goodwill stock ($\lambda$) | 0.450 | 0.539 |
| hourly discount factor ($\delta$) | 0.986 | 0.964 |
| covariance between taste shock ($\sigma_{01}$) | 0.240 | 0 |
| curvature parameter ($\gamma_3$) | 3.000 | 2.934 |
| channel specific effects on flow utility of buy | | |
| national & local TV channel ($\gamma_{12}$) | 0.055 | 0.187 |
| commercial TV channel ($\gamma_{22}$) | 0.101 | 0.165 |
| national radio channel ($\gamma_{32}$) | 0.200 | 0.358 |
| local radio channel ($\gamma_{42}$) | 0.245 | 0.445 |
| commercial radio channel ($\gamma_{52}$) | 0.255 | 0.527 |
| channel specific effects on flow utility of visit | | |
| national & local TV channel ($\gamma_{11}$) | 0.035 | 0.035 |
| commercial TV channel ($\gamma_{21}$) | 0.035 | 0.035 |
| national radio channel ($\gamma_{31}$) | 0.030 | 0.030 |
| local radio channel ($\gamma_{41}$) | 0.037 | 0.037 |
| commercial radio channel ($\gamma_{51}$) | 0.037 | 0.037 |
| model fit | 0.853 | 0.964 |

2.7 Counterfactual experiments

Having estimated the model, I turn to the supply side. I do not have access to data on the profitability of an additional sold ticket, nor on the cost of one GRP at a given TV or radio station. It is, however, not unreasonable to assume as an approximation that the price of one GRP does not vary across stations. Therefore, it is meaningful to study whether a given (monetary) budget could be allocated better over different channels, by asking whether it is possible to sell more tickets when one allocates the same number of GRP's in a different way. Importantly, there are two dimensions along which I could vary GRP's. The first dimension is time: when to advertise? For example, one can allocate more GRP's to a day that is closer to the deadline. The second dimension is location: where to advertise? In this paper, I focus on the second one. To do this, one needs to keep both the timing and the total size of GRP's as they are and vary only the allocation of GRP's across channels.

2.7.1 Setup

Suppose there is a typical draw: the cost of visiting the website and the value of holding a ticket are the average values over all draws. Moreover, there are no ads during the entire period of this draw. Now suppose the manager, Sophia, buys 10 GRP's between 7 pm and 7:59 pm and 10 GRP's between 8 pm and 8:59 pm on the day 7 days before the draw. She then faces the following question: how should she allocate the 10 GRP's between 7 pm and 7:59 pm and the 10 GRP's between 8 pm and 8:59 pm over the different channels so that online sales are maximized? Sophia's question can be summarized in the following mathematical form:

$$
\begin{aligned}
\max_{\omega}\quad & q(\omega, \hat{\theta}) \\
\text{s.t.}\quad & (\omega_{17} + \omega_{27} + \omega_{37} + \omega_{47} + \omega_{57}) \cdot 10 \le 10, \\
& (\omega_{18} + \omega_{28} + \omega_{38} + \omega_{48} + \omega_{58}) \cdot 10 \le 10, \\
& \omega_{jt} \ge 0 \quad \forall j, t,
\end{aligned}
$$

where $q$ is total sales, $\hat{\theta}$ is the vector of estimated parameters, which the manager takes as given, and $\omega_{jt}$ is the weight for channel $j$ at hour $t$. For example, $\omega_{27}$ is the weight for channel 2 in the hour between 7 pm and 7:59 pm.

Table 2.8: Comparison estimates: draw fixed effect

| Comparison | with correlation | without correlation |
|---|---|---|
| cost of visiting the website | | |
| 10 January, 2014 | 1.750 | 2.250 |
| 10 February, 2014 | 2.030 | 2.530 |
| 10 March, 2014 | 2.020 | 2.520 |
| 10 April, 2014 | 2.020 | 2.520 |
| 26 April, 2014 (King's Day) | 1.690 | 2.190 |
| 10 May, 2014 | 1.900 | 2.400 |
| 10 June, 2014 | 2.040 | 2.540 |
| 24 June, 2014 (Orange draw) | 1.620 | 2.120 |
| 10 July, 2014 | 1.800 | 2.300 |
| 10 August, 2014 | 1.990 | 2.490 |
| 10 September, 2014 | 2.020 | 2.410 |
| 1 October, 2014 (special 1 October draw) | 1.880 | 2.380 |
| 10 October, 2014 | 1.800 | 2.300 |
| 10 November, 2014 | 2.020 | 2.419 |
| 10 December, 2014 | 1.920 | 2.420 |
| 31 December, 2014 (New year's eve draw) | 1.620 | 2.120 |
| value to having a ticket on the day of the draw | | |
| 10 January, 2014 | 0.020 | 0.047 |
| 10 February, 2014 | 0.250 | 0.404 |
| 10 March, 2014 | 0.190 | 0.268 |
| 10 April, 2014 | 0.260 | 0.341 |
| 26 April, 2014 (King's Day) | 0.930 | 1.315 |
| 10 May, 2014 | 0.460 | 0.450 |
| 10 June, 2014 | 0.410 | 0.589 |
| 24 June, 2014 (Orange draw) | 0.170 | 0.274 |
| 10 July, 2014 | 0.400 | 0.619 |
| 10 August, 2014 | 0.210 | 0.266 |
| 10 September, 2014 | 0.460 | 0.573 |
| 1 October, 2014 (special 1 October draw) | 0.220 | 0.276 |
| 10 October, 2014 | 0.390 | 0.376 |
| 10 November, 2014 | 0.420 | 0.348 |
| 10 December, 2014 | 0.270 | 0.309 |
| 31 December, 2014 (New year's eve draw) | 1.690 | 2.961 |

2.7.2 Results

First suppose that Sophia allocates the budget based on "past experience", i.e., she uses the average weights over the 7 days before the draw from previous draws. Next, she uses the targeting strategy, that is, allocating all GRP's to one channel. Finally, she solves the maximization problem to find the optimal allocation. Table 2.9 shows the results of the various advertising strategies. The first row is the reference level: she uses the average weights in the data. Rows 2 to 6 show the model prediction under the targeting strategy. As implied by the parameter estimates, the commercial radio channel, which has the largest estimated effect, performs best. The national & local TV channel, on the other hand, is the least effective. Finally, the optimal allocation shows that Sophia could do better than using a pure targeting strategy. In particular, the optimal strategy is to spend about 43% of the budget on the commercial radio channels, about 37% on the local radio channels, about 18% on the national radio channel and about 2% on the commercial TV channels. That is, although the commercial radio channel has the largest estimated effect, due to the functional form with diminishing marginal returns to advertising, she could improve sales by allocating some GRP's to other channels so that the marginal returns at the optimal allocation are more or less the same across channels.

Table 2.9: Effect of various advertising strategies

| strategy | visits | sales | conversion rate |
|---|---|---|---|
| data (reference point) | 100% | 100% | 100% |
| allocate all GRP's on channel | | | |
| national & local TV channel | 99.52% | 98.18% | 98.65% |
| commercial TV channel | 99.55% | 98.42% | 98.86% |
| national radio channel | 99.58% | 99.03% | 99.45% |
| local radio channel | 99.69% | 99.60% | 99.91% |
| commercial radio channel | 99.70% | 99.70% | 100.01% |
| optimal allocation | 99.97% | 100.39% | 100.42% |

Notes: This table shows the effect of using alternative advertising strategies. See text for a description of these strategies. The conversion rate is calculated as the predicted sales over the predicted visits.
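The logic that diminishing marginal returns make a mixed allocation better than pure targeting can be illustrated with a toy problem. The sketch below is not the model's sales function: it assumes a hypothetical separable response `effect * log(1 + GRP)` for one hour with a budget of 10 GRP's, and it borrows the channel effects on buying from Table 2.4 purely as illustrative weights. The greedy rule hands out the budget in small steps to whichever channel currently has the highest marginal gain.

```python
from math import log

# Illustrative weights only (estimates of the buy effects from Table 2.4).
EFFECTS = [0.055, 0.101, 0.200, 0.245, 0.255]
BUDGET = 10.0  # GRP's available in the hour


def toy_sales(weights, effects=EFFECTS, budget=BUDGET):
    """Hypothetical concave response: sum_j effect_j * log(1 + budget * w_j)."""
    return sum(b * log(1.0 + budget * w) for w, b in zip(weights, effects))


def greedy_allocation(effects=EFFECTS, budget=BUDGET, steps=1000):
    """Allocate the budget in `steps` equal slices by highest marginal gain."""
    counts = [0] * len(effects)

    def marginal_gain(j):
        now = budget * counts[j] / steps
        nxt = budget * (counts[j] + 1) / steps
        return effects[j] * (log(1.0 + nxt) - log(1.0 + now))

    for _ in range(steps):
        best = max(range(len(effects)), key=marginal_gain)
        counts[best] += 1
    return [c / steps for c in counts]


optimal_weights = greedy_allocation()
all_on_best_channel = [0.0, 0.0, 0.0, 0.0, 1.0]
```

For a separable concave objective, this greedy rule approximately equalizes marginal returns across the channels that receive a positive share, which is the mechanism behind the optimal allocation discussed in the text. The resulting shares depend entirely on the assumed toy response and are not comparable to the percentages reported above.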

2.8 Summary and concluding remarks

Advertising can affect consumer behavior at the consideration and the purchasing stage. This paper uses high frequency data on TV and radio advertising from different channels together with online sales and website visits data to measure the effects of advertising. I find positive effects of advertising on consideration and conversion and the effects depend on the channel on which the firm advertises. Besides, I point out that the observed increase in the conversion rate could be due to the fact that those who are motivated to visit the website through advertisements are different from those who usually visit. The former ones have a higher probability to buy given that they visit. Ignoring this and studying consideration and conversion separately could result in an underestimated conversion rate and thus a suboptimal advertising strategy, in particular when advertising on different channels reaches different audiences. I provide an explanation for this by spelling out a new integrated model of consideration and conversion that can generate the observed pattern in the data. I estimate the structural parameters of this model and show that one would overestimate both the effects of advertising and the cost of visiting the website if one would ignore this selection. I simulate the effects of counterfactual targeted advertising strategies and conclude that shifting advertising across channels can lead to increased sales. The proposed model can be used by firms to optimize online sales by choosing TV and radio advertising schedules, but of course, the model could be used in other situations as well. A closely related one is the real-time bidding in advertising auctions for online advertisements on, say, Facebook or Google. In this context, firms need to bid for the advertisement position on the user’s web browser. The value of the bid then depends crucially on the user’s expected conversion rate, as the price is only paid when users actually click on the advertisement. 
Using a two-stage model that takes selection into account is particularly important here: when the manager estimates the conversion rate using only data on those individuals who visit, she obtains an average conversion rate, averaged over those who visit the website anyway and those who visit because they saw an ad. The rate that matters for ad auctions, however, is the one for those who visit the website because they saw an ad. In a broader sense, the model proposed in this paper can also be used in situations where either the agent does not need to experience the good or doing so is not possible. An example could be buying an airline ticket. Airline tickets are similar to lottery tickets since they also have simple characteristics: the price and the time of the flight. Moreover, there is also a deadline for buying an airline ticket, determined by the consumer: the date of her trip. Finally, consumers cannot experience the product before the purchase. Therefore, those who visit the website of an airline because of the ads are more likely to purchase as well: they have no reason to check the ticket price from, say, London to New York City if they have no plan to travel between these two cities. In other words, those who visit the website are also selected. In this paper, I have shown that taking such selection issues into account will help firms to design better advertising strategies.

2.A Additional tables and figures

Table 2.12: Effect of TV and radio advertising from different channels

| | (1) visits | (2) sales |
|---|---|---|
| GRP1 between 0 and 4 minutes ago | 0.0236*** (0.00191) | 0.00761*** (0.00165) |
| 5 and 9 minutes | 0.0301*** (0.00176) | 0.0338*** (0.00188) |
| 10 and 14 minutes | 0.0100*** (0.00114) | 0.0384*** (0.00163) |
| 15 and 19 minutes | 0.00402*** (0.000966) | 0.0226*** (0.00166) |
| 20 and 24 minutes | 0.00315*** (0.000938) | 0.0130*** (0.00134) |
| 25 and 29 minutes | 0.00204* (0.000941) | 0.00976*** (0.00164) |
| 0.5 and 1 hour | -0.00128** (0.000400) | 0.00445*** (0.000598) |
| 1 and 1.5 hours | 0.00214*** (0.000479) | 0.00326*** (0.000630) |
| 1.5 and 2 hours | -0.000858* (0.000415) | 0.000693 (0.000616) |
| 2 and 2.5 hours | 0.00274*** (0.000520) | 0.000863 (0.000570) |
| 2.5 and 3 hours | -0.00108* (0.000512) | -0.00385*** (0.000564) |
| 3 and 3.5 hours | -0.00284*** (0.000483) | -0.00711*** (0.000506) |
| 3.5 and 4 hours | -0.00700*** (0.000491) | -0.0127*** (0.000545) |
| GRP2 between 0 and 4 minutes ago | 0.0346*** (0.00141) | 0.0184*** (0.00165) |
| 5 and 9 minutes | 0.0478*** (0.00127) | 0.0453*** (0.00195) |
| 10 and 14 minutes | 0.0144*** (0.00103) | 0.0503*** (0.00173) |
| 15 and 19 minutes | 0.0101*** (0.00103) | 0.0372*** (0.00161) |
| 20 and 24 minutes | 0.00817*** (0.00105) | 0.0280*** (0.00167) |
| 25 and 29 minutes | 0.00861*** (0.00109) | 0.0183*** (0.00160) |
| 0.5 and 1 hour | 0.00948*** (0.000437) | 0.0161*** (0.000688) |
| 1 and 1.5 hours | 0.00890*** (0.000445) | 0.0117*** (0.000662) |
| 1.5 and 2 hours | 0.00759*** (0.000454) | 0.00746*** (0.000640) |
| 2 and 2.5 hours | 0.00517*** (0.000470) | 0.00158* (0.000616) |
| 2.5 and 3 hours | 0.00186*** (0.000471) | -0.00419*** (0.000587) |
| 3 and 3.5 hours | -0.00199*** (0.000498) | -0.0124*** (0.000588) |
| 3.5 and 4 hours | -0.00794*** (0.000496) | -0.0221*** (0.000575) |
| GRP3 between 0 and 4 minutes ago | 0.00354 (0.00186) | 0.00413 (0.00252) |
| 5 and 9 minutes | -0.00582** (0.00206) | 0.0114*** (0.00263) |
| 10 and 14 minutes | -0.00617*** (0.00180) | 0.0105*** (0.00268) |
| 15 and 19 minutes | -0.00563** (0.00185) | 0.00525* (0.00260) |
| 20 and 24 minutes | -0.00405* (0.00180) | 0.00693** (0.00250) |
| 25 and 29 minutes | -0.00424* (0.00188) | 0.00691** (0.00255) |
| 0.5 and 1 hour | 0.00134 (0.000831) | 0.00878*** (0.00115) |
| 1 and 1.5 hours | -0.00500*** (0.000835) | 0.00926*** (0.00112) |
| 1.5 and 2 hours | -0.00313*** (0.000845) | 0.0122*** (0.00115) |
| 2 and 2.5 hours | -0.00502*** (0.000833) | 0.00781*** (0.00114) |
| 2.5 and 3 hours | -0.00993*** (0.000872) | 0.00627*** (0.00117) |
| 3 and 3.5 hours | -0.000299 (0.000878) | 0.0110*** (0.00114) |
| 3.5 and 4 hours | -0.00723*** (0.000919) | 0.00713*** (0.00120) |
| GRP4 between 0 and 4 minutes ago | 0.0108*** (0.00319) | -0.000113 (0.00412) |
| 5 and 9 minutes | 0.00553 (0.00316) | 0.00306 (0.00407) |
| 10 and 14 minutes | 0.00739* (0.00320) | 0.00629 (0.00413) |
| 15 and 19 minutes | 0.0133*** (0.00314) | 0.00758 (0.00402) |
| 20 and 24 minutes | 0.0144*** (0.00321) | 0.0174*** (0.00410) |
| 25 and 29 minutes | 0.0153*** (0.00337) | 0.0151*** (0.00408) |
| 0.5 and 1 hour | 0.0158*** (0.00138) | 0.00957*** (0.00178) |
| 1 and 1.5 hours | 0.00697*** (0.00139) | 0.00978*** (0.00176) |
| 1.5 and 2 hours | 0.00922*** (0.00139) | 0.0115*** (0.00180) |
| 2 and 2.5 hours | 0.00469** (0.00148) | 0.0110*** (0.00184) |
| 2.5 and 3 hours | 0.00850*** (0.00147) | 0.0117*** (0.00190) |
| 3 and 3.5 hours | 0.00911*** (0.00144) | 0.0130*** (0.00183) |
| 3.5 and 4 hours | 0.00210 (0.00143) | 0.0121*** (0.00179) |
| GRP5 between 0 and 4 minutes ago | 0.0160*** (0.00186) | 0.00899*** (0.00225) |
| 5 and 9 minutes | 0.0107*** (0.00178) | 0.0117*** (0.00225) |
| 10 and 14 minutes | 0.0114*** (0.00182) | 0.0138*** (0.00232) |
| 15 and 19 minutes | 0.0141*** (0.00180) | 0.0156*** (0.00227) |
| 20 and 24 minutes | 0.0129*** (0.00179) | 0.0147*** (0.00224) |
| 25 and 29 minutes | 0.0144*** (0.00181) | 0.0181*** (0.00225) |
| 0.5 and 1 hour | 0.0115*** (0.000783) | 0.0171*** (0.000948) |
| 1 and 1.5 hours | 0.0104*** (0.000769) | 0.0177*** (0.000950) |
| 1.5 and 2 hours | 0.00875*** (0.000790) | 0.0191*** (0.000952) |
| 2 and 2.5 hours | 0.00546*** (0.000796) | 0.0166*** (0.000953) |
| 2.5 and 3 hours | 0.00690*** (0.000822) | 0.0139*** (0.000937) |
| 3 and 3.5 hours | 0.00882*** (0.000809) | 0.0124*** (0.000913) |
| 3.5 and 4 hours | 0.00735*** (0.000830) | 0.0121*** (0.000920) |
| draw dummies | Yes | Yes |
| days to draw dummies | Yes | Yes |
| hour dummies | Yes | Yes |
| Observations | 441223 | 441223 |
| R2 | 0.843 | 0.665 |

Standard errors in parentheses

* p < 0.05, ** p < 0.01, *** p < 0.001

2.B Computing conditional choice probability

I now discuss for given paramters, how to conpute (2.9), the conditional probability of buying given that the consumer has visited the website:

$$P(\text{buy} \mid \text{visit}) = \int \tilde{F}(\tilde{u}) \, d\bar{F}(\varepsilon_{it0}).$$

The key challenge is to take random draws from $\bar{F}(\varepsilon_{it0})$, which is the cdf of a normal distribution “truncated” in certain regions. Once such draws are taken, the rest is standard: the integral can be computed by the sample average. Thus, the question boils down to taking random draws from $\bar{F}(\varepsilon_{it0})$. This is not trivial because $\bar{F}(\varepsilon_{it0})$ is a function of the parameters. That is, the truncation region changes as the parameter values change. Moreover, depending on the parameter values, $\bar{F}(\varepsilon_{it0})$ can be left truncated, right truncated or truncated on both sides. This raises two issues. First, the random draws from $\bar{F}(\varepsilon_{it0})$ should be “fixed”, so that one is certain that the value of the objective function changes because of parameter changes and not because of new random draws. Second, the algorithm should cover all cases of truncation regions. Suppose one has computed P(visit), so that one knows, for each time period, the lower and upper cutoff points, denoted by $a$ and $b$. Given this input, I propose the following algorithm:

1. Draw a set of random numbers from the standard uniform distribution and keep it fixed outside the estimation loop. Denote this set by $S$.
2. Compute the total length of the truncated region, denoted by $r$: $r = F_{\varepsilon_0}(a) + (1 - F_{\varepsilon_0}(b))$.
3. Transform each point in $S$ into the uniform distribution of length $r$: $\forall s \in S: s \to s \cdot r$; denote this new set by $S'$.
4. Transform each point in $S'$ into the truncated region $(-\infty, a] \cup [b, +\infty)$ using the following mapping:

$$\forall s \in S': \begin{cases} s \to s + (b - a) & \text{if } s > a \\ s \to s & \text{otherwise;} \end{cases}$$

   denote the new set by $S''$. Since $S'$ lies in $[0, r]$, the comparison and the shift in this mapping are to be read on the probability scale, with $a$ and $b$ standing for $F_{\varepsilon_0}(a)$ and $F_{\varepsilon_0}(b)$.
5. The desired draws from $\bar{F}(\cdot)$ are $F_{\varepsilon_0}^{-1}(S'')$.

Notice that the proposed algorithm overcomes the aforementioned issues. First, step 1 implies that although the realized value of each draw changes with the parameter values, the relative position of each random draw on the finite-length line segment stays the same. This is important since those random draws are “fixed” from the estimation point of view. Second, the proposed algorithm covers every case of truncation (left, right or two-sided). For example, if it is left truncated, then the upper threshold $b$ tends to infinity. This allows me to avoid case-by-case inspection, which saves computational time.

Table 2.10: Definition of TV channels and groups

channel name    channel number before grouping    channel number after grouping
24Kitchen       1                                 2
Com.Cent.F      2                                 2
Comedy          3                                 2
Discovery       4                                 2
FS3E            5                                 2
Fox             6                                 2
Fox Sp1         7                                 2
Fox Sp 2        8                                 2
Geographic      9                                 2
ID              10                                2
L1 TV           11                                1
MTV             12                                2
NPO1            13                                1
NPO2            14                                1
NPO3            15                                1
Ned1            16                                1
Ned2            17                                1
Ned3            18                                1
Net5            19                                2
O.Brabant       20                                1
RTL4            21                                2
RTL5            22                                2
RTL7            23                                2
RTL8            24                                2
RTV Utr.        25                                1
Rijnmond        26                                1
SBS6            27                                2
TLC             28                                2
TV Drenthe      29                                1
TV Flevo        30                                1
TV Gelderl      31                                1
TV Nh           32                                1
TV Noord        33                                1
TV Oost         34                                1
TV West         35                                1
TV Zeeland      36                                1
Veronica        37                                2

Notes: This table shows the list of Dutch TV channels. The first column shows the name of each channel. The second column is the channel number before grouping. Numbers are assigned to the channels according to the alphabetical order of the channel names. The third column shows the new channel number after grouping. Channel 1 consists of national and local TV channels. Channel 2 indicates commercial channels.

Table 2.11: Definition of radio channels and groups

channel name                channel number before grouping    channel number after grouping
100% NL                     1                                 5
[name missing in source]    2                                 5
Freez FM                    3                                 5
Fresh FM                    4                                 5
Hot-Radio                   5                                 5
Joy Radio                   6                                 5
L1 Radio                    7                                 4
NPO 3FM                     8                                 3
NPO Radio 1                 9                                 3
NPO Radio 2                 10                                3
NPO Radio 3FM               11                                3
[name missing in source]    12                                4
[name missing in source]    13                                4
Omrop Fryslan               14                                4
Open Rotterdam              15                                4
Optimaal FM                 16                                5
Puur NL                     17                                5
Q-Music                     18                                5
[name missing in source]    19                                5
Radio 2                     20                                3
Radio 3FM                   21                                3
[name missing in source]    22                                5
Radio 8 FM                  23                                4
Radio Continu               24                                5
Radio Decibel               25                                5
Radio Drenthe               26                                4
Radio Flevoland             27                                4
Radio                       28                                4
Radio M                     29                                4
Radio Noord                 30                                4
Radio Noord-Holland         31                                4
Radio Oost                  32                                4
Radio Rijnmond              33                                4
Radio Royaal                34                                5
Radio Veronica              35                                5
Radio West                  36                                4
RadioNL                     37                                5
Simone FM                   38                                5
[name missing in source]    39                                5
Slam!FM                     40                                5
Sublime FM                  41                                5
Waterstad FM                42                                5
Wald FM                     43                                5

Notes: This table shows the list of Dutch radio channels. The first column shows the name of each channel. The second column is the channel number before grouping. Numbers are assigned to the channels according to the alphabetical order of the channel names. The third column shows the new channel number after grouping. Channel 3 consists of national radio channels. Channel 4 refers to local radio channels. Channel 5 indicates commercial channels.
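The truncated-draw algorithm of Appendix 2.B can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for estimation: the function name `truncated_normal_draws`, the clipping constant `eps` and the example cutoffs are hypothetical, and the shift in step 4 is applied on the probability scale, that is, with $F_{\varepsilon_0}(a)$ and $F_{\varepsilon_0}(b)$ in place of $a$ and $b$.

```python
import random
from statistics import NormalDist


def truncated_normal_draws(base_uniforms, a, b, dist=NormalDist()):
    """Map fixed U(0,1) draws into draws from `dist` restricted to
    (-inf, a] U [b, +inf). Pass a = float("-inf") or b = float("inf")
    for one-sided truncation."""
    if a > b:
        raise ValueError("need a <= b")
    p_low = dist.cdf(a)             # mass of (-inf, a]
    p_high = dist.cdf(b)            # mass of (-inf, b]
    r = p_low + (1.0 - p_high)      # step 2: total mass of the allowed region
    if r <= 0.0:
        raise ValueError("allowed region has zero probability")
    eps = 1e-12                     # hypothetical guard: inv_cdf needs 0 < p < 1
    draws = []
    for s in base_uniforms:
        p = s * r                   # step 3: rescale to [0, r]
        if p_low > 0.0 and p <= p_low:
            p = max(p, eps)         # point stays in the lower tail
        else:
            # step 4: jump over the excluded interval (a, b)
            p = min(max(p + (p_high - p_low), eps), 1.0 - eps)
        draws.append(dist.inv_cdf(p))   # step 5: back to the epsilon scale
    return draws


# Step 1: the uniforms are drawn once and reused for every parameter value.
rng = random.Random(12345)          # the seed is arbitrary
S = [rng.random() for _ in range(1000)]
draws = truncated_normal_draws(S, a=-0.5, b=1.0)
```

Because `S` is held fixed, calling the function again with different cutoffs moves each draw smoothly with the parameters instead of resampling it, which is the property the estimation loop relies on.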

2.C The probit model: a more general case

Consider a model described in Figure 2.9. The vector of utility shocks is jointly multivariate normally distributed: $\{\varepsilon_{it0}, \varepsilon_{it1}, \varepsilon_{it2}\} \sim N(0, \Sigma)$, where

$$\Sigma = \begin{bmatrix} \sigma_0^2 & \sigma_{01} & 0 \\ \cdot & \sigma_1^2 & 0 \\ \cdot & \cdot & \sigma_2^2 \end{bmatrix}.$$

The idea is that the model imposes that $\varepsilon_{it0}$ and $\varepsilon_{it1}$ are correlated while $\varepsilon_{it2}$ is independent. In the following, I first discuss the lower layer of the model, the purchasing decision stage, and then discuss the upper layer of the model, the visiting decision stage.

2.C.1 Purchasing stage decision

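It follows from a result for the multivariate normal distribution that, conditional on $\varepsilon_{ijt0}$, the joint distribution of $\{\varepsilon_{ijt1}, \varepsilon_{ijt2}\}$ is also (bivariate) normal:

$$\{\varepsilon_{ijt1}, \varepsilon_{ijt2}\} \mid \varepsilon_{ijt0} \sim N\left( \begin{bmatrix} \frac{\sigma_{01}}{\sigma_1^2}\,\varepsilon_{ijt0} \\ 0 \end{bmatrix}, \Sigma_{12} \right), \quad \text{where } \Sigma_{12} = \begin{bmatrix} \sigma_1^2 - \frac{\sigma_{01}^2}{\sigma_0^2} & 0 \\ 0 & \sigma_2^2 \end{bmatrix}.$$

Notice that conditional on $\varepsilon_{ijt0}$, $\Sigma_{12}$ implies that $\varepsilon_{ijt1}$ is independent from $\varepsilon_{ijt2}$. Joint normality of the error terms means that the purchasing stage decision is a standard binary probit model with uncorrelated error terms and thus can be estimated without numerical integration.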

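2.C.2 Visiting stage decision

The key challenge in the upper model is to evaluate the option value term, $I_{ijt}$, which is a function of $\varepsilon_{ijt0}$. For notational purposes, define $V_0 \equiv 0$ and $V_1 \equiv -p + \delta^{T-t}\alpha + \Gamma_2(\cdot)$, so that $I_{ijt} = E[\max\{V_0 + \varepsilon_{ijt1}, V_1 + \varepsilon_{ijt2}\} \mid \varepsilon_{ijt0}]$. It follows immediately from independence in the previous subsection that

$$V_0 + \varepsilon_{ijt1} \mid \varepsilon_{ijt0} \sim N\left(V_0 + \frac{\sigma_{01}}{\sigma_1^2}\,\varepsilon_{ijt0},\; \sigma_1^2 - \frac{\sigma_{01}^2}{\sigma_0^2}\right) \quad \text{and} \quad V_1 + \varepsilon_{ijt2} \sim N(V_1, \sigma_2^2).$$

Using a result for the multivariate normal distribution,

$$I_{ijt} = \left(V_0 + \frac{\sigma_{01}}{\sigma_1^2}\,\varepsilon_{ijt0}\right) \Phi\left( \frac{V_0 - V_1 + \frac{\sigma_{01}}{\sigma_1^2}\,\varepsilon_{ijt0}}{\sqrt{\sigma_1^2 - \frac{\sigma_{01}^2}{\sigma_0^2} + \sigma_2^2}} \right) + V_1\, \Phi\left( \frac{V_1 - V_0 - \frac{\sigma_{01}}{\sigma_1^2}\,\varepsilon_{ijt0}}{\sqrt{\sigma_1^2 - \frac{\sigma_{01}^2}{\sigma_0^2} + \sigma_2^2}} \right) + \sqrt{\sigma_1^2 - \frac{\sigma_{01}^2}{\sigma_0^2} + \sigma_2^2}\;\, \phi\left( \frac{V_0 - V_1 + \frac{\sigma_{01}}{\sigma_1^2}\,\varepsilon_{ijt0}}{\sqrt{\sigma_1^2 - \frac{\sigma_{01}^2}{\sigma_0^2} + \sigma_2^2}} \right),$$

where $\phi(\cdot)$ and $\Phi(\cdot)$ denote, respectively, the pdf and the cdf of the standard normal distribution. In general, depending on the slope of $\varepsilon_{ijt0}$ and the parameters that affect the difference in flow utilities, it could have 0, 1 or 2 intersections across $\varepsilon_{ijt0}$. Figure 2.13 shows all other 4 cases. Whether the option value will cross the 45 degree line from above depends on the slope before $\varepsilon_{ijt0}$: $\frac{\sigma_{01}}{\sigma_1^2}$.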
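The closed form for the option value has the shape of the standard expression for the expected maximum of two independent normal random variables. The sketch below, which is an illustration and not the estimation code, evaluates that generic expression; the function name `expected_max` and its argument names are hypothetical. In the notation of this subsection, `mu_x` and `var_x` would be the conditional mean and variance of $V_0 + \varepsilon_{ijt1}$ given $\varepsilon_{ijt0}$, and `mu_y`, `var_y` would be $V_1$ and $\sigma_2^2$.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: supplies pdf and cdf


def expected_max(mu_x, var_x, mu_y, var_y):
    """E[max(X, Y)] for independent X ~ N(mu_x, var_x) and Y ~ N(mu_y, var_y)."""
    scale = sqrt(var_x + var_y)      # std. dev. of the difference X - Y
    if scale == 0.0:
        return max(mu_x, mu_y)       # degenerate case: no randomness left
    d = (mu_x - mu_y) / scale
    return (mu_x * _STD_NORMAL.cdf(d)
            + mu_y * _STD_NORMAL.cdf(-d)
            + scale * _STD_NORMAL.pdf(d))
```

Evaluating this function on a grid of values for $\varepsilon_{ijt0}$ and comparing the result with the 45 degree line is one way to see how many intersections a given parameter vector produces.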

2.D Details on the econometric implementation

I provide further details on the econometric implementation.

2.D.1 Empirical setup

The data contain information on ticket sales, online traffic and advertising activities for 16 draws. Since I collapse these data during the night, every day in the model has 18 hours. The starting period is 00:00-00:59 on Jan 1 and the last period is 17:00-17:59 on Dec 31. Furthermore, the online traffic at the beginning of each draw is very high because consumers want to check whether they have won the lottery. Since this online traffic obviously has nothing to do with the purchasing decision, I exclude the first 3 days of data for each draw. Thus, the total number of periods is $\tau = 6564$ ($\tau$ is not to be confused with $T$, which we have defined in the context of our model). I divide them up into sub-periods, one for each draw. I account for the fact that they differ with respect to the total number of hours ($T$ in the model) and the value to holding a ticket ($\psi$ in the model), and of course with respect to the realized advertising activity.11 The ticket price is constant over time and across draws.

11 $T$ and $\psi$ need to be indexed by the draw, because they differ across draws. For ease of exposition, in Section 3.5, we have described the model only for one draw. Within each draw, $t$ runs from 1 to the draw-specific $T$.

Figure 2.13: Option value: all other cases

[Figure: four panels, each plotting the inclusive value against epsilon0 over the range -3 to 3.]

2.D.2 Method of simulated moments

The set of structural parameters that do not change across draws is $\{\lambda, \sigma_{01}, \sigma_1, \delta, \gamma_1, \gamma_2\}$. In addition, I estimate 16 values $\psi_1, \ldots, \psi_{16}$ to holding a ticket at the time of the draw and 16 costs of visiting the website $c_1, \ldots, c_{16}$. Thus the full set of structural parameters to be estimated is $\theta \equiv \{\lambda, \sigma, \delta, \gamma_1, \gamma_2, \psi_1, \ldots, \psi_{16}, c_1, \ldots, c_{16}\}$.

Let $z_t$ be a vector of exogenous variables constructed from the data (specified in Section 2.D.3 below) and $\hat{q}_t^s(\theta) \equiv q_t^s - \tilde{q}_t^s(\theta)$ be the difference between actual demand $q_t^s$ in the data and the model prediction $\tilde{q}_t^s(\theta)$. Similarly, define $\hat{q}_t^v(\theta) \equiv q_t^v - \tilde{q}_t^v(\theta)$ to be the difference between actual website visits $q_t^v$ in the data and the model prediction $\tilde{q}_t^v(\theta)$. I stack the two quantities together and define $\hat{u}_t(\theta) \equiv q_t - \tilde{q}_t(\theta)$ with

$$q_t \equiv \begin{bmatrix} q_t^s \\ q_t^v \end{bmatrix}.$$

($\tilde{q}_t$ is defined similarly.) In Section 2.D.3 below I specify a set of moments $E[m(z_t, \hat{u}_t(\theta))] = 0$ (where the left hand side is a column vector, the right hand side is a vector of zeros, and the expectation is taken over hours). The (technical) condition for identification is that they hold if, and only if, we evaluate the function $m$ at the true parameters $\theta$ (see for instance Newey and McFadden, 1994).

Let $\bar{m}(\tilde{\theta})$ be the average of $m(z_t, \hat{u}_t(\theta))$ over time in hours across all draws (thus over $\tau$ time periods), evaluated at any candidate parameter vector $\tilde{\theta}$. The SMM estimator is

$$\hat{\theta} = \arg\min_{\tilde{\theta}} \; \bar{m}(\tilde{\theta})' \, W \, \bar{m}(\tilde{\theta}),$$

where $W$ is a positive definite weighting matrix.

Under the assumption that the prediction error is orthogonal to the variables in $z_t$, $\hat{\theta}$ is consistent. An estimator of the variance-covariance matrix is given by (Newey and McFadden, 1994)

$$\widehat{\mathrm{var}}(\hat{\theta}) = \frac{1}{\tau} (A'WA)^{-1} B \, (A'WA)^{-1},$$

where

$$A = \frac{\partial \bar{m}(\hat{\theta})}{\partial \hat{\theta}'}$$

and

$$B = A'W \big(m(\hat{\theta}) - \bar{m}(\hat{\theta})\big)\big(m(\hat{\theta}) - \bar{m}(\hat{\theta})\big)' W A.$$
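The text does not say how the derivative matrix $A$ is obtained. One common choice, shown here purely as an assumption, is a central finite-difference approximation. The function name `numerical_jacobian`, the callable `m_bar` and the step size are hypothetical and serve only to illustrate the object $\partial \bar{m} / \partial \theta'$.

```python
def numerical_jacobian(m_bar, theta, step=1e-6):
    """Central-difference approximation of d m_bar / d theta'.

    m_bar : callable mapping a parameter list to a list of average moments
    theta : list of floats at which the derivative is evaluated
    Returns a list of rows; entry [k][j] is d m_bar_k / d theta_j.
    """
    n_moments = len(m_bar(theta))
    jac = [[0.0] * len(theta) for _ in range(n_moments)]
    for j in range(len(theta)):
        h = step * max(1.0, abs(theta[j]))   # step scaled to the parameter size
        up = list(theta)
        down = list(theta)
        up[j] += h
        down[j] -= h
        m_up, m_down = m_bar(up), m_bar(down)
        for k in range(n_moments):
            jac[k][j] = (m_up[k] - m_down[k]) / (2.0 * h)
    return jac
```

With simulated moments, the simulation draws have to be held fixed across the perturbed evaluations; otherwise simulation noise would dominate the small differences.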

2.D.3 Moments and weighting matrix

$z_t$ contains 3 sets of exogenous variables: a full set of dummy variables for the number of days until the draw, the number of GRP’s in $t$, $t-1$, $t-2$, and $t-3$, and variables calculating cumulative sales up to point $t$. This means that we attempt to pick the parameters so that the model captures well the evolution of sales over time and the reaction to advertisements.

Specifically, I stack all $\hat{u}_t(\theta)$ into a vector $\hat{u}(\theta)$ of dimension $2\tau \times 1$ and define a $2\tau \times 2M$ matrix of exogenous variables $Z$ in the following way:

$$Z = \begin{bmatrix} z & 0 \\ 0 & z \end{bmatrix} \quad \text{and} \quad z = \begin{bmatrix} z_0 & z_1 & z_2 & z_3 \end{bmatrix},$$

with $z_0$ being a vector of ones, $z_1$ containing the time-until-the-draw dummies in the columns, $z_2$ containing GRP’s and lags thereof in the columns, and $z_3$ being a matrix with indicators such that it takes cumulative sales at the daily level, separately for each draw. $z_3$ is block-diagonal with sub-matrices $z_{3,r}$ on the diagonal ($r$ indexing draws). Each column of these sub-matrices is for one day and contains a set of ones on top and zeros at the bottom, such that the cumulative prediction error is calculated on a daily level when we multiply $z_3'$ with $\hat{u}_t(\theta)$. After eliminating linearly dependent columns, $Z$ has $2M = 359 \times 2 = 718$ columns, meaning that there are 718 exogenous variables.12 Using this, I calculate

$$\bar{m}(\tilde{\theta}) = \frac{1}{\tau} Z' \hat{u}(\tilde{\theta}).$$

I choose the weighting matrix $W$ to be

$$W = (Z'Z/\tau)^{-1}.$$

12 $z_0$ has 1 column. $z_1$ originally has 30 columns. $z_2$ contains GRP’s from 5 groups of channels and 3 lags thereof, so it has 5*4=20 columns. $z_3$ has 332 columns. Most columns in $z_1$ are linear combinations of columns in $z_3$. After dropping those, $z_1$ has 7 columns left. Thus, there are in total 1 + 6 + 20 + 332 = 359 columns.
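To make the last two formulas of Section 2.D.3 concrete, the following sketch computes $\bar{m} = Z'\hat{u}/\tau$ and the objective $\bar{m}'W\bar{m}$ with $W = (Z'Z/\tau)^{-1}$ in plain Python. It is a minimal illustration under assumptions: all function names are hypothetical, matrices are stored as lists of rows, and the linear system is solved directly instead of forming $W$ explicitly.

```python
def moment_vector(Z, u_hat, n_periods):
    """m_bar = Z'u_hat / n_periods, with Z given as a list of rows."""
    n_cols = len(Z[0])
    return [sum(row[k] * u for row, u in zip(Z, u_hat)) / n_periods
            for k in range(n_cols)]


def scaled_gram(Z, n_periods):
    """Z'Z / n_periods; its inverse is the weighting matrix W."""
    n_cols = len(Z[0])
    return [[sum(row[j] * row[k] for row in Z) / n_periods
             for k in range(n_cols)] for j in range(n_cols)]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("Z'Z is singular: drop linearly dependent columns")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def smm_objective(Z, u_hat, n_periods):
    """m_bar' W m_bar with W = (Z'Z / n_periods)^(-1)."""
    m_bar = moment_vector(Z, u_hat, n_periods)
    w_m = solve(scaled_gram(Z, n_periods), m_bar)   # equals W m_bar
    return sum(m * w for m, w in zip(m_bar, w_m))
```

The singularity check mirrors the step in the text where linearly dependent columns are eliminated before the weighting matrix is formed. With this choice of $W$, the objective equals the squared length of the projection of $\hat{u}$ onto the columns of $Z$, divided by $\tau$, so prediction errors that are orthogonal to every instrument give an objective of zero.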

Chapter 3

Advertising as a reminder: Evidence from the Dutch State Lottery

3.1 Introduction

In 2016, global advertising spending amounted to 493 billion US dollars.1 Yet, it remains a challenge to measure the effects of advertising and characterize the underlying mechanism through which it affects consumer behavior (Lewis and Rao, 2015).

The consensus that has emerged in the literature is that conceptually, firms pursue a combination of goals when advertising: they either aim at conveying information about the existence, characteristics and prices of products; or they wish to positively influence the inclination of consumers who already know about their products to buy them.2 In this paper, we present novel empirical evidence that is in line with the view that in addition to these well-received ways of working, advertising can act as a reminder. The underlying idea is that consumers have limited attention and may therefore value being reminded.

Specifically, we use high frequency data on TV and radio advertising together with online sales data for lottery tickets to measure the short run effects of advertising. The high frequency nature of our data allows us to credibly identify advertising effects. The exact timing of advertisements is beyond the control of the firm and therefore, the thought experiment we can undertake is to compare sales just before the advertisement was aired to sales right after this. Our setup is well-suited to study reminder effects, because there are recurrent deadlines within a year, consumers are well-aware of the product and its characteristics, and there is no other closely competing product that is offered to them.

We find the short run effects of advertising to be sizable. They last up to about 4 hours and are the bigger the less time there is until the draw, consistent with our interpretation that advertisements indeed remind consumers to buy a ticket and that consumers value this. The underlying idea is that consumers enjoy the benefits of buying a ticket mainly at a later point in time, on

1 Taken from a report by Letang and Stillman (2016).
2 See Bagwell (2007) for an excellent survey on the economics of advertising.

the day of the draw, which is why they prefer to be reminded later and then react more strongly to it. This argument is developed in Section 3.4.3.

An important related question is whether advertising only leads to purchase acceleration (individuals buying earlier rather than later) or also to market expansion (more people buying in total). In order to provide model-free evidence on this, we point out that if advertising has a short-run effect until the end of the period in which tickets can be bought, then it must be the case that it also leads to market expansion.3 We find this to be the case.

After presenting this model-free evidence, we point out that in terms of timing the interests of the firm that is advertising and of the consumers are aligned: consumers wish to be reminded in a way that makes them most likely to consider buying a lottery ticket. The tradeoff they face is that on the one hand, if the firm allocates all the advertising very late, then it may not reach certain consumers, for instance because they will not watch TV on these days; on the other hand, if it spreads advertising expenditures out over time in order to reach more consumers, then advertising effects may be smaller because consumers do not want to buy too early. This means that total sales crucially depend on the dynamic advertising strategy and therefore, it would be valuable to quantify the dependence of sales on counterfactual advertising strategies. For this, we then develop a tractable structural model of consumer behavior that we estimate. Our counterfactual simulations suggest that relative to the actual schedule it would be profitable for the firm and also valued by the consumer if advertising were shifted towards the last days before the draw. Besides, the model is useful to formalize the idea that advertisements act as a reminder.
We think of consumers, at a given point in time, as either buying a ticket for the following draw, or postponing the decision to do so to a later point in time, with the possibility that they either forget to buy a ticket or consciously decide not to do so. In each period, there is a consideration stage. The probability to consider buying a ticket, i.e. to compare the value from buying the ticket and the value of waiting, depends on an advertising goodwill stock that depreciates over time. When estimating the model we pay particular attention to the fact that at a given point in time goodwill stocks will be heterogeneous across ex ante identical consumers, because they depend on heterogeneous but unobserved viewing behavior.

Our paper relates to several strands of the literature. The overarching theme is that quantifying the effects of advertising and shedding light on the exact mechanism through which it affects consumer behavior remains challenging, even using large-scale field experiments (Lewis and Rao, 2015); but novel data sources and innovative empirical designs have allowed researchers to measure advertising effects in credible ways and also shed additional light on the underlying mechanism. Overall, given the importance of the advertising industry, there is relatively little empirical work on the topic in economics.

The idea that advertising may serve as a reminder has appeared in the context of a debate on the optimal number of times a consumer should be reached. Krugman (1972) argued that consumers

3 We would like to thank Martin Peitz for this suggestion.

need to first understand the nature of the stimulus, then evaluate the personal relevance, and finally are reminded to buy when they are in a position to do so. So, he concluded that they should be reached at least three times. The underlying way of thinking about consumers is related to models of limited attention that, according to Kahneman (1973) and others, may originate in limits of information processing power and therefore may lead to forgetting. We add to this by presenting novel empirical evidence that is in line with that view.

Ackerberg (2001, 2003) also focuses on the mechanism by which advertising influences consumer behavior. His aim is to empirically distinguish between advertisements being effective because they are informative vis-à-vis them being effective because they increase the valuation for the brand. He finds support mainly for the former. We complement this by presenting evidence that is in line with the view that advertisements act as a reminder.

Lodish et al. (1995) study the effectiveness of TV advertising and document a combination of no and positive effects. Hu et al. (2007) find that the effects have increased in later years. We contribute to this literature by using high frequency data to show that advertising has significant effects on online sales and that it leads to market expansion.

A number of recent studies shed light on the relationship between TV advertising and behavior online. Joo et al. (2015) find that there is a significant effect of TV advertising on consumers’ tendency to search online. Lewis and Reiley (2013) study the effects of Super Bowl advertising on online search behavior. They find large spikes in search behavior related to the advertiser or product within 15 seconds following the conclusion of the TV commercial. Stephens-Davidowitz et al. (2017) exploit a natural experiment and find that advertising has a positive effect on searches and on the demand for movie tickets on the opening weekend. Du et al.
(2017) characterize how the effects of advertising on online searches depend on advertisement content, media-contextual factors, and brand. Both Liaukonyte et al. (2015) and our paper complement these papers with evidence from high-frequency advertising and sales (as opposed to search or low-frequency sales) data.

The model we estimate features an advertising goodwill stock as in the model by Dubé et al. (2005). They estimate a static model and focus on interesting dynamics on the supply side. Another difference is that in our model, advertising affects the probability of considering buying a ticket at a given point in time and not the flow utility associated with buying. Sovinsky Goeree (2008) and Draganska and Klapper (2011) estimate static models with a consideration stage using aggregate level data. Our model is dynamic and consumers decide at each point in time whether to buy a ticket or wait. Melnikov (2013) and De Groote and Verboven (2016) estimate similar models. Their respective models do not, however, feature a consideration stage in which advertising has an effect. So, our modeling contribution lies in proposing a model in which advertising can naturally be thought of as acting as a reminder because it affects the probability to consider buying through an advertising goodwill stock in a dynamic decision context.

The rest of this paper is structured as follows. Section 3.2 gives a brief overview of the

market for lottery tickets in the Netherlands. Section 3.3 describes the data and provides descriptive statistics. Section 3.4 shows reduced-form evidence on the effect of advertising on sales. Section 3.5 develops our model of lottery ticket demand with advertising effects. Section 3.6 presents the results. Section 3.7 performs counterfactual experiments for the supply side, and Section 3.8 concludes by pointing towards other situations in which our model could be used, including public policy. The (intended) Online Appendix is attached at the very end. Appendix 3.A provides details on the structural estimation procedure, Appendix 3.B contains robustness checks for the structural analysis, and Appendix 4.A contains additional tables and figures.

3.2 The market for lottery tickets in the Netherlands

The market for lottery tickets in the Netherlands is very concentrated, with three organizations conducting different types of lotteries. First, the Stichting Exploitatie Nederlandse Staatsloterij, from which we received the data, offers lottery tickets for The Dutch State Lottery (in Dutch: Staatsloterij) and the Millions Game (Miljoenenspel). Staatsloterij has a history going back to the year 1726 and is run by the government. It is by far the biggest of its kind in the Netherlands. The second player is De Lotto. It offers the Lotto Game (Lottospel), which is comparable but much smaller in size, next to other games such as Eurojackpot and Scratch Tickets (Krasloten) and sports betting. In 2016, these two organizations merged. The third player is Nationale Goede Doelen Loterijen offering a ZIP Code Lottery (Postcodeloterij), whose main purpose is to donate money to charity. For that reason, it is not directly comparable to the other two lotteries.4

The lottery run by Staatsloterij is classical. A ticket has a combination of numbers and Arabic letters and a consumer can choose some of them. The size of the prize then depends on how many numbers and letters of a ticket match the ones of the winning combination. On top of that, there is a jackpot whose size varies over time. For all draws but the very last one in a year, consumers can choose between a full ticket that costs 15 euros and multiples of one fifth of a ticket. For the last draw, the price of a ticket is 15 euros and consumers can buy multiples of one half of a ticket. Winning amounts are then scaled accordingly.

The tickets can be purchased in two ways: they can either be purchased online via the official website of Staatsloterij, or offline, for example, in a supermarket or a gas station. Most of the sales are offline, but nevertheless the online business is considered important. There are 16 draws in a calendar year. 12 of them are regular draws and 4 of them are special draws.
Regular draws take place on the 10th of every month. The dates of 4 additional special

4 In 2014, Staatsloterij had a turnover of 738 million euros with 579 million euros related to its lottery and De Lotto of 322 million euros with 144 million euros related to its lottery (https://over.nederlandseloterij.nl/over-ons/publicaties, accessed May 2018). The turnover of Nationale Goede Doelen Loterijen was 847 million euros in total with 624 million euros related to its charity ZIP code lottery (https://view.publitas.com/nationale-postcode-loterij-nv/npl-jaarverslag-2014/page/58-59, accessed February 2016).

draws vary slightly from year to year. In 2014 (the year for which we have data), the 4 special draws were on April 26 (King’s Day in the Netherlands), on June 24, October 1 and on December 31 (the New Year’s Eve draw). All draws but the last in a year take place at 8pm (Central European Time). From 6pm onward, no more tickets can be bought for that draw.

3.3 Data and descriptive statistics

3.3.1 Overview

Our data are for 2014 and consist of 3 parts: online transactions, TV and radio advertising, and jackpot sizes. The transaction data are collected at the minute level. We observe the number of lottery tickets sold online.5 The advertising data consist of minute-level measurements of gross rating points (GRP’s), separately for TV and radio advertising. GRP’s measure impressions as a percentage of the target population at a given point in time. For example, 5 GRP’s in our data mean that in that minute 5 percent of the target population (in our case the general population) are exposed to an advertisement. This is a standard measure in the advertising industry. Besides, we observe the jackpot size for the 12 regular draws in 2014. There is no jackpot size for the 4 special draws, as more involved rules apply to them. For example, on the drawing day, every 15 minutes consumers can win an additional 100,000 euros. In the empirical analysis, we will capture differences across draws in a flexible way.

We are not allowed to report levels of sales and advertising. Therefore, we will only present relative numbers and (semi-) elasticities in the tables and figures below, and some vertical axes will have no units of measurement. Of course, we will still use these data when conducting the analysis.

3.3.2 Descriptive evidence

Figure 3.1 shows cumulative sales for 6 selected regular draws against the time until the draw, together with the respective jackpot size.6 Some of the draws take place one full month after the previous draw, while others take place after less than a month. For example, the draw on July 10 follows the one on June 24 and therefore the line for the draw on July 10 is only from

5 These data have been collected using Google Analytics. In particular, visits to the “exit page” confirming payment have been recorded. This means that we do not observe what type of ticket a consumer has bought. Advertising also affects offline sales, and therefore, ideally, we would also like to observe the number of lottery tickets sold offline. However, offline transactions are not observed in the dataset. At the same time, it generally takes longer until an offline sale takes place after an individual listens to a radio advertisement or sees a TV advertisement. At the minimum, this will be the time it takes between listening to a radio commercial in the car and buying a ticket in a shop. Therefore, it will be much more challenging to measure advertising effects in offline data, a challenge we try to overcome with our high frequency online sales data. For the interpretation of our results below we focus on online sales.
6 Patterns for the other draws are similar. See Figure 3.10 in the Online Appendix for the remaining draws.

Figure 3.1: Cumulative sales for selected draws

Notes: This figure shows cumulative sales for 6 selected regular draws. The respective jackpot amounts are given in the legend. See Figure 3.10 in the Online Appendix for the remaining draws.

June 24 (6:00 pm) to July 10 (5:59 pm). We do not expect this to have big effects, however, because most tickets are sold in the week before the draw.7

The figure shows that across draws there is a positive relationship between jackpot size and sales (that is, cumulative sales on the day of the draw). The draw on July 10 has the largest total sales of the 6 draws. It also has the largest jackpot size. The second largest sales are for the draw on June 10, which also has the second largest jackpot size. However, in general, it is not true that a larger jackpot size always implies larger total sales. We further explore differences across draws by regressing the log of the total number of tickets sold online on the log of the jackpot size and the total number of days between the date of the previous and current draw.8 Obviously, we only have 16 observations and jackpot size only varies among the 12 regular draws. Nevertheless, we find a significant relationship between jackpot size and sales. We estimate the effect of a 1 percent increase in the jackpot size to be a 0.4 percent increase in total sales. We find no significant effects of lagged variables on sales.

Figure 3.2 shows the pattern of sales and GRP’s across different hours of a day. We average over all days in 2014 except for the days of the draw. The reason for this is that the time until which tickets can be bought is 6pm and we observe that a large amount of sales occurs during the hours before 6pm. At the same time, we observe that sales are unusually low in the first

7 We nevertheless take this into account in our analysis.
8 See Table 3.5 in Appendix 4.A for details.

Figure 3.2: Advertising and sales during the day

[Figure: two panels by hour of the day (0 to 23). One panel shows average GRP, separately for radio and TV; the other shows average transactions.]

Notes: This figure shows average GRP’s and sales for different times of the day. To produce this figure we first aggregate sales at the hourly level and then average over days and draws. We exclude the respective day of the draw because tickets can only be bought until 6pm on that day and there is a lot of advertising activity just before this deadline. See Figure 3.11 for the pattern on the day of the draw.

several hours after 6pm on the day of the draw, as one would expect. So, by excluding those 16 drawing days, we can get a cleaner picture of how sales and GRP’s are distributed over time during a typical day.

We distinguish between radio and TV advertisements. TV advertisements are concentrated during evening and night hours, while radio advertisements are more likely to be aired in the morning and in the afternoon. This clear separation is due to the fact that in the Netherlands TV advertisements related to gambling must not be aired during the day time, until 7pm.9

Figure 3.2 shows that GRP’s are positively correlated with sales. During the hours in which sales are high, GRP’s are also high. However, this does not necessarily mean that advertising has positive effects, because GRP’s have not been assigned randomly. For instance, it could be that consumers have more time in the evening and are therefore more likely to buy a lottery ticket anyway.10

Next, Figure 3.3 shows GRP’s and sales at the minute level for one regular draw.11 We see that

9 We have tried to exploit this regression discontinuity design to produce estimates of advertising effects. However, it turns out to be difficult to distinguish the discontinuity in the total number of GRP’s from a flexible time trend. The reason is that the number of GRP’s increases in a continuous manner between 7pm and 9pm and does not jump sharply to a high level right after 7pm.
10 This is a well-known challenge for the analysis of advertising effects. Our identification strategy for measuring the effects of advertising is akin to a regression discontinuity design and described in Section 3.4 below.
11 Figure 3.12 in the Online Appendix shows GRP’s and sales for the special draw on April 26. Patterns are similar.

Figure 3.3: GRP’s at the minute-level for a regular draw

[Figure: two panels against days since previous draw (0 to 30). One panel shows GRP per minute; the other shows transactions per minute.]

Notes: This figure shows GRP’s and sales at the minute level, for the regular draw on April 10, 2014. Tickets for the next draw can be bought from 6pm on the day of the previous draw, which is depicted as 0 days since the previous draw.

the firm starts advertising on the 17th day after the last regular draw, while sales only increase in the last days before the draw. This is already a first indication that advertising effects are low before those last days.

Finally, Figure 3.4 zooms in further and shows the pattern for one of the days in Figure 3.3. Related to our identification strategy described below, it is interesting to notice that the raw data presented in Figure 3.4 already show some evidence of short run sales responses to advertising. For example, there are some spikes of GRP’s just before 20:50, followed by spikes of sales several minutes later. In the following section, we investigate this more systematically and show the dependence of advertising effects on the time until the draw.

3.4 Evidence on the effect of advertising

In this section, we empirically characterize the short term effects of advertising and how they depend on the time until the draw.

In general, a challenge for the estimation of advertising effects is that sales and advertising are recorded at a low frequency, such as a week or a month. For that reason, they may be confounded by factors unobserved to the econometrician. This then leads to a positive correlation between the two even if advertising effects are zero. Consequently, a regression of sales on

57 Figure 3.4: GRP’s at the minute-level for a short time window GRP per minute per GRP

18:03 19:26 20:50 22:13 23:36 transactions per minute per transactions

18:03 19:26 20:50 22:13 23:36 time

Notes: This figure shows GRP’s and sales at the minute level for a short time window on April 3, 2014. the amount of advertising will lead upward-biased estimates of advertising effects even if one controls for month or week dummies. Here, we overcome this challenge by exploiting the high frequency nature of our data. There are two sources of exogenous variation. The first is related to the fact that advertising buying takes place several weeks in advance. The company specifies, among other things, a time window that is at least several hours long and a target amount of advertising during that time window.12 This means that the exact timing of advertising is not controlled by the firm. The second source of exogenous variation is that for a given time in the future, it is uncertain how many viewers will be reached, as viewership demand depends on many factors other than the TV schedule, for instance the weather. This means that the target quantity bought by the firm is allocated to multiple spots, until the amount of advertising that was actually bought has been provided (see also Dubé et al., 2005). Consequently, once we control for all factors that drive advertising buying from an ex ante perspective we can estimate advertising effects by regressing sales on (lags of) advertising exposure. In practice this amounts to controlling for draw, days to the draw, and hour-of-day dummies We can also control for these confounding factors by means of a fixed effect for each time interval around an advertisement. The idea is then that the variation in advertising within this time window is random. This identification strategy is akin to the one in a regression discontinuity design: average sales

12It is in principle possible for the firm to buy specific spots. However, Staatsloterij did generally not do so because the price for those is higher.

just before the advertisement can be interpreted as a baseline. The average difference between actual sales after the advertisement has been aired and those sales can therefore be interpreted as an estimate of the average effect of the advertisement. Below we use a variety of different specifications that are all variants of this strategy. We first use data at the minute level to present direct evidence for a selected set of advertisements. Thereafter, we estimate a distributed lag model at the minute level, controlling for time effects in a very flexible way. Then, we aggregate the data to the hourly level to verify that estimated effects are similar. Finally, we provide evidence on the dependence of advertising effects on the time until the draw. Our key finding is that advertising effects are stronger the later an advertisement is aired. In Section 3.6.2 we develop the argument that this finding is in line with the idea that advertisements act as a reminder.

3.4.1 Direct evidence for big advertisements

In our data, there are a number of relatively small advertisements. This means that there is often only a short amount of time between advertisements, which makes providing direct evidence on the effect of advertisements challenging, as advertising effects may overlay each other. Our first approach to overcome this challenge is to select advertisements with at least 9 GRP and then only keep the ones out of these advertisements for which we do not see another big advertisement in the hour before and after.13 Figure 3.13 in Appendix 4.A shows which advertisements were used.

Then, we regress sales divided by the average number of sales in the hour before the advertisement was aired on time to and since the advertisement, respectively, using two separate local polynomial regressions. Figure 3.5 shows the resulting plot of relative sales against the time to and since the advertisement was aired. Notice that sales are flat in the 60 minutes before the advertisement was aired, in line with the idea that these constitute a baseline that can be extrapolated. The dashed line denotes average sales before the advertisement was aired. Assuming that this is indeed the baseline against which sales have to be compared after the advertisement was aired, we find that the effect of a big advertisement is an increase in sales that lasts for about 30 minutes. The effect is fairly immediate and dies out relatively quickly. It is as high as 60 percent after a few minutes and overall leads to an increase of sales by 17 percent in the hour after it is aired.14 To provide more systematic evidence without selecting advertisements, we next estimate a distributed lag model, still using minute-level data.
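The normalization step behind Figure 3.5 can be illustrated with a minimal Python sketch. It only computes sales relative to the average in the hour before an advertisement; the local polynomial smoothing is omitted. The function name `relative_sales` and the sales series are hypothetical and not taken from the data.

```python
def relative_sales(sales, ad_minute, window=60):
    """Sales in the `window` minutes before and after `ad_minute`, divided by
    the average sales in the `window` minutes before the advertisement."""
    before = sales[ad_minute - window:ad_minute]
    baseline = sum(before) / len(before)
    return [s / baseline for s in sales[ad_minute - window:ad_minute + window]]

# Made-up minute-level series: 10 sales per minute, rising to 15 for the
# 30 minutes after an advertisement aired at minute 120.
sales = [10.0] * 240
for minute in range(120, 150):
    sales[minute] = 15.0

rel = relative_sales(sales, ad_minute=120)
hour_after = rel[60:]
# Average relative increase in the hour after the advertisement.
avg_lift = sum(hour_after) / len(hour_after) - 1.0
```

In this toy series the pre-advertisement values are all equal to one, which mirrors the flat baseline the text relies on when extrapolating.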

13Results were similar when we only kept advertisements of sizes bigger than 9 GRP. However, this results in even more selected samples. We also experimented with smaller advertisements but found that effects for those are not measurable in this direct way.

14Note, however, that this is a highly selected set of very big advertisements. The effect of an “average” advertisement is expected to be much lower. We have been told that an effect of an increase in sales by 1 or 2 percent for a typical advertisement is already considered big in the industry.

Figure 3.5: The effect of advertising on sales for big advertisements

[Figure: sales relative to average before advertisement (vertical axis, 0.8 to 1.8) plotted against minutes to/since advertisement (horizontal axis, -60 to 60).]

Notes: This figure shows the effect of advertising on sales, relative to average sales in the hour before the advertisement. Obtained using separate local polynomial regressions for the time to and since the advertisement was aired, respectively. We used a fourth-order polynomial and the rule-of-thumb bandwidth. The shaded area depicts pointwise 95 percent confidence intervals. See text for additional details.

3.4.2 Evidence from a distributed lag model

A distributed lag model is a model in which we regress sales on lagged amounts of advertising. We control for draw, time of the day and days until the draw fixed effects. It is important to control for these time effects in a flexible way, because they would otherwise confound advertising and sales: there are periods in which sales are naturally higher and during which the firm also advertises more, for instance in the last days before the draw. This means that we control for systematic variation in advertising and sales. We would expect an upward bias in our estimates if we did not control for this variation, because we expect the amount of advertising to be higher at the times at which sales are high anyway. After controlling for those time effects, as explained above, we assume that the remaining variation in the amount of advertising is random, which allows us to give our estimates a causal interpretation.15 Table 3.1 shows the results when we use the log of one plus sales as the dependent variable.16
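To illustrate why the time controls matter, the following minimal sketch compares a regression without controls to one with a single fixed effect. The data are made up and noise-free: the firm advertises more in the period with higher baseline sales, and the true advertising effect is set to 0.02 by construction. All names and numbers are illustrative assumptions, and the actual specification has several lags and several sets of dummies.

```python
def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x (with intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def demean_by_group(values, groups):
    """Subtract the group mean from each value (within transformation)."""
    sums, counts = {}, {}
    for v, g in zip(values, groups):
        sums[g] = sums.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]

# Group 1 is a high-sales period in which the firm also advertises more.
groups = [0, 0, 1, 1]
grp = [0.0, 1.0, 2.0, 3.0]
baseline = {0: 1.0, 1: 2.0}      # log baseline sales per period (hypothetical)
true_effect = 0.02
log_sales = [baseline[g] + true_effect * x for g, x in zip(groups, grp)]

naive = ols_slope(grp, log_sales)                      # no time controls
within = ols_slope(demean_by_group(grp, groups),
                   demean_by_group(log_sales, groups))  # with period fixed effect
```

Without the fixed effect the slope picks up the difference in baseline sales between periods and is far above 0.02; the within estimate recovers the effect that was built into the data.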

15Note that the empirical strategy we use here is similar, but slightly different from the one we used in Section 3.4.1 above. The two strategies have in common that we assume that the exact timing of advertising is random conditional on time effects. In Section 3.4.1 we control for time effects by dividing by the respective number of sales before the advertisements were aired. This is akin to an approach with multiplicative fixed effects in levels or additive fixed effects in logs.

16We have also experimented with the pure level of sales. However, we found that the effect of advertising is better captured by this specification (in an $R^2$ sense). We use the log of one plus sales because there are hours in which sales are zero. We will interpret results as being (approximately) percentage changes. This is slightly worse

Table 3.1: The effect of advertising on sales

                                        (1)               (2)
                                   log(1+visits)     log(1+sales)

GRP between 0 and 4 minutes ago      0.0241***         0.0167***
                                    (0.000918)        (0.00106)
5 and 9 minutes                      0.0286***         0.0352***
                                    (0.000832)        (0.00106)
10 and 14 minutes                    0.0106***         0.0382***
                                    (0.000650)        (0.000923)
15 and 19 minutes                    0.00931***        0.0286***
                                    (0.000661)        (0.000966)
20 and 24 minutes                    0.00918***        0.0239***
                                    (0.000682)        (0.000969)
25 and 29 minutes                    0.00908***        0.0209***
                                    (0.000708)        (0.00105)
0.5 and 1 hour                       0.00727***        0.0164***
                                    (0.000295)        (0.000420)
1 and 1.5 hours                      0.00635***        0.0111***
                                    (0.000297)        (0.000413)
1.5 and 2 hours                      0.00479***        0.00871***
                                    (0.000292)        (0.000423)
2 and 2.5 hours                      0.00345***        0.00310***
                                    (0.000301)        (0.000366)
2.5 and 3 hours                      0.000884**       -0.000795*
                                    (0.000297)        (0.000349)
3 and 3.5 hours                     -0.000898**       -0.00565***
                                    (0.000287)        (0.000322)
3.5 and 4 hours                     -0.00499***       -0.00940***
                                    (0.000286)        (0.000322)

draw dummies                         Yes               Yes
days to draw dummies                 Yes               Yes
hour dummies                         Yes               Yes
Observations                         441223            441223
R2                                   0.841             0.655

Standard errors in parentheses

* p < 0.05, ** p < 0.01, *** p < 0.001

Notes: This table shows the results of regressions of the log of one plus sales on GRP’s of advertising and lags thereof. Regressions were carried out at the minute level and standard errors are robust to heteroskedasticity. Regressions separately for TV and radio advertising are shown in Table 4.6 in Appendix 4.A.

Column (1) is for our baseline specification. We find that the effect of advertising increases until 10 to 14 minutes after the advertisement was aired and then decreases. The main effect is observed in the first hour, but there are effects thereafter. The maximal effect is an increase in sales of about 3.7 percent for each additional GRP of advertising, between 10 and 14 minutes after the advertisement was aired. The total effect of advertising is an increase of sales by about 2 percent of the baseline sales in one hour.17 Moving to column (2) and (3), we find percentage increases in sales to be higher before the last week. This is measured from a lower baseline: sales are 8.1 times higher in the last week (see also Figure 3.1 that shows cumulative sales). Using this, we find that the absolute effect of advertising in the first hour after the advertisement was aired is about 3.5 times higher in the last week before the draw. This dependence of advertising effects on the time until the draw is closely related to advertisements acting as a reminder. In Section 3.4.3, we will characterize it in more detail. Finally, in the last column, we carry out the same regression as in column (1), except that we do not control for time. As explained before, this should lead to upward-biased estimates as we then face an endogeneity problem, because sales and advertising are jointly determined. And indeed, coefficient estimates are much higher.

Table 4.6 in Appendix 4.A shows results from a specification in which we distinguish between TV and radio advertisements. We find the effects of TV advertising to be stronger, but to die out faster. Radio advertisements have a longer lasting effect. This could be related to the fact that it takes time to actually purchase a ticket. When seeing a TV advertisement, an individual may buy directly using his smart phone, sitting in front of the TV. By contrast, when listening to a radio advertisement she could be driving her car or be occupied with something else and buy the ticket at the next occasion after having finished her ride or her task. Notice that the overall effects are nevertheless similar in terms of size and therefore we generally don’t distinguish between TV and radio advertisements in this paper.

We have also estimated similar models using data aggregated to the hourly level. Table 3.7 in Appendix 4.A shows the results. We find that advertising has a similar effect in the hour in which it is aired as it has in the following hour: on average, one GRP of advertising leads to about a 1.2 percent increase in the amount of tickets sold. The effect is about one third of this two hours after the advertisement was aired and not significantly different from zero (at the 5 percent level) after 3 hours. Comparing Table 3.1 to Table 3.7 shows that aggregating the data to the hourly level does not seem to have a big impact on the effect of advertising that we estimate: focusing on the baseline specification, we find that the effect for the first hour that we estimate in the minute-level regressions is about 2 percent on average, and about 1 percent in the second hour. This is important, because it would not be feasible to estimate a structural model at the minute level.

(Footnote 16, continued) an approximation than in the case of the pure natural logarithm. To see this, denote sales without the advertisement by $\text{sales}^0_t$ and with an advertisement by $\text{sales}^1_t$. Then, if, say, $\log(1 + \text{sales}^1_t) - \log(1 + \text{sales}^0_t) = 0.4$, one can calculate that the increase in sales is about 50 percent provided that sales are above 2.

17This can be calculated as the weighted average of the reported coefficients, where the weights are proportional to the length of the captured time interval.
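The weighted-average calculation described in footnote 17 can be reproduced as a short sketch. It uses the coefficients from the log(1+sales) column of Table 3.1; the choice of that column and the variable names are assumptions made here for illustration.

```python
# (interval length in minutes, coefficient) from the log(1+sales) column of Table 3.1
first_hour = [
    (5, 0.0167),   # 0 to 4 minutes ago
    (5, 0.0352),   # 5 to 9 minutes
    (5, 0.0382),   # 10 to 14 minutes
    (5, 0.0286),   # 15 to 19 minutes
    (5, 0.0239),   # 20 to 24 minutes
    (5, 0.0209),   # 25 to 29 minutes
    (30, 0.0164),  # 0.5 to 1 hour
]
second_hour = [
    (30, 0.0111),   # 1 to 1.5 hours
    (30, 0.00871),  # 1.5 to 2 hours
]

def weighted_average(intervals):
    """Average coefficient, weighted by the length of each time interval."""
    total_minutes = sum(length for length, _ in intervals)
    return sum(length * coef for length, coef in intervals) / total_minutes

effect_first_hour = weighted_average(first_hour)
effect_second_hour = weighted_average(second_hour)
```

With these inputs the first-hour average is roughly 0.022 and the second-hour average roughly 0.010, in line with the figures of about 2 percent and about 1 percent quoted in the text.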

3.4.3 The dependence of the effect of advertising on time

So far, we have shown that advertising effects are measurable using our data. We have presented first evidence pointing towards them being stronger the less time there is until the draw. This dependence of advertising effects on the time until the draw is related to advertisements acting as a reminder. We now develop this argument more systematically and then characterize the relationship between advertising effects and the time until the draw in more detail.

The way we think about consumers is that they are subject to limits of information processing power and attention, which may lead to forgetting. Once they think about buying a lottery ticket, they weigh the costs of doing so at that point in time against the benefits. Costs here can be both monetary and non-monetary, and may also include effort costs. Benefits are delayed, because the draw will only take place in the future, while costs are immediate. For that reason, consumers value being reminded to buy a ticket as late as possible. In addition, when reminded early and therefore considering buying a ticket early, they will be more reluctant to do so the more likely it is that they will consider buying a ticket in the future. After all, they are still able to buy in the future. This means that they will be more likely to buy, once reminded, when there is a substantial risk that they will not consider buying a ticket in the future. This in turn means that advertising effects will be strongest right before the deadline, because that is the last time at which they can buy a ticket if they have not done so already. Moreover, advertising effects will tend to decrease in the time until the deadline.

This way of reasoning is fully compatible with the structural model we propose in Section 3.5. The model can be seen as a formal version of the above argument. We estimate the structural parameters of this model and then use it to predict sales for alternative counterfactual advertising strategies. One of the model properties is that advertising effects depend on the time until the draw (see Figure 3.6 below).

To characterize the relationship between advertising effects and the time until the draw in more detail, ideally, we would estimate a different response curve for every day, but this is not feasible. Therefore, we instead estimate the immediate absolute effect of advertising on ticket sales and relate it to the number of days that are left until the draw. For this, we aggregate data to the hourly level and take first differences to control for patterns in baseline sales. We specify

$$\text{sales}_t - \text{sales}_{t-1} = \beta_0 + \beta_1 \cdot (\text{grp}_t - \text{grp}_{t-1}) + \varepsilon_t \qquad (3.1)$$

where $\text{sales}_t$ is the number of tickets sold in hour $t$ and $\text{grp}_t$ is the number of GRP’s of advertising in $t$. We set GRP’s to 0 if they are below 3, in order to single out advertisements that are big enough to have a measurable impact.18 Moreover, guided by the finding in the previous subsection that the advertising effect lasts for about 4 hours, we drop observations where we see more than 3 GRP’s of advertising in any of the four hours prior to that, which means that also $\text{grp}_{t-1} = 0$ in (3.1). Thereby, we ensure that advertising effects of previous instances have died out. Hence, the coefficient $\beta_1$ that we are estimating is the immediate increase in sales in response to increasing the GRP’s by one, relative to sales before when there was no advertising. We run a separate regression for each day until the draw. We also control for draw and day of the week fixed effects to allow for differences in time trends across those.19 In Figure 3.6, we plot the estimated effects and the corresponding 95% confidence intervals against the number of days until the draw.20 Towards the time of the draw, the effects increase, in line with the idea that advertisements act as a reminder.

Figure 3.6: Effect of timing

[Figure: estimated effect plotted against the number of days until draw (horizontal axis, 20 to 0).]

Notes: This figure shows the immediate effect of one GRP of advertising on sales by day until the draw, for the last 21 days. See text for details.

We can use this empirical setup to make an additional observation. In general, if advertisements have an effect, then it could either be that consumers are motivated to buy earlier, but would have bought anyway (purchase acceleration), or that consumers who would otherwise not have bought decide to buy (market expansion). Usually, it is challenging to empirically tell these apart. To a large extent, this is the case because typically, consumers always have the possibility to buy a product later. However, in our case, there is a fixed ending time up to which lottery tickets can be bought. This provides us with the opportunity to study whether advertising also has an effect until shortly before the draw, which is what we find. This is direct evidence suggesting that advertising does not only lead to purchase acceleration but also to market expansion.

To summarize, exploiting the high frequency nature of our data, we have shown that advertising leads to economically sizable direct effects on sales in the order of a 2 percent increase. The absolute effect of advertising on sales is higher the less time there is until the draw, in line with the idea that advertisements act as a reminder. Our results also show that advertising does not only lead to purchase acceleration, but also to market expansion.

18There are many very small advertisements. Those small advertisements will lead to small increases in sales that we ignore. For that reason, the specification we use here is conservative because the estimated effects are lower bounds.

19Recall that the dependent variable is the difference in sales over time.

20We have also tried to “zoom in” and show that the effect is there in the very last hours before the draw, but we only have data for 16 draws, with a limited number of advertisements in the last hours before the draw.
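The first-difference specification (3.1), including the two sample restrictions described above, can be sketched as follows. The hourly series is made up and noise-free, with a linear trend in baseline sales and an effect of 5 tickets per GRP built in; the fixed effects and the separate regressions per day until the draw are left out. All names and numbers are illustrative assumptions.

```python
def ols(x, y):
    """Intercept and slope of a simple OLS regression of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

# Hypothetical hourly GRP's; the small advertisement (2 GRP) is set to zero.
grp_raw = [0, 0, 4, 0, 0, 0, 0, 6, 0, 0, 0, 0, 0, 2, 0, 0]
grp = [g if g >= 3 else 0 for g in grp_raw]

# Made-up sales: linear baseline trend plus 5 tickets per GRP.
sales = [100 + 2 * t + 5 * g for t, g in enumerate(grp)]

d_sales, d_grp = [], []
for t in range(1, len(grp)):
    # Drop hours with more than 3 GRP's in any of the four preceding hours.
    if any(g > 3 for g in grp_raw[max(0, t - 4):t]):
        continue
    d_sales.append(sales[t] - sales[t - 1])
    d_grp.append(grp[t] - grp[t - 1])

b0, b1 = ols(d_grp, d_sales)
```

Differencing moves the linear baseline trend into the intercept, so the slope recovers the built-in effect of 5 in this toy example.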

3.5 A model of lottery ticket demand

Informed by the model-free evidence, we now spell out our dynamic structural model of ticket sales. The model is useful to rigorously describe the idea that advertisements act as a reminder. Moreover, while the data are informative about short run effects of advertising, inferring medium run effects directly is challenging because the necessary exogenous variation is typically missing. A model can be used to quantify these effects. Obviously, assumptions have to be made for this. Related to this, once the structural parameters that allow us to capture both short and medium run effects are estimated, we can predict the total amount of sales for any counterfactual dynamic advertising strategy. This is not possible without estimating a model, even if exogenous variation is present.

As pointed out before, our model has elements of the rational adoption models by Melnikov (2013) and De Groote and Verboven (2016). In an adoption model, consumers decide when to buy a product. The way taste shocks affect dynamic decision making is modeled as in Rust (1987). We augment this model by advertising affecting consumer choice through an advertising goodwill stock that increases the probability that a consumer will consider buying a ticket at a given point in time. An important generalization relative to other models with an advertising goodwill stock is that the advertising goodwill stock differs across consumers. At a given point in time, some of them are reached (the percentage is known and given by the number of GRP’s) while others are not. We implement this by simulating whether or not advertising reaches each member of a number of simulated consumers whom we follow over time. These simulated consumers therefore have heterogeneous advertising goodwill stocks.21

We now first describe the building blocks of our model. Then we describe how to solve it and take it to the data.
The robustness to making alternative assumptions on the market size and viewership behavior is assessed in Appendix 3.B.

21See Section 3.5.6 below for details.

3.5.1 General structure

There are N expected discounted utility-maximizing consumers. Choice is independent across draws. Time t = 1,2,...,T is discrete and finite and measured at the hourly level. T is the hour of the draw and the last moment at which consumers can buy a ticket. Each individual can buy at most one ticket. In every hour, each individual decides whether or not to buy a lottery ticket. If she does, then she receives a one-off flow of utility and cannot make any decisions anymore.22 Otherwise, she continues in the next period and has the option of buying a ticket there.

3.5.2 Consideration

In our model, advertising affects the likelihood that a consumer considers buying a ticket through an advertising goodwill stock.23 This goodwill stock increases if the individual is exposed to an advertisement, but from an ex ante perspective it is uncertain for the consumer whether she will be exposed to an advertisement. The number of GRP’s in our data is informative about how many consumers are reached at a given point in time and we use it to simulate a number of goodwill stocks for different consumers. This is similar to, but also extends, the specification of Dubé et al. (2005), where the goodwill stock is the same for all individuals. In our model, the goodwill stock is not the same for all consumers, but the probability to see an advertisement in a given period is the same. That is, consumers are identical ex ante, but we simulate how they differ ex post.

Denote the goodwill stock of individual $i$ at the beginning of period $t$ by $g_{it}$. We will refer to the goodwill stock after the time at which the individual can be reached by an advertisement as the augmented goodwill stock. It is denoted by $g^a_{it}$. The augmented goodwill stock affects consumer choice and depreciates exponentially over time. Let $\lambda$ denote the depreciation rate and assume that the initial goodwill stock is 0. The law of motion for the (augmented) goodwill stock is

$$g^a_{it} = \begin{cases} g_{it} & \text{if } i \text{ did not see an advertisement in } t \\ g_{it} + 1 & \text{if } i \text{ saw an advertisement in } t \end{cases}$$

with initial condition $g_{i0} = 0$ and

$$g_{it+1} = (1 - \lambda) \cdot g^a_{it}.$$

The augmented advertising goodwill stock then affects the probability to consider buying a ticket. We specify this probability as

$$P_{it}(\text{consider}) = \frac{1}{1 + \exp(\gamma_0 + \gamma_1 g^a_{it})}.$$

22This is isomorphic to a model in which she pays a price today and expects to receive a flow utility in the future, provided that she cannot make any decisions in the meantime.

23Advertising can also have brand building effects across draws. This will be captured by draw fixed effects that also capture all the other across-draw effects. We will abstract from across-draw advertising effects in our counterfactual simulations because they are not separately identified from all other differences across draws without making strong assumptions.

3.5.3 Purchase decision

In the consideration stage, a consumer decides whether or not to buy a lottery ticket. Buying a ticket yields flow utility

$$u_{it} = -p + \delta^{T-t} \psi + \sigma \varepsilon_{i1t},$$

where $p$ is the price of the ticket, $\delta$ is the hourly discount factor, $\psi$ is the value of holding a ticket at the time of the draw, and $\varepsilon_{i1t}$ is a type 1 extreme value distributed taste shock (recentered, so that it is mean zero). The coefficient on the price is normalized to be minus 1, which means that flow utility is measured in terms of money. Specifying flow utility to depend on $-p + \delta^{T-t} \psi$ means that a consumer has a taste for buying the ticket as late as possible because she has to pay for it immediately but only receives a discounted benefit from this. This feature of our model is meant to capture the empirical pattern that most sales occur in the last days before the draw (see Figure 3.1).

If a consumer chooses not to buy before the last period, she gets the continuation value $\delta E[V_{t+1}(g^a_{it+1}) \mid g^a_{it}] + \sigma \varepsilon_{i0t}$, where again $\varepsilon_{i0t}$ is a type 1 extreme value distributed taste shock and $V_{t+1}(\cdot)$ is the value function tomorrow that is a function of the advertising goodwill stock tomorrow. The expectation here is taken over whether or not the consumer will consider buying a ticket, whether she is reached by an advertisement, and future realizations of the taste shocks. We provide more details below in Section 3.5.5. If she does not buy in the last period, then the terminal value is $\sigma \varepsilon_{i0t}$.

In our model, as explained above, there is a cost to buying earlier. The benefit is that consumers won’t forget to buy a ticket later, if they want to do so in principle, as they can’t be sure to consider doing so in the future. Hence, they may forget.
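The goodwill law of motion and the consideration probability from Section 3.5.2, which feed into the purchase decision above, can be sketched for a single simulated consumer as follows. The parameter values (depreciation rate, $\gamma_0$, $\gamma_1$) and the GRP path are hypothetical; with the functional form above, a negative $\gamma_1$ is what makes consideration more likely as goodwill rises.

```python
import math
import random

def p_consider(g_aug, gamma0, gamma1):
    """Consideration probability 1 / (1 + exp(gamma0 + gamma1 * g^a))."""
    return 1.0 / (1.0 + math.exp(gamma0 + gamma1 * g_aug))

def simulate_goodwill(grp_path, lam, rng):
    """Augmented goodwill stock g^a_it of one simulated consumer over time.

    In each hour the consumer is reached with probability GRP/100; being
    reached adds 1 to the stock, which then depreciates at rate `lam`.
    """
    g = 0.0                       # initial goodwill stock g_i0 = 0
    path = []
    for grp in grp_path:
        saw_ad = rng.random() < grp / 100.0
        g_aug = g + 1.0 if saw_ad else g
        path.append(g_aug)
        g = (1.0 - lam) * g_aug   # g_it+1 = (1 - lambda) * g^a_it
    return path

# Degenerate GRP path (reach of 100 or 0 percent) so the outcome is not random.
path = simulate_goodwill([100, 0, 100], lam=0.5, rng=random.Random(0))

# Hypothetical parameters: gamma1 < 0, so more goodwill raises consideration.
low = p_consider(0.0, gamma0=2.0, gamma1=-1.0)
high = p_consider(1.0, gamma0=2.0, gamma1=-1.0)
```

Running the same function for many consumers with realistic GRP values would give the heterogeneous goodwill stocks described in the text, since each consumer draws her own exposure history.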

3.5.4 Expectations

In our model, expectations about future advertising play an important role, as advertising reminds consumers to buy a ticket by increasing the probability that a consumer will consider doing so. The scalar state variable $g^a_{it}$ summarizes all relevant information on consumer $i$’s advertising exposure in the past. In addition, the value function in Section 3.5.3 is indexed by $t$ because the consumer problem is a finite horizon one and because the probabilities to see advertisements in the future change over time. If, for example, the consumer knows that there is a large probability that she will see an advertisement tomorrow (or shortly before the draw), then she may be more likely to delay her purchase to tomorrow because she will likely be reminded to buy a ticket.

There are two ways in which we could proceed regarding these expectations when solving and structurally estimating the model. We could solve a game between the consumers and the firm and then use the implied beliefs. This, however, may not be promising because there could be multiple equilibria, and it may be hard to solve that game in the first place. Moreover, we would have to do this within every iteration of our estimation procedure, which would be computationally challenging (if not infeasible). And most importantly, we would have to make the strong assumption that the advertising strategy of the firm that we observe was actually optimal. Instead, we estimate this probability from our GRP data. The specification we use for this is

$$\text{grp}_t / 100 = x_t' \beta + \varepsilon_t, \qquad (3.2)$$

where $x_t$ includes a constant term and a full set of hour, day, and draw dummies. The fitted value is then the probability to see an advertisement in $t$, which we denote by $P_t$. Figure 3.14 in Appendix 4.A shows this probability together with the ones we use in our counterfactual experiments (discussed below). We take these expectations $\{P_t\}_{t=1}^{T}$ about advertising activities as known.
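As a simplified illustration of (3.2), consider the special case with a constant and hour-of-day dummies only. In that case the OLS fitted values are just the hour-of-day averages of $\text{grp}_t / 100$, which the sketch below computes directly. The function name and the data are hypothetical, and the actual specification additionally includes day and draw dummies, for which fitted values are no longer simple cell means.

```python
from collections import defaultdict

def fitted_ad_probability(grp, hour_of_day):
    """Fitted values of grp/100 on a constant and hour-of-day dummies.

    With only one full set of dummies, OLS fitted values equal group means.
    """
    totals, counts = defaultdict(float), defaultdict(int)
    for g, h in zip(grp, hour_of_day):
        totals[h] += g / 100.0
        counts[h] += 1
    return [totals[h] / counts[h] for h in hour_of_day]

# Made-up hourly GRP's observed at 6pm and 8pm on two days.
grp = [0.0, 4.0, 2.0, 6.0]
hour_of_day = [18, 20, 18, 20]
P = fitted_ad_probability(grp, hour_of_day)
```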

3.5.5 Solving the model

We now describe how we solve the model for given values of the parameters, which we then vary in the outer loop of our estimation procedure. Recall that one time unit is equal to one hour. Also, observe that the time of the day does not enter the model directly. Instead, we count the time between midnight and 7am as 1 hour. This choice is guided by Figure 3.2, where one can see that there are few sales during those hours.24

The state variables are time, whether or not a consumer has already bought a ticket, and the advertising goodwill stock $g^a_{it}$. The first two state variables are discrete, while the advertising goodwill stock is non-negative real-valued. The time horizon is finite. We solve this model recursively on a grid for the advertising stock, using interpolation to compute continuation values. We use an equally spaced grid with $G = 2000$ grid points. Denote the set of grid points by $\mathcal{G}$. We use the same grid points in each time period.

24One could in principle model the flow utility to depend on the time of the day and also on the day of the week. However, this would come at the cost of substantially increasing the computational burden. In the estimation procedure (described in more detail in Appendix 3.A below), we solve the model each time we evaluate the objective function. However, it is unlikely that we will suffer from the same omitted variables bias as we would when estimating a distributed lag model as in Section 3.4.2 without controlling for hour of day effects. The reason for this is that the model structure imposes a lot of smoothness in the sense that sales in adjacent hours are predicted to be very similar to one another. Most of the time there are no advertisements and therefore parameters capturing the evolution of baseline sales net of hour-of-day and day-of-the-week effects will not be biased. Given this, advertising effects will also be unbiased.

The structure of the adoption model simplifies the computation considerably, as individuals can buy at most once and the value to buying consists only of the flow utility. The main task is to compute the value to not buying, for every $t$ and on the grid for the advertising stock. Another simplifying factor is that individuals will either see an advertisement in the next hour or not, with a known probability. This means that we can write down an expression for the corresponding expectation over this event and do not have to use simulation or numerical integration. The assumption that the taste shocks are distributed type 1 extreme value allows us to also find an analytic expression for the value to not buying in period $t$, given the value function in $t+1$, as in Rust (1987). For that reason, we can solve the model relatively fast and on a grid with many grid points.

We solve the model recursively. For each time period $t$ and grid point $\tilde g^a_{it} \in G$, we calculate the expected value function in the next period, $E[\max V_{it+1} \mid \tilde g^a_{it}]$; the value when considering to buy in the current period, $V^c_{it}(\tilde g^a_{it})$; and the value in the current period, $V_{it}(\tilde g^a_{it})$. Next, we provide more details.

First consider the case in which an individual has not bought before the last period $t = T$ and the goodwill stock takes on the value $\tilde g^a_{iT} \in G$ on the grid. Then, the value to not buying is 0 because there is no future period. The value when considering in the last period is

$$V^c_{iT} = \sigma \cdot \log\left[\exp\left(\frac{\delta \cdot 0}{\sigma}\right) + \exp\left(\frac{-p + \psi}{\sigma}\right)\right],$$

where $\delta \cdot 0$ is the discounted value of not buying, which is zero because the individual cannot buy in the future, and $-p + \psi$ is the mean utility associated with buying. From this it follows that the value in the last period is

$$V_{iT}(\tilde g^a_{iT}) = P_{iT}(\text{consider}) \cdot V^c_{iT} + (1 - P_{iT}(\text{consider})) \cdot \delta \cdot 0,$$

where, again, $\delta \cdot 0$ is the value associated with not buying.

Now turn to the case in which an individual has not bought before $t = T-1$, the second to last period. The expected value function in the next period, $E[\max V_{it+1} \mid \tilde g^a_{it}]$, is

$$E\left[\max V_{iT} \mid \tilde g^a_{iT-1}\right] = P_T \cdot V_{iT}(\tilde g^{a+}_{iT}) + (1 - P_T) \cdot V_{iT}(\tilde g^{a-}_{iT}).$$

Here, the expectation is taken over the advertising goodwill stock and the taste shocks. The goodwill stock $\tilde g^a_{iT-1}$ either changes to $\tilde g^{a+}_{iT}$ when the individual sees an advertisement in $T$, which will be the case with probability $P_T$, or it changes to $\tilde g^{a-}_{iT}$ if not, with probability $1 - P_T$ (see also Section 3.5.2). The two values $V_{iT}(\tilde g^{a+}_{iT})$ and $V_{iT}(\tilde g^{a-}_{iT})$ are obtained using interpolation. From this, we get that the value when considering in the second to last period is

$$V^c_{iT-1}(\tilde g^a_{iT-1}) = \sigma \cdot \log\left[\exp\left(\frac{\delta \cdot E_{T-1}\left[\max V_{iT} \mid \tilde g^a_{iT-1}\right]}{\sigma}\right) + \exp\left(\frac{-p + \delta \psi}{\sigma}\right)\right]$$

and the value function is

$$V_{iT-1}(\tilde g^a_{iT-1}) = P_{iT-1}(\text{consider}) \cdot V^c_{iT-1} + (1 - P_{iT-1}(\text{consider})) \cdot \delta \cdot E\left[\max V_{iT} \mid \tilde g^a_{iT-1}\right].$$

We proceed in a similar manner for the remaining time periods up to $t = 1$. This results in values $V_{it}(\tilde g^a_{it})$ for all $t$ and all $\tilde g^a_{it} \in G$. From those, we can calculate the probability of buying given consideration as

$$P_{it}(\text{buy} \mid \text{consider}) = \frac{\exp\left(\frac{-p + \delta^{T-t} \psi}{\sigma}\right)}{\exp\left(\frac{\delta \cdot E_t\left[\max V_{it+1} \mid \tilde g^a_{it}\right]}{\sigma}\right) + \exp\left(\frac{-p + \delta^{T-t} \psi}{\sigma}\right)}$$

and the unconditional probability of buying as

$$P_{it}(\text{buy}) = P_{it}(\text{consider}) \cdot P_{it}(\text{buy} \mid \text{consider}).$$
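As an illustration, the recursion described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the code used to obtain the estimates. The function name `solve_adoption_model`, the grid, and the constant exposure probability of 0.03 are hypothetical. The goodwill transition (the stock is multiplied by one minus the depreciation rate and increases by one upon exposure) is inferred from the numerical example in Section 3.6.1, and linear interpolation via `numpy.interp` is one possible choice of interpolation method.

```python
import numpy as np


def solve_adoption_model(p_ad, grid, lam, gamma0, gamma1, delta, sigma, psi, price):
    """Backward induction on a goodwill grid (illustrative sketch).

    p_ad[t] is the probability of seeing an advertisement in period t
    (0-based), and grid holds the goodwill grid points. Returns the value
    function, P(buy | consider) and P(buy), each of shape (T, len(grid)).
    """
    T, n = len(p_ad), len(grid)
    # Probability of considering, logistic in the goodwill stock.
    p_consider = 1.0 / (1.0 + np.exp(-(gamma0 + gamma1 * grid)))
    V = np.zeros((T, n))
    p_buy_given_consider = np.zeros((T, n))
    # Discounted expected continuation value; zero in the last period.
    cont = np.zeros(n)
    for t in range(T - 1, -1, -1):
        # With 0-based t, T - 1 - t periods remain until the draw.
        u_buy = -price + delta ** (T - 1 - t) * psi
        # Closed-form expected maximum under type 1 extreme value shocks.
        v_consider = sigma * np.logaddexp(cont / sigma, u_buy / sigma)
        p_buy_given_consider[t] = np.exp((u_buy - v_consider) / sigma)
        V[t] = p_consider * v_consider + (1.0 - p_consider) * cont
        if t > 0:
            # Assumed transition: depreciation, plus one unit upon exposure.
            g_no_ad = (1.0 - lam) * grid
            g_ad = g_no_ad + 1.0
            # np.interp holds the end values constant outside the grid.
            ev = (p_ad[t] * np.interp(g_ad, grid, V[t])
                  + (1.0 - p_ad[t]) * np.interp(g_no_ad, grid, V[t]))
            cont = delta * ev
    return V, p_buy_given_consider, p_consider[None, :] * p_buy_given_consider


# Hypothetical inputs: 72 hourly periods, 3 percent exposure probability.
grid = np.linspace(0.0, 3.0, 61)
p_ad = np.full(72, 0.03)
V, p_buy_c, p_buy = solve_adoption_model(
    p_ad, grid, lam=0.334, gamma0=-0.430, gamma1=1.637,
    delta=0.994, sigma=0.318, psi=1.669, price=3.0)
```

The grid runs up to 3.0 because, under the assumed transition, the stock cannot exceed one over the depreciation rate, so no extrapolation beyond the grid is needed in this example.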

3.5.6 Empirical implementation

In the first stage, we estimate the probability $P_t$ to see an advertisement at any given point in time, as described in Section 3.5.3 above. In the second stage, we take these probabilities as given and estimate the parameters of the structural model. There is an inner and an outer loop. In the inner loop, we simulate consumer choice for given values of the parameters and compute the value of a method of simulated moments (MSM) objective function. In the outer loop we then estimate the parameters. The moments we use are related to sales at a given point in time given the advertising activity before that, and the evolution of cumulative sales. We assume that the size of the market for Dutch online lottery tickets is 250,000 and we simulate choices of 1,000 consumers.²⁵ Thus each simulated consumer represents 250 real consumers. This is again a trade-off between computational burden and how realistic the model is. To implement this, we take aggregate sales and divide them by 250. The thought experiment that underlies our approach is that we match simulated sales to the expectation thereof, across 250,000 actual consumers, which is given by our data. In our aggregate data, we only observe that a consumer has bought a ticket, but not which ticket. We assume that the price of the tickets bought is 3 euros. The key assumption we make here is that everybody buys the same ticket.²⁶

²⁵ This is considerably more than the maximum number of tickets that was sold in each month in our data. We experimented with different market sizes and found that results of the counterfactual simulations are not very sensitive to it. In Section 3.B we also present results when we assume that the market size is twice as high.

²⁶ 3 euros is the price for the smallest ticket one can buy. See Section 3.2 for details. See also footnote 5. Assuming a different price will only re-scale the parameters, but will not change the results of counterfactual experiments. To see this, suppose we double the price and double at the same time $\psi$ and $\sigma$. Then, it follows from the expressions above that $V^c_{iT}$, $V_{iT}(\tilde g^a_{iT})$ and $E[\max V_{iT} \mid \tilde g^a_{iT-1}]$ will double. Consequently, $V^c_{iT-1}(\tilde g^a_{iT-1})$ and $V_{iT-1}(\tilde g^a_{iT-1})$ will double. But importantly, $P_{it}(\text{buy} \mid \text{consider})$ and $P_{it}(\text{buy})$ will stay exactly the same. This shows that both models are observationally equivalent. Consequently, simulated sales under counterfactual advertising

In our estimation procedure we pay particular attention to the fact that different consumers have different advertising stocks at a given point in time, as it is random whether or not they are exposed to advertisements in the periods before that. Tentatively, there will be dynamic selection in the short run, because those consumers with higher advertising goodwill stocks will be more likely to buy, so that those with lower advertising goodwill stocks remain. Our strategy allows and controls for that.

For an example, think of 250,000 individuals who may in principle buy a ticket (the market size we assume). Suppose that there are 3 GRP’s of advertising in a given hour and that there have not been any advertisements before that. Then, in expectation, 7,500 individuals will be reached. Now suppose that there are 4 GRP’s of advertising in the hour after this. This reaches in total 10,000 individuals. Some of those individuals were among the 7,500 who have already seen an advertisement before and some were not. We assume that it is independent over time who is reached and therefore 300 individuals will see both advertisements.

After solving the model and simulating advertising goodwill stocks, we follow the simulated consumers with heterogeneous advertising goodwill stocks, at which we evaluate the value functions that we have already solved for, and then combine them with random draws $u_{it}$ from the standard uniform distribution for each consumer at each point in time to generate simulated choices. To be precise, we calculate the probability of buying at the simulated goodwill stock $\tilde g^a_{it}$ for all time periods and compare them to $u_{it}$. If $P_{it}(\text{buy}) \geq u_{it}$, then the simulated choice $\hat d_{it}$ is one and otherwise zero. A consumer can buy at most one ticket and therefore we set this variable to zero after a consumer has bought for the first time. Aggregating gives simulated aggregate demand, which we match to (rescaled, as described above) actual aggregated demand.
Further details are provided in Appendix 3.A.
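The reach arithmetic and the accept-reject step described above can be illustrated with a short Python sketch. It assumes, as the example in the text implies, that one GRP corresponds to reaching one percent of the market. The function names `expected_reach` and `simulate_first_purchase` are hypothetical, and the buying probabilities passed to the simulator are placeholders rather than model output.

```python
import numpy as np

MARKET_SIZE = 250_000  # market size assumed in the text


def expected_reach(grp, market_size=MARKET_SIZE):
    """Expected number of individuals reached, with one GRP = one percent."""
    return grp * market_size / 100.0


reach_first = expected_reach(3)   # first hour, 3 GRPs
reach_second = expected_reach(4)  # second hour, 4 GRPs
# Independence over time: the joint probability is the product 0.03 * 0.04.
reach_both = 3 * 4 * MARKET_SIZE / 10_000.0


def simulate_first_purchase(p_buy, rng):
    """Accept-reject simulation with at most one purchase per consumer.

    p_buy has shape (consumers, periods). A consumer buys in t if
    p_buy >= u, with u standard uniform, and has not bought before.
    """
    u = rng.uniform(size=p_buy.shape)
    wants = (p_buy >= u).astype(int)
    bought_before = np.cumsum(wants, axis=1) - wants
    return wants * (bought_before == 0)


rng = np.random.default_rng(0)
p_buy = np.full((1000, 50), 0.01)  # placeholder probabilities
choices = simulate_first_purchase(p_buy, rng)
simulated_demand = choices.sum(axis=0)  # aggregate demand per period
```

Summing `choices` over consumers gives simulated aggregate demand per period, which corresponds to the quantity that is matched to rescaled actual demand.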

3.6 Results

3.6.1 Parameter estimates and fit

In this section, we present our estimation results and assess the fit of the model. Table 3.2 shows the estimated parameters. The effect of advertising on sales depreciates quickly, at an hourly rate of about 33.4 percent. The baseline probability of considering is estimated to be $1/(1 + \exp(-\gamma_0)) \approx 0.39$. This means that 39 percent of the consumers will consider buying a ticket in the absence of advertising. The effect of the goodwill stock on the probability of considering ($\gamma_1$) is estimated to be 1.637, which means that a one unit increase in the goodwill stock from zero to one, driven by seeing an advertisement, will increase the probability of considering to $1/(1 + \exp(-(\gamma_0 + \gamma_1))) \approx 0.77$. One hour later, the goodwill stock is $1 - 0.334 = 0.666$ and the probability to consider buying is 0.66 if no advertisement reaches the consumer. Yet another hour later it is 0.444 and the probability to consider is 0.57. If the consumer is instead reached by another advertisement in the hour after she was first reached, then the augmented goodwill stock becomes 1.666 and the probability to consider buying is 0.91 in the second period. And when she is reached again one hour later, it is 0.95. This form of concavity in the goodwill stock is the reason why consumers prefer it when advertisements are spread over time. Then the expected number of periods in which they consider buying is maximized.

The hourly discount factor is estimated to be 0.994. This means that one month before a draw, the value consumers attach to a ticket is only 3.9% of the value on the day of the draw. For that reason, consumers will value buying tickets late and being reminded at later points in time. Together with the desire to spread advertisements over time this gives rise to an interesting tradeoff that we explore further in our counterfactual experiments.

The estimated standard deviation of the taste shock is $0.318 \cdot \sqrt{\pi^2/6} \approx 0.41$ euros ($\sqrt{\pi^2/6}$ is the standard deviation of a type 1 extreme value random variable). Finally, the 16 estimates of the draw fixed effects are in line with expectations and positively related to the size of the jackpot, mirroring the pattern in Figure 3.1.²⁷ Figure 3.7 shows the model fit. Arguably, with only a few parameters, the model fits the overall patterns in the data relatively well.

²⁶ (continued) strategies will be the same. This means that given that we assume that everybody buys the same ticket, setting the price to a particular value is a normalization.

Table 3.2: Parameter estimates

parameter                                                                estimate   std. err.
depreciation rate goodwill stock ($\lambda$)                                0.334      0.197
effect of goodwill stock on probability of considering ($\gamma_1$)         1.637      0.731
intercept of goodwill stock on probability of considering ($\gamma_0$)     -0.430      0.329
hourly discount factor ($\delta$)                                           0.994      0.000
multiplying factor taste shock ($\sigma$)                                   0.318      0.010
value to having a ticket on the day of the draw
  10 January, 2014                                                          1.477      0.056
  10 February, 2014                                                         1.669      0.039
  10 March, 2014                                                            1.493      0.053
  10 April, 2014                                                            1.448      0.067
  26 April, 2014 (King’s Day)                                               1.906      0.052
  10 May, 2014                                                              1.620      0.046
  10 June, 2014                                                             1.711      0.049
  24 June, 2014 (Orange draw)                                               1.728      0.046
  10 July, 2014                                                             1.887      0.058
  10 August, 2014                                                           1.410      0.060
  10 September, 2014                                                        1.644      0.053
  1 October, 2014 (special 1 October draw)                                  1.608      0.048
  10 October, 2014                                                          1.541      0.057
  10 November, 2014                                                         1.510      0.051
  10 December, 2014                                                         1.749      0.045
  31 December, 2014 (New year’s eve draw)                                   2.336      0.066

Notes: Structural estimates. Obtained using the method of simulated moments. See Section 3.5.6 and Appendix 3.A for details on the estimation procedure. The probability to consider buying is specified as $P_{it}(\text{consider}) = 1/(1 + \exp(-(\gamma_0 + \gamma_1 g^a_{it})))$.
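The numbers quoted in this section can be reproduced from the estimates in Table 3.2 with a few lines of Python. This is a sketch for illustration only. The helper names `p_consider` and `next_stock` are hypothetical, the goodwill transition is the one implied by the example in the text, and the conversion of one month into 30 days of 18 model hours each is an assumption (Appendix 3.A.1 states that a day has 18 hours in the model).

```python
import math

# Point estimates from Table 3.2.
LAMBDA, GAMMA0, GAMMA1 = 0.334, -0.430, 1.637
DELTA, SIGMA = 0.994, 0.318


def p_consider(g):
    """Probability of considering at goodwill stock g."""
    return 1.0 / (1.0 + math.exp(-(GAMMA0 + GAMMA1 * g)))


def next_stock(g, sees_ad):
    """Assumed transition: depreciation, plus one unit upon exposure."""
    return (1.0 - LAMBDA) * g + (1.0 if sees_ad else 0.0)


# Path without further exposure after a first advertisement.
g1 = 1.0
g2 = next_stock(g1, sees_ad=False)
g3 = next_stock(g2, sees_ad=False)
no_ad_path = [p_consider(0.0), p_consider(g1), p_consider(g2), p_consider(g3)]

# Path with an advertisement in each of the two following hours.
h2 = next_stock(g1, sees_ad=True)
h3 = next_stock(h2, sees_ad=True)
ad_path = [p_consider(h2), p_consider(h3)]

# Relative value of a ticket one month (assumed 30 * 18 hours) before a draw.
value_share = DELTA ** (30 * 18)

# Standard deviation of the scaled type 1 extreme value taste shock.
taste_sd = SIGMA * math.pi / math.sqrt(6.0)

print([round(x, 2) for x in no_ad_path], [round(x, 2) for x in ad_path])
print(round(value_share, 3), round(taste_sd, 2))
```

Rounded to two decimals, the two paths give the consideration probabilities reported in the text, and the last two quantities correspond to the 3.9% and the 0.41 euros mentioned above.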

3.6.2 The dependence of advertising effects on time

A key quantity the model predicts is the immediate effect of advertising on sales and how this effect depends on the time until the draw. Figure 3.8 shows, for our structural estimates, how the probability of buying a ticket, if the individual has not done so yet, changes when she is exposed to an advertisement. There are three lines for three different time periods, just before the draw and 1 and 2 days before that. As one can see, the closer the time of the advertisement is to the deadline, the more effective is the advertisement—in line with the model-free evidence that we have presented in Section 3.4. The figure shows that our model can generate this effect.

3.7 Counterfactual experiments

Having estimated the model, we turn to the supply side. We do not have access to data on the profitability of an additional sold ticket, and also not on the cost of one GRP. It is, however, not unreasonable to assume as an approximation that the price of one GRP does not vary over time. Therefore, it is meaningful to study whether a given (monetary) budget could be allocated better over time, by asking the question whether it is possible to sell more tickets when one allocates the same number of GRP’s in a different way.

²⁷ Observe that they are smaller than the price, 3 euros. This is expected because only a relatively small fraction of individuals actually buys a ticket, driven by favorable draws of the taste shocks. For that reason, the value to holding a ticket is higher than the mean utility for those who buy a ticket.

Figure 3.7: Model fit

[Figure: cumulative sales for each draw, model prediction and data, plotted against time in days since January 1.]

Notes: This figure shows actual cumulative sales and cumulative sales predicted by our structural model using the estimated parameters reported in Table 3.2.

Figure 3.8: Dependence of predicted effect of advertising on timing

[Figure: predicted absolute increase in sales plotted against hours to/since advertisement, with separate lines for the day of the draw, the day before the draw, and 2 days before the draw.]

Notes: This figure shows how the predicted absolute increase in sales that is due to seeing an advertisement depends on the time until the draw. Obtained from our structural model using the estimated parameters reported in Table 3.2.

In the following, we use the model to generate counterfactual predictions about the total number of tickets sold. We consider 8 alternative strategies and compare the number of tickets sold to the simulated one for the original GRP schedule in the data. The first alternative strategy is to remove all advertising. In the second, we allocate all advertising to the last two days before the draw and distribute it equally over all hours on those two days. In the third, we spread all advertisements equally in the last 4 days before the draw. Then, there are three pulsing strategies. Studying the effect of those is interesting, as the model is non-linear in advertising exposure and its history, and therefore reacts to such patterns (Dubé et al., 2005). In the first pulsing strategy, the firm advertises in the last hour before the draw, but not in the second to last hour, again in the third to last hour, and so on, for the last 4 days. The amount of advertising, when the firm does so, is always the same. The second pulsing strategy proceeds similarly, but in blocks of two hours. The third pulsing strategy always allocates twice the amount in one hour and then pauses for three, and also lasts for 4 days. The last two counterfactual strategies take, respectively, the schedule as it is in the third week and move it to the fourth week, and vice versa.

Table 3.3: Effect of various advertising strategies

                                                                                 expectations
strategy                                                                      unchanged   rational
data (reference point)                                                           100%       100%
no advertising at all                                                             94%        94%
all advertising in the last 2 days before the draw                               109%       108%
spreading advertisements equally in the last 4 days before the draw              105%       104%
pulsing strategy in the last 4 days before draw (1 hour blocks)                  105%       104%
pulsing strategy in the last 4 days before draw (2 hour blocks)                  104%       103%
pulsing strategy in the last 4 days before draw (1 hour double, 3 hour none)     104%       103%
shift advertising from third week before draw to fourth week                     103%       103%
shift advertising from fourth week before draw to third week                      95%        95%

Notes: This table shows the effect of using alternative dynamic advertising strategies for the February draw. See text for a description of these strategies. In the column labeled “unchanged” consumer expectations are consistent with the advertising data we used to estimate the model and not with the changed advertising strategy. In the last column, we adjust expectations to reflect the change in the policy. Simulations are based on the parameter estimates reported in Table 3.2.

When simulating the impact of those strategies, we distinguish between two cases. In the first case, we assume that the expectations individuals have about the likelihood to be reminded in the future, by seeing an advertisement, remain unchanged (and in line with what we have used to estimate the model) even though we change the advertising strategy. The second case is one in which consumers’ expectations are rational in the sense that they reflect the changed advertising strategy (we assumed this when we estimated the model). Making this distinction is interesting because it allows us to quantify the relative importance of changes in expectations. One can think of this as a second order effect, with the first order effect being the change in the actual advertising strategy. Table 3.3 shows the result. We first focus on the last column, for rational expectations. Not advertising at all leads to 94% of the original sales.
Generally, allocating advertising to later points in time increases sales. Doing the opposite leads to a decrease in sales. Moving it all to the last two days before the draw has the biggest effect, but may be infeasible in practice. Shifting advertising from the third week before draw to the fourth week may be feasible and leads to an increase of sales by 3 percent. The most successful pulsing strategy is the one with blocks of one hour. Figure 3.9 shows the underlying dynamics. We plot the difference between the cumulative sales for a given strategy and the baseline strategy. As an illustration, consider the strategy of shifting all the GRP’s from the fourth week to the third week. As expected, sales increase in the third week and decrease in the fourth week. Overall, fewer tickets are sold, which is reflected in the lower end point.

The results also show that expectations of consumers matter, but are quantitatively not of first-order importance. Qualitatively, when expectations are rational, then effects become smaller. The intuition for this is that when consumers wrongly expect advertising activity to be lower at later points in time, then they already buy earlier and therefore the effect of changing the advertising strategy is bigger because they are reminded more often than they expected. Conversely, when there is no advertising anymore while consumers still expect to be reminded by it, then sales are lower. The same holds true when we shift advertising from the fourth to the third week.

To summarize, the results show that shifting advertising to later points in time leads to higher sales. Given the model structure and the fixed GRP budget, this is desirable for both consumers and the firm.

Figure 3.9: Effect of different advertising strategies

[Figure: predicted number of tickets sold over time relative to the baseline case (predicted nTicketSold minus predicted nTicketSoldBase) plotted against time, for the series 3to4, 4to3, End, No and Pulse.]

Notes: This figure shows the difference between the cumulative number of tickets sold at each point in time for a counterfactual advertising strategy and the cumulative number of tickets sold given the actual advertising schedule for the February draw. Consumer expectations are consistent with the respective actual or changed GRP schedule (column “rational” in Table 3.3). Based on parameter estimates reported in Table 3.2.
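To make the alternative schedules compared in Table 3.3 concrete, the following Python sketch constructs the equal-spread and pulsing schedules for a fixed GRP budget. It is an illustration under assumptions: the function name `pulsing_schedule`, the cycle length of 31 days and the budget of 100 GRPs are hypothetical, a day is taken to have 18 model hours as in Appendix 3.A.1, and blocks are counted backwards from the last hour before the draw so that each pattern starts with an active block.

```python
import numpy as np

HOURS_PER_DAY = 18  # model hours per day (Appendix 3.A.1)


def pulsing_schedule(n_hours, total_grp, on_hours, off_hours, window_hours):
    """Allocate total_grp equally over the active hours of a pulsing pattern.

    Counting backwards from the last hour before the draw, the firm
    advertises for on_hours, pauses for off_hours, and so on, within the
    last window_hours. Setting off_hours to 0 gives an equal spread.
    """
    schedule = np.zeros(n_hours)
    back = np.arange(window_hours)  # 0 is the last hour before the draw
    active = (back % (on_hours + off_hours)) < on_hours
    schedule[n_hours - 1 - back[active]] = total_grp / active.sum()
    return schedule


N_HOURS = 31 * HOURS_PER_DAY  # hypothetical length of one draw cycle
BUDGET = 100.0                # hypothetical total number of GRPs

last_2_days = pulsing_schedule(N_HOURS, BUDGET, 1, 0, 2 * HOURS_PER_DAY)
last_4_days = pulsing_schedule(N_HOURS, BUDGET, 1, 0, 4 * HOURS_PER_DAY)
pulse_1h = pulsing_schedule(N_HOURS, BUDGET, 1, 1, 4 * HOURS_PER_DAY)
pulse_2h = pulsing_schedule(N_HOURS, BUDGET, 2, 2, 4 * HOURS_PER_DAY)
pulse_1on_3off = pulsing_schedule(N_HOURS, BUDGET, 1, 3, 4 * HOURS_PER_DAY)
```

By construction every schedule uses the same total number of GRPs, and the third pulsing schedule places twice the hourly amount of the first one in its active hours.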

3.8 Summary and concluding remarks

This paper uses high frequency advertising and sales data to measure the short run effects of advertising. The thought experiment we undertake for this is akin to a regression discontinuity design: we compare sales just before the advertisements are aired to sales thereafter. We find short-term advertising effects to be sizable and to last for about 4 hours. Besides, we make use of the fact that there is a given purchase cycle with a fixed deadline until which consumers can buy a ticket. Exploiting this we find that advertising does not only lead to purchase acceleration, but also to market expansion. Furthermore, advertising effects depend on the time until that deadline. The later the firm advertises, the higher the short term effect on sales. We develop the argument that this is novel evidence in favor of the view that advertising reminds consumers to consider buying a ticket. We then spell out a structural adoption model that can generate this dependence. We estimate the parameters of this model and simulate the effects of counterfactual dynamic advertising strategies on sales. Based on this we conclude that it is indeed likely that starting from the actual advertising schedule in the data and shifting advertising to later points in time has positive effects on sales.

The context of our study is the sale of lottery tickets for an upcoming draw. This context is particularly helpful for obtaining model-free evidence on the effects of advertising and structural estimates for the key model parameters, but advertisements can of course remind consumers also in other contexts. Examples include the purchase of durable goods and the adoption of technologies, in particular when there are natural deadlines such as Christmas or the end of the year, or deadlines set by the government. For instance, De Groote and Verboven (2016) study the adoption of solar panels. There is a deadline until which households have to buy in order to still be eligible for a tax subsidy. Our model could be used in such a situation to study how dynamic pricing and advertising strategies interact in a competitive environment, or which advertising strategy by the government complementing the subsidy scheme could be most effective. Other deadlines set by the government concern the decision of households to enroll into a savings plan or to change health insurance. Our model can be used to design an advertising strategy that most effectively reminds individuals to do so. One could also generalize it to study the supply side interactions between firms to shed light on the question whether the possibility to remind consumers leads to increased competition and lower prices. Finally, our model could be extended to study the effects of present-bias and how they could most effectively be counteracted using information campaigns.

In the context of these examples one can imagine that advertisements that act as a reminder may be beneficial for consumers, for instance because they help individuals to do what they actually want to do or because they lead to increased competition through higher levels of awareness and consideration. This would be very much in line with Stigler’s (1961) original point that the possibility to provide information by means of advertisements can lead to welfare improvements. His argument was based on the idea that consumers have limited information about the existence of products, their characteristics, or prices. We have presented novel evidence in line with the idea that advertisements can also help consumers who suffer from limited attention and forgetting.

3.A Details on the econometric implementation

In Section 3.5.6, we have given an overview of the estimation procedure. In this section, we provide further details.

3.A.1 Empirical setup

The data contain information on ticket sales and advertising activities for 16 draws. Since we collapse these data during the night, every day in the model has 18 hours. The starting period is 00:00-00:59 on Jan 1 and the last period is 17:00-17:59 on Dec 31. Thus, the total number of periods is $\tau = 6,564$ ($\tau$ is not to be confused with $T$, which we have defined in the context of our model). We divide them up into sub-periods, one for each draw. We account for the fact that they differ with respect to the total number of hours ($T$ in the model) and the value to holding a ticket ($\psi$ in the model), and of course with respect to the realized advertising activity.²⁸ The ticket price is constant over time and across draws.

3.A.2 Method of simulated moments

The set of structural parameters that do not change across draws is $\{\lambda, \sigma, \delta, \gamma\}$. In addition, we estimate 16 values $\psi_1, \dots, \psi_{16}$ to holding a ticket at the time of the draw. Thus the full set of structural parameters to be estimated is $\theta \equiv \{\lambda, \sigma, \delta, \gamma, \psi_1, \dots, \psi_{16}\}$.

Recall that we only have access to aggregate data. Let $\hat u_t(\theta) \equiv q_t - \tilde q_t(\theta)$ be the difference between actual aggregate demand $q_t$ in the data, divided by 250, and the model prediction $\tilde q_t(\theta)$ as described in Section 3.5.6. In Section 3.A.3 below we will specify a set of moments $E[m(z_t, \hat u_t(\theta))] = 0$, where $z_t$ is a vector of exogenous variables constructed from the data so that the left hand side is a column vector and the right hand side is a vector of zeros and the expectation is taken over hours. The (technical) condition for identification is that they hold if, and only if, we evaluate the function $m$ at the true parameters $\theta$ (see for instance Newey and McFadden, 1994).

Let $\bar m(\tilde\theta)$ be the average of $m(z_t, \hat u_t(\tilde\theta))$, over time in hours across all draws (thus over $\tau$ time periods), evaluated at any candidate parameter vector $\tilde\theta$. The MSM estimator is

$$\hat\theta = \arg\min_{\tilde\theta} \; \bar m(\tilde\theta)' W \bar m(\tilde\theta),$$

where $W$ is a positive definite weighting matrix. Under the assumption mentioned above, $\hat\theta$ is consistent. An estimator of the variance-covariance matrix is given by (Newey and McFadden, 1994)

²⁸ This means that $T$ and $\psi$ need to be indexed by the draw, because they differ across draws. For the ease of the exposition, in Section 3.5, we have described the model only for one draw. Within each draw, $t$ runs from 1 to the draw-specific $T$.

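$$\widehat{\mathrm{var}}(\hat\theta) = \frac{1}{\tau} (A'WA)^{-1} B (A'WA)^{-1},$$

where

$$A = \frac{\partial \bar m(\hat\theta)}{\partial \hat\theta'}$$

and

$$B = A'W \left(m(\hat\theta) - \bar m(\hat\theta)\right)\left(m(\hat\theta) - \bar m(\hat\theta)\right)' WA.$$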

3.A.3 Moments and weighting matrix

$z_t$ contains 3 sets of exogenous variables: a full set of dummy variables for the number of days until the draw, the number of GRP’s in $t$, $t-1$, $t-2$, and $t-3$, and variables calculating cumulative sales up to point $t$. This means that we attempt to pick the parameters so that the model captures well the evolution of sales over time and the reaction to advertisements. Specifically, we stack all $\hat u_t(\theta)$ into a vector $\hat u(\theta)$ of dimension $\tau \times 1$ and define a $\tau \times M$ matrix of exogenous variables $Z = (1, Z_1, Z_2, Z_3)$, where 1 is a vector of ones, $Z_1$ contains times until draw dummies in the columns, $Z_2$ contains GRP’s and lags thereof in the columns, and $Z_3$ is a matrix with indicators such that it takes cumulative sales at the daily level, separately for each draw. Specifically, $Z_3$ is block-diagonal with sub-matrices $Z_{3,r}$ on the diagonal ($r$ indexing draws). Each column of these sub-matrices is for one day and contains a set of ones on top and zeros in the bottom, such that the cumulative prediction error is calculated on a daily level when we multiply $Z_3'$ with $\hat u(\theta)$. After eliminating linearly dependent columns, $Z$ has $M = 376$ columns, meaning that we have 376 exogenous variables.²⁹ Using this, we calculate

$$\bar m(\tilde\theta) = \frac{1}{\tau} Z' \hat u(\tilde\theta).$$

We choose the weighting matrix $W$ to be

$$W = (Z'Z/\tau)^{-1}.$$
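The moment vector and the objective function defined above can be sketched in Python as follows. This is an illustrative sketch, not the estimation code: the function names `cumulative_block` and `msm_objective` are hypothetical, the block constructor covers a single draw only, and the example data are random placeholders. The linear system is solved directly instead of forming the inverse of $Z'Z/\tau$ explicitly, which is a numerical choice and not something the text prescribes.

```python
import numpy as np


def cumulative_block(n_hours, hours_per_day=18):
    """Indicator block for one draw: column d is one for all hours up to
    the end of day d and zero afterwards, so that the transpose times the
    error vector gives cumulative prediction errors at the daily level."""
    day = np.arange(n_hours) // hours_per_day
    n_days = day.max() + 1
    return (day[:, None] <= np.arange(n_days)[None, :]).astype(float)


def msm_objective(u_hat, Z):
    """Quadratic form m_bar' W m_bar with W = (Z'Z / tau)^(-1)."""
    tau = Z.shape[0]
    m_bar = Z.T @ u_hat / tau
    # Solve (Z'Z / tau) x = m_bar rather than inverting the matrix.
    return float(m_bar @ np.linalg.solve(Z.T @ Z / tau, m_bar))


# Placeholder example: two days of 18 hours and random prediction errors.
rng = np.random.default_rng(0)
Z3 = cumulative_block(36)
Z = np.column_stack([Z3, rng.normal(size=(36, 2))])
u_hat = rng.normal(size=36)
value = msm_objective(u_hat, Z)
```

In this example the last column of `Z3` consists of ones only, so a separate constant column would be linearly dependent and is left out. This mirrors the elimination of linearly dependent columns described in the text.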

3.A.4 Smoothing

When using a simulation-based procedure to estimate a discrete choice model, one common challenge is that the simulated choice probabilities (if one uses simulated maximum likelihood), or, in our case, simulated demand, are not a smooth function of the parameters. This is due to the fact that in discrete choice models individuals are assumed to either choose to buy or not at a given point in time. Consequently, for each simulated consumer, small changes in parameters will either have no effect on his decision (which stays at 0 or 1), or change his decision from 0 to 1. Such non-smoothness can lead to problems with the usual methods for finding an optimum of the objective function because of flat spots. In principle, this could be addressed by increasing the number of simulated consumers. But it is not possible to fully overcome it, as the number of simulated consumers will stay finite. Therefore, as an alternative, we use a smoothed accept-reject simulator to make the demand function fully smooth in the parameters. We use this very conservatively, however, and only to avoid that the estimator gets stuck on a flat spot.

Following McFadden (1989), the simulator that we choose has the logit form. Instead of generating choices for individual $i$ in $t$ that are either 0 or 1, we generate smoothed choices

$$\tilde S_{it} = 1 - \frac{\exp\left(\frac{u_{it} - \tilde P_{it}(\tilde g^a_{it})}{s}\right)}{1 + \exp\left(\frac{u_{it} - \tilde P_{it}(\tilde g^a_{it})}{s}\right)},$$

where $\tilde P_{it}(\tilde g^a_{it})$ is the simulated probability to buy given considering, $u_{it}$ is a random draw from the standard uniform distribution and $s$ is the smoothing parameter. The higher $s$ the more smoothing there is. In our case, it is sufficient to use very little smoothing. We specify $s = 0.00015$.³⁰

²⁹ $Z_1$ originally has 30 columns. $Z_2$ contains GRP’s and 3 lags thereof, so it has 4 columns. $Z_3$ has 365 columns. Most columns in $Z_1$ are linear combinations of columns in $Z_3$. After dropping those, $Z_1$ has 7 columns left. Thus, we have in total 1 + 7 + 4 + 364 = 376 columns.
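A numerically stable way to evaluate the smoothed accept-reject simulator is sketched below in Python. The function name `smoothed_choice` is hypothetical. The sketch uses the identity that one minus the logistic function of $z$ equals the logistic function of $-z$, and writes the latter in terms of the hyperbolic tangent. This rewriting is our choice for the illustration, made because the exponential would overflow for a smoothing parameter as small as 0.00015.

```python
import numpy as np


def smoothed_choice(p_buy, u, s=0.00015):
    """Smoothed accept-reject choice with logit form.

    Equals 1 - exp(z) / (1 + exp(z)) with z = (u - p_buy) / s, evaluated
    as 0.5 * (1 + tanh(-z / 2)) to avoid overflow for small s.
    """
    z = (np.asarray(u, dtype=float) - np.asarray(p_buy, dtype=float)) / s
    return 0.5 * (1.0 + np.tanh(-0.5 * z))


# With very little smoothing the result is close to the 0/1 choice rule.
almost_one = smoothed_choice(0.60, 0.50)
almost_zero = smoothed_choice(0.40, 0.50)
# With more smoothing the transition between 0 and 1 becomes gradual.
gradual = smoothed_choice(0.60, 0.50, s=0.1)
```

As the smoothing parameter goes to zero, the smoothed choice approaches the indicator that the buying probability is at least as large as the uniform draw, which is the unsmoothed rule from Section 3.5.6.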

3.B Robustness

In this appendix, we assess how robust our results are to assuming a different market size (Appendix 3.B.1) and to allowing for serial correlation in viewership behavior (Appendix 3.B.2).

3.B.1 Assumption on market size

We first assess the robustness to making alternative assumptions about the market size. For this, we re-estimate the model assuming a market size of 500,000. We expect this to mainly affect our parameter estimates related to the baseline probability of considering to buy a ticket and to the value to holding a ticket.

³⁰ We have experimented with different values of $s$ and the result is not sensitive to the choice of $s$ for values of $s$ around 0.00015.

Table 3.4: Robustness checks: parameter estimates when we double the market size and allow for serially correlated viewership

                                                                          baseline                  double                   generalized
parameter                                                                 specification  std. err.  market size  std. err.  model      std. err.
depreciation rate goodwill stock ($\lambda$)                                 0.334        0.197       0.404       0.097      0.193      0.006
effect of goodwill stock on probability of considering ($\gamma_1$)          1.637        0.731       2.041       0.199      2.085      0.362
intercept of stock on probability of considering ($\gamma_0$)               -0.430        0.329      -2.270       0.118     -1.086      0.007
hourly discount factor ($\delta$)                                            0.994        0.000       0.999       0.000      0.994      0.000
multiplying factor taste shock ($\sigma$)                                    0.318        0.010       0.109       0.024      0.339      0.006
value to having a ticket on the day of the draw
  10 January, 2014                                                           1.477        0.056       2.513       0.119      1.496      0.046
  10 February, 2014                                                          1.669        0.039       2.575       0.106      1.638      0.053
  10 March, 2014                                                             1.493        0.053       2.507       0.121      1.505      0.054
  10 April, 2014                                                             1.448        0.067       2.490       0.125      1.489      0.058
  26 April, 2014 (King’s Day)                                                1.906        0.052       2.650       0.088      2.040      0.039
  10 May, 2014                                                               1.620        0.046       2.542       0.112      1.609      0.052
  10 June, 2014                                                              1.711        0.049       2.556       0.110      1.762      0.040
  24 June, 2014 (Orange draw)                                                1.728        0.046       2.597       0.101      1.789      0.033
  10 July, 2014                                                              1.887        0.058       2.614       0.099      1.914      0.046
  10 August, 2014                                                            1.410        0.060       2.496       0.123      1.388      0.060
  10 September, 2014                                                         1.644        0.053       2.559       0.109      1.656      0.047
  1 October, 2014 (special 1 October draw)                                   1.608        0.048       2.604       0.099      1.666      0.040
  10 October, 2014                                                           1.541        0.057       2.513       0.120      1.574      0.046
  10 November, 2014                                                          1.510        0.051       2.504       0.121      1.593      0.051
  10 December, 2014                                                          1.749        0.045       2.576       0.106      1.824      0.037
  31 December, 2014 (New year’s eve draw)                                    2.336        0.066       2.857       0.045      2.421      0.031

Notes: Structural estimates. See Section 3.5.6 and Appendix 3.A for details on the estimation procedure. Estimates for the baseline specification are repeated in the first column (see Table 3.2). The second set of parameter estimates was obtained under the assumption that the market size is 500,000 instead of 250,000. The third set of estimates is for the generalized model with serially correlated viewership behavior described in Appendix 3.B.2. The probability to consider buying is specified as $P_{it}(\text{consider}) = 1/(1 + \exp(-(\gamma_0 + \gamma_1 g^a_{it})))$.

Table 3.4 shows that the probability to consider buying a ticket is estimated to be lower, while the value to holding a ticket is estimated to be higher. In combination, this produces choice probabilities that are roughly half as big as for our baseline model with a market size that is half as big. In addition, the estimate of the hourly discount factor is higher and the estimated standard deviation of the taste shock is lower. The parameters that are least affected are the depreciation rate of the advertising goodwill stock and the coefficient on the goodwill stock that measures by how much it influences the probability of considering. In fact, the percentage point increase in that probability when we change the goodwill stock from zero to one is very similar across those two specifications.

3.B.2 A model with serially correlated viewership

So far, we have assumed that the probability that a consumer i is reached in t by an advertisement is given by the number of GRPs. Implicitly, this assumes that reaching a consumer in t is independent of reaching the same consumer in another period t , for instance t 1. This can 0 only be the case if viewership behavior is not serially correlated. While this is likely violated at the minute level, it may be a reasonable approximation at the hourly level at which we estimate our model. We have no data to directly quantify how likely it is that the same consumer is reached when there are advertisements in two consecutive hours. Therefore, we assess whether this assumption substantially affects our estimates and the main conclusions we draw from them by extending our model to allow for serial correlation in view- ership behavior. In this extended version of the model, there are two states for each consumer: watching TV or listening to the radio and not watching TV or listening to the radio. When estimating the model we proceed in two steps. We first simulate, for each individual, whether they are watching TV or are listening to the radio in a given time period. Then we impose that advertising can only reach those consumers who are actually watching TV or are listening to the radio. At the same time, we assume that consumer expectations are still reasonably approximated by (3.2). The reason for this is that modeling consumer expectations would involve introducing an additional state variable.31 Formally, let state k = 1 be the state of not watching or listening and state k = 2 the one of watching TV or is listening to the radio. Specify a 2-by-2 Markov transition matrix

$$P = \begin{bmatrix} 0.8 & 0.2 \\ 0.4 & 0.6 \end{bmatrix}.$$

This means that if an individual is watching TV or listening to the radio at time t, there is a 40% chance that she will stop watching and a 60% chance that she will continue watching in period t + 1. From this we compute the implied 2-by-1 vector $P_\infty$ of stationary probabilities. Then, we use $P_\infty$ to simulate individual viewership demand in the first period and P to simulate paths in subsequent periods. Note that here, we treat the transition probabilities as known. We could estimate them if we had data at the consumer level.

Table 3.4 shows the estimation results for this more general model. Now, the same pattern in the data needs to be rationalized by a model in which viewership demand is serially correlated. The results show that this can be achieved by a lower depreciation rate of the goodwill stock and a higher effect of the goodwill stock on the probability of buying a ticket. The remaining parameters are almost unaffected.

31 We do not expect this to have a big effect on our parameter estimates. In Table 3.3, we have seen that expectations have a relatively small effect on predictions under counterfactual advertising schedules.
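The simulation step described in this appendix can be illustrated with a short sketch. It is not the estimation code used for Table 3.4; it only shows, under stated assumptions, how the stationary probabilities follow from the transition matrix and how they can be used to draw the first period before the transition matrix generates the later periods. The function names, the zero-based state indices, the seed, and the numbers of consumers and periods are all hypothetical choices made for illustration.

```python
import random

# Transition matrix from the text. Rows are the current state, columns the next state.
# Index 0 = not watching or listening (k = 1 in the text),
# index 1 = watching or listening (k = 2 in the text).
P = [[0.8, 0.2],
     [0.4, 0.6]]


def stationary_two_state(P):
    """Stationary distribution of a two-state Markov chain (closed form)."""
    leave_0 = P[0][1]  # probability of moving from state 0 to state 1
    leave_1 = P[1][0]  # probability of moving from state 1 to state 0
    total = leave_0 + leave_1
    return [leave_1 / total, leave_0 / total]


def simulate_path(P, pi, n_periods, rng):
    """Draw the first state from pi, then follow P in all later periods."""
    state = 0 if rng.random() < pi[0] else 1
    path = [state]
    for _ in range(n_periods - 1):
        state = 0 if rng.random() < P[state][0] else 1
        path.append(state)
    return path


pi = stationary_two_state(P)  # approximately [2/3, 1/3] for the matrix above

rng = random.Random(0)  # fixed seed so that the sketch is reproducible
n_consumers, n_periods = 5000, 24  # hypothetical sizes, chosen for illustration
paths = [simulate_path(P, pi, n_periods, rng) for _ in range(n_consumers)]

# Share of consumer-periods spent watching or listening; close to pi[1].
share_watching = sum(sum(path) for path in paths) / (n_consumers * n_periods)
```

With this matrix, roughly one third of consumers are watching or listening in any given period, so advertising in the extended model can reach at most that share of consumers in each period.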

3.C Additional tables and figures

Table 3.5: Differences across draws

| | (1) all draws | (2) regular draws | (3) special draws | (4) all draws |
|---|---|---|---|---|
| log jackpot size | 0.366* (0.178) | 0.366*** (0.106) | | 0.509* (0.251) |
| special draw | 1.509*** (0.492) | | | 2.107** (0.633) |
| log number of days | 0.182 (0.174) | 0.153 (0.106) | 0.805 (1.657) | 0.408 (0.558) |
| log jackpot size previous draw | | | | -0.245 (0.297) |
| special draw in previous draw | | | | -0.107 (0.969) |
| log number GRP previous draw | | | | 0.954 (0.583) |
| Observations | 16 | 12 | 4 | 15 |
| R2 | 0.562 | 0.605 | 0.106 | 0.727 |

Standard errors in parentheses. * p < 0.10, ** p < 0.05, *** p < 0.01

Notes: This table shows the results of a regression of the log of total sales on the total number of days on which tickets could be bought and the jackpot size if the draw was regular. In columns (1) and (4) we pool across regular and special draws and set the log of the jackpot size to zero for the latter. One observation is one draw. There are only 15 observations for the last specification because we lack data on the previous draw for the first one that is in our data.

Table 3.6: Effect of TV and radio advertising

| | (1) TV and radio |
|---|---|
| TV GRP between 0 and 4 minutes ago | 0.0133*** (0.00118) |
| 5 and 9 minutes | 0.0415*** (0.00140) |
| 10 and 14 minutes | 0.0453*** (0.00123) |
| 15 and 19 minutes | 0.0308*** (0.00120) |
| 20 and 24 minutes | 0.0206*** (0.00108) |
| 25 and 29 minutes | 0.0144*** (0.00113) |
| 0.5 and 1 hour | 0.0116*** (0.000446) |
| 1 and 1.5 hours | 0.00819*** (0.000462) |
| 1.5 and 2 hours | 0.00412*** (0.000436) |
| 2 and 2.5 hours | -0.000105 (0.000408) |
| 2.5 and 3 hours | -0.00567*** (0.000392) |
| 3 and 3.5 hours | -0.0112*** (0.000374) |
| 3.5 and 4 hours | -0.0190*** (0.000396) |
| radio GRP between 0 and 4 minutes ago | 0.00102 (0.00143) |
| 5 and 9 minutes | 0.00544*** (0.00146) |
| 10 and 14 minutes | 0.00617*** (0.00147) |
| 15 and 19 minutes | 0.00474** (0.00146) |
| 20 and 24 minutes | 0.00818*** (0.00145) |
| 25 and 29 minutes | 0.00981*** (0.00143) |
| 0.5 and 1 hour | 0.00881*** (0.000650) |
| 1 and 1.5 hours | 0.0116*** (0.000652) |
| 1.5 and 2 hours | 0.0149*** (0.000661) |
| 2 and 2.5 hours | 0.0133*** (0.000655) |
| 2.5 and 3 hours | 0.0106*** (0.000651) |
| 3 and 3.5 hours | 0.0110*** (0.000632) |
| 3.5 and 4 hours | 0.00883*** (0.000646) |
| draw dummies | Yes |
| days to draw dummies | Yes |
| hour dummies | Yes |
| Observations | 515205 |
| R2 | 0.632 |

Standard errors in parentheses. * p < 0.05, ** p < 0.01, *** p < 0.001

Notes: This table shows the results of a regression of the log of one plus sales on GRP’s of advertising, separately for TV and radio advertising, and lags thereof. Table 3.1 shows effects when we pool TV and radio advertising together. Regressions were carried out at the minute level and standard errors are robust to heteroskedasticity.

Table 3.7: Evidence from a distributed lag model at the hourly level

| | (1) baseline | (2) TV and radio | (3) before last week | (4) last week | (5) no controls |
|---|---|---|---|---|---|
| GRP current hour | 0.0120*** (0.00145) | | 0.0110*** (0.00238) | 0.00406* (0.00163) | 0.0779*** (0.00536) |
| GRP 1 hour lagged | 0.0120*** (0.00133) | | 0.0121*** (0.00245) | 0.00529*** (0.00156) | 0.0431*** (0.00533) |
| GRP 2 hours lagged | 0.00428** (0.00134) | | 0.00457 (0.00235) | 0.000309 (0.00147) | 0.0277*** (0.00677) |
| GRP 3 hours lagged | 0.00412* (0.00168) | | 0.00369 (0.00306) | -0.000451 (0.00200) | 0.0545*** (0.00654) |
| GRP TV current hour | | 0.0128*** (0.00185) | | | |
| GRP TV 1 hour lagged | | 0.0133*** (0.00168) | | | |
| GRP TV 2 hours lagged | | 0.00329* (0.00163) | | | |
| GRP TV 3 hours lagged | | 0.00154 (0.00246) | | | |
| GRP radio current hour | | 0.00758*** (0.00214) | | | |
| GRP radio 1 hour lagged | | 0.00838*** (0.00218) | | | |
| GRP radio 2 hours lagged | | 0.00775** (0.00239) | | | |
| GRP radio 3 hours lagged | | 0.00802** (0.00246) | | | |
| draw dummies | Yes | Yes | Yes | Yes | No |
| days to draw dummies | Yes | Yes | Yes | Yes | No |
| hour dummies | Yes | Yes | Yes | Yes | No |
| Observations | 7662 | 7662 | 5406 | 2256 | 7662 |
| R2 | 0.917 | 0.917 | 0.858 | 0.942 | 0.263 |

Standard errors in parentheses. * p < 0.05, ** p < 0.01, *** p < 0.001

Notes: This table shows the results of regressions of the log of one plus sales on GRP’s of advertising and lags thereof. In column (2) we distinguish between TV and radio advertising. Regressions were carried out at the hourly level and standard errors are robust to heteroskedasticity.

Figure 3.10: Cumulative sales for remaining draws

Notes: Figure 3.1 shows cumulative sales for 6 selected regular draws. This figure shows them for the remaining draws.

Figure 3.11: Advertising and sales during the day of the draw

[Figure: average GRP (radio and TV) and average transactions, each plotted against the hour of the day (0 to 23).]

Notes: This figure shows average GRP’s and sales for different times of the day. To produce this figure we first aggregate sales at the hourly level and then average over draws. On the day of the draw tickets for this draw can only be bought until 6pm. See Figure 3.2 for the pattern on the remaining days.

Figure 3.12: GRP’s at the minute-level for a special draw

[Figure: GRP per minute and transactions per minute, each plotted against days since previous draw (0 to 15).]

Notes: Figure 3.3 shows GRP’s and sales for the draw on April 10, 2014. This figure shows GRP’s and sales at the minute level for the special draw on April 26, 2014 (King’s Day). The last regular draw took place on April 10, 2014. Tickets for the next draw can be bought from 6pm on the day of the previous draw, which is depicted as 0 days since the previous draw.

Figure 3.13: Advertisements that were used to construct Figure 3.5

[Figure: GRP plotted against time within year 2014.]

Notes: This figure shows which advertisements were used in the sample for Figure 3.5. It shows a dot for each advertisement with at least 9 GRP, with the number of GRP’s plotted against time. The diamonds are the advertisements that were used.

Figure 3.14: Expectations

[Figure: "Individual expectation over time". Individual expectation (vertical axis, 0 to 0.8) plotted against time (horizontal axis, 0 to 600), with series phatBase, phat3to4, phat4to3, phatEnd, phatNo and phatPulse.]

Notes: This figure shows the expected probability of seeing an advertisement from the individual perspective. It is obtained from a regression of GRP’s on hour-of-day dummies and days-until-draw dummies for the draw in February.

Chapter 4

Advertising Match Values and Viewership Demand

4.1 Introduction

Advertisements can fulfill a number of roles (Bagwell, 2007). They can be informative, persuasive, or seen as a complement to actual consumption. The informative role asserts that advertising serves as a means through which firms convey information about the existence, location, price and other characteristics of products. Persuasive advertising affects consumer preferences before buying the product, while complementary advertising makes consuming a product more enjoyable. The exact role advertising plays is highly context-specific and is related to whether consumers benefit or suffer from it.

When studying the role of advertising, it is important to keep in mind that the exact same advertisement can be informative for some consumers, persuasive or a complement for others, and a nuisance for yet another group of consumers. This will manifest itself in heterogeneous responses of consumers to the advertisement: when an advertisement that is aired on television is a nuisance rather than useful, consumers will be more inclined to tune away; conversely, when it is useful to consumers, they will tend to stay tuned.

The primary aim of this paper is to characterize this heterogeneity at the level of a group of viewers, for instance defined by their age, and a type of advertisement, for instance for clothing. In doing so, we focus on the question of how many and which consumers attach value to seeing a given advertisement. This is useful for several reasons. First, it allows us to further our understanding of which consumers value which types of advertisements. This is an important input into debates on how we should regulate firms selling advertising, such as online platforms, newspapers, television channels, and radio stations. The interests of consumers, advertisers and firms selling advertising space are more closely aligned the better advertisements are matched to consumers’ interests. 
Second, and related to that, it is in line with the idea of targeted advertising, where firms deliberately select groups of customers they want to reach. Our results are useful, as they show how one can use television viewership data to find out which consumers are more inclined to view a given advertisement. Third, and also related to the first point, information on heterogeneous match values is valuable for the broadcasters who sell advertising slots to firms. An important question for them is how to reach as many consumers as possible in a given commercial break. The match value provides a basis for answering such a question, as it will provide guidance for the ordering of advertisements within a given break.

The empirical approach is to study the match value between consumers and advertisements using data from Israel pertaining to the reaction to advertising at the level of a consumer and advertising type. Controlling for potentially confounding factors and baseline patterns using a host of information in our data, we relate viewership at the level of commercials within a commercial break to the advertising history. The resulting estimates of advertising effects have the interpretation of a revealed preference measure that is closely related to the concept of a match value: the more consumers leave the channel after an advertisement was aired, the lower the match value between the specific advertisement type and those consumers. Our data are more detailed than most datasets that have been used before, as they contain information on the exact location of each advertisement within the commercial break, classify advertisements into groups by their industries, and at the same time contain viewership measures at the level of consumer segments. This allows us to separate the effects of the advertisement itself on viewership demand at the viewer segment level from the effects of advertisement slots and characterize heterogeneity in the effects. 
We find that the viewer responses to advertisements are heterogeneous, both across different advertisements for a given viewer segment and across various viewer segments for the same advertisement. This implies a high degree of heterogeneity in the match values between advertisements and viewers. Moreover, we find that the number of viewers that an advertisement can reach in a given commercial break depends crucially on its advertisement slot within the commercial break. This implies that there are rich and heterogeneous patterns of dynamic selection, meaning that the number of viewers in a given group and at a given position within the commercial break depends on the composition of advertisements that have been aired until then. In other words, there exist externalities among advertisements: advertisements that are broadcast in earlier slots impose externalities on the advertisements in later slots.

The existence of externalities has direct implications for the total number of advertising impressions that can be sold, since the advertisements in the early slots of a commercial break affect both how many and which consumers will be exposed to the advertisements thereafter. We quantify this effect in a set of counterfactual experiments both across viewer segments and across shows. Our results indicate that by re-ordering the set of advertisements in the commercial break, the broadcasters could reach approximately 5% more viewers. The effects depend on the default order in the data and the characteristics of the viewers. Moreover, we find that the effect is larger across viewer segments than across shows.

A broader point we wish to make with our paper is that once we see viewership response to advertising as a revealed preference measure, the result of our counterfactual experiments suggests that overall welfare effects of reordering advertisements could be positive, in line with the view that more targeted advertising can be welfare-improving.

We add to the literature in several ways. First, we provide a conceptual framework which unifies the ways through which advertising affects consumers. Our detailed data on the advertisement’s exact location within the commercial break allow us to separate the effects of the advertisement itself on viewership demand at the viewer segment level from the effects of advertisement slots and to characterize heterogeneity in the effects. Second, unlike the previous literature, which has focused on contemporaneous effects of advertisement content on viewership demand, we study sequences of commercials within a commercial break and the dynamic pattern of advertising avoidance. Third, and related to advertising targeting, we demonstrate how to reach as many consumers as possible in a given commercial break using information on heterogeneous match values.

In this respect, our work is most closely related to Wilbur et al. (2013), who analyze how to select, order, and price advertisements in a TV commercial break of endogenous length in order to correct audience externalities. One important conceptual difference is that Wilbur et al. (2013) regard audience externalities as arising from stimuli that advertisers strategically include in their advertisements to increase their own effectiveness at the expense of the network’s remaining audience size. We instead think of advertising externalities as the loss in viewers caused by a low match value between the audience and the advertisement content. 
Another difference is that we have data on both the advertisement content and the viewer segments and can therefore measure the match value between the two. The third difference is that unlike Wilbur et al. (2013), who only have data on program genre in terms of 3 categories, we have rich program information on top of program genre. Therefore, we can control for it at a finer level, for instance by means of show fixed effects rather than genre fixed effects, when estimating the match value.

In addition, our paper relates to several strands of the literature. The first strand is related to the matching function of advertising. Anand and Shachar (2001) show that advertising can act as a matchmaker in the sense that it makes viewership audiences more homogeneous. More recently, Tuchman et al. (2017) regard advertisements as endogenous goods in a world in which consumers jointly decide how much advertising to watch and how many goods to purchase. They find that consumers tend to watch more advertisements of the brands that they have previously purchased. We complement their work by characterizing the rich heterogeneity in match values using high frequency data with information on consumer segments and program type.

The second strand of the literature that our paper is related to is on measuring the effect of advertising content on viewership demand. Woltman Elpers et al. (2003) find that both the entertainment and the information value have a strong effect on the consumer’s viewing decision. Schweidel and Kent (2010) find that across program genres, viewers avoid fewer advertisements during dramas and more during reality television programs. Liaukonyte et al. (2015) study the effect of advertising content in an online setting. They find that action-focus content increases direct website traffic and sales. Information-focus and emotion-focus advertisement content actually reduce website traffic while simultaneously increasing purchases. Wilbur (2016) uses the Passive/Active Zap as a more precise measure of television advertising avoidance. The author finds that advertising avoidance is negatively associated with movie advertisements and positively associated with advertising for websites, auto insurance and women’s clothing. Advertisement avoidance also tends to rise with repeated exposures to the same advertisement creative, advertising aired on general-interest television networks, later hours of the evening, and rainfall. Interestingly, Deng and Mela (2018) find something different: rather than brand- or show-specific factors, most of the advertising avoidance is explained by unobserved individual-specific factors. Unlike us, Deng and Mela (2018) do not have data on advertising exposure. They instead infer it. We add to this by estimating the effects of advertisements separately from the effects of advertisement positions on viewership demand.

Third, our work is also related to the growing literature on advertising targeting. Deng and Mela (2018) show that household-level advertising targeting can be more effective than show-level targeting. Goldfarb and Tucker (2011a) use a large-scale field experiment to show that targeted online advertisements increase purchase intent. In a related study, Lambrecht and Tucker (2013) find that online targeted advertisements have a positive effect on consumers with narrowly construed preferences. Goldfarb and Tucker (2011b) show that advertisers are willing to pay more for targeted search advertisements. Tuchman et al. (2017) suggest that advertisers should target those who have previously consumed the advertised product. 
We contribute to this strand of the literature by showing that the broadcaster could reach more consumers by re-ordering the given set of advertisements in the commercial break.

Fourth, our paper is related to the wealth of empirical evidence on advertising effects and the roles of advertising.1 Sovinsky Goeree (2008) studies consumer demand in the U.S. personal computer industry, where advertising conveys characteristics of the products to consumers. Draganska and Klapper (2011) find that advertising has a direct effect on brand awareness in addition to its effect on consumer preferences. More recently, He and Klein (2018) assert that advertisements can also remind consumers to buy. The reminding effect of advertising belongs to the informative role, since it provides consumers with information on the deadline before which they can buy the product. On the complementary role of advertising, Tuchman et al. (2017) regard advertisements as endogenous goods, so that consumers jointly decide how many advertisements to watch and how many goods to purchase. In line with this view of how advertisements affect consumers, they find that consumers tend to watch more advertisements of the brands that they have previously purchased.

The rest of this paper is structured as follows. Section 4.2 outlines a conceptual framework in which advertising externalities are highly relevant for both advertisers and publishers. The institutional background for our empirical analysis is briefly discussed in Section 4.3. Section 4.4 describes the data and shows the related descriptive statistics. Section 4.5 discusses our empirical strategy and Section 4.6 shows the results. Section 4.7 contains the results from our counterfactual experiments. Finally, Section 4.8 concludes.

1 Summarizing this literature is beyond the scope of this paper.

4.2 Conceptual framework

Our study proposes to use consumers’ reactions to advertisements to measure the match value between consumers and advertisements. This section details the underlying idea and its implications for advertisers and broadcasters wishing to maximize the reach of their advertisements.

Imagine that it is 8pm on a typical evening and viewers are watching a TV show. In the middle of the show, a commercial break begins. Viewers may respond to the commercial break in different ways. Some viewers may choose to switch to another channel. Others may stay on the channel. As the commercial break progresses, more viewers switch back so as not to miss the continuation of the show. In this thought experiment, there are three types of viewer reactions to advertising along different dimensions.

The first of these is the manner in which different types of viewers react to a given commercial. For example, assume that men prefer cars to women’s clothing and the opposite holds for women. Now suppose the advertisement being broadcast is about women’s clothing. The behavioral implication of this preference relation is that more men than women will leave the channel. Conversely, during a car advertisement, more women than men will switch to another channel. Here the revealed preference is informative about the match value. More formally, if $m(\cdot,\cdot): \text{viewer segment} \times \text{advertisement content} \to \mathbb{R}$ denotes a function that assigns every pair (viewer segment, advertisement content) a value, then the above example can be formulated as

m(women, women’s clothing) > m(men, women’s clothing), and

m(women, car) < m(men, car).

Here, the extent of the reaction varies across viewer types, holding the advertisement constant. The second type of reaction is obtained by holding the viewer type constant and varying the advertisement type. In the above example, it means that the match value between women and women’s clothing is higher than that between women and cars, and that the match value between men and cars is higher than that between men and women’s clothing. Using the same notation, we have

m(women, women’s clothing) > m(women, car), and

m(men, women’s clothing) < m(men, car).

The last type of reaction is that viewers may switch back to the channel at different points in time. For example, it could be that men are more averse to missing the continuation of the show, so that women switch back to the channel 30 seconds before the end of the commercial break while men switch back a bit earlier. Unlike the previous two types of reactions, which are static, the last one is dynamic.

The dynamic feature reveals an important aspect of advertising: advertisements impose externalities on each other. The idea is analogous to the original idea of externalities in public goods (e.g., factories pollute the river where other people go fishing). More precisely, during a commercial break, the commercial that was broadcast in the previous time slot affects the viewership of the succeeding advertisements. For illustration, suppose that in the previous example the commercial break consists of two advertisement slots. First, consider the case where the broadcaster airs the women’s clothing advertisement in the first time slot and the car advertisement in the second. In this case, the women’s clothing advertisement imposes an externality on the second advertisement: by assumption, more men have left the channel after seeing the first advertisement. Thus, the car advertisement placed in the second slot has a diminished value because it cannot reach the audience it is meant to reach. Now consider the opposite case, in which the broadcaster puts the car advertisement in the first slot and the women’s clothing advertisement in the second slot. Then it is easy to see that the converse happens: the women’s clothing advertisement loses value because of the externality generated by the preceding car advertisement.

The externality generated by the previous advertisement affects not only who and how many will be reached by the advertisement after it but also how many will be reached in the whole commercial break. 
Reconsider the previous example. If the broadcaster airs the women’s clothing advertisement in the first slot and the car advertisement in the second, then the whole commercial break will reach the whole population plus the number of women, while the opposite order will reach the whole population plus the number of men. Thus, the total number of viewers reached differs between the two orders. This implies that the ordering of advertisements in a given commercial break is important not only for advertisers but also for broadcasters. For advertisers, the ordering affects both who and how many viewers will see their advertisements. For broadcasters, the ordering of advertisements affects how many consumers they can reach in total during a commercial break. In Section 4.7, we quantify this externality using the estimates derived from our empirical analysis.

The second reason why externalities are important is related to correctly measuring the match values. Comparing the loss in viewership of two advertisements without properly controlling for externalities could lead to misleading conclusions. Once again consider the previous example. Suppose that, compared to the number of viewers before the start of the commercial break, one percent of the viewers have left the channel after the women’s clothing advertisement and two percent have left after the car advertisement. It might then be tempting to conclude that the women’s clothing advertisement has a higher match value for the overall audience. But it could just be that the women’s clothing advertisement is placed in the last slot before the end of the commercial break, when viewers are coming back. Our data and empirical strategy allow us to properly control for the effect of advertisement position as well as other potentially confounding factors and thus measure the match values in a credible way.
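The two-slot example in this section can be made concrete with a small numerical sketch. It is not the counterfactual procedure of Section 4.7; the audience sizes and retention rates below are hypothetical values chosen to mirror the stylized example, in which a segment leaves entirely after an advertisement with a low match value and stays entirely after one with a high match value. The function name and data layout are likewise assumptions made for illustration, and viewers switching back before the end of the break are ignored.

```python
from itertools import permutations

# Hypothetical audience sizes at the start of the commercial break.
audience = {"men": 40.0, "women": 60.0}

# Hypothetical retention rates: share of a segment still watching after an ad.
# 1.0 = everyone stays (high match value), 0.0 = everyone leaves (low match value).
retention = {
    ("men", "car"): 1.0,
    ("men", "women's clothing"): 0.0,
    ("women", "car"): 0.0,
    ("women", "women's clothing"): 1.0,
}


def total_impressions(order, audience, retention):
    """Sum the viewers exposed to each ad when ads are aired in the given order."""
    remaining = dict(audience)
    total = 0.0
    for ad in order:
        total += sum(remaining.values())  # everyone still tuned in sees this ad
        for segment in remaining:  # some viewers leave after the ad
            remaining[segment] *= retention[(segment, ad)]
    return total


ads = ["women's clothing", "car"]
results = {order: total_impressions(order, audience, retention)
           for order in permutations(ads)}
best_order = max(results, key=results.get)
```

Under these assumed numbers, airing the women's clothing advertisement first yields the whole population plus the number of women (100 + 60 impressions), while the opposite order yields the whole population plus the number of men (100 + 40), so the broadcaster's total reach depends on the order alone.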

4.3 A short history of Israeli television

The analysis in this paper is based on data from Israel. The first Israeli television channel was the Israeli national channel, also referred to as Channel 1, which began to broadcast in 1968. Eighteen years later, in 1986, the second Israeli television channel was introduced, commonly referred to as Channel 2. In contrast to Channel 1, which is a government-financed non-profit channel, Channel 2 is a commercial for-profit channel. In an attempt to introduce a degree of competitiveness in the programming, the regulator divided the broadcasting days of Channel 2 among three networks: Keshet, Telad and Reshet. Each of the three networks broadcasts on different days on the same channel, whereas "Channel 2 news" is an independent company jointly financed by the advertisement income of the three networks. For example, in 2003 Telad broadcast on Sunday and Wednesday; Reshet on Tuesday, Friday and Saturday (until April); and Keshet on Monday, Thursday and Saturday (from April onwards).2

In 2002 a second commercial channel was introduced, Channel 10. The television industry in Israel is highly regulated, and licenses for television channels are granted sparingly. Furthermore, regulation permits commercials to be broadcast on only two channels: 2 and 10. Thus, until 2002, Channel 2 held a monopoly over television advertising in Israel; this changed with the entry of Channel 10.

Parallel to the development of public commercial television, cable and satellite broadcasting began to operate in Israel circa 1990. Three operators were chosen to provide cable service throughout Israel, each operating in a different geographical region. In 2003 the three cable companies were unified under one company, Hot Telecommunication Systems. The year 2000 saw the introduction of a direct broadcast satellite operator, YES. By the early to mid-2000’s (2001-2005), roughly 66% of households had access to multichannel broadcasts, which corresponds to roughly 74% of households owning a television.

4.4 Data and descriptive statistics

In this section, we first describe our viewing data and define the set of variables that will be used in the empirical analysis. Afterwards, we describe our sample selection procedure. Finally, we show descriptive statistics for our estimation sample.

2 Telad lost the bid for the Channel 2 broadcasting concession and ceased broadcasting on October 30, 2005, leaving only Reshet and Keshet.

4.4.1 Data description

This study integrates two datasets: 1) the Programs data and 2) the Spots data. The data contain information on program viewership and timing on three television channels in Israel between 2001 and 2005. Both datasets were purchased from Kantar Media, a private firm which collects data on television viewership.

The Programs data detail the start time, end time and genre of every program and commercial break on channels 1, 2 and 10 between 20:00 and 22:00 from January 1, 2001 until December 31, 2005. The genres were defined by the data collection agency and are: children; cinema; commercial; culture, leisure & education; documentary; entertainment; news & current events; sports; television drama; and other.

The Spots data contain information on the viewership market share across various segments of the population for every commercial broadcast on each of the three aforementioned channels between 19:00 and 22:59 from January 1, 2002 until December 31, 2005. The segments are based on a survey and include: all households; households by income (a lot below average, below average, average, above average and a lot above average); and households by age (4-24, 25-44, 45-64 and 65+). Moreover, the dataset details the start time, duration and industry of each commercial. There are 25 industries: beverages; cleaning & pest control; construction & housing; consumer goods; cosmetics & hygiene; education; fashion & accessories; financial services; food; fundraising; furniture & household equipment; gambling; health; household appliances; leisure & entertainment; organizations; other; other services; public information; public service; retail & supermarkets; social cultural messages; technology & communications; tourism & accommodation; and vehicle. Note that the viewership data are marginal viewerships for each income and age group. We do not observe joint viewerships. 
Based on the start time, end time, channel and the show on which the advertisement was aired, we classify each advertisement into a unique commercial break. Two advertisements are classified into the same commercial break if 1) they are aired on the same date, same channel and same show and 2) the start time of the second advertisement is no later than 5 seconds after the end time of the first advertisement. After the classification, we have a panel structure in the sense that there are multiple advertisements for each commercial break. The panel is unbalanced since each commercial break may contain a different number of advertisements. For example, some short commercial breaks contain only one advertisement while others contain more than 20. As mentioned in Section 4.2, the exact location of an advertisement in the commercial break is important since the advertisement in the previous location affects the viewership of the advertisement after it. Therefore, we assign a slot location number p = 1, 2, ..., P to each advertisement within the commercial break. Thus, p = 2 means the advertisement is the second commercial broadcast within the commercial break. Moreover, for advertisements that are broadcast between 20:00 and 22:00, we also observe the viewership market share 1 minute before the start of the commercial break; we set p = 0 for this observation before the start of the commercial break.3
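The classification rule can be illustrated with a minimal sketch. The record layout (the keys 'channel', 'show', 'start' and 'end'), the function name and the toy data are hypothetical and chosen only for illustration; the raw data are not assumed to be stored this way.

```python
from datetime import datetime, timedelta

MAX_GAP = timedelta(seconds=5)  # rule 2: next ad starts at most 5 s after the previous ad ends


def assign_breaks(ads):
    """Attach a break id and a slot number p = 1, 2, ... to each ad.

    `ads` is a list of dicts with the hypothetical keys 'channel',
    'show', 'start' and 'end' (the last two are datetime objects).
    """
    ordered = sorted(ads, key=lambda ad: (ad["channel"], ad["start"]))
    labelled = []
    break_id, slot, previous = 0, 0, None
    for ad in ordered:
        same_break = (
            previous is not None
            and ad["channel"] == previous["channel"]
            and ad["show"] == previous["show"]
            and ad["start"].date() == previous["start"].date()
            and ad["start"] - previous["end"] <= MAX_GAP
        )
        if same_break:
            slot += 1
        else:
            break_id, slot = break_id + 1, 1
        labelled.append({**ad, "break_id": break_id, "slot": slot})
        previous = ad
    return labelled


def _t(clock):
    # Helper for the toy data: parse a clock time on an arbitrary example date.
    return datetime.strptime("2003-05-01 " + clock, "%Y-%m-%d %H:%M:%S")


toy_ads = [
    {"channel": 2, "show": "news", "start": _t("20:10:00"), "end": _t("20:10:20")},
    {"channel": 2, "show": "news", "start": _t("20:10:22"), "end": _t("20:10:50")},
    {"channel": 2, "show": "news", "start": _t("20:25:00"), "end": _t("20:25:30")},
]
for ad in assign_breaks(toy_ads):
    print(ad["break_id"], ad["slot"], ad["start"].time())
```

In the toy data the second advertisement starts 2 seconds after the first one ends, so the two share a break with p = 1 and p = 2, while the third advertisement opens a new break.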

4.4.2 Sample selection

First, some of the 25 industries have only a few observations, so their effects cannot be estimated precisely. We therefore merge those industries into the industry “other”. After grouping, we have 20 industries in total: beverages; cleaning & pest control; construction & housing; consumer goods; cosmetics & hygiene; education; fashion & accessories; financial services; food; furniture & household equipment; gambling; health; household appliances; leisure & entertainment; organizations; other; retail & supermarkets; technology & communications; tourism & accommodation; and vehicle. After dropping the commercial breaks that contain missing data, we are left with 39,817 commercial breaks. In order to make commercial breaks more comparable in length, we only keep commercial breaks that contain between 6 and 20 advertisements. Figure 4.6a in Appendix 4.A displays the distribution of commercial breaks by the number of advertisements. We use only the observations between the two vertical red lines, which account for roughly 66.7% of the original sample. Finally, Figure 4.6b shows the distribution of commercial breaks by the viewership of the first advertisement. We drop commercial breaks if the average viewership among all households of the first advertisement in the commercial break is less than 1 percent. Our final sample consists of 18,680 commercial breaks. All subsequent results are based on this final sample.
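The selection steps can be summarized in a short sketch. The record keys 'n_ads' and 'first_ad_rating', the function name and the toy records are hypothetical; only the thresholds (6 to 20 advertisements, at least 1 percent viewership) come from the text.

```python
def select_breaks(breaks, min_ads=6, max_ads=20, min_first_rating=1.0):
    """Apply the sample restrictions to a list of break records.

    Each record is a dict with the hypothetical keys 'n_ads' (number of
    advertisements) and 'first_ad_rating' (all-household rating of the
    first advertisement, in percent). None marks missing data.
    """
    kept = []
    for record in breaks:
        if record["n_ads"] is None or record["first_ad_rating"] is None:
            continue  # drop breaks with missing data
        if not min_ads <= record["n_ads"] <= max_ads:
            continue  # keep only breaks with 6 to 20 advertisements
        if record["first_ad_rating"] < min_first_rating:
            continue  # drop breaks whose first ad reaches under 1 percent
        kept.append(record)
    return kept


toy_breaks = [
    {"n_ads": 4, "first_ad_rating": 3.2},    # too short
    {"n_ads": 11, "first_ad_rating": 0.4},   # too few viewers
    {"n_ads": 11, "first_ad_rating": None},  # missing data
    {"n_ads": 12, "first_ad_rating": 3.8},   # kept
]
print(len(select_breaks(toy_breaks)))
```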

4.4.3 Descriptive statistics

This section reports descriptive statistics for advertisements, commercial breaks, and viewership. Table 4.1 reports the total number of advertisements and their average length for each of the 20 industries. There are a total of 210,440 advertisements in the sample. Among these, advertisements from the food, technology & communications and cosmetics & hygiene industries are the most prevalent. The average length of an advertisement across industries is 0.34 × 60 = 20.4 seconds with a standard deviation of 0.167 × 60 = 10.02 seconds. Advertisements from industries such as organizations, technology & communications and other are the longest, on average. Tables 4.2 and 4.3 present summary statistics of commercial breaks. There are 10 program genres in the data: Children; Cinema; Commercial; Culture, Leisure & Education; Documentary;

3 Since the Programs data are collected between 20:00-22:00 while the Spots data are collected between 19:00-22:59, we only observe the viewership before the start of the commercial break for about half of the sample (those between 20:00-22:00).

Table 4.1: Number and length of advertisements by industry

| Industry | Length in minutes (mean) | Length in minutes (std. dev.) | Number |
|---|---|---|---|
| beverages | 0.327 | 0.168 | 16,970 |
| cleaning & pest | 0.349 | 0.156 | 7,568 |
| construction & housing | 0.309 | 0.143 | 2,097 |
| consumer goods | 0.253 | 0.119 | 789 |
| cosmetics & hygiene | 0.329 | 0.132 | 21,950 |
| education | 0.283 | 0.115 | 1,938 |
| fashion & accessory | 0.288 | 0.205 | 8,656 |
| financial services | 0.372 | 0.166 | 14,645 |
| food | 0.344 | 0.160 | 39,102 |
| furniture & household equipment | 0.243 | 0.110 | 1,904 |
| gambling | 0.330 | 0.181 | 7,915 |
| health | 0.292 | 0.131 | 6,950 |
| household appliance | 0.248 | 0.147 | 3,530 |
| leisure & entertainment | 0.336 | 0.171 | 14,574 |
| organizations | 0.450 | 0.141 | 4,412 |
| other | 0.399 | 0.172 | 3,101 |
| retail & supermarket | 0.261 | 0.152 | 17,595 |
| technology & communications | 0.419 | 0.167 | 26,881 |
| tourism & accommodation | 0.347 | 0.161 | 2,123 |
| vehicle | 0.339 | 0.153 | 7,740 |
| total | 0.340 | 0.167 | 210,440 |

Table 4.2: Summary statistics of program genre

| Program genre associated with commercial break | Frequency | Percent |
|---|---|---|
| Children | 30 | 0.16 |
| Cinema | 565 | 3.02 |
| Commercial | 2,892 | 15.48 |
| Culture, Leisure & Education | 552 | 2.96 |
| Documentary | 1,562 | 8.36 |
| Entertainment | 3,617 | 19.36 |
| News & Current Affairs | 6,483 | 34.71 |
| Other | 278 | 1.49 |
| Sports | 653 | 3.50 |
| Television Drama | 2,048 | 10.96 |
| total | 18,680 | 100 |

Table 4.3: Summary statistics of commercial breaks

| | Mean | Std. dev. |
|---|---|---|
| total length of the break (in minutes) | 3.8 | 1.2 |
| number of advertisements in the break | 11.4 | 3.5 |
| percentage into the program at the start of the break | 53.3 | 22.9 |

Entertainment; News & Current Affairs; Other; Sports; and Television Drama. Across program genres, commercial breaks are placed most frequently during News & Current Affairs (34.71%), Entertainment (19.36%) and Commercial programs4 (15.48%). The average length of a commercial break is approximately 3.8 minutes with a standard deviation of 1.2 minutes. Moreover, the average commercial break includes about 11 advertisements. The temporal position of the commercial break within the program (the percentage into the program) is defined as

$$
\frac{(\text{time} - \text{program start time} + 1) \times 100}{\text{program end time} - \text{program start time}},
$$

where time (in seconds) is the start time of the first advertisement (p = 1) of the commercial break. On average, commercial breaks start in the middle of the program (about 53 percent into the program).

Targeted advertising at the show level rests on the observation that audiences are similar within a show and differ between shows. Firms can therefore air advertisements on a subset of shows to target a certain subgroup of the audience. Figures 4.1a and 4.1b show the viewer composition before the start of the commercial break across different program genres by age and income, respectively. The numbers are reported in terms of average Gross Rating Points (GRP), defined as the average percentage of the total population that is reached at a given point in time and that comes from a demographic subgroup viewing a specific genre. For example, Children programs reach, on average, about 6 percent of the viewers in the population. Of these, 2.7 percentage points come from viewers aged 4-24, 1.5 from viewers aged 25-44, 1.1 from viewers aged 45-64 and 0.7 from viewers aged 65 and above.5 Across age groups, there is substantial heterogeneity in terms of genres. For example, the youngest group likes Children programs most while senior viewers (age group 65+) prefer News & Current Affairs programs.
Across income groups, the differences in viewer ratings between genres are smaller. Viewers in all income groups slightly prefer Entertainment programs. The exact viewer composition for each program genre and each viewer segment is summarized in Tables 4.8 and 4.9 in Appendix 4.A.
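The two descriptive measures used in this section, the percentage into the program and the GRP contribution of a viewer segment, can be written out as a small sketch. The function names, the one-hour example program, and the ratings and population weights below are hypothetical and serve only as an illustration; only the formulas and the four Children-program contributions in the last line are taken from the text.

```python
def percent_into_program(break_start, program_start, program_end):
    """Percentage into the program at the start of a break (times in seconds)."""
    return (break_start - program_start + 1) * 100 / (program_end - program_start)


def grp_contributions(segment_ratings, population_weights):
    """GRP contribution of each segment: rating (percent) times population share."""
    return {
        segment: segment_ratings[segment] * population_weights[segment]
        for segment in segment_ratings
    }


# Hypothetical one-hour program (3,600 s) with a break starting 1,919 s in.
print(round(percent_into_program(1919, 0, 3600), 1))

# Hypothetical ratings and population shares, for illustration only.
ratings = {"4-24": 6.0, "25-44": 5.0, "45-64": 5.5, "65+": 7.0}
weights = {"4-24": 0.40, "25-44": 0.30, "45-64": 0.20, "65+": 0.10}
contributions = grp_contributions(ratings, weights)
print(contributions, round(sum(contributions.values()), 2))

# The contributions quoted in the text for Children programs add up to about 6.
print(round(2.7 + 1.5 + 1.1 + 0.7, 1))
```

Because the GRP contributions are additive across segments, the total reach of a genre is the sum of the segment contributions, which is how the figure of about 6 percent for Children programs decomposes into the four age groups.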

4 Program genre “Commercial” pertains to commercial breaks between programs, as opposed to commercial breaks within programs, which are assigned the genre of the program in which they are broadcast.

5 The number is calculated using the percentage of viewers for each viewer segment weighted by the average weight of that segment in the population.

Figure 4.1: Viewer segments across program genres

(a) Viewers across program genres by age group (horizontal axis: percentage; series: age 4-24, age 25-44, age 45-64, age 65+)

(b) Viewers across program genres by income group (horizontal axis: percentage; series: a lot below average, below average, average, above average, a lot above average)

Notes: The figures present the average percentage of viewers within an age or income subgroup viewing a specific genre. The numbers are reported in terms of GRP’s: the average percentage of viewers that has been reached in the total population coming from an age or income subgroup. It is calculated by the viewer ratings of a viewer segment times the weight of that segment in the total population. Notice that the numbers are those before the start of the commercial break. Recall that we only observe the viewership before the start of the commercial break for about half of the sample (those between 20:00-22:00).

4.5 Empirical strategy

Our goal is to measure how the value viewers attach to advertisements differs across groups of viewers and types of advertisements. To this end, we specify

$$
\log(y_{gbp}) = \sum_{k} \theta_{gk} \sum_{\tilde{p}=1}^{p} d_{kb\tilde{p}} + \lambda_1(t_{bp}) + \lambda_{2b} + \lambda_{3gp} + u_{gbp}, \tag{4.1}
$$

where $y_{gbp}$ is the viewership rating of household segment $g$ for commercial slot $p$ within commercial break $b$, defined as the average percentage of viewers seeing that particular commercial.6 $d_{kb\tilde{p}}$ is an indicator for commercial $\tilde{p}$ in break $b$ being of type $k$, so that $\sum_{k} d_{kb\tilde{p}} = 1$. This means that $\sum_{\tilde{p}=1}^{p} d_{kb\tilde{p}}$ is the number of commercials of type $k$ that a consumer has seen when watching commercial $p$.7 Our key parameter of interest is $\theta_{gk}$, which measures the effect advertisements of type $k$ have on viewership in group $g$. The dependent variable is log viewership, which means that $100 \cdot \theta_{gk}$ is approximately the percentage change in viewership of group $g$ if the advertisement is of type $k$, holding other factors fixed.

In order to estimate $\theta_{gk}$, we control for potentially confounding factors. To this end, we leverage our high frequency data. $t_{bp}$ is the calendar time at which slot $p$ in break $b$ is aired and $\lambda_1(t_{bp})$ is a set of variables capturing time effects, including hour-of-the-day, day-of-the-week, month, year, memorial day and holiday dummies. $\lambda_{2b}$ accounts for differences across commercial breaks using fixed effects for the network, the channel, the show, and the percentage into the show when the commercial break starts. Next, $\lambda_{3gp}$ is the group-specific effect of the location $p$ within commercial break $b$. This captures the U-shaped pattern reflecting general advertising aversion that motivates consumers to switch channels at the beginning of a commercial break and return after a few minutes to continue watching the show they are interested in.

Formally, we assume that $u_{gbp}$ is mean independent of $d_{kb\tilde{p}}$ given $\lambda_1(t_{bp})$, $\lambda_{2b}$, and $\lambda_{3gp}$. This means that the variation we exploit comes from differences in the composition of commercial breaks for the same show and across times at which the show is aired, while we control for time effects in a flexible way. With our specification (4.1) we use this variation to separately estimate the effects of the advertisements of type $k$ on the viewership by group $g$, $\theta_{gk}$, and the effect of the location in the commercial break on the viewership rating, $\lambda_{3gp}$. In terms of functional form restrictions, the key assumption we make is that viewership of group $g$ during position $p$ is invariant to permutations, in the sense that it depends only on the number of advertisements of particular types shown up to that point, but not on the order in which they have been shown.
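To make the construction of the regressors in (4.1) concrete, the following sketch builds the cumulative type counts for a single commercial break. The function name and the two toy orderings are hypothetical and are not taken from the data; the sketch only illustrates the counting and the permutation-invariance assumption, not the estimation itself.

```python
from collections import Counter


def cumulative_type_counts(break_types, all_types):
    """Row p-1 holds, for each type k, the number of type-k ads in slots 1..p."""
    running = Counter()
    rows = []
    for ad_type in break_types:
        running[ad_type] += 1
        rows.append([running[k] for k in all_types])
    return rows


types = ["education", "food", "vehicle"]
order_a = ["food", "education", "food", "vehicle"]
order_b = ["vehicle", "food", "food", "education"]  # a permutation of order_a

counts_a = cumulative_type_counts(order_a, types)
counts_b = cumulative_type_counts(order_b, types)
for p, row in enumerate(counts_a, start=1):
    print(p, dict(zip(types, row)))

# Earlier slots differ across the two orders, but the final slot does not.
print(counts_a[-1] == counts_b[-1])
```

Since exactly one indicator equals one in each slot, the counts in slot p sum to p. Two orderings of the same set of advertisements produce identical regressors in the last slot, which is the sense in which the specification is invariant to permutations, while the regressors in earlier slots depend on which advertisements have already been shown.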

6 The base stays constant over time so that we can think of the dependent variable as the log of the number of viewers.

7 The viewership is measured as the average over the time during which commercial p is aired. When reactions to advertisements are almost immediate, this is very close to the viewership after consumers have chosen to either keep watching commercial p or tune away.

Figure 4.2: Effect of advertisement number (vertical axis: percentage change in viewers relative to baseline; horizontal axis: advertisement number, 1 to 19)

4.6 Results

We first present results when we pool across groups. Figure 4.2 shows the percentage change of viewers across different advertisement positions in the commercial break relative to viewers just before the start of the commercial break. Viewers start to leave the channel after the first advertisement. More viewers leave the channel after the second advertisement. After that, viewers leave at a decreasing rate. After the fourth advertisement, viewers start to return, and more viewers return closer to the end of the commercial break.

Figure 4.3 shows the percentage change of viewers relative to just before the start of the commercial break when one more advertisement of a particular type is shown. As one would expect, in general, an advertisement has a statistically significant negative effect on the rating. Across industries, for a given advertisement slot in the commercial break, advertisements for financial services and education are avoided less by viewers than, for example, advertisements for organizations.

Next, Figure 4.4 shows the percentage change of viewers relative to baseline for each advertisement number, by age group (Figure 4.4a) and by income group (Figure 4.4b). Across age groups, viewers in age group 65+ leave the channel faster but also return faster. The pattern is similar across income groups. The highest income group has a deeper U-shape compared to the other income groups.

The next two figures show the effect of the advertisement industry for different groups. Figure 4.5a reports the effect of advertisements across industries by age group. The figure suggests that the match value varies both across advertisement industries and across age groups. For example, advertisements pertaining to consumer goods have a higher match value with younger viewers (age group 25-44) relative to other advertisement industries. However, they have a lower match value with age groups 45-64 and 65+ relative to other industries. Another example is that viewers in the youngest age group prefer tourism advertisements while those in age group 45-64 dislike them. Figure 4.5b shows the same content but across income groups. The match value varies across both dimensions. Taking again consumer goods as an example, the match value is higher for the higher income groups (those with average income or above) relative to other industries, but these advertisements are less preferred by those with below average income. The regression tables with the estimated coefficients are reported in Appendix 4.A.

Overall, the estimates indicate that there is rich heterogeneity in match values across both viewer segments and advertisement industries. The evolution of viewers across advertisement slots during a commercial break is U-shaped.

Figure 4.3: Effect of advertisement industry (percentage change in viewers relative to baseline, shown for each of the 20 advertisement industries)
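The U-shape can be read directly off the location coefficients. As a rough illustration, the sketch below takes the location coefficients for slots 1 to 12 from Table 4.6 in Appendix 4.A (columns "all groups" and "age 65+"), converts them into exact percentage changes, and locates the trough. The function names are hypothetical, and restricting attention to the first 12 slots is a simplification made here for brevity.

```python
import math

# Location coefficients for slots 1-12, copied from Table 4.6 in Appendix 4.A.
LOCATION_ALL = [-0.0894, -0.145, -0.169, -0.172, -0.170, -0.164,
                -0.155, -0.145, -0.133, -0.122, -0.108, -0.0956]
LOCATION_65_PLUS = [-0.117, -0.194, -0.224, -0.228, -0.223, -0.213,
                    -0.202, -0.189, -0.176, -0.158, -0.143, -0.128]


def percent_change(log_points):
    """Exact percentage change implied by a coefficient on log viewership."""
    return 100 * (math.exp(log_points) - 1)


def trough_slot(location_effects):
    """Slot (1-based) with the lowest predicted viewership."""
    return min(range(len(location_effects)), key=location_effects.__getitem__) + 1


for label, effects in [("all groups", LOCATION_ALL), ("age 65+", LOCATION_65_PLUS)]:
    slot = trough_slot(effects)
    print(label, slot, round(percent_change(effects[slot - 1]), 1))
```

For both columns the lowest coefficient is at the fourth slot, in line with the observation that viewers start to return after the fourth advertisement, and the trough is deeper for age group 65+ than for the pooled sample.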

4.7 Counterfactuals

Figure 4.4: Effect of advertisement number by age and income

(a) Effect of advertisement number by age (series: age 4-24, age 25-44, age 45-64, age 65+)

(b) Effect of advertisement number by income (series: a lot below average, below average, average, above average, a lot above average)

Both panels plot the percentage change in viewers relative to baseline against the advertisement number (1 to 19).

Figure 4.5: Effect of advertisement industry by age and income

(a) Effect of advertisement industry by age (series: age 4-24, age 25-44, age 45-64, age 65+)

(b) Effect of advertisement industry by income (series: a lot below average, below average, average, above average, a lot above average)

Both panels plot the percentage change in viewers relative to baseline for each of the 20 advertisement industries.

Having estimated the parameters, we now use them to illustrate how externalities among advertisements could affect both advertisers and broadcasters. Consider a typical evening at 8pm. People are sitting in front of their TV watching programs on channel 2. A commercial break is coming up, and an advertiser has bought a slot in it. The exact percentage of viewers reached by her advertisement depends on the slot in the commercial break. Table 4.4 reports the effects of re-ordering advertisements on the predicted percentage of viewers reached by the advertisements. We do the experiments for two commercial breaks in two programs. The upper panel of the table is for a News & Current Affairs program and the lower panel is for a Children’s program. Both of them have 12 advertisement slots. The effects are calculated as follows: we first predict, for each age group and for each advertisement in the commercial break, the percentage of viewers that is reached. We then calculate the GRP’s by multiplying this percentage by the weight of each age group in the whole population.

First look at the upper panel of the table. The GRP’s for slot 0 are the percentage of viewers before the start of the commercial break. In this case, 3.84 percent of the viewers in the whole population are watching the News program before the start of the commercial break. The actual order is the order in the data which, for this particular commercial break, is: technology & communications; food; food; food; education; fashion & accessory; cosmetics & hygiene; education; education; leisure & entertainment; education; and financial services. Under the actual order, the first advertisement, technology & communications, reaches 3.84 percent of the viewers. The second advertisement, food, reaches 3.45 percent of the viewers, and so on. The worst order places the advertisements in slots 1 to 12 in ascending order of their estimated match value. That is, the advertisement with the lowest match value is put in slot 1, the advertisement with the second lowest match value in slot 2, and so on. In this case, the worst order is: cosmetics & hygiene; fashion & accessory; leisure & entertainment; food; food; food; financial services; and then followed by four education advertisements. Conversely, the best order places the advertisements in descending order of their match value.
Notice that by construction, the best order is exactly the reverse of the worst order. The reason why placing the advertisements in descending order of their match value is the best strategy for the broadcaster is intuitive: advertisements with a higher match value retain more of the audience, so that advertisements with a lower match value can still reach these viewers when they are aired. Conversely, if the advertisements with a lower match value are placed in the early slots, viewers leave and the advertisements with a higher match value reach fewer viewers than they otherwise could. In other words, the best order minimizes the negative externality that early advertisements impose on later ones.

Two interesting results are worth mentioning. First, the number of viewers that an advertisement reaches does not depend only on the content of the advertisement and its time slot. That is, even for a given advertisement slot and a given advertisement, the advertisement can reach a different number of viewers depending on the advertisements placed before it. This is due to the externalities among advertisements discussed in Section 4.2. For example, a food advertisement is in slot 4 under both the worst order and the actual order, but it reaches different percentages of viewers (3.07 percent in the worst order and 3.12 percent in the actual order) because the advertisements before it differ between the two cases. Second, from the broadcaster’s point of view, the total number of GRP’s sold in the best order is about 2.6% higher than under the actual order in the data. That is, the broadcaster could sell more GRP’s by properly ordering the advertisements by their match values.

Another related question is whether the gain from re-ordering differs across types of programs. We do the same experiment for the Children’s program, which is reported in the lower panel of Table 4.4. We use exactly the same set of advertisements as in the News program. That is, we only change the show fixed effect from News to Children, keeping everything else constant. Quantitatively, the change in the effect of re-ordering is not big (a gain of about 2.8% for the Children’s program compared to about 2.6% for the News program). The difference comes from the fact that the audience composition of the News program differs from that of the Children’s program.

The next counterfactual experiment examines the effect of re-ordering advertisements within the same program but across different segments of viewers. We choose a Documentary program with Tourism & Accommodation advertisements. As indicated in Figure 4.5a, Tourism & Accommodation has heterogeneous match values across age groups, in particular between age groups 4-24 and 45-64. The actual order in the data is: Leisure & Entertainment; Technology & Communications; Retail & Supermarkets; Retail & Supermarkets; Gambling; Tourism & Accommodation; Tourism & Accommodation; Tourism & Accommodation; Food; Construction & Housing; Tourism & Accommodation; and finally Food. The best order for age group 4-24 is: Tourism & Accommodation; Tourism & Accommodation; Tourism & Accommodation; Tourism & Accommodation; Technology & Communications; Construction & Housing; Retail & Supermarkets; Retail & Supermarkets; Food; Food; Leisure & Entertainment; and finally Gambling. Table 4.5 reports the results. We do the experiment twice: once for age group 4-24 and once for age group 45-64. Importantly, the three orderings (the worst order, the actual order and the best order, all defined for age group 4-24) are kept constant across the two experiments.
We multiply the predicted percentage of the audience that is reached for each age group by its associated weight in the population so that the numbers reported are in terms of GRP’s. The results indicate that the same ordering has very different effects for different viewers. The ordering that is best for age group 4-24 increases the GRP’s from age group 4-24 by about 4.6% relative to the actual order, but the same ordering decreases the GRP’s from age group 45-64 by about 1.6%. This difference comes exactly from the heterogeneous match values: while viewers aged 4-24 prefer Tourism & Accommodation advertisements, viewers aged 45-64 do not. This result is related to advertising targeting: the best strategy is different for different groups of viewers.

In general, the gain from re-ordering is larger if the advertisements in the commercial break are very different in terms of match value (e.g., consumer goods advertisements vs. education advertisements) and smaller if they are very similar. The match value depends on both the content of the advertisements and the characteristics of the audience. In the extreme case in which every advertisement is exactly the same, re-ordering clearly has no effect. In general, we find that the effect is larger across viewer segments than across programs.
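The logic of the re-ordering experiments can be illustrated with a simplified sketch. It uses the pooled ("all groups") industry and location coefficients from Table 4.6 and the slot-0 GRP of the News break from Table 4.4, and it assumes that the predicted GRP in slot p equals the baseline times the exponential of the cumulative industry effects plus the location effect. The thesis instead predicts by age group and re-weights by population shares, so this sketch is not expected to reproduce the numbers in Table 4.4; the function and variable names are hypothetical.

```python
import math

# Industry effects (theta) and location effects for slots 1-12, copied from
# column (1), "all groups", of Table 4.6. Using pooled coefficients is a
# simplifying assumption of this sketch.
THETA = {
    "technology & communications": -0.00796,
    "food": -0.00799,
    "education": 0.00240,
    "fashion & accessory": -0.0154,
    "cosmetics & hygiene": -0.0101,
    "leisure & entertainment": -0.00999,
    "financial services": -0.00365,
}
LOCATION = [-0.0894, -0.145, -0.169, -0.172, -0.170, -0.164,
            -0.155, -0.145, -0.133, -0.122, -0.108, -0.0956]
BASE_GRP = 3.84  # slot-0 GRP of the News break in Table 4.4

ACTUAL_ORDER = [
    "technology & communications", "food", "food", "food", "education",
    "fashion & accessory", "cosmetics & hygiene", "education", "education",
    "leisure & entertainment", "education", "financial services",
]


def predict_grps(order, base=BASE_GRP):
    """Predicted GRP in each slot: base * exp(cumulative theta + location effect)."""
    grps, cumulative = [], 0.0
    for slot, industry in enumerate(order):
        cumulative += THETA[industry]
        grps.append(base * math.exp(cumulative + LOCATION[slot]))
    return grps


best_order = sorted(ACTUAL_ORDER, key=THETA.get, reverse=True)  # descending match value
worst_order = sorted(ACTUAL_ORDER, key=THETA.get)               # ascending match value

totals = {name: sum(predict_grps(order)) for name, order in
          [("worst", worst_order), ("actual", ACTUAL_ORDER), ("best", best_order)]}
for name, total in totals.items():
    print(name, round(100 * total / totals["actual"], 1))
```

Placing advertisements in descending order of match value maximizes the cumulative industry effect at every slot simultaneously, which is why the best order weakly dominates any other order slot by slot. In the last slot all orders coincide, because the full set of advertisements has been shown by then.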

Table 4.4: Effect of re-ordering on predicted GRP’s

News Program

| Slot | Worst order | Actual order | Best order |
|---|---|---|---|
| 0 | 3.84 | 3.84 | 3.84 |
| 1 | 3.45 | 3.45 | 3.50 |
| 2 | 3.19 | 3.23 | 3.30 |
| 3 | 3.07 | 3.12 | 3.23 |
| 4 | 3.04 | 3.09 | 3.23 |
| 5 | 3.03 | 3.10 | 3.22 |
| 6 | 3.02 | 3.07 | 3.21 |
| 7 | 3.02 | 3.07 | 3.22 |
| 8 | 3.04 | 3.11 | 3.23 |
| 9 | 3.09 | 3.16 | 3.24 |
| 10 | 3.14 | 3.16 | 3.25 |
| 11 | 3.19 | 3.22 | 3.24 |
| 12 | 3.25 | 3.25 | 3.25 |
| relative percentage of GRP’s sold | 98.8% | 100% | 102.6% |

Children Program

| Slot | Worst order | Actual order | Best order |
|---|---|---|---|
| 0 | 6.82 | 6.82 | 6.82 |
| 1 | 6.16 | 6.18 | 6.25 |
| 2 | 5.74 | 5.80 | 5.94 |
| 3 | 5.54 | 5.92 | 5.83 |
| 4 | 5.48 | 5.56 | 5.84 |
| 5 | 5.44 | 5.59 | 5.84 |
| 6 | 5.44 | 5.55 | 5.84 |
| 7 | 5.45 | 5.55 | 5.84 |
| 8 | 5.49 | 5.62 | 5.85 |
| 9 | 5.58 | 5.71 | 5.87 |
| 10 | 5.66 | 5.70 | 5.85 |
| 11 | 5.76 | 5.80 | 5.85 |
| 12 | 5.87 | 5.87 | 5.87 |
| relative percentage of GRP’s sold | 98.8% | 100% | 102.8% |

Notes: The table contains the predicted GRP’s under different re-ordering strategies. Obtained using the estimates of (4.1). See text for details.

Table 4.5: Effect of re-ordering on predicted GRP’s for different sub-viewers

Age group 4-24

| Slot | Worst order | Actual order | Best order |
|---|---|---|---|
| 0 | 1.66 | 1.66 | 1.66 |
| 1 | 1.48 | 1.49 | 1.53 |
| 2 | 1.39 | 1.41 | 1.47 |
| 3 | 1.34 | 1.36 | 1.45 |
| 4 | 1.33 | 1.35 | 1.46 |
| 5 | 1.32 | 1.32 | 1.46 |
| 6 | 1.31 | 1.35 | 1.47 |
| 7 | 1.32 | 1.37 | 1.46 |
| 8 | 1.32 | 1.40 | 1.46 |
| 9 | 1.35 | 1.40 | 1.46 |
| 10 | 1.38 | 1.40 | 1.46 |
| 11 | 1.41 | 1.44 | 1.46 |
| 12 | 1.45 | 1.45 | 1.45 |
| relative percentage of GRP’s sold | 98.1% | 100% | 104.6% |

Age group 45-64

| Slot | Worst order for group 4-24 | Actual order | Best order for group 4-24 |
|---|---|---|---|
| 0 | 1.63 | 1.63 | 1.63 |
| 1 | 1.44 | 1.46 | 1.44 |
| 2 | 1.34 | 1.36 | 1.32 |
| 3 | 1.29 | 1.31 | 1.26 |
| 4 | 1.28 | 1.29 | 1.23 |
| 5 | 1.26 | 1.27 | 1.21 |
| 6 | 1.26 | 1.25 | 1.23 |
| 7 | 1.28 | 1.23 | 1.23 |
| 8 | 1.28 | 1.22 | 1.23 |
| 9 | 1.27 | 1.23 | 1.23 |
| 10 | 1.26 | 1.25 | 1.24 |
| 11 | 1.25 | 1.24 | 1.25 |
| 12 | 1.24 | 1.24 | 1.24 |
| relative percentage of GRP’s sold | 100.7% | 100% | 98.4% |

Notes: The table contains the predicted GRP’s under different re-ordering strategies. The predicted GRP’s are calculated using the predicted audience according to (4.1), multiplied by the weight of that age group in the population. See text for details.

4.8 Concluding remarks

Advertising affects individuals in different ways, and the exact same advertisement can be informative for some consumers, persuasive or a complement to the consumption of goods and services for others, and a nuisance for yet another group of consumers. This paper uses high frequency television advertising and viewing data to characterize heterogeneous responses to advertising. We think of pairs of advertisements and consumers as being associated with a pair-specific match value, which expresses itself in the response to the advertisement. The idea is that consumers like to watch more of those advertisements that have a high match value with them, no matter what the underlying role of the advertisement is. We estimate those heterogeneous match values using a novel dataset that contains information on viewing behavior for different types of consumers at the level of an advertisement position and industry. This allows us to separate the effects of the advertisement itself on viewership at the viewer segment level from the effects of advertisement slots, and to characterize heterogeneity in the effects. Exploiting this, we find that viewer responses to advertisements are heterogeneous, both across different advertisements for a given viewer segment and across viewer segments for the same advertisement.

An important question for the broadcaster is how to reach as many consumers as possible in a given commercial break. We point out that the number of viewers in a given group and at a given position within the commercial break depends on the composition of advertisements that have been aired until then. This means that there are externalities among advertisements: advertisements that are broadcast in earlier slots impose externalities on the advertisements in later slots. Our counterfactual experiments show that by re-ordering the set of advertisements in a commercial break based on their match value, broadcasters could reach more viewers in a given commercial break, and that the effect is larger across viewer segments than across shows.

Once we see the response of viewers to advertising as a revealed preference measure, this suggests that the overall welfare effects of re-ordering advertisements could be positive. This is obviously a partial equilibrium view and many factors determining welfare are not considered here. Nevertheless, we believe that this aspect deserves more attention. In this paper, we take a first step in this direction.

4.A Additional tables and figures

Table 4.6: Effect number of ad by age

| | (1) all groups | (2) age 4-24 | (3) age 25-44 | (4) age 45-64 | (5) age 65+ |
|---|---|---|---|---|---|
| percProg | 0.00102*** (0.0000296) | 0.00158*** (0.0000560) | 0.00148*** (0.0000417) | 0.000885*** (0.0000385) | -0.0000253 (0.0000554) |
| beverages | -0.00709*** (0.00113) | -0.00782*** (0.00213) | -0.00530*** (0.00158) | -0.00789*** (0.00146) | -0.0112*** (0.00211) |
| cleaning and pest | -0.0218*** (0.00153) | -0.0296*** (0.00289) | -0.0211*** (0.00215) | -0.0166*** (0.00199) | -0.0233*** (0.00286) |
| construction | -0.00780** (0.00243) | -0.00543 (0.00460) | -0.00860* (0.00343) | 0.00244 (0.00316) | -0.0226*** (0.00456) |
| consumer goods | -0.0212*** (0.00377) | -0.0197** (0.00713) | 0.000894 (0.00531) | -0.0389*** (0.00490) | -0.0374*** (0.00707) |
| cosmetics and hygiene | -0.0101*** (0.00108) | -0.00891*** (0.00204) | -0.00919*** (0.00152) | -0.0112*** (0.00140) | -0.0144*** (0.00202) |
| education | 0.00240 (0.00258) | 0.00723 (0.00488) | 0.000407 (0.00363) | 0.000803 (0.00335) | 0.00701 (0.00483) |
| fashion | -0.0154*** (0.00128) | -0.0132*** (0.00242) | -0.0144*** (0.00180) | -0.0199*** (0.00166) | -0.0203*** (0.00240) |
| financial service | -0.00365** (0.00122) | -0.000445 (0.00231) | -0.00155 (0.00172) | 0.0000403 (0.00159) | -0.0150*** (0.00229) |
| food | -0.00799*** (0.000968) | -0.00943*** (0.00183) | -0.00636*** (0.00136) | -0.00999*** (0.00126) | -0.00628*** (0.00181) |
| furniture | -0.0105*** (0.00240) | -0.00692 (0.00454) | -0.0106** (0.00338) | -0.00741* (0.00312) | -0.0268*** (0.00450) |
| gambling | -0.0142*** (0.00188) | -0.0211*** (0.00356) | -0.00349 (0.00265) | -0.0216*** (0.00244) | -0.0193*** (0.00352) |
| health | -0.0153*** (0.00145) | -0.00677* (0.00273) | -0.0147*** (0.00204) | -0.0164*** (0.00188) | -0.0249*** (0.00271) |
| household appliance | -0.00859*** (0.00177) | -0.00939** (0.00334) | -0.00931*** (0.00249) | -0.00838*** (0.00230) | -0.00573 (0.00331) |
| leisure and entertainment | -0.00999*** (0.00125) | -0.0152*** (0.00237) | -0.0113*** (0.00176) | -0.00609*** (0.00163) | -0.0144*** (0.00235) |
| organizations | -0.0223*** (0.00230) | -0.0189*** (0.00435) | -0.0212*** (0.00324) | -0.0246*** (0.00299) | -0.0283*** (0.00431) |
| other | -0.0112*** (0.00239) | -0.0103* (0.00451) | -0.0112*** (0.00336) | -0.0143*** (0.00310) | -0.00974* (0.00447) |
| retail and supermarket | -0.0106*** (0.00110) | -0.0121*** (0.00208) | -0.0122*** (0.00155) | -0.00939*** (0.00143) | -0.0130*** (0.00206) |
| technology and communication | -0.00796*** (0.00107) | -0.00314 (0.00203) | -0.00913*** (0.00151) | -0.00945*** (0.00140) | -0.0136*** (0.00201) |
| tourism and accommodation | -0.00848*** (0.00236) | 0.0103* (0.00447) | -0.00796* (0.00333) | -0.0221*** (0.00307) | -0.00688 (0.00443) |
| vehicle | -0.00457** (0.00141) | 0.00187 (0.00266) | -0.00674*** (0.00198) | -0.0105*** (0.00183) | -0.0107*** (0.00263) |
| location 1 | -0.0894*** (0.00314) | -0.0907*** (0.00594) | -0.0750*** (0.00442) | -0.102*** (0.00408) | -0.117*** (0.00589) |
| location 2 | -0.145*** (0.00341) | -0.142*** (0.00645) | -0.125*** (0.00480) | -0.166*** (0.00443) | -0.194*** (0.00639) |
| location 3 | -0.169*** (0.00386) | -0.164*** (0.00730) | -0.145*** (0.00543) | -0.195*** (0.00502) | -0.224*** (0.00723) |
| location 4 | -0.172*** (0.00444) | -0.166*** (0.00840) | -0.148*** (0.00625) | -0.199*** (0.00577) | -0.228*** (0.00832) |
| location 5 | -0.170*** (0.00511) | -0.163*** (0.00966) | -0.147*** (0.00719) | -0.198*** (0.00664) | -0.223*** (0.00957) |
| location 6 | -0.164*** (0.00583) | -0.155*** (0.0110) | -0.142*** (0.00821) | -0.191*** (0.00758) | -0.213*** (0.0109) |
| location 7 | -0.155*** (0.00661) | -0.146*** (0.0125) | -0.134*** (0.00930) | -0.181*** (0.00858) | -0.202*** (0.0124) |
| location 8 | -0.145*** (0.00741) | -0.137*** (0.0140) | -0.125*** (0.0104) | -0.169*** (0.00963) | -0.189*** (0.0139) |
| location 9 | -0.133*** (0.00825) | -0.127*** (0.0156) | -0.116*** (0.0116) | -0.155*** (0.0107) | -0.176*** (0.0154) |
| location 10 | -0.122*** (0.00910) | -0.118*** (0.0172) | -0.107*** (0.0128) | -0.140*** (0.0118) | -0.158*** (0.0171) |
| location 11 | -0.108*** (0.00998) | -0.105*** (0.0189) | -0.0932*** (0.0140) | -0.126*** (0.0130) | -0.143*** (0.0187) |
| location 12 | -0.0956*** (0.0109) | -0.0875*** (0.0205) | -0.0813*** (0.0153) | -0.113*** (0.0141) | -0.128*** (0.0204) |
| location 13 | -0.0824*** (0.0118) | -0.0777*** (0.0223) | -0.0696*** (0.0166) | -0.0973*** (0.0153) | -0.108*** (0.0221) |
| location 14 | -0.0652*** (0.0127) | -0.0585* (0.0240) | -0.0553** (0.0179) | -0.0762*** (0.0165) | -0.0848*** (0.0238) |
| location 15 | -0.0476*** (0.0137) | -0.0378 (0.0259) | -0.0386* (0.0193) | -0.0591*** (0.0178) | -0.0664** (0.0257) |
| location 16 | -0.0344* (0.0148) | -0.0212 (0.0280) | -0.0288 (0.0208) | -0.0442* (0.0192) | -0.0522 (0.0277) |
| location 17 | -0.0224 (0.0161) | -0.00492 (0.0304) | -0.0202 (0.0226) | -0.0375 (0.0209) | -0.0329 (0.0301) |
| location 18 | -0.0151 (0.0176) | 0.00171 (0.0332) | -0.0135 (0.0247) | -0.0252 (0.0228) | -0.0189 (0.0329) |
| location 19 | -0.0106 (0.0197) | 0.000986 (0.0373) | -0.00753 (0.0278) | -0.0165 (0.0257) | -0.0180 (0.0370) |
| Observations | 179022 | 179022 | 179022 | 179022 | 179022 |
| R2 | 0.867 | 0.803 | 0.783 | 0.787 | 0.713 |

Standard errors in parentheses

* p < 0.05, ** p < 0.01, *** p < 0.001

Table 4.7: Effect number of ad by income

                              (1)           (2)           (3)           (4)           (5)
                              a lot                                                   a lot above
                              below average below average average       above average average
percProg                      0.000846***   0.000772***   0.00145***    0.00133***    0.000938***
                              (0.0000538)   (0.0000501)   (0.0000537)   (0.0000596)   (0.0000741)
beverages                     -0.00704***   -0.00622**    -0.00914***   -0.00834***   -0.00884**
                              (0.00205)     (0.00190)     (0.00204)     (0.00227)     (0.00282)
cleaning and pest             -0.0243***    -0.0137***    -0.0242***    -0.0282***    -0.0217***
                              (0.00278)     (0.00259)     (0.00278)     (0.00308)     (0.00383)
construction                  -0.0135**     -0.0267***    0.00232       0.00460       -0.000814
                              (0.00442)     (0.00412)     (0.00442)     (0.00491)     (0.00610)
consumer goods                -0.0295***    -0.0110       -0.0231***    -0.0206**     -0.0413***
                              (0.00686)     (0.00638)     (0.00685)     (0.00761)     (0.00945)
cosmetics and hyge            -0.00878***   -0.00768***   -0.0158***    -0.0123***    -0.0147***
                              (0.00196)     (0.00183)     (0.00196)     (0.00218)     (0.00270)
education                     -0.00302      -0.00724      0.0253***     -0.0136**     0.0155*
                              (0.00469)     (0.00437)     (0.00469)     (0.00520)     (0.00646)
fashion                       -0.0202***    -0.0179***    -0.0127***    -0.0184***    -0.0275***
                              (0.00232)     (0.00216)     (0.00232)     (0.00258)     (0.00320)
financial service             -0.000687     -0.00132      -0.00273      -0.0102***    -0.00950**
                              (0.00222)     (0.00207)     (0.00222)     (0.00247)     (0.00307)
food                          -0.00346*     -0.00726***   -0.0126***    -0.0118***    -0.00922***
                              (0.00176)     (0.00164)     (0.00176)     (0.00195)     (0.00242)
furniture                     -0.0123**     -0.00953*     -0.0297***    -0.000170     -0.0323***
                              (0.00436)     (0.00406)     (0.00436)     (0.00484)     (0.00601)
gambling                      -0.0136***    -0.00896**    -0.0155***    -0.0265***    -0.0107*
                              (0.00342)     (0.00318)     (0.00342)     (0.00379)     (0.00471)
health                        -0.0278***    -0.0107***    -0.00688**    -0.0244***    -0.0273***
                              (0.00263)     (0.00245)     (0.00263)     (0.00291)     (0.00362)
household appliance           -0.0168***    0.00848**     -0.0196***    -0.00169      -0.0135**
                              (0.00321)     (0.00299)     (0.00321)     (0.00356)     (0.00442)
leisure and entert            -0.0114***    -0.00781***   -0.0101***    -0.0146***    -0.00154
                              (0.00227)     (0.00212)     (0.00227)     (0.00252)     (0.00314)
organizations                 -0.0243***    -0.0299***    -0.0211***    -0.0339***    -0.00592
                              (0.00418)     (0.00389)     (0.00418)     (0.00463)     (0.00576)
other                         -0.0147***    -0.00521      -0.00501      -0.0174***    -0.00888
                              (0.00434)     (0.00404)     (0.00434)     (0.00481)     (0.00598)
retail and supermarket        -0.00554**    -0.0131***    -0.0148***    -0.0160***    -0.0128***
                              (0.00200)     (0.00186)     (0.00200)     (0.00222)     (0.00276)
technology and communication  -0.00603**    -0.00839***   -0.0115***    -0.0115***    -0.00316
                              (0.00195)     (0.00182)     (0.00195)     (0.00217)     (0.00269)
tourism and accommodation     -0.00617      -0.0120**     0.00648       -0.0205***    -0.00875
                              (0.00429)     (0.00400)     (0.00429)     (0.00476)     (0.00592)
vehicle                       -0.00965***   -0.00972***   0.000692      -0.00107      0.0162***
                              (0.00255)     (0.00238)     (0.00255)     (0.00283)     (0.00352)
location
  1                           -0.0714***    -0.0828***    -0.112***     -0.113***     -0.130***
                              (0.00571)     (0.00532)     (0.00571)     (0.00633)     (0.00787)
  2                           -0.114***     -0.142***     -0.181***     -0.181***     -0.206***
                              (0.00620)     (0.00577)     (0.00620)     (0.00688)     (0.00854)
  3                           -0.133***     -0.169***     -0.209***     -0.210***     -0.239***
                              (0.00701)     (0.00653)     (0.00701)     (0.00778)     (0.00967)
  4                           -0.134***     -0.174***     -0.212***     -0.213***     -0.244***
                              (0.00807)     (0.00752)     (0.00807)     (0.00896)     (0.0111)
  5                           -0.132***     -0.172***     -0.209***     -0.209***     -0.246***
                              (0.00928)     (0.00864)     (0.00928)     (0.0103)      (0.0128)
  6                           -0.124***     -0.168***     -0.202***     -0.199***     -0.240***
                              (0.0106)      (0.00987)     (0.0106)      (0.0118)      (0.0146)
  7                           -0.115***     -0.160***     -0.190***     -0.187***     -0.230***
                              (0.0120)      (0.0112)      (0.0120)      (0.0133)      (0.0165)
  8                           -0.105***     -0.151***     -0.178***     -0.171***     -0.218***
                              (0.0135)      (0.0125)      (0.0135)      (0.0149)      (0.0186)
  9                           -0.0920***    -0.142***     -0.163***     -0.155***     -0.206***
                              (0.0150)      (0.0140)      (0.0150)      (0.0166)      (0.0207)
  10                          -0.0852***    -0.131***     -0.146***     -0.138***     -0.188***
                              (0.0165)      (0.0154)      (0.0165)      (0.0183)      (0.0228)
  11                          -0.0712***    -0.121***     -0.130***     -0.117***     -0.175***
                              (0.0181)      (0.0169)      (0.0181)      (0.0201)      (0.0250)
  12                          -0.0584**     -0.110***     -0.114***     -0.104***     -0.159***
                              (0.0197)      (0.0184)      (0.0197)      (0.0219)      (0.0272)
  13                          -0.0469*      -0.0948***    -0.102***     -0.0872***    -0.142***
                              (0.0214)      (0.0199)      (0.0214)      (0.0237)      (0.0295)
  14                          -0.0283       -0.0772***    -0.0836***    -0.0628*      -0.122***
                              (0.0231)      (0.0215)      (0.0231)      (0.0256)      (0.0318)
  15                          -0.0140       -0.0599**     -0.0649**     -0.0413       -0.0933**
                              (0.0249)      (0.0232)      (0.0249)      (0.0277)      (0.0344)
  16                          -0.00737      -0.0371       -0.0525       -0.0328       -0.0836*
                              (0.0269)      (0.0251)      (0.0269)      (0.0298)      (0.0371)
  17                          -0.00128      -0.0238       -0.0372       -0.0111       -0.0763
                              (0.0292)      (0.0272)      (0.0292)      (0.0324)      (0.0402)
  18                          0.00477       -0.0172       -0.0192       -0.00547      -0.0604
                              (0.0319)      (0.0297)      (0.0319)      (0.0354)      (0.0439)
  19                          0.0163        -0.0267       -0.0157       0.000492      -0.0658
                              (0.0359)      (0.0334)      (0.0358)      (0.0398)      (0.0494)
Observations                  179022        179022        179022        179022        179022
R²                            0.799         0.742         0.673         0.634         0.501

Standard errors in parentheses
* p < 0.05, ** p < 0.01, *** p < 0.001

Figure 4.6: Distribution of commercial breaks

(a) Distribution of commercial breaks by the number of advertisements [histogram; horizontal axis: ad number, vertical axis: density]

(b) Distribution of commercial breaks by viewership of the first advertisement [histogram; horizontal axis: percentage of viewers in the first ad, vertical axis: density]

Notes: The figures display the commercial breaks before sample selection. We use the commercial breaks that have between 6 and 20 advertisements (those between the two red lines in Figure 4.6a) and that have more than 1 percent of viewers in the first advertisement (those to the right of the red line in Figure 4.6b).
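The selection rule in the notes to Figure 4.6 can be written as a simple filter. The sketch below is illustrative only: the class and field names are hypothetical, and it assumes that "between 6 and 20" is inclusive at both ends, which the notes do not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class CommercialBreak:
    # Hypothetical record layout, not the thesis's actual data structure.
    n_ads: int                   # number of advertisements in the break
    first_ad_viewers_pct: float  # percentage of viewers during the first ad


def in_sample(b: CommercialBreak,
              min_ads: int = 6,
              max_ads: int = 20,
              min_first_ad_pct: float = 1.0) -> bool:
    """Return True if the break satisfies both selection criteria.

    Assumption: the bounds on the number of ads are inclusive, while the
    viewership threshold is strict ("more than 1 percent").
    """
    has_valid_length = min_ads <= b.n_ads <= max_ads
    has_enough_viewers = b.first_ad_viewers_pct > min_first_ad_pct
    return has_valid_length and has_enough_viewers


# Made-up breaks for illustration; only the first one passes both criteria.
breaks = [
    CommercialBreak(n_ads=12, first_ad_viewers_pct=2.4),
    CommercialBreak(n_ads=4, first_ad_viewers_pct=3.0),
    CommercialBreak(n_ads=15, first_ad_viewers_pct=0.6),
]
selected = [b for b in breaks if in_sample(b)]
```

Keeping the thresholds as parameters makes it easy to rerun the same filter under a different cutoff on the number of advertisements.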

Table 4.8: Viewer segments just before the commercial break across program genres by age

program genre                   age group
                                1      2      3      4
Children                        2.7    1.5    1.1    0.7
Cinema                          0.8    1.2    0.8    0.3
Commercial                      1.2    1.6    1.3    0.8
Culture, Leisure & Education    1.2    1.5    1.0    0.6
Documentary                     1.3    1.9    1.5    0.9
Entertainment                   2.0    2.2    1.5    0.7
News & Current Affairs          1.0    1.6    1.4    0.9
Other                           1.2    1.5    1.1    0.6
Sports                          1.2    1.3    0.9    0.4
Television Drama                1.4    1.5    1.0    0.5
average                         1.4    1.6    1.2    0.6

Notes: The table presents the average percentage of viewers within an age subgroup viewing a specific genre. The numbers are reported in terms of GRPs: the average percentage of the total population that has been reached and that comes from the given age subgroup. Each entry is calculated as the viewer rating of a viewer segment multiplied by the weight of that segment in the total population.

Table 4.9: Viewer segments just before the commercial break across program genres by income

program genre                   income group
                                1      2      3      4      5
Children                        1.6    1.5    1.4    0.9    0.5
Cinema                          0.8    0.6    0.7    0.6    0.3
Commercial                      1.4    1.2    1.2    0.7    0.4
Culture, Leisure & Education    1.1    1.0    1.1    0.6    0.4
Documentary                     1.5    1.3    1.4    0.9    0.5
Entertainment                   1.7    1.5    1.6    1.1    0.6
News & Current Affairs          1.3    1.2    1.2    0.8    0.4
Other                           1.2    1.1    1.1    0.6    0.3
Sports                          0.9    0.9    1.1    0.6    0.3
Television Drama                1.2    0.9    1.2    0.7    0.4
average                         1.3    1.1    1.2    0.7    0.4

Notes: The table presents the average percentage of viewers within an income subgroup viewing a specific genre. The numbers are reported in terms of GRPs: the average percentage of the total population that has been reached and that comes from the given income subgroup. Each entry is calculated as the viewer rating of a viewer segment multiplied by the weight of that segment in the total population.
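The calculation described in the notes to Tables 4.8 and 4.9 (segment rating times segment weight) can be illustrated with a short sketch. The function name and all input numbers below are hypothetical and are not taken from the data; the sketch only shows the arithmetic.

```python
def grp_contribution(segment_rating_pct: float, population_weight: float) -> float:
    """Viewer rating within a segment (in percent) times the segment's
    share of the total population (a number between 0 and 1)."""
    if not 0.0 <= population_weight <= 1.0:
        raise ValueError("population_weight must be a share between 0 and 1")
    return segment_rating_pct * population_weight


# Hypothetical example with two segments whose weights sum to one.
ratings = {"segment A": 5.0, "segment B": 2.0}    # percent of each segment watching
weights = {"segment A": 0.25, "segment B": 0.75}  # population shares

contributions = {name: grp_contribution(ratings[name], weights[name])
                 for name in ratings}
# Summing the contributions over all segments gives the rating in the
# total population.
total_rating = sum(contributions.values())
```

In this made-up example, segment A contributes 1.25 percentage points and segment B contributes 1.5, so the rating in the total population is 2.75 percent.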

Table 4.10: Effect number of ad by age

                              (1)           (2)           (3)           (4)           (5)
                              all groups    age 4-24      age 25-44     age 45-64     age 65+
percProg                      0.00107***    0.00157***    0.00148***    0.000982***   0.0000142
                              (0.0000282)   (0.0000521)   (0.0000394)   (0.0000368)   (0.0000518)
beverages                     -0.00611      -0.00357      -0.00466      -0.00540      -0.0135
                              (0.00424)     (0.00782)     (0.00591)     (0.00551)     (0.00777)
cleaning and pest             -0.0213***    -0.0272***    -0.0208***    -0.0148**     -0.0258**
                              (0.00437)     (0.00807)     (0.00610)     (0.00569)     (0.00802)
construction                  -0.00890      -0.00334      -0.0117       0.00217       -0.0217*
                              (0.00474)     (0.00876)     (0.00662)     (0.00617)     (0.00870)
consumer goods                -0.0197***    -0.0128       0.00273       -0.0395***    -0.0439***
                              (0.00550)     (0.0102)      (0.00768)     (0.00716)     (0.0101)
cosmetics and hyge            -0.00949*     -0.00632      -0.00862      -0.00974      -0.0159*
                              (0.00422)     (0.00779)     (0.00589)     (0.00549)     (0.00775)
education                     0.00258       0.0118        -0.00211      0.00395       0.00461
                              (0.00480)     (0.00886)     (0.00670)     (0.00625)     (0.00881)
fashion                       -0.0143***    -0.00963      -0.0134*      -0.0183**     -0.0207**
                              (0.00427)     (0.00789)     (0.00596)     (0.00556)     (0.00784)
financial service             -0.00432      0.00184       -0.00241      -0.000770     -0.0181*
                              (0.00426)     (0.00787)     (0.00595)     (0.00555)     (0.00782)
food                          -0.00756      -0.00636      -0.00574      -0.00911      -0.00882
                              (0.00420)     (0.00775)     (0.00586)     (0.00546)     (0.00770)
furniture                     -0.00909      -0.00466      -0.00934      -0.00569      -0.0246**
                              (0.00473)     (0.00873)     (0.00661)     (0.00616)     (0.00868)
gambling                      -0.0160***    -0.0191*      -0.00708      -0.0222***    -0.0199*
                              (0.00449)     (0.00830)     (0.00627)     (0.00585)     (0.00825)
health                        -0.0129**     -0.00194      -0.0118       -0.0145*      -0.0269***
                              (0.00433)     (0.00799)     (0.00605)     (0.00564)     (0.00795)
household appliance           -0.00985*     -0.00954      -0.0118       -0.00874      -0.00664
                              (0.00445)     (0.00821)     (0.00621)     (0.00579)     (0.00816)
leisure and entert            -0.00874*     -0.00972      -0.0102       -0.00517      -0.0183*
                              (0.00427)     (0.00789)     (0.00597)     (0.00556)     (0.00784)
organizations                 -0.0188***    -0.0138       -0.0187**     -0.0187**     -0.0272**
                              (0.00468)     (0.00864)     (0.00654)     (0.00609)     (0.00859)
other                         -0.00883      -0.00462      -0.00863      -0.0126*      -0.0106
                              (0.00466)     (0.00861)     (0.00651)     (0.00607)     (0.00855)
retail and supermarket        -0.00952*     -0.00820      -0.0104       -0.00824      -0.0153*
                              (0.00423)     (0.00780)     (0.00590)     (0.00550)     (0.00775)
technology and communication  -0.00704      0.000245      -0.00816      -0.00843      -0.0157*
                              (0.00423)     (0.00780)     (0.00590)     (0.00550)     (0.00776)
tourism and accommodation     -0.00819      0.0128        -0.00605      -0.0231***    -0.0117
                              (0.00469)     (0.00866)     (0.00655)     (0.00611)     (0.00861)
vehicle                       -0.00347      0.00405       -0.00424      -0.00877      -0.0125
                              (0.00431)     (0.00796)     (0.00602)     (0.00561)     (0.00791)
location
  1                           -0.0884***    -0.0920***    -0.0750***    -0.102***     -0.112***
                              (0.00505)     (0.00932)     (0.00705)     (0.00657)     (0.00926)
  2                           -0.142***     -0.144***     -0.123***     -0.164***     -0.182***
                              (0.00881)     (0.0163)      (0.0123)      (0.0115)      (0.0162)
  3                           -0.168***     -0.170***     -0.145***     -0.194***     -0.211***
                              (0.0128)      (0.0237)      (0.0179)      (0.0167)      (0.0235)
  4                           -0.172***     -0.177***     -0.150***     -0.200***     -0.214***
                              (0.0169)      (0.0313)      (0.0236)      (0.0220)      (0.0311)
  5                           -0.171***     -0.178***     -0.150***     -0.200***     -0.207***
                              (0.0211)      (0.0389)      (0.0294)      (0.0274)      (0.0386)
  6                           -0.166***     -0.174***     -0.146***     -0.194***     -0.196***
                              (0.0252)      (0.0465)      (0.0352)      (0.0328)      (0.0462)
  7                           -0.158***     -0.168**      -0.139***     -0.185***     -0.182***
                              (0.0294)      (0.0542)      (0.0410)      (0.0382)      (0.0539)
  8                           -0.148***     -0.163**      -0.131**      -0.175***     -0.168**
                              (0.0335)      (0.0619)      (0.0468)      (0.0436)      (0.0615)
  9                           -0.138***     -0.157*       -0.122*       -0.163***     -0.154*
                              (0.0377)      (0.0696)      (0.0526)      (0.0491)      (0.0692)
  10                          -0.127**      -0.152*       -0.114        -0.149**      -0.134
                              (0.0419)      (0.0773)      (0.0585)      (0.0545)      (0.0768)
  11                          -0.115*       -0.141        -0.101        -0.136*       -0.117
                              (0.0460)      (0.0850)      (0.0643)      (0.0599)      (0.0845)
  12                          -0.103*       -0.128        -0.0898       -0.125        -0.100
                              (0.0502)      (0.0927)      (0.0701)      (0.0654)      (0.0922)
  13                          -0.0907       -0.122        -0.0791       -0.111        -0.0783
                              (0.0544)      (0.100)       (0.0760)      (0.0708)      (0.0998)
  14                          -0.0750       -0.107        -0.0665       -0.0916       -0.0541
                              (0.0586)      (0.108)       (0.0818)      (0.0763)      (0.108)
  15                          -0.0587       -0.0899       -0.0509       -0.0764       -0.0343
                              (0.0628)      (0.116)       (0.0877)      (0.0818)      (0.115)
  16                          -0.0475       -0.0787       -0.0428       -0.0634       -0.0194
                              (0.0670)      (0.124)       (0.0936)      (0.0872)      (0.123)
  17                          -0.0382       -0.0693       -0.0357       -0.0581       -0.000506
                              (0.0713)      (0.132)       (0.0995)      (0.0928)      (0.131)
  18                          -0.0321       -0.0676       -0.0297       -0.0474       0.0130
                              (0.0756)      (0.139)       (0.105)       (0.0983)      (0.139)
  19                          -0.0266       -0.0697       -0.0221       -0.0377       0.0179
                              (0.0799)      (0.148)       (0.112)       (0.104)       (0.147)
  20                          -0.0175       -0.0692       -0.0144       -0.0218       0.0295
                              (0.0844)      (0.156)       (0.118)       (0.110)       (0.155)
  21                          -0.00749      -0.0581       0.000156      -0.00723      0.0268
                              (0.0892)      (0.165)       (0.125)       (0.116)       (0.164)
  22                          0.00256       -0.0469       0.00809       -0.00729      0.0545
                              (0.0944)      (0.174)       (0.132)       (0.123)       (0.173)
  23                          0.0164        0.00863       0.00853       -0.0115       0.0290
                              (0.102)       (0.188)       (0.142)       (0.132)       (0.187)
  24                          0.00507       0.000408      0.0150        -0.0286       0.0595
                              (0.114)       (0.210)       (0.159)       (0.148)       (0.209)
Observations                  196939        196939        196939        196939        196939
R²                            0.867         0.805         0.785         0.788         0.716

Standard errors in parentheses
* p < 0.05, ** p < 0.01, *** p < 0.001

Table 4.11: Effect number of ad by income

                              (1)           (2)           (3)           (4)           (5)
                              a lot                                                   a lot above
                              below average below average average       above average average
percProg                      0.000746***   0.000879***   0.00149***    0.00150***    0.00107***
                              (0.0000504)   (0.0000472)   (0.0000507)   (0.0000561)   (0.0000692)
beverages                     -0.00848      -0.00382      -0.00648      -0.0150       -0.00853
                              (0.00755)     (0.00708)     (0.00760)     (0.00842)     (0.0104)
cleaning and pest             -0.0270***    -0.0135       -0.0196*      -0.0361***    -0.0194
                              (0.00780)     (0.00731)     (0.00785)     (0.00869)     (0.0107)
construction                  -0.0117       -0.0312***    0.000854      -0.00181      -0.00609
                              (0.00846)     (0.00793)     (0.00852)     (0.00943)     (0.0116)
consumer goods                -0.0241*      -0.00782      -0.0161       -0.0347**     -0.0571***
                              (0.00981)     (0.00920)     (0.00988)     (0.0109)      (0.0135)
cosmetics and hyge            -0.0102       -0.00500      -0.0142       -0.0196*      -0.0142
                              (0.00753)     (0.00706)     (0.00758)     (0.00839)     (0.0103)
education                     -0.00651      -0.00366      0.0245**      -0.0183       0.0152
                              (0.00856)     (0.00802)     (0.00862)     (0.00954)     (0.0118)
fashion                       -0.0207**     -0.0157*      -0.00698      -0.0276**     -0.0277**
                              (0.00762)     (0.00714)     (0.00767)     (0.00849)     (0.0105)
financial service             -0.00265      -0.000816     -0.00204      -0.0208*      -0.0122
                              (0.00760)     (0.00713)     (0.00765)     (0.00848)     (0.0104)
food                          -0.00509      -0.00596      -0.01000      -0.0191*      -0.0105
                              (0.00749)     (0.00702)     (0.00754)     (0.00835)     (0.0103)
furniture                     -0.0172*      -0.00549      -0.0236**     -0.00783      -0.0260*
                              (0.00844)     (0.00791)     (0.00849)     (0.00941)     (0.0116)
gambling                      -0.0160*      -0.00770      -0.0167*      -0.0351***    -0.0153
                              (0.00801)     (0.00751)     (0.00807)     (0.00894)     (0.0110)
health                        -0.0287***    -0.00701      -0.000508     -0.0302***    -0.0292**
                              (0.00772)     (0.00724)     (0.00777)     (0.00861)     (0.0106)
household appliance           -0.0191*      0.00661       -0.0193*      -0.0122       -0.0132
                              (0.00793)     (0.00744)     (0.00799)     (0.00885)     (0.0109)
leisure and entert            -0.0129       -0.00711      -0.00511      -0.0212*      -0.00299
                              (0.00762)     (0.00714)     (0.00767)     (0.00850)     (0.0105)
organizations                 -0.0229**     -0.0236**     -0.0150       -0.0369***    -0.00108
                              (0.00835)     (0.00783)     (0.00840)     (0.00931)     (0.0115)
other                         -0.0133       -0.00448      0.00469       -0.0239**     -0.00980
                              (0.00831)     (0.00779)     (0.00837)     (0.00927)     (0.0114)
retail and supermarket        -0.00711      -0.0112       -0.0105       -0.0227**     -0.0112
                              (0.00753)     (0.00706)     (0.00758)     (0.00840)     (0.0104)
technology and communication  -0.00782      -0.00661      -0.00801      -0.0175*      -0.00353
                              (0.00754)     (0.00707)     (0.00759)     (0.00841)     (0.0104)
tourism and accommodation     -0.00693      -0.00720      0.00552       -0.0255**     -0.0135
                              (0.00837)     (0.00784)     (0.00842)     (0.00933)     (0.0115)
vehicle                       -0.0134       -0.00582      0.00399       -0.0102       0.0171
                              (0.00769)     (0.00721)     (0.00774)     (0.00858)     (0.0106)
location
  1                           -0.0688***    -0.0837***    -0.112***     -0.103***     -0.126***
                              (0.00900)     (0.00844)     (0.00906)     (0.0100)      (0.0124)
  2                           -0.108***     -0.140***     -0.183***     -0.161***     -0.195***
                              (0.0157)      (0.0147)      (0.0158)      (0.0175)      (0.0216)
  3                           -0.125***     -0.169***     -0.214***     -0.184***     -0.229***
                              (0.0229)      (0.0215)      (0.0230)      (0.0255)      (0.0314)
  4                           -0.125***     -0.176***     -0.221***     -0.181***     -0.236***
                              (0.0302)      (0.0283)      (0.0304)      (0.0337)      (0.0415)
  5                           -0.121**      -0.178***     -0.221***     -0.170***     -0.236***
                              (0.0376)      (0.0352)      (0.0378)      (0.0419)      (0.0516)
  6                           -0.111*       -0.175***     -0.217***     -0.154**      -0.230***
                              (0.0449)      (0.0421)      (0.0452)      (0.0501)      (0.0618)
  7                           -0.101        -0.169***     -0.208***     -0.135*       -0.220**
                              (0.0524)      (0.0491)      (0.0527)      (0.0584)      (0.0719)
  8                           -0.0891       -0.162**      -0.199***     -0.112        -0.207*
                              (0.0598)      (0.0560)      (0.0602)      (0.0667)      (0.0821)
  9                           -0.0753       -0.156*       -0.188**      -0.0888       -0.195*
                              (0.0672)      (0.0630)      (0.0677)      (0.0750)      (0.0924)
  10                          -0.0665       -0.147*       -0.175*       -0.0648       -0.177
                              (0.0747)      (0.0700)      (0.0752)      (0.0833)      (0.103)
  11                          -0.0513       -0.139        -0.161        -0.0368       -0.164
                              (0.0821)      (0.0770)      (0.0827)      (0.0916)      (0.113)
  12                          -0.0377       -0.130        -0.149        -0.0158       -0.149
                              (0.0896)      (0.0840)      (0.0902)      (0.0999)      (0.123)
  13                          -0.0245       -0.117        -0.141        0.00861       -0.132
                              (0.0970)      (0.0910)      (0.0977)      (0.108)       (0.133)
  14                          -0.00530      -0.101        -0.126        0.0395        -0.112
                              (0.105)       (0.0980)      (0.105)       (0.117)       (0.144)
  15                          0.0101        -0.0858       -0.112        0.0679        -0.0844
                              (0.112)       (0.105)       (0.113)       (0.125)       (0.154)
  16                          0.0170        -0.0679       -0.103        0.0838        -0.0759
                              (0.120)       (0.112)       (0.120)       (0.133)       (0.164)
  17                          0.0230        -0.0595       -0.0925       0.109         -0.0682
                              (0.127)       (0.119)       (0.128)       (0.142)       (0.175)
  18                          0.0289        -0.0568       -0.0775       0.121         -0.0519
                              (0.135)       (0.126)       (0.136)       (0.150)       (0.185)
  19                          0.0407        -0.0650       -0.0734       0.140         -0.0492
                              (0.142)       (0.134)       (0.143)       (0.159)       (0.196)
  20                          0.0332        -0.0510       -0.0578       0.159         -0.0131
                              (0.151)       (0.141)       (0.152)       (0.168)       (0.207)
  21                          0.0518        -0.0550       -0.0414       0.199         -0.0205
                              (0.159)       (0.149)       (0.160)       (0.177)       (0.218)
  22                          0.0384        -0.0578       -0.0227       0.231         0.00122
                              (0.168)       (0.158)       (0.169)       (0.188)       (0.231)
  23                          0.0426        -0.0129       -0.00577      0.266         0.0410
                              (0.181)       (0.170)       (0.183)       (0.202)       (0.249)
  24                          0.0895        -0.0909       -0.0273       0.255         0.0638
                              (0.203)       (0.190)       (0.204)       (0.226)       (0.278)
Observations                  196939        196939        196939        196939        196939
R²                            0.801         0.745         0.680         0.640         0.508

Standard errors in parentheses
* p < 0.05, ** p < 0.01, *** p < 0.001

Bibliography

Ackerberg, D. A. (2001). Empirically distinguishing informative and prestige effects of advertising. RAND Journal of Economics, 316–333.
Ackerberg, D. A. (2003). Advertising, learning, and consumer choice in experience good markets: an empirical examination. International Economic Review 44(3), 1007–1040.
Albuquerque, P. and B. J. Bronnenberg (2009). Estimating demand heterogeneity using aggregated data: An application to the frozen pizza category. Marketing Science 28(2), 356–372.
Anand, B. N. and R. Shachar (2001). Advertising, the matchmaker. The RAND Journal of Economics 42(2), 205–245.
Bagwell, K. (2007). The economic analysis of advertising. In M. Armstrong and R. Porter (Eds.), Handbook of Industrial Organization, Volume 3, pp. 1701–1844. Elsevier.
Bronnenberg, B. J. and W. R. Vanhonacker (1996). Limited choice sets, local price response and implied measures of price competition. Journal of Marketing Research, 163–173.
Chandy, R. K., G. J. Tellis, D. J. MacInnis, and P. Thaivanich (2001). What to say when: Advertising appeals in evolving markets. Journal of Marketing Research 38(4), 399–414.
Clark, C. R., U. Doraszelski, and M. Draganska (2009). The effect of advertising on brand awareness and perceived quality: An empirical investigation using panel data. Quantitative Marketing and Economics 7(2), 207–236.
Danaher, P. J. and T. S. Dagger (2013). Comparing the relative effectiveness of advertising channels: A case study of a multimedia blitz campaign. Journal of Marketing Research 50(4), 517–534.
De Groote, O. and F. Verboven (2016). Subsidies and myopia in technology adoption: Evidence from solar photovoltaic systems. Working Paper, KU Leuven, Belgium.
Deng, Y. and C. F. Mela (2018). TV viewing and advertising targeting. Journal of Marketing Research 55(1), 99–118.
Draganska, M. and D. Klapper (2011). Choice set heterogeneity and the role of advertising: An analysis with micro and macro data. Journal of Marketing Research 48(4), 653–669.
Du, R. Y., L. Xu, and K. C. Wilbur (2017). Should TV advertisers maximize immediate online response? SSRN Working Paper 3037734.
Dubé, J.-P., G. J. Hitsch, and P. Manchanda (2005). An empirical model of advertising dynamics. Quantitative Marketing and Economics 3(2), 107–144.
Goldfarb, A. and C. Tucker (2011a). Online display advertising: Targeting and obtrusiveness. Marketing Science 30(3), 389–404.
Goldfarb, A. and C. Tucker (2011b). Search engine advertising: Channel substitution when pricing ads to context. Management Science 57(3), 458–470.
Haans, H., N. Raassens, and R. van Hout (2013). Search engine advertisements: The impact of advertising statements on click-through and conversion rates. Marketing Letters 24(2), 151–163.
He, C. and T. J. Klein (2018). Advertising as a reminder: evidence from the dutch state lottery. C.E.P.R. Discussion Paper 12948.
Hoban, P. R. and R. E. Bucklin (2015). Effects of internet display advertising in the purchase funnel: Model-based insights from a randomized field experiment. Journal of Marketing Research 52(3), 375–393.
Hu, Y., L. M. Lodish, and A. M. Krieger (2007). An analysis of real world TV advertising tests: A 15-year update. Journal of Advertising Research 47(3), 341.
Joo, M., K. C. Wilbur, and Y. Zhu (2015). Effects of TV advertising on keyword search. International Journal of Research in Marketing.
Kahneman, D. (1973). Attention and effort, Volume 1063. Prentice-Hall Englewood Cliffs, NJ.
Kim, J. B., P. Albuquerque, and B. J. Bronnenberg (2010). Online demand under limited consumer search. Marketing Science 29(6), 1001–1023.
Kim, J. B., P. Albuquerque, and B. J. Bronnenberg (2016). The probit choice model under sequential search with an application to online retailing. Management Science.
Kitts, B., M. Bardaro, D. Au, A. Lee, S. Lee, J. Borchardt, C. Schwartz, J. Sobieski, and J. Wadsworth-Drake (2014). Can television advertising impact be measured on the web? web spike response as a possible conversion tracking system for television. In Proceedings of the Eighth International Workshop on Data Mining for Online Advertising, pp. 1–9. ACM.
Krugman, H. E. (1972). Why three exposures may be enough. Journal of Advertising Research 12(6), 11–14.
Lambrecht, A. and C. Tucker (2013). When does retargeting work? information specificity in online advertising. Journal of Marketing Research 50(5), 561–576.
Letang, V. and L. Stillman (2016). MAGNA global advertising forecast. https://www.magnaglobal.com/wp-content/uploads/2016/12/MAGNA-December-Global-Forecast-Update-Press-Release-1.pdf.
Lewis, R. A. and J. M. Rao (2015). The unfavorable economics of measuring the returns to advertising. The Quarterly Journal of Economics 130(4), 1941–1973.
Lewis, R. A. and D. H. Reiley (2013). Down-to-the-minute effects of super bowl advertising on online search behavior. In Proceedings of the fourteenth ACM conference on Electronic commerce, pp. 639–656. ACM.
Liaukonyte, J., T. Teixeira, and K. C. Wilbur (2015). Television advertising and online shopping. Marketing Science 34(3), 311–330.
Lodish, L. M., M. Abraham, S. Kalmenson, J. Livelsberger, B. Lubetkin, B. Richardson, and M. E. Stevens (1995). How TV advertising works: A meta-analysis of 389 real world split cable tv advertising experiments. Journal of Marketing Research, 125–139.
Manchanda, P., J.-P. Dubé, K. Y. Goh, and P. K. Chintagunta (2006). The effect of banner advertising on internet purchasing. Journal of Marketing Research 43(1), 98–108.
McFadden, D. (1989). A method of simulated moments for estimation of discrete response models without numerical integration. Econometrica 57(5), 995–1026.
Melnikov, O. (2013). Demand for differentiated durable products: The case of the us computer printer market. Economic Inquiry 51(2), 1277–1298.
Nadarajah, S. and S. Kotz (2008). Exact distribution of the max/min of two gaussian random variables. IEEE Transactions on very large scale integration (VLSI) systems 16(2), 210–212.
Newey, W. and D. McFadden (1994). Large sample estimation and hypothesis testing. In R. F. Engle and D. McFadden (Eds.), Handbook of Econometrics, Volume 4, pp. 2112–2245.
Roberts, J. H. and J. M. Lattin (1991). Development and testing of a model of consideration set composition. Journal of Marketing Research, 429–440.
Rust, J. (1987). Optimal replacement of GMC bus engines: An empirical model of Harold Zurcher. Econometrica 55(5), 999–1033.
Schweidel, D. A. and R. J. Kent (2010). Predictors of the gap between program and commercial audiences: An investigation using live tuning data. Journal of Marketing 74(3), 18–33.
Sethuraman, R., G. J. Tellis, and R. A. Briesch (2011). How well does advertising work? generalizations from meta-analysis of brand advertising elasticities. Journal of Marketing Research 48(3), 457–471.
Sherman, L. and J. Deighton (2001). Banner advertising: Measuring effectiveness and optimizing placement. Journal of Interactive Marketing 15(2), 60–64.
Sovinsky Goeree, M. (2008). Limited information and advertising in the US personal computer industry. Econometrica 76(5), 1017–1074.
Stephens-Davidowitz, S., H. Varian, and M. D. Smith (2017). Super returns to super bowl ads? Quantitative Marketing and Economics 15(1), 1–28.
Stigler, G. J. (1961). The economics of information. Journal of Political Economy 69(3), 213–225.
Tellis, G. J., R. K. Chandy, and P. Thaivanich (2000). Which ad works, when, where, and how often? modeling the effects of direct television advertising. Journal of Marketing Research 37(1), 32–46.
Tuchman, A. E., H. S. Nair, and P. M. Gardete (2017, Oct). Television ad-skipping, consumption complementarities and the consumer demand for advertising. Quantitative Marketing and Economics.
Wilbur, K. C. (2016). Advertising content and television advertising avoidance. Journal of Media Economics 29(2), 51–72.
Wilbur, K. C., L. Xu, and D. Kempe (2013). Correcting audience externalities in television advertising. Marketing Science 32(6), 892–912.
Woltman Elpers, J. L., M. Wedel, and R. G. Pieters (2003). Why do consumers stop viewing television commercials? two experiments on the influence of moment-to-moment entertainment and information value. Journal of Marketing Research 40(4), 437–453.
