STRUCTURAL MODELS OF TECHNOLOGY ADOPTION

by

Botao Yang

A thesis submitted in conformity with the requirements

for the degree of Doctor of Philosophy

Joseph L. Rotman School of Management

University of Toronto

© Copyright by Botao Yang (2009)

ISBN: 978-0-494-55690-0

Structural Models of Technology Adoption

Botao Yang

Doctor of Philosophy (2009)

Joseph L. Rotman School of Management

University of Toronto

ABSTRACT

This dissertation consists of two essays, studying technology adoption decisions from both the demand/consumer side and supply/firm side by using structural models. Essay 1 investigates consumers' ATM card adoption decisions and Essay 2 examines firms' 56K modem adoption decisions in 1997.

The first essay, "Dynamics of Consumer Adoption of Financial Innovations: the Case of ATM Cards", offers a new explanation for a stylized fact - seniors' low technology adoption rate. Previous literature tries to rationalize this fact by arguing that seniors have psychological resistance toward technology, or that they have more difficulty learning new technologies.

However, one potential explanation has been neglected: the elderly have shorter life horizons than the young, and consequently they have smaller discounted adoption benefits. To capture this, we model consumers as forward-looking agents who solve a finite-horizon dynamic programming problem when making adoption decisions. We apply this framework to the case of ATM cards. To measure monetary benefits from ATM card adoption, we also model how consumers make cash withdrawal decisions. We estimate the structural parameters by using a micro-level panel dataset. We find evidence that the elderly may not have larger monetary adoption costs for ATM cards.

The second essay, "Are All Managers Created Equal", explores the idea that managers have different strategic thinking levels when playing a simultaneous entry game. Based on the cognitive hierarchy framework of Camerer, Ho and Chong (2004), we develop a structural model that estimates the level of strategic thinking. In the model, firms with a high level of strategic thinking are more likely to correctly conjecture the expected actions of their competitors. We apply this model to decisions by 2,233 Internet Service Providers to offer their customers access through 56K modems in 1997. The model is validated by showing that firms with a higher probability of strategic thinking were more likely to have survived through April 2007. The estimation results show considerable heterogeneity in the degree to which firms behave strategically and suggest that strategic ability affects marketing outcomes: a simulated increase in strategic ability means that fewer firms offer the technology to their customers.

To my parents

ACKNOWLEDGEMENTS

I am deeply grateful to my dissertation committee members, Andrew Ching, Sridhar Moorthy, Avi Goldfarb, and Victor Aguirregabiria, for their exceptional guidance and help on this dissertation. Without your help, writing my dissertation could never have been such a rewarding experience.

I want to thank all faculty members in the Marketing area at the University of Toronto for their generous help along the way. In particular, I would like to thank Andy Mitchell, Mengze Shi, Dilip Soman, Nitin Mehta, Ron Borkovsky, Claire Tsai, Min Zhao, David Soberman, and Sergio Meza. I also appreciate the comments from Ignatius J. Horstmann, Kenneth Corts and other seminar participants at the University of Toronto.

Many thanks to my fellow PhD students and friends in Toronto. Their friendship made my PhD life more enjoyable and memorable.

My special thanks go to my parents and my fiancée Lori Qingyuan Yue. This dissertation was made possible by your unconditional love and encouragement.

Finally, I want to say "thank you" to all those who ever helped me in one way or another during my PhD life. Although I am not able to list all your names here, I will never forget your help.

TABLE OF CONTENTS

INTRODUCTION

ESSAY 1
1 INTRODUCTION
2 LITERATURE REVIEW
3 INSTITUTIONAL DETAILS
4 DATA
5 MODEL
6 EMPIRICAL STRATEGY
7 ESTIMATION RESULTS AND COUNTERFACTUAL EXPERIMENTS
8 LIMITATIONS
9 CONCLUSION
APPENDIX
REFERENCES

ESSAY 2
1 INTRODUCTION
2 A REVIEW OF THE TWO KEY BUILDING BLOCKS
3 MODEL AND EMPIRICAL STRATEGY
4 RESULTS
5 LIMITATIONS
6 CONCLUSION
REFERENCES
APPENDIX

Introduction

The diffusion of a new technology usually requires adoption decisions from both the supply side and the demand side - firms should carry the new technology products or provide the new services; consumers should decide to either buy the new technology products or use the new services. Consequently, it is important to study adoption decisions from both sides. This dissertation fulfills this task - it studies both consumers' technology adoption decisions and firms' technology adoption decisions by using structural models. Specifically, Essay 1 investigates forward-looking consumers' ATM card adoption decisions in a dynamic discrete choice model, and Essay 2 examines firms' 56K modem adoption decisions in 1997 in a static simultaneous entry game framework.

Essay 1: Dynamics of Consumer Adoption of Financial Innovations: The Case of ATM Cards (coauthored with Andrew Ching)

Consumer technology adoption has long been a research topic in Marketing and Economics. One interesting stylized fact is that usage of new technologies by the elderly is consistently much lower than that by other age groups. Previous literature tries to rationalize this fact by arguing that the elderly have psychological resistance toward new technologies, or it is relatively more difficult for them to learn and use new technologies.

If one estimates individuals' adoption costs in a static choice model, this stylized fact would translate into higher adoption costs for the elderly. However, there is one potential explanation that has been neglected in the previous literature: the elderly have much shorter life horizons than the young, and consequently their total discounted benefits from adoption could also be much smaller. In order to capture this factor, we explicitly model consumers as forward-looking agents who solve a finite horizon dynamic programming problem when deciding whether to adopt a new technology. We apply this framework to the case of ATM cards. To measure monetary benefits per period from ATM card adoption, we also explicitly model how consumers make cash withdrawal decisions. We estimate the structural parameters of our model by using a micro-level panel dataset, which consists of detailed demographic information, individuals' adoption decisions of ATM cards and cash withdrawal patterns, and the number of ATM machines and interest rates over time, as provided by the Bank of Italy. The estimation results allow us to measure the relative importance of adoption costs and total discounted benefits in influencing consumers' ATM card adoption decisions. We find evidence that the elderly may not have larger adoption costs for ATM cards in Italy - the lower ATM card adoption rate among the elderly can be explained in terms of differences in total discounted benefits of adoption across age groups. Since we can infer consumers' adoption benefits per period from their observed usage patterns, combining this with the dynamic model of adoption decisions allows us to measure adoption costs in monetary terms - this is another important contribution of this paper. By conducting counterfactual experiments, we quantify how consumers' ATM adoption decisions would be affected by changing (i) the amount of sign-up bonuses, (ii) the number of ATMs, and (iii) interest rates.

Essay 2: Are All Managers Created Equal (coauthored with Avi Goldfarb)

Some managers are better than others. Based on the cognitive hierarchy framework of Camerer, Ho and Chong (2004), we develop a structural econometric model that estimates the level of strategic thinking. In the model, firms with a high level of strategic thinking are more likely to correctly conjecture the expected actions of their competitors. We apply this model to decisions by managers at 2,233 Internet Service Providers to offer their customers access through 56K modems in 1997. The model is validated by showing that firms with a higher estimated probability of strategic thinking were more likely to have survived through April 2007. The estimation results show considerable heterogeneity in the degree to which firms behave strategically and suggest that strategic ability affects marketing outcomes: a simulated increase in strategic ability means that fewer firms offer the technology to their customers.

Essay 1

Dynamics of Consumer Adoption of Financial Innovations: The Case of ATM Cards (coauthored with Andrew Ching)

Keywords: Technology Adoption, Monetary Adoption Cost, Optimal Stopping, Cash Demand Model, ATM Cards, Elderly People

1 Introduction

Consumer technology adoption has long been a research topic in Marketing and Economics. One interesting stylized fact is that usage of new technologies (e.g., calculators, computers, video recorders, cable television, and automated teller machines (ATMs)) by the elderly is consistently much lower than that by other age groups (Kerschner and Chelsvig (1984), Gilly and Zeithaml (1985)). Previous literature tries to rationalize this fact by arguing that either the elderly have psychological resistance toward new technologies (technophobia), or it is relatively difficult for them to learn and use new technologies (Adams and Thieben (1991), Hatta and Liyama (1991), Rogers et al. (1996)). If one estimates individuals' adoption costs in a static choice model, this stylized fact would translate into higher adoption costs (both physiological and psychological) for the elderly. However, one potential explanation has been neglected in the previous literature: the elderly have much shorter life horizons than the young, and consequently their total discounted benefits from adoption could also be much smaller.

Ignoring the differences in total discounted benefits from adopting a new technology could lead to biased estimates in adoption costs for various age groups. To capture this factor, we explicitly model consumers to be forward-looking and solve a finite horizon dynamic programming problem when deciding whether to adopt a new technology. Our first goal is to use this framework to measure the relative importance of adoption costs and total discounted future benefits in influencing adoption decisions.
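To illustrate the horizon effect described above, the following is a minimal Python sketch of a deterministic finite horizon optimal stopping problem. It is not the model estimated in this essay: the function names, the constant per-period benefit of 20, the adoption cost of 150, the discount factor of 0.95, and the terminal age of 85 are all hypothetical values chosen for illustration, and random shocks and age-specific survival probabilities are omitted.

```python
def discounted_benefit(age, terminal_age, benefit, beta, survival=1.0):
    """Discounted sum of per-period benefits received from `age` up to (not including) `terminal_age`."""
    total = 0.0
    weight = 1.0  # discount (and survival) weight applied to the current period
    for _ in range(age, terminal_age):
        total += weight * benefit
        weight *= beta * survival
    return total


def adoption_values(start_age, terminal_age, benefit, cost, beta, survival=1.0):
    """Backward induction over ages for a non-adopter.

    Returns a dict mapping age -> (value of being a non-adopter, adopt_now flag).
    Adoption is an absorbing state: once the cost is paid, benefits flow until the terminal age.
    """
    values = {}
    continuation = 0.0  # nothing is received after the terminal age
    for age in range(terminal_age - 1, start_age - 1, -1):
        adopt = discounted_benefit(age, terminal_age, benefit, beta, survival) - cost
        wait = beta * survival * continuation
        values[age] = (max(adopt, wait), adopt > wait)
        continuation = values[age][0]
    return values


# Hypothetical illustration: identical adoption cost, different remaining horizons.
values = adoption_values(start_age=40, terminal_age=85, benefit=20.0, cost=150.0, beta=0.95)
young_adopts = values[40][1]
old_adopts = values[80][1]
```

Under these assumed numbers, a 40-year-old adopts while an 80-year-old does not, even though both face the same adoption cost. A static model that ignored the horizon would attribute this difference to a higher adoption cost for the older consumer.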

Our empirical setting is ATM card adoption and we estimate the structural parameters of our model on a unique micro-level panel dataset provided by the Bank of Italy. This unique dataset is essential for achieving our second goal in this paper - to provide an estimate of adoption costs in monetary terms. The dataset consists of detailed demographic information (age, income, consumption, gender, education, etc.), individuals' adoption decisions of ATM cards and cash withdrawal patterns, the number of ATM machines and interest rates over time, and the average survival probabilities for different ages. Most importantly, the information on income, consumption, cash withdrawal patterns (both before and after adopting an ATM card), and interest rates over time allows us to model individuals' cash withdrawal decisions using a cash demand model. In particular, we are able to calibrate this cash demand model and use it to measure the monetary benefits per period from adopting an ATM card conditional on individuals' observed characteristics. Moreover, the information on age and survival probabilities allows us to incorporate consumers' different life horizons in our model, and hence calculate their discounted future benefits and recover their adoption costs more accurately. The basic intuition is that we observe consumers' use of ATMs and bank counters to withdraw cash. With the help of the cash demand model, we can infer consumers' adoption benefits (at least partially) per period from their usage patterns. Combining this with the dynamic model of adoption decisions, we are then able to measure adoption costs in monetary terms.

There are three reasons why ATM cards provide a representative case to study consumers' adoption decisions of a new technology. First, the costs, especially non-pecuniary learning costs, are typically incurred at the time of adoption and usually cannot be compensated by the benefits that immediately follow adoption. By and large, as in any durable goods purchase case, the benefits from adopting an ATM card are benefit flows, which are received throughout the life of the acquired ATM technology. Without considering people's forward-looking behavior, a static choice model would miss the future benefits and thus underestimate the adoption cost to a large extent. Second, the ATM is a typical example of a technology that the elderly tried and adopted to a lesser degree than the non-elderly (Gilly and Zeithaml (1985), Kerschner and Chelsvig (1984), Rogers et al. (1996)). Similar results are found in our reduced-form regressions even after controlling for other personal characteristics, such as education, gender, income, and geographic area. Without a dynamic model to take into account older people's shorter life horizons, we could overestimate the relative adoption costs of older people. Third, some adoption decisions had to be made in an uncertain situation, especially in the introductory stages of the ATM technology, when it was difficult to foresee how the technology would evolve in the future, so the total adoption benefits would appear to be small. For instance, consumers might worry about the availability of the ATM network. Many consumers would thus intentionally postpone their adoption decisions until there were enough ATMs nearby and the technology was sufficiently reliable. All these features are common to many new technologies like online banking and the Internet, and hence the general framework developed here should be applicable to other cases as well, albeit with some modifications. More generally, the finite horizon dynamic programming model can also be used to study people's decisions to acquire new skills (for example, the decision about whether to learn a new computer language) as long as a terminal period is involved. The terminal period is not limited to the terminal age; it could be the retirement age or a contract expiration date.

To the best of our knowledge, this is the first estimated dynamic structural model that: (i) introduces a life-cycle forward-looking framework to model adoption decisions for different age groups; (ii) based on some partially observed benefits of adopting a technology, provides an estimate of adoption costs in monetary terms; (iii) examines consumers' adoption decisions of a financial innovation. Because of the "graying" of the marketplace in today's world, studying the behavior and decision making process of older people is becoming increasingly important. Marketers are also more interested in this segment than before because the elderly population has been growing and becoming stronger in terms of total purchasing power. The estimation results of this model have important managerial implications for designing optimal marketing mixes to accelerate consumers' adoption decisions. Consider a scenario where banks try to encourage their customers to adopt ATM cards - by persuading their customers to use ATMs, banks can hire fewer human tellers and save expensive labor costs. Banks can either reduce customers' adoption costs by launching a training program to teach their customers how to use ATM cards, or increase customers' adoption benefits by, for instance, installing more ATMs in the neighborhood. Both launching a training program and installing more ATMs cost money. If the marketing budget is fixed, banks should consider how to allocate the total marketing budget in order to produce the highest return.

Intuitively, higher adoption costs suggest that banks should pay more attention to the cost side and allocate more marketing resources to reduce consumer adoption costs. Technically speaking, in the optimal marketing mix, the marginal effectiveness of the last $1 spent on reducing consumer adoption costs should equal that of the last $1 spent on increasing consumer adoption benefits. If money has decreasing marginal effectiveness in both reducing adoption costs and increasing adoption benefits, larger adoption costs may suggest that a larger proportion of marketing resources be allocated to reducing adoption costs. This also explains why we need estimates of adoption costs in different age groups that are as precise as possible. For instance, if elderly people have larger adoption costs, more marketing resources should be used to help reduce their adoption costs; if the lack of adoption among the elderly is mainly caused by their smaller total benefits, more incentives should be provided to this group in order to induce adoption. Different estimates of adoption costs have different marketing implications.

Our results can be summarized as follows. After taking the expected total discounted benefits into account, we find evidence that the elderly may not have larger adoption costs for ATM cards in Italy. The lower ATM card adoption rate among the elderly can be explained in terms of differences in total discounted benefits of adoption across age groups. In the model with two latent segments, we find that the adoption costs are €142.50 and €207.54 in 2002 euros. By conducting counterfactual experiments, we quantify how consumers' ATM adoption decisions would be affected by changing (i) the amount of sign-up bonuses offered to the elderly, (ii) number of ATMs, and (iii) interest rates.

The remainder of this paper is structured as follows. Related literature is discussed in Section 2. Section 3 outlines relevant institutional details about ATMs and the banking system in Italy. Section 4 describes the unique micro-level panel data that we employ in this work. Section 5 presents the model. The estimation algorithm and some identification issues are discussed in Section 6. Section 7 shows the estimation results and findings from several counterfactual experiments. Section 8 lists some limitations and Section 9 concludes the paper.

2 Literature Review

2.1 Measuring Adoption/Switching Costs in a Dynamic Structural Model

The model that we develop in this paper is related to the one presented in Swanson et al. (1997), who also argue that the elderly have shorter life horizons, and hence have less incentive to adopt new technologies. However, their paper is mainly a theoretical one, and they do not estimate their model or measure the relative importance of adoption costs and total expected discounted benefits.

This paper is also related to the work of Ryan and Tucker (2007), who study technology adoption and communications choice decisions of a multinational bank's employees in an infinite horizon dynamic model. In their model, the length of a period is very short and hence an infinite horizon dynamic programming model provides a good proxy to the environment. However, their data does not allow them to measure the benefits of adopting the new technology in monetary terms. As a result, unlike our paper, they can only measure adoption costs in a relative sense instead of in monetary terms.

Another related paper is by Song and Chintagunta (2003), who study consumer adoption decisions of a new durable product. They assume that the benefits of adopting a new product come from a composite quality index. In contrast, we use a cash demand model to explicitly measure the monetary benefits of adopting ATM cards, based on interest rates and cash withdrawal patterns. This is why our framework is able to recover adoption costs in monetary terms.

We are aware of two papers in which infinite horizon dynamic programming models are used to measure consumer switching costs: Goettler and Clay (2007) use micro-level data to estimate switching costs in a model of consumer learning and tariff choice; Shcherbakov (2007) uses aggregate level data to measure consumer switching costs in the US television industry. It is reasonable to apply infinite horizon dynamic models in their papers, since Goettler and Clay (2007) use weekly data and Shcherbakov (2007) only has aggregate level data. By contrast, we adopt a finite horizon dynamic life-cycle model because we want to examine the technology adoption decisions by consumers in different age groups, and our individual level panel dataset makes this task possible.

More broadly, the paper is related to the health investment literature: see Khwaja (2001) and Fang et al. (2007). Their major point is that, from a dynamic perspective, better insurance may increase one's life expectancy, and consequently enhance one's incentive to invest in health. This dynamic effect counteracts the usual "moral hazard" story in static models that insurance induces more risky behavior. We share with them the idea that the longer the expected planning horizon (life-span in our case), the greater the incentive to invest in a new technology or health. Ratchford (2001) also discusses a similar idea in a human capital model. One prediction of the human capital model is that "finite life reduces return to new investments in knowledge as consumers age", which is very similar to our argument that elderly people lack incentives to adopt new technologies because they have less time to benefit from them.

2.2 ATM Adoption

ATM adoption itself is not a new topic in economics. Much of the existing literature examines banks' ATM adoption decisions (for example, Hannan and McDowell (1984, 1987), Saloner and Shepard (1995), Ishii (2005), Ferrari et al. (2007)). However, since the ATM market is a two-sided market, banks' decisions on whether to install more ATM machines depend on how many consumers adopt ATM cards (and vice versa). Surprisingly, there is little research analyzing consumers' ATM card adoption decisions (a lack of consumer-level data might be one important reason for this). To our knowledge, there are only three empirical papers that provide a quantitative study of consumers' ATM card adoption decisions (Attanasio et al. (2002), Huynh (2007), Riccuarelli (2007)) and all of them use a static choice model. However, as argued in the introduction, consumers' ATM card adoption decisions are not likely to be myopic decisions. We need a dynamic model to capture consumers' forward-looking adoption decisions and more accurately recover consumers' adoption costs.

3 Institutional Details

3.1 ATMs in Italy

ATMs were first introduced to Italy in the 1970s (Canato and Corrocher (2004)). Bancomat, the Italian inter-banking cash dispenser project, was promoted by the Italian Society for Interbanking Automation starting in 1983 (Orlandi (1989)). During the time period we study in this paper, Bancomat was the only ATM network in Italy, which allowed customers at all Italian banks in the system to use any ATM in the system. An ATM card is also called a Bancomat card in Italy.

Hester et al. (2001) provide a detailed discussion of the evolution of ATMs and the Italian banking system. According to them, because of privatization, changing regulations, reduced restrictions on branching, and rapid technical progress in data processing, the Italian banking system has undergone substantial restructuring since 1988. At the same time, there was a rapid expansion in branches and ATMs throughout Italy. Figure 1 and Figure 2 show the number of ATMs and the ratio of ATMs to branches from 1991 to 1999, respectively. Between 1991 and 1999, the number of bank branches rose from 18,332 to 27,134 and the number of ATMs rose from 11,601 to 30,266. ATMs and branches grew at different rates in the five major geographic areas of Italy (Northwest, Northeast, Central, South, and Islands). In all areas the ratio of ATMs to branches increased. In 1991 the ratio of ATMs to branches was highest in the Northwest, which includes financial centers like Milan and Turin, and lowest in the Islands, which is the poorest area of Italy. By 1999, the ratio of ATMs to branches was almost constant across the other areas; only the islands of Sardinia and Sicily lagged in this ratio.
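The national figures quoted above imply an ATM-to-branch ratio that moves from below one to above one over the period. The short Python check below uses only the four counts reported in the text; the variable names are illustrative.

```python
# National counts reported in the text for 1991 and 1999.
branches = {1991: 18_332, 1999: 27_134}
atms = {1991: 11_601, 1999: 30_266}

# Ratio of ATMs to branches in each year.
atm_per_branch = {year: atms[year] / branches[year] for year in branches}

# Cumulative growth of each series between 1991 and 1999, in percent.
branch_growth_pct = (branches[1999] / branches[1991] - 1) * 100
atm_growth_pct = (atms[1999] / atms[1991] - 1) * 100

for year, ratio in atm_per_branch.items():
    print(f"{year}: {ratio:.2f} ATMs per branch")
print(f"Branch growth: {branch_growth_pct:.1f}%, ATM growth: {atm_growth_pct:.1f}%")
```

The ratio rises from about 0.63 in 1991 to about 1.12 in 1999, because the number of ATMs grew far faster than the number of branches.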

Figure 3 shows the overall ATM card adoption rate from 1991 to 2004. The numbers are calculated from the Bank of Italy's Survey of Household Income and Wealth (SHIW). Generally speaking, the adoption rate has been steadily increasing over time, with a 31.9% adoption rate in 1991 and 57.8% of households having at least one ATM card in 2004.

3.2 Some Facts about the Banking System in Italy

Many Italian banks charge an annual service fee for ATM cards, but it has never been a significant amount: Attanasio et al. (2002) show that the average yearly fee was 6.2 euros in a sample of 38 banks. There are no additional service charges when a customer uses an ATM card issued by the bank owning an ATM.

The normal bank account for day-to-day transactions in Italy is a cheque or current account. All cheque accounts in Italy are interest bearing, and interest is received quarterly. An ATM card needs to be linked to a cheque account before it can be used to withdraw cash.

In Italy, bank opening hours vary according to the bank and location. In general, banks are open from 08:30 until 13:30, and then again for an hour and a half from 14:30 until 16:00. Banks are generally closed on weekends and holidays. On the day before a holiday, banks are often closed in the afternoon as well.

3.3 The Use of ATM Cards in Italy

Most ATM cards in Italy have POS (point of sale) functionality, which means that they can be used as debit cards to check out in places like shopping malls and supermarkets as long as merchants are equipped with POS terminals. Credit cards can usually be used to make payments at POS terminals as well. However, during the period studied in this paper, Italian ATM cards were "primarily used for cash acquisition" and credit card number and use had a "comparatively low base" in Italy (see European Payment Cards Yearbook 2005-6). Table 1 shows Italy's retail trade value, debit card POS transactions and credit card POS transactions from 1999 to 2007. It is easy to see that only a very small proportion of retail trade sales were paid by debit cards. For example, in 2004, the percentage is 31,667/754,206 = 4.2%. Even if we take into account the fact that only ATM card holders can make debit card payments at POS terminals, the percentage is still small. Again, in 2004, the adoption rate is 57.8%, so the new percentage is 31,667/(754,206*0.578) = 7.3%. Considering that ATM card adopters are generally richer and spend more, and that debit card transactions include payments made at restaurants, hotels, and other non-retail services, the actual percentage of retail trade paid by debit cards should be smaller than 7.3%. The value of credit card transactions is small compared to debit card transactions.
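The two percentages for 2004 can be reproduced directly from the figures quoted in the text. The Python snippet below is only an arithmetic check; the variable names are illustrative and the inputs are the Table 1 values and the adoption rate cited above.

```python
# 2004 figures quoted in the text (same monetary units for both values).
debit_pos_value = 31_667      # debit card POS transactions
retail_trade_value = 754_206  # retail trade value
adoption_rate = 0.578         # share of households with at least one ATM card

# Share of all retail trade paid by debit card.
share_all = debit_pos_value / retail_trade_value

# Share when retail trade is scaled down to ATM card holders only.
share_holders = debit_pos_value / (retail_trade_value * adoption_rate)

print(f"All households: {share_all:.1%}")
print(f"Card holders only: {share_holders:.1%}")
```

Scaling the denominator by the adoption rate treats card holders as spending the same on average as non-holders, which is why the text regards 7.3% as an upper bound.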

The above evidence indicates that, although equipped with POS functionality, ATM cards were mainly used to withdraw cash in Italy, at least in the period studied in this paper (1991-2004). Credit card payments were very unpopular in Italy. Actually, according to a recent (2009) article on International Relations and Security Network, "Italy, for the most part, remains a cash economy". These institutional details are critical to our model assumptions in Section 5, where we use consumption of non-durable goods to measure consumption financed by cash in a cash demand model.

4 Data

The data used in this paper come from four different sources: (i) the Survey of Household Income and Wealth from the Bank of Italy; (ii) interest rate data drawn from the Bank of Italy's public database; (iii) ATM data drawn from another special survey from the Bank of Italy; and (iv) population and survival probability data obtained from the Italian Institute of Statistics.

4.1 Bank of Italy Survey of Household Income and Wealth (SHIW)

The SHIW is a comprehensive socio-economic survey; this database contains information regarding: 1) Individual characteristics and occupational status, 2) Sources of household income, 3) Consumption expenditures.

These surveys were conducted annually from 1977 to 1987 (except 1986), then every two years from 1989 to 1995 and again from 1998 to 2004. We select a panel from 1991 to 2004 as the sample used for estimation. 1991 is the first year in which ATM card information appears in the Bank of Italy's public database, and the 2004 survey is the latest SHIW wave available to the public. The key questions for this study in the survey include the following:

ATM card:

"Did you or any other member of your household have an ATM card?"

Average amount of withdrawal at an ATM/bank counter:

"What was the average amount per withdrawal?"

We selected a panel based mainly on the following criteria: (i) panel households need to have bank accounts; (ii) they need to be non-adopters of ATM cards in their first observation periods; (iii) we know where they live at the provincial level; (iv) they are observed through 2004. There are 694 households observed from 1991 through 2004. We exclude a few outliers with unreasonably high income/consumption levels or irregular adoption patterns (for example, first "non-adoption", then "adoption", then "non-adoption", then "adoption"...) from the panel, which account for less than 5% of the total observations. 387 of the remaining households had a bank account, but did not possess an ATM card in 1991. 96 new households were added to the panel in 1993 - each of them had a bank account, but did not have an ATM card in 1993. 51 new households were added in 1995 - again, they were bank account holders, but non-adopters of ATM cards in 1995. Figure 4 shows the composition of the panel households selected for estimation. There are two reasons to fix the panel households from 1995 and to select only non-adopters in their first observation periods. First, we find it rare in this panel that households abandon ATM cards after adopting them. Therefore, we model ATM card adoption as an optimal stopping problem, while keeping a reasonably long time horizon for all panel households. Second, the provincial level (the size of one province in Italy is comparable to that of a county in the US) residence location data that we obtained is limited to pre-1998 households. The Bank of Italy's public database does not contain residence location information at the provincial level. In order to match our panel households with the number of ATMs data, which is at the provincial level, we do not add new households to our panel after 1995.

Table 2 summarizes the cumulative adoption rate of this panel. Because we only select households that are non-adopters in their first observation period, the adoption rate is zero in 1991. After that, the adoption rate increases by about 10% every two years (except for the 20% increase from 1995 to 1998 and the 4% increase from 2002 to 2004).

Table 3 shows the summary statistics of some key variables. Generally speaking, this is a sample of an older population: the average age of the household head is 52 in 1991 and 62 in 2004, with a standard deviation of around 13-14. The data therefore have a lot of variation in age and should be quite suitable for estimating a consumer life-cycle model (see also Figure Series A1 in the appendix for age dispersions in different survey waves). Both (nominal) household income and consumption of non-durables have a slightly upward trend. The percentage of male household heads has a decreasing trend, probably reflecting the death of male heads and female longevity. This also indicates that some households did change heads over time and that household head demographics could be time varying. 55.62% of households live in the north or in the central area of Italy. The remaining 44.38% live in the south or in the islands area.

Comparatively speaking, this is a poorly educated sample. Less than 5.5% of household heads hold a bachelor's degree or above. Around 20% of household heads have a high school diploma and about 30% have a middle school diploma. Almost 40% of the heads have only received elementary school education and this is the largest segment of the panel population. On average, more than 5% of the heads have not received any education at all. By comparison, the corresponding percentages are 17% (pre-primary and primary education), 32% (lower secondary education), 37% (upper secondary education), and 14% (post-secondary education) for 25-to-64-year-olds in Italy, by highest level of education attained. (OECD Indicators (2007))

Figure 5 depicts the adoption level by age over time. Age is clearly a negative factor in predicting ATM card adoption: seniors over 65 have a much lower adoption rate than people aged less than 50.⁵ To prevent confusion, we do not show the adoption rate curve for the 50-65 age group because, for example, a 55-year-old person in 1991 would be 68 in 2004. Figure 6 shows the adoption rate by education; education level is clearly positively related to ATM card adoption. Figure Series A2 in the appendix displays the adoption rate by both age and education in a wave-by-wave manner. It shows similar patterns conditional on both age and education.

4.2 Interest Rate

The nominal interest rate⁶ on current account deposits is also drawn from the Bank of Italy's public database, which is available at http://bip.bancaditalia.it/4972unix/homebipeng.htm.

The time-series variation in the interest rate consists of an increase in the early 1990s followed by a steady decrease up to 2004. This variation is mainly caused by Italy's entrance into the European Monetary Union. Since the interest rate is at the regional level, we only show the average value over Italy's 20 regions in Table 4 (for more details, please see p. 62, Technical Appendix of Huynh (2007)).

4.3 ATM Data

Before 1998, the data on the number of ATMs was drawn from another special survey from the Bank of Italy. The Bank of Italy also provides provincial information about the number of ATMs from 1997 to 2006.

Table 5 displays the number of ATMs per 1,000 population. Because the ATM data is at the provincial level and there are more than 90 provinces in Italy (the number was 109 as of 2006), we only show the average number over the provinces that the panel households lived in. By and large, the number of ATMs per 1,000 population has an upward tendency, with a big jump from 1995 to 1998 and a stable transition from 2002 to 2004.

4.4 Population and Survival Probability Data

National and provincial level data about population and age-conditional survival probability are obtained from the website of the Italian Institute of Statistics (ISTAT): http://demo.istat.it/index_e.html

Figure 7 shows the 2004 Italian national level survival probability conditional on age (the probability of surviving until t + 1 at age t). The survival probability is obviously a decreasing and non-linear function of age.

5 Model

Before delving into the mathematical model, it is useful to briefly discuss the benefits and costs associated with adopting an ATM card. The benefits are incremental benefits compared to the traditional way of withdrawing money from a human teller at a bank counter. The costs are "the costs of change."⁷

Benefits

The benefits from adoption mainly lie in reduced transaction cost (versus withdrawing money at a bank counter), more interest savings (can put more money in an interest-bearing bank account) and increased convenience (24-hour ATMs vs. daytime human tellers). The means of measuring the adoption benefit is explained in the cash demand model shown below.

Costs

There are three types of costs involved in adopting an ATM card: the initial adoption cost (including learning cost, psychological cost, hassle cost, etc.), the ongoing annual fee, and the usage-based transaction fee. Although we do not have detailed data on either the annual fee or the transaction fee, this should not be a serious problem. A bank customer can use an ATM card for free at ATMs owned by the bank issuing the ATM card; therefore, to a large extent consumers can avoid transaction fees. As discussed in section 3.2, the annual fee has never been expensive: the average yearly fee was only 6.2 euros (Attanasio et al. (2002)). What we are most interested in recovering in this paper is the initial adoption cost, although our estimate should include the total discounted sum of annual fees, if any.

5.1 Adoption Benefits: A Cash Demand Model⁸

In order to quantitatively measure the cost savings from adopting an ATM card, we use an extension of the Baumol (1952) - Tobin (1956) cash demand model.⁹ It is a cash inventory management model in which the consumer chooses the average amount of withdrawal, $m$, to minimize the sum of transaction costs and interest losses, $TC$. Interest losses are the forgone interest from holding cash rather than keeping it in an interest-bearing bank account (recall that in Italy checking accounts are interest-bearing, and the nominal interest rate could exceed 10%, as in 1993). The objective function is shown in the following equation:

(1) $\min_{m} TC_j = w \cdot T_j \cdot \left(\frac{c_j}{m}\right) + R \cdot \left(\frac{m}{2}\right)$,

where $w$ is the unit time cost of transaction (opportunity cost of time); $T_j$ measures the technology-specific transaction time of each withdrawal ($T_1$ for ATM and $T_0$ for no ATM, with $T_1 < T_0$); $c_j$ is the consumption financed by cash in each time period ($c_1$ for ATM and $c_0$ for no ATM), so $c_j/m$ is the average number of withdrawals in each period; and $R$ is the interest rate. The first term, $w T_j (c_j/m)$, captures the total transaction cost in each period. The second term, $R (m/2)$, measures interest losses because the average cash inventory in hand is $m/2$. There is a trade-off between reducing transaction costs and avoiding interest losses: a larger $m$ means fewer withdrawal transactions but more interest losses in each period.

Simple algebra gives us the optimal amount of cash withdrawal and the minimized total cost:

(2) $m_j^* = \sqrt{2 w T_j c_j / R} = \sqrt{2 T_j} \cdot \sqrt{w c_j / R}$

(3) $TC_j^* = \sqrt{2 w T_j c_j R} = \sqrt{2 T_j} \cdot \sqrt{w c_j R}$.
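The "simple algebra" behind equations (2) and (3) can be spelled out as follows. This is a sketch of the standard first-order-condition argument and uses only the symbols of equation (1); the second derivative, $2 w T_j c_j / m^3$, is positive for $m > 0$, so the stationary point is a minimum.

```latex
\begin{align*}
\frac{\partial TC_j}{\partial m}
  &= -\frac{w T_j c_j}{m^2} + \frac{R}{2} = 0
  \;\Longrightarrow\;
  m_j^* = \sqrt{\frac{2 w T_j c_j}{R}}, \\
TC_j^*
  &= \frac{w T_j c_j}{m_j^*} + \frac{R\, m_j^*}{2}
   = \sqrt{\frac{w T_j c_j R}{2}} + \sqrt{\frac{w T_j c_j R}{2}}
   = \sqrt{2 w T_j c_j R}.
\end{align*}
```

At the optimum the two cost components are equal, which is why the minimized cost is exactly twice either one.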

Thus, the total cost saving from adoption per period can be represented by the difference between the minimized total cost without an ATM card ($TC_0^*$) and the minimized total cost with an ATM card ($TC_1^*$):

(4) $\Delta TC = TC_0^* - TC_1^* = \left(\sqrt{2 T_0 c_0} - \sqrt{2 T_1 c_1}\right) \cdot \sqrt{w R}$.¹⁰

We assume that $w$ is a function of annual income ($y_{i,t}$) and $c_j$ is a function of the consumption of non-durable goods ($c_{i,t}$).¹¹ Suppose $w = \lambda \cdot y_{i,t}$ and $c_j = \mu_j \cdot c_{i,t}$, where $\lambda$ and $\mu_j$ are constants and $0 < \mu_j < 1$. Note that we allow for the case $\mu_0 \neq \mu_1$, which means that the proportion of non-durable consumption financed by cash depends on the ATM card adoption decision. In other words, ATM holders might make purchases through POS transactions, so a smaller proportion of their consumption is paid in cash. We take this possibility into account in our model. $\Delta TC$ can then be expressed as $\left(\sqrt{2\lambda\mu_0 T_0} - \sqrt{2\lambda\mu_1 T_1}\right) \cdot \sqrt{y_{i,t} c_{i,t} R_{i,t}}$, which is directly proportional to $\sqrt{y_{i,t} c_{i,t} R_{i,t}}$.¹²
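The cash demand formulas in equations (1) through (4) can be checked numerically. The sketch below is illustrative only: the function names and all input values are hypothetical and are not taken from the data.

```python
import math

def total_cost(m, w, T, c, R):
    """Per-period cost in equation (1): transaction cost plus forgone interest."""
    return w * T * (c / m) + R * (m / 2)

def optimal_withdrawal(w, T, c, R):
    """Cost-minimizing withdrawal amount, equation (2)."""
    return math.sqrt(2 * w * T * c / R)

def minimized_cost(w, T, c, R):
    """Minimized per-period cost, equation (3)."""
    return math.sqrt(2 * w * T * c * R)

def cost_saving(w, T0, T1, c0, c1, R):
    """Per-period saving from adoption, equation (4): TC_0* minus TC_1*."""
    return minimized_cost(w, T0, c0, R) - minimized_cost(w, T1, c1, R)

# Hypothetical inputs, chosen only for illustration (not estimates).
w, R = 1.0, 0.05
T0, T1 = 2.0, 0.5      # a counter visit takes longer than an ATM visit
c0, c1 = 100.0, 80.0   # ATM holders finance less consumption with cash

m1 = optimal_withdrawal(w, T1, c1, R)
saving = cost_saving(w, T0, T1, c0, c1, R)
```

Evaluating `total_cost` slightly above or below `m1` gives a higher cost than `minimized_cost`, which is a quick way to confirm that equation (2) is indeed the minimizer.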

5.2 ATM Card Adoption: An Optimal Stopping Problem

In the data, most panel households keep an ATM card once they have adopted it (see also p. 24, Technical Appendix of Huynh (2007)); occurrences of households first adopting an ATM card and then discarding it are rare in our panel. Consequently, we model the adoption decision as an optimal stopping problem.

Depending on the adoption status ($a_{i,t}$) and the state variables ($S_{i,t}$), the utility function for household $i$ in time $t$ can be written as:

(5) $U(a_{i,t}, S_{i,t}) = \begin{cases} U_{01}(S_{i,t}) = U(\Delta TC_{i,t}, n_{i,t}) + \psi(t) - F_{i,t} + \varepsilon_{i1t}, & \text{if } a_{i,t} = 1 \text{ and } \forall s < t,\ a_{i,s} = 0 \\ U_{11}(S_{i,t}) = U(\Delta TC_{i,t}, n_{i,t}) + \psi(t), & \text{if } \exists s < t,\ a_{i,s} = 1 \\ U_{00}(S_{i,t}) = \varepsilon_{i0t}, & \text{if } \forall s \le t,\ a_{i,s} = 0 \end{cases}$

where subscript 1 means adoption and 0 means non-adoption. In $U_{xy}$, $x$ is the adoption status in the previous period and $y$ is the adoption status in the current period; $U_{01}$ is the current period utility of a new adopter; $U_{11}$ is the current period utility of an old adopter; $U_{00}$ is the current period utility of a non-adopter; $\Delta TC_{i,t}$ is the cost saving from adoption defined in the previous subsection; $n_{i,t}$ is the number of bank ATMs per 1,000 population; $\psi(t)$¹³ captures the time trend of the ATM technology, and an increasingly attractive ATM technology means $\partial \psi(t) / \partial t > 0$;¹⁴ $F_{i,t}$ is the one-time lump sum adoption cost; $\varepsilon_{i1t}$ and $\varepsilon_{i0t}$ are error terms.

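In our empirical estimations, we use a linear functional form for $U(\Delta TC_{i,t}, n_{i,t})$:

(6) $U(\Delta TC_{i,t}, n_{i,t}) = b_{TC} \cdot \Delta TC_{i,t} + b_n \cdot n_{i,t} = b_{TC} \cdot \left(\sqrt{2\lambda\mu_0 T_0} - \sqrt{2\lambda\mu_1 T_1}\right) \cdot \sqrt{y_{i,t} c_{i,t} R_{i,t}} + b_n \cdot n_{i,t}$.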

The Bellman equation for household i in time t can be written as:

(7) $V(S_{i,t}) = \max_{a_{i,t}} E\left\{ U(a_{i,t}, S_{i,t}) + z_{i,t+1} \beta \int V(S_{i,t+1})\, dF(S_{i,t+1} \mid a_{i,t}, S_{i,t}) \right\}$,

where $z_{i,t+1}$ is the survival probability of household $i$'s head from time $t$ to $t+1$. At the terminal period, we assume $z_{i,t+1} = 0$ and $U(a_{i,t}, S_{i,t}) = 0$. Readers who are not familiar with dynamic programming may refer to Appendix 2, where we present an analytical example to illustrate a forward-looking household's adoption decisions.

Specifically, the Bellman equation for the optimal choice of potential adopter i is:

(8) $V_0(S_{i,t}) = \max\{V_{01}(S_{i,t}), V_{00}(S_{i,t})\}$.

And,

(9) $V_{01}(S_{i,t}) = U_{01}(S_{i,t}) + z_{i,t+1} \beta \int V_{11}(S_{i,t+1})\, dF(S_{i,t+1} \mid S_{i,t})$ is the value of adopting an ATM card in time $t$.

(10) $V_{00}(S_{i,t}) = U_{00}(S_{i,t}) + z_{i,t+1} \beta \int V_0(S_{i,t+1})\, dF(S_{i,t+1} \mid S_{i,t})$ is the value of still waiting in time $t$.

(11) $V_{11}(S_{i,t}) = U_{11}(S_{i,t}) + z_{i,t+1} \beta \int V_{11}(S_{i,t+1})\, dF(S_{i,t+1} \mid S_{i,t})$ is the value of holding an ATM card in time $t$.

Since we do not allow ATM cards to be abandoned, there is no expression for $V_{10}(S_{i,t})$.

The likelihood increment for household i in time t is then:

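(12) $L_{i,t} = \Pr[V_{01}(S_{i,t}) > V_{00}(S_{i,t})] \cdot (1 - a_{i,t-1}) \cdot a_{i,t} + \Pr[V_{01}(S_{i,t}) \le V_{00}(S_{i,t})] \cdot (1 - a_{i,t-1}) \cdot (1 - a_{i,t})$,

where

(13) $\Pr[V_{01}(S_{i,t}) > V_{00}(S_{i,t})] = \Pr\left[ U_{01}(S_{i,t}) + z_{i,t+1} \beta \int V_{11}(S_{i,t+1})\, dF(S_{i,t+1} \mid S_{i,t}) > U_{00}(S_{i,t}) + z_{i,t+1} \beta \int V_0(S_{i,t+1})\, dF(S_{i,t+1} \mid S_{i,t}) \right]$.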
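The recursion in equations (8) through (13) can be sketched with backward induction. The code below is a deliberately simplified illustration, not the estimation routine used in this paper: it assumes the state variables follow a known deterministic path (so the integrals over future states collapse to single values), and it uses the type-1 extreme value assumption on the error terms to obtain a logit adoption probability and a log-sum expected value. The function name `solve_adoption` and all input values are hypothetical.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329  # mean of a standard type-1 extreme value shock

def solve_adoption(benefit, cost, survival, beta):
    """Backward induction for a finite-horizon adoption (optimal stopping) problem.

    benefit[t]  : deterministic flow utility of holding a card in period t
    cost[t]     : one-time adoption cost if the card is adopted in period t
    survival[t] : probability of surviving from period t to t + 1
    beta        : discount factor
    Returns the adoption probability of a non-adopter in each period, the value
    of holding a card, and the expected value of a potential adopter.
    """
    benefit = np.asarray(benefit, dtype=float)
    cost = np.asarray(cost, dtype=float)
    survival = np.asarray(survival, dtype=float)
    T = len(benefit)
    v11 = np.zeros(T + 1)   # value of holding a card; zero after the last period
    ev0 = np.zeros(T + 1)   # expected value of a potential adopter before shocks
    prob = np.zeros(T)
    for t in range(T - 1, -1, -1):
        disc = survival[t] * beta
        v11[t] = benefit[t] + disc * v11[t + 1]
        v01 = benefit[t] - cost[t] + disc * v11[t + 1]   # adopt now, net of the shock
        v00 = disc * ev0[t + 1]                          # keep waiting, net of the shock
        prob[t] = 1.0 / (1.0 + np.exp(v00 - v01))        # logit probability of adopting
        ev0[t] = EULER_GAMMA + np.logaddexp(v01, v00)    # expected maximum over shocks
    return prob, v11, ev0

# Hypothetical inputs: constant benefit and cost over ten periods.
prob, v11, ev0 = solve_adoption(
    benefit=np.full(10, 1.0), cost=np.full(10, 3.0),
    survival=np.full(10, 1.0), beta=0.85)
```

With these inputs the adoption probability is higher in the first period than in the last, because an early adopter collects the benefit over a longer remaining horizon. Lower survival probabilities shorten the effective horizon in the same way.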

Individual Heterogeneity: A Concomitant Variable Latent Class Model

We incorporate unobserved individual heterogeneity by using a concomitant variable latent class segmentation (see Dayton and McReady (1988), Gupta and Chintagunta (1994)): if household $i$ belongs to segment $r$, the initial adoption cost would be

(14) $F_{i,t} = F_{0,r} + \alpha_1 \cdot (age_{i,t} - 50 \mid age_{i,t} > 50) + \alpha_2 \cdot (50 - 19 \mid age_{i,t} > 50) + \alpha_2 \cdot (age_{i,t} - 19 \mid age_{i,t} \le 50)$.¹⁵

By using the above expression, we allow the adoption cost to vary with age and the coefficients of age-specific adoption cost to differ between two age groups: $\alpha_1$ for the over 50 age group and $\alpha_2$ for the 50 and under age group. There are reasons for us to choose this functional form and pick 50 as the cut-off point. We choose this functional form instead of a quadratic specification because it is easier to interpret the two key coefficients in equation (14): $\alpha_1$ and $\alpha_2$ represent the age-specific adoption costs for the over 50 age group and the 50 and under age group, respectively. As for the choice of age 50 as the cut-off, we also tried 60 and 65 as the cut-off point in static model estimations, as shown in Table A1 in the appendix. The qualitative results do not change, and the goodness of fit with these alternative cut-offs is inferior to that with 50. This suggests that 50 may be a good choice. Moreover, it should be noted that (1) Italians usually retire in their fifties, and people's cost structure might change (both physiologically and psychologically) after retiring from the workforce; (2) we can see from Figure 5 that the adoption rate for people younger than 50 is much higher than that in the over 65 age group. In order to make the cost function continuous, we include the term $\alpha_2 \cdot (50 - 19 \mid age_{i,t} > 50)$ in equation (14). In addition, note that we only allow $F_{0,r}$ to vary across latent segments; this is just a simplification.
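The piecewise-linear cost in equation (14) can be written as a small function, which also makes the continuity at the cut-off easy to verify. This is only a sketch: the function and argument names are hypothetical, and the parameter values are made up for illustration rather than taken from the estimates.

```python
def adoption_cost(age, f0, alpha_senior, alpha_young, cutoff=50, base_age=19):
    """Piecewise-linear adoption cost in the spirit of equation (14).

    f0           : segment-specific intercept (F_0r)
    alpha_senior : slope applied to years of age above the cut-off
    alpha_young  : slope applied to years of age between base_age and the cut-off
    """
    if age > cutoff:
        # The alpha_young * (cutoff - base_age) term keeps the function continuous.
        return f0 + alpha_senior * (age - cutoff) + alpha_young * (cutoff - base_age)
    return f0 + alpha_young * (age - base_age)

# Hypothetical parameter values, for illustration only.
cost_at_50 = adoption_cost(50, f0=10.0, alpha_senior=0.1, alpha_young=0.2)
cost_at_51 = adoption_cost(51, f0=10.0, alpha_senior=0.1, alpha_young=0.2)
```

Moving from age 50 to age 51 changes the cost by exactly `alpha_senior`, so there is no jump at the cut-off.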

The probability that household i belongs to segment r is represented by a logistic formula:

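(15) $\pi_{i,r} = \dfrac{\exp(\gamma_{0,r} + \gamma_{1,r} \cdot X_{i,t})}{\sum_{r'} \exp(\gamma_{0,r'} + \gamma_{1,r'} \cdot X_{i,t})}$,

where $X_{i,t}$ are demographic variables of the household head. Since some households in the surveys did change household heads over time, $X_{i,t}$ cannot be simplified to $X_i$.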

Finally, the unconditional likelihood function can be expressed as:

(16) $L = \prod_{i} \sum_{r} \pi_{i,r} \prod_{t = t_i}^{T_i} L_{i,t \mid r}$,

where $t_i$ is the first time that household $i$ is included in the sample and $T_i$ is the last time that household $i$ is included in the sample: it is either the adoption time for a new adopter or the last survey wave for a non-adopter.

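Equation (16) is the usual way to write down the likelihood function. To be precise, in our estimation, if there was no household head change, $\pi_{i,r}$ is fixed over time and household $i$'s likelihood contribution is $\sum_r \pi_{i,r} \cdot \left(\prod_{t=t_i}^{T_i} L_{i,t \mid r}\right)$; if the household head did change, we allow $\pi_{i,r}$ to differ between the pre-change stage and the post-change stage, and the likelihood contribution becomes

(17) $\left[\sum_r \pi_{i,r}^{pre} \cdot \left(\prod_{t=t_i}^{\tau_i} L_{i,t \mid r}\right)\right] \cdot \left[\sum_r \pi_{i,r}^{post} \cdot \left(\prod_{t=\tau_i+1}^{T_i} L_{i,t \mid r}\right)\right]$,

where $\tau_i$ denotes the last period before the head change.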
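The latent class construction can be sketched in a few lines: segment membership probabilities come from a logistic (softmax) formula in the household head's demographics, and a household's likelihood contribution is the membership-weighted sum of its segment-specific likelihood paths. The function names and all numbers below are hypothetical; in particular, setting the first segment's coefficients to zero is a common identifying normalization that is assumed here, not one stated in the text.

```python
import numpy as np

def segment_probs(gamma0, gamma1, x):
    """Segment membership probabilities from a multinomial logit in x.

    gamma0 : (R,) intercepts, one per segment
    gamma1 : (R, K) slopes on the K demographic variables
    x      : (K,) demographic variables of the household head
    """
    scores = np.asarray(gamma0, dtype=float) + np.asarray(gamma1, dtype=float) @ np.asarray(x, dtype=float)
    scores = scores - scores.max()   # guard against overflow in exp
    weights = np.exp(scores)
    return weights / weights.sum()

def household_likelihood(pi, increments):
    """Mixture likelihood: sum over segments of pi_r times the product over time.

    pi         : (R,) segment membership probabilities
    increments : (R, T) per-period likelihood increments conditional on the segment
    """
    paths = np.prod(np.asarray(increments, dtype=float), axis=1)
    return float(np.asarray(pi, dtype=float) @ paths)

# Hypothetical two-segment example with two demographic variables.
pi = segment_probs(gamma0=[0.0, 0.5], gamma1=[[0.0, 0.0], [1.0, -0.5]], x=[1.0, 0.0])
lik = household_likelihood(pi, [[0.9, 0.8, 0.3], [0.95, 0.9, 0.6]])
```

For a household whose head changes, the same two functions could be applied separately to the pre-change and post-change waves and the two results multiplied.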

Variables and the Evolution of State Variables

State variables ($S_{i,t}$): time or survey wave ($t$), age ($age_{i,t}$), number of bank ATMs per 1,000 population ($n_{i,t}$), income ($y_{i,t}$), consumption of non-durables ($c_{i,t}$), interest rate ($R_{i,t}$), age-specific survival probability ($z_{i,t+1}$)

Control variable ($a_{i,t}$): adoption decision

Household head demographics ($X_{i,t}$): education, gender, location¹⁶

The evolution of main state variables:

first-order Markov process for $n_{i,t}$, $c_{i,t}$, $y_{i,t}$;¹⁷ deterministic process for age with an upper bound $\overline{Age} = 102$;¹⁸

i.i.d. type-1 extreme value distribution for $\varepsilon_{i,t}$.

We assume the time trend, $\psi(t)$, and the interest rate, $R_{i,t}$, are totally exogenous from each household head's perspective. When household heads forecast the future, how the time trend and the interest rate evolve is beyond their expectations. We assume their projections of the future time trend and interest rate are approximated by the current period $\psi(t)$ and $R_{i,t}$, respectively. There are two reasons to make this assumption. First, in reality, it is usually hard to predict the speed of technology improvement. ATM technology is no exception, so although consumers can observe $\psi(t)$ in time $t$, they cannot predict $\psi(s)$ at $s = t+1, t+2, \ldots$; for the interest rate, it is unreasonable to assume that ordinary people are able to predict its direction correctly. In fact, even professional economists make more wrong predictions than right ones.¹⁹ In other words, individuals are assumed to be forward-looking in calculating discounted future benefits, but they do not have correct expectations about the direction of future interest rates or the level of future technology advancement. Second, it lessens the computational burden. If the time trend could be predicted, each age group would have a unique set of value functions, which would make the already tremendous state space even larger. In our case, if there are $m$ different age groups, the dimension of the new state space would be $m$ times the original dimension. Similarly, if the interest rate could be predicted and there are $k$ interest rates, the dimension of the new state space would be $k$ times the original one.

6 Empirical Strategy

6.1 Estimation

There are three comprehensive review papers in which the estimation of a dynamic discrete choice model is discussed: Eckstein and Wolpin (1989), Rust (1994a), and most recently, Aguirregabiria and Mira (2007). In this paper, the estimation is carried out in two stages (Rust (1987), Rust and Phelan (1997)). In the first stage, we recover consumer beliefs about the evolution processes of most state variables (transition probabilities) by imposing rational expectations and exclusion restrictions (independence of state variables). In the second stage, we estimate a formal dynamic model to recover consumers' preference parameters and their adoption costs (with latent class segmentation). Since the model is a finite-horizon dynamic one, the value function is calculated by backward induction.

6.2 Identification

We first demonstrate how we use some transformations to measure adoption benefits per period in monetary values. To continue from section 5.1, $w$ can be approximated by annual income ($y_{i,t}$) and $c$ is measured by the consumption of non-durable goods ($c_{i,t}$). Suppose $w = \lambda \cdot y_{i,t}$, $c_0 = \mu_0 \cdot c_{i,t}$, and $c_1 = \mu_1 \cdot c_{i,t}$, where $\lambda$, $\mu_0$ and $\mu_1$ are constants. Then,

(18) $m_j^* = \sqrt{2\lambda\mu_j T_j} \cdot \sqrt{y_{i,t} c_{i,t} / R_{i,t}}$

(19) $\Delta TC = TC_0^* - TC_1^* = \left(\sqrt{2\lambda\mu_0 T_0} - \sqrt{2\lambda\mu_1 T_1}\right) \cdot \sqrt{y_{i,t} c_{i,t} R_{i,t}}$

We need to know $\sqrt{2\lambda\mu_j T_j}$ in order to calculate $\Delta TC$. Fortunately, the Bank of Italy's unique dataset contains information about an individual household's withdrawal behaviour, both before and after adopting an ATM card. Specifically, we observe $m_0^*$, $m_1^*$, $y_{i,t}$, $c_{i,t}$, and $R_{i,t}$. Therefore, we can separately estimate $\sqrt{2\lambda\mu_0 T_0}$ and $\sqrt{2\lambda\mu_1 T_1}$ from equation (18). Plugging these two scalars into the expression for $\Delta TC$ in equation (19), we can measure monetary cost savings from adoption. Intuitively, based on households' different withdrawal patterns, we can infer total monetary cost savings from ATM card adoption.
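Because equation (18) is linear in the square-root regressor and has no intercept, each scalar can in principle be recovered by a least-squares fit through the origin. The sketch below illustrates the idea on simulated data; the helper name `slope_through_origin`, the simulated ranges, and the "true" slope of 2.0 are all hypothetical and are not estimates from the survey.

```python
import numpy as np

def slope_through_origin(x, m):
    """Least-squares slope of m on x with no intercept: sum(x * m) / sum(x * x)."""
    x = np.asarray(x, dtype=float)
    m = np.asarray(m, dtype=float)
    return float(x @ m / (x @ x))

# Simulated households (hypothetical ranges, for illustration only).
rng = np.random.default_rng(0)
income = rng.uniform(20.0, 60.0, size=500)
consumption = rng.uniform(10.0, 40.0, size=500)
rate = rng.uniform(0.02, 0.10, size=500)
x = np.sqrt(income * consumption / rate)   # regressor in equation (18)

true_slope = 2.0                           # hypothetical value of the scalar
withdrawals = true_slope * x + rng.normal(0.0, 5.0, size=500)
estimate = slope_through_origin(x, withdrawals)
```

Running the same fit separately on adopters' ATM withdrawals and non-adopters' counter withdrawals would give the two scalars that enter equation (19).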

As shown above, we can use a cash demand model to measure adoption benefits per period in monetary terms. More importantly, the estimation of the cash demand model is independent of the estimation of the structural model. The basic intuition is that adoption cost is identified through variation in adoption benefits and different adoption decisions within the same age group. It is not identified through variation in adoption benefits across different age groups. The main idea is that two people of the same age could have different discounted future benefits (for example, because they have different consumption levels or live in regions with different numbers of ATMs). Combined with their different adoption decisions, we can infer their adoption costs.

Another way to think about this is to consider how the cost parameter is identified in an infinite horizon model, where there is essentially only one "age" group. A simple example is presented in Appendix 1 to illustrate the intuition for identification.

To some extent, the identification of adoption cost relies on the assumptions that the cash demand model can successfully measure adoption benefits, that people are forward-looking, and that they do cost-benefit analysis. The large variation in age in our sample and the non-linear survival probabilities are also helpful in controlling for different life horizons and identifying age-specific adoption costs. However, we should emphasize that our model is not simply identified through functional form (someone might argue that we could identify cost because adoption cost is a linear function of age while adoption benefit is a non-linear function of age). In fact, if there is enough variation in per period adoption benefits within each age group, we can recover the adoption costs for all age groups without applying functional form restrictions to the cost side.

We also try to estimate the annual discount factor $\beta$ in addition to the main model parameters, and the results are presented in Tables A2-A5 in the appendix. The estimated $\beta$ varies across specifications and is generally less than 0.85. However, if we use the interest rate to predict the discount factor, $\beta \approx 1/(1 + R)$, $\beta$ should be larger than 0.9. In the economics and marketing literature, most papers on dynamic structural models simply use a pre-determined $\beta$. In order to be consistent with the previous literature and to make good use of the information from the estimated $\beta$, we pick 0.85 as the annual discount factor for one set of dynamic models and 0.90 for another set. The following discussion centers on these two versions of the dynamic models.

7 Estimation Results and Counterfactual Experiments

7.1 Estimation Results

Step 1

We discretize the main continuous state variables, namely, number of bank ATMs per 1,000 population ($n_{i,t}$), household income ($y_{i,t}$) and consumption of non-durables ($c_{i,t}$). Assuming each of these variables follows an AR(1) process, we estimate the equations governing their evolution. Households are assumed to have rational expectations, so they have a good understanding of these stochastic processes. The estimated equations are shown below:

(20) $n_{i,t} = 0.1142 + 0.9191 \cdot n_{i,t-1} + \varepsilon_n$, $\varepsilon_n \sim N(0, 0.1139)$

(21) $y_{i,t} = 0.9528 \cdot y_{i,t-1} + \varepsilon_y$, $\varepsilon_y \sim N(0, 19.4462)$

(22) $c_{i,t} = 0.9651 \cdot c_{i,t-1} + \varepsilon_c$, $\varepsilon_c \sim N(0, 11.6134)$.

Table 6 contains estimation details. The $R^2$ values, which range from 0.82 to 0.9, indicate that AR(1) is a good approximation of the evolution processes.
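One way to turn an estimated AR(1) equation into transition probabilities over a discretized state is a Tauchen-style construction, sketched below for the ATM equation (20). The text does not say which discretization scheme was used, so this is an assumed technique rather than the procedure actually followed; the grid is hypothetical, and treating the second argument of $N(\cdot, \cdot)$ as a variance is also an assumption.

```python
import math
import numpy as np

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ar1_transition_matrix(grid, intercept, rho, sigma):
    """Transition matrix on `grid` for x' = intercept + rho * x + e, e ~ N(0, sigma^2).

    Midpoints between neighbouring grid points serve as bin edges; the first and
    last bins absorb the tails, so every row sums to one.
    """
    grid = np.asarray(grid, dtype=float)
    n = len(grid)
    edges = (grid[:-1] + grid[1:]) / 2.0
    P = np.empty((n, n))
    for i, x in enumerate(grid):
        mean = intercept + rho * x
        cdf = np.array([norm_cdf((e - mean) / sigma) for e in edges])
        P[i, 0] = cdf[0]
        P[i, 1:-1] = np.diff(cdf)
        P[i, -1] = 1.0 - cdf[-1]
    return P

# Hypothetical grid for ATMs per 1,000 population; coefficients from equation (20).
grid = np.linspace(0.0, 1.5, 7)
P = ar1_transition_matrix(grid, intercept=0.1142, rho=0.9191, sigma=math.sqrt(0.1139))
```

Each row of `P` is the distribution of next period's grid point given the current one, which is the object needed to evaluate the integrals in the value functions.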

Step 2

We first estimate a series of static reduced form Probit models with different specifications. Based on the estimation results of the static models (some of them are shown in Table A6 in the appendix), we select significant variables and some commonly used demographic variables to enter our dynamic model. In total, we estimate 12 (=3*2*2) specifications based on: (1) a model with one, two or three latent segments; (2) a discount factor of 0.85 or 0.9; (3) a linear or concave time trend. To be concise, we only show the results of models with a concave time trend. Table 7 shows results of models with one segment; Table 8 presents results of models with two segments; Table 9 contains results of models with three segments.

Which model performs the best?

We can select the best model from the twelve candidates along the above mentioned three dimensions: (1) segment: both AIC and BIC favour dynamic models with two latent segments; (2) time trend: similarly, in terms of goodness-of-fit statistics, specifications with the concave time trend outperform specifications with the linear time trend to a large extent; (3) discount factor: clearly, models with a 0.85 discount factor are superior to models with a 0.9 discount factor according to the model selection criteria.

Overall, the results show that the best model is the dynamic model with two latent segments and a discount factor of 0.85. Therefore, the remaining result discussions and counterfactual experiments are mainly based on this specification.

Goodness of fit of the best model

As shown in Figure 8, the dynamic model with two latent segments and a 0.85 discount factor fits the overall adoption rate over time very well.

Do the elderly have a larger adoption cost?

According to the estimation results of the specifications above, we find evidence that older people might not have a larger adoption cost, and this result is robust across most dynamic models. In the two-segment model with a 0.85 discount factor, the coefficient for seniors' age-specific (age > 50) adoption cost, $\alpha_1$, is not significant. $\alpha_1$ is even negative and marginally significant in the dynamic models with a 0.9 discount factor. In contrast, $\alpha_1$ is always significantly positive in the static models we tried.

This result might seem surprising. One possible explanation involves the opportunity cost of time. In the survey, many senior household heads were unemployed, probably due to retirement. For example, according to the 2004 SHIW survey, 59.4% of household heads aged between 51 and 65 were unemployed, and the percentage was 98.9% for heads over 65 years of age. Unemployed seniors had more free time and thus a lower opportunity cost of time. Even though seniors might have more difficulty learning how to use ATM technology,²⁰ because their unit time cost is lower, their total adoption cost may not be higher than that of younger people.

We view this as evidence that seniors' lower adoption rate is mainly driven by their lack of incentives to adopt, not by a larger adoption cost. In other words, the lower adoption rate of the elderly can be largely rationalized by their shorter life horizons. There are many reports about how old people lag behind in today's "information revolution" world. For example, according to a New York Times article (2004), only 22 percent of Americans over 65 went online, compared with 75 percent of those aged 30 to 49. Why don't seniors take advantage of new technologies, such as computers and the Internet? This paper offers a potential answer: even though seniors may be able to foresee the benefits of these new technologies, the discounted total benefits are so much smaller for them that it might not be worthwhile to exert the effort to learn them.

How important are the adoption benefits measured in the cash demand model?

We construct a measure of adoption benefits, $\Delta TC$, in a cash demand model. The coefficient of $\Delta TC$, $b_{TC}$, is highly significant and robust across all the models. This supports the validity of using this cash demand model and indicates that $\Delta TC$ is indeed a good measure of adoption benefits. Because $\Delta TC$ is a direct measure of cost savings from adopting an ATM card, it is more concrete than the composite quality index used in the durable goods literature.

How large are the adoption costs, in euros?

To obtain monetary adoption costs, we first need to disentangle $b_{TC}$ from $\tilde{b}_{TC} = b_{TC} \cdot \left(\sqrt{2\lambda\mu_0 T_0} - \sqrt{2\lambda\mu_1 T_1}\right)$. To accomplish this, we pool the amount of ATM withdrawal ($m_1^*$) for every ATM adopter and the amount of bank counter withdrawal ($m_0^*$) for every non-adopter across time. We also calculate the square root of each household's income × consumption of non-durables / interest rate, $\sqrt{y_{i,t} c_{i,t} / R_{i,t}}$, in the cash demand model. The summary statistics are shown in Table 10.

Running two OLS regressions,

(23) $m_1^* = \sqrt{2\lambda\mu_1 T_1} \cdot \sqrt{y_{i,t} c_{i,t} / R_{i,t}} + \varepsilon_1$

(24) $m_0^* = \sqrt{2\lambda\mu_0 T_0} \cdot \sqrt{y_{i,t} c_{i,t} / R_{i,t}} + \varepsilon_0$,

we get the estimates for $\sqrt{2\lambda\mu_1 T_1}$ (0.9807) and $\sqrt{2\lambda\mu_0 T_0}$ (3.6650), respectively. Details are shown in Table 11. One implicit assumption in using these two OLS regressions is that we do not consider unobserved individual heterogeneity in the cash demand model, although we do allow for unobserved individual heterogeneity in the dynamic discrete choice model. We discuss this limitation further in Section 8.

Plugging these two scalars into the expression for $\tilde{b}_{TC}$, we can get $b_{TC}$. Since $\Delta TC$ is measured in 1,000 Lire, dividing the estimates of the non-age-specific adoption costs ($F_{0,1}$ and $F_{0,2}$) by $b_{TC}$ gives the equivalent monetary adoption costs. In the dynamic model with two latent segments and a 0.85 discount factor, $b_{TC} = \tilde{b}_{TC} / \left(\sqrt{2\lambda\mu_0 T_0} - \sqrt{2\lambda\mu_1 T_1}\right) = 0.0433$. The adoption cost is thus $F_{0,1}/b_{TC}$ = 275,924 Lire (€142.50 in 2002 euros) for the first segment and $F_{0,2}/b_{TC}$ = 401,864 Lire (€207.54 in 2002 euros) for the second segment. The numbers in the static counterpart are much smaller. The static model in Table 8 shows that $b_{TC}$ is 0.0634, and the non-age-specific adoption costs are €52.92 and €61.25, respectively, for the two latent segments.
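The euro figures for the dynamic model can be cross-checked against the Lire figures. The sketch below assumes the conversion simply applies the official fixed rate of 1,936.27 lire per euro; that the text used exactly this rate, with no further adjustment, is an assumption, although it reproduces the reported values to within a cent.

```python
LIRE_PER_EURO = 1936.27  # official fixed conversion rate of the Italian lira to the euro

def lire_to_euros(lire):
    """Convert an amount in lire to euros at the fixed conversion rate."""
    return lire / LIRE_PER_EURO

# Monetary adoption costs reported for the two latent segments, in lire.
segment_costs_lire = {"segment 1": 275_924, "segment 2": 401_864}
segment_costs_euro = {name: lire_to_euros(value)
                      for name, value in segment_costs_lire.items()}
```

The small remaining differences are consistent with rounding in the reported Lire amounts.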

Given that many Italian banks did charge an annual fee for the ATM card and the average rate was 6.2 euros in Attanasio et al. (2002), the estimated monetary adoption costs in the dynamic model appear reasonable, especially after considering the learning cost and hassle cost. However, since the sample employed for estimation in this paper is a poorly educated population, it is possible that the adoption costs are exaggerated. These numbers should therefore be interpreted with this caveat in mind.

The impacts of time trend and the development of the ATM network

In our estimation, we use a concave function of time, scaled by a coefficient $\psi_0$, to capture the time trend $\psi(t)$. A positive and significant $\psi_0$ is found in the estimation results. Given that ATM technology has been improving continuously in terms of both security and versatility, we are not surprised to find a positive trend. The availability of ATMs (number of ATMs per 1,000 population) also has a positive effect on adoption. Because the time trend and the number of ATMs are highly correlated (correlation coefficient > 0.7), the number of ATMs is not very significant in some specifications.

Who belongs to which segment?

Consistent with common wisdom, education turns out to be the most important predictor of which segment each household belongs to. Household heads with a lower level of education,²¹ namely none or elementary school, are much more likely to belong to the segment with a larger adoption cost. Living in the north or south does not affect a household's likelihood of falling into a given segment.

In adoption cost, households with a male head do not differ from households with a female head. This is not consistent with previous research, which shows that men are more likely to adopt technology (for example, Kerschner and Chelsvig (1984)). One possible reason why we do not find that gender matters is that we use the same survival probabilities for both men and women. Therefore, a confounding effect could arise: women have longer life expectancies and thus bigger adoption benefits than men; on the other hand, men have a higher tendency to try new technologies and thus smaller adoption costs than women. To separate these two factors, we allow for gender-specific survival probabilities (please refer to Figure 9 for gender-specific survival probability curves) and re-estimate the model with a 0.85 discount factor and two latent segments. The estimation results are presented in Table 12.

Most of the results are consistent with their counterparts estimated without gender-specific survival probabilities. Interestingly, in the new specification, the coefficient for gender is now positive and marginally significant, which means that after controlling for different life expectancies, men are more likely to have a smaller adoption cost. We also view this as a validation of the main model.

7.2 Counterfactual Experiments

In this section we use the parameter estimates in the two-segment 0.85 discount factor model to analyze households' adoption decisions. Specifically, we examine the impact on adoption of 1) a sign-up bonus to seniors, 2) an increase/decrease in the number of ATMs, and 3) an increase in the interest rate. The results of our counterfactual experiments can help marketers better understand 1) how effective a given sign-up bonus to the elderly group can be; 2) the quantitative benefits to consumers if more ATMs are installed; 3) the quantitative effects of increasing the interest rate on consumer deposits.

We mainly use the percentage of new adopters in each period relative to the previous period's non-adopters to represent adoption. We also compare the overall cumulative adoption rate in the first subsection, section 7.2.1.

7.2.1 The effect of a sign-up bonus to the elderly group on the percentage of new adopters and the cumulative adoption rate

Since it is mainly the elderly who have a low adoption rate, a sign-up bonus targeted at the elderly group (age > 50) is very effective: a €10 sign-up bonus to the elderly can increase the average percentage of new adopters to 19.4%; a €20 sign-up bonus can raise the number to 26.0%; a €50 sign-up bonus can make it 55.0%! Figures 10 and 11 show the impact of different amounts of sign-up bonuses to seniors on the percentage of new adopters and the overall cumulative adoption rate, respectively.

7.2.2 The effect of an increase/decrease in the ATM number on the percentage of new adopters

Installing more ATMs can attract more customers to adopt the ATM card, and the impact is non-trivial. A 50% increase in the number of ATMs results, on average, in a 33.4% increase in the percentage of new adopters, while a 50% decrease in the number of ATMs decreases the percentage of new adopters by roughly 27.0%. Details are depicted in Figure 12. Depending on the cost of installing an ATM, the subsequent maintenance cost, and competitors' strategies, banks can plan how many new ATMs to install in the future.

7.2.3 The effect of an interest rate increase on the percentage of new adopters

A higher interest rate would make it more costly to hold cash, thus giving people more incentive to adopt ATM cards. As shown in Figure 13, a 0.01 interest rate increase would induce 12.6% more non-adopters to adopt an ATM card, whereas a 0.02 interest rate increase would make 23.3% of non-adopters decide to adopt. Banks might want to note this side effect of a promotional interest rate increase: it may not only encourage people to deposit more, but also make non-adopters more likely to consider getting ATM products.

8 Limitations

Several limitations should be noted. First, we use a very simple calibration method to transform adoption benefits into monetary values (and then recover adoption costs in monetary values), without considering individual heterogeneity. Consequently, we can only interpret the estimated adoption costs as average costs. The advantage of this assumption is that the calibration step can be done independently of the structural estimation. It is much more complicated to consider individual heterogeneity in the calibration step, because individual heterogeneity in the calibration step and in the structural estimation step should be correlated. Future research can make the calibration more flexible at the expense of a heavier computational burden, because the cash demand model and the dynamic structural model would have to be estimated jointly.

Second, people might get non-monetary benefits from using ATM cards or bank counter services. This feature is currently missing from the paper, largely because our data are not rich enough to identify those non-monetary benefits. Depending on where those non-monetary benefits come from, our estimates of adoption costs could be biased downwards or upwards. If, for example, people avoid using ATM cards because they are afraid of making mistakes in ATM transactions, the cash demand model would overstate ATM card adoption benefits, and consequently the adoption costs would be overestimated. If, for example, people value the option of withdrawing cash with ATM cards when travelling to a foreign country, the cash demand model would understate ATM card adoption benefits, and the adoption costs would thus be underestimated.

Third, it might be worthwhile to discretize the state space more finely and to allow for normally distributed random coefficients (instead of a latent class model) by using, for example, the interpolation and simulation method proposed by Keane and Wolpin (1994).

Finally, the usual caveat applies: the results may apply only to our sample of households. For example, the educational level of our sample is relatively low. So the estimates of adoption costs may be upwardly biased.

9 Conclusion

This paper is motivated by a stylized fact of consumers' adoption decisions: the elderly have much lower adoption rates of new technologies. Different reasons are offered in the literature, and these can be summarized as the elderly having larger adoption costs. Having larger adoption costs is one explanation, and having shorter life horizons is another. Without a dynamic model that takes people's limited life horizons into account, these two reasons cannot be separated. This paper provides the first attempt to disentangle these two reasons by allowing for age-specific adoption costs and incorporating people's different life horizons in a finite-horizon dynamic model. For ATM adoption in Italy, our estimation results suggest that the elderly may not have larger adoption costs, so their lower adoption rates are probably caused by their shorter remaining life horizons. At the same time, with the help of a Baumol-Tobin type cash inventory management model, we can first measure the adoption benefits in 2002 euros and then recover the adoption costs in 2002 euros. The estimates can give managers a good sense of the magnitude of consumers' monetary adoption costs and further help them design optimal marketing mixes to speed up the diffusion of new technology products. By conducting counterfactual experiments, we quantify how consumers' ATM adoption decisions would be affected by changing (i) the amount of sign-up bonuses offered to the elderly, (ii) the number of ATMs, and (iii) interest rates.

Appendix

Appendix 1: The intuition for identification and why only a finite-horizon dynamic model can correctly estimate the initial adoption cost

Suppose in the data we observe two individuals, Tom and Jerry. In 2008, both of them were 70 years old. Tom adopted an ATM card in 2008, but Jerry did not. Suppose we can measure that Tom's annual adoption benefit was $10, while Jerry's potential adoption benefit was $8. Let us further assume that the initial adoption cost, F, was the same for both of them. What kind of interval estimate of F can we get?

In a static model, Tom and Jerry are assumed to be myopic: they compare only their current benefit with the initial adoption cost. Because Tom adopted but Jerry did not, we would infer that F is between $8 and $10. In an infinite horizon model with a 0.9 discount factor, we would conclude that F is between $80 and $100, because using an infinite horizon model implicitly assumes that people live forever. Realistically speaking, people would not be so myopic as to consider only a one-period benefit, or so naive as to assume that they could live forever. They must realize that they can enjoy the adoption benefit for many years, but not without an end. Suppose the terminal age is 80. After re-calculating their discounted future benefits, we could infer that the initial adoption cost is between $44 and $57.

This is a very simple model with only two individuals and no stochastic factors involved. But the basic intuition has been shown: we can back out the initial adoption cost if we can measure individuals' adoption benefits and if we know their adoption decisions. In addition, a static model tends to underestimate the initial adoption cost and an infinite horizon dynamic model tends to overestimate it. Only a finite-horizon dynamic model can provide a more precise estimate of the initial adoption cost.
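The bounds in this example can be sketched numerically. The code below is a minimal illustration, not the estimation procedure: the helper names `discounted_benefit` and `cost_bounds` are hypothetical, benefits are assumed constant and received from the current year onward, and the choice of 10 remaining periods for the finite-horizon case is an assumption. The example does not spell out the timing and survival conventions behind its finite-horizon figures, so the finite-horizon output here need not reproduce the $44 to $57 interval; the point is only the ordering of the three intervals.

```python
def discounted_benefit(annual_benefit, beta, periods=None):
    """Present value of a constant annual benefit.

    periods=None means an infinite horizon; otherwise the benefit is
    received in `periods` consecutive years, the first one undiscounted.
    """
    if periods is None:
        return annual_benefit / (1.0 - beta)
    return annual_benefit * (1.0 - beta ** periods) / (1.0 - beta)


def cost_bounds(benefit_non_adopter, benefit_adopter, beta, periods):
    """Interval for F implied by one non-adopter and one adopter."""
    lower = discounted_benefit(benefit_non_adopter, beta, periods)
    upper = discounted_benefit(benefit_adopter, beta, periods)
    return lower, upper


BETA = 0.9
static = cost_bounds(8.0, 10.0, BETA, periods=1)       # myopic: one period only
infinite = cost_bounds(8.0, 10.0, BETA, periods=None)  # people live forever
finite = cost_bounds(8.0, 10.0, BETA, periods=10)      # assumed 10 periods left

print("static:  ", static)
print("infinite:", infinite)
print("finite:  ", finite)
```

Whatever finite number of periods is assumed, the finite-horizon interval lies strictly between the static and the infinite-horizon intervals, which is the direction of bias described above.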

Appendix 2: A simple analytical characterization of a forward-looking household's adoption decisions

Notations

$V_\tau$: total expected discounted utility if adoption happens at time $\tau$

$V_0$: total expected discounted utility if adoption never happens

$B_{i\tau}$: household $i$'s adoption benefits at time $\tau$

$F_{i\tau}$: household $i$'s one-time adoption costs if adoption happens at time $\tau$

$T$: terminal period

At time $t$, a non-adopter $i$'s decision is:

$$\max_{a_{it} \in \{0,1\}} \left\{ V_t,\ \max(V_{t+1}, V_{t+2}, \ldots, V_T, V_0) \right\}$$

where

$$V_t = E_t\left( \sum_{\tau=t}^{T} \beta^{\tau-t} B_{i\tau} - F_{it} \right)$$

$$V_{t+1} = E_t\left( \sum_{\tau=t+1}^{T} \beta^{\tau-t} B_{i\tau} - \beta F_{i,t+1} \right)$$

$$V_{t+2} = E_t\left( \sum_{\tau=t+2}^{T} \beta^{\tau-t} B_{i\tau} - \beta^{2} F_{i,t+2} \right)$$

$$\vdots$$

$$V_T = E_t\left( \beta^{T-t} B_{iT} - \beta^{T-t} F_{iT} \right)$$

$$V_0 = 0$$

If $V_t > \max(V_{t+1}, V_{t+2}, \ldots, V_T, V_0)$, then $a_{it} = 1$; i.e., if the total expected discounted utility from adopting at $t$ is larger than the maximum total expected discounted utility from not adopting at $t$, household $i$ adopts at time $t$. If $V_t < \max(V_{t+1}, V_{t+2}, \ldots, V_T, V_0)$, then $a_{it} = 0$: household $i$ decides not to adopt at time $t$. Similar problems repeat period after period until household $i$ becomes an adopter and/or the terminal period $T$ is reached. Finally, please note that in order to underline the basic intuition, we omit complications such as the error terms and age-specific survival probabilities. The Bellman equations (9)-(11) in the main body of the paper have all those features.
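The decision rule in this appendix can be sketched as follows. This is a deterministic illustration under the same simplifications as the appendix (no error terms, no survival probabilities, expectations replaced by known values); the function names `value_of_adopting_at` and `adopts_now` and all numerical inputs are hypothetical and are not taken from the estimated model.

```python
def value_of_adopting_at(tau, t, benefits, costs, beta):
    """V_tau evaluated at time t: benefits from tau through T, discounted
    back to t, minus the discounted one-time cost paid at tau.

    benefits[s] and costs[s] are indexed by period s = 0, ..., T.
    """
    T = len(benefits) - 1
    pv_benefits = sum(beta ** (s - t) * benefits[s] for s in range(tau, T + 1))
    return pv_benefits - beta ** (tau - t) * costs[tau]


def adopts_now(t, benefits, costs, beta):
    """True if adopting at t beats every later adoption date and never adopting."""
    T = len(benefits) - 1
    v_now = value_of_adopting_at(t, t, benefits, costs, beta)
    v_later = [value_of_adopting_at(tau, t, benefits, costs, beta)
               for tau in range(t + 1, T + 1)]
    v_never = 0.0  # V_0 in the appendix notation
    return v_now > max(v_later + [v_never])


# Illustrative inputs only: six periods, constant benefit of 10 per period.
benefits = [10.0] * 6
print(adopts_now(0, benefits, [30.0] * 6, beta=0.85))             # constant cost
print(adopts_now(0, benefits, [30.0] + [10.0] * 5, beta=0.85))    # cost falls next period
print(adopts_now(0, benefits, [100.0] * 6, beta=0.85))            # cost exceeds any benefit
```

With a constant cost, waiting only forgoes benefits, so immediate adoption wins; a sufficiently large anticipated cost reduction makes waiting optimal; and a cost above the total discounted benefit leads to never adopting.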

References

Adams, A.S. and K.A. Thieben (1991): "Automatic Teller Machines and the Older Population." Applied Ergonomics, 22(2), 85-90.

Aguirregabiria, Victor and Pedro Mira (2008): "Dynamic Discrete Choice Structural Models: A Survey," Journal of Econometrics, forthcoming.

Alvarez, Fernando and Francesco Lippi (2009): "Financial Innovation and the Transactions Demand for Cash," Econometrica, 363-402.

Attanasio, Orazio R., Guiso, Luigi, and Jappelli, Tullio (2002): "The demand for money, financial innovation, and the welfare cost of inflation: An analysis with household data," Journal of Political Economy, 110(2), 317-351.

Baumol, William J. (1952): "The Transactions Demand for Cash: an Inventory-Theoretic Approach," Quarterly Journal of Economics, 66, pp. 545-56.

Canato, Anna and Nicoletta Corrocher (2004): "Information and communication technology: organisational challenges for Italian banks", Accounting, Business and Financial History 14:3, 355-370.

Dayton, C.M. and G.B. Macready (1988): "Concomitant-variable Latent-class Models", Journal of the American Statistical Association, 83 (401), pp. 173-178.

Eckstein, Z., and K. Wolpin (1989): "The Specification and Estimation of Dynamic Stochastic Discrete Choice Models," Journal of Human Resources, 24, 562-598.

European Payment Cards Yearbook, 2005-6 Edition

Fang, Hanming, Michael Keane, Ahmed Khwaja, Martin Salm and Dan Silverman (2007): "Testing the Mechanisms of Structural Models: The Case of the Mickey Mantle Effect", American Economic Review, P&P, Vol. 97(2), pp. 53-59, May 2007.

Ferrari, Stijn, Frank Verboven, and Hans Degryse (2007): "Investment and usage of new technologies: evidence from a shared ATM network", working paper, Catholic University of Leuven

Gilly, Mary C. and Valerie Zeithaml (1985): "The Elderly Consumer and Adoption of Technologies." Journal of Consumer Research, 12: 353-357.

Goettler, Ronald L. and Karen Clay (2007): "Tariff Choice with Consumer Learning: Sorting-Induced Biases and Illusive Surplus", working paper, CMU

Gupta, S. and P.K. Chintagunta (1994): "On Using Demographic Variables to Determine Segment Membership in Logit Mixture Models", Journal of Marketing Research, 31, pp. 128-136.

Hall, Bronwyn H. and Beethika Khan (2003): "Adoption of New Technology" In Jones, Derek C., New Economy Handbook, Amsterdam: Elsevier Science, 2003.

Hannan, Timothy H. and John M. McDowell (1984): "The Determinants of Technology Adoption: The Case of the Banking Firm" The RAND Journal of Economics, Vol. 15, No. 3, pp. 328-335.

Hannan, Timothy H. and John M. McDowell (1987): "Rival Precedence and the Dynamics of Technology Adoption: An Empirical Analysis" Economica, New Series, Vol. 54, No. 214, pp. 155-171.

Hatta, K. and Liyama, Y. (1991): "Ergonomic Study of Automatic Teller Machine Operability" International Journal of Human Computer Interaction 3(3), 295-309.

Hester, Donald D., Calcagnini, Giorgio, and de Bonis, Ricardo (2001): "Competition through Innovation: ATMs in Italian Banks," Rivista Italiana degli Economisti, (3), 359-381.

Huynh, Kim P. (2007): "Financial Innovation and the Persistence of the Extensive Margin", working paper, Indiana University

Ishii, Joy (2005): "Compatibility, Competition, and Investment in Network Industries: ATM Networks in the Banking Industry", working paper, Stanford University

Keane, Michael P. and Kenneth I. Wolpin (1994): "The Solution and Estimation of Discrete Choice Dynamic Programming Models by Simulation and Interpolation: Monte Carlo Evidence," The Review of Economics and Statistics, Vol. 76(4), pp. 648-672.

Kerschner, P.A. and K.H. Chelsvig (1984): "The Aged User and Technology," in Dunkle, Ruth E., Haug, Marie R., Rosenberg, M. (eds) Communications Technology and the Elderly: Issues and Forecasts. New York: Springer Publishing Company, 135-144.

Khwaja, Ahmed (2001): "Health Insurance, Habits and Health Outcomes: A Dynamic Stochastic Model of Investment in Health." Ph.D. dissertation, University of Minnesota

International Relations and Security Network: "Italy not immune to crisis", ISN Security Watch, 14 Jan 2009.

Magnac, Thierry, and David Thesmar (2002): "Identifying Dynamic Discrete Decision Processes", Econometrica, 70, 801-816.

New York Times: "For Some Internet Users, It's Better Late than Never", March 25, 2004.

OECD Indicators (2007): "Education at a Glance 2007"

Orlandi, Eugenio (1989): "Computer Security Economics", ICCST, Zurich, Switzerland, 107-111.

Ratchford, Brian (2001): "The Economics of Consumer Knowledge." Journal of Consumer Research, Vol. 27, 397-411.

Ricciarelli, Matteo (2007): "Transaction Privacy, Crime and Cash in the Purse: an Analysis with Household Data", working paper.

Rogers, W.A., Cabrera, E.F., Walker, N., Gilbert, D.K. and Fisk, A.D. (1996): "A Survey of Automatic Teller Machine Usage across the Adult Lifespan." Human Factors, 38, 156-166.

Rust, John (1987): "Optimal Replacement of GMC Bus Engines: An Empirical Model of Harold Zurcher", Econometrica, 55, 999-1033.

Rust, John (1994a): "Estimation of Dynamic Structural Models, Problems and Prospects: Discrete Decision Processes", in C. Sims (ed.) Advances in Econometrics. Sixth World Congress, Cambridge University Press.

Rust, John (1994b): "Structural Estimation of Markov Decision Processes," in Handbook of Econometrics, Vol. 4, ed. by R. Engle and D. McFadden. Amsterdam: North Holland, pp. 3081-3143.

Rust, John and C. Phelan (1997): "How Social Security and Medicare Affect Retirement Behavior in a World of Incomplete Markets." Econometrica, 65, 781-832.

Ryan, Stephen and Catherine Tucker (2007): "Heterogeneity and the Dynamics of Technology Adoption", working paper, MIT

Saloner, Garth and Andrea Shepard (1995): "Adoption of Technologies with Network Effects: An Empirical Examination of the Adoption of Automated Teller Machines" The RAND Journal of Economics, Vol. 26, No. 3, pp. 479-501.

Shcherbakov, Oleksandr (2007): "Measuring Consumer Switching Costs in the Television Industry", working paper, University of Arizona

Sjuggerud, Steve (2005): "Interest Rate Forecast... The World's Best Prediction for 2005 will Surprise You" The Investment U e-Letter, Issue #401, Monday, January 10.

Song, Inseong, and Pradeep Chintagunta (2003): "A Micromodel of New Product Adoption with Heterogeneous and Forward Looking Consumers: Application to the Digital Camera Category." Quantitative Marketing and Economics, 1, 371-407.

Swanson, Charles E., Kenneth J. Kopecky and Alan Tucker (1997): "Technology Adoption over the Life Cycle and Aggregate Technological Progress." Southern Economic Journal, Vol. 63, No. 4, pp. 872-887.

Tobin, James (1956): "The Interest Elasticity of Transactions Demand for Cash," The Review of Economics and Statistics, 38, August, pp. 241-247.

Table 1: Retail trade, debit card transactions and credit card transactions

Year                                                 1999     2000     2001     2002     2003     2004     2005     2006     2007
Italy retail trade value (million euros)          687,525  697,523  716,356  735,889  738,225  754,206  748,384  757,452  761,114
All debit card POS transactions (million euros)    14,792   18,855   23,059   32,427   27,899   31,667   33,633   35,181   36,880
All credit card POS transactions (million euros)       18       22       25       28       30       36       40       42       45

Sources: Italian Institute of Statistics (ISTAT) and Bank of Italy

Table 2: Cumulative adoption rate of ATM cards (1991*-2004)

Year                    1991    1993    1995    1998    2000    2002    2004
Adoption Rate           0       0.1387  0.2397  0.4363  0.5318  0.6236  0.6648
Observations (panel)    387     483     534     534     534     534     534

* ATM card information before 1991 is not included in the Bank of Italy's public database.

Table 3: Summary statistics of main variables*

Variable                       1991       1993       1995       1998       2000       2002       2004
Age (household head)           52.0956    53.1367    54.5225    56.8858    58.2097    60.1199    61.9214
                               (13.5214)  (13.8939)  (13.5967)  (13.6345)  (13.6782)  (13.6558)  (13.8924)
Household income               47.2688    46.5080    47.2081    51.0605    51.9912    50.2374    50.9209
                               (22.9061)  (24.5015)  (24.9661)  (28.0614)  (28.6416)  (27.0574)  (27.1699)
Consumption of non-durables    32.1393    32.9264    34.3390    33.3956    34.3983    33.5241    35.7546
                               (13.2241)  (13.9453)  (14.6396)  (14.7588)  (14.7183)  (15.0404)  (15.9624)
Male head                      0.8527     0.7950     0.7809     0.7491     0.6592     0.6273     0.6236
                               (0.3549)   (0.4041)   (0.4140)   (0.4340)   (0.4744)   (0.4840)   (0.4849)

Living area                    Percent
North or Centre                55.62
South or Islands               44.38

Highest educational qualification achieved (household head), percent
                               1991       1993       1995       1998       2000       2002       2004
None                           4.91       6.42       6.93       7.68       5.81       4.87       4.49
Elementary school              38.50      39.34      39.33      36.52      36.89      37.45      37.83
Middle school                  33.59      30.64      30.52      29.21      29.03      29.78      29.03
High school                    19.90      19.67      19.29      21.72      23.22      22.85      23.41
Bachelor's degree and above    3.10       3.93       3.93       4.87       5.06       5.06       5.24

Observations                   387        483        534        534        534        534        534

* Income and Consumption of non-durables are measured in 1,000,000 Lire (2002). Numbers in brackets are standard deviations.

Table 4: Interest rate

Year                                  1991    1993    1995    1998    2000    2002    2004
Interest Rate (mean)                  8.872   10.274  6.829   3.811   1.862   1.647   0.916
Interest Rate (standard deviation)    0.489   0.401   0.263   0.206   0.190   0.170   0.115

Observations (number of regions): 20; the interest rate varies by region in Italy.

Table 5: Number of ATMs per 1,000 population

Year                                                          1991    1993    1995    1998    2000    2002    2004
Number of ATMs per 1,000 Population (mean)                    0.141   0.208   0.246   0.497   0.573   0.648   0.646
Number of ATMs per 1,000 Population (standard deviation)      0.100   0.125   0.150   0.226   0.227   0.235   0.236
Observations (number of provinces)                            91      92      91      95      95      95      95

Table 6: AR(1) process for main state variables

State variable           n_{i,t}        y_{i,t} (1,000 Lire)    c_{i,t} (1,000 Lire)
Lagged value (t-1)       0.9191         0.9528                  0.9657
                         (0.0185)       (0.00644)               (0.00586)
Constant                 0.1142         n.a.                    n.a.
                         (0.00863)
σ                        0.1139         19.4462                 11.6134
R²                       0.8210         0.8794                  0.9003
Range                    [0:0.1:1.5]    [0:15:150]              [0:10:100]
Number of grid points    16             11                      11

Table 7: Dynamic models with one segment

                                            β=0.85, dynamic model     β=0.9, dynamic model
parameter                                   estimate     s.d.         estimate     s.d.
ψ0 (time trend)                             2.6844**     0.4996       2.2861**     0.4101
bTC (adoption benefit)                      0.0902**     0.0103       0.0794**     0.009
bn (# of ATMs)                              0.9018**     0.324        0.7935*      0.3204
F0,1 (adoption cost in segment 1)           13.0829**    1.5635       16.3025**    1.8322
α1 (age-specific adoption cost, age>50)     -0.0217      0.0182       -0.1036**    0.0257
α2 (age-specific adoption cost, age<50)     -0.0227*     0.0115       -0.033*      0.013

-ll                                         829.489                   839.948
N                                           1961                      1961
AIC                                         1670.978                  1691.896
BIC                                         1678.733                  1699.651

Table 8: Models with two segments

                                                     β=0.85, dynamic model     β=0.9, dynamic model      static model
parameter                                            estimate     s.d.         estimate     s.d.         estimate     s.d.
ψ0 (time trend)                                      3.2062**     0.9509       2.6646**     0.7558       6.1296**     1.0652
bTC (adoption benefit)                               0.1163**     0.0222       0.0981**     0.0164       0.1701**     0.0184
bn (# of ATMs)                                       1.7198*      0.7247       1.4794*      0.6424       0.7806*      0.3136
F0,1 (adoption cost in segment 1)                    11.9547**    3.0041       14.3346**    3.3096       6.4956**     2.0075
Log(F0,2 - F0,1) (log(adoption cost difference))     1.6968**     0.204        1.8596**     0.1758       0.0234       1.7567
α1 (age-specific adoption cost, age>50)              -0.0004      0.0295       -0.0776+     0.0426       0.0644**     0.0094
α2 (age-specific adoption cost, age<50)              -0.0154      0.0173       -0.0193      0.018        0.0038       0.0069
γ0                                                   -0.813       0.7779       -0.823       0.564        -3.782       6.9018
γsex                                                 0.1204       0.2432       0.13         0.2423       0.7056       6.78
γnorth                                               -0.3386      0.2759       -0.2623      0.2705       -9.378       25.37562
γedu1 (none)                                         -1.3618*     0.6545       -1.5603*     0.6544       -5.5401      32.5692
γedu2 (elementary)                                   -0.5514+     0.3069       -0.6453*     0.3          -0.0139      0.9562
γedu3 (middle school)                                0.0376       0.2674       0.0235       0.277        -3.1446      7.541

-ll                                                  816.833                   819.467                   820.089
N                                                    1961                      1961                      1961
AIC                                                  1659.666                  1664.934                  1666.178
BIC                                                  1676.468                  1681.736                  1682.980

Table 9: Dynamic models with three segments

                                                                              β=0.85, dynamic model     β=0.9, dynamic model
parameter                                                                     estimate     s.d.         estimate     s.d.
ψ0 (time trend)                                                               3.1638**     0.8882       2.5156**     0.6651
bTC (adoption benefit)                                                        0.1156**     0.0193       0.0977**     0.0136
bn (# of ATMs)                                                                1.8292**     0.6494       2.0032**     0.6764
F0,1 (adoption cost in segment 1)                                             11.8927**    2.2483       14.303**     2.9898
Log(F0,2 - F0,1) (log(adoption cost difference between segments 2 and 1))     -0.6188      3.2667       1.8029**     0.5583
Log(F0,3 - F0,2) (log(adoption cost difference between segments 3 and 2))     1.6386**     0.173        0.0725       7.743
α1 (age-specific adoption cost, age>50)                                       0.0016       0.0289       -0.0635      0.0426
α2 (age-specific adoption cost, age<50)                                       -0.0133      0.0163       -0.0233*     0.0095
γ01                                                                           -0.531       0.5411       0.8386       3.8343
γsex1                                                                         0.1734       0.5409       -0.1535      0.3659
γnorth1                                                                       -2.5596*     1.2623       2.2363       24.5916
γedu1,1 (none)                                                                -2.1159**    0.3795       -2.8384      2.8238
γedu2,1 (elementary)                                                          -2.5364      1.6958       -1.9436      3.7252
γedu3,1 (middle school)                                                       -1.0292      0.8105       -1.1051      3.8374
γ02                                                                           -2.4221*     1.2248       0.9751       8.7846
γsex2                                                                         0.0383       0.3259       -0.6911      0.7078
γnorth2                                                                       0.3974       0.522        4.7594       24.8495
γedu1,2 (none)                                                                -1.0373      0.6794       -3.4609      10.3246
γedu2,2 (elementary)                                                          0.9774       0.8415       -3.154       10.9491
γedu3,2 (middle school)                                                       1.3812+      0.8191       -2.4499      8.8966

-ll                                                                           811.437                   812.383
N                                                                             1961                      1961
AIC                                                                           1662.874                  1664.766
BIC                                                                           1688.724                  1690.616

Table 10: Summary statistics of withdrawal patterns Variable (1,000 Lire) Mean Std. Dev. Min Max Observations m{ (amount of withdrawal at an ATM) 459.0951 294.581 52.59712 3287.858 868

yJyi,tCi,t/Ri,t 388.1708 196.3116 55.1366 1191.619 868 mj (amount of withdrawal at a bank counter) 1103.107 1081.906 73.63596 23604.31 1628

230.4905 140.1507 20.5744 1324.367 1628 Jyi,tCi,t/Ri,t

Table 11: OLS regressions to estimate -v/2A/i171 and •v/2A^0'^o Variable (1,000 Lire) mj m'o 0.9807 3.6650 Jyi,tCi,t/Ri,t (0.0265)** (0.1091)**

Observations    868     1628
R-squared       0.61    0.41

Table 12: Model with gender-specific survival probabilities (β=0.85)

                                                     dynamic model,            dynamic model,
                                                     universal survival        gender-specific survival
                                                     probabilities             probabilities
parameter                                            estimate     s.d.         estimate     s.d.
ψ0 (time trend)                                      3.2062**     0.9509       4.2508**     1.5245
bTC (adoption benefit)                               0.1163**     0.0222       0.1225**     0.023
bn (# of ATMs)                                       1.7198*      0.7247       1.2078*      0.6315
F0,1 (adoption cost in segment 1)                    11.9547**    3.0041       12.3759**    3.6627
Log(F0,2 - F0,1) (log(adoption cost difference))     1.6968**     0.204        1.9825**     0.2901
α1 (age-specific adoption cost, age>50)              -0.0004      0.0295       -0.0128      0.0328
α2 (age-specific adoption cost, age<50)              -0.0154      0.0173       -0.0118      0.0155
γ0                                                   -0.813       0.7779       -1.9937**    0.6593
γmale                                                0.1204       0.2432       0.6676+      0.3799
γnorth                                               -0.3386      0.2759       -0.1333      0.3344
γedu1 (none)                                         -1.3618*     0.6545       -1.5344*     0.7421
γedu2 (elementary)                                   -0.5514+     0.3069       -0.713*      0.3323
γedu3 (middle school)                                0.0376       0.2674       0.0296       0.2966

-ll                                                  816.833                   817.764
N                                                    1961                      1961
AIC                                                  1659.666                  1661.528
BIC                                                  1676.468                  1678.330

Table A1: Static models with one segment, different cut-off points

                                                         cut-off age=50            cut-off age=60            cut-off age=65
parameter                                                estimate     s.d.         estimate     s.d.         estimate     s.d.
ψ0 (time trend)                                          6.2283**     1.0651       6.2341**     1.0642       6.2464**     1.063
bTC (adoption benefit)                                   0.1697**     0.0184       0.1721**     0.0184       0.1758**     0.0183
bn (# of ATMs)                                           0.7072*      0.3135       0.6882*      0.3135       0.5955+      0.3108
F0,1 (adoption cost in segment 1)                        7.5478**     0.9361       7.8102**     0.9339       7.9725**     0.9351
α1 (age-specific adoption cost, age> cut-off age)        0.0641**     0.0094       0.0931**     0.0161       0.1138**     0.0218
α2 (age-specific adoption cost, age

-ll                                                      820.054                   824.913                   831.344
N                                                        1961                      1961                      1961

Table A2: Dynamic models with one segment (estimate β)

                                            β=0.85, dynamic model     Estimate β
parameter                                   estimate     s.d.         estimate     s.d.
ψ0 (time trend)                             2.6844**     0.4996       4.7335**     1.0612
bTC (adoption benefit)                      0.0902**     0.0103       0.1356**     0.0205
bn (# of ATMs)                              0.9018**     0.324        0.8067*      0.3216
F0,1 (adoption cost in segment 1)           13.0829**    1.5635       8.9233**     1.2719
α1 (age-specific adoption cost, age>50)     -0.0217      0.0182       0.0538**     0.0128
α2 (age-specific adoption cost, age<50)     -0.0227*     0.0115       -0.0013      0.0088
β                                                                     0.5841**     0.2151

-ll                                         829.489                   818.234
N                                           1961                      1961

Table A3: Dynamic models with two segments (estimate β)

                                                     β=0.85, dynamic model     Estimate β
parameter                                            estimate     s.d.         estimate     s.d.
ψ0 (time trend)                                      3.2062**     0.9509       5.6435+      3.2719
bTC (adoption benefit)                               0.1163**     0.0222       0.1656+      0.0864
bn (# of ATMs)                                       1.7198*      0.7247       1.3005+      0.7765
F0,1 (adoption cost in segment 1)                    11.9547**    3.0041       10.2222**    3.7402
Log(F0,2 - F0,1) (log(adoption cost difference))     1.6968**     0.204        1.2673       0.8555
α1 (age-specific adoption cost, age>50)              -0.0004      0.0295       0.0591       0.0391
α2 (age-specific adoption cost, age<50)              -0.0154      0.0173       -0.0019      0.0132
γ0                                                   -0.813       0.7779       -0.5008      1.2124
γsex                                                 0.1204       0.2432       0.1196       0.2735
γnorth                                               -0.3386      0.2759       -0.2123      0.3159
γedu1 (none)                                         -1.3618*     0.6545       -1.3402      1.1468
γedu2 (elementary)                                   -0.5514+     0.3069       -0.4709      0.4121
γedu3 (middle school)                                0.0376       0.2674       0.0567       0.2811
β                                                                              0.658**      0.2011

-ll                                                  816.833                   813.709
N                                                    1961                      1961

Table A4: Dynamic models with gender-specific survival probabilities (estimate β)

                                                     dynamic model,
                                                     gender-specific survival
                                                     probabilities, β=0.85     Estimate β
parameter                                            estimate     s.d.         estimate     s.d.
ψ0 (time trend)                                      4.2508**     1.5245       5.5547**     1.3767
bTC (adoption benefit)                               0.1225**     0.023        0.1574**     0.0222
bn (# of ATMs)                                       1.2078*      0.6315       1.4300*      0.5577
F0,1 (adoption cost in segment 1)                    12.3759**    3.6627       10.8933**    0.7163
Log(F0,2 - F0,1) (log(adoption cost difference))     1.9825**     0.2901       1.2900**     0.3643
α1 (age-specific adoption cost, age>50)              -0.0128      0.0328       0.0412       0.0264
α2 (age-specific adoption cost, age<50)              -0.0118      0.0155       -0.0043      0.0129
γ0                                                   -1.9937**    0.6593       -0.958       1.1942
γmale                                                0.6676+      0.3799       0.2957       0.348
γnorth                                               -0.1333      0.3344       -0.4554*     0.2083
γedu1 (none)                                         -1.5344*     0.7421       -1.7409*     0.7284
γedu2 (elementary)                                   -0.713*      0.3323       -0.8598**    0.2366
γedu3 (middle school)                                0.0296       0.2966       -0.0187      0.2875
β                                                                              0.7036**     0.1822

-ll                                                  817.764                   815.762
N                                                    1961                      1961

Table A5: Dynamic models with three segments (estimate β)

                                                                              β=0.85, dynamic model     Estimate β
parameter                                                                     estimate     s.d.         estimate     s.d.
ψ0 (time trend)                                                               3.1638**     0.8882       3.887**      1.2694
bTC (adoption benefit)                                                        0.1156**     0.0193       0.1349**     0.0266
bn (# of ATMs)                                                                1.8292**     0.6494       1.9471**     0.7116
F0,1 (adoption cost in segment 1)                                             11.8927**    2.2483       11.6432**    3.5189
Log(F0,2 - F0,1) (log(adoption cost difference between segments 2 and 1))     -0.6188      3.2667       -0.6891      5.1897
Log(F0,3 - F0,2) (log(adoption cost difference between segments 3 and 2))     1.6386**     0.173        1.5079**     0.3993
α1 (age-specific adoption cost, age>50)                                       0.0016       0.0289       0.0269       0.0343
γsex1                                                                         0.1734       0.5409       0.2151       0.5429
γnorth1                                                                       -2.5596*     1.2623       -2.632       2.1771
γedu1,1 (none)                                                                -2.1159**    0.3795       -2.1412      1.7832
γedu2,1 (elementary)                                                          -2.5364      1.6958       -2.5728      2.642
γedu3,1 (middle school)                                                       -1.0292      0.8105       -0.9473      1.1809
γ02                                                                           -2.4221*     1.2248       -2.317+      1.383
γsex2                                                                         0.0383       0.3259       0.0266       0.358
γnorth2                                                                       0.3974       0.522        0.4965       0.8391
γedu1,2 (none)                                                                -1.0373      0.6794       -0.7614      2.6464
γedu2,2 (elementary)                                                          0.9774       0.8415       1.092*       0.4742
γedu3,2 (middle school)                                                       1.3812+      0.8191       1.2903*      0.5988
β                                                                                                       0.8024**     0.1031

-ll                                                                           811.437                   809.466
N                                                                             1961                      1961

Table A6: Static models with one segment, different specifications Model 1 Model 2 Model 3 parameter estimate s.d. estimate s.d. estimate s.d.

ip0 (time trend) 4.7481** 1.1885 4.8036** 1.187 5.2429** 1.137

bTC (adoption benefit) 0.1644** 0.022 0.1649** 0.0219 0.171** 0.0196

bn (# of ATMs) 1.5211** 0.4908 1.4586** 0.4868 1.3554** 0.453

F0i (adoption cost in segment 1) 6.4716** 1.0671 6.5606** 1.033 6.8069** 1.0015 ax (age-specific adoption cost, age>50) 0.0537** 0.0099 0.0531** 0.0099 0.0569** 0.0096 a2 (age-specific adoption cost, age<50) 0.0025 0.0071 0.0029 0.0071 0.0046 0.007

Ysex -0.0651 0.1497 -0.0709 0.1496 -0.0794 0.1485 Yedui (none) 0.8400* 0.4119 0.9054* 0.4084 0.8249* 0.4037 Yedu2 (elementary) 0.2348 0.1774 0.2748 0.1749 0.2248 0.1715 Yeduz (middle school) -0.0537 0.1648 -0.0416 0.1642 -0.0539 0.1623

Yareal (location 1) 0.2433 0.2734 0.254 0.2699 Yareai (location2) 0.6696* 0.2929 0.7143* 0.2887

Yarea 3 (location3) 0.4965* 0.2416 0.5192* 0.2361 Yareai (location4) 0.1353 0.2002 0.148 0.1947 Ysizei (living_areal) 0.2524 0.3283 Ysizei (Hving_area2) 0.1329 0.3342 Ysizes (Hving_area3) 0.0444 0.3095 Ynperc (number of income earners in hh) -0.1333 0.0897 -0.1218 0.0891

Ynort h 0.3891* 0.1879

-11 810.094 811.053 814.048 N 1961 1961 1961

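$$F_{i,t} = F_0 + \alpha_1 \left( \mathit{age}_{i,t} - 50 \mid \mathit{age}_{i,t} > 50 \right) + \alpha_2 \left( 50 - 19 \mid \mathit{age}_{i,t} > 50 \right) + \alpha_2 \left( \mathit{age}_{i,t} - 19 \mid \mathit{age}_{i,t} < 50 \right) + \gamma_x X$$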
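The piecewise-linear age profile in this cost equation can be sketched as below. This is only an illustration: the function name `adoption_cost` is hypothetical, grouping a household head aged exactly 50 with the younger branch is an assumption (the equation leaves that case open, and both branches give the same value there), and the parameter values are the Model 1 point estimates from Table A6 with all covariates X set to zero.

```python
def adoption_cost(age, F0, alpha1, alpha2, gamma=(), x=()):
    """Piecewise-linear age profile of the one-time adoption cost.

    Up to the cut-off, the cost changes by alpha2 per year of age above 19;
    beyond the cut-off it changes by alpha1 per year above 50, on top of
    the alpha2 effect accumulated between 19 and 50.
    """
    cutoff, min_age = 50, 19
    if age > cutoff:
        age_effect = alpha1 * (age - cutoff) + alpha2 * (cutoff - min_age)
    else:  # assumption: age == 50 is treated with the younger branch
        age_effect = alpha2 * (age - min_age)
    covariate_effect = sum(g * v for g, v in zip(gamma, x))
    return F0 + age_effect + covariate_effect


# Point estimates from Table A6, Model 1; covariates omitted for illustration.
F0, ALPHA1, ALPHA2 = 6.4716, 0.0537, 0.0025
for age in (30, 50, 70):
    print(age, round(adoption_cost(age, F0, ALPHA1, ALPHA2), 4))
```

Carrying the accumulated alpha2 term into the older branch keeps the cost continuous at the cut-off, so alpha1 measures only the change in slope after age 50.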

Table A7: Some variables used in static models (X in Table A6)

edu1            no education
edu2            elementary school
edu3            middle school
edu4            high school and above
living_area1    up to 20,000 inhabitants
living_area2    from 20,000 to 40,000
living_area3    from 40,000 to 500,000
living_area4    more than 500,000
location1       North-West
location2       North-East
location3       Centre
location4       South
location5       Islands

Figure 1: Number of ATMs in Italy, 1991-1999

[Chart, 1991-1999; y-axis: number of ATMs (0 to 12,000); series: Northwest, Northeast, Central, South, Islands.]

Source: Hester et al (2001)

Figure 2: Ratio of ATMs to Branches, 1991-1999

[Chart, 1991-1999; series: Northwest, Northeast, Central, South, Islands, Total.]

Source: Hester et al (2001)

Figure 3: Overall Adoption Rate of ATM Card in Italy, 1991-2004

Source: SHIW 1991-2004

Figure 4: Panel

[Bar chart of panel observations by survey year: 1991, 1993, 1995, 1998, 2000, 2002, 2004.]

Figure 5: Cumulative ATM Card Adoption Rate by Age (Panel Households in 1991)

[Chart over survey years 1991, 1993, 1995, 1998, 2000, 2002, 2004; series: up to 50 years in 1991, more than 65 years in 1991.]

Figure 6: Overall ATM Card Adoption Rate by Education (Panel)

Figure 7: Age-specific Survival Probability in Italy

[Chart; x-axis: age (20 to 98); y-axis: survival probability.]

Figure 8: Cumulative Adoption Rate: Actual and Predicted

[Chart over survey years 1991, 1993, 1995, 1998, 2000, 2002, 2004; series: Adoption Rate (Actual), Adoption Rate (Predicted).]

Figure 9: Age-specific Survival Probability by Gender

[Chart; x-axis: age (20 to 98); y-axis: survival probability; series: male, female.]

Figure 10: Counterfactual Experiment: a sign-up bonus to seniors*

[Chart over survey years 1993, 1995, 1998, 2000, 2002, 2004; series: % of new adopters (Predicted), % of new adopters (€10), % of new adopters (€20), % of new adopters (€50).]

* The y-axis is the percentage of new adopters in each period over the previous period non-adopters

Figure 11: Counterfactual Experiment: a sign-up bonus to seniors

[Chart over survey years 1993, 1995, 1998, 2000, 2002, 2004; series: Adoption Rate (Predicted), Adoption Rate (€10), Adoption Rate (€20), Adoption Rate (€50).]

Figure 12: Counterfactual Experiment: ATM points increase/decrease by 50%*

[Chart over survey years 1993, 1995, 1998, 2000, 2002, 2004; series: % of new adopters (Predicted), % of new adopters (ATMs*1.5), % of new adopters (ATMs*0.5).]

* The y-axis is the percentage of new adopters in each period over the previous period non-adopters

Figure 13: Counterfactual Experiment: interest rate increase by 0.01/0.02*

[Chart over survey years 1993, 1995, 1998, 2000, 2002, 2004; series: % of new adopters (Predicted), % of new adopters (interest+0.01), % of new adopters (interest+0.02).]

* The y-axis is the percentage of new adopters in each period over the previous period non-adopters

Figure Series A1

Age histograms for the panel (1991-2004)

[Seven panels: Age Histogram: 1991, Age Histogram: 1993, Age Histogram: 1995, Age Histogram: 1998, Age Histogram: 2000, Age Histogram: 2002, Age Histogram: 2004.]

Figure Series A2

% of new adopters over non-adopters in the previous period by age and education (1993-2004)

[Six panels, one per survey year: 1993, 1995, 1998, 2000, 2002, 2004. In each panel the x-axis groups are: up to 50 years, from 51 to 65 years, more than 65 years; series: none, elementary school, middle school, high school.]

1 We are not referring to those technologies specially designed to aid the elderly.

2 In Appendix 1, we use a simple example to show why a static model underestimates the adoption cost and an infinite horizon model overestimates the adoption cost. The intuition for identification is also briefly discussed in the example.

3 There may be an ongoing fee for using an ATM card, but typically it is a small sum and can be fully covered by the benefits received in each period.

4 For the same modelling reason, the total number of "valid" observations for the structural model is not 387+483+534*5=3,540. It is actually 387+416+406+301+250+201=1,961, after omitting households' first observations and their post-adoption observations.

5 The adoption rates for people aged up to 50 are calculated over those younger than 51 in the 1991 survey wave. It is the same for seniors over 65.

6 Interest income is subject to a withholding tax in Italy. The withholding tax rate is 30% before 1997 and 27% since 1998. The flat rate withholding tax is deducted from nominal interest rates in the empirical estimation.

7 "Diffusion can be seen as the cumulative or aggregate result of a series of individual calculations that weigh the incremental benefits of adopting a new technology against the costs of change, often in an environment characterized by uncertainty (as to the future evolution of the technology and its benefits) and by limited information (about both the benefits and costs and even about the very existence of the technology)." (Hall and Khan (2003))

8 This model is usually called the money demand model. In order to distinguish it from the money demand model in monetary economics, we name it the cash demand model.

9 Alvarez and Lippi (2009) extend the basic Baumol-Tobin model to a dynamic environment where withdrawing cash at random times at a low cost is possible. Their focus is money demand, while ours is ATM adoption.

10 A potential drawback of this formula is that when R is zero, ATC becomes zero. In the empirical implementation, we set the minimum value of R to be 0.5%, which is consistent with the nominal interest rate in the period 1991-2004.

11 As discussed in Section 3, during the period studied in this paper, ATM cards were mainly used for cash acquisition, and credit card numbers and use had a very low base in Italy. We did not use consumption of durable goods because 1) consumption of durables is less than 10% of consumption of non-durables for our panel households; 2) there are many zero observations (70%) for consumption of durables; and 3) people might make special payment arrangements to purchase durable goods, which are generally more expensive. So we believe that (a certain proportion, fXj, of) consumption of non-durables is a good approximation of consumption financed by cash.

12 Another potential drawback is that when $y_{it}$ is zero, ATC becomes zero as well. Fortunately, there are only three zero observations of $y_{it}$ for our panel households in all survey waves.

13 In our estimation, we use a concave functional form: \p(t) = ip0—. A linear time trend is also attempted, but the model fit is much worse.

14 It is possible that the initial adoption cost is decreasing over time. We do not distinguish the two stories (increasing attractiveness vs. decreasing adoption cost) and we interpret the adoption cost as the average adoption cost over time. In addition, please interpret the word "attractive" in a broad sense; it could be that the ATM technology is becoming more reliable, more secure, or more versatile (with more functions).

15 The youngest household head in our panel was 20.

16 Other variables, such as employment status, marital status, number of income earners in the household, number of household members, and size of the city by resident population, were also tried in the static choice models. Since none of them is significant, they are dropped from the structural model to save state space and lessen the computational burden.

17 $y_{it}$ should also depend on the number of income earners in the household, and the number of income earners should be correlated with age. Unfortunately, the number of income earners cannot be predicted well based on the age of household members. Besides, regression analysis of $y_{it}$ based on the first-order Markov process assumption gives us a high $R^2$. Consequently, we keep this AR(1) assumption for $y_{it}$.

18 Rust and Phelan (1997) make the same assumption for terminal age; the oldest household head in my panel was 97.

19 Starting in 1982, the Wall Street Journal conducted polls asking economists for biannual interest rate forecasts and predictions. It was found that not only were these economists not even close in forecasting actual interest rates, they could not even predict the direction in which interest rates would move. In fact, in their forecasts, experts accurately predicted the direction of interest rates less than one third of the time. (Sjuggerud (2005))

20 If this is the case, our results are still consistent with the traditional view that the elderly have more difficulties in learning new technologies, because we have taken elderly people's lower time costs into consideration.

21 Whether the household head has a spouse and the spouse's education level were also tried in the static model estimation, but they were found to be non-significant.

Essay 2

Are All Managers Created Equal? (coauthored with Avi Goldfarb)

Keywords: Technology Adoption, Behavioral Game Theory, Empirical Industrial Organization, Internet Service Providers, Cognitive Hierarchy

1 Introduction

Some managers are better than others. This (perhaps unsurprising) fact is implicit in our teaching of business students and the widespread reporting of good and bad managerial decisions. In order to better understand how management ability affects outcomes, it is necessary to allow for heterogeneity in ability in our models. Nevertheless, while numerous papers model heterogeneous consumers on a variety of dimensions, management heterogeneity is rarely examined. This is not for a lack of models of strategic heterogeneity. For instance, Camerer, Ho, and Chong (2004) develop a "cognitive hierarchy" model (henceforth CH) of heterogeneous strategic thinking where players differ in how deeply they consider competitor choices.22 They then provide considerable supporting evidence from laboratory experiments. In our paper, we develop the first structural non-laboratory estimate of management heterogeneity based on the CH model and apply it to the decisions of 2,233 Internet Service Providers (ISPs) to provide 56K modem technology to their customers. In particular, based on evidence from laboratory experiments, we build an empirical model where players differ in their ability to correctly conjecture the behavior of their competitors. We then explore the consequences of a change in this ability for ISPs and for modem manufacturers.

Heterogeneity in strategic ability is particularly important in retail markets like the ISP market. Retailers must choose which products to offer their customers, and the benefit of offering a particular product will depend on whether competing retailers also offer that product. Optimal product assortment decisions are therefore dependent on expectations over competitor actions. Strategic thinking by retailers will then also affect manufacturers. Ataman, Mela, and Van Heerde (2008) show that wide distribution may be the most important factor in determining the success of a new product. Thus, if strategic ability affects retailer decisions to offer products, it will affect manufacturer outcomes as well.

In this paper, we explore a particular kind of strategic ability: the ability to correctly conjecture competitor actions through step-by-step reasoning. A rich experimental literature has found that the cognitive requirements of finding a Perfect Bayesian Equilibrium are substantial (see Camerer, 2003, for a review). These studies have shown that, rather than solving for the equilibrium, players typically go through a small (and varying) number of iterations on the expected actions of other players (e.g., Costa-Gomes and Crawford, 2006; Stahl and Wilson, 1994). Overall, the experimental evidence on the difficulty of playing these games suggests that small firms with inexperienced managers in a new industry are unlikely to fully solve for the Perfect Bayesian Equilibrium. Since we are studying such an industry, we adapt Camerer, Ho, and Chong's (2004) cognitive hierarchy model to the strategic decisions of ISPs. We operationalize this by modeling a type-0 retailer to act as if it is the only player in the market. A type-1 retailer acts as if it believes all other retailers act as if they are the only player in the market. A type-2 retailer acts as if all other players are distributed between type-0 and type-1. A type-3 retailer acts as if all other players are distributed among type-0, type-1, and type-2. A type-4 retailer acts as if all other players are distributed among type-0, type-1, type-2, and type-3. In general, a type-k retailer acts as if all other players are distributed between type-0 and type-(k-1). This structure enables us to develop a prediction of behavior for players of different types. A useful consequence of this model is that the solution is unique because each firm believes it knows what its competitors are doing. This overcomes the common problem of multiple equilibria in simultaneous entry games (e.g. Seim 2006; Bajari, Hong, and Ryan 2004). We then fit these predictions to data to see which distribution of types best explains observed behavior.

Our context for estimating this model is the 1997 decision by ISPs to offer customers a higher speed service (56Kbps over 33Kbps), and if so, which technology to provide. As discussed in Augereau, Greenstein, and Rysman (2006), firms faced a clear, reasonably well-defined technology choice game between not upgrading, upgrading to Rockwell Semiconductor's K56Flex modem, upgrading to US Robotics' X2 modem, or upgrading to both. We ask (1) How does strategic thinking affect the distribution of 56K modem technology?, (2) Are those players estimated to be more strategic thinkers more likely to survive?, and (3) What factors are correlated with strategic thinking? We find that strategic thinking slowed the distribution and diffusion of the new technology,23 that those ISPs estimated to be more likely to be strategic using 1997 data were more likely to have survived through April 2007,24 and that firms behaved more strategically if they competed in larger cities, they competed in markets with more educated populations, and they competed with more firms.25 Thus, even though the estimate of strategic thinking is associated with increased competition, the ISPs with higher levels of strategic thinking were more likely to survive. More broadly, our results provide external validity to the current laboratory research on the CH model: In addition to the finding on survival, our estimate of the parameter that measures the distribution of strategic ability across the population is at the high end of the range found by Camerer, Ho, and Chong (2004).

The early ISP market provides an ideal setting for examining heterogeneity in strategic thinking. In addition to the clear strategic decision described above, many firms competed in a number of local markets. The dial-up nature of the technology means that we can easily define markets by local telephone calling areas. Perhaps because this was a new industry, large firms like AOL co-existed with very small companies run out of people's homes. MBAs and seasoned managers competed against recent computer science graduates who had helped run the modem pools at their universities. Unlike a Perfect Bayesian Equilibrium approach, the CH model can account for this heterogeneity in managerial expertise in the context of simultaneous entry games; and while the decision explored in this paper is not truly simultaneous, Augereau, Greenstein, and Rysman (2006) provide rich detail on why it can be reasonably viewed as a simultaneous game and is therefore an appropriate setting for the CH model.

Overall, the CH model helps explain the variation in managerial decision-making in a useful way. Our combination of behavioral game theory with the structural methods of the New Empirical Industrial Organization provides a new framework for understanding variation in the decisions of managers who face similar choices.26 Without a model of strategic ability, it is impossible to examine how that ability affects market outcomes. Thus, such a model is a necessary step toward our finding that strategic thinking slowed the distribution and diffusion of 56K modem technology, supporting Reinganum's (1981) theoretical work on the subject. More strategic managers are less likely to adopt new technologies because they anticipate lower profits due to competition.27

This suggests an important difference between the diffusion of products to consumers and to businesses: The likelihood of a given firm's adoption of a business product often depends on the behavior of other competing businesses. However, our results suggest that the importance of this effect is heterogeneous across managers with different abilities. For example, strategic considerations may be less important when the product is aimed at a new industry with inexperienced management than at a mature industry with lifetime professional managers.

Next, we review the two key papers on which this research is built: Augereau, Greenstein, and Rysman (2006) provide the main data and the empirical setting, and Camerer, Ho, and Chong (2004) provide the theoretical basis for the model.

2 A Review of the Two Key Building Blocks

56K Modem Technology and Augereau, Greenstein, and Rysman (2006)

56K modems were introduced in 1997. They allowed data transfer over the Internet at a faster speed than the previous technology at a time when Internet traffic was increasing rapidly. Two modem technologies competed for the market: the X2 modem from US Robotics and the K56Flex modem by Rockwell Semiconductor. These technologies had the same performance capabilities, although they differed in their ease of connection depending on local characteristics. They were also incompatible: A consumer with a given modem could only connect to an ISP at 56K speed if that ISP had the same technology.

Augereau, Greenstein, and Rysman (2006) study the choice of 56K modem technology by ISPs. Specifically, ISPs that offered 33K service decided whether to offer 56K service on X2, K56Flex, both, or neither. They model the ISPs' problem as an entry game into two markets and assume a Perfect Bayesian Equilibrium. They then use a bivariate probit model to estimate the parameters and show that ISPs were less likely to adopt the technology that more of their competitors adopted (in other words, they chose to differentiate).

Building on Augereau, Greenstein, and Rysman (2006), we model an ISP's technology choice problem as an entry game of imperfect information. Then we use CH theory to capture heterogeneity in ISP use of strategic thinking. We believe the early ISP market is a particularly good industry on which to apply CH theory because: (1) ISP managers are likely heterogeneous in experience, reasoning ability, etc., (2) each ISP's payoff depends on competing ISPs' technology choices, (3) the set of players and markets is well-defined, unlike many other entry-type games, and (4) the decisions were largely made over a three-month period, a period short enough that a simultaneous game might be a reasonable model.

Data and summary statistics. Our main data set is identical to that used in Augereau, Greenstein, and Rysman (2006). Their paper provides a rich description of the data; we therefore only briefly describe some key aspects of the data. Augereau, Greenstein, and Rysman use two ISP directories, theDirectory and Boardwatch, to collect information on ISP location (through the telephone numbers that could be used to dial in), 56K technology, and some features of the ISP. Following Augereau, Greenstein, and Rysman (2006), we define markets by telephone switches. We consider an ISP to compete in a given switch/market if it is a local telephone call from that switch to the ISP dial-in number. We also have demographic data based mainly on the zip codes associated with each switch. The data consist of 2,233 ISPs in 9,070 markets for a total of 216,186 ISP-market combinations.

Table 1A provides descriptive statistics by market, table 1B provides descriptive statistics by ISP, and table 1C provides descriptive statistics by ISP-market. Most variable names are self-explanatory. The variable "ISP has digital connection" is missing for a number of observations. We include an indicator variable, "missing," for these observations, which allows us to include the digital connection variable while limiting the effect of the missing data on our results. We supplement this core data set with information collected by visiting each ISP's URL to determine which of the ISPs still existed in April 2007.

We observe ISPs making one of four adoption choices: (1) adopt neither technology, (2) adopt Rockwell Semiconductor's K56Flex, (3) adopt US Robotics' X2, or (4) adopt both. Augereau, Greenstein, and Rysman (2006) argue that the decision can be viewed as simultaneous because the diffusion of the technology was so rapid. Table 1B contrasts the adoption rate for the technologies in July and October 1997. Since the bulk of the adoptions occur in this short window, they assume the game can be viewed as simultaneous. We also make this assumption. To explore the consequences of this assumption, we estimated a model with the July decisions taken as given and examined only the changes from July to October. The qualitative results do not change.

The descriptive statistics reveal a further complication: most ISPs operated in multiple markets. The average ISP operated in 96 markets, and the median served 16 (equivalent to one or two local calling areas). No ISP served all switches. Multi-market ISPs operated the same technology in all their markets. This complicates our analysis because we need to alter the standard CH model to address multi-market ISPs and to constrain ISP decisions to be the same across markets. We discuss how we deal with this below.

Cognitive Hierarchy and Camerer, Ho, and Chong (2004)

Suppose many players play a simultaneous-move game where all players' payoffs depend not only on their own decision but also on other players' decisions. Players therefore need to form expectations about what the other players will do. While many models allow players to differ in their payoff functions, they typically assume all players have the same ability to think through the game. Camerer, Ho, and Chong (2004) argue that this assumption is flawed. They develop CH theory to allow players to differ in their ability to think strategically. They show that CH works well in both the entry-type game examined in this paper and in a "p-beauty contest" game (Nagel 1995; Ho, Camerer, and Weigelt 1998).

In CH theory, players have different hierarchies of rationality. Type-0 thinkers do not consider their competitors (they assume they are local monopolists); a type-1 thinker assumes all competitors are type-0; a type-2 player assumes the other players are a combination of type-0 players and type-1 players; and a type-k player assumes the other players are distributed between type-0 and type-(k-1). Camerer, Ho, and Chong (2004) provide evidence that a Poisson distribution effectively describes the observed distribution of players. We rely on this evidence to support our model and identification.

In the CH model, a type-k player assumes all other players are distributed truncated Poisson between type-0 and type-(k-1). The model assumes the distribution of types in the population has the same Poisson parameter as the truncated Poisson used by players to assess competitor types.

We interpret this hierarchy of rationality as heterogeneity in strategic ability. This interpretation relies on a prior experimental literature that shows players who appear to think strategically show decision processes consistent with this idea (Bosch-Domenech et al. 2002; Camerer and Johnson 2004; Costa-Gomes, Crawford, and Broseta 2001; Chong, Camerer, and Ho 2005). Therefore, type-0 managers do not consider competitors. They instead only consider the characteristics of their firm and their market. Given their own characteristics, type-1 players best respond to a situation where all their competitors are type-0. And so on. A key difference between CH and Nash is therefore that in CH models some players will be surprised by the behavior of their competitors because they did not correctly conjecture their competitors' choices.28

3 Model and Empirical Strategy

In this section, we build on Camerer, Ho, and Chong (2004) to enable us to take the CH model to the ISP data. Our specification differs from the definitions used in laboratory experiments. In particular, in order to take the model to data outside the laboratory, we use observable data to allow ISPs to be heterogeneous in ways other than strategic thinking. Augereau, Greenstein, and Rysman (2006) show that ISP- and market-specific characteristics influence whether to adopt 56K modem technology at all and, if so, which technology to adopt. We therefore add the ISP- and market-level covariates used in that paper. This means that, rather than choosing randomly, a type-0 player's choice is the one with the higher intrinsic value to that player, independent of competitor choices. Higher-level players also consider the intrinsic value of each choice in addition to competitor behavior. In what follows, we formalize this approach.

Suppose there are $J$ ISPs, $j = 1, \ldots, J$, that operate in markets indexed by $i$. ISPs observe market-specific characteristics $x_i$ and ISP-specific characteristics $x_j$. Also, each ISP has four choices: adopt neither technology, adopt Rockwell Semiconductor's K56Flex modem (technology "A"), adopt US Robotics' X2 modem (technology "B"), or adopt both. We use $S_j = \{0, A, B, AB\}$ to denote this choice set and normalize $E[\pi^0_{ij} \mid k] = 0$.

In our model, $E[\pi^s_{ij} \mid k]$ will depend on the ISP's level of strategic thinking, in addition to market-level and ISP characteristics. As discussed above, we assume a type-0 ISP (denoted by $j$) does not take competitor actions into account. Its expected profit in market $i$ is therefore only a function of ISP- and market-level characteristics.

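(1)
$E[\pi^s_{ij} \mid 0] = \beta^s_0 + x_i \beta^s_1 + x_j \beta^s_2, \quad s \in \{A, B\}$

Here $x_i$ are market-level characteristics that affect the profitability of adoption and $x_j$ are ISP characteristics. For a type-k ISP with k > 0, the expected profit in market $i$ is:

(2)
$E[\pi^A_{ij} \mid k] = \beta^A_0 + x_i \beta^A_1 + x_j \beta^A_2 + E[\psi^A_1 (n^A_i + 1) + \psi^A_2 n^B_i + \psi^A_3 n^{AB}_i \mid X, \theta, k]$

$E[\pi^B_{ij} \mid k] = \beta^B_0 + x_i \beta^B_1 + x_j \beta^B_2 + E[\psi^B_1 n^A_i + \psi^B_2 (n^B_i + 1) + \psi^B_3 n^{AB}_i \mid X, \theta, k]$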

Here $n^A_i$, $n^B_i$, and $n^{AB}_i$ are the (expected) numbers of market-$i$ competitors who adopt technology A, technology B, and both, respectively. These will therefore be a function of the market and competitor characteristics. $\psi$ represents the coefficients on expected competitor behavior and $\beta$ represents the coefficients on the other parameters of the profit function. For type-1 ISPs, the values for $n^A_i$, $n^B_i$, and $n^{AB}_i$ are calculated assuming that all of their competitors are type-0 ISPs who choose the technology that maximizes their profits. For type-2 ISPs, these values are calculated assuming all of their competitors are either type-0 or type-1. For type-k ISPs, these values are calculated assuming all of their competitors are distributed between type-0 and type-(k-1). In this way we assume that all ISP- and market-specific characteristics are public information. Thus, any ISP can observe the characteristics of all the other ISPs and predict their behavior according to the distribution of types.29 Given ISP and market characteristics and the parameters of the model, the choices of type-0 ISPs are perfectly predictable up to the idiosyncratic error in the profit function. The choices of higher-level ISPs are consequently also iteratively known given the distribution of types. Our modeling approach to this type-dependent choice problem is similar to approaches that examine state-dependent choice problems (e.g. Netzer, Lattin, and Srinivasan 2008). Here we have type distributions and type-dependent choice probabilities, while those methods have analogous state distributions and state-dependent choice probabilities.

Following Camerer, Ho, and Chong (2004), we assume this distribution to be a truncated Poisson. In particular, we assume that types are distributed Poisson with parameter $\tau$. The Poisson distribution is convenient because a single parameter describes it. As $\tau$ increases, the distribution of player types becomes relatively more strategic. We can assume that a type-k ISP believes its competitors are distributed truncated Poisson (at k-1) with the same parameter $\tau$. Alternatively, in order to estimate how strategic ability varies with market and ISP characteristics, we modify this distribution to allow the Poisson parameter to vary with these characteristics. In particular, we set $\ln(\tau^j) = \gamma_0 + \gamma_1 z_j$, where $z_j$ includes three market-level characteristics (the number of competitors in the market, the percentage of the population that lives in an urban area, and the percentage of the population that has graduated college) and a firm-level characteristic (number of markets served).

Given its type, ISP $j$ picks the choice that maximizes its profit:

$\max \{0, \pi^A_j, \pi^B_j, \pi^{AB}_j\}$

Since ISPs operate in many markets and they offer the same technology in all markets, we assume that they add up the profits across markets and choose the technology that gives the highest total profit. Then,

(3)
$\pi^A_j = \sum_i E[\pi^A_{ij} \mid k] + \nu^A_j$

$\pi^B_j = \sum_i E[\pi^B_{ij} \mid k] + \nu^B_j$

$\pi^{AB}_j = \sum_i E[\pi^A_{ij} \mid k] + \sum_i E[\pi^B_{ij} \mid k] + \nu^A_j + \nu^B_j + \Gamma$

where $(\nu^A_j, \nu^B_j) \sim N(0, \Sigma)$ and $\Sigma = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$.

$\Gamma$ represents the additional payoff of adopting both technologies beyond the sum of adopting each technology. Since $\rho$ and $\Gamma$ are not separately identified in a setting like ours (Gentzkow 2006), we normalize $\Gamma$ to be 0. The error terms $\nu_j$ are ISP-level shocks that affect the profitability of the different technologies; they are observed by the ISPs but not by the econometrician.

We can now predict multi-market ISP $j$'s choice probabilities, conditional on its type. The general procedure is as follows: We first calculate every ISP's branch-level profits (or market-level profits). Then we add them up by ISP to get the ISP-level profits. Next we consider each ISP's aggregate profit maximization problem to determine its technology adoption decision. Then we map every ISP's decision to all its branches. We repeat this procedure to get every ISP's expectation about other ISPs' decisions, conditional on it being of each possible type. We calculate the ISP's choice probabilities assuming that the ISP is maximizing profits, conditional on it being of each type.

Formally, the first step is to calculate ISP $j$'s choice probabilities if it is of type-0 (with probability $p^k_0(j)$, where $k$ is the highest type possible): $p^k_0(j) \rightarrow \Pr^0_j \leftrightarrow s^0_j \in \{0, A, B, AB\}$. Similarly, we calculate all the other ISPs' choice probabilities if they are type-0. Then we map $s^0_j$ into $s^0_{ij}$ and $\Pr^0_j$ into $\Pr^0_{ij}$, for $i = j_1, \ldots, j_B$ and $j = 1, 2, \ldots, J$ (where $J$ = 2,233 in our data). Second, if ISP $j$ is a type-1 ISP (with probability $p^k_1(j)$), based on its beliefs about other ISPs' types and branch-level decisions ($\Pr^0_{ij'}$), we calculate $j$'s expected branch-level profits. Adding up these profits, we obtain $j$'s aggregate profit level and its choice probabilities from its profit-maximizing problem: $p^k_1(j) \rightarrow \Pr^1_j \leftrightarrow s^1_j \in \{0, A, B, AB\}$. We similarly calculate all the other ISPs' choice probabilities if they are type-1 ISPs, and then we map $s^1_j$ into $s^1_{ij}$ and $\Pr^1_j$ into $\Pr^1_{ij}$, for $i = j_1, \ldots, j_B$ and $j = 1, 2, \ldots, J$. We repeat this procedure until we get all ISPs' choice probabilities under all types.
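The recursion described above can be illustrated with a deliberately stripped-down sketch. It is not the estimation code: it assumes a single market, a single technology, identical ISPs, and no error term, so each type's decision is deterministic. The function names and the numbers (an intrinsic value of 1.5, a competition coefficient of -1, two competitors, a Poisson parameter of 1) are hypothetical and chosen only to make the type-by-type logic visible.

```python
import math

def truncated_poisson(tau, max_type):
    """Poisson(tau) weights on types 0..max_type, renormalized to sum to 1."""
    raw = [math.exp(-tau) * tau**m / math.factorial(m) for m in range(max_type + 1)]
    total = sum(raw)
    return [w / total for w in raw]

def adopts_by_type(intrinsic, psi, n_competitors, tau, highest_type):
    """Adoption decision (True/False) for each type 0..highest_type.

    Toy assumptions: one market, one technology, identical ISPs, no error term.
    Profit from adopting = intrinsic + psi * E[number of adopting competitors].
    """
    decisions = []
    for k in range(highest_type + 1):
        if k == 0:
            expected_rivals = 0.0  # a type-0 ISP ignores its competitors
        else:
            beliefs = truncated_poisson(tau, k - 1)  # weights on types 0..k-1
            share_adopting = sum(w for w, d in zip(beliefs, decisions) if d)
            expected_rivals = n_competitors * share_adopting
        decisions.append(intrinsic + psi * expected_rivals > 0)
    return decisions

# Hypothetical numbers: adoption pays with one adopting rival but not with two.
decisions = adopts_by_type(intrinsic=1.5, psi=-1.0, n_competitors=2,
                           tau=1.0, highest_type=3)
print(decisions)
```

In this toy setting the type-0 ISP adopts, the type-1 ISP expects both rivals to adopt and stays out, and higher types re-enter because they expect only some rivals to adopt. The full model replaces the deterministic rule with choice probabilities over {0, A, B, AB} and sums profits across each ISP's markets.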

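Mathematically, a type-k ISP $j$'s expected number of competitors adopting the technologies in market $i$ can be shown by the vector:

(4)
$\{E_j(n^A_i \mid k),\ E_j(n^B_i \mid k),\ E_j(n^{AB}_i \mid k)\} = \left\{ \sum_{j' \neq j} \sum_{m=0}^{k-1} p^{k-1}_m(j') \Pr^m_{ij'}(A),\ \sum_{j' \neq j} \sum_{m=0}^{k-1} p^{k-1}_m(j') \Pr^m_{ij'}(B),\ \sum_{j' \neq j} \sum_{m=0}^{k-1} p^{k-1}_m(j') \Pr^m_{ij'}(AB) \right\}$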

Here, all type-k ISPs assume that any other ISP (denoted by $j'$) is distributed according to a normalized Poisson distribution with one parameter $\tau^{j'}$, from type-0 ($p^{k-1}_0(j')$) to type k-1 ($p^{k-1}_{k-1}(j')$). Again, note that here each ISP has an idiosyncratic Poisson distribution parameter $\tau^j$, while in the original CH model, each group of lab subjects has one idiosyncratic Poisson distribution parameter $\tau$. In other words, in the original CH paper, all subjects' types are drawn from the same Poisson distribution with one parameter $\tau$, while here each multi-market ISP is drawn from a generalization of the Poisson distribution where $\tau$ varies with the ISP's characteristics according to the coefficients on these characteristics.

Next, we can compute ISP $j$'s aggregate choice probabilities (weighted by $p^k_l(j)$) with respect to the choice set {0, A, B, AB}:

(5)
$P_j(0) = \sum_{l=0}^{k} p^k_l(j) \times \Pr^l_j(0)$, with $I_j(0) = 1$ if $s_j = 0$ and 0 otherwise;

$P_j(A) = \sum_{l=0}^{k} p^k_l(j) \times \Pr^l_j(A)$, with $I_j(A) = 1$ if $s_j = A$ and 0 otherwise;

$P_j(B) = \sum_{l=0}^{k} p^k_l(j) \times \Pr^l_j(B)$, with $I_j(B) = 1$ if $s_j = B$ and 0 otherwise;

$P_j(AB) = \sum_{l=0}^{k} p^k_l(j) \times \Pr^l_j(AB)$, with $I_j(AB) = 1$ if $s_j = AB$ and 0 otherwise.

This gives the likelihood function:

(6)
$L = \prod_{j=1}^{J} P_j(0)^{I_j(0)} \, P_j(A)^{I_j(A)} \, P_j(B)^{I_j(B)} \, P_j(AB)^{I_j(AB)}$
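As a rough illustration of how the type-weighted choice probabilities feed the likelihood, the sketch below mixes type-conditional probabilities with type weights and sums the log probabilities of the observed choices. The weights and probabilities are made-up numbers, and the function names are hypothetical; in the actual model the type-conditional probabilities come from the simulated profit-maximization problem rather than being supplied directly.

```python
import math

CHOICES = ("0", "A", "B", "AB")

def mixed_choice_probs(type_weights, probs_by_type):
    """P_j(s) = sum over types l of p_l(j) * Pr^l_j(s), for each choice s."""
    return {s: sum(w * pr[s] for w, pr in zip(type_weights, probs_by_type))
            for s in CHOICES}

def log_likelihood(observed_choices, mixed_probs_per_isp):
    """Sum over ISPs of the log probability of the choice actually observed."""
    return sum(math.log(p[s]) for s, p in zip(observed_choices, mixed_probs_per_isp))

# Hypothetical inputs: two possible types with weights 0.25 and 0.75.
weights = [0.25, 0.75]
by_type = [
    {"0": 0.1, "A": 0.4, "B": 0.4, "AB": 0.1},  # choice probabilities if type-0
    {"0": 0.5, "A": 0.2, "B": 0.2, "AB": 0.1},  # choice probabilities if type-1
]
mixed = mixed_choice_probs(weights, by_type)

# Two hypothetical ISPs sharing the same mixed probabilities; one chose A, one chose 0.
ll = log_likelihood(["A", "0"], [mixed, mixed])
print(mixed, ll)
```

Working in logs turns the product over ISPs into a sum, which is the form an optimizer would typically be given.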

We estimate this likelihood function using a genetic "differential evolution" algorithm (Storn and Price 1997). This method is simple and efficient for global optimization over continuous spaces. We combine this with a GHK simulator using 50 draws in order to simulate the choice probabilities (we choose this number based on Monte Carlo evidence in Keane (1994) and elsewhere).31
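For readers unfamiliar with differential evolution, the following is a minimal sketch of one common variant (often labeled DE/rand/1/bin). The text does not state which variant, population size, or control parameters were used, so every setting here is an assumption, and the quadratic objective is only a stand-in for the simulated negative log-likelihood.

```python
import random

def differential_evolution(objective, bounds, pop_size=20, weight=0.8,
                           crossover=0.9, generations=200, seed=0):
    """Minimize `objective` over the box `bounds` with a DE/rand/1/bin scheme."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct population members other than i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one coordinate comes from the mutant
            trial = []
            for d in range(dim):
                if d == forced or rng.random() < crossover:
                    value = pop[a][d] + weight * (pop[b][d] - pop[c][d])
                    lo, hi = bounds[d]
                    value = min(max(value, lo), hi)  # clip to the search box
                else:
                    value = pop[i][d]
                trial.append(value)
            trial_score = objective(trial)
            if trial_score <= scores[i]:  # greedy selection: keep the better vector
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=lambda i: scores[i])
    return pop[best], scores[best]

def stand_in_objective(theta):
    # Hypothetical smooth objective with its minimum at (1.0, -2.0).
    return (theta[0] - 1.0) ** 2 + (theta[1] + 2.0) ** 2

best_theta, best_value = differential_evolution(
    stand_in_objective, [(-5.0, 5.0), (-5.0, 5.0)])
print(best_theta, best_value)
```

Because the method only compares objective values and never needs derivatives, it can be applied to a simulated likelihood that is not smooth in the parameters.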

Intuition for identification. Our model is identified because the model predicts that different types will behave differently in otherwise identical situations. It relies on the assumption that we can assess the attractiveness of adopting the technologies to each ISP. For example, suppose we observe a market with three ISPs and we know that the optimal number of adopters is two. If we observe three adopt, then we can assume that they are all type-0. If we observe none adopt, then they must all be type-1 (and expect that their competitors both adopted as type-0s). The model will generate decision rules like this and we will compare these predictions to data. Each $\tau$ will generate a distribution of types. For example, if $\tau$ = 1, 37% of players will be type-1 and less than 1% will be type-5. In contrast, if $\tau$ = 3, 16% of players will be type-1 and 11% of players will be type-5. Given that we have a large number of ISPs (2,233) serving an even larger number of markets (9,070), we can find the value for $\tau$ that best fits the data to the distribution of types predicted by the model.
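The type shares quoted above can be checked with a short calculation. The sketch assumes that the type distribution is a Poisson truncated at type 5 and renormalized; that maximum type is an assumption of this sketch, not something stated in the passage. Under it, the shares come out at roughly 37% (type-1) and under 1% (type-5) for a parameter of 1, and roughly 16% and 11% for a parameter of 3. An untruncated Poisson with parameter 3 would instead give about 15% and 10%.

```python
import math

def type_shares(tau, max_type):
    """Poisson(tau) probabilities for types 0..max_type, renormalized to sum to 1."""
    raw = [math.exp(-tau) * tau**k / math.factorial(k) for k in range(max_type + 1)]
    total = sum(raw)
    return [w / total for w in raw]

for tau in (1.0, 3.0):
    shares = type_shares(tau, max_type=5)  # max_type=5 is an assumption
    print(f"tau={tau}: type-1 share={shares[1]:.3f}, type-5 share={shares[5]:.3f}")
```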

4 Results

Model Estimates

In this section, we discuss the parameter estimates. Table 2 presents the main results. Column 1 of table 2 uses four characteristics of the ISPs and the markets they serve to define τ. The results suggest that firms that operated in areas with more educated populations, that faced more competitors, and that operated in urban areas had higher values of τ. (The negative coefficient for the number of markets is not robust to alternative specifications and therefore we do not emphasize it.) This means, for example, that the strategic thinking distributions for firms that faced more competitors first-order stochastically dominate the distributions for other firms. Firms with these characteristics are therefore more likely to be higher-type players and thus to behave more strategically. These results are consistent with prior laboratory research. For example, Ho, Camerer, and Weigelt (1998) find that strategic thinking increases as the number of competitors increases, and Chong, Camerer, and Ho (2005) find that laboratory subjects who attend a higher-quality school are more strategic.
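To illustrate the dominance claim, the sketch below maps ISP characteristics to τ and compares the implied type distributions. It assumes an exponential link, τ = exp(γ0 + γ'z), with the column 1 coefficients of table 2, and a Poisson type distribution truncated at type-5. The link is inferred from the way the column 2 estimate is reported; the truncation, the two example ISPs, and all function names are illustrative assumptions rather than the estimation code.

```python
from math import exp, factorial, log

# Table 2, column 1: constant and coefficients on the correlates of tau.
GAMMA = {
    "constant": 0.6716,
    "ln_markets_served": -0.0221,
    "ln_isps_in_market": 0.0403,
    "pct_urban": 0.1569,
    "pct_college": 1.1731,
}


def predicted_tau(markets_served, isps_in_market, pct_urban, pct_college):
    """Assumed link: tau = exp(gamma0 + gamma'z)."""
    index = (
        GAMMA["constant"]
        + GAMMA["ln_markets_served"] * log(markets_served)
        + GAMMA["ln_isps_in_market"] * log(isps_in_market)
        + GAMMA["pct_urban"] * pct_urban
        + GAMMA["pct_college"] * pct_college
    )
    return exp(index)


def type_cdf(tau, max_type=5):
    """Cumulative shares of types 0..max_type under a truncated Poisson."""
    weights = [tau ** k / factorial(k) for k in range(max_type + 1)]
    total = sum(weights)
    cdf, running = [], 0.0
    for w in weights:
        running += w / total
        cdf.append(running)
    return cdf


def first_order_dominates(tau_high, tau_low):
    """True if tau_high puts weakly less mass at or below every type."""
    pairs = zip(type_cdf(tau_high), type_cdf(tau_low))
    return all(high <= low + 1e-12 for high, low in pairs)


# Two hypothetical single-market ISPs that differ only in competitor count.
few = predicted_tau(1, 5, 0.46, 0.085)
many = predicted_tau(1, 60, 0.46, 0.085)
print(few, many, first_order_dominates(many, few))
```

Because the ratio of the two type distributions is increasing in the type, any higher τ dominates any lower τ in this sense, which is why a positive coefficient on the number of competitors translates directly into a dominance statement.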

Column 2 estimates the model under the assumption that τ is equal across all ISPs. The estimated τ is 2.67 (i.e., e^0.9809). This value means that the numbers of type-0, type-1, type-2, type-3, type-4, and type-5 and above ISPs are 164, 437, 583, 519, 346, and 185, respectively. This is at the high end of the range of values for τ found in Camerer, Ho, and Chong (2004). For example, the median value from all of the experiments they examine is 1.6 and the maximum is 4.9. For a group of portfolio managers, τ is 2.8. We view this as providing external validity for the CH model: given that this is a business decision, we expect managers to think it through more carefully than undergraduates would in a lab. Still, the level of strategic thinking is well within the range of the lab, suggesting that the laboratory insights do apply in our setting.
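As a check on the arithmetic, the type counts reported for column 2 can be reproduced under one assumption that is not stated in this section: that types follow a Poisson distribution with mean τ, truncated at type-5 and renormalized. With τ = 2.67 and 2,233 ISPs:

```latex
\[
f(k) = \frac{\tau^{k}/k!}{\sum_{j=0}^{5} \tau^{j}/j!}, \qquad k = 0, \dots, 5,
\qquad \tau = e^{0.9809} \approx 2.67 .
\]
\[
\sum_{j=0}^{5} \frac{2.67^{j}}{j!}
\approx 1 + 2.670 + 3.564 + 3.172 + 2.118 + 1.131 = 13.655 .
\]
\[
2233 \cdot f(k) \approx 163.5,\; 436.6,\; 582.9,\; 518.8,\; 346.3,\; 184.9
\qquad (k = 0, \dots, 5).
\]
```

Rounded to whole ISPs, these are 164, 437, 583, 519, 346, and 185. The rounded counts sum to 2,234 rather than 2,233 only because of rounding.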

Rows 6, 7, 9, and 10 of table 2 show that ISPs typically differentiate from their rivals. The parameter ψ is negative when estimating a firm's incentives to adopt the same modem technology as its competitors and positive when estimating incentives to adopt a different technology. For example, rows 6 and 7 show that if an ISP adopted the K56Flex, then, all else equal, its competitor was more likely either to have adopted the X2 or to not have adopted at all. Furthermore, we find that the incentives not to adopt the same technology as a competitor were larger than the incentives to adopt the competing technology. This suggests that strategic thinking may have led to an overall decrease in adoption of 56K modems. We examine this idea in detail below.

We explore robustness to a number of alternative specifications in the appendix. These results generally confirm our main findings.

Comparison to Augereau, Greenstein, and Rysman (2006)

Our results are consistent with Augereau, Greenstein, and Rysman (2006), although a comparison provides important additional insights into the consequences of allowing heterogeneity in strategic ability. Their objective was to determine whether ISPs coordinate to take advantage of potential network externalities or differentiate to generate local market power. As in our estimation, they allow for both coordination and differentiation to arise in their analysis. The key difference between our paper and theirs is that our paper allows for heterogeneity in managerial ability. Their primary contribution and main result is that ISPs tended to differentiate from their rivals when choosing which 56K modem technology to adopt. This is consistent with our results on ψ in rows 6, 7, 9, and 10 of table 2. Our primary empirical contribution instead arises from the estimates of the strategic ability parameter τ and the simulations of the consequences of varying strategic ability. Our model therefore provides additional and distinct insights because we can assess how ability affects outcomes.

A direct comparison of our results with Augereau, Greenstein, and Rysman provides an interesting further insight: Our estimated level of differentiation is much stronger (in significance and relative coefficient magnitude) than the one estimated in their paper. Given the assumptions of the CH model, this is expected: Low-type ISPs may not differentiate effectively. These would be averaged with the others had we estimated a Perfect Bayesian Equilibrium model. In the CH model that we estimate, the coefficients are driven only by the firms that behave strategically.

It is important to note that the CH model does not generally fit the data better than the Perfect Bayesian Equilibrium model used in Augereau, Greenstein, and Rysman (2006). We estimated each of the specifications they presented, and our estimated log likelihoods are similar to theirs with one interesting exception: our model fits the data better than theirs when we treat the July 1997 decisions as exogenous (i.e., when making decisions in October, ISPs observe their competitors' decisions in July). We believe we fit the data better in this case because these decisions are more likely to be truly simultaneous due to the short time horizon. In particular, suppose early adoption decisions (say, those in April) are observable by later adopters (say, those in October). Then the ISPs making a decision in October will be able to best-respond to the early adopters, so the resulting adoption patterns will more closely resemble Nash. In contrast, since it takes time to set up the technology, it is unlikely that late adopters (those in October) will be able to best-respond to ISPs that adopted in August. It is therefore more likely that conjectures about competitor behavior over a short time horizon will rely on k-step thinking. In this way, treating the July decisions as exogenous and observed, and then modeling only the subsequent decisions as simultaneous, is closer to the simultaneous game that we model. This result is therefore suggestive of the usefulness of the CH model over the Nash model when the game is truly simultaneous.

Did High-τ Firms Do Better?

In this sub-section, we provide a test of the external validity of our estimates. We cannot explicitly test our model against the Nash equilibrium. Instead, we examine whether the ISPs that survived until April 2007 had a higher estimated value of τ. If the firms that are estimated to be more strategic are more likely to survive, we believe this provides some surface validity for our strategic ability parameter.32

Our data contain the URLs of 2,233 different ISPs that were operating in 1997. We manually visited each of these 2,233 URLs again in April 2007. Of the 2,233 URLs, 1,107 were still operating as ISPs that provided dial-up Internet, DSL, or both. Another 933 were no longer operating as ISPs. The remaining 193 were operating as ISPs but the visitor was forwarded to another website.33 We use this information in table 3A to assess the correlation between the strategic ability parameter (τ) predicted from our model and survival through 2007. All three columns show the same substantive result: those ISPs that survived (through continued operations or acquisition) have a higher value of τ. We use table 2 column 1 to predict τ. Column 1 defines survival as either still operating as an ISP or having been acquired. Column 2 takes the ISPs that were acquired out of the data. Column 3 treats acquired ISPs as having exited.
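To make the survival regression concrete, the following sketch tallies the April 2007 status counts and evaluates a probit fitted probability at a few values of τ. It uses the column 1 estimates from table 3A (constant -.3848, slope .2259) as defaults. The function names are hypothetical, the label "forwarded" stands for the 193 ISPs that the columns treat as acquired, and the values of τ passed in are illustrative.

```python
from math import erf, sqrt

# Status of the 2,233 URLs in April 2007, as reported in the text.
STATUS_COUNTS = {"operating": 1107, "exited": 933, "forwarded": 193}


def normal_cdf(z):
    """Standard normal CDF written with the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def survival_probability(tau, constant=-0.3848, slope=0.2259):
    """Probit fitted probability; defaults are table 3A, column 1."""
    return normal_cdf(constant + slope * tau)


total = sum(STATUS_COUNTS.values())
# Column 2 drops the forwarded (acquired) ISPs from the sample.
independent = total - STATUS_COUNTS["forwarded"]
print(total, independent)

for tau in (2.0, 2.67, 3.0):
    print(tau, round(survival_probability(tau), 3))
```

The two totals match the sample sizes in table 3A (2,233 in columns 1 and 3, 2,040 in column 2), and the fitted probability rises with τ because the slope is positive.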

Overall, table 3A shows that higher-τ firms did better in that they were more likely to have survived for 10 years. We do not mean to say that the 56K modem decision itself led to survival. Instead, we argue that high strategic ability overall is likely correlated with observed strategic behavior in the decision to adopt 56K modems. Firms that survived had higher estimated levels of strategic thinking in this context, and therefore we argue that they likely had higher levels of strategic thinking overall. Still, this correlation between survival and strategic ability needs to be treated as suggestive rather than conclusive evidence in favor of our model. It is possible that the variables correlated with estimated strategic thinking, τ, are correlated with survival for reasons independent of strategic thinking. While this is unlikely to be the case for the number of competitors (all else equal, more competitors should lead to more failures), it is possible that ISPs that operated in more urban and educated areas had high survival rates. Later, we describe several limitations of the model in more detail.

Underlying this test is the assumption that strategic thinkers are more likely to be profitable and hence more likely to survive. While Stahl (1993) showed in an evolutionary setting that some non-strategic thinkers survive if they are lucky enough to randomly choose a good strategy, we find that strategic thinkers (i.e., those with higher estimated τ) do earn a higher profit on average in our model. In particular, table 3B shows that predicted profits and the predicted strategic ability parameter τ are strongly and positively correlated within the model. The purpose of this table is simply to show that, in our model, ISPs with higher τ generally earn higher profits: strategic thinkers are more likely to be profitable. Therefore, the result that strategic thinkers survive beyond the estimation period does provide external validity for our assertion that τ measures strategic thinking.

Consequences of Strategic Thinking on 56K Modem Diffusion

We next examine how different levels of strategic thinking may lead to different outcomes. Based on the coefficients of table 2 column 1, figures 1 and 2 show simulation results where we allow the distribution of strategic thinking to vary. Figure 1 shows that the percentage of ISPs that provide at least one 56K modem technology falls as strategic thinking rises. If everyone is a type-0 player, provision of one or the other technology is over 99%. However, provision falls under 50% as τ approaches 2 and under 25% as τ approaches 5. Beyond τ = 5, increasing τ appears to have little systematic impact on behavior. Thus figure 1 suggests that strategic thinking slows the overall diffusion of the technology: if the ISPs are more strategic, then fewer will offer the upgraded service to their customers. Figure 2 adds two further insights: (1) fewer ISPs will adopt both technologies as strategic thinking rises and (2) the relative shares of the competitors will level off as strategic thinking rises. These results reflect the incentive to differentiate. When firms consider the competition, the model suggests that they understand that providing a different service from the competition increases profitability.

Besides the results in figures 1 and 2, we conducted a simulation where all players are type-1. Under this situation, less than 1% adopt both technologies and over 95% adopt US Robotics' technology, apparently in an attempt to differentiate from their expected type-0 competition. This simulation shows the importance of heterogeneity in strategic ability in providing interesting and reasonable insights. It is not simply bounded rationality: If everyone is boundedly rational in the same way but no structure is imposed in terms of reasonable beliefs, then the market outcomes become unbalanced.

In summary, the simulation results suggest that allowing for heterogeneity in strategic ability helps understand variation in ISP technology choices. Competitive considerations slowed the diffusion of 56K modem technology; however, diffusion would have been even slower if the ISPs were more strategic (as might be expected as the industry matures).

5 Limitations

As in any empirical work, this paper has a number of limitations. First, we assume, rather than test, the CH model. While we provide some evidence of external validity, our model does not nest Nash equilibrium assumptions. Our goal has been to understand the drivers of changes in the ability distribution parameter τ, assuming that the model behind it is correct. We rely on the prior experimental literature to support our modeling assumptions. Based on this literature, we measure strategic ability as the number of thinking steps a firm goes through in order to differentiate from its rivals. Therefore, two firms with different characteristics behave differently in the model in a probabilistic sense.

It is possible that the observed variation in managerial ability is simply variation in unobserved heterogeneity along other dimensions. Our estimate of heterogeneity in the ability to correctly conjecture competitor behavior puts a specific structure on unobserved heterogeneity. Given the cross-sectional nature of our data and the fact that ISPs choose the same technologies in every market, we cannot allow for unobserved heterogeneity in the likelihood of ISPs to upgrade their services or in the attractiveness of upgrading across markets. These limitations can be overcome in future work with alternative settings where firms make different decisions in different markets. In this case, firm and market (random or fixed) effects can be identified.

Additionally, while we find that ISPs with high estimated τ were more likely to survive (despite being less likely to adopt 56K modems and facing more competitors) and that the results on what drives the strategic ability parameter are intuitive, this evidence remains suggestive without a clear instrument that is correlated with τ but not survival, because it is largely identified off the functional form. ISPs with more competitors that operate in educated urban markets are more likely to be strategic. Still, we can put forth alternative explanations for our intuitive results and for the correlation between having a high estimated τ and surviving. Thus, we cannot say that the CH model is somehow "better" than assuming Nash behavior. In fact, our model does not consistently fit the data better than Augereau, Greenstein, and Rysman's Nash equilibrium model. Instead, we argue that the assumptions of the CH model allow us to learn different things from the data than a Nash model allows.

Second, a somewhat restrictive assumption inherent in the CH theory is that all players think they are smarter than all other players. In other words, the CH theory precludes the possibility that players expect their competitors to be their equals in level of strategic thinking. However, if we allow players to think rivals may have equal ability, then this will result in a mutually best response through infinitely many iterations, meaning the uniqueness of the solution would be lost.
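The belief structure behind this assumption can be sketched in a few lines. Following the normalized-Poisson beliefs of Camerer, Ho, and Chong (2004), a type-k player places weight only on types 0 through k-1, in proportion to their Poisson(τ) frequencies. The function name is hypothetical and the value of τ is illustrative.

```python
from math import factorial


def beliefs_about_rivals(k, tau):
    """Belief of a type-k player over rival types 0..k-1.

    Weights are Poisson(tau) frequencies renormalized over lower types,
    so a type-k player puts zero weight on type k or above. Type-0
    players do not form beliefs, so the list is empty for k = 0.
    """
    if k == 0:
        return []
    # The common factor exp(-tau) cancels in the normalization.
    weights = [tau ** h / factorial(h) for h in range(k)]
    total = sum(weights)
    return [w / total for w in weights]


for k in range(4):
    print(k, [round(b, 3) for b in beliefs_about_rivals(k, 2.67)])
```

A type-1 player therefore believes every rival is type-0, and no player ever assigns positive probability to a rival of its own type, which is the restriction described above.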

Third, we do not have rich data on managerial characteristics. While we found several market-level characteristics to be related to our measure of strategic ability, we cannot say much about the manager-specific factors that are related to ability. More information on managers would allow for a deeper understanding of the types of managers that are more strategic.

Fourth, we identify a very specific kind of ability: the ability to correctly conjecture competitor behavior. We cannot say anything about the ability of managers in other dimensions that are relevant to success.

Finally, the empirical setting may differ from the model in ways that may affect the results in unforeseen ways. For example, multi-market ISPs may weight markets differently than our assumptions suggest. ISPs may be forward-looking firms that consider future market changes that we cannot measure. There may be unobservable shocks to adoption costs or benefits that affect technology choice. For example, a temporary, locally focused price promotion for one technology may influence our results on strategic behavior. Furthermore, although adoption takes place over a short period of time, the game we study is not truly a simultaneous game. ISPs may respond to each other's decisions quickly. It is also possible that some ISPs are playing a coordination game rather than a differentiation game. While Augereau, Greenstein, and Rysman (2006) find that ISPs did not behave this way on average, if some ISPs were coordinating, they will appear in the estimates to be less strategic.

6 Conclusion

As the first study to our knowledge to combine behavioral game theory with the structural models of the New Empirical Industrial Organization, our paper provides a new framework for understanding variation in the decisions of managers who face similar choices. This framework allows us to show how strategic thinking affects outcomes.

We find that strategic thinking slowed the diffusion of 56K modem technology, supporting Reinganum's (1981) theoretical work on the subject. In particular, our results suggest that strategic thinking by some customers substantially reduced modem distribution for both Rockwell Semiconductor and US Robotics. This impact suggests that competitive considerations in technology adoption are important to managers of business-to-business products and for policymakers trying to encourage technology diffusion. That said, the degree to which competitive considerations matter depends on the strategic sophistication of the firms: Our simulations suggest that adoption rates would have been lower if the average level of strategic sophistication were higher.

Generally speaking, in industries with inexperienced managers, competitive considerations may be less important. This paper therefore builds on the rich existing literature that generally focuses on the diffusion of new consumer-oriented products (starting with Bass 1969).

Our results suggest two new variables that should be considered when a new product is aimed at businesses: (1) the strategic consequences of the product for the targeted industry and (2) the strategic ability of the players. The competitive considerations of business customers affect diffusion, and this is particularly important in industries with sophisticated, experienced managers. Consistent with the theoretical results in Soberman (2007), this means it may be most effective for business-to-business marketers to target just one firm in each market, as the marginal returns to targeting multiple competing customers will be lower. Furthermore, our results also suggest that incentives for business customers to differentiate from competitors may hinder the creation of winner-take-all markets.

We show in this paper that estimating heterogeneity in managerial types is feasible and provides interesting insights. Several opportunities remain for future work that builds structural econometric models from the assumptions of behavioral game theory.

We encourage future researchers to examine whether strategic thinking limits (or encourages) technology adoption in other industries and whether this impact increases as industries mature and managers become more experienced. Similarly, scholars could apply this modeling technique to other settings to further explore how strategic thinking affects market outcomes.

References

Aradillas-Lopez, Andres, and Elie Tamer (2008), "The Identification Power of Equilibrium in Games," forthcoming Journal of Business and Economic Statistics.

Ataman, Berk M., Carl Mela, and Harald J. van Heerde (2008), "Building Brands," forthcoming Marketing Science.

Augereau, Angelique, Shane Greenstein, and Marc Rysman (2006), "Coordination vs. Differentiation in a Standards War: 56K Modems," RAND Journal of Economics, 37(4), 887-909.

Bajari, Patrick, Han Hong, and Stephen Ryan (2004), "Identification and Estimation of Discrete Games of Complete Information," NBER Working Paper #T0301.

Bass, Frank M. (1969), "A New Product Growth for Model Consumer Durables," Management Science, 15(5), 215-27.

Bosch-Domenech, Antoni, Jose G. Montalvo, Rosemarie Nagel, and Albert Satorra (2002), "One, Two, (Three), Infinity,...: Newspaper and Lab Beauty-Contest Experiments," American Economic Review, 92(5), 1687-1701.

Brown, Alexander L., Colin F. Camerer, and Dan Lovallo (2007), "Limited Strategic Thinking in the Field: The Box Office Premium to Unreviewed 'Cold Opened' Movies," working paper, California Institute of Technology.

Camerer, Colin (2003), Behavioral Game Theory: Experiments in Strategic Interaction, Princeton University Press: Princeton NJ.

Camerer, Colin, Teck-Hua Ho, and Juin-Kuan Chong (2004), "A Cognitive Hierarchy Model of Games," Quarterly Journal of Economics, 119(3), 861-98.

Camerer, Colin, and Eric J. Johnson (2004), "Thinking About Attention in Games: Backward and Forward Induction," in The Psychology of Economic Decisions, Isabelle Brocas and Juan Carillo, eds. New York: Oxford University Press, 111-30.

Chan, Tat Y., Barton H. Hamilton, and Christopher Makler (2007), "Using Expectations Data to Infer Managerial Objectives and Choices," working paper, Washington University.

Che, Hai, K. Sudhir, and P.B. Seetharaman (2007), "Bounded Rationality in Pricing Under State-Dependent Demand: Do Firms Look Ahead? How Far Ahead?" Journal of Marketing Research, 44(3), 434-49.

Chong, Juin-Kuan, Colin F. Camerer, and Teck-Hua Ho (2005), "Cognitive Hierarchy: A Limited Thinking Theory in Games," in Experimental Business Research V. III, Rami Zwick and Amnon Rapoport, eds. New York: Kluwer Academic Publishers, 203-28.

Costa-Gomes, Miguel, and Vincent P. Crawford (2006), "Cognition and Behavior in Two-Person Guessing Games: An Experimental Study," American Economic Review, 96(5), 1737-68.

Costa-Gomes, Miguel, Vincent P. Crawford, and Bruno Broseta (2001), "Cognition and Behavior in Normal-Form Games: An Experimental Study," Econometrica, 69(5), 1193-1235.

Gentzkow, Matthew A. (2007), "Valuing New Goods in a Model with Complementarity: Online Newspapers," American Economic Review, 97(3), 713-44.

Greenstein, Shane (2000), "Building and Delivering the Virtual World: The Commercial Internet Access Market," Journal of Industrial Economics, 48(4), 391-411.

Haile, Philip A., Ali Hortacsu, and Grigory Kosenok (2008), "On the Empirical Content of Quantal Response Equilibrium," American Economic Review, 98(1), 180-200.

Haruvy, Ernan, Dale Stahl, and Paul Wilson (2001), "Modeling and Testing for Heterogeneity in Observed Strategic Behavior," Review of Economics and Statistics, 83(1), 146-57.

Ho, Teck H., C. Camerer, and K. Weigelt (1998), "Iterated Dominance and Iterated Best-Response in p-Beauty Contests," American Economic Review, 88(4), 947-69.

Ho, Teck H., Noah Lim, and Colin F. Camerer (2006), "Modeling the Psychology of Consumer and Firm Behavior with Behavioral Economics," Journal of Marketing Research, 43(3), 307-31.

Hortacsu, Ali, and Steven L. Puller (2007), "Understanding Strategic Bidding in Multi-Unit Auctions: A Case Study of the Texas Electricity Spot Market," forthcoming RAND Journal of Economics.

Keane, Michael (1994), "A Computationally Practical Simulation Estimator for Panel Data," Econometrica, 62(1), 95-116.

Lim, Noah, and Teck-Hua Ho (2007), "Designing Price Contracts for Boundedly Rational Customers: Does the Number of Blocks Matter?" Marketing Science, 26(3), 312-26.

McKelvey, Richard, and Thomas Palfrey (1995), "Quantal Response Equilibria for Normal-Form Games," Games and Economic Behavior, 10(1), 6-38.

Nagel, Rosemarie (1995), "Unraveling in Guessing Games: An Experimental Study," American Economic Review, 85(5), 1313-26.

Netzer, Oded, James Lattin, and V. Seenu Srinivasan (2008), "A Hidden Markov Model of Customer Relationship Dynamics," Marketing Science, 27(2), 185-204.

Reinganum, Jennifer F. (1981), "Market Structure and the Diffusion of New Technology," Bell Journal of Economics, 12(2), 618-24.

Seim, Katja (2006), "An Empirical Model of Firm Entry with Endogenous Product-Type Choices," RAND Journal of Economics, 37(3), 619-40.

Soberman, David A. (2007), "Marketing Agencies, Media Experts, and Sales Agents: Helping Competitive Firms Improve the Effectiveness of Marketing," working paper, INSEAD.

Stahl, Dale O. (1993), "Evolution of Smartn Players," Games and Economic Behavior, 5(4), 604-17.

Stahl, Dale O., and Paul W. Wilson (1994), "Experimental Evidence on Players' Models of Other Players," Journal of Economic Behavior and Organization, 25(3), 309-27.

Storn, Rainer, and Kenneth Price (1997), "Differential Evolution - A Simple and Efficient Heuristic for Global Optimization over Continuous Spaces," Journal of Global Optimization, 11(4), 341-59.

TABLE 1A: SUMMARY STATISTICS BY MARKET (N = 9070)

Variable                                     Mean     Std. Dev.   Minimum   Maximum
# of ISPs in the market                      23.84    29.80       1         139
# of backbone providers                      6.579    17.40       0         106
% population urban                           .4612    .3993       0         1
% population in different county 5 yrs ago   .1704    .0807       0         .8667
Median household income                      42644    14719       6136      200001
% population college graduate                .0848    .0515       0         .825
# of business establishments/person          .0235    .0067       .0028     .0981

TABLE 1B: SUMMARY STATISTICS BY ISP (N = 2233)

Variable                                     Mean     Std. Dev.   Minimum   Maximum
Choose Rockwell (A) in October               .2342    .4236       0         1
Choose US Robotics (B) in October            .1742    .3794       0         1
Choose both in October                       .0828    .2757       0         1
Choose neither in October                    .5087    .5000       0         1
Choose Rockwell (A) in July                  .0502    .2183       0         1
Choose US Robotics (B) in July               .0828    .2757       0         1
Choose both in July                          .0121    .1093       0         1
Choose neither in July                       .8549    .3523       0         1
# of markets served                          96.81    451.9       1         4916
ISP has digital connection (T1 or ISDN)      .7443    .4364       0         1

TABLE 1C: SUMMARY STATISTICS BY ISP-MARKET (N = 216,186)

Variable                                     Mean     Std. Dev.   Minimum   Maximum
Choose Rockwell (A) in October               .1347    .3414       0         1
Choose US Robotics (B) in October            .1706    .3762       0         1
Choose both in October                       .0905    .2869       0         1
Choose neither in October                    .6042    .4890       0         1
Choose Rockwell (A) in July                  .0149    .1212       0         1
Choose US Robotics (B) in July               .0449    .2071       0         1
Choose both in July                          .0083    .0906       0         1
Choose neither in July                       .9319    .2519       0         1
# of ISPs in the market                      61.08    35.83       1         139
# of backbone providers                      22.17    33.19       0         106
ISP has digital connection (T1 or ISDN)      .5981    .4903       0         1
Missing                                      .2873    .4525       0         1
% population urban                           .6808    .3793       0         1
% population in different county 5 yrs ago   .1663    .0850       0         .8667
Median household income                      50353    18249       6136      200001
% population college graduate                .1068    .0574       0         .825
# of business establishments/person          .0241    .0061       .0028     .0981

TABLE 2: MAIN RESULTS

                                                        (1)                      (2)
    Variable                                            Coefficient  Std. Error  Coefficient  Std. Error

 1  constant (γ0)                                       .6716**      .0364       .9809**      .0152

Correlates with strategic thinking parameter τ (γ)
 2  ln(# of markets served)                             -.0221**     .0059
 3  ln(# ISPs in market)                                .0403**      .0137
 4  % population urban                                  .1569**      .0481
 5  % population college graduate                       1.1731**     .2177

Competitive incentives for adopting Rockwell's K56Flex (ψ^A)
 6  # of ISPs on Rockwell                               -2.8343**    .4297       -3.1408**    .4098
 7  # of ISPs on US Robotics                            1.0453**     .1929       2.1284**     .3243
 8  # of ISPs on both technologies                      -2.3236**    .4742       -5.3661**    .8573

Competitive incentives for adopting US Robotics' X2 (ψ^B)
 9  # of ISPs on Rockwell                               .2653**      .0927       .3086**      .0422
10  # of ISPs on US Robotics                            -1.0847**    .1809       -.8824**     .0699
11  # of ISPs on both technologies                      1.7539**     .2800       1.2991**     .1593

Controls: Non-strategic factors that affect adopting Rockwell's K56Flex (β^A)
12  constant                                            -1.8913*     .7786       -2.3810**    .0773
13  ln(# ISPs in market)                                .3648**      .0737       -.1112**     .0205
14  ISP has digital connection                          2.3463**     .3541       2.5445**     .3696
15  missing                                             -.6874**     .1592       -.0368**     .0100
16  ln(median household income)                         .2110**      .0801       .2575**      .0070
17  # of business establishments per person             3.2702*      1.6001      3.0050*      1.2269
18  % population college graduate                       -2.1197*     .8621       -3.3205**    .3541
19  % population urban                                  .2318        .1916       .2919**      .0830
20  % county population in different county 5 yrs ago   -.0557       .5879       1.1854**     .2251
21  # of backbone providers                             -.0428**     .0064       -.0017*      .0008

Controls: Non-strategic factors that affect adopting US Robotics' X2 (β^B)
22  constant                                            -6.3662**    .5978       -3.2724**    .0186
23  ln(# ISPs in market)                                .0042        .0154       .0661**      .0068
24  ISP has digital connection                          1.0491**     .1892       .9094**      .0754
25  missing                                             .0207        .0272       -.0002       .0010
26  ln(median household income)                         .5909**      .0567       .2964**      .0028
27  # of business establishments per person             -4.2215+     2.4711      -.9406       .5672
28  % population college graduate                       2.2869**     .6438       1.3414**     .3461
29  % population urban                                  -.4507**     .1212       -.1257**     .0274
30  % county population in different county 5 yrs ago   .1165        .3937       -.1849       .1407
31  # of backbone providers                             .0276**      .0078       -.0111**     .0009

32  ρ                                                   -.1765       .2710       .0617        .2527
33  log likelihood                                      -2623.0                  -2644.8

+ significant at 90% confidence level. * significant at 95% confidence level. ** significant at 99% confidence level.

TABLE 3A: ISPS WITH HIGHER τ ARE MORE LIKELY TO HAVE SURVIVED TO APRIL 2007

                  (1)                          (2)                          (3)
                  All ISPs (τ defined as in    Only ISPs that maintain      Acquired ISPs treated
                  table 2, column 3)           an independent website       as having exited
τ                 .2259*                       .2247*                       .1943*
                  (.0915)                      (.0941)                      (.0907)
constant          -.3848                       -.4819+                      -.5203*
                  (.2413)                      (.2481)                      (.2394)
log likelihood    -1514.4                      -1403.7                      -1545.4
N                 2233                         2040                         2233

+ significant at 90% confidence level. * significant at 95% confidence level.
Notes: Probit regression of survival on predicted τ. Standard errors in parentheses.

TABLE 3B: THE MODEL PREDICTS ISPS WITH HIGHER τ WILL HAVE HIGHER PROFITS

                  (1)                          (2)
                  All ISPs (τ defined as in    Only ISPs that maintain
                  table 2, column 3)           an independent website
τ                 .4671**                      .4605**
                  (.0919)                      (.0939)
constant          -.8348**                     -.8283**
                  (.2425)                      (.2479)
R-squared         .0115                        .0117
N                 2233                         2040

** significant at 99% confidence level.
Notes: OLS regression of predicted profits on predicted τ. Standard errors in parentheses.

FIGURE 1 % ISPS THAT PROVIDE AT LEAST ONE 56K MODEM TECHNOLOGY

[Figure not reproduced. Horizontal axis: Degree of Strategic Thinking in the Market (τ), with recovered tick labels "Everyone is Type-0", 1, 2, 3, 4.]

Notes: Simulations based on table 2 column 2.

FIGURE 2 % ISPS THAT PROVIDE EACH 56K MODEM TECHNOLOGY

[Figure not reproduced. Vertical axis: 0 to 100 (%). Horizontal axis: Degree of Strategic Thinking in the Market (τ), starting at "Everyone is Type-0". Recovered labels: "Provide Only US Robotics' Technology (B)" and "Estimated level of τ in Table 2".]

Notes: Simulations based on table 2 column 2.

Appendix

APPENDIX TABLE W1: LEVELS OF THINKING (# OF TYPES ALLOWED IN ESTIMATION) (a)

                                                   (1)           (2)           (3)           (4)           (5)           (6)           (7)
                                                   Up to Type-0  Up to Type-1  Up to Type-2  Up to Type-3  Up to Type-4  Up to Type-5  Up to Type-6

constant (γ0)                                                    .2462**       .3947**       .6886**       .7144**       .5679**       .5549**
                                                                 (.0280)       (.00135)      (.00334)      (.0489)       (.0494)       (.0135)

Correlates with strategic thinking parameter τ (γ)
ln(# of markets served)                                          -.1864**      -.3867**      .00575        -.0175        .000563       .000687
                                                                 (.00822)      (.00155)      (.00509)      (.0129)       (.00369)      (.00301)
ln(# ISPs in market)                                             .0238         .0933**       .0407**       .0445**       .0403**       .0303**
                                                                 (.0165)       (.000516)     (.00739)      (.0146)       (.0138)       (.00366)
% population urban                                               1.2013**      .5658**       .3577**       .3280**       .2701**       .2649**
                                                                 (.0477)       (.00196)      (.00618)      (.0342)       (.0431)       (.0211)
% population college graduate                                    -6.1273**     -4.6629**     .8968**       -.5735**      .3950*        .5982**
                                                                 (.00925)      (.000861)     (.00772)      (.0901)       (.1683)       (.0490)

Competitive incentives for adopting Rockwell's K56Flex (ψ^A)
# of ISPs on Rockwell                                            -.0416**      -.0550**      -4.7685**     -3.9492**     -5.6544**     -9.3099**
                                                                 (.00851)      (.00160)      (.00989)      (.0945)       (.3094)       (.1579)
# of ISPs on US Robotics                                         .0376**       .0623**       .1370**       .1077**       .5900**       1.0175**
                                                                 (.00872)      (.00255)      (.000872)     (.00300)      (.0568)       (.0428)
# of ISPs on both technologies                                   .00354**      -.00915**     -.0654**      .0114**       -1.0862**     -1.8378**
                                                                 (.0000874)    (.00118)      (.000982)     (.000552)     (.14407)      (.1011)

Competitive incentives for adopting US Robotics' X2 (ψ^B)
# of ISPs on Rockwell                                            1.4720**      1.1988**      1.8447**      1.4388*       2.4819**      4.2772**
                                                                 (.00821)      (.00596)      (.0244)       (.5700)       (.3872)       (.2901)
# of ISPs on US Robotics                                         -1.4038**     -1.6029**     -4.6372**     -3.9966**     -5.6658**     -9.2913**
                                                                 (.00794)      (.00617)      (.0299)       (.0885)       (.3085)       (.1752)
# of ISPs on both technologies                                   -.0156**      .5645**       5.5753**      2.9390**      5.9635**      9.6295**
                                                                 (.000362)     (.00396)      (.0485)       (.1348)       (.3824)       (.2692)

Controls: Non-strategic factors that affect adoption (β)
constant                                           .00662**      -.1642**      -.1660**      -.0874**      -.0671**      -.2487**      -.0483**
                                                   (.000776)     (.00000365)   (.00000831)   (.0000602)    (.00493)      (.00885)      (.00805)
ln(# ISPs in market)                               .00117**      .000137**     -.000223**    .0231**       .00277+       .0114**       .0113**
                                                   (.000199)     (.00000135)   (.00000279)   (.0000204)    (.00166)      (.00309)      (.00213)
ISP has digital connection                         .000144       .000273**     .000133**     4.6722**      3.8302**      5.5712**      9.1887**
                                                   (.000161)     (.0000106)    (.0000174)    (.00905)      (.0925)       (.3059)       (.1640)
missing                                            .000121       -.000102      -.0000805     -.000315**    .000713**     .0118**       .0108**
                                                   (.000179)     (.0000199)    (.000103)     (.0000237)    (.000183)     (.00310)      (.00203)
ln(median household income)                        .00152**      .0173**       .0175**       .00620**      .0118**       .0235**       .00527**
                                                   (.0000538)    (.00000266)   (.0000366)    (.00000470)   (.0000966)    (.00153)      (.000907)
# of business establishments per person            -.6510**      -.4192**      -.1879**      .3304**       -.1679*       -1.1268**     -1.1982**
                                                   (.00154)      (.000592)     (.000746)     (.00637)      (.0768)       (.2290)       (.0789)
% population college graduate                      -.0255**      -.1448**      -.1742**      .4244**       .7111**       .5563**       .5177**
                                                   (.000847)     (.000351)     (.0000321)    (.00455)      (.0242)       (.0967)       (.0653)
% population urban                                 -.00305**     .00626**      .00719**      -.0118**      -.0657**      -.0340**      -.0332**
                                                   (.000746)     (.0000795)    (.0000172)    (.000866)     (.00429)      (.0102)       (.00991)
% county population in different county 5 yrs ago  -.0430**      -.00884**     -.0160**      -.4005**      -.4141**      -.1865**      -.2003**
                                                   (.0000405)    (.000528)     (.000104)     (.00138)      (.0160)       (.0663)       (.0445)
# of backbone providers                            -.0000339**   -.0000142**   -.0000359**   -.00291**     -.00192**     -.00128**     -.00109**
                                                   (.00000829)   (.000000509)  (.00000166)   (.00000295)   (.0000827)    (.000169)     (.0000879)

log likelihood                                     -3034.6       -2744.7       -2735.4       -2712.3       -2664.7       -2644.7       -2641.8

+ significant at 90% confidence level. * significant at 95% confidence level. ** significant at 99% confidence level.
(a) Assumes symmetry between factors that drive adoption of technology A and technology B (aside from differentiation).
Standard errors in parentheses.

APPENDIX TABLE W2: ROBUSTNESS TO ALTERNATIVE (SYMMETRIC) SPECIFICATIONS

b (1) (2) (3) (4)» (5) Determinants Allow July Single- correlation Decisions Basic OfT market ISP between A Treated as model not model estimated & B errors Exogenous .5679** .9000** .5874** .7479** .5451** constant (y0) (.0494) (.0150) (.0543) (.007) (.0110) .000563 -.00991 -.012* N/A Correlates with ln(# of markets served) (.00369) (.00877) (.0058) strategic .0403** .0409** .057** .1145** thinking ln(# ISPs in market) (.0138) (.0147) (.0066) parameter T (.00318) .2701** .2225** .1253** -.0287* % population urban (Y) (.0431) (.0533) (.0125) (.0144) .3950* .7488** 1.1289** -.0320** % population college graduate (.1683) (.2849) (.0127) (.00631) -5.6544** -5.6572** -5.6821** -2.6209** -.3796** Competitive # of ISP's on Rockwell incentives for (.3094) (1.0501) (.0831) (.053) (.00532) adopting .5899** .1551** .6133** .1652** -.00807** # of ISP's on US Robotics Rockwell's (.0568) (.0288) (.0527) (.0093) (.000354) -1.0862** .00734** -1.0773** -.152** .0113** K56Flex # of ISP's on both technologies (vA) (.1441) (.00248) (1515) (.0268) (.000241) 2.4819** 2.6414** 2.5865** -.2315 .2472** # of ISP's on Rockwell Competitive (.3872) (.9817) (.2583) (.172) (.0156) incentives for -5.6658** -5.7030** -5.6954** -2.5025** adopting US # of ISP's on US Robotics -1.1694** (.3085) (1.0444) (.0739) (.0404) (.0199) 5.9635** 3.9459** 5.9614** .8055** 1.0465** (vB) # of ISP's on both technologies (.3824) (.7064) (.0732) (.083) (.0252) -.2487** -.0522** -.2801** .0315* .0544** (.00885) (.00204) (.0306) (.0124) (.0129) -.00465** -.0011 ln(# ISPs in market) .0114** .0107** .3754** (.00309) (.000693) (.00402) (.0033) (.00252) 5.5712** 5.5933** 2.48** ISP has digital connection 5.5266** -.1928** (.3059) (1.0403) (.0719) (.0207) (.0135) .0118** .00177** .0114** .00001 -2.1305** (.00310) (.000163) (.00363) (.0002) (.0108) Controls: Non- .0235** .0192** .0265** -.0044** -.000561 strategic factors ln(median household income) (.00153) (.00128) (.00258) (.0007) (.000653) adoption # of business 
establishments -1.1268** -3.4383** -.5293 -2.5307** -.00238 per person (.2290) (.6078) (.4694) (.0102) (.00563) (P) 1.0594** .00291 % population college graduate .5563** .5043** 1.3298** (.0967) (.1069) (.155) (.07) (.00507) -.0340** -.0516** -.000445 % population urban -.0988** -.0359* (.0102) (.00953) (.0144) (.0104) (.000872) % county population in -.1865** -.5118** -.2268* .026 -.00169 different county 5 yrs ago (.0663) (.0417) (.1012) (.079) (.00421) -.00128** -.000824** -.00120** -.0017** .00203** # of backbone providers (.000169) (.0000394) (.000279) (.0004) (.0000371) P -.4888** -.4825** (.0148) (.0103) log likelihood -2644.7 -2677.3 -2633.7 -2082.9 -225,705 +significant at 90% confidence level. *significant at 95% confidence level. **significant at 99% confidence level. a Here we treat the decisions made before July as exogenous. So, if an ISP had adopted one technology by July, this ISP only needed to consider whether to adopt the other technology or not in October. Of course, for those ISPs that had adopted both technologies by July, they had no technology adoption choice to make in October. Our previous structure is still applicable to those ISPs that had adopted neither technology by July. It is possible that earlier decisions by ISPs were observed by later adopters. In order to reflect the influence of these potentially observed decisions in July, we incorporate them into the expectation formation process of all ISPs and update their profit functions and choice probabilities accordingly. For example, if type k ISP/ adopted technology A by July, its choice probabilities in October conditional on its type are: 128

\{\Pr(s_i = 0 \mid k),\ \Pr(s_i = A \mid k),\ \Pr(s_i = B \mid k),\ \Pr(s_i = AB \mid k)\} = \{0,\ \Pr(\pi_i^{AB} < \pi_i^{A} \mid k),\ 0,\ 1 - \Pr(\pi_i^{AB} < \pi_i^{A} \mid k)\}

where ?I{^ < ^ \k) = Pr(£ £[)r» |/fc] + / + r < 0) = *(-£ £[< \k] - T) •

^b The single market ISP model treats each local branch of a multi-market ISP as an independent decision-maker, which means that local branches of the same ISP make independent decisions and that these decisions can differ from each other. In the multi-market ISP model presented in the main paper, we impose the constraint that all branches of a multi-market ISP must make the same choice.

TABLE W3: BETTER-RESPOND RATHER THAN BEST-RESPOND

Column (1): 97% accuracy of what lower types should do
Column (2): 85% accuracy of what lower types should do

                                                    (1)          (2)

Correlates with strategic thinking parameter τ (γ)
  constant (γ0)                                     .6832**      .7344**
                                                    (.0476)      (.042)
  ln(# of markets served)                           -.014**      -.0338**
                                                    (.0053)      (.0035)
  ln(# ISPs in market)                              .0412*       .0495**
                                                    (.0166)      (.013)
  % population urban                                .2029**      .2542**
                                                    (.0673)      (.0441)
  % population college graduate                     .7775**      1.4826**
                                                    (.0818)      (.197)

Competitive incentives for adopting Rockwell's K56Flex (vA)
  # of ISPs on Rockwell                             -3.2773**    -3.5464**
                                                    (.175)       (.3267)
  # of ISPs on US Robotics                          1.1942**     1.0159**
                                                    (.079)       (.0865)
  # of ISPs on both technologies                    -2.8131**    -2.6232**
                                                    (.192)       (.2287)

Competitive incentives for adopting US Robotics' X2 (vB)
  # of ISPs on Rockwell                             .2797**      .5351**
                                                    (.0598)      (.0505)
  # of ISPs on US Robotics                          -1.1876**    -3.9845**
                                                    (.1787)      (.3165)
  # of ISPs on both technologies                    2.0056**     8.8276**
                                                    (.297)       (.7267)

Controls: Non-strategic factors that affect adopting Rockwell's K56Flex (β^A)
  constant                                          -2.136**     -4.9438**
                                                    (.0047)      (.2324)
  ln(# ISPs in market)                              .3817**      .7244**
                                                    (.002)       (.0742)
  ISP has digital connection                        2.8437**     3.2915**
                                                    (.1815)      (.3206)
  missing                                           -.8048**     -1.0363**
                                                    (.0092)      (.1031)
  ln(median household income)                       .2292**      .4797**
                                                    (.0021)      (.0246)
  # of business establishments per person           3.3227**     -15.0931**
                                                    (.6973)      (4.1787)
  % population college graduate                     -1.7764**    3.4961*
                                                    (.2589)      (1.5167)
  % population urban                                .322**       .6561*
                                                    (.0522)      (.2616)
  % county population in different county 5 yrs ago -.3869       -.6729
                                                    (.2872)      (.5935)
  # of backbone providers                           -.0414**     -.0728**
                                                    (.0003)      (.007)

Controls: Non-strategic factors that affect adopting US Robotics' X2 (β^B)
  constant                                          -8.8489**    -19.8065**
                                                    (.1782)      (2.0141)
  ln(# ISPs in market)                              .0249        -.0173+
                                                    (.0195)      (.0091)
  ISP has digital connection                        1.1319**     3.6686**
                                                    (.167)       (.2961)
  missing                                           .0215        .2385**
                                                    (.0301)      (.0212)
  ln(median household income)                       .8171**      1.8279**
                                                    (.0154)      (.19)
  # of business establishments per person           .3058        4.6567*
                                                    (.5631)      (2.3319)
  % population college graduate                     1.2688**     2.3056**
                                                    (.1553)      (.5884)
  % population urban                                -.5674**     -.2524+
                                                    (.0985)      (.1403)
  % county population in different county 5 yrs ago .3201**      -.8019**
                                                    (.1171)      (.1086)
  # of backbone providers                           .0343**      .0499**
                                                    (.0042)      (.0062)

ρ                                                   -.2177       -.2329
                                                    (.3388)      (.5547)

log likelihood                                      -2617.3      -2601.7

Standard errors in parentheses. +significant at 90% confidence level. *significant at 95% confidence level. **significant at 99% confidence level.

TABLE W4: ROBUSTNESS TO SYMMETRIC, SINGLE MARKET SPECIFICATIONS WITH UNOBSERVED HETEROGENEITY a a (D (2) (3)a (4)a 500 markets, 500 markets, 1000 markets, 1000 markets, with no with no heterogeneity heterogeneity heterogeneity heterogeneity .3273** .3259** .3760** .3757* constant (y0) (.0997) (.0932) (.0954) (.1778) Correlates ln(# of markets served) N/A N/A N/A N/A with strategic .1699** .1705** .1499** .1499** ln(# ISPs in market) thinking (.0288) (.0262) (.0295) (.046) parameter x .0704 .0692 -.0561 -.0561 % population urban (Y) (.0529) (.0513) (.0394) (.0417) % population college -.1191 -.1225 -.2411 -.2407 graduate (.3224) (.3059) (.5678) (.3442) Competitive -.8652** -.844** -.7219** -.7222** # of ISP's on Rockwell incentives for (.0543) (.0644) (.1433) (.1305) adopting -.0371** -.0378** -.3609+ -.361 + #ofISP'sonUS Robotics Rockwell's (.0068) (.0062) (.2082) (.1934) K56Flex #ofISP'sonboth .1448** .1389** .58** .5801** (vA) technologies (.0076) (.0154) (.2245) (.1771) .053 .1371 .9428** .9435** Competitive # of ISP's on Rockwell (.1942) (.2026) (.0357) (.0282) incentives for -.8353** adopting US # of ISP's on US Robotics -1.9824** -1.9904** -.8361** Robotics' X2 (.3237) (.2161) (.0312) (.0227) B # of ISP's on both 2.4442** 2.5732** -.1263** -.1262** (V ) technologies (.5052) (.3565) (.0228) (.014) .6289** .5269 -.1822** -.1815** constant (P0) (.0796) (.3863) (.0633) (.0302) .3204** -.0966** -.0968** ln(# ISPs in market) .3012** (.027) (.0413) (.0249) (.0228) -.3477** ISP has digital connection -.3681** .6612** .6612** (.0998) (.0888) (.0328) (.0222) -2.2373** -2.2417** -.2298** -.2297** missing (.1288) (.1342) (.0305) (.0242) Controls: ln(median household .0084 .0137 .0147 .0147* Non-strategic income) (.0109) (.036) (.0111) (.0072) factors that # of business establishments -6.052** -6.0143** -.6304 -.6402 affect per person (.8402) (1.7908) (.5024) (1.8129) adoption % population college .0102 -.012 -.0726 -.0719 (P) graduate (.0547) (.216) (.1601) (.1986) 
-.0431** .0058 % population urban -.0418 .0059 (.0073) (.0371) (.0257) (.0264) % county population in -.1354 -.1353 -.1422 -.1426 different county 5 yrs ago (.1212) (.093) (.2004) (.1871) -.0027** -.0026* -.0047** -.0047** # of backbone providers (.0008) (.0011) (.0006) (.0006) .0188 .0013 constant (a ) 0 (.0117) (.0286) log likelihood -6071.5 -6072.1 -25,825.0 -25,825.0 +significant at 90% confidence level. *significant at 95% confidence level. **significant at 99% confidence level. "Market-level unobserved heterogeneity is captured by the random intercept C: C~N(p0,a0). The markets were randomly selected. For 500 markets, after omitting markets with 1 or 2 ISPs, we get a sample of 290 markets. For 1000 markets, after omitting markets with 1 or 2 ISPs, we get a sample of 694 markets. These estimates use the single market ISP model in Appendix Table W2 column 5 because unobserved heterogeneity is not identified in the multi- market model. 132

TABLE W5: OPERATIONAL SOPHISTICATION, SURVIVAL, AND STRATEGIC THINKING

                                          Dependent variable
                                          τ^a          Survival^b   Survival^b   Survival^b

Have a networking maintenance business    .0798**      -.0678       -.0840       -.0505
                                          (.0219)      (.0980)      (.0985)      (.0905)
Have a web design business                .0377*       .0799        .0719
                                          (.0187)      (.0837)      (.0838)
τ                                                                   .205         .212+
                                                                    (.129)       (.128)
Constant                                  2.61**       .315**       -.220        -.219
                                          (.0105)      (.0467)      (.338)       (.338)

Log likelihood                            N/A          -799.4       -798.2       -798.5
R2                                        .022         N/A          N/A          N/A
# of observations                         1213         1213         1213         1213

Standard errors in parentheses. +significant at 90% confidence level. *significant at 95% confidence level. **significant at 99% confidence level.
^a OLS regression; ^b Probit regression.
Notes: Uses the 1213 ISPs for which we have data on other activities that proxy for operational sophistication.

TABLE W6: NUMBER OF ISPS ADOPTING FOR DIFFERENT SIMULATED LEVELS OF STRATEGIC THINKING

                                  Adopt neither       Adopt Rockwell      Adopt US Robotics'  Adopt both
                                                      Semiconductor's     Technology (B)
                                                      Technology (A)

Everyone is type-0                12.5                282.4               29.1                1909.0
τ = 1                             157.7               111.5               1258.5              705.3
τ = 2                             1238.4              127.2               599.8               267.6
τ = estimated from the data       1106.9              521.0               453.2               151.8
  (the average is 2.62, though
  it varies across firms)
τ = 3                             1242.6              511.6               371.0               107.9
τ = 4                             1544.7              397.4               242.1               48.8
τ = 5                             1723.6              296.7               187.4               25.3
Everyone is type-1                96.9                10.6                2122.0              3.5

Notes: Simulations based on Table 2, column 2, and 2,233 total ISPs in the data. This table was used to generate Figures 1 and 2.
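As a quick consistency check on Table W6, each simulated row should add up to the 2,233 ISPs mentioned in the table notes, up to rounding. The Python sketch below checks this and computes the share of ISPs adopting neither technology for each scenario. It assumes the column order (neither, Rockwell only, US Robotics only, both); the dictionary keys and function name are illustrative.

```python
# Simulated adoption counts from Table W6, assuming the column order
# (neither, Rockwell (A) only, US Robotics (B) only, both).
simulated = {
    "everyone type-0": (12.5, 282.4, 29.1, 1909.0),
    "tau = 1": (157.7, 111.5, 1258.5, 705.3),
    "tau = 2": (1238.4, 127.2, 599.8, 267.6),
    "tau estimated": (1106.9, 521.0, 453.2, 151.8),
    "tau = 3": (1242.6, 511.6, 371.0, 107.9),
    "tau = 4": (1544.7, 397.4, 242.1, 48.8),
    "tau = 5": (1723.6, 296.7, 187.4, 25.3),
    "everyone type-1": (96.9, 10.6, 2122.0, 3.5),
}
TOTAL_ISPS = 2233  # from the table notes


def share_adopting_neither(counts):
    """Fraction of simulated ISPs that adopt neither technology."""
    return counts[0] / sum(counts)


for label, counts in simulated.items():
    # Rounding in the published table leaves each row within 0.2 of 2,233.
    row_gap = abs(sum(counts) - TOTAL_ISPS)
    print(f"{label:16s} row total off by {row_gap:.1f}, "
          f"share adopting neither = {share_adopting_neither(counts):.3f}")
```

Under that column reading, the share adopting neither technology rises as τ increases from 1 to 5.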

Haruvy, Stahl, and Wilson (2001), Ho, Lim, and Camerer (2006), and others discuss a number of other behavioral economic models of player heterogeneity such as McKelvey and Palfrey's (1995) quantal response equilibrium.

23 Augereau, Greenstein, and Rysman (2006) find that ISPs want to differentiate from each other by adopting different technologies. Our result builds on and goes beyond that finding: when players become more strategic (a larger τ in the type distribution), the diffusion of a new technology slows. More detail is provided in the RESULTS section.
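To make the role of τ in the type distribution concrete, the sketch below computes the share of each thinking type for a few values of τ. It assumes the Poisson form used in Camerer, Ho, and Chong's cognitive hierarchy model, which is an assumption here because this note does not restate the distribution. The function name and the cut-off at type-5 are illustrative choices.

```python
import math


def poisson_type_shares(tau, max_type):
    """Poisson probabilities of thinking types 0..max_type (assumed form, not renormalized)."""
    return [math.exp(-tau) * tau**k / math.factorial(k) for k in range(max_type + 1)]


for tau in (1.0, 2.62, 5.0):
    shares = poisson_type_shares(tau, max_type=5)
    print(f"tau = {tau}: share of type-0 = {shares[0]:.3f}, "
          f"mass on types 0-5 = {sum(shares):.3f}")
```

Under this assumed form, a larger τ moves probability mass away from type-0 and toward higher types, which is the sense in which players "become more strategic".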

24 High strategic ability overall is likely correlated with observed strategic behavior in the decision to adopt 56K modems. Firms with high strategic ability are also likely to do better in other dimensions and are thus more likely to survive in the long run. More detail is provided in the RESULTS section.

25 Intuitively, firms are more likely to hire strategic managers in larger cities and in markets with more educated populations, because the workforce pool is larger and better qualified. With more competitors around, it is also easier for managers to notice competition, gain experience from interacting with managers at other firms, and accordingly become more strategic. Further detail is provided in the RESULTS section.

26 Brown, Camerer, and Lovallo (2007) undertake a similar exercise, comparing quantal response equilibrium, cursed equilibrium, and CH in the context of movie distributors' decisions to show movies to critics. Che, Sudhir, and Seetharaman (2007) and Lim and Ho (2007) also explore the consequences of behavioral assumptions for firms. Other related studies document biases exhibited by real-world managers (Hortacsu and Puller 2007; Chan, Hamilton, and Makler 2007) and develop semi-parametric models of rationalizability (Aradillas-Lopez and Tamer 2008).

27 One might wonder about interim profits before the competition develops and about profits lost by not adopting superior technologies. Our setting is a one-shot simultaneous-move entry game that took place over a relatively short period. We assume firms act quickly and maximize their profits within this short interval. We cannot say much about long-term profits or dynamic forward-looking decisions. We list this as a limitation in the LIMITATIONS section.

28 Similarly, an important difference between Perfect Bayesian Equilibrium (PBE) and CH is that in a PBE all players have unbiased beliefs about the true type distribution (or other distributions), while in a CH model the beliefs can be wrong.

29 This may seem a strong assumption, but it is common in the empirical entry game literature. Without it, it is very difficult to establish equilibrium conditions and estimate the model.

30 While it may seem unintuitive to include market-level characteristics, we believe it is an empirical question whether they matter (and we find they do). We include the firm-level covariate, number of markets served, because it made intuitive sense, even though it is not significant in many specifications.

31 In the main results, we treat all ISPs' technology adoption decisions as simultaneous regardless of whether they first occur in July or October 1997. As Augereau, Greenstein, and Rysman (2006) discuss, the descriptive statistics suggest this is reasonable. For example, in Table 1b, over four times as many ISPs had adopted Rockwell Semiconductor's technology in October as compared to July.

32 Haile, Hortacsu, and Kosenok (2008) suggest this type of validation strategy in their paper on the difficulties in estimating Quantal Response Equilibria using data from outside the laboratory.

33 For example, typing "www.abts.net" forwards the visitor to "www.earthlink.net." We interpret this as the ISP having been acquired, but we show that the results are robust to excluding these ISPs.