
The Power of Rankings: Quantifying the Effects of Rankings on Online Consumer Search and Choice∗

Raluca M. Ursu†

Current version: September 16, 2015 Please find an updated version at home.uchicago.edu/∼rursu/

JOB MARKET PAPER

Abstract

The Internet has led to an explosion of product choices facing consumers. When consumers face many options, intermediaries can help by ranking them, which in turn can influence how consumers search and what they ultimately purchase. To understand the role of intermediaries’ rankings, it is important to separate the effect of the position in which a firm is displayed in an intermediary’s listing from the characteristics of the firm. However, as intermediaries choose rankings to maximize their profits, rankings are endogenous, so separately identifying their role is challenging. In this paper, I identify the causal effect of rankings by using a data set on hotel searches from that includes searches from an experiment where rankings were randomly generated. First, using detailed clickstream and purchase data, I show that (1) top positions lead to more clicks and purchases, but conditional on a click, higher ranked hotels do not generate more purchases and (2) rankings mainly affect choices by reducing search costs, rather than expectations or direct utility. I then turn to quantifying the effect of rankings on consumer choices. To this end, I estimate a sequential search model and find position effects ranging from $1.85 to $3.73, lower than those typically found without experimental variation. Finally, I construct three counterfactuals to show how companies can use these results to design more effective rankings. First, I find that using the model’s estimates to construct a utility-based ranking leads to a sizable increase in consumer valuation of as much as $38.36 (21% of the purchase price). Second, I simulate consumer choices when search costs increase (as on mobile devices) and show that this would cost consumers as much as $16.23 (9% of the purchase price) in poorer matches and higher prices, highlighting the tension between consumer search costs and the impact of the ranking. Third, I evaluate the merits of a recently adopted approach at of only ranking independent hotels. Contrary to existing concerns about reducing the diversity of the listed hotels, I find that such a ranking benefits consumers by as much as $9.20 (5% of the purchase price), suggesting new avenues for improving the performance of a ranking.

Keywords: online consumer search, hotel industry, search intermediaries, popularity rankings, endogeneity bias.

JEL Classifications: L81, D83.

∗I wish to thank Ali Hortaçsu, Pradeep Chintagunta, Hugo Sonnenschein, Richard Van Weelden, Elisabeth Honka, Sergei Koulayev, Stephan Seiler, Matthew Backus, Chris Nosko, Anita Rao, and Bradley Shapiro. I thank the participants at the Searle Conference on Internet Search and Innovation (June 2015, Chicago), the Workshop on Search and Switching Costs (May 2015, Groningen), the IIOC conference (April 2015, Boston) and the University of Chicago IO and Marketing working groups. I thank Kaggle and the Wharton Consumer Analytics Initiative for providing me with the data. The usual disclaimer applies.

†University of Chicago, E-mail: [email protected].

1 Introduction

The advent of the Internet has led to an explosion of product choices facing consumers. Companies with large product assortments face a fundamental question of how to present these products to consumers. Rankings provide an efficient solution to presenting products, because they decrease consumer search costs and increase the probability of a match between the consumer and the product. In fact, rankings have become so profitable that online search intermediaries, such as or Expedia, derive a large fraction of their revenues from ranking third-party sellers by relevance for consumers. Even though rankings provide an answer for presenting products when companies have large product assortments, they also lead to a host of new questions, two of which I address in this paper. How do rankings shape consumers’ product discovery process? What is the economic value of rankings?

In this paper, I seek to understand how rankings affect consumer choices by focusing on the travel industry, whose core business revolves around rankings.1 Total spending in the U.S. travel industry exceeded $458 billion in 2014 alone, of which 43% was sold online, while the rest mostly represents business trips handled by corporate travel agents. Almost 80% of bookings made online are made through online travel agents (OTAs), which had combined bookings of $157 billion in the U.S. and $278 billion worldwide in 2013. To compete for consumers, OTAs aggregate and rank third-party sellers’ products for consumers, such as flights or hotels. How do these rankings affect consumers’ search and choice? How can companies improve their rankings?

The main challenge in answering these questions is that the rankings that consumers observe are not exogenous, but are rather chosen by OTAs to maximize their profits. As a result, separately identifying the effect of ranking on choices from all other characteristics of the ranked firm is difficult.
This problem has been previously identified in the literature, and existing approaches control for the endogeneity of position using a number of methods, including a control function approach (De los Santos and Koulayev, 2014), a regression discontinuity design (Narayanan and Kalyanam, 2014), a simultaneous equation model (Ghose et al., 2013) or latent instrumental variables (Rutz, Bucklin and Sonnier, 2012). However, without experimental variation in the position of a ranked company, measuring the causal effect of rankings on consumer choices is challenging.

My paper employs a unique data set on hotel searches from Expedia to recover the causal effect of rankings on choices. The unique feature of this data set is that it includes searches performed on randomly ranked hotels in addition to searches from Expedia’s ranking. In constructing the Random ranking, a hotel’s popularity or match with the consumer did not affect its probability of being displayed in any position, but rather the position of the hotel was randomly determined. Using this feature of the data set, I identify two interesting patterns of the causal effect of rankings on consumer choices.

First, I show that higher ranked hotels are clicked and purchased more often under both rankings. The fact that top positions receive more clicks and purchases has been documented for other search intermediaries (e.g. Google) and is thus not surprising for Expedia’s curated ranking. However, the fact that the same pattern holds for randomly ranked hotels reveals the importance of a hotel’s position in influencing consumer choices.

Second, I find that conditional on a click, higher ranked hotels receive more purchases only under Expedia’s ranking, while under the Random ranking the fraction of purchases across positions is constant. This shows that conditional on a click, consumers do not derive any additional utility from purchasing from a higher ranked hotel. In other words, consumers’ realized utility does not depend on the position of the hotel. The fact that under Expedia’s ranking top ranked hotels lead to more purchases reveals that Expedia provides consumers with a real benefit: it identifies which hotels consumers wish to purchase from (not necessarily those they wish to click on) and ranks those at the top, thereby helping consumers search more efficiently.

The observation that consumers’ realized utility is not affected by rankings raises the question of what mechanism is responsible for the effect of rankings on choices. Having ruled out the effect of position on utility and of rankings on purchases, there are only two other mechanisms through which the position of an alternative in a ranking may affect its probability of a click: rankings affect consumers’ expected utility (Varian, 2007; Athey and Ellison, 2011) or consumers’ search costs (Ghose et al., 2012b; Chen and Yao, 2014). I use the availability of opaque offers, i.e. deals where consumers get a large discount by booking a hotel they only learn about after they make a transaction, to shed light on this question. Since an opaque offer demotes a hotel to a lower position without providing additional information about the hotel, it creates variability in the position of a hotel that is unrelated to consumers’ expectations. My test exploits this variability and reveals evidence consistent with rankings affecting consumer search costs, providing new insights into the role of rankings in consumer choices.

Finally, I quantify the effect of rankings on choices using a structural model and I show how companies can use these results to improve their rankings. Using a sequential search model à la Weitzman (1979), I find position effects ranging from $1.85 to $3.73 for different cities in my data, which are typically lower than previous results without experimental variation. I use the preference and search cost parameters estimated to construct three counterfactual experiments of interest.

First, I quantify the value of improving Expedia’s current ranking by ordering hotels based on their search model estimated expected utility. I find a sizable increase in consumer valuation of as much as $38.36 (21% of the purchase price), a third of which comes from lower search costs. I also find that at least half of consumers buy the same hotel under the Utility-based ranking as they did under Expedia, highlighting once more the ability of Expedia’s current ranking to identify hotels consumers wish to purchase from.

Second, I predict consumer choices when their search costs increase, as is the case on mobile devices, where the smaller size of the screen focuses consumers’ attention on higher ranked alternatives. I find that displaying the same ranking on a mobile device costs consumers as much as $16.23 (9% of the transaction price) in poorer matches and higher prices. This result emphasizes the tension between consumer search costs and the impact of the ranking. If search were frictionless, then consumers would make the same choices regardless of the ranking they observe. However, as search costs increase, the need for an improved ranking increases as well, since consumers naturally rely more heavily on top ranked hotels.

Third, I analyze the consequences of decreasing the diversity of hotels ranked by dedicating a ranking to one type of hotel. This counterfactual is motivated by Amazon’s recent entry into the online travel agent market focusing on only ranking independent hotels. One concern with Amazon’s approach is that the decrease in diversity of the hotels ranked will hurt consumers. Surprisingly, I find that even though match values decrease, this loss is compensated by lower search costs, leading to an overall gain of as much as $9.20 (5% of the transaction price) for consumers. In contrast, the alternative of only ranking chains is detrimental to consumers, leading to a loss of at least $21.68. These results suggest that pre-filtering hotels may be an effective way to improve matches between consumers and hotels, when the filter chosen is diverse enough to have a small impact on matches, but substantially decreases consumer search costs. As such, this insight opens up new avenues to improving matches without refining the ranking algorithm, but instead focusing on which parts of the ranking to recommend to consumers.

My research contributes to the study of the causal effect of rankings and provides new insights into the construction of ranking algorithms. From a managerial standpoint, providing consumers with relevant rankings of alternatives decreases search costs, increases consumer welfare and likely increases long-term profits for the OTA from increased loyalty. A first step to constructing such rankings is understanding their effect on consumer search and choices, which I am able to do by leveraging a unique data set that allows recovering the true effect of rankings.

The rest of the paper is organized as follows. In the next two sections, I review related work and describe the institutional details of the online travel agent market. In Section 4, I present the data, and in Section 5, I describe reduced form evidence of the effect of position on consumer choices. In Section 6, I introduce the model followed by a discussion of the estimation approach and identification. In Section 8, I present my results, while in Section 9, I consider three counterfactuals. The last section concludes and provides a discussion of limitations and future research.

1All figures reported come from three sources: 1. The Economist article: http://www.economist.com/news/business/21604598-market-booking-travel-online-rapidly-consolidating-sun-sea-and-surfing; 2. Forbes article: http://www.forbes.com/sites/greatspeculations/2014/04/08/competitive-landscape-of-the-u-s-online-travel-market-is-transforming/; 3. Wall Street Journal article: http://www.wsj.com/articles/amazons-new-travel-service-enters-lucrative-online-travel-market-1429623993

2 Related Literature

This paper relates to the literature on consumer search. Since this literature is extensive, I focus in this section only on the most closely related strand examining the effect of rankings on consumer choices. Papers such as Chen and Yao (2014), De los Santos and Koulayev (2014), Koulayev (2014), and Ghose et al. (2012a, 2012b, 2013), consider the effect of rankings on choices in the online hotel industry and find position effects ranging from 25 cents in Chen and Yao (2014) to $35 in De los Santos and Koulayev (2014). I generally find lower position effects using the Random ranking which eliminates the endogeneity of the ranking. Two papers, De los Santos and Koulayev (2014) and Ghose et al. (2013), address the endogeneity problem of the ranking using a control function approach and a simultaneous equation model, respectively. In contrast, my paper provides a setting where some searches come from a Random ranking, thus removing the endogeneity present in rankings.

In terms of improved ranking methods, in one counterfactual I rearrange hotels based on estimated utility. Ghose et al. (2012a) was one of the earliest papers to use a utility-based ranking, a method which is closely related to work in online recommender systems (see Ansari et al. 2000, Ansari and Mela 2003), and to show through lab experiments that it is superior to several baseline rankings. Furthermore, Ghose et al. (2013) show that a utility-based ranking outperforms other rankings in terms of revenues, while De los Santos and Koulayev (2014) show that it can increase click through rates almost twofold. In contrast, I focus on quantifying consumer welfare gain from a utility-based ranking over the current ranking, as well as measuring the impact of this ranking on hotels and Expedia. In the second counterfactual, I measure the impact of transferring the current ranking to a mobile device. Ghose, Goldfarb and Han (2012) show that compared to desktop computers, on mobile devices ranking effects are 50% higher, a result I use in computing my counterfactual and quantifying consumer value of a ranking on a mobile device.

Understanding how rankings affect consumer search is also a central question in the online sponsored search ads literature (Ghose and Yang, 2009; Yang and Ghose, 2010; Jerath et al. 2011; Yao and Mela, 2011; Athey and Ellison, 2011; Jeziorski and Segal, 2012; Blake et al. 2014; Baye et al., 2014; Jeziorski and Moorthy, 2014; Chan and Park, 2014; Narayanan and Kalyanam, 2014; Athey and Imbens, 2015) as well as in the theoretical search literature (Hagiu and Jullien, 2011; Berman and Katona, 2013; De Corniere and Taylor, 2014). Three papers are most relevant for my work. Baye et al. (2014) study search results at Google and Bing to measure the importance of name prominence and position on consumers’ clicks. In separately identifying the two effects, they are worried about the endogeneity bias of position, which they solve by instrumenting for position and ads on Google with position and ads on Bing.
They also find that failing to account for the endogeneity in position inflates the position effect and minimizes the effect of name prominence. Second, Narayanan and Kalyanam (2014) show how to use advertisers’ quality scores in a regression discontinuity design framework to address the endogeneity of ad position. They demonstrate a negative bias in the position effect when not accounting for endogeneity, which could result from advertisers bidding lower when they have a promotion that attracts more organic clicks. Since in the online travel agent market hotels cannot directly affect the position they will be displayed in, only a positive bias could result from endogeneity, which I document in this paper. Most recently, Athey and Imbens (2015) study machine learning methods to estimate heterogeneity in causal effects, which they apply to data from an experiment demoting the best matched search result to the third position. The authors find a differential effect of rankings on clicks: click through rates on the third ranked option decrease the most in search queries that do not result in images or videos and that are informational in nature. In contrast, I employ a data set where rankings were fully randomized and I also investigate the effect of rankings on purchases.

3 The Online Travel Agent Market

In this section, I describe the institutional details of the online travel agent (OTA) market that are relevant for my paper. As mentioned in the introduction, almost 80% of online bookings nowadays are handled by OTAs. In the U.S., four firms, Expedia, Booking, and , account for 95% of bookings. OTAs are used by consumers to reserve flights or hotel rooms and rent cars.

OTAs’ revenue is derived under both the agency and the merchant model. Under the agency model, the OTAs receive a commission from third-party sellers for a purchase. These commissions are estimated to range in the hotel industry from 10% to 25%. Under the merchant model, the OTA negotiates with third-party sellers to purchase a block of hotel rooms at a wholesale price, which it then marks up and sells to consumers. In the case of Expedia, in the first quarter of 2013 (the relevant period for my analysis), 70% of its global revenues came from the sale of hotel rooms and most of its bookings were done under the agency model (54%).2

To maximize profits, the OTA ranks third-party sellers’ products. A more relevant ranking decreases the consumer’s search cost and increases her probability of a valuable match with a third-party seller. Rankings matter for the OTA via two channels: increasing the probability of a purchase and repeated business from consumers. OTAs invest in constructing relevant rankings for consumers on the basis of machine learning techniques. In this section, I provide a general overview of the ranking algorithm used by OTAs, while in Appendix B 12.2, I describe the technical details behind “learning to rank” algorithms. A hotel’s position in the ranking is a function of its past conversion rate and click through rate, its characteristics (price, quality) and its match with the consumer search query entries (e.g. availability). The learning to rank algorithm learns a function of these components that best predicts the probability of a purchase at a certain hotel.
In effect, the mechanism assigns each hotel a score and higher scoring hotels will be listed closer to the top of the ranking. Hotels cannot directly affect the probability of being ranked in a certain position; however, indirectly, by increasing quality or decreasing prices, they may improve their position in the ranking.

In addition, OTAs reserve specific positions in the ranking for sponsored ads. In the case of Expedia, the top position and the last two positions in the ranking are reserved for sponsored ads for hotels. In this case, the hotel may sign up with TravelAds and target consumers searching for hotels in a particular destination and for particular travel dates.3 Such a hotel is then entered into a pay-per-click auction for the sponsored ad slots that is adjudicated by evaluating the bids submitted by all hotels and the quality score of the hotel. The winning hotel is assigned the top position in the ranking, while the second and third placed hotels are shown at the bottom of the ranking. The winning hotels are displayed in these positions on all result pages.

The institutional details of the OTA market motivate and affect my analysis. In particular, the learning to rank algorithm used makes the position of the seller endogenous. As a result, using observational data on consumers searching for hotels given a particular ranking will not suffice in determining the causal effect of the ranking on consumer choices. Thus, I rely on estimates using the Random ranking to recover the causal effect of the ranking on choices. Without properly separating the effect of the position on choices from other firm characteristics, improving the current ranking as well as measuring the welfare effect of rankings would not be possible.

2All figures reported come from three sources: 1. Forbes article: http://www.forbes.com/sites/greatspeculations/2014/04/08/competitive-landscape-of-the-u-s-online-travel-market-is-transforming/; 2. Expedia’s Earnings Release for the first quarter of 2013; 3. Wall Street Journal article: http://www.wsj.com/articles/amazons-new-travel-service-enters-lucrative-online-travel-market-1429623993.

3Details on the auction used to determine sponsored ads can be found at http://searchsolutions.expedia.com/how-it-works/.
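The ordering logic described in this section can be illustrated with a minimal Python sketch. It is not Expedia’s algorithm: the `Hotel` class, the `score` field and the `build_ranking` function are hypothetical names, the scores are made up, and the sketch assumes that sponsored hotels are handled separately from the organic list. It only shows organic hotels sorted by score, the auction winner placed at the top, and the second and third placed bidders placed at the bottom.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Hotel:
    hotel_id: str
    score: float  # hypothetical predicted purchase probability


def build_ranking(organic: List[Hotel], sponsored_winners: List[Hotel]) -> List[str]:
    """Return hotel IDs ordered from the top position to the bottom.

    organic: hotels scored by a (hypothetical) learning to rank model.
    sponsored_winners: auction winners, best first; at most three are used.
    """
    # Higher scoring hotels are listed closer to the top.
    ranked = sorted(organic, key=lambda h: h.score, reverse=True)
    middle = [h.hotel_id for h in ranked]
    # The auction winner takes the top slot.
    top = [h.hotel_id for h in sponsored_winners[:1]]
    # The second and third placed bidders are shown at the bottom.
    bottom = [h.hotel_id for h in sponsored_winners[1:3]]
    return top + middle + bottom


organic_hotels = [Hotel("A", 0.20), Hotel("B", 0.50), Hotel("C", 0.10)]
auction_winners = [Hotel("S1", 0.0), Hotel("S2", 0.0), Hotel("S3", 0.0)]
ranking = build_ranking(organic_hotels, auction_winners)
print(ranking)
```

Because a hotel’s score depends on its characteristics and past performance, its position in such a ranking is correlated with its attractiveness, which is the endogeneity problem discussed above.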

4 Data

The Expedia data set that I use comes from a competition organized at the International Conference on Data Mining (ICDM) in December 2013 entitled “Learning to rank hotels to maximize purchases”. This contest started in September 2013 and ended in November 2013 and was hosted by Kaggle.com.4 The data is provided at the level of a search impression. A search impression is an ordered list of hotels (a ranking) and their characteristics (such as the number of stars, consumer reviews, and prices) seen by consumers in response to a search query describing the location and dates of their trip. The most important feature of this data set is the fact that only two thirds of the data set comes from search impressions under Expedia’s proprietary ranking, while the rest of the data comes from search impressions where the ranking was randomly generated. A random ranking is a ranking where the position of the hotel does not depend on its characteristics or its past purchases, but rather is generated randomly.5 Search impressions with a Random ranking are costly for Expedia since they generally lead to fewer purchases. Nevertheless, they are used by Expedia to train their ranking algorithm without the position bias of the existing algorithm. I will use this feature of the data to investigate the causal effect of rankings on consumer search and purchase decisions.
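To make the notion of a random ranking concrete, the following Python sketch draws an ordering uniformly at random, so that a hotel’s position is unrelated to its characteristics. How Expedia actually generated its Random ranking is not described in the data documentation; the uniform shuffle, the function names and the hotel labels below are assumptions made for illustration.

```python
import random
from typing import List


def random_ranking(hotel_ids: List[str], rng: random.Random) -> List[str]:
    """Return a uniformly random ordering of the hotels (position 1 first)."""
    order = list(hotel_ids)
    rng.shuffle(order)
    return order


def mean_position(hotel_id: str, hotel_ids: List[str],
                  n_impressions: int, seed: int = 0) -> float:
    """Average position of one hotel over many simulated random rankings."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_impressions):
        # Positions are 1-indexed, as in a results page.
        total += random_ranking(hotel_ids, rng).index(hotel_id) + 1
    return total / n_impressions


hotels = ["h0", "h1", "h2", "h3", "h4"]
avg_pos = mean_position("h0", hotels, n_impressions=20_000)
print(f"average position of h0: {avg_pos:.2f}")
```

With five hotels, every hotel should appear on average near the middle position regardless of its price or quality, which is the property that removes the position bias.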

4.1 Description of the Data

The data set I use contains 7,986,074 observations on hotels from search impressions between November 1, 2012 and June 30, 2013.6 The data comes from searches of 132,412 hotels located in 171 countries and 21,190 different destinations. In Figure 1, I summarize graphically the variables of interest present in the data. At the search query level, I have information on the date and time of the search, the destination ID (city, county or neighborhood), the length of stay (in days), the booking window (the number of days between the search and the first day of the trip), the number of adults and children traveling, the number of rooms searched, and an indicator for whether the trip includes a Saturday night. At the search impression level, I observe the first page of results that was displayed to consumers.7 This contains the hotel ID and its characteristics (for example, the price and the number of stars) and position in the ranking.8 I observe consumer choices in the form of their clicks and purchases at a particular hotel. Moreover, less than 5% of observations also include information on the average star rating and average price of hotels previously purchased by a consumer, as well as the country in which the consumer lives. However, this is not enough information to link consumers who are making repeated searches over time, and as a result I will interpret each search impression as being made by a different individual.9

Finally, at Expedia, after typing a search query, the consumer observes its optimized ranking. She can then sort or filter results by price, number of stars or other characteristics. My data only contains information about searches where consumers did not sort or filter, but rather only made choices from Expedia’s default ranking.

4See Appendix B 12.2 for details about learning to rank algorithms and about the winning algorithm in this competition, LambdaMART.

5See Appendix B 12.1 for a formal test of the two types of randomness present in my data: (i) consumers were randomly assigned the two rankings; and (ii) the position of the hotel was randomly generated under the Random ranking. The test for (ii) also provides insights into what characteristics of the hotels influence Expedia’s ranking.

6Appendix A contains details about data cleaning.

Figure 1: Information on the Data Observed

[Figure panels: Search Query; Search Impression: Information Before the Click; Hotel’s Page: Information After the Click]
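The levels of information described above (search query, search impression, and hotel listing with its click and purchase outcomes) can be sketched as a small Python data structure. The class and field names below are hypothetical and do not correspond to the column names of the Kaggle files; they only mirror the variables listed in the text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HotelListing:
    hotel_id: int
    position: int            # 1 = top of the first results page
    price: float
    stars: int
    clicked: bool = False
    purchased: bool = False


@dataclass
class SearchImpression:
    destination_id: int
    length_of_stay: int      # days
    booking_window: int      # days between the search and the first day of the trip
    adults: int
    children: int
    rooms: int
    saturday_night: bool
    random_ranking: bool     # True if the ranking was randomly generated
    listings: List[HotelListing] = field(default_factory=list)

    def total_clicks(self) -> int:
        """Number of hotels clicked in this impression."""
        return sum(listing.clicked for listing in self.listings)

    def ended_in_transaction(self) -> bool:
        """True if any displayed hotel was purchased."""
        return any(listing.purchased for listing in self.listings)


example = SearchImpression(
    destination_id=1, length_of_stay=2, booking_window=16,
    adults=2, children=0, rooms=1, saturday_night=True, random_ranking=False,
    listings=[
        HotelListing(10, 1, 120.0, 3, clicked=True, purchased=True),
        HotelListing(11, 2, 95.0, 2),
        HotelListing(12, 3, 150.0, 4, clicked=True),
    ],
)
print(example.total_clicks(), example.ended_in_transaction())
```

Aggregating `total_clicks` and `ended_in_transaction` over impressions is the kind of computation that underlies impression-level summary statistics.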

Table 1 provides summary statistics at the level of a search impression. The data set contains 317,218 search impressions with an average number of 25 hotels shown in a search impression. Consumers on average search more than a month in advance of their trip for trips lasting approximately two days. Approximately half of impressions were for trips that included a Saturday night stay. The average search was for a trip for one hotel room and two adults traveling with no children. Search impressions contain a large fraction of hotels that are part of a chain (64%) or that are on promotion (20%). One third of search impressions and 2,516,587 observations come from consumers who were shown the Random ranking. There are a total of 352,523 clicks, with 118,149 clicks under the Random ranking. There is approximately one click per search impression, with 6% of search impressions including two or more clicks. Finally, two thirds of all search impressions end in a transaction for a total of 201,442 transactions. Only approximately 14,900 search impressions have historical information about the consumer. I find that consumers on average purchased in the past from hotels with 3.3 stars at a price of $170 per night.

7In a companion data set from Wharton Customer Analytics Initiative (WCAI) on consumers searching for hotels on a similar online travel agent in Manhattan, I find that in 67% of search impressions consumers only consider the first page of results.

8Hotel ID’s are anonymized. As a result, the same brand located in two different parts of a city is given different hotel ID’s. For example, “Hotel A City Center” and “Hotel A Airport” appear as different hotels in my data.

9In the same companion data set from WCAI, I find that a significant fraction of consumers (more than 40%) only search once.

Table 1: Summary statistics: Search impressions

                                Mean  Median  Std. Dev.  Min  Max
Number of Hotels Displayed     25.18   30.00       9.01    5   38
Trip Length (days)              2.36    2.00       2.08    1   59
Booking Window (days)          37.19   16.00      52.38    0  498
Saturday Night (percent)        0.51    1.00       0.50    0    1
Adults                          1.99    2.00       0.87    1    9
Children                        0.36    0.00       0.76    0    9
Rooms                           1.11    1.00       0.43    1    8
Chain (percent)                 0.64    0.71       0.29    0    1
Promotion (percent)             0.20    0.15       0.19    0    1
Random Ranking (percent)        0.33    0.00       0.47    0    1
Total Clicks                    1.11    1.00       0.57    1   30
Two or More Clicks (percent)    0.06    0.00       0.25    0    1
Total Transactions              0.64    1.00       0.48    0    1
Observations                 317,218

The data is anonymized, so determining the exact country or city to which a consumer wishes to travel is not possible. However, there exists suggestive evidence that the largest country (labeled 219) is the U.S. The largest country has 5,236,418 observations and 203,858 search impressions. Out of those, 84% of searches are made by consumers also located in this country, suggesting that the country has a large territory with a large fraction of domestic travel. This is also consistent with information from Alexa which shows that in May 2015, 73% of Expedia’s traffic came from visitors located in the U.S., while the second largest country in terms of traffic was South Korea with less than 2% of traffic.10 The prices charged are also consistent with the largest country being the U.S. According to the American Hotel and Lodging Association, the average price of a hotel room in the U.S. in 2013 was $110.35.11 In my data set, which contains only a subset of all properties in the U.S., the median price in 2013 was $118. Table 2 shows how the characteristics of the hotels displayed vary by the type of ranking observed. I divide results by the type of the search impressions, Expedia or Random, as well as by whether the search impression ended in a transaction. What is immediately clear is that Expedia’s ranking displays more expensive hotels of higher quality, as measured by the number of stars and the reviews of the hotels. Also, Expedia’s ranking displays a larger proportion of chains and hotels with more promotions than the Random ranking. Finally, search impressions that lead to a transaction, regardless of the ranking type, have cheaper hotels displayed. The last two columns in this table perform a t-test confirming that these differences are significant. Tables 11 and 12 in Appendix B 12.3 show that on average, clicked and purchased hotels are cheaper and of higher quality than those displayed. 
Also, Appendix B 12.4.3 shows how the characteristics of the hotels displayed (price, number of stars and reviews) vary by position and ranking type.

10See http://www.alexa.com/siteinfo/expedia.com.

11See http://www.ahla.com/content.aspx?id=36332

Table 2: Hotel characteristics displayed by search impression type

                     No Tran.          No Tran.          Tran.             Tran.             Diff.       Diff.
                     Random            Expedia           Random            Expedia           No Tran.    Tran.
                     Mean     SD       Mean     SD       Mean     SD       Mean     SD
Price                153.62   106.14   167.63   109.54   131.68   85.24    136.65   85.71    -14.01∗∗∗   -4.97∗∗∗
Stars
  Less than 3        0.20     0.40     0.14     0.35     0.26     0.44     0.21     0.41     0.06∗∗∗     0.05∗∗∗
  3                  0.42     0.49     0.37     0.48     0.45     0.50     0.43     0.49     0.05∗∗∗     0.02∗∗∗
  4                  0.30     0.46     0.38     0.49     0.24     0.43     0.29     0.46     -0.08∗∗∗    -0.05∗∗∗
  5                  0.08     0.27     0.11     0.31     0.05     0.21     0.06     0.24     -0.03∗∗∗    -0.01∗∗∗
Review Score
  Less than 2.5      0.09     0.28     0.05     0.21     0.07     0.25     0.05     0.22     0.04∗∗∗     0.02∗∗∗
  Between 2.5 and 3  0.12     0.32     0.09     0.28     0.13     0.34     0.11     0.31     0.03∗∗∗     0.02∗∗∗
  Between 3.5 and 4  0.46     0.50     0.48     0.50     0.46     0.50     0.48     0.50     -0.02∗∗∗    -0.02∗∗∗
  Between 4.5 and 5  0.34     0.47     0.39     0.49     0.34     0.47     0.36     0.48     -0.05∗∗∗    -0.02∗∗∗
Chain                0.59     0.49     0.64     0.48     0.69     0.46     0.68     0.47     -0.05∗∗∗    0.01∗∗∗
Location Score       2.83     1.55     3.27     1.52     2.49     1.40     2.76     1.47     -0.44∗∗∗    -0.26∗∗∗
Promotion            0.19     0.39     0.30     0.46     0.15     0.36     0.22     0.41     -0.11∗∗∗    -0.07∗∗∗

Diff. is the Random mean minus the Expedia mean. Significance of differences obtained by means of a t-test. ∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001
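As an illustration of how a difference in means and its t-statistic follow from the reported means and standard deviations, the sketch below computes both from summary statistics. Two assumptions are made here and are not taken from the paper: the unequal-variance (Welch) form of the t-statistic is used, and the group sizes are hypothetical because Table 2 does not report them.

```python
import math
from typing import Tuple


def mean_difference_t(mean_a: float, sd_a: float, n_a: int,
                      mean_b: float, sd_b: float, n_b: int) -> Tuple[float, float]:
    """Return (difference in means, Welch t-statistic) from summary statistics."""
    diff = mean_a - mean_b
    # Standard error of the difference under unequal variances.
    std_err = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return diff, diff / std_err


# Price row of Table 2, impressions without a transaction: Random vs. Expedia.
# The group sizes (10,000 each) are hypothetical placeholders.
diff, t_stat = mean_difference_t(153.62, 106.14, 10_000,
                                 167.63, 109.54, 10_000)
print(f"difference = {diff:.2f}, t = {t_stat:.2f}")
```

The difference reproduces the entry in the first Diff. column for Price; the size of the t-statistic depends on the assumed group sizes and is shown only to illustrate the calculation.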

4.2 Pros and cons of using the Kaggle data set

The data set is well suited to study the causal effect of rankings on choices since under the experimental condition consumers saw hotels that were ranked randomly. However, the data made available has the following features: (i) all search impressions have at least one click, and (ii) the fraction of searches leading to a transaction was oversampled. Since the data was made available for a machine learning competition, including consumer choices (clicks and purchases) in the data is necessary to allow the ranking algorithm to learn consumers’ preferences. The data made available was randomly sampled from searches ending and not ending in a transaction, therefore the data set can be used to consistently estimate preferences and search costs. Preference parameters are identified from consumer purchase decisions and since I observe both consumers who purchase and those who do not purchase under both rankings, they can be consistently estimated. Search costs are not identified from purchase decisions (similar to discrete choice models) and are thus not affected by the sampling done. Even though I can estimate preferences and search costs consistently, the selection of the data has three implications for my analysis. First, it means that I cannot compare conversion rates across the two ranking types or for the same hotel across two rankings, because this conversion rate is not representative of the performance of the two rankings. Second, within the Random ranking, if all searches contain at least one click, my results may not generalize to all consumers who saw the Random ranking. Third, if searches contain at least one click, the two groups of consumers who were selected into the sample from the two rankings may not be comparable. One possibility is that those consumers who see the inferior Random ranking and click have smaller search costs than those who click under the Expedia ranking, which will make the two groups not comparable. 
At the same time, these consumers may click under the Random ranking because they find the product they desire, in which case the two groups would be comparable. In fact, there is evidence in the data that the two groups may be comparable. I observe in Appendix B 12.1 that consumers who appear in my data were randomly assigned to the two rankings and that they are comparable in terms of observable characteristics, although it may still be the case that they differ on unobservables, which I cannot test. In addition, Expedia uses the Random ranking to train their ranking algorithm without the position bias, further supporting the idea that the two groups are comparable. In light of this evidence, to remain conservative, I choose to present my results for the two rankings separately, cautioning against quantifying differences between the two rankings, but stressing qualitative differences. More precisely, I will present both reduced form and estimation results for the two rankings separately. However, when performing counterfactuals, I use the aforementioned evidence that the two groups of consumers are comparable and assume that consumers under the two rankings have the same model parameters, in order to compare Expedia’s ranking to alternative rankings. Although this data set has its limitations, I conclude that the causal effect of rankings on choices and the source of endogeneity are important questions that cannot be properly addressed without this data set, which contains experimental variation in the positions of hotels.

5 Reduced Form Evidence

In this section, I present reduced form results for the causal effect of rankings on consumer choices, as well as study the mechanism through which rankings affect choices.

5.1 The Effect of Ranking on Search and Choice

I start by considering the effect of rankings on consumer search. Consumers search by clicking on a hotel on the first page of results. In Figure 2, I illustrate the click through rate (CTR) of a position. The click through rate of a position measures the fraction of times a position received a click out of all the times it was displayed. I restrict attention to search impressions that do not include a hotel in positions 5, 11, 17, 23. These positions are typically reserved for opaque promotions (deals where consumers get a large discount by booking a hotel they only learn about after they make a transaction). Only 13% of search impressions include a hotel in those positions, so that most search impressions include opaque offers. What is immediately obvious is that higher ranked hotels lead to more clicks even though the ranking was randomized.12 This pattern is expected of a curated ranking that optimizes which hotels to display and it has been documented for other search intermediaries, such as Google.13 However, the fact that a similar pattern holds for the click through rate of the Random ranking is surprising. Hotels ranked at the top under the Random ranking are not more likely to be of higher quality than those ranked lower, suggesting that the position of the hotel plays a large role in determining the consumer’s click.14 The fact that the click through rate of a position is decreasing under the Random ranking also alleviates a possible concern about the data selection. If search impressions contain at least one click and converting searches were oversampled, one possible concern is that consumers made their search for the best hotel on previous visits, and thus the search observed in this data is one where the consumer has already identified her ideal hotel. However, this story is contradicted by the fact that consumers were randomly assigned to the two rankings and the fact that they click more often on higher ranked hotels under the Random ranking, which are unlikely to display their previously identified hotel.

12 The fact that the click through rate curves cross derives from the fact that most searches have one click.
13 See http://marketingland.com/new-click-rate-study-google-organic-results-102149.
14 See Appendix B 12.4.1 for various robustness checks. The same pattern as in Figure 2 holds.

Figure 2: Click through rate (CTR) by position

[Plot: click through rate (vertical axis, 0 to 0.25) by position (horizontal axis, 0 to 40); Random ranking with 95% CI.]

Figure 3: Conversion rate (CR) conditional on a click by position

[Plot: conversion rate (vertical axis, 0.8 to 1) by position (horizontal axis, 0 to 40); Random ranking with 95% CI.]

Note: Restrict attention to search impressions ending in a transaction.

After the consumer clicks on a hotel, she has the option of purchasing it. I study the effect of rankings on purchases by looking at the conversion rate (CR) of a position. The conversion rate measures the percent of clicks that end in a purchase. In Figure 3, I plot the conversion rate of a position for searches that end in a transaction.15,16 I find that the conversion rate of the Random ranking is approximately constant across positions.17 In other words, after a click, higher positions cannot convince consumers to purchase a hotel, reflecting the fact that their direct utility does not depend on the position of the hotel in the ranking. This observation is an important step toward understanding how rankings affect consumer choices. Note that unconditional on a click, higher ranked hotels lead to more purchases under both rankings (see Figure 14 in Appendix B 12.4.2). However, my analysis here shows that this effect comes from rankings affecting choices at the click stage, not at the purchase stage, an insight that would be confounded by endogeneity in the absence of the Random ranking data. The same patterns as in Figures 2 and 3 can be shown in a regression controlling for hotel and destination characteristics (see Appendix B 12.5). In comparison, in Figure 4 I show the equivalent graphs of the click through rate and the conversion rate by position for the Expedia ranking. These figures show that under the Expedia ranking, higher ranked hotels lead to more clicks, as well as more purchases conditional on a click. However, the Random ranking shows that this latter effect is not due to position, but rather to better targeting by Expedia. As a result, without experimental variation in the ranking, I expect an overestimation of the importance of the position in the ranking due to endogeneity. This overestimation is illustrated by the difference in patterns between Figures 3 and 4, where the effect of position on purchases is confounded with that of better matches without experimental variation. The same pattern can be shown in a regression controlling for hotel and destination characteristics (see Appendix B 12.5).
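
The two rates defined above can be illustrated with a minimal sketch. It assumes one record per displayed hotel with a position, a click indicator and a purchase indicator; the function name, the record layout and the toy records are hypothetical and are not taken from the data set. Any sample restriction (for example, to searches ending in a transaction, as in Figure 3) is assumed to have been applied before the records are passed in.

```python
from collections import defaultdict

def rates_by_position(rows):
    """rows: iterable of (position, clicked, booked), one per displayed hotel.

    Returns {position: (ctr, cr)} with
      ctr = clicks / times the position was displayed
      cr  = purchases / clicks (None if the position received no clicks).
    """
    shown = defaultdict(int)
    clicks = defaultdict(int)
    bookings = defaultdict(int)
    for position, clicked, booked in rows:
        shown[position] += 1
        if clicked:
            clicks[position] += 1
            if booked:
                bookings[position] += 1
    out = {}
    for position in sorted(shown):
        ctr = clicks[position] / shown[position]
        cr = bookings[position] / clicks[position] if clicks[position] else None
        out[position] = (ctr, cr)
    return out

# Hypothetical toy records: (position, clicked, booked)
toy = [
    (1, 1, 1), (1, 1, 0), (1, 0, 0), (1, 0, 0),
    (2, 1, 1), (2, 0, 0), (2, 0, 0), (2, 0, 0),
]
rates = rates_by_position(toy)
```

In this toy example position 1 is clicked more often than position 2, while the conversion rate is computed only over the hotels that were clicked, which is the distinction the figures rely on.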

Figure 4: Click through rate by position and conversion rate conditional on click by position for the Expedia ranking

[Two plots for the Expedia ranking, each with 95% CI: click through rate (vertical axis, 0 to 0.25) by position (horizontal axis, 0 to 40), and conversion rate (vertical axis, 0.8 to 1) by position (horizontal axis, 0 to 40).]

To summarize, I find that the causal effect of rankings is obscured in observational data. Specifically, rankings affect consumer clicks, but conditional on a click subsequent purchases are not affected by the position of the hotel in the ranking as it would appear without experimental data. The fact that under Expedia’s ranking top hotels command more transactions suggests that Expedia’s algorithm successfully identifies those hotels that consumers want to purchase. This observation confirms the main benefit of search intermediaries: as information aggregators, they help consumers search more effectively by ranking first the firms that consumers are more likely to find relevant, thereby decreasing search costs and increasing match probability. Even though theoretical concerns exist about intermediaries diverting consumer search (see Hagiu and Jullien, 2011), these figures show that they should be (at least partially) alleviated in this particular setting.

15 I thank Sergei Koulayev for this suggestion.
16 Restrict attention to positions 1-36 for easy comparison. See Appendix B 12.4.2 for a similar pattern for positions 1-38.
17 See Appendix B 12.4.2 for additional robustness checks.

5.2 Informing the Role of Position in Search

The last section established the causal effect of rankings: positions affect clicks, but not subsequent purchases. As a result, positions do not affect consumers’ direct utility, otherwise higher ranked hotels would lead to more purchases conditional on click as well. However, the exact way in which positions affect consumer clicks is unknown. The literature describes several mechanisms through which the position may affect consumer clicks. In the absence of an effect of utility, these mechanisms can be grouped into two main effects: rankings affect consumers’ expected utility (Varian, 2007; Athey and Ellison, 2011), or consumer search costs (Ghose et al. 2012b; Chen and Yao, 2014). Unfortunately, there has been little empirical evidence on the exact mechanism through which positions affect consumer choices. This is without a doubt due to the difficulty of controlling one channel and exploring how position affects the other. In this section, I propose a test of whether the position of a hotel mainly affects search costs or expected utility. The test is intended to hold consumers’ expected utility constant and attribute changes in consumer choices to the effect of position through search costs. This test exploits a unique feature of my data: the availability of opaque offers in some searches which leads to variation in the position of a displayed hotel. More precisely, in most search impressions, positions 5, 11, 17, 23 are reserved for opaque offers. In this case, no hotel is displayed in these positions, but instead an offer to purchase an unidentified hotel at a discount is displayed. For example, in this case, the 5th displayed hotel will then be shown in position 6 instead of position 5. Approximately 13% of search impressions do not include such offers, and in this case, the 5th displayed hotel is shown in position 5. 
I exploit this variation in the position of the 5th displayed hotel to test whether the position of the hotel mainly affects consumers’ search costs or their expected utility. More formally, I look at search impressions that do not contain a click in the first four positions. In this case, consumers’ expected utility of the 5th displayed hotel should be the same regardless of whether the 5th displayed hotel is shown in position 5 or 6. Any difference in the click through rate of position 5 and 6 will then be attributed to differences in search costs, not expected utility. In Table 3, I show my results using data from the Random ranking.18 I estimate a linear probability model with dependent variable a click and explanatory variables an indicator for whether the 5th displayed hotel is shown in position 6, as well as observable characteristics of the hotels.

18See Appendix B 12.15 for results from both the Random and Expedia rankings.

Table 3: Estimates of click on the position of the fifth displayed hotel

                                     All          Destination 4562
Position of fifth displayed hotel    -0.0176      -0.1408
                                     (0.0117)     (0.1299)
Stars                                0.0265∗∗∗
                                     (0.0017)
Review Score                         0.0046∗∗∗
                                     (0.0012)
Chain                                0.0021
                                     (0.0026)
Location Score                       -0.0030∗∗∗
                                     (0.0009)
Price                                -0.0002∗∗∗   -0.0003
                                     (0.0000)     (0.0002)
Promotion                            0.0099∗∗     0.0160
                                     (0.0031)     (0.0581)
Hotel fixed effects                  No           Yes
Observations                         62,584       642
R2                                   0.0063       0.5482

Standard errors in parentheses. ∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001
Note: Linear probability model of a click on the fifth displayed hotel, conditional on no click occurring in the first four displayed hotels. Restrict to searches under Random ranking.

Additionally, in the second column, I restrict attention to the largest destination in my data set to control for unobserved hotel specific effects. I find that when the 5th displayed hotel is shown in position 6 rather than position 5, it receives fewer clicks. This finding is consistent with position affecting consumers’ search costs. This suggests that consumers do perceive lower ranked hotels as being harder to find, as they may involve additional time to learn about their characteristics. In this specific example, learning about the 5th displayed hotel when it is in position 6 rather than position 5 involves scrolling past the opaque offer, which may impose additional costs and thus decrease the probability of a click.19 One concern with this result may be that consumers who clicked less on the 5th displayed hotel when it was shown in position 6 actually clicked on the opaque offer. Even though I cannot completely alleviate this concern as I do not observe clicks on the opaque offer, it is unlikely that such an event is the predominant behavior in the data. Since in my data set all consumers clicked at least once and since in this exercise I restrict attention to consumers who did not click on the first four displayed hotels, it follows that they click on one of the hotels displayed lower in the ranking. If consumers did click on an opaque offer, this action would involve navigating to a different screen (new tab on the browsing window), making it unlikely that the consumer would return to click on one of the lower ranked hotels on the main results page. Although this is not impossible (for example the consumer might first click on the lower ranked hotels and then on the opaque offer), I expect clicks on opaque offers to be more common in searches without any clicks or with clicks at the top of the ranking, and thus not detract from the main findings of Table 3. 
In this section, I provided evidence of the effect of rankings on consumer search and choices, as well as evidence of the mechanism through which rankings affect search. In the next section, I present a sequential search model that can describe consumer choices from an ordered list of alternatives.
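
The logic of the position test in Section 5.2 can be sketched in a stripped-down form. The sketch assumes a linear probability model with a constant and a single indicator for whether the 5th displayed hotel is shown in position 6, and it omits the hotel characteristics used as controls in Table 3. With only that indicator, the OLS slope equals the difference in click rates between the two groups. The function name and the toy data are hypothetical.

```python
def lpm_slope(clicks, shifted):
    """OLS slope of a 0/1 click outcome on one 0/1 regressor plus a constant."""
    n = len(clicks)
    mean_y = sum(clicks) / n
    mean_x = sum(shifted) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(shifted, clicks))
    var = sum((x - mean_x) ** 2 for x in shifted)
    return cov / var

# Hypothetical toy data, one entry per search impression with no click
# in the first four displayed hotels.
shifted = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = 5th displayed hotel shown in position 6
clicks = [1, 1, 0, 0, 1, 0, 0, 0]   # 1 = 5th displayed hotel was clicked

slope = lpm_slope(clicks, shifted)

# Difference in click rates between the two groups, for comparison.
rate_pos6 = sum(c for c, s in zip(clicks, shifted) if s == 1) / shifted.count(1)
rate_pos5 = sum(c for c, s in zip(clicks, shifted) if s == 0) / shifted.count(0)
diff = rate_pos6 - rate_pos5
```

A negative slope in this sketch corresponds to the 5th displayed hotel receiving fewer clicks when an opaque offer pushes it to position 6.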

6 Model

To qualitatively understand the causal effect of rankings on consumer searches and choices, I can rely solely on the reduced form evidence and exploit the exogenous feature of the ranking observed. However, in order to quantify preferences and search costs and to perform counterfactuals constructing better rankings, I use the following model of consumer sequential search.20

6.1 Utility

When the consumer arrives at the OTA’s website, she types in the destination, the exact dates of her trip, the number of guests traveling and the number of rooms she is looking to book. In response to this search query, she gets a search impression, i.e. an ordered list of hotels that match her search criteria. Such a search impression includes a lot of valuable information that the consumer observes without clicking on a particular hotel, i.e. without searching. For example, this list contains information about the name of the hotel, the number of stars it has, its review score and its price. By clicking on a particular hotel, the consumer discovers more detailed information about it. More precisely, she can see more pictures of the hotel, can locate it on the map, read past consumer reviews, and learn about different amenities. I model this information as the consumer’s match value with the hotel. The consumer readily observes this information after clicking on a hotel and can determine how much utility she derives from it, but from the econometrician’s perspective, this information is unobserved. Therefore, I model the match value as a random error term. These considerations lead to the following model for consumer i’s utility for hotel j ∈ {1,...,J}

\begin{align}
u_{ij} &= v_{ij} + \varepsilon_{ij} \tag{1} \\
       &= x_j \beta - \alpha p_{ij} + \varepsilon_{ij} \tag{2}
\end{align}

where $v_{ij}$ contains consumers’ valuation for non-price $x_j$ and price $p_{ij}$ characteristics, where $p_{ij}$ is hotel j’s price at the time of i’s search query. More precisely, $x_j$ contains the number of stars of the hotel, its review score, a location score, an indicator for whether the hotel is part of a chain, and an indicator for whether the hotel is running a promotion. Because the data set is anonymized, I cannot tell the exact brand or name of the hotel, so these characteristics are meant to best capture the relevant information about the brand of the hotel. The random shock to utility $\varepsilon_{ij}$ follows a standard normal distribution, consistent with previous literature (Kim et al., 2010; Chen and Yao, 2014). The match value $\varepsilon_{ij}$ is only discovered by paying a search cost to click. The consumer also has an outside option denoted by j = 0, that of not booking a hotel, booking a hotel at a later time or choosing a different firm to book the trip. I do not have information about the exact outside option that the consumer chooses, so I model here the outside option as $u_{i0} = \varepsilon_{i0}$. Since my data is at the search impression level, I do not observe the consumer making the search. I thus consider a search in the data as being performed by a unique consumer, which motivates my assumption of population level parameters [β, α]. In addition, the hotel characteristics observed on the first page of results, such as the number of stars or the reviews of the hotel, are more likely to be considered by consumers as vertical characteristics. In this case, I expect there to be limited across-consumer variation in their valuation for these characteristics. Consumers are more likely to treat characteristics observed on the hotel’s page as horizontal (for example, interpreting the text in a review or seeing detailed pictures of the hotel). Thus, consumer heterogeneity is captured by the idiosyncratic shock to utility $\varepsilon_{ij}$ that models the characteristics that consumers search for and that are revealed on the hotel’s page.

19 This test can be used to rule in the fact that position affects search costs. However, it does not rule out the case that position affects both search costs and expected utility.
20 See Appendix B 12.13 for evidence of consumers searching sequentially.

6.2 Search Cost

The consumer observes $v_{ij}$ for all j’s displayed on the first page of results for free. To learn about the match value $\varepsilon_{ij}$ of a particular hotel, the consumer has to pay a search cost. I model consumer i’s search cost for hotel j ∈ {1,...,J} as

\begin{equation}
c_{ij} = \exp(k + \gamma W_i + \delta \rho_{ij}) \tag{3}
\end{equation}

where k gives the baseline search costs of the consumer, $W_i$ denotes the booking window (the number of days before the trip that the consumer searches) and $\rho_{ij}$ gives the position of the hotel in the ranking that the consumer observes. I thus assume that the position of the hotel affects the consumer’s search costs, consistent with my findings in the previous section and the literature (Ghose et al., 2013; Chen and Yao, 2014). The exponential function of the search costs is consistent with prior literature (Kim et al. 2010, Ghose et al. 2013, Chen and Yao 2014) and ensures that search costs are positive. Consumers observe the outside option for free. Note that I had previously normalized the systematic component of the outside good utility to zero. One reason for this is that this utility is not separately identified from the baseline search cost parameter, k. Further, most consumers only click once in my data set and I cannot link searches made by the same consumer over time. For these reasons, the estimated parameter k, and hence the implied baseline search costs, are large and difficult to interpret.21 The data does not contain information on the order in which consumers click. However, most searches contain only one click, while for searches that contain more clicks, I do the following to recover this order. From the WCAI companion data set that does contain the order in which consumers search, I show in Appendix B 12.9 that for searches with at least two clicks, the position of the hotel explains most of the click order.22 This allows me to model consumers’ click order for the few consumers who click more than once even in the absence of information on the order of clicks.
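
As an illustration of equations (2) and (3), the following minimal sketch evaluates the systematic utility and the search cost for given parameter values. The function names and all numerical values are hypothetical and chosen only for illustration; they are not estimates from this paper.

```python
import math

def systematic_utility(x, price, beta, alpha):
    """v_ij = x_j * beta - alpha * p_ij, i.e. equation (2) without the shock."""
    return sum(b * xi for b, xi in zip(beta, x)) - alpha * price

def search_cost(k, gamma, delta, booking_window, position):
    """c_ij = exp(k + gamma * W_i + delta * rho_ij), equation (3)."""
    return math.exp(k + gamma * booking_window + delta * position)

# Hypothetical values: two non-price characteristics (stars, review score).
v = systematic_utility(x=[4.0, 3.5], price=100.0, beta=[0.5, 0.2], alpha=0.01)

# Moving a hotel one position down multiplies its search cost by exp(delta).
c5 = search_cost(k=0.0, gamma=0.0, delta=0.1, booking_window=10, position=5)
c6 = search_cost(k=0.0, gamma=0.0, delta=0.1, booking_window=10, position=6)
ratio = c6 / c5
```

The exponential form keeps the search cost positive for any parameter values, and it implies that the position effect is proportional rather than additive.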

6.3 Optimal Search

To compute the optimal search strategy of consumers, I rely on Weitzman (1979) who provides the solution to a general ordered search problem. The solution indicates that it is optimal for consumers to begin by ranking firms in order of their reservation utility. Reservation utilities are defined as the level of utility that the consumer would have to have in hand before searching a particular hotel to make her indifferent between searching that hotel or not. Weitzman (1979) shows that reservation utilities can be computed by equating the marginal gains from searching firm j with the marginal cost as in

\begin{equation}
c_{ij} = \int_{z_{ij}}^{\infty} (u_{ij} - z_{ij}) f(u_{ij}) \, du_{ij} \tag{4}
\end{equation}

where the $z_{ij}$ that solves this equation is consumer i’s reservation utility from searching j. Kim et al. (2010) show that equation (4) can be rewritten by taking advantage of the distributional assumptions made. More precisely, using the normality assumption on $\varepsilon_{ij}$ and the expression for the expectation of a truncated normally distributed random variable, equation (4) can be rewritten as

\begin{align}
c_{ij} &= (1 - \Phi(m_{ij}))(\lambda(m_{ij}) - m_{ij}) \notag \\
       &= B(m_{ij}) \tag{5}
\end{align}

where $\lambda(\cdot) = \phi(\cdot) / (1 - \Phi(\cdot))$ is the hazard function and where $m_{ij} = z_{ij} - v_j$. The result in equation (5) provides a straightforward way of computing the reservation utility $z_{ij}$. More precisely, it says that given any search cost $c_{ij}$, one can invert equation (5) and solve for $m_{ij}$.23 Then, using the definition of $m_{ij}$, the reservation utility is given by $z_{ij} = m_{ij} + v_j$. To speed up computation, I follow Kim et al. (2010) and construct a look-up table for $c_{ij} = B(m_{ij})$ outside the estimation loop. During estimation, for a particular value of search costs, I use the table to look up the value of $m_{ij}$ and construct the reservation utility.

21 See Appendix B 12.12 for more details.
22 In a data set where order is observable and where there is reason to expect individual heterogeneity to play a large role in determining the order in which consumers click, a different model is required, which I describe in section 12.11 and demonstrate the bias that would result from not properly modeling click order.
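
The inversion of equation (5) can be sketched as follows. This is an illustration under stated assumptions, not the implementation used for estimation: it solves c = B(m) by bisection rather than with the look-up table described above, and the function names, the bracketing interval and the tolerance are hypothetical choices. B(m) is written as φ(m) − m(1 − Φ(m)), which is algebraically equal to (1 − Φ(m))(λ(m) − m) and avoids dividing by 1 − Φ(m).

```python
import math

def norm_pdf(m):
    """Standard normal density phi(m)."""
    return math.exp(-0.5 * m * m) / math.sqrt(2.0 * math.pi)

def norm_sf(m):
    """Standard normal upper tail 1 - Phi(m), computed via erfc."""
    return 0.5 * math.erfc(m / math.sqrt(2.0))

def B(m):
    """B(m) = (1 - Phi(m)) * (lambda(m) - m) = phi(m) - m * (1 - Phi(m))."""
    return norm_pdf(m) - m * norm_sf(m)

def invert_B(c, lo=-10.0, hi=10.0, tol=1e-10):
    """Solve c = B(m) for m by bisection, using that B is decreasing in m."""
    if not (B(hi) < c < B(lo)):
        raise ValueError("search cost outside the bracketed range")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if B(mid) > c:
            lo = mid  # B(mid) too large, so the root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)

def reservation_utility(v, c):
    """z = m + v, where m solves c = B(m)."""
    return v + invert_B(c)
```

A look-up table would store pairs (B(m), m) on a fine grid of m once, outside the estimation loop, and interpolate between them; bisection gives the same mapping at a higher cost per evaluation.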

Once the consumer computes all reservation utilities zij, the following strategy due to Weitzman (1979) characterizes her optimal search:

1. (Selection Rule): If a search is to be made, the firm with the highest reservation utility should be searched next.

2. (Stopping Rule): Search should terminate when the maximum utility observed exceeds the reservation utility of any unsearched firm.

3. (Choice Rule): Once the consumer stops searching, she will purchase from the firm with the highest realized utility of those searched.
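
The three rules above can be sketched for a single consumer as follows. The sketch takes the reservation utilities and realized utilities as given inputs and treats the outside option as already observed; the function name and the numerical values are hypothetical.

```python
def weitzman_search(z, u, u_outside=0.0):
    """Apply the selection, stopping and choice rules for one consumer.

    z: reservation utilities, one per hotel
    u: realized utilities, revealed only when a hotel is searched
    Returns (list of searched hotel indices in order, chosen index),
    where a chosen index of None means the outside option.
    """
    # Selection rule: search in decreasing order of reservation utility.
    order = sorted(range(len(z)), key=lambda j: z[j], reverse=True)
    best_u, best_j = u_outside, None
    searched = []
    for j in order:
        # Stopping rule: stop once the best utility in hand is at least the
        # highest reservation utility among the unsearched hotels.
        if best_u >= z[j]:
            break
        searched.append(j)
        if u[j] > best_u:
            best_u, best_j = u[j], j
    # Choice rule: pick the highest realized utility among those searched.
    return searched, best_j

# Hypothetical example with three hotels.
result = weitzman_search(z=[1.5, 2.0, 0.3], u=[0.2, 1.0, 5.0])
```

In the example the third hotel has the highest realized utility but is never searched, because its reservation utility falls below the utility already in hand after two searches.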

7 Estimation and Identification

7.1 Joint Likelihood

Suppose there are J firms that consumer i ∈ {1,...,I} can search. Order these firms by consumer i’s reservation utility. Denote by $R_i(n)$ the identity of the firm with the nth highest reservation utility. Suppose consumer i searched a number h ≤ J of these firms, so that $R_i = [R_i(1),...,R_i(h)]$ gives the set of searched firms and the order in which they were searched. The outside option is always searched (denote it for simplicity as either j = 0 or $R_i(0)$). Weitzman’s optimal search strategy dictates that the following inequalities relating reservation and realized utilities must hold. First, if the consumer makes an nth search, then her reservation utility from that firm must exceed her reservation utility from all firms searched next and all those not searched. Formally, it must be that

\begin{equation}
z_{iR_i(n)} \geq \max_{k=n+1}^{J} z_{iR_i(k)}, \quad \forall n \in \{1,\ldots,J-1\} \tag{6}
\end{equation}

otherwise, using the selection rule, the consumer would have searched another firm next that had a higher reservation utility.

23 Kim et al. (2010) show that the function B(·) is monotonic and decreasing in its argument and that a unique solution to $c_{ij} = B(m_{ij})$ exists. Thus, this inversion is possible.

Second, if the consumer makes an nth search, then her reservation utility from that firm must exceed her utility from all firms searched so far, including the outside option. Otherwise, according to the stopping rule, the consumer would have stopped searching. Formally,

\begin{equation}
z_{iR_i(n)} \geq \max_{k=0}^{n-1} u_{iR_i(k)}, \quad \forall n \in \{1,\ldots,h\} \tag{7}
\end{equation}

Third, all unsearched firms must have a lower reservation utility than the maximum utility of the searched alternatives, including the outside option,

\begin{equation}
z_{iR_i(m)} \leq \max_{k=0}^{h} u_{iR_i(k)}, \quad \forall m \in \{h+1,\ldots,J\} \tag{8}
\end{equation}

otherwise, according to the stopping rule, the consumer should have continued searching. Finally, if the consumer chose to purchase from firm j, including choosing the outside option, then her utility from this choice must exceed all utilities searched. Formally,

\begin{equation}
u_{ij} \geq \max_{k=0}^{h} u_{iR_i(k)}, \quad \forall j \in R_i \cup \{0\} \tag{9}
\end{equation}

If consumers search sequentially, then their search and purchase decisions are not separate. This means that the probability of observing a certain outcome is characterized by a joint probability. The probability $P_{ijR_i}$ that i searches exactly in the order $R_i$ and purchases from firm j (including the outside option) is given by

\begin{align}
P_{ijR_i} &= \mathrm{Prob}\Big( z_{iR_i(n)} \geq \max_{k=n+1}^{J} z_{iR_i(k)} \;\cap\; z_{iR_i(n)} \geq \max_{k=0}^{n-1} u_{iR_i(k)} \;\cap\; z_{iR_i(m)} \leq \max_{k=0}^{h} u_{iR_i(k)} \;\cap\; u_{ij} \geq \max_{k=0}^{h} u_{iR_i(k)}, \notag \\
&\qquad\quad \forall n \in \{1,\ldots,J-1\},\ \forall m \in \{h+1,\ldots,J\},\ \forall j \in R_i \cup \{0\} \Big) \notag \\
&= \iint I(\mathrm{cond}) \, \phi(\varepsilon_i) \, d\varepsilon_i \, \phi(\eta_i) \, d\eta_i \tag{10}
\end{align}

where cond stands for the four conditions I derived from Weitzman’s optimal search rule (equations 6-9) and where I(·) is an indicator for whether these conditions hold. The log-likelihood function is given by

\begin{equation}
LL = \sum_{i} \sum_{R_i} \sum_{j} d_{ijR_i} \log P_{ijR_i} \tag{11}
\end{equation}

where $d_{ijR_i} = 1$ if i chose search order $R_i$ and purchased from j (including the outside option). The integral in equation (10) does not have a closed form solution. Thus, I replace the choice probability $P_{ijR_i}$ with the simulated choice probability $\hat{P}_{ijR_i}$, which approximates the integral in (10) with a summation over D draws of the utility error term. This results in the following simulated log-likelihood

\begin{equation}
SLL = \sum_{i} \sum_{R_i} \sum_{j} d_{ijR_i} \log \hat{P}_{ijR_i} \tag{12}
\end{equation}

The choice probability can be simulated in a number of ways. The most straightforward and widely used simulator is accept-reject (AR). It was originally proposed by Manski and Lerman (1981) for probits. This simulator approximates $P_{ijR_i}$ by the proportion of draws from the appropriate distribution that satisfy the conditions in (10). However, using the AR simulator in maximizing the SLL can be problematic for two reasons. First, any finite number of draws D can result in a reject, so that $\hat{P}_{ijR_i}$ is zero and the log of zero is undefined. This possibility is especially likely if the data contains very few choices, so that the true probability is low. This is the case with my data set. Each search impression contains on average 25 hotels, making searching in a particular order and buying from a particular hotel especially unlikely. The second difficulty comes from the fact that the choice probabilities are not twice differentiable, so the simulated probabilities will not be smooth. Thus, finding a maximum by optimizing the SLL using first and second derivatives will not be effective. Even though there is a way to circumvent this problem and use an approximation of the gradient to the SLL instead, Train (2009) concludes that in practice AR is difficult to use. For these reasons, I choose to replace the indicator function in the AR simulator with a smooth function. Any function that is increasing in the chosen alternative and that has defined first and second derivatives can be used. As suggested by McFadden (1989), I choose the logit function that satisfies these conditions and is convenient to use. This is known as the logit-smoothed AR simulator. It has also been successfully used by Honka (2014) and Honka and Chintagunta (2014) in the consumer search setting and by many others when estimating probit models. I now describe the steps I use to simulate $\hat{P}_{ijR_i}$ using the logit-smoothed AR simulator.

1. Draw d = {1,...,D} samples of $\varepsilon_{ij}^{d}$ for each consumer and each firm.

2. Use $\varepsilon_{ij}^{d}$ to form utility $u_{ij}^{d}$.

3. Use the relation $c_{ij}^{d} = B(m_{ij}^{d})$ to compute $m_{ij}^{d}$ and form reservation utilities $z_{ij}^{d}$.

4. Define the following expressions for each draw d

(a) $\nu_1^{d} = z_{iR_i(n)}^{d} - \max_{k=n+1}^{J} z_{iR_i(k)}^{d}$

(b) $\nu_2^{d} = z_{iR_i(n)}^{d} - \max_{k=0}^{n-1} u_{iR_i(k)}^{d}$

(c) $\nu_3^{d} = \max_{k=0}^{h} u_{iR_i(k)}^{d} - z_{iR_i(m)}^{d}$

(d) $\nu_4^{d} = u_{ij}^{d} - \max_{k=0}^{h} u_{iR_i(k)}^{d}$

5. Put these expressions into the logit formula and compute $S^{d}$ for each draw d

\begin{equation}
S^{d} = \frac{1}{1 + \sum_{n=1}^{4} e^{-\nu_n^{d} / \lambda}} \tag{13}
\end{equation}

where λ > 0 is a scaling parameter.

6. The simulated choice probability is the average over D draws of the error terms,

\begin{equation}
\hat{P}_{ijR_i} = \frac{1}{D} \sum_{d} S^{d} \tag{14}
\end{equation}

There is little guidance in choosing the scaling parameter λ. As λ → 0, the simulator is unbiased because it approaches the AR simulator. So, the researcher should use a small enough λ, but not too small to reintroduce the numerical problems one faces when optimizing with a non-smooth function.
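
Steps 5 and 6 can be sketched as follows, alongside the crude accept-reject indicator for comparison. The sketch takes the expressions ν for each draw as given inputs (one value per condition) and does not reproduce steps 1 to 4; the function names, the numerical values and the cap on the exponent are hypothetical choices made to keep the example self-contained and free of floating point overflow.

```python
import math

def ar_indicator(nu):
    """Crude accept-reject: 1 if every condition holds (all nu >= 0), else 0."""
    return 1.0 if all(v >= 0 for v in nu) else 0.0

def logit_smoothed(nu, lam):
    """S^d = 1 / (1 + sum_n exp(-nu_n / lam)), equation (13).

    The exponent is capped at 700 so that math.exp does not overflow when a
    condition is badly violated and lam is small (an implementation choice).
    """
    total = sum(math.exp(min(-v / lam, 700.0)) for v in nu)
    return 1.0 / (1.0 + total)

def simulated_probability(nu_draws, lam):
    """Average of S^d over the D draws, equation (14)."""
    return sum(logit_smoothed(nu, lam) for nu in nu_draws) / len(nu_draws)

# Hypothetical nu values for two draws: all conditions hold in the first,
# the last condition fails in the second.
draws = [[1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, -1.0]]
p_hat = simulated_probability(draws, lam=0.01)
```

With a small λ the smoothed value is close to the accept-reject indicator for each draw, but it remains strictly positive for a rejected draw, so the log of the simulated probability stays defined.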

7.2 Identification

The optimal choice rules of Weitzman (1979) help identify preference and search cost parameters as follows. The stopping rules in equations (7)-(8) (i.e. the number of clicks) impose an upper and a lower bound on search costs, respectively, that must have made it optimal for the consumer to conduct a certain number of searches. The stopping rules, however, only recover a range of search costs. The level of search costs (parameter k) is pinned down by the functional form and the distribution of the utility function through the optimal search relation in equation (5). Note that the larger the variance of the unobserved utility shock, the larger the benefits from searching. If, as is the case in my data set, most consumers only click once, this larger variance is associated with large consumer search costs. Other search cost parameters are identified as follows: the booking window effect is identified through the variation across consumers in the number of searches performed; the position effect is identified from variation within consumers in their choice of the position of a click. The selection, stopping and choice rules in equations (6)-(9) help identify the preference parameters of characteristics that vary by hotel. Characteristics in the utility function that do not vary by hotel cannot be identified, so I do not include them in the specification of the utility function. Comparisons of the reservation utilities with realized utilities will not help identify these constants since they shift both reservation and realized utilities by the same amount. Both the price of the hotel and its position (in Expedia’s ranking) may be endogenous. I will show evidence to alleviate concerns about price endogeneity and instead focus on the endogeneity of position, which I eliminate using the Random ranking. Price may be endogenous for two reasons. First, an unobserved quality shock may affect both consumer choices and hotel prices. 
Second, consumer specific choice probabilities may affect what prices hotels set. The most common method to alleviate price endogeneity concerns is instrumental variables. However, in the hotel industry, very few instruments (if any) are available. One possibility is using Hausman style instruments such as the average price of the same hotel or same star hotels in a different location. Unfortunately, I do not observe the same hotel in different locations and since destinations and countries are anonymous, taking the average price across very different destinations (for example cities on different continents) will make the assumption that the average price is capturing marginal costs hard to satisfy. Other possible instruments are lagged prices of the same hotel, which may not be valid if the unobserved quality of the hotel is correlated over time. Another option is using region dummies as proxies for marginal costs, but I do not observe regions and determining whether a destination is a neighborhood or an entire city will not provide an accurate enough approximation. Finally, as another instrument for price, one can use the average price of other hotels for the same trip, as well as the focal hotel’s non-price characteristics. These instruments are similar to the ones used by Chen and Yao (2014) in the online hotel industry, and by Hortacsu and Syverson (2004) and BLP in different settings. These instruments capture the position in characteristics space of the focal hotel relative to all others, assuming that characteristics are predetermined or exogenous. However, this last assumption may not be tenable in my case. Even though price instruments in the hotel industry are difficult to obtain, concerns about the endogeneity of price may be partially alleviated by the observation that prices are set by the hotel’s revenue management system and thus not set in response to individual consumers’ preferences.24 Hotel revenue management systems segment consumers based on willingness to pay, price-elasticity, and group discounts (Cross et al. 2009; Mauri, 2013). If hotels target their prices, then observing the price that one consumer sees reveals important information about their underlying response parameters (Manchanda et al. 2004). However, I will show that even this form of endogeneity does not pose a significant concern. To this end, I run a regression of price on observable characteristics to show that these capture most of the variation in prices. My results can be found in Table 15 in Appendix B 12.7. In the first column, I only regress price on hotel and trip date fixed effects by destination and obtain an adjusted R2 of 0.766. This suggests that specific dates command different prices, but all consumers searching for a hotel for the same trip date will see the same price for a particular hotel. In the next columns, I add additional trip and search characteristics, and include information about the average prices of similar hotels for the same trip and obtain an adjusted R2 of 0.813. The last three columns of Table 15 show that a similar pattern holds across different destinations.25 This analysis suggests that observable characteristics explain most of the variation in the price of a hotel, with the trip date explaining the majority of it. From discussions with an employee at a large hotel chain and from previous literature, the remaining price variation may be due either to (i) different suppliers selling the particular hotel, or (ii) experimental price variation (Einav et al., 2015; Koulayev, 2014). 
Since neither of these explanations is demand related (they are supply-side or experimental), I conclude that the price variation observed conditional on the parameters of the request is unlikely to be correlated with the utility error term and thus does not need an instrument. Finally, a hotel’s price did not vary by ranking type. As a result, any remaining concerns about price endogeneity should not translate into concerns about the price coefficient estimates being biased across the two rankings. In this section, I discussed identification; in the next section, I present simulation results that confirm that my model is identified.
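To make the adjusted R2 argument concrete, the following is a minimal sketch of how such a statistic can be computed when price is regressed on a saturated set of hotel-by-trip-date cells. It is an illustration only: the data are synthetic, the function name `adjusted_r2_group_means` is hypothetical, and treating the fixed effects as fully interacted cells is an assumption (the specification in Table 15 may instead be additive).

```python
import random
from collections import defaultdict

def adjusted_r2_group_means(prices, cells):
    """Adjusted R^2 from regressing price on one dummy per cell.

    With a saturated set of cell dummies, the fitted value of each
    observation is simply the mean price within its cell.
    """
    n = len(prices)
    by_cell = defaultdict(list)
    for p, c in zip(prices, cells):
        by_cell[c].append(p)
    cell_mean = {c: sum(v) / len(v) for c, v in by_cell.items()}
    grand_mean = sum(prices) / n
    ssr = sum((p - cell_mean[c]) ** 2 for p, c in zip(prices, cells))
    sst = sum((p - grand_mean) ** 2 for p in prices)
    k = len(by_cell)  # number of estimated cell means (intercept included)
    return 1.0 - (ssr / (n - k)) / (sst / (n - 1))

# Synthetic example (hypothetical numbers): 20 hotels, 10 trip dates,
# 5 displays per cell, with small display-level price noise.
rng = random.Random(0)
prices, cells = [], []
for hotel in range(20):
    for date in range(10):
        for _ in range(5):
            prices.append(100 + 5 * hotel + 3 * date + rng.gauss(0, 5))
            cells.append((hotel, date))

r2_adj = adjusted_r2_group_means(prices, cells)
```

A value of `r2_adj` close to one in this toy setting corresponds to the case described above, where nearly all price variation is explained by which hotel and trip date a consumer searched for.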

7.3 Monte Carlo Simulation

In this section, I describe simulation results to show that Simulated Maximum Likelihood using the logit-smoothed AR simulator recovers parameters well.26 To this end, I generate a data set of 1,000 consumers, each searching among five firms.27 I use 50 draws from the distribution of the utility error terms for each consumer and hotel combination and a scaling factor λ = 1/5, and I repeat the simulation 50 times. I have performed simulations with 1/λ ranging from 1 to 7 and found that λ = 1/5 recovers parameters the best in the simulation sample, which led me to use the same scaling parameter in estimation.

24 Koulayev (2014) makes a similar observation.
25 In Appendix B 12.6 in Figure 18, I also report the adjusted R2 from running separate regressions as in Table 15 on each hotel in a destination, and I obtain a similar result. In Figure 19, I also investigate which hotels have a larger unexplained portion of the variation in price. I find that such hotels have a smaller number of displays, are chains with fewer than three stars and lower hotel location score.
26 I thank Elisabeth Honka for the hints she gave me on running this simulation.
27 See section 12.11 for details about data generation.

Table 4: Simulation results

                     True values    Estimated values
Price                   -1.8        -2.0448∗∗∗  (0.0410)
Stars                    0.5         0.5441∗∗∗  (0.0245)
Review Score             0.5         0.5882∗∗∗  (0.0225)
Location Score           0.5         0.5348∗∗∗  (0.0231)
Chain                    0.5         0.5759∗∗∗  (0.0310)
Promotion                0.5         0.5647∗∗∗  (0.0299)
Position                 0.5         0.5369∗∗∗  (0.0028)
Booking Window          -0.1        -0.1390∗∗∗  (0.0030)
Constant k              -2.5        -2.5659∗∗∗  (0.0040)
Observations                         5,000
Log-likelihood                      -1,143
Standard errors in parentheses. ∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001

My results are given in Table 4. In the first column are the true parameters and in the second column I show the estimated parameters. I find that my method works quite well in recovering the parameters of interest. Having introduced the model and discussed its identification, in the next section I apply the model to data on consumer online searches for hotels.
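For readers unfamiliar with the logit-smoothed AR simulator, the sketch below illustrates the general idea in the spirit of Train (2009): the indicator that all optimal-search conditions hold for a given draw is replaced by a smooth logit kernel with scaling factor λ, and the result is averaged over draws. The exact kernel used in this paper is not reproduced in this section, so the functional form, the function names, and the toy condition are all assumptions made for illustration.

```python
import math
import random

def smoothed_accept(margins, lam):
    """Smooth stand-in for the indicator that every margin is positive.

    Each margin is assumed to be written so that a positive value means the
    corresponding selection, stopping or choice rule is satisfied.
    """
    # Cap the exponent so a badly violated rule cannot overflow math.exp.
    total = sum(math.exp(min(-m / lam, 700.0)) for m in margins)
    return 1.0 / (1.0 + total)

def simulated_probability(margin_fn, n_draws, lam, rng):
    """Average the smoothed indicator over draws of a standard normal shock."""
    draws = (rng.gauss(0.0, 1.0) for _ in range(n_draws))
    return sum(smoothed_accept(margin_fn(eps), lam) for eps in draws) / n_draws

# Hypothetical toy rule with a single condition: 0.5 + eps > 0.
rng = random.Random(0)
p_hat = simulated_probability(lambda eps: [0.5 + eps], n_draws=50, lam=1 / 5, rng=rng)
```

A smaller λ makes the kernel closer to a hard accept-reject indicator at the cost of a less smooth likelihood, which is the tradeoff behind comparing several values of 1/λ in the simulation.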

8 Results

The purpose of estimation is recovering θ = [β, α, k, γ, δ]. I estimate parameters by Simulated Maximum Likelihood with the logit-smoothed AR simulator, using 50 draws of the utility error terms for each consumer-hotel combination and a scaling factor of λ = 1/5. For the present analysis, I focus my attention on the four largest destinations in my data set, destinations 4562, 9402, 8347 and 13870.28 This allows me to control for differences across destinations and not confound results. These destinations are in the largest country, 219, which I have shown earlier is likely the U.S.

28 See Appendix B 12.8 for summary statistics for these destinations and details about the estimation sample.

Table 5: Main Estimation Results: Random ranking

                        Destination   Destination   Destination   Destination
                           4562          9402          8347          13870
Panel A: Coefficients
Preferences (u)
  Price ($100)          -0.1423∗∗∗    -0.1560∗∗∗    -0.1963∗∗∗    -0.2028∗∗∗
                         (0.0146)      (0.0000)      (0.0305)      (0.0241)
  Stars                  0.0240∗∗∗     0.1314∗∗∗     0.1623∗∗∗     0.0331
                         (0.0137)      (0.0000)      (0.0000)      (0.0393)
  Review Score          -0.0360∗∗∗    -0.0738∗      -0.0177        0.0599∗∗∗
                         (0.0079)      (0.0095)      (0.0128)      (0.0000)
  Location Score         0.0685∗∗∗     0.1034∗∗∗    -0.0417∗∗      0.0218
                         (0.0070)      (0.0075)      (0.0139)      (0.0184)
  Chain                 -0.0158       -0.0245       -0.0736∗       0.0401
                         (0.0196)      (0.0229)      (0.0325)      (0.0533)
  Promotion             -0.0138        0.0000        0.1090∗∗∗     0.1213∗
                         (0.0212)      (0.0266)      (0.0288)      (0.0526)
Search Cost (c)
  Position               0.0053∗∗∗     0.0035∗∗∗     0.0036∗∗∗     0.0042∗∗∗
                         (0.0006)      (0.0007)      (0.0005)      (0.0018)
  Booking Window        -0.1717∗∗∗    -0.1590∗∗∗    -0.1584∗∗∗    -0.1384∗∗∗
  (100 days)             (0.0012)      (0.0000)      (0.0000)      (0.0038)
  Constant e^k           0.3751∗∗∗     0.4073∗∗∗     0.5717∗∗∗     0.6659∗∗∗
                         (0.0049)      (0.0000)      (0.0000)      (0.0277)
Observations             17,000        10,850         6,275         2,475
Log-likelihood           -3,136        -2,008        -1,201          -462

Panel B: Equivalent Change $
  Position                $3.73         $2.31         $1.85         $2.10
  Stars                  $16.89        $84.23        $82.72        $16.32

Standard errors in parentheses. ∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001
Note: Prices measured in $100 and booking window measured in 100 days. Position and booking window parameters expressed as change in search cost implied by a unit change.
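The conversion from Panel A coefficients to the dollar amounts in Panel B is not spelled out in this section. One plausible reading, sketched here as an assumption, is that each coefficient is divided by the absolute value of the price coefficient and scaled by $100 (the unit in which price is measured). With the rounded coefficients from Table 5, this reproduces Panel B to within about ten cents; the remaining gaps are presumably due to rounding of the reported coefficients. The function name `dollar_equivalent` is hypothetical.

```python
# Coefficients copied from Table 5, Panel A (price is measured in $100).
PRICE = {"4562": -0.1423, "9402": -0.1560, "8347": -0.1963, "13870": -0.2028}
POSITION = {"4562": 0.0053, "9402": 0.0035, "8347": 0.0036, "13870": 0.0042}
STARS = {"4562": 0.0240, "9402": 0.1314, "8347": 0.1623, "13870": 0.0331}

# Dollar amounts copied from Table 5, Panel B, for comparison.
PANEL_B_POSITION = {"4562": 3.73, "9402": 2.31, "8347": 1.85, "13870": 2.10}
PANEL_B_STARS = {"4562": 16.89, "9402": 84.23, "8347": 82.72, "13870": 16.32}

def dollar_equivalent(coef, price_coef, price_unit=100.0):
    """Dollar change in price that offsets a one-unit change in a characteristic.

    Assumed conversion: ratio of the coefficient to the absolute price
    coefficient, rescaled by the unit in which price enters the model.
    """
    return price_unit * coef / abs(price_coef)

# Largest gap between the assumed conversion and the reported Panel B values.
gaps = [
    abs(dollar_equivalent(POSITION[d], PRICE[d]) - PANEL_B_POSITION[d])
    for d in PRICE
] + [
    abs(dollar_equivalent(STARS[d], PRICE[d]) - PANEL_B_STARS[d])
    for d in PRICE
]
largest_gap = max(gaps)
```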

Table 5 shows the main estimation results for the Random ranking.29 In Panel A, I show the coefficient estimates, while in Panel B, I derive consumers’ willingness to pay for parameters of interest. In general, the estimates of preferences and search costs are economically meaningful and significant. For example, consumers derive higher utility from lower prices and hotels that have more stars and better location. I find that a 10% increase in prices in the first destination is associated with a 2.19% decrease in transactions. Also, a higher position number in the ranking increases search costs and is associated with a 1.37% decrease in transactions and a 1.49% decrease in clicks in the first destination. Finally, a larger booking window decreases search costs, as consumers searching further in advance of their trip have more opportunities to return to search, thereby decreasing the urgency of the current search.

Position effects range from $1.85 to $3.73 under the Random ranking.30,31 As expected once endogeneity is removed, these estimates are typically smaller than those found in the hotel industry literature. For example, Koulayev (2014) finds position effects that range from $2.93 to $18.78, De los Santos and Koulayev (2014) find position effects exceeding $7.76, while Ghose et al. (2012b) find a position effect of $6.24. The exception is Chen and Yao (2014), who find a position effect of approximately 25 cents when estimating the model at the consumer level. The intuition for this result is as follows. Under a curated ranking, consumers find more desirable alternatives faster. Not accounting for endogeneity, a model will attribute this behavior to sizable ranking and search cost effects, when in fact part of this effect comes from better targeting. My estimates using experimental variation in the ranking recover the causal effect of position.
In the next section, I use the main estimation results to measure the impact of three counterfactual rankings on consumers’ match with a hotel and their search costs, compared to Expedia’s current ranking.

9 Counterfactuals

In this section, I present my results from three counterfactual experiments. To perform these counterfactuals, I use the Random ranking estimates from the previous section to measure the change in consumer valuation from a counterfactual ranking over Expedia’s ranking. To perform this comparison, I assume that consumers under the Expedia and the Random rankings have the same model parameters. In making this assumption, I acknowledge a tradeoff: performing counterfactuals using the Expedia ranking estimates suffers from the endogeneity of the ranking, while using the Random ranking estimates may introduce selection bias. However, I provide supportive evidence for this assumption in Section 4.2, which encourages me to use the Random ranking estimates for counterfactuals. An alternative approach would be to use the Random ranking estimates to compare the Random ranking with alternatives, which would not suffer from either selection or endogeneity bias. Since the Random ranking is inferior to Expedia’s ranking, my results in this section would understate the change in welfare due to improving the Random ranking.

29 See Appendix B 12.14 for results for both Expedia and Random ranking. These will reveal that both position and search costs are overestimated under Expedia’s ranking, although the magnitude should be interpreted with caution.
30 See Appendix B 12.10 for results using a conditional logit model of the probability of a click for a similar pattern.
31 As mentioned in the model setup section, I expect the lack of information on consumers’ outside option, the fact that most searches contain only one click, and the inability to link different searches made by the same consumer to result in large estimates of search costs. Indeed, I find baseline search costs of approximately $260. Even though these are large and hard to interpret, I show in Appendix B 12.12 that when I restrict attention to searches with more than one click or when I estimate the model on the WCAI data set, which allows linking different searches made by the same consumer, search cost estimates decrease to as low as $30. This is comparable to Chen and Yao’s (2014) estimate of $25 or De los Santos and Koulayev (2014), who estimate search costs ranging from $10 to $50.

I propose to measure the total value of consumer i under a certain ranking as the utility of the choice she made under that ranking (the outside option or the hotel she purchased from) net of her total search costs. More formally,

$$\text{Value}_i(\text{ranking}) = u_i(\text{choice}) - \sum_j c_i(\text{click}_j) \qquad (15)$$

I use this measure of consumer valuation as well as measures of firm performance to compare Expedia’s ranking against three counterfactual rankings. These counterfactuals are computed by changing the ranking that consumers observe and simulating their new click and purchase decisions using Weitzman’s (1979) optimal search rules. Note that the counterfactual results represent short-run effects of changing the structural parameters.
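As an illustration of how one simulated consumer enters the valuation measure in equation (15), the sketch below applies the textbook form of Weitzman’s (1979) selection, stopping and choice rules to given reservation and realized utilities, and then nets out the search costs of the clicked hotels. It is a simplified sketch rather than the paper’s simulation code: the reservation utilities are taken as inputs (in the model they follow from equation (5)), the outside option is treated as a fallback with a known utility, and all names and numbers are hypothetical.

```python
def weitzman_search(reservation, realized, outside=0.0):
    """Simulate one consumer's clicks and choice.

    Selection rule: visit hotels in decreasing order of reservation utility.
    Stopping rule: stop once the best utility in hand is at least as large as
    the next reservation utility.
    Choice rule: pick the best option among those searched and the outside
    option. Returns (choice, clicked), where choice is None for the outside
    option.
    """
    order = sorted(range(len(reservation)), key=lambda j: reservation[j], reverse=True)
    best, choice, clicked = outside, None, []
    for j in order:
        if reservation[j] <= best:
            break
        clicked.append(j)
        if realized[j] > best:
            best, choice = realized[j], j
    return choice, clicked

def consumer_value(choice, clicked, realized, search_cost, outside=0.0):
    """Equation (15): utility of the choice net of total search costs."""
    utility = outside if choice is None else realized[choice]
    return utility - sum(search_cost[j] for j in clicked)

# Hypothetical consumer facing three hotels.
reservation = [2.0, 1.5, 0.5]
realized = [1.0, 1.8, 3.0]
search_cost = [0.1, 0.2, 0.3]
choice, clicked = weitzman_search(reservation, realized)
value = consumer_value(choice, clicked, realized, search_cost)
```

In this toy case the third hotel has the highest realized utility but is never clicked because its reservation utility is low, which is the mechanism through which a ranking (via position-dependent search costs) can affect matches.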

9.1 Utility-based Ranking

The last section presented preference and search cost estimates from a sequential search model. In this section, I evaluate the impact of reordering the hotels that were displayed under Expedia’s ranking by their average expected utility (as estimated from the Random ranking).32 Consistent with the literature, I refer to this ranking as the Utility-based ranking (Ghose et al. 2012a). My results can be found in Table 6, where I compare the value of the Utility-based ranking with that of Expedia’s current ranking. I find that there are large gains to be obtained from moving away from Expedia’s current ranking and towards the Utility-based ranking. More precisely, consumers gain $38.36 in the first destination (21% of the transaction price), and $23.86 (15% of the transaction price) in the second destination under the Utility-based ranking.33 The benefit of the Utility-based ranking for consumers comes from better matches and lower search costs. For the first destination, $16.46 (43% of the value) comes from improved matches, $8 comes from lower prices, while more than a third of the value ($13.90) comes from lower search costs. Similarly, for the second destination, more than half of the value comes from lower search costs.

32 Another option for constructing the ranking would be to take all hotels available in a certain destination and rank them according to the average expected utility and propose this ranking for all searches. However, there may be good reasons why Expedia chose a particular set of hotels to display to consumers (e.g. availability), which is why I choose not to focus on this counterfactual.
33 I restrict attention to the first two destinations when performing counterfactuals for ease of exposition. The same qualitative results hold for the other two destinations and are available upon request.

Table 6: Counterfactual 1: Utility-based ranking results

                                Destination 4562   Destination 9402
                                      U-E                U-E
Change in Consumer Valuation        $38.36             $23.86
(% Tran. Price)                    (21.41%)           (15.32%)
Match                               $16.46              $6.56
Price                               −$8.00             −$5.33
Total Search Costs                 −$13.90            −$11.97
% Change in Transactions             4.89%              2.63%
% Who Purchase Same Hotel           52.97%             64.77%
Position of Transaction             −3.21              −4.28
Note: U=Utility-based ranking; E=Expedia ranking.

In fact, I find in Table 6 that as much as 64.77% of consumers buy from the exact same hotel under the Utility-based ranking as they did under Expedia’s ranking, thereby only gaining from lower search costs. This suggests that the benefits of improving the ranking may vary for different types of consumers. Consumers who choose the same hotel under both rankings only benefit from lower search costs under the Utility-based ranking, while all others benefit from both better matches and lower search costs. What differentiates consumers who choose the same hotel under the Utility-based ranking as under Expedia’s ranking from those who buy from a different hotel under the Utility-based ranking is the number of days before the trip that they search. In particular, consumers who choose the same hotel are more “patient”, searching at least 11 days earlier than those who choose a different hotel.34 The intuition for this result is as follows. Consumers who are more patient spend more time under Expedia’s ranking to determine their ideal hotel, leading to lower benefits when the ranking is improved. The fact that more than half of consumers purchase from the exact same hotel under the Utility-based ranking as they did under Expedia’s ranking suggests that Expedia’s ranking already provides consumers with large benefits. This finding is consistent with my results in Figure 3, where I showed that Expedia is successful in determining which hotels consumers wish to purchase from. These gains are in line with the benefits of the Utility-based ranking reported in the literature. For example, Ghose et al.
(2012a) ask Amazon Mechanical Turk users to compare a Utility-based ranking with one of the several baseline rankings used by online travel agents and find that more than 60% of consumers prefer the Utility-based ranking. Furthermore, Ghose et al. (2013) show that a Utility-based ranking outperforms other rankings in terms of revenues, including one based on conversion rate or Travelocity’s default ranking. Here, I quantify consumers’ valuation of the Utility-based ranking, as well as consider the sources of this valuation (match and search costs) and what types of consumers will value a Utility-based ranking the most. Compared to the literature, however, I find that the default ranking provides consumers with larger benefits than previously perceived. In addition, I consider the effect of the Utility-based ranking on the other two sides of the market, hotels and Expedia. For hotels, the Utility-based ranking leads to a shift towards cheaper hotels with fewer stars, as can be seen in Figure 5. For Expedia, even though transaction prices decrease, the conversion rate under the Utility-based ranking increases even more, leading to an overall gain.

34 A t-test of the difference in the booking window of the two groups shows a difference of −10.85 (t= −3.24) for destination 4562, and −14.01 (t= −2.96) for destination 9402.
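The reordering step behind the Utility-based ranking can be sketched as follows: compute each displayed hotel’s expected utility from its characteristics and the estimated preference coefficients, then sort in decreasing order. The coefficients are those reported in Table 5 for destination 4562; the three hotels, their characteristic values, and the function names are hypothetical and serve only to illustrate the sort.

```python
# Preference coefficients for destination 4562 (Table 5, Panel A).
BETA = {
    "price": -0.1423,      # price measured in $100
    "stars": 0.0240,
    "review": -0.0360,
    "location": 0.0685,
    "chain": -0.0158,
    "promotion": -0.0138,
}

def expected_utility(hotel):
    """Linear index of hotel characteristics (utility shock omitted)."""
    return sum(BETA[name] * hotel[name] for name in BETA)

def utility_based_ranking(displayed_hotels):
    """Reorder the hotels already displayed in a search, best first."""
    return sorted(displayed_hotels, key=expected_utility, reverse=True)

# Hypothetical hotels shown in one search, in their original order.
hotels = [
    {"name": "A", "price": 2.0, "stars": 4, "review": 4.0, "location": 3.0, "chain": 1, "promotion": 0},
    {"name": "B", "price": 1.0, "stars": 3, "review": 4.0, "location": 5.0, "chain": 0, "promotion": 0},
    {"name": "C", "price": 1.5, "stars": 3, "review": 3.5, "location": 1.0, "chain": 0, "promotion": 1},
]
new_order = [h["name"] for h in utility_based_ranking(hotels)]
```

Only the hotels already displayed in the search are reordered, in line with the choice described in footnote 32 not to draw from all hotels available in the destination.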

Figure 5: Utility-based ranking: Gains for hotels

9.2 Mobile Ranking

The benefit of a ranking may vary not only with the value of the hotels displayed, but also with consumers’ ability to discover the hotels displayed. On mobile devices, the smaller size of the screen makes it harder to browse for information than on desktop computers, increasing the importance of the ranking. More precisely, Ghose et al. (2012c) show that moving closer to the top by one position increases the odds of clicking by 37% on mobile devices, compared to only 25% on desktops. I use this result that position effects are 50% higher on mobile devices, to predict consumers’ choices when the same ranking they observed on Expedia on a desktop computer is transferred to a mobile device. As consumers spend more time on mobile devices, it is important to understand how designing more effective rankings may differ by platform or device. My results can be found in Table 7. I find a sizable loss of as much as $16.23 for consumers (9% of the transaction price), as well as fewer clicks and purchases. Surprisingly, a breakdown of this value shows that the loss in consumers’ valuation does not come from higher search costs. Rather, I find that total search costs decrease, as consumers click less (2.94% fewer clicks in the first destination) and click closer to the top of the ranking compared to Expedia’s ranking (average position where transaction occurs drops by one). I thus find that consumers react to higher ranking effects by changing their click behavior so as to minimize their search costs, but this comes at a loss in their match quality with a hotel. More precisely, consumers purchase from more expensive hotels (as much as $4.24 more expensive) and find worse matches decreasing valuation by as much as $14.02. In sum, this result highlights the tension between the magnitude of consumer search costs and the impact of a ranking. If search were frictionless (search costs were zero), then regardless of the quality of the ranking, consumers would make the same choices. 
However, as search costs increase, the need for an improved ranking increases as well, since consumers naturally rely more on options ranked at the top. As a result, even a ranking that performs well on desktops may lead to substantial match and transaction losses on mobile devices, as search costs and thus search behavior are altered on the latter.

Table 7: Counterfactual 2: Mobile ranking results

                                Destination 4562   Destination 9402
                                      M-D                M-D
Change in Consumer Valuation       -$16.23            -$10.85
(% Tran. Price)                     (9.06%)            (6.97%)
Match                              −$14.02             −$8.80
Price                                $4.24              $2.92
Total Search Costs                  −$2.03             −$0.87
% Change in Transactions             1.25%             −1.31%
% Change in Clicks                  −2.94%             −2.51%
Position of Transaction             −1.18              −1.19
Legend: M=Mobile ranking; D=Desktop ranking.

9.3 Chain versus Independent Hotel Rankings

In April 2015, Amazon entered the hotel ranking industry with a distinct focus on ranking independent (non-chain) hotels, citing that these hotels would benefit the most from its rankings.35 Although this may be beneficial for independent hotels, one concern with Amazon’s new approach to ranking hotels is that the decrease in diversity of the ranked hotels will hurt consumers. Dedicating a ranking to only one type of hotel may lower the probability of finding a matching hotel and therefore decrease consumer welfare. In this counterfactual, I explore the impact of lowering diversity by proposing to rank only independent hotels. Specifically, I split Expedia’s ranking into chains versus independent hotels (non-chains) and evaluate consumers’ choices from each separate ranking. For the Independent ranking, I eliminate all the chains displayed in a search and offer consumers all the non-chains in the same order as they were ranked under Expedia’s ranking. I then simulate their clicks and purchases using Weitzman’s (1979) search rules. Similarly, the Chain ranking displays only chains as they were ordered under Expedia’s ranking.

My results can be found in Table 8. Contrary to expectations, the concern about the Independent ranking hurting consumer welfare can be alleviated. More precisely, I find that the Independent ranking decreases match values, but this loss is compensated by even lower search costs, leading to an overall gain of as much as $9.20 (5% of the transaction price) for consumers. In other words, the decrease in search costs owing to the reduction in alternatives presented outweighs the decrease in diversity of options displayed, increasing consumer valuation. Also, the Independent ranking leads to a slight decrease in revenues for the intermediary, a finding that is consistent with Amazon’s persistent belief in promoting consumer welfare, even at the expense of lower revenues.
35 See http://www.wsj.com/articles/amazons-new-travel-service-enters-lucrative-online-travel-market-1429623993
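The construction of the Independent and Chain rankings described above can be sketched as a simple filter that preserves Expedia’s original order. Whether positions are renumbered after filtering is not stated in the text; the sketch assumes that they are, so that the first remaining hotel occupies position 1. The field names and the example search are hypothetical.

```python
def renumber(hotels):
    """Return copies of the hotels with positions 1, 2, ... in list order."""
    return [dict(h, position=i) for i, h in enumerate(hotels, start=1)]

def split_ranking(ranked_hotels):
    """Split one ranked list into (Independent, Chain) rankings.

    The relative order under the original ranking is preserved within each
    group; positions are renumbered (an assumption of this sketch).
    """
    independent = [h for h in ranked_hotels if not h["chain"]]
    chains = [h for h in ranked_hotels if h["chain"]]
    return renumber(independent), renumber(chains)

# Hypothetical search with four displayed hotels.
expedia_ranking = [
    {"id": "a", "chain": True, "position": 1},
    {"id": "b", "chain": False, "position": 2},
    {"id": "c", "chain": True, "position": 3},
    {"id": "d", "chain": False, "position": 4},
]
independent_ranking, chain_ranking = split_ranking(expedia_ranking)
```

Each resulting list can then be passed to the same search simulation used for the other counterfactuals.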

Table 8: Counterfactual 3: Chain versus Independent Hotel Ranking Results

                                 Destination 4562        Destination 9402
                                 I-X        C-X          I-X        C-X
Change in Consumer Valuation    $9.20    −$66.46        $6.23    −$21.68
Match                         −$12.68    −$57.93      −$47.13    −$24.19
Price                          −$2.08     $30.59       −$1.15     $20.47
Total Search Costs            −$19.80    −$22.06      −$52.22    −$22.98
% Change in Transactions        0.66%     −6.31%       −6.59%     −4.07%
% Change in Revenue            −0.50%      9.34%       −7.28%      8.57%
Legend: I=Independent hotel ranking; C=Chain ranking; X=Mix independent hotels and chains.

In contrast, the alternative of offering a ranking dedicated to chains is detrimental to consumer welfare. Here, the loss in match values outweighs the gain in search costs, leading to a loss of at least $21.68 for consumers. The differential effect of the Independent versus the Chain ranking suggests that promoting independent hotels ensures a relatively higher degree of variation in the hotels displayed, thereby mitigating the reduction in options offered. In contrast to the Independent ranking, a Chain ranking would be beneficial for the intermediary, increasing revenues by as much as 9.34%, largely from higher transaction prices.
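The rows of Tables 6 through 8 appear to obey a simple accounting identity: the change in consumer valuation equals the change in match value, minus the change in price paid, minus the change in total search costs. This identity is inferred from the reported numbers rather than stated in the text, so the sketch below should be read as a consistency check under that assumption. The figures are copied from the tables and the function name is hypothetical.

```python
# ((match, price, search costs), reported change in consumer valuation), in dollars.
REPORTED = {
    "Table 6, U-E, 4562": ((16.46, -8.00, -13.90), 38.36),
    "Table 6, U-E, 9402": ((6.56, -5.33, -11.97), 23.86),
    "Table 7, M-D, 4562": ((-14.02, 4.24, -2.03), -16.23),
    "Table 7, M-D, 9402": ((-8.80, 2.92, -0.87), -10.85),
    "Table 8, I-X, 4562": ((-12.68, -2.08, -19.80), 9.20),
    "Table 8, C-X, 4562": ((-57.93, 30.59, -22.06), -66.46),
    "Table 8, I-X, 9402": ((-47.13, -1.15, -52.22), 6.23),
    "Table 8, C-X, 9402": ((-24.19, 20.47, -22.98), -21.68),
}

def implied_valuation_change(match, price, search_costs):
    """Assumed identity: better matches raise value; higher prices and
    higher search costs lower it."""
    return match - price - search_costs

def largest_discrepancy(rows):
    """Largest absolute gap between implied and reported valuation changes."""
    return max(
        abs(implied_valuation_change(*parts) - total)
        for parts, total in rows.values()
    )

gap = largest_discrepancy(REPORTED)
```

Under this reading, every reported valuation change is matched to within about one cent, which is consistent with rounding.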

9.4 Conclusions and Future Research

In this paper, I study the effect of rankings on consumer search and purchase decisions in the context of hotel searches on Expedia. I employ a unique data set from a field experiment fully randomizing the position of hotels in Expedia’s ranking, which allows me to recover the causal effect of rankings on choices. I find that (1) top positions lead to more clicks and purchases, but conditional on a click, higher ranked hotels do not generate more purchases and that (2) rankings mainly affect clicks by reducing search costs, rather than through expectations or direct utility. To quantify these effects, I estimate consumers’ preferences and search costs using a model of sequential search. I use the model’s estimates to construct three counterfactual experiments of interest comparing the value of Expedia’s ranking with that of alternative rankings. First, I show that using the model’s estimates to construct a Utility-based ranking improves matches and lowers consumer search costs. Second, I find that moving Expedia’s ranking to mobile devices, where search costs are larger and position effects are more important, hurts consumers and leads to fewer transactions. Third, I investigate the merits of focusing a ranking on only one type of hotel. Despite concerns about welfare losses due to a reduction in diversity of hotels displayed, I find that a ranking dedicated to Independent hotels may be beneficial for consumers. This finding suggests new avenues for improving the performance of a ranking.

Although this paper makes important steps toward understanding the role of rankings, data availability imposes a number of limitations. First, my data set does not provide enough information to link different searches made by the same consumer. With such data, I would be able to study how different ranking types affect consumer behavior across different searches, rather than within a search. This is an important area of future study inquiring into the nature of the relation between rankings and consumer learning over time. It would also allow me to better understand the source of the magnitude of search costs within a search. Second, the data does not allow comparing the two rankings in quantitative terms. With such data, I would be able both to quantify the endogeneity bias of the ranking and to propose a method to eliminate the endogeneity bias inherent in the ranking. This method can be validated by comparing estimates obtained by using it with those obtained from the Random ranking subsample. I would also compare popular methods used in the literature, such as a control function approach or a simultaneous equation model, to determine which method works best in eliminating the endogeneity bias. This would provide managers as well as academics with a tested method for resolving the endogeneity bias inherent in rankings.

One area of research I plan to focus on next is to propose a new method for improving matches between consumers and hotels without refining Expedia’s current ranking. Previous literature as well as Expedia’s current approach to ranking has been focused on improving the ranking algorithm by discovering which features best predict consumers’ purchase behavior. A different approach would be to split Expedia’s ranking into smaller rankings (for example a Chain or a Non-Chain ranking) and allow consumers to choose which ranking they wish to search from prior to seeing any hotels. The main benefit of this approach lies in increasing the number of “position 1’s”, which I expect will benefit both consumers (through lower search costs) and Expedia (through more purchases).

10 References

1. Anderson, S. P., and R. Renault (2015): “Search Direction,” Working paper.

2. Ansari, A., and C. Mela (2003): “E-customization,” Journal of Marketing Research, 40, 131-145.

3. Ansari, A., S. Essegaier, and R. Kohli (2000): “Internet Recommendation Systems,” Journal of Marketing Research, 37, 363-375.

4. Athey, S., and G. Ellison (2011): “Position Auctions with Consumer Search,” Quarterly Journal of Economics, 126, 1213-1270.

5. Athey, S., and G.W. Imbens (2015): “Machine Learning Methods for Estimating Heterogeneous Causal Effects,” Working paper.

6. Baye, M., B. De los Santos, and M. Wildenbeest (2014): “What’s in a Name? Measuring Prominence, and Its Impact on Organic Traffic from Search Engines,” Working paper.

7. Berman, R., and Z. Katona (2013): “The Role of Search Engine Optimization in Search Marketing,” Marketing Science, 32, 644-651.

8. Bilotkach, V., N. Rupp, and V. Pai (2013): “Value of a Platform to a Seller: Case of American Airlines and Online Travel Agencies,” Working Paper.

9. Blake, T., C. Nosko, and S. Tadelis (2014): “Consumer Heterogeneity and Paid Search Effectiveness: A Large Scale Field Experiment,” Working paper.

10. Breiman, L., J. Friedman, C. Stone, and R. Olshen (1984): “Classification and Regression Trees,” Wadsworth Statistics/Probability.

11. Burges, C., T. Shaked, E. Renshaw, A. Lazier, M. Deeds, N. Hamilton, and G. Hullender (2005): “Learning to Rank Using Gradient Descent,” In Proceedings of the 22nd International Conference on Machine Learning, 89-96. ACM.

12. Burges, C., R. Ragno, and Q.V. Le (2006): “Learning to Rank with Non-Smooth Cost Functions,” Advances in Neural Information Processing Systems.

13. Burges, C. (2010): “From RankNet to LambdaRank to LambdaMART: An Overview,” Technical report, Research Technical Report MSR-TR-2010-82.

14. Chan, T.Y., and Y.H. Park (2015): “Consumer Search Activities and the Value of Ad Positions in Sponsored Search Advertising,” Marketing Science, Articles in Advance, 1-18.

15. Chapelle, O., and Y. Chang (2011): “Yahoo! Learning to Rank Challenge Overview,” Journal of Machine Learning Research-Proceedings Track, 14, 1-24.

16. Chen, Y., and S. Yao (2014): “Sequential Search with Refinement: Model and Application with Click-stream Data,” Working paper.

17. Cross, R., J. Higbie, and D. Cross (2009): “Revenue management’s renaissance: a rebirth of the art and science of profitable revenue generation,” Cornell Hospitality Quarterly, 50, 56-81.

18. De los Santos, B., and S. Koulayev (2014): “Optimizing Click-through in Online Rankings for Partially Anonymous Consumers,” Working paper.

19. De los Santos, B., A. Hortaçsu, and M. Wildenbeest (2012): “Testing Models of Consumer Search Using Data on Web Browsing and Purchasing Behavior,” American Economic Review, 102, 2455-2480.

20. De Corniere, A., and G. Taylor (2014): “Quality Provision in the Presence of a Biased Intermediary,” Working paper.

21. Einav, L., T. Kuchler, J. Levin, and N. Sundaresan (2015): “Assessing Sale Strategies in Online Markets Using Matched Listings,” American Economic Journal: Microeconomics, 7(2), 215-247.

22. Ghose, A., P. Ipeirotis, and B. Li (2012a): “Designing Ranking Systems for Hotels on Travel Search Engines by Mining User-Generated and Crowd-Sourced Content,” Marketing Science, 31, 492-520.

23. Ghose, A., P. Ipeirotis, and B. Li (2012b): “Surviving Social Media Overload: Predicting Consumer Footprints on Product Search Engines,” Working paper.

24. Ghose, A., A. Goldfarb, and S.P. Han (2012c): “How Is the Mobile Internet Different? Search Costs and Local Activities,” Information Systems Research, Articles in Advance, 1-19.

25. Ghose, A., P. Ipeirotis, and B. Li (2013): “Examining the Impact of Ranking on Consumer Behavior and Search Engine Revenue,” forthcoming at Management Science.

26. Ghose, A., and S. Yang (2009): “An Empirical Analysis of Search Engine Advertising: Sponsored Search in Electronic Markets,” Management Science, 55, 1605-1622.

27. Hagiu, A., and B. Jullien (2011): “Why do Intermediaries Divert Search?,” RAND Journal of Economics, 42, 337-362.

28. Hausman, J. (1996): “Valuation of New Goods Under Perfect and Imperfect Competition,” in The Economics of New Goods, Studies in Income and Wealth, ed. by T. Bresnahan, and R. Gordon, 207-248. National Bureau of Economic Research.

29. Yoganarasimhan, H. (2014): “Search Personalization,” Working paper.

30. Hong, H., and M. Shum (2006): “Using Price Distributions to Estimate Search Cost,” RAND Journal of Economics, 37, 257-275.

31. Honka, E. (2014): “Quantifying search and switching costs in the U.S. auto insurance industry,” forthcoming in RAND Journal of Economics.

32. Honka, E., and P. Chintagunta (2014): “Simultaneous or Sequential? Search Strategies in the U.S. Auto Insurance Industry,” Working paper.

33. Hortaçsu, A., and C. Syverson (2004): “Product Differentiation, Search Costs, and Competition in the Mutual Fund Industry: A Case Study of S&P 500 Index Funds,” Quarterly Journal of Economics, 119, 403-456.

34. Jerath, K., L. Ma, Y. Park, and K. Srinivasan (2011): “A Position Paradox in Sponsored Search Auctions,” Marketing Science, 30, 612-627.

35. Jeziorski, P., and S. Moorthy (2014): “Advertiser Prominence Effects in Search Advertising,” Working paper.

36. Jeziorski, P., and I. Segal (2012): “What Makes them Click: Empirical Analysis of Consumer Demand for Search Advertising,” Working paper.

37. Kihlstrom, R. E., and M. H. Riordan (1984): “Advertising as a Signal,” Journal of Political Economy, 92, 427-450.

38. Kim, J. B., P. Albuquerque, and B. J. Bronnenberg (2010): “Online Demand under Limited Consumer Search,” Marketing Science, 29, 1001-1023.

39. Koulayev, S. (2014): “Search for Differentiated Products: Identification and Estimation,” RAND Journal of Economics, 45, 553-575.

40. Manchanda, P., P.E. Rossi, and P.K. Chintagunta (2004): “Response Modeling with Nonrandom Marketing-Mix Variables,” Journal of Marketing Research, 41, 467-478.

41. Manski, C., and S. Lerman (1981): “On the Use of Simulated Frequencies to Approximate Choice Probabilities,” in C. Manski and D. McFadden, eds., Structural Analysis of Discrete Data with Econometric Applications, MIT Press, Cambridge, MA, 305-319.

42. Mauri, A.G. (2013): “Hotel Revenue Management: Principles and Practices,” Pearson.

43. McFadden, D. (1989): “A Method of Simulating Moments for Estimation of Discrete Response Models Without Numerical Integration,” Econometrica, 57, 995-1026.

44. Mehta, N., S. Rajiv, and K. Srinivasan (2003): “Price Uncertainty and Consumer Search: A Structural Model of Consideration Set Formation,” Marketing Science, 22, 58-84.

45. Moraga-Gonzalez, J.L., Z. Sandor, and M. Wildenbeest (2015): “Consumer Search and Prices in the Automobile Market,” Working paper.

46. Narayanan, S., and K. Kalyanam (2014): “Position Effects in Search Advertising: A Regression Discontinuity Approach,” Working paper.

47. Rao, A., and S. Akca (2015): “Value of Search Aggregators,” Working Paper.

48. Seiler, S. (2013): “The Impact of Search Costs on Consumer Behavior: A Dynamic Approach,” Quantitative Marketing and Economics, 11, 155-203.

49. Train, K. (2009): “Discrete Choice Methods with Simulation,” Cambridge University Press.

50. Varian, H. R. (2007): “Position Auctions,” International Journal of Industrial Organization, 25, 1163-1178.

51. Weitzman, M. L. (1979): “Optimal search for the best alternative,” Econometrica, 47, 641-654.

52. Yang, S., and A. Ghose (2010): “Analyzing the Relationship Between Organic and Sponsored Search Advertising: Positive, Negative, or Zero Interdependence?,” Marketing Science, 29, 602-623.

53. Yao, S., and C. F. Mela (2011): “A Dynamic Model of Sponsored Search Advertising,” Marketing Science, 30, 447-468.

11 Appendix A: Data Cleaning

The training data set contains 9,917,530 observations. I make two necessary changes to the raw data set and use 7,986,074 observations for my analysis.

1. First, the data set contains some errors in the way price information was stored. Some hotel prices are either very high, more than $19 million per night, or very low, $0.01 per night. I used the following method to remove searches that include such outliers. In the data set I observe not only the average price displayed for a hotel, but also the total spent by the consumer (i.e. price multiplied by the number of nights and number of hotel rooms booked, plus taxes and fees). I used these two numbers to correct for outliers. More precisely, I removed searches that contain at least one observation where the total amount spent exceeds the price paid multiplied by the length of the trip and the number of rooms booked, plus taxes (not exceeding 30% of the price). This meant removing 1,618,626 observations.

2. Second, I choose to focus on “typical” searches and remove searches that include prices lower than $10 or higher than $1000 per night. This further reduces the data set by 312,830 observations. I focus only on these searches for two reasons. First, not having very high or very low prices helps mitigate the first problem above for searches not ending in a transaction, but which are likely to suffer from the same problems. Second, there are very few searches that include such extreme prices. The histogram in Figure 6 below shows that hotels with prices close to $1000 are very rare.

Figure 6: Histogram of Prices Displayed by Hotels

[Plot: histogram with Price (0 to 1000) on the horizontal axis and Fraction (0 to 0.1) on the vertical axis.]

Note: Very few hotels have prices close to $1000.
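The two cleaning steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (`search_id`, `price`, `nights`, `rooms`, `total`) are hypothetical, and the first rule is read as "total spent may not exceed price × nights × rooms plus at most 30% in taxes and fees", which is one possible reading of the description.

```python
def clean_searches(rows, min_price=10.0, max_price=1000.0, max_tax_rate=0.30):
    """Drop every search that contains at least one flagged observation.

    rows: list of dicts with hypothetical keys 'search_id', 'price',
    'nights', 'rooms' and 'total' (None when the hotel was not booked).
    """
    bad = set()
    for r in rows:
        # Step 2 in the text: keep only "typical" nightly prices.
        if not (min_price <= r["price"] <= max_price):
            bad.add(r["search_id"])
        # Step 1 (assumed reading): the total spent cannot exceed
        # price x nights x rooms plus at most 30% in taxes and fees.
        if r["total"] is not None:
            cap = r["price"] * r["nights"] * r["rooms"] * (1 + max_tax_rate)
            if r["total"] > cap:
                bad.add(r["search_id"])
    # Remove whole searches, not just the offending observations.
    return [r for r in rows if r["search_id"] not in bad]


# Made-up example rows, for illustration only.
rows = [
    {"search_id": 1, "price": 120.0, "nights": 2, "rooms": 1, "total": 270.0},
    {"search_id": 1, "price": 95.0, "nights": 2, "rooms": 1, "total": None},
    {"search_id": 2, "price": 0.01, "nights": 1, "rooms": 1, "total": None},
    {"search_id": 2, "price": 150.0, "nights": 1, "rooms": 1, "total": None},
    {"search_id": 3, "price": 100.0, "nights": 1, "rooms": 1, "total": 500.0},
]
cleaned = clean_searches(rows)
```

Filtering at the search level rather than the observation level mirrors the text, which removes a search as soon as one of its observations is flagged.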

12 Appendix B: Further Evidence

12.1 Further Evidence for Section 4: The Experiment

In this section, I test formally for the two types of randomness present in my data set: (i) consumers were randomly assigned to the two types of rankings and (ii) in constructing the Random ranking, a hotel’s quality did not affect its probability of being placed in any position, but rather the position of the hotel was randomly determined. These claims are also supported informally by discussions with the administrator of the competition.

12.1.1 Random assignment of consumers to the two types of rankings

To show that consumers were randomly assigned to each type of ranking, I perform two tests. First, I test whether the time of arrival at the OTA’s website is related to the type of ranking the consumer saw. One concern may be that different types of consumers visit the website at different times of the day, and if the probability of observing one type of ranking differs across times of the day, then this could bias the results. For example, suppose business travelers were known to search for hotels after 5pm and are also more likely to purchase. If the probability of observing the Random ranking were lower after 5pm, there would be a correlation between higher purchases and Expedia’s ranking in the data that is not due to the causal effect of the ranking observed, but rather to the way in which consumers were assigned to different rankings. However, Figure 7 shows that this is not a concern in the data. More precisely, the left panel plots the number of search impressions made by the time of the day and shows that more searches occur in the afternoon and evening. The right panel plots the fraction of search impressions seeing the Random ranking every 30 seconds during the course of one day, in the entire data set. Even though more consumers are searching in the second part of the day, the fraction seeing the Random ranking is constant throughout the day. Thus, this figure suggests that consumers were randomly assigned to seeing either type of ranking.

Figure 7: Number of search impressions displayed and the fraction seeing the Random ranking every 30 seconds in a day

[Plot, left panel "Number of search impressions": Number of search impressions (0 to 200) against Time of the day (0am to 0am). Right panel "Fraction of random search impressions": Pc. of search impressions with random ranking (0 to 0.8) against Time of the day; series: Percent with Random Ranking, Fitted Values, 95% CI.]

The second test I perform to check whether consumers were randomly assigned to see each ranking is to check whether consumer characteristics observed by the OTA prior to showing a ranking are different between the two rankings. When the consumer arrives at the OTA’s website, she reveals details of her upcoming trip, such as her destination, the length of the trip, how long in advance she is searching for, the number of travelers and rooms requested, as well as whether her trip includes a Saturday night. For some consumers, the OTA also has historical information that is revealed when the consumer arrives at the website. One concern might be that the OTA takes all of this information into account when it decides which search impressions see the Random ranking and which see Expedia’s ranking.

Table 9: T-test (Expedia-Random) ranking: Consumer observables

Destination             All         All         4562        4562        9402        9402        8347        8347
Conversion              No Tran     Tran        No Tran     Tran        No Tran     Tran        No Tran     Tran
Trip Length (days)      0.1928∗∗∗   0.0614∗∗∗   0.1138      0.0686      0.1788      0.2212      -0.1466     0.7559
                        (0.0197)    (0.0155)    (0.1762)    (0.3523)    (0.2188)    (0.2190)    (0.2001)    (0.5166)
Booking Window (days)   5.0308∗∗∗   1.0879∗     -6.9376     -3.0726     10.5048     -4.0115     -1.9349     -3.8941
                        (0.4611)    (0.4372)    (4.0245)    (7.7610)    (6.2182)    (7.4734)    (4.5702)    (11.2487)
Adults                  0.0455∗∗∗   -0.0128     0.0197      -0.0124     0.0055      0.2043      -0.0354     0.0817
                        (0.0066)    (0.0084)    (0.0556)    (0.1250)    (0.0800)    (0.1216)    (0.0757)    (0.2351)
Children                0.0517∗∗∗   0.0001      -0.0000     0.0316      0.0104      -0.0686     -0.1746∗    -0.1489
                        (0.0059)    (0.0073)    (0.0404)    (0.0919)    (0.0669)    (0.1063)    (0.0721)    (0.2536)
Rooms                   0.0057      -0.0031     -0.0137     -0.0788     -0.0337     0.0330      -0.0866∗∗   0.0554
                        (0.0034)    (0.0041)    (0.0274)    (0.0609)    (0.0356)    (0.0544)    (0.0279)    (0.0895)
Saturday Night          -0.0243∗∗∗  -0.0063     -0.0178     -0.0126     0.0588      -0.0077     0.0501      -0.0452
                        (0.0038)    (0.0049)    (0.0301)    (0.0809)    (0.0451)    (0.0743)    (0.0305)    (0.1127)
Consumer Hist. Stars    -0.0052     0.0511      -0.2330     0.8855      -1.0850     0.5469      -0.0383     0.7916
                        (0.0355)    (0.0287)    (0.3028)    (.)         (.)         (0.4111)    (0.6993)    (.)
Consumer Hist. Price    -5.4904     0.1174      -21.9471    30.5742     -203.1883   75.1268     -31.1367    23.6600
                        (5.3397)    (4.4236)    (62.5292)   (.)         (.)         (73.3137)   (46.9590)   (.)
Observations            115,776     201,442     1,321       1,331       781         1,389       942         950

Standard errors in parentheses. ∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001
Note: (.) denotes the fact that there are no consumers for which there is historical information who purchased under the Random ranking.

Table 9 shows that this is also not a concern. Comparing search impressions with the same conversion across the two rankings by means of a t-test, I find that consumers seeing Expedia’s ranking have very similar characteristics to those seeing the Random ranking. Although in the full sample (first two columns) some of these differences are statistically significant, their magnitude is very small and the significance disappears when I condition on a particular destination. Combined, these findings suggest that there are no systematic differences in consumer observables across the two different types of rankings. This finding alleviates some concerns about the two groups of consumers being comparable. In particular, to the extent that these observables reveal information about both preferences and search costs, this finding suggests that consumers may be comparable across the two rankings. Moreover, one indication of consumer search costs is the booking window, the number of days before the trip that the consumer searches. In the full sample, consumers under the Expedia ranking search 1-5 days further in advance of their trip, and may therefore have lower search costs than consumers under the Random ranking. This finding contradicts the hypothesis that consumers under the Random ranking have lower search costs because they click even under a ranking that is inferior. This effect goes away when I condition on a destination, further cementing the idea that consumers searching under the rankings are similar.
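As an illustration of the comparison reported in Table 9, the following Python sketch computes a difference in means between two groups together with its standard error and t-statistic. It assumes the unequal-variance (Welch) form of the t-test, which may differ from the exact variant used for the table, and the input values are made up.

```python
import math


def mean_diff_test(a, b):
    """Difference in means (a minus b), its standard error and t-statistic.

    Uses the unequal-variance (Welch) standard error; this is an assumption,
    not necessarily the variant behind Table 9.
    """
    n_a, n_b = len(a), len(b)
    m_a, m_b = sum(a) / n_a, sum(b) / n_b
    # Unbiased sample variances of each group.
    v_a = sum((x - m_a) ** 2 for x in a) / (n_a - 1)
    v_b = sum((x - m_b) ** 2 for x in b) / (n_b - 1)
    diff = m_a - m_b
    se = math.sqrt(v_a / n_a + v_b / n_b)
    return diff, se, diff / se


# Hypothetical trip lengths (days) under Expedia's and the Random ranking.
expedia = [2, 3, 4, 3]
random_rank = [1, 2, 3, 2]
diff, se, t_stat = mean_diff_test(expedia, random_rank)
```

Each cell of the table corresponds to one such difference (Expedia minus Random) with its standard error in parentheses.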

12.1.2 The Random ranking

The second type of randomness in my data set comes from the construction of the Random ranking. As stated in the competition description, Expedia’s approach is a learning to rank approach.36 This means that the hotel’s position at a point in time depends on its past performance. A machine learning algorithm computes a score for the hotel based on its past conversion and click through rates, its characteristics and its match with the consumer search query entries. A higher score is interpreted as a more desirable hotel that has a higher probability of purchase. Hotels are then ranked in decreasing order of these scores. This method for ranking hotels makes the position of the hotel endogenous. In contrast, under the Random ranking, hotels are randomly assigned to positions, as stated by the competition administrator.37 To show that this is indeed the case in the data, I run a rank ordered logit regression of position on hotel past conversion rates and its characteristics to mimic what a learning to rank algorithm does. My results can be found in Table 10 below. For this test, I restrict my attention to four of the largest destinations in my data set.38 With very few exceptions, past performance of the hotel or its characteristics do not determine its position within the Random ranking, while under Expedia’s ranking there is a strong correlation between these characteristics and the hotel’s position.39 Moreover, this result provides important insights into how Expedia’s ranking is constructed. Expedia’s ranking favors non-chains that were purchased more often in the past, that are cheaper, of higher quality and that are on promotion. The fact that the Random ranking is constructed by ignoring past performance of the hotel or its match with the consumer search query makes individual hotels have a more even probability of being displayed. This symptom can be illustrated with a simple plot.

In Figure 8, I plot the histogram (with a normal density) of the number of times a hotel in destination 4562 is displayed on the first page of results under each type of ranking. I find that under Expedia’s ranking a few hotels are displayed 800 times (out of approximately 1600 possible search impressions), while the median hotel is only displayed 35 times. This observation is consistent with the idea of Expedia’s ranking oversampling some hotels as the ranking algorithm displays more favorable hotels more frequently. In contrast, the right panel shows the histogram of the number of times a hotel in the same destination is displayed under the Random ranking. Here the distribution looks closer to a normal distribution, with the mean and median number of displays around 52 (out of 980 possible search impressions). This finding is consistent with the idea that under the Random ranking, hotels have a more even probability of being displayed. The same pattern that appears in these graphs holds even if I only look at hotels in top positions instead of all displayed or if I look across different destinations.

36 https://www.kaggle.com/c/expedia-personalized-sort/forums/t/5808/position-benchmark.
37 https://www.kaggle.com/c/expedia-personalized-sort/forums/t/5772/meaning-of-random-bool.
38 Destination 8192 has the largest number of observations (121,522), but has few observations with the Random ranking, so I choose to focus on the next four largest destinations. See Appendix B 12.8 for summary statistics.
39 In Appendix B 12.1, I show a symptom of displaying hotels based on past performance under Expedia’s ranking, that of Expedia oversampling a small set of hotels to display at the top of the ranking.

Table 10: Effect of a hotel’s past conversion rate and characteristics on position by ranking type

                Destination 4562        Destination 9402        Destination 8347        Destination 13870
                Random      Expedia     Random      Expedia     Random      Expedia     Random      Expedia
                Position    Position    Position    Position    Position    Position    Position    Position
Past CR         0.5893      2.7753∗∗∗   1.1671      6.6953∗∗∗   0.4661      3.1106∗∗∗   1.9734      5.7512∗∗∗
                (0.5558)    (0.2656)    (0.7776)    (0.3642)    (0.6967)    (0.2829)    (1.5981)    (0.5137)
Price           -0.0008∗∗∗  -0.0061∗∗∗  -0.0005     -0.0086∗∗∗  -0.0002     -0.0021∗∗∗  -0.0011     -0.0028∗∗∗
                (0.0002)    (0.0003)    (0.0003)    (0.0003)    (0.0004)    (0.0003)    (0.0008)    (0.0003)
Stars           0.0253      0.4373∗∗∗   0.0604      1.0209∗∗∗   0.0254      0.3185∗∗∗   0.0124      0.5431∗∗∗
                (0.0292)    (0.0275)    (0.0381)    (0.0300)    (0.0441)    (0.0292)    (0.0701)    (0.0262)
Review Score    -0.0442∗    0.0852∗∗∗   -0.0809∗    0.5471∗∗∗   -0.0562     0.2600∗∗∗   0.0273      0.3158∗∗∗
                (0.0183)    (0.0190)    (0.0356)    (0.0391)    (0.0304)    (0.0327)    (0.0711)    (0.0377)
Chain           -0.1110∗∗   -0.1188∗∗∗  0.0201      0.4860∗∗∗   0.0780      -0.3143∗∗∗  0.0104      -0.3501∗∗∗
                (0.0384)    (0.0282)    (0.0500)    (0.0358)    (0.0626)    (0.0324)    (0.0904)    (0.0330)
Location Score  0.0312∗     0.4543∗∗∗   -0.0231     0.4350∗∗∗   0.0734∗∗    0.0911∗∗∗   -0.0100     0.1708∗∗∗
                (0.0130)    (0.0178)    (0.0150)    (0.0182)    (0.0243)    (0.0184)    (0.0291)    (0.0155)
Promotion       0.1020∗     0.6865∗∗∗   0.1076∗     1.1274∗∗∗   0.1717∗∗    0.8561∗∗∗   -0.0061     0.8488∗∗∗
                (0.0410)    (0.0280)    (0.0539)    (0.0307)    (0.0553)    (0.0355)    (0.0925)    (0.0310)
Observations    26,397      50,435      19,530      46,368      15,171      39,507      6,702       46,198
Log likelihood  -12,233     -21,077     -8,668      -17,619     -6,890      -16,718     -3,023      -18,435

Standard errors in parentheses. ∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001
Rank ordered logit regression with dependent variable position. A positive coefficient means correlation with a top position. Positions greater than five are coded as incomplete. This is motivated by the observation that the learning to rank algorithm is engineered to correctly predict choices in top positions, with lower penalties for predicting a lower position wrong. As a result, to test whether the same algorithm is at play behind both rankings I focus on top positions.

Figure 8: Number of times a hotel is displayed by search impression type: Destination 4562

[Plot, left panel "Expedia’s Ranking": histogram of Fraction (0 to 0.6) against Number of times the hotel is displayed (0 to 800). Right panel "Random Ranking": histogram of Fraction (0 to 0.15) against Number of times the hotel is displayed (0 to 150).]
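To make the regression behind Table 10 more concrete, the sketch below evaluates the log-likelihood contribution of one search impression in a rank ordered logit where only the top positions are treated as ranked. It assumes the standard "exploded logit" form, in which each of the top five positions is a multinomial logit choice among the hotels not yet placed; the function name and inputs are hypothetical, and the exact specification estimated in the paper may differ.

```python
import math


def rank_ordered_loglik(index_by_position, top_k=5):
    """Log-likelihood of one observed ranking under an exploded logit.

    index_by_position[p] is the linear index x'beta of the hotel shown in
    position p + 1. Only the first top_k positions are treated as ranked;
    hotels below them are left unordered ("incomplete" ranking).
    """
    ll = 0.0
    for r in range(min(top_k, len(index_by_position))):
        # Hotels still available when position r + 1 is filled.
        remaining = index_by_position[r:]
        m = max(remaining)  # subtract the max for numerical stability
        log_denom = m + math.log(sum(math.exp(v - m) for v in remaining))
        ll += index_by_position[r] - log_denom
    return ll


# With beta = 0 every ordering is equally likely, as under a random ranking.
ll_flat = rank_ordered_loglik([0.0, 0.0, 0.0], top_k=2)
# An ordering that puts high-index hotels first is more likely than its reverse.
ll_sorted = rank_ordered_loglik([2.0, 1.0, 0.0], top_k=2)
ll_reversed = rank_ordered_loglik([0.0, 1.0, 2.0], top_k=2)
```

Under a truly random ranking the estimated coefficients should be close to zero, which is the pattern the Random columns of Table 10 mostly show.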

12.2 Further Evidence for Section 4.1: Learning to Rank Algorithm

In this Appendix, I describe the basic learning to rank problem and its evaluation, as well as the winning algorithm for the Expedia challenge on Kaggle, LambdaMART.40 This algorithm also won the 2010 Yahoo! Learning to Rank Challenge (Chapelle and Chang, 2011) and has been used by Yoganarasimhan (2014) to improve on the winning algorithm of the competition using personalization. Yoganarasimhan (2014) provides a more in depth discussion of similar concepts. I will first describe how rankings are evaluated and then present the formal learning problem.

12.2.1 Evaluation

The most commonly used evaluation method in the learning to rank literature is NDCG (Normalized Discounted Cumulative Gain). It is also used by Kaggle to evaluate the quality of the rankings proposed in the Expedia challenge, as well as in many other settings (for example the Yandex competition studied by Yoganarasimhan, 2014). I will thus also use this metric and describe it briefly here. Denote by $k$ the number of possible hotels to be ranked on one results page (in my application, this was 38). Define the Discounted Cumulative Gain (DCG) as

$$DCG_k = \sum_{p=1}^{k} \frac{2^{rel_p} - 1}{\log_2(p + 1)}$$

where $rel_p$ gives the relevance assigned by the consumer to the hotel in position $p$. For example, for the Expedia competition if a consumer purchased the hotel in position $p$, then $rel_p$ was assigned a value of 5, if the consumer clicked in position $p$ then $rel_p = 1$ and $rel_p = 0$ if the consumer did not consider the hotel in position $p$. The NDCG is formed by dividing the DCG by the ideal DCG (IDCG), which is the DCG of a ranking ordered by the revealed relevance scores of the consumer. As a result, $NDCG \in [0, 1]$. As an illustration of this metric, consider the following example. Suppose there are three hotels A, B, C and a consumer is shown these hotels in this order. Further suppose that the consumer clicks on A, purchases from B and does not consider C. In this case,

$$DCG_3 = \frac{2^1 - 1}{\log_2(2)} + \frac{2^5 - 1}{\log_2(3)} + 0 \approx 20.56$$

The ideal ranking would have displayed B before A, in which case

$$IDCG_3 = \frac{2^5 - 1}{\log_2(2)} + \frac{2^1 - 1}{\log_2(3)} + 0 \approx 31.63$$

It follows that the proposed ranking A, B, C achieved a score of $NDCG_3 = \frac{DCG_3}{IDCG_3} = 0.65$. As a comparison, the winning algorithm for Expedia’s competition on Kaggle earned $NDCG_{38} = 0.54$, improving on Expedia’s default ranking, which scored $NDCG_{38} = 0.50$.

40 Winning algorithms by Owen Z. and Jun Wang are available at https://www.kaggle.com/c/expedia-personalized-sort/details/winners
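The worked example with hotels A, B and C can be reproduced with a short Python sketch. The function names are my own, and the relevance scores (5 for a purchase, 1 for a click, 0 otherwise) follow the competition convention described above.

```python
import math


def dcg(relevances):
    """Discounted Cumulative Gain of relevance scores listed by position."""
    return sum(
        (2 ** rel - 1) / math.log2(p + 1)
        for p, rel in enumerate(relevances, start=1)
    )


def ndcg(relevances):
    """DCG divided by the DCG of the ideal (relevance-sorted) ranking."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Hotels shown in order A, B, C: A clicked (1), B purchased (5), C ignored (0).
shown = [1, 5, 0]
dcg_3 = dcg(shown)                          # about 20.56
idcg_3 = dcg(sorted(shown, reverse=True))   # about 31.63
ndcg_3 = ndcg(shown)                        # about 0.65
```

Because the ideal ranking is obtained by sorting the same relevance scores, the ratio always lies between 0 and 1 and equals 1 only when the displayed order already puts the most relevant hotels first.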

12.2.2 Learning problem

Machine learning is a subfield of computer science that studies algorithms to learn patterns and make predictions from data. Learning to rank algorithms are an example of such algorithms with the goal of ranking documents based on relevance. In the current application, a document is a hotel and its characteristics. The system, Expedia, maintains a collection of these hotels and when a consumer makes a search query, it proceeds to rank the hotels. The ranking task can thus be summarized by a ranking model $f(x_{ih}, \delta)$ that takes the characteristics $x_{ih}$ of hotel $h$ in response to a query by consumer $i$ and computes a score $\hat{s}_{ih} = f(x_{ih}, \delta)$ such that hotels with a higher score are ranked closer to the top. Note that a ranking model $f(x_{ih}, \delta)$ can be constructed even without learning by only taking into account the characteristics of the hotel, such as price and number of stars. In contrast, a learning ranking model exploits the availability of data on so called relevance scores of the users. What this means is that in many instances (including in the present application) data on consumer clicks or bookings are available. These data give an indication of how relevant the ranking proposed was for the consumer. Learning to rank algorithms thus use consumer observed choices to construct $f(x_{ih}, \delta)$ in addition to the characteristics of the hotel. The purpose of the model is then to learn a function that will rank the most relevant hotels at the top. More concretely, this means that the goal is to find a function that will rank the hotel that will be purchased by the consumer at the top (the most relevant), followed by hotels that the consumer will click and finally by those that will not be considered by the consumer. Denote by $s_{ih}$ the score given by consumer $i$ to hotel $h$, where $s_{ih}$ is highest if $i$ purchases $h$ (for example, for the competition, $s_{ih} = 5$ if $h$ was purchased by $i$, $s_{ih} = 1$ if $h$ was clicked and zero if the consumer did not consider $h$).
The purpose of the learning to rank algorithm is then to take the data on hotel characteristics $x_{ih}$ for a query performed by $i$ and $i$’s clicks/purchases $s_{ih}$ for each $h$ and learn a function

$$f(x_{ih}, \delta) = \hat{s}_{ih}$$

so that the ranking order of the predicted scores $\hat{s}_{ih}$ is exactly equal to the ranking order of the observed $s_{ih}$. Consider two hotels $h$ and $j$. The observed relevance score for a consumer can be interpreted as

$$\begin{cases} s_{ih} > s_{ij}, & \text{if } h \text{ is preferred to } j \\ s_{ih} = s_{ij}, & \text{if } h \text{ is equal to } j \\ s_{ih} < s_{ij}, & \text{if } j \text{ is preferred to } h \end{cases} \tag{16}$$

The researcher then proposes a model to predict how relevant a consumer will find two hotels. A ranking algorithm that is at the basis of LambdaMART is called RankNet (see Burges et al, 2005, 2010). This method models the probability that a hotel $h$ is more relevant than $j$ using a sigmoid function as follows

$$P = \frac{1}{1 + e^{-\sigma(\hat{s}_{ih} - \hat{s}_{ij})}}$$

where $\sigma$ determines the shape of the sigmoid function. The negative log-likelihood (cross-entropy) of observing the data is then given by

$$LL = -\bar{P} \log(P) - (1 - \bar{P}) \log(1 - P)$$

where the $\bar{P}$ are the actual probabilities observed in the data. Define

$$y_{ihj} = \begin{cases} 1, & \text{if } s_{ih} > s_{ij} \\ 0, & \text{if } s_{ih} = s_{ij} \\ -1, & \text{if } s_{ih} < s_{ij} \end{cases} \tag{17}$$

Using this new notation, so that $\bar{P} = \frac{1}{2}(1 + y_{ihj})$, the expression becomes

$$LL = \frac{1}{2}(1 - y_{ihj})\,\sigma(\hat{s}_{ih} - \hat{s}_{ij}) + \log\left(1 + e^{-\sigma(\hat{s}_{ih} - \hat{s}_{ij})}\right)$$

When the function $f$ is known, estimation proceeds by maximum likelihood, that is, by minimizing $LL$, to recover the parameters $\delta$ of the function. However, when the function $f$ is not known, both it and its parameters must be recovered through estimation. LambdaMART and other learning to rank algorithms provide a solution to this problem. The solution starts by using a stochastic gradient descent algorithm to determine $\delta$ iteratively by updating

$$\delta \to \delta - \eta \frac{\partial LL}{\partial \delta}$$

where

$$\frac{\partial LL}{\partial \delta} = \lambda \left( \frac{\partial \hat{s}_{ih}}{\partial \delta} - \frac{\partial \hat{s}_{ij}}{\partial \delta} \right) \quad \text{and} \quad \lambda = \sigma \left( \frac{1}{2}(1 - y_{ihj}) - \frac{1}{1 + e^{\sigma(\hat{s}_{ih} - \hat{s}_{ij})}} \right)$$

and where $\eta$ is the rate at which the researcher wants the algorithm to learn. These results follow from differentiation. However, Burges et al. (2006) show that modifying the expression for $\lambda$ so that it is weighted by the change in NDCG from changing two hotels’ positions performs better. More precisely, for a pair in which $h$ is more relevant than $j$ ($y_{ihj} = 1$), this means defining $\lambda$ as

$$\lambda = \frac{-\sigma}{1 + e^{\sigma(\hat{s}_{ih} - \hat{s}_{ij})}} \, |\Delta NDCG|$$

where $|\Delta NDCG| = |NDCG(\hat{s}_i) - NDCG(\hat{s}_i^{h,j})|$ and $NDCG(\hat{s}_i^{h,j})$ is the NDCG score of $\hat{s}_i$ with the entries for $h$ and $j$ switched. The model is then trained using gradient boosted regression trees (MART, Multiple Additive Regression Trees). A regression tree predicts an outcome by partitioning the predictor variables into sets and choosing the thresholds that define these sets from the data. Gradient boosted methods, in addition, fit each new tree to the residuals left by the trees already fitted (see Breiman et al. (1984) for an introduction).

Learning to rank is a supervised learning task and thus uses training, validation and testing phases. What this means is that the researcher usually has access to three types of data sets. The training data set typically consists of search queries performed by the consumer (the details of the trip that the consumer requested) and the ranking she was displayed (the ordered list of hotels she observes in response to her query). It also includes how relevant the consumer perceived the ranking to be. This relevance is manifested in clicks and/or bookings of the hotels displayed. Hotels booked are interpreted to be the most relevant, followed by those clicked and lastly by those not clicked. In technical terms, the training data set is considered labeled. The validation data set contains the same features as the training data set, while the testing data set contains the displayed hotels, but does not reveal consumers’ clicks and purchases. The researcher estimates several models on the training data set, and tests which model performs best in terms of out of sample prediction on the validation data set. The chosen model is then used to predict choices made on the test data set that does not contain relevance scores. In technical terms, the test data is not labeled.
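The pairwise RankNet quantities used in this section can be checked numerically with a small Python sketch. It is only an illustration: the function names are hypothetical, the pairwise cost is the cross-entropy between the observed outcome and the sigmoid probability, and $\lambda$ is its derivative with respect to the score difference, which the last lines verify by a finite difference.

```python
import math


def pair_cost(s_h, s_j, y, sigma=1.0):
    """Cross-entropy cost for one hotel pair; y is 1, 0 or -1."""
    d = s_h - s_j
    return 0.5 * (1 - y) * sigma * d + math.log(1 + math.exp(-sigma * d))


def pair_lambda(s_h, s_j, y, sigma=1.0):
    """Derivative of pair_cost with respect to the predicted score of h."""
    d = s_h - s_j
    return sigma * (0.5 * (1 - y) - 1 / (1 + math.exp(sigma * d)))


def lambdarank_lambda(s_h, s_j, delta_ndcg, sigma=1.0):
    """Lambda for a pair with h more relevant than j, scaled by |delta NDCG|."""
    return -sigma / (1 + math.exp(sigma * (s_h - s_j))) * abs(delta_ndcg)


# Finite-difference check that pair_lambda is the gradient of pair_cost.
eps = 1e-6
numeric = (pair_cost(0.3 + eps, -0.2, 1) - pair_cost(0.3 - eps, -0.2, 1)) / (2 * eps)
analytic = pair_lambda(0.3, -0.2, 1)
```

With equal predicted scores and $h$ preferred, the cost equals $\log 2$ and $\lambda = -\sigma/2$, so a gradient descent step raises the score of $h$ relative to $j$; the NDCG weighting then scales this push by how much swapping the two hotels would change the metric.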

12.3 Further Evidence for Section 4.1: Hotel characteristics clicked and purchased

Table 11: Hotel characteristics clicked by search impression type

                        No Tran.                        Tran.                           Diff. (Random − Expedia)
                        Random          Expedia         Random          Expedia         No Tran.    Tran.
                        Mean    SD      Mean    SD      Mean    SD      Mean    SD
Price                   147.81  94.81   156.13  95.52   117.00  56.76   118.77  58.58   -8.32∗∗∗    -1.77∗∗
Stars
  Less than 3           0.15    0.36    0.10    0.30    0.20    0.40    0.17    0.38    0.05∗∗∗     0.03∗∗∗
  3                     0.40    0.49    0.34    0.47    0.46    0.50    0.44    0.50    0.05∗∗∗     0.02∗∗∗
  4                     0.36    0.48    0.44    0.50    0.30    0.46    0.34    0.48    -0.08∗∗∗    -0.05∗∗∗
  5                     0.09    0.29    0.11    0.32    0.04    0.20    0.05    0.22    -0.02∗∗∗    -0.01∗∗∗
Review Score
  Less than 2.5         0.07    0.25    0.04    0.18    0.03    0.17    0.03    0.16    0.03∗∗∗     0.01∗∗∗
  Between 2.5 and 3     0.09    0.29    0.07    0.26    0.10    0.30    0.08    0.28    0.02∗∗∗     0.02∗∗∗
  Between 3.5 and 4     0.48    0.50    0.52    0.50    0.52    0.50    0.53    0.50    -0.03∗∗∗    -0.01∗∗
  Between 4.5 and 5     0.35    0.48    0.38    0.49    0.35    0.48    0.36    0.48    -0.02∗∗∗    -0.01
Chain                   0.59    0.49    0.64    0.48    0.69    0.46    0.67    0.47    -0.05∗∗∗    0.02∗∗∗
Location Score          2.85    1.53    3.22    1.51    2.57    1.38    2.78    1.44    -0.37∗∗∗    -0.21∗∗∗
Promotion               0.24    0.43    0.36    0.48    0.24    0.43    0.31    0.46    -0.11∗∗∗    -0.07∗∗∗

Significance of differences obtained by means of a t-test. ∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001

Table 12: Hotel characteristics of purchased hotels by search impression type

                        Random Ranking      Expedia’s Ranking   Diff. (Random − Expedia)
                        Mean      SD        Mean      SD
Price                   116.55    55.50     118.07    57.42     -1.52∗∗
Stars
  Less than 3           0.20      0.40      0.17      0.38      0.03∗∗∗
  3                     0.46      0.50      0.44      0.50      0.02∗∗∗
  4                     0.30      0.46      0.34      0.47      -0.04∗∗∗
  5                     0.04      0.20      0.05      0.22      -0.01∗∗
Review Score
  Less than 2.5         0.03      0.17      0.02      0.15      0.01∗∗∗
  Between 2.5 and 3     0.10      0.30      0.08      0.27      0.02∗∗∗
  Between 3.5 and 4     0.52      0.50      0.53      0.50      -0.02∗∗
  Between 4.5 and 5     0.35      0.48      0.36      0.48      -0.01
Chain                   0.70      0.46      0.67      0.47      0.03∗∗∗
Location Score          2.54      1.37      2.74      1.43      -0.20∗∗∗
Promotion               0.24      0.43      0.31      0.46      -0.07∗∗∗

Significance of differences obtained by means of a t-test. ∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001

12.4 Further Evidence for Section 5.1: Robustness Checks for the Effect of Rankings on Search and Purchases

12.4.1 Click through rate

Figure 9: Click through rate by results position and search impression type: Search impressions with at least 30 hotels (median) displayed

[Plot: Click through rate (0 to 0.2) against Position (0 to 40); series: Expedia’s Ranking, Random Ranking, 95% CI.]

Figure 10: Click through rate by results position and search impression type: Search impressions ending in a transaction

[Plot: Click through rate (0 to 0.2) against Position (0 to 40); series: Expedia Ranking, Random Ranking, 95% CI.]

Figure 11: Click through rate by results position and search impression type: Search impressions not ending in a transaction

[Plot: Click through rate (0 to 0.2) against Position (0 to 40); series: Expedia Ranking, Random Ranking, 95% CI.]

Figure 12: Click through rate by results position and search impression type: Search impressions with positions reserved for opaque offers

[Plot: Click through rate (0 to 0.2) against Position (0 to 40); series: Expedia’s Ranking, Random Ranking, 95% CI.]

12.4.2 Conversion Rate

Figure 13: Conversion rate (CR) conditional on a click by results position and search impression type: Position 1-38

[Plot, two panels: Conversion rate against Position (0 to 40), one panel for Expedia’s Ranking and one for the Random Ranking, each with a 95% CI.]

Note: Restrict attention to search impressions ending in a transaction.

Figure 14: Conversion rate unconditional on click by results position and search impression type

[Plot, two panels: Unconditional conversion rate against Position (0 to 40), one panel for Expedia’s Ranking and one for the Random Ranking, each with a 95% CI.]

Note: Restrict attention to search impressions ending in a transaction.

12.4.3 Characteristics Displayed by Position

In this Appendix, I plot the average characteristics of the hotels displayed by position under each type of ranking. I consider the average price, the number of stars and the review score of the hotels displayed.41 I restrict attention to the first 30 positions in the ranking, because of the high volatility in the number of observations for positions higher than that. This allows me to plot approximately 90% of the data set in these figures.

Figure 15: Characteristics Displayed by Position: Price

[Plot, two panels ("Ending in transaction" and "Not ending in transaction"): Price against Results position in search impression (0 to 30); series: Expedia Ranking, Random Ranking, 95% CI.]

Figure 16: Characteristics Displayed by Position: Stars

[Plot, two panels ("Ending in transaction" and "Not ending in transaction"): Stars against Results position in search impression (0 to 30); series: Expedia Ranking, Random Ranking, 95% CI.]

41 The figures for the other characteristics (e.g. fraction of chains displayed) are available upon request.

Figure 17: Characteristics Displayed by Position: Review Score

[Plot, two panels ("Ending in transaction" and "Not ending in transaction"): Review Score against Results position in search impression (0 to 30); series: Expedia Ranking, Random Ranking, 95% CI.]

12.5 Further Evidence for Section 5.1: Effect of position on consumer clicks and purchases: Random and Expedia ranking

Table 13: The effect of position on clicks and purchases conditional on clicks: Random ranking

| | Click (Random) | Tran (Random) | Tran/Click (Random) | Tran/Click (Random) |
|---|---|---|---|---|
| Position | -0.00177∗∗∗ (0.00002) | -0.00012∗∗∗ (0.00001) | -0.00020 (0.00014) | -0.00066 (0.00065) |
| Price | -0.00013∗∗∗ (0.00000) | -0.00002∗∗∗ (0.00000) | -0.00022∗∗∗ (0.00002) | 0.00001 (0.00014) |
| Stars | 0.01571∗∗∗ (0.00039) | 0.00117∗∗∗ (0.00010) | 0.00218 (0.00242) | -0.00012 (0.01164) |
| Review Score | 0.00084∗∗ (0.00030) | 0.00019∗ (0.00008) | 0.00543∗∗ (0.00200) | 0.01652 (0.01153) |
| Chain | 0.00137∗ (0.00058) | 0.00021 (0.00015) | 0.00562 (0.00348) | 0.03417∗ (0.01639) |
| Location Score | 0.00513∗∗∗ (0.00020) | 0.00062∗∗∗ (0.00005) | 0.00856∗∗∗ (0.00126) | 0.01954∗∗ (0.00682) |
| Promotion | 0.01018∗∗∗ (0.00062) | 0.00154∗∗∗ (0.00016) | 0.01549∗∗∗ (0.00342) | 0.06238∗∗∗ (0.01517) |
| Destination fixed effects | Yes | Yes | Yes | Yes |
| Observations | 683,799 | 683,799 | 28,653 | 2,127 |
| Adjusted R² | 0.014 | 0.002 | 0.031 | 0.070 |

Standard errors in parentheses. ∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001

Note: Linear probability model of clicks/transactions as a function of hotel characteristics and position. The last column reports regression coefficients for purchases conditional on a click in searches ending in a transaction. I restricted attention to destinations with at least 10,000 observations. This allows me to include destination fixed effects in the regression.

Table 14: The effect of position on clicks and purchases conditional on clicks: Expedia ranking

| | Click (Expedia) | Tran (Expedia) | Tran/Click (Expedia) | Tran/Click (Expedia) |
|---|---|---|---|---|
| Position | -0.00252∗∗∗ (0.00001) | -0.00203∗∗∗ (0.00001) | -0.00427∗∗∗ (0.00018) | -0.00238∗∗∗ (0.00015) |
| Price | -0.00018∗∗∗ (0.00000) | -0.00015∗∗∗ (0.00000) | -0.00096∗∗∗ (0.00003) | -0.00017∗∗∗ (0.00003) |
| Stars | 0.00740∗∗∗ (0.00026) | 0.00554∗∗∗ (0.00022) | 0.00908∗∗ (0.00295) | 0.00867∗∗∗ (0.00247) |
| Review Score | 0.00133∗∗∗ (0.00026) | 0.00170∗∗∗ (0.00022) | 0.01641∗∗∗ (0.00319) | 0.01372∗∗∗ (0.00265) |
| Chain | 0.00321∗∗∗ (0.00035) | 0.00253∗∗∗ (0.00031) | 0.00689 (0.00393) | 0.00807∗ (0.00329) |
| Location Score | 0.00278∗∗∗ (0.00016) | 0.00292∗∗∗ (0.00014) | 0.02804∗∗∗ (0.00190) | 0.01265∗∗∗ (0.00160) |
| Promotion | 0.00390∗∗∗ (0.00034) | 0.00401∗∗∗ (0.00030) | 0.01790∗∗∗ (0.00364) | 0.01558∗∗∗ (0.00306) |
| Destination fixed effects | Yes | Yes | Yes | Yes |
| Observations | 1,646,147 | 1,646,147 | 63,832 | 53,702 |
| Adjusted R² | 0.026 | 0.024 | 0.101 | 0.017 |

Standard errors in parentheses. ∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001

Note: Linear probability model of clicks/transactions as a function of hotel characteristics and position. The last column reports regression coefficients for purchases conditional on a click in searches ending in a transaction. I restricted attention to destinations with at least 10,000 observations. This allows me to include destination fixed effects in the regression.
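The role of the destination fixed effects in these linear probability models can be illustrated with a minimal sketch. It is simplified to a single regressor (position), whereas the tables also control for hotel characteristics, and the function name and toy data are hypothetical. Demeaning clicks and position within each destination and regressing one on the other gives the same slope as including destination dummies.

```python
from collections import defaultdict

def within_slope(rows):
    """Slope of y on x after demeaning both within each group.

    rows: iterable of (group, x, y), e.g. (destination, position, clicked).
    Hypothetical helper; one regressor only, unlike the tables above.
    """
    rows = list(rows)
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # group -> [sum_x, sum_y, n]
    for g, x, y in rows:
        s = sums[g]
        s[0] += x
        s[1] += y
        s[2] += 1
    num = den = 0.0
    for g, x, y in rows:
        sx, sy, n = sums[g]
        xd = x - sx / n  # position relative to destination mean
        yd = y - sy / n  # click indicator relative to destination mean
        num += xd * yd
        den += xd * xd
    return num / den

# Toy data (made up): two destinations, four displayed positions each.
toy = [
    ("A", 1, 1), ("A", 2, 1), ("A", 3, 0), ("A", 4, 0),
    ("B", 1, 1), ("B", 2, 0), ("B", 3, 0), ("B", 4, 0),
]
slope = within_slope(toy)  # negative: lower-ranked hotels are clicked less
```

Because each destination's own mean is removed, differences in average click rates across destinations do not contribute to the estimated position coefficient.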

12.6 Further Evidence for Section 6: Price Endogeneity Concern

Figure 18: Histogram of the adjusted R² from running separate regressions of price on observable characteristics for each hotel in a destination

[Figure omitted. Four panels: Destination 4562, Destination 9402, Destination 8347, Destination 13870. Horizontal axis: adjusted R-squared (0 to 1). Vertical axis: fraction.]

Figure 19: Graph of the adjusted R² from running separate regressions of price on observable characteristics for each hotel in Destination 9402

[Figure omitted. Three panels: Hotel Stars (series: Stars <=3, Stars >3); Chain (series: Independent, Chain); Hotel Location Score (series by location score relative to the median). Horizontal axis: number of times that hotel is displayed (0 to 1000). Vertical axis: adjusted R-squared (0 to 1).]

12.7 Further Evidence for Section 7.2: Observable characteristics explain most of the variation in prices

Table 15: Predicting price with observable characteristics

| Destination | 4562 | 4562 | 4562 | 4562 | 9402 | 8347 | 13870 |
|---|---|---|---|---|---|---|---|
| Trip characteristics | | | | | | | |
| Trip Length (days) | | 1.3723∗∗∗ (0.1232) | 1.1195∗∗∗ (0.1266) | 1.6314∗∗∗ (0.1168) | 2.4010∗∗∗ (0.1403) | 0.1274 (0.1144) | 0.3313∗∗∗ (0.0945) |
| Adults | | 9.8311∗∗∗ (0.3201) | 9.5846∗∗∗ (0.3215) | 8.1004∗∗∗ (0.2943) | 3.2964∗∗∗ (0.3018) | 4.4646∗∗∗ (0.2219) | 3.4776∗∗∗ (0.1853) |
| Children | | 13.9633∗∗∗ (0.4252) | 13.8446∗∗∗ (0.4255) | 11.3240∗∗∗ (0.3887) | 6.2819∗∗∗ (0.3050) | 3.4076∗∗∗ (0.2049) | 3.5209∗∗∗ (0.1211) |
| Rooms | | -4.7943∗∗∗ (0.7051) | -4.9863∗∗∗ (0.7058) | -3.9110∗∗∗ (0.6439) | -5.2620∗∗∗ (0.7167) | -3.6941∗∗∗ (0.6114) | -8.1948∗∗∗ (0.4839) |
| Saturday Night | | 0.0200 (1.0484) | 0.3161 (1.0485) | -0.4788 (0.9573) | 0.0522 (1.1569) | 1.6732 (0.9064) | 4.0309∗∗∗ (0.6819) |
| Search characteristics | | | | | | | |
| Booking Window (days) | | | 0.0729∗∗∗ (0.0071) | 0.0482∗∗∗ (0.0065) | -0.0323∗∗∗ (0.0060) | -0.0243∗∗∗ (0.0065) | -0.0904∗∗∗ (0.0043) |
| Time of day: 9am-6pm | | | -0.9025 (0.6637) | -0.6583 (0.6045) | -2.5246∗∗∗ (0.6086) | -1.6776∗∗ (0.5673) | -0.5700 (0.4203) |
| Time of day: 6pm-midnight | | | -2.8066∗∗∗ (0.7235) | -2.0094∗∗ (0.6595) | -2.8446∗∗∗ (0.6395) | -2.3766∗∗∗ (0.6447) | -1.3685∗∗ (0.4328) |
| Weekend | | | -0.7446 (0.5574) | -0.0903 (0.5085) | -1.9055∗∗∗ (0.4848) | 0.2331 (0.5049) | 0.8280∗∗ (0.3115) |
| Competition | | | | | | | |
| Avg. prices of similar hotels | | | | -1.7492∗∗∗ (0.0165) | -2.0091∗∗∗ (0.0199) | -1.8734∗∗∗ (0.0256) | -2.0930∗∗∗ (0.0226) |
| Promotion | | | | -21.2012∗∗∗ (0.5656) | -26.1707∗∗∗ (0.5952) | -17.7625∗∗∗ (0.6917) | -14.1015∗∗∗ (0.5572) |
| Hotel and trip date fixed effects | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Observations | 60,968 | 60,968 | 60,968 | 58,486 | 58,855 | 38,598 | 51,458 |
| Adjusted R² | 0.766 | 0.774 | 0.775 | 0.813 | 0.801 | 0.842 | 0.842 |

Standard errors in parentheses. ∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001

Note: OLS regression with dependent variable price. Time of day of the search is with respect to the left out variable, searches performed between midnight and 9am (local time). The average price of similar hotels is computed as the average price of hotels with the same number of stars and reviews and same type (chain or independent) as the focal hotel for the same trip date. I restrict attention to hotels that are displayed at least 100 times to be able to include hotel fixed effects in all specifications above.
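The "average price of similar hotels" regressor described in the note can be sketched as follows. This is an illustration under stated assumptions, not the paper's code: the field names are hypothetical, each row is one hotel on one trip date, and the focal hotel is excluded from its own average (the note does not say whether it is included).

```python
from collections import defaultdict

def similar_hotel_avg_price(listings):
    """Average price of the other listings sharing trip date, stars,
    review score and chain status with each listing.

    Assumption: leave-one-out average; None when no similar hotel exists.
    """
    totals = defaultdict(lambda: [0.0, 0])  # cell -> [price sum, count]
    for h in listings:
        key = (h["trip_date"], h["stars"], h["review"], h["chain"])
        totals[key][0] += h["price"]
        totals[key][1] += 1
    result = []
    for h in listings:
        key = (h["trip_date"], h["stars"], h["review"], h["chain"])
        total, n = totals[key]
        result.append((total - h["price"]) / (n - 1) if n > 1 else None)
    return result

# Made-up rows: the first two hotels are "similar", the third is alone.
rows = [
    {"trip_date": "d1", "stars": 3, "review": 4.0, "chain": True, "price": 100.0},
    {"trip_date": "d1", "stars": 3, "review": 4.0, "chain": True, "price": 140.0},
    {"trip_date": "d1", "stars": 4, "review": 4.5, "chain": False, "price": 220.0},
]
competition = similar_hotel_avg_price(rows)
```

Hotels with no similar competitor on the same trip date receive no value under this convention, so a specification using the regressor would have fewer usable rows.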

12.8 Further Evidence for Sections 4.1 and 8: Summary Statistics for the Four Destinations used in Estimation

For the structural estimation, I choose to focus on the four largest destinations in my data set: 4562, 9402, 8347, 13870.42 This allows me to control for differences across destinations and not confound results. These destinations are in the largest country, 219, which I have shown earlier is likely the U.S. I summarize their characteristics in Tables 16, 17 and 18. For estimation, I restrict attention to search impressions of the same length to ensure that position effects are estimated consistently. More precisely, if across ranking types some search impressions were longer than others, then the estimated position effect may be in part driven by the fact that one ranking type has shorter search impressions, potentially leading to a larger position effect. I thus focus on search impressions longer than the average search impression length (25). These destinations have a total of 124,475 observations with 29.4% of those from the Random ranking. This is comparable to the 33% for the full data set. There are almost five thousand searches and 1,493 hotels, the majority of which appear in both rankings.43 Each search impression has at least one click and there are 2,727 transactions.

Table 16: Summary statistics by destination

| | 4562 | 9402 | 8347 | 13870 |
|---|---|---|---|---|
| Observations | 49,400 | 33,175 | 24,900 | 17,000 |
| Observations with Random Ranking | 17,000 | 10,850 | 6,275 | 2,475 |
| Search Impressions | 1,976 | 1,327 | 996 | 680 |
| Hotels | 568 | 313 | 433 | 179 |
| Hotels Displayed in Both Rankings | 486 | 296 | 426 | 124 |
| Clicks | 2,229 | 1,458 | 1,108 | 740 |
| Transactions | 1,021 | 821 | 498 | 387 |
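As a quick arithmetic check, the totals quoted in the text can be recomputed from the per-destination counts in Table 16. The dictionary keys below are hypothetical labels; the numbers are copied from the table.

```python
# Per-destination counts from Table 16, in the order 4562, 9402, 8347, 13870.
table16 = {
    "observations": [49400, 33175, 24900, 17000],
    "random_observations": [17000, 10850, 6275, 2475],
    "search_impressions": [1976, 1327, 996, 680],
    "hotels": [568, 313, 433, 179],
    "transactions": [1021, 821, 498, 387],
}

total_obs = sum(table16["observations"])
random_share = sum(table16["random_observations"]) / total_obs
total_searches = sum(table16["search_impressions"])
total_hotels = sum(table16["hotels"])
total_transactions = sum(table16["transactions"])

# Rows contributed by each search impression, per destination.
rows_per_impression = [
    obs / n for obs, n in zip(table16["observations"], table16["search_impressions"])
]
```

The sums reproduce the 124,475 observations, the 29.4% Random share, the 1,493 hotels and the 2,727 transactions reported in the text, and each search impression contributes exactly 25 rows in every destination.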

Table 17 reveals that these destinations have search query characteristics similar to those of the full sample. Search queries are for trips that are on average three days long, and are searched approximately 10 days earlier than in the full sample. All other characteristics align closely with the ones in the full sample. Table 18 is the equivalent of Table 2 for these four destinations. I split the data into search impressions ending in a transaction and those that do not. I find that Expedia’s ranking displays more expensive hotels of higher quality, regardless of whether the search impression ends in a transaction. As in the full data set, search impressions ending in a transaction are generally those that are cheaper.

42Destination 8192 has the largest number of observations (121,522), but has few observations with the Random ranking, so I choose to focus on the next two largest destinations that have a fraction of Random rankings that is closer to the one in the full data set.

43Recall that I only observe the first page of results, so even though a particular destination contains a fixed number of hotels and both rankings display the same hotels, they need not list the same hotels on the first page, which is why I do not expect that all hotels will appear on the first page of results under both rankings.

Table 17: Summary statistics: Search impressions by destination

| | 4562 | 9402 | 8347 | 13870 |
|---|---|---|---|---|
| Trip Length (days) | 3.21 | 2.66 | 3.74 | 2.86 |
| Booking Window (days) | 48.72 | 46.77 | 46.80 | 46.15 |
| Saturday Night (percent) | 0.43 | 0.48 | 0.37 | 0.42 |
| Adults | 1.90 | 1.98 | 2.33 | 2.22 |
| Children | 0.26 | 0.36 | 0.98 | 1.29 |
| Rooms | 1.07 | 1.09 | 1.10 | 1.09 |
| Chain (percent) | 0.59 | 0.67 | 0.70 | 0.75 |
| Promotion (percent) | 0.39 | 0.32 | 0.47 | 0.24 |
| Random Ranking (percent) | 0.34 | 0.33 | 0.25 | 0.15 |
| Total Clicks | 1.13 | 1.10 | 1.11 | 1.09 |
| Two or More Clicks (percent) | 0.07 | 0.06 | 0.05 | 0.06 |
| Total Transactions | 0.52 | 0.62 | 0.50 | 0.57 |
| Observations | 1,976 | 1,327 | 996 | 680 |

Table 18: Hotel characteristics displayed by search impression type and conversion

| | Random, no transaction: Mean | SD | Random, transaction: Mean | SD | Expedia, no transaction: Mean | SD | Expedia, transaction: Mean | SD |
|---|---|---|---|---|---|---|---|---|
| Destination 4562 | | | | | | | | |
| Price | 234.58 | 132.83 | 222.62 | 114.54 | 257.78 | 113.51 | 236.61 | 104.69 |
| Stars | 3.32 | 0.90 | 3.41 | 0.90 | 3.75 | 0.74 | 3.71 | 0.77 |
| Review Score | 3.83 | 0.95 | 3.83 | 0.97 | 4.04 | 0.75 | 4.00 | 0.80 |
| Chain | 0.68 | 0.47 | 0.69 | 0.46 | 0.57 | 0.49 | 0.53 | 0.50 |
| Location Score | 3.79 | 1.84 | 3.87 | 1.89 | 4.93 | 1.38 | 5.07 | 1.30 |
| Promotion | 0.26 | 0.44 | 0.23 | 0.42 | 0.44 | 0.50 | 0.47 | 0.50 |
| Destination 9402 | | | | | | | | |
| Price | 191.46 | 104.79 | 170.23 | 96.59 | 215.80 | 105.51 | 197.60 | 92.41 |
| Stars | 3.17 | 0.90 | 3.10 | 0.88 | 3.56 | 0.82 | 3.54 | 0.86 |
| Review Score | 3.91 | 0.71 | 3.89 | 0.70 | 4.15 | 0.52 | 4.09 | 0.57 |
| Chain | 0.72 | 0.45 | 0.69 | 0.46 | 0.66 | 0.48 | 0.65 | 0.48 |
| Location Score | 3.41 | 1.58 | 3.37 | 1.59 | 4.47 | 1.20 | 4.48 | 1.18 |
| Promotion | 0.18 | 0.38 | 0.19 | 0.39 | 0.39 | 0.49 | 0.40 | 0.49 |
| Destination 8347 | | | | | | | | |
| Price | 128.02 | 82.90 | 105.06 | 53.86 | 160.97 | 111.55 | 142.03 | 76.04 |
| Stars | 3.13 | 0.69 | 3.04 | 0.64 | 3.50 | 0.75 | 3.55 | 0.66 |
| Review Score | 3.89 | 0.83 | 3.81 | 0.74 | 4.09 | 0.63 | 4.11 | 0.54 |
| Chain | 0.77 | 0.42 | 0.81 | 0.39 | 0.75 | 0.43 | 0.64 | 0.48 |
| Location Score | 2.82 | 0.95 | 2.91 | 0.99 | 2.70 | 0.91 | 2.85 | 0.83 |
| Promotion | 0.31 | 0.46 | 0.28 | 0.45 | 0.39 | 0.49 | 0.59 | 0.49 |
| Destination 13870 | | | | | | | | |
| Price | 126.04 | 62.77 | 123.17 | 51.68 | 138.93 | 72.54 | 146.44 | 72.96 |
| Stars | 2.74 | 0.72 | 2.80 | 0.67 | 3.03 | 0.75 | 3.17 | 0.74 |
| Review Score | 3.70 | 0.70 | 3.72 | 0.64 | 3.86 | 0.61 | 3.95 | 0.52 |
| Chain | 0.77 | 0.42 | 0.79 | 0.41 | 0.75 | 0.43 | 0.74 | 0.44 |
| Location Score | 3.05 | 1.35 | 2.81 | 1.40 | 3.27 | 1.25 | 3.62 | 1.00 |
| Promotion | 0.16 | 0.36 | 0.17 | 0.38 | 0.21 | 0.41 | 0.27 | 0.44 |

12.9 Further Evidence for Section 6: Evidence on Click Order

In this Appendix, I show that the position of a hotel and its price are strong predictors of the order in which consumers click. To do so, I use the companion data set from WCAI that contains information on consumers’ click order in the form of time stamps associated with each click. I then ask what fraction of searches with at least two clicks had a click order that matched the position of the hotels clicked. I also compare this fraction with the fraction ordered by price. Table 19 shows my results. I find that in 35% of all searches with at least two clicks and in the majority of searches (65%) with exactly two clicks the position of the hotel exactly matches the click order of the consumer. In contrast, the price of the hotels clicked only matches the order of 20% of the clicks. This finding allows me to model consumers’ click order even in the absence of information on the order of clicks.

Table 19: Percentage of clicks ordered by price, position or either one: Evidence from Manhattan (WCAI)

| Percentage | Price | Position | Price or Position |
|---|---|---|---|
| Searches with at least two clicks | 20 | 35 | 40 |
| Searches with exactly two clicks | 49 | 65 | 77 |
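The shares in Table 19 can be thought of as the output of a check like the one sketched below. The data layout and function names are hypothetical, and two conventions are assumptions of this sketch rather than statements from the paper: a click order "matches" position when clicks proceed from the top of the list downward, and it "matches" price when clicks proceed from cheaper to more expensive hotels.

```python
def in_ascending_order(values):
    """True if the sequence never decreases (ties count as a match)."""
    return all(a <= b for a, b in zip(values, values[1:]))

def share_matching(searches, keys):
    """Share of searches with at least two clicks whose click order is
    ascending in at least one of the attributes in `keys`.

    searches: list of searches; each search is a list of click dicts
    sorted by time stamp. Assumes at least one eligible search.
    """
    eligible = [s for s in searches if len(s) >= 2]
    hits = sum(
        any(in_ascending_order([click[k] for click in s]) for k in keys)
        for s in eligible
    )
    return hits / len(eligible)

# Made-up clickstreams, in time-stamp order.
searches = [
    [{"position": 1, "price": 120}, {"position": 3, "price": 90}],
    [{"position": 5, "price": 80}, {"position": 2, "price": 100}],
    [{"position": 4, "price": 100}, {"position": 1, "price": 60}],
    [{"position": 2, "price": 70}],  # single click: not eligible
]
by_position = share_matching(searches, ["position"])
by_price = share_matching(searches, ["price"])
by_either = share_matching(searches, ["price", "position"])
```

Passing both attributes counts a search once if either ordering is respected, which corresponds to the "Price or Position" column.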

12.10 Further Evidence for Section 8: Estimation using Conditional Logit Model of the Probability of a Click

The utility of consumer $i$ of a click on hotel $j$ is given by

$$u_{ij} = x_j \beta + \alpha p_{ij} + \gamma\, \text{position}_j + \nu_{ij}$$

where $x_j$ gives the characteristics of hotel $j$ and $p_{ij}$ gives its price at the time of consumer $i$’s search. The outside option is modeled as the decision to click on the last hotel shown in the ranking. Results mirror my findings in Table 25 of a large endogeneity bias inherent in Expedia’s ranking.
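For reference, the click probability implied by this utility specification can be written out as a sketch, under the standard conditional logit assumption (not stated explicitly in the text) that the error term is i.i.d. type I extreme value. The set $J_i$ of hotels displayed to consumer $i$ is notation introduced here for illustration.

$$\Pr(i \text{ clicks } j) = \frac{\exp\left(x_j \beta + \alpha p_{ij} + \gamma\, \text{position}_j\right)}{\sum_{l \in J_i} \exp\left(x_l \beta + \alpha p_{il} + \gamma\, \text{position}_l\right)}$$

Only utility differences across the displayed hotels matter in this expression, so the choice of the last displayed hotel as the reference alternative affects normalization rather than the relative click probabilities.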

Table 20: Conditional logit with dependent variable: Click

| | 4562 Random | 4562 Expedia | 9402 Random | 9402 Expedia | 8347 Random | 8347 Expedia | 13870 Random | 13870 Expedia |
|---|---|---|---|---|---|---|---|---|
| Position | -0.0642∗∗∗ (0.0055) | -0.0761∗∗∗ (0.0043) | -0.0423∗∗∗ (0.0067) | -0.0843∗∗∗ (0.0055) | -0.0505∗∗∗ (0.0089) | -0.0570∗∗∗ (0.0054) | -0.0514∗∗∗ (0.0138) | -0.0548∗∗∗ (0.0061) |
| Stars | 0.5660∗∗∗ (0.0700) | 0.7698∗∗∗ (0.0629) | 0.8514∗∗∗ (0.0926) | 0.4948∗∗∗ (0.0748) | 0.7842∗∗∗ (0.1225) | 0.5880∗∗∗ (0.0779) | 0.5361∗∗ (0.1986) | 0.3803∗∗∗ (0.0794) |
| Review Score | 0.0324 (0.0461) | 0.0357 (0.0378) | 0.1332 (0.1026) | 0.4950∗∗∗ (0.0943) | -0.0240 (0.0889) | 0.2713∗∗ (0.0882) | 0.3772 (0.2355) | 0.0355 (0.1117) |
| Location Score | 0.5280∗∗∗ (0.0347) | 0.5915∗∗∗ (0.0415) | 0.2977∗∗∗ (0.0409) | 0.2770∗∗∗ (0.0438) | 0.1122 (0.0762) | 0.3072∗∗∗ (0.0530) | 0.2329∗ (0.0908) | 0.1676∗∗∗ (0.0461) |
| Chain | 0.0296 (0.0870) | 0.1622∗ (0.0657) | -0.0114 (0.1230) | 0.2451∗∗ (0.0862) | -0.2013 (0.1579) | -0.3220∗∗∗ (0.0808) | 0.3242 (0.2998) | 0.0671 (0.1071) |
| Promotion | 0.0285 (0.0899) | -0.0389 (0.0620) | 0.0647 (0.1190) | -0.1778∗ (0.0774) | 0.2555 (0.1505) | 0.1231 (0.0909) | 0.3799 (0.2577) | 0.3257∗∗ (0.1000) |
| Price | -0.0103∗∗∗ (0.0008) | -0.0218∗∗∗ (0.0008) | -0.0101∗∗∗ (0.0011) | -0.0173∗∗∗ (0.0010) | -0.0063∗∗∗ (0.0015) | -0.0133∗∗∗ (0.0011) | -0.0105∗∗ (0.0035) | -0.0064∗∗∗ (0.0012) |
| Observations | 17,000 | 32,400 | 10,850 | 22,325 | 6,275 | 18,625 | 2,475 | 14,525 |
| Log-likelihood | -2,059 | -3,702 | -1,370 | -2,670 | -797 | -2,323 | -327 | -1,877 |

Standard errors in parentheses. ∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001

12.11 Further Evidence for Section 7.3: Simulation results accounting for order

Data generation for this section and for section 7.3: To generate the data set used for the Monte Carlo simulation, I determine hotel and consumer characteristics as follows:

• Stars follow a normal distribution with mean 3 and standard deviation 0.5

• Reviews follow a normal distribution with mean 3.5 and standard deviation 0.55

• Location scores follow a normal distribution with mean 3 and standard deviation 0.6

• Price follows a log-normal distribution with mean 1 and standard deviation 0.3

• Chain and promotion follow a uniform distribution with 40% chains, 70% promotions

• Booking window is generated for each consumer and has equal probability of being 0, 0.5, 1, 1.5.
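The data-generating process listed above can be sketched as follows. This is an illustration, not the paper's code. Several choices are assumptions made here: the log-normal parameters are read as the mean and standard deviation of the underlying normal, chain and promotion are drawn as independent indicators with the stated shares, and hotels are drawn independently for each consumer (which does not reproduce rankings shared across consumers). The sample sizes are the 2,000 consumers and five firms used in this section.

```python
import random

def simulate_hotel(rng):
    """Draw one hotel's characteristics from the distributions in the list above."""
    return {
        "stars": rng.gauss(3.0, 0.5),
        "review": rng.gauss(3.5, 0.55),
        "location": rng.gauss(3.0, 0.6),
        # Assumption: (1, 0.3) parameterize the underlying normal.
        "price": rng.lognormvariate(1.0, 0.3),
        "chain": rng.random() < 0.40,      # 40% chains
        "promotion": rng.random() < 0.70,  # 70% promotions
    }

def simulate_consumer(rng, n_hotels=5):
    """Draw one consumer: a booking window and the hotels displayed to them."""
    return {
        "booking_window": rng.choice([0, 0.5, 1, 1.5]),
        "hotels": [simulate_hotel(rng) for _ in range(n_hotels)],
    }

rng = random.Random(0)  # fixed seed so the sketch is reproducible
consumers = [simulate_consumer(rng) for _ in range(2000)]
```

The resulting 2,000 consumers with five hotels each give 10,000 consumer-hotel rows.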

Specifically for this section: In section 7.3, I have described simulation results showing that estimating the model using Simulated Maximum Likelihood recovers parameters well for data without information about consumers’ click order. However, when such information is available and there is reason to expect individual consumer heterogeneity to affect consumers’ search order, accounting for this order is crucial for properly recovering preference and search cost parameters, as I show in this section. To this end, I generate a data set of 2,000 consumers, each searching among five firms. The data set is generated so that there are many instances in which different consumers face the same ranking of hotels, but click in a different order because of search cost heterogeneity. I will show that ignoring the fact that individual heterogeneity affects consumer click order in estimation (when these data are available) leads to substantial bias in estimating search costs. I estimate a model similar to the one in section 6 of this paper, except that I allow consumer heterogeneity to affect consumer search costs as follows

$$c_{ij} = \exp(k + \gamma W_i + \delta \rho_{ij} + \eta_{ij}) \qquad (18)$$

where $k$ gives the baseline search costs of the consumer, $W_i$ denotes the booking window (the number of days before the trip that the consumer searches), $\rho_{ij}$ gives the position of the hotel in the ranking that the consumer observes, and $\eta_{ij}$ follows $N(0, \sigma^2)$ and provides the source of individual heterogeneity.44 Adding this idiosyncratic shock $\eta_{ij}$ to consumers’ search costs is meant to capture the idea that, conditional on observables, consumers might click on different hotels in a different order, which reveals important information about their search costs. For example, if otherwise similar consumers were shown the same ranking {A, B}, but one clicked on hotel A first, while the other on hotel B first, then this model would capture this information about their behavior as variation in the unobserved shock to their search costs.

44See Moraga-Gonzalez, Sandor and Wildenbeest (2015) for a similar approach in estimating a simultaneous search model.
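A minimal sketch of the search cost specification in equation (18) shows how the idiosyncratic shock can reverse the cost ranking of two hotels for otherwise identical consumers. The function name is hypothetical, and the parameter and shock values are illustrative choices, not estimates.

```python
import math

def search_cost(k, gamma, delta, booking_window, position, eta=0.0):
    """Search cost from equation (18): exp(k + gamma*W + delta*rho + eta)."""
    return math.exp(k + gamma * booking_window + delta * position + eta)

# Illustrative parameters (assumed for this example only).
k, gamma, delta = -2.5, -2.0, 0.5

# Ranking {A, B}: hotel A in position 1, hotel B in position 2; same booking window.
# Consumer 1 has no shocks, so the lower-ranked hotel B is costlier to search.
cost_A_1 = search_cost(k, gamma, delta, booking_window=0, position=1)
cost_B_1 = search_cost(k, gamma, delta, booking_window=0, position=2)

# Consumer 2 draws a negative shock for hotel B large enough to offset its position.
cost_A_2 = search_cost(k, gamma, delta, booking_window=0, position=1, eta=0.0)
cost_B_2 = search_cost(k, gamma, delta, booking_window=0, position=2, eta=-0.6)
```

With identical observables, consumer 1 faces a lower cost for A and consumer 2 a lower cost for B, so differences in click order are attributed to the draws of the shock.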

Table 21: Simulation results

| Parameter (true value) | Accounting for Order | Not Accounting for Order |
|---|---|---|
| Preferences | | |
| Price (-1.8) | -2.0116∗∗∗ (0.0440) | -1.7414∗∗∗ (0.0477) |
| Stars (0.5) | 0.5743∗∗∗ (0.0246) | 0.5387∗∗∗ (0.0292) |
| Review Score (0.5) | 0.5348∗∗∗ (0.0192) | 0.4467∗∗∗ (0.0251) |
| Location Score (0.5) | 0.5633∗∗∗ (0.0188) | 0.5201∗∗∗ (0.0246) |
| Chain (0.5) | 0.6082∗∗∗ (0.0255) | 0.5836∗∗∗ (0.0304) |
| Promotion (0.5) | 0.5537∗∗∗ (0.0230) | 0.4974∗∗∗ (0.0319) |
| Search cost | | |
| Position (0.5) | 0.5466∗∗∗ (0.0134) | 0.4220∗∗∗ (0.0033) |
| Booking Window (-2) | -2.0761∗∗∗ (0.0133) | -2.1701∗∗∗ (0.0012) |
| Constant e^k (-2.5) | -2.7720∗∗∗ (0.0063) | -2.1020∗∗∗ (0.0037) |
| σ (0.1) | 0.0141 (0.0099) | - |
| Observations | 10,000 | 10,000 |
| Log-likelihood | -3,313 | -3,034 |

Standard errors in parentheses. ∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001

My results can be found in Table 21. In the first column I show results from a model accounting for the order in which consumers search and I find that this model recovers parameters well. In the second column, I contrast my estimates with those from a model that does not account for the order of clicks. As described previously, this is a model where search costs are deterministic, so that the order of clicks does not inform estimates of preferences and search costs, i.e. it lacks the additional inequalities relating the order of reservation utilities in equation (6). I find that such a model identifies utility parameters well, but does not perform as well in recovering search cost parameters.

12.12 Further Evidence for Section 8: Magnitude of search costs

There are three data patterns that contribute to the estimation of large search costs in my paper. First, searches contain mostly one click, suggesting high search costs. Second, the data set does not provide enough information to allow me to link different searches made by the same consumer. If the same consumer made several searches in the data, this would indicate lower search costs; however, with data at the search impression level, this information is lost. Third, there is no information on consumers’ outside option. These data limitations have led me to emphasize the relative difference in search costs between the two rankings, Expedia and Random, since absolute values are hard to interpret. In this section, I show that with additional data on more clicks or more searches per consumer, search cost estimates would significantly decrease. To this end, I reestimate my model on two data sets. First, in Table 22, I estimate the model on the subset of searches that contain at least two clicks. Compared to the baseline search cost coefficient of 0.3751 in Table 25 for the Random ranking in the first destination, here the coefficient is significantly lower and equals 0.1477. Second, in Table 23, I reestimate my model on the WCAI companion data set for Manhattan. Since this is the same data set used in Chen and Yao (2014), I estimate the model here on searches ending in a transaction to make my results comparable to theirs (see Table 6 in Chen and Yao, 2014).45 Note that this assumption requires normalizing the price coefficient to -1 for identification. For this reason, estimates should be interpreted in dollar terms. The advantage of this data set is that it includes information that allows linking different searches made by the same consumer.
In the table below, I show that most searches (77%) contain no clicks and that the average (median) number of clicks in searches with at least one click is 1.80 (1), comparable to the average (median) of 1.11 (1) in my data set. However, this number is significantly increased if information on consumer ids is available. More precisely, on average consumers who make at least one click make 3.28 clicks. Thus, estimating search costs at the level of a search or at the consumer level will significantly affect results, as I show in Table 23.

WCAI (Manhattan): Number of clicks at search and consumer level

| Number of clicks | Search level | Consumer level |
|---|---|---|
| 0 | 11,611 (77%) | 2,793 (59%) |
| 1 | 2,304 (15%) | 880 (19%) |
| 2 | 683 (5%) | 392 (8%) |
| >2 | 573 (3%) | 687 (14%) |
| Avg. clicks if at least 1 | 1.80 | 3.28 |
| Median clicks if at least 1 | 1 | 2 |

To show that this information is crucial in estimating search costs, I estimate the model twice (see Table 23): once assuming no information on consumers is available (column 1) and once assuming this information is available and linking searches made by the same consumer (column 2). The results in column 1 mirror those in Table 25 showing search costs of $71.37. My results in column 2 are similar to those found in Chen and Yao (2014). Importantly, I find a much lower search cost estimate of $30.20 (a decrease of 63%), comparable to the one found by Chen and Yao (2014) of $25.28, as well as other papers in the literature, such as De los Santos and Koulayev (2014) who estimate search costs to range between $10 and $50. To summarize, I demonstrate that the magnitude of search costs estimated in Table 25 is inherited from the limitations of my data set. For example, restricting attention to searches with at least two clicks, or linking the searches made by the same consumer, significantly lowers search cost estimates.

45There are two notable differences in my estimation compared to that of Chen and Yao (2014): (i) the models we estimate are slightly different, as in their model the probability of click is independent of the probability of purchase in the likelihood function; and (ii) I focus the estimation on one destination, Manhattan, while they estimate their model on all four destinations available.

Table 22: Main Estimation Results for Searches with at least Two Clicks

| Panel A: Coefficients | 4562 Random | 4562 Expedia | 9402 Random | 9402 Expedia | 8347 Random | 8347 Expedia | 13870 Random | 13870 Expedia |
|---|---|---|---|---|---|---|---|---|
| Preferences (u) | | | | | | | | |
| Price | -0.2321∗∗∗ (0.0872) | -0.2812∗∗∗ (0.0000) | -0.4857∗∗∗ (0.0909) | -0.3247∗∗∗ (0.0782) | -0.1755 (0.1877) | -0.2563∗∗ (0.0948) | -0.1290 (0.2905) | 0.3555∗ (0.1444) |
| Stars | 0.0184 (0.0545) | 0.1088∗∗∗ (0.0285) | 0.1115∗∗∗ (0.0385) | 0.1288∗ (0.0515) | 0.0931 (0.1123) | 0.0914 (0.0696) | -0.0181 (0.1932) | 0.0826 (0.1102) |
| Review Score | -0.0428 (0.0292) | 0.0158 (0.0101) | -0.0426∗∗∗ (0.0000) | 0.1481∗∗∗ (0.0542) | -0.0895 (0.0565) | 0.0456 (0.0640) | -0.0473 (0.1540) | 0.0894 (0.0772) |
| Location Score | 0.0399 (0.0243) | 0.0463∗∗∗ (0.0000) | 0.0694∗∗ (0.0274) | 0.0744∗∗ (0.0290) | -0.0123 (0.0681) | 0.0543 (0.0396) | 0.0523 (0.0659) | 0.0812∗ (0.0356) |
| Chain | -0.0892 (0.0735) | 0.0177 (0.0432) | -0.0497 (0.1084) | 0.0509 (0.0587) | 0.0915 (0.1502) | -0.0755 (0.0693) | 0.1610 (0.2437) | 0.0455 (0.2121) |
| Promotion | -0.1040 (0.0817) | 0.0137 (0.0405) | 0.1516 (0.0889) | -0.0389 (0.0570) | 0.1301 (0.1296) | 0.0414 (0.0701) | -0.2353 (0.3296) | 0.0258 (0.0876) |
| Search Cost (c) | | | | | | | | |
| Position | 0.0011 (0.0010) | 0.0037∗∗∗ (0.0009) | 0.0046∗ (0.0024) | 0.0056∗∗∗ (0.0014) | 0.0009 (0.0022) | 0.0010 (0.0018) | 0.0013 (0.0027) | 0.0020 (0.0076) |
| Booking Window | -0.1075∗∗∗ (0.0009) | -0.1225∗∗∗ (0.0000) | -0.1543∗∗∗ (0.0000) | -0.1111∗∗∗ (0.0000) | -0.0737∗∗∗ (0.0026) | -0.1150∗∗∗ (0.0000) | -0.1644∗∗∗ (0.0000) | -0.1461∗∗∗ (0.0000) |
| Constant e^k | 0.1477∗∗∗ (0.0071) | 0.2138∗∗∗ (0.0000) | 0.3680∗∗∗ (0.0026) | 0.2936∗∗∗ (0.0000) | 0.2640∗∗∗ (0.0046) | 0.2859∗∗∗ (0.0062) | 0.3764∗∗∗ (0.0000) | 0.2936∗∗∗ (0.0367) |
| Observations | 875 | 2,450 | 575 | 1,425 | 250 | 985 | 175 | 775 |
| Log-likelihood | -189 | -498 | -125 | -284 | -55 | -207 | -39 | -156 |

Standard errors in parentheses. ∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001

Note: Prices measured in $100 and booking window measured in 100 days. Position and booking window parameters expressed as change in search cost implied by a unit change.

Table 23: Manhattan (WCAI) Estimation Results

| Panel A: Coefficients | Search Level | Consumer Level |
|---|---|---|
| Preferences (u) | | |
| Price (normalized) | -1 | -1 |
| Stars | 56.0870∗∗∗ (7.2774) | 67.9450∗∗∗ (6.0577) |
| Review Score | 51.7600∗∗∗ (6.7878) | 29.8880∗∗∗ (7.5547) |
| Chain | 3.0093 (6.6991) | 9.0931 (5.4615) |
| Promotion | -19.4270∗∗ (6.8075) | -14.6760∗∗ (5.2664) |
| Search Cost (c) | | |
| Position | 0.0196∗∗∗ (0.0000) | 0.0218∗∗∗ (0.0047) |
| Booking Window | -0.0118∗∗∗ (0.0000) | -0.0572∗∗∗ (0.0009) |
| Constant k | 4.2678∗∗∗ (0.0000) | 3.4078∗∗∗ (0.0100) |
| Observations | 665 | 6,740 |
| Log-likelihood | -171 | -677 |
| Panel B: Equivalent Change $ | | |
| Position | 1.40 | 0.66 |
| Baseline search costs | 71.37 | 30.20 |

Standard errors in parentheses. ∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001
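The dollar figures in Panel B appear to follow from the Panel A search cost coefficients. The sketch below reflects a reading inferred from the reported numbers, not a formula stated in the text: with the price coefficient normalized to -1, baseline search cost in dollars is taken as exp(k), and the dollar effect of one position as exp(k) times the position coefficient (the marginal effect of position at the baseline). The function name is hypothetical.

```python
import math

def dollar_equivalents(k, position_coef):
    """Baseline search cost exp(k) and marginal position effect exp(k)*coef.

    Assumed mapping from Panel A to Panel B of Table 23; inferred, not documented.
    """
    baseline = math.exp(k)
    return baseline, baseline * position_coef

# Coefficients copied from Table 23, Panel A.
search_level = dollar_equivalents(k=4.2678, position_coef=0.0196)
consumer_level = dollar_equivalents(k=3.4078, position_coef=0.0218)
```

Under this reading the computed values agree with the reported $71.37 and $1.40 (search level) and $30.20 and $0.66 (consumer level) up to rounding of the published coefficients.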

12.13 Further Evidence for Section 6: Identifying Consumers’ Search Method

In this section, I exploit the variation in the hotels displayed under the two rankings to test for consumers’ search method.46 Simultaneous and sequential search models make different predictions on the number of searches that consumers make conditional on the product features they discover through search. In particular, if consumers are searching simultaneously, then their stopping decision is independent of the characteristics of the firms they previously observe. In contrast, when searching sequentially, consumers choose optimally when to stop searching, so that consumers who observe more favorable products are expected to stop searching earlier. These predictions can be used to test for consumers’ search method. Since impressions from Expedia’s ranking display more relevant hotels conditional on a click (see Figure 3), if consumers were searching sequentially, then they are expected to terminate their search earlier when faced with Expedia’s ranking than when faced with the Random ranking. This is exactly what I find in Table 24. To test for consumers’ search method, I perform a t-test of the difference in the number of clicks under the two rankings. A negative sign shows that consumers click less when faced with Expedia’s ranking, consistent with consumers using a sequential search method. Since this result may in part be due to the fact that most search impressions have exactly one click, I perform the same t-test only on search impressions with at least two clicks in the second column of Table 24 and I also find support for the claim that consumers are searching sequentially.

Table 24: T-test: Number of clicks by search impression type

| Difference (Expedia-Random) | All Searches | Searches with at least Two Clicks |
|---|---|---|
| Total clicks in search impression | -0.0238∗∗∗ (0.0022) | -0.0918∗∗∗ (0.0220) |
| Observations | 317,218 | 20,592 |

Standard errors in parentheses. ∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001
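The comparison reported in Table 24 can be sketched as a two-sample t-test on clicks per search impression. The text does not say which variant of the t-test is used; the sketch below assumes the unequal-variance (Welch) form, and the function name and data are made up for illustration.

```python
import math
from statistics import mean, variance

def welch_t(expedia_clicks, random_clicks):
    """Difference in mean clicks (Expedia - Random), its standard error,
    and the t statistic, allowing unequal variances (Welch; an assumption)."""
    diff = mean(expedia_clicks) - mean(random_clicks)
    se = math.sqrt(
        variance(expedia_clicks) / len(expedia_clicks)
        + variance(random_clicks) / len(random_clicks)
    )
    return diff, se, diff / se

# Made-up clicks per search impression under each ranking.
expedia = [1, 1, 1, 2]
random_ranking = [1, 2, 2, 3]
diff, se, t_stat = welch_t(expedia, random_ranking)
```

A negative difference, as in Table 24, means fewer clicks per search impression under Expedia's ranking than under the Random ranking.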

46These results hold under the assumption that the Random and Expedia ranking samples are comparable, for which I provide evidence in section 4.2.

12.14 Further Evidence for Section 8: Full estimation results

Table 25: Main Estimation Results

| Panel A: Coefficients | 4562 Random | 4562 Expedia | 9402 Random | 9402 Expedia | 8347 Random | 8347 Expedia | 13870 Random | 13870 Expedia |
|---|---|---|---|---|---|---|---|---|
| Preferences (u) | | | | | | | | |
| Price | -0.1423∗∗∗ (0.0146) | -0.4645∗∗∗ (0.0133) | -0.1560∗∗∗ (0.0000) | -0.4058∗∗∗ (0.0214) | -0.1963∗∗∗ (0.0305) | -0.2825∗∗∗ (0.0213) | -0.2028∗∗∗ (0.0241) | -0.1788∗∗∗ (0.0245) |
| Stars | 0.0240∗∗∗ (0.0137) | 0.2287∗∗∗ (0.0120) | 0.1314∗∗∗ (0.0000) | 0.1146∗∗∗ (0.0157) | 0.1623∗∗∗ (0.0000) | 0.1415∗∗∗ (0.0163) | 0.0331 (0.0393) | 0.0897∗∗∗ (0.0168) |
| Review Score | -0.0360∗∗∗ (0.0079) | 0.0285∗∗∗ (0.0087) | -0.0738∗ (0.0095) | 0.2901∗∗∗ (0.0172) | -0.0177 (0.0128) | 0.1380∗∗∗ (0.0167) | 0.0599∗∗∗ (0.0000) | 0.0780∗∗∗ (0.0210) |
| Location Score | 0.0685∗∗∗ (0.0070) | 0.1937∗∗∗ (0.0076) | 0.1034∗∗∗ (0.0075) | 0.1276∗∗∗ (0.0094) | -0.0417∗∗ (0.0139) | 0.0980∗∗∗ (0.0123) | 0.0218 (0.0184) | 0.0681∗∗∗ (0.0105) |
| Chain | -0.0158 (0.0196) | 0.0411∗∗ (0.0148) | -0.0245 (0.0229) | 0.0411∗ (0.0192) | -0.0736∗ (0.0325) | -0.1041∗∗∗ (0.0177) | 0.0401 (0.0533) | 0.0360 (0.0270) |
| Promotion | -0.0138 (0.0212) | 0.0127 (0.0140) | 0.0000 (0.0266) | -0.0016 (0.0167) | 0.1090∗∗∗ (0.0288) | 0.0957∗∗∗ (0.0211) | 0.1213∗ (0.0526) | 0.0732 (0.0230) |
| Search Cost (c) | | | | | | | | |
| Position | 0.0053∗∗∗ (0.0006) | 0.0090∗∗∗ (0.0006) | 0.0035∗∗∗ (0.0007) | 0.0101∗∗∗ (0.0008) | 0.0036∗∗∗ (0.0005) | 0.0062∗∗∗ (0.0007) | 0.0042∗∗∗ (0.0018) | 0.0064∗∗∗ (0.0008) |
| Booking Window | -0.1717∗∗∗ (0.0012) | -0.0801∗∗∗ (0.0000) | -0.1590∗∗∗ (0.0000) | -0.0452∗∗∗ (0.0000) | -0.1584∗∗∗ (0.0000) | -0.0603∗∗∗ (0.0000) | -0.1384∗∗∗ (0.0038) | -0.0452∗∗ (0.0000) |
| Constant e^k | 0.3751∗∗∗ (0.0049) | 0.7870∗∗∗ (0.0000) | 0.4073∗∗∗ (0.0000) | 0.8956∗∗∗ (0.0000) | 0.5717∗∗∗ (0.0000) | 0.8418∗∗∗ (0.0000) | 0.6659∗∗∗ (0.0277) | 0.8956∗∗∗ (0.0127) |
| Observations | 17,000 | 32,400 | 10,850 | 22,325 | 6,275 | 18,625 | 2,475 | 14,525 |
| Log-likelihood | -3,136 | -4,824 | -2,008 | -3,236 | -1,201 | -3,060 | -462 | -2,441 |

Standard errors in parentheses. ∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001

Note: Prices measured in $100 and booking window measured in 100 days. Position and booking window parameters expressed as change in search cost implied by a unit change.

12.15 Further Evidence for Section 5: Expectations versus search costs: Results from both Expedia and Random ranking

Table 26: Estimates of click on the position of the 5th displayed hotel

| | All | Large Destinations | Destination 4562 |
|---|---|---|---|
| Position of 5th displayed hotel | -0.0099 (0.0073) | -0.0269∗ (0.0120) | -0.0212 (0.0926) |
| Stars | 0.0121∗∗∗ (0.0011) | 0.0193∗∗∗ (0.0022) | |
| Review Score | 0.0070∗∗∗ (0.0009) | 0.0018 (0.0021) | |
| Chain | 0.0039∗ (0.0018) | 0.0122∗∗∗ (0.0031) | |
| Location Score | -0.0029∗∗∗ (0.0006) | 0.0071∗∗∗ (0.0013) | |
| Price | -0.0003∗∗∗ (0.0000) | -0.0004∗∗∗ (0.0000) | -0.0004∗∗ (0.0001) |
| Promotion | -0.0057∗∗∗ (0.0019) | 0.0076∗∗ (0.0029) | -0.0036 (0.0242) |
| Destination Fixed Effects | No | Yes | Yes |
| Hotel Fixed Effects | No | No | Yes |
| Observations | 167,985 | 51,698 | 1,671 |
| R² | 0.0069 | 0.0157 | 0.2523 |

Standard errors in parentheses. ∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001

Note: Linear probability model that predicts clicks on the 5th displayed hotel as function of its position. Search impressions with opaque offers will display the 5th hotel in position 6, while those without opaque offers will display it in position 5.
