Choice Set Confounding in Discrete Choice

Kiran Tomlinson (Cornell University) [email protected] · Johan Ugander (Stanford University) [email protected] · Austin R. Benson (Cornell University) [email protected]

ABSTRACT
Standard methods in preference learning involve estimating the parameters of discrete choice models from data of selections (choices) made by individuals from a discrete set of alternatives (the choice set). While there are many models for individual preferences, existing learning methods overlook how choice set assignment affects the data. Often, the choice set itself is influenced by an individual's preferences; for instance, a consumer choosing a product from an online retailer is often presented with options from a recommender system that depend on information about the consumer's preferences. Ignoring these assignment mechanisms can mislead choice models into making biased estimates of preferences, a phenomenon that we call choice set confounding. We demonstrate the presence of such confounding in widely-used choice datasets.

To address this issue, we adapt methods from causal inference to the discrete choice setting. We use covariates of the chooser for inverse probability weighting and/or regression controls, accurately recovering individual preferences in the presence of choice set confounding under certain assumptions. When such covariates are unavailable or inadequate, we develop methods that take advantage of structured choice set assignment to improve prediction. We demonstrate the effectiveness of our methods on real-world choice data, showing, for example, that accounting for choice set confounding makes choices observed in hotel booking and commute transportation more consistent with rational utility maximization.

CCS CONCEPTS
• Mathematics of computing → Probabilistic inference problems; • Information systems → Recommender systems.

KEYWORDS
discrete choice; causal inference; preference learning

ACM Reference Format:
Kiran Tomlinson, Johan Ugander, and Austin R. Benson. 2021. Choice Set Confounding in Discrete Choice. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '21), August 14–18, 2021, Virtual Event, Singapore. ACM, New York, NY, USA, 11 pages. https://doi.org/10.1145/3447548.3467378

arXiv:2105.07959v2 [cs.LG] 17 Aug 2021

KDD '21, August 14–18, 2021, Virtual Event, Singapore
2021. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '21), August 14–18, 2021, Virtual Event, Singapore, https://doi.org/10.1145/3447548.3467378.

1 INTRODUCTION
Individual choices drive the success of businesses and public policy, so predicting and understanding them has far-reaching applications in, e.g., environmental policy [12], marketing [3], Web search [22], and recommender systems [57]. The central task of discrete choice analysis is to learn individual preferences over a set of available items (the choice set), given observations of people's choices. In recent years, machine learning approaches have enabled more accurate choice modeling and prediction [11, 38, 43, 50]. However, observational choice data analysis has thus far overlooked a crucial fact: the choice set assignment mechanism underlying a dataset can have a significant impact on the generalization of learned choice models, in particular their validity on counterfactuals.

Understanding how new choice sets affect preferences in such counterfactuals is key to many applications, such as determining which alternative-fuel vehicles to subsidize or which movies to recommend. In particular, chooser-dependent choice set assignment coupled with heterogeneous preferences can severely mislead choice models, as they do not model the influence of preferences on choice set assignment. Recommender systems are one extreme case, where items are selected specifically to appeal to a user. Such situations also arise in transportation decisions, online shopping, and personalized Web search, resulting in widespread (but often invisible) error in choice models learned from this data.

Drawing on connections with causal inference [24], we term the issue of chooser-dependent choice set assignment choice set confounding. Choice set confounding is a major issue for recent machine learning methods whose success is due to capturing deviations from the traditional principles of rational utility maximization that underlie the workhorse multinomial logit model [31]. (Unlike older econometric models of "irrational" behavior [52, 56], these recent methods are practical for modern, large-scale datasets.) These deviations are known as context effects, and occur whenever the choice set has an influence on a chooser's preferences. Examples include the asymmetric dominance effect [21], where superior options are made to look even better by including inferior alternatives, and the compromise effect [45], where intermediate options are preferred (e.g., choosing a medium-priced bottle of wine). While context effects are widespread and worth capturing, choice set confounding can result in spurious effects and over-fitting, and it is unclear whether recent machine learning models are learning true effects or simply being misled by chooser-dependent choice set assignment.

In this paper, we formalize when choice set confounding is an issue and show that it can result in arbitrary systems of choice probabilities, even if choosers are rational utility-maximizers (in contrast, tractable choice models only describe a tiny fraction of possible choice systems). We also provide strong evidence of choice set confounding in two transportation datasets commonly used to demonstrate the presence of context effects and to test new models [8, 26, 36, 43]. Then, to manage choice set confounding, we first adapt two causal inference methods—inverse probability weighting (IPW) and regression controls—to train choice models in the presence of confounding. These methods require chooser covariates satisfying certain assumptions that differ from the traditional causal inference setting. For instance, given access to the same covariates used by a recommender system to construct choice sets, we can reweight the dataset to learn a choice model as if choice sets had been user-independent. Alternatively, we can incorporate covariates into the choice model itself, recovering individual preferences as long as those covariates capture preference heterogeneity.

We also show how to manage choice set confounding without such covariates, as many observational datasets have little information about the individuals making choices. We demonstrate a link between models accounting for context effects and models for choice systems induced by choice set confounding. For example, we derive the context-dependent random utility model (CDM) [44] from the perspective of choice set confounding, by treating the choice set as a vector of substitute covariates (e.g., "someone who is offered item i") in a multinomial logit model. We develop spectral clustering methods typically used for co-clustering [14] that exploit choice set assignment as a signal for chooser preferences, as a way to improve counterfactual predictions for observed choosers. To show why and when this can work, we frame the problem of finding sufficient chooser covariates as a problem of recovering latent cluster membership in a stochastic block model (SBM) of the bipartite graph that connects choosers to the items in their choice sets.

In addition to theoretical analysis, we demonstrate the efficacy of our methods on real-world choice data. We provide evidence that IPW reduces confounding when modeling hotel booking data, making the choice system more consistent with utility maximization and making inferred parameters more plausible. For example, the confounded data overweights the importance of price, since many users are shown hotels matching their preferences and select the cheapest one. Factors such as star rating would play a more important role in counterfactuals. We also evaluate our clustering approach on online shopping data. By training separate models for different chooser clusters, we outperform a mixture model that attempts to discover preference heterogeneity from choices alone, ignoring the signal from choice set assignment.

All of our code, results, and links to our data are available at https://github.com/tomlinsonk/choice-set-confounding.

1.1 Additional related work
This research is inspired by recent computational advances in learning context-dependent preferences [11, 35, 38, 43, 50]. These methods exhibit strong gains by exploiting context effects but are often evaluated on data with possible choice set confounding. Similar confounding issues are well-studied in rating and ranking data within recommender systems [30, 42, 53, 54], but those approaches do not directly apply to choice data. The causal inference ideas that we develop are based on long-standing methods [23, 24]; the challenge we address is how to adapt them for discrete choice data.

The role of choice set assignment does occasionally appear in the choice literature. For instance, Manski used choice set assignment probabilities to derive random utility models [28]. More often, traditional choice theory has focused on latent consideration sets, which are subsets of alternatives that are actually considered by choosers [6, 10] and where non-uniform choice set probabilities play a key role. In another setting, Manski and Lerman [29] used an approach similar to our inverse probability weighting. They were concerned with "choice-based samples," where we first sample an item and then get an observation of a chooser who selected that item (usually, we sample a chooser and then observe their choice) [29].

The use of regression controls in discrete choice (i.e., including chooser covariates in the utility function) is standard in econometrics [9, 47, 51]. However, in these settings, regression aims to understand how the attributes of an individual affect decision-making, which can incidentally help address confounding. This may explain why choice set confounding has not been widely recognized (additionally, in an interview, Manski discusses that choice set generation has been under-explored [48]). We formalize when and how regression adjusts for choice set confounding.

2 DISCRETE CHOICE BACKGROUND
We start with some notation and the basics of discrete choice models. Let U denote a universe of n items and A a population of individuals. In a discrete choice setting, a chooser a ∈ A is presented a nonempty choice set C ⊆ U and they choose one item i ∈ C. Specifically, a is sampled with probability Pr(a), then C is presented to a with probability Pr(C | a), and finally a selects i with probability Pr(i | a, C). Most discrete choice analysis focuses only on Pr(i | a, C) or Pr(i | C), but we consider this entire process. A discrete choice dataset D is a collection of tuples (C, i) generated by this process. We use C_D to denote the set of unique choice sets in D.

Discrete choice models posit a parametric form for choice probabilities, with parameters learned from data. The universal logit [33] can express any system of choice probabilities (called a choice system). Under a universal logit, each chooser a has a scalar utility u_i(C, a) for item i in choice set C. Choice probabilities are then a softmax of these utilities: Pr(i | a, C) = exp(u_i(C, a)) / Σ_{j ∈ C} exp(u_j(C, a)). This arises from a notion of rational utility maximization [51]. Specifically, these are the choice probabilities if a observes random utilities u_i(C, a) + ε (where the ε are i.i.d. Gumbel-distributed for each item and choice) and selects the item with maximum observed utility. The above model has too many degrees of freedom to be practical (e.g., it has entirely separate parameters for every chooser a), and typically one assumes utilities are fixed across sets and individuals. This is the logit model [31], where u_i(C, a) = u_i for all C and a.

Other discrete choice models come from different assumptions on u_i(C, a), trading off descriptive power for ease of inference and interpretation. For example, we may have access to a vector of covariates x_a ∈ R^(d_x) for person a. Similarly, an item i may be described by a vector of features y_i ∈ R^(d_y). We can write u_i(C, a) as a function of x_a, y_i, or both, yielding several choice models (Table 1)—the multinomial logit (MNL), conditional logit (CL), and conditional multinomial logit (CML).¹ All of these models obey a common assumption, the independence of irrelevant alternatives (IIA) [51]. IIA states that relative choice probabilities are conserved across choice sets: Pr(i | a, C) / Pr(j | a, C) = Pr(i | a, C′) / Pr(j | a, C′). To be precise, this is individual-level rather than group-level IIA.

¹ "MNL" sometimes refers to logit and conditional logit. Here, we follow the convention [20] that "multinomial" means chooser covariates are used and "conditional" means item features are used. Additionally, for CML, we assume γ_i = B^T y_i, which reduces the number of parameters from d_y + n·d_x to d_y(d_x + 1), allowing us to use the model when the number of items is prohibitively large.
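The softmax form above is simple to state in code. The following is a minimal sketch (standard library only; the item features, the mode names, and θ are made-up numbers, not learned parameters), showing a conditional-logit-style model and the IIA property it inherits from set-independent utilities:

```python
import math

def softmax_choice_probs(utilities):
    """Pr(i | C) = exp(u_i) / sum_{j in C} exp(u_j): softmax over a choice set."""
    exps = [math.exp(u) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

def conditional_logit_utility(theta, y_i):
    """Conditional logit utility u_i(C, a) = y_i^T theta: no dependence on C or a."""
    return sum(t * y for t, y in zip(theta, y_i))

# Hypothetical 2-d item features and a hypothetical fitted theta.
theta = [1.0, -0.5]
items = {"bus": [0.2, 0.1], "car": [0.5, 0.4], "walk": [0.9, 0.3]}

def choice_probs(choice_set):
    utils = [conditional_logit_utility(theta, items[i]) for i in choice_set]
    return dict(zip(choice_set, softmax_choice_probs(utils)))

full = choice_probs(["bus", "car", "walk"])
pair = choice_probs(["bus", "car"])
# IIA: the bus/car probability ratio is identical in both choice sets.
```

Because utilities ignore C, dropping "walk" rescales the remaining probabilities without changing their ratio; set-dependent utilities such as those of the CDM or LCL break exactly this property.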

Table 1: Discrete choice models. The item and chooser feature vectors y_i and x_a are part of the dataset, while u_i ∈ R, θ ∈ R^(d_y), γ_i ∈ R^(d_x), and B ∈ R^(d_y × d_x) are learned parameters.

  Model                      u_i(C, a)                 # Parameters
  logit                      u_i                       n
  multinomial logit (MNL)    u_i + x_a^T γ_i           n(d_x + 1)
  conditional logit (CL)     y_i^T θ                   d_y
  cond. mult. logit (CML)    y_i^T (θ + B x_a)         d_y(d_x + 1)

Table 2: Context effect models. p_ij ∈ R, γ_i ∈ R^(d_x), θ ∈ R^(d_y), A ∈ R^(d_y × d_y), and B ∈ R^(d_y × d_x) are learned parameters.

  Model              u_i(C, a)                             # Parameters
  CDM [43]           Σ_{j ∈ C\i} p_ij                      n(n − 1)
  mult. CDM (MCDM)   Σ_{j ∈ C\i} p_ij + x_a^T γ_i          n(n + d_x)
  LCL [50]           y_i^T (θ + A ȳ_C)                     d_y(d_y + 1)
  mult. LCL (MLCL)   y_i^T (θ + A ȳ_C + B x_a)             d_y(d_y + d_x + 1)

Among the models in Table 1, the latter (group-level IIA) is only obeyed by the logit and the conditional logit. In general, models obey individual-level IIA if utility is independent of C, i.e., u_i(C, a) = u_i(a), and obey group-level IIA if u_i(C, a) is independent of both C and a.

While the IIA assumption is convenient, it is commonly violated through context effects [8, 21, 46]. Due to the ubiquity of context effects, models incorporating information from the choice set have become increasingly popular and have shown considerable success [11, 38, 43, 50]. Other models allow IIA violations without explicitly modeling effects of the choice set [8, 32, 36].

We briefly introduce two of these context effect models, the context-dependent random utility model (CDM) [43] and the linear context logit (LCL) [50], which we use extensively. In the CDM, each item in the choice set exerts a pull on the utility of every other item: u_i(C, a) = Σ_{j ∈ C\i} p_ij. The CDM can be derived as a second-order approximation to the universal logit (where the plain logit is the first-order approximation) [43]. The LCL instead operates in settings with item features, adjusting the conditional logit parameter θ according to a linear transformation of the choice set's mean feature vector: u_i(C, a) = y_i^T (θ + A ȳ_C), where ȳ_C = (1/|C|) Σ_{j ∈ C} y_j. To incorporate chooser covariates, we define multinomial versions of these models (Table 2). For this paper, the LCL and CDM should be thought of as the simplest context effect models with and without item features.

In contrast, mixed logit [32] accounts for group-level rather than individual-level IIA violations and has a different structure than any of the models introduced thus far. A (discrete) mixed logit is a mixture of K logits with mixing proportions π_1, ..., π_K such that Σ_{k=1}^K π_k = 1. With u_i(a_k) denoting the utility of the kth component for item i, a mixed logit has choice probabilities

  Pr(i | C) = Σ_{k=1}^K π_k · exp(u_i(a_k)) / Σ_{j ∈ C} exp(u_j(a_k)).   (1)

This can result in a choice system violating IIA, but not because any individual chooser experiences context effects. Rather, the aggregation of several choosers, each obeying IIA, can result in IIA violations.

3 CHOICE SET CONFOUNDING
The traditional approach to choice modeling is to learn a single model for Pr(i | C) (such as a logit) and assume it represents overall choice behavior, namely, that the model accurately reflects average choice probabilities E_a[Pr(i | a, C)]. However, Pr(i | C) need not represent average choice behavior at all, as this is only guaranteed under restrictive independence assumptions.

Observation 1. If, for all a ∈ A, C ∈ C_D, i ∈ C, at least one of
(1) Pr(C) = Pr(C | a) (chooser-independent choice sets) or
(2) Pr(i | a, C) = Pr(i | C) (chooser-independent preferences)
holds, then Pr(i | C) = E_a[Pr(i | a, C)]. If both conditions are violated, then this equality can fail.

Appendix A has a proof of this fact and of other theoretical statements presented later. When we have both chooser-dependent sets and preferences, observed choice probabilities Pr(i | C) can differ significantly from true aggregate choice probabilities E_a[Pr(i | a, C)]. We call this phenomenon choice set confounding and provide the following toy example as an illustration.

Example 1. Let U = {cat, dog, fish}. Choosers are either cat people or dog people choosing a pet, with choice probabilities

               {cat, dog}    {cat, dog, fish}
  cat person   3/4, 1/4      3/4, 1/4, 0
  dog person   1/4, 3/4      1/4, 3/4, 0

Note that the preferences of cat and dog people do not change when fish are included in the choice set. Choice sets are assigned non-independently: cat people see {cat, dog} w.p. 3/4 and {cat, dog, fish} w.p. 1/4 (vice-versa for dog people). Let the population consist of 1/4 cat people and 3/4 dog people. If we only observe samples (C, i) without knowing who is a cat person and who is a dog person, then

  Pr(dog | {cat, dog}) = 1/2 · 1/4 + 1/2 · 3/4 = 1/2
  Pr(dog | {cat, dog, fish}) = 1/10 · 1/4 + 9/10 · 3/4 = 7/10.

However,

  E_a[Pr(dog | a, {cat, dog})] = 1/4 · 1/4 + 3/4 · 3/4 = 5/8
  E_a[Pr(dog | a, {cat, dog, fish})] = 1/4 · 1/4 + 3/4 · 3/4 = 5/8.

This mismatch is especially problematic for models that use choice-set-dependent utilities u_i(C), such as those designed to account for context effects. From the above data, we might conclude that the presence of a fish causes a dog to become a more appealing option. This spurious context effect would be seized upon by context-based models and would even result in improved predictive performance on test data drawn from the same distribution. However, these models would make biased predictions on counterfactual examples where sets are chosen from a different distribution. In reality, no one's choice would be affected by adding fish to their choice set—it's a red herring.

This is a causal inference problem. We want to know the cause of a choice, but we are being misled as to whether the change in preferences between the {cat, dog} and {cat, dog, fish} choice sets is due to the presence of fish or to a hidden confounder: the underlying preferences of cat and dog people, coupled with chooser-dependent choice set assignment.

Extending this idea, the equality in Observation 1 can fail dramatically. If the population consists of individuals each of whom obeys
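The arithmetic in Example 1 can be checked mechanically. Below is a small sketch with exact rational arithmetic; the dictionary encoding is our own convention, but the probabilities are exactly those of the example:

```python
from fractions import Fraction as F

pop = {"cat person": F(1, 4), "dog person": F(3, 4)}            # Pr(a)
sets = {                                                        # Pr(C | a)
    "cat person": {"{cat, dog}": F(3, 4), "{cat, dog, fish}": F(1, 4)},
    "dog person": {"{cat, dog}": F(1, 4), "{cat, dog, fish}": F(3, 4)},
}
pr_dog = {"cat person": F(1, 4), "dog person": F(3, 4)}         # Pr(dog | a, C), same for both sets

def observed(C):
    """Confounded Pr(dog | C): averages over Pr(a | C), which depends on C."""
    pr_C = sum(pop[a] * sets[a][C] for a in pop)
    return sum(pop[a] * sets[a][C] / pr_C * pr_dog[a] for a in pop)

def aggregate():
    """True E_a[Pr(dog | a, C)]: independent of how sets were assigned."""
    return sum(pop[a] * pr_dog[a] for a in pop)
```

Here `observed` returns 1/2 and 7/10 for the two sets, while the true aggregate stays at 5/8 for both: the apparent boost from adding fish is pure confounding.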

IIA (i.e., chooses according to a logit), then E_a[Pr(i | a, C)] is exactly the mixed logit choice probability. On the other hand, Pr(i | C) can express an arbitrary choice system with choice set confounding.

Theorem 2. Mixed logit with chooser-dependent choice sets is powerful enough to express any system of choice probabilities.

Arbitrary choice systems are much more powerful than mixed logits (even ones with continuous mixtures). For example, it is impossible for a mixed logit to violate regularity, the condition that Pr(i | C) ≥ Pr(i | C ∪ {j}) for all C ⊆ U, i ∈ C, j ∈ U, as choice probabilities for i can only go down in each mixture component when we include j. On the other hand, even Example 1 has a regularity violation (picking a dog is more likely when a fish is available), despite there being only two types of choosers, both adhering to IIA.

We have shown that choice set confounding is an issue in theory, and we now demonstrate it to be a problem in practice. We present evidence of choice set confounding in two transportation choice datasets, sf-work and sf-shop [26]. These datasets consist of San Francisco (SF) resident surveys for preferred transportation mode to work or shopping, where the choice set is the set of modes available to a respondent.

Table 3: Likelihood gains in sf-work, sf-shop, and expedia from covariates and context, with likelihood ratio test (LRT) p-values. Δℓ denotes improvement in log-likelihood.

  Comparison      Testing      Controlling    Δℓ      LRT p
  sf-work
  Logit to MNL    covariates   —              883     < 10^-10
  Logit to CDM    context      —              85      < 10^-10
  CDM to MCDM     covariates   context        819     < 10^-10
  MNL to MCDM     context      covariates     20      0.08
  sf-shop
  Logit to MNL    covariates   —              343     < 10^-10
  Logit to CDM    context      —              96      < 10^-10
  CDM to MCDM     covariates   context        276     < 10^-10
  MNL to MCDM     context      covariates     29      0.36
  expedia
  CL to CML       covariates   —              1218    < 10^-10
  CL to LCL       context      —              2345    < 10^-10
  LCL to MLCL     covariates   context        1167    < 10^-10
  CML to MLCL     context      covariates     2294    < 10^-10
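The likelihood-ratio tests in Table 3 follow the standard recipe: twice the log-likelihood gain of the larger nested model is compared against a chi-squared distribution whose degrees of freedom equal the number of extra parameters (Wilks' theorem). A dependency-free sketch, using the closed-form chi-squared survival function that exists for even degrees of freedom (an assumption we make to avoid numerical libraries; the numbers below are illustrative, not taken from Table 3):

```python
import math

def chi2_sf(x, df):
    """P(X > x) for X ~ chi-squared(df), via the closed form for even df:
    exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!."""
    assert df > 0 and df % 2 == 0, "this closed form needs even df"
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= (x / 2) / i
        total += term
    return math.exp(-x / 2) * total

def likelihood_ratio_test(ll_small, ll_big, extra_params):
    """p-value for rejecting the smaller of two nested models."""
    return chi2_sf(2.0 * (ll_big - ll_small), extra_params)

# Illustrative: a log-likelihood gain of 10 with 4 extra parameters.
p = likelihood_ratio_test(-1000.0, -990.0, 4)
```

A tiny p-value says the likelihood gain is too large to be chance; in Table 3, the MNL-to-MCDM comparisons on the SF data are the ones that fail to reach significance.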
The SF datasets are common testbeds for choice models violating IIA [8, 26, 36, 43] and in choice applications [2, 49]. The SF data have regularity violations (Table 5 in Appendix C), ruling out the possibility that the IIA violations in these datasets are just due to mixtures of choosers obeying IIA. Thus, these datasets either have (1) true context effects or (2) choice set confounding. So far, the literature has focused on (1), but we argue that (2) is more likely. We compare the likelihoods of logit, MNL, CDM, and MCDM (recall Tables 1 and 2) on these datasets through likelihood-ratio tests (Table 3). MNL and MCDM both account for chooser-dependent preferences through covariates, while CDM and MCDM both account for context effects. With true context effects, we would expect CDM to be significantly more likely than logit and MCDM to be significantly more likely than MNL. However, this is not the case. While CDM is significantly more likely than logit, MCDM is not significantly more likely than MNL in either SF dataset. Thus, context effects only appear significant before controlling for preference heterogeneity through covariates. This is exactly what we would expect if the IIA violations in these datasets are due to choice set confounding rather than context effects. In contrast, we see significant context effects in the expedia hotel-booking dataset [25] even after controlling for covariates (this dataset uses item features, hence the different models in Table 3), so context effects are likely there. This dataset consists of search results (choice sets) and hotel bookings (choices), and we explore it further in Section 4.4.

The choice set confounding leads to a key question: how were choice sets constructed in sf-work and sf-shop? According to Koppelman and Bhat [26], choice sets were imputed based on chooser covariates from the survey. For instance, walking was included as an option if a respondent's distance to the destination was < 4 miles, and driving was included if they had a driver's license and at least one car in their household [26]. This choice set assignment is highly chooser-dependent, resulting in strong choice set confounding.

Example 1 and the SF datasets highlight how confounding can lead to spurious context effects and incorrect average choice probabilities. Next, in Section 4, we adapt methods from causal inference so that chooser covariates can correct choice probability estimates. And in Section 5, we address what can be done without covariates if we want to (1) make predictions under chooser-dependent choice set assignment mechanisms or (2) make counterfactual predictions for previously observed choosers.

4 CAUSAL INFERENCE METHODS
In traditional causal inference [23, 24, 39], we wish to estimate the causal effect of an intervention (e.g., a medical treatment) from observational data. However, we cannot simply compare the outcomes of the treated and untreated cohorts if treatment was not randomly assigned—confounders might affect both whether someone was treated and their outcome. There are many methods to debias treatment effect estimation, including matching [37, 39], inverse probability weighting (IPW) [19], and regression [40]. One can also combine methods, such as IPW and regression, which is the basis for doubly robust estimators [5].

Here, we adapt causal inference methods to estimate unbiased discrete choice models from data with choice set confounding. First, we adapt IPW to learn unbiased models that do not use chooser covariates in the utility function. After, we show an equivalence between incorporating chooser covariates in the utility function and regression for causal inference. Finally, we combine these methods for doubly robust choice model estimation. For discrete choice, these methods require new assumptions and have different guarantees. We first provide a brief introduction to causal inference terminology in the binary treatment setting, such as an observational medical study (in contrast, we will think of choice sets as treatments).

In potential outcomes notation [41], each person i has covariates X_i and is either treated (T_i = 1) or untreated (T_i = 0). At some point after treatment, we measure the outcome Y_i(T_i). A typical goal of the causal inference methods above is to estimate the average treatment effect E_i[Y_i(1) − Y_i(0)]. All of these methods rely on untestable assumptions; in particular, they rely on strong ignorability [23, 37] (also called unconfoundedness or no unmeasured confounders), which

(1) 푎 (2) 푎 (3) 푎 (4) 푎 Just as in standard IPW, we also need positivity (of choice set chooser propensities). Under these assumptions, IPW guarantees that empir- ˜ 풙풂 풙풂 ical choice probabilities in the pseudo-dataset D reflect aggregate covariates choice probabilities in the true population. To formalize this, we in- 풙풂 퐶 풙풂 퐶 troduce D∗, an idealized dataset with uniformly random choice set 퐶 퐶 D D∗ choice assignment for every chooser (of the same size as ). consists set of |D| independent samples (푎,퐶, 푖) each occuring with probability 푢푖 (퐶, 푎) 푢푖 (퐶, 푎) 푢푖 (퐶, 푎) 푢푖 (퐶, 푎) 1 Pr(푎) |C | Pr(푖 | 푎,퐶). We now show that the IPW-weighted log- utilities D likelihood (eq. (2)) is, in expectation, the same as the log-likelihood 푖 푖 푖 푖 function over D∗. Since D∗ has chooser-independent choice sets, choice we can train a model for Pr(푖 | 퐶) using eq. (2) and expect it to Figure 1: Graphical representations of chooser covariate as- capture unbiased aggregate choice probabilities (by Observation 1). sumptions: (1) ignorability; (2) choice set ignorability; (3) preference ignorability; (4) no ignorability. Shaded nodes Theorem 4. If, for all 푎 ∈ A,퐶 ∈ CD, are observed, dashed nodes are deterministic. (1) 0 < Pr(퐶 | 풙풂) < 1 (positivity), and ( | ) = ( | ) requires that the treatment is independent from the outcome, con- (2) Pr 퐶 푎, 풙풂 Pr 퐶 풙풂 (choice set ignorability), ˜ ∗ ditioned on observed covariates: Pr(푇푖 | 푋푖, 푌푖 ) = Pr(푇푖 | 푋푖 ), ∀푖. then 퐸D [ℓ(휃; D)] = 퐸D∗ [ℓ(휃; D )]. Choice set ignorability is crucial to the success of IPW, so we 4.1 Inverse probability weighting should assess when this assumption is reasonable. If choice sets are IPW estimation commonly requires estimating propensity scores de- generated by an exogenous process (such as a recommender system, scribing the probability of each treatment assignment given individ- as in the expedia dataset), then as long as we have access to the same ual covariates. 
The true probabilities Pr(T_i | X_i) are unknown, so estimated "propensities" P̂r(T_i | X_i) are learned from observed data, typically via logistic regression [4]. Propensities can then be used to estimate average treatment effects or, as in our case, to re-weight a model's training data [16]. By weighting each sample by the inverse of its propensity, we effectively construct a pseudo-population where treatment is assigned independently from covariates. In addition to ignorability, IPW requires positivity, the assumption that all propensities satisfy 0 < Pr(T_i | X_i) < 1.

In the discrete choice setting, we think of choice sets as treatments. By Observation 1, we need chooser-independent choice sets in order to learn an unbiased choice model. Our idea of IPW for discrete choice is to create a pseudo-dataset in which this is true and to learn a choice model over that pseudo-dataset. To do this, we model choice set assignment probabilities Pr(C | a). We can then replace each sample (i, a, C) with 1/[|C_D| Pr(C | a)] copies, creating a pseudo-dataset D̃ with uniformly random choice sets (note that we allow "fractional samples," since we don't explicitly construct D̃). However, we cannot hope to learn Pr(C | a) in datasets with only a single observation per chooser (which is very often the case). We instead need to rely on observed covariates x_a. We thus learn Pr(C | x_a) and use these propensities to construct D̃. For the analysis, we assume we know the true propensities, but a correctly specified choice set assignment model learned from data is sufficient.

To learn a choice model from D̃, we can simply add weights to the model's log-likelihood function, resulting in

  ℓ(θ; D̃) = Σ_{(i,C,a) ∈ D} log Pr_θ(i | C) / [|C_D| Pr(C | x_a)].   (2)

In order for Pr(C | x_a) to be an effective stand-in for Pr(C | a), we need the following assumption (see Figure 1).

Definition 3. Choice set ignorability is satisfied if choice sets are independent of choosers, conditioned on chooser covariates: Pr(C | a, x_a) = Pr(C | x_a).

If choice set assignment is algorithmic and we observe the same covariates as that process, choice set ignorability holds, although learning the propensities may still be a challenge. However, in other datasets, choice sets are formed through self-directed browsing (e.g., clicking around an online shop, as in the yoochoose dataset we examine later). In those cases, basic covariates (age, gender, etc.) are unlikely to fully capture choice set generation, since sets result from the complexities of human behavior rather than the simpler algorithmic behavior of a recommender system. As in traditional causal inference, the validity of choice set ignorability must be determined by the practitioner applying the method.

4.2 Regression

An alternative to using chooser covariates to learn choice set propensities is to incorporate covariates directly into the utility formulation, as in the multinomial or conditional multinomial logit models. If chooser covariates fully capture their preferences and the choice model is correctly specified, then the model that we learn is consistent. We formalize the first condition as follows (see Figure 1).

Definition 5. Preference ignorability is satisfied if choice probabilities are independent of choosers, conditioned on chooser covariates: Pr(i | a, x_a, C) = Pr(i | x_a, C).

Given correct specification and preference ignorability, the choice model will be consistent in terms of aggregate choice probabilities and result in accurate individual choice probability estimates.

Theorem 6. If Pr(i | a, x_a, C) = Pr(i | x_a, C) for all a ∈ A, C ∈ C, i ∈ C (preference ignorability), then the MLE of a correctly specified (and well-behaved, in the standard MLE sense [55, Theorem 9.13]) choice model that incorporates chooser covariates x_a is consistent: lim_{|D|→∞} P̂r(i | x_a, C) = Pr(i | a, C).

While the guarantee of regression is stronger than IPW, preference ignorability is more challenging to satisfy in practice. Instead of needing all covariates used to generate choice sets, we need covariates to fully describe choice behavior.

KDD '21, August 14–18, 2021, Virtual Event, Singapore. Kiran Tomlinson, Johan Ugander, and Austin R. Benson
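To make the weighted objective in eq. (2) concrete, here is a minimal sketch (not the paper's implementation) of an IPW-weighted log-likelihood for a plain logit with one utility parameter per item. The `propensity` callback and `n_distinct_sets` argument (standing in for |C_D|) are hypothetical names introduced for illustration:

```python
import numpy as np

def logit_choice_probs(theta, choice_set):
    """Softmax of the item utilities theta restricted to the choice set."""
    u = theta[choice_set]
    e = np.exp(u - u.max())
    return e / e.sum()

def ipw_log_likelihood(theta, samples, propensity, n_distinct_sets):
    """IPW-weighted log-likelihood as in eq. (2): each sample (i, C, x_a)
    contributes log Pr_theta(i | C) / (|C_D| * Pr(C | x_a))."""
    total = 0.0
    for i, choice_set, x_a in samples:
        probs = logit_choice_probs(theta, choice_set)
        total += np.log(probs[choice_set.index(i)]) / (
            n_distinct_sets * propensity(choice_set, x_a))
    return total
```

When propensities are uniform the weights are constant, so maximizing this objective reduces to ordinary maximum likelihood; non-uniform propensities up-weight choices made from rarely assigned sets.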

4.3 Doubly robust estimation

A constraint of both IPW and regression is correct model specification, either of the choice set propensity model or of the choice model. In traditional causal inference, one can combine both methods to provide guarantees if either model is correctly specified, producing doubly robust estimators [5, 17]. In the same way, we can combine IPW and regression for choice models and achieve their respective guarantees if their respective conditions are satisfied. In other words, the two methods do not interfere with each other. However, this increases the variance of estimates, so it may be advisable to only use one method if we are confident in one of the assumptions.

4.4 Empirical analysis of IPW and regression

We begin by evaluating regression and IPW adjustments in synthetic data, and then apply our methods to the expedia dataset (training details in Appendix C).

Figure 2: Mean prediction quality of models on synthetic data with both context effects and choice set confounding, with IPW (bold) and without IPW (light). Left: out-of-sample predictions on data with confounding. Right: counterfactual predictions of models trained on confounded data. Shaded regions show standard error over 16 trials.

Counterfactual evaluation in synthetic data. We generate synthetic data with heterogeneous preferences, CDM-style context effects, and choice set confounding. Specifically, we use 20 items with

embeddings y_i ∈ R^2 sampled uniformly from the unit circle. We also generate embeddings x_a in the same way for each chooser a. Each chooser a picks items according to an MCDM, where the utility for i is a sum of x_a^T y_i plus a CDM term shared by all choosers, with each "push/pull" term p_ij ~ Uniform(−1, 1). To generate a choice set for a, we sample a uniformly random set with probability 0.25 (to satisfy positivity) and otherwise include each item with probability 1/(1 + e^{−c x_a^T y_i}), where c is the confounding strength (we condition on having at least two items in the choice set). Higher confounding strength results in sets containing items more preferred by a. Each trial consists of 10000 samples. Item embeddings are unobserved, but chooser embeddings are used as covariates. We train models on a confounded portion of the data and measure prediction quality on a held-out confounded subset as well as a counterfactual portion with uniformly random choice sets. For IPW, we estimate choice set propensities via per-item logistic regression, multiplying item propensities to get set propensities.

To measure prediction quality, we use the mean relative position of the true choice in the list of predictions sorted in descending probability order. A value of 1 says that the true choices were all predicted as most likely. As confounding strength increases, prediction quality increases in the confounded data for logit, MNL, and CDM, while decreasing on counterfactual data (Figure 2).

Figure 3: Preference coefficients θ in expedia for CL and LCL (top row, no regression); and CML and MLCL (bottom row, with regression), with and without IPW. A higher coefficient means choosers prefer higher values of the feature.
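The confounded choice set generation described above can be sketched as follows. This is an illustrative reconstruction rather than the paper's code; in particular, we read "a uniformly random set" as including each item independently with probability 1/2:

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, c = 20, 4.0  # c is the confounding strength

def unit_circle(n):
    """Embeddings sampled uniformly from the unit circle."""
    angles = rng.uniform(0, 2 * np.pi, n)
    return np.column_stack([np.cos(angles), np.sin(angles)])

items = unit_circle(n_items)

def sample_choice_set(x_a):
    """With prob 0.25, a uniformly random set (positivity); otherwise
    include item i w.p. 1 / (1 + exp(-c * x_a . y_i)). Rejection-sample
    to condition on at least two items in the set."""
    while True:
        if rng.random() < 0.25:
            mask = rng.random(n_items) < 0.5
        else:
            probs = 1.0 / (1.0 + np.exp(-c * (items @ x_a)))
            mask = rng.random(n_items) < probs
        if mask.sum() >= 2:
            return np.flatnonzero(mask)
```

Higher c makes the included items' utilities x_a^T y_i systematically larger than those of a uniformly drawn set, which is exactly the confounding the experiment needs.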
For logit and MNL, IPW leads to models that generalize better to counterfactual data. For CDM, IPW correctly prevents the illusion of increased performance with more confounding (although variance caused by IPW appears to result in a small dip in performance at low confounding). Since preference ignorability is satisfied, IPW is unnecessary for MCDM: regression with the correctly specified model successfully generalizes despite confounding.

Empirical data with chooser covariates. We now consider the expedia hotel choice dataset [25] from Section 3, using five hotel features: star rating, review score, location score, price, and promotion status. This allows us to use feature-based choice models (CL, CML, LCL, and MLCL; Tables 1 and 2). The dataset includes information about chooser searches, such as the number of adults and children in their party, which likely have strong effects on choice sets (i.e., search results). This is an excellent testbed for IPW since these covariates are likely informative about choice sets, making choice set ignorability more reasonable than preference ignorability.

We do not have counterfactual choices for the expedia data, but we still consider several types of analysis. First, we recall the results from Table 3 to see if apparent context effects are accounted for by chooser covariates. There, in contrast to the SF datasets, context effects still appear significant after controlling for covariates. In fact, context effects provide a larger likelihood boost than the chooser covariates. Thus, either (1) there are true context effects or (2) the chooser covariates in expedia do not satisfy preference ignorability (or both). Based on the nature of the covariates, (2) seems very likely: the number of children in the chooser's party and the length of their stay are unlikely to fully describe hotel preferences.

Since regression is inconclusive, we also apply IPW. To learn choice set propensities, we use a probabilistic model of the mean feature vectors of choice sets. We assume these vectors follow a multivariate Gaussian conditioned on chooser covariates, with

mean W x_a + z for some W ∈ R^{d_y × d_x}, z ∈ R^{d_y}. Given observed mean choice set vectors y_C and corresponding chooser covariates x_a, we compute the maximum-likelihood W, z, and covariance matrix (see Appendix B). This model gives us propensities for any (a, C) pair.

Using IPW with these propensities dramatically decreases the negative impact of high price in all four models (Figure 3). After adjusting for confounding, the models indicate that users are more willing to book more expensive hotels. This makes sense if Expedia is recommending relevant hotels: among a set of hotels matching a user's desired characteristics (such as location and star rating), we would expect them to select the cheapest option. On the other hand, if we presented users with a set of random hotels, location and star rating might play a stronger role in determining their choice, since a random set might have many cheap hotels that are undesirable for other reasons. In addition to the preference coefficients, IPW affects the context effect matrix A in the LCL and MLCL (Figure 4). In both models, IPW decreases (but does not entirely eliminate) the strong price context effects. This is evidence that some of the apparent context effects in the dataset are due to choice set confounding.

Figure 4: LCL and MLCL context effect matrix A in expedia with and without IPW. A higher value means choosers prefer a row feature more in a set where the mean column feature (abbreviated) is high; 0 indicates no context effect.

Finally, the estimated likelihoods of the models under IPW are significantly better than without IPW (Table 4). We normalize the IPW-weighted log-likelihood by the sum of the IPW weights, which provides an estimate of what the IPW-trained model's log-likelihood would be given random sets. The gap between likelihood with no IPW and estimated likelihood with IPW dwarfs the gaps between different choice models, indicating that accounting for choice set confounding makes the data much more consistent with the random utility maximization principle underlying all four models. (By Theorem 2, choice set confounding can result in choice systems far from rational behavior, even when choosers are rational.)

Table 4: Log-likelihoods and estimated random-set log-likelihoods with IPW on expedia. After adjusting for confounding, the data is far easier to explain.

Model | Confounded | IPW-adjusted
CL    | −839499    | −786653
CML   | −838281    | −785753
LCL   | −837154    | −784770
MLCL  | −835986    | −783928

5 MANAGING WITHOUT COVARIATES

So far, we have used chooser covariates to correct for choice set confounding. However, in some choice data, there are no covariates available, or we are not willing to make ignorability assumptions. Here, we show what can be done in this setting.

5.1 Within-distribution prediction

Unfortunately, by Theorem 2, it is impossible to determine whether IIA violations are caused by choice set confounding or true context effects in the absence of chooser information. Nonetheless, we can still exploit IIA violations—whatever their origin—to improve prediction, as long as we are careful not to make counterfactual predictions. This is essentially what researchers developing context effect models [11, 36, 38, 43, 50] have been doing (without a framework for understanding the possibility of choice set confounding and the associated risks for counterfactual prediction). Beyond emphasizing a need for caution, we also establish a duality between models accounting for context effects and models accounting for choice set confounding; specifically, we show that a model equivalent to the CDM—which was designed with context effects in mind—can be derived purely from the perspective of choice set confounding.

In a multinomial logit (MNL), we learn a latent parameter vector γ_i for each item i ∈ U and model utilities as u_i(a) = x_a^T γ_i (omitting the intercept term). Suppose we don't have any chooser covariates, but we know choice set assignment depends on choosers. We could then use the choice set itself as a surrogate for user covariates (e.g., one covariate could be "someone who is offered item i"). Let 1_{C_a} be a binary encoding of the choice set C_a of a chooser a (a length-|U| vector with a 1 in position i if i ∈ C_a). Consider treating 1_{C_a} as a substitute for the user covariates x_a. Then the MNL model is

  Pr(i | C_a) = exp(1_{C_a}^T γ_i) / Σ_{j ∈ C_a} exp(1_{C_a}^T γ_j).

The utility of i in set C_a is Σ_{j ∈ C_a} γ_ij, which is exactly the CDM (with self-pulls, since the sum is over C_a rather than C_a \ i), a model designed to capture choice-set-dependent utilities. Thus, the CDM can either be thought of as accounting for pairwise interactions between items or using the choice set as a stand-in for user covariates.

One natural question this duality raises is how the set of choice systems expressible by CDM (or other context-effect models) compares to the choice systems induced by mixed populations of IIA choosers with choice set confounding, which take the form

  Pr(i | C) = Σ_{a ∈ A} Pr(a | C) · exp(u_i(a)) / Σ_{j ∈ C} exp(u_j(a)).   (3)

Mixtures of logits such as eq. (3) are notoriously hard to analyze (even in the two-component case [13]), so no simple equivalence between a context-effect model and such a mixture is likely. In fact, eq. (3) is even trickier than standard mixed logit (eq. (1)), since the mixture weights depend on the choice set.

Nonetheless, some progress in this direction is possible. Here, we provide an instance where the LCL approximates a choice system induced by choice set confounding (of the form of eq. (3)). Recall that the LCL has utilities u_i(a, C) = (θ + A y_C)^T y_i, where y_C is the mean feature vector over the choice set. If we make Gaussian assumptions on the distribution of features and on choice set assignment, and if chooser utilities are inner products of chooser and item vectors,
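The set-covariate reading of the CDM can be illustrated numerically (an illustrative sketch with hypothetical names, not the paper's code): feeding an MNL the binary set encoding 1_C as its covariates yields set-dependent utilities u_i(C) = Σ_{j∈C} Γ_ij, which violate IIA across different choice sets.

```python
import numpy as np

rng = np.random.default_rng(1)
n_items = 4
Gamma = rng.normal(size=(n_items, n_items))  # Gamma[i, j]: "pull" of item j on item i

def cdm_probs(choice_set, Gamma):
    """MNL whose covariates are the binary encoding of the choice set:
    u_i(C) = sum_{j in C} Gamma[i, j] (the CDM with self-pulls)."""
    n = Gamma.shape[0]
    one_hot = np.zeros(n)
    one_hot[list(choice_set)] = 1.0
    u = Gamma[list(choice_set)] @ one_hot  # utility of each i in C
    e = np.exp(u - u.max())
    return e / e.sum()
```

Swapping item 2 for item 3 changes the relative odds of items 0 and 1, so this "MNL over set indicators" exhibits exactly the context effects the CDM was built to capture.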

then the LCL is a mean-field approximation to the induced choice system. In particular, we assume choice sets are generated to be similar to items the chooser a would like (as in a recommender system) by sampling items from a Gaussian with mean x_a.

Theorem 7. Let items and choosers both be represented by vectors in R^d. Suppose chooser covariates x_a are distributed in the population according to a multivariate Gaussian N(μ, Σ_0), and a choice set for chooser a is constructed by sampling k items from the multivariate Gaussian N(x_a, Σ). Additionally, assume choosers have the utility function u_i(a, C) = x_a^T y_i. Then the expected chooser given a choice set C, x_a* = E[x_a | C], has LCL choice probabilities, with

  θ = (1/k) Σ (Σ_0 + (1/k) Σ)^{−1} μ,   A = Σ_0 (Σ_0 + (1/k) Σ)^{−1}.

Thus, the LCL can either be thought of as a context effect model, or as an approximation to the choice system induced by recommender-style preferred item overrepresentation.

Figure 5: yoochoose log-likelihood comparison. Spectral and random cluster results are averaged over eight trials, with one standard deviation shaded.
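Theorem 7 rests on the conjugate-Gaussian posterior mean used in its proof: with S = Σ_0 + Σ/k, the form Σ_0 S^{−1} ȳ_C + (Σ/k) S^{−1} μ agrees with the standard conditional-Gaussian expression μ + Σ_0 S^{−1}(ȳ_C − μ), and splits into the claimed θ and A. A quick numerical check of this identity (an illustrative sketch, with symbols as in the theorem):

```python
import numpy as np

rng = np.random.default_rng(2)
d, k = 3, 5
mu = rng.normal(size=d)

def rand_spd(d):
    """Random symmetric positive-definite covariance matrix."""
    M = rng.normal(size=(d, d))
    return M @ M.T + d * np.eye(d)

Sigma0, Sigma = rand_spd(d), rand_spd(d)
S_inv = np.linalg.inv(Sigma0 + Sigma / k)
y_bar = rng.normal(size=d)  # a mean choice set vector

# Posterior mean as written in the proof of Theorem 7 ...
post = Sigma0 @ S_inv @ y_bar + (Sigma / k) @ S_inv @ mu
# ... and the standard conditional-Gaussian form.
standard = mu + Sigma0 @ S_inv @ (y_bar - mu)

# The LCL decomposition of the same quantity.
theta = (Sigma / k) @ S_inv @ mu
A = Sigma0 @ S_inv
```

The agreement follows because I − Σ_0 S^{−1} = (S − Σ_0) S^{−1} = (Σ/k) S^{−1}.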

5.2 Counterfactuals for known choosers

To make counterfactual predictions without chooser covariates, or with insufficiently descriptive covariates (preventing us from applying IPW or regression), we develop a clustering method for the challenge of choice set confounding. Suppose a recommender system suggests two sets of movies to two users: {Romance A, Romance B} to a_1 and {Drama A, Drama B} to a_2. While we know nothing about a_1 or a_2, we might be inclined to think a_1 is likely to pick Romance A from {Romance A, Drama A}, while a_2 is likely to pick Drama A from the same choice set. Similar to the CDM derivation in the previous section, the choice set is a signal for chooser preferences. We can also apply collaborative filtering principles, with the distinction that instead of thinking that similar users like similar items, we assume similar choosers are shown similar choice sets. There is a limitation, though, as this approach only lets us make predictions for choosers who appear in the original dataset. While there are many ways of using information from choice set assignment, we highlight an approach for the case where we have corresponding types of choosers and items (e.g., "romance fans" for "romance movies").

Suppose that choosers are more likely to have an item in their choice set if it matches their type. Define the m × n matrix R, where R_ij = 1 if the ith choice set includes item j and R_ij = 0 otherwise. We can think of R as the upper right block of the adjacency matrix of a bipartite graph between choosers and items, in which an edge (a, i) means that i is in a's choice set. With fixed choice set inclusion probabilities for each type, clustering choosers into types based on their choice sets is then an instance of the bipartite stochastic block model (SBM) recovery problem [1, 27].

In Theorem 8, we apply a classic exact recovery result due to McSherry [34] to show how a choice system with discrete types can be deconfounded without access to chooser covariates (i.e., knowledge of type membership), but any bipartite SBM clustering algorithm could be used (see Abbe [1] for a survey of SBM results).

Theorem 8. Suppose items and choosers are jointly split into k types. Let s be the smallest number of items or choosers of any type and let n = |A| + |U|. Suppose that for each chooser a ∈ A, i ∈ U is included in a's choice set with probability p if a and i are of the same type and with probability q otherwise. There exists a constant C such that for large enough n, if

  s(p − q)^2 > Ck(n/s + log(n/δ)),   (4)

then w.p. 1 − δ, we can efficiently learn the type of every item and every chooser given a dataset D with one choice from each a ∈ A.

While McSherry's algorithm has strong theoretical guarantees, a more practical implementation is spectral co-clustering [14], which performs well for our purposes. Once we recover type memberships, we train separate models for each type of chooser and use the model for a chooser's type for deconfounded counterfactual predictions.

5.3 Empirical data without chooser covariates

We apply our spectral method to the yoochoose online shopping dataset [7]. The dataset consists of all items clicked on in a session and an indicator of whether each item was purchased. We consider each purchase to be a choice from the set of all items viewed in the session. We group items by category (e.g., sports equipment), removing those with fewer than 100 purchases, leaving 29 categories. We then perform spectral co-clustering [14] on the choice set matrix R with 2 to 10 chooser clusters and train a separate logit on each cluster. We ignore the item clusters. We compare against random clustering with the cluster sizes found by spectral clustering and mixed logit with the same number of components.

Spectral clustered logit describes the data much better than random clustering or even mixed logit (Figure 5). Note that the clusters are based only on choice set assignment, not choice behavior. In contrast, mixed logit bases its mixture components solely on choices. The strong performance of spectral clustering indicates that choice sets are informative about preferences, and our use of this information is much easier than learning a mixture model.

6 DISCUSSION

Choice set confounding is widespread and can affect choice probability estimates, alter or introduce context effects, and lead to poor generalization. Existing models ignoring chooser covariates are particularly susceptible, but plugging in covariates is not a universal solution. We saw that covariates may be more informative about choice sets than preferences, making IPW more viable than regression. An important contribution is formalizing and demonstrating choice set confounding, as it has significant implications for discrete choice modeling. For instance, initial research on the SF transportation data used extensive nested logit modeling to account for IIA violations [26], which we can manage with choice set confounding.

Our methods are a first step in addressing confounding. A challenge was learning choice set propensities for IPW. Simple logistic regression can work for binary treatments, but estimating exponentially many choice set propensities is difficult. In expedia, we learned a distribution over mean choice set feature vectors as an approximation. Other methods for learning set assignment probabilities would be valuable. Instrumental variables are another causal inference approach [18] that could be used in our setting, but identifying instruments for choice data is difficult. Alternatively, a matching approach [23] could compare pairs of similar choosers with different choice sets. Other directions for future investigation include rigorous methods of detecting choice set confounding, or verifying that it has been successfully accounted for, and of testing assumptions.

ACKNOWLEDGMENTS

This research was supported by ARO MURI, ARO Awards W911NF19-1-0057 and 73348-NS-YIP, NSF Award DMS-1830274, the Koret Foundation, and JP Morgan Chase & Co. We thank Spencer Peters for helpful discussions.

REFERENCES

[1] Emmanuel Abbe. 2017. Community detection and stochastic block models: recent developments. JMLR 18, 1 (2017).
[2] Arpit Agarwal, Prathamesh Patil, and Shivani Agarwal. 2018. Accelerated spectral ranking. In ICML.
[3] Greg M Allenby and Peter E Rossi. 1998. Marketing models of consumer heterogeneity. J. Econometrics 89, 1-2 (1998).
[4] Peter C Austin. 2011. An introduction to propensity score methods for reducing the effects of confounding in observational studies. Multivariate Behav Res (2011).
[5] Heejung Bang and James M Robins. 2005. Doubly robust estimation in missing data and causal inference models. Biometrics 61, 4 (2005).
[6] Moshe Ben-Akiva and Bruno Boccara. 1995. Discrete choice models with latent choice sets. Int. Journal of Research in Marketing 12, 1 (1995).
[7] David Ben-Shimon et al. 2015. RecSys challenge 2015 and the YOOCHOOSE dataset. In RecSys.
[8] Austin R Benson, Ravi Kumar, and Andrew Tomkins. 2016. On the relevance of irrelevant alternatives. In WWW.
[9] Chandra R Bhat and Rachel Gossen. 2004. A mixed multinomial logit model analysis of weekend recreational episode type choice. Transp Res Part B (2004).
[10] Michel Bierlaire, Ricardo Hurtubia, and Gunnar Flötteröd. 2010. Analysis of implicit choice set generation using a constrained multinomial logit model. Transportation Research Record 2175, 1 (2010).
[11] Amanda Bower and Laura Balzano. 2020. Preference Modeling with Context-Dependent Salient Features. In ICML.
[12] David Brownstone et al. 1996. A transactions choice model for forecasting demand for alternative-fuel vehicles. Res. Transp. Econ. 4 (1996).
[13] Flavio Chierichetti, Ravi Kumar, and Andrew Tomkins. 2018. Learning a mixture of two multinomial logits. In ICML.
[14] Inderjit S Dhillon. 2001. Co-clustering documents and words using bipartite spectral graph partitioning. In KDD.
[15] Richard O Duda, Peter E Hart, and David G Stork. 2001. Pattern Classification.
[16] David A Freedman and Richard A Berk. 2008. Weighting regressions by propensity scores. Evaluation Rev. 32, 4 (2008).
[17] Michele Jonsson Funk et al. 2011. Doubly robust estimation of causal effects. American Journal of Epidemiology 173, 7 (2011), 761–767.
[18] Miguel A Hernán and James M Robins. 2006. Instruments for causal inference: an epidemiologist's dream? Epidemiology (2006).
[19] Keisuke Hirano, Guido W Imbens, and Geert Ridder. 2003. Efficient estimation of average treatment effects using the estimated propensity score. Econometrica 71, 4 (2003).
[20] Saul D Hoffman and Greg J Duncan. 1988. Multinomial and conditional logit discrete-choice models in demography. Demography 25, 3 (1988), 415–427.
[21] Joel Huber, John W Payne, and Christopher Puto. 1982. Adding asymmetrically dominated alternatives: Violations of regularity and the similarity hypothesis. Journal of Consumer Research 9, 1 (1982).
[22] Samuel Ieong, Nina Mishra, and Or Sheffet. 2012. Predicting preference flips in commerce search. In ICML.
[23] Guido W Imbens. 2004. Nonparametric estimation of average treatment effects under exogeneity: A review. Review of Econ. and Stat. 86, 1 (2004).
[24] Guido W Imbens and Donald B Rubin. 2015. Causal inference in statistics, social, and biomedical sciences. Cambridge University Press.
[25] Kaggle. 2013. Personalize Expedia Hotel Searches — ICDM 2013. https://www.kaggle.com/c/expedia-personalized-sort.
[26] Frank S Koppelman and Chandra Bhat. 2006. A self instructing course in mode choice modeling: multinomial and nested logit models. (2006).
[27] Daniel B Larremore, Aaron Clauset, and Abigail Z Jacobs. 2014. Efficiently inferring community structure in bipartite networks. Phys. Rev. E 90, 1 (2014).
[28] Charles F Manski. 1977. The structure of random utility models. Theory and Decision 8, 3 (1977).
[29] Charles F Manski and Steven R Lerman. 1977. The estimation of choice probabilities from choice based samples. (1977).
[30] Benjamin M Marlin, Richard S Zemel, Sam Roweis, and Malcolm Slaney. 2007. Collaborative filtering and the missing at random assumption. In UAI.
[31] Daniel McFadden. 1974. Conditional logit analysis of qualitative choice behavior. Frontiers in Econometrics (1974).
[32] Daniel McFadden and Kenneth Train. 2000. Mixed MNL models for discrete response. Journal of Applied Econometrics 15, 5 (2000).
[33] Daniel McFadden, William B Tye, and Kenneth Train. 1977. An Application of Diagnostic Tests for the Independence From Irrelevant Alternatives Property of the Multinomial Logit Model. Transp Res Rec (1977).
[34] Frank McSherry. 2001. Spectral partitioning of random graphs. In FOCS.
[35] Karlson Pfannschmidt, Pritha Gupta, and Eyke Hüllermeier. 2019. Learning choice functions: Concepts and architectures. arXiv (2019).
[36] Stephen Ragain and Johan Ugander. 2016. Pairwise choice Markov chains. In NeurIPS.
[37] Paul R Rosenbaum and Donald B Rubin. 1983. The central role of the propensity score in observational studies for causal effects. Biometrika 70, 1 (1983).
[38] Nir Rosenfeld, Kojin Oshiba, and Yaron Singer. 2020. Predicting Choice with Set-Dependent Aggregation. In ICML.
[39] Donald B Rubin. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. J. Edu. Psych. 66, 5 (1974), 688.
[40] Donald B Rubin. 1977. Assignment to treatment group on the basis of a covariate. J. Educational Stat. 2, 1 (1977).
[41] Donald B Rubin. 2005. Causal inference using potential outcomes: Design, modeling, decisions. J. Amer. Statist. Assoc. 100, 469 (2005), 322–331.
[42] Tobias Schnabel, Adith Swaminathan, Ashudeep Singh, Navin Chandak, and Thorsten Joachims. 2016. Recommendations as treatments: debiasing learning and evaluation. In ICML.
[43] Arjun Seshadri, Alex Peysakhovich, and Johan Ugander. 2019. Discovering Context Effects from Raw Choice Data. In ICML.
[44] Arjun Seshadri, Stephen Ragain, and Johan Ugander. 2020. Learning Rich Rankings. NeurIPS (2020).
[45] Itamar Simonson. 1989. Choice based on reasons: The case of attraction and compromise effects. J. Consumer Research 16, 2 (1989).
[46] Itamar Simonson and Amos Tversky. 1992. Choice in context: Tradeoff contrast and extremeness aversion. Journal of Marketing Research 29, 3 (1992), 281–295.
[47] Leslie S Stratton, Dennis M O'Toole, and James N Wetzel. 2008. A multinomial logit model of college stopout and dropout behavior. Econ. Edu. Rev. 27, 3 (2008).
[48] Elie Tamer. 2019. The ET Interview: Professor Charles Manski. Econometric Theory 35, 2 (2019).
[49] Kiran Tomlinson and Austin Benson. 2020. Choice Set Optimization Under Discrete Choice Models of Group Decisions. In ICML.
[50] Kiran Tomlinson and Austin R Benson. 2021. Learning Interpretable Feature Context Effects in Discrete Choice. In KDD.
[51] Kenneth E Train. 2009. Discrete choice methods with simulation.
[52] A Tversky. 1972. Elimination by aspects: A theory of choice. Psych. Rev. (1972).
[53] Xiaojie Wang, Rui Zhang, Yu Sun, and Jianzhong Qi. 2019. Doubly robust joint learning for recommendation on data missing not at random. In ICML.
[54] Yixin Wang, Dawen Liang, Laurent Charlin, and David M Blei. 2020. Causal Inference for Recommender Systems. In RecSys.
[55] Larry Wasserman. 2013. All of Statistics: A Concise Course in Statistical Inference. Springer Science & Business Media.
[56] Chieh-Hua Wen and Frank S Koppelman. 2001. The generalized nested logit model. Transportation Research Part B: Methodological 35, 7 (2001).
[57] Shuang-Hong Yang et al. 2011. Collaborative competitive filtering: learning recommender using context of user choice. In SIGIR.

A PROOFS

Proof of Observation 1. Conditioning over choosers yields Pr(i | C) = Σ_a Pr(i | a, C) Pr(a | C). Meanwhile, E_a[Pr(i | a, C)] = Σ_a Pr(i | a, C) Pr(a). These are equal if condition (1) holds (since independence also implies Pr(a) = Pr(a | C)). If condition (2) holds, then we directly have E_a[Pr(i | a, C)] = Pr(i | C). See Example 1 for an instance where this equality fails when neither (1) nor (2) hold. □

Proof of Theorem 2. First, notice that any universal logit choice probabilities aggregated over a population can be expressed by set-dependent utilities u_i*(C) for each C ⊆ U, i ∈ C. For every choice set C ⊆ U, construct a chooser a_C with fixed utilities u_i(a_C) = u_i*(C). Let Pr(C | a_C) = 1 and Pr(C′ | a_C) = 0 for all other C′ ≠ C. The choice probabilities of this mixture with chooser-dependent sets is the same as in the original system, and the mixture has finitely many (2^{|U|} − 1) components, one for each nonempty choice set C. □

Proof of Theorem 4. We need (1) in order for IPW (and therefore D̃) to be well-defined. Fix i and C. Consider the coefficient of log Pr_θ(i | C) in ℓ(θ; D*). In expectation, this term appears |D| Pr(i, C) times. Expanding this:

  |D| Pr(i, C) = |D| Σ_{a∈A} Pr(i, C | a) Pr(a)
              = |D| Σ_{a∈A} Pr(i | C, a) Pr(C | a) Pr(a)
              = (|D| / |C_D|) Σ_{a∈A} Pr(i | C, a) Pr(a),

where the last step follows from D* having uniformly random choice sets. Now consider the coefficient of log Pr_θ(i | C) in ℓ(θ; D̃). By IPW, this coefficient is

  Σ_{(a,C′,i′) ∈ D : i′=i, C′=C} 1 / (|C_D| Pr(C | x_a)) = Σ_{a∈A} Σ_{(a′,C′,i′) ∈ D : a′=a, C′=C, i′=i} 1 / (|C_D| Pr(C | x_a)).

In expectation, the sample (a, C, i) occurs |D| Pr(a, C, i) times. Additionally, by choice set ignorability, Pr(C | x_a) = Pr(C | a). We thus have that the expected coefficient is

  Σ_{a∈A} |D| Pr(a, C, i) / (|C_D| Pr(C | x_a))
  = (|D| / |C_D|) Σ_{a∈A} Pr(a) Pr(C | a) Pr(i | C, a) / Pr(C | a)
  = (|D| / |C_D|) Σ_{a∈A} Pr(a) Pr(i | C, a),

which matches the coefficient in ℓ(θ; D*). Since the expected coefficients agree for all i and C, we then have the equality. □

Proof of Theorem 6. By the consistency of the MLE, as |D| → ∞, parameter estimates for a correctly specified choice model converge to the true parameters. Thus, estimated choice probabilities also converge:

  lim_{|D|→∞} P̂r(i | x_a, C) = Pr(i | x_a, C) = Pr(i | a, x_a, C) (by preference ignorability) = Pr(i | a, C). □

Proof of Theorem 7. Observing the choice set gives us a noisy measurement of x_a, which we can adjust using our knowledge of the distribution of x_a. The posterior of a Gaussian with a Gaussian prior is also Gaussian—in particular, x_a | C is Gaussian, with mean

  E[x_a | C] = Σ_0 (Σ_0 + (1/k) Σ)^{−1} y_C + (1/k) Σ (Σ_0 + (1/k) Σ)^{−1} μ

[15, Section 3.4.3]. Thus, the expected chooser x_a* has utilities

  u_i(a*, C) = [Σ_0 (Σ_0 + (1/k) Σ)^{−1} y_C + (1/k) Σ (Σ_0 + (1/k) Σ)^{−1} μ]^T y_i.

This is exactly an LCL with θ and A as claimed. □

Proof of Theorem 8. Consider the bipartite graph whose left nodes are choosers and whose right nodes are items, each split into blocks according to their type. The choice set assignment process above defines a bipartite SBM on this graph with intra-type probabilities p and inter-type probabilities q (between chooser nodes and item nodes). Recovering types from choice sets can then be viewed as an instance of the planted partition problem [34]. We can thus directly apply Theorem 4 of McSherry [34]² to achieve the desired result given Equation (4), with the caveat that the algorithm is random and succeeds with probability 1/k. Repeating the algorithm ck times achieves failure probability (1 − 1/k)^{ck} ≤ 1/e^c, which is smaller than δ if c > log(1/δ). We can thus make δ smaller by a factor of 2 (absorbing this into the constant C in eq. (4)) and we are left with the guarantee as stated, only increasing the running time by a factor k log(1/δ). □

²Notice that s(p − q)² is a lower bound on the squared 2-norm of the columns of the SBM edge probability matrix required by [34, Theorem 4]. Additionally, we use the crude variance upper bound σ² = 1 for simplicity.

B AFFINE-MEAN GAUSSIAN CHOICE SET MODEL

For estimating choice set propensities in expedia, we model the distribution of mean choice set features using an affine-mean Gaussian. Here, we show how this model can be easily estimated from data.

Proposition 9. Given a dataset D, the model y_C ~ N(W x_a + z, Σ) is identifiable iff there are m + 1 choosers in D with affinely independent covariates. If the model is identified, the maximum likelihood parameters W*, z* are the solution to the least-squares problem

  (W*, z*) = argmin_{W ∈ R^{n×m}, z ∈ R^n} Σ_{(a,C) ∈ D} ||y_C − (W x_a + z)||_2^2,   (5)

which have the closed form:

  W* = [Σ_{(a,C) ∈ D} (y_C − ȳ_D) x_a^T] [Σ_{(a,C) ∈ D} (x_a − x̄_D) x_a^T]^{−1}   (6)
  z* = ȳ_D − W* x̄_D,   (7)

where x̄_D = (1/|D|) Σ_{(a,C) ∈ D} x_a and ȳ_D = (1/|D|) Σ_{(a,C) ∈ D} y_C. Additionally, the maximum likelihood covariance matrix is the sample covariance:

  Σ* = (1/|D|) Σ_{(a,C) ∈ D} (y_C − W* x_a − z*)(y_C − W* x_a − z*)^T.   (8)

Proof sketch. This can be derived following the same steps as the standard Gaussian MLE proof (with a bit of extra matrix calculus): (1) take partial derivatives of the log-likelihood with respect to W and z, (2) set them to zero, (3) solve for z, (4) plug this in to solve for W, (5) do the same to solve for Σ in its partial derivative. This works since the log-likelihood is still convex after adding in the affine transformation. We omit the details as they are tedious and unenlightening. □

C EXPERIMENT DETAILS

We implemented all choice models with PyTorch and (except mixed logit) train them using Rprop with no minibatching to optimize the log-likelihood for 500 epochs or until convergence (squared gradient norm < 10^{−8}), whichever comes first. We use ℓ2 regularization with coefficient λ = 10^{−4} for all models to ensure identifiability. For mixed logit, we use an expectation-maximization (EM) algorithm [51] with a one hour timeout. Our code, results, and links to data are available at https://github.com/tomlinsonk/choice-set-confounding.

Table 5: Regularity violations in sf-work and sf-shop, impossible under mixed logit. Including additional item(s) appears to increase the probability that DA or DA/SR is chosen. The differences are significant according to Fisher's exact test (sf-work: p = 6.5 × 10^{−9}, sf-shop: p = 0.005).

sf-work
Choice set (C) | Pr(DA | C) | N
{DA, SR 2, SR 3+, Transit} | 0.72 | 1661
{DA, SR 2, SR 3+, Transit, Bike} | 0.83 | 829

sf-shop
Choice set (C) | Pr(DA/SR | C) | N
{DA, DA/SR, SR 2, SR 3+, SR 2/SR 3+, Transit} | 0.17 | 534
{DA, DA/SR, SR 2, SR 3+, SR 2/SR 3+, Transit, Bike, Walk} | 0.23 | 1315

DA: drive alone. SR: shared ride, number indicates car occupancy. Slashes indicate different mode used for outbound and inbound trips.
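The closed form (6)-(7) of Proposition 9 is an ordinary least-squares fit with an intercept, so it can be sanity-checked against a standard solver. The following sketch (illustrative, with synthetic data; variable names are ours, not the paper's) compares the closed form to `numpy.linalg.lstsq` on a design matrix with an appended intercept column:

```python
import numpy as np

rng = np.random.default_rng(3)
n_samples, d_x, d_y = 200, 2, 3

X = rng.normal(size=(n_samples, d_x))   # chooser covariates x_a
W_true = rng.normal(size=(d_y, d_x))
z_true = rng.normal(size=d_y)
# Mean choice set feature vectors y_C with affine-mean Gaussian noise.
Y = X @ W_true.T + z_true + 0.1 * rng.normal(size=(n_samples, d_y))

# Closed form (6)-(7): center, then solve the normal equations.
x_bar, y_bar = X.mean(axis=0), Y.mean(axis=0)
W_hat = ((Y - y_bar).T @ X) @ np.linalg.inv((X - x_bar).T @ X)
z_hat = y_bar - W_hat @ x_bar

# The same fit via least squares with an explicit intercept column.
design = np.column_stack([X, np.ones(n_samples)])
coef, *_ = np.linalg.lstsq(design, Y, rcond=None)  # shape (d_x + 1, d_y)
```

The two fits coincide because Σ (y_C − ȳ) x̄^T = 0, so eq. (6) equals the usual covariance-based OLS solution.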