Choice Modelling in the Age of Machine Learning

Choice modelling in the age of machine learning Sander van CranenburghI*, Shenhao WangII, Akshay VijIII, Francisco PereiraIV, Joan WalkerV I Transportation and Logistics group, Department of Engineering Systems and Services, Delft University of Technology II Urban Mobility Lab, Massachusetts Institute of Technology III Institute for Choice, University of South Australia IV Department of Technology, Management and Economics, Technical University of Denmark V Department of Civil and Environmental Engineering, University of California Berkeley * Corresponding author. [email protected] Abstract Since its inception, the choice modelling field has been dominated by theory-driven models. The recent emergence and growing popularity of machine learning models offer an alternative data- driven approach. Machine learning models, techniques and practices could help overcome problems and limitations of the current theory-driven modelling paradigm, e.g. relating to the ad- hocness in search for the optimal model specification, and theory-driven choice model’s inability to work with text and image data. However, despite the potential value of machine learning to improve choice modelling practices, the choice modelling field has been somewhat hesitant to embrace machine learning. The aim of this paper is to facilitate (further) integration of machine learning in the choice modelling field. To achieve this objective, we make the case that (further) integration of machine learning in the choice modelling field is beneficial for the choice modelling field, and, we shed light on where the benefits of further integration can be found. Specifically, we take the following approach. First, we clarify the similarities and differences between the two modelling paradigms. Second, we provide a literature overview on the use of machine learning for choice modelling. Third, we reinforce the strengths of the current theory-driven modelling paradigm and compare this with the machine learning modelling paradigm, Fourth, we identify opportunities for embracing machine learning for choice modelling, while recognising the strengths of the current theory-driven paradigm. Finally, we put forward a vision on the future relationship between the theory-driven choice models and machine learning. 1 Introduction The inception of the Random Utility Maximisation (RUM) model (McFadden 1974) in the mid- 1970s has been foundational for the way in which choice behaviour has been modelled and studied in the past 50 years (Hess and Daly 2014). In this theory-driven modelling paradigm, the analyst imposes structure to the data by postulating that decision makers make decisions based on utility theory. Recently, an alternative modelling paradigm is gaining ground in the choice modelling field. In this data-driven modelling paradigm –also often referred to as machine learning– the structure of the problem is learned from the data, as opposed to being imposed by the analyst based on prior beliefs and behavioural theories. A growing number of studies aim to bring models, estimation techniques and practices from machine learning to the choice modelling field (Wong et al. 2017; Lederrey et al. 2020; Sifringer et al. 2020). 1 The motivations put forward in these papers to employ machine learning for choice modelling are diverse but may be summarised in terms of four main points. Firstly, machine learning models can overcome problems of theory-driven choice models relating to the search for the optimal model specification, and the adverse effects caused by misspecifications. In theory-driven choice models the analyst imposes structure to the problem through functional forms and variable selections, based on theoretical frameworks. Then, – in a time consuming but often fairly ad hoc and subjective process – the final model is selected from a series of competing specifications (Paz et al. 2019; Rodrigues et al. 2020). In case this final model turns out to be a poor description of the true underlying data-generating process, model inferences and predictions can be erroneous. In contrast, machine learning models do not have this issue as the structure of the problem is learned from the data. Moreover, it is argued that learning the structure from the data has the additional advantage of opening up the possibility to find the unexpected. Secondly, machine learning often achieves higher goodness-of-fit than its theory-driven counterparts, especially in the context of prediction applications (Lee et al. 2018). While goodness-of-fit is seldom an aim in and of itself in the choice modelling literature, better fit is desirable as it is taken as a signal that the model has accurately captured the underlying process. Thirdly, machine learning models and estimation techniques often work comparatively well in combination with large and continuous streams of data (Danaf et al. 2019). As choice modellers increasingly get access to very large data sets, machine learning offers opportunities to develop new ways for mining behaviourally meaningful, statistically robust and computationally efficient information from these data sets. Finally, machine learning models can work with types of data that are currently outside the realm of traditional theory-driven discrete choice models, such as text and image data. Introducing machine learning to choice modelling thus opens-up the opportunity to extend the reach of choice modelling to model choice behaviour in the context of those data types (Van Cranenburgh 2020). In sum, there is clear potential of machine learning models, estimation techniques and practices for choice modelling. However, despite this potential the choice modelling field has been somewhat hesitant to embrace machine learning. This hesitance appears to be attributable to at least four related factors. Firstly, there seems to be a lack of understanding about machine learning, for instance, about what theory- driven choice models and machine learning have in common and what sets them apart. Possibly, this lack of understanding is caused by differences in vocabulary and terminology across the two fields (Breiman 2001) which impedes choice modellers from effectively understanding the machine learning literature. Secondly, there exist persistent misconceptions about machine learning among choice modellers that are potentially holding back analysts from being open to what machine learning could offer. Common misconceptions are, for instance, that machine learning models can only be used for prediction as opposed to behavioural inference, and that machine learning models are overfitting the data more often than not. Thirdly, which is in part a consequence of the above two factors, there seems to be a lack of recognition of the potential value of integrating machine learning models, techniques and practices for the choice modelling field. Finally, a long-term research agenda and vision about the place of machine learning vis-a-vis theory-driven choice models have been absent, which in turn could hold back researcher to jump on the bandwagon. This paper aims to facilitate (further) integration of machine learning in the choice modelling field. To achieve this objective, in this paper we make the case that (further) integration of machine 2 learning in the choice modelling field is beneficial for the choice modelling field, and, we shed light on where the benefits of further integration can be found. Specifically, in this paper we take the following approach. Section 2 starts by clarifying the similarities and differences between the two modelling paradigms at a philosophical level. Also, it sets the stage by providing a concise literature overview on what is currently happening in terms of machine learning in the choice modelling field. Section 3 reinforces the strengths of the current theory-driven choice models. In this section we specifically ask ourselves ‘what makes the current theory-driven modelling paradigm strong? and ‘how does that compare to machine learning?’. After having identified what already goes well in the current theory-driven paradigm, in Section 4 we identify promising opportunities where the choice modelling field can do better by embracing machine learning. Thereby, we hope to define a research agenda and inspire future research. Finally, Section 5 puts forward a vision on the future relationship between the theory-driven choice models and machine learning. 2 Philosophical similarities and differences between theory-driven and data-driven models Inspired by the seminal paper on data-driven versus theory-driven models of Breiman (2001), several papers have tried to position data-driven models with respect to other adjacent fields. In this tradition, this section aims to explicate the similarities and differences between theory-driven choice models and data-driven (machine learning) models, while specifically taking a choice modeller’s perspective. We especially focus on high-level similarities and differences between the philosophical approaches that have motivated the development of models in either discipline. The realm of machine learning models is so broad that it hinders making general statements. Therefore, we first narrow down what we mean with machine learning models. When we talk about machine learning models in the discussions that follow, we specifically have in mind the machine learning classifiers that are commonly encountered in the computational intelligence field. These include, among others, Artificial Neural Networks (ANNs), Boltzmann machines

Choice Modelling in the Age of Machine Learning

Combining Choice Experiment and Attribute Best–Worst Scaling

Integrated Choice and Latent Variable Models: a Literature Review on Mode Choice Hélène Bouscasse

Cloud Computing Adoption Decision Modelling for Smes: from the PAPRIKA Perspective Salim Alismaili University of Wollongong

A Systematic Review of the Reliability and Validity of Discrete Choice

Best–Worst Scaling Vs. Discrete Choice Experiments: an Empirical Comparison Using Social Care Data Article (Accepted Version) (Refereed)

Experimental Measurement of Preferences in Health and Healthcare Using Best-Worst Scaling: an Overview Axel C

The Generalized Multinomial Logit Model

An R Package for Case 1 Best-Worst Scaling Mark H. White II

Design and Analysis of Simulated Choice Or Allocation Experiments in Travel Choice Modeling

Discrete Choice Experiments Are Not Conjoint Analysis

Is Best-Worst Scaling Suitable for Health State Valuation? a Comparison with Discrete Choice Experiments

Constructing Experimental Designs for Discrete-Choice Experiments: Report of the ISPOR Conjoint Analysis Experimental Design Good Research Practices Task Force F