Cognitive Psychology 125 (2021) 101360

Data-driven experimental design and model development using Gaussian process with active learning

Jorge Chang a,*, Jiseob Kim b, Byoung-Tak Zhang b, Mark A. Pitt a, Jay I. Myung a

a Department of Psychology, The Ohio State University, Columbus, OH 43210, USA
b School of Computer Science and Engineering, Seoul National University, Seoul 151-742, Republic of Korea

* Corresponding author. E-mail address: [email protected] (J. Chang).

https://doi.org/10.1016/j.cogpsych.2020.101360
Received 6 November 2019; Received in revised form 26 September 2020; Accepted 15 November 2020; Available online 17 January 2021

Keywords: Computational cognition; Data-driven cognitive modeling; Gaussian process; Active learning; Optimal experimental design; Delay discounting; Nonparametric Bayesian methods

ABSTRACT

Interest in computational modeling of cognition and behavior continues to grow. To be most productive, modelers should be equipped with tools that ensure optimal efficiency in data collection and in the integrity of inference about the phenomenon of interest. Traditionally, models in cognitive science have been parametric, which makes them particularly susceptible to model misspecification because their strong assumptions (e.g., parameterization, functional form) may introduce unjustified biases in data collection and inference. To address this issue, we propose a data-driven nonparametric framework for model development, one that also includes optimal experimental design as a goal. It combines Gaussian processes, a class of stochastic processes often used for regression and classification, with active learning from machine learning to iteratively fit the model and use it to optimize design selection throughout the experiment. The approach, dubbed Gaussian process with active learning (GPAL), is an extension of the parametric, adaptive design optimization (ADO) framework (Cavagnaro, Myung, Pitt, & Kujala, 2010). We demonstrate the application and features of GPAL in a delay discounting task and compare its performance to ADO in two experiments. The results show that GPAL is a viable modeling framework that is noteworthy for its high sensitivity to individual differences, identifying novel patterns in the data that were missed by the model-constrained ADO. This investigation represents a first step towards the development of a data-driven cognitive modeling framework that serves as a middle ground between raw data, which can be difficult to interpret, and parametric models, which rely on strong assumptions.

1. Introduction

Experimentation is at the core of scientific research, whether one is interested in understanding how people trade off between small but immediate rewards and larger but delayed rewards in delay discounting, or the neural basis of cognitive control in visual search. Advancement in empirical research depends critically on the collection of high-quality, informative data, from which one can draw inferences with confidence about the phenomenon under study. A challenge faced by researchers is that experiments can be difficult to design because the consequences of design decisions (e.g., stimulus values, task settings, and testing schedule) are not known prior to data collection. Efficiency of data collection also matters, especially when experiments are costly to perform in terms of time, money, and availability of participants, such as in brain imaging experiments, research with infants, and clinical research.
Ideally, one strives to design experiments that yield the most informative data in order to achieve the experimental objective with the fewest observations (trials) possible. Recent developments in statistical computing offer algorithm-based ways to achieve these goals. Specifically, computational methods of optimal experimental design (OED; Atkinson & Donev, 1992; Lindley, 1956) in Bayesian statistics can assist in improving scientific inference by efficiently searching the design space to identify the combination of design variables and parameters that is likely to be most informative trial after trial, making the experiment efficient. Concretely, in an optimized adaptive experiment, the values of design variables (e.g., reward amounts and time delays in a delay discounting experiment) are not predetermined but instead are computed iteratively, trial by trial, to be optimal in an information-theoretic sense through real-time analysis of participant responses from earlier trials. With a new observation made using the optimal design, the adaptive process then repeats on the next trial. This is unlike traditional approaches in which experimental designs are fixed for all participants or vary across trials according to a heuristic decision rule, such as the staircase method in adaptive threshold estimation or psychometric function estimation (e.g., García-Pérez, 1998).

Adaptive design optimization (ADO; Cavagnaro, Myung, Pitt, & Kujala, 2010; Myung, Cavagnaro, & Pitt, 2013) was developed as an OED framework for behavioral experiments and derives from Bayesian experimental design (Chaloner & Verdinelli, 1995) and active learning in machine learning (Cohn, Ghahramani, & Jordan, 1996). ADO is a general-purpose, model-based algorithm that exploits the predictions of a computational model of task performance to guide design selection on each trial in an adaptive manner. The top two panels in Fig. 1 illustrate the difference between a traditional experiment and an ADO-based experiment. A growing body of work shows that ADO can significantly improve the informativeness and efficiency of data collection (e.g., Cavagnaro, Pitt, Gonzalez, & Myung, 2013; Cavagnaro, Pitt, & Myung, 2011; Gu et al., 2016).

One limitation of ADO, however, is the technical requirement that the assumed model be correctly specified, in the sense that the modeling scheme represents the true data-generating model (e.g., that the hyperbolic model is the most accurate description of the rate at which people discount future rewards). This assumption is unlikely to hold in practice because all models are imperfect approximations of the underlying mental process under study. To the extent that the parametric modeling assumption is violated, ADO will be suboptimal and not as efficient as it could be. In short, ADO is not robust to inaccuracies and uncertainties about the underlying system. If a model is wrong, it is considered misspecified, and the results of ADO experiments built on it can be misleading. One way to address the poor robustness of ADO is to drop its parametric modeling requirement and adopt a nonparametric (i.e., data-driven) approach.
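To make the adaptive loop concrete, the following is a minimal sketch of ADO-style trial-by-trial design selection for a binary delay discounting choice (smaller-sooner vs. larger-later reward). It assumes the hyperbolic model V = A / (1 + kD) with a logistic choice rule and a gridded parameter posterior, and uses as its utility the mutual information between the upcoming response and the parameters, in the spirit of Cavagnaro et al. (2010). The function names, parameter grids, and candidate designs are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Assumed parametric model (illustrative): hyperbolic discounting
# V = A / (1 + k*D), with a logistic choice rule of sensitivity s.
def p_choose_ll(k, s, a_ss, a_ll, d_ll):
    """Probability of choosing the larger-later (LL) reward."""
    v_ss = a_ss                      # smaller-sooner reward, no delay
    v_ll = a_ll / (1.0 + k * d_ll)   # hyperbolically discounted LL reward
    return 1.0 / (1.0 + np.exp(-s * (v_ll - v_ss)))

def bernoulli_entropy(p):
    p = np.clip(p, 1e-9, 1.0 - 1e-9)
    return -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))

# Gridded posterior over parameters (k: discount rate, s: choice sensitivity).
K, S = np.meshgrid(np.linspace(0.001, 0.5, 60), np.linspace(0.1, 2.0, 40))
posterior = np.full(K.shape, 1.0 / K.size)  # uniform prior

# Candidate designs: (SS amount, LL amount, LL delay in days); illustrative grid.
designs = [(a, 100.0, d) for a in range(10, 100, 10) for d in (1, 7, 30, 90, 365)]

def best_design(posterior):
    """Pick the design maximizing mutual information between the upcoming
    response and the model parameters (the ADO utility)."""
    utilities = []
    for a_ss, a_ll, d_ll in designs:
        p = p_choose_ll(K, S, a_ss, a_ll, d_ll)
        p_pred = np.sum(posterior * p)  # posterior-predictive P(choose LL)
        utilities.append(bernoulli_entropy(p_pred)
                         - np.sum(posterior * bernoulli_entropy(p)))
    return designs[int(np.argmax(utilities))]

def bayes_update(posterior, design, chose_ll):
    """Update the gridded posterior after observing one choice."""
    a_ss, a_ll, d_ll = design
    p = p_choose_ll(K, S, a_ss, a_ll, d_ll)
    posterior = posterior * (p if chose_ll else 1.0 - p)
    return posterior / posterior.sum()
```

On each trial, one would call best_design(posterior), present the chosen amounts and delay to the participant, and then call bayes_update with the observed choice. Note that every step of this loop conditions on the hyperbolic model being correct; if the model is misspecified, the "optimal" designs inherit its biases, which is precisely the limitation motivating GPAL.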
In parametric modeling, observations are assumed to be generated from some unknown (to be inferred) parameterized form of the model equation. In nonparametric modeling, on the other hand, the target model is inferred directly from the data collected in the experiment, without constraining it to a specific parametric family of functional forms. A nonparametric model, therefore, is highly flexible, containing virtually all possible functional forms (linear, nonlinear, cyclic, etc.) for describing any data pattern; in a sense, it is a parametric model with a theoretically infinite number of parameters, in which the number of parameters grows with the amount of data. Nonparametric modeling via optimal experimental design is the focus of the present work.

In this paper, we propose a data-driven approach to optimal experimental design (OED). It uses Gaussian processes (GPs), a nonparametric Bayesian method that places priors over functions. GPs are a popular modeling tool in machine learning for regression and classification tasks (Rasmussen & Williams, 2006). Recently, researchers in psychology have also explored the use of GPs to model human behavior (e.g., Cox, Kachergis, & Shiffrin, 2012; Griffiths, Lucas, Williams, & Kalish, 2009; Schulz, Speekenbrink, & Krause, 2018; Song, Sukesan, & Barbour, 2018). Among these, the work most closely related to the present study is Schulz et al. (2018), who discuss a way of combining GPs with active learning into a unified OED framework (pp. 9–10). Here, we further develop this idea and apply it to a behavioral task. We refer to this framework as Gaussian process with active learning (GPAL). GPAL simultaneously models the underlying function that generated the data (i.e., the cognitive model) while optimizing the experimental design to model that function efficiently. The GPAL algorithm begins with a rough approximation (i.e., prior) of the initially unknown data-generating model, and then continually updates and refines this approximation via Bayes' rule after each observation in the experiment is collected. GPAL, while similar to ADO, eliminates the need for a parameterized model, instead seeking to learn the model entirely from the data, without potentially misleading assumptions about its functional form. This data-driven model learning step of GPAL, illustrated in the bottom panel of Fig. 1, sets it apart from ADO; thus GPAL may be viewed as a "model-free" version of ADO. The virtually unlimited flexibility of GPAL allows it to capture a much wider range of data patterns than ADO, giving it higher sensitivity to individual differences.
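As a rough illustration of this loop (fit a GP to the responses collected so far, then probe the design where the model is most uncertain), below is a minimal sketch using scikit-learn's GaussianProcessRegressor. The RBF kernel, the uncertainty-sampling selection rule (maximum predictive standard deviation), and the simulated participant in run_trial are illustrative assumptions; the paper's actual kernel and utility function may differ.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Candidate designs: delays (in days) at which to probe a participant's
# normalized indifference point. The grid is an illustrative assumption.
candidates = np.linspace(1, 365, 200).reshape(-1, 1)

# GP prior over discounting functions; kernel and noise level are
# illustrative choices, not the paper's actual settings.
gp = GaussianProcessRegressor(
    kernel=RBF(length_scale=30.0) + WhiteKernel(noise_level=0.05),
    normalize_y=True,
)

def run_trial(delay):
    """Stand-in for the real experiment: a simulated participant whose true
    discounting curve happens to be hyperbolic (unknown in practice)."""
    return 1.0 / (1.0 + 0.05 * delay) + np.random.normal(0.0, 0.05)

X, y = [], []  # designs tested so far and the observed responses
rng = np.random.default_rng(0)

for trial in range(30):
    if X:
        gp.fit(np.array(X), np.array(y))     # refine the model after each datum
        _, std = gp.predict(candidates, return_std=True)
        next_x = candidates[np.argmax(std)]  # probe where the GP is least certain
    else:
        next_x = candidates[rng.integers(len(candidates))]  # random first trial
    X.append(next_x)
    y.append(run_trial(next_x[0]))
```

Uncertainty sampling concentrates observations where the current fit is least constrained, so the approximation is refined where it matters most without ever committing to a parametric functional form.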