Applying Occam's Razor in Modeling Cognition: a Bayesian Approach

Psychonomic Bulletin & Review, 1997, 4(1), 79-95

IN JAE MYUNG and MARK A. PITT
Ohio State University, Columbus, Ohio

In mathematical modeling of cognition, it is important to have well-justified criteria for choosing among differing explanations (i.e., models) of observed data. This paper introduces a Bayesian model selection approach that formalizes Occam's razor, choosing the simplest model that describes the data well. The choice of a model is carried out by taking into account not only the traditional model selection criteria (i.e., a model's fit to the data and the number of parameters) but also the extension of the parameter space, and, most importantly, the functional form of the model (i.e., the way in which the parameters are combined in the model's equation). An advantage of the approach is that it can be applied to the comparison of non-nested models as well as nested ones. Application examples are presented and implications of the results for evaluating models of cognition are discussed.

A portion of this work was presented at the 27th annual meeting of the Society for Mathematical Psychology held at the University of California, Irvine, in August 1995. Many people provided very useful feedback on earlier versions of this paper. They include Greg Ashby, Michael Browne, Jerry Busemeyer, Dan Friedman, Lester Krueger, Duncan Luce, Robert MacCallum, Dominic Massaro, Richard Schweickert, James Townsend, Michael Wenger, and Patricia van Zandt. Greg Ashby and Lester Krueger were especially helpful in sharpening our thinking on model complexity. This research was supported in part by Ohio Supercomputer Center Grant PAS887-1. Correspondence should be addressed to I. J. Myung, Department of Psychology, Ohio State University, 1885 Neil Avenue Mall, Columbus, OH 43210-1222 (e-mail: [email protected]). Copyright 1997 Psychonomic Society, Inc.

A goal of research in psychology, as in other behavioral sciences, is to infer the underlying process that generated observed data. The use of sophisticated mathematical models to describe these processes has grown considerably, especially in cognitive psychology (see, e.g., J. R. Anderson & Sheu, 1995; N. H. Anderson, 1981; Ashby & Townsend, 1986; Busemeyer & Townsend, 1993; Gillund & Shiffrin, 1984; Green & Swets, 1966; Hintzman, 1986; Kruschke, 1992; Massaro & Friedman, 1990; Medin & Schaffer, 1978; Murdock, 1982; Nosofsky, 1986; Oden & Massaro, 1978; Reed, 1972; van Zandt & Ratcliff, 1995). Yet, the development of equally sophisticated and well-justified methods for evaluating the adequacy of the models themselves has lagged behind. Jacobs and Grainger (1994) recently summarized a number of criteria for choosing among models: (1) generality (does the model generalize well across different experimental settings?); (2) explanatory adequacy (are the assumptions of the model plausible and compatible with established findings?); (3) descriptive adequacy (does the model fit the pattern of data well?); and (4) complexity (is the formulation [e.g., the number of parameters] of the model simple?). Among these, descriptive adequacy and complexity have been used most frequently, probably because they are easier to quantify than the other two. Together they embody the principle of Occam's razor, which states that "entities should not be multiplied beyond necessity" (William of Occam, ca. 1290-1349). The goal of model selection is to choose the simplest (i.e., least complex) model that describes the data well (i.e., descriptive adequacy).

In this paper, we introduce a new Bayesian method of formalizing Occam's razor in model selection. It goes beyond current selection methods by taking into account dimensions of complexity that are not captured by its predecessors. We begin with a tutorial on model selection methods that are currently in use. Next, fundamentals of the Bayesian model selection approach are described, and its desirable properties are discussed. The utility of the Bayesian approach is then demonstrated using concrete examples with simulated data. Finally, the merits and shortcomings of the Bayesian approach are discussed and contrasted with traditional approaches.

MODEL SELECTION CRITERIA

Descriptive Adequacy

The goal of mathematical modeling in cognitive psychology is straightforward: Given observed data, identify the underlying processes that generated the data. Because a model is defined as a set of assumptions about underlying processes, the goal of the researcher is to determine the viability of the model. There are, however, at least two obstacles to such an endeavor. First, given the nearly infinite number of distinct models that can be defined by combining different assumptions, the true model might not be one of a particular set of models that is being tested. Second, random noise in data can obscure model identification. Consequently, a realistic goal of mathematical modeling is to choose the model that represents the closest approximation to the "true" model.

How can the closest approximation be identified when the true model has yet to be discovered? Consider the situation in which the true model is included in the set of models being tested and, further, data are noise free. In this ideal situation, the true model must fit the data perfectly (e.g., as measured using a metric such as sum of squared errors). Note that this is a necessary but not sufficient condition, for there could be more than one model that fits the data perfectly. By extending this logic to less ideal situations (e.g., noisy data), the following model selection rule is obtained: Choose the model that provides the best fit to the data. Accordingly, the foremost criterion of model selection, descriptive adequacy, is born. Examples of descriptive adequacy measures that are in use include the percent variance accounted for by the model (i.e., coefficient of determination), the sum of squared errors (SSE) between observed and predicted outcomes, and the maximum likelihood, in which the probability of obtaining the observed data is maximized with respect to the model's possible parameter values (see Bickel & Doksum, 1977).

For a model to be considered true, it must satisfy the minimal condition of sufficiency in fitting data well. A failure to do so invalidates the model. An illustrative example is shown in Figure 1. Each of the four solid lines in the figure represents a model's best fit to the same data set (solid dots) using the least squares estimation method. Model 1 is a two-parameter linear model. As can be seen, it fits the data poorly, with only 79.5% of the variance accounted for. Systematic deviations from the line are evident at the endpoints and in the middle of the range. Clearly, Model 1 fails the test of descriptive adequacy and can be dropped from further consideration. In contrast, Model 2, a three-parameter exponential model, fits the data fairly well, accounting for 96% of the variance with no systematic deviation from the fit. Between the first two models, Model 2 would be chosen as the preferred description of the data. Models 3 and 4 are discussed in the next section.

[Figure 1 panels, each plotting y against x: Model 1 (n = 2, 79.5%); Model 2 (n = 3, 96.0%); Model 3 (n = 7, 98.5%); Model 4 (n = 20, 100%).]

Figure 1. The effect of the number of parameters in a model on the model's ability to fit data. A single data set (dots) was fitted to four models (lines) differing in the number of parameters (n). Percentage of variance accounted for by each model is shown in parentheses.

Model Complexity

It is important to note that descriptive adequacy is a heuristic. Selection of the best fitting model may be useful in identifying the true model or closest approximation, but the rule's accuracy is not guaranteed. This is because model fit can be improved by increasing model complexity. Complexity refers to the flexibility inherent in a model that enables it to fit diverse patterns of data.¹ It can be understood by contrasting the data-fitting capabilities of simple and complex models. A simple model is one that assumes that a specific pattern will be found in the data. If this pattern occurs, the model will fit the data well. Simple models make clear and falsifiable predictions precisely because a specific pattern is assumed to be present. In terms of actually testing the model, what this means is that the model's fit will be good over a sizable range of parameter values. A complex model, on the other hand, is more flexible than a simple model, providing good fits to a wide range of data patterns. To do so, however, the complex model's parameters must be finely tuned. This is because as the model's parameters change, even slightly, the postulated data pattern also changes.

There are at least three dimensions of a model that contribute to its complexity, thereby significantly affecting model fit: the number of parameters, the model's functional form, and the extension of the parameter space. In the following subsections, we discuss the implications of each of these for model selection.

Number of parameters. In general, a model with many parameters fits data better than a model with few parameters, even if the latter generated the data (Collyer, 1985). The effect of excessive parameters on model fit is illustrated in the bottom panels in Figure 1. Model 3 was created from Model 2 by introducing an additional cyclic component with four new parameters. Its fit is improved over Model 2, but only by a meager 2.5%.

Steiger and colleagues (Steiger, 1990; Steiger & Lind, 1980; see also Browne & Cudeck, 1992; Cudeck & Henly, 1991) introduced yet another criterion, called the root mean square error of approximation (RMSEA), defined as

\[ \mathrm{RMSEA} = \sqrt{\frac{F_0}{N - n}} \tag{3} \]

In this equation, the function $F_0$ is a measure of the lack
