
Evaluating Cognitive Models and Architectures

Marc Halbrügge
Human Factors Institute, Universität der Bundeswehr München
[email protected]

Abstract

Cognitive modeling is a promising research area combining the fields of psychology and artificial intelligence. Cognitive models have been successfully applied to different topics like theory generation in psychology, usability testing in human-computer interface research, and cognitive tutoring in algebra teaching.

This paper focuses on two major problems concerning the evaluation of cognitive models and architectures. First, it is shown that current approaches to quantifying model complexity are not sufficient, and second, the claim is reinforced that the flexibility of cognitive architectures leads to the problem that they cannot be falsified and are therefore too weak to be considered theories.

Cognitive architectures

From a psychologist's point of view, the main goal of cognitive modeling is to simulate human cognition. A cognitive model is seen as an instantiation of a theory in a computational form. This can then be used to test a certain theory empirically.

Cognitive architectures like SOAR (Laird, Newell, & Rosenbloom 1987), EPIC (Kieras & Meyer 1997), and ACT-R (Anderson et al. 2004) are useful frameworks for the creation of cognitive models. In addition, they are often regarded as useful steps towards a unified theory of cognition (Newell 1990).

When a computational model is used to evaluate a theory, the main goal is (generalizable) concordance with human data. But this is not a sufficient proof of the theory. The following problems are often stated in the literature:

• Compliance with the theory. The underlying theory and its implementation (in a cognitive architecture) may diverge.

• Irrelevant specification (Reitman 1965). Most models need to make additional assumptions. It is very hard to tell which parts of the model are responsible for its overall performance and which parts may be obsolete. Newell (1990) demanded to "listen to the architecture" when creating new models in order to get around this problem. While this is a recommendable request, it is hard to verify that the modeler has complied with it.

• Underlying theories. A special case of the irrelevant specification problem can occur when the architecture incorporates knowledge from previous research, for example Fitts' law (Fitts 1954) about the time needed to perform a rapid movement of the forearm (its classic form is given below). These underlying empirical laws may provide a big part of the fit of the cognitive model in question.

• Interpretability. Most of the time, one needs to look at the code of a model in order to understand what it really does. As many psychologists do not hold a degree in computer science, this hinders them from participating in the cognitive modeling research field.
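For reference, the classic formulation of Fitts' law expresses the mean movement time MT as a function of the movement amplitude A and the target width W, with a and b as empirically fitted constants:

    MT = a + b \cdot \log_2(2A / W)

A model built on an architecture that already incorporates such a regularity obtains the corresponding part of its fit "for free", independently of the theoretical assumptions the modeler actually wants to test.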
The usage of a cognitive architecture – as opposed to modeling everything from scratch – does not solve these problems completely, but weakens them noticeably. Cognitive architectures are subject to social control by the research community. This way, theory compliance, interpretability, and rejection of irrelevant parts are enforced at least on the level of the architecture.

An informative evaluation of cognitive models has been done by Gluck & Pew (2005). They applied several cognitive architectures to the same task and compared the resulting models not only with regard to goodness-of-fit but also with regard to modeling methodology, features and implied assumptions of the particular architecture, and the role of parameter tuning during the modeling process.

This approach of Gluck & Pew is well suited for addressing the problems of model evaluation besides goodness-of-fit, and I therefore recommend it as standard practice.

Evaluation methodology for cognitive models

I stated above that, from a psychologist's point of view, concordance with human data is the main goal of cognitive modeling. This concordance is typically assessed by computing goodness-of-fit indices. This approach has been criticized sharply during the last years (Roberts & Pashler 2000), initiating a discussion that concluded with the joint statement that a good fit is necessary, but not sufficient for the validity of a model (Rodgers & Rowe 2002; Roberts & Pashler 2002).
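To make the criticized practice concrete, the following is a minimal sketch of such a goodness-of-fit computation. The choice of indices (R² and RMSE) and the data values are illustrative assumptions, not taken from any of the cited studies:

import numpy as np

def goodness_of_fit(human, model):
    # Two common fit indices: R^2 (shared variance) and RMSE (mean deviation)
    human = np.asarray(human, dtype=float)
    model = np.asarray(model, dtype=float)
    ss_res = np.sum((human - model) ** 2)           # residual sum of squares
    ss_tot = np.sum((human - human.mean()) ** 2)    # total sum of squares
    r_squared = 1.0 - ss_res / ss_tot
    rmse = np.sqrt(np.mean((human - model) ** 2))
    return r_squared, rmse

# Hypothetical mean reaction times (ms) per experimental condition
human_rt = [512, 587, 655, 700, 731]
model_rt = [505, 600, 640, 690, 745]

r2, rmse = goodness_of_fit(human_rt, model_rt)
print(f"R^2 = {r2:.3f}, RMSE = {rmse:.1f} ms")

The point of the Roberts & Pashler critique is precisely that numbers like these can look impressive while saying little about the validity of the model that produced them.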

In order to assess the fit of a model or to compare fits between models, it is important to be aware of the complexity of the models. Models with many degrees of freedom usually achieve better fits, but bear the risk of overfitting, the Siamese twin of the irrelevant specification problem described in the previous section. In addition, the principle of parsimony demands that a less complex model be preferred to a more complex one, given comparable fit. Therefore, measurements of complexity are needed.

Pitt, Myung, & Zhang (2002) have developed statistics for model selection that take into account both goodness-of-fit and complexity. These statistics aim at closed-form mathematical models, where fitting means to choose the member of a class of functions that represents some empirical data best. Examples for this type of model range from simple linear regression to complex neural networks.

In order to expand the notion of complexity to cognitive models, Baker, Corbett, & Koedinger (2003) used the number of free weights of the subsymbolic part of an ACT-R model as a measure of the degrees of freedom of the model. As will be shown in the following, this is not sufficient. While some parts of cognitive models can be described by mathematical formulas (e.g. retrieval latencies, spreading activation), they also include an algorithmic part. Additional degrees of freedom can emerge from the algorithmic structure of a model. Therefore, the complexity of a model cannot be adequately quantified by the number of numerical weights in its implementation.

Example: Memory retrieval in ACT-R

This claim shall be illustrated by an example. Imagine two simple cognitive models using ACT-R, both of them retrieving a series of chunks from declarative memory. Which chunk is to be retrieved next depends on the production that currently fires. The models stop when the memory retrieval fails, which depends mostly on the amount of noise added to chunk activation.

Both models consist of four productions (resembling four state transitions), but differ in their number of chunks in declarative memory. The structures of the models are displayed in figure 1.

Figure 1: Structure of the two models. Circles represent the internal states, arrows productions. Initial and final states and corresponding productions are not shown.

Model 1 consists of two states a and b, represented by chunks in the retrieval buffer. The four productions (arrows in figure 1) allow any possible state transition (a→a, a→b, b→b, b→a).

Model 2 consists of four states represented by chunks a, b, c and d. The productions allow state transitions from every state to its single successor (a→b, b→c, c→d, d→a). Notice that although the second model looks deterministic, the retrieval failure that breaks the cycle of firings may appear at any moment. All four activation weights are equally important for the outcome of the model.

Production utility learning is not used in the models. The only influential weights are the activations of the chunks. Now the question of model complexity shall be addressed: According to the weight counting approach (Baker, Corbett, & Koedinger 2003), the second model is more complex than the first one. It uses more chunks, each one associated with a numerical activation. But because of the arrangement of states and productions (see figure 1), this model can only create outputs of the form abcdabcda and so on. The first model, in contrast, can output any sequence of a's and b's. Therefore, model 1 should be considered the more complex one.
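The difference in generative flexibility can be made explicit with a small abstraction of the two models. The sketch below is not the original ACT-R code (which is referenced in the footnote); it merely represents each model by its transition table and enumerates the state sequences each one can emit:

# Possible state transitions, abstracted from the ACT-R productions
MODEL_1 = {"a": ["a", "b"], "b": ["a", "b"]}                # any transition allowed
MODEL_2 = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["a"]}  # fixed cycle

def possible_sequences(transitions, start, length):
    # Enumerate all state sequences of a given length the model can produce
    sequences = {start}
    for _ in range(length - 1):
        sequences = {seq + nxt
                     for seq in sequences
                     for nxt in transitions[seq[-1]]}
    return sorted(sequences)

# Model 2 can produce exactly one sequence per length (abcd, abcda, ...),
# while model 1 can produce 2^(length-1) different sequences.
print(possible_sequences(MODEL_2, "a", 5))       # ['abcda']
print(len(possible_sequences(MODEL_1, "a", 5)))  # 16

Counting weights (four activations vs. two) points the wrong way here; the branching structure, not the number of numerical parameters, is what makes model 1 the more flexible one.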
Even if model run time is used as the dependent variable, you can spot the difference in complexity. Imagine that two of the four possible transitions in the models take a lot of time compared to the other two (e.g. 1000 ms vs. 100 ms). While the overall distribution of the run time will stay comparable for the two models, the second model will only be able to produce a limited set of values (e.g. 100, 1100, 1200, 2200, 2300).

In order to test this, I created the two models,¹ ran each one 10000 times and recorded the simulated run time. The results are displayed in figure 2. The greater restriction of model 2 is apparent.

A close inspection of the models reveals that model 2 possesses just one degree of freedom – the total run time: Depending on the noisy chunk activation, the model may run shorter or longer. But when the number of states visited during a run is known, the sequence of states is known as well. On the other hand, model 1 has more degrees of freedom because it can take multiple paths of production firings to reach a specific state at a given position in the sequence of states.

Taken together, the example provides two important findings:

• Even for two simple models as the ones described above (about 100 lines of code each), it is very hard to assess the model complexity when only the number of chunks and productions is known. In order to be able to judge a model, one has to look into the code.

• Knowledge of numerical parameters is not sufficient for the estimation of model complexity, as more degrees of freedom can emerge from the algorithmic structure.

The strategy of "eliminating parameter estimation" (Anderson et al. 2004, p. 1057) in order to deal with the criticism of model fitting (Roberts & Pashler 2000) falls short in this case.

¹ The source code can be downloaded from http://www.unibw.de/lrt11/halbruegge/ACT-R

Figure 2: Histograms of run times for model 1 (two states) and model 2 (four states), N = 10000. Both panels plot frequency against time (s).
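The simulation behind figure 2 can be approximated without ACT-R. The following Monte Carlo sketch reproduces the idea under stated assumptions: each retrieval either succeeds (continuing the state sequence) or ends the run with probability p_fail, and two of the four transitions are assumed to cost 1000 ms while the other two cost 100 ms. The failure probability and the assignment of slow transitions are illustrative choices, not the parameters of the original models:

import random
from collections import Counter

# Transition structure of the two models (same abstraction as above)
MODEL_1 = {"a": ["a", "b"], "b": ["a", "b"]}
MODEL_2 = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["a"]}

# Assumed cost of each transition in ms: two slow ones, two fast ones
COST_1 = {("a", "a"): 1000, ("a", "b"): 100, ("b", "b"): 1000, ("b", "a"): 100}
COST_2 = {("a", "b"): 100, ("b", "c"): 1000, ("c", "d"): 100, ("d", "a"): 1000}

def run_once(transitions, costs, p_fail=0.2, start="a"):
    # Walk the state graph until a (simulated) retrieval failure occurs
    state, elapsed = start, 0
    while random.random() > p_fail:          # retrieval succeeded, keep going
        nxt = random.choice(transitions[state])
        elapsed += costs[(state, nxt)]
        state = nxt
    return elapsed

random.seed(1)
times_1 = [run_once(MODEL_1, COST_1) for _ in range(10000)]
times_2 = [run_once(MODEL_2, COST_2) for _ in range(10000)]

# Model 2 only hits a sparse grid of run times (0, 100, 1100, 1200, 2200, ...),
# while model 1 reaches many more distinct values over the same number of runs.
print("distinct run times, model 1:", len(Counter(times_1)))
print("distinct run times, model 2:", len(Counter(times_2)))

Histograms of times_1 and times_2 show the same qualitative pattern as figure 2: comparable overall distributions, but a much more restricted set of attainable values for model 2.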

As long as there are no better methods available for the measurement of model complexity, I would suggest that scientific publications in the cognitive modeling domain should always be accompanied by a source code release.

Validating a cognitive architecture

Anderson & Lebiere (2003) proposed the following list of 12 criteria for the evaluation of cognitive architectures: flexible behavior, real-time performance, adaptive behavior, vast knowledge base, dynamic behavior, knowledge integration, natural language, learning, development, evolution, and realization. The publication of such a list is partly inspired by Newell's request to "play a cooperative game of anything you can do, I can do better" (Newell 1990, page 507).

Although Anderson & Lebiere make some important points, they miss the problem of falsifiability. Before we ask the question "How do we validate a cognitive architecture?", we should answer the question "Can we validate a cognitive architecture at all?" or, following Popper (1989), "Can we falsify a cognitive architecture?"

Can a cognitive architecture be falsified?

This section concentrates on ACT-R, but should be applicable to other cognitive architectures as well.

Anderson & Lebiere state that ACT-R is not a "general computational system that can be programmed to do anything" (page 597). Although I do agree that the computational power of ACT-R can be limited, the cognitive architecture as such is Turing complete from a more formal viewpoint.

This claim was first made by Wexler (1978) and has not been ruled out since then. The logic is simple: Cognitive architectures (at least ACT-R) are complete Turing machines. That means that they can compute anything that is computable by this sort of machine. Therefore, a cognitive architecture cannot be falsified. It is always true, like the sentence "Tomorrow it will rain or it won't rain".

What can be falsified are additional assumptions, e.g. the ubiquitous 50 ms granularity of the production cycle or mathematical models of spreading activation. Nevertheless, the architecture itself cannot be validated. As Tadepalli (2003) wrote, in concluding a similar remark: "This may [...] mean that we have to put aside the problem of evaluating cognitive architectures, for now or forever".

Conclusions

At the beginning of this paper it was stated that cognitive architectures are often viewed as efforts to create a unified theory of cognition. I have tried to show that this view is not sustainable. Cognitive architectures should rather be regarded as "theory languages", similar to programming languages.

This does not mean that cognitive architectures are useless. Cognitive architectures provide modeling frameworks that ease model creation, the communication between modelers, and the combination of models. But when it comes to validation, the focus should not lie on architectures but on models. These can be tested empirically.

In order to evaluate and compare cognitive models, we need a way to assess their complexity. As long as we lack a formal methodology to do this, we should release the (well commented) code of our models instead. This is the only way to judge the actual algorithm, and thereby the plausibility – or face validity – of a cognitive model.

Acknowledgments

This work was supported by the German Research Foundation (DFG) within the excellence cluster "Cognitive Technical Systems"; it is part of the junior research group "Cognitive Modeling in Ergonomics".

References

Anderson, J. R., and Lebiere, C. 2003. The Newell test for a theory of cognition. Behavioral and Brain Sciences 26(5):587–601.

Anderson, J. R.; Bothell, D.; Byrne, M. D.; Douglass, S.; Lebiere, C.; and Qin, Y. 2004. An integrated theory of the mind. Psychological Review 111(4):1036–1060.

Baker, R. S.; Corbett, A. T.; and Koedinger, K. R. 2003. Statistical techniques for comparing ACT-R models of cognitive performance. In Proceedings of the 10th Annual ACT-R Workshop.

Fitts, P. M. 1954. The information capacity of the human motor system in controlling the amplitude of movement. Journal of Experimental Psychology 47(6):381–391.

Gluck, K. A., and Pew, R. W., eds. 2005. Modeling Human Behavior with Integrated Cognitive Architectures. Mahwah, NJ: Lawrence Erlbaum Associates.

Kieras, D. E., and Meyer, D. E. 1997. An overview of the EPIC architecture for cognition and performance with application to human-computer interaction. Human-Computer Interaction 12:391–438.

Laird, J. E.; Newell, A.; and Rosenbloom, P. S. 1987. Soar: An architecture for general intelligence. Artificial Intelligence 33(1):1–64.

Newell, A. 1990. Unified Theories of Cognition – The William James Lectures, 1987. Cambridge, MA: Harvard University Press.

Pitt, M. A.; Myung, I. J.; and Zhang, S. 2002. Toward a method of selection among computational models of cognition. Psychological Review 109(3):472–491.

Popper, K. R. 1989. Logik der Forschung. Tübingen: Mohr, 9th edition.

Reitman, W. R. 1965. Cognition and Thought. New York: Wiley.

Roberts, S., and Pashler, H. 2000. How persuasive is a good fit? A comment on theory testing. Psychological Review 107(2):358–367.

Roberts, S., and Pashler, H. 2002. Reply to Rodgers and Rowe (2002). Psychological Review 109(3):605–607.

Rodgers, J. L., and Rowe, D. C. 2002. Theory development should begin (but not end) with good empirical fits: A comment on Roberts and Pashler (2000). Psychological Review 109(3):599–604.

Tadepalli, P. 2003. Cognitive architectures have limited explanatory power. Behavioral and Brain Sciences 26(5):622–623. Commentary on Anderson & Lebiere, 2003.

Wexler, K. 1978. A review of John R. Anderson's Language, Memory, and Thought. Cognition 6(4):327–351.
