Competence-Preserving Retention of Learned Knowledge in Soar's Working and Procedural Memories

Nate Derbinsky ([email protected]) and John E. Laird ([email protected])
University of Michigan, 2260 Hayward Street, Ann Arbor, MI 48109-2121 USA

Abstract

Effective management of learned knowledge is a challenge when modeling human-level behavior within complex, temporally extended tasks. This paper evaluates one approach to this problem: forgetting knowledge that is not in active use (as determined by base-level activation) and that can likely be reconstructed if it becomes relevant. We apply this model for selective retention of learned knowledge to the working and procedural memories of Soar. When evaluated in simulated, robotic exploration and a competitive, multi-player game, these policies improve model reactivity and scaling while maintaining reasoning competence.

Keywords: large-scale cognitive modeling; working memory; procedural memory; cognitive architecture; Soar

Introduction

Typical cognitive models persist for short periods of time (seconds to a few minutes) and have modest learning requirements. For these models, current cognitive architectures, such as Soar (Laird, 2012) and ACT-R (Anderson et al., 2004), executing on commodity computer systems, are sufficient. However, prior work (Kennedy & Trafton, 2007) has shown that cognitive models of complex, protracted tasks can accumulate large amounts of knowledge, and that the computational performance of existing architectures degrades as a result.

This issue, where more knowledge can harm problem-solving performance, has been dubbed the utility problem, and has been studied in many contexts, such as explanation-based learning (Minton, 1990; Tambe et al., 1990), case-based reasoning (Smyth & Keane, 1995; Smyth & Cunningham, 1996), and language learning (Daelemans et al., 1999).

Markovitch and Scott (1988) have characterized different strategies for dealing with the utility problem in terms of information filters applied at different stages in the problem-solving process. One common strategy that is relevant to cognitive modeling is selective retention, or forgetting, of learned knowledge. The benefit of this approach, as opposed to selective utilization, is that all available knowledge is brought to bear on problem solving, a property that is crucial for model competence in complex tasks. However, it can be challenging to devise forgetting policies that work well across a variety of problem domains, effectively balancing the task performance of cognitive models with reductions in the retrieval time and storage requirements of learned knowledge.

In the context of this challenge, we present two tasks where effective behavior requires that the model accumulate large amounts of information from the environment, and where over time this learned knowledge overwhelms reasonable computational limits. In response, we present and evaluate novel policies for selective retention of learned knowledge in the working and procedural memories of Soar. These policies investigate a common hypothesis: it is rational for the architecture to forget a unit of knowledge when there is a high degree of certainty that it is not of use, as calculated by base-level activation (Anderson et al., 2004), and that it can be reconstructed in the future if it becomes relevant. We demonstrate that these task-independent policies improve model reactivity and scaling, while maintaining problem-solving competence.

Related Work

Previous cognitive-modeling research has investigated forgetting in order to account for human behavior and experimental data. As a prominent example, memory decay has long been a core commitment of the ACT-R theory (Anderson et al., 2004), as it has been shown to account for a class of memory retrieval errors (Anderson et al., 1996). Similarly, research in Soar has investigated task-performance effects of forgetting short-term (Chong, 2003) and procedural (Chong, 2004) knowledge. By contrast, the motivation for and outcome of this work is to investigate the degree to which selective retention can support long-term, real-time modeling in complex tasks.

Prior work shows the potential for cognitive benefits of memory decay, such as in task-switching (Altmann & Gray, 2002) and heuristic inference (Schooler & Hertwig, 2005). In this paper, we focus on improved reactivity and scaling.

We extend prior investigations of long-term symbolic learning in Soar (Kennedy & Trafton, 2007), where the source of learning was primarily internal problem solving. In this paper, the evaluation domains accumulate information from interaction with an external environment.
The Soar Cognitive Architecture

Soar is a cognitive architecture that has been used for developing intelligent agents and modeling human cognition. Historically, one of Soar's main strengths has been its ability to efficiently represent and bring to bear large amounts of symbolic knowledge to solve diverse problems using a variety of methods (Laird, 2012).

Figure 1: The Soar cognitive architecture.

Figure 1 shows the structure of Soar. At the center is a symbolic working memory that represents the agent's current state. It is here that perception, goals, retrievals from long-term memory, external action directives, and structures from intermediate reasoning are jointly represented as a connected, directed graph. The primitive representational unit of knowledge in working memory is a symbolic triple (identifier, attribute, value), termed a working-memory element, or WME. The first symbol of a WME (identifier) must be an existing node in the graph, whereas the second (attribute) and third (value) symbols may be either terminal constants or non-terminal graph nodes. Multiple WMEs that share the same identifier are termed an "object," and the set of individual WMEs sharing that identifier are termed "augmentations" of that object.
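To make this representation concrete, the following sketch shows one way to model WMEs and objects in Python; the class and method names are our own illustrations, not Soar's implementation or API.

    from collections import defaultdict

    class WME:
        """A working-memory element: a symbolic (identifier, attribute, value) triple."""
        def __init__(self, identifier, attribute, value):
            self.identifier = identifier  # must name an existing graph node
            self.attribute = attribute    # terminal constant or non-terminal node
            self.value = value            # terminal constant or non-terminal node

    class WorkingMemory:
        """Groups WMEs by identifier: the WMEs sharing an identifier form an
        'object', and each such WME is an 'augmentation' of that object."""
        def __init__(self):
            self._by_identifier = defaultdict(list)

        def add(self, wme):
            self._by_identifier[wme.identifier].append(wme)

        def augmentations(self, identifier):
            return self._by_identifier[identifier]

    # Example: a room object r7 with two augmentations.
    wm = WorkingMemory()
    wm.add(WME("r7", "type", "room"))
    wm.add(WME("r7", "connects-to", "r8"))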
Procedural memory stores the agent's knowledge of when and how to perform actions, both internal, such as querying the long-term declarative memories, and external, such as controlling robotic actuators. Knowledge in this memory is represented as if-then rules. The conditions of rules test patterns in working memory, and the actions of rules add and/or remove working-memory elements. Soar makes use of the Rete algorithm for efficient rule matching (Forgy, 1982), and retrieval time scales well to large stores of procedural knowledge (Doorenbos, 1995). However, the Rete algorithm is known to scale linearly with the number of elements in working memory, a computational issue that motivates maintaining a relatively small working memory.

Soar learns procedural knowledge via its chunking (Laird et al., 1986) and reinforcement learning (RL; Nason & Laird, 2005) mechanisms. Chunking creates new productions: it converts deliberate subgoal processing into reactive rules by compiling over production-firing traces, a form of explanation-based learning (EBL). If subgoal processing does not interact with the environment, the chunked rule is redundant with existing knowledge and serves to improve performance by reducing deliberate processing. However, memory usage in Soar scales linearly with the number of rules, typically at a rate of 1-5 KB/rule, which motivates forgetting of under-utilized productions.

Reinforcement learning incrementally tunes existing production actions: it updates the expectation of action utility, with respect to a subset of state (represented in rule conditions) and an environmental or intrinsic reward signal. A production that can be updated by the RL mechanism (termed an RL rule) must satisfy a few simple criteria related to its actions, and is thus distinguishable from other rules. This distinction is relevant to forgetting productions. When an RL rule that was learned via chunking is updated, that rule is no longer redundant with the knowledge that led to its creation, as it now incorporates information from environmental interaction that was not captured in the original subgoal processing.

Soar incorporates two long-term declarative memories, semantic and episodic (Derbinsky & Laird, 2010). Semantic memory stores working-memory objects, independent of overall working-memory connectivity (Derbinsky, Laird, & Smith, 2010), and episodic memory incrementally encodes and temporally indexes snapshots of working memory, resulting in an autobiographical history of agent experience (Derbinsky & Laird, 2009). Agents retrieve knowledge from one of these memory systems by constructing a symbolic cue in working memory; the intended memory system then interprets the cue, searches its store for the best matching memory, and, if it finds a match, reconstructs the associated knowledge in working memory. For episodic memory, the time to reconstruct knowledge depends on the size of working memory at the time of encoding, another motivation for a concise agent state.

Agent reasoning in Soar consists of a sequence of decisions, where the aim of each decision is to select and apply an operator in service of the agent's goal(s). The primitive decision cycle consists of the following phases: encode perceptual input; fire rules to elaborate the agent's state, as well as propose and evaluate operators; select an operator; fire rules that apply the operator; and then process output directives and retrievals from long-term memory (a schematic sketch of this loop appears at the end of this section). Unlike ACT-R, multiple rules may fire in parallel during a single phase. The time to execute the decision cycle, which primarily depends on the speed with which the architecture can match rules and retrieve knowledge from episodic and semantic memories, determines agent reactivity. We have found that 50 msec. is an acceptable upper bound on this response time across numerous domains, including robotics, video games, and human-computer interaction (HCI) tasks.

There are two types of persistence for working-memory elements added as the result of rule firing. Rules that fire to apply a selected operator create operator-supported structures. These WMEs will persist in working memory until deliberately removed. In contrast, rules that do not test a selected operator create instantiation-supported structures, which persist only as long as the rules that created them match. This distinction is relevant to forgetting WMEs.

As evident in Figure 1, Soar has additional memories and processing modules; however, they are not pertinent to this paper and are not discussed further.
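As a compact summary of the decision cycle described above, the following Python outline steps through the phases in order. It is illustrative only: the method names are placeholders rather than the Soar API, and each rule-firing phase actually runs all matching rules in parallel. The forgetting sweeps introduced in the next sections run once per cycle, at the end of this loop.

    def decision_cycle(agent):
        """One pass through Soar's primitive decision cycle (illustrative only)."""
        agent.encode_perceptual_input()          # input
        agent.elaborate_propose_and_evaluate()   # parallel rule firing
        operator = agent.select_operator()       # decision
        agent.apply_operator(operator)           # parallel rule firing
        agent.process_output_and_retrievals()    # output directives, long-term memory retrievals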
Selective Retention in Working Memory

The core intuition of our working-memory retention policy is to remove the augmentations of objects that are not actively in use and that the model can later reconstruct from long-term semantic memory, if they become relevant. We characterize WME usage via the base-level activation model (BLA; Anderson et al., 2004), which estimates the future usefulness of a memory based upon its prior usage. The primary activation event for a working-memory element is the firing of a rule that tests or creates that WME. Also, when a rule first adds an element to working memory, the activation of the new WME is initialized to reflect the aggregate activation of the set of WMEs responsible for its creation. The base-level activation of a WME is computed as

A = \ln\left(\sum_{j=1}^{n} t_j^{-d}\right)

where n is the number of memory activations, t_j is the time since the jth activation, and d is a free decay parameter. For computational efficiency, history size is bounded: each working-memory element maintains a history of at most the c most recent activations, and the activation calculation is supplemented by an approximation of the more distant past (Petrov, 2006). This model of activation sources, events, and decay is task independent.
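A minimal Python sketch of this computation follows. The exact sum over the c most recent activations mirrors the equation above; the tail term for the unrecorded, more distant activations is our reading of the hybrid approximation described by Petrov (2006), so its exact form should be treated as an assumption rather than Soar's implementation.

    import math

    def base_level_activation(recent_times, n_total, lifetime, d=0.5):
        """Base-level activation with a bounded activation history.

        recent_times: times since each of the (at most c) most recent activations,
                      ordered newest to oldest; all values > 0.
        n_total:      total number of activations the element has ever received.
        lifetime:     time since the element was first created (> recent_times[-1]).
        d:            free decay-rate parameter.
        """
        exact = sum(t ** -d for t in recent_times)       # recorded activations
        tail = 0.0
        unrecorded = n_total - len(recent_times)
        if unrecorded > 0:
            t_k = recent_times[-1]                        # oldest recorded activation
            # Assumed tail correction (after Petrov, 2006): treat the unrecorded
            # activations as spread uniformly between t_k and the element's lifetime.
            tail = unrecorded * (lifetime ** (1 - d) - t_k ** (1 - d)) / ((1 - d) * (lifetime - t_k))
        return math.log(exact + tail)

    # Example: three recorded activations out of five, over a 100-cycle lifetime.
    a = base_level_activation([1.0, 4.0, 9.0], n_total=5, lifetime=100.0)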
At the end of each decision cycle, Soar removes from working memory each element that satisfies all of the following requirements, with respect to τ, a static, architectural threshold parameter:

R1. The WME was not encoded directly from perception.
R2. The WME is operator-supported.
R3. The activation level of the WME is less than τ.
R4. The WME augments an object, o, in semantic memory.
R5. The activation of all augmentations of o is less than τ.

We adopted requirements R1-R3 from Nuxoll, Laird, and James (2004), whereas R4 and R5 are novel. Requirement R1 distinguishes between the decay of representations of perception and any dynamics that may occur with actual sensors, such as refresh rate, fatigue, noise, or damage. Requirement R2 is a conceptual optimization: operator-supported WMEs are persistent, while instantiation-supported structures are direct entailments, so if we properly manage the former, the latter are handled automatically. This means that if we properly remove operator-supported WMEs, any instantiation-supported structures that depend on them will also be removed, and thus our mechanism only manages operator-supported structures. The concept of a fixed lower bound on activation, as defined by R3, was adopted from activation limits in ACT-R (Anderson et al., 1996), and dictates that working-memory elements will decay in a task-independent fashion as their use for reasoning becomes less recent and frequent.

Requirement R4 dictates that our mechanism only removes elements from working memory that can be reconstructed from semantic memory. From the perspective of cognitive modeling, this constraint on decay resembles a working memory that is, in part, an activated subset of long-term memory (Jonides et al., 2008). Functionally, requirement R4 serves to balance the degree of working-memory decay with support for sound reasoning. Knowledge in Soar's semantic memory is persistent, though it may change over time. Depending on the task and the model's knowledge-management strategies, it is possible that any removed knowledge may be recovered via deliberate reconstruction from semantic memory. Additionally, knowledge that is not in semantic memory can persist indefinitely to support model reasoning.

Requirement R5 supplements R4 by providing partial support for the closed-world assumption. R5 dictates that either all object augmentations are removed, or none. This policy leads to an object-oriented representation whereby procedural knowledge can distinguish between objects that have been cleared, and thus have no augmentations, and those that simply are not augmented with a particular feature or relation. R5 makes an explicit tradeoff, weighting model competence more heavily at the expense of the speed of working-memory decay. This requirement resembles the declarative module of ACT-R, where activation is associated with each chunk and not with individual slot values.
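The five requirements can be read as a single predicate evaluated for every candidate WME at the end of a decision cycle. The Python sketch below is illustrative only; the attribute and method names are our assumptions, not Soar's API.

    def should_remove_wme(wme, obj, tau):
        """True when a working-memory element satisfies removal requirements R1-R5."""
        return (not wme.from_perception                                   # R1
                and wme.operator_supported                                # R2
                and wme.activation < tau                                  # R3
                and obj.in_semantic_memory                                # R4
                and all(a.activation < tau for a in obj.augmentations))   # R5

    def end_of_cycle_cleanup(working_memory, tau):
        """Sweep applied once per decision cycle (illustrative)."""
        for obj in working_memory.objects():
            for wme in list(obj.augmentations):
                if should_remove_wme(wme, obj, tau):
                    working_memory.remove(wme)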
Empirical Evaluation

We extended an existing system in which Soar controls a simulated mobile robot (Laird, Derbinsky, & Voigt, 2011). Our evaluation uses a simulation instead of a real robot because of the practical difficulties in running numerous, long experiments in large physical spaces. However, the simulation is quite accurate, and the Soar rules (and architecture) used in the simulation are exactly the same as the rules used to control the real robot.

The robot's task is to visit every room on the third floor of the Computer Science and Engineering building at the University of Michigan. For this task, the robot visits over 100 rooms and takes about 1 hour of real time. During exploration, it incrementally builds an internal topographic map, which, when completed, requires over 10,000 WMEs to represent and store. In addition to storing information, the model reasons about and plans using the map in order to find efficient paths for moving to distant rooms it has sensed but not visited. The model uses episodic memory to remember objects and other task-relevant features during exploration.

In our experiments, we aggregate working-memory size and maximum decision time for each 10 seconds of elapsed time; all experiments were performed on an Intel i7 2.8GHz CPU running Soar v9.3.1. Because each experimental run takes 1 hour, we did not duplicate our experiments sufficiently to establish statistical significance, and the results we present are from individual experimental runs. However, we found qualitative consistency across our runs, such that the variance between runs is small as compared to the trends we focus on below.

We make use of the same model for all experiments, but modify small amounts of procedural knowledge and change architectural parameters, as described here. The baseline model (A0) maintains all declarative map information in both Soar's working and semantic memories. A slight modification to this baseline (A1) includes hand-coded rules to prune away rooms in working memory that are not required for immediate reasoning or planning. The experimental model (A2) makes use of our working-memory retention policy, and we explored different values of the base-level decay rate (c=10 and τ=-2 for all models).

Figure 2: Model working-memory size comparison.

Figure 2 compares working-memory size between conditions A0, A1, and A2 over the duration of the experiment. We note first the major difference in working-memory size between A0 and A1 after one hour, when the working memory of A1 contains more than 11,000 fewer elements, more than 90% less than A0. We also find that the greater the decay-rate parameter for A2, the smaller the working-memory size, where a value of 0.5 qualitatively tracks A1. This finding suggests that our policy, with an appropriate decay rate, keeps working-memory size comparable to that maintained by hand-coded rules.

Figure 3: Model maximum decision time comparison.

Figure 3 compares maximum decision-cycle time, in msec., between conditions A0, A1, and A2 as the simulation progresses. The dominant cost reflected by these data is the time to reconstruct prior episodes retrieved from episodic memory. We see a growing difference in time between A0 and A2 as working memory is more aggressively managed (i.e., greater decay rate), demonstrating that episodic reconstruction, which scales with the size of working memory at the time of episodic encoding, benefits from selective retention. We also find that with a decay rate of 0.5, our mechanism performs comparably to A1. We note that without sufficient working-memory management (A0; A2 with decay rate 0.3), episodic-memory retrievals are not tenable for a model that must reason with this amount of acquired information, as the maximum required processing time exceeds the reactivity threshold of 50 msec.

Discussion

It is possible to write rules that prune Soar's working memory; however, this task-specific knowledge is difficult to encode and learn, and it interrupts deliberate processing. In this work, we presented and evaluated a novel approach that utilizes a memory hierarchy to bound working-memory size while maintaining sound reasoning. This approach assumes that the amount of knowledge required for immediate reasoning is small relative to the overall amount of knowledge accumulated by the model. Under this assumption, as demonstrated in the robotic evaluation task, our policy scales even as learned knowledge grows large over long trials. We note that since Soar's semantic memory can change over time and is independent of working memory, our selective-retention policy does admit a class of reasoning error wherein the contents of semantic memory are changed so as to be inconsistent with decayed WMEs. However, this corruption requires deliberate reasoning in a relatively small time window and has not arisen in our models. While the model completed this task for all conditions reported here, at larger decay rates (≥0.6) the model thrashed because map information was not held in working memory long enough to complete deep look-ahead planning. This suggests additional research is needed on either adaptive decay-rate settings or planning approaches that are robust in the face of memory decay.
Selective Retention in Procedural Memory

The intuition of our procedural-memory retention policy is to remove productions that are not actively used and that the model can later reconstruct via deliberate subgoal reasoning, if they become relevant. We utilize the base-level activation model to summarize the history of rule firing.

At the end of each decision cycle, Soar removes from procedural memory each rule that satisfies all of the following requirements, with respect to parameter τ:

R1. The rule was learned via chunking.
R2. The rule is not actively firing.
R3. The activation level of the rule is less than τ.
R4. The rule has not been updated by RL.

We adopted R1-R3 from Chong (2004), whereas R4 is novel. Chong was modeling human skill decay and did not delete productions, so as not to lose each rule's activation history. Instead, decayed rules were prevented from firing, similar to below-utility-threshold rules in ACT-R.

R1 is a practical consideration to distinguish learned knowledge from "innate" rules developed by the modeler, which, if modified, would likely break the model. R2 recognizes that matched rules are in active use and thus should not be forgotten. R3 dictates that rules will decay in a task-independent fashion as their use for reasoning becomes less recent and frequent. We note that for fixed parameters (d and τ) and a single activation, the BLA model is equivalent to the use-gap heuristic of Kennedy and Trafton (2007). However, the time between sequential rule firings ignores firing frequency, which the BLA model incorporates.

Requirement R4 attempts to retain only those rules that the model cannot regenerate via chunking, a process that compiles existing knowledge applied in subgoal reasoning. Chunked rules that have been updated by RL encode expected-utility information, which is not captured by other learning mechanisms. Because this information is difficult, if not impossible, to reconstruct, these rules are retained.
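As with working memory, the rule-retention test can be summarized as a single predicate applied to each production at the end of a decision cycle. The Python sketch below is illustrative; the attribute names are assumptions, not Soar's API.

    def should_remove_rule(rule, tau):
        """True when a production satisfies removal requirements R1-R4."""
        return (rule.learned_via_chunking      # R1: never delete hand-written, "innate" rules
                and not rule.currently_firing  # R2: matched rules are in active use
                and rule.activation < tau      # R3: base-level activation below threshold
                and not rule.updated_by_rl)    # R4: RL-tuned utilities cannot be regenerated

Keeping R4 is what distinguishes this policy from firing-history-only approaches; dropping it recovers the behavior of condition B1 in the evaluation below.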

Empirical Evaluation

We extended an existing system (Laird, Derbinsky, & Tinkerhess, 2011) in which Soar plays Liar's Dice, a multi-player game of chance. The rules of the game are numerous and complex, yielding a task that has rampant uncertainty and a large state space (millions to billions of relevant states for games of 2-4 players). Prior work has shown that RL allows Soar models to significantly improve performance after playing a few thousand games. However, this involves learning large numbers of RL rules to represent the state space.

The model we use for all experiments learns two classes of rules: RL rules, which capture expected action utility, and symbolic game heuristics. Our experimental baseline (B0) does not include selective retention. The first experimental modification (B1) implements our selective-retention policy, but does not enforce requirement R4 and is thereby comparable to prior work (Kennedy & Trafton, 2007; Chong, 2004). The second modification (B2) fully implements our policy. We experiment with a range of representative decay rates, including 0.999, where rules not immediately updated by RL are deleted (c=10, τ=-2 for all).

We alternated 1,000 2-player games of training and then testing, each against a non-learning version of the model. After each testing session, we recorded maximum memory usage (Mac OS v10.7.3; dominated, in this task, by procedural memory), task performance (% of games won), and average decisions per task action. We do not report maximum decision time, as this was below 6 msec. for all conditions (Intel i7 2.8GHz CPU, Soar v9.3.1). We collected data for all conditions in at least three independent trials of 40,000 games. For conditions that used selective retention, we were able to gather more data in parallel, due to reduced memory consumption (six trials for d=0.35, seven for the remaining conditions).
Figure 4. Avg. memory usage ±1 std. dev. vs. games played.

Figure 4 presents average memory growth, in megabytes, as the model trains, where the error bars represent ±1 standard deviation. For all models, the memory growth of games 1-10K follows a power law (r²≥0.96), whereas for games 11-40K, growth is linear (r²≥0.99). These plots indicate that memory usage for the baseline (B0) and the slowly decaying model (B2, d=0.3) is much greater, and faster growing, than for models that decay more aggressively. They also show that there is a diminishing benefit from faster decay (e.g., d=0.5 and 0.999 for B2 are indistinguishable).

Figure 5. Avg. task performance ±1 std. dev.

Figure 5 presents average task performance after each 1,000 games of training, where the error bars represent ±1 standard deviation. These data show that, given the inherent stochasticity of the task, there is little, if any, difference between the performance of the baseline (B0) and the decay levels of B2. However, by comparing B0 and B2 to B1, it is clear that without R4 the model suffers a dramatic loss of task competence. For clarity, the model begins by playing a non-learning copy of itself and learns from experience with each training session. While the B0 and B2 models improve from winning 50% of games to 75-80%, the B1 model improves to below 55%. We conclude that a selective-retention policy that only incorporates production-firing history (e.g., Chong, 2004; Kennedy & Trafton, 2007) will negatively impact performance in tasks that involve informative interaction with an external environment. Our policy incorporates both rule-firing history and rule reconstruction, and thus retains this source of feedback.

Figure 6. Avg. decisions/task action ±1 std. dev.

Finally, Figure 6 presents the average number of decisions for the model to take an action in the game after training for 10,000 games. In prior work (e.g., Kennedy & Trafton, 2007), this value was a major performance metric, as it reflected the primary reason for learning new rules. In this work, each decision takes very little time, and so the number of decisions to choose an action is not as crucial to task performance as the selected action. However, these data show that there exists a space of decay values (e.g., d=0.35) in which memory usage is relatively low and grows slowly (Figure 4), task performance is relatively high (Figure 5), and the model makes decisions relatively quickly (Figure 6).

Discussion

This work contributes evidence that we can develop models that improve using RL in tasks with large state spaces. Currently, it is typical to explicitly represent the entire state space, which is not feasible in complex problems. Instead, Soar learns rules to represent only those portions of the space it experiences, and our policy retains only those rules that include feedback from environmental reward. Future work needs to validate this approach in other domains.

Concluding Remarks

This paper presents and evaluates two policies for effective retention of learned knowledge from complex environments. While forgetting mechanisms are common in cognitive modeling, this work pursues this line of research for functional reasons: improving computational-resource usage while maintaining reasoning competence. We have presented compelling results from applying these policies in two complex, temporally extended tasks, but there is additional work to evaluate these policies, and their parameters, across a wider variety of problem domains.

This paper does not address the computational challenges associated with efficiently implementing these policies. Derbinsky and Laird (2012) present and evaluate algorithms for implementing forgetting via base-level activation.

Acknowledgments

We acknowledge the funding support of the Air Force Office of Scientific Research, contract FA2386-10-1-4127.

References

Altmann, E., Gray, W. (2002). Forgetting to Remember: The Functional Relationship of Decay and Interference. Psychological Science, 13 (1), 27-33.
Anderson, J. R., Bothell, D., Byrne, M. D., Douglass, S., Lebiere, C., Qin, Y. (2004). An Integrated Theory of the Mind. Psychological Review, 111 (4), 1036-1060.
Anderson, J. R., Reder, L., Lebiere, C. (1996). Working Memory: Activation Limitations on Retrieval. Cognitive Psychology, 30, 221-256.
Chong, R. (2003). The Addition of an Activation and Decay Mechanism to the Soar Architecture. Proc. of the 5th Intl. Conf. on Cognitive Modeling (pp. 45-50).
Chong, R. (2004). Architectural Explorations for Modeling Procedural Skill Decay. Proc. of the 6th Intl. Conf. on Cognitive Modeling.
Daelemans, W., Van Den Bosch, A., Zavrel, J. (1999). Forgetting Exceptions is Harmful in Language Learning. Machine Learning, 34, 11-41.
Derbinsky, N., Laird, J. E. (2009). Efficiently Implementing Episodic Memory. Proc. of the 8th Intl. Conf. on Case-Based Reasoning (pp. 403-417).
Derbinsky, N., Laird, J. E. (2010). Extending Soar with Dissociated Symbolic Memories. Symposium on Human Memory for Artificial Agents, AISB (pp. 31-37).
Derbinsky, N., Laird, J. E. (2012). Computationally Efficient Forgetting via Base-Level Activation. Proc. of the 11th Intl. Conf. on Cognitive Modeling.
Derbinsky, N., Laird, J. E., Smith, B. (2010). Towards Efficiently Supporting Large Symbolic Declarative Memories. Proc. of the 10th Intl. Conf. on Cognitive Modeling (pp. 49-54).
Doorenbos, R. B. (1995). Production Matching for Large Learning Systems. Ph.D. Diss., Computer Science Dept., Carnegie Mellon University, Pittsburgh, PA.
Forgy, C. L. (1982). Rete: A Fast Algorithm for the Many Pattern/Many Object Pattern Match Problem. Artificial Intelligence, 19 (1), 17-37.
Kennedy, W. G., Trafton, J. G. (2007). Long-term Symbolic Learning. Cognitive Systems Research, 8, 237-247.
Laird, J. E. (2012). The Soar Cognitive Architecture. MIT Press.
Laird, J. E., Derbinsky, N., Tinkerhess, M. (2011). A Case Study in Integrating Probabilistic Decision Making and Learning in a Symbolic Cognitive Architecture: Soar Plays Dice. Papers from the 2011 Fall Symposium Series: Advances in Cognitive Systems (pp. 162-169).
Laird, J. E., Derbinsky, N., Voigt, J. R. (2011). Performance Evaluation of Declarative Memory Systems in Soar. Proc. of the 20th Behavior Representation in Modeling & Simulation Conf. (pp. 33-40).
Laird, J. E., Rosenbloom, P. S., Newell, A. (1986). Chunking in Soar: The Anatomy of a General Learning Mechanism. Machine Learning, 1 (1), 11-46.
Markovitch, S., Scott, P. D. (1988). The Role of Forgetting in Learning. Proc. of the 5th Intl. Conf. on Machine Learning (pp. 459-465).
Minton, S. (1990). Qualitative Results Concerning the Utility of Explanation-Based Learning. Artificial Intelligence, 42, 363-391.
Nason, S., Laird, J. E. (2005). Soar-RL: Integrating Reinforcement Learning with Soar. Cognitive Systems Research, 6 (1), 51-59.
Nuxoll, A., Laird, J. E., James, M. (2004). Comprehensive Working Memory Activation in Soar. Proc. of the 6th Intl. Conf. on Cognitive Modeling (pp. 226-230).
Petrov, A. (2006). Computationally Efficient Approximation of the Base-Level Learning Equation in ACT-R. Proc. of the 7th Intl. Conf. on Cognitive Modeling (pp. 391-392).
Schooler, L., Hertwig, R. (2005). How Forgetting Aids Heuristic Inference. Psychological Review, 112 (3), 610-628.
Smyth, B., Cunningham, P. (1996). The Utility Problem Analysed: A Case-Based Reasoning Perspective. LNCS, 1168, 392-399.
Smyth, B., Keane, M. T. (1995). Remembering to Forget: A Competence-Preserving Case Deletion Policy for Case-Based Reasoning Systems. Proc. of the 13th Intl. Joint Conf. on Artificial Intelligence (pp. 377-382).
Tambe, M., Newell, A., Rosenbloom, P. S. (1990). The Problem of Expensive Chunks and its Solution by Restricting Expressiveness. Machine Learning, 5, 299-349.