
The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19)

Modelling Autobiographical Memory Loss across Life Span

Di Wang,1 Ah-Hwee Tan,1,2 Chunyan Miao,1,2,3 Ahmed A. Moustafa4,5

1 Joint NTU-UBC Research Centre of Excellence in Active Living for the Elderly
2 School of Computer Science and Engineering
3 Alibaba-NTU Singapore Joint Research Institute, Nanyang Technological University, Singapore
4 School of Social Sciences and Psychology, Western Sydney University, Sydney, Australia
5 Department of Social Sciences, College of Arts and Sciences, Qatar University, Doha, Qatar

{wangdi,asahtan,ascymiao}@ntu.edu.sg, [email protected]

Abstract

Neurocomputational modelling of long-term memory is a core topic in computational cognitive neuroscience, which is essential towards self-regulating brain-like AI systems. In this paper, we study how people generally lose their memories and emulate various memory loss phenomena using a neurocomputational model. Specifically, based on prior neurocognitive and neuropsychology studies, we identify three neural processes, namely overload, decay and inhibition, which lead to memory loss in memory formation, storage and retrieval, respectively. For model validation, we collect a memory dataset comprising more than one thousand life events and emulate the three key memory loss processes with model parameters learnt from memory behavioural patterns found in human subjects of different age groups. The emulation results show high correlation with human memory recall performance across their life span, even with another population not used for learning. To the best of our knowledge, this paper is the first research work on quantitative evaluations of autobiographical memory loss using a neurocomputational model.

Copyright © 2019, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Introduction

In recent years, many governments and agencies have invested a record-high amount of resources to look deeper into the human brain's functional dynamics. However, as of today, it is still difficult or impossible to quantitatively evaluate a wide range of brain dynamics at the neural network level. From the point of view of AI, neurocomputational models built upon neurocognitive and neuropsychology theories can provide insight into human behavioural processes in a rapid and quantitative manner. For example, according to Wang, Gauthier, and Cottrell (2016), "one advantage of computational models is that we can analyse them in ways we cannot analyse human participants to provide hypotheses as to the underlying mechanisms of an effect."

In this paper, we evaluate how people generally lose their memories by exploiting an established computational autobiographical memory model (Wang, Tan, and Miao 2016), named Autobiographical Memory-Adaptive Resonance Theory network (AM-ART). AM-ART is built upon the psychological basis presented by Conway and Pleydell-Pearce (2000), which has been widely accepted and supported by neural imaging evidence (Addis et al. 2012). Our prior work (Wang, Tan, and Miao 2016) focuses on memory retrievals using imperfect cues and "wandering in reminiscence", which refers to recalling a sequence of seemingly random but contextually connected memories across different episodes of life events. In that prior work, we assume that the memory formation and retrieval processes can always be performed perfectly, which would rarely be true in real-world scenarios. Moreover, due to the hardware constraints in agents or robots, discarding a certain portion of the stored memory is necessary in most complex application domains. Therefore, with a totally different purpose, in this paper, we show that AM-ART can accurately emulate various human memory loss phenomena.

Specifically, we employ three key processes in AM-ART to replicate the three widely studied memory loss phases, namely during memory formation, storage and retrieval (Jahn 2013), respectively. Moreover, we introduce three novel parameters to AM-ART to regulate the corresponding memory loss processes, namely overload as the likelihood of being affected by cognitive overload during formation (Daselaar et al. 2009), decay as the rate of long-term memory fading during storage (Rubin 1982), and inhibition as the likelihood of retrieval failure during retrieval (Storm and Levy 2012). Our approach of using a neural network with relevant control parameters to model memory loss aligns with cognitive experts' opinion that "the individual pattern of impaired memory functions correlates with parameters of structural or functional brain integrity" (Jahn 2013).

For performance evaluations, we collect an autobiographical memory dataset comprising more than one thousand life events from public domains. However, because this collected dataset does not span across one's entire life (e.g., from childhood to 70s), in order to conduct relevant experiments, we alter the event dates so that the collected life events are equally distributed across the life stages and the ratio among pleasant, neutral and unpleasant memories in each life stage conforms to the distribution reported by Berntsen and Rubin (2002). Moreover, it has been found that people of all ages tend to recall more pleasant memories rather than unpleasant ones, although the voluntarily non-recalled unpleasant memories are still retained. We model
this tendency based on the memory survey data reported by Rubin and Berntsen (2003). Subsequently, we perform model evaluations based on the memory recall data reported by Berntsen and Rubin (2002). Specifically, we learn the memory loss parameter values by emulating the memory recall performance of human subjects in different age groups and further use the learnt parameter values to predict the performance of human subjects in the subsequent life stage. The emulation results show high correlation, even with the memory recall performance of another population reported by Rubin and Schulkind (1997).

As such, we show that AM-ART can accurately capture the characteristics of human autobiographical memory loss. Therefore, we provide a useful tool to analyse various memory loss phenomena that may be difficult or impossible to study in human subjects. To the best of our knowledge, this paper is the first research work on quantitative evaluations of autobiographical memory loss using a neurocomputational model.

Related Work

For the same purpose of using a neurocomputational model to verify neurocognitive theories and perform quantitative evaluations, Wang, Gauthier, and Cottrell (2016) use PCA (principal component analysis) and MLP (multi-layer perceptron) with one hidden layer, wherein different numbers of hidden neurons are used to represent the corresponding level of the human participants' pattern recognition ability. Their model supports the "experience moderation effect" observed by Gauthier et al. (2014). In this paper, we use AM-ART as the neurocomputational model to replicate human memory loss phenomena in different age groups.

Many well-established cognitive models, such as Soar (Laird 2012), ACT-R (Anderson et al. 2004) and Icarus (Langley 2006), employ functionally specific memory modules. Moreover, a few such cognitive models further investigate the dynamics of long-term memory, e.g., Derbinsky and Laird (2013) heuristically define memory decay mechanisms in Soar. Nonetheless, we select AM-ART to emulate memory loss phenomena due to its (i) high consistency with the neural and psychological basis in terms of both the network architecture and functional dynamics and (ii) comprehensively defined memory encoding and retrieval parameters and mechanisms.

From the perspective of AI, modelling long-term memory loss is essential towards self-regulating systems to accommodate physical memory constraints. For example, to achieve better efficiency, deep reinforcement learning agents normally perform mini-batch learning based on the experience replay strategy (Lin 1993). Other than the improvement in time-wise learning efficiency, experience replay also possesses the following perk: "the behavior distribution is averaged over many of its previous states, smoothing out learning and avoiding oscillations or divergence in the parameters" (Mnih et al. 2013). However, by performing random sampling, the conventional experience replay strategy ignores the importance or the quality of different experiences. To incorporate the quality of the experiences during sampling, various experience replay techniques, such as prioritized (Schaul et al. 2015), hindsight (Andrychowicz et al. 2017) and dual (Wei et al. 2018), have been proposed in the literature. Nonetheless, these extended strategies are built upon purely goal-orientated mechanisms, without any neurocognitive basis. Although it is not the focus of this paper, it will be quite stimulating to implement autonomous agents that are able to emulate human memory recall behaviours.

AM-ART Model and Its Dynamics

The network structure of the Autobiographical Memory-Adaptive Resonance Theory (AM-ART) model is shown in Figure 1. AM-ART is a three-layer neural network, wherein the event-specific knowledge of autobiographical memory is presented to the bottom layer $F_1$ to encode life events in the middle layer $F_2$, and a sequence of related events in $F_2$ is encoded into an episode in the top layer $F_3$. AM-ART is consistent with the hierarchical model established by Conway and Pleydell-Pearce (2000), which is supported by neural imaging evidence (Addis et al. 2012), in terms of both the network architecture and functional dynamics (Wang, Tan, and Miao 2016).

Figure 1: Network structure of AM-ART. All its input channels in $F_1$ and the $F_2$ and $F_3$ layers match specific brain regions.

Furthermore, we find that the circuit of the AM-ART network may reside in the temporal lobe of the human brain (see Figure 1). Specifically, inputs of time and location may be from entorhinal cortex (Kraus et al. 2015), inputs of people and activity may be from fusiform gyrus (Kanwisher 2001), inputs of emotion and imagery may be from amygdala (Phelps 2004), and both the $F_2$ and $F_3$ layers may reside in hippocampus (Stark et al. 2013). Please note that the inputs to AM-ART are considered as recognized or processed information, e.g., imagery used for memory encoding in hippocampus comes from amygdala (Phelps 2004) rather than directly from occipital lobe.

AM-ART extends the network structure of fusion ART (Tan, Carpenter, and Grossberg 2007), which is a generic self-organizing neural network comprising two layers of neural fields connected by bidirectional conditional links. Nevertheless, the same bottom-up search and top-down readout operations between the layers still apply in AM-ART.

Dynamics of Fusion ART

With reference to $F_1$ (comprising six input channels) and $F_2$ (comprising one association channel) shown in Figure 1, we introduce the dynamics of fusion ART as follows.

Input vectors: Let $\mathbf{I}^k = (I^k_1, I^k_2, \ldots, I^k_L)$ denote the input vector, where $I^k_l$ denotes input $l$ to channel $k$, for $l = 1, 2, \ldots, L$ and $k = 1, 2, \ldots, K$, where $L$ denotes the length of $\mathbf{I}^k$ and $K$ denotes the number of input channels.

Input channels: Let $F_1^k$ denote an input channel that receives $\mathbf{I}^k$ and let $\mathbf{x}^k = (x^k_1, x^k_2, \ldots, x^k_L)$, where $x^k_l \in [0, 1]$, denote the activation vector of $F_1^k$ receiving $\mathbf{I}^k$. If fuzzy ART operations (see (1) and (3)) are used, $\mathbf{x}^k$ is further augmented with a complement vector $\bar{\mathbf{x}}^k$, where $\bar{x}^k_l = 1 - x^k_l$. This augmentation is named complement coding, which is applied to prevent the "code proliferation" problem (Carpenter, Grossberg, and Rosen 1991). For comprehensive discussions on complement coding and fuzzy ART operations, interested readers may refer to (Wang and Tan 2015a).

Association channel: Let $\mathbf{y} = (y_1, y_2, \ldots, y_J)$ denote the activation vector of $F_2$, where $J$ denotes the number of codes in $F_2$. Please note that there are always $J - 1$ committed (learned) codes and one uncommitted ($J$th) code in $F_2$. If fusion ART learns from scratch, it only has one uncommitted code in $F_2$ (its weight vector is all 1s).

Weight vectors: Let $\mathbf{w}^k_j$ denote the weight vector of the $j$th code $C_j$ in $F_2$ for learning the input patterns in $F_1^k$.

Parameters: The dynamics of fusion ART are regulated by the parameters associated with each input channel, namely choice parameters $\alpha^k > 0$, learning rate parameters $\beta^k \in [0, 1]$, contribution parameters $\gamma^k \in [0, 1]$, where $\sum_k \gamma^k = 1$, and vigilance parameters $\rho^k \in [0, 1]$.

Code activation: A bottom-up memory search first starts from the computation of the activation values in all codes in $F_2$. Specifically, given $\{\mathbf{x}^k\}_{k=1}^{K}$, for each $F_2$ code $C_j$, the corresponding activation $T_j$ is computed as follows:

$$T_j = \sum_k \gamma^k \frac{|\mathbf{x}^k \wedge \mathbf{w}^k_j|}{\alpha^k + |\mathbf{w}^k_j|}, \tag{1}$$

where the fuzzy AND operation $\wedge$ is defined by $(\mathbf{p} \wedge \mathbf{q})_i \equiv \min(p_i, q_i)$ and the norm $|\cdot|$ is defined by $|\mathbf{p}| \equiv \sum_i p_i$.

Code competition: Given $\{T_j\}_{j=1}^{J}$, the $F_2$ code with the highest activation value is named the winner, which is indexed at $j^*$, where $j^* = \arg\max_j T_j$.

Template matching: This template matching process checks whether resonance occurs at the winner code $C_{j^*}$. Specifically, the match between the input pattern and the weight vector of $C_{j^*}$ is computed as follows:

$$M^k_{j^*} = \frac{|\mathbf{x}^k \wedge \mathbf{w}^k_{j^*}|}{|\mathbf{x}^k|}. \tag{2}$$

If $C_{j^*}$ satisfies the vigilance criteria such that $\forall k,\ M^k_{j^*} \geq \rho^k$, a resonance occurs, which leads to the subsequent learning or readout process. Otherwise, a mismatch reset occurs, in which $T_{j^*} \leftarrow 0$ and the search repeats until a resonance occurs at another $F_2$ code. When an uncommitted code (which definitely satisfies the criteria as its weights are all 1s) is identified as the winner and recruited for learning, it becomes committed. Subsequently, a new uncommitted code will be added in $F_2$. As such, fusion ART self-organizes its network structure (Wang and Tan 2016).

Template learning: If learning is required, once $C_{j^*}$ is identified, its corresponding weight vectors are updated by the following learning rule:

$$\mathbf{w}^{k(\text{new})}_{j^*} = (1 - \beta^k)\,\mathbf{w}^{k(\text{old})}_{j^*} + \beta^k \left(\mathbf{x}^k \wedge \mathbf{w}^{k(\text{old})}_{j^*}\right). \tag{3}$$

Knowledge readout: When this top-down knowledge readout process is invoked, $C_{j^*}$ presents its weight vectors to the input fields, such that $\mathbf{x}^{k(\text{new})} = \mathbf{w}^k_{j^*}$.

Encoding and Retrieval of Events in AM-ART

To make the activation vectors $\mathbf{x}^k$ (5W1H of a life event) in each input channel of $F_1$ generic, we use normalized values to represent time (when) and location (where) and use categorical values to represent people (who), activity (what), emotion (how) and imagery (which) (all with complements).

Time vector ($\mathbf{x}^1$): It represents when the event happened in the form of normalized year: $x^1_1 = (I^1_1 - 1900)/200$, month: $x^1_2 = I^1_2/12$, and day: $x^1_3 = I^1_3/31$.

Location vector ($\mathbf{x}^2$): It represents where the event happened in the form of normalized latitude: $x^2_1 = (I^2_1 + 90)/180$ and longitude: $x^2_2 = (I^2_2 + 180)/360$ ($\mathbf{I}^2$ is determined using the Google Geocoder API).

People vector ($\mathbf{x}^3$): It is a binary-valued vector representing who were involved in the event. Its length corresponds to the categorization of people based on inter-personal relationship. For the dataset used in this paper, we define eight types of relationship, namely family, neighbours, spouse, friends, classmates, colleagues, acquaintances and strangers.

Activity vector ($\mathbf{x}^4$): It is a binary-valued vector representing what the event was. Similarly, its length corresponds to the categorization of activities. For the dataset used in this paper, we define eight types of activities, namely work, meal, sports, travel, school, shopping, religious and leisure.

Emotion vector ($\mathbf{x}^5$): It is a binary-valued vector representing how the feeling was during the event. Emotion is an important component of our past experience, which highly affects the encoding and retrieval of autobiographical memories (Berntsen and Rubin 2002). We categorize nine types of emotion, namely neutral, astonished, excited, happy, satisfied, tired, sad, miserable and annoyed, following the classical valence-arousal model (Russell 1980), which has been widely adopted in various computational models, e.g., (Wang and Tan 2014; Tang et al. 2017).

Imagery vector ($\mathbf{x}^6$): It is a binary-valued vector representing which pictorial memory was associated with the
event. Its value encodes the specific repository address of the stored imagery. During memory retrieval, this vector is not presented along with the others as a part of the retrieval cue. In other words, this imagery field is only involved when encoding the life events and retrieving particular pieces of memories for visual playback (Wang and Tan 2015b).

The $F_2$ layer of AM-ART encodes events. The process of event encoding and retrieval is shown in Algorithm 1.

Algorithm 1 Event encoding and retrieval in AM-ART
 1: encode $\mathbf{x}^k$ in $F_1^k$ w.r.t. the given input pattern $\mathbf{I}^k$
 2: activate all codes in $F_2$ {code activation, see (1)}
 3: repeat
 4:     select the winner code $C_{j^*}$ {code competition}
 5: until resonance occurs {template matching, see (2)}
 6: if encoding is required then
 7:     perform learning {template learning, see (3)}
 8: else if retrieval is required then
 9:     read out $\mathbf{w}^k_{j^*}$ in $F_1^k$ {knowledge readout}
10: end if

Encoding and Retrieval of Episodes in AM-ART

Assume the related events of one episode happened at $t_0, t_1, \ldots, t_n$ and let $y_{t_i}$ denote the activation value of the event that happened at $t_i$. To encode the sequence of the events, the inequality $y_{t_n} > y_{t_{n-1}} > \cdots > y_{t_0}$ must always hold. Therefore, we use a succession parameter $\tau \in (0, 1)$ to regulate the activation sequence, such that $y_j^{(\text{new})} = y_j^{(\text{old})}(1 - \tau)$ at each new time step. The $F_3$ layer of AM-ART encodes episodes to associate the related events encoded in $F_2$. The process of episode encoding and retrieval is shown in Algorithm 2.

Algorithm 2 Episode encoding and retrieval in AM-ART
 1: for all subsequent events of an episode do
 2:     select the winner code $C_{j^*}$ in $F_2$ w.r.t. $\mathbf{x}^k$ in $F_1$
 3:     $y_{j^*} \leftarrow 1$, or a predefined value if using a partial sequence to identify the episode
 4:     for all previously selected codes in $F_2$ do
 5:         $y_j^{(\text{new})} \leftarrow y_j^{(\text{old})}(1 - \tau)$
 6:     end for
 7: end for
 8: select the winner code $j^{*\prime}$ in $F_3$ w.r.t. $\mathbf{y}$
 9: if encoding is required then
10:     learn the weight vector $\mathbf{w}'_{j^{*\prime}}$ in $F_3$: $\mathbf{w}'^{(\text{new})}_{j^{*\prime}} \leftarrow (1 - \beta_2)\,\mathbf{w}'^{(\text{old})}_{j^{*\prime}} + \beta_2\left(\mathbf{y} \wedge \mathbf{w}'^{(\text{old})}_{j^{*\prime}}\right)$
11: else if retrieval is required then
12:     read out $\mathbf{w}'_{j^{*\prime}}$ in $F_2$
13: end if

Memory Loss during Formation

During the memory formation process, memory loss occurs in the form of encoding failure, which is caused by the deactivation of certain brain region(s) due to a demanding cognitive task (Daselaar et al. 2009). Therefore, in AM-ART, we introduce the overload parameter $\lambda \in [0, 1]$ to regulate the likelihood of one being affected by cognitive overload during memory formation. Specifically, $\lambda$ influences the vigilance parameters $\rho^k$ (see template matching) and contribution parameters $\gamma^k$ (see code activation) as follows:

$$\rho^k = \begin{cases} 1 - \lambda\,(1 - \mathrm{rand}()^k), & \text{if } \mathrm{rand}()^k > \lambda, \\ 0, & \text{otherwise}, \end{cases} \tag{4}$$

where $\mathrm{rand}() \in [0, 1]$ generates a random number and

$$\gamma^k = \frac{\rho^k}{\sum_k \rho^k}. \tag{5}$$

Due to the lack of quantitative studies in the related neurobiology and neurocognitive literature, there does not exist a good reference on how to determine the cognitive load during memory formation based on both one's state of mind and external stimuli. Instead, we have to employ a random generator $\mathrm{rand}()^k$ to emulate the cognitive capability on the $k$th input channel in $F_1$ during the formation of each life event. Thus, equation (4) describes that with probability $\lambda$ (if $\mathrm{rand}()^k \leq \lambda$), the $k$th input channel is overlooked ($\rho^k = \gamma^k = 0$) during memory formation due to cognitive overload in the respective brain region. Otherwise, the vigilance is estimated as $1 - \lambda(1 - \mathrm{rand}()^k)$, i.e., a lower $\lambda$ value and a higher $\mathrm{rand}()^k$ value lead to the formation of more distinguishable memory.

The process of memory loss during formation is shown in Algorithm 3. Generally speaking, people in different life stages, denoted as $t_i$, differ in $\lambda_{t_i}$. In our emulations, we learn the values of $\lambda_{t_i}$ using published memory survey data.

Algorithm 3 Memory loss process during formation
 1: upon receiving an input pattern $\mathbf{I}^k$ for memory formation, encode $\mathbf{x}^k$ in $F_1^k$; furthermore, update $\rho^k$ and $\gamma^k$ {overload effect, see (4) and (5), respectively}
 2: identify the winner code $C_{j^*}$ where resonance occurs
 3: perform encoding for memory formation
 4: if the memory encoded is the first in a new time period, $\forall j \neq j^*$ in $F_2$, decrease $v_j$ {decay effect, see (6)}

Memory Loss during Storage

During long-term storage, memory decays over time due to inactivation. Although this decay is monotonic, its rate declines rapidly at first and then much more slowly, which fits an exponential curve well (Rubin 1982). Therefore, in AM-ART, we introduce the decay parameter $\varphi \in [0, 1]$ to regulate the rate of long-term memory fading. Moreover, we introduce the vividness parameter $v_j \in [0, 1]$, which is associated with each event in $F_2$ to denote the vividness of the $j$th event. Upon encoding (see (3)) at $t_a$, event $j$ has the highest level of vividness, i.e., $v_j = 1$. Specifically, as time elapses, the vividness of an encoded event decays (see Step 4 of Algorithm 3) in the following manner:

$$v_j^{(\text{new})} = \max\left(0,\ v_j^{(\text{old})} - \exp(\varphi - (t_i - t_a))\right), \quad \text{if } i > a, \tag{6}$$
where $\exp(\varphi - (t_i - t_a))$ denotes the decay rate and $(t_i - t_a)$ denotes the amount of elapsed time. Because $\varphi \leq 1$ and $t_i - t_a \geq 1$, the decay rate is bounded within the $[0, 1]$ interval. When $v_j \leq 0$, the $j$th event is no longer retrievable.

On the other hand, for healthy persons, their memory gets refreshed through reactivation (Gisquet-Verrier and Riccio 2012), wherein a similar pattern of the associated features is recalled (Chalfonte and Johnson 1996). Therefore, during memory retrieval, the vividness of a winner event $j$ increases proportionally to its activation value (see (1)) due to reactivation (see Step 4 of Algorithm 4) in the following manner:

$$v_j^{(\text{new})} = \min\left(1,\ v_j^{(\text{old})} + \exp(\varphi - (t_i - t_a))\,T_j\right), \quad \text{if } v_j^{(\text{old})} > 0. \tag{7}$$

The decay rate can be rewritten as $\exp(\varphi)\exp(-(t_i - t_a))$, which means a higher $\varphi$ value and longer elapsed time lead to greater memory decay or reactivation. In our emulations, we learn the values of $\varphi_{t_i}$ associated with different life stages using published memory survey data.

Memory Loss during Retrieval

Memory loss during retrieval manifests as retrieval-induced forgetting (RIF), which refers to the phenomenon in which certain information becomes less recallable due to memory interdependency (Storm and Levy 2012). RIF has been identified as goal-directed and may not necessarily be within conscious control (Barnier, Hung, and Conway 2004). Among the various possible intricate accounts of memory retrieval inhibition, we adopt the two prominent ones that have been most widely and frequently supported by empirical studies (Storm and Levy 2012), namely cue independence, which means RIF takes place regardless of the choice of retrieval cues, and competition dependence, which means RIF is affected by the similarity between the to-be-retrieved piece of memory and its competitors. Furthermore, although it might be well known that the elderly tend to recall more positive memories than negative ones, emotional inhibition has been identified in young adults as well (Barnier, Hung, and Conway 2004). Therefore, in AM-ART, we introduce the inhibition parameter $\mu \in [0, 1]$ to regulate the likelihood of retrieval failure. Specifically, when the $j$th event in $F_2$ is identified as the winner, before checking whether resonance occurs, its activation value may be reset due to inhibition, which is regulated in the following manner:

$$T_j = 0, \quad \text{if } \mathrm{rand}() < \mu\,\bigl(1 - (T_j - T_l)\,T_j\bigr)\,\zeta^{s}_{t_i}, \tag{8}$$

where $l$ denotes the index of the event that has the second highest activation value (see (1)) and $\zeta^{s}_{t_i}$ denotes one's emotional coefficient parameter in life stage $t_i$ associated with affective state $s$ of the winner event $j$. It follows from (8) that with a larger activation value $T_j$ of the winner event and a larger difference between the winner and the runner-up $(T_j - T_l)$, the chance of the winner being inhibited from retrieval is smaller. Moreover, the value of $\zeta^{s}_{t_i}$ is bounded between that of the most negative state $\zeta^{-}_{t_i}$ (low valence and low arousal, see the 2-D circumplex model of affect (Russell 1980)) and that of the most positive state $\zeta^{++}_{t_i}$ (high valence and high arousal), which can be computed as follows:

$$\zeta^{s}_{t_i} = \zeta^{-}_{t_i} + \frac{1 + \cos(\theta^{s} - 45^{\circ})}{2}\left(\zeta^{++}_{t_i} - \zeta^{-}_{t_i}\right), \tag{9}$$

where $\theta^{s} \in [0^{\circ}, 360^{\circ}]$ denotes the angle of affective state $s$ in the 2-D circumplex. Moreover, $\zeta^{\text{neutral}}_{t_i} = \frac{1}{2}\left(\zeta^{++}_{t_i} + \zeta^{-}_{t_i}\right)$. In our emulations, we learn the values of $\mu_{t_i}$, $\zeta^{-}_{t_i}$ and $\zeta^{++}_{t_i}$ using published memory survey data.

The memory loss process during retrieval is shown in Algorithm 4, wherein the initial parameter values of AM-ART are denoted with 0 in the subscript. Unlike memory loss during storage, in which an event can no longer be retrieved once its vividness decreases to zero (see (6)), RIF only renders the memory temporarily inaccessible to conscious recall (Barnier, Hung, and Conway 2004).

Algorithm 4 Memory loss process during retrieval
 1: upon receiving a cue $\mathbf{U}^k$ for memory retrieval, encode $\mathbf{x}^k$ in $F_1^k$; furthermore, reset $\rho^k \leftarrow \rho^k_0$ and $\gamma^k \leftarrow \gamma^k_0$
 2: repeat
 3:     select the winner code $C_{j^*}$ for inhibition check
 4:     increase $v_{j^*}$ {reactivation effect, see (7)}
 5:     if inhibition occurs then
 6:         $T_{j^*} \leftarrow 0$ {inhibition effect, see (8)}
 7:     else
 8:         further check if resonance occurs at $C_{j^*}$
 9:     end if
10: until resonance occurs
11: perform readout for memory retrieval

Using AM-ART to Model Memory Loss

To validate our approach of using AM-ART to model memory loss, we collect a memory dataset from public domains and use it to conduct all the experiments in this paper. Please note that the collection of a relatively large real-world autobiographical memory dataset is necessary because the natural relationships among the event-specific knowledge (Conway and Pleydell-Pearce 2000) reflect actual scenarios and remain relatively consistent throughout one's life, which a randomly generated memory set cannot offer.

Our collected dataset comprises 1,019 snapshots of life events (5W1H) in 131 episodes of Mr. Obama, the 44th President of the USA. Other persons' memory sets can also be used for the experiments conducted in this paper. We simply choose Mr. Obama because his life events are largely available online with rich context (for tagging 5W1H) and no privacy issue is involved. Specifically, we directly extract the images and their corresponding context from the online web pages (Zimbio.com and Google Images), except emotion (manually derived, as emotion recognition based on an image and its context is not the focus of this paper). However, because this dataset does not evenly span across one's entire life (fewer memories are collected in childhood and young adulthood), we alter the event dates and (roughly) equally distribute the events across all life stages based on the intuitive assumption that the number of events experienced during long time periods of the same length should also be equal. The number of life stages is set to eight, which follows the categorization criteria used by Berntsen and Rubin (2002) that from 0s to 70s, each has ten years' time span, i.e., $t_i \in \{0, 1, \ldots, 7\}$ (see (6)). Moreover, we make sure the
ratio among pleasant ($\cos(\theta^{s} - 45^{\circ}) > 0$, see (9)), neutral and unpleasant ($\cos(\theta^{s} - 45^{\circ}) < 0$) memories in each life stage conforms to the distribution reported in Figure 12 of (Berntsen and Rubin 2002), in which a significantly higher ratio of pleasant memories is reported in 20s. Please note that when an episode is selected for date alteration, the dates of all its events are changed to the corresponding life stage, following the original event sequence. As such, although certain episode sequences may become unnatural in the real world, this necessary event date alteration procedure does not affect the utility of our proposed model.

Furthermore, before we conduct the memory loss emulations, we predetermine the emotional coefficient parameter value ranges (see (9)) based on the number of emotional memory recalls reported in Table 1 of (Rubin and Berntsen 2003), which extends their prior study (Berntsen and Rubin 2002) (more human subjects: 1,307 versus 1,241). Specifically, we compute $\zeta^{++}_{t_i}$ using the ratio of the total number of "pride" and "love" memory recalls over the total number of their attempts. Similarly, we compute $\zeta^{-}_{t_i}$ based on "fear", "jealousy" and "anger". Please note that the "important" memory recalls listed in the same table are not used as they do not tie to any particular emotion. The predetermined parameter values are reported in Table 1. Because the memory recalls in 0s and 10s are missing from (Rubin and Berntsen 2003), we estimate those values by polynomial extrapolation of the same emotional coefficient values in other age groups. We set the polynomial degree to 2 because it is low enough to avoid overfitting and high enough to keep the extrapolated values within a certain range. For example, in Table 1, $\zeta^{++}_{0}$ will be greater than 1 if the polynomial degree is set to 1 (i.e., linear extrapolation).

Table 1: List of estimated emotional coefficient values.

Age                   0s     10s    20s    30s    40s    50s    60s    70s
$\zeta^{-}_{t_i}$     0.431  0.547  0.625  0.644  0.615  0.632  0.449  0.352
$\zeta^{++}_{t_i}$    0.877  0.893  0.882  0.900  0.842  0.805  0.716  0.671

The initial parameter values of AM-ART are listed in Table 2. Most of these parameters take on a standard set of values and none requires tuning during runtime.

Table 2: List of initial parameter values used in experiments.

Parameter                     Value   Description/Remark
Choice ($\alpha^k$)           0.001   Mainly used to avoid having NaN in (1)
Learning rate ($\beta^k_0$)   0.5     Not in use during memory retrieval
Contribution ($\gamma^k_0$)   0.167   Equally assigned, such that $\sum_k \gamma^k_0 = 1$
Vigilance ($\rho^k_0$)        0.9     During memory formation, determined by (4)
Succession rate ($\tau$)      0.05    Used for encoding event sequence (see Algorithm 2)

Emulating Memory Loss Across One's Life Span

In the study conducted by Berntsen and Rubin (2002), they interviewed 1,241 subjects aged 20 and above to learn their memory recalls across their life span in various manners. Among the various assessments they reported, involuntary autobiographical memory recalls may best represent the distribution of the well-preserved memories across one's life span (see Figure 2(a)). Therefore, we use this set of reported proportions of memories recalled in different life stages to investigate the following research question:

How accurately can our proposed computational memory loss procedures replicate the memory recall behavioural patterns observed in the real world?

Figure 2: Memory recall distributions of different age groups (20 to 70) across life span, plotted as Proportion of Memories against Life Stage: (a) Figure 6 of (Berntsen and Rubin 2002). (b) Results of AM-ART emulations. To make all plots visible, an offset of 0.2 is applied to each adjacent age group.

To answer the above question by applying AM-ART memory loss procedures on the survey data visualized in Figure 2(a), we need to assume that an individual's memory loss parameter values do not vary within the same life stage, i.e., $\lambda_{t_i}$, $\varphi_{t_i}$ and $\mu_{t_i}$ associated with each individual, where $t_i \in \{0, 1, \ldots, 7\}$, remain invariant. Furthermore, we use a Genetic Algorithm (GA) (Goldberg 1989) to emulate the individual subjects (assuming they all went through the same life events at each life stage) and minimise the difference (root mean square error, RMSE) between the emulated memory recall performance and the published survey data. Specifically, for each age group, the chromosome length is set to $3 \times (t_i + 1)$ and each gene represents one of $\lambda_{t_i}$, $\varphi_{t_i}$ and $\mu_{t_i}$ as a real number. The GA strategies employed are tournament selection of parents (size=2 and probability=0.75), uniform crossover (rate=1), bounded mutation (to ensure all gene values are kept within $[0, 1]$, rate=0.75), and elitism replacement (ratio=0.1). For each age group, the population size is set to 200 and GA terminates after 20 iterations. In addition, we maintain a pool of identical best-performers across GA iterations in parallel. The pool size is set to 200, which is close to the averaged number of subjects in each age group (206.83) interviewed by Berntsen and Rubin (2002).

The emulations are conducted as follows. For each age group, each individual and each life stage, memory formation (encoding) first takes place. Upon proceeding to the subsequent life stage, memory decay takes place. Moreover, all the prior memories are used once again as retrieval cues to emulate reactivation. In the end, one's retrieval performance at the last life stage is recorded as the final emulation result. The performance of the 200 individuals kept in the pool is averaged and visualized in Figure 2(b).

The curves shown in Figure 2(b) appear to be more stable, but are highly consistent with those in Figure 2(a): the averaged correlation of all the age groups across
each life stage between these two subfigures is computed as 0.793±0.166. Moreover, the phenomena observed in Figure 2(b) are highly consistent with the widely reported literature that “older adults demonstrate a three-component pattern in the distribution of memories across the life span: few memories from childhood (childhood amnesia), a bump in young adulthood followed by a decrease in midlife (a reminiscence bump), and increase in later years (a recency effect)” (Fromholt et al. 2003). Although both subfigures show the widely observed reminiscence bump “between the ages of 10 and 30” (Demiray, Gulgoz, and Bluck 2009), the bump in Figure 2(a) is in the 10s while that in Figure 2(b) is in the 20s. This difference may be explained by the fact that Figure 12 of (Berntsen and Rubin 2002), which comprehensively visualizes the distribution of emotional memories across all eight life stages, actually reports the re-analysed distribution of another population (Rubin and Schulkind 1997) (see the caption of Figure 12 in (Berntsen and Rubin 2002)).

Moreover, we find that the results shown in Figure 2(b) are highly consistent with comparable memory assessments reported in another well-known study (Rubin and Schulkind 1997), wherein the memory recall ratios of the 20s, 30s and 70s within the past two decades (see Table 2 of (Rubin and Schulkind 1997)) are computed as 0.830, 0.776 and 0.476, respectively. These ratios are remarkably similar to the corresponding AM-ART emulation results of 0.857, 0.745 and 0.386, respectively.

Predicting Memory Loss in Subsequent Life Stage

We further test whether the learnt parameter values can be used to predict memory loss in one’s subsequent life stage. Specifically, we extrapolate the learnt parameter values of age group t_i to predict their memory performance in t_{i+1}. The prediction results in terms of RMSE, based on the 200 best-performing individuals, are reported in Table 3. As shown, when predicting one’s memory performance in the latter life stages (t_i ≥ 5), polynomial extrapolation performs much better than linear extrapolation. This finding is consistent with the widely reported literature that the memory performance of the elderly declines rapidly as they age (Small et al. 1999; Wang et al. 2014; Wang and Tan 2017).

Table 3: Prediction errors of the estimated parameter values.

Prediction of   30s     40s     50s     60s     70s
Linear          0.0119  0.0141  0.0312  0.0495  0.0732
Polynomial      0.0177  0.0169  0.0294  0.0363  0.0516
Random          0.0231  0.0260  0.0358  0.0686  0.0822

Note: Polynomial degree=2; Random means a parameter value is randomly generated from the [min, max] range of the corresponding values in previous life stages.

Applicability of Modelling Memory Loss in Agents

Being able to model long-term memory loss as humans do may shed light upon the design of and utilization strategies in autonomous agents. For example, our memory loss model can be straightforwardly employed by a deep reinforcement learning agent with limited memory capacity in a complex game environment to effectively select diverse experiences to be preserved for batch learning. Such a linkage between our human memory loss model and the agent’s memory discard strategy is stimulating because it enables an agent to emulate human memory recall behaviours, e.g., better preservation of recent (adaptivity in the case of an agent), happy (higher rewards), and young-adulthood (most frequently referenced) memories.

Conclusion

In this paper, we introduce the dynamics of a neurocomputational autobiographical memory model on how to replicate real-world memory loss phenomena based on well-established neurocognitive theories. The emulation results show high correlation with human memory recall performance. Although our approach may only replicate one of the many possible mechanisms used by the human brain, it can be considered a piece of ground-breaking work in this research direction. Going forward, we will implement the stimulating memory discard strategy in autonomous agents to investigate the implications of their human-like behaviours.

Acknowledgments

This research is supported, in part, by the National Research Foundation, Prime Minister’s Office, Singapore under its IDM Futures Funding Initiative and the Singapore Ministry of Health under its National Innovation Challenge on Active and Confident Ageing (NIC Project No. MOH/NIC/COG04/2017 and MOH/NIC/HAIG03/2017).

References

Addis, D. R.; Knapp, K.; Roberts, R. P.; and Schacter, D. L. 2012. Routes to the past: Neural substrates of direct and generative autobiographical memory retrieval. Neuroimage 59(3):2908–2922.
Anderson, J. R.; Bothell, D.; Byrne, M. D.; Douglass, S.; Lebiere, C.; and Qin, Y. 2004. An integrated theory of the mind. Psychological Review 111:1036–1060.
Andrychowicz, M.; Crow, D.; Ray, A.; Schneider, J.; Fong, R.; Welinder, P.; McGrew, B.; Tobin, J.; Abbeel, P.; and Zaremba, W. 2017. Hindsight experience replay. In NIPS, 5055–5065.
Barnier, A. J.; Hung, L.; and Conway, M. A. 2004. Retrieval-induced forgetting of emotional and unemotional autobiographical memories. Cognition and Emotion 18(4):457–477.
Berntsen, D., and Rubin, D. C. 2002. Emotionally charged autobiographical memories across the life span: The recall of happy, sad, traumatic, and involuntary memories. Psychology and Aging 17(4):636–652.
Carpenter, G. A.; Grossberg, S.; and Rosen, D. B. 1991. Fuzzy ART: Fast stable learning and categorization of analog patterns by an adaptive resonance system. Neural Networks 4(6):759–771.
Chalfonte, B. L., and Johnson, M. K. 1996. Feature memory and binding in young and older adults. Memory and Cognition 24(4):403–416.

Conway, M. A., and Pleydell-Pearce, C. W. 2000. The construction of autobiographical memories in the self-memory system. Psychological Review 107(2):261–288.
Daselaar, S. M.; Prince, S. E.; Dennis, N. A.; Hayes, S. M.; Kim, H.; and Cabeza, R. 2009. Posterior midline and ventral parietal activity is associated with retrieval success and encoding failure. Frontiers in Human Neuroscience 3:1–10.
Demiray, B.; Gulgoz, S.; and Bluck, S. 2009. Examining the life story account of the reminiscence bump: Why we remember more from young adulthood. Memory 17(7):708–723.
Derbinsky, N., and Laird, J. E. 2013. Effective and efficient forgetting of learned knowledge in Soar’s working and procedural memories. Cognitive Systems Research 24:104–113.
Fromholt, P.; Mortensen, D. B.; Torpdahl, P.; Bender, L.; Larsen, P.; and Rubin, D. C. 2003. Life-narrative and word-cued autobiographical memories in centenarians: Comparisons with 80-year-old control, depressed, and dementia groups. Memory 11(1):81–88.
Gauthier, I.; McGugin, R. W.; Richler, J. J.; Herzmann, G.; Speegle, M.; and VanGulick, A. E. 2014. Experience moderates overlap between object and face recognition, suggesting a common ability. Journal of Vision 14(8):article 7.
Gisquet-Verrier, P., and Riccio, D. C. 2012. Memory reactivation effects independent of reconsolidation. Learning and Memory 19:401–409.
Goldberg, D. E. 1989. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley.
Jahn, H. 2013. Memory loss in Alzheimer’s disease. Dialogues in Clinical Neuroscience 15:445–454.
Kanwisher, N. 2001. Neural events and perceptual awareness. Cognition 79:89–113.
Kraus, B. J.; Brandon, M. P.; Robinson II, R. J.; Connerney, M. A.; Hasselmo, M. E.; and Eichenbaum, H. 2015. During running in place, grid cells integrate elapsed time and distance run. Neuron 88(3):578–589.
Laird, J. E. 2012. The Soar Cognitive Architecture. MIT Press.
Langley, P. 2006. Cognitive architectures and general intelligent systems. AI Magazine 27:33–44.
Lin, L.-J. 1993. Reinforcement learning for robots using neural networks. Ph.D. Dissertation, Carnegie Mellon University.
Mnih, V.; Kavukcuoglu, K.; Silver, D.; Graves, A.; Antonoglou, I.; Wierstra, D.; and Riedmiller, M. 2013. Playing Atari with deep reinforcement learning. arXiv:1312.5602.
Phelps, E. A. 2004. Human emotion and memory: Interactions of the amygdala and hippocampal complex. Current Opinion in Neurobiology 14:198–202.
Rubin, D. C., and Berntsen, D. 2003. Life scripts help to maintain autobiographical memories of highly positive, but not highly negative, events. Memory and Cognition 31(1):1–14.
Rubin, D. C., and Schulkind, M. D. 1997. Distribution of important and word-cued autobiographical memories in 20-, 35-, and 70-year-old adults. Psychology and Aging 12(3):524–535.
Rubin, D. C. 1982. On the retention function for autobiographical memory. Journal of Verbal Learning and Verbal Behavior 21(1):21–38.
Russell, J. A. 1980. A circumplex model of affect. Journal of Personality and Social Psychology 39(6):1161–1178.
Schaul, T.; Quan, J.; Antonoglou, I.; and Silver, D. 2015. Prioritized experience replay. arXiv:1511.05952.
Small, S. A.; Stern, Y.; Tang, M.; and Mayeux, R. 1999. Selective decline in memory function among healthy elderly. Neurology 52(7):1392–1396.
Stark, S. M.; Yassa, M. A.; Lacy, J. W.; and Stark, C. E. L. 2013. A task to assess behavioral pattern separation (BPS) in humans: Data from healthy aging and mild cognitive impairment. Neuropsychologia 51(12):2442–2449.
Storm, B. C., and Levy, B. J. 2012. A progress report on the inhibitory account of retrieval-induced forgetting. Memory and Cognition 40(6):827–843.
Tan, A.-H.; Carpenter, G. A.; and Grossberg, S. 2007. Intelligence through interaction: Towards a unified theory for learning. In ISNN, 1094–1103.
Tang, C.; Wang, D.; Tan, A.-H.; and Miao, C. 2017. EEG-based emotion recognition via fast and robust feature smoothing. In BI, 83–92.
Wang, D., and Tan, A.-H. 2014. Mobile humanoid agent with mood awareness for elderly care. In IJCNN, 1549–1556.
Wang, D., and Tan, A.-H. 2015a. Creating autonomous adaptive agent in a real-time first-person shooter computer game. IEEE Transactions on Computational Intelligence and AI in Games 7(2):123–138.
Wang, D., and Tan, A.-H. 2015b. MyLife: An online personal memory album. In WI-IAT, 243–244.
Wang, D., and Tan, A.-H. 2016. Self-regulated incremental clustering with focused preferences. In IJCNN, 1297–1304.
Wang, D., and Tan, A.-H. 2017. eHealthPortal: A social support hub for the active living of the elderly. In ICCSE, 19–25.
Wang, D.; Subagdja, B.; Kang, Y.; Tan, A.-H.; and Zhang, D. 2014. Towards intelligent caring agents for aging-in-place: Issues and challenges. In CIHLI, 1–8.
Wang, P.; Gauthier, I.; and Cottrell, G. 2016. Are face and object recognition independent? A neurocomputational modeling exploration. Journal of Cognitive Neuroscience 28(4):558–574.
Wang, D.; Tan, A.-H.; and Miao, C. 2016. Modelling autobiographical memory in human-like autonomous agents. In AAMAS, 845–853.
Wei, Z.; Wang, D.; Zhang, M.; Tan, A.-H.; Miao, C.; and Zhou, Y. 2018. Autonomous agents in Snake game via deep reinforcement learning. In ICA, 20–25.
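
The GA configuration described under “Emulating Memory Loss Across One’s Life Span” can be illustrated with a minimal sketch. It is not the authors’ implementation. It only assumes the stated settings (chromosome of 3 genes per life stage, tournament size 2 with probability 0.75, uniform crossover applied to every child, bounded mutation with rate 0.75, elitism ratio 0.1, population 200, 20 iterations, RMSE as the objective). The function toy_emulate and the values in TOY_TARGET are hypothetical stand-ins for the AM-ART emulation and the survey data. The reading of the mutation rate as a per-child probability and the Gaussian step size are also assumptions. The parallel pool of best-performers is omitted.

```python
import math
import random


def rmse(predicted, observed):
    """Root mean square error between two equal-length sequences."""
    total = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(total / len(observed))


def tournament(population, fitness, rng, size=2, p_best=0.75):
    """Draw `size` candidates; return the fitter one with probability p_best."""
    picks = sorted(rng.sample(range(len(population)), size),
                   key=lambda i: fitness[i])  # lower RMSE is better
    return population[picks[0] if rng.random() < p_best else picks[-1]]


def uniform_crossover(a, b, rng):
    """Take each gene from either parent with equal probability."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def bounded_mutation(chrom, rng, rate=0.75, sigma=0.1):
    """With probability `rate` (assumed per child), perturb one gene, clipped to [0, 1]."""
    chrom = list(chrom)
    if rng.random() < rate:
        i = rng.randrange(len(chrom))
        chrom[i] = min(1.0, max(0.0, chrom[i] + rng.gauss(0.0, sigma)))
    return chrom


def run_ga(emulate, target, n_stages, pop_size=200, iterations=20,
           elite_ratio=0.1, seed=0):
    """Minimise RMSE(emulate(chromosome), target); returns (best, best_rmse)."""
    rng = random.Random(seed)
    length = 3 * n_stages  # one (lambda, phi, mu) triple per life stage
    population = [[rng.random() for _ in range(length)]
                  for _ in range(pop_size)]
    n_elite = max(1, int(elite_ratio * pop_size))
    for _ in range(iterations):
        fitness = [rmse(emulate(c), target) for c in population]
        ranked = sorted(range(pop_size), key=lambda i: fitness[i])
        next_pop = [population[i] for i in ranked[:n_elite]]  # elitism
        while len(next_pop) < pop_size:
            child = uniform_crossover(tournament(population, fitness, rng),
                                      tournament(population, fitness, rng),
                                      rng)
            next_pop.append(bounded_mutation(child, rng))
        population = next_pop
    fitness = [rmse(emulate(c), target) for c in population]
    best = min(range(pop_size), key=lambda i: fitness[i])
    return population[best], fitness[best]


def toy_emulate(chrom):
    """HYPOTHETICAL stand-in for AM-ART: one recall ratio per life stage."""
    triples = [chrom[i:i + 3] for i in range(0, len(chrom), 3)]
    return [(1 - lam) * (1 - phi) * (1 - mu) for lam, phi, mu in triples]


TOY_TARGET = [0.2, 0.5, 0.8]  # made-up recall ratios, not survey data

if __name__ == "__main__":
    best, err = run_ga(toy_emulate, TOY_TARGET, n_stages=len(TOY_TARGET))
    print(len(best), round(err, 4))
```

Because the elite individuals are copied unchanged into the next generation, the best RMSE in this sketch cannot increase from one iteration to the next.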

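
The degree-2 polynomial extrapolation is used twice in the paper: to estimate the emotional coefficients of the 0s and 10s, and to predict parameter values for the subsequent life stage. It can be sketched as an ordinary least-squares fit over life-stage indices. This is a minimal illustration under stated assumptions, not the authors’ code. The names polyfit, polyval and extrapolate are hypothetical helpers, least squares is an assumed fitting criterion, and the numbers in the usage example are made up rather than taken from Table 1.

```python
def polyfit(xs, ys, degree):
    """Least-squares coefficients c[0] + c[1]*x + ... via the normal equations."""
    n = degree + 1
    # Augmented matrix [A^T A | A^T y] for the Vandermonde matrix A.
    m = [[sum(x ** (r + c) for x in xs) for c in range(n)]
         + [sum(y * x ** r for x, y in zip(xs, ys))]
         for r in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    # Back substitution.
    coeffs = [0.0] * n
    for r in reversed(range(n)):
        rest = sum(m[r][k] * coeffs[k] for k in range(r + 1, n))
        coeffs[r] = (m[r][n] - rest) / m[r][r]
    return coeffs


def polyval(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** k for k, c in enumerate(coeffs))


def extrapolate(known, targets, degree=2):
    """Fit on {stage index: value} and evaluate at the target stage indices."""
    xs = sorted(known)
    coeffs = polyfit(xs, [known[x] for x in xs], degree)
    return {t: polyval(coeffs, t) for t in targets}


if __name__ == "__main__":
    # Illustrative values for life stages 2..7 (20s to 70s); NOT from Table 1.
    known = {2: 0.46, 3: 0.51, 4: 0.54, 5: 0.55, 6: 0.54, 7: 0.51}
    print(extrapolate(known, [0, 1], degree=2))  # estimates for the 0s and 10s
    print(extrapolate(known, [0, 1], degree=1))  # linear alternative
```

Comparing the two printed dictionaries shows how the choice of degree changes the extrapolated values. The same helper can be applied to a learnt parameter sequence to obtain a next-stage estimate.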