DEGREE PROJECT IN COMPUTER SCIENCE AND ENGINEERING, SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2020

Attractor Neural Network modelling of the Lifespan Retrieval Curve

PATRÍCIA PEREIRA

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE


KTH Royal Institute of Technology
School of Electrical Engineering and Computer Science
Master Programme in Systems, Control and Robotics
June 2020
Author: Patrícia Pereira, [email protected]
Supervisors: Pawel Herman, [email protected]; Anders Lansner, [email protected]
Examiner: Erik Fransén, [email protected]


Abstract

Human capability to recall episodic memories depends on how much time has passed since the memory was encoded. This dependency is described by a memory retrieval curve that reflects an interesting phenomenon referred to as the reminiscence bump: a tendency for older people to recall more memories formed during their young adulthood than in other periods of life. This phenomenon can be modelled with an attractor neural network, for example, the firing-rate Bayesian Confidence Propagation Neural Network (BCPNN) with incremental learning. In this work, the mechanisms underlying the reminiscence bump in the neural network model are systematically studied. The effects of synaptic plasticity, network architecture and other relevant parameters on the characteristics of the reminiscence bump are investigated. The most influential factors turn out to be the magnitude of dopamine-linked plasticity at birth and the time constant of the exponential plasticity decay with age, which together set the position of the bump. The other parameters mainly influence the general amplitude of the lifespan retrieval curve. Furthermore, the recency phenomenon, i.e. the tendency to remember the most recent memories, can also be parameterized by adding a constant to the exponentially decaying plasticity function representing the decrease in the level of dopamine neurotransmitters.

Keywords: reminiscence bump, attractor neural network, Bayesian Confidence Propagation Neural Network (BCPNN), recency, synaptic plasticity, episodic memory


Sammanfattning

The human ability to recall episodic memories depends on how much time has passed since the memories were encoded. This dependency is described by a so-called forgetting curve, which exhibits an interesting phenomenon known as the reminiscence bump. This is a tendency among older people to recall more memories from their youth and early adult years than from other periods of life. This phenomenon can be modelled with a neural network, a so-called attractor network, e.g. a non-spiking Bayesian Confidence Propagation Neural Network (BCPNN) with incremental learning. In this work, the mechanisms behind the reminiscence bump are systematically studied with the help of this neural network model. In particular, the importance of synaptic plasticity, network architecture and other relevant parameters for the emergence and character of this phenomenon is examined.

The most influential factors for the position of the bump were found to be the initial dopamine-dependent plasticity at birth and the time constant of the plasticity decay with age. The other parameters mainly affected the general amplitude of the lifespan retrieval curve. Furthermore, the recency effect, i.e. the tendency to best remember things that happened recently, could also be parameterized by a constant added to the otherwise exponentially decaying plasticity, which may represent the density of dopamine receptors.

Keywords: reminiscence bump, attractor neural network, Bayesian Confidence Propagation Neural Network (BCPNN), recency, synaptic plasticity, episodic memory


Acknowledgements

I would like to thank Professors Pawel Herman and Anders Lansner for their enthusiastic supervision throughout the project. A warm thanks to my friends and colleagues with whom I shared my academic journey. My most grateful thanks to my parents for their love, care and support.


To my dear parents.


“Thus, our knowledge of the world, including ourselves, is incomplete as to space and indefinite as to time. This ignorance, implicit in all our brains, is the counterpart of the abstraction which renders our knowledge useful” - McCulloch and Pitts


Contents

1. Introduction
   1.1 Research question
   1.2 Aim and scope
   1.3 Thesis Outline
2. Background
   2.1 Reminiscence bump
      2.1.1 Psychological and biological hypotheses
      2.1.2 Different ways of cuing lead to different bumps
   2.2 Neuronal computational models
      2.2.1 Abstract Models
      2.2.2 Detailed Models
      2.2.3 Attractor neural network memory modelling
      2.2.4 Other models
3. Methods
   3.1 Attractor Memory Network Model
      3.1.1 Modularity
      3.1.2 BCPNN learning and network dynamics
      3.1.3 Meaning of model parameters
   3.2 Simulation protocol
   3.3 Analysis and evaluation
4. Results
   4.1 Reminiscence bump
   4.2 Recency
5. Discussion
   5.1 Summary of findings
   5.2 Interpretation of the results and their impact
   5.3 Limitations
   5.4 Social, ethical and sustainability aspects
6. Conclusion and Future Work
Bibliography


Chapter 1 Introduction

The advancement of neuroscience is beneficial to humankind in many ways. There are, however, two main directions that have been tangibly capitalized on in recent times. The first is that an improved understanding of neurological and psychological mechanisms enables the development of better medical treatments and therapies for neurological illnesses. It can also empower society, as illustrated by headphones that aim to maximize motor learning by applying a small electric current to the area of the brain that controls movement¹. The other way in which a deepened understanding of the brain is beneficial is as a source of inspiration for algorithms with useful applications, such as the deep learning algorithms used in computer vision and speech processing. In this direction, it contributes to the development of more "human-like" and powerful artificial agents.

Within neuroscience, memory plays a key role. Studying memory is important because there has been increasing interest in tackling brain diseases, such as Alzheimer's disease and other types of dementia, of which memory deficits are common symptoms [1]. Memory is also a key aspect of cognition fundamental for intelligent behavior, namely in learning and decision processes [2]. From another perspective, life can be considered a sum of memories, important for keeping our identity and mental health.

The focus of this project is on long-term memory, more precisely episodic memory. Episodic memory is a category of long-term memory that concerns events that occurred throughout one's life; important personal experiences belong to this category [3]. Long-term memory concerns information stored in the brain over a long period of time. It is established through long-term potentiation and depression, by which circuits of neurons in the brain are strengthened or weakened, resulting in the strengthening or weakening of synapses and helping shape memory-specific ensembles [4]. Long-term memory is important because it concerns the ability to learn new information and to recall it later in time.

Human episodic memory recall exhibits several characteristic phenomena over the lifespan. One such phenomenon is the reminiscence bump, the tendency for people above middle age to recall more memories formed when they were 10-30 years old than in other periods of their life, which has consistently been observed in research [5]. This phenomenon was observed in 68 experiments between 1988 and 2017 [5]. Although the precise years in which the bump occurs vary across experiments, there is strong empirical evidence that the maximal proportion of memories comes from adolescence and young adulthood [5].

¹ www.haloneuro.com


Due to the importance of this phenomenon, computational models have been built to study its potential underlying mechanisms. The advantage of a computational model is that it allows us to predict how each parameter corresponding to a biological mechanism affects the reminiscence bump, which would otherwise be infeasible to test in human experiments. Among computational attempts at modelling the phenomenon in question are the Memory Chain Model [6] and the AM-ART model [7]. However, the focus in this work is on the Bayesian Confidence Propagation Neural Network (BCPNN), developed in various forms at KTH Royal Institute of Technology, one of the most successful approaches to date in memory modelling owing to its direct correspondence to neuronal circuits and its interpretation in terms of biological mechanisms [8]. BCPNN employs a Hebbian learning rule [9] derived from Bayes' rule, which allows this recurrent model to function as an associative attractor memory network [8]. BCPNN has also been used in feed-forward architectures for classification [10] and data mining [11]. Moreover, despite the availability of BCPNN learning in a more biologically plausible spiking neural network implementation [10], in this thesis a more abstract rate-based implementation is exploited, due to the long-term nature of the memory phenomenon that is at the center of attention here.

1.1 Research question

We hypothesize that synaptic plasticity parameters play a fundamental role in modulating the aforementioned reminiscence bump. In addition, we test the capability of the model to account for the improved recall of the most recently encoded memories, called the recency effect [12].

1.2 Aim and scope

The primary objective of this project is to study the effect of the synaptic plasticity parameters governing the incremental BCPNN learning process of a rate-based modular attractor memory network model on the characteristics of episodic memory recall. A further aim is to investigate how network size affects the storage of long-term memories over the modelled lifetime. It is expected that the network model's mechanisms can be interpreted in the context of neurobiological effects. This study therefore has the potential to provide an embryo for new neurobiological hypotheses, helping in the understanding of long-term effects in episodic memory recall.


In addition, it is intended to examine other phenomena in memory recall over the lifetime, e.g. the recency effect. The limitation defining the scope of the project is that the model only considers already formed long-term memory: it does not represent the transfer of memories from short-term into long-term memory, as the Memory Chain Model does [6], or the interaction between different areas of the brain, as in the Tracelink model [13]. Another limitation is the use of a firing-rate based network instead of its more realistic spiking counterpart. Finally, the focus is on the reminiscence bump and recency phenomena, so no other episodic memory recall effects are considered in the simulations.

1.3 Thesis Outline

Chapter 2 introduces the reminiscence bump, the psychological and biological hypotheses for this phenomenon, and how different ways of cuing lead to different bumps, and provides an overview of neuronal computational models. Chapter 3 describes the model, explains the meaning of the model parameters, and describes the simulation protocol and the analysis and evaluation. Chapter 4 presents the results. Chapter 5 discusses the results and confronts them with the hypotheses in Chapter 2. Chapter 6 presents the conclusion and a description of future work.


Chapter 2 Background

2.1. Reminiscence bump

The reminiscence bump is the tendency for people above middle age to recall more memories formed when they were 10-30 years old than in other periods of their life, and it has consistently been observed in autobiographical memory research [5]. Lifespan retrieval curves exhibiting the reminiscence bump from four different experimental studies are displayed in Figure 1.

Figure 1: Distribution of autobiographical memories from older adults as a function of reported age at time of event [14]


2.1.1. Psychological and biological hypotheses

There are several psychological and biological hypotheses for the origins of this phenomenon that are compatible and even support or complement each other.

A cognitive account [14] states that in childhood one is confronted with many novel events, but due to their rapid change they are less useful in later situations. Additionally, since novel events are more distinct and require more effort to process, memory organization changes constantly. On the other hand, in periods of stability starting from young adulthood, events are not that novel, so there is less encoding effort and increased proactive interference², resulting in poorer recall. Therefore, events from early adulthood, corresponding to the transition of memory organization from rapid change to stability, are the ones most likely to be recalled, since they have strong encoding in a stable memory organization with little proactive interference. Besides the reminiscence bump, the lifespan retrieval curve exhibits other phenomena such as childhood amnesia, the inability of adults to recall episodic memories from their early childhood, and recency, the tendency to retrieve recent memories. There exists work that attempts to remove recency from the experimental results, focusing on the reminiscence bump [15].

In the more concrete case of voluntary migration, the emigration period can be considered a period of novelty, in which the migrant is confronted with new realities and adaptation challenges. It is then followed by a period of stability, in which the immigrant settles down. In the light of the cognitive account, this would affect the reminiscence bump period of individuals who experienced migration, and the peak would be expected to correspond to the migration or adaptation period. This is exactly what is reported in experimental studies [16]. Seniors who had experienced migration were divided into groups according to their migration age. As expected, seniors who migrated during the usual bump period showed a bump in the usual bump period, which was also the migration period. More notably, seniors who migrated after the bump period showed a bump corresponding to the migration period instead of the usual bump period, reflecting the influence of periods of novelty and adaptation, and subsequent stability, on the reminiscence bump. There were no significant feature differences in their memories, nor were they more emotional than memories from other periods, indicating a purely cognitive adaptation phenomenon.

In the basic-systems model of episodic memory [17], it is claimed that episodic memory is formed through the interaction of other supportive cognitive systems, such as diverse sensory and action systems, memory systems, and other types of systems that result in multiple abilities such as search and retrieval, linguistic, emotional and

² Proactive interference is the interference effect of previously encoded memories on the encoding and retrieval of new memories. An example of proactive interference is the difficulty in remembering someone's new phone number after having previously learned the old one.

narrative capabilities. In this context, episodic memory is a subtype of autobiographical memory that concerns the salient experiences that occurred throughout one's life. This model can be used to support a cognitive abilities account [18] of the reminiscence bump, based on the assumption that the rise to and decline from the reminiscence bump coincides with the evolution of other cognitive abilities. In accordance with the basic-systems model, the bump would be a result of the level of functioning of these other abilities. To test this theory, experiments assessing verbal and visuospatial memory together with autobiographical memory retention confirmed the hypothesized link between the former abilities and the latter [18], indicating that several cognitive abilities might have a direct influence on the reminiscence bump. However, tests addressing processing speed, memory and intelligence showed a much more rapid increase and a much slower decrease in ability, which alone could not explain the evolution of the bump [14]. Nevertheless, this theory is still noteworthy and is featured throughout the autobiographical lifespan retrieval literature.

Another account, based on genetic fitness [14] and in line with Darwin's theory of evolution, states that since early adulthood is the period of reproduction, an enhanced memory would serve the purpose of boosting cognitive abilities for selecting the best mate. Thus, a stable memory ability throughout the lifespan would be traded for an enhanced memory to support cognitive abilities during the reproduction period. This explanation provides no direct mechanism to be tested but can rather be viewed as an underlying explanation for the abovementioned accounts.

There are also hypotheses based on identity formation, which result in slightly different distributions of autobiographical memory and its content across different cultures but with small differences in the reminiscence bump [19]. According to this account, late adolescence and early adulthood is the time when a person develops their ideals and vocations and defines themselves socially. Thus, this is the period from which events have a great impact and are integrated into one's view of oneself and one's life story, having higher importance in memory organization [14]. This could also motivate the cognitive and cognitive abilities accounts and benefit from them. There is a bump related to social identity, one's association with cultural and social groups, from the tens to the twenties, and a bump related to personal identity, one's formation of life objectives and significant relationships, from the twenties to the thirties [20].

Last among the psychological hypotheses about the origin of the reminiscence bump is one based on the life script, i.e. key events that one expects to experience at specific ages throughout one's life [21], such as completing school, getting a job, marrying or having a child. These expectations are hypothesized to guide how one constructs and recalls one's life story and are highly influenced by one's culture rather than being focused on the individual, in contrast to the previous accounts [21]. Notably, the recall of negative or traumatic events does not lead to a bump, which may be because they are not expected, since they are not present in the life script, which is composed of positive memories [22].


Consistent with these accounts of stronger encoding during the bump period, experimental data connect the bump to the age at the time of encoding rather than the age at the time of retrieval [23]. Possible neurobiological hypotheses about mechanisms underlying the reminiscence bump are the decrease of brain plasticity with aging due to the decrease of dopamine receptors [24] and the pruning of synapses with aging [25]. Such mechanisms are the focus of this modelling work, which is an extension of previous work with a similar aim [8].

2.1.2. Different ways of cuing lead to different bumps

There are several methods for lifespan retrieval experiments with humans, such as olfactory cues and different types of word cues. The Galton cue-word technique evolved from Galton's recall of memories triggered by objects in the environment to his creation of lists of cue words to trigger the recall of memories, timing the recall and noting the distribution of the memories over the lifespan [26]. The Crovitz and Schiffman technique attempts to improve on the Galton technique by reducing its bias and applying the improved technique to several participants [27].

Galton states in his work "Psychometric Experiments" [26] that ideas emerge by association with an object perceived by the senses. He used to let the mind come up with ideas that emerged from a certain object, being careful to avoid ideas emerging from previous ideas. Afterwards, he would collect his thoughts and draw conclusions about them. Galton usually walked around 400 meters in Pall Mall³ and scrutinized every object he saw until one or two thoughts arose. He then took note of them and proceeded to the next object, never allowing the mind to ramble. He noticed the great variety of ideas that could emerge by repeating the same walk several times, but also how repetitive they could be. He then created his method, which consisted in writing lists of words on different small pieces of paper and placing them under a book so that after some days he could read one word without knowing what the other words were. Then he used a chronograph to measure the time from reading a word to the emergence of an idea. He would come up with one to four ideas per word cue. This whole process required a very calm and neutral mindset. He would also note from which period of his life his ideas came and concluded that half of them were from the period after leaving college.

In fact, both the location and the size of the bump can vary according to the cueing method that is used, with several accounts for this phenomenon. From experiments with different cueing methods, such as asking for important memories versus providing a word cue, it is hypothesized that cueing could be more relevant than encoding for the bump size and location [28,29]. Word cuing permits any association between the cue and the memory, resulting in an earlier and smaller bump peak, contrasting with important-memory cuing, which produces a narrative-based search connected to a

³ A street in London

person's life story and produces a higher and later peak and a second peak in older years [29]. Olfactory cues result in a bump in the first decade of life [28,29,30]. This can be modelled in encoding by an accelerated decrease of plasticity of the olfactory-cued memory system, although other accounts for this earlier bump exist.

2.2. Neuronal computational models

In this project, the mechanisms of a neuronal computational model underlying reminiscence bump parametrization are studied. Thus, in this section an overview of neuronal computational models is provided.

2.2.1. Abstract Models

The McCulloch and Pitts model from 1943 resulted in the first brain-inspired network, whose units would correspond to basic brain cells [31] (Figure 2).

Figure 2: A McCulloch and Pitts unit

A unit receives several binary inputs $I_k$, representing synaptic inputs on dendrites, that are summed in the soma. The neuron fires if this summed input exceeds a threshold, resulting in the binary output $y$. If an inhibitory input is active, the neuron does not fire. Such units can be used to implement Boolean functions such as OR, AND and NOT, and several units in a network can implement more complex functions such as division by two. However, the model has its limitations, namely that the functions to be implemented need to be hard-coded and that it does not allow the implementation of functions that are not linearly separable, such as the XOR function.
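To make the mechanism concrete, here is a minimal Python sketch of such a unit; the threshold values and the absolute-inhibition rule follow the common textbook reading of the model and are illustrative assumptions.

```python
def mcculloch_pitts(excitatory, inhibitory, threshold):
    """Fire (output 1) if the summed excitatory input reaches the threshold
    and no inhibitory input is active (absolute inhibition)."""
    if any(inhibitory):
        return 0
    return 1 if sum(excitatory) >= threshold else 0

# An AND gate over two inputs: both must be active to reach threshold 2.
assert mcculloch_pitts([1, 1], [], threshold=2) == 1
assert mcculloch_pitts([1, 0], [], threshold=2) == 0
# An OR gate: a threshold of 1 suffices.
assert mcculloch_pitts([0, 1], [], threshold=1) == 1
```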


The Rosenblatt Perceptron proposed in 1957 [32] overcomes some drawbacks of the McCulloch and Pitts model (Figure 3).

Figure 3: A Rosenblatt perceptron

The $x$'s are the inputs and the $w$'s are the weights. The bias, $\theta$, is the negative of the activation threshold in the McCulloch and Pitts model. The weights are not binary, enabling flexibility in the weights' influence on the output, and can also be negative. Moreover, a weight with zero input does not lead to complete inhibition, and the perceptron can be trained with supervised learning to perform binary classification.
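As an illustration of the supervised training mentioned above, the following sketch applies the classical perceptron learning rule to a toy OR problem; the data, learning rate and epoch count are illustrative choices, not taken from the text.

```python
import numpy as np

def train_perceptron(X, y, epochs=20, lr=1.0):
    """Classical perceptron rule: nudge weights and bias by the error on each sample."""
    w = np.zeros(X.shape[1])
    theta = 0.0  # bias, the negative of the activation threshold
    for _ in range(epochs):
        for xi, target in zip(X, y):
            pred = 1 if xi @ w + theta > 0 else 0
            w += lr * (target - pred) * xi
            theta += lr * (target - pred)
    return w, theta

# Toy OR problem (linearly separable, so the rule converges).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 1])
w, theta = train_perceptron(X, y)
```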

2.2.2. Detailed Models

There are also more detailed models, such as the Integrate and Fire model [33] proposed by Louis Lapicque in 1907, which underlies spiking neural networks (Figure 4). By stimulating, with an electrical pulse, nerve fibers that excited a frog's leg muscle, Lapicque concluded that the nerve membrane is polarizable and can be compared to an RC circuit, a resistance in parallel with a capacitor.


Figure 4: An Integrate and Fire model unit described as an electric circuit

A presynaptic spike $\delta(t - t_j^{(f)})$ is low-pass filtered at the synapse and generates an input current pulse $\alpha(t - t_j^{(f)})$. A current $I(t)$ charges the circuit with resistance $R$ and capacitance $C$. The voltage $u(t)$ across the capacitance is compared to a threshold $\vartheta$, and if the threshold is exceeded the neuron fires, generating an output pulse $\delta(t - t_i^{(f)})$. The RC circuit obeys

$$I(t) = \frac{u(t)}{R} + C \frac{du(t)}{dt}$$

and rewriting it yields the membrane time constant $RC$:

$$RC \frac{du(t)}{dt} = -u(t) + R I(t)$$

The Hodgkin-Huxley model, proposed in 1952, described the alteration of the membrane potential in the squid giant axon with four ordinary differential equations that include the ionic mechanisms of sodium and potassium [34]. This work was awarded the Nobel Prize in 1963. Several improvements have been developed for this model, for instance the introduction of more ionic mechanisms discovered from experimental data. More simplified versions of the model have also been proposed, such as the FitzHugh-Nagumo model from 1961, with only two equations [35].
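A minimal Euler integration of the membrane equation above may help make the behavior concrete; the constants and the reset-to-zero convention after a spike are illustrative assumptions, not values from the text.

```python
# Euler integration of RC du/dt = -u + R*I(t), with spike-and-reset.
R, C, theta = 10.0, 1.0, 1.0      # resistance, capacitance, threshold (arbitrary units)
dt, T = 0.01, 100.0
u, spikes = 0.0, []
for step in range(int(T / dt)):
    I = 0.15                      # constant input current (illustrative)
    u += dt / (R * C) * (-u + R * I)
    if u >= theta:                # threshold crossing: emit a spike, then reset
        spikes.append(step * dt)
        u = 0.0
```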


2.2.3. Attractor neural network memory modelling

Concerning memory modelling, Hopfield networks [36] (Figure 5) have been suggested as models of biological memory, although they are not used in many applications today since more powerful types of networks exist for e.g. classification. However, this type of recurrent network model has been used to model cortical associative memory [37]. After being trained with a set of patterns, a Hopfield network can retrieve one of these patterns after being fed a distorted version of it. Its nodes can take, for example, values of 0 or 1, and there are links, i.e. symmetric connections, between these nodes. They result from a Hebbian learning rule⁴, i.e. the weights between neurons that are active at the same time are strengthened during training [9].

Figure 5: A Hopfield network

A computational study using a variant of Hopfield's network showed how attractor neural network models can qualitatively account for basic features of memory degradation in diffuse cerebral atrophy⁵ and be used to predict manifestations of Alzheimer's disease based on neurological conditions [38].
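A minimal sketch of Hebbian storage and recall as described above, using the common ±1 state convention (equivalent to the 0/1 description) and illustrative sizes.

```python
import numpy as np

rng = np.random.default_rng(0)
N, P = 100, 5
patterns = rng.choice([-1, 1], size=(P, N))

# Hebbian learning: strengthen weights between units that are co-active.
W = (patterns.T @ patterns) / N
np.fill_diagonal(W, 0)

def recall(cue, steps=10):
    """Iterate the network dynamics from a distorted cue until it settles."""
    s = cue.copy()
    for _ in range(steps):
        s = np.where(W @ s >= 0, 1, -1)
    return s

# Distort 10 units of the first pattern and let the network restore it.
cue = patterns[0].copy()
flip = rng.choice(N, size=10, replace=False)
cue[flip] *= -1
restored = recall(cue)
```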

Purely correlation-based learning rules, such as in the Hopfield network, lead to catastrophic forgetting, implying the loss of all memories when their number exceeds the network capacity. To overcome this, the learning rule must exhibit palimpsest properties, i.e. a gradual forgetting of older memories when learning new ones. Hopfield suggested "learning within bounds" [36], in which connection weights are bounded. This comes with a decrease of the network capacity from 0.137N to 0.05N, i.e. the palimpsest property is traded against long-term capacity.
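The "learning within bounds" idea can be sketched as clipping each weight after every new pattern is stored; the bound value below is arbitrary and only illustrates the mechanism.

```python
import numpy as np

def store_bounded(W, pattern, bound=0.3):
    """Hebbian update for one new pattern, then clip every weight to
    [-bound, bound]; old memories fade gradually instead of being lost
    catastrophically."""
    W = W + np.outer(pattern, pattern) / len(pattern)
    np.fill_diagonal(W, 0)
    return np.clip(W, -bound, bound)
```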

⁴ Hebb stated that "neurons that fire together, wire together".
⁵ Loss of neurons and of the connections between them


Another way of avoiding catastrophic forgetting is presented in the firing-rate Bayesian Confidence Propagation Neural Network (BCPNN), an attractor neural network with incremental learning developed by the Lansner group at KTH [8][39][40], which is used in this project and is explained in detail in the next chapter.

2.2.4. Other models

A computational theory of hippocampal function [41] makes use of a connectionist⁶ model that depicts a stimulus representation over many elements, as in Figure 6:

Figure 6: A generic connectionist network for associative learning [41]

The input layer is activated by the stimulus inputs. A function of the weighted sums of input activations activates the internal layer, which forms a new representation of the input. The output node is likewise a function of weighted sums of middle-layer node activations, and the output layer activations are interpreted as the network's response. In this model, learning about stimuli means associating their representations with appropriate outputs. Figure 7 shows the complete cortico-hippocampal connectionist model:

⁶ Connectionism explains mental phenomena using artificial neural networks, in which mental phenomena are described by interconnected networks of simple units and learning corresponds to modifying connection strengths based on experience


Figure 7: Cortico-hippocampal model [41]

The hippocampal network on the right is a predictive autoencoder that learns to recode stimulus information in its internal layer. The network on the left represents learning in the cerebral cortex and long-term memory storage. A more complete version of the model can have several such cortical networks modulated by a hippocampal network (or networks). This theory makes predictions regarding the effects of hippocampal lesions.

The Tracelink model [13] is a connectionist model composed of three systems: a trace system, a link system and a modulatory system, as depicted in Figure 8:

Figure 8: The Tracelink Model [5]

The trace system is represented by the circles and connections on the plane and represents the neocortical basis for memories. The link system is represented by the

six circles in the rectangle and the connections from these circles to the circles on the plane, and includes the hippocampus and certain other structures. Finally, the modulatory system, ∆W, includes certain basal forebrain nuclei and several areas that have a more controlling function. The encoding of a memory is depicted in Figure 9:

Figure 9: Encoding of a new memory [13]

In the first stage (A), trace elements are activated by a new memory. In the second stage (B), link elements are activated and relevant trace-link connections are enhanced. The modulatory system is activated. In the third stage (C) weak trace-trace connections are forming and the modulatory system is weakly activated. In the fourth stage (D) strong trace-trace connections have been formed and trace-link connections have faded away. The modulatory system is deactivated.

This model can account for many characteristics of amnesia by deactivating the link system during learning, and it can produce normal forgetting curves. It also provides an explanation for the advantage of learning under arousal for long-term recall.


The Memory Chain Model [6] is composed of a cascade of memory stores as depicted in Figure 10:


Figure 10: A – Memory systems at different time scales; B – Schematic of the Memory Chain Model [6]

When a new memory is encoded, a certain number of representations are formed in the first memory store. With time, this number of representations declines and some are transferred to a subsequent store. Each store has its own decline rate and the stores are organized in order of decreasing decline rate, representing the consolidation of short-term memories into long-term. The strength of a memory is proportional to the number of representations it has.
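A schematic two-store version of this cascade, with made-up decline and transfer rates chosen only so that the second store declines more slowly than the first, can illustrate the mechanism.

```python
# Two stores with decreasing decline rates; strength tracks total representations.
a1, a2 = 0.5, 0.05   # decline rates (store 2 declines more slowly than store 1)
mu = 0.1             # transfer rate from store 1 to store 2
dt, T = 0.01, 50.0
n1, n2, strength = 1.0, 0.0, []
for _ in range(int(T / dt)):
    dn1 = -(a1 + mu) * n1          # store 1 declines and transfers out
    dn2 = mu * n1 - a2 * n2        # store 2 receives transfers and declines slowly
    n1 += dt * dn1
    n2 += dt * dn2
    strength.append(n1 + n2)       # memory strength is proportional to the total
```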

This model can account for a range of amnesia data, namely temporal gradients⁷ in several animals, as well as datasets from human patients with several neurological diseases. There has been an attempt to replicate the reminiscence bump using this model by differentiating the memory distribution into two separate functions, a decline function and an encoding-sampling function [42].

⁷ A phenomenon characteristic of retrograde amnesia, consisting in a greater loss of memory for occurrences from the recent past than for occurrences from long ago


A more recent work is the Autobiographical Memory-Adaptive Resonance Theory (AM-ART) model [7], depicted in Figure 11:

Figure 11: AM-ART model [7]

AM-ART is a three-layer neural network. Event-specific knowledge is presented to the bottom layer F1 to encode life events in the middle layer F2, and a sequence of related events in F2 is encoded into an episode in layer F3.

Input channels $F_1^{1-2}$ receive inputs of time and location from the entorhinal cortex, $F_1^{3-4}$ receive inputs of people and activity from the fusiform gyrus, and $F_1^{5-6}$ receive inputs of emotion and imagery from the amygdala, as depicted by the arrows at the bottom. These inputs constitute the basis of events, as represented by the connections from the bottom to the middle layer. The episodic pattern is formed in the events layer and is connected to the episodes layer in the hippocampus. There is a flow of memory search and readout through all layers, as depicted by the grey arrows.

This model was successfully used to model the lifespan retrieval curve, producing a curve exhibiting the reminiscence bump as well as childhood amnesia and recency.


Chapter 3 Methods

3.1. Attractor Memory Network Model

In this subsection the model used in this project is presented and explained in more detail.

3.1.1. Modularity

The model has a specific modularity. In this network, a unit $\pi_{ii'}$ (Eq. 2) corresponds to the activity of a minicolumn, a local group of neurons that can be considered an elementary unit of the cortex. Minicolumns are in turn organized in groups, the hypercolumns. While a hypercolumn represents a feature of a memory, its minicolumns represent the values that the feature can take, as can be seen in Figure 12, which provides an example of the encoding of an object represented by two features:

Figure 12: A modular attractor memory network with BCPNN learning

The hypercolumn on the left (bigger circle) represents the orientation of a seen object, and each of its minicolumns (smaller circles) represents an angle of 0, 30, 60 or 90 degrees. The hypercolumn on the right represents color, and each of its minicolumns represents a different color. The activity within each hypercolumn is normalized. Minicolumns belonging to different hypercolumns have connections described by weights, as depicted in the figure for the connections of the 30- and 90-degree minicolumns. The network used in this project is an attractor network with BCPNN plasticity [8]. It has 144 units, organized in 12 hypercolumns with 12 minicolumns each, to be able to store the desired number of patterns.
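A sketch of how a memory pattern is represented in such a modular network, with one active minicolumn per hypercolumn, i.e. one value per feature; the sizes and helper names are illustrative.

```python
import numpy as np

H, M = 12, 12  # hypercolumns (features) and minicolumns (feature values)

def to_binary(active):
    """Expand per-hypercolumn active-minicolumn indices into a 0/1 vector of length H*M."""
    p = np.zeros(H * M)
    for hc, mc in enumerate(active):
        p[hc * M + mc] = 1.0
    return p

# e.g. feature 0 (orientation) takes value 2, feature 1 (color) takes value 5, ...
rng = np.random.default_rng(0)
pattern = to_binary(rng.integers(M, size=H))
```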


3.1.2. BCPNN learning and network dynamics

The differential equations governing unit behavior in the model are:

$$\tau_c \frac{dh_{ii'}(t)}{dt} = \beta_{ii'}(t) + \sum_{j}^{N} \log\left(\sum_{j'}^{M_j} w_{ii'jj'}(t)\,\pi_{jj'}(t)\right) - h_{ii'}(t) \qquad (1)$$

$$\pi_{ii'}(t) = \frac{e^{h_{ii'}}}{\sum_{j} e^{h_{ij}}} \qquad (2)$$

$$\frac{d\Lambda_{ii'}(t)}{dt} = \alpha\left(\left[(1-\lambda_0)\,\hat{\pi}_{ii'}(t) + \lambda_0\right] - \Lambda_{ii'}(t)\right) \qquad (3)$$

$$\frac{d\Lambda_{ii'jj'}(t)}{dt} = \alpha\left(\left[(1-\lambda_0)^2\,\hat{\pi}_{ii'}(t)\,\hat{\pi}_{jj'}(t) + \lambda_0^2\right] - \Lambda_{ii'jj'}(t)\right) \qquad (4)$$

$$\beta_{ii'}(t) = \log\left(\Lambda_{ii'}(t)\right) \qquad (5)$$

$$w_{ii'jj'}(t) = \frac{\Lambda_{ii'jj'}(t)}{\Lambda_{ii'}(t)\,\Lambda_{jj'}(t)} \qquad (6)$$

A set of active units, one per hypercolumn, indexed by $i$, represents an activated memory, and its level of activation, indexed by $i'$, is a confidence estimate. The unit support is $h_{ii'}$ and evolves according to Eq. 1. A background activity $\lambda_0$ is introduced to avoid logarithms of zero in the calculations (Eq. 3 and 4). The encoding of a memory consists in the modification of the network's weights, $w$, and biases, $\beta$ (Eq. 5 and 6), so that the configuration of unit activations corresponding to that memory becomes an attractor state of the network. While connection strengths between minicolumns belonging to different hypercolumns are represented by weights, minicolumns belonging to the same hypercolumn are related to each other via lateral inhibition, implemented with a softmax within each hypercolumn as in Eq. 2. There is a learning rate parameter in the incremental model, $\alpha$ in Eq. 3 and 4, which controls the strength of encoding of each memory, i.e. how much it modifies the network's weights and biases. Before the introduction of this incremental approach, a previous approach would encode several memories at once [8] by estimating the weight and bias probabilities by counting unit co-activations, which would result in catastrophic forgetting. The incremental approach differs from the counter approach in that it estimates the weights and biases (Eq. 6 and 5, respectively) with exponential moving averages $\Lambda_{ii'}$ and $\Lambda_{ii'jj'}$ of the activity and co-activity of the estimated unit activations $\hat{\pi}_{ii'}$


(Eq. 3 and 4). The advantage is that the learning rule can be applied online and the network exhibits palimpsest properties. It is therefore possible to mimic learning and gradual forgetting over time, which suits the objective of this project. Plasticity is governed by the whole set of differential equations. According to the first differential equation, the support of a unit, $h_{ii'}$, is driven by the weighted contributions of presynaptic units, $\sum_{j}^{N} \log\left(\sum_{j'}^{M_j} w_{ii'jj'}(t)\,\pi_{jj'}(t)\right)$, added to the unit bias $\beta_{ii'}$. This value is then passed through the normalization in Eq. 2, resulting in the unit activation $\pi_{ii'}$, which is used in the calculation of the exponential moving averages in Eq. 3 and 4, which are in turn used to update the biases and weights. These are then reused once again in the first differential equation. This cycle repeats continuously during learning. During recall the same happens, but with the learning rate $\alpha$ set to zero, preventing the weights and biases from changing their values.
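The following Python sketch illustrates this cycle (Eq. 1-6) with Euler integration. It is a minimal reading of the equations, not the exact simulation code used in this work; the initializations and the clamping convention are assumptions.

```python
import numpy as np

H, M = 12, 12                         # hypercolumns and minicolumns per hypercolumn
N = H * M
dt, tau_c, lam0 = 0.01, 1.0, 0.01     # Euler step, membrane time constant, background activity

h = np.zeros(N)                       # unit supports h
pi = np.full(N, 1.0 / M)              # unit activations pi
Lam_i = np.full(N, 1.0 / M)           # moving average of activity (Lambda_ii')
Lam_ij = np.full((N, N), 1.0 / M**2)  # moving average of co-activity (Lambda_ii'jj')

def softmax_per_hypercolumn(h):
    """Eq. 2: lateral inhibition normalizes activity within each hypercolumn."""
    out = np.empty_like(h)
    for k in range(H):
        s = slice(k * M, (k + 1) * M)
        e = np.exp(h[s] - h[s].max())
        out[s] = e / e.sum()
    return out

def step(clamped, alpha):
    """One Euler step. During training pass a clamped binary pattern and alpha > 0;
    during recall pass clamped=None and alpha = 0 so weights and biases stay fixed."""
    global h, pi, Lam_i, Lam_ij
    beta = np.log(Lam_i)                           # Eq. 5
    w = Lam_ij / np.outer(Lam_i, Lam_i)            # Eq. 6
    support = np.zeros(N)
    for k in range(H):                             # Eq. 1: log of weighted input per hypercolumn
        s = slice(k * M, (k + 1) * M)
        support += np.log(w[:, s] @ pi[s])
    h += dt / tau_c * (beta + support - h)
    pi = clamped if clamped is not None else softmax_per_hypercolumn(h)
    Lam_i += dt * alpha * ((1 - lam0) * pi + lam0 - Lam_i)                            # Eq. 3
    Lam_ij += dt * alpha * ((1 - lam0)**2 * np.outer(pi, pi) + lam0**2 - Lam_ij)      # Eq. 4
```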

3.1.3. Meaning of model parameters

Plasticity parameters

To model the degree of synaptic plasticity, the incremental model has a learning rate parameter, $\alpha$ in Eq. 3 and 4 [8], which controls the strength of encoding of each memory, i.e. how much it modifies the network's weights and biases. By decreasing it over time during learning, it can be used to represent the decrease of dopamine receptors combined with other aging phenomena, allowing the modelling of a reminiscence bump [8].

This way, $\alpha = \alpha_0 e^{-t/\tau_s} + \alpha_{cst}$, with $\tau_s$ being the time constant of the age-dependent plasticity decay, which mediates the decay of dopamine receptors. Both $\alpha_0$ and $\tau_s$ are parameters that can be varied to investigate their effect on the reminiscence bump. While performing experiments with this model, I observed that if the learning rate decay stopped at a certain age, it would be possible to model recency, the tendency to retrieve recent memories. This can be achieved by decomposing the evolution of the learning rate into a constant term $\alpha_{cst}$ added to the decaying exponential, a parameter that is also important to investigate.
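In code, the plasticity schedule just described can be written compactly, here with the default values from Table 2; setting $\alpha_{cst} > 0$ gives the constant floor used later to model recency.

```python
import numpy as np

def alpha(t, alpha0=0.3, tau_s=10.0, alpha_cst=0.0):
    """Age-dependent learning rate: exponential decay plus a constant floor."""
    return alpha0 * np.exp(-t / tau_s) + alpha_cst
```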

Neural parameters

The membrane time constant, $\tau_c$, can represent the RC time constant as in the Integrate and Fire model [33]. It can thus represent the time for the activation value to reach about 63% of its target value, corresponding to the charging of the capacitor through the resistance, or to reduce its value to about 37% in the absence of activity, with the level of activation playing the role of the capacitor voltage.

Other model parameters

A background noise activity $\lambda_0 \ll 1$ was introduced to avoid logarithms of zero in the calculations, resulting in all minicolumns having a minimal activity.


Regarding lateral inhibition, the softmax gain $\gamma$ that can be present in Eq. 2, multiplying $h$ in the numerator and denominator, represents how concentrated the estimate is around the larger input values.

Other model parameters that can be varied are the degree of memory cue perturbation, i.e. the number of hypercolumn swaps in the perturbed pattern, and the successful-recognition overlap threshold. The number of hypercolumn swaps represents the similarity of the cue to the target memory, and the overlap threshold represents the required vividness of the recalled memory.

Network parameters

The network size, more concretely the number $H$ in a network with $N = H \times M$ units organized in $H$ hypercolumns with $M$ minicolumns each, represents the number of minicolumns available to store each memory. Based on experience, it is often good to have $M = H$.

3.2. Simulation protocol

Training consists in sequentially clamping the activation of each pattern for a certain duration while letting the network's weights and biases evolve. During recall, the network state evolves after being presented with a perturbed version of a pattern, while the network's weights and biases are kept fixed. The recall overlap is the overlap between the actual pattern and the final state reached. In cued recall, the perturbed version of the pattern has a fixed number of randomly chosen hypercolumns with their activated minicolumn randomly swapped. For each pattern, the network is presented several times with a different perturbed version of that pattern and the overlaps are calculated. The measure of interest is the ratio of retrieval (Eq. 8).

To investigate the effect of, and sensitivity to, a certain parameter on the reminiscence bump characteristics, a greedy approach is followed: all parameters are kept constant in the simulation except the one that is being investigated. The values of the examined parameter are chosen as follows. If it is a parameter whose variation results in a horizontal translation of the bump (such as the initial plasticity and the time constant of the age-dependent plasticity decay), its value is varied from values that result in a bump significantly shifted to the left to values that result in a bump significantly shifted to the right. The effect of these parameters is systematically measured by fitting a function in which the middle age of the bump depends on the parameter value. If it is a parameter whose variation results in a general increase of the ratio of retrieval (all the other parameters), its value is varied from values that result in very low ratios of retrieval to values that result in very high ones. The effect of these parameters is systematically measured by fitting a function in which the total ratio of retrieval over the lifespan, i.e. the area under the ratio of retrieval curve, depends on the parameter value.
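A sketch of the cue-perturbation step described above, representing a pattern by the index of its active minicolumn in each hypercolumn; the representation and names are illustrative.

```python
import numpy as np

def perturb(active, n_swaps, M, rng):
    """Return a cue where the active minicolumn is randomly reassigned
    (possibly to the same value) in n_swaps randomly chosen hypercolumns."""
    cue = active.copy()
    for hc in rng.choice(len(active), size=n_swaps, replace=False):
        cue[hc] = rng.integers(M)
    return cue

rng = np.random.default_rng(0)
pattern = rng.integers(12, size=12)   # 12 hypercolumns, 12 minicolumns each
cue = perturb(pattern, n_swaps=6, M=12, rng=rng)
```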


The following values for the initial combination of parameters were chosen since they yield a realistic configuration of the bump, i.e. a graph similar to the experimental lifespan retrieval curves from studies with humans:

Parameter                                                   Value
Network size                                                144 units arranged in a 12-by-12 grid
Number of presented patterns                                70
Membrane time constant, $\tau_c$                            1
Softmax gain, $\gamma$                                      1
Euler step, $dt$                                            0.01
Initial plasticity, $\alpha_0$                              0.3
Time constant of plasticity decay, $\tau_s$                 10
Background activity level, $\lambda_0$                      0.01
Learning time                                               1
Clamping recall time                                        0.1
Recall time                                                 2
Number of swaps, $s$                                        6
Number of generated networks                                100
Number of perturbed patterns presented per age
in each network                                             100
Overlap threshold, $o_t$                                    11/12
Constant plasticity, $\alpha_{cst}$                         -

Table 2: Parameters for the initial configuration of simulations

Since the ratio of retrieval follows a Bernoulli distribution of the random variable that takes the value 1 if the threshold for successful recall overlap is exceeded and 0 otherwise, the variance of the ratio of retrieval is its value multiplied by one minus its value. The standard error of the mean is the square root of the variance divided by the square root of the sample size.
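In formula form, for a ratio of retrieval $r$ estimated from $n$ recall attempts:

$$\operatorname{Var}(r) = r(1-r), \qquad \operatorname{SEM} = \sqrt{\frac{r(1-r)}{n}}$$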


3.3. Analysis and evaluation

The network is presented with 70 patterns, where each pattern represents the salient episodic memories of one year of life. The overlap $o$ between two patterns, $p_1$ and $p_2$, is defined as

$$o = \frac{p_1 \cdot p_2}{\|p_1\|\,\|p_2\|} \qquad (7)$$

The overlaps between pairs of patterns are mostly between 0 and 0.25. Sometimes there is a higher overlap, but this does not constitute a problem, given that the usual overlap threshold defined for a successful recall is 0.916 (11/12) and that the overlap between a pattern and its perturbed version (the same pattern but with a certain number of randomly chosen hypercolumns with their activated minicolumn randomly swapped) is also high enough. These overlap values result from the total number of patterns presented and the network size. The ratio of retrieval $r$ of a pattern is defined as

$$r = \frac{\text{number of successful recalls}}{\text{number of recall attempts}} \qquad (8)$$
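A direct sketch of Eq. 7 for binary pattern vectors; for patterns with exactly one active minicolumn per hypercolumn, this cosine overlap equals the fraction of hypercolumns whose active minicolumn matches.

```python
import numpy as np

def overlap(p1, p2):
    """Cosine overlap between two binary pattern vectors (Eq. 7)."""
    return float(p1 @ p2) / (np.linalg.norm(p1) * np.linalg.norm(p2))
```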

With this network size, almost all patterns in the incremental network with a low constant learning rate of 0.01 have a ratio of retrieval of 1, and never below 0.99, which means that the network's capacity is not exceeded; hence there is no incorrect storage of patterns that would affect the validity of the experiments. This ratio of retrieval results from presenting the network with a perturbed pattern with 3 swaps. This means that the ratio of retrieval can only increase when presenting the network with a perturbed pattern yields an overlap increase from 0.75, at presentation time, to higher than 0.916 after relaxation, signaling convergence towards the original pattern. This choice of overlap values is based on the experiments of Sandberg et al. (2002), Figure 4 [39], in which the ratio of retrieval increases if the overlaps increase from 0.8 (2 random hypercolumn swaps in a 100-unit network arranged in 10 hypercolumns with 10 minicolumns each) to 0.85.


Chapter 4 Results

The primary aim of this work was to study the mechanisms underlying the modulation of the reminiscence bump obtained in long-term simulations of episodic memory retrieval using a modular attractor network with incremental BCPNN learning [8]. To study the effect of each parameter on the reminiscence bump, the parameter is varied in a greedy way and a curve is fitted relating the parameter variation to a quantitative measure of its effect on the reminiscence bump. All the parameters explained in section 3.1.3 are analyzed in this way. All simulations were performed under the same conditions as the example demonstrated in Figure 13 and have variances of similar magnitude.

The experimental paradigm, described in more detail in section 3.2, consisted in training the network with 70 patterns, one per simulated year within the network's lifetime, and then performing recall. In training, memories are sequentially presented to the network for 100 time steps (Euler steps). Recall consisted in clamping the network with a perturbed pattern for 10 time steps and letting the network state evolve for 200 time steps. Recall was performed sequentially for all the patterns after training. For each pattern, 100 recall attempts were made (trials with randomly perturbed cues) and the ratio of retrieval (Eq. 8) is shown in the graphs in this section. To evaluate the effect of a parameter on the reminiscence bump, all parameters are kept constant except for the one that is being investigated.


4.1. Reminiscence bump

In the first set of simulations we examined the individual effects of selected parameters on the characteristics of the reminiscence bump.

In Figure 13, an example of cued recall with the parameters in Table 2 is shown.

Figure 13: Recall with initial configuration of parameters and standard error of the mean


4.1.1. Network size

The network size, more concretely the number of hypercolumns, $H$, and minicolumns per hypercolumn, $M$, determines the memory capacity. In this work we decided to maintain the relationship $H = M$. The larger the network, the lower the crosstalk between the stored memory patterns and the higher the ratio of retrieval (Figure 14).

Figure 14: The memory retrieval performance over the network's lifetime depending on the network size ($H = M =$ {9, 10, 11, 13, 15, 17, 20})


In order to systematically analyze the effects of the network size on the lifespan retrieval curve, a more detailed analysis of the relation between the network size and the area under the ratio of retrieval curve was made, which demonstrated a sigmoidal relationship between the aforementioned entities (Figure 15).

Figure 15: Sigmoidal fit made to the simulation data explaining the dependence of the area under the ratio of retrieval curve on the network size

The area under the ratio of retrieval curve is the sum of the ratio of retrieval over all ages. It should therefore be interpreted as the total recall during an individual's lifetime and is used to account for the complete recall capability of the individual. The type of equation to fit, sigmoidal, was chosen as the one that visually best represents the data.


4.1.2. Initial plasticity level

The initial plasticity level, $\alpha_0$, represents the initial level of dopamine in the brain. If it is low, plasticity is very low at the network's older ages, resulting in the recall of only memories from the early years. If it is too high, only memories from recent years are recalled, since there is a high level of plasticity even after its decay, which makes the network adapt to the latest memories, which "overwrite" the early ones. This results in a shift of the bump mediated by the initial plasticity level (Figure 16).

Figure 16: The memory retrieval performance over the network's lifetime depending on the initial plasticity level ($\alpha_0 =$ {0.01, 0.1, 0.3, 1, 2, 10})


In order to systematically analyze the effects of the initial plasticity level on the lifespan retrieval curve, a more detailed analysis of the relation between the initial plasticity level and the middle age of the bump was made, which demonstrated an exponential relationship between the abovementioned entities (Figure 17).

Figure 17: Exponential fit made to the simulation data explaining the dependence of the middle age of the bump on the initial plasticity level


4.1.3. Time constant of the age-dependent plasticity decay

The time constant of the age-dependent plasticity decay, $\tau_s$, mediates the time it takes for the initial plasticity level to decrease. The higher it gets, the more the bump shifts to the right. If it is too high, plasticity is high throughout the lifespan and consequently the more recent memories are retrieved, as can be seen in Figure 18. It has the same effect as varying the initial plasticity parameter (compare with Figure 16). Interestingly, olfactory cues result in a bump in the first decade of life [28,29,30]. This can be modelled in encoding by an accelerated decrease of plasticity of the olfactory-cued memory system, i.e. by using a smaller value for $\tau_s$.

Figure 18: The memory retrieval performance over the network's lifetime depending on the time constant of the age-dependent plasticity decay ($\tau_s =$ {2, 8, 10, 15, 20, 50})


In order to systematically analyze the effects of the time constant of the age-dependent plasticity decay on the lifespan retrieval curve, a more detailed analysis of the relation between this time constant and the middle age of the bump was made, which demonstrated a sigmoidal relationship between the abovementioned entities (Figure 19).

Figure 19: Sigmoidal fit made to the simulation data explaining the dependence of the middle age of the bump on the time constant of the age-dependent plasticity decay


4.1.4. Background activity level

The background noise activity $\lambda_0 \ll 1$ was introduced to avoid logarithms of zero in the calculations, resulting in all the minicolumns having a minimal activity. If there were no background activity, the weights and biases would be symmetric and memories would not be forgotten, since the weights grow exponentially (Eq. 6) and the biases decrease exponentially (Eq. 5) while their overall sum stays the same; hence, the lower this activity, the higher the recall. For very low values of this activity, however, recall decreases as the background activity decreases, perhaps because the background activity is no longer sufficient and perturbs the calculations (Figure 20).

Figure 20: The memory retrieval performance over the network's lifetime depending on the background activity level ($\lambda_0 =$ {0.15, 0.12, 0.11, 0.1, 0.01, 0.001, 1e-5, 1e-10, 1e-25})


In order to systematically analyze the effects of the background activity level on the lifespan retrieval curve, a more detailed analysis of the relation between the background activity level and the area under the ratio of retrieval curve was made, which demonstrated a linear relationship between the abovementioned entities (Figure 21).

Figure 21: Linear fit made to the simulation data explaining the dependence of the area under the ratio of retrieval curve on the background activity level


4.1.5. Degree of memory cue perturbation

The degree of memory cue perturbation (a noisy pattern), or number of swaps, $s$, represents the similarity between the cue and the target memory, as explained in the subsection Meaning of model parameters in Methods. Thus, the higher the number of swaps, the lower the ratio of retrieval, because the perturbed pattern leads the network state to another attractor that does not correspond to the target one (Figure 22).

Figure 22: The memory retrieval performance over the network's lifetime depending on the degree of memory cue perturbation ($s =$ {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12})


In order to systematically analyze the effects of the degree of memory cue perturbation on the lifespan retrieval curve, a more detailed analysis of the relation between the degree of memory cue perturbation and the area under the ratio of retrieval curve was made, which demonstrated a sigmoidal relationship between the abovementioned entities (Figure 23).

Figure 23: Sigmoidal fit made to the simulation data explaining the dependence of the area under the ratio of retrieval curve on the degree of memory cue perturbation


4.1.6. Overlap threshold

The overlap threshold determines the overlap needed for a successful recall, representing the required vividness of the recalled memory. It is noticeable that for a threshold of 5/12 or larger, the ratio of retrieval is the same throughout the lifespan (Figure 24). Since the overlap between the 70 original patterns in a 12-by-12 network is usually between 0 and 3/12, the high ratio of retrieval for thresholds in this range is due to these overlap values. Each threshold value is 1/12 higher than the previous one because the network has 12 hypercolumns and 12 minicolumns.

Figure 24: The memory retrieval performance over the network's lifetime depending on the overlap threshold ($o_t =$ {1/12, 2/12, 3/12, 4/12, 5/12, 6/12, 7/12, 8/12, 9/12, 10/12, 11/12, 12/12})


In order to systematically analyze the effects of the overlap threshold on the lifespan retrieval curve, a more detailed analysis of the relation between the overlap threshold and the area under the ratio of retrieval curve was made, which demonstrated a sigmoidal relationship between the aforementioned entities (Figure 25).

Figure 25: Sigmoidal fit made to the simulation data explaining the dependence of the area under the ratio of retrieval curve on the overlap threshold


4.2. Recency

In the second set of experiments we focused on another aspect of the memory retrieval curve over the model's lifetime, reflecting the capability to recall recently memorized patterns, referred to as the recency phenomenon and visible as the rise in recall for the years above 50 in Figure 1. I observed that if the learning rate decay stopped at a certain age, it would be possible to model recency. It turns out that by decomposing the evolution of the learning rate as a constant plus a decreasing function, recency is achieved (Figure 26). Childhood amnesia, the inability of adults to recall episodic memories from their early childhood, is also observed. The modelling of recency is the secondary contribution of this work. The values of the parameters resulting in the recency graph (Figure 26) are those of Table 2 with the following changes:

• initial plasticity, $\alpha_0 = 0.25$
• constant plasticity, $\alpha_{cst} = 0.015$
• time constant of plasticity decay, $\tau_s = 8$
• number of swaps, $s = 8$

These changes were made to better fit the graphs from experimental studies with humans.
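In terms of the plasticity schedule sketched in section 3.1.3, this recency configuration corresponds, for example, to the following evaluation (illustrative only):

```python
# Plasticity at age 60 under the recency configuration.
rate_at_60 = alpha(60, alpha0=0.25, tau_s=8.0, alpha_cst=0.015)
# The exponential term is ~1.4e-4 here, so alpha_cst dominates in later years.
```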


Figure 26: Recency, with the standard error of the mean

The same experiment without the constant plasticity parameter was performed to allow the comparison of the two graphs (Figure 27). It can be observed that the introduction of the constant plasticity leads to a decrease in the recall of early memories, to the desired recency effect and to a shift of the bump to the right.

Figure 27: The experiment from Figure 26 without the constant plasticity


4.2.1. Constant plasticity

The constant plasticity is a fixed component of the plasticity level that sets the lower limit for the decay of plasticity throughout the lifetime. Increasing the constant plasticity results in a bump shifted further to the right and a stronger recency effect (Figure 28). If the fixed plasticity is too high, a bump forms at older ages due to the high plasticity at that age, which prevails over the decreasing plasticity function.

Figure 28: The memory retrieval performance over the network's lifetime depending on the constant plasticity ($\alpha_{cst} =$ {0, 0.01, 0.015, 0.02, 0.025, 0.1})


4.2.2. Time constant of the age-dependent plasticity decay

The time constant of the age-dependent plasticity decay, $\tau_s$, mediates the time it takes for the initial plasticity level to decrease. Increasing the time constant of the age-dependent plasticity decay results in a bump shifted further to the right and a stronger recency effect. If the time constant of the plasticity decay is too high, a bump forms at older ages due to the high plasticity at that age, which prevails over the fact that plasticity is decreasing over time (Figure 29).

Figure 29: The memory retrieval performance over the network’s lifetime depending on the time constant of the age-dependent plasticity decay (τ_s = {4, 8, 10, 15, 20, 50})
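To see why large values of either parameter keep plasticity high into old age and can create a late-life bump, one can plot the assumed plasticity function (see the sketch in Section 4.2) under both sweeps. A minimal plotting sketch, reusing the assumed functional form:

```python
import numpy as np
import matplotlib.pyplot as plt

def plasticity(age, alpha_0=0.25, alpha_cst=0.015, tau_s=8.0):
    # Assumed lifespan plasticity: exponential decay plus constant floor.
    return alpha_cst + alpha_0 * np.exp(-age / tau_s)

ages = np.arange(0, 71)
fig, (ax1, ax2) = plt.subplots(1, 2, sharey=True)
for a_cst in (0, 0.01, 0.015, 0.02, 0.025, 0.1):   # sweep of Figure 28
    ax1.plot(ages, plasticity(ages, alpha_cst=a_cst), label=f"a_cst={a_cst}")
for tau in (4, 8, 10, 15, 20, 50):                  # sweep of Figure 29
    ax2.plot(ages, plasticity(ages, tau_s=tau), label=f"tau_s={tau}")
ax1.set_xlabel("age"); ax2.set_xlabel("age"); ax1.set_ylabel("plasticity")
ax1.legend(); ax2.legend()
plt.show()
```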


Chapter 5 Discussion

5.1. Summary of findings

The parameters with the most substantial effect on the bump were the initial plasticity and the time constant of the age-dependent plasticity decay, since these set the position of the bump, i.e. the age at which the retrieval curve reaches its peak. The constant component of the plasticity, when added, makes it possible to model recency and also has a substantial effect on the position of the bump and on the shape of the lifespan retrieval curve. By tuning this parameter, a curve with a recency tail is obtained, and a bump in later years can also be produced. The other parameters are less relevant since they mainly influence the magnitude of the retrieval curve.

5.2. Interpretation of the results and their impact

All the psychological hypotheses presented in the beginning of this work are compatible with the neurobiological hypothesis that the mechanisms underlying the reminiscence bump are:

• the decrease of brain plasticity with aging due to dropping levels of dopamine receptors, and
• the pruning of synapses with aging.

These are represented in this model by the plasticity decaying over time. Dopamine D1 receptor activation influences synaptic plasticity [43]. It can provoke neuronal excitation or inhibition, resulting in synaptic potentiation or depression, i.e. an increase or decrease in the efficacy of the synapses, the “connections” between neurons. It is known that D1 receptor density decreases with aging [44]. By tuning the most important parameters (initial plasticity level, time constant of plasticity decay and constant plasticity) together with some other parameters, a curve with recency and childhood amnesia is produced. There was no need for a cascade of systems or for different encoding and forgetting functions, as in the attempt to replicate the reminiscence bump with the Memory Chain Model [6]. The curve is similar to the curve generated by the AM-ART model [7]. The parametrization of the curve with recency is still compatible with the neurobiological hypothesis of decreasing dopamine receptors and pruning of synapses with aging. The parametrization using the constant plasticity parameter suggests a biologically motivated assumption that the dopamine decay throughout the lifetime has a lower limit.

5.3. Limitations

Although all results seem realistic with this approach, which considers only long-term memory, the interaction between different brain areas was not represented, as it is in the Memory Chain Model [6] and the Tracelink model [13]. This could have been done by connecting several networks representing the different areas that deal with memory and making use of synaptic adaptation, adding realism to the model; however, since only long-term memory is considered, this did not seem necessary. Furthermore, if we had a measure of how strongly encoded a pattern is, such as a sensitivity index [45] (one such index is sketched below), we could replicate experimental forgetting curves, i.e. how strongly a pattern is encoded over time for each age. We could then tune the model to have the same forgetting rates as the experimental data and thus be more realistic: the values obtained would carry more meaning and the analysis could be quantitatively more accurate. It should be made clear that the precise values used in the parametrization are of limited relevance; the qualitative relations, however, are expected to be interpretable. Thus, this approach allowed us to understand the origin of the translation of the bump as well as of the decrease and increase of its amplitude, while the precise values play little role. A lesson learned is that, since there are different models and ways in which a phenomenon can be parametrized, one has to choose the model based on the assumptions one is willing to make and the level of realism one wants to achieve.
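One standard choice for such an index is d' from signal detection theory, which measures how well the overlap distribution obtained when cueing stored patterns separates from that obtained with lures. A minimal sketch with hypothetical overlap samples (the numbers are illustrative, not simulation results):

```python
import numpy as np

def d_prime(signal, noise):
    """Sensitivity index: separation of the two overlap distributions
    in units of their pooled standard deviation."""
    pooled_sd = np.sqrt(0.5 * (np.var(signal, ddof=1) + np.var(noise, ddof=1)))
    return (np.mean(signal) - np.mean(noise)) / pooled_sd

rng = np.random.default_rng(2)
stored_overlaps = rng.normal(0.8, 0.1, size=200)  # cueing stored patterns
lure_overlaps = rng.normal(0.2, 0.1, size=200)    # cueing unstored lures
print(f"d' = {d_prime(stored_overlaps, lure_overlaps):.2f}")
```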

5.4. Social, ethical and sustainability aspects

5.4.1. Ethical aspects

This project makes use of a computational approach to study the brain, an approach that helps avoid excessive experiments on humans and animals. There is no risk of complications or damage to the brain resulting from chemical or invasive techniques, e.g. the manipulation of dopamine levels in the brain or the stimulation of neurons. This is a safe approach, and yet it allowed us to formulate assumptions about the lower limit of the dopamine decay throughout the lifetime. This work predicts the effects that changing the level of dopamine has on the reminiscence bump and may therefore suggest a pharmacological intervention to produce these effects in real life. Here, we can raise the ethical question of whether pharmacological control over cognitive skills is a better option than natural training methods such as mental exercises and hard work.


Another ethical question is whether we should manipulate the natural evolution of memory skills at all, since their decline is a natural process of aging. Boundaries should therefore be drawn to determine when the pharmacological approach should be applied and to whom, even if the results seem promising.

5.4.2. Social aspects

The study of the brain has a direct impact on the neurology community, because the more that is known about its functioning, the better the diagnosis and treatment of neurological diseases become. Such improvements increase the well-being of patients and help preserve their interpersonal relationships. This constitutes a positive social impact, preventing the isolation and marginalization of elderly people and other patients with neurological and psychiatric disorders and slowing down the degeneration caused by these diseases.

5.4.3. Sustainability aspects

Promoting healthy lives and well-being at all ages is a sustainable development goal that should be taken into account in a more humane society. As mentioned throughout this section, studying memory can reduce the impairments caused by diseases that affect brain function, which have a higher incidence among the elderly; this is important because life expectancy is continuously increasing. From a technological and economic perspective, the algorithmic modelling of brain function contributes to the development of better AI technologies that take advantage of how the brain works to improve their performance. This can increase productivity in several applications, with benefits for the economy and society.


Chapter 6 Conclusion and Future Work

In this project, the human lifespan retrieval curve was modelled with an incremental attractor network model, and the effects of several model parameters were analyzed in a systematic way. The objective was to study the mechanisms that modulate the bump characteristics, i.e. its position and magnitude, in this firing-rate attractor neural network model with BCPNN plasticity [8].

The parameters with the most significant effect on the bump characteristics were the initial plasticity and the time constant of the age-dependent plasticity decay, which set the position of the bump. The constant component of the plasticity, when added, also demonstrated a significant impact on the position of the bump and on the shape of the lifespan retrieval curve. The network size has to be large enough for the storage of all the patterns, and the magnitude of the retrieval curve increases with the network size. The other parameters mainly influence the magnitude of the retrieval curve.

Despite the model’s simplicity and high level of abstraction, it has demonstrated considerable potential to simulate the phenomena of the human lifespan retrieval curve. This firing-rate attractor neural network with BCPNN plasticity [8] provides insights into several mechanisms underlying the reminiscence bump characteristics, and even recency and childhood amnesia, so there does not seem to be a motivated need for more complex or spiking models to replicate these phenomena. Such models could, however, be considered in order to add more biological realism to the modelling.

As for future work, it would be interesting to investigate whether the effects of varying all the parameters are preserved under different baseline settings, and which parameter values would form a good starting point for varying the parameters. Some preliminary results indicate that a softmax factor of 2 would be better than the value of 1 used here, since it significantly increases the ratio of retrieval (a sketch of how this factor can enter the unit activations is given below). Moreover, the effect of all the parameters could be tested in a model with recency, and the modality-dependent bump position could be reproduced. Furthermore, the parameters could be tuned to even more realistic values. If we had a measure of how strongly encoded a pattern is, such as a sensitivity index [45], we could replicate experimental forgetting curves, i.e. how strongly a pattern is encoded over time for each age, so that the model has the same forgetting rates as the experimental data. Finally, a more advanced parameter sensitivity analysis could be performed.
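Assuming the softmax factor enters as a gain on the support values within each hypercolumn, a common convention in BCPNN-style models (this is a sketch under that assumption, not the exact implementation used here), its effect can be illustrated as:

```python
import numpy as np

def hypercolumn_activity(support, G=1.0):
    """Softmax over the minicolumns of one hypercolumn with gain G (the
    'softmax factor'); G=1 corresponds to the setting used in this work,
    while a larger G sharpens the competition between minicolumns."""
    e = np.exp(G * (support - support.max()))  # max-shift for stability
    return e / e.sum()

s = np.array([0.1, 0.5, 0.2])
print(hypercolumn_activity(s, G=1.0))  # softer activity distribution
print(hypercolumn_activity(s, G=2.0))  # more winner-take-all
```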


Bibliography

[1] Bäckman, L. (Ed.), “Memory functioning in dementia.” Advances in Psychology, 1992

[2] Jaušovec, N. and Jaušovec, K., “Working memory training: Improving intelligence – Changing brain activity.” Brain and Cognition, vol. 79, no.2, pp. 96-106, 2012

[3] Wheeler, M. E. and Ploran, E. J., “Episodic Memory.” Encyclopedia of Neuroscience, pp. 1167-1172, 2009

[4] Abraham W. C., Jones O. D. and Glanzman D. L., “Is plasticity of synapses the mechanism of long-term memory storage?” Science of Learning, vol. 4, no 1., pp. 1-10, 2019.

[5] Munawar K., Kuhn S. K. and Haque S., “Understanding the reminiscence bump: A systematic review.” Plos One, vol. 13, no. 12, 2018.

[6] Murre J.M.J., Chessa A. G and Meeter M., “A mathematical model of forgetting and amnesia.” Frontiers in psychology, vol.4, pp.76, 2013

[7] Wang, D., Tan, A., Miao, C. and Moustafa, A., “Modelling Autobiographical Memory Loss across Life Span.” Proceedings of the AAAI Conference on Artificial Intelligence, 2019

[8] Sandberg, A., “Bayesian attractor neural network models of memory.” Dissertation, Stockholm University, 2003.

[9] Hebb, D.O. “The organization of behavior.” New York: Wiley, 1949

[10] Lansner, A. and Holst, A., “A Higher Order Bayesian Neural Network with Spiking Units.” International Journal of Neural Systems, pp. 115-28, 1996.

[11] Orre, R., Lansner, A., Bate, A. and Lindquist, M., “Bayesian neural networks with confidence estimations applied to data mining.” Computational Statistics & Data Analysis vol. 34 pp. 473- 493, 2000.

[12] Janssen, S. M. J., Rubin, D. C. and St. Jacques, P. L., “The temporal distribution of autobiographical memory: changes in reliving and vividness over the life span do not explain the reminiscence bump.” Memory and Cognition, vol. 39, pp. 1-11, 2011

[13] Meeter, M. and Murre J. M. J., “Tracelink: A model of amnesia and consolidation” Cognitive Neuropsychology, vol. 22, no. 5, pp. 559-587, 2005

[14] Rubin D., Rahhal, T. and Poon, L., “Things learned in early adulthood are remembered best” Memory & Cognition, vol. 26, no. 1, pp. 3-19, 1998.


[15] Janssen, S., Gralak, A. and Murre, J., “A model for removing the increased recall of recent events from the temporal distribution of autobiographical memory.” Behavior Research Methods, vol. 43, no. 4, pp. 916-930, 2011

[16] Schrauf, R. W. and Rubin, D. C., “Effects of voluntary immigration on the distribution of autobiographical memory over the lifespan.” Applied Cognitive Psychology, vol. 15, no. 7, pp. S75-S88, 2001

[17] Rubin, D. C., “The Basic-Systems Model of Episodic Memory.” Perspectives on Psychological Science, vol. 1, no. 4, pp. 277-311, 2006

[18] Janssen S.M.J., Kristo, G., Rouw R. and Murre J.M.J.,”The relation between verbal and visuospatial memory and autobiographical memory.” Consciousness and Cognition, vol. 31, pp. 12-23, 2015

[19] Conway, M. A., Wang, Q., Hanyu, K. and Haque, S., “A Cross-Cultural Investigation of Autobiographical Memory: On the Universality and Cultural Variation of the Reminiscence Bump.” Journal of Cross-Cultural Psychology, vol. 36, no. 6, pp. 739-749, 2005

[20] Holmes, A. and Conway, M. A., “Generation identity and the reminiscence bump: Memory for public and private events.” Journal of Adult Development, vol. 6, no. 1, pp. 21-34, 1999

[21] Berntsen D. and Rubin D.C., “Cultural life scripts structure recall from autobiographical memory.” Memory & Cognition, vol. 32, no. 3, pp. 427-442, 2004

[22] Berntsen, D. and Rubin, D.C., "Emotionally charged autobiographical memories across the life span: The recall of happy, sad, traumatic, and involuntary memories". Psychology and Aging, vol. 17, no. 4, pp. 636–652, 2002

[23] Janssen, S., Chessa, A. and Murre, J., “The reminiscence bump in autobiographical memory: Effects of age, gender, education, and culture.” Memory, vol. 13, no. 6, pp. 658-668, 2005.

[24] Karrer, T. M., Josef, A. K., Mata, R., Morris, E. D. and Samanez-Larkin, G. R., “Reduced dopamine receptors and transporters but not synthesis capacity in normal aging adults: a meta-analysis.” Neurobiology of Aging, vol. 57, pp. 36-46, 2017.

[25] Peters, A., Sethares, C. and Luebke, J. I., “Synapses are lost during aging in the primate prefrontal cortex.” Neuroscience, vol. 152, no. 4, pp. 970-981, 2008

[26] Galton, F. “Psychometric experiments.” Brain, vol. 2, pp. 149-162, 1879

[27] Crovitz, H. F., and Schiffman, H. “Frequency of episodic memories as a function of their age.” Bulletin of the Psychonomic Society, vol. 4, 1974

[28] Rubin, D. C., “One bump, two bumps, three bumps, four? Using retrieval cues to divide one autobiographical memory reminiscence bump into many.” Journal of Applied Research in Memory and Cognition, vol. 4, no. 1, pp. 87-89, 2015.


[29] Koppel, J. and Rubin, D.C., “Recent Advances in Understanding the Reminiscence Bump: The Importance of Cues in Guiding Recall From Autobiographical Memory.” Current Directions in Psychological Science, vol. 25, no. 2, pp. 135-140, 2016

[30] Larsson, M. and Willander, J. “Autobiographical odor memory.” Annals of the New York Academy of Sciences, vol. 1170, pp. 318-323, 2009.

[31] McCulloch, W. S., and Pitts, W., “A Logical Calculus of the Ideas Immanent in Nervous Activity.” Bulletin of Mathematical Biophysics, vol 5, pp. 115-133, 1943

[32] Rosenblatt, F., “The Perceptron - a perceiving and recognizing automaton.” Report 85 – 460 - 1, Cornell Aeronautical Laboratory, 1957

[33] Brunel, N. and van Rossum M.C.W., “Lapicque’s 1907 paper: from frogs to integrate-and-fire” Biol Cybern, vol. 97, pp. 337-339, 2007

[34] Hodgkin, A. L. and Huxley, A. F., “A quantitative description of membrane current and its application to conduction and excitation in nerve.” Journal of Physiology, vol. 119, no. 4, pp. 500-544, 1952

[35] FitzHugh R., “Impulses and Physiological States in Theoretical Models of Nerve Membrane." Biophysical Journal, vol. 1, no. 6, pp 445-466, 1961

[36] Hopfield, J. J., “Neural Networks and Physical Systems with Emergent Collective Computational Abilities.” Proceedings of the National Academy of Sciences of the United States of America, vol. 79, no. 8, pp. 2554-2558, 1982

[37] Lansner, A., “Associative memory models: from the cell-assembly theory to biophysically detailed cortex simulations.” Trends in Neurosciences, vol. 32, no. 3, pp. 178-186, 2009

[38] Ruppin E. and Reggia J., “A Neural Model of Memory Impairment in Diffuse cerebral Atrophy”, British Journal of Psychiatry, vol. 166, pp. 19-28, 1995

[39] Sandberg, A., Lansner, A., Petersson K.M. and Ekeberg, O., “A Bayesian attractor network with incremental learning.” Network: Computation in Neural Systems, vol. 13, no. 2, pp. 179-194, 2002

[40] Lansner, A., Sandberg, A., Petersson, K. M., & Ingvar, M. “On forgetful attractor network memories.” Artificial neural networks in medicine and biology: Proceedings of the ANNIMAB- 1 Conference (eds. Malmgren, H., Borga, M. & Niklasson, L.) 54–62 Springer, 2000.

[41] Gluck M. A, and Myers C. E., “Hippocampal mediation of stimulus representation: A computational theory”, Hippocampus, vol. 3, no.4, pp. 491-516, 1993

[42] Janssen S. M. J., Chessa A. G. and Murre J. M. J., “Modelling the reminiscence bump in autobiographical memory with the Memory Chain Model”, Constructive Memory, NBU Series in Cognitive Science, pp. 138-147, 2003


[43] Hagena, H. and Manahan-Vaughan, D., “Dopamine D1/D5, But not D2/D3, Receptor Dependency of Synaptic Plasticity at Hippocampal Mossy Fiber Synapses that Is Enabled by Patterned Afferent Stimulation, or Spatial Learning.” Frontiers in Synaptic Neuroscience, vol. 8, pp. 31, 2016

[44] Abdulrahman, H., Fletcher, P. C., Bullmore, E., Morcom, A. M., “Dopamine and memory dedifferentiation in aging.” Neuroimage, vol. 153, pp. 211-220, 2017.

[45] Iatropoulos, G., “Modeling the Development of Synaptic Memory: Implications for Reminiscence Bumps and Forget Rates.” Ongoing.


TRITA-EECS-EX-2020:445

www.kth.se