Constructive role of plasticity rules in reservoir computing

Guillermo Gabriel Barrios Morales

Master's Thesis
Master's degree in Physics of Complex Systems at the UNIVERSITAT DE LES ILLES BALEARS
Academic year 2018-2019
Date: 13/09/2019
UIB Master's Thesis Supervisors: Miguel C. Soriano and Claudio R. Mirasso

"Our treasure lies in the beehives of our knowledge. We are perpetually on our way thither, being by nature winged insects and honey gatherers of the mind. The only thing that lies close to our heart is the desire to bring something home to the hive."
— Friedrich Nietzsche

ABSTRACT

Over the last 15 years, Reservoir Computing (RC) has emerged as an appealing approach in Machine Learning, combining the high computational capabilities of Recurrent Neural Networks with fast and easy training. By mapping the inputs into a high-dimensional space of non-linear neurons, this class of algorithms has shown its utility in a wide range of tasks, from speech recognition to time series prediction. With their popularity on the rise, new works have pointed to RC as a possible learning paradigm existing within the actual brain. Likewise, the successful implementation of biologically based plasticity rules into artificial RC networks has boosted the performance of the original models. Within these nature-inspired approaches, most research lines focus on improving the performance achieved by previous works on prediction or classification tasks. In this thesis, however, we address the problem from a different perspective: instead of focusing on the results of the improved models, we analyze the role of plasticity rules in the changes that lead to a better performance. To this end, we implement synaptic and non-synaptic plasticity rules in a standard Echo State Network (ESN), which is a paradigmatic example of an RC model. Testing on temporal series prediction tasks, we show evidence that the improved performance of all plastic models may be linked to a decrease in the spatio-temporal correlations in the reservoir, as well as a significant increase in the ability of individual neurons to separate similar inputs in their activity space. From the perspective of the reservoir dynamics, optimal performance is suggested to occur at the edge of instability. This hypothesis has been suggested previously in the literature, but we hope to provide new insight on the matter through the study of the different stages of plastic learning. Finally, we show that it is possible to combine different forms of plasticity (namely synaptic and non-synaptic rules) to further improve the performance on prediction tasks, obtaining better results than those achieved with single-plasticity models.

Key words: Reservoir Computing, plasticity rules, Hebbian learning, intrinsic plasticity, temporal series prediction.

Contents

1 Introduction
  1.1 On the shoulders of giants: a brief historical perspective.
  1.2 Motivation and objectives of the thesis.
2 Theoretical framework
  2.1 The ESN architecture.
  2.2 The role of hyper-parameters.
  2.3 Training a simple ESN.
  2.4 Neural plasticity: biological and artificial implementations.
    2.4.1 Plasticity in the brain.
    2.4.2 Plasticity in Echo State Networks.
  2.5 Prediction of a chaotic time series.
  2.6 Memory capacity task.
3 Results
  3.1 Hyper-parameter optimization.
  3.2 Performance in prediction task with MG-17.
  3.3 Performance in prediction task with MG-30.
  3.4 Performance in memory capacity task.
  3.5 Understanding the effects of plasticity.
4 Conclusions

1 Introduction

1.1 On the shoulders of giants: a brief historical perspective.

From the first bird-inspired "flying machines" of Leonardo da Vinci to the latest advances in artificial photosynthesis, humankind has constantly sought to mimic nature in order to solve complex problems. It is therefore not surprising that the dawn of Machine Learning (ML) and Artificial Neural Networks (ANN) was also characterized by the idea of emulating the functionalities and characteristics of the human brain. In his book The Organization of Behavior, Donald Hebb proposed in 1949 a neurophysiological model of neuron interaction that attempted to explain the way associative learning takes place [1]. Theorizing on the basis of synaptic plasticity, Hebb suggested that the simultaneous activation of cells would lead to the reinforcement of the synapses involved. Today known as Hebbian theory¹, its main postulate reads:

"When an axon of cell A is near enough to excite cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased." [1]

¹ As Hebb himself would recognize later, the idea of associative learning based on alterations in the strength of the connections between neurons was far from new [2]. Today, the merit of the theory is commonly conceded to Tanzi [3], although other authors claim that similar ideas had already been discussed by Alexander Bain even earlier [4].

Hebbian theory was swiftly taken up by neurophysiologists and early brain modelers as the foundation upon which to build the first working artificial neural network. In 1950, a young Nat Rochester at the IBM research lab embarked on the project of modeling an artificial cell assembly following Hebb's rules [4]. However, he would soon be discouraged by an obvious flaw in Hebb's initial theory: as connection strengths increase with the learning process, neural activity eventually spreads across the whole assembly, saturating the network. Further attempts were made to find the "additional rule" which could prevent the unbounded growth of connections, but none of them proved successful, and the project was soon consigned to oblivion.

It would not be until 1957 that Frank Rosenblatt, who had previously read The Organization of Behavior and sought a more "model-friendly" version of Hebb's assembly, came up with a solution: the Perceptron, the first example of a Feed Forward Neural Network (FFNN) [5]. Dismissing the idea of a homogeneous mass of cells,
Rosenblatt introduced three types of units within the network: sensory units (S-units), which transmit impulses when activated by a stimulus; association units (A-units), integrating the signals from the S-units and transmitting them to the R-units; and response units (R-units), which produce the output, becoming active if the total signal coming from the A-units exceeds a certain threshold. Rosenblatt's original model even included inhibitory connections between the S- and R-units and the A-units, making it a pioneering work in the implementation of feed-back connections within an ANN. The S, A and R sets in the original scheme correspond today to what are usually known as the input, hidden and output layers of a FFNN. Mathematically, the output of the perceptron is computed as:

f(x) = \begin{cases} 1 & \text{if } w \cdot x + b > 0 \\ 0 & \text{otherwise} \end{cases} \qquad (1)

where w · x is the dot product of the input x with the weight vector w, and b is a bias term that acts like a moving threshold. In modern FFNNs, the step function is usually substituted by a non-linearity ϕ(w · x + b), which receives the name of activation function. The basic scheme of an artificial neuron in the current notation of FFNNs, together with its biological counterpart, is depicted in Fig. 1.

Figure 1: Architecture of a basic artificial neuron. Figure credit to Greezes [6].

With a model computationally more applicable than Hebb's original ideas, Rosenblatt paved the way that would progressively detach ML from its biological inspiration, as he himself would state in the following lines introducing the Perceptron:

"The perceptron is designed to illustrate some of the fundamental properties of intelligent systems in general, without becoming too deeply enmeshed in the special, and frequently unknown, conditions which hold for particular biological organisms." [5]

Despite the initial excitement about the promising new model, it was quickly proved that perceptrons could not be trained to recognize many classes of patterns. In fact, they were only capable of learning linearly separable patterns, making the perceptron essentially a linear classifier. This was shown in 1969 by Marvin Minsky and Seymour Papert in a famous book entitled Perceptrons, where they stated that it was impossible for this type of network to learn an XOR function [7]. In their book, the authors already foresaw the need for Multilayer Perceptrons (MLP) to tackle non-linear classification problems, but the lack of a suitable learning algorithm led to the first of the AI winters [8], with neural network research stagnating for many years.

The solution to this problem came around 1974 through what are today widely known as backpropagation algorithms. Backpropagation was derived by multiple researchers in the early 1960s and implemented to run on computers as early as 1970 by Seppo Linnainmaa [9], but it was Paul Werbos, back then a PhD student at Harvard University, who first proposed that it could be used to effectively train MLPs [10]. Understood as a supervised learning method for multilayer networks, backpropagation aims to adjust the internal weights of each layer so as to minimize the error or loss function at the output, using a gradient-descent approach based on the chain rule to propagate the error from the outer to the inner layers.
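To make Eq. (1) and the gradient-descent idea behind backpropagation concrete, the following minimal sketch (not part of the original thesis; all weights, inputs and the learning rate are illustrative assumptions, and biases are omitted in the two-layer example for brevity) implements the step-function perceptron and a single hand-derived chain-rule update for a tiny one-hidden-layer tanh network.

```python
import numpy as np

def perceptron(x, w, b):
    """Rosenblatt's output rule, Eq. (1): return 1 if w.x + b > 0, else 0."""
    return 1 if np.dot(w, x) + b > 0 else 0

def backprop_step(x, y_target, W1, W2, lr=0.1):
    """One gradient-descent update of a one-hidden-layer tanh network,
    minimizing the squared output error; the chain rule propagates the
    error from the output layer back to the hidden-layer weights."""
    h = np.tanh(W1 @ x)                          # forward pass: hidden activity
    y = np.tanh(W2 @ h)                          # forward pass: network output
    delta_out = (y - y_target) * (1 - y**2)      # error at the output pre-activation
    delta_hid = (W2.T @ delta_out) * (1 - h**2)  # error propagated to the hidden layer
    W2 -= lr * np.outer(delta_out, h)            # gradient-descent weight updates
    W1 -= lr * np.outer(delta_hid, x)
    return W1, W2

if __name__ == "__main__":
    w, b = np.array([0.5, -0.3]), -0.1           # illustrative weights and bias
    x = np.array([1.0, 0.4])                     # example input
    print(perceptron(x, w, b))                   # -> 1, since 0.5 - 0.12 - 0.1 = 0.28 > 0

    rng = np.random.default_rng(0)
    W1, W2 = rng.normal(size=(3, 2)), rng.normal(size=(1, 3))
    W1, W2 = backprop_step(x, np.array([0.5]), W1, W2)
```

The contrast between the two functions mirrors the historical step sketched above: the perceptron adjusts a single layer of weights around a hard threshold, whereas the differentiable activation is what allows the error to be propagated backwards through several layers.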