Neural Networks: Learning Process

Prof. Sven Lončarić

[email protected] http://www.fer.hr/ipg

1 Overview of topics l Introduction l Error-correction learning l Hebb learning l Competitive learning l Credit-assignment problem l Supervised learning l Reinforcement learning l Unsupervised learning

2 Introduction l One of the most important ANN features is the ability to learn from the environment l The ANN learns through an iterative process of synaptic weight and threshold adaptation l After each iteration the ANN should have more knowledge about its environment

3 Definition of learning l Definition of learning in the ANN context: l Learning is a process in which the unknown ANN parameters are adapted through a continuous process of stimulation from the environment l The kind of learning is determined by the way in which the parameter changes take place l This definition implies the following sequence of events: l The environment stimulates the ANN l The ANN changes in response to the stimulation l The ANN responds to the environment in a new way because of the change

4 Notation l vj and vk are the activations of neurons j and k l xj = ϕ(vj) and xk = ϕ(vk) are the outputs of neurons j and k l Let wkj(n) be the synaptic weight from neuron j to neuron k at time n

[Figure: neuron j with activation vj and output xj, connected through the synaptic weight wkj to neuron k with activation vk and output xk]

5 Notation l If in step n the synaptic weight wkj(n) is changed by Δwkj(n) we get the new weight:

wkj(n+1) = wkj(n) + Δwkj(n)

where wkj(n) and wkj(n+1) are the old and the new weight between neurons j and k l A set of rules that solves the learning problem is called a learning algorithm l There is no unique learning algorithm; rather, there are many different learning algorithms, each with its own advantages and drawbacks

6 Algorithms and learning paradigms l Learning algorithms determine how the weight correction Δwkj(n) is computed l Learning paradigms determine the relation of the ANN to the environment l The three basic learning paradigms are: l Supervised learning l Reinforcement learning l Unsupervised learning

7 Basic learning approaches l According to learning algorithm: l Error-correction learning l Hebb learning l Competitive learning l Boltzmann learning l Thorndike learning l According to learning paradigm: l Supervised learning l Reinforcement learning l Unsupervised learning

8 Error-correction learning l Belongs to the supervised learning paradigm l Let dk(n) be the desired output of neuron k at time n l Let yk(n) be the obtained output of neuron k at time n l The output yk(n) is obtained using the input vector x(n) l The input vector x(n) and the desired output dk(n) represent an example that is presented to the ANN at time n l The error is the difference between the desired and the obtained output of neuron k at time n:

ek(n) = dk(n) - yk(n)

9 Error-correction learning l The goal of error-correction learning is to minimize an error function derived from the errors ek(n), so that the obtained output of all neurons approximates the desired output in some statistical sense l A frequently used error function is the mean square error:

J = E[ ½ ∑k ek²(n) ]

where E[·] is the statistical expectation operator, and the summation is over all neurons in the output layer

10 Error function l The problem with minimization of the error function J is that it is necessary to know the statistical properties of the random processes ek(n) l For this reason an estimate of the error function in step n is used as the optimization function:

E(n) = ½ ∑k ek²(n)

l This approach yields an approximate solution

11 Delta learning rule l Minimization of the error function with respect to the weights wkj(n) gives the Delta learning rule:

Δwkj(n) = η ek(n) xj(n)

where η is a positive constant determining the learning rate l The weight change is proportional to the error and to the value at the respective input l The learning rate η must be carefully chosen l A small η gives stability but learning is slow l A large η speeds up learning but brings a risk of instability
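A minimal Python sketch of delta-rule training for a single linear neuron (the toy data, target mapping, and parameter values are all invented for illustration):

    import numpy as np

    # Delta rule for one linear neuron: y = w . x
    # Hypothetical task: learn the target mapping d = 2*x1 - x2
    rng = np.random.default_rng(0)
    X = rng.uniform(-1.0, 1.0, size=(100, 2))   # input vectors x(n)
    d = 2.0 * X[:, 0] - 1.0 * X[:, 1]           # desired outputs d(n)

    eta = 0.1                                   # learning rate
    w = np.zeros(2)                             # initial weights

    for epoch in range(50):
        for x, d_n in zip(X, d):
            y = w @ x                           # obtained output y(n)
            e = d_n - y                         # error e(n) = d(n) - y(n)
            w += eta * e * x                    # delta rule: Δw(n) = η e(n) x(n)

    print(w)                                    # approaches [2, -1]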

12 Error surface l If we plot the error value J as a function of the synaptic weights we obtain a multidimensional error surface l The learning problem consists of finding the point on the error surface with the smallest error (i.e. minimizing the error)

13 Error surface l Depending on the type of neurons there are two possibilities: l If the ANN consists of linear neurons, the error surface is a quadratic function of the weights with one global minimum l If the ANN consists of nonlinear neurons, the error surface has one or more global minima and multiple local minima l Learning starts from an arbitrary point on the error surface and proceeds through the minimization process: l In the first case it converges to the global minimum l In the second case it can also converge to a local minimum

14 Hebb learning l Hebb's principle of learning says (Hebb, The Organization of Behavior, 1949): l When the axon of neuron A is close enough to activate neuron B and does so repeatedly, metabolic changes take place such that the efficiency of neuron A in activating neuron B is increased l Extension of this principle (Stent, 1973): l If one neuron does not influence (stimulate) another neuron, then the synapse between them becomes weaker or is completely eliminated

15 Activity product rule l According to Hebb principle weights are changed as follows:

Δwkj(n) = F(yk(n), xj(n))

where yk(n) and xj(n) are the output and the j-th input of the k-th neuron l A special case of this principle is:

Δwkj(n) = η yk(n) xj(n)

where the constant η determines the learning rate l This rule is called the activity product rule

16 Activity product rule l The weight update is proportional to the input value:

Δwkj(n) = η yk(n) xj(n)

[Figure: Δwkj as a function of xj, a line through the origin with slope ηyk]

l Problem: Iterative update with the same input and output causes continuous increase of the weight wkj
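A minimal sketch of this unbounded growth (hypothetical values; the output yk is held fixed here for clarity, while for a linear neuron yk = w·xj would itself grow, making the increase even faster):

    # Repeated activity product rule updates with the same input/output pair
    eta, x_j, y_k = 0.1, 1.0, 1.0
    w = 1.0
    for step in range(5):
        w += eta * y_k * x_j      # Δw = η y_k x_j, always positive here
        print(step, w)            # 1.1, 1.2, 1.3, ... grows without bound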

17 Generalized activity product rule l To overcome the problem of weight saturation, modifications have been proposed that are aimed at limiting the increase of the weight wkj l Non-linear limiting factor (Kohonen, 1988):

Δwkj(n) = η yk(n) xj(n) - α yk(n) wkj(n)

where α is a positive constant l This expression can be written as:

Δwkj(n) = α yk(n) [c xj(n) - wkj(n)]

where c = η/α
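A small sketch showing how the forgetting term stabilizes the weight; with a fixed output yk the weight converges to c·xj instead of growing without bound (the value of α is chosen arbitrarily for the example):

    # Generalized activity product rule: Δw = α y_k (c x_j - w), c = η/α
    eta, alpha = 0.1, 0.5
    c = eta / alpha                        # c = 0.2
    x_j, y_k, w = 1.0, 1.0, 1.0
    for step in range(20):
        w += alpha * y_k * (c * x_j - w)   # pulls w towards c * x_j
    print(w)                               # approaches c * x_j = 0.2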

18 Generalized activity product rule l In the generalized Hebb rule all inputs such that xj(n) > wkj(n)/c increase the weight wkj, while inputs with xj(n) < wkj(n)/c decrease it

[Figure: Δwkj as a function of xj, a line with vertical intercept -αykwkj that crosses zero at xj = wkj/c]

19 Competitive learning l Unsupervised learning l Neurons compete for the opportunity to become active l Only one neuron can be active at any time l Useful for classification problems l Three elements of competitive learning: l A set of neurons with randomly selected weights, so that they respond differently to a given input l A limit on the weight of each neuron l A mechanism for competition among the neurons so that only one neuron is active at any given time (winner-takes-all neuron)

20 Competitive learning l An example network with a single neuron layer

[Figure: inputs x1, x2, x3, x4 forming the input layer, fully connected to the neurons of the output layer]

21 Competitive learning l In order to win, the activity vj of neuron j must be the largest among all neurons for the given input vector x l The output yj of the winning neuron j is equal to 1; for all other neurons the output is 0 l The learning rule is defined as:

Δwji = η (xi - wji)   if neuron j won the competition
Δwji = 0              if neuron j lost the competition

l The learning rule has the effect of shifting the weight vector wj towards the input vector x
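A minimal winner-takes-all sketch (toy data; the number of neurons, the learning rate, and the renormalization step are choices made for this example, following the unit-sphere picture on the next slides):

    import numpy as np

    rng = np.random.default_rng(1)
    eta = 0.1
    X = rng.normal(size=(200, 4))
    X /= np.linalg.norm(X, axis=1, keepdims=True)   # unit-norm input vectors

    W = rng.normal(size=(3, 4))                     # 3 competing neurons
    W /= np.linalg.norm(W, axis=1, keepdims=True)   # unit-norm weight vectors

    for x in X:
        j = np.argmax(W @ x)            # winner: largest activity v_j = w_j . x
        W[j] += eta * (x - W[j])        # Δw_ji = η (x_i - w_ji), winner only
        W[j] /= np.linalg.norm(W[j])    # keep w_j on the unit sphere

    print(W)    # each weight vector has moved towards a cluster of inputs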

22 An example of competitive learning l Let us assume that each input vector has norm equal to one, so that the vector can be represented as a point on the N-dimensional unit sphere l Let us assume that the weight vectors also have norm equal to one, so they too can be represented as points on the N-dimensional unit sphere l During training, input vectors are presented to the network and the weights of the winning neuron are updated

23 An example of competitive learning l The learning process can be represented as movement of the weight vectors along the unit sphere

[Figure: weight vectors and input vectors on the unit sphere, shown in the initial state and in the final state]

24 Credit-assignment problem l The credit-assignment problem is an important issue in learning algorithms l The credit-assignment problem is the problem of assigning credit or blame for the overall learning outcome to the large number of internal decisions made by the learning system

25 Supervised learning l Supervised learning is characterized by the presence of a teacher

[Figure: block diagram in which the environment feeds both the teacher and the ANN; the teacher supplies the desired output, the ANN produces the obtained output, and their difference (+/-) forms the error signal fed back to adjust the ANN]

26 Supervised learning l The teacher has knowledge in the form of input-output pairs used for training l The error is the difference between the desired and the obtained output for a given input vector l The ANN parameters change under the influence of the input vectors and the error values l The learning process is repeated until the ANN learns to imitate the teacher l After learning is completed, the teacher is no longer required and the ANN can work without supervision

27 Supervised learning l The error function can be the mean square error; it depends on the free parameters (weights) l The error function can be represented as a multidimensional error surface l Any ANN configuration is defined by its weights and corresponds to a point on the error surface l The learning process can be viewed as movement of this point down the error surface towards the global minimum of the error surface

28 Supervised learning l A point on the error surface moves towards the minimum based on the gradient l The gradient at any point on the surface is a vector showing the direction of the steepest ascent, so the learning step moves the point in the opposite direction (steepest descent)
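A minimal numeric sketch of this steepest-descent step on a toy one-dimensional error surface (the surface and the step size are made up for illustration):

    # Gradient descent on a toy error surface J(w) = (w - 3)^2
    def grad_J(w):
        return 2.0 * (w - 3.0)    # dJ/dw points in the direction of steepest ascent

    w, eta = 0.0, 0.1
    for step in range(50):
        w -= eta * grad_J(w)      # step opposite to the gradient
    print(w)                      # approaches the minimum at w = 3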

29 Supervised learning l Examples of supervised learning algorithms are: l LMS (least-mean-square) algorithm l BP (back-propagation) algorithm l A disadvantage of supervised learning is that learning is not possible without a teacher l ANN can only learn based on provided examples

30 Supervised learning l Supervised learning can be implemented to work offline or online l In offline learning: l The ANN learns first l When learning is completed the ANN does not change any more l In online learning: l The ANN learns during the exploitation phase l Learning is performed in real-time – the ANN is dynamic

31 Reinforcement learning l Reinforcement learning is online in character l The input-output mapping is learned through an iterative process in which a measure of learning quality is maximized l Reinforcement learning overcomes the problem of supervised learning that explicit training examples are required

32 Reinforcement learning l In reinforcement learning the teacher does not present input-output training examples, but only gives a grade representing a measure of learning quality l The grade is a scalar value (a number) l The error function is unknown in reinforcement learning l The learning algorithm must determine the direction of motion in the learning space through a trial-and-error approach

33 Thorndike's law of effect l Reinforcement learning principle: l If the actions of the learning system result in a positive grade, then there is a higher likelihood that the system will take similar actions in the future l Otherwise the likelihood of taking such actions is reduced

34 Unsupervised learning l In unsupervised learning there is no teacher assisting the learning process l Competitive learning is an example of unsupervised learning

35 Unsupervised learning l The neurons of a layer compete for a chance to learn (to modify their weights based on the input vector) l In the simplest approach the winner-takes-all strategy is used

36 Comparison of supervised and unsupervised learning l The most popular algorithm for supervised learning is the error back-propagation algorithm l A disadvantage of this algorithm is bad scaling – learning complexity grows exponentially with the number of layers

37 Problems l Problem 2.1. l The Delta rule and the Hebb rule are two different learning algorithms. Describe the differences between these rules. l Problem 2.5. l An input of value 1 is connected to a synapse whose weight has an initial value equal to 1. Calculate the weight update using: l The basic Hebb rule with learning rate parameter η = 0.1 l The modified Hebb rule with η = 0.1 and c = 0.1
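A small sketch of how the updates in Problem 2.5 could be computed; assuming a linear neuron so that yk = wkj·xj, and taking α = η/c for the modified rule from slide 17 (both assumptions are my reading of the problem, not stated in it):

    # Problem 2.5 sketch: x_j = 1, initial w_kj = 1, eta = 0.1, c = 0.1
    x_j, w_kj, eta, c = 1.0, 1.0, 0.1, 0.1
    y_k = w_kj * x_j                               # = 1.0, linearity assumption

    dw_basic = eta * y_k * x_j                     # basic Hebb: Δw = η y_k x_j = 0.1

    alpha = eta / c                                # = 1.0, from c = η/α
    dw_modified = alpha * y_k * (c * x_j - w_kj)   # = 1.0 * (0.1 - 1.0) = -0.9

    print(dw_basic, dw_modified)                   # 0.1  -0.9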
