
Neural Networks: Learning Process
Prof. Sven Lončarić
[email protected]
http://www.fer.hr/ipg

Overview of topics
- Introduction
- Error-correction learning
- Hebb learning
- Competitive learning
- Credit-assignment problem
- Supervised learning
- Reinforcement learning
- Unsupervised learning

Introduction
- One of the most important features of an ANN is its ability to learn from the environment
- An ANN learns through an iterative process of adapting its synaptic weights and thresholds
- After each iteration, the ANN should have more knowledge about its environment

Definition of learning
- In the ANN context, learning is a process in which the unknown parameters of the ANN are adapted through a continuous process of stimulation by the environment
- The type of learning is determined by the way the parameter changes take place
- This definition implies the following sequence of events:
  - The environment stimulates the ANN
  - The ANN changes as a result of the stimulation
  - Because of the change, the ANN responds differently to the environment

Notation
- vj and vk are the activations of neurons j and k
- xj = ϕ(vj) and xk = ϕ(vk) are the outputs of neurons j and k
- wkj(n) is the synaptic weight between neurons j and k at time step n
(Figure: neuron j feeds its output xj through the weight wkj into neuron k.)
- If in step n the synaptic weight wkj(n) is changed by Δwkj(n), the new weight is:
    wkj(n+1) = wkj(n) + Δwkj(n)
  where wkj(n) and wkj(n+1) are the old and new weights between neurons k and j
- A set of rules that solves the learning problem is called a learning algorithm
- There is no unique learning algorithm; there are many different learning algorithms, each with its own advantages and drawbacks

Algorithms and learning paradigms
- A learning algorithm determines how the weight correction Δwkj(n) is computed
- A learning paradigm determines the relation of the ANN to its environment
- The three basic learning paradigms are:
  - Supervised learning
  - Reinforcement learning
  - Unsupervised learning

Basic learning approaches
- According to the learning algorithm:
  - Error-correction learning
  - Hebb learning
  - Competitive learning
  - Boltzmann learning
  - Thorndike learning
- According to the learning paradigm:
  - Supervised learning
  - Reinforcement learning
  - Unsupervised learning

Error-correction learning
- Belongs to the supervised learning paradigm
- Let dk(n) be the desired output of neuron k at time n
- Let yk(n) be the obtained output of neuron k at time n
- The output yk(n) is produced in response to the input vector x(n)
- The input vector x(n) and the desired output dk(n) form an example that is presented to the ANN at time n
- The error is the difference between the desired and obtained outputs of neuron k at time n:
    ek(n) = dk(n) - yk(n)
- The goal of error-correction learning is to minimize an error function derived from the errors ek(n), so that the obtained output of all neurons approximates the desired output in some statistical sense
- A frequently used error function is the mean square error:
    J = E[ (1/2) Σk ek²(n) ]
  where E[.] is the statistical expectation operator and the summation runs over all neurons in the output layer

Error function
- The problem with minimizing the error function J is that it requires knowledge of the statistical properties of the random processes ek(n)
- For this reason, an estimate of the error function at step n is used as the optimization criterion:
    E(n) = (1/2) Σk ek²(n)
- This approach yields an approximate solution

Delta learning rule
- Minimizing the error function with respect to the weights wkj(n) gives the Delta learning rule:
    Δwkj(n) = η ek(n) xj(n)
  where η is a positive constant determining the learning rate
- The weight change is proportional to the error and to the value at the corresponding input
- The learning rate η must be chosen carefully:
  - A small η gives stability, but learning is slow
  - A large η speeds up learning, but brings a risk of instability

Error surface
- Plotting the error J as a function of the synaptic weights gives a multidimensional error surface
- The learning problem consists of finding the point on the error surface that has the smallest error (i.e. minimizing the error)
- Depending on the type of neurons, there are two possibilities:
  - If the ANN consists of linear neurons, the error surface is a quadratic function with a single global minimum
  - If the ANN consists of nonlinear neurons, the error surface has one or more global minima and multiple local minima
- Learning starts from an arbitrary point on the error surface and proceeds by minimization:
  - In the first case it converges to the global minimum
  - In the second case it can also converge to a local minimum

Hebb learning
- Hebb's principle of learning states (Hebb, The Organization of Behavior, 1949):
  - When the axon of neuron A is close enough to activate neuron B, and does so repeatedly, metabolic changes take place such that the efficiency of neuron A in activating neuron B is increased
- An extension of this principle (Stent, 1973):
  - If one neuron does not influence (stimulate) another neuron, the synapse between them becomes weaker or is eliminated entirely

Activity product rule
- According to Hebb's principle, the weights are changed as follows:
    Δwkj(n) = F(yk(n), xj(n))
  where yk(n) and xj(n) are the output and the j-th input of the k-th neuron
- A special case of this principle is:
    Δwkj(n) = η yk(n) xj(n)
  where the constant η determines the learning rate
- This rule is called the activity product rule
- The weight update is proportional to the input value
(Figure: Δwkj as a function of xj is a line through the origin with slope ηyk.)
- Problem: repeated updates with the same input and output cause an unbounded increase of the weight wkj

Generalized activity product rule
- To overcome the problem of weight saturation, modifications have been proposed that limit the growth of the weight wkj
- A non-linear limiting factor (Kohonen, 1988):
    Δwkj(n) = η yk(n) xj(n) - α yk(n) wkj(n)
  where α is a positive constant
- This expression can be rewritten as:
    Δwkj(n) = α yk(n) [c xj(n) - wkj(n)]
  where c = η/α
- In the generalized Hebb rule, all inputs with xj(n) < wkj(n)/c reduce the weight wkj, while inputs with xj(n) > wkj(n)/c increase it
(Figure: Δwkj as a function of xj is a line with slope ηyk that crosses zero at xj = wkj/c and has intercept -αykwkj.)

Competitive learning
- An unsupervised learning method
- Neurons compete for the opportunity to become active
- Only one neuron can be active at any time
- Useful for classification problems
- Three elements of competitive learning:
  - A set of neurons with randomly initialized weights, so that they respond differently to a given input
  - A limit on the strength (weights) of each neuron
  - A competition mechanism that allows only one neuron to be active at any given time (the winner-takes-all neuron)
(Figure: an example network with a single layer of neurons; inputs x1, x2, x3, x4 feed the input layer, which is fully connected to the output layer.)
- In order to win, the activation vj of neuron j must be the largest among all neurons
- The output yj of the winning neuron j equals 1; the outputs of all other neurons are 0
- The learning rule is defined as:
    Δwji = η (xi - wji)   if neuron j won the competition
    Δwji = 0              if neuron j lost the competition
- The learning rule has the effect of shifting the weight vector wj towards the input vector x

An example of competitive learning
- Assume that each input vector has unit norm, so it can be represented as a point on the N-dimensional unit sphere
- Assume that the weight vectors also have unit norm, so they too can be represented as points on the N-dimensional unit sphere
- During training, input vectors are presented to the network and the weights of the winning neuron are updated
- The learning process can then be represented as the movement of the weight vectors along the unit sphere
(Figure: weight vectors and input vectors on the unit sphere, shown in the initial and final states of training.)

Credit-assignment problem
- The credit-assignment problem is an important issue in learning algorithms
- It is the problem of assigning credit or blame for the overall learning outcome to the many internal decisions of the learning system

Supervised learning
- Supervised learning is characterized by the presence of a teacher
(Figure: the environment provides the input; the teacher supplies the desired output; the ANN produces the obtained output; the error is the difference between the two.)
- The teacher's knowledge has the form of input-output pairs used for training
- The error is the difference between the desired and obtained outputs for a given input vector
- The ANN parameters change under the influence of the input vectors and the error values
- The learning process is repeated until the ANN learns to imitate the teacher
- After learning is completed, the teacher is no longer required and the ANN can work without supervision
- The error function can be the mean square error; it depends on the free parameters (weights)
- The error function can be represented as a multidimensional error surface
- Any ANN configuration is defined by its weights and corresponds to a point on the error surface
- The learning process can be viewed as the movement of this point down the error surface towards the global minimum
- The point moves towards the minimum based on the gradient
- The gradient at any point on the surface is a vector pointing in the direction of steepest ascent, so the point moves in the opposite direction
- Examples of supervised learning algorithms:
  - The LMS (least-mean-square) algorithm
  - The BP (back-propagation) algorithm
- A disadvantage of supervised learning is that learning is not possible without a teacher: the ANN can only learn from the provided examples
- Supervised learning can be implemented to work offline or online
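The three update rules discussed above (the Delta rule, the generalized activity product rule, and the winner-takes-all rule) can be sketched in a few lines of NumPy. This is a minimal illustration, assuming single linear neurons; the function names and the toy values in the usage note are illustrative and do not come from the lecture.

```python
# Illustrative sketches of the three learning rules described in the text.
# Function names and parameter defaults are hypothetical, not from the slides.
import numpy as np

def delta_update(w, x, d, eta=0.1):
    """Error-correction (Delta) rule: dw = eta * e * x, with e = d - y."""
    y = float(w @ x)           # obtained output y_k(n) of a linear neuron
    e = d - y                  # error e_k(n) = d_k(n) - y_k(n)
    return w + eta * e * x, e

def generalized_hebb_update(w, x, eta=0.1, alpha=0.05):
    """Generalized activity product rule (Kohonen, 1988):
    dw = eta*y*x - alpha*y*w, which limits unbounded weight growth."""
    y = float(w @ x)
    return w + eta * y * x - alpha * y * w

def competitive_update(W, x, eta=0.1):
    """Winner-takes-all rule: only the winning neuron's weight vector
    moves towards the input, dw_j = eta * (x - w_j)."""
    j = int(np.argmax(W @ x))  # neuron with the largest activation wins
    W = W.copy()
    W[j] += eta * (x - W[j])
    return W, j
```

For example, repeated Delta updates on a single example drive the error to zero: with w = [0, 0], x = [1, 1], d = 1 and η = 0.5, the first update gives e = 1 and w = [0.5, 0.5], and the second update then yields zero error. Likewise, one competitive update moves the winning weight vector strictly closer to the input vector.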