Unit 8: Introduction to Neural Networks. Perceptrons

D. Balbontín Noval, F. J. Martín Mateos, J. L. Ruiz Reina, A. Riscos Núñez
Departamento de Ciencias de la Computación e Inteligencia Artificial
Universidad de Sevilla
Inteligencia Artificial, 2012-13

Outline
• Introduction
• Perceptrons
• Training with Delta rule

Artificial neurons: biological inspiration
• Artificial neural networks are a computational model inspired by their biological counterpart
• However, do not forget that it is just a formal model:
  • Some features/properties of the biological systems are not captured in the computational model, and vice versa
• Our approach is to use them as a mathematical model that is the basis of powerful automated learning algorithms
• Structure: directed graph (nodes are artificial neurons)

How a neural network works
• Each node or unit (artificial neuron) is connected to other units via directed arcs (modeling the axon → dendrite connection)
• Each arc $j \to i$ propagates the output of unit $j$ (denoted $a_j$), which in turn is one of the inputs of unit $i$. The inputs and outputs are numbers
• Each arc $j \to$
$i$ has an associated numerical weight $w_{ji}$, which determines the strength and the sign of the connection (simulating a synapse)

How a neural network works
• Each unit calculates its output according to the inputs it receives
• The output of each unit is, in turn, used as one of the inputs of other neurons
• The calculation performed by each unit is very simple, as we will see later on
• The network receives a series of external inputs (input units) and returns the output of some of its neurons, called output units

Calculation at each unit
• The output of a unit is: $a_i = g\left(\sum_{j=0}^{n} w_{ji} a_j\right)$
• Where:
  • $g$ is the activation function
  • The sum $\sum_{j=0}^{n} w_{ji} a_j$ (denoted by $in_i$) gathers all the outputs of the units $j$ connected to unit $i$
  • Except for $j = 0$, which is considered a virtual input $a_0 = -1$ and has a weight $w_{0i}$ called the bias

Bias and activation functions
• Intuitively, the bias weight $w_{0i}$ of each unit is interpreted as the minimum amount that the sum of the received input signals has to reach in order to activate the unit
• The role of the activation function $g$ is to "normalize" the output (usually to 1) upon activation.
In addition, the activation function allows the network to represent more than a simple linear function
• Frequently used activation functions:
  • Bipolar: $\mathrm{sgn}(x) = \begin{cases} 1 & \text{if } x > 0 \\ -1 & \text{if } x \le 0 \end{cases}$
  • Threshold: $\mathrm{threshold}(x) = \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{if } x \le 0 \end{cases}$
  • Sigmoid: $\sigma(x) = \dfrac{1}{1 + e^{-x}}$
  • The sigmoid function is differentiable and $\sigma'(x) = \sigma(x)(1 - \sigma(x))$

Feedforward neural networks
• A feedforward neural network is an artificial neural network where the connections between the units do not form a directed cycle (we focus on them in this unit)

Feedforward neural networks
• Units in a feedforward neural network are usually structured into layers, in such a way that each layer receives its inputs from the units in the layer immediately before it
  • Input layer, hidden layers and output layer
  • Multi-layer networks
• Other architectures: recurrent networks, where the output units provide feedback to the input units

Neural networks as classifiers
• A feedforward neural network with $n$ units in the input layer and $m$ units in the output layer computes a function from $\mathbb{R}^n$ into $\mathbb{R}^m$
• Thus, it can be used as a classifier for sets in $\mathbb{R}^n$:
  • For Boolean classification, take $m = 1$ and:
    • If threshold or bipolar activation functions are used, then one of the outputs (e.g.
1) is interpreted as "YES" and the other one as "NO"
    • If sigmoid is used, then consider all output values over 0.5 as "YES" and any lower value as "NO"
  • In general, for classifiers with $m$ possible values, each output unit corresponds to a classification value, and the unit with the highest output provides the classification

Neural networks and Learning
• Learning or training, in the framework of artificial neural networks, means searching for adequate weights for the arcs in order to get a desired behaviour (given by a training set)
• More precisely, in feedforward neural networks we usually follow this supervised learning scheme:
  • Given a training set $D = \{(\vec{x}_d, \vec{y}_d) : \vec{x}_d \in \mathbb{R}^n,\ \vec{y}_d \in \mathbb{R}^m,\ d = 1, \ldots, k\}$
  • And given the structure of a network (number of layers and units per layer)
  • Find a set of weight values $w_{ij}$ such that the function from $\mathbb{R}^n$ into $\mathbb{R}^m$ represented by the network provides the best fit with the examples in the training set
• We need a precise definition of "best fit"

Practical applications of neural networks
• For problems that can be expressed in terms of numbers (discrete or continuous)
• Usually suitable for domains with a huge amount of data, possibly noisy: cameras, microphones, digitized images, etc.
• We only care about the solution, not why it is so
• Problems where we can afford a long training time for the network
• And where we want fast evaluation of new instances

ALVINN: an example of an application
• ANN trained to drive a car at 70 km/h, according to the visual perception received as input from its sensors
• Input of the network: the image of the road digitized as an array of $30 \times 32$ pixels.
That is, 960 input values
• Output of the network: an indication of how to turn the wheel, encoded as a 30-component vector (from turning completely to the left, through keeping straight, all the way to turning completely to the right)
• Structure: feedforward network with an input layer of 960 units, one hidden layer of 4 units and an output layer of 30 units

ALVINN: an example of an application
• Training: using a human driver, who drives the car again and again
• The visual sensors record the image seen by the driver (sequences of 960 values each)
• Other sensors simultaneously record the movements of the wheel
• After properly encoding all the gathered information, we have a number of different pairs of the form $(\vec{x}, \vec{y})$, where $\vec{x} = (x_1, x_2, \ldots, x_{960})$ and $\vec{y} = (y_1, y_2, \ldots, y_{30})$, which constitute input/output examples for the network
• Goal: find the best values for the weights $w_{ji}$ associated with the arcs $j \to i$ such that when the network receives $\vec{x}$, its output matches the corresponding value $\vec{y}$ (or is the best possible approximation)

Examples of practical applications
• Classification
• Pattern recognition
• Optimization
• Prediction: weather, audience, etc.
• Speech recognition
• Artificial vision, image recognition
• Constraint satisfaction
• Control (robots, cars, etc.)
• Data compression
• Diagnosis

Perceptrons
• Let us first focus on the simplest kind of neural network: feedforward, with just one input layer and one output layer.
• Since each of the output units is independent, without loss of generality we can consider just one unit in the output layer
• This type of network is called a perceptron
• A perceptron using the threshold activation function is able to represent the basic Boolean functions

Perceptrons: limitations
• A perceptron with $n$ input units, weights $w_i$ ($i = 0, \ldots, n$) and threshold (or bipolar) activation function accepts those $(x_1, \ldots, x_n)$ such that $\sum_{i=0}^{n} w_i x_i > 0$ (where $x_0 = -1$)
• The equation $\sum_{i=0}^{n} w_i x_i = 0$ represents a hyperplane in $\mathbb{R}^n$
• That is, a Boolean function can only be represented by a threshold perceptron if there exists a hyperplane separating the positive elements from the negative elements (the function is linearly separable)
• Perceptrons with sigmoid activation function have similar expressive limitations

Perceptrons: limitations
• For example, the functions AND and OR are linearly separable, while XOR is not
• Despite their expressive limitations, perceptrons have the advantage that there exists a simple training algorithm for perceptrons with threshold activation function
  • It is able to find a suitable perceptron for any linearly separable training set

Algorithm for (threshold) Perceptron training
• Input: a training set $D$ (with examples of the form $(\vec{x}, y)$, with $\vec{x} \in \mathbb{R}^n$ and $y \in \{0, 1\}$), and a learning factor $\eta$

Algorithm
1) Consider randomly generated initial weights $\vec{w} \leftarrow (w_0, w_1, \ldots, w_n)$
2) Repeat until halting condition
   1) For each $(\vec{x}, y)$ in the training set do
      1) Calculate $o = \mathrm{threshold}\left(\sum_{i=0}^{n} w_i x_i\right)$ (with $x_0 = -1$)
      2) For each $w_i$ do: $w_i \leftarrow w_i + \eta (y - o) x_i$
3) Return $\vec{w}$

Comments about the algorithm
• $\eta$ is a positive constant, usually very small (e.g.
0.1), called the learning factor, that moderates the weight update process
• At each iteration, if $y = 1$ and $o = 0$, then $y - o = 1 > 0$, and therefore the weights $w_i$ corresponding to positive $x_i$ will increase (and those corresponding to negative $x_i$ will decrease). Thus, $o$ (the actual output) will come closer to $y$ (the expected output)
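
The threshold perceptron training algorithm described above can be sketched in Python as follows. This is only an illustrative sketch, not code from the course: the function names, the range of the random initial weights, the fixed seed and the halting condition (stop after an epoch with no misclassified example, or after a hypothetical cap `max_epochs`) are all assumptions, since the slides leave the halting condition open.

```python
import random


def threshold(x):
    """Threshold activation: 1 if x > 0, otherwise 0."""
    return 1 if x > 0 else 0


def predict(weights, x):
    """Output of a threshold perceptron; weights[0] pairs with the virtual input x0 = -1."""
    inputs = [-1.0] + list(x)
    return threshold(sum(w * xi for w, xi in zip(weights, inputs)))


def train_perceptron(data, eta=0.1, max_epochs=1000, seed=0):
    """Train on (x, y) pairs with y in {0, 1}, using w_i <- w_i + eta * (y - o) * x_i."""
    rng = random.Random(seed)
    n = len(data[0][0])
    # Step 1: randomly generated initial weights (w0, w1, ..., wn); the range is an assumption.
    weights = [rng.uniform(-0.5, 0.5) for _ in range(n + 1)]
    # Step 2: repeat until the (assumed) halting condition holds.
    for _ in range(max_epochs):
        errors = 0
        for x, y in data:
            inputs = [-1.0] + list(x)
            o = threshold(sum(w * xi for w, xi in zip(weights, inputs)))
            if o != y:
                errors += 1
            # When o == y the factor (y - o) is 0 and the weights do not change.
            weights = [w + eta * (y - o) * xi for w, xi in zip(weights, inputs)]
        if errors == 0:
            break
    # Step 3: return the learned weights.
    return weights


# AND is linearly separable, so a suitable perceptron exists.
AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
and_weights = train_perceptron(AND_DATA)
and_outputs = [predict(and_weights, x) for x, _ in AND_DATA]
print(and_outputs)
```

Running the same function on the XOR examples would exhaust `max_epochs` without reaching an error-free epoch, which matches the limitation stated earlier: XOR is not linearly separable.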
