Neural Networks - Module 1

Hari C.V.

Assistant Professor, Department of Applied Electronics & Instrumentation Engineering, Rajagiri School of Engineering & Technology, Kakkanad, Kochi

Fundamentals of Neural Networks [1]

Neural Networks (NN) are simplified models of the biological nervous system and therefore have drawn their motivation from the kind of computing performed by a human brain.

Also called Artificial Neural Systems (ANS), Artificial Neural Networks (ANN) or simply Neural Networks (NN). Simplified imitations of the central nervous system.

An NN, in general, is a highly interconnected network of a large number of processing elements called neurons, in an architecture inspired by the brain.

In machine learning and cognitive science, artificial neural networks (ANNs) are a family of models inspired by biological neural networks (the central nervous systems of animals, in particular the brain) and are used to estimate or approximate functions that can depend on a large number of inputs and are generally unknown [2].

Massively parallel, and therefore NNs are said to exhibit parallel distributed processing. They learn by example: the network is trained with known examples of a problem to 'acquire' knowledge. Two types of learning: Supervised Learning and Unsupervised Learning.

Supervised Learning
A 'teacher' is assumed to be present during the learning process, i.e. the network aims to minimize the error between the target (desired) output presented by the 'teacher' and the computed output, to achieve better performance.

Unsupervised Learning
There is no 'teacher' present to hand over the desired output and the network therefore tries to learn by itself, organizing the input instances of the problem.

Classification
Single Layer Feedforward Networks
Multi Layer Feedforward Networks
Recurrent Networks

Well known NN systems:
Backpropagation Network
ADALINE (Adaptive Linear Element)
Associative Memory
Boltzmann Machine
Adaptive Resonance Theory
Self Organizing Map
Hopfield Network

NN - Applications:
Pattern Recognition
Image Processing
Data Compression
Forecasting
Optimization
... etc

Human Brain

The human brain is the command center for the human nervous system. It receives input from the sensory organs and sends output to the muscles [3].

Figure 1: Human Brain Anatomy [4].

Four Different Regions or Lobes:
Temporal Lobe
Frontal Lobe
Parietal Lobe
Occipital Lobe
The Sylvian fissure divides the frontal lobe from the temporal lobe. The central sulcus divides the frontal lobe from the parietal lobe.

Different Functions [5]

The frontal lobe concerns itself with the primary task of future plans and control of movement. The temporal lobe deals with hearing and is also particularly important for long term memory storage. The parietal lobe deals with body sensations. The occipital lobe lies at the back of the brain and houses the visual cortex, which is concerned with reception and interpretation of vision.

Figure 2: Human Brain Cross Section [3].

Different Functions [5]

Not observable externally, but sitting deep within the cerebrum, are many parts such as the pons and the medulla oblongata, which constitute the brain stem. The medulla controls vital automatic functions such as breathing, heart rate and digestion, and the pons is responsible for conveying information about movement from the cerebrum to the cerebellum. The thalamus is a switching center which processes information from the central nervous system before transmitting it to the cerebral cortex, and the hypothalamus regulates the endocrine system and autonomic function. The corpus callosum is a bundle of fibers that go from one hemisphere to the other, thereby permitting inter-hemisphere communication.

The brain contains about 10^11 basic units called neurons. Each neuron, in turn, is connected to about 10^4 other neurons. A neuron is a small cell that receives electro-chemical signals from its various sources and in turn responds by transmitting electrical impulses to other neurons. An average brain weighs about 1.5 kg and an average neuron weighs about 1.5 × 10^{-9} g. Some of the neurons perform input and output operations (referred to as afferent and efferent cells respectively); the remaining form part of an interconnected network of neurons which are responsible for signal transformation and storage of information.

Structure of a Neuron

A neuron (also known as a neurone or nerve cell) is an electrically excitable cell that processes and transmits information through electrical and chemical signals [6].

Figure 3: Structure of a Neuron [7].

A neuron is composed of a nucleus within a cell body known as the soma. Attached to the soma are long, irregularly shaped filaments called dendrites. The dendrites behave as input channels: all inputs from other neurons arrive through the dendrites. Dendrites look like branches of a tree during winter. Another type of link attached to the soma is the axon. The axon is electrically active and serves as an output channel.

Axons, which mainly appear on output cells, are non-linear threshold devices which produce a voltage pulse called an Action Potential or Spike that lasts for about a millisecond. If the cumulative inputs received by the soma raise the internal electric potential of the cell, known as the Membrane Potential, then the neuron 'fires' by propagating the action potential down the axon to excite or inhibit other neurons. The axon terminates in a specialised contact called a synapse or synaptic junction that connects the axon with the dendrites of another neuron.

The synaptic junction, which is a very minute gap at the end of the dendritic link, contains neurotransmitter fluid. This fluid is responsible for accelerating or retarding the electric charges to the soma. Each dendritic link can have many synapses acting on it, thus bringing about massive interconnectivity. In general, a single neuron can have many synaptic inputs and synaptic outputs.

Model of an Artificial Neuron

An artificial neuron is a mathematical function conceived as a model of biological neurons. Artificial neurons are the constitutive units in an artificial neural network [8].

Figure 4: Simple Model of an Artificial Neuron.

x_1, x_2, x_3, ..., x_n are the n inputs to the artificial neuron. w_1, w_2, w_3, ..., w_n are the weights attached to the input links.

A biological neuron receives all inputs through the dendrites, sums them and produces an output if the sum is greater than a threshold value. The input signals are passed on to the cell body through the synapse, which may accelerate or retard an arriving signal.

Acceleration or retardation of the input signals is modelled as weights. An effective synapse which transmits a stronger signal will have a correspondingly larger weight, while a weak synapse will have a smaller weight. Thus, the weights are multiplicative factors of the inputs that account for the strength of the synapse.

The total input I received by the soma of the artificial neuron is

I = w_1 x_1 + w_2 x_2 + \dots + w_n x_n = \sum_{i=1}^{n} w_i x_i    (1)

To generate the final output y, the sum is passed on to a non-linear filter \phi called the Activation Function (also called the Transfer Function or Squash Function), which releases the output.

y = \phi(I)    (2)

Activation Functions:
Thresholding Function
Signum Function
Sigmoidal Function
Hyperbolic Tangent Function

Thresholding Function

This is the most commonly used one. The sum is compared with a threshold value \theta; if the value of I is greater than \theta, then the output is 1, else it is 0.

y = \phi\left( \sum_{i=1}^{n} w_i x_i - \theta \right)    (3)

where \phi is the step function, known as the Heaviside function, such that

\phi(I) = \begin{cases} 1, & I > 0 \\ 0, & I \le 0 \end{cases}    (4)
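As a rough illustration (not part of the original slides), the model of Eqs. (1), (3) and (4) can be sketched in Python; the input, weight and threshold values used below are arbitrary examples.

def heaviside(I):
    # Step (Heaviside) function of Eq. (4): 1 if I > 0, else 0
    return 1 if I > 0 else 0

def artificial_neuron(x, w, theta):
    # Total input of Eq. (1), shifted by the threshold as in Eq. (3)
    I = sum(wi * xi for wi, xi in zip(w, x))
    return heaviside(I - theta)

# Example usage with made-up values:
print(artificial_neuron([1.0, 0.5, 0.2], [0.4, 0.3, 0.9], theta=0.5))  # prints 1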

Figure 5: Thresholding Function.

Output signal is either 1 or 0, resulting in the neuron being on or off.

Signum Function

Figure 6: Signum Function.

Also known as the Quantizer function:

\phi(I) = \begin{cases} +1, & I > \theta \\ -1, & I \le \theta \end{cases}    (5)

Sigmoidal Function

Figure 7: Sigmoidal Function.

The sigmoidal function is a continuous function that varies gradually between the asymptotic values 0 and 1 (or -1 and +1) and is given by

\phi(I) = \frac{1}{1 + e^{-\alpha I}}    (6)

where \alpha is the slope parameter, which adjusts the abruptness of the function as it changes between the two asymptotic values. Sigmoidal functions are differentiable, which is an important feature of NN theory.

Hyperbolic Tangent Function

The function is given by

\phi(I) = \tanh(I)    (7)

and can produce negative output values.
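A small Python sketch of the four activation functions listed above (using only the math module); the slope value chosen for the sigmoid is an arbitrary illustration.

import math

def thresholding(I):
    # Heaviside step, Eq. (4)
    return 1 if I > 0 else 0

def signum(I, theta=0.0):
    # Quantizer function, Eq. (5)
    return 1 if I > theta else -1

def sigmoid(I, alpha=1.0):
    # Eq. (6); alpha is the slope parameter
    return 1.0 / (1.0 + math.exp(-alpha * I))

def hyperbolic_tangent(I):
    # Eq. (7); output lies between -1 and +1
    return math.tanh(I)

for f in (thresholding, signum, sigmoid, hyperbolic_tangent):
    print(f.__name__, f(0.5))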

Neural Network Architectures

An Artificial Neural Network (ANN) is defined as a data processing system consisting of a large number of simple, highly interconnected processing elements (artificial neurons) in an architecture inspired by the structure of the cerebral cortex of the brain. An ANN structure can be represented using a directed graph. A graph G is an ordered 2-tuple (V, E) consisting of a set V of vertices and a set E of edges. When each edge is assigned an orientation, the graph is directed and is called a directed graph or a digraph.

Figure 8: An example of a digraph.

The vertices of the graph may represent the neurons (input/output) and the edges the synaptic links. The edges are labelled by the weights attached to the synaptic links.

Fundamentally different classes:
Single Layer Feedforward Network
Multilayer Feedforward Network
Recurrent Networks

Single Layer Feedforward Network

Figure 9: Single Layer Feedforward Network.

Two layers - an input layer and an output layer. Input layer neurons receive the input signals and output layer neurons receive the output signals.

The synaptic links carrying the weights connect every input neuron to the output neuron, but not vice versa. Such a network is said to be feedforward or acyclic in nature. The network is named single layer since it is the output layer alone which performs the computation. The input layer merely transmits the signals to the output layer.

Multilayer Feedforward Network

Figure 10: Multi Layer Feedforward Network (l-m-n configuration).

Multiple layers. Thus, architectures of this class, besides possessing an input and an output layer, also have one or more intermediary layers called hidden layers.

The computational units of the hidden layer are known as the hidden neurons or hidden units. The hidden layer aids in performing useful intermediary computations before directing the input to the output layer. The input layer neurons are linked to the hidden layer neurons, and the weights on these links are referred to as input-hidden layer weights. Again, the hidden layer neurons are linked to the output layer neurons, and the corresponding weights are referred to as hidden-output layer weights.

A multilayer feedforward network with l input neurons, m_1 neurons in the first hidden layer, m_2 neurons in the second hidden layer and n output neurons in the output layer is written as l - m_1 - m_2 - n.

Recurrent Networks

Figure 11: A Recurrent Neural Network

In this type, at least one feedback loop is present. There could also be neurons with self-feedback links, i.e. the output of a neuron is fed back into itself as input.

Characteristics of Neural Networks

1 The NNs exhibit mapping capabilities, that is, they can map input patterns to their associated output patterns.
2 The NNs learn by examples.
3 The NNs possess the capability to generalize. Thus, they can predict new outcomes from past trends.
4 The NNs are robust systems and fault tolerant. They can, therefore, recall full patterns from incomplete, partial or noisy patterns.
5 The NNs can process information in parallel, at high speed, and in a distributed manner.

Learning Methods

Figure 12: Classification of Learning Algorithms

Broadly classified into three basic types:
1 Supervised Learning
2 Unsupervised Learning
3 Reinforced Learning

Learning Methods

Supervised Learning
Every input pattern that is used to train the network is associated with an output pattern, which is the target or desired pattern. A teacher is assumed to be present during the learning process, when a comparison is made between the network's computed output and the correct expected output, to determine the error. The error can then be used to change network parameters, which results in an improvement in performance.

Unsupervised Learning
In this method, the target output is not presented to the network. It is as if there is no teacher to present the desired patterns and hence, the system learns on its own by discovering and adapting to structural features in the input patterns.

Reinforced Learning
In this method, a teacher, though available, does not present the expected answer but only indicates if the computed output is correct or incorrect. The information provided helps the network in its learning process. A reward is given for a correct answer and a penalty for a wrong answer. Reinforced learning is not one of the popular forms of learning.

Supervised and Unsupervised learning methods, which are the most popular forms of learning, have found expression through various rules. Some of the widely used rules are:
Hebbian Learning
Gradient Descent Learning
Competitive Learning
Stochastic Learning

Hebbian Learning

This rule was proposed by Hebb and is based on correlative weight adjustment. It is the oldest learning mechanism inspired by biology.

Input-output pattern pairs (X_i, Y_i) are associated by the weight matrix W, known as the correlation matrix:

W = \sum_{i=1}^{n} X_i Y_i^T    (8)
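A brief sketch, using NumPy, of how the correlation matrix of Eq. (8) could be accumulated from pattern pairs; the bipolar patterns below are made up purely for illustration.

import numpy as np

def correlation_matrix(pattern_pairs):
    # W = sum_i X_i Y_i^T  (Eq. 8)
    W = None
    for X, Y in pattern_pairs:
        outer = np.outer(X, Y)               # X_i Y_i^T
        W = outer if W is None else W + outer
    return W

pairs = [(np.array([1, -1, 1]), np.array([1, -1])),
         (np.array([-1, 1, 1]), np.array([-1, 1]))]
print(correlation_matrix(pairs))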

Gradient Descent Learning

This is based on the minimization of an error E defined in terms of the weights and the activation functions of the network. It is required that the activation function employed by the network be differentiable, as the weight update is dependent on the gradient of the error E.

Thus, if \Delta W_{ij} is the weight update of the link connecting the i-th and j-th neurons of the two neighbouring layers, then \Delta W_{ij} is defined as

\Delta W_{ij} = \eta \, \frac{\partial E}{\partial W_{ij}}    (9)

where \eta is the learning rate parameter and \partial E / \partial W_{ij} is the error gradient with reference to the weight W_{ij}. The Widrow-Hoff Delta rule and the Backpropagation learning rule are examples of this type of learning.
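As a hedged sketch (not taken from the slides), one gradient step for a single linear neuron with squared error E = (1/2)(t - y)^2 is shown below; since dE/dw_i = -(t - y) x_i, stepping against the gradient reduces to the Widrow-Hoff Delta rule w_i <- w_i + eta (t - y) x_i mentioned above. The data and learning rate are arbitrary.

import numpy as np

def gradient_descent_step(w, x, t, eta=0.1):
    # Linear neuron output and squared-error gradient
    y = np.dot(w, x)          # y = w . x
    grad = -(t - y) * x       # dE/dw_i for E = 0.5 * (t - y)**2
    return w - eta * grad     # equivalently w + eta * (t - y) * x

w = np.zeros(3)
x = np.array([1.0, 0.5, -1.0])   # arbitrary input pattern
t = 1.0                          # arbitrary target
for _ in range(50):
    w = gradient_descent_step(w, x, t)
print(w, np.dot(w, x))           # the output approaches the target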

Competitive Learning

In this method, those neurons which respond strongly to input stimuli have their weights updated. When an input pattern is presented, all neurons in the layer compete and the winning neuron undergoes the weight adjustment - "Winner takes all".

Stochastic Learning

In this method, weights are adjusted in a probabilistic fashion. An example is evident in simulated annealing - the learning mechanism employed by Boltzmann and Cauchy machines, which are a kind of NN system.

Early Neural Network Architectures

Perceptron [9] In machine learning, the perceptron is an algorithm for supervised learning of binary classifiers: functions that can decide whether an input (represented by a vector of numbers) belongs to one class or another [9].

It is a type of linear classifier, i.e. a classification algorithm that makes its predictions based on a linear predictor function combining a set of weights with the feature vector. The algorithm allows for online learning, in that it processes elements in the training set one at a time. The perceptron algorithm was invented in 1957 at the Cornell Aeronautical Laboratory by Frank Rosenblatt, funded by the United States Office of Naval Research.

Rosenblatt's Perceptron

The perceptron is a computational model of the retina of the eye and hence it is named 'perceptron'. The network comprises three units:
Sensory Unit (S)
Association Unit (A)
Response Unit (R)

Figure 13: Rosenblatt’s Original Perceptron Model.

The S unit, comprising 400 photodetectors, receives input images and provides 0/1 electric signals as output. If the input signal exceeds a threshold, then the photodetector outputs 1, else 0. The photodetectors are randomly connected to the Association unit A. The A unit comprises feature demons or predicates. The predicates examine the output of the S unit for specific features of the image. The third unit R comprises pattern recognizers or perceptrons, which receive the results of the predicates, also in binary form. While the weights of the S and A units are fixed, those of R are adjustable.

The output of the R unit could be such that if the weighted sum of its inputs is less than or equal to 0, then the output is 0, otherwise it is the weighted sum itself. It could also be determined by a step function with binary values (0/1) or bipolar values (-1/+1). Thus, in the case of a step function yielding 0/1 output values, it is defined as

y_j = f(net_j) = \begin{cases} 1, & net_j > 0 \\ 0, & \text{otherwise} \end{cases}

where

net_j = \sum_{i=1}^{n} x_i w_{ij}    (10)

and x_i is the input, w_{ij} is the weight on the connection leading to the output units (R unit), and y_j is the output.

The training algorithm of the perceptron is a supervised learning algorithm where weights are adjusted to minimize error whenever the computed output does not match the target output.

(a) A Simple Perceptron Model    (b) A Multilayer Feedforward Perceptron Model

Figure 14: Perceptron Network

Basic Learning Algorithm for Training the Perceptron

If the output is correct, then no adjustment of weights is done, i.e.

W_{ij}^{(k+1)} = W_{ij}^{(k)}

If the output is 1 but should have been 0, then the weights are decreased on the active input links, i.e.

W_{ij}^{(k+1)} = W_{ij}^{(k)} - \eta x_i

If the output is 0 but should have been 1, then the weights are increased on the active input links, i.e.

W_{ij}^{(k+1)} = W_{ij}^{(k)} + \eta x_i

Here, W_{ij}^{(k+1)} is the new adjusted weight, W_{ij}^{(k)} is the old weight, x_i is the input and \eta is the learning parameter.

A small \eta leads to slow learning and a large \eta to fast learning. However, a large \eta also runs the risk of allowing the weights to oscillate about the values which would result in the correct outputs. For a fixed \eta, the learning algorithm is termed the fixed increment algorithm.

Perceptron Convergence Theorem

For any starting vector w, the perceptron learning rule will converge to a weight vector w^* that gives the correct response for all training patterns, and it will do so in a finite number of steps.

Figure 15: Two Linearly Separated Classes - C1 & C2

From Eq. (10), we can define the decision boundary V = W^T X = 0 (this is the equation of the decision boundary). The classification rule is:

If V \ge 0: Class 1; if V < 0: Class 2.

For training, let H_1 denote the training vectors belonging to C_1 and H_2 those belonging to C_2:

W^T X > 0 for every X \in C_1
W^T X \le 0 for every X \in C_2

If X \in H_2 and W^T X > 0, it is a wrong classification and we need to update the weights.

Weight Updating Algorithm

Correct Classification:

w(n+1) = w(n)  if  w^T x(n) > 0 and x(n) \in C_1
w(n+1) = w(n)  if  w^T x(n) \le 0 and x(n) \in C_2

Incorrect Classification:

w(n+1) = w(n) - \eta x(n)  if  w^T x(n) > 0 and x(n) \in C_2
w(n+1) = w(n) + \eta x(n)  if  w^T x(n) \le 0 and x(n) \in C_1

Proof of Convergence

Assumptions:
Linear separability.
w(0) = 0 (all weights start at zero).
\eta = 1 (normally between 0 and 1).
Misclassification is also learning: for n = 1, 2, ..., keep updating on misclassifications; at some point, the patterns will be correctly classified.

Assume x(n) \in H_1 and a misclassification happened. Then

w(n+1) = w(n) + x(n)

With w(0) = 0:

w(1) = x(0)
w(2) = w(1) + x(1) = x(1) + x(0)

So,

w(n+1) = x(0) + x(1) + \dots + x(n)    (11)

A solution W_o exists for which W_o^T x(n) > 0 for x(1), x(2), \dots, x(n) \in H_1.

There is a minimum positive value \alpha:

\alpha = \min_{x(n) \in H_1} W_o^T x(n)

Pre-multiplying Eq. (11) with W_o^T:

W_o^T w(n+1) = W_o^T x(0) + W_o^T x(1) + \dots + W_o^T x(n)

W_o^T w(n+1) \ge n\alpha    (12)

Applying the Cauchy-Schwarz inequality:

||W_o||^2 \, ||w(n+1)||^2 \ge \left( W_o^T w(n+1) \right)^2

||W_o||^2 \, ||w(n+1)||^2 \ge n^2 \alpha^2

||w(n+1)||^2 \ge \frac{n^2 \alpha^2}{||W_o||^2}    (13)

The lower bound of ||w(n+1)||^2 is given by Eq. (13).

Alternate Route

w(k+1) = w(k) + x(k)   (only changing the index to k)

Taking the squared Euclidean norm:

||w(k+1)||^2 = ||w(k)||^2 + ||x(k)||^2 + 2 w^T(k) x(k)

The perceptron incorrectly classifies for k = 0, 1, \dots, n, so w^T(k) x(k) < 0. So we can write

||w(k+1)||^2 \le ||w(k)||^2 + ||x(k)||^2

||w(k+1)||^2 - ||w(k)||^2 \le ||x(k)||^2

So, summing over k,

||w(n+1)||^2 \le \sum_{k=1}^{n} ||x(k)||^2

Define \beta as a positive quantity:

\beta = \max_{x(k) \in H_1} ||x(k)||^2

so that ||x(1)||^2 \le \beta, ||x(2)||^2 \le \beta, and so on. Therefore,

||w(n+1)||^2 \le n\beta    (14)

The upper bound of ||w(n+1)||^2 is given by Eq. (14).

For large values of n, Eqs. (13) and (14) are not simultaneously possible.

So, at some point n = n_{max}, Eqs. (13) and (14) are satisfied with the equality sign, i.e. n_{max} is the solution:

\frac{n_{max}^2 \alpha^2}{||W_o||^2} = n_{max} \beta

n_{max} = \frac{\beta \, ||W_o||^2}{\alpha^2}    (15)

Beyond this value of n, no update of weights is needed and correct classification will occur.
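A small numeric sketch of the bound in Eq. (15) on an invented toy example (not from the slides): a few points assumed to lie in H_1 and a known solution vector W_o.

import numpy as np

# Toy points assumed to satisfy Wo^T x > 0 (class H1), and an assumed solution Wo
H1 = [np.array([1.0, 0.5]), np.array([2.0, 1.0]), np.array([0.5, 2.0])]
Wo = np.array([1.0, 1.0])

alpha = min(np.dot(Wo, x) for x in H1)     # minimum of Wo^T x over H1
beta = max(np.dot(x, x) for x in H1)       # maximum of ||x||^2 over H1
n_max = beta * np.dot(Wo, Wo) / alpha**2   # Eq. (15)
print(alpha, beta, n_max)                  # 1.5, 5.0, about 4.44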

Perceptron and Linearly Separable Tasks

The perceptron cannot handle, in particular, tasks which are not linearly separable. A set of points in a two dimensional space is linearly separable if the sets can be separated by a straight line.

Generally, a set of points in n-dimensional space is linearly separable if there is a hyperplane of (n-1) dimensions that separates the sets.

(a) Linearly Separable Patterns    (b) Non-linearly Separable Patterns

Figure 16: Linearly Separable & Non-Linearly Separable Patterns

The perceptron cannot find weights for classification problems that are not linearly separable. Example - the XOR problem.

XOR Problem

Figure 17: XOR Truth Table

The ANN is to classify the inputs as odd parity or even parity. Odd parity means an odd number of 1 bits in the input and even parity means an even number of 1 bits in the input.

Figure 18: The non-linearly separable patterns of the XOR problem

The perceptron is unable to find a line separating even parity input patterns from the odd parity input patterns.

Why is the perceptron unable to find weights for non-linearly separable classification problems?

Figure 19: A Perceptron Model with Two Inputs X1, X2

The final output

net = w_0 + w_1 x_1 + w_2 x_2

represents the equation of a straight line.

Figure 20: A Straight Line as a Decision Boundary for a 2-Class Classification Problem

The straight line acts as a decision boundary separating the points into classes C1 and C2, above and below the line respectively. This is what the perceptron aims to do for a problem when it is able to obtain the weights.
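As an added worked check (taking the odd-parity rows of the XOR truth table as the 1 class and the 0/1 step output of Eq. (10)), the four rows impose contradictory constraints on net = w_0 + w_1 x_1 + w_2 x_2:

(0,0) \to 0: \quad w_0 \le 0
(1,0) \to 1: \quad w_0 + w_1 > 0
(0,1) \to 1: \quad w_0 + w_2 > 0
(1,1) \to 0: \quad w_0 + w_1 + w_2 \le 0

Adding the second and third inequalities gives 2w_0 + w_1 + w_2 > 0; combining this with the fourth gives w_0 > 0, which contradicts the first. Hence no single set of weights (no single straight line) can realize XOR.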

Fixed Increment Perceptron Learning Algorithm for a Classification Problem with n Input Features (x1, x2,... xn) and two output classes (0/1)

Step 1: Create a perceptron with (n+1) input neurons x_0, x_1, ..., x_n, where x_0 = 1 is the bias input. Let O be the output neuron.

Step 2: Initialize w = (w_0, w_1, ..., w_n) to random weights.

Step 3: Iterate through the input patterns X_j of the training set using the weight set, i.e. compute the weighted sum of inputs net_j = \sum_{i=0}^{n} x_i w_i for each pattern j.

Step 4: Compute the output y_j using the step function

y_j = f(net_j) = \begin{cases} 1, & net_j > 0 \\ 0, & \text{otherwise} \end{cases}

Step 5: Compare the computed output y_j with the target output Y_j for each pattern j. If all the input patterns have been classified correctly, output the weights and exit.

Step 6: Otherwise, update the weights as given below:
If the computed output y_j is 1 but should have been 0, then w_i = w_i - \eta x_i, i = 0, 1, 2, ..., n.
If the computed output y_j is 0 but should have been 1, then w_i = w_i + \eta x_i, i = 0, 1, 2, ..., n.
Here, \eta is the learning parameter and is a constant.

Step 7: Go to Step 3.
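A compact Python sketch of the fixed increment algorithm above; the training data (logical AND, which is linearly separable), learning rate and iteration cap are arbitrary illustrative choices.

import random

def train_perceptron(patterns, targets, eta=0.5, max_epochs=100):
    n = len(patterns[0])
    w = [random.uniform(-0.5, 0.5) for _ in range(n + 1)]   # Step 2: w[0] is the bias weight
    for _ in range(max_epochs):
        all_correct = True
        for X, t in zip(patterns, targets):
            x = [1] + list(X)                                # Step 1: x0 = 1 bias input
            net = sum(wi * xi for wi, xi in zip(w, x))       # Step 3: weighted sum
            y = 1 if net > 0 else 0                          # Step 4: step function
            if y != t:                                       # Steps 5 and 6: update on error
                all_correct = False
                sign = -1 if y == 1 else 1
                w = [wi + sign * eta * xi for wi, xi in zip(w, x)]
        if all_correct:                                      # Step 5: exit when all correct
            return w
    return w

print(train_perceptron([(0, 0), (0, 1), (1, 0), (1, 1)], [0, 0, 0, 1]))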

ADALINE Network

The Adaptive Linear Neural Element Network (ADALINE), framed by Bernard Widrow of Stanford University, makes use of supervised learning. There is only one output neuron and the output values are bipolar (-1 or +1).

Figure 21: A Simple ADALINE Network

The inputs x_i could be binary, bipolar or real valued.

The bias weight is w_0, with an input link of x_0 = +1. If the weighted sum of the inputs is greater than or equal to 0, then the output is 1, otherwise it is -1. The supervised learning algorithm adopted by the network is similar to the perceptron learning algorithm. Devised by Widrow and Hoff, the learning algorithm is known as the Least Mean Square (LMS) or Delta rule. The rule is given by

W_i^{new} = W_i^{old} + \alpha (t - y) x_i    (16)

where \alpha is the learning coefficient, t is the target output, y is the computed output, and x_i is the input. ADALINEs are used in virtually all high speed modems and telephone switching systems to cancel the echo in long distance communication networks.
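A minimal sketch of ADALINE training with the Delta rule of Eq. (16). Here the linear output y = sum_i w_i x_i is used inside the weight update and the bipolar threshold is applied only when classifying, which is one common reading of the LMS procedure; the data (bipolar AND) and constants are illustrative assumptions.

import numpy as np

def train_adaline(X, targets, alpha=0.1, epochs=50):
    X = np.hstack([np.ones((len(X), 1)), np.array(X, dtype=float)])  # x0 = +1 bias input
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, t in zip(X, targets):
            y = np.dot(w, x)                # linear output used by the LMS rule
            w = w + alpha * (t - y) * x     # Eq. (16): W_new = W_old + alpha (t - y) x
    return w

def classify(w, x):
    # Bipolar threshold: +1 if the weighted sum (with bias) is >= 0, else -1
    return 1 if np.dot(w, np.concatenate(([1.0], x))) >= 0 else -1

w = train_adaline([(1, 1), (1, -1), (-1, 1), (-1, -1)], [1, -1, -1, -1])  # bipolar AND
print([classify(w, np.array(p)) for p in [(1, 1), (1, -1), (-1, 1), (-1, -1)]])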

MADALINE Network

A MADALINE (Many ADALINE) network is created by combining a number of ADALINEs. The network of ADALINEs may span many layers. The use of multiple ADALINEs helps counter the problem of non-linear separability.

Figure 22: MADALINE Network

The MADALINE network with two units exhibits the capability to solve the XOR problem.

Figure 23: A MADALINE Network to Solve the XOR Problem

Hari C.V. February 15, 2016 68 / 76 Each ADALINE unit receives the inputs x1, x2 and the bias input x0 = 1 as its inputs. The weighted sum of the inputs is calculated and passed on to the bipolar threshold units. The logical ’and’ing (bipolar) of the two threshold outputs are com- puted to obtain the final output. Here, if the threshold outputs are both +1 or −1 then the final output is +1. If the threshold outputs are different, (i.e) (+1, −1) then the final output is −1. Inputs which are of even parity produces positive outputs and inputs of odd parity produces negative outputs.DRAFT

Figure 24: Decision Boundaries for the XOR Problem

Fig. 24 shows the decision boundaries for the XOR problem while trying to classify the even parity inputs (positive outputs) from the odd parity inputs (negative outputs).

The learning rule adopted by the MADALINE network is termed the 'MADALINE Adaptive Rule' (MR) and is a form of supervised learning. In this method, the objective is to adjust the weights such that the error is minimum for the current training pattern, but with as little damage as possible to the learning acquired through previous training patterns. MADALINE networks have been subjected to enhancements over the years.

NN - Applications

Pattern Recognition (PR) / Image Processing
Recognition of visual images
Handwritten Characters
Printed Characters
Speech and other PR based tasks

Optimization / Constraint Satisfaction
Manufacturing scheduling
Finding the shortest possible tour given a set of cities
... etc

Forecasting and Risk Assessment
NNs have exhibited the capability to predict situations from past trends. They therefore find ample applications in areas such as meteorology, the stock market, banking and econometrics, with high success rates.

Control Systems
NNs have gained commercial ground by finding applications in control systems. Dozens of computer products, especially by Japanese companies, incorporating NN technology are a standing example. They have also been used for the control of chemical plants, robots and so on.

References I

[1] S. Rajasekharan and G. V. Pai, Neural Networks, Fuzzy Logic, and Genetic Algorithms: Synthesis and Applications. PHI Learning Private Limited, 2012.
[2] "Artificial neural network," https://en.wikipedia.org/wiki/Artificial_neural_network, [Online; accessed 20-January-2016].
[3] "Human brain," http://www.livescience.com/29365-human-brain.html, [Online; accessed 20-January-2016].
[4] "Human brain anatomy," https://www.google.co.in/imgres?imgurl=http://previews.123rf.com/images/ [Online; accessed 20-January-2016].

References II

[5] S. Kumar, Neural Networks: A Classroom Approach. Tata McGraw-Hill Publishing Company Limited, 2004.
[6] "Neuron," https://en.wikipedia.org/wiki/Neuron, [Online; accessed 20-January-2016].
[7] "Neuron," http://bio3520.nicerweb.com/Locked/chap/ch03/neuron.html, [Online; accessed 20-January-2016].
[8] "Artificial neuron," https://en.wikipedia.org/wiki/Artificial_neuron, [Online; accessed 20-January-2016].
[9] "Perceptron," https://en.wikipedia.org/wiki/Perceptron, [Online; accessed 21-January-2016].

Thank You
