Introduction to Artificial Neural Networks

MAE-491/591

Artificial Neural Networks: Biological Inspiration
• The brain has been extensively studied by scientists.
• Its vast complexity prevents all but a rudimentary understanding.
• Even the behaviour of an individual neuron is extremely complex.

Features of the Brain
• More than ten billion ($10^{10}$) neurons
• Neuron switching time > $10^{-3}$ secs
• Face recognition takes ~0.1 secs
• On average, each neuron has several thousand connections (to other neurons)
• Hundreds of operations per second
• Very high degree of parallel computation
• Distributed representations (not all info in one spot)
• Neurons die off frequently (and are never replaced):
  – redundancy & rewiring
• Compensates for problems by massive parallelism

Computers vs Brains: A Contrast in Architecture
Computers
• One CPU
• Fixed connections
• Fast
• Repeatable/reliable
• Absolute arithmetic precision
• Good computation
• Poor recognition
Brains
• Highly parallel processing
• Changing connections
• Good pattern recognition
• Good association & context
• Good for complex issues
• Good noise tolerance
• Good with incomplete info
• Unreliable, slow
• Poor computational ability

Why Artificial Neural Networks?
There are two basic reasons why we are interested in building artificial neural networks (ANNs):
• Technical viewpoint: Some problems, such as character recognition or the prediction of future states of a system, require massively parallel and adaptive processing.
• Biological viewpoint: ANNs can be used to replicate and simulate components of the human (or animal) brain, thereby giving us insight into natural information processing.

Why do we need another paradigm other than symbolic AI or Fuzzy logic for building “intelligent” machines?
• Symbolic AI and Fuzzy logic are well-suited for representing explicit knowledge that can be appropriately formalized, or stated.
• However, learning in complex systems (e.g., biological) is mostly implicit: it is an adaptation process based on uncertain information, by uncertain reasoning, via experiences.
• Good in mechatronics for
  – Developing “unwritten” rules of operation and control.
  – Making a model or estimator of a system by observation.

How do NNs and ANNs work?
• The “building blocks” of real neural networks are the neurons (nerve cells).
[Figure: a neuron, with labels: synapse, axon, nucleus, cell body, dendrites]

How does a neuron work?
• A neuron only fires (via the axon) if its input signal exceeds a certain amount (threshold).
• All or nothing (on or off)
• Synapses (junctions/connections between neurons) vary in strength
  – Good connections allow a large signal.
  – Slight connections allow only a weak signal.
  – Synapses can be either excitatory (+) or inhibitory (-).

How do they all piece together?
• Basically, each neuron
  – Transmits information as a series of electric impulses, so-called spikes.
  – Receives inputs from many other neurons; a single neuron can be connected to many others.
  – Changes its internal state (activation) based on all the current inputs.
  – Sends one output signal to many other neurons (>10,000 in some cases), possibly including itself (recurrent network).
• Phase and frequency are also important in a real neuron.
• In an ANN we refer to neurons as units or nodes.

How do NNs (and ANNs) work?
• In biological systems, neurons of similar functionality are usually organized in separate areas (or layers).
• Often, there is a hierarchy of interconnected layers, with the lowest layer receiving sensory input and neurons in higher layers computing more complex functions.
• For example, neurons in a monkey’s visual cortex have been identified that are activated only when there is a face (monkey, human, or drawing) in the primate’s visual field.

Receptive Fields and Layers in Hierarchical Neural Networks
[Figure: neuron A and the receptive field of A]

Artificial Neural Network History & Terminology
• Originally hailed as a breakthrough in AI
• 1943: McCulloch & Pitts model, the first neuron model
• 1957: Rosenblatt, Perceptron networks
• 1969: Minsky & Papert defined the perceptron's limitations, an ANN setback
• 1980s: Re-emergence of ANNs with multi-layer networks
• Also referred to as connectionist networks, parallel distributed processing, adaptive networks
• Now better understood
• Hundreds of variants
• Less a model of the actual brain than a useful tool
• Numerous applications
  – handwriting, face and speech recognition, the CMU van that drives itself

Components of Modeled Neuron
• The internal components of the neural unit consist of:
  – Weights. There is a weight associated with each input ($w_1, w_2, \ldots, w_n$). These weights are two-valued as well ({-1, 1}).
  – Threshold / Bias: $\theta$. A threshold weight, $w_0$, associated with a bias value of $x_0$: $\theta = w_0 x_0$.
  – Summing Unit. A summation unit which produces the weighted sum of the binary inputs ($s = w_0 x_0 + w_1 x_1 + \ldots + w_n x_n$).
  – Activation Function. An activation function which determines whether the neural unit ‘fires’ or not. This function takes the weighted sum, $s$, as its input and outputs a single value (e.g. {0, 1} or {-1, 1}).
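
The components listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the course's own code: the function names are hypothetical, the unit is assumed to fire when $s \ge 0$, and the threshold is assumed to be folded in through a constant bias input $x_0 = -1$ whose weight $w_0$ holds the threshold value.

```python
from typing import Sequence


def weighted_sum(weights: Sequence[float], inputs: Sequence[float]) -> float:
    """Summing unit: s = w0*x0 + w1*x1 + ... + wn*xn."""
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have the same length")
    return sum(w * x for w, x in zip(weights, inputs))


def step(s: float) -> int:
    """Activation function (assumed convention): fire (1) when s >= 0, else 0."""
    return 1 if s >= 0 else 0


def neuron_output(weights: Sequence[float], inputs: Sequence[float]) -> int:
    """Full unit: weighted sum followed by the threshold activation."""
    return step(weighted_sum(weights, inputs))


# Hypothetical example: bias input x0 = -1 with weight w0 = 1.5,
# and two binary inputs with weight 1 each.
example_weights = [1.5, 1.0, 1.0]
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, neuron_output(example_weights, [-1, x1, x2]))
```

With these illustrative weights the unit fires only when both binary inputs are 1, because only then does the weighted sum of the inputs reach the threshold weight of 1.5.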
• The following diagram displays the above components.

Basic Artificial Neuron with Threshold
[Figure: inputs $x_1, x_2, \ldots, x_n$ with weights $w_1, w_2, \ldots, w_n$ feed a summing unit $\sum_{i=1}^{n} w_i x_i$, followed by an activation function $f$ with threshold $\theta$ that produces the output]

$f(w_1 x_1, w_2 x_2, \ldots, w_n x_n) = 1$, if $\sum_{i=1}^{n} x_i w_i \ge \theta$
$f(w_1 x_1, w_2 x_2, \ldots, w_n x_n) = 0$, otherwise

The Threshold Activation Function
• One possible neuron activation model is a threshold function.
• The graph of this function looks like this:
[Graph: $f$ plotted against $\sum_{i=1}^{n} x_i w_i$; $f$ steps from 0 to 1 at $\theta$]

Properties of Artificial Neural Nets (ANNs)
• Many simple neuron-like threshold switching units
• Many weighted interconnections among units
• Highly parallel, distributed processing
• Learning is accomplished by tuning the connection weights ($w_i$) through training
• Training is usually separate from actual usage

Appropriate Problem Domains for Neural Network Learning
• Input is high-dimensional discrete or real-valued (e.g. raw sensor input, pixels of a CCD)
• Output is discrete or real-valued
• Output is a vector of values
• Form of target function is unknown
• Humans do not need to interpret the results (black box model)

Perceptron Learning: Delta Rule
$w_i \leftarrow w_i + \Delta w_i$
$\Delta w_i = \eta (t - o) x_i$
$t$ is the target value
$o$ is the perceptron output
$\eta$ is a small constant (e.g. 0.1) called the learning rate
• If the output is correct ($t = o$) the weights $w_i$ are not changed.
• If the output is incorrect ($t \ne o$) the weights $w_i$ are changed such that the output of the perceptron for the new weights is closer to $t$.
• The algorithm converges to the correct classification
  – if the training data is linearly separable
  – and $\eta$ is sufficiently small

Supervised Learning
• Training and test data sets
• Training set:
  – 2 data components required: inputs & target

Perceptron Training
Output $= 1$ if $\sum_{i=0} w_i x_i > t$
Output $= 0$ otherwise
• Linear threshold is used.
• W - weight value
• t - threshold value

Simple network
output $= 1$ if $\sum_{i=0} w_i x_i > t$, $0$ otherwise
[Figure: simple network with a constant input of -1 and inputs X and Y; labelled weights W = 1.5 and W = 1; threshold t = 0.0]

Training Perceptrons For AND
[Figure: perceptron with a constant input of -1 and inputs A and B; each of the three weights is unknown (W = ?); threshold t = 0.0]

A  B  Output
0  0  0
0  1  0
1  0  0
1  1  1
• What are the starting weight values?
• Initialize with random weight values.

Training Perceptrons For AND
[Figure: perceptron with input I1 = -1 (W = 0.3), input I2 = A (W = 0.5) and input I3 = B (W = -0.4); threshold t = 0.0]

A  B  Output
0  0  0
0  1  0
1  0  0
1  1  1

I1  I2  I3  Summation                                 Output
-1  0   0   (-1*0.3) + (0*0.5) + (0*-0.4) = -0.3      0
-1  0   1   (-1*0.3) + (0*0.5) + (1*-0.4) = -0.7      0
-1  1   0   (-1*0.3) + (1*0.5) + (0*-0.4) = 0.2       1
-1  1   1   (-1*0.3) + (1*0.5) + (1*-0.4) = -0.2      0

Learning algorithm
Epoch: Presentation of the entire training set to the neural network. In the case of the AND function, an epoch consists of four sets of inputs being presented to the network (i.e. [0,0], [0,1], [1,0], [1,1]).
Error: The error value is the amount by which the value output by the network differs from the target value. For example, if we required the network to output 0 and it output a 1, then Error = -1.
$\Delta w_i = \eta (T - O) x_i$

Learning algorithm
Target Value, $T$: When we are training a network we not only present it with the input but also with a value that we require the network to produce. For example, if we present the network with [1,1] for the AND function, the target value will be 1.
Output, $O$: The output value from the neuron.
$x_i$: Inputs being presented to the neuron.
$w_i$: Weight from input neuron ($I_i$) to the output neuron.
$\eta$: The learning rate. This dictates how quickly the network converges. It is set by experimentation. It is typically 0.1.

Decision boundaries
• In simple cases, divide feature space by drawing a hyperplane across it.
• Known as a decision boundary.
• Discriminant function: returns different values on opposite sides of the boundary (a straight line in two dimensions).
• Problems which can be thus classified are linearly separable.
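
The AND training walk-through above can be sketched as a short Python program. It is a minimal illustration, not the lecturer's implementation: the function names and the `max_epochs` cap are assumptions, while the starting weights (0.3, 0.5, -0.4), the constant input of -1, the threshold t = 0.0 and the learning rate 0.1 are taken from the slides.

```python
from typing import List, Sequence, Tuple

BIAS_INPUT = -1.0  # constant input I1 from the slides
THRESHOLD = 0.0    # t
ETA = 0.1          # learning rate (typical value quoted in the slides)

# Training set for AND: ((A, B), target)
AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]


def predict(weights: Sequence[float], a: int, b: int) -> int:
    """Linear threshold unit: output 1 if the weighted sum exceeds t."""
    s = weights[0] * BIAS_INPUT + weights[1] * a + weights[2] * b
    return 1 if s > THRESHOLD else 0


def train(weights: Sequence[float], data, eta: float = ETA,
          max_epochs: int = 1000) -> Tuple[List[float], int]:
    """Apply delta w_i = eta * (T - O) * x_i until an epoch has no errors.

    max_epochs is a safety cap (an assumption, not part of the slides).
    Returns the learned weights and the number of the first error-free epoch.
    """
    w = list(weights)  # copy so the caller's starting weights are untouched
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for (a, b), target in data:
            error = target - predict(w, a, b)
            if error != 0:
                mistakes += 1
                for i, x in enumerate((BIAS_INPUT, a, b)):
                    w[i] += eta * error * x
        if mistakes == 0:
            return w, epoch
    raise RuntimeError("no error-free epoch within max_epochs")


start = [0.3, 0.5, -0.4]
# First pass with the starting weights, matching the summation table:
print([predict(start, a, b) for (a, b), _ in AND_DATA])  # [0, 0, 1, 0]
learned, epochs = train(start, AND_DATA)
print(learned, epochs)
print([predict(learned, a, b) for (a, b), _ in AND_DATA])
```

Weights are updated after every misclassified pattern rather than once per epoch; that is one common reading of the rule, chosen here for simplicity.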

Linear Separability
[Figure: points of classes A and B plotted in the ($X_1$, $X_2$) plane, separated by a straight-line decision boundary]

Decision Surface of a Perceptron
[Figure: two plots of + and - points in the ($x_1$, $x_2$) plane; left: linearly separable, right: non-linearly separable]
• A perceptron is able to represent some useful functions.
• AND($x_1$, $x_2$): choose weights $w_0 = -1.5$, $w_1 = 1$, $w_2 = 1$ (with $x_0 = 1$).
• But functions that are not linearly separable (e.g. XOR) are not representable.
• Do additional layers help? Not if the units are purely linear, because a stack of linear layers is still a linear function. Layers of nonlinear (e.g. threshold) units do enlarge the set of representable decision regions.

Multilayer Perceptron (MLP)
FeedForward Networks
[Figure: input signals (external stimuli) enter the input layer and pass through adjustable weights to the output layer, which produces the output values]

Different Non-Linearly Separable Problems

Structure      Types of Decision Regions
Single-Layer   Half plane bounded by hyperplane
Two-Layer      Convex open or closed regions
Three-Layer    Arbitrary (complexity limited by no. of nodes)

[The original table also illustrates, for each structure, the exclusive-OR problem, classes with meshed regions, and the most general region shapes, using regions labelled A and B.]

Types of Layers
• The input layer.
  – Introduces input values into the network.
  – No activation function or other processing.
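
To make the XOR discussion above concrete, the following Python sketch wires three threshold units into a small feedforward network that computes XOR, which a single perceptron cannot represent. The weights are hand-picked for illustration and are not taken from the slides; the function names are hypothetical, and each unit is assumed to use a constant bias input of -1 and to fire when its weighted sum is greater than 0.

```python
from typing import Sequence

BIAS_INPUT = -1  # assumed constant bias input for every unit


def threshold_unit(weights: Sequence[float], inputs: Sequence[int]) -> int:
    """Fire (1) if w0*(-1) + w1*x1 + w2*x2 > 0, else 0.

    weights[0] is the weight on the bias input, i.e. the unit's threshold.
    """
    s = weights[0] * BIAS_INPUT
    s += sum(w * x for w, x in zip(weights[1:], inputs))
    return 1 if s > 0 else 0


# Hand-picked (hypothetical) weights.
W_OR = (0.5, 1.0, 1.0)    # hidden unit 1: fires if x1 OR x2
W_AND = (1.5, 1.0, 1.0)   # hidden unit 2: fires if x1 AND x2
W_OUT = (0.5, 1.0, -1.0)  # output unit: fires if OR is on and AND is off


def xor_network(x1: int, x2: int) -> int:
    """Hidden layer of two threshold units followed by one output unit."""
    h_or = threshold_unit(W_OR, (x1, x2))
    h_and = threshold_unit(W_AND, (x1, x2))
    return threshold_unit(W_OUT, (h_or, h_and))


for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_network(x1, x2))
```

Each hidden unit draws one straight-line boundary, and the output unit combines the two half planes into the band between them, which is the kind of convex region a single unit cannot produce.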
