Introduction to Artificial Neural Networks


MAE-491/591

Artificial Neural Networks: Biological Inspiration
• The brain has been studied extensively by scientists, but its vast complexity permits only a rudimentary understanding.
• Even the behaviour of an individual neuron is extremely complex.

Features of the Brain
• More than ten billion (10^10) neurons.
• Neuron switching time > 10^-3 seconds.
• Face recognition takes ~0.1 seconds.
• On average, each neuron has several thousand connections to other neurons.
• Hundreds of operations per second.
• A very high degree of parallel computation.
• Distributed representations (information is not kept in one spot).
• Neurons die off frequently and are never replaced; the brain compensates through redundancy, rewiring, and massive parallelism.

Computers vs Brains: A Contrast in Architecture
Computers:
• One CPU
• Fixed connections
• Fast, repeatable/reliable
• Absolute arithmetic precision, good computation
• Poor recognition
Brains:
• Highly parallel processing
• Changing connections
• Good pattern recognition, association, and context
• Good for complex issues, good noise tolerance, good with incomplete information
• Unreliable and slow, with poor computational ability

Why Artificial Neural Networks?
There are two basic reasons why we are interested in building artificial neural networks (ANNs):
• Technical viewpoint: some problems, such as character recognition or predicting the future states of a system, require massively parallel and adaptive processing.
• Biological viewpoint: ANNs can be used to replicate and simulate components of the human (or animal) brain, giving us insight into natural information processing.

Why do we need another paradigm besides symbolic AI or fuzzy logic for building "intelligent" machines?
• Symbolic AI and fuzzy logic are well suited to representing explicit knowledge that can be appropriately formalized, or stated.
• However, learning in complex (e.g., biological) systems is mostly implicit: an adaptation process based on uncertain information and uncertain reasoning via experience.
• ANNs are useful in mechatronics for:
– developing "unwritten" rules of operation and control;
– building a model or estimator of a system by observation.

How do NNs and ANNs work?
• The "building blocks" of real neural networks are the neurons (nerve cells).

[Figure: a biological neuron — cell body with nucleus, dendrites, axon, and synapse]

How does a neuron work?
• A neuron only fires (via its axon) if its input signal exceeds a certain amount, the threshold: all or nothing (on or off).
• Synapses (the junctions/connections between neurons) vary in strength:
– good connections allow a large signal;
– slight connections allow only a weak signal;
– synapses can be either excitatory (+) or inhibitory (-).

How do they all piece together?
Basically, each neuron:
• Transmits information as a series of electric impulses, so-called spikes.
• Receives inputs from many other neurons; one neuron can be connected to thousands of others.
• Changes its internal state (activation) based on all the current inputs.
• Sends one output signal to many other neurons (more than 10,000 in some cases), possibly including itself (a recurrent network).
• Phase and frequency also carry information in real neurons.
• In ANNs we refer to neurons as units or nodes.

How do NNs (and ANNs) work? (continued)
• In biological systems, neurons of similar functionality are usually organized in separate areas (or layers).
• Often there is a hierarchy of interconnected layers, with the lowest layer receiving sensory input and neurons in higher layers computing more complex functions.
• For example, neurons in a monkey's visual cortex have been identified that activate only when there is a face (monkey, human, or drawing) in the primate's visual field.

[Figure: receptive fields and layers in a hierarchical neural network — a neuron A and the receptive field of A]

Artificial Neural Network History & Terminology
• Originally hailed as a breakthrough in AI.
• 1943 – McCulloch & Pitts: the first neuron model.
• 1957 – Rosenblatt: perceptron networks.
• 1969 – Minsky & Papert define the perceptron's limitations: an ANN setback.
• 1980s – re-emergence of ANNs as multi-layer networks.
• Also referred to as connectionist networks, parallel distributed processing, or adaptive networks.
• Now better understood, with hundreds of variants; less a model of the actual brain than a useful tool.
• Numerous applications: handwriting, face, and speech recognition; the CMU van that drives itself.

Components of a Modeled Neuron
The internal components of the neural unit consist of:
• Weights (w1, w2, …, wn). There is a weight associated with each input; in this simple model the weights, like the inputs, are two-valued (e.g. {-1, 1}).
• Threshold / bias θ. A threshold weight w0 associated with a bias value x0, so that θ = w0 x0.
• Summing unit. Produces the weighted sum of the binary inputs: s = w0x0 + w1x1 + … + wnxn.
• Activation function. Determines whether the neural unit "fires" or not; it takes the weighted sum s as its input and outputs a single value (e.g. {0, 1} or {-1, 1}).
The following diagram displays the above components.

Basic Artificial Neuron with Threshold

[Figure: inputs x1 … xn, each multiplied by a weight w1 … wn, feed a summing unit ∑ wi xi whose result passes through a threshold activation f with threshold θ to produce the output]

f(w1x1, w2x2, …, wnxn) = 1 if w1x1 + w2x2 + … + wnxn ≥ θ
                       = 0 otherwise

The Threshold Activation Function
• One possible model of neuron activation is a threshold function.
• [Figure: the graph of f is a step that jumps from 0 to 1 where ∑ wi xi reaches θ]
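To make the components above concrete, here is a minimal sketch of such a threshold unit in Python (an illustration added for this write-up; the function name and example values are assumptions, not part of the slides):

    # A minimal threshold unit: weighted sum of the inputs followed by a
    # step activation, as in the diagram above.
    def threshold_neuron(inputs, weights, theta):
        # Weighted sum s = w1*x1 + ... + wn*xn
        s = sum(w * x for w, x in zip(weights, inputs))
        # The unit "fires" (outputs 1) only if s reaches the threshold
        return 1 if s >= theta else 0

    # Example: two binary inputs, unit weights, threshold 1.5 -> logical AND.
    print(threshold_neuron([1, 1], [1, 1], theta=1.5))  # prints 1
    print(threshold_neuron([1, 0], [1, 1], theta=1.5))  # prints 0

Folding the threshold into a bias weight w0 on a constant input x0, as described above, gives the same unit with the comparison made against 0 instead of θ.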
Properties of Artificial Neural Nets (ANNs)
• Many simple neuron-like threshold switching units.
• Many weighted interconnections among units.
• Highly parallel, distributed processing.
• Learning is accomplished by tuning the connection weights (wi) through training.
• Training is usually separate from actual usage.

Appropriate Problem Domains for Neural Network Learning
• Input is high-dimensional, discrete or real-valued (e.g. raw sensor input, the pixels of a CCD).
• Output is discrete or real-valued.
• Output is a vector of values.
• The form of the target function is unknown.
• Humans do not need to interpret the results (a black-box model).

Perceptron Learning: The Delta Rule

wi ← wi + Δwi, where Δwi = η (t − o) xi

• t is the target value.
• o is the perceptron output.
• η is a small constant (e.g. 0.1) called the learning rate.
• If the output is correct (t = o), the weights wi are not changed.
• If the output is incorrect (t ≠ o), the weights wi are changed so that the perceptron's output for the new weights is closer to t.
• The algorithm converges to the correct classification
– if the training data are linearly separable,
– and if η is sufficiently small.

Supervised Learning
• Training and test data sets.
• The training set requires two data components: inputs and targets.
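A minimal sketch of the delta rule in Python (illustrative code, not from the slides; the constant -1 bias input and the starting weights follow the AND walk-through in the next section):

    # Perceptron training with the delta rule: wi <- wi + eta*(t - o)*xi.
    # Input 0 is held at -1 so that w[0] acts as the threshold/bias weight.
    def train_perceptron(examples, eta=0.1, epochs=20):
        w = [0.3, 0.5, -0.4]  # starting weights from the AND example below
        for _ in range(epochs):
            for x, t in examples:
                xs = [-1] + list(x)  # prepend the bias input
                o = 1 if sum(wi * xi for wi, xi in zip(w, xs)) > 0 else 0
                w = [wi + eta * (t - o) * xi for wi, xi in zip(w, xs)]
        return w

    # AND training set: the target is 1 only for the input [1, 1].
    and_set = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
    print(train_perceptron(and_set))  # about [0.4, 0.4, 0.1]: classifies AND correctly

Because AND is linearly separable and η is small, the loop settles on a separating weight vector within a few epochs, as the convergence conditions above promise.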
Perceptron Training

Output = 1 if w0x0 + w1x1 + … + wnxn > t
         0 otherwise

• A linear threshold unit is used.
• wi – weight value.
• t – threshold value.

Simple Network

[Figure: a single threshold unit with t = 0.0; a constant input of -1 enters with weight W = 1.5, and inputs X and Y each enter with weight W = 1]

Training Perceptrons for AND

A  B | Output
0  0 |   0
0  1 |   0
1  0 |   0
1  1 |   1

[Figure: the same unit with bias input -1 and inputs A and B, each weight marked W = ?, and t = 0.0]

• What are the starting weight values?
• Initialize with random weight values, say W = 0.3 on the bias input I1 = -1, W = 0.5 on I2 = A, and W = -0.4 on I3 = B. The first pass through the training set then looks like this:

I1  I2  I3 | Summation                             | Output
-1   0   0 | (-1×0.3) + (0×0.5) + (0×-0.4) = -0.3  |   0
-1   0   1 | (-1×0.3) + (0×0.5) + (1×-0.4) = -0.7  |   0
-1   1   0 | (-1×0.3) + (1×0.5) + (0×-0.4) =  0.2  |   1
-1   1   1 | (-1×0.3) + (1×0.5) + (1×-0.4) = -0.2  |   0

Learning Algorithm
• Epoch: one presentation of the entire training set to the neural network. For the AND function, an epoch consists of four input sets being presented to the network (i.e. [0,0], [0,1], [1,0], [1,1]).
• Error: the amount by which the network's output differs from the target value. For example, if we required the network to output 0 and it output 1, then Error = -1.

Δwi = η (T − o) xi

• Target value, T: when training a network we present not only the input but also the value we require the network to produce. For example, if we present the network with [1,1] for the AND function, the training value will be 1.
• Output, O: the output value from the neuron.
• Xi: the inputs being presented to the neuron.
• Wi: the weight from input neuron Ij to the output neuron.
• η: the learning rate, which dictates how quickly the network converges. It is set by experimentation, typically to 0.1.

Decision Boundaries
• In simple cases, the feature space can be divided by drawing a hyperplane across it, known as a decision boundary.
• The discriminant function returns different values on opposite sides of the boundary (a straight line in two dimensions).
• Problems that can be classified in this way are linearly separable.

Linear Separability

[Figure: points of classes A and B in the (X1, X2) plane, separated by a straight decision boundary]

Decision Surface of a Perceptron

[Figure: two scatter plots of + and - points in the (x1, x2) plane — in the first a straight line separates the classes (linearly separable); in the second no straight line can (non-linearly separable)]

• A perceptron is able to represent some useful functions: for AND(x1, x2), choose weights w0 = -1.5, w1 = 1, w2 = 1.
• But functions that are not linearly separable (e.g. XOR) are not representable.
• Do additional layers help? Not with purely linear units; with non-linear threshold units, however, an extra layer makes such functions representable (see the sketch at the end of this section).

Multilayer Perceptron (MLP): Feed-Forward Networks

[Figure: input signals (external stimuli) enter the input layer and pass through adjustable weights to the output layer, which produces the output values]

Different Non-Linearly Separable Problems

Structure     | Types of decision regions
Single-layer  | Half plane bounded by a hyperplane
Two-layer     | Convex open or closed regions
Three-layer   | Arbitrary (complexity limited by the number of nodes)

(The original figure also illustrated each structure on the exclusive-OR problem, on classes with meshed regions, and with the most general region shapes.)

Types of Layers
• The input layer:
– introduces input values into the network;
– has no activation function or other processing.
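As flagged in the decision-surface discussion above, one extra layer of threshold units is enough to represent XOR. A minimal sketch with hand-picked weights (the specific numbers are illustrative assumptions, not taken from the slides):

    # Two-layer network of threshold units computing XOR.
    def step(s, theta):
        return 1 if s >= theta else 0

    def xor_two_layer(x1, x2):
        # Hidden layer: h1 = OR(x1, x2), h2 = NAND(x1, x2).
        h1 = step(x1 + x2, 0.5)    # fires if at least one input is on
        h2 = step(-x1 - x2, -1.5)  # fires unless both inputs are on
        # Output layer: AND of the hidden units yields XOR.
        return step(h1 + h2, 1.5)

    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "->", xor_two_layer(a, b))  # prints the XOR truth table

A single-layer unit cannot do this, matching the half-plane row of the table above: the hidden layer carves out the convex band between the two hyperplanes that XOR needs.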