NeuroImage 42 (2008) 147–157
www.elsevier.com/locate/ynimg

Technical Note

Population dynamics: Variance and the sigmoid activation function

André C. Marreiros,⁎ Jean Daunizeau, Stefan J. Kiebel, and Karl J. Friston

Wellcome Trust Centre for Neuroimaging, University College London, UK

Received 24 January 2008; revised 8 April 2008; accepted 16 April 2008. Available online 29 April 2008.
This paper demonstrates how the sigmoid activation function of neural-mass models can be understood in terms of the variance or dispersion of neuronal states. We use this relationship to estimate the probability density on hidden neuronal states, using non-invasive electrophysiological (EEG) measures and dynamic causal modelling. The importance of implicit variance in neuronal states for neural-mass models of cortical dynamics is illustrated using both synthetic data and real EEG measurements of sensory evoked responses. © 2008 Elsevier Inc. All rights reserved.

Keywords: Neural-mass models; Nonlinear; Modelling

⁎ Corresponding author. Wellcome Trust Centre for Neuroimaging, Institute of Neurology, UCL, 12 Queen Square, London, WC1N 3BG, UK. Fax: +44 207 813 1445. E-mail address: [email protected] (A.C. Marreiros).

Introduction

The aim of this paper is to show how the sigmoid activation function in neural-mass models can be understood in terms of the dispersion of underlying neuronal states. Furthermore, we show how this relationship can be used to estimate the probability density of neuronal states using non-invasive electrophysiological measures such as the electroencephalogram (EEG).

There is growing interest in the use of mean-field and neural-mass models as observation models for empirical neurophysiological time-series (Wilson and Cowan, 1972; Nunez, 1974; Lopes da Silva et al., 1976; Freeman, 1975, 1978; Jansen and Rit, 1995; Jirsa and Haken, 1996; Wright and Liley, 1996; Valdes et al., 1999; Steyn-Ross et al., 1999; Frank et al., 2001; David and Friston, 2003; Robinson et al., 1997, 2001; Robinson, 2005; Rodrigues et al., 2006). Models of neuronal dynamics allow one to ask mechanistic questions about how observed data are generated. These questions or hypotheses can be addressed through model selection, by comparing the evidence for different models given the same data. This endeavour is referred to as dynamic causal modelling (DCM) (Friston, 2002, 2003; Penny et al., 2004; David et al., 2006a,b; Kiebel et al., 2006). There has been considerable success in modelling fMRI, EEG, MEG and LFP data using DCM (David et al., 2006a,b; Kiebel et al., 2006; Garrido et al., 2007; Moran et al., 2007). All these models embed key nonlinearities that characterise real neuronal interactions. The most prevalent models are called neural-mass models and are generally formulated as a convolution of inputs to a neuronal ensemble or population to produce an output. Critically, the outputs of one ensemble serve as input to another, after some static transformation. Usually, the convolution operator is linear, whereas the transformation of outputs (e.g., mean depolarisation of pyramidal cells) to inputs (firing rates in presynaptic inputs) is a nonlinear sigmoidal function. This function generates the nonlinear behaviours that are critical for modelling and understanding neuronal activity. We will refer to these functions as activation or input-firing curves.

The mechanisms that cause a neuron to fire are complex (Mainen and Sejnowski, 1995; Destexhe and Paré, 1999); they depend on the state (open, closed; active, inactive) of several kinds of ion channels in the postsynaptic membrane. The configuration of these channels depends on many factors, such as the history of presynaptic inputs and the presence of certain neuromodulators. As a result, neuronal firing is often treated as a stochastic process. Random fluctuations in neuronal firing are an important aspect of neuronal dynamics and have been the subject of much study. For example, Miller and Wang (2006) look at the temporal fluctuations in firing patterns in working memory models with persistent states. One perspective on this variability is that it is caused by fluctuations in the threshold of the input-firing curve of individual neurons. This is one motivation for a sigmoid activation function at the level of population dynamics, which rests on the well-known result that the average of many different threshold functions is a nonlinear sigmoid. We will show that the same sigmoid function can be motivated by assuming fluctuations in the neuronal states (Hodgkin and Huxley, 1952). This is a more plausible assumption, because variations in postsynaptic depolarisation over a population are greater than variations in firing threshold (Fricker et al., 1999): in active cells, membrane potential values fluctuate by up to about 20 mV, due largely to hyperpolarisations that follow activation. In contrast, firing thresholds vary by up to only 8 mV. Furthermore, empirical studies show that voltage thresholds, determined from current injection or by elevating extracellular K⁺, vary little with the rate of membrane polarisation, and that the "speed of transition into the inactivated states also appears to
contribute to the invariance of threshold for all but the fastest depolarisations" (Fricker et al., 1999). In short, the same mean-field model can be interpreted in terms of random fluctuations on the firing thresholds of different neurons, or fluctuations in their states. The latter interpretation is probably more plausible from a neurobiological point of view and endows the sigmoid function parameters with an interesting interpretation, which we exploit in this paper. It should be noted that Wilson and Cowan (1972) anticipated that the sigmoid could arise from a fixed threshold and population variance in neural states; after Eq. (1) of their seminal paper they state: "Alternatively, assume that all cells within a subpopulation have the same threshold, … but let there be a distribution of the number of afferent synapses per cell." This distribution induces variability in the afferent activity seen by any cell.

This is the first in a series of papers that addresses the importance of high-order statistics (i.e., variance) in neuronal dynamics, when trying to model and understand observed neurophysiological time-series. In this paper, we focus on the origin of the sigmoid activation function, which is a ubiquitous component of many neural-mass and cortical-field models. In brief, this treatment provides an interpretation of the sigmoid function as the cumulative density on postsynaptic depolarisation over an ensemble or population of neurons. Using real EEG data, we will show that population variance in the depolarisation of neurons in somatosensory sources generating sensory evoked potentials (SEPs) (Litvak et al., 2007) can be quite substantial, especially in relation to evoked changes in the mean. In a subsequent paper, we will present a mean-field model of population dynamics that covers both the mean and variance of neuronal states. A special case of this model is the neural-mass model, which assumes that the variance is fixed (David et al., 2006a,b; Kiebel et al., 2006). In a final paper, we will use these models as probabilistic generative models (i.e., dynamic causal models) to show that population variance can be an important quantity when explaining observed EEG and MEG responses.

This paper comprises three sections. In the first, we present the background and motivation for using sigmoid activation functions. These functions map mean depolarisation, within a neuronal population, to expected firing rate. We will illustrate the origins of their sigmoid form using a simple conductance-based model of a single population. We rehearse the well-known fact that threshold or Heaviside operators in the equations of motion for a single neuron lead to sigmoid activation functions, when the model is formulated in terms of mean neuronal states. We will show that the sigmoid function can be interpreted as the cumulative density function on depolarisation, within a population.

In the second section, we emphasise the importance of variance or dispersion by noting that a change in variance leads to a change in the form of the sigmoid function. This changes the transfer function of the system and its input–output properties. We will illustrate this by looking at the Volterra kernels of the model and computing the modulation transfer function, to show how the frequency response of a neuronal ensemble depends on population variance.

In the final section, we estimate the form of the sigmoid function using the established dynamic causal modelling technique and SEPs, following median nerve stimulation. In this analysis, we focus on a simple DCM of brainstem and somatosensory sources, each comprising three neuronal populations. Using standard variational techniques, we invert the model to estimate the density on various parameters, including the parameters controlling the shape of the sigmoid function. This enables us to estimate the implicit probability density function on depolarisation of neurons within each population. We conclude by discussing the implications of our results for neural-mass models, which ignore the effects of population variance on the evolution of mean activity. We use these conclusions to motivate a more general model of population dynamics that will be presented in a subsequent paper (Marreiros et al., manuscript in preparation).

Theory

In this section, we will show that the sigmoid activation function used in neural-mass models can be derived from straightforward considerations about single-neuron dynamics. To do this, we look at the relationship between variance introduced at the level of individual neurons and their population behaviour.

Saturating nonlinear activation functions can be motivated by considering neurons as binary units; i.e., as being in an active or inactive state. Wilson and Cowan (1972) showed that (assuming neuronal responses rest on a threshold or Heaviside function of activity) any unimodal distribution of thresholds results in a sigmoid activation function at the population level. This can be seen easily by assuming a distribution of thresholds within a population, characterised by the density p(w). For unimodal p(w), the response function, which is the integral of the threshold density, will have a sigmoid form. For symmetric and unimodal distributions, the sigmoid is symmetric and monotonically increasing; for asymmetric distributions, the sigmoid loses point symmetry around the inflection point; in the case of multimodal distributions, the sigmoid becomes wiggly (monotonically increasing, but with more than one inflection point).

Another motivation for saturating activation functions considers the firing rate of a neuron and assumes that its time average equals the population average (i.e., activity is ergodic). The firing rate of neurons always shows saturation and hence sigmoid-like behaviour. There are two distinct types of input-firing curves: type I and type II. The former are continuous and represent an increasing analytic function of input. The latter have a discontinuity, where firing starts after some critical input level is reached. These transitions correspond to a bifurcation from equilibrium to a limit-cycle attractor.¹ The type of bifurcation determines the fundamental computational properties of neurons. Type I and II neuronal behaviour can be generated by the same neuronal model (Izhikevich, 2007). From these considerations, it is possible to deduce population models (Dayan and Abbott, 2001).

We will start with the following ordinary differential equation (ODE) modelling the dynamics of a single neuron from the neural-mass model for EEG/MEG (David and Friston, 2003; David et al., 2006a,b; Kiebel et al., 2006; Garrido et al., in press; Moran et al., 2007); for example, the i-th neuron in a population of excitatory spiny stellate cells in the granular layer:

\dot{x}_1^{(i)} = x_2^{(i)}
\dot{x}_2^{(i)} = \kappa \left( G \langle H(x_1^{(j)} - w^{(j)}) \rangle_j + Cu \right) - 2\kappa x_2^{(i)} - \kappa^2 x_1^{(i)}     (1)

¹ In all cases, type I cells experience a saddle-node bifurcation on the invariant circle, at threshold. Type II neurons may have three different bifurcations; i.e., a subcritical Hopf bifurcation (most frequent), a supercritical Hopf bifurcation, or a saddle-node bifurcation outside the invariant circle.

1053-8119/$ - see front matter © 2008 Elsevier Inc. All rights reserved. doi:10.1016/j.neuroimage.2008.04.239
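The expectation of the Heaviside firing term in Eq. (1) is what gives rise to the sigmoid, as the Theory section develops below. As a quick numerical illustration of the underlying claim (a sketch with illustrative values for the mean, dispersion and threshold, not values from the paper), the proportion of Gaussian-dispersed states above a fixed threshold matches the Gaussian cumulative density evaluated at the mean minus the threshold:

```python
import math
import random

# Sketch: for states x1 ~ N(mu, sigma^2) and a fixed threshold w, the expected
# Heaviside firing <H(x1 - w)> equals the Gaussian cumulative density at mu - w.
# The values of mu, sigma and w below are illustrative, not taken from the paper.

random.seed(1)
mu, sigma, w = 1.0, 2.0, 1.8     # mean state, dispersion, threshold (arbitrary units)
n = 200_000

# Monte-Carlo estimate of the proportion of neurons above threshold
frac = sum(1 for _ in range(n) if random.gauss(mu, sigma) > w) / n

# cumulative density of N(0, sigma^2), evaluated at mu - w
cdf = 0.5 * (1.0 + math.erf((mu - w) / (sigma * math.sqrt(2.0))))

print(f"Monte-Carlo: {frac:.3f}, Gaussian CDF: {cdf:.3f}")  # agree to ~2 decimals
```

With a large population the two agree closely, which is the sense in which the activation function is a cumulative density on neuronal states.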
This is a fairly ubiquitous form for neuronal dynamics in many neural-mass and cortical-field models and describes neuronal dynamics in terms of two states: x_1^{(i)}, which can be regarded as depolarisation, and x_2^{(i)}, which corresponds to a scaled current. These ordinary differential equations correspond to a convolution of input with a 'synaptic' alpha-function (e.g., Gerstner, 2001). This synaptic kernel is parameterised by G, controlling the maximum postsynaptic potential, and κ, which represents a lumped rate constant. Here, input has exogenous and endogenous components: exogenous input is injected current u, scaled by the parameter C. Endogenous input arises from connections with other neurons in the same population (more generally, any population). It is assumed that each neuron senses all others, so that the endogenous input is the expected firing over neurons in the population. Therefore, neural-mass models are necessarily associated with a spatial scale over which the population is deployed; i.e., the so-called mesoscale,² from a few hundred to a few thousand neurons. The firing of one neuron is assumed to be a Heaviside function of its depolarisation that is parameterised by some neuron-specific threshold, w^{(i)}. We can write this in terms of the cumulative density over the states and thresholds of the population

\langle H(x_1^{(j)} - w^{(j)}) \rangle_j = \iint H(x_1 - w)\, p(x_1, w)\, dx_1\, dw = \iint_{x_1 > w} p(x_1, w)\, dx_1\, dw     (2)

This expression can be simplified if we assume the states have a large variability in relation to the thresholds (see Fricker et al., 1999) and replace the density on the thresholds, p(w), with a point mass at its mode, w. Under this assumption, the input from other neurons can be expressed as a function of the sufficient statistics³ of the population's states; for example, if we assume a Gaussian density p(x_1) = N(x_1; \mu_1, \sigma_1^2), we can write

\langle H(x_1^{(j)} - w) \rangle_j = \int_w^{\infty} p(x_1)\, dx_1 = S(\mu_1 - w)     (3)

where S(·) is the sigmoid cumulative density function of a zero-mean normal distribution with variance σ₁² (cf. Freeman, 1975). Eq. (3) is quite critical because it links the motion of a single neuron to the population density and therefore couples microscopic and mesoscopic dynamics. Finally, we can summarise the population dynamics in terms of the sufficient statistics of the states, to give a mean-field model μ_i = ⟨x_i⟩, by taking the expectation of Eq. (1)

\dot{x}_1^{(i)} = x_2^{(i)}
\dot{x}_2^{(i)} = \kappa \left( G S(\mu_1 - w) + Cu \right) - 2\kappa x_2^{(i)} - \kappa^2 x_1^{(i)}
\Rightarrow
\dot{\mu}_1 = \mu_2
\dot{\mu}_2 = \kappa \left( G S(\mu_1 - w) + Cu \right) - 2\kappa \mu_2 - \kappa^2 \mu_1     (4)

We can do this easily because the equations of motion are linear in the states (note that the sigmoid is not a function of the states). The ensuing mean-field model has exactly the same form as the neural-mass model we use in dynamic causal modelling of electromagnetic observations (David et al., 2006a,b). It basically describes the evolution of mean states that are observed directly or indirectly. In these neural-mass models the sigmoid has a fixed form⁴

S(\mu_1 - w) = \frac{1}{1 + \exp(-\rho(\mu_1 - w))}     (5)

where ρ is a parameter that determines its slope (cf. voltage-sensitivity). It is this function that endows the model with nonlinear behaviour and biological plausibility. However, this form assumes that the variance of the states is fixed, because the sigmoid encodes the density on neuronal states (see Eq. (3)). In the particular parameterisation of Eq. (5), the slope-parameter corresponds roughly to the inverse variance or precision of p(x_i); more precisely

\sigma_i^2(\rho) = \int (x_i - \mu_i)^2 p(x_i)\, dx_i
p(x_i) = S'(x_i - \mu_i) = \frac{\rho \exp(-\rho(x_i - \mu_i))}{\left(1 + \exp(-\rho(x_i - \mu_i))\right)^2}     (6)

Fig. 1. Relationship between the sigmoid slope ρ and the population variance, expressed as the standard deviation.

Fig. 1 shows the implicit standard deviation over neural states as a function of the slope-parameter, ρ. Heuristically, a high voltage-sensitivity or gain corresponds to a tighter distribution of voltages around the mean, so that near-threshold increases in the mean cause a greater proportion of neurons to fire and an increased sensitivity to changes in the mean.

This analysis is based on the assumption that variations in threshold are small, in relation to variability in the neuronal states themselves. Clearly, in the real brain, threshold variance is not zero; in the Appendix, we show that if we allow for variance on the thresholds (as suggested by our reviewers), the standard deviation in Fig. 1 becomes an upper bound on the population variability of the states. In the next section, we look at how the dynamics of a population can change profoundly when the inverse variance (i.e., gain) changes.

² Different descriptions pertain to at least three levels of organization. At the lowest level we have single neurons and synapses (microscale) and, at the highest, anatomically distinct brain regions and inter-regional pathways (macroscale). Between these lies the level of neuronal groups or populations (mesoscale) (Sporns et al., 2005).
³ The quantities that specify a probability density; e.g., the mean and variance.
⁴ By fixed, we mean constant over time. Note that we ignore a constant term here that can be absorbed into exogenous input.

Kernels, transfer functions and the sigmoid

In this section, we illustrate the effect of changing the slope-parameter (i.e., the variance of the underlying neuronal states) on the input–output behaviour of neuronal populations. We will start with a time-domain characterisation, in terms of convolution kernels, and conclude with a frequency-domain characterisation, in terms of transfer functions. We will see that the effects of changing the implicit variance are mediated largely by first-order effects and can be quite profound.

Nonlinear analysis and Volterra kernels

The input–output behaviour of population responses can be characterised in terms of a Volterra series. These series are a functional expansion of a population's input that produces its outputs (where the outputs from one population constitute the inputs to another). The existence of this expansion suggests that the history of inputs and the Volterra kernels represent a complete and sufficient specification of population dynamics (Friston et al., 2003b). The theory states that, under fairly general conditions, the output y of a nonlinear dynamic system can be expressed in terms of an infinite sum of integral operators

y(t) = \sum_i \int \cdots \int \kappa_i(\sigma_1, \ldots, \sigma_i)\, u(t - \sigma_1) \cdots u(t - \sigma_i)\, d\sigma_1 \cdots d\sigma_i     (7a)

where the i-th order kernel is

\kappa_i(\sigma_1, \ldots, \sigma_i) = \frac{\partial^i y(t)}{\partial u(t - \sigma_1) \cdots \partial u(t - \sigma_i)}     (7b)

Volterra kernels represent the causal input–output characteristics of a system and can be regarded as generalised impulse response functions (i.e., the response to an impulse or spike). The first-order kernel κ₁(σ₁) = ∂y(t)/∂u(t−σ₁) encodes the response evoked by a change in input at t−σ₁. In other words, it is a time-dependent measure of driving efficacy. Similarly, the second-order kernel κ₂(σ₁, σ₂) = ∂²y(t)/∂u(t−σ₁)∂u(t−σ₂) reflects the modulatory influence of the input at t−σ₁ on the response evoked by input at t−σ₂; and so on for higher orders.

Volterra series have been described as a 'power series with memory' and are generally thought of as a high-order or nonlinear convolution of inputs to provide an output. Essentially, the kernels are a re-parameterisation of the system that encodes the input–output properties directly, in terms of impulse response functions. In what follows, we computed the first and second-order kernels (i.e., impulse response functions) of the neural-mass models, using different slope-parameters. This enabled us to see whether the changes in population variance are expressed primarily in first or second-order effects.
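Numerically, a first-order kernel can be approximated by perturbing the system with a small input impulse and dividing the response by the impulse size. The sketch below does this for the single-population mean-field model of Eq. (4), with the sigmoid of Eq. (5); all parameter values are illustrative, and this toy single-population system is not the three-population source model used for the kernels in Fig. 3:

```python
import math

# First-order Volterra kernel k1(t) of the single-population mean-field model
# (Eq. (4)) with the sigmoid of Eq. (5), approximated by finite differences:
# k1(t) ~ (response to a small impulse - baseline response) / impulse size.
# Parameter values are illustrative, not those of the paper's simulations.

def S(v, rho=0.8, w=1.8):
    """Logistic sigmoid of Eq. (5)."""
    return 1.0 / (1.0 + math.exp(-rho * (v - w)))

def fixed_point(G, kappa, iters=100):
    """Solve mu1* = (G / kappa) * S(mu1*) by fixed-point iteration."""
    mu = 0.0
    for _ in range(iters):
        mu = (G / kappa) * S(mu)
    return mu

def simulate(eps, G=8.0, kappa=0.25, C=1.0, dt=0.001, T=60.0):
    """Euler integration from the fixed point, with an impulse of area eps at t = 0."""
    mu1, mu2 = fixed_point(G, kappa), 0.0
    traj = []
    for i in range(int(T / dt)):
        u = eps / dt if i == 0 else 0.0          # discrete approximation of delta(t)
        dmu1 = mu2
        dmu2 = kappa * (G * S(mu1) + C * u) - 2.0 * kappa * mu2 - kappa**2 * mu1
        mu1 += dt * dmu1
        mu2 += dt * dmu2
        traj.append(mu1)
    return traj

eps = 1e-3
base = simulate(0.0)                              # baseline: stays at the fixed point
pert = simulate(eps)                              # perturbed by the impulse
kernel = [(p - b) / eps for p, b in zip(pert, base)]
```

For a dynamically stable system, the estimated kernel rises, peaks and decays back to zero; repeating the computation for different values of ρ reproduces, in miniature, the comparison made in Fig. 3.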
Fig. 2. Schematic of the neural-mass model used to model a single source (Moran et al., 2007).
Fig. 3. Upper panels: the first-order Volterra kernels for the depolarisation of pyramidal (blue) and spiny stellate (green) populations, for two different values of ρ (left: 0.8; right: 1.6). There is a difference between the waveforms, which is marked for the pyramidal cells. Lower panels: the corresponding second-order Volterra kernels, in image format. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
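The dependence of these kernels on ρ reflects the slope–variance relationship of Fig. 1: for the parameterisation of Eq. (5), the implied density in Eq. (6) is a logistic density, whose standard deviation is π/(√3 ρ). This can be checked numerically (a sketch using simple trapezoidal integration over the density):

```python
import math

# Check of Eq. (6): the density implied by the sigmoid of Eq. (5),
# p(x) = S'(x - mu), is a logistic density whose standard deviation is
# pi / (sqrt(3) * rho) -- the slope-variance relationship plotted in Fig. 1.

def implied_sd(rho, n=400_001):
    """Standard deviation of p(x) = rho*e^(-rho*x)/(1 + e^(-rho*x))^2, trapezoidal rule."""
    lo, hi = -40.0 / rho, 40.0 / rho      # p(x) is negligible beyond +/- 40/rho
    h = (hi - lo) / (n - 1)
    var = 0.0
    for i in range(n):
        x = lo + i * h
        e = math.exp(-rho * x)
        p = rho * e / (1.0 + e) ** 2      # zero-mean logistic density
        wgt = 0.5 if i in (0, n - 1) else 1.0
        var += wgt * x * x * p * h
    return math.sqrt(var)

for rho in (0.25, 0.8, 2.0):
    analytic = math.pi / (math.sqrt(3.0) * rho)
    print(f"rho = {rho:4.2f}: numeric SD = {implied_sd(rho):.4f}, analytic = {analytic:.4f}")
```

The inverse relationship between slope and dispersion is the sense in which a high gain corresponds to a tight distribution of voltages around the mean.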
The specific neural-mass model we used has been presented in detail by Moran et al. (2007). This model uses intrinsic coupling parameters, γ_i, between three subpopulations within any source of observed electromagnetic activity. Each source comprises an inhibitory subpopulation in the supragranular layer and an excitatory pyramidal (output) population in an infra-granular layer. Both these populations are connected to an excitatory spiny (input) population in the granular layer. This model differs from the model used by David and Friston (2003) in two ways: (i) the inhibitory subpopulation has recurrent self-connections and (ii) spike-rate adaptation is included to mediate slow neuronal dynamics. The equations of motion for a three-population source are shown in Fig. 2; these all have the form of Eq. (4).

Table 1
Model parameters

Parameter              Physiological interpretation      Value
H_{e,i}                Maximum postsynaptic potentials   8 mV, 32 mV
τ_{e,i} = 1/κ_{e,i}    Postsynaptic rate constants       4 ms, 16 ms
τ_a = 1/κ_a            Adaptation rate constant          512 ms
γ_{1,2,3,4,5}          Intrinsic connectivity            128, 128, 64, 64, 4
w                      Threshold                         1.8

The first-order kernels or response functions for the depolarisation of the two excitatory populations are shown in Fig. 3 (upper panels), and the second-order kernels for the excitatory pyramidal cell population are shown in the lower panels, for two values of the slope-parameter: ρ = 0.8 and ρ = 1.6. The other parameters were chosen such that the system was dynamically stable; see Moran et al. (2007) and Table 1. The kernels were computed as described in the appendix of Friston et al. (2000).

The first-order responses exhibit a more complicated waveform for the smaller value of ρ, with pronounced peaks at about 10 ms and 20 ms for the stellate and pyramidal populations respectively. Both responses resemble damped fast oscillations in the gamma range (about 40 Hz). In addition, there appears to be a slower dynamic, with late peaks at about 100 ms. This is lost with larger values of ρ (right panels); furthermore, the pyramidal response is attenuated and more heavily damped. The second-order kernels have two pronounced off-diagonal wing-shaped positivities that do not differ markedly for the two values of ρ. These high-order kernels tell us about nonlinear or modulatory interactions among inputs and speak to asynchronous coupling. For example, the peaks in the second-order kernel at 10 ms and 20 ms (upper arrow) mean that the response to an input 10 ms in the past is positively modulated by an input 20 ms ago (and vice versa). The long-term memory of the population dynamics is expressed in positive asynchronous interactions (lower arrow) around 100 ms. These second-order effects correspond to interactions between inputs at different times, in terms of producing changes in the output. They can be construed as input effects that interact nonlinearly with intrinsic states, which 'remember' the inputs. In the present context, these effects are due to, and only to, the nonlinear form of the sigmoid function, which is mandated by the fact that it is a cumulative probability density function. This is an important observation, which means that, under the models considered here, population dynamics must necessarily exhibit nonlinear responses.

The effect of changing the gain or slope-parameter is much more evident in the first-order, relative to the second-order, kernels. This suggests that population variance does not, in itself, change the nonlinear properties of the population dynamics, compared to linear effects. The reason that the slope-parameter has quantitatively more marked effects on the first-order kernel is that our neural-mass model is only weakly nonlinear; it does not involve any interactions among the states, apart from those mediated by the sigmoid activation function. We can use this to motivate a focus on linear effects, using linear systems theory in the frequency domain.

Linear analysis and transfer functions

An alternative characterisation of generalised kernels is in terms of their Fourier transforms, which furnish generalised transfer functions. A transfer function allows one to compute the frequency or spectral response of a population, given the spectral characteristics of its inputs. We have presented a linear transfer function analysis of this neural-mass model previously (Moran et al., 2007). Our model is linearised by replacing the sigmoid function with a first-order expansion around μ_i = 0 to give

S(\mu_i) = S'(-w)\, \mu_i     (8)

This assumes small perturbations of neuronal states around steady-state. Linearising the model in this way allows us to evaluate the transfer function using the properties of the Laplace transform, which enables differentiation to be cast as a multiplication. One benefit of this is that convolution in the time domain can be replaced by multiplication in the s-domain. This reduces the computational complexity of the calculations required to analyse the system.

We examined the effects of the slope-parameter on the transfer function by computing |H(jω)| for different values of ρ = 1/16, 2/16, …, 2. |H(jω)| corresponds to the spectral response under white-noise input (see Eq. (10)). Fig. 4 shows that the spectral response is greatest at about ρ = 0.8, when it exhibits a bimodal frequency distribution, with a pronounced alpha peak (~12 Hz) and a broader gamma peak (~40 Hz). As ρ increases or decreases from this value, the alpha component is lost, leading to broad-band responses expressed maximally in the gamma range. This is an interesting result, which suggests that the population's spectral responses are quite sensitive to changes in the dispersion of states, particularly with respect to the relative amounts of alpha and gamma power. Having said this, these results
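The linearisation can be made concrete for a single population: substituting the first-order expansion of Eq. (8) into Eq. (4) gives a second-order linear system with transfer function H(s) = κC / (s² + 2κs + κ² − κG S′(−w)) from input u to mean depolarisation μ₁. The sketch below evaluates |H(jω)| for this toy single-population case; the parameters are illustrative (chosen so that κ² > κG S′(−w), i.e., the linearised system is stable), and this is not the multi-population transfer function analysed by Moran et al. (2007):

```python
import math

# Transfer function of the linearised single-population model: substituting
# S(mu1) ~ S'(-w) * mu1 (Eq. (8)) into Eq. (4) gives
#   H(s) = kappa*C / (s^2 + 2*kappa*s + kappa^2 - kappa*G*S'(-w)).
# Illustrative parameters only; not the paper's multi-population analysis.

def S_prime(v, rho):
    """Derivative of the logistic sigmoid of Eq. (5), evaluated at v = mu1 - w."""
    s = 1.0 / (1.0 + math.exp(-rho * v))
    return rho * s * (1.0 - s)

def H(omega, rho, G=1.0, kappa=0.25, C=1.0, w=1.8):
    """Frequency response H(j*omega) of the linearised system."""
    s = 1j * omega
    return kappa * C / (s * s + 2.0 * kappa * s + kappa**2 - kappa * G * S_prime(-w, rho))

# gain falls off at high frequency (low-pass behaviour) ...
g_low, g_high = abs(H(0.0, rho=0.8)), abs(H(10.0, rho=0.8))
# ... and depends on the slope-parameter, through the effective gain S'(-w)
g_rho_small, g_rho_large = abs(H(0.0, rho=0.8)), abs(H(0.0, rho=2.0))
```

Even in this reduced setting, the spectral gain changes with ρ, illustrating in miniature why the population's spectral response is sensitive to the dispersion of states.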