Optimal Prediction with Resource Constraints Using the Information Bottleneck
Vedant Sachdeva1, Thierry Mora2, Aleksandra M. Walczak3, Stephanie Palmer4

1 Graduate Program in Biophysical Sciences, University of Chicago, Chicago, IL; 2 Laboratoire de physique statistique, CNRS, Sorbonne Université, Université Paris-Diderot, and École Normale Supérieure (PSL University), 24, rue Lhomond, 75005 Paris, France; 3 Laboratoire de physique théorique, CNRS, Sorbonne Université, and École Normale Supérieure (PSL University), 24, rue Lhomond, 75005 Paris, France; 4 Department of Organismal Biology and Anatomy, University of Chicago, Chicago, IL∗

(Dated: April 29, 2020)

∗Correspondence should be addressed to [email protected]

Responding to stimuli requires that organisms encode information about the external world. Not all parts of the signal are important for behavior, and resource limitations demand that signals be compressed. Prediction of the future input is widely beneficial in many biological systems. We compute the trade-offs between representing the past faithfully and predicting the future for input dynamics with different levels of complexity. For motion prediction, we show that, depending on the parameters in the input dynamics, velocity or position coordinates prove more predictive. We identify the properties of global, transferable strategies for time-varying stimuli. For non-Markovian dynamics we explore the role of long-term memory of the internal representation. Lastly, we show that prediction in evolutionary population dynamics is linked to clustering allele frequencies into non-overlapping memories, revealing a very different prediction strategy from motion prediction.

I. INTRODUCTION

How biological systems represent external stimuli is critical to their behavior. The efficient coding hypothesis, which states that neural systems extract as much information as possible from the external world, given basic capacity constraints, has been successful in explaining some early sensory representations in neuroscience. Barlow suggested sensory circuits may reduce redundancy in the neural code and minimize metabolic costs for signal transmission [1–4]. However, not all external stimuli are equally important to an organism, and behavioral and environmental constraints need to be integrated into this picture to more broadly characterize biological encoding. Delays in signal transduction mean that predicting external stimuli efficiently can confer benefits to biological systems [5–7], making prediction a general goal in biological sensing.

Evidence that representations constructed by sensory systems efficiently encode predictive information has been found in the visual and olfactory systems [8–10]. Molecular networks have also been shown to be predictive of future states, suggesting prediction may be one of the underlying principles of biological computation [11, 12]. However, the coding capacity of biological systems is limited because they cannot provide arbitrarily high precision about their inputs: limited metabolic resources and other sources of internal noise impose finite-precision signal encoding. Given these trade-offs, one way to efficiently encode the history of an external stimulus is to keep only the information relevant for the prediction of the future input [12–14]. Here, we explore how optimal predictions might be encoded by neural and molecular systems using a variety of dynamical inputs that explore a range of temporal correlation structures. We solve the 'information bottleneck' problem in each of these scenarios and describe the optimal encoding structure in each case.

The information bottleneck framework allows us to define a 'relevance' variable in the encoded sensory stream, which we take to be the future behavior of that input. Solving the bottleneck problem allows us to optimally estimate the future state of the external stimulus, given a certain amount of information retained about the past. In general, prediction of the future coordinates of a system, X_{t+Δt}, reduces to knowing the precise historical coordinates of the stimulus, X_t, and an exact knowledge of the temporal correlations in the system. These rules and temporal correlations can be thought of as arising from two parts: a deterministic portion, described by a function of the previous coordinates, H(X_t), and the noise internal to the system, ξ(t). Knowing the actual realization of the noise ξ(t) reduces the prediction problem to simply integrating the stochastic equations of motion forward in time. If the exact realization of the noise is not known, we can still perform a stochastic prediction by calculating the future form of the probability distribution of the variable X_t or its moments [15, 16]. The higher-order moments yield an estimate of X_t and the uncertainty in our estimate. However, biological systems cannot precisely know X_t due to inherently limited readout precision [17, 18] and limited availability of resources tasked with remembering the measured statistics.

Constructing internal representations of sensory stimuli illustrates a tension between predicting the future, for which the past must be known with higher certainty, and compression of knowledge of the past, due to finite resources. We explore this intrinsic trade-off using the information bottleneck (IB) approach proposed by Tishby et al. [13]. This method assumes that the input variable, in our case the past signal X_t, can be used to make inferences about the relevance variable, in our case the future signal X_{t+Δt}. By introducing a representation variable, X̃, we can construct the conditional distribution of the representation variable given the input variable, P(X̃|X_t), to be maximally informative of the output variable (Fig. 1).

FIG. 1: A schematic representation of our predictive information bottleneck. On the left-hand side, we have coordinates X_t evolving in time, subject to noise, to give X_{t+Δt}. We construct a representation, X̃, that compresses the past input (minimizes I(X_t; X̃)) while retaining as much information about the future (maximizes I(X̃; X_{t+Δt})), up to the weighting of the prediction compared to the compression set by β.

Formally, the representation is constructed by optimizing the objective function,

    L = min_{P(X̃|X_t)} [ I(X_t; X̃) − β I(X̃; X_{t+Δt}) ].        (1)

Each term is the mutual information between two variables: the first between the past input and our compressed representation, X̃, and the second between X̃ and the future input. The trade-off parameter, β, controls how much future information we want X̃ to retain as it is maximally compressed. For large β, the representation variable must be maximally informative about X_{t+Δt}, and will have, in general, the lowest compression. Small β means less information is retained about the future and high, lossy compression is allowed.
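To make the optimization in Eq. (1) concrete, here is a minimal sketch of the standard self-consistent (Blahut-Arimoto-style) iterations for the information bottleneck of Tishby et al. [13], applied to a discrete toy joint distribution. The function name, the toy distribution, the cluster count, and the use of numpy are illustrative assumptions, not the calculation performed in this paper.

```python
# Minimal sketch (illustrative, not the paper's code): iterative information
# bottleneck for a discrete joint P(X, Y), where X stands for the past X_t and
# Y for the future X_{t+dt}. Implements the self-consistent update
#   P(x~|x) proportional to P(x~) * exp(-beta * D_KL[P(Y|x) || P(Y|x~)]).
import numpy as np

def ib_encoder(p_xy, n_clusters, beta, n_iter=300, eps=1e-12, seed=0):
    """Return a soft encoder P(X~|X) that locally minimizes I(X;X~) - beta*I(X~;Y)."""
    rng = np.random.default_rng(seed)
    p_x = p_xy.sum(axis=1)                          # P(X)
    p_y_x = p_xy / (p_x[:, None] + eps)             # P(Y|X)

    q_t_x = rng.random((len(p_x), n_clusters))      # random initial P(X~|X)
    q_t_x /= q_t_x.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        q_t = p_x @ q_t_x                           # P(X~) = sum_x P(x) P(x~|x)
        # P(Y|X~) = sum_x P(Y|x) P(x~|x) P(x) / P(x~)
        q_y_t = (q_t_x * p_x[:, None]).T @ p_y_x / (q_t[:, None] + eps)
        # KL divergence D[P(Y|x) || P(Y|x~)] for every pair (x, x~)
        log_ratio = np.log(p_y_x[:, None, :] + eps) - np.log(q_y_t[None, :, :] + eps)
        kl = (p_y_x[:, None, :] * log_ratio).sum(axis=2)
        q_t_x = q_t[None, :] * np.exp(-beta * kl)   # self-consistent update
        q_t_x /= q_t_x.sum(axis=1, keepdims=True)
    return q_t_x

# toy example: 4 past states, 3 future states, compressed into 2 memories
p_xy = np.array([[0.20, 0.05, 0.05],
                 [0.05, 0.20, 0.05],
                 [0.05, 0.05, 0.20],
                 [0.04, 0.03, 0.03]])
print(np.round(ib_encoder(p_xy, n_clusters=2, beta=5.0), 3))
```

Sweeping β and evaluating I(X_t; X̃) and I(X̃; X_{t+Δt}) for the returned encoder traces out the trade-off between compression and prediction that the rest of the paper analyzes for specific stimulus dynamics.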
The causal relationship between the past and the future results in a data processing inequality, I(X_t; X̃) ≥ I(X_{t+Δt}; X̃), meaning that the information generated about the future cannot exceed the amount encoded about the past [19]. Additionally, the information about the future that the representation can extract is bounded by the amount of information the uncompressed past, itself, contains about the future, I(X̃; X_{t+Δt}) ≤ I(X_t; X_{t+Δt}).

We use this framework to study prediction in two well-studied dynamical systems with ties to biological data: the stochastically driven damped harmonic oscillator (SDDHO) and the Wright-Fisher model. We look simultaneously at these two different systems to gain intuition about how different types of dynamics influence the ability of a finite and noisy system to make accurate predictions. We further consider two types of SDDHO processes with different noise profiles to study the effect of noise correlations on prediction. Our exploration of the SDDHO system has a two-fold motivation: it is the simplest possible continuous stochastic system whose full dynamics can be solved exactly. Additionally, a visual stimulus in the form of a moving bar driven by an SDDHO process was used in retinal response studies [9, 20, 21].

II. RESULTS

A. The Stochastically Driven Damped Harmonic Oscillator

Previous work explored the ability of the retina to construct an optimally predictive internal representation of a dynamic stimulus. Palmer et al. [9] recorded the response of a salamander retina to a moving bar stimulus with SDDHO dynamics. In this case, the spike trains in the retina encode information about the past stimuli in a near-optimally predictive way [9]. In order for optimal prediction to be possible, the retina should encode the position and velocity as dictated by the information bottleneck solution to the problem, for the retina's given level of compression of the visual input. Inspired by this experiment, we explore the optimal predictive encoding schemes as a function of the parameters in the dynamics, and we describe the optimal solution across the entire parameter space of the model, over a wide range of desired prediction timescales.

We consider the dynamics of a mass m in a viscous medium, attached to a spring and receiving noisy velocity kicks generated by a temporally uncorrelated Gaussian process, as depicted in Figure 2a. Equations of motion are introduced in terms of the physical variables x̄, v̄, and t̄ (bars will be dropped later when referring to rescaled variables), which evolve according to

    m dv̄/dt̄ = −k x̄ − Γ v̄ + (2 k_B T Γ)^{1/2} ξ(t̄),        (2)
    dx̄/dt̄ = v̄,
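As a concrete illustration of the dynamics in Eq. (2), the sketch below integrates the SDDHO with a simple Euler-Maruyama scheme in the physical (barred) variables. The parameter values, time step, and function name are illustrative assumptions, not values taken from the paper.

```python
# Minimal sketch (illustrative parameters): Euler-Maruyama integration of Eq. (2),
#   m dv/dt = -k x - Gamma v + (2 kB T Gamma)^(1/2) xi(t),   dx/dt = v,
# where xi(t) is Gaussian white noise with <xi(t) xi(t')> = delta(t - t').
import numpy as np

def simulate_sddho(m=1.0, k=1.0, gamma=0.5, kB_T=1.0,
                   dt=1e-3, n_steps=200_000, seed=0):
    rng = np.random.default_rng(seed)
    x = np.zeros(n_steps)
    v = np.zeros(n_steps)
    noise_amp = np.sqrt(2.0 * kB_T * gamma)          # (2 kB T Gamma)^(1/2)
    for i in range(1, n_steps):
        dW = np.sqrt(dt) * rng.standard_normal()     # white-noise increment
        v[i] = v[i-1] + dt * (-k * x[i-1] - gamma * v[i-1]) / m + noise_amp * dW / m
        x[i] = x[i-1] + dt * v[i-1]
    return x, v

x, v = simulate_sddho()
# sanity check: equilibrium variances should approach kB_T/k and kB_T/m
print(x.var(), v.var())
```

Pairs (X_t, X_{t+Δt}) drawn from such trajectories, or equivalently the exactly known Gaussian statistics of the process, define the kind of joint distribution on which the bottleneck problem of Eq. (1) is then posed.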