arXiv:1709.02341v5 [q-bio.NC] 11 Oct 2018

DEEP ACTIVE INFERENCE

KAI UELTZHÖFFER

Abstract. This work combines the free energy principle from cognitive neuroscience and the ensuing active inference dynamics with recent advances in variational inference on deep generative models and evolution strategies as an efficient large-scale black-box optimisation technique, to introduce the "deep active inference" agent. This agent tries to minimize a variational free energy bound on the average surprise of its sensations, which is motivated by a homeostatic argument. It does so by changing the parameters of its generative model, together with a variational density approximating the posterior distribution over latent variables, given its observations, and by acting on its environment to actively sample input that is likely under its generative model. The internal dynamics of the agent are implemented using deep neural networks, as used in machine learning, and recurrent dynamics, making the deep active inference agent a scalable and very flexible class of active inference agents. Using the mountain car problem, we show how goal-directed behaviour can be implemented by defining sensible prior expectations on the latent states in the agent's model, which it will try to fulfil. Furthermore, we show that the deep active inference agent can learn a generative model of the environment, which can be sampled from to understand the agent's beliefs about the environment and its interaction with it.

1. Introduction

Active Inference (Friston et al., 2006, 2010; Friston, 2012) is a normative theory of brain function derived from the properties required by active agents to survive in dynamic, fluctuating environments.
This theory is able to account for many aspects of action and perception as well as anatomic and physiologic features of the brain on the one hand (Brown and Friston, 2012; Friston, 2005; Friston et al., 2011; Friston and Kiebel, 2009; Schwartenbeck et al., 2015; Adams et al., 2013), and encompasses many formal theories about brain function on the other (Friston, 2010). In terms of its functional form, it rests on the minimisation of an upper variational bound on the agent's average surprise. In this way, it is formally very similar to state-of-the-art algorithms for variational inference in deep generative models (Rezende et al., 2014; Kingma and Welling, 2014; Chung et al., 2015). However, optimising this bound for active agents introduces a dependency on the true dynamics of the world, to which the agent usually does not have access, and whose true functional form does not have to coincide with the functional form of the agent's generative model.

Date: October 12, 2018. This is a pre-print of an article published in Biological Cybernetics. The final authenticated version is available online at: https://doi.org/10.1007/s00422-018-0785-7. The author has archived a personal copy of the accepted manuscript at his personal homepage: https://kaiu.me.

Here we solve these problems using deep neural networks (LeCun et al., 2015) and recurrent neural networks (Karpathy et al., 2015) as flexible function approximators, which allow the agent's generative model to learn a good approximation of the true world dynamics. Furthermore, we apply evolution strategies (Salimans et al., 2017) to estimate gradients of the variational bound, averaged over a population of agents. This formalism makes it possible to obtain gradient estimates even for non-differentiable objective functions, which is the case here, since the agent knows neither the equations of motion of the world nor their partial derivatives.
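The idea of estimating gradients from function evaluations alone can be illustrated with a minimal Python sketch. It assumes an antithetic Gaussian-perturbation estimator in the spirit of Salimans et al. (2017); the objective `toy_objective`, the helpers `es_gradient` and `es_minimise`, and the values chosen for `sigma`, `population`, and `learning_rate` are hypothetical and are not taken from the paper's implementation. The objective is only ever evaluated, never differentiated.

```python
import random


def toy_objective(theta):
    """Hypothetical black-box objective; only its values are used below."""
    return sum((t - 1.0) ** 2 for t in theta)


def es_gradient(objective, theta, sigma=0.1, population=200, rng=None):
    """Estimate the gradient of `objective` at `theta` from function values.

    Antithetic estimator (an assumption of this sketch):
    grad ~ mean_i [(f(theta + sigma*eps_i) - f(theta - sigma*eps_i))
                   / (2*sigma) * eps_i],  with eps_i ~ N(0, I).
    """
    if rng is None:
        rng = random.Random(0)
    dim = len(theta)
    grad = [0.0] * dim
    for _ in range(population):
        eps = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        plus = objective([t + sigma * e for t, e in zip(theta, eps)])
        minus = objective([t - sigma * e for t, e in zip(theta, eps)])
        weight = (plus - minus) / (2.0 * sigma)
        for i in range(dim):
            grad[i] += weight * eps[i] / population
    return grad


def es_minimise(objective, theta, steps=200, learning_rate=0.05, seed=0):
    """Plain gradient descent driven by the estimated gradient."""
    rng = random.Random(seed)
    theta = list(theta)
    for _ in range(steps):
        grad = es_gradient(objective, theta, rng=rng)
        theta = [t - learning_rate * g for t, g in zip(theta, grad)]
    return theta


# Starting away from the minimum at (1, 1), descend using estimates only.
theta_final = es_minimise(toy_objective, [3.0, -2.0])
print(theta_final)
```

Each perturbed parameter vector plays the role of one member of the population; averaging their weighted perturbations gives a gradient estimate without any access to derivatives of the objective.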
In this way, our approach pairs active inference with state-of-the-art machine learning techniques to create agents that can successfully reach goals in complex environments while simultaneously building a generative model of themselves and their surroundings. In this work we want to lay out the basic form of these so-called "Deep Active Inference" agents and illustrate their dynamics using a simple, well-known problem from reinforcement learning, namely the mountain car problem (Moore, 1991). By utilising models and optimisation techniques that have been applied successfully to real-world data and large-scale problems (Rezende et al., 2014; Kingma and Welling, 2014; Chung et al., 2015; Salimans et al., 2017), our agent can be scaled to complex and rich environments. With this paper we publish the full implementation of the resulting Deep Active Inference agent, together with all scripts used to create the figures in this publication, at https://www.github.com/kaiu85/deepAI_paper.

In section 2 we will briefly recapitulate the active inference principle. In section 3 we will describe the mountain car environment and introduce a deep active inference agent that is able to solve this problem. In section 4 we will show that the agent can solve the mountain car problem while simultaneously learning a generative model of its environment. In section 5 we will discuss possible further directions of this approach and its relation to other approaches.

2. Active Inference

Active Inference (Friston et al., 2006, 2010; Friston, 2012) rests on the basic assumption that any agent in exchange with a dynamic, fluctuating environment has to keep certain inner parameters within a well-defined range. Otherwise, it would sooner or later encounter a phase transition due to which it would lose its defining characteristics and therefore disappear. Thus, the agent must restrict itself to a small volume in its state space.
This can be formalised using the entropy of the probability distribution $p(s^*)$ of finding the agent in a given state $s^*$ of its state space $S^*$:

$$H(S^*) = \int_{s^* \in S^*} \left(-\ln p(s^*)\right) p(s^*) \, ds^*$$

By minimising this entropy, an agent can counter dispersive effects from its environment and maintain a stable identity in a fluctuating environment. However, an agent does not have direct access to an objective measurement of its current state. Instead, it only perceives itself and the world around it via its sensory epithelia. This can be described by a potentially noisy sensory mapping $o = g(s^*)$ from states $s^*$ to sensations $o$. Defining the sensory entropy

$$H(O) = \int_{o \in O} \left(-\ln p(o)\right) p(o) \, do$$

over the space $O$ of all possible observations of an agent, one can derive the following inequality in the absence of sensory noise (Friston et al., 2010):

$$H(O) \geq H(S^*) + \int_{s^* \in S^*} p(s^*) \ln |g_{s^*}| \, ds^*$$

Agents whose sensory organs did not have a good mapping of relevant physical states to appropriate sensory inputs would not last very long. So the mapping between the agent's true states and its sensations is assumed to have an almost constant sensitivity, in terms of the determinant of the Jacobian $|g_{s^*}|$, over the encountered range of states $s^*$. This makes the last term approximately constant and allows us to upper-bound the entropy of the agent's distribution on state space by the entropy of its sensory states plus this constant term (Friston et al., 2010). Thus, to keep its physiological variables within well-defined bounds, an agent has to minimize its sensory entropy $H(O)$.¹

Assuming ergodicity, i.e.
the equivalence of time and ensemble averages, one can write the sensory entropy as

$$H(O) = \lim_{T \to \infty} -\frac{1}{T} \int_0^T \ln p(o(t)) \, dt$$

From the calculus of variations it follows that an agent can minimize its sensory entropy by minimising its sensory surprise $-\ln p(o(t))$ at all times, in terms of the following Euler-Lagrange equation:

$$\nabla_o \left(-\ln p(o(t))\right) = 0$$

To do this efficiently, our agent needs a statistical model of its sensory inputs, to evaluate $p(o)$. Since the world in which we live is hierarchically organised, dynamic, and features a lot of complex noise, we assume that the agent's model is a deep, recurrent, latent variable model (Conant and Ashby, 1970). Furthermore, we assume that this model is generative, using the observation that we are able to imagine certain situations and perceptions (like the image of a blue sky over a desert landscape) without actually experiencing or having experienced them. Thus, we work with a generative model $p_\theta(o, s)$ of sensory observations $o$ and latent variables $s$ that represent the hidden true states of the world, which we can factorise into a likelihood function $p_\theta(o \mid s)$ and a prior on the states $p_\theta(s)$:

$$p_\theta(o, s) = p_\theta(o \mid s) \, p_\theta(s)$$

The set $\theta$ comprises the slowly changing parameters that the agent can change to improve its model of the world. In the brain this might be the pattern and strength of synapses.

¹ In more general formulations of active inference, the assumption that the mapping between hidden states and outcomes is constant can be relaxed (Friston et al., 2015).

Given this factorisation, to minimize surprise, the agent has to solve the hard task of calculating

$$p_\theta(o) = \int p_\theta(o \mid s) \, p_\theta(s) \, ds$$

by marginalising over all possible states that could lead to a given observation. As the dimensionality of the latent state space $S$ can be very high, this integral is extremely hard to solve.
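To make the marginalisation above concrete, the following minimal Python sketch estimates $p_\theta(o)$ by naive Monte Carlo sampling from the prior and compares it with the closed-form answer. It assumes a one-dimensional linear-Gaussian toy model (prior $s \sim N(0, 1)$, likelihood $o \mid s \sim N(s, \text{obs\_std}^2)$), chosen only because its marginal is known exactly; this is not the generative model used in the paper, and all function and parameter names are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of a univariate normal distribution."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def marginal_likelihood_mc(o, obs_std=0.5, n_samples=100_000, seed=0):
    """Estimate p(o) = integral of p(o|s) p(s) ds by averaging p(o|s_i)
    over samples s_i drawn from the prior p(s) = N(0, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        s = rng.gauss(0.0, 1.0)              # s ~ p(s)
        total += normal_pdf(o, s, obs_std)   # p(o | s)
    return total / n_samples


def marginal_likelihood_exact(o, obs_std=0.5):
    """Closed form for the toy model: p(o) = N(o; 0, 1 + obs_std^2)."""
    return normal_pdf(o, 0.0, math.sqrt(1.0 + obs_std ** 2))


o = 1.2
p_mc = marginal_likelihood_mc(o)
p_exact = marginal_likelihood_exact(o)
surprise_mc = -math.log(p_mc)        # estimated surprise -ln p(o)
surprise_exact = -math.log(p_exact)  # exact surprise for comparison
print(p_mc, p_exact, surprise_mc, surprise_exact)
```

With a single latent dimension, sampling from the prior gives a close estimate. The same strategy generally needs far more samples as the latent space grows, because fewer and fewer prior samples explain a given observation well.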
Therefore, a further assumption of the free energy principle is that agents do not try to minimize the surprise $-\ln p_\theta(o)$ directly, but rather minimize an upper bound, which is a lot simpler to calculate. Using the fact that the Kullback-Leibler (KL) divergence

$$D_{KL}(p_a(x) \,\|\, p_b(x)) = \int_{x \in X} \ln\left(\frac{p_a(x)}{p_b(x)}\right) p_a(x) \, dx$$

between two arbitrary distributions $p_a(x)$ and $p_b(x)$ with a shared support on a common space $X$ is always greater than or equal to zero, and equal to zero if and only if the two distributions are equal, we can define the variational free energy as:

$$F(o, \theta, u) = -\ln p_\theta(o) + D_{KL}(q_u(s) \,\|\, p(s \mid o)) \geq -\ln p_\theta(o)$$

Here $q_u(s)$ is an arbitrary, so-called variational density over the space of hidden states $s$, which belongs to a family of distributions parameterised by a time-dependent, i.e.
