Task-Driven Estimation and Control via Bottlenecks

Vincent Pacelli and Anirudha Majumdar

Abstract— Our goal is to develop a principled and general algorithmic framework for task-driven estimation and control for robotic systems. State-of-the-art approaches for controlling robotic systems typically rely heavily on accurately estimating the full state of the robot (e.g., a running robot might estimate joint angles and velocities, torso state, and position relative to a goal). However, full state representations are often excessively rich for the specific task at hand and can lead to significant computational inefficiency and brittleness to errors in state estimation. In contrast, we present an approach that eschews such rich representations and seeks to create task-driven representations. The key technical insight is to leverage the theory of information bottlenecks to formalize the notion of a "task-driven representation" in terms of information-theoretic quantities that measure the minimality of a representation. We propose novel iterative algorithms for automatically synthesizing (offline) a task-driven representation (given in terms of a set of task-relevant variables (TRVs)) and a performant control policy that is a function of the TRVs. We present online algorithms for estimating the TRVs in order to apply the control policy. We demonstrate that our approach results in significant robustness to unmodeled measurement uncertainty both theoretically and via thorough simulation experiments including a spring-loaded inverted pendulum running to a goal location.

* This work was supported by the National Science Foundation (NSF) [IIS-1755038] and a Google Faculty Research Award.
* The authors are with the Mechanical and Aerospace Engineering department at Princeton University, NJ, 08540, USA. {vpacelli, ani.majumdar}@princeton.edu

Fig. 1. A schematic of our technical approach. We seek to synthesize (offline) a minimalistic set of task-relevant variables (TRVs) x̃_t that create a bottleneck between the full state x_t and the control input u_t. These TRVs are estimated online in order to apply the policy π_t. We demonstrate our approach on a spring-loaded inverted pendulum model whose goal is to run to a target location. Our approach automatically synthesizes a one-dimensional TRV x̃_t sufficient for achieving this task.

I. INTRODUCTION

State-of-the-art techniques for controlling robotic systems typically rely heavily on accurately estimating the full state of the system and maintaining rich geometric representations of their environment. For example, a common approach to navigation is to build a dense occupancy map produced by scanning the environment and to use this map for planning and control.
Similarly, control for walking or running robots typically involves estimating the full state of the robot (e.g., joint angles, velocities, and position relative to a goal location). However, such representations are often overly detailed when compared to a task-driven representation.

One example of a task-driven representation is the "gaze heuristic" from cognitive psychology [1], [2], [3]. When attempting to catch a ball, an agent can estimate the ball's position and velocity, model how it will evolve in conjunction with environmental factors like wind, integrate the pertinent differential equations, and plan a trajectory in order to arrive at the ball's final location. In contrast, cognitive psychology studies have shown that humans use a dramatically simpler strategy that entails maintaining the angle that the human's gaze makes with the ball at a constant value. This method reduces a number of hard-to-monitor variables (e.g., wind speed) into a single easily-estimated variable. Modulating this variable alone results in accomplishing the task.

The gaze heuristic example highlights the two primary advantages of using a task-driven representation. First, a control policy that uses such a representation is more efficient to employ online since fewer variables need to be estimated. Second, since only a few prominent variables need to be estimated, fewer sources of measurement uncertainty result in a more robust policy. While one can sometimes manually design task-driven representations for a given task, we currently lack a principled theoretical and algorithmic framework for automatically synthesizing such representations. The goal of this paper is to develop precisely such an algorithmic approach.

Statement of Contributions. The main technical contribution of this paper is to formulate the synthesis of task-driven representations as an optimization problem using information bottleneck theory [4]. We present offline algorithms that encode the full state of the system into a set of task-relevant variables (TRVs) and simultaneously identify a performant policy (restricted to be a function of the TRVs) using novel iterative algorithms that exploit the structure of this optimization problem in a number of dynamical settings including discrete-state, linear-Gaussian, and nonlinear systems. We present online algorithms for estimating the TRVs in order to apply the control policy. We demonstrate that our approach yields policies that are robust to unmodeled measurement uncertainty both theoretically (using results from the theory of risk metrics) and in a number of simulation experiments including running using a spring-loaded inverted pendulum (SLIP) model (Fig. 1).

A. Related Work

By far the most common approach in practice for controlling robotic systems with nonlinear dynamics and partially observable state is to independently design an estimator for the full state (e.g., a Kalman filter [5]) and a controller that assumes perfect state information (e.g., designed using robust control techniques such as H∞ control [6], sliding mode control [7], [8], passivity-based control [9], [10], or Lyapunov-based control [8], [10]). While this strategy is optimal for Linear-Quadratic-Gaussian (LQG) problems due to the separation principle [11], it can produce a brittle system due to errors in the state estimate for nonlinear systems (since the separation principle does not generally hold in this setting). We demonstrate that our task-driven approach affords significant robustness when compared to approaches that perform full state estimation and control assuming the separation principle (see Section V for numerical examples). Moreover, in contrast to traditional robust estimation and control techniques, our approach does not rely on explicit models of measurement uncertainty. We demonstrate that robustness can be achieved implicitly as a by-product of task-driven representations.

Historically, the work on designing information-constrained controllers has been pursued within the networked control theory literature [12], [13], [14]. Recently, the optimal co-design of data-efficient sensors and performant controllers has also been explored beyond network applications. One set of approaches — inspired by the cognitive psychology concept of bounded rationality [15], [16] — is to limit the information content of a control policy measured with respect to a default stochastic policy [17], [18], [19], [20], [21]. Another set of examples comes from the sensor selection problem in robotics, which involves selecting a minimal number of sensors or features to use for estimating the robot's state [22], [23], [24]. While the work highlighted above shares our goal of designing "minimalistic" (i.e., informationally frugal) controllers, our approach here is fundamentally different. In particular, our goal is to design task-driven representations that form abstractions which are sufficient for the purpose of control. Online estimation and control is performed purely based on this representation without resorting to estimating the full state of the system (in contrast to the work highlighted above, which either assumes access to the full state or designs estimators for it).

A number of previous authors consider the construction of minimal-information representations. In information theory, this approach is typically referred to as the Information Bottleneck (IB) [4]. Recently, these ideas have been co-opted for designing control policies. In [25], a learning-based approach is suggested to find minimal-information state representations and control policies which use them. Our work differs in that we provide analytic (i.e., model-based) methods for finding such representations and policies and we explicitly characterize the resulting robustness. Another branch of work considers the construction of LQG policies that achieve a performance goal while minimizing an information-theoretic quantity such as the mutual information between inputs and outputs [26], [27] or Massey's directed information [28], [29]. In contrast to these works, our derivation handles nonlinear systems and also presents robustness results for the resulting controllers, which have not been discussed to our knowledge in existing literature.

The work on actionable information in vision [30], [31] attempts to find invariant and task-relevant representations for visual tasks in the form of minimal complete representations (minimal sufficient statistics of a full or "complete" representation). While highly ambitious in scope, this work has largely been devoted to studying visual decision tasks (e.g., recognition, detection, categorization). The algorithmic approach taken in [30], [31] is thus tied to the specifics of visual problems (e.g., designing visual feature detectors that are invariant to nuisance factors such as contrast, scale, and translation). Our goals and technical approach here are complementary.
We do not specifically target visual decision problems; instead, we seek to develop a general framework that is applicable to a broad range of robotic control problems and allows us to automatically synthesize task-driven representations (without manually designing feature detectors).

II. TASK-DRIVEN REPRESENTATIONS AND CONTROLLERS

Our goal in this section is to formally define the notion of a "task-driven representation" and formulate an optimization problem that automatically synthesizes such representations. We focus on control tasks in this paper and assume that a task is defined in terms of a (partially observable) Markov decision process ((PO)MDP). Let x_t ∈ X, u_t ∈ U, y_t ∈ Y denote the full state of the system, the control input, and the sensor output at time t respectively. Here the state space X, input space U, and output space Y may either be discrete or continuous. We assume that the dynamics and sensor model of the system are known and given by potentially time-varying conditional distributions p_t(x_{t+1}|u_t, x_t) and σ_t(y_t|x_t) respectively. Let c_0(x_0, u_0), c_1(x_1, u_1), ..., c_{T−1}(x_{T−1}, u_{T−1}), c_T(x_T) be a sequence of cost functions that encode the robot's desired behavior. The robot's goal is to identify a control policy π_t(u_t|y_t) that minimizes the expected value of these cost functions when the policy is executed online, i.e., minimize Σ_{t=0}^{T−1} E_{p_t}[c_t(x_t, u_t)] + E_{p_T}[c_T(x_T)]. In general, this optimization problem is infinite dimensional and challenging to solve.

The key idea behind our technical approach is to define a principled information-theoretic notion of minimality of representations for tasks. Figure 1 illustrates the main idea for doing this.
Here x̃_t ∈ X̃ are task-relevant variables (TRVs) that constitute a representation; one can think of the representation as a "sketch" of the full state that the robot uses for computing control inputs via π_t(u_t|x̃_t). Ideally, such a representation filters out all the information from the state that is not relevant for computing control inputs — avoiding the introduction of unnecessary estimation error. A minimalistic representation x̃_t should thus create a bottleneck between the full state and the control input. We make this notion precise by leveraging the theory of information bottlenecks [4] and finding a stochastic mapping q_t(x̃_t|x_t) that minimizes the mutual information between x_t and x̃_t,

    I(x_t; x̃_t) := D( p_t(x_t, x̃_t) ‖ p_t(x_t) p_t(x̃_t) ),    (1)

while ensuring the policy performs well (i.e., achieves low expected cost). Here, D(·‖·) represents the Kullback-Leibler (KL) divergence between two distributions. Intuitively, minimizing the mutual information corresponds to designing TRVs x̃_t that are as independent of the state x_t as possible. The map q_t(x̃_t|x_t) is thus "squeezing out" all the irrelevant information from the state while maintaining enough information for choosing good control inputs. This formalizes our notion of a task-driven representation.

Formally, we pose the problem of finding task-driven representations as the following offline optimization problem, which we refer to as OPT:¹

    min_{p_t, q_t, π_t}  Σ_{t=0}^{T} E_{p_t}[ c_t(x_t, u_t) + (1/β) I(x_t; x̃_t) ].    (OPT)

We note that the unconstrained problem OPT is equivalent to a constrained version where the mutual information is minimized subject to a constraint on the expected cost of the full-state MDP and β is the corresponding Lagrange multiplier. A heuristic approach for choosing an appropriate value for β is discussed in Section V.

¹ Technically, OPT is a kind of rate-distortion problem, not an information bottleneck problem, as the constraint is not specified using a divergence as a distortion function. However, q_t(x̃_t|x_t) limits the flow of information from x_t to u_t, so we use the term bottleneck as a conceptual aid.
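To make the objective in OPT concrete in the finite-state setting, the following sketch (our own illustration; the array conventions and function name are hypothetical, not from the paper) evaluates one stage of the objective, E_{p_t}[c_t(x_t, u_t)] + (1/β) I(x_t; x̃_t), from the current state marginal, encoder, and policy:

import numpy as np

def opt_stage_objective(p_x, q_xt_given_x, pi_u_given_xt, cost, beta):
    """One stage of OPT: E[c_t(x,u)] + (1/beta) * I(x; x_tilde).

    p_x:            (nx,)      state marginal p_t(x)
    q_xt_given_x:   (nx, nxt)  encoder q_t(x_tilde | x)
    pi_u_given_xt:  (nxt, nu)  policy pi_t(u | x_tilde)
    cost:           (nx, nu)   stage cost c_t(x, u)
    """
    p_joint = p_x[:, None] * q_xt_given_x                  # p(x, x_tilde)
    p_xt = p_joint.sum(axis=0)                             # marginal p(x_tilde)
    ratio = p_joint / (p_x[:, None] * p_xt[None, :] + 1e-300)
    mutual_info = np.sum(p_joint * np.log(ratio + 1e-300))
    p_u_given_x = q_xt_given_x @ pi_u_given_xt             # marginalize x_tilde
    expected_cost = np.sum(p_x[:, None] * p_u_given_x * cost)
    return expected_cost + mutual_info / beta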
So far, we have limited our discussion to the case where the robot has access to the full state of the system. However, the real benefit of the task-driven perspective is evident in the partially observable setting, where the robot only indirectly observes the state of the system via sensor measurements y_0, ..., y_t, where y_t ∈ Y_t. We denote the probability of observing a particular measurement in a given state by σ_t(y_t|x_t). The prevalent approach for handling such settings is to design an estimator for the full state x_t. Our key idea here is to perform estimation for the TRVs x̃_t instead of x_t. Specifically, our overall approach has two phases:

Offline. Synthesize the maps {q_t, π_t}_{t=0}^{T−1} by solving OPT.
Online. Estimate the current TRV x̃_t using sensor measurements and use this estimate to compute inputs via π_t(u_t|x̃_t).

Perhaps the clearest benefit of our approach is the fact that the representation x̃_t may be significantly lower-dimensional than the full state of the system; this can lead to significant reductions in online computations. Another advantage of the task-driven approach is robustness to estimation errors. To see this, let p_t(x_t, x̃_t, u_t) denote the joint distribution over the state, representation, and inputs at time t that results when we apply the policy obtained by solving Problem OPT in the fully observable setting. Now, let p̃_t(x_t, x̃_t, u_t) denote the distribution for the partially observable setting, i.e., when we estimate x̃_t online using the robot's sensor measurements and use this estimate to compute control inputs.

Theorem 2.1: Let p̃_t(x_t, x̃_t, u_t) be the distribution resulting from any estimator that satisfies the following condition:

    D( p̃_t(x_t, x̃_t, u_t) ‖ p_t(x_t, x̃_t, u_t) ) ≤ (1/β) D( p_t(x_t, x̃_t) ‖ p_t(x_t) p_t(x̃_t) ).    (2)

Then, we have the following upper bound on the total expected cost:

    Σ_{t=0}^{T} E_{p̃_t}[ c_t(x_t, u_t) ]  ≤  Σ_{t=0}^{T} [ ρ[c_t(x_t, u_t)] + (1/β) I(x_t; x̃_t) ],    (3)

where ρ is the entropic risk metric [32, Example 6.20]:

    ρ[c_t(x_t, u_t)] := log E_{p_t}[ exp(c_t(x_t, u_t)) ].    (4)

Proof: The proof follows from the Donsker-Varadhan change of measure formula [33, Theorem 2.3.2] and is presented in the extended version of this paper [34].

Intuitively, this theorem shows that for any estimator for x̃_t (in the partially observable setting) that results in a distribution p̃_t(x_t, x̃_t, u_t) that is "close enough" to the distribution p_t(x_t, x̃_t, u_t) in the fully observable case (i.e., when their KL divergence is less than 1/β times the KL divergence between p_t(x_t, x̃_t) and the joint distribution p_t(x_t) p_t(x̃_t) over x_t and x̃_t that results when x̃_t is assumed to be independent of x_t), the expected cost of the controller in the partially observable case is guaranteed to be bounded by the right hand side (RHS) of (3). Notice that this RHS is similar to the cost function of OPT. In particular, the expected value operator is a linearization of the entropic risk metric ρ.² By solving OPT, we are minimizing (a linear approximation of) an upper bound on the expected cost even when our state is only partially observable (as long as our estimator for x̃_t ensures condition (2)). Once OPT is solved, we can use Theorem 2.1 to obtain a robustness bound by evaluating the RHS of (3).

² This risk metric has a long history in robust control (including a close link to H∞ control) [35], [36], [37]. We note, however, that it can sometimes be conservative in cases with rare, bad events.
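As a rough illustration of how this bound can be evaluated, the sketch below (ours; names are hypothetical) estimates the entropic risk ρ[c_t] by Monte Carlo from cost samples collected under the fully observable policy and adds the mutual-information terms to form the RHS of (3):

import numpy as np

def entropic_risk(cost_samples):
    """Monte-Carlo estimate of rho[c] = log E[exp(c)], via log-sum-exp."""
    c = np.asarray(cost_samples, dtype=float)
    m = c.max()
    return m + np.log(np.mean(np.exp(c - m)))

def robustness_bound(cost_samples_per_step, mi_per_step, beta):
    """RHS of (3): sum_t ( rho[c_t] + (1/beta) * I(x_t; x_tilde_t) )."""
    return sum(entropic_risk(c) + mi / beta
               for c, mi in zip(cost_samples_per_step, mi_per_step))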
III. ALGORITHMS FOR SYNTHESIZING REPRESENTATIONS

In this section, we outline our approach for solving OPT offline. We note that OPT is non-convex in general. While one could potentially apply general non-convex optimization techniques such as gradient-based methods, computing gradients quickly becomes computationally expensive due to the large number of decision variables involved (even in the setting with finite state and action spaces, we have decision variables corresponding to q_t(x̃_t|x_t) and π_t(u_t|x̃_t) for every possible value of x_t, x̃_t and u_t at every time step). Our key insight here is to exploit the structure of OPT to propose an efficient iterative algorithm in three different dynamical settings: discrete, linear-Gaussian, and nonlinear-Gaussian. These settings are particularly convenient to work with because they allow the objective of OPT to be computed in closed form. In each setting, the algorithm iterates over the following three steps:

1) Fix {q_t, π_t}_{t=0}^{T−1} and solve for {p_t}_{t=0}^{T} using the forward dynamical equations.
2) Fix {p_t}_{t=0}^{T}, {π_t}_{t=0}^{T−1} and solve for {q_t}_{t=0}^{T−1} by satisfying necessary conditions for optimality.
3) Fix {p_t}_{t=0}^{T}, {q_t}_{t=0}^{T−1} and solve for {π_t}_{t=0}^{T−1} by solving a convex optimization problem.

In our implementation, we iterate over these steps until convergence (or until an iteration limit is reached). This is a common strategy employed in solving similar kinds of MDPs with information-theoretic objectives [26], [38]. While we cannot currently guarantee convergence, our iterative procedure is extremely efficient (since all the computations above can be performed either in closed form or via a convex optimization problem) and produces good solutions in practice (see Section V). We describe instantiations of each step for three different dynamical settings below.
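The outer loop itself is simple; the sketch below is our paraphrase of it, where step_forward, step_encoder, step_policy, and objective are placeholders for the setting-specific updates derived in Sections III-A to III-C:

def solve_opt(step_forward, step_encoder, step_policy, objective,
              q_init, pi_init, beta, max_iters=100, tol=1e-6):
    """Alternate over Steps 1-3 until the OPT objective stops improving."""
    q, pi = q_init, pi_init
    prev_val = float("inf")
    for _ in range(max_iters):
        p = step_forward(q, pi)            # Step 1: forward dynamical equations
        q = step_encoder(p, pi, beta)      # Step 2: necessary conditions (Thm 3.1)
        pi = step_policy(p, q, beta)       # Step 3: convex program
        val = objective(p, q, pi, beta)
        if abs(prev_val - val) < tol:
            break
        prev_val = val
    return q, pi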

A. Discrete Systems

In order to solve Step 1, note that the forward dynamics of the system for fixed q_t(x̃_t|x_t), π_t(u_t|x̃_t) are given by:

    p_t(x_{t+1}|x_t) = Σ_{u,x̃} p_t(x_{t+1}|x_t, u) π_t(u|x̃) q_t(x̃|x_t),
    p_{t+1}(x_{t+1}) = Σ_x p_t(x_{t+1}|x) p_t(x).    (5)

The Lagrangian functional for OPT is L = Σ_{t=0}^{T} L_t where

    L_t = Σ_{x,u,x̃} c_t(x, u) π_t(u|x̃) q_t(x̃|x) p_t(x)
        − Σ_{x'} ν_{t+1}(x') [ p_{t+1}(x') − Σ_{x,u,x̃} p_t(x'|x, u) π_t(u|x̃) q_t(x̃|x) p_t(x) ]
        + (1/β) Σ_{x,x̃} q_t(x̃|x) p_t(x) log( q_t(x̃|x) / q_t(x̃) ),

where ν_t(x_t) are Lagrange multipliers. The Lagrange multipliers that normalize distribution variables are omitted since they do not contribute to the analysis. The following proposition demonstrates the structure of q_t(x̃_t|x_t) using the first-order necessary condition (FONC) for optimality [39].

Theorem 3.1: A necessary condition for q_t(x̃|x) to be optimal for OPT is that

    q_t(x̃_t|x_t) = q_t(x̃_t) exp( −β E(ν_{t+1} + c_t | x_t, x̃_t) ) / Z_t(x_t),    (6)

where ν_T(x_T) = c_T(x_T) and

    ν_t(x_t) = E(c_t + ν_{t+1} | x_t) + (1/β) D( q_t(x̃_t|x_t) ‖ q_t(x̃_t) ),    (7)
    Z_t(x_t) = Σ_{x̃} q_t(x̃) exp( −β E(c_t + ν_{t+1} | x_t, x̃) ).    (8)

Proof: Equation (7) is derived by setting the functional derivative δL/δp_t(x_t) = 0 and solving for ν_t. Repeating this process for δL/δq_t(x̃_t|x_t) yields (6). Equation (8) is the normalization of q_t(x̃_t|x_t). The proof is provided in [34].

This proposition demonstrates that q_t(x̃_t|x_t) is a Boltzmann distribution with Z_t(x_t) as the partition function and β playing the role of inverse temperature. In order to solve Step 2, we simply evaluate the closed-form expression (6). It is easily verified that the function ν_t(x_t) is the cost-to-go function for OPT. Thus OPT can be written as a dynamic programming problem using ν_t(x_t):

    min_{q_t, π_t}  E_{p_t}[ c_t + ν_{t+1} + (1/β) D( q_t(x̃_t|x_t) ‖ q_t(x̃_t) ) ].    (DP)

This allows us to solve Step 3. In particular, when p_t(x_t), q_t(x̃_t|x_t) are fixed, DP is a linear programming problem in π_t and can thus be solved efficiently.
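For a finite system, Step 2 therefore amounts to evaluating the Boltzmann form (6) and re-normalizing. A minimal sketch (our illustration, assuming all distributions are stored as numpy arrays):

import numpy as np

def encoder_update(p_x, q_xt, pi_u_given_xt, cost, nu_next, P_next, beta):
    """Evaluate the necessary condition (6) at one time step.

    p_x:           (nx,)          state marginal p_t(x)
    q_xt:          (nxt,)         current TRV marginal q_t(x_tilde)
    pi_u_given_xt: (nxt, nu)      policy pi_t(u | x_tilde)
    cost:          (nx, nu)       stage cost c_t(x, u)
    nu_next:       (nx,)          cost-to-go nu_{t+1}(x')
    P_next:        (nx, nu, nx)   transitions p_t(x' | x, u)
    """
    # E[c_t + nu_{t+1} | x, xt] = sum_u pi(u|xt) ( c(x,u) + sum_x' P(x'|x,u) nu(x') )
    q_value = cost + P_next @ nu_next                  # (nx, nu)
    exp_cost = q_value @ pi_u_given_xt.T               # (nx, nxt)
    # Boltzmann form (6): q(xt|x) proportional to q(xt) exp(-beta * E[c + nu | x, xt])
    logits = np.log(q_xt + 1e-300)[None, :] - beta * exp_cost
    logits -= logits.max(axis=1, keepdims=True)        # numerical stabilization
    q_new = np.exp(logits)
    q_new /= q_new.sum(axis=1, keepdims=True)          # divide by Z_t(x)
    q_xt_new = q_new.T @ p_x                           # updated marginal q_t(x_tilde)
    return q_new, q_xt_new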
B. Linear-Gaussian Systems with Quadratic Costs

A discrete-time linear-Gaussian (LG) system is defined by the transition system

    x_{t+1} = A_t x_t + B_t u_t + ε_t,  ε_t ∼ N(0, Σ_{ε_t}),    (9)

where X = R^n, U = R^m and x_0 ∼ N(x̄_0, Σ_{x_0}). We assume that the cost function is quadratic:

    c_t(x, u) := (1/2) (x − g_t)^T Q_t (x − g_t) + (1/2) (u − w_t)^T R_t (u − w_t),

with Q_t, R_t ⪰ 0 and R_T = 0. We explicitly parameterize the TRVs and control policy as:

    x̃_t = C_t x_t + a_t + η_t,  u_t = K_t x̃_t + h_t,    (10)

where the random variable η_t ∼ N(0, Σ_{η_t}) is additive noise. This structure dictates that p_t(x_t) and q_t(x̃_t) are Gaussians N(x̄_t, Σ_{x_t}) and N(x̃̄_t, Σ_{x̃_t}) respectively, with x̃̄_t = C_t x̄_t + a_t and Σ_{x̃_t} = C_t Σ_{x_t} C_t^T + Σ_{η_t}. This allows for both Steps 1 and 2 to be computed in closed form. The latter is presented in the following theorem.

Theorem 3.2: Define the notational shorthand G_t := C_t^T (C_t Σ_{x_t} C_t^T + Σ_{η_t})^{−1} C_t and M_t := (A_t + B_t K_t C_t). For the LG system, the necessary condition (6) is equivalent to the conditions

    C_t = −β Σ_{η_t} K_t^T B_t^T P_{t+1} A_t,
    a_t = −Σ_{η_t} ( β K_t^T B_t^T (b_{t+1} + P_{t+1} B_t h_t) + β K_t^T R_t (h_t − w_t) − Σ_{x̃_t}^{−1} x̃̄_t ),    (11)
    Σ_{η_t}^{−1} = Σ_{x̃_t}^{−1} + β K_t^T (B_t^T P_{t+1} B_t + R_t) K_t,

where the cost-to-go function is the recursively defined quadratic function ν_t(x_t) = (1/2) x_t^T P_t x_t + b_t^T x_t + constant with values P_T = Q_T, b_T = −Q_T g_T and

    P_t = Q_t + β^{−1} G_t + C_t^T K_t^T R_t K_t C_t + M_t^T P_{t+1} M_t,
    b_t = M_t^T P_{t+1} B_t (h_t + K_t a_t) − Q_t g_t − β^{−1} G_t x̄_t + C_t^T K_t^T R_t (K_t a_t + h_t − w_t) + M_t^T b_{t+1}.    (12)

Proof: The KL-divergence term in (7) is quadratic: D( q_t(x̃|x_t) ‖ q_t(x̃) ) = (1/2) (x̄_t − x_t)^T G_t (x̄_t − x_t) up to a constant. Since c_t is quadratic in x_t, the form for ν_t(x_t) is derived by backward induction starting with ν_T(x_T) = E(c_T). The forms for C_t, a_t, Σ_{η_t} are derived by taking the logarithm of both sides of (6), plugging (12) in for ν_t(x_t), completing the square, and exponentiating both sides to produce a Gaussian. The constant term in (12) is collected into Z_t(x_t). A complete proof can be found in [34].

Finally, when q_t, p_t are fixed, DP is an unconstrained convex quadratic program with decision variables K_t, h_t, and can be solved very efficiently [40].
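Under the parameterization (9)-(10), Step 1 reduces to propagating the Gaussian p_t(x_t) through the closed loop x_{t+1} = (A_t + B_t K_t C_t) x_t + B_t K_t (a_t + η_t) + B_t h_t + ε_t. A short sketch of this forward pass (our illustration; argument names are hypothetical, and all arguments are lists of per-step numpy arrays):

import numpy as np

def lg_forward_pass(A, B, C, a, K, h, Sigma_eta, Sigma_eps, x0_mean, x0_cov):
    """Step 1 for the LG case: means and covariances of p_t(x_t), t = 0..T."""
    means, covs = [x0_mean], [x0_cov]
    for t in range(len(A)):
        M = A[t] + B[t] @ K[t] @ C[t]                  # closed-loop matrix M_t
        mean_next = M @ means[-1] + B[t] @ (K[t] @ a[t] + h[t])
        cov_next = (M @ covs[-1] @ M.T
                    + B[t] @ K[t] @ Sigma_eta[t] @ K[t].T @ B[t].T
                    + Sigma_eps[t])
        means.append(mean_next)
        covs.append(cov_next)
    return means, covs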

C. Nonlinear-Gaussian Systems

When the dynamics are nonlinear-Gaussian (NLG), i.e. when (9) is changed to

    x_{t+1} = f(x_t, u_t) + ε_t,  ε_t ∼ N(0, Σ_{ε_t}),    (13)

minimizing OPT is challenging due to p_t(x_t) no longer being Gaussian. We tackle this challenge by leveraging our results for the LG setting and adapting the iterative Linear Quadratic Regulator (iLQR) algorithm [41], [42], [43].

Given an initial nominal trajectory {x̂_t, û_t}_{t=0}^{T}, the matrices {A_t, B_t}_{t=0}^{T−1} are produced by linearizing f(x_t, u_t) along the trajectory. The pair (A_t, B_t) describes the dynamics of a perturbation δx_t = x_t − x̂_t in the neighborhood of x_t for a perturbed input δu_t = u_t − û_t in the neighborhood of u_t:

    δx_{t+1} = A_t δx_t + B_t δu_t + ε_t,  δx_0 ∼ N(0, Σ_{x_0}).    (14)

We compute (a quadratic approximation of) the perturbation costs δc_t(δx, δu) := c_t(x̂_t + δx, û_t + δu) subject to (14). We can then apply the solution method outlined in Section III-B to search for an optimal {δx_t, δu_t}_{t=0}^{T}. We then update the nominal state and input trajectories to {x̂_t + δx_t, û_t + δu_t}_{t=0}^{T} and repeat the entire process until the nominal trajectory converges.

IV. ONLINE ESTIMATION AND CONTROL

Once the task-driven representation and policy have been synthesized offline, we can leverage them for computationally-efficient and robust online control. The key idea behind our online approach is to use the robot's sensor measurements {y_i}_{i=1}^{t} to only estimate the TRVs x̃_t. Once x̃_t has been estimated, the control policy π_t(u_t|x̃_t) can be applied. This is in stark contrast to most prevalent approaches for controlling robotic systems, which aim to accurately estimate the full state x_t. We describe our online estimation approach below.

We maintain a belief distribution bel(x̃_t) over the TRV space X̃ and update it at each time t using a Bayes filter [5]. Specifically, we perform two steps every t:

1) Process Update. The system model is used to update the belief state to the current time step: bel(x̃_t) = Σ_{x̃_{t−1}} q_{t−1}(x̃_t|x̃_{t−1}, u_{t−1}) bel(x̃_{t−1}).
2) Measurement Update. The measurement model is used to integrate the observation y_t into the belief state: bel(x̃_t) ∝ σ_t(y_t|x̃_t) bel(x̃_t).

To apply this filter, the distributions q_t(x̃_{t+1}|x̃_t, u_t) and σ_t(y_t|x̃_t) are precomputed offline. Bayes' theorem states p_t(x_t|x̃_t) = q_t(x̃_t|x_t) p_t(x_t) / q_t(x̃_t). Consequently,

    p_t(x_{t+1}|x̃_t, u_t) = Σ_{x_t} p_t(x_{t+1}|u_t, x_t) p_t(x_t|x̃_t),
    q_t(x̃_{t+1}|x̃_t, u_t) = Σ_{x_{t+1}} q_t(x̃_{t+1}|x_{t+1}) p_t(x_{t+1}|x̃_t, u_t),
    σ_t(y_t|x̃_t) = Σ_{x_t} σ_t(y_t|x_t) p_t(x_t|x̃_t).

In the discrete case, the above equations can be evaluated directly. In the LG case, p_t(x_t|x̃_t) is given by the minimum mean squared error estimate of x_t given x̃_t. If y_t ∼ N(D_t x̄_t, Σ_{ω_t}), D_t ∈ R^{l×n}, then σ_t(y_t|x̃_t) and the mean of q_t(x̃_{t+1}|x̃_t, u_t) are described by the equations

    x̃̄_{t+1} = C_{t+1} A_t [ x̄_t + Σ_{x_t} C_t^T Σ_{x̃_t|x_t} (x̃_t − x̃̄_t) ] + C_{t+1} B_t ū_t,
    ȳ_t = D_t x̄_t + D_t Σ_{x_t} C_t^T Σ_{x̃_t|x_t} (x̃_t − x̃̄_t),
    Σ_{y_t} = D_t Σ_{x̃_t|x_t} C_t^T Σ_{η_t}^{−1} D_t^T + Σ_{ω_t}.

This system is a partially observable linear-Gaussian system with process and measurement noise induced by the randomness of both x̃_t and y_t. The process and measurement updates are standard Kalman filter updates applied to this system. In the NLG case, we use an extended Kalman filter over δx̃_t (the perturbed TRV). Specifically, we use the linearized dynamics of δx̃_t and apply the LG systems approach.

Given a belief bel(x̃_t), we compute the control input by sampling u_t ∼ π_t(u_t|x̃_t*), where x̃_t* is the maximum likelihood TRV x̃_t* := argmax_{x̃_t} bel(x̃_t). Alternatively, one can sample the TRV from q_t(x̃_t|x_t), but the MLE method is similar to how many Bayesian filters are implemented (e.g. Kalman filters) and allows for a more direct comparison.
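In the discrete case, one filter step over the TRVs follows directly from the two updates above; the sketch below (ours, with hypothetical array conventions) also shows the MLE-based control rule:

import numpy as np

def trv_filter_step(bel, u_prev, sigma_y_given_xt, q_trans):
    """One process + measurement update of bel(x_tilde) (discrete case).

    bel:              (nxt,)         current belief over TRVs
    q_trans:          (nxt, nu, nxt) precomputed q_t(xt' | xt, u)
    sigma_y_given_xt: (nxt,)         likelihood sigma_t(y | xt') for the observed y
    """
    bel_pred = q_trans[:, u_prev, :].T @ bel       # process update
    bel_post = sigma_y_given_xt * bel_pred         # measurement update
    return bel_post / bel_post.sum()

def mle_control(bel, pi_u_given_xt, rng=None):
    """Sample u ~ pi_t(. | xt*) at the maximum-likelihood TRV xt*."""
    rng = rng or np.random.default_rng()
    xt_star = int(np.argmax(bel))
    return rng.choice(pi_u_given_xt.shape[1], p=pi_u_given_xt[xt_star])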
V. EXAMPLES

In this section, we demonstrate the efficacy of our task-driven control approach on both a discrete-state scenario and a SLIP model. To select the value of β, the algorithm is run 10 times with β set to evenly spaced values in a listed interval. The controller used is the one with the lowest value of β subject to the expected cost of the controller being below a particular value. This choice selects the performant controller with the least state information in the TRVs.

A. Lava Problem

Fig. 2. The lava scenario [44] consists of five states connected in a line. The robot is allowed to move one step in either direction unless it enters the lava pit on the far right, which is absorbing. The robot receives a reward of 5 points upon entering the goal state and a penalty reward of −1 point otherwise. The terminal rewards are 10 points when in the goal and −10 points when in the lava.

The first example (Fig. 2), adapted from [44], demonstrates a setting where the separation principle does not hold. If the robot's belief distribution is [0.3, 0.4, 0.0, 0.3, 0.0]^T while residing in state 4, the optimal action corresponding to the MLE of the state is to move right — not the optimal left. Our algorithm was run with three TRVs and β ∈ [0.001, 1] for 30 iterations. The value β = 0.001 was found to give a negative expected cost. The robot's initial condition was sampled from the aforementioned belief distribution. Online, the robot was modeled to have a faulty sensor that localizes to the correct state with a probability of 0.5, with a uniformly random incorrect state returned otherwise.

Fig. 3 compares our algorithm's performance with a separation principle approach, i.e. solving the MDP with perfect state information and then performing MLE for the state online. Interestingly, our algorithm produces a deterministic policy that moves the robot left three times — ensuring it is in state 1 — then moves right twice and remains in the goal. With this policy, it is impossible for the robot to enter the lava state, producing a more performant, lower variance trajectory than the separation principle-based solution under measurement noise. The solution is deterministic at low values of β because the distribution in (6) is almost uniform — thereby requiring the policy to be effectively open-loop.

Fig. 3. This figure summarizes the outcome of 500 simulations of the Lava Problem with different control strategies. Each controller used a Bayesian filter to track the current belief distribution. The exact MDP solution (blue) applied the input corresponding to its MLE state. The TRV solutions sampled from the conditional distribution corresponding to their stochastic control policies given the MLE estimates of the state (red) or TRV (green).
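For reference, the lava MDP of Fig. 2 is small enough to write down explicitly. The sketch below encodes the five states, the two move actions, the absorbing lava pit, and the rewards from the caption (negated into costs); the particular goal and lava indices are our reading of the figure, not stated explicitly in the text:

import numpy as np

# States 0..4 correspond to the paper's states 1..5; we take state 2 as the
# goal and state 4 (far right) as the absorbing lava pit.
N_STATES, N_ACTIONS, GOAL, LAVA = 5, 2, 2, 4

P = np.zeros((N_STATES, N_ACTIONS, N_STATES))        # P[x, u, x']
for x in range(N_STATES):
    if x == LAVA:
        P[x, :, x] = 1.0                              # lava is absorbing
    else:
        P[x, 0, max(x - 1, 0)] = 1.0                  # action 0: move left
        P[x, 1, min(x + 1, N_STATES - 1)] = 1.0       # action 1: move right

# Reward of +5 upon entering the goal, -1 otherwise; terminal +10 (goal) / -10 (lava).
reward_next = np.where(np.arange(N_STATES) == GOAL, 5.0, -1.0)
stage_cost = -np.einsum('xus,s->xu', P, reward_next)  # costs = -expected reward
terminal_cost = np.zeros(N_STATES)
terminal_cost[GOAL], terminal_cost[LAVA] = -10.0, 10.0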

B. SLIP Model Problems

Next, we apply the NLG variant of our algorithm to the SLIP model [45], [46], [47], which is depicted in Fig. 1. The SLIP model is common in robotics for prototyping legged locomotion controllers. It consists of a single leg whose lateral motion is derived from a spring/piston combination. At touchdown, the state of the robot is given by [d, θ, ṙ, θ̇]^T where d is displacement of the head from the origin, θ is the touchdown angle, and ṙ, θ̇ are the radial and angular velocities. The system input is ∆θ, the change in the next touchdown angle. The parameters are the head mass, M = 1, the spring constant, k = 300, gravity g = 9.8, and leg length r_max = 1. Despite the model's simplicity, the touchdown return map eludes a closed-form description, so MATLAB's ode45 is used to compute and linearize the return map.

The goal is to place the head of the robot at d = 3.2 after three hops. This experiment is based on a set of psychology experiments that examined the cognitive information used by humans for foot placement while running [48], [49]. Our NLG algorithm was run with β ∈ [1, 200], control cost matrices R_t = 10 for all t, and a terminal state cost given by the squared distance of the robot from d = 3.2. The initial distribution was Gaussian with mean [0, 0.3927, −3.273, −6.788]^T, which is in the vicinity of a fixed point of the return map, and covariance 10^{−3} I. The process covariance was Σ_{ε_t} = 10^{−4} diag(1, 0.1, 0.5, 0.5). Here, β = 23.11.
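The linearization needed by the NLG algorithm does not depend on the specific integrator; a generic finite-difference sketch (ours, not the authors' MATLAB implementation) that can wrap any numerically integrated return map f is:

import numpy as np

def linearize_map(f, x_nom, u_nom, eps=1e-6):
    """Finite-difference Jacobians A = df/dx, B = df/du of x_{k+1} = f(x_k, u_k).

    For the SLIP example, f would integrate the stance and flight dynamics
    between consecutive touchdowns (e.g., with scipy.integrate.solve_ivp)
    and return the next touchdown state.
    """
    x_nom = np.atleast_1d(np.asarray(x_nom, dtype=float))
    u_nom = np.atleast_1d(np.asarray(u_nom, dtype=float))
    f0 = np.atleast_1d(f(x_nom, u_nom))
    A = np.zeros((f0.size, x_nom.size))
    B = np.zeros((f0.size, u_nom.size))
    for i in range(x_nom.size):
        dx = np.zeros_like(x_nom)
        dx[i] = eps
        A[:, i] = (np.atleast_1d(f(x_nom + dx, u_nom)) - f0) / eps
    for j in range(u_nom.size):
        du = np.zeros_like(u_nom)
        du[j] = eps
        B[:, j] = (np.atleast_1d(f(x_nom, u_nom + du)) - f0) / eps
    return A, B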
The results for our simulation are shown in Fig. 4. The algorithm is compared with iLQG solutions with correct and incorrect measurement models. The believed measurement model was a noisy version of the state with covariance Σ_{ω_t} = 10^{−4} I while the actual measurement model used Σ_{ω_t} = 10^{−3} S^T S where the entries of S are sampled from a standard uniform distribution each trial. The correct iLQG solution is a locally optimal solution to the problem due to the separation principle. However, when modeling error is introduced, the iLQG solution's performance degrades rapidly. Meanwhile, the TRV-based control policy is a reliable (i.e. lower variance) and performant control strategy despite this modeling error. In addition, the solution found by our algorithm satisfies rank(C_t) = 1 for all t. Therefore, the online estimator needs to only track a single TRV corresponding to this subspace.

Fig. 4. This figure summarizes the outcome of 500 simulations of the SLIP Problem with different control strategies. Measurement covariance matrices were randomly sampled. For the iLQG control policy, a Kalman filter tracked the current state estimate, and the corresponding control was applied. For the TRV policy, a Kalman filter was maintained on the TRVs, and the control input corresponding to the MLE TRV was applied.

VI. CONCLUSION

We presented an algorithmic approach for task-driven estimation and control for robotic systems. In contrast to prevalent approaches for control that rely on accurate full-state estimation, our approach synthesizes a set of task-relevant variables (TRVs) that are sufficient for achieving the task. Our key insight is to pose the search for TRVs as an information bottleneck optimization problem. We solve this problem offline to identify TRVs and a control policy based on the TRVs. Online, we only estimate the TRVs to apply the policy. Our theoretical results suggest that this approach affords robustness to unmodeled measurement uncertainty. This is validated by thorough simulations, including a SLIP model running to a target location. Our simulations also demonstrate that our approach finds highly compressed TRVs (e.g. a one-dimensional TRV for the SLIP model).

Challenges and Future Work. On the algorithmic front, we plan to develop approaches that directly minimize the RHS of (3) instead of a linear approximation of it. We expect that this may lead to improved robustness (as suggested by Theorem 2.1). On the practical front, we plan to implement our approach on a hardware platform that mimics the gaze heuristic and on other examples including navigation problems (where the full state includes a description of the environment, e.g. in terms of an occupancy map). Perhaps the most exciting direction is to explore active versions of our approach where the control policy minimizes task-relevant uncertainty, in contrast to current approaches (e.g., belief space planning) that minimize full-state uncertainty.

We believe that the approach presented in this paper along with the indicated future directions represents an important step towards developing a principled, general framework for task-driven estimation and control.

REFERENCES

[1] G. Gigerenzer, Gut feelings: The intelligence of the unconscious. Penguin, 2007.
[2] P. McLeod, N. Reed, and Z. Dienes, "Psychophysics: How fielders arrive in time to catch the ball," Nature, vol. 426, no. 6964, p. 244, 2003.
[3] D. M. Shaffer, S. M. Krauchunas, M. Eddy, and M. K. McBeath, "How dogs navigate to catch frisbees," Psychological Science, vol. 15, no. 7, pp. 437–441, 2004.
[4] N. Tishby, F. C. Pereira, and W. Bialek, "The information bottleneck method," in Proc. 37th Annu. Allerton Conf. Communication, Control, and Computing, 1999.
[5] S. Thrun, W. Burgard, and D. Fox, Probabilistic robotics. MIT Press, 2005.
[6] B. A. Francis, A course in H-infinity control theory. Berlin; New York: Springer-Verlag, 1987.
[7] C. Edwards and S. Spurgeon, Sliding mode control: theory and applications. CRC Press, 1998.
[8] J.-J. E. Slotine, W. Li et al., Applied nonlinear control. Prentice Hall, Englewood Cliffs, NJ, 1991, vol. 199, no. 1.
[9] R. Ortega, J. A. L. Perez, P. J. Nicklasson, and H. J. Sira-Ramirez, Passivity-based control of Euler-Lagrange systems: mechanical, electrical and electromechanical applications. Springer Science & Business Media, 2013.
[10] H. K. Khalil, "Nonlinear systems," Prentice-Hall, New Jersey, vol. 2, no. 5, pp. 5–1, 1996.
[11] B. D. Anderson and J. B. Moore, Optimal control: linear quadratic methods. Courier Corporation, 2007.
[12] G. N. Nair, F. Fagnani, S. Zampieri, and R. J. Evans, "Feedback control under data rate constraints: An overview," Proc. of the IEEE, vol. 95, no. 1, pp. 108–137, 2007.
[13] A. S. Matveev and A. V. Savkin, Estimation and control over communication networks. Springer Science & Business Media, 2009.
[14] T. Tanaka, K.-K. K. Kim, P. A. Parrilo, and S. K. Mitter, "Semidefinite programming approach to Gaussian sequential rate-distortion trade-offs," IEEE Trans. Automat. Control, vol. 62, no. 4, pp. 1896–1910, 2017.
[15] H. A. Simon, "Theories of bounded rationality," Decision and organization, vol. 1, no. 1, pp. 161–176, 1972.
[16] G. Gigerenzer and R. Selten, Bounded rationality: The adaptive toolbox. MIT Press, 2002.
[17] N. Tishby and D. Polani, "Information theory of decisions and actions," in Perception-Action Cycle. Springer, 2011, pp. 601–636.
[18] J. Grau-Moya, F. Leibfried, T. Genewein, and D. A. Braun, "Planning with information-processing constraints and model uncertainty in Markov decision processes," in Joint European Conf. Machine Learning and Knowledge Discovery in Databases. Springer, 2016, pp. 475–491.
[19] E. Todorov, "Efficient computation of optimal actions," Proc. Nat. Academy of Sciences, vol. 106, no. 28, pp. 11478–11483, 2009.
[20] D. A. Braun, P. A. Ortega, E. Theodorou, and S. Schaal, "Path integral control and bounded rationality," in IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL). IEEE, 2011, pp. 202–209.
[21] G. Williams, N. Wagener, B. Goldfain, P. Drews, J. M. Rehg, B. Boots, and E. A. Theodorou, "Information theoretic MPC for model-based reinforcement learning," in IEEE Int. Conf. Robotics and Automation, 2017.
[22] J. L. Williams, J. W. Fisher III, and A. S. Willsky, "Performance guarantees for information theoretic active inference," in Artificial Intelligence and Statistics, 2007, pp. 620–627.
[23] V. Tzoumas, L. Carlone, G. J. Pappas, and A. Jadbabaie, "Control and sensing co-design," arXiv preprint arXiv:1802.08376, 2018.
[24] L. Carlone and S. Karaman, "Attention and anticipation in fast visual-inertial navigation," in IEEE Int. Conf. Robotics and Automation. IEEE, 2017, pp. 3886–3893.
[25] A. Achille and S. Soatto, "A separation principle for control in the age of deep learning," Annu. Review of Control, Robotics, and Autonomous Systems, vol. 1, no. 1, 2018. [Online]. Available: https://doi.org/10.1146/annurev-control-060117-105140
[26] R. Fox and N. Tishby, "Minimum-information LQG control part I: Memoryless controllers," in Proc. 55th Conf. Decision and Control. IEEE, 2016, pp. 5610–5616.
[27] ——, "Minimum-information LQG control part II: Retentive controllers," in Proc. 55th Conf. Decision and Control. IEEE, 2016, pp. 5603–5609.
[28] J. L. Massey, "Causality, feedback and directed information," in Proc. Int. Symp. Inf. Theory Applic. (ISITA-90), 1990, pp. 303–305.
[29] T. Tanaka, P. M. Esfahani, and S. K. Mitter, "LQG control with minimum directed information: Semidefinite programming approach," IEEE Trans. Automat. Control, vol. 63, no. 1, pp. 37–52, 2018.
[30] S. Soatto, "Actionable information in vision," in Machine Learning for Computer Vision. Springer, 2013, pp. 17–48.
[31] ——, "Steps towards a theory of visual information: Active perception, signal-to-symbol conversion and the interplay between sensing and control," arXiv preprint arXiv:1110.2053, 2011.
[32] A. Shapiro, D. Dentcheva, and A. Ruszczyński, Lectures on stochastic programming: modeling and theory. SIAM, 2009.
[33] R. M. Gray, Entropy and information theory. Springer Science & Business Media, 2011.
[34] V. Pacelli and A. Majumdar, "Task-driven estimation and control via information bottlenecks," Extended version, 2018. [Online]. Available: https://irom-lab.princeton.edu/publications/
[35] P. Whittle, "Risk-sensitive linear/quadratic/Gaussian control," Advances in Applied Probability, vol. 13, no. 4, pp. 764–777, 1981.
[36] ——, "Risk sensitivity, a strangely pervasive concept," Macroeconomic Dynamics, vol. 6, no. 1, p. 5, 2002.
[37] K. Glover and J. C. Doyle, "State-space formulae for all stabilizing controllers that satisfy an H∞-norm bound and relations to risk sensitivity," Systems & Control Letters, vol. 11, no. 3, pp. 167–172, 1988.
[38] R. Fox and N. Tishby, "Optimal selective attention in reactive agents," arXiv preprint arXiv:1512.08575, 2015.
[39] S. Boyd and L. Vandenberghe, Convex optimization. Cambridge University Press, 2004.
[40] S. Wright and J. Nocedal, "Numerical optimization," Springer Science, vol. 35, no. 67-68, p. 7, 1999.
[41] E. Todorov and W. Li, "A generalized iterative LQG method for locally-optimal feedback control of constrained nonlinear stochastic systems," in Proc. American Control Conf. IEEE, 2005, pp. 300–306.
[42] W. Li and E. Todorov, "Iterative linear quadratic regulator design for nonlinear biological movement systems," in Proc. Int. Conf. on Informatics in Control, Automation, and Robotics, 2004, pp. 222–229.
[43] D. H. Jacobson and D. Q. Mayne, "Differential dynamic programming," 1970.
[44] P. R. Florence, "Integrated perception and control at high speed," Master's thesis, Massachusetts Institute of Technology, 2017.
[45] R. T. M'Closkey and J. W. Burdick, "Periodic motions of a hopping robot with vertical and forward motion," Int. J. of Robotics Research, vol. 12, no. 3, pp. 197–218, 1993.
[46] W. J. Schwind and D. E. Koditschek, "Control of forward velocity for a simplified planar hopping robot," in Proc. Int. Conf. Robotics and Automation, vol. 1. IEEE, 1995, pp. 691–696.
[47] H. Geyer, "Simple models of legged locomotion based on compliant limb behavior," Ph.D. dissertation, Verlag nicht ermittelbar, 2005.
[48] W. H. Warren Jr., D. S. Young, and D. N. Lee, "Visual control of step length during running over irregular terrain," J. of Experimental Psychology: Human Perception and Performance, vol. 12, no. 3, p. 259, 1986.
[49] W. H. Warren, "Action-scaled information for the visual control of locomotion," in Closing the Gap. Psychology Press, 2012, pp. 261–296.