Rational Inattention in Controlled Markov Processes
Ehsan Shafieepoorfard, Maxim Raginsky and Sean P. Meyn

Abstract— The paper poses a general model for optimal control subject to information constraints, motivated in part by recent work on information-constrained decision-making by economic agents. In the average-cost optimal control framework, the general model introduced in this paper reduces to a variant of the linear-programming representation of the average-cost optimal control problem, subject to an additional mutual information constraint on the randomized stationary policy. The resulting optimization problem is convex and admits a decomposition based on the Bellman error, which is the object of study in approximate dynamic programming. The structural results presented in this paper can be used to obtain performance bounds, as well as algorithms for computation or approximation of optimal policies.

I. INTRODUCTION

In typical applications of stochastic dynamic programming, the controller has access to limited information about the state of the system. Unlike much of the existing literature on problems with imperfect state information, in this paper it is assumed that the system designer has to decide not only about the control policy, but also about the observation channel on the basis of which the control is derived. (A very partial list of references that touch upon similar themes includes [1]–[5].) There are various applications, in engineering and in economics, which fit into this framework.

In economic decision-making, the amount of information required to make a truly optimal decision will typically exceed what an agent can handle. In his seminal work [6], Christopher Sims (who shared the 2011 Nobel Memorial Prize in Economics with Thomas Sargent) adds an information-processing constraint to a specific kind of dynamic programming problem frequently used in macroeconomic models. His aim, as he expresses it, is to support John Maynard Keynes' well-known view that real economic behavior is inconsistent with the idea of continuously optimizing agents interacting in continuously clearing markets. Sims uses the term "rational inattention" to describe the setting in which information-constrained agents strive to make the best use of whatever information they are able to handle.

Sims considers a model in which a representative agent decides about his consumption over subsequent periods of time, while his computational ability to reckon his wealth – the state of the dynamic system – is limited. A special case is considered in which income in one period adds uncertainty to wealth in the next period. Other modeling assumptions reduce the model to an LQG control problem. As one justification for introducing the information constraint, Sims remarks [7] that "most people are only vaguely aware of their net worth, are little-influenced in their current behavior by the status of their retirement account, and can be induced to make large changes in savings behavior by minor 'informational' changes, like changes in default options on retirement plans." Quantitatively, the information constraint is stated as an upper bound on the mutual information, in the sense of Shannon [8], between the state of the system and the observation available to the agent.

Extensions of this work include Matejka and McKay [9], who recovered the well-known multinomial logit model in a situation where an individual must choose among discrete alternatives whose values are not perfectly known ex ante. In the sequel [10], they extend this model to study Nash equilibria in a Bertrand oligopoly setting where N firms produce the same commodity with different qualities not perfectly known ex ante; it is shown that this information friction leads to increased prices for the commodity. On a similar theme, Peng [11] is motivated by the observation that "investors have limited time and attention to process information." In a factor model for asset returns, he investigates the dynamics of prices and mutual information when there is uncertainty in the decision-making process.

Given the appeal and generality of these questions, there is ample motivation for the creation of a general theory of optimal control subject to information constraints. The contribution of the present paper is to initiate the development of such a theory, which would in turn enable us to address many problems in macroeconomics and engineering in a systematic fashion. We focus on the average-cost optimal control framework and show that the construction of an optimal information-constrained controller reduces to a variant of the linear-programming representation of the average-cost optimal control problem, subject to an additional mutual information constraint on the randomized stationary policy. The resulting optimization problem is convex and admits a decomposition in terms of the Bellman error, which is the object of study in approximate dynamic programming. The structure revealed in this paper can be used to obtain performance bounds, as well as algorithms for computation or approximation of optimal policies.
E. Shafieepoorfard and M. Raginsky are with the Department of Electrical and Computer Engineering and the Coordinated Science Laboratory, University of Illinois at Urbana–Champaign, Urbana, IL 61801, USA ({shafiee1,maxim}@illinois.edu). S. P. Meyn is with the Department of Electrical and Computer Engineering, University of Florida, Gainesville, FL 32611, USA ([email protected]). Research supported in part by the NSF under CAREER award no. CCF-1254041 and grant no. ECCS-1135598, and in part by the UIUC College of Engineering under Strategic Research Initiative on "Cognitive and Algorithmic Decision Making."

II. PRELIMINARIES AND NOTATION

All spaces are assumed to be standard Borel (i.e., isomorphic to a Borel subset of a complete separable metric space), and will be equipped with their Borel σ-algebras. If X is such a space, then B(X) will denote its Borel σ-algebra, and P(X) will denote the space of all probability measures on (X, B(X)). We will denote by M(X) the space of all measurable functions f : X → ℝ and by Cb(X) ⊆ M(X) the space of all bounded continuous functions. We will often use bilinear form notation for expectations: for any f ∈ L¹(µ),

    ⟨µ, f⟩ ≜ ∫_X f(x) µ(dx) = E[f(X)],

where in the last expression it is understood that X is an X-valued random object with Law(X) = µ.

Given two spaces X and Y, a mapping K(·|·) : B(Y) × X → [0, 1] is a Markov (or stochastic) kernel if K(·|x) ∈ P(Y) for all x ∈ X and x ↦ K(B|x) is measurable for every B ∈ B(Y). The space of all such Markov kernels will be denoted by M(Y|X). Markov kernels K ∈ M(Y|X) act on measurable functions f ∈ M(Y) from the left as

    Kf(x) ≜ ∫_Y f(y) K(dy|x), ∀x ∈ X,

and on probability measures µ ∈ P(X) from the right as

    µK(B) ≜ ∫_X K(B|x) µ(dx), ∀B ∈ B(Y).

Moreover, Kf ∈ M(X) for any f ∈ M(Y), and µK ∈ P(Y) for any µ ∈ P(X). The relative entropy (or information divergence) [8] between any two µ, ν ∈ P(X) is defined as

    D(µ‖ν) ≜ ⟨µ, log(dµ/dν)⟩ if µ ≺ ν, and D(µ‖ν) ≜ +∞ otherwise.

Given any probability measure µ ∈ P(X) and any Markov kernel K ∈ M(Y|X), we can define a probability measure µ ⊗ K on the product space (X × Y, B(X) ⊗ B(Y)) via its action on the rectangles A × B, A ∈ B(X), B ∈ B(Y):

    (µ ⊗ K)(A × B) ≜ ∫_A K(B|x) µ(dx).

Note that (µ ⊗ K)(X × B) = µK(B) for all B ∈ B(Y). The Shannon mutual information [8] in the pair (µ, K) is

    I(µ, K) ≜ D(µ ⊗ K ‖ µ ⊗ µK),    (1)

where, for any µ ∈ P(X) and ν ∈ P(Y), µ ⊗ ν denotes the product measure defined via (µ ⊗ ν)(A × B) ≜ µ(A)ν(B) for all A ∈ B(X), B ∈ B(Y). Mutual information satisfies the data processing inequality: if X → Y → Z is a Markov chain, then

    I(X; Z) ≤ I(X; Y).    (2)

In words, no additional processing can increase information.
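On finite alphabets these objects are concrete: a kernel K ∈ M(Y|X) is a row-stochastic matrix, the action µ ↦ µK is a vector-matrix product, and the mutual information (1) reduces to a finite sum. The following minimal sketch is ours rather than the paper's (the function name, the binary symmetric channel, and the second kernel L are illustrative choices); it computes I(µ, K) directly from definition (1) and checks the data processing inequality (2).

```python
import numpy as np

def mutual_information(mu, K):
    """I(mu, K) = D(mu ⊗ K || mu ⊗ muK) in nats, for finite X and Y.

    mu: probability vector of shape (n,); K: row-stochastic matrix of
    shape (n, m) with K[x, y] = K(y|x).
    """
    muK = mu @ K                 # output marginal: (muK)(y) = sum_x mu(x) K(y|x)
    joint = mu[:, None] * K      # (mu ⊗ K)(x, y) = mu(x) K(y|x)
    prod = np.outer(mu, muK)     # (mu ⊗ muK)(x, y) = mu(x) (muK)(y)
    mask = joint > 0             # convention: 0 log 0 = 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / prod[mask])))

# Uniform binary source observed through a binary symmetric channel:
mu = np.array([0.5, 0.5])
K = np.array([[0.9, 0.1],
              [0.1, 0.9]])
print(mutual_information(mu, K))          # ≈ 0.368 nats

# Data processing (2): post-composing with another kernel L, so that
# X -> Y -> Z is a Markov chain, cannot increase information.
L = np.array([[0.8, 0.2],
              [0.3, 0.7]])
assert mutual_information(mu, K @ L) <= mutual_information(mu, K) + 1e-12
```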
III. SOME GENERAL CONSIDERATIONS

[Fig. 1. System model: the state X_t of the controlled system is passed through the observation channel; the controller maps the resulting observation Z_t to the control U_t, which is fed back to the controlled system.]

Consider a model in which the controller is constrained to observe the system being controlled through an information-limited channel. This is illustrated in the block diagram shown in Figure 1, consisting of:

● the (time-invariant) controlled system, specified by an initial condition µ ∈ P(X) and a stochastic kernel Q ∈ M(X|X × U), where X is the state space and U is the control (or action) space;
● the observation channel, specified by a sequence W of stochastic kernels W_t ∈ M(Z|X^t × Z^{t-1} × U^{t-1}), t = 1, 2, ..., where Z is some observation space; and
● the feedback controller, specified by a sequence Φ of stochastic kernels Φ_t ∈ M(U|Z^t), t = 1, 2, ....

The X-valued state process {X_t}_{t=1}^∞, the Z-valued observation process {Z_t}_{t=1}^∞, and the U-valued control process {U_t}_{t=1}^∞ are defined on a common probability space (Ω, F, P) and have the causal ordering

    X_1, Z_1, U_1, ..., X_t, Z_t, U_t, ...,    (3)

where, P-almost surely, P(X_1 ∈ A) = µ(A) for all A ∈ B(X), and for all t = 1, 2, ..., B ∈ B(Z), C ∈ B(U), D ∈ B(X) we have

    P(Z_t ∈ B | X^t, Z^{t-1}, U^{t-1}) = W_t(B | X^t, Z^{t-1}, U^{t-1})    (4a)
    P(U_t ∈ C | X^t, Z^t, U^{t-1}) = Φ_t(C | Z^t)    (4b)
    P(X_{t+1} ∈ D | X^t, Z^t, U^t) = Q(D | X_t, U_t)    (4c)

This specification ensures that, for each t, the next state X_{t+1} is conditionally independent of the past history (X^{t-1}, Z^t, U^{t-1}) given the current state-action pair (X_t, U_t).
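To make the causal ordering (3) and the conditional laws (4a)–(4c) concrete, here is a minimal simulation sketch on finite spaces. It is ours rather than the paper's: the kernels Q, W and Φ are randomly generated stand-ins, and for simplicity the channel is memoryless and the controller stationary, a special case of the history-dependent kernels allowed in (4a)–(4b).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy finite instance of the interconnection (4a)-(4c). All kernels are
# hypothetical stand-ins. The channel is memoryless,
# W_t(.|x^t, z^{t-1}, u^{t-1}) = W(.|x_t), and the controller stationary,
# Phi_t(.|z^t) = Phi(.|z_t).
n_x, n_z, n_u = 3, 2, 2

Q = rng.dirichlet(np.ones(n_x), size=(n_x, n_u))  # Q[x, u]: law of X_{t+1}, eq. (4c)
W = rng.dirichlet(np.ones(n_z), size=n_x)         # W[x]:    law of Z_t,     eq. (4a)
Phi = rng.dirichlet(np.ones(n_u), size=n_z)       # Phi[z]:  law of U_t,     eq. (4b)
mu = np.ones(n_x) / n_x                           # initial state distribution

def sample(p):
    """Draw one index from the probability vector p."""
    return rng.choice(len(p), p=p)

x = sample(mu)                    # X_1 ~ mu
for t in range(1, 11):
    z = sample(W[x])              # Z_t ~ W(. | X_t)           -- (4a)
    u = sample(Phi[z])            # U_t ~ Phi(. | Z_t)         -- (4b)
    x_next = sample(Q[x, u])      # X_{t+1} ~ Q(. | X_t, U_t)  -- (4c)
    print(f"t={t}: X={x}, Z={z}, U={u}")
    x = x_next
```

Each pass through the loop draws Z_t, U_t and X_{t+1} in exactly the order prescribed by (3), so the simulated trajectory has the conditional distributions (4a)–(4c) by construction.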