A Framework for Time-Consistent, Risk-Averse Model Predictive Control: Theory and Algorithms

Yin-Lam Chow, Marco Pavone

Y.-L. Chow is with the Institute for Computational and Mathematical Engineering, Stanford University, Stanford, CA 94305, USA. Email: [email protected].
M. Pavone is with the Department of Aeronautics and Astronautics, Stanford University, Stanford, CA 94305, USA. Email: [email protected].

Abstract— In this paper we present a framework for risk-averse model predictive control (MPC) of linear systems affected by multiplicative uncertainty. Our key innovation is to consider time-consistent, dynamic risk metrics as objective functions to be minimized. This framework is axiomatically justified in terms of time-consistency of risk preferences, is amenable to dynamic optimization, and is unifying in the sense that it captures a full range of risk assessments from risk-neutral to worst case. Within this framework, we propose and analyze an online risk-averse MPC algorithm that is provably stabilizing. Furthermore, by exploiting the dual representation of time-consistent, dynamic risk metrics, we cast the computation of the MPC control law as a convex optimization problem amenable to implementation on embedded systems. Simulation results are presented and discussed.

I. INTRODUCTION

Model predictive control (MPC) is one of the most popular methods to address optimal control problems in an online setting [1], [2]. The key idea behind MPC is to obtain the control action by repeatedly solving, at each sampling instant, a finite-horizon open-loop optimal control problem, using the current state of the plant as the initial state; the result of the optimization is an (open-loop) control sequence, whose first element is applied to control the system [3]. The classic MPC framework does not provide a systematic way to address model uncertainties and disturbances [4]. Accordingly, one of the main research thrusts for MPC is to find techniques that guarantee persistent feasibility and stability in the presence of disturbances. Current techniques essentially fall into two categories: (1) min-max (or worst-case) formulations, where the performance indices to be minimized are computed with respect to the worst possible disturbance realization [5], [6], [7], and (2) stochastic formulations, where risk-neutral expected values of performance indices (and possibly constraints) are considered [4], [8]. The main drawback of the worst-case approach is that the control law may be too conservative, since the MPC law is required to guarantee stability and constraint fulfillment under the worst-case scenario. On the other hand, stochastic formulations, whereby the assessment of future random outcomes is accomplished through a risk-neutral expectation, may be unsuitable in scenarios where one desires to protect the system from large deviations.

The objective of this paper is to introduce a systematic method to include risk aversion in MPC. The inclusion of risk aversion is important for several reasons. First, in uncertain environments, a guaranteed-feasible solution may not exist, and the issue becomes how to properly balance planner conservatism against the risk of infeasibility (clearly this cannot be achieved with a worst-case approach). Second, risk aversion allows the control designer to increase policy robustness by limiting confidence in the model. Third, risk aversion serves to prevent rare undesirable events. Finally, in a reinforcement learning framework where the world model is not accurately known, a risk-averse agent can cautiously balance exploration versus exploitation for fast convergence and avoid "bad" states that could potentially lead to a catastrophic failure [9], [10].

The inclusion of risk aversion in MPC is difficult for two main reasons. First, it appears to be difficult to model risk in multi-period settings in a way that matches intuition [10]. In particular, a common strategy to include risk aversion in multi-period contexts is to apply a static risk metric, which assesses risk from the perspective of a single point in time, to the total cost of the future stream of random outcomes. However, using static risk metrics in multi-period decision problems can lead to an over- or under-estimation of the true dynamic risk, as well as to potentially "inconsistent" behavior; see [11] and references therein. Second, optimization problems involving risk metrics tend to be computationally intractable, as they do not allow a recursive estimation of risk. In practice, risk-averse MPC often reduces to the minimization of the expected value of an aggregate, risk-weighted objective function [12], [13].

In this paper, as a radical departure from traditional approaches, we leverage recent strides in the theory of dynamic risk metrics developed by the operations research community [14], [15] to include risk aversion in MPC. The key property of dynamic risk metrics is that, by assessing risk at multiple points in time, one can guarantee time-consistency of risk preferences over time [14], [15]. In particular, the essential requirement for time consistency is that if a certain outcome is considered less risky in all states of the world at stage k + 1, then it should also be considered less risky at stage k. Remarkably, it is proven in [15] that any risk measure that is time consistent can be represented as a composition of one-step risk metrics; in other words, in multi-period settings, risk (as expected) should be compounded over time.

The contribution of this paper is threefold. First, we introduce a notion of dynamic risk metric, referred to as Markov dynamic polytopic risk metric, that captures a full range of risk assessments and enjoys a geometrical structure that is particularly favorable from a computational standpoint. Second, we present and analyze a risk-averse MPC algorithm that minimizes in a receding-horizon fashion a Markov dynamic polytopic risk metric, under the assumption that the system's model is linear and is affected by multiplicative uncertainty. Finally, by exploiting the "geometrical" structure of Markov dynamic polytopic risk metrics, we present a convex programming formulation for risk-averse MPC that is amenable to real-time implementation (for moderate horizon lengths). Our framework has three main advantages: (1) it is axiomatically justified, in the sense that risk, by construction, is assessed in a time-consistent fashion; (2) it is amenable to dynamic and convex optimization, thanks to the compositional form of Markov dynamic polytopic risk metrics and their geometry; and (3) it is general, in the sense that it captures a full range of risk assessments from risk-neutral to worst case. In this respect, our formulation represents a unifying approach for risk-averse MPC.

The rest of the paper is organized as follows. In Section II we provide a review of the theory of dynamic risk metrics. In Section III we discuss the stochastic model we consider in this paper. In Section IV we introduce and discuss the notion of Markov dynamic polytopic risk metrics. In Section V we state the infinite-horizon optimal control problem we wish to address, and in Section VI we derive conditions for risk-averse closed-loop stability. In Sections VII and VIII we present, respectively, a risk-averse model predictive control law and approaches for its computation. Numerical experiments are presented and discussed in Section IX. Finally, in Section X we draw some conclusions and discuss directions for future work.
II. REVIEW OF DYNAMIC RISK THEORY

In this section we briefly describe the theory of coherent and dynamic risk metrics, on which we will rely extensively later in the paper. The material presented in this section summarizes several novel results in risk theory achieved in the past ten years. Our presentation strives to present this material in an intuitive fashion and with a notation tailored to control applications.

A. Static, coherent measures of risk

Consider a probability space (Ω, F, P), where Ω is the set of outcomes (sample space), F is a σ-algebra over Ω representing the set of events we are interested in, and P is a probability measure over F. In this paper we will focus on disturbance models characterized by probability mass functions; hence we restrict our attention to finite probability spaces (i.e., Ω has a finite number of elements or, equivalently, F is a finitely generated algebra). Denote with Z the space of random variables Z : Ω → (−∞, ∞) defined over the probability space (Ω, F, P). In this paper a random variable Z ∈ Z is interpreted as a cost, i.e., the smaller the realization of Z, the better. For Z, W ∈ Z, we denote by Z ≤ W the point-wise partial order, i.e., Z(ω) ≤ W(ω) for all ω ∈ Ω.

By a risk measure (or risk metric; we will use these terms interchangeably) we understand a function ρ(Z) that maps an uncertain outcome Z into the extended real line R ∪ {+∞} ∪ {−∞}. In this paper we restrict our analysis to coherent risk measures, defined as follows:

Definition II.1 (Coherent Risk Measures). A coherent risk measure is a mapping ρ : Z → R satisfying the following four axioms:

A1 Convexity: ρ(λZ + (1 − λ)W) ≤ λρ(Z) + (1 − λ)ρ(W), for all λ ∈ [0, 1] and Z, W ∈ Z;
A2 Monotonicity: if Z ≤ W and Z, W ∈ Z, then ρ(Z) ≤ ρ(W);
A3 Translation invariance: if a ∈ R and Z ∈ Z, then ρ(Z + a) = ρ(Z) + a;
A4 Positive homogeneity: if λ ≥ 0 and Z ∈ Z, then ρ(λZ) = λρ(Z).

These axioms were originally conceived in [16] and ensure the "rationality" of single-period risk assessments (we refer the reader to [16] for a detailed motivation of these axioms). One of the main properties of coherent risk metrics is a universal representation theorem, which in the context of finite probability spaces takes the following form:

Theorem II.2 (Representation Theorem for Finite Probability Spaces [17, page 265]). Consider the probability space {Ω, F, P}, where Ω is finite, i.e., Ω = {ω_1, ..., ω_L}, F is the σ-algebra of all subsets (i.e., F = 2^Ω), and P = (p(1), ..., p(L)), with all probabilities positive. Let B be the set of probability density functions:

  B := { ζ ∈ R^L : Σ_{j=1}^L p(j)ζ(j) = 1, ζ ≥ 0 }.

The risk measure ρ : Z → R is a coherent risk measure if and only if there exists a convex, bounded, and weakly* closed set U ⊂ B such that ρ(Z) = max_{ζ∈U} E_ζ[Z].

The result essentially states that any coherent risk measure is an expectation with respect to a worst-case density function ζ, chosen adversarially from a suitable set of test density functions (referred to as the risk envelope).
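For concreteness, the dual representation of Theorem II.2 can be evaluated numerically by a linear program. The sketch below is not part of the paper's development: the data, the confidence level α, and the use of SciPy are our own assumptions. It evaluates the Conditional Value-at-Risk, one of the coherent measures cited later in Section IV, directly as max_{ζ∈U} E_ζ[Z].

```python
# Illustrative sketch (not from the paper): evaluating a coherent risk
# measure via its dual representation rho(Z) = max_{zeta in U} E_zeta[Z]
# (Theorem II.2). Here U is the CVaR_alpha envelope {zeta in B : zeta <= 1/alpha},
# a standard polytopic example; the data and alpha are made up.
import numpy as np
from scipy.optimize import linprog

p = np.array([0.2, 0.3, 0.5])     # nominal pmf (all entries positive)
Z = np.array([10.0, 0.0, 4.0])    # cost realizations Z(omega_j)
alpha = 0.3                       # CVaR confidence level (assumption)

# Maximize sum_j p(j) zeta(j) Z(j) subject to sum_j p(j) zeta(j) = 1 and
# 0 <= zeta(j) <= 1/alpha. linprog minimizes, hence the sign flip.
res = linprog(c=-(p * Z),
              A_eq=p[None, :], b_eq=[1.0],
              bounds=[(0.0, 1.0 / alpha)] * len(p))
print("CVaR:", -res.fun)          # 8.0: mean of the worst 30% of outcomes
```

The adversary concentrates as much density as the envelope allows on the costliest outcomes, which is exactly the "worst-case expectation" reading of the theorem.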
In this paper ∈ { } a Z is interpretedF as a cost, i.e., the The fundamental question in the theory of dynamic risk smaller the realization∈ of ZZ, the better. For Z,W , we denote measures is the following: how do we evaluate the risk of the sequence Zk,...,ZN from the perspective of stage by Z W the point-wise partial order, i.e., Z(ω) W (ω) { } for all≤ω Ω. ≤ k? The answer, within the modern theory of risk, relies By a risk∈ measure (or risk metric, we will use these terms on two key intuitive facts [15]. First, in dynamic settings interchangeably) we understand a function ρ(Z) that maps an the specification of risk preferences should no longer entail constructing a single risk metric but rather a sequence of risk uncertain outcome Z into the extended real line R + N ∪{ ∞}∪ metrics ρk,N , each mapping a future stream of random . In this paper we restrict our analysis to coherent risk { }k=0 measures{−∞} , defined as follows: costs into a risk metric/assessment at time k. This motivates the following definition. Definition II.1 (Coherent Risk Measures). A coherent risk Definition II.3 (). A dynamic risk measure is a mapping ρ : R, satisfying the following four axioms: Z → measure is a sequence of mappings ρk,N : k,N k, k 0,...,N , obeying the following monotonicityZ property:→ Z ∈ A1 Convexity: ρ(λZ + (1 λ)W ) λρ(Z) + (1 { } , for all − and ≤ ; − λ)ρ(W ) λ [0, 1] Z,W ρk,N (Z) ρk,N (W ) for all Z,W k,N such that Z W. A2 Monotonicity: if Z∈ W and Z,W∈ Z , then ≤ ∈Z ≤ ρ(Z) ρ(W ); ≤ ∈ Z The above monotonicity property (analogous to axiom A2 ≤ A3 Translation invariance: if a R and Z , then in Definition II.1) is, arguably, a natural requirement for any ρ(Z + a) = ρ(Z) + a; ∈ ∈ Z meaningful dynamic risk measure. N Second, the sequence of metrics ρk,N k=0 should be is a mapping ρk : k+1 k, k 0,...,N 1 , with constructed so that the risk preference{ profile} is consistent the following four properties:Z → Z ∈ { − } over time [18], [19], [11]. A widely accepted notion of • Convexity: ρk(λZ + (1 λ)W ) λρk(Z) + (1 − ≤ − time-consistency is as follows [15]: if a certain outcome is λ)ρk(W ), λ [0, 1] and Z,W k+1; ∀ ∈ ∈ Z considered less risky in all states of the world at stage k +1, • Monotonicity: if Z W then ρk(Z) ρk(W ), ≤ ≤ then it should also be considered less risky at stage k. Z,W k+1; ∀ ∈ Z The following example (adapted from [20]) shows how • Translation invariance: ρk(Z+W ) = Z+ρk(W ), Z ∀ ∈ dynamic risk measures as defined above might indeed result k and W k+1; Z ∈ Z in time-inconsistent, and ultimately undesirable, behaviors. • Positive homogeneity: ρk(λZ) = λρk(Z), Z k+1 and λ 0. ∀ ∈ Z Example II.4. Consider the simple setting whereby there ≥ is a final cost Z and one seeks to evaluate such cost from We are now in a position to state the main result of this the perspective of earlier stages. Consider the three-stage section. scenario tree in Figure 1, with the elementary events Ω = Theorem II.7 (Dynamic, Time-consistent Risk Measures UU,UD,DU,DD , and the filtration 0 = , Ω , 1 = {n o } Ω F {∅ } F ([15])). Consider, for each k 0,...,N , the mappings , U , D , Ω , and 2 = 2 . Consider the dynamic risk ∈ { } ρk,N : k,N k defined as measure:∅ { } { } F Z → Z ρk,N = Zk + ρk(Zk+1 + ρk+1(Zk+2 + ... + ρk,N (Z) := max Eq[Z k], k = 0, 1, 2 (2) q∈U |F ρN−2(ZN−1 + ρN−1(ZN )) ...)), where contains two probability measures, one correspond- where the ρk’s are coherent one-step conditional risk mea- ing toUp = 0.4, and the other one to p = 0.6 Assume sures. 
The issue then is what additional "structural" properties are required for a dynamic risk measure to be time consistent. We first provide a rigorous version of the previous definition of time-consistency.

Definition II.5 (Time Consistency ([15])). A dynamic risk measure {ρ_{k,N}}_{k=0}^N is time-consistent if, for all 0 ≤ l < k ≤ N and all sequences Z, W ∈ Z_{l,N}, the conditions

  Z_i = W_i, i = l, ..., k − 1, and ρ_{k,N}(Z_k, ..., Z_N) ≤ ρ_{k,N}(W_k, ..., W_N),  (1)

imply that

  ρ_{l,N}(Z_l, ..., Z_N) ≤ ρ_{l,N}(W_l, ..., W_N).

As we will see in Theorem II.7, the notion of time-consistent risk measures is tightly linked to the notion of coherent risk measures, whose generalization to the multi-period setting is given below:

Definition II.6 (Coherent One-step Conditional Risk Measures ([15])). A coherent one-step conditional risk measure is a mapping ρ_k : Z_{k+1} → Z_k, k ∈ {0, ..., N − 1}, with the following four properties:

• Convexity: ρ_k(λZ + (1 − λ)W) ≤ λρ_k(Z) + (1 − λ)ρ_k(W), for all λ ∈ [0, 1] and Z, W ∈ Z_{k+1};
• Monotonicity: if Z ≤ W, then ρ_k(Z) ≤ ρ_k(W), for all Z, W ∈ Z_{k+1};
• Translation invariance: ρ_k(Z + W) = Z + ρ_k(W), for all Z ∈ Z_k and W ∈ Z_{k+1};
• Positive homogeneity: ρ_k(λZ) = λρ_k(Z), for all Z ∈ Z_{k+1} and λ ≥ 0.

We are now in a position to state the main result of this section.

Theorem II.7 (Dynamic, Time-consistent Risk Measures ([15])). Consider, for each k ∈ {0, ..., N}, the mappings ρ_{k,N} : Z_{k,N} → Z_k defined as

  ρ_{k,N} = Z_k + ρ_k(Z_{k+1} + ρ_{k+1}(Z_{k+2} + ... + ρ_{N−2}(Z_{N−1} + ρ_{N−1}(Z_N)) ...)),  (2)

where the ρ_k's are coherent one-step conditional risk measures. Then, the ensemble of such mappings is a dynamic, time-consistent risk measure.

Remarkably, Theorem 1 in [15] shows (under weak assumptions) that the "multi-stage composition" in equation (2) is indeed necessary for time consistency. Accordingly, in the remainder of this paper, we will focus on the dynamic, time-consistent risk measures characterized in Theorem II.7.
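To make the compositional form (2) concrete, the sketch below (our illustration) re-evaluates the cost of Example II.4 with the one-step worst-case expectation composed backward through the tree. The nested value (60) now agrees with the stage-1 assessments, so the earlier paradox disappears.

```python
# Sketch: the multi-stage composition of equation (2) on the tree of
# Example II.4. Composing one-step risks restores time consistency.
q_set = [0.4, 0.6]
Z = {"UU": 0.0, "UD": 100.0, "DU": 100.0, "DD": 0.0}

def rho_step(up_val, down_val):
    """One-step coherent risk: worst-case expectation over the envelope."""
    return max(p * up_val + (1 - p) * down_val for p in q_set)

r_U = rho_step(Z["UU"], Z["UD"])      # risk-to-go at node U
r_D = rho_step(Z["DU"], Z["DD"])      # risk-to-go at node D
nested = rho_step(r_U, r_D)           # composed assessment at the root
print(r_U, r_D, nested)               # 60.0 60.0 60.0 (static value was 48)
```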
III. MODEL DESCRIPTION

Consider the discrete-time system:

  x_{k+1} = A(w_k)x_k + B(w_k)u_k,  (3)

where k ∈ N is the time index, x_k ∈ R^{N_x} is the state, u_k ∈ R^{N_u} is the (unconstrained) control input, and w_k ∈ W is the process disturbance. We assume that the initial condition x_0 is deterministic. We assume that W is a finite set of cardinality L, i.e., W = {w^[1], ..., w^[L]}. For each stage k and state-control pair (x_k, u_k), the process disturbance w_k is drawn from the set W according to the probability mass function

  p = [p(1), p(2), ..., p(L)]^T,

where p(j) = P(w_k = w^[j]), j ∈ {1, ..., L}. Without loss of generality, we assume that p(j) > 0 for all j. Note that the probability mass function for the process disturbance is time-invariant, and that the process disturbance is independent of the process history and of the state-control pair (x_k, u_k). Under these assumptions, the stochastic process {x_k} is clearly a Markov process.

By enumerating all L realizations of the process disturbance w_k, system (3) can be rewritten as:

  x_{k+1} = A_j x_k + B_j u_k  if w_k = w^[j],

where A_j := A(w^[j]) and B_j := B(w^[j]), j ∈ {1, ..., L}. The results presented in this paper can be immediately extended to the time-varying case (i.e., where the probability mass function for the process disturbance is time-varying). To simplify notation, however, we prefer to focus this paper on the time-invariant case.
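A minimal simulation sketch of model (3) is given below. The two-scenario matrices, pmf, and feedback gain are placeholders of our own choosing (the paper's own example appears in Section IX).

```python
# Simulating system (3): x_{k+1} = A(w_k) x_k + B(w_k) u_k with w_k drawn
# i.i.d. from a finite set of L = 2 scenarios. All data are placeholders.
import numpy as np

A = [np.array([[0.7, 0.1], [0.0, 0.6]]), np.array([[0.8, 0.0], [0.1, 0.7]])]
B = [np.array([[0.0], [1.0]]), np.array([[0.1], [0.8]])]
p = np.array([0.8, 0.2])                 # time-invariant pmf, p(j) > 0
F = np.array([[-0.1, -0.2]])             # an arbitrary stabilizing gain

rng = np.random.default_rng(0)
x = np.array([1.0, 1.0])
for k in range(5):
    j = rng.choice(len(p), p=p)          # scenario index for w_k
    x = A[j] @ x + B[j] @ (F @ x)        # closed-loop step x_{k+1}
    print(k + 1, j, x)
```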
IV. MARKOV POLYTOPIC RISK MEASURES

In this section we refine the notion of dynamic time-consistent risk metrics (as defined in Theorem II.7) in two ways: (1) we add a polytopic structure to the dual representation of coherent risk metrics, and (2) we add a Markovian structure. This will lead to the definition of Markov dynamic polytopic risk metrics, which enjoy favorable computational properties and, at the same time, maintain most of the generality of dynamic time-consistent risk metrics.

A. Polytopic risk measures

According to the discussion in Section III, the probability space for the process disturbance has a finite number of elements. Accordingly, consider Theorem II.2; by the definition of expectation, one has E_ζ[Z] = Σ_{j=1}^L Z(j)p(j)ζ(j). In our framework (inspired by [21]), we consider coherent risk measures where the risk envelope is a polytope, i.e., there exist matrices S^I, S^E and vectors T^I, T^E of appropriate dimensions such that

  U^poly = { ζ ∈ B | S^I ζ ≤ T^I, S^E ζ = T^E }.

We will refer to coherent risk measures representable with a polytopic risk envelope as polytopic risk measures. Consider the bijective map q(j) := p(j)ζ(j) (recall that, in our model, p(j) > 0). Then, by applying such map, one can easily rewrite a polytopic risk measure as

  ρ(Z) = max_{q ∈ U^poly} E_q[Z],

where q is a probability mass function belonging to a polytopic subset of the standard simplex, i.e.:

  U^poly = { q ∈ ∆^L | S^I q ≤ T^I, S^E q = T^E },  (4)

where ∆^L := { q ∈ R^L : Σ_{j=1}^L q(j) = 1, q ≥ 0 }. Accordingly, one has E_q[Z] = Σ_{j=1}^L Z(j)q(j) (note that, with a slight abuse of notation, we are using the same symbols as before for U^poly, S^I, and S^E).

The class of polytopic risk measures is large; we give below some examples (also note that any comonotonic risk measure is a polytopic risk measure [11]).

Example IV.1 (Examples of Polytopic Risk Measures). The expected value of a random variable Z can be represented according to equation (4) with the polytopic risk envelope U^poly = { q ∈ ∆^L | q(j) = p(j) for all j ∈ {1, ..., L} }.

A second example is represented by the average upper semi-deviation risk metric, defined as

  ρ_AUS(Z) := E[Z] + c E[(Z − E[Z])_+],

where 0 ≤ c ≤ 1. This metric can be represented according to equation (4) with polytopic risk envelope ([22], [17]): U^poly = { q ∈ ∆^L | q(j) = p(j)(1 + h(j) − Σ_{i=1}^L h(i)p(i)), 0 ≤ h(j) ≤ c, j ∈ {1, ..., L} }.

A third example is represented by the worst-case risk, defined as

  WCR(Z) := max{ Z(j) : j ∈ {1, ..., L} }.

Such risk metric can be trivially represented according to equation (4) with polytopic risk envelope U^poly = ∆^L.

Other important examples include the Conditional Value-at-Risk [23], the mean absolute semi-deviation [22], the spectral risk measures [24], [11], the optimized certainty equivalent and expected utility [25], [17], [21], and the distributionally-robust risk [4]. The key point is that the notion of polytopic risk metric covers a full gamut of risk assessments, ranging from risk-neutral to worst case.
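As an illustration of the envelope form (4) (ours, with made-up data), the sketch below computes the average upper semi-deviation metric both from its primal definition and as a worst case over its polytopic envelope. Since the envelope is the affine image of the box 0 ≤ h(j) ≤ c, it suffices to enumerate the images of the box vertices.

```python
# Sketch: rho_AUS(Z) computed (i) by the primal formula and (ii) by
# maximizing E_q[Z] over the vertices of the envelope in Example IV.1.
import itertools
import numpy as np

p = np.array([0.2, 0.3, 0.5])
Z = np.array([10.0, 0.0, 4.0])
c = 0.5

mean = p @ Z
primal = mean + c * (p @ np.maximum(Z - mean, 0.0))

best = -np.inf
for h in itertools.product([0.0, c], repeat=len(p)):   # box vertices
    h = np.array(h)
    q = p * (1.0 + h - h @ p)          # candidate pmf in U^poly
    best = max(best, q @ Z)

print(primal, best)                    # both equal 4.6 for this data
```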
B. Markov dynamic polytopic risk metrics

Note that in the definition of dynamic, time-consistent risk measures, since at stage k the value of ρ_k is F_k-measurable, the evaluation of risk can depend on the whole past; see [15, Section IV]. For example, the weight c in the definition of the average upper mean semi-deviation risk metric can be an F_k-measurable random variable (see [15, Example 2]). This generality, which appears of little practical value in many cases, leads to optimization problems that are intractable. This motivates us to add a Markovian structure to dynamic, time-consistent risk measures (similarly as in [15]). We start by introducing the notion of Markov polytopic risk measure (similar to [15, Definition 6]).

Definition IV.2 (Markov Polytopic Risk Measures). Consider the Markov process {x_k} that evolves according to equation (3). A coherent one-step conditional risk measure ρ_k(·) is a Markov polytopic risk measure with respect to {x_k} if it can be written as

  ρ_k(Z(x_{k+1})) = max_{q ∈ U_k^poly(x_k, p)} E_q[Z(x_{k+1})],

where

  U_k^poly(x_k, p) = { q ∈ ∆^L | S_k^I(x_k, p) q ≤ T_k^I(x_k, p), S_k^E(x_k, p) q = T_k^E(x_k, p) }.

In other words, a Markov polytopic risk measure is a coherent one-step conditional risk measure where the evaluation of risk is not allowed to depend on the whole past (for example, the weight c in the definition of the average upper mean semi-deviation risk metric can depend on the past only through x_k), and the risk envelope is a polytope. Correspondingly, we define a Markov dynamic polytopic risk metric as follows.

Definition IV.3 (Markov Dynamic Polytopic Risk Measures). Consider the Markov process {x_k} that evolves according to equation (3). A Markov dynamic polytopic risk measure is a set of mappings ρ_{k,N} : Z_{k,N} → Z_k defined as

  ρ_{k,N} = Z(x_k) + ρ_k(Z(x_{k+1}) + ... + ρ_{N−2}(Z(x_{N−1}) + ρ_{N−1}(Z(x_N))) ...),

for k ∈ {0, ..., N}, where the single-period risk measures ρ_k are Markov polytopic risk measures.

Clearly, a Markov dynamic polytopic risk metric is time consistent. Definition IV.3 can be extended to the case where the probability mass function for the disturbance depends on the current state and control action. We avoid this generalization to keep the exposition simple and consistent with model (3).

V. PROBLEM FORMULATION

In light of Sections III and IV, we are now in a position to state the risk-averse MPC problem we wish to solve in this paper. Our problem formulation relies on Markov dynamic polytopic risk metrics that satisfy the following stationarity assumption.

Assumption V.1 (Time-Invariance of Risk). The polytopic risk envelopes U_k^poly do not depend explicitly on time and are independent of the state x_k, i.e., U_k^poly(x_k, p) = U^poly(p) for all k.

This assumption is crucial for the well-posedness of our formulation and in order to devise a tractable MPC algorithm that relies on linear matrix inequalities. We next introduce a notion of stability tailored to our risk-averse context.

Definition V.2 (Uniform Global Risk-Sensitive Exponential Stability). System (3) is said to be Uniformly Globally Risk-Sensitive Exponentially Stable (UGRSES) if there exist constants c ≥ 0 and λ ∈ [0, 1) such that for all initial conditions x_0 ∈ R^{N_x},

  ρ_{0,k}(0, ..., 0, x_k^T x_k) ≤ c λ^k x_0^T x_0, for all k ∈ N,  (5)

where {ρ_{0,k}} is a Markov dynamic polytopic risk measure satisfying Assumption V.1.

One can easily show that UGRSES is a more restrictive stability condition than mean-square stability (considered, for example, in [4]).

Consider the MDP described in Section III and let Π be the set of all stationary feedback control policies, i.e., Π := { π : R^{N_x} → R^{N_u} }. Consider the quadratic cost function c : R^{N_x} × R^{N_u} → R_{≥0} defined as

  c(x, u) := x^T Q x + u^T R u,

where Q = Q^T ≻ 0 and R = R^T ≻ 0 are given state and control penalties. The problem we wish to address is as follows.

Optimization Problem OPT — Given an initial state x_0 ∈ R^{N_x}, solve

  inf_{π∈Π}  lim sup_{k→∞}  J_{0,k}(x_0, π)
  such that  x_{k+1} = A(w_k)x_k + B(w_k)π(x_k),
             system (3) is UGRSES,

where

  J_{0,k}(x_0, π) = ρ_{0,k}( c(x_0, π(x_0)), ..., c(x_k, π(x_k)) ),

and {ρ_{0,k}} is a Markov dynamic polytopic risk measure satisfying Assumption V.1.

We will denote the optimal cost function as J*_∞(x_0). Note that the risk measure in the definition of UGRSES is assumed to be identical to the risk measure used to evaluate the cost of a policy. Also, note that, by Assumption V.1, the single-period risk metrics are time-invariant; hence one can write

  ρ_{0,k}( c(x_0, π(x_0)), ..., c(x_k, π(x_k)) ) = c(x_0, π(x_0)) + ρ( c(x_1, π(x_1)) + ... + ρ( c(x_k, π(x_k)) ) ... ),  (6)

where ρ is a given Markov polytopic risk metric that models the "amount" of risk aversion.
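For a fixed stationary linear policy, the compounded cost in equation (6) can be evaluated by backward recursion over the disturbance scenario tree. The sketch below is our illustration (placeholder system data and envelope vertices), with the one-step Markov polytopic risk realized as a maximum over the vertices of U^poly.

```python
# Sketch: evaluating J_{0,k}(x_0, pi) in (6) for u = F x on a depth-k tree;
# the one-step risk is a maximum over envelope vertex pmfs (all data made up).
import numpy as np

A = [np.array([[0.7, 0.1], [0.0, 0.6]]), np.array([[0.8, 0.0], [0.1, 0.7]])]
B = [np.array([[0.0], [1.0]]), np.array([[0.1], [0.8]])]
Qp, R = np.eye(2), 0.1 * np.eye(1)
F = np.array([[-0.1, -0.2]])                            # placeholder gain
verts = [np.array([0.8, 0.2]), np.array([0.5, 0.5])]    # vertices of U^poly

def stage_cost(x):
    u = F @ x
    return float(x @ Qp @ x + u @ R @ u)                # c(x, pi(x))

def J(x, k):
    """c(x, Fx) + rho( J(x_next, k - 1) ), rho = max over vertex pmfs."""
    if k == 0:
        return stage_cost(x)
    nxt = np.array([J(A[j] @ x + B[j] @ (F @ x), k - 1) for j in range(2)])
    return stage_cost(x) + max(q @ nxt for q in verts)

print(J(np.array([1.0, 1.0]), k=3))
```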
This paper addresses problem OPT along two main dimensions:

1) Find sufficient conditions for risk-sensitive stability (i.e., for UGRSES).
2) Design a model predictive control algorithm to efficiently compute a suboptimal state-feedback control policy.

VI. RISK-SENSITIVE STABILITY

In this section we provide a sufficient condition for system (3) to be UGRSES, under the assumptions of Section V. This condition relies on Lyapunov techniques and is inspired by [4] (Lemma VI.1 indeed reduces to Lemma 1 in [4] when the risk measure is simply an expectation).

Lemma VI.1. Consider a policy π ∈ Π and the corresponding closed-loop dynamics for system (3), denoted by x_{k+1} = f(x_k, w_k). The closed-loop system is UGRSES if there exist a function V(x) : R^{N_x} → R and scalars b_1, b_2, b_3 > 0 such that for all x ∈ R^{N_x} the following conditions hold:

  b_1 ‖x‖² ≤ V(x) ≤ b_2 ‖x‖²,
  ρ(V(f(x, w))) − V(x) ≤ −b_3 ‖x‖².  (7)

Proof. From the time consistency, monotonicity, translational invariance, and positive homogeneity of Markov dynamic polytopic risk measures, condition (7) implies

  b_1 ρ_{0,k+1}(0, ..., 0, ‖x_{k+1}‖²) ≤ ρ_{0,k+1}(0, ..., 0, V(x_{k+1}))
    = ρ_{0,k}(0, ..., 0, V(x_k) + ρ(V(x_{k+1}) − V(x_k)))
    ≤ ρ_{0,k}(0, ..., 0, V(x_k) − b_3 ‖x_k‖²)
    ≤ ρ_{0,k}(0, ..., 0, (b_2 − b_3) ‖x_k‖²).

Also, since ρ_{0,k+1} is monotonic, one has ρ_{0,k+1}(0, ..., 0, ‖x_{k+1}‖²) ≥ 0, which implies b_2 ≥ b_3 and in turn (1 − b_3/b_2) ∈ [0, 1). Since V(x_k)/b_2 ≤ ‖x_k‖², and using the previous inequalities, one can write:

  ρ_{0,k+1}(0, ..., 0, V(x_{k+1})) ≤ ρ_{0,k}(0, ..., 0, V(x_k) − b_3 ‖x_k‖²) ≤ (1 − b_3/b_2) ρ_{0,k}(0, ..., 0, V(x_k)).

Repeating this bounding process, one obtains:

  ρ_{0,k+1}(0, ..., 0, V(x_{k+1})) ≤ (1 − b_3/b_2)^k ρ_{0,1}(V(x_1))
    = (1 − b_3/b_2)^k ρ(V(x_1))
    ≤ (1 − b_3/b_2)^k ( V(x_0) − b_3 ‖x_0‖² )
    ≤ b_2 (1 − b_3/b_2)^{k+1} ‖x_0‖².

Again, by monotonicity, the above result implies

  ρ_{0,k+1}(0, ..., 0, x_{k+1}^T x_{k+1}) ≤ (b_2/b_1) (1 − b_3/b_2)^{k+1} x_0^T x_0.

By setting c = b_2/b_1 and λ = (1 − b_3/b_2) ∈ [0, 1), the claim is proven.
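The decay certified by Lemma VI.1 can also be observed numerically. The sketch below (ours, reusing the placeholder system from the earlier sketches) evaluates the left-hand side of (5), ρ_{0,k}(0, ..., 0, x_k^T x_k), by backward composition over the scenario tree.

```python
# Sketch: tracing rho_{0,k}(0, ..., 0, x_k^T x_k) for a closed-loop
# placeholder system; by (2), with only the terminal cost nonzero, the
# quantity is a pure composition of one-step vertex maxima.
import numpy as np

A = [np.array([[0.7, 0.1], [0.0, 0.6]]), np.array([[0.8, 0.0], [0.1, 0.7]])]
B = [np.array([[0.0], [1.0]]), np.array([[0.1], [0.8]])]
F = np.array([[-0.1, -0.2]])
Acl = [A[j] + B[j] @ F for j in range(2)]        # closed-loop matrices
verts = [np.array([0.8, 0.2]), np.array([0.5, 0.5])]

def risk_xx(x, k):
    if k == 0:
        return float(x @ x)                      # x_k^T x_k at the leaves
    vals = np.array([risk_xx(M @ x, k - 1) for M in Acl])
    return max(q @ vals for q in verts)

x0 = np.array([1.0, 1.0])
print([round(risk_xx(x0, k), 4) for k in range(7)])   # geometric decay
```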
VII. MODEL PREDICTIVE CONTROL PROBLEM

In this section we set up the receding-horizon version of problem OPT. This will lead to a model predictive control algorithm for the (suboptimal) solution of problem OPT. Consider the following receding-horizon cost function for N ≥ 1:

  J(x_{k|k}, π_{k|k}, ..., π_{k+N−1|k}, P) := ρ_{k,k+N}( c(x_{k|k}, π_{k|k}(x_{k|k})), ..., c(x_{k+N−1|k}, π_{k+N−1|k}(x_{k+N−1|k})), x_{k+N|k}^T P x_{k+N|k} ),  (8)

where x_{h|k} is the state at time h predicted at stage k (a discrete random variable), π_{h|k} is the control policy to be applied at time h as determined at stage k (i.e., π_{h|k} : R^{N_x} → R^{N_u}), and P = P^T ≻ 0 is a terminal weight matrix. We are now in a position to state the model predictive control problem.

MPC Optimization Problem MPC — Given an initial state x_{k|k} ∈ R^{N_x} and a prediction horizon N ≥ 1, solve

  min_{π_{k|k}, ..., π_{k+N−1|k}}  J(x_{k|k}, π_{k|k}, ..., π_{k+N−1|k}, P)
  such that  x_{k+h+1|k} = A(w_{k+h})x_{k+h|k} + B(w_{k+h})π_{k+h|k}(x_{k+h|k})  for h ∈ {0, ..., N − 1}.

Note that a Markov policy is guaranteed to be optimal for problem MPC (see [15, Theorem 2]). The optimal cost function for problem MPC is denoted by J*_k(x_{k|k}), and the set of minimizers is denoted by {π*_{k|k}, ..., π*_{k+N−1|k}}. For each state x_k, we set x_{k|k} = x_k, and the (time-invariant) model predictive control law is then defined as

  π^MPC(x_k) = π*_{k|k}(x_{k|k}).  (9)

Note that the model predictive control problem MPC involves an optimization over time-varying closed-loop policies, as opposed to the classical deterministic case where the optimization is over open-loop sequences. A similar approach is taken in [8], [4]. We will show in Section VIII how to solve problem MPC efficiently.

The following theorem shows that the model predictive control law (9), with a proper choice of the terminal weight P, is risk-sensitive stabilizing, i.e., the closed-loop system (3) is UGRSES.

Theorem VII.1 (Stochastic Stability for Model Predictive Control Law). Consider the model predictive control law in equation (9) and the corresponding closed-loop dynamics for system (3) with initial condition x_0 ∈ R^{N_x}. Suppose that P = P^T ≻ 0, and there exists a matrix F such that:

  Σ_{j=1}^L q_l(j) (A_j + B_j F)^T P (A_j + B_j F) − P + (F^T R F + Q) ≺ 0,  (10)

for all l ∈ {1, ..., cardinality(U^poly,V(p))}, where U^poly,V(p) is the set of vertices of polytope U^poly(p), q_l is the l-th element in set U^poly,V(p), and q_l(j) denotes the j-th component of vector q_l, j ∈ {1, ..., L}. Then, the closed-loop system (3) is UGRSES.

Proof. The strategy of this proof is to show that J*_k is a valid Lyapunov function in the sense of Lemma VI.1. Specifically, we want to show that J*_k satisfies the two inequalities in equation (7); the claim then follows by simply noting that, in our time-invariant setup, J*_k does not depend on k.

We start by focusing on the bottom inequality in equation (7). Consider a time k and an initial condition x_{k|k} ∈ R^{N_x} for problem MPC. The sequence of optimal control policies is given by {π*_{k+h|k}}_{h=0}^{N−1}. Let us define a sequence of control policies from time k + 1 to k + N according to

  π_{k+h|k+1}(x_{k+h|k}) := π*_{k+h|k}(x_{k+h|k}) if h ∈ [1, N − 1],  and  F x_{k+N|k} if h = N.  (11)

This sequence of control policies is essentially the concatenation of the sequence {π*_{k+h|k}}_{h=1}^{N−1} with a linear feedback control law for stage N (the reason why we refer to this policy with the subscript "k + h|k + 1" is that we will use this policy as a feasible policy for problem MPC starting at stage k + 1).

Consider the problem MPC at stage k + 1 with initial condition given by x_{k+1|k+1} = A(w_k)x_{k|k} + B(w_k)π*_{k|k}(x_{k|k}), and denote with J̄_{k+1}(x_{k+1|k+1}) the cost of the objective function assuming that the sequence of control policies is given by {π_{k+h|k+1}}_{h=1}^N. Note that x_{k+1|k+1} (and therefore J̄_{k+1}(x_{k+1|k+1})) is a random variable with L possible realizations. Define:

  Z_{k+N} := −x_{k+N|k}^T P x_{k+N|k} + x_{k+N|k}^T Q x_{k+N|k} + (F x_{k+N|k})^T R F x_{k+N|k},
  Z_{k+N+1} := ( (A(w_{k+N|k}) + B(w_{k+N|k})F) x_{k+N|k} )^T P ( (A(w_{k+N|k}) + B(w_{k+N|k})F) x_{k+N|k} ).

By exploiting the dual representation of Markov polytopic risk metrics, one can easily show that equation (10) implies

  Z_{k+N} + ρ(Z_{k+N+1}) ≤ 0.  (12)

One can then write the following chain of inequalities:

  J*_k(x_{k|k}) = c(x_{k|k}, π*_{k|k}(x_{k|k}))
      + ρ( ρ_{k+1,k+N}( c(x_{k+1|k}, π*_{k+1|k}(x_{k+1|k})), ..., x_{k+N|k}^T Q x_{k+N|k} + (F x_{k+N|k})^T R F x_{k+N|k} + ρ(Z_{k+N+1}) − Z_{k+N} − ρ(Z_{k+N+1}) ) )
    ≥ c(x_{k|k}, π*_{k|k}(x_{k|k}))
      + ρ( ρ_{k+1,k+N}( c(x_{k+1|k}, π*_{k+1|k}(x_{k+1|k})), ..., x_{k+N|k}^T Q x_{k+N|k} + (F x_{k+N|k})^T R F x_{k+N|k} + ρ(Z_{k+N+1}) ) )
    = c(x_{k|k}, π*_{k|k}(x_{k|k})) + ρ( J̄_{k+1}(x_{k+1|k+1}) )
    ≥ c(x_{k|k}, π*_{k|k}(x_{k|k})) + ρ( J*_{k+1}(x_{k+1|k+1}) ),  (13)

where the first equality follows from our definitions, the second inequality follows from equation (12) and the monotonicity property of Markov polytopic risk metrics (see also [15, page 242]), the third equality follows from the fact that the sequence of control policies {π_{k+h|k+1}}_{h=1}^N is a feasible sequence for problem MPC starting at stage k + 1 with initial condition x_{k+1|k+1} = A(w_k)x_{k|k} + B(w_k)π*_{k|k}(x_{k|k}), and the fourth inequality follows from the definition of J*_{k+1} and the monotonicity of Markov polytopic risk metrics.

We now turn our focus to the top inequality in equation (7). One can easily bound J*_k(x_{k|k}) from below according to:

  J*_k(x_{k|k}) ≥ x_{k|k}^T Q x_{k|k} ≥ λ_min(Q) ‖x_{k|k}‖²,  (14)

where λ_min(Q) > 0 by assumption. To bound J*_k(x_{k|k}) from above, define:

  M_A := max_{r ∈ {0,...,N−1}}  max_{j_0,...,j_r ∈ {1,...,L}}  ‖A_{j_r} ··· A_{j_1} A_{j_0}‖_2.

Considering the (feasible) sequence of policies that apply zero control, one can write:

  J*_k(x_{k|k}) ≤ c(x_{k|k}, 0) + ρ( c(x_{k+1|k}, 0) + ρ( c(x_{k+2|k}, 0) + ... + ρ( x_{k+N|k}^T P x_{k+N|k} ) ... ) )
    ≤ ‖Q‖₂ ‖x_{k|k}‖₂² + ρ( ‖Q‖₂ ‖x_{k+1|k}‖₂² + ρ( ‖Q‖₂ ‖x_{k+2|k}‖₂² + ... + ρ( ‖P‖₂ ‖x_{k+N|k}‖₂² ) ... ) ).

Therefore, by using the translational invariance and monotonicity property of Markov polytopic risk metrics, one obtains the upper bound:

  0 ≤ J*_k(x_{k|k}) ≤ (N ‖Q‖₂ + ‖P‖₂) M_A² ‖x_{k|k}‖₂².  (15)

Combining the results in equations (13), (14), (15), and given the time-invariance of our problem setup, one concludes that J*_k(x_{k|k}) is a "risk-sensitive" Lyapunov function for the closed-loop system (3), in the sense of Lemma VI.1. This concludes the proof.

The study of the conservatism of the LMI condition (10) is left for future work. We note here that, out of 100 randomly generated 5-state, 2-action, 3-scenario examples, 83 of them satisfied condition (10).
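Condition (10) is a finite family of matrix inequalities, one per envelope vertex, and is straightforward to test numerically for a candidate pair (P, F); a check of this kind is presumably what underlies the randomized feasibility statistic quoted above. The sketch below is our own, with placeholder data.

```python
# Sketch: testing condition (10) for a candidate (P, F) by enumerating the
# vertices of U^poly; all data are placeholders.
import numpy as np

A = [np.array([[0.7, 0.1], [0.0, 0.6]]), np.array([[0.8, 0.0], [0.1, 0.7]])]
B = [np.array([[0.0], [1.0]]), np.array([[0.1], [0.8]])]
Qp, R = np.eye(2), 0.1 * np.eye(1)
verts = [np.array([0.8, 0.2]), np.array([0.5, 0.5])]   # vertices of U^poly
P = 4.0 * np.eye(2)                                    # candidate terminal P
F = np.array([[-0.1, -0.2]])                           # candidate gain

def cond10_holds(P, F):
    Acl = [A[j] + B[j] @ F for j in range(len(A))]
    for q in verts:
        M = sum(q[j] * Acl[j].T @ P @ Acl[j] for j in range(len(A)))
        M += -P + F.T @ R @ F + Qp
        if np.linalg.eigvalsh(0.5 * (M + M.T)).max() >= 0:
            return False                               # not negative definite
    return True

print(cond10_holds(P, F))   # True for this (contractive) placeholder data
```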
VIII. SOLUTION ALGORITHMS

In this section we discuss two solution approaches, the first one via dynamic programming, the second one via convex programming.

A. Dynamic programming approach

Problem MPC can be solved via dynamic programming; see [15, Theorem 2]. However, one would first need to find a matrix P that satisfies equation (10). The next theorem presents a linear matrix inequality characterization of condition (10).

Theorem VIII.1. Define A := [A_1^T ... A_L^T]^T, B := [B_1^T ... B_L^T]^T, and Σ_l := diag(q_l(1)I, ..., q_l(L)I) ⪰ 0, for all q_l ∈ U^poly,V(p). Consider the following set of linear matrix inequalities with respect to decision variables Y, G, Q = Q^T ≻ 0:

  [ I_{L×L} ⊗ Q    0        0    −Σ_l^{1/2} (A G + B Y) ]
  [ ∗              R^{−1}   0    −Y                     ]
  [ ∗              ∗        I    −Q^{1/2} G             ]  ⪰ 0,  (16)
  [ ∗              ∗        ∗    −Q + G + G^T           ]

where l ∈ {1, ..., cardinality(U^poly,V(p))}. The set of linear matrix inequalities in (16) is equivalent to the condition in (10) by setting F = Y G^{−1} and P = Q^{−1}.

Proof. The theorem can be proven by using the Projection Lemma (see [26, Chapter 2]). The details are omitted in the interest of brevity.

Hence, a solution approach for MPC is to first solve the linear matrix inequality in Theorem VIII.1 and then, if a solution for P is found, apply dynamic programming (after state and action discretization; see, e.g., [27], [28]). Note that the discretization process might yield a large-scale dynamic programming problem, which motivates the convex programming approach presented next.
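A feasibility version of the LMI (16) can be posed directly in an off-the-shelf modeling tool. The sketch below is our own and assumes CVXPY with an SDP-capable solver; the system data and envelope vertices are the placeholders used earlier, Qv denotes the decision variable Q of (16) (so P = Qv^{−1}), and Q_pen, R are the penalty matrices. The minus signs on the off-diagonal blocks of (16) are dropped, which leaves the semidefiniteness condition unchanged.

```python
# Sketch (assumes cvxpy + an SDP solver): solving the LMI (16) for the
# terminal pair P = Qv^{-1}, F = Y G^{-1}. Placeholder data throughout.
import cvxpy as cp
import numpy as np

A = [np.array([[0.7, 0.1], [0.0, 0.6]]), np.array([[0.8, 0.0], [0.1, 0.7]])]
B = [np.array([[0.0], [1.0]]), np.array([[0.1], [0.8]])]
L, n, m = len(A), 2, 1
Q_pen, R = np.eye(n), 0.1 * np.eye(m)    # penalties; here Q_pen^{1/2} = I
verts = [np.array([0.8, 0.2]), np.array([0.5, 0.5])]

Qv = cp.Variable((n, n), symmetric=True)   # decision variable Q in (16)
G = cp.Variable((n, n))
Y = cp.Variable((m, n))
Abar, Bbar = np.vstack(A), np.vstack(B)    # stacked A and B of Thm VIII.1

# I_{LxL} (x) Qv, written out for L = 2 to avoid a Kronecker of a variable.
blkQ = cp.bmat([[Qv, np.zeros((n, n))], [np.zeros((n, n)), Qv]])

cons = [Qv >> 1e-6 * np.eye(n)]
for q in verts:
    Sh = np.kron(np.diag(np.sqrt(q)), np.eye(n))       # Sigma_l^{1/2}
    top = Sh @ (Abar @ G + Bbar @ Y)                   # (L n) x n block
    M = cp.bmat([
        [blkQ, np.zeros((L * n, m)), np.zeros((L * n, n)), top],
        [np.zeros((m, L * n)), np.linalg.inv(R), np.zeros((m, n)), Y],
        [np.zeros((n, L * n)), np.zeros((n, m)), np.eye(n), G],  # Q_pen^{1/2} G
        [top.T, Y.T, G.T, -Qv + G + G.T]])
    cons.append(M >> 0)

cp.Problem(cp.Minimize(0), cons).solve()
P = np.linalg.inv(Qv.value)
F = Y.value @ np.linalg.inv(G.value)
print(P, F, sep="\n")
```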
B. Convex programming approach

Consider the following parameterization of history-dependent control policies. Let j_0, ..., j_h ∈ {1, ..., L} be the realized indices for the disturbances in the first h + 1 stages of the problem, where h ∈ {0, ..., N − 1}; for h ≥ 1, we will refer to the control to be exerted at stage h as U_h(j_0, ..., j_{h−1}). Similarly, we will refer to the state at stage h as X_h(j_0, ..., j_{h−1}). The quantities X_h(j_0, ..., j_{h−1}) and U_h(j_0, ..., j_{h−1}) enable us to keep track of the (exponential) growth of the scenario tree. In terms of this new notation, the system dynamics (3) can be rewritten according to:

  X_0 := x_{k|k},  U_0 ∈ R^{N_u},
  X_1(j_0) = A_{j_0} X_0 + B_{j_0} U_0,  for h = 1,  (17)
  X_h(j_0, ..., j_{h−1}) = A_{j_{h−1}} X_{h−1}(j_0, ..., j_{h−2}) + B_{j_{h−1}} U_{h−1}(j_0, ..., j_{h−2}),  for h ≥ 2.

Indeed, the problem MPC is defined as an optimization problem over Markov control policies. However, in the convex programming approach, we re-define the problem MPC as an optimization problem over history-dependent policies. One can show (with a virtually identical proof) that the stability Theorem VII.1 still holds when history-dependent policies are considered. Furthermore, since Markov policies are optimal in our setup (see [15, Theorem 2]), the value of the optimal cost stays the same. The key advantage of history-dependent policies is that their additional flexibility leads to a convex optimization problem for the determination of the model predictive control law. Specifically, the model predictive control law can be determined according to the following algorithm.

Solution for MPC — Given an initial state x_{k|k} = x_k and a prediction horizon N ≥ 1, solve:

  min  γ_1
  over  γ_1, W, G_{j_N}, Y_{j_N}, Q, γ_2(j_0, ..., j_{N−1}), U_0, U_h(j_0, ..., j_{h−1}) for h ∈ {1, ..., N − 1}, X_h(j_0, ..., j_{h−1}) for h ∈ {1, ..., N}, with j_0, ..., j_{N−1} ∈ {1, ..., L},
  subject to
  • the linear matrix inequality in equation (16);
  • the linear matrix inequality

      [ γ_2(j_0, ..., j_{N−1})    X_N(j_0, ..., j_{N−1})^T ]
      [ X_N(j_0, ..., j_{N−1})    Q                        ]  ⪰ 0,

    which, by a Schur complement and P = Q^{−1}, enforces γ_2(j_0, ..., j_{N−1}) ≥ X_N^T P X_N, i.e., γ_2 upper-bounds the terminal cost on each scenario path;
  • the system dynamics in equation (17);
  • the objective epigraph constraint:

      ρ_{k,k+N}( c(X_0, U_0), ..., c(X_{N−1}, U_{N−1}), γ_2(j_0, ..., j_{N−1}) ) ≤ γ_1.

  Then, set π^MPC(x_k) = U_0.

The proof of the correctness of this algorithm is omitted due to lack of space and will be presented in the journal version of this paper. This algorithm is clearly suitable only for "moderate" values of N, given the combinatorial explosion of the scenario tree. Indeed, one could devise a hybrid algorithm where the above computation is split into two steps: (1) the terminal cost function is computed offline, and (2) the MPC control law is computed online. We tested this approach and found that the computation time is very short for reasonably large problems. The details are deferred to a future publication.
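For a small horizon, the scenario-tree program admits a compact formulation in a modeling tool. The sketch below is our own simplification (it assumes CVXPY, uses N = 2, the placeholder system and envelope vertices from the earlier sketches, and a fixed terminal weight P rather than the LMI-coupled terminal cost of the full algorithm): the compounded objective is handled with one epigraph variable per tree node, using the vertex representation of the one-step risk.

```python
# Sketch (assumes cvxpy): a depth-2 instance of the scenario-tree program.
# Epigraph variables t1[j] upper-bound the risk-to-go at the stage-1 nodes;
# gamma upper-bounds the stage-0 risk, mirroring the epigraph constraint.
import cvxpy as cp
import numpy as np

A = [np.array([[0.7, 0.1], [0.0, 0.6]]), np.array([[0.8, 0.0], [0.1, 0.7]])]
B = [np.array([[0.0], [1.0]]), np.array([[0.1], [0.8]])]
Qp, R = np.eye(2), 0.1 * np.eye(1)
verts = [np.array([0.8, 0.2]), np.array([0.5, 0.5])]
P = 4.0 * np.eye(2)                    # candidate terminal weight (fixed)
x0 = np.array([1.0, 1.0])
L = len(A)

U0 = cp.Variable(1)
U1 = [cp.Variable(1) for _ in range(L)]        # one input per stage-1 node
t1 = [cp.Variable() for _ in range(L)]         # node-wise risk epigraphs
gamma = cp.Variable()
cons = []
for j in range(L):
    x1 = A[j] @ x0 + B[j] @ U0                 # X_1(j), eq. (17)
    term = cp.hstack([cp.quad_form(A[i] @ x1 + B[i] @ U1[j], P)
                      for i in range(L)])      # terminal costs at the leaves
    c1 = cp.quad_form(x1, Qp) + cp.quad_form(U1[j], R)
    cons += [t1[j] >= c1 + q @ term for q in verts]
cons += [gamma >= q @ cp.hstack(t1) for q in verts]

obj = cp.quad_form(x0, Qp) + cp.quad_form(U0, R) + gamma
cp.Problem(cp.Minimize(obj), cons).solve()
print("pi_MPC(x0) =", U0.value)
```

At the optimum the epigraph variables are tight, so the objective equals the compounded cost (8); only U0 is applied, in keeping with the receding-horizon law (9).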
IX. NUMERICAL EXPERIMENTS

In this section we present numerical experiments run on a 2.3 GHz Intel Core i5 MacBook Pro laptop, using the MATLAB YALMIP toolbox, version 2.6.3 [29]. Consider the stochastic system x_{k+1} = A(w_k)x_k + B(w_k)u_k, where w_k ∈ {1, 2, 3} and

  A_1 = [ 2, 0.5; −0.5, 2 ],  A_2 = [ 0.01, 0.1; 0.05, 0.01 ],  A_3 = [ 1.5, −0.3; 0.2, 1.5 ],
  B_1 = [ 3, 0.1; 0.1, 3 ],  B_2 = [ 1, 0.5; 0.5, 1 ],  B_3 = [ 2, 0.3; 0.3, 2 ].

The probability mass function for the process disturbance is uniform, i.e., P(w_k = i) = 1/3 for i ∈ {1, 2, 3}. In this example, we want to explore the risk-aversion capability of the risk-averse MPC algorithm presented in Section VII (the solution relies on the convex programming approach). We consider as risk-aversion metric the mean upper semi-deviation metric (see Example IV.1), where c ranges in the set {0, 0.25, 0.5, 0.75, 1}. The initial condition is x_0(1) = x_0(2) = 1 and the number of lookahead steps is N = 3. We set Q = 1 × I_{2×2} and R = 10^{−4} × I_{2×2}. We perform 100 Monte Carlo simulations for each value of c. When c = 0, the problem reduces to a risk-neutral minimization problem. On the other hand, one enforces maximum emphasis on regulating semi-deviation (dispersion) by setting c = 1. Table I summarizes our results (computation times are given in seconds). When c = 0 (risk-neutral formulation), the average cost is the lowest (with respect to the different choices for c), but the dispersion is the largest. Conversely, when c = 1, the dispersion is the lowest, but the average cost is the largest.

TABLE I
STATISTICS FOR RISK-AVERSE MPC

  Level of risk | Sample Mean | Sample Dispersion | Sample Std. Deviation | Mean Time Per Itr. (sec)
  c = 0         | 2.9998      | 0.2889            | 0.4245                | 4.8861
  c = 0.25      | 3.3012      | 0.2643            | 0.3520                | 5.2003
  c = 0.5       | 3.4178      | 0.2004            | 0.2977                | 4.4007
  c = 0.75      | 3.5898      | 0.1601            | 0.2231                | 4.4577
  c = 1         | 3.6072      | 0.0903            | 0.1335                | 4.6498

X. CONCLUSION AND FUTURE WORK

In this paper we presented a framework for risk-averse MPC. Advantages of this framework include: (1) it is axiomatically justified; (2) it is amenable to dynamic and convex optimization; and (3) it is general, in the sense that it captures a full range of risk assessments from risk-neutral to worst case (given the generality of Markov polytopic risk metrics).

This paper leaves numerous important extensions open for further research. First, we plan to study the case with state and control constraints (preliminary work suggests that such constraints can be readily included in our framework). Second, it is of interest to consider the inclusion of multi-period risk constraints. Third, we plan to characterize the sub-optimal performance of risk-averse MPC by designing lower and upper bounds for the optimal cost. Fourth, we plan to study in detail the offline and hybrid counterparts of the online MPC algorithm presented in this paper. Finally, it is of interest to study the robustness of our approach with respect to perturbations in the system's parameters.

REFERENCES

[1] J. Qin and T. Badgwell. A survey of industrial model predictive control technology. Control Engineering Practice, 11(7):733–764, 2003.
[2] Y. Wang and S. Boyd. Fast model predictive control using online optimization. IEEE Transactions on Control Systems Technology, 18(2):267–278, 2010.
[3] D. Q. Mayne, J. B. Rawlings, C. V. Rao, and P. O. M. Scokaert. Constrained model predictive control: stability and optimality. Automatica, 36(6):789–814, 2000.
[4] D. Bernardini and A. Bemporad. Stabilizing model predictive control of stochastic constrained linear systems. IEEE Transactions on Automatic Control, 57(6):1468–1480, 2012.
[5] M. Kothare, V. Balakrishnan, and M. Morari. Robust constrained model predictive control using linear matrix inequalities. Automatica, 32(10):1361–1379, 1996.
[6] C. de Souza. Robust stability and stabilization of uncertain discrete-time Markovian jump linear systems. IEEE Transactions on Automatic Control, 51(5):836–841, 2006.
[7] B. Park and W. Kwon. Robust one-step receding horizon control of discrete-time Markovian jump uncertain systems. Automatica, 38(7):1229–1235, 2002.
[8] J. A. Primbs and C. Sung. Stochastic receding horizon control of constrained linear systems with state and control multiplicative noise. IEEE Transactions on Automatic Control, 54(2):221–230, 2009.
[9] B. Defourny, D. Ernst, and L. Wehenkel. Risk-aware decision making and dynamic programming. In NIPS Workshop on Model Uncertainty and Risk in Reinforcement Learning, 2008.
[10] T. M. Moldovan and P. Abbeel. Risk aversion in Markov decision processes via near optimal Chernoff bounds. In NIPS, pages 3140–3148, 2012.
[11] D. A. Iancu, M. Petrik, and D. Subramanian. Tight approximations of dynamic risk measures. 2013. Submitted, available at http://arxiv.org/abs/1106.6102.
[12] A. Zafra-Cabeza, J. M. Maestre, M. A. Ridao, E. F. Camacho, and L. Sánchez. A hierarchical distributed model predictive control approach to irrigation canals: a risk mitigation perspective. Journal of Process Control, 21(5):787–799, 2011.
[13] P.-J. van Overloop, S. Weijs, and S. Dijkstra. Multiple model predictive control on a drainage canal system. Control Engineering Practice, 16(5):531–540, 2008.
[14] A. Ruszczynski and A. Shapiro. Conditional risk mappings. Mathematics of Operations Research, 31(3):544–561, 2006.
[15] A. Ruszczynski. Risk-averse dynamic programming for Markov decision processes. Mathematical Programming, 125(2):235–261, 2010.
[16] P. Artzner, F. Delbaen, J. Eber, and D. Heath. Coherent measures of risk. Mathematical Finance, 9(3):203–228, 1999.
[17] A. Shapiro, D. Dentcheva, and A. Ruszczynski. Lectures on Stochastic Programming: Modeling and Theory. SIAM, 2009.
[18] P. Cheridito and M. Stadje. Time-inconsistency of VaR and time-consistent alternatives. Finance Research Letters, 6(1):40–46, 2009.
[19] A. Shapiro. On a time consistency concept in risk averse multi-stage stochastic programming. Operations Research Letters, 37(3):143–147, 2009.
[20] B. Roorda, J. M. Schumacher, and J. Engwerda. Coherent acceptability measures in multi-period models. Mathematical Finance, 15(4):589–612, 2005.
[21] A. Eichhorn and W. Römisch. Polyhedral risk measures in stochastic programming. SIAM Journal on Optimization, 16(1):69–95, 2005.
[22] W. Ogryczak and A. Ruszczyński. From stochastic dominance to mean-risk models: semi-deviations as risk measures. European Journal of Operational Research, 116(1):33–50, 1999.
[23] R. Rockafellar and S. Uryasev. Optimization of conditional value-at-risk. Journal of Risk, 2:21–42, 2000.
[24] D. Bertsimas and D. Brown. Constructing uncertainty sets for robust linear optimization. Operations Research, 57(6):1483–1495, 2009.
[25] A. Ben-Tal and M. Teboulle. An old-new concept of convex risk measures: the optimized certainty equivalent. Mathematical Finance, 17(3):449–476, 2007.
[26] R. Skelton, T. Iwasaki, and K. Grigoriadis. A Unified Algebraic Approach to Linear Control Design. CRC Press, 1998.
[27] Y. Chow and M. Pavone. A uniform-grid discretization algorithm for dynamic programming with risk constraints. In IEEE Conference on Decision and Control, 2013.
[28] C. Chow and J. Tsitsiklis. An optimal one-way multigrid algorithm for discrete-time stochastic control. IEEE Transactions on Automatic Control, 36(8):898–914, 1991.
[29] J. Löfberg. YALMIP: A toolbox for modeling and optimization in MATLAB. In CCA/ISIC/CACSD, September 2004.