Introduction to Optimal Control
ORF523: Convex and Conic Optimization
Sumeet Singh, Google Brain Robotics
April 22, 2021

Outline

• Optimal Control Problem
• Open- vs. Closed-Loop Solutions
• Closed-Loop: Bellman's Principle of Optimality
  ◦ Finite spaces
  ◦ Continuous spaces – LQ control
• Open-Loop:
  ◦ Gradient descent
  ◦ Newton descent
  ◦ DDP
• Model Predictive Control

Feedback Control

• Consider the block diagram for tracking some reference signal.

[Figure: feedback-control block diagram, with disturbance and noise inputs]

Adapted from: Stanford AA203, Spring 2021

Feedback Control Objectives

• Stability: various formulations; loosely, system output is “under control”

• Tracking: output should track the reference "as closely as possible"

• Disturbance rejection: output should be “as insensitive as possible” to disturbances/noise

• Robustness: controller should still perform well under "some degree of model misspecification"

Adapted from: Stanford AA203, Spring 2021

What's Missing?

• Performance: a mathematical quantification of all these objectives, and control that realizes the tradeoffs among them

• Planning: providing an appropriate reference trajectory to track (can be highly non-trivial)

• Learning: adaptation to unknown properties of the system

Adapted from: Stanford AA203, Spring 2021

Optimal Control Problem

3 Key Ingredients:

• Mathematical description of the system to be controlled

• Specification of a performance criterion

• Specification of constraints

State-Space Models

A state-space model describes the system by a set of first-order ODEs:

$\dot{x}_i(t) = f_i(x_1(t), \dots, x_n(t),\, u_1(t), \dots, u_m(t),\, t), \qquad i = 1, \dots, n$

where
• $x_1, \dots, x_n$ are the state variables
• $u_1, \dots, u_m$ are the control variables

Adapted from: Stanford AA203, Spring 2021

State-Space Models

In compact form: $\dot{\boldsymbol{x}}(t) = f(\boldsymbol{x}(t), \boldsymbol{u}(t), t)$, with $\boldsymbol{x}(t) \in \mathbb{R}^n$, $\boldsymbol{u}(t) \in \mathbb{R}^m$

• A history of control input values during the interval [0, T] is called a control history, denoted by u.
• A history of state values during the interval [0, T] is called a state trajectory, denoted by x.

Adapted from: Stanford AA203, Spring 2021

Illustrative Example

• Double integrator: point mass under controlled acceleration, $\ddot{p}(t) = u(t)$. With state $\boldsymbol{x} = (p, \dot{p})$:

$\dot{\boldsymbol{x}}(t) = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \boldsymbol{x}(t) + \begin{bmatrix} 0 \\ 1 \end{bmatrix} u(t)$
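As a concrete sketch (not from the slides), the exact zero-order-hold discretization of the double integrator, simulated under a bang-bang acceleration profile:

```python
import numpy as np

# Exact zero-order-hold discretization of the double integrator,
# x = (position p, velocity v), with u the commanded acceleration.
dt = 0.1
A = np.array([[1.0, dt],
              [0.0, 1.0]])
B = np.array([0.5 * dt**2, dt])

x = np.array([0.0, 0.0])          # start at rest at the origin
for n in range(20):
    u = 1.0 if n < 10 else -1.0   # accelerate for 1 s, then brake
    x = A @ x + B * u
print(x)  # ends at p = 1.0, v = 0.0
```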

Adapted from: Stanford AA203, Spring 2021

Quantifying Performance


• More generally:

$J = \underbrace{\phi(\boldsymbol{x}(T), T)}_{\text{terminal cost}} + \int_0^T \underbrace{l(\boldsymbol{x}(t), \boldsymbol{u}(t), t)}_{\text{instantaneous or stage-wise cost}} \, dt$

• $l$ and $\phi$ are scalar functions, and $T$ may be specified or "free".

Constraints

• Initial and final conditions (boundary conditions): e.g., $\boldsymbol{x}(0) = \boldsymbol{x}_0$, $\boldsymbol{x}(T) = \boldsymbol{x}_f$

• Constraints on state trajectory: e.g., $\boldsymbol{x}(t) \in X$ for all $t$

• Control limits: e.g., $\boldsymbol{u}(t) \in U$ for all $t$

• A control history and state trajectory that satisfy the control & state constraints for the entire time interval are termed admissible

Adapted from: Stanford AA203, Spring 2021

The Optimal Control Problem

Definitions: state $\boldsymbol{x}(t) \in \mathbb{R}^n$, control $\boldsymbol{u}(t) \in \mathbb{R}^m$

Continuous Time:

◦ Performance measure (minimize): $J = \phi(\boldsymbol{x}(T)) + \int_0^T l(\boldsymbol{x}(t), \boldsymbol{u}(t), t)\, dt$

◦ Dynamics: $\dot{\boldsymbol{x}}(t) = f(\boldsymbol{x}(t), \boldsymbol{u}(t), t)$, $\boldsymbol{x}(0) = \boldsymbol{x}_0$

◦ + other constraints, e.g., $\boldsymbol{u}(t) \in U$, $\boldsymbol{x}(t) \in X$

◦ $T < \infty$ or $T = \infty$

• Minimizer: $(\boldsymbol{x}^*, \boldsymbol{u}^*)$ is an optimal solution pair.
• Existence and uniqueness are not always guaranteed.

The Optimal Control Problem

Discrete Time:

◦ Performance measure (minimize): $J = \phi(\boldsymbol{x}[N]) + \sum_{n=0}^{N-1} l(\boldsymbol{x}[n], \boldsymbol{u}[n], n)$

◦ Dynamics: $\boldsymbol{x}[n+1] = f_d(\boldsymbol{x}[n], \boldsymbol{u}[n], n)$

◦ + other constraints

◦ $N < \infty$ or $N = \infty$

Solution Methods

Dynamic Programming (Principle of Optimality)
◦ Compositionality of optimal paths
◦ Closed-loop solutions: find a solution for all states at all times

Calculus of Variations (Pontryagin Maximum/Minimum Principle)
◦ "The optimal curve should be such that neighboring curves don't lead to smaller costs" → "derivative = 0"
◦ Open-loop solutions: find a solution for a given initial state

Figures: [Kelly, 2017]

The Optimal Control Problem

Continuous Time:
• Performance measure (minimize): $J = \phi(\boldsymbol{x}(T)) + \int_0^T l(\boldsymbol{x}(t), \boldsymbol{u}(t), t)\, dt$

• Dynamics: $\dot{\boldsymbol{x}}(t) = f(\boldsymbol{x}(t), \boldsymbol{u}(t), t)$

• + other constraints, e.g., $\boldsymbol{u}(t) \in U$, $\boldsymbol{x}(t) \in X$

Closed-loop: find a policy function $\pi^*(\boldsymbol{x}, t)$ s.t. $\boldsymbol{u}^*(t) = \pi^*(\boldsymbol{x}(t), t)$.
Open-loop: given $\boldsymbol{x}(0) = \boldsymbol{x}_0$, find optimal signals $(\boldsymbol{x}^*, \boldsymbol{u}^*)$, i.e., functions in $W^{1,\infty}[0, T]$ and $L^\infty[0, T]$.

The Optimal Control Problem

Discrete Time:
• Performance measure (minimize): $J = \phi(\boldsymbol{x}[N]) + \sum_{n=0}^{N-1} l(\boldsymbol{x}[n], \boldsymbol{u}[n], n)$

• Dynamics: $\boldsymbol{x}[n+1] = f_d(\boldsymbol{x}[n], \boldsymbol{u}[n], n)$

• + other constraints

Closed-loop: find policy functions $\{\pi_0^*, \dots, \pi_{N-1}^*\}$ s.t. $\boldsymbol{u}^*[n] = \pi_n^*(\boldsymbol{x}[n])$.
Open-loop: given $\boldsymbol{x}[0] = \boldsymbol{x}_0$, find optimal sequences $(\boldsymbol{x}^*[\,\cdot\,], \boldsymbol{u}^*[\,\cdot\,])$.

Optimal Control

[Diagram: solution-methods taxonomy. Optimal Control splits into Open Loop and Closed Loop; the Closed Loop branch leads to the Principle of Optimality]

Principle of Optimality

Given a trajectory from a → c with minimal cost $J_{ac} = J_{ab} + J_{bc}$, then $J_{bc}$ is minimal for the path b → c.

Proof by contradiction:
◦ Assume there exists an alternative path b → c with lower cost $\tilde{J}_{bc} < J_{bc}$. Then $\tilde{J}_{ac} = J_{ab} + \tilde{J}_{bc} < J_{ab} + J_{bc} = J_{ac}$, i.e., the original path was not minimal.

[Bertsekas, 2017]

Applying Principle of Optimality

• Principle of Optimality: if b – c is the initial segment of the optimal path from b to f, then c – f is the terminal segment of this path.

• Thus, the optimal trajectory is found by comparing:

$C_{bcf} = J_{bc} + J^*_{cf}$
$C_{bdf} = J_{bd} + J^*_{df}$
$C_{bef} = J_{be} + J^*_{ef}$

[Bertsekas, 2017]

Applying Principle of Optimality

• Need only compare concatenations of immediate decisions with optimal subsequent decisions.

• In practice: carry out the recursion backwards in time.

Dynamic Programming (DP)

Performance measure: $J = \phi(\boldsymbol{x}[N]) + \sum_{n=0}^{N-1} l(\boldsymbol{x}[n], \boldsymbol{u}[n], n)$

Dynamics: $\boldsymbol{x}[n+1] = f_d(\boldsymbol{x}[n], \boldsymbol{u}[n], n)$

The Dynamic Programming recursion (Bellman recursion) proceeds backwards in time:

$J_N(\boldsymbol{x}) = \phi(\boldsymbol{x}), \qquad J_n(\boldsymbol{x}) = \min_{\boldsymbol{u}} \left[\, l(\boldsymbol{x}, \boldsymbol{u}, n) + J_{n+1}(f_d(\boldsymbol{x}, \boldsymbol{u}, n)) \,\right]$
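As a concrete illustration (not from the original slides), a minimal numpy sketch of this backward recursion on a finite state/control space; the cost and transition arrays are hypothetical placeholders:

```python
import numpy as np

# Backward Bellman recursion on a finite state/control space.
# Hypothetical problem data: nS states, nU controls, horizon N,
# stage cost l(n, x, u) and deterministic transitions f_d(x, u).
nS, nU, N = 5, 3, 10
rng = np.random.default_rng(0)
stage_cost = rng.uniform(size=(N, nS, nU))      # l(n, x, u)
next_state = rng.integers(nS, size=(nS, nU))    # f_d(x, u)
terminal_cost = rng.uniform(size=nS)            # phi(x)

J = np.zeros((N + 1, nS))
pi = np.zeros((N, nS), dtype=int)
J[N] = terminal_cost
for n in reversed(range(N)):                    # proceed backwards in time
    # Q[x, u] = l(n, x, u) + J_{n+1}(f_d(x, u))
    Q = stage_cost[n] + J[n + 1][next_state]
    J[n] = Q.min(axis=1)
    pi[n] = Q.argmin(axis=1)   # closed-loop policy pi_n(x), for all x
```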

DP – Stochastic Case

Stochastic dynamics: $\boldsymbol{x}[n+1] = f_d(\boldsymbol{x}[n], \boldsymbol{u}[n], \boldsymbol{w}[n], n)$

Markovian assumption: $\boldsymbol{w}[n] \sim p(\cdot \mid \boldsymbol{x}[n], \boldsymbol{u}[n])$, i.e., the disturbance at time $n$ is only a function of the state and control at time $n$.

Implications:
1. Distribution of the next state depends only on the current state and control: $p(\boldsymbol{x}[n+1] \mid \boldsymbol{x}[n], \boldsymbol{u}[n])$

2. Sufficient to look for an optimal policy at time $n$ as a function of $\boldsymbol{x}[n]$ (and not as a function of the entire history before time $n$)

DP – Stochastic Case

Stochastic dynamics with Markovian property: $\boldsymbol{x}[n+1] = f_d(\boldsymbol{x}[n], \boldsymbol{u}[n], \boldsymbol{w}[n], n)$, $\boldsymbol{w}[n] \sim p(\cdot \mid \boldsymbol{x}[n], \boldsymbol{u}[n])$

Performance: $J = \mathbb{E}\left[ \phi(\boldsymbol{x}[N]) + \sum_{n=0}^{N-1} l(\boldsymbol{x}[n], \boldsymbol{u}[n], n) \right]$

Applying the principle of optimality and exploiting linearity of expectation:

$J_n(\boldsymbol{x}) = \min_{\boldsymbol{u}} \; \mathbb{E}_{\boldsymbol{w}}\left[\, l(\boldsymbol{x}, \boldsymbol{u}, n) + J_{n+1}(f_d(\boldsymbol{x}, \boldsymbol{u}, \boldsymbol{w}, n)) \,\right]$

DP – Inventory Control Example

• Stochastic DP
• Stock available $x[n] \in \mathbb{N}$, order $u[n] \in \mathbb{N}$, demand $w[n] \in \mathbb{N}$
• Dynamics: $x[n+1] = \max(0,\, x[n] + u[n] - w[n])$
• Constraints: $x[n] + u[n] \leq 2$
• Simple stationary demand model: $p(w[n] = 0) = 0.1$, $p(w[n] = 1) = 0.7$, $p(w[n] = 2) = 0.2$
• Objective (no terminal cost): $\mathbb{E}\left[ \sum_{n=0}^{N-1} \big( u[n] + (x[n] + u[n] - w[n])^2 \big) \right]$ with $N = 3$, where $u[n]$ is the cost to purchase and $(x[n] + u[n] - w[n])^2$ is the lost-business/over-supply cost

DP – Inventory Control Example

DP Algorithm:

$J_3^*(x) = 0, \qquad J_n^*(x) = \min_{0 \leq u \leq 2 - x} \; \mathbb{E}_w\!\left[\, u + (x + u - w)^2 + J_{n+1}^*(\max(0,\, x + u - w)) \,\right]$

As an example, for $x[2] = 0$:

$u = 0: \; 0 + (0.1 \cdot 0 + 0.7 \cdot 1 + 0.2 \cdot 4) = 1.5$
$u = 1: \; 1 + (0.1 \cdot 1 + 0.7 \cdot 0 + 0.2 \cdot 1) = 1.3$
$u = 2: \; 2 + (0.1 \cdot 4 + 0.7 \cdot 1 + 0.2 \cdot 0) = 3.1$

Thus: $J_2^*(0) = 1.3$, $\pi_2^*(0) = 1$. Show: $J_0^*(0) = 3.7$, $J_0^*(1) = 2.7$, $J_0^*(2) = 2.818$.
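For reference, a short script (not from the original slides) that carries out this recursion; it reproduces $J_2^*(0) = 1.3$ and the $J_0^*$ values above:

```python
# Stochastic DP for the inventory example; reproduces J2*(0) = 1.3
# and J0*(0) = 3.7, J0*(1) = 2.7, J0*(2) = 2.818.
p_w = {0: 0.1, 1: 0.7, 2: 0.2}       # stationary demand distribution
N, states = 3, [0, 1, 2]

J = {x: 0.0 for x in states}         # J_3 = 0 (no terminal cost)
for n in reversed(range(N)):
    J_next, J = J, {}
    for x in states:
        costs = []
        for u in range(0, 2 - x + 1):            # enforce x + u <= 2
            # E_w[ u + (x + u - w)^2 + J_{n+1}(max(0, x + u - w)) ]
            costs.append(sum(p * (u + (x + u - w) ** 2
                                  + J_next[max(0, x + u - w)])
                             for w, p in p_w.items()))
        J[x] = min(costs)
    print(f"J_{n}:", {x: round(J[x], 3) for x in states})
```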

DP in Discrete Spaces

Notice: the recursion needs the solution for all possible "successor" states first.
◦ Doable for finite/discrete state-spaces (e.g., grids).
◦ Suffers from the curse of dimensionality (e.g., consider quantizing a continuous state-space).

Value Iteration:
◦ Set up a recursion: $J_n(\boldsymbol{x}) \leftarrow \min_{\boldsymbol{u}} \left[\, l(n, \boldsymbol{u}, \boldsymbol{x}) + J_{n+1}(f_d(\boldsymbol{x}, \boldsymbol{u}, n)) \,\right]$ for all $\boldsymbol{x}$.
◦ In the infinite-horizon setting, drop the time dependence and iterate until convergence (see the sketch below).

Generalized Policy Iteration:
◦ Interleave policy evaluation (a similar recursion with the min replaced by the policy) and policy improvement (the argmin of the Bellman formula with the current value estimate).
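A minimal sketch of infinite-horizon value iteration on hypothetical tabular problem data; the discount factor $\gamma < 1$ is an added assumption (not on the slide) that makes the stationary recursion a contraction:

```python
import numpy as np

# Infinite-horizon value iteration on a finite problem.
# Assumption (not on the slide): a discount factor gamma < 1, so the
# stationary Bellman recursion is a contraction and iteration converges.
nS, nU, gamma = 5, 3, 0.95
rng = np.random.default_rng(1)
cost = rng.uniform(size=(nS, nU))             # l(x, u), time-invariant
next_state = rng.integers(nS, size=(nS, nU))  # f_d(x, u)

J = np.zeros(nS)
for _ in range(1000):
    Q = cost + gamma * J[next_state]          # Bellman backup for all x
    J_new = Q.min(axis=1)
    if np.max(np.abs(J_new - J)) < 1e-9:      # iterate until convergence
        break
    J = J_new
policy = Q.argmin(axis=1)                     # greedy stationary policy
```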

DP in Continuous Spaces

We rarely have exact solutions in continuous spaces. Otherwise: need function approximation, i.e., Approximate Dynamic Programming.

Examples:
◦ Fitted Value Iteration: bootstrap off a current/delayed estimate of the value function to compute "targets" and regress.
◦ Meshes: perform the iteration on a discrete mesh and use interpolation.

Dynamics unknown (Reinforcement Learning): find the optimal policy and value function using samples of experience $(\boldsymbol{x}, \boldsymbol{u}, \boldsymbol{x}', c)$.
◦ Algorithms resemble stochastic approximations of the recursion formulas (plus tricks).

Optimal Control

[Diagram: solution-methods taxonomy. Optimal Control splits into Open Loop and Closed Loop; the Closed Loop branch: Principle of Optimality → Discrete-time: DP / Continuous-time: HJB → Application to Linear Dynamics]

DP for LQ Control

Linear time-varying dynamics: $\boldsymbol{x}[n+1] = A_n \boldsymbol{x}[n] + B_n \boldsymbol{u}[n]$

Quadratic time-varying cost:

$J = \boldsymbol{x}[N]^T Q_N \boldsymbol{x}[N] + \sum_{n=0}^{N-1} \left( \boldsymbol{x}[n]^T Q_n \boldsymbol{x}[n] + \boldsymbol{u}[n]^T R_n \boldsymbol{u}[n] \right), \qquad Q_n \succeq 0, \; R_n \succ 0$

Can treat as one big (convex) QP: stack $(\boldsymbol{x}[1], \dots, \boldsymbol{x}[N], \boldsymbol{u}[0], \dots, \boldsymbol{u}[N-1])$ into a single decision vector, with the dynamics as linear equality constraints.

Instead, let's apply DP.

DP for LQ Control

Initialize the Bellman recursion: $J_N^*(\boldsymbol{x}) = \boldsymbol{x}^T Q_N \boldsymbol{x} =: \boldsymbol{x}^T V_N \boldsymbol{x}$

Apply the recursion:

$J_{N-1}^*(\boldsymbol{x}) = \min_{\boldsymbol{u}} \underbrace{\left[\, \boldsymbol{x}^T Q_{N-1} \boldsymbol{x} + \boldsymbol{u}^T R_{N-1} \boldsymbol{u} + J_N^*(A_{N-1} \boldsymbol{x} + B_{N-1} \boldsymbol{u}) \,\right]}_{=:\, Q_{N-1}(\boldsymbol{x}, \boldsymbol{u})}$

Plug in the dynamics, optimize w.r.t. $\boldsymbol{u}$ (set gradients to zero*), and solve:

$\boldsymbol{u}^*[N-1] = -\left(R_{N-1} + B_{N-1}^T V_N B_{N-1}\right)^{-1} B_{N-1}^T V_N A_{N-1} \, \boldsymbol{x}[N-1]$

*Confirm: $\nabla_{\boldsymbol{u}}^2 Q_{N-1}(\boldsymbol{x}, \boldsymbol{u}) = 2\left(R_{N-1} + B_{N-1}^T V_N B_{N-1}\right) \succ 0$

Plug the optimal control law back into $J_{N-1}^*$ to get $J_{N-1}^* = \boldsymbol{x}[N-1]^T V_{N-1} \boldsymbol{x}[N-1]$.

DP for LQ Control

Full backward recursion (Riccati difference recursion):

$V_N = Q_N$
$L_n = -\left(R_n + B_n^T V_{n+1} B_n\right)^{-1} B_n^T V_{n+1} A_n$
$V_n = Q_n + A_n^T V_{n+1} A_n + A_n^T V_{n+1} B_n L_n$
$\boldsymbol{u}^*[n] = L_n \boldsymbol{x}[n]$
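A minimal numpy sketch (not from the slides) of this Riccati difference recursion, using the discretized double integrator from earlier as the example system:

```python
import numpy as np

# Riccati difference recursion for a time-invariant double integrator;
# weights Q, R and the horizon are illustrative choices.
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
Q = np.eye(2)           # Q_n = Q_N = I  (PSD)
R = np.array([[0.1]])   # R_n           (PD)
N = 50

V = Q.copy()                               # V_N = Q_N
gains = []
for n in reversed(range(N)):
    H = R + B.T @ V @ B                    # positive definite
    L = -np.linalg.solve(H, B.T @ V @ A)   # L_n
    V = Q + A.T @ V @ A + A.T @ V @ B @ L  # V_n
    gains.append(L)
gains.reverse()                            # gains[n] = L_n

# Closed-loop rollout: u*[n] = L_n x[n]
x = np.array([[1.0], [0.0]])
for n in range(N):
    x = A @ x + B @ (gains[n] @ x)
print(x.ravel())  # state driven toward the origin
```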

For $N = \infty$ and $(A_n, B_n, Q_n, R_n, S_n) = (A, B, Q, R, S)$ with $(A, B)$ controllable, $V_n, L_n \to$ constant matrices (thereby obtaining an infinite-horizon/stationary policy).

DP for LQ Control

If the cost has linear terms $q_n^T \boldsymbol{x}[n] + r_n^T \boldsymbol{u}[n]$ and/or the dynamics has a drift term:

$\boldsymbol{x}[n+1] = A_n \boldsymbol{x}[n] + B_n \boldsymbol{u}[n] + c_n$

then re-write using the composite state $\boldsymbol{y} = (\boldsymbol{x}, 1)^T$:

$\boldsymbol{y}[n+1] = \begin{bmatrix} A_n & c_n \\ 0 & 1 \end{bmatrix} \boldsymbol{y}[n] + \begin{bmatrix} B_n \\ 0 \end{bmatrix} \boldsymbol{u}[n]$

Implications:
• Optimal cost-to-go is a general quadratic: $J_n^*(\boldsymbol{x}) = \boldsymbol{x}^T V_n \boldsymbol{x} + \boldsymbol{v}_n^T \boldsymbol{x} + s_n$
• Optimal policy is time-varying affine: $\boldsymbol{u}^*[n] = L_n \boldsymbol{x}[n] + \boldsymbol{l}_n$

Optimal Control

[Diagram: solution-methods taxonomy. Open Loop → Indirect Methods (1. derive conditions of optimality; 2. solve these equations, e.g., the Pontryagin Minimum Principle) and Direct Methods (just solve as one big optimization problem). Closed Loop → Principle of Optimality → Discrete-time: DP / Continuous-time: HJB → Application to Linear Dynamics]

Open-Loop Optimal Control

Discrete Time:
• Performance measure (minimize): $J = \phi(\boldsymbol{x}[N]) + \sum_{n=0}^{N-1} l(\boldsymbol{x}[n], \boldsymbol{u}[n], n)$

• Dynamics: $\boldsymbol{x}[n+1] = f_d(\boldsymbol{x}[n], \boldsymbol{u}[n], n)$

Open-loop: given $\boldsymbol{x}[0] = \boldsymbol{x}_0$, find optimal sequences $(\boldsymbol{x}^*[\,\cdot\,], \boldsymbol{u}^*[\,\cdot\,])$.

• If the objective is convex and the dynamics are linear, this is a convex problem.

Gradient Descent

• More generally, define the stage-wise Hamiltonian:

$H_n(\boldsymbol{x}, \boldsymbol{u}, \boldsymbol{\lambda}) = l(\boldsymbol{x}, \boldsymbol{u}, n) + \boldsymbol{\lambda}^T f_d(\boldsymbol{x}, \boldsymbol{u}, n)$

• Then, with $J_R(\boldsymbol{u}) := J(\boldsymbol{u}, \boldsymbol{x}(\boldsymbol{u}))$, we have:

$\nabla_{\boldsymbol{u}[n]} J_R = \nabla_{\boldsymbol{u}} H_n(\boldsymbol{x}[n], \boldsymbol{u}[n], \boldsymbol{\lambda}[n+1])$

• Where $\boldsymbol{\lambda}$ (co-state/adjoint) satisfies a backward recursion:

$\boldsymbol{\lambda}[N] = \nabla \phi(\boldsymbol{x}[N]), \qquad \boldsymbol{\lambda}[n] = \nabla_{\boldsymbol{x}} H_n(\boldsymbol{x}[n], \boldsymbol{u}[n], \boldsymbol{\lambda}[n+1])$
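A minimal sketch (not from the slides) of this costate recursion on a hypothetical instance (double-integrator dynamics, quadratic cost), with a finite-difference check of the resulting gradient:

```python
import numpy as np

# Gradient of the reduced objective J_R(u) via the costate recursion.
# Hypothetical problem: x[n+1] = A x[n] + B u[n],
# l(x, u) = x'x + 0.1 u^2, phi(x) = x'x.
dt, N = 0.1, 20
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([0.5 * dt**2, dt])

def rollout(u, x0=np.array([1.0, 0.0])):
    xs = [x0]
    for n in range(N):
        xs.append(A @ xs[-1] + B * u[n])
    return xs

def cost(u):
    xs = rollout(u)
    J = sum(x @ x + 0.1 * un**2 for x, un in zip(xs[:-1], u))
    return J + xs[-1] @ xs[-1]                # terminal cost phi

def gradient(u):
    xs = rollout(u)
    lam = 2 * xs[-1]                          # lambda[N] = grad phi
    g = np.zeros(N)
    for n in reversed(range(N)):              # backward costate pass
        g[n] = 0.2 * u[n] + B @ lam           # grad_u H_n
        lam = 2 * xs[n] + A.T @ lam           # lambda[n] = grad_x H_n
    return g

u = np.ones(N)
g = gradient(u)
eps, e0 = 1e-6, np.eye(N)[0]                  # finite-difference check
print(g[0], (cost(u + eps * e0) - cost(u - eps * e0)) / (2 * eps))
```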

Newton Descent

Moreover, we also have second-order information: the Hessian of $J_R$ admits a stage-wise backward recursion as well (see Dunn, 1989, for the full expressions).

Can we do better?

[Dunn, 1989]

Optimal Control

[Diagram: solution-methods taxonomy. Open Loop → Indirect Methods / Direct Methods (GD, Newton, SQP/SCP, IP, etc.). Closed Loop → Principle of Optimality → Discrete-time: DP / Continuous-time: HJB → Application to Linear Dynamics]

Differential Dynamic Programming (DDP)

Consider the DP recursion: $J_n(\boldsymbol{x}[n]) = \min_{\boldsymbol{u}} \left[\, l(\boldsymbol{x}[n], \boldsymbol{u}) + J_{n+1}(f_d(\boldsymbol{x}[n], \boldsymbol{u})) \,\right]$

Fix a sequence of controls $\boldsymbol{u}$, with corresponding state sequence $\boldsymbol{x}$, and for $n' = N - 1$ consider:

$J_{n'}(\boldsymbol{x}[n'] + \boldsymbol{\delta x}) = \min_{\boldsymbol{\delta u}} \underbrace{\left[\, l(\boldsymbol{x}[n'] + \boldsymbol{\delta x},\, \boldsymbol{u}[n'] + \boldsymbol{\delta u}) + \phi(f_d(\boldsymbol{x}[n'] + \boldsymbol{\delta x},\, \boldsymbol{u}[n'] + \boldsymbol{\delta u})) \,\right]}_{=:\, Q_{n'}(\boldsymbol{\delta x}, \boldsymbol{\delta u})}$

Taylor expand $Q_{n'}$ about $(\boldsymbol{x}[n'], \boldsymbol{u}[n'])$ to 2nd order and minimize w.r.t. $\boldsymbol{\delta u}$:
◦ Yields an affine control law: $\boldsymbol{\delta u}^*[n'] = L_{n'} \boldsymbol{\delta x}[n'] + \widehat{\boldsymbol{\delta u}}_{n'}$ (the "feedforward correction")
◦ Substitute back into $Q_{n'}$, yielding a quadratic approximation $\widehat{J}_{n'}$ of $J_{n'}$ about $\boldsymbol{x}[n']$
◦ Continue the recursion backwards, with $Q_n(\boldsymbol{\delta x}, \boldsymbol{\delta u}) = l(\boldsymbol{x}[n] + \boldsymbol{\delta x}, \boldsymbol{u}[n] + \boldsymbol{\delta u}) + \widehat{J}_{n+1}(f_d(\boldsymbol{x}[n] + \boldsymbol{\delta x}, \boldsymbol{u}[n] + \boldsymbol{\delta u}))$
◦ *Almost* the Riccati backward recursion for the Newton LQ problem

DDP Algorithm: "quasi-Newton" descent
◦ Forward pass $\boldsymbol{u} + \boldsymbol{\delta u}$ through $f_d$ using the affine control law (Newton would instead forward pass through the linearized dynamics; see the sketch below).

Near optimal, DDP behaves like Newton. Far from optimal, it is much more efficient.
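A compact sketch of this backward/forward structure in its Gauss-Newton (iLQR) variant, which drops the second-order dynamics terms; the pendulum swing-up problem, weights, and line search are illustrative assumptions, not from the slides:

```python
import numpy as np

# iLQR sketch (Gauss-Newton flavor of DDP). Toy problem: pendulum
# swing-up, x = (theta, omega), scalar u, Euler-discretized dynamics.
dt, N = 0.05, 80
Q, R, Qf = np.diag([1.0, 0.1]), 0.01, np.diag([100.0, 10.0])
xg = np.array([np.pi, 0.0])                       # upright goal

def f(x, u):  return x + dt * np.array([x[1], u - np.sin(x[0])])
def fx(x):    return np.array([[1.0, dt], [-dt * np.cos(x[0]), 1.0]])
fu = np.array([0.0, dt])

def rollout(us, ks=None, Ks=None, xs_ref=None, alpha=1.0):
    xs, new_us = [np.array([0.0, 0.0])], []
    for n in range(N):
        u = us[n]
        if ks is not None:                        # affine DDP control law
            u = u + alpha * ks[n] + Ks[n] @ (xs[-1] - xs_ref[n])
        new_us.append(u)
        xs.append(f(xs[-1], u))
    return xs, np.array(new_us)

def cost(xs, us):
    J = sum(0.5 * (x - xg) @ Q @ (x - xg) + 0.5 * R * u * u
            for x, u in zip(xs[:-1], us))
    return J + 0.5 * (xs[N] - xg) @ Qf @ (xs[N] - xg)

us = np.zeros(N)
xs, _ = rollout(us)
for it in range(100):
    # Backward pass: expand Q_n to 2nd order, minimize over delta-u.
    Vx, Vxx = Qf @ (xs[N] - xg), Qf.copy()
    ks, Ks = np.zeros(N), np.zeros((N, 2))
    for n in reversed(range(N)):
        Fx = fx(xs[n])
        Qx  = Q @ (xs[n] - xg) + Fx.T @ Vx
        Qu  = R * us[n] + fu @ Vx
        Qxx = Q + Fx.T @ Vxx @ Fx
        Quu = R + fu @ Vxx @ fu                   # scalar, > 0 here
        Qux = fu @ Vxx @ Fx
        ks[n], Ks[n] = -Qu / Quu, -Qux / Quu
        Vx  = Qx + Ks[n] * Qu + Qux * ks[n] + Quu * ks[n] * Ks[n]
        Vxx = Qxx + np.outer(Ks[n], Qux) + np.outer(Qux, Ks[n]) \
              + Quu * np.outer(Ks[n], Ks[n])
    # Forward pass through the *true* dynamics, with a crude line
    # search on the feedforward (Newton would use linearized dynamics).
    J0, improved = cost(xs, us), False
    for alpha in [1.0, 0.5, 0.25, 0.1, 0.03, 0.01]:
        xs_new, us_new = rollout(us, ks, Ks, xs, alpha)
        if cost(xs_new, us_new) < J0:
            xs, us, improved = xs_new, us_new, True
            break
    if not improved:
        break
print(xs[N])  # should end near the upright state (pi, 0)
```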

Optimal Control

[Diagram: the full taxonomy, now with MPC bridging Open Loop and Closed Loop. Open Loop → Indirect Methods / Direct Methods (GD, Newton, SQP/SCP, IP, etc.; DDP). Closed Loop → Principle of Optimality → Discrete-time: DP / Continuous-time: HJB → Application to Linear Dynamics]

Model Predictive Control (MPC)

[Diagram: the MPC receding-horizon loop: plan over a finite horizon from the current state, implement the first control input, then measure and re-plan]
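A minimal sketch (not from the slides) of this receding-horizon loop: re-solve a finite-horizon LQ problem from the measured state at every step and implement only the first control; the disturbance values are illustrative stand-ins for unmodelled effects:

```python
import numpy as np

# Minimal MPC loop on the double integrator: at each step, solve a
# finite-horizon LQ problem from the *measured* state and implement
# only the first control.
dt, H = 0.1, 30                      # step size and planning horizon
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
Q, R = np.eye(2), np.array([[0.1]])

def plan_first_input(x):
    """Plan over horizon H via the Riccati recursion; return only the
    first control (the MPC move). The last L computed is L_0."""
    V = Q.copy()
    for _ in range(H):
        L = -np.linalg.solve(R + B.T @ V @ B, B.T @ V @ A)
        V = Q + A.T @ V @ A + A.T @ V @ B @ L
    return L @ x

rng = np.random.default_rng(0)
x = np.array([[2.0], [0.0]])
for t in range(100):
    u = plan_first_input(x)                   # re-plan online
    w = rng.normal(scale=0.01, size=(2, 1))   # process disturbance
    x = A @ x + B @ u + w
print(x.ravel())  # regulated near the origin despite disturbances
```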

Tradeoffs on Open vs Closed

Open-Loop Control
◦ Rarely used blindly online, due to model errors.
◦ Expensive to compute online (need to use MPC and/or simplifications).
◦ Can be computed offline for a set of initial conditions (e.g., a trajectory library) and adapted online via fine-tuning (e.g., the Boston Dynamics Atlas robot).

Closed-Loop Control
◦ Needed if there is any source of unmodelled effects (dynamics, disturbances, other agents, sensor noise), i.e., always.
◦ Difficult to compute the true optimum in discrete time (functional recursion) or continuous time (PDE).
◦ Difficult to certify optimality.
◦ Instead, we look to certify correctness and safety (e.g., via Lyapunov arguments).

◦ MPC: bridges open- and closed-loop control via online re-planning.
◦ Feedback tracking: bridges open- and closed-loop control by tracking open-loop plans with feedback controllers.

Other Topics

◦ Hybrid systems (e.g., locomotion)
◦ Adaptive Control
◦ Reinforcement Learning
◦ Stochastic dynamics
◦ Approximate Dynamic Programming
◦ Partial observability
◦ High-dimensional observations (vision)
◦ Feedback controller design
◦ Task & Motion Planning

Some References

The (updated) classic on optimal control and dynamic programming:
◦ Bertsekas, Dynamic Programming and Optimal Control, Volumes 1 & 2

Introductory text, a must-have:
◦ Kirk, Optimal Control Theory

Applied optimal control, more advanced; generally assumes knowledge of the basics:
◦ Bryson and Ho, Applied Optimal Control

Model predictive control, from a more modern perspective:
◦ Kouvaritakis & Cannon, Model Predictive Control

Applied nonlinear control, a comprehensive intro to nonlinear control:
◦ Slotine & Li, Applied Nonlinear Control