Introduction to Optimal Control
ORF523: Convex and Conic Optimization
Sumeet Singh, Google Brain Robotics
April 22, 2021

Outline

• Optimal Control Problem
• Open- vs. Closed-Loop Solutions
• Closed-Loop: Bellman's Principle of Optimality
  ◦ Finite spaces
  ◦ Continuous spaces – LQ control
• Open-Loop:
  ◦ Gradient descent
  ◦ Newton descent
  ◦ DDP
• Model Predictive Control

Feedback Control

• Consider the block diagram for tracking some reference signal.

[Figure: feedback-control block diagram, with disturbance and noise inputs]

Adapted from: Stanford AA203, Spring 2021

Feedback Control Objectives

• Stability: various formulations; loosely, system output is “under control”

• Tracking: output should track the reference "as closely as possible"

• Disturbance rejection: output should be “as insensitive as possible” to disturbances/noise

• Robustness: controller should still perform well under "some degree of model misspecification"

Adapted from: Stanford AA203, Spring 2021

What's Missing?

• Performance: a mathematical quantification of all these objectives, and control that realizes the tradeoffs among them

• Planning: providing an appropriate reference trajectory to track (can be highly non-trivial)

• Learning: adaptation to unknown properties of the system

Adapted from: Stanford AA203, Spring 2021

Optimal Control Problem

3 Key Ingredients:

• Mathematical description of the system to be controlled

• Specification of a performance criterion

• Specification of constraints

State-Space Models

A state-space model describes the system by a set of first-order ODEs:

$\dot{x}_i(t) = f_i(x_1(t), \dots, x_n(t),\, u_1(t), \dots, u_m(t),\, t), \qquad i = 1, \dots, n$

where
• $x_1, \dots, x_n$ are the state variables
• $u_1, \dots, u_m$ are the control variables

Adapted from: Stanford AA203, Spring 2021

State-Space Models

In compact form: $\dot{\boldsymbol{x}}(t) = f(\boldsymbol{x}(t), \boldsymbol{u}(t), t)$, with $\boldsymbol{x}(t) \in \mathbb{R}^n$, $\boldsymbol{u}(t) \in \mathbb{R}^m$

• A history of control input values during the interval [0, T] is called a control history, denoted by u.
• A history of state values during the interval [0, T] is called a state trajectory, denoted by x.

Adapted from: Stanford AA203, Spring 2021

Illustrative Example

• Double integrator: point mass under controlled acceleration, $\ddot{p}(t) = u(t)$. With state $\boldsymbol{x} = (p, \dot{p})$:

$\dot{\boldsymbol{x}}(t) = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \boldsymbol{x}(t) + \begin{bmatrix} 0 \\ 1 \end{bmatrix} u(t)$
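As a concrete sketch (not from the slides), the exact zero-order-hold discretization of the double integrator, simulated under a bang-bang acceleration profile:

```python
import numpy as np

# Exact zero-order-hold discretization of the double integrator,
# x = (position p, velocity v), with u the commanded acceleration.
dt = 0.1
A = np.array([[1.0, dt],
              [0.0, 1.0]])
B = np.array([0.5 * dt**2, dt])

x = np.array([0.0, 0.0])          # start at rest at the origin
for n in range(20):
    u = 1.0 if n < 10 else -1.0   # accelerate for 1 s, then brake
    x = A @ x + B * u
print(x)  # ends at p = 1.0, v = 0.0
```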

Adapted from: Stanford AA203, Spring 2021

Quantifying Performance


• More generally:

$J = \underbrace{\phi(\boldsymbol{x}(T), T)}_{\text{terminal cost}} + \int_0^T \underbrace{l(\boldsymbol{x}(t), \boldsymbol{u}(t), t)}_{\text{instantaneous or stage-wise cost}} \, dt$

• $l$ and $\phi$ are scalar functions, and $T$ may be specified or "free".

Constraints

• Initial and final conditions (boundary conditions): e.g., $\boldsymbol{x}(0) = \boldsymbol{x}_0$, $\boldsymbol{x}(T) = \boldsymbol{x}_f$

• Constraints on state trajectory: e.g., $\boldsymbol{x}(t) \in X$ for all $t$

• Control limits: e.g., $\boldsymbol{u}(t) \in U$ for all $t$

• A control history and state trajectory that satisfy the control & state constraints for the entire time interval are termed admissible

Adapted from: Stanford AA203, Spring 2021

The Optimal Control Problem

Definitions: state $\boldsymbol{x}(t) \in \mathbb{R}^n$, control $\boldsymbol{u}(t) \in \mathbb{R}^m$

Continuous Time:

◦ Performance measure (minimize): $J = \phi(\boldsymbol{x}(T)) + \int_0^T l(\boldsymbol{x}(t), \boldsymbol{u}(t), t)\, dt$

◦ Dynamics: $\dot{\boldsymbol{x}}(t) = f(\boldsymbol{x}(t), \boldsymbol{u}(t), t)$, $\boldsymbol{x}(0) = \boldsymbol{x}_0$

◦ + other constraints, e.g., $\boldsymbol{u}(t) \in U$, $\boldsymbol{x}(t) \in X$

◦ $T < \infty$ or $T = \infty$

• Minimizer: $(\boldsymbol{x}^*, \boldsymbol{u}^*)$ is an optimal solution pair.
• Existence and uniqueness are not always guaranteed.

The Optimal Control Problem

Discrete Time:

◦ Performance measure (minimize): $J = \phi(\boldsymbol{x}[N]) + \sum_{n=0}^{N-1} l(\boldsymbol{x}[n], \boldsymbol{u}[n], n)$

◦ Dynamics: $\boldsymbol{x}[n+1] = f_d(\boldsymbol{x}[n], \boldsymbol{u}[n], n)$

◦ + other constraints

◦ $N < \infty$ or $N = \infty$

Solution Methods

Dynamic Programming (Principle of Optimality)
◦ Compositionality of optimal paths
◦ Closed-loop solutions: find a solution for all states at all times

Calculus of Variations (Pontryagin Maximum/Minimum Principle)
◦ "The optimal curve should be such that neighboring curves don't lead to smaller costs" → "derivative = 0"
◦ Open-loop solutions: find a solution for a given initial state

Figures: [Kelly, 2017]

The Optimal Control Problem

Continuous Time:
• Performance measure (minimize): $J = \phi(\boldsymbol{x}(T)) + \int_0^T l(\boldsymbol{x}(t), \boldsymbol{u}(t), t)\, dt$

• Dynamics: $\dot{\boldsymbol{x}}(t) = f(\boldsymbol{x}(t), \boldsymbol{u}(t), t)$

• + other constraints, e.g., $\boldsymbol{u}(t) \in U$, $\boldsymbol{x}(t) \in X$

Closed-loop: find a policy function $\pi^*(\boldsymbol{x}, t)$ s.t. $\boldsymbol{u}^*(t) = \pi^*(\boldsymbol{x}(t), t)$.
Open-loop: given $\boldsymbol{x}(0) = \boldsymbol{x}_0$, find optimal signals $(\boldsymbol{x}^*, \boldsymbol{u}^*)$, i.e., functions in $W^{1,\infty}[0, T]$ and $L^\infty[0, T]$.

The Optimal Control Problem

Discrete Time:
• Performance measure (minimize): $J = \phi(\boldsymbol{x}[N]) + \sum_{n=0}^{N-1} l(\boldsymbol{x}[n], \boldsymbol{u}[n], n)$

• Dynamics: $\boldsymbol{x}[n+1] = f_d(\boldsymbol{x}[n], \boldsymbol{u}[n], n)$

• + other constraints

Closed-loop: find policy functions $\{\pi_0^*, \dots, \pi_{N-1}^*\}$ s.t. $\boldsymbol{u}^*[n] = \pi_n^*(\boldsymbol{x}[n])$.
Open-loop: given $\boldsymbol{x}[0] = \boldsymbol{x}_0$, find optimal sequences $(\boldsymbol{x}^*[\,\cdot\,], \boldsymbol{u}^*[\,\cdot\,])$.

Optimal Control

[Diagram: solution-methods taxonomy. Optimal Control splits into Open Loop and Closed Loop; the Closed Loop branch leads to the Principle of Optimality]

Principle of Optimality

Given a trajectory from a → c with minimal cost $J_{ac} = J_{ab} + J_{bc}$, then $J_{bc}$ is minimal for the path b → c.

Proof by contradiction:
◦ Assume there exists an alternative path b → c with lower cost $\tilde{J}_{bc} < J_{bc}$. Then $\tilde{J}_{ac} = J_{ab} + \tilde{J}_{bc} < J_{ab} + J_{bc} = J_{ac}$, i.e., the original path was not minimal.

[Bertsekas, 2017]

Applying Principle of Optimality

• Principle of Optimality: if b – c is the initial segment of the optimal path from b to f, then c – f is the terminal segment of this path.

• Thus, the optimal trajectory is found by comparing:

$C_{bcf} = J_{bc} + J^*_{cf}$
$C_{bdf} = J_{bd} + J^*_{df}$
$C_{bef} = J_{be} + J^*_{ef}$

[Bertsekas, 2017]

Applying Principle of Optimality

• Need only compare concatenations of immediate decisions with optimal subsequent decisions.

• In practice: carry out the recursion backwards in time.

Dynamic Programming (DP)

Performance measure: $J = \phi(\boldsymbol{x}[N]) + \sum_{n=0}^{N-1} l(\boldsymbol{x}[n], \boldsymbol{u}[n], n)$

Dynamics: $\boldsymbol{x}[n+1] = f_d(\boldsymbol{x}[n], \boldsymbol{u}[n], n)$

The Dynamic Programming recursion (Bellman recursion) proceeds backwards in time:

$J_N(\boldsymbol{x}) = \phi(\boldsymbol{x}), \qquad J_n(\boldsymbol{x}) = \min_{\boldsymbol{u}} \left[\, l(\boldsymbol{x}, \boldsymbol{u}, n) + J_{n+1}(f_d(\boldsymbol{x}, \boldsymbol{u}, n)) \,\right]$
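As a concrete illustration (not from the original slides), a minimal numpy sketch of this backward recursion on a finite state/control space; the cost and transition arrays are hypothetical placeholders:

```python
import numpy as np

# Backward Bellman recursion on a finite state/control space.
# Hypothetical problem data: nS states, nU controls, horizon N,
# stage cost l(n, x, u) and deterministic transitions f_d(x, u).
nS, nU, N = 5, 3, 10
rng = np.random.default_rng(0)
stage_cost = rng.uniform(size=(N, nS, nU))      # l(n, x, u)
next_state = rng.integers(nS, size=(nS, nU))    # f_d(x, u)
terminal_cost = rng.uniform(size=nS)            # phi(x)

J = np.zeros((N + 1, nS))
pi = np.zeros((N, nS), dtype=int)
J[N] = terminal_cost
for n in reversed(range(N)):                    # proceed backwards in time
    # Q[x, u] = l(n, x, u) + J_{n+1}(f_d(x, u))
    Q = stage_cost[n] + J[n + 1][next_state]
    J[n] = Q.min(axis=1)
    pi[n] = Q.argmin(axis=1)   # closed-loop policy pi_n(x), for all x
```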

DP – Stochastic Case

Stochastic dynamics: $\boldsymbol{x}[n+1] = f_d(\boldsymbol{x}[n], \boldsymbol{u}[n], \boldsymbol{w}[n], n)$

Markovian assumption: $\boldsymbol{w}[n] \sim p(\cdot \mid \boldsymbol{x}[n], \boldsymbol{u}[n])$, i.e., the disturbance at time $n$ is only a function of the state and control at time $n$.

Implications:
1. Distribution of the next state depends only on the current state and control: $p(\boldsymbol{x}[n+1] \mid \boldsymbol{x}[n], \boldsymbol{u}[n])$

2. Sufficient to look for an optimal policy at time $n$ as a function of $\boldsymbol{x}[n]$ (and not as a function of the entire history before time $n$)

DP – Stochastic Case

Stochastic dynamics with Markovian property: $\boldsymbol{x}[n+1] = f_d(\boldsymbol{x}[n], \boldsymbol{u}[n], \boldsymbol{w}[n], n)$, $\boldsymbol{w}[n] \sim p(\cdot \mid \boldsymbol{x}[n], \boldsymbol{u}[n])$

Performance: $J = \mathbb{E}\left[ \phi(\boldsymbol{x}[N]) + \sum_{n=0}^{N-1} l(\boldsymbol{x}[n], \boldsymbol{u}[n], n) \right]$

Applying the principle of optimality and exploiting linearity of expectation:

$J_n(\boldsymbol{x}) = \min_{\boldsymbol{u}} \; \mathbb{E}_{\boldsymbol{w}}\left[\, l(\boldsymbol{x}, \boldsymbol{u}, n) + J_{n+1}(f_d(\boldsymbol{x}, \boldsymbol{u}, \boldsymbol{w}, n)) \,\right]$

DP – Inventory Control Example

• Stochastic DP
• Stock available $x[n] \in \mathbb{N}$, order $u[n] \in \mathbb{N}$, demand $w[n] \in \mathbb{N}$
• Dynamics: $x[n+1] = \max(0,\, x[n] + u[n] - w[n])$
• Constraints: $x[n] + u[n] \leq 2$
• Simple stationary demand model: $p(w[n] = 0) = 0.1$, $p(w[n] = 1) = 0.7$, $p(w[n] = 2) = 0.2$
• Objective (no terminal cost): $\mathbb{E}\left[ \sum_{n=0}^{N-1} \big( u[n] + (x[n] + u[n] - w[n])^2 \big) \right]$ with $N = 3$, where $u[n]$ is the cost to purchase and $(x[n] + u[n] - w[n])^2$ is the lost-business/over-supply cost

DP – Inventory Control Example

DP Algorithm:

$J_3^*(x) = 0, \qquad J_n^*(x) = \min_{0 \leq u \leq 2 - x} \; \mathbb{E}_w\!\left[\, u + (x + u - w)^2 + J_{n+1}^*(\max(0,\, x + u - w)) \,\right]$

As an example, for $x[2] = 0$:

$u = 0: \; 0 + (0.1 \cdot 0 + 0.7 \cdot 1 + 0.2 \cdot 4) = 1.5$
$u = 1: \; 1 + (0.1 \cdot 1 + 0.7 \cdot 0 + 0.2 \cdot 1) = 1.3$
$u = 2: \; 2 + (0.1 \cdot 4 + 0.7 \cdot 1 + 0.2 \cdot 0) = 3.1$

Thus: $J_2^*(0) = 1.3$, $\pi_2^*(0) = 1$. Show: $J_0^*(0) = 3.7$, $J_0^*(1) = 2.7$, $J_0^*(2) = 2.818$.
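For reference, a short script (not from the original slides) that carries out this recursion; it reproduces $J_2^*(0) = 1.3$ and the $J_0^*$ values above:

```python
# Stochastic DP for the inventory example; reproduces J2*(0) = 1.3
# and J0*(0) = 3.7, J0*(1) = 2.7, J0*(2) = 2.818.
p_w = {0: 0.1, 1: 0.7, 2: 0.2}       # stationary demand distribution
N, states = 3, [0, 1, 2]

J = {x: 0.0 for x in states}         # J_3 = 0 (no terminal cost)
for n in reversed(range(N)):
    J_next, J = J, {}
    for x in states:
        costs = []
        for u in range(0, 2 - x + 1):            # enforce x + u <= 2
            # E_w[ u + (x + u - w)^2 + J_{n+1}(max(0, x + u - w)) ]
            costs.append(sum(p * (u + (x + u - w) ** 2
                                  + J_next[max(0, x + u - w)])
                             for w, p in p_w.items()))
        J[x] = min(costs)
    print(f"J_{n}:", {x: round(J[x], 3) for x in states})
```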

DP in Discrete Spaces

Notice: the recursion needs the solution for all possible "successor" states first.
◦ Doable for finite/discrete state-spaces (e.g., grids).
◦ Suffers from the curse of dimensionality (e.g., consider quantizing a continuous state-space).

Value Iteration:
◦ Set up a recursion: $J_n(\boldsymbol{x}) \leftarrow \min_{\boldsymbol{u}} \left[\, l(n, \boldsymbol{u}, \boldsymbol{x}) + J_{n+1}(f_d(\boldsymbol{x}, \boldsymbol{u}, n)) \,\right]$ for all $\boldsymbol{x}$.
◦ In the infinite-horizon setting, drop the time dependence and iterate until convergence (see the sketch below).

Generalized Policy Iteration:
◦ Interleave policy evaluation (a similar recursion with the min replaced by the policy) and policy improvement (the argmin of the Bellman formula with the current value estimate).
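A minimal sketch of infinite-horizon value iteration on hypothetical tabular problem data; the discount factor $\gamma < 1$ is an added assumption (not on the slide) that makes the stationary recursion a contraction:

```python
import numpy as np

# Infinite-horizon value iteration on a finite problem.
# Assumption (not on the slide): a discount factor gamma < 1, so the
# stationary Bellman recursion is a contraction and iteration converges.
nS, nU, gamma = 5, 3, 0.95
rng = np.random.default_rng(1)
cost = rng.uniform(size=(nS, nU))             # l(x, u), time-invariant
next_state = rng.integers(nS, size=(nS, nU))  # f_d(x, u)

J = np.zeros(nS)
for _ in range(1000):
    Q = cost + gamma * J[next_state]          # Bellman backup for all x
    J_new = Q.min(axis=1)
    if np.max(np.abs(J_new - J)) < 1e-9:      # iterate until convergence
        break
    J = J_new
policy = Q.argmin(axis=1)                     # greedy stationary policy
```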

DP in Continuous Spaces

We rarely have exact solutions in continuous spaces. Otherwise: need function approximation, i.e., Approximate Dynamic Programming.

Examples:
◦ Fitted Value Iteration: bootstrap off a current/delayed estimate of the value function to compute "targets" and regress.
◦ Meshes: perform the iteration on a discrete mesh and use interpolation.

Dynamics unknown (Reinforcement Learning): find the optimal policy and value function using samples of experience $(\boldsymbol{x}, \boldsymbol{u}, \boldsymbol{x}', c)$.
◦ Algorithms resemble stochastic approximations of the recursion formulas (plus tricks).

Optimal Control

[Diagram: solution-methods taxonomy. Optimal Control splits into Open Loop and Closed Loop; the Closed Loop branch: Principle of Optimality → Discrete-time: DP / Continuous-time: HJB → Application to Linear Dynamics]

DP for LQ Control

Linear time-varying dynamics: $\boldsymbol{x}[n+1] = A_n \boldsymbol{x}[n] + B_n \boldsymbol{u}[n]$

Quadratic time-varying cost:

$J = \boldsymbol{x}[N]^T Q_N \boldsymbol{x}[N] + \sum_{n=0}^{N-1} \left( \boldsymbol{x}[n]^T Q_n \boldsymbol{x}[n] + \boldsymbol{u}[n]^T R_n \boldsymbol{u}[n] \right), \qquad Q_n \succeq 0, \; R_n \succ 0$

Can treat as one big (convex) QP: stack $(\boldsymbol{x}[1], \dots, \boldsymbol{x}[N], \boldsymbol{u}[0], \dots, \boldsymbol{u}[N-1])$ into a single decision vector, with the dynamics as linear equality constraints.

Instead, let's apply DP.

DP for LQ Control

Initialize the Bellman recursion: $J_N^*(\boldsymbol{x}) = \boldsymbol{x}^T Q_N \boldsymbol{x} =: \boldsymbol{x}^T V_N \boldsymbol{x}$

Apply the recursion:

$J_{N-1}^*(\boldsymbol{x}) = \min_{\boldsymbol{u}} \underbrace{\left[\, \boldsymbol{x}^T Q_{N-1} \boldsymbol{x} + \boldsymbol{u}^T R_{N-1} \boldsymbol{u} + J_N^*(A_{N-1} \boldsymbol{x} + B_{N-1} \boldsymbol{u}) \,\right]}_{=:\, Q_{N-1}(\boldsymbol{x}, \boldsymbol{u})}$

Plug in the dynamics, optimize w.r.t. $\boldsymbol{u}$ (set gradients to zero*), and solve:

$\boldsymbol{u}^*[N-1] = -\left(R_{N-1} + B_{N-1}^T V_N B_{N-1}\right)^{-1} B_{N-1}^T V_N A_{N-1} \, \boldsymbol{x}[N-1]$

*Confirm: $\nabla_{\boldsymbol{u}}^2 Q_{N-1}(\boldsymbol{x}, \boldsymbol{u}) = 2\left(R_{N-1} + B_{N-1}^T V_N B_{N-1}\right) \succ 0$

Plug the optimal control law back into $J_{N-1}^*$ to get $J_{N-1}^* = \boldsymbol{x}[N-1]^T V_{N-1} \boldsymbol{x}[N-1]$.

DP for LQ Control

Full backward recursion (Riccati difference recursion):

$V_N = Q_N$
$L_n = -\left(R_n + B_n^T V_{n+1} B_n\right)^{-1} B_n^T V_{n+1} A_n$
$V_n = Q_n + A_n^T V_{n+1} A_n + A_n^T V_{n+1} B_n L_n$
$\boldsymbol{u}^*[n] = L_n \boldsymbol{x}[n]$
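A minimal numpy sketch (not from the slides) of this Riccati difference recursion, using the discretized double integrator from earlier as the example system:

```python
import numpy as np

# Riccati difference recursion for a time-invariant double integrator;
# weights Q, R and the horizon are illustrative choices.
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
Q = np.eye(2)           # Q_n = Q_N = I  (PSD)
R = np.array([[0.1]])   # R_n           (PD)
N = 50

V = Q.copy()                               # V_N = Q_N
gains = []
for n in reversed(range(N)):
    H = R + B.T @ V @ B                    # positive definite
    L = -np.linalg.solve(H, B.T @ V @ A)   # L_n
    V = Q + A.T @ V @ A + A.T @ V @ B @ L  # V_n
    gains.append(L)
gains.reverse()                            # gains[n] = L_n

# Closed-loop rollout: u*[n] = L_n x[n]
x = np.array([[1.0], [0.0]])
for n in range(N):
    x = A @ x + B @ (gains[n] @ x)
print(x.ravel())  # state driven toward the origin
```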

For $N = \infty$ and $(A_n, B_n, Q_n, R_n, S_n) = (A, B, Q, R, S)$ with $(A, B)$ controllable, $V_n, L_n \to$ constant matrices (thereby obtaining an infinite-horizon/stationary policy).

DP for LQ Control

If the cost has linear terms $q_n^T \boldsymbol{x}[n] + r_n^T \boldsymbol{u}[n]$ and/or the dynamics has a drift term:

$\boldsymbol{x}[n+1] = A_n \boldsymbol{x}[n] + B_n \boldsymbol{u}[n] + c_n$

then re-write using the composite state $\boldsymbol{y} = (\boldsymbol{x}, 1)^T$:

$\boldsymbol{y}[n+1] = \begin{bmatrix} A_n & c_n \\ 0 & 1 \end{bmatrix} \boldsymbol{y}[n] + \begin{bmatrix} B_n \\ 0 \end{bmatrix} \boldsymbol{u}[n]$

Implications:
• Optimal cost-to-go is a general quadratic: $J_n^*(\boldsymbol{x}) = \boldsymbol{x}^T V_n \boldsymbol{x} + \boldsymbol{v}_n^T \boldsymbol{x} + s_n$
• Optimal policy is time-varying affine: $\boldsymbol{u}^*[n] = L_n \boldsymbol{x}[n] + \boldsymbol{l}_n$

Optimal Control

[Diagram: solution-methods taxonomy. Open Loop → Indirect Methods (1. derive conditions of optimality; 2. solve these equations, e.g., the Pontryagin Minimum Principle) and Direct Methods (just solve as one big optimization problem). Closed Loop → Principle of Optimality → Discrete-time: DP / Continuous-time: HJB → Application to Linear Dynamics]

Open-Loop Optimal Control

Discrete Time:
• Performance measure (minimize): $J = \phi(\boldsymbol{x}[N]) + \sum_{n=0}^{N-1} l(\boldsymbol{x}[n], \boldsymbol{u}[n], n)$

• Dynamics: $\boldsymbol{x}[n+1] = f_d(\boldsymbol{x}[n], \boldsymbol{u}[n], n)$

Open-loop: given $\boldsymbol{x}[0] = \boldsymbol{x}_0$, find optimal sequences $(\boldsymbol{x}^*[\,\cdot\,], \boldsymbol{u}^*[\,\cdot\,])$.

• If the objective is convex and the dynamics are linear, this is a convex problem.

Gradient Descent

• More generally, define the stage-wise Hamiltonian:

$H_n(\boldsymbol{x}, \boldsymbol{u}, \boldsymbol{\lambda}) = l(\boldsymbol{x}, \boldsymbol{u}, n) + \boldsymbol{\lambda}^T f_d(\boldsymbol{x}, \boldsymbol{u}, n)$

• Then, with $J_R(\boldsymbol{u}) := J(\boldsymbol{u}, \boldsymbol{x}(\boldsymbol{u}))$, we have:

$\nabla_{\boldsymbol{u}[n]} J_R = \nabla_{\boldsymbol{u}} H_n(\boldsymbol{x}[n], \boldsymbol{u}[n], \boldsymbol{\lambda}[n+1])$

• Where $\boldsymbol{\lambda}$ (co-state/adjoint) satisfies a backward recursion:

$\boldsymbol{\lambda}[N] = \nabla \phi(\boldsymbol{x}[N]), \qquad \boldsymbol{\lambda}[n] = \nabla_{\boldsymbol{x}} H_n(\boldsymbol{x}[n], \boldsymbol{u}[n], \boldsymbol{\lambda}[n+1])$
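A minimal sketch (not from the slides) of this costate recursion on a hypothetical instance (double-integrator dynamics, quadratic cost), with a finite-difference check of the resulting gradient:

```python
import numpy as np

# Gradient of the reduced objective J_R(u) via the costate recursion.
# Hypothetical problem: x[n+1] = A x[n] + B u[n],
# l(x, u) = x'x + 0.1 u^2, phi(x) = x'x.
dt, N = 0.1, 20
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([0.5 * dt**2, dt])

def rollout(u, x0=np.array([1.0, 0.0])):
    xs = [x0]
    for n in range(N):
        xs.append(A @ xs[-1] + B * u[n])
    return xs

def cost(u):
    xs = rollout(u)
    J = sum(x @ x + 0.1 * un**2 for x, un in zip(xs[:-1], u))
    return J + xs[-1] @ xs[-1]                # terminal cost phi

def gradient(u):
    xs = rollout(u)
    lam = 2 * xs[-1]                          # lambda[N] = grad phi
    g = np.zeros(N)
    for n in reversed(range(N)):              # backward costate pass
        g[n] = 0.2 * u[n] + B @ lam           # grad_u H_n
        lam = 2 * xs[n] + A.T @ lam           # lambda[n] = grad_x H_n
    return g

u = np.ones(N)
g = gradient(u)
eps, e0 = 1e-6, np.eye(N)[0]                  # finite-difference check
print(g[0], (cost(u + eps * e0) - cost(u - eps * e0)) / (2 * eps))
```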

Newton Descent

Moreover, we also have second-order information: the Hessian of $J_R$ admits a stage-wise backward recursion as well (see Dunn, 1989, for the full expressions).

Can we do better?

[Dunn, 1989]

Optimal Control

[Diagram: solution-methods taxonomy. Open Loop → Indirect Methods / Direct Methods (GD, Newton, SQP/SCP, IP, etc.). Closed Loop → Principle of Optimality → Discrete-time: DP / Continuous-time: HJB → Application to Linear Dynamics]

Differential Dynamic Programming (DDP)

Consider the DP recursion: $J_n(\boldsymbol{x}[n]) = \min_{\boldsymbol{u}} \left[\, l(\boldsymbol{x}[n], \boldsymbol{u}) + J_{n+1}(f_d(\boldsymbol{x}[n], \boldsymbol{u})) \,\right]$

Fix a sequence of controls $\boldsymbol{u}$, with corresponding state sequence $\boldsymbol{x}$, and for $n' = N - 1$ consider:

$J_{n'}(\boldsymbol{x}[n'] + \boldsymbol{\delta x}) = \min_{\boldsymbol{\delta u}} \underbrace{\left[\, l(\boldsymbol{x}[n'] + \boldsymbol{\delta x},\, \boldsymbol{u}[n'] + \boldsymbol{\delta u}) + \phi(f_d(\boldsymbol{x}[n'] + \boldsymbol{\delta x},\, \boldsymbol{u}[n'] + \boldsymbol{\delta u})) \,\right]}_{=:\, Q_{n'}(\boldsymbol{\delta x}, \boldsymbol{\delta u})}$

Taylor expand $Q_{n'}$ about $(\boldsymbol{x}[n'], \boldsymbol{u}[n'])$ to 2nd order and minimize w.r.t. $\boldsymbol{\delta u}$:
◦ Yields an affine control law: $\boldsymbol{\delta u}^*[n'] = L_{n'} \boldsymbol{\delta x}[n'] + \widehat{\boldsymbol{\delta u}}_{n'}$ (the "feedforward correction")
◦ Substitute back into $Q_{n'}$, yielding a quadratic approximation $\widehat{J}_{n'}$ of $J_{n'}$ about $\boldsymbol{x}[n']$
◦ Continue the recursion backwards, with $Q_n(\boldsymbol{\delta x}, \boldsymbol{\delta u}) = l(\boldsymbol{x}[n] + \boldsymbol{\delta x}, \boldsymbol{u}[n] + \boldsymbol{\delta u}) + \widehat{J}_{n+1}(f_d(\boldsymbol{x}[n] + \boldsymbol{\delta x}, \boldsymbol{u}[n] + \boldsymbol{\delta u}))$
◦ *Almost* the Riccati backward recursion for the Newton LQ problem

DDP Algorithm: "quasi-Newton" descent
◦ Forward pass $\boldsymbol{u} + \boldsymbol{\delta u}$ through $f_d$ using the affine control law (Newton would instead forward pass through the linearized dynamics; see the sketch below).

Near optimal, DDP behaves like Newton. Far from optimal, it is much more efficient.
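A compact sketch of this backward/forward structure in its Gauss-Newton (iLQR) variant, which drops the second-order dynamics terms; the pendulum swing-up problem, weights, and line search are illustrative assumptions, not from the slides:

```python
import numpy as np

# iLQR sketch (Gauss-Newton flavor of DDP). Toy problem: pendulum
# swing-up, x = (theta, omega), scalar u, Euler-discretized dynamics.
dt, N = 0.05, 80
Q, R, Qf = np.diag([1.0, 0.1]), 0.01, np.diag([100.0, 10.0])
xg = np.array([np.pi, 0.0])                       # upright goal

def f(x, u):  return x + dt * np.array([x[1], u - np.sin(x[0])])
def fx(x):    return np.array([[1.0, dt], [-dt * np.cos(x[0]), 1.0]])
fu = np.array([0.0, dt])

def rollout(us, ks=None, Ks=None, xs_ref=None, alpha=1.0):
    xs, new_us = [np.array([0.0, 0.0])], []
    for n in range(N):
        u = us[n]
        if ks is not None:                        # affine DDP control law
            u = u + alpha * ks[n] + Ks[n] @ (xs[-1] - xs_ref[n])
        new_us.append(u)
        xs.append(f(xs[-1], u))
    return xs, np.array(new_us)

def cost(xs, us):
    J = sum(0.5 * (x - xg) @ Q @ (x - xg) + 0.5 * R * u * u
            for x, u in zip(xs[:-1], us))
    return J + 0.5 * (xs[N] - xg) @ Qf @ (xs[N] - xg)

us = np.zeros(N)
xs, _ = rollout(us)
for it in range(100):
    # Backward pass: expand Q_n to 2nd order, minimize over delta-u.
    Vx, Vxx = Qf @ (xs[N] - xg), Qf.copy()
    ks, Ks = np.zeros(N), np.zeros((N, 2))
    for n in reversed(range(N)):
        Fx = fx(xs[n])
        Qx  = Q @ (xs[n] - xg) + Fx.T @ Vx
        Qu  = R * us[n] + fu @ Vx
        Qxx = Q + Fx.T @ Vxx @ Fx
        Quu = R + fu @ Vxx @ fu                   # scalar, > 0 here
        Qux = fu @ Vxx @ Fx
        ks[n], Ks[n] = -Qu / Quu, -Qux / Quu
        Vx  = Qx + Ks[n] * Qu + Qux * ks[n] + Quu * ks[n] * Ks[n]
        Vxx = Qxx + np.outer(Ks[n], Qux) + np.outer(Qux, Ks[n]) \
              + Quu * np.outer(Ks[n], Ks[n])
    # Forward pass through the *true* dynamics, with a crude line
    # search on the feedforward (Newton would use linearized dynamics).
    J0, improved = cost(xs, us), False
    for alpha in [1.0, 0.5, 0.25, 0.1, 0.03, 0.01]:
        xs_new, us_new = rollout(us, ks, Ks, xs, alpha)
        if cost(xs_new, us_new) < J0:
            xs, us, improved = xs_new, us_new, True
            break
    if not improved:
        break
print(xs[N])  # should end near the upright state (pi, 0)
```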

Optimal Control

[Diagram: the full taxonomy, now with MPC bridging Open Loop and Closed Loop. Open Loop → Indirect Methods / Direct Methods (GD, Newton, SQP/SCP, IP, etc.; DDP). Closed Loop → Principle of Optimality → Discrete-time: DP / Continuous-time: HJB → Application to Linear Dynamics]

Model Predictive Control (MPC)

[Diagram: the MPC receding-horizon loop: plan over a finite horizon from the current state, implement the first control input, then measure and re-plan]
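A minimal sketch (not from the slides) of this receding-horizon loop: re-solve a finite-horizon LQ problem from the measured state at every step and implement only the first control; the disturbance values are illustrative stand-ins for unmodelled effects:

```python
import numpy as np

# Minimal MPC loop on the double integrator: at each step, solve a
# finite-horizon LQ problem from the *measured* state and implement
# only the first control.
dt, H = 0.1, 30                      # step size and planning horizon
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
Q, R = np.eye(2), np.array([[0.1]])

def plan_first_input(x):
    """Plan over horizon H via the Riccati recursion; return only the
    first control (the MPC move). The last L computed is L_0."""
    V = Q.copy()
    for _ in range(H):
        L = -np.linalg.solve(R + B.T @ V @ B, B.T @ V @ A)
        V = Q + A.T @ V @ A + A.T @ V @ B @ L
    return L @ x

rng = np.random.default_rng(0)
x = np.array([[2.0], [0.0]])
for t in range(100):
    u = plan_first_input(x)                   # re-plan online
    w = rng.normal(scale=0.01, size=(2, 1))   # process disturbance
    x = A @ x + B @ u + w
print(x.ravel())  # regulated near the origin despite disturbances
```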

Tradeoffs on Open vs Closed

Open-Loop Control
◦ Rarely used blindly online, due to model errors.
◦ Expensive to compute online (need to use MPC and/or simplifications).
◦ Can be computed offline for a set of initial conditions (e.g., a trajectory library) and adapted online via fine-tuning (e.g., the Boston Dynamics Atlas robot).

Closed-Loop Control
◦ Needed if there is any source of unmodelled effects (dynamics, disturbances, other agents, sensor noise), i.e., always.
◦ Difficult to compute the true optimum in discrete time (functional recursion) or continuous time (PDE).
◦ Difficult to certify optimality.
◦ Instead, we look to certify correctness and safety (e.g., via Lyapunov arguments).

◦ MPC: bridges open- and closed-loop control via online re-planning.
◦ Feedback tracking: bridges open- and closed-loop control by tracking open-loop plans with feedback controllers.

Other Topics

◦ Hybrid systems (e.g., locomotion)
◦ Adaptive Control
◦ Reinforcement Learning
◦ Stochastic dynamics
◦ Approximate Dynamic Programming
◦ Partial observability
◦ High-dimensional observations (vision)
◦ Feedback controller design
◦ Task & Motion Planning

Some References

The (updated) classic on optimal control and dynamic programming:
◦ Bertsekas, Dynamic Programming and Optimal Control, Volumes 1 & 2

Introductory text, a must-have:
◦ Kirk, Optimal Control Theory

Applied optimal control, more advanced; generally assumes knowledge of the basics:
◦ Bryson and Ho, Applied Optimal Control

Model predictive control, from a more modern perspective:
◦ Kouvaritakis & Cannon, Model Predictive Control

Applied nonlinear control, a comprehensive intro to nonlinear control:
◦ Slotine & Li, Applied Nonlinear Control