Lecture: Optimal control and estimation

Automatic Control 2 Optimal control and estimation

Prof. Alberto Bemporad

University of Trento

Academic year 2010-2011

Linear quadratic regulation (LQR)

State-feedback control via pole placement requires one to assign the closed-loop poles. Is there a way to place the closed-loop poles automatically and optimally?

The main control objectives are:
1. Make the state x(k) “small” (to converge to the origin)
2. Use “small” input signals u(k) (to minimize the actuators’ effort)

These are conflicting goals!

LQR is a technique for placing the closed-loop poles automatically and optimally

Finite-time optimal control

Let the model of the open-loop process be the linear system

x(k+1) = Ax(k) + Bu(k)

with initial condition x(0). We look for the optimal sequence of inputs

U = {u(0), u(1),..., u(N − 1)}

that drives the state x(k) towards the origin and minimizes the performance index

J(x(0), U) = ∑_{k=0}^{N−1} ( x′(k)Qx(k) + u′(k)Ru(k) ) + x′(N)Q_N x(N)   (quadratic cost)

where Q = Q′ ⪰ 0, R = R′ ≻ 0, Q_N = Q_N′ ⪰ 0 ¹

¹ For a matrix Q ∈ R^(n×n), Q ≻ 0 means that Q is a positive definite matrix, i.e., x′Qx > 0 for all x ≠ 0, x ∈ R^n. Q ⪰ 0 means positive semidefinite, i.e., x′Qx ≥ 0 ∀x ∈ R^n.

Finite-time optimal control

Example: Q diagonal, Q = diag(q_1, ..., q_n), single input, Q_N = 0:

J(x(0), U) = ∑_{k=0}^{N−1} ( ∑_{i=1}^{n} q_i x_i²(k) + R u²(k) )

Consider again the general performance index of the posed linear quadratic (LQ) problem

J(x(0), U) = ∑_{k=0}^{N−1} ( x′(k)Qx(k) + u′(k)Ru(k) ) + x′(N)Q_N x(N)

N is called the time horizon over which we optimize performance.
The first term x′Qx penalizes the deviation of x from the desired target x = 0.
The second term u′Ru penalizes actuator authority.
The third term x′(N)Q_N x(N) penalizes how much the final state x(N) deviates from the target x = 0.

Q, R, Q_N are the tuning parameters of the optimal control design (cf. the parameters Kp, Ti, Td of a PID controller), and are directly related to physical/economic quantities

Minimum-energy controllability

Consider again the problem of controlling the state to zero with minimum-energy input:

min_U ‖ [u(0); u(1); ...; u(N−1)] ‖   s.t. x(N) = 0

The minimum-energy control problem can be seen as a particular case of the LQ optimal control problem by setting

R = I,   Q = 0,   Q_N = ∞·I
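As a concrete illustration, here is a minimal Python sketch of this least-norm problem (the double-integrator matrices below are borrowed from the example later in these slides, and the name RN is mine): the constraint x(N) = 0 is solved for the minimum-norm input sequence via the pseudoinverse.

import numpy as np

A = np.array([[1.0, 1.0], [0.0, 1.0]])   # assumed example: double integrator
B = np.array([[0.0], [1.0]])
x0 = np.array([1.0, 0.0])
N = 10

# x(N) = A^N x0 + [A^(N-1)B ... AB B] U = 0, with U = [u(0); ...; u(N-1)]
RN = np.hstack([np.linalg.matrix_power(A, N - 1 - i) @ B for i in range(N)])
U = -np.linalg.pinv(RN) @ np.linalg.matrix_power(A, N) @ x0   # least-norm solution

x = x0.copy()                             # verify that x(N) = 0
for k in range(N):
    x = A @ x + B @ U[k:k+1]
print(np.round(x, 10))                    # ~ [0. 0.]

Here pinv returns the minimum-norm solution of the underdetermined linear system, which is exactly the minimum-energy input when the system is reachable.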

Solution to LQ optimal control problem

By substituting x(k) = A^k x(0) + ∑_{i=0}^{k−1} A^i B u(k−1−i) in

J(x(0), U) = ∑_{k=0}^{N−1} ( x′(k)Qx(k) + u′(k)Ru(k) ) + x′(N)Q_N x(N)

we obtain

J(x(0), U) = ½ U′HU + x′(0)FU + ½ x′(0)Yx(0)

where H = H′ ≻ 0 is a positive definite matrix. The optimizer U∗ is obtained by zeroing the gradient:

0 = ∇_U J(x(0), U) = HU + F′x(0)   ⟶   U∗ = [u∗(0); u∗(1); ...; u∗(N−1)] = −H⁻¹F′x(0)

LQ problem matrix computation

J(x(0), U) = x′(0)Qx(0) + [x(1); x(2); ...; x(N)]′ Q̄ [x(1); x(2); ...; x(N)] + [u(0); u(1); ...; u(N−1)]′ R̄ [u(0); u(1); ...; u(N−1)]

where Q̄ = blockdiag(Q, ..., Q, Q_N) and R̄ = blockdiag(R, ..., R). The stacked states satisfy

[x(1); x(2); ...; x(N)] = S̄U + N̄x(0)

with

S̄ = [B 0 ... 0; AB B ... 0; ...; A^(N−1)B A^(N−2)B ... B],   N̄ = [A; A²; ...; A^N]

so that

J(x(0), U) = x′(0)Qx(0) + (S̄U + N̄x(0))′ Q̄ (S̄U + N̄x(0)) + U′R̄U

= ½ U′ 2(R̄ + S̄′Q̄S̄) U + x′(0) 2N̄′Q̄S̄ U + ½ x′(0) 2(Q + N̄′Q̄N̄) x(0)
= ½ U′HU + x′(0)FU + ½ x′(0)Yx(0),   with H = 2(R̄ + S̄′Q̄S̄), F = 2N̄′Q̄S̄, Y = 2(Q + N̄′Q̄N̄)
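A minimal Python sketch of this batch construction (the helper name batch_lq and the use of scipy's block_diag are mine, not from the slides): it assembles S̄, N̄, Q̄, R̄ exactly as above and returns the open-loop optimizer U∗ = −H⁻¹F′x(0).

import numpy as np
from scipy.linalg import block_diag

def batch_lq(A, B, Q, R, QN, N, x0):
    n, m = B.shape
    Sbar = np.zeros((N * n, N * m))              # block lower-triangular convolution matrix
    Nbar = np.zeros((N * n, n))                  # stacked powers of A
    for i in range(N):                           # block row i corresponds to x(i+1)
        Nbar[i*n:(i+1)*n, :] = np.linalg.matrix_power(A, i + 1)
        for j in range(i + 1):                   # x(i+1) depends on u(0), ..., u(i)
            Sbar[i*n:(i+1)*n, j*m:(j+1)*m] = np.linalg.matrix_power(A, i - j) @ B
    Qbar = block_diag(*([Q] * (N - 1) + [QN]))
    Rbar = block_diag(*([R] * N))
    H = 2 * (Rbar + Sbar.T @ Qbar @ Sbar)
    F = 2 * Nbar.T @ Qbar @ Sbar
    U = -np.linalg.solve(H, F.T @ x0)            # U* = -H^(-1) F' x(0)
    return U.reshape(N, m)

Note how H and F grow with the horizon N, which motivates the recursive solution developed next.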

Solution to LQ optimal control problem

The solution

U∗ = [u∗(0); u∗(1); ...; u∗(N−1)] = −H⁻¹F′x(0)

is an open-loop one: u(k) = f_k(x(0)), k = 0, 1, ..., N−1.
Moreover, the dimensions of the H and F matrices are proportional to the time horizon N.
We use optimality principles next to find a better solution (computationally more efficient, and more elegant)

Bellman’s principle of optimality

Bellman’s principle
Given the optimal sequence U∗ = [u∗(0), ..., u∗(N−1)] (and the corresponding optimal trajectory x∗(k)), the subsequence [u∗(k1), ..., u∗(N−1)] is optimal for the problem on the horizon [k1, N], starting from the optimal state x∗(k1).

Richard Bellman (1920-1984)

(figure: optimal state trajectory x∗(k) and optimal input u∗(k) over the horizon [0, N], with the tail starting at k1 highlighted)

Given the state x∗(k1), the optimal input trajectory u∗ on the remaining interval [k1, N] only depends on x∗(k1). Then each optimal move u∗(k) of the optimal trajectory on [0, N] only depends on x∗(k). The optimal control policy can always be expressed in state-feedback form: u∗(k) = u∗(x∗(k))!

Bellman’s principle of optimality

The principle also applies to nonlinear systems and/or non-quadratic cost functions: the optimal control law can always be written in state-feedback form

u∗(k) = f_k(x∗(k)),   ∀k = 0, ..., N−1

Compared to the open-loop solution {u∗(0), ..., u∗(N−1)} = f(x(0)), the feedback form u∗(k) = f_k(x∗(k)) has the big advantage of being more robust with respect to perturbations: at each time k we apply the best move over the remaining period [k, N]


At a generic instant k1 and state x(k1) = z, consider the optimal cost-to-go

V_{k1}(z) = min_{u(k1),...,u(N−1)} ∑_{k=k1}^{N−1} ( x′(k)Qx(k) + u′(k)Ru(k) ) + x′(N)Q_N x(N)

V_{k1}(z) is the optimal cost over the remaining interval [k1, N] starting at x(k1) = z

Principle of dynamic programming

V_0(z) = min_{U ≜ {u(0),...,u(N−1)}} J(z, U)
       = min_{u(0),...,u(k1−1)} ∑_{k=0}^{k1−1} ( x′(k)Qx(k) + u′(k)Ru(k) ) + V_{k1}(x(k1))

Starting at x(0), the minimum cost over [0, N] equals the minimum cost spent until step k1 plus the optimal cost-to-go from k1 to N starting at x(k1)

Riccati iterations

By applying the dynamic programming principle, we can compute the optimal inputs u∗(k) recursively as a function of x∗(k) (Riccati iterations):

1. Initialization: P(N) = Q_N
2. For k = N, ..., 1, compute recursively the following matrix:

P(k−1) = A′P(k)A + Q − A′P(k)B(R + B′P(k)B)⁻¹B′P(k)A

3. Define

K(k) = −(R + B′P(k+1)B)⁻¹B′P(k+1)A

The optimal input is

u∗(k) = K(k)x∗(k)

Jacopo Francesco Riccati (1676-1754)

The optimal input policy u∗(k) is a (linear, time-varying) state feedback!
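A minimal Python sketch of these backward iterations (the helper name riccati_iterations is mine): it returns the time-varying gains K(0), ..., K(N−1) and the cost-to-go matrix P(0).

import numpy as np

def riccati_iterations(A, B, Q, R, QN, N):
    P = QN                                       # initialization: P(N) = Q_N
    gains = [None] * N
    for k in range(N - 1, -1, -1):               # here P plays the role of P(k+1)
        # K(k) = -(R + B'P(k+1)B)^(-1) B'P(k+1)A
        K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        gains[k] = K
        # P(k) = A'P(k+1)A + Q - A'P(k+1)B (R + B'P(k+1)B)^(-1) B'P(k+1)A
        P = A.T @ P @ A + Q + A.T @ P @ B @ K
    return gains, P                              # optimal input: u*(k) = gains[k] @ x*(k)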

Linear quadratic regulation

Dynamical processes operate on a very long time horizon (in principle, forever). Let’s send the optimal control horizon N → ∞:

V∞(x(0)) = min_{u(0),u(1),...} ∑_{k=0}^{∞} x′(k)Qx(k) + u′(k)Ru(k)

Result
Let (A, B) be a stabilizable pair, R ≻ 0, Q ⪰ 0. There exists a unique solution P∞ of the algebraic Riccati equation (ARE)

P∞ = A′P∞A + Q − A′P∞B(B′P∞B + R)⁻¹B′P∞A

such that the optimal cost is V∞(x(0)) = x′(0)P∞x(0), and the optimal control law is the constant linear state feedback u(k) = K_LQR x(k) with

K_LQR = −(R + B′P∞B)⁻¹B′P∞A

MATLAB: P∞ = dare(A,B,Q,R)
MATLAB: [-K∞,P∞] = dlqr(A,B,Q,R)
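For reference, a minimal Python counterpart of those MATLAB calls, assuming scipy is available (the helper name dlqr_gain is mine):

import numpy as np
from scipy.linalg import solve_discrete_are

def dlqr_gain(A, B, Q, R):
    P = solve_discrete_are(A, B, Q, R)                    # P∞ from the ARE
    K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)    # u(k) = K x(k), slide convention
    return K, P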

Linear quadratic regulation

Go back to the Riccati iterations: starting from P(N) = P∞ and going backwards, we get P(j) = P∞, ∀j ≥ 0. Accordingly, we get

K(j) = −(R + B′P∞B)⁻¹B′P∞A ≜ K_LQR,   ∀j = 0, 1, ...

The LQR control law is linear and time-invariant

MATLAB: [-K∞,P∞,E] = lqr(sysd,Q,R)   (E = closed-loop poles = eigenvalues of A + BK_LQR)

Closed-loop stability is ensured if (A, B) is stabilizable, R ≻ 0, Q ⪰ 0, and (A, Q^(1/2)) is detectable, where Q^(1/2) is the Cholesky factor² of Q.
LQR is an automatic and optimal way of placing poles!
A similar result holds for continuous-time linear systems (MATLAB: lqr)

² Given a matrix Q = Q′ ⪰ 0, its Cholesky factor is an upper-triangular matrix C such that C′C = Q (MATLAB: chol)

LQR with output weighting

We often want to regulate only y(k) = Cx(k) to zero, so define

V∞(x(0)) = min_{u(0),u(1),...} ∑_{k=0}^{∞} y′(k)Q_y y(k) + u′(k)Ru(k)

The problem is again an LQR problem with equivalent state weight Q = C′Q_y C

MATLAB: [-K∞,P∞,E] = dlqry(sysd,Qy,R)

Corollary
Let (A, B) be stabilizable, (A, C) detectable, R ≻ 0, Q_y ≻ 0. Under the LQR control law u(k) = K_LQR x(k) the closed-loop system is asymptotically stable:

lim_{t→∞} x(t) = 0,   lim_{t→∞} u(t) = 0

Intuitively: the minimum cost x′(0)P∞x(0) is finite ⇒ y(k) → 0 and u(k) → 0. y(k) → 0 implies that the observable part of the state → 0. As u(k) → 0, the unobservable states remain undriven and go to zero spontaneously (= detectability condition)

LQR example

Two-dimensional single-input single-output (SISO) dynamical system (double integrator)

x(k+1) = [1 1; 0 1] x(k) + [0; 1] u(k)
y(k) = [1 0] x(k)

LQR (infinite horizon) controller defined on the performance index

V∞(x(0)) = min_{u(0),u(1),...} ∑_{k=0}^{∞} (1/ρ) y²(k) + u²(k),   ρ > 0

Weights: Q_y = 1/ρ (i.e., Q = (1/ρ)·[1; 0]·[1 0] = [1/ρ 0; 0 0]), R = 1

Note that only the ratio Q_y/R = 1/ρ matters, as scaling the cost function does not change the optimal control law
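A minimal Python sketch computing K_LQR for this example, for the three values of ρ that appear on the next slide; it should reproduce, up to rounding, the gains K reported there.

import numpy as np
from scipy.linalg import solve_discrete_are

A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
R = np.array([[1.0]])

for rho in (0.1, 10.0, 1000.0):
    Q = (C.T @ C) / rho                      # equivalent state weight Q = C' Qy C, Qy = 1/ρ
    P = solve_discrete_are(A, B, Q, R)
    K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)   # u(k) = K x(k)
    print(rho, K.round(4))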

LQR Example

(figure: closed-loop output y(k) and input u(k) for k = 0, ..., 20, for three values of ρ)

ρ = 0.1 (red line): K = [−0.8166 −1.7499]
ρ = 10 (blue line): K = [−0.2114 −0.7645]
ρ = 1000 (green line): K = [−0.0279 −0.2505]

Initial state: x(0) = [1; 0]

V∞(x(0)) = min_{u(0),u(1),...} ∑_{k=0}^{∞} (1/ρ) y²(k) + u²(k)

Kalman filtering – Introduction

Problem: assign observer poles in an optimal way, that is, to minimize the state estimation error x̃ = x − x̂.
Information comes in two ways: from sensor measurements (a posteriori) and from the model of the system (a priori).
We need to mix the two information sources optimally, given a probabilistic description of their reliability (sensor precision, model accuracy).

The Kalman filter solves this problem, and is now the most used state observer in most engineering fields (and beyond)

Rudolf E. Kalman∗ (born 1930)
∗ R.E. Kalman receiving the Medal of Science from the President of the USA on October 7, 2009

Modeling assumptions

The process is modeled as the stochastic linear system

x(k+1) = Ax(k) + Bu(k) + ξ(k)
y(k) = Cx(k) + ζ(k)
x(0) = x0

ξ(k) ∈ R^n = process noise. We assume E[ξ(k)] = 0 (zero mean), E[ξ(k)ξ′(j)] = 0 ∀k ≠ j (white noise), and E[ξ(k)ξ′(k)] = Q ⪰ 0 (covariance matrix)
ζ(k) ∈ R^p = measurement noise: E[ζ(k)] = 0, E[ζ(k)ζ′(j)] = 0 ∀k ≠ j, E[ζ(k)ζ′(k)] = R ≻ 0
x0 ∈ R^n is a random vector: E[x0] = x̄0, E[(x0 − x̄0)(x0 − x̄0)′] = Var[x0] = P0, P0 ⪰ 0
Vectors ξ(k), ζ(k), x0 are uncorrelated: E[ξ(k)ζ′(j)] = 0, E[ξ(k)x0′] = 0, E[ζ(k)x0′] = 0, ∀k, j ∈ Z
Probability distributions: we often assume normal (= Gaussian) distributions: ξ(k) ∼ N(0, Q), ζ(k) ∼ N(0, R), x0 ∼ N(x̄0, P0)
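A minimal simulation sketch of this noise model in Python (the scalar values are borrowed from the Kalman predictor example later in this lecture and are assumptions, not requirements):

import numpy as np

rng = np.random.default_rng(0)
a, Q, R = 0.98, 1e-5, 5e-3        # assumed scalar model and noise covariances
x = 1.0                           # one realization of the random initial state x0

xs, ys = [], []
for k in range(100):
    xs.append(x)
    ys.append(x + rng.normal(0.0, np.sqrt(R)))   # y(k) = x(k) + ζ(k), ζ ~ N(0, R)
    x = a * x + rng.normal(0.0, np.sqrt(Q))      # x(k+1) = a x(k) + ξ(k), ξ ~ N(0, Q)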

Kalman predictor

Introduce some quantities:
x̂(k+1|k) → state prediction
x̃(k+1|k) = x(k+1) − x̂(k+1|k) → prediction error
P(k+1|k) = E[x̃(k+1|k)x̃(k+1|k)′] → prediction error covariance

!"#$%&'$()*+,'-..

u(k) x(k) y(k) A,B /+0-)./$/- C

+ xˆ(k) yˆ(k) A,[B L(k)] C - ./$/-) -./&%$/-

1$(%$#)*+-!&'/,+

The observer dynamics is

x̂(k+1|k) = Ax̂(k|k−1) + Bu(k) + L(k)(y(k) − Cx̂(k|k−1))

starting from an initial estimate x̂(0|−1) = x̂0. The Kalman predictor minimizes the covariance matrix P(k+1|k) of the prediction error x̃(k+1|k)

Kalman predictor

Theorem
The observer gain L(k) minimizing the trace of P(k+1|k) is

L(k) = AP(k|k−1)C′ (CP(k|k−1)C′ + R)⁻¹

where P(k|k−1) solves the iterative equations

P(k+1|k) = AP(k|k−1)A′ + Q − AP(k|k−1)C′ (CP(k|k−1)C′ + R)⁻¹ CP(k|k−1)A′

with initial condition P(0|−1) = P0

The Kalman predictor is a time-varying observer, as the gain L(k) depends on the time index k.
In many cases we are interested in a time-invariant observer gain L, to avoid computing P(k+1|k) online.
Note the similarities with the Riccati iterations in LQR.
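A minimal Python sketch of one recursion of these equations (the helper name predictor_step is mine; all arguments are assumed to be numpy arrays of compatible sizes):

import numpy as np

def predictor_step(xhat, P, u, y, A, B, C, Q, R):
    # gain: L(k) = A P(k|k-1) C' (C P(k|k-1) C' + R)^(-1)
    S = C @ P @ C.T + R
    L = A @ P @ C.T @ np.linalg.inv(S)
    # state: x̂(k+1|k) = A x̂(k|k-1) + B u(k) + L(k) (y(k) - C x̂(k|k-1))
    xhat_next = A @ xhat + B @ u + L @ (y - C @ xhat)
    # covariance: P(k+1|k) = A P A' + Q - A P C' S^(-1) C P A'
    P_next = A @ P @ A.T + Q - A @ P @ C.T @ np.linalg.inv(S) @ C @ P @ A.T
    return xhat_next, P_next, L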

Stationary Kalman predictor

Theorem
Let (A, C) be observable and (A, B_q) stabilizable, where Q = B_q B_q′ (B_q = Cholesky factor of Q). Then the stationary optimal predictor is

x̂(k+1|k) = Ax̂(k|k−1) + Bu(k) + L(y(k) − Cx̂(k|k−1)) = (A − LC)x̂(k|k−1) + Bu(k) + Ly(k)

with

L = AP∞C′(CP∞C′ + R)⁻¹

where P∞ is the only positive-definite solution of the algebraic Riccati equation

P∞ = AP∞A′ + Q − AP∞C′(CP∞C′ + R)⁻¹CP∞A′

The observer is asymptotically stable, i.e., all the eigenvalues of (A − LC) are inside the unit circle

Example of Kalman predictor: time-varying gain

Noisy measurements y(k) of an exponentially decaying signal x(k)

x(k+1) = ax(k) + ξ(k),   E[ξ(k)] = 0, E[ξ²(k)] = Q
y(k) = x(k) + ζ(k),   E[ζ(k)] = 0, E[ζ²(k)] = R
|a| < 1,   E[x(0)] = 0, E[x²(0)] = P0

The equations of the Kalman predictor are

x̂(k+1|k) = ax̂(k|k−1) + L(k)(y(k) − x̂(k|k−1)),   x̂(0|−1) = 0

L(k) = aP(k|k−1) / (P(k|k−1) + R)

P(k+1|k) = a²P(k|k−1) + Q − a²P²(k|k−1)/(P(k|k−1) + R),   P(0|−1) = P0

Let a = 0.98, Q = 10⁻⁵, R = 5·10⁻³, P0 = 1, x̂(0) = 0.4, x(0) = 1

Example of Kalman predictor: time-varying gain

(figure, top: true state x(k) and noisy output y(k) for k = 0, ..., 100; bottom: state prediction with the time-varying gain, showing x, the estimate x̂, and a ±2√P confidence band)

Example of Kalman predictor: stationary gain

Let’s compute the stationary Kalman predictor (the pair (a, 1) is observable, the pair (a, B_q), with B_q = √(10⁻⁵), is stabilizable).
Solve the algebraic Riccati equation for a = 0.98, Q = 10⁻⁵, R = 5·10⁻³, P0 = 1. We get two solutions:

P∞ = −3.366·10⁻⁴,   P∞ = 1.486·10⁻⁴

The positive solution is the correct one.
Compute L = aP∞/(P∞ + R) = 2.83·10⁻²
The stationary Kalman predictor is

x̂(k+1|k) = 0.98 x̂(k|k−1) + 2.83·10⁻² (y(k) − x̂(k|k−1))

Check asymptotic stability:

x̃(k+1|k) = (a − L)x̃(k|k−1) = 0.952 x̃(k|k−1)   ⇒   lim_{k→+∞} x̃(k|k−1) = 0
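These numbers can be checked with a minimal Python sketch that iterates the scalar Riccati equation to convergence and recomputes L and a − L:

a, Q, R = 0.98, 1e-5, 5e-3
P = 1.0                                        # P(0|-1) = P0
for _ in range(1000):
    P = a**2 * P + Q - a**2 * P**2 / (P + R)   # scalar Riccati iteration
L = a * P / (P + R)
print(P, L, a - L)                             # ~1.486e-4, ~2.83e-2, ~0.952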

Kalman predictor: example

(figure, top: state prediction with the stationary gain, showing x and the estimate x̂ for k = 0, ..., 100; bottom: Kalman gain, time-varying L(k) vs. stationary L)

Kalman filter

The Kalman filter provides an optimal estimate of x(k) given the measurements up to time k.
The Kalman filter proceeds in two steps:
1. measurement update based on the most recent y(k)

M(k) = P(k|k−1)C′[CP(k|k−1)C′ + R]⁻¹,   P(0|−1) = P0
x̂(k|k) = x̂(k|k−1) + M(k)(y(k) − Cx̂(k|k−1)),   x̂(0|−1) = x̂0
P(k|k) = (I − M(k)C)P(k|k−1)

2. time update based on the model of the system

x̂(k+1|k) = Ax̂(k|k) + Bu(k)
P(k+1|k) = AP(k|k)A′ + Q

Same as the Kalman predictor, as L(k) = AM(k)
Stationary version: M = P∞C′(CP∞C′ + R)⁻¹

MATLAB: [KEST,L,P∞,M,Z] = kalman(sys,Q,R),   where Z = E[(x(k) − x̂(k|k))(x(k) − x̂(k|k))′]
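A minimal Python sketch of one filter recursion, following the two steps above (the helper name kf_step is mine):

import numpy as np

def kf_step(xhat_pred, P_pred, u, y, A, B, C, Q, R):
    # 1) measurement update with the most recent y(k)
    M = P_pred @ C.T @ np.linalg.inv(C @ P_pred @ C.T + R)    # M(k)
    xhat = xhat_pred + M @ (y - C @ xhat_pred)                # x̂(k|k)
    P = (np.eye(P_pred.shape[0]) - M @ C) @ P_pred            # P(k|k)
    # 2) time update with the model of the system
    xhat_pred_next = A @ xhat + B @ u                         # x̂(k+1|k)
    P_pred_next = A @ P @ A.T + Q                             # P(k+1|k)
    return xhat, xhat_pred_next, P_pred_next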

Tuning Kalman filters

It is usually hard to quantify exactly the correct values of Q and R for a given process.
The diagonal terms of R are related to how noisy the output sensors are.
Q is harder to relate to physical noise; it mainly reflects how rough the (A, B) model is.
After all, Q and R are the tuning knobs of the observer (similar to LQR).
The “larger” R is with respect to Q, the “slower” the observer is to converge (L, M will be small).
On the contrary, the “smaller” R is than Q, the more precise the measurements are considered, and the “faster” the observer will be to converge.

LQG control

Linear Quadratic Gaussian (LQG) control combines an LQR control law and a stationary Kalman predictor/filter.
Consider the stochastic dynamical system

x(k+1) = Ax(k) + Bu(k) + ξ(k),   ξ(k) ∼ N(0, Q_KF)
y(k) = Cx(k) + ζ(k),   ζ(k) ∼ N(0, R_KF)

with initial condition x(0) = x0, x0 ∼ N(x̄0, P0), Q_KF ⪰ 0, R_KF ≻ 0, where ζ and ξ are independent white-noise terms.

The objective is to minimize the cost function

J(x(0), U) = lim_{T→∞} (1/T) E[ ∑_{k=0}^{T} x′(k)Q_LQ x(k) + u′(k)R_LQ u(k) ]

when the state x is not measurable

LQG control

If we assume that all the assumptions for LQR control and the Kalman predictor/filter hold, i.e.,
the pair (A, B) is reachable and the pair (A, C_q), with C_q such that Q_LQ = C_q′C_q, is observable (here Q_LQ is the weight matrix of the LQ controller)
the pair (A, B_q), with B_q s.t. Q_KF = B_q B_q′, is stabilizable, and the pair (A, C) is observable (here Q_KF is the covariance matrix of the Kalman predictor/filter)
then apply the following procedure:

1. Determine the optimal stationary Kalman predictor/filter, neglecting the fact that the control variable u is generated through a closed-loop control scheme, and find the optimal gain L_KF
2. Determine the optimal LQR strategy assuming the state accessible, and find the optimal gain K_LQR

LQG control

!"#$%&'$()*+,'-..

u(k) A,B x(k) C y(k)

v(k) + + xˆ(k) 3$(%$# KLQR 4(2-+

/01)',#2+,((-+

Analogously to the case of output-feedback control using a Luenberger observer, it is possible to show that the dynamics of the extended state [x′ x̃′]′ has eigenvalues equal to the eigenvalues of (A + BK_LQR) plus those of (A − L_KF C) (2n in total)
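A minimal Python sketch of the whole LQG construction and of this separation property (the system matrices and the covariances Q_KF, R_KF below are assumed toy values): the eigenvalues of the extended closed-loop matrix should match the union of eig(A + BK_LQR) and eig(A − L_KF C).

import numpy as np
from scipy.linalg import solve_discrete_are

A = np.array([[1.0, 1.0], [0.0, 1.0]]); B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
QLQ = C.T @ C; RLQ = np.array([[1.0]])            # LQ weights (assumed)
QKF = 0.01 * np.eye(2); RKF = np.array([[0.1]])   # noise covariances (assumed)

P = solve_discrete_are(A, B, QLQ, RLQ)            # LQR Riccati solution
K = -np.linalg.solve(RLQ + B.T @ P @ B, B.T @ P @ A)         # u = K x̂
S = solve_discrete_are(A.T, C.T, QKF, RKF)        # predictor Riccati solution (dual)
L = A @ S @ C.T @ np.linalg.inv(C @ S @ C.T + RKF)           # predictor gain

# extended state (x, x̃): x(k+1) = (A+BK)x - BK x̃,  x̃(k+1) = (A-LC)x̃
Acl = np.block([[A + B @ K, -B @ K], [np.zeros((2, 2)), A - L @ C]])
print(np.sort(np.linalg.eigvals(Acl)))
print(np.sort(np.concatenate([np.linalg.eigvals(A + B @ K),
                              np.linalg.eigvals(A - L @ C)])))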


English-Italian Vocabulary

positive (semi)definite matrix: matrice (semi)definita positiva
dynamic programming: programmazione dinamica

Translation is obvious otherwise.
