OPTIMIZATION OF FUNCTIONALS
• Differentiability
• Inverse/Implicit Function Theorems
• Calculus of Variations
• Game Theory and the Minimax Theorem
• Lagrangian Duality
DIFFERENTIABILITY
Goal: Generalize the notion of derivative to functionals on normed spaces
Definition. Let $X, Y$ be normed spaces and $T: D \subseteq X \to Y$ (a possibly nonlinear transformation). If for $x \in D$ and each $h \in X$, $\delta T(x;h) \in Y$ exists, is linear and continuous with respect to $h$, and satisfies
$$\lim_{\|h\| \to 0} \frac{\|T(x+h) - T(x) - \delta T(x;h)\|}{\|h\|} = 0,$$
then $T$ is Fréchet differentiable at $x$, and $\delta T(x;h)$ is the Fréchet differential of $T$ at $x$ with increment $h$
If $f$ is a functional on $X$, then $\delta f(x;h) = \dfrac{d}{d\alpha}\, f(x + \alpha h)\Big|_{\alpha=0}$
DIFFERENTIABILITY (CONT.)
Examples
1. If $X = \mathbb{R}^n$ and $f(x) = f(x_1, \dots, x_n)$ is a functional having continuous partial derivatives with respect to each variable $x_i$, then
$$\delta f(x;h) = \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}\, h_i$$
2. Let $X = C[0,1]$ and $f(x) = \int_0^1 g(x(t), t)\,dt$, where $g_x$ exists and is continuous with respect to $x$. Then
$$\delta f(x;h) = \frac{d}{d\alpha}\Big|_{\alpha=0} \int_0^1 g(x(t) + \alpha h(t), t)\,dt = \int_0^1 g_x(x(t), t)\,h(t)\,dt$$
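As a sanity check of Example 2, here is a minimal numerical sketch (the integrand $g(x,t) = t\,x^2$ and the test functions are assumed for illustration, not from the slides); it compares the analytic differential $\int_0^1 g_x(x(t),t)\,h(t)\,dt$ with a difference quotient:

```python
# Minimal sketch: verify delta f(x; h) for f(x) = int_0^1 g(x(t), t) dt
# with the assumed integrand g(x, t) = t * x^2, so g_x(x, t) = 2 t x.
import numpy as np

t = np.linspace(0.0, 1.0, 2001)
x = np.sin(2 * np.pi * t)            # a test point x in C[0, 1]
h = np.cos(np.pi * t)                # a test increment h

def f(x):
    return np.trapz(t * x**2, t)     # f(x) = int_0^1 g(x(t), t) dt

analytic = np.trapz(2 * t * x * h, t)        # int_0^1 g_x(x(t), t) h(t) dt
a = 1e-6
numeric = (f(x + a * h) - f(x)) / a          # difference quotient in direction h

print(analytic, numeric)             # should agree to about 6 digits
```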
DIFFERENTIABILITY (CONT.)
Properties
1. If $T$ has a Fréchet differential, it is unique
Proof. If $\delta T(x;h)$, $\delta' T(x;h)$ are Fréchet differentials of $T$, and $\varepsilon > 0$, then
$$\|\delta T(x;h) - \delta' T(x;h)\| \le \|T(x+h) - T(x) - \delta T(x;h)\| + \|T(x+h) - T(x) - \delta' T(x;h)\| \le \varepsilon \|h\|$$
for $\|h\|$ small; by linearity in $h$, the bound extends to all $h$. Thus, $\delta T(x;\cdot) - \delta' T(x;\cdot)$ is a bounded linear operator with norm 0, i.e., $\delta T(x;h) = \delta' T(x;h)$ ■
2. If $T$ is Fréchet differentiable at $x \in D$, where $D$ is open, then $T$ is continuous at $x$
Proof. Given $\varepsilon > 0$, there is a $\delta > 0$ such that $\|T(x+h) - T(x) - \delta T(x;h)\| < \varepsilon \|h\|$ whenever $\|h\| < \delta$, i.e., $\|T(x+h) - T(x)\| < \varepsilon \|h\| + \|\delta T(x;h)\| \le (\varepsilon + M)\|h\|$, where $M$ is the norm of $\delta T(x;\cdot)$, so $T$ is continuous at $x$ ■
If $T: D \subseteq X \to Y$ is Fréchet differentiable throughout $D$, then the Fréchet differential has the form $\delta T(x;h) = T'(x)h$, where $T'(x) \in L(X,Y)$ is the Fréchet derivative of $T$ at $x$

Also, if $x \mapsto T'(x)$ is continuous in some open $S \subseteq D$, then $T$ is continuously Fréchet differentiable in $S$

If $f$ is a functional on $D$, so that $\delta f(x;h) = f'(x)h$ with $f'(x) \in X^*$, then $f'(x)$ is the gradient of $f$ at $x$
DIFFERENTIABILITY (CONT.)
Much of the theory for ordinary derivatives extends to Fréchet derivatives:
Properties
1. (Chain rule). Let $S: D \subseteq X \to E \subseteq Y$ and $P: E \to Z$ be Fréchet differentiable at $x \in D$ and $y = S(x) \in E$, respectively, where $X, Y, Z$ are normed spaces and $D, E$ are open sets. Then $T = P \circ S$ is Fréchet differentiable at $x$, and $T'(x) = P'(y)S'(x)$
Proof. If $x, x+h \in D$, then $T(x+h) - T(x) = P[S(x+h)] - P[S(x)] = P(y+g) - P(y)$, where $g = S(x+h) - S(x)$. Now, $P(y+g) - P(y) - P'(y)g = o(\|g\|)$, $g - S'(x)h = o(\|h\|)$ and $\|g\| = O(\|h\|)$, so $T(x+h) - T(x) - P'(y)S'(x)h = o(\|h\|) + o(\|g\|) = o(\|h\|)$, which shows that $T'(x)h = P'(y)S'(x)h$ ■
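A small numerical sanity check of the chain rule in finite dimensions (the maps $S$ and $P$ below are assumed examples): the composite derivative $P'(y)S'(x)h$ is compared against a difference quotient of $T = P \circ S$.

```python
# Sketch: chain rule T'(x) h = P'(S(x)) S'(x) h, checked by finite differences.
import numpy as np

def S(x):                                   # S: R^2 -> R^2 (assumed example)
    return np.array([np.sin(x[0]) + x[1]**2, x[0] * x[1]])

def S_jac(x):                               # Frechet derivative of S
    return np.array([[np.cos(x[0]), 2 * x[1]],
                     [x[1],         x[0]]])

def P(y):                                   # P: R^2 -> R (assumed example)
    return np.exp(y[0]) + y[1]**2

def P_grad(y):                              # gradient of P
    return np.array([np.exp(y[0]), 2 * y[1]])

x = np.array([0.3, -0.7])
h = np.array([0.5, 1.2])

chain = P_grad(S(x)) @ S_jac(x) @ h         # P'(y) S'(x) h with y = S(x)
eps = 1e-7
numeric = (P(S(x + eps * h)) - P(S(x))) / eps
print(chain, numeric)                       # should agree to several digits
```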
DIFFERENTIABILITY (CONT.)
Properties (cont.)
2. (Mean value theorem). Let $T$ be Fréchet differentiable on an open domain $D$, and $x \in D$. Suppose that $x + \alpha h \in D$ for all $\alpha \in [0,1]$. Then
$$\|T(x+h) - T(x)\| \le \|h\| \cdot \sup_{0 < \alpha < 1} \|T'(x + \alpha h)\|$$
Proof. By a corollary of Hahn-Banach, fix $y^* \in Y^*$ with $\|y^*\| = 1$ aligned with $T(x+h) - T(x)$, i.e., $y^*(T(x+h) - T(x)) = \|T(x+h) - T(x)\|$, and let $\varphi(t) := y^*(T(x+th))$. By the chain rule, $\varphi$ is differentiable with $\varphi'(t) = y^*(T'(x+th)h)$, so by the ordinary mean value theorem there is a $\theta \in (0,1)$ such that
$$\|T(x+h) - T(x)\| = \varphi(1) - \varphi(0) = y^*(T'(x+\theta h)h) \le \sup_{0 < \alpha < 1} \|T'(x + \alpha h)\|\,\|h\| \ \blacksquare$$
DIFFERENTIABILITY (CONT.)
Extrema The minima/maxima of a functional can be found by setting its Fréchet derivative to zero!
Definition. $x_0 \in \Omega$ is a local minimum of $f: \Omega \subseteq X \to \mathbb{R}$ if there is a nbd $B$ of $x_0$ where $f(x_0) \le f(x)$ on $\Omega \cap B$, and a strict local minimum if $f(x_0) < f(x)$ for all $x \in \Omega \cap B \setminus \{x_0\}$
Theorem. If $f$ is Fréchet differentiable on $X$, then a necessary condition for $f$ to have a local minimum/maximum at $x_0 \in X$ is that $\delta f(x_0; h) = 0$ for all $h \in X$
Proof. If $\delta f(x_0; h) \ne 0$ for some $h$, let $h_0$ be of unit norm such that $\delta f(x_0; h_0) > 0$. As $\|h\| \to 0$, $|f(x_0+h) - f(x_0) - \delta f(x_0;h)|/\|h\| \to 0$, so given $\varepsilon \in (0, \delta f(x_0;h_0))$ there is a $\gamma > 0$ such that $f(x_0 + \gamma h_0) > f(x_0) + \delta f(x_0; \gamma h_0) - \varepsilon\gamma > f(x_0)$, while $f(x_0 - \gamma h_0) < f(x_0) - \delta f(x_0; \gamma h_0) + \varepsilon\gamma < f(x_0)$, so $x_0$ is not a local minimum/maximum ■
A generalization of this result to constrained optimization is:
Theorem. If $x_0$ minimizes $f$ on the convex set $\Omega \subset X$, and $f$ is Fréchet differentiable at $x_0$, then $\delta f(x_0; x - x_0) \ge 0$ for all $x \in \Omega$
Proof. For $x \in \Omega$, let $h = x - x_0$ and note that $x_0 + \alpha h \in \Omega$ ($0 \le \alpha \le 1$) since $\Omega$ is convex. The rest of the proof is similar to the previous one ■
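A quick numerical illustration of this variational inequality (an assumed example: minimizing $f(x) = \|x - a\|^2$ over a box in $\mathbb{R}^2$, whose minimizer is the projection of $a$ onto the box):

```python
# Sketch: check delta f(x0; x - x0) = <grad f(x0), x - x0> >= 0 on a convex set.
import numpy as np

a = np.array([1.5, -0.4])            # unconstrained minimizer (outside the box)
x0 = np.clip(a, 0.0, 1.0)            # minimizer of ||x - a||^2 over [0, 1]^2
grad = 2 * (x0 - a)                  # gradient of f at x0

rng = np.random.default_rng(0)
xs = rng.uniform(0.0, 1.0, size=(1000, 2))   # random points of Omega
print(np.min((xs - x0) @ grad))      # nonnegative, as the theorem asserts
```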
INVERSE/IMPLICIT FUNCTION THEOREMS
The inverse and implicit function theorems are fundamental to many fields; they constitute the analytical backbone of differential geometry and are essential to nonlinear system theory
Theorem (Inverse Function Theorem)
Let $X, Y$ be Banach spaces, and $x_0 \in X$. Assume that $T: X \to Y$ is continuously Fréchet differentiable in a nbd of $x_0$, and that $T'(x_0)$ is invertible. Then, there is a nbd $U$ of $x_0$ such that $T$ is invertible in $U$, and both $T$ and $T^{-1}$ are continuous. Furthermore, $T^{-1}$ is continuously Fréchet differentiable in $T(U)$, with derivative $[T'(T^{-1}(y))]^{-1}$ ($y \in T(U)$)
Proof. (1) Invertibility: Since $T'(x_0)$ is invertible, by translation and multiplying by a linear map, assume w.l.o.g. that $x_0 = 0$, $T(x_0) = 0$ and $T'(x_0) = I$. Consider $T_y(x) = x - T(x) + y$ for $y \in X$; note that a fixed point of $T_y$ is precisely an $x$ such that $T(x) = y$. Define the ball $B_R := \{x \in X : \|x\| \le R\}$, which is complete. Let $F(x) = x - T(x)$. By the mean value theorem, $\|F(x) - F(x')\| \le \sup_{y \in B_R} \|F'(y)\|\,\|x - x'\|$ for all $x, x' \in B_R$, and since $F'(0) = 0$ and $F'$ is continuous, given a fixed $\varepsilon \in (0,1)$, if $R > 0$ is small enough,
$$\|F(x) - F(x')\| \le \varepsilon \|x - x'\|$$
INVERSE/IMPLICIT FUNCTION THEOREMS (CONT.)
Proof (cont.)
Suppose $\|y\| \le R(1-\varepsilon)$. Note that, if $x \in B_R$, $\|T_y(x)\| \le \|F(x)\| + \|y\| \le \varepsilon\|x\| + R(1-\varepsilon) \le R$, so $T_y(B_R) \subseteq B_R$, and for $x, x' \in B_R$, $\|T_y(x) - T_y(x')\| = \|F(x) - F(x')\| \le \varepsilon\|x - x'\|$, so $T_y$ is a contraction. By the Banach fixed point theorem, $T_y$ has a unique fixed point, i.e., if $\|y\|$ is small enough, there is a unique $x \in B_R$ such that $T(x) = y$, so $T^{-1}: B_{R(1-\varepsilon)} \to B_R$ exists
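The construction in this proof is directly computable. A sketch in $\mathbb{R}^2$ (the map $T$ and the data are assumed): after normalizing by $[T'(x_0)]^{-1}$, iterating the contraction $T_y$ solves $T(x) = y$.

```python
# Sketch: solve T(x) = y by iterating T_y(x) = x - (J0^{-1} T)(x) + J0^{-1} y,
# where J0 = T'(0); this is the fixed-point construction of the proof.
import numpy as np

def T(x):                                    # assumed map, T(0) = 0
    return np.array([x[0] + 0.1 * np.sin(x[1]), x[1] + 0.1 * x[0]**2])

J0inv = np.linalg.inv(np.array([[1.0, 0.1],  # T'(0) computed by hand
                                [0.0, 1.0]]))
y = np.array([0.05, -0.03])                  # small right-hand side
y_tilde = J0inv @ y

x = np.zeros(2)
for _ in range(50):                          # Banach fixed-point iteration
    x = x - J0inv @ T(x) + y_tilde

print(T(x), y)                               # T(x) reproduces y
```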
(2) Continuity: Since $T$ is Fréchet differentiable in $B_R$, it is continuous there. For $y, y_0 \in B_{R(1-\varepsilon)}$, $\|T_y(x) - T_{y_0}(x)\| = \|y - y_0\| \to 0$ as $y \to y_0$, uniformly in $x$, so by the last part of the Banach fixed point theorem (continuous dependence of the fixed point on a parameter), $T^{-1}$ is continuous
(3) Continuous differentiability: Consider a nbd $V \subseteq B_R$ of 0 where $T'$ is invertible. Let $W = T(V)$, $y_0, y \in W$ and $x_0 = T^{-1}(y_0)$, $x = T^{-1}(y)$. Then,
INVERSE/IMPLICIT FUNCTION THEOREMS (CONT.)
Proof (cont.)
$$\frac{\left\|T^{-1}(y) - T^{-1}(y_0) - [T'(x_0)]^{-1}(y - y_0)\right\|}{\|y - y_0\|} = \frac{\left\|x - x_0 - [T'(x_0)]^{-1}(T(x) - T(x_0))\right\|}{\|T(x) - T(x_0)\|} \le \left\|[T'(x_0)]^{-1}\right\| \left(\frac{\|T(x) - T(x_0) - T'(x_0)(x - x_0)\|}{\|x - x_0\|}\right)\left(\frac{\|x - x_0\|}{\|T(x) - T(x_0)\|}\right) \quad (*)$$

The 2nd factor tends to 0 as $x \to x_0$, while the 3rd factor is bounded, since its reciprocal is bounded from below:

$$\liminf_{x \to x_0} \frac{\|T(x) - T(x_0)\|}{\|x - x_0\|} \ge \liminf_{x \to x_0} \left( \frac{\|T'(x_0)[x - x_0]\|}{\|x - x_0\|} - \frac{\|T(x) - T(x_0) - T'(x_0)[x - x_0]\|}{\|x - x_0\|} \right) = \liminf_{x \to x_0} \frac{\|T'(x_0)[x - x_0]\|}{\|x - x_0\|} \ge \frac{1}{\|[T'(x_0)]^{-1}\|} > 0$$

Hence, the left-hand side of (∗) tends to 0, and $T^{-1}$ has Fréchet derivative $[T'(x_0)]^{-1}$ at $y_0$ ■
INVERSE/IMPLICIT FUNCTION THEOREMS (CONT.)
Theorem (Implicit Function Theorem)
Let $X, Y, Z$ be Banach spaces, $A$ an open subset of $X \times Y$, and $f: A \to Z$ a continuously Fréchet differentiable function, with derivative $[f_x\ \ f_y]$. Let $(x_0, y_0) \in A$ be such that $f(x_0, y_0) = 0$, and assume that $f_y(x_0, y_0)$ is invertible. Then, there are open sets $W \subseteq X$ and $V \subseteq A$ such that $x_0 \in W$, $(x_0, y_0) \in V$, and a $g: W \to Y$, Fréchet differentiable at $x_0$, such that $(x, g(x)) \in V$ and $f(x, g(x)) = 0$ for all $x \in W$. Moreover, $g'(x_0) = -[f_y(x_0, y_0)]^{-1} f_x(x_0, y_0)$
Proof. Define the continuously differentiable function $F: A \to X \times Z$ by $F(x,y) = (x, f(x,y))$. Note that $F(x_0, y_0) = (x_0, 0)$ and
$$F' = \begin{bmatrix} I & 0 \\ f_x & f_y \end{bmatrix}, \qquad [F'(x_0, y_0)]^{-1} = \begin{bmatrix} I & 0 \\ -[f_y(x_0,y_0)]^{-1} f_x(x_0,y_0) & [f_y(x_0,y_0)]^{-1} \end{bmatrix}$$
i.e., $F'(x_0, y_0)$ is invertible. By the inverse function theorem, there is an open $V \subseteq A$ where $F$ is invertible and $F^{-1}$ is continuously differentiable. Since $F$ leaves the first component unchanged, $F^{-1}(x,z) = (x, G(x,z))$ for some $G: F(V) \to Y$. The function $g: W \to Y$ given by $g(x) = G(x,0)$, where $W = \{x \in X : (x,0) \in F(V)\}$, satisfies the conditions of the theorem ■
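A one-dimensional numerical illustration (assumed example): $f(x,y) = x^2 + y^3 + y - 1$ vanishes at $(x_0, y_0) = (1, 0)$, $f_y(1,0) = 1$ is invertible, and the theorem predicts $g'(1) = -[f_y]^{-1} f_x = -2$.

```python
# Sketch: recover g(x) by root-finding and compare g'(1) with -f_x / f_y = -2.
from scipy.optimize import brentq

def f(x, y):
    return x**2 + y**3 + y - 1.0     # assumed example with f(1, 0) = 0

def g(x):
    # solve f(x, y) = 0 for y; the bracket [-1, 1] contains the root near y0 = 0
    return brentq(lambda y: f(x, y), -1.0, 1.0)

eps = 1e-6
print((g(1.0 + eps) - g(1.0)) / eps)  # ~ -2, matching -[f_y]^{-1} f_x
```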
INVERSE/IMPLICIT FUNCTION THEOREMS (CONT.)
Application to initial-value problems
Consider the initial-value problem
$$\frac{dx(t)}{dt} = f(x(t), t), \quad t \in [a,b], \qquad x(a) = \xi \in \mathbb{R}^n$$
where $f$ is continuously differentiable, and $x \in C([a,b], \mathbb{R}^n)$

We want to study the dependence of $x$ on $\xi$. To this end, define the function $\Phi: C([a,b], \mathbb{R}^n) \times \mathbb{R}^n \to C([a,b], \mathbb{R}^n)$ as
$$\Phi(x,\xi)(t) = x(t) - \xi - \int_a^t f(x(s), s)\,ds, \quad t \in [a,b]$$
Notice that x solves the initial-value problem iff Φ(x,ξ)=0. Now, Φ is continuously differentiable, and it satisfies the conditions of the implicit function theorem (check this!), which implies that x depends on ξ in a differentiable manner!
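A small numerical sketch of this differentiable dependence (a scalar example assumed for illustration): for $\dot x = -x + \sin t$, the sensitivity $\partial x(t)/\partial \xi$ solves the variational equation and equals $e^{-(t-a)}$, which a difference quotient reproduces.

```python
# Sketch: finite-difference sensitivity of an IVP solution w.r.t. xi.
import numpy as np
from scipy.integrate import solve_ivp

a, b = 0.0, 2.0

def rhs(t, x):
    return -x + np.sin(t)            # assumed scalar dynamics

def x_at_b(xi):
    sol = solve_ivp(rhs, (a, b), [xi], rtol=1e-10, atol=1e-12)
    return sol.y[0, -1]

eps = 1e-6
sens = (x_at_b(1.0 + eps) - x_at_b(1.0)) / eps
print(sens, np.exp(-(b - a)))        # both ~ 0.1353
```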
CALCULUS OF VARIATIONS
A classical problem is to find a function $x$ on $[t_1, t_2]$ that minimizes $J = \int_{t_1}^{t_2} f[x(t), \dot x(t), t]\,dt$
Assume that $x \in D[t_1, t_2]$, the space of real-valued continuously differentiable functions on $[t_1, t_2]$, with norm $\|x\| = \max_{t_1 \le t \le t_2} |x(t)| + \max_{t_1 \le t \le t_2} |\dot x(t)|$
Also, the end points $x(t_1)$ and $x(t_2)$ are assumed fixed
If $D_h[t_1, t_2]$ is the subspace consisting of those $x \in D[t_1, t_2]$ such that $x(t_1) = x(t_2) = 0$, then the necessary condition for the minimization of $J$ is
$$\delta J(x;h) = 0, \quad \text{for all } h \in D_h[t_1, t_2]$$
CALCULUS OF VARIATIONS (CONT.)
We have
$$\begin{aligned} \delta J(x;h) &= \frac{d}{d\alpha}\Big|_{\alpha=0} \int_{t_1}^{t_2} f(x + \alpha h, \dot x + \alpha \dot h, t)\,dt \\ &= \int_{t_1}^{t_2} f_x(x, \dot x, t)\,h(t)\,dt + \int_{t_1}^{t_2} f_{\dot x}(x, \dot x, t)\,\dot h(t)\,dt \\ &= \int_{t_1}^{t_2} \Big[ f_x(x, \dot x, t) - \frac{d}{dt} f_{\dot x}(x, \dot x, t) \Big] h(t)\,dt \end{aligned}$$
(integration by parts in the last step, using $h(t_1) = h(t_2) = 0$ and assuming $\frac{d}{dt} f_{\dot x}(x, \dot x, t)$ is continuous in $t$)
Lemma (Fundamental lemma of calculus of variations)
If $\alpha \in C[t_1,t_2]$, and $\int_{t_1}^{t_2} \alpha(t)h(t)\,dt = 0$ for every $h \in D_h[t_1,t_2]$, then $\alpha = 0$
Proof. If, say, $\alpha(t) > 0$ for some $t \in (t_1, t_2)$, there is an interval $(\tau_1, \tau_2)$ where $\alpha$ is strictly positive. Pick $h(t) = (t - \tau_1)^2 (t - \tau_2)^2$ for $t \in (\tau_1, \tau_2)$ and $h(t) = 0$ otherwise. This gives $\int_{t_1}^{t_2} \alpha(t)h(t)\,dt > 0$, a contradiction ■
Using this result we obtain
$$\delta J(x;h) = 0 \ \text{ for all } h \in D_h[t_1,t_2] \iff f_x(x, \dot x, t) - \frac{d}{dt} f_{\dot x}(x, \dot x, t) = 0 \quad \text{(Euler-Lagrange equation)}$$
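The Euler-Lagrange equation can also be formed symbolically. A small sympy sketch (the integrand $f = \dot x^2 + x^2$ is an assumed test case):

```python
# Sketch: form f_x - d/dt f_xdot symbolically for an assumed integrand.
import sympy as sp

t = sp.symbols('t')
x = sp.Function('x')
xdot = x(t).diff(t)
f = xdot**2 + x(t)**2                      # assumed test integrand

f_x = sp.diff(f, x(t))
f_xdot = sp.diff(f, xdot)
euler_lagrange = sp.simplify(f_x - sp.diff(f_xdot, t))
print(euler_lagrange)    # 2*x(t) - 2*x''(t), i.e. the E-L equation x'' = x
```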
CALCULUS OF VARIATIONS (CONT.)
Example (minimum arc length)
Given $t_1, t_2$, $x(t_1)$, $x(t_2)$, determine the curve of minimum arc length connecting these points
Notice that the distance between the points $(t, x(t))$ and $(t + \Delta t, x(t + \Delta t))$ is
$$\sqrt{(x(t+\Delta t) - x(t))^2 + \Delta t^2} = \sqrt{(\dot x(t)\Delta t + o(\Delta t))^2 + \Delta t^2} = \sqrt{1 + \dot x(t)^2}\,\Delta t + o(\Delta t)$$
hence the total arc length, by integration, is $J = \int_{t_1}^{t_2} \sqrt{1 + \dot x(t)^2}\,dt$
Using the Euler-Lagrange equation, we obtain
$$\frac{d}{dt} \frac{\partial}{\partial \dot x} \sqrt{1 + \dot x^2} = 0$$
or $\dot x = \text{constant}$. Therefore, the extremizing arc is the straight line connecting these points
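A quick numerical corroboration (illustrative): the straight line between $(0,0)$ and $(1,1)$ has arc length $\sqrt{2}$, and adding an admissible perturbation $h$ with $h(0) = h(1) = 0$ only increases $J$.

```python
# Sketch: arc length of the straight line vs. a perturbed curve.
import numpy as np

t = np.linspace(0.0, 1.0, 4001)

def arc_length(x):
    return np.trapz(np.sqrt(1 + np.gradient(x, t)**2), t)

line = t                               # straight line x(t) = t
bump = 0.1 * np.sin(np.pi * t)         # perturbation vanishing at the ends
print(arc_length(line), arc_length(line + bump))   # sqrt(2) ~ 1.4142 < perturbed
```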
GAME THEORY AND THE MINIMAX THEOREM
Two-Person Zero-Sum Games
Consider a problem with two players: I and II. If player I chooses a strategy $x \in X$, and player II chooses a strategy $y \in Y$, then I gains, and II loses, an amount (payoff) $J(x,y)$. Each player wants to maximize its own payoff
Example: Matching pennies

               Player II
                y1    y2
  Player I x1    1    -1
           x2   -1     1

Player I wants to maximize $\min_{y \in Y} J(x,y)$ w.r.t. $x$
Player II wants to minimize $\max_{x \in X} J(x,y)$ w.r.t. $y$
If $V_* = \max_{x \in X} \min_{y \in Y} J(x,y)$ and $V^* = \min_{y \in Y} \max_{x \in X} J(x,y)$, and $V_* = V^*$, then $V = V_* = V^*$ is the value of the game
Not every game has a value!
GAME THEORY AND THE MINIMAX THEOREM (CONT.)
Mixed Strategies
Instead of choosing a particular strategy, each player can choose a mixed/randomized strategy, i.e., a probability distribution over its strategy space $X$ or $Y$: $p_X(x)$, $p_Y(y)$ (assuming that $X$ and $Y$ are finite). The values of the game are
$$V_* = \max_{p_X} \min_{p_Y} \sum_{x \in X} \sum_{y \in Y} J(x,y)\,p_X(x)\,p_Y(y), \qquad V^* = \min_{p_Y} \max_{p_X} \sum_{x \in X} \sum_{y \in Y} J(x,y)\,p_X(x)\,p_Y(y)$$
The fundamental (minimax) theorem of game theory states that $V_* = V^*$
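The value of a finite game can be computed by linear programming. A minimal sketch using scipy (the LP formulation below, maximizing $v$ subject to $(A^T x)_j \ge v$, $\sum_i x_i = 1$, $x \ge 0$, is a standard reduction assumed here), applied to the matching-pennies matrix above:

```python
# Sketch: value of a zero-sum matrix game by LP (payoff A to player I).
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])            # matching pennies
m, n = A.shape

# variables z = (x_1, ..., x_m, v); minimize -v
c = np.r_[np.zeros(m), -1.0]
A_ub = np.c_[-A.T, np.ones(n)]         # v - (A^T x)_j <= 0 for each column j
b_ub = np.zeros(n)
A_eq = np.r_[np.ones(m), 0.0].reshape(1, -1)   # sum_i x_i = 1
b_eq = [1.0]
bounds = [(0, None)] * m + [(None, None)]

res = linprog(c, A_ub, b_ub, A_eq, b_eq, bounds=bounds)
print(res.x[:m], -res.fun)             # mixed strategy (1/2, 1/2), value 0
```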
GAME THEORY AND THE MINIMAX THEOREM (CONT.)
Proof of Minimax Theorem
We need to establish, equivalently, that for any matrix $A \in \mathbb{R}^{n \times m}$
$$V_* := \max_{\substack{x \in \mathbb{R}^n_+ \\ x_1 + \cdots + x_n = 1}}\ \min_{\substack{y \in \mathbb{R}^m_+ \\ y_1 + \cdots + y_m = 1}} x^T A y = \min_{\substack{y \in \mathbb{R}^m_+ \\ y_1 + \cdots + y_m = 1}}\ \max_{\substack{x \in \mathbb{R}^n_+ \\ x_1 + \cdots + x_n = 1}} x^T A y =: V^*$$
First notice that, for every $x, y$:
$$\min_{\substack{y' \in \mathbb{R}^m_+ \\ y'_1 + \cdots + y'_m = 1}} x^T A y' \le x^T A y \le \max_{\substack{x' \in \mathbb{R}^n_+ \\ x'_1 + \cdots + x'_n = 1}} x'^T A y$$
so taking max w.r.t. $x$ and min w.r.t. $y$ gives $V_* \le V^*$

We need to show that $V_* \ge V^*$, by showing that there is an $x_0$ such that $\min_{\substack{y \in \mathbb{R}^m_+ \\ y_1 + \cdots + y_m = 1}} x_0^T A y = V^*$
GAME THEORY AND THE MINIMAX THEOREM (CONT.)
Reformulation as an S-game To gain geometric insight, we can simplify the problem by defining the risk set
$$S = \{Ay \in \mathbb{R}^n : y \in \mathbb{R}^m_+,\ y_1 + \cdots + y_m = 1\}$$
so
$$\min_{\substack{y \in \mathbb{R}^m_+ \\ y_1 + \cdots + y_m = 1}} x^T A y = \min_{s \in S} x^T s$$
Example

         y1   y2   y3   y4
   x1     2    2    0   0.5
   x2     0    1    2    1

[Figure: the risk set $S$ in the $(x_1, x_2)$-plane: the convex hull of the columns $(2,0)^T$, $(2,1)^T$, $(0,2)^T$, $(0.5,1)^T$ of $A$]

$S$ is the convex hull of the columns of $A$
GAME THEORY AND THE MINIMAX THEOREM (CONT.)
Back to the proof...
A minimax strategy, i.e., a $y_0$ such that $\max_x x^T A y_0 = \min_y \max_x x^T A y$ (with $x$ and $y$ ranging over their respective probability simplices), corresponds to an $s_0 = A y_0 \in S$ of minimum $\infty$-norm, $\|s\|_\infty = \max\{s_1, \dots, s_n\}$

Let $Q_\alpha := \{s \in \mathbb{R}^n : \|s\|_\infty \le \alpha\}$. Then
$$V^* = \inf\{\alpha \ge 0 : Q_\alpha \cap S \ne \emptyset\}$$
To find an $x_0$ such that $\min_{s \in S} x_0^T s = V^*$, we can use the separating hyperplane theorem to determine a hyperplane $H$ (given by $\bar x$) separating $Q_{V^*}$ and $S$: $H = \{s : \bar x^T s = V^*\}$ ($\bar x$ has been scaled so that $\sum_i (\bar x)_i = 1$; since $H$ contains the vertex $s^* = (V^*, \dots, V^*)$ of $Q_{V^*}$, $\bar x^T s = \bar x^T s^* = V^* \sum_i (\bar x)_i = V^*$ on $H$)

[Figure: the sets $Q_{V^*}$ and $S$, separated by the hyperplane $H$ through the vertex $s^*$, with $s_0$ the point of $S$ on $H$]

Then we can choose $x_0 = \bar x$! This proves the minimax theorem ■
LAGRANGIAN DUALITY
Given a convex optimization problem in a normed space, our goal is to derive its (Lagrangian) dual. To formulate such a problem, we need to define an order relation:
Definitions
- A set $C$ in a real vector space $V$ is a cone if $x \in C$ implies that $\alpha x \in C$ for every $\alpha \ge 0$
- Given a convex cone $P$ in $V$ (positive cone), we say that $x \ge y$ for $x, y \in V$ when $x - y \in P$
- If $V$ is a normed space with positive cone $P$, $x > 0$ means that $x \in \operatorname{int} P$
- Given the positive cone $P \subseteq V$, $P^\oplus := \{x^* \in V^* : x^*(x) \ge 0 \text{ for all } x \in P\}$ is the positive cone in $V^*$. By Hahn-Banach, if $P$ is closed and $x \in V$, then $x^*(x) \ge 0$ for all $x^* \ge 0$ implies that $x \ge 0$
- If $X, Y$ are real vector spaces, $C \subseteq X$ is convex, and $P$ is the positive cone of $Y$, a function $f: C \to Y$ is convex if $f(\alpha x + (1-\alpha)y) \le \alpha f(x) + (1-\alpha)f(y)$ for all $x, y \in C$, $\alpha \in [0,1]$
Given a vector space $X$ and a normed space $Y$, let $\Omega$ be a convex subset of $X$, and $P$ be the positive cone of $Y$. Also, let $f: \Omega \to \mathbb{R}$ and $G: \Omega \to Y$ be convex functions
Consider the convex optimization problem
$$\min f(x) \quad \text{s.t. } x \in \Omega,\ G(x) \le 0$$
LAGRANGIAN DUALITY (CONT.)
To analyze this convex optimization problem, we need to introduce a special function:
Definition
Let $\Gamma = \{y \in Y : \text{there exists an } x \in \Omega \text{ such that } G(x) \le y\}$; this set is convex (why?)
On $\Gamma$, the primal function $\omega: \Gamma \to \mathbb{R}$ is given by $\omega(y) = \inf\{f(x) : x \in \Omega,\ G(x) \le y\}$
Notice that the original optimization problem corresponds to finding $\omega(0)$
Properties
(1) $\omega$ is convex
Proof.
$$\begin{aligned} \omega(\alpha y_1 + (1-\alpha)y_2) &= \inf\{f(x) : x \in \Omega,\ G(x) \le \alpha y_1 + (1-\alpha)y_2\} \\ &\le \inf\{f(\alpha x_1 + (1-\alpha)x_2) : x_1, x_2 \in \Omega,\ G(x_1) \le y_1,\ G(x_2) \le y_2\} \\ &\le \alpha \inf\{f(x) : x \in \Omega,\ G(x) \le y_1\} + (1-\alpha)\inf\{f(x) : x \in \Omega,\ G(x) \le y_2\} \\ &= \alpha\,\omega(y_1) + (1-\alpha)\,\omega(y_2) \end{aligned}$$
(2) $\omega$ is non-increasing, i.e., if $y_1 \le y_2$ then $\omega(y_1) \ge \omega(y_2)$
Proof. Direct ■
LAGRANGIAN DUALITY (CONT.)
Duality theory of convex programming is based on the observation that, since $\omega$ is convex, its epigraph (i.e., the region above the graph of $\omega$ in $\Gamma \times \mathbb{R}$) is convex, so it has a supporting hyperplane passing through the point $(0, \omega(0))$. To develop this idea, consider the normed space $Y \times \mathbb{R}$ with the norm $\|(y,r)\| = \|y\| + |r|$ for $y \in Y$ and $r \in \mathbb{R}$
Theorem
Assume that $P$ has a non-empty interior, and that there exists an $x_1 \in \Omega$ such that $G(x_1) < 0$ (i.e., $-G(x_1)$ is an interior point of $P$). Let
$$\mu_0 = \inf\{f(x) : x \in \Omega,\ G(x) \le 0\} \qquad (*)$$
and assume $\mu_0$ is finite. Then, there exists a $y_0^* \in P^\oplus$ such that
$$\mu_0 = \inf\{f(x) + \langle G(x), y_0^* \rangle : x \in \Omega\} \qquad (**)$$
Furthermore, if the infimum in (∗) is achieved by some $x_0 \in \Omega$, $G(x_0) \le 0$, then the infimum in (∗∗) is also achieved by $x_0$, and $\langle G(x_0), y_0^* \rangle = 0$
LAGRANGIAN DUALITY (CONT.)
Proof. On $Y \times \mathbb{R}$, define the sets
$$A := \{(y,r) : r \ge f(x),\ y \ge G(x) \text{ for some } x \in \Omega\} \quad \text{(epigraph of } f\text{)}$$
$$B := \{(y,r) : r \le \mu_0,\ y \le 0\}$$
Since $f, G$ are convex, the sets $A, B$ are convex. By the definition of $\mu_0$, $B \cap \operatorname{int} A = \emptyset$. Also, since $P$ has an interior point, $B$ has a non-empty interior (why?). Then, by the separating hyperplane theorem, there is a non-zero $w_0^* = (y_0^*, r_0) \in (Y \times \mathbb{R})^*$ such that
$$\langle y_1, y_0^* \rangle + r_0 r_1 \ge \langle y_2, y_0^* \rangle + r_0 r_2, \quad \text{for all } (y_1, r_1) \in A,\ (y_2, r_2) \in B$$
From the nature of $B$, it follows that $y_0^* \ge 0$ and $r_0 \ge 0$. Since $(0, \mu_0) \in B$, we have that $\langle y, y_0^* \rangle + r_0 r \ge r_0 \mu_0$ for all $(y,r) \in A$. If $r_0 = 0$, then in particular $y_0^* \ne 0$ and $\langle G(x_1), y_0^* \rangle \ge 0$; but since $-G(x_1) > 0$ and $y_0^* \ge 0$, we would have that $\langle G(x_1), y_0^* \rangle < 0$ (we know that $\langle G(x_1), y_0^* \rangle \le 0$; now, there exists a $y \in Y$ such that $\langle y, y_0^* \rangle > 0$, so $G(x_1) + \varepsilon y < 0$ for $\varepsilon$ sufficiently small, so if $\langle G(x_1), y_0^* \rangle = 0$ we would have $\langle G(x_1) + \varepsilon y, y_0^* \rangle > 0$, a contradiction). Therefore, $r_0 > 0$, and we can assume w.l.o.g. that $r_0 = 1$
Since $(0, \mu_0) \in A \cap B$, we have that $\mu_0 = \inf\{\langle y, y_0^* \rangle + r : (y,r) \in A\} \le \inf\{f(x) + \langle G(x), y_0^* \rangle : x \in \Omega\} \le \inf\{f(x) : x \in \Omega,\ G(x) \le 0\} = \mu_0$, which establishes the first part of the theorem. Now, if there is an $x_0 \in \Omega$ such that $G(x_0) \le 0$ and $f(x_0) = \mu_0$, then $\mu_0 \le f(x_0) + \langle G(x_0), y_0^* \rangle \le f(x_0) = \mu_0$, so $\langle G(x_0), y_0^* \rangle = 0$ ■
LAGRANGIAN DUALITY (CONT.)
The expression $L(x, y^*) = f(x) + \langle G(x), y^* \rangle$, for $x \in \Omega$, $y^* \in P^\oplus$, is the Lagrangian of the optimization problem
Corollary (Lagrangian Dual). Under the conditions of the theorem,
$$\sup_{y^* \in P^\oplus} L(y^*) = \mu_0, \quad \text{where } L(y^*) := \inf\{f(x) + \langle G(x), y^* \rangle : x \in \Omega\},$$
and the supremum is achieved by some $y_0^* \in P^\oplus$
Proof. The theorem established the existence of a $y_0^* \in P^\oplus$ such that $L(y_0^*) = \mu_0$, while for all $y^* \in P^\oplus$,
$$L(y^*) = \inf_{x \in \Omega}\{f(x) + \langle G(x), y^* \rangle\} \le \inf_{x \in \Omega,\ G(x) \le 0}\{f(x) + \langle G(x), y^* \rangle\} \le \inf_{x \in \Omega,\ G(x) \le 0} f(x) = \mu_0 \ \blacksquare$$
The dual problem typically provides useful information about the primal (original) problem, since their solutions are linked via the complementarity condition $\langle G(x_0), y_0^* \rangle = 0$
Also, the dual problem always has a solution, so it may be easier to analyze than the primal
Remark. If f is non-convex, ω may be non-convex, and the optimal cost of the dual problem provides only a lower bound on the optimal cost of the original problem
LAGRANGIAN DUALITY (CONT.)
Examples
1) Linear programming. Let $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^n$ and $c \in \mathbb{R}^m$, and consider the problem
$$\mu_0 = \min_{x \in \mathbb{R}^n} b^T x \quad \text{s.t. } Ax \ge c,\ x \ge 0$$
Assume there is an $x > 0$ with $Ax > c$. Letting $f(x) = b^T x$, $G(x) = c - Ax$ and $\Omega = \{x : x_i \ge 0 \text{ for all } i\}$, the corollary yields, for $y \ge 0$ (the positive orthant being self-dual),
$$L(y) = \inf\{b^T x + y^T(c - Ax) : x \ge 0\} = \inf\{(b - A^T y)^T x + y^T c : x \ge 0\} = \begin{cases} y^T c, & \text{if } b \ge A^T y \\ -\infty, & \text{otherwise} \end{cases}$$
so the Lagrangian dual, corresponding to the standard dual linear program, is
$$\mu_0 = \max_{y \in \mathbb{R}^m} c^T y \quad \text{s.t. } A^T y \le b,\ y \ge 0$$
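As a quick numerical corroboration of this primal-dual pair, the sketch below solves both programs with scipy.optimize.linprog on small assumed data (the matrix and vectors are illustrative, chosen so that the Slater-type condition above holds) and checks that the optimal values coincide:

```python
# Sketch: LP strong duality on assumed data (A, b, c are illustrative).
import numpy as np
from scipy.optimize import linprog

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([4.0, 5.0])       # primal cost vector
c = np.array([6.0, 6.0])       # primal constraint right-hand side

# primal: min b^T x  s.t.  Ax >= c, x >= 0   (Ax >= c written as -Ax <= -c)
primal = linprog(b, A_ub=-A, b_ub=-c, bounds=[(0, None)] * 2)
# dual:   max c^T y  s.t.  A^T y <= b, y >= 0
dual = linprog(-c, A_ub=A.T, b_ub=b, bounds=[(0, None)] * 2)

print(primal.fun, -dual.fun)   # both 15.6: strong duality holds
```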
LAGRANGIAN DUALITY (CONT.)
Examples (cont.)
2) Optimal control. Consider the system $\dot x(t) = Ax(t) + bu(t)$, where $x(t) \in \mathbb{R}^n$, $A \in \mathbb{R}^{n \times n}$, $b \in \mathbb{R}^n$ and $u(t) \in \mathbb{R}$. Given $x(t_0)$, the goal is to find an input $u$ on $[t_0, t_1]$ which minimizes
$$J(u) = \int_{t_0}^{t_1} u^2(t)\,dt$$
while satisfying $x(t_1) \ge c$, where $c \in \mathbb{R}^n$. The solution of the system is
$$x(t_1) = e^{A(t_1 - t_0)} x(t_0) + Ku, \qquad Ku := \int_{t_0}^{t_1} e^{A(t_1 - t)} b\,u(t)\,dt$$
so the problem corresponds to minimizing $J(u)$ subject to $Ku \ge c - e^{A(t_1 - t_0)} x(t_0) =: d$
Assuming that $u \in L_2(t_0, t_1)$, the corollary gives the dual problem
$$\max_{y \ge 0} \inf_{u \in L_2(t_0,t_1)} \{J(u) + y^T(d - Ku)\} = \max_{y \ge 0} \left( \inf_{u \in L_2(t_0,t_1)} \int_{t_0}^{t_1} \big[u^2(t) - y^T e^{A(t_1-t)} b\,u(t)\big]\,dt + y^T d \right) = \max_{y \ge 0}\ y^T Q y + y^T d$$
where $Q := -\frac{1}{4} \int_{t_0}^{t_1} e^{A(t_1-t)} b\,b^T e^{A^T(t_1-t)}\,dt$. This is a simple finite-dimensional problem, and its solution, $y_{\text{opt}}$, yields $u_{\text{opt}}(t) = \frac{1}{2}\, y_{\text{opt}}^T e^{A(t_1-t)} b$
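A rough computational sketch of this recipe (all data are assumed: a double-integrator system, unit horizon, and target $x(t_1) \ge c$): it forms $Q$ by quadrature, solves the finite-dimensional dual with a generic bounded solver, and recovers $u_{\text{opt}}$.

```python
# Sketch: dual of the minimum-energy control problem, on assumed data.
import numpy as np
from scipy.linalg import expm
from scipy.optimize import minimize

A = np.array([[0.0, 1.0],
              [0.0, 0.0]])                 # double integrator (assumed)
b = np.array([0.0, 1.0])
t0, t1 = 0.0, 1.0
x0 = np.zeros(2)
cvec = np.array([1.0, 0.0])                # require x(t1) >= (1, 0)

ts = np.linspace(t0, t1, 2001)
Phi = np.array([expm(A * (t1 - t)) @ b for t in ts])      # e^{A(t1-t)} b
Q = -0.25 * np.trapz(Phi[:, :, None] * Phi[:, None, :], ts, axis=0)
d = cvec - expm(A * (t1 - t0)) @ x0

# dual: max_{y >= 0} y^T Q y + y^T d  (Q is negative semidefinite)
res = minimize(lambda y: -(y @ Q @ y + y @ d), np.ones(2),
               bounds=[(0, None)] * 2)
y_opt = res.x
u_opt = 0.5 * Phi @ y_opt                  # u_opt(t) = (1/2) y_opt^T e^{A(t1-t)} b
print(y_opt, -res.fun)                     # ~ (6, 0) and cost 3 for this data
```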
BONUS: EQUALITY CONSTRAINED OPTIMIZATION
Problem
$$\min_{x \in \Omega} f(x) \quad \text{s.t. } g_i(x) = 0,\ i = 1, \dots, n$$
where $\Omega \subseteq X$ and $f, g_1, \dots, g_n$ are Fréchet differentiable on $X$
Theorem 1. Let $x_0 \in \Omega$ be a local minimum of $f$ on the set of all $x \in \Omega$ such that $g_i(x) = 0$, $i = 1, \dots, n$, and assume that the functionals $\delta g_1(x_0;\cdot), \dots, \delta g_n(x_0;\cdot)$ are linearly independent (i.e., $x_0$ is a regular point). Then:
$$\delta f(x_0; h) = 0 \ \text{ for all } h \text{ such that } \delta g_i(x_0; h) = 0 \text{ for all } i = 1, \dots, n$$
BONUS: EQUALITY CONSTRAINED OPTIMIZATION (CONT.)
Proof
First, notice that there exist vectors $y_1, \dots, y_n \in X$ such that the matrix $M \in \mathbb{R}^{n \times n}$, $M_{ik} = \delta g_i(x_0; y_k)$, is non-singular. To see this, consider the linear mapping $G: X \to \mathbb{R}^n$, $[G(y)]_i = \delta g_i(x_0; y)$. The range of $G$ is a linear subspace of $\mathbb{R}^n$; if $\dim R(G) < n$, there would exist a $\lambda \in \mathbb{R}^n \setminus \{0\}$ such that $\lambda^T G(y) = 0$ for all $y \in X$, i.e., $\{\delta g_i(x_0;\cdot)\}$ would be linearly dependent. Therefore, there exist vectors $y_1, \dots, y_n \in X$ such that $G(y_i) = e_i$, so $M = I$, which is non-singular
Fix $h \in X$ such that $\delta g_i(x_0; h) = 0$ for all $i = 1, \dots, n$, and consider the system of equations $g_i\big(x_0 + \alpha h + \sum_{k=1}^n \beta_k y_k\big) = 0$, $i = 1, \dots, n$, in $\alpha, \beta_1, \dots, \beta_n$. The Jacobian of this system at $\alpha = 0$, $\beta_k = 0$, namely $\big[\partial g_i/\partial \beta_k\big] = M$, is non-singular. Therefore, by the implicit function theorem (in $\mathbb{R}^n$), there exists a continuous function $\beta: U \subseteq \mathbb{R} \to \mathbb{R}^n$ in a nbd $U$ of 0 such that $\beta(0) = 0$ and
$$0 = g_i\Big(x_0 + \alpha h + \sum_{k=1}^n \beta_k(\alpha) y_k\Big) = \underbrace{g_i(x_0)}_{=\,0} + \alpha\,\underbrace{\delta g_i(x_0; h)}_{=\,0} + \underbrace{\delta g_i\Big(x_0; \sum_{k=1}^n \beta_k(\alpha) y_k\Big)}_{[M\beta(\alpha)]_i} + o(\alpha) + o\Big(\Big\|\sum_{k=1}^n \beta_k(\alpha) y_k\Big\|\Big)$$
BONUS: EQUALITY CONSTRAINED OPTIMIZATION (CONT.)
Proof (cont.)
However, since $M$ is non-singular, $d_1 \|\beta(\alpha)\| \le \|M\beta(\alpha)\| \le d_2 \|\beta(\alpha)\|$, and since the $y_k$'s are linearly independent, $d_3 \|\beta(\alpha)\| \le \big\|\sum_{k=1}^n \beta_k(\alpha) y_k\big\| \le d_4 \|\beta(\alpha)\|$, for some $d_1, \dots, d_4 > 0$. Therefore, from the equation above, $\|\beta(\alpha)\| = o(\alpha)$
Along the curve $\alpha \mapsto x_0 + \alpha h + \sum_{k=1}^n \beta_k(\alpha) y_k$, $f$ assumes its local minimum at $x_0$, so
$$\delta f(x_0; h) = \frac{d}{d\alpha}\Big|_{\alpha=0} f\Big(x_0 + \alpha h + \sum_{k=1}^n \beta_k(\alpha) y_k\Big) = \frac{d}{d\alpha}\Big|_{\alpha=0} f\big(x_0 + \alpha h + o(\alpha)\big) = 0 \ \blacksquare$$
BONUS: EQUALITY CONSTRAINED OPTIMIZATION (CONT.)
Lemma. Let $g_1, \dots, g_n$ be linearly independent linear functionals on a vector space $X$, and let $f$ be a linear functional on $X$ such that $f(x) = 0$ for all $x \in X$ such that $g_i(x) = 0$ for all $i = 1, \dots, n$. Then, $f \in \operatorname{lin}\{g_1, \dots, g_n\}$

Proof. Consider the linear operator $G: X \to \mathbb{R}^{n+1}$ given by $G_i(x) = g_i(x)$ ($i = 1, \dots, n$) and $G_{n+1}(x) = f(x)$. Notice that $R(G)$ is a linear subspace of $\mathbb{R}^{n+1}$, and due to the condition on $f$, it does not intersect $\{(0, \dots, 0, x) : x \ne 0\}$, so $\dim R(G) < n+1$, and there is a $\lambda \in \mathbb{R}^{n+1} \setminus \{0\}$ such that $\lambda_1 g_1(x) + \cdots + \lambda_n g_n(x) + \lambda_{n+1} f(x) = 0$ for all $x \in X$. Due to the linear independence of the $g_i$'s, $\lambda_{n+1} \ne 0$, so dividing by $-\lambda_{n+1}$ we have that $f = \tilde\lambda_1 g_1 + \cdots + \tilde\lambda_n g_n$ for some $\tilde\lambda_i$'s ■
From this lemma and Theorem 1, it follows immediately that
Theorem 2 (Lagrange multipliers). Under the conditions of Theorem 1, there exist constants $\lambda_1, \dots, \lambda_n$ such that
$$\delta f(x_0; h) + \sum_{i=1}^n \lambda_i\, \delta g_i(x_0; h) = 0 \quad \text{for all } h \in X$$
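A minimal finite-dimensional illustration (assumed example, not from the slides): minimizing $f(x) = x_1 + x_2$ subject to $g(x) = x_1^2 + x_2^2 - 2 = 0$, the minimizer is $x_0 = (-1, -1)$, and the multiplier condition holds with $\lambda = 1/2$.

```python
# Sketch: verify grad f(x0) + lambda * grad g(x0) = 0 at the known minimizer.
import numpy as np

x0 = np.array([-1.0, -1.0])       # minimizer of x1 + x2 on the circle of radius sqrt(2)
grad_f = np.array([1.0, 1.0])     # gradient of f(x) = x1 + x2
grad_g = 2 * x0                   # gradient of g(x) = x1^2 + x2^2 - 2
lam = 0.5
print(grad_f + lam * grad_g)      # [0. 0.]: the Lagrange condition holds
```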
BONUS: EQUALITY CONSTRAINED OPTIMIZATION (CONT.)
Example: maximum entropy spectral analysis (Burg's) method
Consider the problem of estimating the spectrum $\Phi$ of a stationary Gaussian stochastic process, given estimates of the first $n$ autocovariance coefficients. This problem is ill-posed, but one can appeal to the maximum entropy method to obtain an estimate:
$$\max_{\Phi \in C([-\pi,\pi])}\ H(\Phi) = \tfrac{1}{2}\ln 2\pi e + \frac{1}{4\pi}\int_{-\pi}^{\pi} \ln \Phi(\omega)\,d\omega \qquad \text{(entropy rate of a Gaussian process)}$$
$$\text{s.t.}\quad \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{ik\omega}\,\Phi(\omega)\,d\omega = c_k,\ \ k = 0, 1, \dots, n \qquad \text{(autocovariance coefficients)}$$
$$\Phi(\omega) \ge 0 \ \text{ for all } \omega \in [-\pi,\pi] \qquad \text{(non-negativity constraint)}$$
We will solve this problem using calculus of variations, ignoring the non-negativity constraint (since the solution, as will be seen, is already non-negative). We will assume that the autocovariance coefficients $c_0, c_1, \dots, c_n$ are such that the problem has feasible solutions
BONUS: EQUALITY CONSTRAINED OPTIMIZATION (CONT.)
Example: maximum entropy spectral analysis (Burg's) method (cont.)
Using Lagrange multipliers, an optimal solution $\Phi^{\text{opt}}$ should satisfy
$$\frac{1}{4\pi}\int_{-\pi}^{\pi} \frac{h(\omega)}{\Phi^{\text{opt}}(\omega)}\,d\omega + \frac{1}{2\pi}\sum_{k=-n}^{n} \lambda_k \int_{-\pi}^{\pi} e^{ik\omega}\,h(\omega)\,d\omega = 0, \quad \text{for all } h \in C([-\pi,\pi])$$
Hence, using the fundamental lemma of calculus of variations,
$$\frac{1}{\Phi^{\text{opt}}(\omega)} + 2\sum_{k=-n}^{n} \lambda_k e^{ik\omega} = 0 \iff \Phi^{\text{opt}}(\omega) = -\frac{1}{2\sum_{k=-n}^{n} \lambda_k e^{ik\omega}}$$
where the quantities $\lambda_0, \lambda_1, \dots, \lambda_n$ (with $\lambda_{-k} = \bar\lambda_k$, since $\Phi^{\text{opt}}$ is real) can be determined from the autocovariance coefficients $c_0, c_1, \dots, c_n$. This formula shows that the maximum-entropy spectrum corresponds to that of an "auto-regressive process"
Remark. The fact that $\Phi^{\text{opt}}$ is a maximizer of the optimization problem follows from the concavity of the cost function, and its non-negativity is due to the fact that $H(\Phi) = -\infty$ if $\Phi$ is negative inside an interval of $[-\pi,\pi]$ (yielding a lower cost than any feasible $\Phi$)
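Computationally, the multipliers need not be found directly: the maximum-entropy spectrum is an AR($n$) spectrum whose coefficients follow from $c_0, \dots, c_n$ via the Yule-Walker (Toeplitz) equations. The sketch below takes this standard route on assumed autocovariances (the data are illustrative, not from the slides):

```python
# Sketch: maximum-entropy (AR) spectrum from assumed autocovariances.
import numpy as np
from scipy.linalg import solve_toeplitz

c = np.array([1.0, 0.5, 0.2])            # assumed c_0, c_1, c_2
n = len(c) - 1

a = solve_toeplitz(c[:n], -c[1:])        # Yule-Walker: toeplitz(c_0..c_{n-1}) a = -(c_1..c_n)
sigma2 = c[0] + a @ c[1:]                # innovation variance

w = np.linspace(-np.pi, np.pi, 4001)
A_w = 1 + sum(a[k] * np.exp(-1j * (k + 1) * w) for k in range(n))
Phi = sigma2 / np.abs(A_w)**2            # maximum-entropy spectrum

# sanity check: (1 / 2 pi) * int Phi dw reproduces c_0
print(np.trapz(Phi, w) / (2 * np.pi))    # ~ 1.0 = c_0
```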