
OPTIMIZATION OF FUNCTIONALS

• Differentiability

• Inverse/Implicit Function Theorems

• Calculus of Variations

• Game Theory and the Minimax Theorem

• Lagrangian Duality


DIFFERENTIABILITY

Goal: Generalize the notion of derivative to functionals on normed spaces

Definition. Let $X, Y$ be normed spaces and $T: D \subseteq X \to Y$ (a possibly nonlinear transformation). If, for $x \in D$ and each $h \in X$, $\delta T(x;h) \in Y$ exists, is linear and continuous with respect to $h$, and satisfies

$$\lim_{\|h\|\to 0}\frac{\|T(x+h)-T(x)-\delta T(x;h)\|}{\|h\|}=0,$$
then $T$ is Fréchet differentiable at $x$, and $\delta T(x;h)$ is the Fréchet differential of $T$ at $x$ with increment $h$

If $f$ is a functional on $X$, then $\delta f(x;h)=\dfrac{d}{d\alpha}f(x+\alpha h)\Big|_{\alpha=0}$


DIFFERENTIABILITY (CONT.)

Examples
1. If $X=\mathbb{R}^n$ and $f(x)=f(x_1,\dots,x_n)$ is a functional having continuous partial derivatives with respect to each variable $x_i$, then

$$\delta f(x;h)=\sum_{i=1}^{n}\frac{\partial f}{\partial x_i}h_i$$

2. Let $X=C[0,1]$ and $f(x)=\int_0^1 g(x(t),t)\,dt$, where $g_x$ exists and is continuous with respect to $x$. Then $\delta f(x;h)=\dfrac{d}{d\alpha}\int_0^1 g(x(t)+\alpha h(t),t)\,dt\Big|_{\alpha=0}=\int_0^1 g_x(x(t),t)\,h(t)\,dt$
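A quick numerical sanity check of Example 2 (a sketch that is not part of the original slides; the specific $g$, $x$ and $h$ below are illustrative assumptions): compare a finite-difference approximation of $\frac{d}{d\alpha}f(x+\alpha h)$ at $\alpha=0$ with the closed-form differential $\int_0^1 g_x(x(t),t)h(t)\,dt$.

```python
import numpy as np

# Sketch: check delta f(x;h) = int_0^1 g_x(x(t),t) h(t) dt for g(x,t) = t*x^3
# (the choices of g, x and h are illustrative, not from the slides)
t = np.linspace(0.0, 1.0, 2001)
x = np.sin(2 * np.pi * t)            # a point of C[0,1]
h = t * (1.0 - t)                    # an increment in C[0,1]

def f(xf):
    return np.trapz(t * xf**3, t)    # f(x) = int_0^1 t * x(t)^3 dt

eps = 1e-6
fd = (f(x + eps * h) - f(x - eps * h)) / (2 * eps)   # d/dalpha f(x + alpha h) at alpha = 0
analytic = np.trapz(3 * t * x**2 * h, t)             # int_0^1 g_x(x(t),t) h(t) dt

print(fd, analytic)                  # the two values should agree to several digits
```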


DIFFERENTIABILITY (CONT.)

Properties
1. If $T$ has a Fréchet differential, it is unique
Proof. If $\delta T(x;h)$, $\delta' T(x;h)$ are Fréchet differentials of $T$, and $\varepsilon>0$, then $\|\delta T(x;h)-\delta' T(x;h)\| \le \|T(x+h)-T(x)-\delta T(x;h)\| + \|T(x+h)-T(x)-\delta' T(x;h)\| \le \varepsilon\|h\|$ for $\|h\|$ small. Thus, $h\mapsto\delta T(x;h)-\delta' T(x;h)$ is a bounded operator with norm 0, i.e., $\delta T(x;h)=\delta' T(x;h)$ ■

2. If $T$ is Fréchet differentiable at $x\in D$, where $D$ is open, then $T$ is continuous at $x$
Proof. Given $\varepsilon>0$, there is a $\delta>0$ such that $\|T(x+h)-T(x)-\delta T(x;h)\|<\varepsilon\|h\|$ whenever $\|h\|<\delta$, i.e., $\|T(x+h)-T(x)\|<\varepsilon\|h\|+\|\delta T(x;h)\|\le(\varepsilon+M)\|h\|$, where $M$ is the norm of $\delta T(x;\cdot)$, so $T$ is continuous at $x$ ■

If $T: D\subseteq X\to Y$ is Fréchet differentiable throughout $D$, then the Fréchet differential is of the form $\delta T(x;h)=T'(x)h$, where $T'(x)\in L(X,Y)$ is the Fréchet derivative of $T$ at $x$

Also, if $x\mapsto T'(x)$ is continuous in some open $S\subseteq D$, then $T$ is continuously Fréchet differentiable in $S$

If $f$ is a functional on $D$, so that $\delta f(x;h)=f'(x)h$, then $f'(x)\in X^*$ is the gradient of $f$ at $x$


DIFFERENTIABILITY (CONT.)

Much of the theory for ordinary derivatives extends to Fréchet derivatives:

Properties

1. (Chain rule). Let $S: D\subseteq X\to E\subseteq Y$ and $P: E\to Z$ be Fréchet differentiable at $x\in D$ and $y=S(x)\in E$, respectively, where $X,Y,Z$ are normed spaces and $D,E$ are open sets. Then $T=PS$ is Fréchet differentiable at $x$, and $T'(x)=P'(y)S'(x)$

Proof. If $x, x+h\in D$, then $T(x+h)-T(x)=P[S(x+h)]-P[S(x)]=P(y+g)-P(y)$, where $g=S(x+h)-S(x)$. Now, $P(y+g)-P(y)-P'(y)g=o(\|g\|)$, $g-S'(x)h=o(\|h\|)$ and $\|g\|=O(\|h\|)$, so $T(x+h)-T(x)-P'(y)S'(x)h=o(\|h\|)+o(\|g\|)=o(\|h\|)$, which shows that $T'(x)h=P'(y)S'(x)h$ ■


DIFFERENTIABILITY (CONT.)

Properties (cont.)

2. (Mean value theorem). Let $T$ be Fréchet differentiable on an open domain $D$, and $x\in D$. Suppose that $x+\alpha h\in D$ for all $\alpha\in[0,1]$. Then
$$\|T(x+h)-T(x)\|\le\|h\|\cdot\sup_{0<\alpha<1}\|T'(x+\alpha h)\|$$

Proof. Fix $y^*\in Y^*$, $\|y^*\|=1$, and let $\varphi(t):=\langle y^*, T(x+th)\rangle$ ($t\in[0,1]$), which is differentiable. Let $\gamma(t)=\varphi(t)-(1-t)\varphi(0)-t\varphi(1)$, so $\gamma(0)=\gamma(1)=0$ and $\gamma'(t)=\varphi'(t)+\varphi(0)-\varphi(1)$. If $\gamma\equiv 0$, then $\gamma'\equiv 0$; if not, there is a $\tau\in(0,1)$ such that, e.g., $\gamma(\tau)>0$, so there is an $s\in(0,1)$ such that $\gamma(s)=\max_{t\in[0,1]}\gamma(t)$. Now, $\gamma(s+\delta)-\gamma(s)\le 0$ whenever $0\le s+\delta\le 1$, so $\gamma'(s)=0$, and $|\varphi(1)-\varphi(0)|=|\varphi'(s)|=|\langle y^*, T'(x+sh)h\rangle|\le\|h\|\sup_{0<\alpha<1}\|T'(x+\alpha h)\|$, so taking the sup over $\|y^*\|=1$ yields the result ■


DIFFERENTIABILITY (CONT.)

Extrema
The minima/maxima of a functional can be found by setting its Fréchet derivative to zero!

Definition. $x_0\in\Omega$ is a local minimum of $f:\Omega\subseteq X\to\mathbb{R}$ if there is a neighborhood $B$ of $x_0$ where $f(x_0)\le f(x)$ on $\Omega\cap B$, and a strict local minimum if $f(x_0)<f(x)$ for all $x\in\Omega\cap B\setminus\{x_0\}$

Theorem. If $f$ is Fréchet differentiable on $X$, then a necessary condition for $f$ to have a local minimum/maximum at $x_0\in X$ is that $\delta f(x_0;h)=0$ for all $h\in X$
Proof. If $\delta f(x_0;h)\ne 0$, let $h_0$ be of unit norm such that $\delta f(x_0;h_0)>0$. As $\|h\|\to 0$, $|f(x_0+h)-f(x_0)-\delta f(x_0;h)|/\|h\|\to 0$, so given $\varepsilon\in(0,\delta f(x_0;h_0))$ there is a $\gamma>0$ such that $f(x_0+\gamma h_0)>f(x_0)+\delta f(x_0;\gamma h_0)-\varepsilon\gamma>f(x_0)$, while $f(x_0-\gamma h_0)<f(x_0)-\delta f(x_0;\gamma h_0)+\varepsilon\gamma<f(x_0)$, so $x_0$ is not a local minimum/maximum ■

A generalization of this result to constrained optimization is:

Theorem. If $x_0$ minimizes $f$ on the convex set $\Omega\subset X$, and $f$ is Fréchet differentiable at $x_0$, then $\delta f(x_0;x-x_0)\ge 0$ for all $x\in\Omega$
Proof. For $x\in\Omega$, let $h=x-x_0$ and note that $x_0+\alpha h\in\Omega$ ($0\le\alpha\le 1$) since $\Omega$ is convex. The rest of the proof is similar to the previous one ■
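A finite-dimensional illustration of the last theorem (a sketch with made-up data, not from the slides): minimize a convex quadratic over a box $\Omega$ with scipy and check that the variational inequality $\delta f(x_0;x-x_0)=\nabla f(x_0)^T(x-x_0)\ge 0$ holds at the computed minimizer for sampled $x\in\Omega$.

```python
import numpy as np
from scipy.optimize import minimize

# Sketch: check grad f(x0)^T (x - x0) >= 0 for all x in a convex set Omega
# f(x) = ||x - a||^2, Omega = [0,1]^3  (a is an arbitrary illustrative choice)
rng = np.random.default_rng(0)
a = np.array([1.5, -0.3, 0.4])

f = lambda x: np.sum((x - a) ** 2)
grad = lambda x: 2 * (x - a)

x0 = minimize(f, x0=np.full(3, 0.5), jac=grad, bounds=[(0.0, 1.0)] * 3).x

samples = rng.uniform(0.0, 1.0, size=(1000, 3))   # points of Omega
print(np.min((samples - x0) @ grad(x0)))           # >= 0 up to solver tolerance
```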


INVERSE/IMPLICIT FUNCTION THEOREMS

The inverse and implicit function theorems are fundamental to many fields; they constitute the analytical backbone of differential geometry, which in turn is essential to nonlinear system theory

Theorem (Inverse Function Theorem)
Let $X, Y$ be Banach spaces, and $x_0\in X$. Assume that $T: X\to Y$ is continuously Fréchet differentiable in a neighborhood of $x_0$, and that $T'(x_0)$ is invertible. Then, there is a neighborhood $U$ of $x_0$ such that $T$ is invertible on $U$, and both $T$ and $T^{-1}$ are continuous. Furthermore, $T^{-1}$ is continuously Fréchet differentiable in $T(U)$, with derivative $[T'(T^{-1}(y))]^{-1}$ ($y\in T(U)$)

Proof. (1) Invertibility: Since $T'(x_0)$ is invertible, by translation and multiplication by a linear map, assume w.l.o.g. that $x_0=0$, $T(x_0)=0$ and $T'(x_0)=I$. Consider $T_y(x)=x-T(x)+y$ for $y\in X$; note that a fixed point of $T_y$ is precisely an $x$ such that $T(x)=y$. Define the ball $B_R:=\{x\in X:\|x\|\le R\}$, which is complete. Let $F(x)=x-T(x)$. By the mean value theorem, $\|F(x)-F(x')\|\le\sup_{y\in B_R}\|F'(y)\|\,\|x-x'\|$ for all $x,x'\in B_R$, and since $F'(0)=0$, given a fixed $\varepsilon\in(0,1)$, if $R>0$ is small enough, $\|F(x)-F(x')\|\le\varepsilon\|x-x'\|$


INVERSE/IMPLICIT FUNCTION THEOREMS (CONT.)

Proof (cont.) Suppose $\|y\|\le R(1-\varepsilon)$. Note that, if $x\in B_R$, $\|T_y(x)\|\le\|F(x)\|+\|y\|\le\varepsilon\|x\|+R(1-\varepsilon)\le R$, so $T_y(B_R)\subseteq B_R$, and for $x,x'\in B_R$, $\|T_y(x)-T_y(x')\|\le\|F(x)-F(x')\|\le\varepsilon\|x-x'\|$, so $T_y$ is a contraction. By the Banach fixed point theorem, $T_y$ has a unique fixed point, i.e., if $\|y\|$ is small enough, there is a unique $x\in B_R$ such that $T(x)=y$, so $T^{-1}: B_{R(1-\varepsilon)}\to B_R$ exists

(2) Continuity: Since $T$ is Fréchet differentiable in $B_R$, it is continuous there. For $y, y_0\in B_{R(1-\varepsilon)}$, $\|T_y(x)-T_{y_0}(x)\|=\|y-y_0\|\to 0$ as $y\to y_0$, so by the last part of the Banach fixed point theorem, $T^{-1}$ is continuous

(3) Continuous differentiability: Consider a neighborhood $V\subseteq B_R$ of 0 where $T'$ is invertible. Let $W=T(V)$, $y_0, y\in W$ and $x_0=T^{-1}(y_0)$, $x=T^{-1}(y)$. Then,


INVERSE/IMPLICIT FUNCTION THEOREMS (CONT.)

Proof (cont.)

$$\frac{\|T^{-1}(y)-T^{-1}(y_0)-[T'(x_0)]^{-1}(y-y_0)\|}{\|y-y_0\|}=\frac{\|x-x_0-[T'(x_0)]^{-1}(T(x)-T(x_0))\|}{\|T(x)-T(x_0)\|}$$
$$\overset{(*)}{\le}\ \|[T'(x_0)]^{-1}\|\left(\frac{\|T(x)-T(x_0)-T'(x_0)(x-x_0)\|}{\|x-x_0\|}\right)\left(\frac{\|x-x_0\|}{\|T(x)-T(x_0)\|}\right)$$

The 2nd factor tends to 0 as $x\to x_0$, while the 3rd factor is bounded, since its reciprocal is bounded away from zero:

$$\liminf_{x\to x_0}\frac{\|T(x)-T(x_0)\|}{\|x-x_0\|}\ge\liminf_{x\to x_0}\left(\frac{\|T'(x_0)(x-x_0)\|}{\|x-x_0\|}-\frac{\|T(x)-T(x_0)-T'(x_0)(x-x_0)\|}{\|x-x_0\|}\right)=\liminf_{x\to x_0}\frac{\|T'(x_0)(x-x_0)\|}{\|x-x_0\|}\ge\frac{1}{\|[T'(x_0)]^{-1}\|}>0$$

Hence, the left hand side of $(*)$ tends to 0, and $T^{-1}$ has Fréchet derivative $[T'(x_0)]^{-1}$ at $y_0$ ■
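The contraction argument is constructive: after the normalization used in the proof, solving $T(x)=y$ near $x_0$ amounts to iterating the map $T_y$, i.e., $x\leftarrow x-[T'(x_0)]^{-1}(T(x)-y)$. A small finite-dimensional sketch (the map $T$ and the target $y$ below are illustrative choices, not from the slides):

```python
import numpy as np

# Sketch: solve T(x) = y locally via the fixed-point iteration from the proof,
# x <- x - [T'(x0)]^{-1} (T(x) - y), for an illustrative smooth map T: R^2 -> R^2
def T(x):
    return np.array([x[0] + 0.1 * np.sin(x[1]),
                     x[1] + 0.1 * x[0] ** 2])

def T_jac(x):
    return np.array([[1.0, 0.1 * np.cos(x[1])],
                     [0.2 * x[0], 1.0]])

x0 = np.zeros(2)
J0_inv = np.linalg.inv(T_jac(x0))       # [T'(x0)]^{-1}, invertible here
y = np.array([0.05, -0.02])             # a target close to T(x0) = 0

x = x0.copy()
for _ in range(50):                     # contraction => geometric convergence
    x = x - J0_inv @ (T(x) - y)

print(T(x) - y)                         # residual should be near machine precision
```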


INVERSE/IMPLICIT FUNCTION THEOREMS (CONT.)

Theorem (Implicit Function Theorem)
Let $X, Y, Z$ be Banach spaces, $A$ an open subset of $X\times Y$, and $f: A\to Z$ a continuously Fréchet differentiable function, with derivative $[f_x\ \ f_y]$. Let $(x_0,y_0)\in A$ be such that $f(x_0,y_0)=0$, and assume that $f_y(x_0,y_0)$ is invertible. Then, there are open sets $W\subseteq X$ and $V\subseteq A$ such that $x_0\in W$, $(x_0,y_0)\in V$, and a function $g: W\to Y$ Fréchet differentiable at $x_0$ such that $(x,g(x))\in V$ and $f(x,g(x))=0$ for all $x\in W$. Moreover, $g'(x_0)=-[f_y(x_0,y_0)]^{-1}f_x(x_0,y_0)$

Proof. Define the continuously differentiable function $F: A\to X\times Z$ by $F(x,y)=(x,f(x,y))$. Note that $F(x_0,y_0)=(x_0,0)$ and

$$F'=\begin{bmatrix} I & 0\\ f_x & f_y\end{bmatrix},\qquad [F'(x_0,y_0)]^{-1}=\begin{bmatrix} I & 0\\ -[f_y(x_0,y_0)]^{-1}f_x(x_0,y_0) & [f_y(x_0,y_0)]^{-1}\end{bmatrix}$$
i.e., $F'(x_0,y_0)$ is invertible. By the inverse function theorem, there is an open $V\subseteq A$ where $F$ is invertible and $F^{-1}$ is continuously differentiable. Since $F^{-1}$ has the form $F^{-1}(x,z)=(x,G(x,z))$ for some $G: F(V)\to Y$, the function $g: W\to Y$ given by $g(x)=G(x,0)$, where $W=\{x\in X:(x,0)\in F(V)\}$, satisfies the conditions of the theorem ■


INVERSE/IMPLICIT FUNCTION THEOREMS (CONT.)

Application to initial-value problems
Consider the initial-value problem

$$\frac{dx(t)}{dt}=f(x,t),\quad t\in[a,b],\qquad x(a)=\xi\in\mathbb{R}^n$$

where $f$ is continuously differentiable, and $x\in C([a,b],\mathbb{R}^n)$

We want to study the dependence of $x$ on $\xi$. To this end, define the function $\Phi: C([a,b],\mathbb{R}^n)\times\mathbb{R}^n\to C([a,b],\mathbb{R}^n)$ as

$$\Phi(x,\xi)(t)=x(t)-\xi-\int_a^t f(x(s),s)\,ds,\quad t\in[a,b]$$

Notice that x solves the initial-value problem iff Φ(x,ξ)=0. Now, Φ is continuously differentiable, and it satisfies the conditions of the implicit function theorem (check this!), which implies that x depends on ξ in a differentiable manner!
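A numerical illustration of this differentiable dependence (a sketch; the scalar right-hand side $f$ and the data below are illustrative choices, not from the slides): estimate $\partial x(t;\xi)/\partial\xi$ by a finite difference of two solutions of the initial-value problem and compare with the known sensitivity of a linear ODE.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Sketch: the map xi -> x(.;xi) is differentiable; estimate its derivative
# by finite differences for the illustrative scalar ODE dx/dt = -x + sin(t)
f = lambda t, x: -x + np.sin(t)
a, b = 0.0, 5.0
t_eval = np.linspace(a, b, 101)

def solve(xi):
    sol = solve_ivp(f, (a, b), [xi], t_eval=t_eval, rtol=1e-9, atol=1e-12)
    return sol.y[0]

xi, d_xi = 1.0, 1e-6
sens = (solve(xi + d_xi) - solve(xi - d_xi)) / (2 * d_xi)   # approx. dx(t)/dxi

# For this linear ODE the sensitivity is exactly exp(-(t - a)); compare:
print(np.max(np.abs(sens - np.exp(-(t_eval - a)))))          # should be tiny
```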


CALCULUS OF VARIATIONS

A classical problem is to find a function $x$ on $[t_1,t_2]$ that minimizes $J=\int_{t_1}^{t_2}f[x(t),\dot{x}(t),t]\,dt$

Assume that $x\in D[t_1,t_2]$, the space of real-valued continuously differentiable functions on $[t_1,t_2]$, with norm $\|x\|=\max_{t_1\le t\le t_2}|x(t)|+\max_{t_1\le t\le t_2}|\dot{x}(t)|$
Also, the end points $x(t_1)$ and $x(t_2)$ are assumed fixed

If $D_h[t_1,t_2]$ is the subspace consisting of those $x\in D[t_1,t_2]$ such that $x(t_1)=x(t_2)=0$, then the necessary condition for the minimization of $J$ is

$$\delta J(x;h)=0,\quad\text{for all } h\in D_h[t_1,t_2]$$


CALCULUS OF VARIATIONS (CONT.)

We have

$$\begin{aligned}\delta J(x;h)&=\frac{d}{d\alpha}\int_{t_1}^{t_2}f(x+\alpha h,\dot{x}+\alpha\dot{h},t)\,dt\,\Big|_{\alpha=0}\\ &=\int_{t_1}^{t_2}f_x(x,\dot{x},t)\,h(t)\,dt+\int_{t_1}^{t_2}f_{\dot{x}}(x,\dot{x},t)\,\dot{h}(t)\,dt\quad\text{(integrating by parts, assuming }\tfrac{d}{dt}f_{\dot{x}}(x,\dot{x},t)\text{ is continuous in }t)\\ &=\int_{t_1}^{t_2}\Big[f_x(x,\dot{x},t)-\frac{d}{dt}f_{\dot{x}}(x,\dot{x},t)\Big]h(t)\,dt\end{aligned}$$

Lemma (Fundamental lemma of calculus of variations)

If $\alpha\in C[t_1,t_2]$, and $\int_{t_1}^{t_2}\alpha(t)h(t)\,dt=0$ for every $h\in D_h[t_1,t_2]$, then $\alpha=0$
Proof. If, say, $\alpha(\bar{t})>0$ for some $\bar{t}\in(t_1,t_2)$, there is an interval $(\tau_1,\tau_2)$ where $\alpha$ is strictly positive. Pick $h(t)=(t-\tau_1)^2(t-\tau_2)^2$ for $t\in(\tau_1,\tau_2)$ and $h(t)=0$ otherwise. This gives $\int_{t_1}^{t_2}\alpha(t)h(t)\,dt>0$ ■

Using this result we obtain
$$\delta J(x;h)=0\ \text{for all } h\in D_h[t_1,t_2]\iff f_x(x,\dot{x},t)-\frac{d}{dt}f_{\dot{x}}(x,\dot{x},t)=0\quad\text{(Euler-Lagrange equation)}$$


CALCULUS OF VARIATIONS (CONT.)

Example (minimum arc length)
Given $t_1, t_2, x(t_1), x(t_2)$, determine the curve of minimum arc length connecting these points

Notice that the distance between the points $(t,x(t))$ and $(t+\Delta t, x(t+\Delta t))$ is

$$\sqrt{(x(t+\Delta t)-x(t))^2+\Delta t^2}=\sqrt{(\dot{x}(t)\Delta t+o(\Delta t))^2+\Delta t^2}=\sqrt{1+\dot{x}(t)^2}\,\Delta t+o(\Delta t)$$

hence the total arc length, by integration, is $J=\int_{t_1}^{t_2}\sqrt{1+\dot{x}(t)^2}\,dt$

Using the Euler-Lagrange equation, we obtain

d ∂ 1+ x2 = 0 !dt ∂x or !x = constant . Therefore, the extremizing arc is the straight line connecting these points


GAME THEORY AND THE MINIMAX THEOREM

Two-Person Zero-Sum Games

Consider a problem with two players: I and II. If player I chooses a strategy $x\in X$, and player II chooses a strategy $y\in Y$, then I gains, and II loses, an amount (payoff) $J(x,y)$. Each player wants to maximize its own payoff

Example: Matching pennies

Player I wants to maximize $\min_{y\in Y}J(x,y)$ wrt $x$; player II wants to minimize $\max_{x\in X}J(x,y)$ wrt $y$. If $V_*=\max_{x\in X}\min_{y\in Y}J(x,y)$ and $V^*=\min_{y\in Y}\max_{x\in X}J(x,y)$, and $V_*=V^*$, then $V=V_*=V^*$ is the value of the game

Payoff $J(x,y)$ to player I:

                 Player II
                  y1    y2
  Player I  x1     1    −1
            x2    −1     1

Not every game has a value!


GAME THEORY AND THE MINIMAX THEOREM (CONT.)

Mixed Strategies
Instead of choosing a particular strategy, each player can choose a mixed/randomized strategy, i.e., a probability distribution over its strategy space $X$ or $Y$: $p_X(x)$, $p_Y(y)$ (assuming that $X$ and $Y$ are finite). The values of the game are

$$V_*=\max_{p_X}\min_{p_Y}\sum_{x\in X}\sum_{y\in Y}J(x,y)\,p_X(x)\,p_Y(y),\qquad V^*=\min_{p_Y}\max_{p_X}\sum_{x\in X}\sum_{y\in Y}J(x,y)\,p_X(x)\,p_Y(y)$$

The fundamental (minimax) theorem of game theory states that $V_*=V^*$
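For finite games the mixed-strategy value can be computed by linear programming: player I maximizes $v$ subject to $(A^Tx)_j\ge v$ for every pure strategy $j$ of player II, $x\ge 0$, $\sum_i x_i=1$, and player II solves the symmetric problem. A sketch for matching pennies (the use of scipy.optimize.linprog is an illustrative choice, not something prescribed by the slides):

```python
import numpy as np
from scipy.optimize import linprog

# Sketch: value of a matrix game via LP (payoff to player I; rows = I's strategies)
A = np.array([[ 1.0, -1.0],
              [-1.0,  1.0]])                 # matching pennies
n, m = A.shape

# Player I: variables (x, v); maximize v s.t. (A^T x)_j >= v, sum x = 1, x >= 0
c1 = np.zeros(n + 1); c1[-1] = -1.0          # linprog minimizes, so minimize -v
A_ub1 = np.hstack([-A.T, np.ones((m, 1))])   # v - (A^T x)_j <= 0
A_eq1 = np.hstack([np.ones((1, n)), np.zeros((1, 1))])
res1 = linprog(c1, A_ub=A_ub1, b_ub=np.zeros(m), A_eq=A_eq1, b_eq=[1.0],
               bounds=[(0, None)] * n + [(None, None)])

# Player II: variables (y, w); minimize w s.t. (A y)_i <= w, sum y = 1, y >= 0
c2 = np.zeros(m + 1); c2[-1] = 1.0
A_ub2 = np.hstack([A, -np.ones((n, 1))])     # (A y)_i - w <= 0
A_eq2 = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
res2 = linprog(c2, A_ub=A_ub2, b_ub=np.zeros(n), A_eq=A_eq2, b_eq=[1.0],
               bounds=[(0, None)] * m + [(None, None)])

print(-res1.fun, res2.fun)   # lower and upper values; both 0 for matching pennies
```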


GAME THEORY AND THE MINIMAX THEOREM (CONT.)

Proof of Minimax Theorem
We need to establish, equivalently, that for any matrix $A\in\mathbb{R}^{n\times m}$

$$V_*:=\max_{\substack{x\in\mathbb{R}^n_+\\ x_1+\cdots+x_n=1}}\ \min_{\substack{y\in\mathbb{R}^m_+\\ y_1+\cdots+y_m=1}} x^TAy \;=\; \min_{\substack{y\in\mathbb{R}^m_+\\ y_1+\cdots+y_m=1}}\ \max_{\substack{x\in\mathbb{R}^n_+\\ x_1+\cdots+x_n=1}} x^TAy =: V^*$$

First notice that, for every x, y :

$$\min_{\substack{y'\in\mathbb{R}^m_+\\ y'_1+\cdots+y'_m=1}} x^TAy'\le x^TAy\le\max_{\substack{x'\in\mathbb{R}^n_+\\ x'_1+\cdots+x'_n=1}} x'^TAy,$$
so taking the max wrt $x$ and the min wrt $y$ gives $V_*\le V^*$

We need to show that $V_*\ge V^*$, by showing that there is an $x_0$ such that $\min_{y\in\mathbb{R}^m_+,\ y_1+\cdots+y_m=1} x_0^TAy=V^*$


GAME THEORY AND THE MINIMAX THEOREM (CONT.)

Reformulation as an S-game
To gain geometric insight, we can simplify the problem by defining the risk set

$$S=\{Ay\in\mathbb{R}^n: y\in\mathbb{R}^m_+,\ y_1+\cdots+y_m=1\}$$
so
$$\min_{\substack{y\in\mathbb{R}^m_+\\ y_1+\cdots+y_m=1}} x^TAy=\min_{s\in S}x^Ts$$

Example

        y1   y2   y3   y4
  x1     2    2    0   0.5
  x2     0    1    2    1

$S$ is the convex hull of the columns of $A$ (the figure sketches $S$ in the $(x_1,x_2)$-plane as the convex hull of the points $(2,0)^T$, $(2,1)^T$, $(0,2)^T$, $(0.5,1)^T$)


GAME THEORY AND THE MINIMAX THEOREM (CONT.)

Back to the proof...
A minimax strategy, i.e., a $y_0$ such that $\max_{x\in\mathbb{R}^n_+,\ \sum_i x_i=1} x^TAy_0=\min_{y\in\mathbb{R}^m_+,\ \sum_j y_j=1}\ \max_{x\in\mathbb{R}^n_+,\ \sum_i x_i=1} x^TAy$, corresponds to an $s_0=Ay_0\in S$ of minimum $\infty$-norm, $\|s_0\|_\infty=\max\{s_{0,1},\dots,s_{0,n}\}$

Let $Q_\alpha:=\{s\in\mathbb{R}^n:\max\{s_1,\dots,s_n\}\le\alpha\}$. Then
$$V^*=\inf\{\alpha\ge 0: Q_\alpha\cap S\ne\emptyset\}$$

To find an $x_0$ such that $\min_{s\in S}x_0^Ts=V^*$, we can use the separating hyperplane theorem to determine a hyperplane $H=\{s:\bar{x}^Ts=V^*\}$ (given by $\bar{x}$) separating $Q_{V^*}$ and $S$ ($\bar{x}$ has been scaled so that $\sum_i(\bar{x})_i=1$, and since $H$ contains the vertex $s^*=(V^*,\dots,V^*)$ of $Q_{V^*}$, $\bar{x}^Ts=\bar{x}^Ts^*=V^*\sum_i(\bar{x})_i=V^*$ on $H$)

(figure: the set $Q_{V^*}$ touching the risk set $S$ at $s_0$, with the separating hyperplane $H$ through the vertex $s^*$)

Then we can choose $x_0=\bar{x}$! This proves the minimax theorem ■


LAGRANGIAN DUALITY

Given a convex optimization problem in a normed space, our goal is to derive its (Lagrangian) dual. To formulate such a problem, we need to define an order relation:

Definitions
- A set $C$ in a real vector space $V$ is a cone if $x\in C$ implies that $\alpha x\in C$ for every $\alpha\ge 0$
- Given a convex cone $P$ in $V$ (positive cone), we say that $x\ge y$ for $x,y\in V$ when $x-y\in P$
- If $V$ is a normed space with positive cone $P$, $x>0$ means that $x\in\operatorname{int}P$
- Given the positive cone $P\subseteq V$, $P^\oplus:=\{x^*\in V^*: x^*(x)\ge 0 \text{ for all } x\in P\}$ is the positive cone in $V^*$. By Hahn-Banach, if $P$ is closed and $x\in V$, then $x^*(x)\ge 0$ for all $x^*\ge 0$ implies that $x\ge 0$
- If $X, Y$ are real vector spaces, $C\subseteq X$ is convex, and $P$ is the positive cone of $Y$, a function $f: C\to Y$ is convex if $f(\alpha x+(1-\alpha)y)\le\alpha f(x)+(1-\alpha)f(y)$ for all $x,y\in C$, $\alpha\in[0,1]$

Given a vector space $X$ and a normed space $Y$, let $\Omega$ be a convex subset of $X$, and $P$ be the positive cone of $Y$. Also, let $f:\Omega\to\mathbb{R}$ and $G:\Omega\to Y$ be convex functions

Consider the convex optimization problem

$$\min_{x\in X}\ f(x)\quad\text{s.t. } x\in\Omega,\ G(x)\le 0$$


LAGRANGIAN DUALITY (CONT.)

To analyze this convex optimization problem, we need to introduce a special function:

Definition
Let $\Gamma=\{y\in Y: \text{there exists an } x\in\Omega \text{ such that } G(x)\le y\}$; this set is convex (why?)
On $\Gamma$, the primal function $\omega:\Gamma\to\mathbb{R}$ is given by $\omega(y)=\inf\{f(x): x\in\Omega,\ G(x)\le y\}$
Notice that the original optimization problem corresponds to finding $\omega(0)$

Properties
(1) $\omega$ is convex
Proof.
$$\begin{aligned}\omega(\alpha y_1+(1-\alpha)y_2)&=\inf\{f(x): x\in\Omega,\ G(x)\le\alpha y_1+(1-\alpha)y_2\}\\ &\le\inf\{f(\alpha x_1+(1-\alpha)x_2): x_1,x_2\in\Omega,\ G(x_1)\le y_1,\ G(x_2)\le y_2\}\\ &\le\alpha\inf\{f(x): x\in\Omega,\ G(x)\le y_1\}+(1-\alpha)\inf\{f(x): x\in\Omega,\ G(x)\le y_2\}\\ &=\alpha\,\omega(y_1)+(1-\alpha)\,\omega(y_2)\qquad\blacksquare\end{aligned}$$

(2) $\omega$ is non-increasing, i.e., if $y_1\le y_2$ then $\omega(y_1)\ge\omega(y_2)$
Proof. Direct ■

(figure: sketch of the convex, non-increasing primal function $\omega(y)$)


LAGRANGIAN DUALITY (CONT.)

Duality theory for convex programming is based on the observation that, since $\omega$ is convex, its epigraph (i.e., the region above the curve of $\omega$ in $\Gamma\times\mathbb{R}$) is convex, and hence has a supporting hyperplane passing through the point $(0,\omega(0))$. To develop this idea, consider the normed space $Y\times\mathbb{R}$ with the norm $\|(y,r)\|=\|y\|+|r|$ for $y\in Y$ and $r\in\mathbb{R}$

Theorem
Assume that $P$ has a non-empty interior, and that there exists an $x_1\in\Omega$ such that $G(x_1)<0$ (i.e., $-G(x_1)$ is an interior point of $P$). Let
$$\mu_0=\inf\{f(x): x\in\Omega,\ G(x)\le 0\}\qquad(*)$$
and assume $\mu_0$ is finite. Then, there exists a $y_0^*\in P^\oplus$ such that
$$\mu_0=\inf\{f(x)+\langle G(x),y_0^*\rangle: x\in\Omega\}\qquad(**)$$
Furthermore, if the infimum in $(*)$ is achieved by some $x_0\in\Omega$, $G(x_0)\le 0$, then the infimum in $(**)$ is also achieved by $x_0$, and $\langle G(x_0),y_0^*\rangle=0$


LAGRANGIAN DUALITY (CONT.)

Proof. On $Y\times\mathbb{R}$, define the sets

$$A:=\{(y,r): r\ge f(x),\ y\ge G(x)\ \text{for some } x\in\Omega\}$$

$$B:=\{(y,r): r\le\mu_0,\ y\le 0\}$$

Since $f, G$ are convex, the sets $A, B$ are convex. By the definition of $\mu_0$, $B\cap\operatorname{int}A=\emptyset$. Also, since $P$ has an interior point, $B$ has a non-empty interior (why?). Then, by the separating hyperplane theorem, there is a non-zero $w_0^*=(y_0^*,r_0)\in(Y\times\mathbb{R})^*$ such that

$$\langle y_1,y_0^*\rangle+r_0r_1\ \ge\ \langle y_2,y_0^*\rangle+r_0r_2,\quad\text{for all }(y_1,r_1)\in A,\ (y_2,r_2)\in B$$

From the nature of $B$, it follows that $y_0^*\ge 0$ and $r_0\ge 0$. Since $(0,\mu_0)\in B$, we have that $\langle y,y_0^*\rangle+r_0r\ge r_0\mu_0$ for all $(y,r)\in A$. If $r_0=0$, then in particular $y_0^*\ne 0$ and $\langle G(x_1),y_0^*\rangle\ge 0$; but since $-G(x_1)>0$ and $y_0^*\ge 0$, we would have $\langle G(x_1),y_0^*\rangle<0$ (we know that $\langle G(x_1),y_0^*\rangle\le 0$; now, there exists a $y\in Y$ such that $\langle y,y_0^*\rangle>0$, so $G(x_1)+\varepsilon y<0$ for $\varepsilon$ sufficiently small, so if $\langle G(x_1),y_0^*\rangle=0$ we would have $\langle G(x_1)+\varepsilon y,y_0^*\rangle>0$, a contradiction). Therefore, $r_0>0$, and we can assume w.l.o.g. that $r_0=1$
With $r_0=1$, the separation inequality and $(0,\mu_0)\in B$ give $\langle y,y_0^*\rangle+r\ge\mu_0$ for all $(y,r)\in A$, so
$$\mu_0\le\inf\{\langle y,y_0^*\rangle+r:(y,r)\in A\}\le\inf\{f(x)+\langle G(x),y_0^*\rangle: x\in\Omega\}\le\inf\{f(x): x\in\Omega,\ G(x)\le 0\}=\mu_0,$$
which establishes the first part of the theorem. Now, if there is an $x_0\in\Omega$ such that $G(x_0)\le 0$ and $f(x_0)=\mu_0$, then $\mu_0\le f(x_0)+\langle G(x_0),y_0^*\rangle\le f(x_0)=\mu_0$, so $\langle G(x_0),y_0^*\rangle=0$ ■


LAGRANGIAN DUALITY (CONT.)

The expression $L(x,y^*)=f(x)+\langle G(x),y^*\rangle$, for $x\in\Omega$, $y^*\in P^\oplus$, is the Lagrangian of the optimization problem

Corollary (Lagrangian Dual). Under the conditions of the theorem,

$$\sup_{y^*\in P^\oplus} L(y^*)=\mu_0,\qquad\text{where } L(y^*):=\inf\{f(x)+\langle G(x),y^*\rangle: x\in\Omega\},$$
and the supremum is achieved by some $y_0^*\in P^\oplus$
Proof. The theorem established the existence of a $y_0^*\in P^\oplus$ such that $L(y_0^*)=\mu_0$, while for all $y^*\in P^\oplus$,
$$L(y^*)=\inf_{x\in\Omega}\{f(x)+\langle G(x),y^*\rangle\}\le\inf_{x\in\Omega,\ G(x)\le 0}\{f(x)+\langle G(x),y^*\rangle\}\le\inf_{x\in\Omega,\ G(x)\le 0}f(x)=\mu_0\quad\blacksquare$$

The dual problem typically provides useful information about the primal (original) problem, since their solutions are linked via the complementarity condition $\langle G(x_0),y_0^*\rangle=0$
Also, the dual problem always has a solution (under the conditions of the theorem), so it may be easier to analyze than the primal

Remark. If f is non-convex, ω may be non-convex, and the optimal cost of the dual problem provides only a lower bound on the optimal cost of the original problem


LAGRANGIAN DUALITY (CONT.)

Examples
1) Linear programming. Let $A\in\mathbb{R}^{m\times n}$, $b\in\mathbb{R}^n$ and $c\in\mathbb{R}^m$, and consider the problem

$$\mu_0=\min_{x\in\mathbb{R}^n}\ b^Tx\quad\text{s.t. } Ax\ge c,\ x\ge 0$$

Assume there is an $\bar{x}>0$ with $A\bar{x}>c$. Letting $f(x)=b^Tx$, $G(x)=c-Ax$ and $\Omega=\{x: x_i\ge 0 \text{ for all } i\}$, the corollary yields, for $y\ge 0$ (the positive orthant of $\mathbb{R}^m$ is self-dual),

$$L(y)=\inf\{b^Tx+y^T(c-Ax): x\ge 0\}=\inf\{(b-A^Ty)^Tx+y^Tc: x\ge 0\}=\begin{cases} y^Tc, & \text{if } b\ge A^Ty\\ -\infty, & \text{otherwise}\end{cases}$$
so the Lagrangian dual, corresponding to the standard dual linear program, is

$$\mu_0=\max_{y\in\mathbb{R}^m}\ c^Ty\quad\text{s.t. } A^Ty\le b,\ y\ge 0$$
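A small numerical check of this primal/dual pair (a sketch; the data $A$, $b$, $c$ below are arbitrary illustrative values satisfying the interior-point condition $A\bar{x}>c$ for some $\bar{x}>0$):

```python
import numpy as np
from scipy.optimize import linprog

# Sketch: primal min{b^T x : Ax >= c, x >= 0} vs dual max{c^T y : A^T y <= b, y >= 0}
# (illustrative data)
A = np.array([[1.0, 2.0],
              [3.0, 1.0]])
b = np.array([2.0, 3.0])
c = np.array([4.0, 5.0])

# Primal: linprog uses A_ub x <= b_ub, so rewrite Ax >= c as -Ax <= -c
primal = linprog(b, A_ub=-A, b_ub=-c, bounds=[(0, None)] * 2)

# Dual: max c^T y <=> min -c^T y, subject to A^T y <= b, y >= 0
dual = linprog(-c, A_ub=A.T, b_ub=b, bounds=[(0, None)] * 2)

print(primal.fun, -dual.fun)    # the two optimal values coincide (strong duality)
```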


LAGRANGIAN DUALITY (CONT.)

Examples (cont.)
2) Optimal control. Consider the system $\dot{x}(t)=Ax(t)+bu(t)$, where $x(t)\in\mathbb{R}^n$, $A\in\mathbb{R}^{n\times n}$, $b\in\mathbb{R}^n$ and $u(t)\in\mathbb{R}$. Given $x(t_0)$, the goal is to find an input $u$ on $[t_0,t_1]$ which minimizes $J(u)=\int_{t_0}^{t_1}u^2(t)\,dt$ while satisfying $x(t_1)\ge c$, where $c\in\mathbb{R}^n$. The solution of the system is

$$x(t_1)=e^{A(t_1-t_0)}x(t_0)+Ku,\qquad Ku:=\int_{t_0}^{t_1}e^{A(t_1-t)}b\,u(t)\,dt$$

so the problem corresponds to minimizing $J(u)$ subject to $Ku\ge c-e^{A(t_1-t_0)}x(t_0)=:d$
Assuming that $u\in L_2(t_0,t_1)$, the corollary gives the dual problem

$$\max_{y\ge 0}\ \inf_{u\in L_2(t_0,t_1)}\{J(u)+y^T(d-Ku)\}=\max_{y\ge 0}\left(\inf_{u\in L_2(t_0,t_1)}\int_{t_0}^{t_1}\big[u^2(t)-y^Te^{A(t_1-t)}b\,u(t)\big]\,dt+y^Td\right)=\max_{y\ge 0}\ y^TQy+y^Td$$

where $Q:=-\tfrac{1}{4}\int_{t_0}^{t_1}e^{A(t_1-t)}bb^Te^{A^T(t_1-t)}\,dt$. This is a simple finite-dimensional problem, and its solution, $y_{\rm opt}$, yields $u_{\rm opt}(t)=\tfrac{1}{2}\,y_{\rm opt}^Te^{A(t_1-t)}b$
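A numerical sketch of this example for a double integrator (all concrete values — $A$, $b$, $t_0$, $t_1$, $x(t_0)$, $c$ — are illustrative assumptions, not from the slides). It builds $Q$ and $d$ by quadrature, solves the finite-dimensional concave dual $\max_{y\ge 0}y^TQy+y^Td$, and reconstructs $u_{\rm opt}$:

```python
import numpy as np
from scipy.linalg import expm
from scipy.optimize import minimize

# Sketch: dual of  min int u^2 dt  s.t.  x(t1) >= c,  for a double integrator
# (A, b, the time interval, x(t0) and c are illustrative choices)
A = np.array([[0.0, 1.0], [0.0, 0.0]])
b = np.array([0.0, 1.0])
t0, t1 = 0.0, 1.0
x_init = np.array([0.0, 0.0])
c = np.array([1.0, 0.0])

ts = np.linspace(t0, t1, 2001)
Phi = np.array([expm(A * (t1 - t)) @ b for t in ts])       # rows: e^{A(t1-t)} b

d = c - expm(A * (t1 - t0)) @ x_init                        # constraint  K u >= d
Q = -0.25 * np.trapz(Phi[:, :, None] * Phi[:, None, :], ts, axis=0)

# Dual: max_{y >= 0} y^T Q y + y^T d  (concave, since Q is negative semidefinite)
res = minimize(lambda y: -(y @ Q @ y + y @ d), x0=np.ones(2), bounds=[(0, None)] * 2)
y_opt = res.x

u_opt = 0.5 * Phi @ y_opt                                   # u_opt(t) = (1/2) y^T e^{A(t1-t)} b
x_t1 = expm(A * (t1 - t0)) @ x_init + np.trapz(Phi * u_opt[:, None], ts, axis=0)
print("x(t1) =", x_t1)                                      # satisfies x(t1) >= c
print("J(u_opt) =", np.trapz(u_opt**2, ts), "  dual value =", -res.fun)
```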


BONUS: EQUALITY CONSTRAINED OPTIMIZATION

Problem
$$\min_{x\in\Omega}\ f(x)\quad\text{s.t. } g_i(x)=0,\ i=1,\dots,n$$
where $\Omega\subseteq X$ and $f, g_1,\dots,g_n$ are Fréchet differentiable on $X$

Theorem 1. Let $x_0\in\Omega$ be a local minimum of $f$ on the set of all $x\in\Omega$ such that $g_i(x)=0$, $i=1,\dots,n$, and assume that the functionals $\delta g_1(x_0;\cdot),\dots,\delta g_n(x_0;\cdot)$ are linearly independent (i.e., $x_0$ is a regular point). Then:

$$\delta f(x_0;h)=0\quad\text{for all } h \text{ such that } \delta g_i(x_0;h)=0 \text{ for all } i=1,\dots,n$$


BONUS: EQUALITY CONSTRAINED OPTIMIZATION (CONT.)

Proof
First, notice that there exist vectors $y_1,\dots,y_n\in X$ such that the matrix $M\in\mathbb{R}^{n\times n}$, $M_{ik}=\delta g_i(x_0;y_k)$, is non-singular. To see this, consider the linear mapping $G: X\to\mathbb{R}^n$, $[G(y)]_i=\delta g_i(x_0;y)$. The range of $G$ is a linear subspace of $\mathbb{R}^n$; if $\dim R(G)<n$, there would exist a $\lambda\in\mathbb{R}^n\setminus\{0\}$ such that $\lambda^TG(y)=0$ for all $y\in X$, i.e., $\{\delta g_i(x_0;\cdot)\}$ would be linearly dependent. Therefore, there exist vectors $y_1,\dots,y_n\in X$ such that $G(y_i)=e_i$, so $M=I$, which is non-singular

Fix $h\in X$ such that $\delta g_i(x_0;h)=0$ for all $i=1,\dots,n$, and consider the system of $n$ equations $g_i\big(x_0+\alpha h+\sum_{k=1}^n\beta_k y_k\big)=0$, $i=1,\dots,n$, in $\alpha,\beta_1,\dots,\beta_n$. The Jacobian of this system, $\big[\partial g_i/\partial\beta_k\big]_{\alpha=0,\beta=0}=M$, is non-singular. Therefore, by the implicit function theorem (in $\mathbb{R}^n$), there exists a function $\beta: U\subseteq\mathbb{R}\to\mathbb{R}^n$ in a neighborhood $U$ of 0 such that $\beta(0)=0$ and

$$0=g_i\Big(x_0+\alpha h+\sum_{k=1}^n\beta_k(\alpha)y_k\Big)=\underbrace{g_i(x_0)+\alpha\,\delta g_i(x_0;h)}_{0}+\underbrace{\delta g_i\Big(x_0;\sum_{k=1}^n\beta_k(\alpha)y_k\Big)}_{[M\beta(\alpha)]_i}+o(\alpha)+o\Big(\Big\|\sum_{k=1}^n\beta_k(\alpha)y_k\Big\|\Big)$$


BONUS: EQUALITY CONSTRAINED OPTIMIZATION (CONT.)

Proof (cont.)
However, since $M$ is non-singular, $d_1\|\beta(\alpha)\|\le\|M\beta(\alpha)\|\le d_2\|\beta(\alpha)\|$, and since the $y_k$'s are linearly independent, $d_3\|\beta(\alpha)\|\le\big\|\sum_{k=1}^n\beta_k(\alpha)y_k\big\|\le d_4\|\beta(\alpha)\|$, for some $d_1,\dots,d_4>0$. Therefore, from the equation above, $\|\beta(\alpha)\|=o(\alpha)$

Along the curve $\alpha\mapsto x_0+\alpha h+\sum_{k=1}^n\beta_k(\alpha)y_k$, $f$ assumes its local minimum at $x_0$, so

$$\delta f(x_0;h)=\frac{d}{d\alpha}f\Big(x_0+\alpha h+\sum_{k=1}^n\beta_k(\alpha)y_k\Big)\Big|_{\alpha=0}=\frac{d}{d\alpha}f\big(x_0+\alpha h+o(\alpha)\big)\Big|_{\alpha=0}=0\quad\blacksquare$$


BONUS: EQUALITY CONSTRAINED OPTIMIZATION (CONT.)

Lemma. Let $g_1,\dots,g_n$ be linearly independent linear functionals on a vector space $X$, and let $f$ be a linear functional on $X$ such that $f(x)=0$ for all $x\in X$ such that $g_i(x)=0$ for all $i=1,\dots,n$. Then, $f\in\operatorname{lin}\{g_1,\dots,g_n\}$

Proof. Consider the linear operator $G: X\to\mathbb{R}^{n+1}$ given by $G_i(x)=g_i(x)$ ($i=1,\dots,n$) and $G_{n+1}(x)=f(x)$. Notice that $R(G)$ is a linear subspace of $\mathbb{R}^{n+1}$, and due to the condition on $f$, it does not intersect $\{(0,\dots,0,x): x\ne 0\}$, so $\dim R(G)<n+1$, and there is a $\lambda\in\mathbb{R}^{n+1}\setminus\{0\}$ such that $\lambda_1g_1(x)+\cdots+\lambda_ng_n(x)+\lambda_{n+1}f(x)=0$ for all $x\in X$. Due to the linear independence of the $g_i$'s, $\lambda_{n+1}\ne 0$, so dividing by $-\lambda_{n+1}$ we have that $f=\tilde{\lambda}_1g_1+\cdots+\tilde{\lambda}_ng_n$ for some $\tilde{\lambda}_i$'s ■

From this lemma and Theorem 1, it follows immediately that

Theorem 2 (Lagrange multipliers). Under the conditions of Theorem 1, there exist constants $\lambda_1,\dots,\lambda_n$ such that
$$\delta f(x_0;h)+\sum_{i=1}^n\lambda_i\,\delta g_i(x_0;h)=0\quad\text{for all } h\in X$$
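In finite dimensions Theorem 2 is the classical Lagrange multiplier rule: at a regular constrained minimizer, $\nabla f(x_0)+\sum_i\lambda_i\nabla g_i(x_0)=0$. A quick numerical illustration (the particular $f$ and $g_i$ are illustrative choices, not from the slides):

```python
import numpy as np
from scipy.optimize import minimize

# Sketch: at a regular constrained minimum, grad f(x0) + sum_i lambda_i grad g_i(x0) = 0
# Illustrative problem: min ||x||^2  s.t.  x1 + x2 + x3 = 1,  x1 - x2 = 0
f = lambda x: x @ x
grad_f = lambda x: 2 * x
g = [lambda x: x[0] + x[1] + x[2] - 1.0,
     lambda x: x[0] - x[1]]
grad_g = [np.array([1.0, 1.0, 1.0]),
          np.array([1.0, -1.0, 0.0])]

cons = [{"type": "eq", "fun": gi} for gi in g]
x0 = minimize(f, x0=np.array([0.3, 0.2, 0.1]), jac=grad_f, constraints=cons).x

# Solve grad_f(x0) + G^T lam = 0 in the least-squares sense and check the residual
G = np.vstack(grad_g)                                   # rows: grad g_i(x0)
lam = np.linalg.lstsq(G.T, -grad_f(x0), rcond=None)[0]
print("x0 =", x0, " lambda =", lam)
print("residual =", np.linalg.norm(grad_f(x0) + G.T @ lam))   # approx 0
```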


BONUS: EQUALITY CONSTRAINED OPTIMIZATION (CONT.)

Example: maximum entropy spectral analysis (Burg's) method
Consider the problem of estimating the spectrum $\Phi$ of a stationary Gaussian stochastic process, given estimates of the first $n$ autocovariance coefficients. This problem is ill-posed, but one can appeal to the maximum entropy method to obtain an estimate:

$$\begin{aligned}\max_{\Phi\in C([-\pi,\pi])}\ & H(\Phi)=\tfrac{1}{2}\ln 2\pi e+\frac{1}{4\pi}\int_{-\pi}^{\pi}\ln\Phi(\omega)\,d\omega &&\text{(entropy rate of a Gaussian process)}\\ \text{s.t. }\ &\frac{1}{2\pi}\int_{-\pi}^{\pi}e^{ik\omega}\Phi(\omega)\,d\omega=c_k,\quad k=0,1,\dots,n &&\text{(autocovariance coefficients)}\\ &\Phi(\omega)\ge 0\ \text{ for all }\omega\in[-\pi,\pi] &&\text{(non-negativity constraint)}\end{aligned}$$

We will solve this problem using calculus of variations, ignoring the non-negativity constraint (since the solution, as will be seen, is already non-negative). We will assume that the autocovariance coefficients $c_0,c_1,\dots,c_n$ are such that the problem has feasible solutions


BONUS: EQUALITY CONSTRAINED OPTIMIZATION (CONT.)

Example: maximum entropy spectral analysis (Burg’s) method (cont.) Using Lagrange multipliers, an optimal solution Φopt should satisfy

$$\frac{1}{4\pi}\int_{-\pi}^{\pi}\frac{1}{\Phi^{\rm opt}(\omega)}h(\omega)\,d\omega+\frac{1}{2\pi}\sum_{k=-n}^{n}\lambda_k\int_{-\pi}^{\pi}e^{ik\omega}h(\omega)\,d\omega=0,\quad\text{for all } h\in C([-\pi,\pi])$$

Hence, using the fundamental lemma of calculus of variations,

$$\frac{1}{\Phi^{\rm opt}(\omega)}+2\sum_{k=-n}^{n}\lambda_ke^{ik\omega}=0\iff\Phi^{\rm opt}(\omega)=-\frac{1}{2\sum_{k=-n}^{n}\lambda_ke^{ik\omega}}$$
where the quantities $\lambda_0,\lambda_1,\dots,\lambda_n$ can be determined from the autocovariance coefficients $c_0,c_1,\dots,c_n$. This formula shows that the maximum-entropy spectrum corresponds to that of an "auto-regressive process"

Remark. The fact that $\Phi^{\rm opt}$ is a maximizer of the optimization problem follows from the concavity of the cost function, and its non-negativity is due to the fact that $H(\Phi)=-\infty$ if $\Phi$ is negative inside an interval of $[-\pi,\pi]$ (yielding lower cost than any feasible $\Phi$)
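Numerically, the maximum entropy spectrum consistent with $c_0,\dots,c_n$ is the spectrum of the AR($n$) model fitted to those autocovariances, which can be obtained from the Yule-Walker equations (a sketch; the Yule-Walker route and the sample autocovariances below are illustrative, not spelled out in the slides):

```python
import numpy as np
from scipy.linalg import solve_toeplitz

# Sketch: maximum-entropy (AR) spectrum from autocovariances c_0..c_n via Yule-Walker
# (the autocovariance values below are illustrative)
c = np.array([2.0, 1.2, 0.5, 0.1])           # c_0, c_1, c_2, c_3  (n = 3)
n = len(c) - 1

# Yule-Walker: Toeplitz(c_0..c_{n-1}) a = (c_1..c_n) gives the AR coefficients a
a = solve_toeplitz(c[:-1], c[1:])
sigma2 = c[0] - a @ c[1:]                     # innovation variance

# Phi_opt(w) = sigma2 / |1 - sum_k a_k e^{-ikw}|^2
w = np.linspace(-np.pi, np.pi, 4001)
den = 1.0 - sum(a[k] * np.exp(-1j * (k + 1) * w) for k in range(n))
Phi = sigma2 / np.abs(den) ** 2

# Check: Phi reproduces the imposed autocovariance (moment) constraints
for k in range(n + 1):
    print(k, np.trapz(np.cos(k * w) * Phi, w) / (2 * np.pi))   # approx c_k
```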
