Optimization Basic Results

Michel De Lara, Cermics, École des Ponts ParisTech, France

École des Ponts ParisTech

November 22, 2020

Outline of the presentation

Magic formulas

Convex functions, coercivity

Existence and uniqueness of a minimum

First-order optimality conditions (the case of equality constraints)

Duality gap and saddle-points

Elements of Lagrangian duality and Uzawa algorithm

More on convexity and duality


Magic formulas

$$\inf_{a \in A,\, b \in B} h(a, b) = \inf_{a \in A}\, \inf_{b \in B} h(a, b)$$

$$\inf_{a \in A} \lambda f(a) = \lambda \inf_{a \in A} f(a) \,, \quad \forall \lambda \ge 0$$

$$\inf_{a \in A,\, b \in B} \big( f(a) + g(b) \big) = \inf_{a \in A} f(a) + \inf_{b \in B} g(b)$$

Tower formula

For any $h : A \times B \to [-\infty, +\infty]$

we have

$$\inf_{a \in A,\, b \in B} h(a, b) = \inf_{a \in A}\, \inf_{b \in B} h(a, b)$$

and, if $B(a) \subset B$ for all $a \in A$, we have

$$\inf_{a \in A,\, b \in B(a)} h(a, b) = \inf_{a \in A}\, \inf_{b \in B(a)} h(a, b)$$

Independence

For any functions

$$f : A \to \,]-\infty, +\infty] \,, \quad g : B \to \,]-\infty, +\infty]$$

we have

$$\inf_{a \in A,\, b \in B} \big( f(a) + g(b) \big) = \inf_{a \in A} f(a) + \inf_{b \in B} g(b)$$

and, for any finite set $S$, any functions $f_s : A_s \to \,]-\infty, +\infty]$ and any nonnegative scalars $\pi_s \ge 0$, for $s \in S$, we have

$$\inf_{\{a_s\}_{s \in S} \in \prod_{s \in S} A_s} \; \sum_{s \in S} \pi_s f_s(a_s) = \sum_{s \in S} \pi_s \inf_{a_s \in A_s} f_s(a_s)$$
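These identities are easy to check numerically. Below is a minimal Python sketch (the finite sets and the functions h, f, g are illustrative choices, not from the slides) verifying the tower and independence formulas:

```python
import itertools

# Illustrative finite sets and functions (hypothetical choices)
A = [0.0, 0.5, 1.0, 2.0]
B = [-1.0, 0.0, 1.0]

h = lambda a, b: (a - b) ** 2 + a   # any function on A x B
f = lambda a: a ** 2
g = lambda b: abs(b)

# Tower formula: the joint infimum equals the iterated infimum
joint = min(h(a, b) for a, b in itertools.product(A, B))
iterated = min(min(h(a, b) for b in B) for a in A)
assert joint == iterated

# Independence: the infimum of a separable sum splits
lhs = min(f(a) + g(b) for a, b in itertools.product(A, B))
rhs = min(f(a) for a in A) + min(g(b) for b in B)
assert lhs == rhs
print("tower:", joint, "independence:", lhs)
```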


Convex functions, coercivity

Convex sets

Let $N \in \mathbb{N}^*$. We consider subsets of the Euclidean space $\mathbb{R}^N$.

- The subset $C \subset \mathbb{R}^N$ is convex if, for any $x_1 \in C$, $x_2 \in C$ and $t \in [0, 1]$, we have $t x_1 + (1-t) x_2 \in C$
- An intersection of convex sets is convex
- A segment is convex
- A hyperplane is convex ($H \subset \mathbb{R}^N$ is a hyperplane if there exist $y \in \mathbb{R}^N \setminus \{0\}$ and $b \in \mathbb{R}$ such that $H = \{ x \in \mathbb{R}^N \mid \langle x, y \rangle + b = 0 \}$)
- An affine subspace (intersection of hyperplanes) is convex

Linear and affine functions

Consider a function $f : \mathbb{R}^N \to \mathbb{R}$.

- The function $f$ is linear if, for any $x_1 \in \mathbb{R}^N$, $x_2 \in \mathbb{R}^N$ and $t_1 \in \mathbb{R}$, $t_2 \in \mathbb{R}$,

$$f(t_1 x_1 + t_2 x_2) = t_1 f(x_1) + t_2 f(x_2)$$

- The function $f$ is affine if, for any $x_1 \in \mathbb{R}^N$, $x_2 \in \mathbb{R}^N$ and $t \in \mathbb{R}$,

$$f(t x_1 + (1-t) x_2) = t f(x_1) + (1-t) f(x_2)$$

Exercise. Show that $f : \mathbb{R}^N \to \mathbb{R}$ is affine if and only if $g(x) = f(x) - f(0)$ is linear.

Convex functions (definitions)

Let $C \subset \mathbb{R}^N$ be a nonempty convex subset of $\mathbb{R}^N$, where $N \in \mathbb{N}^*$, and let $f : C \to \mathbb{R}$ be a function.

- The function $f$ is affine if, for any $x_1 \in C$, $x_2 \in C$ and $t \in [0, 1]$,

$$f(t x_1 + (1-t) x_2) = t f(x_1) + (1-t) f(x_2)$$

- The function $f$ is convex if, for any $x_1 \in C$, $x_2 \in C$ and $t \in [0, 1]$,

$$f(t x_1 + (1-t) x_2) \le t f(x_1) + (1-t) f(x_2)$$

- The function $f$ is strictly convex if, for any $x_1 \in C$, $x_2 \in C$ with $x_1 \neq x_2$, and $t \in \,]0, 1[$,

$$f(t x_1 + (1-t) x_2) < t f(x_1) + (1-t) f(x_2)$$

- The function $f$ is strongly convex (of modulus $a > 0$) if, for any $x_1 \in C$, $x_2 \in C$ and $t \in [0, 1]$,

$$f(t x_1 + (1-t) x_2) \le t f(x_1) + (1-t) f(x_2) - \frac{a}{2}\, t(1-t)\, \lVert x_1 - x_2 \rVert^2$$
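For instance (a worked example added here, not on the original slide), the function $f(x) = \frac{1}{2} \lVert x \rVert^2$ on $\mathbb{R}^N$ is strongly convex of modulus $a = 1$, and the defining inequality even holds with equality:

```latex
% Expanding the squared norm of the convex combination gives
\[
  t\,\tfrac{1}{2}\lVert x_1 \rVert^2 + (1-t)\,\tfrac{1}{2}\lVert x_2 \rVert^2
  - \tfrac{1}{2}\bigl\lVert t x_1 + (1-t) x_2 \bigr\rVert^2
  = \tfrac{1}{2}\, t(1-t)\, \lVert x_1 - x_2 \rVert^2 ,
\]
% which is the strong convexity inequality with modulus a = 1
% (use ||x_1 - x_2||^2 = ||x_1||^2 - 2 <x_1 , x_2> + ||x_2||^2).
```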

Exercises

Let $C \subset \mathbb{R}^N$ be a nonempty subset of $\mathbb{R}^N$, where $N \in \mathbb{N}^*$.

- Show that both definitions of an affine function coincide when $C = \mathbb{R}^N$
- Show that a function $f : C \to \mathbb{R}$ is convex if and only if its epigraph is a convex subset of $\mathbb{R}^N \times \mathbb{R}$
- Show that a function $f : C \to \mathbb{R}$ is strongly convex of modulus $a > 0$ if and only if $g(x) = f(x) - \frac{a}{2} \lVert x \rVert^2$ is convex
- If $f : C \to \mathbb{R}$ is convex, show that $f$ is not strictly convex if and only if there exists a nonempty convex subset $C' \subset C$ over which $f$ is affine

Convex functions on the real line

Proposition. Let $I \subset \mathbb{R}$ be a nonempty interval.

- A $C^1$ function $f : I \to \mathbb{R}$ is convex if and only if $f'$ is increasing on $I$
- A $C^2$ function $f : I \to \mathbb{R}$ is convex if and only if $f''(x) \ge 0$, for all $x \in I$
- Let $a > 0$. A $C^2$ function $f : I \to \mathbb{R}$ is $a$-strongly convex if and only if $f''(x) \ge a$, for all $x \in I$
- A $C^2$ function $f : I \to \mathbb{R}$ is strictly convex if and only if $f$ is convex and the set $\{ x \in I \mid f''(x) = 0 \}$ is either empty or a singleton

Exercise. Study the family of functions $f_\alpha : \,]0, +\infty[\, \to \mathbb{R}$ given by $f_\alpha(x) = x^\alpha$. For which values of the parameter $\alpha$ is the function $f_\alpha$ convex? For a given $a > 0$, for which values of the parameter $\alpha$ is the function $f_\alpha$ strongly convex of modulus $a$? Provide an example of a strictly convex function which is not strongly convex.

Convexity for multivariate functions

The Hessian matrix $H_f(x)$ of a $C^2$ function $f : \mathbb{R}^N \to \mathbb{R}$ is the $N \times N$ symmetric matrix given by

$$H_f(x) = \left( \frac{\partial^2 f}{\partial x_i \partial x_j}(x) \right)_{(i,j) \in \{1,\dots,N\}^2}$$

Proposition. Let $C \subset \mathbb{R}^N$ be a nonempty convex subset of $\mathbb{R}^N$, where $N \in \mathbb{N}^*$.

- A $C^2$ function $f : \mathbb{R}^N \to \mathbb{R}$ is convex on $C$ if and only if the symmetric Hessian matrix $H_f(x)$ is positive semidefinite for all $x \in C$
- A $C^2$ function $f : \mathbb{R}^N \to \mathbb{R}$ is strongly convex of modulus $a > 0$ on $C$ if and only if the eigenvalues of the symmetric Hessian matrix $H_f(x)$ are uniformly bounded below by $a > 0$ on $C$
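Such eigenvalue conditions are easy to test numerically. Here is a minimal Python sketch (the matrix Q is an illustrative choice) for a quadratic function $f(x) = \frac{1}{2} x' Q x$, whose Hessian is the constant matrix $Q$:

```python
import numpy as np

# Hypothetical example: f(x) = (1/2) x'Qx has constant Hessian H_f(x) = Q,
# so its convexity properties reduce to the eigenvalues of Q.
Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])

lambda_min = np.linalg.eigvalsh(Q).min()  # smallest eigenvalue (Q symmetric)

if lambda_min > 0:
    print(f"strongly convex of any modulus a <= {lambda_min:.3f}")
elif lambda_min == 0:
    print("convex, but not strongly convex")
else:
    print("not convex")
```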

Exercise. Let $Q$ be an $N \times N$ symmetric matrix and $f(x) = \frac{1}{2} x' Q x$, where $x'$ is the transpose of the vector $x$. Give conditions on the smallest eigenvalue of $Q$ so that the function $f$ is convex, or strictly convex, or strongly convex of modulus $a$.

Operations on functions preserving convexity

Proposition

Let $(f_i)_{i \in I}$ be a family of convex functions. Then $\sup_{i \in I} f_i$ is a convex function.

Proposition

Let $(f_i)_{i=1,\dots,n}$ be convex functions and let $(\alpha_i)_{i=1,\dots,n}$ be nonnegative numbers. Then $\sum_{i=1}^{n} \alpha_i f_i$ is a convex function.

Proposition. Let $f : \mathbb{R}^N \to \mathbb{R}$ be convex, let $A$ be an $N \times M$ matrix and let $b \in \mathbb{R}^N$. Then $y \in \mathbb{R}^M \mapsto f(Ay + b)$ is a convex function.

Coercivity

Definition. A function $f : \mathbb{R}^N \to \mathbb{R}$ is coercive if

$$\lim_{\lVert x \rVert \to +\infty} f(x) = +\infty$$

Proposition. A strongly convex function is coercive.

Existence and uniqueness of a minimum


Here are the ingredients for a general abstract optimization problem

$$\inf_{u \in U^{\mathrm{ad}}} J(u)$$

- An optimization set $U$ ($= \mathbb{R}^N$) containing the optimization variables $u \in U$
- A criterion $J : U \to \mathbb{R} \cup \{+\infty\}$
- Constraints of the form $u \in U^{\mathrm{ad}} \subset U$

Examples of classes of optimization problems

$$\inf_{u \in U^{\mathrm{ad}}} J(u)$$

- Linear programming (see the sketch below)
  - Optimization set $U = \mathbb{R}^N$
  - Criterion $J$ is linear (affine)
  - Constraints $U^{\mathrm{ad}}$ defined by a finite number of linear (affine) equalities and inequalities
- Convex optimization
  - Criterion $J$ is a convex function
  - Constraints $U^{\mathrm{ad}}$ define a convex set
- Combinatorial optimization
  - Optimization set $U$ is discrete (binary $\{0,1\}^N$, integer $\mathbb{Z}^N$, etc.)
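For instance, a linear program can be solved with an off-the-shelf routine. A minimal sketch (assuming SciPy is available; the data are illustrative):

```python
from scipy.optimize import linprog

# Tiny illustrative linear program:
#   minimize  -u1 - 2*u2   subject to  u1 + u2 <= 4,  0 <= u1, u2 <= 3
result = linprog(c=[-1.0, -2.0],
                 A_ub=[[1.0, 1.0]], b_ub=[4.0],
                 bounds=[(0.0, 3.0), (0.0, 3.0)])
print(result.x, result.fun)  # minimizer and optimal value
```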

Minimum

Definition. We say that $u^* \in U$ is a (global) minimum of the optimization problem $\inf_{u \in U^{\mathrm{ad}}} J(u)$ if

$$u^* \in U^{\mathrm{ad}} \quad \text{and} \quad J(u^*) \le J(u) \,, \quad \forall u \in U^{\mathrm{ad}}$$

In this case, we write $J(u^*) = \min_{u \in U^{\mathrm{ad}}} J(u)$.

Existence and uniqueness of a minimum

We consider the optimization problem

$$\inf_{u \in U^{\mathrm{ad}}} J(u) \quad \text{where } U^{\mathrm{ad}} \subset U = \mathbb{R}^N$$

Proposition. If the criterion $J$ is continuous and the constraint set $U^{\mathrm{ad}}$ is compact (bounded and closed), then there is a minimum.

Proposition. If the constraint set $U^{\mathrm{ad}}$ is closed and the criterion $J$ is continuous and coercive, then there is a minimum.

Proposition. If the constraint set $U^{\mathrm{ad}}$ is convex, and if the criterion $J$ is strictly convex, a minimum is necessarily unique.

Exercises

We consider the optimization problem

$$\inf_{u \in U^{\mathrm{ad}}} J(u)$$

Give an example

- of a continuous criterion $J$ and of a constraint set $U^{\mathrm{ad}}$ for which there is no minimum
- of a criterion $J$ and of a compact constraint set $U^{\mathrm{ad}}$ for which there is no minimum
- of a continuous criterion $J$ and of an unbounded and closed constraint set $U^{\mathrm{ad}}$ for which there is no minimum
- of a convex criterion $J$ and of a constraint set $U^{\mathrm{ad}}$ for which there is more than one minimum
- of a strictly convex criterion $J$ and of a constraint set $U^{\mathrm{ad}}$ for which there is more than one minimum

Local minimum

Definition. We say that $u^* \in U$ is a local minimum of the optimization problem $\inf_{u \in U^{\mathrm{ad}}} J(u)$ if there exists a neighborhood $V$ of $u^*$ in $U$ such that

$$u^* \in U^{\mathrm{ad}} \quad \text{and} \quad J(u^*) \le J(u) \,, \quad \forall u \in V \cap U^{\mathrm{ad}}$$

Proposition. If the constraint set $U^{\mathrm{ad}}$ is convex, and if the criterion $J$ is convex, a local minimum is a global minimum.

First-order optimality conditions (the case of equality constraints)


Optimization under equality constraints

We consider the optimization problem

$$\inf_{u \in \mathbb{R}^N} J(u) \quad \text{under the constraint } \Theta(u) = 0$$

where $\Theta$ is a function with values in $\mathbb{R}^M$,

$$\Theta = (\Theta_1, \dots, \Theta_M) : \mathbb{R}^N \to \mathbb{R}^M$$

whose components are denoted by $\Theta_j$, where $j$ runs from 1 to $M$.

Sufficient condition for qualification in the case of equality constraints

Definition. Let $u^* \in \mathbb{R}^N$. The equality constraints $\Theta(u) = 0$ are said to be regular at $u^*$ if, when $\Theta(u^*) = 0$, the function $\Theta$ is differentiable at $u^*$ and the vectors $\nabla \Theta_j(u^*)$, $j \in \{1,\dots,M\}$, are linearly independent.

Let $u^* \in \mathbb{R}^N$. In case

- either the equality constraints $\Theta(u) = 0$ are regular at $u^*$
- or the function $\Theta$ is affine,

we say that the equality constraints $\Theta(u) = 0$ are qualified at $u^*$.

Lagrangian

Definition. The Lagrangian $L : \mathbb{R}^N \times \mathbb{R}^M \to \mathbb{R}$ is defined by

$$L(u, \lambda) = J(u) + \langle \lambda, \Theta(u) \rangle = J(u) + \sum_{j=1}^{M} \lambda_j \Theta_j(u)$$

The variables $\lambda$ are called (Lagrange) multipliers.

First-order optimality conditions (necessary): Karush-Kuhn-Tucker (KKT) optimality conditions

Proposition. We suppose that the criterion $J$ is differentiable. Let $u^* \in \mathbb{R}^N$. If the equality constraints $\Theta(u) = 0$ are qualified at $u^*$, then a necessary condition for $u^*$ to be a local minimum of $J$, among the $u$ such that $\Theta(u) = 0$, is that there exists a vector $\lambda^*$ of $\mathbb{R}^M$ such that

$$\frac{\partial L}{\partial u}(u^*, \lambda^*) = 0 \quad \text{and} \quad \frac{\partial L}{\partial \lambda}(u^*, \lambda^*) = 0$$

expressing the first-order optimality conditions (KKT optimality conditions).
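To make these conditions concrete, here is a small worked example (added for illustration, not from the original slides): minimize $J(u) = \frac{1}{2}(u_1^2 + u_2^2)$ over $\mathbb{R}^2$ under the affine equality constraint $\Theta(u) = u_1 + u_2 - 2 = 0$.

```latex
% Worked example (illustrative): J(u) = (1/2)(u_1^2 + u_2^2),
% Theta(u) = u_1 + u_2 - 2, L(u, lambda) = J(u) + lambda * Theta(u).
\[
  \frac{\partial L}{\partial u}(u^*, \lambda^*)
  = \begin{pmatrix} u_1^* + \lambda^* \\ u_2^* + \lambda^* \end{pmatrix} = 0
  \quad \Longrightarrow \quad u_1^* = u_2^* = -\lambda^* ,
\]
\[
  \frac{\partial L}{\partial \lambda}(u^*, \lambda^*)
  = u_1^* + u_2^* - 2 = 0
  \quad \Longrightarrow \quad \lambda^* = -1 , \; u^* = (1, 1) .
\]
% Since J is convex and Theta is affine, the sufficient condition below
% applies, and u^* = (1, 1) is a global minimum.
```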

First-order optimality conditions (sufficient)

Proposition. Let $u^* \in \mathbb{R}^N$. We suppose that

- the criterion $J$ is differentiable and convex
- in the equality constraints $\Theta(u) = 0$, the function $\Theta$ is affine

Then a sufficient condition for $u^*$ to be a minimum of $J$, among the $u$ such that $\Theta(u) = 0$, is that there exists a vector $\lambda^*$ of $\mathbb{R}^M$ such that

$$\frac{\partial L}{\partial u}(u^*, \lambda^*) = 0 \quad \text{and} \quad \frac{\partial L}{\partial \lambda}(u^*, \lambda^*) = 0$$

Duality gap and saddle-points


Duality gap

Consider a function $\phi : X \times Y \to \mathbb{R}$.

Proposition. We have the inequality

$$\inf_{x} \sup_{y} \phi(x, y) \ge \sup_{y} \inf_{x} \phi(x, y)$$

Notice that we minimize in the first variable $x$ (primal variable) and maximize in the second variable $y$ (dual variable).

Definition. The duality gap is

$$\inf_{x} \sup_{y} \phi(x, y) - \sup_{y} \inf_{x} \phi(x, y) \ge 0$$

Saddle-point

Definition. We say that $(\bar{x}, \bar{y}) \in X \times Y$ is a saddle-point if

- $y \mapsto \phi(\bar{x}, y)$ achieves a maximum at $\bar{y}$
- $x \mapsto \phi(x, \bar{y})$ achieves a minimum at $\bar{x}$

or, equivalently,

$$\phi(x, \bar{y}) \ge \phi(\bar{x}, \bar{y}) \ge \phi(\bar{x}, y) \,, \quad \forall (x, y) \in X \times Y$$

Proposition. When there exists a saddle-point, there is no duality gap (that is, the duality gap is zero).
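A minimal numerical sketch (the functions and grids are illustrative choices, not from the slides) contrasting a positive duality gap with the zero-gap saddle-point case:

```python
import numpy as np

def gap(phi, X, Y):
    values = phi(X[:, None], Y[None, :])  # matrix of phi(x_i, y_j)
    inf_sup = values.max(axis=1).min()    # inf_x sup_y phi
    sup_inf = values.min(axis=0).max()    # sup_y inf_x phi
    return inf_sup, sup_inf

# phi(x, y) = x*y on {-1, 1} x {-1, 1}: duality gap of 2, no saddle-point
two = np.array([-1.0, 1.0])
print(gap(lambda x, y: x * y, two, two))          # (1.0, -1.0)

# phi(x, y) = x^2 - y^2 on [-1, 1]^2: saddle-point at (0, 0), zero gap
grid = np.linspace(-1.0, 1.0, 21)
print(gap(lambda x, y: x**2 - y**2, grid, grid))  # (0.0, 0.0)
```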

Existence of a saddle point

Proposition. Suppose that $\phi : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}$

- is continuous,
- is convex-concave (convex in the variable $x$, concave in the variable $y$),
- and that there exist two closed convex sets $X \subset \mathcal{X}$ and $Y \subset \mathcal{Y}$ such that
  - there exists an $\hat{x} \in X$ such that $\lim_{\lVert y \rVert \to +\infty} \phi(\hat{x}, y) = -\infty$, or the set $Y$ is bounded,
  - there exists a $\hat{y} \in Y$ such that $\lim_{\lVert x \rVert \to +\infty} \phi(x, \hat{y}) = +\infty$, or the set $X$ is bounded.

Then, there exists a saddle point for the function $\phi$ on $X \times Y$.

Elements of Lagrangian duality and Uzawa algorithm


Optimization under equality constraints

- We consider the optimization problem

$$\inf_{u \in \mathbb{R}^N} J(u) \quad \text{under the constraint } \Theta(u) = 0$$

where $\Theta = (\Theta_1, \dots, \Theta_M) : \mathbb{R}^N \to \mathbb{R}^M$

- The Lagrangian $L : \mathbb{R}^N \times \mathbb{R}^M \to \mathbb{R}$ is defined by

$$L(u, \lambda) = J(u) + \langle \lambda, \Theta(u) \rangle = J(u) + \sum_{j=1}^{M} \lambda_j \Theta_j(u)$$

Primal problem

Definition. The primal optimization problem is

$$\inf_{u \in \mathbb{R}^N} \sup_{\lambda \in \mathbb{R}^M} L(u, \lambda)$$

Proposition. The original and the primal optimization problems have the same solutions (in $u \in \mathbb{R}^N$); indeed, $\sup_{\lambda \in \mathbb{R}^M} L(u, \lambda) = J(u)$ if $\Theta(u) = 0$, and $+\infty$ otherwise.

Dual problem

Definition. The dual optimization problem is

$$\sup_{\lambda \in \mathbb{R}^M} \inf_{u \in \mathbb{R}^N} L(u, \lambda)$$

Definition. The dual function is $D : \mathbb{R}^M \to \mathbb{R} \cup \{-\infty\}$ given by

$$D(\lambda) = \inf_{u \in \mathbb{R}^N} L(u, \lambda)$$
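For instance (an added computation, not on the original slide), with $J(u) = \frac{1}{2} \lVert u \rVert^2$ and the affine constraint $\Theta(u) = Au - b$:

```latex
% Added example: J(u) = (1/2)||u||^2 and Theta(u) = Au - b.
% The inner minimum of L(u, lambda) = (1/2)||u||^2 + <lambda, Au - b>
% is attained at u = -A' lambda, which gives the concave quadratic
\[
  D(\lambda) = \inf_{u \in \mathbb{R}^N} L(u, \lambda)
  = -\tfrac{1}{2}\, \lVert A^\top \lambda \rVert^2 - \langle \lambda, b \rangle .
\]
```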

Proposition. The dual function is concave, as an infimum of affine functions of $\lambda$.

Proposition. When there exists a saddle-point for the Lagrangian, the primal and dual problems are equivalent.

First-order optimality conditions and saddle point

Proposition. We suppose that

- the criterion $J$ is differentiable and convex
- in the equality constraints $\Theta(u) = 0$, the function $\Theta$ is affine

Let $u^* \in \mathbb{R}^N$ be a minimum of $J$, among the $u$ such that $\Theta(u) = 0$. Then, there exists a vector $\lambda^*$ of $\mathbb{R}^M$ such that $(u^*, \lambda^*)$ is a saddle point of the Lagrangian $L$, that is,

$$u \mapsto L(u, \lambda^*)$$

achieves a minimum at $u^*$, and

$$\lambda \mapsto L(u^*, \lambda)$$

achieves a maximum at $\lambda^*$.

Existence of a minimum and of a saddle point

Proposition. We suppose that

- the criterion $J$ is differentiable and strongly convex
- in the equality constraints $\Theta(u) = 0$, the function $\Theta$ is affine

Then

- there exists a unique minimum $u^* \in \mathbb{R}^N$ of $J$ among the $u$ such that $\Theta(u) = 0$
- there exists a vector $\lambda^*$ of $\mathbb{R}^M$ such that $(u^*, \lambda^*)$ is a saddle point of the Lagrangian $L$

The Uzawa algorithm, or dual gradient algorithm

We suppose that

- the criterion $J$ is differentiable and $a$-strongly convex
- in the equality constraints $\Theta(u) = 0$, the function $\Theta$ is affine, with norm $\kappa$

Then, when $0 < \rho < 2a/\kappa^2$, the following algorithm converges towards the (unique) minimum of

$$\inf_{u \in \mathbb{R}^N} J(u) \,, \quad \Theta(u) = 0$$

Data: initial multiplier $\lambda^{(0)}$, step $\rho$
Result: minimum and multiplier
repeat
    $u^{(k)} = \arg\min_{u \in \mathbb{R}^N} L(u, \lambda^{(k)})$ (minimization w.r.t. the first variable)
    $\lambda^{(k+1)} = \lambda^{(k)} + \rho\, \Theta(u^{(k)})$ (gradient step for the second variable)
until $\Theta(u^{(k)}) = 0$

Algorithm 1: Dual Gradient Algorithm
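A minimal Python sketch of this iteration on a small quadratic problem (the data Q, c, A, b are illustrative assumptions; the inner minimization is available in closed form since L is quadratic in u):

```python
import numpy as np

# Uzawa / dual gradient sketch on a quadratic problem (illustrative data):
#   minimize (1/2) u'Qu - c'u   subject to  Au = b,
# with Q symmetric positive definite, so J is strongly convex
# (modulus a = smallest eigenvalue of Q) and Theta(u) = Au - b is affine.
Q = np.array([[3.0, 0.5], [0.5, 2.0]])
c = np.array([1.0, 1.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

a = np.linalg.eigvalsh(Q).min()   # strong convexity modulus
kappa = np.linalg.norm(A, 2)      # norm of the constraint map
rho = a / kappa**2                # safe step: 0 < rho < 2a / kappa^2

lam = np.zeros(1)                 # lambda^(0)
for k in range(1000):
    # u^(k) minimizes L(., lam): solve Q u = c - A' lam
    u = np.linalg.solve(Q, c - A.T @ lam)
    residual = A @ u - b          # Theta(u^(k))
    lam = lam + rho * residual    # dual gradient ascent step
    if np.linalg.norm(residual) < 1e-10:
        break

print("u* =", u, " lambda* =", lam)
```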


More on convexity and duality

Extended real valued functions

$$\overline{\mathbb{R}} = [-\infty, +\infty] = \mathbb{R} \cup \{-\infty\} \cup \{+\infty\}$$

For any set $W$ and extended real valued function $h : W \to \overline{\mathbb{R}}$, the epigraph is

$$\operatorname{epi} h = \big\{ (w, t) \in W \times \mathbb{R} \;\big|\; h(w) \le t \big\} \subset W \times \mathbb{R}$$

the effective domain is

$$\operatorname{dom} h = \big\{ w \in W \;\big|\; h(w) < +\infty \big\} \subset W$$

and the function $h : W \to \overline{\mathbb{R}}$ is said to be proper if it never takes the value $-\infty$ and if $\operatorname{dom} h \neq \emptyset$.

Moreau additions and characteristic function

The Moreau lower and upper additions extend the usual addition with

$$(+\infty) \mathbin{\dot{+}} (-\infty) = (-\infty) \mathbin{\dot{+}} (+\infty) = +\infty$$

$$(+\infty) \mathbin{\underset{\cdot}{+}} (-\infty) = (-\infty) \mathbin{\underset{\cdot}{+}} (+\infty) = -\infty$$

where $\dot{+}$ denotes the upper addition and $\underset{\cdot}{+}$ the lower addition.

For any subset $W \subset \mathcal{W}$, the characteristic function $\delta_W : \mathcal{W} \to \overline{\mathbb{R}}$ is

$$\delta_W(w) = \begin{cases} 0 & \text{if } w \in W \\ +\infty & \text{if } w \notin W \end{cases}$$

and we have

$$\inf_{w \in W} h(w) = \inf_{w \in \mathcal{W}} \big( h(w) \mathbin{\dot{+}} \delta_W(w) \big)$$
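A tiny Python sketch of this reformulation (the sets and the objective are illustrative choices):

```python
import math

# Encode the constraint w in W into the objective via the characteristic
# function delta_W, then minimize over the whole ambient set.
ambient = [-2.0, -1.0, 0.0, 1.0, 2.0]    # stands in for the ambient space
in_W = lambda w: w >= 0                  # membership in the subset W
h = lambda w: (w - 1.5) ** 2
delta_W = lambda w: 0.0 if in_W(w) else math.inf

constrained = min(h(w) for w in ambient if in_W(w))
penalized = min(h(w) + delta_W(w) for w in ambient)
assert constrained == penalized
print(constrained)
```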

Extended real valued convex and lsc functions

Let $X$ be a (real) vector space.

Convex function. A function $f : X \to \overline{\mathbb{R}}$ is said to be convex if its epigraph $\operatorname{epi} f$ is a convex subset of $X \times \mathbb{R}$.

Let $X$ be a topological (real) vector space.

Lower semicontinuous (lsc) function

A function $f : X \to \overline{\mathbb{R}}$ is said to be lower semicontinuous (lsc) if its epigraph $\operatorname{epi} f$ is a closed subset of $X \times \mathbb{R}$.

A function is said to be closed if it is either lsc and nowhere takes the value $-\infty$, or is the constant function $-\infty$.

Bilinear duality, primal and dual spaces

- Let $X$ and $Y$ be two (real) vector spaces that are paired: there exists a bilinear form $\langle \,,\, \rangle : X \times Y \to \mathbb{R}$ and locally convex topologies that are compatible, in the sense that the continuous linear forms on $X$ are the functions $x \in X \mapsto \langle x, y \rangle$, for all $y \in Y$, and the continuous linear forms on $Y$ are the functions $y \in Y \mapsto \langle x, y \rangle$, for all $x \in X$
- The space $X$ is called the primal space
- The space $Y$ is called the dual space

Subdifferential of a function

Subdifferential. The subdifferential of a function $f : X \to \overline{\mathbb{R}}$ at $x \in X$ is the subset

$$\partial f(x) = \big\{ y \in Y \;\big|\; f(x') - \langle x', y \rangle \ge f(x) - \langle x, y \rangle \,, \; \forall x' \in X \big\}$$

$$y \in \partial f(x) \iff f(x') \ge \underbrace{f(x) + \langle x' - x, y \rangle}_{\text{affine function of } x' \text{, sharp at } x \in X} \,, \quad \forall x' \in X$$

The Fenchel conjugacy

The Fenchel conjugacy $\star$ is defined, for any functions $f : X \to \overline{\mathbb{R}}$ and $g : Y \to \overline{\mathbb{R}}$, by

$$f^\star(y) = \sup_{x \in X} \big( \langle x, y \rangle - f(x) \big) \,, \quad \forall y \in Y$$

$$g^{\star'}(x) = \sup_{y \in Y} \big( \langle x, y \rangle - g(y) \big) \,, \quad \forall x \in X$$

$$f^{\star\star'}(x) = \sup_{y \in Y} \big( \langle x, y \rangle - f^\star(y) \big) \,, \quad \forall x \in X$$

The Fenchel conjugacy induces a one-to-one correspondence between the closed convex functions on $X$ and the closed convex functions on $Y$.
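A standard one-dimensional example (added here for illustration): the function $f(x) = \frac{1}{2} x^2$ is its own conjugate.

```latex
% Example: X = Y = R with <x , y> = xy, and f(x) = (1/2) x^2.
% The supremum defining f*(y) is attained where the derivative vanishes:
\[
  f^\star(y) = \sup_{x \in \mathbb{R}} \bigl( xy - \tfrac{1}{2} x^2 \bigr)
  = \tfrac{1}{2} y^2
  \qquad \text{(attained at } x = y \text{)} ,
\]
% so f equals its own conjugate; consistently, y belongs to the
% subdifferential of f at x iff y = x, where indeed f(x) + f*(y) = xy.
```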

Subdifferential and Fenchel conjugacy

$$\underbrace{f^{\star\star'}}_{\text{best closed convex lower approximation}} \le f$$

$$y \in \partial f(x) \iff f(x) + f^\star(y) = \langle x, y \rangle$$

$$\partial f(x) \neq \emptyset \implies f^{\star\star'}(x) = f(x)$$

$$f \text{ is closed convex} \iff f^{\star\star'} = f$$

Dual problems given by Fenchel conjugacy

- Set $W$, function $h : W \to \overline{\mathbb{R}}$ and original minimization problem

$$\inf_{w \in W} h(w)$$

- Embedding/perturbation scheme given by a nonempty set $U$, an element $\overline{u} \in U$ and a function $H : W \times U \to \overline{\mathbb{R}}$ such that

$$h(w) = H(w, \overline{u}) \,, \quad \forall w \in W$$

- Paired spaces $U$ and $V$, and Lagrangian $L : W \times V \to \overline{\mathbb{R}}$ given by

$$L(w, v) = \inf_{u \in U} \big( H(w, u) + \langle u - \overline{u}, v \rangle \big)$$

- Dual maximization problem

$$\sup_{v \in V} \inf_{w \in W} L(w, v)$$

Duality gap

- Weak duality always holds true:

$$\sup_{v \in V} \inf_{w \in W} L(w, v) \le \inf_{w \in W} h(w)$$

The duality gap is the nonnegative difference between the right-hand and left-hand sides.

- Strong duality holds true, or there is no duality gap, when

$$\sup_{v \in V} \inf_{w \in W} L(w, v) = \inf_{w \in W} h(w)$$

Karush-Kuhn-Tucker (KKT) condition

Abstract Karush-Kuhn-Tucker (KKT) condition. The couple $(\overline{w}, \overline{v}) \in W \times V$ satisfies the KKT condition if $(\overline{w}, \overline{v})$ is a saddle point of the Lagrangian $L$, that is,

- the function $W \ni w \mapsto L(w, \overline{v})$ achieves a minimum at $\overline{w}$
- the function $V \ni v \mapsto L(\overline{w}, v)$ achieves a maximum at $\overline{v}$

Strong duality and KKT condition under convexity

Theorem [Rockafellar(1974), Theorem 15, p. 40]. Suppose that the function $u \mapsto H(w, u)$ is closed convex.

Then, the following conditions are equivalent:

1. There is no duality gap, and $\overline{w} \in \arg\min_{w \in W} h(w)$ and $\overline{v} \in \arg\max_{v \in V} \inf_{w \in W} L(w, v)$
2. The couple $(\overline{w}, \overline{v}) \in W \times V$ satisfies the KKT condition

Strong duality and KKT condition under convexity

Theorem [Rockafellar(1974), Corollary 15A, p. 40]. Suppose that there is no duality gap and that the dual problem has a solution $\overline{v} \in \arg\max_{v \in V} \inf_{w \in W} L(w, v)$.

Then, the following conditions are equivalent:

1. $\overline{w} \in \arg\min_{w \in W} h(w)$
2. the couple $(\overline{w}, \overline{v}) \in W \times V$ satisfies the KKT condition

Subdifferential of the value function (sensitivity analysis)

The value function is

$$\varphi(u) = \inf_{w \in W} H(w, u) \,, \quad \forall u \in U$$

Theorem [Rockafellar(1974), Theorem 16, p. 40]. For $\overline{v} \in V$, the following conditions are equivalent:

1. $\overline{v} \in \arg\max_{v \in V} \inf_{w \in W} L(w, v)$ and $\max_{v \in V} \inf_{w \in W} L(w, v) = \inf_{w \in W} h(w)$
2. $\overline{v} \in \partial \varphi(\overline{u})$
3. $\inf_{w \in W} h(w) = \inf_{w \in W} L(w, \overline{v})$

Subdifferential of the value function (sensitivity analysis): the convex case

Theorem [Rockafellar(1974), Theorem 18, p. 41]. Suppose that

- the function $H : W \times U \to \overline{\mathbb{R}}$ is convex
- there exists $w \in W$ such that the function $u \mapsto H(w, u)$ is bounded above in a neighborhood of $\overline{u}$

Then there exists $\overline{v} \in V$ such that

1. $\overline{v} \in \arg\max_{v \in V} \inf_{w \in W} L(w, v)$ and $\max_{v \in V} \inf_{w \in W} L(w, v) = \inf_{w \in W} h(w)$
2. $\overline{v} \in \partial \varphi(\overline{u})$
3. $\inf_{w \in W} h(w) = \inf_{w \in W} L(w, \overline{v})$

Classic Lagrangian duality

- Let $\theta = (\theta_1, \dots, \theta_p) : W \to \mathbb{R}^p$ be a mapping, and $\overline{u} \in \mathbb{R}^p$
- We consider the optimization problem

$$\min_{\theta(w) \le \overline{u}} h(w) = \min_{\substack{\theta_1(w) \le \overline{u}_1 \\ \cdots \\ \theta_p(w) \le \overline{u}_p}} h(w)$$

- In that case, take the perturbation scheme with $U = \mathbb{R}^p$ and

$$H(w, u) = h(w) \mathbin{\dot{+}} \delta_{\theta(w) \le u} = h(w) \mathbin{\dot{+}} \sum_{j=1}^{p} \delta_{\theta_j(w) \le u_j}$$

- which gives the Lagrangian $L : W \times V \to \overline{\mathbb{R}}$, with $V = \mathbb{R}^p$ and, for $v \ge 0$,

$$L(w, v) = h(w) + \langle \theta(w) - \overline{u}, v \rangle = h(w) + \sum_{j=1}^{p} v_j \big( \theta_j(w) - \overline{u}_j \big)$$

Slater qualification constraint

Theorem [Rockafellar(1974)]. Suppose that

- the functions $h$ and $\theta_1, \dots, \theta_p$ are convex
- there exists $w \in W$ such that

$$\theta_1(w) < \overline{u}_1, \; \dots, \; \theta_p(w) < \overline{u}_p$$

Then there exists $\overline{v} \in V$ such that

1. $\overline{v} \in \arg\max_{v \in V} \inf_{w \in W} L(w, v)$ and $\max_{v \in V} \inf_{w \in W} L(w, v) = \inf_{w \in W} h(w)$
2. $\overline{v} \in \partial \varphi(\overline{u})$
3. $\inf_{w \in W} h(w) = \inf_{w \in W} L(w, \overline{v})$

R. Tyrrell Rockafellar. Conjugate Duality and Optimization. CBMS-NSF Regional Conference Series in Applied Mathematics. Society for Industrial and Applied Mathematics, 1974.