Optimization Basic Results
Michel De Lara
Cermics, École des Ponts ParisTech, France
November 22, 2020

Outline of the presentation
Magic formulas
Convex functions, coercivity
Existence and uniqueness of a minimum
First-order optimality conditions (the case of equality constraints)
Duality gap and saddle-points
Elements of Lagrangian duality and Uzawa algorithm
More on convexity and duality
Magic formulas

inf_{a∈A,b∈B} h(a, b) = inf_{a∈A} inf_{b∈B} h(a, b)

inf_{a∈A} λf(a) = λ inf_{a∈A} f(a), ∀λ ≥ 0

inf_{a∈A,b∈B} [f(a) + g(b)] = inf_{a∈A} f(a) + inf_{b∈B} g(b)

Tower formula
For any function h : A × B → [−∞, +∞]
we have
inf_{a∈A,b∈B} h(a, b) = inf_{a∈A} inf_{b∈B} h(a, b)
and if B(a) ⊂ B, ∀a ∈ A, we have

inf_{a∈A, b∈B(a)} h(a, b) = inf_{a∈A} inf_{b∈B(a)} h(a, b)

Independence
For any functions
f : A → ]−∞, +∞] and g : B → ]−∞, +∞], we have
inf_{a∈A,b∈B} [f(a) + g(b)] = inf_{a∈A} f(a) + inf_{b∈B} g(b)
and for any finite set S, any functions f_s : A_s → ]−∞, +∞] and any nonnegative scalars π_s ≥ 0, for s ∈ S, we have

inf_{{a_s}_{s∈S} ∈ ∏_{s∈S} A_s} ∑_{s∈S} π_s f_s(a_s) = ∑_{s∈S} π_s inf_{a_s∈A_s} f_s(a_s)
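These identities can be checked numerically on small finite sets. In the sketch below, the sets A, B and the functions f, g are illustrative choices, not taken from the lecture:

```python
import itertools

# Illustrative finite sets and functions (not from the lecture)
A = {-2.0, -1.0, 0.0, 1.0, 2.0}
B = {0.5, 1.0, 3.0}
f = lambda a: (a - 0.7) ** 2
g = lambda b: abs(b - 2.0)

# Independence formula: joint minimization of a separable sum ...
joint = min(f(a) + g(b) for a, b in itertools.product(A, B))
# ... equals the sum of the separate infima
separate = min(f(a) for a in A) + min(g(b) for b in B)
assert abs(joint - separate) < 1e-12

# Tower formula: inf over A x B = inf over A of (inf over B)
tower = min(min(f(a) + g(b) for b in B) for a in A)
assert abs(joint - tower) < 1e-12
```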
Convex sets
Let N ∈ N*. We consider subsets of the Euclidean space R^N
▶ The subset C ⊂ R^N is convex if for any x1 ∈ C, x2 ∈ C and t ∈ [0, 1], we have that tx1 + (1 − t)x2 ∈ C
▶ An intersection of convex sets is convex
▶ A segment is convex
▶ A hyperplane is convex (H ⊂ R^N is a hyperplane if there exist y ∈ R^N\{0} and b ∈ R such that H = {x ∈ R^N | ⟨x, y⟩ + b = 0})
▶ An affine subspace (intersection of hyperplanes) is convex

Linear and affine functions
Consider a function f : R^N → R
▶ The function f is linear if, for any x1 ∈ R^N, x2 ∈ R^N and t1 ∈ R, t2 ∈ R,
f (t1x1 + t2x2) = t1f (x1) + t2f (x2)
▶ The function f is affine if, for any x1 ∈ R^N, x2 ∈ R^N and t ∈ R,
f (tx1 + (1 − t)x2) = tf (x1) + (1 − t)f (x2)
Exercise. Show that f : R^N → R is affine if and only if g(x) = f(x) − f(0) is linear

Convex functions (definitions)

Let C ⊂ R^N be a nonempty convex subset of R^N, where N ∈ N*, and f : C → R be a function
▶ The function f is affine if, for any x1 ∈ C, x2 ∈ C and t ∈ [0, 1],
f (tx1 + (1 − t)x2) = tf (x1) + (1 − t)f (x2)
▶ The function f is convex if, for any x1 ∈ C, x2 ∈ C and t ∈ [0, 1],
f (tx1 + (1 − t)x2) ≤ tf (x1) + (1 − t)f (x2)
▶ The function f is strictly convex if, for any x1 ∈ C, x2 ∈ C, x1 ≠ x2, and t ∈ ]0, 1[,
f (tx1 + (1 − t)x2) < tf (x1) + (1 − t)f (x2)
▶ The function f is strongly convex (of modulus a > 0) if, for any x1 ∈ C, x2 ∈ C and t ∈ [0, 1],

f(tx1 + (1 − t)x2) ≤ tf(x1) + (1 − t)f(x2) − (a/2) t(1 − t) ‖x1 − x2‖²

Exercises
Let C ⊂ R^N be a nonempty subset of R^N, where N ∈ N*
▶ Show that both definitions of an affine function coincide when C = R^N
▶ Show that a function f : C → R is convex if and only if its epigraph is a convex subset of R^N × R
▶ Show that a function f : C → R is strongly convex of modulus a > 0 if and only if g(x) = f(x) − (a/2)‖x‖² is convex
▶ If f : C → R is convex, show that f is not strictly convex if and only if there exists a nonempty convex subset C′ ⊂ C over which f is affine

Convex functions on the real line
Proposition. Let I ⊂ R be a nonempty interval
▶ A C¹ function f : I → R is convex if and only if f′ is increasing on I
▶ A C² function f : I → R is convex if and only if f″(x) ≥ 0, for all x ∈ I
▶ Let a > 0. A C² function f : I → R is a-strongly convex if and only if f″(x) ≥ a, for all x ∈ I
▶ A C² function f : I → R is strictly convex if and only if f is convex and the set {x ∈ I | f″(x) = 0} is either empty or a singleton
Exercise. Study the family of functions fα : ]0, +∞[ → R given by fα(x) = x^α. For which values of the parameter α is the function fα convex? For a given a > 0, for which values of the parameter α is the function fα strongly convex of modulus a? Provide an example of a strictly convex function which is not strongly convex.

Convexity for multivariate functions

The Hessian matrix Hf(x) of a C² function f : R^N → R is the N × N symmetric matrix given by
Hf(x) = ( ∂²f/∂x_i∂x_j (x) )_{(i,j)∈{1,...,N}²}
Proposition. Let C ⊂ R^N be a nonempty convex subset of R^N, where N ∈ N*
▶ A C² function f : R^N → R is convex on C if and only if the symmetric Hessian matrix Hf(x) is positive semidefinite for all x ∈ C
▶ A C² function f : R^N → R is strongly convex of modulus a > 0 on C if and only if the eigenvalues of the symmetric Hessian matrix Hf(x) are uniformly bounded below by a > 0 on C
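The Hessian criterion can be checked numerically. For a quadratic, the Hessian is constant, so one eigenvalue computation settles convexity and strong convexity; the matrix Q below is an illustrative choice (a Python sketch using NumPy):

```python
import numpy as np

# For f(x) = 1/2 x'Qx the Hessian is Hf(x) = Q everywhere, so f is
# convex iff the eigenvalues of Q are >= 0, and strongly convex of
# modulus a iff they are >= a.  Q is an illustrative symmetric matrix.
Q = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigvals = np.linalg.eigvalsh(Q)     # eigenvalues of a symmetric matrix
assert np.all(eigvals >= 0)         # f is convex
a = eigvals.min()                   # largest admissible modulus
assert a > 0                        # f is strongly convex (here a = 1)
```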
Exercise. Let Q be an N × N symmetric matrix and f(x) = (1/2) x′Qx, where x′ is the transpose of the vector x. Give conditions on the smallest eigenvalue of Q so that the function f is convex, or strictly convex, or strongly convex of modulus a.

Operations on functions preserving convexity
Proposition
Let (fi)_{i∈I} be a family of convex functions. Then sup_{i∈I} fi is a convex function
Proposition
Let (fi)_{i=1,...,n} be convex functions and let (αi)_{i=1,...,n} be nonnegative numbers. Then ∑_{i=1}^{n} αi fi is a convex function
Proposition. Let f : R^N → R be convex, let A be an N × M matrix and b ∈ R^N. Then y ∈ R^M ↦ f(Ay + b) is a convex function

Coercivity
Definition. A function f : R^N → R is coercive if lim_{‖x‖→+∞} f(x) = +∞
Proposition. A strongly convex function is coercive
Here are the ingredients for a general abstract optimization problem
inf_{u∈U^ad} J(u)
▶ Optimization set U (= R^N) containing optimization variables u ∈ U
▶ A criterion J : U → R ∪ {+∞}
▶ Constraints of the form u ∈ U^ad ⊂ U

Examples of classes of optimization problems
inf_{u∈U^ad} J(u)
▶ Linear programming
  ▶ Optimization set U = R^N
  ▶ Criterion J is linear (affine)
  ▶ Constraints U^ad defined by a finite number of linear (affine) equalities and inequalities
▶ Convex optimization
  ▶ Criterion J is a convex function
  ▶ Constraints U^ad define a convex set
▶ Combinatorial optimization
  ▶ Optimization set U is discrete (binary {0,1}^N, integer Z^N, etc.)

Minimum
Definition. We say that u* ∈ U is a (global) minimum of the optimization problem inf_{u∈U^ad} J(u) if

u* ∈ U^ad and J(u*) ≤ J(u), ∀u ∈ U^ad

In this case, we write J(u*) = min_{u∈U^ad} J(u)

Existence and uniqueness of a minimum
We consider the optimization problem
inf_{u∈U^ad} J(u) where U^ad ⊂ U = R^N
Proposition. If the criterion J is continuous and the constraint set U^ad is compact (bounded and closed), then there is a minimum

Proposition. If the constraint set U^ad is closed and the criterion J is continuous and coercive, then there is a minimum

Proposition. If the constraint set U^ad is convex, and if the criterion J is strictly convex, a minimum is necessarily unique

Exercises
We consider the optimization problem
inf_{u∈U^ad} J(u)

Give an example
▶ of continuous criterion J and of constraint set U^ad for which there is no minimum
▶ of criterion J and of compact constraint set U^ad for which there is no minimum
▶ of continuous criterion J and of unbounded and closed constraint set U^ad for which there is no minimum
▶ of convex criterion J and of constraint set U^ad for which there is more than one minimum
▶ of strictly convex criterion J and of constraint set U^ad for which there is more than one minimum

Local minimum
Definition. We say that u* ∈ U is a local minimum of the optimization problem inf_{u∈U^ad} J(u) if there exists a neighborhood V of u* in U such that

u* ∈ U^ad and J(u*) ≤ J(u), ∀u ∈ V ∩ U^ad
Proposition. If the constraint set U^ad is convex, and if the criterion J is convex, a local minimum is a global minimum
Optimization under equality constraints
We consider the optimization problem
inf_{u∈R^N} J(u) under the constraint Θ(u) = 0
where Θ is a function with values in RM
Θ = (Θ1, ..., ΘM) : R^N → R^M
whose components are denoted by Θj, where j runs from 1 to M

Sufficient condition for qualification in case of equality constraints
Definition. Let u* ∈ R^N. The equality constraints Θ(u) = 0 are said to be regular at u* if, when Θ(u*) = 0, the function Θ is differentiable at u* and the vectors ∇Θj(u*), j ∈ {1, ..., M}, are linearly independent
Let u* ∈ R^N. In case
▶ either the equality constraints Θ(u) = 0 are regular at u*
▶ or the function Θ is affine
we say that the equality constraints Θ(u) = 0 are qualified at u*

Lagrangian
Definition. The Lagrangian L : R^N × R^M → R is defined by
L(u, λ) = J(u) + ⟨λ, Θ(u)⟩ = J(u) + ∑_{j=1}^{M} λj Θj(u)
The variables λ are called (Lagrange) multipliers

First-order optimality conditions (necessary): Karush-Kuhn-Tucker (KKT) optimality conditions
Proposition. We suppose that the criterion J is differentiable. Let u* ∈ R^N. If the equality constraints Θ(u) = 0 are qualified at u*, then a necessary condition for u* to be a local minimum of J, among the u such that Θ(u) = 0, is that there exists a vector λ* of R^M such that

∂L/∂u (u*, λ*) = 0 and ∂L/∂λ (u*, λ*) = 0

expressing the first-order optimality conditions (KKT optimality conditions)

First-order optimality conditions (sufficient)
Proposition. Let u* ∈ R^N. We suppose that
▶ the criterion J is differentiable and convex
▶ in the equality constraints Θ(u) = 0, the function Θ is affine
Then a sufficient condition for u* to be a minimum of J, among the u such that Θ(u) = 0, is that there exists a vector λ* of R^M such that

∂L/∂u (u*, λ*) = 0 and ∂L/∂λ (u*, λ*) = 0
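For a quadratic criterion J(u) = (1/2) u′Qu − c′u with affine constraints Θ(u) = Au − b, the two conditions ∂L/∂u = 0 and ∂L/∂λ = 0 form a linear system that can be solved directly. The data Q, c, A, b below are illustrative choices (a Python sketch):

```python
import numpy as np

# KKT conditions: Qu + A'lam = c (stationarity) and Au = b (feasibility)
Q = np.array([[3.0, 0.0], [0.0, 1.0]])   # positive definite -> J convex
c = np.array([1.0, 1.0])
A = np.array([[1.0, 1.0]])               # one equality constraint
b = np.array([1.0])

# Assemble and solve the KKT system  [Q A'; A 0] [u; lam] = [c; b]
M = np.block([[Q, A.T], [A, np.zeros((1, 1))]])
sol = np.linalg.solve(M, np.concatenate([c, b]))
u_star, lam_star = sol[:2], sol[2:]

assert np.allclose(A @ u_star, b)                    # Theta(u*) = 0
assert np.allclose(Q @ u_star + A.T @ lam_star, c)   # dL/du (u*, lam*) = 0
```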
Duality gap
Consider a function φ : X × Y → R
Proposition We have the inequality
inf_x sup_y φ(x, y) ≥ sup_y inf_x φ(x, y)
Notice that we minimize in the first variable x (primal variable) and maximize in the second variable y (dual variable)

Definition. The duality gap is
inf_x sup_y φ(x, y) − sup_y inf_x φ(x, y) ≥ 0

Saddle-point
Definition. We say that (x̄, ȳ) ∈ X × Y is a saddle-point if
▶ y ↦ φ(x̄, y) achieves a maximum at ȳ
▶ x ↦ φ(x, ȳ) achieves a minimum at x̄
or, equivalently,

φ(x, ȳ) ≥ φ(x̄, ȳ) ≥ φ(x̄, y), ∀x ∈ X, ∀y ∈ Y
Proposition When there exists a saddle-point, there is no duality gap (that is, the duality gap is zero) Existence of a saddle point
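This can be illustrated numerically: φ(x, y) = x² + y(x − 1), convex in x and affine in y, is the Lagrangian of min x² subject to x = 1 and has the saddle point (1, −2) with value 1. The grids below (an illustrative discretization of inf/sup over R) verify weak duality and the absence of a gap:

```python
import numpy as np

# Grid approximation of inf_x sup_y phi and sup_y inf_x phi
xs = np.linspace(-3, 3, 601)
ys = np.linspace(-6, 6, 601)
X, Y = np.meshgrid(xs, ys, indexing="ij")
phi = X**2 + Y * (X - 1)

inf_sup = phi.max(axis=1).min()   # inf over x of (sup over y)
sup_inf = phi.min(axis=0).max()   # sup over y of (inf over x)

assert inf_sup >= sup_inf - 1e-9              # weak duality
assert abs(inf_sup - 1.0) < 1e-2              # both equal phi(1, -2) = 1:
assert abs(sup_inf - 1.0) < 1e-2              # no duality gap
```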
Proposition. Suppose that φ : 𝒳 × 𝒴 → R
▶ is continuous,
▶ is convex-concave (convex in the variable x, concave in the variable y),
and that there exist two convex closed sets X ⊂ 𝒳 and Y ⊂ 𝒴 such that
▶ there exists an x̂ ∈ X such that lim_{‖y‖→+∞} φ(x̂, y) = −∞, or the set Y is bounded,
▶ there exists a ŷ ∈ Y such that lim_{‖x‖→+∞} φ(x, ŷ) = +∞, or the set X is bounded.
Then, there exists a saddle point for the function φ on X × Y
Optimization under equality constraints
▶ We consider the optimization problem

inf_{u∈R^N} J(u) under the constraint Θ(u) = 0

where Θ = (Θ1, ..., ΘM) : R^N → R^M
▶ The Lagrangian L : R^N × R^M → R is defined by

L(u, λ) = J(u) + ⟨λ, Θ(u)⟩ = J(u) + ∑_{j=1}^{M} λj Θj(u)

Primal problem
Definition The primal optimization problem is
inf_{u∈R^N} sup_{λ∈R^M} L(u, λ)
Proposition. The original and the primal optimization problems have the same solutions (in u ∈ R^N), since sup_{λ∈R^M} L(u, λ) = J(u) if Θ(u) = 0, and +∞ otherwise

Dual problem

Definition. The dual optimization problem is
sup_{λ∈R^M} inf_{u∈R^N} L(u, λ)
Definition. The dual function is D : R^M → R ∪ {−∞} given by

D(λ) = inf_{u∈R^N} L(u, λ)
Proposition. The dual function is concave, as an infimum of functions affine in λ
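Concavity of the dual function can be seen on a toy problem (an illustrative choice, not from the lecture): for min u²/2 subject to u = 1, the Lagrangian is L(u, λ) = u²/2 + λ(u − 1), the inner minimization gives u = −λ, and D(λ) = −λ²/2 − λ, a concave parabola maximized at λ = −1:

```python
import numpy as np

def D(lam):
    """Dual function of min u^2/2 s.t. u = 1, computed in closed form."""
    u = -lam                      # argmin of u^2/2 + lam*(u - 1)
    return 0.5 * u**2 + lam * (u - 1)

lams = np.linspace(-4, 2, 301)
vals = D(lams)

# Concavity: second finite differences are nonpositive
assert np.all(np.diff(vals, 2) <= 1e-12)
# The dual maximum equals the primal optimal value 1/2 (no duality gap)
assert abs(vals.max() - 0.5) < 1e-9
```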
Proposition. When there exists a saddle-point for the Lagrangian, primal and dual problems are equivalent

First-order optimality conditions and saddle point
Proposition. We suppose that
▶ the criterion J is differentiable and convex
▶ in the equality constraints Θ(u) = 0, the function Θ is affine
Let u* ∈ R^N be a minimum of J, among the u such that Θ(u) = 0. Then, there exists a vector λ* of R^M such that (u*, λ*) is a saddle point of the Lagrangian L, that is,
u ↦ L(u, λ*)

achieves a minimum at u*, and

λ ↦ L(u*, λ)

achieves a maximum at λ*

Existence of a minimum and of a saddle point
Proposition. We suppose that
▶ the criterion J is differentiable and strongly convex
▶ in the equality constraints Θ(u) = 0, the function Θ is affine
Then
▶ there exists a unique minimum u* ∈ R^N of J among the u such that Θ(u) = 0
▶ there exists a vector λ* of R^M such that (u*, λ*) is a saddle point of the Lagrangian L

The Uzawa algorithm or dual gradient algorithm

We suppose that
▶ the criterion J is differentiable and a-strongly convex
▶ in the equality constraints Θ(u) = 0, the function Θ is affine, with norm κ
Then, when 0 < ρ < 2a/κ², the following algorithm converges towards the (unique) minimum of
inf_{u∈R^N} J(u), Θ(u) = 0
Data: initial multiplier λ^(0), step ρ
Result: minimum and multiplier
repeat
    u^(k) = arg min_{u∈R^N} L(u, λ^(k)) (minimization w.r.t. the first variable)
    λ^(k+1) = λ^(k) + ρ Θ(u^(k)) (gradient step for the second variable)
until Θ(u^(k)) = 0
Algorithm 1: Dual Gradient Algorithm
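A minimal sketch of this algorithm for a strongly convex quadratic J(u) = (1/2) u′Qu − c′u under Θ(u) = Au − b = 0, where the inner argmin is available in closed form (∇_u L = Qu − c + A′λ = 0 gives u = Q⁻¹(c − A′λ)). The data Q, c, A, b and the step ρ are illustrative choices, and the exact stopping test Θ(u) = 0 is replaced by a tolerance:

```python
import numpy as np

Q = np.array([[3.0, 0.0], [0.0, 1.0]])   # modulus a = 1 (smallest eigenvalue)
c = np.array([1.0, 1.0])
A = np.array([[1.0, 1.0]])               # kappa = ||A|| = sqrt(2)
b = np.array([1.0])

rho = 0.4                                # step, with 0 < rho < 2a/kappa^2 = 1
lam = np.zeros(1)                        # lambda^(0)
for _ in range(200):
    u = np.linalg.solve(Q, c - A.T @ lam)   # minimization w.r.t. u
    lam = lam + rho * (A @ u - b)           # gradient step on the multiplier
    if np.linalg.norm(A @ u - b) < 1e-10:   # Theta(u) ~ 0 up to tolerance
        break

assert np.allclose(A @ u, b, atol=1e-8)         # feasible
assert np.allclose(u, [0.25, 0.75], atol=1e-6)  # the constrained minimum
```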
Extended real valued functions
R̄ = [−∞, +∞] = R ∪ {−∞} ∪ {+∞}

For any set W and extended real valued function h : W → R̄,
the epigraph is epi h = {(w, t) ∈ W × R | h(w) ≤ t} ⊂ W × R
the effective domain is dom h = {w ∈ W | h(w) < +∞} ⊂ W
and the function h : W → R̄ is said to be proper if it never takes the value −∞ and if dom h ≠ ∅

Moreau additions and characteristic function
The Moreau lower and upper addition extend the usual addition with
(+∞) ∔ (−∞) = (−∞) ∔ (+∞) = +∞ (upper addition ∔)
(+∞) +̣ (−∞) = (−∞) +̣ (+∞) = −∞ (lower addition +̣)
For any subset W ⊂ 𝕎, the characteristic function δ_W : 𝕎 → R̄ is

δ_W(w) = 0 if w ∈ W, +∞ if w ∉ W
and we have inf_{w∈W} h(w) = inf_{w∈𝕎} [h(w) ∔ δ_W(w)]

Extended real valued convex and lsc functions
Let X be a (real) vector space

Convex function. A function f : X → R̄ is said to be convex if its epigraph epi f is a convex subset of X × R
Let X be a topological (real) vector space

Lower semi continuous (lsc) function
A function f : X → R̄ is said to be lower semi continuous (lsc) if its epigraph epi f is a closed subset of X × R

A function is said to be closed if it is either lsc and nowhere having the value −∞, or is the constant function −∞

Bilinear duality, primal and dual spaces
▶ Let X and Y be two (real) vector spaces that are paired: there exists a bilinear form ⟨·, ·⟩ : X × Y → R and locally convex topologies that are compatible in the sense that the continuous linear forms on X are the functions x ∈ X ↦ ⟨x, y⟩, for all y ∈ Y, and that the continuous linear forms on Y are the functions y ∈ Y ↦ ⟨x, y⟩, for all x ∈ X
▶ The space X is called the primal space
▶ The space Y is called the dual space

Subdifferential of a function
Subdifferential. The subdifferential of a function f : X → R̄ at x ∈ X is the subset
∂f(x) = {y ∈ Y | f(x′) − ⟨x′, y⟩ ≥ f(x) − ⟨x, y⟩, ∀x′ ∈ X}
y ∈ ∂f(x) ⟺ f(x′) ≥ f(x) + ⟨x′ − x, y⟩, ∀x′ ∈ X
(the right-hand side is an affine function of x′, sharp at x ∈ X)

The Fenchel conjugacy
▶ The Fenchel conjugacy ⋆ is defined, for any functions f : X → R̄ and g : Y → R̄, by
f⋆(y) = sup_{x∈X} [⟨x, y⟩ − f(x)], ∀y ∈ Y
g⋆′(x) = sup_{y∈Y} [⟨x, y⟩ − g(y)], ∀x ∈ X
f⋆⋆′(x) = sup_{y∈Y} [⟨x, y⟩ − f⋆(y)], ∀x ∈ X
▶ The Fenchel conjugacy induces a one-to-one correspondence between the closed convex functions on X and the closed convex functions on Y

Subdifferential and Fenchel conjugacy
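The conjugacy can be approximated on a grid. In the sketch below, f(x) = x²/2 is an illustrative choice: its conjugate is f⋆(y) = y²/2 (f is self-conjugate), and the biconjugate recovers f, as for any closed convex function:

```python
import numpy as np

xs = np.linspace(-10, 10, 2001)      # grid standing in for X = R
f = 0.5 * xs**2                      # f(x) = x^2/2 (closed convex)

def conjugate(vals, grid, y):
    """Grid approximation of sup_x [x*y - f(x)]."""
    return np.max(y * grid - vals)

ys = np.linspace(-2, 2, 81)
f_star = np.array([conjugate(f, xs, y) for y in ys])
assert np.allclose(f_star, 0.5 * ys**2, atol=1e-4)   # f* = f

# Biconjugate f**(x) = sup_y [x*y - f*(y)] gives back f on the grid
x_test = np.linspace(-1, 1, 41)
f_bi = np.array([np.max(x * ys - f_star) for x in x_test])
assert np.allclose(f_bi, 0.5 * x_test**2, atol=1e-3)
```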
f⋆⋆′ ≤ f (the biconjugate f⋆⋆′ is the best closed convex lower approximation of f)

y ∈ ∂f(x) ⟺ f(x) + f⋆(y) = ⟨x, y⟩
∂f(x) ≠ ∅ ⟹ f⋆⋆′(x) = f(x)
f is closed convex ⟺ f⋆⋆′ = f

Dual problems given by Fenchel conjugacy
▶ Set W, function h : W → R̄ and original minimization problem

inf_{w∈W} h(w)
▶ Embedding/perturbation scheme given by a nonempty set U, an element ū ∈ U and a function H : W × U → R̄ such that
h(w) = H(w, ū), ∀w ∈ W
▶ Paired spaces U and V, and Lagrangian L : W × V → R̄ given by

L(w, v) = inf_{u∈U} [H(w, u) + ⟨u − ū, v⟩]

▶ Dual maximization problem
sup_{v∈V} inf_{w∈W} L(w, v)

Duality gap
▶ Weak duality always holds true

sup_{v∈V} inf_{w∈W} L(w, v) ≤ inf_{w∈W} h(w)

and the duality gap is the nonnegative difference between the right- and left-hand sides
▶ Strong duality holds true, or there is no duality gap, when
sup_{v∈V} inf_{w∈W} L(w, v) = inf_{w∈W} h(w)

Karush-Kuhn-Tucker (KKT) condition
Abstract Karush-Kuhn-Tucker (KKT) condition. The couple (w̄, v̄) ∈ W × V satisfies the KKT condition if (w̄, v̄) is a saddle point of the Lagrangian L, that is,
▶ the function w ∈ W ↦ L(w, v̄) achieves a minimum at w̄
▶ the function v ∈ V ↦ L(w̄, v) achieves a maximum at v̄

Strong duality and KKT condition under convexity
Theorem [Rockafellar(1974), Theorem 15, p. 40]. Suppose that the function u ↦ H(w, u) is closed convex
Then, the following conditions are equivalent
1. There is no duality gap, and w̄ ∈ arg min_{w∈W} h(w) and v̄ ∈ arg max_{v∈V} inf_{w∈W} L(w, v)
2. The couple (w̄, v̄) ∈ W × V satisfies the KKT condition

Strong duality and KKT condition under convexity
Theorem [Rockafellar(1974), Corollary 15A, p. 40]. Suppose that there is no duality gap and that the dual problem has a solution
Then, the following conditions are equivalent
1. w̄ ∈ arg min_{w∈W} h(w)
2. there exists v̄ ∈ V such that the couple (w̄, v̄) ∈ W × V satisfies the KKT condition

Subdifferential of the value function (sensitivity analysis)
The value function is
ϕ(u) = inf_{w∈W} H(w, u), ∀u ∈ U
Theorem [Rockafellar(1974), Theorem 16, p. 40]. For v̄ ∈ V, the following conditions are equivalent
1. v̄ ∈ arg max_{v∈V} inf_{w∈W} L(w, v) and max_{v∈V} inf_{w∈W} L(w, v) = inf_{w∈W} h(w)
2. v̄ ∈ ∂ϕ(ū)
3. inf_{w∈W} h(w) = inf_{w∈W} L(w, v̄)

Subdifferential of the value function (sensitivity analysis): the convex case
Theorem [Rockafellar(1974), Theorem 18, p. 41]. Suppose that
▶ the function H : W × U → R̄ is convex
▶ there exists w ∈ W such that the function u ↦ H(w, u) is bounded above in a neighborhood of ū
Then there exists v̄ ∈ V such that
1. v̄ ∈ arg max_{v∈V} inf_{w∈W} L(w, v) and max_{v∈V} inf_{w∈W} L(w, v) = inf_{w∈W} h(w)
2. v̄ ∈ ∂ϕ(ū)
3. inf_{w∈W} h(w) = inf_{w∈W} L(w, v̄)

Classic Lagrangian duality
▶ Let θ = (θ1, ..., θp) : W → R^p be a mapping, and ū ∈ R^p
▶ We consider the optimization problem
min_{θ(w)≤ū} h(w) = min_{θ1(w)≤ū1, ..., θp(w)≤ūp} h(w)
▶ In that case, take the perturbation scheme with U = R^p and
H(w, u) = h(w) ∔ δ_{θ(w)≤u} = h(w) ∔ ∑_{j=1}^{p} δ_{θj(w)≤uj}
▶ which gives the Lagrangian L : W × V → R̄, with V = R^p and
L(w, v) = h(w) + ⟨θ(w) − ū, v⟩ = h(w) + ∑_{j=1}^{p} vj (θj(w) − ūj)

Slater qualification constraint
Theorem [Rockafellar(1974)] Suppose that
▶ the functions h and θ1, ..., θp are convex
▶ there exists w ∈ W such that
θ1(w) < ū1, ..., θp(w) < ūp
Then there exists v̄ ∈ V such that
1. v̄ ∈ arg max_{v∈V} inf_{w∈W} L(w, v) and max_{v∈V} inf_{w∈W} L(w, v) = inf_{w∈W} h(w)
2. v̄ ∈ ∂ϕ(ū)
3. inf_{w∈W} h(w) = inf_{w∈W} L(w, v̄)

R. Tyrrell Rockafellar. Conjugate Duality and Optimization. CBMS-NSF Regional Conference Series in Applied Mathematics. Society for Industrial and Applied Mathematics, 1974.