L. Vandenberghe ECE236C (Spring 2020) 5. Conjugate functions
closed functions •
conjugate function •
duality •
5.1 Closed set a set C is closed if it contains its boundary:
x C, x x¯ = x¯ C k ∈ k → ⇒ ∈
Operations that preserve closedness
the intersection of (finitely or infinitely many) closed sets is closed •
the union of a finite number of closed sets is closed • inverse under linear mapping: x Ax C is closed if C is closed • { | ∈ }
Conjugate functions 5.2 Image under linear mapping the image of a closed set under a linear mapping is not necessarily closed
Example
2 C = x , x R x x 1 , A = 1 0 , AC = R++ {( 1 2) ∈ + | 1 2 ≥ }
Sufficient condition: AC is closed if
C is closed and convex • and C does not have a recession direction in the nullspace of A, i.e., •
Ay = 0, xˆ C, xˆ + αy C for all α 0 = y = 0 ∈ ∈ ≥ ⇒
in particular, this holds for any matrix A if C is bounded
Conjugate functions 5.3 Closed function
Definition: a function is closed if its epigraph is a closed set
Examples
f x = log 1 x2 with dom f = x x < 1 • ( ) − ( − ) { | | | } f x = x log x with dom f = R+ and f 0 = 0 • ( ) ( ) indicator function of a closed set C: • 0 x C δ x = ∈ C( ) + otherwise ∞
Not closed
f x = x log x with dom f = R++, or with dom f = R+ and f 0 = 1 • ( ) ( ) indicator function of a set C if C is not closed •
Conjugate functions 5.4 Properties
Sublevel sets: f is closed if and only if all its sublevel sets are closed
Minimum: if f is closed with bounded sublevel sets then it has a minimizer
Common operations on convex functions that preserve closedness
sum: f = f + f is closed if f and f are closed • 1 2 1 2 composition with affine mapping: f = g Ax + b is closed if g is closed • ( ) supremum: f x = sup fα x is closed if each function fα is closed • ( ) α ( ) in each case, we assume dom f , ∅
Conjugate functions 5.5 Outline
closed functions •
conjugate function •
duality • Conjugate function the conjugate of a function f is
T f ∗ y = sup y x f x ( ) x dom f ( − ( )) ∈
f ∗ is closed and convex (even when f is not)
Fenchel’s inequality: the definition implies that
T f x + f ∗ y x y for all x, y ( ) ( ) ≥ this is an extension to non-quadratic convex f of the inequality
1 1 xT x + yT y xT y 2 2 ≥
Conjugate functions 5.6 Quadratic function
1 f x = xT Ax + bT x + c ( ) 2
Strictly convex case (A 0)
1 T 1 f ∗ y = y b A− y b c ( ) 2( − ) ( − ) −
General convex case (A 0)
1 T f ∗ y = y b A† y b c, dom f ∗ = range A + b ( ) 2( − ) ( − ) − ( )
Conjugate functions 5.7 Negative entropy and negative logarithm
Negative entropy
n n X X yi 1 f x = xi log xi f ∗ y = e − ( ) i=1 ( ) i=1
Negative logarithm
Xn Xn f x = log xi f ∗ y = log yi n ( ) − i=1 ( ) − i=1 (− ) −
Matrix logarithm
n f X = log det X dom f = S f ∗ Y = log det Y n ( ) − ( ++) ( ) − (− ) −
Conjugate functions 5.8 Indicator function and norm
Indicator of convex set C: conjugate is the support function of C
0 x C T δC x = ∈ δC∗ y = sup y x ( ) + x < C ( ) x C ∞ ∈
Indicator of convex cone C: conjugate is indicator of polar (negative dual) cone
0 yT x 0 x C δ∗ y = δ C y = δC y = ≤ ∀ ∈ C( ) − ∗( ) ∗(− ) + otherwise ∞
Norm: conjugate is indicator of unit ball for dual norm
0 y 1 f x = x f ∗ y = k k∗ ≤ ( ) k k ( ) + y > 1 ∞ k k∗ (see next page)
Conjugate functions 5.9 Proof: recall the definition of dual norm
y = sup xT y k k∗ x 1 k k≤ to evaluate f y = sup yT x x we distinguish two cases ∗( ) x ( − k k)
if y 1, then (by definition of dual norm) • k k∗ ≤
yT x x for all x ≤ k k
and equality holds if x = 0; therefore sup yT x x = 0 x ( − k k) if y > 1, there exists an x with x 1, xT y > 1; then • k k∗ k k ≤ T T f ∗ y y tx tx = t y x x ( ) ≥ ( ) − k k ( − k k)
and right-hand side goes to infinity if t → ∞
Conjugate functions 5.10 Calculus rules
Separable sum
f x , x = g x + h x f ∗ y , y = g∗ y + h∗ y ( 1 2) ( 1) ( 2) ( 1 2) ( 1) ( 2)
Scalar multiplication (α > 0)
f x = αg x f y = αg y α ( ) ( ) ∗( ) ∗( / ) f x = αg x α f y = αg y ( ) ( / ) ∗( ) ∗( )
the operation f x = αg x α is sometimes called “right scalar multiplication” • ( ) ( / ) a convenient notation is f = gα for the function gα x = αg x α • ( )( ) ( / ) conjugates can be written concisely as gα = αg and αg = g α • ( )∗ ∗ ( )∗ ∗
Conjugate functions 5.11 Calculus rules
Addition to affine function
T f x = g x + a x + b f ∗ y = g∗ y a b ( ) ( ) ( ) ( − ) −
Translation of argument
T f x = g x b f ∗ y = b y + g∗ y ( ) ( − ) ( ) ( )
Composition with invertible linear mapping (A square and nonsingular)
T f x = g Ax f ∗ y = g∗ A− y ( ) ( ) ( ) ( )
Infimal convolution
f x = inf g u + h v f ∗ y = g∗ y + h∗ y ( ) u+v=x ( ( ) ( )) ( ) ( ) ( )
Conjugate functions 5.12 The second conjugate
T f ∗∗ x = sup x y f ∗ y ( ) y dom f ( − ( )) ∈ ∗
f is closed and convex • ∗∗ from Fenchel’s inequality, xT y f y f x for all y and x; therefore • − ∗( ) ≤ ( )
f ∗∗ x f x for all x ( ) ≤ ( )
equivalently, epi f epi f (for any f ) ⊆ ∗∗ if f is closed and convex, then •
f ∗∗ x = f x for all x ( ) ( )
equivalently, epi f = epi f ∗∗ (if f is closed and convex); proof on next page
Conjugate functions 5.13 Proof (by contradiction): assume f is closed and convex, and epi f ∗∗ , epi f suppose x, f x < epi f ; then there is a strict separating hyperplane: ( ∗∗( )) a T z x − c < 0 for all z, s epi f b s f x ≤ ( ) ∈ − ∗∗( ) for some a, b, c with b 0 (b > 0 gives a contradiction as s ) ≤ → ∞
if b < 0, define y = a b and maximize left-hand side over z, s epi f : • /(− ) ( ) ∈ T f ∗ y y x + f ∗∗ x c b < 0 ( ) − ( ) ≤ /(− ) this contradicts Fenchel’s inequality if b = 0, choose yˆ dom f and add small multiple of yˆ, 1 to a, b : • ∈ ∗ ( − ) ( ) T a + yˆ z x T − c + f ∗ yˆ x yˆ + f ∗∗ x < 0 s f x ≤ ( ) − ( ) − − ∗∗( ) now apply the argument for b < 0
Conjugate functions 5.14 Conjugates and subgradients if f is closed and convex, then
T y ∂ f x x ∂ f ∗ y x y = f x + f ∗ y ∈ ( ) ⇐⇒ ∈ ( ) ⇐⇒ ( ) ( )
Proof. if y ∂ f x , then f y = sup yTu f u = yT x f x ; hence ∈ ( ) ∗( ) u ( − ( )) − ( ) T f ∗ v = sup v u f u ( ) u ( − ( )) vT x f x ≥ − ( ) = xT v y f x + yT x ( − ) − ( ) T = f ∗ y + x v y ( ) ( − ) this holds for all v; therefore, x ∂ f y ∈ ∗( ) reverse implication x ∂ f y = y ∂ f x follows from f = f ∈ ∗( ) ⇒ ∈ ( ) ∗∗
Conjugate functions 5.15 Example
f x = δ 1,1 x f ∗ y = y ( ) [− ]( ) ( ) | |
x y 1 1 −
∂ f x ∂ f y ( ) ∗( )
1
1 − x y 1
1 −
Conjugate functions 5.16 Strongly convex function
Definition (page 1.18) f is µ-strongly convex (for ) if dom f is convex and k · k µ f θx + 1 θ y θ f x + 1 θ f y θ 1 θ x y 2 ( ( − ) ) ≤ ( ) ( − ) ( ) − 2 ( − )k − k for all x, y dom f and θ 0,1 ∈ ∈ [ ]
First-order condition
if f is µ-strongly convex, then • µ f y f x + gT y x + y x 2 for all x, y dom f , g ∂ f x ( ) ≥ ( ) ( − ) 2 k − k ∈ ∈ ( )
for differentiable f this is the inequality (4) on page 1.19 •
Conjugate functions 5.17 Proof
recall the definition of directional derivative (page 2.28 and 2.29): • f x + θ y x f x f 0 x, y x = inf ( ( − )) − ( ) ( − ) θ>0 θ and the infimum is approached as θ 0 → if f is µ-strongly convex and subdifferentiable at x, then for all y dom f , • ∈ 1 θ f x + θ f y µ 2 θ 1 θ y x 2 f x f 0 x, y x inf ( − ) ( ) ( ) − ( / ) ( − )k − k − ( ) ( − ) ≤ θ 0,1 θ ∈( ] µ = f y f x y x 2 ( ) − ( ) − 2 k − k
from page 2.31, the directional derivative is the support function of ∂ f x : • ( ) gT y x sup g˜T y x ( − ) ≤ g˜ ∂ f x ( − ) ∈ ( ) = f 0 x; y x ( − ) µ f y f x y x 2 ≤ ( ) − ( ) − 2 k − k Conjugate functions 5.18 Conjugate of strongly convex function assume f is closed and strongly convex with parameter µ > 0 for the norm k · k
f is defined for all y (i.e., dom f = Rn • ∗ ∗ ) f is differentiable everywhere, with gradient • ∗ T f ∗ y = argmax y x f x ∇ ( ) x ( − ( ))
f ∗ is Lipschitz continuous with constant 1 µ for the dual norm : •∇ / k · k∗ 1 f ∗ y f ∗ y0 y y0 for all y and y0 k∇ ( ) − ∇ ( )k ≤ µk − k∗
Conjugate functions 5.19 Proof: if f is strongly convex and closed
yT x f x has a unique maximizer x for every y • − ( ) x maximizes yT x f x if and only if y ∂ f x ; from page 5.15 • − ( ) ∈ ( )
y ∂ f x x ∂ f ∗ y = f ∗ y ∈ ( ) ⇐⇒ ∈ ( ) {∇ ( )} hence f y = argmax yT x f x ∇ ∗( ) x ( − ( )) from first-order condition on page 5.17: if y ∂ f x , y ∂ f x : • ∈ ( ) 0 ∈ ( 0) T µ 2 f x0 f x + y x0 x + x0 x ( ) ≥ ( ) ( − ) 2 k − k T µ 2 f x f x0 + y0 x x0 + x0 x ( ) ≥ ( ) ( ) ( − ) 2 k − k
combining these inequalities shows
2 T µ x x0 y y0 x x0 y y0 x x0 k − k ≤ ( − ) ( − ) ≤ k − k∗k − k
now substitute x = f y and x = f y • ∇ ∗( ) 0 ∇ ∗( 0)
Conjugate functions 5.20 Outline
closed functions •
conjugate function •
duality • Duality
primal: minimize f x + g Ax ( ) ( ) T dual: maximize g∗ z f ∗ A z − ( ) − (− ) follows from Lagrange duality applied to reformulated primal • minimize f x + g y ( ) ( ) subject to Ax = y
dual function for the formulated problem is:
T T T inf f x + z Ax + g y z y = f ∗ A z g∗ z x,y ( ( ) ( ) − ) − (− ) − ( )
Slater’s condition (for convex f , g): strong duality holds if there exists an xˆ with • xˆ int dom f , Axˆ int dom g ∈ ∈ this also guarantees that the dual optimum is attained, if optimal value is finite
Conjugate functions 5.21 Set constraint
minimize f x ( ) subject to Ax b C − ∈
Primal and dual problem
primal: minimize f x + δ Ax b ( ) C( − ) T T dual: maximize b z δ∗ z f ∗ A z − − C( ) − (− )
Examples
constraint set C support function δ z C∗ ( ) equality Ax = b 0 0 { } norm inequality Ax b 1 unit -ball z k − k ≤ k · k k k∗ conic inequality Ax b K δ z K − K∗( )
Conjugate functions 5.22 Norm regularization
minimize f x + Ax b ( ) k − k
take g y = y b in general problem • ( ) k − k
minimize f x + g Ax ( ) ( )
conjugate of is indicator of unit ball for dual norm • k · k T g∗ z = b z + δB z where B = z z 1 ( ) ( ) { | k k∗ ≤ }
hence, dual problem can be written as • maximize bT z f AT z − − ∗(− ) subject to z 1 k k∗ ≤
Conjugate functions 5.23 Optimality conditions
minimize f x + g y ( ) ( ) subject to Ax = y assume f , g are convex and Slater’s condition holds
Optimality conditions: x is optimal if and only if there exists a z such that
1. primal feasibility: x dom f and y = Ax dom g ∈ ∈ 2. x and y = Ax are minimizers of the Lagrangian f x + zT Ax + g y zT y: ( ) ( ) − AT z ∂ f x , z ∂g Ax − ∈ ( ) ∈ ( )
if g is closed, this can be written symmetrically as
T A z ∂ f x , Ax ∂g∗ z − ∈ ( ) ∈ ( )
Conjugate functions 5.24 References
J.-B. Hiriart-Urruty, C. Lemaréchal, Convex Analysis and Minimization • Algoritms (1993), chapter X.
D.P. Bertsekas, A. Nedić, A.E. Ozdaglar, Convex Analysis and Optimization • (2003), chapter 7.
R. T. Rockafellar, Convex Analysis (1970). •
Conjugate functions 5.25