<<

L. Vandenberghe ECE236C (Spring 2020) 5. Conjugate functions

closed functions •

conjugate

5.1 Closed a set C is closed if it contains its boundary:

x C, x x¯ = x¯ C k ∈ k → ⇒ ∈

Operations that preserve closedness

the intersection of (finitely or infinitely many) closed sets is closed •

the union of a finite number of closed sets is closed • inverse under linear mapping: x Ax C is closed if C is closed • { | ∈ }

Conjugate functions 5.2 Image under linear mapping the image of a closed set under a linear mapping is not necessarily closed

Example

2   C = x , x R x x 1 , A = 1 0 , AC = R++ {( 1 2) ∈ + | 1 2 ≥ }

Sufficient condition: AC is closed if

C is closed and convex • and C does not have a recession direction in the nullspace of A, i.e., •

Ay = 0, xˆ C, xˆ + αy C for all α 0 = y = 0 ∈ ∈ ≥ ⇒

in particular, this holds for any matrix A if C is bounded

Conjugate functions 5.3 Closed function

Definition: a function is closed if its is a closed set

Examples

f x = log 1 x2 with dom f = x x < 1 • ( ) − ( − ) { | | | } f x = x log x with dom f = R+ and f 0 = 0 • ( ) ( ) indicator function of a closed set C: •  0 x C δ x = ∈ C( ) + otherwise ∞

Not closed

f x = x log x with dom f = R++, or with dom f = R+ and f 0 = 1 • ( ) ( ) indicator function of a set C if C is not closed •

Conjugate functions 5.4 Properties

Sublevel sets: f is closed if and only if all its sublevel sets are closed

Minimum: if f is closed with bounded sublevel sets then it has a minimizer

Common operations on convex functions that preserve closedness

sum: f = f + f is closed if f and f are closed • 1 2 1 2 composition with affine mapping: f = g Ax + b is closed if g is closed • ( ) supremum: f x = sup fα x is closed if each function fα is closed • ( ) α ( ) in each case, we assume dom f , ∅

Conjugate functions 5.5 Outline

closed functions •

conjugate function •

duality • Conjugate function the conjugate of a function f is

T f ∗ y = sup y x f x ( ) x dom f ( − ( )) ∈

f ∗ is closed and convex (even when f is not)

Fenchel’s inequality: the definition implies that

T f x + f ∗ y x y for all x, y ( ) ( ) ≥ this is an extension to non-quadratic convex f of the inequality

1 1 xT x + yT y xT y 2 2 ≥

Conjugate functions 5.6

1 f x = xT Ax + bT x + c ( ) 2

Strictly convex case (A 0)

1 T 1 f ∗ y = y b A− y b c ( ) 2( − ) ( − ) −

General convex case (A 0) 

1 T f ∗ y = y b A† y b c, dom f ∗ = range A + b ( ) 2( − ) ( − ) − ( )

Conjugate functions 5.7 Negative entropy and negative logarithm

Negative entropy

n n X X yi 1 f x = xi log xi f ∗ y = e − ( ) i=1 ( ) i=1

Negative logarithm

Xn Xn f x = log xi f ∗ y = log yi n ( ) − i=1 ( ) − i=1 (− ) −

Matrix logarithm

n f X = log det X dom f = S f ∗ Y = log det Y n ( ) − ( ++) ( ) − (− ) −

Conjugate functions 5.8 Indicator function and

Indicator of C: conjugate is the support function of C

 0 x C T δC x = ∈ δC∗ y = sup y x ( ) + x < C ( ) x C ∞ ∈

Indicator of convex cone C: conjugate is indicator of polar (negative dual) cone

 0 yT x 0 x C δ∗ y = δ C y = δC y = ≤ ∀ ∈ C( ) − ∗( ) ∗(− ) + otherwise ∞

Norm: conjugate is indicator of unit ball for

 0 y 1 f x = x f ∗ y = k k∗ ≤ ( ) k k ( ) + y > 1 ∞ k k∗ (see next page)

Conjugate functions 5.9 Proof: recall the definition of dual norm

y = sup xT y k k∗ x 1 k k≤ to evaluate f y = sup yT x x we distinguish two cases ∗( ) x ( − k k)

if y 1, then (by definition of dual norm) • k k∗ ≤

yT x x for all x ≤ k k

and equality holds if x = 0; therefore sup yT x x = 0 x ( − k k) if y > 1, there exists an x with x 1, xT y > 1; then • k k∗ k k ≤ T T f ∗ y y tx tx = t y x x ( ) ≥ ( ) − k k ( − k k)

and right-hand side goes to infinity if t → ∞

Conjugate functions 5.10 Calculus rules

Separable sum

f x , x = g x + h x f ∗ y , y = g∗ y + h∗ y ( 1 2) ( 1) ( 2) ( 1 2) ( 1) ( 2)

Scalar multiplication (α > 0)

f x = αg x f y = αg y α ( ) ( ) ∗( ) ∗( / ) f x = αg x α f y = αg y ( ) ( / ) ∗( ) ∗( )

the operation f x = αg x α is sometimes called “right scalar multiplication” • ( ) ( / ) a convenient notation is f = gα for the function gα x = αg x α • ( )( ) ( / ) conjugates can be written concisely as gα = αg and αg = g α • ( )∗ ∗ ( )∗ ∗

Conjugate functions 5.11 Calculus rules

Addition to affine function

T f x = g x + a x + b f ∗ y = g∗ y a b ( ) ( ) ( ) ( − ) −

Translation of argument

T f x = g x b f ∗ y = b y + g∗ y ( ) ( − ) ( ) ( )

Composition with invertible linear mapping (A square and nonsingular)

T f x = g Ax f ∗ y = g∗ A− y ( ) ( ) ( ) ( )

Infimal convolution

f x = inf g u + h v f ∗ y = g∗ y + h∗ y ( ) u+v=x ( ( ) ( )) ( ) ( ) ( )

Conjugate functions 5.12 The second conjugate

T f ∗∗ x = sup x y f ∗ y ( ) y dom f ( − ( )) ∈ ∗

f is closed and convex • ∗∗ from Fenchel’s inequality, xT y f y f x for all y and x; therefore • − ∗( ) ≤ ( )

f ∗∗ x f x for all x ( ) ≤ ( )

equivalently, epi f epi f (for any f ) ⊆ ∗∗ if f is closed and convex, then •

f ∗∗ x = f x for all x ( ) ( )

equivalently, epi f = epi f ∗∗ (if f is closed and convex); proof on next page

Conjugate functions 5.13 Proof (by contradiction): assume f is closed and convex, and epi f ∗∗ , epi f suppose x, f x < epi f ; then there is a strict separating hyperplane: ( ∗∗( ))  a T  z x  − c < 0 for all z, s epi f b s f x ≤ ( ) ∈ − ∗∗( ) for some a, b, c with b 0 (b > 0 gives a contradiction as s ) ≤ → ∞

if b < 0, define y = a b and maximize left-hand side over z, s epi f : • /(− ) ( ) ∈ T f ∗ y y x + f ∗∗ x c b < 0 ( ) − ( ) ≤ /(− ) this contradicts Fenchel’s inequality if b = 0, choose yˆ dom f and add small multiple of yˆ, 1 to a, b : • ∈ ∗ ( − ) ( )  T   a +  yˆ z x  T  − c +  f ∗ yˆ x yˆ + f ∗∗ x < 0  s f x ≤ ( ) − ( ) − − ∗∗( ) now apply the argument for b < 0

Conjugate functions 5.14 Conjugates and subgradients if f is closed and convex, then

T y ∂ f x x ∂ f ∗ y x y = f x + f ∗ y ∈ ( ) ⇐⇒ ∈ ( ) ⇐⇒ ( ) ( )

Proof. if y ∂ f x , then f y = sup yTu f u = yT x f x ; hence ∈ ( ) ∗( ) u ( − ( )) − ( ) T f ∗ v = sup v u f u ( ) u ( − ( )) vT x f x ≥ − ( ) = xT v y f x + yT x ( − ) − ( ) T = f ∗ y + x v y ( ) ( − ) this holds for all v; therefore, x ∂ f y ∈ ∗( ) reverse implication x ∂ f y = y ∂ f x follows from f = f ∈ ∗( ) ⇒ ∈ ( ) ∗∗

Conjugate functions 5.15 Example

f x = δ 1,1 x f ∗ y = y ( ) [− ]( ) ( ) | |

x y 1 1 −

∂ f x ∂ f y ( ) ∗( )

1

1 − x y 1

1 −

Conjugate functions 5.16 Strongly

Definition (page 1.18) f is µ-strongly convex (for ) if dom f is convex and k · k µ f θx + 1 θ y θ f x + 1 θ f y θ 1 θ x y 2 ( ( − ) ) ≤ ( ) ( − ) ( ) − 2 ( − )k − k for all x, y dom f and θ 0,1 ∈ ∈ [ ]

First-order condition

if f is µ-strongly convex, then • µ f y f x + gT y x + y x 2 for all x, y dom f , g ∂ f x ( ) ≥ ( ) ( − ) 2 k − k ∈ ∈ ( )

for differentiable f this is the inequality (4) on page 1.19 •

Conjugate functions 5.17 Proof

recall the definition of directional (page 2.28 and 2.29): • f x + θ y x f x f 0 x, y x = inf ( ( − )) − ( ) ( − ) θ>0 θ and the infimum is approached as θ 0 → if f is µ-strongly convex and subdifferentiable at x, then for all y dom f , • ∈ 1 θ f x + θ f y µ 2 θ 1 θ y x 2 f x f 0 x, y x inf ( − ) ( ) ( ) − ( / ) ( − )k − k − ( ) ( − ) ≤ θ 0,1 θ ∈( ] µ = f y f x y x 2 ( ) − ( ) − 2 k − k

from page 2.31, the directional derivative is the support function of ∂ f x : • ( ) gT y x sup g˜T y x ( − ) ≤ g˜ ∂ f x ( − ) ∈ ( ) = f 0 x; y x ( − ) µ f y f x y x 2 ≤ ( ) − ( ) − 2 k − k Conjugate functions 5.18 Conjugate of strongly convex function assume f is closed and strongly convex with parameter µ > 0 for the norm k · k

f is defined for all y (i.e., dom f = Rn • ∗ ∗ ) f is differentiable everywhere, with gradient • ∗ T f ∗ y = argmax y x f x ∇ ( ) x ( − ( ))

f ∗ is Lipschitz continuous with constant 1 µ for the dual norm : •∇ / k · k∗ 1 f ∗ y f ∗ y0 y y0 for all y and y0 k∇ ( ) − ∇ ( )k ≤ µk − k∗

Conjugate functions 5.19 Proof: if f is strongly convex and closed

yT x f x has a unique maximizer x for every y • − ( ) x maximizes yT x f x if and only if y ∂ f x ; from page 5.15 • − ( ) ∈ ( )

y ∂ f x x ∂ f ∗ y = f ∗ y ∈ ( ) ⇐⇒ ∈ ( ) {∇ ( )} hence f y = argmax yT x f x ∇ ∗( ) x ( − ( )) from first-order condition on page 5.17: if y ∂ f x , y ∂ f x : • ∈ ( ) 0 ∈ ( 0) T µ 2 f x0 f x + y x0 x + x0 x ( ) ≥ ( ) ( − ) 2 k − k T µ 2 f x f x0 + y0 x x0 + x0 x ( ) ≥ ( ) ( ) ( − ) 2 k − k

combining these inequalities shows

2 T µ x x0 y y0 x x0 y y0 x x0 k − k ≤ ( − ) ( − ) ≤ k − k∗k − k

now substitute x = f y and x = f y • ∇ ∗( ) 0 ∇ ∗( 0)

Conjugate functions 5.20 Outline

closed functions •

conjugate function •

duality • Duality

primal: minimize f x + g Ax ( ) ( ) T dual: maximize g∗ z f ∗ A z − ( ) − (− ) follows from Lagrange duality applied to reformulated primal • minimize f x + g y ( ) ( ) subject to Ax = y

dual function for the formulated problem is:

T T T inf f x + z Ax + g y z y = f ∗ A z g∗ z x,y ( ( ) ( ) − ) − (− ) − ( )

Slater’s condition (for convex f , g): holds if there exists an xˆ with • xˆ int dom f , Axˆ int dom g ∈ ∈ this also guarantees that the dual optimum is attained, if optimal value is finite

Conjugate functions 5.21 Set constraint

minimize f x ( ) subject to Ax b C − ∈

Primal and dual problem

primal: minimize f x + δ Ax b ( ) C( − ) T T dual: maximize b z δ∗ z f ∗ A z − − C( ) − (− )

Examples

constraint set C support function δ z C∗ ( ) equality Ax = b 0 0 { } norm inequality Ax b 1 unit -ball z k − k ≤ k · k k k∗ conic inequality Ax b K δ z K − K∗( )

Conjugate functions 5.22 Norm regularization

minimize f x + Ax b ( ) k − k

take g y = y b in general problem • ( ) k − k

minimize f x + g Ax ( ) ( )

conjugate of is indicator of unit ball for dual norm • k · k T g∗ z = b z + δB z where B = z z 1 ( ) ( ) { | k k∗ ≤ }

hence, dual problem can be written as • maximize bT z f AT z − − ∗(− ) subject to z 1 k k∗ ≤

Conjugate functions 5.23 Optimality conditions

minimize f x + g y ( ) ( ) subject to Ax = y assume f , g are convex and Slater’s condition holds

Optimality conditions: x is optimal if and only if there exists a z such that

1. primal feasibility: x dom f and y = Ax dom g ∈ ∈ 2. x and y = Ax are minimizers of the Lagrangian f x + zT Ax + g y zT y: ( ) ( ) − AT z ∂ f x , z ∂g Ax − ∈ ( ) ∈ ( )

if g is closed, this can be written symmetrically as

T A z ∂ f x , Ax ∂g∗ z − ∈ ( ) ∈ ( )

Conjugate functions 5.24 References

J.-B. Hiriart-Urruty, C. Lemaréchal, and Minimization • Algoritms (1993), chapter X.

D.P. Bertsekas, A. Nedić, A.E. Ozdaglar, Convex Analysis and Optimization • (2003), chapter 7.

R. T. Rockafellar, Convex Analysis (1970). •

Conjugate functions 5.25