
CONVEX FUNCTIONS

CMSC764 SP16 CONVEX SETS

Def: A set $C$ is convex if every line segment between two points of $C$ stays in the set: $\theta x_1 + (1-\theta)x_2 \in C$ for all $0 \le \theta \le 1$.

More generally: all convex combinations lie in the set.

$\sum_i \theta_i x_i = \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3, \qquad \sum_i \theta_i = \theta_1 + \theta_2 + \theta_3 = 1, \qquad \theta_i \ge 0 \;\; \forall i$

These definitions are the same!

IS IT CONVEX?

What if the square were round?

Figures from Boyd and Vandenberghe.
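A quick numerical sanity check of the definition above (my own numpy sketch, not from the slides; the half-space and points are arbitrary): convex combinations of two points in $\{x \mid a^T x \le b\}$ stay in the half-space.

    import numpy as np
    # two points in the half-space {x | a^T x <= b}
    a, b = np.array([1.0, 2.0]), 3.0
    x1, x2 = np.array([0.0, 0.0]), np.array([1.0, 1.0])
    for theta in np.linspace(0.0, 1.0, 11):
        x = theta * x1 + (1.0 - theta) * x2
        assert a @ x <= b + 1e-12  # every convex combination stays inside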

CONVEX HULL

All convex combinations of points in a set…

…it’s always the smallest convex superset.

IMPORTANT EXAMPLES

Are these convex?

Hyperplane: $\{x \mid a^T x = b\}$
Half-space: $\{x \mid a^T x \le b\}$
Sphere: $\{x \mid \|x - x_0\| = b\}$
Ball: $\{x \mid \|x - x_0\| \le b\}$
Polynomials: $\{f \mid f = \sum_i a_i x^i\}$

FUNCTIONS OVER NON-CONVEX DOMAIN

Convex?? $f : \Omega \to \mathbb{R}$

EXAMPLE: LINEAR STRESS IN FINITE ELEMENTS

The space of functions over a non-convex domain is still convex.

Biharmonic equation: $\min_{u:\Omega\to\mathbb{R}} \|\nabla^2 u\|^2 - \langle u, f \rangle$

UNIT BALL

$\{x \mid \|x\| \le b\}$

Convex?? Does it depend on which norm??

No! Because of the triangle inequality: $\|\theta x_1 + (1-\theta)x_2\| \le \theta\|x_1\| + (1-\theta)\|x_2\| \le b$.

CONES

$x \in C \implies ax \in C \;\; \forall a \ge 0$

Second-order cone: $C_2 = \{(x, t) \in \mathbb{R}^{n+1} \mid \|x\| \le t\}$

SEMIDEFINITE CONE

$S^n = \{A \in \mathbb{R}^{n\times n} \mid A = A^T,\; A \succeq 0\}$

Set of PSD matrices

Why is this a cone?
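A minimal numerical illustration of the cone property (my own sketch; the matrix is random, not from the slides): scaling a PSD matrix by any $a \ge 0$ keeps it PSD.

    import numpy as np
    rng = np.random.default_rng(0)
    B = rng.standard_normal((4, 4))
    A = B.T @ B  # B^T B is symmetric PSD by construction
    for a in [0.0, 0.5, 2.0, 10.0]:
        # all eigenvalues of a*A stay nonnegative, so a*A is still in the cone
        assert np.linalg.eigvalsh(a * A).min() >= -1e-10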

ALLOWED OPERATIONS

The intersection of convex sets is convex.

$\{x : Ax \le b\}$ Is it convex? Why?

SEMIDEFINITE CONE

$x^T A x \ge 0 \;\; \forall x$

$S_+ = \bigcap_{x \in \mathbb{R}^N} \{A \mid x^T A x \ge 0\}$

Convex???

OTHER ALLOWED OPERATIONS

Set sum: $A + B = \{x + y \mid x \in A,\; y \in B\}$

Set product: $A \times B = \{(x, y) \mid x \in A,\; y \in B\}$

What about union? $A \cup B$

CONVEX FUNCTIONS

Jensen’s Inequality: $f(\theta x + (1-\theta)y) \le \theta f(x) + (1-\theta)f(y)$

EPIGRAPH

$\mathrm{epi}(f) = \{(x, y) \mid f(x) \le y\}$

convex function $\iff$ convex epigraph

[figures: the epigraph of a convex function and of a non-convex function]
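A direct numerical Jensen check (my own sketch; the test points are arbitrary): sample $\theta \in [0,1]$ and test the inequality for a convex and a non-convex function.

    import numpy as np
    def jensen_ok(f, x, y, n=101):
        # test f(theta*x + (1-theta)*y) <= theta*f(x) + (1-theta)*f(y)
        t = np.linspace(0.0, 1.0, n)
        return bool(np.all(f(t * x + (1 - t) * y) <= t * f(x) + (1 - t) * f(y) + 1e-12))
    print(jensen_ok(np.square, -2.0, 3.0))  # True: x^2 is convex
    print(jensen_ok(np.sin, 0.0, np.pi))    # False: sin is not convex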

SOME DEFINITIONS

Theorem: “Any proper, closed function with bounded level sets has a minimizer.”

Proper: epigraph is non-empty

Coercive: $\|x\| \to \infty \implies f(x) \to \infty$

Bounded level sets: $\forall \alpha\; \exists r$ such that $f(x) \le \alpha \implies \|x\| \le r$

(e.g., $f(x) = -\ln(x)$ has unbounded level sets)

SOME DEFINITIONS

$\mathrm{epi}(f) = \{(x, y) \mid f(x) \le y\}$

Closed: epigraph is closed (equivalently, level sets are closed)

[figures: a closed, LSC function vs. an open, USC function]

Lower semi-continuous: $\forall \epsilon > 0\; \exists \delta > 0$ such that $|x - x_0| < \delta \implies f(x) \ge f(x_0) - \epsilon$

Proper + closed = LSC

WHY DO WE CARE ABOUT CLOSED FUNCTIONS?

Any closed function with bounded level sets has a minimizer (by compactness)

…but, for a convex function:

Any minimizer is global (why?)
The set of minima is convex (why?)

Therefore, we can find global minimizers.

WHY DO WE CARE ABOUT CONVEX FUNCTIONS?

Minimizers of convex problems form a convex set, so this can’t happen!

If you found one, you found them all!

CONVEX FUNCTIONS HAVE CONVEX SUB-LEVEL SETS

[figure: a convex $f(x)$ with convex contours; the sub-level set $\{x \mid f(x) \le \alpha\}$ is an interval]

QUASI-CONVEX

Non-convex problems can still have this property… convex contours = quasi-convex < convex

$f(x) = \log(|x| + .1)$ vs. $f(x) = |x| + .1$

[figures: both functions have convex sub-level sets, but only $|x| + .1$ is convex]

The function $\log(|x| + .1)$ is “log-convex”.
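A sketch of the distinction (my own numpy check; the test points and grid are arbitrary): $f(x) = \log(|x| + .1)$ fails Jensen’s inequality, but its sub-level sets are still single intervals.

    import numpy as np
    f = lambda x: np.log(np.abs(x) + 0.1)
    # Jensen fails at the midpoint of x = 0.1 and y = 1.9, so f is not convex
    x, y, t = 0.1, 1.9, 0.5
    print(f(t * x + (1 - t) * y) <= t * f(x) + (1 - t) * f(y))  # False
    # but the sub-level set {x | f(x) <= 0} sampled on a grid is one interval
    grid = np.linspace(-2.0, 2.0, 4001)
    inside = f(grid) <= 0.0
    print(np.count_nonzero(np.diff(inside.astype(int))) <= 2)   # True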

WHY DO WE CARE ABOUT CONVEX FUNCTIONS?

Any minimizer is global (why?)
The set of minima is convex (why?)

Therefore, we can find global minimizers

But why convex?

Convex functions are closed under many operations.

POSITIVE WEIGHTED SUM

Sums of convex functions are convex

$g(x) = \sum_i f_i(x)$

Example: $f(x) = \|x\|^2 = \sum_i x_i^2$

Example: $f(x) = x + x^2 + x^6 + |x|$

AFFINE COMPOSITION

Affine composition with convex function = convex

$f(x)$ convex $\implies f(Ax + b)$ convex

What does this say about epigraphs?
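For instance (my own sketch with random data, not from the slides): least squares is $\|\cdot\|^2$ composed with the affine map $x \mapsto Ax - b$, and Jensen’s inequality indeed holds for it.

    import numpy as np
    rng = np.random.default_rng(1)
    A, b = rng.standard_normal((5, 3)), rng.standard_normal(5)
    f = lambda x: np.sum((A @ x - b) ** 2)  # ||.||^2 after an affine map
    x, y, t = rng.standard_normal(3), rng.standard_normal(3), 0.3
    assert f(t * x + (1 - t) * y) <= t * f(x) + (1 - t) * f(y) + 1e-9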

EXAMPLES

Least squares: $\|Ax - b\|^2 = \sum_i (a_i^T x - b_i)^2$

SVM: $\frac{1}{2}\|w\|^2 + C \sum_i h(1 - d_i^T w)$

Logistic regression: $\sum_i \log(1 + \exp(-\ell_i d_i^T x))$

POINTWISE-MAX preserves convexity

$g(x) = \max_i f_i(x)$

Absolute value: $|x| = \max\{x, -x\}$

Infinity norm: $\|x\|_\infty = \max_i |x_i|$

Max eigenvalue: $\|A\|_2 = \max_{\|v\| = 1} v^T A v$

What does this say about epigraphs?

WHY ARE THESE CONVEX?

Trace: $f(X) = \mathrm{trace}(A^T X)$ (a linear operator)

Distance over a set: $f(x) = \max_{y \in C} \|x - y\|$ (max over a convex set)

Distance to a set: $f(x) = \min_{y \in C} \|x - y\|$ (min of a convex function; a special case)

Max eigenvalue: $f(x) = \|b + \sum_i A_i x_i\|_2$ (affine composition)

If $g(x, y)$ is convex, then minimizing over $y$ preserves convexity
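A concrete instance of the min rule (my own sketch, not from the slides): the distance to the unit 2-norm ball, $\min_{\|y\|\le 1}\|x - y\| = \max(\|x\| - 1, 0)$, remains convex in $x$.

    import numpy as np
    # distance to the unit 2-norm ball has a closed form: max(||x|| - 1, 0)
    dist = lambda x: max(np.linalg.norm(x) - 1.0, 0.0)
    rng = np.random.default_rng(2)
    x, y, t = rng.standard_normal(3), rng.standard_normal(3), 0.7
    assert dist(t * x + (1 - t) * y) <= t * dist(x) + (1 - t) * dist(y) + 1e-12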

WHY ARE THESE NON-CONVEX?

Neural net: $y = \sigma(X_3\,\sigma(X_2\,\sigma(X_1 D)))$ (composition with inner maps that are convex but NOT affine)

Dictionary learning: $f(X, Y) = \|XY - B\|_{\mathrm{fro}}$ (the product $XY$ is not affine in $(X, Y)$)

DIFFERENTIAL PROPERTIES

IMPORTANT PROPERTY

Convex functions lie ABOVE their linear approximation.

OPTIMALITY CONDITIONS

First-order conditions = optimality (for convex functions only):

$f(y) \ge f(x) + \nabla f(x)^T (y - x)$

$\nabla f(x) = 0 \implies f(y) \ge f(x)$

This doesn’t happen!!

Any minimum is a global minimum.

SECOND-ORDER CONDITIONS

A smooth function is convex iff $\nabla^2 f(x) \succeq 0 \;\; \forall x$

For non-convex functions, minima satisfy: $\nabla^2 f(x^\star) \succeq 0$
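A numerical check of both conditions on a convex quadratic (my own sketch; $Q$ is a random PSD Hessian): the function lies above its tangent planes.

    import numpy as np
    rng = np.random.default_rng(3)
    B = rng.standard_normal((3, 3))
    Q = B.T @ B                      # PSD Hessian, so f below is convex
    f = lambda x: 0.5 * x @ Q @ x
    grad = lambda x: Q @ x
    x, y = rng.standard_normal(3), rng.standard_normal(3)
    assert f(y) >= f(x) + grad(x) @ (y - x) - 1e-9  # above the tangent plane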

Remember: the Hessian is a good local model of a smooth function.

STRONG CONVEXITY

What about when there’s no Hessian?

$f(y) \ge f(x) + (y - x)^T \nabla f(x) + \frac{m}{2}\|y - x\|^2$

holds for any convex $f$ with $m$ = min curvature (for a merely convex $f$, $m$ may be $0$)

When the Hessian exists…

$f(y) \approx f(x) + (y-x)^T \nabla f(x) + \frac{1}{2}(y-x)^T \nabla^2 f(x)\,(y-x) \ge f(x) + (y-x)^T \nabla f(x) + \frac{\lambda_{\min}}{2}\|y-x\|^2$

…and so $\nabla^2 f(x) \succeq mI$

UPPER BOUND ON CURVATURE

Lower: $f(y) \ge f(x) + (y-x)^T \nabla f(x) + \frac{m}{2}\|y-x\|^2$

Upper: $f(y) \le f(x) + (y-x)^T \nabla f(x) + \frac{M}{2}\|y-x\|^2$

$f(y) \approx f(x) + (y-x)^T \nabla f(x) + \frac{1}{2}(y-x)^T \nabla^2 f(x)\,(y-x) \le f(x) + (y-x)^T \nabla f(x) + \frac{\lambda_{\max}}{2}\|y-x\|^2$
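For a quadratic the two constants are explicit (my own sketch with a random matrix): $m = \lambda_{\min}(Q)$ and $M = \lambda_{\max}(Q)$ sandwich $f$ between the two paraboloids.

    import numpy as np
    rng = np.random.default_rng(4)
    B = rng.standard_normal((4, 4))
    Q = B.T @ B
    m, M = np.linalg.eigvalsh(Q)[[0, -1]]  # min and max curvature
    f = lambda x: 0.5 * x @ Q @ x
    x, y = rng.standard_normal(4), rng.standard_normal(4)
    d, g = y - x, Q @ x
    assert f(x) + g @ d + 0.5 * m * d @ d <= f(y) + 1e-9
    assert f(y) <= f(x) + g @ d + 0.5 * M * d @ d + 1e-9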

LIPSCHITZ CONSTANT

$\|\nabla f(x) - \nabla f(y)\| \le M \|y - x\|$

Upper: $f(y) \le f(x) + (y-x)^T \nabla f(x) + \frac{M}{2}\|y - x\|^2$

We use this when there’s no Hessian (e.g., the Huber function).

OBJECTIVE ERROR BOUNDS

Lower: $f(y) \ge f(x) + (y-x)^T \nabla f(x) + \frac{m}{2}\|y-x\|^2$

Upper: $f(y) \le f(x) + (y-x)^T \nabla f(x) + \frac{M}{2}\|y-x\|^2$

We can bound the objective error in terms of distance from the minimizer:

$f(y) - f(x^\star) \ge \frac{m}{2}\|y - x^\star\|^2$

$f(y) - f(x^\star) \le \frac{M}{2}\|y - x^\star\|^2$

CONDITION NUMBER

For any function: $\kappa = \dfrac{\text{major axis}}{\text{minor axis}}$ of the level sets
For smooth functions: $\kappa \approx \mathrm{cond}(\nabla^2 f(x))$
For differentiable functions: $\kappa = M/m$

[figures: level sets with $\kappa = 1.5$ and $\kappa = 5$]

SUB-DIFFERENTIAL

$\partial f(x) = \{g : f(y) \ge f(x) + (y - x)^T g, \;\; \forall y\}$

Optimality: $0 \in \partial f(x^\star)$

EXAMPLE

For $f(x) = |x|$: $\partial f(0) = [-1, 1]$
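A numerical version of this example (my own sketch; the grids are arbitrary): every $g \in [-1, 1]$ satisfies the subgradient inequality for $|x|$ at $0$.

    import numpy as np
    ys = np.linspace(-3.0, 3.0, 61)
    for g in np.linspace(-1.0, 1.0, 21):
        # |y| >= |0| + g*(y - 0) for all y, so g is a valid subgradient at 0
        assert np.all(np.abs(ys) >= g * ys - 1e-12)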

MONOTONICITY

The (sub)gradient of any convex function is monotone:

$\langle y - x,\; \nabla f(y) - \nabla f(x)\rangle \ge 0$

or $\langle y - x,\; g_y - g_x\rangle \ge 0$ for any $g_x \in \partial f(x)$ and $g_y \in \partial f(y)$

This generalizes the concept of a PSD Hessian: for a quadratic, $(y-x)^T(\nabla f(y) - \nabla f(x)) = (y-x)^T H (y-x) \ge 0$
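For a quadratic this is transparent (my own sketch with a random PSD matrix): the gradient difference is $Q(y-x)$, so the inner product is a PSD quadratic form.

    import numpy as np
    rng = np.random.default_rng(5)
    B = rng.standard_normal((3, 3))
    Q = B.T @ B                                  # PSD Hessian
    grad = lambda x: Q @ x
    x, y = rng.standard_normal(3), rng.standard_normal(3)
    # <y - x, grad(y) - grad(x)> = (y-x)^T Q (y-x) >= 0
    assert (y - x) @ (grad(y) - grad(x)) >= -1e-12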

CONJUGATE FUNCTION

$f^*(y) = \max_x\; y^T x - f(x)$

Is it convex?

EXAMPLE: QUADRATIC

$f(x) = \frac{1}{2} x^T Q x$

$f^*(y) = \max_x\; y^T x - \frac{1}{2} x^T Q x$

$y - Q x^\star = 0 \implies x^\star = Q^{-1} y$

$f^*(y) = y^T Q^{-1} y - \frac{1}{2} y^T Q^{-1} Q\, Q^{-1} y = \frac{1}{2} y^T Q^{-1} y$
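A numerical confirmation (my own sketch; $Q$ is made positive definite so $Q^{-1}$ exists): plugging the maximizer $x^\star = Q^{-1}y$ into $y^T x - f(x)$ gives $\frac{1}{2} y^T Q^{-1} y$.

    import numpy as np
    rng = np.random.default_rng(6)
    B = rng.standard_normal((3, 3))
    Q = B.T @ B + np.eye(3)                     # positive definite
    y = rng.standard_normal(3)
    x_star = np.linalg.solve(Q, y)              # maximizer x* = Q^{-1} y
    conj = y @ x_star - 0.5 * x_star @ Q @ x_star
    assert np.isclose(conj, 0.5 * y @ np.linalg.solve(Q, y))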

CONJUGATE OF NORM

$f(x) = \|x\|, \qquad f^*(y) = \max_x\; y^T x - \|x\|$

Dual norm: $\|y\|_* = \max_x\; y^T x / \|x\|$

Hölder inequality: $y^T x \le \|y\|_* \|x\|$

$\|y\|_* \le 1 \implies f^* = 0$ (why?)

$\|y\|_* > 1 \implies f^* = \infty$ (why?)

$f^*(y) = \begin{cases} 0, & \|y\|_* \le 1 \\ \infty, & \text{otherwise} \end{cases}$

EXAMPLES

Hölder inequality for p-norms: $|\langle x, y\rangle| \le \|x\|_p \|y\|_q, \quad \frac{1}{p} + \frac{1}{q} = 1$

$f(x) = \|x\|_2, \qquad f^*(y) = \chi_2(y) = \begin{cases} 0, & \|y\|_2 \le 1 \\ \infty, & \|y\|_2 > 1 \end{cases}$

$f(x) = \|x\|_1, \qquad f^*(y) = \chi_\infty(y) = \begin{cases} 0, & \|y\|_\infty \le 1 \\ \infty, & \|y\|_\infty > 1 \end{cases}$

$f(x) = \|x\|_\infty, \qquad f^*(y) = \chi_1(y) = \begin{cases} 0, & \|y\|_1 \le 1 \\ \infty, & \|y\|_1 > 1 \end{cases}$

Useful when we study…

PROPERTIES OF CONJUGATE

$f^*(y) = \max_x\; y^T x - f(x)$

$x$ is the point where $f$ has gradient $y$ (why?): $y \in \partial f(x)$

The gradient of the conjugate = the “adjoint” of the gradient (why?)

$y = \nabla f(x) \iff \nabla f^*(y) = x$

…and in general: $y \in \partial f(x) \iff x \in \partial f^*(y)$

Also important for convergence proofs!
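Checking the gradient relation on the quadratic example above (my own sketch): $\nabla f(x) = Qx$ and $\nabla f^*(y) = Q^{-1}y$ are inverse maps.

    import numpy as np
    rng = np.random.default_rng(7)
    B = rng.standard_normal((3, 3))
    Q = B.T @ B + np.eye(3)                       # positive definite
    x = rng.standard_normal(3)
    y = Q @ x                                     # y = grad f(x)
    assert np.allclose(np.linalg.solve(Q, y), x)  # grad f*(y) returns x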