Lecture 3: Convex Functions

Xiugang Wu

University of Delaware

Fall 2019

• Basic Properties and Examples
• Operations that Preserve Convexity
• The Conjugate
• Quasiconvex Functions
• Log-concave and Log-convex Functions
• Convexity with respect to Generalized Inequalities

A function $f : \mathbf{R}^n \to \mathbf{R}$ is convex if $\operatorname{dom} f$ is a convex set and

$$f(\theta x + (1-\theta) y) \le \theta f(x) + (1-\theta) f(y)$$

for all $x, y \in \operatorname{dom} f$ and $\theta \in [0, 1]$.

- f is strictly convex if the above holds with "$\le$" replaced by "$<$" whenever $x \ne y$ and $\theta \in (0, 1)$
- f is concave if $-f$ is convex
- affine functions are both convex and concave; conversely, if a function is both convex and concave, then it is affine
- f is convex iff it is convex when restricted to any line that intersects its domain, i.e., iff for all $x \in \operatorname{dom} f$ and all $v$, the function $g(t) = f(x + tv)$ on its domain $\{t \mid x + tv \in \operatorname{dom} f\}$ is convex

Extended-Value Extension

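- The extended-value extension $\tilde f$ of $f$ is defined by

$$\tilde f(x) = \begin{cases} f(x), & x \in \operatorname{dom} f \\ \infty, & x \notin \operatorname{dom} f \end{cases}$$

- This simplifies notation, as f is convex iff

$$\tilde f(\theta x + (1-\theta) y) \le \theta \tilde f(x) + (1-\theta) \tilde f(y), \quad \forall x, y \in \mathbf{R}^n, \ \theta \in [0, 1]$$

- We can recover the domain of f from the extension by taking $\operatorname{dom} f = \{x \mid \tilde f(x) < \infty\}$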
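The extended-value convention can be tried out numerically. The following is a minimal sketch, assuming the illustrative choice $f(x) = -\log x$ with $\operatorname{dom} f = (0, \infty)$; the helper names `f_ext`, `satisfies_convexity` and `in_domain` are hypothetical and not part of the lecture.

```python
import math

def f_ext(x):
    """Extended-value extension of f(x) = -log(x), dom f = (0, inf)."""
    return -math.log(x) if x > 0 else math.inf

def satisfies_convexity(f, x, y, theta, tol=1e-12):
    """Check f(theta*x + (1-theta)*y) <= theta*f(x) + (1-theta)*f(y).

    Terms with zero weight are skipped, i.e. we use the convention 0 * inf = 0.
    """
    lhs = f(theta * x + (1 - theta) * y)
    terms = [(theta, f(x)), (1 - theta, f(y))]
    rhs = sum(w * v for w, v in terms if w > 0)
    return lhs <= rhs + tol

def in_domain(f, x):
    """Recover membership in dom f from the extension: f_ext(x) < inf."""
    return f(x) < math.inf
```

With this convention the inequality can be checked for arbitrary real x and y, without first testing whether they lie in the domain.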

First Order Condition

- f is differentiable if dom f is open and the gradient

$$\nabla f(x) = \left( \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \ldots, \frac{\partial f}{\partial x_n} \right)$$

exists at each $x \in \operatorname{dom} f$

- differentiable f with convex domain is convex iff

$$f(y) \ge f(x) + \nabla f(x)^T (y - x), \quad \forall x, y \in \operatorname{dom} f$$

- Note that the affine function $g(y) = f(x) + \nabla f(x)^T (y - x)$ is the first-order Taylor approximation of f near x. The above says that f is convex iff the first-order Taylor approximation is always an underestimator of f.
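The underestimator property can be illustrated on a concrete function. This is a small sketch, assuming the example $f(x) = x^T x$ with gradient $2x$ (chosen here for illustration, not taken from the slide); for this f the gap $f(y) - f(x) - \nabla f(x)^T (y - x)$ equals $\|y - x\|_2^2$, which is nonnegative.

```python
def f(x):
    """f(x) = x^T x for a vector given as a list of floats."""
    return sum(xi * xi for xi in x)

def grad_f(x):
    """Gradient of f(x) = x^T x, namely 2x."""
    return [2.0 * xi for xi in x]

def taylor_lower_bound(x, y):
    """First-order Taylor approximation of f around x, evaluated at y."""
    g = grad_f(x)
    return f(x) + sum(gi * (yi - xi) for gi, xi, yi in zip(g, x, y))

def first_order_gap(x, y):
    """f(y) minus its first-order approximation around x (>= 0 for convex f)."""
    return f(y) - taylor_lower_bound(x, y)
```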

Second Order Condition

- f is twice differentiable if dom f is open and the Hessian $\nabla^2 f(x) \in \mathbf{S}^n$,

$$\nabla^2 f(x)_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}, \quad i, j \in [1:n],$$

exists at each $x \in \operatorname{dom} f$

- twice differentiable f with convex domain is convex iff the Hessian $\nabla^2 f(x)$ is PSD for all $x \in \operatorname{dom} f$
- twice differentiable f with convex domain is strictly convex if the Hessian $\nabla^2 f(x)$ is PD for all $x \in \operatorname{dom} f$
- $f(x) = \frac{1}{2} x^T P x + q^T x + r$ with $P \in \mathbf{S}^n$ is convex iff its Hessian P is PSD [Note that $\nabla (x^T P x) = 2Px$ and $\nabla^2 (x^T P x) = 2P$]
- Least-squares objective $f(x) = \|Ax - b\|_2^2$ is convex because $\nabla f = 2A^T (Ax - b)$ and $\nabla^2 f = 2A^T A \succeq 0$ for any A
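The least-squares claim can be spot-checked by forming the Hessian $2A^T A$ and evaluating the quadratic form $v^T (2A^T A) v = 2\|Av\|_2^2$. The sketch below uses plain Python lists; the function names are hypothetical, and a single nonnegative quadratic form is only a sanity check, not a proof of positive semidefiniteness.

```python
def hessian_least_squares(A):
    """Return 2 * A^T A for a matrix A given as a list of rows."""
    m, n = len(A), len(A[0])
    return [[2.0 * sum(A[k][i] * A[k][j] for k in range(m)) for j in range(n)]
            for i in range(n)]

def quad_form(H, v):
    """Compute v^T H v for a square matrix H and vector v."""
    n = len(v)
    return sum(v[i] * H[i][j] * v[j] for i in range(n) for j in range(n))
```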

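Examples in One Dimension

Convex
- affine: $ax + b$ on $\mathbf{R}$, for any $a, b \in \mathbf{R}$
- exponential: $e^{ax}$ on $\mathbf{R}$, for any $a \in \mathbf{R}$
- powers: $x^a$ on $\mathbf{R}_{++}$, for $a \ge 1$ or $a \le 0$
- powers of absolute value: $|x|^a$ on $\mathbf{R}$, for $a \ge 1$
- negative entropy: $x \log x$ on $\mathbf{R}_{++}$

Concave
- affine: $ax + b$ on $\mathbf{R}$, for any $a, b \in \mathbf{R}$
- powers: $x^a$ on $\mathbf{R}_{++}$, for $a \in [0, 1]$
- logarithm: $\log x$ on $\mathbf{R}_{++}$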
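A quick numerical way to tell the convex examples from the concave ones is to test the midpoint inequality on a grid. This is only a sketch: the grid `POINTS` and the helper names are arbitrary illustrative choices, and passing the test on finitely many points does not prove convexity (failing it does disprove it).

```python
import math

def midpoint_convex_on(f, points, tol=1e-12):
    """Check f((x+y)/2) <= (f(x)+f(y))/2 for all pairs of sample points."""
    return all(f(0.5 * (x + y)) <= 0.5 * (f(x) + f(y)) + tol
               for x in points for y in points)

# Sample points in R_++ (an arbitrary illustrative grid).
POINTS = [0.1 * k for k in range(1, 51)]

def negative_entropy(x):
    """x log x on R_++."""
    return x * math.log(x)
```

For instance, `midpoint_convex_on(negative_entropy, POINTS)` holds, while the same check applied to `math.log` fails because the logarithm is concave.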

Examples in High Dimensions

Affine functions are convex and concave; all norms are convex.

Examples on $\mathbf{R}^n$
- affine function: $f(x) = a^T x + b$
- norms: e.g., $\|x\|_p = \left( \sum_{i=1}^n |x_i|^p \right)^{1/p}$ for $p \ge 1$; $\|x\|_\infty = \max_{i \in [1:n]} |x_i|$

Examples on $\mathbf{R}^{m \times n}$
- affine function:

$$f(X) = \operatorname{tr}(A^T X) + b = \sum_{i \in [1:m]} \sum_{j \in [1:n]} A_{ij} X_{ij} + b$$

- spectral (maximum singular value) norm:

$$f(X) = \|X\|_2 = \sigma_{\max}(X) = \left( \lambda_{\max}(X^T X) \right)^{1/2}$$

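Examples in High Dimensions

- max function: $f(x) = \max_i x_i$ is convex
- Quadratic over linear function: $f(x, y) = x^2 / y$ is convex for $y > 0$, since

$$\nabla^2 f(x, y) = \frac{2}{y^3} \begin{bmatrix} y^2 & -xy \\ -xy & x^2 \end{bmatrix} = \frac{2}{y^3} \begin{bmatrix} y \\ -x \end{bmatrix} \begin{bmatrix} y & -x \end{bmatrix} \succeq 0$$

- Log-sum-exponential: $f(x) = \log(e^{x_1} + \cdots + e^{x_n})$ is convex; also called softmax
- Geometric mean: $f(x) = \left( \prod_i x_i \right)^{1/n}$ is concave
- Log determinant: $f(X) = \log \det X$, $\operatorname{dom} f = \mathbf{S}^n_{++}$, is concave. Restricting to a line $X = Z + tV$ with $Z \in \mathbf{S}^n_{++}$ and $V \in \mathbf{S}^n$,

$$\begin{aligned} g(t) &= \log \det (Z + tV) \\ &= \log \det \left( Z^{1/2} (I + t Z^{-1/2} V Z^{-1/2}) Z^{1/2} \right) \\ &= \log \det Z + \log \det (I + t Z^{-1/2} V Z^{-1/2}) \\ &= \log \det Z + \sum_{i=1}^n \log(1 + t \lambda_i) \end{aligned}$$

where $\lambda_i$, $i \in [1:n]$, are the eigenvalues of $Z^{-1/2} V Z^{-1/2}$. Each term $\log(1 + t \lambda_i)$ is concave in t, so g is concave.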
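The log-sum-exponential function from the list above is often evaluated by first subtracting the largest entry, which avoids overflow in the exponentials. The following is a minimal sketch of that common trick (the shift by `max(x)` is an implementation choice, not something stated in the lecture), which also makes it easy to probe convexity numerically.

```python
import math

def log_sum_exp(x):
    """Evaluate log(sum_i exp(x_i)) after shifting by max(x) for stability."""
    m = max(x)
    return m + math.log(sum(math.exp(xi - m) for xi in x))
```

The shift also exposes the bounds $\max_i x_i \le f(x) \le \max_i x_i + \log n$, which is one reason the function is viewed as a smooth approximation of the max function.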

Sublevel Set and Epigraph

- The $\alpha$-sublevel set of f is $\{x \in \operatorname{dom} f \mid f(x) \le \alpha\}$. If f is convex, then its sublevel sets are convex. The converse is not true: $-e^x$ is concave (and not convex), but its sublevel sets are convex.
- The graph of a function f is defined as $\{(x, f(x)) \mid x \in \operatorname{dom} f\}$, which is a subset of $\mathbf{R}^{n+1}$. The epigraph is defined as $\operatorname{epi} f = \{(x, t) \mid x \in \operatorname{dom} f, \ f(x) \le t\}$. A function f is convex iff its epigraph is a convex set.

Jensen's Inequality

- If f is convex, then $f(\mathbf{E} x) \le \mathbf{E} f(x)$ for any random variable x with $x \in \operatorname{dom} f$ w.p. 1. If f is non-convex, then there is a random variable x with $x \in \operatorname{dom} f$ w.p. 1 such that $f(\mathbf{E} x) > \mathbf{E} f(x)$.
- An interpretation: Consider a convex function f. For any $x \in \operatorname{dom} f$ and random variable z with $\mathbf{E} z = 0$, we have

$$\mathbf{E} f(x + z) \ge f(\mathbf{E}[x + z]) = f(x),$$

which means that randomization or dithering (adding a zero-mean random vector to the argument) only increases the value of a convex function on average.
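Jensen's inequality is easy to verify for a random variable with finitely many values. The sketch below assumes a discrete distribution given as lists of values and probabilities; the names `expectation` and `jensen_gap` are hypothetical helpers.

```python
import math

def expectation(values, probs):
    """E[x] for a discrete random variable taking values[i] with probability probs[i]."""
    return sum(p * v for v, p in zip(values, probs))

def jensen_gap(f, values, probs):
    """E f(x) - f(E x); nonnegative when f is convex."""
    return expectation([f(v) for v in values], probs) - f(expectation(values, probs))
```

For $f(x) = x^2$ the gap is exactly the variance of x. The dithering interpretation corresponds to calling `jensen_gap(f, [x0 + z for z in zs], probs)` with zero-mean `zs`, since then the mean of the shifted values is `x0`.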

A Calculus of Convex Functions

Practical methods for establishing convexity of a function:

- verify the definition (often simplified by restricting to a line)
- for a twice differentiable function, show its Hessian is PSD
- show that f is obtained from simple convex functions by operations that preserve convexity
  – nonnegative weighted sum
  – composition with affine function
  – pointwise maximum and supremum
  – composition
  – minimization
  – perspective

Positive Weighted Sum and Composition with Affine

Nonnegative weighted sum: If $f_1, \ldots, f_m$ are convex then $f = w_1 f_1 + \cdots + w_m f_m$ is also convex, where $w_1, \ldots, w_m$ are nonnegative

Composition with affine function: $f(Ax + b)$ is convex if f is convex

- log barrier for linear inequalities:

$$f(x) = -\sum_{i=1}^m \log(b_i - a_i^T x), \quad \operatorname{dom} f = \{x \mid a_i^T x < b_i, \ i \in [1:m]\}$$
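As an illustration of composing a convex function with affine maps and summing, here is a small sketch of a log barrier, assuming the standard form $f(x) = -\sum_i \log(b_i - a_i^T x)$ with rows $a_i^T$ stored in a list `A`. The function name and the choice to return infinity outside the domain (the extended-value convention) are assumptions of this sketch.

```python
import math

def log_barrier(A, b, x):
    """f(x) = -sum_i log(b_i - a_i^T x); returns inf outside dom f."""
    total = 0.0
    for a_i, b_i in zip(A, b):
        slack = b_i - sum(aij * xj for aij, xj in zip(a_i, x))
        if slack <= 0:
            return math.inf  # x violates a_i^T x < b_i
        total -= math.log(slack)
    return total
```

For the interval $-1 < x < 1$, encoded as `A = [[1.0], [-1.0]]` and `b = [1.0, 1.0]`, the barrier is finite inside the interval and infinite at and beyond its endpoints.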

Pointwise Maximum

If $f_1, \ldots, f_m$ are convex then $f(x) = \max\{f_1(x), \ldots, f_m(x)\}$ is also convex

- piecewise-linear function: $f(x) = \max_{i \in [1:m]} (a_i^T x + b_i)$ is convex
- sum of r largest components of $x \in \mathbf{R}^n$:

$$f(x) = x_{[1]} + x_{[2]} + \cdots + x_{[r]}$$

is convex, where $x_{[i]}$ is the ith largest component of x; this is because

$$f(x) = \max_{I \subseteq [1:n], \ |I| = r} \ \sum_{i \in I} x_i$$

is a pointwise maximum of linear functions of x.
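The two expressions for the sum of the r largest components can be compared directly. This sketch (function names hypothetical) computes the value once by sorting and once as a maximum over all index sets of size r; the second form is exponential in cost and is meant only to mirror the formula.

```python
from itertools import combinations

def sum_r_largest(x, r):
    """Sum of the r largest components of x, computed by sorting."""
    return sum(sorted(x, reverse=True)[:r])

def sum_r_largest_as_max(x, r):
    """Same value, written as a pointwise maximum of linear functions of x."""
    return max(sum(x[i] for i in idx)
               for idx in combinations(range(len(x)), r))
```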

Pointwise Supremum

If $f(x, y)$ is convex in x for each $y \in A$ then

$$g(x) = \sup_{y \in A} f(x, y)$$

is also convex

- support function of a set C: $S_C(x) = \sup_{y \in C} y^T x$ is convex
- distance to farthest point in a set C:

$$f(x) = \sup_{y \in C} \|x - y\|$$

- maximum eigenvalue of symmetric matrix: for $X \in \mathbf{S}^n$,

$$\lambda_{\max}(X) = \sup_{\|y\|_2 = 1} y^T X y$$
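The variational formula for the maximum eigenvalue can be checked in the 2×2 case, where unit vectors are parametrized by an angle. This is a sketch under that assumption; the closed form used in `lambda_max_2x2` is the usual formula for a symmetric matrix $\begin{bmatrix} a & b \\ b & c \end{bmatrix}$, and the sampled supremum is only an approximation from below.

```python
import math

def lambda_max_2x2(a, b, c):
    """Largest eigenvalue of the symmetric matrix [[a, b], [b, c]]."""
    return 0.5 * (a + c) + math.hypot(0.5 * (a - c), b)

def rayleigh_sup_2x2(a, b, c, steps=3600):
    """Approximate sup of y^T X y over unit vectors y = (cos t, sin t)."""
    best = -math.inf
    for k in range(steps):
        t = math.pi * k / steps  # y and -y give the same value, so [0, pi) suffices
        y1, y2 = math.cos(t), math.sin(t)
        best = max(best, a * y1 * y1 + 2 * b * y1 * y2 + c * y2 * y2)
    return best
```

For each fixed y the map $X \mapsto y^T X y$ is linear in the entries of X, which is why the supremum is convex in X.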

Composition with Scalar Functions

Composition of $h : \mathbf{R} \to \mathbf{R}$ with scalar function $g : \mathbf{R}^n \to \mathbf{R}$:

$$f(x) = h(g(x))$$

f is convex if:
- g convex, h convex, $\tilde h$ nondecreasing
- g concave, h convex, $\tilde h$ nonincreasing

Remark:
- proof (for n = 1, differentiable g, h)

$$f''(x) = h''(g(x)) \, g'(x)^2 + h'(g(x)) \, g''(x)$$

- monotonicity must hold for the extended-value extension $\tilde h$

Examples:
- $\exp g(x)$ is convex if g is convex
- $1/g(x)$ is convex if g is concave and positive
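The second-derivative formula used in the proof can be compared against a finite-difference estimate. This is a sketch for the illustrative case $h = \exp$, $g(x) = x^2$ (so $f(x) = e^{x^2}$); the helper names and the step size `eps` are assumptions of the example.

```python
import math

def second_derivative_chain_rule(h1, h2, g, g1, g2, x):
    """f''(x) = h''(g(x)) g'(x)^2 + h'(g(x)) g''(x) for f(x) = h(g(x)).

    h1, h2 are h' and h''; g1, g2 are g' and g''.
    """
    return h2(g(x)) * g1(x) ** 2 + h1(g(x)) * g2(x)

def second_derivative_fd(f, x, eps=1e-4):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + eps) - 2.0 * f(x) + f(x - eps)) / eps ** 2
```

With g convex and h convex and nondecreasing, both terms in the formula are nonnegative, which is the first composition rule.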

Composition with Vector Functions

Composition of $h : \mathbf{R}^k \to \mathbf{R}$ with vector function $g : \mathbf{R}^n \to \mathbf{R}^k$:

$$f(x) = h(g_1(x), g_2(x), \ldots, g_k(x))$$

f is convex if:
- $g_i$ convex, h convex, $\tilde h$ nondecreasing in each argument
- $g_i$ concave, h convex, $\tilde h$ nonincreasing in each argument

proof (for n = 1, differentiable g, h)

$$f''(x) = g'(x)^T \nabla^2 h(g(x)) \, g'(x) + \nabla h(g(x))^T g''(x)$$

Examples:
- $\sum_{i=1}^m \log g_i(x)$ is concave if the $g_i$ are concave and positive
- $\log \sum_{i=1}^m \exp g_i(x)$ is convex if the $g_i$ are convex

Minimization

If f is convex in (x, y) and C is a convex nonempty set, then the function $g(x) = \inf_{y \in C} f(x, y)$ is also convex

E.g., distance to a set: $d(x, S) = \inf_{y \in S} \|x - y\|$ is convex if S is convex [the function $\|x - y\|$ is convex in (x, y), so if S is convex then $d(x, S)$ is convex in x]
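For a concrete convex set the infimum can be computed in closed form. The sketch below assumes S is the box $\{y \mid lo \le y_i \le hi\}$ and the Euclidean norm, in which case the nearest point is obtained by clipping each coordinate; the function name is hypothetical.

```python
import math

def dist_to_box(x, lo, hi):
    """Euclidean distance from x to the box {y | lo <= y_i <= hi for all i}."""
    return math.sqrt(sum((xi - min(max(xi, lo), hi)) ** 2 for xi in x))
```

Convexity in x can then be probed by comparing the distance at a midpoint with the average of the distances at the endpoints.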

Perspective

The perspective of $f : \mathbf{R}^n \to \mathbf{R}$ is the function $g : \mathbf{R}^{n+1} \to \mathbf{R}$ given by

$$g(x, t) = t f(x/t)$$

with $\operatorname{dom} g = \{(x, t) \mid x/t \in \operatorname{dom} f, \ t > 0\}$. g is convex if f is convex.

E.g.: Euclidean norm squared. The perspective of the convex function $f(x) = x^T x$ is $g(x, t) = t (x/t)^T (x/t) = x^T x / t$, which is convex in (x, t) for $t > 0$

E.g.: Consider the convex function $f(x) = -\log x$. Its perspective is $g(x, t) = -t \log(x/t) = t \log(t/x) = t \log t - t \log x$, which is convex in (x, t). The function g is called the relative entropy of t and x. Therefore, the relative entropy of two vectors u, v, defined as

$$\sum_i u_i \log(u_i / v_i),$$

is convex in (u, v) since it is a sum of relative entropies of $u_i, v_i$.
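The link between the perspective of $-\log$ and relative entropy can be made concrete. This is a sketch for scalar arguments; `perspective` and `relative_entropy` are hypothetical helper names, and the inputs are assumed to be strictly positive.

```python
import math

def perspective(f):
    """Return g(x, t) = t * f(x / t), defined for t > 0 (scalar x)."""
    def g(x, t):
        if t <= 0:
            raise ValueError("perspective requires t > 0")
        return t * f(x / t)
    return g

def relative_entropy(u, v):
    """sum_i u_i * log(u_i / v_i) for positive vectors u, v."""
    return sum(ui * math.log(ui / vi) for ui, vi in zip(u, v))
```

With `g = perspective(lambda x: -math.log(x))`, each term of the relative entropy is `g(v_i, u_i)`, matching the identity $g(x, t) = t \log(t/x)$ above.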

Conjugate Function

For a function $f : \mathbf{R}^n \to \mathbf{R}$, its conjugate function $f^* : \mathbf{R}^n \to \mathbf{R}$ is defined as

$$f^*(y) = \sup_{x \in \operatorname{dom} f} \left( y^T x - f(x) \right).$$

- The domain of $f^*$ consists of the $y \in \mathbf{R}^n$ for which the supremum is finite
- $f^*$ is convex (even if f is not)

Examples

- Affine functions: $f(x) = ax + b$. As a function of x, $yx - ax - b$ is bounded iff $y = a$, in which case it is constant. Therefore, the domain of the conjugate function $f^*$ is the singleton $\{a\}$, and $f^*(a) = -b$
- Negative logarithm: $f(x) = -\log x$. The function $xy + \log x$ is unbounded above for $y \ge 0$, and for $y < 0$ it reaches its maximum $f^*(y) = -\log(-y) - 1$ at $x = -1/y$
- Strictly convex quadratic function: $f(x) = \frac{1}{2} x^T Q x$, with $Q \in \mathbf{S}^n_{++}$. The function $y^T x - \frac{1}{2} x^T Q x$ is bounded above as a function of x for all y, and attains its maximum at $x = Q^{-1} y$, so $f^*(y) = \frac{1}{2} y^T Q^{-1} y$
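The closed form for the conjugate of the negative logarithm can be compared with a brute-force supremum. This sketch replaces the supremum over $x > 0$ by a maximum over a finite grid, which is an approximation from below; the helper names and the grid are assumptions of the example.

```python
import math

def conjugate_on_grid(f, y, grid):
    """Approximate f*(y) = sup_x (y*x - f(x)) by maximizing over a finite grid."""
    return max(y * x - f(x) for x in grid)

def neg_log_conjugate(y):
    """Closed form from the text: f*(y) = -log(-y) - 1 for y < 0."""
    return -math.log(-y) - 1.0
```

When the maximizer $x = -1/y$ lies on the grid the two values agree up to rounding; otherwise the grid value is slightly smaller.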

Quasiconvex Functions

- A function f is quasiconvex if its domain and all its sublevel sets are convex
- f is quasiconcave if $-f$ is quasiconvex, i.e., all its superlevel sets are convex
- f is quasilinear if it is both quasiconvex and quasiconcave

Examples

- $\sqrt{|x|}$ is quasiconvex on $\mathbf{R}$
- $\log x$ is quasilinear on $\mathbf{R}_{++}$
- $f(x_1, x_2) = x_1 x_2$ is quasiconcave on $\mathbf{R}^2_{++}$
- linear-fractional function

$$f(x) = \frac{a^T x + b}{c^T x + d}, \quad \operatorname{dom} f = \{x \mid c^T x + d > 0\}$$

is quasilinear

Properties

- Modified Jensen's inequality for quasiconvex f: for any x, y in the domain and $\theta \in [0, 1]$,

$$f(\theta x + (1-\theta) y) \le \max\{f(x), f(y)\},$$

i.e., the value of the function on a segment doesn't exceed the maximum of the values at the endpoints.

- First-order condition: differentiable f with convex domain is quasiconvex iff

$$f(y) \le f(x) \implies \nabla f(x)^T (y - x) \le 0$$

- sums of quasiconvex functions are not necessarily quasiconvex
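The modified Jensen inequality gives a simple numerical test for quasiconvexity on sample points. The sketch below uses hypothetical names and an arbitrary set of interpolation weights; it can refute quasiconvexity but not prove it.

```python
import math

def quasiconvex_on(f, points, thetas=(0.25, 0.5, 0.75), tol=1e-12):
    """Check f(theta*x + (1-theta)*y) <= max(f(x), f(y)) on sample points."""
    return all(f(t * x + (1 - t) * y) <= max(f(x), f(y)) + tol
               for x in points for y in points for t in thetas)

def sqrt_abs(x):
    """The quasiconvex (but not convex) function sqrt(|x|)."""
    return math.sqrt(abs(x))
```

For example, `sqrt_abs` passes the test, whereas the sum $\sqrt{|x|} + \sqrt{|x - 2|}$ of two quasiconvex functions fails it: its value at $x = 1$ is 2, which exceeds its value $\sqrt{2}$ at both $x = 0$ and $x = 2$.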

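Log-concave and Log-convex Functions

A positive function f is log-concave if log f is concave:

$$f(\theta x + (1-\theta) y) \ge f(x)^{\theta} f(y)^{1-\theta}, \quad \forall \theta \in [0, 1]$$

f is log-convex if log f is convex

- powers: $x^a$ on $\mathbf{R}_{++}$ is log-convex for $a \le 0$, log-concave for $a \ge 0$
- many common probability densities are log-concave, e.g., Gaussian:

$$f(x) = \frac{1}{\sqrt{(2\pi)^n \det \Sigma}} \exp\left( -\frac{1}{2} (x - \bar x)^T \Sigma^{-1} (x - \bar x) \right)$$

- cdf of Gaussian:

$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-u^2/2} \, du$$

- determinant: $\det X$ is log-concave on $\mathbf{S}^n_{++}$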
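Log-concavity of the Gaussian cdf can be probed numerically using the error function. This is a sketch, assuming the standard identity $\Phi(x) = \frac{1}{2}(1 + \operatorname{erf}(x/\sqrt{2}))$; the midpoint test on a finite grid is a sanity check rather than a proof, and the helper names are hypothetical.

```python
import math

def gaussian_cdf(x):
    """Standard normal cdf, written via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def log_concave_on(f, points, tol=1e-12):
    """Check log f(midpoint) >= average of log f at the endpoints, for all pairs."""
    return all(
        math.log(f(0.5 * (x + y))) + tol >= 0.5 * (math.log(f(x)) + math.log(f(y)))
        for x in points for y in points
    )
```

Note that $\Phi$ itself is not concave (it is convex for $x < 0$), so this is an example where log-concavity holds without concavity.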

Properties

- twice differentiable f with convex domain is log-concave iff

$$f(x) \nabla^2 f(x) \preceq \nabla f(x) \nabla f(x)^T, \quad \forall x \in \operatorname{dom} f$$

- product of log-concave functions is log-concave
- sum of log-concave functions is not necessarily log-concave
- integration: if $f : \mathbf{R}^n \times \mathbf{R}^m \to \mathbf{R}$ is log-concave, then

$$g(x) = \int f(x, y) \, dy$$

is log-concave

Convexity with respect to Generalized Inequalities

$f : \mathbf{R}^n \to \mathbf{R}^m$ is K-convex if dom f is convex and for any $x, y \in \operatorname{dom} f$ and $\theta \in [0, 1]$,

$$f(\theta x + (1-\theta) y) \preceq_K \theta f(x) + (1-\theta) f(y)$$