ELL 822–Selected Topics in Communications

Lecture 2 Convex functions

- Ref: [Boyd] Chapter 3

Jun B. Seo

Convex functions I

Let $f : S \to \mathbb{R}$, where $S$ is a nonempty convex set in $\mathbb{R}^n$. The function $f$ is convex on $S$ if

$$f(\lambda x_1 + (1-\lambda)x_2) \le \lambda f(x_1) + (1-\lambda) f(x_2)$$

for each $x_1, x_2 \in S$ and for each $\lambda \in (0, 1)$.

$f$ is strictly convex on $S$ if the above inequality holds as a strict inequality for all $x_1 \ne x_2$.
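The defining inequality can be probed numerically. The following is a minimal sketch, not part of the lecture: the helper name `violates_convexity` and the two test functions are illustrative assumptions. Random sampling can only refute convexity (by finding a counterexample), never prove it.

```python
import random

def violates_convexity(f, lo, hi, trials=10_000, tol=1e-9, seed=0):
    """Return a triple (x1, x2, lam) that violates the convexity inequality, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        x1, x2 = rng.uniform(lo, hi), rng.uniform(lo, hi)
        lam = rng.random()
        lhs = f(lam * x1 + (1 - lam) * x2)
        rhs = lam * f(x1) + (1 - lam) * f(x2)
        if lhs > rhs + tol:
            return x1, x2, lam
    return None

square = lambda x: x * x      # convex on R
cubic = lambda x: x ** 3      # not convex on [-2, 2]

no_counterexample = violates_convexity(square, -2.0, 2.0)
counterexample = violates_convexity(cubic, -2.0, 2.0)
```

For the square no violation is found, while for the cubic a violating triple appears as soon as both sample points fall in the region $x < 0$, where $x^3$ is concave.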

Level sets

The level set of a function $f$ with level $\alpha$ is defined as

$$S = \{x \mid f(x) = \alpha\}$$

– Styblinski-Tang function: $f(x) = 0.5 \sum_{i=1}^{n} (x_i^4 - 16x_i^2 + 5x_i)$

[Figure: surface plot and level-set plot of the Styblinski-Tang function over $x_1, x_2 \in [-4, 4]$]

Sublevel sets

• The sublevel set associated with $f$ is defined as (for $\alpha \in \mathbb{R}$)

$$S_\alpha = \{x \in \operatorname{dom} f \mid f(x) \le \alpha\}$$

• Let $S$ be a nonempty convex set in $\mathbb{R}^n$ and let $f : S \to \mathbb{R}$ be a convex function. Then, the sublevel set $S_\alpha$ is convex:

Proof
Suppose $x_1, x_2 \in S_\alpha$. Thus, we have $x_1, x_2 \in S$ and $f(x_1) \le \alpha$ and $f(x_2) \le \alpha$.
Consider $x = \lambda x_1 + (1-\lambda)x_2$ for $\lambda \in (0, 1)$.
By convexity of $S$, we can see $x \in S$. Since $f$ is convex, we write

$$f(x) \le \lambda f(x_1) + (1-\lambda) f(x_2) \le \lambda\alpha + (1-\lambda)\alpha = \alpha,$$

and hence $x \in S_\alpha$.
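Read in the contrapositive, the sublevel-set result gives a quick test for non-convexity: if some sublevel set is not convex, the function cannot be convex. The sketch below applies this to the one-dimensional Styblinski-Tang function; the level $\alpha = -10$ and the two sample points are illustrative choices, not values from the lecture.

```python
def styblinski_tang(x):
    """f(x) = 0.5 * sum_i (x_i^4 - 16 x_i^2 + 5 x_i) for a sequence x."""
    return 0.5 * sum(xi**4 - 16 * xi**2 + 5 * xi for xi in x)

alpha = -10.0
a, b = [-2.9], [2.75]              # two points, one near each local minimum
mid = [0.5 * (a[0] + b[0])]        # their midpoint (lambda = 1/2)

in_a = styblinski_tang(a) <= alpha
in_b = styblinski_tang(b) <= alpha
in_mid = styblinski_tang(mid) <= alpha
```

Both endpoints lie in $S_\alpha$ but their midpoint does not, so $S_\alpha$ is not convex and the Styblinski-Tang function is not convex.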

Epigraph I

• Let $S$ be a nonempty set in $\mathbb{R}^n$ and let $f : S \to \mathbb{R}$.
• The graph of $f$ is described by the set $\{(x, f(x)) \mid x \in S\} \subset \mathbb{R}^{n+1}$
• The epigraph of $f$, denoted by epi $f$, is a subset of $\mathbb{R}^{n+1}$, i.e.,

$$\operatorname{epi} f = \{(x, y) \mid x \in S,\ y \in \mathbb{R},\ y \ge f(x)\}$$

• Let $S$ be a nonempty convex set. Then, $f$ is convex if and only if epi $f$ is a convex set.

Epigraph II

Proof
Suppose that $f$ is convex and let $(x_1, y_1), (x_2, y_2) \in \operatorname{epi} f$
– It means that $x_1, x_2 \in S$ and $y_1 \ge f(x_1)$ and $y_2 \ge f(x_2)$.
– Convexity of $f$ enables us to write, for $\lambda \in (0, 1)$,

$$f(\lambda x_1 + (1-\lambda)x_2) \le \lambda f(x_1) + (1-\lambda) f(x_2) \le \lambda y_1 + (1-\lambda) y_2$$

– Since $\lambda x_1 + (1-\lambda)x_2 \in S$, we have

$$[\lambda x_1 + (1-\lambda)x_2,\ \lambda y_1 + (1-\lambda)y_2] \in \operatorname{epi} f$$

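Epigraph III

Proof continued
Conversely, assume that epi $f$ is convex and let $x_1, x_2 \in S$.
Then, since $[x_1, f(x_1)] \in \operatorname{epi} f$ and $[x_2, f(x_2)] \in \operatorname{epi} f$, due to convexity of epi $f$, for $\lambda \in (0, 1)$ we have

$$[\lambda x_1 + (1-\lambda)x_2,\ \lambda f(x_1) + (1-\lambda) f(x_2)] \in \operatorname{epi} f.$$

It implies

$$f(\lambda x_1 + (1-\lambda)x_2) \le \lambda f(x_1) + (1-\lambda) f(x_2),$$

that is, $f$ is convex.

Epigraph IV

Constrained optimization problem in standard form

$$\begin{aligned} \underset{x}{\text{minimize}} \quad & f_0(x) \\ \text{subject to} \quad & f_i(x) \le 0, \quad i = 1, \dots, m \\ & h_j(x) = 0, \quad j = 1, \dots, p \end{aligned}$$

We can rewrite this in epigraph form as

$$\begin{aligned} \underset{x,\,t}{\text{minimize}} \quad & t \\ \text{subject to} \quad & f_0(x) - t \le 0 \\ & f_i(x) \le 0, \quad i = 1, \dots, m \\ & h_j(x) = 0, \quad j = 1, \dots, p \end{aligned}$$

– Every problem can be transformed into a problem with a linear objective function

Epigraph V

The epigraph form is an optimization problem in the (epi)graph space $(x, t)$:

$$\begin{aligned} \text{min.} \quad & f_0(x) = |x| \\ \text{subject to} \quad & -x + 1 \le 0 \end{aligned} \qquad \Rightarrow \qquad \begin{aligned} \underset{x,\,t}{\text{minimize}} \quad & t \\ \text{subject to} \quad & |x| - t \le 0 \\ & -x + 1 \le 0 \end{aligned}$$

[Figure: feasible region of the epigraph-form problem in the $(x, t)$ plane]

First-order condition of convex functions I

Let $S$ be a nonempty open convex set in $\mathbb{R}^n$ and let $f : S \to \mathbb{R}$ be differentiable on $S$.
Then, $f$ is convex if and only if for any $x \in S$, we have

$$f(y) \ge f(x) + \nabla f(x)^T (y - x) \quad \text{for each } y \in S,$$

where $\nabla f(x) = \left[\frac{\partial f(x)}{\partial x_1}, \dots, \frac{\partial f(x)}{\partial x_n}\right]^T$ is the gradient of $f$.

– The first-order approximation of $f$ at $x$ is a global lower bound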
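The first-order condition says the tangent plane at any point lies below the graph everywhere. As a minimal numerical sketch, the convex test function $f(x) = x_1^2 + e^{x_2}$ and the helper `tangent_gap` are assumptions chosen for illustration; the code evaluates the gap $f(y) - [f(x) + \nabla f(x)^T(y - x)]$ over a grid of point pairs.

```python
import math

def tangent_gap(f, grad, x, y):
    """f(y) - [f(x) + grad(x)^T (y - x)]; nonnegative for convex f."""
    lin = f(x) + sum(g * (yi - xi) for g, xi, yi in zip(grad(x), x, y))
    return f(y) - lin

f = lambda x: x[0] ** 2 + math.exp(x[1])
grad = lambda x: [2 * x[0], math.exp(x[1])]

grid = [-2.0, -1.0, 0.0, 1.0, 2.0]
points = [(u, v) for u in grid for v in grid]
min_gap = min(tangent_gap(f, grad, x, y) for x in points for y in points)
```

The smallest gap over all pairs is not negative (up to rounding), consistent with the linearization being a global underestimator.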

First-order condition of convex functions II

Proof
By convexity of $f$, for $\alpha \in (0, 1)$,

$$f(\alpha y + (1-\alpha)x) \le \alpha f(y) + (1-\alpha) f(x) = \alpha (f(y) - f(x)) + f(x)$$

We rewrite this as

$$f(\alpha y + (1-\alpha)x) - f(x) + \alpha f(x) \le \alpha f(y).$$

Finally, dividing by $\alpha$, we have

$$f(y) \ge f(x) + \frac{f(x + \alpha(y - x)) - f(x)}{\alpha}.$$

Letting $\alpha \to 0$, the quotient tends to the directional derivative $\nabla f(x)^T (y - x)$, so

$$f(y) \ge f(x) + \nabla f(x)^T (y - x)$$

First-order condition of convex functions III

Proof continued
To show the converse, take a point $t = \alpha x + (1-\alpha)y$ with $\alpha \in (0, 1)$.
We need to show that $f$ is convex if the following holds:

$$f(x) \ge f(t) + \nabla f(t)^T (x - t)$$
$$f(y) \ge f(t) + \nabla f(t)^T (y - t)$$

Multiplying these by $\alpha$ and $1 - \alpha$ respectively, we have

$$\alpha f(x) \ge \alpha f(t) + \alpha \nabla f(t)^T (x - t)$$
$$(1-\alpha) f(y) \ge (1-\alpha) f(t) + (1-\alpha) \nabla f(t)^T (y - t)$$

Adding them yields

$$\alpha f(x) + (1-\alpha) f(y) \ge f(t) + \nabla f(t)^T (\alpha x + (1-\alpha)y - t) = f(\alpha x + (1-\alpha)y)$$

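First-order condition of convex functions IV

• If $f$ is convex and $x, y \in \operatorname{dom} f$, we have

$$t \ge f(y) \ge f(x) + \nabla f(x)^T (y - x) \quad \text{for } (y, t) \in \operatorname{epi} f$$

• The set epi $f$ has a supporting hyperplane with normal $[\nabla f(x), -1]$ at $(x, f(x))$:

$$(y, t) \in \operatorname{epi} f \ \Rightarrow\ \begin{bmatrix} \nabla f(x) \\ -1 \end{bmatrix}^T \left( \begin{bmatrix} y \\ t \end{bmatrix} - \begin{bmatrix} x \\ f(x) \end{bmatrix} \right) \le 0$$

[Figure: epi $f$ with non-vertical supporting hyperplanes]

The global minimum of a convex function $f$ is attained at $x$ if and only if $\nabla f(x) = 0$

Convex functions II

Let $S$ be a nonempty open convex set in $\mathbb{R}^n$ and let $f : S \to \mathbb{R}$ be differentiable on $S$.
Then, $f$ is convex if and only if for each $x_1, x_2 \in S$, we have

$$[\nabla f(x_2) - \nabla f(x_1)]^T (x_2 - x_1) \ge 0 \quad \text{(monotone)}$$

Proof
If $f$ is convex, for two distinct $x_1$ and $x_2$ we have

$$f(x_1) \ge f(x_2) + \nabla f(x_2)^T (x_1 - x_2)$$
$$f(x_2) \ge f(x_1) + \nabla f(x_1)^T (x_2 - x_1)$$

Adding the two inequalities side-by-side, we have

$$\nabla f(x_2)^T (x_1 - x_2) + \nabla f(x_1)^T (x_2 - x_1) \le 0,$$

which is the monotonicity condition after rearranging.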
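The monotone-gradient condition can be evaluated directly once the gradient is known. This is a small sketch under the assumption of two hand-picked functions, the convex $x_1^2 + x_2^2$ and the saddle $x_1^2 - x_2^2$; the helper name `monotone_gap` is hypothetical.

```python
def monotone_gap(grad, x1, x2):
    """[grad(x2) - grad(x1)]^T (x2 - x1); nonnegative for every pair if f is convex."""
    g1, g2 = grad(x1), grad(x2)
    return sum((b - a) * (q - p) for a, b, p, q in zip(g1, g2, x1, x2))

grad_convex = lambda x: [2 * x[0], 2 * x[1]]     # gradient of x1^2 + x2^2
grad_saddle = lambda x: [2 * x[0], -2 * x[1]]    # gradient of x1^2 - x2^2

gap_convex = monotone_gap(grad_convex, [1.0, -1.0], [0.0, 2.0])
gap_saddle = monotone_gap(grad_saddle, [0.0, 0.0], [0.0, 1.0])
```

A single pair with a negative gap, as for the saddle along the $x_2$ axis, is enough to rule out convexity.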

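Convex functions III

Proof continued
To prove the converse, assume that monotonicity holds. For $x = \lambda x_1 + (1-\lambda)x_2$ it gives

$$[\nabla f(x) - \nabla f(x_1)]^T (x - x_1) \ge 0.$$

Since $x - x_1 = (1-\lambda)(x_2 - x_1)$, we have $(1-\lambda)[\nabla f(x) - \nabla f(x_1)]^T (x_2 - x_1) \ge 0$, i.e.,

$$\nabla f(x)^T (x_2 - x_1) \ge \nabla f(x_1)^T (x_2 - x_1)$$

Using the mean value theorem, i.e.,

$$f(x_2) - f(x_1) = \nabla f(x)^T (x_2 - x_1)$$

for some $x = \lambda x_1 + (1-\lambda)x_2$ with $\lambda \in (0, 1)$, we have the result $f(x_2) \ge f(x_1) + \nabla f(x_1)^T (x_2 - x_1)$.

Second-order condition of convex functions I

• Let $\mathbb{S}^n$ denote the set of symmetric $n \times n$ matrices, i.e.,

$$\mathbb{S}^n = \{X \in \mathbb{R}^{n \times n} \mid X = X^T\}$$

– $\mathbb{S}^n_{++}$ (or, $\mathbb{S}^n_{+}$) denotes the set of symmetric positive (semi)definite matrices $H$, i.e., $z^T H z > 0$ (or, $z^T H z \ge 0$) for all nonzero $z \in \mathbb{R}^n$
• Let $S$ be a nonempty open convex set in $\mathbb{R}^n$ and let $f : S \to \mathbb{R}$ be twice differentiable on $S$.
• $f$ is convex if and only if its Hessian matrix

$$H(x) = [h_{ij}(x)] \quad \text{with} \quad h_{ij}(x) = \frac{\partial^2 f(x)}{\partial x_i \partial x_j}$$

satisfies $H(x) \in \mathbb{S}^n_{+}$ over $S$. If $H(x) \in \mathbb{S}^n_{++}$ over $S$, then $f$ is strictly convex; the converse does not hold (e.g., $f(x) = x^4$ is strictly convex with $H(0) = 0$).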
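For two variables the second-order condition reduces to checking the smaller eigenvalue of a symmetric $2 \times 2$ Hessian, which has a closed form. The sketch below is illustrative only; the helper names and the three constant Hessians are assumptions, not taken from the lecture.

```python
import math

def eig_sym_2x2(h11, h12, h22):
    """Eigenvalues (min, max) of the symmetric matrix [[h11, h12], [h12, h22]]."""
    mean = 0.5 * (h11 + h22)
    radius = math.hypot(0.5 * (h11 - h22), h12)
    return mean - radius, mean + radius

def classify(h11, h12, h22, tol=1e-12):
    """Classify a symmetric 2x2 matrix by the sign of its smallest eigenvalue."""
    lo, _ = eig_sym_2x2(h11, h12, h22)
    if lo > tol:
        return "positive definite"
    if lo >= -tol:
        return "positive semidefinite"
    return "not positive semidefinite"

bowl = classify(2.0, 0.0, 2.0)       # Hessian of x1^2 + x2^2
saddle = classify(2.0, 0.0, -2.0)    # Hessian of x1^2 - x2^2
flat = classify(1.0, 1.0, 1.0)       # Hessian of 0.5 * (x1 + x2)^2
```

These Hessians are constant, so one evaluation settles convexity; for a general $f$ the check has to hold at every point of $S$.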

Second-order condition of convex functions II

Proof
Using convexity of $f$, $f(y) \ge f(\bar{x}) + \nabla f(\bar{x})^T (y - \bar{x})$. For $y = \bar{x} + \lambda x \in S$ with small $\lambda$, we have

$$f(\bar{x} + \lambda x) \ge f(\bar{x}) + \lambda \nabla f(\bar{x})^T x$$

Using the Taylor expansion of $f$, we also have

$$f(\bar{x} + \lambda x) = f(\bar{x}) + \lambda \nabla f(\bar{x})^T x + \frac{1}{2}\lambda^2 x^T H(\bar{x}) x + \lambda^2 \|x\|^2 O(\bar{x}; \lambda x)$$

where $O(\bar{x}; \lambda x) \to 0$ as $\lambda \to 0$.
Plugging this in, dividing by $\lambda^2$ and letting $\lambda \to 0$ yields

$$\frac{1}{2} x^T H(\bar{x}) x + \underbrace{\|x\|^2 O(\bar{x}; \lambda x)}_{\to 0} \ge 0$$

Second-order condition of convex functions III

Proof continued
To show the converse, use the 'mean value theorem' extended to second order:
Let $f : \mathbb{R}^n \to \mathbb{R}$ be twice continuously differentiable over an open set $S$, and $x \in S$.
For all $y$ such that $x + y \in S$ there exists an $\alpha \in [0, 1]$ such that

$$f(x + y) = f(x) + y^T \nabla f(x) + \frac{1}{2} y^T \nabla^2 f(x + \alpha y)\, y$$

In using the mean value theorem, let $\bar{x} = x + y \in S$. Then,

$$f(\bar{x}) = f(x) + y^T \nabla f(x) + \frac{1}{2} y^T \nabla^2 f(x + \alpha y)\, y$$

Second-order condition of convex functions IV

Proof continued
The point $x + \alpha y$ in $\nabla^2 f(x + \alpha y)$ is expressed as

$$x + \alpha y = x + \alpha(\bar{x} - x) = \alpha \bar{x} + (1-\alpha)x = \hat{x}$$

The theorem gives

$$f(\bar{x}) = f(x) + y^T \nabla f(x) + \frac{1}{2} y^T \nabla^2 f(\hat{x})\, y$$

If $\frac{1}{2} y^T \nabla^2 f(\hat{x})\, y \ge 0$, then

$$f(\bar{x}) \ge f(x) + \nabla f(x)^T (\bar{x} - x),$$

which completes the proof

Restriction of a convex function to a line I

• $f : \mathbb{R}^n \to \mathbb{R}$ is convex if and only if $g : \mathbb{R} \to \mathbb{R}$,

$$g(t) = f(x + tv) \quad \text{for } \operatorname{dom} g = \{t \mid x + tv \in \operatorname{dom} f\},$$

is convex (in $t$) for any $x \in \operatorname{dom} f$, $v \in \mathbb{R}^n$
: used to check convexity of $f$ by checking convexity of functions of one variable

[Figure: surface plots of $f(x_1, x_2) = x_1^2 + x_2^2$ and $f(x_1, x_2) = x_1^2 - x_2^2$]
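The restriction-to-a-line test can be sketched for the two surfaces $x_1^2 + x_2^2$ and $x_1^2 - x_2^2$. The base point, direction, step size `h` and helper names below are assumptions made for illustration; the curvature of $g(t) = f(x + tv)$ is estimated by a central second difference.

```python
def restrict(f, x, v):
    """Return the one-variable function g(t) = f(x + t v)."""
    return lambda t: f([xi + t * vi for xi, vi in zip(x, v)])

def second_difference(g, t, h=1e-3):
    """Central finite-difference estimate of g''(t)."""
    return (g(t + h) - 2 * g(t) + g(t - h)) / (h * h)

bowl = lambda x: x[0] ** 2 + x[1] ** 2
saddle = lambda x: x[0] ** 2 - x[1] ** 2

x0, v = [1.0, 1.0], [0.0, 1.0]      # base point and direction along x2
curv_bowl = second_difference(restrict(bowl, x0, v), 0.0)
curv_saddle = second_difference(restrict(saddle, x0, v), 0.0)
```

Along this direction the bowl has curvature close to $2$ and the saddle close to $-2$; one line with negative curvature is enough to show that $x_1^2 - x_2^2$ is not convex.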

Restriction of a convex function to a line II

$f : \mathbb{S}^n \to \mathbb{R}$ and $f(X) = \log\det X$ for $\operatorname{dom} f = \mathbb{S}^n_{++}$. Show whether $f$ is convex or not

Proof

$$\begin{aligned} g(t) = \log\det(X + tV) &= \log\det\left(X^{1/2}(I + tX^{-1/2}VX^{-1/2})X^{1/2}\right) \\ &= \log\det\left(X(I + tX^{-1/2}VX^{-1/2})\right) \\ &= \log\det X + \log\det(I + tX^{-1/2}VX^{-1/2}) \\ &= \log\det X + \log\det(I + tQ\Lambda Q^T) \\ &= \log\det X + \log\det\left(Q(I + t\Lambda)Q^T\right) \end{aligned}$$

where the real symmetric matrix $A = X^{-1/2}VX^{-1/2} = Q\Lambda Q^T$ with $QQ^T = Q^TQ = I$ and $\Lambda$ is a diagonal matrix of the eigenvalues $\lambda_i$ of $X^{-1/2}VX^{-1/2}$

Restriction of a convex function to a line III

Proof

$$\begin{aligned} g(t) &= \log\det X + \log\det\left(Q(I + t\Lambda)Q^T\right) \\ &= \log\det X + \log\det\left((I + t\Lambda)Q^TQ\right) \\ &= \log\det X + \log\det(I + t\Lambda) = \log\det X + \log\prod_{i=1}^{n}(1 + t\lambda_i) \\ &= \log\det X + \sum_{i=1}^{n}\log(1 + t\lambda_i) \end{aligned}$$

By examining $g''(t)$, i.e.,

$$g''(t) = -\sum_{i=1}^{n} \frac{\lambda_i^2}{(1 + \lambda_i t)^2} \le 0,$$

we can see that $f$ is concave.
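The concavity of $g(t) = \log\det(X + tV)$ can be checked numerically for a small case. The sketch below uses hand-picked $2 \times 2$ matrices `X` and `V` (assumptions, chosen so that $X + tV$ stays positive definite on the sampled range) and a central second difference for $g''(t)$.

```python
import math

def logdet2(m):
    """log det of a 2x2 matrix ((a, b), (c, d)); requires det > 0."""
    (a, b), (c, d) = m
    return math.log(a * d - b * c)

def g(t, X, V):
    """g(t) = log det(X + t V) for 2x2 matrices given as nested tuples."""
    M = tuple(tuple(x + t * v for x, v in zip(rx, rv)) for rx, rv in zip(X, V))
    return logdet2(M)

X = ((2.0, 0.5), (0.5, 1.0))       # symmetric positive definite
V = ((1.0, -1.0), (-1.0, 0.5))     # symmetric direction

ts = [-0.3, -0.1, 0.0, 0.1, 0.3]
h = 1e-3
curvatures = [(g(t + h, X, V) - 2 * g(t, X, V) + g(t - h, X, V)) / h**2 for t in ts]
```

For these matrices $\det(X + tV) = 1.75 + 3t - 0.5t^2$, and every estimated curvature is negative, in line with the formula for $g''(t)$.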

Operations that preserve convexity I

• Every norm on $\mathbb{R}^n$ is convex
• Let $f_1, f_2, \dots, f_k : \mathbb{R}^n \to \mathbb{R}$ be convex functions.
– Nonnegative weighted sum:

$$f(x) = \sum_{i=1}^{k} \alpha_i f_i(x)$$

is convex for $\alpha_i \ge 0$
– Pointwise maximum or supremum

$$f(x) = \max\{f_1(x), \dots, f_k(x)\}$$

– Composition with an affine mapping: Suppose $f : \mathbb{R}^n \to \mathbb{R}$ and $h(x) = f(Ax + b)$. If $f$ is convex, so is $h$

Operations that preserve convexity II

Scalar composition $f = h(g(x))$, where $h : \mathbb{R}^k \to \mathbb{R}$ and $g : \mathbb{R}^n \to \mathbb{R}^k$; the rules below are for the scalar case $k = 1$
• $f$ is convex if $h$ is convex and nondecreasing, and $g$ is convex,
• $f$ is convex if $h$ is convex and nonincreasing, and $g$ is concave
• $f$ is concave if $h$ is concave and nondecreasing, and $g$ is concave
• $f$ is concave if $h$ is concave and nonincreasing, and $g$ is convex

Operations that preserve convexity III

• Extended-value extension $\tilde{f}$ of a convex function $f$ with $\operatorname{dom} f = S$:

$$\tilde{f}(x) = \begin{cases} f(x), & \text{if } x \in S \\ \infty, & \text{if } x \notin S \end{cases}$$

– $\tilde{f}$ is defined on $\mathbb{R}^n$, and takes values in $\mathbb{R} \cup \{\infty\}$.
– If $f$ is convex on the convex set $S$, $\tilde{f}$ satisfies for $\theta \in [0, 1]$,

$$\tilde{f}(\theta x_1 + (1-\theta)x_2) \le \theta \tilde{f}(x_1) + (1-\theta)\tilde{f}(x_2)$$

• The following statements also hold:
– $f$ is convex if $h$ is convex, $\tilde{h}$ is nondecreasing, and $g$ is convex,
– $f$ is convex if $h$ is convex, $\tilde{h}$ is nonincreasing, and $g$ is concave

Operations that preserve convexity IV

For functions $h : \mathbb{R}^k \to \mathbb{R}$ and $g_i : \mathbb{R}^n \to \mathbb{R}$
Vector composition $f = h(g(x)) = h(g_1(x), \dots, g_k(x))$, where

$$f'' = g'(x)^T \nabla^2 h(g(x))\, g'(x) + \nabla h(g(x))^T g''(x)$$

$f$ is convex if $h$ is convex and nondecreasing in each argument, and $g_i$ is convex,
$f$ is convex if $h$ is convex and nonincreasing in each argument, and $g_i$ is concave
$f$ is concave if $h$ is concave and nondecreasing in each argument, and $g_i$ is concave

Subgradient of convex functions I

A subgradient of a convex function $f : S \to \mathbb{R}$ at $\bar{x} \in S$ is any $g \in \mathbb{R}^n$ such that

$$f(x) \ge f(\bar{x}) + g^T (x - \bar{x}) \quad \text{for all } x \in S$$

A subgradient always exists for convex $f$ at every point in the interior of $S$
If $f$ is differentiable, then we have the unique $g = \nabla f(\bar{x})$
A subgradient also gives a global underestimator of $f$ at $\bar{x}$

Subgradient of convex functions II

Let $f(x) = \min\{f_1(x), f_2(x)\}$, where $f_1$ and $f_2$ are defined as

$$f_1(x) = 4 - |x| \quad \text{and} \quad f_2(x) = 4 - (x - 2)^2 \quad \text{for } x \in \mathbb{R}$$

Here $f$ is concave, so the subgradient inequality holds with $\ge$ replaced by $\le$.

Subgradient of $f$ at $x = 1$: $\lambda \nabla f_1(1) + (1-\lambda)\nabla f_2(1)$ for $\lambda \in [0, 1]$

Subgradient of $f$ at $x = 4$: $\lambda \nabla f_1(4) + (1-\lambda)\nabla f_2(4)$ for $\lambda \in [0, 1]$

Subgradient of convex functions III

The subdifferential of $f$ at $\bar{x}$ is the set of all subgradients at $\bar{x}$

$$\partial f(\bar{x}) = \{g \mid f(x) \ge f(\bar{x}) + g^T (x - \bar{x}),\ \forall x \in \operatorname{dom} f\}$$

• $\partial f(\bar{x})$ is always a closed convex set (possibly empty), since it is the intersection of an infinite set of halfspaces

Subgradient of convex functions IV

• If $g$ is a subgradient of $f$ at $x$, from $f(y) \ge f(x) + g^T (y - x)$,

$$f(y) \le f(x) \ \Rightarrow\ g^T (y - x) \le 0$$

• The nonzero subgradients at $x$ define supporting hyperplanes to the sublevel set

$$\{y \mid f(y) \le f(x)\}$$
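The example $f(x) = \min\{4 - |x|,\ 4 - (x-2)^2\}$ can be checked numerically at the kink $x = 1$, where $\nabla f_1(1) = -1$ and $\nabla f_2(1) = 2$. Because this $f$ is concave, the sketch tests the reversed inequality $f(x) \le f(1) + \xi (x - 1)$ on a grid; the grid, tolerance and helper name are illustrative assumptions.

```python
def f(x):
    """f(x) = min{4 - |x|, 4 - (x - 2)^2}, a concave function on R."""
    return min(4 - abs(x), 4 - (x - 2) ** 2)

def is_supergradient(xi, x_bar, xs, tol=1e-12):
    """Check f(x) <= f(x_bar) + xi * (x - x_bar) on the sample points xs."""
    return all(f(x) <= f(x_bar) + xi * (x - x_bar) + tol for x in xs)

xs = [k / 10 for k in range(-50, 101)]     # grid on [-5, 10]
grad_f1, grad_f2 = -1.0, 2.0               # derivatives of f1 and f2 at x = 1
lams = [0.0, 0.25, 0.5, 0.75, 1.0]

all_ok = all(
    is_supergradient(lam * grad_f1 + (1 - lam) * grad_f2, 1.0, xs) for lam in lams
)
outside = is_supergradient(3.0, 1.0, xs)   # a slope outside [-1, 2]
```

Every convex combination of the two derivatives passes the test, while a slope outside $[-1, 2]$ fails it, which matches the description of the subgradients at $x = 1$.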

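Subgradient of convex functions V

Let $S$ be a convex set in $\mathbb{R}^n$ and $f : S \to \mathbb{R}$ be a convex function.
For $\bar{x} \in \operatorname{int} S$, $\partial f(\bar{x})$ is nonempty.

Proof
Suppose a supporting hyperplane with normal vector $[a, b]$ for $a \in \mathbb{R}^n$, $b \in \mathbb{R}$ (not both zero) such that for all $(x, z) \in \operatorname{epi} f$

$$[a^T\ \ b] \left( \begin{bmatrix} x \\ z \end{bmatrix} - \begin{bmatrix} \bar{x} \\ f(\bar{x}) \end{bmatrix} \right) = a^T (x - \bar{x}) + b(z - f(\bar{x})) \le 0$$

where $(\bar{x}, f(\bar{x}))$ is a boundary point of epi $f$

Subgradient of convex functions VI

Proof continued

$$a^T (x - \bar{x}) + b(z - f(\bar{x})) \le 0$$

• $b > 0$: as $z \to \infty$, the inequality does not hold. Hence $b \le 0$
• $b = 0$ (this means a vertical hyperplane): we have

$$a^T (x - \bar{x}) \le 0,$$

which is impossible for all $x \in S$ when $\bar{x} \in \operatorname{int} S$.
Take $x = \bar{x} + \epsilon a \in \operatorname{int} S$ for small $\epsilon > 0$:

$$a^T (x - \bar{x}) = \epsilon a^T a \le 0$$

This implies that $a$ must also be zero, which contradicts the assumption that $a$ and $b$ are not both zero

Subgradient of convex functions VII

Proof continued

$$a^T (x - \bar{x}) + b(z - f(\bar{x})) \le 0$$

• $b < 0$: let $\tilde{a} = a/|b|$, and divide both sides by $|b|$,

$$\tilde{a}^T x - z \le \tilde{a}^T \bar{x} - f(\bar{x}) \ \Rightarrow\ \begin{bmatrix} -\tilde{a} \\ 1 \end{bmatrix}^T \begin{bmatrix} \bar{x} \\ f(\bar{x}) \end{bmatrix} \le \begin{bmatrix} -\tilde{a} \\ 1 \end{bmatrix}^T \begin{bmatrix} x \\ z \end{bmatrix}$$

for all $(x, z) \in \operatorname{epi} f$, so that $\tilde{a} \in \partial f(\bar{x})$
• Letting $z = f(x)$, we get a hyperplane $H$ as

$$H = \left\{ \hat{x} \mid \hat{a}^T (\hat{x} - \hat{x}_0) = 0 \right\}$$

where

$$\hat{a} = \begin{bmatrix} \tilde{a} \\ -1 \end{bmatrix}, \quad \hat{x} = \begin{bmatrix} x \\ f(x) \end{bmatrix}, \quad \text{and} \quad \hat{x}_0 = \begin{bmatrix} \bar{x} \\ f(\bar{x}) \end{bmatrix}$$

Subgradient of convex functions VIII

A subgradient of a convex function $f : S \to \mathbb{R}$ at $\bar{x} \in S$ is any $g \in \mathbb{R}^n$ such that

$$f(x) \ge f(\bar{x}) + g^T (x - \bar{x}) \quad \text{for all } x \in S$$

This is rewritten as

$$f(x) - g^T x \ge f(\bar{x}) - g^T \bar{x} \ \Rightarrow\ [g^T\ \ {-1}] \left( \begin{bmatrix} x \\ f(x) \end{bmatrix} - \begin{bmatrix} \bar{x} \\ f(\bar{x}) \end{bmatrix} \right) \le 0$$

At the point $\bar{x}$, there exists a supporting hyperplane with normal $[g, -1]$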

Subgradient of convex functions IX

The following functions are not subdifferentiable at $x = 0$
• $f : \mathbb{R} \to \mathbb{R}$ and $\operatorname{dom} f = \mathbb{R}_+$

$$f(x) = \begin{cases} 1, & \text{if } x = 0 \\ 0, & \text{if } x > 0 \end{cases}$$

• $f : \mathbb{R} \to \mathbb{R}$ and $\operatorname{dom} f = \mathbb{R}_+$

$$f(x) = -\sqrt{x}$$

The only supporting hyperplane to epi $f$ at $(0, f(0))$ is vertical

Properties of subdifferential I

• Scaling: For $\lambda > 0$, the function $\lambda f$ is convex, and

$$\partial(\lambda f)(x) = \lambda\, \partial f(x)$$

• Sum: the function $f_1 + f_2$ is convex, and

$$\partial(f_1 + f_2)(x) = \partial f_1(x) + \partial f_2(x)$$

• Composition with affine mapping: let $\phi(x) = f(Ax + b)$. Then,

$$\partial\phi(x) = A^T \partial f(Ax + b)$$

• Finite pointwise maximum: if $f(x) = \max_{i=1,\dots,n} f_i(x)$, then

$$\partial f(x) = \operatorname{conv} \bigcup_{i:\, f_i(x) = f(x)} \partial f_i(x),$$

the convex hull of the union of subdifferentials of all 'active' functions at $x$

Properties of subdifferential II

• Consider a piecewise-linear function

$$f(x) = \max_{i=1,\dots,m} (a_i^T x + b_i)$$

• The subdifferential at $x$ is a polyhedron

$$\partial f(x) = \operatorname{conv}\{a_i \mid i \in I(x)\} \quad \text{with} \quad I(x) = \{i \mid a_i^T x + b_i = f(x)\}$$

Quasi-convex

Let $f : S \to \mathbb{R}$, where $S$ is a nonempty convex set in $\mathbb{R}^n$.
The function $f$ is called quasi-convex (or unimodal)
• if and only if for $x_1, x_2 \in S$,

$$f(\lambda x_1 + (1-\lambda)x_2) \le \max\{f(x_1), f(x_2)\} \quad \text{for each } \lambda \in (0, 1)$$

• if and only if all its sublevel sets $S_\alpha = \{x \in \operatorname{dom} f \mid f(x) \le \alpha\}$ for $\alpha \in \mathbb{R}$ are convex
• the function $f$ is called quasi-concave, if $-f$ is quasi-convex
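Quasi-convexity is strictly weaker than convexity. As a minimal sketch, the function $f(x) = \sqrt{|x|}$ (an illustrative choice, not from the lecture) satisfies the quasi-convexity inequality on a grid while violating the convexity inequality; the helper names and grid are assumptions.

```python
import math

f = lambda x: math.sqrt(abs(x))    # quasi-convex on R, but not convex

def quasi_ok(x1, x2, lam, tol=1e-12):
    """Quasi-convexity: f(lam x1 + (1 - lam) x2) <= max{f(x1), f(x2)}."""
    return f(lam * x1 + (1 - lam) * x2) <= max(f(x1), f(x2)) + tol

def convex_ok(x1, x2, lam, tol=1e-12):
    """Convexity: f(lam x1 + (1 - lam) x2) <= lam f(x1) + (1 - lam) f(x2)."""
    return f(lam * x1 + (1 - lam) * x2) <= lam * f(x1) + (1 - lam) * f(x2) + tol

grid = [k / 4 for k in range(-16, 17)]     # points in [-4, 4]
lams = [0.1, 0.25, 0.5, 0.75, 0.9]
quasi_holds = all(quasi_ok(a, b, l) for a in grid for b in grid for l in lams)
convex_holds = all(convex_ok(a, b, l) for a in grid for b in grid for l in lams)
```

For instance, with $x_1 = 0$, $x_2 = 4$ and $\lambda = 1/2$ the midpoint value $\sqrt{2}$ exceeds the average $1$ but stays below the maximum $2$.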

First-order condition of Quasi-convex I

Let $S$ be a nonempty open convex set in $\mathbb{R}^n$, and let $f : S \to \mathbb{R}$ be differentiable on $S$
Then, $f$ is quasi-convex if and only if

$$\text{for } x_1, x_2 \in S \text{ and } f(x_2) \le f(x_1), \quad \nabla f(x_1)^T (x_2 - x_1) \le 0$$

Proof
If $f(x_1) \ge f(x_2)$, by definition of quasi-convexity,

$$f(\lambda x_2 + (1-\lambda)x_1) = f(x_1 + \lambda(x_2 - x_1)) \le f(x_1)$$

We can write for $0 < \lambda \le 1$

$$\frac{f(x_1 + \lambda(x_2 - x_1)) - f(x_1)}{\lambda} \le 0$$

As $\lambda \to 0$, the left-hand side tends to $\nabla f(x_1)^T (x_2 - x_1)$, and we have the result

First-order condition of Quasi-convex II

Proof continued
For the converse, consider the one-dimensional case (the general case follows by restricting $f$ to the line through $x_1$ and $x_2$).
Suppose $f(x_2) \le f(x_1)$. We assume that $x_2 > x_1$ and show that $f(z) \le f(x_1)$ for $z \in [x_1, x_2]$, which is quasi-convexity.
Suppose it is not true, i.e., there is a $z \in [x_1, x_2]$ with $f(z) > f(x_1)$.
Since $f(z) > f(x_1) \ge f(x_2)$, $f$ must decrease somewhere between $z$ and $x_2$ while still above $f(x_1)$, so there exists such a $z$ with $f'(z) < 0$.
By the assumed first-order condition, we must have

$$f(x_1) \le f(z) \ \Rightarrow\ f'(z)(x_1 - z) \le 0$$

However, this is a contradiction, since $f'(z) < 0$ and $x_1 - z < 0$ give $f'(z)(x_1 - z) > 0$

Pseudo-convex

• For a quasi-convex function, $\nabla f(x) = 0$ does not give the condition of a global minimizer
• Let $S$ be a nonempty open set in $\mathbb{R}^n$ and let $f : S \to \mathbb{R}$ be differentiable on $S$
The function $f$ is called pseudoconvex:

$$\text{if for each } x_1, x_2 \in S \text{ with } \nabla f(x_1)^T (x_2 - x_1) \ge 0, \text{ we have } f(x_2) \ge f(x_1)$$

This shows that if $\nabla f(\bar{x}) = 0$ at any point $\bar{x}$, we have

$$f(x) \ge f(\bar{x}) \quad \text{for all } x,$$

which implies that $\bar{x}$ is a global minimum for $f$