
Introduction to Convex Optimization for Machine Learning John Duchi University of California, Berkeley Practical Machine Learning, Fall 2009 Duchi (UC Berkeley) Convex Optimization for Machine Learning Fall2009 1/53 Outline What is Optimization Convex Sets Convex Functions Convex Optimization Problems Lagrange Duality Optimization Algorithms Take Home Messages Duchi (UC Berkeley) Convex Optimization for Machine Learning Fall2009 2/53 What is Optimization What is Optimization (and why do we care?) Duchi (UC Berkeley) Convex Optimization for Machine Learning Fall2009 3/53 What is Optimization What is Optimization? ◮ Finding the minimizer of a function subject to constraints: minimize f0(x) x s.t. fi(x) 0, i = 1, . , k ≤ { } hj(x) = 0, j = 1,...,l { } Duchi (UC Berkeley) Convex Optimization for Machine Learning Fall2009 4/53 What is Optimization What is Optimization? ◮ Finding the minimizer of a function subject to constraints: minimize f0(x) x s.t. fi(x) 0, i = 1, . , k ≤ { } hj(x) = 0, j = 1,...,l { } ◮ Example: Stock market. “Minimize variance of return subject to getting at least $50.” Duchi (UC Berkeley) Convex Optimization for Machine Learning Fall2009 4/53 What is Optimization Why do we care? Optimization is at the heart of many (most practical?) machine learning algorithms. ◮ Linear regression: minimize Xw y 2 w k − k ◮ Classification (logistic regresion or SVM): n T minimize log 1 + exp( yix w) w − i i=1 X n 2 T or w + C ξi s.t. ξi 1 yix w, ξi 0. k k ≥ − i ≥ i X=1 Duchi (UC Berkeley) Convex Optimization for Machine Learning Fall2009 5/53 What is Optimization We still care... ◮ Maximum likelihood estimation: n maximize log pθ(xi) θ i X=1 ◮ Collaborative filtering: T T minimize log 1 + exp(w xi w xj) w − i j X≺ ◮ k-means: k 2 minimize J(µ)= xi µj µ1,...,µk k − k j=1 i Cj X X∈ ◮ And more (graphical models, feature selection, active learning, control) Duchi (UC Berkeley) Convex Optimization for Machine Learning Fall2009 6/53 What is Optimization But generally speaking... We’re screwed. ◮ Local (non global) minima of f0 ◮ All kinds of constraints (even restricting to continuous functions): h(x) = sin(2πx) = 0 250 200 150 100 50 0 −50 3 2 3 1 2 0 1 −1 0 −1 −2 −2 −3 −3 Duchi (UC Berkeley) Convex Optimization for Machine Learning Fall2009 7/53 What is Optimization But generally speaking... We’re screwed. ◮ Local (non global) minima of f0 ◮ All kinds of constraints (even restricting to continuous functions): h(x) = sin(2πx) = 0 250 200 150 100 50 0 −50 3 2 3 1 2 0 1 −1 0 −1 −2 −2 −3 −3 ◮ Go for convex problems! Duchi (UC Berkeley) Convex Optimization for Machine Learning Fall2009 7/53 Convex Sets Convex Sets Duchi (UC Berkeley) Convex Optimization for Machine Learning Fall2009 8/53 Convex Sets Definition A set C Rn is convex if for x, y C and any α [0, 1], ⊆ ∈ ∈ αx + (1 α)y C. − ∈ y x Duchi (UC Berkeley) Convex Optimization for Machine Learning Fall2009 9/53 Convex Sets Examples ◮ All of Rn (obvious) Duchi (UC Berkeley) Convex Optimization for Machine Learning Fall2009 10/53 Convex Sets Examples ◮ All of Rn (obvious) ◮ Non-negative orthant, Rn : let x 0, y 0, clearly + αx + (1 α)y 0. − Duchi (UC Berkeley) Convex Optimization for Machine Learning Fall2009 10/53 Convex Sets Examples ◮ All of Rn (obvious) ◮ Non-negative orthant, Rn : let x 0, y 0, clearly + αx + (1 α)y 0. − ◮ Norm balls: let x 1, y 1, then k k≤ k k≤ αx + (1 α)y αx + (1 α)y = α x + (1 α) y 1. k − k ≤ k k k − k k k − k k≤ Duchi (UC Berkeley) Convex Optimization for Machine Learning Fall2009 10/53 Convex Sets Examples ◮ Affine subspaces: Ax = b, Ay = b, then A(αx + (1 α)y)= αAx + (1 α)Ay = αb + (1 α)b = b. − − − 1 0.8 0.6 0.4 3 x 0.2 0 −0.2 −0.4 1 0.8 1 0.6 0.8 0.4 0.6 0.4 0.2 0.2 0 0 x2 x1 Duchi (UC Berkeley) Convex Optimization for Machine Learning Fall2009 11/53 Convex Sets More examples ◮ Arbitrary intersections of convex sets: let Ci be convex for i , ∈ I C = i Ci, then x C, y C αx + (1 α)y Ci i T ∈ ∈ ⇒ − ∈ ∀ ∈ I so αx + (1 α)y C. − ∈ Duchi (UC Berkeley) Convex Optimization for Machine Learning Fall2009 12/53 Convex Sets More examples ◮ PSD Matrices, a.k.a. the positive semidefinite cone Sn Rn n Sn + × . A + means T ⊂ ∈ Rn x Ax 0 for all x . For 1 ≥S+ ∈ A, B , 0.8 ∈ n 0.6 T z x (αA + (1 α)B) x 0.4 − = αxT Ax + (1 α)xT Bx 0. 0.2 0 − ≥ 1 0.5 1 0.8 ◮ 0 On right: 0.6 −0.5 0.4 0.2 y −1 0 x x z S2 = 0 = x,y,z : x 0, y 0, xy z2 + z y ≥ ≥ ≥ Duchi (UC Berkeley) Convex Optimization for Machine Learning Fall2009 13/53 Convex Functions Convex Functions Duchi (UC Berkeley) Convex Optimization for Machine Learning Fall2009 14/53 Convex Functions Definition A function f : Rn R is convex if for x, y dom f and any α [0, 1], → ∈ ∈ f(αx + (1 α)y) αf(x) + (1 α)f(y). − ≤ − f(y) αf(x) + (1 - α)f(y) f(x) Duchi (UC Berkeley) Convex Optimization for Machine Learning Fall2009 15/53 Convex Functions First order convexity conditions Theorem Suppose f : Rn R is differentiable. Then f is convex if and only if for → all x, y dom f ∈ f(y) f(x)+ f(x)T (y x) ≥ ∇ − f(y) f(x) + f(x)T (y - x) ∇ (x, f(x)) Duchi (UC Berkeley) Convex Optimization for Machine Learning Fall2009 16/53 Convex Functions Actually, more general than that Definition The subgradient set, or subdifferential set, ∂f(x) of f at x is ∂f(x)= g : f(y) f(x)+ gT (y x) for all y . ≥ − f(y) Theorem f : Rn R is convex if and only if it→ has non-empty subdifferential set everywhere. (x, f(x)) f(x) + gT (y - x) Duchi (UC Berkeley) Convex Optimization for Machine Learning Fall2009 17/53 Convex Functions Second order convexity conditions Theorem Suppose f : Rn R is twice differentiable. Then f is convex if and only if → for all x dom f, ∈ 2f(x) 0. ∇ 10 8 6 4 2 0 2 1 2 1 0 0 −1 −1 −2 −2 Duchi (UC Berkeley) Convex Optimization for Machine Learning Fall2009 18/53 Convex Functions Convex sets and convex functions Definition The epigraph of a function f is the epi f set of points epi f = (x,t): f(x) t . { ≤ } ◮ epi f is convex if and only if f is convex. a ◮ Sublevel sets, x : f(x) a { ≤ } are convex for convex f. Duchi (UC Berkeley) Convex Optimization for Machine Learning Fall2009 19/53 Convex Functions Examples Examples ◮ Linear/affine functions: f(x)= bT x + c. Duchi (UC Berkeley) Convex Optimization for Machine Learning Fall2009 20/53 Convex Functions Examples Examples ◮ Linear/affine functions: f(x)= bT x + c. ◮ Quadratic functions: 1 f(x)= xT Ax + bT x + c 2 for A 0. For regression: 1 1 1 Xw y 2 = wT XT Xw yT Xw + yT y. 2 k − k 2 − 2 Duchi (UC Berkeley) Convex Optimization for Machine Learning Fall2009 20/53 Convex Functions Examples More examples ◮ Norms (like ℓ1 or ℓ2 for regularization): αx + (1 α)y αx + (1 α)y = α x + (1 α) y . k − k ≤ k k k − k k k − k k Duchi (UC Berkeley) Convex Optimization for Machine Learning Fall2009 21/53 Convex Functions Examples More examples ◮ Norms (like ℓ1 or ℓ2 for regularization): αx + (1 α)y αx + (1 α)y = α x + (1 α) y . k − k ≤ k k k − k k k − k k ◮ Composition with an affine function f(Ax + b): f(A(αx + (1 α)y)+ b)= f(α(Ax + b) + (1 α)(Ay + b)) − − αf(Ax + b) + (1 α)f(Ay + b) ≤ − Duchi (UC Berkeley) Convex Optimization for Machine Learning Fall2009 21/53 Convex Functions Examples More examples ◮ Norms (like ℓ1 or ℓ2 for regularization): αx + (1 α)y αx + (1 α)y = α x + (1 α) y . k − k ≤ k k k − k k k − k k ◮ Composition with an affine function f(Ax + b): f(A(αx + (1 α)y)+ b)= f(α(Ax + b) + (1 α)(Ay + b)) − − αf(Ax + b) + (1 α)f(Ay + b) ≤ − ◮ Log-sum-exp (via 2f(x) PSD): ∇ n f(x) = log exp(xi) i ! X=1 Duchi (UC Berkeley) Convex Optimization for Machine Learning Fall2009 21/53 Convex Functions Examples Important examples in Machine Learning 3 ◮ SVM loss: [1 - x]+ T f(w)= 1 yix w − i + ◮ Binary logistic loss: x T log(1 + e ) f(w) = log 1 + exp( yix w) − i 0 −2 3 Duchi (UC Berkeley) Convex Optimization for Machine Learning Fall2009 22/53 Convex Optimization Problems Convex Optimization Problems Definition An optimization problem is convex if its objective is a convex function, the inequality constraints fj are convex, and the equality constraints hj are affine minimize f0(x) (Convex function) x s.t. fi(x) 0 (Convex sets) ≤ hj(x) = 0 (Affine) Duchi (UC Berkeley) Convex Optimization for Machine Learning Fall2009 23/53 Convex Optimization Problems It’s nice to be convex Theorem If xˆ is a local minimizer of a convex optimization problem, it is a global minimizer.
Details
-
File Typepdf
-
Upload Time-
-
Content LanguagesEnglish
-
Upload UserAnonymous/Not logged-in
-
File Pages82 Page
-
File Size-