Convex Optimization
Dani Yogatama
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
February 12, 2014

Key Concepts in Convex Analysis: Convex Sets

Key Concepts in Convex Analysis: Convex Functions

Key Concepts in Convex Analysis: Minimizers

Key Concepts in Convex Analysis: Strong Convexity

Recall the definition of a convex function: for all $\lambda \in [0, 1]$,

  $f(\lambda x + (1 - \lambda) x') \le \lambda f(x) + (1 - \lambda) f(x')$   (convexity)

A $\beta$-strongly convex function satisfies a stronger condition: for all $\lambda \in [0, 1]$,

  $f(\lambda x + (1 - \lambda) x') \le \lambda f(x) + (1 - \lambda) f(x') - \frac{\beta}{2} \lambda (1 - \lambda) \|x - x'\|_2^2$   (strong convexity)

Strong convexity $\Rightarrow$ strict convexity, but strict convexity $\not\Rightarrow$ strong convexity.
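To make the inequality concrete, here is a small numeric sketch (my own example, not from the slides): $f(x) = x^2$ is $\beta$-strongly convex with $\beta = 2$ under the Euclidean norm, and for this quadratic the strong-convexity inequality in fact holds with equality.

```python
import random

def f(x):
    return x * x  # f(x) = x^2 is beta-strongly convex with beta = 2

beta = 2.0
random.seed(0)
for _ in range(1000):
    x, xp = random.uniform(-10, 10), random.uniform(-10, 10)
    lam = random.random()
    lhs = f(lam * x + (1 - lam) * xp)
    rhs = (lam * f(x) + (1 - lam) * f(xp)
           - beta / 2 * lam * (1 - lam) * (x - xp) ** 2)
    # Strong-convexity inequality (tight for this quadratic, up to float error)
    assert lhs <= rhs + 1e-9
```

Choosing a smaller $\beta$ (e.g., 1.0) would make the inequality strict; choosing $\beta > 2$ would make it fail, since 2 is the largest valid strong-convexity constant for $x^2$.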
Key Concepts in Convex Analysis: Subgradients

Convexity $\Rightarrow$ continuity; convexity $\not\Rightarrow$ differentiability (e.g., $f(w) = \|w\|_1$).
Subgradients generalize gradients to (possibly non-differentiable) convex functions:

  $v$ is a subgradient of $f$ at $x$ if $f(x') \ge f(x) + v^\top (x' - x)$   (a linear lower bound on $f$)

Subdifferential: $\partial f(x) = \{v : v \text{ is a subgradient of } f \text{ at } x\}$
If $f$ is differentiable, $\partial f(x) = \{\nabla f(x)\}$.
Notation: $\tilde{\nabla} f(x)$ denotes a subgradient of $f$ at $x$.
(Figures: the linear lower bound, and the non-differentiable case.)
Establishing Convexity

How to check whether $f(x)$ is a convex function?
- Verify the definition of a convex function directly.
- Check whether $\frac{\partial^2 f(x)}{\partial x^2} \ge 0$ (for a twice-differentiable function); in the multivariate case, check that the Hessian is positive semidefinite.
- Show that $f$ is constructed from simple convex functions with operations that preserve convexity:
  - nonnegative weighted sum
  - composition with an affine function
  - pointwise maximum and supremum
  - composition
  - minimization
  - perspective

Reference: Boyd and Vandenberghe (2004).

Unconstrained Optimization

Algorithms:
- First-order methods (gradient descent, FISTA, etc.)
- Higher-order methods (Newton's method, ellipsoid, etc.)
- ...

Gradient Descent

Problem: $\min_x f(x)$
Algorithm:
- $g_t = \frac{\partial f(x_{t-1})}{\partial x}$
- $x_t = x_{t-1} - \eta g_t$
- Repeat until convergence.

Newton's Method

Problem: $\min_x f(x)$. Assume $f$ is twice differentiable.
Algorithm:
- $g_t = \frac{\partial f(x_{t-1})}{\partial x}$
- $s_t = H^{-1} g_t$, where $H$ is the Hessian
- $x_t = x_{t-1} - \eta s_t$
- Repeat until convergence.

Newton's method is a special case of steepest descent using the Hessian norm.
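The two updates can be compared on a small example. This is a sketch (the objective, step size, and iteration counts are illustrative, not from the slides) using $f(x) = (x - 3)^2$, where the Hessian is constant and Newton's method converges in one step:

```python
def grad(x):
    return 2.0 * (x - 3.0)   # f'(x) for f(x) = (x - 3)^2

def hess(x):
    return 2.0               # f''(x), constant for this quadratic

# Gradient descent: x_t = x_{t-1} - eta * g_t
x, eta = 0.0, 0.1
for _ in range(100):
    x = x - eta * grad(x)
gd_solution = x

# Newton's method: x_t = x_{t-1} - H^{-1} g_t (step size eta = 1)
x = 0.0
for _ in range(10):
    x = x - grad(x) / hess(x)
newton_solution = x

# Both converge to the minimizer x* = 3.
assert abs(gd_solution - 3.0) < 1e-6
assert abs(newton_solution - 3.0) < 1e-12
```

For this quadratic, gradient descent shrinks the error by a constant factor $(1 - 2\eta)$ per iteration, while the Newton step lands on the minimizer exactly; on non-quadratic objectives Newton's advantage is local rather than one-shot.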
Duality

Primal problem:
  $\min_x f(x)$
  subject to $g_i(x) \le 0$, $i = 1, \dots, m$
             $h_i(x) = 0$, $i = 1, \dots, p$
  for $x \in X$.

Lagrangian:
  $L(x, \lambda, \nu) = f(x) + \sum_{i=1}^m \lambda_i g_i(x) + \sum_{i=1}^p \nu_i h_i(x)$
$\lambda_i$ and $\nu_i$ are Lagrange multipliers.

Suppose $x$ is feasible and $\lambda \ge 0$; then we get the lower bound:
  $f(x) \ge L(x, \lambda, \nu) \quad \forall x \in X,\ \lambda \in \mathbb{R}^m_+$

Duality

Primal optimal: $p^* = \min_x \max_{\lambda \ge 0, \nu} L(x, \lambda, \nu)$
Lagrange dual function: $\min_x L(x, \lambda, \nu)$. This is a concave function, regardless of whether $f(x)$ is convex or not. It can be $-\infty$ for some $\lambda$ and $\nu$.
Lagrange dual problem: $\max_{\lambda, \nu} \min_x L(x, \lambda, \nu)$ subject to $\lambda \ge 0$
Dual feasible: $\lambda \ge 0$ and $(\lambda, \nu) \in \operatorname{dom} L(x, \lambda, \nu)$.
Dual optimal: $d^* = \max_{\lambda \ge 0, \nu} \min_x L(x, \lambda, \nu)$
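As a worked example (my own, not from the slides), these definitions can be traced through a simple one-dimensional problem:

```latex
\text{Primal: } \min_x \; x^2 \quad \text{subject to } 1 - x \le 0
    \qquad (p^* = 1, \text{ attained at } x^* = 1)

\text{Lagrangian: } L(x, \lambda) = x^2 + \lambda (1 - x)

\text{Dual function: setting } \tfrac{\partial L}{\partial x} = 2x - \lambda = 0
    \text{ gives } x = \lambda / 2, \text{ so}
\min_x L(x, \lambda) = \lambda - \frac{\lambda^2}{4}
    \quad (\text{concave in } \lambda, \text{ as claimed})

\text{Dual problem: } \max_{\lambda \ge 0} \; \lambda - \frac{\lambda^2}{4},
    \text{ maximized at } \lambda^* = 2, \text{ giving } d^* = 1
```

Here $d^* = p^* = 1$: the primal is convex with a strictly feasible point, so strong duality holds.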