Convex Optimization


Dani Yogatama
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
February 12, 2014

Key Concepts in Convex Analysis: Convex Sets
Key Concepts in Convex Analysis: Convex Functions
Key Concepts in Convex Analysis: Minimizers
(These three slides consist of figures in the original deck.)

Key Concepts in Convex Analysis: Strong Convexity

Recall the definition of a convex function: $\forall \lambda \in [0, 1]$,
$$f(\lambda x + (1 - \lambda) x') \le \lambda f(x) + (1 - \lambda) f(x') \qquad \text{(convexity)}$$
A $\beta$-strongly convex function satisfies a stronger condition: $\forall \lambda \in [0, 1]$,
$$f(\lambda x + (1 - \lambda) x') \le \lambda f(x) + (1 - \lambda) f(x') - \frac{\beta}{2} \lambda (1 - \lambda) \|x - x'\|_2^2 \qquad \text{(strong convexity)}$$
Strong convexity $\Rightarrow$ strict convexity; the converse does not hold ($\not\Leftarrow$).
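To make the strong-convexity inequality concrete, here is a small numeric spot-check on a hypothetical example not taken from the slides: $f(x) = x^2$ is $\beta$-strongly convex with $\beta = 2$, so the gap between the two sides of the inequality is never positive.

```python
# Numeric spot-check of the strong-convexity inequality for f(x) = x**2,
# which is beta-strongly convex with beta = 2. (Hypothetical example,
# not taken from the slides.)
def f(x):
    return x * x

def strong_convexity_gap(x, y, lam, beta):
    # LHS minus RHS of the inequality; <= 0 for a beta-strongly convex f.
    lhs = f(lam * x + (1 - lam) * y)
    rhs = (lam * f(x) + (1 - lam) * f(y)
           - 0.5 * beta * lam * (1 - lam) * (x - y) ** 2)
    return lhs - rhs

gaps = [strong_convexity_gap(x, y, lam, beta=2.0)
        for x in (-3.0, 0.0, 2.0)
        for y in (-1.0, 4.0)
        for lam in (0.0, 0.25, 0.5, 1.0)]
print(max(gaps))  # prints 0.0: for f(x) = x**2 the bound holds with equality
```

For $f(x) = x^2$ the bound is in fact tight for every $\lambda$, which is why the maximum gap is exactly zero rather than merely nonpositive.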
Key Concepts in Convex Analysis: Subgradients

Convexity $\Rightarrow$ continuity; convexity $\not\Rightarrow$ differentiability (e.g., $f(w) = \|w\|_1$).
Subgradients generalize gradients to (possibly non-differentiable) convex functions:
- $v$ is a subgradient of $f$ at $x$ if $f(x') \ge f(x) + v^\top (x' - x)$ (a linear lower bound).
- Subdifferential: $\partial f(x) = \{ v : v \text{ is a subgradient of } f \text{ at } x \}$.
- If $f$ is differentiable, $\partial f(x) = \{ \nabla f(x) \}$.
Notation: $\tilde{\nabla} f(x)$ denotes a subgradient of $f$ at $x$.
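As a concrete sketch of the $\|w\|_1$ example, the helper below (our own names, not from the slides) returns one element of the subdifferential, taking $\operatorname{sign}(w_i)$ at nonzero coordinates and $0$ (any value in $[-1, 1]$ would do) at zero coordinates, and then checks the linear-lower-bound inequality at a few points.

```python
# One subgradient of f(w) = ||w||_1, the slide's non-differentiable example:
# sign(w_i) where w_i != 0 and 0 (any value in [-1, 1] works) where w_i = 0.
def l1(w):
    return sum(abs(wi) for wi in w)

def l1_subgradient(w):
    return [1.0 if wi > 0 else -1.0 if wi < 0 else 0.0 for wi in w]

# Check the defining inequality f(w') >= f(w) + v^T (w' - w).
w = [1.0, -2.0, 0.0]
v = l1_subgradient(w)
for w_prime in ([0.0, 0.0, 0.0], [3.0, 1.0, -1.0], [-1.0, -2.0, 5.0]):
    bound = l1(w) + sum(vi * (wpi - wi) for vi, wpi, wi in zip(v, w_prime, w))
    assert l1(w_prime) >= bound
```

The inequality holds for every choice of `w_prime`, which is exactly what makes `v` a valid subgradient at `w`.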
Establishing Convexity

How can we check whether $f(x)$ is a convex function?
- Verify the definition of a convex function.
- Check whether $\frac{\partial^2 f(x)}{\partial x^2} \ge 0$ (for a twice-differentiable function).
- Show that it is constructed from simple convex functions with operations that preserve convexity: nonnegative weighted sum; composition with an affine function; pointwise maximum and supremum; composition; minimization; perspective.
Reference: Boyd and Vandenberghe (2004).
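The second-derivative test can be spot-checked numerically. The sketch below (our own helpers, for scalar twice-differentiable functions only) approximates $f''(x)$ with a central difference and checks nonnegativity on a grid; it is evidence, not a proof.

```python
import math

# Numeric version of the second-derivative test: approximate f''(x) with a
# central difference and check nonnegativity on a grid. A spot-check, not a
# proof, and only for scalar twice-differentiable f.
def second_derivative(f, x, h=1e-4):
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

def looks_convex(f, grid, tol=1e-6):
    return all(second_derivative(f, x) >= -tol for x in grid)

grid = [0.1 * i for i in range(-30, 31)]
print(looks_convex(math.exp, grid))  # True:  exp''(x) = exp(x) > 0
print(looks_convex(math.sin, grid))  # False: sin''(x) = -sin(x) < 0 somewhere
```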
Unconstrained Optimization

Algorithms:
- First-order methods (gradient descent, FISTA, etc.)
- Higher-order methods (Newton's method, ellipsoid, etc.)
- ...

Gradient Descent

Problem: $\min_x f(x)$.
Algorithm:
- $g_t = \frac{\partial f(x_{t-1})}{\partial x}$.
- $x_t = x_{t-1} - \eta g_t$.
- Repeat until convergence.

Newton's Method

Problem: $\min_x f(x)$; assume $f$ is twice differentiable.
Algorithm:
- $g_t = \frac{\partial f(x_{t-1})}{\partial x}$.
- $s_t = H^{-1} g_t$, where $H$ is the Hessian.
- $x_t = x_{t-1} - \eta s_t$.
- Repeat until convergence.
Newton's method is a special case of steepest descent using the Hessian norm.
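A minimal sketch of both update rules, applied to a made-up two-dimensional quadratic $f(x) = \tfrac{1}{2} x^\top A x - b^\top x$ with $A$ symmetric positive definite, so the gradient is $Ax - b$ and the Hessian is the constant matrix $A$. On a quadratic, a single Newton step with $\eta = 1$ lands exactly on the minimizer.

```python
# Gradient descent and Newton's method on a tiny made-up quadratic
# f(x) = 0.5 x^T A x - b^T x (A symmetric positive definite, so f is
# convex); gradient = A x - b, Hessian = A. Plain-Python sketch.
A = [[3.0, 1.0], [1.0, 2.0]]
b = [1.0, 1.0]

def grad(x):
    # g = A x - b
    return [sum(aij * xj for aij, xj in zip(row, x)) - bi
            for row, bi in zip(A, b)]

def gradient_descent(x, eta=0.2, steps=200):
    for _ in range(steps):
        x = [xi - eta * gi for xi, gi in zip(x, grad(x))]
    return x

def newton_step(x, eta=1.0):
    # s = H^{-1} g with H = A; the 2x2 inverse is written out by hand.
    g = grad(x)
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    s = [(A[1][1] * g[0] - A[0][1] * g[1]) / det,
         (A[0][0] * g[1] - A[1][0] * g[0]) / det]
    return [xi - eta * si for xi, si in zip(x, s)]

x_gd = gradient_descent([0.0, 0.0])
x_nt = newton_step([0.0, 0.0])  # one step suffices on a quadratic with eta = 1
```

Both routines converge to the solution of $Ax = b$, here $(0.2, 0.4)$; gradient descent needs many small steps while the Newton step uses the curvature information in $H$ to get there at once.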
Duality

Primal problem:
$$\min_x f(x) \quad \text{subject to} \quad g_i(x) \le 0, \; i = 1, \ldots, m; \qquad h_i(x) = 0, \; i = 1, \ldots, p$$
for $x \in X$.
Lagrangian:
$$L(x, \lambda, \nu) = f(x) + \sum_{i=1}^{m} \lambda_i g_i(x) + \sum_{i=1}^{p} \nu_i h_i(x)$$
where $\lambda_i$ and $\nu_i$ are Lagrange multipliers. Suppose $x$ is feasible and $\lambda \ge 0$; then we get the lower bound
$$f(x) \ge L(x, \lambda, \nu) \qquad \forall x \in X, \; \lambda \in \mathbb{R}_+^m.$$

Lagrange dual function: $\min_x L(x, \lambda, \nu)$. This is a concave function, regardless of whether $f(x)$ is convex or not. It can be $-\infty$ for some $\lambda$ and $\nu$.
Lagrange dual problem: $\max_{\lambda, \nu} \min_x L(x, \lambda, \nu)$ subject to $\lambda \ge 0$.
Dual feasible: $\lambda \ge 0$ and $(\lambda, \nu) \in \operatorname{dom} L(x, \lambda, \nu)$.
Dual optimal: $d^* = \max_{\lambda \ge 0, \nu} \min_x L(x, \lambda, \nu)$.
Primal optimal: $p^* = \min_x \max_{\lambda \ge 0, \nu} L(x, \lambda, \nu)$.
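The dual construction can be traced on a made-up one-dimensional example (not from the slides): minimize $f(x) = x^2$ subject to $g(x) = 1 - x \le 0$. The Lagrangian is $L(x, \lambda) = x^2 + \lambda(1 - x)$; minimizing over $x$ (attained at $x = \lambda/2$) gives the concave dual function $q(\lambda) = \lambda - \lambda^2/4$.

```python
# Weak/strong duality on a one-dimensional made-up example: minimize
# f(x) = x^2 subject to g(x) = 1 - x <= 0 (i.e., x >= 1). The Lagrangian is
# L(x, lam) = x^2 + lam * (1 - x); its minimizer over x is x = lam / 2,
# giving the concave dual q(lam) = lam - lam^2 / 4.
def dual(lam):
    x_star = lam / 2.0                  # argmin_x L(x, lam)
    return x_star ** 2 + lam * (1.0 - x_star)

# Maximize the dual over lam >= 0 on a grid.
d_star = max(dual(0.01 * i) for i in range(501))

p_star = 1.0  # primal optimum: the constraint binds at x = 1, f(1) = 1
print(p_star - d_star)  # ~0: d* = p* here (this convex problem has no gap)
```

Weak duality ($d^* \le p^*$) holds for any problem; here the problem is convex with a strictly feasible point, so the two optima actually coincide.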
