
Practical Session on Convex Optimization: Constrained Optimization
Mark Schmidt, INRIA/ENS, MLSS, September 2011

Motivation: Optimizing with Constraints
Often we have constraints on the problem:
- Natural bounds on the variables.
- Regularization or identifiability.
- The domain of the function is restricted.
We may also introduce constraints in order to exploit problem structure.

Example: ℓ1-Regularized Optimization
ℓ1-regularization problems are of the form
    min_w f(w) + λ||w||_1.
The problem is non-smooth because of the ℓ1-norm.
We can convert it to a smooth constrained optimization problem:
    min_{−s ≤ w ≤ s} f(w) + λ Σ_i s_i.
Or write it as a smooth bound-constrained problem:
    min_{w⁺ ≥ 0, w⁻ ≥ 0} f(w⁺ − w⁻) + λ Σ_i w⁺_i + λ Σ_i w⁻_i.

Outline: Optimizing with Constraints
- Penalty-type methods for constrained optimization.
- Projection-type methods for constrained optimization.
- Convex duality.

Penalty-type Methods
Penalty-type methods re-write the problem as an unconstrained one, e.g.:
- Penalty method for equality constraints: re-write min_{c(x) = 0} f(x) as
      min_x f(x) + (µ/2) ||c(x)||².
- Penalty method for inequality constraints c(x) ≤ 0: re-write min_{c(x) ≤ 0} f(x) as
      min_x f(x) + (µ/2) ||max{0, c(x)}||².
These converge to the original problem as µ → ∞.

Exercise: Penalty Methods
Penalty method for non-negative ℓ1-regularized logistic regression:
    min_{w ≥ 0} λ||w||_1 + Σ_{i=1}^n log(1 + exp(−y_i w^T x_i)).
- Use an existing unconstrained optimization code.
- Solve for an increasing sequence of µ values (a sketch is given below).
Note the trade-off associated with penalty methods:
- Small µ: the subproblem is easy to solve, but it is a poor approximation.
- Large µ: a good approximation, but the subproblem is hard to solve.
Augmented Lagrangian methods incorporate Lagrange multiplier estimates to improve the approximation for finite µ.
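A minimal sketch of this exercise, not the code distributed with the session: the constraint w ≥ 0 is handled by the quadratic penalty (µ/2)||min{0, w}||² (equivalently ||max{0, −w}||²), and each smooth subproblem is handed to SciPy's general-purpose minimizer as the "existing unconstrained optimization code". The data, λ, and the µ schedule are made-up placeholders.

```python
import numpy as np
from scipy.optimize import minimize

# Made-up problem data: n samples, d features, labels in {-1, +1}.
rng = np.random.default_rng(0)
n, d, lam = 200, 50, 1.0
X = rng.standard_normal((n, d))
y = np.sign(rng.standard_normal(n))

def penalized_objective(w, mu):
    """Smooth subproblem: logistic loss + lam*sum(w) + quadratic penalty on w < 0."""
    loss = np.logaddexp(0.0, -y * (X @ w)).sum()   # sum_i log(1 + exp(-y_i w'x_i))
    l1 = lam * w.sum()                             # equals lam*||w||_1 once w >= 0 holds
    violation = np.minimum(0.0, w)                 # negative entries violate w >= 0
    return loss + l1 + 0.5 * mu * violation @ violation

w = np.zeros(d)
for mu in [1.0, 10.0, 100.0, 1000.0]:              # increasing penalty parameter
    # Warm-start each subproblem at the previous solution.
    w = minimize(penalized_objective, w, args=(mu,), method='L-BFGS-B').x
print("largest constraint violation:", max(0.0, -w.min()))
```

Any unconstrained solver from the first session could replace minimize here; warm-starting each subproblem at the previous solution helps keep the large-µ subproblems manageable.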
Augmented Lagrangian Method
Augmented Lagrangian method for equality constraints c(x) = 0:
1. Approximately solve
       min_x f(x) + y_k^T c(x) + (µ/2) ||c(x)||².
2. Update the Lagrange multiplier estimates:
       y_{k+1} = y_k + µ c(x).
(for an increasing sequence of µ values)

Augmented Lagrangian method for inequality constraints c(x) ≤ 0:
1. Approximately solve
       min_x f(x) + y_k^T c(x) + (µ/2) ||max{0, c(x)}||².
2. Update the Lagrange multiplier estimates:
       y_{k+1} = max{0, y_k + µ c(x)}.
(for an increasing sequence of µ values)
Exercise: extend the penalty method to an augmented Lagrangian method.

Notes on Penalty-Type Methods
Exact penalty methods use a non-smooth penalty,
    min_x f(x) + µ ||c(x)||_1,
and are equivalent to the original problem for a finite (sufficiently large) µ.
Log-barrier methods enforce strict feasibility of constraints c_i(x) ≥ 0:
    min_x f(x) − µ Σ_i log c_i(x).
Most interior-point software packages implement a primal-dual log-barrier method.

Projection-type Methods
Projection-type methods address the problem of optimizing over convex sets.
A convex set C is a set such that
    θx + (1 − θ)y ∈ C for all x, y ∈ C and 0 ≤ θ ≤ 1.
Projection-type methods use the projection operator
    P_C(x) = argmin_{y ∈ C} (1/2) ||x − y||².
For non-negative constraints, this operator is simply P_C(x) = max{0, x}, taken element-wise.
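To make the projection operator concrete, here is a small illustration (not part of the original session code): projections onto the non-negative orthant, as used in the bound-constrained reformulation and the exercise above, and onto generic box constraints l ≤ x ≤ u are simple coordinate-wise operations.

```python
import numpy as np

def project_nonnegative(x):
    """P_C for C = {x : x >= 0}: zero out the negative coordinates."""
    return np.maximum(0.0, x)

def project_box(x, lower, upper):
    """P_C for C = {x : lower <= x <= upper}: clip each coordinate."""
    return np.clip(x, lower, upper)

x = np.array([-1.5, 0.3, 2.0])
print(project_nonnegative(x))     # [0.  0.3 2. ]
print(project_box(x, -1.0, 1.0))  # [-1.  0.3 1. ]
```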
The most basic projection-type method is gradient projection:
    x_{k+1} = P_C(x_k − α_k ∇f(x_k)).
We can use a variant of the Armijo condition to choose α_k:
    f(x_{k+1}) ≤ f(x_k) − γ ∇f(x_k)^T (x_k − x_{k+1}).
This algorithm has convergence and rate-of-convergence properties similar to those of the gradient method, and we can use many of the same tricks (polynomial interpolation, Nesterov extrapolation, Barzilai-Borwein step lengths).
Exercise: modify either the Nesterov code or the gradient code from the first session to do gradient projection for non-negative ℓ1-regularized logistic regression (a sketch of this update appears at the end of this section).

When the constraints are more general, computing the projection itself may be difficult, but several heuristics are available:
1. Sequential quadratic programming: use a linear approximation to the constraints.
2. Active-set methods: sub-optimize over a manifold of selected constraints.
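A minimal sketch of the gradient-projection exercise, written from scratch rather than by modifying the first session's code (which is not reproduced here): projected gradient with an Armijo-style backtracking line search for non-negative ℓ1-regularized logistic regression. Function names, constants, and the random data are placeholders.

```python
import numpy as np

def gradient_projection_l1_nonneg(X, y, lam=1.0, gamma=1e-4, max_iter=500, tol=1e-8):
    """Projected gradient with Armijo backtracking for
       min_{w >= 0}  sum_i log(1 + exp(-y_i w'x_i)) + lam * sum(w)."""
    w = np.zeros(X.shape[1])

    def f(w):
        return np.logaddexp(0.0, -y * (X @ w)).sum() + lam * w.sum()

    def grad(w):
        z = y * (X @ w)
        p = 0.5 * (1.0 - np.tanh(0.5 * z))   # sigmoid(-z), numerically stable
        return -X.T @ (y * p) + lam

    for _ in range(max_iter):
        g = grad(w)
        alpha = 1.0
        while True:
            w_new = np.maximum(0.0, w - alpha * g)   # projection onto w >= 0
            # Armijo-style decrease: f(x_{k+1}) <= f(x_k) - gamma * g'(x_k - x_{k+1}).
            if f(w_new) <= f(w) - gamma * g @ (w - w_new) or alpha < 1e-10:
                break
            alpha *= 0.5                             # backtrack
        if np.linalg.norm(w_new - w) < tol:
            return w_new
        w = w_new
    return w

# Made-up data just to exercise the routine.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
y = np.sign(rng.standard_normal(100))
w = gradient_projection_l1_nonneg(X, y, lam=0.5)
print("number of nonzero coefficients:", np.count_nonzero(w))
```

Polynomial interpolation in the line search, Barzilai-Borwein step lengths, or Nesterov-style extrapolation can be layered on top of this basic loop, as noted above.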