15-780 – Numerical Optimization

J. Zico Kolter

January 29, 2014

Overview

• Introduction to mathematical programming problems
• Applications
• Classification of optimization problems
• (Linear algebra review)
• Convex optimization problems
• Nonconvex optimization problems
• Solving optimization problems

Introduction to mathematical optimization

• Casting AI problems as optimization / mathematical programming problems has been one of the primary trends of the last 15 years

• A topic not highlighted in the textbook (see website for additional readings)

• A seemingly remarkable fact:

                              Search problems    Mathematical programs
  Variable type               Discrete           Continuous
  # of possible solutions     Finite             Infinite
  "Difficulty" of solving     Exponential        Polynomial (often)

Formal definition

• Mathematical programs are problems of the form

      minimize_x   f(x)
      subject to   g_i(x) ≤ 0,   i = 1, …, m

  – x ∈ R^n is the optimization variable
  – f : R^n → R is the objective function
  – g_i : R^n → R are the (inequality) constraint functions

• Feasible region: C = {x : g_i(x) ≤ 0, ∀ i = 1, …, m}

• x* ∈ R^n is an optimal solution if x* ∈ C and f(x*) ≤ f(x), ∀ x ∈ C

Note on naming

• Mathematical program (or mathematical optimization problem, and sometimes just optimization problem) refers to a problem of the form

      minimize_x   f(x)
      subject to   g_i(x) ≤ 0,   i = 1, …, m

• Numerical optimization (or mathematical optimization, or just optimization) refers to the general study of these problems, as well as methods for solving them on a computer

Least-squares fitting

[Figure: a least-squares line fit h(a) to data points (a_i, b_i), with the residuals h(a_i) − b_i shown]

• Given a_i, b_i, i = 1, …, m, find h(a) = x_1 a + x_2 that optimizes

      minimize_x   ∑_{i=1}^m (x_1 a_i + x_2 − b_i)^2

  (x_1 is the slope, x_2 is the intercept)
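To make this concrete, here is a small numpy sketch (not part of the original slides; the data is synthetic and the line b = 2a + 1 is invented for illustration):

import numpy as np

# Synthetic data scattered around the line b = 2a + 1 (invented for illustration)
rng = np.random.default_rng(0)
a = rng.uniform(0, 10, size=50)
b = 2 * a + 1 + rng.normal(scale=0.5, size=50)

# Stack rows [a_i, 1] so that M @ [x1, x2] approximates b
M = np.column_stack([a, np.ones_like(a)])
(x1, x2), *_ = np.linalg.lstsq(M, b, rcond=None)
print(x1, x2)   # slope and intercept, close to 2 and 1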

Weber point

[Figure: m points (a_i, b_i) in the plane and a candidate point (x_1, x_2)]

• Given m points in 2D space (a_i, b_i), i = 1, …, m, find the point that minimizes the sum of the Euclidean distances:

      minimize_x   ∑_{i=1}^m √((x_1 − a_i)^2 + (x_2 − b_i)^2)

• Many modifications, e.g. keep x within the range (a_l, b_l), (a_u, b_u):

      minimize_x   ∑_{i=1}^m √((x_1 − a_i)^2 + (x_2 − b_i)^2)
      subject to   a_l ≤ x_1 ≤ a_u,   b_l ≤ x_2 ≤ b_u

Optimization problems

[Figure: the space of optimization problems split into convex and nonconvex problems; within the convex problems, linear programming is contained in quadratic programming, which is contained in semidefinite programming]

Notation

• A (column) vector with n real-valued entries: x ∈ R^n

  – x_i denotes the ith entry

• A matrix with real-valued entries, m rows, and n columns: A ∈ R^{m×n}

  – A_ij denotes the entry in the ith row and jth column

• The transpose operator A^T switches the rows and columns of a matrix:

      A_ij = (A^T)_ji

Operations

• Addition: for A, B ∈ R^{m×n},

      (A + B)_ij = A_ij + B_ij

• Multiplication: for A ∈ R^{m×n}, B ∈ R^{n×p},

      (AB)_ij = ∑_{k=1}^n A_ik B_kj,   AB ∈ R^{m×p}

  – associative: A(BC) = (AB)C = ABC
  – distributive: A(B + C) = AB + AC
  – not commutative: AB ≠ BA (in general)
  – transpose of product: (AB)^T = B^T A^T
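A quick numpy sanity check of these identities (an illustrative sketch, not from the slides; the matrices are random):

import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 4))
B = rng.normal(size=(4, 5))
C = rng.normal(size=(5, 2))

# Associativity: A(BC) == (AB)C up to floating-point error
print(np.allclose(A @ (B @ C), (A @ B) @ C))   # True

# Transpose of a product reverses the order: (AB)^T == B^T A^T
print(np.allclose((A @ B).T, B.T @ A.T))       # True

# Non-commutativity: for square matrices, AB != BA in general
S, T = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))
print(np.allclose(S @ T, T @ S))               # False (almost surely)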

• Inner product: for x, y ∈ R^n,

      x^T y = ⟨x, y⟩ = ∑_{i=1}^n x_i y_i

• Identity matrix: I ∈ R^{n×n} has ones on the diagonal and zeros elsewhere; it has the property that IA = A

• Matrix inverse: for A ∈ R^{n×n}, A^{-1} is the unique matrix such that

      AA^{-1} = I = A^{-1}A

  (may not exist for all square matrices)

  – inverse of product: (AB)^{-1} = B^{-1}A^{-1} when A and B are both invertible

Convex optimization problems

• An extremely powerful subset of all optimization problems

• Roughly speaking, allow for efficient (polynomial time) global solutions

• Beautiful theory

• Lots of applications (but be careful: lots of problems we'd like to solve are definitely not convex)

• At this point, a fairly mature technology (off-the-shelf libraries for medium-sized problems)

Formal definition

• A convex optimization problem is a specialization of a mathematical programming problem:

      minimize_x   f(x)
      subject to   x ∈ C

  where x ∈ R^n is the optimization variable, f : R^n → R is a convex function, and C is a convex set

Convex sets

• A set C is convex if, for any x, y ∈ C and θ ∈ [0, 1],

      θx + (1 − θ)y ∈ C

[Figure: a convex set and a nonconvex set]

• All of R^n: x, y ∈ R^n ⇒ θx + (1 − θ)y ∈ R^n

• Intervals: C = {x ∈ R^n : l ≤ x ≤ u} (elementwise inequality)

• Linear inequalities: C = {x ∈ R^n : Ax ≤ b}, A ∈ R^{m×n}, b ∈ R^m

  Proof:  x, y ∈ C ⇒ Ax ≤ b, Ay ≤ b
                   ⇒ θAx ≤ θb, (1 − θ)Ay ≤ (1 − θ)b
                   ⇒ θAx + (1 − θ)Ay ≤ b
                   ⇒ A(θx + (1 − θ)y) ≤ b
                   ⇒ θx + (1 − θ)y ∈ C

• Linear equalities: C = {x ∈ R^n : Ax = b}, A ∈ R^{m×n}, b ∈ R^m

• Intersection of convex sets: C = ∩_{i=1}^m C_i for convex sets C_i, i = 1, …, m

Convex functions

• A function f : R^n → R is convex if, for any x, y ∈ R^n and θ ∈ [0, 1],

      f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y)

[Figure: a convex function; the chord between (x, f(x)) and (y, f(y)) lies above the graph]

• f is concave if −f is convex

• f is affine if f is both convex and concave; it must have the form

      f(x) = a^T x + b

  for a ∈ R^n, b ∈ R

Testing for convexity

• A convex function must "curve upwards" everywhere

• For functions with scalar input f : R → R, this is equivalent to the condition f″(x) ≥ 0, ∀ x

• For vector-input functions, the corresponding condition is that

      ∇_x^2 f(x) ⪰ 0

  where ∇_x^2 f(x) is the Hessian of f,

      (∇_x^2 f(x))_ij = ∂^2 f(x) / (∂x_i ∂x_j)

  and ⪰ 0 denotes positive semidefiniteness, the condition that for all z ∈ R^n,

      z^T ∇_x^2 f(x) z ≥ 0

  (a numeric version of this check is sketched after the examples below)

Examples

• Exponential: f(x) = exp(ax)  [f″(x) = a^2 exp(ax) ≥ 0, ∀ x]

• Negative logarithm: f(x) = −log(x)  [f″(x) = 1/x^2 ≥ 0, ∀ x]

• Squared Euclidean distance: f(x) = x^T x  [after some derivation, ∇_x^2 f(x) = 2I, ∀ x, which is positive semidefinite since z^T (2I) z = 2 z^T z ≥ 0]

• Euclidean distance: f(x) = ‖x‖_2 = √(x^T x)

• Non-negative weighted sum of convex functions:

      f(x) = ∑_{i=1}^m w_i f_i(x)

  where w_i ≥ 0 and f_i convex, ∀ i = 1, …, m

• Negative square root: f(x) = −√x, convex for x ≥ 0

• Log-sum-exp: f(x) = log(e^{x_1} + e^{x_2} + ⋯ + e^{x_n})

• Maximum of convex functions:

      f(x) = max_{i=1,…,m} f_i(x)

  where f_i convex, ∀ i

• Composition of a convex and an affine function: if f(y) is convex in y, then f(Ax − b) is convex in x

• Sublevel sets: for f convex, C = {x : f(x) ≤ c} is a convex set
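The numeric check promised above: a small numpy sketch (not from the slides) that tests positive semidefiniteness of a Hessian by examining its eigenvalues, using the Hessian 2I of f(x) = x^T x as the example:

import numpy as np

def hessian_is_psd(hess, tol=1e-9):
    """Check z^T H z >= 0 for all z by testing the eigenvalues of H."""
    # For symmetric H, positive semidefiniteness is equivalent to all
    # eigenvalues being (numerically) non-negative.
    return np.all(np.linalg.eigvalsh(hess) >= -tol)

# f(x) = x^T x has Hessian 2I everywhere, which is positive semidefinite
n = 4
H = 2 * np.eye(n)
print(hessian_is_psd(H))        # True

# A counterexample: an indefinite matrix fails the test
H_bad = np.diag([1.0, -1.0])
print(hessian_is_psd(H_bad))    # False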

Convex optimization problems (again)

• Using the sublevel sets property, it's more common to write generic convex optimization problems as

      minimize_x   f(x)
      subject to   g_i(x) ≤ 0,   i = 1, …, m
                   h_i(x) = 0,   i = 1, …, p

  where f (the objective) is convex, the g_i's (inequality constraints) are convex, and the h_i's (equality constraints) are affine

• Key property of convex optimization problems: all local solutions are global solutions

• Definition: a point x is globally optimal if x is feasible and there is no feasible y such that f(y) < f(x)

• Definition: a point x is locally optimal if x is feasible and there exists some R > 0 such that for all feasible y with ‖x − y‖_2 ≤ R, f(x) ≤ f(y)

• Theorem: for a convex optimization problem, all locally optimal points are globally optimal

• Proof (by contradiction): suppose x is locally optimal but there exists a feasible y such that f(y) < f(x). Pick the point

      z = θx + (1 − θ)y,   θ = 1 − R / (2‖x − y‖_2)

  We will show that ‖x − z‖_2 < R, that z is feasible, and that f(z) < f(x), contradicting local optimality.

Proof (continued)

      ‖x − z‖_2 = ‖x − (1 − R/(2‖x − y‖_2)) x − (R/(2‖x − y‖_2)) y‖_2
                = (R/(2‖x − y‖_2)) ‖x − y‖_2
                = R/2 < R

• Furthermore, by convexity of the feasible set, z is feasible, and by convexity of f,

      f(z) ≤ θf(x) + (1 − θ)f(y) < f(x)

  since f(y) < f(x) and θ < 1. This establishes all the points above, contradicting the supposition that x was locally optimal.

Example: least-squares

• The general (potentially multi-variable) least-squares problem is given by

      minimize_x   ‖Ax − b‖_2^2

  for optimization variable x ∈ R^n and data matrices A ∈ R^{m×n}, b ∈ R^m, where ‖y‖_2^2 is the squared ℓ_2 norm of a vector y ∈ R^m,

      ‖y‖_2^2 = y^T y = ∑_{i=1}^m y_i^2

• A convex optimization problem (why?)
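This problem has the well-known closed-form solution x* = (A^T A)^{-1} A^T b; a numpy sketch with invented data (not from the slides):

import numpy as np

rng = np.random.default_rng(2)
m, n = 20, 5
A = rng.normal(size=(m, n))
b = rng.normal(size=m)

# Closed-form solution via the normal equations (A^T A) x = A^T b
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# In practice np.linalg.lstsq is preferred for numerical stability
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x_normal, x_lstsq))   # True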

Example: Weber point

• The Weber point in n dimensions:

      minimize_x   ∑_{i=1}^m ‖x − a_i‖_2

  where x ∈ R^n is the optimization variable and a_1, …, a_m are problem data

• A convex optimization problem (why?)
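There is no closed form for the Weber point, but since the problem is convex a generic solver will find the global minimum. A rough scipy sketch with invented data (not from the slides; the objective is non-differentiable exactly at the points a_i, which the default quasi-Newton method usually tolerates in practice):

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
a = rng.normal(size=(10, 2))    # m = 10 invented points in 2D

# Objective: sum of Euclidean distances from x to each a_i
f = lambda x: np.sum(np.linalg.norm(a - x, axis=1))

# Any starting point works; convexity guarantees a global minimum
res = minimize(f, x0=a.mean(axis=0))
print(res.x)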

Example: linear programming

• General form of a linear program:

      minimize_x   c^T x
      subject to   Ax = b
                   Fx ≤ g

  where x ∈ R^n is the optimization variable and A ∈ R^{m×n}, b ∈ R^m, F ∈ R^{p×n}, g ∈ R^p are problem data

• You may have also seen this with just the inequality constraint x ≥ 0 (these are equivalent, why?)

• A convex problem (affine objective, convex constraints)
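scipy ships an LP solver that accepts exactly this form; a minimal sketch with invented data (not from the slides):

import numpy as np
from scipy.optimize import linprog

# minimize c^T x  subject to  Ax = b,  Fx <= g  (invented data)
c = np.array([1.0, 2.0])
A = np.array([[1.0, 1.0]]); b = np.array([1.0])   # x_1 + x_2 = 1
F = -np.eye(2); g = np.zeros(2)                   # -x <= 0, i.e. x >= 0

# bounds=(None, None) disables linprog's default x >= 0 bound, so all
# constraints come from (A, b) and (F, g) as in the general form above
res = linprog(c, A_ub=F, b_ub=g, A_eq=A, b_eq=b, bounds=(None, None))
print(res.x)   # -> [1, 0]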

Example: quadratic programming

• General form of a quadratic program:

      minimize_x   x^T Q x + r^T x
      subject to   Ax = b
                   Fx ≤ g

  where x ∈ R^n is the optimization variable and A ∈ R^{m×n}, b ∈ R^m, F ∈ R^{p×n}, g ∈ R^p are problem data

• A convex problem if Q is positive semidefinite
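Such a QP can be written almost directly with cvxpy, the Python library introduced at the end of these notes; a sketch assuming a recent cvxpy version, with invented data:

import numpy as np
import cvxpy as cp

# minimize x^T Q x + r^T x  s.t.  sum(x) = 1, x >= 0  (invented data)
Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])    # positive definite, so the problem is convex
r = np.array([1.0, -1.0])

x = cp.Variable(2)
objective = cp.Minimize(cp.quad_form(x, Q) + r @ x)
constraints = [cp.sum(x) == 1, x >= 0]
cp.Problem(objective, constraints).solve()
print(x.value)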

Nonconvex problems

• Any optimization problem

      minimize_x   f(x)
      subject to   g_i(x) ≤ 0,   i = 1, …, m
                   h_i(x) = 0,   i = 1, …, p

  where f or any g_i is not convex, or any h_i is not affine, is a nonconvex problem

• Two classes of approaches for solving nonconvex problems: local and global

• Local methods: given some initial point x_0, repeatedly search "nearby" points until finding a (feasible) solution x̂ that is better than all of its nearby points

  – Typically the same approximate computational complexity as convex methods (often just use convex methods directly)

  – Can fail to find any feasible solution at all

• Global methods: find an actual optimal solution x* over the entire domain of feasible solutions

  – Typically exponential time complexity

• Both are used extremely frequently in practice, for different types of problems

Integer programming

• A class of nonconvex problems that we will look at in more detail later:

      minimize_x   c^T x
      subject to   Ax = b
                   Fx ≤ g
                   x ∈ Z^n

  with optimization variable x and problem data A, b, F, g as in linear programming

• A nonconvex problem (why?)

Solving optimization problems

• Starting with the unconstrained, one-dimensional case

[Figure: a one-dimensional function f(x) plotted against x]

• To find the minimum point x*, we can look at the derivative of the function, f′(x): any location where f′(x) = 0 will be a "flat" point of the function

• For convex problems, this is guaranteed to be a minimum (instead of a maximum)

• Generalization for a multivariate function f : R^n → R: the gradient of f must be zero,

      ∇_x f(x) = 0

• For f defined as above, the gradient is an n-dimensional vector containing the partial derivatives with respect to each dimension:

      (∇_x f(x))_i = ∂f(x) / ∂x_i

• Important: ∇_x f(x) = 0 is only a necessary condition for unconstrained optimality, not a sufficient one (for convex f it is also sufficient)

How do we find ∇_x f(x) = 0?

• Direct solution: analytically compute ∇_x f(x) and set the resulting expression to zero

  – Example: quadratic function

        f(x) = x^T Q x + r^T x   ⇒   ∇_x f(x) = 2Qx + r   ⇒   x* = −(1/2) Q^{-1} r

    for symmetric Q ∈ R^{n×n}, r ∈ R^n

  – Will be a minimum assuming Q is positive definite
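A quick numpy check of this closed form (a sketch with invented data; Q is built to be symmetric positive definite):

import numpy as np

rng = np.random.default_rng(4)
B = rng.normal(size=(3, 3))
Q = B @ B.T + np.eye(3)      # symmetric positive definite
r = rng.normal(size=3)

# Closed-form minimizer: x* = -(1/2) Q^{-1} r
x_star = -0.5 * np.linalg.solve(Q, r)

# The gradient 2 Q x + r vanishes at x*
print(np.allclose(2 * Q @ x_star + r, 0))   # True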

• Gradient descent: take small steps in the direction of the current (negative) gradient

  – Intuition: the negative gradient of a function always points "downhill" (in fact, for a multivariate function, downhill in the steepest possible direction)

  [Figure: a one-dimensional function f(x) with the derivative f′(x₀) indicated at a point x₀]

  – Repeat: x ← x − α ∇_x f(x), where α > 0 is a step size
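A minimal gradient descent sketch on an invented quadratic objective (not from the slides), compared against the closed form from the previous example:

import numpy as np

# minimize f(x) = x^T Q x + r^T x by gradient descent (invented data)
Q = np.array([[3.0, 0.5],
              [0.5, 1.0]])
r = np.array([-1.0, 2.0])

x = np.zeros(2)
alpha = 0.1                             # step size
for _ in range(500):
    x = x - alpha * (2 * Q @ x + r)     # x <- x - alpha * grad f(x)

print(x)                                # matches the closed form
print(-0.5 * np.linalg.solve(Q, r))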

• Newton's method: use a root-finding algorithm to find a solution to the (nonlinear) equation ∇_x f(x) = 0

  – Newton's method for root finding: given g : R → R, find g(x) = 0 by the iteration

        x ← x − g(x) / g′(x)

  – Multivariate generalization for finding ∇_x f(x) = 0:

        x ← x − (∇_x^2 f(x))^{-1} ∇_x f(x)

– Each iteration is more expensive than gradient descent, but can converge to numerical precision much faster
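A sketch of the scalar case (not from the slides), using Newton's method to solve f′(x) = 0 for the invented function f(x) = x⁴ − 3x:

# Newton's method on g(x) = f'(x) for f(x) = x**4 - 3*x (invented example)
g  = lambda x: 4 * x**3 - 3    # f'(x)
gp = lambda x: 12 * x**2       # f''(x)

x = 1.0                        # initial guess
for _ in range(20):
    x = x - g(x) / gp(x)       # x <- x - g(x) / g'(x)
print(x)   # -> (3/4)**(1/3) ≈ 0.9086, where f'(x) = 0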

Constrained optimization

• Now consider a problem with constraints:

      minimize_x   f(x)
      subject to   g_i(x) ≤ 0,   i = 1, …, m

• Projected gradient: if we can come up with an "easy" projection operator P(x) such that, for any x, P(x) returns the "closest" feasible point, iterate

      x ← P(x − α ∇_x f(x))

  – Example: C = {x : x ≥ 0},  P(x) = max{x, 0}  (elementwise)
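A minimal sketch of projected gradient descent for this nonnegativity example (objective, data, and step size invented for illustration):

import numpy as np

# minimize ||x - c||^2 subject to x >= 0, by projected gradient (invented data)
c = np.array([1.0, -2.0, 3.0])
grad = lambda x: 2 * (x - c)        # gradient of the objective
proj = lambda x: np.maximum(x, 0)   # projection onto C = {x : x >= 0}

x = np.zeros(3)
alpha = 0.1                         # step size
for _ in range(100):
    x = proj(x - alpha * grad(x))
print(x)   # -> [1, 0, 3], the projection of c onto the feasible set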

• Barrier method: approximate the problem via the unconstrained optimization

      minimize_x   f(x) − t ∑_{i=1}^m log(−g_i(x))

  as t → 0, this approaches the original problem

  – Can quickly lead to numerical instability; requires a lot of care to get right

  – Equality constraints need to be handled separately
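A rough sketch of the log-barrier idea on the same invented example (x ≥ 0 corresponds to g_i(x) = −x_i ≤ 0); a practical implementation would use Newton steps and line search rather than this fixed-step loop:

import numpy as np

# minimize ||x - c||^2 subject to x >= 0, i.e. g_i(x) = -x_i <= 0 (invented data)
c = np.array([1.0, -2.0, 3.0])

x = np.ones(3)                          # strictly feasible starting point
for t in [1.0, 0.5, 0.2, 0.1]:          # shrink the barrier weight
    # gradient of the barrier objective f(x) - t*sum(log(x_i)) is 2(x - c) - t/x
    for _ in range(2000):
        x = x - 0.01 * (2 * (x - c) - t / x)
        x = np.maximum(x, 1e-10)        # crude guard to stay in the domain
print(x)   # roughly [1, 0, 3]; the approximation tightens as t -> 0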

Practically solving optimization problems

• The good news: for many classes of optimization problems, people have already done all the "hard work" of developing numerical algorithms

• A wide range of tools that can take optimization problems in "natural" forms and compute a solution

• Some well-known libraries: CVX (MATLAB), YALMIP (MATLAB), AMPL (custom language), GAMS (custom language)

cvxpy

• We'll be using a relative newcomer to the game: cvxpy (http://github.com/cvxgrp/cvxpy)

• A Python library for specifying and solving convex optimization problems

• Very much "alpha" software, under active development

• Constrained Weber point, given a_i ∈ R^2, i = 1, …, m:

      minimize_x   ∑_{i=1}^m ‖x − a_i‖_2,   subject to x_1 + x_2 = 0

• cvxpy code:

import cvxpy as cp
import cvxopt

n = 2
m = 10
A = cvxopt.normal(n, m)   # random 2 x 10 matrix; column i is the point a_i

x = cp.Variable(n)
# Objective: sum of Euclidean distances from x to each point a_i
f = sum([cp.norm(x - A[:, i], 2) for i in range(m)])
# Constraint x_1 + x_2 = 0
constraints = [x[0] + x[1] == 0]
result = cp.Problem(cp.Minimize(f), constraints).solve()
print(x.value)

Take home points

• Optimization (and formulating AI tasks as optimization problems) has been one of the primary themes in AI of the past 15 years

• Convex optimization problems are a useful (though restricted) subset that can be solved efficiently and which still find a huge number of applications

• Many algorithms exist for solving optimization problems, but a lot of the "hard work" has been done by freely available libraries
