B1 Optimization – Solutions
A. Zisserman, Michaelmas Term 2018

1. The Rosenbrock function is

   f(x, y) = 100(y − x^2)^2 + (1 − x)^2

   (a) Compute the gradient and Hessian of f(x, y).
   (b) Show that f(x, y) has zero gradient at the point (1, 1).
   (c) By considering the Hessian matrix at (x, y) = (1, 1), show that this point is a minimum.

(a) Gradient and Hessian:

   \nabla f = \begin{pmatrix} 400x^3 - 400xy + 2x - 2 \\ 200(y - x^2) \end{pmatrix}, \quad
   H = \begin{pmatrix} 1200x^2 - 400y + 2 & -400x \\ -400x & 200 \end{pmatrix}

(b) Gradient at the point (1, 1):

   \nabla f = \begin{pmatrix} 400x^3 - 400xy + 2x - 2 \\ 200(y - x^2) \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}

(c) Hessian at the point (1, 1):

   H = \begin{pmatrix} 1200x^2 - 400y + 2 & -400x \\ -400x & 200 \end{pmatrix} = \begin{pmatrix} 802 & -400 \\ -400 & 200 \end{pmatrix}

Examine the eigenvalues:
• det H is positive, so the eigenvalues have the same sign (thus not a saddle point)
• trace H is positive, so the eigenvalues are positive
• Thus the point is a minimum
• λ1 = 1001.6006, λ2 = 0.39936077

(A Matlab check of these values is sketched after question 5(a) below.)

2. In Newton-type minimization schemes the update step is of the form δx = −H^{-1} g, where g = \nabla f. By considering g · δx, compare the convergence of (a) Newton and (b) Gauss-Newton for a general function f(x) (i.e. where H may not be positive definite).

A note on positive definite matrices: an n × n symmetric matrix M is positive definite if
• x^T M x > 0 for all non-zero vectors x
• all the eigenvalues of M are positive

In each case consider df = g · δx. This should be negative for convergence.

(a) Newton: g · δx = −g^T H^{-1} g. This can be positive if H is not positive definite.

(b) Gauss-Newton: g · δx = −g^T (2 J^T J)^{-1} g. This is non-positive, since J^T J is positive definite.

3. Explain how you could use the Gauss-Newton method to solve a set of simultaneous non-linear equations.

Square the non-linear equations and add them – the resulting cost is then a sum of squared residuals, and so has a structure suitable for the Gauss-Newton method. For example, the set of equations

   g_1(x, y) = 0,   g_2(x, y) = 0

can be solved for x = (x, y) by the following optimization problem, which has the required sum-of-squares form:

   \min_x f(x) = g_1(x)^2 + g_2(x)^2

(An illustrative Gauss-Newton iteration in Matlab is sketched after question 5(a) below.)

4. Sketch the feasible regions defined by the following inequalities and comment on the possible optimal values.

(a) −x_1 + x_2 ≥ 2,   x_1 + x_2 ≤ 1,   x_1 ≥ 0,   x_2 ≥ 0

(b) 2x_1 − x_2 ≤ 2,   x_1 ≤ 4,   x_1 ≥ 0,   x_2 ≥ 0

(a) [Figure: the lines −x_1 + x_2 = 2 and x_1 + x_2 = 1 plotted in the first quadrant; the half-planes −x_1 + x_2 ≥ 2 and x_1 + x_2 ≤ 1 do not overlap there.]

There is no feasible region, and therefore no possible solutions.

(b) [Figure: the lines 2x_1 − x_2 = 2 and x_1 = 4 plotted in the first quadrant, together with a cost line x_0 = 6x_1 − 2x_2; the region satisfying 2x_1 − x_2 ≤ 2, x_1 ≤ 4, x_1 ≥ 0, x_2 ≥ 0 extends upwards without bound.]

The feasible region is unbounded, but the optimum can still be bounded for some cost functions.

5. More on linear programming.

(a) Show that the optimization

   \min_x \sum_i |a_i^T x - b_i|

where the vectors a_i and scalars b_i are given, can be formulated as a linear programming problem.

(b) Solve the following linear programming problem using Matlab:

   \max_{x_1, x_2} 40x_1 + 88x_2
   subject to 2x_1 + 8x_2 ≤ 60,   5x_1 + 2x_2 ≤ 60,   x_1 ≥ 0,   x_2 ≥ 0

(a) The absolute value operator |·| is not linear, so at first sight this does not look like a linear programming problem. However, it can be transformed into one by adding extra variables and constraints.

Introduce additional variables α_i with the constraint, for each i, that |a_i^T x − b_i| ≤ α_i. This can be written as the two linear constraints

   a_i^T x - b_i ≤ α_i
   a_i^T x - b_i ≥ -α_i

or equivalently

   a_i^T x - b_i ≤ α_i
   b_i - a_i^T x ≤ α_i

Then the linear programming problem

   \min_{x, α_i} \sum_i α_i

subject to these constraints, optimizes the original problem.
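A numerical check of question 1 (this sketch is not part of the original solutions; it simply evaluates the formulae derived above). It evaluates the gradient and Hessian of the Rosenbrock function at (1, 1) and confirms that the Hessian eigenvalues are approximately 1001.6006 and 0.39936.

Matlab code (illustrative sketch):

   % Gradient and Hessian of the Rosenbrock function, as derived in question 1(a)
   grad = @(x,y) [400*x^3 - 400*x*y + 2*x - 2; 200*(y - x^2)];
   hess = @(x,y) [1200*x^2 - 400*y + 2, -400*x; -400*x, 200];
   g = grad(1,1)        % expected: [0; 0]
   H = hess(1,1)        % expected: [802 -400; -400 200]
   lambda = eig(H)      % expected: both positive, approx 0.39936 and 1001.6006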
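As an illustration of the Gauss-Newton construction in question 3 (again a sketch, using an invented pair of equations that are not from the problem sheet: g_1(x, y) = x^2 + y^2 − 4 = 0 and g_2(x, y) = x − y = 0), a few Gauss-Newton iterations on the sum-of-squares cost g_1^2 + g_2^2 drive both residuals to zero.

Matlab code (illustrative sketch, invented example):

   % Residual vector and its Jacobian for the invented equations
   r = @(v) [v(1)^2 + v(2)^2 - 4; v(1) - v(2)];
   J = @(v) [2*v(1), 2*v(2); 1, -1];
   v = [2; 1];                                   % starting point
   for k = 1:10
       v = v - (J(v)'*J(v)) \ (J(v)'*r(v));      % Gauss-Newton step
   end
   v                                             % converges to [sqrt(2); sqrt(2)]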
(b) \max_{x_1, x_2} 40x_1 + 88x_2, subject to 2x_1 + 8x_2 ≤ 60, 5x_1 + 2x_2 ≤ 60, x_1 ≥ 0, x_2 ≥ 0.

Matlab code:

   f = [-40; -88];
   A = [2 8; 5 2];
   b = [60; 60];
   lb = zeros(2,1);
   x = linprog(f, A, b, [], [], lb);

The solution is x_1 = 10.0000, x_2 = 5.0000.

6. Interior point method using a barrier function. Show that the following 1D problem

   minimize f(x) = x^2,  x ∈ R,  subject to x − 1 ≥ 0

can be reformulated using a logarithmic barrier method as

   minimize x^2 − r log(x − 1)

Determine the solution (as a function of r), and show that the global optimum is obtained as r → 0.

We need to find the optimum of

   \min_x B(x, r) = x^2 - r \log(x - 1)

Differentiating with respect to x gives

   2x - \frac{r}{x - 1} = 0

and rearranging gives

   2x^2 - 2x - r = 0

Use the standard formula for a quadratic equation

   x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}

to obtain

   x = \frac{2 \pm \sqrt{4 + 8r}}{4} = \frac{1}{2} \pm \frac{1}{2}\sqrt{1 + 2r}

and since only x > 1 is admissible (due to the log)

   x^*(r) = \frac{1}{2} + \frac{1}{2}\sqrt{1 + 2r}

and as r → 0, x → 1, which is the global optimum. (A numerical check of this is sketched at the end of these solutions.)

7. Mean and median estimates. For a set of measurements {a_i}, show that

(a) \min_x \sum_i (x - a_i)^2 is minimized by the mean of {a_i};

(b) \min_x \sum_i |x - a_i| is minimized by the median of {a_i}.

(a)

   \min_x \sum_i^N (x - a_i)^2

To find the minimum, differentiate f(x) with respect to x and set to zero:

   \frac{df(x)}{dx} = \sum_i^N 2(x - a_i) = 0

Rearranging gives

   \sum_i^N x = \sum_i^N a_i

and so

   x = \frac{1}{N} \sum_i^N a_i

i.e. the mean of {a_i}.

(b)

   \min_x f(x) = \sum_i |x - a_i|

• The derivative of |x − a_i| with respect to x is +1 when x > a_i and −1 when x < a_i.
• The derivative of f(x) is zero when there are as many values of a_i less than x as there are greater than x.
• Thus f(x) is minimized at the median of the values {a_i}.

Note that the median is immune to changes in values a_i that lie far from the median – the value of the cost function changes, but not the position of the median. (A small Matlab illustration of this is sketched at the end of these solutions.)

8. Determine in each case if the following functions are convex:

(a) The sum of quadratic functions f(x) = a_1(x − b_1)^2 + a_2(x − b_2)^2, for a_i > 0
(b) The piecewise linear function f(x) = \max_{i=1,...,m} (a_i^T x + b_i)
(c) f(x) = max{x, 1/x}
(d) f(x) = ||Ax − b||^2

(a) The sum of quadratic functions f(x) = a_1(x − b_1)^2 + a_2(x − b_2)^2, for a_i > 0.

Expanding the two quadratics, the coefficient of x^2 is a_1 + a_2. Using the second derivative test for convexity,

   \frac{d^2 f}{dx^2} ≥ 0,

the sum is convex provided that a_1 + a_2 ≥ 0. So the function is convex since a_i > 0.

(b) The piecewise linear function f(x) = \max_{i=1,...,m} (a_i^T x + b_i).

This is convex, most easily seen by drawing the function and applying the geometric chord test for convexity.

[Figure: a piecewise linear convex function f(x) formed as the pointwise maximum of several affine functions a_i^T x + b_i.]

(c) f(x) = max{x, 1/x} for x > 0.

This is convex, most easily seen by drawing the function and applying the geometric chord test for convexity.

[Figure: plot of h(u) = max{u, 1/u} for 0 < u ≤ 4.]

(d) f(x) = ||Ax − b||^2.

Compute the Hessian:

   \nabla f(x) = 2A^T(Ax - b) = 2A^T A x - 2A^T b
   H = 2A^T A

Hence H is positive semi-definite (and positive definite when A has full column rank), and so the function is convex.
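A numerical check of question 6 (a sketch, not part of the original solutions): minimize the barrier function B(x, r) = x^2 − r log(x − 1) for a sequence of decreasing r and compare with the closed-form minimizer x^*(r) = 1/2 + (1/2)√(1 + 2r) derived above; both approach 1 as r → 0.

Matlab code (illustrative sketch):

   for r = [1, 0.1, 0.01, 0.001]
       B  = @(x) x.^2 - r*log(x - 1);     % barrier function (defined for x > 1)
       xn = fminbnd(B, 1 + 1e-12, 10);    % numerical minimizer over (1, 10]
       xc = 0.5 + 0.5*sqrt(1 + 2*r);      % closed-form x*(r)
       fprintf('r = %6.3f  numeric = %.6f  closed form = %.6f\n', r, xn, xc);
   end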
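A small illustration of the remark at the end of question 7(b) that the median is insensitive to values lying far from it (the data below are invented for illustration): moving an outlier changes the mean but leaves the median unchanged.

Matlab code (illustrative sketch, invented data):

   a = [1 2 3 4 100];                                            % one value far from the rest
   fprintf('mean = %.1f, median = %.1f\n', mean(a), median(a));  % mean = 22.0, median = 3.0
   a(end) = 1e6;                                                 % move the outlier much further away
   fprintf('mean = %.1f, median = %.1f\n', mean(a), median(a));  % mean changes, median is still 3.0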