B1 Optimization – Solutions

A. Zisserman, Michaelmas Term 2018

1. The Rosenbrock function is

       f(x, y) = 100(y − x^2)^2 + (1 − x)^2

   (a) Compute the gradient and Hessian of f(x, y).
   (b) Show that f(x, y) has zero gradient at the point (1, 1).
   (c) By considering the Hessian matrix at (x, y) = (1, 1), show that this point is a minimum.

   (a) Gradient and Hessian:

       ∇f = [ 400x^3 − 400xy + 2x − 2 ;  200(y − x^2) ]

       H = [ 1200x^2 − 400y + 2 ,  −400x ;  −400x ,  200 ]

   (b) Gradient at the point (1, 1):

       ∇f = [ 400x^3 − 400xy + 2x − 2 ;  200(y − x^2) ] = [ 0 ; 0 ]

   (c) Hessian at the point (1, 1):

       H = [ 1200x^2 − 400y + 2 ,  −400x ;  −400x ,  200 ] = [ 802 , −400 ;  −400 , 200 ]

   Examine the eigenvalues:
   • det H is positive, so the eigenvalues have the same sign (thus not a saddle point)
   • trace H is positive, so the eigenvalues are positive
   • thus the point is a minimum
   • λ1 = 1001.6006, λ2 = 0.39936077

2. In Newton-type minimization schemes the update step is of the form δx = −H^(−1) g, where g = ∇f. By considering g · δx, compare the convergence of (a) Newton and (b) Gauss-Newton for a general function f(x) (i.e. where H may not be positive definite).

   A note on positive definite matrices: an n × n symmetric matrix M is positive definite if
   • x^T M x > 0 for all non-zero vectors x
   • all the eigenvalues of M are positive

   In each case consider df = g · δx. This should be negative for convergence, i.e. the step should be a descent direction.

   (a) Newton: g · δx = −g^T H^(−1) g. This can be positive if H is not positive definite.

   (b) Gauss-Newton: g · δx = −g^T (2 J^T J)^(−1) g. This is non-positive, since J^T J is positive definite.

3. Explain how you could use the Gauss-Newton method to solve a set of simultaneous non-linear equations.

   Square the non-linear equations and add them – the resulting cost is then a sum of squared residuals, and so has a structure suitable for the Gauss-Newton method. For example, the set of equations

       g1(x, y) = 0
       g2(x, y) = 0

   can be solved for x = (x, y) by the following optimization problem, which has the required sum-of-squares form:

       min_x f(x) = g1(x)^2 + g2(x)^2

4. Sketch the feasible regions defined by the following inequalities and comment on the possible optimal values.

   (a) −x1 + x2 ≥ 2,  x1 + x2 ≤ 1,  x1 ≥ 0,  x2 ≥ 0
   (b) 2x1 − x2 ≤ 2,  x1 ≤ 4,  x1 ≥ 0,  x2 ≥ 0

   (a) [Figure: the lines −x1 + x2 = 2 and x1 + x2 = 1 plotted in the (x1, x2) plane, together with the constraints −x1 + x2 ≥ 2, x1 + x2 ≤ 1, x1 ≥ 0, x2 ≥ 0.]

   There is no feasible region, and therefore no possible solutions.

   (b) [Figure: the lines 2x1 − x2 = 2 and x1 = 4 plotted in the (x1, x2) plane, together with the constraints 2x1 − x2 ≤ 2, x1 ≤ 4, x1 ≥ 0, x2 ≥ 0.]

   The feasible region is unbounded, but the optimum can still be bounded for some cost functions.

5. More on linear programming.

   (a) Show that the optimization

       min_x Σ_i |a_i^T x − b_i|

   where the vectors a_i and scalars b_i are given, can be formulated as a linear programming problem.

   (b) Solve the following linear programming problem using Matlab:

       max_{x1, x2} 40x1 + 88x2
       subject to 2x1 + 8x2 ≤ 60
                  5x1 + 2x2 ≤ 60
                  x1 ≥ 0, x2 ≥ 0

   (a) The absolute value operator |·| is not linear, so at first sight this does not look like a linear programming problem. However, it can be transformed into one by adding extra variables and constraints.

   Introduce additional variables α_i with the constraint, for each i, that |a_i^T x − b_i| ≤ α_i. This can be written as the two linear constraints

       a_i^T x − b_i ≤ α_i
       a_i^T x − b_i ≥ −α_i

   or equivalently

       a_i^T x − b_i ≤ α_i
       b_i − a_i^T x ≤ α_i

   Then the linear programming problem

       min_{x, α_i} Σ_i α_i

   subject to these constraints optimizes the original problem.
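   In Matlab this reformulation can be handed straight to linprog, the same routine used in part (b) below. The following is a minimal sketch only: the matrix A (whose rows are the a_i^T), the vector b, and the variable names are made-up illustrative values, not part of the problem.

       % Sketch of 5(a): min_x sum_i |a_i'x - b_i| as a linear program.
       % A and b are made-up example data (each row of A is one a_i').
       A = [1 2; 3 -1; 0 1; 2 2];
       b = [1; 2; -1; 3];
       [m, n] = size(A);

       % Stack the variables as z = [x; alpha] and minimise sum(alpha).
       f = [zeros(n,1); ones(m,1)];

       % Constraints:  a_i'x - b_i <= alpha_i   and   b_i - a_i'x <= alpha_i
       Aineq = [ A, -eye(m);
                -A, -eye(m)];
       bineq = [ b; -b];

       z = linprog(f, Aineq, bineq);
       x_l1 = z(1:n)            % minimiser of the L1 cost

   At the optimum each α_i is pressed down onto its lower bound, so the α_i equal the absolute residuals |a_i^T x − b_i| and the LP objective equals the original L1 cost.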
   (b) The problem is

       max_{x1, x2} 40x1 + 88x2
       subject to 2x1 + 8x2 ≤ 60
                  5x1 + 2x2 ≤ 60
                  x1 ≥ 0, x2 ≥ 0

   Matlab code:

       f = [-40; -88];           % negate the objective, since linprog minimises
       A = [2 8; 5 2];
       b = [60; 60];
       lb = zeros(2,1);
       x = linprog(f, A, b, [], [], lb);

   The solution is x1 = 10.0000, x2 = 5.0000.

6. Interior point method using a barrier function.

   Show that the following 1D problem

       minimize f(x) = x^2,  x ∈ R
       subject to x − 1 ≥ 0

   can be reformulated using a logarithmic barrier method as

       minimize x^2 − r log(x − 1)

   Determine the solution (as a function of r), and show that the global optimum is obtained as r → 0.

   We need to find the optimum of

       min_x B(x, r) = x^2 − r log(x − 1)

   Differentiating with respect to x gives

       2x − r/(x − 1) = 0

   and rearranging gives

       2x^2 − 2x − r = 0

   Using the standard formula for a quadratic equation,

       x = (−b ± sqrt(b^2 − 4ac)) / (2a)

   we obtain

       x = (2 ± sqrt(4 + 8r)) / 4 = 1/2 ± (1/2) sqrt(1 + 2r)

   and since only x > 1 is admissible (due to the log),

       x*(r) = 1/2 + (1/2) sqrt(1 + 2r)

   As r → 0, x*(r) → 1, which is the global optimum of the constrained problem.

7. Mean and median estimates. For a set of measurements {a_i}, show that

   (a) min_x Σ_i (x − a_i)^2 is solved by the mean of {a_i};
   (b) min_x Σ_i |x − a_i| is solved by the median of {a_i}.

   (a) To find the minimum of f(x) = Σ_{i=1..N} (x − a_i)^2, differentiate f(x) with respect to x and set the derivative to zero:

       df/dx = Σ_{i=1..N} 2(x − a_i) = 0

   Rearranging,

       N x = Σ_{i=1..N} a_i

   and so

       x = (1/N) Σ_{i=1..N} a_i

   i.e. the mean of {a_i}.

   (b) min_x f(x) = Σ_i |x − a_i|
   • The derivative of |x − a_i| with respect to x is +1 when x > a_i and −1 when x < a_i.
   • The derivative of f(x) is therefore zero when there are as many values of a_i less than x as there are greater than x.
   • Thus f(x) is minimized at the median of the values {a_i}.

   Note that the median is immune to changes in a_i that lie far from the median – the value of the cost function changes, but not the position of the median.

8. Determine in each case if the following functions are convex:

   (a) the sum of quadratic functions f(x) = a1(x − b1)^2 + a2(x − b2)^2, for a_i > 0
   (b) the piecewise linear function f(x) = max_{i=1,...,m} (a_i^T x + b_i)
   (c) f(x) = max{x, 1/x}
   (d) f(x) = ||Ax − b||^2

   (a) Expanding the two quadratics, the coefficient of x^2 is a1 + a2. Using the second derivative test for convexity, d^2 f / dx^2 ≥ 0, the sum is convex provided that a1 + a2 ≥ 0. So the function is convex, since a_i > 0.

   (b) This is convex. It is most easily seen by drawing the function and applying the geometric chord test for convexity. [Figure: f(x) as the upper envelope (pointwise maximum) of the lines a_i^T x + b_i, plotted against x.]

   (c) f(x) = max{x, 1/x} for x > 0. This is convex, again most easily seen by drawing the function and applying the geometric chord test for convexity. [Figure: plot of h(u) = max{u, 1/u} for 0 < u ≤ 4.]

   (d) f(x) = ||Ax − b||^2. Compute the Hessian:

       ∇f(x) = 2A^T (Ax − b) = 2A^T A x − 2A^T b
       H = 2A^T A

   Hence H is positive semi-definite (and positive definite when A has full column rank), and so the function is convex.
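   As an aside (not part of the problem sheet), the chord test used in (b) and (c) and the Hessian argument in (d) can be spot-checked numerically in Matlab; the random data below are purely illustrative.

       % Spot-check of 8(c): midpoint chord test for f(x) = max(x, 1/x), x > 0.
       f = @(x) max(x, 1./x);
       x1 = 4*rand(1, 100000) + 1e-3;        % random points in (0, 4]
       x2 = 4*rand(1, 100000) + 1e-3;
       violations = sum(f((x1 + x2)/2) > (f(x1) + f(x2))/2 + 1e-12);
       fprintf('chord-test violations for max(x,1/x): %d\n', violations);   % expect 0

       % Spot-check of 8(d): the Hessian 2A'A has no negative eigenvalues.
       A = randn(5, 3);
       H = 2*(A'*A);
       fprintf('smallest eigenvalue of 2A''A: %g\n', min(eig(H)));           % expect >= 0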
