B1 Optimization – Solutions

A. Zisserman, Michaelmas Term 2018

1. The Rosenbrock function is $f(x, y) = 100(y - x^2)^2 + (1 - x)^2$.
(a) Compute the gradient and Hessian of $f(x, y)$.
(b) Show that $f(x, y)$ has zero gradient at the point (1, 1).
(c) By considering the Hessian at $(x, y) = (1, 1)$, show that this point is a minimum.

(a) Gradient and Hessian
\[
\nabla f = \begin{pmatrix} 400x^3 - 400xy + 2x - 2 \\ 200(y - x^2) \end{pmatrix}, \qquad
H = \begin{pmatrix} 1200x^2 - 400y + 2 & -400x \\ -400x & 200 \end{pmatrix}
\]

(b) Gradient at the point (1, 1)
\[
\nabla f = \begin{pmatrix} 400x^3 - 400xy + 2x - 2 \\ 200(y - x^2) \end{pmatrix}
= \begin{pmatrix} 0 \\ 0 \end{pmatrix}
\]

(c) Hessian at the point (1, 1)
\[
H = \begin{pmatrix} 1200x^2 - 400y + 2 & -400x \\ -400x & 200 \end{pmatrix}
= \begin{pmatrix} 802 & -400 \\ -400 & 200 \end{pmatrix}
\]
Examine the eigenvalues:
• det H is positive, so the eigenvalues have the same sign (thus not a saddle point)
• trace H is positive, so the eigenvalues are positive
• Thus the point is a minimum

• λ1 = 1001.6006, λ2 = 0.39936077
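
A quick Matlab check of these values:

% Evaluate the derived gradient and Hessian of the Rosenbrock function at (1,1)
x = 1; y = 1;
g = [400*x^3 - 400*x*y + 2*x - 2;
     200*(y - x^2)];
H = [1200*x^2 - 400*y + 2, -400*x;
     -400*x,               200];
disp(g')        % [0 0] : stationary point
disp(eig(H)')   % both eigenvalues positive : minimum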

2. In Newton-type minimization schemes the update step is of the form $\delta x = -H^{-1} g$, where $g = \nabla f$. By considering $g \cdot \delta x$, compare convergence of (a) Newton, to (b) Gauss-Newton, for a general function $f(x)$ (i.e. where $H$ may not be positive definite).

A note on positive definite matrices

An $n \times n$ symmetric matrix $M$ is positive definite if
• $x^T M x > 0$ for all non-zero vectors $x$
• all the eigenvalues of $M$ are positive

In each case consider $df = g \cdot \delta x$, the first-order change in $f$. This should be negative for the step to decrease $f$.
(a) Newton

\[
g \cdot \delta x = -g^T H^{-1} g
\]
This can be positive if $H$ is not positive definite, so the Newton step may be an ascent direction.
(b) Gauss-Newton
\[
g \cdot \delta x = -g^T (2 J^T J)^{-1} g
\]
This is always negative (for $g \neq 0$), since $J^T J$ is positive definite.
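
To make this concrete, here is a small Matlab sketch on a toy function of my own choosing, $f(x, y) = x^2 - y^2$, whose Hessian is indefinite:

% Toy example with indefinite Hessian: f(x,y) = x^2 - y^2
g = @(x) [2*x(1); -2*x(2)];          % gradient
H = [2 0; 0 -2];                     % indefinite Hessian (eigenvalues 2, -2)
x0 = [1; 2];
dx = -H \ g(x0);
disp(g(x0)' * dx)                    % +6 > 0 : the Newton step is an ascent direction
% With a positive definite surrogate (as 2*J'*J is in Gauss-Newton):
Hpd = [2 0; 0 2];
dx = -Hpd \ g(x0);
disp(g(x0)' * dx)                    % -10 < 0 : a guaranteed descent direction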

3. Explain how you could use the Gauss-Newton method to solve a set of simultaneous non-linear equations.

Square the non-linear equations and add them – the resulting cost is a sum of squared residuals, and so has a structure suitable for the Gauss-Newton method. For example, the set of equations:

g1(x, y) = 0

g2(x, y) = 0
can be solved for $\mathbf{x} = (x, y)$ by the following, which has the required sum-of-squares form:
\[
\min_{\mathbf{x}} f(\mathbf{x}) = g_1(\mathbf{x})^2 + g_2(\mathbf{x})^2
\]
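
At the solution both residuals are zero, so the minimum of $f$ is exactly a simultaneous root. A minimal Gauss-Newton iteration in Matlab; the pair of equations here (a circle and an exponential curve) is an illustrative example, not from the question:

% Gauss-Newton for g1(x,y) = x^2 + y^2 - 4 = 0 and g2(x,y) = exp(x) - y = 0
g = @(x) [x(1)^2 + x(2)^2 - 4;       % residual vector r
          exp(x(1)) - x(2)];
J = @(x) [2*x(1),    2*x(2);         % Jacobian of r
          exp(x(1)), -1];
x = [1; 1];                          % initial guess
for k = 1:20
    r  = g(x);
    dx = -(J(x)'*J(x)) \ (J(x)'*r);  % Gauss-Newton step on f = r'*r
    x  = x + dx;
    if norm(dx) < 1e-10, break; end
end
disp(x')                             % the intersection of the two curves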

4. Sketch the feasible regions defined by the following inequalities and comment on the possible optimal values.
(a)

−x1 + x2 ≥ 2

x1 + x2 ≤ 1

x1 ≥ 0

x2 ≥ 0
(b)

2x1 − x2 ≤ 2

x1 ≤ 4

x1 ≥ 0

x2 ≥ 0


[Figure: feasible region for (a), showing the lines −x1 + x2 = 2 and x1 + x2 = 1. The constraints −x1 + x2 ≥ 2, x1 + x2 ≤ 1, x1 ≥ 0, x2 ≥ 0 have no common region.]

There is no feasible region, and therefore no possible solutions.
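
This can be confirmed with linprog, which reports that no feasible point exists (the zero cost vector is arbitrary, since only feasibility is tested):

% Feasibility test for (a): rewrite -x1 + x2 >= 2 as x1 - x2 <= -2
f  = [0; 0];                        % any cost will do; we only test feasibility
A  = [1 -1;                         % x1 - x2 <= -2
      1  1];                        % x1 + x2 <= 1
b  = [-2; 1];
lb = zeros(2,1);
[x, fval, exitflag] = linprog(f, A, b, [], [], lb);
disp(exitflag)                      % -2 : no feasible point found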

[Figure: feasible region for (b), lying above the line 2x1 − x2 = 2 and to the left of x1 = 4, with x1 ≥ 0, x2 ≥ 0; an example cost line x0 = 6x1 − 2x2 is also drawn.]

The feasible region is unbounded, but the optimum can still be bounded for some cost functions.
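
For instance, the example cost $x_0 = 6x_1 - 2x_2$ drawn in the figure attains a bounded maximum, while maximizing $x_2$ alone does not. A quick linprog check (linprog minimizes, so the objectives are negated):

% Feasible region for (b): 2x1 - x2 <= 2, x1 <= 4, x1 >= 0, x2 >= 0
A  = [2 -1;
      1  0];
b  = [2; 4];
lb = zeros(2,1);
x = linprog([-6; 2], A, b, [], [], lb)      % max 6x1 - 2x2 : x = [4; 6], value 12
[~, ~, exitflag] = linprog([0; -1], A, b, [], [], lb);
disp(exitflag)                              % -3 : maximizing x2 alone is unbounded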

5. More on linear programming.
(a) Show that the optimization
\[
\min_{\mathbf{x}} \sum_i |a_i^T \mathbf{x} - b_i|
\]

where the vectors $a_i$ and scalars $b_i$ are given, can be formulated as a linear programming problem.
(b) Solve the following linear programming problem using Matlab:

\[
\max_{x_1, x_2} \; 40x_1 + 88x_2
\]
subject to

2x1 + 8x2 ≤ 60

5x1 + 2x2 ≤ 60

x1 ≥ 0

x2 ≥ 0

(a) The absolute value operator $|\cdot|$ is not linear, so at first sight this does not look like a linear programming problem. However, it can be transformed into one by adding extra variables and constraints. Introduce additional variables $\alpha_i$ with the constraints, for each $i$, that $|a_i^T \mathbf{x} - b_i| \le \alpha_i$. This can be written as the two linear constraints:

\[
a_i^T \mathbf{x} - b_i \le \alpha_i, \qquad a_i^T \mathbf{x} - b_i \ge -\alpha_i
\]
or equivalently
\[
a_i^T \mathbf{x} - b_i \le \alpha_i, \qquad b_i - a_i^T \mathbf{x} \le \alpha_i
\]
Then the linear programming problem
\[
\min_{\mathbf{x}, \alpha_i} \sum_i \alpha_i
\]
subject to these constraints, optimizes the original problem.
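
A sketch of this construction in Matlab, using made-up data A (with rows $a_i^T$) and b; the decision vector stacks $\mathbf{x}$ and the auxiliary variables $\alpha$:

% L1 fitting as a linear program: minimize sum_i alpha_i over z = [x; alpha]
m = 20; n = 3;
A = randn(m, n);  b = randn(m, 1);   % example data (assumption)
f = [zeros(n,1); ones(m,1)];         % cost: sum of the alphas
Ain = [ A, -eye(m);                  %  A*x - b <= alpha
       -A, -eye(m)];                 %  b - A*x <= alpha
bin = [b; -b];
z = linprog(f, Ain, bin);
x = z(1:n)                           % the L1 estimate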

(b)

\[
\max_{x_1, x_2} \; 40x_1 + 88x_2
\]
subject to

2x1 + 8x2 ≤ 60

5x1 + 2x2 ≤ 60

x1 ≥ 0

x2 ≥ 0

Matlab code:

f = [-40; -88];
A = [ 2 8;
      5 2 ];
b = [ 60; 60 ];
lb = zeros(2,1);
x = linprog(f,A,b,[],[],lb);

The solution is x_1 = 10.0000, x_2 = 5.0000 (objective value 840).

6. Interior point method using a logarithmic barrier. Show that the following 1D problem
\[
\text{minimize } f(x) = x^2, \quad x \in \mathbb{R}, \quad \text{subject to } x - 1 \ge 0
\]
can be reformulated using a logarithmic barrier method as
\[
\text{minimize } x^2 - r \log(x - 1)
\]
Determine the solution (as a function of $r$), and show that the global optimum is obtained as $r \to 0$.

We need to find the optimum of
\[
\min_x B(x, r) = x^2 - r \log(x - 1)
\]
Differentiating wrt $x$ gives
\[
2x - \frac{r}{x - 1} = 0
\]
and rearranging gives
\[
2x^2 - 2x - r = 0
\]
Use the standard formula for a quadratic equation
\[
x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}
\]
to obtain
\[
x = \frac{2 \pm \sqrt{4 + 8r}}{4} = \frac{1}{2} \pm \frac{1}{2}\sqrt{1 + 2r}
\]
and since only $x > 1$ is admissible (due to the log)
\[
x^*(r) = \frac{1}{2} + \frac{1}{2}\sqrt{1 + 2r}
\]
and as $r \to 0$, $x \to 1$, which is the global optimum.
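
A quick numerical check that the barrier minimizer approaches the constrained optimum as $r$ shrinks:

% Follow the barrier path x*(r) = 1/2 + (1/2)*sqrt(1 + 2r) as r -> 0
for r = [1 0.1 0.01 0.001]
    xstar = 0.5 + 0.5*sqrt(1 + 2*r);   % closed-form minimizer derived above
    fprintf('r = %6.3f   x*(r) = %.6f\n', r, xstar);
end
% x*(r) -> 1, the global optimum of the constrained problem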

7. Mean and median estimates. For a set of measurements $\{a_i\}$, show that the minimizer of
(a)
\[
\min_x \sum_i (x - a_i)^2
\]
is the mean of $\{a_i\}$, and the minimizer of
(b)
\[
\min_x \sum_i |x - a_i|
\]
is the median of $\{a_i\}$.

(a)
\[
\min_x \sum_{i}^{N} (x - a_i)^2
\]
To find the minimum, differentiate $f(x)$ wrt $x$, and set to zero:
\[
\frac{df(x)}{dx} = \sum_{i}^{N} 2(x - a_i) = 0
\]
and rearranging
\[
\sum_{i}^{N} x = \sum_{i}^{N} a_i
\]
and so
\[
x = \frac{1}{N} \sum_{i}^{N} a_i
\]
i.e. the mean of $\{a_i\}$.

(b)
\[
\min_x f(x) = \sum_i |x - a_i|
\]
• The derivative of $|x - a_i|$ wrt $x$ is $+1$ when $x > a_i$ and $-1$ when $x < a_i$.
• The derivative of $f(x)$ is zero when there are as many values of $a_i$ less than $x$ as there are greater than $x$.
• Thus $f(x)$ is minimized at the median of the values of $\{a_i\}$.

Note, the median is immune to changes in ai that lie far from the median – the value of the cost function changes, but not the position of the median.
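
This is easy to demonstrate numerically (the data here are an arbitrary example):

% Moving one measurement far away shifts the mean but not the median
a = [1 2 3 4 5];
fprintf('mean = %.2f, median = %.2f\n', mean(a), median(a));   % 3.00, 3.00
a(end) = 500;                                                  % one extreme outlier
fprintf('mean = %.2f, median = %.2f\n', mean(a), median(a));   % 102.00, 3.00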

8. Determine in each case if the following functions are convex:
(a) the sum of quadratic functions $f(x) = a_1(x - b_1)^2 + a_2(x - b_2)^2$, for $a_i > 0$
(b) the piecewise linear function $f(\mathbf{x}) = \max_{i=1,\dots,m} (a_i^T \mathbf{x} + b_i)$
(c) $f(x) = \max\{x, 1/x\}$
(d) $f(\mathbf{x}) = \|A\mathbf{x} - \mathbf{b}\|^2$

(a) The sum of quadratic functions $f(x) = a_1(x - b_1)^2 + a_2(x - b_2)^2$, for $a_i > 0$.
Expanding the two quadratics, the coefficient of $x^2$ is $a_1 + a_2$. Using the second derivative test for convexity,
\[
\frac{d^2 f}{dx^2} \ge 0,
\]
the sum is convex provided that $a_1 + a_2 \ge 0$. So the function is convex since $a_i > 0$.

(b) The piecewise linear function $f(\mathbf{x}) = \max_{i=1,\dots,m} (a_i^T \mathbf{x} + b_i)$.
This is convex. Most easily seen by drawing and applying the geometric chord test for convexity.

[Figure: the lines $a_i^T \mathbf{x} + b_i$ and their upper envelope $f(\mathbf{x})$, which is convex.]

(c) $f(x) = \max\{x, 1/x\}$ for $x > 0$.
This is convex. Most easily seen by drawing and applying the geometric chord test for convexity.

[Figure: plot of $\max\{u, 1/u\}$ for $0 < u \le 4$, a convex function with minimum at $u = 1$.]

(d) $f(\mathbf{x}) = \|A\mathbf{x} - \mathbf{b}\|^2$.
Compute the Hessian:
\[
\nabla f(\mathbf{x}) = 2A^T (A\mathbf{x} - \mathbf{b}) = 2A^T A \mathbf{x} - 2A^T \mathbf{b}, \qquad H = 2A^T A
\]
Hence $H$ is positive semi-definite, since $\mathbf{x}^T A^T A \mathbf{x} = \|A\mathbf{x}\|^2 \ge 0$ (and positive definite when $A$ has full column rank), so the function is convex.
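
A numerical spot-check with a random matrix (an arbitrary example):

% The Hessian 2*A'*A has no negative eigenvalues, confirming convexity
A = randn(5, 3);
H = 2*(A'*A);
disp(min(eig(H)))    % >= 0 up to round-off; > 0 here since A has full column rank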
