INDIAN INSTITUTE OF TECHNOLOGY, BOMBAY

MA 106 Autumn 2012-13

Note 6

APPLICATIONS OF ROW REDUCTION

As we saw at the end of the last note, Gaussian reduction provides a very practical method for solving systems of linear equations. Its superiority can be seen, for example, by comparing it with other methods, e.g. Cramer's rule. This rule has a very limited applicability, since it applies only when the number of equations equals the number of unknowns, i.e. when the coefficient matrix A in the system

Ax = b                                                      (1)

is a square matrix (of order n, say). Even when this is so, the rule is silent about what happens if A is not invertible, i.e. when |A| = 0. And even when it does speak, the formula it gives for the solution, viz.

x1 = △1/△,  x2 = △2/△,  ...,  xn = △n/△                     (2)

(where △ = |A|, the determinant of A, and for i = 1, 2, ..., n, △i is the determinant of the matrix obtained by replacing the i-th column of A by the column vector b) involves the computation of n + 1 determinants of order n each. This is a horrendous task even for small values of n. With row reduction, we not only get a complete answer applicable regardless of the size of A, we get it very efficiently. The efficiency of an algorithm depends on the number of various elementary operations such as addition and multiplication that have to be carried out in its execution. This number, in general, depends not only on the algorithm but also on the particular piece of data to which it is applied. For example, if A = In in (1) then hardly any arithmetic is needed. To avoid such false impressions of efficiency, there are two standard yardsticks. One is the pessimistic one, where we consider the worst case performance. The other is the practical one, where we consider the average performance, the average being taken either over all instances of data of a given size, or in some cases, over only those pieces of data which are more relevant in a particular application. (For example, in

some applications, the matrix A has only non-negative entries. Or in some applications it may be what is called a sparse matrix, i.e. a matrix with a large number of zero entries, such as the adjacency matrix of a graph.) Questions of efficiency of an algorithm are considered in a branch of mathematics called numerical analysis. We bypass them except to comment that the number of multiplications needed in the execution of the Gaussian reduction applied to (1) is, in the worst case, of the order of k³ where k = max{m, n} is a measure of the size of the piece of data. Instead, we turn to some theoretical applications of Gaussian reduction. Some of these are already familiar to us in the special cases where the size of A in (1) is 3 or less. For example, in Note 1, we used the determinant criterion to show the concurrency of the three altitudes of a triangle. The proofs we know are often by direct computation. We shall show how this can be derived easily by Gaussian reduction. Unless otherwise stated, throughout we shall assume that we are dealing with the system (1) where A is an m × n matrix and that after Gaussian reduction, it is transformed into

A∗x = b∗                                                    (3)

where the matrix A∗ is in row echelon form. To fix notations, we assume that only the first r rows of A∗ are non-zero and further that for i = 1, 2, ..., r, the first non-zero entry (which is always 1) in the i-th row occurs in the ji-th column, so that the integers j1, j2, ..., jr are strictly increasing. (We recall that these are also the column numbers of the locations of the successive pivots in the Gaussian reduction. Note further that nothing is said about the vanishing or non-vanishing of the entries in the column vector b∗.) Since the matrix A and the column vector b are subjected to the same row operations throughout the Gaussian reduction, it is sometimes convenient to merge them into a single m × (n + 1) matrix suggestively denoted by [A | b] and called the augmented matrix of (1), because it is obtained by appending or 'augmenting' one more column to A. Similarly, [A∗ | b∗] is the augmented matrix of the reduced system A∗x = b∗. We first observe that the rank of A∗ is r. To see this, note first that the r × r submatrix of A∗ formed by the first r rows and the j1-th, j2-th, ..., jr-th columns of A∗ is an upper triangular matrix with all diagonal entries 1 and hence determinant 1. So r(A∗) is at least r. On the other hand, any r + 1 (or more) rows of A∗ include at least one zero row, and hence so does any submatrix of A∗ formed from them. So, A∗ cannot have an invertible submatrix of order

r + 1 (or more). Hence r(A∗) is precisely r. This is hardly an achievement. But a non-trivial consequence is that the rank of the original matrix A is also r, because the rank is invariant under each elementary row operation, as is easy to check directly. We thus see that Gaussian elimination gives an efficient method to determine the rank of a matrix A. If m = n, then A and A∗ are square matrices. The determinant of A∗ is the product of its diagonal entries. From this we can determine the determinant of A if we have kept track of which row operations we did in reducing A to A∗. What about the ranks of the augmented matrices [A | b] and [A∗ | b∗]? Again, they are equal to each other. Moreover they will be either r or r + 1. From the entries of the reduced column vector b∗, it is very easy to tell which possibility holds. Suppose the entries in the last m − r rows are b∗_{r+1}, b∗_{r+2}, ..., b∗_m. If at least one of these numbers is non-zero then the rank of the augmented matrix is r + 1. Otherwise it is r. In terms of these two ranks, the criterion for the existence of a solution to (1) is simply that

r(A) = r([A | b])                                           (4)
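Before turning to the proof of this criterion, here is a minimal computational sketch of the reduction just described (it is not part of the original notes, uses plain Python lists, and takes no care about choosing numerically good pivots): it row reduces the augmented matrix [A | b], reads off r = r(A), and tests criterion (4).

def row_reduce_augmented(A, b, tol=1e-12):
    """Row echelon form of [A | b], together with r(A) and r([A | b])."""
    m, n = len(A), len(A[0])
    M = [row[:] + [bi] for row, bi in zip(A, b)]        # augmented matrix, m x (n+1)
    pivot_row = 0
    for col in range(n):                                # only columns of A carry pivots
        pivot = next((r for r in range(pivot_row, m) if abs(M[r][col]) > tol), None)
        if pivot is None:
            continue                                    # no pivot in this column
        M[pivot_row], M[pivot] = M[pivot], M[pivot_row]
        piv = M[pivot_row][col]
        M[pivot_row] = [x / piv for x in M[pivot_row]]  # make the pivot equal to 1
        for r in range(pivot_row + 1, m):               # clear the entries below it
            factor = M[r][col]
            M[r] = [x - factor * y for x, y in zip(M[r], M[pivot_row])]
        pivot_row += 1
    rank_A = pivot_row
    # the augmented rank exceeds r(A) exactly when some entry of b* below row r is non-zero
    rank_aug = rank_A + int(any(abs(M[r][n]) > tol for r in range(rank_A, m)))
    return M, rank_A, rank_aug

A = [[1.0, 2.0, 3.0],
     [2.0, 4.0, 7.0]]
b = [6.0, 13.0]
_, r, r_aug = row_reduce_augmented(A, b)
print(r, r_aug)     # 2 2: the system is consistent, with n - r = 1 degree of freedom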

Criterion (4) is proved easily because for the reduced system A∗x = b∗, it is very obvious that a solution cannot exist if b∗_j is non-zero for some j > r, for in that case the j-th equation (in the reduced system) will be 0x1 + 0x2 + ... + 0xn = b∗_j. Such a system is said to be inconsistent. On the other hand, if b∗_{r+1}, b∗_{r+2}, ..., b∗_m all vanish then a solution does exist. In fact, we can get the entire solution set (or the 'solution space' as it is sometimes called) as follows.

We assign arbitrary values to the n − r variables other than x_{j1}, x_{j2}, ..., x_{jr}. Once these values are fixed, these r variables can be determined one-by-one starting in the reverse order. (This is known as back substitution.) The fact that n − r variables could be assigned arbitrary values is expressed technically by saying that the solution space has n − r degrees of freedom. Such a system is said to be under-determined because there aren't enough equations to determine the unknowns uniquely. Note that since r ≤ m, a system with m < n, i.e. a system with fewer equations than unknowns, is always under-determined with at least n − m degrees of freedom, provided, of course, that it is consistent. Thus we see that the number r plays a crucial role. It is also called the rank of the system (1). (The rank of the augmented matrix is called the 'augmented rank' of the system.) Superficially, the system has m equations.

But the rank tells how many 'really different' equations there are, the understanding being that an equation which follows as a consequence of some other equations in the system will be considered as redundant and will not be counted in the presence of those equations from which it follows. Gaussian elimination essentially amounts to eliminating such redundant equations. When we have reduced the system to these r non-redundant equations, each one of them reduces the degrees of freedom by one, and that is why the solution set, except when it is empty, has n − r degrees of freedom. Later we shall introduce the concept of linear dependence, which will give a precise meaning to what we have been describing loosely as redundant. All these cases can be illustrated by considering systems of linear equations in three variables, which we denote by x, y, z so as to make us feel at home. Then the solution sets are various subsets of the three dimensional space IR³. A single (non-degenerate) equation represents a plane, which has 2 degrees of freedom. A system of two such equations represents a straight line in general, viz. the line of intersection of the two planes represented by the two equations. Such a line has one degree of freedom. (That is why it can be parametrised by a single parameter.) Note that here n = 3 and r = m = 2. However, if the two planes are parallel then even though m = 2, r is only 1. If the two planes are parallel (but distinct) then the augmented rank is 2 and there is no solution, i.e. the system is inconsistent. But when the two planes are identical, both the rank and the augmented rank equal 1 and the solution set is this common plane, which has two degrees of freedom. If we have a system of three linear equations in x, y, z, the solution set will be either empty, or a single point, or a line, or a plane (or even the entire IR³ when all three equations are degenerate). Which possibility holds when can be ascertained from the rank of the coefficient matrix and that of the augmented matrix. The criterion for the concurrency of three lines in a plane (which was used in Note 1 to prove the concurrency of the three altitudes of a triangle) can also be derived from the theory of linear equations. Suppose that the three lines are given by the equations a1x + b1y = c1, a2x + b2y = c2 and a3x + b3y = c3. Here the matrix A is a 3 × 2 matrix. So its rank is at most 2. So, for a solution to exist the augmented matrix cannot have rank 3. Hence the 3 × 3 determinant

    | a1  b1  c1 |
    | a2  b2  c2 |
    | a3  b3  c3 |

must vanish. However, this is only a necessary condition. It is also sufficient if A has rank 2. But in degenerate

cases the determinant may vanish and still there is no solution, e.g. when the equations represent three parallel lines. Such degeneracies usually do not arise where the criterion is applied. (For example, the altitudes of a triangle can never be parallel to each other.) If the column vector b in (1) is the zero vector, the system is said to be homogeneous. Note that in this case the rank is the same as the augmented rank and so a solution always exists. Of course, this is also obvious directly, since there is always the trivial solution in which every xi vanishes. The crucial question is whether the system has a non-trivial solution. Let r be the rank of the matrix A. The solution space will have n − r degrees of freedom. So, for a non-trivial solution of the homogeneous system, r must be less than n. Especially interesting is the case when m = n. In this case A is a square matrix. For a non-trivial solution, the rank must be less than n, which is equivalent to saying that the determinant of A must vanish. We record this as a theorem.

Theorem 1: A homogeneous system Ax = 0 of n equations in n unknowns has a non-trivial solution if and only if det(A) = 0.

This theorem is often used as an elimination technique. Suppose, for example, that we want to find the equation of a circle passing through three given (non-collinear) points (xi, yi), i = 1, 2, 3 in the plane. The desired equation is of the form A(x² + y²) + 2Gx + 2Fy + C = 0, where A, G, F, C are some constants not all of which are 0. As the three points lie on this, we get

A(x1² + y1²) + 2Gx1 + 2Fy1 + C = 0                          (5)
A(x2² + y2²) + 2Gx2 + 2Fy2 + C = 0                          (6)
A(x3² + y3²) + 2Gx3 + 2Fy3 + C = 0                          (7)

These three equations along with the equation A(x² + y²) + 2Gx + 2Fy + C = 0 of the circle give a homogeneous system of four linear equations in the four unknowns A, G, F and C. Since they are not all zero, the system has a non-trivial solution and so its determinant vanishes. As a result we get the equation of the desired circle as

    | x² + y²     2x    2y   1 |
    | x1² + y1²   2x1   2y1  1 |
    | x2² + y2²   2x2   2y2  1 |  =  0                      (8)
    | x3² + y3²   2x3   2y3  1 |
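As a quick illustration of (8), here is a small sketch which is not part of the original notes; it assumes the sympy library is available and uses three arbitrarily chosen points.

import sympy as sp

x, y = sp.symbols('x y')
points = [(0, 0), (1, 0), (0, 2)]          # three non-collinear points, chosen for illustration

rows = [[x**2 + y**2, 2*x, 2*y, 1]]
rows += [[px**2 + py**2, 2*px, 2*py, 1] for px, py in points]

circle = sp.Matrix(rows).det()             # expanding the determinant in (8)
print(sp.factor(circle))                   # a non-zero multiple of x**2 + y**2 - x - 2*y

Another application of Theorem 1 is that it gives a characterisation of eigenvalues, which is often taken as the definition.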

Theorem 2: Let A be a square matrix. Then a number λ is an eigenvalue of A if and only if there exists a non-zero column vector v such that Av = λv.

Proof: The condition is equivalent to saying that the system Ax = λx has a non-trivial solution. If we rewrite this system as (A − λI)x = 0, then it is a homogeneous system and so by Theorem 1, a non-trivial solution exists if and only if |A−λI| = 0. But that amounts to saying that λ is a characteristic root, i.e. an eigenvalue of the matrix A.
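As a small numerical check of this equivalence (not from the notes; the numpy library is assumed to be available), one can compare the roots of det(A − λI) = 0 with the eigenvalues reported by the library for a 2 × 2 example.

import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# characteristic polynomial of a 2 x 2 matrix: lambda^2 - (trace A) lambda + det A
coeffs = [1.0, -np.trace(A), np.linalg.det(A)]
print(np.roots(coeffs))           # the characteristic roots, 3 and 1
print(np.linalg.eigvals(A))       # the same numbers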

Let us not forget that all these results originate from Gaussian reduction. As our final application of Gaussian elimination, we show how it can be used to efficiently find the inverse of an n × n matrix A. (The method based on determinants is exceedingly inefficient.) By Problem 6 of Tutorial 2, it suffices to find the right inverse (if any) of A. Let X = (xij) be this (unknown) inverse. Then we have

AX = In (9)

Equating each of the n² entries on both the sides, this can be viewed as a system of n² linear equations in the n² unknowns xij, 1 ≤ i ≤ n, 1 ≤ j ≤ n. But solving such a system is very laborious. Instead, let us look at (9) as a collection of n systems of linear equations in n unknowns each. For j = 1, 2, ..., n, let xj be the j-th column of the (unknown) matrix X, i.e. xj is the column vector [x1j x2j ... xnj]^t. Let ej be the column vector of length n in which the j-th entry is 1 and all other entries are 0. In other words, ej is the j-th column of the identity matrix In. So, (9) is equivalent to a collection of n systems of n linear equations in n unknowns, viz.

Axj = ej, j =1, 2,...,n (10)

Note that for different values of j the unknowns appearing in the j-th system of equations are totally different. But the coefficient matrix is always the same, viz. A. So, instead of solving these n systems one-by-one we can take advantage of this fact and save considerable duplication of work. We apply row reduction to A and simultaneously to In. If the row reduction

ends with an identically zero row, that means the rank of A is less than n and so A is not invertible. Otherwise the row reduction will terminate with an equality like

A∗X = B                                                     (11)

where A∗ is an upper triangular matrix with all diagonal entries equal to 1. From this point on we can apply back substitution to find each column of X. But there is a better way. In the Gaussian reduction we subtracted suitable multiples of the pivotal row from all rows below it. But we can as well do the same for all rows above it too, so that the entries below as well as above the pivot all vanish. If we do this to A, then instead of (11), (9) will reduce to

InX = C (12)

But this means C is precisely the inverse of A. Thus with repeated row reduction, we have explicitly obtained the inverse of A. (Actually, we have done a little more, viz. along the way we have found the rank of A too, which, in a way, is a measure of how close a (square) matrix is to invertibility.) This modification of Gaussian reduction, where we subtract multiples of the pivotal row from all remaining rows (so that all entries in the pivotal column except the pivot itself are reduced to 0), is called the Gauss-Jordan reduction. It can be applied to any system. Its advantage is that it gives the solution of the system directly, because the work done in back substitution is already done in the additional pivoting. However, it turns out that sometimes the extra work is not worth the gain. As a result, the overall efficiency of the Gauss-Jordan reduction is not as high as that of the Gaussian reduction. Still, for some problems, such as finding the inverse of a matrix, it is more attractive because it gives the answer explicitly. For a numerical illustration of finding the inverse by Gauss-Jordan reduction, see Example 1 on page 352 of the text. (A short computational sketch of the same procedure is also given at the end of this note.) Row reduction can also be used to prove some theoretical results. For example, see the proof of Theorem 4 on p. 356 in the text where it is proved that the determinant of a product is the product of the determinants. However, we do not follow this approach because we have already used this property before talking of row reduction. This just shows that there are various lines of approach to theorems about matrices. As we shall now see, some results

about matrices can be derived more elegantly by studying what are called vector spaces. Matrices are computational tools to handle problems in vector spaces just as coordinates are computational tools to handle problems in pure geometry. But this relationship is not a one-way affair. In the case of coordinates, too, even though they are often a more reliable means than the tricky proofs in pure geometry, occasionally we come across problems where pure geometry proofs are better. For example, unlike the equations of the altitudes of a triangle, those of the angle bisectors are messy to write down. And, therefore, proving their concurrency through coordinates is fairly cumbersome. But a pure geometry proof can be given directly by characterising the angle bisector as the set of points equidistant from the arms of the angle.
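As promised earlier in this note, here is a minimal sketch of the Gauss-Jordan inversion procedure just described (it is not part of the original text, uses plain Python, and pays no attention to numerically good pivot choices). For illustration it inverts the matrix of Q.3 of Tutorial 3.

def gauss_jordan_inverse(A, tol=1e-12):
    n = len(A)
    M = [list(map(float, row)) for row in A]
    C = [[float(i == j) for j in range(n)] for i in range(n)]    # starts as I_n
    for col in range(n):
        # pick a row at or below 'col' with a non-zero entry in this column
        pivot = next((r for r in range(col, n) if abs(M[r][col]) > tol), None)
        if pivot is None:
            raise ValueError("matrix is not invertible (rank < n)")
        M[col], M[pivot] = M[pivot], M[col]
        C[col], C[pivot] = C[pivot], C[col]
        piv = M[col][col]
        M[col] = [x / piv for x in M[col]]                       # make the pivot 1
        C[col] = [x / piv for x in C[col]]
        for r in range(n):                                       # clear the pivotal column,
            if r == col:                                         # above as well as below
                continue
            factor = M[r][col]
            M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
            C[r] = [x - factor * y for x, y in zip(C[r], C[col])]
    return C                                                     # M is now I_n, so C = A^(-1)

print(gauss_jordan_inverse([[1, 3, -2], [2, 5, -7], [0, 1, -4]]))   # prints the inverse row by row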

INDIAN INSTITUTE OF TECHNOLOGY, BOMBAY

MA 106

Note 7

REAL LIFE APPLICATIONS OF LINEAR EQUATIONS

Because of Gaussian reduction, systems of linear equations can be solved completely and efficiently. This is not the case with systems of equations of other kinds. Even a single polynomial equation of degree 5 or more cannot be solved by a formula. So we have to go for methods such as the Newton-Raphson method to get approximate solutions. Similarly, for differential equations we often have to settle for approximate solutions, and that too, valid only in a neighbourhood of a given point. It is a fact of life that the really important problems defy any easy solutions. So, one is led to believe that a simple tool such as solving systems of linear equations will have only superficial real life applications. But this is not quite so. It is true that the applications of linear equations to real life are not very profound mathematically. But in terms of their diversity, they are truly impressive. This happens because many things we encounter in real life, such as yields and costs, are linear in nature. Even when they are not, they generally have first order linear approximations. Indeed this is often the basis for methods for approximate solutions of non-linear equations. It is impossible here to give even a glimpse of the variety of applications of systems of linear equations. Broadly they can be classified into four categories.

1. Direct applications. Here the data of the problem can be easily translated in terms of a system of linear equations and the problem simply asks for the values of the unknowns which satisfy the given equations.

2. Indirect applications. Here it takes some imagination to paraphrase the problem in terms of systems of linear equations. Also, often the problem asks about the nature of the solutions rather than the actual solutions.

3. Optimisation of quadratic functions. Optimisation is a common term (popular in the corporate culture) which means either maximisation or minimisation (depending on the context) of a real valued function of several variables. Usually this optimisation is to be done subject to some constraints to be satisfied by the variables. The set of points where these constraints are satisfied is called the feasible set of the particular optimisation problem and the function to be optimised is called the objective function. So, in essence the problem is to maximise (or minimise) a given real valued function defined with the feasible set as its domain. If the function is differentiable, methods from calculus can be used. This requires solving a system of equations obtained by setting certain derivatives equal to 0. If the function is a quadratic in its arguments, then the derivatives are linear functions and the problem then reduces to solving a system of linear equations.

4. Optimisation of linear functions. When the objective function of an optimisation problem is linear, calculus based methods are of little help since the (partial) derivatives are constants. In this case it can be shown that the optimum solution (in case it exists) can always be found on the boundary of the feasible set. If the equations which define the boundary of the feasible set are all linear then the feasible set is what is called a convex polytope. (Convex polygons and polyhedra are convex polytopes of dimensions 2 and 3 respectively.) In such cases the search for the optimum point can further be narrowed down to the corners, formally called the vertices, of the polytope. Identifying the vertices requires solving systems of linear equations. Problems of this type are studied in what is called linear programming, the peculiar name coming from the early applications to schedule the flights of airlines so as to optimise the use of the available resources.

We briefly illustrate each of these types. Our intention is not so much to solve these problems here but to indicate how systems of linear equations arise in them. Most of the examples are from the text or tutorials. As an example of a direct application of systems of linear equations, we have electrical networks containing only resistances. Because of Ohm's law, the current through a resistance is a linear function of the voltage drop across it. This is the essence of the applicability of linear equations. (For circuits involving inductors or capacitors, the equations involved are differential equations, which are generally more complicated to solve.) In a network containing only (known) resistances and power sources of known voltages,

one can apply Kirchhoff's current law and Kirchhoff's voltage law to get a system of linear equations in which the unknowns are the currents in the various parts of the network. The details are illustrated in Example 2 on p. 324 of the text. (See also Problems 17 to 19 on p. 330 for more examples and Problem 21 for a traffic flow network.) As an example where it is not so obvious how to paraphrase the given problem in terms of systems of linear equations, we consider the Stones Problem (Exercise 10 of Tutorial 2). Let the weights of the stones be x1, x2, ..., x11. Do the weighings with a balance, and call its pans A and B. In any weighing, assign xi the coefficient 1 or −1 or 0 depending upon whether the i-th stone is placed in pan A, in pan B or in neither. Then the condition of balancing is equivalent to saying that the signed sum of the xi's is 0. When the i-th stone is excluded and the remaining ten grouped into two piles of five each, we get an equation of the form

α1x1 + α2x2 + ... + αixi + ... + α11x11 = 0

in which αi = 0 and, out of the remaining 10 α's, five equal 1 each and the remaining five equal −1 each. We cannot say from the data which ones are 1 and which ones are −1. But in any case, the 11 such equations give rise to a homogeneous system of 11 linear equations in 11 unknowns, whose coefficient matrix, say M, is an 11 × 11 matrix with diagonal entries 0 and with all row sums equal to 0. Obviously one solution of this system is when all the xi's are equal. The problem amounts to showing that there are no other solutions. This would follow if we can show that the rank of M is 10. Exercise 8 of Tutorial 2 is relatively easier to pose as a problem about systems of linear equations. However, the problem does not ask you to solve the system but to show that it has no solution. This can be done by calculating the rank and the augmented rank of the system. As an example of how systems of linear equations arise in quadratic optimisation, suppose we conduct an experiment to decide how some variable quantity y depends on some other variable quantity x. We take readings with n values of x, say x = x1, x2, ..., xn, and record the corresponding values of y as, say, y = y1, y2, ..., yn respectively. Lagrange's interpolation formula (Exercise 11 of Tutorial 3) will give us a polynomial function, say y = f(x), whose graph will contain all the data points (xi, yi), i = 1, 2, ..., n. But usually this polynomial is fairly complicated as its degree could be as high as

n − 1. Instead we would like to find a polynomial function of degree one, say

y = mx + c                                                  (1)

which contains all these n data points approximately. The approximation would be exact if and only if these data points happen to lie on the graph of (1). Otherwise we need some measure of how much they deviate. Experience shows that a good measure is provided by taking the sum of the squares of the differences between the exact values yi and the approximate values mxi + c. That is, we define the error of approximation as

e(m, c) = Σ_{i=1}^{n} (yi − mxi − c)²                       (2)

which is a quadratic function of the two real parameters m and c. (There is some good reason to consider (yi − mxi − c)² instead of |yi − mxi − c|, which would appear a more direct measure of the deviation, but we shall not go into it.) Here there are no constraints on the variables m and c. Hence the point (m, c) varies all over the plane. So the minimum occurs at an interior point. This point can be obtained by setting the partial derivatives of (2) equal to 0. This gives a system of two linear equations in the two unknowns m and c, viz.

(Σ_{i=1}^{n} xi) m + n c = Σ_{i=1}^{n} yi                   (3)

and  (Σ_{i=1}^{n} xi²) m + (Σ_{i=1}^{n} xi) c = Σ_{i=1}^{n} xi yi        (4)

These equations are called the normal equations of the problem. Note that this problem is qualitatively different from an interpolation problem where the goal is to find a function which fits the data exactly. The method just given is called the method of least squares. For a numerical illustration, see p. 915 of the text. Instead of using a linear polynomial to fit the data approximately, we can use a polynomial of higher degree. In that case, the normal equations will form a system of linear equations in more than two variables (specifically, one more than the degree of the polynomial).
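The normal equations are easily set up and solved on a machine. A minimal sketch follows; it is not part of the notes, it assumes the numpy library is available, and the data points are made up purely for illustration.

import numpy as np

xs = np.array([0.0, 1.0, 2.0, 3.0])
ys = np.array([0.1, 0.9, 2.1, 2.9])
n = len(xs)

# coefficient matrix and right hand side of the normal equations (3)-(4)
N = np.array([[xs.sum(),      n       ],
              [(xs**2).sum(), xs.sum()]])
rhs = np.array([ys.sum(), (xs * ys).sum()])

m, c = np.linalg.solve(N, rhs)
print(m, c)                    # slope close to 1, intercept close to 0
print(np.polyfit(xs, ys, 1))   # the library's own least squares fit gives the same line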

Finally, a prototype of a linear programming problem is given by Exercise 9 of Tutorial 2. As all requirements and nutritional contents are in multiples of 100, we take 100 units as one unit. A diet consisting of x1 kg. of cereal A and x2 kg. of cereal B will meet the daily requirements if and only if

5x1 +3x2 ≥ 3 (1)

and x1 +4x2 ≥ 2 (2)

By practical considerations, x1, x2 have to be non-negative. So the solution set of these four inequalities is the subset, say D of the (x1, x2)-plane defined by

D = {(x1, x2) ∈ IR² : x1 ≥ 0, x2 ≥ 0, 5x1 + 3x2 ≥ 3, x1 + 4x2 ≥ 2}        (3)

(The set D is sketched in the accompanying figure, with the points A = (0, 1), B = (6/17, 7/17) and C = (2, 0) marked on it. These inequalities are the constraints, the objective function is the cost of the diet, and the set D is the feasible set. If D = ∅, the problem would be infeasible. In the present case, D is an unbounded set bounded by portions of the lines x2 = 0, x1 = 0, x1 + 4x2 = 2 and 5x1 + 3x2 = 3.) The cost of the diet is 55x1 + 30x2 = f(x1, x2) (say), which is the objective function, to be minimised on the set D. f has no critical points and so the minimum must occur at a point on the boundary of D, which consists of two segments AB and BC and two rays, where A = (0, 1), B = (6/17, 7/17) and C = (2, 0). On each of these parts f can be expressed as a function of one variable and so the minimum, if any, must occur at an end-point. Since f(x1, x2) → ∞ as x1, x2 → ∞, the minimum of f must occur at A, B or C. A direct computation shows that it occurs at A = (0, 1). So, the diet which consists of 1 kg of the cereal B is the cheapest diet. The problem cannot be solved by solving linear equations, at least in the obvious way. It is tempting to think that in the cheapest diet there is no wastage and the dietary requirements are barely satisfied. So, replacing the inequalities (1) and (2) by equalities, we get x1 = 6/17, x2 = 7/17, which corresponds to the point B. But that does not give the cheapest diet.
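The conclusion can also be checked with a ready-made linear programming routine. A small sketch follows, assuming the scipy library is available (it is not used in the notes); linprog minimises c·x subject to A_ub·x ≤ b_ub, so the '≥' constraints (1) and (2) are first multiplied by −1.

from scipy.optimize import linprog

res = linprog(c=[55, 30],
              A_ub=[[-5, -3],      # -(5x1 + 3x2) <= -3, i.e. 5x1 + 3x2 >= 3
                    [-1, -4]],     # -(x1 + 4x2)  <= -2, i.e. x1 + 4x2 >= 2
              b_ub=[-3, -2],
              bounds=[(0, None), (0, None)])
print(res.x, res.fun)              # approximately (0, 1) with cost 30, i.e. the vertex A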

To solve the problem without actually sketching the feasible set (which is impracticable when the number of variables exceeds 2), we introduce certain other variables, called the slack variables, so called because they take up the slack, i.e. the difference of the two sides of the constraint inequalities (1) and (2). Thus we let

x3 = 5x1 +3x2 − 3 (4)

and x4 = x1 +4x2 − 2 (5)

The constraints (1) and (2) now simply mean that these new variables are non-negative. Note that any two of the variables x1, x2, x3, x4 determine the other two. We can now paraphrase the problem replacing the feasible set D by the set, say E, consisting of those points (x1, x2, x3, x4) ∈ IR⁴ for which x1, x2, x3, x4 are all non-negative and (4) and (5) hold. This set is an elevated replica of the set D because its vertical projection on the (x1, x2)-plane gives the set D. The advantage in E is that the inequality constraints are replaced by equality constraints, except, of course, the inequality constraints imposed by the non-negativity of the four variables. At every vertex of E at least two of these four variables vanish. Two vertices are adjacent if and only if the vanishing variables at them are the same except for one. There is a well-known algorithm called the simplex method in which we begin by identifying some vertex of E. We then identify all vertices that are adjacent to it and move to one of them at which the objective function has a smaller value. We repeat this process till we reach a vertex for which the objective function has a higher value at every adjacent vertex. This vertex gives an optimal solution. The details are beyond our scope, except that the transition from one vertex to an adjacent one is carried out by row reduction.
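The observation that at least two of x1, x2, x3, x4 vanish at every vertex of E already gives a crude, brute force way to locate the optimum for a problem of this small size: set each pair of variables to zero, solve the resulting 2 × 2 system coming from (4) and (5), discard infeasible solutions and compare costs. A minimal sketch of this idea (not from the notes; the numpy library is assumed):

from itertools import combinations
import numpy as np

# constraints (4) and (5) rewritten as 5x1 + 3x2 - x3 = 3 and x1 + 4x2 - x4 = 2
A = np.array([[5.0, 3.0, -1.0, 0.0],
              [1.0, 4.0, 0.0, -1.0]])
b = np.array([3.0, 2.0])
cost = np.array([55.0, 30.0, 0.0, 0.0])        # the slack variables cost nothing

best = None
for zeros in combinations(range(4), 2):        # choose which two variables vanish
    free = [j for j in range(4) if j not in zeros]
    sub = A[:, free]
    if abs(np.linalg.det(sub)) < 1e-12:
        continue                               # this pair does not determine a vertex
    x = np.zeros(4)
    x[free] = np.linalg.solve(sub, b)
    if (x >= -1e-9).all():                     # feasible, i.e. an actual vertex of E
        value = cost @ x
        if best is None or value < best[0]:
            best = (value, x)

print(best)                                    # cost 30 at (x1, x2, x3, x4) = (0, 1, 0, 2)

The simplex method avoids examining all such pairs by moving only between adjacent vertices, but the small computation done at each step is of this same kind, carried out by row reduction.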

INDIAN INSTITUTE OF TECHNOLOGY, BOMBAY

MA 106 Autumn 2012-13

Tutorial 3

PART A

Q.1 A Vandermonde determinant of order n is a determinant whose (i, j)-th entry is αi^(j−1), where α1, α2, ..., αn are some fixed numbers. In full form it is the determinant, say,

    V(α1, α2, ..., αn) =
        | 1   α1   α1²   ...   α1^(n−1) |
        | 1   α2   α2²   ...   α2^(n−1) |
        | .   .    .     ...   .        |
        | 1   αn   αn²   ...   αn^(n−1) |

Prove that this determinant equals ∏_{1≤i<j≤n} (αj − αi). [Hint: Starting ...]

Q.2 Prove, both with and without the last exercise, that V(α1, α2, ..., αn) ≠ 0 if and only if the α's are all distinct. (It is interesting that in most applications of the Vandermonde determinant, only this fact and not the actual value of the determinant is needed. For one such application see Problem 11.)

Q.3 Find the inverse of the following matrix using elementary row operations:

    [ 1   3   -2 ]
    [ 2   5   -7 ]
    [ 0   1   -4 ]

Also verify the answer by calculating the inverse in terms of cofactors.

Q.4 Compute the last row of the inverse of the following matrix:

    [  1    1    1    0 ]
    [ -3  -17    1    2 ]
    [  4  -17    8   -5 ]
    [  0   -5   -2    1 ]

Q.5 The n-th Hilbert matrix Hn is the n × n matrix whose (i, j)-th entry is 1/(i + j − 1). Find H3^(-1) by GJEM. (There is a complicated formula for Hn^(-1) for every n. As the inverses are known already but are not obvious, these matrices are used as test matrices to check the accuracy of an algorithm which claims to give the inverse of a given matrix.)

Q.6 Consider the system of two homogeneous equations:

    a1x + b1y + c1z = 0
    a2x + b2y + c2z = 0

If at least one of the numbers a1b2 − a2b1, b1c2 − b2c1 and c1a2 − c2a1 is non-zero, prove that every solution of the system is of the form x = t(b1c2 − b2c1), y = t(c1a2 − c2a1), z = t(a1b2 − a2b1) for some t ∈ IR. What happens when all these three numbers are 0?

Q.7 Consider the following linear equations

    ax + by + cz = 0
    bx + cy + az = 0
    cx + ay + bz = 0

Column I lists some conditions on a, b, c while Column II lists some statements about what the equations represent. Match each condition with its implication(s). (JEE 2007)

Column I                                        Column II

(A) a + b + c ≠ 0 and                           (p) the equations represent planes
    a² + b² + c² = ab + bc + ca                     meeting only at a single point

(B) a + b + c = 0 and                           (q) the equations represent the line
    a² + b² + c² ≠ ab + bc + ca                     x = y = z

(C) a + b + c ≠ 0 and                           (r) the equations represent
    a² + b² + c² ≠ ab + bc + ca                     identical planes

(D) a + b + c = 0 and                           (s) the equations represent the whole
    a² + b² + c² = ab + bc + ca                     of the three dimensional space

Q.8 Prove that a determinant vanishes if and only if some non-trivial linear combination of its rows is identically zero. Prove a similar result about its columns.

Q.9 Prove that 1 is an eigenvalue of every Markov matrix and hence also of its transpose.

Q.10 In a certain community every person has one of the three religions, A, B, C. During each time interval of a certain duration, a certain proportion of people of each religion converts itself to some other religion as given in the following table.

              To A    To B    To C
    From A    0.8     0.1     0.1
    From B    0.1     0.7     0.2
    From C    0.0     0.1     0.9

(Thus, the second row of the table means that 10% of the people of religion B convert themselves to A, 20% convert to C and the remaining 70% remain in B.) Find a steady state distribution of religions for that community, i.e. find percentages of the three religions in the community which will remain unchanged after conversions during one time interval (and hence perpetually if there are no external forces). [Hint: Note that the proportions in each row add to 1.]

Q.11 (Lagrange's interpolation) Given a set of distinct real numbers α0, α1, ..., αn and another set of (not necessarily distinct) real numbers β0, β1, ..., βn, prove that there exists a unique polynomial p(x) of degree n or less such that p(αi) = βi for all i = 0, 1, ..., n. (This result gives at least one logical answer to the popular quiz problems where, given a list of a few numbers, you are asked to find the next one. Of course it may not be the best or the intended answer.)

Q.12 Suppose we are given four data points (0, 0), (1, 1), (2, 0) and (−1, −1) in the plane. Find the cubic polynomial which fits the data exactly. Also find the quadratic polynomial which best fits the data.

Q.13 Suppose Ax = 0 is a homogeneous system of linear equations which has a solution where the variables x's can take possibly complex values.

Prove that if the matrix A has only real entries, then a solution exists in which all x's are real and that a similar result holds if the entries of the matrix A are all rational. (To appreciate the result, note that a similar assertion may fail for non-linear equations.) More generally, if the system has k linearly independent solutions, then we can find k linearly independent solutions with all real (rational) entries.

Q.14 Reduce the Stones Problem to the case where all the weights are rational and thence to integers. Then solve it by considering their parity.

PART B

Q.15 Can you calculate the determinant of Hn by induction on n?

Q.16 An n × n Cauchy matrix is defined as any matrix Cn whose (i, j)-th entry is 1/(xi + yj), where x1, x2, ..., xn, y1, y2, ..., yn are any fixed but arbitrary real numbers (the only restriction being that the denominators never vanish). The Hilbert matrix is a very special case obtained by taking xi = i − 1 and yj = j. Obtain the determinant of Cn as follows. First subtract the first column of Cn from each of the remaining ones to get a matrix, say Dn. Then for i = 1, 2, ..., n, the i-th row of Dn has 1/(xi + y1) as a common factor, while for j = 2, 3, ..., n, the j-th column has (y1 − yj) as a common factor. Take out these common factors and subtract the first row from the remaining ones to get a matrix, say En, in which the first column has 1 at the top and all other entries 0, while for i, j ≥ 2, the i-th row has a factor x1 − xi and the j-th column has a factor 1/(x1 + yj). If these factors are taken out, then the minor M11 is a Cauchy matrix of order n − 1 formed from the numbers x2, x3, ..., xn, y2, y3, ..., yn. Now expand w.r.t. the first column so as to relate the determinant of Cn with that of another Cauchy matrix of order n − 1. The final answer ought to come out as

    [ ∏_{1≤i<j≤n} (xj − xi)(yj − yi) ] / [ ∏_{1≤i,j≤n} (xi + yj) ]

Determine the determinant of the Hilbert matrix as a special case and show that it is always invertible. (This is an excellent example of the

paradoxical situation where it is easier to do a more general problem than a special case. How do you explain the paradox?)
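Though not part of the tutorial, the stated formula is easy to sanity check numerically for a small case. The sketch below assumes the sympy library is available; the numbers xi and yj are arbitrary.

from sympy import Rational, Matrix

x = [0, 1, 3]          # the x_i, chosen arbitrarily
y = [1, 2, 5]          # the y_j, chosen so that no x_i + y_j vanishes
n = 3

C = Matrix(n, n, lambda i, j: Rational(1, x[i] + y[j]))

num = 1                # product of (x_j - x_i)(y_j - y_i) over i < j
for i in range(n):
    for j in range(i + 1, n):
        num *= (x[j] - x[i]) * (y[j] - y[i])
den = 1                # product of (x_i + y_j) over all i, j
for i in range(n):
    for j in range(n):
        den *= x[i] + y[j]

print(C.det() == Rational(num, den))    # True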
