INDIAN INSTITUTE OF TECHNOLOGY, BOMBAY
MA 106 Autumn 2012-13
Note 6

APPLICATIONS OF ROW REDUCTION

As we saw at the end of the last note, Gaussian reduction provides a very practical method for solving systems of linear equations. Its superiority can be seen, for example, by comparing it with other methods, e.g. Cramer's rule. This rule has very limited applicability, since it applies only when the number of equations equals the number of unknowns, i.e. when the coefficient matrix A in the system

Ax = b    (1)

is a square matrix (of order n, say). Even when this is so, the rule is silent about what happens if A is not invertible, i.e. when |A| = 0. And even when it does speak, the formula it gives for the solution, viz.

x_1 = △_1/△, x_2 = △_2/△, ..., x_n = △_n/△    (2)

(where △ = |A|, the determinant of A, and for i = 1, 2, ..., n, △_i is the determinant of the matrix obtained by replacing the i-th column of A by the column vector b), involves the computation of n + 1 determinants of order n each. This is a horrendous task even for small values of n.

With row reduction, we not only get a complete answer applicable regardless of the size of A, we get it very efficiently. The efficiency of an algorithm depends on the number of the various elementary operations, such as addition and multiplication, that have to be carried out in its execution. This number, in general, depends not only on the algorithm but also on the particular piece of data to which it is applied. For example, if A = I_n in (1), then hardly any arithmetic is needed. To avoid such false impressions of efficiency, there are two standard yardsticks. One is the pessimistic one, where we consider the worst case performance. The other is the practical one, where we consider the average performance, the average being taken either over all instances of data of a given size, or in some cases, over only those pieces of data which are more relevant in a particular application. (For example, in some applications the matrix A has only non-negative entries. Or in some applications it may be what is called a sparse matrix, i.e. a matrix with a large number of zero entries, such as the incidence matrix of a graph.) Questions of efficiency of an algorithm are considered in a branch of mathematics called numerical analysis. We bypass them except to comment that the number of multiplications needed in the execution of the Gaussian reduction applied to (1) is, in the worst case, of the order of k³, where k = max{m, n} is a measure of the size of the piece of data.

Instead, we turn to some theoretical applications of Gaussian reduction. Some of these are already familiar to us in the special cases where the size of A in (1) is 3 or less. For example, in Note 1, we used the determinant criterion to show the concurrency of the three altitudes of a triangle. The proofs we know are often by direct computation. We shall show how such results can be derived easily by Gaussian reduction.

Unless otherwise stated, throughout we shall assume that we are dealing with the system (1), where A is an m × n matrix, and that after Gaussian reduction it is transformed into

A∗x = b∗    (3)

where the matrix A∗ is in a row echelon form. To fix notations, we assume that only the first r rows of A∗ are non-zero, and further that for i = 1, 2, ..., r, the first non-zero entry (which is always 1) in the i-th row occurs in the j_i-th column, so that the integers j_1, j_2, ..., j_r are strictly increasing.
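To make the reduction concrete, here is a minimal sketch of Gaussian reduction to a row echelon form, written in Python with NumPy. The helper name row_echelon and the use of partial pivoting (swapping up the entry of largest absolute value, a standard numerical safeguard) are our own choices for illustration, not anything prescribed in this note.

    import numpy as np

    def row_echelon(M, tol=1e-12):
        """Reduce a copy of M to a row echelon form.

        Returns the reduced matrix and the list of pivot columns,
        i.e. the strictly increasing j_1, j_2, ..., j_r of the note.
        """
        A = M.astype(float)            # work on a fresh floating-point copy
        m, n = A.shape
        pivots = []
        row = 0
        for col in range(n):
            if row >= m:
                break
            # Partial pivoting: largest entry in this column at or below `row`.
            p = row + int(np.argmax(np.abs(A[row:, col])))
            if abs(A[p, col]) < tol:
                continue               # no pivot in this column
            A[[row, p]] = A[[p, row]]  # swap the two rows
            A[row] /= A[row, col]      # scale so that the pivot entry is 1
            for i in range(row + 1, m):
                A[i] -= A[i, col] * A[row]   # kill the entries below the pivot
            pivots.append(col)
            row += 1
        return A, pivots

Applying row_echelon to A produces A∗, and the length of the returned list of pivot columns is the number r of non-zero rows.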
(We recall that these are also the column numbers of the locations of the successive pivots in the Gaussian reduction. Note further that nothing is said about the vanishing or non-vanishing of the entries of the column vector b∗.) Since the matrix A and the column vector b are subjected to the same row operations throughout the Gaussian reduction, it is sometimes convenient to merge them into a single m × (n + 1) matrix, suggestively denoted by [A ⋮ b] and called the augmented matrix of (1), because it is obtained by appending, or 'augmenting', one more column to A. Similarly, [A∗ ⋮ b∗] is the augmented matrix of the reduced system A∗x = b∗.

We first observe that the rank of A∗ is r. To see this, note first that the r × r submatrix of A∗ formed by the first r rows and the j_1-th, j_2-th, ..., j_r-th columns of A∗ is an upper triangular matrix with all diagonal entries equal to 1, and hence has determinant 1. So r(A∗) is at least r. On the other hand, any r + 1 (or more) rows of A∗ include at least one zero row, and hence any submatrix of A∗ of order r + 1 (or more) contains at least one row with all entries 0. So A∗ cannot have an invertible submatrix of order r + 1 (or more). Hence r(A∗) is precisely r. This is hardly an achievement. But a non-trivial consequence is that the rank of the original matrix A is also r, because the rank is invariant under each elementary row operation, as is easy to check directly. We thus see that Gaussian elimination gives an efficient method to determine the rank of a matrix A.

If m = n, then A and A∗ are square matrices. The determinant of A∗ is the product of its diagonal entries. From this we can determine the determinant of A, provided we have kept track of which row operations we performed in reducing A to A∗.

What about the ranks of the augmented matrices [A ⋮ b] and [A∗ ⋮ b∗]? Again, they are equal to each other. Moreover, they will be either r or r + 1, and from the entries of the reduced column vector b∗ it is very easy to tell which possibility holds. Suppose the entries of b∗ in the last m − r rows are b∗_{r+1}, b∗_{r+2}, ..., b∗_m. If at least one of these numbers is non-zero, then the rank of the augmented matrix is r + 1. Otherwise it is r.

In terms of these two ranks, the criterion for the existence of a solution to (1) is simply that

r(A) = r([A ⋮ b])    (4)

This is proved easily because, for the reduced system A∗x = b∗, it is very obvious that a solution cannot exist if b∗_j is non-zero for some j > r, for in that case the j-th equation (in the reduced system) will be 0x_1 + 0x_2 + ... + 0x_n = b∗_j. Such a system is said to be inconsistent. On the other hand, if b∗_{r+1}, b∗_{r+2}, ..., b∗_m all vanish, then a solution does exist. In fact, we can get the entire solution set (or the 'solution space', as it is sometimes called) as follows. We assign arbitrary values to the n − r variables other than x_{j_1}, x_{j_2}, ..., x_{j_r}. Once these values are fixed, these r variables can be determined one by one, starting in the reverse order. (This is known as back substitution.)

The fact that n − r variables can be assigned arbitrary values is expressed technically by saying that the solution space has n − r degrees of freedom. Such a system is said to be under-determined, because there aren't enough equations to determine the unknowns uniquely. Note that since r ≤ m, a system with m < n, i.e. a system with fewer equations than unknowns, is always under-determined, with at least n − m degrees of freedom, provided, of course, that it is consistent. Thus we see that the number r plays a crucial role. It is also called the rank of the system (1).
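Continuing the sketch above (with the same caveat that the names are ours, not the note's), the consistency test (4) and back substitution can be expressed in a few lines. Setting the n − r free variables to 0 yields one particular solution; any other choice of values for them would do equally well.

    def solve_via_echelon(A, b):
        """One solution of Ax = b, or None if the system is inconsistent.

        Reuses row_echelon from the sketch above. Free variables are set
        to 0; the full solution set has n - r degrees of freedom.
        """
        m, n = A.shape
        aug, pivots = row_echelon(np.hstack([A, np.reshape(b, (-1, 1))]))
        if n in pivots:                # a pivot in the b-column means some
            return None                # b*_j != 0 with j > r: inconsistent
        x = np.zeros(n)                # free variables stay 0
        for i in reversed(range(len(pivots))):   # back substitution
            j = pivots[i]              # the pivot entry aug[i, j] is 1
            x[j] = aug[i, n] - aug[i, j + 1:n] @ x[j + 1:]
        return x

For instance, for the under-determined system x_1 + 2x_2 + 3x_3 = 6, 2x_1 + 4x_2 + 7x_3 = 13 (here m = 2, n = 3, r = 2), the call solve_via_echelon(np.array([[1., 2., 3.], [2., 4., 7.]]), np.array([6., 13.])) returns the particular solution (3, 0, 1), and the full solution set has one degree of freedom.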
(The rank of the augmented matrix is called the 'augmented rank' of the system.) Superficially, the system has m equations. But the rank tells us how many 'really different' equations there are, the understanding being that an equation which follows as a consequence of some other equations in the system is considered redundant and is not counted in the presence of those equations from which it follows. Gaussian elimination essentially amounts to eliminating such redundant equations. When we have reduced the system to these r non-redundant equations, each one of them reduces the degrees of freedom by one, and that is why the solution set, except when it is empty, has n − r degrees of freedom. Later we shall introduce the concept of linear dependence, which will give a precise meaning to what we have been describing loosely as 'redundant'.

All these cases can be illustrated by considering systems of linear equations in three variables, which we denote by x, y, z so as to make us feel at home. Then the solution sets are various subsets of the three-dimensional space R³. A single (non-degenerate) equation represents a plane, which has 2 degrees of freedom. A system of two such equations in general represents a straight line, viz. the line of intersection of the two planes represented by the two equations. Such a line has one degree of freedom. (That is why it can be parametrised by a single parameter.) Note that here n = 3 and r = m = 2. However, if the two planes are parallel, then even though m = 2, r is only 1. If the two planes are parallel but distinct, then the augmented rank is 2 and there is no solution, i.e. the system is inconsistent.
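The parallel-plane case can also be checked numerically. Here is a small demonstration, assuming the sketches above and NumPy's built-in matrix_rank (which estimates the rank via a singular value threshold rather than by row reduction):

    A = np.array([[1., 1., 1.],
                  [2., 2., 2.]])       # the parallel planes x + y + z = 1
    b = np.array([1., 5.])             # and 2x + 2y + 2z = 5

    print(np.linalg.matrix_rank(A))    # 1, i.e. r = 1 although m = 2
    print(np.linalg.matrix_rank(np.hstack([A, np.reshape(b, (-1, 1))])))  # 2
    print(solve_via_echelon(A, b))     # None: the system is inconsistent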