Principles of Scientific Computing I, Theory and Conditioning

Jonathan Goodman, last revised February 9, 2006

1 Introduction

Linear algebra and calculus are the basic tools of quantitative science. Chapter ?? is about how to carry out the operations of calculus, differentiation and integration. Here we discuss the operations of linear algebra: solving systems of equations, finding subspaces, solving least squares problems, factoring matrices, and computing eigenvalues and eigenvectors. In practice most of these operations will be done by software packages that you buy or download. This chapter discusses the formulation and conditioning of the main problems in computational linear algebra. Chapter ?? discusses algorithms.

Conditioning is the primary concern in many practical linear algebra computations. Easily available linear algebra software is stable in the sense of Section ??.??. This means that errors in the computed results are on the order of the changes in the answer that would be produced by roundoff error perturbations in the data. Unfortunately, condition numbers as large as 10¹⁶ occur in not terribly large practical problems. The results of such a calculation in double precision would be completely unreliable.

As discussed in Section ??.??, conditioning depends on the effect of small perturbations in the data of the problem. In linear algebra, this is called perturbation theory. Suppose¹ A is a matrix and f(A) is something about A that you want to compute, such as x satisfying Ax = b, or λ and v satisfying Av = λv. Perturbation theory estimates ∆f = f(A + ∆A) − f(A) when ∆A is small. We often do this by applying implicit differentiation to the relevant equations (such as Ax = b).

It is often, but not always, helpful to simplify the results of perturbation calculations using simple bounds that involve vector or matrix norms. For example, suppose we want to say that all the entries in ∆A or ∆v are small. For a vector, y, or a matrix, B, the norm, ‖y‖ or ‖B‖, is a number that characterizes the size of y or B. Using norms, we can say that the relative size of a perturbation in A is ‖∆A‖/‖A‖. The condition number of computing f(A) is

\[ \kappa = \lim_{\epsilon \to 0} \; \max_{\|\Delta A\|/\|A\| \le \epsilon} \; \left. \frac{\|f(A+\Delta A) - f(A)\|}{\|f(A)\|} \right/ \frac{\|\Delta A\|}{\|A\|} . \qquad (1) \]

We will see that this definition often is applied informally rather than literally in specific examples. To determine the condition number of a specific problem, we first apply perturbation theory to estimate ∆f for small ∆A, then we use properties of norms to express the results in terms of norms as in (1). Note that the condition number of a problem depends on the problem as well as on A. For example, the condition number of f(A) = A⁻¹b, the problem of finding x so that Ax = b, is informally² given by

\[ \|A\| \, \|A^{-1}\| . \qquad (2) \]

¹This notation replaces our earlier A(x). In linear algebra, A always is a matrix and x never is a matrix.
²To get this result, we not only maximize over ∆A but also over b. If the relative error really were increased by a factor on the order of ‖A‖ ‖A⁻¹‖, the finite element method, which is the main computational technique for structural analysis, would not work.
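To make the definitions above concrete, here is a small numerical sketch (not part of the original notes) in Python with NumPy. It builds an ill conditioned matrix, computes κ(A) = ‖A‖ ‖A⁻¹‖ with np.linalg.cond, and checks that a tiny perturbation of the data produces a much larger relative change in the solution of Ax = b; the particular matrix and perturbation size are chosen only for illustration.

```python
import numpy as np

# A Hilbert-like matrix: a classic example of severe ill conditioning.
n = 8
A = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])

x_true = np.ones(n)
b = A @ x_true

# kappa(A) = ||A|| * ||A^{-1}|| in the l2 norm.
kappa = np.linalg.cond(A, 2)

# Perturb the data slightly and re-solve.
rng = np.random.default_rng(0)
dA = 1e-10 * rng.standard_normal((n, n))
x_pert = np.linalg.solve(A + dA, b)

rel_data = np.linalg.norm(dA, 2) / np.linalg.norm(A, 2)
rel_answer = np.linalg.norm(x_pert - x_true) / np.linalg.norm(x_true)

print(f"condition number       : {kappa:.2e}")
print(f"relative data change   : {rel_data:.2e}")
print(f"relative answer change : {rel_answer:.2e}")
print(f"kappa * data change    : {kappa * rel_data:.2e}")
```

The observed amplification stays below κ(A) times the relative data change, consistent with (2) being a worst case bound.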

The problem of finding the eigenvalues of A has a condition number that does not resemble (2). For example, the eigenvalue problem can be well conditioned even when A is singular, so that ‖A⁻¹‖ is infinite.

A computer can represent an n × n matrix, A, as an n × n array of numbers, which are the entries of A. If this is efficient, we say A is dense. Much of the discussion here and in Chapter ?? applies mainly to dense matrices. A matrix is sparse if storing all its entries directly is inefficient. A modern (2006) desktop computer has enough memory for a dense matrix of size up to about n = 50,000. This would make dense matrix methods impractical for solving systems of equations with more than 50,000 unknowns (components of x). The computing time for solving n = 50,000 linear equations in this way would be at least a few days. Sparse matrix methods can handle larger problems and often give faster methods even for problems that can be handled using dense matrix methods. For example, finite element computations often lead to sparse matrices with orders of magnitude larger n that can be solved in hours.

One way a matrix can be sparse is for most of its entries to be zero. For example, discretizations of the Laplace equation in three dimensions have as few as seven non zero entries per row, so that 7/n is the fraction of entries of A that are not zero. Sparse matrices in this sense also arise in circuit problems, where a non zero entry in A corresponds to a direct connection between two elements in the circuit. Such matrices may be stored in sparse matrix format, in which we keep lists noting which entries are not zero and the values of the non zero elements. Computations with such sparse matrices try to avoid fill in. For example, they would avoid explicit computation of A⁻¹ because most of its entries would not be zero. Sparse matrix software has heuristics that often do very well in avoiding fill in. The interested reader should consult the references.

In some cases it is possible to compute the matrix vector product y = Ax for a given x efficiently without calculating the entries of A explicitly. One example is the discrete Fourier transform (DFT) described in Chapter ??. This is a full matrix (every entry different from zero) with n² non zero entries, but the FFT (fast Fourier transform) algorithm computes y = Ax in O(n log(n)) operations. Another example is the fast multipole method that computes forces from mutual electrostatic interaction of n charged particles with b bits of accuracy in O(nb) work. Many finite element packages never assemble the stiffness matrix, A. Methods that use only matrix vector products but not the elements of A are called iterative. The largest problems are solved in this way.
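As an illustration of the sparse matrix discussion above (my example, not the author's), the sketch below stores a large tridiagonal matrix in compressed sparse row (CSR) format with scipy.sparse and applies it to a vector. The size n is arbitrary; the point is that only the non zero entries are stored and the matrix vector product costs O(n) work.

```python
import numpy as np
import scipy.sparse as sp

# 1D discrete Laplacian: three non zero entries per row, stored in CSR format.
n = 100_000
main = -2.0 * np.ones(n)
off = np.ones(n - 1)
A = sp.diags([off, main, off], offsets=[-1, 0, 1], format="csr")

x = np.random.default_rng(1).standard_normal(n)

# The product touches only the stored non zeros; a dense n x n array of this
# size would need tens of gigabytes of memory.
y = A @ x
print(A.nnz, "stored non zeros out of", n * n, "entries")
```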

2 Review of linear algebra

This section recalls some aspects of linear algebra we make use of later. It is not a substitute for a course on linear algebra. I make use of many things from linear algebra, such as matrix inverses, without explanation. In my experience, people come to scientific computing with vastly differing points of view in linear algebra. This section should give everyone a common language.

2.1 Vector spaces

Linear algebra gets much of its power through the interaction between the abstract and the concrete. Abstract linear transformations are represented by concrete arrays of numbers forming a matrix. The set of solutions of a homogeneous system of equations forms an abstract subspace of R^n that we can try to characterize. For example, a basis for such a subspace may be computed by factoring a matrix in a certain way.

A vector space is a set of elements that may be added and multiplied by "scalar" numbers (either real or complex numbers, depending on the application). Vector addition should be commutative (u + v = v + u) and associative ((u+v)+w = u+(v+w)). Multiplication by scalars should be distributive over vector addition (x(u + v) = xu + xv for scalar x and vectors u and v), and conversely ((x+y)u = xu + yu). There should be a unique zero vector, 0, with 0 + u = u for any vector u. The standard vector spaces are R^n (or C^n), consisting of column vectors

\[ u = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix} \]

where the components, u_k, are arbitrary real (or complex) numbers. Vector addition and scalar multiplication are done componentwise.

If V is a vector space and V′ ⊂ V then we say that V′ is a subspace of V if V′ is also a vector space with the same vector addition and scalar multiplication operations. We may always add elements of V′ and multiply them by scalars, but V′ is a subspace if the result always is an element of V′. We say V′ is a subspace if it is closed under vector addition and scalar multiplication. For example, suppose V = R^n and V′ consists of all vectors whose components sum to zero (Σ_{k=1}^n u_k = 0). If we add two such vectors or multiply by a scalar, the result also has the zero sum property. On the other hand, the set of vectors whose component sum is one (Σ_{k=1}^n u_k = 1) is not closed under vector addition or scalar multiplication.

A basis for a vector space V is a set of vectors f_1, ..., f_n so that any u ∈ V may be written in a unique way as a "linear combination" of the vectors f_k:

\[ u = u_1 f_1 + \cdots + u_n f_n , \]

with scalar expansion coefficients u_k. The standard vector spaces R^n and C^n have standard bases e_k, e_k being the vector with all components zero except for a single 1 in position k. This is a basis because

\[ u = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix} = u_1 \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} + u_2 \begin{pmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix} + \cdots + u_n \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix} = \sum_{k=1}^{n} u_k e_k . \]

In view of this, there is not much distinction between coordinates, components, and expansion coefficients, all of which are called u_k. If V has a basis with n elements, we say the dimension of V is n. It is possible to make this definition because of the theorem that states that every basis of V has the same number of elements. A vector space that does not have a finite basis is called infinite dimensional³. The vector space of all polynomials (regardless of degree) is infinite dimensional.

Polynomials provide examples of abstract vector spaces. A polynomial in the variable x is a linear combination of powers of x, such as 2 + 3x⁴, or 1, or (1/3)(x − 1)²(x³ − 3x)⁶. We would have to multiply out the last example to write it as a linear combination of powers of x, and it is a polynomial (by our definition) because this is possible. The degree of a polynomial is the highest power that it contains. The complicated product above has degree 20. One vector space is the set, P_d, of all polynomials of degree not more than d. This space has a basis consisting of d + 1 elements:

\[ f_0 = 1 , \quad f_1 = x , \quad \ldots , \quad f_d = x^d . \]

Another basis of P3 consists of the “Hermite” polynomials

\[ H_0 = 1 , \quad H_1 = x , \quad H_2 = x^2 - 1 , \quad H_3 = x^3 - 3x . \]

These are useful in probability because if X is a standard normal random variable, then they are uncorrelated:

\[ E\left[ H_j(X) H_k(X) \right] = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} H_j(x) H_k(x) \, e^{-x^2/2} \, dx = 0 \quad \text{if } j \neq k. \]

Still another basis of P3 consists of Lagrange interpolating polynomials for the points 1, 2, 3, and 4:

\[ l_1 = \frac{(x-2)(x-3)(x-4)}{(1-2)(1-3)(1-4)} , \qquad l_2 = \frac{(x-1)(x-3)(x-4)}{(2-1)(2-3)(2-4)} , \]

\[ l_3 = \frac{(x-1)(x-2)(x-4)}{(3-1)(3-2)(3-4)} , \qquad l_4 = \frac{(x-1)(x-2)(x-3)}{(4-1)(4-2)(4-3)} . \]

These are useful for interpolation because, for example, l_1(1) = 1 while l_2(1) = l_3(1) = l_4(1) = 0. If we want u(x) to be a polynomial of degree 3 taking specified values u(1) = u_1, u(2) = u_2, u(3) = u_3, and u(4) = u_4, the answer is

\[ u(x) = u_1 l_1(x) + u_2 l_2(x) + u_3 l_3(x) + u_4 l_4(x) . \]

For example, at x = 2, the l_2 term takes the value u_2 while the other three terms are zero. The Lagrange interpolating polynomials are linearly independent because if 0 = u_1 l_1 + u_2 l_2 + u_3 l_3 + u_4 l_4, then the sum is equal to the zero

³An infinite dimensional vector space might have an infinite basis, whatever that might mean.

polynomial, which is equal to zero for any value of x, and in particular at x = 1, 2, 3, and 4, so u_1 = 0, u_2 = 0, u_3 = 0, and u_4 = 0.

The set of all polynomials with no restriction on degree is a vector space without a finite basis, an infinite dimensional space. If V′ ⊂ V is a subspace of dimension m of a vector space of dimension n, then it is possible to find a basis, f_k, of V so that the first m of the f_k form a basis of V′. For example, if V = P_3 and V′ is the set of polynomials that vanish at x = 2 and x = 3, we can take

\[ f_1 = (x-2)(x-3) , \quad f_2 = x(x-2)(x-3) , \quad f_3 = 1 , \quad f_4 = x . \]

Note the general (though not universal) rule that the dimension of V′ is equal to the dimension of V minus the number of constraints or conditions that define V′. Whenever V′ is a proper subspace of V, there is some u ∈ V that is not in V′, and m < n. One common task in computational linear algebra is finding a well conditioned basis for a subspace specified in some way.
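As a quick numerical check of the Lagrange interpolation formula earlier in this section (a sketch I added; the helper function name is mine), the code below builds l_1, . . . , l_4 for the points 1, 2, 3, 4 and verifies that u(x) = u_1 l_1(x) + · · · + u_4 l_4(x) reproduces prescribed values at the nodes.

```python
import numpy as np

nodes = np.array([1.0, 2.0, 3.0, 4.0])

def lagrange_basis(k, x):
    """Evaluate l_k(x) for the interpolation points 1, 2, 3, 4."""
    others = np.delete(nodes, k)
    return np.prod([(x - t) / (nodes[k] - t) for t in others], axis=0)

u_vals = np.array([2.0, -1.0, 0.5, 3.0])       # prescribed u(1), ..., u(4)

def u(x):
    return sum(u_vals[k] * lagrange_basis(k, x) for k in range(4))

print(np.allclose([u(t) for t in nodes], u_vals))   # True
```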

2.2 Matrices and linear transformations

Suppose V and W are vector spaces. A linear transformation from V to W is a function that produces an element of W for any element of V, written w = Lv, that is linear. Linear means that L(v_1 + v_2) = Lv_1 + Lv_2 for any two vectors in V, and L(xv) = xLv for any scalar x and vector v ∈ V. By convention we write Lv instead of L(v) even though L represents a function from V to W. This makes algebraic manipulation with linear transformations look just like algebraic manipulation of matrices, deliberately blurring the distinction between linear transformations and matrix vector multiplication. The simplest example is the situation of V = R^n, W = R^m, and Lu = A · u, for some m × n matrix A. The notation A · u means that we should multiply the vector u by the matrix A. Most of the time we leave out the dot.

Any linear transformation between finite dimensional vector spaces may be represented by a matrix. Suppose f_1, ..., f_n is a basis for V, and g_1, ..., g_m is a basis for W. For each k, Lf_k is an element of W and may be written as a linear combination of the g_j:

\[ L f_k = \sum_{j=1}^{m} a_{jk} \, g_j . \]

Because the transformation is linear, we can calculate what happens to a vector u ∈ V in terms of its expansion u = Σ_k u_k f_k. Let w ∈ W be the image of u, w = Lu, written as w = Σ_j w_j g_j. We find

\[ w_j = \sum_{k=1}^{n} a_{jk} u_k , \]

which is ordinary matrix vector multiplication.

If the linear transformation, L, is between abstract finite dimensional vector spaces that are not R^n, the matrix representing L depends on the basis. For example, suppose V = P_3, W = P_2, and L represents differentiation:

\[ L\left( p_0 + p_1 x + p_2 x^2 + p_3 x^3 \right) = \frac{d}{dx}\left( p_0 + p_1 x + p_2 x^2 + p_3 x^3 \right) = p_1 + 2 p_2 x + 3 p_3 x^2 . \]

If we take the basis 1, x, x², x³ for V, and 1, x, x² for W, then the matrix is

 0 1 0 0   0 0 2 0  0 0 0 3

The matrix would be different if we used the Hermite polynomial basis for V.

Conversely, an n × m matrix, A, represents a linear transformation from R^m to R^n (or from C^m to C^n). We denote this transformation also by A. If v ∈ R^m is an m component column vector, then w = Av, the matrix vector product, determines w ∈ R^n, a column vector with n components. As before, the notation deliberately is ambiguous. The matrix A is the matrix that represents the linear transformation A using standard bases of R^m and R^n.

A matrix may also represent a change of basis within the same space V. If f_1, ..., f_n, and g_1, ..., g_n are different bases, and v is a vector with expansions v = Σ_k u_k f_k and v = Σ_j w_j g_j, then we may write

\[ \begin{pmatrix} u_1 \\ \vdots \\ u_n \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix} \begin{pmatrix} w_1 \\ \vdots \\ w_n \end{pmatrix} . \]

The matrix element a_{jk} is the coefficient of f_j in the expansion of g_k in terms of the f basis. We use the convention that a_{jk} represents the (j, k) entry of the matrix A. For example, suppose u ∈ P_3 is given in terms of Hermite polynomials or simple powers: u = Σ_{j=0}^{3} u_j H_j(x) = Σ_{k=0}^{3} w_k x^k. Then

\[ \begin{pmatrix} u_0 \\ u_1 \\ u_2 \\ u_3 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 3 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} w_0 \\ w_1 \\ w_2 \\ w_3 \end{pmatrix} , \]

since x² = H_0 + H_2 and x³ = 3H_1 + H_3.

We may reverse the change of basis by using the inverse matrix:

\[ \begin{pmatrix} w_1 \\ \vdots \\ w_n \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}^{-1} \begin{pmatrix} u_1 \\ \vdots \\ u_n \end{pmatrix} . \]

Two bases must have the same number of elements because only square matrices can be invertible. Of course, any m × n matrix defines a linear transformation

from R^n or C^n to R^m or C^m. Even if the entries in the matrix are real, it might be better to think of C^n rather than R^n, for example if we are looking for eigenvalues that might be complex.

Composition of linear transformations corresponds to matrix multiplication. If L is a linear transformation from V to W, and M is a linear transformation from W to a third vector space, Z, then the composite transformation that takes v to⁴ M(Lv) is written ML. The composite of linear transformations L and M can be defined if the target (or range) of L is the same as the source (or domain) of M, which is W in this case. If A is an n × m matrix and B is p × q, then the source of A is R^m and the target of B is R^p. The composite transformation AB is defined if m = p. If AB is defined, the matrix representing the linear transformation AB is the matrix product A · B (usually written without the dot).

For vector spaces V and W, the set of linear transformations from V to W forms a vector space. We can add two linear transformations and multiply a linear transformation by a scalar. This applies in particular to n × m matrices, which represent linear transformations from R^m to R^n. The entries of A + B are a_jk + b_jk. The n × 1 matrices may be associated with column vectors and the 1 × n matrices may be associated with row vectors. It is useful to distinguish between row and column vectors although both are determined by n components. A 1 × 1 matrix is a scalar.

If the source and target are the same, V = W, or n = m = p = q, then both composites LM and ML are defined, but they probably are not equal. Similarly, if A and B are n × n matrices, then AB and BA are defined, but AB ≠ BA in general. If A, B, and C are matrices so that the products AB and BC both are defined, then the products A · (BC) and (AB) · C also are defined. The associative property of matrix multiplication is the fact that these are equal: A(BC) = (AB)C. In actual computations it may be better to compute AB first and then multiply by C than to compute BC first and then compute A(BC). For example, suppose u is a 1 × n row vector, and B and C are n × n matrices. Then uB = v is the product of a row vector and a square matrix, which takes O(n²) additions and multiplications (flops) to compute, and similarly for vC = (uB)C. On the other hand, BC is the product of n × n matrices, which takes O(n³) flops to compute. Thus (uB)C takes roughly a factor of n fewer flops to compute than u(BC).

If A is an n × m matrix with real entries a_jk, the transpose of A, written A*, is the m × n matrix whose (j, k) entry is a*_jk = a_kj. If A has complex entries, then A*, the adjoint of A, is the m × n matrix with entries a*_jk = ā_kj (ā is the complex conjugate of a). If the entries of A are real, the adjoint and transpose are the same. The transpose often is written A^t, but we use the same notation for adjoint and transpose. A real (having real entries) square (n = m) matrix is symmetric if A = A*. A complex square matrix is self adjoint if A* = A. We often use the term "self adjoint" for either case.

⁴First v goes to w = Lv, then w goes to z = Mw. In the end, v has gone to M(Lv).
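The cost difference between (uB)C and u(BC) is easy to see experimentally. The timing sketch below (added for illustration; the size n and the timings themselves are machine dependent) computes the same row vector both ways.

```python
import time
import numpy as np

n = 1500
rng = np.random.default_rng(0)
u = rng.standard_normal((1, n))     # 1 x n row vector
B = rng.standard_normal((n, n))
C = rng.standard_normal((n, n))

t0 = time.perf_counter()
left = (u @ B) @ C                  # two O(n^2) products
t1 = time.perf_counter()
right = u @ (B @ C)                 # one O(n^3) product, then O(n^2)
t2 = time.perf_counter()

print(np.allclose(left, right))     # same answer, up to roundoff
print(f"(uB)C: {t1 - t0:.4f} s   u(BC): {t2 - t1:.4f} s")
```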

2.3 Vector norms

A norm is a way to describe the size of a vector. It is a single number extracted from the information describing u, which we write ‖u‖ (read "the norm of u"). There are several different norms that are useful in scientific computing. We say ‖u‖ is a norm if it has the following properties. First, ‖u‖ ≥ 0, with ‖u‖ = 0 only when u = 0. Second, it should be homogeneous: ‖xu‖ = |x| ‖u‖ for any scalar, x. Third, it should satisfy the triangle inequality, ‖u + v‖ ≤ ‖u‖ + ‖v‖, for any two vectors u and v in V.

There are several simple norms for R^n or C^n that have names. One is the l¹ norm

\[ \|u\|_1 = \|u\|_{l^1} = \sum_{k=1}^{n} |u_k| . \]

Another is the l^∞ norm, also called the sup norm or the max norm:

\[ \|u\|_{\infty} = \|u\|_{l^\infty} = \max_{k=1,\ldots,n} |u_k| . \]

Another is the l² norm, also called the Euclidean norm

\[ \|u\|_2 = \|u\|_{l^2} = \left( \sum_{k=1}^{n} u_k^2 \right)^{1/2} . \]

The l² norm is natural for vectors representing positions or velocities in three dimensional space. If the components of u ∈ R^n represent probabilities, the l¹ norm might be more appropriate. In some cases we may have a norm defined indirectly or with a definition that is hard to turn into a number. For example, in the vector space P_3 of polynomials of degree 3, we can define a norm ‖p‖ = max_{a ≤ x ≤ b} |p(x)|. There is no simple formula for ‖p‖ in terms of the coefficients of p.

The notion of vector norms, though very useful, is not perfect. For one thing, the choice of norm seems arbitrary in many cases. For example, what norm should we use for the two dimensional subspace of P_3 consisting of polynomials that vanish when x = 2 and x = 3? Another criticism is that norms may not make dimensional sense if the different components of u have different units. This might happen, for example, if the components of u represent different factors (or variables) in a statistical model. The first factor, u_1, might be the age of a person, the second, u_2, income, the third the number of children. In some units, we might get (because the computer stores only numbers, not units)

\[ u = \begin{pmatrix} 45 \\ 500000 \\ 2 \end{pmatrix} . \]

In situations like these we can define, for example, a dimensionless version of the l¹ norm:

\[ \|u\| = \sum_{k=1}^{n} \frac{1}{\bar{u}_k} \cdot |u_k| , \]

where ū_k is a typical value of a quantity with the units of u_k in the problem at hand. We can go further and use the basis

\[ f_k = \bar{u}_k e_k \qquad (3) \]

of R^n. In this basis, the components of u are ũ_k = u_k / ū_k, which are dimensionless. This is called balancing or diagonal scaling. The matrix representing a linear transformation in the f_k basis is likely to be better conditioned than the one using the e_k basis.
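Here is a small sketch (not from the notes) of the norms and the diagonal scaling just described; the "typical values" ū_k are invented for the example.

```python
import numpy as np

u = np.array([45.0, 500000.0, 2.0])       # age, income, children (mixed units)

print(np.linalg.norm(u, 1))               # l1 norm
print(np.linalg.norm(u, 2))               # l2 norm
print(np.linalg.norm(u, np.inf))          # max (l-infinity) norm

# Diagonal scaling: divide each component by a typical value u_bar_k.
u_bar = np.array([50.0, 40000.0, 2.0])    # invented typical values
u_tilde = u / u_bar                       # dimensionless components
print(u_tilde, np.linalg.norm(u_tilde, 1))
```

Without scaling, the income component dominates every norm; after scaling, all three components contribute comparably.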

2.4 Norms of matrices and linear transformations

Suppose L is a linear transformation from V to W. If we have norms for the spaces V and W, we can define a corresponding norm of L, written ‖L‖, as the largest amount by which it stretches a vector:

\[ \|L\| = \max_{u \neq 0} \frac{\|Lu\|}{\|u\|} . \qquad (4) \]

The norm definition (4) implies that for all u,

\[ \|Lu\| \le \|L\| \cdot \|u\| . \qquad (5) \]

Moreover, ‖L‖ is the sharp constant in the inequality (5) in the sense that if ‖Lu‖ ≤ C · ‖u‖ for all u, then C ≥ ‖L‖. Thus, (4) is equivalent to saying that ‖L‖ is the sharp constant in (5).

The different vector norms give rise to different matrix norms. The matrix norms corresponding to certain standard vector norms are written with corresponding subscripts, such as

\[ \|L\|_{l^2} = \max_{u \neq 0} \frac{\|Lu\|_{l^2}}{\|u\|_{l^2}} . \qquad (6) \]

For V = W = R^n, we have (for the linear transformation represented in the standard basis by A)

\[ \|A\|_{l^1} = \max_{k} \sum_{j} |a_{jk}| , \]

and

\[ \|A\|_{l^\infty} = \max_{j} \sum_{k} |a_{jk}| . \]

The l¹ matrix norm is the maximum column sum while the max norm is the maximum row sum. Other norms are hard to compute explicitly in terms of the entries of A. People often say that ‖L‖_{l²} is the largest singular value of L, but this is not too helpful because (6) is the definition of the largest singular value of L.

Any norm defined by (4) in terms of vector norms has several properties derived from corresponding properties of vector norms. One is homogeneity: ‖xL‖ = |x| ‖L‖. Another is that ‖L‖ ≥ 0 for all L, with ‖L‖ = 0 only for L = 0. The triangle inequality for vector norms implies that if L and M are two linear transformations from V to W, then ‖L + M‖ ≤ ‖L‖ + ‖M‖. Finally, we have ‖ML‖ ≤ ‖M‖ ‖L‖. This is because the composite stretches no more than the product of the individual maximum stretches:

\[ \|M(Lu)\| \le \|M\| \, \|Lu\| \le \|M\| \, \|L\| \, \|u\| . \]

Of course, all these properties hold for matrices of the appropriate sizes.

All of these norms have uses in the theoretical parts of scientific computing, the l¹ and l^∞ norms because they are easy to compute and the l² norm because it is invariant under orthogonal transformations such as the discrete Fourier transform. The norms are not terribly different from each other. For example, ‖A‖_{l¹} ≤ n ‖A‖_{l^∞} and ‖A‖_{l^∞} ≤ n ‖A‖_{l¹}. For n ≤ 1000, this factor of n may not be so important if we are asking, for example, about catastrophic ill conditioning.
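The column sum and row sum formulas are easy to verify numerically. In the sketch below (mine), they are compared with NumPy's built-in matrix norms for a random matrix.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5))

col_sum = np.abs(A).sum(axis=0).max()   # l1 matrix norm: maximum column sum
row_sum = np.abs(A).sum(axis=1).max()   # l-infinity matrix norm: maximum row sum

print(np.isclose(col_sum, np.linalg.norm(A, 1)))       # True
print(np.isclose(row_sum, np.linalg.norm(A, np.inf)))  # True
print(np.linalg.norm(A, 2))             # l2 norm = largest singular value
```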

2.5 Eigenvalues and eigenvectors

Let A be an n × n matrix. We say that λ is an eigenvalue, and r ≠ 0 the corresponding eigenvector, if

\[ A r = \lambda r . \]

Eigenvalues and eigenvectors are useful in understanding dynamics related to A. For example, the differential equation u̇ = Au has solutions u(t) = e^{λt} r. Moreover, eigenvalues and their relatives, singular values, are the basis of principal component analysis in statistics. In general, eigenvalues and eigenvectors may be complex even though A is real. This is one reason people work with complex vectors in C^n, even for applications that seemed to call for R^n.

Even though their descriptions seem similar, the symmetric eigenvalue problem (A symmetric for real A or self adjoint for complex A) is vastly different from the general, or unsymmetric, problem. This contrast is detailed below. Here I want to emphasize the differences in conditioning. The eigenvalues of a symmetric matrix are always well conditioned functions of the matrix, a rare example of uniform good fortune. By contrast, the eigenvalues of an unsymmetric matrix may be so ill conditioned, even for n as small as 20, that they are uncomputable (in double precision arithmetic) and essentially useless. Eigenvalues of unsymmetric matrices are too useful to ignore, but we can get into trouble if we ignore their potential ill conditioning. Eigenvectors, even for symmetric matrices, may be ill conditioned, but the consequences of this ill conditioning seem less severe.

We begin with the unsymmetric eigenvalue problem. Nothing we say is wrong if A happens to be symmetric, but some of the statements might be misleading. An n × n matrix may have as many as n eigenvalues, denoted λ_k, k = 1, . . . , n. If all the eigenvalues are distinct, the corresponding eigenvectors,

denoted r_k, with A r_k = λ_k r_k, must be linearly independent, and therefore form a basis for R^n. These n linearly independent vectors can be assembled to form the columns of an n × n eigenvector matrix that we call the right eigenvector matrix,

\[ R = \begin{pmatrix} \vdots & & \vdots \\ r_1 & \cdots & r_n \\ \vdots & & \vdots \end{pmatrix} . \]

We also consider the diagonal eigenvalue matrix with the eigenvalues on the diagonal and zeros in all other entries:

\[ \Lambda = \begin{pmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{pmatrix} . \]

The eigenvalue-eigenvector relations may be expressed in terms of these matrices as

\[ AR = R\Lambda . \qquad (7) \]

To see this, check that multiplying R by A is the same as multiplying each of the columns of R by A. Since these are eigenvectors, we get

 . .   . .  . . . .     A  r1 ··· rn  =  λ1r1 ··· λnrn   . .   . .  . . . .  . .    . . λ1 0  r ··· r   ..  =  1 n   .   . .  . . 0 λn = RΛ .

Since the columns of R are linearly independent, R is invertible, and we can multiply (7) from the right and from the left by R⁻¹ to get

\[ R^{-1} A R R^{-1} = R^{-1} R \Lambda R^{-1} , \]

then cancel the R⁻¹R and RR⁻¹, and define⁵ L = R⁻¹ to get

\[ LA = \Lambda L . \qquad (8) \]

This shows that the kth row of L is an eigenvector of A if we put the A on the right:

\[ l_k A = \lambda_k l_k . \]

⁵Here L refers to a matrix, not a general linear transformation.

Of course, the λ_k are the same: there is no difference between "right" and "left" eigenvalues.

The matrix equation we used to define L, LR = I, gives useful relations between left and right eigenvectors. The (j, k) entry of LR is the product of row j of L with column k of R. When j = k this product should be a diagonal entry of I, namely one. When j ≠ k, the product should be zero. That is,

\[ \begin{cases} l_k r_k = 1 & \\ l_j r_k = 0 & \text{if } j \neq k. \end{cases} \qquad (9) \]

These are called "biorthogonality" relations. For example, r_1 need not be orthogonal to r_2, but it is orthogonal to l_2. The set of vectors r_k is not orthogonal, but the two sets l_j and r_k are biorthogonal. The left eigenvectors are sometimes called "adjoint eigenvectors" because they (their transposes) form right eigenvectors for the adjoint of A:

\[ A^* l_j^* = \bar{\lambda}_j \, l_j^* . \]

Still supposing n distinct eigenvalues, we may take the right eigenvectors to be a basis for R^n (or C^n if the entries are not real). As discussed in Section 2.2, we may express the action of A in this basis. Since A r_j = λ_j r_j, the matrix representing the linear transformation A in this basis will be the diagonal matrix Λ. In the framework of Section 2.2, this is because if we expand a vector v ∈ R^n in the r_k basis, v = v_1 r_1 + · · · + v_n r_n, then Av = λ_1 v_1 r_1 + · · · + λ_n v_n r_n. For this reason finding a complete set of eigenvectors and eigenvalues is called "diagonalizing" A. If A does not have n distinct eigenvalues then there may be no basis in which the action of A is diagonal. For example, consider the matrix

\[ J = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} . \]

Clearly, J ≠ 0 but J² = 0. A diagonal or "diagonalizable" matrix cannot have this property: if Λ² = 0 then Λ = 0. In general a "Jordan block" with eigenvalue λ is a k × k matrix with λ on the diagonal, 1 on the "superdiagonal", and zeros elsewhere:

\[ \begin{pmatrix} \lambda & 1 & 0 & \cdots & 0 \\ 0 & \lambda & 1 & & 0 \\ \vdots & 0 & \ddots & \ddots & \vdots \\ \vdots & \vdots & \ddots & \lambda & 1 \\ 0 & 0 & \cdots & 0 & \lambda \end{pmatrix} . \]

If a matrix has fewer than n distinct eigenvalues, it might or might not be diagonalizable (have a complete set of n linearly independent eigenvectors). If A is not diagonalizable, a theorem of linear algebra states that there is a basis in which A has "Jordan form". A matrix has Jordan form if it is "block diagonal" with Jordan blocks of various sizes and eigenvalues on the diagonal. It can be difficult to compute the Jordan form of a matrix numerically, as we will see.

The symmetric eigenvalue-eigenvector problem is different and in many ways simpler than the general nonsymmetric eigenvalue problem. The eigenvalues are real.

The left eigenvectors are transposes of the right eigenvectors: l_k = r_k^*. There are no Jordan blocks; every symmetric matrix is diagonalizable even if the number of distinct eigenvalues is less than n. The biorthogonality relations (9) become the orthogonality relations

\[ \begin{cases} r_j^* r_j = 1 & \\ r_j^* r_k = 0 & \text{for } j \neq k. \end{cases} \]

This may be restated by saying that the eigenvector matrix, R, is an orthogonal (or unitary) matrix:

\[ L = R^{-1} = R^* , \quad \text{or} \quad R^* R = I . \]
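The right/left eigenvector machinery above can be checked numerically. The sketch below (added here for illustration, using numpy.linalg.eig on a random unsymmetric matrix which generically has distinct eigenvalues) verifies AR = RΛ, LA = ΛL, and the biorthogonality relation LR = I.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4))        # generically has distinct eigenvalues

lam, R = np.linalg.eig(A)              # columns of R are right eigenvectors
Lam = np.diag(lam)
L = np.linalg.inv(R)                   # rows of L are left eigenvectors

print(np.allclose(A @ R, R @ Lam))     # AR = R Lambda
print(np.allclose(L @ A, Lam @ L))     # LA = Lambda L
print(np.allclose(L @ R, np.eye(4)))   # biorthogonality: LR = I
```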

2.6 Variational principles for the symmetric eigenvalue problem

A variational principle is a way to find something by solving a maximization or minimization problem. The Rayleigh quotient for an n × n matrix is

\[ Q(x) = \frac{x^* A x}{x^* x} . \qquad (10) \]

If x is real, x* is the transpose of x, which is a row vector. If x is complex, x* is the adjoint. In either case, the denominator is x*x = Σ_{k=1}^n |x_k|² = ‖x‖²_{l²}. The Rayleigh quotient is defined for x ≠ 0. A vector r is a stationary point if ∇Q(r) = 0. If r is a stationary point, the corresponding value λ = Q(r) is a stationary value.

Theorem 1 If A is a real symmetric or a complex self adjoint matrix, each eigenvector of A is a stationary point of the Rayleigh quotient and the corresponding eigenvalue is the corresponding stationary value. Conversely, each stationary point of the Rayleigh quotient is an eigenvector and the corresponding stationary value the corresponding eigenvalue.

Proof: We give the proof for real x and real symmetric A. The complex self adjoint case is an exercise. Underlying the theorem is the calculation that if A is real symmetric then⁶

\[ \nabla \left( \tfrac{1}{2} x^* A x \right) = Ax . \qquad (11) \]

The function on the left side of (11) is ½ Σ_{lm} x_l a_{lm} x_m, so (11) is equivalent to

\[ \partial_{x_j} \, \frac{1}{2} \sum_{lm} x_l a_{lm} x_m = \sum_{k} a_{jk} x_k . \qquad (12) \]

There are two contributions to the derivative on the left side, one from j = l and the other from j = m:

\[ \partial_{x_j} \, \frac{1}{2} \sum_{lm} x_l a_{lm} x_m = \frac{1}{2} \sum_{m} a_{jm} x_m + \frac{1}{2} \sum_{l} x_l a_{lj} . \]

⁶It usually is convenient to think of ∇f(x) as an n component row vector if x ∈ R^n. This is the only place we make it a column vector.

For a real symmetric matrix (this is where symmetry matters), a_{jk} = a_{kj} and the two terms on the right are equal, which gives (12) and then (11). With this we calculate

\[ \nabla Q = 2 \left( \frac{1}{x^* x} \right) Ax - 2 \left( \frac{x^* A x}{(x^* x)^2} \right) x . \]

If ∇Q = 0, then

\[ Ax = \frac{x^* A x}{x^* x} \, x , \]

which shows that x is an eigenvector with

\[ \lambda = \frac{x^* A x}{x^* x} = Q(x) \]

as the corresponding eigenvalue. Conversely, if Ar = λr, then Q(r) = λ and the calculations above show that ∇Q(r) = 0. This proves the theorem.

A simple observation shows that there is at least one stationary point of Q for Theorem 1 to apply to. If α is a real number, then Q(αx) = Q(x). We may choose α so that⁷ ‖αx‖ = 1. This shows that

\[ \max_{x \neq 0} Q(x) = \max_{\|x\| = 1} Q(x) . \]

A theorem of analysis states that if Q(x) is a continuous function on a compact set, then there is an r so that Q(r) = max_x Q(x) (the max is attained). The set of x with ‖x‖ = 1 (the unit sphere) is compact and Q is continuous. Clearly if Q(r) = max_x Q(x), then ∇Q(r) = 0, so r is a stationary point of Q and an eigenvector of A.

The key to finding the other n − 1 eigenvectors is an orthogonality relation that is the other main consequence of the variational principle. We will use the result, and the ideas of the proof are important in their own right. The orthogonality principle is the basis of the singular value decomposition. Using curves to calculate gradients is our approach to perturbation theory.

Theorem 2 If A is symmetric and r is a critical point of Q and x is orthogonal to r (i.e. x∗r = 0), then Ax is orthogonal to r.

Proof: We describe what it means for r to be a stationary point of Q. Suppose s ∈ [0, 1] is a real parameter and y(s) ∈ R^n is a smooth function of s with y(0) = r. Then r is a stationary point of Q if (d/ds) Q(y(s)) = 0 at s = 0 for any such curve y. This is the same as saying that as s → 0,

\[ Q(y(s)) = Q(r) + O(s^2) . \]

⁷In this section and the next, ‖x‖ = ‖x‖_{l²}.

We apply this criterion to the straight "curve" y(s) = r + sx and use x*r = 0. Note that r*Ax = x*Ar because A is symmetric.

\[ Q(r + sx) = \frac{(r+sx)^* A (r+sx)}{(r+sx)^*(r+sx)} = \frac{r^*Ar + 2s\,x^*Ar + s^2 x^*Ax}{r^*r + s^2 x^*x} = Q(r) + 2s \, \frac{r^*Ax}{r^*r} + O(s^2) . \]

If r is a stationary point, the coefficient of s must be zero, which proves the claim that r*Ax = 0.

The variational and orthogonality principles allow us to find n − 1 additional orthonormal eigenvectors. Start with r_1, a vector that maximizes Q. Let V_2 be the vector space of all x ∈ R^n with r_1^* x = 0. Theorem 2 shows that V_2 is an invariant subspace for A, which means that Ax ∈ V_2 whenever x ∈ V_2. Thus A defines a linear transformation from V_2 to V_2, which we call A_2. Chapter ?? gives a proof that A_2 is symmetric in a suitable basis. Theorem 1 implies that A_2 has at least one real eigenvector, r_2, with real eigenvalue λ_2. Since r_2 ∈ V_2, the action of A and A_2 is the same, which means that Ar_2 = λ_2 r_2. Also since r_2 ∈ V_2, r_2 is orthogonal to r_1. Now let V_3 ⊂ V_2 be the set of x ∈ V_2 with x*r_2 = 0. Again Theorem 2 implies that V_3 is an invariant subspace, so there is an r_3 ∈ V_3 with Ar_3 = A_2 r_3 = A_3 r_3 = λ_3 r_3. And again r_3 is orthogonal to r_2 because r_3 ∈ V_3 and r_3 is orthogonal to r_1 because r_3 ∈ V_3 ⊂ V_2. Continuing in this way, we eventually find a full set of n orthogonal eigenvectors.
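A numerical illustration of Theorem 1 (my own check, not part of the notes): for a random symmetric matrix, an eigenvector returned by numpy.linalg.eigh gives Q(r) equal to the eigenvalue, and a finite difference gradient of Q at r is essentially zero.

```python
import numpy as np

rng = np.random.default_rng(4)
B = rng.standard_normal((5, 5))
A = 0.5 * (B + B.T)                    # random symmetric matrix

def Q(x):
    return (x @ A @ x) / (x @ x)

lam, R = np.linalg.eigh(A)
r = R[:, 0]                            # an eigenvector, ||r|| = 1

print(np.isclose(Q(r), lam[0]))        # stationary value = eigenvalue
h = 1e-6
grad = np.array([(Q(r + h * e) - Q(r - h * e)) / (2 * h) for e in np.eye(5)])
print(np.linalg.norm(grad) < 1e-5)     # gradient of Q vanishes at r
```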

2.7 Least squares

Suppose A is an n × m matrix representing a linear transformation from R^m to R^n. If we have b ∈ R^n and seek x ∈ R^m with Ax = b, the residual for a given x is

\[ r = Ax - b . \qquad (13) \]

For a given x, the residual is the amount by which the equation Ax = b is not satisfied. Many applications call for a solution to the linear least squares problem

\[ \min_{x} \, \|Ax - b\|_{l^2} . \qquad (14) \]

This is the same as finding x to minimize the sum of squares

\[ SS = \sum_{j=1}^{n} r_j^2 . \]

The typical case is overdetermined with the number of equations, n, much larger than m, the number of unknowns. Linear least squares problems arise through linear regression in statistics. A linear regression model models the response, b, as a linear combination of

m explanatory variables, a_k, each weighted by a regression coefficient, x_k. The residual, R = (Σ_{k=1}^m a_k x_k) − b, is modeled as a Gaussian random variable⁸ with mean zero and variance σ². We do n experiments. The explanatory variables and response for experiment j are a_jk, for k = 1, . . . , m, and b_j, and the residual (for given regression coefficients) is r_j = Σ_{k=1}^m a_jk x_k − b_j. The log likelihood function is (up to an additive constant) f(x) = −(1/(2σ²)) Σ_{j=1}^n r_j². Finding regression coefficients to maximize the log likelihood function leads to (14).

**** More ... overfitting, ... top row all 1 ... linlinear regression.

The normal equations are one approach to least squares problems. We minimize (14) by minimizing the sum of squares (if u ∈ R^n and v ∈ R^n, then u*v = v*u):

\[ \|r\|_{l^2}^2 = r^* r = (Ax - b)^* (Ax - b) = x^* A^* A x - 2 x^* A^* b + b^* b . \]

Setting the gradient to zero as in the proof of Theorem 1 leads to the normal equations

\[ A^* A x = A^* b , \qquad (15) \]

which can be solved by

\[ x = \left( A^* A \right)^{-1} A^* b . \qquad (16) \]

The matrix M = A*A is the moment matrix or the Gram matrix. It is symmetric, and positive definite if A has rank m, so the Choleski decomposition of M (see Chapter ??) is a good way to solve (15). The matrix (A*A)⁻¹ A* is the pseudoinverse of A. If A were square and invertible, it would be A⁻¹ (check this). The normal equation approach is the fastest way to solve dense linear least squares problems, but it is not suitable for the subtle ill conditioned problems that arise often in practice. There is a more careful treatment of linear least squares problems that deals better with ill conditioned problems.
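A short sketch of the normal equations in practice (illustration only; the data are synthetic): the answer from (16) agrees with NumPy's least squares solver, which uses a more careful orthogonalization-based method behind the scenes.

```python
import numpy as np

rng = np.random.default_rng(5)
n, m = 50, 3
A = np.column_stack([np.ones(n), rng.standard_normal((n, m - 1))])  # intercept column of ones
x_true = np.array([1.0, -2.0, 0.5])
b = A @ x_true + 0.01 * rng.standard_normal(n)                      # noisy response

# Normal equations: solve (A*A) x = A*b rather than forming the inverse.
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Library least squares solver, preferable for ill conditioned problems.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.allclose(x_normal, x_lstsq))   # True for this well conditioned example
```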

2.8 Singular value decomposition

We have seen that eigenvalues and eigenvectors for a symmetric matrix have at least two distinct applications. They can be used to solve dynamical problems involving A, and they help compute the norm of A and solve least squares problems involving A. Eigenvalues and eigenvectors of a non symmetric matrix are good for dynamics but not for least squares, because they are not orthogonal. Singular values for non symmetric or even non square matrices perform that function.

Let A be an m × n matrix that represents a linear transformation from R^n to R^m. The right singular vectors, v_k ∈ R^n, form an orthonormal basis for

⁸See any good book on statistics for definitions of Gaussian random variable and the log likelihood function. What is important here is that a systematic statistical procedure, the maximum likelihood method, tells us to minimize the sum of squares of residuals.

R^n. The left singular vectors, u_k ∈ R^m, form an orthonormal basis for R^m. Corresponding to each v_k and u_k pair is a non negative singular value, σ_k, with

\[ A v_k = \sigma_k u_k . \qquad (17) \]

By convention these are ordered so that σ_1 ≥ σ_2 ≥ · · · ≥ 0. If n < m we interpret (17) as saying that σ_k = 0 for k > n. If n > m we say σ_k = 0 for k > m. The singular values and singular vectors may be found one by one using variational and orthogonality principles similar to those in Section 2.6. The first step is the variational principle:

\[ \sigma_1 = \max_{x \neq 0} \frac{\|Ax\|}{\|x\|} . \qquad (18) \]

As in Section 2.6, there is a v_1 with ‖v_1‖ = 1 so that u_1 = A v_1 has ‖u_1‖ = σ_1 ‖v_1‖.

The orthogonality principle is that if v_1^* x = 0 then u_1^* A x = 0. The proof follows the proof of Theorem 2. Assume that v_1^* x = 0 and let y(s) = v_1 + sx. We will see that

\[ \left\{ \frac{d}{ds} \, \frac{\|A y(s)\|}{\|y(s)\|} = 0 \ \text{when } s = 0 \right\} \implies u_1^* A x = 0 . \]

Note that ‖Ay‖ = ((Ay)* Ay)^{1/2} = (y* A* A y)^{1/2}. Since A*A is symmetric (see (11)),

\[ \frac{d}{ds} \, y^* A^* A y = 2 \, v_1^* A^* A x \quad \text{(when } s = 0\text{)} = 2 \, u_1^* A x \quad \text{(when } s = 0\text{)}. \]
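The defining relation (17) is easy to confirm with a library SVD. The sketch below (mine) checks A v_k = σ_k u_k and that the largest singular value equals the l² norm of A.

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((5, 3))               # m x n with m = 5, n = 3

U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
V = Vt.T

for k in range(len(sigma)):                   # A v_k = sigma_k u_k
    assert np.allclose(A @ V[:, k], sigma[k] * U[:, k])

print(sigma)                                  # sigma_1 >= sigma_2 >= ... >= 0
print(np.isclose(sigma[0], np.linalg.norm(A, 2)))   # largest sigma = l2 norm
```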

2.9 Matrix decompositions and factorizations

A matrix factorization is an expression of a given matrix as a product of other matrices of specified types. Many important algorithms of numerical linear algebra may be expressed as methods for computing or using certain matrix factorizations. For example, Gauss elimination computes the LU decomposition of a general square matrix or the Choleski decomposition of an SPD matrix (definitions below). The QR factorization of a rectangular matrix solves linear least squares problems and finds bases for subspaces. Eigenvalues and eigenvectors may be organized into another factorization. The "singular value decomposition" contains the information of a principal component analysis in statistics.

A matrix, L, is "lower triangular" if all its entries above the diagonal are zero: l_jk = 0 for k > j. A matrix, U, is "upper triangular" if its entries vanish below the diagonal: u_jk = 0 if k < j. The matrix is "strictly" upper or lower triangular if the diagonal entries also are zero. Upper and lower triangular matrices need not be square. A matrix is a "permutation matrix" if there is a

single one in each row and in each column, and zeros elsewhere. Applying a permutation matrix to a column vector just permutes the entries of the vector.

Any nonsingular matrix, A, has an LU factorization. This almost, but not quite, means that we may find a square lower triangular L and upper triangular U so that A = LU. First, we usually normalize to make the diagonal entries of U be one. With this normalization, the LU factors, if they exist, are uniquely determined by A. Second, we may need to reorder (permute) the rows and/or columns of A before the factorization is possible. For example, it is impossible to write

\[ \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix} = \begin{pmatrix} l_{11} & 0 \\ l_{21} & l_{22} \end{pmatrix} \begin{pmatrix} 1 & u_{12} \\ 0 & 1 \end{pmatrix} . \]
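A library LU routine handles the row reordering automatically. In the sketch below (illustration only; note that SciPy's convention puts the unit diagonal on L rather than on U), the permutation returned for the 2 × 2 example above is exactly the row swap that makes the factorization possible.

```python
import numpy as np
from scipy.linalg import lu

A = np.array([[0.0, 1.0],
              [1.0, 1.0]])

# A = P @ L @ U, where P is a permutation matrix.
P, L, U = lu(A)
print(P)                            # swaps the two rows
print(np.allclose(A, P @ L @ U))    # True
```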

3 Condition number

Ill conditioning can be a serious problem in the numerical solution of problems in linear algebra. We take into account possible ill conditioning when we choose computational strategies. For example, the matrix exponential exp(A) (see exercise 12) can be computed using the eigenvectors and eigenvalues of A. We will see in Section 3.3 that the eigenvalue problem may be ill conditioned even when the problem of computing exp(A) is fine. In such cases we need a way to compute exp(A) that does not use the eigenvectors and eigenvalues of A.

Suppose the input data for a computation are the n numbers x = (x_1, . . . , x_n) and the desired outputs are the m numbers f(x) = (f_1(x), . . . , f_m(x)). The condition number (see Section ??) measures how small relative changes in x affect f. Using norms, the relative change in x is ‖∆x‖/‖x‖, and the relative change in f is ‖∆f‖/‖f‖, where ∆f = f(x + ∆x) − f(x). This condition number is

\[ \kappa(x) = \lim_{\epsilon \to 0} \; \max_{\|\Delta x\| = \epsilon} \; \left. \frac{\|f(x+\Delta x) - f(x)\|}{\|f(x)\|} \right/ \frac{\|\Delta x\|}{\|x\|} . \qquad (19) \]

Except for the maximization over directions ∆x with ‖∆x‖ = ε, this is the same as the earlier definition ??.

Still following Section ??, we express (19) in terms of derivatives of f. We let f′(x) represent the m × n Jacobian matrix of first partial derivatives of f, with entries

\[ f'_{jk}(x) = \frac{\partial f_j}{\partial x_k}(x) . \]

The first order multivariate Taylor series⁹ approximation is¹⁰

\[ f(x + \Delta x) = f(x) + f'(x)\, \Delta x + O\left( \|\Delta x\|^2 \right) . \qquad (20) \]

⁹See any good book on multivariate calculus.
¹⁰Here we think of ∆x as a column vector although we wrote it as a row vector before. The formula (20) approximates the m component column vector ∆f as the product of the m × n matrix f′(x) and the n component column vector ∆x.

If f is differentiable, the ratio in (19) is

\[ \frac{\|\Delta f\|}{\|\Delta x\|} \cdot \frac{\|x\|}{\|f\|} = \frac{\left\| f'(x)\,\Delta x + O\left( \|\Delta x\|^2 \right) \right\|}{\|\Delta x\|} \cdot \frac{\|x\|}{\|f\|} . \]

Only the first factor on the right depends on ∆x. In the limit ‖∆x‖ → 0, we may neglect the O(‖∆x‖) term. Once we do that, the condition ‖∆x‖ = ε is irrelevant. This leads to the matrix norm of f′(x):

\[ \lim_{\epsilon \to 0} \; \max_{\|\Delta x\| = \epsilon} \frac{\|\Delta f\|}{\|\Delta x\|} = \max_{\Delta x \neq 0} \frac{\|f'(x)\,\Delta x\|}{\|\Delta x\|} = \|f'(x)\| . \]

Altogether, we have

\[ \kappa(x) = \|f'(x)\| \cdot \frac{\|x\|}{\|f(x)\|} . \qquad (21) \]

This is a natural generalization of (??) to multivariate problems and norms.

Calculating estimates of ∆f as in (20) often is called perturbation theory, and usually involves implicit differentiation. One way to organize the calculations is to imagine a curve, x(s), with x(0) = x being the given data. Here s is an artificial parameter and x(s) should be defined for some range of s values including s = 0. For small s, the perturbation is

\[ \Delta x = x(s) - x(0) = x(s) - x \approx \dot{x}\, s . \]

Here and below, we use the dot to mean differentiation with respect to s evaluated at s = 0: ẋ = (dx/ds)(s = 0), and ẋ may be thought of as a virtual perturbation. Then ḟ = (d f(x(s))/ds)(s = 0), the corresponding virtual perturbation in f, satisfies

\[ \Delta f = f(x + \Delta x) - f(x) = \dot{f} \cdot s + O(s^2) . \]

Because of this and (20), we have

\[ \dot{f} = f'(x)\, \dot{x} , \qquad (22) \]

which is one of the forms of the chain rule for differentiation of multivariate functions. The strategy of perturbation theory is to use implicit differentiation to find the relation between ḟ and ẋ and use the result to identify f′.

3.1 Linear systems, direct estimates

We start with the condition number of calculating b = Au in terms of u with A fixed. This leads to a formula for the condition number of finding u from b (solving a system of linear equations) because u = A⁻¹b. By coincidence, the condition numbers of the forward problem of finding b from u, and the inverse problem of finding u from b, turn out to be the same. This is in part due to the artificial nature of the definitions.

The traditional definition of the condition number of the Au computation takes the worst case relative error not only over perturbations ∆u but also over vectors u:

\[ \kappa(A) = \max_{u \neq 0} \; \max_{\Delta u \neq 0} \; \left. \frac{\|A(u+\Delta u) - Au\|}{\|Au\|} \right/ \frac{\|\Delta u\|}{\|u\|} . \qquad (23) \]

Since A(u + ∆u) − Au = A∆u, and u and ∆u are independent variables, this is the same as

\[ \kappa(A) = \max_{u \neq 0} \frac{\|u\|}{\|Au\|} \cdot \max_{\Delta u \neq 0} \frac{\|A\,\Delta u\|}{\|\Delta u\|} . \qquad (24) \]

The second factor is ‖A‖. To evaluate the first factor, we suppose A⁻¹ exists.¹¹ Substituting Au = v, u = A⁻¹v, gives

\[ \max_{u \neq 0} \frac{\|u\|}{\|Au\|} = \max_{v \neq 0} \frac{\|A^{-1} v\|}{\|v\|} = \|A^{-1}\| . \]

Thus, (23) leads to

\[ \kappa(A) = \|A\| \, \|A^{-1}\| \qquad (25) \]

as the worst case condition number of the forward problem. Since (A⁻¹)⁻¹ = A, the formula (25) implies that κ(A⁻¹) = κ(A), so the condition number of the inverse problem is the same. The computation b = Au with

\[ A = \begin{pmatrix} 1000 & 0 \\ 0 & 10 \end{pmatrix} \]

illustrates this discussion. The error amplification relates ‖∆b‖/‖b‖ to ‖∆u‖/‖u‖. The worst case would make ‖b‖ small relative to ‖u‖ and ‖∆b‖ large relative to ‖∆u‖: amplify u the least and ∆u the most. This is achieved by taking u = (0, 1)^* so that Au = (0, 10)^* with amplification factor 10, and ∆u = (1, 0)^* with A∆u = (1000, 0)^* and amplification factor 1000. This makes ‖∆b‖/‖b‖ 100 times larger than ‖∆u‖/‖u‖. For the condition number of calculating u = A⁻¹b, the worst case is b = (1, 0)^* and ∆b = (0, 1)^*, which amplifies the error by the same factor of κ(A) = 100.
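The worst case amplification factor can be confirmed numerically (sketch added for illustration):

```python
import numpy as np

A = np.diag([1000.0, 10.0])
kappa = np.linalg.cond(A, 2)                     # = 100

u = np.array([0.0, 1.0])                         # amplified least by A
du = 1e-6 * np.array([1.0, 0.0])                 # amplified most by A
b, db = A @ u, A @ du

amplification = (np.linalg.norm(db) / np.linalg.norm(b)) / \
                (np.linalg.norm(du) / np.linalg.norm(u))
print(kappa, amplification)                      # both equal 100.0
```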

3.2 Linear systems, perturbation theory

If Au = b, we can study the dependence of u on A through perturbation theory. Suppose we have a curve A(s) with A(0) = A, a fixed b, and corresponding u(s)

¹¹See exercise 10 for the l² condition number of the u → Au problem with singular or non square A.

with u(0) = u. Implicit differentiation of A(s)u(s) = b gives ḃ = 0 because b is fixed, and then

\[ \dot{A} u + A \dot{u} = 0 , \quad \text{so} \quad \dot{u} = -A^{-1} \dot{A} u . \]

This corresponds to (22) in the general theory, with u̇ (instead of ḟ in the general theory) being the virtual perturbation of the computed quantity and Ȧ (instead of ẋ) being the virtual perturbation in the data. As in (22), this linear relation between the virtual perturbations leads to an approximately linear relation between actual perturbations:

\[ \Delta u \approx -A^{-1} \, \Delta A \, u \quad \text{(for small } \Delta A\text{)}. \]

Taking norms gives

\[ \|\Delta u\| \lesssim \|A^{-1}\| \, \|\Delta A\| \, \|u\| \quad \text{(small } \Delta A\text{)}, \qquad (26) \]

so

\[ \frac{\|\Delta u\|}{\|u\|} \lesssim \|A^{-1}\| \, \|A\| \cdot \frac{\|\Delta A\|}{\|A\|} . \qquad (27) \]

This shows that the condition number satisfies κ ≤ ‖A⁻¹‖ ‖A‖. The condition number is equal to ‖A⁻¹‖ ‖A‖ if the inequality (26) is (approximately, for small ∆A) sharp, which it is because we can take ∆A = εI and u to be a maximum stretch vector for A⁻¹.

We summarize with a few remarks. First, the condition number formula (25) applies to the problem of solving the linear system Au = b both when we consider perturbations in b and in A, though the derivations here are different. However, this is not the condition number in the strict sense of (19) and (21) because (25) assumes taking the worst case over a family of problems (varying b in this case), not just over all possible small perturbations in the data. Second, the formula (25) is independent of the size of A. Replacing A with cA leaves κ(A) unchanged. What matters is the ratio of the maximum to minimum stretch, as in (24).
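The first order prediction ∆u ≈ −A⁻¹ ∆A u is easy to test (sketch mine; the matrix and perturbation are random and the perturbation size is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 6
A = rng.standard_normal((n, n)) + n * np.eye(n)   # comfortably nonsingular
b = rng.standard_normal(n)
u = np.linalg.solve(A, b)

dA = 1e-6 * rng.standard_normal((n, n))           # small data perturbation
u_exact = np.linalg.solve(A + dA, b)

du_pred = -np.linalg.solve(A, dA @ u)             # first order: -A^{-1} dA u
print(np.linalg.norm(u_exact - u - du_pred))      # O(||dA||^2), tiny
print(np.linalg.norm(u_exact - u))                # O(||dA||), much larger
```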

3.3 Eigenvalues and eigenvectors

The perturbation theory of eigenvalues and eigenvectors is used throughout science and engineering. For symmetric or self adjoint matrices, it often is called Rayleigh-Schrödinger perturbation theory.¹² As before, we first find a linear relation between virtual perturbations using implicit differentiation. Let A(s) be a smooth curve of symmetric or self adjoint matrices. Suppose that A(s) r_k(s) = λ_k(s) r_k(s) and that, for small s, λ_k is a simple eigenvalue. Differentiating the eigenvalue relation using the product rule and setting s = 0 gives (leaving out the argument s = 0 everywhere)

\[ \dot{A} r_k + A \dot{r}_k = \dot{\lambda}_k r_k + \lambda_k \dot{r}_k . \qquad (28) \]

¹²Lord Rayleigh used it to study vibrational frequencies of plates and shells. Later Erwin Schrödinger used it to compute energies (which are eigenvalues) in quantum mechanics.

Multiplying this from the left by r_k^* and using the fact that r_k^* is a left eigenvector¹³ gives

\[ r_k^* \dot{A} r_k = \dot{\lambda}_k \, r_k^* r_k . \]

If r_k is normalized so that r_k^* r_k = ‖r_k‖²_{l²} = 1, then the right side is just λ̇_k. Trading virtual perturbations for actual small perturbations turns this into the famous formula

\[ \Delta\lambda_k = r_k^* \, \Delta A \, r_k + O\left( \|\Delta A\|^2 \right) . \qquad (29) \]

We can turn this into a condition number estimate, which means recasting it in terms of relative errors on both sides. The important observation is that ‖r_k‖_{l²} = 1, so ‖∆A r_k‖_{l²} ≤ ‖∆A‖_{l²} and finally

\[ \left| r_k^* \, \Delta A \, r_k \right| \le \|\Delta A\|_{l^2} . \]

This inequality is sharp because we can take ∆A = ε r_k r_k^*, which is an n × n matrix with (see exercise 7) ‖ε r_k r_k^*‖_{l²} = |ε|. Putting this into (29) gives the also sharp inequality,

\[ \left| \frac{\Delta\lambda_k}{\lambda_k} \right| \le \frac{\|A\|_{l^2}}{|\lambda_k|} \, \frac{\|\Delta A\|_{l^2}}{\|A\|_{l^2}} . \]

We can put this directly into the abstract condition number definition (19) to get the conditioning of λ_k:

\[ \kappa_k(A) = \frac{\|A\|_{l^2}}{|\lambda_k|} = \frac{|\lambda|_{\max}}{|\lambda_k|} . \qquad (30) \]

Here, κ_k(A) denotes the condition number of the problem of computing λ_k, which is a function of the matrix A, and ‖A‖_{l²} = |λ|_max refers to the eigenvalue of largest absolute value. The condition number formula (30) is essentially the best possible. Presumably λ_k depends in some way on all the entries of A and the perturbations due to roundoff will be on the order of the entries themselves, which are on the order of ‖A‖. Only if λ_k is very close to zero, by comparison with |λ_max|, will it be hard to compute with high relative accuracy. All of the other eigenvalue and eigenvector problems have much worse condition number difficulties.

In the nonsymmetric case, the eigenvectors need not be orthogonal and the eigenvector matrix R need not be well conditioned. For this reason, it is possible

13 ∗ ∗ ∗ ∗ ∗ Ark = λkrk ⇒ (Ark) = (λkrk) ⇒ rkA = λkrk since A = A and λk is real.

that l_k, which is a row of R⁻¹, is very large. Working from (31) as we did for the symmetric case leads to

\[ \left| \frac{\Delta\lambda_k}{\lambda_k} \right| \le \kappa_{LS}(R) \, \frac{\|A\|}{|\lambda_k|} \, \frac{\|\Delta A\|}{\|A\|} . \]

Here κ_LS(R) = ‖R⁻¹‖ ‖R‖ is the linear systems condition number of the right eigenvector matrix. The condition number is (again because the inequalities are sharp)

\[ \kappa_k(A) = \kappa_{LS}(R) \, \frac{\|A\|}{|\lambda_k|} . \qquad (32) \]

Since A is not symmetric, we cannot replace ‖A‖ by |λ|_max as we did for (30). In the symmetric case, the only reason for ill conditioning is that we are looking for a (relatively) tiny number. For nonsymmetric matrices, it is also possible that the eigenvector matrix is ill conditioned. It is possible to show that if a family of matrices approaches a matrix with a Jordan block, the condition number of R approaches infinity. For a symmetric or self adjoint matrix, R is orthogonal or unitary, so that ‖R‖_{l²} = ‖R*‖_{l²} = 1 and κ_LS(R) = 1.

The eigenvector perturbation theory uses the same ideas, with the extra trick of expanding the derivatives of the eigenvectors in terms of the eigenvectors themselves. We expand the virtual perturbation ṙ_k in terms of the eigenvectors r_j. Call the expansion coefficients m_kj, and we have

\[ \dot{r}_k = \sum_{j} m_{kj} \, r_j . \]

For the symmetric eigenvalue problem, if all the eigenvalues are distinct, the formula follows from multiplying (28) from the left by r_j^*:

\[ m_{kj} = \frac{r_j^* \dot{A} r_k}{\lambda_k - \lambda_j} , \]

so that

\[ \Delta r_k = \sum_{j \neq k} \frac{r_j^* \, \Delta A \, r_k}{\lambda_k - \lambda_j} \, r_j + O\left( \|\Delta A\|^2 \right) . \]

(The term j = k is omitted because m_kk = 0: r_k^* r_k = 1 for all s implies that r_k^* ṙ_k = 0.) This shows that the eigenvectors have condition number "issues" even when the eigenvalues are well conditioned, if the eigenvalues are close to each other. Since the eigenvectors are not uniquely determined when eigenvalues are equal, this seems plausible. The unsymmetric matrix perturbation formula is

\[ m_{kj} = \frac{l_j \dot{A} r_k}{\lambda_k - \lambda_j} . \]

Again, we have the potential problem of an ill conditioned eigenvector basis, combined with the possibility of close eigenvalues. The conclusion is that the eigenvector conditioning can be problematic, even though the eigenvalues are fine, for closely spaced eigenvalues.
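A numerical check of the symmetric perturbation formula (29) (sketch added here; it assumes the eigenvalues of the random test matrix are distinct and well separated compared with ‖∆A‖, which holds generically):

```python
import numpy as np

rng = np.random.default_rng(8)
B = rng.standard_normal((5, 5))
A = 0.5 * (B + B.T)                         # random symmetric matrix

lam, R = np.linalg.eigh(A)
k = 2
r = R[:, k]

E = rng.standard_normal((5, 5))
dA = 1e-6 * 0.5 * (E + E.T)                 # small symmetric perturbation

lam_new = np.linalg.eigvalsh(A + dA)
predicted = r @ dA @ r                      # first order formula (29)
actual = lam_new[k] - lam[k]
print(predicted, actual)                    # agree to O(||dA||^2)
```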

4 Software

4.1 Software for numerical linear algebra

There have been major government funded efforts to produce high quality software for numerical linear algebra. This culminated in the public domain software package LAPack. LAPack is a combination and extension of earlier packages Eispack, for solving eigenvalue problems, and Linpack, for solving systems of equations. Many of the high quality components of Eispack and Linpack also were incorporated into Matlab, which accounts for the high quality of Matlab linear algebra routines.

Our software advice is to use LAPack or Matlab software for computational linear algebra whenever possible. It is worth the effort to coax the LAPack make files to work in a particular computing environment. You may avoid wasting time coding algorithms that have been coded thousands of times before, or you may avoid suffering from the subtle bugs of the amateurish codes you got from next door that came from who knows where. The professional software often does it better, for example using balancing algorithms to improve condition numbers. The professional software also has clever condition number estimates that often are more sophisticated than the basic algorithms themselves.

4.2 Test condition numbers

Part of the error flag in a linear algebra computation is the condition number. Accurate computation of the condition number may be more expensive than the problem at hand. For example, computing ‖A‖ ‖A⁻¹‖ is several times more expensive than solving Ax = b. However, there are cheap heuristics that generally are reliable, at least for identifying severe ill conditioning. If the condition number is so large that all relative accuracy in the data is lost, the routine should return an error flag and stop. Using such condition number estimates makes the code slightly slower, but it makes the computed results much more trustworthy.
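A minimal sketch of this practice (my own; the threshold is arbitrary, and np.linalg.cond computes the condition number exactly rather than with the cheap LAPACK-style estimators described above):

```python
import numpy as np

def solve_checked(A, b, kappa_max=1e12):
    """Solve Ax = b, but refuse to return a meaningless answer."""
    kappa = np.linalg.cond(A)
    if kappa > kappa_max:
        raise ValueError(f"matrix is ill conditioned: cond(A) ~ {kappa:.1e}")
    return np.linalg.solve(A, b)

A = np.array([[1.0, 1.0],
              [1.0, 1.0 + 1e-15]])           # nearly singular
try:
    solve_checked(A, np.array([1.0, 2.0]))
except ValueError as err:
    print(err)
```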

5 Resources and further reading

If you need to review linear algebra, the Schaum's Outline review book on Linear Algebra may be useful. For a practical introduction to linear algebra written by a numerical analyst, try the book by Gilbert Strang. More theoretical treatments may be found in the book by Peter Lax or (still more theoretical) the one by Paul Halmos. An excellent discussion of condition number is given in the SIAM lecture notes of Lloyd N. Trefethen. Beresford Parlett has a nice but somewhat out of date book on the theory and computational methods for the symmetric eigenvalue problem. The Linear Algebra book by Peter Lax also has a beautiful discussion of eigenvalue perturbation theory and some of its applications. More applications may be found in the book Theory of Sound by Lord Rayleigh (reprinted by Dover Press) and in any book on quantum

mechanics. I do not know an elementary book that covers perturbation theory for the nonsymmetric eigenvalue problem in a simple way.

The software repository Netlib, http://netlib.org, is a source for LAPack. Other sources of linear algebra software, including Mathematica and Numerical Recipes, are not recommended. Netlib also has some of the best available software for solving sparse linear algebra problems.

6 Exercises

1. Let L be the differentiation operator that takes P_3 to P_2 described in Section 2.2. Let f_k = H_k(x) for k = 0, 1, 2, 3 be the Hermite polynomial basis of P_3 and g_k = H_k(x) for k = 0, 1, 2 be the Hermite basis of P_2. What is the matrix, A, that represents this L in these bases?

2. Suppose L is a linear transformation from V to V and that f_1, ..., f_n, and g_1, ..., g_n are two bases of V. Any u ∈ V may be written in a unique way as u = Σ_{k=1}^n v_k f_k, or as u = Σ_{k=1}^n w_k g_k. There is an n × n matrix, R, that relates the f_k expansion coefficients v_k to the g_k coefficients w_k by v_j = Σ_{k=1}^n r_jk w_k. If v and w are the column vectors with components v_k and w_k respectively, then v = Rw. Let A represent L in the f_k basis and B represent L in the g_k basis.

(a) Show that B = R⁻¹AR.

(b) For V = P_3, and f_k = x^k, and g_k = H_k, find R.

(c) Let L be the linear transformation Lp = q with q(x) = ∂_x(x p(x)). Find the matrix, A, that represents L in the monomial basis f_k.

(d) Find the matrix, B, that represents L in the Hermite polynomial basis H_k.

(e) Multiply the matrices to check explicitly that B = R⁻¹AR in this case.

3. If A is an n × m matrix and B is an m × l matrix, then AB is an n × l matrix. Show that (AB)* = B*A*. Note that the incorrect suggestion A*B* in general is not compatible for matrix multiplication.

4. Let V = R^n and M be an n × n real matrix. This exercise shows that ‖u‖ = (u*Mu)^{1/2} is a vector norm whenever M is positive definite (defined below).

(a) Show that u*Mu = u*M*u = u* ( ½ (M + M*) ) u for all u ∈ V. This means that as long as we consider functions of the form f(u) = u*Mu, we may assume M is symmetric. For the rest of this question, assume M is symmetric. Hint: u*Mu is a 1 × 1 matrix and therefore equal to its transpose.

(b) Show that the function ‖u‖ = (u*Mu)^{1/2} is homogeneous: ‖xu‖ = |x| ‖u‖.

(c) We say M is positive definite if $f(u) > 0$ whenever $u \neq 0$. Show that if M is positive definite, then $\|u\| \geq 0$ for all u and $\|u\| = 0$ only for u = 0.

(d) Show that if M is symmetric and positive definite (SPD), then $|u^*Mv| \leq \|u\| \, \|v\|$. This is the Cauchy Schwarz inequality. Hint (a famous old trick): $\phi(t) = (u + tv)^*M(u + tv)$ is a quadratic function of t that is non negative for all t if M is positive definite. The Cauchy Schwarz inequality follows from requiring that the minimum value of $\phi$ is not negative, assuming $M^* = M$.

(e) Use the Cauchy Schwarz inequality to verify the triangle inequality in its squared form $\|u + v\|^2 \leq \|u\|^2 + 2\,\|u\|\,\|v\| + \|v\|^2$.

(f) Show that if M = I then $\|u\|$ is the $l^2$ norm of u.
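A brief Matlab sketch of parts (c) and (d), using a randomly generated SPD matrix; the matrix M and the vectors below are arbitrary test data, not part of the exercise.

% Sketch: the M-norm and the Cauchy Schwarz inequality for a random SPD matrix.
n = 6;
B = randn(n,n);
M = B'*B + eye(n);             % symmetric positive definite by construction
u = randn(n,1);  v = randn(n,1);
nrm = @(w) sqrt(w'*M*w);       % the norm ||w|| = (w'*M*w)^(1/2)
fprintf('|u''*M*v| = %g <= ||u||*||v|| = %g\n', abs(u'*M*v), nrm(u)*nrm(v));
fprintf('||u+v||   = %g <= ||u|| + ||v|| = %g\n', nrm(u+v), nrm(u)+nrm(v));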

5. Verify that $\|p\|$ defined on $V = P_3$ is a norm as long as a < b.

6. Suppose A is the $n \times n$ matrix that represents a linear transformation from $R^n$ to $R^n$ in the standard basis $e_k$. Let B be the matrix of the same linear transformation in the scaled basis (3).

(a) Find a formula for the entries $b_{jk}$ in terms of the $a_{jk}$ and $u_k$.

(b) Find a matrix formula for B in terms of A and the diagonal scaling matrix $W = \mathrm{diag}(u_k)$ (defined by $w_{kk} = u_k$, $w_{jk} = 0$ if $j \neq k$) and $W^{-1}$.

7. Show that if $u \in R^m$ and $v \in R^n$ and $A = uv^*$, then $\|A\|_{l^2} = \|u\|_{l^2} \cdot \|v\|_{l^2}$. Hint: note that $Aw = mu$ where m is a scalar, so $\|Aw\|_{l^2} = |m| \cdot \|u\|_{l^2}$. Also, be aware of the Cauchy Schwarz inequality: $|v^*w| \leq \|v\|_{l^2} \|w\|_{l^2}$. (A quick numerical check of this identity and of exercise 8 is sketched after exercise 8.)

8. Suppose that A is an $n \times n$ invertible matrix. Show that

$$\left\|A^{-1}\right\| = \max_{u \neq 0} \frac{\|u\|}{\|Au\|} = \left( \min_{u \neq 0} \frac{\|Au\|}{\|u\|} \right)^{-1} .$$
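The following Matlab sketch checks both identities with random data; norm applied to a matrix returns the $l^2$ operator norm, and svd returns the singular values.

% Sketch: check exercise 7 (rank one matrix) and exercise 8 (norm of the inverse).
m = 5;  n = 3;
u = randn(m,1);  v = randn(n,1);
A1 = u*v';                                   % rank one matrix
fprintf('||u*v''|| = %g,  ||u||*||v|| = %g\n', norm(A1), norm(u)*norm(v));
A2 = randn(n,n) + 5*eye(n);                  % an (almost surely) invertible test matrix
fprintf('1/||inv(A)|| = %g,  min ||Au||/||u|| = %g\n', 1/norm(inv(A2)), min(svd(A2)));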

9. The symmetric part of the real $n \times n$ matrix A is $M = \frac{1}{2}(A + A^*)$. Show that $\nabla\left(\frac{1}{2} x^*Ax\right) = Mx$.

10. The informal condition number of the problem of computing the action of A is
$$\kappa(A) = \max_{x \neq 0,\ \Delta x \neq 0} \frac{\|A(x + \Delta x) - Ax\| \,/\, \|Ax\|}{\|(x + \Delta x) - x\| \,/\, \|x\|} \ .$$
Alternatively, it is the sharp constant in the estimate

$$\frac{\|A(x + \Delta x) - Ax\|}{\|Ax\|} \leq C \cdot \frac{\|(x + \Delta x) - x\|}{\|x\|} \ ,$$

which bounds the worst case relative change in the answer in terms of the relative change in the data. Show that for the $l^2$ norm,

$$\kappa = \sigma_{\max} / \sigma_{\min} \ ,$$

the ratio of the largest to smallest singular value of A.
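In Matlab this identity can be checked directly for a random example, since cond uses the singular values.

% Sketch: the l2 condition number equals the ratio of extreme singular values.
n = 8;
A = randn(n,n);                      % a random test matrix
s = svd(A);                          % singular values, sorted in decreasing order
fprintf('sigma_max/sigma_min = %g,  cond(A) = %g\n', s(1)/s(end), cond(A));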

11. We wish to solve the boundary value problem for the differential equation
$$\frac{1}{2} \partial_x^2 u = f(x) \quad \text{for } 0 < x < 1, \qquad (33)$$
with boundary conditions

$$u(0) = u(1) = 0 \, . \qquad (34)$$

We discretize the interval [0, 1] using a uniform grid of points $x_j = j\Delta x$ with $n\Delta x = 1$. The n − 1 unknowns $U_j$ are approximations to $u(x_j)$. If we use a second order approximation to $\frac{1}{2}\partial_x^2 u$, we get the discrete equations
$$\frac{1}{2} \, \frac{1}{\Delta x^2} \left( U_{j+1} - 2U_j + U_{j-1} \right) = f(x_j) = F_j \, . \qquad (35)$$

Together with the boundary conditions $U_0 = U_n = 0$, this is a system of n − 1 linear equations for the vector $U = (U_1, \ldots, U_{n-1})^*$ that we write as AU = F.

(a) Check that there are n − 1 distinct eigenvectors of A having the form $r_{kj} = \sin(k\pi x_j)$. Here $r_{kj}$ is the jth component of eigenvector $r_k$. Note that $r_{k,j+1} = \sin(k\pi x_{j+1}) = \sin(k\pi x_j + k\pi \Delta x)$, which can be evaluated in terms of $r_{kj}$ using trigonometric identities.

(b) Use the eigenvalue information from part (a) to show that $\|A^{-1}\| \to 2/\pi^2$ as $n \to \infty$ and $\kappa(A) = O(n^2)$ (in the informal sense) as $n \to \infty$. All norms are $l^2$.

(c) Suppose $\widetilde{U}_j = u(x_j)$ where u(x) is the exact but unknown solution of (33), (34). Show that if u(x) is smooth then the residual$^{14}$, $R = A\widetilde{U} - F$, satisfies $\|R\| = O(\Delta x^2) = O(1/n^2)$. For this to be true we have to adjust the definition of $\|U\|$ to be consistent with the $L^2$ integral $\|u\|_{L^2}^2 = \int_{x=0}^{1} u^2(x)\,dx$. The discrete approximation is $\|U\|_{l^2}^2 = \Delta x \sum_{j=1}^{n} U_j^2$.

(d) Show that $A(U - \widetilde{U}) = -R$. Use parts (b) and (c) to show that $\|U - \widetilde{U}\| = O(\Delta x^2)$ (with the $\Delta x$-modified norm $\|\cdot\|$).

$^{14}$ Residual refers to the extent to which equations are not satisfied. Here, the equation is AU = F, which $\widetilde{U}$ does not satisfy, so $R = A\widetilde{U} - F$ is the residual.

(e) (harder) Create a fourth order five point central difference approximation to $\partial_x^2 u$. You can do this using Richardson extrapolation from the second order three point formula. Use this to find an A so that solving AU = F leads to a fourth order accurate U. The hard part is what to do at j = 1 and j = n − 1. At j = 1 the five point approximation to $\partial_x^2 u$ involves $U_0$ and $U_{-1}$. It is fine to take $U_0 = u(0) = 0$. Is it OK to take $U_{-1} = -U_1$?

(f) Write a program in Matlab to solve AU = F for the second order method. The matrix A is symmetric and tridiagonal. Use the Matlab matrix operation appropriate for symmetric positive definite tridiagonal matrices. Do a convergence study to show that the results are second order accurate. (A starting point for assembling and solving the system is sketched after this exercise.)

(g) (extra credit) Program the fourth order method and check that the results are fourth order when $f(x) = \sin(\pi x)$ but not when $f(x) = \max(0, .15 - (x - .5)^2)$. Why are the results different?
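The following Matlab fragment is only a possible starting point for part (f); the sample right hand side, the exact solution used for checking, and the choice of error norm are illustrative assumptions, and the convergence study itself is left as stated in the exercise.

% Sketch: assemble the tridiagonal system (35) with U_0 = U_n = 0 and solve it.
n  = 100;  dx = 1/n;
x  = (1:n-1)'*dx;                            % interior grid points x_j = j*dx
F  = sin(pi*x);                              % a sample right hand side (assumption)
e  = ones(n-1,1);
A  = spdiags([e, -2*e, e], -1:1, n-1, n-1) / (2*dx^2);   % (1/2)(U_{j+1}-2U_j+U_{j-1})/dx^2
U  = A \ F;                                  % sparse symmetric tridiagonal solve
uex = -2*sin(pi*x)/pi^2;                     % exact solution of (1/2)u'' = sin(pi*x), u(0)=u(1)=0
fprintf('n = %4d,  max error = %e\n', n, max(abs(U - uex)));

Repeating this for several values of n and checking that the error decreases by a factor of about four each time n is doubled is one way to carry out the convergence study.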

12. This exercise explores conditioning of the nonsymmetric eigenvalue problem. It shows that although the problem of computing the fundamental solution is well conditioned, computing it using eigenvalues and eigenvectors can be an unstable algorithm, because the problem of computing eigenvalues and eigenvectors is ill conditioned. For parameters $0 < \lambda < \mu$, there is a Markov chain transition rate matrix, A, whose entries are $a_{jk} = 0$ if $|j - k| > 1$. If $1 \leq j \leq n - 2$, then $a_{j,j-1} = \mu$, $a_{jj} = -(\lambda + \mu)$, and $a_{j,j+1} = \lambda$ (taking j and k to run from 0 to n − 1). The other cases are $a_{00} = -\lambda$, $a_{01} = \lambda$, $a_{n-1,n-1} = -\mu$, and $a_{n-1,n-2} = \mu$. This matrix describes a continuous time Markov process with a random walker whose position at time t is the integer X(t). Transitions $X \to X + 1$ happen with rate $\lambda$ and transitions $X \to X - 1$ have rate $\mu$. The transitions $0 \to -1$ and $n - 1 \to n$ are not allowed. This is the M/M/1 queue used in operations research to model queues (X(t) is the number of customers in the queue at time t, $\lambda$ is the rate of arrival of new customers, and $\mu$ is the service rate; a customer arrival is an $X \to X + 1$ transition). For each t, we can consider the row vector $p(t) = (p_0(t), \ldots, p_{n-1}(t))$ where $p_j(t) = \mathrm{Prob}(X(t) = j)$. These probabilities satisfy the differential equation $\dot{p} = \frac{d}{dt} p = pA$. The solution can be written in terms of the fundamental solution, S(t), which is an $n \times n$ matrix that satisfies $\dot{S} = SA$, $S(0) = I$.

(a) Show that if $\dot{S} = SA$, $S(0) = I$, then $p(t) = p(0)S(t)$.

(b) The matrix exponential may be defined through the Taylor series $\exp(B) = \sum_{k=0}^{\infty} \frac{1}{k!} B^k$. Use matrix norms and the fact that $\|B^k\| \leq \|B\|^k$ to show that the infinite sum of matrices converges.

(c) Show that the fundamental solution is given by $S(t) = \exp(tA)$. To do this, it is enough to show that $\exp(tA)$ satisfies the differential equation $\frac{d}{dt} \exp(tA) = \exp(tA)\,A$ using the infinite series, and show $\exp(0 \cdot A) = I$.

(d) Suppose $A = R\Lambda L$ is the eigenvalue and eigenvector decomposition of A, with $\Lambda$ diagonal and $L = R^{-1}$. Show that $\exp(tA) = R \exp(t\Lambda) L$, and that $\exp(t\Lambda)$ is the obvious diagonal matrix.

(e) Use the Matlab function [R,Lam] = eig(A); to calculate the eigenvalues and right eigenvector matrix of A. Let $r_k$ be the kth column of R. For $k = 1, \ldots, n$, print $r_k$, $Ar_k$, $\lambda_k r_k$, and $\|\lambda_k r_k - Ar_k\|$ (you choose the norm). Mathematically, one of the eigenvectors is a multiple of the vector $\mathbf{1}$ defined in part (h). The corresponding eigenvalue is $\lambda = 0$. The computed eigenvalue is not exactly zero. Take n = 4 for this, but do not hard wire n = 4 into the Matlab code.

(f) Let $L = R^{-1}$, which can be computed in Matlab using L=R^(-1);. Let $l_k$ be the kth row of L; check that the $l_k$ are left eigenvectors of A as in part (e). Corresponding to $\lambda = 0$ is a left eigenvector that is a multiple of $p_\infty$ from part (h). Check this.

(g) Write a program in Matlab to calculate S(t) using the eigenvalues and eigenvectors of A as above. Compare the results to those obtained using the Matlab built in function S = expm(t*A);. Use the values $\lambda = 1$, $\mu = 4$, $t = 1$, and n ranging from n = 4 to n = 80. Compare the two computed $\widehat{S}(t)$ (one using eigenvalues, the other just using expm) using the $l^1$ matrix norm. Use the Matlab routine cond(R) to compute the condition number of the eigenvector matrix, R. Print three columns of numbers: n, error, condition number. Comment on the quantitative relation between the error and the condition number. (A starting point for building A and comparing the two methods is sketched after this exercise.)

(h) Here we figure out which of the answers is correct. To do this, use the known fact that $\lim_{t \to \infty} S(t) = S_\infty$ has the simple form $S_\infty = \mathbf{1} p_\infty$, where $\mathbf{1}$ is the column vector with all ones, and $p_\infty$ is the row vector with $p_{\infty,j} = \frac{1 - r}{1 - r^n}\, r^j$, with $r = \lambda/\mu$. Take $t = 3n$ (which is close enough to $t = \infty$ for this purpose) and the same values of n and see which version of S(t) is correct. What can you say about the stability of computing the matrix exponential using the ill conditioned eigenvalue/eigenvector problem?
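The sketch below shows one way to set up parts (e)-(g): it builds the rate matrix A, computes S(t) from the eigenvalue decomposition, and compares with expm. It is a skeleton only; the loop over n, the printed table, and the comparison with $S_\infty$ in part (h) are left as stated in the exercise.

% Sketch: M/M/1 rate matrix; S(t) from eigenvectors versus expm (exercise 12).
lambda = 1;  mu = 4;  t = 1;  n = 20;
A = zeros(n,n);
for j = 1:n-1                       % Matlab indices 1..n correspond to states 0..n-1
    A(j,j+1) = lambda;              % X -> X+1 at rate lambda
    A(j+1,j) = mu;                  % X -> X-1 at rate mu
end
A = A - diag(sum(A,2));             % diagonal entries make every row sum to zero
[R,Lam] = eig(A);                   % right eigenvectors (columns of R) and eigenvalues
L = R^(-1);                         % rows of L are left eigenvectors
S_eig  = R * diag(exp(t*diag(Lam))) * L;    % S(t) = R exp(t*Lambda) L
S_expm = expm(t*A);                 % Matlab's built in matrix exponential
fprintf('n = %3d,  difference = %e,  cond(R) = %e\n', ...
        n, norm(S_eig - S_expm, 1), cond(R));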

13. This exercise explores eigenvalue and eigenvector perturbation theory for the matrix A defined in exercise 12. Let B be the $n \times n$ matrix with $b_{jk} = 0$ for all (j, k) except $b_{00} = -1$ and $b_{1,n-1} = 1$ (as in exercise 12, indices run from j = 0 to j = n − 1). Define $A(s) = A + sB$, so that $A(0) = A$ and $\frac{dA(s)}{ds} = B$ when s = 0.

(a) For n = 20, print the eigenvalues of A(s) for s = 0 and s = .1. What does this say about the condition number of the eigenvalue/eigenvector problem? All the eigenvalues of A, which is real and tridiagonal, are real$^{15}$, but A(s = .1) is not tridiagonal and its eigenvalues are not real.

$^{15}$ It is easy to see that if a real tridiagonal matrix has positive off diagonal entries (as A does), then there is a diagonal matrix, W, so that $WAW^{-1}$ is symmetric. Therefore, A has the same eigenvalues as the symmetric matrix $WAW^{-1}$.

(b) Use first order eigenvalue perturbation theory to calculate $\dot{\lambda}_k = \frac{d}{ds}\lambda_k$ at s = 0. What size s do you need for $\Delta\lambda_k$ to be accurately approximated by $s\dot{\lambda}_k$? Try n = 5 and n = 20. Note that first order perturbation theory always predicts that eigenvalues stay real, so s = .1 is much too large for n = 20. (A sketch of the first order formula in Matlab follows.)
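For part (b), a standard first order formula (valid when the eigenvalues are simple and the left eigenvectors are normalized by $L = R^{-1}$, so that $l_k r_k = 1$) is $\dot{\lambda}_k = l_k B r_k$. The Matlab sketch below applies it to the matrices of exercises 12 and 13; the pairing of perturbed and unperturbed eigenvalues by real part is a rough device for illustration only.

% Sketch: first order perturbation d(lambda_k)/ds = l_k*B*r_k for A(s) = A + s*B.
n = 20;  lambda = 1;  mu = 4;  s = 1e-3;
A = zeros(n,n);
for j = 1:n-1
    A(j,j+1) = lambda;  A(j+1,j) = mu;
end
A = A - diag(sum(A,2));                      % the rate matrix of exercise 12
B = zeros(n,n);  B(1,1) = -1;  B(2,n) = 1;   % b_00 = -1, b_{1,n-1} = 1 (0-based indices)
[R,Lam] = eig(A);   L = R^(-1);
lam      = diag(Lam);                        % unperturbed eigenvalues
lam_dot  = diag(L*B*R);                      % k-th entry is l_k*B*r_k = d(lambda_k)/ds at s = 0
lam_pert = eig(A + s*B);                     % eigenvalues of the perturbed matrix
[~,i1] = sort(real(lam));  [~,i2] = sort(real(lam_pert));
disp([lam(i1) + s*lam_dot(i1), lam_pert(i2)]);   % first order prediction vs. computed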
