[email protected], Thomas Finley

A subspace is a set S ⊆ R^n such that 0 ∈ S and ∀ x, y ∈ S, α, β ∈ R, αx + βy ∈ S.
x ∈ R^n is a linear combination of v_1, ..., v_k if ∃ β_1, ..., β_k ∈ R such that x = β_1 v_1 + ... + β_k v_k.
The span of v_1, ..., v_k is the set of all vectors in R^n that are linear combinations of v_1, ..., v_k.
A basis B of subspace S, B = {v_1, ..., v_k} ⊂ S, has Span(B) = S with all v_i linearly independent.
The dimension of S is |B| for a basis B of S. For subspaces S, T with S ⊆ T, dim(S) ≤ dim(T), and further if dim(S) = dim(T), then S = T.
A linear transformation T : R^n → R^m has, ∀ x, y ∈ R^n, α, β ∈ R, T(αx + βy) = αT(x) + βT(y). Further, ∃ A ∈ R^{m×n} such that ∀ x, T(x) ≡ Ax.
For two linear transformations T : R^n → R^m, S : R^m → R^p, S ∘ T ≡ S(T(x)) is a linear transformation. (T(x) ≡ Ax) ∧ (S(y) ≡ By) ⇒ (S ∘ T)(x) ≡ BAx.
A matrix's row space is the span of its rows, its column space or range is the span of its columns, and its rank is the dimension of either of these spaces.
For A ∈ R^{m×n}, rank(A) ≤ min(m, n). A has full row (or column) rank if rank(A) = m (or n).
A diagonal matrix D ∈ R^{n×n} has d_{j,k} = 0 for j ≠ k. The identity matrix I is diagonal with i_{j,j} = 1.
The upper (or lower) bandwidth of A is max |i − j| among i, j with i ≤ j (or i ≥ j) such that a_{i,j} ≠ 0. A matrix with lower bandwidth 1 is upper Hessenberg.
For A, B ∈ R^{n×n}, B is A's inverse if AB = BA = I. If such a B exists, A is invertible or nonsingular, and B = A^{-1}. The inverse of A is A^{-1} = [x_1, ..., x_n] where Ax_i = e_i.
For A ∈ R^{n×n} the following are equivalent: A is nonsingular, rank(A) = n, Ax = b is solvable for any b, Ax = 0 iff x = 0.
The inner product of x, y ∈ R^n is x^T y = Σ_{i=1}^n x_i y_i. Vectors x, y are orthogonal if x^T y = 0.
The nullspace or kernel of A ∈ R^{m×n} is {x ∈ R^n : Ax = 0}.
For A ∈ R^{m×n}, Range(A) and Nullspace(A^T) are orthogonal complements, i.e., x ∈ Range(A), y ∈ Nullspace(A^T) ⇒ x^T y = 0. For every p ∈ R^m there exist unique x and y from these two spaces so that p = x + y.
For a permutation matrix P ∈ R^{n×n}, PA permutes the rows of A, AP the columns of A. P^{-1} = P^T.

Norms
A vector norm ||·|| : R^n → R satisfies:
1. ||x|| ≥ 0, and ||x|| = 0 ⇔ x = 0.
2. ||γx|| = |γ| · ||x|| for all γ ∈ R and all x ∈ R^n.
3. ||x + y|| ≤ ||x|| + ||y|| for all x, y ∈ R^n.
Common norms include:
1. ||x||_1 = |x_1| + |x_2| + ... + |x_n|
2. ||x||_2 = sqrt(x_1^2 + x_2^2 + ... + x_n^2)
3. ||x||_∞ = lim_{p→∞} (|x_1|^p + ... + |x_n|^p)^{1/p} = max_{i=1..n} |x_i|
An induced norm is ||A|| = sup_{x≠0} ||Ax|| / ||x||. It satisfies the three properties of norms.
∀ x ∈ R^n, A ∈ R^{m×n}, ||Ax|| ≤ ||A|| ||x||.
||AB|| ≤ ||A|| ||B||, called submultiplicativity.
a^T b ≤ ||a||_2 ||b||_2, called the Cauchy-Schwarz inequality.
1. ||A||_∞ = max_{i=1..m} Σ_{j=1}^n |a_{i,j}| (max row sum).
2. ||A||_1 = max_{j=1..n} Σ_{i=1}^m |a_{i,j}| (max column sum).
3. ||A||_2 is hard: it takes O(n^3), not O(n^2), operations.
4. ||A||_F = sqrt(Σ_{i=1}^m Σ_{j=1}^n a_{i,j}^2). ||·||_F often replaces ||·||_2.
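A minimal numpy sketch (random test matrices chosen only for illustration) checking the max-row-sum and max-column-sum formulas above against the library's induced norms, and the submultiplicativity bound in the 2-norm.

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))

# Max row sum and max column sum, per the formulas above.
inf_norm = np.abs(A).sum(axis=1).max()   # ||A||_inf
one_norm = np.abs(A).sum(axis=0).max()   # ||A||_1

assert np.isclose(inf_norm, np.linalg.norm(A, np.inf))
assert np.isclose(one_norm, np.linalg.norm(A, 1))

# Submultiplicativity: ||AB||_2 <= ||A||_2 ||B||_2.
B = rng.standard_normal((3, 5))
assert np.linalg.norm(A @ B, 2) <= np.linalg.norm(A, 2) * np.linalg.norm(B, 2) + 1e-12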
Numerical Stability
Six sources of error in scientific computing: modeling errors, measurement or data errors, blunders, discretization or truncation errors, convergence tolerance, and rounding errors.
A floating point number is ±d_1.d_2d_3...d_t × β^e (sign, mantissa, base, exponent). For single precision t = 24, e ∈ {−126, ..., 127}; for double precision t = 53, e ∈ {−1022, ..., 1023}.
The relative error in x̂ approximating x is |x̂ − x| / |x|.
Unit roundoff or machine epsilon is ε_mach = β^{−t+1}. Arithmetic operations have relative error bounded by ε_mach.
E.g., consider z = x − y with input x, y. This program has three roundoff errors: ẑ = ((1 + δ_1)x − (1 + δ_2)y)(1 + δ_3), where δ_1, δ_2, δ_3 ∈ [−ε_mach, ε_mach]. Then
|z − ẑ| / |z| = |(δ_1 + δ_3)x − (δ_2 + δ_3)y + O(ε_mach^2)| / |x − y|.
The bad case is δ_1 = ε_mach, δ_2 = −ε_mach, δ_3 = 0: |z − ẑ| / |z| = ε_mach (|x| + |y|) / |x − y|. Inaccuracy results if |x| + |y| ≫ |x − y|, called catastrophic cancellation.

Conditioning & Backwards Stability
A problem instance is ill conditioned if the solution is sensitive to perturbations of the data. For example, sin 1 is well conditioned, but sin 12392193 is ill conditioned.
Suppose we perturb Ax = b to (A + E)x̂ = b + e where ||E||/||A|| ≤ δ and ||e||/||b|| ≤ δ. Then ||x̂ − x||/||x|| ≤ 2δκ(A) + O(δ^2), where κ(A) = ||A|| ||A^{-1}|| is the condition number of A.
1. ∀ A ∈ R^{n×n}, κ(A) ≥ 1.
2. κ(I) = 1.
3. If γ ≠ 0, κ(γA) = κ(A).
4. For diagonal D and all p, ||D||_p = max_{i=1..n} |d_{ii}|. So κ(D) = max_{i=1..n} |d_{ii}| / min_{i=1..n} |d_{ii}|.
If κ(A) ≥ 1/ε_mach, A may as well be singular.
An algorithm is backwards stable if in the presence of roundoff error it returns the exact solution to a nearby problem instance.

Basic Linear Algebra Subroutines
0. Scalar ops, like sqrt(x^2 + y^2). O(1) flops, O(1) data.
1. Vector ops, like y = ax + y. O(n) flops, O(n) data.
2. Matrix-vector ops, like the rank-one update A = A + xy^T. O(n^2) flops, O(n^2) data.
3. Matrix-matrix ops, like C = C + AB. O(n^3) flops, O(n^2) data.
Use the highest BLAS level possible. Operators are architecture tuned, e.g., data processed in cache-sized bites.
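A small numpy sketch (values invented to exaggerate the effect) of catastrophic cancellation: subtracting two nearly equal numbers loses most of the significant digits, roughly as the worst-case bound above predicts.

import numpy as np

x = 1.0 + 1e-12          # two nearby values; the true difference is 1e-12
y = 1.0
z_true = 1e-12

z_hat = np.float64(x) - np.float64(y)           # computed in double precision
rel_err = abs(z_hat - z_true) / abs(z_true)

# eps_mach * (|x| + |y|) / |x - y| is the worst-case bound from above.
bound = np.finfo(np.float64).eps * (abs(x) + abs(y)) / abs(x - y)
print(f"relative error ~ {rel_err:.2e}, worst-case bound ~ {bound:.2e}")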
GE produces a factorization A = LU; GEPP produces PA = LU.
Plain GE
1: for k = 1 to n − 1 do
2:   if a_kk = 0 then stop
3:   ℓ_{k+1:n,k} = a_{k+1:n,k} / a_kk
4:   a_{k+1:n,k:n} = a_{k+1:n,k:n} − ℓ_{k+1:n,k} a_{k,k:n}
5: end for
GE with Partial Pivoting
1: for k = 1 to n − 1 do
2:   γ = argmax_{i ∈ {k,...,n}} |a_ik|
3:   a_{[γ,k],k:n} = a_{[k,γ],k:n}   (swap rows k and γ)
4:   ℓ_{[γ,k],1:k−1} = ℓ_{[k,γ],1:k−1}
5:   p_k = γ
6:   ℓ_{k+1:n,k} = a_{k+1:n,k} / a_kk
7:   a_{k+1:n,k:n} = a_{k+1:n,k:n} − ℓ_{k+1:n,k} a_{k,k:n}
8: end for
Backward Substitution
1: x = zeros(n, 1)
2: for j = n to 1 do
3:   x_j = (w_j − u_{j,j+1:n} x_{j+1:n}) / u_{j,j}
4: end for
To solve Ax = b, factor A = LU (or PA = LU), solve Lw = b (or Lw = Pb) for w using forward substitution, then solve Ux = w for x using backward substitution.
The complexity of GE and GEPP is (2/3)n^3 + O(n^2) flops. GEPP encounters an exact 0 pivot iff A is singular.
For banded A, L + U has the same bandwidths as A.
GEPP solves Ax = b by returning x̂ where (A + E)x̂ = b. It is backwards stable if ||E||_∞ / ||A||_∞ ≤ O(ε_mach). With GEPP, ||E||_∞ / ||A||_∞ ≤ c_n ε_mach + O(ε_mach^2), where c_n is worst case exponential in n, but in practice almost always a low order polynomial.
Combining stability and conditioning analysis yields ||x̂ − x|| / ||x|| ≤ c_n · κ(A) ε_mach + O(ε_mach^2).
The determinant det : R^{n×n} → R satisfies:
1. det(AB) = det(A) det(B).
2. det(A) = 0 iff A is singular.
3. det(A) = det(A^T).
4. det(L) = ℓ_{1,1} ℓ_{2,2} ... ℓ_{n,n} for triangular L.
To compute det(A), factor A = PLU. det(P) = (−1)^s where P performs s swaps, and det(L) = 1. When calculating det(U), beware of overflow!

Positive Definite, A = LDL^T
A ∈ R^{n×n} is positive definite (PD) (or semidefinite (PSD)) if x^T Ax > 0 (or x^T Ax ≥ 0) for all x ≠ 0.
When LU-factorizing symmetric A, the result is A = LDL^T; L is unit lower triangular, D is diagonal. A is SPD iff D has all positive entries. The Cholesky factorization is A = LDL^T = LD^{1/2}D^{1/2}L^T = GG^T. It can be computed directly in n^3/3 + O(n^2) flops. If G's diagonal is positive, A is SPD.
To solve Ax = b for SPD A, factor A = GG^T, solve Gw = b by forward substitution, then solve G^T x = w by backward substitution, which takes n^3/3 + O(n^2) flops.
For A ∈ R^{m×n}, if rank(A) = n, then A^T A is SPD.
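A short sketch (assuming SciPy is available; the SPD test matrix is invented for illustration) of the Cholesky-based solve described above: factor A = GG^T, then one forward and one backward triangular solve.

import numpy as np
from scipy.linalg import cholesky, solve_triangular

rng = np.random.default_rng(1)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5 * np.eye(5)        # symmetric plus a positive shift, so SPD
b = rng.standard_normal(5)

G = cholesky(A, lower=True)                    # A = G G^T, positive diagonal
w = solve_triangular(G, b, lower=True)         # forward substitution: G w = b
x = solve_triangular(G.T, w, lower=False)      # backward substitution: G^T x = w

assert np.allclose(A @ x, b)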
Singular Value Decomposition
For any A ∈ R^{m×n}, we can express A = UΣV^T such that U ∈ R^{m×m} and V ∈ R^{n×n} are orthogonal, and Σ = diag(σ_1, ..., σ_p) ∈ R^{m×n} where p = min(m, n) and σ_1 ≥ σ_2 ≥ ... ≥ σ_p ≥ 0. The σ_i are singular values.
1. Matrix 2-norm: ||A||_2 = σ_1.
2. The condition number κ_2(A) = ||A||_2 ||A^{-1}||_2 = σ_1/σ_n, or the rectangular condition number κ_2(A) = σ_1/σ_min(m,n). Note that κ_2(A^T A) = κ_2(A)^2.
3. For a rank k approximation to A, let Σ_k = diag(σ_1, ..., σ_k, 0, ..., 0). Then A_k = UΣ_kV^T. rank(A_k) ≤ k, and rank(A_k) = k iff σ_k > 0. Among matrices of rank k or lower, A_k minimizes ||A − A_k||_2 = σ_{k+1}.
4. Rank determination: rank(A) = r equals the number of nonzero σ, or in machine arithmetic, perhaps the number of σ ≥ ε_mach × σ_1.
Partitioning A = UΣV^T = [U_1 U_2] [Σ(1:r,1:r) 0; 0 0] [V_1^T; V_2^T], we see that range(U_1) = range(A). The SVD gives an orthonormal basis for the range and nullspace of A and of A^T.
Compute the SVD by using shifted QR on A^T A.

Orthogonal Matrices
For Q ∈ R^{n×n}, these statements are equivalent:
1. Q^T Q = QQ^T = I (i.e., Q is orthogonal).
2. The 2-norm of each row and column of Q is 1, and the inner product of any row (or column) with another is 0.
3. For all x ∈ R^n, ||Qx||_2 = ||x||_2.
A matrix Q ∈ R^{m×n} with m > n has orthonormal columns if the columns are orthonormal, so Q^T Q = I. The product of orthogonal matrices is orthogonal. For orthogonal Q, ||QA||_2 = ||A||_2 and ||AQ||_2 = ||A||_2.

QR-factorization
For any A ∈ R^{m×n} with m ≥ n, we can factor A = QR, where Q ∈ R^{m×m} is orthogonal, and R = [R_1; 0] ∈ R^{m×n} is upper triangular. rank(A) = n iff R_1 is invertible.
Q's first n (or last m − n) columns form an orthonormal basis for span(A) (or nullspace(A^T)).
A Householder reflection is H = I − 2vv^T / (v^T v). H is symmetric and orthogonal. Explicit Householder QR-factorization is:
1: for k = 1:n do
2:   v = A(k:m, k) ± ||A(k:m, k)||_2 e_1
3:   A(k:m, k:n) = (I − 2vv^T / (v^T v)) A(k:m, k:n)
4: end for
We get H_n ... H_2 H_1 A = R, so Q = H_1 H_2 ... H_n. This takes 2mn^2 − (2/3)n^3 + O(mn) flops.
Givens rotations require 50% more flops. Preferable for sparse A.
Gram-Schmidt produces a skinny/reduced QR-factorization A = Q_1 R_1, where Q_1 ∈ R^{m×n} has orthonormal columns. The Gram-Schmidt algorithm is:
Left Looking
1: for k = 1:n do
2:   q_k = a_k
3:   for j = 1:k − 1 do
4:     R(j,k) = q_j^T a_k
5:     q_k = q_k − R(j,k) q_j
6:   end for
7:   R(k,k) = ||q_k||_2
8:   q_k = q_k / R(k,k)
9: end for
Right Looking
1: Q = A
2: for k = 1:n do
3:   R(k,k) = ||q_k||_2
4:   q_k = q_k / R(k,k)
5:   for j = k + 1:n do
6:     R(k,j) = q_k^T q_j
7:     q_j = q_j − R(k,j) q_k
8:   end for
9: end for
In left looking, let line 4 be R(j,k) = q_j^T q_k for modified G.S., to make it backwards stable.

Linear Least Squares
Suppose we have points (u_1, v_1), ..., (u_5, v_5) through which we want to fit a quadratic curve au^2 + bu + c. We want to solve
[u_1^2 u_1 1; ...; u_5^2 u_5 1] [a; b; c] = [v_1; ...; v_5].
This is overdetermined, so an exact solution is out. Instead, find the least squares solution x that minimizes ||Ax − b||_2.
For the method of normal equations, solve for x in A^T Ax = A^T b with Cholesky factorization. This takes mn^2 + n^3/3 + O(mn) flops. It is conditionally but not backwards stable: A^T A squares the condition number, κ_2(A^T A) = κ_2(A)^2.
Alternatively, factor A = QR. Let c = [c_1; c_2] = Q^T b. The least squares solution is x = R_1^{-1} c_1.
If rank(A) = r and r < n (rank deficient), factor A = UΣV^T, let y = V^T x and c = U^T b. Then min ||Ax − b||_2^2 = min Σ_{i=1}^r (σ_i y_i − c_i)^2 + Σ_{i=r+1}^m c_i^2, so y_i = c_i/σ_i for i = 1:r. For i = r+1:n, y_i is arbitrary.
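A small numpy/SciPy sketch (data invented for illustration) of the quadratic fit above, solved both by the normal equations and by the reduced QR route x = R_1^{-1} Q_1^T b; the two agree for this well-conditioned problem.

import numpy as np
from scipy.linalg import cho_factor, cho_solve, solve_triangular

u = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
v = np.array([1.1, 2.9, 7.2, 12.8, 21.1])        # roughly quadratic data

A = np.column_stack([u**2, u, np.ones_like(u)])   # 5x3 coefficient matrix

# Normal equations: A^T A x = A^T b, solved with Cholesky.
x_ne = cho_solve(cho_factor(A.T @ A), A.T @ v)

# Reduced QR: A = Q1 R1, then x = R1^{-1} Q1^T b.
Q1, R1 = np.linalg.qr(A, mode="reduced")
x_qr = solve_triangular(R1, Q1.T @ v, lower=False)

print(np.allclose(x_ne, x_qr))    # both give the coefficients a, b, c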
Information Retrieval & LSI
In the bag of words model, w_d ∈ R^m, where w_d(i) is the (perhaps weighted) frequency of term i in document d. The corpus matrix is A = [w_1, ..., w_n] ∈ R^{m×n}. For a query q ∈ R^m, rank documents according to the score q^T w_d / ||w_d||_2.
In latent semantic indexing, you do the same, but in a k dimensional subspace. Factor A = UΣV^T, then define A* = Σ_{1:k,1:k} V_{:,1:k}^T ∈ R^{k×n}. Each w*_d = A*_{:,d} = U_{:,1:k}^T w_d, and q* = U_{:,1:k}^T q.
In the Ando-Lee analysis, for a corpus with k topics, for t ∈ 1:k and d ∈ 1:n, let R_{t,d} ≥ 0 be document d's relevance to topic t, with ||R_{:,d}||_2 = 1. True document similarity is R^T R ∈ R^{n×n}, where entry (i,j) is the relevance of i to j. Using LSI, if A contains information about R^T R, then (A*)^T A* will approximate R^T R well. LSI depends on an even distribution of topics, where the distribution is ρ = max_t ||R_{t,:}||_2 / min_t ||R_{t,:}||_2. Great when ρ is near 1, but if ρ ≫ 1, LSI does worse.

Complex Numbers
Complex numbers are written z = x + iy ∈ C for i = sqrt(−1). The real part is x = ℜ(z); the imaginary part is y = ℑ(z). The conjugate of z is z̄ = x − iy. Ā x̄ = conj(Ax) and Ā B̄ = conj(AB). The absolute value of z is |z| = sqrt(x^2 + y^2). The conjugate transpose of x is x^H = (x̄)^T. A ∈ C^{n×n} is Hermitian or self-adjoint if A = A^H. If Q^H Q = I, Q is unitary.

Eigenvalues & Eigenvectors
For A ∈ C^{n×n}, if Ax = λx where x ≠ 0, x is an eigenvector of A and λ is the corresponding eigenvalue.
Remember, A − λI is singular iff det(A − λI) = 0. With λ as a variable, det(A − λI) is A's characteristic polynomial.
For nonsingular T ∈ C^{n×n}, T^{-1}AT (the similarity transformation) is similar to A. Similar matrices have the same characteristic polynomial and hence the same eigenvalues (though probably different eigenvectors). This relationship is reflexive, transitive, and symmetric.
A is diagonalizable if A is similar to a diagonal matrix D = T^{-1}AT. A's eigenvalues are D's diagonals, and the eigenvectors are the columns of T since AT_{:,i} = D_{i,i} T_{:,i}. A is diagonalizable iff it has n linearly independent eigenvectors.
For symmetric A ∈ R^{n×n}, A is diagonalizable, has all real eigenvalues, and the eigenvectors can be taken as the columns of an orthogonal matrix Q, where A = QDQ^T is the eigendecomposition of A. Further, for symmetric A:
1. The singular values are the absolute values of the eigenvalues.
2. A is SPD (or SPSD) iff its eigenvalues are > 0 (or ≥ 0).
3. For SPD A, the singular values equal the eigenvalues.
4. For B ∈ R^{m×n}, m ≥ n, the singular values of B are the square roots of B^T B's eigenvalues.
For any A ∈ C^{n×n}, the Schur form of A is A = QTQ^H with unitary Q ∈ C^{n×n} and upper triangular T ∈ C^{n×n}.
In this sheet |λ|_max denotes max_{λ ∈ {λ_1,...,λ_n}} |λ|. For B ∈ C^{n×n}, lim_{k→∞} B^k = 0 if |λ|_max(B) < 1.

Power Methods for Eigenvalues
x^(k+1) = Ax^(k) converges to the eigenvector belonging to |λ|_max(A).
Once you find an eigenvector x, find the associated eigenvalue λ through the Rayleigh quotient λ = x^T Ax / x^T x.
The inverse shifted power method is x^(k+1) = (A − σI)^{-1} x^(k). If A has eigenpairs (λ_1, u_1), ..., (λ_n, u_n), then (A − σI)^{-1} has eigenpairs (1/(λ_1 − σ), u_1), ..., (1/(λ_n − σ), u_n).
Factor A = QHQ^T where H is upper Hessenberg: find successive Householder reflections H_1, H_2, ... that zero out rows 3 and lower of column 1, rows 4 and lower of column 2, etc. Then Q = H_1 ... H_{n−2}.
Shifted QR iteration:
1: A^(0) = A
2: for k = 0, 1, 2, ... do
3:   Set A^(k) − σ^(k) I = Q^(k) R^(k)
4:   A^(k+1) = R^(k) Q^(k) + σ^(k) I
5: end for
A^(k) is similar to A by the orthogonal transform U = Q^(0) ... Q^(k). Perhaps choose σ^(k) as eigenvalues of submatrices of A^(k).
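A minimal numpy sketch (random symmetric test matrix and a fixed iteration count, both chosen arbitrarily) of the power method with a Rayleigh quotient eigenvalue estimate, compared against the library eigensolver.

import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((6, 6))
A = (M + M.T) / 2                    # symmetric, so eigenvalues are real

x = rng.standard_normal(6)
for _ in range(200):                 # x^(k+1) = A x^(k), normalized each step
    x = A @ x
    x /= np.linalg.norm(x)

lam = x @ A @ x / (x @ x)            # Rayleigh quotient
print(abs(lam), np.max(np.abs(np.linalg.eigvalsh(A))))   # should agree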
Arnoldi and Lanczos
Given A ∈ R^{n×n} and unit length q_1 ∈ R^n, output Q, H such that A = QHQ^T. Use Lanczos for symmetric A.
Arnoldi
1: for k = 1:n − 1 do
2:   q̃_{k+1} = Aq_k
3:   for ℓ = 1:k do
4:     H(ℓ,k) = q_ℓ^T q̃_{k+1}
5:     q̃_{k+1} = q̃_{k+1} − H(ℓ,k) q_ℓ
6:   end for
7:   H(k+1,k) = ||q̃_{k+1}||_2
8:   q_{k+1} = q̃_{k+1} / H(k+1,k)
9: end for
Lanczos
1: β_0 = ||w_0||
2: for k = 1, 2, ... do
3:   q_k = w_{k−1} / β_{k−1}
4:   u_k = Aq_k
5:   v_k = u_k − β_{k−1} q_{k−1}
6:   α_k = q_k^T v_k
7:   w_k = v_k − α_k q_k
8:   β_k = ||w_k||_2
9: end for
For Lanczos, the α_k and β_k are the diagonal and subdiagonal entries of the Hermitian tridiagonal T_k; in Arnoldi we have H itself. After very few iterations of either method, the eigenvalues of T_k and H will be excellent approximations to the "extreme" eigenvalues of A.
For k iterations, Arnoldi is O(nk^2) time and O(nk) space; Lanczos is O(nk) + kM time (M is the time for a matrix-vector multiplication) and O(nk) space, or O(n + k) space if old q_k's are discarded.

Iterative Methods for Ax = b
Useful for sparse A where GE would cause fill-in.
In the splitting method, A = M − N and Mv = c is easily solvable. Then x^(k+1) = M^{-1}(Nx^(k) + b). If it converges, the limit point x* is a solution to Ax = b.
The error is e^(k) = (M^{-1}N)^k e^(0), so splitting methods converge if |λ|_max(M^{-1}N) < 1.
In the Jacobi method, take M as the diagonal of A. This will fail if A has any zero diagonal entries.

Conjugate Gradient
Conjugate gradient iteratively solves Ax = b for SPD A. It is derived from Lanczos and exploits that if A is SPD then T is SPD. It produces the exact solution after n iterations. Time per iteration is O(n) + M.
1: x^(0) = arbitrary (0 is okay)
2: r_0 = b − Ax^(0)
3: p_0 = r_0
4: for k = 0, 1, 2, ... do
5:   α_k = (r_k^T r_k) / (p_k^T A p_k)
6:   x^(k+1) = x^(k) + α_k p_k
7:   r_{k+1} = r_k − α_k A p_k
8:   β_{k+1} = (r_{k+1}^T r_{k+1}) / (r_k^T r_k)
9:   p_{k+1} = r_{k+1} + β_{k+1} p_k
10: end for
Error is reduced by a factor of (sqrt(κ(A)) − 1)/(sqrt(κ(A)) + 1) per iteration. Thus, for κ(A) = 1, CG converges after 1 iteration. To speed up CG, use a preconditioner M such that κ(MA) ≪ κ(A) and solve MAx = Mb instead.

Optimization
In continuous optimization, f : R^n → R is the objective function, g : R^n → R^m holds equality constraints, h : R^n → R^p holds inequality constraints:
min f(x)  s.t.  g(x) = 0,  h(x) ≥ 0.
We did unconstrained optimization min f(x) in the course.
A ball is a set B(x, r) = {y ∈ R^n : ||x − y|| < r}.
We have local minimizers x*, which are the best in a region, i.e., ∃ r > 0 such that f(x*) ≤ f(x) for all x ∈ B(x*, r). A global minimizer is the best local minimizer.
Assume f is C^2. If x* is a local minimizer, then ∇f(x*) = 0 and ∇^2 f(x*) is PSD. Semi-conversely, if ∇f(x*) = 0 and ∇^2 f(x*) is PD, then x* is a local minimizer.

Multivariate Calculus
Provided f : R^n → R, the gradient and Hessian are
∇f = [∂f/∂x_1; ...; ∂f/∂x_n],   ∇^2 f = [∂^2 f / ∂x_i ∂x_j]_{i,j=1..n}.
If f is C^2 (2nd partials are all continuous), ∇^2 f is symmetric.
The Taylor expansion for f is f(x + h) = f(x) + h^T ∇f(x) + (1/2) h^T ∇^2 f(x) h + O(||h||^3).
Provided f : R^n → R^m, the Jacobian is ∇f = [∂f_i/∂x_j]_{i=1..m, j=1..n}, and f's Taylor expansion is f(x + h) = f(x) + ∇f(x) h + O(||h||^2).
A linear (or quadratic) model approximates a function f by the first two (or three) terms of f's Taylor expansion.

Steepest Descent
Go where the function (locally) decreases most rapidly via x^(k+1) = x^(k) − α_k ∇f(x^(k)). The step size α_k is explained under line search. SD is stateless: it depends only on the current point. Too slow.

Newton's Method for Unconstrained Min.
Iterate by x^(k+1) = x^(k) − (∇^2 f(x^(k)))^{-1} ∇f(x^(k)), derived by solving for where ∇f(x*) = 0. If ∇^2 f(x^(k)) is PD and ∇f(x^(k)) ≠ 0, the step is a descent direction.
What if the Hessian isn't PD? Use (a) a secant method, (b) a direction of negative curvature h or −h where h^T ∇^2 f(x^(k)) h < 0 (doesn't work well in practice), (c) the trust region idea, h = −(∇^2 f(x^(k)) + tI)^{-1} ∇f(x^(k)) (an interpolation of NMUM and SD), (d) factor ∇^2 f(x^(k)) by Cholesky; when checking for PD, detect nonpositive pivots, modify that diagonal entry of ∇^2 f(x^(k)), and keep going (unjustified by theory, but works in practice).

Line Search
Line search, given x^(k) and a step h (perhaps derived from SD or NMUM), finds an α > 0 for x^(k+1) = x^(k) + αh.
In exact line search, optimize min_α f(x^(k) + αh). Frowned upon because it's computationally expensive.
In Armijo or backtracking line search, initialize α. While f(x^(k) + αh) > f(x^(k)) + 0.1 α ∇f(x^(k))^T h, halve α.
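A compact Python sketch (the test function, starting point, and constants are illustrative choices, not from the course) combining Newton's method for unconstrained minimization with the Armijo backtracking rule above. The chosen f has a PD Hessian everywhere, so the Newton step is always a descent direction.

import numpy as np

def f(x):                                   # smooth convex test function
    return x[0]**4 + x[0]**2 + x[1]**2 - x[0]*x[1]

def grad(x):
    return np.array([4*x[0]**3 + 2*x[0] - x[1], 2*x[1] - x[0]])

def hess(x):
    return np.array([[12*x[0]**2 + 2, -1.0], [-1.0, 2.0]])

x = np.array([2.0, -3.0])
for _ in range(20):
    g = grad(x)
    h = -np.linalg.solve(hess(x), g)        # Newton step
    alpha = 1.0
    while f(x + alpha*h) > f(x) + 0.1*alpha*(g @ h):   # Armijo condition
        alpha /= 2                          # halve alpha
    x = x + alpha*h

print(x, np.linalg.norm(grad(x)))           # gradient should be ~0 at the minimizer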
Nonlinear Equation Solving
Given f : R^n → R^m, we want x such that f(x) = 0.
In fixed point iteration, we choose g : R^n → R^n such that x^(k+1) = g(x^(k)). If it converges to x*, then g(x*) − x* = 0.
g(x^(k)) = g(x*) + ∇g(x*)(x^(k) − x*) + O(||x^(k) − x*||^2). For small e^(k) = x^(k) − x*, ignore the last term. If ∇g(x*) has |λ|_max < 1, then x^(k) → x* with ||e^(k)|| ≤ c^k ||e^(0)|| for large k, where c = |λ|_max + ε, with ε the influence of the ignored last term. This indicates a linear rate of convergence.
Suppose for ∇g(x*) = QTQ^H that T is non-normal, i.e., T's superdiagonal portion is large relative to the diagonal. Then the iteration may not converge, as ||(∇g(x*))^k|| initially grows!
In Newton's method, x^(k+1) = x^(k) − (∇f(x^(k)))^{-1} f(x^(k)). This converges quadratically, i.e., ||e^(k+1)|| ≤ c||e^(k)||^2.
Automatic differentiation takes advantage of the notion that a computer program is nothing but arithmetic operations, and one can apply the chain rule to get the derivative. This may be used to compute Jacobians and Hessians.
Secant/quasi-Newton methods use an approximate, always PD ∇^2 f. In Broyden-Fletcher-Goldfarb-Shanno (BFGS):
1: B_0 = initial approximate Hessian  {OK to use I}
2: for k = 0, 1, 2, ... do
3:   s_k = −B_k^{-1} ∇f(x^(k))
4:   x^(k+1) = x^(k) + α_k s_k   {Use a special line search for α_k!}
5:   y_k = ∇f(x^(k+1)) − ∇f(x^(k))
6:   B_{k+1} = B_k + (y_k y_k^T)/(α_k y_k^T s_k) − (B_k s_k s_k^T B_k)/(s_k^T B_k s_k)
7: end for
By maintaining B_k in factored form, one can iterate in O(n^2) flops. B_k is SPD provided s_k^T y_k > 0 (use the line search to increase α_k if needed). The secant condition α_k B_{k+1} s_k = y_k holds. If BFGS converges, it converges superlinearly.

Non-linear Least Squares
For g : R^n → R^m, m ≥ n, we want the x for min ||g(x)||_2.
In the Gauss-Newton method, x^(k+1) = x^(k) − h where h = (∇g(x^(k))^T ∇g(x^(k)))^{-1} ∇g(x^(k))^T g(x^(k)). Note that h is a solution to the linear least squares problem min_h ||∇g(x^(k)) h − g(x^(k))||! GN is derived by applying NMUM to g(x)^T g(x) and dropping a resulting tensor (the derivative of the Jacobian). You keep quadratic convergence when g(x*) = 0, since the tensor term → 0 as k → ∞.

Ordinary Differential Equations
An ODE (or PDE) has one (or multiple) independent variables. In initial value problems, given dy/dt = f(y, t), y(t) ∈ R^n, and y(0) = y_0, we want y(t) for t > 0. Examples include:
1. Exponential growth/decay with dy/dt = ay, with closed form y(t) = y_0 e^{at}. Growth if a > 0, decay if a < 0.
2. Ecological models, dy_i/dt = f_i(y_1, ..., y_n, t) for species i = 1, ..., n. y_i is population size, f_i encodes species relationships.
3. Mechanics, e.g. wall-spring-block models from F = ma (a = d^2x/dt^2) and F = −kx, so d^2x/dt^2 = −kx/m. This yields d[x, v]/dt = [v, −kx/m] with given initial position and velocity.
For stability of an ODE, let dy/dt = Ay for A ∈ C^{n×n}. The stable, neutrally stable, or unstable case is where max_i ℜ(λ_i(A)) < 0, = 0, or > 0 respectively.
In finite difference methods, approximate y(t) by discrete points y_0 (given), y_1, y_2, ... so y_k ≈ y(t_k) for increasing t_k.
For many IVPs and FDMs, if the local truncation error (error at each step) is O(h^{p+1}), the global truncation error (error overall) is O(h^p). Call p the order of accuracy.
To find p, substitute the exact solution into the FDM formula, insert a remainder term +R on the RHS, use a Taylor series expansion, solve for R, and keep only the leading term.
In Euler's method, let y_{k+1} = y_k + f(y_k, t_k) h_k where h_k = t_{k+1} − t_k is the step size, and y' = f(y, t) is perhaps computed by finite difference. p = 1, very low. Explicit!
A stiff problem has widely ranging time scales in the solution, e.g., a transient initial velocity that in the true solution disappears immediately, chemical reaction rate variability over temperature, transients in electrical circuits. An explicit method requires h_k to be on the smallest scale!
Backward Euler has y_{k+1} = y_k + hf(y_{k+1}, t_{k+1}). BE is implicit (y_{k+1} appears on the RHS). If the original problem is stable, any h will work!

Miscellaneous
Σ_{k=1}^n k^p = n^{p+1}/(p+1) + O(n^p).
ax^2 + bx + c = 0 has roots r_1, r_2 = (−b ± sqrt(b^2 − 4ac))/(2a), with r_1 r_2 = c/a.
Exact arithmetic is slow, futile for inexact observations, and NA relies on approximate algorithms.
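A brief numpy sketch (the decay rate and step size are chosen to exaggerate the effect) contrasting explicit Euler and backward Euler on the stiff scalar problem dy/dt = ay with a ≪ 0, as discussed above.

import numpy as np

a, h, steps = -50.0, 0.1, 20       # stiff decay; h is far too large for explicit Euler
y_fe = y_be = 1.0                   # y(0) = 1

for _ in range(steps):
    y_fe = y_fe + h * a * y_fe      # forward Euler: y_{k+1} = y_k + h a y_k
    y_be = y_be / (1 - h * a)       # backward Euler: y_{k+1} = y_k + h a y_{k+1}

print(y_fe)   # blows up, since |1 + h a| = 4 > 1
print(y_be)   # decays toward 0, like the true solution y(t) = exp(a t)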