C H a P T E R § Methods Related to the Norm Al Equations

¡ ¢ £ ¤ ¥ ¦ § ¨ © © © © ¨ ©! "# $% &('*),+-)(./+-)0.21434576/),+98,:%;<)>=,'41/? @%3*)BAC:D8E+%=B891GF*),+H;I?H1KJ2.21K8L1GMNAPO45Q5R)S;S+T? =RU ?H1*)>./+EAVOKAPM ;<),5 ?H1G;P8W.XAVO4575R)S;S+Y? =Z8L1*)/[%\Z1*)]AB3*=,'Q;<)B=,'41/? @%3*)RAP8LU F*)BA(;S'*)Z)>@%3/? F*.4U ),1G;Û ?H1*)>./+ APOKAP;_),5a`Rbc`^dfeg`Rbih,jk=>.4U U )Blm;B'*)n1K84+o5R.EUp)B@%3*.K;q? 8L1KA>[r\0:D;_),1Ejp;B'/? A2.4s4s/+t8/.,=,' b ? A7.KFK84? l4)>lu?H1vs,+D.*=S;q? =>)m6/)>=>.43KA<)W;B'*)w=B8E)IxW=K? ),1G;W5R.K;S+Y? yn` `z? AW5Q3*=,'|{}84+-A_) =B8L1*l9? ;I? 891*)>lX;B'*.41Q`7[r~k8K{i)IF*),+NjL;B'*)71K8E+o5R.4U4)>@%3*.G;I? 891KA0.4s4s/+t8.*=,'w5R.BOw6/)Z.,l4)IM @%3*.K;_)7?H17A_8L5R)QAI? ;S3*.K;q? 8L1KA>[p1*l4)>)>lj9;B'*),+-)Q./+-)Q)SF*),1.Es4s4U ? =>.K;q? 8L1KA(?H1Q{R'/? =,'W? ;R? A s/+-)S:Y),+o+D)Blm;_8|;B'*)W3KAB3*.4Urc+HO4U 8,F|AS346KABs*.,=B)Q;_)>=,'41/? @%3*)BAB[&('/? A]=*'*.EsG;_),+}=B8*F*),+tA7? ;PM ),+-.K;q? F*)75R)S;S'K8ElAi{^'/? =,'Z./+-)R)K? ;B'*),+pl9?+D)B=S;BU OR84+k?H5Qs4U ? =K? ;BU O2+-),U .G;<)Bl7;_82;S'*)Q1K84+o5R.EU )>@%3*.G;I? 891KA>[ mWX(fQ - uK In order to solve the linear system 7 when is nonsymmetric, we can solve the equivalent system b b [¥K¦ ¡¢ £N¤ which is Symmetric Positive Definite. This system is known as the system of the normal equations associated with the least-squares problem, [ ¦ £N¤ minimize §* R¨©7ª§*«9¬ Note that (8.1) is typically used to solve the least-squares problem (8.2) for over- ®°¯± ±²³® determined systems, i.e., when is a rectangular matrix of size , . bi´ A similar well known alternative sets f¢ and solves the following equation for ´ : b [ µ¦ ´ 7 ! ¬ £N¤ ¶p¶p· ¶¡ £¢ ¤ ~¦¥¨§ &¨©¡ ©9&^~ \ ©¡¥p&©¡&ª\³&^~©}\ ¥© ¥p&^ \ ´ Once the solution is computed, the original unknown could be obtained by multiplying ´ b ´ by . However, most of the algorithms we will see do not invoke the variable explic- itly and work with the original variable instead. The above system of equations can be used to solve under-determined systems, i.e., those systems involving rectangular matrices ®°² ± ®"! ± of size ®°¯v± , with . It is related to (8.1) in the following way. Assume that # Q³ and that has full rank. Let be any solution to the underdetermined system . Then (8.3) represents the normal equations for the least-squares problem, b [ $¦ ´ §G#Z¨ § ¬ £N¤ minimize « bi´ Since by definition , then (8.4) will find the solution vector that is closest to # ® ²!± in the 2-norm sense. What is interesting is that when there are infinitely many ´ # Q¡ solutions to the system , but the minimizer of (8.4) does not depend on the # particular used. The system (8.1) and methods derived from it are often labeled with NR (N for “Nor- mal” and R for “Residual”) while (8.3) and related techniques are labeled with NE (N for “Normal” and E for “Error”). If is square and nonsingular, the coefficient matrices of these systems are both Symmetric Positive Definite, and the simpler methods for symmetric problems, such as the Conjugate Gradient algorithm, can be applied. Thus, CGNE denotes the Conjugate Gradient method applied to the system (8.3) and CGNR the Conju- gate Gradient method applied to (8.1). There are several alternative ways to formulate symmetric linear systems having the same solution as the original system. For instance, the symmetric linear system %'& %,+ % [ ./¦ - b (*) ) ) £N¤ + with ¢ 0¨fQ , arises from the standard necessary conditions satisfied by the solution of the constrained optimization problem, + [ 1/¦ « / § ¨ 9§ £N¤ 0 minimize « + b - [ 2/¦ ¬ £N¤ subject to The solution to (8.5) is the vector of Lagrange multipliers for the above problem. Another equivalent symmetric system is of the form % % % ( Q b ( b ) )! ) ¬ 4 5 The eigenvalues of the coefficient matrix for this system are 3,45 , where is an arbitrary singular value of . Indefinite systems of this sort are not easier to solve than the original nonsymmetric system in general. Although not obvious immediately, this approach is similar in nature to the approach (8.1) and the corresponding Conjugate Gradient iterations applied to them should behave similarly. A general consensus is that solving the normal equations can be an inefficient approach b in the case when is poorly conditioned. Indeed, the 2-norm condition number of is given by 6879;: b b b «<V >=§K m§*«§?<N >=A@BL§K«4¬ b « DEAF DEAF §K m§ C4 <N,= 4 <V>= Now observe that « where is the largest singular value of ¤ ¶¡ ¨§ ¡ ¢ r\¤£ §¦ r\¦¥ © &^ \ ©9&~}\ which, incidentally, is also equal to the 2-norm of . Thus, using a similar argument for b <V >= 6879;: 6879;: the inverse @B yields b [ ¦ « « « «<N >=Rz§Kn§ §K @BE§ <V>=K¬ £N¤ ¤ « « « b 687?9 : The 2-norm condition number for is exactly the square of the condition number of - © <V>=v , which could cause difficulties. For example, if originally « , then an iterative method may be able to perform reasonably well. However, a condition/ number of - B can be much more difficult to handle by a standard iterative method. That is because an/ y progress made in one step of the iterative procedure may be annihilated by the noise due to numerical errors. On the other hand, if the original matrix has a good 2-norm condition number, then the normal equation approach should not cause any serious difficulties. & X¢ In the extreme case when is unitary, i.e., when , then the normal equations are clearly the best approach (the Conjugate Gradient method will converge in zero step!). Z XZ ] -fª uw| u When implementing a basic relaxation scheme, such as Jacobi or SOR, to solve the linear system b b [ ¦ 7¢ £N¤ or b [¥¦ ´ 7 ! £N¤ b b 7 it is possible to exploit the fact that the matrices or need not be formed explic- itly. As will be seen, only a row or a column of at a time is needed at a given relaxation step. These methods are known as row projection methods since they are indeed projection b methods on rows of or . Block row projection methods can also be defined similarly. ! #"! %$ &¤')(+*,*.-/*1032#4503687:9<;>=?0@9¦7:A!BC'56D0FEG(¤'¨;>2H79!* It was stated above that in order to use relaxation schemes on the normal equations, only access to one column of at a time is needed for (8.9) and one row at a time for (8.10). This is now explained for (8.10) first. Starting from an approximation to the solution of (8.10), a basic relaxation-based iterative procedure modifies its components in a certain order using a succession of relaxation steps of the simple form [¥,¥K¦ ´3I JLK ŃMPO 5RQ 5 £N¤ O Q 5 5 S where is the S -th column of the identity matrix. The scalar is chosen so that the -th component of the residual vector for (8.10) becomes zero. Therefore, - b [¥>¦ ŃMPO Q Q <V R¨©X < 5 5 =T 5 = £N¤ ¶¡ p¶ ¤ ~¦¥¨§ &¨©¡ ©9&^~ \ ©¡¥p&©¡&ª\³&^~©}\ ¥© ¥p&^ \ + bc´ which, setting ! R¨©X , yields, + Q 5 < = [¥Gµ/¦ O 5 ¬ £N¤ b « Q §K 5S§ « 5 S Denote by the -th component of . Then a basic relaxation step consists of taking bi´ b 5 Q 5 ¨ <N S = [¥ $¦ O 5i ¬ £N¤ b « Q 5 §K § « Also, (8.11) can be rewritten in terms of -variables as follows: b [¥A./¦ M O I JLK Q 5N 5q¬ £N¤ The auxiliary variable ´ has now been removed from the scene and is replaced by the bi´ original variable . Consider the implementation of a forward Gauss-Seidel sweep based on (8.15) and O (8.13) for a general sparse matrix. The evaluation of 5 from (8.13) requires the inner prod- b b ´ Q 5 S uct of the current approximation with , the -th row of . This inner product b Q 5 ¡ is inexpensive to compute because is usually sparse. If an acceleration parameter O O 5 5 is used, we only need to change into ¡ . Therefore, a forward SOR sweep would be as follows. nK(a ¢£¢¥¤§¦n¦©¨-¤w¡ #§ £ 1. Choose an initial . 0 ª *¬*¬,¬TI® 2. For S Do: J / F.- O J /10 / @'&)(* $,+ c!¡#"%$ 3. 5 0 ( b MPO $ * Q 5P 4. 5 5. EndDo b Q 5 S Note that is a vector equal to the transpose of the -th row of . All that is needed is ®25 the row data structure for to implement the above algorithm. Denoting by the number of nonzero elements in the S -th row of , then each step of the above sweep requires 0 0 0 M 2 5 ®2 5 ® operations in line 3, and another operations in line 4, bringing the total to 3 3 0 0 M M 5 ® 2 ®2 ®2 ® . The total for a whole sweep becomes operations, where represents the total number of nonzero elements of . Twice as many operations are required for the Symmetric Gauss-Seidel or the SSOR iteration. Storage consists of the right-hand side, the vector , and possibly an additional vector to store the 2-norms of the rows of . A better alternative would be to rescale each row by its 2-norm at the start. Similarly, a Gauss-Seidel sweep for (8.9) would consist of a succession of steps of the form [¥A1/¦ MPO I J/K Q 5 5S¬ £N¤ O S Again, the scalar 5 is to be selected so that the -th component of the residual vector for (8.9) becomes zero, which yields - b b [¥A2/¦ M O Q Q <N R¨ <N 5 5 = 5 =^ ¬ £N¤ ¤ ¶¡ ¡ ¡ ¢ r\¤£ §¦ r\¦¥ © &^ \ ©9&~}\ +¡ + - b O 5 Q 5 Q 5 <V < ¨ = = With ]¨©7 , this becomes which yields + Q < B 5 = [¥ ¦ O 5 ¬ £N¤ ¤ « Q 5 §K § « Then the following algorithm is obtained.

C H a P T E R § Methods Related to the Norm Al Equations

The Column and Row Hilbert Operator Spaces

Matrices and Linear Algebra

Lecture 5 Ch. 5, Norms for Vectors and Matrices

Matrix Norms 30.4

5 Norms on Linear Spaces

A Degeneracy Framework for Scalable Graph Autoencoders

Chapter 6 the Singular Value Decomposition Ax=B Version of 11 April 2019

Domain Generalization for Object Recognition with Multi-Task Autoencoders

Norms of Vectors and Matrices and Eigenvalues and Eigenvectors - (7.1)(7.2)

Deep Subspace Clustering Networks

Eigenvalues and Eigenvectors

7.4 Matrix Norms and Condition Numbers This Section Considers the Accuracy of Computed Solutions to Linear Systems