
Numerical Software

Michael C. Grant

Prof. S. Boyd, EE392o, Stanford University

• Introduction
• Basic computations: BLAS, ATLAS
• Linear systems and factorizations: LAPACK
• Sparse matrices

Numerical linear algebra in optimization

Many optimization algorithms (e.g., barrier methods, primal/dual methods) require the solution of a sequence of structured linear systems; for example, Newton's method:

H ∆x = −g,    H ≜ ∇²f(x),    g ≜ ∇f(x)

The bulk of memory usage and computation time is spent constructing and solving these systems. How do we know this?

• theoretical complexity analysis
• profiling—a method for measuring the resources consumed by various subroutines of a program during execution

Therefore, writing efficient code to solve optimization problems requires competence in numerical linear algebra software

How to write numerical linear algebra software

DON’T! Whenever possible, rely on existing, mature software libraries for performing numerical linear algebra computations. By doing so...

• You can focus on other important issues, such as the higher-level algorithm and the user interface
• The amount of code you will need to write will be greatly reduced...
• ...thereby greatly reducing the number of bugs that you will introduce
• You will be insulated from subtle and difficult numerical accuracy issues
• Your code will run faster—sometimes dramatically so

Netlib

A centralized clearinghouse for some of the best-known numerical software:

http://www.netlib.org

• Maintained by the University of Tennessee, the Oak Ridge National Laboratory, and colleagues worldwide
• Most of the code is public domain or freely licensed
• Much written in FORTRAN 77 (gasp!)

A useful list of freely available linear algebra libraries:

http://www.netlib.org/utk/people/JackDongarra/la-sw.html

The Basic Linear Algebra Subroutines (BLAS)

Written by people who had the foresight to understand the future benefits of a standard suite of "kernel" routines for linear algebra. Created and organized in three levels:

• Level 1, 1973-1977: O(n) vector operations: addition, scaling, dot products, norms
• Level 2, 1984-1986: O(n²) matrix-vector operations: matrix-vector products, triangular matrix-vector solves, rank-1 and symmetric rank-2 updates
• Level 3, 1987-1990: O(n³) matrix-matrix operations: matrix-matrix products, triangular matrix solves, low-rank updates

BLAS operations

Level 1   addition/scaling         αx,  αx + y
          dot products, norms      x^T y,  ||x||_2,  ||x||_1

Level 2   matrix/vector products   αAx + βy,  αA^T x + βy
          rank-1 updates           A + αxy^T,  A + αxx^T
          rank-2 updates           A + αxy^T + αyx^T
          triangular solves        αT^{-1}x,  αT^{-T}x

Level 3   matrix/matrix products   αAB + βC,  αAB^T + βC,
                                   αA^T B + βC,  αA^T B^T + βC
          rank-k updates           αAA^T + βC,  αA^T A + βC
          rank-2k updates          αA^T B + αB^T A + βC
          triangular solves        αT^{-1}C,  αT^{-T}C

Level 1 BLAS naming convention

All BLAS routines have a Fortran-inspired naming convention:

cblas_   X          XXXX
prefix   data type  operation

Data types:
s  single precision real        c  single precision complex
d  double precision real        z  double precision complex

Operations:
axpy   y ← αx + y                      dot    r ← x^T y
nrm2   r ← ||x||_2 = sqrt(x^T x)       asum   r ← ||x||_1 = Σ_i |x_i|

Example: cblas_ddot computes a double precision real dot product
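For example, here is a minimal sketch of a Level 1 call through the CBLAS interface; it assumes a CBLAS implementation (such as ATLAS) is installed and linked, and the data is purely illustrative:

#include <cblas.h>
#include <stdio.h>

int main(void)
{
    double x[3] = { 1.0, 2.0, 3.0 };
    double y[3] = { 4.0, 5.0, 6.0 };

    /* r <- x^T y; the two "1" arguments are the strides (increments) of x and y */
    double r = cblas_ddot( 3, x, 1, y, 1 );

    printf( "x'y = %g\n", r );   /* prints 32 */
    return 0;
}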

BLAS naming convention: Level 2/3

cblas_   X          XX         XXX
prefix   data type  structure  operation

Matrix structure:
tr  triangular     tp  packed triangular    tb  banded triangular
sy  symmetric      sp  packed symmetric     sb  banded symmetric
he  Hermitian      hp  packed Hermitian     hb  banded Hermitian
ge  general        gb  banded general

Operations:
mv   y ← αAx + βy                sv    x ← A^{-1}x (triangular only)
r    A ← A + xx^T                r2    A ← A + xy^T + yx^T
mm   C ← αAB + βC                r2k   C ← αAB^T + αBA^T + βC

Examples:
cblas_dtrmv    double precision real triangular matrix-vector product
cblas_dsyr2k   double precision real symmetric rank-2k update
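A small sketch tying the naming convention to an actual call: cblas_dgemv (d + ge + mv) computes y ← αAx + βy for a general double precision real matrix. The dimensions and data here are illustrative, and column-major storage is assumed:

#include <cblas.h>

int main(void)
{
    /* A is 2x3, stored column-major: columns (1,4), (2,5), (3,6) */
    double A[6] = { 1, 4, 2, 5, 3, 6 };
    double x[3] = { 1, 1, 1 };
    double y[2] = { 0, 0 };

    /* y <- 1.0*A*x + 0.0*y */
    cblas_dgemv( CblasColMajor, CblasNoTrans, 2, 3,
                 1.0, A, 2, x, 1, 0.0, y, 1 );

    return 0;   /* y is now { 6, 15 } */
}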

Using BLAS efficiently

Always choose a higher-level BLAS routine over multiple calls to a lower-level BLAS routine, even when the results would be the same.

A ← A + Σ_{i=1}^{k} x_i y_i^T,    A ∈ R^{m×n},  x_i ∈ R^m,  y_i ∈ R^n

Two choices: k separate calls to the Level 2 routine cblas_dger

A ← A + x_1 y_1^T,   A ← A + x_2 y_2^T,   ...,   A ← A + x_k y_k^T

or a single call to the Level 3 routine cblas_dgemm

A ← A + XY^T,    X ≜ [ x_1  ...  x_k ],    Y ≜ [ y_1  ...  y_k ]

The Level 3 choice will perform much better in practice.
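A sketch of the two alternatives, assuming the CBLAS interface, column-major storage, X stored as an m-by-k array and Y as n-by-k; the function names are illustrative:

#include <cblas.h>

/* k separate Level 2 rank-1 updates: A <- A + x_i y_i^T */
void update_level2( int m, int n, int k, double* A,
                    const double* X, const double* Y )
{
    for ( int i = 0 ; i < k ; ++i )
        cblas_dger( CblasColMajor, m, n, 1.0,
                    &X[ i * m ], 1, &Y[ i * n ], 1, A, m );
}

/* a single Level 3 call: A <- A + X Y^T */
void update_level3( int m, int n, int k, double* A,
                    const double* X, const double* Y )
{
    cblas_dgemm( CblasColMajor, CblasNoTrans, CblasTrans,
                 m, n, k, 1.0, X, m, Y, n, 1.0, A, m );
}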

Is BLAS necessary?

Why use BLAS at all when writing your own routines is so easy?

A ← A + XY^T,    A ∈ R^{m×n},  X ∈ R^{m×p},  Y ∈ R^{n×p}

A_{ij} ← A_{ij} + Σ_{k=1}^{p} X_{ik} Y_{jk}

void matmultadd( int m, int n, int p, double* A,
                 const double* X, const double* Y )
{
    /* A is m x n, X is m x p, Y is n x p, all stored in column-major order */
    int i, j, k;
    for ( i = 0 ; i < m ; ++i )
        for ( j = 0 ; j < n ; ++j )
            for ( k = 0 ; k < p ; ++k )
                A[ i + j * m ] += X[ i + k * m ] * Y[ j + k * n ];
}

Why use BLAS when the code is this simple?

Is BLAS necessary?

The answer: performance—a factor of 5 to 10 times or more.

It is not trivial to write numerical code on today's computers that achieves high performance, due to pipelining, memory bandwidth, cache architectures, etc.

The acceptance of BLAS as a standard interface for numerical linear algebra has made practical a number of efforts to produce highly-optimized BLAS libraries.

One free example: ATLAS

http://math-atlas.sourceforge.net

ATLAS uses automated code generation and testing methods (basically, automated trial-and-error) to generate an optimized BLAS library for a specific computer.

Improving performance through blocking

Blocking can be used to improve the performance of matrix/vector and matrix/matrix multiplications, Cholesky factorizations, etc.

For example, the update A ← A + XY^T can be partitioned into blocks:

[ A_11  A_12 ]     [ A_11  A_12 ]   [ X_11 ]
[ A_21  A_22 ]  ←  [ A_21  A_22 ] + [ X_21 ] [ Y_11^T  Y_21^T ]

The optimal block size is not easy to determine: it depends on the specific architecture of the processor, cache, and memory. But once the block sizes are chosen, each block can be computed separately, and ordered in a manner which minimizes memory traffic; e.g.

A_11 ← A_11 + X_11 Y_11^T,    A_12 ← A_12 + X_11 Y_21^T,
A_21 ← A_21 + X_21 Y_11^T,    A_22 ← A_22 + X_21 Y_21^T
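As a minimal sketch of what a blocked version of the earlier matmultadd loop might look like, assuming column-major storage, a hypothetical tuning parameter NB, and dimensions that are exact multiples of NB for simplicity:

void blocked_matmultadd( int m, int n, int p, double* A,
                         const double* X, const double* Y )
{
    const int NB = 64;   /* illustrative block size; must be tuned per machine */
    for ( int jb = 0 ; jb < n ; jb += NB )
        for ( int ib = 0 ; ib < m ; ib += NB )
            for ( int kb = 0 ; kb < p ; kb += NB )
                /* update the NB-by-NB block A(ib:ib+NB, jb:jb+NB); the inner
                   loops touch only small tiles of A, X, and Y, which fit in cache */
                for ( int j = jb ; j < jb + NB ; ++j )
                    for ( int k = kb ; k < kb + NB ; ++k )
                        for ( int i = ib ; i < ib + NB ; ++i )
                            A[ i + j * m ] += X[ i + k * m ] * Y[ j + k * n ];
}

Optimized BLAS libraries such as ATLAS implement far more elaborate versions of this idea; the sketch only illustrates the loop structure.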

The Matrix Template Library

Having just sung the praises of the BLAS, here is an alternative for C++:

http://www.osl.iu.edu/research/mtl/intro.php3

• Uses the most advanced features of C++ for flexibility and performance
• Supports structured matrices (triangular, symmetric, banded, etc.)
• Uses blocked algorithms to improve performance
• Also supports LAPACK, sparse matrices
• Attractively licensed

It looks interesting. Has anyone tried this?

The Linear Algebra PACKage (LAPACK)

LAPACK contains a variety of subroutines for solving linear systems and performing a number of common matrix decompositions and factorizations.

• First release: February 1992; latest version (3.0): May 2000
• Supersedes predecessors EISPACK and LINPACK
• Supports the same data types (single/double precision, real/complex) and matrix structure types (symmetric, banded, etc.) as BLAS
• Uses BLAS for internal computations
• Routines divided into three categories: auxiliary routines, computational routines, and driver routines

LAPACK Computational Routines

Computational routines are designed to perform single, specific computational tasks:

• factorizations: LU, LL^T/LL^H, LDL^T/LDL^H, QR, LQ, QRZ, generalized QR and RQ
• symmetric/Hermitian and nonsymmetric eigenvalue decompositions
• singular value decompositions
• generalized eigenvalue and singular value decompositions

LAPACK Driver Routines

Driver routines call computational routines in sequence to solve a variety of standard linear algebra problems, including:

• Linear equations: AX = B
• Linear least squares problems:
    minimize_x ||b − Ax||_2        minimize_y ||y||
                                   subject to d = By
• Generalized linear least squares problems:
    minimize_x ||c − Ax||_2        minimize_y ||y||
    subject to Bx = d              subject to d = Ax + By
• Standard and generalized eigenvalue and singular value problems:
    AZ = ZΛ,    A = UΣV^T,    AZ = BZΛ
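For instance, here is a hedged sketch of calling the least squares driver dgels from C for the full-rank problem minimize_x ||b − Ax||_2, using the FORTRAN calling conventions described at the end of these notes; workspace sizing and error checking are omitted, and the helper name lstsq is illustrative:

extern void dgels_( char* trans, int* m, int* n, int* nrhs,
                    double* a, int* lda, double* b, int* ldb,
                    double* work, int* lwork, int* info );

/* A is m-by-n (m >= n), column-major; b has m entries and is overwritten,
   with the solution x left in its first n entries */
void lstsq( int m, int n, double* A, double* b, double* work, int lwork )
{
    char trans = 'N';    /* use A itself, not A^T */
    int  nrhs  = 1, info;
    dgels_( &trans, &m, &n, &nrhs, A, &m, b, &m, work, &lwork, &info );
    /* info == 0 indicates success */
}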

A LAPACK usage example

Consider the task of solving

[ P  A^T ] [ d_x ]     [ r_a ]
[ A   0  ] [ d_v ]  =  [ r_b ]

for d_x ∈ R^n, d_v ∈ R^m, where

P ∈ R^{n×n},  P = P^T ≻ 0
A ∈ R^{m×n},  r_a ∈ R^n,  r_b ∈ R^m,  (m < n)

Linear systems with this structure occur quite often in KKT-based optimization algorithms

A LAPACK usage example: option 1

The coefficient matrix is symmetric indefinite, so a permuted LDL^T factorization exists:

[ P  A^T ]
[ A   0  ]  →  Q L D L^T Q^T

The computational routine dsytrf performs this factorization; the driver routine dsysv uses dsytrf to compute this factorization and then performs the remaining computations to determine the solution:

[ d_x ]                               [ r_a ]
[ d_v ]  =  Q L^{-T} D^{-1} L^{-1} Q^T [ r_b ]
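A rough sketch of option 1 in C, using the FORTRAN-from-C conventions described at the end of these notes (column-major storage, underscore-suffixed routine name, every argument passed by reference); workspace sizing and error checking are omitted, and the helper name solve_kkt is illustrative:

extern void dsysv_( char* uplo, int* n, int* nrhs, double* a, int* lda,
                    int* ipiv, double* b, int* ldb,
                    double* work, int* lwork, int* info );

/* K is the (n+m)-by-(n+m) matrix [ P A^T ; A 0 ] in column-major order;
   rhs holds [ r_a ; r_b ] on entry and is overwritten with [ d_x ; d_v ] */
void solve_kkt( int dim, double* K, double* rhs,
                int* ipiv, double* work, int lwork )
{
    char uplo = 'L';          /* the lower triangle of K is referenced */
    int  nrhs = 1, info;
    dsysv_( &uplo, &dim, &nrhs, K, &dim, ipiv, rhs, &dim,
            work, &lwork, &info );
    /* info == 0 indicates success */
}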

A LAPACK usage example: option 2

A bit of algebraic manipulation yields

d_v = (A P^{-1} A^T)^{-1} (A P^{-1} r_a − r_b),    d_x = P^{-1} r_a − P^{-1} A^T d_v

So first, we solve the system

P [ Ã  r̃_a ] = [ A^T  r_a ]

for Ã and r̃_a; then we can construct and solve

(A Ã) d_v = A r̃_a − r_b

using a second call to dposv; finally, d_x = r̃_a − Ã d_v.

The driver routine dposv solves both of these systems, using the computational routine dpotrf to perform Cholesky factorizations

Why choose?

Why even consider the second method if the first seems so straightforward?

• In some applications, the inverse of P, or its Cholesky factorization, may be readily available or particularly easy to compute
• Structure in P can be more easily exploited
• If A and P are sparse, the Cholesky factorization is often preferred over the LDL^T factorization (see below)

It is worthwhile, therefore, to consider all equivalent options and be able to choose between them

Sparse matrices

[spy diagram of a sparse matrix; nz = 505]

A matrix A ∈ R^{m×n} is sparse if it satisfies two conditions:

• the number of non-zero elements, nnz, is small; i.e., nnz ≪ mn. For the purposes of this definition, any element which is not known to be zero is counted as non-zero
• the matrix has a known sparsity pattern: that is, the location of each of the aforementioned non-zero elements is explicitly known

Why sparse matrices?

Sparse matrices can save memory and time

• Storing a matrix A ∈ R^{m×n} using double precision numbers:
  – Dense: 8mn bytes
  – Sparse: ≈ 16·nnz bytes or less, depending on the storage format
• The operation y ← y + Ax:
  – Dense: mn multiplies and additions
  – Sparse: nnz multiplies and additions
• The operation x ← T^{-1}x, T ∈ R^{n×n} triangular and nonsingular:
  – Dense: n divides, n(n−1)/2 multiplies and additions
  – Sparse: n divides, nnz − n multiplies and additions

Representing sparse matrices

There are a variety of methods for efficiently representing a sparse matrix on a computer. One simple method is to store the data in a 3 × nnz matrix: row indices in the first row, column indices in the second, values in the third

[ 2.5  1.2  0    0   ]        [ 1    1    2    2    2    3    4    4   ]
[ 1.2  3.0  0    0.8 ]   ⇒    [ 1    2    1    2    4    3    2    4   ]
[ 0    0    4.0  0   ]        [ 2.5  1.2  1.2  3.0  0.8  4.0  0.8  2.5 ]
[ 0    0.8  0    2.5 ]

The best storage format is usually dictated by the particular application or computational library

Sparse methods do incur additional overhead, so they are generally used when matrices are "very" sparse; e.g., nnz = O(m + n)
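A minimal sketch of this triplet (coordinate) scheme in C, together with a matrix-vector update y ← y + Ax that touches only the stored non-zeros; the type and field names are illustrative, not from any particular library:

typedef struct {
    int     m, n;        /* dimensions                          */
    int     nnz;         /* number of stored non-zeros          */
    int    *row, *col;   /* row and column index of each entry  */
    double *val;         /* value of each entry                 */
} sparse_triplet;

/* y <- y + A*x: exactly nnz multiplies and additions */
void spmv_add( const sparse_triplet* A, const double* x, double* y )
{
    for ( int k = 0 ; k < A->nnz ; ++k )
        y[ A->row[k] ] += A->val[k] * x[ A->col[k] ];
}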

Sparse BLAS?

Unfortunately there is not a mature, de facto standard sparse matrix BLAS library. Three potential options:

• "Official" Sparse BLAS: a reference implementation is not yet available
  http://www.netlib.org/blas/blast-forum

• NIST Sparse BLAS: an alternative BLAS system; a reference implementation is available
  http://math.nist.gov/spblas

• The Matrix Template Library: the C++ library mentioned above also provides support for sparse matrices
  http://www.osl.iu.edu/research/mtl/intro.php3

Sparse factorizations

There are quite a few libraries for factorizing sparse matrices and solving sparse linear systems, including:

• SPOOLES: http://www.netlib.org/linalg/spooles
• SuperLU: http://crd.lbl.gov/~xiaoye/SuperLU
• TAUCS: http://www.tau.ac.il/~stoledo/taucs
• UMFPACK: http://www.cise.ufl.edu/research/sparse/umfpack

Several factorization methods are available:

• A = P L L^T P^T (Cholesky) for symmetric positive definite systems
• A = P L D L^T P^T for symmetric indefinite systems
• A = P_1 L U P_2^T for general unsymmetric matrices

P, P_1, P_2 are permutations or orderings

Sparse orderings

Sparse orderings can have a dramatic effect on the sparsity of a factorization

[spy diagrams]

These spy diagrams depict the sparsity of a sparse "arrow" matrix and two Cholesky factorizations:

• Left: original matrix (nz = 148)
• Center: Cholesky factorization with no permutation, P = I (nz = 1275)
• Right: Cholesky factorization with the best permutation (nz = 99)

Sparse orderings (continued)

The sparsity of the factorization has a direct bearing on the performance, so choosing an effective ordering is very important

• The general problem of choosing the ordering that produces the sparsest factorization is a combinatorial (NP-hard) problem
• A number of effective heuristics have been developed which produce particularly good results in practice
• In theory, permutations can also have undesirable effects on numerical accuracy—in practice, this is usually not a problem

Symbolic factorization

• For Cholesky and LU factorizations, the sparsity pattern of the factorization depends only on the ordering chosen and the sparsity pattern of the matrix—not on the numerical values of the non-zero elements
• This property enables these factorizations to be divided into two stages: symbolic factorization and numerical factorization (see the sketch below)
• When solving multiple linear systems with identical sparsity patterns—for example, in an iterative optimization algorithm—the symbolic factorization may be computed just once
• LDL^T factorizations for symmetric indefinite matrices do not support symbolic factorization
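As an illustration of the two-stage approach, here is a hedged sketch using UMFPACK's LU interface (compressed-column input Ap/Ai/Ax; the exact calling sequence should be checked against the UMFPACK documentation). The symbolic analysis is performed once and reused across a sequence of matrices that share one sparsity pattern:

#include <umfpack.h>

/* solve A_t x_t = b_t for a sequence of matrices with identical patterns;
   Ap/Ai describe the pattern, Ax_list[t] the values of the t-th matrix */
void solve_sequence( int n, const int* Ap, const int* Ai,
                     double** Ax_list, double** b_list, double** x_list,
                     int num_systems )
{
    void *Symbolic, *Numeric;

    /* symbolic factorization: depends only on the sparsity pattern,
       so it is performed just once */
    umfpack_di_symbolic( n, n, Ap, Ai, Ax_list[0], &Symbolic, NULL, NULL );

    for ( int t = 0 ; t < num_systems ; ++t ) {
        /* numerical factorization: uses the values of this particular matrix */
        umfpack_di_numeric( Ap, Ai, Ax_list[t], Symbolic, &Numeric, NULL, NULL );
        umfpack_di_solve( UMFPACK_A, Ap, Ai, Ax_list[t],
                          x_list[t], b_list[t], Numeric, NULL, NULL );
        umfpack_di_free_numeric( &Numeric );
    }
    umfpack_di_free_symbolic( &Symbolic );
}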

Other areas

There are a number of other areas in numerical linear algebra that have received significant attention:

• Iterative methods for sparse and structured linear systems
• Parallel and distributed methods (MPI)
• Fast linear operators: fast Fourier transforms (FFTs), state-space linear system simulations, etc.
• Low-level details of various factorization and solution methods

In all of these cases, there is considerable existing research, and accompanying public domain (or freely licensed) code

Conclusion

Certainly, your applications, and even some of the key computations, may be unique...

But the bulk of the computational work will likely resemble calculations performed in many other fields

So it makes sense to benefit from the prior efforts of others

The Fortran Legacy

Many of the best numerical libraries are written in FORTRAN 77, and not all have been blessed with "C wrappers". How do you call FORTRAN from C?

• Mapping data types
• Subroutine calling
• Vector indexing
• Matrix indexing
• Complex numbers

The solutions depend in part on the operating system

FORTRAN to C: data types

Mapping FORTRAN data types to C:

Fortran              C          Fortran            C
INTEGER              int        CHARACTER          char
LOGICAL              int        CHARACTER*(*)      char*
REAL*8               double     REAL*4             float
DOUBLE PRECISION     double     REAL               float
DOUBLE COMPLEX       (later)    COMPLEX            (later)

Unfortunately this mapping is not definitive, particularly for INTEGER and LOGICAL, so make sure to check your system’s and compiler’s documentation. (This is accurate for Linux and Windows)

FORTRAN Subroutines

Key points:

• FORTRAN names are case insensitive; C is case sensitive. FORTRAN subroutine names are generally converted to lowercase, and an underscore _ is often appended to them
• FORTRAN differentiates between a SUBROUTINE and a FUNCTION; C basically does not. Treat SUBROUTINEs as void functions in C
• FORTRAN passes all arguments by reference; C passes scalar arguments by value, and vectors/matrices by reference. When calling FORTRAN functions from C, pass pointers to scalar arguments, not the values themselves.

FORTRAN Subroutines (continued)

FORTRAN prototype:

DOUBLE PRECISION FUNCTION DOT_PRODUCT( N, X, Y )
DOUBLE PRECISION X(*), Y(*)
INTEGER N

ANSI C prototype:

double dot_product_( int* n, double* x, double* y );

Example usage:

double x[10], y[10], ans;
int ten = 10;
/* fill x and y with data */
ans = dot_product_( &ten, x, y );

Vector indexing

FORTRAN indexes vectors from 1; C indexes them from 0. This is only a problem if a routine returns an index

For example, the BLAS routine IDAMAX:

INTEGER FUNCTION IDAMAX( N, DX, INCX )
DOUBLE PRECISION DX(*)
INTEGER N, INCX

To properly interpret this function's output in C, you must subtract one:

int one = 1;
int argmax = idamax_( &n, dx, &one ) - 1;

Matrix indexing

C, C++, and Java store matrices in row-major format:

1 2 3 4 5 6 = 1 2 3 4 5 6 7 8 9 ⇒ 7 8 9 £ ¤   But FORTRAN stores matrices in column-major format:

1 2 3 4 5 6 = 1 4 7 2 5 8 3 6 9 ⇒ 7 8 9 £ ¤   My recommendation: do not use C’s built-in support for 2D matrices, and store matrices in FORTRAN (column-major) format

Complex numbers

FORTRAN has a built-in complex number data type; C does not. Two solutions:

Store complex data in real vectors:

double x[20];
#define x_real( k ) (x[2*(k)])
#define x_imag( k ) (x[2*(k)+1])

Or define a struct:

typedef struct { double re, im; } Fortran_Complex;
Fortran_Complex x[10];

The complex class in C++ should work as well
