High Performance Cholesky and Symmetric Indefinite Factorizations
High Performance Cholesky and Symmetric Indefinite Factorizations with Applications

Jonathan David Hogg
Doctor of Philosophy
University of Edinburgh
2010

Abstract

The process of factorizing a symmetric matrix A using the Cholesky (LL^T) or symmetric indefinite (LDL^T) factorization allows the efficient solution of systems Ax = b. This thesis describes the development of new serial and parallel techniques for this problem and demonstrates them in the setting of interior point methods. In serial, the effects of various scalings are reported, and a fast and robust mixed precision sparse solver is developed. In parallel, DAG-driven dense and sparse factorizations are developed for the positive definite case. These achieve performance comparable with other world-leading implementations, using a novel algorithm in the same family as those given by Buttari et al. for the dense problem. The performance of these techniques in the context of an interior point method is assessed.

Declaration

I declare that this thesis was composed by myself and that the work contained therein is my own, except where explicitly stated otherwise in the text.

(Jonathan David Hogg)

Acknowledgements

I would like to thank my PhD supervisor Julian Hall for all his help and interest over the years. Among the other staff members at Edinburgh, I would like to single out my second supervisor Andreas Grothey for his support, and Jacek Gondzio, who took on the role of supervising me for some projects. My colleagues at RAL are also deserving of praise, with much of the work at the core of this thesis done during my year working with them. Specific thanks to Jennifer Scott and John Reid for correcting various technical reports and teaching me how to write better, both code and English. Thanks also to my proofreader and girlfriend Gwen Fyfe, for reading and checking the English, even if most of the mathematics went over her head, and for support and kisses. Many improvements and corrections were also suggested by the thesis examiners, Jacek Gondzio and Patrick Amestoy. Finally, thanks to my office mates and fellow PhD students down the years, for their support and conversation: Kristian Woodsend, Danny Hamilton, Hugh Griffiths, Ed Smith, and the rest.

I will briefly mention the research that didn't quite work out or make it into this thesis: deficient basis simplex methods, structured algebraic modelling languages, parallel generalised upper bound simplex codes, and reduced interior point methods.

Contents

Abstract

I Background

1 Coding for modern processors
  1.1 Programming language
  1.2 Overview
  1.3 Caches
  1.4 Branch prediction
  1.5 SIMD instructions
  1.6 BLAS
  1.7 Parallel programming models
    1.7.1 OpenMP
    1.7.2 MPI
    1.7.3 POSIX threads
    1.7.4 Task-based
  1.8 IEEE floating point
  1.9 Machines

2 Solving linear systems
  2.1 Dense Cholesky factorization
    2.1.1 Serial implementation
    2.1.2 Traditional parallel implementation
    2.1.3 DAG-based parallel implementation
  2.2 Symmetric indefinite factorization
  2.3 Sparse matrices
  2.4 Sparse symmetric factorization
    2.4.1 Preorder
    2.4.2 Analyse
    2.4.3 Factorize
    2.4.4 Solve
    2.4.5 Parallel sparse Cholesky
  2.5 Sparse symmetric indefinite factorization
    2.5.1 Scaling
  2.6 Out-of-core working
  2.7 Refinement of direct solver results
    2.7.1 Iterative refinement
    2.7.2 FGMRES
    2.7.3 Mixed precision
  2.8 HSL solvers
    2.8.1 MA57
    2.8.2 HSL_MA77

3 Interior point methods
  3.1 Karush-Kuhn-Tucker conditions
  3.2 Solution of the KKT systems
  3.3 Higher order methods
    3.3.1 Mehrotra's predictor-corrector algorithm
    3.3.2 Higher order correctors
  3.4 Practical implementation
    3.4.1 Augmented and normal equations
    3.4.2 Numerical errors
  3.5 Nonlinear optimization

II Improved algorithms

4 Which scaling?
  4.1 Why scale?
  4.2 The scalings
  4.3 Methodology
    4.3.1 Performance profiles
  4.4 Numerical experiments
    4.4.1 Scaling by the radix
    4.4.2 Use of unscaled matrix for refinement
    4.4.3 Comparison of hybrid scalings
    4.4.4 Overall results
  4.5 Interesting problems
  4.6 Conclusions

5 A mixed-precision solver
  5.1 Mixed precision
  5.2 Test environment
  5.3 On single precision
  5.4 Basic algorithm
    5.4.1 Solving in single
  5.5 Iterative refinement
  5.6 Preconditioned FGMRES
  5.7 Design and implementation of the mixed-precision strategy
    5.7.1 MA79_factor_solve
    5.7.2 MA79_refactor_solve
    5.7.3 MA79_solve
    5.7.4 Errors and warnings
  5.8 Numerical results
  5.9 Conclusions and future directions

6 Scheduling priorities for dense DAG-based factorizations
  6.1 Introduction
  6.2 Task dispatch
  6.3 Prioritisation
  6.4 Implementation
  6.5 Numerical results
    6.5.1 Choice of scheduling technique
    6.5.2 Scalability
    6.5.3 Block sizes
    6.5.4 Comparison with other codes
  6.6 Conclusions

7 A DAG-based sparse solver
  7.1 Solver framework
    7.1.1 Analyse
    7.1.2 Solve
  7.2 Nodal data structures
  7.3 Tasks
  7.4 Task dispatch engine
  7.5 Increasing cache locality in update_between
  7.6 PaStiX
    7.6.1 The PaStiX algorithm
    7.6.2 Major differences
  7.7 Numerical results
    7.7.1 Test environment
    7.7.2 Dense comparison
    7.7.3 Effect of node amalgamation
    7.7.4 Block size
    7.7.5 Local task stack size
    7.7.6 Speedups and speed for HSL_MA87
  7.8 Comparisons with other solvers
  7.9 Comparisons on other architectures
  7.10 Future work

8 Improved algorithms applied to interior point methods
  8.1 DAG-driven symmetric-indefinite solver
    8.1.1 Modifications
    8.1.2 Results on general problems
  8.2 Ipopt interfaces
  8.3 Numerical results
    8.3.1 Netlib results
    8.3.2 Larger problem results
  8.4 Conclusions

9 Conclusions and Future Work

A Test set details

B Full Ipopt results
  B.1 Netlib LP problems
  B.2 Mittelmann benchmark problems

Notation

Notational conventions

Capital italic letters, e.g. A, L, P: Matrices.
Bold lower case italic letters, e.g. b, x: Vectors.
Lower case italic letters, e.g. i, ε: Scalars.
Subscripted italic lower case letters, e.g. x_i: i-th component of vector x.
Double subscripted lower case italic letters, e.g. a_ij: Entry in row i, column j of matrix A.
Double subscripted capital italic letters, e.g. A_ij: Sub-block of matrix A in position (i, j).
Superscripted with brackets, e.g. x^(k), H^(j): Value of the variable at the superscripted iteration.
Calligraphic subscripted letter, e.g. 𝒜_j: Lower-triangular non-zero set of the corresponding matrix, e.g. A.
Subscripted with brackets, e.g. A_(i:j)(k:l): Submatrix of A containing rows i to j, and columns k to l.
Value with hat, e.g. x̂: Exact analytic solution.

Specific variables and matrices

A: Matrix to be factorized (n × n); alternatively the constraint matrix for a linear program (m × n).
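To connect this notation with the factorizations described in the abstract, the following is a minimal LaTeX sketch of how a Cholesky or symmetric indefinite factorization is used to solve Ax = b. It assumes A is symmetric (and positive definite in the Cholesky case); the intermediate vectors y and z are introduced here purely for illustration and are not part of the thesis notation.

% Minimal sketch: solving Ax = b via LL^T or LDL^T (requires amsmath).
% The intermediate vectors y and z are illustrative names only.
\documentclass{article}
\usepackage{amsmath}
\begin{document}
Given a symmetric positive definite $A$, the Cholesky factorization $A = LL^T$
(with $L$ lower triangular) reduces $Ax = b$ to two triangular solves:
\begin{align*}
  Ly    &= b,  && \text{(forward substitution)} \\
  L^T x &= y.  && \text{(back substitution)}
\end{align*}
For a symmetric indefinite $A$, the factorization $A = LDL^T$ is used instead,
with $D$ (block) diagonal, and the solve gains one extra step:
\[
  Ly = b, \qquad Dz = y, \qquad L^T x = z.
\]
\end{document}

The factorization is computed once; each subsequent right-hand side then costs only these triangular (and diagonal) solves, which is where the efficiency of the approach comes from.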