Analysis and Evaluation of Binary Cascade Iterative Refinement And
Total Page:16
File Type:pdf, Size:1020Kb
MASTERARBEIT Titel der Masterarbeit Analysis and Evaluation of Binary Cascade Iterative ” Refinement and Comparison to other Iterative Refinement Algorithms for Solving Linear Systems“ Verfasser Karl E. Prikopa, BSc angestrebter akademischer Grad Diplom-Ingenieur (Dipl.-Ing.) Wien, 2011 Studienkennzahl lt. Studienblatt: A 066 940 Studienrichtung lt. Studienblatt: Scientific Computing UG2002 Betreuer: Univ.-Ass. Privatdoz. Dr. Wilfried Gansterer Abstract Iterative refinement is a widely used method to improve the round-off errors of a solution of a linear system and is also used in software packets like LAPACK. The cost of the iterative improvement is very low compared to the cost of the factorization of the matrix but results in a solution which can be accurate to machine precision. Many variations on the standard iterative refinement method exist, which use different working precisions to refine the solution. The extra precise iterative refinement can use extended precision to improve the result. The mixed precision iterative refinement tries to exploit the benefits of using lower precisions to compute a solution and then uses iterative refinement to achieve the higher precision accuracy. The focus of this thesis will be the binary cascade iterative refinement (BCIR), which chooses the working precisions according to the input data. This algorithm depends on arbitrary precision arithmetic to support working precisions outside the IEEE standard data types provided by most hardware vendors. The thesis will analyse the properties of BCIR and conduct experiments which will compare the algorithm to other iterative refinement methods and focus on the numerical accuracy and the convergence. The arbitrary precision arithmetic will be implemented using the GNU MPFR software library. The simulated arbitrary precision does not provide accurate information about the gains and losses in performance due to the use of the different precisions. Therefore a per- formance model will be introduced in order to be able to compare the performance of the algorithms and to analyse the possible performance gains, which could be exploited in future works by hardware implementations for example using reconfigurable hardware like FPGAs. Contents Contents i List of Figures iv List of Listings vi 1 Introduction 1 1.1 ThesisOutline ................................... 2 2 Precisions of Floating Point Arithmetic 4 2.1 StandardPrecision ............................... 4 2.1.1 History ................................... 4 2.1.2 IEEEStandard............................... 6 2.2 ArbitraryPrecision.............................. ... 8 2.2.1 Constant Folding with Arbitrary Precision. ....... 9 2.3 Arbitrary-precision Software and Libraries . ........... 9 2.3.1 Hardwaresupport ............................. 10 2.3.2 Stand-aloneSoftware. 10 2.3.3 Programming languages and software libraries . ....... 11 2.4 GNUMPFR .................................... 12 2.4.1 MPFRVariables .............................. 12 2.4.2 MPFRRoundingModes. 14 2.4.3 MPFRFunctions.............................. 14 2.4.4 MPFRExampleSourceCode . 15 2.5 PerformanceEvaluation . ... 16 3 Iterative Refinement 19 3.1 StandardIterativeRefinement(SIR) . ...... 19 3.2 ExtraPreciseIterative Refinement (EPIR) . ....... 21 3.3 Mixed Precision Iterative Refinement (MPIR) . ........ 22 3.4 ModelEstimatingNumberofIterations . ...... 25 i 4 BCIR - Binary Cascade Iterative Refinement 26 4.1 TheAlgorithm ................................... 26 4.2 TheParameters................................... 27 4.2.1 Analysis of the Levels of Recursion p ................... 30 4.2.2 Analysis of the Precisions βj ....................... 31 4.3 RefinedVersionofBCIR............................. 33 4.4 Accessing Matrix A and Vector b atDifferentPrecisions . 35 4.5 Conclusion ..................................... 36 5 Implementation 37 5.1 Requirements .................................... 37 5.2 GenerateMatrix .................................. 38 5.3 FileFormat ..................................... 39 5.4 BlockLUdecomposition. 39 5.5 ImplementedPrograms. 40 5.5.1 FileStructureoftheResults . 41 5.5.2 Binary Cascade Iterative Refinement (BCIR) . ..... 42 5.5.3 StandardIRandMPIR(APIR) . 44 5.5.4 Extra Precise Iterative Refinement (EPIR) . ..... 46 5.5.5 DirectLUSolver(NoIR). 47 6 Experiments 48 6.1 GeneratedData................................... 49 6.2 TargetandWorkingPrecisions . .... 50 6.3 TerminationCriteria . ... 50 6.4 PrecisionsusedbyBCIR. 51 6.5 Comparison of Iterative Refinement Methods . ....... 53 6.5.1 RandomMatrices.............................. 53 6.5.2 HilbertMatrices .............................. 58 6.6 ExperimentsforRefinedVersionofBCIR . ..... 59 6.6.1 PrecisionsusedbyRefinedBCIR . 60 6.6.2 NumberofIterations. 61 6.6.3 RelativeResidual.............................. 62 6.6.4 ConclusionofRefinedBCIR. 63 7 Performance Model 64 7.1 PerformanceModels ............................... 65 7.1.1 PerformanceModelforDirectLUSolver. .... 65 7.1.2 PerformanceModelforSIR . 65 7.1.3 PerformanceModelforMPIR. 65 7.1.4 PerformanceModelforEPIR . 65 7.1.5 PerformanceModelforBCIR . 67 7.2 ResultsfortheModelledSpeed-Up . ..... 68 ii 8 Conclusion 71 Bibliography 73 A Additional and Extended Information 78 A.1 List of available Arbitrary Precision Libraries . ............ 78 A.2 ProjectFiles .................................... 80 A.3 TestEnvironment................................. 81 B Summaries and Curriculum Vitae 82 B.1 EnglishSummary.................................. 82 B.2 DeutscheZusammenfassung . ... 84 B.3 CurriculumVitae................................. 86 iii List of Figures 2.1 General representation of floating-point numbers [48].............. 7 2.2 Slow down effect of a matrix multiplication using the built-in C variable type double and arbitrary precision packages GNU MPFR and ARPREC compared toDGEMMfromtheATLASBLASlibrary . 17 2.3 Slow down effect of a matrix multiplication using the GNU MPFR library, comparedtoDGEMMfromtheATLASBLASlibrary . 18 3.1 Comparison of the performance of the direct solver and the mixed precision iterative refinement using the Cell BE processor on the Sony PlayStation 3 [27]. 24 4.1 Initial working precisions β0 coloured by the number of recursion levels p . 31 4.2 Initial working precisions β0 for the two matrix sizes n = 10 and n = 1000 31 4.3 Initial working precisions β0 coloured by the condition number κ ....... 32 4.4 Final working precisions βp coloured by the number of recursion levels p ... 33 4.5 Final working precisions βp for the two matrix sizes n = 10 and n = 1000 33 4.6 Initial working precisions β0 coloured by the number of recursion levels p using the definition of c in Equation (4.22) ....................... 34 4.7 Initial working precisions β0 for the two matrix sizes n = 10 and n = 1000 using the definition of c in Equation (4.22) ................... 34 4.8 Final working precisions βp coloured by the number of recursion levels p using the definition of c in Equation (4.22) ....................... 35 4.9 Final working precisions βp for the two matrix sizes n = 10 and n = 1000 using the definition of c in Equation (4.22) .................... 35 5.1 Block LU factorization [30] ............................ 40 6.1 Precisions βj used throughout the iterative process for n = 10 plotted on the left y-axis and the relative residual depicted by the orange dotted line on the righty-axis ..................................... 51 6.2 Precisions βj used throughout the iterative process for n = 2000 plotted on the left y-axis and the relative residual depicted by the orange dotted line on therighty-axis ................................... 52 6.3 Number of iterations for the different iterative refinement methods for n = 100 53 6.4 Number of iterations for the different iterative refinement methods for n = 2000 54 6.5 Number of iterations for the different iterative refinement methods for κ = 103 55 6.6 Relative residual for different iterative refinement methods for n = 2000 . 56 iv 6.7 Relative residual for different iterative refinement methods for κ = 103 .... 57 6.8 Relative residual for different iterative refinement methods for κ = 107 .... 57 6.9 Number of iterations for the different iterative refinement methods for Hilbert matrices....................................... 58 6.10 Relative residual for different iterative refinement methods for Hilbert matrices 59 6.11 Precisions βj used throughout the iterative process by the refined version of BCIR plotted on the left y-axis and the relative residual depicted by the orange dottedlineontherighty-axis. 60 6.12 Number of iterations for the different iterative refinement methods for n = 1000 61 6.13 Relative residual for different iterative refinement methods for κ =1 ..... 62 7.1 Modelled speed-up for different system sizes . ......... 69 7.2 Modelled speed-up for different condition numbers . .......... 70 v Listings 2.1 The definition of the data type mpfr t ...................... 13 2.2 All available MPFR functions to add two numbers using different input data types......................................... 14 2.3 DotproductimplementedinstandardC . ..... 15 2.4 Dot product implemented using the GNU MPFR library . ....... 16 5.1 Example for an output file created by bcir ................... 43 5.2 Example for an XML file created by bcir .................... 44 5.3 Example for an output file created by apir ................... 45 5.4 Example for an XML file created by apir .................... 45 5.5 Example for an output file created by apir ................... 47 5.6 Example for an XML file created