High-Performance Decimal Floating-Point Units

UNIVERSIDADE DE SANTIAGO DE COMPOSTELA DEPARTAMENTO DE ELECTRONICA´ E COMPUTACION´ PhD. Dissertation High-Performance Decimal Floating-Point Units Alvaro´ Vazquez´ Alvarez´ Santiago de Compostela, January 2009 To my family A´ mina˜ familia Acknowledgements It has been a long way to see this thesis successfully concluded, at least longer than what I imagined it. Perhaps the moment to thank and acknowledge everyone’s contributions is the most eagerly awaited. This thesis could not have been possible without the support of several people and organizations whose contributions I am very grateful. First of all, I want to express my sincere gratitude to my thesis advisor, Elisardo Antelo. Specially, I would like to emphasize the invaluable support he offered to me all these years. His ideas and contributions have a major influence on this thesis. I would like to thank all people in the Departamento de Electronica´ e Computacion´ for the material and personal help they gave me to carry out this thesis, and for providing a friendly place to work. In particular, I would like to mention to Prof. Javier D. Bruguera and the other staff of the Computer Architecture Group. Many thanks to Paula, David, Pichel, Marcos, Juanjo, Oscar,´ Roberto and my other workmates for their friendship and help. I am very grateful to IBM Germany for their financial support though a one-year research contract. I would like to thank Ralf Fischer, lead of hardware development, and Peter Roth and Stefan Wald, team managers at IBM Deutchland Entwicklung in Boblingen.¨ I would like to extend my gratitude to the FPU design team, in special to Silvia Muller¨ and Michael Kroner,¨ for their help and the warm welcome I received during my stay in Boblingen.¨ I would also like to thank Eric Schwarz from IBM for his support. Many thanks to Paolo Montuschi from Politecnico di Torino for his collaboration in several parts of this research. Finally, I want to thank the institutions that have financially supported this research through trip grants for attending to different conferences: Universidade de Santiago de Com- postela, Ministerio de Ciencia y Tecnolog´ıa (Ministry of Science and Technology) of Spain under contracts TIN2004-07797-C02 and TIN2007-67537-C03, and Xunta de Galicia under contract PGIDT03-TIC10502PR. “Before computers, I used my 10 fingers, but now ::: what am I supposed to do with the other 9 ?” Anonymous Contents 1 Decimal Computer Arithmetic: An Overview 1 1.1 The evolution of ALUs: decimal vs. binary . 2 1.2 The new financial and business demands . 6 1.3 Decimal floating-point: Specifications, standard and implementations . 10 1.4 Current and future trends . 12 2 Decimal Floating-Point Arithmetic Units 15 2.1 IEEE 754-2008 standard for floating-point . 15 2.2 Decimal floating-point unit design . 21 2.3 Decimal arithmetic operations for hardware acceleration . 23 3 10’s Complement BCD Addition 25 3.1 Previous work on BCD addition/subtraction . 26 3.1.1 Basic 10’s complement algorithm . 27 3.1.2 Direct decimal addition . 29 3.1.3 Speculative decimal addition . 33 3.2 Proposed method: conditional speculative decimal addition . 37 3.3 Proposed architectures . 42 3.3.1 Binary prefix tree architectures . 42 3.3.2 Hybrid prefix tree/carry-select architectures . 45 3.3.3 Ling prefix tree architectures . 51 3.4 Sum error detection . 55 3.4.1 2’s complement binary addition . 57 3.4.2 10’s complement BCD addition . 57 3.4.3 Mixed binary/BCD addition . 61 3.5 Evaluation results and comparison . 62 3.5.1 Evaluation results . 62 3.5.2 Comparison . 66 3.6 Conclusions . 71 4 Sign-Magnitude BCD Addition 73 4.1 Basic principles . 73 4.2 Sign-magnitude BCD speculative adder . 75 4.3 Proposed method for sign-magnitude addition . 78 4.4 Architectures for the sign-magnitude adder . 79 4.4.1 Binary prefix tree architecture . 80 i ii Contents 4.4.2 Hybrid prefix/carry-select architecture . 82 4.4.3 Ling prefix tree architectures . 83 4.5 Sum error detection . 85 4.6 Evaluation results and comparison . 87 4.6.1 Evaluation results . 88 4.6.2 Comparison . 89 4.7 Conclusions . 91 5 Decimal Floating-Point Addition 93 5.1 Previous work on DFP addition . 94 5.1.1 IEEE 754-2008 compliant DFP adders . 94 5.1.2 Significand BCD addition and rounding . 98 5.2 Proposed method for combined BCD significand addition and rounding . 102 5.3 Architecture of the significand BCD adder with rounding . 109 5.3.1 Direct implementation of decimal rounding . 112 5.3.2 Decimal rounding by injection . 114 5.4 Evaluation results and comparison . 115 5.4.1 Evaluation results . 115 5.4.2 Comparison . 116 5.5 Conclusions . 117 6 Multioperand Carry-Free Decimal Addition 119 6.1 Previous Work . 120 6.2 Proposed method for fast carry-save decimal addition . 121 6.2.1 Alternative Decimal Digit Encodings . 122 6.2.2 Algorithm . 123 6.3 Decimal 3:2 and 4:2 CSAs . 124 6.3.1 Gate level implementation . 124 6.3.2 Implementation of digit recoders . 125 6.4 Decimal carry-free adders based on reduction of bit columns . 128 6.4.1 Bit counters . 128 6.4.2 Architecture . 130 6.5 Decimal and combined binary/decimal CSA trees . 130 6.5.1 Basic implementations . 131 6.5.2 Area-optimized implementations . 132 6.5.3 Delay-optimized implementations . 132 6.5.4 Combined binary/decimal implementations . 135 6.6 Evaluation results and comparison . 137 6.6.1 Evaluation results . 137 6.6.2 Comparison . 138 6.7 Conclusions . 139 7 Decimal Multiplication 141 7.1 Overview of DFX multiplication and previous work . 142 7.2 Proposed techniques for parallel DFX multiplication . 143 7.3 Generation of partial products . 144 Contents iii 7.3.1 Generation of multiplicand’s multiples . 144 7.3.2 Signed-digit multiplier recodings . 146 7.4 Reduction of partial products . 150 7.5 Decimal fixed-point architectures . 152 7.5.1 Decimal SD radix-10 multiplier . 152 7.5.2 Decimal SD radix-5 multiplier . 153 7.5.3 Combined binary/decimal SD radix-4/radix-5 multipliers . 153 7.6 Decimal floating-point architectures . 154 7.6.1 DFP multipliers . 157 7.6.2 Decimal FMA: Fused-Multiply-Add . 159 7.7 Evaluation results and comparison . 161 7.7.1 Evaluation results . 162 7.7.2 Comparison . 163 7.8 Conclusions . 164 8 Decimal Digit-by-Digit Division 165 8.1 Previous work . 165 8.2 Decimal floating-point division . 166 8.3 SRT radix-10 digit-recurrence division . 168 8.3.1 SRT non restoring division . 168 8.3.2 Decimal representations for the operands . 169 8.3.3 Proposed algorithm . 171 8.3.4 Selection function . 173 8.4 Decimal fixed-point architecture . 175 8.4.1 Implementation of the datapath . 175 8.4.2 Operation sequence . 178 8.4.3 Implementation of the selection function . 179 8.4.4 Implementation of the decimal (5421) coded adder . 182 8.5 Evaluation and comparison . 184 8.5.1 Evaluation results . 184 8.5.2 Comparison . 185 8.6 Conclusions . 186 Conclusions and Future Work 187 A Area and Delay Evaluation Model for CMOS circuits 191 A.1 Parametrization of the Static CMOS Gate Library . 192 A.2 Path delay evaluation and optimization . 197 A.3 Optimization of buffer trees and forks . 199 A.4 Area and delay estimations of some basic components . 202 Bibliography 213 iv Contents List of Figures 1.1 Example of a decimal tax calculation using binary floating-point. ............ 9 2.1 DFP interchange format encodings. ............................ 18 2.2 Configurations for the architecture of the DFU. ..................... 22 3.1 10’s complement Addition/Subtraction Algorithm. 28 3.2 Direct Decimal Algorithm. 29 3.3 Direct decimal adder with direct BCD sum. ....................... 31 3.4 Mixed binary/direct decimal adder using a hybrid configuration. 32 3.5 Decimal Speculative Algorithm. 34 3.6 10’s complement.

High-Performance Decimal Floating-Point Units

Intel 8080 Datasheet

45-Year CPU Evolution: One Law and Two Equations

Appendix D an Alternative to RISC: the Intel 80X86

The Birth, Evolution and Future of Microprocessor

Professor Won Woo Ro, School of Electrical and Electronic Engineering Yonsei University the Intel® 4004 Microprocessor, Introdu

How Microprocessors Work E 1 of 64 ZM

Communication Theory II

Programmable Digital Microcircuits - a Survey with Examples of Use

Introduction to Cpu

(J. Bayko) Date: 12 Jun 92 18:49:18 GMT Newsgroups: Alt.Folklore.Computers,Comp.Arch Subject: Great Microprocessors of the Past and Present

A Hybrid-Parallel Architecture for Applications in Bioinformatics

Architecture and Implementation