Arithmetic Logic Unit Architectures with Dynamically Defined Precision
Total Page:16
File Type:pdf, Size:1020Kb
University of Tennessee, Knoxville TRACE: Tennessee Research and Creative Exchange Doctoral Dissertations Graduate School 12-2015 ARITHMETIC LOGIC UNIT ARCHITECTURES WITH DYNAMICALLY DEFINED PRECISION Getao Liang University of Tennessee - Knoxville, [email protected] Follow this and additional works at: https://trace.tennessee.edu/utk_graddiss Part of the Computer and Systems Architecture Commons, Digital Circuits Commons, and the VLSI and Circuits, Embedded and Hardware Systems Commons Recommended Citation Liang, Getao, "ARITHMETIC LOGIC UNIT ARCHITECTURES WITH DYNAMICALLY DEFINED PRECISION. " PhD diss., University of Tennessee, 2015. https://trace.tennessee.edu/utk_graddiss/3592 This Dissertation is brought to you for free and open access by the Graduate School at TRACE: Tennessee Research and Creative Exchange. It has been accepted for inclusion in Doctoral Dissertations by an authorized administrator of TRACE: Tennessee Research and Creative Exchange. For more information, please contact [email protected]. To the Graduate Council: I am submitting herewith a dissertation written by Getao Liang entitled "ARITHMETIC LOGIC UNIT ARCHITECTURES WITH DYNAMICALLY DEFINED PRECISION." I have examined the final electronic copy of this dissertation for form and content and recommend that it be accepted in partial fulfillment of the equirr ements for the degree of Doctor of Philosophy, with a major in Computer Engineering. Gregory D. Peterson, Major Professor We have read this dissertation and recommend its acceptance: Jeremy Holleman, Qing Cao, Joshua Fu Accepted for the Council: Carolyn R. Hodges Vice Provost and Dean of the Graduate School (Original signatures are on file with official studentecor r ds.) ARITHMETIC LOGIC UNIT ARCHITECTURES WITH DYNAMICALLY DEFINED PRECISION A Dissertation Presented for the Doctor of Philosophy Degree The University of Tennessee, Knoxville Getao Liang December 2015 DEDICATION This dissertation is dedicated to my wife, Jiemei, and my family. ii ACKNOWLEDGEMENTS First and foremost, I would like to express my deepest gratitude to everyone who offered support and encouragement during this long and meaningful journey. Without their generosity and patience, none of this would have been possible. I would like to specially thank my advisor Dr. Gregory D. Peterson for his continuous support and insightful guidance toward completing my Ph.D degree. His valuable advice helped me overcome numerous obstacles over the years, and his inspiration and philosophy will always play an important part in my future life. I would also like to thank my committee member Dr. Jeremy Holleman for his encouragement and help with standard cell libraries and CAD tools. I also want to thank Dr. Qing Cao and Dr. Joshua S. Fu for serving on my doctoral committee and offering constructive advices that greatly improved my dissertation. In addition, I would like to express again my great appreciation to Dr. Donald W. Bouldin for his mentorship and support. Special thanks to my friends and colleagues at the Tennessee Advanced Computing Laboratory (TACL) for their help, encouragement and support. Thank you very much, JunKyu Lee, Depeng Yang, Yeting Feng, Yu Du, Shuang Gao, Rui Ma, Gwanghee Son, David Jenkins and Rick Weber. I would also like to thank Song Yuan, Kevin Tham, Liang Zuo, Kai Zhu, Kun Zheng, and my long-time roommate Shiliang Deng for being supportive and buddies. Heartfelt thanks to my beloved family for always being there to support me. iii ABSTRACT Modern central processing units (CPUs) employ arithmetic logic units (ALUs) that support statically defined precisions, often adhering to industry standards. Although CPU manufacturers highly optimize their ALUs, industry standard precisions embody accuracy and performance compromises for general purpose deployment. Hence, optimizing ALU precision holds great potential for improving speed and energy efficiency. Previous research on multiple precision ALUs focused on predefined, static precisions. Little previous work addressed ALU architectures with customized, dynamically defined precision. This dissertation presents approaches for developing dynamic precision ALU architectures for both fixed-point and floating-point to enable better performance, energy efficiency, and numeric accuracy. These new architectures enable dynamically defined precision, including support for vectorization. The new architectures also prevent performance and energy loss due to applying unnecessarily high precision on computations, which often happens with statically defined standard precisions. The new ALU architectures support different precisions through the use of configurable sub-blocks, with this dissertation including demonstration implementations for floating point adder, multiply, and fused multiply-add (FMA) circuits with 4-bit sub- blocks. For these circuits, the dynamic precision ALU speed is nearly the same as traditional ALU approaches, although the dynamic precision ALU is nearly twice as large. iv TABLE OF CONTENTS CHAPTER I INTRODUCTION ........................................................................................ 1 CHAPTER II LITERATURE REVIEW ........................................................................... 8 A. Software Solutions for Arbitrary Precision ......................................................................... 9 B. Statically Defined Multiple Precision Solutions ............................................................... 10 C. Multiple Precision Solutions for Fixed-point ALUs ......................................................... 11 D. Multiple Precision Solutions for Floating-point ALUs ..................................................... 13 E. Literature Overview........................................................................................................... 16 CHAPTER III FIXED-POINT ALU ARCHITECTURES.............................................. 18 A. Dynamically Defined Precision ......................................................................................... 18 B. Dynamically Defined Precision Adder Architectures ....................................................... 20 1) Carry Extraction and Insertion ...................................................................................... 21 2) Carry Manipulation ....................................................................................................... 22 C. Dynamically Defined Precision Multiplier Architectures ................................................. 23 1) Dynamic Precision Partial Product Generator ............................................................... 24 2) Reduction Tree and Final Faster Adder ......................................................................... 25 D. Dynamically Defined Precision Multiply-Accumulator Architectures ............................. 26 CHAPTER IV FLOATING-POINT ALU ARCHITECTURES ..................................... 29 A. Data Organization .............................................................................................................. 31 B. Floating-point Multiplier Architectures............................................................................. 34 1) Vectorized Unpacking and Packing .............................................................................. 36 2) Vectorized Exception Handling .................................................................................... 37 3) Vectorized Exponent Processing ................................................................................... 38 4) Vectorized Significand Multiplication .......................................................................... 40 5) Vectorized Significand Normalization and Rounding ................................................... 49 C. Floating-point Adder Architectures ................................................................................... 52 1) Overview of Dynamic Precision Adders ....................................................................... 55 2) Vectorized Sign Processing ........................................................................................... 60 3) Vectorized Exponent Processing ................................................................................... 62 4) Vectorized Significand Comparator and LZD ............................................................... 65 5) Vectorized Barrel Shifter ............................................................................................... 69 v 6) Vectorized Significand Adder ....................................................................................... 73 D. Floating-point Fused-Multiply-Add Architectures ............................................................ 75 1) Vectorized Radix Alignment for FMA Units ................................................................ 79 2) Vectorized Significand Addition for FMA Units .......................................................... 83 3) Vectorized LZD for FMA Units .................................................................................... 85 CHAPTER V IMPLEMENTATION AND ANALYSIS ................................................ 87 A. Fixed-point Adders ............................................................................................................ 88 B. Fixed-point Multipliers ...................................................................................................... 92 C. Eight-mode Floating-point ALUs .....................................................................................