Imperial College London Department of Electrical and Electronic Engineering
Lossy Polynomial Datapath Synthesis
Theo Drane
February 2014
Supervised by George A. Constantinides and Peter Y. K. Cheung
Submitted in part fulfilment of the requirements for the degree of Doctor of Philosophy in Electrical and Electronic Engineering of Imperial College London and the Diploma of Imperial College London
1 The copyright of this thesis rests with the author and is made available under a Creative Commons Attribution Non-Commercial No Derivatives licence. Researchers are free to copy, distribute or transmit the thesis on the condition that they attribute it, that they do not use it for commercial purposes and that they do not alter, transform or build upon it. For any reuse or redistribution, researchers must make clear to others the licence terms of this work.
2 Abstract
The design of the compute elements of hardware, its datapath, plays a cru- cial role in determining the speed, area and power consumption of a device. The building blocks of datapath are polynomial in nature. Research into the implementation of adders and multipliers has a long history and develop- ments in this area will continue. Despite such efficient building block imple- mentations, correctly determining the necessary precision of each building block within a design is a challenge. It is typical that standard or uniform precisions are chosen, such as the IEEE floating point precisions. The hard- ware quality of the datapath is inextricably linked to the precisions of which it is composed. There is, however, another essential element that determines hardware quality, namely that of the accuracy of the components. If one were to implement each of the official IEEE rounding modes, significant dif- ferences in hardware quality would be found. But in the same fashion that standard precisions may be unnecessarily chosen, it is typical that compo- nents may be constructed to return one of these correctly rounded results, where in fact such accuracy is far from necessary. Unfortunately if a lesser accuracy is permissible then the techniques that exist to reduce hardware implementation cost by exploiting such freedom invariably produce an error with extremely difficult to determine properties. This thesis addresses the problem of how to construct hardware to effi- ciently implement fixed and floating-point polynomials while exploiting a global error freedom. This is a form of lossy synthesis. The fixed-point contributions include resource minimisation when implementing mutually exclusive polynomials, the construction of minimal lossy components with guaranteed worst case error and a technique for efficient composition of such components. Contributions are also made to how a floating-point polyno- mial can be implemented with guaranteed relative error.
3 Acknowledgements
This thesis is another strand that links Imagination Technologies with Impe- rial College London’s Department of Electrical and Electronic Engineering. Its inception is a testament to the vision of Imagination Technologies’ Tony King-Smith and Martin Ashton and Imperial College London’s Peter Che- ung and George Constantinides and for its execution it owes much to many. The thesis has grown at the same time that Imagination’s Datapath group has grown. I have been incredibly fortunate to find six incredibly talented engineers who, luckily for me, took an interest in a field which I fell into quite by accident many years ago and have since made my home. For all their work, enthusiasm, ingenuity, tenacity and particular independence I sincerely thank Casper Van Benthem, Freddie Exall, Thomas Rose, Sam Elliott, Kenneth Rovers and Emiliano Morini. Particular mention goes to Freddie’s staggering array of essential tools TBGen, Hectorator, Paretor and Decorator, Thomas for FRator, Casper and Kenneth’s implementations of faithfully rounded floating-point multipliers and adders respectively and Sam’s floating-point verification assistance. All without which this work would not have been possible. Thanks also to Imagination’s Raeeka Yas- saie and Imperial’s Clare Drysdale for all of their support and fascinating conversations on all matters industrial academic. Gratitude goes to Clifford Gibson and Jonathan Redshaw for being very understanding line managers as I tried to navigate two complementary yet competing sets of obligations. I would like to thank Carl Pixley and Himanshu Jain whose development of the paradigm shifting Synopsys’ Hector has enabled my group to flourish, provided fascinating insight into our verification challenges and enabled me to share a verification success story in multiple countries. On the Imperial side, sincere thanks goes to David Boland, Samuel Bayliss and Josh Levine of the Circuits and Systems Group for their technical in- sight, critique and refreshingly broad conversation. Being involved in the shadow management of the EDA club beside David and Sam was a very
4 interesting journey, thanks to my companions for the ride. Thanks to EDA members for enriching my knowledge by the various talks and David Thomas for sharing his interesting work, unique approach to the field, clarity of pre- sentation and critique. I offer my supervisor, George Constantinides, my sincere gratitude for pushing this work to be the best it can be, his insight, vision, direction and incredible ability to quickly understand ideas, despite being presented, on occasion, with truly incomprehensible material. It is essential to thank my parents for giving me blank paper from a young age and luckily we all still use it. Finally it remains to thank my partner, James Van Der Walt, for his unending support and whose own work constantly reminded me that ultimately we do our work to improve the lives of others. I believe my work has enriched the field, improved and pushed my group’s work, added value to the company and its products. This thesis has its roots in the continuous design challenges thrown up on a daily basis by working in the mobile graphics hardware sector. It is a truly exciting area for computer arithmetic, large amounts of datapath constrained by ever changing constraints on power, area, speed, precision and accuracy. Hopefully this thesis contributes a little to giving engineers the tools they need to do what must be done. I hope aspects of this thesis will live on in tools and code within Imagination, will be extended and improved by others and ultimately find its way into devices used by many. To those who may be interested in having one foot firmly in an academic setting and another in an industrial one and who wishes to stand and nudge the tide in both, I would say that it is one of the best things I have ever done.
5 Contents
1. Introduction 22 1.1. Statement of Originality ...... 25 1.2. Impact ...... 25 1.3. Thesis Organisation ...... 28
2. Background 30 2.1. Logic and Datapath Synthesis ...... 32 2.1.1. Integer Addition and Multiplication ...... 35 2.1.2. Datapath Synthesis Flow ...... 43 2.2. High-Level Synthesis ...... 44 2.3. Lossy Synthesis ...... 49 2.3.1. Word-length Optimisation ...... 49 2.3.2. Imprecise Operators ...... 51 2.3.3. Gate Level Imprecision ...... 53 2.4. Polynomials in Datapath ...... 55 2.5. Context of Contribution ...... 57
3. Preliminaries 60 3.1. Polynomials and Polynomial Rings ...... 61 3.2. Multivariate Division ...... 63 3.3. Gr¨obnerBases ...... 67 3.4. Elimination and Solutions to Systems of Multivariate Poly- nomials ...... 70 3.5. Membership of Vanishing Ideals ...... 73
4. Lossless Fixed-Point Polynomial Optimisation 79 4.1. Arithmetic Sum-Of-Products ...... 82 4.2. Motivational Example ...... 85
6 4.3. Control Logic Minimisation ...... 88 4.3.1. Determining the Total Degrees of Monomials in p . . . 89 4.3.2. Solving the Optimisation Problem ...... 92 4.3.3. Lifting the Restrictions ...... 95 4.3.4. Changing Variables ...... 98 4.4. Optimising the Optional Negations ...... 100 4.5. Overall Flow ...... 105 4.6. Formal Verification ...... 107 4.6.1. The Waterfall Verification ...... 112 4.6.2. Overall Verification Flow ...... 113 4.7. Experiments ...... 116 4.7.1. Control Logic Minimisation Experiments ...... 117 4.7.2. Optional Negation Experiments ...... 124 4.7.3. General Experiments ...... 128 4.7.4. Formal Verification Experiments ...... 135
5. Lossy Fixed-Point Components 137 5.1. Integer Multiplication ...... 139 5.1.1. Background on Truncation Multiplier Schemes . . . . 139 5.1.2. Constructing Faithfully Rounded Multipliers . . . . . 143 5.1.3. Error Properties of Truncated Multipliers ...... 154 5.2. Multioperand Addition ...... 157 5.2.1. Optimal Truncation Theorem ...... 159 opt 5.2.2. Lemma 1: li = hi for i < k ...... 160 opt 5.2.3. Lemma 2: li = 0 for i > k ...... 161 5.2.4. Faithfully Rounded Array Theorem ...... 162 5.2.5. Application to Multiplier Architectures ...... 163 5.2.6. Experiments ...... 166 5.3. Constant Division ...... 172 5.3.1. Conditions for Round Towards Zero Schemes . . . . . 173 5.3.2. Conditions for Round To Nearest, Even and Faithfully Rounded Schemes ...... 175 5.3.3. Hardware Cost Heuristic ...... 176 5.3.4. Optimal Round Towards Zero Scheme ...... 176 5.3.5. Optimal Rounding Schemes ...... 179 5.3.6. Extension to Division of Two’s Complement Numbers 179
7 5.3.7. Application to the Multiplication of Unsigned Normalised Numbers ...... 180 5.3.8. Experiments ...... 181 5.4. Formal Verification of Lossy Components ...... 182
6. Lossy Fixed-Point Polynomial Synthesis 185 6.1. Exploiting Errors for Arbitrary Arrays and Constant Division 187 6.2. Array Constructions ...... 189 6.3. Navigation of the Error Landscape ...... 192 6.3.1. Developing a Heuristic for the Navigation of the Error Landscape ...... 193 6.3.2. Example Navigations of the Error Landscape . . . . . 196 6.3.3. Procedure for Error Landscape Navigation ...... 210 6.4. Fixed-Point Polynomial Hardware Heuristic Cost Function . . 214 6.4.1. Binary Integer Adder ...... 214 6.4.2. Binary Rectangular Arrays ...... 215 6.4.3. Arbitrary Binary Array ...... 217 6.4.4. Multipliers ...... 217 6.4.5. Squarers ...... 218 6.4.6. Constant Multiplication ...... 219 6.4.7. Example Heuristic Calculation Cost ...... 219 6.5. Experiments ...... 223 6.5.1. Comparison of Word-length Optimisation versus Ar- ray Truncation ...... 224 6.5.2. Validation of Area Heuristic Cost Function ...... 232 6.5.3. Quality of Error Landscape Navigation ...... 233 6.5.4. Conclusion ...... 238
7. Lossy Floating-Point Polynomial Synthesis 240 7.1. Floating-Point Background and Algorithms for Implementing Polynomials ...... 241 7.1.1. Error-Free Transformations ...... 243 7.1.2. Polynomial Evaluation ...... 244 7.2. Lossy Floating-Point Components ...... 249 7.2.1. Lossy Floating-Point Multipliers ...... 249 7.2.2. Faithfully Rounded Floating-Point Multiplier Theorem 250
8 7.3. Accurate Evaluation and Allowable Varieties ...... 254 7.4. Accurate Evaluation of the Motzkin Polynomial ...... 257 7.4.1. Partitioning the Domain ...... 259 7.4.2. Accurate Evaluation Far from the Variety ...... 262 7.4.3. Accurate Evaluation Close to the Variety ...... 267 7.4.4. Dealing with Floating-Point Error in the Branch Con- ditions ...... 271 7.4.5. Algorithm for the Accurate Evaluation of the Motzkin Polynomial ...... 274 7.5. Towards an Algorithm in the General Case ...... 279 7.5.1. Determining the Allowable Nature of the Variety, Au- tomatic Test Pattern Generation and Component Sug- gestion ...... 280 7.5.2. Input Domain Partitioning ...... 281 7.5.3. General Accurate Evaluation Far from the Variety . . 284 7.5.4. General Accurate Evaluation Close to the Variety . . 286 7.5.5. General Handling of Floating-Point Errors in the Branch Conditions ...... 286 7.6. Conclusion ...... 287
8. Conclusion 288 8.1. Future Work ...... 291 8.2. Final Words ...... 293
Bibliography 294
Appendices 313
A. Error Expectation and Variance of CCT 314 A.1. Error Expectation ...... 314 A.1.1. Lemma 2 ...... 314 A.1.2. Statistics of T ...... 315 A.2. Error Variance ...... 320 2 A.2.1. E(T (A, B, C)) ...... 320 2 A.2.2. E 4k ...... 321 A.2.3. E (T (A, B, C)4k) ...... 321 A.2.4. The Error Variance ...... 324
9 B. Faithfully Rounded FP Addition 325
C. The Condition for the Use of F32/F64 Far from x = y = z 330 C.1. The Conditions for the Use of F32 ...... 330 C.2. The Conditions for the Use of F64 ...... 333
D. The Condition for the Use of F32/F64 Far from z = x = 0 337 D.1. The Conditions for the Use of F32 ...... 337 D.2. The Conditions for the Use of F64 ...... 342
E. The Condition for the Use of F32/F64 Far from the Variety347 E.1. The Conditions for the Use of F32 ...... 347 E.2. The Conditions for the Use of F64 ...... 354
F. Calculation of ∆ and pmin Near x = y = z 362 F.1. Calculation of ∆ ...... 362
F.2. Calculation of pmin ...... 365
G. Calculation of ∆ and pmin Near x = z = 0 368 G.1. Calculation of ∆ ...... 368
G.2. Calculation of pmin ...... 372
H. Calculation of pmin Just Off the Origin and Far from the Variety 374
10 List of Tables
2.1. Commercial ASIC Synthesis Tools...... 33 2.2. Commercial LEC Tools...... 33 2.3. Datapath Synthesis Tools and Libraries...... 34 2.4. High-Level Synthesis Tools...... 46
4.1. Synthesis Results for Sample SOPs...... 84 4.2. Integer Program Runtimes...... 123 4.3. Integer Program Runtimes...... 132 4.4. Area Benefits of the Proposed Flow...... 135 4.5. Formal Verification Runtimes...... 136
5.1. X values for VCT worst case error vectors...... 154 5.2. X values for VCT worst case error vectors...... 157 5.3. X± and Y ± for RTZ, RTE and FR...... 179
6.1. Area cost of Parallel Prefix Adder...... 215 6.2. Characteristics of the Partial Product Arrays of a Cubic Poly- nomial Implementation...... 220 6.3. Cubic Polynomial Heuristic Area Cost...... 221 6.4. Runtimes of Hector Lemmas Determining Non Zero Partial Product Bits...... 225 6.5. Correlation Coefficients for Normalised Area versus Heuristic Area Cost Function...... 233
7.1. Floating-Point Types...... 242 7.2. Floating-Point Interpretation...... 242 7.3. Accurate Evaluability of Polynomials...... 247 7.4. Allowable Varieties and Accurate Evaluability of Polynomials. 256 7.5. Values of e for Lines of the Variety and F32 and F64. . . . . 267 7.6. Constraints for Input Domain Partitioning...... 282
11 List of Figures
2.1. Intel CPU Introductions [161]...... 31 2.2. Levels of Design Abstraction...... 32 2.3. Average Number of gates of Logic and Datapath, Excluding memories [59]...... 34 2.4. Equivalence Checking with Hint files...... 35 2.5. Structure of AND Multiplication Array...... 39 2.6. Initial Structure of Booth Radix-4 Multiplication Array. . . . 41 2.7. Simplified Structure of Booth Radix-4 Multiplication Array. . 41 2.8. Structure of Booth Radix-4 Multiplication Array...... 41 2.9. Array Reduction of 5 bit AND Array Multiplier...... 42 2.10. Lossy Synthesiser...... 58
3.1. Multivariate Division Algorithm...... 66 3.2. Buchberger’s Algorithm...... 68
4.1. Example Sum-Of-Products Array...... 82 4.2. Example Sum-Of-Products Array Reduction...... 83 4.3. Example Data-Flow Graph...... 110 4.4. Polynomial Behaviour of Operators (P=Polynomial and N=Non- Polynomial)...... 110 4.5. Algorithm for Establishing Polynomial Nature of Inputs. . . . 111 4.6. Augmented Data-Flow Graph...... 111 4.7. Formal Verifications Required...... 113
4.8. Area-Delay curves for m1, m5 and AB + C which forms a lower bound on achievable area...... 117
4.9. Area-Delay curves for y1,1 and y1,2...... 122
4.10. Area-Delay curves for y2,1 and y2,2...... 122
4.11. Area-Delay curves for y3,1 and y3,2...... 123
4.12. Area-Delay curves for y1 pre and post the optimisation. . . . 125
12 4.13. Area-Delay curves for y2 pre and post the optimisation. . . . 125
4.14. Area-Delay curves for y3 pre and post the optimisation. . . . 126
4.15. Area-Delay curves for y4 pre and post the optimisation. . . . 126
4.16. Area-Delay curves for y5 pre and post the optimisation. . . . 128
4.17. Area-Delay curves for y1,0 and y1,2...... 129
4.18. Area-Delay curves for y2,0 and y2,2...... 130
4.19. Area-Delay curves for y3,0 and y3,2...... 130
4.20. Area-Delay curves for y6,0 and y6,1...... 133
4.21. Area-Delay curves for y7,0 and y7,1...... 134
5.1. Structure of AND Array Multiplier Truncation Schemes. . . . 139 5.2. Structure of an arbitrary array truncation scheme...... 158
5.3. Illustration of hi of li...... 159 5.4. Optimal Truncation of an Arbitrary Array...... 163 5.5. Ragged Truncated Multiplier — AND Array...... 164 5.6. Booth Radix-4 Array — Least Significant Columns...... 164 5.7. Ragged Truncated Multipliers — Booth Radix-4 Array. . . . 165 5.8. Number of Saved Partial Product Bits over CCT...... 167 5.9. Area/Delay Comparisons of Faithfully Rounded AND Array Multipliers n=16...... 168 5.10. Area/Delay Comparisons of Faithfully Rounded AND Array Multipliers n=24...... 168 5.11. Area/Delay Comparisons of Faithfully Rounded AND Array Multipliers n=32...... 169 5.12. Area/Delay Comparisons of Faithfully Rounded Booth Array Multipliers n=16...... 169 5.13. Area/Delay Comparisons of Faithfully Rounded Booth Array Multipliers n=24...... 170 5.14. Area/Delay Comparisons of Faithfully Rounded Booth Array Multipliers n=32...... 170 5.15. Example Linearly Bounded Sawtooth Function...... 174 5.16. Delay and Area Synthesis Comparisons...... 182 5.17. Verification Runtimes for Faithfully Rounded CCT Multipli- ers of Size n...... 183 5.18. Verification Runtimes for Faithfully Rounded Division by Constant n...... 184
13 6.1. Example SOP and Constant Division DFG...... 192 6.2. Basic Implementation of Cubic Polynomial...... 197 6.3. 16 bit Truncated Squarer Array...... 199 6.4. 16 by 18 bit Truncated Multiplier Array...... 199 6.5. Truncated SOP Array...... 200 6.6. DFG of Bilinear Interpolation...... 202 6.7. DFG of Gabor Filter...... 207 6.8. Reduction of a Binary Rectangular Array...... 216 6.9. 10 by 10 Multiplier Booth Radix-4 Array...... 217 6.10. Flow for Truncating an HDL Array...... 224 6.11. Cubic Polynomial Area-Delay Curves...... 226 6.12. Cubic Polynomial Area-Delay Curves — Starting State 2. . . 227 6.13. Cubic Polynomial Area-Delay Curves — Starting State 3. . . 227 6.14. Bilinear Interpolation Area-Delay Curves...... 228 6.15. Bilinear Interpolation Area-Delay Curves — Starting State 2. 229 6.16. Bilinear Interpolation Area-Delay Curves — Starting State 3. 229 6.17. Gabor Filter Area-Delay Curves...... 230 6.18. Gabor Filter Area-Delay Curves — Starting State 2...... 231 6.19. Gabor Filter Area-Delay Curves — Starting State 3...... 231 6.20. Sample DFG Highlighting a Limitation of the Area Heuristic Cost Function...... 233 6.21. Area-Delay Curves for the Start and End Implementations of the Iterative Procedure for the Cubic Polynomial...... 234 6.22. Area-Delay Curves for the Start and End Implementations of the Iterative Procedure for the Bilinear Interpolation. . . . . 235 6.23. Area-Delay Curves for the Start and End Implementations of the Iterative Procedure for the Gabor Filter...... 235 6.24. Zoom of Area-Delay Curves for the Start and End Implemen- tations of the Iterative Procedure for the Gabor Filter. . . . . 236 6.25. Area-Delay Curves for the Arbitrary, Start and End Imple- mentations for the Cubic Polynomial...... 237 6.26. Area-Delay Curves for the Arbitrary, Start and End Imple- mentations for the Bilinear Interpolation...... 237 6.27. Area-Delay Curves for the Arbitrary, Start and End Imple- mentations for the Gabor Filter...... 238
14 7.1. Area-Delay Curves for F16 Multiplier Architectures...... 253 7.2. Area Delay Curves for F32 Multiplier Architectures...... 253 7.3. The Variety of the Motzkin Polynomial...... 258 7.4. Projection of the Variety of the Motzkin Polynomial onto the xy Plane...... 259 7.5. Projection of the Variety of the Motzkin Polynomial onto the yz Plane...... 260 7.6. Projection of the Variety of the Motzkin Polynomial onto the xz Plane...... 261 7.7. Unit Sphere and Curved Pyramid around the line x = y = z. 266 7.8. Points (i, j) if aibj exists in p(z, a, b)...... 269
B.1. Area Delay Curves for F16 Adder Architectures...... 328 B.2. Area Delay Curves for F32 Adder Architectures...... 329
15 16 Glossary
ALU Arithmetic Logic Unit ASIC Application Specific Integrated Circuit AT arithmetic transform, polynomial with integer coefficients and binary inputs representing a circuit BDD Binary Decision Diagram DFG Data Flow Graph DSL Domain Specific Language DSP Digital Signal Processing EDA Electronic Design Automation Elimination Order term ordering placing a set of variables higher than another Full Adder Logic cell which produces the sum of three bits GPGPU General Purpose Graphics Processing Unit Half Adder Logic cell which produces the sum of two bits HDL Hardware Description Language HLS High-Level Synthesis Homogeneous Polynomial a polynomial whose monomials all have the same total degree IC Integrated Circuit Ideal the span of a set of polynomials Ideal Membership Testing testing whether a given polynomial belongs to an ideal LEC Logic Equivalence Checking lex Lexicographic Term ordering Monomial a product of variables, a term in a polynomial Partial Product a row in a multiplier of sum-of-products array QoR Quality of Results ROBDD Reduced Order Binary Decision Diagram RTE Round Towards Nearest, Ties to Even RTL Register Transfer Level RTZ Round Towards Zero
17 SEC Sequential Equivalence Checking Smarandache Function number theoretic function, for a given positive integer n it returns the smallest number which divides its factorial SMT Satisfiability Modulo Theories SOP Arithmetic Sum-of-Products Term Order ordering applied to monomials Total degree the sum of exponents in a monomial Vanishing Ideal ideal of polynomials which are identically zero over the ring of polynomials Variety the set of points, such that for all polynomials in an ideal, they evaluate to zero on those points
18 19 Nomenclature
y[n : m] bits n down to m in the binary representation of variable y
y[i] or yi a bit of variable y, y0 being the least significant bit Q set of rational numbers R set of real numbers Z set of integers
Zp set of integers modulo p deg(p) largest exponent in univariate polynomial p
k[x1, x2, ..., xn] polynomial ring in variables x1, ..., xn with coefficients in set k
hf1, f2, ..., fsi ideal - the span of a set of polynomials f1, f2, ..., fs p|q condition that p divides q, if they are polynomials p, q ∈ k[x] then there must exist r ∈ k[x] such that q = pr LP (p) leading power product of p with respect to a term ordering LC(p) leading coefficient of p with respect to a term ordering LT (p) leading term of p with respect to a term ordering vanish Ip vanishing ideal over the ring Zp (n)r falling factorial (n)r = n(n − 1)(n − 2)...(n − r + 1) SF (p) Smarandache Function, returns the smallest n such that p|n!
#p(k) number of times p divides k Hamm(n) Hamming Weight of n V (I) affine variety of ideal I, the set of points, x, such that for all f ∈ I, f(x) = 0 s?a : b ternary operator, if s then a else b |S| denotes the number of elements in the set S a bitwise negation of binary variable a ∨ logical OR ∧ logical AND a ⊕ b logical XOR, if a and b are of equal length then this represents a bitwise XOR, if a is of length greater than one and b is of length one then this represents XORing each bit of a with b
20 a << b left binary shift of variable a by variable b a >> b right binary shift of variable a by variable b FA Full Adder cell which produces the sum of three bits HA Half Adder which produces the sum of two bits sgn(x) sign of x, -1 if x < 0, 1 if x > 0 and 0 if x = 0 RTE round towards nearest, ties to even rounding mode RTZ round towards zero rounding mode
21 1. Introduction
Electronic systems pervade our daily lives, from our reliance on the internet, the extraordinary growth of mobile communications to the many sophisti- cated ways in which we now consume information, whether written, audio or visual. In order to construct the devices which make this a reality a range of challenges have had to be overcome. Internal power management for one, where minimising energy usage means the ability to work from battery- power and thus enabling portability. Many will have advanced graphics and human-machine interfaces which need to be of high enough quality and responsiveness to enable market success [170]. Portability and quality requirements translate into speed, area and power constraints on the under- lying hardware. Notions of quality can include image or audio properties, high enough quality means minimum throughput capacity, memory band- width and accuracy requirements. Certain notions of quality are embedded in standards which new devices entering the market must satisfy, compli- ance with standards means passing a variety of conformance tests. Video standards include H264 [83] and MPEG4 [168], graphics standards include OpenGL [94] and frameworks for targeting heterogeneous platforms include OpenCL [93]. Devices that succeed in the market place must conform to all relevant standards and given that people, and hence devices, move be- tween countries, these standards come from all over the world, compete with each other and continuously evolve. Hardware designers must produce multi-standard hardware with acceptable throughput, frequency, area and power properties in a short enough timescale in order for products to hit the market at an opportune time. Designers must translate all of the hardware requirements into a hardware description, explore the various trade offs that exist and perform verification at speed. This translation exercise is surprisingly non trivial for even the simplest of designs. Consider creating hardware that calculates sin(x) that must support multiple standards, each with their own precision, accuracy and
22 throughput requirements, in minimal silicon area. A translation exercise requires selecting the number format, algorithm and associated architec- ture and the precision and accuracy of each operation. All aspects of this task are non trivial, even though the number format for the inputs and outputs may be determined by the application, internal formats need not necessarily match these. For power constrained devices, it is not enough that the precisions are sufficient for the required accuracy, there must be just enough precision. Using standard precisions, such as IEEE single or double precision floating-point formats [1], in an effort to match a software implementation, may be extremely wasteful in terms of silicon or potentially provide not enough precision. Designing application specific integrated cir- cuits (ASICs) allows for the use of any number format, any precision and any accuracy — to ignore this freedom is to systematically waste silicon, degrade performance and potentially quality. Using such freedom is a great opportunity and a great challenge. This thesis deals with the design of datapath elements in ASIC hard- ware design. Certain integrated circuits are necessarily constructed with a large amount of datapath, such as graphics processing units (GPUs). In these areas, datapath often determines the maximum operating frequency, contributes significantly to the area and power consumption as well as pre- senting challenges to the verification effort. Considerable research has gone into the fundamental arithmetic operations, namely addition and multipli- cation and leading industry logic synthesis tools are highly adept at creating very efficient implementations when the design is polynomial in nature. Ex- amples of such synthesis tools are Synopsys’ Design Compiler [162] and Cadence’s RTL Compiler [19]. Algorithms have been developed which ex- ploit these efficient functions. For example, it is common to use polynomials to approximate transcendental functions like sin(x). Polynomial functions can be efficiently implemented and are often required, as such, they are the focus of this thesis. Despite the seeming synergy between functions that can be efficiently implemented and those that need to be, there still remain challenges in the area of polynomial datapath construction. Rewriting of polynomials in order to improve the results of logic synthesis is limited in industry standard logic synthesis tools. These synthesis tools are efficient at optimising the operations and precisions provided by the input hardware description language. They do not, however, cater for the exploitation of an
23 accuracy freedom, and as such do not perform lossy synthesis. Research into methods for exploiting an accuracy freedom typically do so without provid- ing guaranteed statements on the nature of the error bring introduced and as such cannot be used in a controlled manner. These methods leave hard- ware designers having to perform functional testing on gate level netlists, an invariably expensive, time consuming and incomplete method of functional verification. The accuracy of floating-point algorithms are notoriously dif- ficult to determine, this is also true of floating-point implementations of polynomials. There are applications where failure to implement polynomi- als with acceptable accuracy will result in catastrophic algorithm failure [77]. Such challenges are the focus of this thesis, moreover the nature of solu- tions found and presented in this thesis are designed to integrate into exist- ing design flows in that they follow a high level synthesis paradigm. In this paradigm a high level statement of the design intent is transformed into a description encapsulated in a hardware description language (HDL) which is then subsequently synthesised by an industry standard logic synthesis tool. In particular, no changes to the logic synthesis is required. Industrial hardware design places necessarily stringent requirements on the formal verification of every transformation performed during the design process, as such methods for independently proving the correctness of the transforma- tions constructed are sought throughout. In order to achieve these goals, the techniques used throughout the thesis are highly analytic, resulting in, where possible, HDL components that can be used off-the-shelf, enabling immediate use by hardware engineers. In this regard, this thesis succeeds in multiple aspects, providing HDL which has guaranteed bounded error properties and offers significant hardware implementation cost benefits. In certain cases, it has been possible to prove that the error freedom has been maximally exploited in minimising hardware resources. The thesis contains a variety of algorithms and procedures for optimising the implementation of fixed and floating-point polynomials, minimising resource usage while guar- anteeing error properties. The method for ascertaining the quality of the approaches is by comparing the results of logic synthesis experiments.
24 1.1. Statement of Originality
A summary of the original contributions of this thesis per chapter are given here:
• Chapter 4 — a method for optimising the implementation of a set of mutually exclusive fixed-point polynomials with integer coefficients and a method for performing formal verification for designs which can be expressed as polynomials with integer inputs and coefficients via a super usage of industry standard tools.
• Chapter 5 — necessary and sufficient conditions for faithful round- ing and worst case error vectors of constant correction truncated (CCT) and variable correction truncated (VCT) integer multiplica- tion, a method for construction of the minimal faithfully rounded CCT and VCT integer multipliers, an optimal method for the con- struction of a faithful rounding of an arbitrary binary array, two new truncated multiplier architectures, a method for construction of the minimal multiply add schemes to perform constant division for round towards zero, round to nearest, even and faithful rounding.
• Chapter 6 — method for exploiting an arbitrary error when imple- menting an arbitrary binary array and constant division, a heuristic for assigning and exploiting a global bounded absolute error when im- plementing a fixed-point polynomial with rational coefficients for a given architecture and a hardware heuristic cost function for compar- ing the hardware area cost of implementing such polynomials.
• Chapter 7 — an algorithmic construction to place the framework put forward in Demmel et al. [46] into practice and a demonstration via a complete worked example as well as contributions towards full au- tomation.
1.2. Impact
The following papers have been accepted for publication during the course of this thesis:
25 • On The Systematic Creation of Faithfully Rounded Truncated Multi- pliers and Arrays, Theo Drane, Thomas Rose and George A. Constan- tinides, IEEE Transactions on Computers, accepted for publication as a regular paper 5th June 2013.
• Property Checking of Datapath using Word-Level Formal Equivalency Tools, Theo Drane and Himanshu Jain, in Proceedings of Synopsys User Group Conference, 2012. Winner of the best paper award
• Property Checking of Datapath using Word-Level Formal Equivalency Tools, Theo Drane and Himanshu Jain, in Proceedings of Design and Automation Conference (DAC) User Track, 2012.
• Correctly Rounded Constant Integer Division via Multiply-Add, Theo Drane, Wai-chuen Cheung and George A. Constantinides, in Proceed- ings of IEEE International Symposium on Circuits and Systems (IS- CAS), pp. 1243–1246, 2012.
• Leap in the Formal Verification of Datapath, Theo Drane and George A. Constantinides, published online on DAC.com Knowledge Center [48].
• Formal Verification and Validation of High-Level Optimizations of Arithmetic Datapath Blocks, Theo Drane and Himanshu Jain, in Pro- ceedings of Synopsys User Group Conference, 2011. Winner of the best paper award
• Optimisation of Mutually Exclusive Arithmetic Sum-Of-Products, Theo Drane and George A. Constantinides, in Proceedings of Design, Au- tomation Test in Europe Conference Exhibition (DATE), pp. 1388– 1393, 2011.
Aspects of this thesis have been presenteat the following conferences:
• Datapath Challenges 21st IEEE Symposium on Computer Arithmetic (ARITH-21), 8th April 2013
• Architectural Numeration European Network of Excellence on High Performance and Embedded Architecture and Compilation (HiPEAC), 22nd January 2013
26 • Harnessing the Power of Word-Level Formal Equivalency Checking, Verification Futures 2012, 19th November 2012
• Property Checking of Datapath using Word-Level Formal Equivalency Tools, Synopsys User Group (SNUG UK), 24th May 2012
• Correctly Rounded Constant Integer Division via Multiply-Add, IEEE International Symposium on Circuits and Systems (ISCAS), 22nd May 2012
• Formal Verification and Validation of High-Level Optimizations of Arithmetic Datapath Blocks, Synopsys User Group (SNUG Boston), 29th September 2011
• Formal Verification and Validation of High-Level Optimizations of Arithmetic Datapath Blocks, Synopsys User Group (SNUG France), 23rd June 2011
• Formal Verification and Validation of High-Level Optimizations of Arithmetic Datapath Blocks, Synopsys User Group (SNUG Germany), 19th May 2011
• Formal Verification and Validation of High-Level Optimizations of Arithmetic Datapath Blocks, Synopsys User Group (SNUG UK), 17th May 2011
• Formal Verification and Validation of High-Level Optimizations of Arithmetic Datapath Blocks, Synopsys User Group (SNUG San Jose), 30th March 2011
• Optimisation of Mutually Exclusive Arithmetic Sum-Of-Products, De- sign, Automation Test in Europe Conference Exhibition (DATE), 17th March 2011
The patents filed, so far, as a result of elements of this thesis are as follows:
• Method and Apparatus for Performing Synthesis of Division by In- variant Integers, US13/626886.
• Method and Apparatus for Performing Lossy Integer Multiplier Syn- thesis, GB1211757.8 and US13/537527.
27 • Method and Apparatus for Performing the Synthesis of Polynomial Datapath via Formal Verification, GB1106055.5 and US13/441543.
• Method and Apparatus for Computing Sum of Products, GB1014609.0, PCT/GB2011/001300, EP11764235.5 and US13/199606.
The commercial word-level formal equivalency checker SLEC from Ca- lypto [22] has been improved by using the verification methodology pre- sented in Chapter 4. The commercial word-level formal equivalency checker Hector from Synopsys [165] has been improved in order to prove the faithful rounding property of truncated multipliers. Internal tools have been devel- oped at Imagination Technologies which implements the exploitation of an arbitrary error in the implementation of an arbitrary array. Finally HDL, which derive from Chapters 4, 5 and 7, has been created and released to customers of Imagination Technologies.
1.3. Thesis Organisation
Chapter 2 presents background material on the logic synthesis of datap- ath components, high-level synthesis, lossy synthesis and the occurrence of polynomials within datapath. The manipulation of polynomials is a key element of the thesis, Chapter 3 presents concepts from algebraic geometry including an exposition on Gr¨obnerbases and types of ideal membership testing which are used throughout the subsequent chapters. Chapters 4 through to 7 provide the technical contributions of the thesis. Hardware that supports multiple standards, floating-point datapath and general arith- metic logic units (ALUs) must implement mutually exclusive polynomials. Chapter 4 presents a method for optimising the implementation of a set of mutually exclusive polynomials with integer coefficients and inputs and an efficient method for performing formal verification of the transformation. Chapter 4 deals with a lossless transformation in that no accuracy freedom is exploited and demonstrates how best to implement a set of mutually exclu- sive polynomials by implementing a single polynomial with minimal control logic. Thus Chapter 4 can be seen as a preprocessing step that ultimately leads into requiring the implementation of a single polynomial. In Chapter 5 the components are created which ultimately lead to the creation of a lossy implementation of a fixed-point polynomial with rational coefficients.
28 These components include integer multipliers, multioperand addition and constant division operators. The method for creating a lossy multioperand adder leads to the creation of new truncated mulipliers schemes. Optimal architectures are created which exploit an absolute error freedom and the analytic approach taken allows for these architectures to be directly em- bedded in HDL, allowing for off-the-shelf usage of these lossy components. A method is also presented in Chapter 5 which allows for the independent formal verification of these lossy components in order to establish their worst case error properties. Chapter 6 then uses the techniques put forward in Chapter 5 to first show how to construct an arithmetic sum-of-product (SOP) and constant division operator which optimally exploits an arbitrary absolute error bound to minimise implementation cost. Using these two types of lossy components a heuristic method is presented which explores how best to implement fixed-point polynomials with rational coefficients while exploiting a global error bound. In order to determine the quality of various architectures a hardware heuristic cost function is presented which attempts to produce a metric for the resultant area implementation cost. The final technical chapter deals with floating-point polynomial implemen- tations. The fixed-point polynomial chapters dealt with exploiting, in a controlled manner, an absolute error bound; for floating-point it is natural to exploit a relative error bound. The only work found within the literature that appears to perform such a task is that of Demmel et al. [46]. Chap- ter 7 begins by first showing how the lossy integer multipliers of Chapter 5 can be used to create lossy floating-point multipliers with a guaranteed bounded relative error. This chapter then shows how such a component can be used to implement Demmel et al.’s vision by providing an algorithmic construction of Demmel’s framework and a complete worked example of the method on a Motzkin polynomial. This is followed by a discussion on the steps and hurdles towards complete generalisation. The thesis closes with Chapter 8 which summarises the contributions, their limitations and future work and directions.
29 2. Background
In 1965, Intel’s co-founder Gordon E. Moore made an important predic- tion which has arguably shaped as well as reflected miniaturisation within the semiconductor industry for the decades that followed. He predicted that the number of transistors packed into a single integrated circuit (IC) would grow exponentially, approximately doubling every two years [123]. Where ICs used to have thousands of transistors, they now number in the billions. For example, the recently introduced nVidia KeplerTMGK110 has 7.1 billion transistors [127]. The transistor counts, clock speed, power and performance of Intel chip introductions can be found in Figure 2.1. Note the turning point, around 2003, where the power consumption dramati- cally ceased its exponential growth due to the difficulty of dissipating heat throughout a chip. A fixed power envelope with increasing transistor count has given rise to the notion of dark silicon [50], where power constraints would require the powering down of parts of chips or whole cores within a multicore system. The increase in transistor count provided the means for an increase in compute power and has enabled application specific chips to become an attractive proposition for general purpose computation [131]. An example of this is the rise of general purpose graphics processing units (GPGPUs) supporting the open computing language OpenCL [93]. This has only added to the plethora of changing standards modern chips must support. The increase in chip complexity has been accompanied by a shrink in time to market. A typical application specific integrated circuits (ASIC) design time is in the order of 12 to 18 months, which lags behind the 6 to 9 months required to develop products for the high consumer driven chip market [113]. This disparity is squeezing chip design time-lines, while the ratio of time spent between design and verification remains at a 30% to 70% ratio [140]. This thesis is written during a time when chips are being cre- ated where transistors are in greater abundance than the power to supply them and chips must be designed to support multiple standards in an ever
30 decreasing time budget.
Figure 2.1.: Intel CPU Introductions [161].
This chapter continues by providing background on the key areas that provide context to this thesis. The next section contains background on logic synthesis, the importance of datapath and the fundamentals of ASIC datapath design. High level synthesis (HLS) has long been seen as the answer to productivity challenges, the history of HLS and barriers to its adoption are presented in Section 2.2, which provide the challenges that the thesis overcomes such that the results can be used in practice. This is fol- lowed by Section 2.3 on the various existing approaches to lossy synthesis. Subsequently an exposition of the prominence of polynomials within data- path design is presented in Section 2.4. The chapter closes by summarising the context of the contribution in Section 2.5.
31 2.1. Logic and Datapath Synthesis
Chip complexity increasing in tandem with stringent design time constraints has pushed designers to embrace automation, abstraction and divide and conquer techniques [89]. This has given rise to the strong adoption of elec- tronic design automation (EDA) at all levels of the design process, see Fig- ure 2.2. Logic synthesis is the transition from a digital design at a register
Sy !"#$%"&"'
(")* !"+$,+-. /"+$%"&"'
0-!"$%"&"'
,+-. * !1+$%"&"'
%-y12!$%"&"'
3- 4$%"&"'
Figure 2.2.: Levels of Design Abstraction. transfer level to gate level. The process explores the variety of possible implementations of logic functions given the desired design constraints, e.g. frequency, area and power. The foundations of logic synthesis date back to 1847’s Algebra of Logic created by George Boole [13] and Claude E. Shan- non’s founding work on connecting this Boolean algebra with circuit design [148]. Having established the connection, Willard V. Quine went on, in the 1950s, to establish the minimisation theory of Boolean formulas in the two- level sum-of-products form [139]. A milestone in logic minimisation occurred in 1987 with the development of Espresso which had improved memory and computation runtime characteristics and became one of the core engines for the commercial logic synthesis tools that followed [142]. Around the same time, came the first instance of a commercial use of logic optimisation in
32 Table 2.1.: Commercial ASIC Synthesis Tools. Tool Vendor Reference Design Compiler Synopsys [162] Encounter RTL Compiler Cadence Design Systems [19] RealTime Designer Oasys Design Systems [128] BooleDozer IBM (Internal) [158]
Table 2.2.: Commercial LEC Tools. Tool Vendor Reference FormalPro Mentor Graphics [119] Conformal Cadence Design Systems [20] Formality Synopsys [164] the guise of the task of remapping a gate level netlist from one standard cell library to another, Synopsys Remapper [89]. This period also saw IBM developing an in-house methodology where a simulation model expressed using a hardware description language (HDL) was used for logic synthesis [89]. This unified framework enabled the creation of a truly productive IC design methodology. As logic synthesis automation developed and became commercially viable, the crucial task of proving the correctness of tool output became pressing, termed Logic Equivalence Checking (LEC). One of the early steps was the introduction of Binary Decision Diagrams (BDDs) [106] and the subse- quent proof that a form of BDD, namely that of a reduced ordered BDD (ROBDD), provides a canonical representation [15]. Subsequent development has been heavily dominated by industrial de- sign challenges, driven by the consistent exponential growth in circuit com- plexity. Issues such as tool runtimes, scalability, verifiability, supporting engineering change orders, attaining timing closure have all been heavily stressed by the sheer size of designs [89]. A set of commercial ASIC logic synthesis offerings can be found in Table 2.1. A set of commerical LEC tool offerings can be found in Table 2.2. The synthesis of datapath circuits, as opposed to control path, is of vital importance. The percentage of logic and datapath within ICs continues to grow exponentially, see Figure 2.3. Datapath synthesis has challenged the precepts of logic synthesis and equivalence checking. Separate synthesis
33 Table 2.3.: Datapath Synthesis Tools and Libraries. Tool/Library Vendor Reference DesignWare Synopsys [163] CellMathDesigner Forte Design Systems [57] FloPoCo INRIA (academic) [40] engines or libraries have been developed specifically targeted at datapath, see Table 2.3. Equivalence checking of arithmetic circuits has also required special attention [157], with arithmetic typically being the reason that full RTL to gate level equivalence fails to be achieved within design cycles. In truth, equivalence checking has failed to keep up with the exponential in- crease in chip complexity. The traditional flow of equivalence between RTL and gate level has had to be augmented by the logic synthesis tool passing hints to the equivalence checker which guides the tool to prove correctness, see Figure 2.4. This compromising of the integrity of equivalence checking proves the difficulty and importance of verification.
12
10
8
6
4
2 Non−FPGA Average Design Size in Gates (Millions)
0 2002 2007 2010 2012 Year
Figure 2.3.: Average Number of gates of Logic and Datapath, Excluding memories [59].
Datapath synthesis rests on the literature of computer arithmetic. A host of architectures have been developed for the basic operations, e.g. multipli- cation, addition and division.
34 RT RT
+,)- 345)&"'$0-$ +,)- 81)0#*8 345)&"'$0-$ ./0#1$*)*$2 61$-7$2 ./0#1$*)*$2 61$-7$2
!"#$% $&$'% !"#$% $&$'% ($#')*# ($#')*# Figure 2.4.: Equivalence Checking with Hint files.
2.1.1. Integer Addition and Multiplication
Numeration, the art of the representation of numbers, ultimately plays an important role in the hardware costs of implementation, feasibility of par- ticular architectures and even the ease and speed of debugging (it is not uncommon to encounter a situation where a complex algorithm specific nu- meration will be rejected in favour of a simpler, more easily maintainable alternative). Moreover, hardware implementations may benefit from chang- ing the numeration throughout an architecture. The most common of numerations is that of q-adic numbers, i.e. using radix-q. Given that two-state logic pervades the chips, base 2 and powers of 2 have been by the far the easiest to implement. Notable exceptions include the Russian 1958 SETUN computer which used a balanced ternary system using -1, 0 and 1 [100]. Driven by the computational requirements of financial institutions, which in some cases require correctly rounded base 10 calculations by law, radix-10 arithmetic is a very active area of research [171]. The native number format supported by logic synthesis tools is that of binary integer. By interpreting a binary integer as having an implied binary e point we have a fixed-point interpretation, i.e. mp−1mp−2...m1m0 × 2 so for fixed integer e this scales the p bit number m. So fixed-point opera- tions reduce to integer operations. A survey of the architectures for integer addition and multiplication follow.
35 Integer Addition
A plethora of architectures exist for binary integer addition. To present the binary integer architectures it is useful to use the following notation:
∧ logical AND ∨ logical OR ⊕ logical exclusive OR (XOR) s?a : b the ternary, if then else, operator a[i : j] bit slice of a from columns i down to j
The following functions aid in the presentation of the Boolean equation for si, a bit of s = a + b:
gi = ai ∧ bi // termed generate at position i
pi = ai ∨ bi // termed propagate at position i
ti = ai ⊕ bi
Gi,j = gi ∨ (pi ∧ gi−1) ∨ (pi ∧ pi−1 ∧ gi−2) ∨ ...
∨ (pi ∧ pi−1 ∧ ... ∧ pj+1 ∧ gj) // adding a[i : j] and b[i : j] generates a carry
Di,j = gi ∨ (pi ∧ gi−1) ∨ (pi ∧ pi−1 ∧ gi−2) ∨ ...
∨ (pi ∧ pi−1 ∧ ... ∧ pj+1 ∧ pj) // adding a[i : j] and b[i : j] and 1 generates a carry
Then the general equation for si is:
si = ti ⊕ Gi−1,0
The Gi−1,0 are the most complex terms to produce. The various adder architectures may be characterised by how they express the equation for
Gi−1,0. The following are architectures whose delay is linear in the bit width of a and b: Ripple Carry Adder [17]
Gi,0 = gi ∨ (pi ∧ Gi−1,0)
36 Carry Skip Adder [107]
Gi,0 = Gi,i−j ∨ (pi ∧ pi−1 ∧ ... ∧ pi−j ∧ Gi−j−1,0) where Gi,i−j is constructed with a ripple carry architecture and this equation is recursed on i resulting in blocks of length j being processed. Conditional Sum Adder [153]
Gi,0 = Gi−j−1,0? Di,i−j : Gi,i−j where Gi,i−j and Di,i−j are constructed with a ripple carry architecture and this equation is recursed on i resulting in blocks of length j being processed. The following is an architecture whose delay is logarithmic in the bit width of a and b: Carry Lookahead Adder [10]
Gi,l = Gi,j ∨ (Pi,j ∧ Gj−1,k) ∨ (Pi,j ∧ Pj−1,k ∧ Gk−1,l)
Pi,l = Pi,j ∧ Pj−1,k ∧ Pk−1,l
where Pi,j = pi ∧ pi−1 ∧ ... ∧ pj+1 ∧ pj
Where recursion on G and P signals results in a logarithmic architecture.
Ling adders are based upon the observation that gi = pi ∧gi and the creation of signals Hi,j such that Gi,j = pi ∧ Hi,j. This results in the following logarithmic architecture: Ling Adder [111]
H2,0 = g2 ∨ g1 ∨ (p1 ∧ g0)
Hi,l = Hi,j ∨ (Pi,j−1 ∧ Hj−1,k) ∨ (Pi,j−1 ∧ Pj−2,k−1 ∧ Hk−1,l)
si = Hi−1,0? ti ⊕ pi−1 : ti
Recursion on H and P signals results in a logarithmic architecture. The
Ling adder observation that pi is a factor of Gi,j was generalised to the observation that Di,k is a factor of Gi,j for i ≥ k ≥ j, this gives rise the the Jackson adder:
37 Jackson Adder [85]
Bi,k = Bi,j ∨ Bj−1,k Pi,k = Pi,j ∧ Pj−1,k
Gi,k = Gi,j ∨ (Pi,j ∧ Gj−1,k) Gi,k = Di,j ∧ (Bi,j ∨ Gj−1,k)
Di,k = Gi,j ∨ (Pi,j ∧ Dj−1,k) Di,k = Di,j ∧ (Bi,j ∨ Dj−1,k)
where Bi,j = gi ∨ gi−1 ∨ ... ∨ gj+1 ∨ gj
Where the recursions on D, G, P and B allow the creation of a logarithmic architecture, note that a plethora of architectures are possible as at each level of the recursion there is a choice of decomposition for G and D. A survey can be found in [56]. Modern adder implementations use a hybrid of all of these architectures and logic synthesis uses a timing driven approach in creating these hybrids. The logarithmic adder constructions typically fall into a class known as parallel prefix, a full exploration of such adder types can be found in [149].
Integer Multiplication
For binary integer multiplication the standard architecture for fully parallel implementations is that of array creation, array reduction and final integer addition. The array construction of the partial products normally falls into two types, an AND array which is the result of the simplest add-and-shift multiplication algorithm and Booth arrays which have fewer partial products at the expense of some encoding logic of the multiplier input. The AND array is based upon the following mathematical formulation for two n bit binary inputs a and b:
y = ab
n−1 ! n−1 X i X j = ai2 bj2 i=0 j=0 n−1 X i+j = aibj2 i,j=0
38 n - = ai AND bj 6 v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v n v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v + v v v v v v v v v v ? v v v v v v v v v v Figure 2.5.: Structure of AND Multiplication Array.
The array of bits formed when n = 4 is then:
26 25 24 23 22 21 20
a0b3 a0b2 a0b1 a0b0
a1b3 a1b2 a1b1 a1b0
a2b3 a2b2 a2b1 a2b0
a3b3 a3b2 a3b1 a3b0
Each row is called a partial product. Summing all the partial products will produce the multiplication result. It is common to represent the multipli- cation array as a dot diagram. The structure of a general AND array can be found in Figure 2.5. The most common form of Booth array is that of Booth radix-4 which slices the multiplier input into overlapping triples [14]. It is based upon the following mathematical formulation for two n bit binary inputs a and b:
y = ab
n−1 X j = a bj2 j=0 n b 2 c X 2j = a (−2b2j+1 + b2j + b2j−1) 2 j=0 n b 2 c X 2j = [a (−2b2j+1 + b2j + b2j−1)] 2 j=0
39 Now note that −2b2j+1 + b2j + b2j−1 ∈ {−2, −1, 0, 1, 2} for all the possible values of the bits b2j+1, b2j and b2j−1. So this formulation requires the creation of the following multiples −2a, −a, 0, a and 2a. All these multiples are easy to produce in a redundant representation of the form pp[n+1 : 0]+s, where the most significant bit of pp requires negation and s is a one bit variable:
2n+1 2n 2n−1 2n−2 ... 21 20
pn+1 pn pn−1 pn−2 . . . p1 p0 s
−2a = −an−1 an−1 an−2 an−3 ... a0 1 1
−a = −an−1 an−1 an−1 an−2 ... a1 a0 1 0 = −0 0 0 0 ... 0 0 0
a = −0 0 an−1 an−2 . . . a1 a0 0
2a = −0 an−1 an−2 an−3 . . . a0 0 0
The expansion required for n = 10 is:
y =a(−2b1 + b0) 2 + 2 a(−2b3 + b2 + b1) 4 + 2 a(−2b5 + b4 + b3) 6 + 2 a(−2b7 + b6 + b5) 8 + 2 a(−2b9 + b8 + b7) 10 + 2 a(b9)
Note that the first and last partial products are of a simpler form than the others, in particular, the last partial product is never negative so no s partial product bit is required. Hence a Booth radix-4 array for n = 10 can be created as in Figure 2.6. The hollow circles require negation, this negation can be achieved by inverting these bits and adding ones into the array as in Figure 2.7. Finally the array height can be reduced by noting
40 fvvvvvvvvvvv fvvvvvvvvvvv v fvvvvvvvvvvv v fvvvvvvvvvvv v + fvvvvvvvvvvv v vvvvvvvvvv v Figure 2.6.: Initial Structure of Booth Radix-4 Multiplication Array.
1 1 A 1 vvvvvvvvvvv 1 vvvvvvvvvvvv v 1 vvvvvvvvvvvv v vvvvvvvvvvvv v + vvvvvvvvvvvv v vvvvvvvvvv v Figure 2.7.: Simplified Structure of Booth Radix-4 Multiplication Array. that the top of the array can be simplified by combining bit A with some of the constant ones. The final Booth radix-4 array can be found in Figure 2.8. The arrays vary slightly in their construction and complexity per partial product bit depending on whether the inputs are signed or unsigned and whether n is even or odd. Booth radix-4 requires approximately n/2 partial products as opposed to the n required by an AND array. This reduction comes at the cost of cre- ating the more complex partial product bits, however the benefits outweigh the costs, making Booth radix-4 one of the commonly found architectural alternatives to an AND array. There are other Booth type schemes which require non power of 2 multiples of the multiplicand, this invariably means that the cost of array construction outweighs the benefit from the further reduction in number of partial products. Having constructed the array, the next step is to reduce the array. Array
[Figure 2.8.: Structure of Booth Radix-4 Multiplication Array.]
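To make the recoding concrete, the following minimal Python sketch (the helper names are illustrative, not from the thesis) forms the radix-4 digits from overlapping triples of b and checks that the weighted sum of the multiples reproduces a × b; it models the arithmetic identity only, not the gate-level array:

    def booth_radix4_product(a, b, n):
        """Sum the Booth radix-4 partial products d_j * a * 4^j, where each
        digit d_j = -2*b[2j+1] + b[2j] + b[2j-1] comes from an overlapping
        triple (b_{-1} = 0, and bits above b_{n-1} are zero for unsigned b)."""
        bit = lambda i: (b >> i) & 1 if 0 <= i < n else 0
        total = 0
        for j in range(n // 2 + 1):
            d = -2 * bit(2*j + 1) + bit(2*j) + bit(2*j - 1)  # d in {-2,-1,0,1,2}
            total += (d * a) << (2 * j)                      # multiple weighted by 2^(2j)
        return total

    # The recoding identity holds for all unsigned operand pairs:
    n = 6
    assert all(booth_radix4_product(a, b, n) == a * b
               for a in range(1 << n) for b in range(1 << n))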
Having constructed the array, the next step is to reduce it. Array reduction is typically performed by repeated use of small reduction cells such as full adders and half adders, which sum three and two bits respectively of equal binary weight. An example reduction of an AND array for n = 5 can be found in Figure 2.9.
[Figure 2.9.: Array Reduction of 5 bit AND Array Multiplier.]
A whole range of generalised parallel counters have been put forward [118]. In particular, efficient and logarithmic methods have been found for designing single reduction cells that implement the sum of n bits. Compressor cells are also used; these take bits from different columns and partially reduce them, a common example being the 4 to 2 compressor (functionally identical to two full adders). The array reduction phase reduces the array to a height of two. The allocation of reduction cells used to perform the reduction is known as a topology, of which there are regular and irregular forms. Regular topologies ease the task of multiplier layout and ignore logic delay through the reduction cells, whereas irregular topologies attempt to reduce logic delay regardless of the effect on layout. Wallace tree reduction is timing driven and greedy in the sense that as much reduction is performed as possible in every part of the multiplier array [176]. Dadda trees are also irregular multiplier topologies but do as little as possible when reducing the parts of the array furthest from the region with maximum height [34]. The reduction of the array is continued until the array is of height two, at which point a final binary integer adder is used. By using a conditional sum approach, the adder can be moulded to fit the output delay profile of the array reduction phase. There are a myriad of hybrid schemes based upon
the fundamental architectures presented here [56].
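As an illustration of greedy, Wallace-style reduction, the sketch below (Python, with 0/1 bits standing in for gates; half adders, used in real Wallace trees to tidy two-bit columns, are omitted for brevity) repeatedly applies full adders to every column of three or more bits until the array has height at most two, and checks that the weighted sum is preserved:

    from collections import defaultdict

    def full_adder(x, y, z):
        """Sum three bits of equal weight into (sum, carry)."""
        return x ^ y ^ z, (x & y) | (x & z) | (y & z)

    def wallace_reduce(columns):
        """Greedily reduce columns[w] (bits of weight 2**w) to height <= 2."""
        while max(len(col) for col in columns.values()) > 2:
            nxt = defaultdict(list)
            for w, col in columns.items():
                while len(col) >= 3:
                    s, c = full_adder(col.pop(), col.pop(), col.pop())
                    nxt[w].append(s)       # sum stays at this weight
                    nxt[w + 1].append(c)   # carry moves one column up
                nxt[w].extend(col)
            columns = nxt
        return columns

    # Build the n = 5 AND array of Figure 2.9 for concrete operands and
    # confirm that reduction preserves the weighted sum.
    a, b, n = 19, 27, 5
    cols = defaultdict(list)
    for i in range(n):
        for j in range(n):
            cols[i + j].append(((a >> i) & 1) & ((b >> j) & 1))
    cols = wallace_reduce(cols)
    assert sum(bit << w for w, col in cols.items() for bit in col) == a * b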
2.1.2. Datapath Synthesis Flow
Typically, arithmetic operators found within the RTL will first be grouped into datapath blocks. The arithmetic binary integer operations typically grouped are:
A ± B ± C ...              multioperand addition and subtraction
A × B, A × A               multiplication and squaring
A == B, A > B              equality checking and comparisons
A >> B, A >>> B            shifters
A × B ± C × D ± E ...      sum-of-products (SOP)
(A + B) × C, (A × B) × C   product-of-sums
S ? A : B : C : D          selection or muxing
Optimisations are then performed on these expressions; an example of such an optimisation is A(B + C) = AB + AC. Finally, these optimised expressions are mapped to gates through dedicated datapath generators. When constants appear on some of the inputs, efficient algorithms exist to exploit these circumstances, for example multiplication by a constant [132]. Crucially, the choice of optimisations and architectures used in the mapping is driven by hardware constraints such as arrival times of the inputs to the datapath block, area, power and timing constraints, as well as properties of the standard cell library in use [183].
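As an example of the kind of constant-input optimisation referred to above, one classical technique, canonical signed digit (CSD) recoding, rewrites a constant multiplication as a short list of shifts and adds/subtracts. The Python sketch below is an illustration of that general idea, not the specific algorithm of [132]:

    def csd_digits(c):
        """CSD recoding of a positive constant: digits in {-1, 0, 1}, least
        significant first, with no two adjacent non-zero digits."""
        digits = []
        while c:
            if c & 1:
                d = 2 - (c & 3)   # +1 if c = 1 (mod 4), -1 if c = 3 (mod 4)
                c -= d
            else:
                d = 0
            digits.append(d)
            c >>= 1
        return digits

    def multiply_by_constant(x, c):
        """x * c as a sum of +/- shifted copies of x."""
        return sum(d * (x << i) for i, d in enumerate(csd_digits(c)))

    # 231 = 256 - 32 + 8 - 1 needs four shift-add terms rather than the six
    # suggested by its binary weight:
    assert multiply_by_constant(5, 231) == 5 * 231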
2.2. High-Level Synthesis
High-Level Synthesis (HLS), also known as algorithmic, behavioural or electronic system level synthesis, automates the design process which interprets a system level model and produces a register transfer level description. The benefits of such an approach are [54]:
• Architectural Exploration — the productivity gained by automating the creation of RTL, rather than hand coding it, allows the designer to explore multiple design choices quickly. Time to market pressure may mean the rush is to produce a single functioning design, where there is barely time for complete verification, let alone exploring multiple architectures. The lack of time to explore architectures stunts designers' ingenuity and limits innovation.
• Coping with Complexity — the system level code that is equivalent to a piece of RTL is invariably an order of magnitude smaller in length. Designers can own larger amounts of chip design, thus handling the increasing design complexity. Needing fewer people to intellectually understand the entire architecture enables greater architectural exploration and reduces the risk of bugs.
• Reducing Design and Verification Effort — the RTL is automatically generated from a smaller and simpler system level model, thus eliminating the possibility of hand introduced RTL bugs.
• Enabling Reuse — RTL specifies the cycle accurate behaviour of the design; if the underlying process technology changes, the register boundaries may need moving. Performing this by hand is error prone and tedious. HLS forces a separation of duty onto the designer — functionality on the one hand and control (level of parallelism, scheduling etc.) on the other. This separation allows for automatic re-targeting of the design to a new process.
• Investing Research and Development Where It Matters — the productivity gain allows for more algorithm development and architectural optimisation. Without HLS, the race is towards RTL existence; with HLS, the challenge is to create optimised RTL.
The history of HLS tool development can be split into three generations [117]. The first generation of HLS tools arose during the 1980s and early 1990s. Highlights of this period were work emerging from Carleton University and IMEC. The latter produced Cathedral and Cathedral-II, focused on digital signal processing (DSP) algorithms and used a domain specific language (DSL) [42]. The commercialisation of this tool ultimately failed, but not before passing through the hands of Mentor, Frontier Design, Adelante and ARM. Generation 1 failed for four reasons [117]: need — the industry was only just adjusting to the adoption of RTL synthesis; language — industry adoption of an academic DSL is unlikely; quality of results (QoR) — the tools were too primitive to offer competitive QoR; and finally specificity — the tools only worked on a very limited domain.
The second generation, mid 90s to early 2000s, saw the leading EDA tool vendors release their own HLS offerings, the most notable of which was Synopsys Behavioral Compiler [99]. Generation 2 failed for the following reasons: synthesising all the way down to gates — the HLS tools did not complement the existing flows but replaced them with worse QoR; wrong input language — the input was behavioural HDL, which competed with the existing RTL flow and did not raise the abstraction level such that compiler based language optimisations could be used; poor QoR — this was particularly evident in control flow; validation — no formal methods existed to prove equivalence of the system level model with the resultant RTL; simulation runtimes — simulation runtimes of the behavioural system level model were almost as long as those of the HDL.
The third generation, running from the early 2000s to the present, is gaining traction within the industry. The success has come from the following factors: appropriate domain marketing — most modern tools are dataflow and DSP design focused; language — HLS tools are now using C, C++ or SystemC [2], which is finally a familiar abstraction level above RTL; improved QoR — improvements have come from compiler based optimisations; shift in user designs — RTL designers are having to deal with an increasing range of potentially unfamiliar applications, thus those encountering signal or multimedia processing are reaching for HLS tools. A list of some of the current HLS offerings and their properties can be found in Table 2.4. Challenges for these tools are:
Table 2.4.: High-Level Synthesis Tools.

Tool          Vendor/University        Logic      Floating  Sequential   ECO      Ref.
                                       Synthesis  Point     Equivalence  Support
                                                            Checking
Cynthesizer   Forte Design Systems     yes        yes       yes          yes      [58]
C-to-Silicon  Cadence Design Systems   yes        no        yes          yes      [18]
CatapultC     Calypto Design Systems   no         no        yes          no       [21]
BlueSpec      Bluespec, Inc.           no         no        no           no       [12]
SynphonyC     Synopsys                 yes        no        no           no       [166]
Vivado        Xilinx, Inc.             yes        yes       no           yes      [179]
GAUT          Uni. of South Brittany   no         no        no           no       [116]
LegUp         Uni. of Toronto          no         yes       no           no       [169]
• Sound Architectural Exploration — HLS tools must be able to evaluate the quality of a particular architectural choice. To do so, they must interface, potentially iteratively, with a logic synthesis tool. Further, they must be able to correctly ascertain whether or not a target frequency can be met by a given architecture, which requires tight integration with proven logic synthesis tools. However, tools such as CatapultC have no such integration. Such tools use a database of area, delay and power figures for the fundamental operators, such as integer multiplication, for a given technology library to ascertain the expected delay of datapath. This would build an architecture that implemented integer x_1 + x_2 + ... + x_n out of two input integer adders, where in fact such datapath would be optimised by leading logic synthesis tools into a single partial product array. HLS tools which schedule at an operator level and which do not use logic synthesis tools in their architectural explorations will fail to provide even the most basic of architectures an RTL engineer would use. HLS tools with access to logic synthesis tools are indicated in Table 2.4. Hence the quality of an HLS tool is dependent on the existence of a logic synthesis partner.
• Number Format Support — in addition to arbitrary sized signed and unsigned integer datatypes, HLS tools also support arbitrary sized signed and unsigned fixed-point types via the use of SystemC. This is a slight but useful abstraction gain compared to the integer types found in HDL. Full C/C++/SystemC support requires support for the floating-point datatype. HLS tools wishing to provide good floating-point support will either use existing libraries of optimised floating-point components or invest in creating their own. HLS tools which have floating-point support are indicated in Table 2.4.
• Verification — with automation comes loss of trust; designers must know that the tool output functions as expected. This requires sequential equivalence checking (SEC) between the C/C++/SystemC and the resultant RTL. The only commercial product which supports HLS tools is Calypto's SLEC [22]. Setting up the notion of equivalence between the untimed system level model and the timed RTL can be non-trivial due to the ways that an HLS tool may be driven in terms of controlling throughput and latency. This complexity has meant that some HLS tools produce scripts for the SEC tools. Designers will have to inspect these scripts and potentially deal with the problem of the SEC not attaining a proof of equivalence. Verification, in general, is a crucial barrier to greater HLS adoption.
• Engineering Change Orders (ECO) — these occur when bugs are discovered late in the design cycle, at the point where only the smallest change to a netlist is permitted. Thus an HLS tool must be able to run in an ECO mode, where the system level input is fixed and the tool must produce equivalent RTL which hopefully has only localised combinational changes compared to the original. Given the high level nature of the tool, this is a serious challenge which must be overcome. The tools with ECO support are highlighted in Table 2.4.
The excitement around HLS should not be based upon productivity gains alone but on the optimisations that the traditional design process cannot hope to achieve. Compiler optimisations such as dead code removal, common sub-expression extraction, and associativity and distributivity optimisations can now be brought to bear on RTL designs. However,
compiler optimisations need rethinking in light of HLS needs, for example dealing with mutually exclusive operators [31]. One important area where automation is crucial is that of bit width analysis [27]. RTL engineers must consider which bit widths to use for every signal, and unless formal verification is used, simulation is unlikely to find bugs introduced by incorrect widths. Some HLS tools can establish necessary bit widths by forward and backward propagation of interval arithmetic. More work needs to be done to raise bit level concerns to the point of being usable by HLS tools, for example multiplier fragmentation, where integer multipliers can be fragmented into smaller multipliers. Ultimately HLS should be aiming at real architectural exploration, for example being able to move between differing radices of Fast Fourier Transforms.
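A minimal sketch of the forward interval propagation mentioned above (Python, assuming unsigned ranges; real tools also propagate backwards from output requirements):

    from math import ceil, log2

    def iv_add(x, y):
        return (x[0] + y[0], x[1] + y[1])

    def iv_mul(x, y):
        p = [x[0]*y[0], x[0]*y[1], x[1]*y[0], x[1]*y[1]]
        return (min(p), max(p))

    def unsigned_bits(iv):
        """Width needed to hold every value of a non-negative interval."""
        return max(1, ceil(log2(iv[1] + 1)))

    # y = a*b + c with 8-bit unsigned inputs: since 255*255 + 255 = 65280
    # is below 2^16, sixteen bits suffice for the whole expression.
    a = b = c = (0, 255)
    y = iv_add(iv_mul(a, b), c)
    print(y, unsigned_bits(y))   # (0, 65280) 16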
2.3. Lossy Synthesis
Lossy synthesis is a method of synthesis where the result is not bit identical to the input. The user specifies and controls the error, and the lossy synthesiser exploits the error freedom to optimise hardware area, speed and power [28]. Exploiting the error freedom requires tweaking particular parameters within the circuits to be created. This opens up a staggering array of options, and research has been performed at various levels of the design process. Three of the approaches are word-length optimisation, imprecise operators and gate level imprecision.
2.3.1. Word-length Optimisation
The word-length, i.e. the number of bits, used to represent an internal variable within a design depends on the range and precision of the data the variable must hold, as well as the number format. The precision of every signal within a design contributes to the overall error and hardware properties. Choosing the best set of precisions for all internal variables from hardware considerations, while maintaining acceptable global error, is a form of lossy synthesis. This has been shown to be NP-hard [30] in general.
In practice, it is common that a design may be extensively simulated to confirm whether the design is numerically sound, both in terms of range and precision, e.g. [103]. However this approach fails to give real confidence in the design due to the limited number of simulations, and it scales poorly as the precisions of the inputs grow. More importantly, this approach fails to provide any intellectual understanding of the design at hand.
More formal approaches for range analysis include the use of interval arithmetic and affine arithmetic; however, these suffer from not providing tight bounds [124], [51] and [129]. These have been augmented with the iterative use of Satisfiability Modulo Theories (SMT) [9], which provides tight bounds but with the risk of the computational complexity resulting in the SMT solvers being unable to provide proofs in reasonable time. However their use is iterative, with a bound being produced at each stage which becomes refined, thus early termination still provides a bound [98]. Another approach is to use arithmetic transforms (AT), which convert the circuit at hand into a polynomial of the following form:
Σ_{i_1=0}^{1} Σ_{i_2=0}^{1} ... Σ_{i_n=0}^{1} c_{i_1,i_2,...,i_n} x_1^{i_1} x_2^{i_2} ... x_n^{i_n}

where x_1, x_2, ..., x_n are bits and n is the total number of input bits to the design. The coefficients c are integer and the output is an integer representation of the output word of the circuit. Note that every circuit can be expressed in this way: every boolean function can be expressed as a composition of NAND gates, and the functionality of a NAND gate can be expressed as the polynomial 1 − ab, where a and b are the inputs of the NAND gate. Composing these polynomials will result in a polynomial with integer coefficients. Note that for any binary variable a^n = a, so the polynomial can always be simplified such that the power of any binary input never exceeds one. If p_i is the polynomial representing the boolean equation for bit i in the output word y then:
y = p_0 + 2p_1 + 2^2 p_2 + ... + 2^i p_i + ...
A sum of polynomials with integer coefficients is a polynomial with integer coefficients, thus an AT representation exists for all circuits. By performing a binary branch and bound on the AT, the range of the output can be determined [133]. These bounding techniques can be used to establish range and whether the absolute global error is sufficiently bounded. In the case where the error variance needs controlling, it is typically assumed that every rounding event introduces an independent source of uniformly distributed error [29], [55] and [5].
The circuit cost can be measured by performing logic synthesis and power estimation on the resultant gate level netlist. However, this is time consuming, and given that many architectures may need exploring, a typical approach is to create a heuristic that captures the hardware properties of interest. Heuristics have included the number of terms in an AT as giving a strong indication of associated hardware cost [133] and using gate counts to estimate the effect of changing precision [29].
In terms of navigating the solution space, the approaches that have been taken include genetic algorithms [129], [62] and [5]. An iterative algorithm inspired by a steepest descent method has been used in [55].
Using cost heuristics, the lossy synthesis optimisation has been phrased as a mixed integer linear program [29], which is suitable for small problem sizes. A heuristic approach which steps through the space of possible precisions has been presented in [62], [29] and [133]. In [62] and [29] the starting point is the smallest uniform precision. To provide information about the direction in which to move, a notion of sensitivity of the output to a given internal precision has been used, firstly via performing differentiation on the polynomial representing the design in [133] and secondly via automatic differentiation [74] in [62].
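To make the arithmetic transform construction of this subsection concrete, the following sympy sketch (illustrative, assuming sympy is available) builds the AT of a half adder from the gate polynomials (XOR as a + b − 2ab, AND as ab) and recovers its word-level polynomial:

    from sympy import symbols, expand

    a, b = symbols('a b')
    sum_bit   = a + b - 2*a*b          # AT of XOR
    carry_bit = a*b                    # AT of AND
    y = expand(sum_bit + 2*carry_bit)  # y = p0 + 2*p1
    print(y)                           # a + b: the AT of a half adder

    # No exponent exceeds one here, so no a**n -> a simplification is needed,
    # and for two input bits the branch and bound of [133] degenerates to
    # enumeration of the output range:
    print(sorted({y.subs({a: i, b: j}) for i in (0, 1) for j in (0, 1)}))  # [0, 1, 2]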
2.3.2. Imprecise Operators
In 2012 a chip was built that was up to 15 times more efficient by allowing a deviation of up to 8% in numerical output accuracy [112]. This was achieved by altering the functionality of the integer adders and multipliers within the chip. The approach was to relinquish the conventional wisdom that the fundamental operators need to return the correct answer. An imprecise approach to operator correctness is permissible when the applications involved can naturally tolerate errors, for example image processing.
Considering the integer adder: in the case of a ripple carry adder, a carry needs to ripple all the way from the least significant to the most significant bit. Delay, area and power gains can be achieved by simply cutting the carry chain. The idea is that long carry chains are unlikely in practice but yet determine the circuit complexity; disposing of this rarely exercised circuitry should result in a design with a low probability of large errors. A sloppy adder has been proposed in [6] with the following imprecise functionality for an addition of a[n − 1 : 0] and b[n − 1 : 0] with k least significant sloppy bits:
s[k − 1 : 0] = a[k − 1 : 0] ⊕ b[k − 1 : 0]
s[n : k]     = a[n − 1 : k] + b[n − 1 : k]
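A direct executable transcription of this functionality (a Python sketch; a and b are treated as unsigned integers):

    def sloppy_add(a, b, n, k):
        """Sloppy adder of [6]: the k least significant result bits come from
        XOR alone (no carry generation), the rest from an exact addition."""
        low = (a ^ b) & ((1 << k) - 1)    # s[k-1:0] = a[k-1:0] XOR b[k-1:0]
        high = (a >> k) + (b >> k)        # s[n:k]   = a[n-1:k] + b[n-1:k]
        return (high << k) | low

    # The error is exactly the doubled dropped carries, 2*((a & b) mod 2^k),
    # so it is always non-negative and below 2^(k+1):
    n, k = 8, 3
    assert all(0 <= (x + y) - sloppy_add(x, y, n, k) < 2 ** (k + 1)
               for x in range(1 << n) for y in range(1 << n))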
The ETAIIM adder (error tolerant adder type II modified) is an imprecise version of the carry skip adder where carries really are skipped to cut the carry chain; its functionality is as follows:
s'[k − 1 : 0]    = a[k − 1 : 0] + b[k − 1 : 0]
s'[2k − 1 : k]   = (a[2k − 1 : 0] + b[2k − 1 : 0]) >> k
s'[3k − 1 : 2k]  = (a[3k − 1 : k] + b[3k − 1 : k]) >> k
s'[4k − 1 : 3k]  = (a[4k − 1 : 2k] + b[4k − 1 : 2k]) >> k
...
s'[n − 1 : mk]   = (a[n − 1 : (m − 1)k] + b[n − 1 : (m − 1)k]) >> k
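These slice equations, which are explained below, can be transcribed directly (a Python sketch under the assumption that n > mk and the operands are unsigned); each sloppy k-bit slice sees only the 2k-bit window beneath it, while the top portion is computed exactly with a carry taken from the window below:

    def bit_field(x, hi, lo):
        """Extract x[hi:lo] of an unsigned integer."""
        return (x >> lo) & ((1 << (hi - lo + 1)) - 1)

    def etaiim_add(a, b, n, k, m):
        mask = (1 << k) - 1
        s = (bit_field(a, k - 1, 0) + bit_field(b, k - 1, 0)) & mask  # s'[k-1:0]
        for j in range(1, m):        # s'[(j+1)k-1 : jk] from a 2k-bit window
            wa = bit_field(a, (j + 1)*k - 1, (j - 1)*k)
            wb = bit_field(b, (j + 1)*k - 1, (j - 1)*k)
            s |= (((wa + wb) >> k) & mask) << (j * k)
        wa = bit_field(a, n - 1, (m - 1)*k)   # s'[n-1:mk]: exact, with carry
        wb = bit_field(b, n - 1, (m - 1)*k)   # taken from the window below
        s |= (((wa + wb) >> k) & ((1 << (n - m*k)) - 1)) << (m * k)
        return s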
The least significant portion of the adder is sliced into m regions, each of length k. The carry chains in the least significant region are limited to a length of 2k, whereas the most significant portion is computed correctly using a carry from the least significant portion [180]. The almost correct adder (ACA) is an imprecise variant of a parallel prefix adder and ignores the last few levels of the logarithmic carry generation [175]. It was noted in [82] that all of the imprecise schemes essentially insert zeroes into the carry chain at specific points and thus have biased errors.
Instead of setting G_{i,0} = 0, [82] propose using G_{i,0} = a_i or G_{i,0} = b_i in an attempt to reduce the asymmetry in the error.
For imprecise integer multipliers, [82] proposed replacing only the final binary addition by an imprecise adder, leaving the rest of the architecture unchanged. In [6] the Booth radix-4 multiplier architecture was altered during array creation. Typically each partial product bit of a Booth radix-4 array is a function of three bits of a and two bits of b; [6] consider two ways of simplifying this logic, using only two bits of a and/or only one bit of b, and applying this to one or both of the least significant columns and rows. Other work has concentrated on the array reduction and designed a range of simplified full adder cells [75].
These imprecise components already provide a considerable number of fundamentally functionally different multiplier and adder architectures, and any one of these could be used for any of the instances within the design. In [82] 39 adder designs and 101 multiplier designs are considered by using a variety of imprecise components.
Research into these imprecise operators has provided analytic formulae for the error properties of the sloppy full adder [75] and some of the imprecise adders.
At a design level, the effect of the imprecision has been captured by simulation of use cases and taking a measure such as peak signal to noise ratio [75]. A more analytic approach has been taken in [82], where interval arithmetic was used to propagate probability density functions through designs consisting of only fixed-point adders and multipliers to determine the output probability distribution.
A variety of approaches have been taken to the optimisation problem of how to use these imprecise components to optimise certain hardware properties while maintaining an acceptable level of output error. In [81] experiments were performed to establish the error, power and delay of a set of ACAs, each having a different number of logic levels. Convex curves were fitted to the experimental results, which allowed for the formulation of a convex optimisation problem to solve the lossy synthesis problem. An analytically tractable problem was also formulated in [75], enabled by creating a heuristic for the power of a ripple carry adder made from sloppy full adders. In [82] a genetic algorithm was combined with power runs for hardware quality and interval arithmetic for error quality.
2.3.3. Gate Level Imprecision
Word-length optimisation and the use of imprecise operators are manifestations of the simple idea that there may be logic gates that can be removed from a circuit while still maintaining some level of quality. This is the idea behind SALSA (Systematic methodology for Automatic Logic Synthesis of Approximate circuits) [172] and SASIMI (Substitute-And-SIMplIfy) [173]. These tools require the creation of a circuit whose output bit is high if the approximate circuit has acceptable quality. For example, if the original circuit is y_orig(x_1, x_2, ..., x_n), a candidate approximate circuit is y_approx(x_1, x_2, ..., x_n) and quality can be defined via an absolute error constraint, then the functionality of the circuit required is:
s = Q(y_orig, y_approx) = (|y_orig(x_1, x_2, ..., x_n) − y_approx(x_1, x_2, ..., x_n)| < E) ? 1 : 0

where Q is the quality function. Gates within y_approx can be manipulated as long as s = 1 for all allowed values of the inputs x_1, x_2, ..., x_n. SALSA
first finds the output bits of y_orig and y_approx such that Q is unaltered if the ith bit of y_approx is either one or zero. Then, by passing external don't cares to a logic synthesis tool, the circuit for y_approx can be optimised using this freedom. This can be done for each output bit of y_approx. SASIMI tries to find internal points which compute the same value with high probability and substitutes one for the other; this process is performed iteratively while maintaining the fact that the quality function always evaluates to one.
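The quality-function idea is easy to state executably. The sketch below checks Q exhaustively for small n (illustrative only: SALSA and SASIMI establish s = 1 formally rather than by enumeration), using the sloppy adder of Section 2.3.2 as the approximate circuit:

    def quality_ok(y_orig, y_approx, n, E):
        """True iff |y_orig - y_approx| < E for all pairs of n-bit inputs."""
        return all(abs(y_orig(a, b) - y_approx(a, b)) < E
                   for a in range(1 << n) for b in range(1 << n))

    n, k = 6, 3
    exact  = lambda a, b: a + b
    sloppy = lambda a, b: ((((a >> k) + (b >> k)) << k) | ((a ^ b) & ((1 << k) - 1)))
    print(quality_ok(exact, sloppy, n, E=2 ** (k + 1)))   # True: the dropped
                                                          # carries cost < 2^(k+1)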
2.4. Polynomials in Datapath
Polynomials are commonly found within datapath. Recall from Section 2.1.2 that the set of operators extracted during datapath synthesis is:
A ± B ± C ...              multioperand addition and subtraction
A × B, A × A               multiplication and squaring
A × B ± C × D ± E ...      sum-of-products (SOP)
(A + B) × C, (A × B) × C   product-of-sums
S ? A : B : C : D          selection or muxing
A >> B, A >>> B            shifters
A == B, A > B              equality checking and comparisons
Compositions of the first four of these types of operations will obviously result in polynomials. In the case of muxing, this is also polynomial in nature:
S ? A : B : C : D = (1 − S_1)(1 − S_0)A + (1 − S_1)S_0 B + S_1(1 − S_0)C + S_1 S_0 D

where S_1 and S_0 are the two bits of the select signal S.
Moreover left shifting is also expressible as a polynomial:
A << B[n − 1 : 0] = A ((2^{2^{n−1}} − 1)B_{n−1} + 1) ... (15B_2 + 1)(3B_1 + 1)(B_0 + 1)
Certain logical operations are also polynomial:
~a[n − 1 : 0] = 2^n − 1 − a                   // inversion
a[n − 1 : 0] ⊕ s = (1 − 2s)a + s(2^n − 1)     // XORing each bit of a with bit s
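These polynomial identities can be confirmed by exhaustive evaluation for small widths; a Python sketch for the left shift and inversion forms (function names are illustrative):

    from itertools import product

    def shift_poly(A, B_bits):
        """A << B as the product of ((2^(2^j) - 1)B_j + 1) factors: each
        factor contributes 2^(2^j) when B_j = 1 and 1 otherwise."""
        r = A
        for j, Bj in enumerate(B_bits):
            r *= ((1 << (1 << j)) - 1) * Bj + 1
        return r

    n = 3
    for A in range(16):
        for B_bits in product((0, 1), repeat=n):
            B = sum(Bj << j for j, Bj in enumerate(B_bits))
            assert shift_poly(A, B_bits) == A << B

    # Inversion: the bitwise complement of an n-bit a equals 2^n - 1 - a.
    assert all((~a & 0xFF) == 2**8 - 1 - a for a in range(256))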
Hence methods for optimising polynomials with integer inputs and coefficients are of relevance to datapath synthesis engines. A variety of number formats have polynomial interpretations. The interpretation of x[n − 1 : 0] as one of a variety of number formats can be written as a polynomial with integer inputs and rational coefficients:
Fixed-Point     x / 2^m
Signed          −2^{n−1} x_{n−1} + x[n − 2 : 0]
Sign Magnitude  (1 − 2x_{n−1}) x[n − 2 : 0]
Certain graphics formats can also be written as such:
UNORM   x[n − 1 : 0] / (2^n − 1)
SNORM   (−2^{n−1} x_{n−1} + x[n − 2 : 0]) / (2^{n−1} − 1)
Floating-point formats, ignoring exceptions, can also be written in a polynomial form. If the floating-point format is the concatenation of sign s, exponent e and mantissa m, where the exponent and mantissa are of width ew and mw respectively, then the interpretation can be written as:
(−1)^s 2^{e − 2^{ew−1} + 1} × 1.m
  = (1 − 2s) ((2^{2^{ew−1}} − 1)e_{ew−1} + 1) ... (3e_1 + 1)(e_0 + 1) (2^{mw} + m) / 2^{2^{ew−1} + mw − 1}

So polynomial operations involving any of these formats will require the implementation of polynomials with integer inputs and rational coefficients. More complex operations such as function evaluation, e.g. sin(x), typically use fixed-point polynomials to perform part of the evaluation. Finally, at the application level, polynomials with rational coefficients are found in countless domains (media processing, graphics, communications) which require implementations of algorithms from signal processing, linear algebra, filtering, etc. In summary, polynomials can be used to express the functionality of a significant portion of datapath design as well as being directly required for the implementation of a large number of algorithms.
2.5. Context of Contribution
Lossy synthesis has the potential to overcome the power consumption challenges facing chip design. Circuits producing results of varying accuracy and power may be desirable due to having to support multiple standards or working in applications which have an inherent error tolerance. Power restrictions combined with increased transistor counts may permit the creation of separate circuits for each accuracy, or logic sharing. Imprecise operators have shown benefit beyond word-length optimisation, but they have not been designed to exhibit particular error properties; they are treated as black boxes with particular parameters that can be altered. Further, imprecise operators have not been explored for datapath blocks which are a combination of fundamental operators, such as sum-of-products. Formal proof of correctness of imprecise circuits and optimal use of the error freedom are crucial to enable adoption and to assert the quality of lossy circuits. The implementation of polynomials with rational coefficients is a common requirement.
For all of these reasons, this thesis contributes techniques towards the realisation of a lossy synthesiser of the form shown in Figure 2.10. This synthesiser takes as input a polynomial expressed symbolically and the formats of the inputs and output with their associated precisions. More precisely, polynomials with rational coefficients with fixed-point inputs and outputs will be considered, as well as polynomials with floating-point coefficients and floating-point inputs and outputs. Producing a result which is infinitely precise is typically impossible with a finite precision output, thus a notion of acceptable error must be provided. In the case of a fixed-point output, the error will be considered acceptable if its absolute value is bounded by a user defined limit. In the case of a floating-point output, acceptable error will be a bounded relative error whose limit is user defined. The output is RTL that is guaranteed to meet the error specification, is intended to optimally exploit the error specification and should be suitable for datapath logic synthesis.
In order to guarantee that the error specification is met, the error properties of the architectures considered are analysed analytically. Where possible, independent formal verification techniques are created which prove the correctness of the resultant RTL. The architectures used to meet and exploit the error specification will be imprecise datapath operators.
[Figure 2.10.: Lossy Synthesiser. Inputs: a polynomial with rational or floating-point coefficients; input and output formats, fixed-point or floating-point; and an error specification, absolute for fixed-point or relative for floating-point. Output: RTL guaranteed to meet the error specification, with maximal exploitation of the error to minimise hardware resources.]

These methods go beyond word-length optimisation by using imprecise datapath operators, but in order to use these components their error properties have to be analytically investigated. An arithmetic sum-of-products operator will be a fundamental datapath building block used throughout the thesis. The method of exploiting the error specification considered is the removal of partial product bits that form the array of the sum-of-products operator. Where possible, proofs are provided that show that the error has been maximally exploited, such that for a given architecture, there can be no better implementation that still meets the error specification.
The use of the mathematical description of the polynomial as the input to the lossy synthesiser is at a higher abstraction level than is typically considered by HLS tools. This form of input is closer to the designers' intent and places the burden of establishing internal precisions on the tool, but does require a user defined error specification. This will need to be derived from an application level understanding of acceptable error. A typical user requirement is that the result should be correct to the unit in the last place; for a fixed-point output this can be translated into an absolute error bound and for floating-point this can be translated into a bound on the absolute value of the relative error. Thus the challenge for the user is to define the output precision. This architectural choice is ultimately determined by a notion of application level quality and an associated fidelity metric. For example, audio, image and video decompression have an application level quality which is subjective human perception and is translated into a signal to noise ratio, which in turn is used to determine internal precisions. Other
examples include lossy decompression algorithms, artificial intelligence applications where current states of belief are held in probability distributions [110], and situations where standards specifically state error freedom when implementing complex functions such as sin(x) [93]. This thesis uses the inherent application level error freedom to reduce hardware implementation costs by design time architecting. Application level error freedom can also be used to recover from real-time faults and to perform real-time performance improvements such as voltage scaling [110], [147].
3. Preliminaries
The manipulation of polynomials plays a central role in the chapters that follow. Advances in computational algebraic geometry not only enable automated and efficient implementations of the algorithms presented in this thesis, but their continued development will also naturally increase the scope of designs that are tractable by such algorithms. This chapter presents background material on polynomial rings, Gröbner bases and ideals. This material is derived from [33], [4], [73] and [32]. The motivating questions answered by these preliminaries are:
Q1 Building Blocks: Can a given polynomial be written as a polynomial function of a set of others? For example, can the polynomial:
(ac − bd)^2 + (ad + bc)^2
be written as a polynomial in u and v, where u = a^2 + b^2 and v = c^2 + d^2? In this case, the answer is yes and the decomposition is known as the Brahmagupta-Fibonacci identity1:
(ac − bd)^2 + (ad + bc)^2 = (a^2 + b^2)(c^2 + d^2)
Q2 Verification over the Finite Field: Equality of polynomials is more complex when dealing with finite arithmetic; for example, consider these two designs:
y1[2 : 0] = x^4 + 2x
y2[2 : 0] = 2x^3 + x^2
where the least significant three bits of these two polynomials are the outputs. These designs, in fact, produce identical output for any
1 This shows that the product of numbers which are the sum of two squares is also a sum of two squares.
integer value of x. Can one show the equivalence of such designs, and if so how?
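Exhaustive evaluation confirms the claim for Q2 (a one-line Python check; the low three bits depend only on x mod 8):

    # x^4 + 2x - (2x^3 + x^2) = (x - 2)(x - 1)x(x + 1), a product of four
    # consecutive integers and hence divisible by 8, so the two designs
    # agree modulo 2^3 for every integer x.
    assert all((x**4 + 2*x) % 8 == (2*x**3 + x**2) % 8 for x in range(8))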
3.1. Polynomials and Polynomial Rings
The building blocks of polynomials are monomials:
Definition A monomial is a product of variables:
x_1^{α_1} x_2^{α_2} ... x_n^{α_n}
where the α_i are non-negative integers. This is abbreviated to x^α, where α is the vector of exponents in the monomial.
Definition The total degree of a monomial is the sum of its exponents, denoted by |α|.
x^α = x_1^{α_1} x_2^{α_2} ... x_n^{α_n}
|α| = α_1 + α_2 + ... + α_n
Definition A polynomial is a finite sum of weighted monomials:
p = Σ_α c_α x^α

where the coefficients c_α belong to a set k. Examples of k include the field of rational numbers, Q, the field of real numbers, R, the ring of integers, Z, or the finite ring of integers modulo p, denoted Z_p.
Definition The polynomial ring, denoted k[x_1, x_2, ..., x_n], is the set of all polynomials in variables x_1, x_2, ..., x_n with coefficients in a ring k. For example:
(1/2)x^2 + xy + (1/2)y^2 ∈ Q[x, y]
Definition Degree is defined, for a polynomial p in one variable (univariate), as the largest exponent appearing in p, denoted deg(p).
Definition Homogeneous polynomials — a polynomial is homogeneous if all its monomials have the same total degree. For example:
3x^3 − 4xy^2 + 5z^3
is a homogeneous polynomial of total degree 3.
Definition An ideal generated by a set of polynomials {f_1, f_2, ..., f_s}, where f_i ∈ k[x_1, x_2, ..., x_n], is denoted by ⟨f_1, f_2, ..., f_s⟩ and is the span of the generating set:

⟨f_1, f_2, ..., f_s⟩ = {p_1 f_1 + p_2 f_2 + ... + p_s f_s : p_1, p_2, ..., p_s ∈ k[x_1, x_2, ..., x_n]}
Ideals with different generating sets may, in fact, be identical. For example:
⟨x − y^2, xy, y^2⟩ = ⟨x, y^2⟩
This can be shown by demonstrating that every generating element in one ideal lives in the other:
x − y^2 = (+1)x + (−1)y^2   →  x − y^2 ∈ ⟨x, y^2⟩
xy = (y)x                   →  xy ∈ ⟨x, y^2⟩
y^2 = y^2                   →  y^2 ∈ ⟨x, y^2⟩
x = (+1)(x − y^2) + (+1)y^2 →  x ∈ ⟨x − y^2, xy, y^2⟩
y^2 = y^2                   →  y^2 ∈ ⟨x − y^2, xy, y^2⟩
Definition Ideal Membership Testing — this is the task of ascertaining whether or not a given polynomial belongs to an ideal. For example:
x^2 ∈ ⟨x − y^2, xy⟩ due to x^2 = x(x − y^2) + y(xy)
In general it can be difficult to determine whether two ideals are equal or perform ideal membership testing. Demonstrating that a polynomial belongs to an ideal requires dividing the polynomial of interest by one of the generating polynomials. To place this process on an algorithmic footing, a procedure for division is required where the polynomials are in more than one variable (multivariate).
3.2. Multivariate Division
Consider first division in the univariate case. Division of x^4 by x^2 − 3 can be performed by repeated subtraction of multiples of x^2 − 3:
x^4 − x^2(x^2 − 3) = 3x^2
3x^2 − 3(x^2 − 3) = 9

From which it can be concluded that x^4 = (x^2 + 3)(x^2 − 3) + 9.
The result of the division is written as:
x^4 = 9 mod (x^2 − 3)
In general, this division process will proceed as follows for f(x) divided by g(x):
f − q_1 g = r_1
r_1 − q_2 g = r_2
...
r_{n−1} − q_n g = r_n
From which it can be concluded that f = (q_1 + q_2 + ... + q_n)g + r_n, where each q_i is chosen to cancel the leading term of r_{i−1}, so that deg(r_i) decreases until deg(r_n) < deg(g). Univariate division can be represented algorithmically
as follows:
Input:  f, g ∈ k[x], g ≠ 0
Output: q, r such that f = qg + r, deg(r) < deg(g)
begin
    q = 0, r = f
    while r ≠ 0 and deg(g) ≤ deg(r)
    begin
        q = q + LT(r)/LT(g)
        r = r − (LT(r)/LT(g)) g
    end
end
LT(p) is the leading term of polynomial p, namely the monomial of highest degree together with its coefficient. When attempting to modify this algorithm for multivariate division, the question of defining the leading term of a multivariate polynomial arises. In order to do this, an ordering of all monomials, called a term ordering, is required, which states the conditions for x^α < x^β. The simplest term order is:
Definition Lexicographic Term Order (lex): this ordering proceeds by comparing powers of the variables in turn. Say x < y; then comparing monomials will proceed by first comparing the power of y and then that of x, hence:
1 < x < x^2 < x^3 < ... < y < xy < x^2y < ... < y^2 < ...
In general, if x_1 < x_2 < ... < x_n then

x_1^{α_1} x_2^{α_2} ... x_n^{α_n} < x_1^{β_1} x_2^{β_2} ... x_n^{β_n}  ⇐⇒  there exists j such that α_j < β_j and α_k = β_k for all k > j
Definitions For a given term ordering and f ∈ k[x_1, x_2, ..., x_n], f ≠ 0, with 0 ≠ a_i ∈ k, let:

f = a_1 x^{α_1} + a_2 x^{α_2} + ... + a_r x^{α_r}   where x^{α_1} > x^{α_2} > ... > x^{α_r}
• Leading Power Product: LP(f) = x^{α_1}
• Leading Coefficient: LC(f) = a_1
• Leading Term: LT(f) = a_1 x^{α_1}
Now, given a term ordering, the multivariate division algorithm for dividing f by a set F = {f_1, f_2, ..., f_s} can be described. If f is unchanged by the division by F then f is said to be reduced with respect to F. The multivariate division algorithm is contained in Figure 3.1. So h is reduced by either cancelling its leading term with multiples of the leading terms of the f_i or, if no reduction can occur, adding its leading term to the remainder r. Once the division has been performed, f has been reduced with respect to F; this is written as follows:
f mod F = r
As an example, consider dividing f = x^2y^2 by f_1 = x^2y − x and f_2 = xy^2 + y using the lexicographic ordering with x > y; the conclusion is:
f = y f_1 + xy   as   x^2y^2 = y(x^2y − x) + xy
If the multivariate division had treated f_1 and f_2 in the reverse order, the result would be:
f = x f_2 − xy   as   x^2y^2 = x(xy^2 + y) − xy
This gives rise to the unfortunate conclusion that the remainder is dependent on the ordering of the f_i polynomials. One could infer from the example
Input:  f, f_1, f_2, ..., f_s ∈ k[x], f_i ≠ 0
Output: u_1, u_2, ..., u_s, r ∈ k[x] such that
        f = u_1 f_1 + u_2 f_2 + ... + u_s f_s + r
        and r is reduced with respect to F
begin
    u_1 = 0, u_2 = 0, ..., u_s = 0, r = 0, h = f
    while h ≠ 0
    begin
        if there exists i such that LP(f_i) | LP(h) then
            i = least i such that LP(f_i) | LP(h)
            u_i = u_i + LT(h)/LT(f_i)
            h = h − (LT(h)/LT(f_i)) f_i
        else
            r = r + LT(h)
            h = h − LT(h)
        endif
    end
end

Figure 3.1.: Multivariate Division Algorithm.
that there do not exist polynomials g_1 and g_2 such that:

f = g_1 f_1 + g_2 f_2

since the division algorithm terminates without a zero remainder. However, this is an incorrect conclusion:

f = (1/2) y f_1 + (1/2) x f_2   as   x^2y^2 = (1/2) y(x^2y − x) + (1/2) x(xy^2 + y)
In conclusion, the remainder of the multivariate division algorithm is dependent on the ordering of the polynomials f_i; even if a decomposition of f with respect to the f_i does exist with zero remainder, the division algorithm may not return a remainder of zero. To address these problems and answer the motivating questions of this chapter we turn to Gröbner bases.
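The order dependence just demonstrated can be reproduced with sympy's implementation of the multivariate division algorithm (a sketch, assuming sympy; `reduced` returns the quotients u_i and the remainder r):

    from sympy import symbols, reduced

    x, y = symbols('x y')
    f, f1, f2 = x**2*y**2, x**2*y - x, x*y**2 + y

    # Same dividend, two divisor orders, two remainders (lex with x > y):
    print(reduced(f, [f1, f2], x, y, order='lex'))  # ([y, 0], x*y)
    print(reduced(f, [f2, f1], x, y, order='lex'))  # ([x, 0], -x*y)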
3.3. Gröbner Bases
Definition For a set of polynomials G = {g_1, g_2, ..., g_s}, where g_i ∈ k[x_1, x_2, ..., x_n], with a particular term order:

G is a Gröbner Basis (GB)  ⇐⇒  f mod G is independent of the order of the g_i, for all f ∈ k[x_1, x_2, ..., x_n]
Using our previous example, the set F = {f_1 = x^2y − x, f_2 = xy^2 + y} is not a Gröbner basis with respect to the lexicographic term ordering, as the remainder of x^2y^2 when divided by F is not unique. Crucially it is observed that:

xy ∈ ⟨x^2y − x, xy^2 + y⟩   however   xy mod F = xy
So xy should have remainder zero when reduced modulo F, but it will not be altered by the multivariate division algorithm. The cause is that while individually f_1 and f_2 cannot cancel xy, a linear combination of the two can. Such a combination is called an S-polynomial [16], and is defined as follows (where LCM is the least common multiple):
S(f_1, f_2) = LCM(LT(f_1), LT(f_2)) (f_1/LT(f_1) − f_2/LT(f_2))
            = x^2y^2 (f_1/(x^2y) − f_2/(xy^2))
            = y(x^2y − x) − x(xy^2 + y)
            = −2xy

(scaling by a nonzero constant is immaterial, so this S-polynomial may equally be taken as 2xy)
By construction, these S-polynomials cannot be reduced by f_1 or f_2, yet they reside in ⟨f_1, f_2⟩. In fact this observation can be formalised into the following theorem:
Theorem 3.3.1 (Buchberger's Criterion [16])
For a set G of polynomials defining an ideal I ⊂ k[x_1, x_2, ..., x_n] with a particular term order:

G is a Gröbner Basis  ⇐⇒  S(g_i, g_j) mod G = 0 for all g_i, g_j ∈ G
This gives rise to an algorithm which computes a Gröbner basis from any polynomial set, by repeatedly adjoining to G any S-polynomial which is non-zero modulo G. This algorithm is known as Buchberger's algorithm and can be found in Figure 3.2.
Input:  F = {f_1, f_2, ..., f_s}
Output: Gröbner basis G such that ⟨G⟩ = ⟨F⟩
begin
    G = F
    repeat
        H = G
        for each pair p, q ∈ H
            S = S(p, q) mod H
            if S ≠ 0 then G = G ∪ {S}
        end for
    until H = G
end

Figure 3.2.: Buchberger's Algorithm.
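A direct, unoptimised transcription of Figure 3.2 using sympy for the term-order bookkeeping (a sketch; real implementations add pair-selection and reduction strategies):

    from itertools import combinations
    from sympy import symbols, LT, lcm, expand, reduced

    def s_polynomial(p, q, gens, order='lex'):
        """S(p, q) = LCM(LT(p), LT(q)) * (p/LT(p) - q/LT(q))."""
        lp, lq = LT(p, *gens, order=order), LT(q, *gens, order=order)
        l = lcm(lp, lq)
        return expand(l/lp * p - l/lq * q)

    def buchberger(F, gens, order='lex'):
        G = list(F)
        while True:
            H = list(G)
            for p, q in combinations(H, 2):
                s = s_polynomial(p, q, gens, order)
                if s == 0:
                    continue
                _, r = reduced(s, H, *gens, order=order)
                if r != 0 and r not in G:
                    G.append(r)
            if H == G:
                return G

    x, y = symbols('x y')
    print(buchberger([x**2*y - x, x*y**2 + y], (x, y)))
    # Reproduces the progression shown below, up to constant factors,
    # e.g. [x**2*y - x, x*y**2 + y, -2*x*y, ...]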
This is the simplest form of Gröbner basis computation; many variants exist which improve on its computational complexity. Applying this algorithm to {x^2y − x, xy^2 + y} using the lexicographic term ordering and observing the set H:
{x^2y − x, xy^2 + y}
{x^2y − x, xy^2 + y, 2xy}
{x^2y − x, xy^2 + y, 2xy, 2y}
{x^2y − x, xy^2 + y, 2xy, 2y, −2x}
The result is a Gröbner basis; however, it can be further simplified to produce a reduced Gröbner basis:
Definition G = {g_1, g_2, ..., g_s} is a Reduced Gröbner Basis if LC(g_i) = 1 and, for all j ≠ i, LP(g_j) does not divide any term in g_i.
It can be shown that for a given term order, every ideal has a unique reduced Gröbner basis. Thus reduced Gröbner bases provide a canonical representation of any ideal. The two questions regarding ideals can now be answered:
1. Are two ideals identical? Ideals are identical if and only if their reduced Gröbner bases are identical.
2. How can Ideal Membership Testing be performed? First compute a Gröbner basis G for the ideal I, then compute r = f mod G. Now f ∈ I if and only if r = 0, as the result of reduction modulo G is unique.
For example, it was shown earlier that ⟨x − y^2, xy, y^2⟩ = ⟨x, y^2⟩; this can be confirmed by computing a reduced Gröbner basis for {x − y^2, xy, y^2}, the result being {x, y^2}. Performing such calculations using the computer algebra package Singular [43]:

    ring r=0,(x,y,z),lp; // 0,(x,y,z) refers to the ring Q[x,y,z]
                         // lp refers to the lex term ordering
    ideal I = x-y^2,xy,y^2;
    groebner(I);

which returns:

    _[1]=y^2
    _[2]=x

The runtimes, the degrees of the resultant polynomials and their number when computing Gröbner bases are highly dependent on the term order.
3.4. Elimination and Solutions to Systems of Multivariate Polynomials
Returning to our motivating Q1: how can decompositions be discovered, such as:
Let: u = a^2 + b^2, v = c^2 + d^2. Then:

(ac − bd)^2 + (ad + bc)^2 = uv
Consider the following expansion: