Copyright © 2017 John L. Gustafson

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sub-license, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

This copyright and permission notice shall be included in all copies or substantial portions of the software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES, OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Posit Arithmetic
John L. Gustafson
10 October 2017

1 Overview

Unums are for expressing real numbers and ranges of real numbers. There are two modes of operation, selectable by the user: posit mode and valid mode.

In posit mode, a unum behaves much like a floating-point number of fixed size, rounding to the nearest expressible value if the result of a calculation is not expressible exactly; however, the posit representation offers more accuracy and a larger dynamic range than floats with the same number of bits, as well as many other advantages. We can refer to these simply as posits for short, just as we refer to IEEE 754 Standard floating-point numbers as floats.

In valid mode, a unum represents a range of real numbers and can be used to rigorously bound answers much like interval arithmetic does, but with a number of improvements over traditional interval arithmetic. Valids will only be mentioned in passing here, and described in detail in a separate document.

This document focuses on the details of posit mode, which can be thought of as “beating floats at their own game.” It introduces a type of unum, Type III, that combines many of the advantages of Type I and Type II unums, but is less radical and is designed as a “drop-in” replacement for an IEEE 754 Standard float with no changes needed to source code at the application level. The following table may help clarify the terminology.

Type I (introduced March 2015)
  IEEE 754 compatibility: Yes; a perfect superset.
  Advantages: Most bit-efficient rigorous-bound representation.
  Disadvantages: Variable width management needed; inherits IEEE 754 disadvantages, such as redundant representations.

Type II (introduced January 2016)
  IEEE 754 compatibility: No; a complete redesign.
  Advantages: Maximum information per bit (can customize to a particular workload); perfect reciprocals (+ – × ÷ equally easy); extremely fast via ROM table lookup; allows decimal representations.
  Disadvantages: Table look-up limits precision to ~20 bits or less; the exact dot product is usually expensive and impractical.

Type III (introduced February 2017)
  IEEE 754 compatibility: Similar; conversion possible.
  Advantages: Hardware-friendly; the posit form is a drop-in replacement for IEEE floats (a less radical change); faster, more accurate, and lower cost than floats.
  Disadvantages: Too new to have support from vendors yet; perfect reciprocals only for 2^n, 0, and ±∞.

Type III posits use a fixed number of bits, though that number is very flexible, from as few as two bits up to many thousands. They are designed for simple hardware and software implementation. They make use of the same type of low-level circuit constructs that IEEE 754 floats use (integer adds, integer multiplies, shifts, etc.), and take less chip area since they are simpler than IEEE floats in many ways. Early FPGA experiments show markedly reduced latency for posits, compared to floats of the same precision.

As with all unum types, there is an h-layer of human-readable values, a u-layer that represents values with unums, and a g-layer that performs mathematical operations exactly and temporarily, for return to the u-layer. Here, the h-layer looks very much like standard decimal input and output, though we are more careful about communicating the difference between exact and rounded values. The u-layer is posit-format numbers, which look very much like floats to a programmer. The g-layer represents temporary (scratch) values in a quire, which does exact operations that obey the distributive and associative laws of algebra and allow posits to achieve astonishingly high accuracy with fewer bits than floats. It's a lot to explain at once, but let's start with the u-layer since it is the closest cousin to floating-point arithmetic.

1.1 Brief description of the format for posit mode, Type III unums

A posit for Type III unums is designed to be hardware-friendly, and looks like the following:

sign   regime         exponent          fraction
bit    bits           bits, if any      bits, if any

 s     r r r r ⋯ r r̄   e1 e2 e3 ⋯ e_es   f1 f2 f3 f4 f5 f6 ⋯

|<-------------------------- nbits -------------------------->|

The boundary between regions is only shown after the sign bit, because the other boundaries vary with the size of the regime. The regime is the sequence of identical bits r, terminated by the opposite bit r̄ or by the end of the posit. This way of representing integers is sometimes referred to as “unary arithmetic,” but it is actually a bit more sophisticated. The most primitive way to record integers is with marks, where the number of marks is the number represented. Think of Roman numerals, 1 = I, 2 = II, 3 = III, but then it breaks down and uses a different system for 4 = IV. Or in Chinese and Japanese, 1 = 一, 2 = 二, 3 = 三, but then the system breaks down and uses a different system for 4 = 四. The one place we see this system go beyond 3 is tally marks, the way countless cartoons depict prisoners tracking how many days they’ve been in prison. But how can tally marks record positive, zero, and negative integers?

With bits, we have two kinds of “tally mark” available: 0 and 1. This means we can express both positive and negative integers by repeating the mark: negative integers by repeating 0 bits, and zero or positive integers by repeating 1 bits.

The maximum number of exponent bits is es, but the number actually present can be less than es if they are crowded off the end of the unum by the regime bits. For people used to IEEE floats, this is usually the part where confusion sets in. “Wait, exactly where are the exponent bits, and how many are there?” They do not work the same way as the exponent field of a float. They move around, and they can be clipped off. For now, think of the regime bits and exponent bits as together serving to represent an integer k, so that the scaling of the number is 2^k.
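As a quick sanity check of that claim in Mathematica (the regime value -3 and exponent 5 are borrowed from the 16-bit example in Section 1.2, where es = 3; the Module is just an illustration, not part of the prototype):

Module[{es = 3, k = -3, e = 5},
  (2^2^es)^k * 2^e == 2^(k*2^es + e)]  (* True: both sides equal 2^-19 *)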

The fraction bits are whatever bit locations are not used by the sign, regime, and exponent bits. They work exactly like the fraction bits for normalized floats. That's the main thing that makes Type III posits hardware-friendly. Just as with normalized (binary) floats, the representation boils down to

(-1)^sign × 2^(some integer power) × (1 + a binary fraction)

Hardware designers will be happy to hear that there are no “subnormal” posits like there are with floats, that there is only one rounding mode, and there are only two exception values (0 and ±∞) that do not work like the above formula.

The reason for introducing regime bits is that they automatically and economically create tapered accuracy, where values with small exponents have more accuracy and very large or very small numbers have less accuracy. (IEEE floats have a crude form of tapered accuracy called gradual underflow for small-magnitude numbers, but they have no equivalent for large-magnitude numbers.) The idea of tapered accuracy was first proposed by Morris in 1971 and a full implementation was created in 1989, but it used a separate field to indicate the number of bits in the exponent. Not only was that scheme wasteful of bits, it led to a plethora of bit patterns that all meant the same number and a hopelessly complicated mess when comparing x < y and x = y. This does not happen with Type III posits: every bit string has a unique meaning, and comparisons are the same as comparing signed integers.


The posit environment is set by specifying two numbers: nbits, the total number of bits in the posit, and es, the maximum size of the exponent bit field. Again, the bit fields represent a numerical value similarly to how the bit fields in floats work, except:

◼ If the sign bit is 1, the unum bit string is negated (treating it as a standard 2’s complement integer) before decoding the remaining fields. This eliminates the need for “negative zero” and all the complications of having two distinct bit patterns represent the same real value. (It is possible for hardware to decode the meaning of a posit bit string directly without first taking the absolute value, which makes the circuits faster and simpler. However, that makes the explanation more complicated, so here we will assume the 2’s complement is used when working with a negative posit value.)

◼ A machine instruction that tests if two integers are equal will also serve to compare two posits, whereas float instructions need additional exception tests for negative zero (equal even though the bit patterns are different) and for NaN values (not equal, even when the bit patterns are identical). A quick check of this ordering property appears after this list.

◼ The number of identical regime bits r determines a positive or negative power of 2^(2^es) that scales the value, and the exponent bits scale the value by a power of 2 ranging from 2^0 to 2^(2^es - 1). For example, with es = 3, the three exponent bits can represent binary integers from 000 = 0 to 111 = 2^3 - 1 = 7. The regime bits determine a positive or negative power of 2^(2^3) = 256, so the regime and exponent together can express a signed integer k that works like the integer expressed by the exponent bits in a float.

◼ There are no subnormal (denormalized) numbers. The implied "hidden bit" before the fraction bits is always 1.

◼ Posits do not underflow to zero or overflow to infinity like floats do. Rounding is to the nearest number, where “nearest” is the smallest difference when rounding fraction bits, and the smallest ratio when rounding exponent bits. There are no flags hidden in the state of the processor to indicate that something happened during the calculation; what you see is what you get. Debugging tools can provide a rich source of information regarding underflow, overflow, and other exceptions when a code is being developed, but they do not burden the processor during normal operation.
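Here is the promised check of the ordering property, a small sketch that leans on the setpositenv and p2x functions defined in Section 2 (signedp is just an illustrative helper for reading an unsigned bit pattern as a 2's complement integer):

setpositenv[{6, 1}];
signedp[p_] := If[p < npat/2, p, p - npat];      (* unsigned bit pattern → signed integer *)
ps = Complement[Range[0, npat - 1], {npat/2}];   (* every bit pattern except ±∞ *)
OrderedQ[p2x /@ SortBy[ps, signedp]]             (* True: sorting as integers sorts the values *)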

The other major component of the definition of posit arithmetic is the quire, the fixed-size set of bits used for scratch operations that are mathematically exact in the g-layer. A quire can be thought of as a dedicated register that permits dot products, sums, and other operations to be performed with rounding error deferred to the very end of the calculation. All computers use a hidden scratch area for temporary calculations. In posit arithmetic, the quire data type is accessible to the programmer, which is what makes it possible for posits to follow the rules of algebra much more closely than floats do. The quire concept is incredibly powerful. It allows posits to “punch above their weight class.” We will later show some examples where 16-bit posits can outperform 64-bit floats at both speed and accuracy!

1.2 A 16-Bit Example

People always ask to see an example right away instead of wading through a litany of careful definitions. So here is an example of how the bit fields work for a 16-bit posit. Pick es = 3, so the regime bits scale by negative and positive powers of 2^(2^3) = 256. Note: A “standard” 16-bit posit has an es size of 1, not 3, but 3 works better here for purposes of illustration.

sign   regime   exponent   fraction

 0     0001     101        11011101

 +  256^-3  ×  2^5  ×  (1 + 221/256)

The sign bit of 0 means the value is positive. The regime bits have a run of three 0s terminated by the opposite bit 1, which means the power of useed is -3; the scale factor from the regime is 256^-3. The exponent bits, 101, represent 5 as an unsigned binary integer, and contribute another scale factor of 2^5. Lastly, the fraction bits 11011101 represent 221 as an unsigned binary integer, so the fraction is 1 + 221/256. The expression written underneath the bits works out to 477 × 2^-27 ≈ 3.55393 × 10^-6.
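The arithmetic is easy to check in Mathematica:

256^-3 * 2^5 * (1 + 221/256) == 477 * 2^-27  (* True *)
N[477 * 2^-27]                               (* 3.55393×10^-6 *)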

The regime bits may seem like a weird and artificial construct, but they actually arise from a natural and elegant geometric mapping of binary integers to the projective real numbers on a circle. The extraction of bit fields and their meaning will be discussed in more detail later.

Notice that the largest shift expressible by the exponent here is 111 = 7 bits, or a factor of 2^7. That means the exponent can scale the fraction by a factor of 1, 2, 4, 8, …, or 128. If you need to scale by a factor of 256, that’s where the regime bits come in, because they express powers of 256.

1.3 The two exception values for posits

The above description of bit fields and their meanings has exceptions for just two posit bit strings:

◼ If all bits are 0, the number represented is zero.

◼ If only the first bit is 1 and the rest are 0, the value represented is ±∞, sometimes called “projective infinity” or “complex infinity” or “the point at infinity.” If you do not have a way to type “±∞”, inf can be used to mean the same thing.

The values ±∞ and zero are reciprocals of each other, and the division of any nonzero number by zero does not trigger an error condition; it is simply ±∞. The diagrams in Section 2 will make this clearer.

There is only one rounding mode: round to nearest, with ties to even, the same as the default rounding mode for IEEE 754 floats. If a programmer or user feels the need for the other three modes that floats support (round down, round up, round toward zero), that means the application calls for valids, not posits, or perhaps just a decent debugging tool.

What “nearest” means in rounding is a bit more complicated than with floats, because the bits clipped off on the right are not necessarily fraction bits; part or all of the exponent field can be clipped off, too. When an exponent bit is clipped off and must be rounded, "nearest" means "nearest exponent," and the tie point is the geometric mean between the two choices instead of the arithmetic mean. If that worries you, consider that it happens very rarely, and only when the calculation is about to hit the limits of the dynamic range so it’s probably already off the rails and the algorithm needs to be debugged.
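For instance, if a result must round to one of two adjacent expressible scale factors, say 2^16 and 2^24 (a hypothetical pair, purely for illustration), the tie sits at their geometric mean rather than halfway between them:

Sqrt[2^16 * 2^24]  (* 1048576 = 2^20, the geometric mean: the tie point *)
(2^16 + 2^24)/2    (* 8421376, the arithmetic mean, which is NOT used *)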

If a calculation exceeds the largest representable posit by some amount, should it overflow to infinity? Certainly not! That would turn some finite amount of error into an infinitely wrong answer, which is what floats do. Similarly, we never allow underflow to zero, since doing so means throwing away all information about an answer… even the sign. As with nonstandard rounding modes, a need to distinguish underflow, overflow, and NaN is generally an indication that an application is still in development and not yet ready for production use. To quote Donald Knuth: “It has unfortunately become customary in many instances to ignore exponent underflow and simply to set underflows to zero with no indication of error. This causes a serious loss of accuracy in most cases (indeed, it is the loss of all the significant digits), and the assumptions underlying floating point arithmetic have broken down, so the programmer really must be told when underflow has occurred. Setting the result to zero is appropriate only in certain cases when the result is later to be added to a significantly larger quantity.”


To see why this is, imagine there is some chance that an application produces an out-of-range number that is too large or too small. Posit arithmetic returns a number of magnitude maxpos or minpos, and it is up to the programmer what to do about that, which means the handling of such situations is visible in the source code, like

x = If[x ⩵ minpos || x ⩵ -minpos, 0, x]

In other words, if a programmer prefers that a calculation underflow, then explicitly replace the small result with zero. It is rare for a programmer to want this. It is even more rare for a programmer to want a computer to underflow with the only indicator being a flag hidden in the processor that can only be made visible using assembly language! For over thirty years, the IEEE 754 Standard has forced hardware engineers to put flags into processor registers in the hope that popular computer languages would make them visible somehow. It hasn’t happened.

Exception values in any data type add to hardware complexity, and also increase energy consumption. The posit format has far fewer exceptions than floats. In a 64-bit representation, floats have quadrillions of bit patterns that represent "Not a Number" (NaN), whereas posits use those bit patterns to represent numerical values. If the user asks for the square root of a negative number, zero divided by zero, etc. with posits, the computer language should catch it and report it if that is in the rules for the language. If something makes it through the protections of the language to actually attempt something like 0 ÷ 0, the default behavior is to produce the posit for ±∞ and continue computing, behaving like what float arithmetic calls a "quiet NaN." This avoids the logical contradiction of declaring a result to be unrepresentable as a number, and then… representing it as a number. It also avoids any need for the hardware to test for a NaN when beginning an arithmetic operation, since NaN is not possible as an input. There is no bit string that represents it. The ±∞ value fulfills the role of a “quiet NaN.” While a computer language should protect against the following situations,


±∞ + ±∞,  ±∞ – ±∞,  0 × ∞,  ∞ × 0,  0 ÷ 0,  ±∞ ÷ ±∞,

if it does not, we let ±∞ serve as the quiet NaN indicator. It is the dumping ground for all that goes wrong, and we can consider it to be “rounding to infinity.” Remember, even very large numbers never round to infinity with posits, so getting ±∞ means you did one of the things your math teachers told you never to do, like dividing by zero. This policy is useful in situations, say, where an array of numbers needs to be processed to completion and the cases that throw an exception can be discarded. Sometimes input data fails to come in, so a value is completely unknown, but we don’t want to halt computation because there was an occasional dropout. That’s when you need a quiet NaN.

Some may object, "But the exception for the square root of a negative number is different, and I need a different behavior for that!" Think about what this means. A program occasionally tries to compute a real number as the square root of a negative value. Is this something the hardware should try to handle gracefully? Of course not. It's a bug. The programmer needs to stop that situation from happening in the first place. The offending line of code

y = Sqrt[x]

needs to be replaced with something like

y = If[x ≥ 0, (*then*) Sqrt[x], (*else*) reportsquarerooterror[x]]

or it may be that the definition of the sqrt(x) function is to produce an error message and halt, in a particular language and computing environment. If the handling of errors is entrusted to an interrupt that invokes the operating system, that means there will be different behavior on different systems. The next objection is usually “But conditional statements will slow down my code!” The answer, similarly, is to ask whether hardware is supposed to handle programming bugs at full speed, or whether the programmer should figure out the places where exceptions can happen, and protect just those places with explicit instructions about what to do. Obviously, the latter. Asking hardware to always be on the lookout for errors and to produce detailed info about what happened is like turning on debug mode when compiling a program, and then demanding that debug mode runs at full speed so that production codes can always be in debug mode with no performance penalty. Experienced programmers will recognize the foolishness of such a demand.


Valid mode has far more informative ways of representing the results of operations as sets (including the empty set), and can continue computing even when mathematicians would pronounce a result indeterminate. Posits are designed for speed, simplicity, and economy, which suits the many computer users (such as those who play video games, or who use numerical methods for which floats appear to be good enough in practice) for whom speed is paramount. For those concerned with rigorous computing that puts bounds on results and tracks any loss of accuracy, or those still working out the numerical behavior of an algorithm to make sure it behaves, valid mode is the answer.

The current IEEE 754 definition attempts a mixture of the two esthetics, speed and validity, which means it achieves neither. “Almost true” is mathematically the same as “false.”

For example, the variety of rounding modes is touted as a technique for checking the sensitivity of an algorithm to rounding error, but that technique still provides no guarantees of validity and it adds complexity to the circuitry. Similarly, "negative zero" is sometimes treated as some kind of indicator of a negative infinitesimal, and at other times it is simply zero, creating more complexity and gate delays for hardware designers, but infinitesimally little value to their customers.

2 Specifics: Converting a posit bit string to a value

2.1 Setting the posit environment

We defined the computing environment with esizesize and fsizesize in Type I unums. Here, instead, we define nbits and es as the global integer values we need for establishing the meaning of a posit. The value nbits is the total number of bits in the number and can be any integer 2 or greater; es, the number of exponent bits, can be any integer 0 or greater. The nbits and es values can be chosen independently of each other; it may seem strange, but you can have, say, a 4-bit posit with a maximum of 5 exponent bits. It does not break the math.

Similarly to Type I unums, the setpositenv routine uses nbits and es to compute values that characterize the posit environment. The number of possible bit patterns, npat, is 2^nbits. The value 2^(2^es) is called useed because it arises so often. Besides computing npat and useed, setpositenv finds the minimum and maximum positive numbers, minpos and maxpos. Those values determine the dynamic range, consistent with IEEE floats. The extremes are always exact reciprocals for posits: 1/minpos = maxpos; the dynamic range is perfectly balanced about 1.0, and every power of 2 has an exact reciprocal. (These properties do not hold for IEEE floats.)

It is also necessary to set the size of the quire register, qsize, and the number of extra bits it has for worst-case summations that produce carry bits, qextra. The quire register will be explained later in a section on Fused Operations.

setpositenv[{n_Integer /; n ≥ 2, e_Integer /; e ≥ 0}] := (
  {nbits, es} = {n, e};
  npat = 2^nbits;
  useed = 2^2^es;
  {minpos, maxpos} = {useed^(-nbits + 2), useed^(nbits - 2)};
  qsize = 2^Ceiling[Log[2, (nbits - 2) 2^(es + 2) + 5]];
  qextra = qsize - (nbits - 2) 2^(es + 2);)


For example, a 6-bit posit environment with a single exponent bit is established this way:

setpositenv[{6, 1}]

That command assigned these global variables:

{nbits, es, npat, useed, minpos, maxpos, qsize, qextra}

1 6, 1, 64, 4, , 256, 64, 32 256

Most computer architectures have a strong preference for powers of two as the bit sizes for variables, or at least multiples of 8-bit bytes. If it is not a performance burden for a computer to use, say, 56-bit posits instead of 64-bit floats, then the 56-bit posits can often provide more accurate answers than 64-bit floats. The savings in memory and bandwidth needed for the smaller numbers may save time (and energy and power) on bandwidth-limited computers, depending on how much they require alignment of fetches and stores on larger power-of-two boundaries. Users may discover that 64-bit floats are overkill for their application, but that they need more accuracy or dynamic range than a 32-bit float provides; in some of those cases, a 32-bit posit may suffice and produce a clear performance and storage advantage over 64-bit standard floats. Similarly, 16-bit posits can sometimes replace 32-bit floats, and we know of at least one application (neural network training) where 8-bit posits can replace 16-bit floats. The use of the quire register is a powerful way to create long sequences of arithmetic operations with only one rounding error at the end, and can even allow 16-bit posits to replace 64-bit floats in some situations.
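To make the tradeoff concrete, here is a small sketch (the nbits and es pairings are merely illustrative) that tabulates dynamic range in decades, using the ratio maxpos/minpos = useed^(2 nbits - 4) derived in the next paragraph:

drange[n_, e_] := N[(2 n - 4) 2^e Log10[2]];   (* decades spanned by useed^(2n-4) *)
{drange[16, 1], drange[32, 2], drange[64, 3]}  (* ≈ {16.9, 72.2, 298.6} *)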

Notice that the ratio of maxpos to minpos is useed^(2 nbits - 4), which determines the dynamic range of the posits. Recall that useed is 2^(2^es). The posit format uses regime bits to raise useed to the power of any integer from -(nbits - 2) to nbits - 2, so the dynamic range is an exponential of an exponential of an exponential. Hence, posits can create a larger dynamic range from fewer exponent bits than the system used for IEEE floats, leaving more fraction bits available to improve the accuracy of results. In other words, the es setting is powerful, so be careful with that thing. A value larger than 5 might tax your computer’s memory; simply printing the exact value of minpos with es = 5 requires several gigabytes.

2.2 Extracting the sign bit

A posit bit string, if interpreted as a 2’s complement integer, would range from -npat/2 to npat/2 - 1. Mathematica does not handle 2’s complement integers as a native type, so this explanation will have to make do with integers between 0 and npat - 1. The unsigned integer npat/2 is the posit bit string that represents ±∞. If you have signed integers, the posit bit string for ±∞ looks like -npat/2.

The positQ test returns True if an input integer (considered as a bit string) is a viable posit, and False otherwise.

positQ[p_Integer] := 0 ≤ p < npat

The following function will help deal with the lack of 2’s complement integers in Mathematica.

twoscomp[sign_, p_] := Mod[If[sign > 0, p, npat - p], npat]
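For example, in the 6-bit environment established below in Section 2.1's example, the function negates a bit pattern when the sign argument is 0, and leaves it alone when the sign argument is positive:

twoscomp[0, 2^^000001]  (* 63: the 6-bit pattern 111111, the negation of 000001 *)
twoscomp[1, 2^^000001]  (* 1: positive sign, so the pattern is unchanged *)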

Extracting the sign bit of a posit is easy: it’s the most significant bit, 0 for positive numbers (and zero) or 1 for negative numbers (and ±∞). The color-coding of bits, similar to that used for all types of unums, makes long binary strings easier to read. The sign bit is color-coded red (RGB components 1, 0.125, 0):

signbit[p_ /; positQ[p]] := IntegerDigits[p, 2, nbits]〚1〛

Test it at the extreme points of possible posit values, including an illegal one for the {6, 1} environment we just set:

signbit[1]
signbit[npat - 1]
signbit[npat]

0

1

signbit[64]

Notice that the positQ function protects us from evaluating the signbit of the out-of-range integer npat, since the posit bit strings range from 0 to npat - 1 (when viewed as unsigned integers). In a C environment, the test that an input p is a signed integer representing a valid posit would be to require -32 ≤ p < 32, not 0 ≤ p < 64. (When Mathematica is asked to perform an out-of-domain operation, it simply echoes the expression back to the user, as when it returns “signbit[64]” above.)

2.3 Extracting the regime bits

The regime bits are dictated by the length of the run of identical bits, either all 0s or all 1s. Identical regime bits are color-coded amber (RGB 0.8, 0.6, 0.2). A simple way to decode it is, if the posit has a sign bit of 1, negate the bit string for the posit as a 2's complement signed integer (which means flip all the bits and add 1). In assembly language, many of the complicated-looking expres- sions in this function are single opcodes that execute in one clock cycle on most processors, like “Find First One” or “Count Leading Zeros.” In the regimebits function we take the binary digits apart as a string, but real hardware (or a routine in a low-level language) could perform bit extrac- tion much more quickly and simply.


The run of bits terminates when we reach the end of the string, or when the opposite bit appears. The opposite bit that terminates a regime is color-coded brown (RGB 0.6, 0.4, 0.2).

regimebits[p_ /; positQ[p]] := Module[
  {q = twoscomp[1 - signbit[p], p], bits, bit2, npower, tempbits},
  bits = IntegerDigits[q, 2, nbits];
  bit2 = bits〚2〛; (* Look for the run length after the sign bit. *)
  tempbits = Join[Drop[bits, 1], {1 - bit2}];
  (* Drop the sign bit, but append a complement bit as a sure-fire way to end the run. *)
  npower = Position[tempbits, 1 - bit2, 1, 1]〚1, 1〛 - 1; (* Find the first opposite bit. *)
  Take[bits, {2, Min[npower + 1, nbits]}]]

The interpretation of the regime bits is a lot like clothing sizes; L is Large, XL is Extra Large, XXL is Extra Extra Large, and so on. Count the number of X characters. The function regimevalue uses a similar principle, and it’s a one-liner function. It could have been the direct output of regimebits, but for clarity, we calculate the integer represented by the regime bits in two stages.

regimevalue[bits_] := If[bits〚1〛 ⩵ 1, Length[bits] - 1, -Length[bits]]

Here are two examples. Notice that for runs of 1 bits, the regime value is one less than the number of bits in the run, since we need to be able to represent a value of zero. Again, hardware would have no trouble doing this kind of thing very quickly; it looks a lot like 2’s complement logic.

First, a small positive number (remember, we are still in a 6-bit posit environment):

regimebits[2^^000011]
regimevalue[%]

{0, 0, 0}

-3

Next, try a negative number, so that it gets negated 2’s complement style, and the run of 0 bits turns into a run of 1 bits. Note: the 2’s complement of 100001 is 011110 + 1 = 011111; that is, flip all the bits and add 1.

regimebits[2^^100001]
regimevalue[%]

{1, 1, 1, 1, 1}

4

2.4 Extracting the exponent bits

The next field is the exponent bits. Even if es is greater than zero, there might not be any exponent bits, because the regime bits can crowd some or all of them off the right end of the number! For example, 011111 has no exponent bits, and neither does 011110. Exponent bits start to appear for shorter runs of regime bits, like 011100 and 011101.


In the following code, the run of regime bits is terminated by the opposite bit, so we skip over that termination bit and look at the next es bits or however many are left, whichever is smaller. The result can be anything from the empty set, { }, to a sequence of bits of length es.

exponentbits[p_ /; positQ[p]] := Module[
  {q = twoscomp[1 - signbit[p], p], bits, startbit},
  startbit = Length[regimebits[q]] + 3;
  bits = IntegerDigits[q, 2, nbits];
  If[startbit > nbits, {},
    Take[bits, {startbit, Min[startbit + es - 1, nbits]}]]]

There is no “bias” in these exponent bit strings as there is with floats, so the power of 2 they express is simply an unsigned integer. Exponent bits are color-coded blue (RGB values 0.25, 0.5, 1). If the exponent bits are 110 = 6, that means 2^6 is the scale factor contributed by the exponent bits. If there are no exponent bits, then the exponent is zero and the scale factor contributed by the exponent bits is 2^0 = 1.

2.5 Extracting the fraction bits

Once we have the fraction bits, we have all four parts of the posit bit string and can directly compute the value it represents. The fraction bits are all the bits to the right of the exponent, if they haven’t been crowded off the right end by the other bits. (If they have, the value of the fraction is 1. In other words, the hidden bit is always 1, even if there is no place to hide it!) The fractionbits function differs from exponentbits only in using a value for startbit that is es bits farther to the right, and in taking all remaining bits instead of stopping at a set maximum number. This is how posits have variable accuracy depending on the scale factor. If the scale factor is small, there will be plenty of bits to express a highly accurate fraction. The bigger or smaller the magnitude of a posit, the fewer bits remain for accuracy.

fractionbits[p_ /; positQ[p]] := Module[
  {q = twoscomp[1 - signbit[p], p], bits, startbit},
  startbit = Length[regimebits[q]] + 3 + es;
  bits = IntegerDigits[q, 2, nbits];
  If[startbit > nbits, {}, Take[bits, {startbit, nbits}]]]

The color-coding of the fraction bits is simply to leave them black.
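Putting the four field extractors together on a sample posit in the current {6, 1} environment (the pattern 001101, which the ring plots of Section 3 show represents 5/8):

{signbit[2^^001101], regimebits[2^^001101],
 exponentbits[2^^001101], fractionbits[2^^001101]}
(* {0, {0}, {1}, {0, 1}}: + useed^-1 × 2^1 × (1 + 1/4) = 5/8 *)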

2.6 Assembling the pieces: the p2x function

Following the naming conventions of the Type I unum prototype environment, we call the function that changes a posit bit string p into a mathematical value x the p2x function. It puts the pieces together as follows. The posit value x represented by the signed integer p is

x = 0, if p = 0;
x = ±∞, if p = -npat/2;
x = (-1)^s × useed^k × 2^e × f, for all other p,

where s is the sign bit, k is the integer represented by the regime bits, e is the integer represented by the exponent bits, and f is the fraction (including the hidden bit, which is always 1). The following routine extracts s, k, e, and f, and then applies the above formula.


p2x[p_ /; positQ[p]] := Module[
  {s = (-1)^signbit[p],
   k = regimevalue[regimebits[p]],
   e = exponentbits[p],
   f = fractionbits[p]},
  e = Join[e, Table[0, es - Length[e]]]; (* Pad with 0s on the right if they are clipped off. *)
  e = FromDigits[e, 2];
  If[f ⩵ {}, f = 1, f = 1 + FromDigits[f, 2] 2^-Length[f]];
  Which[
    p ⩵ 0, 0,
    p ⩵ npat/2, ComplexInfinity, (* The two exception values, 0 and ±∞ *)
    True, s useed^k 2^e f]]

(Since Mathematica doesn’t support 2’s complement integers, the test for ±∞ is if p is npat/2 instead of –npat /2. And we use ComplexInfinity as Mathematica’s equivalent for ±∞.)
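As a check, p2x reproduces the 16-bit example that was decoded by hand in Section 1.2:

setpositenv[{16, 3}];
p2x[2^^0000110111011101]  (* 477/134217728, that is, 477 × 2^-27 *)
setpositenv[{6, 1}];      (* restore the environment used in this section *)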

The “colorcodep” function makes the posit binary strings easier to read by first showing the original bits, and then coloring the bit fields for either that number or its 2’s complement if it is on the left half of the projective real circle.

colorcodep[p_ /; positQ[p]] := Module[
  {s = signbit[p], r = regimebits[p], e = exponentbits[p], f = fractionbits[p]},
  Row[{IntegerString[p, 2, nbits], "→",
    Style[If[s ⩵ 0, "+", "-"], RGBColor[1, 0.125, 0]],                          (* sign: red *)
    Style[Row[r], RGBColor[0.8, 0.6, 0.2]],                                     (* regime: amber *)
    If[Length[r] ≤ nbits - 2, Style[1 - r〚1〛, RGBColor[0.6, 0.4, 0.2]], ""],    (* terminator: brown *)
    If[Length[e] > 0, Style[Row[e], RGBColor[0.25, 0.5, 1]], ""],               (* exponent: blue *)
    If[Length[f] > 0, Row[f], ""]}]]                                            (* fraction: black *)

We need at least two exponent bits to demonstrate how the exponent bits can be crowded out by the regime bits, so we will change the environment to {6, 2}, and create a table of every possible bit string and what it means. The table starts with the bit string representing the smallest signed integer (100000 is –32), counts up to zero, and keeps going to the largest signed integer (011111 is +31):

setpositenv[{6, 2}]; Table[Row[{colorcodep[u + j], " ", p2x[u + j], " "}], {u, 0, 31}, {j, 32, 0, -32}] // TableForm

100000→-00000  ComplexInfinity   000000→+00000  0
100001→-11111  -65536            000001→+00001  1/65536
100010→-11110  -4096             000010→+00010  1/4096
100011→-11101  -1024             000011→+00011  1/1024
100100→-11100  -256              000100→+00100  1/256
100101→-11011  -128              000101→+00101  1/128
100110→-11010  -64               000110→+00110  1/64
100111→-11001  -32               000111→+00111  1/32
101000→-11000  -16               001000→+01000  1/16
101001→-10111  -12               001001→+01001  3/32
101010→-10110  -8                001010→+01010  1/8
101011→-10101  -6                001011→+01011  3/16
101100→-10100  -4                001100→+01100  1/4
101101→-10011  -3                001101→+01101  3/8
101110→-10010  -2                001110→+01110  1/2
101111→-10001  -3/2              001111→+01111  3/4
110000→-10000  -1                010000→+10000  1
110001→-01111  -3/4              010001→+10001  3/2
110010→-01110  -1/2              010010→+10010  2
110011→-01101  -3/8              010011→+10011  3
110100→-01100  -1/4              010100→+10100  4
110101→-01011  -3/16             010101→+10101  6
110110→-01010  -1/8              010110→+10110  8
110111→-01001  -3/32             010111→+10111  12
111000→-01000  -1/16             011000→+11000  16
111001→-00111  -1/32             011001→+11001  32
111010→-00110  -1/64             011010→+11010  64
111011→-00101  -1/128            011011→+11011  128
111100→-00100  -1/256            011100→+11100  256
111101→-00011  -1/1024           011101→+11101  1024
111110→-00010  -1/4096           011110→+11110  4096
111111→-00001  -1/65536          011111→+11111  65536


3 Visualizing the Projective Reals

3.1 The smallest posit size with a useed

Unlike the real number line, the projective reals wrap the line onto a circle so that negative and positive infinity meet at the top.

Sometimes the way things were invented is not the easiest way to understand them, which is why we first showed the bit fields in Section 2, and only now show their geometrical derivation. Type III posit arithmetic is derived from Type II unums that also map binary integers to the projective reals, but in Type III we relax the requirement that all values have an exact reciprocal. Both Type II and Type III unums start with this two-bit template:

[Ring plot: the two-bit template. Counterclockwise from the bottom of the circle, the bit strings 00, 01, 10, 11 map to the values 0, 1, ±∞, and -1 on the projective reals.]

The 2’s complement signed integers around the outside of the ring wrap from positive to negative at exactly the same point the real numbers do. This eliminates “negative zero”; it unfortunately takes away –∞ and +∞ as distinct quantities, but the workarounds for that are much simpler than having to deal with two forms of zero that are sometimes considered equal and sometimes treated differently, which is what IEEE floats do.

The above ring is only two bits, yet it represents a number system. We move to three bits by inserting a value between 1 and ±∞. It could be any real number greater than 1; it could be 2, or 10, or π, or 1.00003, or 10^100. The choice “seeds” the way the rest of the ring of unums gets populated, so we call it the useed. We follow Type II rules and make sure negation reflects about the vertical axis and reciprocation reflects about the horizontal axis.

[Ring plot: the three-bit ring. Counterclockwise from the bottom: 000 ↦ 0, 001 ↦ 1/useed, 010 ↦ 1, 011 ↦ useed, 100 ↦ ±∞, 101 ↦ -useed, 110 ↦ -1, 111 ↦ -1/useed.]

For the next step, what should we put between useed and ±∞? Well, useed^2 certainly works. We then could get an elegant new symmetry by putting useed^(1/2) = √useed between 1 and useed, especially if √useed is an integer that fits computer hardware nicely. If useed is an integer, we can repeat the process.

The useed values that are computer hardware-friendly are 2, 2^2 = 4, (2^2)^2 = 16, ((2^2)^2)^2 = 256, etc., repeatedly squaring so that we can repeatedly take the square root and get back to 2. The number series is 2, 4, 16, 256, 65536, …. Between 1 and 2 we can populate the ring linearly, that is, {1, 1.5, 2} if we have one more bit, or {1, 1.25, 1.5, 1.75, 2} if we have two more bits, and so on, exactly the way a floating-point fraction works. The number of times you square, starting with the value 2, to get useed is es. This is why useed = 2^(2^es).
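In Mathematica, that hardware-friendly series is just repeated squaring:

NestList[#^2 &, 2, 4]  (* {2, 4, 16, 256, 65536} *)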

The ringplot routine shows posits representing projective real numbers arranged on a ring, with their mathematical form on the inside and the color-coded binary form on the outside. It uses whatever nbits and es environment settings are in place. We switch to rotated text so we can cram more values into the space, even though it makes it slightly peculiar to read. Try it for nbits = 3 and es = 1:

setpositenv[{3, 1}]; ringplot

[Ring plot for nbits = 3, es = 1 (useed = 4). Counterclockwise from the bottom: 000 ↦ 0, 001 ↦ 1/4, 010 ↦ 1, 011 ↦ 4, 100 ↦ ±∞, 101 ↦ -4, 110 ↦ -1, 111 ↦ -1/4.]

If the ring were the points of a compass, +1 is always east, –1 is west, 0 is south, and ±∞ is north. The value of useed is always at the northeast point of the compass. Reflecting about the vertical axis always gives the negative of every number. Reflecting about the horizontal axis gives the exact reciprocal for 0, ±∞, and all powers of two; it gives the approximate reciprocal for values between powers of two; the value represented by the horizontal reflection of a posit value x ranges from 1/x to 1.125/x in all cases.

These unums have a recursive definition. In appending a bit by increasing nbits, appending 0 does not change the value; appending 1 creates a new point on the circle midway between existing values, with the following rules for what value it represents:

◼ If it is next to the bottom of the circle, between 0 and ±2^j, the new value is smaller than ±2^j by a ratio of useed. (The added bit is a regime bit.)

◼ If it is next to the top of the circle, between ±2^j and ±∞, the new value is larger than ±2^j by a ratio of useed. (The added bit is a regime bit.)

◼ If it is between two values of magnitude 2^i and 2^j where the integers i and j differ by more than 1, the new value has magnitude 2^((i+j)/2). (The added bit is an exponent bit.)

◼ If it is between any other adjacent points x and y, it represents (x + y)/2. (The added bit is a fraction bit.)

3.2 Introducing exponent bits

The system for adding new points on the ring is simpler than it sounds. Check the color coding for a 4-bit posit; the exponent bits start to show up in the east and the west parts of the ring:

setpositenv[{4, 1}]; ringplot

[Ring plot for nbits = 4, es = 1. The positive values, counterclockwise from the bottom, are 0, 1/16, 1/4, 1/2, 1, 2, 4, 16, ±∞; the negative values mirror them on the western half. Exponent bits appear in the east and west regions.]

Notice that every value in the previous ring plot stays in place, with a 0 appended to its representa- tion. The new values are between the previous ring plot values, and end in a 1 bit. This is one of the beautiful properties of these numbers; they stay put when increasing precision by appending a 0 bit, and the in-between points created by appending a 1 bit automatically increase both dynamic range (top and bottom) and accuracy (left and right).
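That append-a-0 property is easy to verify with the Section 2 functions (a sketch comparing every 4-bit posit against its 5-bit extension):

setpositenv[{4, 1}]; v4 = Table[p2x[p], {p, 0, npat - 1}];
setpositenv[{5, 1}]; v5 = Table[p2x[p], {p, 0, npat - 1, 2}];  (* patterns ending in 0 *)
v4 === v5  (* True: appending a 0 bit leaves every value in place *)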

Another thing you can notice is that the first two bits always determine the quadrant of the circle where the number resides. As with Type II unums, think of them as the sign-reciprocal bits. The first two bits are

00 in the southeast: 0 ≤ x < 1,
01 in the northeast: 1 ≤ x < ∞,
10 in the northwest: -∞ < x < -1,
11 in the southwest: -1 ≤ x < 0.

That also might help explain how hardware can decode the regime bits without taking the 2’s complement of a number first.

3.3 Fraction bits appear

The next step takes us to five bits, and then the complete pattern emerges. At the top and bottom of the circle, adding a bit always extends the dynamic range by a factor of useed larger and smaller. At the right and the left sides of the circle, adding a bit now always adds one bit of precision to the fraction. With five bits, we now see fraction bits appear in the eastern and western parts of the projective real circle.

setpositenv[{5, 1}]; ringplot

[Ring plot for nbits = 5, es = 1. The positive values, counterclockwise from the bottom, are 0, 1/64, 1/16, 1/8, 1/4, 3/8, 1/2, 3/4, 1, 3/2, 2, 3, 4, 8, 16, 64, ±∞; the negative values mirror them on the western half. Fraction bits now appear in the east and west regions.]

If you increase the number of bits in an IEEE float, you have to decide how many bits to increase in the exponent and how many in the fraction. Changing the number of IEEE exponent bits also means changing the bias in the exponent, so in general it is a complicated thing to convert from one IEEE precision to another. That is also true for Type I unums. With Type III posits, it is trivial: you just add bits to the right. Adding bits to the end of a posit increases dynamic range at the north and south parts of the ring, and accuracy at the east and west parts of the ring. This suggests that a calculation can very easily change the environment settings as it progresses.

In the above plot, notice that numbers ending with fraction bits do not have perfect reciprocals in their reflected quantity. Close, but not quite. The reflection of 3/2 is 3/4, not 2/3. The reflection of 3 is 3/8, not 1/3. The reflected posit value is still an excellent starting point for an iterative method for computing the reciprocal. Relaxing the requirement that reciprocals be perfect for every number is what makes posits simple and hardware-friendly, by restricting all finite numbers to the form 2^m × n where m and n are integers.

3.4 A six-bit ringplot

If we attempt 6 bits, the ringplot is still readable.

setpositenv[{6, 1}]; ringplot

[Ring plot for nbits = 6, es = 1. The positive values, counterclockwise from the bottom, run 0, 1/256, 1/64, 1/32, 1/16, 3/32, 1/8, 3/16, 1/4, 5/16, 3/8, 7/16, 1/2, 5/8, 3/4, 7/8, 1, 5/4, 3/2, 7/4, 2, 5/2, 3, 7/2, 4, 6, 8, 12, 16, 32, 64, 256, ±∞; the negative values mirror them on the western half.]

3.5 A seven-bit ringplot shows a million-to-one dynamic range

Pushing the graphics to the point where the values almost overlap but are still legible, a 7-bit posit with a 1-bit exponent field can have a dynamic range of six orders of magnitude, yet also have more than one decimal of relative accuracy for points near ±1.

setpositenv[{7, 1}]; ringplot

[Ringplot of the 128 seven-bit posits in the {7, 1} environment, from minpos = 1/1024 up to maxpos = 1024, with three fraction bits for values between 1/4 and 4.]

Perhaps this is a good place to point out how rounding is subtly different from what it is for floats. If the rounded bit is a fraction bit, it’s exactly the same as for floats. Round to the nearest value, and if there’s a tie, pick the number ending in a 0 bit. But suppose a calculation landed between 64 and 256; what is the midpoint where we decide if we round up or down? Notice that the rounded bit there is an exponent bit, so you are rounding to the nearest exponent. For the hardware, it’s exactly the same algorithm as when rounding fraction bits. The midpoint between 64 and 256, if we had one more bit of precision, is 128. Numbers between 64 and 128 round to 64; numbers between 128 and 256 round to 256; the exact value 128 is a tie, and 64 is the posit ending in a 0 bit, so 128 rounds to 64.
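That behavior is easy to check (a sketch assuming this document's x2p and p2x conversion functions) in the {6, 1} environment, where 64 and 256 are adjacent posits:

setpositenv[{6, 1}];
{p2x[x2p[127]], p2x[x2p[128]], p2x[x2p[129]]}    (* returns {64, 64, 256} *)

The tie value 128 goes to 64, the posit whose bit pattern ends in a 0 bit.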

The rounding rules are therefore consistent, except for results near the exception values 0 and ±∞. Never round to those extremes; use -minpos, minpos, -maxpos, or maxpos instead. At the very least, that preserves the correct sign of the answer.
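The same machinery shows the saturation behavior; nothing "rounds" to 0 or ±∞ (again a sketch using this document's x2p and p2x, in the {6, 1} environment):

setpositenv[{6, 1}];
{p2x[x2p[10^6]], p2x[x2p[10^-6]]}    (* returns {256, 1/256}: maxpos and minpos *)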

3.6 Sneak Preview: Valid Arithmetic

One reason for carrying ringplots all the way to seven bits is that the one above forms an excellent basis for 16-bit valids. As a preview of the world of powerful, guess-free arithmetic: valids are pairs of posits that each have an uncertainty bit, or ubit, appended to the fraction, forming a tile. (Ubits are color-coded in magenta, RGB 1, 0, 1.) The ubit is 0 for all the values shown above, and 1 for all the open intervals between the values shown above. A possible 16-bit valid format is a pair of 7-bit posits with ubits at the end of each one, like this:

              sign   regime   exponent    fraction   ubit
Lower tile:    s     r ⋯ r    e1 ⋯ e_es   f1 ⋯       u
Upper tile:    s     r ⋯ r    e1 ⋯ e_es   f1 ⋯       u

(Again we use a general es value for the purpose of illustration. The es value for such a small number of bits would almost certainly be 0. In the ringplot above, es is 1.)

Each tile can be any value shown in the above ring of 7-bit posits, followed by a 0 ubit or a 1 ubit. Conventional interval arithmetic uses closed intervals [a, b], where a and b are floats and a ≤ b. In the case of valids, the endpoints need not be ordered. The lower tile says where to start on the ring; the interval represented includes all tiles counterclockwise from the lower tile until you reach the upper tile, which is also included in the set; it wraps around the circle, crossing ±∞ if necessary! This allows representation of closed, open, and half-open intervals, and because of the use of the projective reals, contiguous intervals remain contiguous under addition, subtraction, multiplication, and division. For example, the half-open interval (-1/4, 1/16] in the {8, 1} environment of the above ringplot is represented by the pair of tiles

111000111 → -00111011, representing the tile (-1/4, -7/32), and
001000000 → +00100000, representing the tile 1/16.

That’s a compact way to say the set of all tiles between the two points:

(-1/4, -7/32), -7/32, (-7/32, -3/16), -3/16, …, 3/64, (3/64, 1/16), 1/16.

To reiterate: posits are for situations where floats are good enough, and the algorithms have been shown reliable enough to satisfy user requirements. In contrast, valids are for when you need a provable bound on the answer; or when you are still developing an algorithm and debugging its numerical behavior; or when you want to describe sets of real numbers and not just point values. The algorithms for valids are often quite different from the ones for floats. Valids can express a rich set of exception conditions that are useful for debugging, such as the empty set or the entire set of real numbers, and they can trigger a halt when an interval has gotten too wide (as set by the user) to be useful. Details about the use of valids are discussed elsewhere, but for now the interested reader is referred to The End of Error: Unum Arithmetic and its discussion of "ubounds."

3.7 Visualizing the regime bits on the projective real ring

Now that we have shown how reals and integers map geometrically onto a ring, we can visualize the regime bits in particular by looking at the right half of the ring and showing how the powers of useed crowd together at the top and bottom of the ring:

[Figure: the right half of the projective real ring, divided into regime sectors. Descending clockwise from ±∞ at the top to 0 at the bottom, the regimes 11110⋯, 1110⋯, 110⋯, and 10⋯ cover the sectors around useed^3, useed^2, and useed^1 down through 1, and the regimes 01⋯, 001⋯, 0001⋯, and 00001⋯ cover the sectors from useed^-1 down through useed^-4.]

It takes some getting used to, but the regime-exponent pair describes a scale factor of a power of 2 the same way the exponent (minus the bias) does for IEEE Standard floats.
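In other words, if the regime bits encode the integer k and the exponent bits encode the integer e, the combined scale factor is useed^k × 2^e = 2^(k·2^es + e). As a one-line sketch (scalefactor is a hypothetical helper, not part of the posit environment; it assumes es has been set by setpositenv):

scalefactor[k_Integer, e_Integer] := With[{u = 2^2^es}, u^k 2^e]
setpositenv[{6, 1}]; scalefactor[3, 1]    (* 4^3 × 2 = 128 = 2^(3·2 + 1) *)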

If es is 1 or more, we could put useed^0.5 between 1 and useed, and useed^1.5 between useed and useed^2, and so on. If es is 2 or more, we get the equally spaced values useed^0.25, useed^0.5, and useed^0.75 between 1 and useed. In other words, the exponent bits are the fraction bits of the power of useed represented by the regime bits.

4 Converting values into posits

4.1 Building the x2p conversion function

The method for converting any real value into posit representation is very similar to the way you could convert a number of any type into a float. After checking for the exception cases (which for posits are simply 0 and ±∞), the number is divided by two or multiplied by two until it is in the interval [1, 2), which then determines the fraction. In the case of posits, the useed is a kind of "batched" form of powers of two, so we first divide by useed or multiply by useed until the value is in the interval [1, useed); that determines the regime. Then the value is repeatedly divided by 2 until it is in the interval [1, 2), and that determines the exponent. The exponent will always be nonnegative, eliminating the need for a bias. The fraction always has a leading 1 bit to the left of the binary point, eliminating the need to handle subnormal exception values that have a 0 bit to the left of the binary point.

The approach can convert a fixed-point number, an integer, a float, or any other type that can be repeatedly multiplied or divided by two until it lands in the range [1, 2). In Mathematica, values are generic "numeric" values, and we need only add values of infinite magnitude to the set of allowed numeric types to guarantee that a value can be turned into a posit. The positableQ function returns True if its argument can be turned into a posit, and False otherwise.

positableQ[x_] := (Abs[x] ⩵ ∞ ∨ x ∈ Reals)

Finally, here is the function that turns a real number into its posit form.

x2p[x_ /; positableQ[x]] := Module[{i, p, e = 2^(es - 1), y = Abs[x]},
  Which[
   (* First, take care of the two exception values: *)
   y ⩵ 0, 0,                              (* all 0 bits *)
   y ⩵ ∞, BitShiftLeft[1, nbits - 1],     (* 1 followed by all 0 bits *)
   True,
   If[y ≥ 1,
    (* Northeast quadrant: *)
    p = 1; i = 2;
    (* Shift in 1s from the right and scale down. *)
    While[y ≥ useed ∧ i < nbits, {p, y, i} = {2 p + 1, y / useed, i + 1}];
    p = 2 p; i++,
    (* Else, southeast quadrant: *)
    p = 0; i = 1;
    (* Shift in 0s from the right and scale up. *)
    While[y < 1 ∧ i ≤ nbits, {y, i} = {y * useed, i + 1}];
    If[i ≥ nbits, p = 2; i = nbits + 1, p = 1; i++]];
   (* Extract exponent bits: *)
   While[e > 1/2 ∧ i ≤ nbits, p = 2 p; If[y ≥ 2^e, y /= 2^e; p++]; e /= 2; i++];
   y--;   (* Fraction bits; subtract the hidden bit *)
   While[y > 0 ∧ i ≤ nbits, y = 2 y; p = 2 p + ⌊y⌋; y -= ⌊y⌋; i++];
   p *= 2^(nbits + 1 - i); i++;
   (* Round to nearest; tie goes to even *)
   i = BitAnd[p, 1]; p = ⌊p / 2⌋;
   p = Which[
     i ⩵ 0, p,                           (* closer to lower value *)
     y ⩵ 1 ∨ y ⩵ 0, p + BitAnd[p, 1],    (* tie goes to nearest even *)
     True, p + 1];                       (* closer to upper value *)
   Mod[If[x < 0, npat - p, p], npat]     (* simulate 2's complement *)
   ]]

The x2p function is mostly written with a vocabulary designed to translate easily into low-level operations in C, or into chip-level circuitry. When you see a line like "p = 2 p + 1", that would be best expressed as p = (p << 1) | 1 in C; in Mathematica, p = BitOr[BitShiftLeft[p, 1], 1] seems too verbose, so we write it with arithmetic that accomplishes the intended bit logic.

4.2 The sigmoid function

Here are some visualizations of the x2p function that also help test its correctness. First, a plot of the two's complement integer produced as the argument ranges from -maxpos to maxpos:

[Plot: the two's complement integer produced by x2p (vertical axis, about -30 to 30) as the argument sweeps from -maxpos to maxpos; the staircase traces an S-shaped curve.]

As Isaac Yonemoto has observed, this plot is similar to the kind of "sigmoid" function that machine learning requires for deep learning neural networks; it simply needs to be scaled and shifted to range from 0 to 1. That mapping is trivially achieved by flipping the first bit of the posit and shifting it right two places, requiring perhaps four transistors of circuitry and easily fitting into a single clock cycle of a processor. The machine learning community presently has two options: it can calculate exponentials like 1/(ⅇ^-x + 1) for this, requiring quite a few clock cycles:

[Plot: the logistic function 1/(ⅇ^-x + 1) for x from -20 to 20, rising from 0 to 1.]

…or, it can use a piecewise linear approximation, which can lead to problems with the convergence of the training.

Here is the one-cycle, low-precision posit function plotted in magenta, and the 1/(ⅇ^-x + 1) function in green. The slopes match at x = 0.
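That hardware trick can be prototyped directly in this document's framework (a sketch assuming the setpositenv, x2p, and p2x functions, using an es = 0 environment; fastsigmoid is a hypothetical name, not part of the posit environment):

setpositenv[{8, 0}];
fastsigmoid[x_] := p2x[BitShiftRight[BitXor[x2p[x], BitShiftLeft[1, nbits - 1]], 2]]
{fastsigmoid[-maxpos], fastsigmoid[0], fastsigmoid[maxpos]}    (* {0, 1/2, 63/64} *)

Flipping the sign bit turns the 2's complement ordering into a monotone unsigned ordering, and the two-place shift re-reads the result as a posit between 0 and 1.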

4.3 Restricting to the posit vocabulary with the overbar operator

The easiest way to prototype what a posit environment will do is to let Mathematica perform the operations, but always restrict the numerical vocabulary to what the posit environment supports. We can define the overbar function to do the restriction for us: convert the real to the closest posit, then convert that back to a real. We also make it a function that can work on a list of numbers, not just a single number, which will save a lot of code when we do linear algebra.

OverBar[x_] := p2x[x2p[x]]; SetAttributes[OverBar, Listable]

For example, here is how we can find the nearest posit value of π in a {10, 1} environment:

setpositenv[{10, 1}]
OverBar[π]
N[%]

101/32
3.15625

Think of Mathematica as the "g-layer," where there are no rounding errors, and the overbar operator as what brings us back to the "u-layer," with unum representation of limited accuracy. This system will prove quite convenient as a rapid prototype of an actual posit computing environment.

Since x2p and p2x are approximately inverse functions, we can compare plots of y = x̄ with y = x and get a stair-step function that criss-crosses the y = x line. It looks pretty good on a linear plot from zero to one:

[Plot: y = x̄ (stair-step) against y = x (straight line) for x from 0 to 1.]

Test the range around -minpos to minpos, to make sure that exceptions near zero are handled correctly. Notice that there is no “round to zero” for small numbers, or underflow:

[Plot: y = x̄ against y = x for x from -0.06 to 0.06, showing that small nonzero values never round to zero.]

For the range from -maxpos to maxpos, a log-log plot is used to make the graph easier to read. Let's actually go from -2 maxpos to 2 maxpos, to check that values never "round" to infinity (overflow).

[Log-log plot: y = x̄ against y = x from -2 maxpos to 2 maxpos; values beyond ±maxpos round to ±maxpos, never to ±∞.]

5 Creating an IEEE 754 float environment

5.1 The two parameters that specify a float environment

We can do something very similar to set up an IEEE-type float environment, the way we set up the posit environment. However, unlike IEEE floats, we allow very flexible sizes for the total number of bits, nfbits, and the number of bits in just the exponent, esize. The number of bits in the fraction, fsize, is then determined, since the sign bit, exponent bits, and fraction bits must total nfbits. The values we need for easy number conversion are things like the bias in the exponent, the smallest subnormal float, the smallest normalized float, the largest finite float, and the largest finite value that rounds down to the largest finite float instead of overflowing to infinity. (The smallest value that rounds up to the smallest subnormal is half the smallest subnormal; smaller values underflow to zero.)

setfloatenv[{n_Integer /; n ≥ 4, e_Integer /; e ≥ 2}] := (
  {nfbits, esize, fsize} = {n, e, n - e - 1};
  bias = 2^(esize - 1) - 1;
  smallsubnormal = 2^(1 - bias - fsize);
  smallnormal = 2^(1 - bias);
  maxfloat = 2^bias (1 + (2^fsize - 1) / 2^fsize);
  minroundable = smallsubnormal / 2;
  maxroundable = 2^bias (1 + (2^fsize - 1/2) / 2^fsize))


For example, set an environment for 6 bits per number and 2 exponent bits:

setfloatenv[{6, 2}]

Here are the values set by that environment setting:

{nfbits, esize, fsize, bias, smallsubnormal, smallnormal, maxfloat, minroundable, maxroundable}

{6, 2, 3, 1, 1/8, 1, 15/4, 1/16, 31/8}

Notice that we can independently specify a posit environment and a float environment; the variable names do not overlap. This allows us to do side-by-side benchmarking of various algorithms with floats and with posits.

When floats have only one exponent bit, they act a lot like fixed-point numbers, because all the finite values are subnormal. That creates difficulties with the IEEE rules; for example, maxfloat becomes smaller than smallnormal, but the bit pattern for smallnormal is the one IEEE says must be used to represent infinity! The easiest fix to such craziness is simply to require that esize be 2 or greater. We also need the sign bit, and at least one fraction bit (otherwise there is no distinction between infinity values and NaN values). So the smallest possible value for nfbits is 4. Here are the values represented by the 16 possible bit patterns for nfbits = 4, just for fun:

0000     0      “Positive zero”
0001     1/2    smallsubnormal
0010     1      smallnormal
0011     3/2
0100     2
0101     3      maxfloat
0110     ∞      Infinity
0111     NaN    Quiet NaN
1000     0      “Negative zero”
1001    -1/2    -smallsubnormal
1010    -1      -smallnormal
1011    -3/2
1100    -2
1101    -3      -maxfloat
1110    -∞      Minus infinity
1111     NaN    Signaling NaN
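The table above can be reproduced with the environment machinery (a sketch that assumes the f2x function defined in the next subsection):

setfloatenv[{4, 2}];
Table[{IntegerString[i, 2, 4], f2x[i]}, {i, 0, 15}]
setfloatenv[{6, 2}];    (* restore the environment used below *)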

5.2 The function that converts a float to its numerical value: f2x

Now we need the equivalent of p2x, which we can call f2x: it takes a bit string representing a float (the first column in the above table) and returns its mathematical value (the second column). The routine is more complicated than for posits because of the five exception cases, which is one reason floats consume more circuitry than posits:


Float value =
  e = all 1 bits:   (-1)^s × ∞ if f = 0;  NaN of some kind if f ≠ 0
  e = all 0 bits:   "positive zero" if f = 0, s = 0;  "negative zero" if f = 0, s = 1;
                    (-1)^s × 2^(1 - bias) × f if f ≠ 0
  all other e:      (-1)^s × 2^(e - bias) × (1 + f)

where s is the sign bit, e is the unsigned integer represented by the exponent bits, and f is the fraction, not including the hidden bit. The following routine extracts the bit fields and then applies the above formula.

We cannot return "positive zero" and "negative zero" because they are not mathematical quantities. By IEEE rules, their reciprocals are positive infinity and negative infinity, respectively, yet they are supposed to test as equal to each other! Other quirky personality traits have been assigned to the two kinds of zero, such as requiring that the square root of "negative zero" be "negative zero." It is little wonder that the handling of these two non-mathematical quantities is such a rich source of hardware design errors for the floating-point processing unit (FPU) in a processor. Here, we just assign zero to both cases (the mathematical kind of zero), but later on when figuring out closure plots, we again have to accommodate IEEE rules for the two concepts that are… zero. Sort of.

f2xf_Integer /; 0 ≤ f < 2nfbits := Module{emask = FromDigits[Table[1, esize], 2], exp, fmask = FromDigits[Table[1, fsize], 2], frac, sgn, smask = BitShiftLeft[1, nfbits - 1]}, sgn = BitShiftRight[BitAnd[smask, f], nfbits - 1]; exp = BitAnd[BitShiftRight[f, fsize], emask]; frac = BitAnd[fmask, f]; Which exp ⩵ emask, If[frac ⩵ 0, (-1)sgn × ∞, Indeterminate], exp ⩵ 0, (-1)sgn × 21-bias × frac  2fsize, True, (-1)sgn × 2exp-bias × 1 + frac  2fsize

5.3 Converting numbers into floats

We're almost done; the final step is to create x2f, so that we have a way of converting a value (possibly NaN or infinite) into the IEEE-style float. A "floatable" value is a real number, a signed infinity, or a NaN (which Mathematica calls Indeterminate). Here's the test for that:

floatableQ[x_] := (x === Indeterminate ∨ x === ∞ ∨ x === -∞ ∨ x ∈ Reals)

We can use that to guard the conversion function from trying to do something it shouldn’t attempt.

x2f[x_ /; floatableQ[x]] := Module[{e = 0, f, sgn, y},
  Which[
   x === Indeterminate ∨ ¬ floatableQ[x],
   FromDigits[Table[1, nfbits], 2],                    (* NaN exceptions *)
   (y = Abs[x]; sgn = BitShiftLeft[Boole[x < 0], nfbits - 1]; y ⩵ 0),
   0,                                                  (* zero exception *)
   y ≥ maxroundable,
   BitOr[sgn, BitShiftLeft[FromDigits[Table[1, esize], 2], fsize]],
                                                       (* infinity and overflow exceptions *)
   y < smallnormal,
   BitOr[sgn, Round[y / smallsubnormal]],              (* subnormal exceptions *)
   True,   (* else x is normal; find the exponent and fraction fields *)
   (* At most one of the next two While loops will execute. *)
   While[y ≥ 2, y /= 2; e++];
   While[y < 1, y *= 2; e--];   (* we now have 1 ≤ y < 2 *)
   f = Round[(y - 1) * 2^fsize];
   (* This may round up to 2^fsize, so we add f instead of ORing it. *)
   BitOr[sgn, BitShiftLeft[bias + e, fsize]] + f]]

It may become handy to have a colorcodef operator to see the bit fields of a float, just as we did with posits.

colorcodef[f_ /; floatableQ[f]] := Module[{fbits = IntegerDigits[f, 2, nfbits]},
  Row[{Style[fbits〚1〛, Red], " ",
       Style[Row[Take[fbits, {2, esize + 1}]], Blue], " ",
       Row[Take[fbits, -nfbits + esize + 1]]}]]
(* Red and Blue stand in for the color swatches shown in the original notebook. *)

We last left the setfloatenv setting at {6, 2}. Let’s test the color-coding of a float represented with 001101, which as a float in this tiny environment represents 1.625:

colorcodef[2^^001101]
N[f2x[2^^001101]]

001101
1.625

5.4 Restriction to a float vocabulary: the underbar operator

Just as we defined an overbar operator to restrict the numerical vocabulary to posits, we can define the underbar to restrict values to floats:


UnderBar[x_] := f2x[x2f[x]]; SetAttributes[UnderBar, Listable]

Again use π as an example, within the current {6, 2} float environment:

UnderBar[π]
N[%]

13/4
3.25

Let’s do two quick graphics tests like the ones done for posits. First, the graph near zero, including all the subnormal values:

[Plot: y = x̲ (stair-step) against y = x for x from -3 to 3, including all the subnormal values near zero.]

And now the graph from 1 to maxfloat, where we use a log-log plot to better show the zig-zag over a changing scale:

[Log-log plot: y = x̲ against y = x for x from 1 to maxfloat.]

It is worth noting that for posits, it is always true that x2p[p2x[p]] returns p, whereas for floats, it is not possible to achieve such a perfect inverse with x2f[f2x[f]] and f. Exceptions occur for "negative zero" and for the multiple ways of representing NaN.

6 Floats vs. Posits Preview: Accuracy on a 32-bit budget

Before we go through a litany of comparisons of posits and floats, let's have a pre-game match. If we evaluate the expression

((27/10 - ⅇ) / (π - (√2 + √3)))^(67/16)

correct to ten decimals, it is 302.8827196⋯. We can compare the accuracy we get with 32-bit standard IEEE floats and 32-bit posits that have at least as large a dynamic range.
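The reference value is easy to reproduce:

N[((27/10 - ⅇ) / (π - (Sqrt[2] + Sqrt[3])))^(67/16), 10]    (* 302.8827196 *)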

First, try the single-precision float environment, which by the IEEE Standard uses 8 bits for the exponent:

setfloatenv[{32, 8}]

IEEE 32-bit floats have an unbalanced dynamic range of about 83 decades:

N[{smallsubnormal, maxfloat}]
Log[10., maxfloat / smallsubnormal]

{1.4013 × 10^-45, 3.40282 × 10^38}
83.3853

For each step in the calculation, the Mathematica-type real values are turned into floats, which means that in general they are rounded. The accbenchf and accbenchp functions do not take an argument, since they rely only on the environment settings. Notice how the underbar and overbar notations let us write normal-looking equations while explicitly rounding every result to a number within the float or posit vocabulary.

accbenchf := Module[{te = UnderBar[ⅇ], t2710 = UnderBar[27/10], tpi = UnderBar[π],
    tr2 = UnderBar[Sqrt[2]], tr3 = UnderBar[Sqrt[3]], tnum, t1, tden, t2, t6716, tans},
  tnum = UnderBar[t2710 - te]; t1 = UnderBar[tr2 + tr3]; tden = UnderBar[tpi - t1];
  t2 = UnderBar[tnum / tden]; t6716 = UnderBar[67/16]; tans = UnderBar[t2^t6716];
  Print[Style[Row[{"Float answer to ", nfbits - esize, " decimals: "}], "Text"],
    N[tans, nfbits - esize]];
  Print[Style["Rounding error in answer: ", "Text"],
    N[((27/10 - ⅇ) / (π - (Sqrt[2] + Sqrt[3])))^(67/16)] - tans]]

Try it out on IEEE Standard 32-bit floats, single precision:

accbenchf

Float answer to 24 decimals: 302.912414550781250000000
Rounding error in answer: -0.0296949

IEEE single precision gives us the answer 302.912⋯, which is off by about 0.0297. (We use the convention of coloring orange the decimal digits that differ from the digits of the correct answer.) Although single-precision floats have a nominal accuracy of about seven decimals, that accuracy quickly erodes in this simple expression, and we are left with only three correct decimals.

Now try posits. The underbars turn to overbars, and the calculation of the denominator tden has its rounding deferred, since posits can sum long lists of numbers in the quire (more about that later).

accbenchp := Module[{te = OverBar[ⅇ], t2710 = OverBar[27/10], tpi = OverBar[π],
    tr2 = OverBar[Sqrt[2]], tr3 = OverBar[Sqrt[3]], tnum, tden, t2, t6716, tans},
  tnum = OverBar[t2710 - te];
  tden = OverBar[tpi - (tr2 + tr3)];   (* Posits allow this kind of fused operation *)
  t2 = OverBar[tnum / tden]; t6716 = OverBar[67/16]; tans = OverBar[t2^t6716];
  Print[Style[Row[{"Posit answer with es = ", es, ": "}], "Text"],
    N[tans, nbits - es - 9]];
  Print[Style["Rounding error in answer: ", "Text"],
    N[((27/10 - ⅇ) / (π - (Sqrt[2] + Sqrt[3])))^(67/16)] - tans, "\n"]]

Let’s try a range of es values to see where the sweet spot is.

For[i = 0, i ≤ 5, i++, setpositenv[{32, i}]; accbenchp]

Posit answer with es = 0: 302.88171386718750000000
Rounding error in answer: 0.00100579

Posit answer with es = 1: 302.8827819824218750000
Rounding error in answer: -0.0000623271

Posit answer with es = 2: 302.883300781250000000
Rounding error in answer: -0.000581126

Posit answer with es = 3: 302.88231658935546875
Rounding error in answer: 0.000403066

Posit answer with es = 4: 302.8883361816406250
Rounding error in answer: -0.00561653

Posit answer with es = 5: 302.872161865234375
Rounding error in answer: 0.0105578

All of these answers are much more accurate than the float answer! When es = 1, the posit answer has four more correct decimals for the same number of bits, 302.882781⋯. The error is smaller than the float error by a factor of almost 500. When es = 3, for which the dynamic range is almost twice that of floats (144 decades instead of 83), the posits are more accurate by about a factor of 74.

This “Accuracy on a 32-Bit Budget” example gives a preview of why switching from floats to posits might be worth the trouble. There may be many applications where 32-bit floats are not quite accurate enough, forcing programmers to jump all the way to 64-bit floats instead. If 32-bit posits can achieve more decimals of accuracy, they may allow the use of 32-bit variables and thereby provide a 2x savings in bandwidth, storage, and the energy and power needed to move the data.


The next section will create two complete number sets, one with float rules and one with posit rules, and compare properties of their complete tables for basic arithmetic.

7 Posits versus IEEE-type Floats: A Metric-Based Study

7.1 Why regime bits increase accuracy

It is obvious that regime bits are a sort of "super-exponent" that amplifies the dynamic range of posits compared to floats. What is less obvious is that they also increase accuracy, by leaving more bits over to express the fraction part of a number.

As an example, suppose we have 16 bits and wish to express the number 10 000. Half-precision IEEE floats (5-bit exponent field) can do that without rounding:

setfloatenv[{16, 5}]
UnderBar[10000]

10000

As a float, 10 000 = 2^13 × (1 + 113/512). That means we need an exponent that ranges from 2^-14 to 2^14 for normalized floats, which requires an exponent field of at least five bits. If we instead represent the number as a 16-bit posit, consider how 2^13 would be expressed with different values of es, leaving off the all-zero fraction bits for clarity:

es    2^13 as a posit
0     111111111111110
1     111111101
2     1111001
3     110101
4     101101
5     1001101
6     10001101
7     100001101
etc.

Those are all ways of expressing the integer 13 as a bit shift (a scaling by a power of 2). The number of regime bits needed to express that shift initially drops by about a factor of two each time es increases; once the regime hits the minimum two-bit 10 pattern, the exponent bits take over the job and the total number of bits ramps back up linearly. Hence, there is a "sweet spot" if we intend to deal a lot with quantities near 10 000 in magnitude. Only es = 3 or es = 4 leaves enough fraction bits to represent 10 000 exactly:

es = 0   0111111111111110 → +111111111111110   closest value to 10 000: 8192
es = 1   0111111101001110 → +111111101001110   closest value to 10 000: 9984
es = 2   0111100100111000 → +111100100111000   closest value to 10 000: 9984
es = 3   0110101001110001 → +110101001110001   closest value to 10 000: 10 000
es = 4   0101101001110001 → +101101001110001   closest value to 10 000: 10 000
es = 5   0100110100111000 → +100110100111000   closest value to 10 000: 9984
es = 6   0100011010011100 → +100011010011100   closest value to 10 000: 9984
es = 7   0100001101001110 → +100001101001110   closest value to 10 000: 9984
es = 8   0100000110100111 → +100000110100111   closest value to 10 000: 9984
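The closest-value column can be regenerated with the conversion machinery (a sketch assuming this document's setpositenv, x2p, and p2x):

Table[(setpositenv[{16, i}]; {i, p2x[x2p[10000]]}), {i, 0, 8}]
(* {{0, 8192}, {1, 9984}, {2, 9984}, {3, 10000}, {4, 10000},
    {5, 9984}, {6, 9984}, {7, 9984}, {8, 9984}} *)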

For es = 3, the dynamic range goes far beyond what 16-bit floats can express, and is probably overkill for most low-precision applications:

setpositenv[{16, 3}];
N[{minpos, maxpos}]
Log[10., maxpos / minpos]

{1.92593 × 10^-34, 5.1923 × 10^33}
67.4307

That’s enough to do a decent job of representing Avogadro’s number, in case anyone wants to do low-accuracy but very quick chemistry calculations on molar quantities!

N6.022 × 1023, 3

6.04 × 1023

If someone is using posit arithmetic in a field-programmable gate array (FPGA) for a particular embedded application where it need not be compatible with a standard storage format, such as signal processing or machine learning, there is no reason why a posit cannot have, say, 23 bits with an es of 6. However, we want to make the transition away from the outdated 1985 float format as painless as possible, which means standardizing the es values for 16-, 32-, 64-, 128-, and 256-bit posits. An ambitious hardware engineer might make the es value configurable, but then there is the issue of language support for the many different data types. Ideally, a C program written many years ago that uses float and double keywords could simply be recompiled to use 32-bit and 64-bit posits, with a different math library (the libm.h include file), and everything should just work. And produce more accurate answers.

7.2 Should we match IEEE float dynamic ranges?

In the current (2008) version of the IEEE 754 standard, there are five binary float sizes: 16, 32, 64, 128, and 256 bits. For some users, it might be important that posits do as well or better than floats for dynamic range, lest they appear to be sacrificing dynamic range in favor of accuracy.

Here is a table of the IEEE floats and same-size posits, with es chosen to make the dynamic range better for the 16- and 32-bit sizes, and very nearly as large for 64, 128, and 256 bits. It may be important to improve the dynamic ranges for 16-bit and 32-bit posits if they are to be used as replacements, say, for 32-bit and 64-bit floats, respectively.

Size, bits   Float exponent size   Float dynamic range            Posit es value   Posit dynamic range
16           5                     3.×10^-8 to 7.×10^4            1                9.×10^-10 to 1.×10^9
32           8                     7.×10^-46 to 3.×10^38          3                2.×10^-75 to 5.×10^74
64           11                    2.×10^-324 to 2.×10^308        4                4.×10^-304 to 3.×10^303
128          15                    3.×10^-4966 to 1.×10^4932      7                3.×10^-4894 to 3.×10^4893
256          19                    1.×10^-78984 to 2.×10^78913    10               1.×10^-78605 to 9.×10^78604

These exponent sizes for floats do not follow any mathematical pattern, but reflect intensely argued compromises by the IEEE committee. Trying to match the 1985-era choices of the IEEE committee results in an equally inexplicable set of es values for the posits. Frankly, the reason for such enormous dynamic ranges is that the committee was trying to save transistors instead of providing what users really need. Multiplying and dividing floats only requires integer addition and subtraction of the exponent field, but the fraction field needs an integer multiplier, and the cost of that can grow almost as the square of the number of fraction bits. So while almost no one strays outside the range 10^-13 to 10^13 in real applications, the IEEE 754 Standard proudly lets you go from about 10^-78984 to 10^78913. Even astrophysicists don't know what to do with dynamic ranges that huge. Ironically, whatever those decision-makers saved in the size of the integer multiplier, they more than made up for with a long list of burdensome exception cases that consume plenty of logic (like negative zero, subnormal exceptions, NaN and infinity bit patterns, multiple rounding modes, and the requirement of processor flags for overflow, underflow, and inexact results).

For the moment, suppose the IEEE choices are justified. The posit method of expressing the power-of-two scaling factor frees up more bits for the fraction over a wide range, giving a greater maximum accuracy for the above choices of es:

Size, bits   Float maximum accuracy, bits   Posit maximum accuracy, bits   Posit advantage, bits   Range where posit accuracy ≥ float accuracy
16           11                             13                             2                       1/64 to 64
32           24                             27                             3                       2.×10^-10 to 4.×10^9
64           53                             58                             5                       1.×10^-29 to 8.×10^28
128          113                            119                            6                       2.×10^-270 to 5.×10^269
256          237                            244                            7                       9.×10^-2467 to 1.1×10^2466

So even if we buy into the transistor-pinching choices of IEEE 754, posits easily beat floats for accuracy.

What if we take a more reasoned approach, and simply increment the es value every time we double the number of bits of precision? Every such increment results in a fourfold increase in the dynamic range, which certainly should be ample. For purposes of comparison with 8-bit posits, let's imagine a "quarter-precision" IEEE float with 3 exponent bits so we can include that size in the table.

Size, bits   Float exponent size   Float dynamic range            Posit es value   Posit dynamic range
8            3                     0.008 to 2.×10^1               0                0.008 to 1.×10^2
16           5                     3.×10^-8 to 7.×10^4            1                9.×10^-10 to 1.×10^9
32           8                     7.×10^-46 to 3.×10^38          2                5.×10^-38 to 2.×10^37
64           11                    2.×10^-324 to 2.×10^308        3                2.×10^-152 to 5.×10^151
128          15                    3.×10^-4966 to 1.×10^4932      4                2.×10^-612 to 5.×10^611
256          19                    1.×10^-78984 to 2.×10^78913    5                4.×10^-2457 to 3.×10^2456

Here is the posit accuracy advantage if we do not try to match the oversized dynamic ranges of IEEE and simply use 0-1-2-3-4-5 as the es sizes.

Size, bits   Float maximum accuracy, bits   Posit maximum accuracy, bits   Posit advantage, bits   Range where posit accuracy ≥ float accuracy
8            5                              6                              1                       1/4 to 4
16           11                             13                             2                       1/64 to 64
32           24                             28                             4                       1.×10^-6 to 1.×10^6
64           53                             59                             6                       1.×10^-17 to 7.×10^16
128          113                            122                            9                       7.×10^-49 to 1.×10^48
256          237                            249                            12                      6.×10^-126 to 2.×10^125

Somehow, that looks a lot more sane. Notice that for a 32-bit posit, the four extra accuracy bits make it comparable to an old-fashioned 36-bit float from the days before IBM introduced its System/360 in the 1960s. It will take some time to get sufficient feedback from the HPC community to decide this issue, but right now my vote is for the second set of es settings. The es value should simply be

es = log2(nbits) - 3
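A quick check that this formula reproduces the 0-1-2-3-4-5 sequence used in the table above:

Table[{n, Log2[n] - 3}, {n, {8, 16, 32, 64, 128, 256}}]
(* {{8, 0}, {16, 1}, {32, 2}, {64, 3}, {128, 4}, {256, 5}} *)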

7.3 Constructing the low-precision sets to compare

Let's construct some eight-bit "quarter-precision" floats that follow IEEE rules, and then an eight-bit posit with a comparable dynamic range. We use such low precision to make it practical to find the entire set of values and work with tables formed from pairs of them, which will have 65 536 entries. Here is what a quarter-precision IEEE-style float might look like (with a 4-bit exponent, instead of the 3-bit exponent used in the tables of the previous section):

sign bit   exponent bits   fraction bits
   s       e1 e2 e3 e4     f1 f2 f3

Set the float environment to 8-bit values with 4-bit exponent fields:

setfloatenv[{8, 4}]

The floatfix function remaps the counting integers so that the floats represented by the bit strings appear in increasing order.


floatfix[i_Integer] := If[i < 2^(nfbits - 1), 2^nfbits - i - 1, i - 2^(nfbits - 1)]

The float8 list is the set of floats represented by every possible bit pattern. Printing the list with TableForm is a trick to make the fractions automatically typeset with a smaller font size than the integers.

float8 = Tablef2x[floatfix[f]], f, 0, 2nfbits - 1;

Table[TableForm[{float8〚i〛}], {i, 1, Length[float8]}]

[Output: the 256 float8 values in increasing order: seven NaNs (Indeterminate), then -∞, -240, -224, ⋯ down in magnitude through the negative subnormals to -1/512, then 0 twice (from "negative zero" and "positive zero"), then 1/512 up through 240, ∞, and seven more NaNs.]

That set has 256 elements in it, but 14 of the elements express NaN (Indeterminate). Also, notice that the value 0 occurs twice in the above set because of “negative zero.” The smallest positive value for the floats is 1/512 and the largest value is 240, giving them a dynamic range of about five decades.
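Those counts are easy to verify from the list itself:

Count[float8, Indeterminate]    (* 14 *)
Count[float8, 0]                (* 2 *)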

Now construct posits with nbits = 8. If we choose es = 1, notice that the values of minpos and maxpos exceed the dynamic range of the floats:

setpositenv[{8, 1}]
{minpos, maxpos}

{1/4096, 4096}

Again generate the table as if we had 2's complement integers, from most negative to most positive. The positfix function makes an adjustment like that made by floatfix, to put the posits into increasing order.

positfix[i_Integer] := If[i < npat / 2, i + npat / 2, i - npat / 2]
posit8 = Table[p2x[positfix[j]], {j, 0, npat - 1}];

TableTableFormposit8〚i〛, {i, 1, npat}

[Output: the 256 posit8 values in increasing order: ComplexInfinity (±∞), then -4096, -1024, -512, -256, ⋯ down in magnitude through -1/4096, then 0, then 1/4096 up through 4096. Every value appears exactly once.]

For posits, there are no wasted cases. All bit patterns represent unique mathematical quantities. This relates to another violation of mathematics committed by IEEE floats. If a = b, then for any function f we expect f(a) = f(b). But the IEEE rules send us down the rabbit hole by declaring positive and negative zero to be numerically equal, yet 1/x is –∞ for “negative zero” and +∞ for “positive zero.” Which implies that negative infinity is the same as positive infinity. Gulp.
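As a check that no bit pattern is wasted, every one of the 256 posit8 entries is a distinct value:

Length[Union[posit8]]    (* 256 *)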


7.4 A careful definition of "decimal accuracy"

Like so many things in numerical analysis, we have gotten accustomed to some concepts that are widely accepted but rather sloppy in their logic. The way we measure "error" of various types is such a concept. For instance, this is a widely accepted definition:

absolute error = x_computed − x_exact

That seems reasonable at first glance; if the values are identical, their difference is zero and there is no error. But if a calculation produces 315 instead of 314, doesn't the error look very similar to returning 3.15 instead of 3.14? A number system designed for real numbers usually spans many decades of dynamic range, but simply subtracting numbers is more of an integer or fixed-point way of looking at inaccuracy.

In an attempt to repair this contradiction, the relative error of a computed value is usually defined as

xcomputed - xexact relative error = xexact

Here are a couple reasons you should be dissatisfied with this definition. For one, if you compute –1 when the correct answer is 100, then the relative error would only be 1.01. If you don’t even know which half of the projective real circle a result is on, then you know essentially nothing about the answer since the sign is the most significant part of a number. An error formula should refuse to function in such cases. Even declaring the relative error to be infinite is too flattering.

For another, the formula is quite different for numbers and their inverses. Suppose

xcomputed = 0.001 but xexact = 0.0001. Then the relative error is 9. But if we were instead computing

the reciprocals of the numbers, the relative error would take xcomputed = 1000 and xexact = 10 000 as inputs, and the above formula would reassure us that the relative error is only 0.9. It makes no sense that an answer can be made to look more accurate simply by taking the reciprocal. If you knew the miles per gallon of a car with a relative error of 0.1, would it not bother you that you know the gallons per mile of the same car with a relative error of 0.2? That’s why it is more common to say something is accurate to some percentage, like “within five percent” which is certainly better because it invokes ratios instead of differences.
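The asymmetry is visible in one line of arithmetic:

{Abs[0.001 - 0.0001] / 0.0001, Abs[1000. - 10000.] / 10000.}    (* {9., 0.9} *)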

Engineers have long had a solution when comparing numbers with ratios, which is to use decibels. A ratio of 10 is 10 decibels (dB); a difference of 1 dB means a ratio of 10^(1/10) = 1.2589⋯. As with decibels, we should be looking at the difference of the logarithms of the numbers, which is the same as the logarithm of their ratio:

decimal error ≡ |log10(x_computed) − log10(x_exact)| = |log10(x_computed / x_exact)|

Notice that the absolute value makes x_computed and x_exact interchangeable in the above definition. Also notice that it produces the same result whether you use x_computed and x_exact as inputs, or 1/x_computed and 1/x_exact. So that looks like a mathematically sound definition.

We could achieve those properties with any base logarithm, but we choose base 10 because it measures the error in decades, the same human-friendly way we measure dynamic range. Recall the above example of x_computed = 0.001 but x_exact = 0.0001; the decimal error is 1. That means it is a decade off.

The decimal error can be used to define decimal accuracy. Accuracy is the inverse of error. If we want to know the number of decimals of accuracy, we again take the log base 10.


decimal accuracy ≡ log10(1 / decimal error) = −log10(|log10(x_computed / x_exact)|)

Here is the Mathematica function, which uses the above formula but also handles exceptional input values. If either input is NaN, then the accuracy is NaN. It also produces NaN if the numbers are of opposite sign. Otherwise, if the input values are identical, the accuracy is ∞. If only one input is ∞ or only one input is 0, the accuracy is –∞, because on a logarithmic scale, both 0 and ∞ are infinitely far away from any nonzero real number. This is why it is a disaster to “round” large results to infinity (overflow) or small results to zero (underflow).

decacc[x_, y_] := Which[
  x === Indeterminate ∨ y === Indeterminate, Indeterminate,
  x === y, ∞,
  x === ComplexInfinity ∨ y === ComplexInfinity, Indeterminate,
  (x < 0 ∧ y > 0) ∨ (x > 0 ∧ y < 0), Indeterminate,
  True, N[-Log[10, Abs[Log[10, x / y]]]]]

If the numbers are a decade apart, the accuracy is zero; we don't even know the order of magnitude of the result:

decacc[23, 230]

0.

Engineers should like this property of the definition: Here is the decimal accuracy if an answer is off by 1 dB:

decacc1, 101/10

1.

If an answer is off by 0.1 dB, then we have 2 decimals of accuracy; off by 0.01 dB means 3 decimals of accuracy, and so on.

We can also analyze the accuracy of a number system. Let’s just study three consecutive values in the float8 set: {15/16, 1, 9/8}. We know that

Values in [15/16, 31/32) round to 15/16.

Values in [31/32,17/16] round to 1.

Values in (17/16,9/8] round to 9/8.

If we happen to have a value that is exactly 15/16, 1, or 9/8, then the decimal accuracy is infinite. In between those values, decimal accuracy dips to a minimum.


[Plot: decimal accuracy after rounding x, for real values x from 0.95 to 1.10; the accuracy spikes to ∞ at the exact values 15/16, 1, and 9/8 and dips to a minimum between them.]

Notice the slight discontinuity at the two minima; that is because of the difference between the geometric mean and the arithmetic mean. The cusps are at the halfway points, (15/16 + 1)/2 = 0.96875 and (1 + 9/8)/2 = 1.0625, but those are not the places where the decimal error, defined as a logarithmic distance, is minimized. That would happen at √((15/16)×1) = 0.96824⋯ and √(1×(9/8)) = 1.0606⋯, which at this extremely low precision is a large enough discrepancy to see in the graph, barely. In 16-bit precision or greater, you really wouldn't be able to see it. If it weren't so computationally expensive, it would be a tiny bit better if all rounding were done according to which side of the geometric mean a result falls on, not the arithmetic mean.
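The four numbers in question, computed directly for the float pair {15/16, 1} and the pair {1, 9/8}:

N[{(15/16 + 1)/2, Sqrt[15/16], (1 + 9/8)/2, Sqrt[9/8]}]
(* {0.96875, 0.968246, 1.0625, 1.06066} *)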

Notice, though, that posits do round using the geometric mean if the rounded bits are exponent bits. That only happens at the extremes of the range, but it helps increase decimal accuracy. For instance, in the 8-bit posit set under examination here, the three largest positive reals are 512, 1024, and 4096. Here is what posit decimal accuracy looks like for x in that range:

[Plot: decimal accuracy after rounding x, for real values x from 500 to 4096; accuracy spikes to ∞ at 512, 1024, and 4096.]

The geometric mean of 512 and 1024 is 724.077⋯ but normal linear rounding of the fraction part uses the midpoint, 768, as the switchover point, creating a subtle but visible discontinuity. The geometric mean between 1024 and 4096, on the other hand, is exactly 2048 and that is also what posits use for the switchover point. Here is a decimal accuracy plot for the positive posits:
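Those two geometric means, computed directly:

{N[Sqrt[512 × 1024]], Sqrt[1024 × 4096]}    (* {724.077, 2048} *)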

[Plot: Decimal Accuracy for {8,1} Posits, plotted against log2(x) from about -12 to 12; worst-case accuracy peaks near the center and tapers toward the extremes.]

The tapered precision is very clear. Worst-case precision is highest where the most common numbers are, in the center of the range of possible exponents. Here is the equivalent plot for the 8-bit floats we are testing here:

[Plot: Decimal Accuracy for {8,4} Floats, plotted against log2(x); accuracy is flat across the normalized range and ramps down on the left for the subnormals.]

The dynamic range is visibly smaller than for posits, and accuracy is tapered only on the left where the denormalized float exceptions happen.

Those graphs are thorough in showing the accuracy of the entire number system over the dynamic range, but the vertical spikes at the exact points make it difficult to compare the two systems. Let's instead be pessimistic and look at the worst-case decimal accuracy for each point in the number set. The following graph shows just the bottom of each trough, allowing us to show both posits and floats on the same plot:

[Plot: worst-case decimal accuracy of the {8,1} posits and the {8,4} floats overlaid on the same axes.]

At higher precisions, the jaggedness is less pronounced, and the posit decimal accuracy is a symmetrical triangular outline that peaks in the center; the float pattern always forms a rectangle with a ramp on the left for the denormalized floats. The posits have higher accuracy than floats in the center, and less near the underflow and overflow regions. The larger the es value, the broader the triangle that describes posit accuracy, and at some point it becomes so broad that the entire accuracy curve has less accuracy than the floats.

For the 8-bit posits and floats, posits have superior accuracy for numbers with absolute value between 1/4 and 4, and equal or superior accuracy for numbers with absolute value between 1/16 and 16.

7.5 The Morris Floats

We now have enough mathematical machinery to show what happened to the 1971 tapered-precision scheme. Almost half a century ago, Robert Morris proposed it in his paper, "Tapered Floating Point: A New Floating-Point Representation." It is worth studying his proposal and comparing it with posit arithmetic using the tools presented so far, particularly since those familiar with the literature on alternatives to floats remember that there were attempts to create tapered precision early on. We know that the Morris approach never caught on or influenced the IEEE 754 Standard committee. Perhaps we can figure out why it didn't.

Morris suggests an additional field, the G field, that describes how many bits are in the exponent. The bits in the G field represent an integer to which we add an offset, so two G bits could represent, say, an exponent of size 1, 2, 3, or 4. Or it could represent 3, 4, 5, or 6. He describes the need for

◼ The G field for the number of exponent bits ◼ The sign bit of the exponent ◼ The E field for the exponent bits ◼ The sign bit of the fraction ◼ The F field for the remaining bits, a fraction field where the hidden bit is always 1 (no subnormals).

The claim in the 1971 paper is that this allows more accuracy bits (fraction bits) for values with small exponents, yet more dynamic range than a system that only has a signed exponent and a signed fraction. That sounds familiar! The author notes that there can be multiple ways to represent a particular value in this system (redundant bit patterns), and he proposes the convention that the representation with the smallest value stored in G is the one to use; the other bit patterns become meaningless under the system.

A Morris float looks like the following, with five sub-fields:

sign sign G bits exponent fraction

g1 g2 ⋯ggs e± e1 e2 ⋯ eG f ± f1 f2 ⋯ fk

n

Notice that we only show black dividing lines between the G bits and the exponent sign bit, and between the exponent sign bit and the exponent bits. That’s to remind us that the other partition locations vary according to the contents of G.

We are using nfbits for the number of bits in a standard float, so let's use nmbit for the number in a Morris float. We can use gs as the size of the G field, and define a Morris float environment with setmorrisenv:

setmorrisenv[{n_Integer /; n ≥ 4, gsize_Integer /; gsize ≥ 1}] := (
  {nmbit, gs} = {n, gsize};
  minmorris = 2^-(2^(2^gs) - 1);
  maxmorris = 2^(2^(2^gs) - 1) (1 + (2^(n - 2^gs - gs - 2) - 1)/2^(n - 2^gs - gs - 2)))

For example, set an environment for 8 bits per number and a 1-bit G field:

setmorrisenv[{8, 1}]
{nmbit, gs, minmorris, maxmorris}

{8, 1, 1/8, 15}

There’s an early warning about this scheme. The dynamic range is awfully small. In fact, the dynamic range would have been larger if we had used the 8 bits to represent signed integers from -128 to 127.
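To put a number on that, we can measure this Morris set’s dynamic range in decades, using the environment values just computed (the comparison with log10(127) for signed bytes is our own illustration):

{N[Log10[maxmorris/minmorris]], N[Log10[127]]}

{2.07918, 2.1038}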

Now we need the equivalent of f2x, which we can call m2x, that follows the rules specified by Morris.

m2xm_Integer /; 0 ≤ m < 2nmbit := Module{emask, esgn, exp, fmask, frac, fsgn, fs, g}, g = BitShiftRight[m, nmbit - gs]; esgn = BitAnd[1, BitShiftRight[m, nmbit - gs - 1]]; emask = FromDigits[Table[1, g + 1], 2]; fs = nmbit - gs - g - 3; fsgn = BitAnd[1, BitShiftRight[m, fs]]; exp = BitAnd[BitShiftRight[m, fs + 1], emask]; frac = BitAnd[FromDigits[Table[1, fs], 2], m]; esgn Ifm ⩵ 0, 0, (-1)fsgn × 2(-1) ×exp × 1 + frac  2fs

Apologies to the reader for what follows, but it is important to see the kind of chaos that results from simply adding a field that describes the exponent size:

morris8 = Tablem2x[i], i, 0, 2nmbit - 1

{0, 17/16, 9/8, 19/16, 5/4, 21/16, 11/8, 23/16, 3/2, 25/16, 13/8, 27/16, 7/4, 29/16, 15/8, 31/16,
 -1, -17/16, -9/8, -19/16, -5/4, -21/16, -11/8, -23/16, -3/2, -25/16, -13/8, -27/16, -7/4, -29/16, -15/8, -31/16,
 2, 17/8, 9/4, 19/8, 5/2, 21/8, 11/4, 23/8, 3, 25/8, 13/4, 27/8, 7/2, 29/8, 15/4, 31/8,
 -2, -17/8, -9/4, -19/8, -5/2, -21/8, -11/4, -23/8, -3, -25/8, -13/4, -27/8, -7/2, -29/8, -15/4, -31/8,
 …, -1/8, -9/64, -5/32, -11/64, -3/16, -13/64, -7/32, -15/64}

(The full output has 256 entries; the elided middle continues in the same chaotic pattern.)

A discrete plot of the values represented by ordered bit strings gives us the second clue about the shortcomings of the Morris proposal:

ListPlot[morris8]

[Plot: the 256 morris8 values against bit-pattern index 0–255; values between -5 and 5 appear in a wildly non-monotonic order.]

What a crazy ordering this format produces! But here is the crucial failure of the Morris approach to tapered precision. Sort the values and get rid of the many redundant representations:

morris8s = Union[morris8]

15 13 11 9 -15, -14, -13, -12, -11, -10, -9, -8, - , -7, - , -6, - , -5, - , -4, 2 2 2 2 31 15 29 7 27 13 25 23 11 21 5 19 9 - , - , - , - , - , - , - , -3, - , - , - , - , - , - , 8 4 8 2 8 4 8 8 4 8 2 8 4 17 31 15 29 7 27 13 25 3 23 11 21 5 - , -2, - , - , - , - , - , - , - , - , - , - , - , - , 8 16 8 16 4 16 8 16 2 16 8 16 4 19 9 17 31 15 29 7 27 13 25 3 23 11 - , - , - , -1, - , - , - , - , - , - , - , - , - , - , 16 8 16 32 16 32 8 32 16 32 4 32 16 21 5 19 9 17 1 15 7 13 3 11 5 9 1 - , - , - , - , - , - , - , - , - , - , - , - , - , - , 32 8 32 16 32 2 32 16 32 8 32 16 32 4 15 7 13 3 11 5 9 1 1 9 5 11 3 13 - , - , - , - , - , - , - , - , 0, , , , , , , 64 32 64 16 64 32 64 8 8 64 32 64 16 64 7 15 1 9 5 11 3 13 7 15 1 17 9 19 5 21 11 23 , , , , , , , , , , , , , , , , , , 32 64 4 32 16 32 8 32 16 32 2 32 16 32 8 32 16 32 3 25 13 27 7 29 15 31 17 9 19 5 21 11 23 3 25 , , , , , , , , 1, , , , , , , , , , 4 32 16 32 8 32 16 32 16 8 16 4 16 8 16 2 16 13 27 7 29 15 31 17 9 19 5 21 11 23 25 13 27 , , , , , , 2, , , , , , , , 3, , , , 8 16 4 16 8 16 8 4 8 2 8 4 8 8 4 8 7 29 15 31 9 11 13 15 , , , , 4, , 5, , 6, , 7, , 8, 9, 10, 11, 12, 13, 14, 15 2 8 4 8 2 2 2 2

Eight bits can produce 2^8 = 256 distinct bit patterns. How many mathematical values are in the above list?

Length[morris8s]

161

There should be 256 distinct values… but we only have 161. The Morris system is littered with redundant ways to express the same value, wasting over one-third of the bit patterns.
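If one wants to see exactly where the redundancy lands, a one-liner over the morris8 list (an illustrative check, not part of the original environment) tallies the most-duplicated values:

TakeLargestBy[Tally[morris8], Last, 5]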

With that as the number system, we can plot the decimal accuracy alongside that for posits and conventional floats:

At their best, Morris floats match posit accuracy, but have a much smaller dynamic range. Let’s at least show that by allowing two bits in G instead of one, Morris floats can have a larger dynamic range (at the expense of very low accuracy everywhere):

setmorrisenv[{8, 2}]
{nmbit, gs, minmorris, maxmorris}

{8, 2, 1/32768, 32768}

Here is the set that results, after removing all the redundancies:

morris8s = UnionTablem2x[i], i, 0, 2nmbit - 1

-32 768, -16 384, -8192, -4096, -2048, -1024, -512, -256, -192, -128, -96, 15 7 -64, -48, -32, -24, -16, -14, -12, -10, -8, -7, -6, -5, -4, - , - , 4 2 13 11 5 9 15 7 13 3 11 5 9 15 - , -3, - , - , - , -2, - , - , - , - , - , - , - , -1, - , 4 4 2 4 8 4 8 2 8 4 8 16 7 13 3 11 5 9 1 7 3 5 1 7 3 5 - , - , - , - , - , - , - , - , - , - , - , - , - , - , 8 16 4 16 8 16 2 16 8 16 4 32 16 32 1 3 1 3 1 3 1 3 1 1 1 1 - , - , - , - , - , - , - , - , - , - , - , - , 8 32 16 64 32 128 64 256 128 256 512 1024 1 1 1 1 1 1 1 1 1 - , - , - , - , - , 0, , , , , 2048 4096 8192 16 384 32 768 32 768 16 384 8192 4096 1 1 1 1 1 3 1 3 1 3 1 3 1 5 , , , , , , , , , , , , , , 2048 1024 512 256 128 256 64 128 32 64 16 32 8 32 3 7 1 5 3 7 1 9 5 11 3 13 7 15 9 5 11 3 , , , , , , , , , , , , , , 1, , , , , 16 32 4 16 8 16 2 16 8 16 4 16 8 16 8 4 8 2 13 7 15 9 5 11 13 7 15 , , , 2, , , , 3, , , , 4, 5, 6, 7, 8, 10, 12, 14, 16, 24, 8 4 8 4 2 4 4 2 4 32, 48, 64, 96, 128, 192, 256, 512, 1024, 2048, 4096, 8192, 16 384, 32 768

Now there is even more waste of bit patterns, since there are only 145 distinct values.
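As before, the length of the de-duplicated list confirms the count:

Length[morris8s]

145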

At least the dynamic range now goes from 2^-15 to 2^15. But the accuracy is worse than floats for the full range where floats are defined, except for a small interval for magnitudes between 1/2 and 4, where it manages to eke out the same accuracy. Hmm… what would happen if we also gave posits another bit for dynamic range?

setpositenv[{8, 2}]
posit8 = Table[p2x[positfix[j]], {j, 0, npat - 1}]

{ComplexInfinity, -16777216, -1048576, -262144, -65536, -32768, -16384, -8192, -4096, -3072, -2048, -1536, -1024, -768, -512, -384, -256, -224, -192, -160, -128, -112, -96, -80, -64, -56, -48, -40, -32, -28, -24, -20, -16, -15, -14, -13, -12, -11, -10, -9, -8,
 …, -1/1048576, -1/16777216, 0, 1/16777216, 1/1048576, …,
 8, 9, 10, 11, 12, 13, 14, 15, 16, 20, 24, 28, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 384, 512, 768, 1024, 1536, 2048, 3072, 4096, 8192, 16384, 32768, 65536, 262144, 1048576, 16777216}

(The full output has 256 entries; the elided middle continues in the same pattern.)

Again, here are all three number systems on the same graph.

That’s a massive dynamic range from a byte-sized number! Posits match or outperform Morris floats everywhere. Posits also have the same accuracy as IEEE floats except for the regions close to where they overflow or underflow.

This shows the value of designing a number system to make the best possible use of every bit pattern. Trying proposed systems out with low precision is a quick way to expose shortcomings that may be masked when using 32-bit or 64-bit representation.

That was fun, but for the upcoming sections, we need to go back to an {8, 1} posit environment for comparison with the 8-bit floats:

setpositenv[{8, 1}]
posit8 = Table[p2x[positfix[j]], {j, 0, npat - 1}];

7.6 Comparing floats with posits performing unary operations

7.6.1 Reciprocation

We can compare the reciprocal closure, that is, the percentage of cases where 1/x is exactly representable as a member of the set. Compare the percentage of the entire set of floats or posits for which a reciprocal is exact, finite but inexact, produces a NaN, overflows, or underflows:

Floats            Posits
13.3%  exact      18.8%  exact
80.5%  inexact    81.3%  inexact
5.47%  NaN
0.781% overflow
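Those percentages can be re-derived from the value lists. Here is a minimal sketch for the posit column, assuming the posit8 list built above; precip is a hypothetical helper encoding the posit conventions 1/0 = ±∞ and 1/(±∞) = 0:

precip[x_] := Which[x === ComplexInfinity, 0, x == 0, ComplexInfinity, True, 1/x];
Count[posit8, x_ /; MemberQ[posit8, precip[x]]] (* 48 of the 256, the 18.8% above *)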

Only 34 of the float values have exact reciprocals. In contrast, 48 of the 256 posit values have exact reciprocals, and never experience catastrophic loss of accuracy through overflow. The IEEE float definition is a “kludge” in that it has subnormal numbers at the low end (the reciprocals of which incorrectly overflow to infinity), but replaces the high-end numbers with NaN values.

The following graph makes it much easier to visualize the relative performance of floats and posits. The entire set of decimal losses in computing 1/x is sorted from smallest to largest, and plotted.

[Plot: decimal error per calculation for 1/x, all 256 cases sorted from smallest to largest; the float curve climbs past 0.10 and ends at ∞, while the posit curve stays lower and remains finite throughout.]

The graph of posit decimal losses grows more slowly than that for floats, and never goes to infinity.

7.6.2 Square Roots

We can also compare square root closure, that is, the percentage of cases where √x is exactly representable as a member of the set. The sqrtbarchart routine also finds out the fraction of the time a square root is exact, finite but inexact, or produces a NaN. The square root operation cannot overflow or underflow. Negative inputs produce NaN results, but since the posit “±∞” really means unsigned infinity, the square root of ±∞ is ±∞ and thus is closed for that input.

Floats            Posits
7.03%  exact      8.20%  exact
40.6%  inexact    42.2%  inexact
52.3%  NaN        49.6%  NaN

Posits do better, but at first glance it looks like the advantage is slight. The bar chart does not reveal just how much more inexact the floats are. The difference in the sorted losses is more dramatic than it was for computing 1/x. Here are the sorted losses for every x value whose square root is not a NaN:

[Plot: sorted decimal errors for √x over the roughly 120 non-NaN cases; the float curve rises to about 0.025, the posit curve to roughly half that.]

The posit errors are about half those of the floats (for the results that are not indeterminate).

7.6.3 Square

Another common unary operation is x². Overflow and underflow are a common disaster when squaring floats. Posits experience their largest decimal loss for squares that would overflow or underflow by IEEE rounding rules, but at least the loss is a few decimals and not infinite.

Floats            Posits
13.3%  exact      15.6%  exact
43.8%  inexact    84.4%  inexact
5.47%  NaN
25.0%  overflow
12.5%  underflow

Posits do much better, mainly by not having any exception cases at all.

[Plot: sorted decimal errors for x² over all cases; the float curve passes 0.30 and ends at ∞, while the posit curve tops out near 3.6 decimals.]

Every posit can be squared. (The square of unsigned infinity is again unsigned infinity.) In contrast, almost half the squarings of floats result in complete loss of information about the result.

7.6.4 Logarithm (base 2)

We can also compare the logarithm base 2 closure, that is, the percentage of cases where log₂(x) is exactly representable as a member of the set. As with square roots, about half the values produce a NaN since the logarithm of a negative value is a complex number. Note: we allow posits to return ±∞ as the logarithm of zero and the logarithm of ±∞. Remember, posit infinity is unsigned, like zero.

Floats            Posits
7.81%  exact      9.77%  exact
39.8%  inexact    40.6%  inexact
52.3%  NaN        49.6%  NaN

Posits do better, and again at first glance it looks like the advantage is slight. There are more integer powers of 2 in the posit environment, for which the logarithm base 2 is expressible exactly. Here are the sorted losses for every value that is not a NaN or infinity:

[Plot: sorted decimal errors for log₂(x) over the non-NaN cases; the float curve rises to about 0.020, the posit curve to roughly half that.]

The posit errors are again about half those of the floats.

7.6.5 Exponential (base 2)

Maybe one more: 2^x. If you can do log₂(x), it just takes a scale factor to get to ln(x) or log₁₀(x) or any other logarithm base. Similarly, once you can do 2^x, it is easy to derive a scale factor that also gets you ⅇ^x or 10^x and so on.

Floats            Posits
7.81%  exact      8.98%  exact
56.3%  inexact    90.6%  inexact
5.47%  NaN        0.391% NaN
15.6%  overflow
14.8%  underflow

Posits have one exception case: the result is NaN when the argument is ±∞. We can only use ±∞ as a legitimate answer if the result consists entirely of infinite values, but 2^-∞ is zero, so 2^±∞ is indeterminate.

The maximum decimal loss for posits is very large, because 2^maxpos will be rounded back to maxpos. For this example set, just a few errors are as high as log₁₀(2^4096) ≈ 1233 decimals. So, which is worse: the loss of over a thousand decimals, or the loss of an infinite number of decimals? Well, if you can stay away from those (rare) very largest values, it’s still a win, because the error for smaller values is much better behaved for posits. Think of it this way: the only time you get a large decimal loss with the posits is when working with numbers far outside of what floats can even express as input arguments.
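The 1233 figure is simply log₁₀ of 2^maxpos for this set, where maxpos = 4096:

N[Log10[2^4096]]

1233.02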

[Plot: sorted decimal errors for 2^x; the float curve ends at ∞ and a few posit errors reach about 1233 decimals, but over the rest of the range the posit curve lies well below the float curve.]

For the common unary operations 1/x, x², √x, log₂(x), and 2^x, posits are consistently and uniformly more accurate than floats with the same number of bits, and produce meaningful results over a larger dynamic range. The advantage of posits becomes greater with larger precision. If we were to show the graphs for the unary operations comparing 32-bit floats with 32-bit posits, they would be hard to read because the posit errors would hug the x-axis of the plot when graphed at a scale large enough to show the float errors!

We now turn our attention to the four elementary arithmetic operations that take two arguments: addition, subtraction, multiplication, and division.

7.7 Two-argument operations

Now we start to examine the four conventional arithmetic operations + – × ÷ that take two operands. To help visualize 65 536 results, we can make 256 by 256 “closure plots” that show at a glance what fraction of the results are exact, inexact, overflow, underflow, or NaN.

7.7.1 Comparing addition (or subtraction)

Because x - y = x + (-y) works perfectly in both floats and posits, there is no need to study subtraction separately. For the addition operation, we compute z = x + y exactly, and compare it with the sum that is returned by the rules of each number system. It can happen that the result is exact, that it must be rounded to a nearby finite nonzero number, that it can overflow or underflow, or can be an indeterminate form like ∞ - ∞ that produces a NaN. Each of these is color-coded so we can look at the entire addition table at a glance. In the case of rounded results, the color-coding is a gradient from black (exact) to magenta (maximum error of either posits or floats). Here’s what the closure plots look like for the floats and the posits. First, the floats:

[Closure plot: float addition over the full 256×256 grid, both axes running -maxreal … 0 … maxreal; color key: Exact, Inexact, Overflow, NaN.]

Addition Closure
18.533% exact
70.068% inexact
10.641% NaN
0.757% overflow

Because of all the NaN representations in the inputs, the float table will always be framed with NaN results, shown in yellow. If you look carefully in the upper left and lower right corners, you see additional NaN results for adding ∞ to –∞, and –∞ to ∞.

Sums of large positive or large negative values overflow to ∞ or –∞, but underflow does not occur for the addition operator. The error fades from magenta to black where the inputs are of very different magnitude, because the loss of decimal accuracy is very small in such cases.

Here is the closure plot for posits, and its summary table:

[Closure plot: posit addition over the same axes; color key: Exact, Inexact, NaN.]

Addition Closure
25.005% exact
74.994% inexact
0.002% NaN
0.000% overflow

The first thing you notice is that the black regions where sums are exact are much broader than they were for floats. Those are the regions where posits of similar magnitude but opposite sign are being added, resulting in cancellation of digits. When the magnitudes are near the center of the dynamic range, scraping off digits results in a smaller magnitude number that requires fewer fraction bits to represent exactly. Tapered precision is just what you want for those situations, and those situations happen more often with posits than floats because posits have more numbers near the center of the dynamic range.

It may look like there are no NaN cases, but there is exactly one amber square, in the extreme bottom left, representing what happens when you add ±∞ to ±∞. Overflow cases have been eliminated.

Here is a summary of addition performance for floats versus posits:

Floats            Posits
18.5%  exact      25.0%    exact
70.1%  inexact    75.0%    inexact
10.6%  NaN        0.00153% NaN
0.757% overflow

The greater incidence of exact sum operations is immediately evident. As with the single-operand functions, we can sort the decimal errors and plot to compare accuracy loss for float versus posit addition:

[Plot: sorted decimal errors for addition; the float curve (ending at ∞) rises above the posit curve, which stays below about 0.04 over the plotted range.]

For clarity, that graph only plots the first 64 000 of the 65 536 errors. Plotting the entire set shows that the last few posit addition cases have a decimal error as high as 0.3, but that only happens when expressing numbers well beyond the dynamic range of floats anyway.

[Plot: all 65 536 sorted addition errors; the float curve ends at ∞, the posit curve at about 0.3.]

The worst case is if maxpos + maxpos is rounded back to maxpos, a decimal error of log₁₀(2) ≈ 0.3. IEEE floats instead “round” the result to infinity, thereby creating an infinite decimal error.
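That worst case is easy to check for the {8, 1} set, where maxpos = 4096 and maxpos + maxpos = 8192 rounds back to 4096:

N[Log10[8192/4096]]

0.30103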

7.7.2 Comparing multiplication closure

We use a similar approach for comparing how well floats and posits multiply. Unlike addition, multiplication can cause floats to underflow. The “gradual underflow” region provides some protection, as you can see in the center of the closure graph. Without denormalized floats, the blue underflow region would be a full diamond shape.

[Closure plot: float multiplication; color key: Exact, Inexact, Overflow, Underflow, NaN.]

Multiplication Closure
22.272% exact
49.481% inexact
4.950% underflow
12.646% overflow
10.651% NaN

The plot for posit multiplication is much less colorful, and that’s a good thing. Only two pixels light up as NaN, right next to where the axes have their “zero” label. That is where ±∞ is multiplied by 0.

[Closure plot: posit multiplication; color key: Exact, Inexact, NaN.]

Multiplication Closure
18.002% exact
81.995% inexact
0.000% underflow
0.000% overflow
0.003% NaN

The floats have more cases where the product is exact than do posits, but at a terrible cost. As the diagrams show, almost one-quarter of all float products will overflow or underflow, and that fraction does not decrease for higher precision floats.

Floats            Posits
22.3%  exact      18.0%    exact
51.1%  inexact    82.0%    inexact
10.7%  NaN        0.00305% NaN
12.6%  overflow
3.34%  underflow

The worst-case rounding for posits occurs for maxpos × maxpos, which is rounded back to maxpos. For this set of posits, that represents a (very rare) loss of about 3.6 decimals. As the following graph shows, posits are dramatically better at minimizing multiplication error than floats:

[Plot: sorted decimal errors for multiplication; the float curve rises past 0.35 and ends at ∞, while the posit curve hugs the axis and tops out near 3.6.]
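The 3.6-decimal figure can be checked the same way as the addition worst case: maxpos² rounds back to maxpos, a loss of a factor of maxpos = 4096:

N[Log10[4096]]

3.61236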

7.7.3 Comparing division closure

Similarly, we can compare the closure for division operations. It is almost a permutation of the quadrants of the multiplication plot.

[Closure plot: float division; color key: Exact, Inexact, Overflow, Underflow, NaN.]

Division Closure
21.539% exact
51.276% inexact
6.866% underflow
9.668% overflow
10.651% NaN

The situation is similar to that for multiplication; somewhat more cases of exact results than for posits, but almost a quarter of the cases overflow or underflow catastrophically. Notice that the posit closure plot comes much closer to looking symmetrical about the diagonal line from top left to bottom right:

[Closure plot: posit division; color key: Exact, Inexact, NaN.]

Division Closure
18.002% exact
81.995% inexact
0.000% underflow
0.000% overflow
0.003% NaN

The two NaN values are visible in the bottom left corner (±∞/±∞) and the center (0/0). Here is a summary of the comparison for division:

Floats            Posits
21.5%  exact      18.0%    exact
53.1%  inexact    82.0%    inexact
10.7%  NaN        0.00305% NaN
9.67%  overflow
5.03%  underflow

The plot of sorted errors is almost identical to the one for multiplication, so there is no need to repeat it here.

7.7.4 Comparing power closure

Let’s try to compare the closure for x^y operations. This is far more difficult to compute than the standard two-argument arithmetic operations, and many of the results are complex-valued and thus turn into NaN. Also, 0^0 and 1^∞ are NaN. The pattern is bizarre and spectacular:

[Closure plot: float x^y; color key: Exact, Inexact, Overflow, Underflow, NaN.]

Power Closure
3.433% exact
25.734% inexact
16.835% underflow
17.194% overflow
36.804% NaN

The posit closure plot is similarly thought-provoking. In general, a negative real to the power of a negative real has an imaginary component and thus trips the NaN indicator for both floats and posits, creating a large gold band on the left. But there are exceptions, like when the exponent is a negative even integer. In the extreme bottom left and top left corners, all the posits are negative even integers.

[Closure plot: posit x^y; color key: Exact, Inexact, NaN.]

Power Closure
2.579% exact
59.821% inexact
0.000% underflow
0.000% overflow
37.601% NaN

Notice that 0^0 is a NaN, with amber squares in the center of each plot. The float square is larger because it has two kinds of zero, hence four ways to land on 0^0. Here is a summary of the comparison for the power function:

Floats            Posits
3.43%  exact      2.58%  exact
49.2%  inexact    59.8%  inexact
10.7%  NaN        37.6%  NaN
18.8%  overflow
17.9%  underflow

The plot of sorted errors eventually goes wildly high, but for the region of values that are not NaN and are not very large or very small magnitude, posits do their usual accuracy improvement, though there is a small region on the left where floats get a “head start” by having slightly more exact values.

[Plot: sorted decimal errors for x^y over the roughly 16 000 non-NaN cases; the float curve starts with a slightly longer run of exact (zero-error) cases, but the posit curve stays lower over the rest of the range, under about 0.020.]

8 A Quad-Precision Test Example

8.1 The Area of a Very Thin Triangle

Here’s a classic “thin triangle” problem, taken from What Every Computer Scientist Should Know About Floating-Point Arithmetic, by David Goldberg, published in the March 1991 issue of Computing Surveys:

Find the area A of a triangle with sides a, b, c, when two of the sides, b and c, are each barely longer than half the longest side, a. Specifically, two of the sides are 3 ULPs larger than half the longest side.

[Figure: a very thin triangle with base a and the two sides b = a/2 + 3 ULPs and c = a/2 + 3 ULPs.]

Think of it like pavement buckling in the hot sun. The classic formula for the area A uses a temporary value s:

s = (a + b + c)/2;  A = √(s (s - a) (s - b) (s - c)).

The problem with the formula for a thin triangle is that s is very close to the value of a, and the calculation of s accumulates a rounding error of two ULPs… so the relative error is magnified about as badly as one could ever imagine. We want to test the ability of posits to surpass floats at high precision, so we will make the sides

a = 7,  b = c = 7/2 + 3×2^-111

which are values that an IEEE quad precision (128-bit) float can express exactly. Suppose all values are in light years. Sides b and c are larger than half of a by only about 1/200 the diameter of a proton, yet that is enough to pop the triangle up to a height of about 85 centimeters, the width of a standard door opening! Let’s use the area formula with Mathematica rational values to see what the correct value for the area is (in square light years), to forty decimals of accuracy:

area = Modulea = 7, b = 7 / 2 + 3 × 2-111, c, s, c = b; s = (a + b + c) / 2; N s (s - a) (s - b) (s - c) , 39

3.14784204874900425235885265494550774498 × 10^-16
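Incidentally, it is easy to confirm that side b is exactly expressible in quad precision: its binary significand spans exactly 113 bits, precisely the significand size (1 hidden bit plus 112 fraction bits) of an IEEE 128-bit float:

Length[First[RealDigits[7/2 + 3 × 2^-111, 2]]]

113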

That’s in square light years, a hard quantity to visualize. The area is about 55 times the surface area of the earth.

8.2 Thin triangle area with IEEE quad precision floats

IEEE quad precision uses 15 bits of exponent, so we can set the float environment and check the dynamic range as follows:

setfloatenv[{128, 15}]
{N[smallsubnormal], N[maxfloat]}

{6.475175119438025 × 10^-4966, 1.189731495357232 × 10^4932}

Here is the calculation with floats, using the underbar to mean the float version of whatever is above it. That is, an underbar rounds whatever it underlines to the nearest float.

a = 7; b = c = 7/2 + 3 × 2^-111;
s = ((a + b) + c)/2;
t = Sqrt[(s × (s - a)) × (s - b) × (s - c)]; (* each operation rounded to the nearest float *)
N[t, 36]

3.63481490842332134725920516158057683 × 10^-16

That’s incorrect by more than 15 percent, a massive error considering that IEEE quad precision has about 33 decimals of precision. Coloring incorrect digits in orange, floats give us

3.63481490842332134725920516158057683 × 10^-16

where every digit after the leading 3 is incorrect.

In other words, it got one decimal digit correct. Since the measurement unit is square light years, the error is about as big as the surface area of the planet Neptune.

8.3 Thin triangle area with posits: an unfair fight

Now try it with posits, simply by using the overbar instead of the underbar. Choosing es = 7 gives a posit environment a very similar dynamic range to that of IEEE 128-bit floats (quad precision):

setpositenv[{128, 7}]
{N[minpos], N[maxpos]}

{9.73262367930742 × 10^-4856, 1.027472172920962 × 10^4855}

a = 7; b = c = 7/2 + 3 × 2^-111;
s = ((a + b) + c)/2;
t = Sqrt[((s × (s - a)) × (s - b)) × (s - c)]; (* each operation rounded to the nearest posit *)
N[t, 39]

3.14784204874900425235885265494550774439 × 10^-16

Or with the orange wrong-digit notation, posits give us an area of

3.14784204874900425235885265494550774439 × 10^-16

where only the last two digits are incorrect.

This ultra-precise answer (correct to 37 decimals) could be converted to a 16-bit posit and it would still have far more accuracy than the 128-bit float result.

9 The Quadratic Formula

We can trot out the example used in The End of Error: Unum Computing: the use of the quadratic formula

x = (-b ± √(b² - 4ac)) / (2a)

when b is quite a bit larger than a and c, causing loss of significant digits in one of the two roots. The coefficients used were a = 3, b = 100, c = 2.

The roots, correct to nine decimals, are

r₁ = (1/6)(-100 + √9976) = -0.0200120144⋯
r₂ = (1/6)(-100 - √9976) = -33.3133213⋯

The problem is that when b² is much larger than ac and the square root is inexact, one root will have its relative inaccuracy magnified by the subtraction of similar quantities. Traditional numerical analysis texts teach that programmers should use a trick that rearranges the algebra.

Ideally, a number system should be robust enough that the unwary can use it without memorizing a trick for every situation.

First, find the roots using single-precision floats:

setfloatenv[{32, 8}]
quadf = {(-100 + Sqrt[100^2 - 4 × 2 × 3])/6, (-100 - Sqrt[100^2 - 4 × 2 × 3])/6}; (* each operation rounded to the nearest float *)
Print["Computed roots for floats: ", N[quadf, 9]]
Print["Absolute errors for floats: ",
 N[{Abs[quadf〚1〛 - (1/6) (-100 + Sqrt[9976])], Abs[quadf〚2〛 - (1/6) (-100 - Sqrt[9976])]}]]

Computed roots for floats: {-0.0200119019, -33.3133202}
Absolute errors for floats: {1.12566 × 10^-7, 1.159 × 10^-6}

Here is the same calculation using posits:

setpositenv[{32, 2}]
quadp = {(-100 + Sqrt[100^2 - 4 × 2 × 3])/6, (-100 - Sqrt[100^2 - 4 × 2 × 3])/6}; (* each operation rounded to the nearest posit *)
Print["Computed roots for posits: ", N[quadp, 9]]
Print["Absolute errors for posits: ",
 N[{Abs[quadp〚1〛 - (1/6) (-100 + Sqrt[9976])], Abs[quadp〚2〛 - (1/6) (-100 - Sqrt[9976])]}]]

Computed roots for posits: {-0.0200120609, -33.3133216}
Absolute errors for posits: {4.64572 × 10^-8, 2.71512 × 10^-7}

The comparison is easier to see using orange wrong-digit notation:

      Floats          Posits
r₁    -0.02001190⋯    -0.02001206⋯    posits get six digits right instead of four
r₂    -33.3133202⋯    -33.3133216⋯    posits get eight digits right instead of seven

Incidentally, if we did use the algebraic rearrangement trick, posits still beat floats. The trick is to write (-b + √(b² - 4ac)) / (2a) as 2c / (-b - √(b² - 4ac)) to avoid scraping off significant digits by subtracting similar numbers. Like most numerical tricks, it is far from obvious, has to be worked out with tedious algebra, and is prone to errors in coding it. Here is the smaller magnitude root, computed using the trick with floats:

quadf = (2 × 2)/(-100 - Sqrt[100^2 - 4 × 2 × 3]); N[quadf, 9]
N[Abs[quadf - (1/6) (-100 + Sqrt[9976])]]

-0.0200120136

8.07458 × 10^-10

When the trick is used with posits, the error is about 8 times smaller:

quadp = 4/(-100 - Sqrt[100^2 - 4 × 2 × 3]); N[quadp, 9]
N[Abs[quadp - (1/6) (-100 + Sqrt[9976])]]

-0.0200120143

1.08966 × 10^-10

That is,

      Floats           Posits
r₁    -0.0200120136⋯   -0.0200120143⋯   posits get eight digits right instead of seven.

10 Fused Operations and the Quire Data Type

10.1 An overview of “deferred rounding”

So far, we have shown how posits could be a drop-in replacement for floats. However, they can be far more powerful than that. After porting a code from floats to posits, if the improved accuracy is still not enough (like when trying to replace 64-bit floats with 32-bit posits), the quire is easy to use and can produce massive increases in answer quality. There is nothing like it in floating-point environments because the IEEE 754 committee has, for over thirty years, firmly and repeatedly rejected requests to put something like it into the Standard.

The most recent version (2008) of the IEEE 754 standard includes the fused multiply-add in its repertoire. “Fusing” means deferring the rounding until the last operation in a computation involving more than one operation, by performing all operations using exact integer arithmetic in a scratch area with a set size. It is not the same as extended-precision arithmetic, which can increase the size of integers until the computer runs out of memory. The posit environment supports the following fused operations, using a fixed-point scratch value called a quire.

Fused multiply-add                  (a×b) + c
Fused add-multiply                  (a + b)×c
Fused multiply-multiply-subtract    (a×b) - (c×d)
Fused sum                           ∑ aᵢ
Fused dot product (scalar product)  ∑ aᵢ bᵢ

(By the way, the dictionary tells us that “quire” was a medieval term for a booklet made from a fixed number of pieces of paper or parchment.) We need to find out how many bits of quire are needed as a function of nbits and es. Notice that all of the operations in the above list are subsets of the fused dot product in terms of their processor hardware requirements.


As with all unum environments, fusing of operations must be explicit, never covert. If a programmer writes

x := (a * b) + c;

where all variables are of the same type, a compiler designer might be tempted to “help out” by using the fused multiply-add. A likely result is that the programmer gets a different answer than that produced by another compiler that rounds after the multiply and again rounds after the add. Tracking that down can produce many wasted hours trying to figure out why. In a posit environment, you could request the fused multiply-add with a routine call:

x := FMA (a, b, c);

But what might be easier to read, and more flexible to use, is to declare a quire variable, like this:

posit a, b, c, x;
quire q;
q = (a * b) + c;
x = q;

Because the multiply-accumulation into the quire has no rounding error, this is as repeatable and reproducible as an integer multiply-add. It is more flexible, because you could also do things this way:

posit a, b, c, x;
quire q;
q = a * b;
(* Some other code can go here, like to compute c. *)
q += c;
x = q;
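In Mathematica terms, the effect of this deferred rounding is easy to emulate: accumulate with exact rationals, then round once. A minimal sketch, assuming the x2p conversion to the nearest posit used elsewhere in this document (quireFDP is a name invented here):

quireFDP[u_List, v_List] := x2p[u.v] (* exact dot product, one rounding at the end *)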

We go far beyond fusing just one multiply-add, however, and permit fusing of a bunch of them. At first, you might think this would require an extended-precision math library and an unpredictable amount of storage, but that is not the case. We simply have to put a limit on how many accumulates we can do without risk of overflow. It turns out that it can be quite a high limit, especially for 32-bit and 64-bit posits.

The quire approach was introduced by Ulrich Kulisch in the 1970s, at a time when transistors were too precious to think about putting a fixed-size register with hundreds of bits into a processor. Kulisch and his colleagues have developed a vast and rich set of techniques for doing large-scale calculations where the results are accurate to a single rounding error. He called the storage an “exact accumulator,” but that’s a mouthful and we prefer the simpler coined term, “quire.” All of the techniques Kulisch created are once again available in posit environments.

Perhaps the biggest surprise is this: Using a quire can make a code faster, not slower. And not by just a little bit: the most recent hardware tests by Koenig and Biancolin at the ASPIRE lab at Berkeley show that quire operations are about 3 to 6 times faster than rounding after every operation. If an approach makes code that much faster and also makes it much more accurate, why would anyone not want to adopt it?

10.2 Quire size

The smallest magnitude nonzero value that can arise in doing a dot product is minpos². Every other product is an integer multiple of minpos². If we have to perform the dot product of vectors {maxpos, minpos} and {maxpos, minpos} as an exact operation in a scratch area, we need an integer big enough to hold maxpos²/minpos². Recall that maxpos = useed^(nbits-2) and useed = 2^(2^es). Also, minpos = 1/maxpos. So

maxpos²/minpos² = useed^(2×2×(nbits-2)) = 2^((4 nbits - 8) 2^es)

As a fixed-point unsigned integer, (4 nbits - 8) 2^es + 2 bits can hold the sum of minpos² and maxpos². For example, for 8-bit posits with es = 0, minpos is 2^-6 = 1/64 and maxpos is 2^6 = 64. So you need 12 bits to the right of the binary point, and 13 bits to the left of the binary point, to hold maxpos² + minpos² = 4096 + 1/4096 as a fixed-point integer:

4096 + 1/4096 = 1000000000000.000000000001 (in binary)

We need one more bit for the sign. But we also need to allow for some carry bits in the worst case where the numbers added together were all maxpos² (or all -maxpos²). If we accommodate at least a billion terms in the sum, and sometimes far more, there shouldn't be much complaining. So add 30 bits to the integer size (2^30 is a little over a billion: 1 073 741 824). And for good measure, round that up to the nearest power of two so the hardware design is clean. Here is our definition for the size of the quire, quiresize, for posit sizes from 4 to 256. (The reason for including sizes 4 and 8 will become more clear when we define rules for the valid type.)

quiresize[n_, e_] := 2^Ceiling[Log2[(4 n - 8) 2^e + 1 + 30]]
quireexcess[n_, e_] := quiresize[n, e] - ((4 n - 8) 2^e + 2)
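For example, these definitions reproduce the 16-bit row of the table below:

{quiresize[16, 1], quireexcess[16, 1]}

{256, 142}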

With these we can generate the table for the standard posit sizes:

Posit size, nbits   Posit es value   Quire size (bits)   Excess bits for carries
8                   0                64                  38
16                  1                256                 142
32                  2                512                 30
64                  3                2048                62
128                 4                8192                126
256                 5                32 768              254

The quire sizes are large if you think of them as registers, but far smaller than what it takes to do extended-precision arithmetic like Mathematica uses. A modern processor core might have 64 general-purpose 64-bit registers; the quire for 64-bit posits would fit in 32 of those.

Earlier, we mentioned that posits as small as 8 bits suffice for Machine Learning. That is true even if the multiply-accumulation is stored with only 8 bits. It seems likely that the quire for 8-bit posits, being only 64 bits in size and capable of doing a quarter-trillion accumulates without the possibility of overflow, could be quite useful in such applications, perhaps improving the rate of convergence.

10.3 What quires can and cannot do

A programmer might think, “These quire variables are so safe and accurate, I think I’ll make all my variables of quire type.” Not so fast. The list of things you can reasonably ask hardware to do is a short one, and it does not include multiplying the quire by anything. It’s for accumulation. Assume there is only one quire register in a core. The assembler instructions should include

◼ Clear the quire
◼ Load the quire from memory
◼ Store the quire to memory
◼ Add the product of two posits to the quire
◼ Subtract the product of two posits from the quire
◼ Add a quire stored in memory to the quire
◼ Subtract a quire stored in memory from the quire
◼ Convert a quire into a posit
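For example, the fused multiply-multiply-subtract a×d - b×c needs only a few of these instructions; a sketch in the declaration style used earlier:

posit a, b, c, d, x;
quire q;
q = a * d;   (* add the product a×d to the cleared quire *)
q -= b * c;  (* subtract the product b×c, still with no rounding *)
x = q;       (* convert the quire to a posit: the only rounding *)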

From those, it is easy to build up expressions like the a×d - b×c sketched above. Simple summation is a special case of multiplying by 1, as is done with fused multiply-add hardware. If you want to multiply the number in the quire by something, you have to convert it back to a posit. I have to admit to being tempted to add one more assembler instruction to the list because it would be so incredibly useful:

◼ Convert the square root of a quire into a posit

That would let you evaluate things like r = √(x² + y² + z²) with only one rounding error, and the result r would be about the same magnitude as the largest of x, y, and z, instead of having to temporarily store r² in a posit, which might land where the magnitude is so large or small that accuracy is lost. Physics simulations are full of such calculations of distance, and the square root algorithm is a straightforward one to build into hardware; the concern is that for a large quire, the operation could take many clock cycles to determine the correct rounding, making context switching coarse-grained and difficult.
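In the same pseudocode style, the distance calculation would look like this, assuming the proposed square-root instruction:

posit x, y, z, r;
quire q;
q = x * x;
q += y * y;
q += z * z;
r = sqrt(q); (* round √(quire) directly to a posit: one rounding error in all *)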

We won’t be seeing the quire explicitly in the sections that follow, because we have the luxury of operating in Mathematica, for which quire exact operations are a small subset of what it can do with its built-in extended-precision arithmetic. If we don’t put an overbar on a result, it is assumed exact and not rounded. We simply have to restrict the operations where we defer rounding to the above list.

11 Fast Fourier Transforms (incomplete section)

Here is a garden-variety complex Fast Fourier Transform, in both float form and posit form. The float form uses fused multiply-add, and the posit form goes slightly farther in fusing complex operations of the form a×b - c×d.

(* Conventional power-of-two Fast Fourier Transform with floats. *)
cfftf[rr_List, n_Integer, iflg_Integer] :=
 Module[{gg = rr, k = n/2, th = If[iflg ≥ 0, -π, π], tw, ww, tr, ti, i, j},
  While[k ≥ 1,
   ww = {-2 (Sin[th/(2 k)])^2, Sin[th/k]}; tw = {1, 0};
   For[j = 0, j < k, j++,
    For[i = 1, i ≤ n, i += 2 k,
     {tr, ti} = {gg〚i+j,1〛 - gg〚i+j+k,1〛, gg〚i+j,2〛 - gg〚i+j+k,2〛};
     {gg〚i+j,1〛, gg〚i+j,2〛} = {gg〚i+j,1〛 + gg〚i+j+k,1〛, gg〚i+j,2〛 + gg〚i+j+k,2〛};
     {gg〚i+j+k,1〛, gg〚i+j+k,2〛} = {tw〚1〛*tr - tw〚2〛*ti, tw〚1〛*ti + tw〚2〛*tr}];
    tr = tw〚1〛*ww〚1〛 - tw〚2〛*ww〚2〛 + tw〚1〛;
    tw〚2〛 = tw〚1〛*ww〚2〛 + tw〚2〛*ww〚1〛 + tw〚2〛;
    tw〚1〛 = tr];
   k = k/2];
  (* Bit-reversal reordering *)
  For[i = j = 0, i < n - 1, i++,
   If[i < j,
    {tr, ti} = {gg〚j+1,1〛, gg〚j+1,2〛};
    {gg〚j+1,1〛, gg〚j+1,2〛} = {gg〚i+1,1〛, gg〚i+1,2〛};
    {gg〚i+1,1〛, gg〚i+1,2〛} = {tr, ti}];
   k = n/2;
   While[k ≤ j, j = j - k; k = k/2];
   j = j + k];
  gg/Sqrt[n] (* scale by 1/√n so a forward-inverse pair returns the original data *)]

(* Conventional power-of-two Fast Fourier Transform with posits. *)
cfftp[rr_List, n_Integer, iflg_Integer] :=
 Module[{flg = 0, gg = rr, k = n/2, th = If[iflg ≥ 0, -π, π], tw, ww, tr, ti, i, j},
  While[k ≥ 1,
   ww = {-2 (Sin[th/(2 k)])^2, Sin[th/k]}; tw = {1, 0};
   For[j = 0, j < k, j++,
    For[i = 1, i ≤ n, i += 2 k,
     {tr, ti} = {gg〚i+j,1〛 - gg〚i+j+k,1〛, gg〚i+j,2〛 - gg〚i+j+k,2〛};
     {gg〚i+j,1〛, gg〚i+j,2〛} = {gg〚i+j,1〛 + gg〚i+j+k,1〛, gg〚i+j,2〛 + gg〚i+j+k,2〛};
     {gg〚i+j+k,1〛, gg〚i+j+k,2〛} = {tw〚1〛*tr - tw〚2〛*ti, tw〚1〛*ti + tw〚2〛*tr}];
    tr = tw〚1〛*ww〚1〛 - tw〚2〛*ww〚2〛 + tw〚1〛*1; (* Short fused dot product *)
    tw〚2〛 = tw〚1〛*ww〚2〛 + tw〚2〛*ww〚1〛 + tw〚2〛*1;
    tw〚1〛 = tr];
   If[flg == 0, gg /= 2; flg = 1, flg = 0]; (* exact halving on alternate stages spreads the 1/√n scaling *)
   k = k/2];
  (* Bit-reversal reordering *)
  For[i = j = 0, i < n - 1, i++,
   If[i < j,
    {tr, ti} = {gg〚j+1,1〛, gg〚j+1,2〛};
    {gg〚j+1,1〛, gg〚j+1,2〛} = {gg〚i+1,1〛, gg〚i+1,2〛};
    {gg〚i+1,1〛, gg〚i+1,2〛} = {tr, ti}];
   k = n/2;
   While[k ≤ j, j = j - k; k = k/2];
   j = j + k];
  gg]

First, try using half-precision floats:

setfloatenv[{16, 5}]

For example, just put a 1 in the second entry, and see if we can do a forward FFT and an inverse FFT and get something similar to what we started with:

n = 16; rr = Table[{0, 0}, n]; rr〚2〛 = {1, 0}; rr

{{0, 0}, {1, 0}, {0, 0}, {0, 0}, {0, 0}, {0, 0}, {0, 0}, {0, 0}, {0, 0}, {0, 0}, {0, 0}, {0, 0}, {0, 0}, {0, 0}, {0, 0}, {0, 0}}

rrf = cfftf[cfftf[rr, n, 1], n, -1]

{{0, 0}, {1, -985/8388608}, {0, 0}, {-1333/16777216, 819/16777216}, {0, 0}, {-1/32768, -1/32768}, {0, 0}, {-437/16777216, 821/67108864}, {0, 0}, {0, 473/8388608}, {0, 0}, {437/16777216, 821/67108864}, {0, 0}, {1/32768, -1/32768}, {0, 0}, {1333/16777216, 819/16777216}}

The rounding errors hit the points where the sines and cosines are not special angles. Here is the total error:

N[Total[Total[Abs[rr - rrf]]]]

0.000628978

Now do the same thing with posits:

setpositenv[{16, 2}];
rrp = cfftp[cfftp[rr, n, 1], n, -1]
N[Total[Total[Abs[rr - rrp]]]]

{{0, 0}, {1, -261/8388608}, {0, 0}, {0, 49/16777216}, {0, 0}, {69/2097152, 61/1048576}, {0, 0}, {0, 49/16777216}, {0, 0}, {0, 53/2097152}, {0, 0}, {0, 49/16777216}, {0, 0}, {187/2097152, -67/1048576}, {0, 0}, {0, 49/16777216}}

0.000312209

The usual benchmark is for 1024 points. Fill all the values with –1, 0, or 1 chosen randomly (seismic explorers and radio astronomers often use such low-precision inputs) and measure the total error:

SeedRandom[141213562];
n = 1024; rr = Table[{Random[], Random[]}, n];
rrf = cfftf[cfftf[rr, n, 1], n, -1];
N[Total[Total[Abs[rr - rrf]]]]

1.74042

Here is the 1024-point case done with posits:

setpositenv[{16, 1}]

SeedRandom[141213562];
n = 1024; rr = Table[{Random[], Random[]}, n];
rrp = cfftp[cfftp[rr, n, 1], n, -1];
N[Total[Total[Abs[rr - rrp]]]]

0.617407
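For the same random inputs, the 16-bit posit round trip is therefore more accurate than the 16-bit float round trip by roughly a factor of

1.74042/0.617407

2.81892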

12 Linear Algebra with Posits: A New Paradigm

12.1 Gaussian elimination is an iterative method

One thing floats do well is to guess answers x to a system of linear equations, A x = b where A is an n by n matrix and b is a given vector of n values. (At least, they do well under certain conditions, and people have learned to live with those restrictions.) The LINPACK benchmark fills A with random numbers between -1 and 1, and assigns the sum of each row to the values of b, so that the answer should be x = {1, 1, …, 1}. We use both the underbar and overbar operator to make sure each number in A is in the common vocabulary of the two number systems. That benchmark specifies 64-bit representation of numbers in a way that looks like some kind of floating-point, not necessarily IEEE 754 Standard; therefore, 64-bit posits can certainly be used on the LINPACK benchmark!

There’s a good reason for requiring 64-bit precision, which we will now explore. What would happen, say, if we tried using 16-bit floats and posits? Also, what happens if we recognize that so-called “direct solvers” are simply iterative solvers that usually stop after one iteration? (Unless Gaussian elimination is performed with infinite precision, it will only approximate the underlying mathematics and in general must be applied iteratively to correct the residual error.)

We also do something new, which is to correct an error in the design of the benchmark rules. The driver assumes the answer is x = {1, 1, …, 1}, but that incorrectly assumes that there was no rounding in computing b! The values in A have a range of magnitudes, so the exact sum of a row will in general contain more significant bits than fit exactly into the fraction part of a float or a posit. Therefore, we tweak the smallest value in each row of A to make sure that there is no rounding in the right-hand side b. Here is an example using 16-bit IEEE floats. The powQ helper function returns True if a number a can be exactly represented in p bits, and False if it cannot.

powQ[a_, p_] := Module[{abits = RealDigits[a, 2], len, nd},
  nd = abits〚2〛; len = Length[abits〚1〛];
  Which[
   p > len - nd, False,
   abits〚1, p + nd〛 == 0, False,
   True, True]]
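For instance (an illustrative check, not part of the original environment): 5/8 = 0.101 in binary, whose last nonzero bit has weight 2^-3:

{powQ[5/8, 3], powQ[5/8, 2]}

{True, False}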

Set up the environments. Create the A matrix one row at a time, tweaking it in the least-significant bits until it sums to a number that can be represented exactly; if the tweaking fails, which happens in certain uncommon cases, it simply generates a fresh row and tries again. This may seem reminiscent of reverse error analysis, where the answer produced by arithmetic that rounds is considered to be the exact solution to a problem that is close to the one specified; however, that is not really what this is. This is completing the design goal of LINPACK, which is to create an idealized problem where both the input values and the correct answer are expressible in the numerical vocabulary of the computing environment.

setfloatenv[{16, 5}]
setpositenv[{16, 2}]
SeedRandom[141213562];
n = 100; b = Table[0, n]; a = Table[b, n];
Module[{abits, at, bbits, bt, (*den,*) fs, i, j, pow, row, temp},
 For[i = 1, i ≤ n, i++,
  Label[L1];
  at = Table[RandomReal[{-1, 1}], {j, 1, n}]; (* each entry rounded to both the float and posit vocabularies *)
  bt = Sum[at〚j〛, {j, 1, n}];
  fs = Min[nfbits - esize, 1 + Length[fractionbits[x2p[bt]]]];
  bbits = RealDigits[bt, 2];
  For[j = Length[bbits〚1〛], j ≥ fs + 1, j--,
   pow = j - bbits〚2〛;
   If[bbits〚1,j〛 == 1,
    temp = 1;
    While[temp ≤ n && ¬ powQ[at〚temp〛, pow], temp++];
    If[temp > n, Goto[L1]]; (* Failed; go get another row. *)
    at〚temp〛 -= 2^-pow]];
  a〚i〛 = at;
  b〚i〛 = Sum[at〚j〛, {j, 1, n}]]]

Check that the row sums equal the right-hand side, using exact arithmetic (easily done using a quire):

Table[Sum[a〚i,k〛, {k, 1, n}] - b〚i〛, {i, 1, n}]

{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}

We solve the system LINPACK-style: Gaussian elimination with partial pivoting, rounding after every operation. Here’s the algorithm, about as terse as it is possible to make it without making calls to BLAS-like routines.

linsolver[a_, b_] :=
 Module[{aa = Transpose[Join[Transpose[a], {b}]], piv, i, j, k, n = Length[b], temp},
  For[k = 1, k ≤ n, k++,
   piv = k - 1 + (Position[aa〚k;;n,k〛, Max[aa〚k;;n,k〛]])〚1,1〛;
   For[j = 1, j ≤ n + 1, j++,
    temp = aa〚piv,j〛; aa〚piv,j〛 = aa〚k,j〛; aa〚k,j〛 = temp];
   temp = 1/aa〚k,k〛;
   For[j = k, j ≤ n + 1, j++, aa〚k,j〛 *= temp];
   For[j = k + 1, j ≤ n + 1, j++,
    temp = aa〚k,j〛;
    For[i = k + 1, i ≤ n, i++, aa〚i,j〛 -= aa〚i,k〛*temp]]];
  (* Backsolve *)
  For[j = n, j > 1, j--,
   temp = aa〚j,n+1〛;
   For[i = j - 1, i ≥ 1, i--, aa〚i,n+1〛 -= aa〚i,j〛*temp]];
  aa〚1;;n,n+1〛]

That solver will use exact rational-number arithmetic in Mathematica, so test it on our sample matrix:

linsolver[a, b]

{1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1}

Had we not “purified” the input matrix, we would not have gotten that answer but instead seen a mess of rational numbers close to 1 with huge integers for their numerator and denominator.

For the IEEE float version of the solver, we help its accuracy as much as we can by using the fused multiply-add that is now part of the IEEE 754 Standard (2008). There are only four places in the following code where we need to round to the nearest float, marked by comments below.

linsolverf[a_, b_] := Module[
  {aa = Transpose[Join[Transpose[a], {b}]], piv, i, j, k, nn = Length[b], temp},
  For[k = 1, k ≤ nn, k++,
   piv = k - 1 + (Position[Abs[aa〚k;;nn,k〛], Max[Abs[aa〚k;;nn,k〛]]])〚1,1〛;
   For[j = 1, j ≤ nn + 1, j++,
    temp = aa〚piv,j〛; aa〚piv,j〛 = aa〚k,j〛; aa〚k,j〛 = temp];
   temp = 1 / aa〚k,k〛;                                     (* round 1 *)
   For[j = k, j ≤ nn + 1, j++, aa〚k,j〛 = temp * aa〚k,j〛];   (* round 2 *)
   For[j = k + 1, j ≤ nn + 1, j++,
    temp = aa〚k,j〛;
    For[i = k + 1, i ≤ nn, i++,
     aa〚i,j〛 = aa〚i,j〛 - aa〚i,k〛 * temp]]];   (* round 3: fused multiply-add *)
  (* Backsolve *)
  For[j = nn, j > 1, j--,
   temp = aa〚j,nn+1〛;
   For[i = j - 1, i ≥ 1, i--,
    aa〚i,nn+1〛 = aa〚i,nn+1〛 - aa〚i,j〛 * temp]];   (* round 4: fused multiply-add *)
  aa〚1;;nn,nn+1〛]

We should get a list of rational numbers (floats) that are pretty close to 1. We also compute the average deviation of the answer from all 1s:

xf = linsolverf[a, b]

Print"Average float error: ", NSumAbs[xf - 1]〚i〛, {i, 1, n}  n

{2029/2048, 1919/2048, 245/256, 571/512, 2041/2048, 271/256, 551/512, 1047/1024, 1999/2048, 1829/2048, 241/256, 267/256, 1943/2048, 1979/2048, 1969/2048, 271/256, 1017/1024, 847/1024, 541/512, 1975/2048, 573/512, 2017/2048, 15/16, 2035/2048, 121/128, 1033/1024, 65/64, 143/128, 515/512, 957/1024, 1061/1024, 1941/2048, 1061/1024, 921/1024, 1961/2048, 1123/1024, 539/512, 1087/1024, 229/256, 1013/1024, 1167/1024, 1995/2048, 111/128, 961/1024, 259/256, 2019/2048, 1845/2048, 549/512, 1937/2048, 491/512, 1009/1024, 251/256, 1859/2048, 2025/2048, 239/256, 1123/1024, 283/256, 1635/2048, 1083/1024, 983/1024, 1983/2048, 129/128, 1075/1024, 517/512, 527/512, 1051/1024, 475/512, 529/512, 991/1024, 1837/2048, 517/512, 1945/2048, 1857/2048, 539/512, 1045/1024, 937/1024, 273/256, 2021/2048, 1959/2048, 1045/1024, 1089/1024, 545/512, 2025/2048, 1885/2048, 2015/2048, 1033/1024, 1119/1024, 1005/1024, 1011/1024, 535/512, 35/32, 1877/2048, 1055/1024, 939/1024, 479/512, 133/128, 123/128, 953/1024, 1029/1024, 1973/2048}

Average float error: 0.0530811

Half-precision IEEE floats have decimal accuracy that wobbles between about 3 and 3.3 decimals, but we wound up with only about 1 decimal of accuracy (worst case). The posit version is identical to the float version, for comparison purposes, and also includes the fused multiply-add. (It is possible to rearrange the loops so that the innermost loop is a dot product, and thus get another order-n reduction in the number of roundings undergone by each result value.)
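The fused multiply-add itself is easy to model in this rational-arithmetic prototype; here is a minimal sketch (positFMA is a hypothetical helper name; x2p and p2x are this document's conversion functions):

positFMA[a_, b_, c_] := p2x[x2p[a b + c]]   (* exact multiply-add, then one rounding to the nearest posit *)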

linsolverp[a_, b_] := Module[
  {aa = Transpose[Join[Transpose[a], {b}]], piv, i, j, k, nn = Length[b], temp},
  For[k = 1, k ≤ nn, k++,
   piv = k - 1 + (Position[Abs[aa〚k;;nn,k〛], Max[Abs[aa〚k;;nn,k〛]]])〚1,1〛;
   For[j = 1, j ≤ nn + 1, j++,
    temp = aa〚piv,j〛; aa〚piv,j〛 = aa〚k,j〛; aa〚k,j〛 = temp];
   temp = 1 / aa〚k,k〛;
   For[j = k, j ≤ nn + 1, j++, aa〚k,j〛 = temp * aa〚k,j〛];
   For[j = k + 1, j ≤ nn + 1, j++,
    temp = aa〚k,j〛;
    For[i = k + 1, i ≤ nn, i++,
     aa〚i,j〛 = aa〚i,j〛 - aa〚i,k〛 * temp]]];   (* fused multiply-add *)
  (* Backsolve *)
  For[j = nn, j > 1, j--,
   temp = aa〚j,nn+1〛;
   For[i = j - 1, i ≥ 1, i--,
    aa〚i,nn+1〛 = aa〚i,nn+1〛 - aa〚i,j〛 * temp]];   (* fused multiply-add *)
  aa〚1;;nn,nn+1〛]

Again we can produce the list of 100 output values, rational numbers close to 1, and find their average deviation from 1. The average float error is about 50 percent greater than the average posit error.

xp = linsolverp[a, b]

Print"Average posit error: ", NSumAbs[xp - 1]〚i〛, {i, 1, n}  n ListPlot[{Sort[Abs[xp - 1]], Sort[Abs[xf - 1]]}, PlotRange → {All, All}, PlotStyle → { , }, AxesLabel → {"", "Sorted errors"}] 2015 2137 1057 1875 4027 3973 2031 1029 4017 273 1981  , , , , , , , , , , , 2048 2048 1024 2048 4096 4096 2048 1024 4096 256 2048 4057 263 553 2141 989 1989 1153 3835 511 3937 1037 , , , , , , , , , , , 4096 256 512 2048 1024 2048 1024 4096 512 4096 1024 2147 1019 985 2095 1993 245 537 1067 519 3913 1933 , , , , , , , , , , , 2048 1024 1024 2048 2048 256 512 1024 512 4096 2048 2089 4055 969 3991 3937 2109 1041 3625 2139 2235 1051 , , , , , , , , , , , 2048 4096 1024 4096 4096 2048 1024 4096 2048 2048 1024 4095 261 2179 499 3941 4071 4041 263 2065 1931 2207 , , , , , , , , , , , 4096 256 2048 512 4096 4096 4096 256 2048 2048 2048 3821 493 2319 513 1065 261 3957 263 261 3983 3877 , , , , , , , , , , , 4096 512 2048 512 1024 256 4096 256 256 4096 4096 2099 991 2037 2171 4003 1051 2175 3893 2103 261 1937 , , , , , , , , , , , 2048 1024 2048 2048 4096 1024 2048 4096 2048 256 2048 3995 2099 3879 2019 4053 3967 535 4043 537 1937 1963 , , , , , , , , , , , 4096 2048 4096 2048 4096 4096 512 4096 512 2048 2048 2015 511 251 67 2055 2233 2169 257 2169 263 3949 3975 , , , , , , , , , , ,  2048 512 256 64 2048 2048 2048 256 2048 256 4096 4096

Average posit error: 0.0361914

[Plot: "Sorted errors" — the 100 sorted values of Abs[xp - 1] and Abs[xf - 1]; vertical axis runs from 0 to 0.20.]

Using 16-bit numbers for a matrix as large as 100-by-100 is clearly risky, since we only have about three decimals of accuracy but need to accumulate sums of up to 100 numbers. We can compute the residual, which is the vector of discrepancies b - Ax. In the case of posits, the residual can be calculated with the quire, rounding only once. For floats, it needs to be spelled out with a matrix-vector multiply loop using fused multiply-adds.
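In this prototype, where values are exact rationals, the quire residual can be modeled as the exact residual with one rounding per component; a minimal sketch (quireResidual is a hypothetical helper name; x2p and p2x are the document's conversion functions):

quireResidual[a_, x_, b_] := p2x[x2p[#]] & /@ (b - a.x)   (* exact b - A.x, one rounding per component *)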

rp = b - a.xp;
rf = Table[1, n];
For[i = 1, i ≤ n, i++,
 s = 0;
 For[j = 1, j ≤ n, j++,
  s = s + a〚i,j〛 × xf〚j〛];   (* fused multiply-add *)
 rf〚i〛 = b〚i〛 - s]

If we now solve for the residual on the right-hand side, we can use the solutions to correct the result. This is really a form of Newton-Raphson iteration. Incidentally, we're being lazy here by calling the solver again, since the solver factored the matrix into lower-upper form and all we have to do is use that to backsolve, an order n² amount of work instead of order n³ to do a linear solve from scratch.
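In raw Mathematica (outside the rounding environments), that factor-once, backsolve-many pattern can be sketched with the one-argument form of LinearSolve, which returns a LinearSolveFunction holding the factors; a minimal sketch, not the document's posit solver:

lsf = LinearSolve[N[a]];        (* factor once: order n³ work *)
xs = lsf[N[b]];                 (* backsolve: order n² work *)
xs = xs + lsf[N[b - a.xs]];     (* one residual-correction step reusing the factors *)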

x1p = linsolverp[a, rp]; x1f = linsolverf[a, rf];

The corrected posit form is now quite close, with several cases of hitting 1 exactly in the answer vector.

xp = xp + x1p

{1025/1024, 1023/1024, 2049/2048, 257/256, 2049/2048, 1025/1024, 2053/2048, 2051/2048, 2049/2048, 4085/4096, …}
(100 rationals, each within an ULP or two of 1, with several components exactly 1; the flattened display is abridged here.)

The float residual correction improves the result only by about a factor of 2, and still never lands exactly on the correct answer, 1:

xf = xf + x1f

{1023/1024, 261/256, 2037/2048, 241/256, 1035/1024, 1999/2048, 1989/2048, 989/1024, 2031/2048, 1075/1024, …}
(100 rationals close to 1; the flattened display is abridged here.)

Here are the average deviations from an all-1s correct answer, and plots of the errors, sorted:

Print"Average float error: ", NSumAbs[xf - 1]〚i〛, {i, 1, n}  n;

Print"Average posit error: ", NSumAbs[xp - 1]〚i〛, {i, 1, n}  n; ListPlot[{Sort[Abs[xp - 1]], Sort[Abs[xf - 1]]}, PlotRange → {All, All}, PlotStyle → { , }, AxesLabel → {"", "Sorted errors"}]

Average float error: 0.0231592
Average posit error: 0.00112549

[Plot: "Sorted errors" after one residual correction, comparing Sort[Abs[xp - 1]] with Sort[Abs[xf - 1]]; vertical axis runs from 0 to 0.08.]

Another iteration does not improve the float result. But look what it does to the posit result; it nails it!

rp = b - a.xp;
x1p = linsolverp[a, rp];
xp = xp + x1p

{1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1}

The posit answer is infinitely accurate. The correction technique was of limited value for floats because you really need a quire to do residual correction properly. Since it only improves the answer slightly, most people never bother with more than one iteration of Gaussian elimination, and refer to it as a “direct method.” But it is really nothing of the kind; it is a guessing method that usually progresses toward the solution.

While we're at it, we can reproduce the original LINPACK benchmark, using 64-bit IEEE floats and the same 100-by-100 matrix. Let's see how well that works, where we display the answer in 6-point type to get a compact display:

setfloatenv[{64, 11}]
xf = linsolverf[a, b];
Style[xf, 6]

{9007199254740915/9007199254740992, 1125899906842643/1125899906842624, 1, 4503599627370305/4503599627370496, 9007199254740933/9007199254740992, 2251799813685211/2251799813685248, …, 140737488355325/140737488355328}
(100 rationals whose numerators differ from their power-of-two denominators by small integers; the flattened display is abridged here.)

The answer is close, but misses the correct value, 1, for nearly every entry. To make it easier to read, here are the first few entries of the answer vector expressed as decimals, to 54 digits:

Table[N[xf〚i〛, 54], {i, 1, 10}] // TableForm

0.999999999999991451282710386294638738036155700683593750
1.00000000000001687538997430237941443920135498046875000
1.00000000000000000000000000000000000000000000000000000
0.999999999999957589480459319020155817270278930664062500
0.99999999999999344968415471157641150057315826416015625
0.999999999999983568699235547683201730251312255859375000
0.999999999999981903364698609948391094803810119628906250
0.999999999999989896970475911075482144951820373535156250
0.999999999999987121412914348184131085872650146484375000
1.00000000000002975397705995419528335332870483398437500

The choice seems pretty clear: use 64-bit floats and get a wrong answer for essentially every entry in the answer vector, or use 16-bit posits and get the correct answer for every single entry. Despite the use of the residual correction, the posits would almost certainly reach the solution much faster, and would certainly use only 1/4 the storage while getting there.

The limit of using 16-bit posits is that they do run out of dynamic range for larger n, when the matrices have random elements like this. If the matrix is well-posed, as often is true for the linear systems needed for actual physics simulations, then 16-bit posits should be quite capable of producing answers that are reliable to about three decimals of accuracy, and that's often sufficient for scientific and engineering purposes.

12.2 Bailey's Numerical Nightmare, Redux

Recall this example from The End of Error: a 2-by-2 linear system David Bailey came up with whose condition number is about as bad as it gets. The matrix is only one ULP away from being singular! He posed the problem as

[ 0.25510582  0.52746197 ] [ x ]   [ 0.79981812 ]
[ 0.80143857  1.65707065 ] [ y ] = [ 2.51270273 ]

The answer is x = -1, y = 2. Even though the values are only 8 and 9 decimals long, you get into trouble using double-precision floats that are capable of about 16 decimals of accuracy. Part of Bailey's point was that there is rounding error in converting the decimals to binary representation, which is magnified by the ill-conditioning of the matrix. Here is a way to express the problem that removes the rounding error from converting decimals to binary, where we use variables

[ a  b ] [ x ]   [ u ]
[ c  d ] [ y ] = [ v ]

to represent the system:

{{a, b}, {c, d}} = {{25510582, 52746197}, {80143857, 165707065}} / 2^26;
{u, v} = {79981812, 251270273} / 2^26;
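As a quick sanity check in raw Mathematica (Rationalize with tolerance 0 recovers the exact value of a machine number):

(* Each value is an integer below 2^27 divided by 2^26, so conversion
   to a 64-bit float is exact; every value survives the round trip. *)
And @@ (Rationalize[N[#], 0] ⩵ # & /@ Flatten[{a, b, c, d, u, v}])
(* → True *)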

All these input values are now exactly expressible with double-precision IEEE floats. Mathematica can confirm the exact answer:

LinearSolve[{{a, b}, {c, d}}, {u, v}]

{-1, 2}

However, that is still not enough to save floats from terrible rounding errors. Since the system is only 2-by-2, we can write the solution in compact form using Cramer’s rule:

setfloatenv[{64, 11}];
det = a d - b c;
{x, y} = {(u d - b v) / det, (a v - u c) / det}

{0, 2}

Whoops. While the determinant is evaluated correctly at this precision, u d - b v is not, and it underflows to zero, an infinite decimal error in the value for x. If we use 64-bit posits (es = 3), we get the correct value (-1, 2). But posits have accuracy to spare for such a problem; we can turn the nbits value all the way down to 59 bits and still get the correct result:

setpositenv[{59, 3}];
det = a d - b c;
{x, y} = {(u d - b v) / det, (a v - u c) / det}

{-1, 2}

Check the condition number of this matrix to see why “Bailey’s Numerical Nightmare” is a good nickname for the test case:

Eigenvalues[{{a, b}, {c, d}}]
Print["Condition number is ", N[%〚1〛 / %〚2〛]]

{(191217647 + 21 √82911992118405) / 134217728,
 1 / (33554432 (191217647 + 21 √82911992118405))}

Condition number is 3.65642 × 10^16

Well-posed systems have a condition number of magnitude near 1. Even though the input values (and the correct answer) all have magnitude near 1, you can still have a terrible condition number that produces wildly wrong results. That can still happen with posits, but posits do provide a little more protection than floats do! Here's what happens if we ask Mathematica to solve the system expressed using decimal inputs as Bailey intended:

{{a, b}, {c, d}} = {{0.25510582, 0.52746197}, {0.80143857, 1.65707065}};
{u, v} = {0.79981812, 2.51270273};
LinearSolve[{{a, b}, {c, d}}, {u, v}]

LinearSolve: Result for LinearSolve of badly conditioned matrix {{0.255106, 0.527462}, {0.801439, 1.65707}} may contain significant numerical errors.

{0.272534, 1.38454}

12.4 The XSC approach

12.4.1 An interval arithmetic stand-in for a valid environment

In the 1980s, a group of mathematicians at the University of Karlsruhe, including U. Kulisch, S. Rump, and others, developed ways to use quire-type refinement of linear solvers to improve a wide range of operations, not just solutions to Ax = b. For example, they demonstrated evaluation of polynomials, rational functions, and numerical derivatives to within 0.5 ULP. The key idea is to write such calculations as a sequence of operations depending on + – × ÷ operations on previously-computed values, which can then be rewritten as a sparse system of equations that is lower-triangular. That in turn can be solved to very high precision using an exact dot product. While languages like PASCAL-XSC and ACRITH use the approach, the lack of a standard for a quire (and of fast hardware supporting it) has inhibited their wide adoption. The posit environment can make use of the techniques developed by the Karlsruhe group.

Much of their approach depends on interval bounds, which makes us wish for a complete valid environment. Mathematica's built-in interval arithmetic is not a full environment for valids, but for now we can lean on it; we'll need it later for iterative refinement that uses the quire. This should be regarded as an unfinished part of the prototype environment, for now. The x2pint function takes a real value or an interval with real endpoints and returns an interval that has posit endpoints and encloses the input argument.

x2pint[x_] := Module{e, f, hi, i, lo, p, pinf = BitShiftLeft[1, nbits - 1], xlo = Min[x], xhi = Max[x], y}, {e, y} = 2es-1, Abs[xlo]; lo = Which(* First, take care of the exception values: *) 0 ≤ xlo < minpos, 0, xlo < -maxpos  y ⩵ ∞, pinf, (* ±∞; 1 followed by all 0 bits *) -minpos < xlo < 0, Mod[-1, npat], (* -minpos; all 1 bits *) maxpos < xlo, pinf - 1, (* maxpos; 0 followed by all 1 bits *) True, If[y ≥ 1, (* Northeast quadrant: *) p = 1; i = 2; (* Shift in 1s from the right and scale down. *) While[y ≥ useed  i < nbits, {p, y, i} = {2 p + 1, y / useed, i + 1}]; p = 2 p; i++, (* Else, southeast quadrant: *) p = 0; i = 1; (* Shift in 0s from the right and scale up. *) While[y < 1  i ≤ nbits, {y, i} = {y * useed, i + 1}]; If[i ≥ nbits, p = 2; i = nbits + 1, p = 1; i++]]; (* Extract exponent bits: *) Posits4.nb 101

p = 1; i++]]; (* Extract exponent bits: *) While[e > 1 / 2  i ≤ nbits, p = 2 p; If[y ≥ 2e, y /= 2e; p++]; e /= 2; i++]; y--; (* Fraction bits; subtract the hidden bit *) While[y > 0  i ≤ nbits, y = 2 y; p = 2 p + ⌊y⌋; y -= ⌊y⌋; i++]; p *= 2nbits+1-i; i++; (* For low bound, round up if x is negative, down if positive *) i = BitAnd[p, 1]; p = ⌊p / 2⌋; p = Which[ i ⩵ 0  (y ⩵ 1  y ⩵ 0), p, (* If exact, leave it alone *) xlo > 0, p, (* inexact and x > 0, so truncate. *) True, p + 1 (* inexact and x < 0, so round up *)]; Mod[If[xlo < 0, npat - p, p], npat (* Simulate 2's complement *)]  ;

{e, y} = 2es-1, Abs[xhi]; hi = Which(* First, take care of the exception values: *) -minpos < xhi ≤ 0, 0, maxpos < xhi  y ⩵ ∞, pinf, 0 < xhi ≤ minpos, 1, (* minpos; all 0 bits except last bit = 1 *) xhi ≤ -maxpos, pinf + 1, (*-maxpos; 1000…0001 *) True, If[y ≥ 1, (* Northeast quadrant: *) p = 1; i = 2; (* Shift in 1s from the right and scale down. *) While[y ≥ useed  i < nbits, {p, y, i} = {2 p + 1, y / useed, i + 1}]; p = 2 p; i++, (* Else, southeast quadrant: *) p = 0; i = 1; (* Shift in 0s from the right and scale up. *) While[y < 1  i ≤ nbits, {y, i} = {y * useed, i + 1}]; If[i ≥ nbits, p = 2; i = nbits + 1, p = 1; i++]]; (* Extract exponent bits: *) While[e > 1 / 2  i ≤ nbits, p = 2 p; If[y ≥ 2e, y /= 2e; p++]; e /= 2; i++]; y--; (* Fraction bits; subtract the hidden bit *) While[y > 0  i ≤ nbits, y = 2 y; p = 2 p + ⌊y⌋; y -= ⌊y⌋; i++]; p *= 2nbits+1-i; i++; (* For high bound, round down if x is negative, up if positive *) i = BitAnd[p, 1]; p = ⌊p / 2⌋; p = Which[ i ⩵ 0  (y ⩵ 1  y ⩵ 0), p, (* If exact, leave it alone *) xhi < 0, p, (* inexact and x < 0, so truncate. *) True, p + 1 (* inexact and x > 0, so round up *)]; Mod[If[xhi < 0, npat - p, p], npat (* Simulate 2's complement *)]  ; Interval[{p2x[lo], p2x[hi]}] ; SetAttributes[x2pint, Listable]

Here's an example of how it finds a bound on the value of π, using a pair of posits as endpoints of a closed interval (a subset of what valids can represent):

setpositenv[{16, 1}]
x2pint[π]
N[%, 15]

Interval[{6433/2048, 3217/1024}]
Interval[{3.14111328125000, 3.14160156250000}]

12.4.2 The XSC-style solver for a basic block

Kulisch & Miranker showed that a basic block with only plus-minus-times-divide operations can be converted to a lower triangular system of equations, and then solved to within 0.5 ULP using the exact dot product. One of their examples:

Find X = (a + b)·c - d/e.

As a linear algebra problem, this expression is equivalent to solving six linear equations in six unknowns:

x1 = a
x2 = x1 + b
x3 = c x2
x4 = d
x5 = x4 / e
x6 = x3 - x5

which becomes the system

[  1                      ] [ x1 ]   [ a ]
[ -1   1                  ] [ x2 ]   [ b ]
[      c  -1              ] [ x3 ] = [ 0 ]
[              1          ] [ x4 ]   [ d ]
[             -1   e      ] [ x5 ]   [ 0 ]
[         -1       1   1  ] [ x6 ]   [ 0 ]

and the X we seek is x6. The matrix is lower triangular and sparse. Now let's see if we can pick posit values in {16, 1} that lead to rounding error, and use Mathematica to solve the system exactly as a basis for comparison:

setpositenv[{16, 1}];
A = {{1, 0, 0, 0, 0, 0}, {-1, 1, 0, 0, 0, 0}, {0, c, -1, 0, 0, 0},
     {0, 0, 0, 1, 0, 0}, {0, 0, 0, -1, e, 0}, {0, 0, -1, 0, 1, 1}};
B = {a, b, 0, d, 0, 0};
{a, b, c, d, e} = {32, 3/64, 1023/32768, 3329/256, 13};
xexact = LinearSolve[A, B]
N[xexact, 9]

{0.255106, 0.782568, 0.62718, 1.65707, 0.127467, 0.499713}

Here are the nearest posits to each of those values:

goal = xexact   (* each component rounds to the nearest posit *)
N[goal, 8]

{1045/4096, 6411/8192, 2569/4096, 6787/4096, 261/2048, 8187/16384}
{0.25512695, 0.78259277, 0.62719727, 1.6569824, 0.12744141, 0.49969482}

The input values were selected to cause trouble, starting with 32 + 3/64, which rounds up to 32 1/16 in this low-precision environment. The c value is selected to make the product approximately 1, and then adding a ratio (which also rounds) close to –1 magnifies the relative error. Now contrast the exact value with the computed value we get if we round after every operation:

(a + b) * c - d / e
N[%]

3/4096
0.000732422

That's a lot of rounding error for so few operations. Good! Let's see if we can fix it. Use the posit linear solver, and then compute the residual with quire-level precision (with a final rounding to the nearest posit):

x = linsolverp[A, B]
res1 = B - A.x
N[res1]

{1045/4096, 6411/8192, 2569/4096, 6787/4096, 2089/16384, 8187/16384}
{-11/524288, -1/262144, -3/1048576, 93/1048576, -9/16384, 0}
{-0.0000209808, -3.8147 × 10^-6, -2.86102 × 10^-6, 0.0000886917, -0.000549316, 0.}

Now solve using the residual as the right-hand side, add it as a correction to the first solution, and express that solution as posits:

adjust1 = linsolverp[A, res1];
x1 = x + adjust1

{1045/4096, 6411/8192, 2569/4096, 6787/4096, 261/2048, 8187/16384}

Notice that this agrees with all six xᵢ values in goal:

goal - x1

{0, 0, 0, 0, 0, 0}

This is why quires and linear solvers are about a lot more than just A x = b problems; they are the key to evaluating basic block sequences of + – × ÷ operations to within half an ULP, as if there were only one rounding. The setup of the sparse lower-triangular matrix and the iterative solution can be done automatically, as has been demonstrated since the 1980s with the XSC languages; S. Rump estimates that it increases the total work by a maximum of a factor of six, usually much less. Even if the number of operations goes up by a factor of six, they are using variables in registers and the closest level of cache, not main memory. If the technique allows low-precision arithmetic to get satisfactory results, the savings of memory bandwidth and storage (and the corresponding savings of power and energy costs) should be well worth the extra work, especially if the programmer can specify where to apply the technique and where not to.

12.4.3 Associative Law for Multiplication?

Experiment: see if the quire can get us the associative property of multiplication. Use a really low-precision posit environment, and a simple case where (u×v)×w ≠ u×(v×w) after rounding.

setpositenv[{8, 0}]
u = 35/32; v = 15/16; w = 15/16;

Here is the correct product, as a fraction and as an exact decimal:

pex = u v w
N[pex, 13]

7875/8192
0.9613037109375

Notice that this is not even expressible with a quire, which stores integer multiples of 1/4096 for the {8, 0} posit environment. Here is the nearest posit to the correct product:

pex   (* rounded to the nearest posit *)
N[%]

31/32
0.96875

This trio of numbers breaks the associative law if we round after each multiplication. Here’s what we get if we group the last two terms:

u (v w)
N[%]

61/64
0.953125

If we instead happen to group the first two terms, we get the best-possible answer (the one within 0.5 ULP of the exact product, pex):

(u v) w

31/32

The incorrect rounding of 61/64 is only very slightly farther away from the true solution, so this is a sensitive test case. Half an ULP is 0.0078125 for these values. It's a "squeaker"; the difference between the distances to the exact answer pex is only 3/4096:

Abs[31/32. - pex]
Abs[61/64. - pex]

0.00744629
0.00817871

This shows that floating-point multiplication is not associative in general, when we round after every multiply. That is true whether we use floats or posits. (If we use valids, the two ways to compute the product will always produce overlapping valids, and the overlap can be considered a tighter bound on the correct value.)

Can quire-based linear solvers restore multiplicative associativity? Set up the product u·v ·w as a lower triangular system. When there are only multiplies, adds, and subtracts in an expression, it is always possible to do this with 1s or –1s on the diagonal.

x1 = u
x2 = x1 × v
x3 = x2 × w

which becomes the system

[ 1   0   0 ] [ x1 ]   [ u ]
[ v  -1   0 ] [ x2 ] = [ 0 ]
[ 0   w  -1 ] [ x3 ]   [ 0 ]

Solving the system exactly produces the product in x3. Here is the exact solution, using Kulisch’s notations for all variables:

A = {{1, 0, 0}, {v, -1, 0}, {0, w, -1}};
b = {u, 0, 0};
n = Length[b];
x = LinearSolve[A, b]
N[x, 13]

{35/32, 525/512, 7875/8192}
{1.093750000000, 1.025390625000, 0.9613037109375}

Construct an approximate inverse matrix R at posit precision:

R = TableInverse[A]〚i,j〛, {i, 1, n}, {j, 1, n}; MatrixForm[R]

1 0 0 15 -1 0 16 7 - 15 -1 8 16

The refinement matrix is I - A.R. Compute it at quire precision and bound with valids (using Mathematica intervals for now, as stand-ins for an actual valid library):

ref = IdentityMatrix[n] - A.R; MatrixForm[ref]
ref = x2pint[ref]; MatrixForm[ref]

[   0      0   0 ]
[   0      0   0 ]
[ -1/256   0   0 ]

[ Interval[{0, 0}]       Interval[{0, 0}]  Interval[{0, 0}] ]
[ Interval[{0, 0}]       Interval[{0, 0}]  Interval[{0, 0}] ]
[ Interval[{-1/64, 0}]   Interval[{0, 0}]  Interval[{0, 0}] ]

We compute R·b in quire precision, then round to posit as the approximate solution, x̃.

x̃ = R.b

{35/32, 33/32, 61/64}

We calculate the residual, or “defect” as the Germans like to call it, using quire precision… and again find the nearest posit form of that.

dq = b - A.x̃
d = dq   (* rounds to the nearest posit *)

{0, 3/512, -7/512}
{0, 1/64, -1/64}

Now we need an interval enclosure around x - x̃. Let's be conservative and give it the entire range of representable real numbers.

e = Table[Interval[{-maxpos, maxpos}], {i, 1, n}]

{Interval[{-64, 64}], Interval[{-64, 64}], Interval[{-64, 64}]}

This at last sets up the iteration scheme specified by Kulisch on page 31 of “Arithmetic of the Digital Computer”. The provable bound shrinks with each iteration, tightening the noose until the uncertainty is only one ULP wide. More work needs to be done to show that valids can do even better than closed intervals. This is an example where interval-type algorithms are different from conventional iterative methods; they need to be crafted carefully so that they produce a contrac- tive map. Some conventional iterative methods do not do this, and with interval arithmetic, the bounds get looser instead of tighter with each iteration!

Notice, however, that we do not require expertise from the user. A compiler can automatically generate the refinement process and produce an ULP-wide bound, then round it to the nearest posit. A compiler directive should say which code blocks should be insured for high accuracy in this way, so that the program does not run slower because of the refinement of answers that need not be highly accurate. The XSC environments do not have this level of programmer control, another reason they have not caught on.
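We step through the iteration by hand below; since each sweep is the same update, it could also be written as a fixed-point search. A minimal sketch, assuming ref, R, d, and e are defined as above (the iteration cap of 20 is an arbitrary safety limit):

(* Repeat the enclosure update until the posit-bounded intervals stop
   changing; FixedPoint's default SameQ test detects the stable enclosure. *)
e = FixedPoint[x2pint[ref.# + R.d] &, e, 20]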

e = x2pint[ref.e + R.d]

{Interval[{0, 0}], Interval[{-1/64, -1/64}], Interval[{-1, 33/32}]}

e = x2pint[ref.e + R.d]

{Interval[{0, 0}], Interval[{-1/64, -1/64}], Interval[{0, 1/64}]}

That last element is one ULP wide, so we’re done. We could iterate one more time just to show that the enclosure is now stable:

e = x2pint[ref.e + R.d]

{Interval[{0, 0}], Interval[{-1/64, -1/64}], Interval[{0, 1/64}]}

It’s stable. The corrected answer:

x̃ + e

{Interval[{35/32, 35/32}], Interval[{65/64, 65/64}], Interval[{61/64, 31/32}]}

Well, it worked. It found the best interval enclosure of the true product, one ULP wide. In replacing the bound with a posit, we would round to the "even" endpoint 31/32. This shows that the quire can restore the associative property of algebra to a posit computing environment, if needed. If we had used interval arithmetic (valids) to do the multiplication, the bound would be two ULPs wide after the two multiplications. The XSC approach brings that down to one ULP wide, which is then rounded uniquely and consistently.

12.4.4 When even the quire cannot rescue an ill-posed system

From Kulisch & Miranker, an example where having a low residual does not mean you are close to the answer. Notice that this is not a lower-triangular matrix; lower-triangular matrices are much better behaved. This is a lot like Bailey's Numerical Nightmare, a deceptively simple-looking 2-by-2 system that nonetheless is terribly ill-posed.

setpositenv[{16, 1}]
A = {{a11, a12}, {a21, a22}} = {{0.780, 0.563}, {0.913, 0.659}};   (* entries round to the nearest posit *)
b = {b1, b2} = {a11 - a12, a21 - a22};
n = Length[b];
Print[MatrixForm[A], MatrixForm[{X, Y}], " = ", MatrixForm[b]]

[ 3195/4096  1153/2048 ] [ X ]   [ 889/4096 ]
[ 7479/8192  5399/8192 ] [ Y ] = [  65/256  ]

The exact answer should be {1, -1}, and exact arithmetic using rational numbers is capable of a true direct solution since there is no rounding:

LinearSolve[A, b]

{1, -1}

To set up the XSC tightening-enclosure iteration, we first find the posit approximation to the inverse matrix.

R = Inverse[A];   (* entries round to the nearest posit *)
MatrixForm[R]

[  6848  -5824 ]
[ -9472   8128 ]

Apply that approximate inverse to the right-hand side vector b to get a starting guess (quire precision):

x̃ = R.b
N[x̃]

{483/64, 127/16}
{7.54688, 7.9375}

Those values didn’t need rounding to the nearest posit, but happened to be already expressible as posits. Compute the defect using quire precision, and round to the nearest posit:

d = b - A.x̃;
N[d]

{-10.1367, -11.8672}

Notice that that's a pretty large defect! The problem lies in the refinement matrix operator, ref = I - A·R:

ref = IdentityMatrix[n] - A.R; MatrixForm[ref]

[  -513/64    -529/16 ]
[ -1201/128  -1239/32 ]

Unless the spectral radius of ref (the largest eigenvalue magnitude) is less than 1, we have no assurance of convergence.
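A minimal helper for that test (contractiveQ is a hypothetical name, not part of the prototype environment):

(* True when the largest eigenvalue magnitude is below 1,
   so the refinement iteration contracts *)
contractiveQ[m_] := Max[Abs[Eigenvalues[N[m]]]] < 1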

N[Eigenvalues[ref]]

{-46.7315, -0.00290473}

The spectral radius is not less than 1, so we should not have high hopes for residual refinement. Applying the "refinement" operator will make things worse; that is, the interval bound will become looser, not tighter. Residual correction isn't working even with the quire. There exist problems so ill-posed that they cannot be solved this way. The XSC workaround is to detect these rare cases and report them to the user.

Furthermore, consider two candidate solutions. One of them is very close to the correct solution, X = 0.999, Y = -1.001 (expressed using the closest posits):

{X, Y} = {0.999, -1.001};
N[{X, Y}]

{0.999023, -1.00098}

Our human intuition says that the defect should be very small for such a close guess:

r = b - A.{X, Y}; N[r]

{0.00131154, 0.00153518}

For the second candidate solution, instead try a poorer guess, X = 0.341, Y = -0.087, and check the residual:

{X, Y} = {0.341, -0.087};
N[{X, Y}]

{0.341003, -0.0870056}

r = b - A.{X, Y}; N[r]

{0.0000315011, -0.0000758357}

The second choice for the solution vector is clearly farther from the correct solution, yet gives a smaller residual! At least it is possible to detect when the refinement method will fail. Having a lucky guess might not help you.
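To quantify that, here is a quick raw-Mathematica comparison of the two candidates' distances to the true solution {1, -1} (an illustration, not part of the refinement machinery):

Norm[{0.999, -1.001} - {1, -1}]   (* close guess: distance ≈ 0.0014 *)
Norm[{0.341, -0.087} - {1, -1}]   (* poor guess:  distance ≈ 1.13 *)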

12.4.5 Kulisch polynomial challenge example

Kulisch gives the example of a polynomial that is difficult to evaluate accurately for certain input values. It has an exact root at 1/√2. If we use a value close to that root, any errors will be magnified.

P[t_] := 8118 t^4 - 11482 t^3 + t^2 + 5741 t - 2030
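We can confirm the exact root symbolically (a quick check, not part of the original benchmark):

Simplify[P[1/Sqrt[2]]]
(* → 0 *)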

We should probably write out the definition of the polynomial using Horner's rule, the way a program would actually evaluate the polynomial.

P[t_] := (((8118 t - 11482) t + 1) t + 5741) t - 2030

Kulisch's example uses t = 0.707107, and shows that the polynomial evaluation is correct to only two decimal places. To be fair, we need to find the exact answer using the closest binary representation of 707107/1000000 as the input quantity. With the environment set for 32-bit posits:

setpositenv[{32, 2}];
t = 707107/1000000   (* rounds to the nearest posit *)
N[t, 27]

94906295/134217728
0.707107000052928924560546875

We can express the exact polynomial value of that fraction using exact rational arithmetic and 97 decimals to express the output, though of course only the first few decimals are significant digits:

pex = P[t]; N[pex, 97]

-1.915738850634190819575743332867268279214048055730532983155500748040367398061789572238922119140625 × 10^-11

Now look what happens when we rely on "machine precision," that is, internal 64-bit IEEE floats:

P[N[t]]

-1.93268 × 10^-11

Despite using more than 15 significant decimals, only the first two decimals are correct: -1.9326⋯ × 10^-11. Can we get a better answer with 32-bit posits, using XSC-style refinement techniques?

Write Horner’s rule for P(t) using five temporary quantities, x1 through x5:

x1 = 8118
x2 = x1 t - 11482
x3 = x2 t + 1
x4 = x3 t + 5741
x5 = x4 t - 2030

which becomes the system

[  1                   ] [ x1 ]   [   8118 ]
[ -t   1               ] [ x2 ]   [ -11482 ]
[     -t   1           ] [ x3 ] = [      1 ]
[         -t   1       ] [ x4 ]   [   5741 ]
[             -t   1   ] [ x5 ]   [  -2030 ]

The best 32-bit posit representation of the exact value is the following, shown to 10 decimals:

pexp = pex;   (* rounds to the nearest posit *)
N[pexp]
colorcodep[x2p[pexp]]

-1.91574 × 10^-11
11111111110111010111011111010110 → -0000000001000101000100000101010

The color coding shows that the fraction only has 20 bits for such a small-magnitude value, for which an ULP is about one part in 10⁶. That's appropriate, since the input value only has six significant digits. Here's the Ax = b problem setup and the solution x, expressed as exact rational numbers and approximate decimals:

A = {{1, 0, 0, 0, 0}, {-t, 1, 0, 0, 0}, {0, -t, 1, 0, 0}, {0, 0, -t, 1, 0}, {0, 0, 0, -t, 1}};
b = {8118, -11482, 1, 5741, -2030};
n = Length[b];
x = LinearSolve[A, b];
N[x, 12]

{8118.00000000, -5741.70537357, -4059.00006189, 2870.85264302, -1.91573885063 × 10^-11}

Here is what the approximate inverse looks like, at posit precision:

R = Inverse[A];   (* entries round to the nearest posit *)
MatrixForm[R]

{{1, 0, 0, 0, 0},
 {94906295/134217728, 1, 0, 0, 0},
 {134217811/268435456, 94906295/134217728, 1, 0, 0},
 {47453177/134217728, 134217811/268435456, 94906295/134217728, 1, 0},
 {67108947/268435456, 47453177/134217728, 134217811/268435456, 94906295/134217728, 1}}

The refinement matrix is I - A.R, computed at quire precision and then bounded with posit intervals:

ref = IdentityMatrix[5] - A.R;
ref = x2pint[ref];
Style[MatrixForm[ref], 5]

{{Interval[{0, 0}], Interval[{0, 0}], Interval[{0, 0}], Interval[{0, 0}], Interval[{0, 0}]},
 {Interval[{0, 0}], Interval[{0, 0}], Interval[{0, 0}], Interval[{0, 0}], Interval[{0, 0}]},
 {Interval[{365645/1125899906842624, 1462581/4503599627370496}], Interval[{0, 0}], Interval[{0, 0}], Interval[{0, 0}], Interval[{0, 0}]},
 {Interval[{-650367/562949953421312, -1300733/1125899906842624}], Interval[{365645/1125899906842624, 1462581/4503599627370496}], Interval[{0, 0}], Interval[{0, 0}], Interval[{0, 0}]},
 {Interval[{1102687/1125899906842624, 34459/35184372088832}], Interval[{-650367/562949953421312, -1300733/1125899906842624}], Interval[{365645/1125899906842624, 1462581/4503599627370496}], Interval[{0, 0}], Interval[{0, 0}]}}

We compute R·b in quire precision, then round to posit as the approximate solution, x̃.

x̃ = R.b

{8118, -23518025/4096, -66502657/16384, 23518025/8192, -3561/268435456}

We always calculate the “defect” exactly, at quire precision, and round to posit precision:

d̃ = b - A.x̃   (* rounds to the nearest posit *)

{0, -3443/67108864, 19472337/549755813888, -4812023/274877906944, 28667951/1099511627776}

Now we need an interval enclosure around x - x̃ as a starting guess. As usual, be ultra-conservative and use the largest possible starting interval.

e = Table[Interval[{-maxpos, maxpos}], {i, 1, n}];

This at last sets up the iteration scheme specified by Kulisch on page 31 of “Arithmetic of the Digital Computer”. We can watch just what happens to the last element of e, since that’s the answer we seek.

e = x2pint[ref.e + R.d̃]; N[e〚-1〛]

Interval[{-3.32696 × 10^27, 3.32696 × 10^27}]

It has a long way to come down, but it’s shrinking fast.

e = x2pint[ref.e + R.d̃]; N[e〚-1〛]

Interval[{-1.41344 × 10^17, 1.41344 × 10^17}]

This is looking good.

e = x2pint[ref.e + R.d̃]; N[e〚-1〛]

Interval[{0.0000132657, 0.0000132657}]

That looks converged, but do it one more time just to be sure:

e = x2pint[ref.e + R.d̃]; N[e〚-1〛]

Interval[{0.0000132657, 0.0000132657}]

It has stopped changing. Now apply the correction:

x = x̃ + e; N[x]

{Interval[{8118., 8118.}], Interval[{-5741.71, -5741.71}], Interval[{-4059., -4059.}],
 Interval[{2870.85, 2870.85}], Interval[{-2.00089 × 10^-11, -1.90994 × 10^-11}]}

That last interval is the bound, and it's better than you can do with a 32-bit float by far, as we will see later. But before we stop, let's try using a point in each interval as a new starting point, because we lost a lot of precision when correcting the last term. It was so far away from the right answer, there weren't many significant digits available to really zoom in on the answer. The beauty of the method is that it produces a rigorous bound even with a poor starting point and a very approximate inverse to the matrix A. As long as A R - I has a spectral radius less than 1, the enclosure interval will converge to an ULP-wide bound. This has to be one of the supreme achievements of the interval arithmetic community.

   x = TableMin[x〚i〛], {i, 1, n}; N[x]

8118., -5741.71, -4059., 2870.85, -2.00089 × 10-11

Calculate an exact residual with this better starting guess for the answer:

d = b - A.x̃; N[d, 12]

{0, 0, 5.09592368303 × 10^-14, 5.45955751074 × 10^-13, 4.39966165171 × 10^-13}

That very small correction shows this is much closer to the true answer. But we aren't simply using a typical iterative method; we are proving that the answer lies inside an interval. Use the usual all-encompassing interval enclosure to start, and round d to the nearest posit d̃. (If we do not round d to the nearest posit d̃, we cannot use the quire to compute R·d̃ exactly.)

e = Table[Interval[{-maxpos, maxpos}], {i, 1, n}];
d̃ = d;   (* rounds to the nearest posit *)
N[d̃]

{0., 0., 5.09592 × 10^-14, 5.45956 × 10^-13, 4.39966 × 10^-13}

With the same refinement operator as before but this better starting point, iterate until things stop changing. Display just the last value as a decimal, and when it seems to be stable, check by displaying it as a rational number.

e = x2pint[ref.e + R.d̃]; N[e〚-1〛]

Interval[{-3.32696 × 10^27, 3.32696 × 10^27}]

e = x2pint[ref.e + R.d̃]; N[e〚-1〛]

Interval[{-1.41344 × 10^17, 1.41344 × 10^17}]

e = x2pint[ref.e + R.d̃]; N[e〚-1〛]

Interval[{8.51492 × 10^-13, 8.51496 × 10^-13}]

e = x2pint[ref.e + R.d̃]; e〚-1〛

Interval[{122713/144115188075855872, 245427/288230376151711744}]

e = x2pint[ref.e + R.d̃]; e〚-1〛

Interval[{122713/144115188075855872, 245427/288230376151711744}]

That’s stable. Add the correction, and look at just the last value, which is the bound on the answer:

x = x̃ + e; bound = x2pint[x〚-1〛]; N[bound, 12]

Interval[{-1.91573978903 × 10^-11, -1.91573701347 × 10^-11}]

We can use color-coded output to show that these posit bounds are one ULP apart, so this is a maximum-accuracy result, with six decimals of accuracy.

colorcodep[x2p[bound〚1,1〛]]
colorcodep[x2p[bound〚1,2〛]]

11111111110111010111011111010110 → -0000000001000101000100000101010
11111111110111010111011111010111 → -0000000001000101000100000101001

That’s pretty spectacular, because 32-bit posits produced an answer correct to six decimals with a provable enclosure, whereas 64-bit floats produced an answer correct to two decimals with no guarantee of correctness whatsoever. Score 1 for Dr. Kulisch.

Just for fun, let’s see how awful the result would be if we had used 32-bit floats. We convert the value of t to the nearest float.

setfloatenv[{32, 8}]; t = 0.707107; N[t, 20]

0.70710700750350952148

The exact value of the polynomial using that value of t should be as follows:

fex = N[P[t], 49]

-1.981290238746664096415517533113013804273714636717 × 10^-11

Here is the polynomial evaluated with Horner's rule and IEEE standard rounding after every operation. It's kind of fun to see the stack of underlines, which tell you exactly where the roundings happen.

(((8118 t - 11482) t + 1) t + 5741) t - 2030

0

Whoops. Infinite decimal error. Float arithmetic thinks the value is a root of the polynomial! But wait, we can at least use fused multiply-adds, which cuts the number of roundings from eight down to four:

(((8118 t - 11482) t + 1) t + 5741) t - 2030   (* now with fused multiply-adds *)

-895409/17179869184

That looks a lot bigger than the correct value we calculated, fex. Let’s see just how much bigger it is:

N[% / fex]

2.63059 × 10^6

The answer is now off by a factor of 2.6 million. Well, at least it got the sign right!

The combination of posit arithmetic, quire accumulation, and the shrinking interval enclosure technique allows 32-bit posits to dramatically outperform 64-bit floats. The 32-bit posits produced a tight, provable bound with six-decimal accuracy, whereas the 64-bit floats produced a guess with no indication of how many decimals in the guess are correct. And only the first two decimals were correct.

12.5 Solving a real PDE problem (Sandia’s data set)

12.5.1 Insights regarding Ax = b and the quire data type

If A and b are expressed in posits (or floats) and A is nonsingular, there is exactly one answer for x: a vector of rational numbers.

There is exactly one x̃ expressible with posits that is closer to x than any other. (Rounding is unique.)

Matrix A times x̃ is a vector of quires, with no rounding error! (512 bits per quire, if we use 32-bit posits.)

Gaussian elimination with exact dot products as the innermost loop (see the sketch below) means O(n²) roundings per element instead of O(n³). On a petascale problem for which n ≈ 10⁶, that means about three more decimals of accuracy in the answer.
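In this prototype environment posits are stored as exact rationals, so a quire-style exact dot product can be modeled as an exact rational dot product followed by a single rounding. A minimal sketch (quireDot is a hypothetical helper name; x2p and p2x are this document's conversion functions):

quireDot[u_, v_] := p2x[x2p[u.v]]   (* accumulate exactly, then round once to the nearest posit *)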

Let's see if we can import a small test file from Sandia based on a real PDE. Start with their 81-variable system, apparently generated by a 9-by-9 Poisson problem using a 5-point stencil. The first two numbers are the row-column coordinates within the matrix, and the long decimal is a double-precision component value. Apologies for this long table of numbers in the middle of this document…ugh!

sandia81A = { 1, 1, 0.400000000000000133226762955018784851, 2, 1, -0.099999999999999977795539507496869192, 10, 1, -0.100000000000000061062266354383609723, 1, 2, -0.099999999999999977795539507496869192, 2, 2, 0.400000000000000133226762955018784851, 3, 2, -0.099999999999999977795539507496869192, 11, 2, -0.100000000000000088817841970012523234, 2, 3, -0.099999999999999977795539507496869192, 3, 3, 0.416666666666666796192686206268263049, 4, 3, -0.108333333333333364789652364379435312, 12, 3, -0.108333333333333392545227980008348823, 3, 4, -0.108333333333333364789652364379435312, 4, 4, 0.425000000000000155431223447521915659, 5, 4, -0.100000000000000033306690738754696213, 13, 4, -0.116666666666666710150401797818631167, 4, 5, -0.100000000000000033306690738754696213, 5, 5, 0.400000000000000188737914186276611872, 6, 5, -0.100000000000000033306690738754696213, 14, 5, -0.100000000000000033306690738754696213, 5, 6, -0.100000000000000033306690738754696213, 6, 6, 0.400000000000000133226762955018784851, 7, 6, -0.100000000000000033306690738754696213, 15, 6, -0.100000000000000033306690738754696213, 6, 7, -0.100000000000000033306690738754696213, 7, 7, 0.400000000000000133226762955018784851, 8, 7, -0.099999999999999977795539507496869192, 16, 7, -0.100000000000000061062266354383609723, 7, 8, -0.099999999999999977795539507496869192, 8, 8, 0.400000000000000133226762955018784851, 9, 8, -0.099999999999999977795539507496869192, 17, 8, -0.100000000000000088817841970012523234, 8, 9, -0.099999999999999977795539507496869192, 9, 9, 0.400000000000000133226762955018784851, 18, 9, -0.100000000000000061062266354383609723, 1, 10, -0.100000000000000061062266354383609723, 10, 10, 0.400000000000000133226762955018784851, 11, 10, -0.099999999999999977795539507496869192, 19, 10, -0.100000000000000033306690738754696213, 2, 11, -0.100000000000000088817841970012523234, 10, 11, -0.099999999999999977795539507496869192, 11, 11, 0.416666666666666796192686206268263049, Posits4.nb 119

11, 11, 0.416666666666666796192686206268263049, 12, 11, -0.108333333333333309278501133121608291, 20, 11, -0.108333333333333392545227980008348823, 3, 12, -0.108333333333333392545227980008348823, 11, 12, -0.108333333333333309278501133121608291, 12, 12, 0.516666666666666829499376945022959262, 13, 12, -0.150000000000000077715611723760957830, 21, 12, -0.150000000000000022204460492503130808, 4, 13, -0.116666666666666710150401797818631167, 12, 13, -0.150000000000000077715611723760957830, 13, 13, 0.583333333333333592385372412536526099, 14, 13, -0.125000000000000055511151231257827021, 22, 13, -0.191666666666666679619268620626826305, 5, 14, -0.100000000000000033306690738754696213, 13, 14, -0.125000000000000055511151231257827021, 14, 14, 0.450000000000000177635683940025046468, 15, 14, -0.100000000000000047184478546569152968, 23, 14, -0.125000000000000000000000000000000000, 6, 15, -0.100000000000000033306690738754696213, 14, 15, -0.100000000000000047184478546569152968, 15, 15, 0.400000000000000133226762955018784851, 16, 15, -0.100000000000000047184478546569152968, 24, 15, -0.100000000000000005551115123125782702, 7, 16, -0.100000000000000061062266354383609723, 15, 16, -0.100000000000000047184478546569152968, 16, 16, 0.400000000000000133226762955018784851, 17, 16, -0.099999999999999977795539507496869192, 25, 16, -0.100000000000000033306690738754696213, 8, 17, -0.100000000000000088817841970012523234, 16, 17, -0.099999999999999977795539507496869192, 17, 17, 0.400000000000000077715611723760957830, 18, 17, -0.099999999999999977795539507496869192, 26, 17, -0.100000000000000061062266354383609723, 9, 18, -0.100000000000000061062266354383609723, 17, 18, -0.099999999999999977795539507496869192, 18, 18, 0.400000000000000133226762955018784851, 27, 18, -0.100000000000000033306690738754696213, 10, 19, -0.100000000000000033306690738754696213, 19, 19, 0.416666666666666796192686206268263049, 20, 19, -0.108333333333333309278501133121608291, 28, 19, -0.108333333333333364789652364379435312, 11, 20, -0.108333333333333392545227980008348823, 19, 20, -0.108333333333333309278501133121608291, 20, 20, 0.516666666666666718477074482507305220, 21, 20, -0.149999999999999966693309261245303787, 29, 20, -0.150000000000000077715611723760957830, 12, 21, -0.150000000000000022204460492503130808, 20, 21, -0.149999999999999966693309261245303787, 21, 21, 0.733333333333333392545227980008348823, 22, 21, -0.216666666666666757334880344387784135, 30, 21, -0.216666666666666729579304728758870624, 13, 22, -0.191666666666666679619268620626826305, 21, 22, -0.216666666666666757334880344387784135, 22, 22, 0.908333333333333658998753890045918524, 23, 22, -0.225000000000000088817841970012523234, 31, 22, , 120 Posits4.nb

31, 22, -0.275000000000000022204460492503130808, 14, 23, -0.125000000000000000000000000000000000, 22, 23, -0.225000000000000088817841970012523234, 23, 23, 0.700000000000000288657986402540700510, 24, 23, -0.125000000000000055511151231257827021, 32, 23, -0.225000000000000005551115123125782702, 15, 24, -0.100000000000000005551115123125782702, 23, 24, -0.125000000000000055511151231257827021, 24, 24, 0.450000000000000122124532708767219447, 25, 24, -0.100000000000000061062266354383609723, 33, 24, -0.125000000000000000000000000000000000, 16, 25, -0.100000000000000033306690738754696213, 24, 25, -0.100000000000000061062266354383609723, 25, 25, 0.400000000000000133226762955018784851, 26, 25, -0.099999999999999977795539507496869192, 34, 25, -0.100000000000000033306690738754696213, 17, 26, -0.100000000000000061062266354383609723, 25, 26, -0.099999999999999977795539507496869192, 26, 26, 0.400000000000000077715611723760957830, 27, 26, -0.099999999999999977795539507496869192, 35, 26, -0.100000000000000061062266354383609723, 18, 27, -0.100000000000000033306690738754696213, 26, 27, -0.099999999999999977795539507496869192, 27, 27, 0.400000000000000133226762955018784851, 36, 27, -0.100000000000000033306690738754696213, 19, 28, -0.108333333333333364789652364379435312, 28, 28, 0.425000000000000099920072216264088638, 29, 28, -0.116666666666666640761462758746347390, 37, 28, -0.100000000000000033306690738754696213, 20, 29, -0.150000000000000077715611723760957830, 28, 29, -0.116666666666666640761462758746347390, 29, 29, 0.583333333333333370340767487505218014, 30, 29, -0.191666666666666596352541773740085773, 38, 29, -0.125000000000000083266726846886740532, 21, 30, -0.216666666666666729579304728758870624, 29, 30, -0.191666666666666596352541773740085773, 30, 30, 0.908333333333333436954148965014610440, 31, 30, -0.275000000000000133226762955018784851, 39, 30, -0.225000000000000033306690738754696213, 22, 31, -0.275000000000000022204460492503130808, 30, 31, -0.275000000000000133226762955018784851, 31, 31, 1.100000000000000310862446895043831319, 32, 31, -0.275000000000000133226762955018784851, 40, 31, -0.275000000000000022204460492503130808, 23, 32, -0.225000000000000005551115123125782702, 31, 32, -0.275000000000000133226762955018784851, 32, 32, 0.908333333333333547976451427530264482, 33, 32, -0.191666666666666762885995467513566837, 41, 32, -0.216666666666666674068153497501043603, 24, 33, -0.125000000000000000000000000000000000, 32, 33, -0.191666666666666762885995467513566837, 33, 33, 0.583333333333333481363069950020872056, 34, 33, -0.116666666666666724028189605633087922, 42, 33, -0.150000000000000022204460492503130808, 25, 34, -0.100000000000000033306690738754696213,

33, 34, -0.116666666666666724028189605633087922, 34, 34, 0.425000000000000099920072216264088638, 35, 34, -0.099999999999999977795539507496869192, 43, 34, -0.108333333333333364789652364379435312, 26, 35, -0.100000000000000061062266354383609723, 34, 35, -0.099999999999999977795539507496869192, 35, 35, 0.400000000000000077715611723760957830, 36, 35, -0.099999999999999977795539507496869192, 44, 35, -0.100000000000000061062266354383609723, 27, 36, -0.100000000000000033306690738754696213, 35, 36, -0.099999999999999977795539507496869192, 36, 36, 0.400000000000000133226762955018784851, 45, 36, -0.100000000000000033306690738754696213, 28, 37, -0.100000000000000033306690738754696213, 37, 37, 0.400000000000000133226762955018784851, 38, 37, -0.099999999999999977795539507496869192, 46, 37, -0.100000000000000033306690738754696213, 29, 38, -0.125000000000000083266726846886740532, 37, 38, -0.099999999999999977795539507496869192, 38, 38, 0.450000000000000122124532708767219447, 39, 38, -0.124999999999999986122212192185543245, 47, 38, -0.100000000000000061062266354383609723, 30, 39, -0.225000000000000033306690738754696213, 38, 39, -0.124999999999999986122212192185543245, 39, 39, 0.700000000000000177635683940025046468, 40, 39, -0.225000000000000088817841970012523234, 48, 39, -0.125000000000000055511151231257827021, 31, 40, -0.275000000000000022204460492503130808, 39, 40, -0.225000000000000088817841970012523234, 40, 40, 0.908333333333333658998753890045918524, 41, 40, -0.216666666666666785090455960016697645, 49, 40, -0.191666666666666651863693004997912794, 32, 41, -0.216666666666666674068153497501043603, 40, 41, -0.216666666666666785090455960016697645, 41, 41, 0.733333333333333503567530442524002865, 42, 41, -0.150000000000000077715611723760957830, 50, 41, -0.150000000000000022204460492503130808, 33, 42, -0.150000000000000022204460492503130808, 41, 42, -0.150000000000000077715611723760957830, 42, 42, 0.516666666666666829499376945022959262, 43, 42, -0.108333333333333392545227980008348823, 51, 42, -0.108333333333333337034076748750521801, 34, 43, -0.108333333333333364789652364379435312, 42, 43, -0.108333333333333392545227980008348823, 43, 43, 0.416666666666666796192686206268263049, 44, 43, -0.099999999999999977795539507496869192, 52, 43, -0.100000000000000033306690738754696213, 35, 44, -0.100000000000000061062266354383609723, 43, 44, -0.099999999999999977795539507496869192, 44, 44, 0.400000000000000077715611723760957830, 45, 44, -0.099999999999999977795539507496869192, 53, 44, -0.100000000000000061062266354383609723, 36, 45, -0.100000000000000033306690738754696213, 44, 45, -0.099999999999999977795539507496869192, 45, 45, 0.400000000000000133226762955018784851,

54, 45, -0.100000000000000033306690738754696213, 37, 46, -0.100000000000000033306690738754696213, 46, 46, 0.400000000000000133226762955018784851, 47, 46, -0.099999999999999977795539507496869192, 55, 46, -0.100000000000000033306690738754696213, 38, 47, -0.100000000000000061062266354383609723, 46, 47, -0.099999999999999977795539507496869192, 47, 47, 0.400000000000000133226762955018784851, 48, 47, -0.099999999999999977795539507496869192, 56, 47, -0.100000000000000061062266354383609723, 39, 48, -0.125000000000000055511151231257827021, 47, 48, -0.099999999999999977795539507496869192, 48, 48, 0.450000000000000122124532708767219447, 49, 48, -0.125000000000000055511151231257827021, 57, 48, -0.100000000000000033306690738754696213, 40, 49, -0.191666666666666651863693004997912794, 48, 49, -0.125000000000000055511151231257827021, 49, 49, 0.583333333333333370340767487505218014, 50, 49, -0.150000000000000077715611723760957830, 58, 49, -0.116666666666666668517038374375260901, 41, 50, -0.150000000000000022204460492503130808, 49, 50, -0.150000000000000077715611723760957830, 50, 50, 0.516666666666666829499376945022959262, 51, 50, -0.108333333333333392545227980008348823, 59, 50, -0.108333333333333337034076748750521801, 42, 51, -0.108333333333333337034076748750521801, 50, 51, -0.108333333333333392545227980008348823, 51, 51, 0.416666666666666796192686206268263049, 52, 51, -0.100000000000000061062266354383609723, 60, 51, -0.100000000000000005551115123125782702, 43, 52, -0.100000000000000033306690738754696213, 51, 52, -0.100000000000000061062266354383609723, 52, 52, 0.400000000000000133226762955018784851, 53, 52, -0.099999999999999977795539507496869192, 61, 52, -0.100000000000000033306690738754696213, 44, 53, -0.100000000000000061062266354383609723, 52, 53, -0.099999999999999977795539507496869192, 53, 53, 0.400000000000000077715611723760957830, 54, 53, -0.099999999999999977795539507496869192, 62, 53, -0.100000000000000061062266354383609723, 45, 54, -0.100000000000000033306690738754696213, 53, 54, -0.099999999999999977795539507496869192, 54, 54, 0.400000000000000133226762955018784851, 63, 54, -0.100000000000000033306690738754696213, 46, 55, -0.100000000000000033306690738754696213, 55, 55, 0.400000000000000133226762955018784851, 56, 55, -0.099999999999999977795539507496869192, 64, 55, -0.100000000000000033306690738754696213, 47, 56, -0.100000000000000061062266354383609723, 55, 56, -0.099999999999999977795539507496869192, 56, 56, 0.400000000000000077715611723760957830, 57, 56, -0.099999999999999977795539507496869192, 65, 56, -0.100000000000000061062266354383609723, 48, 57, -0.100000000000000033306690738754696213, 56, 57, -0.099999999999999977795539507496869192,

57, 57, 0.400000000000000133226762955018784851, 58, 57, -0.100000000000000061062266354383609723, 66, 57, -0.100000000000000033306690738754696213, 49, 58, -0.116666666666666668517038374375260901, 57, 58, -0.100000000000000061062266354383609723, 58, 58, 0.425000000000000155431223447521915659, 59, 58, -0.108333333333333392545227980008348823, 67, 58, -0.100000000000000005551115123125782702, 50, 59, -0.108333333333333337034076748750521801, 58, 59, -0.108333333333333392545227980008348823, 59, 59, 0.416666666666666796192686206268263049, 60, 59, -0.100000000000000061062266354383609723, 68, 59, -0.100000000000000005551115123125782702, 51, 60, -0.100000000000000005551115123125782702, 59, 60, -0.100000000000000061062266354383609723, 60, 60, 0.400000000000000133226762955018784851, 61, 60, -0.100000000000000061062266354383609723, 69, 60, -0.100000000000000005551115123125782702, 52, 61, -0.100000000000000033306690738754696213, 60, 61, -0.100000000000000061062266354383609723, 61, 61, 0.400000000000000133226762955018784851, 62, 61, -0.099999999999999977795539507496869192, 70, 61, -0.100000000000000033306690738754696213, 53, 62, -0.100000000000000061062266354383609723, 61, 62, -0.099999999999999977795539507496869192, 62, 62, 0.400000000000000077715611723760957830, 63, 62, -0.099999999999999977795539507496869192, 71, 62, -0.100000000000000061062266354383609723, 54, 63, -0.100000000000000033306690738754696213, 62, 63, -0.099999999999999977795539507496869192, 63, 63, 0.400000000000000133226762955018784851, 72, 63, -0.100000000000000033306690738754696213, 55, 64, -0.100000000000000033306690738754696213, 64, 64, 0.400000000000000077715611723760957830, 65, 64, -0.100000000000000005551115123125782702, 73, 64, -0.100000000000000005551115123125782702, 56, 65, -0.100000000000000061062266354383609723, 64, 65, -0.100000000000000005551115123125782702, 65, 65, 0.400000000000000077715611723760957830, 66, 65, -0.100000000000000005551115123125782702, 74, 65, -0.100000000000000019428902930940239457, 57, 66, -0.100000000000000033306690738754696213, 65, 66, -0.100000000000000005551115123125782702, 66, 66, 0.400000000000000133226762955018784851, 67, 66, -0.100000000000000074940054162198066479, 75, 66, -0.100000000000000005551115123125782702, 58, 67, -0.100000000000000005551115123125782702, 66, 67, -0.100000000000000074940054162198066479, 67, 67, 0.400000000000000133226762955018784851, 68, 67, -0.100000000000000074940054162198066479, 76, 67, -0.099999999999999977795539507496869192, 59, 68, -0.100000000000000005551115123125782702, 67, 68, -0.100000000000000074940054162198066479, 68, 68, 0.400000000000000133226762955018784851, 69, 68, -0.100000000000000074940054162198066479,

77, 68, -0.099999999999999977795539507496869192, 60, 69, -0.100000000000000005551115123125782702, 68, 69, -0.100000000000000074940054162198066479, 69, 69, 0.400000000000000133226762955018784851, 70, 69, -0.100000000000000074940054162198066479, 78, 69, -0.099999999999999977795539507496869192, 61, 70, -0.100000000000000033306690738754696213, 69, 70, -0.100000000000000074940054162198066479, 70, 70, 0.400000000000000077715611723760957830, 71, 70, -0.100000000000000005551115123125782702, 79, 70, -0.100000000000000005551115123125782702, 62, 71, -0.100000000000000061062266354383609723, 70, 71, -0.100000000000000005551115123125782702, 71, 71, 0.400000000000000077715611723760957830, 72, 71, -0.100000000000000005551115123125782702, 80, 71, -0.100000000000000019428902930940239457, 63, 72, -0.100000000000000033306690738754696213, 71, 72, -0.100000000000000005551115123125782702, 72, 72, 0.400000000000000133226762955018784851, 81, 72, -0.100000000000000005551115123125782702, 64, 73, -0.100000000000000005551115123125782702, 73, 73, 0.400000000000000133226762955018784851, 74, 73, -0.100000000000000005551115123125782702, 65, 74, -0.100000000000000019428902930940239457, 73, 74, -0.100000000000000005551115123125782702, 74, 74, 0.400000000000000077715611723760957830, 75, 74, -0.100000000000000005551115123125782702, 66, 75, -0.100000000000000005551115123125782702, 74, 75, -0.100000000000000005551115123125782702, 75, 75, 0.400000000000000133226762955018784851, 76, 75, -0.100000000000000061062266354383609723, 67, 76, -0.099999999999999977795539507496869192, 75, 76, -0.100000000000000061062266354383609723, 76, 76, 0.400000000000000077715611723760957830, 77, 76, -0.100000000000000061062266354383609723, 68, 77, -0.099999999999999977795539507496869192, 76, 77, -0.100000000000000061062266354383609723, 77, 77, 0.400000000000000133226762955018784851, 78, 77, -0.100000000000000061062266354383609723, 69, 78, -0.099999999999999977795539507496869192, 77, 78, -0.100000000000000061062266354383609723, 78, 78, 0.400000000000000133226762955018784851, 79, 78, -0.100000000000000061062266354383609723, 70, 79, -0.100000000000000005551115123125782702, 78, 79, -0.100000000000000061062266354383609723, 79, 79, 0.400000000000000133226762955018784851, 80, 79, -0.100000000000000005551115123125782702, 71, 80, -0.100000000000000019428902930940239457, 79, 80, -0.100000000000000005551115123125782702, 80, 80, 0.400000000000000077715611723760957830, 81, 80, -0.100000000000000005551115123125782702, 72, 81, -0.100000000000000005551115123125782702, 80, 81, -0.100000000000000005551115123125782702, 81, 81, 0.400000000000000133226762955018784851};


While this clearly should be stored and solved as a sparse matrix, for purposes of rapid prototyping we simply populate a dense matrix so that we can use the dense matrix solver developed in a previous section. As is typical with a partial differential equation solved with a nearest-neighbor stencil (finite difference) approximation, the system has a main diagonal, adjacent diagonals, and diagonals that are 9 elements away, the width or height of the square domain. The plot shows the matrix with the values of the coefficients in the z dimension.

A = Table[0, {i, 1, 81}, {j, 1, 81}];
For[i = 1, i < Length[sandia81A], i += 3,
 A〚sandia81A〚i〛, sandia81A〚i + 1〛〛 = sandia81A〚i + 2〛];
ListPlot3D[A, ViewPoint → {1, 2, 2}]

Sandia also supplied their right-hand side vector b, which does not have the obvious decimal approximation quality of the matrix input values.

sandia81b = { 1, 1, 0.568282626667702128742121203686110675, 2, 1, -0.194365394908625366277021839778171852, 3, 1, -0.688407046958027679650626851071137935, 4, 1, -0.231093558206384303010949565759801771, 5, 1, 0.545583373405327809457787680003093556, 6, 1, 0.568282626667701129541399041045224294, 7, 1, -0.194365394908622035607947964308550581, 8, 1, -0.688407046958026569427602225914597511, 9, 1, -0.231093558206384108721920256357407197, 10, 1, -0.942199858426785996634578168595908210, 11, 1, -0.919500605164410789171824944787658751,

12, 1, 0.373917231759081980513315102143678814, 13, 1, 1.150594163370796563228282138879876584, 14, 1, 0.337189068461317520419839866008260287, 15, 1, -0.942199858426788106058324956393335015, 16, 1, -0.919500605164409567926497857115464285, 17, 1, 0.373917231759085422204691440128954127, 18, 1, 1.150594163370796341183677213848568499, 19, 1, -1.150594163370794564826837813598103821, 20, 1, -0.373917231759076207353587051329668611, 21, 1, 0.919500605164412787573269270069431514, 22, 1, 0.942199858426784109255436305829789490, 23, 1, -0.337189068461320018421645272610476241, 24, 1, -1.150594163370794786871442738629411906, 25, 1, -0.373917231759078649844241226674057543, 26, 1, 0.919500605164413675751688970194663852, 27, 1, 0.942199858426784331300041230861097574, 28, 1, 0.231093558206386162634515812897006981, 29, 1, 0.688407046958029344985163788805948570, 30, 1, 0.194365394908621258451830726698972285, 31, 1, -0.568282626667704571232775379030499607, 32, 1, -0.545583373405327365368577829940477386, 33, 1, 0.231093558206387883480203981889644638, 34, 1, 0.688407046958026791472207150945905596, 35, 1, 0.194365394908618399627542316920880694, 36, 1, -0.568282626667704349188170453999191523, 37, 1, 1.293417836923493435818954822025261819, 38, 1, 0.799376184874084239062597134761745110, 39, 1, -0.799376184874088790976998097903560847, 40, 1, -1.293417836923494323997374522150494158, 41, 1, 0.000000000000001110223024625156540424, 42, 1, 1.293417836923495212175794222275726497, 43, 1, 0.799376184874085238263319297402631491, 44, 1, -0.799376184874091344489954735763603821, 45, 1, -1.293417836923494101952769597119186074, 46, 1, 0.568282626667702239764423666201764718, 47, 1, -0.194365394908625976899685383614269085, 48, 1, -0.688407046958028567829046551196370274, 49, 1, -0.231093558206383831166164100068272091, 50, 1, 0.545583373405328919680812305159633979, 51, 1, 0.568282626667701462608306428592186421, 52, 1, -0.194365394908622868275216433175955899, 53, 1, -0.688407046958027457606021926039829850, 54, 1, -0.231093558206383942188466562583926134, 55, 1, -0.942199858426786107656880631111562252, 56, 1, -0.919500605164411677350244644912891090, 57, 1, 0.373917231759081591935256483338889666, 58, 1, 1.150594163370797229362096913973800838, 59, 1, 0.337189068461318353087108334875665605, 60, 1, -0.942199858426788106058324956393335015, 61, 1, -0.919500605164410345082615094725042582, 62, 1, 0.373917231759084978115481590066337958, 63, 1, 1.150594163370796785272887063911184669, 64, 1, -1.150594163370795897094467363785952330, 65, 1, -0.373917231759078705355392457931884564, 66, 1, 0.919500605164412787573269270069431514,

67, 1, 0.942199858426786995835300331236794591, 68, 1, -0.337189068461318464109410797391319647, 69, 1, -1.150594163370796785272887063911184669, 70, 1, -0.373917231759081203357197864534100518, 71, 1, 0.919500605164414008818596357741625980, 72, 1, 0.942199858426786995835300331236794591, 73, 1, 0.231093558206380805808421996516699437, 74, 1, 0.688407046958026569427602225914597511, 75, 1, 0.194365394908624922187811989715555683, 76, 1, -0.568282626667699353184559640794759616, 77, 1, -0.545583373405327809457787680003093556, 78, 1, 0.231093558206382471142958934251510072, 79, 1, 0.688407046958023904892343125538900495, 80, 1, 0.194365394908622035607947964308550581, 81, 1, -0.568282626667699242162257178279105574};

b = Table[sandia81b〚i+2〛, {i, 1, Length[sandia81b], 3}];

Clearly, the matrix is constructed from exact rationals. Sandia gave the OK to reconstruct those original rationals, which will allow us to scale both A and b to improve the statement of A. If we multiply A by 120, all the values become very close to integers. With tiny 3-point type, the entire matrix fits in a page width once we reconstruct the original intention of the matrix description.

A = Round[120 A]; b = 120 b; Style[MatrixForm[A], 3]

48 -12 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -12 48 -12 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -12 50 -13 0 0 0 0 0 0 0 -13 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -13 51 -12 0 0 0 0 0 0 0 -14 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -12 48 -12 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -12 48 -12 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -12 48 -12 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -12 48 -12 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -12 48 0 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 0 48 -12 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 -12 50 -13 0 0 0 0 0 0 0 -13 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -13 0 0 0 0 0 0 0 -13 62 -18 0 0 0 0 0 0 0 -18 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -14 0 0 0 0 0 0 0 -18 70 -15 0 0 0 0 0 0 0 -23 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 -15 54 -12 0 0 0 0 0 0 0 -15 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 -12 48 -12 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 -12 48 -12 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 -12 48 -12 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 -12 48 0 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 0 50 -13 0 0 0 0 0 0 0 -13 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -13 0 0 0 0 0 0 0 -13 62 -18 0 0 0 0 0 0 0 -18 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -18 0 0 0 0 0 0 0 -18 88 -26 0 0 0 0 0 0 0 -26 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -23 0 0 0 
0 0 0 0 -26 109 -27 0 0 0 0 0 0 0 -33 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -15 0 0 0 0 0 0 0 -27 84 -15 0 0 0 0 0 0 0 -27 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 -15 54 -12 0 0 0 0 0 0 0 -15 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 -12 48 -12 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 -12 48 -12 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 -12 48 0 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -13 0 0 0 0 0 0 0 0 51 -14 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -18 0 0 0 0 0 0 0 -14 70 -23 0 0 0 0 0 0 0 -15 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -26 0 0 0 0 0 0 0 -23 109 -33 0 0 0 0 0 0 0 -27 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -33 0 0 0 0 0 0 0 -33 132 -33 0 0 0 0 0 0 0 -33 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -27 0 0 0 0 0 0 0 -33 109 -23 0 0 0 0 0 0 0 -26 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -15 0 0 0 0 0 0 0 -23 70 -14 0 0 0 0 0 0 0 -18 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 -14 51 -12 0 0 0 0 0 0 0 -13 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 -12 48 -12 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 -12 48 0 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 0 48 -12 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -15 0 0 0 0 0 0 0 -12 54 -15 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -27 0 0 0 0 0 0 0 -15 84 -27 0 0 0 0 0 0 0 -15 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -33 0 0 0 0 0 0 0 -27 109 -26 0 0 0 0 0 0 0 -23 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -26 0 0 0 0 0 0 0 -26 88 -18 0 0 0 0 0 0 0 -18 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -18 0 0 0 0 0 0 0 -18 62 -13 0 0 0 0 0 0 0 -13 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 -13 0 0 0 0 0 0 0 -13 50 -12 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 -12 48 -12 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 -12 48 0 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 0 48 -12 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 -12 48 -12 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -15 0 0 0 0 0 0 0 -12 54 -15 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -23 0 0 0 0 0 0 0 -15 70 -18 0 0 0 0 0 0 0 -14 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -18 0 0 0 0 0 0 0 -18 62 -13 0 0 0 0 0 0 0 -13 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -13 0 0 0 0 0 0 0 -13 50 -12 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 -12 48 -12 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 -12 48 -12 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 -12 48 0 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 0 48 -12 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 -12 48 -12 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 -12 48 -12 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -14 0 0 0 0 0 0 0 -12 51 -13 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -13 0 0 0 0 0 0 0 -13 50 -12 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 -12 48 -12 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 -12 48 -12 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 -12 48 -12 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 -12 48 0 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 0 48 -12 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 -12 48 -12 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 -12 48 -12 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 -12 48 -12 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 -12 48 -12 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 -12 48 -12 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 -12 48 -12 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 -12 48 -12 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 -12 48 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 0 48 -12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 -12 48 -12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 -12 48 -12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 -12 48 -12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 -12 48 -12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 -12 48 -12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 -12 48 -12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 -12 48 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -12 0 0 0 0 0 0 0 -12

Here is the “correct answer” as supplied by Sandia using conventional double-precision floating point. We would expect it to have almost 15 decimals of accuracy, since this matrix appears to be diagonally dominant and well-conditioned.

sandia81x = { 1, 1, -0.104455895784591870634194776812364580, 2, 1, -1.861189570032811380073667351098265499, 3, 1, -1.722463578045537291316691153042484075, 4, 1, 0.335928566686036933486292355155455880, 5, 1, 1.862960898807013876421478926204144955, 6, 1, 0.549298777343198207923080644832225516, 7, 1, -1.656877198658314620516307513753417879, 8, 1, -1.714481606938342839185906996135599911, 9, 1, 0.068312196999079990988867905343795428, 10, 1, -4.239460279782576002105543011566624045, 11, 1, -3.674184857214861654739479490672238171, 12, 1, 1.111759246627821884345621583634056151, 13, 1, 3.207148543904415394223406110540963709, 14, 1, 1.110782517145542547254422061087097973, 15, 1, -3.691714857452915499180789993260987103,

16, 1, -3.518672015951896714369695473578758538, 17, 1, 1.614709043486125095157035502779763192, 18, 1, 4.298665976998501392358775774482637644, 19, 1, -3.757201781862992806537704382208175957, 20, 1, -1.124172480722375633277465567516628653, 21, 1, 2.027042297107821688939566229237243533, 22, 1, 1.958812797335875321991238706686999649, 23, 1, -0.442840862954280378716021004947833717, 24, 1, -3.486270124080625976148439804092049599, 25, 1, -1.145798999538387707985975794144906104, 26, 1, 3.654151502245384541822659230092540383, 27, 1, 4.005701033800839638843171996995806694, 28, 1, 1.207613547548265664488553738920018077, 29, 1, 1.960702164798509672394288827490527183, 30, 1, 0.666690577258767569368558270070934668, 31, 1, -0.426187910195240993171950094620115124, 32, 1, -0.518206507290501772189372786669991910, 33, 1, 0.967032808823422351807153063418809325, 34, 1, 2.506766957224376035640034388052299619, 35, 1, 0.946988879588824383759515512792859226, 36, 1, -1.352011928308372468165998725453391671, 37, 1, 4.604238066436249887658505031140521169, 38, 1, 2.842329174087169896267823787638917565, 39, 1, -1.273689738761294876212559756822884083, 40, 1, -1.745566229293452176207779302785638720, 41, 1, 0.490947894124782457936362334294244647, 42, 1, 3.837729653367927706852924529812298715, 43, 1, 2.621810099949757333348543397733010352, 44, 1, -2.964604961892273848889090004377067089, 45, 1, -4.677911359946111069518792646704241633, 46, 1, 1.432831174874628743509674677625298500, 47, 1, -0.666284164331339634479434153035981581, 48, 1, -1.638006058495278249864668396185152233, 49, 1, 0.027504355373328018968814845379711187, 50, 1, 1.832354004085601673068595118820667267, 51, 1, 1.723224163505964856568652976420708001, 52, 1, -0.978152798532918699159210973448352888, 53, 1, -2.755545618420651710067659223568625748, 54, 1, -1.460850180348866933499607512203510851, 55, 1, -3.889455469283416011450071891886182129, 56, 1, -3.358636998705621046212854707846418023, 57, 1, 1.737059099917829918524603272089734674, 58, 1, 4.385162424181196172412455780431628227, 59, 1, 1.261683756009564838507230888353660703, 60, 1, -3.667163415777232149395103988354094326, 61, 1, -3.558445890080517592934938875259831548, 62, 1, 1.265495936671723109512299743073526770, 63, 1, 3.900991839035131025781311109312810004, 64, 1, -4.210017469034814219241980026708915830, 65, 1, -1.420861409481443571323211472190450877, 66, 1, 3.820544715100204324897958940709941089, 67, 1, 3.995027085531709598598126831348054111, 68, 1, -1.183354416085125260238442024274263531, 69, 1, -4.673117108276066744565468979999423027, 70, 1, -1.658957231039541735384545972920022905,

71, 1, 3.735811098562083376606324236490763724, 72, 1, 4.293379966109708512078668718459084630, 73, 1, -0.023811363666440366754217805578264233, 74, 1, 1.803836432305244708018676647043321282, 75, 1, 1.775948032788597474862513081461656839, 76, 1, -0.464242965337303747919150964662549086, 77, 1, -1.945120712992527423068622738355770707, 78, 1, -0.677051736494404976518524108541896567, 79, 1, 1.599095293227147385195507922617252916, 80, 1, 1.848319670862301089542256704589817673, 81, 1, 0.114718342573754350510739641322288662};
xSandia = Table[sandia81x〚i+2〛, {i, 1, Length[sandia81x], 3}];
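The diagonal dominance and conditioning claims are easy to test once the scaled integer matrix A is in hand. The following check is a sketch added here, not part of the original notebook; it verifies weak diagonal dominance row by row and estimates the 2-norm condition number.

(* True if every diagonal entry is at least the sum of the other magnitudes in its row *)
And @@ Table[Abs[A〚i, i〛] ≥ Total[Abs /@ A〚i〛] - Abs[A〚i, i〛], {i, 1, Length[A]}]
(* 2-norm condition number of A; a modest value supports ~15 decimals from doubles *)
Norm[N[A]] Norm[Inverse[N[A]]]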

Start by using Mathematica’s built-in solver.

xMathematica = LinearSolve[A, b]

{-0.1044558957845920573040, -1.86118957003281196319, -1.72246357804553828948, 0.335928566686036934373, 1.86296089880701384199, 0.549298777343197910959, -1.65687719865831647889, -1.71448160693834490957, 0.0683121969990787677097, -4.23946027978257755345, -3.67418485721486384321, 1.11175924662782080735, 3.20714854390441654812, 1.11078251714554242806, -3.69171485745291701468, -3.51867201595189856087, 1.61470904348612376716, 4.29866597699850106763, -3.75720178186299434693, -1.12417248072237644273, 2.02704229710782123626, 1.95881279733587577159, -0.442840862954281432328, -3.48627012408062877627, -1.14579899953838883781, 3.65415150224538324940, 4.00570103380083832382, 1.20761354754826575611, 1.96070216479850998068, 0.666690577258767909891, -0.426187910195241457073, -0.518206507290503285732, 0.967032808823420814715, 2.50676695722437523496, 0.946988879588822986907, -1.35201192830837433473, 4.60423806643625006884, 2.84232917408717138038, -1.27368973876129490559, -1.74556622929345323775, 0.490947894124781296819, 3.83772965336792582994, 2.62181009994975544700, -2.96460496189227619827, -4.67791135994611515779, 1.43283117487462878067, -0.666284164331340091597, -1.63800605849527893376, 0.0275043553733275122839, 1.83235400408560142027, 1.72322416350596285971, -0.978152798532921308813, -2.75554561842065462432, -1.46085018034886907862, -3.88945546928341725220, -3.35863699870562182469, 1.73705909991782980962, 4.38516242418119775068, 1.26168375600956476576, -3.66716341577723592285, -3.55844589008052023490, 1.26549593667172266450, 3.90099183903513288952, -4.21001746903481488823, -1.42086140948144299107, 3.82054471510020632690, 3.99502708553171107656, -1.18335441608512685254, -4.67311710827607002138, -1.65895723103954292163, 3.73581109856208284656, 4.29337996610971011945, -0.0238113636664403387097, 1.80383643230524547531, 1.77594803278859953674, -0.464242965337302877141, -1.94512071299252859001, -0.677051736494406535791, 1.59909529322714775680, 1.84831967086230143571, 0.114718342573754783384}

Compute the residual to check whether this is an accurate answer:

b - A.xMathematica

0. × 10-20, 0. × 10-19, 0. × 10-19, 0. × 10-20, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-20, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-20, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-19, 0. × 10-20

That looks like an accurate answer all right! Maybe we can make a 3D plot of it.

ans2D = TablexMathematica〚8 i+j+1〛, {i, 0, 8}, {j, 0, 8}; ListPlot3D[ans2D, ColorFunction → "Rainbow"]

Let’s attempt the problem with mere 16-bit posits and see what happens.

xposit = linsolverp[A, b]; N[xposit, 3]

{-0.104, -1.86, -1.72, 0.336, 1.86, 0.549, -1.66, -1.71, 0.0683, -4.24, -3.67, 1.11, 3.21, 1.11, -3.69, -3.52, 1.61, 4.30, -3.76, -1.12, 2.03, 1.96, -0.443, -3.49, -1.15, 3.65, 4.01, 1.21, 1.96, 0.667, -0.426, -0.518, 0.967, 2.51, 0.947, -1.35, 4.60, 2.84, -1.27, -1.75, 0.491, 3.84, 2.62, -2.96, -4.68, 1.43, -0.666, -1.64, 0.0275, 1.83, 1.72, -0.978, -2.76, -1.46, -3.89, -3.36, 1.74, 4.39, 1.26, -3.67, -3.56, 1.27, 3.90, -4.21, -1.42, 3.82, 4.00, -1.18, -4.67, -1.66, 3.74, 4.29, -0.0238, 1.80, 1.78, -0.464, -1.95, -0.677, 1.60, 1.85, 0.115}

Here’s the residual:

res1 = b - A.xposit; N[res1, 4]

-7.760 × 10-8, 1.954 × 10-7, -6.711 × 10-7, -5.065 × 10-8, -2.789 × 10-7, -9.717 × 10-7, 8.363 × 10-8, 3.854 × 10-8, -4.809 × 10-7, 4.502 × 10-9, 1.537 × 10-7, -5.900 × 10-7, -6.337 × 10-6, -1.665 × 10-6, 6.080 × 10-7, -2.879 × 10-6, 1.615 × 10-6, -1.357 × 10-6, 1.580 × 10-6, -1.645 × 10-6, -2.195 × 10-6, -7.384 × 10-7, -5.032 × 10-7, -2.913 × 10-6, -9.544 × 10-8, 2.298 × 10-6, -7.868 × 10-7, 1.256 × 10-6, 2.819 × 10-6, 1.695 × 10-8, 8.823 × 10-7, -6.859 × 10-7, 2.481 × 10-6, 1.288 × 10-6, 9.893 × 10-7, 8.376 × 10-7, -1.551 × 10-7, -3.911 × 10-6, 7.724 × 10-8, -1.542 × 10-6, -1.319 × 10-6, -3.277 × 10-6, -2.309 × 10-6, 5.355 × 10-7, 1.317 × 10-6, -1.016 × 10-6, 7.095 × 10-7, 1.173 × 10-6, 4.541 × 10-7, -5.658 × 10-7, 1.442 × 10-6, 6.201 × 10-7, 6.867 × 10-7, 1.128 × 10-6, -1.538 × 10-6, 3.872 × 10-6, 1.849 × 10-7, -2.187 × 10-6, -1.572 × 10-6, -3.755 × 10-7, 1.368 × 10-6, 9.545 × 10-8, -5.743 × 10-7, 3.368 × 10-7, -5.425 × 10-7, -6.082 × 10-7, -8.762 × 10-7, 1.115 × 10-7, -3.538 × 10-6, -4.208 × 10-6, -4.453 × 10-6, 1.370 × 10-6, 8.976 × 10-8, 9.086 × 10-7, -1.693 × 10-6, 7.760 × 10-8, -1.263 × 10-6, 2.127 × 10-7, 3.414 × 10-7, 4.640 × 10-7, 3.289 × 10-8

If we use that to amend the answer, we get a result that matches the one supplied by Sandia, to the precision of 16-bit posits:

xposit = xposit + linsolverp[A, res1];
N[xposit, 3]
N[xSandia, 3]

ans2Dp = Tablexposit〚8 i+j+1〛, {i, 0, 8}, {j, 0, 8}; ListPlot3D[ans2Dp, ColorFunction → "Rainbow"] {-0.104, -1.86, -1.72, 0.336, 1.86, 0.549, -1.66, -1.71, 0.0683, -4.24, -3.67, 1.11, 3.21, 1.11, -3.69, -3.52, 1.61, 4.30, -3.76, -1.12, 2.03, 1.96, -0.443, -3.49, -1.15, 3.65, 4.01, 1.21, 1.96, 0.667, -0.426, -0.518, 0.967, 2.51, 0.947, -1.35, 4.60, 2.84, -1.27, -1.75, 0.491, 3.84, 2.62, -2.96, -4.68, 1.43, -0.666, -1.64, 0.0275, 1.83, 1.72, -0.978, -2.76, -1.46, -3.89, -3.36, 1.74, 4.39, 1.26, -3.67, -3.56, 1.27, 3.90, -4.21, -1.42, 3.82, 4.00, -1.18, -4.67, -1.66, 3.74, 4.29, -0.0238, 1.80, 1.78, -0.464, -1.95, -0.677, 1.60, 1.85, 0.115}

{-0.104, -1.86, -1.72, 0.336, 1.86, 0.549, -1.66, -1.71, 0.0683, -4.24, -3.67, 1.11, 3.21, 1.11, -3.69, -3.52, 1.61, 4.30, -3.76, -1.12, 2.03, 1.96, -0.443, -3.49, -1.15, 3.65, 4.01, 1.21, 1.96, 0.667, -0.426, -0.518, 0.967, 2.51, 0.947, -1.35, 4.60, 2.84, -1.27, -1.75, 0.491, 3.84, 2.62, -2.96, -4.68, 1.43, -0.666, -1.64, 0.0275, 1.83, 1.72, -0.978, -2.76, -1.46, -3.89, -3.36, 1.74, 4.39, 1.26, -3.67, -3.56, 1.27, 3.90, -4.21, -1.42, 3.82, 4.00, -1.18, -4.67, -1.66, 3.74, 4.29, -0.0238, 1.80, 1.78, -0.464, -1.95, -0.677, 1.60, 1.85, 0.115}

Any discrepancy in the answer is far too small to see. With the residual correction method, there appears to be no reason to use posit precision higher than 16 bits, which obviously permits the solution of much larger problems than if we had to use 64-bit data for everything. The quire allows 16-bit posits to do the work of 64-bit floats.
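The residual-correction recipe just used (solve, form the residual exactly with the quire, solve again for a correction) generalizes to a loop. Here is a minimal sketch, assuming the linsolverp solver from the earlier section; the name positRefine and the iteration count are introduced here for illustration only.

positRefine[A_, b_, iterations_: 3] :=
 Module[{x = linsolverp[A, b], r},
  Do[
   r = b - A.x;              (* residual; exact if accumulated in the quire *)
   x = x + linsolverp[A, r], (* solve for the correction and apply it *)
   {iterations}];
  x]

A call like positRefine[A, b] reproduces the two-pass answer above whenever the first correction already converges.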

12.6 The Time Complexity of Unum Linear Solvers

The A and b inputs to A x = b problems can be approximate in real applications. There is a theorem from interval arithmetic that says that finding the exact mathematical description of the entire set of x values that solves the linear equations with interval bounds on the inputs is NP-hard. There is an easy way to understand why this is true: Every row of an exact matrix specifies a hyperplane in n-space, and the solution is the intersection of n hyperplanes. For example, a 2-by-2 system of linear equations simply represents the intersection of two lines.

[Figure: two lines in the x-y plane crossing at a single point, the solution of the 2-by-2 system.]

When the equations are specified in a number system, whether floats or posits, the exact intersection point has coordinates that in general are not floats or posits, though they are rational numbers. It would be a coincidence if the denominators were powers of 2 such that they land in the numerical vocabulary. There is a unique closest point to the answer, found simply by rounding each element of the answer vector.
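A 2-by-2 example makes this concrete (the numbers here are hypothetical, chosen only for illustration). The exact intersection has denominators of 5, so neither coordinate is exactly representable in any binary format, and each element must be rounded.

(* Exact intersection of the lines 3x + y = 5 and x + 2y = 4 *)
LinearSolve[{{3, 1}, {1, 2}}, {5, 4}]      (* {6/5, 7/5}, exact rationals *)
N[LinearSolve[{{3, 1}, {1, 2}}, {5, 4}]]   (* rounds each element to the nearest machine number *)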

If the matrix coefficients are intervals instead of exact points, then each row represents a “hyperslab,” a region in n-space bounded by hyperplanes, and the solution is the intersection of n hyperslabs. The only way to find that intersection is to examine every possible combination of minimum and maximum values for the bounds, or 2ⁿ hyperplanes per row. A rigorous bound means examining the minimum and maximum of each of the n² matrix elements and n right-hand-side elements, a total of 2n² + n cases. Even when n is as small as 2, the exact geometrical intersection representing the result can be laborious to compute, as the following figure indicates.

[Figure: the two lines from the previous figure widened into slabs by the interval bounds; the solution set is the polygonal region where the slabs overlap.]
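To see where the count comes from, here is a sketch (with a hypothetical interval row) that enumerates the extreme hyperplanes of a single row: one choice of lower or upper bound per coefficient, 2ⁿ combinations in all.

(* The 2² = 4 extreme versions of a row whose two coefficients are the
   intervals [9/10, 11/10] and [19/10, 21/10] *)
Tuples[{{9/10, 11/10}, {19/10, 21/10}}]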

What about simply finding a particular solution x that seems to solve A x = b, and then perturbing x to find other solutions that work, iterating until we have found them all? This is one of the ubox-based methods, computing with multidimensional vectors whose components are each either exact or an open interval one ULP wide; such a vector is called a ubox. In the figure below, the black dot is a first guess produced by whatever solver you prefer (and checked to see that it really does solve the equations after rounding). That is an exact ubox in all dimensions, and the seed of an exploration. By testing the neighbor points to see if they also satisfy A x = b (where “=” means interval intersection with the right-hand side vector), we can “paint” the interior of the complex shape to find all the uboxes that make up the solution set, shown in light green below. The red border represents the uboxes that provably do not satisfy A x = b; that is, the exact matrix A times the ubox x results in intervals that do not intersect all components of b.

[Figure: the seed point (black dot), the painted solution set (light green), and the border of rejected uboxes (red).]
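Here is a deliberately simplified sketch of that painting search. The name uboxPaint is ours, and it samples exact grid points one ULP apart rather than propagating true open intervals, so it illustrates the flood-fill idea only; run it with exact (rational) seed, ulp, and bounds so the membership test is exact, and note that it assumes the solution region is bounded.

uboxPaint[A_, bLo_, bHi_, seed_, ulp_] :=
 Module[{accepted = {}, frontier = {seed}, x, y},
  While[frontier =!= {},
   x = First[frontier]; frontier = Rest[frontier];
   If[! MemberQ[accepted, x],
    y = A.x;
    (* keep x if A.x lies within the bounds of b in every component *)
    If[And @@ MapThread[#1 ≤ #2 ≤ #3 &, {bLo, y, bHi}],
     AppendTo[accepted, x];
     (* enqueue the 3ⁿ ULP-neighbors of the newly painted point *)
     frontier = Join[frontier, (x + #) & /@ Tuples[{-ulp, 0, ulp}, Length[x]]]]]];
  accepted]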

While this approach has the advantage that it is easily done on a massively parallel computer, and it is much simpler to program than intersecting hyperplanes with computational geometry, notice that perturbing x by an ULP in each component means 3ⁿ perturbations, since each component can go down by an ULP, stay the same, or go up by an ULP. It works up to some value of n, but not a very large n. For LINPACK with n = 100, having to test 3¹⁰⁰ input vectors (uboxes) is clearly intractable. It doesn't matter whether the matrix is dense or sparse. At first, it looks like this is an impasse to getting reasonable time complexity for ubox-type solvers.

The LINPACK example leads to an insight, however: In Section 12.1, we perturbed the A matrix such that the exact answer matched a preconceived solution of all 1 values for the xᵢ. That's a form of reverse error analysis: The approximate solution to a problem can be viewed as the exact answer to a different problem, and if that problem is only slightly different from the one posed, then we might accept the answer. This argument dates back to classic 1950s papers on numerical analysis by Wilkinson, who created existence proofs for when some other problem lies within some distance of the problem originally posed. But reverse error analysis does not actually construct the altered problem; it only states that it exists under certain conditions.

With unum arithmetic, we can explicitly construct the altered problem for reverse error analysis. Furthermore, this allows us to make the time complexity of finding perfect, bounded solutions the same as that of current numerical methods for solving A x = b.

The original random matrix A for the LINPACK benchmark was altered so that the solution vector was expressible as an exact point in n-space using the posit vocabulary. Every perturbation of that solution vector fails to solve the linear system! So we're done. There is only one “ubox” that solves the system, and it is exact in all dimensions and exactly 1 in every component. There are no ULP-wide ranges. This cannot be achieved with traditional interval arithmetic, incidentally; traditional interval arithmetic uses closed intervals of the form [a, b] where a ≤ b, and all 3ⁿ − 1 neighbors of the exact point will also yield a vector that touches b in every component, so they still create intractable time complexity. With unum arithmetic, valids are open intervals, and do not include the endpoint that satisfies A x = b. That means any open-interval neighbor of the exact solution is part of that red boundary in the diagram above. The iterative method is O(n³), since the original Gaussian elimination (LU factorization) is O(n³) and the iterations are O(n²), since they depend only on evaluating the matrix-vector product using the quire and backsolving with U⁻¹ and L⁻¹ (the triangular matrices formed by Gaussian elimination), all of which require only O(n²) work. Hence, the time complexity of solving LINPACK is O(n³) for an n-by-n system, even if the iterative method is used to obtain a perfect, single exact solution. That result appears to be new.

However, for real problems, we do not generally have the option of tweaking matrix entries. The Sandia test data is a good example of that. Remember that after scaling by 120, the matrix was all integers:



[Matrix display: A is a large sparse banded matrix. Each row has a positive diagonal entry (48 in most rows, rising to values such as 50, 54, 62, 70, 84, 88, 109, and 132 in a few rows) and small negative entries (mostly -12, ranging down to -33) in the stencil-neighbor positions; all other entries are zero.]

That’s because it results from expressing stencil operators like

xi,j = xi-1,j + xi+1,j + xi,j-1 + xi,j4

We should regard the matrix A as exact and “untweakable,” except in the case of LINPACK. That would seem to dash any hope of constructing the reverse-error-analysis problem.

Or does it?

Remember, Type III unums have the quire data type. When you try to solve A x = b you find an approximate x, no matter whether you are using Gaussian elimination or Conjugate Gradient or Successive Over-Relaxation, or whatever your favorite method is. Call that x̃, a vector of posit elements that we hope results in A x̃ being close to b.

Then, using exact quire dot products, we can compute b̃ = A x̃. Exactly. The answer is not only x̃, but also the explicitly constructed problem that reverse error analysis requires:

A x̃ = b̃ is a problem close to the original problem, A x = b, and its solution is exactly x = x̃.

There is no exponential complexity to bounding the solution. The complexity is whatever the traditional matrix solver method costs, and then the perturbed version of the original problem becomes part of the solution. That is, both x̃ and the altered b̃ are computed and treated as output for the problem. The user can then decide whether b̃ is sufficiently close to the requested b. This is a new paradigm for the solution of A x = b problems, and it is made possible only because of the quire data type.
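
To make the paradigm concrete, here is a minimal Mathematica sketch (names hypothetical; exact rational arithmetic stands in for the quire, since posits are not a built-in type). It solves A x = b approximately in rounded arithmetic, promotes the stored x̃ to its exact value, computes b̃ = A x̃ with no rounding at all, and reports how far the perturbed right-hand side is from the requested one:

    (* Reverse error analysis via exact dot products; A and n are from the stencil sketch above. *)
    SeedRandom[1];
    b = RandomInteger[{-5, 5}, n^2];            (* an exact right-hand side *)
    xApprox = LinearSolve[N[A], N[b]];          (* rounded solve: the computed x~ *)
    xTilde = SetPrecision[xApprox, Infinity];   (* the exact values actually stored in x~ *)
    bTilde = A . xTilde;                        (* exact, as a quire dot product would be *)
    Norm[N[bTilde - b], Infinity]               (* distance of the perturbed problem from the original *)

By construction, x̃ solves A x = b̃ exactly, so the pair (b̃, x̃) is the explicitly constructed problem that reverse error analysis asks for; in a posit implementation, the only change is that the dot products in the b̃ = A x̃ step accumulate in the quire.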

Sections on linear solvers were supported in part by the DARPA TRADES Program, Contract #HR0011-17-9-0007.