Number Systems © Noel Murphy, DCU, 2000

A number system is just a way of writing numbers. We use a positional number system with ten symbols, 0 - 9, where the position of a symbol indicates its value. So for example, the 9 in 911 represents nine hundred things, while the 9 in 119 only represents nine things. It was not always like this.

Egyptian Numbers Around 5000 BC (2000 years before Newgrange was built and 3000 years before the pyramids were built) the Egyptians had a number system based on powers of ten, but with different symbols for ten, one hundred, one thousand, etc. Instead of a symbol to represent the number three, they would write the appropriate symbol three times.

Modern: 1 10 100 1000 10,000 100,000 1,000,000

Egyptian: [hieroglyphic symbols not reproduced: a vertical stroke for one, a heel bone for ten, a coiled snare for one hundred, and so on]

Thus the number 234 (for us) would be written by an ancient Egyptian as two snares, three heel bones and four vertical strokes.

Roman Numerals The Romans used a number system based on the value five, represented by the letter V (which probably represented a hand with the fingers together and the thumb at an angle to them). This is called a quintal system from the Latin quintus meaning a fifth. Again there are special symbols for the bigger numbers, like X for ten, L for fifty, C for one hundred, D for five hundred and M for one thousand. It was also positional, but not in the modern sense. The symbols to the left of a larger symbol were to be subtracted to find the value represented, while ones appearing on the right were to be added. Thus IV represents five minus one, or four, while VI represents five plus one, or six.

Babylonian Number Systems Around 3000 BC, the Babylonians in Mesopotamia (modern Iraq) used a number system based on the value sixty. They used two wedge-shaped (cuneiform) marks, one for one and one for ten, where the position of the symbol indicated its value. Thus two unit marks in the sixties position, followed by a ten mark and a unit mark, represented two sixties plus ten plus one (131 in our system). The Babylonian use of sixty as their base number is the origin of our use of 60 seconds in a minute, and 60 minutes in an hour. It is also the origin of our convention of 360 degrees in a complete circle and use of “minutes of arc” and “seconds of arc” for fractions of a degree. The Babylonians were very interested in astronomy, and the two things you need to study astronomy are measures of time and angles.

It was 300 BC before the Babylonians started to use a special symbol to denote an empty space between other positional symbols (i.e., to indicate a zero, as in 105, meaning one hundred, no tens and five units in our system). However, they still did not use zeros on the right hand side of a number to give the scale of a number. Instead they used a sophisticated system that we now call floating point, where the scale of a number is known or given separately. Thus they would write the numbers 1/30, 2, 120 and 7200 in exactly the same way, with the scale being inferred from the context. This system allowed them to carry out multiplication very easily, whereas the later Roman system made multiplication almost impossible. Actually the Babylonians also used numbers written in groupings based on tens and hundreds for normal business transactions, and used the base sixty numbers just for the more difficult problems.
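The scale-free character of this notation can be sketched in modern terms. The following Python snippet (an illustration added here, with a function name of our own choosing) shows that 2, 120 and 7200 share the same sexagesimal digits once trailing zeros are dropped, which is why the Babylonians could write them identically:

```python
def base60_digits(n):
    """Return the sexagesimal (base-60) digits of a positive integer,
    most significant digit first."""
    digits = []
    while n > 0:
        digits.append(n % 60)
        n //= 60
    return digits[::-1]

# 2, 120 and 7200 differ only by factors of sixty, so once trailing
# zero digits are dropped (as the Babylonians dropped them), all three
# are written with the same single digit "2".
for n in (2, 120, 7200):
    digits = base60_digits(n)
    while digits and digits[-1] == 0:
        digits.pop()
    print(n, digits)
```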

The History Of Our Number System Our number system, with its zero and base (or radix) of ten, was first used in India in AD 595, though the earliest use of this system with a zero is from AD 876. Again, the main motivation for the use of a proper number system was astronomy. The system was adopted by the Moors, and later was introduced to Europe, principally through translations of a book called Algebra by an Arab mathematician called al-Khwarizmi. His name has given us the word “algorithm”, meaning a procedure, particularly in the context of computer programs. (The word “algebra” is from the Arabic “al-jabr w’al-muqabala”, meaning “restoration and reduction”.) The use of positional notation for fractions was introduced around 1579 by Francois Viete. Before this, fractions were usually represented in the “degrees, minutes, seconds” notation that we still use for fractions of an angle. The dot for the decimal point was popularised in the English-speaking world by the Scottish mathematician John Napier, who invented logarithms. (In continental Europe, a comma is usually used instead of a dot for the decimal “point”.) The development of decimal fractions during the Middle Ages was often motivated by attempts to find accurate representations of the value of π. A Persian mathematician who died c.1436 calculated the value of 2π correct to 16 decimal places.

The Denary System In our system, a denary system, the positions of the symbols to the left of the decimal point represent increasing powers of ten, while the digits to the right of the decimal point represent decreasing fractional powers of ten. So

735.24 = 7×10² + 3×10¹ + 5×10⁰ + 2×10⁻¹ + 4×10⁻²

The Binary System In this century, the inventors of the computer, such as the Hungarian mathematician John von Neumann, decided that as the simplest electronic circuit can be in just two states, ON or OFF, it would be useful to represent numbers in a binary system with two symbols or digits, “1” and “0”. The binary system used is exactly the same as the denary system except that powers of two are used instead of powers of ten. So

1101.11₂ = 1×2³ + 1×2² + 0×2¹ + 1×2⁰ + 1×2⁻¹ + 1×2⁻²
= 1×8 + 1×4 + 0×2 + 1×1 + 1×(1/2) + 1×(1/4)
= 13.75₁₀
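The same place-value expansion can be sketched in Python (the function name is ours, purely for illustration):

```python
# Evaluate a binary numeral digit by digit, exactly as in the expansion
# above: each digit is multiplied by the power of two of its position.
def binary_to_denary(s):
    whole, _, frac = s.partition('.')
    value = sum(int(bit) * 2**i for i, bit in enumerate(reversed(whole)))
    value += sum(int(bit) * 2**-(i + 1) for i, bit in enumerate(frac))
    return value

print(binary_to_denary('1101.11'))  # 13.75
```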

To convert from base ten to base two, we have to deal with the two sides of the binary point separately (it's not a decimal point any more).¹ For the number to the left of the binary point we repeatedly divide by 2 until there is nothing left. The binary whole number is then the remainders read from the bottom up.

13₁₀ ÷ 2 = 6 remainder 1

6₁₀ ÷ 2 = 3 remainder 0

3₁₀ ÷ 2 = 1 remainder 1

1₁₀ ÷ 2 = 0 remainder 1

That is, 13₁₀ = 1101₂.
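The repeated-division method above can be sketched directly in Python (an illustration with names of our own choosing):

```python
def to_binary_whole(n):
    """Convert a positive integer to a binary string by repeated division
    by 2, collecting the remainders and reading them from the bottom up."""
    remainders = []
    while n > 0:
        remainders.append(n % 2)   # the remainder at this step
        n //= 2                    # the quotient carries to the next step
    return ''.join(str(r) for r in reversed(remainders))

print(to_binary_whole(13))  # 1101
```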

¹ The general term for the dot used to separate the fractional part of a number from the integer part, applicable to all number bases, is the radix point. “Radix” is another word for the base to which a positional number system is defined.

For the binary digits to the right of the binary point we repeatedly multiply by 2, and take the fractional part for the next step, until there is nothing left. The binary fractional number is then the set of whole ones and zeros we got in the multiplication process, reading from the top.

0.6875₁₀ × 2 = 1.375 = 1 + 0.375

0.375₁₀ × 2 = 0.75 = 0 + 0.75

0.75₁₀ × 2 = 1.5 = 1 + 0.5

0.5₁₀ × 2 = 1.0 = 1 + 0.0

That is, 0.6875₁₀ = 0.1011₂.
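The repeated-multiplication method can likewise be sketched in Python (the digit limit is our own safeguard, since some fractions never terminate in binary):

```python
def to_binary_fraction(x, max_digits=16):
    """Convert a fraction 0 <= x < 1 to a binary string by repeated
    doubling, taking the whole part as the next digit, read from the top."""
    digits = []
    while x > 0 and len(digits) < max_digits:
        x *= 2
        digit = int(x)       # the whole part (0 or 1) is the next digit
        digits.append(str(digit))
        x -= digit           # the fractional part carries to the next step
    return '0.' + ''.join(digits)

print(to_binary_fraction(0.6875))  # 0.1011
```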

In principle we can carry out this process to convert between base ten (denary) numbers and numbers in any other base, but in practice only a few other bases have had any widespread usage. The most popular ones in computer engineering are base eight (or octal) and base sixteen (or hexadecimal).

Binary number systems are actually much older than our “computer age”. The medieval English units for the measure of quantities of liquid went as follows:

2 gills = 1 chopin
2 chopins = 1 pint
2 pints = 1 quart
2 quarts = 1 pottle
2 pottles = 1 gallon
2 gallons = 1 peck
2 pecks = 1 demibushel
2 demibushels = 1 bushel or firkin
2 firkins = 1 kilderkin
2 kilderkins = 1 barrel
2 barrels = 1 hogshead
2 hogsheads = 1 pipe
2 pipes = 1 tun

How many do you recognise?

The first published work on the binary system as such was by a Spanish bishop, Juan Caramuel, in 1670. Charles XII of Sweden began, in 1717, to promote the use of base 8 or base 64 for calculation, but died in battle before he could make a decree to this effect.

Hexadecimal Engineers usually use the word “bit” as a contraction for the term “binary digit”. Over the last twenty years, engineers and computer scientists have designed and built computers with things internally grouped in multiples of eight bits, sixteen bits, thirty-two bits, and more recently sixty-four bits. These are all multiples of four bits and it is always possible to represent four bits with one hexadecimal digit, so for convenience, engineers and computer scientists usually use hexadecimal, or base sixteen numbers, at least at the level of the “nuts and bolts” of the computer. The problem with base sixteen is that we need sixteen different symbols or digits. (In base ten these are 0-9 and in base two they are 0 and 1). Usually we use the letters A to F to represent the numbers ten to fifteen, so the digits are 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F. So

A1CF₁₆ = 10×16³ + 1×16² + 12×16¹ + 15×16⁰
= 10×4096 + 1×256 + 12×16 + 15×1
= 40960 + 256 + 192 + 15
= 41423₁₀
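The same shift-and-add evaluation works in base sixteen, with the letters A to F standing for ten to fifteen. A Python sketch (the function name is our own):

```python
# Map each hexadecimal digit character to its value, A-F meaning 10-15.
digit_values = {c: i for i, c in enumerate('0123456789ABCDEF')}

def hex_to_denary(s):
    value = 0
    for c in s:
        # Shift left one hexadecimal place, then add the next digit.
        value = value * 16 + digit_values[c]
    return value

print(hex_to_denary('A1CF'))  # 41423
```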

The biggest number we can write with four hexadecimal digits (or sixteen bits) is FFFF₁₆ = 65535₁₀. You will often find that this number, or half of it, 32767₁₀, is the largest integer you can normally use in some older computers or in some application programs. (The number 32767₁₀ usually appears when the underlying computer representation of the number is signed, i.e. it can be positive or negative. The sixteenth bit is used as a sign bit, instead of being available to represent numbers between 32768₁₀ and 65535₁₀.) The following table gives the numbers zero to sixteen in four different bases:

Denary  Binary   Octal  Hexadecimal
0₁₀     0₂       0₈     0₁₆
1₁₀     1₂       1₈     1₁₆
2₁₀     10₂      2₈     2₁₆
3₁₀     11₂      3₈     3₁₆
4₁₀     100₂     4₈     4₁₆
5₁₀     101₂     5₈     5₁₆
6₁₀     110₂     6₈     6₁₆
7₁₀     111₂     7₈     7₁₆
8₁₀     1000₂    10₈    8₁₆
9₁₀     1001₂    11₈    9₁₆
10₁₀    1010₂    12₈    A₁₆
11₁₀    1011₂    13₈    B₁₆
12₁₀    1100₂    14₈    C₁₆
13₁₀    1101₂    15₈    D₁₆
14₁₀    1110₂    16₈    E₁₆
15₁₀    1111₂    17₈    F₁₆
16₁₀    10000₂   20₈    10₁₆

The word hexadecimal is actually a peculiar mixture of Latin and Greek stems. The word octal only appeared in dictionaries in the early 1960s. Previously, the proper term octonal was used.

Mixed Radix Numbers Number systems with a fixed radix (or base) such as 2, 8, 10 or 16 are not the only possibilities. Two examples of mixed radix systems are quantities of time and “old” currency. For example, 2 years, 45 weeks, 4 days, 14 hours, 45 minutes, 55 seconds and 765 milliseconds, written with the various bases in place, would be 2₁₀ 45₅₂ 4₇ 14₂₄ 45₆₀ 55₆₀ · 765₁₀₀₀ seconds. (Why can we not use months?) Also, 10 pounds, 6 shillings and thruppence ha’penny would be 10₁₀ 6₂₀ 3₁₂ · 1₂ pounds.

Fixed-point system In computers, there are two distinct ways of dealing with numbers that might have a fractional part. A fixed-point representation is often used in fast digital signal processing (DSP) chips. However, fixed-point notation is quite constrained, so it is only appropriate in applications where the algorithm to be used is completely determined. In fixed-point representation, all numbers are given with a fixed number of decimal places. For example, 3.141, 2.718 and 1.410 each have a fixed three decimal places.
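A mixed radix value can still be evaluated by the usual shift-and-add rule, just with a different radix at each position. A rough Python sketch of the time example (treating a year as exactly 52 weeks, an assumption the example itself makes; months cannot be used precisely because their lengths are irregular):

```python
# Radix of each position: weeks-per-year 52, days-per-week 7,
# hours-per-day 24, minutes-per-hour 60, seconds-per-minute 60,
# milliseconds-per-second 1000.
radices = [52, 7, 24, 60, 60, 1000]
digits = [2, 45, 4, 14, 45, 55, 765]  # years, weeks, ..., milliseconds

value = digits[0]
for radix, digit in zip(radices, digits[1:]):
    # The same shift-and-add as any positional system, but the "shift"
    # multiplies by a different radix at each step.
    value = value * radix + digit

print(value / 1000, "seconds")
```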

Floating-point system The more common representation used in computers for values that might need to be extremely small or extremely large is some version of a floating-point representation. Here all numbers are given with a fixed number of significant digits.² For example, 1.964 × 10³ and 9.891 × 10⁻⁸ each have four significant digits.

The representation of a floating-point number in a computer consists of three parts: the sign, the fractional part (mantissa), and the exponent part (characteristic). Most floating-point numbers x are normalized to the following form:

x = ±(1 + f) × 2ᵉ

where f is the fraction or mantissa and e is the exponent. The fraction must satisfy 0 ≤ f < 1. In a 32-bit floating-point representation, we could, for example, use 1 bit for the sign, 24 bits for the mantissa (corresponding to 6 hexadecimal digits, or about 8 significant decimal digits), and 7 bits for the (base 16) exponent, allowing numbers in the range 16⁻⁶⁴ to 16⁶³. The finite number of bits for the mantissa f is a limitation on precision (the number of significant digits). The finite number of bits for the exponent e is a limitation on range.

To illustrate the fact that floating-point number representations do not evenly occupy the interval within their range, consider the toy floating-point system with three bits for the fraction f and three bits for the exponent e. Then between 2ᵉ and 2ᵉ⁺¹, eight equally-spaced numbers are represented. However, as e increases, the spacing between the numbers represented by the fraction part increases. This would be an issue if, for example, we subtracted a very small floating-point number from a very large one. The small value may not be able to move the large value to the next lower number represented, and the operation will thus have no effect. The diagram below illustrates the spacing between all the positive numbers capable of representation in this toy floating-point system. Notice, for example, that we cannot represent 6.25 in this system, only 6.0 and 6.5. On the other hand, not only can we represent 1.25, but even 1.125. (So 6.5 − 0.125 = 6.5, but 1.5 − 0.125 = 1.375 in this system.)
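A quick way to see this spacing is to enumerate the toy system by brute force in Python (the exponent range below is our own illustrative choice):

```python
# Every positive value of the toy system: a 3-bit fraction f (eight
# values per octave) and an exponent e over a small illustrative range.
values = sorted({(1 + f / 8) * 2.0**e for e in range(-2, 4) for f in range(8)})

# Between 2^e and 2^(e+1) there are always eight equally spaced values,
# but the spacing doubles each time e increases by one.
print(6.25 in values)   # False: only 6.0 and 6.5 exist at this scale
print(1.125 in values)  # True: the spacing near 1 is finer (0.125)
```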

² A significant digit is any given digit of a number, except for zeros to the left of the first nonzero digit, which serve only to fix the position of the decimal point. For example, 1.732, 0.01992 and 1980 each have 4 significant digits.

The IEEE (Institute of Electrical and Electronics Engineers) has actually produced a standard for floating point arithmetic. This standard specifies how single precision (32 bit) and double precision (64 bit) floating point numbers are to be represented, as well as how arithmetic should be carried out on them.

IEEE Single Precision Floating-point The IEEE single precision floating point standard representation requires a 32-bit word, the bits of which may be numbered from 0 to 31, left to right. The first bit is the sign bit, S, the next eight bits are the exponent bits, E, and the final 23 bits are the fraction F:

S EEEEEEEE FFFFFFFFFFFFFFFFFFFFFFF
0 1      8 9                    31

The value V represented by the word may be determined as follows:

If E = 255 and F is nonzero, then V = NaN (Not a Number, meaning an undefined number like 0/0).
If E = 255 and F is zero and S is 1, then V = −∞ (negative infinity).
If E = 255 and F is zero and S is 0, then V = ∞ (positive infinity).
If 0 < E < 255, then V = (−1)^S × 2^(E−127) × (1.F), where 1.F is the binary number formed by prefixing the fraction F with an implicit leading 1 and binary point. These are the normalized values.
If E = 0 and F is nonzero, then V = (−1)^S × 2⁻¹²⁶ × (0.F). These are the denormalized values.
If E = 0 and F is zero and S is 1, then V = −0.
If E = 0 and F is zero and S is 0, then V = 0.

In particular,

0 00000000 00000000000000000000000 = 0
1 00000000 00000000000000000000000 = −0
0 11111111 00000000000000000000000 = Infinity
1 11111111 00000000000000000000000 = −Infinity
0 11111111 00000100000000000000000 = NaN
1 11111111 00100010001001010101010 = NaN
0 10000000 00000000000000000000000 = (+1) × 2^(128−127) × 1.0 = 2
0 10000001 10100000000000000000000 = (+1) × 2^(129−127) × 1.101₂ = 6.5
1 10000001 10100000000000000000000 = (−1) × 2^(129−127) × 1.101₂ = −6.5
0 00000001 00000000000000000000000 = (+1) × 2^(1−127) × 1.0 = 2⁻¹²⁶
0 00000000 10000000000000000000000 = (+1) × 2⁻¹²⁶ × 0.1₂ = 2⁻¹²⁷
0 00000000 00000000000000000000001 = (+1) × 2⁻¹²⁶ × 0.00000000000000000000001₂ = 2⁻¹⁴⁹ (smallest positive value)

IEEE Double Precision Floating-point The IEEE double precision floating point standard representation requires a 64-bit word, the bits of which may be numbered from 0 to 63, left to right. The first bit is the sign bit, S, the next eleven bits are the exponent bits, E, and the final 52 bits are the fraction F:

S EEEEEEEEEEE FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
0 1        11 12                                                 63

The value V represented by the word may be determined as follows:

If E = 2047 and F is nonzero, then V = NaN (Not a Number).
If E = 2047 and F is zero and S is 1, then V = −∞ (negative infinity).
If E = 2047 and F is zero and S is 0, then V = ∞ (positive infinity).
If 0 < E < 2047, then V = (−1)^S × 2^(E−1023) × (1.F). These are the normalized values.
If E = 0 and F is nonzero, then V = (−1)^S × 2⁻¹⁰²² × (0.F). These are the denormalized values.
If E = 0 and F is zero and S is 1, then V = −0.
If E = 0 and F is zero and S is 0, then V = 0.

In the toy system above, the spacing of the numbers represented between 1 and 2 is 2⁻³, or 1/8. In the IEEE double precision system, the equivalent spacing is 2⁻⁵², and this number is often called the machine epsilon (in Matlab it is called simply eps). The maximum roundoff error incurred when the result of a single arithmetic operation is rounded to the nearest floating-point number is one half of this number. The decimal value of eps is 2.2204 × 10⁻¹⁶. This means that IEEE double precision arithmetic is accurate to around 16 decimal digits.

Note that some simple decimal fractions such as 0.1 and 0.3 are not represented exactly in this floating point system. This means that if we add an increment like 0.1 to itself 1000 times, we may not get exactly the value 100. Try the following Matlab instructions and see if you get what you expect:

a=4/3
b=a-1
c=b+b+b
e=1-c
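Python uses the same IEEE double precision arithmetic, so the experiment can also be run there; the final value is exactly one machine epsilon:

```python
# 4/3 is not exactly representable in binary, and the rounding error
# surfaces as exactly eps = 2^-52 when we rebuild 1 from the thirds.
a = 4 / 3
b = a - 1
c = b + b + b
e = 1 - c
print(e)  # 2.220446049250313e-16
```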

Note also that eps is not the smallest number representable in the IEEE double precision system. That is given the Matlab name realmin and has a value of 2⁻¹⁰²² = 2.2251 × 10⁻³⁰⁸. The largest number that can be represented in this system is given the Matlab name realmax and has the value (2 − eps) × 2¹⁰²³ = 1.7977 × 10⁺³⁰⁸. Any calculation that produces a value larger than realmax is said to overflow and the result is represented as an exceptional floating point value called Inf in Matlab.
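Python exposes the corresponding limits through the standard sys.float_info structure (the names here are Python's, not Matlab's):

```python
import sys
import math

# The Python names for eps, realmin and realmax.
print(sys.float_info.epsilon)  # 2.220446049250313e-16
print(sys.float_info.min)      # 2.2250738585072014e-308
print(sys.float_info.max)      # 1.7976931348623157e+308

# Exceeding realmax overflows to the special value inf.
print(math.isinf(sys.float_info.max * 2))  # True
```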

References:

Daintith, J. & Nelson, R.D. The Penguin Dictionary of Mathematics, Penguin, 1989. (This was the source of the Egyptian and cuneiform symbols above).

Knuth, Donald E., The Art of Computer Programming, Vol. 2: Seminumerical Algorithms, 2nd Edition, Addison-Wesley, 1981.

ANSI/IEEE Standard 754-1985, Standard for Binary Floating Point Arithmetic.

Acknowledgements

Xiaojun Wang, DCU, for assistance with floating point representations and Cleve Moler of The MathWorks also for assistance with floating point representations in an article in Matlab promotional literature.