doi:10.1145/1435417.1435431 Double, double toil and trouble —Shakespeare, Macbeth, Act 4, Scene 1

by john mashey The Long Road To 64 Bits

Shakespeare’s words often cover circumstances beyond his wildest dreams. Toil and trouble accompany major computing transitions, even when people plan ahead. Much of tomorrow’s software will still be driven by decades-old decisions. Past decisions have unanticipated side effects that last

decades and can be difficult to undo. ranging from high-level strategies down For example, consider the overly to programming specifics. long, often awkward, and sometimes contentious process by which 32-bit Fundamental Problem (late 1980s) systems evolved into Running out of address space is a long 64/32-bitters needed to address larger tradition in computing, and often quite storage and run mixtures of 32- and 64- predictable. Moore’s Law grew DRAM bit user programs. Most major general- approximately four times bigger every purpose CPUs now have such versions, three to four years, and by the mid- so bits have “doubled,” but “toil and 1990s, people were able to afford 2GB trouble” are not over, especially in soft- to 4GB of memory for midrange micro- ware. processor systems, at which point sim- This example illustrates the interac- ple 32-bit addressing (4GB) would get tions of hardware, languages (especial- awkward. Ideally, 64/32-bit CPUs would ly C), , applications, have started shipping early enough standards, installed-base inertia, and (1992) to have made up the majority of industry politics. We can draw lessons the relevant installed base before they

Chronology: Multiple Interlocking IBM S/360  Threads 32-bit, with 24-bit addressing (16MB total) of real (core) memory

B en F rans k e P H BY P HOTOGRA 1964 1965

january 2009 | vol. 52 | no. 1 | communications of the acm 45 practice

were actually needed. Then people mode operating systems, compilers, Problem: CPU Must Address could have switched to 64/32-bit oper- and libraries; and modify application Available Memory ating systems and stopped upgrading source code to work in both 32- and 64- Any CPU can efficiently address some 32-bit-only systems, allowing a smooth bit environments. Numerous details amount of virtual memory, done most transition. Vendors naturally varied had to be handled correctly to avoid conveniently by flat addressing, in in their timing, but shipments ranged redundant hardware efforts and main- which all or most of the bits in an in- from “just barely in time” to “rather tain software sanity. teger register form a virtual memory late.” This is somewhat odd, consider- Solutions. Although not without address that may be more or less than ing the long, well-known histories of in- subtle problems, the hardware was actual physical memory. Whenever af- sufficient address bits, combined with generally straightforward, and not fordable physical memory exceeds the the clear predictability of Moore’s Law. that expensive—the first commercial easily addressable, it stops being easy All too often, customers were unable to 64-bit micro’s 64-bit data path added to throw memory at performance prob- use memory they could easily afford. at most 5% to the chip area, and this lems, and programming complexity Some design decisions are easy to fraction dropped rapidly in later chips. rises quickly. Sometimes, segmented change, but others create long-term Most chips used the same general ap- memory schemes have been used with legacies. Among those illustrated here proach of widening 32-bit registers to varying degrees of success and pro- are: 64 bits. Software solutions were much gramming pain. History is filled with ˲˲ Some unfortunate decisions may more complex, involving arguments awkward extensions that added a few be driven by real constraints (1970: about 64/32-bit C, the nature of exist- bits to extend product life a few years, PDP-11 16-bit). ing software, competition/cooperation usually at the cost of hard work by oper- ˲˲ Reasonable-at-the-time decisions among vendors, official standards, and ating-system people. turn out in 20-year retrospect to have influential but totally unofficial ad hoc Moore’s Law has increased afford- been suboptimal (1976–1977: usage of groups. able memory for decades. Disks have C data types). Some better usage recom- Legacies. The IBM S/360 is 40 years grown even more rapidly, especially mendations could have saved a great old and still supports a 24-bit legacy ad- since 1990. Larger disk pointers are deal of toil and trouble later. dressing mode. The 64/32 solutions are more convenient than smaller ones, al- ˲˲ Some decisions yield short-term at most 15 years old, but will be with us, though less crucial than memory point- benefits but incur long-term problems effectively, forever. In 5,000 years, will ers. These interact when mapped files (1964: S/360 24-bit addresses). some software maintainer still be mut- are used, rapidly consuming virtual ad- ˲˲ Predictable trends are ignored, tering, “Why were they so dumb?”4 dress space. or transition efforts underestimated We managed to survive the Y2K In the mid-1980s, some people start- (1990s: 32 -> 64/32). problem—with a lot of work. We’re still ed thinking about 64-bit micros—for ex- Constraints. Hardware people need- working through 64/32. Do we have any ample, the experimental systems built ed to build 64/32-bit CPUs at the right other problems like that? Are 64-bit by DEC (Digital Equipment Corpora- time—neither too early (extra cost, no CPUs enough to help the “ 2038” tion). MIPS Computer Systems decided market), nor too late (competition, an- problem, or do we need to be working by late 1988 that its next design must be gry customers). Existing 32-bit binaries harder on that? Will we run out of 64-bit a true 64-bit CPU, and announced the needed to run on upward-compatible systems, and what will we do then? Will R4000 in 1991. Many people thought 64/32-bit systems, and they could be ex- IPv6 be implemented widely enough MIPS was crazy or at least premature. I pected to coexist forever, because many soon enough? thought the system came just barely in would never need to be 64 bits. Hence, All of these are examples of long- time to develop software to match in- 32 bits could not be a temporary com- lived problems for which modest fore- creasing DRAM, and I wrote an article patibility feature to be quickly discard- sight may save later toil and trouble. to explain why.5 The issues have not ed in later chips. But software is like politics: Sometimes changed very much since then. Software designers needed to agree we wait until a problem is really painful N-bit CPU. By long custom, an N-bit on whole sets of standards; build dual- before we fix it. CPU implements an ISA (instruction

DEC PDP-11/20  16-bit, 16-bit addressing (64KB total)

IBM S/370 family virtual memory, 24-bit Algol 68  addresses, but multiple user includes long long address spaces allowed 1966 1967 1968 1969 1970

46 communications of the acm | january 2009 | vol. 52 | no. 1 practice

set architecture) with N-bit integer reg- of virtual memory. For many programs Of course, increasing AM in a multi- isters and N (or nearly N) address bits, PM can differ greatly according to the tasking system is still useful in improv- ignoring sizes of buses or floating-point input, but of course PM ≤ VL. ing the number of memory-resident registers. Many 32-bit ISAs have 64- or The RL (real limit) is visible to the tasks or reducing paging overhead, 80-bit floating-point registers and im- operating system and is usually lim- even if each task is still limited by VL. plementations with 8-, 16-, 32-, 64-, or ited by the width of physical address Operating system code is always simpli- 128-bit buses. Sometimes marketers buses. Sometimes mapping hardware fied if it can directly and simply address have gotten this confused. I use the is used to extend RL beyond a too- all installed memory without having to term 64/32-bit here to differentiate the small “natural” limit (as happened in manage extra memory maps, bank reg- newer from the older PDP-11s, described later). Installed AM isters, and so on. 64-bit word-oriented supercomputers, (actual memory) is less visible to user Running out of address bits has a as the software issues are somewhat programs and varies among machines long history. different. In the same sense, the Intel without needing different versions of 80386 might have been called a 32/16- the user program. Mainframes, Minicomputers, bit CPU, as it retained complete sup- Most commonly, VL ≥ RL ≥ AM. Some Microprocessors port for the earlier 16-bit model. programs burn virtual address space The oft-quoted George Santayana is apro- Why 2N-bits? People sometimes for convenience and actually perform pos here: “Those who do not remember want wider-word computers to improve acceptably when PM >> AM: I’ve seen the past are condemned to repeat it.” performance for parallel bit operations cases where 4:1 still worked, as a result Mainframes. IBM S/360 mainframes or data movement. If one needs a 2N- of good locality. File mapping can in- (circa 1964; see accompanying Chro- bit operation (add, multiply, and so on), crease that ratio further and still work. nology sidebar) had 32-bit registers, each can be done in one instruction On the other hand, some programs run of which only 24 bits were used in ad- on a 2N-bit CPU, but requires longer poorly whenever PM > AM, confirming dressing, for a limit of 16MB of core sequences on an N-bit CPU. These are the old proverb, “Virtual memory is a memory. This was considered im- straightforward low-level performance way to sell real memory.” mense at the time. A “large” mainframe issues. The typical compelling reason Sometimes, a computer family starts offered at most 1MB of memory, al- for wider words, however, has been the with VL ≥ RL ≥ AM, and then AM grows, though a few “huge” mainframes could need to increase address bits, because and perhaps RL is increased in ways provide 6MB. Most S/360s did not sup- code that is straightforward and ef- visible only to the operating system, at port virtual memory, so user programs ficient with enough address bits may which point VL << AM. A single program generated physical addresses directly, need global restructuring to survive simply cannot use easily buyable mem- and the installed AM was partitioned fewer bits. ory, forcing work to be split and making among the operating system and user Addressing—virtual and real—in a it more difficult. For example, in For- programs. The 16MB limit was unfor- general-purpose system. User virtual ad- tran, the declaration REAL X (M, M, M) tunate, but ignoring (not trapping) the dresses are mapped to real memory ad- is a three-dimensional array. If M = 100, high-order 8 bits was worse. Assembly dresses, possibly with intervening page X needs 4MB, but people would like the language programmers cleverly packed faults whereby the operating system same code to run for M = 1,000 (4GB), 8-bit flags with 24-bit addresses into 32- maps the needed code or data from or 6,300 (1,000GB). A few such systems bit words. disk into memory. A user program can do exist, although they are not cheap. I As virtual addressing S/370s (1970) access at most VL (virtual limit) bytes, once had a customer complain about enabled programs that were larger than where VL starts at some hardware limit, lack of current support for 1,000GB of physical memory allowed, and as core then sometimes loses more space to an real memory, although later the cus- gave way to DRAM (1971), the 16MB operating system. For example, 32-bit tomer was able to buy such a system limit grew inadequate. IBM 370XA systems easily have VLs of 4, 3.75, 3.5, and use that memory in one program. CPUs (1983) added 31-bit addressing or 2GB. A given program execution uses After that, the customer complained mode, but retained a (necessary) 24-bit h by Jay R . Jaeger Jay p h by Photogra PDP-11/20 an d Dec PDP-11/20 Dec gezeichnet, selbst p h by p hotogra ibm 370/145 at most PM (program memory) bytes about lack of 10,000GB support… mode for upward compatibility. I had

DEC PDP-11/45  separate instruction+data (64KI + 64KD); 248KB maximum real memory

Unix PDP-11/45, operating system rewritten in C; IP16

C IBM 370/145  integer data types: int, char; main memory no C on other machines Unix longer core, but (36-bit Honeywell 6000, sixth edition: 24-bit DRAM, 1 Kbit/chip IBM 370, others) maximum file size (16MB) 1971 1972 1973 1974 1975

january 2009 | vol. 52 | no. 1 | communications of the acm 47 practice

been one of those “clever” program- was solved for the short run, but not flags, extenders, and other artifacts mers and was somewhat surprised to with enough finesse to support a large once needed. The 80386 provided 32- discover that a large program (the S/360 family of minicomputers. This was in- bit flat addresses (1986), but of course ASSIST assembler) originally written deed a costly oversight.”2 retained the earlier mechanisms, and in 1970 was still in use in 2006—in To be fair, it would have been dif- 16-bit PC software lasted “forever.” The 24-bit compatibility mode, because it ficult to meet the PDP-11’s cost goals intermediate 80286 (1982) illustrated wouldn’t run any other way. Compiler with 32-bit hardware, but I think DEC the difficulty of patching an architec- code that had been “clever” had long did underestimate how fast the price of ture to get more addressing bits. since stopped doing this, but assembly DRAM memory would fall. In any case, The 32-bit Motorola MC68000 (1979) code is tougher. (“The evil that men do this lasted a long time—the PDP-11 was started with a flat-addressing program- lives after them, the good is oft interred finally discontinued in 1997! ming model. By ignoring the high 8 bits with their bones.” —Shakespeare, The PDP-11/70 (1976) raised the of a 32-bit register, it exactly repeated again, Julius Caesar) number of supportable concurrent the S/360 mistake. Once again, “clever” Then, even 31-bit addressing became tasks, but any single program could still programmers found uses for those bits, insufficient for certain applications, es- only use 64KI + 64KD of a maximum of and when the MC68020 (1984) inter- pecially databases. ESA/370 (1988) of- 4MB, so that individual large programs preted all 32, some programs broke (for fered user-level segmentation to access required unnatural acts to split code example, when moving from the origi- multiple 2GB regions of memory. and data into 64KB pieces. Some be- nal Apple Macintosh to the Mac II). The 64-bit IBM zSeries (2001) still lieved this encouraged modularity and Fortunately, 64-bit CPUs managed to supports 24-bit mode, 40-plus years inhibited “creeping featurism” and was avoid repeating the S/360 and MC68000 later. Why did 24-bit happen? I’m told therefore philosophically good. problem. Although early versions usu- that it was all for the sake of one low- Although the 32-bit VAX-11/780 ally implemented 40 to 44 virtual ad- cost early model, the 360/30, where 32 (1977) was only moderately faster than dress bits, they trapped use of not-yet- bits would have run slower because it the PDP-11/70, the increased address implemented high-order v bits, rather had 8-bit data paths. These were last space was a major improvement that than ignoring them. People do learn, shipped more than 30 years ago. Were ended the evolution of high-end PDP- eventually. they worth the decades of headaches? 11s. VAX architect William Strecker Minicomputers. In the 16-bit DEC explained it this way: “However, there Lessons PDP-11 minicomputer family (1970), a are some applications whose program- ˲˲ Even in successful computer fami- single task addressed only 64KB, or in ming is impractical in a 65KB address lies created by top architects, address later models (1973), 64KB of instruc- space, and perhaps more importantly, bits are scarce and are totally consumed tions plus 64KB of data. “The biggest others whose programming is appre- sooner or later. and most common mistake that can ciably simplified by having a large ad- ˲˲ Upward compatibility is a real be made in computer design is that of dress space.”7 constraint, and thinking ahead helps. not providing enough address bits for Microprocessors. The Intel 8086’s 16- In the mainframe case, a 24-bit “first- memory addressing and management,” bit ISA seemed likely to fall prey to the implementation artifact” needed C. Gordon Bell and J. Craig Mudge PDP-11’s issues (1978). It did provide hardware/software support for 40-plus wrote in 1978. “The PDP-11 followed user-mode mechanisms for explicit years. Then a successful minicomputer this hallowed tradition of skimping segment manipulation, however. This family’s evolution ended prematurely. on address bits, but was saved on the allowed a single program to access Finally, microprocessors repeated the principle that a good design can evolve more than 64KB of data. PC program- earlier mistakes, although the X86’s through at least one major change. For mers were familiar with the multiplicity segmentation allowed survival long the PDP-11, the limited address space of memory models, libraries, compiler enough to get flat-address versions.

Unix ported to 32-bit Interdata 8/32

C Unix unsigned, typedef, union; 32V for VAX-11/780; 32-bit long used to replace C is ILP32 int,2 in lseek, tell on 16-bit DEC PDP-11/70 PDP-11; IP16L32 (64KI + 64KD), but C  larger physical memory DEC VAX-11/780 The C Programming (a huge 4MB) 32-bit, 32-bit addressing Language, Brian Kernighan (4GB total, 2GB per and Dennis Ritchie C user process) (Prentice-Hall) short, long added (partly from doing C for C Intel 8086 Motorola MC68000  XDS Sigma, although PDP-11: I16LP32; VAX 16-bit, but with user-visible 32-bit ISA, but 24-bit long was 64 bits there) (other 32-bitters): ILP32 segmentation addressing (e.g., S/360) 1976 1977 1978 1979 1980

48 communications of the acm | january 2009 | vol. 52 | no. 1 practice

The 32- to 64-bit Problem about the problem. get third-party software 64-bit clean, in the Late 1980s SGI (Silicon Graphics). Starting in ear- but it was valuable to the industry as it By the late 1980s, Moore’s Law seemed ly 1992, all new SGI products used only accelerated the 64-bit cleanup. DEC was cast in silicon, and it was clear that by 64/32-bit chips, but at first they still ran probably right to do this, since it had no 1993–1994, midrange microprocessor a 32-bit operating system. In late 1994, installed base of 32-bit Alpha programs servers could cost-effectively offer 2GB– a 64/32-bit operating system and com- and could avoid having to support two 4GB or more of physical memory. We pilers were introduced for large servers, modes. For VMS, early versions were 32- had seen real programs effectively use able to support both 32-bit and 64-bit bit, and later ones 64/32-bit. as much as 4:1 more virtual memory user programs. This software worked Other vendors. Over the next few than installed physical memory, which its way down the product line. A few years, many vendors shipped 64-bit meant pressure in 1993–1994, and real customers quickly bought more than CPUs, usually running 32-bit software, trouble by 1995. As I wrote in Byte in 4GB of memory and within a day had and later 64/32-bit: Sun UltraSPARC September 1991:5 recompiled programs to use it, in some (1995), HAL SPARC64 (1995), PA-RISC “The virtual addressing scheme of- cases merely by changing one Fortran (1996), HP/UX 11.0 (1997), IBM RS64 ten can exceed the limits of possible parameter. Low-end SGI workstations, and AIX 4.3 (1997), Sun Solaris 7 (1998), physical addresses. A 64-bit address can however, continued to ship with a IBM zSeries (2001), Intel Itanium handle literally a mountain of memory: 32-bit-only operating system for years, (2001), AMD AMD64 (2003), Intel EM- Assuming that 1 megabyte of RAM and of course, existing 32-bit hardware T64a (2004), Microsoft Windows XP x64 requires 1 cubic inch of space (using had to be supported…for years. For his- (2005). 64-bit versions appeared 4-megabit DRAM chips), 2**64 bytes torical reasons, SGI had more flavors of at various times. would require a square mile of DRAM 32-bit and 64-bit instruction sets than Most 64-bit CPUs were designed as piled more than 300 feet high! For now, were really desirable, so it was worse extensions of existing 32-bit architec- no one expects to address this much than having just two of them. tures that could run existing 32-bit bi- DRAM, even with next-generation 16MB This is the bad kind of “long tail”— naries well, usually by extending 32-bit DRAM chips, but increasing physical people focus on “first ship dates,” but registers to 64 bits in 64-bit mode, but memory slightly beyond 32 bits is defi- often the “last ship date” matters more, ignoring the extra bits in 32-bit mode. nitely a goal. With 16MB DRAM chips, as does the “last date on which we will The long time span for these releases 2**32 bytes fits into just over 1 cubic release a new version of an operating arises from natural differences in pri- foot (not including cooling)—feasible system or application that can run on orities. SGI was especially interested in for deskside systems… that system.” Windows 16-bit applica- high-performance technical comput- “Database systems often spread a tions still run on regular Windows XP, ing, whose users were accustomed to single file across several disks. Cur- 20 years after the 80386 was introduced. 64-bit supercomputers and could often rent SCSI disks hold up to 2 gigabytes Such support has finally been dropped use 64 bits simply by increasing one ar- in D. Kissell C 4600 Ke v in D. C on v ex t, (i.e., they use 31-bit addresses). Calcu- in Windows XP x64. ray dimension in a Fortran program and lating file locations as virtual memory DEC. DEC shipped 64-bit Alpha sys- recompiling. SGI and other vendors of addresses requires integer arithmetic. tems in late 1992, with a 64-bit operating large servers also cared about memory

d M ühl p for H enry Operating systems are accustomed to system, and by late 1994 was shipping for large database applications. It was working around such problems, but servers with memories large enough certainly less important to X86 CPU it becomes unpleasant to make work- to need greater than 32-bit addressing. vendors whose volume was dominated arounds; rather than just making things DEC might have requested (easy) 32-bit by PCs. In Intel’s case, perhaps the em- p h by Photogra work well, programmers are struggling ports, but thinking long term, it went phasis on Itanium delayed 64-bit X86s. just to make something work….” straight to 64-bit, avoiding duplication. By 2006, 4GB of DRAM, typically con-

I ntel 80286 So, people started to do something It was expensive in time and money to sisting of 1GB DRAMs, typically used

Motorola MC68020 32-bit; 32-bit addressing

IBM 370/XA C C adds 31-bit mode Amdahl UTS (32-bit S/370) I16LP32 on MC68000 for user programs, uses long long, especially in Bell Labs Blit terminal 24-bit mode for large file pointers still supported Intel 80286  C C  allows 16MB of real Unix workstations Convex (64-bit vector memory, but restrictions generally use ILP32, mini-supercomputer) keep most systems following Unix uses long long for at 1MB on VAX systems 64-bit integers 1981 1982 1983 1984 1985

january 2009 | vol. 52 | no. 1 | communications of the acm 49 practice

Common C data types. Software is Harder Building a 64-bit CPU is not enough. Embedded-systems markets can move int long ptr long long Label Examples easier than general-purpose markets, as happened, for example, with Cisco 16 16 IP16 PDP-11 Unix (early, 1973) routers and Nintendo N64s that used 64-bit MIPS chips. Vendors of most 32- 16 32 16 IP16L32 PDP-11 Unix (later, 1977) Multiple instructions for long bit systems, however, had to make their way through all of the following steps to 16 32 32 I16LP32 Early MC68000 (1982) Apple Macintosh 68 KB produce useful upward-compatible 64- Microsoft operating systems bit systems: (plus extras for X86 segments) 1. Ship systems with 64/32 CPUs, probably running in 32-bit mode. Con- 32 32 32 ILP32 IBM 370; VAX Unix; many workstations tinue supporting 32-bit-only CPUs as 32 32 32 64 ILP32LL or Amdahl; Convex; 1990s Unix systems long as they are shipped and for years ILP32LL64 Like IP15L32, for same reason; thereafter (often five or more years). multiple instructions for long long. Most vendors did this, simply because 32 32 32 64 LLP64 or Microsoft Win64 software takes time. IL32LLP64 or 2. Choose a 64-bit programming P64 model for C, C++, and other languages. 32 32 32 *(64) LP64 or Most 64/32 Unix systems This involves discussion with standards I32LP64 bodies and consultation with competi- 64 32 32 *(64) ILP64 HAL; logical analog of ILP32 tors. There may be serious consequenc- * In these cases, LP64 and ILP64 offer 64-bit integers, and long long seems redundant, but in practice, es if you select a different model from most proponents of LP64 and ILP64 included long long as well, for reasons given later. ILP64 uniquely required a new 32-bit type, usually called _int32. most of your competitors. Unix vendors and Microsoft did choose differently, for plausible reasons. Think hard about inter-language issues—Fortran expects four DIMMs and could cost less than Lessons INTEGER and REAL to be the same $400, 300GB disks are widely available ˲˲ For any successful computer fam- size, which makes 64-bit default inte- for less than $1 per GB, so one would ily, it takes a very long time to convert gers awkward. have expected mature, widespread sup- an installed base of hardware, and 3. Clean up header files, carefully. port for 64 bits by then. All this took software lasts even longer. 4. Build compilers to generate 64-bit longer than perhaps it should have, how- ˲˲ Moving from 32 to 64/32 bits is a code. The compilers themselves almost ever, and there have been many years long-term coexistence scenario. Un- certainly run in 32-bit mode and cross- where people could buy memory but not like past transitions, almost all 64-bit compile to 64-bit, although occasional be able to address it conveniently, or CPUs must run compatible existing huge machine-generated programs not be able to buy some third-party ap- 32-bit binaries without translation, can demand compilers that run in 64- plication that did, because such appli- since much of the installed base re- bit mode. Note that programmer san- cations naturally lag 64-bit CPUs. It is mains 32-bit and a significant num- ity normally requires a bootstrap step worth understanding why and how this ber of 32-bit programs are perfectly here, in which the 32-bit compiler is happened, even though the impending adequate and can remain so indefi- first modified to accept 64-bit integers issue was well known. nitely. and then is recoded to use them itself.

Apple Mac II  MC68020’s IBM ESA/370 Intel  32-bit addressing causes multiple 31-bit address ANSI C (“C89”) 80386, 32-bit, with trouble for some spaces per user, although effort had started support for 8086 mode MC68000 software complex; 24-bit still there in 1983, ANSI X3J11 1986 1987 1988 1989 1990

50 communications of the acm | january 2009 | vol. 52 | no. 1 practice

5. Convert the operating system to ated endless and sometimes vitupera- We could have gotten by with char, 64-bit, but with 32-bit interfaces as well, tive discussions. short, and int, if all our systems had to run both 64- and 32-bit applications. been 32 bits. 6. Create 64-bit versions of all system C: 64-bit Integers on 64/32-bit It is important to remember the na- libraries. CPUs: Technology and Politics ture of C at this point. It took a while for 7. Ship the new operating system People have used various (and not al- typedef to become common idiom. and compilers on new 64-bit hardware, ways consistent) notations to describe With 20/20 hindsight, it might have been and hopefully, on the earlier 64-bit choices of C data types. In the accompa- wise to have provided a standard set of hardware that has now been shipping nying table, the first label of several was typedefs to express “fast integer,” for a while. This includes supporting (at the most common, as far as I can tell. “guaranteed to be exactly N-bit integer,” least) two flavors of every library. On machines with 8-bit char, short “integer large enough to hold a pointer,” 8. Talk third-party software vendors is usually 16 bits, but other data types and to have recommended that people into supporting a 64-bit version of any can vary. The common choices are build their own typedefs on these def- program for which this is relevant. Ear- shown in the table. initions, rather than base types. If this ly in such a process, the installed base Early days. Early C integers (1973) had been done, perhaps much toil and inevitably remains mostly 32-bit, and included only int and char; then trouble could have been avoided. software vendors consider the potential long and short were added by 1976, This would have been very counter- market size versus the cost of support- followed by unsigned and typedef cultural, however, and it would have ing two versions on the same platform. in 1977. In the late 1970s, the installed required astonishing precognition. DEC helped the industry fix 32- to 64-bit base of 16-bit PDP-11s was joined by Bell Labs already ran C on 36-bit CPUs portability issues by paying for numer- newer 32-bit systems, requiring that and was working hard on portability, ous 64-bit Alpha ports. source code be efficient, sharable, and so overly specific constructs such as 9. Stop shipping 32-bit systems (but compatible between 16-bit systems (us- “int16” would have been viewed with continue to support them for many ing I16LP32) and 32-bitters (ILP32), a disfavor. C compilers still had to run on years). pairing that worked well. PDP-11s still 64KI+64KD PDP-11s, so language mini- 10. Stop supporting 32-bit hard- employed (efficient) 16-bit int most mality was prized. The C/Unix commu- ware with new operating system re- of the time, but could use 32-bit long nity was relatively small (600 systems) leases, finally. as needed. The 32-bitters used 32-bit and was just starting to adapt to the 11. Going from step 1 to step 6 typi- int most of the time, which was more coming 32-bit minicomputers. In late cally took two to three years, and get- efficient, but could express 16-bit via 1977, the largest known Unix installa- ting to step 9 took several more years. short. Data structures used to com- tion had seven PDP-11s, with a grand The industry has not yet completed municate among machines avoided total of 3.3MB of memory and 1.9GB of step 10. int. It was very important that 32-bit disk space. No one could have guessed Operating system vendors can avoid long be usable on the PDP-11. Before how pervasive C and its offshoots would step 1, but otherwise, the issues are sim- that, the PDP-11 needed explicit func- become, and thinking about 64-bit ilar. Many programs need never be con- tions to manipulate i nt[2] (16-bit int CPUs was not high on the list of issues. verted to 64-bit, especially since many pairs), and such code was not only awk- 32-bit happy times. In the 1980s, operating systems already support 64- ward, but also not simply sharable with ILP32 became the norm, at least in bit file pointers for 32-bit programs. 32-bit systems. This is an extremely im- Unix-based systems. These were happy Next, I trace some of the twists and portant point—long was not strictly times: 32-bit was comfortable enough turns that occurred in the 1990s, espe- necessary for 32-bit CPUs, but it was for buyable DRAM, for many years. In cially involving the implementation of very important to enable code sharing retrospect, however, it may have caused

k Opp elt Dir p h by mi p s R 4000 hotogra d er S chaelss, A lexan p h by II Photogra a pp le mac C on 64/32-bit CPUs. This topic gener- among 16- and 32-bit environments. some people to get sloppy in assuming

SGI  ships first 64-bit micro (MIPS R4000) in system; Sun UltraSPARC still running 32-bit 64/32-bit hardware, operating system 32-bit-only operating system Informal 64-bit C SGI  HAL Computer’s SPARC64 working group ships IRIX 6 (64/32 uses ILP64 model for C discusses various models operating system; (LP64, ILP64, LLP64), ILP32LL + LP64) on Large file summit with no agreement except Power Challenge; codifies 64-bit interface to use long long and have customers buy 4GB+ to files >2GB, even in 32-bit inttypes.h to help users memory, use it systems (ILP32LL+LP64)

DEC DEC Aspen group ships 64-bit Alpha ships 4GB+ in DEC 7000 supports LP64 model systems, running 64-bit SMPs (may have been for C, so that Unix vendors operating system; LP64 slightly earlier) are consistent 1991 1992 1993 1994 1995

january 2009 | vol. 52 | no. 1 | communications of the acm 51 practice

sizeof(int) == sizeof(long) == portability studies. feasible to make much software 64-bit- sizeof(ptr) == 32. Chessin’s working group had no for- clean without making it 32-bit unclean. Sometime around 1984, Amdahl mal status, but had well-respected se- The SGI effort proved that 32-bit and 64- UTS and Convex added long long for nior technologists from many systems bit programs could be sensibly support- 64-bit integers, the former on a 32-bit and operating systems vendors, includ- ed on one system, with reasonable data architecture, the latter on a 64-bitter. ing several who were members of the interchange, a requirement for most UTS used this especially for long file C Standards Committee. With all this other vendors. In practice, that meant pointers, one of the same motivations brainpower, one might hope that one that one should avoid long in struc- for long in PDP-11 Unix (1977). Algol clear answer would emerge, but that tures used to interchange data, exactly 68 inspired long long in 1968, and it was not to be. Each of the three propos- akin to the avoidance of int in the PDP- was also added to GNU C at some point. als (LLP64, LP64, and ILP64) broke dif- 11/VAX days. About this time in 1995 the Many reviled this syntax, but at least it ferent kinds of code, based on the par- Large File Summit agreed on Unix APIs consumed no more reserved keywords. ticular implicit assumptions made in to increase file size above 2GB, using Of course, 16-bit int was used on the 32-bit happy times. long long as a base data type. Microsoft DOS and Apple Macintosh Respected members of the group Finally, the Aspen Group in 1995 had systems, given the original use of Intel made credible presentations citing another round of discussions about the 8086 or MC68000, where 32-bit int massive analyses of code and porting 64-bit C model for Unix and settled on would have been costly, particularly experience, each concluding, “XXX is LP64, at least in part because it had on early systems with 8- or 16-bit data the answer.” Unfortunately, XXX was been proved to work and most actual paths and where low memory cost was different in each case, and the group 64-bit software used LP64. especially important. remained split three ways. At that point During the 1992 meetings of the 64-bit heats up in 1991/1992. The MIPS I suggested we perhaps could agree on 64-bit C group Microsoft had not yet R4000 and DEC Alpha were announced header files that would help program- chosen its model, but later chose in the early 1990s. Email discussions mers survive (leading to ). Most people did agree that long vendors. I was told that this happened during 1990–1992 regarding the proper long was the least bad of the alterna- because the only 32-bit integer type model for 64-bit C, especially when im- tive notations and had some previous in PCs was long; hence, people often plemented on systems that would still usage. used long to mean 32-bit more explic- run 32-bit applications for many years. We worried that we were years ahead itly than in Unix code. That meant that Quite often, such informal cooperation of the forthcoming C standard, but changing its size would tend to break exists among engineers working for oth- could not wait for it, and the C Stan- more code than in Unix. This seemed erwise fierce competitors. dards Committee members were sup- plausible to me. Every choice broke In mid-1992 Steve Chessin of Sun portive. If we agreed on anything rea- some codes, and thus people looked at initiated an informal 64-bit C work- sonable, and it became widespread their own code bases, which differed, ing group to see if vendors could avoid practice, it would at least receive due leading to reasonable differences of implementing randomly different 64- consideration. Like it or not, at that opinion. bit C data models and nomenclatures. point this unofficial group probably Many people despised long long Systems and operating system vendors made long long inevitable—or per- and filled newsgroups with arguments all feared the wrath of customers and haps that inevitability dated from 1984. against it, even after it became incorpo- third-party software vendors otherwise. By 1994, DEC was shipping large sys- rated into the next C standard in 1999. DEC had chosen LP64 and was already tems and had paid for many third-party The official rationale for the C standard far along, as Alpha had no 32-bit ver- software ports, using LP64. SGI was also can be consulted by anyone who wants sion. SGI was shipping 64-bit hardware shipping large systems, which support- to understand the gory details.6 and working on 64-bit compilers and ed both ILP32LL and LP64, with long Differing implicit assumptions operating system; it preferred LP64 as long filling the role handled by long in about sizes of various data types had well. Many others were planning 64-bit the late 1970s. grown up over the years and caused a CPUs or operating systems and doing The DEC effort proved that it was great deal of confusion. If we could go

HP UP/UX 11.0 is 64/32-bit OS; ILP32LL + LP64 C HP  IBM Sun  ISO/IEC C (WG14’s “C99”); announces PA-RISC 2.0, RS64 PowerPC, AIX 4.3; 64/32 Solaris 7 released; includes long long, 64-bit ILP32LL + LP64 ILP32LL + LP64 at least 64 bits 1996 1997 1998 1999 2000

52 communications of the acm | january 2009 | vol. 52 | no. 1 practice back to 1977, knowing what we know code must still be able to describe it is still with us, as are some side effects now, we could have made all this easier cleanly. Current software is rarely done of C usage in the mid-1970s. The tran- simply by insisting on more consistent from scratch but has to exist inside a sition to 64-bit probably took longer use of typedef. In 1977, however, it large ecosystem. than it needed for a host of reasons. It’s would have been difficult to think ahead ˲˲ We might never build 128-bit com- too bad that people quite often have 20 years to 64-bit micros—we were just puters, but it would probably be good been unable to use affordable memory getting to 32-bit minis! to invent a notation for 128-bit inte- for solving performance problems or This process might also have been gers, whose generated code on 64-bit avoiding cumbersome programming. eased if more vendors had been fur- CPUs is about the same as 64-bit code It’s too bad there has been so much ther along in 64-bit implementations is on 32-bit CPUs. It would be nice to “toil and trouble,” but “double” for mi- in 1992. Many people had either for- do that long before it is really need- croprocessors has been accomplished, gotten the lessons of the PDP-11/VAX ed. In general, predictable long-term and “double” for software is at least un- era or had not been involved then. In problems are most efficiently solved der way, and people have stopped argu- any case, the 64/32 systems had a more with a little planning, not with fren- ing about the need for 64-bit micros. stringent requirement: 64-bit and 32- zied efforts when the problem is immi- bit programs would coexist forever in nent. Fortunately, 128-bitters are many References the same systems. years away, if ever (maybe 2020–2040), 1. aspen Data Model Committee. 64-bit programming models: Why LP64? (1997–1998); www.unix.org/ Timing is important to creating good because we’ve just multiplied our ad- version2/whatsnew/lp64_wp.html. industry standards. Too early, stan- dressing size by four billion, and that 2. bell, C.G. and Mudge, J.C. The evolution of the PDP-11. Computer Engineering: A DEC View of Computer dards may be written with insufficient will last a while, even if Moore’s Law System Design. C.G. Bell, J.C. Mudge, and J.E. practical experience because no one continues that long! In case 128-bit McNamara, Eds. Digital Press, Bedford, MA, 1978. 3. Josey, A. Data size neutrality and 64-bit support has real implementations from which happens in 2020, however, it would be (1997); www.usenix.org/publications/login/ to learn. Too late, and vendors may well wise to be thinking about the next inte- standards/10.data.html. 4. mashey, J. Languages, levels, libraries, longevity. ACM have committed to numerous incom- ger size around 2010. Queue 2, 9 (2004–2005), 32–38. patible implementations. In between, a Of course, when the S/360 was intro- 5. mashey, J. 64-bit computing. BYTE (Sept. 1991), 135–142. 9 (The complete text can also be found by few groups have early implementations duced, IBM and other vendors had 36- searching Google Groups comp.arch: Mashey BYTE to help inform the standards. Perhaps and 18-bit product lines. In an alternate 1991). 6. rationale for International Standard—Programming by accident, the 64-bit C standards were universe, had the S/360 been a 36-bit Languages—C; www.open-std.org/jtc1/sc22/wg14/ reasonably well-timed. architecture with four 9-bit bytes/word, www/docs/n897.pdf (or other sites). 7. strecker, W.D. VAX-11/780: A virtual address extension most later machines would have been to the DEC PDP-11 family. Computer Engineering: Lessons 18- and 36-bit, and we would just be A DEC View of Computer System Design. C.G. Bell, J.C. Mudge, and J.E. McNamara, Eds. Digital Press, ˲˲ Standards often get created in non- starting the 36-bit to 72-bit transition. Bedford, MA, 1978. standard ways, and often, de facto stan- ˲˲ 8. unix specification; adding support for arbitrary file Hardware decisions last a long sizes; www.unix.org/version2/whatsnew/lfs20mar. dards long precede official ones. time, but software decisions may well html. ˲˲ Reasonable people can disagree, last longer. If you are a practicing pro- John Mashey is a consultant for venture capitalists especially when looking at different grammer, take pity on those who end and technology companies, sits on various technology advisory boards of startup companies, and is a trustee of sets of data. up maintaining your code, and spend the Computer History Museum. Along with spending 10 ˲˲ Sometimes one must work with some time thinking ahead. Allow for years at Bell Laboratories working on the Programmer’s Workbench version of Unix, he was also one of the competitors to make anything reason- natural expansion of hardware, hide designers of the MIPS RISC architecture, one of the able happen. low-level details, and use the highest- founders of the SPEC benchmarking group, and chief scientist at Silicon Graphics. ˲˲ Programmers take advantage of level languages you can. C’s ability to A previous version of this article appeared in the October extra bits or ambiguity of specification. make efficient use of hardware can be 2006 issue of ACM Queue.

Most of the arguments happen because both a blessing and a curse. © 2009 ACM 0001-0782/09/0100 $5.00 application programmers make differ- ing implicit assumptions. Conclusion ˲˲ Code can be recompiled, but once Some decisions last a very long time. data gets written somewhere, any new The 24-bit addressing of 1964’s S/360

IBM 64-bit zSeries (S/360 descendent); 24-bit addressing still supported Microsoft  Microsoft AMD Intel Windows XP Professional Intel  Windows 64-bit 64-bit X86 64-bit X86, (called EMT64), x64 for X86; LLP64 64-bit Itanium for Itanium (now called AMD64) compatible with AMD (or IL32LLP64) 2001 2002 2003 2004 2005

january 2009 | vol. 52 | no. 1 | communications of the acm 53