64-BIT TECHNOLOGY

64-bit

From Wikipedia, the free encyclopedia.

In computer architecture, 64-bit is an adjective used to describe integers, memory addresses, or other data units that are at most 64 bits (8 octets) wide, or to describe CPU and ALU architectures based on registers, address buses, or data buses of that size.

[Sidebar: N-bit sizes. Processors have been built in 4-, 8-, 16-, 24-, 31-, 32-, 48-, 64- and 128-bit designs; applications come in 16-, 31-, 32- and 64-bit forms; common data sizes are the 4-bit nibble, 8-bit byte (octet), 16-bit word, 32-bit dword, 64-bit qword, and 128-bit units. These definitions are relevant to the world of processors; see the linked articles for discussion of the meaning in other architectures. The 31-bit and 48-bit sizes relate to IBM mainframes and AS/400s, respectively.]

As of 2004, 64-bit CPUs are common in servers, and have recently been introduced to the (previously 32-bit) mainstream personal computer arena in the form of the AMD64, EM64T, and PowerPC 970 (or "G5") architectures.

Although a CPU may be 64-bit internally, its external data or address bus may have a different size, either larger or smaller, and the term is often used to describe the size of these buses as well. For instance, many current machines with 32-bit processors use 64-bit buses, and may occasionally be referred to as "64-bit" for this reason. The term may also refer to the size of an instruction in the computer's instruction set or to any other item of data. Without further qualification, however, a computer architecture described as "64-bit" generally has integer registers that are 64 bits wide and thus directly supports dealing both internally and externally with 64-bit "chunks" of data.

Architectural implications

Registers in a processor are generally divided into three groups: integer, floating point, and other. In all common general purpose processors, only the integer registers are capable of storing pointer values (that is, an address of some data in memory). The non-integer registers cannot be used to store pointers for the purpose of reading or writing to memory, and therefore cannot be used to bypass any memory restrictions imposed by the size of the integer registers. Nearly all common general purpose processors (with the notable exception of the ARM and most 32-bit MIPS implementations) have integrated floating point hardware, which may or may not use 64-bit registers to hold data for processing. For example, the AMD64 architecture defines an SSE unit which includes 16 128-bit wide registers, and the traditional x87 floating point unit defines 8 80-bit registers in a stack configuration. By contrast, the 64-bit Alpha family of processors defines 32 64-bit wide floating point registers in addition to its 32 64-bit wide integer registers.

Memory limitations

Most CPUs are currently (c. 2005) designed so that the contents of a single integer register can store the address (location) of any datum in the computer's virtual memory. Therefore, the total number of addresses in the virtual memory — the total amount of data the computer can keep in its working area — is determined by the width of these registers. Beginning in the 1960s with the IBM System/360, then (amongst many others) the DEC VAX minicomputer in the 1970s, and then with the 80386 in the mid-1980s, a de facto consensus developed that 32 bits was a convenient register size. A 32-bit register meant that 2^32 addresses, or 4 gigabytes of memory, could be referenced. At the time these architectures were devised, 4 gigabytes of memory was so far beyond the typical quantities available in installations that this was considered to be enough "headroom" for addressing. 4-gigabyte addresses were considered an appropriate size to work with for another important reason: 4 billion integers are enough to assign unique references to most physically countable things in applications like databases.

However, with the march of time and the continual reductions in the cost of memory (see Moore's Law), by the early 1990s installations with quantities of RAM approaching 4 gigabytes began to appear, and the use of virtual memory spaces exceeding the 4-gigabyte ceiling became desirable for handling certain types of problems. In response, a number of companies began releasing new families of chips with 64-bit architectures, initially for supercomputers and high-end workstation and server machines. 64-bit computing has gradually drifted down to the personal computer desktop, with Apple Computer's PowerMac desktop line (as of 2003) and its iMac home computer line (as of 2004) both using 64-bit processors (the G5 chip from IBM), and AMD's "AMD64" architecture (cloned by Intel as "EM64T") becoming common in high-end PCs.

Timeline

• 1991: MIPS Technologies produced the first 64-bit CPU, as the third revision of their MIPS RISC architecture, the R4000. The CPU was commercially available in 1991 and used in SGI graphics workstations starting with the Indigo series, running the 64-bit version of the IRIX operating system.

• 1994: Intel announced plans for the 64-bit IA-64 architecture (jointly developed with HP) as a successor to its 32-bit IA-32 processors. A 1998-1999 launch date was targeted.

• 1995: Fujitsu-owned HAL Computer Systems launched workstations based on a 64-bit CPU, HAL's independently designed first generation SPARC64. IBM released 64-bit AS/400 systems, with the upgrade able to convert the operating system, database and applications.

• 1996: Sun and HP released their 64-bit processors, the UltraSPARC and the PA-8000. Sun Solaris, IRIX, and other variants of UNIX continued to be common 64-bit operating systems.

• 1999: Intel released the instruction set for the IA-64 architecture. First public disclosure of AMD's set of 64-bit extensions to IA-32 called x86-64.

• 2000: IBM shipped its first 64-bit mainframe, the zSeries z900, and its new z/OS operating system — culminating history's biggest 64-bit processor development investment and instantly wiping out 31-bit plug-compatible competitors Fujitsu/Amdahl and Hitachi. 64-bit Linux on zSeries followed almost immediately.

• 2001: Intel finally shipped its 64-bit processor line, now branded Itanium, targeting high-end servers. It failed to meet expectations due to the repeated delays in getting IA-64 to market, and became a flop. Linux was the first operating system to run on the processor at its release.

• 2002: Intel introduced the Itanium 2 as a successor to the Itanium.

• 2003: AMD brought out its 64-bit Opteron and Athlon 64 processor lines. Apple also shipped 64-bit PowerPC chips courtesy of IBM and Motorola, along with an update to its Mac OS X operating system. Several Linux distributions were released with support for x86-64. Microsoft announced that it would create a version of its Windows operating system for the AMD chips. Intel maintained that its Itanium chips would remain its only 64-bit processors.

• 2004: Intel, reacting to the market success of AMD, admitted it had been developing a clone of the x86-64 extensions, which it calls EM64T. Updated versions of its Xeon and Pentium 4 processor families supporting the new instructions were shipped.

• 2005: In March, Intel announced that its first dual-core processors would ship in the second quarter of 2005 with the release of the Pentium Extreme Edition 840 and the new chips. Dual-core Itanium 2 processors would follow in the fourth quarter.

• 2005: On April 18, Beijing Longxin rolled out its first 64-bit CPU, named Longxin II. The thumb-sized square chip packs 13.5 million transistors and has a peak rate of 2 billion single-precision or 1 billion double-precision calculations per second. The new chip reaches a maximum frequency of 500 MHz with a power consumption ranging from 3 to 5 watts.

• 2005: On April 30, Microsoft publicly released Windows XP x64 Edition for x86-64 processors.

• 2005: In May, AMD pre-released its dual-core desktop processor family, the Athlon 64 X2. Athlon 64 X2 (Toledo) processors feature two cores with 1 MB of L2 cache per core, consist of about 233.2 million transistors, and have a die size of 199 mm².

• 2005: In July, IBM announced its new dual-core 64-bit PowerPC 970MP (codenamed Antares).

32 vs 64 bit

A change from a 32-bit to a 64-bit architecture is a fundamental alteration, as most operating systems must be extensively modified to take advantage of the new architecture. Other software must also be ported to use the new capabilities; older software is usually supported through either a hardware compatibility mode (in which the new processors support an older 32-bit instruction set as well as the new modes), through software emulation, or by the actual implementation of a 32-bit processor core within the 64-bit processor die (as with the Itanium 2 processors from Intel). One significant exception to this is the AS/400, whose software runs on a virtual ISA which is implemented in low-level software. This software, called TIMI, is all that has to be rewritten to move the entire OS and all software to a new platform, such as when IBM moved the AS/400 line to 64-bit POWER-based processors.

While 64-bit architectures indisputably make working with huge data sets in applications such as digital video, scientific computing, and large databases easier, there has been considerable debate as to whether they or their 32-bit compatibility modes will be faster than comparably-priced 32-bit systems for other tasks.

Theoretically, some programs could well be faster in 32-bit mode. Instructions for 64-bit computing take up more storage space than the earlier 32-bit ones, so it is possible that some 32-bit programs will fit into the CPU's high-speed cache while equivalent 64-bit programs will not. However, in applications like scientific computing, the data being processed often fits naturally in 64-bit chunks, and will be processed faster on a 64-bit architecture because the CPU is designed to operate on such information directly rather than requiring the program to perform multiple steps. Such assessments are complicated by the fact that, in the process of designing the new 64-bit architectures, the instruction set designers have also taken the opportunity to make other changes that address some of the deficiencies in older instruction sets by adding new performance-enhancing facilities (such as the extra registers in the AMD64 design).

Pros and cons

A common misconception is that 64-bit architectures are no better than 32-bit architectures unless the computer has more than 4 GB of memory. This is not entirely true:

• Some operating systems reserve portions of each process' address space for OS use, effectively reducing the total address space available for mapping memory for user programs. For instance, Windows XP DLLs and userland OS components are mapped into each process' address space, leaving only 2 or 3 GB (depending on the settings) of address space available, even if the computer has 4 GB of RAM. This restriction is not present in Linux or 64-bit Windows.

• Memory mapping of files is becoming more problematic on 32-bit architectures, especially with the introduction of relatively cheap recordable DVD technology. A 4 GB file is no longer uncommon, and such large files cannot be memory mapped easily on 32-bit architectures. This is an issue, as memory mapping remains one of the most efficient disk-to-memory methods when properly implemented by the OS; see the sketch after this list.
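To make the file-size point concrete, here is a minimal sketch (added here, not from the original article) of memory-mapping a whole file with POSIX mmap on a Unix-like system. Built as a 64-bit program it can map files far larger than 4 GB; built as a 32-bit program the same code runs out of address space, and the size_t cast silently truncates a 64-bit file length.

    /* bigmap.c - build with: cc -D_FILE_OFFSET_BITS=64 bigmap.c -o bigmap */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        if (argc < 2) { fprintf(stderr, "usage: %s <file>\n", argv[0]); return 1; }

        int fd = open(argv[1], O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

        /* On a 32-bit process, size_t is 32 bits, so a >4 GB file cannot even be
           described to mmap in one piece; on a 64-bit process this is routine. */
        void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        printf("mapped %lld bytes at %p\n", (long long)st.st_size, p);
        munmap(p, (size_t)st.st_size);
        close(fd);
        return 0;
    }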

The main disadvantage of 64-bit architectures is that, relative to 32-bit architectures, the same data occupies slightly more space in memory (due to swollen pointers and possibly other types and alignment padding). This increases the memory requirements of a given process, and can have implications for efficient processor cache utilisation. Maintaining a partial 32-bit data model is one way to handle this, and is in general reasonably effective.

64-bit data models

Converting application software written in a high-level language from a 32-bit architecture to a 64-bit architecture varies in difficulty. One common recurring problem is that some programmers assume that pointers (variables that store memory addresses) have the same length as some other data type, and that they can therefore transfer quantities between these data types without losing information. Those assumptions happen to be true on some 32-bit machines (and even some 16-bit machines), but they are no longer true on 64-bit machines. The C programming language and its descendant C++ make it particularly easy to make this sort of mistake.

To avoid this mistake in C and C++, the sizeof operator can be used to determine the size of these primitive types if decisions based on their size need to be made at run time. Also, limits.h in the C99 standard and climits in the C++ standard provide more useful information; sizeof only returns the number of bytes, which is sometimes misleading because the width of a byte (CHAR_BIT) is implementation-defined in C and C++. One needs to be careful to use the ptrdiff_t type (in the standard header <stddef.h>) when doing pointer arithmetic; too much code incorrectly uses "int" or "long" instead.
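As an illustration (a minimal C99 sketch added here, not from the original article), the following program prints the relevant sizes and shows the portable types for pointer arithmetic and for storing a pointer in an integer:

    #include <stdio.h>
    #include <stddef.h>   /* ptrdiff_t */
    #include <stdint.h>   /* intptr_t, intmax_t (C99) */
    #include <limits.h>   /* CHAR_BIT */

    int main(void)
    {
        printf("CHAR_BIT       = %d\n", CHAR_BIT);
        printf("sizeof(int)    = %zu\n", sizeof(int));
        printf("sizeof(long)   = %zu\n", sizeof(long));
        printf("sizeof(void *) = %zu\n", sizeof(void *));

        int data[16];
        int *first = &data[0];
        int *last  = &data[15];

        /* Pointer differences belong in ptrdiff_t, not in int or long. */
        ptrdiff_t span = last - first;

        /* An integer that must hold a pointer value belongs in intptr_t or
           uintptr_t, which are wide enough by definition (unlike int on LP64). */
        intptr_t as_int = (intptr_t)first;

        printf("span = %td, pointer as integer = %jd\n", span, (intmax_t)as_int);
        return 0;
    }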

Neither C nor C++ define the length of a pointer, int, or long to be a specific number of bits.

In most programming environments on 32-bit machines, pointers, "int" variables, and "long" variables are all 32 bits wide.

However, in many programming environments on 64-bit machines, "int" variables are still 32 bits wide, but "long"s and pointers are 64 bits wide. These are described as having an LP64 data model. Another alternative is the ILP64 data model, in which all three data types are 64 bits wide. Yet another alternative is the LLP64 model, which maintains compatibility with 32-bit code by leaving both int and long as 32-bit; "LL" refers to the "long long" type, which is at least 64 bits on all platforms, including 32-bit environments. In most cases the modifications required to move to a 64-bit data model are relatively minor and straightforward, and many well-written programs can simply be recompiled for the new environment without changes. Note that a data model is a choice made on a per-compiler basis, and several can coexist on the same OS; however, the model chosen as the primary model by the OS API typically dominates.
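A short C program (a sketch added here, not part of the original article; it assumes 8-bit bytes) can report which of these data models the current compiler uses:

    #include <stdio.h>

    int main(void)
    {
        /* Assumes CHAR_BIT == 8, which holds on all mainstream platforms. */
        unsigned i = (unsigned)sizeof(int)    * 8;
        unsigned l = (unsigned)sizeof(long)   * 8;
        unsigned p = (unsigned)sizeof(void *) * 8;

        if (i == 32 && l == 64 && p == 64)
            puts("LP64  - typical 64-bit Unix-like environment");
        else if (i == 64 && l == 64 && p == 64)
            puts("ILP64 - int, long and pointer all 64 bits");
        else if (i == 32 && l == 32 && p == 64)
            puts("LLP64 - 64-bit Windows style; long long carries 64 bits");
        else if (i == 32 && l == 32 && p == 32)
            puts("ILP32 - typical 32-bit environment");
        else
            printf("other model: int=%u, long=%u, pointer=%u bits\n", i, l, p);
        return 0;
    }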

Another consideration is the data model used for drivers. Drivers make up the majority of the operating system code in most modern operating systems (although many may not be loaded when the operating system is running). Many drivers use pointers heavily to manipulate data, and in some cases have to load pointers of a certain size into the hardware they support for DMA. As an example, a driver for a 32-bit PCI device asking the device to DMA data into upper areas of a 64-bit machine's memory could not satisfy requests from the operating system to load data from the device to memory above the 4 gigabyte barrier, because the pointers for those addresses would not fit into the DMA registers of the device. This problem is solved by having the OS take the memory restrictions of the device into account when generating requests to drivers for DMA.
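The check involved can be sketched in C. This is a schematic illustration only, not real kernel code; the struct, field and function names below are hypothetical stand-ins for a device whose DMA engine has 32-bit address registers.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical descriptor for a PCI device whose DMA engine only has
       32-bit address registers. */
    struct legacy_pci_device {
        uint32_t dma_addr_reg;   /* can only hold addresses below 4 GB */
    };

    /* Returns false when the buffer's physical address cannot be expressed
       in the device's 32-bit DMA register; a real OS would then either fail
       the request or copy through a "bounce buffer" located below 4 GB. */
    static bool program_dma(struct legacy_pci_device *dev, uint64_t phys_addr)
    {
        if (phys_addr > UINT32_MAX) {
            fprintf(stderr, "buffer at 0x%llx is above the 4 GB barrier\n",
                    (unsigned long long)phys_addr);
            return false;
        }
        dev->dma_addr_reg = (uint32_t)phys_addr;
        return true;
    }

    int main(void)
    {
        struct legacy_pci_device dev = {0};
        program_dma(&dev, 0x00000000FEED0000ULL);  /* below 4 GB: fits        */
        program_dma(&dev, 0x000000017EED0000ULL);  /* above 4 GB: must bounce */
        return 0;
    }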

Current 64-bit processor architectures

64-bit processor architectures (as of 2005) include:

• The DEC Alpha architecture (view ALPHA 64-bit timeline)
• Intel's IA-64 architecture (used in Itanium CPUs)
• AMD's AMD64 architecture (used in AMD's Opteron and Athlon 64 CPUs)
   o Intel now markets the same architecture for its own processors as EM64T.
• SPARC architecture
   o Sun's UltraSPARC architecture
   o Fujitsu's SPARC64 architecture
• IBM's POWER architecture
• IBM/Motorola's PowerPC architecture (originally the PowerPC 620, more recently the PowerPC 970)
• IBM's z/Architecture, used by IBM zSeries and System z9 mainframes
• MIPS Technologies' MIPS IV, MIPS V, and MIPS64 architectures
• HP's PA-RISC family

Some 64-bit processor architectures, such as AMD64, MIPS64, SPARC64, zSeries, and PowerPC64, can execute 32-bit code natively without any performance penalty. This kind of support is commonly called biarch support or, more generally, multi-arch support.

Beyond 64 bits

64-bit words seem to be sufficient for most practical uses today (circa 2004). Still, it is worth noting that IBM's System/370 used 128-bit floating point numbers, and many modern processors also include 128-bit floating point registers. The System/370 was notable, however, in that it also used variable-length decimal numbers of up to 16 bytes (i.e. 128 bits).

IA-32

From Wikipedia, the free encyclopedia.

IA-32, sometimes generically called x86-32, is the computer architecture of Intel's most successful line of microprocessors. Within various programming language directives it is also referred to as "i386". The term may be used to refer to the 32-bit extensions to the original x86 architecture, or to the architecture as a whole.

This architecture defines the instruction set for the family of microprocessors installed in the vast majority of personal computers in the world.

The term means Intel Architecture, 32-bit, which distinguishes it from the 16-bit versions of the architecture that preceded it, and the 64-bit architecture IA-64 (which is very different, although it has an IA-32 compatibility mode). The more generic name for all 16 and 32-bit versions of this architecture is x86.

Intel was the inventor and is the biggest supplier of processors compatible with this instruction set, but it is not the only supplier of such processors. The second biggest supplier is AMD, and there are numerous smaller, more specialized suppliers as well.

This instruction set was introduced in the Intel 80386 in 1985 and is still the basis of most PC microprocessors twenty years later in 2005. Even though the instruction set has remained intact, the successive generations of microprocessors that run it have become much faster at doing so.

The IA-32 instruction set is usually described as a CISC (Complex Instruction Set Computer) architecture, though such classifications have become less meaningful with advances in microprocessor design.

Two memory management models

There are two memory access models that IA-32 supports. One is called Real mode, and the other is called Protected mode. In Real Mode, the processor is limited to accessing a total of just over 1MB of memory, while in Protected mode it can access all of its memory.

Real mode

The old DOS operating system required Real mode to work, while newer Windows, Linux, and other operating systems usually require Protected mode. Upon power-on (booting), the processor starts in Real mode, and then begins loading programs automatically into RAM from ROM and disk. A program inserted somewhere along the boot sequence may be used to put the processor into Protected mode.

Protected mode

Protected mode enables a number of features beyond the additional memory addressability above the 1 MB DOS limit. One of them is protected memory, which prevents programs from corrupting one another. Another is virtual memory, which lets programs use more memory than is physically installed on the machine. A third is task switching, also known as multitasking, which lets the computer juggle multiple programs so that they appear to run at the same time.

The size of memory in Protected mode is usually limited to 4 GB. However, this isn't the ultimate limit of the size of memory in IA-32 processors. Through tricks in the processor's paging and segment memory management systems, IA-32 operating systems may be able to access more than 32 bits of address space, even without the switchover to the 64-bit paradigm. One such trick is known as PAE (Physical Address Extension).

Virtual 8086 mode

There was also a sub-mode of operation in Protected mode, called virtual 8086 mode. This is basically a special hybrid operating mode which allowed old DOS programs and operating systems to run while under the control of a Protected mode supervisor operating system. This allowed for a great deal of flexibility in running both Protected mode programs and DOS programs simultaneously. This mode was added only with the IA-32 version of Protected mode; it did not exist previously in the 80286 16-bit version of Protected mode.

Registers

The 386 has eight 32-bit general purpose registers for application use. There are 8 floating point stack registers. Later processors added new registers with their various SIMD instruction sets too, such as MMX, 3DNow!, and SSE.

There are also system registers that are used mostly by operating systems and rarely by applications. They are known as segment, control, debug, and test registers. There are six segment registers, used mainly for memory management. The number of control, debug, or test registers varies from model to model.

General purpose registers

The x86 general purpose registers are not really as general purpose as their name implies, because these registers have some highly specialized tasks that can often be performed only by one or two specific registers. In other architectures, any general purpose register can be used for any purpose. The x86 general purpose registers further subdivide into registers specializing in data and others specializing in addressing.

Many operations can also be performed either inside a register or directly on data in RAM, without requiring the data to be loaded into a register first. The 1970s heritage of this architecture shows through in this behaviour.

Note: with the advent of the 64-bit extensions to x86 in AMD64, this odd behaviour has now been cleaned up (at least in 64-bit mode). General purpose registers are now truly general purpose and they can be used interchangeably. This does not affect the 32-bit architecture, however.

8-bit and 16-bit register subsets

8-bit and 16-bit subsets of these registers are also accessible. For example, the lower 16 bits of the 32-bit EAX register can be accessed as the AX register. Some of the 16-bit registers can be further subdivided into 8-bit subsets too; for example, the upper 8-bit half of AX is called AH, and the lower half is called AL. Similarly, EBX is subdivided into BX (16-bit), which in turn is divided into BH and BL (8-bit).
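The aliasing can be modelled in C for illustration (a sketch added here, not from the original article; it only depicts the layout and assumes a little-endian x86 machine, since actual register access happens in assembly):

    #include <stdint.h>
    #include <stdio.h>

    /* Layout model of how AX, AH and AL alias the low bits of EAX. This only
       illustrates the overlap; the byte positions shown assume little-endian x86. */
    union eax_model {
        uint32_t eax;
        uint16_t ax;                    /* low 16 bits of EAX */
        struct { uint8_t al, ah; } b;   /* low and high bytes of AX */
    };

    int main(void)
    {
        union eax_model r;
        r.eax = 0x12345678u;
        printf("EAX=0x%08X  AX=0x%04X  AH=0x%02X  AL=0x%02X\n",
               (unsigned)r.eax, (unsigned)r.ax, (unsigned)r.b.ah, (unsigned)r.b.al);
        /* On x86 this prints: EAX=0x12345678  AX=0x5678  AH=0x56  AL=0x78 */
        return 0;
    }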

General data registers

All four of the following registers may be used as general purpose registers. However, each also has some specialized purpose, and each has 16-bit and 8-bit subset names.

• EAX accumulator (with a special interpretation for arithmetic instructions; a for accumulator)
• EBX base register (used for addressing data in the data segment)
• ECX counter register (with a special interpretation for loops; c for counter)
• EDX data register

General address registers

These registers are used mainly for addressing. They have 16-bit subset names, but no 8-bit subsets.

• EBP base pointer (holds the address of the current stack frame)
• ESI source index (for string operations)
• EDI destination index (for string operations)
• ESP stack pointer (holds the top address of the stack)
• EIP instruction pointer (holds the current instruction address)

Floating point stack registers

Initially, IA-32 included floating-point capabilities only on add-on coprocessors (the 8087, 80287 and 80387). With the introduction of the 80486, these eight 80x87 floating point registers, known as ST(0) through ST(7), were built into the CPU. Each register is 80 bits wide and stores numbers in the extended precision format of the IEEE floating-point standard.

These registers are not accessible directly, but are accessed like a LIFO stack. The register numbers are not fixed, but are relative to the top of the stack: ST(0) is the top of the stack, ST(1) is the next register below the top, ST(2) is two below the top, and so on. Data is always pushed onto the top of the stack, and operations are always performed against the top of the stack, so the registers cannot simply be accessed in arbitrary order; access has to follow the stack discipline.

SIMD registers

MMX, 3DNow!, and SSE also added new registers of their own to the IA-32 instruction set.

MMX registers

MMX added 8 new registers to the architecture, known as MM0 through MM7 (henceforth referred to as MMn). In reality, these new registers were just aliases for the existing x87 FPU stack registers. Hence, anything that was done to the floating point stack would also affect the MMX registers. Unlike the FP stack, these MMn registers were fixed, not relative, and therefore they were randomly accessible.

Each of the MMn registers is a 64-bit integer register. However, one of the main concepts of the MMX instruction set is the concept of packed data types, which means that instead of using the whole register for a single 64-bit integer (quadword), it may hold two 32-bit integers (doublewords), four 16-bit integers (words) or eight 8-bit integers (bytes).
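A packed 16-bit addition can be written with the MMX intrinsics that GCC, Clang and ICC expose in mmintrin.h (a minimal sketch added here, assuming an x86 target compiled with -mmmx):

    #include <mmintrin.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        /* Pack four 16-bit integers into each 64-bit MMn register. */
        __m64 a = _mm_set_pi16(4, 3, 2, 1);     /* elements 3..0 */
        __m64 b = _mm_set_pi16(40, 30, 20, 10);

        __m64 sum = _mm_add_pi16(a, b);         /* four 16-bit additions at once */

        int16_t out[4];
        memcpy(out, &sum, sizeof out);

        /* The MMn registers alias the x87 stack, so clear the MMX state
           before any subsequent floating point code runs. */
        _mm_empty();

        printf("%d %d %d %d\n", out[0], out[1], out[2], out[3]);  /* 11 22 33 44 */
        return 0;
    }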

Also, because the MMX's 64-bit MMn registers are aliased to the FPU stack, and each of the stack registers is 80 bits wide, the upper 16 bits of the stack registers go unused in MMX. These bits are set to all ones, which makes the contents look like NaNs or infinities in the floating point view, and makes it easier to tell whether a register holds floating point data or MMX data.

3DNow! registers

3DNow! was designed to be the natural evolution of MMX from integers to floating point. As such, it uses the exact same register naming convention as MMX, that is, MM0 through MM7. The only difference is that instead of packing byte-to-quadword integers into these registers, one would pack single precision floating point values into them. The advantage of aliasing these registers with the FPU registers is that the same instructions and data structures used to save the state of the FPU registers can also be used to save 3DNow! register states. Thus no special modifications are required to operating systems which would otherwise not know about the new registers.

SSE registers

SSE discarded all legacy connections to the FPU stack. This also meant that this instruction set discarded all legacy connections to previous generations of SIMD instruction sets like MMX. But it freed the designers up, allowing them to use larger registers, not limited by the size of the FPU registers. The designers created eight 128-bit registers, named XMM0 through XMM7. (Note: in AMD64, the number of SSE XMM registers has been increased from 8 to 16.)

The downside is that operating systems had to be aware of this new set of registers in order to save their state. So Intel created a slightly modified version of Protected mode, called Enhanced mode, which enables the use of SSE instructions, whereas they stay disabled in regular Protected mode. An OS that is aware of SSE will activate Enhanced mode, whereas an unaware OS will only enter traditional Protected mode.

SSE is a SIMD instruction set that works only on floating point values, like 3DNow!. However, unlike 3DNow!, it severs all legacy connections to the FPU stack. Because it has larger registers than 3DNow!, SSE can pack twice the number of single precision floats into its registers. The original SSE was limited to single-precision numbers, like 3DNow!. SSE2 introduced the capability to pack double precision numbers too, which 3DNow! had no possibility of doing, since a double precision number is 64 bits in size, which would be the full size of a single 3DNow! MMn register. At 128 bits, the SSE XMMn registers can pack two double precision floats into one register, so SSE2 is much more suitable for scientific calculations than either SSE1 or 3DNow!, which were limited to single precision.
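For illustration, a packed double-precision addition using the SSE2 intrinsics in emmintrin.h looks like this (a minimal sketch added here, assuming an x86 target compiled with -msse2):

    #include <emmintrin.h>
    #include <stdio.h>

    int main(void)
    {
        /* Each 128-bit XMM register packs two double-precision values. */
        __m128d a = _mm_set_pd(2.0, 1.0);      /* elements 1 and 0 */
        __m128d b = _mm_set_pd(20.0, 10.0);

        __m128d sum = _mm_add_pd(a, b);        /* two double additions at once */

        double out[2];
        _mm_storeu_pd(out, sum);               /* unaligned store to memory */

        printf("%.1f %.1f\n", out[0], out[1]); /* prints 11.0 22.0 */
        return 0;
    }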

Instructions

The full listing of the x86 machine language mnemonics, including integer, floating point, and SIMD instructions, can be found in the x86 instruction listings link. They are categorized in a chronological and hierarchical format showing when the instructions first became available and what category of instructions they are.

The original IA-32 instruction set has evolved over time with the addition of the multimedia instruction updates. The ultimate evolution of IA-32, however, is its extension to 64 bits, at which point it can no longer be called IA-32; it is instead called x86-64, and the first implementation was AMD's AMD64. It cannot be called IA-64, as Intel and HP have already reserved that label for their new Itanium design, which is not an evolution that extends IA-32, whereas AMD64 is. AMD64 was the first x86-64 instruction set designed; Intel later followed by imitating AMD's design with what it calls EM64T.

SIMD Multimedia Instruction Set updates

Various generations of IA-32 CPUs since have added several extensions to the original instruction set. They were known technically as SIMD instruction sets. However, more colloquially they were known as Multimedia instruction sets, because they were mainly used in multimedia entertainment software applications.

• The MMX extensions were the first major upgrade. This was a set of integer-only SIMD instructions, co-introduced by Intel and AMD in their Pentium MMX and K6 processors in 1997. It shared its registers with the x87 FPU; therefore operating systems did not have to be modified to accept these instructions, and they automatically worked if the OS already supported x87 state saving.

• MMX was further upgraded with the addition of floating-point SIMD capabilities, with the introduction of 3DNow! in 1998. Like MMX, this set shared its registers with the x87 FPU. This extension was introduced by AMD in the K6-2 processor, but it was never picked up by Intel.

• SSE was single precision floating point SIMD introduced by Intel in 1999, with the introduction of the Pentium III processor. Unlike 3DNow!, it was not an extension to MMX, nor did it share its registers with the x87 FPU. It required some modifications to operating systems in order to work; this added programming inconvenience was made up for by the fact that SSE worked unencumbered by any of the old limitations of the x87 FPU. This instruction set was eventually adopted by AMD, starting with its Athlon XP processor; all further extensions to SSE will likely be adopted by AMD from now on, as it will no longer make any extensions to its own 3DNow! instructions.

• SSE2 was introduced in early 2001 with the introduction of the Pentium 4 processor. This was a further upgrade to the original SSE, adding double precision operations to its bag of tricks.

• SSE3 was introduced in early 2004, in an upgraded version of the Pentium 4 codenamed Prescott. It featured some minor tweaks to the SSE2 extensions.

Next-generation 64-bit Instruction Sets

Two new instruction sets can claim to be the 64-bit successor to IA-32. One of them builds on top of IA-32 but has a different name, while the other one discards IA-32 completely but has a similar name.

IA-64

Intel's IA-64 architecture is not directly compatible with the IA-32 instruction set. It completely discards all IA-32 instructions, and starts from scratch with a completely different instruction set as well as using a VLIW design instead of out-of-order execution. IA-64 is the architecture used by their Itanium line of processors. The Itanium has hardware-support for IA-32, though very slow because of the different approach. IA-32 execution mode is set by the EFI program loaded on boot-up. The nomenclature "IA-64" means "Intel Architecture, 64-bit", but the connection with IA-32 is only in the name.

AMD64

AMD's AMD64 instruction set, aka x86-64, is largely built on top of IA-32, and thus maintains the x86 family heritage. While extending the instruction set, AMD took the opportunity to clean up, while the processor is operating in 64-bit mode, some of the odd behaviour that has plagued this instruction set since its earliest 16-bit days. AMD also doubled the number of general purpose registers from 8 to 16 and made them much more truly general purpose, doubled the number of SSE registers from 8 to 16 as well, and deprecated most of the functionality of the segment registers, whose usage had already declined steadily during the IA-32 days.

EM64T

By February 2004, Intel implicitly acknowledged the logic of the AMD64 instruction set, deriving from it EM64T, which is very similar to AMD64. This extension is compatible with code written for AMD64. Intel first used the extensions in the Xeon Nocona core in 2004, and introduced them to the desktop market with the Pentium 4 Prescott 2M in early 2005.

IA-64

From Wikipedia, the free encyclopedia.

In computing, IA-64 (Intel Architecture-64) is a 64-bit processor architecture developed in cooperation by Intel and Hewlett-Packard, implemented by processors such as the Itanium and Itanium 2. The goal of Itanium was to produce a "post-RISC era" architecture using a very long instruction word (VLIW) design. Unlike Intel's x86 processors, the Itanium is not geared toward high performance execution of the IA-32 (x86) instruction set.

Architecture

EPIC

In a mainstream "out-of-order" design, a complex decoder system examines each instruction as they flow through the pipeline and sees which can be fed off to operate in parallel across the available execution units — e.g., a series of instructions that say A = B + C and D = F + G will not affect each other, and so they can be fed into two different execution units and run in parallel. The ability to extract instruction level parallelism (ILP) from the instruction stream is essential for good performance in a modern CPU. Predicting which code can and cannot be split up this way is a very complex task. In many cases the inputs to one line are dependent on the output from another, but only if some other condition is true. For instance, consider the slight modification of the example noted before, A = B + C; IF A==5 THEN D = F + G. In this case the calculations remain independent of the other, but the second command requires the results from the first calculation in order to know if it should be run at all.

In these cases the circuitry on the CPU typically "guesses" what the condition will be. In something like 90% of all cases, an IF will be taken, suggesting that in our example the second half of the command can be safely fed into another execution unit. However, getting the guess wrong can cause a significant performance hit when the result has to be thrown out and the CPU waits for the result of the "right" command to be calculated. Much of the improving performance of modern CPUs is due to better prediction logic, but lately the improvements have begun to slow.

IA-64 instead relies on the compiler for this task. Even before the program is fed into the CPU, the compiler examines the code and makes the same sorts of decisions that would otherwise happen at "run time" on the chip itself. Once it has decided what paths to take, it gathers up the instructions it knows can be run in parallel, bundles them into one larger instruction, and then stores it in that form in the program—hence the name VLIW or "very long instruction word."

Moving this task from the CPU to the compiler has several advantages. First, the compiler can spend considerably more time examining the code, a benefit the chip itself doesn't have because it has to complete the work as quickly as possible. Thus the compiler version can be considerably more accurate than the same decisions made by the chip's circuitry. Second, the prediction circuitry is quite complex, and offloading prediction to the compiler reduces that complexity enormously; the CPU no longer has to examine anything, it simply breaks the instruction apart again and feeds the pieces off to the execution units. Third, doing the prediction in the compiler is a one-off cost, rather than one incurred every time the program is run.

The downside is that a program's runtime-behaviour is not always obvious in the code used to generate it, and may vary considerably depending on the actual data being processed. The out-of-order processing logic of a mainstream CPU can make decisions on the basis of actual run-time data which the compiler can only guess at. That means that it is possible for the compiler to get its prediction wrong more often than comparable (or simpler) logic placed on the CPU. The VLIW design thus relies heavily on the performance of the compilers, the trade-off being to decrease microprocessor hardware complexity by increasing compiler software complexity.

Registers

The IA-64 architecture includes a very generous complement of registers: 128 each of 82-bit floating point and 64-bit integer registers. In addition to the sheer number, IA-64 adds a register rotation mechanism that is controlled by the Register Stack Engine. Rather than the typical spill/fill or window mechanisms used in other processors, the Itanium can rotate in a set of new registers to accommodate new function parameters or temporaries. The register rotation mechanism, combined with predication, is also very effective in executing automatically unrolled loops.

Instruction set

The architecture also provides instructions for multimedia operations and floating point operations.

Where a typical VLIW will assign sub-instructions from each long instruction word to a particular fixed functional unit, the Itanium supports several bundle mappings to allow for more instruction mixing possibilities, including a balance between serial and parallel execution modes. Room was left in the initial bundle encodings to add more mappings in future versions of IA-64. In addition, the Itanium has individually settable predicate registers that give each instruction a kind of runtime-determined "no output" mode.

A raw Itanium, when first booted, is actually missing some of its instruction functionality. A boot-ROM-like program, called an EFI program, is loaded; it loads additional code into on-chip memory to define these instructions and performs other boot-time configuration, such as choosing the execution mode of the processor (64-bit versus 32-bit). This design allows an Itanium system to be deployed with different capabilities depending on the contents of the EFI program.

IA-32 support

In order to support IA-32, the Itanium can switch into 32-bit mode with special jump escape instructions. The IA-32 instructions have been mapped to the Itanium's functional units. However, since the Itanium is built primarily for the speed of its EPIC-style instructions, and because it has no out-of-order execution capabilities, IA-32 code executes at a severe performance penalty compared to either the IA-64 mode or the Pentium line of processors. For example, the Itanium functional units do not automatically generate integer flags as a side effect of ordinary ALU computation, and do not intrinsically support multiple outstanding unaligned memory loads. There are also IA-32 software emulators which are freely available for Windows and Linux, and these emulators typically outperform the hardware-based emulation by around 50%. The Windows emulator is available from Microsoft; the Linux emulator is available from some Linux vendors such as Novell. Given the superior performance of the software emulator, there has been some speculation that Intel will remove IA-32 emulation from future Itanium processors. However, the IA-32 hardware accounts for less than 1% of the transistors of an Itanium 2, so there is little to gain from doing so.

Competitors

Although other 64-bit architectures have existed for a long time, most (MIPS, Alpha, PA-RISC) have faded from the marketplace. Itanium's remaining competition for the 64-bit server and workstation market appears to be the resurgent AMD with its AMD64 architecture, and the entrenched rivals: IBM's POWER architecture and Sun's UltraSPARC architecture. Although Apple might have challenged Intel with its Xserve product line based on the IBM PowerPC architecture, any such prospect evaporated with the announcement of Apple's adoption of the Intel IA-32 architecture for its future products.

In response to favorable industry reaction to the AMD64, Intel's new version of the Xeon (Nocona) supports EM64T extensions to IA-32, which are largely instruction-set compatible with AMD64.

AMD64

From Wikipedia, the free encyclopedia.

AMD64 (also known as x86-64 or x64) is a 64-bit processor architecture invented by AMD. It is a superset of the x86 architecture, which it natively supports. The AMD64 instruction set is currently used in AMD's Athlon 64, Athlon 64 FX, Athlon 64 X2, Turion 64, Opteron and later processors.

Architecture Overview

AMD's x86-64 instruction set (later renamed AMD64) is a straightforward extension of the x86 architecture to 64 bits, motivated by the fact that the 4GB of memory directly addressable by a 32 bit CPU is no longer sufficient for all applications. Some of the changes:

• New registers. The number of general-purpose registers (GPRs) is increased from 8 in x86-32 to 16, and the size of these registers is increased from 32 bits to 64 bits. Additionally, the number of 128 bit XMM registers (used for Streaming SIMD instructions) is also increased from 8 to 16. The additional registers increase performance.

• Larger address space. Thanks to its 64-bit design, the AMD64 architecture can address up to 256 tebibytes of memory in its current implementations, compared to just 4 GB for x86-32, only half of which is available to applications under the most common versions of Microsoft Windows. Future implementations of the AMD64 architecture may provide up to 2 exbibytes of addressable memory. If paging is used properly, 32-bit operating systems can use some of the processor's physical address extensions without having to execute in long mode, but virtual memory for all programs running in 32-bit mode is still limited to 4 GB.

• RIP relative data access. Instructions can now reference data relative to the instruction pointer (RIP), which makes code in shared libraries that are not compiled to a fixed address more efficient. It also allows shared libraries to be mapped anywhere in the virtual address space.

• SSE instructions. The AMD64 architecture includes Intel's SSE and SSE2 instructions, and newer E-stepping CPUs include SSE3 as well. The x87 and MMX instructions are also supported.

• NX bit. The NX bit is a processor feature that allows the operating system to forbid code execution in data areas, improving security. This feature is available in both 32-bit and 64-bit modes, and is supported by Linux, Solaris, and Windows XP SP2, Windows Server 2003 SP1 and newer. The NX bit (when coupled with an OS which takes advantage of it) is referred to in AMD's marketing literature as Enhanced Virus Protection (EVP). While it does indeed block a common attack vector for many types of malware (most notably buffer overflows), neither the NX bit nor any single technological measure is sufficient to prevent viruses from infecting a computer. Trade regulators in the Netherlands recently asked AMD to cease calling the NX bit "Enhanced Virus Protection" in advertisements in that country, stating that the NX capability was not a suitable substitute for other countermeasures, such as anti-virus software.

It should be noted that execute protection has long been available on 32-bit x86 processors through the segmented memory model, originally introduced in the 80286 processor. However, segmentation has long been considered an obsolete mode of operation by systems software vendors (no current PC operating system relies on it), and AMD was the first x86-family vendor to support no-execute protection in linear (paged) addressing mode. Intel and other x86 CPU vendors are now supporting the NX bit in their product offerings as well.
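To show what the NX bit means in practice for software, here is a minimal POSIX sketch (added here, not from the original article; Linux or BSD assumed). Data pages are mapped without execute permission, so on NX-capable hardware any attempt to run code from them faults:

    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t len = 4096;

        /* Readable and writable, but deliberately NOT executable. */
        unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED) { perror("mmap"); return 1; }

        memset(buf, 0xC3, len);   /* fill with x86 "ret" opcodes, held as data */

        /* Jumping into buf here would raise SIGSEGV on an NX-enabled system.
           Making the region executable requires an explicit policy decision:
           mprotect(buf, len, PROT_READ | PROT_EXEC);                        */

        printf("data buffer mapped at %p without execute permission\n",
               (void *)buf);
        munmap(buf, len);
        return 0;
    }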

Operating modes

The architecture's operating modes, with the operating system required, whether applications must be recompiled, the default address and operand sizes, the typical general purpose register (GPR) width, and whether the register extensions are available:

Long mode (requires a new 64-bit OS)
• 64-bit mode: application recompile required; register extensions available; default address size 64 bits; default operand size 32 bits; typical GPR width 64 bits.
• Compatibility mode: no recompile required; no register extensions; default address and operand sizes 32 or 16 bits; typical GPR width 32 or 16 bits.

Legacy mode (legacy 32-bit or 16-bit OS; no recompile required; no register extensions)
• Protected mode: default address and operand sizes 32 or 16 bits; typical GPR width 32 or 16 bits.
• Virtual 8086 mode: default address and operand sizes 16 bits; typical GPR width 16 bits.
• Real mode (legacy 16-bit OS): default address and operand sizes 16 bits; typical GPR width 16 bits.

Operating mode explanation

There are two primary modes of operation for this architecture:

Long Mode

The intended primary mode of operation of the architecture; it is a combination of the processor's native 64-bit mode and a 32-bit compatibility mode. It also abandons some of the more half-baked or lesser-used features of the 80386. It is used by 64-bit operating systems; among those that support Long Mode are Linux, the various BSDs, Solaris 10 and Windows XP Professional x64 Edition. Since the basic instruction set is the same, there is no major performance penalty for executing x86 code. This is unlike Intel's IA-64, where differences in the underlying ISA mean that running 32-bit code is like using an entirely different processor. However, on AMD64, 32-bit x86 applications may still benefit from a 64-bit recompile, due to the additional registers in 64-bit code, which a high-level compiler can use for optimization. Using Long Mode, a 64-bit OS can run 32-bit applications and 64-bit applications simultaneously. Also, x86-64 includes native support for running 16-bit x86 applications. Microsoft, however, has explicitly left out 16-bit program support in Windows XP Professional x64 Edition due to problems in getting 16-bit x86 code to run via its WoW64 subsystem.

Legacy Mode

The mode used by 16-bit operating systems, like MS-DOS, and 32-bit operating systems, such as Windows XP. In this mode, only 16-bit or 32-bit code can be executed. 64-bit programs (such as the GUI setup program for Windows XP Professional x64 Edition and Windows Server 2003 x64 Edition) will not run.

Market analysis

AMD64 represents a break with AMD's past behavior of following Intel's standards, but follows Intel's earlier behavior of extending the x86 architecture, from the 16-bit 8086 to the 32-bit 80386 and beyond, without ever removing backwards compatibility. The AMD64 architecture extends the 32-bit x86 architecture (IA-32) by adding 64-bit registers, with full 32-bit and 16-bit compatibility modes for earlier software. Even the 64-bit mode is largely backwards compatible, allowing existing tools targeting x86, such as compilers, to be retargeted to AMD64 with minimal effort. The AMD64 architecture also features the NX bit.

Implementations

The following processors implement the AMD64 architecture:

• AMD K8
   o AMD Athlon 64
   o AMD Athlon 64 X2
   o AMD Athlon 64 FX
   o AMD Opteron
   o AMD Turion 64
   o AMD Sempron (only 'Palermo' models using the E6 stepping)
• Intel EM64T
   o Intel Xeon (some models since 'Nocona')
   o Intel Celeron D (some models since 'Prescott')
   o Intel Pentium 4 (some models since 'Prescott')
   o Intel Pentium D
   o Intel Pentium Extreme Edition
   o Intel mobile processors (some models starting with 'Merom')
   o Intel Conroe (upcoming desktop core)

EM64T

From Wikipedia, the free encyclopedia.

Extended Memory 64-bit Technology (EM64T) is Intel's implementation of AMD64, a 64-bit extension to the IA-32 architecture. See the AMD64 article for architectural details.

History

The history of the EM64T project is long and convoluted, mainly due to the internal politics of Intel. It began with the codename Yamhill, named after the Yamhill River in Oregon's Willamette Valley. After several years of denying that this project existed, Intel eventually admitted to it in early 2004 and gave it the codename CT (Clackamas Technology), also named after an Oregon river (the Clackamas River, a tributary of the Willamette). Then, within the space of weeks of the CT announcement, Intel gave it several new names: after the spring 2004 IDF, Intel named it IA-32E (IA-32 Extensions), and a few weeks later devised the name EM64T. Intel's chairman at the time, Craig Barrett, admitted that this was one of their worst kept secrets.

Intel CPUs with EM64T

Intel's first processor to implement the EM64T technology is the processor codenamed Nocona, sold as Intel's latest multiprocessor Xeon. Since the Xeon itself is directly based on Intel's desktop processor, the Pentium 4, the Pentium 4 also has EM64T technology built in, although, as with Hyper-Threading, this feature was not initially enabled on the then-new Prescott design, likely because Intel had not yet perfected it at the time. Intel has since begun selling EM64T-enabled Pentium 4s using the E0 revision of the Prescott core, sold on the market as the Pentium 4, model F. The E0 revision also adds eXecute Disable (XD) support, Intel's name for the NX bit, to EM64T, and should be backported into the Nocona design soon. All 8xx/6xx/5x6/5x1/3x6/3x1 series CPUs have EM64T enabled, as will all future Intel CPUs.

As of June 2005, none of Intel's notebook CPUs (the Pentium M family, the Celeron M family, or the Mobile Intel Pentium 4 processors) support EM64T. The first Pentium M derivative supporting EM64T will be the dual core Merom targeted for mid-2006.