Dr. Ernesto Gomez : CSE 401
Most of the material here is in chapter 2 of the text, specifically sections 2.1-2.8. We are using MIPS as an example of a real processor. This is because computer architecture is more of an engineering discipline than a pure science. Our designs are not constrained only by theory; we need to consider physics (is it possible?), engineering (do we have the technology? how much power will it use? what infrastructure will it need?), cost (what does it cost to do it? is the improvement worth it?), economics (can we sell it? how big a market?), and history (is it compatible with existing tech? is there an upgrade path?). Other considerations may apply as well in specific cases. We need to consider problems well beyond the purely technical.

We therefore will study a specific processor: MIPS. This is a fairly old processor, but it illustrates a design that is still very current - the ARM processor family is very close to the MIPS architecture. MIPS is simple enough for easier understanding, but has the peculiarities of a real machine, which serve to illustrate what actually happens in a way that an idealized model architecture does not. MIPS is still used inside network switches and routers.

1. CISC and RISC

One way of classifying Instruction Set Architectures (ISA) is: Complex Instruction Set Computer (CISC) and Reduced Instruction Set Computer (RISC). (Maybe one reason to use acronyms is that they sound a lot more intelligent and technical than what they stand for?) To understand the reasoning behind this, we need some definitions:

• word : the size of memory that a computer can process in a single instruction - if a CPU uses registers, the register size defines the word. Usually, but not always, data and memory buses are designed to transfer one word (or some fixed number of words) per cycle. The size in bits of the word is what we mean when we say that some particular machine is "32 bit" or "64 bit", or whatever. In modern computers, word size is some power of 2 (2^5 = 32, 2^6 = 64), but historically this was not always the case - there have been 12 bit and 36 bit computers in the past, for example.

• byte : 8 bits. Current practice is to address memory in bytes (in some older machines, addresses were sometimes in words). Bytes are a useful size for character/text processing: 2^8 = 256, which allows a good number of digits, upper/lower case letters, and special characters to be stored in one byte. Many machines have instructions that operate on bytes rather than words.

With small word sizes CISC was almost required - 8 bits gives only 256 possible instruction codes, and if you need them all, then instruction formats require multiple words. This takes more cycles to process each instruction, so designs favored doing complicated things in each instruction, so you could accomplish a task in fewer instructions. For example, the PDP-11 (16 bit word) had 18 addressing modes applicable to memory (called register modes: ways of addressing memory and loading or storing registers at an address, at the location an address points to, or doing either of the previous things and incrementing the address, all in one instruction ... it gets really complicated). I had a PDP-11 in the late 70s - early 80s; it had remarkable speed for the time. Even running at 1 MHz (two or three thousand times slower than machines now!) it got a lot done in few cycles. (An overview of this can be found at https://en.wikipedia.org/wiki/PDP-11_architecture#General_register_addressing_modes, if you are interested.)
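These modes map naturally onto C pointer idioms. The following is a rough sketch, not PDP-11 code - the mode names in the comments are used loosely - of what a single such instruction could do:

    #include <stdio.h>

    int main(void) {
        int mem[4] = {10, 20, 30, 40}; /* stand-in for memory */
        int *r1 = mem;                 /* stand-in for a register holding an address */

        int a = *r1;   /* "register deferred": load the word the register points at */
        int b = *r1++; /* "autoincrement": load that word, THEN advance the register */
        int c = *r1;   /* the register now points at the next word */

        printf("%d %d %d\n", a, b, c); /* prints: 10 10 20 */
        return 0;
    }

The point is the middle line: a load plus a register update in one step - exactly the kind of multi-action instruction that makes different CISC instructions take very different numbers of cycles.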
In CISC, there can be large differences in complexity and work between instructions, so the CPI (cycles per instruction) for such a machine is usually not near 1. For example, if the direct address mode in a PDP-11 (load a value at an address) could be done in 1 cycle, the indirect-increment mode (read the address in a register, load the contents of that address, increment the register) could easily take 3 cycles.

In the 1980s and later, RISC architectures were introduced. The name comes from comparison with CISC - "Reduced" compared to a typical CISC instruction set. RISC machines had fewer and simpler instructions - a single CISC instruction might need three or four RISC instructions. Since a RISC instruction does less than a typical CISC instruction, it is both easier to decode (fewer instructions, simpler formats) and runs faster (because it does a fraction of the work a CISC instruction does). Differences in the amount of work between different instructions are lessened; more complex actions like the indirect-increment instruction described above are done in multiple simple steps. So you could run a faster clock, with a CPI much closer to 1.

1.0.1. RISC advantage. Which was better, RISC or CISC? There was no real speed difference; it depended on the quality of implementation. RISC machines had faster clocks and better CPI, but substantially longer machine code than CISC for the same algorithms. CISC had a slower clock and worse CPI, but much shorter code sequences. It balanced out (see the arithmetic sketch at the end of this subsection). In the 1980s, we really only cared about speed. CISC did have an advantage in ease of programming in assembly language, however - the larger variety of instructions made it easy to find an instruction or sequence that could do whatever you need. In a RISC machine like MIPS you can be forced into strange program structures - for example, there is no copy instruction from one register to another; if you need to do this, you add 0 to the source register and put the result in the target register (on MIPS, add $t0, $s0, $zero copies $s0 into $t0). In high level languages there is no programming difference; the machine code is generated by the compiler or interpreter.

In the 1990s and later we have become concerned with power usage and heat, and we are building much more complicated circuits into chips to get more parallelism (because we can't economically cool chips running much over 4 GHz, and the upper limit is still around 9 GHz, cooled with totally impractical liquid helium). RISC requires simpler circuits - these are cheaper and smaller than CISC circuits, we can get more of them on a chip for parallelism, and they use less power so they generate less heat. So at present the advantages all point to RISC: ARM and essentially all new designs are RISC.
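Going back to the 1980s trade-off: the "it balanced out" claim is just the performance equation, time = instruction count x CPI / clock rate. The numbers below are invented for illustration (the text gives no specific figures), but they show how roughly 3x more RISC instructions can be offset by a better CPI and a somewhat faster clock:

    #include <stdio.h>

    /* Performance equation: time = instruction count * CPI / clock rate. */
    double exec_time(double insts, double cpi, double clock_hz) {
        return insts * cpi / clock_hz;
    }

    int main(void) {
        /* Hypothetical 1980s-style machines; all numbers are made up. */
        double cisc = exec_time(1.0e6, 3.0, 10.0e6); /* 1M insts, CPI 3.0, 10 MHz */
        double risc = exec_time(3.0e6, 1.2, 12.0e6); /* 3M insts, CPI 1.2, 12 MHz */

        printf("CISC: %.2f s\n", cisc); /* 0.30 s */
        printf("RISC: %.2f s\n", risc); /* 0.30 s - same program, same time */
        return 0;
    }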
1.0.2. Intel chips after 2000. What about Intel? The Pentium architecture (also called the x86) descends from the Intel 80286 (used in the IBM PC-AT; a 16 bit chip with some 24 bit features) and in turn from the Intel 8088 (internally 16 bits, but with an 8 bit memory bus). These were late 1970s and early 1980s designs with a CISC Instruction Set Architecture. Backward compatibility has forced Intel to retain CISC while everything else is RISC, because of the enormous base of software developed on Intel processors. (The reasons were commercial - even back in the 1980s there were better architectures, such as the Motorola 68000, which was internally a 32 bit machine with 24 bit addressing and cost roughly the same as an Intel 80286, but they were not as commercially successful as the PC and PC compatible machines.)

Intel actually tried to replace the x86 series in the early 2000s with their first 64 bit CPU, the Itanium. It was a better architecture, but it only ran old Intel code through an emulator program; getting the full advantage of the new chip required recompilation with new compilers. The emulation meant the new computer would run your existing code slower than your old computer. Itanium was a commercial failure, finally killed when AMD introduced a backward compatible 64 bit chip - all the alternatives from Intel at the time were 32 bit. Intel caught up with the Core 2 series chips, a 64 bit backward compatible architecture, retaining the old CISC design.

They did finally go over to RISC - their current pipeline system inputs old x86 CISC code and translates it to RISC before passing it to the actual computational core. So the interior of the CPU is a modern, efficient RISC system wrapped inside an external hardware translator that reads old code compatible with a hardware design from the 1980s. Intel gets the advantage of a faster clock and lower power consumption on a simpler system, and the commercial advantage of compatibility with code from 1980. The clock is faster because in a pipeline design it is set to match the slowest single stage. In Lecture 2 we described a simple 5 stage pipeline with the clock set by the slowest stage (usually arithmetic and logic, or memory). Since the actual instructions that execute in Intel are RISC, they have the same speed and power advantages as RISC code. The downside is that the Intel processors need many more stages to do the RISC conversion and change the instruction stream from CISC to a longer RISC instruction stream. This doesn't need a slower clock, because the extra stages are simpler than the computation and memory stages, but it needs a 15 to 20 stage pipeline instead of the 5 stages of MIPS. We will see some of the implications of this in Chapter 4.
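A small sketch of that last clock claim: the cycle time is the latency of the slowest stage, so adding translation stages that are all faster than the ALU/memory bottleneck leaves the clock unchanged. The stage latencies here are invented purely for illustration:

    #include <stdio.h>

    /* The clock period must cover the slowest pipeline stage. */
    double clock_period(const double stage_ns[], int n) {
        double worst = stage_ns[0];
        for (int i = 1; i < n; i++)
            if (stage_ns[i] > worst) worst = stage_ns[i];
        return worst;
    }

    int main(void) {
        /* Hypothetical latencies in ns: fetch, decode, ALU, memory, writeback */
        double mips5[] = { 0.6, 0.5, 1.0, 1.0, 0.4 };
        /* Same core stages plus extra, simpler CISC-to-RISC translation stages;
           none exceeds the 1.0 ns ALU/memory bottleneck. */
        double deep[]  = { 0.6, 0.5, 0.7, 0.7, 0.8, 0.6, 1.0, 1.0, 0.4 };

        printf("5-stage clock period:  %.1f ns\n", clock_period(mips5, 5)); /* 1.0 */
        printf("deeper pipeline clock: %.1f ns\n", clock_period(deep, 9));  /* 1.0 */
        return 0;
    }

The cost of the extra stages shows up elsewhere - more instructions in flight at once - which is part of what Chapter 4 takes up.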