Intel X86 Architecture Intel Microprocessor History

Total Page:16

File Type:pdf, Size:1020Kb

Intel X86 Architecture Intel Microprocessor History Intel x86 Architecture Intel microprocessor history Comppgzuter Organization and Assembl yggy Languages Yung-Yu Chuang with slides by Kip Irvine Early Intel microprocessors The IBM-AT • Intel 8080 (1972) • Intel 80286 (1982) – 64K addressable RAM – 16 MB addressa ble RAM –8-bit registers – Protected memory – CP/M operating system – several times faster than 8086 – 5,6,8,10 MHz – introduced IDE bus architecture – 29K transistros – 80287 floating point unit • Intel 8086/8088 (1978) my first computer (1986) – Up to 20MHz – IBM-PC used 8088 – 134K transistors – 1 MB addressable RAM –16-bit registers – 16-bit data bus (8-bit for 8088) – separate floating-point unit (8087) – used in low-cost microcontrollers now 3 4 Intel IA-32 Family Intel P6 Family • Intel386 (1985) • Pentium Pro (1995) – 4 GB addressable RAM – advan ced optimiz ati on techni ques in micr ocode –32-bit registers – More pipeline stages – On-board L2 cache – paging (virtual memory) • Pentium II (1997) – Up to 33MHz – MMX (multimedia) instruction set • Intel486 (1989) – Up to 450MHz – instruction pipelining • Pentium III (1999) – Integrated FPU – SIMD (streaming extensions) instructions (SSE) – 8K cache – Up to 1+GHz •Pentium (()1993) • Pentium 4 (2000) – Superscalar (two parallel pipelines) – NetBurst micro-architecture, tuned for multimedia – 3.8+GHz • Penti um D (2005, Dua l core) 5 6 IA32 Processors • Totally Dominate Computer Market • EliEvolutionary DiDesign – Starting in 1978 with 8086 – Added more features as time goes on IA-32 Architecture – Still support old features, although obsolete • Complex Instruction Set Computer (CISC) –Manyyy different instructions with many different formats • But, only small subset encountered with Linux programs – Hard to match performance of Reduced Instruction Set Computers (RISC) – But, Inte l has done just th!hat! IA-32 architecture Modes of operation • Lots of architecture improvements, pipelining, •Protected mode superscalar, branch prediction, hyperthreading –native mode (Wind ows, Linux ), fu ll fea tures, and multi-core. separate memory • From programmer’s poitint of view, IA-32 has not • Virtual-8086 mode changed substantially except the introduction •hybrid of Protected of a set of hihhigh-performance iittinstructions • each program has its own 8086 computer • Real-address mode – native MS-DOS • System management mode – ppg,yy,gower management, system security, diagnostics 9 10 Addressable memory General-purpose registers •Protected mode 32-bit General-Purpose Registers –4 GB EAX EBP – 32-bit address EBX ESP • Real-address and Virtual-8086 modes ECX ESI –1 MB space EDX EDI – 20-bit address 16-bit Segggment Registers EFLAGS CS ES SS FS EIP DS GS 11 12 Accessing parts of registers Index and base registers • Use 8-bit name, 16-bit name, or 32-bit name • Some registers have only a 16-bit name for • App lies to EAX, EBX, ECX, and EDX their lower half (no 8-bit aliases). The 16-bit 8 8 registers are usually used only in real-address AH AL 8 bits + 8 bits mode. AX 16 bits EAX 32 bits 13 14 Some specialized register uses (1 of 2) Some specialized register uses (2 of 2) • General-Purpose •Segment – EAX – accumula tor (au toma tica lly used by div is ion –CS – code segment and multiplication) –DS – data segment – ECX – loop counter – SS – stack segment –ESP – stack pointer (should never be used for arithmetic or data transfer) – ES, FS, GS - additional segments – ESI, EDI – index registers (used for high-speed • EIP – instruction pointer memory transfer instructions) • EFLAGS – EBP – extddtended frame poitinter (tk)(stack) – status and control flags – each flag is a single binary bit (set or clear) • Some other system registers such as IDTR, GDTR, LDTR etc. 15 16 Status flags Floating-point, MMX, XMM registers • Carry • Eight 80-bit floating-point data ST(0) – unsigned arithmetic out of range registers ST(1) • Overflow – ST(0), ST(1), . , ST(7) ST(2) – signed arithmetic out of range ST(3) – arranged in a stack •Sign ST(4) – used for all floating-point – result is negative ST(5) arithmetic •Zero ST(6) – result is zero •Eight 64-bit MMX registers ST(7) • Auxiliary Carry • Eight 128-bit XMM registers for – carry from bit 3 to bit 4 single-instruction multiple-data (SIMD) operations •Parity – sum of 1 bits is an even number 17 18 Programmer’s model Programmer’s model 19 20 Real-address mode • 1 MB RAM maximum addressable (20-bit address) • Application programs can access any area of memory IA-32 Memory Management • Single tasking • Supported by MS-DOS operating system 22 Segmented memory Calculating linear addresses Segmented memory addressing: absolute (linear) address • Given a segment address, multiply it by 16 (add is a combination of a 16-bit segment value added to a 16- a hexadecimal zero), and add it to the offset bit offset • Example: convert 08F1:0100 to a linear address F0000 E0000 8000:FFFF D0000 Adjusted Segment value: 0 8 F 1 0 C0000 B0000 Add the offset: 0 1 0 0 A0000 one segment 90000 80000 (64K) Linear address: 0 9 0 1 0 70000 60000 8000:0250 50000 • A typical program has three segments: code, 40000 0250 30000 8000:0000 data and stack. Segment registers CS, DS and SS 20000 10000 are used to store them separately. seg ofs 00000 23 24 Example Protected mode (1 of 2) • 4 GB addressable RAM (32-bit address) What linear address corresponds to the segment/offset address 028F:0030? – (00000000 to FFFFFFFFh) • Each program assigned a memory partition whic h is protected from other programs 028F0 + 0030 = 02920 • Designed for multitasking • Supported by Linux & MS-Windows Always use hexadecimal notation for addresses. 25 26 Protected mode (2 of 2) Flat segmentation model • Segment descriptor tables • All segments are mapped to the entire 32-bit physical address space, at least two, one for data and one for • Program structure code – code, data, and stack areas • ggp()lobal descriptor table (GDT) – CS, DS, SS segment descriptors – global descriptor table (GDT) • MASM Programs use the Microsoft flat memory model 27 28 Multi-segment model Translating Addresses • Each program has a local descriptor table (LDT) • The IA-32 processor uses a one- or two-step – holds descriptor for each segment used by the program process to convert a variable' s logical address RAM into a unique memory location. •The first step combines a segment value wihith a variable’s offset to create a linear address. Local Descriptor Table • The second optional step, called page translation, converts a linear address to a 26000 base limit access physical address. 00026000 0010 00008000 000A 00003000 0002 8000 multiplied by 1000h 3000 29 Converting Logical to Linear Address Indexing into a Descriptor Table The segment Logical address Each segment descriptor indexes into the program's local selector points to a Selector Offset descriptor table (LDT). Each table entry is mapped to a segment descriptor, linear address: Descriptor table Linear address space which contains the base address of a (unused) memory segment. Segment Descriptor + Logical addresses The 32-bit offset LlDiLocal Descriptor TblTable DRAM from the logical SS ESP 0018 0000003A address is added to the segment’s base DS offset (index) 18 001A0000 address, generating GDTR/LDTR 0010 000001B6 10 0002A000 Linear address a 32-bit linear 08 0001A000 (contains base address of IP address. descriptor table) 00 00003000 0008 00002CD3 LDTR register Paging (1 of 2) Paging (2 of 2) • Virtual memory uses disk as part of the memory, • OS maintains page directory and page tables thus allowing sum of all programs can be larger • Page translation: CPU converts the linear than physical memory address into a physical address • On ly part of a program must be kep t in • Page fault: occurs when a needed page is not memory, while the remaining parts are kept on in memory, and the CPU interrup ts the dis k. program • The memory used by the program is divided • Virtual memory manager (VMM) – OS utility into small units called pages (4096-byte). that manages the loading and unloading of •As the ppgrogram runs, the processor selectivel y pages unloads inactive pages from memory and loads • OS copies the page into memory, program other ppgages that are immediatel y re quired. resumes execution Page Translation A linear address is Linear Address 10 10 12 divided into a page Directory Table Offset directory field, page Page Frame table field, and page Page Directory frame offset. The Page Table Physical Address CPU uses all three to calculate the Page-Table Entry ppyhysical address. Directory Entry CR3 32.
Recommended publications
  • The Von Neumann Computer Model 5/30/17, 10:03 PM
    The von Neumann Computer Model 5/30/17, 10:03 PM CIS-77 Home http://www.c-jump.com/CIS77/CIS77syllabus.htm The von Neumann Computer Model 1. The von Neumann Computer Model 2. Components of the Von Neumann Model 3. Communication Between Memory and Processing Unit 4. CPU data-path 5. Memory Operations 6. Understanding the MAR and the MDR 7. Understanding the MAR and the MDR, Cont. 8. ALU, the Processing Unit 9. ALU and the Word Length 10. Control Unit 11. Control Unit, Cont. 12. Input/Output 13. Input/Output Ports 14. Input/Output Address Space 15. Console Input/Output in Protected Memory Mode 16. Instruction Processing 17. Instruction Components 18. Why Learn Intel x86 ISA ? 19. Design of the x86 CPU Instruction Set 20. CPU Instruction Set 21. History of IBM PC 22. Early x86 Processor Family 23. 8086 and 8088 CPU 24. 80186 CPU 25. 80286 CPU 26. 80386 CPU 27. 80386 CPU, Cont. 28. 80486 CPU 29. Pentium (Intel 80586) 30. Pentium Pro 31. Pentium II 32. Itanium processor 1. The von Neumann Computer Model Von Neumann computer systems contain three main building blocks: The following block diagram shows major relationship between CPU components: the central processing unit (CPU), memory, and input/output devices (I/O). These three components are connected together using the system bus. The most prominent items within the CPU are the registers: they can be manipulated directly by a computer program. http://www.c-jump.com/CIS77/CPU/VonNeumann/lecture.html Page 1 of 15 IPR2017-01532 FanDuel, et al.
    [Show full text]
  • The Intel Microprocessors: Architecture, Programming and Interfacing Introduction to the Microprocessor and Computer
    Microprocessors (0630371) Fall 2010/2011 – Lecture Notes # 1 The Intel Microprocessors: Architecture, Programming and Interfacing Introduction to the Microprocessor and computer Outline of the Lecture Evolution of programming languages. Microcomputer Architecture. Instruction Execution Cycle. Evolution of programming languages: Machine language - the programmer had to remember the machine codes for various operations, and had to remember the locations of the data in the main memory like: 0101 0011 0111… Assembly Language - an instruction is an easy –to- remember form called a mnemonic code . Example: Assembly Language Machine Language Load 100100 ADD 100101 SUB 100011 We need a program called an assembler that translates the assembly language instructions into machine language. High-level languages Fortran, Cobol, Pascal, C++, C# and java. We need a compiler to translate instructions written in high-level languages into machine code. Microprocessor-based system (Micro computer) Architecture Data Bus, I/O bus Memory Storage I/O I/O Registers Unit Device Device Central Processing Unit #1 #2 (CPU ) ALU CU Clock Control Unit Address Bus The figure shows the main components of a microprocessor-based system: CPU- Central Processing Unit , where calculations and logic operations are done. CPU contains registers , a high-frequency clock , a control unit ( CU ) and an arithmetic logic unit ( ALU ). o Clock : synchronizes the internal operations of the CPU with other system components using clock pulsing at a constant rate (the basic unit of time for machine instructions is a machine cycle or clock cycle) One cycle A machine instruction requires at least one clock cycle some instruction require 50 clocks. o Control Unit (CU) - generate the needed control signals to coordinate the sequencing of steps involved in executing machine instructions: (fetches data and instructions and decodes addresses for the ALU).
    [Show full text]
  • Lecture Notes
    Lecture #4-5: Computer Hardware (Overview and CPUs) CS106E Spring 2018, Young In these lectures, we begin our three-lecture exploration of Computer Hardware. We start by looking at the different types of computer components and how they interact during basic computer operations. Next, we focus specifically on the CPU (Central Processing Unit). We take a look at the Machine Language of the CPU and discover it’s really quite primitive. We explore how Compilers and Interpreters allow us to go from the High-Level Languages we are used to programming to the Low-Level machine language actually used by the CPU. Most modern CPUs are multicore. We take a look at when multicore provides big advantages and when it doesn’t. We also take a short look at Graphics Processing Units (GPUs) and what they might be used for. We end by taking a look at Reduced Instruction Set Computing (RISC) and Complex Instruction Set Computing (CISC). Stanford President John Hennessy won the Turing Award (Computer Science’s equivalent of the Nobel Prize) for his work on RISC computing. Hardware and Software: Hardware refers to the physical components of a computer. Software refers to the programs or instructions that run on the physical computer. - We can entirely change the software on a computer, without changing the hardware and it will transform how the computer works. I can take an Apple MacBook for example, remove the Apple Software and install Microsoft Windows, and I now have a Window’s computer. - In the next two lectures we will focus entirely on Hardware.
    [Show full text]
  • Unit 8 : Microprocessor Architecture
    Unit 8 : Microprocessor Architecture Lesson 1 : Microcomputer Structure 1.1. Learning Objectives On completion of this lesson you will be able to : ♦ draw the block diagram of a simple computer ♦ understand the function of different units of a microcomputer ♦ learn the basic operation of microcomputer bus system. 1.2. Digital Computer A digital computer is a multipurpose, programmable machine that reads A digital computer is a binary instructions from its memory, accepts binary data as input and multipurpose, programmable processes data according to those instructions, and provides results as machine. output. 1.3. Basic Computer System Organization Every computer contains five essential parts or units. They are Basic computer system organization. i. the arithmetic logic unit (ALU) ii. the control unit iii. the memory unit iv. the input unit v. the output unit. 1.3.1. The Arithmetic and Logic Unit (ALU) The arithmetic and logic unit (ALU) is that part of the computer that The arithmetic and logic actually performs arithmetic and logical operations on data. All other unit (ALU) is that part of elements of the computer system - control unit, register, memory, I/O - the computer that actually are there mainly to bring data into the ALU to process and then to take performs arithmetic and the results back out. logical operations on data. An arithmetic and logic unit and, indeed, all electronic components in the computer are based on the use of simple digital logic devices that can store binary digits and perform simple Boolean logic operations. Data are presented to the ALU in registers. These registers are temporary storage locations within the CPU that are connected by signal paths of the ALU.
    [Show full text]
  • HOW FAST? the Current Intel® Core™ Processor Has 43,000,000% More Transistors Than the 4004 Processor
    40yrs of Intel® microprocessor innovation Following Moore’s Law the whole way Intel co-founder Gordon Moore once made a famous prediction that transistor The world’s first microprocessor count for computer chips would —the Intel® 4004—was “born” in 1971, double every two years. 10 years before the first PC came along. Using Moore’s Law as a guiding principle, Intel has provided ever-increasing functionality, performance and energy efficiency to its products. Just think: What if the world had followed this golden rule the last 40 years? HOW FAST? The current Intel® Core™ processor has 43,000,000% more transistors than the 4004 processor. If a village with a 1971 population of 100 had grown as quickly, it would now be by far the largest city in the world. War and Peace? Wait a second. The 4004 processor executed 92,000 instructions per second, while today’s Intel® Core™ i7 processor can run 92 billion. If your typing had accelerated at that rate, you’d be able to type Tolstoy’s classic in just over 1 second. 0101010101010101… You would need 25,000 years to turn a light switch on and off 1.5 trillion times, but today’s processors can do that in less than a second. A PENNY SAVED… When released in 1981, the first well- equipped IBM PC cost about $11,250 in inflation-adjusted 2011 dollars. Today, much more powerful PCs are available in the $500 range (or even less). Fly me to the moon If space travel had come down in price as much as transistors have since 1971, the Apollo 11 mission, which cost around $355 million in 1969, would cost about as much as a latte.
    [Show full text]
  • Trends in Processor Architecture
    A. González Trends in Processor Architecture Trends in Processor Architecture Antonio González Universitat Politècnica de Catalunya, Barcelona, Spain 1. Past Trends Processors have undergone a tremendous evolution throughout their history. A key milestone in this evolution was the introduction of the microprocessor, term that refers to a processor that is implemented in a single chip. The first microprocessor was introduced by Intel under the name of Intel 4004 in 1971. It contained about 2,300 transistors, was clocked at 740 KHz and delivered 92,000 instructions per second while dissipating around 0.5 watts. Since then, practically every year we have witnessed the launch of a new microprocessor, delivering significant performance improvements over previous ones. Some studies have estimated this growth to be exponential, in the order of about 50% per year, which results in a cumulative growth of over three orders of magnitude in a time span of two decades [12]. These improvements have been fueled by advances in the manufacturing process and innovations in processor architecture. According to several studies [4][6], both aspects contributed in a similar amount to the global gains. The manufacturing process technology has tried to follow the scaling recipe laid down by Robert N. Dennard in the early 1970s [7]. The basics of this technology scaling consists of reducing transistor dimensions by a factor of 30% every generation (typically 2 years) while keeping electric fields constant. The 30% scaling in the dimensions results in doubling the transistor density (doubling transistor density every two years was predicted in 1975 by Gordon Moore and is normally referred to as Moore’s Law [21][22]).
    [Show full text]
  • What Is a Microprocessor?
    Behavior Research Methods & Instrumentation 1978, Vol. 10 (2),238-240 SESSION VII MICROPROCESSORS IN PSYCHOLOGY: A SYMPOSIUM MISRA PAVEL, New York University, Presider What is a microprocessor? MISRA PAVEL New York University, New York, New York 10003 A general introduction to microcomputer terminology and concepts is provided. The purpose of this introduction is to provide an overview of this session and to introduce the termi­ MICROPROCESSOR SYSTEM nology and some of the concepts that will be discussed in greater detail by the rest of the session papers. PERIPNERILS EIPERI MENT II A block diagram of a typical small computer system USS PRINIER KErBOARD INIERfICE is shown in Figure 1. Four distinct blocks can be STDRIGE distinguished: (1) the central processing unit (CPU); (2) external memory; (3) peripherals-mass storage, CONTROL standard input/output, and man-machine interface; 0111 BUS (4) special purpose (experimental) interface. IODiISS The different functional units in the system shown here are connected, more or less in parallel, by a number CENTKll ElIEBUL of lines commonly referred to as a bus. The bus mediates PROCESSING ME MOil transfer of information among the units by carrying UN IT address information, data, and control signals. For example, when the CPU transfers data to memory, it activates appropriate control lines, asserts the desired Figure 1. Block diagram of a smaIl computer system. destination memory address, and then outputs data on the bus. Traditionally, the entire system was built from a CU multitude of relatively simple components mounted on interconnected printed circuit cards. With the advances of integrated circuit technology, large amounts of circuitry were squeezed into a single chip.
    [Show full text]
  • Chapter 1: Microprocessor Architecture
    Chapter 1: Microprocessor architecture ECE 3120 – Fall 2013 Dr. Mohamed Mahmoud http://iweb.tntech.edu/mmahmoud/ [email protected] Outline 1.1 Computer hardware organization 1.1.1 Number System 1.1.2 Computer hardware organization 1.2 The processor 1.3 Memory system operation 1.4 Program Execution 1.5 HCS12 Microcontroller 1.1.1 Number System - Computer hardware uses binary numbers to perform all operations. - Human beings are used to decimal number system. Conversion is often needed to convert numbers between the internal (binary) and external (decimal) representations. - Octal and hexadecimal numbers have shorter representations than the binary system. - The binary number system has two digits 0 and 1 - The octal number system uses eight digits 0 and 7 - The hexadecimal number system uses 16 digits: 0, 1, .., 9, A, B, C,.., F 1 - 1 - A prefix is used to indicate the base of a number. - Convert %01000101 to Hexadecimal = $45 because 0100 = 4 and 0101 = 5 - Computer needs to deal with signed and unsigned numbers - Two’s complement method is used to represent negative numbers - A number with its most significant bit set to 1 is negative, otherwise it is positive. 1 - 2 1- Unsigned number %1111 = 1 + 2 + 4 + 8 = 15 %0111 = 1 + 2 + 4 = 7 Unsigned N-bit number can have numbers from 0 to 2N-1 2- Signed number %1111 is a negative number. To convert to decimal, calculate the two’s complement The two’s complement = one’s complement +1 = %0000 + 1 =%0001 = 1 then %1111 = -1 %0111 is a positive number = 1 + 2 + 4 = 7.
    [Show full text]
  • Theoretical Peak FLOPS Per Instruction Set on Modern Intel Cpus
    Theoretical Peak FLOPS per instruction set on modern Intel CPUs Romain Dolbeau Bull – Center for Excellence in Parallel Programming Email: [email protected] Abstract—It used to be that evaluating the theoretical and potentially multiple threads per core. Vector of peak performance of a CPU in FLOPS (floating point varying sizes. And more sophisticated instructions. operations per seconds) was merely a matter of multiplying Equation2 describes a more realistic view, that we the frequency by the number of floating-point instructions will explain in details in the rest of the paper, first per cycles. Today however, CPUs have features such as vectorization, fused multiply-add, hyper-threading or in general in sectionII and then for the specific “turbo” mode. In this paper, we look into this theoretical cases of Intel CPUs: first a simple one from the peak for recent full-featured Intel CPUs., taking into Nehalem/Westmere era in section III and then the account not only the simple absolute peak, but also the full complexity of the Haswell family in sectionIV. relevant instruction sets and encoding and the frequency A complement to this paper titled “Theoretical Peak scaling behavior of current Intel CPUs. FLOPS per instruction set on less conventional Revision 1.41, 2016/10/04 08:49:16 Index Terms—FLOPS hardware” [1] covers other computing devices. flop 9 I. INTRODUCTION > operation> High performance computing thrives on fast com- > > putations and high memory bandwidth. But before > operations => any code or even benchmark is run, the very first × micro − architecture instruction number to evaluate a system is the theoretical peak > > - how many floating-point operations the system > can theoretically execute in a given time.
    [Show full text]
  • COSC 6385 Computer Architecture - Multi-Processors (IV) Simultaneous Multi-Threading and Multi-Core Processors Edgar Gabriel Spring 2011
    COSC 6385 Computer Architecture - Multi-Processors (IV) Simultaneous multi-threading and multi-core processors Edgar Gabriel Spring 2011 Edgar Gabriel Moore’s Law • Long-term trend on the number of transistor per integrated circuit • Number of transistors double every ~18 month Source: http://en.wikipedia.org/wki/Images:Moores_law.svg COSC 6385 – Computer Architecture Edgar Gabriel 1 What do we do with that many transistors? • Optimizing the execution of a single instruction stream through – Pipelining • Overlap the execution of multiple instructions • Example: all RISC architectures; Intel x86 underneath the hood – Out-of-order execution: • Allow instructions to overtake each other in accordance with code dependencies (RAW, WAW, WAR) • Example: all commercial processors (Intel, AMD, IBM, SUN) – Branch prediction and speculative execution: • Reduce the number of stall cycles due to unresolved branches • Example: (nearly) all commercial processors COSC 6385 – Computer Architecture Edgar Gabriel What do we do with that many transistors? (II) – Multi-issue processors: • Allow multiple instructions to start execution per clock cycle • Superscalar (Intel x86, AMD, …) vs. VLIW architectures – VLIW/EPIC architectures: • Allow compilers to indicate independent instructions per issue packet • Example: Intel Itanium series – Vector units: • Allow for the efficient expression and execution of vector operations • Example: SSE, SSE2, SSE3, SSE4 instructions COSC 6385 – Computer Architecture Edgar Gabriel 2 Limitations of optimizing a single instruction
    [Show full text]
  • Reverse Engineering X86 Processor Microcode
    Reverse Engineering x86 Processor Microcode Philipp Koppe, Benjamin Kollenda, Marc Fyrbiak, Christian Kison, Robert Gawlik, Christof Paar, and Thorsten Holz, Ruhr-University Bochum https://www.usenix.org/conference/usenixsecurity17/technical-sessions/presentation/koppe This paper is included in the Proceedings of the 26th USENIX Security Symposium August 16–18, 2017 • Vancouver, BC, Canada ISBN 978-1-931971-40-9 Open access to the Proceedings of the 26th USENIX Security Symposium is sponsored by USENIX Reverse Engineering x86 Processor Microcode Philipp Koppe, Benjamin Kollenda, Marc Fyrbiak, Christian Kison, Robert Gawlik, Christof Paar, and Thorsten Holz Ruhr-Universitat¨ Bochum Abstract hardware modifications [48]. Dedicated hardware units to counter bugs are imperfect [36, 49] and involve non- Microcode is an abstraction layer on top of the phys- negligible hardware costs [8]. The infamous Pentium fdiv ical components of a CPU and present in most general- bug [62] illustrated a clear economic need for field up- purpose CPUs today. In addition to facilitate complex and dates after deployment in order to turn off defective parts vast instruction sets, it also provides an update mechanism and patch erroneous behavior. Note that the implementa- that allows CPUs to be patched in-place without requiring tion of a modern processor involves millions of lines of any special hardware. While it is well-known that CPUs HDL code [55] and verification of functional correctness are regularly updated with this mechanism, very little is for such processors is still an unsolved problem [4, 29]. known about its inner workings given that microcode and the update mechanism are proprietary and have not been Since the 1970s, x86 processor manufacturers have throughly analyzed yet.
    [Show full text]
  • Assembly Language: IA-X86
    Assembly Language for x86 Processors X86 Processor Architecture CS 271 Computer Architecture Purdue University Fort Wayne 1 Outline Basic IA Computer Organization IA-32 Registers Instruction Execution Cycle Basic IA Computer Organization Since the 1940's, the Von Neumann computers contains three key components: Processor, called also the CPU (Central Processing Unit) Memory and Storage Devices I/O Devices Interconnected with one or more buses Data Bus Address Bus data bus Control Bus registers Processor I/O I/O IA: Intel Architecture Memory Device Device (CPU) #1 #2 32-bit (or i386) ALU CU clock control bus address bus Processor The processor consists of Datapath ALU Registers Control unit ALU (Arithmetic logic unit) Performs arithmetic and logic operations Control unit (CU) Generates the control signals required to execute instructions Memory Address Space Address Space is the set of memory locations (bytes) that are addressable Next ... Basic Computer Organization IA-32 Registers Instruction Execution Cycle Registers Registers are high speed memory inside the CPU Eight 32-bit general-purpose registers Six 16-bit segment registers Processor Status Flags (EFLAGS) and Instruction Pointer (EIP) 32-bit General-Purpose Registers EAX EBP EBX ESP ECX ESI EDX EDI 16-bit Segment Registers EFLAGS CS ES SS FS EIP DS GS General-Purpose Registers Used primarily for arithmetic and data movement mov eax 10 ;move constant integer 10 into register eax Specialized uses of Registers eax – Accumulator register Automatically
    [Show full text]