<<

Outline IA-32 Architecture   IA-32 Registers Computer Organization  Instruction Execution Cycle &  IA-32 Assembly Language Programming

Dr Adnan Gutub aagutub ‘’ uqu.edu.sa

[Adapted from slides of Dr. Kip Irvine: Assembly Language for Intel-Based Computers] Most Slides contents have been arranged by Dr Muhamed Mudawar & Dr Aiman El-Maleh from Computer Engineering Dept. at KFUPM 45/٢ IA-32 Architecture Computer Organization and Assembly Language slide

Intel Microprocessors and 80386 Processors  Intel introduced the 8086 in 1979  80286 was introduced in 1982  8086, 8087, 8088, and 80186 processors  24-bit address ⇒ 224 bytes = 16 MB address space   16-bit processors with 16-bit registers Introduced   16-bit data bus and 20-bit address bus Segmentation in protected mode is different from the  space = 220 bytes = 1 MB  80386 was introduced in 1985  8087 Floating-Point co-processor  First 32-bit processor with 32-bit general-purpose registers  Uses segmentation and real-address mode to address memory  First processor to define the IA-32 architecture  Each segment can address 216 bytes = 64 KB  32-bit data bus and 32-bit address bus  8088 is a less expensive version of 8086  232 bytes ⇒ 4 GB address space  Uses an 8-bit data bus  Introduced paging , , and the  80186 is a faster version of 8086  Segmentation can be turned off

45/٤ 45 IA-32 Architecture Computer Organization and Assembly Language slide/٣ IA-32 Architecture Computer Organization and Assembly Language slide

Intel 80486 and Processors Intel Processor Family  80486 was introduced 1989  P6 Processor Family: , Pentium II and III  Improved version of Intel 80386  Pentium Pro was introduced in 1995  On-chip Floating-Point unit (DX versions)  Three-way superscalar : can execute 3 instructions per clock cycle  On-chip unified Instruction/Data Cache (8 KB)  36-bit address bus ⇒ up to 64 GB of physical address space  Uses Pipelining : can execute up to 1 instruction per clock cycle  Introduced dynamic execution  Out-of-order and speculative execution  Pentium (80586) was introduced in 1993  Integrates a 256 KB second level L2 cache on-chip  Wider 64-bit data bus, but address bus is still 32 bits  Pentium II was introduced in 1997  Two execution pipelines: U-pipe and V-pipe  Added MMX instructions (already introduced on Pentium MMX)  Superscalar performance: can execute 2 instructions per clock cycle   Separate 8 KB instruction and 8 KB data caches Pentium III was introduced in 1999  Added SSE instructions and eight new 128-bit XMM registers  MMX instructions (later models) for multimedia applications

45/٦ 45 IA-32 Architecture Computer Organization and Assembly Language slide/٥ IA-32 Architecture Computer Organization and Assembly Language slide

١ and Family Pentium-M and EM64T  Pentium 4 is a seventh-generation architecture  ( Mobile ) was introduced in 2003  Introduced in 2000  Designed for low-power laptop computers  New micro-architecture design called Intel Netburst  Modified version of Pentium III, optimized for power efficiency  Very deep instruction , scaling to very high frequencies  Large second-level cache (2 MB on later models)   Introduced the SSE2 instruction set (extension to SSE) Runs at lower clock than Pentium 4, but with better performance  Tuned for multimedia and operating on the 128-bit XMM registers  64-bit Technology (EM64T)   In 2002, Intel introduced Hyper-Threading technology Introduced in 2004  64-bit superset of the IA-32 processor architecture  Allowed 2 programs to run simultaneously, sharing resources  64-bit general-purpose registers and integer support  Xeon is Intel's name for its server-class microprocessors  Number of general-purpose registers increased from 8 to 16  Xeon chips generally have more cache  64-bit pointers and flat  Support larger multiprocessor configurations  Large physical address space: up to 240 = 1 Terabytes

45/٨ 45 IA-32 Architecture Computer Organization and Assembly Language slide/٧ IA-32 Architecture Computer Organization and Assembly Language slide

Intel History MicroArchitecture  64-bit cores  Wide dynamic execution (execute four instructions simultaneously)  Intelligent power capability (power gating)  Advanced smart cache (shares L2 cache between cores)  Smart memory access (memory disambiguation)  Advanced digital media boost See the demo at http://www.intel.com/technology/architecture/coremicro/d emo/demo.htm?iid=tech_core+demo

45/ ١٠ 45 IA-32 Architecture Computer Organization and Assembly Language slide/٩ IA-32 Architecture Computer Organization and Assembly Language slide

CISC and RISC Next ...  CISC – Complex Instruction Set Computer  Intel Microprocessors  Large and complex instruction set  IA-32 Registers  Variable width instructions  Instruction Execution Cycle  Requires interpreter   Each instruction is decoded into a sequence of micro-operations IA-32 Memory Management  Example: Intel x86 family  RISC – Reduced Instruction Set Computer  Small and simple instruction set  All instructions have the same width  Simpler instruction formats and addressing modes  Decoded and executed directly by hardware  Examples: ARM, MIPS, PowerPC, SPARC, etc.

45/ ١٢ 45 IA-32 Architecture Computer Organization and Assembly Language slide/ ١١ IA-32 Architecture Computer Organization and Assembly Language slide

٢ Basic Program Execution Registers General-Purpose Registers  Registers are high speed memory inside the CPU  Used primarily for arithmetic and data movement  Eight 32-bit general-purpose registers  mov eax, 10 move constant 10 into register eax  Six 16-bit segment registers  Specialized uses of Registers  Processor Status Flags (EFLAGS) and Instruction Pointer (EIP)  EAX – Accumulator register  Automatically used by multiplication and division instructions 32-bit General-Purpose Registers  ECX – Counter register EAX EBP  EBX ESP Automatically used by LOOP instructions ECX ESI  ESP – Stack Pointer register EDX EDI  Used by PUSH and POP instructions, points to top of stack 16-bit Segment Registers  ESI and EDI – Source Index and Destination Index register EFLAGS CS ES  Used by string instructions SS FS  EBP – Base Pointer register EIP DS GS  Used to reference parameters and local variables on the stack

45/ ١٤ 45 IA-32 Architecture Computer Organization and Assembly Language slide/ ١٣ IA-32 Architecture Computer Organization and Assembly Language slide

Accessing Parts of Registers Accessing Parts of Registers  EAX, EBX, ECX, and EDX are 32-bit Extended registers  Programmers can access their 16-bit and 8-bit parts  Lower 16-bit of EAX is named AX  AX is further divided into 8 8 AH AL 8 bits + 8 bits  AL = lower 8 bits

 AH = upper 8 bits AX 16 bits  ESI, EDI, EBP, ESP have only EAX 16-bit names for lower half 32 bits

45/ ١٦ 45 IA-32 Architecture Computer Organization and Assembly Language slide/ ١٥ IA-32 Architecture Computer Organization and Assembly Language slide

Special-Purpose & Segment Registers EFLAGS Register  EIP = Extended Instruction Pointer  Contains address of next instruction to be executed  EFLAGS = Extended Flags Register  Contains status and control flags  Each flag is a single binary bit  Six 16-bit Segment Registers  Support segmented memory  Six segments accessible at a time  Status Flags  Segments contain distinct contents  Status of arithmetic and logical operations  Code  Control and System flags  Data  Control the CPU operation  Stack  Programs can set and clear individual bits in the EFLAGS register

45/ ١٨ 45 IA-32 Architecture Computer Organization and Assembly Language slide/ ١٧ IA-32 Architecture Computer Organization and Assembly Language slide

٣ Status Flags Floating-Point, MMX, XMM Registers  Carry Flag  Floating-point unit performs high speed FP operations  Set when unsigned arithmetic result is out of range  Eight 80-bit floating-point data registers  Overflow Flag   Set when signed arithmetic result is out of range ST(0), ST(1), . . . , ST(7) ST(0)  Sign Flag  Arranged as a stack ST(1)  ST(2) Copy of sign bit , set when result is negative  Used for floating-point arithmetic  Zero Flag ST(3)   Set when result is zero Eight 64-bit MMX registers ST(4)  Auxiliary Carry Flag  Used with MMX instructions ST(5)  ST(6) Set when there is a carry from bit 3 to bit 4  Eight 128-bit XMM registers  Parity Flag ST(7)  Used with SSE instructions  Set when parity is even  Least-significant byte in result contains even number of 1s

45/ ٢٠ 45 IA-32 Architecture Computer Organization and Assembly Language slide/ ١٩ IA-32 Architecture Computer Organization and Assembly Language slide

Registers in Intel Core Microarchitecture Next ...  Intel Microprocessors  IA-32 Registers  Instruction Execution Cycle  IA-32 Memory Management

45/ ٢٢ 45 IA-32 Architecture Computer Organization and Assembly Language slide/ ٢١ IA-32 Architecture Computer Organization and Assembly Language slide

Fetch-Execute Cycle Instruction Execute Cycle  Each machine language instruction is first fetched from the memory and stored in an Instruction Register (IR). Instruction Obtain instruction from program storage  The address of the instruction to be fetched is stored in a Fetch register called Program Counter or simply PC . In some Instruction Determine required actions and instruction size computers this register is called the Instruction Pointer Decode or IP . Operand Locate and obtain operand data  After the instruction is fetched, the PC (or IP ) is Fetch Flash Movie

incremented to point to the address of the next Infinite Cycle Execute Compute result value and status instruction. Writeback  The fetched instruction is decoded (to determine what Deposit results in storage for later use needs to be done) and executed by the CPU. Result

Flash Movie

45/ ٢٤ 45 IA-32 Architecture Computer Organization and Assembly Language slide/ ٢٣ IA-32 Architecture Computer Organization and Assembly Language slide

٤ Instruction Execution Cycle – cont'd Pipelined Execution  PC program Instruction execution can be divided into stages  Instruction Fetch I1 I2 I3 I4 . . . fetch  Pipelining makes it possible to start an instruction before memory read  Instruction Decode op1 op2 completing the execution of previous one  registers registers Operand Fetch instruction I1 register Stages For k stages and n instructions, the  Execute decode S1 S2 S3 S4 S5 S6 number of required cycles is: k + n – 1  Result Writeback 1 I-1 write write 2 I-1 flags ALU 3 I-1 Stages execute 4 I-1 S1 S2 S3 S4 S5 S6 (output) 5 I-1 1 I-1 6 I-1 2 I-2 I-1 7 I-2 Cycles 3 I-2 I-1 8 I-2 9 I-2 4 I-2 I-1

10 I-2 Cycles 5 Pipelined I-2 I-1 11 I-2 6 Execution I-2 I-1 12 I-2 7 I-2

45/ ٢٦ 45 IA-32 Architecture Computer Organization and Assembly Language slide/ ٢٥ IA-32 Architecture Computer Organization and Assembly Language slide

Wasted Cycles (pipelined) Superscalar Architecture   When one of the stages requires two or more clock A superscalar processor has multiple execution pipelines cycles to complete, clock cycles are again wasted  The Pentium processor has two execution pipelines  Called U and V pipes  Assume that stage S4 is the Stages  execute stage exe In the following, stage Stages S1 S2 S3 S4 S5 S6 S4 has 2 pipelines S4  Assume also that S4 requires 1 I-1 S1 S2 S3 u v S5 S6 2 clock cycles to complete 2 I-2 I-1  Each pipeline still 3 I-3 I-2 I-1 1 I-1 requires 2 cycles 2 I-2 I-1  As more instructions enter the 4 I-3 I-2 I-1 5 I-3 I-1 3 I-3 I-2 I-1 pipeline, wasted cycles occur  Second pipeline 4 I-4 I-3 I-2 I-1 Cycles 6 I-2 I-1 5 I-4 I-3 I-1 I-2 7 I-2 I-1 eliminates wasted cycles

 Cycles 6 I-4 I-3 I-2 I-1 For k stages, where one 8 I-3 I-2 7 I-3 I-4 I-2 I-1 stage requires 2 cycles, n 9 I-3 I-2  For k stages and n 8 I-4 I-3 I-2 10 I-3 instructions require k + 2n – 1 instructions, number of 9 I-4 I-3 11 I-3 cycles cycles = k + n 10 I-4

45/ ٢٨ 45 IA-32 Architecture Computer Organization and Assembly Language slide/ ٢٧ IA-32 Architecture Computer Organization and Assembly Language slide

Next ... Modes of Operation  Intel Microprocessors  Real-Address mode (original mode provided by 8086)  IA-32 Registers  Only 1 MB of memory can be addressed, from 0 to FFFFF (hex)  Programs can access any part of main memory  Instruction Execution Cycle  MS-DOS runs in real-address mode  IA-32 Memory Management  Protected mode  Each program can address a maximum of 4 GB of memory  The assigns memory to each running program  Programs are prevented from accessing each other’s memory  Native mode used by Windows NT, 2000, XP, and   Processor runs in protected mode, and creates a virtual 8086 machine with 1 MB of address space for each running program

45/ ٣٠ 45 IA-32 Architecture Computer Organization and Assembly Language slide/ ٢٩ IA-32 Architecture Computer Organization and Assembly Language slide

٥ Program Segments

 Memory segmentation is necessary since the 20-bits memory  Machine language programs usually have 3 different parts stored in addresses cannot fit in the 16-bits CPU registers different memory segments:  Since x86 registers are 16-bits wide, a memory segment is made of  Instructions : This is the code part and is stored in the code segment 216 consecutive words (i.e. 64K words)  Data : This is the data part which is manipulated by the code and is  Each segment has a number identifier that is also a 16-bit number stored in the data segment (i.e. we have segments numbered from 0 to 64K)  Stack : The stack is a special memory buffer organized as Last-In-First-  A memory location within a memory segment is referenced by Out (LIFO) structure used by the CPU to implement procedure calls specifying its offset from the start of the segment. Hence the first and as a temporary holding area for addresses and data. This data word in a segment has an offset of 0 while the last one has an offset structure is stored in the stack segment of FFFFh  The segment numbers for the code segment, the data segment, and  To reference a memory location its logical address has to be the stack segment are stored in the segment registers CS , DS , and specified. The logical address is written as: SS , respectively.  Segment number:offset  Program segments do not need to occupy the whole 64K locations  For example, A43F:3487h means offset 3487h within segment in a segment A43Fh.

45/ ٣٢ 45 IA-32 Architecture Computer Organization and Assembly Language slide/ ٣١ IA-32 Architecture Computer Organization and Assembly Language slide

Real Address Mode Logical to Linear Address Translation  A program can access up to six segments Linear address = Segment × 10 (hex) + Offset at any time Example:  Code segment  Stack segment segment = A1F0 (hex)  Data segment offset = 04C0 (hex)  Extra segments (up to 3) logical address = A1F0:04C0 (hex)  Each segment is 64 KB what is the linear address?  Logical address Solution:  Segment = 16 bits A1F00 (add 0 to segment in hex)  Offset = 16 bits + 04C0 (offset in hex)  Linear (physical) address = 20 bits A23C0 (20-bit linear address in hex)

45/ ٣٤ 45 IA-32 Architecture Computer Organization and Assembly Language slide/ ٣٣ IA-32 Architecture Computer Organization and Assembly Language slide

Segment Overlap Your turn . . .  There is a lot of overlapping What linear address corresponds to logical address between segments in the main 028F:0030? memory.  A new segment starts every Solution: 028F0 + 0030 = 02920 (hex) 10 h locations (i.e. every 16 locations). Always use hexadecimal notation for addresses  Starting address of a segment What logical address corresponds to the linear address always has a 0h LSD. 28F30h?  Due to segments overlapping logical addresses are not Many different segment:offset (logical) addresses can unique . produce the same linear address 28F30h. Examples: 28F3:0000, 28F2:0010, 28F0:0030, 28B0:0430, . . .

45/ ٣٦ 45 IA-32 Architecture Computer Organization and Assembly Language slide/ ٣٥ IA-32 Architecture Computer Organization and Assembly Language slide

٦ Flat Memory Model Programmer View of Flat Memory   Modern operating systems turn segmentation off Same base address for all segments Linear address space of  All segments are mapped to the same a program (up to 4 GB)  Each program uses one 32-bit linear address space linear address space 32-bit address  Up to 232 = 4 GB of memory can be addressed  ESI EIP Register EDI DATA  Segment registers are defined by the operating system  Points at next instruction 32-bit address  EIP All segments are mapped to the same linear address space  ESI and EDI Registers CODE  In assembly language, we use .MODEL flat directive  32-bit address Contain data addresses EBP STACK  To indicate the Flat memory model  Used also to index arrays ESP   ESP and EBP Registers CS A linear address is also called a virtual address DS Unused   Operating system maps virtual address onto physical addresses ESP points at top of stack SS ES  EBP is used to address parameters and  Using a technique called paging base address = 0 variables on the stack for all segments

45/ ٣٨ 45 IA-32 Architecture Computer Organization and Assembly Language slide/ ٣٧ IA-32 Architecture Computer Organization and Assembly Language slide

Protected Mode Architecture Logical to Linear Address Translation  Logical address consists of  16-bit segment selector (CS, SS, DS, ES, FS, GS)  32-bit offset (EIP, ESP, EBP, ESI ,EDI, EAX, EBX, ECX, EDX) Upper 13 bits of  Segment unit translates logical address to linear address segment selector GDTR, LDTR are used to index  Using a segment descriptor table the descriptor table  Linear address is 32 bits (called also a virtual address )  Paging unit translates linear address to physical address  Using a page directory and a page table

TI = Table Indicator Select the descriptor table 0 = 1 = Local Descriptor Table

45/ ٤٠ 45 IA-32 Architecture Computer Organization and Assembly Language slide/ ٣٩ IA-32 Architecture Computer Organization and Assembly Language slide

Segment Descriptor Tables Segment Descriptor Details  Global descriptor table (GDT)  Base Address  Only one GDT table is provided by the operating system  32-bit number that defines the starting location of the segment  GDT table contains segment descriptors for all programs  32-bit Base Address + 32-bit Offset = 32-bit Linear Address  Also used by the operating system itself  Segment Limit  Table is initialized during boot up  20-bit number that specifies the size of the segment  GDT table address is stored in the GDTR register  The size is specified either in bytes or multiple of 4 KB pages  Modern operating systems (Windows-XP) use one GDT table  Using 4 KB pages, segment size can range from 4 KB to 4 GB  Local descriptor table (LDT)  Access Rights  Another choice is to have a unique LDT table for each program  Whether the segment contains code or data  LDT table contains segment descriptors for only one program  Whether the data can be read-only or read & written  LDT table address is stored in the LDTR register  Privilege level of the segment to protect its access

45/ ٤٢ 45 IA-32 Architecture Computer Organization and Assembly Language slide/ ٤١ IA-32 Architecture Computer Organization and Assembly Language slide

٧ Segment Visible and Invisible Parts Paging  Visible part = 16-bit Segment Register  Paging divides the linear address space into …  CS, SS, DS, ES, FS, and GS are visible to the programmer  Fixed-sized blocks called pages , Intel IA-32 uses 4 KB pages  Invisible Part = Segment Descriptor (64 bits)  Operating system allocates main memory for pages  Automatically loaded from the descriptor table  Pages can be spread all over main memory  Pages in main memory can belong to different programs  If main memory is full then pages are stored on the hard disk  OS has a Virtual Memory Manager (VMM)  Uses page tables to map the pages of each running program  Manages the loading and unloading of pages  As a program is running, CPU does address translation  Page fault: issued by CPU when page is not in memory

45/ ٤٤ 45 IA-32 Architecture Computer Organization and Assembly Language slide/ ٤٣ IA-32 Architecture Computer Organization and Assembly Language slide

Paging – cont’d Main Memory The operating system uses Page m . . . Page n page tables to ...... map the pages in the linear 1 Program Page 2 Page 2 2 Program virtual address Page 1 Page 1 space onto space of space Page 0 Page 0 of space main virtual linear address virtual linear Hard Disk The operating Each running Pages that cannot system swaps program has fit in main memory pages between its own page are stored on the memory and the table hard disk hard disk

As a program is running, the processor translates the linear virtual addresses onto real memory (called also physical ) addresses

45/ ٤٥ IA-32 Architecture Computer Organization and Assembly Language slide

٨