Assembly Language
x86 Architectures; Assembly Language
Basics of Assembly language for the x86 and x86_64 architectures

Topics
• Preliminary material: a look at what Assembly Language works with
  - How processors work
    » A moment of history
  - Programming model
    » the registers
    » the memory

The Processor and The System Unit

Computer Organization: Programmer's View
• Fundamental components:
  - CPU
  - Memory
  - I/O ("everything else")
• Possible connection schemes:
  - Single Controller
  - Direct Memory Access (DMA)

The von Neumann Central Processing Unit
• a.k.a. CPU or Processor (or microprocessor)
• Control Unit interprets program instructions
• Arithmetic Logic Unit (ALU) performs numerical and logical operations
• Registers hold program data for fast access
• von Neumann model: a single connection to/from memory
  - the "von Neumann bottleneck"
[Figure: Control Unit, Arithmetic Logic Unit, and Registers, joined by a single path to/from memory]

John von Neumann's IAS Machine
• Built at the Institute for Advanced Study, Princeton, NJ
  - 1945 - 1951
• Program instructions and data in a single shared memory
  - ~5 Kilobytes

Data Flow
1. Instructions move from memory (through cache) into the Control Unit
2. Operands move between memory and data registers
3. Instruction execution sometimes causes data movement to/from memory
  - Computations carry operands from registers (or memory) to the ALU or FPU, then back to registers (or memory)
  - Input/Output operations move values between devices and registers, or between devices and memory locations

Modern CPU elements
• ALU, FPU, GPU
  - Do the computations
• General-Purpose Data Registers
  - Hold data for the ALU and FPU
• Specialized Registers
  - Control program flow
• Control Unit
  - Controls everything
• Internal buses
  - Connect things
• Cache
  - Alleviates the von Neumann bottleneck
  - Generally not visible to the programmer
[Figure: GPU, Arithmetic-Logic Units, Floating-Point Units, Data Registers, Specialized Registers, Control Unit, and Memory (a.k.a. Store, Storage, cache or RAM)]

Multi-core Processors
• Each "core" is a CPU
• One chip holds 2, 4, 8, ... cores
• Cores share memory
• Cores may have a local L1 cache, and may share L2 and L3 cache
• A CPU plus a GPU forms a heterogeneous multicore processor

[Figures: A 48-Core "System on a Chip" (SoC); SoC Building Blocks Based on ARMv8; The Falkor ARMv8 Processor]

The Fetch-Execute Cycle
• The Control Unit's pulse:
  1. Send the instruction address to Memory
  2. Start the read; increment the instruction address
  3. Decode the instruction
  4. Fetch operands
  5. Command the operation
  6. Retire the instruction
    » Write any results to registers/memory
  7. Repeat with the next instruction address

The Fetch-Execute Cycle, expanded
• phase 1: Fetch
• phase 2: Decode
• phase 3: Execute
• phase 4: Retire
• Next cycle: back to phase 1 (Fetch)

Pipelined Designs
• Separate circuits perform each stage of the fetch-execute cycle
• In a pipelined design the separate circuits work simultaneously on successive instructions: while one instruction executes, the next is being decoded and the one after that fetched

16 / 32 / 64-bit x86 Architectures
• Intel: 1st microprocessors
  - 1971: 4-bit 4004
  - 1972: 8-bit 8008 (1974: 8080)
  - 1978: 16-bit 8086 (1979: 8088)
    » origin of the x86 architecture
  - 1985: 32-bit 80386
    » basis of the IA-32 family
  - 1989: 80486 (renamed i486) (Family 4)
  - 1993: Pentium (P5 / Family 5)
  - 1995: Pentium Pro (P6 / Family 6)
  - 1999: Pentium III
  - 2000: Pentium 4 (Family 15)
  - 2005: 64-bit Pentium 4F, dual-core Pentium D
    » compatible with x86-64
  - 2006: Core Duo (Family 6)
  - 2007: Core 2 Quad
• AMD: developed x86-compatible processors
  - 1982: AMD contracted as second source for 8086 & 8088
  - 1991: Am386, clone of 80386
  - 1995: K5, Pentium competitor
  - 1997: K6
  - 1999: Athlon (K7)
    » Not socket-compatible with Intel processors
  - 2003: 64-bit Opteron, Athlon 64 (K8)
    » 1st x86-64 processors
    » backwards-compatible with IA-32
  - 2005: dual-core Athlon 64 X2
  - 2007: quad-core Opteron (K10)
  - 2010: Fusion "APU" (CPU/GPU on shared chip) demonstrated
• …etc.

[Figures: 8086 Registers (the direct ancestor); x86 (IA-32) Registers (32-bit words); x86_64 Registers]

Memory: the processor's primary storage
• RAM (Random Access Memory) is an array of data-storage locations.
  - Each location is called a word.
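The fetch-execute cycle, and the von Neumann idea of one memory holding both instructions and data as an array of locations, can be illustrated with a toy interpreter written in Python rather than x86 assembly. This is a minimal sketch under stated assumptions: the four-instruction set (LOAD, ADD, STORE, HALT), the single accumulator register, and the `run()` function are hypothetical and do not correspond to real x86 instructions. The step numbers in the comments follow the "Control Unit's pulse" list.

```python
# Toy von Neumann machine with a HYPOTHETICAL instruction set (not x86):
# instructions and data share one memory, a plain Python list.

def run(memory: list, pc: int = 0, max_steps: int = 1000) -> int:
    """Execute the program starting at address `pc`; return the accumulator at HALT."""
    acc = 0  # the one data register (an "accumulator")
    for _ in range(max_steps):
        instruction = memory[pc]        # 1-2. read the word at the instruction address...
        pc += 1                         #      ...and increment the instruction address
        opcode, address = instruction   # 3. decode into an opcode and an operand address
        if opcode == "HALT":
            return acc
        if opcode == "LOAD":
            acc = memory[address]       # 4-6. fetch operand; result retires into acc
        elif opcode == "ADD":
            acc += memory[address]      # 4-6. fetch operand; add; result retires into acc
        elif opcode == "STORE":
            memory[address] = acc       # 6. retire: write the result back to memory
        else:
            raise ValueError(f"unknown opcode {opcode!r} at address {pc - 1}")
        # 7. loop: repeat with the next instruction address
    raise RuntimeError("no HALT within max_steps")


memory = [
    ("LOAD", 4),   # address 0: acc = memory[4]
    ("ADD", 5),    # address 1: acc = acc + memory[5]
    ("STORE", 6),  # address 2: memory[6] = acc
    ("HALT", 0),   # address 3: stop
    2, 40, 0,      # addresses 4-6: data, in the same memory as the code
]
print(run(memory), memory[6])  # 42 42
```

Because code and data live in the same list, every instruction fetch competes with operand reads and writes for that one memory, which is the von Neumann bottleneck in miniature; and only the program's own logic keeps the instruction address from running into the data words.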
Memory Organization
• The memory subsystem is a 1-dimensional array of storage locations
  - Each location has a unique address (or "array index")
  - Each location is made up of m bits (m >= 1)
  - A location's contents are any one of 2^m bit patterns
• Memory can hold:
  - Programs: bit patterns represent instructions
  - Data: bit patterns represent numbers, characters, etc.
• Volatile contents: read/write memory, or RAM
  - DRAM is common for main memory
• Nonvolatile (contents retained without power): Read-Only Memory, or ROM
  - PROM is programmable once; EEPROM and Flash are erasable/reprogrammable ROM

[Figure: Overall Memory Usage, Linux]

How a program occupies memory
• When a program runs it occupies a block of RAM
• The block is divided into instruction, data, and stack segments, each in its own part of the memory

Dividing up the program memory
• Each function, including main(), has its own unchanging portion of the instruction memory
  - Global variables likewise occupy fixed locations for the life of the program
• Function-local variables exist on the stack only while the function is executing

Memory Usage: the IBM PC's Memory Map
• The IBM PC design designated various ranges within the 8088's 1 MB memory address space for specific uses.
• Some of the ranges are "populated"; others are empty, i.e., there is no physical memory at those addresses.
• This is all justified by "640K should be enough for anyone".

80x86 Memory Spaces
• 80386 and later
  - 32 address bits
  - 2^32 addresses: 0000 0000h .. FFFF FFFFh
• 80286
  - 24 address bits
  - 2^24 addresses: 00 0000h .. FF FFFFh
• Original 8086, etc.
  - 20 address bits
  - 2^20 addresses: 0 0000h .. F FFFFh

[Figure: the 80x86 / MS-DOS Interrupt Vector Table]

What uses the Interrupt Vector Table?
• DOS originally depended on the IVT for basic services such as disk access
• As DOS became more sophisticated, it added its own "DOS services" for higher-level operations
  - Based on Interrupt 0x21, a.k.a. 21h
• Modern operating systems largely bypass the IVT
  - The first few interrupts are hardware interrupts, and must be handled
  - Linux uses Interrupt 0x80 for system calls (the legacy 32-bit x86 mechanism; x86_64 Linux normally uses the syscall instruction)
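The 80x86 memory-space sizes and the Interrupt Vector Table lookups described above come down to a little address arithmetic, sketched here in Python. The sketch assumes the standard 8086 real-mode conventions (stated here as assumptions, since the IVT figure is not reproduced): the table starts at physical address 0 and holds 256 four-byte entries, each a little-endian 16-bit offset followed by a 16-bit segment, and a segment:offset pair maps to physical address segment * 16 + offset. The helper names and the handler address 1234:5678 are hypothetical, chosen only for illustration.

```python
# Real-mode 80x86 address arithmetic (a sketch; helper names are hypothetical).

ADDRESS_BITS = {"8086": 20, "80286": 24, "80386": 32}

def address_space(bits: int) -> tuple[int, int]:
    """Return (number of addresses, highest address) for `bits` address lines."""
    count = 1 << bits           # 2**bits distinct addresses
    return count, count - 1     # addresses run from 0 to 2**bits - 1

def physical_address(segment: int, offset: int) -> int:
    """8086 real mode: segment * 16 + offset, wrapped to 20 address bits."""
    return ((segment << 4) + offset) & 0xFFFFF

def ivt_entry_address(vector: int) -> int:
    """Assumed layout: entry n is 4 bytes long and starts at physical address 4 * n."""
    if not 0 <= vector <= 0xFF:
        raise ValueError("the 8086 has 256 interrupt vectors (0x00-0xFF)")
    return vector * 4

def read_vector(memory: bytes, vector: int) -> tuple[int, int]:
    """Decode the little-endian (segment, offset) pair stored for `vector`."""
    base = ivt_entry_address(vector)
    offset = int.from_bytes(memory[base:base + 2], "little")
    segment = int.from_bytes(memory[base + 2:base + 4], "little")
    return segment, offset


for cpu, bits in ADDRESS_BITS.items():
    count, top = address_space(bits)
    print(f"{cpu}: {bits} address bits, {count:,} addresses, 0h .. {top:X}h")
# 8086: 20 address bits, 1,048,576 addresses, 0h .. FFFFFh
# 80286: 24 address bits, 16,777,216 addresses, 0h .. FFFFFFh
# 80386: 32 address bits, 4,294,967,296 addresses, 0h .. FFFFFFFFh

ivt = bytearray(1024)                    # 256 vectors x 4 bytes
base = ivt_entry_address(0x21)           # the DOS services vector, int 21h
ivt[base:base + 4] = (0x5678).to_bytes(2, "little") + (0x1234).to_bytes(2, "little")
seg, off = read_vector(bytes(ivt), 0x21)
print(f"int 21h entry at {base:03X}h -> {seg:04X}:{off:04X} = {physical_address(seg, off):05X}h")
# int 21h entry at 084h -> 1234:5678 = 179B8h
```

Masking with `& 0xFFFFF` mimics the original 8086's 20 address lines, so FFFF:0010 wraps around to address 0; on an 80286 or later in real mode with the A20 line enabled, that same pair instead reaches just past 1 MB.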