X86 Instruction Set 4.2 Why Learn Assembly

4.1 CS356 Unit 4 Intro to x86 Instruction Set 4.2 Why Learn Assembly • To understand something of the limitation of the HW we are running on • Helpful to understand performance • To utilize certain HW options that high-level languages don't allow (e.g. operating systems, utilizing special HW features, etc.) • To understand possible security vulnerabilities or exploits • Can help debugging 4.3 Compilation Process CS:APP 3.2.2 void abs(int x, int* res) • Demo of assembler { if(x < 0) *res = -x; – $ g++ -Og -c -S file1.cpp else *res = x; • Demo of hexdump } Original Code – $ g++ -Og -c file1.cpp – $ hexdump -C file1.o | more Disassembly of section .text: 0000000000000000 <_Z3absiPi>: 0: 85 ff test %edi,%edi 2: 79 05 jns 9 <_Z3absiPi+0x9> • Demo of 4: f7 df neg %edi 6: 89 3e mov %edi,(%rsi) 8: c3 retq objdump/disassembler 9: 89 3e mov %edi,(%rsi) b: c3 retq – $ g++ -Og -c file1.cpp Compiler Output – $ objdump -d file1.o (Machine code & Assembly) Notice how each instruction is turned into binary (shown in hex) 4.4 Where Does It Live • Match (1-Processor / 2-Memory / 3-Disk Drive) where each item resides: – Source Code (.c/.java) = 3 – Running Program Code = 2 – Global Variables = 2 – Compiled Executable (Before It Executes) = 3 – Current Instruction Being Executed = 1 – Local Variables = 2 (1) Processor (2) Memory (3) Disk Drive 4.5 BASIC COMPUTER ORGANIZATION 4.6 Processor • Performs the same 3-step process over and over again – Fetch an instruction from Processor Arithmetic 3 Add the memory Circuitry specified values Decode 2 It’s an ADD – Decode the instruction Circuitry • Is it an ADD, SUB, etc.? 1 Fetch – Execute the instruction Instruction System Bus • Perform the specified operation • This process is known as the ADD SUB Instruction Cycle CMP Memory 4.7 Processor CS:APP 1.4 • 3 Primary Components inside a processor – ALU – Registers – Control Circuitry • Connects to memory and I/O via address, data, and control buses (bus = group of wires) Bus Processor Memory PC/IP 0 Addr Control 0 op. 1 ALU 2 out ADD, in1 SUB, Data 3 AND, 4 OR in2 R0-R31 5 Control 6 4.8 Arithmetic and Logic Unit (ALU) • Digital circuit that performs arithmetic operations like addition and subtraction along with logical operations (AND, OR, etc.) Processor Memory Addr ADD 0 op. 1 ALU 0x0123 2 out ADD, in1 SUB, Data 3 0x0579 0x0456 AND, 4 OR in2 5 Control 6 4.9 Registers • Recall memory is SLOW compared to a processor • Registers provide fast, temporary storage locations within the processor Processor Memory PC/IP Addr 0 op. 1 ALU 0x0123 2 out ADD, in1 SUB, 0x0456 Data 3 AND, 4 OR in2 R0-Rn-1 5 Control 6 4.10 General Purpose Registers • Registers available to software instructions for use by the programmer/compiler • Programmer/compiler is in charge of using these registers as inputs (source locations) and outputs (destination locations) Processor Memory PC/IP Addr 0 op. 1 ALU 2 out ADD, in1 SUB, Data 3 AND, 4 OR in2 R0-Rn-1 5 Control 6 4.11 What if we didn’t have registers? • Example w/o registers: F = (X+Y) – (X*Y) – Requires an ADD instruction, MULtiply instruction, and SUBtract Instruction – w/o registers • ADD: Load X and Y from memory, store result to memory • MUL: Load X and Y again from mem., store result to memory • SUB: Load results from ADD and MUL and store result to memory • 9 memory accesses Processor Memory PC/IP Addr 0 X op. 1 Y ALU 2 out ADD, in1 SUB, Data 3 F AND, 4 OR in2 R0-Rn-1 5 Control 6 4.12 What if we have registers? • Example w/ registers: F = (X+Y) – (X*Y) – Load X and Y into registers – ADD: R0 + R1 and store result in R2 – MUL: R0 * R1 and store result in R3 – SUB: R2 – R3 and store result in R4 – Store R4 back to memory – 3 total memory access Processor Memory PC/IP Addr 0 X op. 1 Y ALU 2 out ADD, in1 X SUB, Y Data 3 F AND, 4 OR in2 R0-Rn-1 5 Control 6 4.13 Other Registers • Some bookkeeping information is needed to make the processor operate correctly • Example: Program Counter/Instruction Pointer (PC/IP) Reg. – Recall that the processor must fetch instructions from memory before decoding and executing them – PC/IP register holds the address of the next instruction to fetch Processor Memory PC/IP Addr 0 op. 1 ALU 2 out ADD, in1 SUB, Data 3 AND, 4 OR in2 R0-Rn-1 5 Control 6 4.14 Fetching an Instruction • To fetch an instruction – PC/IP contains the address of the instruction – The value in the PC/IP is placed on the address bus and the memory is told to read – The PC/IP is incremented, and the process is repeated for the next instruction Processor PC/IP = Addr = 0 Memory PC/IP 0 Addr 0 inst. 1 op. 1 inst. 2 ALU Data = inst.1 machine code 2 inst. 3 out ADD, in1 SUB, Data 3 inst. 4 AND, 4 inst. 5 OR in2 Control = Read R0-Rn-1 … Control FF 4.15 Fetching an Instruction • To fetch an instruction – PC/IP contains the address of the instruction – The value in the PC/IP is placed on the address bus and the memory is told to read – The PC/IP is incremented, and the process is repeated for the next instruction Processor PC/IP = Addr = 1 Memory PC/IP 1 Addr 0 inst. 1 op. 1 inst. 2 ALU Data = inst.2 machine code 2 inst. 3 out ADD, in1 SUB, Data 3 inst. 4 AND, 4 inst. 5 OR in2 Control = Read R0-Rn-1 … Control FF 4.16 Control Circuitry • Control circuitry is used to decode the instruction and then generate the necessary signals to complete its execution • Controls the ALU • Selects Registers to be used as source and destination locations Processor Memory PC/IP 0 Addr Control 0 inst. 1 op. 1 inst. 2 ALU 2 inst. 3 out ADD, in1 SUB, Data 3 inst. 4 AND, 4 inst. 5 OR in2 R0-Rn-1 … Control FF 4.17 Control Circuitry • Assume 0x0201 is machine code for an ADD instruction of R2 = R0 + R1 • Control Logic will… – select the registers (R0 and R1) – tell the ALU to add – select the destination register (R2) Processor 0 Memory PC/IP 0 Addr Control 0 0201 ADD 1 inst. 2 0x0123 0201 2 inst. 3 out ALU in1 0x0456 Data 3 inst. 4 ADD 0x0579 4 in2 inst. 5 R0-Rn-1 … Control FF 4.18 Summary • Registers are used for fast, temporary storage in the processor – Data (usually) must be moved into registers • The PC or IP register stores the address of the next instruction to be executed – Maintains the current execution location in the program 4.19 UNDERSTANDING MEMORY 4.20 Memory and Addresses • Set of cells that each store a group of bits Address Data A[0] – Usually, 1 byte (8 bits) per 0 11010010 cell 1 01001011 … 2 10010000 • Unique address (number) A[n-1] assigned to each cell Inputs Address 3 11110100 – Used to reference the value 4 01101000 5 11010001 in that location D[0] • Data and instructions are … … both stored in memory and D[7] are always represented as a FFFF 00001011 string of 1’s and 0’s Inputs/Outputs Data Memory Device 4.21 Reads & Writes 0 11010010 • Memories perform 2 operations 2 1 01001011 Addr. – Read: retrieves data value in a 10010000 2 10010000 Data particular location (specified using 3 11110100 the address) Processor 4 01101000 – Write: changes data in a location 5 11010001 Read … to a new value Control • To perform these operations a set of address, data, and control FFFF 00001011 wires are used to talk to the A Read Operation memory 0 11010010 1 01001011 5 – Note: A group of wires/signals is Addr. 2 10010000 referred to as a ‘bus’ 00000110 Data 3 11110100 – Thus, we say that memories have 4 01101000 Processor an address, data, and control bus. 5 00000110 … Write Control FFFF 00001011 System Bus (address, data, control wires) A Write Operation 4.22 Memory vs. I/O Access • Processor performs reads and writes to communicate with memory and I/O devices – I/O devices have memory locations that contain data that the processor can access – All memory locations (be it RAM or I/O) have unique addresses which are used to identify them – The assignment of memory addresses is known Processor Memory as the physical memory map 0 code … 0x3ffffff data A D C Video Interface 800 FE may 8000000 FE FE signify a … WRITE white dot at a particular 01 location ‘a’ = 61 hex Keyboard in ASCII Interface 4000000 61 4.23 Address Space Size and View Logical View 0xf_ffff_ffff Memory Processor I/O Dev 2 I/O Dev 1 System (Addr. + Data) Bus OS Code (Addr = 36-39 bits, Data = 64) OS Stack Logical Address & Data bus widths = 64-bits User Stack Globals • Most computers are byte-addressable Code movl %rax,(%rdx) – Each unique address corresponds to 1-byte of addl %rcx,%rax ... memory (so we can access char variables) 0x0 • Address width determines max amount of Logical view of memory address/memory space – Every byte of data has a unique address – 32-bit addresses => 4 GB address space – 36-bit address bus => 64 GB address space 4.24 Data Bus & Data Sizes • Moore's Law meant we could build systems with more transistors Processor Data Bus • More transistors meant greater bit-widths Width – Just like more physical space allows for wider Intel 8088 8-bit roads/freeways, more transistors allowed us to move to 16-, 32- and 64-bit circuitry inside the processor Intel 8086 16-bit • To support smaller variable sizes (char = 1-byte) we Intel 80386 32-bit still need to access only 1-byte of memory per Intel Pentium 64-bit access, but to support int and long ints we want to access 4- or 8-byte chunks of memory per access Processor • Thus the data bus (highway connecting the processor and memory) has been getting wider (i.e.

X86 Instruction Set 4.2 Why Learn Assembly

Fill Your Boots: Enhanced Embedded Bootloader Exploits Via Fault Injection and Binary Analysis

Copy on Write Based File Systems Performance Analysis and Implementation

Pdp11-40.Pdf

The Linux Kernel Module Programming Guide

Mac Labs. More Creativity

Computer Organization and Architecture Designing for Performance Ninth Edition

Over View of Microprocessor 8085 and Its Application

ARM Instruction Set

Integrated Step-Down Converter with SVID Interface for Intel® CPU

Appendix D an Alternative to RISC: the Intel 80X86

The Von Neumann Computer Model 5/30/17, 10:03 PM

Introduction to Intel® FPGA IP Cores