Computer Organization & Assembly Language Programming

CSE 2312, Lecture 3: Processor

Metric Units
[Table: the principal metric prefixes.]

Measuring Computer and CPU Performance
• Elapsed time
  – Total response time, including all aspects, such as processing, I/O, OS overhead, and idle time
  – Determines system performance
• CPU time
  – Time spent executing this program's instructions, that is, processing a given job
  – Does not include I/O time or other jobs' shares
  – User CPU time + system CPU time
  – Different programs are affected differently by CPU and system performance

CPU Clock
• Every action is driven by a clock in the CPU
• Clock time = 1 / Frequency
  – 1 MHz clock: clock time = $10^{-6}$ seconds
  – 1 GHz clock: clock time = $10^{-9}$ seconds

How Long Does an Instruction Take?
• Digital logic is controlled by a clock
[Figure: clock timing diagram labeled with clock period, clock (cycles), data transfer and computation, and update state.]
• Clock period (clock cycle time): duration of a clock cycle
  – e.g., 250 ps = 0.25 ns = $250 \times 10^{-12}$ s
• Clock frequency (clock rate): cycles per second
  – e.g., 4.0 GHz = 4000 MHz = $4.0 \times 10^{9}$ Hz

Calculating CPU Time
$$\text{CPU Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$
• The first factor is the instruction count (IC); the second is the cycles per instruction (CPI)

Instruction Count and Cycles per Instruction
• IC is determined by program, ISA, and compiler
• CPI is determined by CPU and other factors
  – Different instructions have different CPI
  – Average CPI is affected by instruction mix
  – Often measured over the execution of a large number of instructions
$$\text{Clock Cycles} = \text{IC} \times \text{CPI}$$
$$\text{CPU Time} = \text{IC} \times \text{CPI} \times \text{Clock Cycle Time} = \frac{\text{IC} \times \text{CPI}}{\text{Clock Rate}}$$

Improving CPU Time
• Usually a tradeoff
$$\text{CPU Time} = \text{CPU Clock Cycles} \times \text{Clock Cycle Time} = \frac{\text{CPU Clock Cycles}}{\text{Clock Rate}}$$
$$\text{Clock Cycles} = \sum_{i=1}^{n} \left( \text{CPI}_i \times \text{IC}_i \right)$$
$$\text{CPI} = \frac{\text{Clock Cycles}}{\text{IC}} = \sum_{i=1}^{n} \left( \text{CPI}_i \times \frac{\text{IC}_i}{\text{IC}} \right)$$
• The ratio $\text{IC}_i / \text{IC}$ is the relative frequency of instruction class $i$

Compiler Matters!
• Suppose the compiler has two choices: it can use 5 or 6 instructions, as described below:

  Class               A   B   C
  CPI for class       1   2   3
  IC in sequence 1    2   1   2
  IC in sequence 2    4   1   1

• Which is better?
• Sequence 1: IC = 5
  – Clock Cycles = 2×1 + 1×2 + 2×3 = 10
  – Avg. CPI = 10/5 = 2.0
• Sequence 2: IC = 6
  – Clock Cycles = 4×1 + 1×2 + 1×3 = 9
  – Avg. CPI = 9/6 = 1.5
• Sequence 2 takes fewer clock cycles (9 vs. 10) and has the lower average CPI, so it is better, even though it executes more instructions.

Comparing Performance
• Performance = 1 / Execution Time
• “X is n times faster than Y” means
$$n = \frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution Time}_Y}{\text{Execution Time}_X}$$
• Example: time taken to run a program
  – 10 s on A, 15 s on B. How much faster is A?
  – $\text{Execution Time}_B / \text{Execution Time}_A$ = 15 s / 10 s = 1.5
  – So A is 1.5 times faster than B

CPI Example
• Computer A: Cycle Time = 250 ps, CPI = 2.0
• Computer B: Cycle Time = 500 ps, CPI = 1.2
• Same ISA
• Which is faster, and by how much?
$$\text{CPU Time}_A = \text{IC} \times \text{CPI}_A \times \text{Cycle Time}_A = \text{IC} \times 2.0 \times 250\,\text{ps} = \text{IC} \times 500\,\text{ps}$$
$$\text{CPU Time}_B = \text{IC} \times \text{CPI}_B \times \text{Cycle Time}_B = \text{IC} \times 1.2 \times 500\,\text{ps} = \text{IC} \times 600\,\text{ps}$$
$$\frac{\text{CPU Time}_B}{\text{CPU Time}_A} = \frac{\text{IC} \times 600\,\text{ps}}{\text{IC} \times 500\,\text{ps}} = 1.2$$
• A is faster, by a factor of 1.2

CPU Example
• Computer A:
  – 2 GHz clock, 10 s CPU time
• Let’s design Computer B
  – Aim for 6 s CPU time
  – Can use a faster clock, but this causes 1.2× as many clock cycles for the same program
• How fast must the new clock be?
$$\text{Clock Rate}_B = \frac{\text{Clock Cycles}_B}{\text{CPU Time}_B} = \frac{1.2 \times \text{Clock Cycles}_A}{6\,\text{s}}$$
$$\text{Clock Cycles}_A = \text{CPU Time}_A \times \text{Clock Rate}_A = 10\,\text{s} \times 2\,\text{GHz} = 20 \times 10^{9}$$
$$\text{Clock Rate}_B = \frac{1.2 \times 20 \times 10^{9}}{6\,\text{s}} = \frac{24 \times 10^{9}}{6\,\text{s}} = 4\,\text{GHz}$$

Time for a Program
• The CPU executes various instructions
• A program has a number of instructions. How many?
  – Depends on the program and compiler
• Each instruction takes a number of CPU cycles. How many?
  – Depends on the Instruction Set Architecture (ISA)
  – ISA: will learn in this course
• Each cycle has a fixed time based on CPU and bus speed.
  – Depends on the hardware organization
  – Computer architecture: will learn in this course

CPU Performance Equation

Performance Summary
• Performance depends on
  – Algorithm: affects IC, possibly CPI
  – Programming language: affects IC, CPI
  – Compiler: affects IC, CPI
  – Instruction set architecture: affects IC, CPI, $T_c$ (clock cycle time)
$$\text{CPU Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$

How to Improve Performance?
We must lower execution time!
• Algorithm
  – Determines number of operations executed
• Programming language, compiler, architecture
  – Determine number of machine instructions executed per operation (IC)
• Processor and memory system
  – Determine how fast instructions are executed (CPI)
• I/O system (including OS)
  – Determines how fast I/O operations are executed

Amdahl’s Law
• Improving one aspect of a computer won’t give a proportional improvement in overall performance
$$T_{\text{improved}} = \frac{T_{\text{affected}}}{\text{improvement factor}} + T_{\text{unaffected}}$$
• Especially true when parallelism is considered
• So try to identify and remove the performance bottleneck!

Exercise 1
• Problem
  – There are 3 classes of instructions: A, B, C. Suppose the compiler has two choices, Sequence 1 and Sequence 2, as described below:

  Class               A   B   C
  CPI for class       1   2   3
  IC in sequence 1    2   1   2
  IC in sequence 2    3   1   1

• Which one is better? Why?
• Sequence 1: IC = 5
  – Clock Cycles = 2×1 + 1×2 + 2×3 = 10
  – Avg. CPI = 10/5 = 2.0
• Sequence 2: IC = 5
  – Clock Cycles = 3×1 + 1×2 + 1×3 = 8
  – Avg. CPI = 8/5 = 1.6
• Both sequences execute 5 instructions, and Sequence 2 has the lower average CPI (8 clock cycles vs. 10), so it is better.

Exercise 2
• Problem:
  – There are two computers: A and B.
  – Computer A: Cycle Time = 250 ps, CPI = 2.0
  – Computer B: Cycle Time = 400 ps, CPI = 1.5
  – If they have the same ISA, which computer is faster?
  – How many times faster is it than the other?
• Answer:
  – We know that CPU Time = IC × CPI × Cycle Time
  – Therefore, CPU Time(A) = IC × 2.0 × 250 ps = 500 ps × IC
  – CPU Time(B) = IC × 1.5 × 400 ps = 600 ps × IC
  – So A is 600/500 = 1.2 times faster than B.
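The CPU time equation, the weighted average CPI, and Amdahl's Law from the preceding slides can be checked numerically. The following is a minimal Python sketch, not part of the original lecture: the function names are hypothetical, the instruction count of one million is an arbitrary assumption (it cancels out of the ratio), and the Amdahl's Law inputs (80 s affected, 20 s unaffected, improvement factor 5) are made-up illustration values.

```python
def cpu_time(instruction_count, cpi, cycle_time_s):
    """CPU time in seconds: IC x CPI x clock cycle time."""
    return instruction_count * cpi * cycle_time_s


def average_cpi(cpi_per_class, ic_per_class):
    """Weighted average CPI: total clock cycles / total instruction count."""
    total_cycles = sum(c * n for c, n in zip(cpi_per_class, ic_per_class))
    return total_cycles / sum(ic_per_class)


def speedup(time_slower, time_faster):
    """How many times faster the second machine is than the first."""
    return time_slower / time_faster


def amdahl_improved_time(t_affected, t_unaffected, improvement_factor):
    """Amdahl's Law: T_improved = T_affected / factor + T_unaffected."""
    return t_affected / improvement_factor + t_unaffected


# Exercise 1: CPI per class (A, B, C) and the two instruction mixes.
cpi_classes = [1, 2, 3]
print("Sequence 1 average CPI:", average_cpi(cpi_classes, [2, 1, 2]))
print("Sequence 2 average CPI:", average_cpi(cpi_classes, [3, 1, 1]))

# Exercise 2: same ISA, so both machines run the same instruction count.
IC = 1_000_000  # assumed value; any positive count gives the same ratio
time_a = cpu_time(IC, 2.0, 250e-12)  # Computer A: CPI 2.0, 250 ps cycle
time_b = cpu_time(IC, 1.5, 400e-12)  # Computer B: CPI 1.5, 400 ps cycle
print("A is faster than B by a factor of", round(speedup(time_b, time_a), 3))

# Amdahl's Law with made-up numbers: speed up an 80 s portion by 5x.
print("Improved time (s):", amdahl_improved_time(80.0, 20.0, 5.0))
```

Because the instruction count appears in both CPU times, it cancels in the speedup ratio, which is why the exercises can leave IC as a symbol.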
Exercise 3
• Problem:
  – Computer A has a 2 GHz clock. It takes 10 s of CPU time to finish one given task.
  – We want to design Computer B to finish the same task within 5 s of CPU time.
  – Computer B needs 2 times as many clock cycles as Computer A.
  – What clock rate should Computer B be designed for?
• Answer:
$$\text{Clock Rate}_B = \frac{\text{Clock Cycles}_B}{\text{CPU Time}_B} = \frac{2 \times \text{Clock Cycles}_A}{5\,\text{s}}$$
$$\text{Clock Cycles}_A = \text{CPU Time}_A \times \text{Clock Rate}_A = 10\,\text{s} \times 2\,\text{GHz} = 20 \times 10^{9}$$
$$\text{Clock Rate}_B = \frac{2 \times 20 \times 10^{9}}{5\,\text{s}} = \frac{40 \times 10^{9}}{5\,\text{s}} = 8\,\text{GHz}$$

Central Processing Unit (CPU)
[Figure: the organization of a simple computer with one CPU and two I/O devices.]

Basic Elements
• Other devices: cache; virtual memory support (MMU); …

Processor
• CPU
  – Brain of the computer; executes programs stored in main memory by fetching instructions, examining them, and executing them one after another
• Bus
  – Connects different components
  – Parallel wires for transmitting address, data, and control signals
  – Can be external to the CPU (connecting it with memory and I/O) or internal
• Control Unit
  – Fetches instructions from main memory and determines their types
• Arithmetic Logic Unit (ALU)
  – Performs arithmetic operations, such as addition, and Boolean operations
• Registers
  – High-speed memory used to store temporary results and control information
  – Program Counter (PC): points to the next instruction to be fetched
  – Instruction Register (IR): holds the instruction currently being executed

CPU Organization
• Instructions:
  – Register-Memory: memory words are fetched into registers
  – Register-Register
• Data Path Cycle
  – The process of running two operands through the ALU and storing the result
  – Defines what the machine can do
  – The faster the data path cycle is, the faster the computer runs
[Figure: the data path of a typical Von Neumann machine.]

Arithmetic Logic Unit (ALU)
[Figure: ALU with operand inputs A and B, operation code op, result C, and 4 status codes c, n, z, v.]
• Conducts different calculations
  – +, -, x, /
  – and, or, xor, not
  – Shift, …
• Variants
  – Integer, Floating Point, Double Precision
  – High-performance CPUs may have multiple ALUs.
• Input
  – Operands: A, B
  – Operation code: obtained from the encoded instruction
• Output
  – Result: C
  – Status codes
• Status codes (usually):
  – c: carry out from +, -, x, shift
  – n: result is negative
  – z: result is zero
  – v: result overflowed

Instruction Execution Steps
• Fetch-decode-execute: central to the operation of all computers
  1. Fetch the next instruction from memory into the instruction register
  2. Change the program counter to point to the following instruction
  3. Determine the type of instruction just fetched
  4. If the instruction uses a word in memory, determine where it is
  5. Fetch the word, if needed, into a CPU register
  6. Execute the instruction
  7. Go to step 1 to begin executing the following instruction

Interpreting Instructions
• Interpreter
  – A program that fetches, examines, and executes the instructions of another program.
  – One can write a program to imitate the function of a CPU
  – Main advantage: the ability to design a simple processor that supports a rich set of instructions.
• Benefits (simple computer with interpreted instructions)
  – The ability to fix incorrectly implemented instructions or make up for design deficiencies in the basic hardware
  – The opportunity to add new instructions at minimal cost, even after delivery of the machine
  – Structured design that permits efficient development, testing, and documenting of complex instructions

[Figure 2-3. An interpreter for a simple computer (written in Java).]

RISC vs. CISC
• Semantic gap between
  – What the machine can do
  – What high-level programming languages require
• Reduced Instruction Set Computer (RISC) (Lego building example)
  – Did not use interpretation
  – Did not have to be backward compatible with existing products
  – Small number of instructions, around 50
• Key to designing RISC instructions
  – Designed instructions should be able to
