Carnegie Mellon
9/14/2020

This Week's Action Items
• Read Chapter 3
• Complete Quiz 4
• Start Assignment 2
  – Pre-Assignment Due Date: September 18 at 11:59 pm

Recap of Last Class: Floating Point
• Background: Fractional binary numbers
• IEEE floating point standard: Definition
• Example and properties
• Rounding, addition, multiplication
• Floating point in C
• Summary

Floating Point
• Recall scientific notation
  – $65.4 = 6.54 \times 10^{1}$ (normal form: 1 non-zero leading digit)
  – Mantissa = 6.54, exponent = 1, radix (base) = 10
• Mantissa – sign/magnitude representation
• Exponent – biased notation
  Single-precision: S | exp+127 | significand    (1 | 8 bits | 23 bits)
  Double-precision: S | exp+1023 | significand   (1 | 11 bits | 52 bits)

Zero, Infinity, NaN
• Zero – 0|0…0|0…0
• +/- infinity – S|1…1|0…0
• NaN – S|1…1|non-zero

IEEE 754 Floating Point Rounding
• Four modes:
  – Round to +infinity – add 1 if round or sticky bit is set and result >= 0
  – Round to -infinity – add 1 if round or sticky bit is set and result < 0
  – Round to 0 (truncate)
  – Round to nearest number (default) – round to even
    • Add 1 if round bit and LSB of result are set, or if round bit and sticky bit are set
    • Minimizes mean error introduced by rounding

Closer Look at Nearest-Even
• Default Rounding Mode: Round to nearest, ties to even
  – Hard to get any other kind without dropping into assembly
  – All others are statistically biased
    • Sum of set of positive numbers will consistently be over- or under-estimated
• Applying to Other Decimal Places / Bit Positions
  – When exactly halfway between two possible values
    • Round so that least significant digit is even
  – E.g., round to nearest hundredth
      7.8949999   7.89   (less than half way)
      7.8950001   7.90   (greater than half way)
      7.8950000   7.90   (half way, round up)
      7.8850000   7.88   (half way, round down)

Rounding Binary Numbers
• Binary Fractional Numbers
  – “Even” when least significant bit is 0
  – “Half way” when bits to right of rounding position = $100\ldots_2$
• Examples
  – Round to nearest 1/4 (2 bits right of binary point)

      Value     Binary          Rounded       Action         Rounded Value
      2 3/32    $10.00011_2$    $10.00_2$     (<1/2, down)   2
      2 3/16    $10.00110_2$    $10.01_2$     (>1/2, up)     2 1/4
      2 7/8     $10.11100_2$    $11.00_2$     (1/2, up)      3
      2 5/8     $10.10100_2$    $10.10_2$     (1/2, down)    2 1/2

Floating Point Addition
• $(-1)^{s_1} M_1 2^{E_1} + (-1)^{s_2} M_2 2^{E_2}$
  – Assume $E_1 > E_2$
  – Get binary points lined up: $(-1)^{s_1} M_1$ is offset by $E_1 - E_2$ bit positions relative to $(-1)^{s_2} M_2$, and the aligned sum is $(-1)^{s} M$
• Exact Result: $(-1)^{s} M 2^{E}$
  – Sign $s$, significand $M$: result of signed align & add
  – Exponent $E$: $E_1$
• Fixing
  – If $M \ge 2$, shift $M$ right, increment $E$
  – If $M < 1$, shift $M$ left $k$ positions, decrement $E$ by $k$
  – Overflow if $E$ out of range
  – Round $M$ to fit frac precision
• Align decimal point by matching larger exponent
• Add significands (possibly unnormalized)
• Normalize – shift sum/adjust exponent
• Round number to fit in significand field

Floating Point – Additional bits for accuracy
• Carry/borrow bit – to take into account overflow (carry/borrow) in result
• Guard bit – to take care of normalizing left shift
• Round bit – for correct rounding
• Sticky bit – fine-tune rounding – additional bit to the right of round that is set to 1 if any 1 bit falls off the end of the round digit

FP Multiplication
• $(-1)^{s_1} M_1 2^{E_1} \times (-1)^{s_2} M_2 2^{E_2}$
• Exact Result: $(-1)^{s} M 2^{E}$
  – Sign $s$: s1 ^ s2
  – Significand $M$: $M_1 \times M_2$
  – Exponent $E$: $E_1 + E_2$
• Fixing
  – If $M \ge 2$, shift $M$ right, increment $E$
  – If $E$ out of range, overflow
  – Round $M$ to fit frac precision

Floating Point Multiplication
• Exponent of product = sum of exponents – bias
• Multiply significands – decimal point location = sum of digits to the right of decimals
• Normalize – check for overflow/underflow
• Round
• Sign – positive if both are the same, negative otherwise

Exceptions
• IEEE standard specifies defaults and allows traps to permit user to handle exceptions
  – Invalid operation: e.g., sqrt of a negative number, 0/0, inf/inf
    • Result is NaN
  – Overflow: result is +/- inf unless overflow exception is enabled
  – Divide by 0: result is +/- inf if exception not enabled
  – Underflow: non-zero result underflows to 0
  – Inexact: rounded result not the actual result

CSC 252: Machine-Level Programming: Instruction Set Architectures

This Module (~4 Lectures)
• C Program
• Assembly Program
• Machine Code

Abstract Machines

            Machine Models           Data                               Control
  C         mem, proc                1) char                            1) loops
                                     2) int, float                      2) conditionals
                                     3) double                          3) switch
                                     4) struct, array                   4) Proc. call
                                     5) pointer                         5) Proc. return
  Assembly  mem, regs, alu,          1) byte                            3) branch/jump
            Stack, Cond. Codes,      2) 2-byte word                     4) call
            processor                3) 4-byte long word                5) ret
                                     4) 8-byte quad word
                                     5) contiguous byte allocation
                                     6) address of initial byte

Assembly Language Characteristics
• Minimal Data Types
  – “Integer” data of 1, 2, 4, or 8 bytes
    • Data values
    • Addresses (untyped pointers)
  – Floating point data of 4, 8, or 10 bytes
  – No aggregate types such as arrays or structures
    • Just contiguously allocated bytes in memory
• Primitive Operations
  – Perform arithmetic function on register or memory data
  – Transfer data between memory and register
    • Load data from memory into register
    • Store register data into memory
  – Transfer control
    • Unconditional jumps to/from procedures
    • Conditional branches

Instruction Set Architecture
• There used to be many ISAs
  – Motorola, x86, ARM, Power/PowerPC, Sparc, MIPS, IA64, z
  – More consolidated today: ARM for mobile, x86 for others
• There are even more microarchitectures
  – Apple/Samsung/Qualcomm have their own microarchitecture (implementation) of the ARM ISA
  – Intel and AMD have different microarchitectures for x86

Intel x86 Processors
• Dominate laptop/desktop/server market
• Evolutionary design
  – Backwards compatible all the way back to the 8086, introduced in 1978
  – Added more features as time goes on
• Complex instruction set computer (CISC)
  – Many different instructions with many different formats
    • But, only small subset encountered with Linux programs
  – Hard to match performance of Reduced Instruction Set Computers (RISC)
  – But, Intel has done just that!
    • In terms of speed. Less so for low power.

Intel x86 Evolution: Milestones

  Name         Date   Transistors   MHz
  8086         1978   29K           5-10
    – First 16-bit Intel processor. Basis for IBM PC & DOS
    – 1MB address space
  386          1985   275K          16-33
    – First 32-bit Intel processor, referred to as IA32
    – Added “flat addressing”, capable of running Unix
  Pentium 4E   2004   125M          2800-3800
    – First 64-bit Intel x86 processor, referred to as x86-64
  Core 2       2006   291M          1060-3500
    – First multi-core Intel processor
  Core i7      2008   731M          1700-3900
    – Four cores (our shark machines)

Intel x86 Processors, cont.
• Machine Evolution
  – 386           1985   0.3M
  – Pentium       1993   3.1M
  – Pentium/MMX   1997   4.5M
  – PentiumPro    1995   6.5M
  – Pentium III   1999   8.2M
  – Pentium 4     2001   42M
  – Core 2 Duo    2006   291M
  – Core i7       2008   731M
• Added Features
  – Instructions to support multimedia operations
  – Instructions to enable more efficient conditional operations
  – Transition from 32 bits to 64 bits
  – More cores

2020 State of the Art
  – Tiger Lake (11th generation)
    • 10 nm, 9-15 Watts, 2-4 cores, mobile platforms
  – Comet Lake (10th generation)
    • 14 nm, up to 5.3 GHz, 8 cores/16 threads, 45 Watts
https://simplecore.intel.com/newsroom/wp-content/uploads/sites/11/2020/04/Intel-10th-Gen-H-Series-2.jpg

x86 Clones: Advanced Micro Devices (AMD)
• Historically
  – AMD has followed just behind Intel
  – Getting very competitive
  – Recruited top circuit designers from Digital Equipment Corp. and other downward trending companies
  – Built Opteron: tough competitor to Pentium 4
  – Developed x86-64, their own extension to 64 bits

Intel’s 64-Bit History
• 2001: Intel Attempts Radical Shift from IA32 to IA64
  – Totally different architecture (Itanium)
  – Executes IA32 code only as legacy
  – Performance disappointing
• 2003: AMD Steps in with Evolutionary Solution
  – x86-64 (now called “AMD64”)
• Intel Felt Obligated to Focus on IA64
  – Hard to admit mistake or that AMD is better
• 2004: Intel Announces EM64T extension to IA32
  – Extended Memory 64-bit Technology
  – Almost identical to x86-64!
• All but low-end x86 processors support x86-64
  – But, lots of code still runs in 32-bit mode

Today: Machine Programming I: Basics
• History of Intel processors and architectures
• C, assembly, machine code
• Assembly Basics: Registers, operands, move
• Arithmetic & logical operations

Definitions
• Architecture: (also ISA: instruction set architecture) The parts of a processor design that one needs to understand or write assembly/machine code.
  – Examples: instruction set specification, registers.
• Microarchitecture: Implementation of the architecture.
  – Examples: cache sizes and core frequency.
• Code Forms:
  – Machine Code: The byte-level programs that a processor executes
  – Assembly Code: A text representation of machine code
• Example ISAs:
  – Intel: x86, IA32, Itanium, x86-64
  – ARM: Used in almost all mobile phones

Assembly/Machine Code View
[Diagram: CPU (PC, Registers, Condition Codes) connected to Memory (Code, Data, Stack) by Addresses, Data, and Instructions]
Programmer-Visible State
  – PC: Program counter
    • Address of next instruction
    • Called “RIP” (x86-64)
  – Register file
    • Heavily used program data
  – Condition codes
    • Store status information about most recent arithmetic or logical operation
    • Used for conditional branching
  – Memory
    • Byte addressable array
    • Code and user data
    • Stack to support procedures

Turning C into Object Code
  – Code in files p1.c p2.c
  – Compile with command: gcc -Og p1.c p2.c -o p
    • Use basic optimizations (-Og) [New to recent versions of GCC]
    • Put resulting binary in file p

  text     C program (p1.c p2.c)
             Compiler (gcc -Og -S)
  text     Asm program (p1.s p2.s)
             Assembler (gcc or as)
  binary   Object program (p1.o p2.o)     Static libraries (.a)
             Linker (gcc or ld)
  binary   Executable program (p)

Pointers in C
  int a = 4;
  int b = 3;
  int *c;
