Lecture 03 Basics Moore's Law Since 2003

Lecture 03 Basics Processors CS 50011: Introduction to Systems II Architecture ISA Lecture 3: Computer Architecture DMA Modes Prof. Jeff Turkstra © 2017 Dr. Jeffrey A. Turkstra 1 © 2017 Dr. Jeffrey A. Turkstra 2 Basics Some lecture material based on: Moore’s Law for integrated circuits Slides by Dr. George B. Adams III Transistor count for a typical processor Slides from Hennessy & Patterson or memory chip increases 40% to 55% per year. Slides from Silberschatz Doubles every 18-24 months Transistor count ~= computational power 1986-2003 computer performance increased ~ 50%/year © 2017 Dr. Jeffrey A. Turkstra 3 © 2017 Dr. Jeffrey A. Turkstra 4 Moore’s Law since 2003 25,000-fold hardware performance Microprocessor performance only improvement since 1985 20%/year Programs today trade execution Maximum power dissipation limits for performance for programmer air-cooled chips productivity Lack of additional instruction-level More programming is done in managed parallelism for hardware to exploit languages like Java, Python, and C# New applications have arisen: speech, sound, images, video © 2017 Dr. Jeffrey A. Turkstra 5 © 2017 Dr. Jeffrey A. Turkstra 6 Computer components Hardware Data types Transistors Representations for character, integer, Gates floating point, etc more, od, xxd Combinational and sequential circuits Sign-magnitude Adders, decoders, mux/demux, latches, flip-flops, registers 1’s complement Processors 2’s complement Memory IEEE 754 etc BCD ... © 2017 Dr. Jeffrey A. Turkstra 7 © 2017 Dr. Jeffrey A. Turkstra 8 Software Instructions for what to compute © 2017 Dr. Jeffrey A. Turkstra 9 © 2017 Dr. Jeffrey A. Turkstra 10 How a modern computer Harvard architecture works Idea by Howard Aiken, Harvard physicist, to IBM Nov. 1937 Built by IBM in Endicott, NY and delivered to Harvard in Feb. 1944 as the Mark 1 computer Has separate memories for program (instructions) and data Input/output (I/O) to connect to the world Processor to carry out the computations © 2017 Dr. Jeffrey A. Turkstra 11 © 2017 Dr. Jeffrey A. Turkstra 12 (John) Von Neumann Harvard architecture Developed during his June 1945 train ride from Philadelphia to Los Alamos, NM He had programmed the Mark 1 in August 1944 One memory for both data and program Same I/O Same processor © 2017 Dr. Jeffrey A. Turkstra 13 © 2017 Dr. Jeffrey A. Turkstra 14 von Neumann vs Harvard Von Neumann architectures von Neumann Same memory holds instructions and data Single bus between CPU and memory Flexible, more cost effective Harvard Separate memories for data and instructions Two busses Allows two simultaneous memory fetches Less flexible, memory is physically partitioned Both are stored program computer designs © 2017 Dr. Jeffrey A. Turkstra 15 © 2017 Dr. Jeffrey A. Turkstra 16 Processors Stored programs Device that performs automatic computation Some memories can be written to only Fixed logic – single operation once and then read many times Traffic signal sequencer Read-Only Memory (ROM) Selectable logic – user can select from multiple E.g., automobile engine control hardwired functions Car with Econo and Sports modes for transmission Some ROM can be re-written Parameterized logic – computes fixed function PROM, programmable ROM on variable user input EPROM, erasable programmable ROM Programmable video recorder EEPROM, electrically erasable… Programmable logic processor Embedded systems often PROM CPU, GPU, etc Firmware upgrades © 2017 Dr. Jeffrey A. Turkstra 17 © 2017 Dr. Jeffrey A. Turkstra 18 Fetch-Execute Intel Core i7 Processor At the highest level, a processor does this: repeat forever { FETCH, access the next program instruction from location where it is stored EXECUTE, perform the actions described by the instruction } © 2017 Dr. Jeffrey A. Turkstra 19 © 2017 Dr. Jeffrey A. Turkstra 20 Motherboard © 2017 Dr. Jeffrey A. Turkstra 21 © 2017 Dr. Jeffrey A. Turkstra 22 Architecture basics Architecture basics Instruction set Logic design Software instructions that the hardware Which logic circuits are used and how executes are they organized? Functional organization Implementation How is the hardware partitioned into Technologies and packaging used specialized units? © 2017 Dr. Jeffrey A. Turkstra 23 © 2017 Dr. Jeffrey A. Turkstra 24 Instruction set Hierarchical abstraction architecture Hardware and software consist of Instruction set architecture (ISA) is a layers in a hierarchy key level of abstraction To a good approximation Primary interface between hardware and Each layer hides (some of) its detail software from the layer above Set of operations that a processor Principal of Abstraction performs Highest layer interacts with outside Instruction format defines an world/end user interpretation of bit strings Similar to ASCII, 2’s complement, IEEE 754, BCD, etc © 2017 Dr. Jeffrey A. Turkstra 25 © 2017 Dr. Jeffrey A. Turkstra 26 Opcodes, operands, and results A bit string, interpreted as an instruction, specifies Operations to be performed Actual operand(s) and/or source(s) for the operand(s) and their type(s) Destination for the result(s) © 2017 Dr. Jeffrey A. Turkstra 27 © 2017 Dr. Jeffrey A. Turkstra 28 ISA Design Many tradeoffs Instruction length Number of registers Number of instructions etc © 2017 Dr. Jeffrey A. Turkstra 29 © 2017 Dr. Jeffrey A. Turkstra 30 CISC vs RISC Endianness Complex Instruction Set Computer Imagine memory is read from lowest address to highest address Reduced Instruction Set Computer Big Endian RISC won Most significant, “big,” byte comes first. Ie, placed in lowest numbered memory location. Even Intel uses RISC micro-instructions “Big” end appears first when reading memory They just have a really amazing instruction Network traffic decoder PowerPC, ARM, SPARC, MIPS Little Endian Reverse of Big Endian: least significant, “Little,” byte placed in lowest address “Little” end first © 2017 Dr. Jeffrey A. Turkstra 31 © 2017 Dr. Jeffrey A. Turkstra 32 Example and comparison Consider 0x00C0F380 = 0x 00 C0 F3 80 = 0b0000 0000 1100 0000 1111 0011 1000 0000 Most significant byte Least significant byte Byte at given location Addresses arbitrarily start Memory Little endian Big endian address at 0x00000000; 0x00000000 1000 0000 0000 0000 Locations accessed in 0x00000001 1111 0000 1100 0000 arrow-indicated 0x00000002 1100 0000 1111 0011 sequence 0x00000003 0000 0000 1000 0000 © 2017 by George B. Adams III Portions © 2017 Dr. Jeffrey A. Turkstra 33 © 2017 Dr. Jeffrey A. Turkstra 34 Memory hierarchy Registers Type of memory located inside CPU Can hold a single piece of data Data processing Control Many registers More later © 2017 Dr. Jeffrey A. Turkstra 35 © 2017 Dr. Jeffrey A. Turkstra 36 Designing an ISA © 2017 Dr. Jeffrey A. Turkstra 37 © 2017 Dr. Jeffrey A. Turkstra 38 Example Writing a program using these instructions is programming in assembly language; example Assembly instr. ; Comments load r2, 20(r1) ; r2 ← Data_Memory[20+r1] load r3, 24(r1) ; r3 ← Data_Memory[24+r1] add r4, r2, r3 ; r4 ← r2 + r3 store r4, 28(r1) ; Data_Memory[28+r1] ← r4 jump 60(r7) ; Fetch at Instr._Memory[60+r7] r1, r2, r3, r4 are registers in the data path 20, 24, 28 are decimal constants “x ←” means “x” is the location for the result Memory[x] means the contents of memory at address x + means addition, with operand typedefined by the instruction (r1 + r2 is add with different data type than 28+r3) © 2017 Dr. Jeffrey A. Turkstra 39 © 2017 Dr. Jeffrey A. Turkstra 40 Assembly and its machine ADD code ADD result data path Opcode PointerPointerPointer Unused offset, arbitrarily set all 0 41 ADD: ALU output is result, deliver to dst reg; result has the meaning “integer” because this is an integer adder © 2017 Dr. Jeffrey A. Turkstra 41 © 2017 Dr. Jeffrey A. Turkstra 4242 LOAD STORE t l u h s t e a r p D a A t a O L d STORE result data path LOAD: ALU output is pointer that must be sent to data memory, which then STORE: ALU output is pointer that must be sent to data memory along with produces copy of the location contents which, finally, must be written into dst_reg; the value from reg B to be written into the data memory location; Result is a Result is a bit string from memory: no inherent meaning at all bit string from reg_B written in memory, no inherent meaning at all © 2017 Dr. Jeffrey A. Turkstra 4343 © 2017 Dr. Jeffrey A. Turkstra 4444 Intel Core microarchitecture JUMP pipeline JUMP result data path JUMP: ALU output is computed Next_instruction_pointer, must deliver to Instruction_pointer_register; Result meaning is location of next instruction on the execution path © 2017 Dr. Jeffrey A. Turkstra 4545 © 2017 Dr. Jeffrey A. Turkstra 46 Direct memory access MMU DMA allows other hardware Responsible for “refreshing” DRAM subsystems to access main memory Translates virtual memory addresses without going through the CPU to physical addresses Modern systems usually have DMA Sometimes part of CPU controller (MMU) Sometimes not Memory address register, byte count, Northbridge for Intel until recently control, etc I7/i5 have an Integrated Memory Responsible for ensuring accesses are Controller (IMC) properly restrained Attack vector © 2017 Dr. Jeffrey A. Turkstra 47 © 2017 Dr. Jeffrey A. Turkstra 48 Page 70 Execution modes CPU hardware has several possible modes At any one time, in one mode Modes specify Privilege level Valid instructions Valid memory addresses Size of data items Backwards compatibility © 2017 Dr. Jeffrey A. Turkstra 49 © 2017 Dr. Jeffrey A. Turkstra 50 Rings Ring -1 Intel Active Management Technology Exists for other architectures as well Runs on the Intel Management Engine (ME) Isolated and protected coprocessor Embedded in all current Intel chipsets ARC core Out-of-band access Direct access to Ethernet controller Requires vPro-enabled CPU/Motherboard/Chipset © 2017 Dr. Jeffrey A. Turkstra 51 © 2017 Dr. Jeffrey A. Turkstra 52 Ring -1 Trusting trust ...if you can exploit it, you win. Reflections on Trusting Trust CVE-2017-5689 by Ken Thompson Go read about it Read this too © 2017 Dr. Jeffrey A. Turkstra 53 © 2017 Dr.

Load more