COPE-09 Architecture.Indd

Computer Organisation & Program Execution 2021 Architecture9 Uwe R. Zimmer - The Australian National University Architecture References for this chapter [Patterson17] David A. Patterson & John L. Hennessy Computer Organization and Design – The Hardware/Software Interface Chapter 4 “The Processor”, Chapter 6 “Parallel Processors from Client to Cloud” ARM edition, Morgan Kaufmann 2017 © 2021 Uwe R. Zimmer, The Australian National University page 468 of 490 (chapter 9: “Architecture” up to page 487) Architecture © 2021 Uwe R. Zimmer, The Australian National University page 469 of 490 (chapter 9: “Architecture” up to page 487) Architecture Defi nition: Processor Hardware origins 18th century machines L’Ecrivain 1770 Programmable, yet not a computer in today’s defi nition (not Turing complete) L’Ecrivain (1770) Pierre Jaquet-Droz, Henri-Louis Jaquet-Droz & Jean-Frédéric Lescho © 2021 Uwe R. Zimmer, The Australian National University page 470 of 490 (chapter 9: “Architecture” up to page 487) Architecture Defi nition: Processor Digital Computers Konrad Zuse with Z1, © Dr. Horst Zuse Hardware origins (replica of the 1937 computer) • Patents by Konrad Zuse (Germany), 1936. ENIAC 1946, • First digital computer: Z1 (Germany), 1937: Re- Glen Beck (background), Betty Jennings (foreground) lays, programmable via punch tape, clock: 1 Hz, 64 words memory à 22-bit, 2 registers, fl oating point unit, weight: 1 t. • First freely programmable (Turing complete) relays computer: Z3 (Germany), 1941: 5.3 Hz • Atanasoff Berry Computer (US) 1942: Vacuum tubes, (not Turing complete). • Colossus Mark 1 (UK) 1944: Vacuum tubes (not Turing complete). • “First Draft of a Report on the EDVAC” (Electronic Discrete Variable Automatic Computer) by John von Neumann (US), 1945: Infl uential article about core elements of a computer: Arithmetic unit, control unit (Sequencer), memory (holding data and program), and I/O. • First high level programming language: Plankalkül (“Plan Calculus”) by Konrad Zuse, 1945. • ENIAC (Electronic Numerical Integrator And Computer) (US) 1946: programed by plugboard, First Turing complete vacuum tubes based computer, clock: 100 kHz, weight: 27 t on 167 m2. © 2021 Uwe R. Zimmer, The Australian National University page 471 of 490 (chapter 9: “Architecture” up to page 487) Architecture Computer Architectures Harvard Architecture • Control unit Instructions Concurrently addresses program and data Control unit memory and fetches next instruction. Controls next ALU operations and Program memoryProgram Address determines the next instruction (based on ALU status). • Arithmetic Logic Unit (ALU) Control Status Fetches data from memory. Executes arithmetic/logic operation. Address Writes data to memory. • Input/Output Arithmetic Logic Unit • Program memory Data memory Input/Output • Data memory Data © 2021 Uwe R. Zimmer, The Australian National University page 472 of 490 (chapter 9: “Architecture” up to page 487) Architecture Computer Architectures von Neumann Architecture • Control unit Instructions Sequentially addresses program and data Control unit memory and fetches next instruction. Controls next ALU operations and Address determines the next instruction (based on ALU status). • Arithmetic Logic Unit (ALU) Control Status Memory Fetches data from memory. Executes arithmetic/logic operation. Writes data to memory. • Input/Output Arithmetic Logic Unit • Memory Program and data is not distinguished Input/Output Data G Programs can change themselves. © 2021 Uwe R. Zimmer, The Australian National University page 473 of 490 (chapter 9: “Architecture” up to page 487) Architecture Computer Architectures A simple processor (CPU) Code management • Decoder/Sequencer Decoder Can be a machine in itself which breaks CPU Sequencer instructions into concurrent micro code. • Execution Unit / Arithmetic-Logic-Unit (ALU) IP A collection of transformational logic. Flags Registers • Memory SP Memory • Registers Instruction pointer, stack pointer, general purpose and specialized registers. ALU • Flags Indicating the states of the latest calculations. Data management • Code/Data management Fetching, Caching, Storing. © 2021 Uwe R. Zimmer, The Australian National University page 474 of 490 (chapter 9: “Architecture” up to page 487) Architecture Processor Architectures Pipeline CodeCode management managementent Some CPU actions are naturally sequential Decoder (e.g. instructions need to be fi rst loaded, then Int. SequencerSe Sequencer decoded before they can be executed). More fi ne grained sequences can IP be introduced by breaking CPU Registers Flags instructions into micro code. SP Memory G Overlapping those sequences in time will lead to the concept of pipelines. ALU G Same latency, yet higher throughput. G (Conditional) branches might break the pipelines G Branch predictors become essential. DataData management management © 2021 Uwe R. Zimmer, The Australian National University page 475 of 490 (chapter 9: “Architecture” up to page 487) Architecture Processor Architectures Parallel pipelines CodeCode management managementt Filling parallel pipelines Decoder (by alternating incoming commands between Int. SequencerSe Sequencererr pipelines) may employ multiple ALU’s. G (Conditional) branches might IP again break the pipelines. Registers FlagsFla G Interdependencies might limit SP Memory the degree of concurrency. G Same latency, yet even higher throughput. AALU ALULU G Compilers need to be aware of the options. DataData management management © 2021 Uwe R. Zimmer, The Australian National University page 476 of 490 (chapter 9: “Architecture” up to page 487) Architecture Processor Architectures Pipeline hazards CodeCode management managementent Structural hazard Decoder G Lack of hardware to run Int. SequencerSe Sequencer operations in parallel, … e.g. load an new instruction and IP load new data in parallel. Registers Flags Control hazard SP G A decision depends on the Memory previous instruction. … e.g. a conditional branch based on the ALU fl ags from the previous instruction. Data hazard G Needed data is not yet available DataData management management … e.g. the result of an arithmetic operation is needed in the next instruction. © 2021 Uwe R. Zimmer, The Australian National University page 477 of 490 (chapter 9: “Architecture” up to page 487) Architecture Processor Architectures Out of order execution CodeCode management managemente Breaking the sequence inside each pipe- Decoder line leads to ‘out of order’ CPU designs. Int. SequencerSe Sequencer G Replace pipelines with hardware scheduler. IP G Results need to be Registers FlagsFla “re-sequentialized” or possibly discarded. SP G “Conditional branch prediction” executes Memory the most likely branch or multiple branches. G Works better if the presented code AALU ALULU sequence has more independent instructions and fewer conditional branches. G This hardware will require (extensive) code optimization to be fully utilized. DataData management management © 2021 Uwe R. Zimmer, The Australian National University page 478 of 490 (chapter 9: “Architecture” up to page 487) Architecture Processor Architectures SIMD ALU units Code management Provides the facility to apply the same in- Decoder struction to multiple data concurrently. Int. Sequencer Also referred to as “vector units”. Examples: Altivec, MMX, SSE[2|3|4], … IP Flags Registers G Requires specialized compilers SP or programming languages with Memory implicit concurrency. AALALULU ALU ALU ALU GPU processing Graphics processor as a vector unit. Data managementgeemmenmeent G Unifying architecture languages are used (OpenCL, CUDA, GPGPU). © 2021 Uwe R. Zimmer, The Australian National University page 479 of 490 (chapter 9: “Architecture” up to page 487) Architecture Processor Architectures Hyper-threading CodeCod management Emulates multiple virtual CPU cores DecoderDecoder by means of replication of: Int. SequencerSequencer • Register sets • Sequencer IPIPIP • Flags RegistersRRegistersegistersegisegi FlagsFFlagslaggss • Interrupt logic SPSSPP Memory while keeping the “expensive” resources like the ALU central yet accessible by multiple hyper-threads concurrently. ALU G Requires programming languages with implicit or explicit concurrency. DataData management Examples: Intel Pentium 4, Core i5/i7, Xeon, Atom, Sun UltraSPARC T2 (8 threads per core) © 2021 Uwe R. Zimmer, The Australian National University page 480 of 490 (chapter 9: “Architecture” up to page 487) Architecture Processor Architectures Multi-core CPUs CodeCod management Full replication of multiple CPU cores DecoderDecoder on the same chip package. Int. SequencerSequencer • Often combined with hyper-threading and/or multiple other means (as IPIP IP introduced above) on each core. RegistersRRegistersegistersegisgi FlagsFFlagslaggss • Cleanest and most explicit implementation SPSSPP Memory of concurrency on the CPU level. G Requires synchronized atomic operations. ALU G Requires programming languages with implicit or explicit concurrency. Historically the introduction of multi-core DataData management CPUs ended the “GHz race” in the early 2000’s. © 2021 Uwe R. Zimmer, The Australian National University page 481 of 490 (chapter 9: “Architecture” up to page 487) Architecture Processor Architectures Virtual memory Code management Translates logical memory addresses Decoder into physical memory addresses Int. Sequencer and provides memory protection features. IP • Does not introduce concurrency by itself. Registers Flags G Is still essential for concurrent programming as hardware memory protection SP

Load more