mputing HIGH PERFORMANCE Coding (HPC)

ALBERTO DASSATTI - 2016 Microarchitecture The real hardware, CPU, memory and so on, giving (an) implementation(s) of an ISA

ISA All information needed by the prgrammer for programming an architecture DLX RISC RISC families include DEC Alpha, AMD Am29000, ARC, ARM, Atmel AVR, Blackfin, i860 and i960, MIPS, CISC , PA-RISC, Power PDP-11, VAX, Motorola 68k, and your (including PowerPC), RISC-V, etc... desktop PCs on x86 architecture. SOMEMORERECENTEXAMPLES

Let’s have a look at some modern microarchitectures: Intel Nehalem Wikipedia, and more or AMD Bulldozer and more 2. Out of Order dynamically scheduled

3. Speculation: branch, pre-fetch (data-flow model)

4. 20 − 24 stage pipeline

5. 128µOps

6. Very High and variable ILP

NEHALEM

1. 6 Ops: 3 mem (load, store data and store address) + 3 ALU 3. Speculation: branch, pre-fetch (data-flow model)

4. 20 − 24 stage pipeline

5. 128µOps

6. Very High and variable ILP

NEHALEM

1. 6 Ops: 3 mem (load, store data and store address) + 3 ALU

2. Out of Order dynamically scheduled 4. 20 − 24 stage pipeline

5. 128µOps

6. Very High and variable ILP

NEHALEM

1. 6 Ops: 3 mem (load, store data and store address) + 3 ALU

2. Out of Order dynamically scheduled

3. Speculation: branch, pre-fetch (data-flow model) 5. 128µOps

6. Very High and variable ILP

NEHALEM

1. 6 Ops: 3 mem (load, store data and store address) + 3 ALU

2. Out of Order dynamically scheduled

3. Speculation: branch, pre-fetch (data-flow model)

4. 20 − 24 stage pipeline 6. Very High and variable ILP

NEHALEM

1. 6 Ops: 3 mem (load, store data and store address) + 3 ALU

2. Out of Order dynamically scheduled

3. Speculation: branch, pre-fetch (data-flow model)

4. 20 − 24 stage pipeline

5. 128µOps NEHALEM

1. 6 Ops: 3 mem (load, store data and store address) + 3 ALU

2. Out of Order dynamically scheduled

3. Speculation: branch, pre-fetch (data-flow model)

4. 20 − 24 stage pipeline

5. 128µOps

6. Very High and variable ILP source Or another useful view for performance arch source source CACHE MEMORY

If you need a gentle basic introduction to cache memories see this article Cache Miss I Coldness: first time the data is not there → Prefetching I Capacity: too much data accessed → organize data and code differently I Conflict: Multiple data mapped to the same location I Sharing: I True: Thread in another processor wanted the data, it got moved to the other cache I False: data in the same cache line used by another core source EXTRA

If you are interested in the effects of branch predictors (and you should) have a look here. Cache effects are very complicated to study and isolate. Read this article to have a better idea. QUESTIONS ?