Jaguar Microarchitecture
Alex Avery, Cody Smith

Agenda
● AMD Processors
● Jaguar Overview
● Example Hardware
● Core Pipeline
● Instruction Fetch and Cache
● Instruction Decoding
● Scheduling
● Integer & FP Execution
● Memory
● Cache

What is a Microarchitecture?
Microarchitecture is the Computer Organization
Microarchitecture + Instruction Set Architecture = Computer Architecture
A microarchitecture describes the electrical circuitry of the device; it is how the ISA is implemented.

AMD Processors
● Bobcat (2011)
● Piledriver (2012)
● Jaguar (2013)
● Steamroller (2014)
● Puma (2014)
● Excavator (2015)

Jaguar Overview
● Targets 2-25 W devices
● Low cost
● 28 nm technology
● Up to 4 cores
● Split L1 cache: 32 KiB instruction and 32 KiB data per core
● Unified L2 cache: 1-2 MiB, 16-way
● Out-of-order and speculative execution
● Integrated memory controller
● Two-way integer execution
● Two-way 128-bit floating-point execution

Example Hardware
● Gaming Consoles
  ○ Xbox One
  ○ PS4
● Desktop Processors
  ○ Athlon 5350
  ○ Sempron 3850
● Laptops/Mini PCs
  ○ A6-5200
  ○ E2-3000
● Tablets
  ○ A6-1450
● Embedded Processors
  ○ GX-420CA

Jaguar Core Pipeline

Instruction Fetch and Cache
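The size and associativity listed on this slide for the L1 instruction cache (32 KB, 2-way) fix how many sets the cache has and how an address splits into tag, set index, and byte offset. The sketch below works that out. It assumes a 64-byte cache line, which the slides do not state, and `CacheGeometry` is a hypothetical helper written for illustration. The data cache and L2 figures from the later slides are included for comparison.

```python
from dataclasses import dataclass

LINE_BYTES = 64  # assumption: the line size is not given in the slides


@dataclass(frozen=True)
class CacheGeometry:
    """Hypothetical helper describing a set-associative cache."""
    size_bytes: int
    ways: int
    line_bytes: int = LINE_BYTES

    @property
    def sets(self) -> int:
        # Each set holds `ways` lines, so sets = size / (ways * line size).
        return self.size_bytes // (self.ways * self.line_bytes)

    def split(self, addr: int) -> tuple[int, int, int]:
        """Return (tag, set_index, offset) for a byte address."""
        offset = addr % self.line_bytes
        set_index = (addr // self.line_bytes) % self.sets
        tag = addr // (self.line_bytes * self.sets)
        return tag, set_index, offset


l1i = CacheGeometry(size_bytes=32 * 1024, ways=2)        # L1 instruction cache
l1d = CacheGeometry(size_bytes=32 * 1024, ways=8)        # L1 data cache
l2 = CacheGeometry(size_bytes=2 * 1024 * 1024, ways=16)  # L2 at its 2 MB size

print(l1i.sets, l1d.sets, l2.sets)
print(l1i.split(0x12345))
```

Under the 64-byte assumption, the two L1 caches have the same capacity but very different set counts, because the 8-way data cache spends its capacity on more lines per set.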
● 6 stages
● 32 KB, 2-way set-associative L1 instruction cache
● Pseudo least recently used (pseudo-LRU) replacement algorithm
● 32 B instruction fetch window
● Branch predictors exploit characteristics of both direct and indirect branches as well as branch density

Instruction Decoding
● Can decode two x86 instructions per cycle
● Variable-length x86 instructions are decoded into complex micro-operations (COPs)
● Can handle 128-bit vector units as well as x86 Advanced Vector Extensions (AVX)

Scheduling
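Out-of-order execution with in-order retirement can be pictured with a toy model: each dispatched COP takes an entry in program order, COPs may finish in any order, and entries leave only from the head. This is a minimal sketch, not the actual hardware design. The class name, its methods, and the capacity of 4 are hypothetical, since the slides give no RCU size.

```python
from collections import deque


class RetireControlUnit:
    """Toy model: allocate entries in program order, retire in program order."""

    def __init__(self, capacity: int):
        self.capacity = capacity  # hypothetical size, not from the slides
        self.entries = deque()    # each entry is [cop_name, done_flag]

    def dispatch(self, cop: str) -> bool:
        """Allocate an entry for a decoded COP; fail if the unit is full."""
        if len(self.entries) >= self.capacity:
            return False
        self.entries.append([cop, False])
        return True

    def complete(self, cop: str) -> None:
        """Mark a COP as executed; this may happen in any order."""
        for entry in self.entries:
            if entry[0] == cop:
                entry[1] = True
                return
        raise KeyError(cop)

    def retire(self) -> list[str]:
        """Retire finished COPs from the head only, preserving program order."""
        retired = []
        while self.entries and self.entries[0][1]:
            retired.append(self.entries.popleft()[0])
        return retired


rcu = RetireControlUnit(capacity=4)
for cop in ("a", "b", "c"):
    rcu.dispatch(cop)

rcu.complete("c")         # the youngest COP finishes first
first = rcu.retire()      # nothing retires: "a" is still pending
rcu.complete("a")
second = rcu.retire()     # only "a" retires; "b" blocks "c"
rcu.complete("b")
third = rcu.retire()      # "b" and "c" retire together
print(first, second, third)
```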
● Out-of-order execution
● After instructions are decoded into COPs, they are dispatched
● Each COP allocates a Retire Control Unit (RCU) entry

Integer Execution
● Separate integer and floating-point units
● 2 symmetrical integer pipelines
● Integer addition/subtraction takes 3 cycles
  ○ Read operands
  ○ Execute
  ○ Write back
● 6-cycle multiplication
● Separate hardware divider

Floating Point Execution
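The 4-7 cycle add/subtract latency on this slide is the sum of the three stage counts it lists: 2 cycles to read operands, 1-4 cycles to execute, and 1 cycle to write back. The short sketch below makes that arithmetic explicit. The function name is hypothetical, and the slides do not say which operations fall at which end of the execute range.

```python
READ_CYCLES = 2       # from the slide: read operands
WRITEBACK_CYCLES = 1  # from the slide: write back


def fp_add_latency(execute_cycles: int) -> int:
    """Total FP add/subtract latency for a given execute-stage length."""
    if not 1 <= execute_cycles <= 4:
        raise ValueError("the slide gives an execute stage of 1-4 cycles")
    return READ_CYCLES + execute_cycles + WRITEBACK_CYCLES


# Latency for every execute-stage length the slide allows.
latencies = [fp_add_latency(n) for n in range(1, 5)]
print(latencies)
```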
● Designed for 128-bit wide execution
● Targets SSE and AVX vector extensions
● 2 asymmetrical FP pipelines
● 4-7 cycles per addition/subtraction
  ○ Read operands (2 cycles)
  ○ Execute (1-4 cycles)
  ○ Write back (1 cycle)
● Co-processor architecture
  ○ Dedicated decode, rename, out-of-order scheduler and retire queue

Memory
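Moving a load ahead of an older store whose address is still unknown is a bet that the two do not touch the same location. A toy model of that bet is sketched below: the load runs early, and when the store's address resolves, any younger load to the same address has to be replayed. The `Store` and `Load` types and both functions are hypothetical illustrations, not a description of how the Memory Ordering Queue or Store Queue is actually built.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Store:
    seq: int              # position in program order
    addr: Optional[int]   # None until the address is resolved
    value: int


@dataclass
class Load:
    seq: int
    addr: int
    value: Optional[int] = None


def issue_load(load: Load, memory: dict, store_queue: list) -> None:
    """Execute a load, possibly ahead of older stores with unknown addresses."""
    older = [s for s in store_queue if s.seq < load.seq]
    matches = [s for s in older if s.addr == load.addr]
    if matches:
        # Forward from the youngest older store to the same address.
        load.value = max(matches, key=lambda s: s.seq).value
    else:
        load.value = memory.get(load.addr, 0)


def resolve_store(store: Store, addr: int, executed_loads: list) -> list:
    """Resolve a store address and return younger loads that must replay.

    This check is conservative: every younger load to the same address replays.
    """
    store.addr = addr
    return [l for l in executed_loads if l.seq > store.seq and l.addr == addr]


memory = {0x100: 1}
store = Store(seq=1, addr=None, value=42)
load = Load(seq=2, addr=0x100)

issue_load(load, memory, [store])      # speculative: store address unknown
speculative_value = load.value         # stale value read from memory

replay = resolve_store(store, 0x100, [load])
for l in replay:
    issue_load(l, memory, [store])     # replay now forwards from the store
print(speculative_value, load.value)
```

When the store resolves to a different address, `resolve_store` returns an empty list and the early load stands, which is the case the re-ordering is betting on.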
● Separate load and store pipelines
● Aggressive re-ordering
  ○ Loads can occur out-of-order
  ○ Loads can be moved ahead of stores before the target address is resolved
● Memory Ordering Queue and Store Queue handle memory ordering

L1 Data Cache
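Pseudo-LRU replacement approximates true LRU with far less state. One common variant is tree pseudo-LRU, which for an 8-way set keeps 7 bits arranged as a binary tree, each bit pointing toward the half that should be evicted next. The sketch below implements that variant as an illustration. The slides do not say which pseudo-LRU scheme Jaguar uses, so treat the choice of tree-PLRU, and the `TreePLRU` class itself, as assumptions.

```python
class TreePLRU:
    """Tree pseudo-LRU state for one cache set (ways must be a power of two)."""

    def __init__(self, ways: int = 8):
        assert ways >= 2 and ways & (ways - 1) == 0
        self.ways = ways
        # Heap-ordered tree nodes: 0 = next victim is on the left, 1 = right.
        self.bits = [0] * (ways - 1)

    def touch(self, way: int) -> None:
        """Record an access: point every node on the path away from `way`."""
        node, lo, hi = 0, 0, self.ways
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if way < mid:
                self.bits[node] = 1   # used the left half, so evict from right
                node, hi = 2 * node + 1, mid
            else:
                self.bits[node] = 0   # used the right half, so evict from left
                node, lo = 2 * node + 2, mid

    def victim(self) -> int:
        """Follow the bits from the root to the way that should be replaced."""
        node, lo, hi = 0, 0, self.ways
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if self.bits[node] == 0:
                node, hi = 2 * node + 1, mid
            else:
                node, lo = 2 * node + 2, mid
        return lo


plru = TreePLRU(ways=8)
for way in range(8):
    plru.touch(way)
after_fill = plru.victim()     # matches true LRU here: way 0 is oldest

plru.touch(0)
after_reuse = plru.victim()    # true LRU would pick way 1; the tree picks way 4
print(after_fill, after_reuse)
```

The second result shows why the scheme is only "pseudo" LRU: the tree remembers which half was used most recently at each level, not the full access order.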
● 32 KB
● 8-way set associative
● Parity-protected writeback cache
● Pseudo-LRU replacement algorithm
● Can handle a 128-bit read and a 128-bit write each cycle
● Average latency of 3 cycles for an L1 hit

L2 Cache
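Parity, as used in the L1 data cache, can only detect a flipped bit. ECC, as used in the L2, can also correct it. The sketch below shows the correction idea with the textbook Hamming(7,4) code, which repairs any single-bit error in a 7-bit codeword. It is a stand-in chosen for brevity: the slides do not describe the actual code protecting Jaguar's L2 arrays, and the function names are hypothetical.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into a 7-bit codeword (positions 1..7)."""
    d3, d5, d6, d7 = data_bits
    p1 = d3 ^ d5 ^ d7   # even parity over positions 1, 3, 5, 7
    p2 = d3 ^ d6 ^ d7   # even parity over positions 2, 3, 6, 7
    p4 = d5 ^ d6 ^ d7   # even parity over positions 4, 5, 6, 7
    return [p1, p2, d3, p4, d5, d6, d7]


def hamming74_correct(codeword):
    """Return (data_bits, syndrome); a nonzero syndrome is the flipped position."""
    syndrome = 0
    for position, bit in enumerate(codeword, start=1):
        if bit:
            syndrome ^= position   # XOR of the positions of all set bits
    fixed = list(codeword)
    if syndrome:
        fixed[syndrome - 1] ^= 1   # flip the faulty bit back
    return [fixed[2], fixed[4], fixed[5], fixed[6]], syndrome


data = [1, 0, 1, 1]
codeword = hamming74_encode(data)

corrupted = list(codeword)
corrupted[4] ^= 1                  # flip the bit at position 5
recovered, syndrome = hamming74_correct(corrupted)
print(codeword, syndrome, recovered)
```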
● 1-2 MB (depending on application)
● 16-way set associative
● Unified, shared by 2 to 4 cores
● ECC (Error Correcting Code) protection for tag and data arrays
● Forms an EDC/ECC cache structure
● Minimum of 25 cycles per hit

Jaguar Benchmarks
● Athlon 5350
● Athlon 5150
● Sempron 3850

Athlon 5350 vs. Intel Core i3 3220 vs. Celeron J1900

Athlon 5350 vs. Intel Core i7 5930K
The Athlon 5350 has much lower performance; however, it offers:
● Much better efficiency
● Much lower cost
● Better performance per watt
● Better performance per dollar

Zen
● Entirely new core design
● New design family ‘Summit Ridge’
● Simultaneous Multithreading
● New cache system
● FinFET manufacturing process

Resources
http://www.anandtech.com/show/6976/amds-jaguar-architecture-the-cpu-powering-xbox-one-playstation-4-kabini-temash
http://www.realworldtech.com/jaguar/
http://www.tomshardware.com/reviews/microsoft-xbox-one-console-review,3681-3.html
https://nathanlamont91.wordpress.com/2015/03/22/my-report-on-the-amd-jaguar-quad-core-cpu/
https://www.deepdyve.com/lp/institute-of-electrical-and-electronics-engineers/the-floating-point-unit-of-the-jaguar-x86-core-1TVYueOORA
http://www.xbitlabs.com/news/cpu/display/20120904201534_AMD_Discloses_Peculiarities_of_Next_Generation_Jaguar_Micro_Architecture.html