Jakob Engblom, PhD, Product Management Engineer, Intel System Simulation team, Stockholm, Sweden, 2019-05-22

Computer Architecture - Uppsala - 2019-05-22

My Background

Jakob Engblom
. Computer Science (Datavetenskap), Uppsala: D92
. PhD, Systems, Uppsala
. Product Management Engineer, Intel System Simulation team, Sweden
  – Previously at IAR, Wind River
. Intel Evangelist – Simulation
. https://software.intel.com/en-us/meet-the-developers/evangelists/team/jakob-engblom
. http://engbloms.se/jakob.html

What Does Intel Do?

• Processors • Intel® ® Phi™ • SSD • • Chipsets • Intel® Xeon® • 3D XPoint™ • WiFi • Processors • Intel® Optane™ • Bluetooth • Chipsets • GNSS • Accelerators • 5G and Server Storage Connectivity desktop

• SoC-FPGA • Movidius • Processors • Development tools • FPGA • Nervana • Gateways • Compilers • FPGA-CPU • • Connectivity • Simulation solutions • Xeon • & Windows drivers • FPGAs • UEFI & BIOS

FPGA AI etc. IoT Software

What I do: Product Management

Market communications (“PR”)

Engineering Product Management Customer

Support

Sales

What’s in a ”Computer”?

(Main) Processor cores
. Run user-visible OS and applications
Main memory - RAM
Graphics and display
Audio and media processing
. Camera, speakers, image processing, ...
Storage – ”Disk”
. SATA, NVMe, M.2, SCSI, PCIe, ...
Input and output
. Local devices: USB, Thunderbolt, Serial, Bluetooth, ...
. Remote: Ethernet, WiFi, ...

Once Upon a Time...

The PROCESSOR was the essential part of a system
The goodness of the machine was measured in terms like
. Megahertz
. Instructions per cycle
. Cache size
The supporting chips did some basic stuff to let the processor do its job...
A better computer meant a better processor (mostly)

2009: Intel® Core™ i7 Processor: Still a Processor

Intel® Core™ i7-960 Processor (2009)
. http://hexus.net/tech/reviews/cpu/16187-intel-core-i7-x58-chipset-systems-go-fsb-invited/?page=3
. The processor chip is a processor with minimal connections to the rest of the system
. Cores + cache
. Memory controller – just moved on-chip!
. Intel QuickPath Interconnect (QPI) – link to the rest of the system

2009: Intel® X58 Express Chipset

IOH (I/O Hub)
. QPI to processor (on the previous slide)
. Graphics card
. Connection to ICH10
ICH10 (I/O Controller Hub)
. DMI link to the IOH
. Main IO chip
. SATA, Audio, USB, PCIe, Ethernet

2018: Intel® Core™ i9 Processor: Lots of Other Stuff

Intel® Core™ i9-9900K Processor (2018):
. High-end eight-core desktop processor
. https://en.wikichip.org/wiki/intel/core_i9/i9-9900k
On the chip:
. Graphics + media block bigger than four processor cores
  – 3D graphics, display, video decode, ...
. “System Agent” - Memory controller & IO, about the size of 2 processor cores
. L3 cache (2MB per core) not very large

2017: Intel® Z370 Chipset

Compared to Core i7-960, 7th and 8th gen processors have added:

. Integrated PCI Express (PCIe), version 3

. Integrated GPU, multiple displays, video decoding hardware, …

. Secure boot and other security functions (not shown)

. DMI 3.0 connection has about 160x the of the QPI from 2009

Intel® Z370 chipset is a single chip (PCH, Platform Controller Hub):

. 24 Additional PCIe version 3 lanes

. USB 3 and USB 2, 14 ports total

. Storage connections: SATA, eSATA, RAID, PCIe/NVMe, Intel® Optane™ Memory, eMMC, SDXC, …

. Advanced sound processing, including onboard DSP

. Management Engine (ME)

. Programming guide is 1700+ pages long!

Additional functions added on PCIe

. Wireless module: WiFi, Bluetooth, GNSS, …

Pure CPU chip: IO and Cache Dwarf the Cores

Intel® Core™ i7-6950X Processor (2016):
. Memory controller as big as 5 cores!
  – Four (4) channels of DDR4 2400!
. ”Queue, Uncore, and I/O” bigger than the 25MB L3 cache

http://www.anandtech.com/show/10337/the-intel-broadwell-e-review-core-i7-6950x-6900k-6850k-and-6800k-tested-up-to-10-cores

Innovation: The Instruction Set Itself

Instruction set architectures are evolving at a quick pace
. Better instructions = many times faster computations on specific tasks
Main trends:
. Vector compute = more math per cycle
. Virtualization = faster, more efficient, more capable virtual machines
. Cryptography = crypto on CPU not on accelerator
. Security = better SW-SW protection
Recent Intel examples:
. Intel® Core™ i7-4xxx Processor:
  – AVX2 vector processing
  – BMI1, BMI2 bit-manipulation instructions
  – https://software.intel.com/en-us/blogs/2017/01/09/resetting-the-lowest-n-set-bits
  – SMAP - Supervisor Mode Access Prevention
  – TSX – Transactional memory
. Intel® Core™ i7-5xxx Processor
  – RDSEED – Hardware random-number seed
. Intel® Core™ i7-6xxx Processor
  – SGX – Software Guard Extensions
Note that these instructions are virtually useless unless there are also supporting software libraries, SDKs, and compilers. People will not use them on their own without help.
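To make the bit-manipulation item concrete, here is a minimal software sketch of what a "reset the lowest set bit" operation computes (the BMI1 instruction BLSR does this in a single instruction). The Python function names are made up for illustration, the code assumes non-negative integers, and the loop is a plain software fallback, not the method used in the linked blog post.

```python
def reset_lowest_set_bit(x: int) -> int:
    """Clear the lowest set bit of a non-negative integer (what BLSR computes)."""
    return x & (x - 1)


def reset_lowest_n_set_bits(x: int, n: int) -> int:
    """Clear the n lowest set bits by repeating the single-bit step."""
    for _ in range(n):
        if x == 0:
            break  # nothing left to clear
        x = reset_lowest_set_bit(x)
    return x


if __name__ == "__main__":
    print(bin(reset_lowest_n_set_bits(0b10110, 2)))
```

A compiler or library can map the single-bit step to one instruction on processors that have BMI1, which is the kind of software support the slide says is needed for new instructions to matter.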

Example of the effect of ISA: AVX

Without AVX: With AVX and AVX512:

Source: https://www.anandtech.com/show/13400/intel-9th-gen-core-i9-9900k-i7-9700k-i5-9600k-review

Example: Instruction Set Flags - CPUID

See feature flags & instructions using Intel® Extreme Tuning Utility
. https://downloadcenter.intel.com/download/24075/Intel-Extreme-Tuning-Utility-Intel-XTU-
Software should check feature availability before executing new instructions
. CPUID instruction is crucial!
. Software adapts dynamically to the machine it is running on
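A minimal sketch of the "check before you execute" pattern. Python cannot issue CPUID directly, so this assumes a Linux-style `/proc/cpuinfo` text, where the kernel reports feature flags derived from CPUID on a `flags` line. The sample text, the function names, and the kernel names `"avx2"` and `"scalar"` are illustrative assumptions.

```python
# Hypothetical excerpt in the style of Linux /proc/cpuinfo.
SAMPLE_CPUINFO = "processor\t: 0\nflags\t\t: fpu sse2 avx avx2 bmi1 bmi2\n"


def parse_cpu_flags(cpuinfo_text: str) -> set:
    """Return the set of feature flags from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def pick_kernel(flags: set) -> str:
    """Choose an implementation based on what the machine reports."""
    if "avx2" in flags:
        return "avx2"    # fast path, only safe if the flag is present
    return "scalar"      # portable fallback


if __name__ == "__main__":
    print(pick_kernel(parse_cpu_flags(SAMPLE_CPUINFO)))
```

On a real Linux machine the text could be read from `/proc/cpuinfo` instead of the sample string; native code would typically query CPUID through a compiler intrinsic or library.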

Innovation Area: Networking

Ethernet speeds keep increasing
. 10GbE on Base-T: 2006
. 100GbE: 2010
. 40GbE & 25GbE
WiFi speeds keep increasing
Cellular 4G/LTE/5G speeds keep increasing
= more going in and out than ever before!
Network interfaces add intelligence
. Packet processing offload
. Integrated switches
. Virtualization – one physical interface appears as multiple virtual interfaces directly connected to virtual machines
  – Intel® VT-d for VM access to hardware
  – PCIe SR-IOV for multiple virtual devices in a single physical device

Innovation Area: Connectors

USB Type C
. Multiple interfaces in one connector
  – USB
  – Thunderbolt
  – HDMI
  – Displayport
  – Power
More flexible computer design
Small ports = thinner machines
More user friendly
m.2 connector
. Multiple interfaces in one connector
  – PCIe
  – SATA
  – USB
  – I2C, Serial, PCM, ...
More compact SSDs
Add-in other functions like modems

Summary

Innovation in computing today is really in the platform capabilities
Transistor budget being used to:
. Integrate previously separate functions onto processor die
  – Memory controller, GPU, IO, …
. Add new functions to the platform without increasing number of chips
. Add new instructions to resolve software bottlenecks
Buying a better machine:
. Faster disk and interface:
  – Once it was spinning disk on IDE
  – Then, SSD on SATA
  – Now, SSD on M.2 PCIe NVMe
. More and faster external IO
  – USB 3, USB 3.1 Type C, Thunderbolt, …
. Higher display resolution, multiple displays, high dynamic range (HDR), …
. Better network connectivity
  – WiFi standards, cellular standards, Bluetooth, Bluetooth Low Energy (BLE), …

What makes a system Tick? Answer: Firmware

Inside the processor, PCH, and other chips are many small programmable cores
. Any semi-complicated subsystem has a programmable core inside
The software running on these cores is called firmware
. It is not hardware
. But it is not as soft as software
. Firmware – long-standing name for close-to-hardware software

http://www.ganssle.com/book.htm
Disclosure: I wrote a chapter in the book

Example: SSD

Intel® Solid-State Drive Toolbox
. Looking at the Intel® 600p M.2 PCIe NVMe drive in one of my PCs
. Note the ”firmware revision”
  – There is a processor (or several) in there!
  – Updates are available to download and install

Example: Keyboard with 32-bit Processor

The tech specs for the Corsair* Gaming K95 keyboard:
. ”32-bit ARM* Processor”
. ”Display Controller”
Conclusion:
. If it is ”smart” or capable of acting independently, it has firmware in it

http://www.corsair.com/en-us/corsair-gaming-k95-rgb-mechanical-gaming-keyboard-cherry-mx-red

*Other names and brands may be claimed as the property of others

Example: Functionality Upgrades via Firmware

Sony* Playstation* 4 (PS4) HDMI controller upgraded from HDMI 1.4 to 2.0 to support HDR - using a firmware update!
Hardware had the bit-pushing ability needed, but not the protocol and copy-protection bits

http://arstechnica.com/gaming/2016/09/whats-up-with-ps4s-surprise-firmware-update-is-4k-around-the-corner/

”Russian Dolls”

The operating system and user see a device on PCIe, with memory-mapped IO just like all other devices

[Diagram: Applications and Operating system running on two main cores with Memory; devices Timer, Crypto, USB, PCIe, Serial, Disk; an Advanced Device exposing programming registers]

”Russian Dolls”

Inside the device, you have a complete computer system, often with serial ports for debug access, and maybe running a complete OS!

[Diagram: the same system as on the previous slide, with the Advanced Device opened up to show Firmware and a Small OS running on hidden cores, with its own Memory, Serial, Timer, and IO behind the programming registers]

What Types of Processors are we Talking About?

Firmware processor cores cover a broad range and are among the most diverse ecosystems of cores around
Many driving factors for core choice:
. Size of core
. Size of code
. Speed of processing
. Legacy of the subsystem
. Programmability only rarely top of the list of concerns for hardware designers
Classic embedded cores:
. i8051, H8, …
Standard embedded cores:
. ARM*, LEON* (SPARC*), MIPS*, ARC*, …
Digital signal processing cores:
. Ceva*, Tensilica*, ...
Full Intel® Architecture cores
Specialized custom cores
. Networking engines, pattern matching engines, ...

How Firm is Firmware?

Originally, firmware was very firm
. Hand-woven core memory was used to store programs
  – Software freeze four months before launch to allow it to be manually wired
. ”Mask ROMs” were added on top of microcontrollers of yore
. ROM chips that could not be changed were common up until the 1990s
Firmware today is mostly stored in changeable memory
. FLASH & EEPROM (Electrically Erasable Programmable ROM)
. FW is often loaded by the main processor
Firmware tends to change less often since it is kind of part of hardware and changes carry risk

Firmware Loading & Location

Fixed inside the device
. ROM (maybe), FLASH
. Often just a bootloader
Dynamically loaded
. UEFI, BIOS, OS bootloader starts and loads firmware onto devices in early boot
. Device driver loads firmware during OS boot
. Stored in main SoC on-chip or off-chip boot flash, or in OS disk file system

[Diagram: main system (Applications, Operating system, main cores, Memory, Timer, Serial, USB, PCIe, Disk, BOOT FLASH) and an Advanced Device with programming registers, hidden cores, Firmware RAM, BOOT ROM, local FLASH, Timer, Serial, and IO]

Your Computer – A Distributed System

Simplified diagram! Most often each USB endpoint contains a very small embedded processor too!

[Diagram: PC with OS and drivers on the main core, plus firmware (FW) in the Keyboard, Audio unit, Graphics unit, Thunderbolt controllers, and LAN; connected to USB port, Ethernet port, and Audio jack; external Display with its own display processor and FW]

Security?

If a subsystem with firmware has a channel to the outside, it is part of the system security perimeter
Example: Using WiFi chip firmware to take over phones
. https://googleprojectzero.blogspot.se/2017/04/over-air-exploiting-broadcoms-wi-fi_4.html
. ARM* Cortex-R4* processor
. Gets firmware code from main processor
. Code not written securely
. All memory RWX – no MMU defense

“The first blog post will focus on exploring the Wi-Fi SoC itself; we’ll discover and exploit vulnerabilities which will allow us to remotely gain code execution on the chip. In the second blog post, we’ll further elevate our privileges from the SoC into the operating system’s kernel. Chaining the two together, we’ll demonstrate full device takeover by Wi-Fi proximity alone, requiring no user interaction.”

Summary

Firmware is everywhere!
Software powers modern electronics in a very deep sense
A ”processor” is not ”a processor” – it is a heterogeneous semi-autonomous collective of many processors
Most of these processors are not exposed to end users or operating systems – they look and work like fixed-function hardware

What do we Want?

More performance ... With lower power consumption ... Giving off less heat = no fan ... With longer battery life ... Weighing less

NOT all that easy to do!

Power Efficiency Gains come from Many Sources

Manufacturing process Circuit design Computer architecture

System optimization Power management

Where Does the Power Go?

Total power: $P = P_{dynamic} + P_{leakage}$
. Dynamic power during actual switching
. Leakage power from just being powered-on
Dynamic power: $P_{dynamic} = C V^2 f$
. Basic capacitance
. × Voltage squared (V affects $f_{max}$)
. × Frequency
Note:
. Since increased frequency needs higher voltage, as a rule of thumb we have: $P_{dynamic} \sim f^2$
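A small numeric sketch of the power formulas on this slide, dynamic power as capacitance times voltage squared times frequency, plus leakage. The function names and the example values (1 nF switched capacitance, 3 GHz, 0.5 W leakage) are assumptions chosen only to make the arithmetic easy to follow.

```python
def dynamic_power_w(capacitance_f: float, voltage_v: float, frequency_hz: float) -> float:
    """Dynamic (switching) power: C * V^2 * f."""
    return capacitance_f * voltage_v ** 2 * frequency_hz


def total_power_w(capacitance_f: float, voltage_v: float,
                  frequency_hz: float, leakage_w: float) -> float:
    """Total power: dynamic power plus leakage power."""
    return dynamic_power_w(capacitance_f, voltage_v, frequency_hz) + leakage_w


if __name__ == "__main__":
    c, f = 1e-9, 3e9                      # hypothetical capacitance and clock
    full = dynamic_power_w(c, 1.0, f)     # at 1.0 V
    half = dynamic_power_w(c, 0.5, f)     # same clock at 0.5 V
    print(full, half, half / full)        # voltage enters squared
    print(total_power_w(c, 1.0, f, leakage_w=0.5))
```

Because voltage enters squared, lowering the drive voltage is the most effective single knob, which is why lower voltages feature so prominently in process improvements.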

Better Silicon

Process, circuit design, Moore’s law
. Transistors that use less power individually
. Lower drive voltages
  – A processor used to run on 5V, then 3.3V, now down to < 1V
  – Interesting side-effect: with a 95W power consumption, we have to feed 100A+
  – Approximately half of all “pins” on a package are for power distribution
. Lower leakage power
All things equal, the same design on a better process = lower power or higher frequency at the same power
. Over time, the same “nanometer” process is tuned and improved

Better Architecture

Allow the system to avoid waste
. Clock gating – shut off clock
  – Removes dynamic power
. Power gating – shut off power
  – Removes static power (leakage)
. Gating is applied to ever smaller parts of chip
Power states:
. Settings for frequency, voltage, and on/off
. Units set to lowest possible state to save power
. Increasing number of units and number of steps
Many slow cores, a few fast cores, or a mix?
Processor pipeline design – trade performance vs power
. More work per clock cycle = lower clock = lower power
. Trade top-end performance for lower power
  – 2x performance means far more than 2x power
Use accelerators
. Specialized accelerators use less power to do the same computation than a more general processor
Cache hierarchy
. Cache hit = lower power than memory access

System Optimization

Overall system design
Selection of component parameters, such as:
. Display resolution and size
. Battery size
. Processor choice
. Memory choice
  – LPDDR (Low-Power DDR) vs regular DDR
. Slow down wireless functions to save power
. Offloading functions to specialized accelerators
. Cooling efficiency

Power Management Software

Given that we have done our best in architecture & silicon...

Probably the biggest lever we have today to improve power/performance is the power management software

Essentially, a control feedback loop implemented in hardware, firmware, and software – driving power states and gating

[Diagram: power management control loop
 Inputs: Current operating goals, Temperature sensors, Power sensors
 Power management-relevant software: Firmware, Drivers, BIOS, Operating system, Applications
 Outputs: Power states (on/off), Clock frequency settings, Voltage regulation]
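One step of such a feedback loop can be sketched as follows: read the sensors, compare against limits, and nudge the clock frequency down or up. This is a deliberately simplified illustration; the function name, the limits (90 °C, 35 W), the step size, and the frequency range are all assumptions, and real power management uses far more inputs and states.

```python
def next_frequency_mhz(current_mhz: int, temp_c: float, power_w: float, *,
                       temp_limit_c: float = 90.0, power_limit_w: float = 35.0,
                       step_mhz: int = 100, min_mhz: int = 800,
                       max_mhz: int = 4000) -> int:
    """One iteration of a simple throttle controller (hypothetical limits)."""
    if temp_c >= temp_limit_c or power_w >= power_limit_w:
        # Too hot or drawing too much power: step the clock down, but not below the floor.
        return max(min_mhz, current_mhz - step_mhz)
    # Headroom available: step the clock up, but not above the ceiling.
    return min(max_mhz, current_mhz + step_mhz)


if __name__ == "__main__":
    freq = 3000
    # Hypothetical sensor readings: (temperature in C, power in W).
    for temp_c, power_w in [(60, 20), (85, 30), (95, 33), (96, 36), (70, 25)]:
        freq = next_frequency_mhz(freq, temp_c, power_w)
        print(temp_c, power_w, freq)
```

In a real system this decision is split across layers: firmware reacts quickly to sensor readings, while the operating system sets slower-moving goals.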

Power Management Firmware and Software Tasks

Optimize performance
. Profile current load
. Determine best way to set controls
. Balance power draw vs user experience
Avoid disaster
. Throttle to avoid drawing too much power from the platform
  – Each chip has a design limit
. Throttle to avoid overheating the chip
Sleep & wake-up
. Put system into deeper sleep
. Power off and on units in the correct order, wait until operation is stable

Hardware Control Points & Sensors

Hardware continuously adds more control points to reduce waste:
. Per-core voltage and clock-frequency adjustments (used to be per chip)
. More power states in more devices
. Faster changes to power states (off->on, clock & voltage scaling)
  – Note that going to a low power state is not free: it takes time to power or clock back up to full speed, so operations take more time to complete
Sensors multiply across the chips and system
. Power levels
. Thermal levels – very important to avoid cooking the chip!
All of which come together in a power management unit (or units)

Layered Optimization and Goal Setting

OS will ask power management hardware to go to certain states based on its idea of the current load
. ACPI states: ”active”, ”sleeping”, etc., for processor, devices, and global
. Applications can give hints to the OS about what they want from power control
Power controller firmware
. Will make quick adjustments based on the state
. Responsible for sequencing sleep, nap, hibernate states

[Diagram: layers from top to bottom: Application, Operating system, Driver, Main processor, Firmware, Power management unit hardware]

Note: ACPI Power States

ACPI (Advanced Configuration and Power Interface) defines sets of states
. Basic OS & Driver interface to power management
Applies to different system parts and levels
. Package/Chip
. Core
. Devices
. Links (such as PCIe)

[Screen capture from my living room gaming PC, using MSI* Command Center, 2017-03-05, annotated:]
. This CPU name string comes from CPUID and the chip directly!
. Sensor: Current CPU temperature
. Actuator: Core frequencies vary with the load
. Control example: fan speed vs temperature: higher temp = rev up the fan to compensate

Example: Situation Changes Quickly

More cores activated and the clock frequency goes up = higher package temperature

Screen capture from my living room gaming PC, using Intel® Extreme Tuning Utility (XTU), 2017-03-27

Example: Avoiding Disaster

My old * Android mobile phone
Playing some YouTube* videos
This happens when (I guess):
. Screen is on
. WiFi pulling in data
. Processor & accelerators decompressing video streams at high resolution
. Is overall a bit more than the package was designed to handle...

Power Management: Max is not Sum of all Max

Fictional example for illustration

Total chip power allowed = 35W
. Dictated by heat sink, power supply, and market segmentation
Total max power = 51W
. Throttle one part of the chip to allow others to run at full speed
Power management needs to keep the power inside allowed bounds

[Diagram: System-on-Chip with a 35W budget; maximum power per unit: Processor core 5W, Vector unit 3W, Processor core 5W, Vector unit 3W, L2 Cache 5W, Memory controller 10W, Graphics unit 15W, IO 5W. Hypothetical chip, rather simplified]

Power Management: Set According to Workload

Compute-focus:
. Power up cores, memory, and vectors
. Throttle graphics to make room
. Turn off IO, we assume we run from memory

[Diagram: System-on-Chip with a 35W budget; power per unit: Processor core 5W, Vector unit 3W, Processor core 5W, Vector unit 3W, L2 Cache 4W, Memory controller 10W, Graphics unit 5W, IO 0W. Hypothetical chip, rather simplified]

Power Management: Set According to Workload

Gaming:
. Graphic processing most important
. Run one core at full speed – latency of processor work is important
. Forbid the use of vector units – assume that is all on the graphics unit
. A bit of IO needed for sound and chat
. Memory controller also needs power

[Diagram: System-on-Chip with a 35W budget; power per unit: Processor core 5W, Vector unit 0W, Processor core 0W, Vector unit 0W, L2 Cache 4W, Memory controller 10W, Graphics unit 15W, IO 1W. Hypothetical chip, rather simplified]
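The budget arithmetic from these three slides can be checked with a short sketch. The per-unit wattages are read off the slide diagrams (the mapping of numbers to units is inferred from the layout), and the dictionary keys and function names are made up for illustration.

```python
BUDGET_W = 35  # total chip power allowed on the hypothetical chip

# Maximum power per unit, and the two workload settings from the slides.
MAX_W = {"core0": 5, "vector0": 3, "core1": 5, "vector1": 3,
         "l2_cache": 5, "memory_controller": 10, "graphics": 15, "io": 5}
COMPUTE_W = {"core0": 5, "vector0": 3, "core1": 5, "vector1": 3,
             "l2_cache": 4, "memory_controller": 10, "graphics": 5, "io": 0}
GAMING_W = {"core0": 5, "vector0": 0, "core1": 0, "vector1": 0,
            "l2_cache": 4, "memory_controller": 10, "graphics": 15, "io": 1}


def total_w(setting: dict) -> int:
    """Sum the power assigned to all units."""
    return sum(setting.values())


def is_valid(setting: dict, budget_w: int = BUDGET_W) -> bool:
    """A setting is valid if no unit exceeds its max and the total fits the budget."""
    within_unit_max = all(setting[unit] <= MAX_W[unit] for unit in setting)
    return within_unit_max and total_w(setting) <= budget_w


if __name__ == "__main__":
    for name, setting in [("max", MAX_W), ("compute", COMPUTE_W), ("gaming", GAMING_W)]:
        print(name, total_w(setting), is_valid(setting))
```

Running every unit at its maximum exceeds the budget, while both workload-specific settings fit exactly, which is the point of the slide title: the chip maximum is not the sum of the unit maxima.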

“Turbo” Processor Speed and Multicore

Processor speeds typically defined:
. Base frequency
. Max/turbo frequency
When high performance is needed:
. Use only a few cores = clock higher
. Use many cores = clock lower
. Using heavy units like AVX = lower
. Example graph for Intel® Xeon® Platinum 8180 28-core processor: https://www.anandtech.com/show/11544/intel-skylake-ep-vs-amd-epyc-7000-cpu-battle-of-the-decade/8

Summary

Power saving comes from silicon improvement, architecture improvements, system optimization, and power management
Chips are full of sensors and actuators used by power management
Power management is a nested dynamic feedback loop
Broken power management can literally fry a chip

Wind River Simics® – System-Level Virtual Platform

Hardware: A Hard Development Platform?

Hardware is Hard When it is in...

Not yet available Flaky prototype stage Not available anymore

Hardware is Hard When it is...

Inconveniently large & complex Dangerous to play with Inaccessible & expensive

Solution: Simulate the Hardware = Virtual Platform

Apps

OS

HW

Wind River® Simics®

About Wind River Simics®

Wind River Simics® History

Development started in 1991
. Spin-off from research project
Virtutech company founded in 1998
. Sun & Ericsson first customers
Acquired by Intel in 2010
. Sales and marketing put into Wind River
. Large internal use & development at Intel
Core development team still in Stockholm
. With local development teams around the world, doing integration and modeling
Major milestones
. 2.0: Heterogeneous systems
. 3.0: Reverse execution & debug, 2005
. 3.2: Intel VT-X acceleration
. 4.0: Multi-threaded (coarse), 2008
. 4.2: Distribution, 2009
. 4.4: Eclipse GUI, 2010
. 4.6: TCF Debugger, 2012
. 5: Multicore multithreading, 2015
. 6: More threading & integration, 2018

How it Works

Full system virtual platform
. Virtual/simulated target hardware
. Run the same software as the physical hardware
Important properties:
. Fast enough to run real software workloads
. Simulate any computer system
. Single board, multiple boards, standard parts, custom chips, IO, networks, …
Frees testing and development from the dependence on physical hardware

[Diagram: stack from top to bottom: User-level application code (Apps); Middleware and system libraries; Target operating system(s) (OS); Virtual/simulated target hardware and Network (HW); Wind River® Simics®; Host operating system; Host hardware]

Simulate any size System

Systems of systems

Racks, backplanes, cabinets, networks

Boards and single machines Chipsets Processor cores and SoCs

What’s the Point?

Fundamentally Simics is about running real software on virtual hardware in order to test & debug the software, the software-exposed aspects of the hardware, and the hardware design
“Software” can mean many things…
. Firmware, that is deeply hidden inside a chip
. BIOS/Bootloader/UEFI, that is used to boot the machine
. Device drivers, that manage hardware for an operating system
. Operating systems
. Middleware, providing services for other software
. Applications, that any programmer would write
. Distributed systems, software running across many separate machines
. From bytes to terabytes of code!

Simulation as Tool

The power of Simics is to bring simulation to the domain of concrete software
Build a model of reality, and then do experiments on the model
Simulation as a design & development & research tool is well established
. Explore things that cannot be done for
. Quickly try different things very cheaply
. Vary parameters over large spaces not possible with real hardware
. Observe the internals of a system
. Disturb/change the internal state of a system
. Debug behaviors more efficiently thanks to better insight and control
We make systems work better by allowing the software to be run on virtual hardware

Simics Level of Abstraction

Goal: Fast & scalable simulation
. Transaction-level modeling (TLM)
. Event-driven simulation, not cycle-driven
. Packet-level models of networks
. Loose timing models
Goal: run the real software
. Model function & basic timing
. Add timing and µarch when needed
. Target model includes all software-visible functional aspects of hardware, such as processor instructions, supervisor modes, device registers, interrupts, etc.
Lazy and agile modeling
. Build out platform from core to all over time

[Diagram: software stack (User application code, Middleware and libraries, Target operating system(s)) on top of the target model: System memory map, Processor instruction set, Device register interface (not bus interface). Optional added detail: Cache model (timing), Processor timing models, Power models, Cycle-accurate simulators from hardware designers. Plot of Scope and speed against Detail of model, over Time]

Wind River Simics® use cases

Wind River® Simics®: Throughout the Life Cycle

Product timeline:
. Design & Architecture
. Bring-up and platform development
. Application development
. Test and integration
. Deployment & maintenance

Architecture: Example of a Hardware Block

Evaluate the performance of the block under real workloads

[Diagram: Simics target system model in Wind River® Simics®. Target machine with Cores, RAM, Disk, APIC, FLASH, Ethernet, USB, Serial, GPU, and a detailed model of an accelerator block (microarchitecture, microengines, buses, etc.) with its Firmware. Software: benchmark or typical user application on Linux with a device driver. Network traffic generation inside or outside of Simics, for example from a second Linux machine]

“Shift Left” – Accelerating Product Development

Provide hardware models well in advance of RTL and silicon
. Allow software development before silicon arrives
  – IP-block firmware, BIOS, drivers, ...
. Shorten time to market by overlapping software and hardware design
. Decouple hardware and software schedules for reduced risk
. Validation of hardware, software, and their integration can start earlier
”Better products faster”

Ecosystem Enablement

Critical component – but not a product in itself
Start early using virtual platform models – even at OEM-level
. Board design
. Firmware dev
. UEFI customization
. Device drivers
. New HW features

[Diagram: Intel chips and chipsets as a component of a custom board, which is part of the product]

Heterogeneous Integration Platform

Simics serves as a simulation platform, integrating all kinds of models

[Diagram: Simics heterogeneous target system model in Wind River® Simics®. Software stack: User programs, Middleware, Operating system, Hardware drivers, UEFI/BIOS/Boot code, Firmware. Models integrated: Simics ISS and other ISS; RAM, Flash, Disk; device models in DML, C/C++, and Python; SystemC TLM subsystems, some with internal ISS and firmware, connected through transactors (Xtor); subsystems in other frameworks; detailed architecture model of an entire chip; SystemC detailed bus model; RTL simulator, FPGA prototype, or big-box emulator; environment model with sensors and actuators]

Continuous Integration

Tests running mostly on simulation in order to:
• Do integration pre-si and post-si
• Shorten test latency
• Run each test more often
• Run more and more varied configurations
• Provide suitable configurations
• Test what cannot be tested on hardware

[Diagram: Developer changes or adds code → Pre-CI test → Build system → Unit test → Subsystem-level test → System-level test → OK → Continuous delivery and Quality assurance]

System-Level Super Debugger

. Insight into all components
. Synchronous entire-system stop
. Trace anything
. System-level symbolic debug
. Unlimited powerful breakpoints
. Record-replay debug
. Reverse debug
. Repeatability & collaboration between developers

Breakpoint examples:
break -x 0x0000 length 0x1F00
break-io uart0
break-exception int13
break-log “spec violation”

More Information

Wind River® Simics® product:
. http://www.windriver.com/products/simics/
My blog on simulation:
. https://software.intel.com/en-us/meet-the-developers/evangelists/team/jakob-engblom
My personal blog:
. http://jakob.engbloms.se
Intel Software makes other programming tools available for free to students:
. https://software.intel.com/en-us/qualify-for-free-software/

Legal Disclaimers

• Intel ’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at intel.com, or from the OEM or retailer.
• No computer system can be absolutely secure.
• Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase. For more complete http://www.intel.com/performance.
Intel, the Intel logo, Xeon, Xeon Phi, Quark, Core, 3D XPoint, Optane are trademarks of Intel Corporation in the U.S. and/or other countries.
*Other names and brands may be claimed as the property of others.
© 2018 Intel Corporation