Simulation: Modeling + Execution

Jakob Engblom, PhD
Uppsala University
[email protected]

On Simulation

• Build a model of the system
• Try various scenarios on this model
  – Experimental, not analytical approach
• Understand the real system by working with the model
  – More available
  – More inspectable
  – Less dangerous

ESSES, 4 Sept 2003

Simulation or Analysis

• Simulation gets closer to the real world
  – More details
  – Fewer assumptions
  – High computational workload
• Analytical models
  – Efficient predictors
  – Low computational workload
  – ... but more removed from the world = less accurate

Sufficient Level of Detail

• Maintain sufficient details
  – To observe relevant aspects of reality
  – To avoid artifacts of the experiment
• Abstract away unimportant aspects
  – Newtonian vs. quantum physics
  – Timing vs. function
• Danger: bad abstractions = bad simulation

Scope versus Abstraction

• To simulate the universe, the units of simulation have to be galaxies
• Simulating a single atom, we can use the incredible detail of quantum mechanics and string theory
• Reasonable to simulate: scope proportional to abstraction
[Figure: level of abstraction vs. scope of model, from string theory (single atom) up to galaxies (universe)]

Example: Scope/Detail tradeoff

• "GPL" (Grand Prix Legends)
• "Life-like" action:
  – Momentum
  – Friction
  – Steering
  – Engine torque
• Not nuts & bolts of cars


Simulation is never perfect

• It is never quite the real thing ...
• ... but it can be very close indeed

Simulating Computers

Simulating Computer Systems

• What do we need to simulate?
  [Figure: processor, program, peripherals, stimuli]
• We need to decide the level of abstraction
  – More detail = smaller scope
  – Less detail = larger scope
    • Size of systems that can be investigated
    • Number of different systems
• Measure of scope: speed
  – As number of software instructions per second


Detailed Hardware Models

• ...-level model
  – Very close to actual implementation
• Small scope
  – Small piece of HW
  – Small programs
  – Stimuli at bit level
• Speed: 100s of instructions per second
  – With 25 MUSD hardware: 10-100 KIPS
• Necessary for hardware development

Instruction-Set Simulation

• Model the computer at the instruction-set level
  – Stable & defined interface
  – The level where hardware & software meet
  – Stimuli at transaction level
• Abstractions to increase scope:
  – Keep functionality correct
  – Vary fidelity in timing
  – Simplify some behavior
• Key issue: there can be no software-visible difference (including to the OS)
• Speed: 10 KIPS to 1000 MIPS
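The instruction-set level can be made concrete with a toy interpreter: architectural registers and a program counter, nothing below that. This is only a sketch — the three-operand mini-ISA below is invented for illustration, not Simics or any real instruction set:

```python
# Minimal instruction-set simulator sketch (hypothetical toy ISA).
# Models the machine at the instruction-set level only: architectural,
# software-visible state, with no pipeline or gate-level timing.

def run(program, max_steps=100):
    """Interpret a list of (op, a, b, dst) tuples on a tiny register machine."""
    regs = [0] * 8          # architectural registers, all software-visible
    pc = 0                  # program counter
    executed = 0
    while pc < len(program) and executed < max_steps:
        op, a, b, dst = program[pc]
        if op == "li":      # load immediate: regs[dst] = a
            regs[dst] = a
        elif op == "add":   # regs[dst] = regs[a] + regs[b]
            regs[dst] = regs[a] + regs[b]
        elif op == "jnz":   # jump to b if regs[a] != 0
            if regs[a] != 0:
                pc = b
                executed += 1
                continue
        pc += 1
        executed += 1
    return regs, executed

regs, n = run([("li", 2, 0, 0), ("li", 3, 0, 1), ("add", 0, 1, 2)])
# regs[2] is now 2 + 3 = 5, after 3 executed instructions
```

The interpreter loop is also where "instructions per second" as a measure of scope comes from: speed is simply how fast this loop (or its optimized equivalent) retires target instructions.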

Sufficient Detail of Model

• Complete from a software perspective
  – All readable values represented
  – All registers of the CPU implemented
  – Software = OS, drivers, applications, middleware, ...
• Hardware considered as a set of devices
  – I/O-space or memory-mapped
  – Behavior at the level seen by device drivers
• No "abstract" networks, all concrete
• Next slide: example of detail required
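The "hardware as a set of memory-mapped devices" view can be sketched as below. The UART device, its register offsets, and the address 0x1000 are all invented for illustration; the point is that a driver-visible register read or write is routed to a behavioral device model:

```python
# Sketch of "complete from a software perspective": every register a
# driver can touch is represented. Device layout is hypothetical.

class UartModel:
    """Behavioral model of a UART at the level seen by a device driver."""
    def __init__(self):
        self.tx_buffer = []
        self.status = 0x01        # bit 0: transmitter ready

    def read(self, offset):
        if offset == 0x0:         # status register
            return self.status
        return 0                  # other offsets read as zero here

    def write(self, offset, value):
        if offset == 0x4:         # data register: transmit one byte
            self.tx_buffer.append(value & 0xFF)

class MemoryMap:
    """Routes CPU loads/stores to device models by address range."""
    def __init__(self):
        self.regions = []         # list of (base, size, device)

    def map(self, base, size, device):
        self.regions.append((base, size, device))

    def read(self, addr):
        for base, size, dev in self.regions:
            if base <= addr < base + size:
                return dev.read(addr - base)
        raise KeyError(hex(addr))

    def write(self, addr, value):
        for base, size, dev in self.regions:
            if base <= addr < base + size:
                dev.write(addr - base, value)
                return
        raise KeyError(hex(addr))

bus = MemoryMap()
uart = UartModel()
bus.map(0x1000, 0x8, uart)        # made-up address window
status = bus.read(0x1000)         # driver polls status: 0x01
bus.write(0x1004, 0x41)           # driver transmits 'A'
```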


Instruction-Set Simulation

• To run real workloads, you need
  – Hardware: CPU & devices
  – OS and other services
  – Stimuli to feed them
• Common methods to achieve this
  – Full-system simulation
  – Virtualization
  – User-level simulation

Full-System Simulation

[Figure: Virtutech Simics turns one physical computer into virtual computer systems of many different types]

Full-System Simulation

[Figure: the whole stack is simulated — user program, middleware, DB, servers, operating system, on top of simulated hardware: CPU, network, disk, GPU, RAM, device controller]

Not Full-System Simulation: Virtualization

[Figure: real OS & system software run directly on real hardware; one physical computer hosts several virtual computers of the same type]


Not Full-System Simulation: User-level simulation

[Figure: a real user program runs on top of a simulated OS, services, and some hardware — middleware, DB, servers, operating system, CPU, RAM]

Speed

• Depends on the level of timing detail in the model
• Slowest: cycle-accurate simulation
  – Hardware timing modeled in great detail
• Fastest: emulation (user-level only)
• Sweet spot: somewhere in between
  – Simics tries to hit this spot
  – Configurable level of detail

Speed

[Figure: accuracy vs. speed, from 10 KIPS to 1000 MIPS — detailed hardware simulation / cycle-accurate simulator (>10,000x slowdown), full-system simulation (20-400x), emulator (5x), virtualization]

Going up in Scope

• Interesting systems are larger than a single CPU
• Multiprocessors
  – Homogeneous, like servers
  – Heterogeneous, like mobile phones
• Distributed systems
  – Local-area networks
  – Embedded CAN buses
  – Networks-on-chips
• = Simulated shared memory, networks


Network Simulation

• Level of simulation
  – Entire packets, not the physical layer
  – Simulate the network cards in the nodes
[Figure: a simulated network of simulated machines, with an interface to a real network of physical machines if needed]

Distributed Network Simulation

• Spread the simulation across multiple machines
  – Necessary increase of speed
• Still, maintain determinism
  – Synchronize the simulated machines
  – One machine stops, all machines stop
  – Global checkpointing & restore
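The "one machine stops, all machines stop" discipline can be sketched as a lockstep scheduler: all simulated machines advance in fixed time quanta, in a fixed order, so a run is reproducible. The Node class and quantum scheme are stand-ins, not any real simulator's API:

```python
# Deterministic lockstep synchronization sketch: no simulated node is
# ever allowed to run ahead of the others by more than one quantum.

class Node:
    def __init__(self, name):
        self.name = name
        self.virtual_time = 0
        self.log = []

    def advance(self, quantum):
        """Simulate this node for `quantum` virtual time units."""
        self.virtual_time += quantum
        self.log.append((self.name, self.virtual_time))

def run_lockstep(nodes, quantum, end_time):
    """Round-robin the nodes in a fixed order until end_time."""
    now = 0
    while now < end_time:
        for node in nodes:      # deterministic iteration order
            node.advance(quantum)
        now += quantum
    return now

nodes = [Node("a"), Node("b")]
run_lockstep(nodes, quantum=10, end_time=30)
# Every node ends at virtual_time 30, with the same interleaving every run.
```

Global checkpointing falls out of the same structure: stop at a quantum boundary, where all nodes agree on the virtual time, and snapshot everything.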

Simulation Advantages

• Configurability
  – Simulate anything, independent of available hardware
    • Target architecture
    • System configuration
• Availability
  – Easy to copy a setup, no manufacturing involved
• Determinism
  – Removes real-world indeterminism
  – Synchronization across machines and networks


Simulation Advantages

• Checkpoint & restart
  – Save the state of a machine to reload later
  – Parallelize & repeat runs
  – Distribute fixed starting points
• Non-intrusive inspection & tracing
  – Any events or state in the machine
  – IO events, hardware events
  – Does not affect the running system
• Deep inspection of system state
  – Caches, TLBs, registers, device registers, buffers, ...

Simulation Advantages

• Sandboxing
  – Completely walled-in
  – No hidden communications
  – Undo state changes
  – Dangerous experiments possible
    • Viruses, worms, buffer overflows, ...
• Magic instructions
  – Allow programs to communicate with the outside
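A minimal sketch of checkpoint & restart, assuming the simulated machine's state can be captured by deep-copying a state object. A real simulator serializes the full machine (registers, RAM, device state) to disk, but the control flow is the same:

```python
import copy

# Checkpoint & restart sketch: snapshot a toy machine, run on, then
# rewind to the snapshot. The Machine class is a stand-in.

class Machine:
    def __init__(self):
        self.regs = {"pc": 0, "r0": 0}
        self.ram = bytearray(64)

    def step(self):
        self.regs["r0"] += 1
        self.regs["pc"] += 4

def checkpoint(machine):
    return copy.deepcopy(machine)     # capture the full state

def restore(snapshot):
    return copy.deepcopy(snapshot)    # reload; snapshot stays reusable

m = Machine()
m.step()
ckp = checkpoint(m)                   # fixed starting point
m.step(); m.step()                    # run 1 moves on
m = restore(ckp)                      # rewind for run 2
# m.regs["r0"] is back to 1
```

Because `restore` copies the snapshot rather than handing it out, the same checkpoint can seed any number of parallel or repeated runs.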

Integrated Systems

• Highly-integrated devices on the rise
[Figure: system-on-chip with CPU, DSP, data memory, code memory, program, peripherals, Bluetooth, GSM radio, LCD driver]

HW/SW Cosimulation

• Develop HW & SW in parallel
• Simulate hardware and software together in development of the entire system


Big Systems & Small Details

• To achieve speed: reduce the level of detail
• To capture important effects: increase the level of detail
• Solution: model only parts at great detail
  – Finished hardware can be modeled simply
  – Model only what needs to be observed
  – Mostly, no need for RTL-level understanding

Transactions vs Pins

• Transaction-level modeling
  – Model transactions as a unit
  – Level of model: "memory read" / "network packet send" / ...
  – Only when something is activated
• Pin-level modeling
  – Model the detailed electronics of a transaction
  – Level of model: individual pins
    • Clocked pulsing of transmission pins
    • Every clock cycle

Clocking vs Blocking

• Traditional hardware modeling (clocking)
  – One (or two) steps per clock cycle
  – Clock to generate evolution of internal state
  – = All devices called each cycle
  – Large overhead for context switching
• Optimized hardware modeling (blocking)
  – Only call when events (reads, writes) occur
  – Evolve internal state several cycles at a time
    • Count the time since the last activation
  – Lower context-switch overhead

Transactions/Events vs Pins/Clock

• Example: device read operations (CPU ↔ device)
• Transaction:
  – Call device model: (op=read, address=0x17)
  – Immediate reply: data=0x42
• Pins:
  – Set address pins to 00011110
  – Drive clock pin to 1 and then 0
  – ... until the data-ready pin is 1
  – Then read 01000010 from the data pins
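The blocking style can be sketched with a toy timer device that catches its counter up only when a transaction arrives, instead of being clocked every cycle. The device and its register layout are invented for illustration:

```python
# Transaction-level, "blocking" device model sketch: the device is called
# once per read/write transaction and evolves its internal state several
# cycles at a time by counting cycles since the last activation.

class TimerDevice:
    def __init__(self):
        self.count = 0          # free-running counter value
        self.last_cycle = 0     # virtual cycle of the last activation

    def access(self, now, op, offset):
        # Catch up: advance internal state across all elapsed cycles
        # in one step (this is the "blocking" optimization).
        self.count += now - self.last_cycle
        self.last_cycle = now
        if op == "read" and offset == 0x0:
            return self.count   # immediate reply in the same call
        return None

timer = TimerDevice()
timer.access(now=100, op="read", offset=0x0)   # returns 100
timer.access(now=250, op="read", offset=0x0)   # returns 250
# Two model calls cover 250 simulated cycles, versus 250 calls
# if the device were clocked every cycle.
```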


HW/SW Cosimulation: fast

[Figure: Simics — application, (RT)OS, drivers on a CPU core; memory and devices as behavioral models; interface: transactions, events, maybe clock cycles]

HW/SW Cosimulation: detailed

[Figure: Simics coupled to a VHDL/Verilog simulator running an RTL-level simulation; interface: pins, clock cycles]

Device Modeling

• Large part of the work for a platform
  – Processors: few and standardized
  – Devices: (very) many and varied. But simpler.
  – Still pretty fast at transaction level
• Modeling devices:
  – C/C++/Python with simulator APIs
  – SystemC
  – VHDL/Verilog
  – Graphical languages (Magic-C)

Stimulating a Simulation


Stimuli

• Without proper stimuli, a model is useless
• Feed mechanism
  – How to get information into the simulation
• Data generation
  – What to supply to the simulation
  – Can get tricky

Regular Computers

• Fixed inputs
  – SPEC benchmarks: loaded from disk
• Network
  – Load generation on simulated machines
  – Load generators on real machines
  – Interface to a real network
• Keyboard & mouse
  – Map directly to the real device
  – Easy for PC-on-PC-style
  – Interactive user

Non-traditional Computers

• Phones, navigation computers, PDAs, etc.
• Application development
  – Use a GUI to provide interactive sessions with the user
  – Keyboard, joystick, touch screen
  – Not radio data etc.

Physical World Interaction

• Special simulated devices
  – Sensors & actuators
• Data sources
  – Statistical models of real system behavior
  – Simulation models of physical reality
  – Hardware-in-the-loop simulation


Configuration as Stimuli

• Stimuli = hardware configuration
• Booting an operating system
  – Test of OS software vs hardware
  – Reconfigure hardware, alter devices
• Self-configuring systems
  – Networks & other distributed systems
  – Master election, device discovery, etc.
  – Adding/removing simulated nodes

Workload Scaling

• Problem: simulation is slow
  – Especially for detailed architectural simulation
  – Slowdown 10000x:
    • 1 minute real time = 7 days simulation time
• Scale (down) workloads to fit
  – Smaller data sets
  – How to make them representative of full runs?
  – Tricky problem in its own right
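The slowdown arithmetic on this slide checks out: at a 10,000x slowdown, simulating one target minute costs 10,000 host minutes, which is about a week.

```python
# Host-time cost of a scaled workload under a given simulator slowdown.

def host_time_days(target_seconds, slowdown):
    """How many host days it takes to simulate `target_seconds`."""
    host_seconds = target_seconds * slowdown
    return host_seconds / (24 * 3600)

days = host_time_days(target_seconds=60, slowdown=10_000)
# 60 s * 10,000 = 600,000 host seconds, roughly 6.9 days
```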

Using Simulation

Software Development

• Low-level software development
  – Supervisor-level (OS) & interrupt code debug
  – Inspection of system state
  – Device access tracing & breakpoints
  – Debugging unfinished operating systems
  – Developing drivers
• High-level software development
  – Powerful debugger, with checkpointing


Hardware Replacement

• Embedded HW
  – Cheaper, more convenient, available, stable
  – Often 10000+ USD development platforms
• Virtual platform for early software dev
  – Requires the ability to run operating systems
  – Boards under development
  – AMD64 (Hammer, Opteron, Athlon 64)
    • Saved months for the Linux/AMD64 ports
  – Next-gen UltraSparcs, ...

Hardware Development

• Model hardware in development
• Test components before a physical prototype exists
• Stimulate HW with real workloads
• HW/SW cosimulation
  – At various levels of detail
• Shortens time to market dramatically

Parallelization of Development

[Figure: traditional flow — board design, then board prototype, then software development; handoff to the software team only when "working" hardware exists. With simulation — board design & simulator built in parallel, simulator = reference; software development starts early, using a simulation of the hardware platform]

Network Software

• Develop network stacks & protocols
  – Easy to instrument the network, trace traffic
  – Easy to inject packets
  – No interference from other traffic
  – Synchronous breaks at important events
• Try network configurations
  – Large networks
  – Pathological topologies

Performance Tuning

• Performance tuning of software
  – Trace & statistics on performance events
    • Cache misses, TLB misses, disk accesses
    • Memory access patterns
  – Get first-order estimates from event counts
• Absolute performance measurements
  – Require very detailed models
  – Not a design goal of large-scale simulators

Faults and Boundary Cases

• Fault injection
  – Repeatable, no physical damage necessary
  – Fault-tolerant systems, safety-critical systems
  – Examples: next slide
• Boundary-case testing
  – Extremely small or large configurations
  – Intense bursts of interrupts
  – Communications latencies and intensity

Fault Injection Examples

[Figure: faults injected into a simulated system — corrupt sensor measurements, corrupt CPU register values, transient errors in device transmission, permanent bit errors in RAM, corrupt network packets, unplug a device, kill an entire subsystem]

Fault Injection & Checkpointing

[Figure: boot the system SW, position the workload, take a checkpoint. Restore the checkpoint and run the workload twice — once clean, once with an injected fault — checking results after each run. Did the fault affect the result?]
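The checkpoint-and-compare loop in the figure can be sketched as follows. The target machine, the workload, and the injected bit-flip are all stand-ins; what matters is the structure — checkpoint once, then compare a clean run against a faulted run from the same starting point:

```python
import copy

# Fault-injection sketch: take a checkpoint, run once cleanly,
# restore, flip a bit, rerun, and compare results.

class Target:
    """Stand-in for a simulated machine: registers plus a tiny workload."""
    def __init__(self):
        self.regs = [0, 0, 0, 0]

    def run_workload(self):
        for i in range(4):
            self.regs[i] += i
        return sum(self.regs)          # the "result" checked after each run

m = Target()
ckp = copy.deepcopy(m)                 # take checkpoint before the workload

clean = m.run_workload()               # run 1: no fault; result is 6

m = copy.deepcopy(ckp)                 # restore the checkpoint
m.regs[2] ^= 1 << 3                    # inject a transient bit error
faulty = m.run_workload()              # run 2: result is 14

affected = clean != faulty             # did the fault affect the result?
```

Because the simulator is deterministic, any difference between the two runs is attributable to the injected fault alone — the property that makes this methodology work.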

Teaching

• Enable hands-on experience
  – Computer architecture
  – Embedded systems programming
  – Operating systems
    • Debug half-finished systems
  – System management
• Same setup for all students, easy hand-ins
• Easy to restore system state
• No risk to real machines and networks

Simulate with Care

Obtaining Significant Results

• Computer architecture research
  – 90% or more done in simulation
• Measure of success: effect of a modification to a reference machine
  – What is a significant result? -5%? +10%?
• How real is the machine modified?
  – SimpleScalar is not a real processor
• = need for quite extensive modeling

Wisconsin Experiments

• Mark Hill et al, IEEE Computer Feb 2003
• Investigating potential pitfalls of simulation
• Detailed microarchitectural modeling
  – Pipeline, caches, reordering, the works
• Randomized L2 miss time (80-89 cycles)
• Several runs with the same workload
  – Variable results!
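The randomization idea can be sketched like this: draw each L2 miss latency uniformly from 80-89 cycles and rerun the same fixed "workload" several times. The miss and hit counts below are invented, purely to show that identical workloads now give variable results:

```python
import random

# Randomized-latency experiment sketch (in the spirit of the Wisconsin
# setup, not a reproduction of it): same workload, variable results.

def run_once(rng, n_misses=1000, n_hits=100_000, hit_cycles=1):
    """Total cycles for a fixed sequence of hits and L2 misses."""
    cycles = n_hits * hit_cycles
    for _ in range(n_misses):
        cycles += rng.randint(80, 89)    # randomized L2 miss time
    return cycles

rng = random.Random(42)                  # seeded, so the experiment itself
results = [run_once(rng) for _ in range(5)]  # is still reproducible
# Five runs of the same workload now span a range of cycle counts.
```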


Wisconsin Experiments

[Figure: cycles per transaction (millions) vs. sample size (number of runs), with max/avg/min bands for ROB sizes 16, 32, and 64]

• WCR (16,32) = 18% ("Wrong Conclusion Ratio")
• WCR (16,64) = 7.5%
• WCR (32,64) = 26%
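One standard-statistics treatment of such run-to-run variation is a confidence-interval comparison between two configurations: if the intervals do not overlap, the measured difference supports a conclusion. The sample values below are invented, and the z = 1.96 normal approximation is an assumption (it strictly applies to large sample counts):

```python
from statistics import mean, stdev

# Confidence-interval sketch for comparing two simulated configurations.
# Sample data is made up; only the method is the point.

def conf_interval(samples, z=1.96):
    """Approximate 95% confidence interval for the mean of `samples`."""
    m = mean(samples)
    half = z * stdev(samples) / len(samples) ** 0.5
    return m - half, m + half

rob16 = [3.30, 3.34, 3.28, 3.36, 3.32]   # cycles/transaction, invented
rob64 = [2.80, 2.84, 2.78, 2.86, 2.82]

lo16, hi16 = conf_interval(rob16)
lo64, hi64 = conf_interval(rob64)
significant = hi64 < lo16                # intervals do not overlap
```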

Wisconsin Experiments

• Conclusions:
  – Simulation is no different from runs on real HW
  – Use standard statistics
    • Non-overlapping confidence intervals
  – Danger of determinism in simulation
    • Testing a single path of a program
    • Induce variability by randomization

Implementations


Full-System Simulators

• Integrated full-system simulators
  – Virtutech Simics (better at abstraction)
  – Virtio (more on the pin level)
• Frameworks for combining discrete simulators
  – Combine an ISS with VHDL and Verilog simulators
  – Mentor Graphics Seamless
  – Cadence Incisive
• Virtualization environments
  – VMware (fast PC-on-PC)
  – Connectix/Microsoft VirtualPC (fast PC-on-PC)

Almost done...

Cost of Simulation

• Sim in 1977: on DEC VAX-11/780
  – 200,000 USD (1977 dollars)
  – 1 VAX MIPS
  – Simulation technology: ~200
  – Cost for a simulated server hour: 4,000 USD
• Sim in 2002: on Dell PC (P4 2.2 GHz)
  – 1,500 USD
  – Approx 3100 VAX MIPS
  – Simulation technology: ~40
  – Cost for a simulated server hour: 2 USD
• ... a factor of 2000 in 25 years!

[Figure: Windows NT, VxWorks, Linux on PowerPC, Solaris on a Sun SunFire, Windows XP/64 on AMD Hammer — all running on an x86 Linux host]
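Working backwards from the slide's numbers (and reading "simulation technology ~200" / "~40" as slowdown factors), the cost per simulated server hour is the implied host-hour rate times the slowdown. The implied rates — about 20 USD/hour for the VAX, 5 cents/hour for the PC — fall out of the arithmetic; the amortization model behind them is not stated on the slide, so treat this as consistency arithmetic, not the author's cost model:

```python
# Consistency check on the slide's cost figures, with assumed
# host-hour rates backed out from the stated results.

def sim_hour_cost_usd(host_hour_cents, slowdown):
    """One simulated hour costs `slowdown` host hours."""
    return host_hour_cents * slowdown / 100   # cents keep the math exact

vax_1977 = sim_hour_cost_usd(host_hour_cents=2000, slowdown=200)  # 4000.0 USD
pc_2002 = sim_hour_cost_usd(host_hour_cents=5, slowdown=40)       # 2.0 USD
ratio = vax_1977 / pc_2002                                        # 2000x
```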


Demo Time!

• Booting Linux
• Lifting checkpoints of Windows and Solaris
• Kernel debugging
• IO access history
• Configuration file syntax
• ... and more

Thank You

http://www.virtutech.com
http://www.simics.net
[email protected]
