Virtual Prototypes and Plaorms A Primer

Eyck Jentzsch, MINRES Technologies GmbH Rocco Jonack, MINRES Technologies GmbH Josef Eckmüller, Deutschland GmbH

© Systems Initiative 1 FOUNDATIONS OF VP

© Accellera Systems Initiative 2 What are not virtual prototypes & plaorms

• These are not virtualizaon soluons or virtual machines like VirtualBox or VMWare – A virtual machine (VM) is an emulaon of a computer system (...) and provides funconality of a physical computer (Wikipedia) – A VM or virtual machine was originally defined by Popek and Goldberg as "an efficient, isolated duplicate of a real computer machine." • Although some concepts and mechanisms are also used in VPs

© Accellera Systems Initiative 3 What is a virtual prototype or plaorm

• Virtual plaorm is a hardware simulator execung embedded soware (eSW) • Usually built on top of an instrucon set simulator (ISS) of the processor(s) connected via buses to memory and peripherals such as mers, general I/O, communicaon interface, etc. • While the VP is mostly implemented in SystemC the ISS may be implemented in some other general purpose language.

© Accellera Systems Initiative 4 Differences between virtual prototype and plaorm

• Virtual prototype – Is in development and does not have final structure and paroning. – Possibility to quickly change this – Comprehensive analysis means – Used to opmize SoC design wrt. power, area, speed, throughput, and/or latency • Virtual plaorm – Represents a SoC in its final shape, – Aims at simulaon speed as well as funconal accuracy and completeness, – Used for (e)SW development

© Accellera Systems Initiative 5 Alternaves to VPs

• Emulaon – high accuracy and speed – needs Register-Transfer-Level (RTL) descripon, expensive to scale • FPGA – high accuracy and speed – needs RTL implementaon to do synthesis, place and route è long implement- compile-debug loops • Hardware in the loop (HIL) – very high accuracy and speed (real-me) – comes late in the design process, scalability challenge

© Accellera Systems Initiative 6 VP standards

• Focus on standards for VP development - SW and FW running on VP may have different requirements/standards • Usage of C++ - Performance requirements - Compliance with exisng models - Most tools provide APIs - Linking of scripng environments for dynamic control e.g. Python, TCL, Lua • Standards are important for IP exchange - Internal implementaon might be different - Standards oen imply standard interfaces/APIs - Standards typically allow easier tool deployment • Tools can blur the line between standard and proprietary implementaon - Addional funconality provided that is vendor specific - Model libraries may incur tool specific funconality

© Accellera Systems Initiative 7

Modeling techniques

• Several modeling approaches are used to implement a VP and its components – Behavioral/unmed – Funconal/loosely med – Cycle-accurate/approximately med – Register-transfer-level • A parcular VP usually is a mix-and-match of these techniques

© Accellera Systems Initiative 8 Behavioral/unmed

• The unmed model is a funconal IF descripon having no noon of me • A set of algorithmic blocks F1 communicang using funcon calls or message pipes • Keeps the causality and order of invocaon F2

© Accellera Systems Initiative 9 Funconal/loosely med (LT) • The loosely med model is a structural and behavioral refinement of the OS+SW (IF, F2) funconal model. ACCEL • Mapping of funconal blocks to HW and CPU (F1) SW components and communicaon interfaces in-between based on a chosen architecture • Subsystems can execute ‘ahead-of-me’ • Transacons at communicaon MEM I/O interfaces correspond to a complete read or write across a bus or network in physical hardware and provide ming at the level of the individual transacon

© Accellera Systems Initiative 10 Cycle-accurate/approximately med (AT)

• Approximated ming on bus OS+SW communicaon and on hardware (IF, F2) ACCEL resource access CPU (F1) – Interface communicaon me – Average processing me in hardware IP • Transacons are broken down into a

MEM I/O number of phases corresponding much more closely to the phasing of parcular hardware protocols

© Accellera Systems Initiative 11 Register-transfer level (RTL)

• Models a block or SoC in terms of the flow of data between hardware registers, and the logical operaons performed on this data • Starng point for physical implementaon by synthesis, place, and route

© Accellera Systems Initiative 12 C++ class libraries for modeling

• SystemC is a class library on top of C++ SCML - Structural descripon elements - Data types TLM2 - Event driven simulaon kernel • TLM2 modeling allows for interoperability SystemC - Reuse of exisng components whether 3rd party or inhouse C++ - Tool environments support TLM2 - Interacon with more cycle-accurate models is well defined – Even bridging into RTL simulators is well defined and supported by (commercial) tools • Producvity libraries enable faster modeling of common components - Registers, memories, infrastructure tasks - Examples of libraries - Propriatary: SCML, Intel ISCTLM, ASTC Genesis - Open Source: Greensocs Greenlib, MINRES SC Components

© Accellera Systems Initiative 13 TLM2 based modeling

• TLM2 ports are the interfaces between IA TLM2 ports modules - Retain connecon to architectural view of SoC - Allows for reuse of components Subsystems, for instance Audio, - Focus on interfaces, not on funconality Graphics, Modem etc. • Provides infrastructure for modeling efficiency - LT modeling allows opmizaon of events Iniator Slave - Direct-memory-interface (DMI) calls can nb_transport_bw b_transport invalidate_direct_mem_ptr nb_transport_fw provide very high memory-access efficiency get_direct_mem_ptr transport_dbg - Temporal decoupling allows models to delay synchronizaon with other components

© Accellera Systems Initiative 14 Modeling efficiency with TLM2 LT

• LT specifies different transport mechanisms IA ISS RAM - b_transport() as blocking access per transacon - DMI calls to request memory access by pointer

• Design goals Subsystems, for instance Audio, - Minimizing synchronizaon with environment Graphics, Modem etc. - Localizing workload - Allows for registraon of callback funcons • Works well in conjuncon with temporal decoupling - Avoid synchronizaon in blocking calls if possible - Processor models can decide when to synchronize

© Accellera Systems Initiative 15 Producvity Libraries

• Common elements should be modeled on top IA of common classes • Registers, Memories, etc.

• Efficient for stubs or preliminary implementaons Subsystems, for instance Audio, • Common infrastructure tasks should be Graphics, modeled on top of common classes Modem etc. – Logging – Parametrizaon – Domain handling (Clock, reset, power) • SCML, ISCTLM, and Genesis are examples of (proprietary) producvity libraries – Concepts and implementaons might serve as starng point for standardizaon

© Accellera Systems Initiative 16 Register modeling

• Registers are one of the main interface between HW and SW • Registers mean different things to HW and SW groups registers control Write_start() - HW contains many registers, but only few are exposed to SW enable registers set Write_finished() - SW visible registers should be implemented, documented and reset tested src_addr Reset_values() dest_addr • Consider automated code generaon - Large amount of registers - changing requirements - Meta data formats like IPXACT, RDL, RAL • Having a model that allows consistent HW and SW access modeling is important - Access modes, reset and retenon values - Efficient access through callback funcons - Producvity layer provides register classes

© Accellera Systems Initiative 17 Model meta data

• Stching models together is a recurring IA and error prone task • Using meta data is oen useful – Data visualizaon Subsystems, for – Data exchange instance Audio, Graphics, • IPXACT is a XML based schema for Modem etc. interface descripons – Some tools can provide or read IPXACT • XML parsing for simple tasks can be done using Open source libraries • Other examples for meta data representaons are RDL, RAL, sysML

© Accellera Systems Initiative 18 Improvement of standards

• TLM2 provides a generic transport mechanism and extension mechanisms – Specific protocols are not described • AMBA, PCI, MIPI are le to implementer • Producvity libraries are not standardized • More support for common infrastructure tasks – Tracing – Profiling – Parallelizaon

© Accellera Systems Initiative 19 Using IP for Virtual Plaorms • VP represents hardware – SoC design typically contains a significant number of 3rd IP components – Ideally all IP should come from same source (RTL, architecture, verificaon, VP models) • Availability of IP for VP is improving, but sll limited – Availability of models oen defines the usage – Consider Co-simulaon • Processor-based models that execute SW – User of VP is exposed to those models • Peripheral models – Execute funconality autonomously, but under control of SW • Interconnect models – Connecng different elements based on addresses

© Accellera Systems Initiative 20 Modeling processors

SW/FW • Core funconality of VP IA ISS • Need support for high speed, good debug interface, profiling, etc.

Subsystems, for • Most embedded processor providers instance Audio, Graphics, also provide ISS models Modem etc. – Speed versus ming accuracy • Dynamically configurable to run in performance mode or accuracy mode – Links to debug environment and other SW tools should be consistent with user expectaon

© Accellera Systems Initiative 21 Processor modeling techniques

• Interpreng execuon – Slow but easy to implement and accurate • Just-in-me (JIT) compiling/dynamic binary translaon (DBT) – Virtualizaon library converts target instrucons to host instrucons e.g. using LLVM or QEMU right before execung them • Host based/host compiled simulaons – Instead of running the target SW/FW binary on a target processor (model), run a host-compiled representaon interacng with the model of the plaorm – Virtualizaon environments allow running the SW/FW binary

© Accellera Systems Initiative 22 ARM processor models

• ARM fast models – LT based models for speed with debug interfaces • Cycle accurate model based on Carbon tool translaon – Cycle accuracy guaranteed

Source: ARM support website

© Accellera Systems Initiative 23 Tensilica processor models • ISS from Tensilica is highly configurable – Turbo mode (JIT) is using various mechanisms to speed up simulaon – Non-turbo mode (interpreng) is instrucon accurate • Models provide TLM2 based interfaces • Tensilica provides extension to gdb for debugging • Integraon into Tensilica tool environment by aaching simulaon

through TCP port Source: Cadence support website

© Accellera Systems Initiative 24 Modeling peripherals • Typical peripherals are UART, PCI, USB, mers – May connect to the ‘real’ world by using the host workstaon resources – Not main funconality, but important to get FW contents right – Synopsys DW library provides TLM2 based versions of various IP – Some vendors react on demand • Various versions of peripherals are common – Upgrading even to a new version may imply changes in the interface • Peripherals can have significant contribuon to system performance – DMAs, peripherals sending video data, real-me constraints Source: screen shot from PlatformCreator

© Accellera Systems Initiative 25 Modeling interconnect

IA • The model that connects everything ISS – Mostly address map in VP context – Consistent view of registers Subsystems, for – May contain some SW visible funconality like instance Audio, firewalls, address shiing, domain crossings Graphics, Modem etc. • Some libraries and tools contain generic models – Fast inial implementaon – High performance • Some vendors provide TLM2 LT interconnect model, e.g. NetSpeed – Ensures consistency within design flow Source: screenshot from NetSpeed NocStudio

© Accellera Systems Initiative 26 Custom build models

• Custom models are necessary, for instance – In house IP – Important IP without any available 3rd party model – Extensions to exisng models • Consider cost of maintenance – Build based on producvity libraries – Reuse exisng models for instance unmed models • Consider RTL co-simulaon or translaon like Carbon tools

© Accellera Systems Initiative 27 Commercial tools available for VP development (exemplary list) • Synopsys Processor Designer, Virtualizer • Cadence Virtual System Plaorm • Mentor Vista • MathWorks Matlab, Simulink • ASTC VLAB, Processor Modeling Studio • Wind River Simics • Vivado • ARM Carbon Model Studio, SoC Designer Plus

© Accellera Systems Initiative 28 SystemRDL, IPXACT & SystemC DEMO

© Accellera Systems Initiative 29 DEVELOPMENT AND APPLICATIONS OF VP

© Accellera Systems Initiative 30 SoC/system & VP development phases

• Behavioral descripon of the SoC or Definion/Specificaon system • Paroning and Mapping to Architecture/Design Architecture (HW) and SW • HW & FW development Implementaon

• Funconal, ming, power, ... validaon Validaon

• SW development and further Opmizaon opmizaon

© Accellera Systems Initiative 31 User roles in the VP dev phases • Plaorm authors, IP authors Definion/Specificaon

• Plaorm authors Architecture/Design

• Plaorm authors, IP authors Implementaon

• Plaorm authors, plaorm users (SoC Validaon & system integrators) • Plaorm users Opmizaon

© Accellera Systems Initiative 32 VP uses cases • Architectural exploraon and Definion/Specificaon performance analysis

Architecture/Design • Funconal verificaon • FW development (driver, HAL, Implementaon OS) • System performance validaon Validaon • Debug & analysis in FW/SW development Opmizaon • Connuous integraon & test- driven SW development

© Accellera Systems Initiative 33 Architectural exploraon & performance analysis

• Mapping of a behavioral model to one or more points in the architectural space consisng of different HW implementaons • Evaluaon based on performance characteriscs for different system architectures, such as a HW/SW split, communicaon system, or system components • Typical use case for plaorm authors • Important properes: accuracy wrt. to performance metrics i.e. ming, latency, throughput, power

© Accellera Systems Initiative 34 Funconal verificaon

• Behavioral model allows early development of system and integraon test thus validang the specificaon • Refined VP with final architecture and HW/SW paroning serves as executable specificaon for hardware implementaon

© Accellera Systems Initiative 35 Funconal verificaon • Can be used as a reference model for the SoC implementaon as well as for component development (RTL block verificaon) – Replace (IP-)blocks by their RTL counterpart and validate in system context – VP as part of the block test bench • Reuse of smuli from VP to RTL, component to system – E.g. using portable smuli (Accellera standard)

© Accellera Systems Initiative 36 FW development (driver, HAL & OS) • Use the VP as a vehicle to execute target SW by simulang HW behavior Implement • Allows shi-le by starng SW development early in the design process • Used by plaorm authors and system integrators • Important properes: balance between speed vs. accuracy Build • Comprehensive analysis, visibility and debug Debug means needed – Logging of HW and SW events (e.g. via host I/O) – Signal & transacon tracing – Non-intrusive debugger integraon

© Accellera Systems Initiative 37 System performance validaon

• Validate performance of plaorm and SW in applicaon scenarios within target environments – Recepon of data at wireless modems – Event response me in realme crical systems • Used by system integrators

© Accellera Systems Initiative 38 Debug & analysis in FW/SW development

• Similar to FW development • Used by system integrators • Focus even more on speed as not only the plaorm is being simulated rather also the target environment e.g. cellular base staons, enre motor controller including the control loop containing actuators and sensors

© Accellera Systems Initiative 39 Debug & analysis in FW/SW development

• Comprehensive analysis, visibility, tracing, and debug enables non- intrusive observaon and debug of behavior thus easing the debugging • Tools like Eclipse Trace Compass or impulse allow to fuse HW and FW/ SW events to ease debugging system behavior

© Accellera Systems Initiative 40 Connuous integraon & test-driven SW development • The all-in-soware approach allows to deploy modern SW development techniques like connuous integraon (CI) and test-driven design (TDD) • Each feature is being (unit-) tested so that funconality regressions can be detected early • Each SW change is tested before propagang to the main line of development • Allows close monitoring of the eSW development progress • Maximum benefits in situaons where one soware system addresses mulple hardware variants in different system contexts • Used by system integrators

© Accellera Systems Initiative 41 FW debugging in Eclipse using VP as target. DEMO

© Accellera Systems Initiative 42 ADVANTAGES, LIMITATIONS, AND CHALLENGES

© Accellera Systems Initiative 43 Advantages

• Early availability & easy reconfiguraon • Observability • System fault injecon & fault simulaon • Simulaon speed and accuracy • Scalability • Enabler of agile eSW development methodology

© Accellera Systems Initiative 44 Limitaons

• Simulaon speed vs. accuracy • Limited mul-threading capabilies in SystemC – advantageous for sub-system integraon and mul-core simulaons • Missing standards for runme configuraon, inspecon, logging and

tracing – integraon of IPs from different vendors sll very tedious

© Accellera Systems Initiative 45 Challenges

• Simulaon speed vs. accuracy • Use case broadening (FW/SW performance validaon, dynamic power esmaon) • Integraon of sub-system and systems

– happens at at different levels ( sub-component, component, sub-system, system) – re-use of test infrastructure between levels – mul-core debugging, SMP, AMP – efficient sub-system integraon process – dynamic roung and linking – availability of lower FW/SW layer required to integrate higher layer SW • Conflict of interest between plaorm authors, tool and IP vendors – plaorm author need maximal flexibility in terms of IP selecon and tool support – IP and tool vendor aims to lock-in plaorm author and user • Exchange of VP plaorms between plaorm authors and plaorm users – legal and IP protecon challenges

© Accellera Systems Initiative 46 Simulaon speed vs. accuracy

• Simulaon speed is crucial for the adopon of VPs • Modern ISS reach several tens or hundreds of MIPS depending on the complexity of the processor • Modeling the enre plaorm slows down esp. If components or communicaon interfaces need to be modeled in more detail and/or with higher accuracy • Single-threaded SystemC kernel

© Accellera Systems Initiative 47 Simulaon speed and accuracy

• Ways to address this: – Mix-n-match approach (coarse and fine grain modeled components) – use case adapted VPs (variants) – Mul-threading and simulaon paroning – Simulaon check-poinng – Faster host hardware �

© Accellera Systems Initiative 48 Early availability & easy reconfiguraon

• Earlier system availability by agile development – Implemenng only funconality that is needed for the use case (soware development, performance analysis) – Incremental extension unl compleon with intermediate deliveries • As communicaon is abstracted, a VP can easily be re-paroned – Graphical tools are available to ease this • By using soware build mechanisms or runme configuraon mechanisms it’s easy to create and maintain variants or derivaves to suit different requirements

© Accellera Systems Initiative 49 Observability

• Simulaon can be stopped upon occurrence of important or interesng events • State of the plaorm (both the hardware and the soware state) is frozen and can be examined e.g. using debuggers • Non-intrusive debugging: other than ‘real’ hardware the system cannot determine that it has been stopped • Same or beer debugger access than hardware • Extensive logging and tracing e.g. VCD or transacon recording allows comprehensive post-simulaon analysis

© Accellera Systems Initiative 50 Observability

• Combining hardware traces and logs with SW events and traces gives even more insight • Detailed analysis of system w/o – Expensive on-chip measurement – Slow HDL simulaon

© Accellera Systems Initiative 51 Scalability

• The all-in-soware approach executes target code on standard hardware • No dependencies on special hardware – Other than FPGA, Emulaon or HIL • Each developer has its own target system to debug the code • Easy scalability by adding off-the-shelf hardware like workstaons or servers

© Accellera Systems Initiative 52 Enabling agile eSW development methodology

• The all-in-soware approach allows to deploy modern SW development techniques like connuous integraon (CI) and test-driven design (TDD) • It can be easily scaled to meet developers demand and compung requirements for CI • Allows close monitoring of the eSW development progress and provides benefits: – Beer code structure and quality by frequent checking and extracng soware metrics – Easier debug and less impact (less code to roll-back) – Fewer major integraon bugs (detected early and easy to track) – Constant availability of a "current" build for tesng, demo, or release purposes

© Accellera Systems Initiative 53 References

• IP-XACT, RDL, SystemC/TLM 2.0: hp://www.accellera.org • ASTC VLAB: hp://vlabworks.com/ • ARM support website: hp://infocenter.arm.com • Tensilica support website: hps://ip.cadence.com/support • GreenSocs Greenlib: hps://www.greensocs.com/docs • MINRES SC-Components: hps://www.minres.com/#opensource • Eclipse Trace Compass: hp://tracecompass.org/ • impulse: hp://toem.de/index.php/projects/impulse

© Accellera Systems Initiative 54 Quesons

© Accellera Systems Initiative 55