A Virtual Platform Environment for Exploring Power, Thermal and Reliability Management Control Strategies in High-Performance Multicores

A Virtual Platform Environment for Exploring Power, Thermal and Reliability Management Control Strategies in High-Performance Multicores

A Virtual Platform Environment for Exploring Power, Thermal and Reliability Management Control Strategies in High-performance Multicores Andrea Bartolini†, Matteo Cacciari †, Andrea Tilli †, Matthias Gries‡, Luca Benini † University of Bologna, DEIS † Intel Labs ‡ Italy Germany Microprocessor Lab a.bartolini, matteo.cacciari, andrea.tilli, [email protected], [email protected] Outline • Introduction • Simulation Strategy • Virtual Platform • Platform Characterization • Control Development Cycle • Case Study • Conclusion Today and Future Multicores Tecnology scaling software System integration High performace Costs requirements Spatial and temporal Limitated workload variation High dissipation power capabilities densities NON UNIFORM: power, temperature, performance Leakage current Reliability lost, Hot spots, thermal Aging gradients and cycles Today and Future Multicores Tecnology scaling software System integration High performace Costs requirements Spatial and temporal Limitated workload variation High dissipation power Resource Management solution:capabilities densities on-line tuning of system performance and NON UNIFORM: temperaturepower, temperature, through closed performance-loop control management policies Leakage current Reliability lost, Hot spots, thermal Aging gradients and cycles Management Loop: Holistic view System information Workload from OS Task scheduling - migration CPU utilization, queue status Control algo migration policy SW HW Introspection: monitors Self-calibation: knobs (Vdd, Vb, f,on/off )‏ Reliability alarms Temperature Power Multicore Platform Core 1 Core N Proc. 1 Proc. 2 … Proc. N INTERCONNECT Private Private … Private Mem Mem Mem Evaluation by Simulation HW prototype and test-chips: accurate but expensive • Long and complex development stage System • Limited introspection Simulator • Incorrect control policy can lead to permanent HW damage Key features : • must accurately simulate the entire system evolution (program flow, data coherency conflicts, complex memory latency) • to avoid unrealistic simulation artifacts • to allow evaluation of control solution that fits in the system • must emulate the same performance knobs and introspective sensors of real HW • must co-simulate the physical effects we want the controller to be developed for • Power model, Thermal model, Reliability model • allows high-level control strategies co-simulation • acting on virtual performance knobs / reacting on virtual introspective sensors Simulator Strategy Trace driven Simulator [1]: • Not suitable for full system simulation (How to simulate O.S.?) • looses information on cross-dependencies resulting in degraded simulation accuracy Workload Multicore Simulator Power Model Control strategy Temperature Model [1] P Chaparro et al. Understanding the thermal implications of multi-core architectures. 2007 Simulator Strategy Trace driven Simulator [1]: • Not suitable for full system simulation (How to simulate O.S.?) • looses information on cross-dependencies Workload Set resulting in degraded simulation accuracy Close loop simulator: Multicore • Cycle accurate simulators [2] : Simulator • High modeling accuracy • support well-established power and temperature co- Execution Trace simulation based on analytical models and system database micro-architectural knowledge • Low simulation speed • Not suitable for full-system simulation Power Model • Functional and instruction set simulators: Control • allow full system simulation strategy • less internal precision Temperature Model • less detailed input data no micro-architectural analytical model • introduces the challenge of having accurate power and temperature physical models [1] P Chaparro et al. Understanding the thermal implications of multi-core architectures. 2007 [2] Benini L. et al. MPARM: Exploring the multi-processor SoC design space with SystemC 2005 Contributions Novel Virtual Platform: • Allows fast but physical accurate simulation • Based on a full-system multicore high-level functional simulator (Virtutech Simics [1]) • Integrates Power and Thermal model derived by real HW characterization • Enables fast control-algorithms prototyping, design and testing: • Allowing co-simulation of multicores SoCs with high-level description of control-algorithms (Mathworks Matlab/Simulink [2]) Challenges: • Identification, from experiments on a real general-purpose multicore platform, of accurate, but component-oriented, power and thermal models • Interfacing physical model with functional simulator engine • Development of a co-simulation bridge between the multicores simulator and Mathworks Matlab/Simulink. [1] http://www.virtutech.com/ [2] http://www.mathworks.com/ Virtual Platform App.1 App.N 1 1 N N T T T RUBY CORE ... T .... ... Stall Mem O.S. SW Access HW CPU1 CPU2 CPUN SIMICS L1 L1 L1 L2 L2 Simulator Network DRAM Simics by Virtutech: Virtual Platform • full system functional simulator Memory timing model • models the entire system: • RUBY – GEMS (University of Wisconsin)[1] peripherals, BIOS, network interfaces, cores, • Public cycle-accurate memory timing memories model • allows booting full OS, such as LinuxSMP • Different target memory architectures • supports different target CPU (arm, sparc, x86) • fully integrated with Virtutech Simics • x86 model: • written in C++ • in-order • we use it as skeleton to apply our add- • all instruction are retired in 1 cycle ons (as C++ object) • does not account for memory latency [1] Martin Milo M. K. et al. Multifacet’s general execution-driven multiprocessor simulator (GEMS) toolset 2005 Virtual Platform Performance knobs (DVFS) module: • Virtutech Simics support frequency change at run-time • RUBY does not support it: • does not have internal knowledge of frequency • We add a new DVFS module to support it : • ensures L2 cache and DRAM to have a constant clock frequency • L1 latency scale with Simics processor clock frequency App.1 App.N 1 1 N N T T T ... T .... ... fi ,VDD f O.S. SW i f,v f,v f,v HW DVFS RUBY CORE CPU1 CPU2 CPUN f Stall Mem L1 L1 L1 i Access L2 L2 SIMICS Network DRAM Simulator Virtual Platform Virtual Platform Performance knobs (DVFS) module: • Virtutech Simics support frequency change at run-time • RUBY does not support it: • does not have internal knowledge of frequency • We add a new DVFS module to support it : • ensures L2 cache and DRAM to have a constant clock frequency • L1 latency scale with Simics processor clock frequency App.1 App.N 1 1 N N T T T ... T .... ... fi ,VDD f O.S. SW i f,v f,v f,v HW DVFS RUBY CORE PC CPU1 CPU2 CPUN f Stall Mem #hlt,stall L1 L1 L1 i Access active,cycles L2 L2 SIMICS Network DRAM #L1 , #BUS , MISS ACCESS Virtual Platform Simulator CYCLEACTIVE,.... Virtual Platform Power model module: • At run-time estimate the power consumption of the target architecture • Core model PT = [PD(f,CPI) + PS(T, VDD)] *(1 − idleness) + idleness *(PIDLE) • PD experimentally calibrated analytical power model • Cache and memory power – access cost estimated with CACTI [1] PCORE, PL1, PL2 App.1 App.N POWER MODEL 1 1 N N T T T ... T .... ... fi ,VDD f O.S. SW i f,v f,v f,v HW DVFS RUBY CORE PC CPU1 CPU2 CPUN f Stall Mem #hlt,stall L1 L1 L1 i Access active,cycles L2 L2 SIMICS Network DRAM #L1 , #BUS , Simulator MISS ACCESS Virtual Platform CYCLEACTIVE,.... [1] Thoziyoor Shyamkumar et al. A comprehensive memory modeling tool and its application to the design and analysis of future memory hierarchies. 2008 Power Model . Power model interface i-th CPU # Istruction retired P # Stall Cycle O # HLT Istruction P W i-th L1 O # Line & WD Read E W f = k * f f = k * f # Line & WD Write R 1 1 nom 2 2 nom fN = kN * fnom E f ,V ,T R dd & CPU1 CPU2 CPU N M L1 L1 L1 i-th L2 E O N L2 L2 D # Line Read E # Line Write E R Network L G DRAM Simulation snap-shot Y DRAM Ruby & simics # Burst Read # Burst Write Modeling Real Platform – Power Real Power Measurement • Intel server system S7000FC4UR • 16 cores - 4 quad cores Intel® Xeon® X7350, 2.93GHz • 16GB FBDIMMs • Intel® Core™ 2 Duo architecture • At the wall Power consumption • test: • set of synthetic benchmarks with different memory pattern accesses • forcing all the cores to run at different performance levels • for each benchmark we extract the clocks per instruction metrics (CPI) and correlate it with the power consumption 2 kE PD kA VDD fCK kB (kC kD fCK )CPI • We relate the static power with the operating point by using an analytical model Modeling Real Platform – Power Real Power Measurement • Intel server system S7000FC4UR • 16 cores - 4 quad cores Intel® Xeon® X7350, 2.93GHz • 16GB FBDIMMs High accuracy at • Intel® Core™ 2 Duo architecture high and low CPI • At the wall Power consumption • test: • set of synthetic benchmarks with different memory pattern accesses • forcing all the cores to run at different performance levels • for each benchmark we extract the clocks per instruction metrics (CPI) and correlate it with the power consumption 2 kE PD kA VDD fCK kB (kC kD fCK )CPI • We relate the static power with the operating point by using an analytical model Virtual Platform Temperature model module: • we integrate our virtual platform with a thermal simulator [1] • Input: power dissipated by the main functional units composing the target platform • Output: Provides the temperature distribution along the simulated multicore die area as output TEMPERATURE MODEL T PCORE, PL1, PL2 App.1 App.N POWER MODEL 1 1 N N T T T ... T .... ... fi ,VDD f O.S. SW i f,v f,v f,v HW DVFS RUBY CORE PC CPU1 CPU2 CPUN f Stall Mem #hlt,stall L1 L1

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    29 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us