Architecture Level Modelling

Ramon Canal
NCD - Master MIRI

How Power Estimation is Addressed
[Figure: design levels listed against power reduction opportunities, power analysis, and design iteration times; arrows labelled "Increasing power savings" and "Decreasing design iteration times"]
• Software level (system level optimizations): 10-20X; software-level power analysis; power models for software-level components; seconds - minutes
• Architecture level (arch optimizations): 2-5X; architecture-level power analysis; power models for macroblocks and control logic; minutes - hours
• Logic level (logic synthesis): 20 - 50%; logic-level power analysis; power models for gates, cells, and nets; hours - days
• Transistor level / layout level (layout synthesis): transistor-level power analysis; transistor models, wire models; days

Levels of the execution flow
[Figure: estimation time (from seconds upward) against accuracy (5% to 30%) for the Sw, Sys, Arch, Logic, Transistor and Layout levels; annotated "Good speed/accuracy trade-off"]

Agenda
• Introduction
• Using Hardware Counters
  – Based on PARAPET group work (Princeton)
• Architecture Simulation
• Statistical sampling
• Related Work
• Conclusions

Advantages of Hardware Counters
• Power models reflecting modern processors
  – Clock gating, power
  – Voltage regulation, di/dt
• Need for fast, real-time modeling and measurement to observe long time periods
  – Thermal time constants: O(s)
  – Not feasible even with architectural simulators
    • e.g.: 1 s of real run ~ 5 x IPC hrs of WATTCH simulation
• Need live, run-time power/thermal measures
  – Dynamic Thermal Management
  – Power-Aware OS & Systems control

Where is all this useful?
• Measurement/modeling for microarchitectural details
• Compiler level power
  – SW power profiling
• Power-Aware OS
  – Dynamic power/thermal/microarchitecture configuration
    • Dynamic memory allocation, process cruise control, etc.
• Demonstrates modern processor power
• Need for speed! Long timescales, thermal constants
• Identify program phases w/o knowledge of code, no basic block info whatsoever
• Program signatures for detailed simulation, say: “power points rather than simpoints”

INTRODUCTION
• Runtime processor power
  – Measurement with HW
  – Estimation with performance counters
• CPU unit power breakdowns
• Runtime verification
• Processor thermal modeling
• Power phase behavior of programs
• Mapping between power behavior and program structure

THE BIG PICTURE
[Figure: block diagram linking Performance Monitoring, Real Power Measurement, Program Profiling, Power Modeling, Program Structure, Thermal Modeling and Power Phases]
Bottom line…
• To estimate component power & temperature breakdowns for P4 at runtime…
• To analyze how power phase behavior relates to program structure

Agenda
• Performance Monitoring
  – P4 Performance Counters
  – Performance Reader LKM
• Real Power Measurement
  – P4 Power Measurement Setup
  – Examples
• Power Modeling
  – P4 Power Model
  – Model + Measurement Sync Setup, Verification
• Thermal Modeling
  – Brief Thermal Model Intro
  – Ppro Thermal Model Results

Bonus Material
• Power Phase Behavior
  – Similarity based on power vectors
  – Identifying similar program regions
• Profiling Execution Flow
  – Sampling process’ execution
  – “PCsampler” LKM
• Program Structure
  – Execution vs. code space
  – Power Phases, Exec. Phases
    • <OR VICE VERSA>

Performance Monitoring

Live CPU Performance Monitoring with Hardware Counters
• Most CPUs have hardware performance counters
• P4 Performance Monitoring HW:
  – 18 Event Counters
  – 18 Counter Configuration Control Registers
    • Configure how to count
  – 45 Event Selection Control Registers
    • Configure what to count
  – Additional Control Registers

Our Event-Counter: Performance Reader
• Performance Reader implemented as Linux Loadable Kernel Module
  – Implements 6 syscalls:
    • select_events()
    • reset_event_counter()
    • start_event_counter()
    • stop_event_counter()
    • get_event_counts()
    • set_replay_MSRs()
• User Level Interface:
  – Defines the events
  – Starts counters
  – Stops counters
  – Reads counters & TSC
• Event Types:
  – 59 event classes
  – 100s of events to count

Performance Reader: Example Validation
• L1_Dcache benchmark
• Controls cache hit behavior
• Validated against measured cache events
• Vary hit rate from 0-100%

Processor Power Measurement

P4 Power Measurement Setup
• Clamp ammeter on 12V lines on measured CPU
• DMM reading clamp voltages
• 1mV/Adc conversion
• Serial Reader (PowerMeter): voltage readings via RS232 to logging machine
• (PowerPlotter): convert to power vs. time window

PowerPlotter: Example
[Figure: PowerPlotter trace showing initialization and benchmark execution for “Branch exercise” (Taken rate: 1), “High-Low”, “Fast”, and “L1Dcache” with array sizes 1/100 of L1, x25 of L1 (~L2), and x4 of L2]

SPEC Power Examples
• Different programs show very different power characteristics
• Timescale of interest can be huge => inaccessible via simulation
[Figure: power [W] vs. time (s) for Spec GCC (O3) with specrun -a run, and for Spec VPR (O3) with specrun -a run]

Processor Power Modeling

P4 POWER MODEL
• Define Components: define components (e.g. L1 cache, BPU, Regs, etc.) whose powers we’ll model, from annotated layout
• Define Events: determine the combination of P4 events that best represents component accesses
• Performance Monitoring: gather counter info with minimal power overhead and program interruption
• Power Modeling: convert counter info into component power breakdowns
• Real Power Measurement: verify total power against measured processor power

Defining Components

Defining Events => Access Rates
• We determined 24 events to approximate access rates for 22 components
• Used several heuristics to represent each access rate
• Examples:
• Need to rotate counters 4 times to collect all event data
  – Used 15 counters & 4 rotations to collect all event data

Access Rates => Component Powers
• “Performance Counter based Access Rate estimations are used as proxy for max component power weighting together with microarchitectural details in order to estimate processor sub-unit powers”
  – EX: Trace cache delivers 3 uops/cycle in deliver mode and 1 uop/cycle in build mode:
    • Power(TC) = [Access-Rate(TC)/3 + Access-Rate(ID)] x MaxPower(TC) + Non-gated TC CLK power
• Total power is computed as the sum of all 22 component powers + measured idle power (8W):
    • Total Power = Sum(i = 1..22) Power(component i) + Idle Power

Experiment Setup
• Recall: clamp ammeter on 12V lines on measured CPU, DMM reading clamp voltages, 1mV/Adc conversion
• Voltage readings via RS232 to logging machine
• POWER CLIENT: component access rates over ethernet
• POWER SERVER:
  – Convert voltage to measured power
  – Convert access rates to modeled powers
  – Sync together in time window

Tuning Benchmarks
[Figure: measured vs. modeled power for “Branch exercise” (Taken rate: 1), “L1Dcache” (Hit Rate: 0.1), “High-Low” and “Fast”]

Component Breakdowns for “branch_exercise”
[Figure: component breakdowns, with colors for 4 CPU subsystems; visible labels include Execution and Issue - Retire]

Benchmark Power Breakdowns
[Figure: benchmark power breakdowns, annotated with high bus and L2 power, high L1 cache power, and high issue, exec. & branch power]

SPEC2000 Results
• VPR Elaboration: (Integer benchmark)
  – 2 runs: 1st Placement, 2nd Route
  – 1st run much stable power, 2nd more variable
  – Placement has higher miss than route <L2 pwr>
• Equake Elaboration: (FP benchmark)
  – Initialization and computation phases
  – FP intensive mesh computation phase
  – Initialization with high complex IA32 instructions
  – Significant FPE power due to x87_SIMD_moves
• Twolf Elaboration: (Integer benchmark)
  – Several loop computations traversing memory <High Memory Power>
  – Although ~const. total power, component powers have slight gradients

Average SPEC Total Powers
• 1st set: Overall, 2nd set: Non-idle power
• Average difference between measurement and estimation: 3W
• Worst case: Equake (5.8W)

Stdev of SPEC Total Powers
• 1st set: Overall, 2nd set: Non-idle power
• Average difference: 2W
• Worst case: Vortex (3.5W)

Thermal Model
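
The measurement chain described under "P4 Power Measurement Setup" (clamp ammeter on the 12V lines, 1mV/Adc conversion, power vs. time window) can be sketched in a few lines of Python. This is a minimal illustration, not the PowerMeter/PowerPlotter code: the function names, the sample format and the averaging window are assumptions, and regulator losses between the 12V lines and the CPU are ignored.

```python
from collections import defaultdict

SUPPLY_VOLTS = 12.0          # clamp sits on the 12 V lines (from the slides)
CLAMP_VOLTS_PER_AMP = 0.001  # 1 mV per A DC conversion (from the slides)


def clamp_reading_to_power(clamp_volts):
    """Convert one DMM clamp-voltage reading into watts on the 12 V lines."""
    current_amps = clamp_volts / CLAMP_VOLTS_PER_AMP
    return current_amps * SUPPLY_VOLTS


def windowed_power(samples, window_s):
    """Average power per time window.

    samples: iterable of (timestamp_s, clamp_volts) pairs (hypothetical format).
    Returns a list of (window_start_s, mean_power_watts), ordered by time.
    """
    buckets = defaultdict(list)
    for timestamp_s, clamp_volts in samples:
        window_index = int(timestamp_s // window_s)
        buckets[window_index].append(clamp_reading_to_power(clamp_volts))
    return [
        (index * window_s, sum(powers) / len(powers))
        for index, powers in sorted(buckets.items())
    ]


if __name__ == "__main__":
    # Made-up readings: 3 mV, 5 mV and 4 mV correspond to 3 A, 5 A and 4 A.
    readings = [(0.0, 0.003), (0.5, 0.005), (1.2, 0.004)]
    for start, watts in windowed_power(readings, window_s=1.0):
        print(f"t={start:.1f}s  P={watts:.1f} W")
```

Averaging per window is one simple way to line measured power up with counter samples taken over the same window; the slides do not say how the original tools aggregate readings.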

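The trace cache example and the total-power rule from "Access Rates => Component Powers" can be written out as a short Python sketch. Only the trace cache formula and the 8W idle power come from the slides; the function names, the component names other than the trace cache, and all numeric inputs below are hypothetical placeholders.

```python
IDLE_POWER_WATTS = 8.0  # measured idle power quoted in the slides


def trace_cache_power(access_rate_tc, access_rate_id, max_power_tc,
                      nongated_clk_power):
    """Power(TC) = [Access-Rate(TC)/3 + Access-Rate(ID)] x MaxPower(TC)
    + non-gated TC clock power.

    The division by 3 reflects the 3 uops/cycle delivered in deliver mode,
    against 1 uop/cycle in build mode.
    """
    scaled_rate = access_rate_tc / 3 + access_rate_id
    return scaled_rate * max_power_tc + nongated_clk_power


def total_power(component_powers, idle_power=IDLE_POWER_WATTS):
    """Sum of the modeled component powers plus the measured idle power."""
    return sum(component_powers.values()) + idle_power


if __name__ == "__main__":
    # Illustrative inputs only; real values come from counters and the layout.
    tc_watts = trace_cache_power(access_rate_tc=1.5, access_rate_id=0.25,
                                 max_power_tc=4.0, nongated_clk_power=0.5)
    modeled = {"trace_cache": tc_watts, "l1_cache": 2.0}  # 2 of the 22 components
    print(f"TC power: {tc_watts:.2f} W, total: {total_power(modeled):.2f} W")
```

In the full model the dictionary would hold all 22 components, each with its own heuristic; the slides give the exact expression only for the trace cache.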