Optimizing Performance and Power in Windows Embedded Compact 7

Douglas Boling
Boling Consulting Inc.

About Douglas Boling

• Independent consultant specializing in and Windows Embedded Compact (Windows CE)
  – On-Site Instruction
  – Consulting and Development

• Author
  – Programming Embedded Windows CE
    • Fourth Edition

Agenda

• Concepts

• Performance strategies

• Power strategies

Fundamentals

• Embedded hardware is slow
  – It’s designed that way intentionally

• Slower hardware means
  – Smaller
  – Better battery life
  – Cooler (temperature wise)
  – Less expensive

• Software performance impacts hardware design
  – Users expect a level of performance
  – With slow software, hardware has to be made faster

Speed Limiters

• Devices are “bus limited”
  – Performance of the device is limited by memory bandwidth

• Embedded CPUs are fast in the core
  – Computational ability is fairly good

• Often, no floating point
  – Except
    • Some ARM CPUs

Bad Caching Algorithms

• Most BSPs were designed assuming single-level caches
  – Or relatively small L2 caches

• BSPs may flush the entire L2 cache unnecessarily
  – Huge impact on performance
  – If only a small region needs flushing, look at flushing by address

• Solution
  – Go over the cache routines with a fine-tooth comb
  – Specialized cache algorithms may be necessary
  – Take the time, it’s worth it!

Excessive Copying of Data

• Try to pass pointers, not data

• Use memory-mapped objects to share buffers between apps

• Have the driver work on the application’s buffers
  – This can be a stability/security problem

Frequent Calls Across “Boundaries”
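A minimal C sketch of the "pass pointers, not data" advice from the Excessive Copying of Data slide. The `Frame` type and both function names are hypothetical, made up for illustration. The by-value version copies the whole 4 KB structure on every call; the pointer version passes only an address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical buffer type, for illustration only. */
typedef struct {
    uint8_t pixels[4096];
    size_t  length;        /* number of valid bytes in pixels[] */
} Frame;

/* Copying version: the whole structure is copied for every call. */
uint32_t checksum_by_value(Frame frame)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < frame.length; i++)
        sum += frame.pixels[i];
    return sum;
}

/* Pointer version: only an address is passed; the data stays where it is. */
uint32_t checksum_by_pointer(const Frame *frame)
{
    assert(frame != NULL);
    uint32_t sum = 0;
    for (size_t i = 0; i < frame->length; i++)
        sum += frame->pixels[i];
    return sum;
}
```

Both functions return the same result; the difference is only in how much memory traffic each call generates.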

• Calling between applications and drivers

• Calling between managed and native code

• Solution
  – Think “chunky”, not “chatty”
    • Make fewer calls with larger amounts of data

Frequent Registry Access
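The "chunky, not chatty" rule can be sketched in portable C. `device_write` below is a hypothetical stand-in for any expensive boundary crossing (application to driver, managed to native); it only counts how often it is entered, so the two calling styles can be compared.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical "device" behind an expensive call boundary. */
static unsigned long g_boundary_calls = 0;
static unsigned char g_device[256];
static size_t        g_device_len = 0;

/* Stand-in for the boundary crossing; counts calls and stores the data. */
size_t device_write(const unsigned char *data, size_t len)
{
    assert(data != NULL);
    g_boundary_calls++;
    if (len > sizeof g_device - g_device_len)
        len = sizeof g_device - g_device_len;   /* clamp to free space */
    memcpy(g_device + g_device_len, data, len);
    g_device_len += len;
    return len;
}

/* Chatty: one crossing per byte. */
void send_chatty(const unsigned char *data, size_t len)
{
    for (size_t i = 0; i < len; i++)
        device_write(&data[i], 1);
}

/* Chunky: one crossing for the whole buffer. */
void send_chunky(const unsigned char *data, size_t len)
{
    device_write(data, len);
}

unsigned long boundary_calls(void) { return g_boundary_calls; }
void reset_device(void) { g_boundary_calls = 0; g_device_len = 0; }
```

Sending a 64-byte message costs 64 crossings with `send_chatty` and one with `send_chunky`; the payload is identical.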

• Read registry data once and save in memory

• Keep registry keys open if reading multiple values

• Registry not good for interprocess communication

• The RAM-based registry is much faster than the hive registry

Mounting Databases

• Database mounts are very slow
  – Lots of work by the OS

• Make sure a process mounts a DB only once
  – Share code within application

Data Alignment

• RISC processors hate non-aligned memory accesses
  – Older ARM processors used to just throw an exception
    • Not a bad idea!

• Unaligned accesses can increase bus access/cache times 4x!

• Don’t pack structures
  – Key words to check for:
    • “UNALIGNED”
    • #pragma pack directive
    • /Zp compiler option

Excessive Switching
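A short C sketch of what packing does to the layout discussed on the Data Alignment slide. The record types are hypothetical, and the offsets in the comments assume a typical ABI where `uint32_t` is 4-byte aligned. When an external format forces an unaligned field, copying it out with `memcpy` avoids dereferencing a misaligned pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Natural layout: the compiler pads so that 'value' sits at offset 4. */
typedef struct {
    uint8_t  tag;
    uint32_t value;
} NaturalRecord;

/* Packed layout: 'value' lands at offset 1, so every access is unaligned. */
#pragma pack(push, 1)
typedef struct {
    uint8_t  tag;
    uint32_t value;
} PackedRecord;
#pragma pack(pop)

/* Reads a 32-bit value from an address that may be unaligned. */
uint32_t read_u32_unaligned(const void *p)
{
    assert(p != NULL);
    uint32_t v;
    memcpy(&v, p, sizeof v);   /* byte-wise copy, safe at any alignment */
    return v;
}
```

The packed record saves three bytes per instance, at the price of an unaligned access each time `value` is touched.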

• Threads that switch between each other in the sub-100 ms range can be problematic

• Creating multiple threads ‘just because’ isn’t a great idea
  – Why is the code multithreaded?

• Know the characteristics of the CPU
  – Is it multicore?

• Has your code been recompiled for WEC?

Bad Code or Data Locality

• Results in excessive Translation Lookaside Buffer (TLB) misses

• Very hard to notice without looking
  – Use profiler on MIPS and SHx
  – Use hardware debugger or In Circuit Emulator to track on ARM and x86

• This can have a massive impact on performance

• Consider hardware debugger to track TLB misses
  – Lauterbach debugger

The Page Pool
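To make the data-locality point concrete, the C sketch below counts how often consecutive accesses to a 2-D array land on a different page. This is a rough proxy for TLB pressure, not a measurement of TLB misses. The 4 KB page size, the 4-byte element size, and the assumption that the array starts on a page boundary are all illustrative.

```c
#include <assert.h>
#include <stddef.h>

#define ASSUMED_PAGE_SIZE 4096u   /* illustrative page size in bytes */
#define ELEM_SIZE         4u      /* illustrative element size in bytes */

/* Walks a rows x cols array (stored row by row) and counts how many
   accesses fall on a different page than the access before them. */
unsigned long page_switches(size_t rows, size_t cols, int column_major)
{
    assert(rows > 0 && cols > 0);
    unsigned long switches = 0;
    size_t last_page = (size_t)-1;            /* no page touched yet */
    size_t outer = column_major ? cols : rows;
    size_t inner = column_major ? rows : cols;

    for (size_t i = 0; i < outer; i++) {
        for (size_t j = 0; j < inner; j++) {
            size_t row  = column_major ? j : i;
            size_t col  = column_major ? i : j;
            size_t page = ((row * cols + col) * ELEM_SIZE) / ASSUMED_PAGE_SIZE;
            if (page != last_page) {
                switches++;
                last_page = page;
            }
        }
    }
    return switches;
}
```

For a 64 x 1024 array, where each row fills exactly one assumed page, walking row by row changes page 64 times, while walking column by column changes page on every one of the 65,536 accesses.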

• The Page Pool is the amount of RAM used for code pages
  – Prevents the paging system from using all the RAM for code

• Two Page Pools
  – Loader Pool
    • Used for demand paging of code and r/o data from apps
  – File Pool
    • Used for demand paging of memory mapped files
    • Registry and Databases use this pool as well

Page Pool Parameters

• Target: Preferred size of RAM used
• Maximum: Maximum RAM allowable
• Release Increment: Amount freed lazily if above target
• Critical Increment: Amount freed quickly if within 1 “critical increment” of maximum
• Normal Priority: Priority of discard thread during normal operation
• Critical Priority: Priority of discard thread during critical operation

Page Pool Default Values

                     Loader Pool   File Pool
Target               3 MB          1 MB
Maximum              8 MB          10 MB
Release Increment    128 KB        64 KB
Critical Increment   64 KB         64 KB
Normal Priority      255           255
Critical Priority    247           251

• Override
  – Config.bib fixup variables
  – IOCTL_HAL_GET_POOL_PARAMETERS

Other Things That Are Slow
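How the page pool parameters relate to each other can be sketched as a toy model in C. This is not the kernel's implementation: the type and function names are hypothetical, and the exact boundary conditions (strict or inclusive comparisons) are assumptions. The defaults are the loader pool values from the Page Pool Default Values slide.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for the six page pool parameters. */
typedef struct {
    size_t target;              /* preferred size of RAM used */
    size_t maximum;             /* maximum RAM allowable */
    size_t release_increment;   /* amount freed lazily above target */
    size_t critical_increment;  /* amount freed quickly near maximum */
    int    normal_priority;     /* discard thread priority, normal */
    int    critical_priority;   /* discard thread priority, critical */
} PoolParams;

typedef enum { TRIM_NONE, TRIM_LAZY, TRIM_CRITICAL } TrimMode;

/* Loader pool defaults as listed on the slide. */
static const PoolParams LOADER_POOL_DEFAULTS = {
    3u * 1024 * 1024, 8u * 1024 * 1024, 128u * 1024, 64u * 1024, 255, 247
};

/* Picks a trimming mode for a given amount of pool memory in use. */
TrimMode pool_trim_mode(const PoolParams *p, size_t used)
{
    assert(p != NULL && p->critical_increment <= p->maximum);
    if (used + p->critical_increment >= p->maximum)
        return TRIM_CRITICAL;   /* within one critical increment of maximum */
    if (used > p->target)
        return TRIM_LAZY;       /* above target: free lazily */
    return TRIM_NONE;
}

/* Priority the discard thread would run at in this model. */
int pool_trim_priority(const PoolParams *p, size_t used)
{
    return pool_trim_mode(p, used) == TRIM_CRITICAL ? p->critical_priority
                                                    : p->normal_priority;
}
```

With the loader defaults, 2 MB in use needs no trimming, 4 MB is trimmed lazily at priority 255, and anything within 64 KB of the 8 MB maximum is trimmed at priority 247.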

• String compares through NLS functions
  – CompareString is a slow compare

• Using Mutexes instead of Critical Sections
  – If intraprocess, always use Critical Sections
    • Mutexes only necessary if timeout needed
  – Rule: Never create an unnamed Mutex

System Level Video Performance

• The video subsystem is the slowest part of the system

• Disable Exploding Window animation

; Disable shell rectangle animation
[HKEY_LOCAL_MACHINE\SYSTEM\GWE]
    "Animate"=dword:0

• Consider skinning the shell to reduce complexity
  – Avoid the ‘3D’ look, it’s slow
  – This is what the does

Silverlight For Embedded Performance

• Silverlight demands high performance GPUs

• If only a slow GPU is available
  – Silverlight still useful
    • Great for creating unique look and feel
  – Avoid
    • Alphablends
    • Multiple transitions
    • Gradients

• Solution
  – Have U/I designer test on real hardware
    • They’ll soon learn what they can do and what they can’t

Power Management

• Power Management is important
  – Even on line-powered machines

• Power Consumption == Heat
  – Thermal design
  – Fans!

Power Manager

• Provides various levels of operation based on power consumption
  – On
  – Idle
  – Screen Off
  – Etc…

• Power Manager sets each driver to a power level depending on system level
  – Configurable via registry

• Levels change programmatically or via input

Programming for Lower Power

• Thread Management
  – Block, always…
  – Sleeps burn power

• Hardware management
  – Analyze line drivers (serial, USB)
    • Turn off when not needed
  – GPIOs
    • Review GPIOs to ensure most efficient use
  – Clocks
    • Turn off the clocks to components not in use

Summary

• Performance and Power are closely related
  – Well designed code will be efficient

• Performance must be designed in
  – It can RARELY be engineered in later

• Power leaks from everywhere
  – It takes an experienced engineer to track down the leaks
  – Knowledge of the hardware is critical

• TEST ON TARGET HARDWARE

Questions…

Doug Boling
Boling Consulting Inc.
www.bolingconsulting.com
dboling @ bolingconsulting.com

© 2011 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.