Energy efficiency of mobile video decoding

Tero Rintaluoma, Hantro Products Oy, Kiviharjunlenkki 1, FI-90220 Oulu, Finland. Email: [email protected]
Olli Silven, Department of Electrical and Information Engineering, P.O.B. 4500, FI-90014 University of Oulu, Finland. Email: [email protected]

Abstract-In this paper, we consider the energy efficiency of video codec implementations for mobile devices in a top-down manner. We start from typical applications and analyse device architectures, codec implementations, and software platforms. The physical size of mobile devices limits their heat dissipation, while the battery capacity must be used sparingly to provide a satisfactory untethered active use time. Together with the versatile capabilities required of the devices, these are essential constraints that must be taken into account from hardware to application software design. In video decoding, additional constraints come from the need to support multiple digital video coding standards and from the platform-oriented design regimes of the device manufacturers.

TABLE I
CHARACTERISTICS OF TYPICAL PORTABLE MULTIMEDIA DEVICES.

Characteristic | Laptop PC | Portable handheld multimedia terminal | Typical ratio
Display size (inches) | 12-15 | 2-4 | 5x (area 20x)
Display resolution (pixels) | 1024x768-1600x1200 | 176x208-640x240 | 15x
Processor DRAM (MB) | 256-1024 | 16-64 | 16x
Processor clock (GHz) | 1-3 | 0.1-0.3 | 10x
Max. power dissipation (W) | 60 | 3 | 20x
Surface area (cm2) | 1500 | 150 | 10x
Heat dissipation (mW/cm2) | 40 | 20 | 2x
Video resolution | 720x576/25Hz | 640x480/30Hz | 1x
Battery capacity | 4000mAh/14.4V | 1000mAh/3.6V | 15x

I. INTRODUCTION

Wireless multimedia applications typically use content provided via the web or broadcast services such as DVB-H [1], or play back locally stored music and movies. In addition, users can create content and stream it to the network for redistribution, or make video calls that require real-time streaming. The popularity of laptop PCs as DVD players and as a means to access multimedia content via public WiFi networks may predict the future uses of wireless terminals. If the uses are similar, we may suspect that the same applies to the technical solutions.

The requirements for wireless mobile terminals are tough, especially from the energy efficiency point of view. At the same time, high demands are placed on usability, which includes not only the intuitiveness of the user interface but also the length of active use time between battery charges.

A typical laptop PC user carries a charger, connects to the mains whenever possible, and uses the device while sitting down. In contrast, a handheld device is expected to provide a longer active use time, as it is used anywhere in an untethered manner and is charged only at night. Another energy efficiency related aspect is heat dissipation. This is mostly a concern for handhelds, as laptops are operated on a desk most of the time.

The characteristics of typical wireless handheld and laptop multimedia devices are compared in Table I. The application requirements are almost the same, but the handheld devices provide the services using around 1/10th of the size, energy, and processor speed. The maximum heat dissipation via surfaces can be only half of the laptop level to prevent the devices from becoming too hot to handle. The power consumption of the larger laptop display explains less than 10W of the difference. From the usability point of view, the user interface is the biggest difference between these categories of devices.

The usability of handheld devices is critically dependent on their active use times, which in turn depend on energy efficiency. Table II presents the power consumption breakdown of an early 3G phone in 384kbit/s video streaming mode [2], clearly showing the limitations of the multimedia implementation. Only 600mW is available for application processing, in this case decoding a video bit stream into sequences of image frames. This is a very hard requirement for software solutions. With a 1000mAh battery the active time is limited to around an hour. The application power needs of the PDA device [3] are in the same region, while its larger display and frame buffer memory explain most of its higher power consumption. The hypothetical power budget of a future device that provides three hours of active use time has been estimated from data on the most power efficient system components available today.

The increasing bandwidth needs and the increasing complexity of air interfaces make it very difficult to save in RF and baseband signal processing, while essential efficiency improvements can be expected from display technologies, e.g., by switching from TFT LCDs to OLEDs. However, application processing is usually the first target when looking for ways to cut down power.

In the following, we consider energy efficiency issues from the point of view of mobile video decoders. Comparative evaluations are presented where data is available.
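The battery figures quoted above follow from dividing the stored battery energy by the system power. The Python sketch below recomputes them from the Table II budgets. It makes two assumptions. First, it uses a 3.6V nominal cell voltage, taken from the handheld column of Table I, because Table II gives only mAh. Second, its `peukert_hours` helper applies Peukert's law [4], discussed in Section II, with a rated discharge time and exponent that are hypothetical values chosen for illustration, not measured data.

```python
"""Back-of-the-envelope battery life for the Table II power budgets.

Assumptions: 3.6 V nominal cell voltage (from Table I, not Table II);
the Peukert rated time and exponent are hypothetical illustration values.
"""

NOMINAL_VOLTAGE_V = 3.6  # assumed; Table I lists 1000mAh/3.6V for handhelds

# Table II power breakdowns in mW (PDA column omitted: no RF figure given)
PHONE_3G_MW = {
    "application processor and memories": 600,
    "display, audio, keyboard and backlights": 1000,
    "misc. memories": 200,
    "RF and cellular modem": 1200,
}
FUTURE_DEVICE_MW = {
    "application processor and memories": 100,
    "display, audio, keyboard and backlights": 400,
    "misc. memories": 100,
    "RF and cellular modem": 1200,
}


def ideal_hours(capacity_mah: float, load_mw: float,
                voltage_v: float = NOMINAL_VOLTAGE_V) -> float:
    """Linear model: stored energy (mWh) divided by a constant load (mW)."""
    return capacity_mah * voltage_v / load_mw


def peukert_hours(capacity_mah: float, load_mw: float,
                  rated_hours: float, k: float,
                  voltage_v: float = NOMINAL_VOLTAGE_V) -> float:
    """Peukert's law t = H * (C / (I * H))**k with load current I = P / V.

    rated_hours (H) and k are hypothetical inputs; k = 1 gives ideal_hours.
    """
    current_ma = load_mw / voltage_v
    return rated_hours * (capacity_mah / (current_ma * rated_hours)) ** k


if __name__ == "__main__":
    phone_total = sum(PHONE_3G_MW.values())        # 3000 mW
    future_total = sum(FUTURE_DEVICE_MW.values())  # 1800 mW
    print(f"3G phone: {ideal_hours(1000, phone_total):.1f} h")          # 1.2 h
    print(f"future device: {ideal_hours(1500, future_total):.1f} h")    # 3.0 h
    # Hypothetical derating with H = 5 h and k = 1.05
    print(f"3G phone, Peukert: {peukert_hours(1000, phone_total, 5.0, 1.05):.2f} h")
```

With an exponent above one, the estimate falls below the linear value at high load currents. This is the direction of the non-linearity that Section II discusses.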

1-4244-1058-4/07/$25.00 © 2007 IEEE

TABLE II
POWER CONSUMPTION BREAKDOWN EXAMPLES OF DEVICES.

System component | 3G phone in video streaming mode [2] (mW) | PDA device [3] (mW) | Expected future mobile devices (mW)
Application processor and memories | 600 | 833 | 100
Display, audio, keyboard and backlights (UI) | 1000 | 2441 | 400
Misc. memories | 200 | 754 | 100
RF and cellular modem | 1200 | N/A | 1200
Total | 3000 | 4028 | 1800
Battery capacity (mAh) / usage time | 1000 / 1h | N/A | 1500 / 3h

TABLE III
POCKET SIZED MULTIMEDIA USE CASES FOR A PROCESSOR PLATFORM ON 3.6V 800MAH BATTERY.

Use case (MPEG-4) | Target bitrate (Mb/s) | Power consumption (mW) | Usage time (h)
Video capture | 1-4 | 350 | 8
Movie playback | 1 | 500 | 6

II. POWER CONSUMPTION AND BATTERY LIFE

A standard way to estimate the battery life time at a given rate of discharge is the well-known Peukert's law [4], although it can be somewhat inaccurate for mobile devices. The inaccuracy arises because battery capacity depends on temperature, and in the absence of active cooling the temperature in turn depends strongly on the load current [5]. Figure 1 shows the actual behaviour of a 640mAh Li-ion battery in a PDA device under constant load, based on the experiments in [4]. As the battery life is a non-linear function of the load current, improved power efficiency in the knee region of the curve results in super-linear improvements.

[Figure: discharge time versus discharge current (0-700 mA).]
Fig. 1. Discharge time of 640mAh Li-ion battery under constant load.

Table III shows estimates provided by a multimedia processor supplier [6] for a hypothetical mobile device with a smart, energy efficient display. The video standard in question is MPEG-4 SP [7] at 30 frames/s, which is close to DVD quality. The power results are for a full system including all components such as camera, display, speakers, etc. However, this device lacks wireless connectivity. For comparison, the Microsoft Zune player is based on the same processor technology and achieves 4h of video playback in QVGA format on an 800mAh 3.7V battery.

In the case of Table III, video encoding consumes less power than decoding. This is explained by the hardware acceleration provided for encoding, while decoding is performed in software. We may conclude that hardware for video decoding would cut the consumption figures of playback below the currently estimated ones. The assumed processing platform implementation technology in this case is 90nm CMOS.

III. MOBILE VIDEO APPLICATIONS

The multimedia applications of mobile devices include their use as camcorders, video phones or mobile digital TVs, and these uses have an impact on system designs and power budgets. The more versatile devices tend to be less energy efficient due to the added software and hardware complexity and the platform technologies needed to support rapid development.

Camcorder use requires real-time encoding and preview capabilities, while playback requires decoding and display, as illustrated in Figure 2. Encoding in consumer use is mostly limited to short D1 (720x576 at 25 frames/s) sequences within the memory capacity of the device, but it requires significantly more processing and power than decoding. Typical consumer camcorders need around 8-9W of power in encoding mode, while their displays are in the same size class as those of multimedia capable mobile phones. Approximately 1-2W [8] of the disparity is explained by the electro-mechanics of the DVD drive and the additional electrical interfaces of the camcorders, but a significant portion, 6-7W, comes from the computing platform and display interface.

[Figure: decoding flow (mass memory, post processor, display device) and encoding flow (camera interface, pre-processor, encoder, mass memory, display device).]
Fig. 2. Decoding and encoding data flow.

Mobile TV is about to create a demand for terminals that support several simultaneously decoded program streams to provide living thumbnails, as shown in Figure 3. This feature comes from the expectation of seamless channel surfing despite the energy saving time-slicing technique used in the air interface of DVB-H. However, thumbnails and split displays effectively multiply the power and memory bandwidth needs of the decoding task. In practice, either at least two
decoders, as in some digital TV set-top boxes, or the shared use of the decoding resources is needed.

TABLE IV
PROCESSOR CYCLES/S AND POWER NEEDS OF MPEG-4 AND H.264 DECODERS (VGA 30 FRAMES/S, 470KBIT/S) ON THREE PROCESSORS.

Processor | EPI (nJ) | MPEG-4 cycle rate (MHz) | MPEG-4 power needs (mW) | H.264 cycle rate (MHz) | H.264 power needs (mW)
Pentium 4 (Cedar Mill) | 48 | 273 | 5060 | 725 | 13440
Pentium M (Dothan) | 15 | 400 | 2320 | 1060 | 6140
Core Duo (Yonah) | 11 | 280 | 1190 | 744 | 3160
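The power columns of Table IV can be recomputed from the other columns. Dividing the cycle rate by the CPI gives instructions per second, and multiplying that by the EPI gives power. In convenient units, P[mW] = f[MHz] x EPI[nJ] / CPI. The Python check below uses the CPI of 2.59 assumed for the Intel processors in Section IV [12]. The dictionary layout and function names are illustrative choices for this sketch.

```python
"""Recompute the Table IV power columns from cycle rate, CPI and EPI."""

CPI_INTEL = 2.59  # cycles per instruction assumed in Section IV [12]

# processor: (EPI nJ, MPEG-4 MHz, MPEG-4 mW, H.264 MHz, H.264 mW), Table IV
TABLE_IV = {
    "Pentium 4 (Cedar Mill)": (48, 273, 5060, 725, 13440),
    "Pentium M (Dothan)": (15, 400, 2320, 1060, 6140),
    "Core Duo (Yonah)": (11, 280, 1190, 744, 3160),
}


def power_mw(cycle_rate_mhz: float, epi_nj: float, cpi: float) -> float:
    """MHz / CPI = million instructions/s; times nJ/instruction gives mW."""
    return cycle_rate_mhz / cpi * epi_nj


def instructions_millions(cycle_rate_mhz: float, cpi: float) -> float:
    """Million instructions per second of video (compare with Table VII)."""
    return cycle_rate_mhz / cpi


if __name__ == "__main__":
    for name, (epi, f_mp4, p_mp4, f_264, p_264) in TABLE_IV.items():
        print(f"{name}: MPEG-4 {power_mw(f_mp4, epi, CPI_INTEL):.0f} mW "
              f"(table {p_mp4}), H.264 {power_mw(f_264, epi, CPI_INTEL):.0f} mW "
              f"(table {p_264})")
```

The same division by CPI also yields the Intel instruction counts listed later in Table VII. For example, 273MHz / 2.59 gives about 105 million instructions per second.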

TABLE V
POWER NEEDS OF H.264 DECODERS ON ARM PROCESSORS (VGA 30 FRAMES/S, 512 KBIT/S).

Processor | Cycle rate (MHz) | CPU power consumption (mW) | CPU+memory power consumption, est. (mW)
ARM10 (1022E) | 384 | 222 | 450
ARM11 (1136J-S) | 434 | 332 | 550

Fig. 3. Picture-In-Picture mobile television demonstration.

These kinds of demands need to be weighed against the available implementation technologies for video. The two basic alternatives are a completely software based approach and hardware acceleration. The energy efficiency of software is lower, whereas hardware acceleration, when implemented in a monolithic manner, can result in inflexibility. The actual design problem is to find cost and energy efficient solutions.

In the case of our mobile TV example, one option, borrowed from the desktop world, is to employ several instances of software based decoders on a fast processor that supports frequency and voltage scaling. For instance, only about 10-15% of the maximum performance of a Core Duo processor is needed for the H.264/AVC [9] decoding task at CIF and QVGA resolutions [10]. This choice can be defended on compatibility grounds, although it is an energy efficiency and cost compromise.

In the remaining sections of this article we consider the software and hardware options for implementing the video capabilities. The key observation is the rapidly narrowing gap between mobile and desktop processor technologies.

IV. SOFTWARE BASED DECODERS

The power needs of mobile and general purpose processor based platforms in multimedia applications are approaching each other, and in the future they may not differ significantly. To illustrate these developments, Table IV shows the nominal energy per instruction (EPI) data of Intel PC processors normalized to the same 65nm 1.33V silicon process [10]. The table also contains the cycle rates required for MPEG-4 and H.264 decoding [11], and the approximate power consumptions of the decoders when the processors are clocked at their nominal rates without dynamic power management. All the video figures have been scaled from the data given in [11]. The power consumption calculations assume 2.59 cycles per instruction (CPI) [12].

For comparison, Table V shows the power consumptions of selected ARM architecture processors when decoding 30 frames/s VGA baseline H.264 bitstreams. DVD/MPEG-2 [13] decoding at 3Mbit/s and MPEG-4 decoding at 512kbit/s need 30-50% of these figures, that is, around 200-300mW with memories. As the estimated total power consumptions show, the cost of accessing external memories may more than double the power needs. The processor implementations have been roughly scaled to the same technology as above (65nm, 1.33V) by assuming that the power consumption is proportional to the supply voltage squared and to the design rule (line width). The CPIs (cycles per instruction) with this application are 1.2 and 1.4 for ARM10 and ARM11, respectively. The shorter pipelines of these cores explain why their CPIs are lower than those of the PC processors.

The energy efficiency difference between desktop and embedded processors is significant, but the gap is narrowing, which can be expected to affect technology selections and platform designs. Current mobile processors offer low power technology and lower cycle rates, but at the cost of lower peak performance compared to PC processors.

Table VI shows the power needs of ARM series cores implemented in a 90nm 1V CMOS process. The EPI values have been calculated from H.264 video decoding, in which the CPIs for ARM7 and ARM9 are 2.2 and 1.6, respectively. Evidently, architectural changes aimed at higher performance involve energy efficiency compromises: the EPI grows from 0.22nJ for ARM7 to 0.63nJ for ARM11.

TABLE VI
ENERGY CHARACTERISTICS AND SILICON AREAS OF ARM PROCESSORS.

Processor | Clock frequency (MHz) | Silicon area (mm2) | Power consumption (mW/MHz) | EPI (nJ)
ARM7 (7EJ-S) | 260 | 0.5 | 0.1 | 0.22
ARM9 (926EJ-S) | 500 | 1.55 | 0.29 | 0.46
ARM10 (1026EJ-S) | 540 | 2.45 | 0.45 | 0.54
ARM11 (1136J-S) | 620 | 2.5 | 0.45 | 0.63
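Two relations in this section can be checked numerically. First, the EPI column of Table VI equals the power per MHz multiplied by the CPI. The CPIs are 2.2, 1.6, 1.2 and 1.4 for ARM7, ARM9, ARM10 and ARM11, and mW/MHz is the same as nJ per cycle. Second, the 90nm, 1V figures are scaled to 65nm, 1.33V with power proportional to V^2 and line width.

The Python sketch below applies that scaling to the ARM10 entry of Table VI at the 384MHz cycle rate of Table V. It assumes that the 1026EJ-S figures can stand in for the 1022E core listed in Table V. It is a rough consistency check, not necessarily the procedure the authors used.

```python
"""Consistency checks for Table VI (EPI) and the 90nm -> 65nm scaling."""

# core: (power mW/MHz at 90nm 1V, CPI for H.264, EPI nJ), Table VI and text
TABLE_VI = {
    "ARM7 (7EJ-S)": (0.10, 2.2, 0.22),
    "ARM9 (926EJ-S)": (0.29, 1.6, 0.46),
    "ARM10 (1026EJ-S)": (0.45, 1.2, 0.54),
    "ARM11 (1136J-S)": (0.45, 1.4, 0.63),
}


def epi_nj(mw_per_mhz: float, cpi: float) -> float:
    """mW/MHz is nJ per cycle; times cycles per instruction gives nJ/instr."""
    return mw_per_mhz * cpi


def scale_power(p_mw: float, v_from: float, v_to: float,
                node_from_nm: float, node_to_nm: float) -> float:
    """Rough scaling used in the text: P proportional to V**2 and line width."""
    return p_mw * (v_to / v_from) ** 2 * (node_to_nm / node_from_nm)


if __name__ == "__main__":
    for core, (mw_mhz, cpi, epi) in TABLE_VI.items():
        print(f"{core}: EPI {epi_nj(mw_mhz, cpi):.2f} nJ (table {epi})")
    # Assumption: ARM10 1026EJ-S data used for the 1022E core of Table V,
    # decoding H.264 at 384 MHz, scaled from 90nm/1V to 65nm/1.33V.
    p90 = 384 * TABLE_VI["ARM10 (1026EJ-S)"][0]
    p65 = scale_power(p90, 1.0, 1.33, 90, 65)
    print(f"ARM10: {p90:.0f} mW at 90nm -> {p65:.0f} mW at 65nm (Table V: 222)")
```

The scaled ARM10 value lands within about 1% of the 222mW CPU figure in Table V.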

A. Instruction set efficiencies

To get the whole picture, it is useful to compare the relative instruction counts of video decoder implementations. Table VII reveals that for the above highly optimized MPEG-4 and H.264 decoders, the counts are in the same range on ARM and Pentium processors. We may conclude that the big differences in energy efficiency are at least partly explained by architectural features that are not used at this application level, such as memory management, multitasking support, and superscalar execution.

TABLE VII
INSTRUCTION COUNTS IN MILLIONS FOR DECODING 1 S OF VIDEO BIT STREAM (VGA 30FRAMES/S, 512KBIT/S).

Processor | MPEG-4 | H.264
Pentium 4 | 105 | 280
Pentium M | 154 | 410
Core Duo | 108 | 288
ARM9 | 129 | 322
ARM10 | 129 | 322
ARM11 | 99 | 311

We can estimate that a DVD playback application (decoding of video and audio) consumes about 250-400mW on mobile processors implemented in the 65nm 1.33V CMOS process. These estimates include the estimated memory power consumptions. They can be compared to a software implementation of DVD playback on a Core Duo CPU [8] that, through dynamic voltage and frequency scaling, consumes approximately 700mW, saving at least 500mW from the nominal level of the processor. In other words, the application level EPI of the high-speed PC processor is only about two times higher than that of embedded processors.

Based on Table II, we see that using currently available PC processors for video streaming applications would increase the system power consumption by less than 30%. This can be a very acceptable trade-off, although the higher load currents could reduce the battery lifetime more than proportionally.

In the future, the power efficiency disparities between desktop and mobile processors may become even smaller. The key justifications for using mobile processors may then be just weight and price.

V. HARDWARE BASED DECODERS

Hardware acceleration has potential for high energy efficiency compared to software implementations, because it has no instruction fetch and decode cycles. In addition, a substantial share of the data accesses can be eliminated. In the following subsections we consider both monolithic and finer grained accelerator implementations.

A. Monolithic hardware accelerators

Figure 4 shows the pipelined internal functional blocks of the monolithic MPEG-4 decoder, in which energy efficiency is achieved by clock gating and by avoiding the use of external and shared memories. The data buffers needed between the blocks are implemented either as registers or as memories, depending on the capacity requirements and silicon area consumed.

[Figure: pipeline of functional blocks from input to output.]
Fig. 4. The internal pipelined organization of a monolithic MPEG-4 accelerator.

Table VIII shows the relative energy consumptions of two MPEG-4 decoder implementations that have very similar input data access characteristics. The hardware accelerated solution uses only 5% of the total energy needed by the software solution. Furthermore, the software solution, although optimized, needs 30 times more energy for memory accesses than the accelerated one. The explanation lies in the accesses needed to store intermediate results.

TABLE VIII
ESTIMATED ENERGY SHARES OF SYSTEM COMPONENTS IN MPEG-4 VIDEO DECODING.

Platform component | Hardware decoder [14] | Software decoder [15] | Relative energy (HW:SW)
CPU | 1% (ARM9) | 20% (ARM7) | 1:10
Memories and memory interfaces | 54% | 80% | 1:30
HW decoder | 45% | N/A | N/A
Relative total energy | 1 | 20 | 1:20

If support for multiple coding standards is needed in the same device, software is obviously the more flexible implementation technique. To illustrate the rapid complexity growth of monolithic hardware accelerator designs, Table IX [16] shows four decoders: two software and two hardware implementations. The accelerators are a single-standard and a multi-standard decoder. The software codes for MPEG-4 and H.264 are essentially rewrites that re-use only the instruction set. The hardware implementation, by contrast, re-uses the functional blocks and suffers from an increasing amount of control logic with each added decoder. We can also notice the growth of the internal memory, that is, the buffers between the accelerator blocks. However, the silicon area consumed by buffering is still small in comparison to the total.

The complexity growth is continuing with new standards and extensions such as VC-1 [17] and the main profile of H.264. These add new functional blocks that complicate the
routing of data in multi-standard accelerators, making the silicon area gap to single-standard designs narrower.

TABLE IX
GATE COUNT AND MEMORY NEEDS OF VIDEO DECODER IMPLEMENTATIONS (VGA, 30FRAMES/S).

Implementation | Gate count (kgates) | Memory footprint (kB) | External memory (kB) | Internal memory (kB) | Relative power consumption
Software H.264 on ARM11 processor | N/A | 65 | 1164 | N/A | 2
Software MPEG-4 on ARM11 processor | N/A | 45 | 1061 | N/A | 1
Monolithic hardware multiformat decoder (MPEG-4, H.264, JPEG) | 221 | 105 | 947 | 18 | 0.25 (H.264), 0.06 (MPEG-4)
Monolithic hardware MPEG-4 decoder | 96 | 60 | 938 | 3.7 | 0.05 (5mW)

B. Finer grained accelerators

One way to slow down the complexity growth of hardware solutions is to resort to finer grained acceleration, which allocates more of the control complexity to software. Unfortunately, this may increase the overheads of software/hardware interfacing.

Interrupts are the usual mechanism to synchronize hardware accelerators with software. A typical interrupt overhead in a platform with an operating system is around 300 CPU cycles. On the other hand, the 8x8 IDCT (Inverse Discrete Cosine Transform) that is common in video decoders needs around 300 CPU cycles when implemented in mobile microcontroller software, and less on a suitable DSP or media processor. Although a straightforward hardware implementation executes in a few tens of clock cycles, such a solution is more expensive than software once the interrupt overhead and the actual interrupt service cycles are added.

1) Macroblock accelerators: The next step towards coarser granularity is to employ hardware acceleration for decoding macroblocks; Figure 5 shows the idea for the intra case. The top level of the finite state machine controlling the decoding pipeline is now replaced by software. With MPEG-4, each macroblock may comprise zero to six 8x8 blocks, each of which needs to be variable-length decoded, AC/DC predicted, and IDCT'd. In addition, motion compensation that employs bilinear interpolation of image data is required for inter-coded macroblocks.

[Figure: blocks 0-3, Cb and Cr pass through entropy decoding, RLC decoding, inverse scan, inverse quantization, AC/DC prediction, IDCT and image reconstruction.]
Fig. 5. Decoding of an MPEG-4 intra macroblock.

Depending on the degree of parallelism in hardware, the number of coded blocks, and the type of the macroblock, a typical macroblock accelerator needs around 800-1200 cycles. The interrupts needed for software/hardware synchronization therefore add 20-30% of overhead. As decoding a macroblock in software consumes 5000-6000 cycles, the accelerator enables significant savings. However, a VGA sized bitstream generates 36000 interrupts each second, wasting a significant portion of the processor resources on overheads, extra administration, and control. In practice, the gate counts of macroblock decoders are not much lower, and the flexibility is no better than with monolithic accelerators.

2) Function accelerators: An ideal fine grained accelerator solution should work as shown in Figure 6, where an inter macroblock in an MPEG-4 stream is being decoded. VLD and MV denote Variable Length Decoding and Motion Vector, respectively. The pipeline of the monolithic hardware solution has been replaced by cooperatively scheduled threads that switch between software and hardware execution. The latencies of the accelerator blocks are deterministic and known when the threads are scheduled based on the contents of the input bit stream. Interrupts are not needed, as each hardware execution is guaranteed to have finished before the respective next step in software continues.

[Figure: software and hardware execution steps along a time axis.]
Fig. 6. A use-case for fine grained hardware accelerators.

The energy efficiency achieved through cooperative thread scheduling is 2-4 times better than with software implementations. This is much better than with interrupt driven schemes, which are out of the question due to excessive synchronization overheads. Nevertheless, the gap to monolithic hardware accelerators is substantial, as they are still 5-10 times more efficient.

VI. SOFTWARE INTERFACING ISSUES

As mobile multimedia devices must satisfy a large number of requirements, the result tends to be a complex hardware and software system. For instance, support for several standards, such as JPEG, H.264, MPEG-4 and VC-1, is needed in the same execution environment. The interfacing solutions, which include multitasking operating systems, APIs, and middleware, can have big impacts on video performance.

Mobile operating systems, such as Symbian [18], have been designed to operate at low overheads, although the application start-up costs can be high. However, few video applications interface to the plain operating system. Instead, they are built to run on multimedia interfacing frameworks. Much of the software overhead originates from these layers, which provide compatibility between platform and operating system versions.
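Returning to the synchronization arithmetic of Section V-B, the numbers work out as follows. A VGA frame has 40 x 30 = 1200 macroblocks of 16x16 pixels, so at 30 frames/s a per-macroblock interrupt fires 36000 times each second. Adding a 300-cycle interrupt overhead to 800-1200 accelerator cycles makes the overhead roughly 20-27% of the time spent per macroblock.

The Python sketch below reproduces these figures under two assumptions. First, it measures the overhead as a share of accelerator-plus-interrupt cycles, which is one reading of the 20-30% figure. Second, its conversion to milliwatts borrows the 0.45 mW/MHz ARM11 entry of Table VI, so the power number is only indicative.

```python
"""Interrupt-synchronization overhead for a macroblock accelerator (Sec. V-B)."""

MB_SIZE = 16                 # MPEG-4 macroblock: 16x16 luminance pixels
IRQ_OVERHEAD_CYCLES = 300    # typical interrupt overhead with an OS (Sec. V-B)
MW_PER_MHZ = 0.45            # assumption: ARM11 at 90nm, borrowed from Table VI


def macroblocks_per_second(width: int, height: int, fps: int) -> int:
    """With one interrupt per macroblock, this is also interrupts per second."""
    return (width // MB_SIZE) * (height // MB_SIZE) * fps


def overhead_share(accel_cycles: int,
                   irq_cycles: int = IRQ_OVERHEAD_CYCLES) -> float:
    """Interrupt cycles as a fraction of accelerator-plus-interrupt cycles."""
    return irq_cycles / (accel_cycles + irq_cycles)


def irq_load_mhz(width: int, height: int, fps: int,
                 irq_cycles: int = IRQ_OVERHEAD_CYCLES) -> float:
    """CPU cycle rate (MHz) consumed by interrupt overhead alone."""
    return macroblocks_per_second(width, height, fps) * irq_cycles / 1e6


if __name__ == "__main__":
    rate = macroblocks_per_second(640, 480, 30)
    print(f"interrupts/s: {rate}")  # 36000
    for cycles in (800, 1200):
        print(f"{cycles} accelerator cycles -> {overhead_share(cycles):.0%} overhead")
    mhz = irq_load_mhz(640, 480, 30)
    print(f"interrupt load: {mhz:.1f} MHz, about {mhz * MW_PER_MHZ:.1f} mW")
```

At about 10.8MHz of pure synchronization load, the interrupt scheme alone costs roughly as much as the complete monolithic MPEG-4 accelerator in Table IX (5mW). This illustrates why the cooperative scheduling of Figure 6 avoids interrupts.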

A. The cost of a multimedia software framework

The Symbian Multimedia Framework (MMF) is an example of an interfacing solution. It is a multi-threaded approach for audio and video streaming functionalities, and is intended to standardize the integration of software and hardware based video coding solutions. For that purpose it includes mechanisms to build portable multimedia applications. Based on experiments made on an actual ARM11 platform, the cost of accessing the video decoder functionality through this framework is 2.14MHz, which translates to approximately 1mW of power.

The bulk of the overhead cycles turns out to originate from middleware. Previously, Verhoeven et al. (2001) found that the performance of different middleware solutions varied between 260 and 7500 calls per second [19]. This translates into at least tens of thousands of cycles per call and is in line with our observations. Table X shows the typical overheads of interface mechanisms, in cycles, on an ARM11 processor running the Symbian operating system.

TABLE X
TYPICAL SOFTWARE INTERFACE COSTS IN AN EMBEDDED SYSTEM ENVIRONMENT (SYMBIAN 9).

Mechanism | Overhead (cycles)
Procedure call | 3-7
System call (user-kernel) | 1000-2500
Interrupt latency | 300-600
Context switch | 400
Middleware | 60000

B. Internal interface overheads

The internal organization of the decoder requires at least function interfaces, and perhaps simple APIs to provide for configurability. Typical video decoders consist of 10-20 algorithms that are in total invoked around 1-2 million times each second for VGA sequences. The costs of the invocation mechanisms are therefore important, regardless of whether the implementation is in software or hardware.

Figure 7 shows an indicative organization of an MPEG-4 decoder [20] that consists of layers, each providing decoding functions for the layer above. This structure is already designed into the standards and is regularly imitated in software implementations. The sequence layer is executed once for each frame or video packet, and extracts information on the employed coding tools and parameters from the input stream. The macroblock layer in turn controls the block layer decoding functions. For a VGA bit stream, the macroblock layer is invoked at most 1200 times per frame, while the block layer is run at most 7200 times.

[Figure: API and control layer (API, control); sequence layer (stream, VOP, video packet and short video headers); macroblock layer (MCBPC, CBPY, MV, DC coeff.); block layer (VLC, motion compensation, AC/DC prediction, IDCT).]
Fig. 7. Layered software architecture of an MPEG-4 video decoder.

Table XI shows the costs of internal software interfaces when the APIs enabling the reuse of functionalities are placed on the sequence, macroblock, and block layers, assuming a call overhead of 7 cycles. These overheads were measured with a pure software implementation, and they are an approximate lower bound for fine grained accelerator solutions. The figures do not contain the costs of any functionality in the layers. The experiments were run without an operating system for maximum cache efficiency. The internal overheads of the decoder are about 5.4% of the total decoder effort. In terms of energy, the overheads are about the same as the total needs of a monolithic MPEG-4 decoder accelerator implemented in the same silicon technology.

TABLE XI
THE POWER USED FOR INTERNAL OVERHEADS WITH THREE API LAYER OPTIONS (MPEG-4 DECODER ON ARM11 IMPLEMENTED AT 90NM CMOS, VGA 30 FRAMES/S, 512KBIT/S).

APIs | Overhead (MHz) | Power consumption (mW)
Sequence layer only | 0 | 0
Sequence and macroblock layers | 2.7 | 1.2
Sequence, macroblock, and block layers | 11.4 | 5.1

VII. DISCUSSION

Video is among the most demanding services in mobile devices, which are increasingly used for entertainment applications. These applications require at least 3-4h of battery life from small devices and are a big challenge for energy efficient design. The battery life time can increase significantly via relatively small savings in power consumption.

The decision between software and hardware implementation of the video functionality sets the baseline for power consumption in the active state. Table XII shows the estimated power consumptions of a number of MPEG-4 video decoder implementations. Although the differences are large, complete systems add user interfaces, mass memories, and communications subsystems that reduce the relative impact of video power needs.

The energy efficiency gap between mobile and desktop processors is becoming smaller, while the number of applications running on mobile devices is growing. This will obviously influence platform designs, and we may see desktop processors integrated into devices that were previously reserved for embedded cores.

The power efficiency of hardware accelerated video is excellent, and higher resolutions than with software implementations can be supported. As monolithic hardware solutions suffer from increasing complexity, finer-grained video accelerators are becoming attractive.

TABLE XII
APPROXIMATE MPEG-4 VIDEO DECODER POWER CONSUMPTIONS ON SELECTED PLATFORM IMPLEMENTATIONS (mW) (VGA, 30 FRAMES/S, 1.33V, 65NM CMOS).

System component | ARM | Intel | Monolithic hardware | Fine grained hardware
Application processor and memories | 200-300 | 600 (est.) | 1 | 100-150
Hardware accelerator and memories | N/A | N/A | 12 | 10
Total (computing) | 200-300 | 600 (est.) | 13 | 110-160

VIII. SUMMARY

Handheld devices are approaching the video capabilities of desktop systems, while being severely constrained by their physical size and battery capacity. On the other hand, the energy efficiency of desktop processors is no longer much inferior to that of embedded cores.

As a result, the apparent number of technological alternatives for system designers is growing. Still, because of the non-linear relationship between battery life time and discharge current, system designers are destined to make choices that yield apparently minor savings but extend the active use time beyond critical limits. These design choices are affected by the estimated development effort and time to market, which regularly conflict with power efficiency.

ACKNOWLEDGMENT

Numerous people have directly and indirectly, sometimes even unknowingly, contributed to this paper by providing us with their observations, comments, and questions. In particular, we wish to thank Messrs. Juuso Raekallio, Jani Huoponen, and Juha Valtavaara from Hantro Products Oy, Mr. Kari Jyrkka from the Nokia Corporation, and Dr. Mark Barnard and Mr. Jani Boutellier, both from the University of Oulu.

REFERENCES

[1] ETSI EN 302 304 V1.1.1, Digital Video Broadcasting (DVB); Transmission System for Handheld Terminals (DVB-H), European Telecommunications Standards Institute, Nov. 2004.
[2] Y. Neuvo, "Cellular phones as embedded systems," in Solid-State Circuits Conference, vol. 1, 2004, pp. 32-37.
[3] H. Shim, "System-Level Power Reduction Techniques for Color TFT Liquid Crystal Displays," Ph.D. dissertation, School of Computer Science and Engineering, Seoul National University, Korea, 2006.
[4] D. Rakhmatov, S. Vrudhula, and D. Wallach, "A model for battery lifetime analysis for organizing applications on a pocket computer," in Very Large Scale Integration (VLSI) Systems, IEEE Transactions, vol. 11, Dec. 2003, pp. 1019-1030.
[5] D. Doerffel and A. Sharkh, "A critical review of using the Peukert equation for determining the remaining capacity of lead-acid and lithium-ion batteries," in Journal of Power Sources, vol. 155, Apr. 2006, pp. 395-400.
[6] O. Pelc, "Multimedia Support in the i.MX31 and i.MX31L Applications Processors." Freescale Semiconductor, Feb. 2006, document Number: IMX31MULTIWP, Rev. 0.
[7] ISO/IEC 14496-2:2004, Information technology - Coding of audio-visual objects - Part 2: Visual, 3rd ed., ISO/IEC, Jun. 2004.
[8] R. Chabukswar, "DVD Playback Power Consumption Analysis." Intel, 2007.
[9] ISO/IEC 14496-10:2005; Recommendation ITU-T H.264, SERIES H: AUDIOVISUAL AND MULTIMEDIA SYSTEMS, Infrastructure of audiovisual services - Coding of moving video: Advanced video coding for generic audiovisual services, ITU-T, Nov. 2005.
[10] E. Grochowski and M. Annavaram, "Energy per Instruction Trends in Intel Microprocessors," in Technology@Intel Magazine. Intel, Mar. 2006.
[11] Fraunhofer, "Fraunhofer IIS MPEG-4 Video Software." Fraunhofer Institut Integrierte Schaltungen IIS, May 2006.
[12] C. Martinez, M. Pinnamaneni, and E. John, "Multimedia Workloads versus SPEC CPU2000," in 2006 SPEC Benchmark Workshop, The University of Texas, Jan. 2006.
[13] ISO/IEC 13818-2:2000, Information technology - Generic coding of moving pictures and associated audio information: Video, 2nd ed., ISO/IEC, Dec. 2000.
[14] Hantro, "Hardware Video Codecs for Wireless IC's." Hantro Products Oy, 2006.
[15] D. Shin, H. Shim, Y. Joo, H.-S. Yun, J. Kim, and N. Chang, "Energy-Monitoring Tool for Low-Power Embedded Programs," in IEEE Design & Test, vol. 19. IEEE Computer Society Press, Jul. 2002, pp. 7-17.
[16] K. Vainio, "Video Decoder and Post-processing in Portable Device," Master's thesis, Department of Electrical and Information Engineering, University of Oulu, Oulu, Finland, 2006.
[17] SMPTE 421M-2006, VC-1 Compressed Video Bitstream Format and Decoding Process, SMPTE, Feb. 2006.
[18] Symbian, "Introduction to the ECOM Architecture," 2006.
[19] P. Verhoeven, J. Huang, and J. Lukkien, "Network middleware and mobility," in PROGRESS workshop, 2001.
[20] Hantro, "Software Video Codecs for Wireless ICs and Handsets." Hantro Products Oy, 2006.
