Program Analysis and Optimization for Embedded Systems
Mike Jochen and Amie Souter
{jochen, souter}@cis.udel.edu
Computer and Information Sciences
University of Delaware
Newark, DE 19716
(302) 831-6339

1 Introduction

A proliferation in the use of embedded systems is giving cause for a rethinking of traditional methods and effects of program optimization, and of how these methods and optimizations can be adapted for such highly specialized systems. This paper attempts to raise some of the issues unique to program analysis and optimization for embedded system design and to discuss some of the solutions that have been proposed for these specialized issues.

This paper is organized as follows: section 2 provides pertinent background information on embedded systems, delineating the special needs and problems inherent in embedded system design; section 3 addresses one such inherent problem, namely limited space for program code; section 4 discusses current approaches and open issues for code minimization problems; section 5 proposes a new approach to the code minimization problem; and finally section 6 discusses areas of work related to embedded systems.

2 Background

Consumers' hunger for smaller, faster, more powerful gadgets in the telecommunication, multimedia, and consumer electronics industries is pushing the integration of complete systems onto a single chip [Lie97]. The demand for consumer-oriented wireless communication and multimedia devices directly impacts the design of the underlying architecture for such gadgets, and embedded systems are quickly becoming the means to meet these consumer demands. Embedded systems are much more specialized in nature than general computing systems (e.g. desktop computers, laptop machines, servers, and workstations); they are designed to serve dedicated functions (e.g. anti-lock brake systems, digital cellular phones, video coding for HDTV, audio coding for surround sound, etc.).

The specialized use of these devices (e.g. as personal digital assistants, PDA's) imposes several critical criteria on their design. These systems must:

- have low cost
- have low power consumption
- require as little physical space as possible
- meet rigid time constraints for computation completion

These requirements place constraints on the underlying architecture's design, which together affect compiler design and program optimization techniques. Each of these criteria must be balanced against the others, as each has an effect on the others (e.g. making a system smaller and faster will typically increase its cost and power consumption). A typical embedded system, as illustrated in figure 1, consists of a processor core, a program ROM (Read Only Memory), RAM (Random Access Memory), and an ASIC (Application Specific Integrated Circuit). The cost of developing an integrated circuit is linked to the size of the system, and the largest portion of the integrated circuit is often devoted to the ROM for storing the application. Therefore, developing techniques that reduce code size is extremely important in terms of reducing the cost of producing such systems.

[Figure 1: A typical embedded system, consisting of a DSP core, ROM, RAM, ASIC, DMA, A/D and D/A converters, and serial/parallel (S/P, P/S) interfaces.]

Unfortunately, the current state of the art in compilation for embedded systems leaves something to be desired. We live in an era where memory for the general-purpose computer is abundant and processing speed is remarkably fast, so issues such as limiting the size of compiled code have taken a back seat to other optimizations. Compiler technology for embedded system design must support:

- compilation for low-cost, irregular architectures (instruction set programmable architectures) like microcontroller units (MCU's), Digital Signal Processors (DSP's), and application specific instruction set processors (ASIP's)
- rich data structures to support complex instruction sets for the above
- extensive searching (helpful in register allocation, scheduling, and code selection)
- methods for capturing architecture-specific optimizations easily

All of these new requirements render current compiler technologies at least inefficient and at most unusable for embedded systems. In fact, most source code for embedded systems today is written in assembly, despite all the benefits that come with today's high-level languages. The costs of using these high-level languages via traditional compilation are too great in terms of executable code size, execution time, and the number and type of registers used. Further, a compiler for embedded systems must be:

- retargetable: able to compile to a number of changing instruction sets for different architectures
- able to handle register constraints: most embedded processors have a number of special-purpose rather than general-purpose registers (which allows for tighter instruction coding)
- able to handle specialized arithmetic: DSP's often have specialized math operators which require more than 3 operands, rendering 3-address code useless (see the sketch below)
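To make the last requirement concrete, consider the multiply-accumulate (MAC) operation at the heart of most DSP kernels: a single MAC reads two source operands and an accumulator and writes the accumulator back, so it involves four operand positions and does not fit the two-source, one-result mold of classic 3-address code. The C fragment below is a minimal sketch (the fir routine and its data are invented for this example); the comments show the 3-address form a conventional compiler produces and that a DSP-aware code selector must re-fuse into a single MAC instruction.

```c
#include <stdint.h>
#include <stdio.h>

/* FIR filter inner loop: each iteration is naturally one DSP
 * multiply-accumulate (MAC) instruction, acc += x[i] * h[i].  The
 * operation reads x[i], h[i], and acc and writes acc, so a classic
 * 3-address IR must split it into a multiply followed by an add:
 *     t1  = x[i] * h[i]
 *     acc = acc + t1
 * hiding the single-instruction MAC from the code selector. */
static int32_t fir(const int16_t *x, const int16_t *h, int n)
{
    int32_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += (int32_t)x[i] * h[i];
    return acc;
}

int main(void)
{
    int16_t x[4] = { 1, 2, 3, 4 }, h[4] = { 4, 3, 2, 1 };
    printf("%d\n", fir(x, h, 4));   /* 1*4 + 2*3 + 3*2 + 4*1 = 20 */
    return 0;
}
```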
The complexity of these systems is forcing designers who were once able to implement functions in hardware to implement those functions in software instead and then burn the program onto a chip. There is now a motivation to switch to high-level languages to program embedded systems, and assembly languages are being phased out, because development costs are lower when using high-level languages. One disadvantage of implementation via high-level languages, however, is increased code size. This is a problem that must be addressed because embedded systems are usually constrained to relatively small memories. Thus, with the onslaught of embedded processors and embedded systems, we are now forced to re-examine many of today's optimizations and their effect on the target code in terms of code size, register use, instruction scheduling, and garbage collection, among other issues.

3 Code Minimization for Embedded Systems

The whole premise of an embedded system demands that it be small; no one wants to carry around a portable phone that is as large as a toaster. This constraint on size means there can be no wasted space. Put more succinctly, everything from chip design to on-board memory must take as little space as possible. To this end, a few approaches to limiting the amount of memory needed to hold executable code have surfaced. Some techniques look at instruction scheduling [CS98, CSS99] and its effect on target code size. Other techniques employ some form of pattern matching [CM99, LDK99] to eliminate regions of repeated code. A third genre of technique attempts to physically compress the target code [LW98] to save precious memory space.

An attempt to limit target code growth via efficient instruction scheduling was introduced by [CS98]. This technique merely tries to repair the damage (the increase in code space requirements) normally done by instruction scheduling optimization, e.g. by finding another sequence of instructions that minimizes the number of 'nop' instructions. This is done by scheduling larger regions of code, which allows a higher degree of fine-grain instruction-level parallelism (ILP) to be achieved, thus eliminating 'nop' or stall instructions. The technique adds a good deal of overhead (in terms of time) to the scheduling phase of compilation; finding a way to decrease instruction scheduling time while increasing the size of the block of instructions to be scheduled would be a welcome improvement (a toy sketch of this trade-off appears below).

In [CSS99], Cooper et al. expanded the above approach to include a series of optimizations in which genetic search algorithms find a sequence of optimizations that produces small code. Genetic algorithms are search algorithms designed to mimic the process of natural selection and evolution in nature. The goal of the algorithm in this case is to search for the optimizing sequence that creates the smallest code. Cooper et al. validate their approach with experiments using C and FORTRAN codes. The genetic algorithm approach produces code sizes that range from 20 to 75 percent smaller than unoptimized code. They also show that the GA produces code with 0 to 40 percent fewer static operations, and that it can produce code with 25 to 26 percent fewer dynamic operations. Since code for embedded systems is often compiled once and burned onto a ROM, the software designer will tolerate longer compile times, and genetic algorithms do take a long time to run: for their results, they ran the GA for 10 hours to find the best optimization sequence. The GA approach typically produces small object code, but not always. Also, the code produced may sometimes run more slowly than the original code, but the authors state that code speed comes secondary to code size.
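Two sketches may help make the approaches above concrete. The first models the scheduling trade-off discussed for [CS98]: a toy in-order list scheduler is given instructions with an invented latency and at most one dependence, and it is only allowed to look at a small window of unscheduled instructions. With a window of one instruction it must pad load delays with nops, which is exactly the code growth the technique tries to avoid; with a window covering the whole region it can fill those slots with independent work. This is a minimal model for illustration, not the algorithm of [CS98].

```c
#include <stdio.h>

/* Toy illustration of nop-driven code growth in instruction scheduling.
 * Each instruction has at most one predecessor it depends on and a
 * latency; the greedy scheduler issues one instruction per cycle and
 * emits a 'nop' whenever nothing in its window is ready.  Names,
 * latencies, and the window model are invented for the example. */

typedef struct {
    const char *name;
    int dep;       /* index of the instruction this one waits for, or -1 */
    int latency;   /* cycles before the result is available */
} Insn;

/* Schedule insns[0..n-1] (n <= 32), considering at most `window`
 * unscheduled instructions at a time.  Returns the number of nops. */
static int schedule(const Insn *insns, int n, int window)
{
    int done_at[32];            /* cycle at which each result is ready */
    int issued[32] = {0};
    int nops = 0, cycle = 0, left = n;

    while (left > 0) {
        int picked = -1;
        /* find the first ready, unscheduled instruction inside the window */
        for (int i = 0, seen = 0; i < n && seen < window; i++) {
            if (issued[i]) continue;
            seen++;
            int d = insns[i].dep;
            if (d < 0 || done_at[d] <= cycle) { picked = i; break; }
        }
        if (picked < 0) {
            printf("  nop\n");
            nops++;
        } else {
            printf("  %s\n", insns[picked].name);
            issued[picked] = 1;
            done_at[picked] = cycle + insns[picked].latency;
            left--;
        }
        cycle++;
    }
    return nops;
}

int main(void)
{
    /* two independent load/add chains */
    Insn code[] = {
        { "load r1, [a]",    -1, 2 },
        { "add  r2, r1, 4",   0, 1 },
        { "load r3, [b]",    -1, 2 },
        { "add  r4, r3, 4",   2, 1 },
    };
    printf("window = 1:\n");
    int n1 = schedule(code, 4, 1);
    printf("window = 4:\n");
    int n4 = schedule(code, 4, 4);
    printf("nops inserted: %d vs %d\n", n1, n4);
    return 0;
}
```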
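The second sketch shows the shape of the genetic search described in [CSS99]: a chromosome is a fixed-length sequence of optimization passes, and fitness is the size of the code that sequence produces. The pass set, the GA parameters, and especially code_size(), which stands in for actually compiling and measuring the program, are all invented for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Minimal sketch of a genetic search over optimization sequences. */

#define NPASSES   8     /* e.g. 0=constprop, 1=dce, 2=cse, 3=inline, ... */
#define SEQ_LEN  10
#define POP      20
#define GENS     50

typedef struct { int genes[SEQ_LEN]; int size; } Seq;

/* Placeholder fitness: the real system would run the compiler with the
 * given pass sequence and return the resulting object-code size. */
static int code_size(const int *genes)
{
    int size = 1000;
    for (int i = 0; i < SEQ_LEN; i++)
        size -= (genes[i] * 7 + i * 3) % 23;   /* arbitrary stand-in */
    return size;
}

static void randomize(Seq *s)
{
    for (int i = 0; i < SEQ_LEN; i++) s->genes[i] = rand() % NPASSES;
    s->size = code_size(s->genes);
}

/* tournament selection: pick the smaller-code of two random individuals */
static const Seq *select_parent(const Seq *pop)
{
    const Seq *a = &pop[rand() % POP], *b = &pop[rand() % POP];
    return a->size < b->size ? a : b;
}

int main(void)
{
    srand((unsigned)time(NULL));
    Seq pop[POP], next[POP];
    for (int i = 0; i < POP; i++) randomize(&pop[i]);

    for (int g = 0; g < GENS; g++) {
        for (int i = 0; i < POP; i++) {
            const Seq *p1 = select_parent(pop), *p2 = select_parent(pop);
            int cut = rand() % SEQ_LEN;              /* one-point crossover */
            for (int j = 0; j < SEQ_LEN; j++)
                next[i].genes[j] = j < cut ? p1->genes[j] : p2->genes[j];
            if (rand() % 100 < 10)                   /* 10% mutation rate */
                next[i].genes[rand() % SEQ_LEN] = rand() % NPASSES;
            next[i].size = code_size(next[i].genes);
        }
        memcpy(pop, next, sizeof pop);
    }

    /* report the best sequence in the final population */
    Seq *best = &pop[0];
    for (int i = 1; i < POP; i++)
        if (pop[i].size < best->size) best = &pop[i];
    printf("best size %d with passes:", best->size);
    for (int i = 0; i < SEQ_LEN; i++) printf(" %d", best->genes[i]);
    printf("\n");
    return 0;
}
```

In the real system every fitness evaluation is a full compile, which is why the search reported in [CSS99] ran on the order of ten hours.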
The goal of Liao et al.'s work is to achieve maximal code compression while incurring very little performance penalty [LDK99]. The basic technique uses a data compression approach in which common sequences of code are extracted to form dictionary entries and are replaced by mini subroutine calls to the dictionary. Liao et al. formulate the dictionary entry selection and substitution problem as a set-covering problem. Two methods are introduced, one based on software and the second based on hardware/software (a greedy sketch of the basic substitution idea appears at the end of this section).

On a cache miss, the cache line is fetched from memory and sent through a decoder before it is given to the cache; decompression is thus done on the fly at run time. They present two algorithms for compression using such an approach. Their results show that these approaches achieve good code compression, comparable to UNIX Compress and gzip. The difference is that their approach can be performed separately from compilation. This approach does require special hardware, and there is some increase in the cache miss penalty, as decompression must also be performed on each cache miss.

An application of code compression is shown in the work by [YRE98]. They are trying to compress the Java machine in order to make it feasible to run on the Inferno OS (an OS
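Returning to the dictionary-based technique of [LDK99] described above, the following toy program shows the basic substitution step: it finds the most frequent adjacent pair of "instructions" (plain integers here), turns it into a dictionary entry, and replaces every occurrence with a single call token. The real formulation chooses entries of many lengths by solving a set-covering problem; this single greedy pass is only meant to show the shape of the transformation and its space accounting.

```c
#include <stdio.h>

#define CALL  -1   /* stand-in for a mini-subroutine call into the dictionary */

int main(void)
{
    int code[] = { 7, 3, 9, 7, 3, 5, 7, 3, 9, 2, 7, 3 };
    int n = sizeof code / sizeof code[0];

    /* 1. find the most frequent adjacent pair of instructions */
    int best_a = 0, best_b = 0, best_count = 0;
    for (int i = 0; i + 1 < n; i++) {
        int count = 0;
        for (int j = 0; j + 1 < n; j++)
            if (code[j] == code[i] && code[j + 1] == code[i + 1]) count++;
        if (count > best_count) {
            best_count = count; best_a = code[i]; best_b = code[i + 1];
        }
    }

    /* 2. replace each occurrence with one CALL token; the pair itself
     *    becomes a dictionary entry (plus an implicit return). */
    int out[32], m = 0;
    for (int i = 0; i < n; ) {
        if (i + 1 < n && code[i] == best_a && code[i + 1] == best_b) {
            out[m++] = CALL;
            i += 2;
        } else {
            out[m++] = code[i++];
        }
    }

    int dict_cost = 2 + 1;   /* the two extracted instructions + a return */
    printf("pair (%d,%d) occurs %d times\n", best_a, best_b, best_count);
    printf("original %d instructions, compressed %d + dictionary %d\n",
           n, m, dict_cost);
    return 0;
}
```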