Minimizing Bank Selection Instructions for Partitioned Memory Architectures∗
Bernhard Scholz, The University of Sydney
Bernd Burgstaller, The University of Sydney
Jingling Xue, University of NSW

ABSTRACT

Bank switching is a technique that increases the code and data memory in microcontrollers without extending the address buses. Given a program in which variables have been assigned to data banks, we present a novel optimization technique that minimizes the overhead of bank switching through cost-effective placement of bank selection instructions. The optimal placement is controlled by a variety of different objectives, such as runtime, low power, small code size or a combination of these parameters. We have formulated the problem as a form of Partitioned Boolean Quadratic Programming (PBQP).

We implemented the optimization as part of a PIC Microchip backend and evaluated the approach for several optimization objectives. Our benchmark suite comprises programs from MiBench and DSPStone plus a microcontroller real-time kernel and drivers for microcontroller hardware devices. Our optimization achieved a reduction of program memory space between 2.7% and 18.2%, and an overall improvement with respect to instruction cycles between 5.1% and 28.8%. Our optimization achieved an optimal solution for all benchmark programs.

Categories and Subject Descriptors
D3.4 [Programming Languages]: Processors - Compilers

General Terms
Algorithms, Languages, Performance

Keywords
Compiler optimization, microcontrollers, partitioned memory architecture, bank-switching, RAM allocation, PBQP

∗ This project has been supported by the ARC Discovery Project Grant "Compilation Techniques for Embedded Systems" under Contract DP 0560190 and the University of Sydney R&D Grants Scheme "Speculative Partial Redundancy Elimination" under Contract L2849 U3229.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
CASES'06, October 23–25, 2006, Seoul, Korea.
Copyright 2006 ACM 1-59593-543-6/06/0010 ...$5.00.

1. INTRODUCTION

Embedded systems have become an integral part of the infrastructure of today's technological society. They are prevalent in an ever-increasing range of applications, including consumer electronics, home appliances, instrumentation/measurement, automotive, communications and industrial control. Microcontrollers constitute the core of all embedded systems designs. According to the Semiconductor Industry Association's November 2005 forecast, the market for 4-, 8-, 16-, and 32-bit microcontrollers will grow to $12.8 billion in 2006. The reported share of 8-bit microcontrollers is 42%. Gartner Dataquest reports that the 8-bit market reached $5.5 billion in 2004 [6].

The widespread use of 8-bit microcontrollers can be attributed to the following: (1) many embedded systems designs do not need the more costly, energy-burning and complex 16- or 32-bit CPUs, (2) many embedded systems designs distribute small numbers of low-cost electronics instead of using one powerful and expensive core CPU, (3) embedded systems designs often employ 8-bit microcontrollers as low-cost subsystems of complex 32-bit hardware designs, and (4) there is a trend to add entry-level electronics intelligence to mechanical-based systems.

Bank switching is a common technique for 8-bit microcontrollers to increase the size of code and data memory without extending the address buses of the CPU. The address space is partitioned into memory banks, and the CPU can only access one bank at a time. This bank is called the active bank. To keep track of the active bank, the CPU's bank register stores the address of the active bank. A bank selection instruction is issued to switch between banks. Smaller address buses result in smaller chip die-sizes, higher clock frequencies and less power consumption. As an example, Motorola 68HC11 8-bit microcontrollers address a maximum of 64KB memory using their 16-bit address registers. This scheme allows multiple 64KB banks to be accessed although only one can be active at a time. As another example, the PIC16F877A microcontroller allows data accesses to be switched between four 128B data banks. Other processor families, such as Zilog's Z80 and Intel's 8051, have similar features. Architectures such as Ubicom's 8-bit SX microcontroller organize their registers in register banks to shorten the cycle time, avoiding multi-porting [15].

The disadvantage of bank-switched architectures is the code-size and runtime overhead caused by bank selection instructions. Several commercial and open-source compilers for microcontrollers provide limited support to generate bank-switched code. For example, GNU GCC for Motorola 68HC11 and 68HC12 will compile a function declared with the far attribute by using a calling convention that takes care of switching banks when entering and leaving a function. However, the GCC compiler does not eliminate redundant bank selection instructions. The CC5X compiler for mid-range PICmicro devices (from B Knudsen Data) expects the programmer to allocate variables to banks but will insert bank selection instructions automatically with no guarantee of optimal placement of the bank selection instructions. The PICC-18 for the PIC18Fxxx family appears to have automated both tasks under certain language restrictions. As far as the authors are aware, the bank switching schemes used in existing compilers seem to be ad hoc, and it is still a challenging research problem to generate efficient memory accesses for bank-switched architectures.

This work is concerned with developing a compiler optimization for optimal placement of bank selection instructions in a bank-switched architecture. This problem is important because poor placement of bank selection instructions increases runtime, code-size, and power consumption. Given a program in which all variables have been assigned to banks (by the programmer or compiler), we present an optimization that inserts a minimum number of bank selection instructions in the program to guarantee that banked memory is accessed correctly. The optimal placement is controlled by a variety of objectives such as runtime, low power, small code size or a combination of these parameters. The authors are only aware of an ad-hoc approach in this area, which was introduced in [8].

Most previous efforts on partitioned memory architectures focus on maximizing parallel data accesses to make memory banks simultaneously active [2, 10, 16, 17, 19, 21, 25, 26]. By enabling parallel memory accesses in a single instruction, one can increase memory bandwidth and thus improve program performance. Such partitioned memory banks are found in Motorola's DSP56000, Analog Devices' ADSP2016x and NEC's µPD77016. Some researchers reorganize the order of instructions and the layout of data, e.g., by loop transformations [3], to reduce energy consumption in partitioned memory architectures. In the case of heterogeneous memory banks such as scratchpad SRAM, internal DRAM and external DRAM, we refer to [1, 11, 23, 24] and the references therein for a number of compiler techniques proposed on performing automatic scratchpad management.

[...] to show that our optimization can accommodate a variety of optimization objectives such as speed, space and a combination of both.

The paper is organized as follows. In Section 2, we describe the background. In Section 3, we define and motivate the problem of minimizing the costs of bank selection instructions across basic block boundaries. The optimization algorithm is presented in Section 4. In Section 5, we present and discuss our experimental results. We draw our conclusions in Section 6.

2. BACKGROUND

A basic block is a sequence of statements in which flow of control can only enter from its beginning and leave at its end. A control flow graph (CFG) is a directed graph $G = \langle V, E, s, e \rangle$ where $V$ is the set of vertices representing basic blocks and $E$ is the set of edges. Vertex $s$ is the entry node (aka start node) of the CFG and $e$ is the exit node (aka end node). The set of predecessors $preds(u)$ is defined as $\{w \mid (w, u) \in E\}$ and the set of successors $succs(u)$ as $\{v \mid (u, v) \in E\}$. A critical edge is an edge $(u, v)$ for which $|succs(u)| > 1$ and $|preds(v)| > 1$. A path $\pi$ is a sequence of vertices $\langle v_1, \ldots, v_k \rangle$ such that $(v_i, v_{i+1}) \in E$ for all $1 \le i < k$. In a CFG, all vertices are reachable, i.e., there is a path from $s$ to every other vertex in $V$.

The PBQP problem [20, 4] is a specialized quadratic assignment problem and is NP-complete. Consider a set of discrete variables $X = \{x_1, \ldots, x_n\}$ and their finite domains $\{D_1, \ldots, D_n\}$. A solution of PBQP is a simple function $h : X \to D$ where $D$ is $D_1 \cup \ldots \cup D_n$; for each variable $x_i$ we choose an element $d_i$ in $D_i$. The quality of a solution is based on the contribution of two sets of terms:

1. for assigning variable $x_i$ to the element $d_i$ in $D_i$. The quality of the assignment is measured by a local cost function $c(x_i, d_i)$.

2. for assigning two related variables $x_i$ and $x_j$ to the elements $d_i$ in $D_i$ and $d_j$ in $D_j$. We measure the quality of the assignment with a related cost function $C(x_i, x_j, d_i, d_j)$.

Thus, the total cost of a solution $h$ is given as