A Binary Instrumentation Tool for the Blackfin Processor

Enqiang Sun and David Kaeli
Department of Electrical and Computer Engineering
Northeastern University
Boston, MA, U.S.A.

ABSTRACT
While a large number of program profiling and instrumentation tools have been developed to support hardware and software analysis on general purpose systems, there is a general lack of sophisticated tools available for embedded architectures. Embedded systems are sensitive to performance bottlenecks, memory leaks, and software inefficiencies. There is a growing need to develop more sophisticated profiling and instrumentation tools in this rapidly growing design space.

In this paper we describe DSPInst, a binary instrumentation tool for the Analog Devices' Blackfin family of Digital Signal Processors (DSPs). DSPInst provides fine-grained control over the execution of programs. Instrumentation tool users are able to gain transparent access to the processor and memory state at instruction boundaries, without perturbing the architected program state. DSPInst provides a platform for building a wide range of customized analysis tools at an instruction level granularity. To demonstrate the utility of this toolset, we provide an example analysis and optimization tool that performs dynamic voltage and frequency scaling to balance performance and power.

Categories and Subject Descriptors
D.3.4 [Programming Languages]: Code generation, Optimization

General Terms
Languages, Management, Measurement, Performance

Keywords
DSPInst, Binary instrumentation, Embedded architectures, Dynamic voltage and frequency scaling

1. INTRODUCTION
Instrumentation tools have been shown to be extremely useful in analyzing program behavior on general purpose systems. Software developers have used them to gather program information and identify critical sections of code. Hardware designers use them to facilitate their evaluation of future designs. Instrumentation tools can be divided into two categories, based on when instrumentation is applied. Instrumentation applied at run-time is called dynamic instrumentation; instrumentation applied at compile time or link time is called static instrumentation.

Instrumentation techniques in the embedded domain are relatively immature, but the increased sophistication of recent embedded systems is driving the need for more powerful tools. It is critical for embedded system designers to be able to properly debug hardware and software issues. The need for improved instrumentation frameworks for digital signal processing systems continues to grow [18, 11].

In this paper we describe DSPInst, a binary instrumentation tool targeting Analog Devices' Blackfin family of DSPs. DSPInst provides an infrastructure wherein embedded system developers can design a wide range of customized analysis tools that operate at an instruction granularity. In DSPInst we adopt a static instrumentation approach, modifying the binary executables of applications before run-time.

The remainder of the paper is organized as follows. Section 2 introduces the Blackfin DSP architecture used in our work. Section 3 presents a survey of related instrumentation tools. Section 4 presents the design details of DSPInst. In Section 5, we illustrate the usage of DSPInst by presenting an example utility. Section 6 concludes the paper and discusses future directions.

2. ANALOG DEVICES BLACKFIN DSP ARCHITECTURE
The infrastructure discussed in this paper targets the Analog Devices Blackfin family of DSPs. The specific DSP used in our example application is the ADSP-BF548.
We start by providing a brief overview of the Blackfin family and its core architecture. More detailed information is available in the Analog Devices Programmers Reference Manual [4, 1, 5, 2].

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
WBIA '09, Dec 12, New York City, NY
Copyright (c) 2009 ACM 978-1-60558-793-6/12/09...$10.00.

The Blackfin DSP is a Micro Signal Architecture (MSA) based architecture developed jointly by Analog Devices and Intel Corporation. The architecture combines a dual 16-bit Multiply Accumulate (MAC) signal processing engine, flexible 32-bit Single Instruction Multiple Data (SIMD) capabilities, an orthogonal RISC-like instruction set and multimedia features into a single instruction set architecture. Combining DSP and microcontrol in a single instruction set enables the Blackfin to perform equally well in either signal processing or control intensive applications.

Some of the Blackfin Instruction Set Architecture (ISA) features include:

• two 16-bit multipliers, two 40-bit accumulators, two 40-bit arithmetic logic units (ALUs), four 8-bit video ALUs and a 40-bit shifter,
• two Data Address Generator (DAG) units,
• 16-bit instructions (which represent the most frequently used instructions),
• complex DSP instructions encoded into 32-bit opcodes as multifunction instructions,
• limited multi-issue capabilities in a Very Long Instruction Word (VLIW) fashion, where a 32-bit instruction can be issued in parallel with two 16-bit instructions, and
• a fixed point processor, offering 8-, 16-, and 32-bit signed or unsigned traditional data types, as well as 16- or 32-bit signed fractional data types.

The Blackfin processor supports a modified Harvard architecture in combination with a hierarchical memory structure, which is organized into two levels. The L1 memory can be accessed at the core clock frequency in a single clock cycle. The L2 memory is slightly slower, but still faster than external memory. The L1 memory can be configured as cache and/or SRAM, giving Blackfin the flexibility to satisfy a range of application requirements [19].

The Blackfin Processor was designed to be a low-power processor, and is equipped with a Dynamic Power Management Controller (DPMC). The DPMC works together with the Phase Locked Loop (PLL) circuitry, allowing the user to scale both frequency and voltage, to arrive at the best power/performance frequency/voltage operating point for the target application.

The Blackfin Processor has a built-in performance monitor unit (PMU) that monitors internal resources non-intrusively. The PMU covers a wide range of events, including pipeline and memory stalls, and includes penalties associated with these events. Developers can use the PMU to count processor events during program execution. This kind of profiling can be utilized to better understand performance bottlenecks and opportunities for voltage/frequency scaling. The PMU provides a more efficient debugging utility as opposed [...]

Tools that operate on a binary format can be further divided into two categories based on when instrumentation is applied. Probably the most commonly used static binary instrumentation toolset ever developed was ATOM, which targeted the Digital Alpha processor [21]. Other commonly used instrumentation toolsets include EEL [14], Etch [17] and Morph [24].

The second class of binary instrumentation tools we consider is dynamic instrumentation tools. A number of high quality dynamic instrumentation tools have been developed in recent years, including Dyninst [7], Kerninst [22], Detours [12] and Vulcan [10]. These tools dynamically modify the original code in memory during execution in order to insert instrumentation trampolines (i.e., a mechanism that jumps between the instrumented code and analysis code, and back again). Most of these dynamic instrumentation systems do not address instrumentation transparency (i.e., preserving the architected state of the processor). Other dynamic instrumentation systems use caches and dynamic compilation of the binary, and include Valgrind [16], Strata [20], DynamoRIO [6], Diota [15] and Pin [13]. Most of these tools target general purpose architectures.

Instrumentation tools targeting embedded systems need to satisfy a different set of requirements [11]. Some of these constraints include power efficiency and lack of operating system support. Many of these issues force us to take a minimalist approach to instrumentation (i.e., less is more). DELI [9], developed by Hewlett-Packard and ST Microelectronics, provides utilities to manipulate VLIW instructions on the LX/ST210 embedded processor. DSPTune [18] is a toolset similar to DSPInst targeting the Analog Devices SHARC architecture. However, DSPTune can only instrument C or assembly code, which limits its usefulness when source is not available.

DSPInst utilizes static instrumentation. The main contributions of DSPInst are:

• This is the first tool-building system targeting the Blackfin DSP processors. Analysis tools to perform code profiling or code modification can be built easily.
• DSPInst allows for selective instrumentation; the user can specify on instruction boundaries when to turn profiling or code modifications on/off.
• DSPInst instruments object code, versus source or assembly. DSPInst decouples the user from having to provide source code of the application to be instrumented.

The basic philosophy of DSPInst is to replace instructions in the
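
The trampoline mechanism described for the related tools (a jump from the instrumented code to analysis code and back, ideally without disturbing the architected state) can be illustrated on a toy machine. The Python sketch below is only an illustration under stated assumptions: the tiny instruction set (`li`, `add`, `bnez`, `jmp`, `probe`, `halt`), the register names, and the functions `run`, `instrument` and `count_hit` are all hypothetical and invented for this example. It does not model the Blackfin ISA and does not describe how DSPInst or any cited tool is implemented.

```python
# Toy illustration only: a made-up register machine, NOT the Blackfin ISA
# and NOT DSPInst's actual mechanism. All names here are hypothetical.

def run(program, regs):
    """Interpret `program`, a list of (op, *args) tuples, until `halt`."""
    pc = 0
    while True:
        op, *args = program[pc]
        if op == "halt":
            return regs
        if op == "li":                      # li rd, imm
            regs[args[0]] = args[1]
            pc += 1
        elif op == "add":                   # add rd, rs1, rs2
            regs[args[0]] = regs[args[1]] + regs[args[2]]
            pc += 1
        elif op == "bnez":                  # bnez rs, absolute_target
            pc = args[1] if regs[args[0]] != 0 else pc + 1
        elif op == "jmp":                   # jmp absolute_target
            pc = args[0]
        elif op == "probe":                 # probe analysis_fn
            saved = dict(regs)              # save the architected state
            args[0](regs)                   # analysis code may clobber regs
            regs.clear()
            regs.update(saved)              # restore it: probe is transparent
            pc += 1
        else:
            raise ValueError(f"unknown op {op!r}")


def instrument(program, at, analysis):
    """Return a copy of `program` with a probe placed at index `at`.

    The instruction at `at` is replaced by a jump to a trampoline appended
    after the program. The trampoline runs the analysis routine, executes
    the displaced instruction, and jumps back to `at + 1`.
    Assumes the program ends with `halt`, so it never falls into the
    trampoline by accident.
    """
    patched = list(program)
    trampoline = len(patched)
    displaced = patched[at]
    patched[at] = ("jmp", trampoline)
    patched += [("probe", analysis), displaced, ("jmp", at + 1)]
    return patched


# Sum 3 + 2 + 1 in r1 using a countdown loop in r0.
PROGRAM = [
    ("li", "r0", 3),
    ("li", "r1", 0),
    ("li", "r2", -1),
    ("add", "r1", "r1", "r0"),   # index 3: loop body, probed below
    ("add", "r0", "r0", "r2"),
    ("bnez", "r0", 3),
    ("halt",),
]

hits = {"n": 0}

def count_hit(regs):
    hits["n"] += 1
    regs["r1"] = 999             # deliberately clobber a register

plain = run(PROGRAM, {})
probed = run(instrument(PROGRAM, 3, count_hit), {})
print(plain, probed, hits["n"])
```

In this sketch the analysis routine overwrites `r1` on purpose; because the `probe` step saves and restores the registers around the call, the instrumented run ends in the same register state as the original run while still recording how often the probed instruction was reached.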
