Introduction to the Blackfin Processor
Embedded Signal Processing with the Micro Signal Architecture

Chapter 5 Introduction to the Blackfin Processor

This chapter examines the architecture of the Blackfin processor, which is based on the Micro Signal Architecture (MSA) jointly developed by Analog Devices and Intel. We use assembly programs to introduce the processing units, registers, and memory and its addressing modes. At the end of the chapter, we design, simulate, and implement an eight-band graphic equalizer and use this application to explain some of the practical implementation issues. An in-depth discussion of real-time processing concepts, number representations, peripheral programming, code optimization, and system design is given in Chapters 6, 7, and 8.

5.1 THE BLACKFIN PROCESSOR: AN ARCHITECTURE FOR EMBEDDED MEDIA PROCESSING

This section introduces the architecture of the Blackfin processor and its internal hardware units, memory, and peripherals using assembly instructions. In particular, we use the BF533 processor [23] to explain the Blackfin processor’s architecture. The BF537 processor [24] has core and system architectures identical to those of the BF533, but slightly different on-chip peripherals.

5.1.1 INTRODUCTION TO MICRO SIGNAL ARCHITECTURE

As introduced in Chapter 1, the MSA core was designed to achieve high-speed DSP performance and the best power efficiency. This core combines the best capabilities of a microcontroller and a DSP processor in a single programming model. This differs from other designs, which require a separate DSP processor and microcontroller. The main advantage of the MSA core is this integration, which combines multimedia processing, communication, and user interface on a single, easy-to-program platform. This highly versatile MSA core performs DSP tasks as well as executing user commands and control tasks.
The programming environment has many features that are familiar to both microcontroller and DSP programmers, which greatly speeds up the development of embedded systems. The MSA is also designed to operate over a wide range of clock speeds and operating voltages and includes circuitry to ensure stable transitions between operating states. A dynamic power management circuit continuously monitors the software running on the processor and dynamically adjusts both the voltage delivered to the core and the frequency at which the core runs. This results in optimized power consumption and performance for real-time applications.

5.1.2 OVERVIEW OF THE BLACKFIN PROCESSOR

The ADSP-BF5xx Blackfin processors are a family of 16-bit fixed-point processors based on the MSA core. The family targets power-sensitive applications such as portable audio players, cell phones, and digital cameras. Low cost and high performance also make Blackfin suitable for computationally intensive applications, including video equipment and third-generation cell phones.

The first generation of the BF5xx family is the BF535, which achieves a clock speed of up to 350 MHz at 1.6 V. Analog Devices introduced three processor families (BF532, BF533, and BF561) in 2003. These processors can operate at up to 750 MHz at 1.45 V. The clock speed and operating voltage can be switched dynamically in software for a given task to save power. The BF561 processor incorporates two MSA cores to improve performance through parallel processing. A more recent release of the BF5xx family consists of the BF534, BF536, and BF537. These processors add embedded Ethernet and controller area network (CAN) connectivity to the Blackfin processor.

The Blackfin core combines dual multiply-accumulate (MAC) engines, an orthogonal reduced-instruction-set computer (RISC)-like instruction set, single-instruction, multiple-data (SIMD) programming capabilities, and multimedia processing features into a unified architecture.
As shown in Figure 5.1, the Blackfin BF533 processor [23] includes system peripherals such as a parallel peripheral interface (PPI), a serial peripheral interface (SPI), serial ports (SPORTs), general-purpose timers, a universal asynchronous receiver transmitter (UART), a real-time clock (RTC), a watchdog timer, and general-purpose input/output (I/O) ports. In addition to these system peripherals, the Blackfin processor has a direct memory access (DMA) controller that transfers data between external devices/memories and the internal memories without processor intervention. Blackfin processors provide L1 cache memory for fast access to both data and instructions.

In summary, Blackfin processors have rich peripheral support, a memory management unit (MMU), and RISC-like instructions, which are typically found in high-end microcontrollers. These processors also have high-speed buses and advanced computational engines that support variable-length arithmetic operations in hardware. These features make the Blackfin processors suitable to replace other high-end DSP processors and microcontrollers. In the following sections, we further introduce the core architecture and its system peripherals.

Figure 5.1 Block diagram of the Blackfin BF533 system (courtesy of Analog Devices, Inc.)

5.1.3 ARCHITECTURE: HARDWARE PROCESSING UNITS AND REGISTER FILES

Figure 5.2 shows that the core architecture consists of three main units: the address arithmetic unit, the data arithmetic unit, and the control unit.

Figure 5.2 Core architecture of the Blackfin processor (courtesy of Analog Devices, Inc.)

5.1.3.1 Data Arithmetic Unit

The data arithmetic unit contains the following hardware blocks:

1. Two 16-bit multipliers, shown in Figure 5.2.
2. Two 40-bit accumulators (ACC0 and ACC1). Each 40-bit accumulator can be partitioned into a 16-bit lower half (A0.L, A1.L), a 16-bit upper half (A0.H, A1.H), and an 8-bit extension (A0.X, A1.X), where L and H denote the lower and higher 16 bits, respectively.
3. Two 40-bit arithmetic logic units (ALUs), shown in Figure 5.2.
4. Four 8-bit video ALUs, shown in Figure 5.2.
5. A 40-bit barrel shifter.
6. Eight 32-bit data registers (R0 to R7), which can also be used as 16 independent 16-bit registers (R0.L to R7.L and R0.H to R7.H).

Computational units take data from the data registers and perform fixed-point operations. The data registers receive data from the data buses and pass it to the computational units for processing. Similarly, computational results are moved to the data registers before being transferred to memory via the data buses.

These hardware computational blocks are used extensively in DSP algorithms such as FIR filtering and the FFT. The multipliers are often combined with the adders inside the ALUs and the 40-bit accumulators to form two 16- by 16-bit MAC units. Besides working with the multiplier, the ALU also performs common arithmetic (add, subtract) and logical (AND, OR, XOR, NOT) operations on 16-bit or 32-bit data. Many special instructions or options are included to perform saturation, rounding, sign/exponent detection, divide, field extraction, and other operations. In addition, a barrel shifter performs logical and arithmetic shifting, rotation, normalization, and extraction in the accumulator. An illustrative experiment using the shift instructions is presented later in Hands-On Experiment 5.3. With the dual ALUs and multipliers, the Blackfin processor has the flexibility of operating on two register pairs or four 16-bit registers simultaneously.

In this section, we use Blackfin assembly instructions to describe the arithmetic operations in several examples. The assembly instructions use algebraic syntax to simplify the development of assembly code.
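
As a rough illustration of the register and accumulator partitioning described above, the following Python sketch models a 32-bit data register as two 16-bit halves and a 40-bit accumulator as its X, H, and L fields. It is a host-side model, not Blackfin code: the function names (split_register, split_accumulator, mac) are hypothetical, and the multiply-accumulate is simplified to an integer product that wraps at 40 bits, ignoring the fractional modes, rounding, and saturation options mentioned in the text.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF
MASK40 = (1 << 40) - 1


def to_signed16(value):
    """Interpret a 16-bit pattern as a two's-complement integer."""
    value &= MASK16
    return value - 0x10000 if value & 0x8000 else value


def split_register(reg):
    """Split a 32-bit data register (e.g., R0) into its (H, L) 16-bit halves."""
    reg &= MASK32
    return (reg >> 16) & MASK16, reg & MASK16


def split_accumulator(acc):
    """Split a 40-bit accumulator (e.g., A0) into its (X, H, L) fields."""
    acc &= MASK40
    return (acc >> 32) & 0xFF, (acc >> 16) & MASK16, acc & MASK16


def mac(acc, x, y):
    """Add the signed product of two 16-bit values to a 40-bit accumulator.

    Simplified model (an assumption of this sketch): integer multiply,
    wrap-around at 40 bits, no saturation or rounding.
    """
    return (acc + to_signed16(x) * to_signed16(y)) & MASK40


# Accumulate eight products of the largest positive 16-bit value. The sum
# no longer fits in 32 bits, so the excess lands in the extension field.
acc = 0
for _ in range(8):
    acc = mac(acc, 0x7FFF, 0x7FFF)
ext, high, low = split_accumulator(acc)
print(f"A0.X={ext:#04x} A0.H={high:#06x} A0.L={low:#06x}")
# prints: A0.X=0x01 A0.H=0xfff8 A0.L=0x0008
```

In this model the 8-bit extension acts as headroom: a running sum of 16- by 16-bit products that grows past 32 bits carries into A0.X rather than being lost.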
EXAMPLE 5.1 Single 16-Bit Add/Subtract Operation

Any two 16-bit registers (e.g., R1.L and R2.H) can be added or subtracted to form a 16-bit result, which is stored in another 16-bit register, for example, R3.H = R1.L + R2.H (ns); as shown in Figure 5.3. Note that for 16-bit arithmetic, either a saturation flag (s) or a no-saturation flag (ns) must be placed at the end of the instruction. The symbol “;” specifies the end of the instruction. Saturation arithmetic is discussed in Chapter 6.

Figure 5.3 Single 16-bit addition using three registers

The Blackfin processor provides two ALUs to perform two 16-bit add/subtract operations in a single cycle. This dual 16-bit add/subtract operation doubles the arithmetic throughput of the single 16-bit add/subtract operation.

EXAMPLE 5.2 Dual 16-Bit Add/Subtract Operations

Any two 32-bit registers can be used to hold the four inputs for dual 16-bit add/subtract operations, and the two 16-bit results are saved in a single 32-bit register. As shown in Figure 5.4, the instruction R3 = R1 +|- R2 performs addition on the upper halves of R1 and R2 and subtraction on the lower halves of R1 and R2, simultaneously. The results are stored in the high and low words of the R3 register, respectively.

Figure 5.4 Dual 16-bit add/subtract using three registers

The Blackfin processor is also capable of performing four (or quad) 16-bit add/subtract operations in a single pass. These quad operations fully utilize the dual 40-bit ALUs and thus quadruple the arithmetic throughput of the single add/subtract operation.

EXAMPLE 5.3 Quad 16-Bit Add/Subtract Operations

In quad 16-bit add/subtract operations, the same two 32-bit registers must supply the four 16-bit inputs. In other words, two operations are performed on each pair of 16-bit halves. For example, the instructions R3 = R1 +|- R2, R4 = R1 -|+ R2 perform addition and subtraction on the halves of R1 and R2, as shown in Figure 5.5. Note that the symbol “,” separates two instructions that are executed in the same cycle.

Figure 5.5 Quad 16-bit add/subtract using four registers

Besides the previous 16-bit operations, the Blackfin processor can also perform a single 32-bit add/subtract using any two 32-bit registers as inputs.
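
The behavior described in Examples 5.1 to 5.3 can be checked on a host computer with a small bit-level model. The Python sketch below is only an approximation under stated assumptions: all function names are hypothetical, the saturate argument stands in for the (s) and (ns) options, and the dual and quad forms are assumed to wrap around (no saturation) when written without an option, as in the instructions quoted above.

```python
MASK16 = 0xFFFF


def to_signed16(value):
    """Interpret a 16-bit pattern as a two's-complement integer."""
    value &= MASK16
    return value - 0x10000 if value & 0x8000 else value


def _finish(total, saturate):
    """Clamp to the signed 16-bit range if requested, then return 16 bits."""
    if saturate:
        total = max(-0x8000, min(0x7FFF, total))
    return total & MASK16  # wraps around when not saturating


def add16(a, b, saturate=False):
    """Model a single 16-bit add, e.g., R3.H = R1.L + R2.H (s) or (ns)."""
    return _finish(to_signed16(a) + to_signed16(b), saturate)


def sub16(a, b, saturate=False):
    """Model a single 16-bit subtract with the (s) or (ns) option."""
    return _finish(to_signed16(a) - to_signed16(b), saturate)


def halves(reg):
    """Return the (high, low) 16-bit halves of a 32-bit register value."""
    return (reg >> 16) & MASK16, reg & MASK16


def pack(high, low):
    """Build a 32-bit register value from two 16-bit halves."""
    return ((high & MASK16) << 16) | (low & MASK16)


def dual_add_sub(r1, r2):
    """Model R3 = R1 +|- R2: add the upper halves, subtract the lower halves."""
    h1, l1 = halves(r1)
    h2, l2 = halves(r2)
    return pack(add16(h1, h2), sub16(l1, l2))


def dual_sub_add(r1, r2):
    """Model R4 = R1 -|+ R2: subtract the upper halves, add the lower halves."""
    h1, l1 = halves(r1)
    h2, l2 = halves(r2)
    return pack(sub16(h1, h2), add16(l1, l2))


def quad_add_sub(r1, r2):
    """Model the pair R3 = R1 +|- R2, R4 = R1 -|+ R2 on the same inputs."""
    return dual_add_sub(r1, r2), dual_sub_add(r1, r2)


r1, r2 = 0x00050003, 0x00020001  # R1.H=5, R1.L=3, R2.H=2, R2.L=1
r3, r4 = quad_add_sub(r1, r2)
print(f"R3={r3:#010x} R4={r4:#010x}")
# prints: R3=0x00070002 R4=0x00030004
```

With saturation off, add16(0x7FFF, 0x0001) wraps to 0x8000 (the most negative value), whereas with saturate=True the result stays at 0x7FFF. This is the difference the (ns) and (s) flags select in the single 16-bit instruction.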