Embedded Signal Processing with the Micro Signal Architecture

Chapter 5
Introduction to the Blackfin Processor

This chapter examines the architecture of the Blackfin processor, which is based on the MSA jointly developed by Analog Devices and Intel. We use assembly programs to introduce the processing units, registers, and memory and its addressing modes. At the end of the chapter, we design, simulate, and implement an eight-band graphic equalizer and use this application to explain some of the practical implementation issues. An in-depth discussion of the real-time processing concepts, number representations, peripheral programming, code optimization, and system design is given in Chapters 6, 7, and 8.

5.1 THE BLACKFIN PROCESSOR: AN ARCHITECTURE FOR EMBEDDED MEDIA PROCESSING

This section introduces the architecture of the Blackfin processor and its internal hardware units, memory, and peripherals using assembly instructions. In particular, we use the BF533 processor [23] to explain the Blackfin processor's architecture. The BF537 processor [24] has core and system architectures identical to those of the BF533, but slightly different on-chip peripherals.

5.1.1 INTRODUCTION TO MICRO SIGNAL ARCHITECTURE

As introduced in Chapter 1, the MSA core was designed to achieve high-speed DSP performance and best power efficiency. This core combines the best capabilities of a microcontroller and a DSP processor into a single programming model. This is different from other cores that require a separate DSP processor and microcontroller. The main advantage of the MSA core is the integrated feature that combines multimedia processing, communication, and user interface on a single, easy-to-program platform. This highly versatile MSA core performs DSP tasks as well as executing user commands and control tasks.
The programming environment has many features that are familiar to both microcontroller and DSP programmers, thus greatly speeding up the development of embedded systems.

The MSA is also designed to operate over a wide range of clock speeds and operating voltages and includes circuitry to ensure stable transitions between operating states. A dynamic power management circuit continuously monitors the software running on the processor and dynamically adjusts both the voltage delivered to the core and the frequency at which the core runs. This results in optimized power consumption and performance for real-time applications.

5.1.2 OVERVIEW OF THE BLACKFIN PROCESSOR

The ADSP-BF5xx Blackfin processor is a family of 16-bit fixed-point processors that are based on the MSA core. This processor targets power-sensitive applications such as portable audio players, cell phones, and digital cameras. Low cost and high performance also make Blackfin suitable for computationally intensive applications including video equipment and third-generation cell phones.

The first generation of the BF5xx family is the BF535, which achieves a clock speed up to 350 MHz at 1.6 V. Analog Devices introduced three processor families (BF532, BF533, and BF561) in 2003. These processors can operate up to 750 MHz at 1.45 V. The clock speed and operating voltages can be switched dynamically for given tasks via software to save power. The BF561 processor incorporates two MSA cores to improve performance using parallel processing. A recent release of the BF5xx family consists of BF534, BF536, and BF537. These processors add embedded Ethernet and controller area network connectivity to the Blackfin processor.

The Blackfin core combines dual multiply-accumulate (MAC) engines, an orthogonal reduced-instruction-set computer (RISC)-like instruction set, single instruction, multiple data (SIMD) programming capabilities, and multimedia processing features into a unified architecture.
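The power saving from the software-controlled voltage and frequency switching described above can be estimated with the standard first-order model of dynamic CMOS power, P ≈ C·V²·f. The sketch below is an illustration only and is not taken from the Blackfin documentation: the function names and the two operating points are hypothetical, the switched capacitance C is assumed constant (so it cancels in the ratio), and static leakage power is ignored.

```c
#include <assert.h>

/*
 * Ratio of dynamic power at a new operating point (v_new, f_new) to the
 * power at a reference point (v_ref, f_ref), using P ~ C * V^2 * f.
 * Hypothetical helper; C is assumed equal at both points and cancels.
 */
double relative_dynamic_power(double v_ref, double f_ref,
                              double v_new, double f_new)
{
    double v_ratio = v_new / v_ref;
    double f_ratio = f_new / f_ref;
    return v_ratio * v_ratio * f_ratio;
}

/* Illustrative operating points only (not data-sheet values). */
void power_demo(void)
{
    /* Drop from 1.2 V / 600 MHz to 0.8 V / 200 MHz. */
    double r = relative_dynamic_power(1.2, 600e6, 0.8, 200e6);

    /* (0.8/1.2)^2 * (200/600) is roughly 0.148, i.e. about 15% of the
       reference dynamic power under this simplified model. */
    assert(r > 0.148 && r < 0.149);
}
```

Because voltage enters the model squared, lowering the core voltage together with the clock saves far more than lowering the clock alone, which is why both are adjusted.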
As shown in Figure 5.1, the Blackfin BF533 processor [23] includes system peripherals such as parallel peripheral interface (PPI), serial peripheral interface (SPI), serial ports (SPORTs), general-purpose timers, universal asynchronous receiver transmitter (UART), real-time clock (RTC), watchdog timer, and general-purpose input/output (I/O) ports. In addition to these system peripherals, the Blackfin processor also has a direct memory access (DMA) controller that effectively transfers data between external devices/memories and the internal memories without processor intervention. Blackfin processors provide L1 cache memory for quick access to both data and instructions.

In summary, Blackfin processors have rich peripheral support, a memory management unit (MMU), and RISC-like instructions, which are typically found in many high-end microcontrollers. These processors have high-speed buses and advanced computational engines that support variable-length arithmetic operations in hardware. These features make the Blackfin processors suitable to replace other high-end DSP processors and microcontrollers. In the following sections, we further introduce the core architecture and its system peripherals.

Figure 5.1 Block diagram of the Blackfin BF533 system (courtesy of Analog Devices, Inc.)

5.1.3 ARCHITECTURE: HARDWARE PROCESSING UNITS AND REGISTER FILES

Figure 5.2 shows that the core architecture consists of three main units: the address arithmetic unit, the data arithmetic unit, and the control unit.

5.1.3.1 Data Arithmetic Unit

The data arithmetic unit contains the following hardware blocks:

1. Two 16-bit multipliers, as shown in Figure 5.2.
2. Two 40-bit accumulators (ACC0 and ACC1). The 40-bit accumulator can be partitioned as 16-bit lower-half (A0.L, A1.L), 16-bit upper-half (A0.H, A1.H), and 8-bit extension (A0.X, A1.X), where L and H denote lower and higher 16-bit, respectively.
3. Two 40-bit arithmetic logic units (ALUs), as shown in Figure 5.2.
4. Four 8-bit video ALUs, as shown in Figure 5.2.
5. A 40-bit barrel shifter.
6. Eight 32-bit data registers (R0 to R7) or 16 independent 16-bit registers (R0.L to R7.L and R0.H to R7.H).

Figure 5.2 Core architecture of the Blackfin processor (courtesy of Analog Devices, Inc.)

Computational units get data from data registers and perform fixed-point operations. The data registers receive data from the data buses and transfer the data to the computational units for processing. Similarly, computational results are moved to the data registers before transferring to the memory via data buses.

These hardware computational blocks are used extensively in performing DSP algorithms such as FIR filtering, FFT, etc. The multipliers are often combined with the adders inside the ALU and the 40-bit accumulators to form two 16- by 16-bit MAC units. Besides working with the multiplier, the ALU also performs common arithmetic (add, subtract) and logical (AND, OR, XOR, NOT) operations on 16-bit or 32-bit data. Many special instructions or options are included to perform saturation, rounding, sign/exponent detection, divide, field extraction, and other operations. In addition, a barrel shifter performs logical and arithmetic shifting, rotation, normalization, and extraction in the accumulator. An illustrative experiment using the shift instructions is presented below in Hands-On Experiment 5.3. With the dual ALUs and multipliers, the Blackfin processor has the flexibility of operating on two register pairs or four 16-bit registers simultaneously.

In this section, we use Blackfin assembly instructions to describe the arithmetic operations in several examples. The assembly instructions use algebraic syntax to simplify the development of the assembly code.
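To make the role of the 40-bit accumulator concrete, the following C sketch models one 16- by 16-bit MAC unit feeding an accumulator that is split into the A.X, A.H, and A.L fields described above. It is a simplified model under stated assumptions, not the processor's exact behavior: all names (mac16, dot16, acc_x, and so on) are hypothetical, operands are treated as plain signed integers, and the processor's fractional formats and rounding options are not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: a 40-bit accumulator (A0 or A1) held in an int64_t. */
#define ACC40_MAX ((int64_t)0x7FFFFFFFFFLL)  /*  2^39 - 1 */
#define ACC40_MIN (-ACC40_MAX - 1)           /* -2^39     */

/* Clamp a wide intermediate value to the 40-bit signed range. */
static int64_t sat40(int64_t v)
{
    if (v > ACC40_MAX) return ACC40_MAX;
    if (v < ACC40_MIN) return ACC40_MIN;
    return v;
}

/* One multiply-accumulate step: acc + x * y with signed 16-bit operands. */
int64_t mac16(int64_t acc, int16_t x, int16_t y)
{
    int32_t product = (int32_t)x * (int32_t)y;  /* always fits in 32 bits */
    return sat40(acc + product);
}

/* Field views: A.X = bits 39-32, A.H = bits 31-16, A.L = bits 15-0. */
uint8_t  acc_x(int64_t acc) { return (uint8_t)(((uint64_t)acc >> 32) & 0xFFu); }
uint16_t acc_h(int64_t acc) { return (uint16_t)(((uint64_t)acc >> 16) & 0xFFFFu); }
uint16_t acc_l(int64_t acc) { return (uint16_t)((uint64_t)acc & 0xFFFFu); }

/* Sum of products, the inner loop of an FIR filter. */
int64_t dot16(const int16_t *x, const int16_t *h, int n)
{
    int64_t acc = 0;
    for (int i = 0; i < n; i++)
        acc = mac16(acc, x[i], h[i]);
    return acc;
}

/* Eight full-scale products overflow 32 bits but fit in 40 bits. */
void mac_demo(void)
{
    int16_t x[8], h[8];
    for (int i = 0; i < 8; i++) { x[i] = 32767; h[i] = 32767; }

    int64_t acc = dot16(x, h, 8);   /* 8 * 32767^2 = 0x1FFF80008 */
    assert(acc == 8589410312LL);
    assert(acc_x(acc) == 0x01);     /* extension bits absorb the overflow */
    assert(acc_h(acc) == 0xFFF8);
    assert(acc_l(acc) == 0x0008);
}
```

The demo shows why the 8-bit extension matters: the running sum exceeds the 32-bit range after a few full-scale products, yet no information is lost because the extra bits land in A.X.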
EXAMPLE 5.1 Single 16-Bit Add/Subtract Operation

Any two 16-bit registers (e.g., R1.L and R2.H) can be added or subtracted to form a 16-bit result, which is stored in another 16-bit register, for example, R3.H = R1.L + R2.H (ns); as shown in Figure 5.3. Note that for 16-bit arithmetic, either a saturation (s) or a no-saturation (ns) flag must be placed at the end of the instruction. The symbol “;” specifies the end of the instruction. Saturation arithmetic is discussed in Chapter 6.

Figure 5.3 Single 16-bit addition using three registers

The Blackfin processor provides two ALUs to perform two 16-bit add/subtract operations in a single cycle. This dual 16-bit add/subtract operation doubles the arithmetic throughput over the single 16-bit add/subtract operation.

EXAMPLE 5.2 Dual 16-Bit Add/Subtract Operations

Any two 32-bit registers can be used to store four inputs for dual 16-bit add/subtract operations, and the two 16-bit results are saved in a single 32-bit register. As shown in Figure 5.4, the instruction R3 = R1 +|- R2 performs addition in the upper halves of R1 and R2 and subtraction in the lower halves of R1 and R2, simultaneously. The results are stored in the high and low words of the R3 register, respectively.

Figure 5.4 Dual 16-bit add/subtract using three registers

The Blackfin processor is also capable of performing four (or quad) 16-bit add/subtract operations in a single pass. These quad operations fully utilize the dual 40-bit ALUs and thus quadruple the arithmetic throughput over the single add/subtract operation.

Figure 5.5 Quad 16-bit add/subtract using four registers

EXAMPLE 5.3 Quad 16-Bit Add/Subtract Operations

In quad 16-bit add/subtract operations, both instructions must use the same two 32-bit registers to hold the four 16-bit inputs. In other words, two operations are performed on each pair of 16-bit halves.
For example, the instructions R3 = R1 +|- R2, R4 = R1 -|+ R2 perform addition and subtraction on the halves of R1 and R2 as shown in Figure 5.5. Note that the symbol “,” separates two instructions that are executed in the same cycle.

Besides the previous 16-bit operations, the Blackfin processor can also perform single 32-bit add/subtract using any two 32-bit registers as inputs.
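The single, dual, and quad operations of Examples 5.1 to 5.3 can be modeled in C by treating each 32-bit data register as a pair of 16-bit halves. This is a sketch of the arithmetic only, not of the hardware: all function names are hypothetical, the sat argument stands in for the (s) and (ns) flags, and the 32-bit add is assumed to wrap around on overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Register-half helpers: Rn.H is bits 31-16, Rn.L is bits 15-0. */
static uint16_t hi(uint32_t r) { return (uint16_t)(r >> 16); }
static uint16_t lo(uint32_t r) { return (uint16_t)(r & 0xFFFFu); }
static uint32_t pack(uint16_t h, uint16_t l) { return ((uint32_t)h << 16) | l; }

/* Interpret a 16-bit pattern as a signed two's-complement value. */
static int32_t s16(uint16_t v)
{
    return v >= 0x8000u ? (int32_t)v - 0x10000 : (int32_t)v;
}

/* Finish a 16-bit result: clamp when sat != 0 (s), else wrap (ns). */
static uint16_t finish16(int32_t v, int sat)
{
    if (sat) {
        if (v > 32767)  v = 32767;
        if (v < -32768) v = -32768;
    }
    return (uint16_t)v;  /* conversion to unsigned wraps modulo 2^16 */
}

uint16_t add16(uint16_t a, uint16_t b, int sat) { return finish16(s16(a) + s16(b), sat); }
uint16_t sub16(uint16_t a, uint16_t b, int sat) { return finish16(s16(a) - s16(b), sat); }

/* Models R3 = R1 +|- R2 : add the high halves, subtract the low halves. */
uint32_t dual_add_sub(uint32_t r1, uint32_t r2, int sat)
{
    return pack(add16(hi(r1), hi(r2), sat), sub16(lo(r1), lo(r2), sat));
}

/* Models R4 = R1 -|+ R2 : subtract the high halves, add the low halves. */
uint32_t dual_sub_add(uint32_t r1, uint32_t r2, int sat)
{
    return pack(sub16(hi(r1), hi(r2), sat), add16(lo(r1), lo(r2), sat));
}

/* Models a single 32-bit add, R3 = R1 + R2 (wrap-around assumed). */
uint32_t add32(uint32_t r1, uint32_t r2) { return r1 + r2; }

void add_sub_demo(void)
{
    uint32_t r1 = pack(5, 3);   /* R1.H = 5, R1.L = 3 */
    uint32_t r2 = pack(2, 1);   /* R2.H = 2, R2.L = 1 */

    /* Example 5.1: R3.H = R1.L + R2.H (ns) gives 3 + 2. */
    assert(add16(lo(r1), hi(r2), 0) == 5);

    /* Examples 5.2 and 5.3: the quad form is both dual results from
       the same two source registers. */
    assert(dual_add_sub(r1, r2, 0) == pack(7, 2));
    assert(dual_sub_add(r1, r2, 0) == pack(3, 4));

    /* (s) clamps at full scale, (ns) wraps to the negative range. */
    assert(add16(0x7FFF, 1, 1) == 0x7FFF);
    assert(add16(0x7FFF, 1, 0) == 0x8000);
}
```

In this model the quad operation is simply dual_add_sub and dual_sub_add applied to the same r1 and r2, which mirrors the restriction that both instructions share one pair of source registers.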