Graduate Institute of Electronics Engineering, NTU

Blackfin Processor Architecture

Instructor: Prof. Andy Wu

ACCESS IC LAB

Introduction
Blackfin Processor
Blackfin Processor Product Highlights

Introduction

Berkeley incorporated a Reduced Instruction Set Computer (RISC) architecture

It has the following key features:
A fixed (32-bit) instruction size with few formats
  CISC processors typically had variable-length instruction sets with many formats
A load-store architecture where instructions that process data operate only on registers and are separate from instructions that access memory
  CISC processors typically allowed values in memory to be used as operands in data processing instructions
A large register bank of thirty-two 32-bit registers, all of which could be used for any purpose, to allow the load-store architecture to operate efficiently
  CISC register sets were getting larger, but none was this large and most had different registers for different purposes

Hard-wired instruction decode logic
  CISC processors used large microcode ROMs to decode their instructions

Pipelined execution
  CISC processors allowed little, if any, overlap between consecutive instructions (though they do now)

Single-cycle execution
  CISC processors typically took many clock cycles to complete a single instruction

Single memory space for program and data
Shared global bus

Separate program and data memory spaces
Usually refers to separate program and data buses

Program bus can be used for coefficient loading for the MAC

Blackfin Processor

Made by Analog Devices, Inc.

A new breed of embedded media processor designed specifically for today's embedded audio, video and communication applications.

Combines a 32-bit RISC-like instruction set with dual 16-bit multiply-accumulate (MAC) signal processing functionality

Performs equally well in both signal processing and control processing applications, in many cases eliminating the need for separate heterogeneous processors

Two 16-bit MACs, two 40-bit ALUs, four 8-bit Video ALUs

Support for 8/16/32-bit integer and 16/32-bit fractional data types

Concurrent fetch of one instruction and two unique data elements

Two loop counters that allow for nested zero-overhead looping

A modified Harvard architecture in combination with a hierarchical memory structure

Arbitrary bit and bit field manipulation, insertion and extraction

Two data address generator (DAG) units with circular and bit-reversed addressing
  Data address generator contains two 32-bit address ALUs and an address register file
  Address register file consists of six 32-bit general purpose pointer registers and four 32-bit circular buffer addressing registers

Unified 4GB memory space

Mixed 16/32-bit instruction encoding for best code density

Memory protection for support of OS operation
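
As an illustration of the circular and bit-reversed addressing listed for the DAG units, here is a minimal Python sketch. The function names and the parameters index, modify, base and length are hypothetical labels chosen for this example (they loosely mirror index, modify, base and length registers), and the sketch assumes the modify value is no larger in magnitude than the buffer length. The bit-reversal helper only shows the resulting access order, as used for FFT data; it is not a model of how the hardware computes it.

```python
def circular_update(index, modify, base, length):
    """Post-modify an index so it stays inside [base, base + length).

    Assumes abs(modify) <= length, so one wrap correction is enough.
    """
    new_index = index + modify
    if new_index >= base + length:
        new_index -= length      # ran past the end: wrap to the start
    elif new_index < base:
        new_index += length      # ran before the start: wrap to the end
    return new_index


def bit_reverse(value, bits):
    """Reverse the lowest `bits` bits of value."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def bit_reversed_order(bits):
    """Access order for a buffer of 2**bits elements."""
    return [bit_reverse(i, bits) for i in range(1 << bits)]
```

For a 16-byte buffer at a hypothetical base address 0x1000, stepping by 4 from 0x100C wraps back to 0x1000. For an 8-point buffer, bit_reversed_order(3) visits indices 0, 4, 2, 6, 1, 5, 3, 7.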

Three modes of operation
  User mode
    User mode has restricted access to a subset of system resources, thus providing a protected software environment
    User mode is considered the domain of application programs
  Supervisor mode and Emulation mode
    Supervisor mode and Emulation mode have unrestricted access to the core resources
    Supervisor mode and Emulation mode are usually reserved for the kernel code of an operating system

Blackfin Architecture Support (Single Cycle)

Possibility of the following parallel operations processed in one clock cycle
  Execution of a single instruction operating on both MACs or ALUs
  Execution of two 32-bit data moves (2 reads, or 1 read and 1 write)
  Execution of two pointer updates
  Execution of hardware loop updates

Blackfin Processor Compute Unit

BF533 Memory Access

Under the right conditions, 4 memory accesses at the same time
  64 bit Instruction Fetch, 2 x 32 bit Data Loads, 32 bit Data Store

PLUS up to 2 ALU (32 bit) and 2 MAC (16 bit) operations at the same time
PLUS background DMA activity

Compute Unit Architecture

Register File
  8 x 32 bit OR 16 x 16 bit

Data Register Syntax
  R0, R1 etc. refer to 32 bit registers
  R0.L refers to the low 16 bits of the R0 32 bit register
  R0.H refers to the high 16 bits of the R0 register

Accumulator Syntax
  A0.L => low 16 bits
  A0.H => next 16 bits
  A0.W => least significant 32 bit word
  A0.X => MS 8 bit extension
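
The register and accumulator field names above can be read as bit slices. The following Python sketch is only an illustration of that mapping: the helper names are hypothetical, registers are modelled as unsigned 32-bit integers and accumulators as unsigned 40-bit integers.

```python
def reg_low(r):
    """R0.L: low 16 bits of a 32-bit data register."""
    return r & 0xFFFF


def reg_high(r):
    """R0.H: high 16 bits of a 32-bit data register."""
    return (r >> 16) & 0xFFFF


def acc_parts(a):
    """Split a 40-bit accumulator value into its named fields."""
    a &= (1 << 40) - 1
    return {
        "L": a & 0xFFFF,            # A0.L: low 16 bits
        "H": (a >> 16) & 0xFFFF,    # A0.H: next 16 bits
        "W": a & 0xFFFFFFFF,        # A0.W: least significant 32-bit word
        "X": (a >> 32) & 0xFF,      # A0.X: most significant 8-bit extension
    }
```

For example, a register holding 0x12345678 has R0.H = 0x1234 and R0.L = 0x5678.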

SHARC: 16 32-bit data registers, integer and float.
There is a pair of 2 x 40 bit accumulator registers too.

Blackfin                 SHARC                68K

R0 = R1 + R2;            R0 = R1 + R2;        MOVE.L R2, R0
                                              ADD.L  R1, R0

R0.L = R1.L + R2.H;                           MOVE.W R2, R0
                                              ADD.W  R1, R0

R0 = R1 +|- R2;          Closest:             MOVE.L R2, R0
  Means                  R0 = R1 + R2,        ASR.L  #16, R0
  R0.L = R1.L - R2.L     R4 = R1 - R2;        MOVE.L R1, R3
  in parallel with                            ASR.L  #16, R3
  R0.H = R1.H + R2.H                          ADD.W  R3, R0
                                              ASL.L  #16, R0
                                              MOVE.W R2, R0
                                              ADD.W  R1, R0
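
As a rough model of the dual 16-bit operation R0 = R1 +|- R2 (high halves added, low halves subtracted, both in the same instruction), here is a small Python sketch. The function name is hypothetical, registers are modelled as unsigned 32-bit integers, and each half is assumed to wrap around on overflow rather than saturate.

```python
MASK16 = 0xFFFF


def add_sub_16(r1, r2):
    """Model R0 = R1 +|- R2 on packed 16-bit halves (wraparound assumed)."""
    high = ((r1 >> 16) + (r2 >> 16)) & MASK16         # R0.H = R1.H + R2.H
    low = ((r1 & MASK16) - (r2 & MASK16)) & MASK16    # R0.L = R1.L - R2.L
    return (high << 16) | low
```

With R1 = 0x00050007 and R2 = 0x00020003 the result is 0x00070004: the high halves give 5 + 2 = 7 and the low halves give 7 - 3 = 4.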

A & B registers must stay on the same side of the | for both instructions
For dual and quad 16 bit operations the (CO) option causes the destination registers to cross

Multiplies are signed fractional by default
  Signed fractional multiply result is automatically left shifted 1 bit
  Signed fractional multiply != signed integer multiply
Rounding is available on fractional multiplies and, as a special option, on integer multiplies
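
To see why a signed fractional multiply differs from a signed integer multiply, the sketch below treats 16-bit operands as signed 1.15 fractions, so the product needs a 1-bit left shift to land in 1.31 format. The function names are hypothetical, and clamping the single overflow case (-1 x -1) to 0x7FFFFFFF is an assumption of this sketch.

```python
def to_signed16(x):
    """Interpret the low 16 bits of x as a two's-complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def mult_integer(a, b):
    """Signed integer multiply: plain 16 x 16 -> 32-bit product."""
    return to_signed16(a) * to_signed16(b)


def mult_fractional(a, b):
    """Signed fractional multiply: 1.15 x 1.15 -> 1.31.

    The integer product carries two sign bits, so it is shifted left by
    one. Only -1 x -1 can overflow; it is clamped here (assumption).
    """
    return min(mult_integer(a, b) << 1, 0x7FFFFFFF)
```

For example, 0x4000 is 0.5 in 1.15 format. The fractional product of 0x4000 and 0x4000 is 0x20000000 (0.25 in 1.31 format), while the integer product of the same bit patterns is 0x10000000.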

Two cases

Rounding adds 0x8000 to the 32 bit multiplier result or accumulator value before extracting a 16 bit value to the destination register
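
The rounding step can be sketched in a few lines of Python. The function name is hypothetical, and the sketch ignores any saturation that might apply when the addition overflows.

```python
def round_to_16(value32):
    """Add 0x8000 to a 32-bit value, then keep bits 31..16 as the result."""
    return ((value32 + 0x8000) >> 16) & 0xFFFF
```

A value of 0x12348000 rounds up to 0x1235, while 0x12347FFF stays at 0x1234, because only a low half of 0x8000 or more carries into the high 16 bits.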

When extracting a 16 bit fractional value from an accumulator, the high 16 bits are taken
  Where in the destination register it goes depends on which accumulator is being extracted from

When extracting a 16 bit integer value from an accumulator, the low 16 bits are taken
  Where in the destination register the 16 bit value goes depends on which accumulator is being extracted from
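
The two extraction cases can be compared side by side in a short Python sketch. The function names are hypothetical and rounding and saturation are left out. The placement rule used here (A0 results go to the low half of the destination, A1 results to the high half) is an assumption of the sketch, since the text above only says that placement depends on the accumulator.

```python
def extract16(acc, fractional):
    """Take 16 bits from the low 32-bit word of a 40-bit accumulator.

    Fractional extraction keeps bits 31..16; integer extraction keeps
    bits 15..0.
    """
    acc &= (1 << 40) - 1
    return (acc >> 16) & 0xFFFF if fractional else acc & 0xFFFF


def store_half(dest, half16, accumulator):
    """Write a 16-bit value into one half of a 32-bit register.

    Assumption: accumulator 0 (A0) targets the low half and
    accumulator 1 (A1) targets the high half.
    """
    if accumulator == 0:
        return (dest & 0xFFFF0000) | (half16 & 0xFFFF)
    return (dest & 0x0000FFFF) | ((half16 & 0xFFFF) << 16)
```

For an accumulator holding 0x0012345678, fractional extraction yields 0x1234 and integer extraction yields 0x5678.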

In general there are 16 and 32 bit versions of the arithmetic instructions
Most of the 32 bit instructions can be executed in parallel with 2 x 16 bit memory/index operations
  Exceptions are DIVS, DIVQ and MULTIPLY with 32 bit operands
  || means parallel
Examples:
  A1=R2.L*R1.L, A0=R2.H*R1.H || R2.H=W[I2++] || [I3++]=R3;
  R2=R2+|+R4, R4=R2-|-R4 || I0+=M0 || R1=[I0];
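
The arithmetic part of the first example (A1=R2.L*R1.L, A0=R2.H*R1.H) can be modelled as two independent signed fractional multiplies. This Python sketch is only an approximation under stated assumptions: the function names are hypothetical, the default signed fractional mode is assumed, the -1 x -1 case is clamped to 0x7FFFFFFF, and the parallel memory operations after || are not modelled.

```python
def s16(x):
    """Interpret the low 16 bits of x as a two's-complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def frac_mult(a, b):
    """Signed fractional 16 x 16 multiply with the 1-bit left shift."""
    return min((s16(a) * s16(b)) << 1, 0x7FFFFFFF)


def dual_multiply(r1, r2):
    """Model A1 = R2.L * R1.L, A0 = R2.H * R1.H; returns (a0, a1)."""
    r1_h, r1_l = r1 >> 16, r1 & 0xFFFF
    r2_h, r2_l = r2 >> 16, r2 & 0xFFFF
    a1 = frac_mult(r2_l, r1_l)   # MAC1 works on the low halves here
    a0 = frac_mult(r2_h, r1_h)   # MAC0 works on the high halves here
    return a0, a1
```

With R1 = 0x40002000 (0.5 and 0.25) and R2 = 0x40004000 (0.5 and 0.5), the sketch gives A0 = 0x20000000 (0.25) and A1 = 0x10000000 (0.125).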

Blackfin Processor Memory Architecture

A single, unified 4G byte address space using 32-bit addresses
The L1 memory system is the primary, highest-performance memory available to the core and is faster than the L2 memory system
The L2 memory system is off-chip and has longer access latencies

Blackfin Processor Peripherals

Parallel Peripheral Interface (PPI)
Serial Ports (SPORTs)
Serial Peripheral Interface (SPI)
General-purpose timers
Universal Asynchronous Receiver Transmitter (UART)
Real-Time Clock (RTC)
Watchdog timer
General-purpose I/O (programmable flags)

Blackfin Processor Product Highlights

ADSP-BF535 EZ-KIT Lite

Key features
  Attributes
    ADSP-BF535 Blackfin Processor
    4M x 32-bit SDRAM
    272K x 16-bit
    AD1885 48 kHz AC'97 SoundMax codec
    Power management capability
    JTAG ICE 14-pin header
    Evaluation suite of VisualDSP++
    Three 90-pin connectors for analyzing and interfacing with the processor's peripheral interfaces
    CE Certified

System Requirements
  Pentium 166 MHz or higher
  Minimum of 32 MB of RAM
  Windows 98, Windows 2000, or Windows XP
  One USB port

Analog Devices CROSSCORE Tools

CROSSCORE, Analog Devices development tools product line, provides easier and more robust methods for engineers to develop and optimize systems by shortening product development cycles for faster time-to-market

VisualDSP++ software development and debugging environment
  An integrated software development and debugging environment allowing for fast and easy development, debug, and deployment
EZ-KIT Lite evaluation systems
  Provides an easy way to investigate the power of ADI's family of Embedded Processors and DSPs to develop applications
Emulators
  Emulators are available for PCI and USB host platforms

ADSP-BF535 Blackfin Processor

Key features
  High performance 16-bit dual MAC processor core up to 350 MHz
  Flexible, software controlled Dynamic Power Management
  Optimized RISC instruction set for high code density and C/C++ language programming
  Enhanced media instructions to process audio, image, and video for multimedia applications
  Integrated system peripherals including USB device, PCI, serial ports, UARTs, SPIs, 32-bit timers, and more

Blackfin processors utilize
  Single processor core
  Single instruction set
  Single programming model
  Single set of development tools

ADSP-BF535 Blackfin Processor

Target applications
  Automotive
  Broadband access
  Central office/network switch
  Digital imaging and printing
  Global positioning systems
  Industrial signal processing
  Instrumentation/telemetry
  Internet appliances
  Modem solutions
  Personal branch exchanges (PBX)
  POS terminals
  Telecommunications
  Video conferencing
  VoIP phone solutions

ADSP-BF535 Blackfin Processor

Blackfin Processor System Environment

ADSP-BF535 Blackfin Processor

Blackfin Processor Memory Hierarchy
  L1 instruction and data memories can be dynamically configured as SRAM, cache, or a combination of both
  L2 for larger instruction and data storage needs

ADSP-BF535 Blackfin Processor

Portable Low Power Architecture
  Dynamic power management

ADSP-BF535 Blackfin Processor

ADSP-BF535 Block Diagram

ADSP-BF561 Blackfin Symmetric Multi-Processor

ADSP-BF561 Symmetric Multi-Processor Block Diagram

ADSP-BF561 Blackfin Symmetric Multi-Processor

Key features
Blackfin Symmetric Multi-Processor
  Dual high performance Blackfin Processors up to 756 MHz
  Capable of over 3000 MMACs
  Independent processor cores for image processing and system control functions
  RISC-like register and instruction model for ease of programming and C/C++ compiler-friendly support
  Enhanced media instructions process audio, image, and video data for multimedia applications
  Software controlled Dynamic Power Management with on-chip voltage regulation minimizes power consumption

ADSP-BF561 Blackfin Symmetric Multi-Processor

Key features
Highest level of integration
  328 Kbytes of total on-chip memory
  Dual Parallel Peripheral Interface supporting ITU-R 656 video data formats
  External memory controller providing glueless connection to multiple banks of external SDRAM, SRAM, FLASH, or ROM memory
  High bandwidth, two-dimensional internal DMA controllers
  UART with support for IrDA
  Integrated on-chip voltage regulator
  256-ball Pb-Free Mini-BGA and 297-ball Sparse PBGA package options

ADSP-BF561 Blackfin Symmetric Multi-Processor

Key features
Target Applications
  Digital still cameras
  Digital video cameras
  Hybrid digital video/still cameras
  Video security/surveillance system
  Portable multimedia players

ADSP-BF531/BF532/BF533 Blackfin Processor Series

Key features
Blackfin Processors Offer Features Attractive to a Broad Application Base
  Performance to 756 MHz/1512 MMAC enables multichannel audio plus VGA/D1 video processing in multimedia applications
  Enhanced Dynamic Power Management with on-chip voltage regulation allows operation to 0.8V, extending battery life in portable applications
  Application-tuned peripherals provide glueless connectivity to general-purpose converters in data acquisition applications
  Multiple low cost, pin and code compatible derivatives enable software differentiation in cost-sensitive consumer applications

ADSP-BF531/BF532/BF533 Blackfin Processor Series

Key features
High Level of Integration
  Up to 148 Kbytes of on-chip SRAM
  Parallel Peripheral Interface supporting ITU-R 656 video data formats
  Two dual-channel, full duplex synchronous serial ports supporting eight stereo I2S channels
  12 DMA channels supporting one- and two-dimensional data transfers
  Memory controller providing glueless connection to multiple banks of external SDRAM, SRAM, flash, or ROM
  Three timers supporting PWM and pulse-width/event count modes
  UART with support for IrDA
  SPI compatible port
  Real-time clock
  Watchdog timer
  PLL capable of 1x to 63x frequency multiplication
  160-ball mini-BGA, 169-ball Pb-Free PBGA and 176-lead LQFP packages
  Commercial and industrial temperature ranges

ADSP-BF531/BF532/BF533 Blackfin Processor Series Core Architecture

Key features
  Two 16-bit multipliers
  Two 40-bit accumulators
  Two 40-bit arithmetic logic units (ALU)
  Four 8-bit video ALUs
  One 40-bit shifter
Compute register file
  Contains eight 32-bit registers
  Can be operated as 16 independent 16-bit registers
MAC
  Can perform a 16-by-16-bit multiply per cycle, with accumulation to a 40-bit result
  Signed and unsigned formats, rounding, and saturation are supported

ADSP-BF531/BF532/BF533 Blackfin Processor Series Core Architecture

Key features
Program sequencer
  Controls the instruction execution flow, including instruction alignment and decoding
  For program flow control, the sequencer supports PC-relative and indirect conditional jumps (with static branch prediction) and subroutine calls
  Hardware is provided to support zero-overhead looping
  The architecture is fully interlocked, meaning there are no visible pipeline effects when executing instructions with data dependencies
Address arithmetic unit
  Provides two addresses for simultaneous dual fetches from memory
  Contains a multiported register file consisting of four sets of 32-bit Index, Modify, Length, and Base registers (for circular buffering) and eight additional 32-bit pointer registers (for C-style indexed stack manipulation)

ADSP-BF531/BF532/BF533 Blackfin Processor Series Core Architecture

Key features
Blackfin processors support a modified Harvard architecture in combination with a hierarchical memory structure
  Level 1 (L1) memories typically operate at the full processor speed with little or no latency
  At the L1 level, the instruction memory holds instructions only. The two data memories hold data, and a dedicated scratchpad data memory stores stack and local variable information
Three modes of operation
  User mode has restricted access to a subset of system resources, thus providing a protected software environment
  Supervisor and Emulation modes have unrestricted access to the system core resources

[1] Analog Devices Web Site, http://www.analog.com/
[2] Blackfin Processor, http://www.analog.com/processors/processors/blackfin/
[3] ADSP-BF533 Blackfin Processor Hardware Reference, Rev 1.0, December 2003, Analog Devices. Section 2
[4] Blackfin Processor Instruction Set Reference, Rev 3, June 2004, Analog Devices. Sections 8 ~ 10, 14 & 15

I suggest that students who want to become familiar with the Blackfin Processor read references 3 and 4 thoroughly.