Core Architectures Introduction

Purpose • The intent of this module is to explain the advantages of the Hybrid Controller over traditional microcontrollers and DSPs, and to describe the natural progression from the 56800 core to the 56800E core.

Objectives • Identify the benefits of the Hybrid Controller over microcontrollers and DSPs. • Identify the MCU and DSP features of the 56800 and 56800E cores. • Describe the enhancements of the 56800E core over the 56800 core. • Describe the underlying 56800E core architecture.

Contents • 11 pages • 2 questions

Learning Time • 20 minutes

In this module, we will explain the advantages of the Hybrid Controller over traditional microcontrollers and Digital Signal Processors (DSPs), and describe the natural progression from the 56800 core to the 56800E core. We will describe the benefits of the Hybrid Controller over microcontrollers and DSPs. We will also outline the MCU and DSP features of the 56800 and 56800E cores. Finally, we will describe the enhancements of the 56800E core over the 56800 core, as well as the underlying core architecture of the 56800E.

1 Hybrid Controller Functionality

Roll your mouse pointer over the Traditional Microcontroller and Traditional DSP Engine boxes to learn more.

Traditional Microcontroller
Positive Features: • Designed for controller code • Compact code size • Easy to program
Negative Features: • Not efficient for DSP

Traditional DSP Engine
Positive Features: • Designed for DSP processing • Designed for matrix operations
Negative Features: • Difficult to program • Not optimized for control

56800/E Hybrid Controller (DSP + MCU)
• Instructions optimized for controller code, DSP, matrix operations • Compact assembly and “C” compiled code size • Easy to program • Improved processing power (MIPS) over traditional MCU • Extended addressing space

Here, we can see the traditional microcontroller, the traditional DSP engine, and the 56800/E Hybrid Controller. Traditional MCUs and DSPs have both positive and negative features. However, the 56800/E resolves all of the negative features associated with the traditional MCUs and DSPs. Roll your mouse pointer over the Traditional Microcontroller and Traditional DSP Engine boxes to learn more.

The 56800E provides a natural roadmap continuation of the first generation 56800 family. It features the unique combination of true DSP and controller functionality. It also has full source code compatibility with the original 56800 core. Its instructions are further optimized for controller code and DSP operations along with extended registers and data modes (byte and long) for compiler efficiency.

The 56800/E is as flexible as an MCU and as powerful as a DSP. The core is easy to program and has features that support improved compiler efficiency. The core provides processing speeds of up to 200 MHz, unheard of in the traditional MCU world. The address buses support an extended addressing range of up to 4 MB of program memory and 32 MB of data memory, providing additional application development flexibility.

2 56800E Features

• 1.8 V core voltage and lower (based on migration)
• 0.136 mA/MHz power consumption for the core
• Single instruction 16x16-bit multiply with 36-bit accumulator
• Nested DO loops
• Efficient C compiler and local variable support
• Full scan based test methodology
• Up to 200 MIPS (120 MIPS at 120 MHz and beyond)
• Supports byte, word, and long data types
• Fully source compatible with first generation 56800 architecture
• Instruction set that supports both DSP and controller functions
• Fast interrupt capability for Level 2 interrupts
• Vectored interrupt capability with five priority levels
• Enhanced On-Chip Emulation (EOnCE)
• Low-cost embedded core with MCU and DSP functionality
• 16-bit instructions (for optimal code density)
• 32-bit buses and arithmetic units (for throughput)
• 16x16 multiply-accumulator with 36-bit result
• 24-bit addressing range for data
• 21-bit addressing range for program

This is a reference page for the previous page

3 56800/E MCU Functionality

Roll your mouse pointer over each feature to learn more.

True Software Stack and Pointer
• Efficient C programming • Structured programming • Unlimited function calls • Local variable and parameter passing • Stack not limited in size or location by the hardware

16-bit Program Word
• Optimal code density • Minimizes the amount of program memory

General Purpose Register Files and Orthogonal Instructions to Data and Address Register Files
• Any Data (ALU) register can be used as source or destination for arithmetic operations • Code efficiency • Compiler efficiency • Programming flexibility

56800E: 8, 16, 32-bit Data Types (supported by instruction set); 56800: 16-bit Data Type
• 56800E: 32-bit Core-global Data Bus (CDBR-CDBW) • 56800: 16-bit Core-global Data Bus (CDBR-CDBW) • More data types give additional code flexibility for increased programming efficiency

19 Addressing Modes
• Compact code size • Efficient compiler performance • Programming ease and flexibility

Atomic Read-Modify-Write Instructions
• Non-interruptible bit manipulation instructions

Full Set of Bit Manipulation Instructions and 16- and 32-bit Shifting
• Dedicated bit manipulation unit in the core • Efficient control code and peripheral programming • MCU functionality of Hybrid Controller

The advanced 56F800E architecture is the successful merger of several types of processors. The 56800E project challenged world-class core designers to create a core that combined the best points of 8-bit, 16-bit, and 32-bit MCU cores with the performance of digital signal processing cores. The designers built on their 56800 core knowledge and succeeded with the 56800E. The result is a core that has the signal processing power of a DSP, the ease of programming of a 16-bit MCU, and 32-bit performance with 16-bit code density.

Here, we can see the features that provide the MCU properties found in the 56800 and 56800E cores.
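
As a rough illustration of the bit manipulation features listed above, the following C sketch models set, clear, invert, and test operations under a 16-bit mask on a memory-mapped register. The function names are hypothetical and do not come from any vendor library. In portable C, each helper is an ordinary read-modify-write sequence with no atomicity guarantee; the benefit of the core's dedicated instructions is that the same operation cannot be interrupted part-way. Whether a particular compiler emits those instructions for this code is an assumption that is not verified here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers: software model of masked bit operations on a
 * 16-bit register. Plain C does not make these atomic. */

/* Set every bit selected by mask. */
void reg_set_bits(volatile uint16_t *reg, uint16_t mask)
{
    assert(reg != NULL);
    *reg = (uint16_t)(*reg | mask);
}

/* Clear every bit selected by mask. */
void reg_clear_bits(volatile uint16_t *reg, uint16_t mask)
{
    assert(reg != NULL);
    *reg = (uint16_t)(*reg & (uint16_t)~mask);
}

/* Invert every bit selected by mask. */
void reg_invert_bits(volatile uint16_t *reg, uint16_t mask)
{
    assert(reg != NULL);
    *reg = (uint16_t)(*reg ^ mask);
}

/* Test whether all bits selected by mask are set. */
bool reg_bits_all_set(const volatile uint16_t *reg, uint16_t mask)
{
    assert(reg != NULL);
    return (uint16_t)(*reg & mask) == mask;
}
```

When an interrupt handler may touch the same register, a sequence like this in C would normally need interrupts masked around it; a single non-interruptible instruction removes that need.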

4 56800/E DSP Functionality

Roll your mouse pointer over each property for more information.

Multiplier-Accumulator (MAC); Single and dual parallel move instructions
• Features DSP programmers demand for executing true signal processing algorithms

No overhead hardware looping; Nested looping capability
• Two hardware supported DO loops that are also interruptible

Modulo arithmetic (for circular buffers); Integer and fractional arithmetic support
• Supports traditional DSP mathematical functions for executing complicated computations

56800E: 5 levels of interrupt priorities, HW Interrupt nesting, Fast Interrupt support; 56800: 2 levels of interrupt priorities, SW Interrupt nesting
• Gives the programmer more control in interrupt-driven applications

Here, we can see the properties of the 56800 and 56800E cores that give them the superior signal processing capability of a DSP.

Parallel moves support highly efficient DSP operations. The sum of products used in filtering requires a multiply-accumulate and dual parallel moves to fetch the next two vector elements. All of this can be performed in a single instruction, which is highly efficient!
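
To make the sum of products concrete, here is a minimal C sketch of the computation. It assumes the conventional 1.15 signed fractional format (Q15) for the operands and uses a 64-bit integer to stand in for the core's 36-bit accumulator; the function name is hypothetical. Each loop iteration corresponds to the work the text describes as one multiply-accumulate plus two operand fetches. The C code only shows the arithmetic and does not claim to reproduce the single-instruction timing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sum of products of two Q15 fractional vectors.
 * The int64_t accumulator stands in for a 36-bit hardware accumulator,
 * so the result is a Q31 value that may exceed the 32-bit range. */
int64_t sum_of_products_q15(const int16_t *x, const int16_t *c, size_t n)
{
    assert(x != NULL && c != NULL);
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        /* 16x16 multiply: the product always fits in 32 bits. */
        int32_t product = (int32_t)x[i] * (int32_t)c[i];
        /* Fractional multiply: doubling aligns Q15 * Q15 to Q31. */
        acc += 2 * (int64_t)product;
    }
    return acc;
}
```

For example, with x = {0.5, 0.25} and c = {0.5, 0.5} in Q15, the result is 0x30000000, which is 0.375 in Q31. Multiplying -1.0 by -1.0 yields 2^31, one more than a 32-bit signed value can hold, which is the kind of case accumulator extension bits are meant to absorb.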

There is a two-deep hardware stack and support for two hardware DO loops. These DO loops are also interruptible.

Dedicated modulo registers support automatic wrapping of the index to the top of a circular buffer when the end of the buffer is reached. The core instruction set supports integer and fractional arithmetic, which in turn supports traditional DSP mathematical functions.
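
The wraparound behavior can be sketched in software as follows. This is only a model of what modulo addressing achieves, written in portable C with hypothetical type and function names; on the core, the wrap is handled by the address generation hardware rather than by an explicit comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software model of a circular buffer. */
typedef struct {
    int16_t *base;  /* top of the buffer */
    size_t   len;   /* number of elements */
    size_t   idx;   /* next position to write */
} circ_buf_t;

/* Attach caller-provided storage and start at the top of the buffer. */
void circ_init(circ_buf_t *cb, int16_t *storage, size_t len)
{
    assert(cb != NULL && storage != NULL && len > 0);
    cb->base = storage;
    cb->len = len;
    cb->idx = 0;
}

/* Store one sample, then advance the index with wraparound. */
void circ_put(circ_buf_t *cb, int16_t sample)
{
    assert(cb != NULL);
    cb->base[cb->idx] = sample;
    cb->idx++;
    if (cb->idx == cb->len) {
        cb->idx = 0;  /* end reached: wrap back to the top */
    }
}
```

Writing four samples into a three-element buffer overwrites the oldest sample at the top, which is the behavior a delay line in a filter relies on.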

With the 56800E, there are five user-assigned levels of interrupt priorities with hardware-supported nesting. Fast interrupts provide low-latency support for Level 2 interrupts.

With the 56800, there are two user-assigned levels of interrupt priorities with software-supported nesting.

5 Question

Complete the following sentence.

The 56800E core has the signal processing power of a ______ and the ease of programming of an ______.

• MCU • ALU • MAC • DO loop • DSP

Here’s an opportunity to see if you can remember what you have learned so far about the 56800E core.

The 56800E core has the signal processing power of a DSP and the ease of programming of an MCU.

6 Programming Model Comparison

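[Figure: 56800E programming model, Data ALU and AGU registers]

DATA ARITHMETIC LOGIC UNIT
• Data registers A, B, C, and D: 36 bits each, for example A2 (bits 35-32), A1 (bits 31-16), and A0 (bits 15-0)
• Data register Y: Y1 and Y0
• Data register X0

ADDRESS GENERATION UNIT
• Pointer registers R0, R1, R2, R3, R4, R5, N, and SP (bits 23-0)
• Secondary offset register N3 (bits 15-0)
• Modifier register M01 (bits 15-0)
• The R0, R1, N, and M01 registers are shadowed.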

Here, we can see some of the enhancements of the 56800E core over the 56800 core.

In the 56800E core, there are two additional Data registers (C and D), as well as two additional Pointer registers (R4 and R5). The Pointer registers are extended to 24 bits to support a larger memory map. There is also a secondary Offset register (N3). These additional registers increase compiler efficiency by reducing the number of function parameters and variable references that must be accessed from the software stack.

7 Programming Model Comparison

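[Figure: 56800E programming model, Program Controller and AGU registers]

PROGRAM CONTROLLER
• Program counter: PC (bits 20-0)
• Operating mode and status: OMR and SR (bits 15-0)
• Loop address: LA and LA2 (bits 23-0)
• Hardware stack: HWS0 and HWS1 (bits 23-0)
• Loop counter: LC and LC2 (bits 15-0)
• Fast interrupt return address: FIRA (bits 20-0)
• Fast interrupt status register: FISR (bits 12-0)

ADDRESS GENERATION UNIT
• Pointer registers R0, R1, R2, R3, R4, R5, N, and SP (bits 23-0)
• Secondary offset register N3 (bits 15-0)
• Modifier register M01 (bits 15-0)
• The R0, R1, N, and M01 registers are shadowed.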

Here, we can see more enhancements of the 56800E core over the 56800 core.

In the 56800E core, there is a 21-bit Program Counter (PC). There is also a secondary hardware DO loop feature that is supported by the Loop Address (LA2) register and the Loop Counter (LC2) register.

The Fast Interrupt feature reduces ISR overhead and can be mapped to any level 2 interrupt vector. Registers dedicated to this feature include shadow registers (R0, R1, N, and M01), the Fast Interrupt Return Address (FIRA) register, and the Fast Interrupt Status Register (FISR). There is also a dedicated return from interrupt with delay instruction (FRTID) to provide the lowest latency and overhead.

8 56800E Core Architecture

[Figure: 56800E core block diagram]

Functional blocks:
• Program Controller: PC, LA, LA2, HWS, FIRA, FISR, SR, OMR, LC, LC2; Instruction Decoder, Interrupt Unit, Looping Unit
• AGU: ALU1, ALU2; M01, N3, R0, R1, R2, R3, R4, R5, N, SP
• Data ALU: A, B, C, D, Y0, Y1, X0; MAC and ALU; Multi-bit Shifter
• Bit Manipulation Unit
• EOnCE/JTAG TAP
• Program Memory, Data Memory, IP-Bus Interface, External Bus Interface

Buses: PAB (21 bits), PDB (16 bits), XAB1 (24 bits), XAB2 (24 bits), CDBR (32 bits), CDBW, XDB2 (16 bits)

Operations performed:
• 1st, instruction fetch: PAB / PDB
• 2nd, 1st data access: XAB1 / CDBR-CDBW
• 3rd, 2nd data access: XAB2 / XDB2

This is a block diagram of the 56800E core. Two key features are the Dual Harvard Architecture and independent functional units.

The Dual Harvard Architecture allows for simultaneous program and data memory accesses. The data memory interface also supports two simultaneous read operations, enabling a total of three simultaneous memory accesses. For example, it is possible for the Data ALU to perform a multiplication operation, the Address Generation Unit (AGU) to generate up to two addresses, and the Program Controller to prefetch the next instruction, all within a single execution cycle. This supports a MAC and dual parallel moves in a single cycle, which enables highly efficient DSP functionality (the sum of products algorithm, for example).

The Program Controller, AGU, and Data Arithmetic Logic Unit (ALU) contain their own register sets and control logic, which allows them to operate independently and in parallel to speed up program execution. There is also an independent Bit Manipulation Unit that enables efficient control operations typically associated with microcontrollers.

The 56800 core is a subset of the 56800E core. Therefore, most of what we will discuss here will apply to both cores.

The Program Controller and Hardware Looping Unit handle instruction fetch, instruction decoding, hardware loop control, exception processing, and change-of-flow.

The AGU performs all of the effective address calculations and address storage necessary to address data operands in memory. The AGU operates in parallel with other chip resources to minimize address-generation overhead. It contains two ALUs (generating up to two addresses every cycle), one for either XAB1 (24-bit Data Address Bus 1) or PAB (21-bit Program Address Bus) and one for XAB2 (24-bit Data Address Bus 2).

The Data ALU performs all arithmetic and logical operations on data operands (8, 16, and 32-bit operations). This includes multiplication, multiply-accumulation (positive or negative accumulation), addition, subtraction, shifting, and logical operations, each executing in one instruction cycle (16 x 16 MAC unit with 36-bit result). There is support for signed and unsigned arithmetic, integer and fractional arithmetic, single and double precision arithmetic, fast normalization, and division iteration.

The Bit Manipulation Unit operates on any core registers, peripheral registers, or data memory locations. Its bit manipulation instructions can test, set, clear, or invert any bits specified in a 16-bit mask in a single clock cycle, so the operation cannot be interrupted. This is an important feature when manipulating memory or memory-mapped registers in the presence of interrupts that may also be manipulating the same memory location or register. The unit can also branch on bits set or clear.

The Enhanced On-Chip Emulation (EOnCE) module has real-time debugging features and allows a user to interact with the core and peripherals non-intrusively in a debug environment. It can examine and change core registers, on-chip peripheral registers, and memory; set breakpoints on program or data memory; set multiple breakpoints; and step or trace instructions. It supports data upload and download, and it has a change-of-flow trace buffer.

The On-Chip Memory and Bus Interfaces have three address buses. Data memory addresses are provided on 24-bit address buses (XAB1 and XAB2). Program memory addresses are provided on a 21-bit address bus (PAB). PAB, XAB1, and XAB2 can each provide addresses for accessing both internal and external memory. There are also four unidirectional data buses. The 56800E has 2 x 32-bit unidirectional core data buses (CDBW - data write, CDBR - data read). The 56800 has 2 x 16-bit unidirectional core data buses (CDBW - data write, CDBR - data read). Both have a 1 x 16-bit Program memory data bus (PDB - instruction fetch) and a 1 x 16-bit internal Data memory data bus 2 (XDB2 - data read).

9 56800E Core Comparisons

Core     MIPS (Max)   Clocks per Instr   # Interrupt Levels   Registers           Data Types              Program Memory Addr Space   Data Memory Addr Space   Technology
56800    40           2                  2                    5 Data, 5 Address   16-bit                  128 KB                      128 KB                   Semi-custom
56800E   200          1                  5                    7 Data, 8 Address   8-bit, 16-bit, 32-bit   4 MB                        32 MB                    Fully Synthesizable & Scannable

56800E Additional Enhancements • Compiler efficiency • Fast interrupt • Additional nested hardware looping • Enhanced OnCE/JTAG

Let’s compare the 56800 and 56800E cores. The 56800E core supports higher processing rates (up to 200 MIPS) than the 56800 core (up to 40 MIPS). The 56800E core is capable of executing 1 instruction per clock cycle, whereas the 56800 core requires 2 clock cycles per instruction. The 56800E core also has a deeper pipeline than the 56800 core, which increases processing rates as well.
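
The relationship between clock rate, clocks per instruction, and peak MIPS can be checked with a small calculation. The sketch below uses a hypothetical helper name. The 200 MHz figure for the 56800E appears earlier in this module; the 80 MHz figure for the 56800 is not stated in the module and is inferred here from the 40 MIPS and 2 clocks per instruction values in the comparison table.

```c
#include <assert.h>

/* Hypothetical helper: peak MIPS for a core clock given in MHz and a
 * fixed number of clock cycles per instruction. */
unsigned peak_mips(unsigned clock_mhz, unsigned clocks_per_instr)
{
    assert(clocks_per_instr > 0);
    return clock_mhz / clocks_per_instr;
}
```

With these inputs, peak_mips(200, 1) gives 200 for the 56800E and peak_mips(80, 2) gives 40 for the 56800, matching the table.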

In the 56800E core, there is hardware that supports five levels of nested interrupts, whereas the 56800 core relies on software. The 56800E also supports 8, 16, and 32-bit native data types through instruction set extensions and a 32-bit-wide data bus.

The 56800E core has increased program and data memory addressing, which includes a 21-bit program address bus (4 MB) and a 24-bit data address bus (32 MB). A synthesizable design allows us to more easily target and migrate down the technology curve, which means more cost reduction opportunities in a shorter amount of time. This translates to easier customization for special purpose parts at a reduced cost and cycle time.

The 56800E also has additional enhancements. There is greater compiler efficiency. The instruction set support for byte, word, and long data types reduces the number of instructions needed to handle these data types and shortens the execution time of the code. Additional address and data registers allow less stack usage.

Fast interrupt provides low-latency ISR servicing for Level 2 interrupts. One additional interruptible hardware DO loop supports nested hardware looping. The enhanced OnCE/JTAG supports real-time, non-intrusive debugging.

10 Question

Is the following statement true or false?

The 56800E core has decreased program and data memory addressing when compared to the 56800 core.

True

False

Consider this question about the 56800 core and the 56800E core.

The 56800E core has increased program and data memory addressing, which includes a 21-bit program address bus (4MB) and a 24-bit data address bus (32MB). The 56800 core includes 16-bit program and data address buses (128KB).
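
The memory sizes quoted above follow from the address bus widths if each address selects one 16-bit (2-byte) word. That word-addressing assumption is what this sketch encodes; the helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bytes reachable with an address bus of addr_bits
 * bits, assuming each address selects one 16-bit (2-byte) word. */
uint64_t addr_space_bytes(unsigned addr_bits)
{
    assert(addr_bits < 63);
    return (UINT64_C(1) << addr_bits) * 2u;
}
```

Under this assumption, a 21-bit bus reaches 2^21 words, or 4 MB; a 24-bit bus reaches 2^24 words, or 32 MB; and a 16-bit bus reaches 2^16 words, or 128 KB.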

11 Module Summary

• Advantages and benefits of the Hybrid Controller • Progression from the 56800 core to the 56800E core • MCU features of both cores • DSP features of both cores • 56800E core architecture • 56800E enhancements

In this module, we have outlined the advantages and benefits of the Hybrid Controller over traditional microcontrollers and DSPs. We have described the natural progression from the 56800 core to the 56800E core, as well as described the MCU and DSP features of the 56800 and 56800E cores. Finally, we have described the enhancements of the 56800E core over the 56800 core, as well as the underlying core architecture of the 56800E.
