Class 413 Rapidly Developing Embedded Systems Using Configurable Processors

Steven Knapp ([email protected])

Triscend Corporation www.triscend.com (Booth 160)

© Copyright 1998-99, Triscend Corporation. All rights reserved. Agenda

• What is a Configurable Processor? • Configurable Processors architectures • Building custom peripherals • Hardware/software trade-offs, algorithmic acceleration • Comparing the alternatives • Configurable Processor technical challenges – Communication – Software development environment – Debugging • Summary Terminology Used in this Session

• Configurable Processor – Embedded processor core – Programmable logic to build peripherals – Dedicated System Bus – On-chip memory • User-Definable Processor – Parameterized or changeable processor hardware description – Typically synthesizable from VHDL/Verilog – Usually targeted to ASIC or system-on-a-chip – Examples: ARC, Tensilica, etc. Trends Toward the Configurable Processor • Decreasing Time-to-Market – Fast iterations – Fast in-system, real-time debugging – Fast component availability • More Adaptability – During design and debug – In the field or in the application • Higher Performance – Match the architecture to the problem – Fast response to real-time events – Parallel operations • Increased Differentiation • Ever-improving Process Technology Toward a Configurable Processor

"System on a Configurable Chip" Processors

MCU ASIC Derivative Configurable FLASH Processor EPROM/

MCU Derivatives and Programmable Logic

RAM Discrete Solutions MCU Derivative EPROM/ RAM MCU TTL FLASH "Roll-Your-

CPLD/ Own" MCU in EPROM EPROM FPGA FPGA PAL Periph- TTL eral

Periph- Periph-

TTL eral eral Increasing Functionality, Density, and Performance Density, Increasing Functionality, 1980 1990 2000 Year What Is a Configurable Processor?

• Industry-standard Programmable I/O Programmable I/O

processor Dedicated RAM • Dedicated bus Processor • Programmable Dedicated Programmable Peripherals Logic Logic Bus System Soft User Programmable I/O Programmable – Soft peripherals I/O Programmable Hardware peripherals logic Breakpoint – User-defined functions Programmable I/O – Hardware acceleration • On-Chip Memory Configurable Processors Vendor/ Dedicated Programmable Embedded Bus Family Processor Status Resources Resources Structure 2-channel DMA Triscend Triscend/ 8032 8K-64K bytes RAM 8-bit Data Sampling Coarse-grained, E5 "Turbo" Hardware debug 32-bit Address bus oriented JTAG Triscend ARM In 32-bit Data Triscend Coarse-grained, 7TDMI Development 32-bit Address bus oriented 2-channel DMA Multiple busses Motorola/ 3K bytes RAM Motorola MPA ColdFire Cancelled Unknown CORE+ DRAM controller Fine-grained format Hardware debug 16K RAM National/ Compact- In 8x256 RAM Concurrent NAPA RISC Development Timer Fine-grained 1000 JTAG debugger Plans Gatefield Siemens TriCore Announced Fine-grained Plans Atmel AT40K Atmel ? Announced Coarse-grained None, SIDSA/ Programmable SIDSA 8031 Sampling? Memory- FIPSOC analog Coarse-grained mapped Motorola CORE+

Clock DRAM Controller JTAG Chip System Selects 8Kbyte Bus Interrupt Instruction Controller Cache Controller External 8Kbyte Bus SRAM Interface Shared Pins Debug Module FPGA I/O Motorola M H/W ColdFire A Divide CPU Core C

DMA Master Bus Fine- Controller Interface Grained 1Kbyte FPGA Parallel Embedded Array Port RAM

Timers Slave Bus DUART Interface External 2 3Kbyte I C Embedded Bus Controller RAM Interface National NAPA

ToggleBus Transceiver System Port National Reconfigurable Adaptive CompactRISC Pipeline Logic CPU Controller Processor

Bus Interface Pipeline (fine-grain Unit Memory Array FPGA

architecture) I/O Configurable External Memory Peripheral Scratchpad Interface Devices Memory Array Triscend E5 Configurable Processor

Memory Selector PIO Power Interface Control Unit Selector PIO Configurable PIO System Logic Clock and Selector (CSL) PIO Crystal 8032 Matrix Oscillator PIO "Turbo" Control Micro- controller PIO Power-On Reset CSI Socket Data Bus Address Mappers Byte-wide Bus Address Bus System Arbiter RAM Two- channel DMA Hardware Controller Breakpoint Unit

JTAG Interface

Configurable System Interconnect (CSI) bus Configurable Processor Applications

• Custom Peripheral Set – Practically any digital function – Matched specifically to the application – Derivative on demand • Hardware Acceleration – Algorithms in hardware – Handling odd-size math – Faster real-time response – Multiple operations in parallel – Bit manipulation Custom Peripherals (Hardware/Software Trade-Offs) • Software Solution (µs to ms) – Slow peripherals (serial ports, etc.) – Limited by CPU performance (Scenix, Teragen) – Easy to modify – Cheap, re-use existing silicon • Hardware Solution (ns to µs) – Standard derivative (no differentiation) – CPU + ASIC/FPGA (difficult to modify) – Configurable Processor (easy to modify) – Additional silicon/cost Example: SPI Interface

• Find a processor derivative that matches your requirements – It has SPI, but does it have everything else you need? – Availability? Software support? • Implement your peripheral in software (ex. Scenix) • Build your peripheral in an external ASIC or FPGA • Use a SPI soft peripheral in your configurable processor Design Techniques

Peripherals in Software Peripherals in Hardware • ‘C’ language • Schematic capture • Assembly • VHDL/Verilog entry • Instruction-set • Digital logic simulator simulator • Soft macros available • Function library

Software Hardware Hardware Acceleration Example

• Calculate the instantaneous average of four 8-bit values ( + + + DCBA ) Z = 4

• Issues – Concurrency (I/O, processing requirements) – Handling overflow (accumulator width) – Performance (processing time) Two Solutions

• Processor Solution • Logic Solution

Move PortA to Accumulator A Add PortB to Accumulator

B Add PortC to Accumulator ÷4 Z

Add PortD to Accumulator C

Shift Accumulator Right Twice (divide by four) D

Move Accumulator to RegZ More instances require More instances require additional time additional logic Comparing the Alternatives

Device Development Solution Issues When to Use It Cost Time/Cost Availability, software Processor Quick/ Lowest cost, if your $1 - $15 support, Derivative Low application fits differentiation Acquiring cores, System-on-a- $5 - $50 Long/ Volume, complexity, + development verification, NRE, Chip High performance justify it. cost vendor selection Fast Moderate/ Creating ‘soft’ If it fits and it’s fast $5 - $50 Processor Low peripherals enough, use it! Multi-chip solution, For applications where inter-chip CPU + Moderate/ a configurable $10 - $100 communication, ASIC/FPGA Moderate processor does not yet debugging support, exist. multiple CAE tools Fast time to market, Configurable Moderate/ $8 - $80 New technology complete embedded Processor Moderate system Configurable Processor Technical Challenges • Communication between the processor and programmable logic functions • Maintaining a standard development flow • Debugging a system with both processor and programmable logic Communication between the Processor and Programmable Logic • Distributing the data and address bus to the programmable logic • Decoding/controlling bus transactions • Multi-master bus support • Register intimacy • Debugging support One Solution: CSI Bus Socket (Configurable System Interconnect) • Distributes address and Side-band Signals

CSI Socket Interface data to CSL matrix – Full access to data and Data Write 8032 address bus "Turbo" Data Read – Dedicated address Address decoding – Predictable, synchronous timing ?Selectors 2-Channel Bus Clock – Forward compatible with DMA Controller DMA Request/ Acknowledge (CSL) Matrix future configurable

Wait-State processors Control

Configurable System Logic – Wait-state and Breakpoint Hardware Configurable Interconnect (CSI) Bus System Control breakpoint control Breakpoint Unit – Contention-free bussing Decoding Bus Transactions CSI Selector Style CSI Bus Address A2 A1 A0

An Fast address decoding • Any address range Match0 READ RDSEL Match1 • Access type Code

WRITE WRSEL Data

BCLK Exported Special Function Registers (SFR) • Three modes Selector Bus Clock Chip Select RdSel DMA Control Register Data Read [7:0] DATA • Up to 200 selectors in a single device Decode delay is constant (less than 5 ns after clock) Connecting the Two Development Worlds

Hardware Development Software Development • Design and “soft” • Compiler/assembler module libraries support • Passing register • Function libraries addresses to • Instruction-set compiler/assembler simulator • Vendor place and • System-wide in- route software system debugging • Device programming support support • System-wide in- system debugging support Preserving Existing Tool Flow

Header Configurable Processor Tools File MCU Development Tools 1 2 System Program Configuration Development

Soft Module Object Source Library File Code 4 3 Library System Test Program and Debug Debug

5 Device Designer’s Programming standard tool flow System on a Chip Flow

Configurable Processor MCU Development Tools Development Software 3rd Party EDA Tools Program System Development Configuration Capture or Synthesis Instruction Source Code Simulation Library System Test Netlist and Debug Functional In-System Simulation Debug Device Designer’s Programming standard tool flow Example System: FastChip Software “Soft” Module Library

Dedicated Resources

“Soft” peripherals dragged into CSL matrix

Resources Used Indicators Real-Time, In-System Debugging • Difficult in most ASIC or system-on-a-chip designs – Must rely on simulation before completing design • Most debuggers only support the processor – Monitor bus activity – Monitor processor registers – Break on event and single-step • Additional debugging desired for programmable logic functions – Monitor the state of logic and flip-flops in “soft” peripherals – Monitor or force a breakpoint from programmable logic Configurable Processor Debugging Capabilities

Access to all Commands from 3rd address mapped party debuggers 8032 RAM Test and and other key Control translated to JTAG processor resources instructions CSI Bus

All sequential and Configurable combinatorial logic System Logic Breakpoint unit nodes have snoops the internal complete bus, providing Triscend E5 observability complex runtime control features Summary

• Configurable Processors offer benefits in design: – Faster time to market than ASIC/SOC designs – Higher performance compared to most processors – Higher product differentiation • Configurable processors are a new class of single-chip programmable devices designed for embedded systems applications – Industry-standard processor – Dedicated, high-performance internal bus – Programmable logic, connected to internal bus – On-chip, high-density memory