<<

Architectures

Overview of SOC Architecture design

Tien-Fu Chen

National Chung Cheng Univ.

© by Tien-Fu Chen@CCU SOC - 0

SOC design Issues

 SOC architecture

 Reconfigurable

 System-level  Programmable processors  Low-level reconfiguration  On-

 Embedded Issues

© by Tien-Fu Chen@CCU SOC - 1 Embedded Systems vs. General Purpose - 1  General purpose computing

 Runs a few applications often Intended to run a fully general set knownatdesigntime of applications

 Not end-user programmable  End-user programmable

 Operates in fixed run-time  Faster is always better constraints, additional performance may not be useful/valuable

© by Tien-Fu Chen@CCU SOC - 2

Embedded Systems vs. General Purpose Computing - 2 Embedded System General purpose computing

Differentiating features:

 Differentiating features  cost  speed (need not be fully  speed (must be predictable) predictable)  speed  did we mention speed? output analog CPU  cost (largest component power) input analog Logic mem embedded computer

© by Tien-Fu Chen@CCU SOC - 3 Embedded System: Examples

Embedded System

© by Tien-Fu Chen@CCU SOC - 4

Design Complexity Increase

SoC Characteristics Current Next Design Design Average Gate Count 616K 1,000K Designs with Gate Count > 18% 31% 1M Units Shipped >5M 20% 37%

# of lines of DSP SW 54K 295K # of lines of uP SW 75K 266K Clock Speed >133mhz 44% 61% >400mhz 15% 22% >5 Clock Domains 25% 35% >9 Clock Domains 10% 12%

Source: Collett International Research- December 2000 Research on 360 IC/ASIC Design Teams in North America © by Tien-Fu Chen@CCU SOC - 5 Embedded System on Chip (SoC) Design

System Environment Zone 4: Global Satellite Zone 3: Suburban Zone 2: Urban Zone 1: In-Building

Pico-Cell Macro-Cell Micro-Cell Requirements Specification Specification Untimed, Unclocked, C/C++ Level Embedded Testbench Software Implementation

Embedded Characterization Refinement Export P/C µ Analog

Memory SOC Implementation Timed, Clocked, RTL Level CORE

© by Tien-Fu Chen@CCU SOC - 6

Architectures

 Supplements Models by specifying how the system will actually be implemented

 Goal of each architecture is to describe

 Number of components  Type of each component  Type of each connection among above components  General classification

 Application-specific architectures: DSP  General-purpose architectures: CISC, RISC  Parallel processors: VLIW, SIMD, MIMD

© by Tien-Fu Chen@CCU SOC - 7 System Architecture Design

 System Architecture & Exploration  What

 Hardware/Software partitioning; , and memory architecture choices; system timing budget, strategy, system verification strategy…  Partitioning into HW block hierarchy, cycle time budgeting, block interfaces, block verification, clock architecture and test strategy  Fixed point architecture exploration and design  How - Quickly assemble architecture(s) for exploration to measure system timing/performance. Need to accurately (enough) model the bottlenecks

© by Tien-Fu Chen@CCU SOC - 8

System Integration & Verification

 System Design Environment for HW/SW Refinement, Verification and Integration  What – Enables hierarchical (manual or automatic) refinement of individual blocks of design in context of system. Maintain system and hierarchical test benches – Verification of refined hardware/software with entire system design – Define next level of clock architecture (derived) and test strategy  How - Build a system verification hierarchy that allows integration of HW blocks, system software (HAL), embedded application SW and eventually verifying the entire design at cycle accurate (or RTL) level

© by Tien-Fu Chen@CCU SOC - 9 CoDesign and Co-Synthesis Specification

Partition HW: HDL SW: , (Behavioral, DataFlow, Textual/Graphical Structural), Schematic Co-Synthesis representation Synthesis

Executable or RTL, Gate level, Compilable code: Transistors, Layout Detailed Representation The program(s), of Implementation OS routines

© by Tien-Fu Chen@CCU SOC - 10

Traditional design

Traditional System Design Tasks

SW design SW test System design PCB test

ASIC design Fabrication Test

Time

© by Tien-Fu Chen@CCU SOC - 11 System-level Co-design

Co-Design Process Tasks

SW design SW test System design Shared Design PCB test

ASIC design Fabrication Test

Time System-Level Partitioning

© by Tien-Fu Chen@CCU SOC - 12

Configurabilty and Embedded Systems

Advantages of configuration:

Pay (in power, design time, area) only for what you use

Gain additional performance by adding features tailored to your application:

Particularly for embedded systems:

 Principally in embedded controller applications  Some us in DSP

© by Tien-Fu Chen@CCU SOC - 13 What to Configure?

What parts of the /microprocessor system to configure?

Easy answers:

 Memory and Sizes - get precisely the sizes your applications needs  sizes  Interrupt handling and addresses Harder answers:

  Instructions But first we need more context

© by Tien-Fu Chen@CCU SOC - 14

Trickle Down Theory of Embedded Architectures

 Mainframe/ Features tend to trickle down: • #bits: 4->8->16->32->64  High-end servers/workstations • ISA’s • Floating point support  High-end personal • Dynamic • Caches  Personal computers • I/O controllers/processors • LIW/VLIW  Lap tops/ tops • Superscalar  Gadgets



 ...

© by Tien-Fu Chen@CCU SOC - 15 Configurability in ARM Processor

 ARM allows for configurability via AMBA bus

 Offers ``prime cell’’ peripherals which hook into AMBA Bus (APB)

 UART

 Real Time Clock

 Audio Codec Interface

 Keyboard and mouse interface

 General purpose I/O

 Smart card interface

 Generic IR interface

© by Tien-Fu Chen@CCU SOC - 16

ARM7 core

© by Tien-Fu Chen@CCU SOC - 17 ARM’s Amba open standard

 Advanced , (ASB) - high performance, CPU, DMA, external

 Advanced Peripheral Bus, (APB) - low speed, low power, parallel I/O, UART’s

 External interface

http://www.arm.com/Documentation/Overviews/AMBA_Intro/#intro

© by Tien-Fu Chen@CCU SOC - 18

Ex: ARM Infrared (IR) Interface

© by Tien-Fu Chen@CCU SOC - 19 Ex : Audio Codec

© by Tien-Fu Chen@CCU SOC - 20

Another Kind of Configurability

HDL Synthesis of a processor core from an RTL description allows for: RTL Synthesis • full range of other types of configurability • additional degrees of freedom in logic quality of implementation optimization Examples:

netlist • ARM7 • Motorola Coldfire physical design • Xtensa

layout

© by Tien-Fu Chen@CCU SOC - 21 Issues in low-level Configurable Design

Choice and Granularity of Computational Elements Choice and Granularity of Interconnect Network (Re)configuration Time and Rate  Fabrication time --> Fixed function devices  Beginning of product use --> /Quicklogic FPGAs  Beginning of usage epoch --> (Re)configurable FPGAs  Every cycle --> traditional Instruction Set Processors © by Tien-Fu Chen@CCU SOC - 22

The Choice of the Computational Elements

Reconfigurable Reconfigurable Reconfigurable Reconfigurable Logic Arithmetic Control In

Data Program mux Memory Memory AddrGen AddrGen CLB CLB reg0 Inst ruc tion Memory Memory Decoder reg1 & Controller CLB CLB MAC buffer Memory

Bit-Level Operations Dedicated data paths Arithmetic kernels RTOS e.g. encoding e.g. Filters, AGU e.g. Process management

© by Tien-Fu Chen@CCU SOC - 23 Multi-granularity Reconfigurable Architecture: The Berkeley Pleiades Architecture

Configuration Bus Satellite Processor Configuration Arithmetic Arithmetic Arithmetic Processor Processor Processor Dedicated Arithmetic Communication Network

Network Interface Control Configurable Configurable Processor Datapath Logic

• Computational kernels are “spawned” to satellite processors • Control processor supports RTOS and reconfiguration • Order(s) of magnitude energy-reduction over traditional programmable architectures

© by Tien-Fu Chen@CCU SOC - 24

Matching Computation and Architecture

AddressGen AddressGen

Memory Memory Convolution

MAC MAC

LCG Control Processor

Two models of computation: Two architectural models: communicating processes + data-flow sequential control+ data-driven

© by Tien-Fu Chen@CCU SOC - 25 Execution Model of a Data-Flow Kernel

Embedded processor

Code seg start end

AddrGen for(i=1;i<=L;i++) for(k=i;k<=L;k++) MEM: in phi[i][k]= phi[i-1][k-1] AddrGen MPY MPY +in[NP-i]*in[NP-k] MEM: phi -in[NA-1-i]*in[NA-1-k]; ALU

Code seg ALU

• Distributed control and memory © by Tien-Fu Chen@CCU SOC - 26

Software Methodology Flow

Algorithms C++

µproc & Kernel Detection Accelerator PDA Models

Behavioral Estimation/Exploration SUIF+ C-IF Power & Timing Estimation of Various Kernel Implementations Premapped Kernels Partitioning C++ Module Libraries Software Compilation Reconfig. Hardware Mapping Interface Code Generation

© by Tien-Fu Chen@CCU SOC - 27 The System-on-a-Chip Nightmare

System Bus

DMA CPU DSP

Mem Ctrl. Bridge

MPEG The “Board-on-a-Chip” Approach

I O O C

Custom Interfaces

Control Wires Peripheral Bus © by Tien-Fu Chen@CCU SOC - 28

Sonics SOC Integration Architecture

Open Core DMA CPU DSP MPEG Protocol™

MultiChip SiliconBackplane™ Backplane™ (patented) {

C MEM I O SiliconBackplane Agent™

© by Tien-Fu Chen@CCU SOC - 29 Master vs. Slave

IP Core IP Core IP Core

Master Master Slave Slave

Open Core Response Protocol Request Initiator Target Slave Slave Master Master

On-Chip Bus

© by Tien-Fu Chen@CCU SOC - 30

The Backplane: Why Not Use a Computer Bus?

Transmit FIFO Receive FIFO IP IP Core Arbiter Address Core

 Computer  Bus Data

IP IP Core Core Time

• Expensive to decouple • Not designed for real-time

© by Tien-Fu Chen@CCU SOC - 31 Communication Buses Decouple and Guarantee Real Time

Transmit FIFO Receive FIFO IP IP Core TDMA TDMA Core

 Communications  Bus Data

IP IP Core Core Time

• Connections are expensive • Poor read

© by Tien-Fu Chen@CCU SOC - 32

On-Chip Bus for SOC

Example on- chip bus interconnects

 ARM’s AMBA bus  IBM’s Core Connect  Virtual Socket Interface Alliance group  Open Connect Protocol group Example processor cores

 ARM  MIPS  PowerPC

© by Tien-Fu Chen@CCU SOC - 33 Design Example: The Radio-on-a-Chip

DSP and control

Reconfigurable Embedded uP intensive State Machines +DSPs Mixed-mode Combines programmable, flexible, FPGA and application-specific

Dedicated Reconfigurable modules DSP DataPath Cost and energy are the key metrics

© by Tien-Fu Chen@CCU SOC - 34

System Level Design Science

Design Methodology:

 Top Down Aspect:  Orthogonalization of Concerns: – Separate Implementation from Conceptual Aspects – Separate computation from communication  Formalization: precise unambiguous semantics  Abstraction: capture the desired system details (do not overspecify)  Decomposition: partitioning the system behavior into simpler behaviors  Successive Refinements: refine the abstraction level down to the implementation by filling in details and passing constraints  Bottom Up Aspect:  IP Re-use (even at the algorithmic and functional level)  Components of architecture from pre-existing library

© by Tien-Fu Chen@CCU SOC - 35 Separate Behavior from Micro-architecture

System Behavior Implementation Architecture

 Functional Specification  Hardware and Software of System.  Optimized Computer  No notion of hardware or software! Mem 13 External DSP User/Sys Processor Rate Control I/O Buffer 3 12 Synch Control MPEG DSP RAM 4

Rate Frame Video Video Front Transport Buffer Buffer Front Decode 6 Output 8 Peripheral Control End 1 Decode 2 5 7 Processor Bus Processor Bus Processor Rate Audio Buffer Audio System 9 Decode/ Decode Output 10 RAM

Mem 11 © by Tien-Fu Chen@CCU SOC - 36

Map Between Behavior from Architecture

Transport Decode Implemented as Software Running on Microcontroller

Mem 13 Communication User/Sys DSP Rate Control External Buffer 3 Sensor Over Bus Processor 12 I/O Synch Control 4 MPEG DSP RAM

Rate Frame Video Video Front Transport Buffer Buffer Front Decode 6 Output 8 End 1 Decode 2 5 7 Peripheral Control Processor Bus Rate Processor Bus Processor Buffer Audio 9 Decode/ Audio System Output 10 Decode RAM Mem 11

Audio Decode Behavior Implemented on Dedicated Hardware

© by Tien-Fu Chen@CCU SOC - 37 Crisis

Cheaper, more powerful J. Fiddler - WRS

Increasing Embedded More Time-to-market Software Applications pressure Crisis

J. Fiddler - WRS Bigger, More Complex Applications

© by Tien-Fu Chen@CCU SOC - 38

SW: Embedded Software Tools

application U source Application S code software E R a.out RTOS

A S ROM C I C simulator P debugger A U S I RAM C

© by Tien-Fu Chen@CCU SOC - 39 Hardware Platforms Not Enough!

Hardware platform has to be abstracted

Interface to the application software is the “API”

Software layer performs abstraction:

 Programmable cores and memory subsystem “hidden” by RTOS and  I/O subsystem with Device Drivers  Network with Network Communication Software

© by Tien-Fu Chen@CCU SOC - 40

Software Platforms

Platform API Application Software

Software Software Platform Hardware Platform Hardware Input devices Output Devices I O network

API

C

o RTOS

m

Device Drivers Network

p

i

l

e Communication r BIOS

© by Tien-Fu Chen@CCU SOC - 41 Platform-based methodology

 Platform based design:  Application mapped on architecture  Performance evaluation and iterative refinement  Challenges:  complete system simulation  complexity management  composability and reuse  Key elements for composability  Identification and use of useful models of computation  FSMD, DE, DF, CSP, ...  A flexible, extensible language platform to capture the functionality.  Composability can be achieved using Object-oriented mechanisms:

© by Tien-Fu Chen@CCU SOC - 42

Final goal for SOC: Platform-Based Design Taking Design Block Reuse to the Next Level

Pre-Qualified/Verified Foundation Block + Reference Design Foundation-IP* Scaleable bus, test, power, IO, MEM clock, timing architectures Application Hardware IP Space CPU Processor(s), RTOS(es) and FPGA SW architecture SW IP

Programmable Methodology / Flows: System-level performance evaluation environment Rapid Prototype for *IP can be hardware (digital End-Customer Evaluation or analogue) or software. Foundry-Specific SoC Derivative Design IP can be hard, soft or Pre-Qualification Methodologies ‘firm’ (HW), source or Foundry Targetting Flow object (SW)

© by Tien-Fu Chen@CCU SOC - 43 Summary of SOC Design

System Verification Specification

Co-Synthesis Partitioning

Verification HW Parameter SW Parameter Estimation Estimation

HW SW Synthesis Synthesis Verification ASIC OS EXE Code Verification

System Integration Final Verification

© by Tien-Fu Chen@CCU SOC - 44