Computer Architectures
Overview of SOC Architecture design
Tien-Fu Chen
National Chung Cheng Univ.
© by Tien-Fu Chen@CCU SOC - 0
SOC design Issues
SOC architecture
Reconfigurable
System-level Programmable processors Low-level reconfiguration On-chip bus
Embedded Software Issues
© by Tien-Fu Chen@CCU SOC - 1 Embedded Systems vs. General Purpose Computing - 1 Embedded System General purpose computing
Runs a few applications often Intended to run a fully general set knownatdesigntime of applications
Not end-user programmable End-user programmable
Operates in fixed run-time Faster is always better constraints, additional performance may not be useful/valuable
© by Tien-Fu Chen@CCU SOC - 2
Embedded Systems vs. General Purpose Computing - 2 Embedded System General purpose computing
Differentiating features:
power Differentiating features cost speed (need not be fully speed (must be predictable) predictable) speed did we mention speed? output analog CPU cost (largest component power) input analog Logic mem embedded computer
© by Tien-Fu Chen@CCU SOC - 3 Embedded System: Examples
Embedded System
© by Tien-Fu Chen@CCU SOC - 4
Design Complexity Increase
SoC Characteristics Current Next Design Design Average Gate Count 616K 1,000K Designs with Gate Count > 18% 31% 1M Units Shipped >5M 20% 37%
# of lines of DSP SW 54K 295K # of lines of uP SW 75K 266K Clock Speed >133mhz 44% 61% >400mhz 15% 22% >5 Clock Domains 25% 35% >9 Clock Domains 10% 12%
Source: Collett International Research- December 2000 Research on 360 IC/ASIC Design Teams in North America © by Tien-Fu Chen@CCU SOC - 5 Embedded System on Chip (SoC) Design
System Environment Zone 4: Global Satellite Zone 3: Suburban Zone 2: Urban Zone 1: In-Building
Pico-Cell Macro-Cell Micro-Cell Requirements Specification Specification Untimed, Unclocked, C/C++ Level Embedded Systems Design Testbench Software Implementation
Embedded Characterization Refinement Software Design Export P/C µ Analog
Memory SOC Implementation Timed, Clocked, Firmware RTL Level CORE
© by Tien-Fu Chen@CCU SOC - 6
Architectures
Supplements Models by specifying how the system will actually be implemented
Goal of each architecture is to describe
Number of components Type of each component Type of each connection among above components General classification
Application-specific architectures: DSP General-purpose architectures: CISC, RISC Parallel processors: VLIW, SIMD, MIMD
© by Tien-Fu Chen@CCU SOC - 7 System Architecture Design
System Architecture & Exploration What
Hardware/Software partitioning; processor, and memory architecture choices; system timing budget, power management strategy, system verification strategy… Partitioning into HW block hierarchy, cycle time budgeting, block interfaces, block verification, clock architecture and test strategy Fixed point architecture exploration and design How - Quickly assemble architecture(s) for exploration to measure system timing/performance. Need to accurately (enough) model the bottlenecks
© by Tien-Fu Chen@CCU SOC - 8
System Integration & Verification
System Design Environment for HW/SW Refinement, Verification and Integration What – Enables hierarchical (manual or automatic) refinement of individual blocks of design in context of system. Maintain system and hierarchical test benches – Verification of refined hardware/software with entire system design – Define next level of clock architecture (derived) and test strategy How - Build a system verification hierarchy that allows integration of HW blocks, system software (HAL), embedded application SW and eventually verifying the entire design at cycle accurate (or RTL) level
© by Tien-Fu Chen@CCU SOC - 9 CoDesign and Co-Synthesis Specification
Partition HW: HDL SW: Algorithm, (Behavioral, DataFlow, Textual/Graphical Structural), Schematic Co-Synthesis representation Synthesis
Executable or RTL, Gate level, Compilable code: Transistors, Layout Detailed Representation The program(s), of Implementation OS routines
© by Tien-Fu Chen@CCU SOC - 10
Traditional design
Traditional System Design Process Tasks
SW design SW test System design PCB test
ASIC design Fabrication Test
Time
© by Tien-Fu Chen@CCU SOC - 11 System-level Co-design
Co-Design Process Tasks
SW design SW test System design Shared Design PCB test
ASIC design Fabrication Test
Time System-Level Partitioning
© by Tien-Fu Chen@CCU SOC - 12
Configurabilty and Embedded Systems
Advantages of configuration:
Pay (in power, design time, area) only for what you use
Gain additional performance by adding features tailored to your application:
Particularly for embedded systems:
Principally in embedded controller microprocessor applications Some us in DSP
© by Tien-Fu Chen@CCU SOC - 13 What to Configure?
What parts of the microcontroller/microprocessor system to configure?
Easy answers:
Memory and Cache Sizes - get precisely the sizes your applications needs Register file sizes Interrupt handling and addresses Harder answers:
Peripherals Instructions But first we need more context
© by Tien-Fu Chen@CCU SOC - 14
Trickle Down Theory of Embedded Architectures
Mainframe/supercomputers Features tend to trickle down: • #bits: 4->8->16->32->64 High-end servers/workstations • ISA’s • Floating point support High-end personal computers • Dynamic scheduling • Caches Personal computers • I/O controllers/processors • LIW/VLIW Lap tops/palm tops • Superscalar Gadgets
...
© by Tien-Fu Chen@CCU SOC - 15 Configurability in ARM Processor
ARM allows for configurability via AMBA bus
Offers ``prime cell’’ peripherals which hook into AMBA Peripheral Bus (APB)
UART
Real Time Clock
Audio Codec Interface
Keyboard and mouse interface
General purpose I/O
Smart card interface
Generic IR interface
© by Tien-Fu Chen@CCU SOC - 16
ARM7 core
© by Tien-Fu Chen@CCU SOC - 17 ARM’s Amba open standard
Advanced System Bus, (ASB) - high performance, CPU, DMA, external
Advanced Peripheral Bus, (APB) - low speed, low power, parallel I/O, UART’s
External interface
http://www.arm.com/Documentation/Overviews/AMBA_Intro/#intro
© by Tien-Fu Chen@CCU SOC - 18
Ex: ARM Infrared (IR) Interface
© by Tien-Fu Chen@CCU SOC - 19 Ex : Audio Codec
© by Tien-Fu Chen@CCU SOC - 20
Another Kind of Configurability
HDL Synthesis of a processor core from an RTL description allows for: RTL Synthesis • full range of other types of configurability netlist Library • additional degrees of freedom in logic quality of implementation optimization Examples:
netlist • ARM7 • Motorola Coldfire physical design • Tensilica Xtensa
layout
© by Tien-Fu Chen@CCU SOC - 21 Issues in low-level Configurable Design
Choice and Granularity of Computational Elements Choice and Granularity of Interconnect Network (Re)configuration Time and Rate Fabrication time --> Fixed function devices Beginning of product use --> Actel/Quicklogic FPGAs Beginning of usage epoch --> (Re)configurable FPGAs Every cycle --> traditional Instruction Set Processors © by Tien-Fu Chen@CCU SOC - 22
The Choice of the Computational Elements
Reconfigurable Reconfigurable Reconfigurable Reconfigurable Logic Datapaths Arithmetic Control In
Data Program mux Memory Memory AddrGen AddrGen CLB CLB reg0 Inst ruc tion Memory Memory Decoder reg1 Datapath & Controller adder CLB CLB MAC buffer Data Memory
Bit-Level Operations Dedicated data paths Arithmetic kernels RTOS e.g. encoding e.g. Filters, AGU e.g. Convolution Process management
© by Tien-Fu Chen@CCU SOC - 23 Multi-granularity Reconfigurable Architecture: The Berkeley Pleiades Architecture
Configuration Bus Satellite Processor Configuration Arithmetic Arithmetic Arithmetic Processor Processor Processor Dedicated Arithmetic Communication Network
Network Interface Control Configurable Configurable Processor Datapath Logic
• Computational kernels are “spawned” to satellite processors • Control processor supports RTOS and reconfiguration • Order(s) of magnitude energy-reduction over traditional programmable architectures
© by Tien-Fu Chen@CCU SOC - 24
Matching Computation and Architecture
AddressGen AddressGen
Memory Memory Convolution
MAC MAC
LCG Control Processor
Two models of computation: Two architectural models: communicating processes + data-flow sequential control+ data-driven
© by Tien-Fu Chen@CCU SOC - 25 Execution Model of a Data-Flow Kernel
Embedded processor
Code seg start end
AddrGen for(i=1;i<=L;i++) for(k=i;k<=L;k++) MEM: in phi[i][k]= phi[i-1][k-1] AddrGen MPY MPY +in[NP-i]*in[NP-k] MEM: phi -in[NA-1-i]*in[NA-1-k]; ALU
Code seg ALU
• Distributed control and memory © by Tien-Fu Chen@CCU SOC - 26
Software Methodology Flow
Algorithms C++
µproc & Kernel Detection Accelerator PDA Models
Behavioral Estimation/Exploration SUIF+ C-IF Power & Timing Estimation of Various Kernel Implementations Premapped Kernels Partitioning C++ Module Libraries Software Compilation Reconfig. Hardware Mapping Interface Code Generation
© by Tien-Fu Chen@CCU SOC - 27 The System-on-a-Chip Nightmare
System Bus
DMA CPU DSP
Mem Ctrl. Bridge
MPEG The “Board-on-a-Chip” Approach
I O O C
Custom Interfaces
Control Wires Peripheral Bus © by Tien-Fu Chen@CCU SOC - 28
Sonics SOC Integration Architecture
Open Core DMA CPU DSP MPEG Protocol™
MultiChip SiliconBackplane™ Backplane™ (patented) {
C MEM I O SiliconBackplane Agent™
© by Tien-Fu Chen@CCU SOC - 29 Master vs. Slave
IP Core IP Core IP Core
Master Master Slave Slave
Open Core Response Protocol Request Initiator Target Slave Slave Master Master
On-Chip Bus
© by Tien-Fu Chen@CCU SOC - 30
The Backplane: Why Not Use a Computer Bus?
Transmit FIFO Receive FIFO IP IP Core Arbiter Address Core
Computer Bus Data
IP IP Core Core Time
• Expensive to decouple • Not designed for real-time
© by Tien-Fu Chen@CCU SOC - 31 Communication Buses Decouple and Guarantee Real Time
Transmit FIFO Receive FIFO IP IP Core TDMA TDMA Core
Communications Bus Data
IP IP Core Core Time
• Connections are expensive • Poor read latency
© by Tien-Fu Chen@CCU SOC - 32
On-Chip Bus for SOC
Example on- chip bus interconnects
ARM’s AMBA bus IBM’s Core Connect Virtual Socket Interface Alliance group Open Connect Protocol group Example processor cores
ARM MIPS PowerPC
© by Tien-Fu Chen@CCU SOC - 33 Design Example: The Radio-on-a-Chip
DSP and control
Reconfigurable Embedded uP intensive State Machines +DSPs Mixed-mode Combines programmable, flexible, FPGA and application-specific
Dedicated Reconfigurable modules DSP DataPath Cost and energy are the key metrics
© by Tien-Fu Chen@CCU SOC - 34
System Level Design Science
Design Methodology:
Top Down Aspect: Orthogonalization of Concerns: – Separate Implementation from Conceptual Aspects – Separate computation from communication Formalization: precise unambiguous semantics Abstraction: capture the desired system details (do not overspecify) Decomposition: partitioning the system behavior into simpler behaviors Successive Refinements: refine the abstraction level down to the implementation by filling in details and passing constraints Bottom Up Aspect: IP Re-use (even at the algorithmic and functional level) Components of architecture from pre-existing library
© by Tien-Fu Chen@CCU SOC - 35 Separate Behavior from Micro-architecture
System Behavior Implementation Architecture
Functional Specification Hardware and Software of System. Optimized Computer No notion of hardware or software! Mem 13 External DSP User/Sys Processor Rate Control I/O Buffer 3 Sensor 12 Synch Control MPEG DSP RAM 4
Rate Frame Video Video Front Transport Buffer Buffer Front Decode 6 Output 8 Peripheral Control End 1 Decode 2 5 7 Processor Bus Processor Bus Processor Rate Audio Buffer Audio System 9 Decode/ Decode Output 10 RAM
Mem 11 © by Tien-Fu Chen@CCU SOC - 36
Map Between Behavior from Architecture
Transport Decode Implemented as Software Task Running on Microcontroller
Mem 13 Communication User/Sys DSP Rate Control External Buffer 3 Sensor Over Bus Processor 12 I/O Synch Control 4 MPEG DSP RAM
Rate Frame Video Video Front Transport Buffer Buffer Front Decode 6 Output 8 End 1 Decode 2 5 7 Peripheral Control Processor Bus Rate Processor Bus Processor Buffer Audio 9 Decode/ Audio System Output 10 Decode RAM Mem 11
Audio Decode Behavior Implemented on Dedicated Hardware
© by Tien-Fu Chen@CCU SOC - 37 Embedded Software Crisis
Cheaper, more powerful J. Fiddler - WRS Microprocessors
Increasing Embedded More Time-to-market Software Applications pressure Crisis
J. Fiddler - WRS Bigger, More Complex Applications
© by Tien-Fu Chen@CCU SOC - 38
SW: Embedded Software Tools
application compiler U source Application S code software E R a.out RTOS
A S ROM C I C simulator P debugger A U S I RAM C
© by Tien-Fu Chen@CCU SOC - 39 Hardware Platforms Not Enough!
Hardware platform has to be abstracted
Interface to the application software is the “API”
Software layer performs abstraction:
Programmable cores and memory subsystem “hidden” by RTOS and compilers I/O subsystem with Device Drivers Network with Network Communication Software
© by Tien-Fu Chen@CCU SOC - 40
Software Platforms
Platform API Application Software
Software Software Platform Hardware Platform Hardware Input devices Output Devices I O network
API
C
o RTOS
m
Device Drivers Network
p
i
l
e Communication r BIOS
© by Tien-Fu Chen@CCU SOC - 41 Platform-based methodology
Platform based design: Application mapped on architecture Performance evaluation and iterative refinement Challenges: complete system simulation complexity management composability and reuse Key elements for composability Identification and use of useful models of computation FSMD, DE, DF, CSP, ... A flexible, extensible language platform to capture the functionality. Composability can be achieved using Object-oriented mechanisms:
© by Tien-Fu Chen@CCU SOC - 42
Final goal for SOC: Platform-Based Design Taking Design Block Reuse to the Next Level
Pre-Qualified/Verified Foundation Block + Reference Design Foundation-IP* Scaleable bus, test, power, IO, MEM clock, timing architectures Application Hardware IP Space CPU Processor(s), RTOS(es) and FPGA SW architecture SW IP
Programmable Methodology / Flows: System-level performance evaluation environment Rapid Prototype for *IP can be hardware (digital End-Customer Evaluation or analogue) or software. Foundry-Specific SoC Derivative Design IP can be hard, soft or Pre-Qualification Methodologies ‘firm’ (HW), source or Foundry Targetting Flow object (SW)
© by Tien-Fu Chen@CCU SOC - 43 Summary of SOC Design
System Verification Specification
Co-Synthesis Partitioning
Verification HW Parameter SW Parameter Estimation Estimation
HW SW Synthesis Synthesis Verification ASIC OS EXE Code Verification
System Integration Final Verification
© by Tien-Fu Chen@CCU SOC - 44