Lecture 26a: Software Environments for Embedded Systems

Prepared by: Professor Kurt Keutzer Computer Science 252, Spring 2000 With contributions from: Jerry Fiddler, Wind River Systems, Minxi Gao, Xiaoling Xu, UC Berkeley Shiaoje Wang, Princeton

1 Kurt Keutzer SW: Embedded Software Tools

application compiler U source Application S code software E R a.out RTOS

A S ROM C I C simulator P debugger A U S I RAM C

2 Kurt Keutzer Another View of Microprocessor Architecture

Let’s look at current architectural evolution from the standpoint of the software developers …, in particular Jerry Fiddler

3 Kurt Keutzer Fiddler’s Predictions for the Next Ten Years (2010)

End of the “Age of the PC” Lots of Exciting Applications Development Will Continue To Be Hard

G Even as we and our competitors continue to make incredible efforts Chips - No predictions MEMS / Nano-technology & Sensors Will Impact Us

J. Fiddler - WRS

4 Kurt Keutzer Fundamental Principles

Computers are, and will be, everywhere The world itself is becoming more intelligent Our infrastructure will have major software content Most of our access to information will be through embedded systems Economics will inexorably drive deployment of embedded systems The Internet is one important factor in this trend Reliability is a critical issue EVERY tech and mfg. business will need to become good at embedded software

J. Fiddler - WRS

5 Kurt Keutzer What Will Be Embedded in Ten Years?

Everything That is Now Electro-Mechanical Machines (Nano-Machines) Analog Signals Anything that communicates Lots of stuff in our cars Our Bodies

G Today - Pacemakers

G Soon - De-Fibrillators, Insulin Dispensers

G We can all be the $6M Person, for a lot cheaper All sorts of interfaces

G Speech, DNI, etc.

J. Fiddler - WRS

6 Kurt Keutzer Embedded Microprocessor Evolution

> 500k transistors 2+M transistors 5+M transistors 22+M transistors 1 - 0.8 µ 0.8 - 0.5 µ 0.5 - 0.35 µ 0.25 - 0.18 µ 33 mHz 75 - 100 mHz 133 - 167 mHz 500 - 600 mHz

1989 1993 1995 1999

Embedded CPU cores are getting smaller; ~ 2mm2 for up to 400 mHz

G Less than 5% of CPU size Higher Performance by:

G Faster clock, deeper pipelines, branch prediction, ... Trend is towards higher integration of processors with:

G Devices that were on the board now on chip: “system on a chip”

G Adding more compute power by add-on DSPs, ...

G Much larger L1 / L2 caches on silicon J. Fiddler - WRS

7 Kurt Keutzer Microprocessor Chaos

ST 20 J. Fiddler - WRS M32 R/D StrongARM ARM SH-DSP SH 4 MCORE 680x0 680x0 CPU32 CPU32 PowerPC PowerPC 80x86 80x86 MIPS 3k/4k/5k MIPS 3k/4k/5k SPARC SPARC SH 1/2/3 SH 1/2/3 29k 29k 29k 680x0 RAD 6k RAD 6k CPU32 Siemens C16x Siemens C16x 80x86 NEC V8xx NEC V8xx SPARC PARISC PARISC MIPS R3k i960 i960 68000 i960 563xx 563xx

1980 1990 1996 1998 8 Kurt Keutzer A Challenging Environment

Expanding Functional Demands J. Fiddler - WRS Of Embedded Applications

And keep it small, stupid!

Numerous Microprocessor Architectures Derivative Processors Application-Specific CPUs Systems On A Chip 9 Kurt Keutzer New Hardware Challenges Software Development J. Fiddler - WRS

More & More Architectures

G User-Customizable µprocessors More Power Demands More Software Functionality

G Software is not following Moore’s law (yet) System-on-a-chip DSP

10 Kurt Keutzer Embedded Software Crisis

Cheaper, more powerful J. Fiddler - WRS Microprocessors

Increasing EmbeddedEmbedded More Time-to-market SoftwareSoftware Applications pressure CrisisCrisis

J. Fiddler - WRS Bigger, More Complex Applications 11 Kurt Keutzer SW: Embedded Software Tools

application compiler U source Application S code software E R a.out RTOS

A S ROM C I C simulator P debugger A U S I RAM C

12 Kurt Keutzer Outline on RTOS

Introduction VxWorks

G General description G System G Supported processors

G Details G Kernel G Custom hardware support G Closely coupled multiprocessor support G Loosely coupled multiprocessor support pSOS eCos Conclusion

13 Kurt Keutzer Embedded Development: Generation 0

Development: Sneaker-net Attributes:

G No OS

G Painful!

G Simple software only

14 Kurt Keutzer Embedded Development: Generation 1

Hardware: SBC, minicomputer Development: Native Attributes:

G Full-function OS

G Non-Scalable

G Non-Portable

G Turnkey

G Very primitive

15 Kurt Keutzer Embedded Development: Generation 2

Hardware: Embedded Development: Cross, serial line Attributes

G Kernel

G Originally no file sys, I/O, etc.

G No development environment

G No network

G Non-portable, in assembly

16 Kurt Keutzer Embedded Development: Generation 3

Hardware: SBC, embedded Development: Cross, Ethernet

G Integrated, text-based, Unix Attributes

G Scalable, portable OS

G Includes network, file & I/O sys, etc.

G Tools on target

G Network required

G Heavy target required for development

G Closed development environment

17 Kurt Keutzer Embedded Development: Generation 4

Hardware: Embedded, SBC Development: Cross

G Any tool - Any connection - Any target

G Integrated GUI, Unix & PC Attributes

G Tools on host

G No target resources required

G Far More Powerful Tools (WindView, CodeTest, …)

G Open dev. environment, published API

G Internet is part of dev. environment

G Support, updates, manuals, etc.

18 Kurt Keutzer Embedded Development: Generation 5???

Super-scalable Communications-centric Virtual application platform

G Java? Multi-media Way-cool development environment

G Much easier to create, debug & re-use code

G Easy for non-programmers to contribute

19 Kurt Keutzer The RTOS Evolution

Application Browser / GUI Java Application Advanced Interconnect X Windows Advanced Networking WindNet Distributed Objects Memory Management Fault Tolerance 90%* Application 75%* Multiprocessing File System File System File System Application Networking 30%* Networking Networking Kernel 10%* Kernel Kernel Kernel

1980 1990 1996 1998

*Percent of total software supplied by RTOS vendor in a typical embedded device

20 Kurt Keutzer Introduction to RTOS

Wind River Systems Inc. VxWorks http://www.wrs.com

Integrated Systems Inc. pSOS http://www.isi.com

Cygnus Inc. => RedHat eCos http://www.cygnus.com => www.redhat.com

21 Kurt Keutzer VxWorks

Real-Time Embedded Applications

Graphics Multiprocessing support Internet support

Java support POSIX Library File system

WindNet Networking

Core OS

Wind Microkernel

VxWorks 5.4 Scalable Run-Time System

22 VxWorks Supported Processors

PowerPC SPARC 68K, CPU 32 NEC V8xx ColdFire MCORE M32 R/D 80x86 and Pentium RAD6000 i960 ST 20 ARM and Strong ARM TriCore MIPS SH

23 VxWorks Wind microkernel

Task management

G multitasking, unlimited number of tasks

G preemptive scheduling and round-robin scheduling(static scheduling)

G fast, deterministic context switch

G 256 priority levels

24 VxWorks Wind microkernel

Fast, flexible inter-task communication

G binary, counting and mutual exclusion semaphores with priority inheritance

G message queue

G POSIX pipes, counting semaphores, message queues, signals and scheduling

G control sockets

G shared memory

25 VxWorks Wind microkernel

High scalability Incremental linking and loading of components Fast, efficient and exception handling Optimized floating-point support Dynamic memory management System clock and timing facilities

26 VxWorks ``Board Support Package’’

BSP = Initializing code for hardware device + device driver for peripherals BSP Developer’s Kit

Hardware Processor independent dependent Device dependent code code code

BSP

27 VxWorks VxMP

A closely coupled multiprocessor support accessory for VxWorks. Capabilities:

G Support up to 20 CPUs

G Binary and counting semaphores

G FIFO message queues

G Shared memory pools and partitions

G VxMP data structure is located in a shared memory area accessible to all CPUs

G Name service (translate symbol name to object ID)

G User-configurable shared memory pool size

G Support heterogeneous mix of CPU

28 VxWorks VxMP

Hardware requirements:

G Shared memory

G Individual hardware read-write-modify mechanism across the shared memory

G CPU interrupt capability for best performance

G Supported architectures:

G 680x0 and 683xx

G SPARC

G SPARClite

G PPC6xx

G MIPS

G i960

29 VxWorks VxFusion

VxWorks accessory for loosely coupled configurations and standard IP networking; An extension of VxWorks message queue, distributed message queue. Features:

G Media independent design; App1 App2 G Group multicast/unicast messaging;

G Fault tolerant, locale-transparent operations; VxFusion

G Heterogeneous environment. Adapter Layer Supported targets:

G Motorola: 68K, CPU32, PowerPC Transport G Intel x86, Pentium, Pentium Pro

30 VxWorks pSOS

Loader Debug C/C++ File System

Memory POSIX I/O system BSPs Management Library

pSOS+ Kernel

pSOS 2.5

31 pSOS Supported processors

PowerPC M32/R 68K m.core ColdFire NEC v8xx MIPS ST20 ARM and Strong ARM SPARClite X86 and Pentium i960 SH

32 pSOS pSOS+ kernel

Small Real Time multi-tasking kernel; Preemptive scheduling; Support memory region for different tasks; Mutex semaphores and condition variables (priority ceiling) No interrupt handling is included

33 pSOS Board Support Package

BSP = skeleton device driver code + code for low- level system functions each particular devices requires

34 pSOS pSOS+m kernel

Tightly coupled or distributed processors; pSOS API + communication and coordination functions; Fully heterogeneous; Connection can be any one of shared memory, serial or parallel links, Ethernet implementations; Dynamic create/modify/delete OS object; Completely device independent

35 pSOS eCos

ISO C Library Native Kernel C API µITRON 3.0 API

Internal Kernel API

Kernel pluggable schedulers, mem alloc, synchronization, timers, ,

Device Drivers threads

HAL

36 eCos Supported processors

Advanced RISC Machines ARM7 Fujitsu SPARClite Matsushita MN10300 Motorola PowerPC Toshiba TX39 Hitachi SH3 NEC VR4300 MB8683x series Intel strong ARM

37 eCos Kernel

No definition of task, support multi-thread Interrupt and exception handling Preemptive scheduling: time-slice scheduler, multi-level queue scheduler, bitmap scheduler and priority inheritance scheduling Counters and clocks Mutex, semaphores, condition variable, message box

38 eCos Hardware Abstraction Layer

Architecture HAL abstracts basic CPU, including:

G interrupt delivery

G context switching

G CPU startup and etc. Platform HAL abstracts current platform, including

G platform startup

G timer devices

G I/O register access

G interrupt control Implementation HAL abstracts properties that lie between the above,

G architecture variants

G on-chip devices The boundaries among them blurs.

39 eCos Summary on RTOS

VxWorks pSOS eCos Task YYOnly Thread Scheduler Preemptive, static Preemptive Preemptive Synchronization mechanism No condition variable Y Y POSIX support YYLinux Scalable YYY Custom hw support BSP BSP HAL, I/O package Kernel size - 16KB - Multiprocessor support VxMP/ VxFusion PSOS+m None (accessories) kernel

40 Kurt Keutzer Recall the ``Board Support Package’’

BSP = Initializing code for hardware device + device driver for peripherals BSP Developer’s Kit

Hardware Processor independent dependent Device dependent code code code

BSP

41 VxWorks Introduction to Device Drivers

What are device drivers?

G Make the attached device work.

G Insulate the complexities involved in I/O handling.

Application

RTOS

Device driver

Hardware 42 Kurt Keutzer Proliferation of Interfaces

New Connections

G USB

G 1394

G IrDA

G Wireless New Models

G JetSend

G Jini

G HTTP / HTML / XML / ???

G Distributed Objects (DCOM, CORBA)

43 Kurt Keutzer Leads to Proliferation of Device Drivers

Courtesy - Synopsys 44 Kurt Keutzer Device Driver Characterization

Device Drivers’ Functionalities

G initialization

G data access

G data assignment

G interrupt handling

45 Kurt Keutzer Device Characterization

Block devices

G fixed data block sizes devices Character devices

G byte-stream devices Network device

G manage local area network and wide area network interconnections

46 Kurt Keutzer I/O Processing Characteristics

Initialization

G make itself known to the kernel

G initialize the interrupt handling

G optional: allocate the temporary memory for device driver

G initialize the hardware device Front-End Processing

G initiation of an I/O request Back-End Processing

G handles the completion of I/O operations

47 Kurt Keutzer Commercial Resources

Aisys DriveWay 3DE

G Motorola MPC860, MC68360, MC68302, AMD E86, Philips XA, 8C651, PIC 16/17 Stenkil MakeApp

G Hitachi H8, SH1, SH3, SH7x, HCAN Intel’s ApBuilder Motorola MCUnit GO DSP Code Composer

G TI DSPs CoWare

48 Kurt Keutzer Aysis 3DE DriveWay Features

Extensive documentation: KB help along the way as detailed as a chip manual: traffic.ext, traffic.dwp CNFG for configuring the chip such as memory and clock. Gives warning if necessary Can generate test function Can insert user code One file for each peripheral

49 Kurt Keutzer DriveWay Design Methodology

.DWP GUI Code “generator”

User data Little generation more manipulation

Output .DLL K.B. files

Manipulation Chip of K.B.database specific 50 Kurt Keutzer K.B. Database

A specific K.B. per chip family Family of chips

G chip

G peripherals – functional objects (timer, PWM counter) • functions • physicals (register setting, values, clock rate) • actual code

51 Kurt Keutzer DriveWay Builder

Add chip Add peripheral Create skeleton, link to other thins such as GUI Code reuse in adding a new chip in an existing family, e.g., use code in MPC 860 for MPC 821 Easy to create infrastructure but specifics has to be written

52 Kurt Keutzer About the code generator (1)

Cut and paste K.B. database Areas where we can use automation for device driver generation:

G model user specification

G extract useful information for drivers from HDL description of the chip

G MAP registers

G interrupt

53 Kurt Keutzer About the code generator (2)

Why is Aysis not using automation?

G Commercial efficiency

G e.g., easy to capture user specification from the GUI rather than using a model such as UML or state machine

G HDL code too low level, hard to extract information

54 Kurt Keutzer CoWare Interface Synthesis™

System suggests hardware/software interface protocols

G Handshaking, memory mapped I/O, interrupt scheme, DMA… Designer selects communication protocols & memory System synthesizes efficient device drivers and glue logic

Hardware Software

Device Glue Logic Driver

55 Kurt Keutzer Interface Synthesis Example: Memory Mapped I/O

SW SW

Port HW Device Glue HW Driver Logic Port = value;

Glue Logic compiled on processor

SW

Processor HW

*FFA3*FFA3 = value;

Device Driver

Memory 56 Kurt Keutzer Address FFA3 SW: Embedded Software Tools

application compiler U source Application S code software E R a.out RTOS

A S ROM C I C simulator P debugger A U S I RAM C

57 Kurt Keutzer ASIC Value Proposition

S/P RAMRAM µC DMA

ASIC DSP LOGIC CORE

• 20% area decrease in ASIC portion • 25% higher performance • move to higher level - HDL description at RTL The Importance of Code Size

Killian- Tensilica 5.0 Area vs. Program Instructions 4.5 2 4.0 3.5 3.0 2.5 2.0 1.5 1.0 Processor + Code RAM mm RAM Code + Processor 0.5 0.0 0 1000 2000 3000 4000 5000 6000 7000 8000 Xtensa MIPS-4Kc ARC ARM9 ARM9-Thumb Program Size (Instructions)

Based on base 0.18µ implementation plus code RAM or cache Xtensa code ~10% smaller than ARM9 Thumb, ~50% smaller than MIPS-Jade, ARM9 and ARC ARM9-Thumb has reduced performance RAM/cache density = 8KB/mm2 59 Kurt Keutzer SW Compiler Value Proposition

• 20% area decrease in RAM portion • 25% higher performance • move to higher level - C rather than assembler

R S/P RAMA µC M DMA ASIC DSP LOGIC CORE

20% area decrease over ASIC portion Memory? StrongARM Processor

Compaq/Digital StrongARM

61 Kurt Keutzer Compiler Support

BUT, few companies focused on compiler support for embedded systems:

G Cygnus => RedHat

G Tartan => TI

G Green Hills

Why? Bad ``buying behaviors’’ – few seats, low ASP’s

62 Kurt Keutzer Current Status on Compiler Support

Adequate compiler and debugger support in breadth and quality for embedded microprocessors/

G ARM

G MIPS

G Power PC

G Mot family From

G Cygnus/RedHat

G Manufacturer

G Green Hills DSP’s still poorly supported

G Tartan acquired by

G WHY???? NO support for growing generation of special purpose processors:

G TMS320C80

G IXP1200

63 Kurt Keutzer Recall: Architectural Features of DSPs

Data path configured for DSP

G Fixed-point arithmetic

G MAC- Multiply-accumulate Multiple memory banks and buses -

G

G Multiple data memories Specialized addressing modes

G Bit-reversed addressing

G Circular buffers Specialized instruction set and execution control

G Zero-overhead loops

G Support for MAC Specialized peripherals for DSP

64 Kurt Keutzer Example: IXP1200

Host CPU (optional) PCI MAC Devices

PCI Bus 66 Mhz

32

PCI Bus Unit StrongARM core

Microengine 64 SDRAM SDRAM Memory 1 Microengine (up to 256 MB) Unit 2 Microengine 3 Microengine 4 Microengine SRAM 32 SRAM Memory 5 Microengine 6 (up to 8 MB) Unit

Boot ROM IX Bus Interface (up to 8 MB) Unit

64 Peripherals FIFO Bus 66 Mhz

Ethernet MAC ATM, T1/E1 Another IXP1200

65 Kurt Keutzer IXP1200 Network Processor

6 micro-engines

SDRAM G RISC engines Ctrl G 4 contexts/eng MicroEng G 24 threads total PCI Interface MicroEng Hash IX Bus Interface Engine G packet I/O

MicroEng G connect IXPs ICache IX Bus Interface G scalable MicroEng SA StrongARM DCache Scratch Core G less critical tasks MicroEng Pad Mini SRAM Hash engine DCache MicroEng G level 2 lookups SRAM PCI interface Ctrl

66 Kurt Keutzer Summary

Embedded software support for microcontrollers and microprocessors is broadly available and of adequate quality

G RTOS

G Device drivers

G Compilers

G Debuggers Embedded software support for DSP processors is inadequate:

G Patchy support – many parts lack support

G Quality poor – lags hand coding by 20-100% Embedded software support for special purpose processors often non-existent Still in a ``build a hardware then write the software’’ world Alternatives?

67 Kurt Keutzer ASIP/Extensible micro DESIGN FLOW

APPLICATION_1 APPLICATION_2 APPLICATION_7

APPLICATION CODE µARCHITECTURE DESIGNER

RETARGETABLE INSTRUCTION SET COMPILER

OBJECT SIMULATION PERFORMANCE CODE MODEL ANALYSIS Tensilica TIE Overview

Killian- Tensilica

Processor ASIC flow RTL Configure Processor Base uP Generator uP ∗∗∗∗∗∗∗ Mem ∗∗∗∗ Software ∗∗∗∗∗∗∗∗ Tools Software ∗∗∗ Software Generator compile Describe new inst in TIE ∗∗∗∗∗∗∗ ∗∗∗∗ ∗∗∗∗∗∗∗∗ ∗∗∗

Application

69 Kurt Keutzer Tensilica TIE Design Cycle

Killian- Tensilica Develop application in C/C++

Run cycle-accurate ISS Profile and analyze

N Id potential new instructions Acceptable ?

Y Describe new instructions Measure hardware impact

Generate new software tools N Acceptable ? Compile and run application Y Build the entire processor NY Correct ?

70 Kurt Keutzer Conclusions

Full embedded software support for will be requirement for future ``platforms’’ Companies evolving hardware and software together will have a significant competitive advantage Few examples beginning to emerge- Tensilica, ST Microelectronics

71 Kurt Keutzer