Lecture 26a: Software Environments for Embedded Systems
Prepared by: Professor Kurt Keutzer Computer Science 252, Spring 2000 With contributions from: Jerry Fiddler, Wind River Systems, Minxi Gao, Xiaoling Xu, UC Berkeley Shiaoje Wang, Princeton
1 Kurt Keutzer SW: Embedded Software Tools
application compiler U source Application S code software E R a.out RTOS
A S ROM C I C simulator P debugger A U S I RAM C
2 Kurt Keutzer Another View of Microprocessor Architecture
Let’s look at current architectural evolution from the standpoint of the software developers …, in particular Jerry Fiddler
3 Kurt Keutzer Fiddler’s Predictions for the Next Ten Years (2010)
End of the “Age of the PC” Lots of Exciting Applications Development Will Continue To Be Hard
G Even as we and our competitors continue to make incredible efforts Chips - No predictions MEMS / Nano-technology & Sensors Will Impact Us
J. Fiddler - WRS
4 Kurt Keutzer Fundamental Principles
Computers are, and will be, everywhere The world itself is becoming more intelligent Our infrastructure will have major software content Most of our access to information will be through embedded systems Economics will inexorably drive deployment of embedded systems The Internet is one important factor in this trend Reliability is a critical issue EVERY tech and mfg. business will need to become good at embedded software
J. Fiddler - WRS
5 Kurt Keutzer What Will Be Embedded in Ten Years?
Everything That is Now Electro-Mechanical Machines (Nano-Machines) Analog Signals Anything that communicates Lots of stuff in our cars Our Bodies
G Today - Pacemakers
G Soon - De-Fibrillators, Insulin Dispensers
G We can all be the $6M Person, for a lot cheaper All sorts of interfaces
G Speech, DNI, etc.
J. Fiddler - WRS
6 Kurt Keutzer Embedded Microprocessor Evolution
> 500k transistors 2+M transistors 5+M transistors 22+M transistors 1 - 0.8 µ 0.8 - 0.5 µ 0.5 - 0.35 µ 0.25 - 0.18 µ 33 mHz 75 - 100 mHz 133 - 167 mHz 500 - 600 mHz
1989 1993 1995 1999
Embedded CPU cores are getting smaller; ~ 2mm2 for up to 400 mHz
G Less than 5% of CPU size Higher Performance by:
G Faster clock, deeper pipelines, branch prediction, ... Trend is towards higher integration of processors with:
G Devices that were on the board now on chip: “system on a chip”
G Adding more compute power by add-on DSPs, ...
G Much larger L1 / L2 caches on silicon J. Fiddler - WRS
7 Kurt Keutzer Microprocessor Chaos
ST 20 J. Fiddler - WRS M32 R/D StrongARM ARM SH-DSP SH 4 MCORE 680x0 680x0 CPU32 CPU32 PowerPC PowerPC 80x86 80x86 MIPS 3k/4k/5k MIPS 3k/4k/5k SPARC SPARC SH 1/2/3 SH 1/2/3 29k 29k 29k 680x0 RAD 6k RAD 6k CPU32 Siemens C16x Siemens C16x 80x86 NEC V8xx NEC V8xx SPARC PARISC PARISC MIPS R3k i960 i960 68000 i960 563xx 563xx
1980 1990 1996 1998 8 Kurt Keutzer A Challenging Environment
Expanding Functional Demands J. Fiddler - WRS Of Embedded Applications
And keep it small, stupid!
Numerous Microprocessor Architectures Derivative Processors Application-Specific CPUs Systems On A Chip 9 Kurt Keutzer New Hardware Challenges Software Development J. Fiddler - WRS
More & More Architectures
G User-Customizable µprocessors More Power Demands More Software Functionality
G Software is not following Moore’s law (yet) System-on-a-chip DSP
10 Kurt Keutzer Embedded Software Crisis
Cheaper, more powerful J. Fiddler - WRS Microprocessors
Increasing EmbeddedEmbedded More Time-to-market SoftwareSoftware Applications pressure CrisisCrisis
J. Fiddler - WRS Bigger, More Complex Applications 11 Kurt Keutzer SW: Embedded Software Tools
application compiler U source Application S code software E R a.out RTOS
A S ROM C I C simulator P debugger A U S I RAM C
12 Kurt Keutzer Outline on RTOS
Introduction VxWorks
G General description G System G Supported processors
G Details G Kernel G Custom hardware support G Closely coupled multiprocessor support G Loosely coupled multiprocessor support pSOS eCos Conclusion
13 Kurt Keutzer Embedded Development: Generation 0
Development: Sneaker-net Attributes:
G No OS
G Painful!
G Simple software only
14 Kurt Keutzer Embedded Development: Generation 1
Hardware: SBC, minicomputer Development: Native Attributes:
G Full-function OS
G Non-Scalable
G Non-Portable
G Turnkey
G Very primitive
15 Kurt Keutzer Embedded Development: Generation 2
Hardware: Embedded Development: Cross, serial line Attributes
G Kernel
G Originally no file sys, I/O, etc.
G No development environment
G No network
G Non-portable, in assembly
16 Kurt Keutzer Embedded Development: Generation 3
Hardware: SBC, embedded Development: Cross, Ethernet
G Integrated, text-based, Unix Attributes
G Scalable, portable OS
G Includes network, file & I/O sys, etc.
G Tools on target
G Network required
G Heavy target required for development
G Closed development environment
17 Kurt Keutzer Embedded Development: Generation 4
Hardware: Embedded, SBC Development: Cross
G Any tool - Any connection - Any target
G Integrated GUI, Unix & PC Attributes
G Tools on host
G No target resources required
G Far More Powerful Tools (WindView, CodeTest, …)
G Open dev. environment, published API
G Internet is part of dev. environment
G Support, updates, manuals, etc.
18 Kurt Keutzer Embedded Development: Generation 5???
Super-scalable Communications-centric Virtual application platform
G Java? Multi-media Way-cool development environment
G Much easier to create, debug & re-use code
G Easy for non-programmers to contribute
19 Kurt Keutzer The RTOS Evolution
Application Browser / GUI Java Application Advanced Interconnect X Windows Advanced Networking WindNet Distributed Objects Memory Management Fault Tolerance 90%* Application Multiprocessing 75%* Multiprocessing File System File System File System Application Networking 30%* Networking Networking Kernel 10%* Kernel Kernel Kernel
1980 1990 1996 1998
*Percent of total software supplied by RTOS vendor in a typical embedded device
20 Kurt Keutzer Introduction to RTOS
Wind River Systems Inc. VxWorks http://www.wrs.com
Integrated Systems Inc. pSOS http://www.isi.com
Cygnus Inc. => RedHat eCos http://www.cygnus.com => www.redhat.com
21 Kurt Keutzer VxWorks
Real-Time Embedded Applications
Graphics Multiprocessing support Internet support
Java support POSIX Library File system
WindNet Networking
Core OS
Wind Microkernel
VxWorks 5.4 Scalable Run-Time System
22 VxWorks Supported Processors
PowerPC SPARC 68K, CPU 32 NEC V8xx ColdFire MCORE M32 R/D 80x86 and Pentium RAD6000 i960 ST 20 ARM and Strong ARM TriCore MIPS SH
23 VxWorks Wind microkernel
Task management
G multitasking, unlimited number of tasks
G preemptive scheduling and round-robin scheduling(static scheduling)
G fast, deterministic context switch
G 256 priority levels
24 VxWorks Wind microkernel
Fast, flexible inter-task communication
G binary, counting and mutual exclusion semaphores with priority inheritance
G message queue
G POSIX pipes, counting semaphores, message queues, signals and scheduling
G control sockets
G shared memory
25 VxWorks Wind microkernel
High scalability Incremental linking and loading of components Fast, efficient interrupt and exception handling Optimized floating-point support Dynamic memory management System clock and timing facilities
26 VxWorks ``Board Support Package’’
BSP = Initializing code for hardware device + device driver for peripherals BSP Developer’s Kit
Hardware Processor independent dependent Device dependent code code code
BSP
27 VxWorks VxMP
A closely coupled multiprocessor support accessory for VxWorks. Capabilities:
G Support up to 20 CPUs
G Binary and counting semaphores
G FIFO message queues
G Shared memory pools and partitions
G VxMP data structure is located in a shared memory area accessible to all CPUs
G Name service (translate symbol name to object ID)
G User-configurable shared memory pool size
G Support heterogeneous mix of CPU
28 VxWorks VxMP
Hardware requirements:
G Shared memory
G Individual hardware read-write-modify mechanism across the shared memory bus
G CPU interrupt capability for best performance
G Supported architectures:
G 680x0 and 683xx
G SPARC
G SPARClite
G PPC6xx
G MIPS
G i960
29 VxWorks VxFusion
VxWorks accessory for loosely coupled configurations and standard IP networking; An extension of VxWorks message queue, distributed message queue. Features:
G Media independent design; App1 App2 G Group multicast/unicast messaging;
G Fault tolerant, locale-transparent operations; VxFusion
G Heterogeneous environment. Adapter Layer Supported targets:
G Motorola: 68K, CPU32, PowerPC Transport G Intel x86, Pentium, Pentium Pro
30 VxWorks pSOS
Loader Debug C/C++ File System
Memory POSIX I/O system BSPs Management Library
pSOS+ Kernel
pSOS 2.5
31 pSOS Supported processors
PowerPC M32/R 68K m.core ColdFire NEC v8xx MIPS ST20 ARM and Strong ARM SPARClite X86 and Pentium i960 SH
32 pSOS pSOS+ kernel
Small Real Time multi-tasking kernel; Preemptive scheduling; Support memory region for different tasks; Mutex semaphores and condition variables (priority ceiling) No interrupt handling is included
33 pSOS Board Support Package
BSP = skeleton device driver code + code for low- level system functions each particular devices requires
34 pSOS pSOS+m kernel
Tightly coupled or distributed processors; pSOS API + communication and coordination functions; Fully heterogeneous; Connection can be any one of shared memory, serial or parallel links, Ethernet implementations; Dynamic create/modify/delete OS object; Completely device independent
35 pSOS eCos
ISO C Library Native Kernel C API µITRON 3.0 API
Internal Kernel API
Kernel pluggable schedulers, mem alloc, synchronization, timers, interrupts,
Device Drivers threads
HAL
36 eCos Supported processors
Advanced RISC Machines ARM7 Fujitsu SPARClite Matsushita MN10300 Motorola PowerPC Toshiba TX39 Hitachi SH3 NEC VR4300 MB8683x series Intel strong ARM
37 eCos Kernel
No definition of task, support multi-thread Interrupt and exception handling Preemptive scheduling: time-slice scheduler, multi-level queue scheduler, bitmap scheduler and priority inheritance scheduling Counters and clocks Mutex, semaphores, condition variable, message box
38 eCos Hardware Abstraction Layer
Architecture HAL abstracts basic CPU, including:
G interrupt delivery
G context switching
G CPU startup and etc. Platform HAL abstracts current platform, including
G platform startup
G timer devices
G I/O register access
G interrupt control Implementation HAL abstracts properties that lie between the above,
G architecture variants
G on-chip devices The boundaries among them blurs.
39 eCos Summary on RTOS
VxWorks pSOS eCos Task YYOnly Thread Scheduler Preemptive, static Preemptive Preemptive Synchronization mechanism No condition variable Y Y POSIX support YYLinux Scalable YYY Custom hw support BSP BSP HAL, I/O package Kernel size - 16KB - Multiprocessor support VxMP/ VxFusion PSOS+m None (accessories) kernel
40 Kurt Keutzer Recall the ``Board Support Package’’
BSP = Initializing code for hardware device + device driver for peripherals BSP Developer’s Kit
Hardware Processor independent dependent Device dependent code code code
BSP
41 VxWorks Introduction to Device Drivers
What are device drivers?
G Make the attached device work.
G Insulate the complexities involved in I/O handling.
Application
RTOS
Device driver
Hardware 42 Kurt Keutzer Proliferation of Interfaces
New Connections
G USB
G 1394
G IrDA
G Wireless New Models
G JetSend
G Jini
G HTTP / HTML / XML / ???
G Distributed Objects (DCOM, CORBA)
43 Kurt Keutzer Leads to Proliferation of Device Drivers
Courtesy - Synopsys 44 Kurt Keutzer Device Driver Characterization
Device Drivers’ Functionalities
G initialization
G data access
G data assignment
G interrupt handling
45 Kurt Keutzer Device Characterization
Block devices
G fixed data block sizes devices Character devices
G byte-stream devices Network device
G manage local area network and wide area network interconnections
46 Kurt Keutzer I/O Processing Characteristics
Initialization
G make itself known to the kernel
G initialize the interrupt handling
G optional: allocate the temporary memory for device driver
G initialize the hardware device Front-End Processing
G initiation of an I/O request Back-End Processing
G handles the completion of I/O operations
47 Kurt Keutzer Commercial Resources
Aisys DriveWay 3DE
G Motorola MPC860, MC68360, MC68302, AMD E86, Philips XA, 8C651, PIC 16/17 Stenkil MakeApp
G Hitachi H8, SH1, SH3, SH7x, HCAN Intel’s ApBuilder Motorola MCUnit GO DSP Code Composer
G TI DSPs CoWare
48 Kurt Keutzer Aysis 3DE DriveWay Features
Extensive documentation: KB help along the way as detailed as a chip manual: traffic.ext, traffic.dwp CNFG for configuring the chip such as memory and clock. Gives warning if necessary Can generate test function Can insert user code One file for each peripheral
49 Kurt Keutzer DriveWay Design Methodology
.DWP GUI Code “generator”
User data Little generation more manipulation
Output .DLL K.B. files
Manipulation Chip of K.B.database specific 50 Kurt Keutzer K.B. Database
A specific K.B. per chip family Family of chips
G chip
G peripherals – functional objects (timer, PWM counter) • functions • physicals (register setting, values, clock rate) • actual code
51 Kurt Keutzer DriveWay Builder
Add chip Add peripheral Create skeleton, link to other thins such as GUI Code reuse in adding a new chip in an existing family, e.g., use code in MPC 860 for MPC 821 Easy to create infrastructure but specifics has to be written
52 Kurt Keutzer About the code generator (1)
Cut and paste K.B. database Areas where we can use automation for device driver generation:
G model user specification
G extract useful information for drivers from HDL description of the chip
G MAP registers
G interrupt
53 Kurt Keutzer About the code generator (2)
Why is Aysis not using automation?
G Commercial efficiency
G e.g., easy to capture user specification from the GUI rather than using a model such as UML or state machine
G HDL code too low level, hard to extract information
54 Kurt Keutzer CoWare Interface Synthesis™
System suggests hardware/software interface protocols
G Handshaking, memory mapped I/O, interrupt scheme, DMA… Designer selects communication protocols & memory System synthesizes efficient device drivers and glue logic
Hardware Software
Device Glue Logic Driver
55 Kurt Keutzer Interface Synthesis Example: Memory Mapped I/O
SW SW
Port HW Device Glue HW Driver Logic Port = value;
Glue Logic compiled on processor
SW
Processor HW
*FFA3*FFA3 = value;
Device Driver
Memory 56 Kurt Keutzer Address FFA3 SW: Embedded Software Tools
application compiler U source Application S code software E R a.out RTOS
A S ROM C I C simulator P debugger A U S I RAM C
57 Kurt Keutzer ASIC Value Proposition
S/P RAMRAM µC DMA
ASIC DSP LOGIC CORE
• 20% area decrease in ASIC portion • 25% higher performance • move to higher level - HDL description at RTL The Importance of Code Size
Killian- Tensilica 5.0 Area vs. Program Instructions 4.5 2 4.0 3.5 3.0 2.5 2.0 1.5 1.0 Processor + Code RAM mm RAM Code + Processor 0.5 0.0 0 1000 2000 3000 4000 5000 6000 7000 8000 Xtensa MIPS-4Kc ARC ARM9 ARM9-Thumb Program Size (Instructions)
Based on base 0.18µ implementation plus code RAM or cache Xtensa code ~10% smaller than ARM9 Thumb, ~50% smaller than MIPS-Jade, ARM9 and ARC ARM9-Thumb has reduced performance RAM/cache density = 8KB/mm2 59 Kurt Keutzer SW Compiler Value Proposition
• 20% area decrease in RAM portion • 25% higher performance • move to higher level - C rather than assembler
R S/P RAMA µC M DMA ASIC DSP LOGIC CORE
20% area decrease over ASIC portion Memory? StrongARM Processor
Compaq/Digital StrongARM
61 Kurt Keutzer Compiler Support
BUT, few companies focused on compiler support for embedded systems:
G Cygnus => RedHat
G Tartan => TI
G Green Hills
Why? Bad ``buying behaviors’’ – few seats, low ASP’s
62 Kurt Keutzer Current Status on Compiler Support
Adequate compiler and debugger support in breadth and quality for embedded microprocessors/microcontrollers
G ARM
G MIPS
G Power PC
G Mot family From
G Cygnus/RedHat
G Manufacturer
G Green Hills DSP’s still poorly supported
G Tartan acquired by Texas Instruments
G WHY???? NO support for growing generation of special purpose processors:
G TMS320C80
G IXP1200
63 Kurt Keutzer Recall: Architectural Features of DSPs
Data path configured for DSP
G Fixed-point arithmetic
G MAC- Multiply-accumulate Multiple memory banks and buses -
G Multiple data memories Specialized addressing modes
G Bit-reversed addressing
G Circular buffers Specialized instruction set and execution control
G Zero-overhead loops
G Support for MAC Specialized peripherals for DSP
64 Kurt Keutzer Example: IXP1200
Host CPU (optional) PCI MAC Devices
PCI Bus 66 Mhz
32
PCI Bus Unit StrongARM core
Microengine 64 SDRAM SDRAM Memory 1 Microengine (up to 256 MB) Unit 2 Microengine 3 Microengine 4 Microengine SRAM 32 SRAM Memory 5 Microengine 6 (up to 8 MB) Unit
Boot ROM IX Bus Interface (up to 8 MB) Unit
64 Peripherals FIFO Bus 66 Mhz
Ethernet MAC ATM, T1/E1 Another IXP1200
65 Kurt Keutzer IXP1200 Network Processor
6 micro-engines
SDRAM G RISC engines Ctrl G 4 contexts/eng MicroEng G 24 threads total PCI Interface MicroEng Hash IX Bus Interface Engine G packet I/O
MicroEng G connect IXPs ICache IX Bus Interface G scalable MicroEng SA StrongARM DCache Scratch Core G less critical tasks MicroEng Pad Mini SRAM Hash engine DCache MicroEng G level 2 lookups SRAM PCI interface Ctrl
66 Kurt Keutzer Summary
Embedded software support for microcontrollers and microprocessors is broadly available and of adequate quality
G RTOS
G Device drivers
G Compilers
G Debuggers Embedded software support for DSP processors is inadequate:
G Patchy support – many parts lack support
G Quality poor – lags hand coding by 20-100% Embedded software support for special purpose processors often non-existent Still in a ``build a hardware then write the software’’ world Alternatives?
67 Kurt Keutzer ASIP/Extensible micro DESIGN FLOW
APPLICATION_1 APPLICATION_2 APPLICATION_7
APPLICATION CODE µARCHITECTURE DESIGNER
RETARGETABLE INSTRUCTION SET COMPILER
OBJECT SIMULATION PERFORMANCE CODE MODEL ANALYSIS Tensilica TIE Overview
Killian- Tensilica
Processor ASIC Verilog flow RTL Configure Processor Base uP Generator uP ∗∗∗∗∗∗∗ Mem ∗∗∗∗ Software ∗∗∗∗∗∗∗∗ Tools Software ∗∗∗ Software Generator compile Describe new inst in TIE ∗∗∗∗∗∗∗ ∗∗∗∗ ∗∗∗∗∗∗∗∗ ∗∗∗
Application
69 Kurt Keutzer Tensilica TIE Design Cycle
Killian- Tensilica Develop application in C/C++
Run cycle-accurate ISS Profile and analyze
N Id potential new instructions Acceptable ?
Y Describe new instructions Measure hardware impact
Generate new software tools N Acceptable ? Compile and run application Y Build the entire processor NY Correct ?
70 Kurt Keutzer Conclusions
Full embedded software support for will be requirement for future embedded system ``platforms’’ Companies evolving hardware and software together will have a significant competitive advantage Few examples beginning to emerge- Tensilica, ST Microelectronics
71 Kurt Keutzer