
Teaching Design and Optimization with the ARM Cortex-M0+

Dr. Alexander G. Dean
Dept. of ECE, North Carolina State University, Raleigh, NC
[email protected]
http://www.cesr.ncsu.edu/agdean

ARM University Program Copyright © ARM Ltd 2014

Course Approach
• Hands-on MCU development experience
  • Programming in C with a free toolchain from Keil
  • Simple, inexpensive hardware
  • Easy system expansion with Arduino-compatible hardware

• Relevant, useful material based on 13 years of experience
  • Interaction with industry through on-site design reviews of real embedded systems
  • Teaching large (50 to 120 students) undergraduate and graduate embedded systems courses

• Prerequisites
  • Introductory course: introduction to computer organization, C programming
  • Advanced course: the introductory course; other courses are also helpful

Course Materials
• Easy adoption for your own course
  • Flexibility with modular design
  • All source files provided (pptx, docx, vsd, c, h, …)

• Course modules typically include
  • PowerPoint slides/lecture notes
  • Demonstration code for use in lecture and outside of class
  • Homework questions and solutions
  • Lab exercise(s) with step-by-step procedure and questions
  • Programming project(s) with solution

• Status
  • Introductory course materials available now
  • Advanced course materials available early summer
  • Developing a textbook to support both courses

Worldwide Adoption

• Video introduction
• Nearly 100 adoptions since launch last summer
  • North & South America, Europe, Asia, Africa
• “We were delighted to be one of the first institutions to receive the ARM University Program's Lab-in-a-Box on Embedded Systems. It has immediately proven itself to me as an excellent resource for our research and teaching activities.”
  • Dr. Boris Adryan, University of Cambridge, UK
• “Freescale is delighted to be building on our longstanding relationship with ARM by providing access to our solutions to universities and electronic engineering students around the world. The Lab-in-a-Box is just one way Freescale and ARM collaboratively reach out to our engineers of the future and help us create a new wave of innovative embedded technologies that will drive our increasingly connected world.”
  • Andy Mastronardi, Director, University Programs, Freescale

Cortex Processor Cores

• Cortex-A: application profile
  • High performance, multiprocessing
• Cortex-R: real-time profile
  • Predictable performance
• Cortex-M: microcontroller profile, optimized for embedded applications
  • Implementation
    • Short pipeline
    • Fast interrupt response
    • Low cost, low power
    • Fast GPIO access
  • Instruction set
    • Good code density (16-bit Thumb-2 instructions)
    • Bit and byte operations
    • Optional single-cycle multiply instruction, hardware divide, saturated math
    • Optional DSP & SIMD instructions
    • Optional floating point unit

Target Board: Freescale Freedom KL25Z
• 32-bit ARM Cortex-M0+ processor
• Freescale Kinetis MKL25Z128VLK4 microcontroller
  • Extremely low power use
  • 48 MHz max. processor clock frequency
  • 128 KB flash ROM, 16 KB RAM
  • Wide range of peripherals, including USB On-The-Go
• FRDM-KL25Z board
  • $13 (USD)
  • Peripherals: 3-axis accelerometer, RGB LED, capacitive touch slider
  • Expansion ports are compatible with the Arduino shield ecosystem: endless opportunities, low-cost hardware
  • mbed.org enabled: online software development toolchain, reusable code

Images courtesy of Freescale

Hardware Ecosystem

• Arduino shields
  • Wide variety
  • Low cost
  • High volume
• Xtrinsic board from element14

Images courtesy of Freescale, Adafruit, element14, Parallax, Seeed Studio

And Even More Shields

Image courtesy of Seeed Studio

SOFTWARE DEVELOPMENT TOOLS

Software Development Suite: MDK-ARM

• Low-cost tools for ARM7, ARM9, Cortex-M and Cortex-R4 MCUs
  • Extensive support for many devices
  • Core and peripheral simulation
  • Flash support
• Integrated development environment with ARM
  • Full (pro) version available for teaching; free 32 KB size-limited version also available
• Debugger with full access to program, core and peripherals
  • Real-time trace on devices based on Cortex-M3 and M4
• Real-Time
  • Keil RTX RTOS + source code
  • TCP networking suite, Flash File System, CAN Driver Library, USB Device Interface
• Debug hardware
• Evaluation boards
• Separate support channel
  • See www.keil.com

CURRICULUM OVERVIEW

Curriculum Overview
• Introductory Course: Building an Embedded System with a Microcontroller
  • Microcontroller concepts
  • Software design basics
  • ARM Cortex-M0+ architecture and interrupt system
  • C as implemented in assembly language
  • Peripherals and interfacing
• Advanced Course: Embedded System Design, Analysis and Optimization
  • Creating responsive multithreaded systems
  • Optimizing code speed
  • Optimizing system power and energy
  • Optimizing memory requirements
• Details in appendix

Introductory Course Modules

[Diagram: introductory course modules: Software Design Basics (Concurrency, Embedded SW Engineering, CMSIS); Cortex-M0+ Processor Core; C as Implemented in Assembly Language; Interrupts; GP I/O; Analog I/O; Timers; Serial Comm.; DMA; HW & SW for Robustness; Using Arduino Shields]

Introduction to Embedded Systems
• What is an Embedded System?
  • Why add a computer to the larger system?
• Differences between…
  • Embedded and general-purpose computers
  • Microcontrollers and microprocessors
• Embedded system
  • Functions
  • Attributes
  • Constraints
  • Economics

Embedded MCU vs. General-Purpose CPU

• Both have a CPU core to execute instructions
• Microcontroller has peripherals for concurrent embedded interfacing and control
  • Analog
  • Non-logic-level signals
  • Timing, communications
  • Reliability and safety
• Embedded systems have concurrent, reactive behaviors
  • Must respond to sequences and combinations of events
  • Real-time systems have deadlines on responses
  • Typically must perform multiple separate activities concurrently

Concurrent Hardware & Software Operation

[Figure: timeline in which software and hardware activities alternate and overlap over time]

. Embedded systems rely on both MCU hardware peripherals and software to get everything done on time

Advanced Course Modules (1/2)

[Diagram: advanced course modules (1/2): Concurrency Concepts and Interrupts; C as Implemented in Assembly Lang.; Cooperative Task Scheduling; Preemptive Task Scheduling; Execution-Time Profiling; Examining Object Code; Design, Analysis & Optimization of Real-Time Systems; Toolchain Tuning; Source Code and High-Level Optimizations; CMSIS-DSP and Cortex-M0+ Features]

Advanced Course Modules (2/2)

[Diagram: advanced course modules (2/2): Power & Energy Modeling & Analysis; C Program Memory Use Concepts; Cortex-M0+ Core; KL25Z MCU; Freedom Board; Tools for Analyzing Memory Use; Optimizing Power and/or Energy with Sleep and DVFS; ROM Size Optimization; RAM Size Optimization]

COURSE MODULE DETAILS

INTRODUCTORY COURSE

Software Development, Processor & Interrupts

Module | Presentations | Demonstration Code | Homework and Test Questions | Laboratory Exercise | Programming Project
Introduction to Embedded Systems | Presentation | - | Solutions | - | -
SW Design Basics: Concurrency, SW Eng. & CMSIS APIs | Presentation | - | Solutions | - | -
Cortex-M0+ Processor Core | Presentation | - | Solutions | Text Processing in Assembly Language (Lab Exercise, Code) | Integer Square Root Approximation (Assignment, Starter code, Solution)
C Code as Implemented in Assembly Language | Presentation | C/Asm Demo 1 Code, Demo 2 Code | Solutions, source project | Examining Toolchain Output (Lab Exercise, Code) | -
Interrupts | Presentation | Interrupt Demo Code, Notes | Solutions, spreadsheet | Measuring Interrupt Timing (Lab Exercise) | Human Response Timer (Assignment, Solution)

Using Peripherals

Module | Presentations | Demonstration Code | Homework and Test Questions | Laboratory Exercise | Programming Project
General Purpose Digital Interfacing | Presentation | GPIO Demo Code (includes Basic Light Switch, RGB LED Flasher, Speaker tone generator, Text LCD with parallel bus) | Solutions, spreadsheet | Basic User Interface with LCD and switches (Lab Exercise, Code) | Slide Whistle (Assignment, Solution)
Analog Interfacing; Interfacing with Arduino Analog Devices | Presentation | Analog Interfacing Demo Code | Solutions, spreadsheet | ADC: Voltmeter (Lab Exercise, Code); Comparator: Analog Voltage Monitor (Lab Exercise, Code); DAC: Signal Generator (Lab Exercise, Code) | Infrared Proximity Sensor (Assignment, Solution)
Timers | Presentation | Timer Demo Code | Solutions, spreadsheet | Signal Generator with Precision Timing and Buffering (Lab Exercise, Code) | Clock with Pulsing LED (Assignment, Solution); Using PWM to Control Motor Speed (Assignment, Solution)

Using Peripherals

Module | Presentations | Demonstration Code | Homework and Test Questions | Laboratory Exercise | Programming Project
Serial Communication; Interfacing with Arduino Serial Devices | Presentation | I2C and Accelerometer; Creating a Console Interface with a UART | Solutions | UART Performance Analysis (Lab Exercise, Code) | UART: Creating a Speedometer with a GPS Receiver (Assignment, Solution); Interfacing with an SD memory card using SPI (Assignment, Solution)
Improving System Robustness with Hardware and Software | Presentation | Watchdog Timer, Stack Overflow Detection | Solutions | Testing an embedded system's robustness (Lab Exercise, Code) | Making a system robust (Assignment, Solution)
Using Direct Memory Access to Improve Performance | Presentation | Memory Copy & ISR Replacement | Solutions | Evaluating Memory Copy Speeds (Lab Exercise, Code) | DMA: Upgrading from ISRs to DMA (Assignment, Solution)

ADVANCED COURSE

Building Multithreaded Systems

Module | Presentations | Demonstration Code | Homework and Test Questions | Laboratory Exercise | Programming Project
Introduction to Advanced Topics | Presentation | - | - | - | -
Managing Concurrency with Cooperative and Preemptive Schedulers | Presentation | RTX Preemptive Scheduler, Nonpreemptive Scheduler | Solutions | Evaluating Scheduler Responsiveness (Assignment, Code, Solution) | -
Designing Multithreaded Applications with RTOS Support | Presentation | Data race conditions with preemption | Solutions | Using RTOS Mechanisms (Assignment, Code, Solution) | Upgrading the Waveform Generator to use an RTOS (Assignment, Solution)
Design, Analysis, and Optimization of Real-Time Systems | Presentation | Response time evaluation | Solutions | - | Instrumenting RTX and verifying task timing (Assignment, Solution)
Advanced Debugging with Cortex-M0+ & MDK | Presentation | Microtrace buffer; Kernel-aware debugging with MDK | - | Using the MTB (Assignment, Code) | -

Performance Analysis and Optimization

Module | Presentations | Demonstration Code | Homework and Test Questions | Laboratory Exercise | Programming Project
Profiling Program Execution Time | Presentation | Profiling Spherical Geometry Calculations | Solutions | Profiling Lab (Lab Exercise, Code) | Tilt-compensated compass profiling (Assignment, Solution)
Examining Object Code without Getting Lost | Presentation | Sample program | Solutions | - | -
Speed Optimization with Toolchain Tuning | Presentation | Tuning the toolchain | Solutions | Evaluating Compiler Optimizations (Lab Exercise, Code) | Toolchain tuning for the TC compass (Assignment, Solution)
Speed Optimization with Program Transformations | Presentation | Optimizing Spherical Geometry Calculations | Solutions | - | Code optimization for the TC compass (Assignment, Solution)
DSP Acceleration with Cortex-M0+ and the CMSIS-DSP Library | Presentation | Real-time audio filtering | Solutions | - | Using the CMSIS-DSP library for an ultrasonic rangefinder (Assignment, Solution)

Power and Energy Analysis and Optimization

Module | Presentations | Demonstration Code | Homework and Test Questions | Laboratory Exercise | Programming Project
Power and Energy Analysis | Presentation | - | Solutions | Freedom Board Power Analysis Lab (Lab Exercise, Code); Freedom Board Energy Analysis (Lab Exercise, Code) | -
KL25Z Features for Low Power and Energy | Presentation | Evaluating Power in Active and Sleep Modes | Solutions | Evaluating KL25Z Low-Power Standby Modes (Lab Exercise, Code); KL25Z Voltage and Frequency Scaling (Lab Exercise, Code) | -
Optimizing Power or Energy Use | Presentation | Analyzing and optimizing data logger energy use | Solutions | - | This Side Up: Optimizing an orientation logger for lower energy use (Assignment, Solution)

Memory Analysis and Optimization

Module | Presentations | Demonstration Code | Homework and Test Questions | Laboratory Exercise | Programming Project
Profiling and Reducing ROM and RAM Memory Requirements | Presentation | Graphics rendering size profiling and optimization | Solutions | Evaluating the impact of compiler optimizations and toolchain options | Reducing RAM and ROM for a multithreaded system (Assignment, Solution)

INTRODUCTORY COURSE

Introduction to Embedded Systems Design

• Embedded System Fundamentals
  • Concurrency
  • Software Engineering for Embedded Systems
  • CMSIS
  • Improving System Robustness with Hardware and Software
• Processor
  • Cortex-M0+ Processor Core
  • C Code as Implemented in Assembly Language
  • Interrupts
• Using Peripherals
  • General Purpose Digital Interfacing
  • Analog Interfacing
  • Timers
  • Serial Communication
  • Direct Memory Access

Introduction to Embedded Systems
• What is an Embedded System?
  • Why add a computer to the larger system?
• Differences between…
  • Embedded and general-purpose computers
  • Microcontrollers and microprocessors
• Embedded system
  • Functions
  • Attributes
  • Constraints
  • Economics

Software Design Basics

• Concurrency in software and hardware
  • Peripherals
  • Software tasks
• Task scheduling and response time
  • Prioritization
  • Preemption
• Software engineering
  • Development models
  • Design before coding
  • Graphical representations: statecharts, flowcharts, sequence diagrams
  • Essential UML
  • Peer review
  • Testing concepts

[Figure: task timelines (Rec, Sw, LCD, Dec, Check) under static, dynamic run-to-completion, and dynamic preemptive scheduling]

Cortex Microcontroller Software Interface Standard

• Vendor-independent hardware abstraction layer for Cortex-M
• Standardizes interfaces to
  • Processor core
  • Peripherals
  • Debug access port
  • RTOS
• Provides
  • Optimized libraries of DSP functions in fixed and floating point
  • Peripheral system view description

Cortex-M0+ CPU Core

• Processor core registers
• Memory space, contents and addressing
• ARMv6-M Thumb instruction set overview

C as Implemented in Assembly Language

• We program in C for convenience, but should understand the assembly code implementing it
  • Code efficiency
  • Ease of analysis
• An overview of what C gets compiled to
  • C start-up module
  • Register use
  • Activation records
  • Subroutines
  • Data types & classes
  • Using pointers
  • Control flow

[Figure: call stack from lower addresses (free stack space, stack pointer) to higher addresses: activation records for the current function, its caller, the caller's caller, and the caller's caller's caller; each activation record holds local storage, return address and arguments]

Exceptions and Interrupts

• Exception and Interrupt Concepts
  • Vector table
  • Stack use
  • Processing sequence
  • Entering an Exception Handler
  • Exiting an Exception Handler
• Cortex-M0+ Interrupts
  • NVIC interrupt controller, exception mask, prioritization
  • Using Port Module and External Interrupts
• Timing Analysis
• Program Design with Interrupts
  • Sharing Data Safely Between ISRs and Other Threads
    • Data atomicity and race conditions
    • Volatile data
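To make the data-sharing bullets concrete, here is a minimal C sketch of reading a multi-word variable that an ISR updates. It assumes CMSIS-Core style PRIMASK handling; the `host_*` functions are hypothetical host-side stand-ins for the CMSIS intrinsics `__get_PRIMASK()`, `__disable_irq()` and `__set_PRIMASK()` so the sketch compiles off-target, and `timer_isr_sketch` is a hypothetical handler body.

```c
#include <stdint.h>

/* Host stand-ins for the CMSIS-Core intrinsics __get_PRIMASK(),
 * __disable_irq() and __set_PRIMASK(). On the target, use the real ones
 * from the device header; here a variable models the PRIMASK bit. */
static uint32_t primask_model = 0;
static uint32_t host_get_primask(void)       { return primask_model; }
static void     host_disable_irq(void)       { primask_model = 1; }
static void     host_set_primask(uint32_t m) { primask_model = m; }

/* Shared with an ISR, so it must be volatile: the compiler may not cache it
 * in a register or drop accesses. A 64-bit value takes two 32-bit accesses
 * on the Cortex-M0+, so volatile alone does not make it atomic. */
static volatile uint64_t g_ticks = 0;

/* Hypothetical timer ISR body (would be the vector-table handler). */
void timer_isr_sketch(void) {
    g_ticks++;
}

/* Copy the shared value with interrupts masked so the ISR cannot run
 * between the two halves of the read. Restoring the saved PRIMASK (instead
 * of always re-enabling) stays correct if the caller had already masked. */
uint64_t get_ticks(void) {
    uint32_t saved = host_get_primask();
    host_disable_irq();
    uint64_t copy = g_ticks;
    host_set_primask(saved);
    return copy;
}
```

Keeping the critical section to a single copy minimizes the time interrupts are masked, which matters for the timing analysis listed above.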

General-Purpose I/O

• Basic Concepts
• Port Circuitry
• Control Registers
• Accessing Hardware Registers in C
• Clocking and Muxing
• Circuit Interfacing
  • Inputs
    • Switches
  • Outputs
    • LEDs
    • Speaker
  • Both
    • Interfacing with a Text LCD

[Figure: port circuitry for GPIO bit n: an address decoder on the address bus selects PDDR, PDOR, PSOR, PCOR, PTOR or PDIR; the port data direction register and port data output register (with set, reset and toggle inputs) drive the pin or pad on the package; the port data input register samples the pin on the I/O clock; the pin control register MUX field selects the pin function]

Demo, Lab and Project

• Demo
  • Basic Light Switch
  • RGB LED Flasher
  • Speaker tone generator
• Lab
  • Interfacing with a generic text LCD and switches
  • Can target Arduino LCD + switch shield instead
• Project
  • Slide whistle sound effect generator

Analog Interfacing

• Analog and digital domains
  • Quantization and transfer functions
  • Sampling
• Converters
  • DAC
  • Comparator
  • ADC: flash, successive approximation
• KL25Z Peripherals
  • Digital to analog converter
  • Analog comparator
    • Reference voltage DAC
  • Analog to digital converter
    • Clock configuration
    • Input channel multiplexer
    • Conversion triggers
    • Special features: averaging, low power, repeat, automatic compare

[Figure: successive-approximation conversion over steps T1 to T6 from the start of conversion: each step compares the analog input voltage with a test voltage (DAC output) for the next trial bit, so the known bits grow from 1xxxxx to 10xxxx, 100xxx, 1001xx, 10011x and finally 100110]

Demos, Labs and Projects

• Demo
  • Measure voltage with ADC
  • Detect low voltage with comparator
  • Waveform generator with DAC
• Labs
  • ADC: Supply voltage monitor
  • Comparator: Low voltage alarm
  • DAC: Waveform generator
• Project
  • Infrared proximity sensor using ADC and digital output

[Figure: object present, reflects IR back to receiver]

Timers

• Concepts
  • Elapsed time measurement
  • Event counting
  • Periodic interrupts
  • Input capture
  • Output compare
  • Pulse-width modulation
  • Software data queues
• KL25Z Timers
  • Periodic Interrupt Timer
  • Timer/PWM Module
  • Low-Power Timer
  • SysTick

[Figure: PIT channel as a presettable binary counter driven by a clock: the timer start value (TSV) is written to PIT_LDVALn and reloaded after each timeout, which raises an interrupt; the current timer value (TVL) is read from PIT_CVALn]

Demo, Lab and Project

• Demo
  • Count interrupts, adjust PWM signal based on board tilt as measured by accelerometer
• Lab
  • Signal generator with precision timing and buffering
• Projects
  • Time-of-day clock with pulsing LED
  • Using PWM to control a motor's speed
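The buffering in the signal generator lab is typically a circular queue shared between an ISR and the main loop. This is one minimal single-producer, single-consumer sketch (names and capacity are hypothetical). It keeps one slot empty to tell full from empty, and each index is written by only one side, so no interrupt masking is needed on a single-core Cortex-M0+, where aligned 32-bit stores are atomic.

```c
#include <stdint.h>

#define Q_SIZE 16u   /* hypothetical capacity; holds Q_SIZE - 1 items */

typedef struct {
    volatile uint8_t  data[Q_SIZE];
    volatile unsigned head;   /* next slot to write; written by producer only */
    volatile unsigned tail;   /* next slot to read; written by consumer only */
} queue_t;

void q_init(queue_t *q) { q->head = 0; q->tail = 0; }

int q_empty(const queue_t *q) { return q->head == q->tail; }

int q_full(const queue_t *q) { return ((q->head + 1u) % Q_SIZE) == q->tail; }

/* Producer side (e.g. main loop filling samples). Returns 0 if full. */
int q_enqueue(queue_t *q, uint8_t v) {
    unsigned h = q->head;
    unsigned next = (h + 1u) % Q_SIZE;
    if (next == q->tail)
        return 0;
    q->data[h] = v;
    q->head = next;           /* publish only after the data is stored */
    return 1;
}

/* Consumer side (e.g. timer ISR writing to the DAC). Returns 0 if empty. */
int q_dequeue(queue_t *q, uint8_t *v) {
    unsigned t = q->tail;
    if (t == q->head)
        return 0;
    *v = q->data[t];
    q->tail = (t + 1u) % Q_SIZE;
    return 1;
}
```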

Serial Communications

• Serial communications
  • Concepts
  • Tools
  • Software: polling, interrupts and buffering
  • Processing binary and text messages
• UART communications
  • Concepts
  • KL25 UART peripheral
• SPI communications
  • Concepts
  • KL25 SPI peripheral
• I2C communications
  • Concepts
  • KL25 I2C peripheral

[Figure: UART data sampling: each bit lasts T; measured from time zero at the start of the frame, the receiver samples each bit near its middle, at 1.5T, 2.5T, …, 10.5T]

[Figure: state machine for receiving text sentences: "$" starts a sentence; talker and sentence-type characters are appended to a buffer and accepted only for $SDDBT, $VWVHW or $YXXDR (a premature *, \r or \n, a non-text character, or more than 6 characters restarts the search); accepted sentence-body characters are enqueued up to the "*"; the next two characters are saved as checksum1 and checksum2. send_string and get_string exchange data with tx_isr and rx_isr through the serial interface.]

Demo, Lab and Projects

• Demo
  • I2C communication with onboard accelerometer
  • Console interface with a UART
• Lab
  • UART performance and timing analysis
• Projects
  • UART: Creating a speedometer with a GPS receiver
  • SPI: Interfacing with an SD memory card

Improving System Robustness
• Low voltage detector
• Watchdog timer
• Defensive programming
• Stack depth analysis
• Testing and test coverage

[Figure: watchdog timeline (start WDT, restart WDT, restart WDT; the WDT times out and resets the system); RAM map with global data, heap, and Thread A and Thread B stacks watched by a stack monitor]

Direct Memory Access

• Basic Concepts
  • DMA peripheral
  • Selecting trigger sources
• DMA Peripherals in Cortex-M0+
• DMA Applications
  • Data Transfer
  • Replacing ISRs

Demo, Lab and Project

• Demo
  • Memory copy
  • Waveform playback to DAC without ISR
• Lab
  • Evaluate memory copy speed with different DMA configurations
• Project
  • Remote data acquisition system with serial control and analog data input

ADVANCED COURSE

Advanced Design, Analysis and Optimization

• Advanced Scheduling and Design
  • Cooperative and Preemptive Schedulers
  • Designing Multithreaded Systems
  • Sharing Data Safely with RTOS Support
  • Timing Analysis of Real-Time Systems
  • Advanced Debugging with Cortex-M0+
• Analysis and Optimizations
  • Code Execution Speed
    • Analysis
    • Optimization
  • Power and Energy
    • Analysis and Modeling
    • Optimization
  • Memory Requirements
    • Analysis
    • Optimization

Cooperative and Preemptive Schedulers

• Task scheduling and response time
  • Prioritization and preemption
• Non-preemptive scheduling
  • Task states and scheduling rules
  • Scheduler implementation
  • RTX API and example
  • Limitations
    • Response time
    • Prioritization
    • Software structure
• Preemptive scheduling
  • Task states and scheduling rules
  • Scheduler implementation and context switching mechanics
  • RTX API and example
  • Limitations
    • Data sharing
    • Memory use

Designing Multithreaded Systems

• What is in an RTOS?
  • Bounded response times
  • Preemption
  • Time management
  • Synchronization
  • Task creation & control
  • Signaling events
• Sharing data safely
  • Concepts
    • Data atomicity
    • Readers and writers
    • Race conditions
  • Mechanisms
    • Synchronization objects
    • Scheduler lock
    • Interrupt masking

Creating Real-Time Systems
• Periodic task model
  • Releases, periods and deadlines
  • Worst-case execution time (analytical and empirical)
  • Simplifying assumptions
• Metrics
  • Response time
  • Schedulability
• Priority-based scheduling
  • Priority assignment
  • Fixed priority (RM, DM)
    • Response time analysis
    • Utilization bound schedulability test
  • Dynamic priority (EDF)
    • Response time analysis
    • Utilization bound schedulability test

[Figure: timeline from 0 to 12 showing the periodic releases of tasks P1, P2 and P3]

Response time of task i, where C_i is its worst-case execution time, T_j the period of task j, and hp(i) the set of tasks with higher priority than i:

$$R_i = C_i + \sum_{j \in hp(i)} \left\lceil \frac{R_i}{T_j} \right\rceil C_j$$

Utilization bound for m tasks under rate-monotonic priorities:

$$U_{max} = m\left(2^{1/m} - 1\right)$$
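The response-time equation is usually solved by fixed-point iteration, starting from R_i = C_i. This sketch assumes integer time units, deadlines equal to periods, and tasks sorted by priority (index 0 highest); the function names and the task set in the usage note are hypothetical.

```c
#include <stdint.h>

/* Iterate R = C[i] + sum over j < i of ceil(R / T[j]) * C[j] until it stops
 * changing. Returns R_i, or 0 if R_i would exceed the deadline T[i].
 * Assumes every C[i] > 0 and T[j] > 0. */
uint32_t response_time(unsigned i, const uint32_t C[], const uint32_t T[]) {
    uint32_t r = C[i], prev = 0;
    while (r != prev) {
        if (r > T[i])
            return 0;                                   /* deadline missed */
        prev = r;
        r = C[i];
        for (unsigned j = 0; j < i; j++)
            r += ((prev + T[j] - 1) / T[j]) * C[j];     /* ceil(prev / T[j]) */
    }
    return r;
}

/* 2^(1/m) by bisection, so the sketch does not need libm. */
static double root_of_two(unsigned m) {
    double lo = 1.0, hi = 2.0;
    for (int k = 0; k < 60; k++) {
        double mid = 0.5 * (lo + hi), p = 1.0;
        for (unsigned e = 0; e < m; e++)
            p *= mid;
        if (p < 2.0) lo = mid; else hi = mid;
    }
    return 0.5 * (lo + hi);
}

/* Rate-monotonic utilization bound U_max = m (2^(1/m) - 1). */
double rm_utilization_bound(unsigned m) {
    return m * (root_of_two(m) - 1.0);
}
```

With C = (1, 2, 3) and T = (4, 6, 12), utilization is about 0.83, above the three-task bound of about 0.78, yet the iteration gives R = 10 ≤ 12 for the lowest-priority task. The utilization test is sufficient but not necessary, which is why response time analysis is worth the extra work.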

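Handling More Complex RT Systems
• Supporting aperiodic tasks
• Non-zero context switch times
• Dependencies between tasks
• Dealing with priority inversion
• Dealing with WCET >> ACET
• Supporting different deadlines

Response time including the blocking time B_i that task i can suffer from lower-priority tasks:

$$R_i = C_i + B_i + \sum_{j \in hp(i)} \left\lceil \frac{R_i}{T_j} \right\rceil C_j$$

[Figure: priority inversion example with high (H), medium (M) and low (L) priority tasks sharing a resource (R), steps 1 to 5]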

Lab and Projects

• Lab: Response Time Comparison
  • Non-preemptive, non-prioritized
  • Non-preemptive, prioritized
  • Preemptive, prioritized
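The non-preemptive, prioritized configuration can be sketched as a run-to-completion scheduler loop. This minimal version (task table, ready flags and task bodies are hypothetical) picks the highest-priority ready task each time; scanning onward from the last task run instead of from index 0 would give a non-prioritized, round-robin variant.

```c
#include <stdint.h>

#define NUM_TASKS 3

typedef void (*task_fn)(void);

/* Task table in priority order: index 0 is the highest priority. */
typedef struct {
    task_fn run;
    volatile uint8_t ready;   /* set by an ISR or another task */
} task_t;

static task_t tasks[NUM_TASKS];

/* Hypothetical example tasks that just count how often they ran. */
static unsigned run_count[NUM_TASKS];
static void task_hi(void)  { run_count[0]++; }
static void task_mid(void) { run_count[1]++; }
static void task_lo(void)  { run_count[2]++; }

void scheduler_init(void) {
    tasks[0].run = task_hi;
    tasks[1].run = task_mid;
    tasks[2].run = task_lo;
}

/* Request a run of task i (releases that arrive while it is already
 * ready are merged into one run). */
void task_release(unsigned i) { tasks[i].ready = 1; }

/* One scheduling decision: run the highest-priority ready task to
 * completion, then return its index, or -1 if no task was ready. */
int scheduler_step(void) {
    for (unsigned i = 0; i < NUM_TASKS; i++) {
        if (tasks[i].ready) {
            tasks[i].ready = 0;
            tasks[i].run();   /* runs to completion; nothing preempts it */
            return (int)i;
        }
    }
    return -1;
}
```

On the target the main loop would simply be `for (;;) scheduler_step();`. A long-running low-priority task still delays a newly released high-priority task until it finishes, which is the response-time limitation the lab measures.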

• Projects
  • Worst-case execution time analysis for the Cortex-M0+
  • Instrumenting RTX and verifying task timing

Image courtesy Sparkfun

Advanced Debugging with Cortex-M0+

• CoreSight
  • Micro Trace Buffer
  • Data Watchpoints
  • Hardware breakpoints
• Kernel-aware debugging in Keil MDK

Code Execution Speed Analysis

• Overview
• Timing Methods
  • Timer peripheral
  • Scope with twiddle bits
• Profiling Methods
  • Program counter sampling
  • Hardware trace support
• Object Code Inspection
  • How to keep from getting lost

Code Execution Speed Optimization

• Rationale and Trade-offs
  • Design-time vs. Compile-time Optimizations
  • Maintainability and Portability vs. Fast Code
  • The Evils of Premature Optimization
  • Trust but Verify (the Compiler)
• Using the Compiler to the Fullest
  • What should the compiler be able to do?
  • Helping and persuading the compiler
• Precalculation
• Algorithms, Data Organization and Data Structures
• Math
  • Fixed point and integer math
  • Reduced precision floating-point math
  • Polynomial approximations
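As an example of fixed-point math, Q15 format stores a value in [-1, 1) as a 16-bit integer scaled by 2^15, the same representation CMSIS-DSP uses for `q15_t`. A minimal sketch with hypothetical helper names (the type is called `q15` to avoid clashing with the CMSIS header). It assumes `>>` on negative values is an arithmetic shift, which holds for the usual ARM and host compilers but is implementation-defined in C.

```c
#include <stdint.h>

typedef int16_t q15;   /* value = raw / 32768 */

/* Convert a double in [-1, 1) to Q15, rounding and saturating. */
q15 q15_from_double(double x) {
    double scaled = x * 32768.0;
    if (scaled >= 32767.0)  return 32767;
    if (scaled <= -32768.0) return -32768;
    return (q15)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
}

/* Q15 multiply: the 32-bit product is Q30, so shift right by 15 to get
 * back to Q15. Only (-1) * (-1) overflows; saturate that case. */
q15 q15_mul(q15 a, q15 b) {
    int32_t p = ((int32_t)a * b) >> 15;
    if (p > 32767)
        p = 32767;
    return (q15)p;
}
```

On the Cortex-M0+, which has no floating-point unit, `q15_mul` compiles to an integer multiply and a shift, whereas a `float` multiply calls a software library routine.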

Cortex-M Optimizations
• CMSIS-DSP Library
  • Primarily supports vectors and matrices for DSP
  • Versions optimized for Cortex-M0, M0+, M3, or M4
  • Data types
    • Floating point
    • Fixed point: q7, q15, q31, q63
  • Wide range of functions available
    • Basic math, fast math, complex math
    • Filters, transforms, matrix functions
    • Motor control
    • Statistical functions
    • Support functions
    • Interpolation functions
• Cortex-M0+-specific coding practices

Demo and Project

• Demos
  • Profiling and optimizing spherical geometry code
• Project
  • Profiling and optimizing a tilt-compensated compass
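Profiling by program counter sampling can be sketched as a histogram: a periodic timer ISR reads the interrupted PC (on Cortex-M, the PC saved in the exception stack frame) and credits the function whose address range contains it. The region table, names and addresses are hypothetical; real ranges would come from the linker map file.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry per function: address range [start, end) plus a sample count. */
typedef struct {
    const char *name;
    uint32_t start, end;
    uint32_t samples;
} region_t;

/* Credit the sampled PC to the region containing it. Linear search keeps
 * the sketch short; a binary search over sorted regions is faster inside
 * an ISR. Samples outside every region are ignored. */
void profile_sample(region_t *regions, size_t n, uint32_t pc) {
    for (size_t i = 0; i < n; i++) {
        if (pc >= regions[i].start && pc < regions[i].end) {
            regions[i].samples++;
            return;
        }
    }
}
```

After enough samples, each region's share of the total approximates the share of execution time spent in that function, which points optimization effort at the real hot spots.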

Power and Energy Analysis
• Modeling fundamentals
  • Basic models
  • Static & dynamic power
  • Optimizing for power vs. energy
• Power systems
  • Voltage regulators and switching converters
  • Switches (diodes and transistors)
  • Modeling system power

• Measuring current and power
  • KL25Z processor
  • Freedom board
  • Sampling V and I
• Measuring energy
  • Sampling V and I and integrating
  • Using an ultracapacitor

[Figure: capacitor voltage vs. time, falling from V1 to V2 over Δt]

Demos and Labs

• Demos
  • Power measurement
  • Ultracapacitor-based energy measurement

MCU current (µA) at VDD = 3 V:

MCU power mode | Normal | LL | VLL | VLP
Run | 5000 | | | 250
Wait | 3700 | | | 135
Stop | 345 | 1.9 | | 4.4
Stop 3 | | | 1.4 |
Stop 1 | | | 0.77 |
Stop 0 | | | 0.38 |

• Labs
  • Freedom board power analysis: where does the power go?
  • Energy analysis: how long will the ultracap power the board?

Power and Energy Optimization

• System optimization challenges
• Optimizations for peripherals
  • Standby and low-power modes
  • Clock gating
  • Voltage scaling and conversion
  • Frequency scaling
• Optimizations for the MCU
  • Voltage scaling
  • Clock frequency scaling
  • Voltage and frequency scaling
  • Active & sleep modes
• Integrating power and energy management into task schedulers
  • Non-real-time systems
  • Real-time systems

[Figures: average power (mW, 0 to 1.6) vs. MCU clock frequency (0 to 50 MHz), with curves for P_Active, P_Sleep and P_average; average power (mW) comparison chart]

Analysis of Memory Requirements

• Memory is free up to a point … after which it becomes expensive

• Reducing memory requirements may enable
  • Use of a less expensive MCU
  • Implementation of a more sophisticated algorithm
  • More diagnostic code
  • More fault logging

• Determining Memory Requirements
  • Program memory use
  • Static, automatic and dynamic data
  • Code memory
  • Understanding the linker map file

Optimization of Memory Requirements

• Optimizing Data Memory
  • Language support for read-only data
  • Toolchain support
  • Memory models
  • Data sizes
  • Packed data structures and bitfields
  • Improving stack memory size estimation
  • Shrinking activation records
  • Using stack-friendly functions
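To illustrate packed data structures, reordering fields alone can remove padding, and bitfields can pack small values further. The sizes in the comments are typical for the ARM EABI and common host ABIs, not guaranteed by C; the struct names are illustrative, bitfield layout is implementation-defined, and `count` is limited to 24 bits in the bitfield version.

```c
#include <stdint.h>

/* uint32_t is 4-byte aligned on typical ABIs, so padding follows each
 * uint8_t here. */
struct padded {
    uint8_t  flag;    /* 1 byte + 3 bytes padding */
    uint32_t count;   /* 4 bytes */
    uint8_t  mode;    /* 1 byte + 3 bytes tail padding */
};                    /* typically 12 bytes */

/* Same fields, largest first. */
struct reordered {
    uint32_t count;   /* 4 bytes */
    uint8_t  flag;    /* 1 byte */
    uint8_t  mode;    /* 1 byte + 2 bytes tail padding */
};                    /* typically 8 bytes */

/* Bitfields share one 32-bit word. */
struct packed_bits {
    uint32_t count : 24;
    uint32_t flag  : 1;
    uint32_t mode  : 3;
};                    /* typically 4 bytes */
```

Packing trades RAM for code: extracting a bitfield costs extra shift and mask instructions on every access, so it pays off mainly for large arrays of records.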

Optimization of Memory Requirements

• Optimizing Code Memory
  • Language support
  • Toolchain configuration
  • Function outlining
  • Memory models
  • Optimized library variants
  • Improving similar or identical code
• Optimization for Multitasking Systems
  • Improving the accuracy of stack memory estimates
  • Reducing or eliminating preemption
  • Combining tasks to reduce stack count
  • Preemption threshold scheduling

[Figure: RAM used by task statics and stacks for Tasks 1 to 4 under non-preemptive, preemptive and dynamic scheduling, with each stack sized to its task's maximum]

EXAMPLE PROJECTS

INFRARED PROXIMITY SENSOR

Concepts

[Figure: left, no object present, so no IR is reflected back to the receiver; right, an object present reflects IR back to the receiver]

• Detect an object by shining infrared light and measuring its impact on the ambient light level

• Components
  • Infrared LED: emits IR light
  • Infrared phototransistor: conducts more current as IR increases

Using Differential Measurements

• Basic approach of measuring ambient + reflected light is unreliable
  • Vulnerable to changes in ambient light due to flicker in light sources, shadows, etc.
• Use differential measurements instead
  • Measure brightness with IRLED off
  • Measure brightness with IRLED on
  • Difference in brightness levels indicates amount of IR reflected

[Figure: measure IR level with IRLED off; measure IR level reflected with IRLED on]

Response Time Issues

• IR sensor (phototransistor) does not respond instantaneously
  • Has internal capacitance which must be charged or discharged
  • Rate of change depends on brightness
• Need to introduce time delay in processing sequence
  • Change IRLED value
  • Wait for some time for sensor to respond
  • Read IR sensor

[Figure: IRLED switching off and on while the IR sensor output moves gradually between darker and lighter levels]
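The differential measurement with a settling delay can be sketched as follows. The hardware hooks (`set_ir_led`, `read_ir_adc`, `delay_us`) are hypothetical; on the KL25Z they would drive a GPIO pin, run an ADC conversion and wait on a timer. A host-side fake with made-up readings is included so the function can be tried without hardware.

```c
#include <stdint.h>

/* Hardware hooks supplied by the application (hypothetical names). */
typedef struct {
    void     (*set_ir_led)(int on);
    uint16_t (*read_ir_adc)(void);
    void     (*delay_us)(uint32_t us);
} ir_hw_t;

/* Sample with the IR LED off and then on, waiting settle_us after each
 * change so the phototransistor can respond. Returns on - off; the sign
 * depends on the sensor circuit (here a stronger reflection is assumed to
 * raise the ADC reading). */
int32_t ir_reflected(const ir_hw_t *hw, uint32_t settle_us) {
    hw->set_ir_led(0);
    hw->delay_us(settle_us);
    int32_t off = hw->read_ir_adc();
    hw->set_ir_led(1);
    hw->delay_us(settle_us);
    int32_t on = hw->read_ir_adc();
    hw->set_ir_led(0);            /* leave the LED off to save power */
    return on - off;
}

/* Host-side fake: ambient light reads 1000 counts, reflected IR adds 300. */
static int fake_led;
static void fake_set_led(int on) { fake_led = on; }
static uint16_t fake_read(void) { return (uint16_t)(fake_led ? 1300 : 1000); }
static void fake_delay(uint32_t us) { (void)us; }
const ir_hw_t fake_hw = { fake_set_led, fake_read, fake_delay };
```

Because ambient light appears in both readings, it cancels in the difference; the settling delay must be long enough for the darkest expected conditions, where the sensor responds most slowly.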

BUBBLE LEVEL WITH TEXT LCD

Learning Objectives

• Configuring GPIO pins for input and output
• Interfacing with Text LCD controller
• Developing Code
  • Printing text
  • Interfacing with accelerometer

Hardware Overview

[Figure: MCU connected to a text LCD module through Enable (E), Read/~Write (R/~W), Register Select (RS) and data bus DB0-7; inside the module an HD44780 (or equivalent) controller drives the LCD glass through 40-column drivers; supply VDD, ground VSS and contrast adjustment VO]

• MCU communicates with LCD controller via
  • Parallel data bus (DB0-7)
  • Read/~Write
  • Register Select (register or data?)
  • Enable
• LCD controller interprets commands from MCU
  • Write to display memory
  • Change configuration
  • Read status
  • Read memory

LCD Controller Instructions

Instruction | RS | R/W | B7 B6 B5 B4 B3 B2 B1 B0 | Description
Clear display | 0 | 0 | 0 0 0 0 0 0 0 1 | Clears display and returns cursor to the home position (address 0).
Cursor home | 0 | 0 | 0 0 0 0 0 0 1 * | Returns cursor to home position.
Entry mode set | 0 | 0 | 0 0 0 0 0 1 I/D S | Sets cursor move direction (I/D); specifies to shift the display (S). These operations are performed during data read/write.
Display on/off control | 0 | 0 | 0 0 0 0 1 D C B | Sets on/off of all display (D), cursor on/off (C), and blink of cursor position character (B).
Cursor/display shift | 0 | 0 | 0 0 0 1 S/C R/L * * | Cursor-move or display-shift (S/C), direction (R/L).
Function set | 0 | 0 | 0 0 1 DL N F * * | Sets interface data length (DL), number of display lines (N), and character font (F).
Set CGRAM address | 0 | 0 | 0 1, then CGRAM address in B5-B0 | Sets the CGRAM address. CGRAM data are sent and received after this setting.
Set DDRAM address | 0 | 0 | 1, then DDRAM address in B6-B0 | Sets the DDRAM address. DDRAM data are sent and received after this setting.
Read busy flag & address counter | 0 | 1 | BF, then CGRAM/DDRAM address in B6-B0 | Reads busy flag (BF) indicating internal operation being performed and reads CGRAM or DDRAM address counter contents.
Write CGRAM or DDRAM | 1 | 0 | Write data | Write data to CGRAM or DDRAM.
Read from CG/DDRAM | 1 | 1 | Read data | Read data from CGRAM or DDRAM.

(* = don't care)
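The instruction table maps directly onto command bytes. A sketch of helpers that build them for an HD44780-compatible controller; the helper names are illustrative, and actually sending a byte also requires driving RS, R/~W, E and DB0-7 with the controller's timing, which is omitted here.

```c
#include <stdint.h>

/* Command bytes from the instruction table (all with RS = 0, R/W = 0). */
#define LCD_CLEAR 0x01u
#define LCD_HOME  0x02u

static inline uint8_t lcd_entry_mode(int increment, int shift) {
    return (uint8_t)(0x04u | (increment ? 0x02u : 0u) | (shift ? 0x01u : 0u));
}

static inline uint8_t lcd_display_ctrl(int display, int cursor, int blink) {
    return (uint8_t)(0x08u | (display ? 0x04u : 0u) | (cursor ? 0x02u : 0u)
                     | (blink ? 0x01u : 0u));
}

static inline uint8_t lcd_function_set(int eight_bit, int two_lines, int font_5x10) {
    return (uint8_t)(0x20u | (eight_bit ? 0x10u : 0u) | (two_lines ? 0x08u : 0u)
                     | (font_5x10 ? 0x04u : 0u));
}

static inline uint8_t lcd_set_ddram_addr(uint8_t addr) {
    return (uint8_t)(0x80u | (addr & 0x7Fu));
}

/* Cursor position on a two-line display: line 2 starts at DDRAM 0x40. */
static inline uint8_t lcd_goto(unsigned row, unsigned col) {
    return lcd_set_ddram_addr((uint8_t)((row ? 0x40u : 0x00u) + col));
}
```

A typical 8-bit, two-line initialization sends `lcd_function_set(1, 1, 0)` (0x38), `lcd_display_ctrl(1, 0, 0)` (0x0C), `LCD_CLEAR` and `lcd_entry_mode(1, 0)` (0x06), polling the busy flag or waiting between commands.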

POWER AND ENERGY MEASUREMENT LAB

Learning Objectives

• Using a timer peripheral
  • Low-power timer configuration
  • Interrupt handlers
• Low-Power MCU operation
  • Low power modes
  • Run and stop modes
  • Low-leakage wakeup unit
• Measuring power and energy
  • Concepts
  • Using an ultracapacitor

System Power

• The MCU itself uses at most about 15 to 20 mW
• Many embedded systems have peripheral circuitry, which also draws power
  • So we need to consider that as well
• The Freedom board is no exception; it is a good example

Freedom KL25Z Board Power Architecture

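[Figure: Freedom KL25Z board power architecture. Labels: jumpers J3, J4 and J14; supply nets P5-9_VIN, P5V_SDA, P5V_KL25Z, P3V3, P3V3_SDA and P3V3_KL25Z; linear 3.3 V regulator; KL25Z MCU; OpenSDA interface (with reset); coin cell; inertial sensor; RGB LEDs]

• We will evaluate the impact of disconnecting the OpenSDA interface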

How Do We Measure Energy?
• Energy: $W(T) = \int_0^T V(t)\, I(t)\, dt$
  • V and I will vary as we turn devices on and off, change clock speeds, etc.
  • We need something to integrate power over time

• Solutions
  • Use a sampling energy meter, but a low sample rate will limit accuracy
    • The MCU may wake up for only a few microseconds before going back to sleep
  • Power the circuit from a capacitor with a known capacitance C, then calculate the capacitor's energy before and after the test
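Both measurement approaches reduce to simple arithmetic. This sketch shows trapezoidal integration of sampled V and I, and the capacitor method using the stored energy CV²/2; the function names are illustrative, and real sample data would come from the meter or ADC.

```c
#include <stddef.h>

/* Energy (J) from n voltage and current samples taken every dt_s seconds:
 * trapezoidal integration of p(t) = v(t) * i(t). */
double sampled_energy_j(const double v[], const double i[], size_t n, double dt_s) {
    double w = 0.0;
    for (size_t k = 1; k < n; k++) {
        double p0 = v[k - 1] * i[k - 1];
        double p1 = v[k] * i[k];
        w += 0.5 * (p0 + p1) * dt_s;
    }
    return w;
}

/* Energy (J) drawn from a capacitor of c_f farads as its voltage falls
 * from v1 to v2: W = C (V1^2 - V2^2) / 2. */
double cap_energy_j(double c_f, double v1, double v2) {
    return 0.5 * c_f * (v1 * v1 - v2 * v2);
}

/* Average power (W) over a test lasting t_s seconds: P = W / t. */
double cap_avg_power_w(double c_f, double v1, double v2, double t_s) {
    return cap_energy_j(c_f, v1, v2) / t_s;
}
```

The sampled method misses activity shorter than dt_s, such as brief wake-ups, while the capacitor method captures every joule drawn, at the cost of giving only the total energy for the interval.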

Capacitor-Based Energy Measurement

• Measure the capacitor voltage before (V1) and after (V2) the test
• The energy W used by the circuit can be calculated as
  $W = \frac{C}{2}\left(V_1^2 - V_2^2\right)$
• Average power is energy divided by time:
  $P = \frac{C\left(V_1^2 - V_2^2\right)}{2t}$
• A constant-current load I will take t seconds to discharge the capacitor from V1 to V2:
  $t = \frac{C\left(V_1 - V_2\right)}{I}$
• A constant-resistance load R will take t seconds to discharge the capacitor from V1 to V2:
  $t = -RC \ln\frac{V_2}{V_1}$

[Figure: capacitor voltage vs. time, falling from V1 to V2 over Δt]

Summary

• Ready-to-use teaching material
• Supports a range of embedded systems courses
  • Introductory
    • Programming a microcontroller in C
  • Advanced
    • Optimizing response time
    • Optimizing execution speed
    • Optimizing memory use
    • Optimizing power and energy
• Targets the Freescale Freedom-KL25Z MCU platform

• For more information, contact
  • [email protected]
  • [email protected]
  • www.arm.com/university
