STM32 MICROCONTROLLER
Introduction to Embedded Systems Design

Introduction
• What is an embedded system? • Application-specific system • Built into a larger system

• Why add a computer to the larger system? • Better performance • More functions and features • Lower cost • More dependability

• Economics • Microcontrollers (used for embedded computers) are high-volume, so recurring cost is low • Nonrecurring cost dominated by software development

• Networks • Often an embedded system will use multiple processors communicating across a network to lower parts and assembly costs and improve reliability

Example Embedded System: Bike Computer
• Functions • Speed and distance measurement

• Constraints • Size • Cost • Power and Energy • Weight

• Inputs • Wheel rotation indicator • Mode key

• Output • Liquid Crystal Display

• Low performance MCU • 8-bit, 10 MIPS

Gasoline Automobile Engine Control Unit

• Functions • Fuel injection • Air intake setting • Spark timing • Exhaust gas circulation • Electronic throttle control • Knock control

Image courtesy of Freescale

• Constraints • Reliability in harsh environment • Cost • Weight

• Many Inputs and Outputs • Discrete sensors & actuators • Network interface to rest of car

• High Performance MCU • 32-bit, 3 MB flash memory, 150 - 300 MHz

Benefits of Embedded Computer Systems

• Greater performance and efficiency • Software makes it possible to provide sophisticated control

• Lower costs • Less expensive components can be used • Manufacturing costs reduced • Operating costs reduced • Maintenance costs reduced

• More features • Many not possible or practical with other approaches

• Better dependability • Adaptive system which can compensate for failures • Better diagnostics to improve repair time

Embedded System Functions

• Closed-loop control system • Monitor a process, adjust an output to maintain desired set point (temperature, speed, direction, etc.)

• Sequencing • Step through different stages based on environment and system

• Signal processing • Remove noise, select desired signal features

• Communications and networking • Exchange information reliably and quickly

• Interfacing with larger system and environment • Analog signals for reading sensors • Typically use a voltage to represent a physical value • Power electronics for driving motors, solenoids • Digital interfaces for communicating with other digital devices • Simple - switches • Complex - displays

Microcontroller vs. Microprocessor

• Both have a CPU core to execute instructions

• Microcontroller has peripherals for concurrent embedded interfacing and control • Analog • Non-logic level signals • Timing • Clock generators • Communications • point to point • network • Reliability and safety

Microcontroller vs. Microprocessor
• Roughly speaking: MCU = CPU + peripherals (e.g. memory, programmable input/output peripherals)

• ARM provides ARM IPs like Cores, internal bus, controllers, etc.

• But MCUs are not created equal! MCUs from different vendors really vary due to different design decisions: • Architecture • Implementation • Processing optimization • Peripherals • Power management • Preferred tool chains • ……

Attributes of Embedded Systems

• Concurrent, reactive behaviors

• Must respond to sequences and combinations of events

• Real-time systems have deadlines on responses

• Typically must perform multiple separate activities concurrently

MCU Hardware & Software for Concurrency

• CPU executes instructions from one or more threads of execution • Specialized hardware peripherals add dedicated concurrent processing • DMA - transferring data between memory and peripherals • Watchdog timer • Analog interfacing • Timers • Communications with other devices • Detecting external signal events • Peripherals use interrupts to notify CPU of events

Concurrent Hardware & Software Operation

[Figure: timeline of alternating software and hardware activity]

• Embedded systems rely on both MCU hardware peripherals and software to get everything done on time

Cortex M-x Technical presentation

Contents

• Cortex M-x general comparison, migration and selection

• Overview of Cortex M0, M3, M4 and M7 cores

• Main technical features of Cortex M0 core (optional)

• Main technical features of Cortex M3 core (optional)

• DSP and FPU insights of Cortex M4 (optional)

• Main Peripherals

Cortex M-x general comparison, migration and selection

Cortex-M Processor Portfolio
• Traditional 8/16/32-bit classification is obsolete
• Cortex-M0: “8/16-bit” applications • Lowest cost • Optimized connectivity
• Cortex-M3: “16/32-bit” applications • Performance efficiency • Feature-rich connectivity
• Cortex-M4: “32-bit/DSC” applications • MCU plus DSP • Accelerated SIMD, FP and DSP
• Performance increases from Cortex-M0 to Cortex-M4
• Binary and tool compatible

Cortex™-M microcontroller – power highlights
• Active current < 200 µA/MHz • Sleep mode current < 50 µA • Deep Sleep mode current < 1 µA

Cortex M common features
Targeting the microcontroller applications

• Very good power and area optimization • Designed for low cost and low power
• Automatic state saving on interrupt and exceptions • Low software overhead on exception entry and exit
• Deterministic instruction execution timing • Instructions always take same time to execute, from a deterministic memory system

Cortex-M feature set comparison

Feature | Cortex-M0 | Cortex-M3 | Cortex-M4
Architecture version | v6-M | v7-M | v7E-M
Instruction set architecture | Thumb, Thumb-2 system instructions | Thumb + Thumb-2 | Thumb + Thumb-2, DSP, SIMD, FP
DMIPS/MHz | 0.9 | 1.25 | 1.25
Bus interfaces | 1 | 3 | 3
Integrated NVIC | Yes | Yes | Yes
Number of interrupts | 1-32 + NMI | 1-240 + NMI | 1-240 + NMI
Interrupt priorities | 4 | 8-256 | 8-256
Breakpoints, watchpoints | 4/2/0, 2/1/0 | 8/4/0, 2/1/0 | 8/4/0, 2/1/0
Memory Protection Unit (MPU) | No | Yes (option) | Yes (option)
Integrated trace option (ETM) | No | Yes (option) | Yes (option)
Fault Robust Interface | No | Yes (option) | No
Single-cycle multiply | Yes (option) | Yes | Yes
Hardware divide | No | Yes | Yes
WIC support | Yes | Yes | Yes
Bit-banding support | No | Yes | Yes
Single-cycle DSP/SIMD | No | No | Yes
Floating point hardware | No | No | Yes (option)
Bus protocol | AHB Lite | AHB Lite, APB | AHB Lite, APB
CMSIS support | Yes | Yes | Yes

Cortex-M processors binary compatible

Cortex-M – firmware compatibility (1/2)

• Cortex M processors are FW and binary compatible

• Migration path M0->M3->M4 is straightforward • Instruction set of Cortex-Mx is strictly included in the instruction set of Cortex-My (for x < y)

• Re-compilation of the code is recommended • From Cortex-M0 to Cortex-M3, in order to fully take advantage of the higher performance ISA (e.g. HW division) • From M0/M3 to M4 w/ FPU, in order to generate the FPU code

Cortex-M – firmware compatibility (2/2)

• Code density is equivalent on the different Cortex-M implementations • Code size differences for typical code are below a few percent, provided that the same optimization options are chosen in the compiler

• Within the STM32 family, common peripheral set compatibility is guaranteed in order to take full advantage of this simple migration path • Using the STM32 CMSIS library makes this porting very easy

Overview of Cortex M0, M3 and M4 cores

Cortex-M0 processor microarchitecture
• ARMv6-M Architecture • Thumb-2 Technology • Integrated configurable NVIC • Compatible with Cortex-M3

• Microarchitecture • 3-stage pipeline with branch speculation • 1x AHB-Lite Bus Interfaces

• Configurable for ultra low power • Deep Sleep Mode, Wakeup Interrupt Controller

• Flexible configurations for wider applicability • Configurable Interrupt Controller (1-32 Interrupts and Priorities) • No Memory Protection Unit • Optional Debug & Trace

M0 – a low cost Cortex-M processor

• Address “low end” applications, as an 8/16-bit MCU replacement • Similar gate count to 16-bit processors

• High performance; only 25-30% lower than the standard Cortex-M3 32-bit architecture

• Overall MCU price is lower than for most 8/16-bit processors because of the smaller code memory footprint (within a few percent of the Cortex-M3)

Cortex-M3 processor microarchitecture

• ARMv7-M Architecture • Thumb-2 Technology • Integrated configurable NVIC

• Microarchitecture • 3-stage pipeline with branch speculation • 3x AHB-Lite Bus Interfaces

• Configurable for ultra low power • Deep Sleep Mode, Wakeup Interrupt Controller

• Flexible configurations for wider applicability • Configurable Interrupt Controller (1-240 Interrupts and Priorities) • Optional Memory Protection Unit • Optional Debug & Trace

Cortex-M4 processor microarchitecture

• ARMv7E-M Architecture • Thumb-2 Technology • DSP and SIMD extensions • Single cycle MAC (Up to 32 x 32 + 64 -> 64) • Optional single precision FPU • Integrated configurable NVIC • Compatible with Cortex-M3

• Microarchitecture • 3-stage pipeline with branch speculation • 3x AHB-Lite Bus Interfaces

• Configurable for ultra low power • Deep Sleep Mode, Wakeup Interrupt Controller • Power down features for Floating Point Unit

• Flexible configurations for wider applicability • Configurable Interrupt Controller (1-240 Interrupts and Priorities) • Optional Memory Protection Unit • Optional Debug & Trace

M4 - another Cortex-M processor

• Address new markets requiring digital signal control • Digital Signal (Processor + Micro) Controller

[Figure: Cortex-M4 shown as the overlap of MCU and DSP features]
• MCU features: ease of use, C programming, interrupt handling, ultra low power
• DSP features: Harvard architecture, single cycle MAC, floating point, barrel shifter

• An intelligent blend of MCU and DSP features demanded • Upper limits of bandwidth challenged in general purpose MCUs • Hard to learn/program technology in many general purpose DSPs

• Extend the Cortex-M portfolio to cover new markets • Cortex-M0 for mixed signal devices and state machine replacements • Cortex-M3 for mainstream 32-bit microcontrollers • Opportunity - high end MCUs and DSC market

• Introduce ARM strengths to digital signal control market • Very high energy efficiency – more processing in less mW • Strong software ecosystem – easy to program and use

Main technical features of Cortex M0, M3 and M4 cores

Main technical features of Cortex M3 core

Cortex-M3 Processor

• Hierarchical processor integrating core and advanced system peripherals

• Cortex-M3 core • Harvard architecture • 3-stage pipeline w. branch speculation • Thumb ®-2 and traditional Thumb • ALU w. H/W divide and single cycle multiply

• Cortex-M3 Processor • Cortex-M3 core • Configurable interrupt controller • Bus matrix • Advanced debug components • Optional MPU & ETM

Cortex-M3 Processor Overview (1/2)

• ARM v7M Architecture • Thumb-2 Instruction Set Architecture • Mix of 16 and 32 bit instructions for very high code density
• Harvard architecture • Separate I & D buses allow parallel instruction fetching & data storage
• Integrated Nested Vectored Interrupt Controller (NVIC) for low latency interrupt processing • Vector Table is addresses, not instructions
• Designed to be fully programmed in C • Even reset, interrupts and exceptions
• Integrated Bus Matrix • Bus Arbiter • Bit Banding – Atomic Bit Manipulation • Write Buffer
• Memory Interface (I&D) Plus System Interface & Private Peripheral Bus
• Integrated System Timer (SysTick) for Real Time OS or other scheduled tasks

Cortex-M3 Processor Overview (2/2)

• 3-Stage Pipeline

• Fetch, Decode & Execute

• Single Cycle Multiply

Source | Destination | Cycles
16b x 16b | 32b | 1
32b x 16b | 32b | 1
32b x 32b | 32b | 1
32b x 32b | 64b | 3-7*

* UMULL, SMULL, UMLAL and SMLAL are interruptible and can also complete early depending on source values

• Hardware Division • UDIV & SDIV (unsigned or signed divide) • Instruction takes between 2 and 12 cycles depending on dividend and divisor • The closer the dividend and divisor are in magnitude, the faster the instruction completes • Instruction is interruptible (abandoned/restarted)

Cortex-M3 & ARM7: Key Features Comparison

Feature | ARM7TDMI-S | Cortex-M3
Architecture | v4T | v7M
ISA support | ARM (32-bit) & Thumb (16-bit) | Thumb-2 (merged 32/16-bit)
DMIPS/MHz | 0.74 Thumb / 0.93 ARM | 1.25 Thumb-2
Pipeline | 3-stage | 3-stage + branch speculation
Interrupts | FIQ / IRQ | NMI, SysTick and up to 240 interrupts. Integrated NVIC interrupt controller with up to 1-255 priorities
Interrupt latency | 24-42 cycles (depending on LSM) | 12 cycles (6 when tail chaining)
Memory map | Undefined | Architecture defined
System status | PSR. 6 modes. 20 banked regs | xPSR. 2 modes. Stacked regs (1 bank)
Sleep modes | No | Three

• Additional Features of the Cortex-M3 • Reduced pin debug & trace interfaces reduce pin overhead from 9 pins to 2 or 3 pins • Hardware Interrupt Handling removes need for assembler code in interrupts • Integrated atomic bit manipulation for improved data storage • Extended Data Watchpoints & Flash Patch technology • Embedded sleep control and power-down modes • Optional very small Memory Protection Unit (MPU) & Embedded Trace Macrocell (ETM)

High Performance CPU and Buses

ARM v7M Architecture: Harvard benefits with Von Neumann single memory space

• Von Neumann “bottleneck”: a single 32-bit bus for ♦ code execution, ♦ data transfer (core/DMA), ♦ peripheral control
• Cortex-M3: three 32-bit buses working in parallel for ♦ code execution, ♦ data transfer (core/DMA), ♦ peripheral control

[Figure: bus diagrams linking core, DMA, RAM, flash and peripherals for the Von Neumann and Cortex-M3 (CM3) cases]

[Figure: DMIPS versus fCPU for Cortex-M3, ARM966 (ARM), ARM7TDMI (ARM) and ARM7TDMI (Thumb)]

• Outstanding efficiency of 1.25 DMIPS/MHz

High Performance CPU and Buses

Thumb-2 instruction set provides 32-bit performance with 16-bit code density

[Figure: Thumb-2 combines the Thumb 16-bit instruction set, a subset of the ARM 32-bit instruction set, and new 16/32-bit instructions]

• Full Thumb compatibility
♦ Single powerful instruction set for better performance → No more mode switching
♦ Two 16-bit instruction fetches per flash access
• New 16/32-bit instructions • 1-cycle MAC and hardware divide • Bit handling

Bit Banding
Speed and code size optimized

Traditional method
• Disable external events • Read byte (RAM, register) • Mask and modify bit element • Write byte (RAM, register) • Enable external events

Cortex-M3 implementation
• 32-bit real memory image at @Rbase+N, example: 20000000h to 200FFFFFh
• Virtual aliased bit-banding image at @Vbase+Nx32+4xbit, example: 22000000h to 23FFFFFFh
♦ Bit banding done by bus matrix. ♦ Single instruction Read/Modify/Write (no more masking). ♦ No new instruction set → Use the standard data instructions (AND, OR, XOR…).

Optimized RAM, peripherals and IOs registers accesses
Easy multi-task semaphore management
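The alias address rule quoted above (@Vbase+Nx32+4xbit) can be sketched in C as follows. This is a minimal illustration for the SRAM region only, using the example base addresses from the slide; the function name `bitband_alias` is hypothetical and not part of any vendor library.

```c
#include <stdint.h>

/* Base addresses taken from the slide's example ranges. */
#define SRAM_BITBAND_BASE 0x20000000u /* Rbase: start of the real 1 MB region */
#define SRAM_ALIAS_BASE   0x22000000u /* Vbase: start of the 32 MB alias      */

/* Hypothetical helper: alias word address for one bit of one byte.
 * byte_addr must lie in 0x20000000..0x200FFFFF and bit in 0..7;
 * no range checking is done in this sketch. */
uint32_t bitband_alias(uint32_t byte_addr, uint32_t bit)
{
    uint32_t byte_offset = byte_addr - SRAM_BITBAND_BASE; /* N */
    return SRAM_ALIAS_BASE + byte_offset * 32u + bit * 4u;
}
```

On a target, writing 1 or 0 to the returned word address sets or clears the selected bit; for example bit 3 of the byte at 0x20000004 maps to 0x2200008C.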

Debug Capabilities

Serial Wire Debugging (SWD) for optimized device pin-out
♦ SWD uses fewer pins than JTAG, leaving more pins available for the application

Embedded break/watch capabilities for easy flashed application debugging ♦ 2 hardware breakpoints → 8 hardware breakpoints ♦ 2 hardware watchpoints

Serial Wire Viewer for targeted low bandwidth data trace ♦ Using serial wire interface or dedicated bus CKout+D[3..0] for better bandwidth ♦ Triggered by embedded break and watch points

ETM capability for better real time debugging ♦ Instruction trace only ♦ External signal triggering capability ♦ Can be used in parallel with data watchpoint

Debugging features remain available while the core is in low power mode

Privilege, Modes and Stacks

• Privileged/Non-privileged operation • Same as ARM7 Supervisor/User

• Thread mode and Handler mode • Handler mode is an exception or interrupt • Thread mode is just normal application code running

• Main stack – Process stack • Exceptions use main stack in privileged mode • Applications (thread mode) can use process stack

Privilege
• Code can execute as “privileged” or “unprivileged”
• Privileged operation • Also called supervisor privilege • Active out of reset • Entered whenever an exception or interrupt is taken • Privileged operation allows access to all processor resources
• Unprivileged operation • Also called user privilege • Limited access to processor resources, prevents: • Use of some instructions such as CPS to set FAULTMASK and PRIMASK, MSR fields • Access to System Control Space (SCS) registers such as NVIC and SysTick

Modes
• Cortex-M3 has 2 execution modes
• Handler mode • An exception is being processed • An exception handler or ISR is executing • Could be an interrupt or a fault • Always privileged execution
• Thread mode • No exception is being processed • Normal code is executing • Could be privileged or unprivileged
• When Thread mode has been changed from privileged to unprivileged, it cannot change itself back to privileged. Only a Handler can change the privilege of Thread mode.
• This model is a simplification of the modes from other ARM processors • Other ARM processors have several other modes

Stacks
• Cortex-M3 supports two stacks • Main Stack • Process Stack
• Exceptions use main stack
• Thread mode (no exceptions active) uses either main or process stack • SW selectable
• The intended usage model is • OS and Exceptions use main stack • Threads (user processes) use the process stack • Intended to prevent user process from modifying the main stack
• Must be configured in the Special Purpose Control Register • Accessed via MRS/MSR instructions
• Can be configured to use just one stack (Main) • This is default on reset

Privilege, Modes and Stacks Summary

Mode | Description | Operations (privileged out of reset) | Stacks (Main out of reset)
Handler | An exception is being processed | Privileged execution, full control | Main stack, used by OS and exceptions
Thread (mode out of reset) | No exception is being processed, normal code is executing | Privileged/Unprivileged | Main/Process
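The summary above can be modelled with a small host-side sketch. It assumes the ARMv7-M CONTROL register layout (bit 0 selects unprivileged Thread mode, bit 1 selects the process stack in Thread mode); the type `exec_state_t` and the function `decode_state` are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* CONTROL register bits, assuming the ARMv7-M layout. */
#define CONTROL_NPRIV (1u << 0) /* 1 = Thread mode is unprivileged        */
#define CONTROL_SPSEL (1u << 1) /* 1 = Thread mode uses the process stack */

typedef struct {
    bool privileged;    /* true: full access to processor resources */
    bool process_stack; /* true: process stack, false: main stack   */
} exec_state_t;

/* Hypothetical model: derive privilege and stack from mode and CONTROL. */
exec_state_t decode_state(bool handler_mode, uint32_t control)
{
    exec_state_t s;
    if (handler_mode) {
        /* Handler mode is always privileged and always uses the main stack. */
        s.privileged = true;
        s.process_stack = false;
    } else {
        s.privileged = (control & CONTROL_NPRIV) == 0u;
        s.process_stack = (control & CONTROL_SPSEL) != 0u;
    }
    return s;
}
```

With `control == 0`, the reset value, Thread mode is privileged and runs on the main stack, which matches the "out of reset" notes in the summary.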

Simplified Register Set

• Very simple, linear 4 GByte address space • No data pages • No code pages
• Very compiler friendly
• Flexible register scheme • Single-cycle multiply possible between any of the registers • Any register can be used as a pointer to data structures/arrays
• Same “visible” register file as any other ARM architecture device
• Allowing unified assembler to mimic equivalent ARM instructions in Thumb-2 mode
• Registers: R0 to R12, R13 (SP, with separate Main and Process SP), R14 (LR), R15 (PC), xPSR

xPSR – Program Status Register

Bits | 31 | 30 | 29 | 28 | 27 | 26-25 | 24 | 15-10 | 8-0
Field | N | Z | C | V | Q | ICI/IT | T | ICI/IT | ISR Number

• Allows access to APSR, EPSR and IPSR special purpose registers • PSR stored on stack during exceptions
• Condition code flags • N = Negative result from ALU • Z = Zero result from ALU • C = ALU Operation carried out • V = ALU Operation overflowed • Q = Saturated math overflow
• IT/ICI Bits • Contain IF-THEN base condition code and Interrupt Continue information
• ISR Number • ISR contains information on which exception was pre-empted

If… Then Conditional Blocks
• The If… Then instruction can be used to generate blocks of up to four instructions dependent on a single value of the condition code flags

    ; if (r0 == 0)
    ;   r0 = *r1 + 2
    ; else
    ;   r0 = *r2 + 4
    ; if
    CMP   r0, #0
    ITTEE EQ
    ; then
    LDREQ r0, [r1]
    ADDEQ r0, #2
    ; else
    LDRNE r0, [r2]
    ADDNE r0, #4

• The IT instruction itself does not change the condition codes
• In general, you should not branch into or out of an IT block • A branch may be written as the final instruction • SWI may be used anywhere
• IT blocks cannot be nested

Exception/Interrupt Handling

• Very low latency interrupt processing

• Exceptions processed in Privileged operation

• Interruptible LDM/STM for low interrupt latency

• Automatic processor state save and restore

• Provides low latency ISR entry and exit • Allows handler to be written entirely in ‘C’ • The Cortex-M3 processor integrates an advanced Nested Vectored Interrupt Controller (NVIC)

• The NVIC supports up to 240 dynamically re-prioritizable interrupts each with up to 256 levels of priority

• Allows early processing of interrupts

• Supports advanced features for next generation real-time applications

• Tail-chaining of pending interrupts • Late-arrival interrupt handling and priority boosting / inversion

Exceptional Control Capabilities Through Integrated Interrupt Handling

Interrupt Handling Method

• Interrupt handling is micro-coded. No instruction overhead

• Entry

• Processor state automatically saved to the stack over the data bus.

• {PC, xPSR, R0-R3, R12, LR} • In parallel, ISR is prefetched on the instruction bus.

• ISR ready to start executing as soon as stack PUSH complete. • Late arriving interrupt will restart ISR prefetch, but state saving does not need to be repeated. • Exit

• Processor state is automatically restored from the stack. • In parallel, interrupted instruction is prefetched ready for execution upon completion of stack POP. • Stack POP can be interrupted, allowing new ISR to be immediately executed without the overhead of state saving.

Interrupt Response – Tail Chaining

[Figure: timeline with IRQ1 (highest priority) and IRQ2]

ARM7 (interrupt handling in assembler code): PUSH (26 cycles), ISR 1, POP (16), PUSH (26), ISR 2, POP (16); 42 cycles between ISR 1 and ISR 2
Cortex-M3 (interrupt handling in HW): PUSH (12 cycles), ISR 1, tail-chaining (6), ISR 2, POP (12)

ARM7 • 26 cycles from IRQ1 to ISR1 entered • Up to 42 cycles if LSM • 42 cycles from ISR1 exit to ISR2 entry • 16 cycles to return from ISR2

Cortex-M3 • 12 cycles from IRQ1 to ISR1 entered • 12 cycles if LSM • 6 cycles from ISR1 exit to ISR2 entry • 12 cycles to return from ISR2
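As a worked example of the cycle counts above, the following sketch adds up the entry and exit overhead for a run of back-to-back interrupts. It assumes the nominal figures from the slide and no load/store-multiple delays; the function names are hypothetical.

```c
/* Nominal cycle counts quoted on the slide. */
enum {
    ARM7_PUSH = 26,
    ARM7_POP = 16,
    CM3_PUSH = 12,
    CM3_TAIL_CHAIN = 6,
    CM3_POP = 12
};

/* ARM7: every ISR pays a full PUSH and a full POP. */
unsigned arm7_overhead(unsigned n_isrs)
{
    return n_isrs * (ARM7_PUSH + ARM7_POP);
}

/* Cortex-M3: one PUSH, one POP, and a tail-chain step between ISRs. */
unsigned cm3_overhead(unsigned n_isrs)
{
    if (n_isrs == 0u) {
        return 0u;
    }
    return CM3_PUSH + (n_isrs - 1u) * CM3_TAIL_CHAIN + CM3_POP;
}
```

For the two-interrupt case on the slide this gives 84 cycles of overhead on ARM7 against 30 cycles on Cortex-M3.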

Interrupt Response – Preemption

[Figure: timeline with IRQ1 (highest priority) and IRQ2; IRQ2 arrives while ISR 1 is returning]

ARM7: ISR 1, POP (16 cycles), PUSH (26), ISR 2, POP (16); 42 cycles between ISR 1 and ISR 2
Cortex-M3: ISR 1, POP abandoned after 1-12 cycles, 6 cycles to enter ISR 2 (7-18 cycles in total), ISR 2, POP (12)

ARM7 • Load Multiple uninterruptible, and hence the core must complete the POP and the full stack PUSH

Cortex-M3 • POP may be abandoned early if another interrupt arrives • If POP is interrupted it only takes 6 cycles to enter ISR2 (equivalent to tail-chaining)

Interrupt Response – Late Arriving

[Figure: timeline with IRQ2 arriving first and IRQ1 (highest priority) arriving late]

ARM7: PUSH (26 cycles), PUSH (26), ISR 1, POP (16), ISR 2, POP (16)
Cortex-M3: PUSH, 6 cycles, ISR 1, tail-chaining, ISR 2, POP (12)

ARM7 • 26 cycles to ISR2 entered • Immediately pre-empted by IRQ1 and takes a further 26 cycles to enter ISR 1. • ISR 1 completes and then takes 16 cycles to return to ISR 2.

Cortex-M3 • Stack push to ISR 2 is interrupted • Stacking continues but new vector address is fetched in parallel • 6 cycles from late-arrival to ISR1 entry. • Tail-chain into ISR 2

Interrupt Response – Example

[Figure: request lines in priority order, highest first: NMI, IRQ1, IRQ2, IRQ3]

Cortex-M3 sequence: PUSH, PUSH, NMI, ISR 1, POP, ISR 2, ISR 3, POP

• ISR 2 starts • Push for ISR1 begins • Pre-empted by NMI • New instruction fetch in parallel minimises time to NMI • Following NMI processor tail-chains into ISR1 • ISR2 completed • Pop only occurs on return to “Main”

NVIC Registers
Each interrupt input has several registers to control it
• Enable/Disable Bit • Enable or disable the interrupt • Can be set, cleared or read
• Pending Bit • If the pending bit is set, then the interrupt is pending • An interrupt can be “pended” by setting the pending bit • A pending interrupt can only be taken (become active) if it is enabled and it has sufficient priority to run • Pending bit can be set, cleared or read
• Active Bit • A bit is set if the interrupt is executing or “active-stacked” • “Active-stacked” means the interrupt was executing, but was pre-empted by another higher-priority interrupt • Active register is normally read only
• Priority field • 4 bits of priority for each interrupt

Interrupt Prioritization

• Each interrupt source has an 8-bit interrupt priority value
• The 8 bits are divided into pre-empting priority levels and non-pre-empting “sub-priority” levels • Sub-priority levels only have an effect if the pre-empting priority levels are the same
• The software programmable PRIGROUP register field of the NVIC chooses how many of the 8 bits are used for “group-priority” and how many are used for “sub-priority” • Group priority is the pre-empting priority
• Lower numbers are higher priority
• Hardware interrupt number is lowest level of prioritization • IRQ3 is higher priority than IRQ4 if the priority registers are programmed the same

In STM32F10x 16 levels (4-bit) of priority are implemented:

PRIGROUP (3 bits) | Binary point (group.sub) | Bit assignment | Preempting priority (group priority) bits | Preempting levels | Sub-priority bits | Sub-priority levels
011 | 4.0 | gggg | 4 | 16 | 0 | 0
100 | 3.1 | gggs | 3 | 8 | 1 | 2
101 | 2.2 | ggss | 2 | 4 | 2 | 4
110 | 1.3 | gsss | 1 | 2 | 3 | 8
111 | 0.4 | ssss | 0 | 0 | 4 | 16
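The PRIGROUP table above can be turned into a short sketch that splits a 4-bit STM32F10x priority value into its group and sub-priority parts. The names `prio_split_t` and `split_priority` are hypothetical, and PRIGROUP values below 011 are assumed here to behave like 011 (all implemented bits used for the group priority).

```c
#include <stdint.h>

typedef struct {
    uint8_t group; /* pre-empting (group) priority */
    uint8_t sub;   /* sub-priority                 */
} prio_split_t;

/* Hypothetical helper: prigroup is the 3-bit PRIGROUP field,
 * priority is the 4-bit implemented priority value (0..15). */
prio_split_t split_priority(uint8_t prigroup, uint8_t priority)
{
    unsigned pg = prigroup & 0x7u;
    /* 011 -> 0 sub bits, 100 -> 1, 101 -> 2, 110 -> 3, 111 -> 4 */
    unsigned sub_bits = (pg > 3u) ? (pg - 3u) : 0u;
    unsigned p = priority & 0x0Fu;
    prio_split_t r;
    r.sub = (uint8_t)(p & ((1u << sub_bits) - 1u));
    r.group = (uint8_t)(p >> sub_bits);
    return r;
}
```

For example, with PRIGROUP = 101 (row "ggss") the value 1101b splits into group priority 3 and sub-priority 1.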

Cortex-M3 Exception Types

No. | Exception Type | Priority | Type of Priority | Descriptions
1 | Reset | -3 (Highest) | fixed | Reset
2 | NMI | -2 | fixed | Non-Maskable Interrupt
3 | Hard Fault | -1 | fixed | Default fault if other handler not implemented
4 | MemManage Fault | 0 | settable | MPU violation or access to illegal locations
5 | Bus Fault | 1 | settable | Fault if AHB interface receives error
6 | Usage Fault | 2 | settable | Exceptions due to program errors
7-10 | Reserved | N.A. | N.A. |
11 | SVCall | 3 | settable | System Service call
12 | Debug Monitor | 4 | settable | Break points, watch points, external debug
13 | Reserved | N.A. | N.A. |
14 | PendSV | 5 | settable | Pendable request for System Device
15 | SYSTICK | 6 | settable | System Tick Timer
16 | Interrupt #0 | 7 | settable | External Interrupt #0
…… | …… | …… | settable | ……
256 | Interrupt #240 | 247 | settable | External Interrupt #240

Execution Priority

• The execution priority is defined to be the maximum priority of all active exceptions
• This definition of execution priority prevents priority inversion
• Priority Boosting increases the current execution priority

Priority Boosting
The priority can be boosted by the following mechanisms
• PRIMASK: setting this bit raises the execution priority to 0 • This prevents all exceptions with configurable priority from activating, other than through the HardFault escalation mechanism
• FAULTMASK: setting this mask bit raises the execution priority to -1 • Can only be set when the execution priority is lower than -1 to avoid escalating the priority of a fault handler • Cleared automatically on all exception returns (except NMI)
• BASEPRI: can be written with a value from N (lowest configurable priority) to 1 • A non-zero value will act as a priority mask, affecting the execution priority when the priority defined by BASEPRI is the same or higher than the current executing priority
• These mechanisms only affect the group priority • They have no effect on the sub-priority • The sub-priority is only used to sort pending exception priorities and does not affect active exceptions
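The rules above can be combined into a simplified model of how the current execution priority is derived. This sketch ignores priority grouping, treats lower numbers as higher priority, and uses 256 as an assumed "no exception active" base level for Thread mode; the function name `execution_priority` is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed base level for Thread mode: lower than any configurable priority. */
#define THREAD_BASE_PRIORITY 256

/* active: priority values of all active exceptions (may be NULL if n_active is 0).
 * basepri: 0 disables the BASEPRI mask, otherwise it is a priority level. */
int execution_priority(const int *active, size_t n_active,
                       bool primask, bool faultmask, int basepri)
{
    int prio = THREAD_BASE_PRIORITY;

    /* Highest priority (lowest number) among the active exceptions. */
    for (size_t i = 0; i < n_active; i++) {
        if (active[i] < prio) {
            prio = active[i];
        }
    }
    /* Boosting only ever raises the execution priority. */
    if (basepri != 0 && basepri < prio) {
        prio = basepri;
    }
    if (primask && prio > 0) {
        prio = 0;
    }
    if (faultmask && prio > -1) {
        prio = -1;
    }
    return prio;
}
```

An exception can pre-empt only if its priority number is strictly lower than the value returned here, which is why setting PRIMASK blocks every exception with a configurable priority.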

Vector Table

• Vector Table starts at location 0 • In the code section of the memory map
• Vector Table contains addresses (vectors) of exception handlers and ISRs • Not instructions like other ARM processors
• Table size (in words) = number of IRQ inputs + 16 • Minimum size (case of 1 IRQ): 17 words • Maximum size (case of 240 IRQs): 256 words
• Main stack pointer initial value in location 0 • Set up by hardware during Reset
• Vector Table can be relocated (to SRAM) • Software configurable through dedicated register in SCB

Address | Vector
0x00 | Initial Main SP
0x04 | Reset
0x08 | NMI
0x0C | Hard Fault
0x10 | Memory Manage
0x14 | Bus Fault
0x18 | Usage Fault
0x1C-0x28 | Reserved
0x2C | SVCall
0x30 | Debug Monitor
0x34 | Reserved
0x38 | PendSV
0x3C | Systick
0x40 | IRQ0
… | More IRQs
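The table size rule and the address layout above can be checked with a few lines of C. The helper names are hypothetical; the only inputs are the 16 system entries and the 4-byte vector size described on the slide.

```c
#include <stdint.h>

#define SYSTEM_VECTORS 16u /* initial SP plus system exception entries */

/* Table size in 32-bit words for a given number of IRQ inputs. */
uint32_t vector_table_words(uint32_t n_irqs)
{
    return n_irqs + SYSTEM_VECTORS;
}

/* Byte offset of the vector for IRQ number irq (IRQ0 sits at 0x40). */
uint32_t irq_vector_offset(uint32_t irq)
{
    return (SYSTEM_VECTORS + irq) * 4u;
}
```

With 240 IRQ inputs the table occupies 256 words, that is 1 KB.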

Cortex-M3 Memory Map

• Vendor Specific (0.5 GB) • Set aside to enable vendors to implement peripheral compatibility with previous systems
• Private Peripheral Bus (1 MB) • Address space for system components (CoreSight, NVIC etc.)
• External Device (1 GB) • Intended for external devices and/or shared memory that needs ordering/non-buffered
• External RAM (1 GB) • Intended for off chip memory
• Peripheral (0.5 GB) • Intended for normal peripherals. The bottom 1 MB of the peripheral address space (0x40000000 – 0x400FFFFF) is reserved for bit-band accesses. Accesses to the peripheral 32 MB bit band alias region (0x42000000 – 0x43FFFFFF) are remapped to this 1 MB
• SRAM (0.5 GB) • Intended for on-chip SRAM. The bottom 1 MB of the SRAM address space (0x20000000 – 0x200FFFFF) is reserved for bit-band accesses. Accesses to the SRAM 32 MB bit band alias region (0x22000000 – 0x23FFFFFF) are remapped to this 1 MB address space.
• Code (0.5 GB) • Reserved for code memory (flash, SRAM). This region is accessed via the Cortex-M3 ICode and DCode busses.

Power Management

8-bit microcontroller-like power mode management

SLEEP NOW
♦ “Wait for Interrupt” instruction to enter low power mode → No more dedicated control register settings sequence
♦ “Wait for Event” instruction to enter low power mode → No need of interrupt to wake up from sleep → Rapid resume from sleep

SLEEP on EXIT
♦ Sleep request done in interrupt routine ♦ Low power mode entered on interrupt return → Very fast wakeup time without context saving (6 cycles)

DEEP SLEEP
♦ Long duration sleep → From product side: PLL can be stopped, or power to digital parts of the system can be shut down → Enables low power consumption

Optimized RUN mode: core power consumption 3 times lower than ARM7TDMI

System Timer (SysTick)

• Flexible system timer

• 24-bit self-reloading down counter with end of count interrupt generation

• 2 configurable Clock sources

• Suitable for Real Time OS or other scheduled tasks

In STM32F10x the SysTick clock can be: CPU clock or CPU clock/8 (provided externally by the Reset Clock Control )
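Because the counter is 24 bits wide and reloads itself, the reload value for a given tick rate follows directly from the selected clock. The sketch below is a host-side calculation only; the function name `systick_reload` is hypothetical and the 72 MHz CPU clock used in the example is an assumption, not a figure from the slides.

```c
#include <stdint.h>

#define SYSTICK_MAX_RELOAD 0x00FFFFFFL /* 24-bit down counter */

/* Returns the reload value giving tick_hz interrupts per second from a
 * counter clocked at clock_hz, or -1 if the request cannot be met. */
long systick_reload(uint32_t clock_hz, uint32_t tick_hz)
{
    if (tick_hz == 0u || clock_hz < tick_hz) {
        return -1;
    }
    /* The counter counts from reload down to 0, i.e. reload + 1 clocks. */
    long reload = (long)(clock_hz / tick_hz) - 1L;
    if (reload > SYSTICK_MAX_RELOAD) {
        return -1;
    }
    return reload;
}
```

With an assumed 72 MHz CPU clock, a 1 ms tick needs a reload value of 71999; selecting CPU clock/8 (9 MHz) gives 8999 and allows periods eight times longer before the 24-bit limit is reached.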

DSP and FPU insights of Cortex M4

FPU benefits and performance

FPU benefits in real life applications

• High level approach: matrix, mathematical equations
• Meta language tools: Matlab, Scilab, etc.
• C code generation: floating point numbers (float)

Option | Mapping | Code changes | Performance | Code efficiency
FPU | Direct mapping | No code modification | High performance | Optimal code efficiency
No FPU | Usage of SW lib | No code modification | Low performance | Medium code efficiency
No FPU | Usage of integer based format | Code modification, corner case behavior to be checked (saturation, scaling) | Medium/high performance | Medium code efficiency

Cortex-M4 single precision floating point

• IEEE 754 standard compliant

• Decoupled floating point pipeline

• Single-precision floating point math • Add, subtract, multiply, divide, MAC and square root • Fused MAC – higher precision

Operation | Cycle count
Add/Subtract | 1
Divide | 14
Multiply | 1
Multiply Accumulate (MAC) | 3
Fused MAC | 3
Square Root | 14

FPU assembly code generation

    float function1(float number1, float number2)
    {
        float temp1, temp2;

        temp1 = number1 + number2;
        temp2 = number1 / temp1;

        return temp2;
    }

With FPU (1 assembly instruction per operation):

    # float function1(float number1, float number2)
    # {
    #   float temp1, temp2;
    #   temp1 = number1 + number2;
        VADD.F32 S1,S0,S1
    #   temp2 = number1/temp1;
        VDIV.F32 S0,S0,S1
    #   return temp2;
        BX LR
    # }

Without FPU (call Soft-FPU):

    # float function1(float number1, float number2)
    # {
        PUSH {R4,LR}
        MOVS R4,R0
        MOVS R0,R1
    #   float temp1, temp2;
    #   temp1 = number1 + number2;
        MOVS R1,R4
        BL __aeabi_fadd
        MOVS R1,R0
    #   temp2 = number1/temp1;
        MOVS R0,R4
        BL __aeabi_fdiv
    #   return temp2;
        POP {R4,PC}
    # }

Floating point benchmark

• Time execution comparison for a 29 coefficient FIR on float 32 with and without FPU (CMSIS library)

$$y[n] = \sum_{k=0}^{N-1} h[k]\, x[n-k]$$

[Figure: execution time, No FPU versus FPU: 10x improvement]

• Best compromise: development time vs. performance
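The FIR sum used in the benchmark can be written directly in C. This is a plain reference sketch, not the CMSIS library routine; the function name `fir_sample` is hypothetical and input samples before x[0] are assumed to be zero.

```c
#include <stddef.h>

/* Computes y[n] = sum over k of h[k] * x[n - k] for one output sample.
 * h: n_taps coefficients, x: input samples with at least n + 1 entries.
 * Terms that would read before x[0] are skipped (treated as zero). */
float fir_sample(const float *h, size_t n_taps, const float *x, size_t n)
{
    float acc = 0.0f;
    for (size_t k = 0; k < n_taps && k <= n; k++) {
        acc += h[k] * x[n - k]; /* one multiply-accumulate per tap */
    }
    return acc;
}
```

Each tap costs one floating point multiply-accumulate, so the same source maps to FPU instructions when an FPU is present and to software library calls when it is not.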

DSP benefits and performance

Single-cycle multiply-accumulate (MAC)

• The multiplier unit allows any MUL or MAC instructions to be executed in a single cycle • Signed/Unsigned Multiply • Signed/Unsigned Multiply-Accumulate • Signed/Unsigned Multiply-Accumulate Long (64-bit)

• Benefits : Speed improvement vs. Cortex-M3 • 4x for 16-bit MAC (dual 16-bit MAC) • 2x for 32-bit MAC • up to 7x for 64-bit MAC

68 Saturated arithmetic • Intrinsically prevents overflow of variable by clipping to min/max boundaries and remove CPU load due to software range checks

1.5 • Benefits 1 • Audio applications Without 0.5 saturation 0 1.5 -0.5 1 -1 0.5 -1.5 0 1.5 1 -0.5 With 0.5 -1 saturation 0 -1.5 -0.5 -1 -1.5 • Control applications • The PID controllers’ integral term is continuously accumulated over time. The saturation automatically limits its value and saves several CPU cycles per regulators Single-cycle SIMD instructions • Stands for Single Instruction Multiple Data

• Allows to do simultaneously several operations with 8-bit or 16-bit data format • Ex: dual 16-bit MAC 32- (Result = 16x16 + 16x16 + 32) bit

• Ex: Quad 8-bit SUB / ADD

• Benefits • Parallelizes operations (2x to 4x speed gain) • Minimizes the number of load/store instructions needed to move data between memory and the register file (2 or 4 values transferred at once) when full 32-bit values are not needed • Maximizes register file use (1 register holds 2 or 4 values)
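The dual 16-bit MAC (Result = 16x16 + 16x16 + 32) can be modelled in portable C to show what one such instruction computes. This is a sketch, not the hardware instruction itself; the helper names `pack16` and `dual_mac16` are hypothetical. On Cortex-M4 an instruction such as SMLAD performs this kind of operation.

```c
#include <stdint.h>

/* Pack two signed 16-bit values into one 32-bit word (hi in bits 31:16). */
uint32_t pack16(int16_t hi, int16_t lo)
{
    return ((uint32_t)(uint16_t)hi << 16) | (uint16_t)lo;
}

/* Model of a dual 16-bit multiply-accumulate:
 * result = lo(a)*lo(b) + hi(a)*hi(b) + acc.
 * The narrowing casts rely on the usual two's-complement behaviour. */
int32_t dual_mac16(uint32_t a, uint32_t b, int32_t acc)
{
    int16_t a_lo = (int16_t)(a & 0xFFFFu);
    int16_t a_hi = (int16_t)(a >> 16);
    int16_t b_lo = (int16_t)(b & 0xFFFFu);
    int16_t b_hi = (int16_t)(b >> 16);
    return acc + (int32_t)a_lo * b_lo + (int32_t)a_hi * b_hi;
}
```

One 32-bit load fetches two 16-bit operands, so a filter loop over 16-bit data needs half as many loads and half as many MAC instructions as the scalar version.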

16-bit DSP functions compared

Relative cycle counts for DSP tasks running on 16-bit data. Smaller is better on the chart – Cortex-M4 is 30% to 70% better

[Figure: chart of relative cycle counts for 16-bit DSP tasks]

32-bit DSP functions compared

Relative cycle counts for DSP tasks running on 32-bit data. Smaller is better on the chart – Cortex-M4 is 25% to 60% better

[Figure: chart of relative cycle counts for 32-bit DSP tasks]

DSP

One step closer to Digital Signal Processor

(but still a universal, easy-to-use STM32 MCU)

Cortex-M7 Processor Overview

• ARMv7E-M Architecture

• Harvard architecture, 6-stage pipeline

• Dual-issue superscalar architecture!

• DIV in 12-cycles max, SIMD instructions

• Memory Protection Unit (MPU)

• Floating point unit

⇒ Included in current STM32 based on ARM Cortex-M4 and Cortex-M7

Interrupt latency is still 12 cycles

One step closer to DSPs, one step closer to Real-Time processors

• Load and store in parallel with arithmetic
• Tightly Coupled Memories
• Zero overhead loops
• AXI-M interface with Cache memory

Core Architecture

[Figure: ARM Cortex-M7 core block diagram. Labels: prefetch unit, data processing unit (+ FPU), load/store unit, store buffer, tightly coupled memory unit (ITCM, DTCM), I-Cache, D-Cache, NVIC (interrupt requests), MPU, debug and ETM/ITM trace, bus interface unit, AXI-M, AHBP (peripherals), AHBS (for DMAs), AHBD, FLASH, ART, AXI to Multi-AHB]

You can find this on the leaflet

[Figure labels, continued: external memories, internal memories]

ARM Cortex-M7 → dual-issue

[Figure: Cortex-M7 dual-issue pipeline. Labels: prefetch unit (prefetch, fetch, 64-entry BTAC, branch, 64 bits per cycle from code memories), #1 decode and #2 decode, issue (2x 32b), data processing unit with ALU 1 (main), ALU 2, MAC (32b x 32b + 64b) and FPU, load/store unit with 32-bit Load/Store 1 and Load/Store 2, update from DPU, input from NVIC]

Load and store in parallel with arithmetic

• Cortex-M4
  • Single load or store instructions take 2 cycles
  • N consecutive loads or stores take N+1 cycles
  • So: group as many loads and stores together as possible

• Cortex-M7
  • Load and store operations can occur in parallel with math
  • Memory access possible without penalty
  • So: interleave memory accesses with computation

[Figure: Cortex-M7 pipeline (fetch, decode, issue 2x 32b, execute) with Load/Store 1, Load/Store 2, ALU 1 (main) and ALU 2]

Compiler job!

RELAX TIME

General Purpose I/O Overview

• How do we make a program light up LEDs in response to a switch?

• GPIO • Basic Concepts • Port Circuitry • Control Registers • Accessing Hardware Registers in C • Clocking and Muxing

• Circuit Interfacing • Inputs • Outputs

• Additional Configuration Basic Concepts

• GPIO = General-purpose input and output (digital) • Input: program can determine if input signal is a 1 or a 0 • Output: program can set output to 1 or 0

• Can use this to interface with external devices or on-board peripherals • Input: switch, button, ... • Output: LEDs, speaker, ...

STM32F40x LQFP100 pinout

• Port A (PA) through Port E (PE)

• Not all port bits are available

• Quantity depends on package pin count GPIO Port Bit Circuitry in MCU • Configuration • Direction • MUX • Modes • Speed

• Data • Output (different ways to access it) • Input • Analogue

• Locking

Control Registers

• Each general-purpose I/O port has
  • four 32-bit configuration registers
    • GPIOx_MODER (mode: input, output, AF, analog)
    • GPIOx_OTYPER (output type: push-pull or open drain)
    • GPIOx_OSPEEDR (speed)
    • GPIOx_PUPDR (pull-up/pull-down)
  • two 32-bit data registers (GPIOx_IDR and GPIOx_ODR)
  • a 32-bit set/reset register (GPIOx_BSRR)
  • a 32-bit locking register (GPIOx_LCKR)
  • two 32-bit alternate function selection registers (GPIOx_AFRH and GPIOx_AFRL)
• One set of control registers (10 in total) per port

• Each bit or bit field in a control register corresponds to a port bit (for example, GPIOx_MODER uses 2 bits per pin and GPIOx_AFRL/AFRH use 4 bits per pin)

• All registers have to be accessed as 32-bit word GPIO Configuration registers

• Each bit can be configured differently

• Reset clears port bit direction to 0

• Output modes: push-pull or open drain + pull-up/down

• Output data from output data register (GPIOx_ODR) or peripheral (alternate function output)

• Input states: floating, pull-up/down, analog

• Input data to input data register (GPIOx_IDR) or peripheral (alternate function input) Alternate function selection register

• In AF mode, AFRL or AFRH must be configured so that the pin is driven by a specific peripheral

• Can be seen as a select signal to the Mux
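As a rough illustration of the mode and alternate-function registers described above, the sketch below sets a pin to AF mode and writes its mux select value. It runs on a host PC against a plain struct; `gpio_model_t` and `gpio_select_alternate_function` are hypothetical names, not the vendor header definitions, and on real hardware the registers are memory-mapped.

```c
#include <assert.h>
#include <stdint.h>

/* Host-side stand-in for three of the per-port registers. */
typedef struct {
    uint32_t MODER;   /* 2 bits per pin: 00 input, 01 output, 10 AF, 11 analog */
    uint32_t AFRL;    /* 4 bits per pin, pins 0..7  */
    uint32_t AFRH;    /* 4 bits per pin, pins 8..15 */
} gpio_model_t;

void gpio_select_alternate_function(gpio_model_t *port, unsigned pin, unsigned af)
{
    assert(pin < 16 && af < 16);

    /* Put the pin in alternate function mode (MODER field = 10b). */
    port->MODER &= ~(3u << (pin * 2));
    port->MODER |=  (2u << (pin * 2));

    /* Write the mux select value into AFRL (pins 0..7) or AFRH (pins 8..15). */
    uint32_t *afr = (pin < 8) ? &port->AFRL : &port->AFRH;
    unsigned shift = (pin % 8) * 4;
    *afr &= ~((uint32_t)0xFu << shift);
    *afr |=  ((uint32_t)af << shift);
}
```

Each field is cleared before it is set, so the call works regardless of the previous configuration and leaves the other pins of the port untouched.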

• EVENTOUT is not mapped onto the following I/O pins: PC13, PC14, PC15, PH0, PH1 and PI8.

Inputs and Outputs, Ones and Zeros, Voltages and Currents

INTERFACING

Inputs: What’s a One? A Zero?

• Input signal’s value is determined by voltage

• Input threshold voltages depend on supply voltage VDD

• Exceeding VDD or GND may damage chip

Outputs: What’s a One? A Zero?

• Nominal output voltages

• 1: VDD - 0.5 V to VDD • 0: 0 to 0.5 V

• Note: Output voltage depends on current drawn by load on pin • Need to consider source-to-drain resistance in the transistor • Above values only specified when current < 5 mA (18 mA for high-drive pads) and VDD > 2.7 V

[Figure: output voltage Vout versus output current Iout, for logic 1 out and logic 0 out]

Driving External LEDs

• Need to limit current to a value which is safe for both LED and MCU port driver

• Use current-limiting resistor

• R = (VDD - VLED) / ILED

• Set ILED = 4 mA

• VLED depends on type of LED (mainly color) • Red: ~1.8 V • Blue: ~2.7 V

• Solve for R given VDD = ~3.0 V • Red: 300 Ω • Blue: 75 Ω

Output Example: Driving a Speaker

• Create a square wave with a GPIO output

• Use capacitor to block DC value

• Use resistor to reduce volume if needed

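#define MASK(x) (1UL << (x))        /* assumed definition: bit mask for pin x */

/* Drives a square wave on PD2. The argument is the delay between edges
   (half the period), so a larger value gives a lower pitch. */
void Speaker_Beep(uint32_t half_period){
    Init_Speaker();                 /* assumed to configure PD2 as an output */
    while(1){
        GPIOD->BSRRL = MASK(2);     /* low half of BSRR: set PD2 high */
        Delay(half_period);
        GPIOD->BSRRH = MASK(2);     /* high half of BSRR: reset PD2 low */
        Delay(half_period);
    }
}

Analog Interfacing

Why It’s Needed

• Embedded systems often need to measure values of physical parameters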

• These parameters are usually continuous ( analog ) and not in a digital form which computers (which operate on discrete data values) can process

• Temperature
  – Thermometer (do you have a fever?)
  – Thermostat for building, fridge, freezer
  – Car engine controller
  – Chemical reaction monitor
  – Safety (e.g. microprocessor thermal management)
• Pressure
  – Blood pressure monitor
  – Altimeter
  – Car engine controller
  – Scuba dive computer
  – Tsunami detector
• Acceleration
  – Air bag controller
  – Vehicle stability
  – Video game remote
• Light (or infrared or ultraviolet) intensity
  – Digital camera
  – IR remote control receiver
  – Tanning bed
  – UV monitor
• Mechanical strain
• Rotary position
  – Wind gauge
  – Knobs
• Other
  – Touch screen controller
  – EKG, EEG
  – Breathalyzer

CONVERTING BETWEEN ANALOG AND DIGITAL VALUES

The Big Picture – A Depth Gauge

[Figure: air pressure → pressure sensor → V_sensor → analog to digital converter (reference V_ref) → ADC_Code → your software; voltages from ground to V_ref map to ADC output codes 000..000 to 111..111]

// Your software
ADC_Code = ADC0->R[0];
V_sensor = ADC_Code*V_ref/1023;
Pressure_kPa = 250 * (V_sensor/V_supply+0.04);
Depth_ft = 33 * (Pressure_kPa - Atmos_Press_kPa)/101.3;

1. Sensor detects air pressure and generates a proportional output voltage V_sensor

2. ADC generates a proportional digital integer (code) based on V_sensor and V_ref

3. Code can convert that integer to something more useful
   1. first a float representing the voltage,
   2. then another float representing pressure,
   3. finally another float representing depth

Getting From Analog to Digital

• A Comparator tells us “Is Vin > Vref?”
  • Compares an analog input voltage with an analog reference voltage and determines which is larger, returning a 1-bit number
  • E.g. Indicate if depth > 100 ft
  • Set Vref to the voltage the pressure sensor returns at 100 ft depth.

[Figure: comparator with inputs Vin and Vref and a 1-bit output]

• An Analog to Digital converter [AD or ADC] tells us how large Vin is as a fraction of Vref.
  • Reads an analog input signal (usually a voltage) and produces a corresponding multi-bit number at the output.
  • E.g. calculate the depth

[Figure: A/D converter with inputs Vin, Vref and clock and a multi-bit output]

Digital to Analog Conversion

• May need to generate an analog voltage or current as an output signal
  • E.g. audio signal, video signal brightness.
• DAC: “Generate the analog voltage which is this fraction of Vref”
• Digital to Analog Converter equation
  • n = input code
  • N = number of bits of resolution of converter
  • Vref = reference voltage
  • Vout = output voltage. Either
    • $V_{out} = V_{ref} \cdot n / 2^N$ or
    • $V_{out} = V_{ref} \cdot (n+1) / 2^N$
  • The offset +1 term depends on the internal tap configuration of the DAC – check the datasheet to be sure

[Figure: D/A converter with a multi-bit input, Vref and output Vout]

Waveform Sampling and Quantization

[Figure: sampled waveform, digital value versus time]

• A waveform is sampled at a constant rate – every ∆t • Each such sample represents the instantaneous amplitude at the instant of sampling • “At 37 ms, the input is 1.91341914513451451234311… V” • Sampling converts a continuous time signal to a discrete time signal

• The sample can now be quantized (converted) into a digital value
  • Quantization represents a continuous (analog) value with the closest discrete (digital) value
  • “The sampled input voltage of 1.91341914513451451234311… V is best represented by the code 0x018, since it is in the range of 1.901 to 1.9980 V which corresponds to code 0x018.”

ANALOG TO DIGITAL CONVERSION CONCEPTS

A/D – Flash Conversion

• A multi-level voltage divider is used to set voltage levels over the complete range of conversion.
• A comparator is used at each level to determine whether the voltage is lower or higher than the level.
• The series of comparator outputs are encoded to a binary number in digital logic (a priority encoder)
• Components used
  • $2^N$ resistors
  • $2^N - 1$ comparators
• Note
  • This particular resistor divider generates voltages which are not offset by ½ bit, so maximum error is 1 bit
  • We could change this offset voltage by using resistors of values R, 2R, 2R ... 2R, 3R (starting at bottom)

[Figure: 3-bit flash converter: a chain of resistors R from 1 V sets comparator levels from 7/8 V down to 1/8 V; each comparator compares Vin with its level and an encoder converts the comparator outputs to a binary code]

ADC - Successive Approximation Conversion

• Successively approximate input voltage by using a binary search and a DAC

• SA Register holds current approximation of result

• Set all DAC input bits to 0

• Start with DAC’s most significant bit

• Repeat
  • Set next input bit for DAC to 1
  • Wait for DAC and comparator to stabilize
  • If the DAC output (test voltage) is smaller than the input then set the current bit to 1, else clear the current bit to 0

[Figure: test voltage (DAC output) versus time for steps T1 to T6 after the start of conversion, compared with the analog input voltage; the known code narrows from xxxxxx through 1xxxxx, 10xxxx, 100xxx, 1001xx and 10011x to the final result 100110]

A/D - Successive Approximation Converter Schematic
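The successive approximation steps listed above (set a bit, compare the DAC output with the input, keep or clear the bit) can be simulated on a host PC. This is a sketch under the assumption of an ideal DAC with V = Vref × code / 2^bits; the names `dac_output` and `sar_convert` are hypothetical.

```c
#include <stdint.h>

/* Ideal DAC model (assumption): V = Vref * code / 2^bits. */
static double dac_output(uint32_t code, unsigned bits, double v_ref)
{
    return v_ref * (double)code / (double)(1u << bits);
}

/* Successive approximation: binary search from the MSB down to the LSB. */
uint32_t sar_convert(double v_in, double v_ref, unsigned bits)
{
    uint32_t code = 0;                         /* all DAC input bits start at 0 */
    for (int bit = (int)bits - 1; bit >= 0; bit--) {
        code |= (1u << bit);                   /* try the next bit at 1 */
        if (!(dac_output(code, bits, v_ref) < v_in)) {
            code &= ~(1u << bit);              /* test voltage not below input: clear */
        }
    }
    return code;
}
```

With a 6-bit converter, Vref = 1.0 V and an input of 0.6 V, the search keeps bit 5, clears bits 4 and 3, keeps bits 2 and 1, and clears bit 0, giving 100110 (38). The conversion always takes one comparison per bit of resolution.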

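[Figure: successive approximation converter schematic: a comparator compares the analog input (+) with the D/A converter output (-); the comparator output drives the successive approximation register, which has clock and start-of-conversion inputs and a status output; the register's 12-bit digital output also feeds the D/A converter]

ADC Performance Metrics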

• Linearity measures how well the transition voltages lie on a straight line.

• Differential linearity measures the equality of the step size.

• Conversion time: between start of conversion and generation of result

• Conversion rate = inverse of conversion time Sampling Problems

• Nyquist criterion

• Fsample >= 2 * Fmax, where Fmax is the highest frequency component of the signal

• Frequency components above ½ Fsample are aliased, distort measured signal

• Nyquist and the real world • This theorem assumes we have a perfect filter with “brick wall” roll-off • Real world filters have more gentle roll-off • Inexpensive filters are even worse (e.g. first order filter is 20 dB/decade, aka 6 dB/octave) • So we have to choose a sampling frequency high enough that our filter attenuates aliasing components adequately Inputs

• Differential • Use two channels, and compute difference between them • Very good noise immunity • Some sensors offer differential outputs (e.g. Wheatstone Bridge)

• Multiplexing • Typically share a single ADC among multiple inputs • Need to select an input, allow time to settle before sampling

• Signal Conditioning • Amplify and filter input signal • Protect against out-of-range inputs with clamping diodes

Sample and Hold Devices

• Some A/D converters require the input analog signal to be held constant during conversion (e.g. successive approximation devices)

• In other cases, peak capture or sampling at a specific point in time necessitates a sampling device.

• This function is accomplished by a sample and hold device:

[Figure: sample and hold circuit: analog input signal, sampling switch, hold capacitor, output signal]

• These devices are incorporated into some A/D converters

ANALOG TO DIGITAL CONVERTER

ADC Overview

• Uses successive approximation for conversion • Supports multiple resolutions: 12, 10, 8 and 6 bits • 4 injected channels and 16 regular channels • Supports single and continuous conversions • DUAL/Triple ADC mode • DMA • Analog watchdog • Temperature sensor ADC Overview

• High sampling speed • Conversion range from 0 to 3.6 V • Different Supply requirement • Scan mode for automatic conversion • DUAL/Triple ADC mode • Channel by channel programmable sampling time • Interrupt generation on • End of (Injected)conversion • Analog watchdog • Overrun ADC System Fundamentals

[Figure: ADC block with analog input, clock and output registers]

Using the ADC

• ADC initialization • Configure GPIO (if using on board pins) • Enable clock • Enable ADC • Select voltage reference • Select trigger source • Select input channel • Select other parameters

• Trigger conversion

• Read results

• Calibrate? Average? On-off Control

• For power efficiency, the ADC module is usually turned off (even if it is clocked).

• If ADON bit in ADC control register 2 is set, the module is powered on; otherwise it is powered off.

• Good practice: shut down the ADC whenever you are not using it.

Clock Configuration

• Analog Clock • ADCCLK, common to all ADCs • Derived from APB2 (72 MHz here; the APB2 clock itself can be prescaled by 1, 2, 4, 8 or 16) • Can be prescaled by 2, 4, 6 or 8, which means at most 36 MHz • Selected by bits 17:16 of the ADC common control register (ADC_CCR)

• Digital Interface Clock • Used for register read/write access • From APB2 (72 MHz) • Needs to be enabled individually for each ADC (RCC_APB2ENR)

ADC Conversion Time • Programmable sample time for all channels • Sample time registers 1 and 2 (ADC_SMPRx)

• Total conversion time = Tsampling + Tconversion Channel Selection

• Two groups of channels
  • Regular group
    • Up to 16 conversions
    • Consists of a sequence of conversions that can be done on any channel in any order
    • Specify each sequence by configuring the ADC_SQRx registers
    • Specify the total number of conversions in the L[3:0] field (bits 23:20) of the ADC_SQR1 register
  • Injected group
    • Up to 4 conversions
    • Similar to regular group
    • But the sequence is specified by the ADC_JSQR register
    • Specify the total number of conversions in the JL[1:0] field (bits 21:20) of the ADC_JSQR register
• Modifying either ADC_SQRx or ADC_JSQR will reset the current ADC process.

Channel Selection

• Three other channels
  • ADC1_IN16 is internally connected to the temperature sensor
  • ADC1_IN17 is internally connected to the reference voltage VREFINT
  • ADC1_IN18 is connected to VBAT
  • These can be used as regular or injected channels
  • But they are only available on the master ADC1 peripheral.

Voltage Reference Selection

• Input range from VREF- to VREF+

• VREF+: positive analog reference

• VDDA: equal to VDD

• VREF-: negative analog reference, = VSSA

• VSSA: grounded and equal to VSS

• By default, can convert input range from 0 to 3 V

Conversion Trigger Selection

• Can be triggered by software • Setting SWSTART bit in control register 2 (ADC_CR2) for regular group • Setting JSWSTART bit in control register 2 (ADC_CR2) for injected group

• Or by external trigger • Select the trigger detection mode • Specify the trigger event • Different bits for specifying regular group and injected group Hardware Trigger Sources

• ADC control register 2 Conversion Options Selection • Continuous? • Single conversion or continuous conversion (CR2 CONT bit) • Discontinuous mode available(CR1 DISCEN bit)

• Sample time

• Data alignment • CR2 ALIGN

• Scan mode: convert all the channels • CR1 SCAN

• Resolution • CR1 RES[1:0] Conversion Completion • In single conversion mode • Regular channel • Store the result into the 16-bit ADC_DR register • Set the EOC (end of conversion) flag • Interrupt if EOCIE bit is set • Injected channel • Store the result into the 16-bit ADC_JDR1 register • Set the JEOC (end of conversion injected) flag • Interrupt if JEOCIE bit is set

• Behave differently in other modes. And if there is a sequence of conversions, can be specified to set the flag at the end of the sequence or at the end of every conversion Result Registers • After the conversion, may need extra processing • Offset subtraction from calibration • Averaging: 1, 4, 8, 16 or 32 samples • Formatting: Right justification, sign- or zero-extension to 16 bits • Output comparison

• Result registers for two groups • ADC_DR for regular group • ADC_JDRx(x=1..4) for injected group Common Control Register

• Select different modes by writing to MULTI [4:0] bits

• Prescale the clock by writing to ADCPRE bits

• Enable the V BAT or the temperature sensor by setting VBATE or TSVREFE

• Set the delay between two sampling phases by writing to the DELAY bits

Using ADC Values

• The ADC gives an integer representing the input voltage relative to the reference voltages

• Several conversions may be needed • For many applications you will need to compute the approximate input voltage

•Vin = … • For some sensor-based applications you will need to compute the physical parameter value based on that voltage (e.g. pressure) – this depends on the sensor’s transfer function • You will likely need to do additional computations based on this physical parameter (e.g. compute depth based on pressure)

• Data type • It’s likely that doing these conversions with integer math will lead to excessive loss of precision, so use floating point math • AFTER you have the application working, you can think about accelerating the program using fixed-point math (scaled integers).

• Sometimes you will want to output ASCII characters (to the LCD, for example). You will need to convert the floating point number to ASCII using sprintf, ftoa, or another method. Example: Temperature Sensor

• ADC1 Channel 16

• The minimum ADC sampling time for the temperature channel is 10 microseconds

• Sampling cycles at least 110

• T(°C) = (Vsense - V25) / Avg_Slope + 25 • V25 = Vsense value at 25 °C (typical value: 0.78 V) • Avg_Slope = average slope of the temperature vs. Vsense curve (typical value: 1.3 mV/°C) • Vsense = DR × 3 / 4096 (if Vref+ = 3 V, Vref- = 0 V, 12-bit format) • These characteristics really vary from board to board!
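The temperature formula above translates into a few lines of C. This is a sketch: the function name `temperature_celsius` and its parameter list are assumptions, and V25 and Avg_Slope are passed in rather than hard-coded because the typical values quoted above should be checked against the datasheet for the actual part.

```c
/* Convert a raw 12-bit reading of the temperature channel to degrees Celsius
 * using T = (Vsense - V25) / Avg_Slope + 25, with Vsense = DR * Vref+ / 4096. */
double temperature_celsius(unsigned adc_reading, double v_ref_plus,
                           double v25, double avg_slope_v_per_c)
{
    double v_sense = (double)adc_reading * v_ref_plus / 4096.0;
    return (v_sense - v25) / avg_slope_v_per_c + 25.0;
}
```

For example, with Vref+ = 3 V, V25 = 0.78 V and Avg_Slope = 0.0013 V/°C, a reading of 1065 corresponds to about 0.780 V and therefore to roughly 25 °C. Because the slope is only around a millivolt per degree, one ADC step (about 0.73 mV here) is already a sizeable fraction of a degree, so averaging several readings is usually worthwhile.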

• Press your finger on the chip in the center of the board and the temperature reading will rise.

ANALOG WATCHDOG

Analog Watchdog

• A watchdog tries to detect abnormal conditions and recover the MCU from them.

• Analog Watchdog is actually an ADC followed by one (or two) comparator(s).

• ADC1 Channel 17

• Set the status bit (or generate an interrupt) if voltage converted is below a lower threshold or is above a higher threshold.

• Can select to watch all channels (either injected or regular groups or even both) or single channels.

• Monitor analog input and bark e.g., if temperature goes crazy! Example: Power Failure Detection • Need warning of when power has failed

• Use continuous mode and the analog watchdog interrupt

• Do the last second jobs! • Very limited amount of time before capacitor discharges • Save critical information • Turn off output devices • Put system into safe mode

• Can use a comparator to compare V REFINT (1.2V) against a fixed reference voltage Vref

• Save data, money or even a life if lucky enough

Timer Peripherals

STM32F4 Clock tree

• Various clock sources

• Highly configurable

• Can be controlled independently

• Possible to prescale

• Can output clock from some pins

• To achieve the balance between performance and power consumption Timer/Counter Peripheral Introduction

[Figure: presettable binary counter with clock/events input, reload value and reload, current count output, interrupt output, and a ÷2 or RS stage producing PWM]

• Common peripheral for microcontrollers
• Based on presettable binary counter, enhanced with configurability
  • Count value can be read and written by MCU
  • Count direction can often be set to up or down
  • Counter’s clock source can be selected
    • Counter mode: count pulses which indicate events (e.g. odometer pulses)
    • Timer mode: clock source is periodic, so counter value is proportional to elapsed time (e.g. stopwatch)
  • Counter’s overflow/underflow action can be selected
    • Generate interrupt
    • Reload counter with special value and continue counting
    • Toggle hardware output signal
    • Stop!

STM32F4 Timer Peripherals

• Advanced Control Timer
  • TIM1 and TIM8
  • Input capture, output compare, PWM, one pulse mode
  • 16-bit auto-reload register
  • Additional control for driving motor or other devices
• General Purpose Timer
  • TIM2 to TIM5
  • Input capture, output compare, PWM, one pulse mode
  • 16-bit or 32-bit auto-reload register
• General Purpose Timer
  • TIM9 to TIM14
  • Input capture, output compare, PWM, one pulse mode
  • Only 16-bit auto-reload register
• Basic Timer (Simple timer)
  • TIM6 and TIM7
  • Can be generic counter and internally connected to DAC
  • 16-bit auto-reload register
• Also a 24-bit system timer (SysTick)

General Purpose Timer Block Diagram

General Purpose Timer

[Figure: general purpose timer: clock → TIMx_PSC → TIMx_CNT (current count); TIMx_ARR holds the reload value; the interrupt runs the ISR, then execution resumes]

• Timer can count down or up or up/down (up after reset) • A prescaler can divide the counter clock • 4 independent channels for the timer • Input capture • Output compare • PWM generation • One-pulse mode output • Overflow or underflow will cause an update event (UEV), and with it the interrupt and possibly the reload

PERIODIC INTERRUPT

Timer as a Periodic Interrupt Source

• STM32F4 families have sophisticated and powerful timers • One of the basic functions of a timer is to generate independent, periodic interrupts • Best for regularly repeating small tasks

General Purpose Timer Clock selection

• There are 4 possible sources of clock • Internal clock (CK_INT) • External clock mode 1: external input pin (TIx) • External clock mode 2: external trigger input (ETR) (some timers only) • Internal trigger inputs (ITRx) • Access the SMS field (bits 2:0) of the slave mode control register TIMx_SMCR to select the clock source and mode • For example, for the periodic interrupt, with SMS being 000, the counter is clocked by the internal clock (derived from APB1) • By default, the prescaler for APB1 is 4 (defined in system_stm32f4xx.c), so the APB1 clock is 42 MHz (SYSCLK is 168 MHz). Because the APB1 prescaler is not 1, the timer input clock CK_INT is twice the APB1 clock, i.e. 84 MHz

Specify PSC and ARR

• The TIMx_PSC prescaler register stores the value used to divide the clock input. • In count-up mode, overflow occurs when the TIMx_CNT counter value reaches the TIMx_ARR auto-reload value; TIMx_CNT is then reset to 0. • Both TIMx_ARR and TIMx_PSC are 16-bit registers (TIMx_ARR is wider on the 32-bit timers)! • So the total period can be calculated as • Tout = ((ARR+1) × (PSC+1)) ÷ Fclk • When Fclk is 84 MHz (the timer input clock when APB1 runs at 42 MHz with a prescaler other than 1), setting ARR to 5999 and PSC to 13999 gives a period of one second: 6000 × 14000 ÷ 84,000,000 Hz = 1 s. With Fclk = 42 MHz the same settings would give 2 s.

More on PSC and ARR
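The period formula Tout = ((ARR+1) × (PSC+1)) ÷ Fclk is easy to check numerically. The helper below is a sketch with a hypothetical name, `timer_period_s`. It is worth evaluating for both 42 MHz and 84 MHz, because on STM32F4 the timer input clock is twice the APB clock whenever the APB prescaler is not 1.

```c
#include <stdint.h>

/* Update-event period in seconds: Tout = (ARR + 1) * (PSC + 1) / Fclk. */
double timer_period_s(uint32_t arr, uint32_t psc, double f_clk_hz)
{
    return ((double)arr + 1.0) * ((double)psc + 1.0) / f_clk_hz;
}
```

With ARR = 5999 and PSC = 13999 the counter divides the clock by 6000 × 14000 = 84,000,000, so the period is 1 s at 84 MHz and 2 s at 42 MHz. Since only the product (ARR+1) × (PSC+1) sets the period, other splits such as ARR = 9999 and PSC = 8399 give the same result.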

• Accessing ARR only writes to or reads from the ARR preload register; a separate shadow register actually performs the reloading. The content of the preload register is transferred to the shadow register immediately if the ARPE bit in TIMx_CR1 is clear, or at each UEV if the ARPE bit is set.

• PSC, on the other hand, is always buffered: it can be changed on the fly, but the new ratio is only taken into account at the next UEV.

CAPTURE/COMPARE MODE

Capture/Compare Channels • Different channels but same blocks • Capture mode can be used to measure the pulse width or frequency • Input stage includes digital filter, multiplexing and prescaler • Output stage includes comparator and output control • A capture register (with shadow register)

• Input Stage Example • Input signal->filter->edge detector->slave mode controller or capture command Capture/Compare Channels • Main Circuit

• The block is made of one preload register and a shadow register. • In capture mode, captures are done in the shadow register, then copied into the preload register • In compare mode, the content of the preload register is copied into the shadow register, which is compared to the counter Capture/Compare Channels • Output stage

• Generates an intermediate waveform which is then used for reference. Wind Speed Indicator (Anemometer) • Rotational speed (and pulse frequency) is proportional to wind velocity

• Two measurement options: • Frequency (best for high speeds) • Width (best for low speeds)

• Can solve for wind velocity v

• How can we use the Timer for this? • Use Input Capture Mode to measure period of input signal Input Capture Mode for Anemometer

• Operation: Repeat • First capture - on rising edge • Reconfigure channel for input capture on falling edge • Clear counter, start new counting • Second Capture - on falling edge • Read capture value, save for later use in wind speed calculation • Reconfigure channel for input capture on rising edge • Clear counter, start new counting

• Solve the wind speed

• Vwind = K ÷ (Cfalling – Crising) × Freq

PWM MODE

Pulse-Width Modulation

• Uses of PWM
  • Digital power amplifiers are more efficient and less expensive than analog power amplifiers
    • Applications: motor speed control, light dimmer, switch-mode power conversion
    • Load (motor, light, etc.) responds slowly, averages PWM signal
  • Digital communication is less sensitive to noise than analog methods
    • PWM provides a digital encoding of an analog value
    • Much less vulnerable to noise
• PWM signal characteristics
  • Modulation frequency – how many pulses occur per second (fixed)
  • Period – 1/(modulation frequency)
  • On-time – amount of time that each pulse is on (asserted)
  • Duty-cycle – on-time/period
  • Adjust on-time (hence duty cycle) to represent the analog value

PWM to Drive Servo Motor
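The duty-cycle relation (on-time ÷ period) maps directly onto timer counts. The sketch below uses hypothetical helper names, `pwm_duty_cycle` and `pwm_on_ticks`, and assumes a counter that counts a fixed number of ticks per PWM period. It is shown here with servo-style timing.

```c
#include <stdint.h>

/* Duty cycle = on-time / period (both in seconds). */
double pwm_duty_cycle(double on_time_s, double period_s)
{
    return on_time_s / period_s;
}

/* Number of timer ticks the output stays asserted, for a counter that
 * counts counts_per_period ticks per PWM period (rounded to nearest). */
uint32_t pwm_on_ticks(double duty_cycle, uint32_t counts_per_period)
{
    return (uint32_t)(duty_cycle * (double)counts_per_period + 0.5);
}
```

For a 20 ms period and a 1.5 ms pulse the duty cycle is 0.075. If the timer counts 20000 ticks per period (1 µs per tick), the output should stay high for 1500 ticks.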

• Servo PWM signal • 20 ms period • 1 to 2 ms pulse width Serial Communications Overview • Serial communications • Concepts • Tools • Software: polling, interrupts and buffering

• UART communications • Concepts • STM32F4Discovery UART peripheral

• SPI communications • Concepts • STM32F4Discovery SPI peripheral

• I2C communications • Concepts • STM32F4Discovery I2C peripheral Why Communicate Serially?

• Native word size is multi-bit (8, 16, 32, etc.)

• Often it’s not feasible to support sending all the word’s bits at the same time • Cost and weight: more wires needed, larger connectors needed • Mechanical reliability: more wires => more connector contacts to fail • Timing Complexity: some bits may arrive later than others due to variations in capacitance and resistance across conductors • Circuit complexity and power: may not want to have 16 different radio transmitters + receivers in the system

Example System

[Figure: MCU connected to four peripherals, each with its own Data, Rd and Wr lines]

• Dedicated point-to-point connections • Parallel data lines, read and write lines between MCU and each peripheral

• Fast, allows simultaneous transfers

• Requires many connections, PCB area, scales badly Parallel Buses

• All devices use buses to share data, read and write signals

• MCU uses individual select lines to address each peripheral

• MCU requires fewer pins for data, but still one per data bit

• MCU can communicate with only one peripheral at a time Synchronous Serial Data Transmission

Transmitting Device Receiving Device

Clock

Serial Data

Data Sampling Time at Receiver

• Use shift registers and a clock signal to convert between serial and parallel formats

• Synchronous: an explicit clock signal is sent along with the data signal

Synchronous Full-Duplex Serial Data Bus

• Now can use two serial data lines - one for reading, one for writing. • Allows simultaneous send and receive full-duplex communication Synchronous Half-Duplex Serial Data Bus

• Share the serial data line • Doesn’t allow simultaneous send and receive - is half-duplex communication

Asynchronous Serial Communication

[Figure: asynchronous data frame, 10 bit times long; time zero is the leading edge of the start bit; data sampling times at the receiver are Tbit × 1.5, 2.5, ..., 10.5]

• Eliminate the clock line! • Transmitter and receiver must generate clock locally • Transmitter must add start bit (always same value) to indicate start of each data frame • Receiver detects leading edge of start bit, then uses it as a timing reference for sampling data line to extract each data bit N at time Tbit × (N+1.5) • Stop bit is also used to detect some timing errors

Serial Communication Specifics

• Data frame fields • Start bit (one bit) • Data (LSB first or MSB first, and size – 7, 8, 9 bits) • Optional parity bit is used to make total number of ones in data even or odd • Stop bit (one or two bits) • All devices must use the same communications parameters • E.g. communication speed (300 baud, 600, 1200, 2400, 9600, 14400, 19200, etc.) • Sophisticated network protocols have more information in each data frame • Medium access control – when multiple nodes are on bus, they must arbitrate for permission to transmit • Addressing information – for which node is this message intended? • Larger data payload • Stronger error detection or error correction information • Request for immediate response (“in-frame”)

Error Detection • Can send additional information to verify data was received correctly

• Need to specify which parity to expect: even, odd or none.

• Parity bit is set so that total number of “1” bits in data and parity is even (for even parity) or odd (for odd parity) • 01110111 has 6 “1” bits, so parity bit will be 1 for odd parity, 0 for even parity • 01100111 has 5 “1” bits, so parity bit will be 0 for odd parity, 1 for even parity

• Single parity bit detects if 1, 3, 5, 7 or 9 bits are corrupted, but doesn’t detect an even number of corrupted bits
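The parity rule above can be expressed in a few lines of C. This sketch uses hypothetical function names and assumes 8-bit data words; it reproduces the two worked examples (01110111 and 01100111) and shows why a two-bit corruption goes unnoticed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Count the '1' bits in an 8-bit data word. */
static unsigned count_ones(uint8_t data)
{
    unsigned ones = 0;
    for (; data != 0; data >>= 1) {
        ones += data & 1u;
    }
    return ones;
}

/* Parity bit to send so that data plus parity holds an even
 * (or, if odd_parity is true, an odd) number of ones. */
unsigned parity_bit(uint8_t data, bool odd_parity)
{
    unsigned even_bit = count_ones(data) & 1u;   /* 1 if the count of ones is odd */
    return odd_parity ? (even_bit ^ 1u) : even_bit;
}

/* Receiver-side check: true if the received parity bit matches the data. */
bool parity_ok(uint8_t data, unsigned received_parity, bool odd_parity)
{
    return parity_bit(data, odd_parity) == received_parity;
}
```

Flipping one bit of 0x77 makes `parity_ok` fail, while flipping two bits leaves the number of ones even again and the check passes, which is the limitation noted above.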

• Stronger error detection codes (e.g. Cyclic Redundancy Check) exist and use multiple bits (e.g. 8, 16), and can detect many more corruptions. • Used for CAN, USB, Ethernet, Bluetooth, etc. SOFTWARE STRUCTURE – HANDLING ASYNCHRONOUS COMMUNICATION Software Structure • Communication is asynchronous to program • Don’t know what code the program will be executing … • when the next item arrives • when current outgoing item completes transmission • when an error occurs • Need to synchronize between program and serial communication interface somehow

• Options • Polling • Wait until data is available • Simple but inefficient of processor time • Interrupt • CPU interrupts program when data is available • Efficient, but more complex Serial Communications and Interrupts

• Want to provide multiple threads of control in the program • Main program (and subroutines it calls) • Transmit ISR – executes when serial interface is ready to send another character • Receive ISR – executes when serial interface receives a character • Error ISR(s) – execute if an error occurs

• Need a way of buffering information between threads • Solution: circular queue with head and tail pointers • One for tx, one for rx

[Figure: main program or other threads call send_string and get_string; tx_isr and rx_isr sit between the queues and the serial interface]

UNIVERSAL SYNCHRONOUS ASYNCHRONOUS RECEIVER TRANSMITTER (USART)

USART Block Diagram
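Before the USART details, the circular queue suggested above for buffering between the ISRs and the main program can be sketched as follows. The type and function names (`queue_t`, `q_init`, `q_enqueue`, `q_dequeue`) and the capacity are illustrative assumptions. On a real system the shared fields would also need protection, for example by briefly disabling interrupts around the updates.

```c
#include <stdbool.h>
#include <stdint.h>

#define Q_SIZE 16u   /* capacity; illustrative value */

typedef struct {
    uint8_t  data[Q_SIZE];
    unsigned head;    /* index of the oldest element */
    unsigned tail;    /* index of the next free slot */
    unsigned count;   /* number of elements currently stored */
} queue_t;

void q_init(queue_t *q)
{
    q->head = 0;
    q->tail = 0;
    q->count = 0;
}

/* Add one byte at the tail; returns false if the queue is full. */
bool q_enqueue(queue_t *q, uint8_t byte)
{
    if (q->count == Q_SIZE) return false;
    q->data[q->tail] = byte;
    q->tail = (q->tail + 1u) % Q_SIZE;   /* wrap around at the end */
    q->count++;
    return true;
}

/* Remove the oldest byte from the head; returns false if the queue is empty. */
bool q_dequeue(queue_t *q, uint8_t *byte)
{
    if (q->count == 0u) return false;
    *byte = q->data[q->head];
    q->head = (q->head + 1u) % Q_SIZE;
    q->count--;
    return true;
}
```

In this arrangement the receive ISR would enqueue each incoming character into the rx queue and the main program would dequeue from it. The tx queue works the other way round.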

• Like other modules, the USART in the STM32F4 family can operate in many modes.

Transmitter Basics

[Figure: data bits on a time axis starting at zero and marked in bit times (T), with the data sampling time at the receiver]

• If no data to send, keep sending 1 (stop bit) – idle line
• When there is a data word to send
  • Send a 0 (start bit) to indicate the start of a word
  • Send each data bit in the word (use a shift register for the transmit buffer)
  • Send a 1 (stop bit) to indicate the end of the word

Receiver Basics

[Figure: data bits on a time axis starting at zero and marked in bit times (T); the receiver samples at 1.5T, 2.5T, 3.5T, 4.5T, 5.5T, 6.5T, 7.5T, 8.5T, 9.5T and 10.5T]

• Wait for a falling edge (beginning of a start bit)
• Then wait ½ bit time
• Repeat for each data bit in the word
  • Wait 1 bit time
  • Read the data bit and shift it into a receive buffer (shift register)
• Wait 1 bit time
• Read the bit
  • If 1 (stop bit), then OK
  • If 0, there’s a problem!

For this to work…
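The receive procedure just listed (falling edge, half-bit wait, then one sample per bit time) can be tried out in a small software model. This is a sketch under assumptions, not how the hardware is implemented: the line is modelled as an array with `SPB` samples per bit time, the frame is 8 data bits sent LSB first with no parity and one stop bit, and the names `uart_encode` and `uart_decode` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define SPB        16u                   /* assumed samples per bit time in this model */
#define DATA_BITS  8u
#define FRAME_BITS (1u + DATA_BITS + 1u) /* start + data + stop */

/* Fill line[] with one idle bit time, a start bit (0), 8 data bits
 * (LSB first) and a stop bit (1). Returns the number of samples written.
 * line[] must hold at least (FRAME_BITS + 1) * SPB entries. */
size_t uart_encode(uint8_t byte, uint8_t *line)
{
    size_t n = 0;
    for (unsigned s = 0; s < SPB; s++) { line[n++] = 1; }   /* idle */
    for (unsigned s = 0; s < SPB; s++) { line[n++] = 0; }   /* start bit */
    for (unsigned b = 0; b < DATA_BITS; b++) {
        for (unsigned s = 0; s < SPB; s++) {
            line[n++] = (uint8_t)((byte >> b) & 1u);
        }
    }
    for (unsigned s = 0; s < SPB; s++) { line[n++] = 1; }   /* stop bit */
    return n;
}

/* Decode one frame. Returns 0 on success, -1 if no complete frame is
 * found, -2 if the stop bit reads 0 (framing error). */
int uart_decode(const uint8_t *line, size_t len, uint8_t *out)
{
    size_t t = 1;
    /* Wait for a falling edge: previous sample 1, current sample 0. */
    while (t < len && !(line[t - 1] == 1 && line[t] == 0)) { t++; }
    if (t >= len) { return -1; }

    t += SPB / 2u;                 /* wait 1/2 bit time: middle of start bit */
    uint8_t value = 0;
    for (unsigned b = 0; b < DATA_BITS; b++) {
        t += SPB;                  /* wait 1 bit time: middle of next data bit */
        if (t >= len) { return -1; }
        value |= (uint8_t)(line[t] << b);
    }
    t += SPB;                      /* wait 1 bit time: middle of stop bit */
    if (t >= len) { return -1; }
    if (line[t] != 1) { return -2; }
    *out = value;
    return 0;
}
```

Because every sample is taken relative to the falling edge of the start bit, the decoder only works if the `SPB` it assumes matches the sender's bit time, which is the agreement the protocol requires.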

• Transmitter and receiver must agree on several things (protocol) • Order of data bits • Number of data bits • What a start bit is (1 or 0) • What a stop bit is (1 or 0) • How long a bit lasts • Transmitter and receiver clocks must be reasonably close, since the only timing reference is the start bit How the STM32F4 works

• Transmitter and receiver must agree on several things (protocol) • Order of data bits • LSB will be transmitted first • Number of data bits • Can be configured as 8 or 9 bits • What a start bit is (1 or 0) • 0 • What a stop bit is (1 or 0) • 1 • Configurable length (0.5, 1, 1.5 or 2) • How long a bit lasts • Software programmable phase and polarity

• Many of these parameters are configurable as well

Input Data Oversampling

• When receiving, UART oversamples incoming data line • Extra samples allow voting, improving noise immunity • Better synchronization to incoming data, improving noise immunity

• STM32F4 provides a configurable oversampling rate of either 8 or 16 times the baud rate clock
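To make the idea of voting concrete, here is a small sketch that decides one received bit from 16 oversamples. It is an illustration only: the function names `vote_bit` and `single_sample_bit` are hypothetical, and the choice of zero-based indices 7, 8 and 9 as "the center" of the bit is an assumption of this sketch rather than a statement about the hardware.

```c
#include <stdint.h>

/* samples[] holds the 16 samples (each 0 or 1) taken across one bit time. */

/* Majority vote of three samples near the center of the bit
 * (indices 7, 8, 9 are an assumed choice for this sketch). */
unsigned vote_bit(const uint8_t samples[16])
{
    unsigned sum = (samples[7] & 1u) + (samples[8] & 1u) + (samples[9] & 1u);
    return (sum >= 2u) ? 1u : 0u;
}

/* Single-sample method: trust one sample near the center of the bit. */
unsigned single_sample_bit(const uint8_t samples[16])
{
    return samples[8] & 1u;
}
```

If a noise glitch flips only the sample at index 8 of an otherwise high bit, `vote_bit` still returns 1 while `single_sample_bit` returns 0, which is the noise-immunity benefit the extra samples provide.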

• Two voting methods: a single sample in the center, or a majority vote of the three samples in the center

Using the UART
• Transmitter
  • Configure the GPIO (AF mode, fast speed)
  • Enable the USART (set UE in CR1)
  • Define word length (M bit in CR1)
  • Program the stop bits (CR2)
  • Using DMA? (DMAT in CR3)
  • Parity check? (PCE/PS in CR1)
  • Configure baud rate (USART_BRR)
  • First transmission (set TE in CR1)
  • Write data to send (DR)
  • Repeat writing data to send (DR)
  • To end the transmission, wait until the last frame is complete (when TC=1)
• Receiver
  • Configure the GPIO (AF mode, fast speed)
  • Enable the USART (set UE in CR1)
  • Define word length (M bit in CR1)
  • Program the stop bits (CR2)
  • Using DMA? (DMAR in CR3)
  • Parity check? (PCE/PS in CR1)
  • Configure baud rate (USART_BRR)
  • Begin waiting for start bit (set RE in CR1)
  • If RXNE is set, then the data has been received and can be read
  • Interrupt if RXNEIE bit is set

SPI COMMUNICATIONS
Hardware Architecture

• All chips share bus signals • Clock SCK • Data lines MOSI (master out, slave in) and MISO (master in, slave out)

• Each peripheral has its own chip select line (CS) • Master (MCU) asserts the CS line of only the peripheral it’s communicating with

• The SPI interface of the STM32F4 also supports the I2S audio protocol.

Serial Data Transmission

[Figure: transmitting device and receiving device, showing the clock, the serial data, and the data sampling time at the receiver]

• Use shift registers and a clock signal to convert between serial and parallel formats
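As an illustration of shift registers doing the serial/parallel conversion, the sketch below models one 8-bit SPI transfer between a master and a slave. The function name `spi_exchange` is hypothetical, and MSB-first order is an assumption of the sketch. On each simulated clock, each side shifts one bit out onto its output line and shifts in the bit from the other side.

```c
#include <stdint.h>

/* Simulate 8 clock cycles of a full-duplex SPI transfer, MSB first.
 * master_sr and slave_sr are the two 8-bit shift registers. */
void spi_exchange(uint8_t *master_sr, uint8_t *slave_sr)
{
    for (int clk = 0; clk < 8; clk++) {
        unsigned mosi = (*master_sr >> 7) & 1u;  /* bit leaving the master */
        unsigned miso = (*slave_sr >> 7) & 1u;   /* bit leaving the slave */
        *master_sr = (uint8_t)((*master_sr << 1) | miso);
        *slave_sr  = (uint8_t)((*slave_sr << 1) | mosi);
    }
}
```

After eight clocks the two registers have swapped contents: the byte loaded in parallel on one side has been sent serially and can be read in parallel on the other.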

• Synchronous: an explicit clock signal is sent along with the data signal

SPI Signal Connection Overview

The chip select signal (CS, also called SS) is also referred to as NSS in some documents.

Using the SPI
• Slave mode
  • Decide data frame format (DFF)
  • Select the clock/data relationship (CPOL/CPHA)
  • MSB or LSB first? (LSBFIRST in CR1)
  • Using DMA? (TXDMAEN/RXDMAEN in CR2)
  • Handle the NSS or SSM and SSI bit depending on the mode
  • TI mode protocol? (FRF in CR2)
  • Clear MSTR and set SPE in CR1
  • MOSI is input and MISO is output
  • Transmit
    • Parallel-load data byte into Tx buffer during a write cycle
    • Transfer data from buffer to shift register
  • Receive
    • Transfer data from shift register to Rx buffer and set the RXNE flag
• Master mode
  • Baud rate (BR in CR1)
  • Select the clock/data relationship (CPOL/CPHA)
  • Decide data frame format (DFF)
  • MSB or LSB first? (LSBFIRST in CR1)
  • Handle the NSS or SSM and SSI bit depending on the mode
  • TI mode protocol? (FRF in CR2)
  • Set MSTR and SPE in CR1
  • MOSI is output and MISO is input
  • Transmit
    • Write a byte into Tx buffer
    • Transfer data from buffer to shift register
  • Receive
    • Transfer data from shift register to Rx buffer and set the RXNE flag

I2C COMMUNICATIONS
I2C Bus Overview
• “Inter-Integrated Circuit” bus

• Multiple devices connected by a shared serial bus

• Bus is typically controlled by master device, slaves respond when addressed

• I2C bus has two signal lines • SCL: Serial clock • SDA: Serial data

• Full details available in “The I2C-bus Specification”

I2C Bus Connections

• Resistors pull up lines to VDD

• Open-drain transistors pull lines down to ground

• Master generates SCL clock signal • Can range up to 400 kHz, 1 MHz, or more I2C Message Format

• Message-oriented data transfer with four parts
  1. Start condition
  2. Slave address transmission
     • Address
     • Command (read or write)
     • Acknowledgement by receiver
  3. Data fields
     • Data byte
     • Acknowledgement by receiver
  4. Stop condition

Master Writing Data to Slave

PROTOCOL COMPARISON
Factors to Consider
• How fast can the data get through?
  • Depends on raw bit rate, protocol overhead in packet

• How many hardware signals do we need? • May need clock line, chip select lines, etc.

• How do we connect multiple devices (topology)? • Dedicated link and hardware per device - point-to-point • One bus for master transmit/slave receive, one bus for slave transmit/master receive • All transmitters and receivers connected to same bus – multi-point

• How do we address a target device? • Discrete hardware signal (chip select line) • Address embedded in packet, decoded internally by receiver

• How do these factors change as we add more devices? Protocol Trade-Offs

Protocol | Speed | Signals req. for bidirectional communication with N devices | Device addressing | Topology
UART (point-to-point) | Fast – tens of Mbit/s | 2*N (TxD, RxD) | None | Point-to-point, full duplex
UART (multi-drop) | Fast – tens of Mbit/s | 2 (TxD, RxD) | Added by user in software | Multi-drop
SPI | Fast – tens of Mbit/s | 3+N: SCLK, MOSI, MISO, and one SS per device | Hardware chip select signal per device | Multi-point full-duplex, multi-drop half-duplex buses
I2C | Moderate – 100 kbit/s, 400 kbit/s, 1 Mbit/s, 3.4 Mbit/s. Packet overhead. | 2: SCL, SDA | In packet | Multi-point half-duplex bus
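To see how the signal counts in the table scale as devices are added, the formulas can be written out as a small function. This is only a restatement of the table; the enum `protocol_t` and the function `signals_required` are hypothetical names introduced for the example.

```c
typedef enum {
    UART_POINT_TO_POINT,
    UART_MULTI_DROP,
    SPI_BUS,
    I2C_BUS
} protocol_t;

/* Number of hardware signals the master needs to talk to n_devices
 * peripherals, following the table above. */
unsigned signals_required(protocol_t p, unsigned n_devices)
{
    switch (p) {
    case UART_POINT_TO_POINT: return 2u * n_devices;  /* TxD and RxD per device */
    case UART_MULTI_DROP:     return 2u;              /* shared TxD and RxD */
    case SPI_BUS:             return 3u + n_devices;  /* SCLK, MOSI, MISO + one SS each */
    case I2C_BUS:             return 2u;              /* SCL and SDA */
    }
    return 0u;  /* unknown protocol */
}
```

For four devices this gives 8 signals for point-to-point UART, 7 for SPI, and 2 for multi-drop UART or I2C, which shows why in-packet addressing is attractive when pins are scarce.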