SOMNIUM® DRT Benchmarks Whitepaper DRT v3.4 release : August 2016 www.somniumtech.com

SOMNIUM® DRT is a complete embedded software development environment which supports ARM® Cortex® M devices from major semiconductor vendors.

Vendor                                              Family    Cortex
Atmel (a subsidiary of Microchip Technology Inc.)   SMART     M0+, M3, M4
NXP                                                 Kinetis   M0+, M4
NXP                                                 LPC       M0, M0+, M3, M4
STMicroelectronics                                  STM32     M0, M0+, M3, M4

Other software vendors use adjectives. SOMNIUM use facts. This whitepaper compares benchmarking results for SOMNIUM DRT against other toolchain products to demonstrate that DRT builds the smallest, fastest, most energy efficient code with no source code changes required.

No defeat devices! Unlike many well known software vendors, SOMNIUM play fair and do not put “benchmark special” features in our products to change their behavior in the presence of known tests. We don't alter the benchmark source code, and we always use the same tool options to get an honest and fair comparison.

Small memory footprint for C execution
DRT uses a highly tuned and specifically configured C runtime library with a reduced memory footprint:
• Over 100% smaller ROM usage than GNU Newlib.
• Smaller ROM usage than GNU Newlib Nano.
• Around 100% smaller statically allocated RAM overhead than GNU Newlib Nano and GNU Newlib.
• Unlike vanilla GNU Newlib Nano, memory is statically allocated (where possible) and so RAM usage is easily predicted by build time pass/fail.
• Vastly smaller ROM and RAM usage than NXP RedLib.
• Full support for C++ exception handling.

A simple “empty” C program was used to show the minimum ROM/RAM requirement to establish a C environment:

    int main (int argc, char *argv[])
    {
      while (1) {}
      return 0;
    }

NXP Kinetis examples
NXP provided CMSIS-Core routines (from Kinetis Design Studio) are called before main() to configure the on-chip PLL to highest possible frequency. DRT's savings are significant on small memory devices such as the KL series (starting at 8K ROM, 1K RAM).

Kinetis Cortex M0+ example (KL46Z)
                ROM                                 RAM
                KBytes   DRT saving (% and bytes)   KBytes   DRT saving (% and bytes)
DRT             1.5      n/a      n/a               0.1      n/a      n/a
KDS3 Nano       2.4      59%      908               0.6      314%     440
KDS3 Newlib     3.9      164%     2504              1.5      1009%    1412
CW10.6 Nano     2.5      66%      1004              0.6      317%     444
CW10.6 Newlib   4.5      201%     3064              1.5      1009%    1412

Kinetis Cortex M4 example (K64F)
                ROM                                 RAM
                KBytes   DRT saving (% and bytes)   KBytes   DRT saving (% and bytes)
DRT             1.4      n/a      n/a               0.1      n/a      n/a
KDS3 Nano       2.5      75%      1108              0.6      314%     440
KDS3 Newlib     4.0      178%     2608              1.5      1009%    1412
CW10.6 Nano     2.6      82%      1208              0.6      317%     444
CW10.6 Newlib   4.6      223%     3276              1.5      1009%    1412

NXP LPC examples
DRT leaves more space for your application by making significant memory savings on small memory devices such as the LPC8xx series (starting at 8K ROM, 2K RAM).

LPC Cortex M0+ example (LPC824)
                ROM                                 RAM
                KBytes   DRT saving (% and bytes)   KBytes   DRT saving (% and bytes)
DRT             0.6      n/a      n/a               0.1      n/a      n/a
LPCX Nano       1.0      72%      424               0.1      -3%      -4
LPCX RedLib     2.6      359%     2112              1.1      691%     968

LPC Cortex M4 example (LPC5411)
                ROM                                 RAM
                KBytes   DRT saving (% and bytes)   KBytes   DRT saving (% and bytes)
DRT             0.5      n/a      n/a               0.1      n/a      n/a
LPCX Nano       0.9      91%      424               0.1      -3%      -4
LPCX RedLib     2.5      455%     2112              1.1      691%     968
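The whitepaper does not define how the “DRT saving” percentages are calculated. The published figures are consistent with one reading: the extra bytes the other toolchain needs, divided by the size of the DRT build. That would also explain how a saving can exceed 100%. The sketch below is a minimal illustration under that assumption; the function names are hypothetical and it only handles non-negative savings.

```c
#include <assert.h>

/* ASSUMPTION: "DRT saving" % = (other - DRT) / DRT * 100, rounded to the
   nearest whole percent. This is inferred from the tables, not stated. */
unsigned saving_percent(unsigned other_bytes, unsigned drt_bytes)
{
    assert(drt_bytes > 0 && other_bytes >= drt_bytes);
    unsigned saved = other_bytes - drt_bytes;
    return (saved * 100u + drt_bytes / 2u) / drt_bytes;
}

/* Back out the DRT image size implied by a published pair of
   (saving in bytes, saving in percent), rounded to the nearest byte. */
unsigned implied_drt_bytes(unsigned saving_bytes, unsigned percent)
{
    assert(percent > 0);
    return (saving_bytes * 100u + percent / 2u) / percent;
}
```

For the KL46Z row “KDS3 Nano 59% 908”, implied_drt_bytes(908, 59) gives 1539 bytes, which is about 1.5 KBytes and matches the DRT ROM figure in the same table. Feeding that back in, saving_percent(1539 + 908, 1539) returns 59.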

These memory savings made by DRT can have huge practical impact on the usability of small memory devices. DRT can save development time (by allowing software development in C rather than ARM assembly language), and increase the potential to use smaller memory (and lower cost, lower energy) devices.

STMicroelectronics STM32 examples
We created an “Empty” C program natively in each IDE. DRT saves a huge amount of memory versus SW4STM32 making the use of small memory STM32 devices far more practical.

SOMNIUM and the SOMNIUM logo are registered trademarks of SOMNIUM® Technologies Limited. © 2016 SOMNIUM® Technologies Limited. All other product or service names are the property of their respective owners. SOMNIUM-MS-0049 v3.0d

STM32F0 : Cortex M0+
                ROM                                 RAM
                KBytes   DRT saving (% and bytes)   KBytes   DRT saving (% and bytes)
DRT             0.6      n/a      n/a               0.1      n/a      n/a
SW4STM32        2.1      261%     1544              2.1      1423%    1992

STM32F4 : Cortex M4
                ROM                                 RAM
                KBytes   DRT saving (% and bytes)   KBytes   DRT saving (% and bytes)
DRT             0.8      n/a      n/a               0.1      n/a      n/a
SW4STM32        2.0      164%     1280              2.1      1423%    1992

Industry standard benchmarks
SOMNIUM are members of the EEMBC® Automotive Subcommittee and use their industry standard benchmarks. We believe that in order to be useful, benchmarks should show 3 dimensions: not just performance, but also memory size and energy. It's easy to obtain higher performance by unrolling, inlining and specializing functions to the extreme, but real world systems are memory limited, so this approach makes little sense if you want to get an accurate understanding of real world behaviour. We always measure memory size and performance and, where possible, energy, measured to the µJ using the high accuracy EEMBC EnergyMonitor™ (due to its 29mA current limitation we couldn't use it to perform measurements on all devices).

CoreMark™ is used to demonstrate the usable performance of a processor system running typical algorithms including list processing (to stress test data accesses), matrix manipulation (to stress test mathematical operations), and state machines (to stress test complex control flows).

Atmel SMART results
We compare DRT to the vanilla GNU tools from Atmel Studio. DRT didn't affect RAM usage, but always generated the smallest codesize and the fastest and lowest energy results.

NXP Kinetis results
We compared against the latest KDS3.2 tools and the stable GNU ARM Launchpad 4.9 tools. The vanilla GCC 5x/6x tools used in other commercial products have code generation bugs (not present in DRT) which manifest with the Kinetis software enablement, and so we recommend against their use with Kinetis.

KL02 devices are quite constrained by their memory size and performance. Using DRT increases performance whilst saving significant amounts of ROM, RAM and energy.

KV10 devices have high performance memory systems, with a 16-entry, 4-way set associative flash cache. Even with this high performance hardware, DRT still increases KV10 performance and saves energy whilst reducing codesize.
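As an aside on the energy measurements described under “Industry standard benchmarks”: a figure in µJ is the product of supply voltage, current and run time, and the 29mA limit decides whether a device can be measured at all. The sketch below is a simplified illustration only. It assumes a constant load, whereas a real monitor integrates a varying current, and it assumes the limit is inclusive. All names are hypothetical and not part of any EEMBC tool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Current limit quoted in the text for the EEMBC EnergyMonitor. */
#define MONITOR_LIMIT_MA 29u

/* True if a load drawing current_ma could be powered by the monitor.
   ASSUMPTION: the limit is inclusive. */
bool monitor_can_measure(uint32_t current_ma)
{
    return current_ma <= MONITOR_LIMIT_MA;
}

/* Energy in microjoules for a constant load: E[uJ] = V[V] * I[mA] * t[ms].
   The supply is passed in millivolts to stay in integer arithmetic. */
uint64_t energy_uj(uint32_t supply_mv, uint32_t current_ma, uint32_t time_ms)
{
    assert(monitor_can_measure(current_ma));
    return ((uint64_t)supply_mv * current_ma * time_ms) / 1000u;
}
```

With illustrative inputs of 3000 mV, 10 mA and 100 ms, energy_uj returns 3000 µJ. A faster build lowers time_ms, which is why a performance gain at the same current also shows up as an energy saving.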


K21 devices
DRT significantly improves performance and reduces energy consumption.

NXP LPC results
We compared DRT to the LPCXpresso tools, which use vanilla GCC 5x (similar to the GNU tools used in Atollic TrueSTUDIO); both contain code generation bugs.

LPC8xx devices
LPC824 is an entry level device with no flash cache. DRT slightly improved performance and energy usage whilst producing significant reductions in memory usage.

STMicroelectronics STM32 results
We compared DRT against the SW4STM32 tools and TrueSTUDIO (both of which use the same vanilla GNU GCC 5x tools as free of charge products and contain the same bugs).

STM32F446 has 64-bit wide flash, with a very sophisticated 1KByte flash cache comprising 32 * 32 byte lines and a similar 256byte structure for data accesses to flash, which allow the Cortex M4 processor to run at full speed without being impacted by instruction fetch latency. DRT maintained performance whilst also producing significant memory reductions. Sadly we couldn't measure the energy benefits of DRT as EEMBC EnergyMonitor can't supply enough current to power this MCU.

STM32L053 is an ultra low power device with a very simple flash buffer rather than a cache. DRT can significantly improve its performance and energy behaviour.

“Real world” demonstration systems
It's not just benchmarks! We examined applications which use “real world” middleware including FreeRTOS, Micrium uCOS, code generated using Atmel START and Atmel Software Framework, (Kinetis) Processor Expert, Kinetis SDK v2, Sensor Fusion Library, LPCOpen, STM32CubeMX, and the eGUI graphic library.

Atmel ASF examples
Across the 2,343 unique ASF examples built @ Os (optimize for size), on average DRT consumes 32% less ROM than code generated with the default Atmel Studio GNU tools.

SOMNIUM DRT Cortex-M IDE 3.4 introduces support for Atmel SMART devices and is available on Windows (with MacOSX support coming soon). It is fully integrated with Atmel START for ASF project creation. We also offer SOMNIUM DRT Atmel Studio Extension - a plugin for Atmel Studio 6.2 and Atmel Studio 7. It seamlessly integrates with Atmel's tools and is provided as a new toolchain “flavor”. Using DRT rather than the default GNU tools is a simple press button activity requiring no source code changes or change of environment.

SAMD21freertos_oled1_tickless_xpro_example
A basic FreeRTOS demo from ASF 3.24.21 using SAMD21 (Cortex M0+):

                ROM KBytes   DRT saving   RAM KBytes   DRT saving
“out of box” @ -O1 using Newlib Nano
DRT             17.8         n/a          24.4         n/a
Studio 6.2      19.7         10%          24.4         0%
“size optimized” @ -Os using Newlib Nano
DRT             16.2         n/a          24.4         n/a
Studio 6.2      18.0         11%          24.4         0.0%


Micrium_uCOSII_Led_Blink_SAM4S_Xpro example
A basic Micrium Demo supplied with Studio 6.2 shows the ROM/RAM overhead on a SAM4S (Cortex M4):

                ROM KBytes   DRT saving   RAM KBytes   DRT saving
“out of box” @ -O1 using Newlib
DRT             12.0         n/a          17.9         n/a
Studio 6.2      22.8         90%          20.0         12%
“size optimized” @ -Os using Newlib Nano
DRT             11.5         n/a          17.9         n/a
Studio 6.2      12.3         7%           18.0         0.1%

SAMB11 PXP demo uses a BLE software stack supplied as a prebuilt library in ROM, which can't be further optimized by DRT. Despite this library forming over 70% of the codesize of this demo, DRT still provides significant memory savings. This shows how DRT can leave more memory available for your application in resource constrained devices.

                ROM KBytes   DRT saving   RAM KBytes   DRT saving
“size optimized” @ -Os using Newlib Nano
DRT             35.7         n/a          7.7          n/a
Studio 7        38.2         7%           7.8          0.2%

NXP Kinetis examples
We examined a number of examples of NXP's Kinetis software enablement which is supported for CodeWarrior and KDS. Some Kinetis components (such as CMSIS) are also supported for IAR tools, but several key items of software enablement (including Sensor Fusion and Intelligent Sensor Framework) are only provided for GNU compatible tools (CodeWarrior and KDS). As DRT is fully GNU compatible it supports all NXP Kinetis software enablement without requiring any source code changes.

“Attach” is a Freescale produced demonstration program, originally written using CodeWarrior 10.6, then ported by Freescale to KDS2.

Attach V1: only fits when built with DRT

Attach V1 Demo @ Os : doesn't fit with KDS3 Nano
                ROM KBytes   DRT saving   RAM KBytes   DRT saving
DRT             123.5        n/a          13.5         n/a
KDS3 Nano       128.4        4%           13.4         0%
KDS3 Newlib     137.3        11%          15.4         15%

No KDS configurations (KDS2, KDS3, Newlib or Newlib Nano) result in an executable which fits within the MKL26Z128 device (ROM usage is too large, and RAM is almost full even before heap and stack are allocated!)

Attach V2: reduced features to fit with KDS3
Freescale made significant source modifications such that when built with KDS3 Newlib Nano, Attach V2 “just” fits into the available ROM and RAM. When built with DRT, both Attach V1 and V2 fit into the available ROM and RAM. No source code changes were required, no features were removed, and DRT made these savings automatically.

Attach V2 Demo @ -Os : features cut so KDS3 Nano just fits
                ROM KBytes   DRT saving   RAM KBytes   DRT saving
DRT             122.0        n/a          14.9         n/a
KDS3 Nano       127.0        3%           14.9         0%
KDS3 Newlib     144.8        17%          17.0         26%

NXP Sensor Fusion Library
The Sensor Fusion Library and its example programs are supplied as KDS3 source projects. We used DRT's automatic KDS project importer/convertor to avoid the need for any manual steps (this isn't possible with Atollic's tools which require manual conversion so we don't quote Atollic results).

KL46Z Sensor Fusion
                ROM                                 RAM
                KBytes   DRT saving (% and bytes)   KBytes   DRT saving (% and bytes)
“Out of box” @ O3
DRT             39.7     n/a      n/a               7.1      n/a      n/a
KDS3 Nano       47.9     21%      8448              7.1      0%       4
CW10.6 Nano     43.2     9%       3617              6.9      -2%      -158
Optimized for size @ Os
DRT             34.9     n/a      n/a               7.1      n/a      n/a

This demonstration shows that “off the shelf” Kinetis software enablement ports to DRT with no changes required and uses significantly less ROM built with DRT. When options are modified to optimize for size, DRT still produces the smallest executable.

NXP LPC examples
We examined a number of examples of NXP's LPC software enablement for LPCXpresso which make use of LPCOpen components.

LPCOpen FreeRTOS blinky demo @ -Os
A basic FreeRTOS demo from LPCOpen showing the basic overhead required for LPCOpen & FreeRTOS. DRT's ROM savings show how using DRT frees up valuable memory, leaving space for your application.

                ROM KBytes   DRT saving   RAM KBytes   DRT saving
DRT             6.0          n/a          0.4          n/a
LPCX Nano       6.5          8%           0.4          0%
LPCX RedLib     7.0          16%          0.6          50%
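The Attach V1 and V2 results on this page come down to a capacity check against the MKL26Z128 device. The sketch below is a minimal illustration, assuming 128 KBytes of flash and 16 KBytes of SRAM for that part; these capacities are inferred from the part name and are not stated in this whitepaper. The function names are hypothetical, and the RAM figures exclude heap and stack, as the text notes.

```c
#include <assert.h>
#include <stdbool.h>

/* Table values have one decimal place, so usage is held in tenths of a
   KByte (123.5 KBytes -> 1235). Capacity is in whole KBytes. */
bool fits(unsigned used_tenths_kb, unsigned capacity_kb)
{
    assert(capacity_kb > 0);
    return used_tenths_kb <= capacity_kb * 10u;
}

/* An image fits only if both its ROM and its RAM requirement fit. */
bool image_fits(unsigned rom_tenths_kb, unsigned ram_tenths_kb,
                unsigned flash_kb, unsigned sram_kb)
{
    return fits(rom_tenths_kb, flash_kb) && fits(ram_tenths_kb, sram_kb);
}
```

Under those assumed capacities, image_fits(1235, 135, 128, 16) is true for Attach V1 built with DRT, while image_fits(1284, 134, 128, 16) is false for KDS3 Nano. For Attach V2, image_fits(1270, 149, 128, 16) is true, which matches the “just fits” description.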


LPC824 QuickJack demo
"The flexible Smartphone Quick-Jack Solution adapts the standard 3.5 mm stereo audio jack found on most iOS or Android smart devices into a self-powered data port and provides a universal interface for external sensors, switches and other external devices. It gives both consumer and industrial product designers a simple, plug-and-go way to get data into an endless variety of control, monitoring, data collection, maintenance, medical and even fun applications."

                ROM KBytes   DRT saving   RAM KBytes   DRT saving
DRT             3.2          n/a          0.2          n/a
LPCX Nano       3.4          7%           0.2          7%
LPCX RedLib     5.0          57%          0.5          202%

Built with DRT this demo is almost half the size of the demo when built with LPCXpresso using RedLib.

Gesture recognition demo
The LPC824 Touch Library API is used to drive and scan capacitive sensors, process their signal data, filter out noise and provide touch data to a gesture recognition application. The Touch Library accounts for ~45% of this demo's ROM usage and is supplied with LPCOpen as a prebuilt library (as it's prebuilt it can't be optimized by DRT). DRT still provides significant memory savings compared to vanilla GNU tools used in LPCXpresso and TrueSTUDIO.

                ROM KBytes   DRT saving   RAM KBytes   DRT saving
DRT             10.3         n/a          3.4          n/a
LPCX Nano       11.0         6%           3.4          0%
LPCX RedLib     12.2         17%          3.8          10%

STMicroelectronics STM32 examples
We tested DRT against the demo examples built with SW4STM32 and TrueSTUDIO.

STM32F4 “Animated picture from SDCard” demo
DRT uses significantly less ROM and RAM than other tools.
                ROM                                 RAM
                KBytes   DRT saving (% and bytes)   KBytes   DRT saving (% and bytes)
DRT             33.2     n/a      n/a               8.5      n/a      n/a
SW4STM32        43.1     30%      10080             9.2      8%       676
TrueSTUDIO      33.7     1%       456               9.1      6%       560

STM32F0: "No name" image browser from SDcard demo
DRT uses significantly less ROM and RAM than SW4STM32.
                ROM                                 RAM
                KBytes   DRT saving (% and bytes)   KBytes   DRT saving (% and bytes)
DRT             53.5     n/a      n/a               24.3     n/a      n/a
SW4STM32        80.5     50%      27620             26.5     9%       2192
TrueSTUDIO      55.1     3%       1676              24.4     0%       52

Software manifest
Vendor          Product                       Version
SW4STM32        System Workbench for STM32    1.8.0
Atmel           Studio                        6.2 and 7.0
Atmel           Software Framework            3.30.1
Atollic         TrueSTUDIO for ARM            6.0.0
EEMBC           Energy Monitor                1.1.3
NXP/Freescale   Codewarrior for MCUs          10.6
NXP/Freescale   Kinetis Design Studio         3.2
NXP/Freescale   LPCXpresso                    8.2.0
SOMNIUM         DRT Cortex-M IDE              3.4
SOMNIUM         DRT NXP Edition               3.4
SOMNIUM         DRT Atmel Studio Extension    3.4

Not all tools are equal
Free of charge tools use vanilla GNU and Eclipse tools. Many commercial products freeload on the same vanilla tools and offer no benefits over free products.

SOMNIUM are experts in GNU and Eclipse internals and maintain GNU tools for major semiconductor vendors. We are NXP Proven Partners and produced Kinetis Design Studio for Freescale.

Summary
DRT is the ideal upgrade path from entry level tools, with unique features to generate the best code quality and save development time and money. DRT's device-aware code generation builds the most efficient program for your choice of Cortex-M device, with no source code changes required. DRT IDEs also include leading edge debug and trace features to improve productivity and help you reach market faster.

DRT is available in three flavors.

DRT                      MCU support                                               IDE
Atmel Studio Extension   Atmel SMART                                               Windows hosted Atmel Visual Studio
Cortex-M IDE (1)         Atmel SMART, NXP Kinetis & LPC, STMicroelectronics STM32  Windows & Linux hosted, Eclipse IDE with debug and trace

(1) SOMNIUM NXP Edition IDE also available (only supporting NXP devices)

Trial and buy
Find out more and get a free of charge fully featured trial of DRT from the SOMNIUM portal:
www.somniumtech.com/product-selector

Watch video tutorials and demonstrations at:
www.youtube.com/c/somniumtech
