SOMNIUM® DRT Benchmarks Whitepaper DRT 4.0 Release : March 2017
www.somniumtech.com

SOMNIUM and the SOMNIUM logo are registered trademarks of SOMNIUM® Technologies Limited. 2016 - 2017 SOMNIUM® Technologies Limited. All other product or service names are the property of their respective owners. SOMNIUM-MS-0049 v5.0b

SOMNIUM® DRT is a complete C/C++ embedded software development environment which supports ARM® Cortex® M devices from leading semiconductor vendors.

Vendor               Family    Cortex
Microchip            SAM       M0+, M3, M4
NXP                  Kinetis   M0+, M4
NXP                  LPC       M0, M0+, M3, M4
STMicroelectronics   STM32     M0, M0+, M3, M4

Other software vendors use adjectives. SOMNIUM use facts. This whitepaper compares benchmarking results for SOMNIUM DRT against other toolchain products to demonstrate that DRT builds the smallest, fastest, most energy-efficient code with no source code changes required.

No defeat devices! Unlike many well known software vendors, SOMNIUM play fair and do not put "benchmark special" features in our products to change their behavior in the presence of known tests. We don't alter the benchmark source code, and we always use the same tool options to get an honest and fair comparison.

Small memory footprint for C execution

DRT uses a highly tuned and specifically configured C runtime library with a reduced memory footprint:

- Over 100% smaller ROM usage than GNU Newlib.
- Smaller ROM usage than GNU Newlib Nano.
- Around 100% smaller statically allocated RAM overhead than GNU Newlib Nano and GNU Newlib.
- Unlike vanilla GNU Newlib Nano, memory is statically allocated (where possible) and so RAM usage is easily predicted by a build-time pass/fail.
- Vastly smaller ROM and RAM usage than NXP RedLib.
- Similar/smaller ROM usage to IAR.
- Full support for C++ exception handling within the small memory footprint library.

A simple "empty" C program was used to show the minimum ROM/RAM requirement to establish a C environment:

    int main (int argc, char *argv[]) {
        while (1) {}
        return 0;
    }

These memory savings made by DRT can have a huge practical impact on the usability of small memory devices. DRT can save development time (by allowing software development in C rather than ARM assembly language), and increase the potential to use smaller memory (and lower cost, lower energy) devices.

Microchip SAM examples

Atmel START was used to create an "empty" C program including Atmel Software Framework (ASF) routines to configure the on-chip PLL to the highest possible frequency on a SAMD21 device (32KByte ROM, 4KByte RAM). IAR Embedded Workbench uses less memory than vanilla GNU tools from Atmel Studio; DRT does even better and uses the smallest amount of ROM.

Toolchain       ROM      DRT is smaller     RAM      DRT is smaller
                KBytes   (% and bytes)      KBytes   (% and bytes)
DRT             1.5      n/a      n/a       8.7      n/a      n/a
Atmel STUDIO    1.8      24.1%    360       8.7      0.0%     0
IAR             1.5      2.4%     36        8.6      -0.4%    -32

NXP Kinetis examples

We used NXP's Kinetis SDK v2 (now renamed as MCUXpresso SDK) tools to create an "empty" C program (including use of Kinetis SDK v2 routines to enable the on-chip PLL) for a KL25Z device. DRT saves a significant amount of ROM compared to both vanilla GNU tools and IAR tools.

Kinetis Cortex M0+ KSDK v2 example (KL25Z)

Toolchain       ROM      DRT is smaller     RAM      DRT is smaller
                KBytes   (% and bytes)      KBytes   (% and bytes)
DRT             3.7      n/a      n/a       2.5      n/a      n/a
KDS 3           6.7      79.8%    3056      2.5      0.0%     0
IAR             5.5      48.4%    1852      2.1      -16.4%   -420

Before KSDK v2 (now renamed MCUXpresso SDK), NXP's software enablement strategy was focussed on using the CMSIS-Core standard to enable the on-chip PLL before calling main(). We used CMSIS-Core GNU sources for an "empty C" program to compare tools. Once again DRT uses the smallest amount of ROM and RAM. These savings are very significant on small memory devices such as the KL series (starting at 8K ROM, 1K RAM).

Kinetis Cortex CMSIS-Core M4 example (K64F)

Toolchain       ROM      DRT is smaller     RAM      DRT is smaller
                KBytes   (% and bytes)      KBytes   (% and bytes)
DRT             0.7      n/a      n/a       0.3      n/a      n/a
KDS 3 Nano      2.5      250%     1840      0.6      101%     292
KDS3 Newlib     4.0      454%     3340      1.5      439%     1264
CW10.6 Nano     2.6      264%     1940      0.6      103%     296
CW10.6 Newlib   4.6      545%     4008      1.5      439%     1264
CW10.6 EWL      4.9      583%     4292      4.1      1344%    3872
TrueSTUDIO      0.9      22%      160       1.0      268%     772

Kinetis Cortex M0+ CMSIS-Core example (KL46Z)

Toolchain       ROM      DRT is smaller     RAM      DRT is smaller
                KBytes   (% and bytes)      KBytes   (% and bytes)
DRT             0.6      n/a      n/a       0.3      n/a      n/a
KDS3 Nano       2.4      317%     1852      0.6      101%     292
KDS3 Newlib     3.9      590%     3448      1.5      439%     1264
CW10.6 Nano     2.5      334%     1948      0.6      103%     296
CW10.6 Newlib   4.5      686%     4008      1.5      439%     1264
CW10.6 EWL      5.0      784%     4576      4.1      1340%    3860
TrueSTUDIO      0.6      -3%      -20       0.3      1%       4

NXP LPC examples

NXP's LPCXpresso did not provide a standalone tool to create LPC projects (like Kinetis Expert), so we took the LPCOpen "periph_blink" demo and removed the body code from main().

DRT leaves more space for your application by making significant memory savings on small memory devices such as the LPC8xx series (starting at 4K ROM, 1K RAM).

LPC Cortex M0+ example (LPC824)

Toolchain       ROM      DRT is smaller     RAM      DRT is smaller
                KBytes   (% and bytes)      KBytes   (% and bytes)
DRT             2.9      n/a      n/a       0.0      n/a      n/a
LPCX Nano       3.6      24%      720       0.1      1600%    128
LPCX RedLib     4.8      64%      1924      0.5      5800%    464
TrueSTUDIO      3.2      9%       284       0.0      0%       0

LPC Cortex M4 example (LPC54114)

Toolchain       ROM      DRT is smaller     RAM      DRT is smaller
                KBytes   (% and bytes)      KBytes   (% and bytes)
DRT             4.1      n/a      n/a       0.0      n/a      n/a
LPCX Nano       4.5      9%       364       0.0      0%       0
LPCX RedLib     5.8      40%      1712      0.3      2800%    336
TrueSTUDIO      LPC54114 not supported in TrueSTUDIO

STMicroelectronics STM32 examples

We used STM32CubeMX to create an "empty C" program using the HAL to enable clocks and PLL, perform default initialization of GPIO, etc. DRT's savings are significant on low-memory devices such as the STM32L0 (8K ROM).

STMicroelectronics Cortex M0+ example (STM32L011D3)

Toolchain       ROM      DRT is smaller     RAM      DRT is smaller
                KBytes   (% and bytes)      KBytes   (% and bytes)
DRT             2.7      n/a      n/a       1.5      n/a      n/a
SW4STM32        3.0      13%      348       1.5      0%       4
TrueSTUDIO      3.0      12%      316       1.5      0%       0
IAR             2.7      1%       16        1.5      -2%      -36

STMicroelectronics Cortex M4 example (STM32F446RE)

Toolchain       ROM      DRT is smaller     RAM      DRT is smaller
                KBytes   (% and bytes)      KBytes   (% and bytes)
DRT             3.3      n/a      n/a       1.5      n/a      n/a
SW4STM32        3.4      3%       104       1.5      0%       4
TrueSTUDIO      3.4      2%       76        1.5      0%       0
IAR             3.5      6%       204       1.5      -2%      -36

We also tried creating an empty project using each IDE's NPW and compared results. DRT's memory savings have a significant impact on the use of small memory STM32 devices, especially compared to SW4STM32.

STM32F0 : Cortex M0+

Toolchain       ROM      DRT is smaller     RAM      DRT is smaller
                KBytes   (% and bytes)      KBytes   (% and bytes)
DRT             0.5      n/a      n/a       0.1      n/a      n/a
SW4STM32        0.7      42%      200       0.0      -71%     -96
TrueSTUDIO      0.5      17%      80        0.2      21%      28

STM32F4 : Cortex M4

Toolchain       ROM      DRT is smaller     RAM      DRT is smaller
                KBytes   (% and bytes)      KBytes   (% and bytes)
DRT             0.7      n/a      n/a       0.1      n/a      n/a
SW4STM32        0.7      0%       0         0.0      -71%     -96
TrueSTUDIO      0.9      36%      244       0.0      -74%     -100

Industry standard benchmarks

SOMNIUM are members of the EEMBC® Automotive Subcommittee and use their industry standard benchmarks.

We believe that in order to be useful, benchmarks should show three dimensions: not just performance, but also memory size and energy. It's easy to obtain higher performance by unrolling, inlining and specializing functions to the extreme, but real world systems are memory limited, so this approach makes little sense if you want to get an accurate understanding of real world behaviour. We always measure memory size and performance and, where possible, energy, measured to the µJ using the high accuracy EEMBC EnergyMonitor™ (due to its 29mA current limitation we couldn't use it to perform measurements on all devices).

CoreMark™ is used to demonstrate the usable performance of a processor system running typical algorithms including list processing (to stress test data accesses), matrix manipulation (to stress test mathematical operations), and state machines (to stress test complex control flows).

Microchip SAM results

We compare DRT to the vanilla GNU tools from Atmel Studio and Atollic TrueSTUDIO's recent vanilla GNU tools. DRT didn't affect RAM usage, but always generated the smallest code and the fastest and lowest energy results.

KV10 devices use an M0+ processor and have high performance memory systems, with a 16-entry, 4-way set-associative flash cache. Even with this high performance hardware, DRT significantly improves KV10 performance and energy whilst reducing codesize.
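
A note on reading the "DRT is smaller (% and bytes)" columns in the memory tables: the figures appear to express the byte difference as a percentage of DRT's own size rather than of the competing toolchain's size, which is why values above 100% occur. The whitepaper does not state the formula, so the sketch below is an inference from the published rows, and the helper names are hypothetical.

```c
#include <assert.h>

/* Assumed convention (inferred from the tables, not stated in the paper):
   percentage = (other - DRT) / DRT * 100. */
static double drt_smaller_percent(long drt_bytes, long other_bytes)
{
    assert(drt_bytes > 0);
    return 100.0 * (double)(other_bytes - drt_bytes) / (double)drt_bytes;
}

/* Inverse of the above: the DRT size implied by one table row,
   given the byte difference and the percentage printed in that row. */
static double implied_drt_bytes(long diff_bytes, double percent)
{
    assert(percent != 0.0);
    return 100.0 * (double)diff_bytes / percent;
}
```

As a consistency check, the KL25Z row for KDS 3 (3056 bytes, 79.8%) gives an implied DRT image of roughly 3830 bytes, which is about 3.7 KBytes if a KByte is taken as 1024 bytes, in line with the DRT ROM figure in that table. Under this convention a negative value means the other toolchain was smaller than DRT.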
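
For readers unfamiliar with energy figures quoted in µJ, the sketch below shows one generic way to turn sampled supply current into an energy value, together with a check against a current ceiling such as the 29mA limit mentioned for the EEMBC EnergyMonitor. It is not a description of how the EnergyMonitor works internally; the function names, the fixed supply voltage and the fixed sampling interval are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Rectangle-rule estimate of energy from evenly spaced current samples.
   Units: mA * V * ms = microjoules. All parameters are hypothetical inputs. */
static double energy_uj(const double *current_ma, size_t n,
                        double supply_v, double dt_ms)
{
    assert(current_ma != NULL || n == 0);
    double sum_ma = 0.0;
    for (size_t i = 0; i < n; ++i) {
        sum_ma += current_ma[i];
    }
    return sum_ma * supply_v * dt_ms;
}

/* Returns 1 if every sample is at or below the limit, 0 otherwise. */
static int within_current_limit(const double *current_ma, size_t n,
                                double limit_ma)
{
    assert(current_ma != NULL || n == 0);
    for (size_t i = 0; i < n; ++i) {
        if (current_ma[i] > limit_ma) {
            return 0;
        }
    }
    return 1;
}
```

Three samples of 10, 20 and 10 mA at 3.0 V taken 1 ms apart give 120 µJ under this estimate. A device whose current exceeds the limit at any sample would fail the second check, which mirrors why some devices could not be measured.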