<<

arXiv:1908.03638v1 [cs.CR] 9 Aug 2019 smr n oeebde eie r once oteInter- the to connected are devices embedded more and more As opeette rmbigcmrmsd h eoyprotecti memory the compromised, being from them prevent To oe oteItre,weeeeyn a anhatc remo attack launch can everyone where Internet, e no the are devices to that embedded posed more assumed and more is as changed it Unfortunat has landscape system. is, the That penetrate actively were environment. could tests adversary benign these a However, decades. in pe past ducted were the systems s in such tested of communication robustness tently vehicle and and reliability plants The enviro tems. industrial closed as a in such operating ment, been long have systems Embedded INTRODUCTION 1 systems. buil embedded to suggestions secure technical provide inspire and can revi design also MPU findings We latest situation. our the hope change We to solutions favored. remedial not new to is reasons fundamental MPU the why parti pinpoint plain in to try (RTOSs), ado and systems FreeRTOS, MPU operating the the real-time investigate major we in work, tion this In perform responsiveness. decreased and of expense the at enhancement vi security brings no MPU that found we ar importantly, more as reasons issues, such bility The reasons surpr products. non-technical are real-world our there To by While defenders. used multi-fold. the seldom is for lunch MPU free the a t become has devices, to many on tential available readily is which (MPU), unit nra rdcs lhuhteMUhsbe rudfrmr th more for around been has MPU the seldom although products, is similar real technique in implemented this and found we design Unfortunately, this hardware. followed FRAM MSP430 prom MCU as a Other is vulnerabilities. it memory Therefore, to compromised. technique is region mitigation code memory of sensitive piece a certain case extens safeguards in security that low-cost MCUs a ARM is to MPU The me (MPU). the unit proposes c protection MCU), leading (or a unit ARM, side, le defense for the privilege designer On same program. core the the at of run that libraries third-party co many especially systems, that these of security the to pose threat errors great memory C/C++, as such languages programming tem ul estse adiscr)dvcsaeepsdt misc to exposed are (IoT), devices insecure) Internet-of-Things (and of tested less emergence ously the to leading net, ABSTRACT ic meddssesaetpclyporme sn sys- using programmed typically are systems embedded Since odMtv u a ein h R P a eoean Become Has MPU ARM Why Design: Bad but Motive Good ainlCmue ewr nrso rtcinCenter, Protection Intrusion Network Computer National olg fIfrainSine n ehooy The Technology, and Sciences Information of College nvriyo hns cdm fSine,China Sciences, of Academy Chinese of University enyvnaSaeUiest,USA University, State Pennsylvania e Zhou Wei egLiu Peng ucs nEbde Systems Embedded in Outcast nsidering l,this ely, ompati- more d wthe ew rtually reants. such s epo- he e as vel previ- mory cular, ising used ance con- rsis- tely. ion hip ise, ex- ys- on an n- p- x- a e s ehp ofidotmjrraost xli h h P a not has MPU the why explain to reasons major out find to hope We hc r otue ycretITMU nyspot8memory 8 support only MCUs IoT current by used most are which R stelae nMUdsg.AMhsdvlpdqiea quite developed has ARM design. MCU in leader the is ARM egtacs oto.I lospiiee otaet defi to privileged allows It control. access weight hnargo fabtaysz srqie,svrlsmaller several required, is size arbitrary of region a when 1 ino aydvcs u oki oiae yti observati this by motivated is ado work wide Our has devices. and many architecture) on tion ARMv4t the (since decades two nMUcnspot81 eoyrgos(Cortex-M0+/M3/M4 regions regions memory 8-16 support can implement MPU the an on depending First, manual. architecture M/R ai iiainmcaim h P a enpoie o li for provided been has space MPU address the same mechanism, the mitigation access basic and level privilege same instr the the at all default, By support. feature security limited market. em IoT deeply MCU, and the systems, in ded popular app very industrial are they and Therefore, real-time tions. for are silicon efficiency lower high-energy much a much have p to designed Cortex-M usually and are cessors Cortex-R them, Among applications. correspo different series)) (Cortex-M Processors se P (Cortex-R Micro-controller Processors Application Real-time (i.e., series), (Cortex-A products sors different of number DESIGN MPU THE 2 overhea additional introducing complexity. n programming than provides embed virtually and other MPU IoT benefit the the that security in found We leader market. a system FreeRTOS, em-ded on on preliminary results This MPU our it. of reports support usage that the systems investigate operating we bedded so, do To popular. been eitr(BAR) – registers Register two ea configure to to need caching) Developers and them. (ordering attributes (read/writ memory permission and access ecution) memory assign and regions ory aet eue orahtetre ie eod oadflexibl add to Second, size. target the reach to Th two. used of be power 2) to and have bytes, 32 Reg least size. at its 1) of be multiple must of it address size, an the at start must it dress, adal.TeMUhsbe upre nmi-temARM seri main-stream Cortex-R in all and supported Cortex-M0+/M3/M4/M7 been including has MCUs, MPU The g HardFault. processor a the permissions, access I the registers. violates these access access ory can code privileged only that Note blog/posts/arm-cortex-m3-processor-the-core-of-the- ainlCmue ewr nrso rtcinCenter, Protection Intrusion Network Computer National eateto optrSine nvriyo Georgia, of University Science, Computer of Department https://community.arm.com/developer/ip-products/pro h rgamn oe fMUi ecie nteARMv7- the in described is MPU of model programming The u oiscs n oe-ffiin ein Rv-/ have ARMv7-M/R design, power-efficient and cost its to Due nvriyo hns cdm fSine,China Sciences, of Academy Chinese of University 1 .Ec einsol eaind eadn h tr ad- start the Regarding aligned. be should region Each ). and uigZhang Yuqing aeAtiueSz eitr(BASR) Register Attribute/Size Base eGuan Le USA iot cessors/b/processors-ip- aeAddress Base cin run uctions is and ries) dn to nding enerates emem- ne mem- a f regions arding and d roces- ation, sa As . and a work hof ch e/ex- bed- lica- ght- es. on. ro- us, p- o y - . Table 1: Popular RTOSs MPU adoption Table 2: of MPU-enabled FreeRTOS on ARMv7- M3/4 RTOS MPU support Open source Permission Access Region No. Base Size Usage FreeRTOS Optional Open-source Mode Attributes Privilege rx ARM Mbed Mandatory Open-source 0 0x0 2GB Code Segment User rx Nucleus 3.X Mandatory Proprietary Privilege rx 1 0x0 24kB Kernel Keil RTX None Proprietary User ∅ Privilege rwx 2 0x20000000 512B Kerneldata None Open-source User rx Privilege rw ThreadX None Proprietary 3 0x40000000 2GB Standard Peripherals User rw TinyOS None Open-source Privilege rwx 4 Stack Bottom Stack Depth User Stack TI-RTOS None Open-source User rwx Privilege 5-7 User-defined User-defined User-defined User-defined uC/OS-II Optional Proprietary User VxWorks Optional Proprietary Memory regions which are unmapped by any MPU region falls in the default region, which can accessed in privileged mode only. each region greater than 256 bytes can be divided into eight equally FreeRTOS APIs as the second MPU region. It is set to be read-only sized sub-regions. This is achieved by configuring the sub-region in privileged mode, and inaccessible in unprivileged mode. disable field (SRD) in BASR. Note that each sub-region can be indi- 2. The kernel maintained data (e.g., current control block) are vidually activated, but all still have the same permissions. Third, re- located in a separated 512B region in RAM which is readable or gions can overlap, and higher numbered regions have precedence. writable only in privileged mode by kernel. There is a default region with priority -1 that maps the entire phys- 3. By default, standard peripherals (e.g., UARTs) could be read ical memory. When enabled, it sets default access permissions for or written in unprivileged mode. privileged mode. Unprivileged mode must be explicitly set to grant 4. Tasks can be created to run in either privileged mode or un- any permissions. privileged mode (user mode). A separated RAM region is reserved Adoption. Taking advantage of the MPU, embedded systems can for each user mode task and is inaccessible from any other tasks. restrict access privilege of untrusted or error-prone code, isolate A Privileged mode task can set itself into User mode, but once in critical services and therefore mitigate the consequences caused by User mode it cannot set itself back to Privileged mode. memory errors. Unfortunately, based on our our investigation, the 5. The remaining three regions can be defined by user mode situation is very discouraging. In Table 1, we list the adoption of tasks individually. MPU in popular market-leading RTOSs. ARM Mbed and Nucleus 6. A like system peripherals (e.g., system timer, 3.X have integrated the MPU in their own system design in the Nested Vectored Controller (NVIC), MPU registers, etc.) first place. While others, including FreeRTOS and ThreadX, add the in private peripheral (PPB) space which are not mapped by MPU support optionally. Other RTOSs such as Contiki, Keil RTX, any MPU region falls in the default region, which can accessed in and TinyOS do not support the MPU at all. What concerns most is privileged mode only. that we found that real devices rarely use MPU-enabled variants. For example, although FreeRTOS has MPU port, most manufactur- 4 PITFALLS OF MPU-ENABLED FREERTOS ers choose the non-MPU version. This is evidenced by the fact that We found that the protection implemented in MPU-enable FreeR- Web Services (AWS), which took stewardship of the FreeR- TOS is incomplete and can be bypassed easily. Moreover, the intro- TOS kernel in 2017, only continues to integrate its IoT libraries to duced overhead makes it unsuitable in many scenarios. the non-MPU version of FreeRTOS. Vulnerable Memory Isolation The aforementioned memory iso- lation can be easily bypassed. Since FreeRTOS APIs are located 3 MPU USAGE IN FREERTOS in a region that can only be accessed in privileged mode, a task Cooperating with the world’s leading chip companies, the FreeR- has to raise its privilege temporarily if it needs to invoke a ker- TOS has become a market leading RTOS and the de-facto stan- nel API. This is achieved by the xPortRaisePrivilege function dard solution for many micro-controllers and small microproces- (vPortResetPrivilege to drop the privilege). For example, as shown sors in the past 15 years. Therefore, in this work, we start with in listing 1, if a user mode task needs memory from the heap, it has studying MPU-enable FreeRTOS. Note that the MPU is only used to invoke MPU_pvPortMalloc which uses xPortRaisePrivilege in the MPU-enabled version of FreeRTOS 2 rather than the widely function to to privilege mode before invoking the FreeRTOS adopted normal version. In the MPU-enabled version, FreeRTOS kernel function pvPortMalloc. On the completion of pvPortMalloc, is revised in the following ways according to the memory map as it needs to drop the privilege by invoking vPortResetPrivilege. shown in Table 2. Since firmware binaries embed all the static compiled program 1. The code segment was configured to be read-only in case as well as FreeRTOS AIPs, function xPortRaisePrivilege can code tampering. In additional, first 24K of flash is reserved for core be easily located via reverse-engineering. Once attacker is able to carry out control flow hijacking attack (e.g., by exploiting a mem- 2 https://www.freertos.org/FreeRTOS-MPU-memory-protection-unit.html ory error) on any user mode task, the hijacked task can directly escalate privilege via invoking xPortRaisePrivilege and never API has to go through a full privilege switch. Since kernel API is fre- drop the privilege. With the escalated privilege, the hijacked task quently invoked in tasks, this poses significant impact on real-time can access any resources on the device. performance. Our experiment shows that one thousand privilege To verify our observation, we artificially built a firmware with switch takes 3.5ms in average on MPS2+ FPGA prototyping sys- a stack overflow bug, and then exploited this bug to launch a clas- tem broad (AN386) with 25MHZ CPU clock frequency. In a previ- sical control flow hijacking attack. Specifically, the original return ous research [4], similar result was obtained. In addition, the MPU address on the stack is overwritten by the address of the function regions of each task is different from each other, so MPU regions xPortRaisePrivilege. As a result, the executing privilege has have to be reconfigured during task switch, which will also cause been escalated. Combining more sophisticated ROP programming, time delay. we were able to take over the control flow with elevated privilege. This has been verified by our experiments and others [7]. Ironically, FreeRTOS intends to leverage MPU to protect kernel code from 5 WHY THE MPU HAS BECOME AN memory errors. However, improper protection to xPortRaisePrivilege OUTCAST itself deconstructs the boundary between the task code and kernel code. In other words, if there is a memory error, the added security There are multiple reasons that make MPU less attractive. We sum- can be bypassed completely. marizes the most important ones based on our observations/exper- iments. void *MPU_pvPortMalloc( size_t xSize ) { Non-technical Reasons. First, IoT devices are low-cost energy- void *pvReturn; efficient devices. If more transistors are reserved for complex secu- BaseType_t xRunningPrivileged = xPortRaisePrivilege(); rity features, not only the price of SoC could be raised accordingly, pvReturn = pvPortMalloc( xSize ); vPortResetPrivilege( xRunningPrivileged ); but also increased power consumption rules out many applications return pvReturn; in which thermal design power (TDP) concerns. } Second, as IoT business continues to grow, manufacturers are Listing 1: pvPortMalloc function in MPU-enable FreeRTOS facing increased time-to-market pressure. Although security is a concern, manufacturers tend to reuse existing code base, which is obsolete and less tested on the Internet. At the same time, IoT ap- Conflict with Exiting System Design Second, due to MPU in- plications are becoming more and more Moreover, developing new tegration, some existing mechanisms are diminished or even be- software leveraging MPU may cause compatibility issues. This is come incompatible. For example, user mode tasks cannot use dy- clearly shown in the case of MPU-eabled FreeRTOS and other de- namic queues because there is no between any two veloper forums [3]. In summary, if existing code works, few com- tasks. To overcome this, a task has to allocate memory statically panies are willing to harm the profit by investing on security. and shares it with the peers by configuring an MPU region. Note Technical Reasons. The MPU is a trimmed down version of MMU. that each peer needs an same MPU region for each queue. In ad- Can we directly borrow the design of MMU to replace MPU? We dition, is also a special kind of queue (Its queue length believe this is infeasible due to two reasons. Except for the afore- is one). As a result, if a task needs multiple queues or semaphore, mentioned cost issue, MPU actually benefits very little from ad- MPU resources soon become exhausted (there are only three free vanced features available on MMU. For example, paging which is MPU regions for user mode tasks to use). the underlying technology of , is used in a batch Incomplete Protection The MPU-enable FreeRTOS is coarse-grained of security solutions. Whereas in MCU, paging is an overkill be- and inflexible. First, standard peripherals are not protected by de- cause the RAM in an MCU never exceeds several megabyte. The fault. Although there are three remaining MPU regions can be benefit of virtual memory is substantially diminished. Supporting configured individually by each user mode task, they are not suit- paging or not becomes a dilemma and ARM obviously chooses not able for protecting several separated small peripheral regions due to support it. to the alignment problem and limited number of MPU regions as Incorporating security checking for each memory access and mentioned in Section 2. For instance, the memory map for Audio frequent privilege (as is done in MPU-enabled FreeRTOS) peripheral on MPS2+ FPGA prototyping system broad (AN386) is inevitably introduce performance overhead. As shown in Section 4, 0x40024000-0x40024FFF (16 Bytes). However, the least length of the performance is so significant that real time constraints cannot MPU region is 32 bytes. be met in some scenarios [4]. There are two major sources of ad- Second, Since FreeRTOS are located in the region that can only ditional overhead. First, each task switch requires MPU reconfig- be accessed in privileged mode, a task has to raise its privilege tem- uration. Second, each hardware interrupt or invocation of kernel porarily if it needs to invoke a kernel API. On the other hand, de- API require privilege escalation. This rules out the MPU in many velopers have the demand to assign separate access rights to in- real-time applications. terrupt handlers [2]. However, NVIC registers are located in the system peripheral region which can only be accessed in privilege mode unless MPU is disabled during interrupt handling. 6 SUGGESTIONS Increased Overhead The “protection” provided by the MPU in- We propose technical suggestions of building securer embedded curs too much overhead. This is because each invocation to kernel systems and review the latest secure design. 6.1 MPU Revision in ARMv8-M that efficiently implements many novel security primitives (e.g., ARM has already acknowledged the problems with ARMv7-M MPU execution-aware MPU, secure ) for embedded devices. In ad- by revising it in ARMv8-M. However, it only mitigates the problem dition, ARMv8-M architecture extends TrustZone technology to rather than solving it. The most noticeable improvements include Cortex-M series processors for incremental security enhancement. increased region number (as many as 16) and more flexible region In particular, existing embedded software does not need heavy re- alignment. As a result, MPU registers can be used more efficient to engineering but still benefits from a trusted execution domain. meet the requirement of different region sizes. Using Start and Limit (end) address to define memory regions 7 CONCLUSION simplifies memory region definition and leads to a more efficient This paper attempts to answer the question of why the MPU, as use of available memory space. As mentioned in Section 2, when a a ready-to-use security feature for protecting ARM-based MCUs, region of arbitrary size was required, several smaller regions had has been largely ignored by both device manufacturers and RTOS to be used to reach the target size in ARMv7-M, while ARMv8-M communities. We use MPU-enabled FreeRTOS as a concrete exam- just need one region. ple to showcase how the claimed security benefits brought by MPU can be bypassed or undermined. Although FreeRTOS cannot rep- 6.2 Suggestions resent all the embedded OSs, we believe our observations apply to other OSs because the demonstrated pitfalls root in the fundamen- Better Usage of MPU. A serious limitation with MPU is that only tal design drawbacks of MPU. We forecast what future MPU will limited number of regions are supported. We found that creatively be like and also provide technical suggestions to safeguard legacy using sub-region disable field (SRD) in BASR can make MPU more devices. efficient. As mentioned in Section 2, each memory region can be divided into eight sub-regions, which can be enabled/disabled in- REFERENCES dividual. Suppose a developer needs to allocate a 5KB region and a [1] Abraham A Clements, Naif Saleh Almakhdhub, Saurabh Bagchi, and Mathias 3KB region for two tasks. Without sub-region, the developer has to Payer. 2018. ACES: Automatic Compartments for Embedded Systems. In 27th USENIX Security. configure two regions of 8KB and 4KB separately. There is a waste [2] NXP Community. 2019. MPU advanced features? of 3KB and 1KB memory space correspondingly. With sub-region, https://community.nxp.com/message/1144517?commentID=1144517#comment-1144517. the same 8KB region can be shared between the two tasks. Specifi- [3] ST Community. 2018. FAQ: Ethernet not working on STM32H7x3. cally, when running task one, MPU is configured so that the high- https://community.st.com/s/article/FAQ-Ethernet-not-working-on-STM32H7x3. est three sub-regions of the 8KB region are disabled. When running [4] Chung Hwan Kim, Taegyu Kim, Hongjun Choi, Zhongshu Gu, Byoungyoung Lee, task two, MPU is configured so that the lowest two sub-regions of Xiangyu Zhang, and Dongyan Xu. 2018. Securing real-time microcontroller sys- the 8KB region are disabled. In this way, the 8KB memory block is tems through customized memory view switching. In NDSS. reused by the two tasks without wasting any memory. [5] Patrick Koeberl, Steffen Schulz, Ahmad-Reza Sadeghi, and Vijay Varadharajan. 2014. TrustLite: A security architecture for tiny embedded devices. In EuroSys. In addition, it is a common practice that same kinds of peripher- [6] Thomas Nyman, Jan-Erik Ekberg, Lucas Davi, and N Asokan. 2017. CFI CaRE: als (e.g., UART0 and UART1) have adjacent and same size memory Hardware-Supported Call and Return Enforcement for Commercial Microcon- region. Thus, developer can use a large region to cover adjacent trollers. In RAID. Springer. [7] Joel Sandin. 2016. AWS IoT Developer Guide. and same size peripherals memory region. If several nearby pe- https://speakerdeck.com/jsandin/shmoocon-2016-exploiting-memory-corruption-vulnerabilities-on-the--operating-system. ripherals need to be protected (i.e., only can be access in privilege mode), the developer can just disable these correspond sub-regions when running use mode tasks. Note that above approach is still not able to protect several peripheral memory regions which far from each other. Software Workaround. Before a better MPU is proposed, on the one hand, we should continue to improve coding quality to avoid program bugs; one the other hand, we can resort to software solu- tions. Some previous researches [1, 4] propose to use static analysis and recompile the firmware to achieve more effective MPU usage. For example, MINION [4] uses k-means clustering to group mem- ory sections having similar access permissions together to mini- mize the number of required MPU regions. It also configures MPU during task switches to avoid privilege escalation requests. Hardware Retrofit. Although, ARMv8-M provide more power- ful MPU design, it can not fundamentally solve the problems we mentioned in Section 4. In the long term, we expect a redesigned architecture that fundamentally addresses the illustrated insecu- rity and inflexibility in a lightweight way. Along this direction, hardware-based solutions have been proposed [5, 6]. The most rep- resentative work is TrustLite[5]. It is a radical hardware redesign