Extrax: Security Extension to Extract Cache Resident Information for Snoop-based External Monitors

Jinyong Lee , Yongje Lee , Hyungon Moon , Ingoo Heo , and Yunheung Paek Department of Electrical and Computer Engineering, Seoul National University, Seoul, Korea Email: {jylee, yjlee, hgmoon, igheo, ypaek}@sor.snu.ac.kr Abstract—Advent of has urged researchers to conduct alter the kernel by snooping every data traffic between the host much research on defending the integrity of OS kernels. Even CPU and main memory. Being located at the outside of the host though recently proposed snoop-based monitors have shown to as a dedicated hardware unit, the monitor is not only immune provide higher performance and security level compared to con- to rootkits attacks on the host, but also able to constantly ventional -based monitors, we discovered that the use observe the memory access behaviors of rootkits revealed on of write-back caches in a system would seriously undermine the the system bus without affecting the host performance. effectiveness of snoop-based monitors. To address the problem, we propose a special hardware unit called Extrax which makes use of Although snoop-based monitors have been working well existing hardware logic, core debugging interface, to extract nec- in their environments and assumptions, we have recently essary information for security monitoring. Being implemented to discovered a potential vulnerability which future attackers refine the debug information for security purposes, Extrax assists might exploit. It comes from the fact that most computer snoop-based monitors to detect attacks that exploit write-back systems employ write-back caches. Being located in between caches. Experimental results show that our system can detect the host CPU and main memory, caches hold copies of data more advanced attacks, which the state-of-the-art snoop-based or instructions recently accessed by CPU, thereby boosting the hardware monitors cannot capture, with moderate area overhead overall system performance to a large extent. However, for the and power consumption. perspective of snoop-based monitors, the existence of caches I. INTRODUCTION AND PREVIOUS WORK can be disadvantageous because they shall reduce the number As electronic devices such as PCs and smartphones become of events that the monitors can watch. For example, if a essential parts of our everyday life, the potential privacy and tries to compromise the kernel by modifying sensitive data, and security risks due to numerous malwares on the devices are the very data hits in the cache, then the write traffic would not rapidly growing. As a means to protect such devices from appear on the system bus, rendering the monitor oblivious of these attacks, current OSes support a variety of anti-malware the write event. solutions. These solutions usually depend on the services from Even though some previous works discussed the possibility the underlying OS kernel, implying that they would only that this problem may seriously undermine the effectiveness of work as designed when the integrity of the kernel is ensured. their approaches [6]–[8], none of them has properly addressed However, the kernel integrity has been seriously threatened this cache-induced hiding (CIH) effect problem. In [7], they since the advent of kernel level rootkits that manipulate the tried to avert the problem by restricting the usage of their kernel so as to achieve certain goals (i.e., concealing their monitors to the systems with write-through caches. In [8], they existence or providing backdoor accesses). Because the kernel merely mentioned a simple scheme of using periodic cache operates at the highest privilege level in the system, the flush. Unfortunately, they did not provide any empirical data compromised kernel may nullify the effectiveness of any anti- about how much loss their scheme may suffer on performance, malware measures that have their root of trust on the kernel. detection rate or power consumption. However, as we will see The threat of rootkits have urged researchers to conduct later, our study evinces that frequent cache flush might increase much study to seek a more secure base that can the host performance overhead to a large extent. safely monitor the system and ensure the kernel integrity even In this paper, we propose a hardware-assisted low-overhead in the presence of rootkits. Two mainstream of the research solution which thwarts the CIH effect by enabling the external directions are hypervisor-based [1]–[3] and hardware-based monitors to directly access the cache resident information approaches [4]–[8]. In general, the former approaches have (CRI) which includes all the internal data residing within the popularity in the security community as they do not necessitate cache without being exposed on the system bus. To implement underlying hardware modification while providing a higher this solution, we utilized the existing hardware logic, called privileged, thus safer, software layer for monitoring than the the core debug interface (CDI), which can be found in several kernel does. However, the latest attacks [9] and reported processors available today such as ARM Cortex series and vulnerabilities [10] pointed toward the probability that the code Xilinx MicroBlaze [11], [12]. CDI has been conventionally and data of can also be compromised at runtime. used to supply the information relevant to the CPU internal Although the known vulnerabilities have been fixed shortly, state for the on-chip debug (OCD) unit [11], [12]. If CDI the growing complexity of hypervisors implicates that there is plugged into a security monitor, the bountiful information would be more vulnerabilities revealed in the near future. provided by CDI, which contains memory access events issued The hardware-based approaches utilize an isolated hardware by CPU, would certainly help monitor perform its desired task module physically independent of the monitored host system without the CIH effect. [5]–[8]. In particular, prominent monitoring schemes are re- This task, however, involves several complications in im- cently proposed in [6]–[8]. At the center of these approaches, plementation majorly because the initial set of signals from there is a hardware monitor, which we hereafter call the snoop- CDI cannot be simply fed into the security monitor as they based monitor, whose role is to detect malicious attempts to are in their present form. Some signals originally generated for

978-3-9815370-4-8/DATE15/c 2015 EDAA 151 6LJQDO 'HVFULSWLRQ VWUXFWPRGXOH VWUXFWPRGXOH D (70,&7/>@ (70LQVWUXFWLRQFRQWUROEXV /.0 /.0 (70,$>@ (70LQVWUXFWLRQDGGUHVV OLVW OLVW OLVW

(70'&7/>@ (70GDWDFRQWUROEXV VWUXFWPRGXOH VWUXFWPRGXOH VWUXFWPRGXOH E (70'$>@ (70GDWDDGGUHVV 0DOLFLRXV/.0 /.0 /.0 OLVW OLVW OLVW OLVW (70''>@ (70GDWDZULWHGDWDYDOXH

(70&,'>@ &XUUHQWSURFHVVRU&RQWH[W,' F VWUXFWPRGXOH VWUXFWPRGXOH VWUXFWPRGXOH 0DOLFLRXV/.0 /.0 /.0 TABLE I. DESCRIPTION OF CDI SIGNALS FOR ETM OLVW OLVW OLVW OLVW debugging must be translated into another form that is required for security monitoring. Therefore, we have developed an extra Fig. 1. Cache resident LKM hiding attack hardware unit, called the Extrax, that being located between hiding technique as a representative cache resident attack CDI and security monitors, carefully examine and properly example since many rootkits in the wild employ the technique refine or transform each individual signal from the interface to hide themselves. LKMs are initially designed to support before delivering it to the monitor. extension of the kernel code at runtime without recompiling To validate our design and further explore the implication the entire kernel. However, they are often used by attackers of this additional circuits to the overall system, we have to conceal malicious processes, files or even themselves from implemented a full snoop-based monitoring system in which detection mechanisms. Adversaries achieve their goal of hiding the host system has been augmented with Extrax. With the LKMs by directly modifying the kernel data structures that system prototyped on a FPGA platform, we evaluated and maintain the list of loaded LKMs. compared the performance, power and area of our full system Figure 1 shows how the LKM hiding technique is affected against the baseline system in which Extrax is not deployed. by write-back caches. In (a), there are several LKMs, each Experiment results exhibit that our monitor, with modest area of which is represented by the struct module. The kernel and power overhead but with the host performance being al- handles the LKMs by maintaining the modules list, which is most unaffected, successfully detects rootkit attacks regardless a linked list of struct module. Upon the module load request, of the type of caches while the baseline monitor often fails. in this case a request from the malicious LKM depicted in The rest of the paper is organized as follows. We first present (b), the kernel adds the corresponding struct module to the the background information and a motivational example in list, which is the head of the modules list. In a system with Section II. Then in Section III, the assumption and threat model write-back cache, the list will be cached after this step, and are presented. In Section IV, the baseline system is presented subsequent accesses to the data structure will also hit in the to show the overall operation of hardware-based monitoring. cache. Thus, even if the malicious LKM removes itself from Section V describes the details of the proposed security exten- the modules list by directly manipulating the pointers of the sion, Extrax. After Section VI shows our experimental results, modules list as depicted in (c), this event might not be placed we conclude this paper in Section VII. on the system bus. Consequently, recently proposed snoop- II. BACKGROUND based monitors might no longer guarantee the integrity of the A. Core Debug Interface kernel since they detect attacks by snooping the system bus. The on-chip debug (OCD) unit is a debug/trace module Hence, a novel way to nullify CIH effect should be devised. that has been widely adopted to the commodity processors. III. ASSUMPTIONS AND THREAT MODELS Provided by OCD, a rich set of information allows users, We use the assumption taken by previous snoop-based on their , to follow path that the target monitors, especially by KI-Mon [7]. Therefore, we assume CPU takes as a result of code execution and monitor values that adversaries have already gained administrators’ privilege in various registers and memories. CDI is an interface placed on the host system and thus are able to install rootkits to on the CPU side, and provides OCD with the CPU’s internal hide themselves or leave backdoors to the host system; for status information. In this paper, we specifically focus on the instance, the attackers can install LKMs or place hooks on interface that provides instruction address, current context ID critical system calls. However, we rule out physical attacks by (or ID), and data address/value of memory access an insider who has direct access to the host system and direct instructions in real-time without affecting the performance kernel structure manipulation attacks proposed in [14]. of the target processor. A representative example of OCD In addition, we also assume that the host system uses write- that supports such a tracing capability is the embedded trace back caches, and provides CDI, that can be connected to macrocell (ETM) module of ARM [13], and the signals to OCD. Side-channel attacks that exploit the information from ETM provided by ARM through CDI is described on Table I. OCD/CDI are not considered in this work.

B. Cache Resident Attacks IV. THE BASELINE SYSTEM We define cache resident attacks as malicious attacks that, A. The Overall System Design intentionally or unintentionally take advantage of the CIH Figure 2 shows a high level view of the baseline system. effect; the existence of write-back caches can unintentionally Since our monitoring system employs the snoop-based mon- blindfold snoop-based monitors by impeding memory write itoring scheme, it is similar to the prototype of KI-Mon [7]. events from appearing on the system bus, or attackers can To ensure the integrity of the kernel, the monitor side core intentionally hide the evidence of attacks by overwriting the dynamically configures the hardware ASIC units, especially malicious data residing in caches with benign one, thereby the snooper, based on a security policy. prohibiting the monitors to detect the symptom of attacks. To We designed our baseline monitor to support various policies better explain, we chose the loadable kernel module (LKM) on detecting attacks on immutable regions and mutable objects

152 2015 Design, Automation & Test in Europe Conference & Exhibition (DATE) HostSystem SnoopͲbasedMonitor state (a) to (b) on time s, and from the state (b) to (c) on time AHB Verifier

 Trafficfrom Filtered Master SystemBus Snooper e. In principle, the baseline monitor concludes that an LKM Hostcore theCPU Traffic Core is malicious when the LKM is removed from the modules list  IF CDI Memory (state change from (b) to (c)) while the corresponding memory DMA &1 System Bus Ctrl ˎ  region for the LKM remains in memory [7]. Thus, the period

CPU_CLK BUS_CLK MainMemory of cache flush p should be shorter than the interval d=e-s so that every event on modules list can be revealed on the system Fig. 2. The overall baseline system design bus before the adversary achieve her own goal. Since d in a primitive LKM hiding technique is sufficiently large, p of our that have invariant value sets, as KI-Mon proposed in [7]. baseline with periodic cache flush could be long as well, so Immutable regions contain data that should not be modified as to detect the attack with acceptable performance overhead. after the boot process is complete, such as the table However, by slightly changing the original LKM hiding (SCT) and descriptor table (IDT). Kernel mutable technique, it is possible for attackers to reduce d substantially. object with a invariant value set is data object in which data The devised technique requires two LKMs, one of which is can be updated by the kernel at run-time, but the updated value the malicious LKM and the other is an LKM that merely is chosen among the set of possible values that can be profiled hides the first one. We call the latter the hider LKM whose prior to run-time; for instance, many function pointers within role is to insert a callback function to the kernel timer, and kernel objects are known that each function pointer points to set the timer with a period q. Then the callback function is one of its possible candidate landing sites [7]. periodically invoked to check whether the malicious LKM, According to the security policy of the host CPU, the which ultimately achieves the attackers goal, is inserted or snooper is configured with appropriate address ranges of not, and hide the newly added one upon detection. To insert the kernel data objects or regions to be monitored on main and hide a malicious LKM, attackers first insert the hider memory. Then the snooper constantly acquires write events LKM with an arbitrary period q, and insert the malicious one placed on the system bus, filters out every benign event that sometime later. Since the hider LKM does not hide itself and is is not relevant to the monitored regions, and transfers only the thus added and removed legitimately, the monitor has no way ones that violate the current security policy to the verifier core of distinguishing the hider LKM from other normal LKMs. for further investigation. Thus, to defend against such cache resident attacks, the While attacks on immutable regions can be easily detected flushing period d should be adjusted to a very small value. by simply snooping the bus for write events on the region, Our preliminary study showed that, in order to attain 100% catching evidence of malicious modifications on kernel muta- detection rate, the period d need to be reduced to 30us, ble objects with invariant value sets is not straightforward since resulting in increased performance overhead of up to 84%. mere write events cannot be regarded as a symptom of attacks. From this result, we claim that the detection with periodic Therefore, we employ techniques similar to the ones proposed cache flush might not only induce huge performace overhead, in [7] such as the whitelisting-based verification and callback- but would also cause failing in detection if attackers know the based semantic verification that basically verifies the written existence of periodic cache flush and modify their attacks to values as well. The detailed explanation of these techniques are reduce d. omitted in this paper since our work is focused on overcoming the CIH effect rather than suggesting new detection schemes. V. E XTRAX DESIGN Therefore, readers interested in these techniques are kindly As long as CRI is accurately sent via CDI to snoop- referred to [7]. based monitors, many security threats due to cache resident The key difference between our baseline system and the attacks would be resolved without modifying the host inter- prototype of [7] is that ours uses write-back caches. Therefore, nal architecture. Unfortunately, realizing precise and efficient snooping only the system bus may cause detection failure deliverance of these signals in an actual system comes at a because of the aforementioned CIH effect. cost with some implementation challenges, as briefly stated B. Periodic Cache Flush for Cache Resident Attacks in Section I. Below is summarized two of those that must As mentioned in [8], periodically flushing caches might be resolved in order to efficaciously transfer the host internal help reveal more CRI on the bus when write-back caches information to snoop-based monitors through the existing CDI. are deployed in the system. To show the effectiveness of 1. CDI is originally designed to send a virtual address (VA) the scheme, we applied it to the baseline system. When to the OCD unit for each memory access while the monitor implementing the scheme, the flush period should be decided demands physical addresses (PA) so as not to be disrupted by with great care because a reckless choice of the period may the certain type of attacks which will be discussed shortly. induce either non-negligible overhead or detection failure. Therefore, the original VAs from CDI cannot be directly used For attacks that aim at immutable regions [6], the cache flush for a snoop-based monitor. period, p, can be selected arbitrarily because any write attempt 2. The number of memory events coming from CDI is far on the regions is deemed malicious [6], and the cacheline larger than that of those appearing on the system bus because where the written data is located will eventually be evicted to a majority of write events originating from CPU are hidden main memory regardless of the period p. However, the decision by caches on the way to the bus. This excessive number of of the period p for attacks that target kernel mutable objects is events sent from CDI could be burdensome to the monitoring far more difficult since attackers can usually figure out a way system in terms of performance. to avoid detection by slightly modifying the original attacks. These key issues are tackled respectively by two new To better explain, consider the case shown in Figure 1. hardware units, the address translation unit (ATU) and the Assume that the state of the modules list changes from the early stage filter (ESF), both of which constitute our Extrax.

2015 Design, Automation & Test in Europe Conference & Exhibition (DATE) 153 Virtual Address Latch 9LUWXDO$GGU>@ 3K\VLFDO$GGU>@ SystemBus TrafficfromSystemBus 9LUWXDOSDJH 3DJH2IIVHW     3K\VLFDO$GGU>@ +RVW&38

&', (6) Translation Lookaside Buffer

ZLWK )LOWHUHG 9HULILHU $+%0DVWHU,) $78 6QRRSHU :ULWHEDFN 7UDIILF &38 3DJH7DEOH(QWU\ Async. 3DJH 9LUWXDO 3DJH7DEOH(QWU\ &DFKH Queue 3,' 7DEOH )URP7R

$GGUHVV 3DJH7DEOH(QWU\ˎ 7DJV :DONHU 6\VWHP%XV +RVW6\VWHP([WUD[ 6QRRSEDVHG0RQLWRUZLWK([WUD[ 7DJV CPU_CLK BUS_CLK 3DJH7DEOH(QWU\Q )URP7R 3,'>@ $+%6ODYH,) $+%6ODYH Fig. 3. The augmented baseline system with Extrax Fig. 5. The structure of the ATU Figure 3 depicts a block diagram of our proposed system where the baseline is extended with Extrax. It is noteworthy here from the CIH effect. This abundant information continuously that even though the information conveyed through ATU can streaming into the monitor will certainly enhance the chance cover all memory events, the original path from the system to detect cache resident attacks. However, it may also create bus remains the same. The purpose of the path is mainly to an excessively large volume of information flow that will detect attacks from other bus masters such as DMA. In this inevitably impose heavy burdens on the monitor. section, we will focus our discussion on these hardware units. Though, if we remind that the monitors usually need to watch only a subset of the memory events of the host de- A. Address Translation Unit pending on security policies, it would be wasteful if ATU Upon receiving VAs from CDI, a snoop-based monitor exhaustively translates all incoming addresses for the monitor. should decide whether the current memory access targets the The problem can be alleviated if we can filter out benign regions it monitors. It seems that such a decision can be events based on a security policy. As an example, for one made based on VAs, but there are cases in which the monitor of the policies considered in our experiments, the monitor is necessities PAs. For instance, consider the case depicted in interested in write attempts to the kernel data. Therefore, any Figure 4 where the monitor is protecting a kernel data structure memory events can be safely discarded before reaching contained in the critical page frame. Since this data structure either the monitor or even ATU. The filter operations in this is critical to the integrity of the system and is thus managed case is in fact rather straightforward since memory access types by the kernel through the kernel , no arbitrary are easily discernable right after the events occur. However, we mapping should be made to the data structure. However, in often need to apply more aggressive filtering to the events. As the presence of kernel-level rootkits, it would be possible for briefly mentioned in Section IV, the kernel data of interest attackers to simply insert an LKM that generates another page occupy relatively small amount of memory compared to the table mapping for the data structure, denoted as the malicious whole range of main memory. Therefore, the monitor just mapping in the figure. Thus in this situation, if the monitor needs to watch the access events on this limited region. Un- used the original VA to protect this type of kernel structures, fortunately, it is not always straightforward to decide whether the attackers could indirectly modify the kernel data with the or not an event just emitting from CDI falls into this region, newly mapped VA, hence successfully escaping the vigilance since its address remains virtual before reaching ATU. of the monitor. Nevertheless, there is still a way to provide a solution to To deal with the cases where PAs must be supplied for this decision problem. For this, consider Figure 5 where we security monitors, we have installed ATU that translates VAs see that an address translation step does not require the page from CDI into physical ones. The overall architecture of offset field; in fact, this field is identical for both VAs and PAs. ATU is displayed in Figure 5. ATU is configured by the Inspired by this fact, we implemented ESF which, being placed monitor through the advanced high-performance bus (AHB) between CDI and ATU, removes unnecessary memory events slave interface. On a translation lookaside buffer (TLB) miss, a based on page offset before the events reach ATU, thereby page table walk is initiated through the AHB master interface, reducing the number of events delivered to ATU. which is connected to the host system bus. The input to ATU Figure 6 represents the internal block diagram of ESF, which includes a VA, the context ID and the base address of the rearranges the signals from CDI and filters out as many events page global directory (PGD). The former two inputs are as possible. The implementation of output rearrangement is provided by CDI, while the latter one is configured through somewhat simple in that it merely reorganizes and extracts sig- the AHB master interface immediately after the current context nals that are needed for the monitors to detect attacks. Among ID is updated. TLB is a fully associative table in which the the signals introduced in Section II, the addresses/values of number of entries can be configured from 16 to 32. It has a memory write instructions and context IDs are selected and random replacement policy. rearranged for monitors. B. Early Stage Filter In the current implementation, ESF has eight 12-bit address With the help of CDI, our monitor is now able to snoop range register pairs and comparators. The address range reg- every write event generated by the host CPU without suffering ister pairs of ESF contain the page offset field of start/end addresses that need to be monitored. The register values can Malicious 3DJH7DEOH  be configured by the verifier core at any time. Therefore, Mapping 9LUWXDO$GGUHVV whenever the critical kernel regions of interest are updated, 3*' 0DOLFLRXV37( &ULWLFDO3DJH)UDPH 008 the monitor configures the snooper and ESF simultaneously. 3*'(QWU\WR 3*'(QWU\WR 3DJH7DEOH &ULWLFDO5HJLRQ After the configuration, ESF compare the page offset field 3K\VLFDO$GGUHVV 2ULJLQDO37( of a VA that comes from CDI with the values of address range register pairs. If the address is included in any of the Fig. 4. Multiple virtual addresses mapping example monitoring regions, ESF enables the filter output register so

154 2015 Design, Automation & Test in Europe Conference & Exhibition (DATE) ) ) 155 ) the BaseWB virtual file Ours-Extrax BaseWT [18], and implemented two OOTKIT DETECTION RESULT 'HWHFWHG'HWHFWHG'HWHFWHG 'HWHFWHG'HWHFWHG 'HWHFWHG 1RWGHWHFWHG 1RWGHWHFWHG 'HWHFWHG 'HWHFWHG 'HWHFWHG 'HWHFWHG %DVH:7 %DVH:% 2XUV([WUD[ YNTHESIS RESULT OF THE PROTOTYPE SYSTEM 6&7+RRNLQJ /.0+LGLQJ 9)6+RRNLQJ ,'7+RRNLQJ TABLE III. R ([DPSOH1DPH 63$5&9&RUHZLWK//&DFKH +RVW6\VWHP 63$5&9&RUHZLWK/&DFKH2QO\ ([WHUQDO0RQLWRU %XVFRPSRQHQWV $+%%XVHV$+%$3%EULGJHV 0HPRU\&RQWUROOHU 6QRRSHU 3HULSKHUDOV 7,0(58$57DQGHWF 7RWDO%DVHOLQH6\VWHP (DUO\6WDJH)LOWHU (6)  $GGUHVV7UDQVODWRU8QLW $78 LQFOXGLQJTXHXH 7RWDO([WUD[  ([WUD[RYHU%DVHOLQH6\VWHP                          (VFS) hooking attacks target mutable objects that have REMHFWV UHJLRQV 0XWDEOH ,PPXWDEOH TABLE II. S SPEC 2006 benchmark suites Recent attackers tend to avoid launching attacks that are Since CDI does not introduce any performance impact on To evaluate the security monitoring capability of our ap- To demonstrate the effectiveness of Extrax, we injected ([WUD[ 6\VWHP %DVHOLQH &DWHJRU\ &RPSRQHQW /87V %5$0V '63( our proposed systemmonitor with in Extrax. BaseWTsince could As immediately the seen detect host insending all uses Table the every a III, rootkits write write-throughin the event BaseWB, cache, onto however, thus the wasmutable system immediately objects unable because bus. to of The thecontrary, detect CIH monitor could attacks effect. detect Ours-Extrax, on onor all kernel the not), the thanks attacks to (whether our cache Extrax resident andeasily CDI detectable support. like thosemore importance on becomes immutable the regions.regions detection Instead, [17]. of of attacks Wedeployed on have mutable on just the seennullified host that when core a cache withobjects. snoop-based a resident Therefore, monitor write-back attacks werole cache claim are in is assisting that made such easily level monitors, Extrax on of thereby can the mutable increasing the systems. play security aC. critical Performance Analysis the host, the main factorgenerated which by incurs ATU the as overhead a is resultthe the of performance traffic address translation. overhead, To measure the we chose seven applications from B. Security Evaluation proach, we choseare several employed in well-known real-worldrootkits attack rootkits that techniques [8] target and either that objects. implemented immutable regions four Table or kerneltarget III mutable can lists beimmutable the deduced regions rootkits, while by thesystem of their others, the names. which LKMinvariant The value and the sets. first We specific also twothat implemented runs monitoring target software on ourfor monitoring, verifier such core as to theaddress snooper, configure range ESF registers the and of ATU. peripheral The ESFaccesses units current is configured only to capture memory ondata structures kernel such mutableevents as on objects immutable page regions and tablessince, can as be for other mentioned safely the before, filtered related enough snooping out objects. to the by Memory system catch ESF bus attacks would on be immutable regions. the four rootkitsbaseline system into with write-through three caches as system in [7], versions: ( ( the baseline system with write-back caches and ( ope Conference & Exhibition (DATE) ID  Filter Data  Address Context

Stage Filter  OutputReg pass_en Early  ESULTS Filter 

đ n R matched Addr REGn đđ  Filter  1 matched 2015 Design, Automation & Test in Eur Addr XPERIMENTAL REG1  VI. E Filter  0 ID  matched Addr REG0 Address Data Context To evaluate our approach, we augmented the baseline sys- Based on the parameters for the prototype as described We have implemented our baseline system as as As mentioned before, current ESF implementation has 8 tem with Extrax.synthesizable core Although [15], our providesthe their host information own comes processor, CDI outto specification, -source of CDI that is of quiteslightly commercial restricted extended compared product, it suchthose to as of support ARM. the Therefore, ARMand CDI we ATU architecture are signals implemented (seespecification equivalent [13]). to to Table Since be CDI I).OCD, compatible is connected Thus, with we to ARM designed both bothsnoop-based CDI Extrax Extrax ESF and monitoring to ishave disable 8 turned signals address on. range toSPARC V8 register ESF OCD Reference pairs. MMU is when Our [16],16 ATU, has configured compliant TLB been to with configured entries to and have 16 input queueabove, entries. we synthesizedwith our a system Xilinx SC5VLX330of onto FPGA. the a Table baseline prototyping IIfor system board provides and logic the area Extrax(DSP48E). (LUTs), in It block terms showsLUTs of RAMs that lookup as Extrax (BRAMs) tables incurs comparedthe and 12.09% to area DSP overhead overhead thenoteworthy for slices of baseline here our hardware. thatsynthesizable Extrax Even core our seems based though on non-negligible, baselineindeed SPARC it V8 system, very architecture is the smalloverhead [15], of has open-source size. Extrax Therefore,the might system be we with quite commercial claim acceptable CPU if that core deployed such the on as area Cortex-A9. possible to the designwhere proposed the in [7], hostsynthesizable as core CPU an [15] FPGA is whichstage prototype uses the pipeline. a It SPARC single-issue, hasand V8 in-order, data. separate 7- In processor, 16kB addition, L1 awhich the caches host 32-bit employs CPU for also instruction thecompliant has a write-back with 256kB L2 policy. theinterconnect cache The all AMBA2 host modules AHB/APB inused system protocol the as bus system, is the and hostimplemented used OS. Linux with to the The 2.6.21.1 same snoop-based is snooper processor monitor has and system the eight is system setsconfigured bus. of also according The address to range the registers which security can policy be of the monitor. A. Prototype System pairs of address rangerent register, monitoring limiting regions. the Note numberbe that of adjusted the concur- for number the ofwe environment registers can of can deployment. looselyeach Alternatively, set pair contains the moreit address than would one range produce monitored unnecessarystill register region. events do Although pairs, for not ATU so miss to any that handle, access ESF to the monitored regions. Fig. 6. The overall structure ofthat the the early stage eventaddress, filter can thus flow obviating unnecessary to operations ATU. in Otherwise, ATU. ESF blocks the 3URSRVHG6\VWHP 3URSRVHG6\VWHP $SSOLFDWLRQ %DVHOLQH6\VWHP ASIC module plugged into CDI in order to convey the host ZLWK(6)WXUQHGRII ZLWK(6)WXUQHGRQ internal information from CDI to the external monitor. For K V V V E]LS V V V precise and efficient monitoring, the module performs the tasks KPPHU V V V of address translations and filtering out benign memory write OLETXDQWXP V V V events. To validate our proposed design, we have implemented SDUVHU V V V RPQHWSS V V V a prototype on FPGA, and evaluated the security capabilities [DODQ V V V in addition to the performance, power and area overhead. TABLE IV. PERFORMANCE OVERHEAD Empirical results showed that our monitor, regardless of the versions of the host system: the baseline and the proposed full type of caches, successfully detect all our rootkit samples, system with ESF turned off. The reason of turning ESF off which the previous monitoring systems often failed to catch is to strain the system with the excessive traffic generated by owing to CIH effect, with modest area and power overhead CDI in the worst-case scenario. increase along with virtually no host performance overhead. Table IV represents this worst-case performance overhead, ACKNOWLEDGMENT which is around 3.24%. The reason for this low overhead might This work was partly supported by the IT R&D program of be explained in a way that even if there seem to be a number MSIP/KEIT [K10047212, Development of homomorphic en- of memory events coming from CDI, the number of events cryption supporting arithmetics on ciphertexts of size less than that really need memory translation in ATU is relatively small 1kB and its applications], the Brain Korea 21 Plus Project in because our ATU has TLB and most memory translation end 2014, Software R&D Center of Samsung Electronics, and the up retrieving values from the TLB. We also conducted the National Research Foundation of Korea(NRF) grant funded by same experiment, with ESF turned on, monitoring the mutable the Korea government(MSIP) (No. 2014R1A2A1A10051792). objects related to the VFS hooking and LKM hiding attacks. REFERENCES As seen in the table, there is virtually no overhead caused by [1] N. L. Petroni Jr and M. Hicks, “Automated detection of persistent kernel Extrax because ESF filters out most memory events that do control-flow attacks,” in Proceedings of the 14th ACM conference on not access the monitored memory regions. Computer and communications security. ACM, 2007, pp. 103–115. Since turning off ESF does not cause serious performance [2] O. S. Hofmann, A. M. Dunn, S. Kim, I. Roy, and E. Witchel, “Ensuring overhead of 3.42%, some might think that ESF is not essential. kernel integrity with osck,” in ACM SIGPLAN Notices, vol. 46, no. 3. ACM, 2011, pp. 279–290. As explained before, however, its main goal is to reduce the [3] Z. Wang and X. Jiang, “Hypersafe: A lightweight approach to provide amount of CRI delivered to ATU. Reduced number of memory lifetime hypervisor control-flow integrity,” in Security and Privacy (SP), events would not only help ATU decrease the number of events 2010 IEEE Symposium on. IEEE, 2010, pp. 380–395. that need address translation (main memory access), but it [4] Y. Bulygin and D. Samyde, “Chipset based approach to detect virtual- would also drastically reduces the number of TLB accesses, ization malware,” BlackHat Briefings USA, 2008. which in turn might possibly save hugh amount of power [5] N. L. Petroni Jr, T. Fraser, J. Molina, and W. A. Arbaugh, “Copilot- consumed by ATU. a coprocessor-based kernel runtime integrity monitor.” in USENIX Security Symposium, 2004, pp. 179–194. D. Power Consumption [6] H. Moon, H. Lee, J. Lee, K. Kim, Y. Paek, and B. B. Kang, “Vigilare: To assess the power consumption of Extrax, we used toward snoop-based kernel integrity monitor,” in Proceedings of the 2012 ACM conference on Computer and communications security. Synopsys Design Compiler, Mentor Graphics ModelSim, and ACM, 2012, pp. 28–37. synthesized netlists of Snooper, ATU and ESF. Switching [7] H. Lee, H. Moon, D. Jang, K. Kim, J. Lee, Y. Paek, and B. B. Kang, activity interchange format (SAIF) files were extracted from “Ki-mon: a hardware-assisted event-triggered monitoring platform for Modelsim with synthesized input test sequences that maxi- mutable kernel object,” in USENIX conference on Security. USENIX mizes the power consumption of each component. Then the Association, 2013. SAIF files and netlists were given as the input to Synopsys [8] Z. Liu, J. Lee, J. Zeng, Y. Wen, Z. Lin, and W. Shi, “Cpu transparent Design Compiler with a commercial 45 nm process library to protection of os kernel and hypervisor integrity with programmable dram,” in Proceedings of the 40th Annual International Symposium on estimate power consumption. The results are presented in Table . ACM, 2013, pp. 392–403. V with other commodity processors as reference machines. As [9] “The blue pill, doi=http://theinvisiblethings. shown in the figure, Extrax consumes relatively small power blogspot.com/2008/07/0wing-xen-invegas.html.” as being compared to commodity processor cores in products [10] “Vmware: Vulnerability statistics.” ranging from low- to high-end computing devices. [11] W. Orme, “Debug and trace for multicore socs,” Sep 2008. [12] MicroBlaze Processor Reference Guide, Xilinx, Apr 2012. VII. CONCLUSION In this paper, we proposed to reuse the CDI feature readily [13] Embedded Trace Macrocell Architecture Specification, ARM, 2011. available for debugging in modern CPU cores in an effort to [14] S. Bahram, X. Jiang, Z. Wang, M. Grace, J. Li, D. Srinivasan, J. Rhee, and D. Xu, “Dksm: Subverting virtual machine introspection for fun and elevate the effectiveness of existing snoop-based monitors. We profit,” in Reliable Distributed Systems, 2010 29th IEEE Symposium on. first discussed several implementation complications involved IEEE, 2010, pp. 82–91. in the transfer of CDI signals for snoop-based monitors located [15] GRLIB IP Core User’s Manual, Aeroflex Gaisle, January 2012. outside the host CPU. Then we suggested Extrax which is an [16] The SPARC Architecture Manual, SPARC International, Inc., 1992. &RPSRQHQW $50&RUWH[$ $50&RUWH[$ 63$5&9 6QRRSHU $78 (6) [17] N. L. Petroni Jr, T. Fraser, A. Walters, and W. A. Arbaugh, “An archi- 'XDO&RUH 'XDO&RUH 6LQJOH&RUH QR/ 0%/ .%/ tecture for specification-based detection of semantic integrity violations 3URFHVV QP QP QP QP QP QP in kernel dynamic data,” in USENIX Security Symposium, 2006. 3RZHU : #0+] : P: P: P: X: &RQVXPSWLRQ : #*+] #*+] #0+] #0+] #0+] #0+] [18] J. L. Henning, “Spec cpu2006 benchmark descriptions,” SIGARCH Comput. Archit. News, vol. 34, no. 4, pp. 1–17, Sep. 2006. [Online]. TABLE V. POWER CONSUMPTION ANALYSIS Available: http://doi.acm.org/10.1145/1186736.1186737

156 2015 Design, Automation & Test in Europe Conference & Exhibition (DATE)