Exploring Voltage Scaling Techniques in Embedded Processors Hardware Monitors
Arman Pouraghily, Padmaja Duggisetty, Thiago Teixeira
VLSI Design Principles Final Report - ECE 658 Fall 2014
Department of Electrical and Computer Engineering
University of Massachusetts, Amherst, MA, USA
Email: fapouraghily,pduggisetty,[email protected]

Abstract—The Internet is a critical communication infrastructure in modern life, with applications ranging from banking transactions and the transfer of copyrighted material and company assets to simple web surfing. The role of the Internet is expected to grow exponentially with cloud computing and the Internet of Things. In this context, the router will continue to be the equipment carrying most Internet traffic, with an increasing number of routers' packet forwarding applications being deployed on programmable network processors. Many security algorithms have been developed for network security as a whole in order to increase the reliability of communications. The security of the network infrastructure itself has received less attention, even though attacks in the data plane that can trigger changes in the network processor have been successfully demonstrated. Hardware monitors are the proposed solution for securing the network infrastructure: they compare the instructions the network processor executes with the expected instructions, and reset and recover the processor if the executed instructions deviate from the monitoring graph. Hardware monitors have to check instructions at very high speed. Our challenge in this work is to look at techniques that reduce the voltage level, which makes the hardware monitor slower, while maintaining its functionality, and hence to consume less power.

Index Terms—hardware monitor, voltage scaling.

I. INTRODUCTION

With the ubiquitous presence of the Internet in today's society, ensuring trustworthy communication is key. Financial transactions, private company data flowing from one office to another, and private user data are a few examples of where the Internet is required to work correctly.

An important part of the Internet infrastructure is the router. An increasing number of routers are shipped with programmable packet processing, used by vendors to extend system functionality. The packet processing applications are implemented on a network processor (NP).

Furthermore, [1] showed that the network processor can be exploited using an integer overflow attack. Hardware monitors are the standard solution for preventing attacks on network processors. [2] presented a solution that uses a single memory read per instruction to check the code being executed against the monitoring graph. If the monitor detects a malicious pattern (an invalid state, for instance), the packet is dropped.

The hardware monitor is extra hardware that has to be accommodated on the chip. We want it to be small and to consume little power. Hence, in this work, power consumption is taken into account. Our goal is to estimate the power consumption overhead imposed by the introduction of the hardware monitor and to minimize this overhead by applying voltage scaling as a power-saving technique.

The basic idea behind our approach is the simple fact that the logical complexity of the modules used in the hardware monitor is far lower than that of the modules used in the main CPU, which means that our hardware monitor could run at a much higher clock rate. However, running the hardware monitor at a higher frequency than the network processor wastes resources. The hardware monitor has to track the behaviour of the main processor, so running it at a higher frequency means that it would overtake the main processor, which defeats the whole purpose of monitoring. This simple fact gives rise to an added advantage: being able to run at a higher frequency means having much more slack time within a predefined time budget (which is imposed by the critical path delay of the main processor). The main objective of our work is to make good use of this slack time. We also know that the delay of a digital circuit is inversely related to the supply voltage, which means that decreasing the supply voltage increases the delay of the circuit. At the same time, the power consumption of a circuit increases quadratically with the supply voltage, meaning that a small change in supply voltage has a significant impact on the power consumption of the circuit. This work deals predominantly with the library characterization of digital CMOS logic gates at different voltage levels in order to predict delay and power consumption. The power consumption of the hardware monitor is reduced because it now operates at a voltage lower than the nominal voltage. Our results show that voltage scaling has a large impact on power savings and that the added circuit still satisfies the timing budget.

II. LITERATURE REVIEW

This work comprises two main topics: hardware monitor operation and voltage scaling. The first concerns device security, while the second concerns reducing the power consumption as much as possible.

A. Hardware Monitor

Modern high-performance routers no longer use application-specific integrated circuits (ASICs); instead, they use programmable network processors [3]. Network processors are multi-core high-performance embedded systems that implement packet forwarding and other network functions programmed in software. While programmable network processors offer router vendors and network providers the flexibility to remotely reprogram the equipment, they also expose the Internet infrastructure to potential risks. Defence mechanisms against data plane attacks on network processors have been proposed, specifically hardware monitors that operate in parallel with network processors, monitoring the processor core and comparing its behaviour with the monitoring graph. If the behaviour deviates from the monitoring graph, the processor core is reset (e.g. the packet is dropped) and recovered.

An effective network processor monitoring system needs to verify every instruction that is executed by the processor. For this reason, the monitors need to run at very high speeds to keep up with the processor. This instruction-based monitoring can be viewed as a finite automaton with a fixed number of acceptable paths. A deterministic finite automaton (DFA) has been used to perform instruction-level monitoring, as opposed to a non-deterministic finite automaton (NFA). Using a DFA reduces the memory bandwidth requirement compared to an NFA. All the previous work in embedded security had been done on the von Neumann processor architecture, but network processors use a Harvard architecture. An example attack was therefore presented to prove that attacks on a Harvard architecture exist and to show how the processor is protected from such an attack. These two key problems were addressed in the paper.

In [2], the authors developed a high-performance hardware monitor that takes a single memory read per instruction, operating at speeds sufficient to maintain the network data transfer rate. A deterministic monitoring graph is implemented in the form of a state machine derived from the packet processing code. For each instruction executed on the processor core, a hash value of the executed operation is reported to the monitor. The monitor uses comparison logic to compare the reported hash value to the information stored in the monitoring graph. The monitoring graph used by the monitor is a state machine, where each state represents a specific processor instruction.

this the solution is to represent the DFA states with varying numbers of outgoing edges by encoding all the necessary information in a single table entry and to group states by the number of outgoing edges and by the same previous state. The memory contains tuples and is logically divided into groups. The base addresses for each group are stored in a register file with 16 entries. This makes accessing the memory for the hash code of the next instruction faster. This memory organization was observed to be faster than previous approaches.

Code injection attacks are feasible on a Harvard architecture processor using a return-oriented programming technique. In such attacks the attacker takes control of return instructions in the stack to chain attack code from an existing function. Since the code is already in the executable memory, the attack cannot be prevented. One such attack, made possible by an integer overflow vulnerability, is presented in the paper. An attacker sends a UDP packet with a size of 65534. This passes the maximum packet size check because 65534 + 12 = 10 due to integer overflow (the 16-bit sum wraps around modulo 65536). The packet payload is crafted so that the return address is overwritten and all the ports are flooded with attack packets. As a result, the system crashes. As soon as the control flow changes, the hash values reported by the processor no longer match the monitoring graph information and the system is reset. Attacks of this kind are therefore detected by the hardware monitor, since there is no valid edge between the states. All of the above work was implemented in fixed logic and prototyped on a Stratix IV GX230 FPGA on an Altera DE4 board.

In [4], the authors extended the work to multi-core network processors, implemented on a field-programmable gate array (FPGA) platform.

B. Voltage Scaling Technique

Moore's law has been driving advances in the semiconductor industry since 1965. However, with the miniaturization of the transistor and the consequent increase in transistor count, power consumption has become a barrier to the continuation of Moore's law. Multi-core processors were the solution found by industry to increase computation power.

The most advanced systems on chip (SoCs) on the market can easily reach billions of transistors. For instance, Apple's latest A8 processor has a dual-core CPU and approximately two billion transistors, fabricated in a 20 nm
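The trade-off stated in the Introduction (a lower supply voltage lengthens circuit delay, while power falls quadratically with supply voltage) can be illustrated with a first-order model. The sketch below is not the library characterization performed in this work. It assumes the alpha-power-law delay model and the standard dynamic switching power expression, and all function names and numeric values (v_th, alpha, delay_nom, budget) are hypothetical placeholders chosen only for illustration.

```python
# Illustrative first-order model of voltage scaling under a timing budget.
# Every numeric value below is a hypothetical placeholder, not a result
# taken from the report.


def gate_delay(v_dd, v_th=0.35, alpha=1.3, k=1.0):
    """Alpha-power-law delay model (assumed): delay rises as v_dd nears v_th."""
    if v_dd <= v_th:
        raise ValueError("supply voltage must exceed the threshold voltage")
    return k * v_dd / (v_dd - v_th) ** alpha


def dynamic_power(v_dd, freq, c_eff=1.0, activity=1.0):
    """Dynamic switching power: quadratic in supply voltage at a fixed clock."""
    return activity * c_eff * v_dd ** 2 * freq


def lowest_voltage_within_budget(v_nom, delay_nom, budget, step=0.01, v_min=0.5):
    """Scan downward from v_nom and return the lowest voltage that meets budget.

    delay_nom is the monitor's critical-path delay at v_nom; budget is the
    clock period imposed by the main processor (same time unit).
    """
    best = v_nom
    steps = int(round((v_nom - v_min) / step))
    for i in range(steps + 1):
        v = v_nom - i * step
        # Scale the known nominal delay by the model's delay ratio.
        scaled = delay_nom * gate_delay(v) / gate_delay(v_nom)
        if scaled > budget:
            break  # delay only grows as v drops, so stop at the first miss
        best = v
    return best


if __name__ == "__main__":
    v_nom = 1.0
    # Hypothetical case: the monitor uses 40% of the clock period at v_nom.
    v_low = lowest_voltage_within_budget(v_nom, delay_nom=0.4, budget=1.0)
    saving = 1.0 - dynamic_power(v_low, 1.0) / dynamic_power(v_nom, 1.0)
    print(f"scaled supply: {v_low:.2f} V, dynamic power saving: {saving:.0%}")
```

The clock frequency is held fixed in the power comparison because the monitor must stay in step with the main processor. Under that assumption, the saving comes only from the quadratic voltage term.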