A PLiM Computer for the Internet of Things, in Computer
COVER FEATURE: VLSI FOR THE INTERNET OF THINGS

A PLiM Computer for the Internet of Things

Mathias Soeken, EPFL
Pierre-Emmanuel Gaillardon, University of Utah
Saeideh Shirinzadeh, University of Bremen
Rolf Drechsler, University of Bremen and German Research Center for Artificial Intelligence
Giovanni De Micheli, EPFL

Emerging applications are dramatically changing computer architecture requirements, with a shift toward big data that is processed using simple computations. A programmable logic-in-memory (PLiM) computer can allow memory cells to perform primitive logic operations and therefore compute without needing to communicate with a processing unit.

COMPUTER, JUNE 2017. 0018-9162/17/$33.00 © 2017 IEEE

The work of Hungarian-American mathematician John von Neumann uniquely influenced how we design computers. His “First Draft of a Report on the EDVAC,” written in 1945 while von Neumann was commuting by train to Los Alamos, New Mexico, proposed a uniform memory that contains both data and instructions. Known today as the von Neumann architecture, this key concept has been continually improved over the past few decades, resulting in today’s highly optimized and sophisticated memory hierarchies. The driving assumption behind this innovation has been that computation is complex and must be fast, and therefore memory needs to be readily available. Memory hierarchies allow fast access to small amounts of data and require longer times to access the larger memory located deeper in the hierarchy. Consequently, the fundamental assumption underlying today’s computing architectures is only valid as long as computation is dominant and not too much data is being processed.

Today, the requirements of emerging applications such as deep learning, data fusion, and the Internet of Things (IoT) are a challenge for the von Neumann architecture as the focus shifts to large amounts of data that are processed using comparably simple computations. At this point, improvements to the memory hierarchy cannot solve the problem, so a revolutionary change is necessary. In-memory computing is a promising candidate [1,2]. With this approach, memory cells can perform primitive logic operations and can therefore compute without needing to communicate with a processing unit. In addition, independent memory cells can perform their computations in parallel.

In this article, we propose a programmable logic-in-memory (PLiM) computer and demonstrate how it can help implement IoT applications. We show how to implement a Boolean majority operation with a single resistive RAM (RRAM) memory device (which can be used in industrial-scale applications) [3,4], build the PLiM computer from these devices, and program the PLiM computer. The underlying RRAM device switches its internal state based on its two terminals via inversion (complementation) and a majority-of-three operation. Consequently, for in-memory computing, this approach offers an assembly-level abstraction in terms of a natively implemented majority and complement operator. Therefore, we can use innovations in majority-based logic synthesis [5–9] to program the PLiM computer. Finally, because programs are data that are executed directly in memory, we can link applications by providing parts of the program’s instructions from distributed devices. This new programming paradigm ideally matches the capabilities of in-memory computing in the IoT context.

INTRINSIC MAJORITY OPERATIONS

Among other types of emerging nonvolatile memories, RRAMs are considered a leading candidate to implement memory arrays with higher density, lower power, and higher performance [10]. In addition to their memory properties, RRAMs can perform primitive Boolean logic operations.

To begin, let’s review the basic Boolean switching primitive offered by RRAMs (see Figure 1). A RRAM is a two-terminal device with an internal resistive state that can be programmed depending on the voltage difference between the top electrode T and the bottom electrode B. A transition occurs whenever T and B are assigned different voltages. If T = 0 and B = 1 (that is, $V_{TB} < V_{\mathrm{prog}}$), the resulting low-resistance state is Z = 0. If T = 1 and B = 0 (that is, $V_{TB} > V_{\mathrm{prog}}$), then Z = 1. Here, $V_{\mathrm{prog}}$ is the memory technology’s programming voltage, which for simplicity we assume is symmetric. The truth table in Figure 1d summarizes this behavior.

FIGURE 1. Intrinsic majority operations: (a) a schematic of a resistive RAM (RRAM) cell with its internal state Z and electrodes B and T; (b) state machine illustrating how Z changes based on values for B and T; (c) transition relation for the state machine, resulting in the RM3 operation; and (d) truth table for the transition relation.

By denoting Z as the current resistance value and Z′ as the resistance value after assigning signals to T and B, it is possible to express Z′ as

$$Z' = (T \wedge \bar{B}) \vee (T \wedge Z) \vee (\bar{B} \wedge Z) = \langle T\,\bar{B}\,Z \rangle, \qquad (1)$$

where $\langle x\,y\,z \rangle = xy \vee xz \vee yz$ is the Boolean three-input majority function that evaluates to true if and only if at least two of its inputs are true. In the special case of Equation 1, one operand is negated, and for convenience, we define RM3(T, B, Z) = $\langle T\,\bar{B}\,Z \rangle$ for a three-input resistive majority. RM3 is universal and will be used as the PLiM’s elementary computing operation.

PLiM COMPUTER

The general philosophy underpinning the PLiM architecture addresses how to add computing capabilities (through bit-level RM3 instructions) to a regular dense memory array. Extra hardware is necessary to obtain a computer’s abstraction without losing the standard memory functionality.

Figure 2 shows the PLiM computer architecture, which consists of a standard memory array with signals that are wrapped with the PLiM controller. This controller is a lightweight synchronous block that controls the memory array’s access bus to allow a computation mode to run. The computation mode runs a sequential execution of a given set of instructions that represent a program. The program is stored on the memory array, and its output updates the memory array itself. The logic-in-memory (LiM) input controls the transition between the computation and memory modes.

FIGURE 2. Programmable logic-in-memory (PLiM) computer. The architecture consists of a standard memory array and a lightweight controller for data access and controlling whether the memory behaves as a default memory or performs in-memory computations. [Signal labels in the figure: clock, reset, enable, lim, rw, data, address.]

PLiM revolves around one single instruction: RM3(A, B, Z). The instruction takes three operands (A, B, and Z), applies the RM3 majority operation with A as the top electrode and B as the bottom electrode, and updates the value of Z accordingly. The single-instruction scheme simplifies the architecture as it is directly associated with the memory’s intrinsic logic operation.

The architecture’s source, destination, and processing unit are all the memory block itself. Performing the instruction simply means loading the bit-level values of A and B from memory and applying them to Z. Also, the instruction itself is stored on the same memory block. Hence, to execute an instruction, the instruction is first loaded from memory, the operands are then loaded from memory, and the operands are finally applied to the destination. (Additional details about the RM3 instruction encoding are available in earlier work [11].)

PLiM COMPILER

We can now show how to compile [...] techniques were initially presented in earlier work [12].) For the sake of convenience, we introduce the following commands, which are shorthand for several useful RM3 instructions. Given registers a, b, and z, we define

› ZERO(z): z ← RM3(0, 1, z) = $\langle 0\,0\,z \rangle$ = 0
› ONE(z): z ← RM3(1, 0, z) = $\langle 1\,1\,z \rangle$ = 1
› BUF(a, z): ZERO(z); z ← RM3(a, 0, z) = $\langle a\,1\,0 \rangle$ = a
› NOT(a, z): ZERO(z); z ← RM3(1, a, z) = $\langle 1\,\bar{a}\,0 \rangle$ = $\bar{a}$
› RM(a, b, z): z ← RM3(a, b, z)

In these commands, z is the modified register. Also, note that the commands ZERO, ONE, and RM require exactly one RM3 instruction, whereas the other two require two instructions.

FIGURE 3. Using majority-inverter graphs (MIGs) to translate Boolean functions into a sequence of RM3 instructions. (a) Rewriting an MIG involves using inverter propagation, which can lead to better starting points for [...]

Next, we show how to use majority-inverter graphs (MIGs) [9] to translate Boolean functions into a sequence of RM3 instructions that compute the functions. The left side of Figure 3a illustrates the idea using a small MIG. It consists of two nodes and four primary inputs x1, x2, x3, and x4. We want to compute the function of the single primary output y. We consider primary inputs to be environment variables that cannot be modified by PLiM instructions. Because none of the primary inputs can be overridden, we need a free RRAM to which we can write the result in order to compute the output of node 1. Also, we need to get rid of one of the inverters, because the RM3 operation expects exactly one input to be inverted. The two commands NOT(x3, z1); RM(x1, x2, z1) compute the value of node 1 and require three RM3 instructions and one RRAM cell z1, which stores the node’s output.
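
The RM3 primitive and the shorthand commands can be checked with a small bit-level model. The following Python sketch is illustrative only and is not the authors' implementation: the names `maj3`, `rm3`, `PlimSketch`, and `node1` are hypothetical, an unwritten cell is assumed to start at 0, and instruction fetch and encoding are not modeled. It applies the command sequence NOT(x3, z1); RM(x1, x2, z1) and counts the RM3 instructions issued. By the definitions above, that sequence leaves the majority of x1, NOT x2, and NOT x3 in z1.

```python
from itertools import product


def maj3(x, y, z):
    """Three-input majority <xyz> = xy OR xz OR yz on bits 0/1."""
    return (x & y) | (x & z) | (y & z)


def rm3(t, b, z):
    """Resistive majority RM3(T, B, Z) = <T, NOT B, Z>."""
    return maj3(t, 1 - b, z)


class PlimSketch:
    """Hypothetical model: named one-bit cells plus an RM3 instruction counter."""

    def __init__(self, **cells):
        self.cells = dict(cells)
        self.rm3_count = 0

    def _read(self, operand):
        # An operand is either the constant 0 or 1, or the name of a cell.
        return operand if operand in (0, 1) else self.cells[operand]

    def rm(self, a, b, z):
        # z <- RM3(a, b, z); assumption: a cell never written before holds 0.
        self.cells[z] = rm3(self._read(a), self._read(b), self.cells.get(z, 0))
        self.rm3_count += 1

    def zero(self, z):
        self.rm(0, 1, z)      # one RM3 instruction

    def one(self, z):
        self.rm(1, 0, z)      # one RM3 instruction

    def buf(self, a, z):
        self.zero(z)          # two RM3 instructions in total
        self.rm(a, 0, z)

    def not_(self, a, z):
        self.zero(z)          # two RM3 instructions in total
        self.rm(1, a, z)


def node1(x1, x2, x3):
    """Run NOT(x3, z1); RM(x1, x2, z1) and return (z1, RM3 instruction count)."""
    m = PlimSketch(x1=x1, x2=x2, x3=x3)
    m.not_("x3", "z1")
    m.rm("x1", "x2", "z1")
    return m.cells["z1"], m.rm3_count


if __name__ == "__main__":
    for x1, x2, x3 in product((0, 1), repeat=3):
        z1, n = node1(x1, x2, x3)
        print(x1, x2, x3, "->", z1, "using", n, "RM3 instructions")
```

Keeping the instruction counter inside the model makes the cost statement easy to check: NOT contributes two RM3 instructions and RM contributes one, which gives the three instructions stated for node 1.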