COVER FEATURE VLSI FOR THE INTERNET OF THINGS

A PLiM for the Internet of Things

Mathias Soeken, EPFL; Pierre-Emmanuel Gaillardon, University of Utah; Saeideh Shirinzadeh, University of Bremen; Rolf Drechsler, University of Bremen and German Research Center for Artificial Intelligence; Giovanni De Micheli, EPFL

Emerging applications are dramatically changing computer architecture requirements, with a shift toward big data that is processed using simple computations. A programmable logic-in-memory (PLiM) computer can allow memory cells to perform primitive logic operations and therefore compute without needing to communicate with a processing unit.

The work of Hungarian-American mathematician John von Neumann uniquely influenced how we design computers. His “First Draft of a Report on the EDVAC,” written in 1945 while von Neumann was commuting by train to Los Alamos, New Mexico, proposed a uniform memory that contains both data and instructions. Known today as the von Neumann architecture, this key concept has been continually improved over the past few decades, resulting in today’s highly optimized and sophisticated memory hierarchies. The driving assumption behind this innovation has been that computation is complex and must be fast, and therefore memory needs to be readily available. Memory hierarchies allow fast access to small amounts of data and require longer times to access the larger memory located deeper in the hierarchy. Consequently, the fundamental assumption underlying today’s computing architectures is only valid as long as computation

COMPUTER 0018-9162/17/$33.00 © 2017 IEEE JUNE 2017 35 VLSI FOR THE INTERNET OF THINGS

is dominant and not too much data is being processed.

Today, the requirements of emerging applications such as deep learning, data fusion, and the Internet of Things (IoT) are a challenge for the von Neumann architecture as the focus shifts to large amounts of data that are processed using comparably simple computations. At this point, improvements to the memory hierarchy cannot solve the problem, so a revolutionary change is necessary. In-memory computing is a promising candidate [1, 2]. With this approach, memory cells can perform primitive logic operations and can therefore compute without needing to communicate with a processing unit. In addition, independent memory cells can perform their computations in parallel.

In this article, we propose a programmable logic-in-memory (PLiM) computer and demonstrate how it can help implement IoT applications. We show how to implement a Boolean majority operation with a single resistive RAM (RRAM) memory device (which can be used in industrial-scale applications) [3, 4], build the PLiM computer from these devices, and program the PLiM computer. The underlying RRAM device switches its internal state based on its two terminals via inversion (complementation) and a majority-of-three operation. Consequently, for in-memory computing, this approach offers an assembly-level abstraction in terms of a natively implemented majority and complement operator. Therefore, we can use innovations in majority-based logic synthesis [5–9] to program the PLiM computer. Finally, because programs are data that are executed directly in memory, we can link applications by providing parts of the program’s instructions from distributed devices. This new programming paradigm ideally matches the capabilities of in-memory computing in the IoT context.

INTRINSIC MAJORITY OPERATIONS

Among other types of emerging nonvolatile memories, RRAMs are considered a leading candidate to implement memory arrays with higher density, lower power, and higher performance [10]. In addition to their memory properties, RRAMs can perform primitive Boolean logic operations.

To begin, let’s review the basic Boolean switching primitive offered by RRAMs (see Figure 1). An RRAM is a two-terminal device with an internal resistive state that can be programmed depending on the voltage difference between the top electrode T and the bottom electrode B. A transition occurs whenever T and B are assigned different voltages. If T = 0 and B = 1 (that is, $V_{TB} < -V_{\mathrm{prog}}$), the resulting low-resistance state is Z = 0. If T = 1 and B = 0 (that is, $V_{TB} > V_{\mathrm{prog}}$), then Z = 1. Here, $V_{\mathrm{prog}}$ is the memory technology’s programming voltage, which for simplicity we assume is symmetric. The truth table in Figure 1d summarizes this behavior.

FIGURE 1. Intrinsic majority operations: (a) a schematic of a resistive RAM (RRAM) cell with its internal state Z and electrodes B and T; (b) state machine illustrating how Z changes based on values for B and T; (c) transition relation for the state machine, resulting in the RM3 operation; and (d) truth table for the transition relation.

By denoting Z as the current resistance value and Z′ as the resistance value after assigning signals to T and B, it is possible to express Z′ as

$$Z' = \bigl(\bar{Z} \wedge (T \wedge \bar{B})\bigr) \vee \bigl(Z \wedge (T \vee \bar{B})\bigr) = \langle T \bar{B} Z \rangle, \tag{1}$$

where $\langle xyz \rangle = xy \vee xz \vee yz$ is the Boolean three-input majority function that evaluates to true if and only if at least two of its inputs are true. In the special case of Equation 1, one operand is negated, and for convenience, we define RM3(T, B, Z) = $\langle T \bar{B} Z \rangle$ for a three-input resistive majority. RM3 is universal and will be used as the PLiM’s elementary computing operation.

PLiM COMPUTER

The general philosophy underpinning the PLiM architecture addresses how to add computing capabilities (through bit-level RM3 instructions) to a regular dense memory array. Extra hardware is necessary to obtain a computer’s abstraction without losing the standard memory functionality. Figure 2 shows the PLiM computer architecture, which consists of a standard memory array with signals that are wrapped with the PLiM controller. This controller is a lightweight synchronous block that controls the memory array’s access bus to allow a computation mode to run. The computation mode runs a sequential execution of a given set of instructions that represent a program. The program is stored on the memory array, and its output updates the memory array itself. The logic-in-memory (LiM) input controls the transition between the computation and memory modes.

FIGURE 2. Programmable logic-in-memory (PLiM) computer. The architecture consists of a standard memory array and a lightweight controller for data access and controlling whether the memory behaves as a default memory or performs in-memory computations.

PLiM revolves around one single instruction: RM3(A, B, Z). The instruction takes three operands (A, B, and Z), applies the RM3 majority operation with A as the top electrode and B as the bottom electrode, and updates the value of Z accordingly. The single-instruction scheme simplifies the architecture as it is directly associated with the memory’s intrinsic logic operation.

The memory block itself is the architecture’s source, destination, and processing unit. Performing the instruction simply means loading the bit-level values of A and B from memory and applying them to Z. Also, the instruction itself is stored on the same memory block. Hence, to execute an instruction, the instruction is first loaded from memory, the operands are then loaded from memory, and the operands are finally applied to the destination. (Additional details about the RM3 instruction encoding are available in earlier work [11].)

PLiM COMPILER

We can now show how to compile arbitrary Boolean functions into RM3 instruction streams. (These techniques were initially presented in earlier work [12].) For the sake of convenience, we introduce the following commands, which are shorthand for several useful RM3 instructions. Given registers a, b, and z, we define

›› ZERO(z): z ← RM3(0, 1, z) = $\langle 0\,0\,z \rangle$ = 0
›› ONE(z): z ← RM3(1, 0, z) = $\langle 1\,1\,z \rangle$ = 1
›› BUF(a, z): ZERO(z); z ← RM3(a, 0, z) = $\langle a\,1\,0 \rangle$ = a
›› NOT(a, z): ZERO(z); z ← RM3(1, a, z) = $\langle 1\,\bar{a}\,0 \rangle$ = $\bar{a}$
›› RM(a, b, z): z ← RM3(a, b, z)

In these commands, z is the modified register. Also, note that the commands ZERO, ONE, and RM require exactly one RM3 instruction, whereas the other two require two instructions.

Next, we show how to use majority-inverter graphs (MIGs) [9] to translate Boolean functions into a sequence of RM3 instructions that compute the functions. The left side of Figure 3a illustrates the idea using a small MIG. It consists of two nodes and four primary inputs x1, x2, x3, and x4. We want to compute the function of the single primary output y. We consider primary inputs to be environment variables that cannot be modified by PLiM instructions. Because none of the primary inputs can be overridden, we need a free RRAM to which we can write the result in order to compute the output of node 1. Also, we need to get rid of one of the inverters, because the RM3 operation expects exactly one input to be inverted. The two commands NOT(x3, z1); RM(x1, x2, z1) compute the value of node 1 and require three RM3 instructions and one RRAM cell z1, which stores the node’s output. For the remainder of this article, we will use z variables to refer to RRAM cells.

FIGURE 3. Using majority-inverter graphs (MIGs) to translate Boolean functions into a sequence of RM3 instructions. (a) Rewriting an MIG involves using inverter propagation, which can lead to better starting points for PLiM program compilation. (b) The order in which nodes are processed in the MIG affects the PLiM program’s quality.
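The RM3 primitive, the five shorthand commands, and the two-command program for node 1 can be simulated in a few lines. The following is a minimal sketch in Python, not the authors' implementation: the class `PlimMemory`, its method names, and the helper `node1` are hypothetical names introduced here, and the model assumes that a cell that has never been written holds 0.

```python
from itertools import product


def rm3(t: int, b: int, z: int) -> int:
    """Resistive majority on single bits: RM3(T, B, Z) = <T, not B, Z>."""
    nb = 1 - b
    return (t & nb) | (t & z) | (nb & z)


class PlimMemory:
    """Toy model of a bit-addressable memory (hypothetical, for illustration)."""

    def __init__(self, **cells: int) -> None:
        self.cells = dict(cells)   # cell name -> stored bit
        self.executed = 0          # number of RM3 instructions issued

    def _value(self, operand) -> int:
        # An operand is either a cell name or one of the constants 0 and 1.
        return self.cells[operand] if isinstance(operand, str) else operand

    def rm(self, a, b, z: str) -> None:
        # Assumption of this model: an unwritten cell holds 0.
        old = self.cells.get(z, 0)
        self.cells[z] = rm3(self._value(a), self._value(b), old)
        self.executed += 1

    def zero(self, z: str) -> None:
        self.rm(0, 1, z)            # <0 0 z> = 0

    def one(self, z: str) -> None:
        self.rm(1, 0, z)            # <1 1 z> = 1

    def buf(self, a, z: str) -> None:
        self.zero(z)
        self.rm(a, 0, z)            # <a 1 0> = a

    def not_(self, a, z: str) -> None:
        self.zero(z)
        self.rm(1, a, z)            # <1 (not a) 0> = not a


def node1(x1: int, x2: int, x3: int) -> tuple[int, int]:
    """Run NOT(x3, z1); RM(x1, x2, z1) and return (z1, instruction count)."""
    mem = PlimMemory(x1=x1, x2=x2, x3=x3)
    mem.not_("x3", "z1")
    mem.rm("x1", "x2", "z1")
    return mem.cells["z1"], mem.executed


if __name__ == "__main__":
    for x1, x2, x3 in product((0, 1), repeat=3):
        print(x1, x2, x3, node1(x1, x2, x3))
```

In this model, `rm3` holds the stored bit when both electrodes carry the same value and overwrites it otherwise, which mirrors the switching behavior described for Figure 1. The instruction counter reproduces the cost stated in the text: one RM3 instruction for ZERO, ONE, and RM, and two for BUF and NOT.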


In the next step, we can compute node 2 with a similar sequence: NOT(z1, z2); RM(x2, x4, z2). Again, this requires three RM3 instructions and one additional RRAM cell z2.

In total, the PLiM program requires six instructions and two RRAMs. Several MIGs exist that realize the same function, and we can obtain one from the other by applying rewriting rules [9]. We can illustrate the effect by applying the inverter propagation

$$\langle x_1 \bar{x}_2 \bar{x}_3 \rangle = \overline{\langle \bar{x}_1 x_2 x_3 \rangle} \tag{2}$$

to node 1.

The right side of Figure 3a illustrates the resulting MIG. This MIG can be translated into the PLiM program BUF(x3, z1); RM(x2, x1, z1); RM(x2, x4, z1). This program only requires four RM3 instructions and one RRAM cell. In addition to inverter propagation, the other rewriting rules from the axiomatic set of MIG manipulation rules can lead to MIGs for which we can find better PLiM programs. Furthermore, the MIG rewriting algorithms to optimize for PLiM compilation differ from rewriting algorithms that target area or delay optimization in conventional logic synthesis.

Even without rewriting the MIG, we can obtain different PLiM programs by simply changing the order in which the nodes are traversed and by changing the children of each node to be used as the operand in the RM3 instruction. Nodes can be traversed as long as they follow a topological order from the primary inputs to the primary outputs. Operands can be selected arbitrarily because the majority operation is fully symmetric. However, the choice of a traversal order and operand mapping can have a significant impact, so we are interested in finding a good one. Figure 3b illustrates this effect.

Based on the MIG in Figure 3b, several different PLiM programs can be constructed using the technique we just discussed. Figure 4 shows two example PLiM programs. For the longer program (Figure 4a), which consists of 19 RM3 instructions and 7 RRAM cells, we used the traversal order 1, 2, 3, 4, 5, 6. The shorter program (Figure 4b) was created using the traversal order 1, 2, 3, 5, 4, 6. The latter program consists of 15 RM3 instructions and only 4 RRAM cells. However, the traversal order is not the only cause of this improvement. When selecting which child to map to which operand in the RM3 instruction, constants in particular allow some freedom. We can always invert a constant to match the polarity requirement of operand B of the RM3 instruction, for example, as we did for node 1. In the program in Figure 4a, we chose to invert input x1, requiring a NOT command and one additional RRAM. Those can be avoided when using the constant 0 as an inverted constant 1, as we did in the Figure 4b program.

FIGURE 4. Two PLiM programs constructed from the MIG in Figure 3b: (a) 19 RM3 instructions and seven RRAM cells and (b) 15 RM3 instructions and four RRAM cells. The difference in the programs is due to the choice of node-traversal order and the mapping of the nodes’ children to RM3 operands. The circled i indicates the computation of node i.

Figure 5 shows our experimental results for the PLiM compiler. We applied the PLiM compiler to MIGs for instances in the EPFL (École Polytechnique Fédérale de Lausanne) benchmark suite (lsi.epfl.ch/benchmarks). Figure 5a gives the number of RM3 instructions, and Figure 5b gives the number of RRAMs. The blue bars correspond to an approach in which we directly translated the MIGs to PLiM programs following a node-traversal order on node indexes and selecting operands from left to right. The red bars show the effect after MIG rewriting but still using the naive node-traversal order and operand selection. Finally, the brown bars show the effect after MIG rewriting and taking into consideration heuristics for better node traversal and operand selection. These results show that MIG rewriting strongly affects the number of instructions, but not necessarily the number of RRAMs. However, when taking the node-traversal heuristics into account, the number of RRAMs decreases, but there is little gain in the number of instructions.

The PLiM computer we describe here is a low-power platform that is capable of implementing the IoT applications of tomorrow. In-memory computing is a better fit for IoT applications than conventional von Neumann architectures because it can deal with large amounts of data using comparably simple computations. PLiM computers and programs allow a new paradigm of computing, where programs are sequences of RM3 instructions that send the data from one PLiM computer to another. These PLiM programs can be partial and distributed, where each IoT device provides its part to the computation. This model allows a high degree of configurability.

As part of our ongoing research efforts, we are evaluating the physical design of a PLiM computer and more advanced programming models.
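The effect of inverter propagation on the two-node example from the PLiM Compiler section can be checked exhaustively. The sketch below, written in Python under the same bit-level reading of RM3, runs the six-instruction program and the rewritten four-instruction program on all 16 input assignments and compares outputs, instruction counts, and the number of RRAM work cells. The tuple encoding of commands and the helpers `expand`, `run`, and `compare` are illustrative assumptions, not part of the PLiM toolchain.

```python
from itertools import product


def rm3(t, b, z):
    """RM3(T, B, Z) = <T, not B, Z> on single bits."""
    nb = 1 - b
    return (t & nb) | (t & z) | (nb & z)


def expand(command):
    """Expand a shorthand command into raw RM3 triples (A, B, Z)."""
    op, *args = command
    if op == "BUF":
        a, z = args
        return [(0, 1, z), (a, 0, z)]
    if op == "NOT":
        a, z = args
        return [(0, 1, z), (1, a, z)]
    if op == "RM":
        a, b, z = args
        return [(a, b, z)]
    raise ValueError(f"unknown command: {op}")


def run(program, inputs, output):
    """Execute a program; return (output bit, #RM3 instructions, #work cells)."""
    mem = dict(inputs)
    steps = [triple for command in program for triple in expand(command)]
    for a, b, z in steps:
        va = mem[a] if isinstance(a, str) else a   # cell name or constant
        vb = mem[b] if isinstance(b, str) else b
        mem[z] = rm3(va, vb, mem.get(z, 0))        # unwritten cells assumed 0
    work_cells = {z for _, _, z in steps}
    return mem[output], len(steps), len(work_cells)


# Programs as given in the text, before and after inverter propagation.
ORIGINAL = [("NOT", "x3", "z1"), ("RM", "x1", "x2", "z1"),
            ("NOT", "z1", "z2"), ("RM", "x2", "x4", "z2")]
REWRITTEN = [("BUF", "x3", "z1"), ("RM", "x2", "x1", "z1"),
             ("RM", "x2", "x4", "z1")]


def compare():
    """Run both programs on all 16 input assignments and collect the results."""
    results = []
    for bits in product((0, 1), repeat=4):
        inputs = dict(zip(("x1", "x2", "x3", "x4"), bits))
        results.append((run(ORIGINAL, inputs, "z2"),
                        run(REWRITTEN, inputs, "z1")))
    return results


if __name__ == "__main__":
    for original, rewritten in compare():
        print(original, rewritten)
```

Both programs agree on every assignment because the rewritten program stores the complement of node 1 directly in z1, which is exactly what the second NOT command of the original program had to compute separately.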

FIGURE 5. Experimental results after compiling MIGs into PLiM programs: (a) number of instructions and (b) number of RRAMs. The blue bars show a configuration in which the MIGs were translated naively without taking rewriting or node-traversal heuristics into account. The red bars show the results after rewriting, and the brown bars show the results after rewriting and node-traversal heuristics. Benchmarks shown: adder, bar, cavlc, ctrl, dec, div, i2c, int2float, log2, max, memctrl, multiplier, priority, router, sin, sqrt, square, voter.

ACKNOWLEDGMENTS

We thank Luca Amarù and Giulia Meuli for fruitful discussions. This research was supported by H2020-ERC-2014-ADG 669354 CyberCare, the Swiss National Science Foundation (200021-169084 MAJesty and 200021-146600), the University of Utah SEED grant 51900298, and the University of Bremen’s graduate school SyDe (System Design), funded by the German Excellence Initiative.

REFERENCES

1. M. Chang et al., “Designs of Emerging Memory Based Non-volatile TCAM for Internet-of-Things (IoT) and Big Data Processing: A 5T2R Universal Cell,” Proc. Int’l Symp. Circuits and Systems (ISCAS 16), 2016, pp. 1142–1145.
2. M. Ueki et al., “Low-Power Embedded ReRAM Technology for IoT Applications,” Proc. Symp. VLSI Circuits (VLSI Circuits 15), 2015, pp. 108–109.
3. R. Fackenthal et al., “A 16Gb ReRAM with 200MB/s Write and 1GB/s Read in 27nm Technology,” Proc. IEEE Int’l Solid-State Circuits Conf. (ISSCC 14), 2014, pp. 338–339.
4. S. Sheu et al., “A 4Mb Embedded SLC Resistive-RAM Macro with 7.2ns Read-Write Random Access Time and 160ns MLC-Access Capability,” Proc. IEEE Int’l Solid-State Circuits Conf. (ISSCC 11), 2011, pp. 200–202.
5. S.B. Akers Jr., “Synthesis of Combinational Logic Using Three-Input Majority Gates,” Proc. 3rd Ann. Symp. Switching Circuit Theory and Logical Design (SWCT 62), 1962, pp. 149–158.
6. H.S. Miller and R.O. Winder, “Majority-Logic Synthesis by Geometric Methods,” IRE Trans. Electronic Computers, vol. 11, no. 1, 1962, pp. 89–90.
7. R. Lindaman, “A Theorem for Deriving Majority-Logic Networks within an Augmented Boolean Algebra,” IRE Trans. Electronic Computers, vol. 9, no. 3, 1960, pp. 338–342.
8. R. Zhang, P. Gupta, and N.K. Jha, “Majority and Minority Network Synthesis with Application to QCA-, SET-, and TPL-Based Nanotechnologies,” IEEE Trans. CAD of Integrated Circuits and Systems, vol. 26, no. 7, 2007, pp. 1233–1245.
9. L.G. Amaru, P.-E. Gaillardon, and G. De Micheli, “Majority-Inverter Graph: A Novel Data Structure and Algorithms for Efficient Logic Optimization,” Proc. Design Automation Conf. (DAC 14), 2014, article no. 194.
10. H.P. Wong et al., “Metal-Oxide RRAM,” Proc. IEEE, vol. 100, no. 6, 2012, pp. 1951–1970.
11. P.-E. Gaillardon et al., “The Programmable Logic-In-Memory (PLiM) Computer,” Proc. Design, Automation and Test in Europe (DATE 16), 2016, pp. 427–432.
12. M. Soeken et al., “An MIG-Based Compiler for Programmable Logic-In-Memory Architectures,” Proc. Design Automation Conf. (DAC 16), 2016, article no. 117.

ABOUT THE AUTHORS

MATHIAS SOEKEN is a scientist at EPFL (École Polytechnique Fédérale de Lausanne). His research interests include the many aspects of logic synthesis and formal verification. Soeken received a PhD in computer science and engineering from the University of Bremen. He is a member of IEEE and ACM. Contact him at [email protected].

PIERRE-EMMANUEL GAILLARDON is an assistant professor in the Electrical and Computer Engineering Department at the University of Utah. His research interests include reconfigurable logic architectures and digital circuits exploiting emerging device technologies and novel electronic design automation techniques. Gaillardon received a PhD in electrical engineering from the University of Lyon. He is a Senior Member of IEEE and a member of ACM. Contact him at [email protected].

SAEIDEH SHIRINZADEH is a PhD student in the Group for Computer Architecture at the University of Bremen’s Institute of Computer Science. Her research interests include multiobjective optimization, evolutionary computation, logic synthesis, and in-memory computing. Shirinzadeh received an MSc in electrical engineering from the University of Guilan. Contact her at s.shirinzadeh@uni-bremen.de.

ROLF DRECHSLER is a professor and the head of the Group for Computer Architecture at the University of Bremen’s Institute of Computer Science. He is also the director of the Cyber-Physical Systems Group at the German Research Center for Artificial Intelligence (DFKI). Drechsler’s research interests include the development and design of data structures and algorithms, with a focus on circuit and system design. He received a PhD in computer science from J.W. Goethe University Frankfurt am Main. Drechsler is an IEEE Fellow. Contact him at [email protected].

GIOVANNI DE MICHELI is a professor and director of the Institute of Electrical Engineering at EPFL. He is also a program leader of the Nano-Tera.ch program. De Micheli is a recipient of the IEEE Computer Society Harry Goode Award and the European Design and Automation Association (EDAA) Lifetime Achievement Award. He is a Fellow of IEEE and ACM, a former president of the IEEE Circuits and Systems Society, a former IEEE Division 1 director, and a member of Academia Europaea. Contact him at [email protected].

