
A Field Programmable Transistor Array Featuring Single-Cycle Partial/Full Dynamic Reconfiguration

Jingxiang Tian, Gaurav Rajavendra Reddy, Jiajia Wang, William Swartz Jr., Yiorgos Makris and Carl Sechen
Department of Electrical Engineering, The University of Texas at Dallas, Richardson, TX 75080
Email: {jxt122130, gxr141930, jxw143630, wps100020, gxm112130, cms057000}@utdallas.edu

978-3-9815370-8-6/17/$31.00 © 2017 IEEE

Abstract: We introduce a CMOS computational fabric consisting of carefully arranged regular rows and columns of transistors which can be individually configured and appropriately interconnected in order to implement a target digital circuit. Termed Field Programmable Transistor Array (FPTA), this novel reconfigurable architecture enables several highly desirable features including (i) simultaneous storage of three configurations along with the ability to dynamically switch between them in a fraction of a single cycle, while retaining the fabric’s computational state, (ii) rapid partial or full modification of a stored configuration in a time proportional to the number of modified configuration bits through the use of hierarchically arranged, high-throughput, asynchronously pipelined memory buffers, and (iii) support for libraries containing cells of the same height and variable width, just as in a typical standard cell circuit, thereby simplifying transition from a prototype to a custom IC design. Besides presenting the design details of this fabric in a 130nm technology and demonstrating the aforementioned capabilities, we also briefly discuss the development of a complete CAD flow for programming this fabric and we use numerous benchmark circuits to contrast its area efficiency against a typical FPGA implemented in the same technology node.

I. INTRODUCTION

We present a novel field programmable device, developed on conventional static CMOS processes, which has significant differences and potential advantages over field programmable gate arrays (FPGAs). Specifically, our design seeks to:

1) Improve area utilization: Unlike the basic configurable logic block (CLB) of FPGAs, which employs look-up tables (LUTs) to generate combinational logic functions [1], our FPTA relies on a carefully arranged, configurable array of transistors, which can be interconnected to implement standard library cells. Thereby, for logic outside the custom cells (e.g., full adders, D flip-flops, multiplexers) that both FPGAs and our FPTA explicitly possess, we surmise that transistor utilization in our FPTA is better than in FPGA LUTs. In other words, an FPGA may allocate an entire LUT to implement even a relatively simple gate, while our FPTA allocates only the precise number of columns of transistors required.

2) Enable time-sharing between multiple circuits: Our design supports simultaneous storage of three separate configurations, each with its own computational state. Therefore, we also provide the means to switch, in a fraction of a single cycle, between any of the three stored configurations, or load a new configuration while toggling between the other two.

3) Support rapid circuit updates: Instead of being serially loaded, our FPTA configuration is stored in hierarchically arranged, high-throughput, asynchronously pipelined memory buffers. This enables not only faster configuration but also rapid dynamic partial reconfiguration wherein only a portion of a circuit is reloaded by addressing specific transistor columns.

In the following, we discuss the FPTA architecture and design features which support each of the above three objectives (Sections II-IV). To demonstrate the aforementioned capabilities, we designed and laid out an FPTA prototype in the IBM 130nm 1.2V process and we developed a CAD tool flow (Section V) to synthesize, place and route designs onto it. Results using various benchmark circuits (Section VI) confirm that, despite the added features of single-cycle switching between multiple designs and rapid partial reconfiguration, our FPTA achieves better area utilization in comparison to a typical FPGA in the same technology node.

II. TRANSISTOR-LEVEL PROGRAMMING

To present our FPTA’s ability to support transistor-level programming, we first describe its overall architecture, including the basic logic cell structure and the routing resources.

A. FPTA Architecture Overview

The FPTA architecture, which resembles a standard cell circuit, consists of numerous long rows of transistors and can work with cell libraries similar to those used for a typical standard cell-based ASIC, where each cell has the same height and variable width. In the FPTA, the granularity of the width of the cells is one column of transistors. As shown in Fig. 1(a), a column consists of two pMOS transistors above two nMOS transistors. This basic column is replicated repeatedly in the horizontal direction, forming a row. Vertical transistors in Fig. 1(a) can be programmed to be always on, always off, or to receive logic signals. Among the horizontal transistors, while the innermost (blue-colored) ones are strictly used for isolation, the outermost ones not only support isolation but also enable the use of logic functions that require up to three transistors in series. The metal1 (M1) layer is used to interconnect the transistors and various logic gates.

The actual hardware implementation of the row-based FPTA groups every four columns of transistors into so-called ‘logic cells’. In Fig. 1(a), a potential logic gate output is illustrated by means of a small green rectangle. Each potential output is optionally connected to a vertical metal2 (M2) track, by means of a programmed switch. In addition, the two pMOS and the two nMOS inputs in a column are directly connected to individual vertical M2 tracks. Each of these tracks is driven by either a programming bit or a logic signal.

In addition to the four columns of transistors, each logic cell comes with a custom D flip-flop (DFF), a full adder (FA) and a multiplexer (MUX), whose inputs and outputs (I/Os) are optionally connected to logic cell I/Os. The input signal (Signal_in) of the DFF, which is shown in Fig. 1(b), is routed via the de-multiplexer either to a logic transistor in the first column of the logic cell (if ctrl = 1 and enable = 0) or to the D input of the DFF (if enable = 1). All DFFs provided by the logic cells are connected in a scan chain. The three inputs of the FA, which is shown in Fig. 1(c), span across the 2nd and 3rd columns of the logic cell. The carry and sum outputs (either inverted or non-inverted) are provided at the outputs of the 2nd and 3rd columns, respectively. The custom MUX, which is shown in Fig. 1(d), only occupies one column. The MUX output is provided in either inverting or non-inverting form at the output of the 4th column.

Fig. 1: (a) Logic cell structure, (b) Built-in D flip-flop, (c) Built-in full adder, and (d) Built-in multiplexer

Figs. 2(a) and (b) depict how two different cells, i.e., a NAND3 and an AOI22 gate, are programmed in a logic cell. All the transistors, except for the ones highlighted in black, are turned off. Among the highlighted transistors, the ones with signal names at their gate terminals receive primary inputs; the rest are turned on to complete the circuit. Transistors highlighted in blue are turned on with programming bits to form the output node of the logic function.

Note that a pull-down (or pull-up) network of three transistors in series is the maximum possible. The motivation for limiting to three transistors in series was based on area efficiency versus power efficiency concerns.

[...] each metal line is labeled with the letter ‘M’ followed by the layer number, and then an underscore followed by the line or track number. In Fig. 1(a), the transistors in the logic cell are labeled as to how they connect with M2 lines in Fig. 2(c). Each small square (of various colors) shown in Fig. 2(c) is a switch, implemented by an nMOS transistor controlled by a programming bit, with the source and drain connecting the two perpendicular metal lines (on different layers) as shown in Fig. 2(d) (with the programming bit called ctrl in this case). Since the pass transistors do not pass a full Vdd, a half-keeper is added to each metal segment, as shown in Fig. 2(d).

There are 12 vertical M4 lines that go over the logic cell unit along with switches connecting to the 17 horizontal M3 lines (switches in gray) and to the 9 M5 lines (switches in red). Each of the 9 M3 lines and each of the 9 M5 lines has 4 switches to M4. Each of the remaining 8 M3 lines has 3 switches to M4. For the switchbox above (below) the logic cell, the 16 M2 lines terminate inside the logic cell, either at a pMOS (nMOS) gate input or at an output. In fact, 4 M2 lines connect to the outputs of the logic cell and 12 M2 lines connect to pMOS (nMOS) inputs. Each of the 12 M2 lines, [...]
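As a quick consistency check on the routing description, the line and switch counts quoted in the text can be tallied in a few lines of Python. This is only an illustrative sketch: the `LineGroup` class, the `SWITCHBOX` list and the helper functions are hypothetical names introduced here, and the totals are derived from the quoted figures under the assumption that they describe every M3-to-M4 and M5-to-M4 switch over one logic cell unit. The paper itself does not state a total.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineGroup:
    """A group of horizontal lines sharing the same switch count (hypothetical model)."""
    layer: str          # metal layer of the horizontal lines, e.g. "M3"
    lines: int          # number of lines in the group
    switches_each: int  # switches from each line to the vertical M4 lines


# Counts quoted in the text; assumed to cover one logic cell unit.
SWITCHBOX = [
    LineGroup("M3", 9, 4),  # 9 M3 lines with 4 switches to M4 each
    LineGroup("M3", 8, 3),  # remaining 8 M3 lines with 3 switches each
    LineGroup("M5", 9, 4),  # 9 M5 lines with 4 switches to M4 each
]


def lines_on(layer, groups):
    """Total number of horizontal lines on the given layer."""
    return sum(g.lines for g in groups if g.layer == layer)


def switches_on(layer, groups):
    """Total number of switches between the given layer and M4."""
    return sum(g.lines * g.switches_each for g in groups if g.layer == layer)


def total_switches(groups):
    """Total number of switches to M4 across all groups."""
    return sum(g.lines * g.switches_each for g in groups)


if __name__ == "__main__":
    print("M3 lines:", lines_on("M3", SWITCHBOX))
    print("M3-M4 switches:", switches_on("M3", SWITCHBOX))
    print("M5-M4 switches:", switches_on("M5", SWITCHBOX))
    print("All switches to M4:", total_switches(SWITCHBOX))
```

Under these assumptions the two M3 groups add up to the 17 M3 lines mentioned in the text, which confirms that the "9 lines with 4 switches" and "8 lines with 3 switches" statements are mutually consistent.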
