Design and Implementation of Hybrid LUT/Multiplexer FPGA Logic Architectures

INTERNATIONAL JOURNAL OF PROFESSIONAL ENGINEERING STUDIES, Volume VIII / Issue 1 / DEC 2016

Vutukuri Syam Kumar 1, Rambabu Kusuma 2, MJRN Prasad 3

1 PG Scholar, Dept of ECE, Srinivasa Institute of Engineering and Technology College, Cheyyeru, East Godavari, Andhra Pradesh
2 Assistant Professor, Dept of ECE, Srinivasa Institute of Engineering and Technology College, Cheyyeru, East Godavari, Andhra Pradesh
3 Associate Professor, Dept of ECE, Srinivasa Institute of Engineering and Technology College, Cheyyeru, East Godavari, Andhra Pradesh

Abstract: Hybrid configurable logic block architectures for field-programmable gate arrays that contain a mixture of lookup tables and hardened multiplexers are evaluated toward the goal of higher logic density and area reduction. Multiple hybrid configurable logic block architectures, both nonfracturable and fracturable, with varying MUX:LUT logic element ratios are evaluated across two benchmark suites (VTR and CHStone) using a custom tool flow consisting of LegUp-HLS, Odin-II front-end synthesis, ABC logic synthesis and technology mapping, and VPR for packing, placement, routing, and architecture exploration. VPR is used to model the new hybrid configurable logic block and verify post-place-and-route implementation. Experimentally, we show that for nonfracturable architectures, without any mapper optimizations, we naturally save up to ~8% area post place-and-route. For fracturable architectures, experiments show that only marginal gains, up to ~2%, are seen after place-and-route. For both nonfracturable and fracturable architectures, we see minimal impact on timing performance for the architectures with the best area efficiency.

Keywords: FPGA, Multiplexer logic element, Complex logic block, mapping technologies

I. Introduction

A field-programmable gate array (FPGA) is a block of programmable logic that can implement multi-level logic functions. FPGAs are most commonly used as separate commodity chips that can be programmed to implement large functions. However, small blocks of FPGA logic can be useful components on-chip to allow the user of the chip to customize part of the chip's logical function. An FPGA block must implement both combinational logic functions and interconnect to be able to construct multi-level logic functions. There are several different technologies for programming FPGAs, but most logic processes are unlikely to implement anti-fuses or similar hard programming technologies.

Throughout the history of field-programmable gate arrays (FPGAs), lookup tables (LUTs) have been the primary logic element (LE) used to realize combinational logic. A K-input LUT is generic and very flexible, able to implement any K-input Boolean function. The use of LUTs simplifies technology mapping, as the problem is reduced to a graph covering problem. However, an exponential area price is paid as larger LUTs are considered. Values of K between 4 and 6 are typically seen in industry and academia, and this range has been demonstrated to offer a good area/performance compromise. Recently, a number of other works have explored alternative FPGA LE architectures for performance improvement to close the large gap between FPGAs and application-specific integrated circuits (ASICs).

LOOKUP TABLES

The basic method used to build a combinational logic block (CLB), also called a logic element, in an SRAM-based FPGA is the lookup table (LUT). As shown in Fig. 1, the lookup table is an SRAM that is used to implement a truth table. Each address in the SRAM represents a combination of inputs to the logic element. The value stored at that address represents the value of the function for that input combination. An n-input function requires an SRAM with $2^n$ locations.

Fig. 1. Lookup tables.

Because a basic SRAM is not clocked, the lookup table logic element operates much as any other logic gate: as its inputs change, its output changes after some delay.

PROGRAMMING A LOOKUP TABLE

Unlike a typical logic gate, the function represented by the logic element can be changed by changing the values of the bits stored in the SRAM. As a result, the n-input logic element can represent $2^{2^n}$ functions (though some of these functions are permutations of each other).

Fig. 2. Programming a lookup table.

A typical logic element has four inputs. The delay through the lookup table is independent of the bits stored in the SRAM, so the delay through the logic element is the same for all functions. This means that, for example, a lookup table-based logic element will exhibit the same delay for a 4-input XOR and a 4-input NAND. In contrast, a 4-input XOR built with static CMOS logic is considerably slower than a 4-input NAND. Of course, the static logic gate is generally faster than the logic element.

Logic elements generally contain registers (flip-flops and latches) as well as combinational logic. A flip-flop or latch is small compared to the combinational logic element (in sharp contrast to the situation in custom VLSI), so it makes sense to add it to the combinational logic element. Using a separate cell for the memory element would simply take up routing resources. The memory element is connected to the output; whether it stores a given value is controlled by its clock and enable inputs.

In this paper, we propose incorporating (some) hardened multiplexers (MUXs) in the FPGA logic blocks as a means of increasing silicon area efficiency and logic density. MUX-based logic blocks for FPGAs saw success in early commercial architectures, such as the Actel ACT-1/2/3 architectures, and efficient mapping to these structures was studied in the early 1990s. However, their use in commercial chips has waned, perhaps partly due to the ease with which logic functions can be mapped into LUTs, simplifying the entire computer-aided design (CAD) flow. Nevertheless, it is widely understood that LUTs are inefficient at implementing MUXs, and that MUXs are frequently used in logic circuits. To underscore the inefficiency of LUTs implementing MUXs, consider that a six-input LUT (6-LUT) is essentially a 64-to-1 MUX (to select 1 of 64 truth-table rows) plus 64 SRAM configuration cells, yet it can only realize a 4-to-1 MUX (4 data + 2 select = 6 inputs).

In this paper, we present a six-input LE based on a 4-to-1 MUX, MUX4, that can realize a subset of six-input Boolean logic functions, and a new hybrid complex logic block (CLB) that contains a mixture of MUX4s and 6-LUTs. The proposed MUX4s are small compared with a 6-LUT (15% of 6-LUT area), and can efficiently map all {2,3}-input functions and some {4,5,6}-input functions. In addition, we explore fracturability of LEs, the ability to split the LEs into multiple smaller elements, in both LUTs and MUX4s to increase logic density. The ratio of LEs that should be LUTs versus MUX4s is also explored toward optimizing logic density for both nonfracturable and fracturable FPGA architectures. To facilitate the architecture exploration, we developed a CAD flow for mapping into the proposed hybrid CLBs, created using ABC and VPR, and describe technology mapping techniques that encourage the selection of logic functions that can be embedded into the MUX4 elements. The main contributions in this paper are as follows.

1) Two hybrid CLB architectures (nonfracturable and fracturable) that contain a mixture of MUX4 LEs and the traditional LUTs, yielding up to 8% area savings.

2) Mapping techniques called Natural Mux and MuxMap, targeted toward the hybrid CLB architecture, that optimize for area while preserving the original mapping depth.

3) A full post-place-and-route architecture evaluation with the VTR7 and CHStone benchmarks, facilitated by LegUp-HLS and the Verilog-to-Routing project, showing impact on both area and delay.

Compared with the preliminary publication, we have performed transistor-level modeling of the MUX4 LE, further studied the fracturable architectures, and unified the open-source tool flow from C through LegUp-HLS to the VTR flow. Sparse crossbars (versus full crossbars in the previous work) have also been included in our CLBs, increasing modeling accuracy. The new transistor-level modeling of the MUX4 also provides more accurate results as compared with the previous work. Results have also been expanded with the inclusion of timing results as well as larger architectural ratio sweeps.

II. Literature Review

Recent works have shown that heterogeneous architectures and synthesis methods can have a significant impact on improving logic density and delay, narrowing the ASIC–FPGA gap. Works by Anderson and Wang with "gated" LUTs, then with asymmetric LUT LEs, show that the LUT elements present in commercial FPGAs provide unnecessary flexibility. Toward improved delay and area, macrocell-based FPGA architectures have been proposed. These studies describe significant changes to the traditional FPGA architectures, whereas the changes proposed here build on architectures used in industry and academia. Similarly, and-inverter cones have been proposed as replacements for the LUTs, inspired by and-inverter graphs (AIGs).

Purnaprajna and Ienne explored the possibility of repurposing the existing MUXs contained within the Xilinx Logic Slices. Similar to this work, they use the ABC priority cut mapper as well as VPR for packing, place, and route. However, their work is primarily delay-based, showing an average speedup of 16% using only ten of 19 VTR7 benchmarks.

... insight to FPGA makers on the deficiencies to attack and, thereby, improve FPGAs. We describe the methodology by which the measurements were obtained and show that, for ...
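The lookup-table behaviour described in the Introduction (an n-input LUT as an SRAM holding one truth-table bit per input combination, and a 6-LUT spending 64 configuration cells to realize a single 4-to-1 MUX) can be illustrated with a small behavioural model. This is only a sketch under stated assumptions: the names LUT, mux4, and count_functions, and the choice of input 0 as the least significant address bit, are hypothetical and are not taken from the paper or from any of the tools it uses.

```python
class LUT:
    """Behavioural model of a K-input lookup table (hypothetical helper)."""

    def __init__(self, k):
        self.k = k
        # One SRAM configuration bit per input combination: 2**k locations.
        self.sram = [0] * (2 ** k)

    def _address_to_bits(self, addr):
        # Assumption: input i drives address bit i (input 0 is the LSB).
        return tuple((addr >> i) & 1 for i in range(self.k))

    def program(self, func):
        """Load the SRAM with the truth table of func(*input_bits)."""
        for addr in range(2 ** self.k):
            self.sram[addr] = int(bool(func(*self._address_to_bits(addr))))

    def evaluate(self, *inputs):
        """Read the stored bit addressed by the current input values."""
        addr = sum(bit << i for i, bit in enumerate(inputs))
        return self.sram[addr]


def mux4(d0, d1, d2, d3, s0, s1):
    """4-to-1 multiplexer: 4 data inputs plus 2 select inputs."""
    return (d0, d1, d2, d3)[s0 + 2 * s1]


def count_functions(n):
    """Number of distinct truth tables an n-input LUT can store."""
    return 2 ** (2 ** n)


# The same 4-LUT is reprogrammed from XOR to NAND by rewriting its SRAM bits.
lut4 = LUT(4)
lut4.program(lambda a, b, c, d: a ^ b ^ c ^ d)
lut4.program(lambda a, b, c, d: not (a and b and c and d))

# A 6-LUT uses all six inputs and all 64 configuration bits for one MUX4.
lut6 = LUT(6)
lut6.program(mux4)
print(len(lut6.sram), "configuration bits used to realize one 4-to-1 MUX")
```

In this model the read path in evaluate is identical for every stored function, which mirrors the observation that LUT delay does not depend on the programmed bits. It also makes the cost argument concrete: programming mux4 into the 6-LUT consumes the whole 64-entry table, which is the inefficiency the hardened MUX4 element is meant to avoid.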
