
EDIC RESEARCH PROPOSAL 1

Logic Synthesis for Ambipolar FETs

Luca Gaetano Amarù
LSI & TCL, I&C, EPFL

Abstract: Double-Independent-Gate (DIG) Field Effect Transistors (FETs) are expected to extend Moore's law in the coming years. Many emerging technologies make it possible to build DIG FETs in which one gate controls the device polarity online. Such devices are called ambipolar transistors and efficiently embed the XOR function. Logic gates based on ambipolar transistors can implement more complex logic functions with fewer physical resources than conventional Complementary Metal Oxide Semiconductor (CMOS) gates. However, most state-of-the-art logic design and synthesis methods have been developed expressly for CMOS technology and may miss optimization opportunities if directly employed for ambipolar technology. This motivates us to propose new methods, or adapt existing ones, for automated logic synthesis targeting ambipolar transistors. In this report, we first present design techniques for logic gates based on ambipolar devices. Then, we introduce logic optimization and technology mapping methods, originally proposed for CMOS, that are also of interest for ambipolar technology. Finally, we present our proposal for an efficient logic synthesis methodology targeting ambipolar transistors.

Index Terms: Logic synthesis, Design automation, DG transistor, Tunable polarity

Proposal submitted to committee: September 4th, 2012; Candidacy exam date: September 11th, 2012; Candidacy exam committee: Yusuf Leblebici, Giovanni De Micheli, Andreas Burg, David Atienza Alonso.

This research plan has been approved:
Date: ____________
Doctoral candidate (name and signature): ____________
Thesis director (name and signature): ____________
Thesis co-director (if applicable) (name and signature): ____________
Doct. prog. director (R. Urbanke) (signature): ____________
EDIC-ru/05.05.2009

I. INTRODUCTION

Moving toward Multiple-Independent-Gate (MIG) Field Effect Transistors (FETs) is a promising way to rejuvenate Moore's law. Compared to a single-gate FET, a MIG FET offers improved design flexibility: each gate separately influences the electrical characteristics of the device. Among the controllable device characteristics, device polarity has recently drawn researchers' interest because it makes denser logic circuits possible. To this end, Double-Independent-Gate (DIG) FETs with tunable polarity have been proposed in many emerging nano-scale technologies, such as carbon nanotubes [1], graphene [2] and silicon nanowires [3]. Tunable polarity DIG FETs, commonly referred to as ambipolar transistors, can be configured in the field as p-type or n-type by applying a specific voltage on the additional gate, usually called the Polarity Gate (PG). The Conventional Gate (CG) instead controls the ambipolar transistor's on-state, as in usual unipolar FETs. Fig. 1(a) summarizes the in-field configuration of the ambipolar transistor's polarity.

Fig. 1: (a) Ambipolar FETs polarity control (b) XOR-2 gate in [4].

The on-state of an ambipolar transistor is the logical biconditional (XNOR) of its two gate values. For this reason, ambipolar transistors enable a compact realization of the XOR function, such as the XOR-2 gate depicted in Fig. 1(b). Moreover, negative unate functions (e.g. NAND/NOR) have the same efficient implementation with ambipolar transistors as in the well established Complementary Metal Oxide Semiconductor (CMOS) technology. Indeed, ambipolar transistors behave as unipolar FETs when the voltage on the PG is fixed.

In summary, ambipolar devices efficiently implement both negative unate (NAND/NOR) and binate (XOR/XNOR) logic functions thanks to the in-field configuration of the device polarity. However, taking full advantage of controllable ambipolarity in automated logic synthesis involves several challenges. First, logic optimization is usually carried out with heuristics that, for pragmatic reasons, target only one type of function. Second, traditional library-based technology mapping methods restrict the design flexibility (due to the limited size of the standard-cell library) and so miss the efficient complex gate implementations offered by ambipolar transistors. A logic synthesis methodology capable of exploiting the potential of ambipolar transistors must overcome these limitations while maintaining a tractable computational complexity.

In this report, we first discuss three papers of interest for logic design and synthesis of circuits based on ambipolar transistors. The first paper, [4], describes circuit-level methods to design ambipolar logic gates. The second paper, [5], highlights the limitations of state-of-the-art synthesis tools in dealing with XORs and proposes selective XOR rewriting heuristics to reduce the circuit delay. The last paper, [6], presents two efficient library-free technology mapping methods targeting static CMOS and mixed static CMOS/Pass Transistor Logic (PTL) circuits. After surveying these three papers, we present our proposal for an efficient logic synthesis methodology targeting ambipolar transistors.

The remainder of this report is organized as follows. Section II summarizes the work in [4] on ambipolar design. In Section III, we review the work in [5] on selective XOR expansion to minimize circuit delay. Section IV describes the library-free technology mapping approach in [6]. We present our research proposal in Section V. We conclude the paper in Section VI.
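As a small illustration of the device behaviour described in the introduction, the sketch below models an ambipolar transistor as an ideal switch whose on-state is the XNOR of its two gate values. It is only a logic-level abstraction under that stated assumption; the function names `ambipolar_on` and `polarity` are hypothetical and do not come from [4]. The mapping of PG = 1 to n-type behaviour follows from the XNOR model (with PG = 1 the device conducts when CG = 1), not from any electrical data.

```python
from itertools import product


def ambipolar_on(cg: int, pg: int) -> bool:
    """Idealized switch model (assumption): the device conducts when the
    conventional gate (CG) and the polarity gate (PG) carry the same value,
    i.e. on-state = XNOR(CG, PG)."""
    return cg == pg


def polarity(pg: int) -> str:
    """With PG fixed, the XNOR model reduces to a unipolar switch:
    PG = 1 conducts for CG = 1 (n-type-like),
    PG = 0 conducts for CG = 0 (p-type-like)."""
    return "n" if pg == 1 else "p"


# Enumerate the full on-state table for both gates.
on_table = {(cg, pg): ambipolar_on(cg, pg) for cg, pg in product((0, 1), repeat=2)}

for (cg, pg), on in sorted(on_table.items()):
    print(f"CG={cg} PG={pg} polarity={polarity(pg)} on={on}")
```

Fixing `pg` to a constant recovers the usual unipolar behaviour, which is why negative unate gates keep their CMOS-like structure in this model.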

II.AMBIPOLAR LOGIC GATES DESIGN Double-gate controllable ambipolar Carbon NanoTube (CNT) FETs are capable to implement complex functions, embedding the XOR operation, with low physical resources. The work in [4] exploits this opportunity to achieve denser and faster logic circuits. For this purpose, the authors provide guidelines to design logic gates based on controllable ambipolar CNTFETs. In order to evaluate the advantage of combinational circuits realized with such gates, the authors defined an efficient library of ambipolar logic gates (ambipolar library) employable in a standard synthesis flow. Circuits synthesized with the ambipolar library are then compared to their standard unipolar implementation in terms of area, speed and power consumption. In this section, we briefly describe the design methods introduced in [4] and we summarize the results achieved. Fig. 3: Static and pseudo logic style implementations of (A ⊕ B)· C A. Logic Design with Controllable Ambipolar Devices with ambipolar transistors [4]. Static and pseudo logic styles are considered in [4] to design am- bipolar logic gates. In static style, complementary Pull-Up Network (PUN) and Pull-Down Network (PDN) enable full-voltage swing 3) Transistor Sizing: In CNTFETs, holes and electrons mobility logic provided that p-type devices are present only in the PUN while is equal. Therefore, devices in the PUN and PDN are equally sized. n-type devices are used only in the PDN. In pseudo style, the PUN Considered this, successive CNTFETs area sizing to equalize gate is replaced by a single p-type pull-up device weakly biased in order rise and fall times as an unit follows standard rules as in [7]. to allow the output signal to fall within the tolerated margin. Pseudo 4) TG vs. PT Ambipolar Logic: Transmission gate logic has style is preferred to static style when gate area is more critical than several advantages compared to the pass-transistor approach. First, power consumption. 
TG logic does not require an output buffer, or inverter, saving one The XNOR function is implemented with a single ambipolar device delay stage. Then, the transistor sizing operation is more critical with as shown by Fig. 1(a), but, depending on the input values, the pass-transistor style than with transmission-gates, due to the possible polarity can be n or p. This causes a signal degradation if such p or n type configuration of ambipolar devices in PDN or PUN, device is used in the PUN or PDN. In order to achieve full-voltage respectively. Indeed, CNTFETs acting as controllable ambipolar PTs swing logic, authors in [4] propose to replace each ambipolar device must be oversized (about double the standard area) to equalize gate implementing the XNOR, or XOR, function with a Transmission rise and fall times as an unit inverter. On the other hand, with TGs the Gate (TG) composed by two parallel ambipolar devices fed with sizing operation is far less dramatical thanks to the parallel of p and complementary signals. In this way, any voltage level is passed n type configured ambipolar transistors. For this reason, ambipolar without signal degradation, as depicted by Fig. 2. TG logic has been proved in [4] to be superior to ambipolar PT logic, despite the additional circuitry needed to generate inverted signals.

B. Ambipolar Library In [4], a library of 46 logic gates with no more than 3 series stacked ambipolar or unipolar CNTFETs is employed to synthesize multi- level logic benchmarks. Table I summarizes the proposed library (ambipolar library). Note that if the ambipolar library is implemented Fig. 2: Transmission Gate based on ambipolar transistors [4]. using transmission-gate logic, it is possible to swap signals with different polarities applied to the TGs and achieve more functions 1) Transmission-Gate Logic: The use of ambipolar TGs permits to utilizing the same resources. For example, considering the TG-based design compact full-voltage swing logic gates. However, complemen- realization of (A ⊕ B)· C in Fig. 3, it is possible to implement tary input signals are needed and the circuitry employed to generate (A ⊕ B)· C by swapping A )* A or B )* B. Exploiting this such inverted signals increase the overall gate area. In Fig. 3, the opportunity, 158 different logic functions are achievable with the TG- static and pseudo style TG based implementations of (A ⊕ B)· C based implementation of the ambipolar library in Table I. with ambipolar devices are depicted. The ambipolar library is characterized with delay, area and power 2) Pass-Transistor Logic: An alternative approach to reduce the models for CNTFET technology. The tunable polarity feature is transistor count of ambipolar gates is to use Pass Transistors (PTs) emulated by two parallel n and p type CNTFETs, manually turned in place of TGs. In this case, the drawback is that ambipolar devices on and off in the simulation environment. in-field configured as n or p type can be located in the PUN 1) Delay Model: The Stanford MOSFET-like CNTFET model is or PDN. Consequently, the output signal level is degraded and a used to validate the correctness of the designed gates. Then, the successive voltage restoration stage is needed. In Fig. 
3, the static normalized1 F04 delay for each cell is estimated using the switch- and pseudo style PT based implementations of (A ⊕ B)· C with ambipolar devices are depicted. 1The FO4 delay is normalized to the delay of a unit inverter. EDIC RESEARCH PROPOSAL 3

TABLE I: Ambipolar Library 2 α· C· f· Vdd where α is the activity factor, C the load capacitance, Gate Basic Function Derived Functions f the operating frequency and Vdd the power supply. The operating F00 A 1 frequency and the power supply are typically given by the process F01 A ⊕ B 2 and design. The load capacitance is also given by the process. The F02 A + B 1 activity factor is statistically estimated for the considered application. F03 A· B 1 Short-circuit Power: For CNTFET, it is assumed that PSC ≈ F04 (A ⊕ B) + C 2 0.15· P as in CMOS technology. F05 (A ⊕ B)· C 2 D F06 (A ⊕ B) + (A ⊕ C) 3 Static and Gate Leakage Power: Static leakage power F07 (A ⊕ B)· (A ⊕ C) 3 heavily depends on the pattern of on-transistors in the considered F08 (A ⊕ B) + (C ⊕ D) 3 logic gate. Indeed, leaking parallel transistors strongly increase the F09 (A ⊕ B)· (C ⊕ D) 3 static power consumption. In order to have a fast and accurate F10 A + B + C 1 estimate of the static power consumption, in [4] a pattern base power F11 (A + B)· C 1 model is proposed. F12 A + (B· C) 1 Pattern-Based Power Model: The number of input vectors to a F13 A· B· C 1 logic gate grow exponentially with the number of inputs. The method F14 (A ⊕ D) + B + C 2 proposed in [4] permits to avoid exhaustive leakage power simula- F15 (A ⊕ D) + (B ⊕ D) + C 3 F16 (A ⊕ D) + (B ⊕ D) + (C ⊕ D) 4 tions for all input combinations by means of a pattern classification F17 ((A ⊕ D) + B)· C 2 method. It consists in identifying on/off transistors patterns leading F18 ((A ⊕ D) + (B ⊕ D))· C 3 to equivalent leakage currents. For this purpose, on transistors are F19 ((A ⊕ D) + B)· (C ⊕ D) 4 considered as short circuits and anything in parallel to them is F20 ((A ⊕ D) + (B ⊕ D))· (C ⊕ D) 6 removed. Indeed, input vectors [110] and [101] of the NOR gate F21 (A + B)· (C ⊕ D) 2 in Fig. 4, generates the same leakage current pattern. 
Once all the F22 (A ⊕ D) + (B· C) 2 F23 A + (B ⊕ D)· C 2 F24 (A ⊕ D) + (B ⊕ D)· C 4 F25 A + (B ⊕ D)· (C ⊕ D) 3 F26 (A ⊕ D) + ((B ⊕ D)· (C ⊕ D)) 6 F27 (A ⊕ D)· B· C 2 F28 (A ⊕ D)· (B ⊕ D)· C 3 F29 (A ⊕ D)· (B ⊕ D)· (C ⊕ D) 4 F30 (A ⊕ D) + (B ⊕ E) + C 3 F31 (A ⊕ D) + (B ⊕ D) + (C ⊕ E) 8 F32 ((A ⊕ D) + (B ⊕ E))· C 3 F33 ((A ⊕ D) + B)· (C ⊕ E) 4 F34 ((A ⊕ D) + (B ⊕ D))· (C ⊕ E) 6 F35 ((A ⊕ D) + (B ⊕ E))· (C ⊕ D) 6 Fig. 4: Identical Ioff patterns for [110] and [101] input vectors in a F36 (A ⊕ D) + ((B ⊕ E)· C) 4 3-input NOR gate [4]. F37 A + ((B ⊕ D)· (C ⊕ E)) 3 F38 (A ⊕ D) + ((B ⊕ E)· (C ⊕ E)) 6 different leakage current patterns for a certain logic gate have been F39 (A ⊕ D) + ((B ⊕ E)· (C ⊕ D)) 8 determined, circuit-level power simulations can be performed. Since F40 (A ⊕ D)· (B ⊕ E)· C 3 the logic gates are designed to be symmetrical, i.e. having equal drive F41 (A ⊕ D)· (B ⊕ D)· (C ⊕ E) 6 current for PUN and PDN, similar patterns in the PUN and PDN F42 (A ⊕ D) + (B ⊕ E) + (C ⊕ F ) 4 imply equal leakage currents and are therefore considered equivalent. F43 ((A ⊕ D) + (B ⊕ E))· (C ⊕ F ) 6 F44 (A ⊕ D) + ((B ⊕ E)· (C ⊕ F )) 6 Note that also the gate leakage current can be assessed by the same F45 (A ⊕ D)· (B ⊕ E)· (C ⊕ F ) 4 topology analyzer. Finally, the overall static power consumption of Total 158 a logic gate can be estimated averaging results over all the different leakage patterns. level RC delay model. In this context, the ambipolar CNTFET on- C. Synthesis of Large Logic Circuits resistance is assumed to be the same as for an unipolar CNTFET. ABC software [8] is used in [4] to synthesize MCNC logic bench- Moreover, the polarity and conventional gates capacitances of an marks in ambipolar CNTFET technology and in unipolar CMOS and ambipolar CNTFET are also assumed to be equal. Similarly as for CNTFET technology. 
For ambipolar CNTFETs, the library in Table I traditional MOSFETs, the instrinsic capacitance is considered roughly is employed while for unipolar CNFETs and standard CMOS, only equal to the gate capacitance. F00, F02, F03, F10, F11, F12, and F13 functions of Table I are 2) Area Model: The weighted transistor count is used as area considered as they can be fabricated in any unipolar technology. For metric for each cell in the ambipolar library. This metric is defined as power simulations, 640K random input patterns are used for each the sum of the devices weighted with their corresponding area ratio benchmark in order to estimate the activity factor. In order to allow a with a transistor in the PDN of an unit inverter. technology-independent comparison of ambipolar and unipolar design 3) Power Model: The power consumption model for a logic gate approaches, the weighted device count, logic depth, and normalized is P = PD +PSC +PS +PG, where PD denotes the dynamic power, delay metrics are considered. PSC the short-circuit power, PS the static power, and PG the power Experimental results highlight that implementations with ambipo- dissipation due to gate leakage. In [4], authors studied each power lar CNTFETs have about 42% fewer levels of logic compared component for ambipolar CNTFETs based logic gates. to unipolar realizations. Moreover, ambipolar implementations have Dynamic Power: The dynamic power can be expressed as PD = 38% less area, expressed as weighted transistor count, than the EDIC RESEARCH PROPOSAL 4 unipolar counterpart. Considering the normalized delay, ambipolar AB or A + B can be computed faster than A and B, then rewriting design saves about 26% of the overall delay with respect to standard an XOR in terms of AND and OR operations reduces the circuit unipolar design. On top of that, logic circuits realized with ambipolar delay. However, also the area overhead associated with the XOR CNTFETs have a reduced power consumption. 
Indeed, ambipolar expansion must be considered. A complete procedure to evaluate if an circuits save about 32% of the total power and 59% in EDP, compared XOR expansion is beneficial is as follows. First, the common factor to unipolar implementations. between the XOR operands A and B is extracted and the quotients QA and QB are considered. Then, delay (D) and area (AR) of QA, D. Discussion QB , QAQB and QA + QB are evaluated. Based on these values, the delay reduction  and the area overhead δ corresponding to the The paper in [4] describes guidelines to design ambipolar logic min(D ,D ) QA QB gates and provides an insight into the circuit-level benefits deriving XOR expansion are computed as  = (1 − max(D ,D ) ) and QA QB (AR +AR ) from the controllable ambipolarity. We discuss here what are the (QAQB ) (QA+QB ) δ = ( (AR +AR ) − 1). Finally, if the delay reduction limitation of this work and what can be done to improve it. (QA) (QB ) and area overhead fit with the prefixed thresholds,  ≥  1) Logic Synthesis: The logic synthesis tool employed in [4] threshold and δ ≤ δ , the XOR expansion is considered useful if δ/ (ABC [8]) is based on And-Inverter Graphs (AIGs) manipulation. threshold is less than some constant κ, i.e. the area penalty per unit gain in Unfortunately, AIGs are not suited to highlight binate functions, such critical path delay is smaller than κ. as the XOR, that instead have a compact realization with ambipolar transistors. This means that ambipolar circuits synthesized through AIG structures may miss some optimization opportunity. In order B. Global Correlation to fully exploit the tunable polarity feature of ambipolar devices, Having a high local correlation is not the only condition to gain by an ad hoc synthesis methodology must be developed. To this end, expanding XORs. 
Indeed, the XOR operands may have also a strong proposals for efficient logic synthesis targeting ambipolar transistors correlation with the rest of the expression where the XOR itself is are introduced in Section V. embedded. This property is referred in [5] as global correlation. In 2) Alternate Ambipolar Logic Family: In [4], static and pseudo order to measure the global correlation of an XOR, the following logic design styles are considered to implement gates based on expansions are considered: ambipolar devices. Ambipolar gates are particularly efficient to realize binate functions containing the XOR operation. For such (A ⊕ B) + C = (AB → C)(A + B + C), (G1) logic gates, input signals in both polarities are needed and the (A ⊕ B) + C = (A· B → C)(AB + C). (G2) extra delay in generating the inputs complement could lead to a (X → Y ) (X + Y ) temporary sneak path inside the logic gate, which could result in where the expression is the same as . a large power dissipation. Alternative logic design styles can be For the sake of clarity, we consider as an example the global corre- considered to avoid this issue, such as Differential Cascade Voltage lation in the carry expression of a full adder cout = ab + (a ⊕ b)cin. Swing Logic (DCVSL) style where signals in both polarities are In this case, the XOR expansion G1 is useful: generated practically at the same time. cout = ab + (a ⊕ b)cin, cout = (abcin → ab)(ab + (a + b)cin), III.OPTIMIZATION OF XOR-DOMINATED CIRCUITS cout = ab + (a + b)cin. Boolean function optimization aims to reduce the size of the logic circuit, minimizing some general metric such as gate, literal or net To determine if G1, or G2, XOR expansions are useful, a procedure count. For pragmatic reasons, standard optimization methods are similar to the local correlation one is employed. In practice, an based on algebraic factorization and target negative unate functions, expansion is considered useful only if it reduces the delay at least by e.g. 
NAND/NOR, that have a particular efficient implementation in a threshold threshold, increase the area by no more than δthreshold CMOS technology. Instead, binate logic functions, e.g. XOR/XNOR, and the ratio of area overhead and delay gain is less than κ. are usually left untouched by logic synthesizers. However, there exist many logic circuits that make extensive use of the XOR func- C. Complete Algorithm and Experiments tion, e.g. multipliers, adders, error correcting and telecommunication The XOR-optimization method proposed in [5] uses the local and circuits. For these circuits, traditional logic synthesizers may lose global correlation algorithms. Note that such optimization method is some optimization opportunity. To overcome this limitation, in [5] a delay oriented. general technique to optimize XOR-dominated circuits is presented. 1) XOR-optimization Algorithm: It consists in rewriting XOR expressions in terms of AND and OR The input of the optimization operations to reduce the critical path delay. However, expanding algorithm is a Boolean tree. First, the input tree is collapsed by XORs sometimes results in worse delay and severe hardware area merging subtrees having the same type of nodes into a single multi- e.g. penalties. For this reason, authors in [5] have introduced algorithms input node, a 4-input tree consisting only of 2-input AND to determine if an XOR expansion is beneficial based on the local nodes is collapsed into a single 4-input AND node. Then, the and global correlation between XOR operands. In this section, we collapsed tree is optimized by two steps, iterated till the delay of present such algorithms and we report results achieved in [5]. the circuit is improved. The fist step consists of traversing the tree in topological order and apply the local correlation algorithm to subtree rooted at XOR nodes. The local correlation algorithm is A. 
Local Correlation applied in conjunction to a greedy heuristic that takes in account In order to measure the local correlation of an XOR operation, the the input arrival times of XORs operands. In the second step, the following expansion is considered: tree is traversed in reverse topological order (from leaves to root) A ⊕ B = (AB)(A + B). (L1) and the global correlation method is applied to XOR nodes. In order to evaluate the computational complexity of the optimization If A and B are such that AB = 0, then A ⊕ B reduces to A + B, algorithm, denote by n the number of nodes in the collapsed tree. while if A+B = 1 then A⊕B is equivalent to AB. It follows that if Note that the most expensive function called in the algorithm is the EDIC RESEARCH PROPOSAL 5 one that computes  and δ for the correlation algorithms. Denote by order to generate complex logic gates, the work in [6] presents a Testimator the execution time for this estimator function. The global method, referred to as the Odd-level Transistor Replacement (OTR) correlation algorithm makes o(n) calls to the estimator function. On method, to collapse an odd number of gates levels into a single gate. the other hand, the local correlation algorithm makes o(n2) calls to 1) Odd-Level Transistor Replacement Method: The core idea of the estimator function due to the greedy heuristic used to consider OTR is to use the pull-down (pull-up) transistors from gates at input arrival times. Therefore, the overall time complexity of the the same level to replace pull-up (pull-down) transistors of gates 2 XOR-optimization algorithm is o(Testimator· n ). at the next level. An example of OTR is provided by Fig. 5. The 2) Experiments: In [5], the XOR-optimization algorithm is im- starting structure is formed by only NAND and INV gates and has plemented in Maple 10 which is used as front-end to Synopsys 3 gate levels and 20 transistors in all, implementing the function Design Compiler. 
The input circuits are synthesized twice: first with (a + b)· (c + d). In the first step, pull down (pull up) transistors Design Compiler and then with the XOR-expansion method. For of gates (G1, G2) and (G3, G4) are employed to replace pull up synthesis purposes, a common standard cell library in UMC 0.13 µm (pull down) transistors of gates G5 and G6, respectively. During this is used. The XOR expansion parameters are set to threshold = 0.1, operation, the temporary gates G5’ and G6’ in Fig. 5(c) are created. δthreshold = 0.2 and κ = 2. Averaged over different arithmetic In the second, and last, step, pull down (pull up) transistors of gates circuits, the synthesis flow based on the XOR-expansions reduces (G5’, G6’) substitute pull up (pull down) transistors of the final gate the delay by 13.7%, at cost of an area overhead of 15.1%, compared G7. The resulting gate is depicted by Fig. 5(d). In order to achieve to the same circuits directly synthesized by Design Compiler. full-swing complementary gates, the number of levels in the OTR must be odd to have a final gate with p-type pull-up transistors and n- D. Discussion type pull down transistors. Otherwise, temporary gates, like G5’ and G6’, with p-type pull down and n-type pull up cannot be eliminated. The work in [5] presents a delay oriented XOR-optimization The name OTR derives from this property. Note that is also possible algorithm. Despite such method is efficient and well supported by to apply the OTR procedure to circuits with an even number of levels experimental results, it does not address the generic need for an by inserting couples of inverters3. optimization algorithm capable to manipulate XOR-intensive circuits. Indeed, XOR-expansions L1, G1 and G2 permits only to rewrite The CMOS library-free mapping algorithm in [6] is based on existing XORs but do not allow to evidence new XOR relations if Dynamic Programming (DP) and on the OTR method. The input of convenient. 
XOR factorization heuristics must be integrated with the such algorithm is a subject-graph containing only NAND and INV XOR expansion methods in [5] to address this lack. functions. This subject-graph is first decomposed into a forest of trees to reduce the computational complexity of the mapping task. Then, IV. LIBRARY-FREE TECHNOLOGY MAPPING each tree root is processed with a DP method. In such condition, the DP procedure associate a set of states to each node, where a Technology mapping is the logic synthesis task that transforms an node represent a complex gate output. Indeed, a state correspond to optimized Boolean network into an interconnected netlist of logic a partial solution relative to the OTR procedure applied to that node. gates taken from a given library. The quality of the technology map- The state information is a pair [Area,Delay] for the gate generated by ping operation mainly depends on the richness of the standard cell the OTR method and its fan-in nodes. Note that the OTR procedure is library [10] and on the ability of the mapping tool to recognize where bounded to use less than k series stacked transistors for each complex such cells can be used (matching). Boolean matching can address the gate. The DP algorithm consists of two phases. The first phase is recognition issue with little computational burden since most cells a post-order traversal of the tree (from the leaves to the root) that have a small number of inputs (i.e. at most 5 or 6 inputs) [11]. On computes all the possible states of a node and eliminates the ones that the other hand, the number of cells in the library, i.e. the library size, are provably sub-optimal. The second phase is a preorder traversal of directly affects the complexity of technology mapping imposing a the tree, where the best solution is chosen among the stored states. tradeoff between the computational cost and the design flexibility. 
To 2) Experimental Results: The CMOS library-free mapping method overcome this limitation, library-free technology mapping has been is first compared with SIS [9] library-based technology mapper. The proposed in [6], [12]–[14] where each cell/gate is built on the fly at benchmarks are taken from the ISCAS’85 suite. These circuits are the transistor level without requiring any pre-characterized standard previously decomposed into NAND-INV networks using SIS. For cell-library or matching operation. Given a maximum gate fan-in m, m SIS technology mapper, the library employed are nand-nor.genlib, library free technology mapping can implement all the 22 possible mcnc.genlib, lib2.genlib and 44-6.genlib. For the OTR method, the functions2 of m variables in a single gate. Library-free technology maximum number of series stacked transistors k is set to 4. Ex- mapping is a promising approach to take full advantage of complex perimental results show that the CMOS library-free mapping method logic gates with a bounded computational cost. In this section, we provides better results than SIS mapper, with average delay reductions review the work in [6] about library-free technology mapping of high of about 40% with about 40% smaller areas. Considering instead the performance CMOS and Pass Transistor Logic (PTL) designs. For latest library-free mapping method (TABA [15]) available at the time the sake of clarity, we present algorithms in [6] targeting CMOS and of [6], the CMOS library-free mapping method reduces the delay mixed CMOS/PTL designs separately. by an average of 26%, while simultaneously providing average area reductions of 0.5%. A. CMOS Library-free Technology Mapping In library-free technology mapping each logic gate is built on the B. Mixed CMOS/PTL Library-free Technology Mapping fly at the transistor level starting from an initial decomposed circuit Pass Transistor Logic (PTL) is a promising design style that (subject-graph). 
The initial circuit in [6] is decomposed into INVs permits to implement Boolean functions with fewer transistors com- and 2 input NANDs logic functions to form the subject-graph. In pared to static CMOS style. Unfortunately, PTL also presents some

m 2Note that some of the 22 functions have the same gate implementation 3Inserting two consecutive inverters does not change the functionality of due to input permutation. However, the number of different gates implemen- the mapped function but allows to split an even-leveled circuit into two odd- tation with m inputs is still large. leveled sub-circuits. EDIC RESEARCH PROPOSAL 6

(a) (b) (c) (d)

Fig. 5: OTR method applied to (a + b) · (c + d). (a) NAND-INV representation, (b) Transistor-level implementation of (a), (c) Intermediate gates, (d) Final gate. [6]

drawbacks. First, n-type (p-type) devices can be used to pull up (pull down) nodes, causing an imperfect voltage transition. Second, sneak paths between Vcc and Vss may exist if the PTL circuit is not carefully designed. At the logic level, PTL circuits can be efficiently designed using Binary Decision Diagrams (BDDs) [16], [17]. Indeed, BDDs can be directly transposed into their PTL implementations. Fig. 6 illustrates the one-to-one correspondence between a BDD node and its PTL implementations. In [6], the BDD node implementation depicted by Fig. 6(b) is employed as it is sneak-path free and does not require complementary signals.

Fig. 6: PTL implementation of a BDD node. (a) Implementation with n-type devices only, (b) Mixed p/n type devices implementation. [6]

The library-free static CMOS technology mapping algorithm is adapted to support mixed static CMOS/PTL circuits. In this circumstance, each node can be part of a CMOS or a PTL gate. As for CMOS, the PTL implementation constraint is the maximum number of stacked pass transistors (p). Given this limitation, BDDs with a maximum depth p are used to design PTL circuits. After p stacked pass transistors the static CMOS style is imposed. These conditions are embedded in the DP procedure. First, in the postorder traversal phase, all the candidate BDDs (PTL implementations) respecting the maximum depth p for a given node are computed. The possible implementation in static CMOS style for the same node is also considered (OTR with no more than k stacked transistors). Once all the PTL and CMOS solutions have been generated for that node, provably suboptimal solutions are eliminated. Then, in the preorder traversal phase, the best implementation for a node (PTL by a BDD or CMOS by the OTR) is chosen.

1) Experimental Results: The same settings of the CMOS library-free mapping method are used. In addition, the maximum number of stacked pass transistors in PTL circuits is set to 4 (p = 4). The mixed CMOS/PTL library-free mapping method achieves an average delay reduction of about 50%, and at the same time area reductions above 70%, over the results of SIS.

C. Discussion

In [6], library-free technology mapping methods targeting static CMOS and mixed CMOS/PTL circuits are presented. The library-free approach has been shown in [6] to be superior, in terms of performance and area, to traditional library-based methods. However, layout issues, including cell generation and cell characterization on the fly, represent major limitations of library-free technology mapping. Indeed, the quality of the layouts generated on the fly, and the corresponding timing characterization accuracy, are inferior to those obtained with a pre-characterized standard cell library. Efficient automated layout synthesis tools are needed to overcome this barrier and to exploit the library-free technology mapping opportunity.

V. RESEARCH PROPOSAL

This research proposal addresses the need for an efficient logic synthesis flow targeting ambipolar transistors. While ambipolar logic gate design has already been explored in [4], no prior work is known on a logic synthesis methodology for controllable ambipolar devices. Indeed, the results in [4] are obtained using the ABC [8] software, which is very efficient for negative unate functions but may miss optimization opportunities for binate functions (e.g. XOR) that instead have a convenient implementation with ambipolar transistors. This means that the improvements of controllable ambipolar over unipolar technology presented in [4] are conservative and can be further boosted using a specific synthesis method. To achieve this goal, a novel integrated synthesis tool must be developed. We present hereafter the challenges involved in this task and propose solutions to them. Fig. 7 summarizes the research proposal, highlighting the preliminary studies and the open problems for future research.
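The distinction between negative unate functions (e.g. NAND/NOR) and binate functions (e.g. XOR) used in this discussion can be checked mechanically on small truth tables. The following is a minimal sketch using the standard definition of unateness; the helper name `unateness` and the example gates are hypothetical and are not part of ABC or of the flow in [4].

```python
from itertools import product

def unateness(f, n):
    """Classify each input of f: {0,1}^n -> {0,1}.

    Returns one label per input: 'positive', 'negative', 'binate',
    or 'independent' (the output never depends on that input).
    """
    labels = []
    for i in range(n):
        rises = falls = False
        for x in product((0, 1), repeat=n):
            if x[i] == 1:
                continue  # visit each pair (x_i = 0, x_i = 1) only once
            low = f(*x)
            high = f(*(x[:i] + (1,) + x[i + 1:]))
            rises = rises or high > low   # output goes 0 -> 1 with the input
            falls = falls or high < low   # output goes 1 -> 0 with the input
        if rises and falls:
            labels.append("binate")
        elif rises:
            labels.append("positive")
        elif falls:
            labels.append("negative")
        else:
            labels.append("independent")
    return labels

def nand2(a, b):
    return 1 - (a & b)

def xor2(a, b):
    return a ^ b

print(unateness(nand2, 2))  # ['negative', 'negative']
print(unateness(xor2, 2))   # ['binate', 'binate']
```

NAND is negative unate in both inputs, whereas XOR is binate in both, which is the class of functions that a synthesis flow tuned for unate logic may handle poorly.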

Fig. 7: Research Proposal: Logic Synthesis for ambipolar transistors. Diagram labels: Optimization (Hybrid Optim.: AND/OR-XOR Logic, Majority Logic; Biconditional BDD: Reduction & Order, Decomposition), Mapping (Lib.-free Mapping: Area Oriented, Delay Oriented; Direct Mapping: BBDD FET Assign., Physical Synthesis), Benchmarking (MCNC Benchmarks, Error Correcting Circuits). Legend: Preliminary Studies, Future Research.

A. Logic Optimization

In order to achieve efficient ambipolar logic circuits, an integrated XOR-AND/OR optimization method is needed. However, the XOR and AND/OR optimization heuristics proposed in the literature are typically not compatible or, if they are, the optimization quality is noticeably biased toward one function type. Instead, a desirable XOR-AND/OR optimization method must give near-optimal results for both types of functions. We propose a hybrid optimization method to accomplish this task.

1) Hybrid Optimization: In contrast to standard single-step optimization procedures, multi-step optimization is a promising candidate to provide near-optimal results for logic circuits containing different types of functions intertwined together. In [18], we have presented a 2-step optimization method that enables selective and distinct manipulation of AND/OR and XOR-intensive portions of the logic circuit. This method consists of a first XOR-optimization of the initial circuit to highlight the XORs, followed by an AND/OR optimization of the remaining part of the circuit (non-XOR nodes). Intermediate EXternal Don't Care (EXDC) conditions can be computed to improve the AND/OR optimization quality. Experimental results in [18] show that the hybrid optimization procedure reduces the area of the synthesized logic circuit by up to 15.7% (about 5.2% on average over large MCNC benchmarks) with respect to a traditional single-step optimization method. This corresponds to a better exploitation of the functionality of ambipolar transistors. We plan to extend the hybrid optimization method to further support the expressive power of ambipolar transistors. Indeed, in [19], we have presented a novel logic gate based on ambipolar transistors implementing a 3-input majority function with only 8 devices, while 12 devices are used in full-swing static CMOS [7]. This opens up the opportunity to include a third step in the hybrid optimization procedure to highlight majority functions that can be advantageously realized with ambipolar transistors. We plan to develop a novel majority-logic decomposition technique to support this additional optimization phase.

Since most algorithms for logic optimization depend on the representation of input expressions, the choice of the logic representation form plays a vital role in multi-level optimization. Given that our aim is to fully exploit the logic expressive power of ambipolar transistors, a convenient logic representation form must have a strong link with the functionality of ambipolar devices. For this purpose, we propose a novel canonical Binary Decision Diagram (BDD) structure.

2) New Logic Representation Form: Ambipolar transistors embed the XNOR in their functionality. In [19], we have introduced a novel class of decision diagrams, called Biconditional Binary Decision Diagrams (BBDDs), that shares the same core logical connective with ambipolar devices. Thanks to this property, BBDDs have an efficient one-to-one correspondence with ambipolar transistors. In this context, logic optimization corresponds to minimizing the BBDD size, since it directly affects the complexity of the resulting ambipolar logic circuit. To this end, we have proposed in [19] ordering and reduction rules that make Reduced and Ordered BBDDs (ROBBDDs) very compact and canonical. Experimental results in [19] show that ROBBDDs have on average 37% fewer nodes than their standard ROBDD [21] counterparts. However, a complete decision diagram package for BBDDs has not been implemented in [19] and therefore experiments on large circuits were not conducted. We plan to build such a BBDD package and integrate it in a logic synthesis tool targeting ambipolar devices. Moreover, we plan to develop decomposition methods for BBDDs in order to further reduce the representation size and to enhance the compactness of ambipolar logic circuits directly mapped on such decomposed BBDDs.

B. Technology Mapping

Library-based technology mapping onto ambipolar logic gates presents limitations similar to those discussed in Section IV for standard CMOS. In addition, ambipolar logic gates offer an advantageous implementation for complex functions that are often unexploited in library-based mapping algorithms. This motivates us to move to a library-free technology mapping approach for ambipolar transistors. In this circumstance, complex functions are built on the fly at the transistor level, taking advantage of the controllable ambipolarity. We propose a new algorithm for library-free technology mapping targeting ambipolar transistors.

1) Library-free Technology Mapping: We have considered library-free technology mapping onto ambipolar transistors in [18]. We have modified the traditional approach in [6] to support the expressive power of ambipolar devices. First, in order to preserve XOR operations, the input circuit is decomposed into a subject-graph comprising only 2-input AND/OR/XOR/XNOR and INV functions. Then, the subject-graph is decomposed into a forest of trees to bound the computational complexity. Each tree is processed by a low-complexity greedy algorithm that is guaranteed to find the area-optimal transistor-level solution for the tree. Experimental results in [18] have shown that the hybrid optimization method followed by the proposed library-free technology mapping algorithm can reduce the number of ambipolar transistors by 20.9% and 15.3%, on average, with respect to state-of-the-art academic and commercial synthesis tools, respectively. We plan to extend the library-free technology mapping algorithm to delay-oriented ambipolar design.

The capability to synthesize circuits by passing through Decision Diagram (DD) data structures has been explored in several works in the literature [22], [23]. Biconditional BDDs (BBDDs) proposed in [19] enable an efficient direct mapping of ambipolar transistors onto Reduced and Ordered BBDDs (ROBBDDs). We propose technology mapping of ambipolar circuits by direct ROBBDD mapping.

2) Direct Mapping onto Logic Representation Structures: In [19], we have considered direct mapping of ambipolar transistors onto ROBBDD structures. Indeed, BBDD nodes and edges can be efficiently implemented with ambipolar devices. Experimental results in [19] show that ROBBDD direct mapping reduces the number of ambipolar devices by 49.7% on average with respect to a state-of-the-art logic synthesis tool. We plan to integrate physical synthesis in this technology mapping method. Indeed, the regular structures resulting from ROBBDDs are easy to lay out, as edges (signal wires) connect only adjacent nodes (cells) and branching variables (control wires) are local. This approach can alleviate the interconnection issue that is of paramount importance with double-gate ambipolar devices [24].
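The link between BBDD nodes and the XNOR behaviour of ambipolar devices can be illustrated with the biconditional expansion, which branches on whether two variables are equal rather than on the value of a single variable. The text above does not state the expansion explicitly, so the form used below, f(v, w, ...) = (v XOR w) · f(w', w, ...) + (v XNOR w) · f(w, w, ...), is an assumption of this sketch, and the function names are hypothetical. The code checks the identity exhaustively for small functions.

```python
from itertools import product

def biconditional_expand(f, v, w, *rest):
    """Evaluate f through its two biconditional cofactors.

    Assumed expansion (not spelled out in the text):
        f(v, w, ...) = (v XOR w) * f(w', w, ...) + (v XNOR w) * f(w, w, ...)
    """
    f_neq = f(1 - w, w, *rest)  # cofactor taken when v != w
    f_eq = f(w, w, *rest)       # cofactor taken when v == w
    return f_neq if v ^ w else f_eq

def holds_for_all_functions(n=3):
    """Check the expansion on every Boolean function of n >= 2 variables."""
    points = list(product((0, 1), repeat=n))
    for table in product((0, 1), repeat=len(points)):
        lookup = dict(zip(points, table))

        def f(*x, _lookup=lookup):
            return _lookup[x]

        if any(biconditional_expand(f, *x) != f(*x) for x in points):
            return False
    return True

def xor2(a, b):
    return a ^ b

print(holds_for_all_functions(3))
# For XOR, both cofactors are constants, so a single node suffices.
print([(xor2(1 - w, w), xor2(w, w)) for w in (0, 1)])
```

Under this expansion a 2-input XOR reduces to one node whose "not equal" branch is the constant 1 and whose "equal" branch is the constant 0, which suggests why XOR-rich functions tend to have compact BBDDs.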

C. Benchmarking

In [18] and [19], we have tested the proposed logic synthesis flows targeting ambipolar transistors with arithmetic benchmarks taken from the MCNC suite. We plan to evaluate the impact of controllable ambipolarity in real-life applications. In particular, we will consider error correcting circuits, as they are expected to benefit considerably from ambipolar technology. Low-Density Parity-Check (LDPC) circuits make extensive use of the XNOR function and will therefore be used as the first benchmark. Then, we plan to design ambipolar circuits for polar codes and compare their complexity to the CMOS realization that we have presented in [20]. Finally, we will consider the application of ambipolar devices to majority logic decoding circuits, exploiting the compact ambipolar realization of the majority function presented in [19].

VI. CONCLUSIONS

Controllable ambipolar transistors enable an unprecedented logic design flexibility. In this report, we have first presented previous work on ambipolar logic gate design and then we have reviewed logic optimization and technology mapping algorithms, originally proposed for CMOS, that are also of interest for ambipolar technology. We have finally proposed a research plan that aims to address the need for an efficient logic synthesis flow targeting ambipolar transistors.

REFERENCES

[1] Y. Lin et al., High-Performance Carbon Nanotube Field-Effect Transistor with Tunable Polarities, IEEE Trans. Nanotech., 4(5): 481-489, 2005.
[2] N. Harada et al., A polarity-controllable graphene inverter, Applied Physics Letters, 96(1): 012102 - 012102-3, 2010.
[3] M. De Marchi et al., Polarity control in Double-Gate, Gate-All-Around Vertically Stacked Silicon Nanowire FETs, Proc. IEDM, 2012.
[4] M.H. Ben-Jamaa, K. Mohanram and G. De Micheli, An Efficient Gate Library for Ambipolar CNTFET Logic, IEEE Trans. CAD, vol. 30, pp. 242-255, Feb. 2011.
[5] A.K. Verma, P. Ienne, Improving XOR-Dominated Circuits by Exploiting Dependencies between Operands, Proc. ASP-DAC, pp. 601-608, 2007.
[6] Y. Jiang, S.S. Sapatnekar and C. Bamji, Technology Mapping for High-Performance Static CMOS and Pass Transistor Logic Designs, IEEE Trans. VLSI, vol. 9, pp. 577-589, Oct. 2001.
[7] J.M. Rabaey et al., Digital Integrated Circuits, Prentice Hall, 2003.
[8] ABC Logic Synthesis Tool [Online]. Available: http://www.eecs.berkeley.edu/alanmi/abc/
[9] E. Sentovich et al., SIS: A System for Sequential Circuit Synthesis, ERL, Dept. EECS, Univ. California, Berkeley, UCB/ERL M92/41, 1992.
[10] K. Keutzer, K. Kolwicz, and M. Lega, Impact of library size on the quality of automated synthesis, Proc. ICCAD, pp. 120-123, 1987.
[11] L. Benini and G. De Micheli, A survey of Boolean matching techniques for library binding, ACM TODAES, vol. 2, no. 3, pp. 193-226, July 1997.
[12] M. Pullerits and A. Kabbani, Library-free synthesis for area-delay minimization, International Conference on Microelectronics, 2008.
[13] F.S. Marques et al., DAG Based Library-Free Technology Mapping, Proc. GLSVLSI, pp. 293-298, 2007.
[14] J. Xue, D. Al-Khalili and C.N. Rozon, Technology Mapping in Library-free Logic Synthesis, Proc. SPIE 5837, 919 (2005).
[15] A. Reis, R. Reis, D. Auvergne, and M. Robert, The library free technology mapping problem, in Proc. IWLS, June 1997.
[16] C.Y. Lee, Representation of Switching Circuits by Binary-Decision Programs, Bell Systems Technical Journal, 1959.
[17] S.B. Akers, Binary Decision Diagrams, IEEE Trans. Comp., C-27(6): 509-516, June 1978.
[18] L. Amarú, P.E. Gaillardon and G. De Micheli, MIXSyn: An Efficient Logic Synthesis Methodology for Mixed XOR-AND/OR Dominated Circuits, Submitted to ASPDAC 2013.
[19] L. Amarú, P.E. Gaillardon and G. De Micheli, Biconditional BDD: A New Canonical BDD for Logic Synthesis targeting Ambipolar Transistors, Submitted to DATE 2013.
[20] A. Mishra, A. Raymond, L. Amarú, G. Sarkis, C. Leroux, P. Meinerzhagen, W. Gross, A. Burg, A Successive Cancellation Decoder ASIC for a 1024-Bit Polar Code in 180nm CMOS, To appear in A-SSCC 2012.
[21] R.E. Bryant, Graph-based algorithms for Boolean function manipulation, IEEE Trans. Comput., C-35: 677-691, 1986.
[22] R. Drechsler, W. Gunther, Toward One-Pass Synthesis, Springer, 2002.
[23] V. Bertacco et al., Decision Diagrams and Pass Transistor Logic Synthesis, Proc. IWLS, 1997.
[24] S. Bobba et al., Physical synthesis onto a Sea-of-Tiles with double-gate silicon nanowire transistors, Proc. DAC, pp. 42-47, 2012.