Pbmap: a Path Balancing Technology Mapping Algorithm for Single Flux Quantum Logic Circuits Ghasem Pasandi, Student Member, IEEE, and Massoud Pedram, Fellow, IEEE

IEEE TRANSACTIONS ON APPLIED SUPERCONDUCTIVITY (DOI: 10.1109/TASC.2018.2880343) 1 PBMap: A Path Balancing Technology Mapping Algorithm for Single Flux Quantum Logic Circuits Ghasem Pasandi, Student Member, IEEE, and Massoud Pedram, Fellow, IEEE Abstract—This paper presents a path balancing technology architectures are needed [5]. An example difference between mapping algorithm, which is a new algorithm for generating SFQ and CMOS gates is that most of SFQ gates (except for a mapping solution for a given Boolean network such that the confluence buffers, splitters, TFFs and I/O cells) receive a average logic level difference among fanin gates of each gate in the network is minimized. Path balancing technology mapping clock signal. This makes the clock distribution network in SFQ is required in dc-biased Single Flux Quantum (SFQ) circuits more complex than CMOS; the clock network is much bigger for guaranteeing the correct operation, and it is beneficial in in SFQ compared to CMOS and it should be designed more CMOS circuits to reduce the hazard issues. We present a dynamic carefully to guarantee delivery of clock signals to all gates programming based algorithm for path balancing technology with acceptable amounts of jitter and skew [8], [9]. mapping which generates optimal solutions for dc-biased SFQ (e.g. Rapid SFQ or RSFQ) circuits with tree structure and acts Another difference between SFQ and CMOS circuits is the as an effective heuristic for circuits with general Directed Acyclic requirement of path balancing in SFQ circuits; if there is a Graph (DAG) structure. Experimental results show that our path difference among logic levels of fanin gates of a gate in an balancing technology mapper reduces the balancing overhead by SFQ circuit, path balancing D-Flip-Flops (DFFs) should be up to 2:7× and with an average of 21% compared to the state- inserted into outputs of the fanin gates with smaller logic of-the-art academic technology mappers. levels. This is done to guarantee arrival of all input signals Index Terms—Energy Efficient, eSFQ, ERSFQ, Logic Syn- of a gate at the same clock period. Otherwise, input pulses thesis, Low Power, Rapid Single Flux Quantum, RSFQ, SFQ, which have arrived at earlier clock periods will be consumed, Superconducting Electronics, Technology Mapping. generating wrong output values. For some small circuits, one could add a few asynchronous delay elements (e.g. chain I. INTRODUCTION of Josephson Transmission Lines (JTLs)) to make sure that ATH balancing technology mapping is a new method of all gates receive their inputs in right clock periods, hence, P mapping an RTL description such as a Boolean network guaranteeing correct circuit operation. However, this solution into a gate-level netlist. For a network generated by the path cannot be scaled and it is hard to be automated, because it 1 balancing mapper, the average logic level difference among requires information of routed wires (after place and route) fanin gates of each gate in the mapped netlist is reduced during logic synthesis. Therefore, path balancing should be (ideally zero). The path balancing technology mapping is considered in logic synthesis (e.g. during the technology required in Single Flux Quantum (SFQ) logic families includ- mapping phase) of SFQ circuits not only to meet the balancing ing Rapid Single Flux Quantum (RSFQ) [1], energy-efficient requirement, but to minimize the path balancing overhead by SFQ (eSFQ) [2], and Energy-efficient RSFQ (ERSFQ) [3] for reducing the number of required path balancing DFFs. correct circuit operation. In this paper, we present PBMap: a path balancing tech- SFQ gates with switching delay of 1ps and switching energy nology mapping algorithm which provides optimal solutions −19 of 10 J are potential candidates for replacing CMOS gates for mapping tree-like dc-biased SFQ logic (including RSFQ, to achieve high performance and energy efficient systems [2]. eSFQ, and ERSFQ) circuits. Note that ERSFQ logic was As an example, a T-Flip-Flop (TFF) with speed of 770GHz is arXiv:1812.10006v2 [cs.ET] 3 Jan 2019 developed to eliminate static power losses of RSFQ by reported in [4]. SFQ circuits are made of Josephson Junctions replacing bias resistors with inductors and current-limiting (JJs), which are superconducting devices working based on Josephson junctions. Similarly, eSFQ logic, which was also the Josephson effect [5]. One of the most popular families of powered by direct current, differed from ERSFQ in the size SFQ circuits is RSFQ, which is developed in 1980s [1]. of the bias current limiting inductor and how the limiting JJs Due to some key differences between SFQ and CMOS were regulated. So, although there are key differences among circuits such as gate-level pipelining and fanout limitation in RSFQ, ERSFQ, and eSFQ in terms of their biasing network SFQ circuits, existing Computer-Aided Design (CAD) tools designs, these differences do not affect the proposed mapping for CMOS technology cannot be directly used for SFQ cir- algorithm. cuits [6], [7]. Therefore, to make use of benefits that SFQ In our algorithm, DFF insertion to achieve path balancing is circuits provide in generating high performance and low- done to enable gate-level wave-pipelining. In other words, in power solutions, new design concepts, automation tools, and a circuit generated by our algorithm, length of all paths from Authors are with the Department of Electrical Engineering, Univer- any Primary Input (PI) to any Primary Output (PO) will be sity of Southern California, Los Angeles, California, USA. e-mails: the same. However, the benefit of using our path balancing [email protected], [email protected] algorithm is that it reduces total number of required path Manuscript submitted June 26, 2018. 1 balancing DFFs and as a result it reduces total JJ count and Logic level of a gate gi in a network N is the length of the longest path (in terms of the gate count) from any primary input of N to gi. total area (Table II). IEEE TRANSACTIONS ON APPLIED SUPERCONDUCTIVITY (DOI: 10.1109/TASC.2018.2880343) 2 A B Ib2 A splitter B C L2 Ic2 J2 Ib1 L1 5ps A B Iin I c1 J1 Ib3 C L3 I c3 J3 C (a) (b) out1 splitter in out4 splitter splitter splitter out2 in splitter out3 splitter out1 out out4 2 out3 (c) (d) Fig. 1: (a) Splitter gate in SFQ, (b) waveforms corresponding to the operation of this gate, and a splitter tree providing four fanouts with depth (c) 2, and (d) 3. Delay for generating out1 in (d) is lower than (c). Thus, using the structure of (d) is better in networks where the critical path goes through out1. The rest of this paper is organized as follows: Section II is called Resistive Single Flux Quantum logic [13]. Using provides some background knowledge on SFQ circuits and this logic, the operation speed of up to 30GHz was reported, logic synthesis, and it summarizes the related work. It also which was quite higher than any other digital device with the gives a quick overview on the state-of-the-art technology map- same complexity at that time [14]. Later on, another version ping flow. Section III provides a motivation example for path was proposed by using JJs instead of ohmic resistors. This balancing technology mapping, presents our path balancing version is called Rapid SFQ (RSFQ) [1]. It improved the technology mapping algorithm, gives its proof of optimality parameter margins of the first version and also increased its for trees, considers retiming, generalizes the technology map- operation speed to 300GHz [15]. In the following, we explain ping algorithm to Directed Acyclic Graphs (DAGs), and finally some properties and key circuit/gate level requirements of SFQ talks about clock jitter accumulation problem. Section IV gives circuits. the experimental results, and finally, Section V concludes the 1) Fanout in SFQ: In SFQ logic (RSFQ/ERSFQ/eSFQ), if manuscript. a gate needs to have more than one fanout, a special SFQ gate called splitter should be added to the output of this gate. II. BACKGROUND Splitter is an asynchronous gate that accepts an SFQ pulse and A. Background on SFQ Logic Circuits produces two output pulses after its intrinsic delay. One splitter In SFQ logic, a single quanta of magnetic flux (Φ0 = h=2e can produce only two fanouts. For additional fanouts, more = 2:07mV ×ps) is used for representation of logic bits. In this splitters should be added in a binary tree structure. To have n representation, presence of a pulse has the meaning of “logic- fanouts, n-1 splitters are needed. Fig. 1 shows the circuit-level 1”, while absence of a pulse is considered as a “logic-0”. schematic of a splitter gate, its operating waveforms, and two Operation of SFQ logic is based on overdamped Josephson examples of splitter binary trees for providing four fanouts junctions, and hence, it does not experience the problem of (FO4). Please note that for AQFP, splitters are clocked buffers hysteretic I-V, which degrades the operation speed of “1” to that can have 1-to-2, 1-to-3 and even 1-to-4 fanouts [16], [17]. “0” switching. 2) Gate-level pipeline: Unlike CMOS gates, in SFQ logic, SFQ logic families are divided into two groups: ac-biased most of the gates receive a clock signal. There are three main and dc-biased. Adiabatic Quantum Flux Parametron (AQFP) methods for clock distribution in SFQ circuits: (i) counter- [10], [11] and Reciprocal Quantum Logic (RQL) [12] are flow clocking where the clock flows in the opposite direction examples of ac-biased and RSFQ [1] is an example for dc- of the data, (ii) concurrent-flow clocking in which the clock biased logic family.

Pbmap: a Path Balancing Technology Mapping Algorithm for Single Flux Quantum Logic Circuits Ghasem Pasandi, Student Member, IEEE, and Massoud Pedram, Fellow, IEEE

Space Expansion of Feature Selection for Designing More Accurate Error Predictors

Design and Development MIPS Processor Based on a High Performance and Low Power Architecture on FPGA

Model-Order Reduction Using Variational Balanced Truncation with Spectral Shaping Payam Heydari, Member, IEEE, and Massoud Pedram, Fellow, IEEE

Design Technologies for Low Power VLSI

Block Level Voltage/Frequency Scheduling for Low Power Consumption Under Timing Constraints

SLA-Based, Energy-Efficient Resource Management In

Placement and Beyond in Honor of Ernest S. Kuh

Coupling of Synthesis and Layout: Challenges and Solutions

Payam Heydari Address : 830 Childs Way, SUITE #254 Los Angeles, CA 90089 Tel : (213) 740-4437 Home: (323) 730-0883 Fax : (213) 740-7290 E-Mail : [email protected]

Global Warming – Impacts and Future Perspective

Dr. Massoud Pedram

An Efficient Training and Inference Engine for Intelligent Mobile