Strategic Quantum Ancilla Reuse for Modular Quantum Programs Via Cost-Effective Uncomputation

Total Page:16

File Type:pdf, Size:1020Kb

Strategic Quantum Ancilla Reuse for Modular Quantum Programs Via Cost-Effective Uncomputation SQUARE: Strategic Quantum Ancilla Reuse for Modular Quantum Programs via Cost-Effective Uncomputation Yongshan Ding1, Xin-Chuan Wu1, Adam Holmes1,2, Ash Wiseth1, Diana Franklin1, Margaret Martonosi3, and Frederic T. Chong1 1Department of Computer Science, University of Chicago, Chicago, IL 60615, USA 2Intel Labs, Intel Corporation, Hillsboro, OR 97124, USA 3Department of Computer Science, Princeton University, Princeton, NJ 08544, USA Abstract—Compiling high-level quantum programs to ma- implemented to ensure operation fidelity is met for arbitrarily chines that are size constrained (i.e. limited number of quantum large computations. bits) and time constrained (i.e. limited number of quantum One major challenge, however, facing the QC community, is operations) is challenging. In this paper, we present SQUARE (Strategic QUantum Ancilla REuse), a compilation infrastructure the substantial resource gap between what quantum computer that tackles allocation and reclamation of scratch qubits (called hardware can offer and what quantum algorithms (for classi- ancilla) in modular quantum programs. At its core, SQUARE cally intractable problems) typically require. Space and time strategically performs uncomputation to create opportunities for resources in a quantum computer are extremely constrained. qubit reuse. Space is constrained in the sense that there will be a limited Current Noisy Intermediate-Scale Quantum (NISQ) computers and forward-looking Fault-Tolerant (FT) quantum computers number of qubits available, often further complicated by poor have fundamentally different constraints such as data local- connectivity between qubits. Time is also constrained because ity, instruction parallelism, and communication overhead. Our qubits suffer from decoherence noise and gate noise. Too heuristic-based ancilla-reuse algorithm balances these consider- many successive operations on qubits results in lower program ations and fits computations into resource-constrained NISQ or success rates. FT quantum machines, throttling parallelism when necessary. To precisely capture the workload of a program, we propose Due to space and time constraints, it is critical to find effi- an improved metric, the “active quantum volume,” and use cient ways to compile large programs into programs (circuits) this metric to evaluate the effectiveness of our algorithm. Our that minimize the number of qubits and sequential operations results show that SQUARE improves the average success rate of (circuit depth). Several options have been proposed [8]–[14]. NISQ applications by 1.47X. Surprisingly, the additional gates Among the options, one approach not yet well studied is to for uncomputation create ancilla with better locality, and result in substantially fewer swap gates and less gate noise overall. coordinate allocation and reclamation of qubits for optimal SQUARE also achieves an average reduction of 1.5X (and up to reuse and load balancing [15]. Reclaiming qubits, however, 9.6X) in active quantum volume for FT machines. comes with a substantial operational cost. In particular, to Index Terms—quantum computing, compiler optimization, re- obey the rules of quantum computation, before recycling a versible logic synthesis used qubit, additional gate operations need to be applied to “undo” part of its computation. I. INTRODUCTION In this paper, we propose the first automated compila- Thanks to recent rapid advances in physical implementation tion framework for such strategic quantum ancilla reuse arXiv:2004.08539v2 [quant-ph] 25 Jun 2020 technologies, quantum computing (QC) is seeing an exciting (SQUARE) in modular quantum programs that could be read- surge of hardware prototypes from both academia and in- ily applied to both NISQ and FT machines. SQUARE is a dustry [1]–[4]. This phase of QC development is commonly compiler that automatically determines places in a program to referred as the Noisy Intermediate-Scale Quantum (NISQ) perform such uncomputation in order to manage the trade-offs era [5]. Current quantum computers are able to perform on in qubit savings and gate costs and to optimize for the overall the order of hundreds of quantum operations (gates) using resource consumption. tens to hundreds of quantum bits (qubits). While modest in Optimally choosing reclamation points in a program is scale, these NISQ machines are large and reliable enough to crucial in minimizing resource consumption. This is because perform some computational tasks. Looking beyond the NISQ reclaiming too often can result in significant time cost (due to era, quantum computers will ultimately arrive at the Fault- more gates dedicated to uncomputation). Likewise, reclaiming Tolerant (FT) era [6], [7], where quantum error correction is too seldom may require too many qubits (e.g. fail to fit the program in the machine). For example, Figure 1 shows how Corresponding author: [email protected] qubit usage changes over time for the modular exponentia- tion step in Shor’s algorithm [16]. Unfortunately, finding the 300 optimal points in a program for reclaiming qubit could get Max qubits of machine 250 Coherence timeof qubits extremely complex [17], [18]. An efficient qubit reuse strategy Too many qubits Eager: Always reclaim will play a pivotal role in enabling the execution of programs 200 on resource-constrained machines. Lazy: Never reclaim SQUARE: Selectively reclaim 150 To precisely estimate the workload of a computational Qubits Too many gates task, we propose a resource metric called “active quantum 100 volume” (AQV) that evaluates the “liveness” of qubit during the lifetime of the program, which we will formally introduce 50 in Section III-B. This is inspired by the concept of “quantum 0 volume” introduced by IBM [19], a common measure for 0 200 000 400 000 600 000 800 000 1.0×106 1.2×106 1.4×106 the computational capability of a quantum hardware device, Time based on parameters such as number of qubits, number of Fig. 1: Qubit usage over time for Modular Exponentiation. The gates, and error probability. AQV is a metric that measures the shaded area under the curve corresponds to the active quantum volume of resource required by a program when executing on a volume of this application. The blue curve, representing a target hardware, which can therefore serve as an minimization balance between qubit reclamation and uncomputation, has the objective for the allocation and reclamation strategies. lowest area and is the best option. The contributions of our work are: II. BACKGROUND AND RELATED WORK • We present a heuristic-based compilation framework, A. Basics of Quantum Computing called SQUARE, for optimizing qubit allocation and reclamation in modular reversible programs. It leverages Quantum computers are devices that harness quantum me- the knowledge of qubit locality as well as program chanics to store and process information. For this paper, we modularity and parallelism. highlight three of the basic rules derived from the principles • We introduce a resource metric, active quantum volume of quantum mechanics: (AQV), that calculates the “liveness” of qubits over the • Superposition rule: A quantum bit (qubit) can be in a lifetime of a program. This new metric allows us to quan- quantum state of a linear combination of 0 and 1: j i = tify the effectiveness of various optimization strategies, as α j0i + β j1i, where α and β are complex amplitudes well as to characterize the volume of resources consumed satisfying jαj2 + jβj2 = 1. by different computational tasks. • Transformation rule: Computation on qubits is accom- • Our approach fits computations into resource-constrained plished by applying a unitary quantum logic gate that NISQ machines by strategically reusing qubits. Surpris- maps from one quantum state to another. This process is ingly, adding gates for uncomputation can improve the reversible and deterministic. fidelity of a program rather than impair it, as it creates • Measurement rule: Measurement or readout of a qubit ancilla with better locality, leading to substantially fewer j i = α j0i + β j1i collapses the quantum state to swap gates and thus less gate noise. SQUARE improves classical outcomes: j 0i = j0i with probability jαj2 and the success rate of NISQ applications by 1.47X on j 0i = j1i with probability jβj2. This is irreversible and average. probabilistic. • Our approach has broad applicability from NISQ to FT 1) Reversibility constraints.: The above three rules give machines. SQUARE achieves an average reduction of rise to the potential computing power that quantum computers 1.5X (and up to 9.6X) in active quantum volume for FT possess, but at the same time, they impose strict constraints systems. on what we can do in quantum computation. For example, the transformation rule implies that any quantum logic gate The rest of the paper is organized as follows: Section II we apply to a qubit has to be reversible. The classical AND briefly discusses the basics of quantum computation and com- gate in Figure 2 is not reversible because we cannot recover pilation of reversible arithmetic to quantum circuits, as well the two input bits based solely on one output bit. To make it as related work in both classical and quantum compilation. reversible, we could introduce a scratch bit, called ancilla, to Section III illustrate the central problem of allocation and store the result out-of-place, as in controlled-controlled-NOT reclamation of ancilla tackled in this paper and the general idea gate (or Toffoli gate) in Figure 2. Note that we use the of our solution. Section
Recommended publications
  • Quantum Computing
    Quantum Computing Professor Lloyd Hollenberg IBM Q Hub @ The University of Melbourne, ARC Centre for Quantum Computation & Communication Technology CHEP2019, Nov 7 2019 Outline Introduction – quantum logic and information processing Quantum search 101 – the QUI system Quantum error correction and scale-up QUIspace.org Quantum factoring, HPC simulations Emerging quantum computers, “supremacy” IBM Q Hub @ UoM – research highlights Quantum computing and HEP IBM.com Quantum computing: drivers Hard problems: generally scale poorly with (classical) CPU resources, technology plateau (Moore’s Law final gasp) 10um Hard problems log(feature size) log(feature space of quantum Intel 80386 (1um) apps being explored Pentium 4 (100 nm) Quantum Intel CORE17 easy & easier e.g. optimisation, finance, AI, Single atom transistor atomic quantum materials… limit regime Easy problems 1970 1980 1990 2000 2010 2020 2030 Conventional transistor miniaturization Quantum computers based on the laws of …the end of Moore’s Law is nigh quantum mechanics circumvent limitations of classical information processing Bubble: commons.wikimedia.org Bursting bubble: williamhortonphotography.com Quantum information…the important bits |1 Quantum superposition – multiple possibilities existing at the same time |0 Quantum measurement – collapse to one possibility when ”observed” |1 Quantum entanglement – observation of one part affects another part |00ۧ + |11ۧ 2 Quantum logic Classical logic: bit by bit Quantum logic Classical NOT gate Quantum NOT gate both bits flipped at same
    [Show full text]
  • Empirical Evaluation of Circuit Approximations on NISQ Devices
    Empirical Evaluation of Circuit Approximations on Noisy Quantum Devices Ellis Wilson Frank Mueller North Carolina State University North Carolina State University [email protected] [email protected] Lindsay Bassman Costin Iancu Lawrence Berkeley National Lab Lawrence Berkeley National Lab [email protected] [email protected] ABSTRACT are fraught by a multitude of device, systemic, and environmental Noisy Intermediate-Scale Quantum (NISQ) devices fail to produce sources of noise that adversely affect results of computations [1]. A outputs with sufficient fidelity for deep circuits with many gates number of factors contribute to noise, or errors, experienced during today. Such devices suffer from read-out, multi-qubit gate and cross- the execution of a quantum program. These include talk noise combined with short decoherence times limiting circuit • noise related to limits on qubit excitation time and program depth. This work develops a methodology to generate shorter cir- runtime due to decoherence; cuits with fewer multi-qubit gates whose unitary transformations • noise related to operations, i.e., gates performing transfor- approximate the original reference one. It explores the benefit of mations on the states of one or more qubits; such generated approximations under NISQ devices. Experimen- • noise related to interference from (crosstalk with) other tal results with Grover’s algorithm, multiple-control Toffoli gates, qubits; and and the Transverse Field Ising Model show that such approximate • noise subject to the process of measuring the state of a qubit circuits produce higher fidelity results than longer, theoretically via a detector when producing a program’s output. precise circuits on NISQ devices, especially when the reference circuits have many CNOT gates to begin with.
    [Show full text]
  • Analysis on Quantum Computation with a Special Emphasis on Empirical Simulation of Quantum Algorithms
    CALIFORNIA STATE UNIVERSITY, NORTHRIDGE ANALYSIS ON QUANTUM COMPUTATION WITH A SPECIAL EMPHASIS ON EMPIRICAL SIMULATION OF QUANTUM ALGORITHMS A thesis submitted in partial fulfilment of the requirements for the degree of Master of Science in Computer Science By ARINDAM MISHRA MAY 2020 ©Copyright by ARINDAM MISHRA, 2020 All Rights Reserved ii The thesis of Arindam Mishra is approved: ___________________________________________ ________________ Katya Mkrtchyan, Ph.D. Date ___________________________________________ ________________ Robert D McIlhenny, Ph.D. Date ___________________________________________ ________________ John Noga, Ph.D., Chair Date California State University, Northridge iii DEDICATION I dedicate this paper to my grandmother Shivani Mishra, my parents Milan and Deepali Mishra, my uncle and aunt Kalyan and Chitra Mishra, and my two sweet sisters Ishani and Aishwarya Mishra, with my gratitude for their support and motivations throughout my life. iv Table of Contents COPYRIGHT ii DEDICATION ....................................................................................................................................... iv LIST OF TABLES ................................................................................................................................ vii LIST OF FIGURES ............................................................................................................................. viii ABSTRACT ..........................................................................................................................................
    [Show full text]
  • Arxiv:2003.01293V4 [Quant-Ph] 9 Apr 2021 Measurement [6] and a Quantum Volume [7] Measurement of 26 = 64
    Demonstration of the trapped-ion quantum CCD computer architecture J. M. Pino,∗ J. M. Dreiling, C. Figgatt, J. P. Gaebler, S. A. Moses, M. S. Allman, C. H. Baldwin, M. Foss-Feig, D. Hayes, K. Mayer, C. Ryan-Anderson, and B. Neyenhuis Honeywell Quantum Solutions, 303 S. Technology Ct., Broomfield, Colorado 80021, USA The trapped-ion QCCD (quantum charge-coupled de- plex quantum algorithms. While small trapped-ion quan- vice) proposal [1,2] lays out a blueprint for a universal tum computers only require a single ion crystal in a single quantum computer. The proposal calls for a trap de- trapping region, this approach is unlikely to scale beyond vice to confine multiple arrays of ions, referred to as ion one hundred qubits [11]. Efforts have therefore focused crystals. Quantum interactions between ions in a sin- on architectures that use either a network of ion crystals gle crystal occur via electric- or magnetic-field induced connected through photonic links [12, 13] or physically spin-motion coupling. Interactions between ions initially transporting ions between different crystals [14, 15] and in different crystals are achieved by using precisely con- eventually even across trap modules [16]. In this article trolled electric fields to split the crystals, transport the we demonstrate progress towards the latter approach. ions to new locations, and then combine them to form The QCCD architecture aims to create a high fidelity, new crystals where the desired interactions are applied. scalable quantum computer, at the cost of some chal- High-fidelity physical transport is the key to two advan- lenging requirements: 1 { a device that can trap multi- tages of the QCCD architecture: high-fidelity interac- ple small ion crystals capable of high fidelity operations, tions between distant qubits and low crosstalk between 2 { fast transport operations for moving ions between gates.
    [Show full text]
  • Towards a Quantum Computing Algorithm for Helicity Amplitudes and Parton Showers
    PHYSICAL REVIEW D 103, 076020 (2021) Towards a quantum computing algorithm for helicity amplitudes and parton showers † ‡ Khadeejah Bepari ,1,* Sarah Malik ,2, Michael Spannowsky,1, and Simon Williams 2,§ 1Institute for Particle Physics Phenomenology, Department of Physics, Durham University, Durham DH1 3LE, United Kingdom 2High Energy Physics Group, Blackett Laboratory, Imperial College, Prince Consort Road, London SW7 2AZ, United Kingdom (Received 21 December 2020; accepted 18 March 2021; published 26 April 2021) The interpretation of measurements of high-energy particle collisions relies heavily on the performance of full event generators, which include the calculation of the hard process and the subsequent parton shower step. With the continuous improvement of quantum devices, dedicated algorithms are needed to exploit the potential quantum that computers can provide. We propose general and extendable algorithms for quantum gate computers to facilitate calculations of helicity amplitudes and the parton shower process. The helicity amplitude calculation exploits the equivalence between spinors and qubits and the unique features of a quantum computer to compute the helicities of each particle involved simultaneously, thus fully utilizing the quantum nature of the computation. This advantage over classical computers is further exploited by the simultaneous computation of s- and t-channel amplitudes for a 2 → 2 process. The parton shower algorithm simulates collinear emission for a two-step, discrete parton shower. In contrast to classical implementa- tions, the quantum algorithm constructs a wave function with a superposition of all shower histories for the whole parton shower process, thus removing the need to explicitly keep track of individual shower histories. Both algorithms utilize the quantum computers ability to remain in a quantum state throughout the computation and represent a first step towards a quantum computing algorithm describing the full collision event at the LHC.
    [Show full text]
  • Density Matrix Quantum Circuit Simulation Via the BSP Machine on Modern GPU Clusters
    Industrial and Systems Engineering Density Matrix Quantum Circuit Simulation via the BSP Machine on Modern GPU Clusters ANG LI1,OMER SUBASI1,XIU YANG2, AND SRIRAM KRISHNAMOORTHY3 1Pacific Northwest National Laboratory (PNNL), Richland, WA, 99354, USA 2Department of Industrial and Systems Engineering, Lehigh University, Bethlehem, PA, USA 3Washington State University, Pullman, WA, 99164, USA ISE Technical Report 20T-029 Density Matrix Quantum Circuit Simulation via the BSP Machine on Modern GPU Clusters † ‡ Ang Li∗, Omer Subasi∗, Xiu Yang∗ and Sriram Krishnamoorthy∗ ∗Pacific Northwest National Laboratory (PNNL), Richland, WA, 99354, USA †Lehigh University, Bethlehem, PA, 18015, USA ‡Washington State University, Pullman, WA, 99164, USA Email: [email protected], [email protected], [email protected], [email protected] ________ L Abstract—As quantum computers evolve, simulations of quan- j0 = 0 H H tum programs on classical computers will be essential in vali- | i | i • • • _____ ___ ______L __ dating quantum algorithms, understanding the effect of system j1 = 0 H S H noise, and designing applications for future quantum computers. | i | i • • _ ____ ___ ______L __ In this paper, we first propose a new multi-GPU programming j2 = 0 H T S H methodology called MG-BSP which constructs a virtual BSP | i | i • _ ____ ___ machine on top of modern multi-GPU platforms, and apply s this methodology to build a multi-GPU density matrix quantum 0 | i U 4 U 2 U simulator called DM-Sim. We propose a new formulation that s1 can significantly reduce communication overhead, and show | i that this formula transformation can conserve the semantics Fig.
    [Show full text]