arXiv:2006.01273v1 [quant-ph] 1 Jun 2020

Application-Motivated, Holistic Benchmarking of a Full Quantum Computing Stack

Daniel Mills∗1,2, Seyon Sivarajah2, Travis L. Scholten3, and Ross Duncan2,4

1 University of Edinburgh, 10 Crichton Street, Edinburgh EH8 9AB, UK
2 Cambridge Quantum Computing Ltd, 9a Bridge Street, Cambridge, CB2 1UB, UK
3 IBM T.J. Watson Research Center, Yorktown Heights, NY 10598, USA
4 University of Strathclyde, 26 Richmond Street, Glasgow, G1 1XH, UK

∗Corresponding author: [email protected]

June 3, 2020

Abstract

Quantum computing systems need to be benchmarked in terms of practical tasks they would be expected to do. Here, we propose 3 “application-motivated” circuit classes for benchmarking: deep (relevant for state preparation in the variational quantum eigensolver algorithm), shallow (inspired by IQP-type circuits that might be useful for near-term quantum machine learning), and square (inspired by the quantum volume benchmark). We quantify the performance of a quantum computing system in running circuits from these classes using several figures of merit, all of which require exponential classical computing resources and a polynomial number of classical samples (bitstrings) from the system. We study how performance varies with the compilation strategy used and the device on which the circuit is run. Using systems made available by IBM Quantum, we examine their performance, showing that noise-aware compilation strategies may be beneficial, and that device connectivity and noise levels play a crucial role in the performance of the system according to our benchmarks.

Contents

1 Introduction
2 Circuit Classes
  2.1 Shallow Circuits: IQP
  2.2 Square Circuits: Random Circuit Sampling
  2.3 Deep Circuits: Pauli Gadgets
3 Figures of Merit
  3.1 Heavy Output Generation Benchmarking
  3.2 Cross-Entropy Difference
  3.3 ℓ1-Norm Distance
  3.4 Metric Comparison
4 Quantum Computing Stack
  4.1 Software Development Kits
  4.2 Compilers
  4.3 Devices
5 Experimental Results
  5.1 Full Stack Benchmarking
  5.2 Application Motivated Benchmarks
  5.3 Insights from Classical Simulation
6 Conclusion
A Exponential Distribution
  A.1 Square Circuits
  A.2 Deep Circuits
  A.3 Shallow Circuits
B Compilation Strategies
C Device Data
  C.1 Device Coupling Maps
  C.2 Device Calibration Information
D Empirical Relationship Between Heavy Output Generation Probability and ℓ1 Distance

1 Introduction

As quantum computers evolve from bespoke laboratory experiments comprising a handful of qubits, to more general-purpose, programmable, commercial-grade systems [1–5], new techniques for characterizing them are needed. Quantum characterization, validation, and verification (QCVV) protocols to detect, diagnose, and quantify errors in quantum computers originally focused on properties of one or several qubits (e.g., T1 and T2 times, gate error rates, state preparation fidelity, etc). As multi-qubit quantum computing systems develop, the scope of QCVV must expand. In particular, a need has arisen for “holistic” benchmarks – ones which stress test a quantum computing system in its entirety, not just individual components. Holistic benchmarks are desirable for two reasons: they enable comparison across different systems¹, and allow for tracking the performance of a fixed system over time.

“Holistic benchmarking” of a quantum computing system could refer to benchmarking the physical implementation of a collection of qubits, without referring to the computational task these qubits would perform. This idea is most useful when testing physical properties of a collection of qubits². The complementary view (taken in this work) is that holistic benchmarks test the quantum computational capabilities of the complete system. Under this view, the entire compute stack – qubits, compilation strategy, classical control hardware, etc. – should be benchmarked collectively. Such “full-stack” benchmarking provides information that benchmarking individual stack components cannot, as it captures the performance of the system as an integrated unit.

At the same time, running a full-stack benchmark on a fixed computational system, while useful for tracking the performance of that system over time, provides little information on how different combinations of the stack’s components could change system performance. For this reason, full-stack benchmarking should, as much as possible, make explicit the variable components of the stack, and systematically vary those components to see how the inclusion of a particular component affects system-level performance.

Here, we will focus on benchmarking systems made available by IBM Quantum, and investigate two components of the stack: the compilation strategy used to map an abstract circuit onto one that is executable on a quantum computer and the device used to run the compiled circuit and return the results. While the particular systems used here have other components (such as pulse synthesizers), we do not look at the impact of those pieces on full-stack performance.

The design of new compilers for quantum circuits is an active area of research, especially “noise-aware” compilation strategies which use knowledge of the physical properties of the system’s qubits to improve results [1,9–11]. The proliferation of compilers necessitates understanding how the inclusion of particular compilation strategies in the stack affects performance. Problem instances requiring compilation, which are often more representative of real world problems, typically show differing performance from those that do not [12]. In particular, noise-aware compilation strategies make assumptions about the influence of noise processes on overall system performance, so full-stack benchmarking is necessary to verify those assumptions.

The benchmarks defined here have two parts: a circuit class and a figure of merit. The circuit class describes the type of circuit to be run by the system, and the figure of merit quantifies how well the system did when running circuits from that class. This approach is inspired by volumetric benchmarking [13].

¹ This should be compared to the benchmarking of classical computers, with the LINPACK benchmarks [6,7] being used to build the TOP500 ranking of supercomputers [8].
² A simple example is crosstalk detection, where the output of the benchmarking could be a table of coupling values between all connected qubits.

Because quantum computing systems are used for particular applications, the circuit classes should, in some way, test the performance of a system in those arenas [14]. At least two notions have been put forth as to how to define such classes. One proposes benchmarks based on often-used quantum algorithmic primitives [13], the examples given being primitives of Grover iterations and Trotterized Hamiltonian simulation.

An alternative is to pick a particular instance of an application and check for the accuracy of the results returned by the system when running that instance. Naturally, to measure non-negligible accuracy on noisy near-term systems the applications and instances must also be near-term by design. Such benchmarks have been defined in the context of quantum simulation [15–19], quantum machine learning [14, 20–23], discrete optimisation [12, 24–26], and quantum computational supremacy [3, 27–29]. This approach has the advantage that the definition of success is fairly straightforward. The downside is that performance as measured by one instance of an application may not be predictive of performance for the application generically.

The “application-motivated” circuit classes defined here draw inspiration from [13] (looking at computational primitives) but also draw inspiration from the literature above, by focusing on computational primitives of near-term quantum computing applications (chemistry and machine learning, in particular). A system which does well on an application-motivated benchmark should do well in [...]

Section 2 provides details of these circuit classes, and presents algorithms for generating them.

How well a stack executes a circuit is assessed here via continuous figures of merit, rather than binary ones which may only verify correctness. This is because the outcomes from noisy devices will likely not be correct, while information about closeness to the correct answer is still highly valuable. Further, techniques for the verification of universal quantum computation require many qubits or qubit communication or both, none of which are accessible using present-day noisy devices [39, 40]. Indeed, to reflect the current state-of-the-art, where there exist few devices with limited networking between them [4, 41], we will focus on examples of how classical computers can be used to perform benchmarks, as opposed to using small quantum computers to benchmark each other [42, 43].

We use three figures of merit, calculated using classical computers. These are: heavy output generation probability [44], cross-entropy difference [29], and ℓ1-norm distance. Estimating each of these figures of merit requires knowledge about the ideal (noise-free) outcome probabilities of bitstrings the system could produce.

In practice, calculating the ideal outcome probabilities requires direct simulation of the circuit under consideration. Consequently, scaling to tens or hundreds of qubits will be challenging in general, particularly if
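The three figures of merit named in the introduction can be illustrated with a small sketch. It assumes the definitions commonly used in the literature, which this section does not spell out: heavy outputs are bitstrings whose ideal probability exceeds the median ideal probability, the cross-entropy difference is log(2^n) plus Euler's constant minus the sample-averaged cross entropy, and the ℓ1-norm distance is the sum of absolute differences between ideal and empirical probabilities. The function names and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter
from statistics import median

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def heavy_output_probability(ideal_probs, samples):
    """Fraction of samples that are 'heavy', i.e. whose ideal probability
    exceeds the median ideal probability (assumed definition).
    ideal_probs must list all 2^n bitstrings for the median to be meaningful."""
    med = median(ideal_probs.values())
    heavy = {b for b, p in ideal_probs.items() if p > med}
    return sum(1 for s in samples if s in heavy) / len(samples)


def cross_entropy_difference(ideal_probs, samples):
    """Assumed definition: H0 - H, with H0 = log(2^n) + gamma and H the
    sample average of -log(ideal probability). Raises ValueError if a
    sampled bitstring has ideal probability zero."""
    n_qubits = len(next(iter(ideal_probs)))
    h0 = n_qubits * math.log(2) + EULER_GAMMA
    cross_entropy = -sum(math.log(ideal_probs[s]) for s in samples) / len(samples)
    return h0 - cross_entropy


def l1_distance(ideal_probs, samples):
    """Sum over bitstrings of |ideal probability - empirical frequency|."""
    counts = Counter(samples)
    m = len(samples)
    support = set(ideal_probs) | set(counts)
    return sum(abs(ideal_probs.get(b, 0.0) - counts.get(b, 0) / m) for b in support)


# Toy 2-qubit example: ideal probabilities would come from classical simulation,
# samples from the device. Both are made up here for illustration.
toy_ideal = {"00": 0.5, "01": 0.25, "10": 0.15, "11": 0.10}
toy_samples = ["00", "00", "01", "10"]

print(heavy_output_probability(toy_ideal, toy_samples))
print(cross_entropy_difference(toy_ideal, toy_samples))
print(l1_distance(toy_ideal, toy_samples))
```

All three functions take the full table of ideal probabilities as input, which is why the cost of classical simulation limits how far such benchmarks scale.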
