
Application-Motivated, Holistic Benchmarking of a Full Stack

Daniel Mills∗1,2, Seyon Sivarajah2, Travis L. Scholten3, and Ross Duncan2,4

1 University of Edinburgh, 10 Crichton Street, Edinburgh EH8 9AB, UK
2 Cambridge Quantum Computing Ltd, 9a Bridge Street, Cambridge, CB2 1UB, UK
3 IBM T.J. Watson Research Center, Yorktown Heights, NY 10598, USA
4 University of Strathclyde, 26 Richmond Street, Glasgow, G1 1XH, UK

June 3, 2020

arXiv:2006.01273v1 [quant-ph] 1 Jun 2020

Abstract

Quantum computing systems need to be benchmarked in terms of practical tasks they would be expected to do. Here, we propose 3 “application-motivated” circuit classes for benchmarking: deep (relevant for state preparation in the variational quantum eigensolver algorithm), shallow (inspired by IQP-type circuits that might be useful for near-term quantum machine learning), and square (inspired by the quantum volume benchmark). We quantify the performance of a quantum computing system in running circuits from these classes using several figures of merit, all of which require exponential classical computing resources and a polynomial number of classical samples (bitstrings) from the system. We study how performance varies with the compilation strategy used and the device on which the circuit is run. Using systems made available by IBM Quantum, we examine their performance, showing that noise-aware compilation strategies may be beneficial, and that device connectivity and noise levels play a crucial role in the performance of the system according to our benchmarks.

∗Corresponding author: [email protected]

Contents

1 Introduction
2 Circuit Classes
    2.1 Shallow Circuits: IQP
    2.2 Square Circuits: Random Circuit Sampling
    2.3 Deep Circuits: Pauli Gadgets
3 Figures of Merit
    3.1 Heavy Output Generation Benchmarking
    3.2 Cross-Entropy Difference
    3.3 $\ell_1$-Norm Distance
    3.4 Metric Comparison
4 Quantum Computing Stack
    4.1 Software Development Kits
    4.2 Compilers
    4.3 Devices
5 Experimental Results
    5.1 Full Stack Benchmarking
    5.2 Application Motivated Benchmarks
    5.3 Insights from Classical Simulation
6 Conclusion
A Exponential Distribution
    A.1 Square Circuits
    A.2 Deep Circuits

    A.3 Shallow Circuits
B Compilation Strategies
C Device Data
    C.1 Device Coupling Maps
    C.2 Device Calibration Information
D Empirical Relationship Between Heavy Output Generation Probability and L1 Distance

1 Introduction

As quantum computers evolve from bespoke laboratory experiments comprising a handful of qubits, to more general-purpose, programmable, commercial-grade systems [1–5], new techniques for characterizing them are needed. Quantum characterization, validation, and verification (QCVV) protocols to detect, diagnose, and quantify errors in quantum computers originally focused on properties of one or several qubits (e.g., $T_1$ and $T_2$ times, gate error rates, state preparation fidelity, etc.). As multi-qubit quantum computing systems develop, the scope of QCVV must expand. In particular, a need has arisen for “holistic” benchmarks: ones which stress test a quantum computing system in its entirety, not just individual components. Holistic benchmarks are desirable for two reasons: they enable comparison across different systems¹, and allow for tracking the performance of a fixed system over time.

“Holistic benchmarking” of a quantum computing system could refer to benchmarking the physical implementation of a collection of qubits, without referring to the computational task these qubits would perform. This idea is most useful when testing physical properties of a collection of qubits². The complementary view (taken in this work) is that holistic benchmarks test the quantum computational capabilities of the complete system. Under this view, the entire compute stack (qubits, compilation strategy, classical control hardware, etc.) should be benchmarked collectively. Such “full-stack” benchmarking provides information that benchmarking individual stack components cannot, as it captures the performance of the system as an integrated unit.

At the same time, running a full-stack benchmark on a fixed computational system, while useful for tracking the performance of that system over time, provides little information on how different combinations of the stack’s components could change system performance. For this reason, full-stack benchmarking should, as much as possible, make explicit the variable components of the stack, and systematically vary those components to see how the inclusion of a particular component affects system-level performance.

Here, we will focus on benchmarking systems made available by IBM Quantum, and investigate two components of the stack: the compilation strategy used to map an abstract circuit onto one that is executable on a quantum computer, and the device used to run the compiled circuit and return the results. While the particular systems used here have other components (such as pulse synthesizers), we do not look at the impact of those pieces on full-stack performance.

The design of new compilers for quantum circuits is an active area of research, especially “noise-aware” compilation strategies which use knowledge of the physical properties of the system’s qubits to improve results [1, 9–11]. The proliferation of compilers necessitates understanding how the inclusion of particular compilation strategies in the stack affects performance. Problem instances requiring compilation, which are often more representative of real world problems, typically show differing performance from those that do not [12]. In particular, noise-aware compilation strategies make assumptions about the influence of noise processes on overall system performance, so full-stack benchmarking is necessary to verify those assumptions.

The benchmarks defined here have two parts: a circuit class and a figure of merit. The circuit class describes the type of circuit to be run by the system, and the figure of merit quantifies how well the system did when running circuits from that class. This approach is inspired by volumetric benchmarking [13].

¹ This should be compared to the benchmarking of classical computers, with the LINPACK benchmarks [6, 7] being used to build the TOP500 ranking of supercomputers [8].

² A simple example is crosstalk detection, where the output of the benchmarking could be a table of coupling values between all connected qubits.

Because quantum computing systems are used for particular applications, the circuit classes

should, in some way, test the performance of a system in those arenas [14]. At least two notions have been put forth as to how to define such classes. One proposes benchmarks based on often-used quantum algorithmic primitives [13], the examples given being primitives of Grover iterations and Trotterized Hamiltonian simulation.

An alternative is to pick a particular instance of an application and check for the accuracy of the results returned by the system when running that instance. Naturally, to measure non-negligible accuracy on noisy near-term systems the applications and instances must also be near-term by design. Such benchmarks have been defined in the context of quantum simulation [15–19], quantum machine learning [14, 20–23], discrete optimisation [12, 24–26], and quantum computational supremacy [3, 27–29]. This approach has the advantage that the definition of success is fairly straightforward. The downside is that performance as measured by one instance of an application may not be predictive of performance for the application generically.

The “application-motivated” circuit classes defined here draw inspiration from [13] (looking at computational primitives) but also draw inspiration from the literature above, by focusing on computational primitives of near-term quantum computing applications (chemistry and machine learning, in particular). A system which does well on an application-motivated benchmark should do well in running the application the benchmark was derived from. Three such “application-motivated” circuit classes are introduced here. Drawing inspiration from the volumetric benchmarking approach, the classes cover varying depth regimes and are (somewhat) controllable in depth. In brief, the classes, as labelled by their depth regimes, are:

Deep: Inspired by product formula circuits, including state preparation circuits used in the variational quantum eigensolver (VQE) algorithm for quantum chemistry [30–32].

Shallow: Inspired by hardware-efficient ansatze [33, 34] which may be useful for near-term quantum machine learning and chemistry applications [35–37].

Square: Inspired by the circuits used to calculate a system’s quantum volume [38].

Section 2 provides details of these circuit classes, and presents algorithms for generating them.

How well a stack executes a circuit is assessed here via continuous figures of merit, rather than binary ones which may only verify correctness. This is because the outcomes from noisy devices will likely not be correct, while information about closeness to the correct answer is still highly valuable. Further, techniques for the verification of universal quantum computation require many qubits or qubit communication or both, none of which are accessible using present-day noisy devices [39, 40]. Indeed, to reflect the current state-of-the-art, where there exist few devices with limited networking between them [4, 41], we will focus on examples of how classical computers can be used to perform benchmarks, as opposed to using small quantum computers to benchmark each other [42, 43].

We use three figures of merit, calculated using classical computers. These are: heavy output generation probability [44], cross-entropy difference [29], and $\ell_1$-norm distance. Estimating each of these figures of merit requires knowledge about the ideal (noise-free) outcome probabilities of bitstrings the system could produce.

In practice, calculating the ideal outcome probabilities requires direct simulation of the circuit under consideration. Consequently, scaling to tens or hundreds of qubits will be challenging in general, particularly if the $\ell_1$-norm distance is used as the figure of merit. However, by considering circuits with few qubits we allow ourselves the ability to simulate the circuits classically, and to gain an insight into the behaviour of larger devices [3, 45].

We refer to a set of benchmarks as a benchmarking suite, each benchmark being defined by a unique combination of circuit class and figure of merit. Using a benchmarking suite enables the derivation of broad insights about the behaviour and performance of a quantum computing system across a wide variety of possible applications. The benchmarks' varying demands on the quantum computing resources (qubits, depth) allow for the exploration of the best routes to extract the most utility from near-term quantum computers. In sum, our benchmarking approach is both application-motivated and holistic.

The remainder of this paper is organised as follows: Section 2 details the circuit classes, including algorithms for generating the circuits; Section 3 explains the figures of merit we use; Section 4 introduces the software stack, as well as hardware made available by IBM Quantum, that comprise the systems we will be benchmarking; and Section 5 shows the results of our benchmarking. We conclude in Section 6.

2 Circuit Classes

This section presents the formal definitions of the circuits used in this work, while also identifying the motivations for their use in benchmarking. These motivations include both the class of applications they represent and the properties of the quantum computing stacks that they will probe. Collectively, this selection of circuit classes encompasses an array of potential applications of quantum computing, covering circuits of varied depth, connectivity, and gate types.

2.1 Shallow Circuits: IQP

Instantaneous Quantum Polytime (IQP) circuits [46] can be implemented using commuting gates. As well as being simpler to implement than universal quantum circuits, there are strong theoretical reasons to believe that, even in the presence of noise, IQP circuits cannot be simulated using classical computers [47–49]. This has allowed for the application of noisy quantum technology in areas such as machine learning [35, 36] and interactive two-player games [43, 46]. The connection between IQP and a demonstration of quantum computational supremacy on near-term hardware makes their implementation a pertinent benchmark of the performance of these devices.

The shallow class of circuits, whose depth increases slowly with width, is a subclass of IQP circuits. These circuits probe the performance of a quantum computing stack in fine-grained detail by measuring the impact of including more qubits (quasi-) independently of increasing circuit depth. This is useful for understanding the performance of a device being utilised for applications whose qubit requirement grows more quickly than the circuit depth.

Definitions and Related Results An n-qubit IQP circuit consists of gates that are diagonal in the Pauli-X basis, acting on the $|0^n\rangle$ state, with measurement taking place in the computational basis. For this class of circuits, Theorem 1 applies.

Theorem 1 (Informal [48]) Assuming either one of two conjectures, relating to the hardness of approximating the Ising partition function and the gap of degree 3 polynomials, and the stability of the Polynomial Hierarchy³, it is impossible to classically sample from the output probability distribution of any IQP circuit in polynomial time, up to an $\ell_1$-norm distance of 1/192.

This class is called “instantaneous” because these gates commute with one another, which in turn reduces the amount of time that the quantum state will need to be stored. In addition, the impossibility of simulating IQP circuits is shown to hold when restricted by physically motivated constraints such as limited connectivity and constant error rates on each qubit [49].

An equivalent, commonly-considered definition is that IQP circuits consist of gates diagonal in the Pauli-Z basis, sandwiched between two layers of Hadamard gates acting on all qubits. Algorithm 1 is used to generate IQP circuits of this form. Note that Algorithm 1 limits the connectivity allowed between the qubits, so it does not generate all circuits in the IQP class.

The worst case depth of 7 quoted in Algorithm 1 may be arrived at by observing that finding an optimal order of application of CZ is equivalent to finding an edge colouring of the graph $G_n$. In this case a 4-colouring can be found in polynomial time [51], so the CZ gates fit in at most four layers, which together with the two Hadamard layers and the RZ layer gives a depth of 7. Algorithm 1 includes discrete randomness over the graphs, $G_n$, and continuous randomness over the rotation angles, $\alpha_i$.

The design of circuits in Algorithm 1 may be compared to other sparse IQP circuits [49], IQP circuits on 2D lattices [49, 52], and random 3-regular graphs used for benchmarking [12]. For our purposes these require too high a connectivity, are too architecture-specific, and are too application-specific, respectively. There are sparse IQP circuits for which verification schemes exist [52, 53], although the connectivity is too architecture-specific for our purposes, with the verification scheme requiring limits to the measurement noise which we cannot guarantee.

³ The non-collapse of the Polynomial Hierarchy is widely conjectured to be true [50].
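The Hadamard-sandwiched construction described in this section can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names, the gate-tuple representation, and the use of rejection sampling for the graph post-selection are assumptions made here, and "degree less than 4" is read as a maximum vertex degree of 3.

```python
import math
import random


def is_connected(n, edges):
    """Depth-first search check that the graph on vertices 0..n-1 is connected."""
    adjacency = {v: set() for v in range(n)}
    for i, j in edges:
        adjacency[i].add(j)
        adjacency[j].add(i)
    seen, stack = {0}, [0]
    while stack:
        v = stack.pop()
        for w in adjacency[v] - seen:
            seen.add(w)
            stack.append(w)
    return len(seen) == n


def random_bounded_degree_graph(n, rng, edge_prob=0.5, max_degree=3):
    """Rejection-sample a binomial graph, keeping only connected graphs whose
    vertex degrees are all at most max_degree (hypothetical helper; the
    rejection rate grows quickly with n)."""
    while True:
        edges = [(i, j) for i in range(n) for j in range(i + 1, n)
                 if rng.random() < edge_prob]
        degree = [0] * n
        for i, j in edges:
            degree[i] += 1
            degree[j] += 1
        if max(degree) <= max_degree and is_connected(n, edges):
            return edges


def shallow_circuit(n, seed=None):
    """Return (edges, gates): H layer, CZ on graph edges, RZ layer, H layer."""
    rng = random.Random(seed)
    edges = random_bounded_degree_graph(n, rng)
    gates = [("H", q) for q in range(n)]
    gates += [("CZ", i, j) for i, j in edges]
    gates += [("RZ", q, rng.uniform(0.0, 2.0 * math.pi)) for q in range(n)]
    gates += [("H", q) for q in range(n)]
    return edges, gates


edges, gates = shallow_circuit(5, seed=1)
print(len(edges), "CZ gates on", 5, "qubits")
```

The gate list is only a description of the uncompiled circuit; scheduling the CZ gates into layers (the edge-colouring step) and running the circuit on a simulator or device are deliberately left out of this sketch.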

Algorithm 1 The pattern for building shallow circuits.

Input: Number of qubits, $n \in \mathbb{Z}$
Worst case depth: 7
Output: Circuit, $C_n$

    Initialise n qubits, labelled $q_1, \ldots, q_n$, in the state $|0\rangle$.
    for all $i \in \{1, \ldots, n\}$ do
        Act H on $q_i$
    end for
    Generate a random binomial graph, $G_n$, with n vertices and edge probability 0.5, post-selecting on those that are connected and have degree less than 4.
    for all edges $\{i, j\}$ in $G_n$ do
        Act CZ between $q_i$ and $q_j$
    end for
    for all $i \in \{1, \ldots, n\}$ do
        Generate $\alpha_i \in [0, 2\pi]$ uniformly at random.
        Act $RZ(\alpha_i)$ on $q_i$.
    end for
    for all $i \in \{1, \ldots, n\}$ do
        Act H on $q_i$
    end for
    Measure $q_1, \ldots, q_n$ in the computational basis

Discussion The close connection, through Theorem 1, of quantum computational supremacy and shallow circuits⁴, explicitly measured in $\ell_1$-norm distance, provides a measure of a quantum computing stack’s quality; namely, by analysing the closeness of the distributions it produces to the ideal ones, as measured by the $\ell_1$-norm distance, and comparing this value to 1/192.

However, as the output probabilities of shallow circuits are not exponentially distributed, we cannot use Cross-Entropy Benchmarking. Similarly the theoretical value of heavy output probability for circuits with exponentially distributed output probabilities, as discussed in Section 3.1, cannot be used here.

Instead, we use the empirical value of the ideal heavy output probability, in the place of a theoretically derived one, as a point of comparison with the behaviour of the quantum computing stack being benchmarked. This approach requires calculation of all output probabilities and summation of the probabilities of those that are heavy. This can be done for the small circuits investigated here, but allows for the benchmarking of fewer qubits than would be accessible if a theoretical value were known.

Before compilation shallow circuits have constant depth, allowing us to measure the impact of increasing circuit width independently of increasing circuit depth. Further, because Algorithm 1 limits the connectivity allowed between the qubits, the increase in circuit depth due to compilation onto limited-connectivity architectures is also minimised, while avoiding a choice of connectivity favouring one device in particular. By bounding connectivity, but allowing all connections in principle, we avoid biasing against architectures that allow all-to-all connectivity, which would still perform well.

⁴ While Theorem 1 is a worst case hardness result, and may not apply to shallow circuits, we regard their performance as indicative of that of those for which it does. Indeed, similar hardness results to Theorem 1 exist for other families of sparse, constant depth IQP circuits [52].

2.2 Square Circuits: Random Circuit Sampling

While circuits required for applications are typically not random, sampling from the output distributions of random circuits built from two-qubit gates has been suggested as a means to demonstrate quantum computational supremacy [29, 44, 54, 55]. Further, by utilising uniformly random two-qubit unitaries, the class we define here, which we refer to as square circuits, provides a benchmark at all layers of the quantum computing stack. In particular it tests the ability of the device to implement a universal gate set, the diversity and quality of the gates available, and the compilation strategy’s ability to decompose these gates to the native architecture. Further, as quantum circuits can always be approximated up to arbitrary precision using two-qubit unitary gates [56], square circuits can help us understand the performance of quantum computing stacks when implementing computations requiring a universal gate set.

Definitions and Related Results A random circuit, for a fixed number of qubits n and coupling map $G_n$, is generated by applying $m = \mathrm{poly}(n)$ uniformly random two-qubit SU(4) gates between qubits connected by edges of $G_n$. Here, “uniformly random” means according to the Haar measure.

Random Circuit Sampling (RCS) is the task of producing samples from the output distribution of random circuits. To perform RCS approximately is to sample from a distribution close to that produced by the random circuit. This task has been shown to be hard even in the average case [54, 55], as outlined in Theorem 2, which improves upon the worst case result for IQP circuits as seen in Theorem 1.

Theorem 2 (Informal [54]) There exists a collection of coupling maps $G_n$, with one for each n, and procedure for generating random circuits respecting each $G_n$, for which there is no classical randomised algorithm that performs approximate RCS, to within inverse polynomial $\ell_1$-norm distance error, for a constant fraction of the random circuits.

The conditions imposed on which coupling maps and circuit generation procedures are covered by this theorem are quite mild, but in particular this can be done using circuits with depth $O(n)$ acting on a 2D square lattice [44, 54]. While this is relevant for devices built using superconducting technology [3], we wish to avoid biasing in favour of this technology in particular.

The circuits used here, which are almost identical to those used for the quantum volume benchmark [38], are generated according to Algorithm 2. We refer to this class of circuits as square circuits, and note that they consist of n layers of two-qubit gates acting between a bipartition of the qubits. There is discrete randomness over the possible bipartition of the qubits, and continuous randomness over the random two-qubit SU(4) gates.

Algorithm 2 The pattern for building square circuits.

Input: Number of qubits, $n \in \mathbb{Z}$
Worst case depth: n
Output: Circuit, $C_n$

    Initialise n qubits, labelled $q_1, \ldots, q_n$, in the state $|0\rangle$
    for each layer t up to depth n do
        ▷ The contents of this for loop constitute a layer. The choice of the number of layers used here is discussed in Appendix A.1.
        Divide the qubits into $\lfloor n/2 \rfloor$ pairs $\{q_{i,1}, q_{i,2}\}$ at random.
        for all $i \in \mathbb{Z}$, $1 \le i \le \lfloor n/2 \rfloor$ do
            Generate $U_{i,t} \in SU(4)$ uniformly at random according to the Haar measure.
            Act $U_{i,t}$ on qubits $q_{i,1}$ and $q_{i,2}$.
        end for
    end for
    Measure all qubits in the computational basis.

Discussion By allowing two-qubit gates to act between any pair of qubits in the uncompiled circuit, square circuits avoid favouring any device in particular [3, 29, 44]. This choice adheres closely to our motivations of being hardware-agnostic. In addition, assuming all-to-all connectivity passes the burden of mapping the circuit onto the device to the compilation strategy, which is in line with our wish to benchmark the full quantum computing stack. That said, any architecture whose coupling map closely mirrors the uncompiled circuit will be advantaged, as even a naive compilation strategy will perform well in that case.
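The layer structure of square circuits (n layers, each pairing the qubits at random and applying a Haar-random SU(4) gate to every pair) can be illustrated with the following standard-library sketch. It is a hedged example, not the authors' code: `haar_su`, `determinant` and `square_circuit` are names invented here, matrices are plain nested lists, and the Haar sampling uses Gram-Schmidt orthogonalisation of complex Gaussian vectors followed by a global phase that fixes the determinant to 1, which is one common way to draw such matrices.

```python
import cmath
import random


def determinant(m):
    """Laplace expansion; adequate for the 4x4 matrices used here."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for col, entry in enumerate(m[0]):
        minor = [row[:col] + row[col + 1:] for row in m[1:]]
        total += (-1) ** col * entry * determinant(minor)
    return total


def haar_su(dim, rng):
    """Haar-random special unitary as a nested list matrix[row][col]."""
    columns = []
    for _ in range(dim):
        v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(dim)]
        for u in columns:  # remove components along earlier columns
            overlap = sum(a.conjugate() * b for a, b in zip(u, v))
            v = [b - overlap * a for a, b in zip(u, v)]
        norm = sum(abs(b) ** 2 for b in v) ** 0.5
        columns.append([b / norm for b in v])
    unitary = [[columns[c][r] for c in range(dim)] for r in range(dim)]
    # Global phase chosen so that the determinant becomes exactly 1.
    phase = cmath.exp(-1j * cmath.phase(determinant(unitary)) / dim)
    return [[phase * x for x in row] for row in unitary]


def square_circuit(n, seed=None):
    """Return n layers; each layer is a list of ((qubit_a, qubit_b), U) entries."""
    rng = random.Random(seed)
    layers = []
    for _ in range(n):
        qubits = list(range(n))
        rng.shuffle(qubits)  # random pairing; one qubit idles when n is odd
        pairs = [(qubits[2 * i], qubits[2 * i + 1]) for i in range(n // 2)]
        layers.append([(pair, haar_su(4, rng)) for pair in pairs])
    return layers


layers = square_circuit(4, seed=7)
print(len(layers), "layers,", len(layers[0]), "gates per layer")
```

No coupling map appears anywhere in the sketch, mirroring the choice to leave the uncompiled circuit all-to-all and hand the mapping problem to the compilation strategy.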

In [38] similar circuits are used but with all-to-all connectivity restricted to nearest neighbour connectivity on a line, and the addition of permutation layers. As this disadvantages devices with a completely connected coupling map⁵ [5], a property which would typically be an advantage, we choose not to make this restriction here. Notice, however, that naively compiling square circuits onto an architecture with nearest neighbour connectivity on a line would result in the circuits of [38]. This similarity makes a comparison between experiments involving these circuits relevant. As a result, compiling square circuits to superconducting devices (where connectivity is low) will generally result in a circuit similar to those used in the quantum volume benchmark, as many SWAP operations are required regardless.

In addition, square circuits fulfil the necessary conditions to apply HOG, as defined in Problem 1. Namely, the distribution $p_C$ is sufficiently far from uniform in the required sense, as introduced in Section 3.1, which we demonstrate in Appendix A.1.

2.3 Deep Circuits: Pauli Gadgets

Pauli gadgets [57] are quantum circuits implementing an operation corresponding to exponentiating a Pauli tensor. Sequences of Pauli gadgets acting on qubits form product formula circuits, most commonly used in Hamiltonian simulation [30]. Many algorithms employing these circuits require fault-tolerant devices, but they are also the basis of trial state preparation circuits in many variational algorithms, which are the most promising applications of noisy quantum computers. A notable example of this in quantum chemistry is the physically-motivated UCC family of trial states used in the variational quantum eigensolver (VQE) [31, 58]. As near-term quantum computers hold promise as useful tools for studying quantum chemistry, we propose that the quality of an implementation of these gadgets is a useful benchmark, and use them to define the deep circuit class.

Note that the circuits in this class differ from running the VQE end-to-end. Focusing on the state preparation portion of a VQE circuit, we might deduce performance of the quantum computing stack when running the VQE on a number of molecules⁶. The intuition is that if the state preparation sub-component is accurate, then the error in the expectation values of measured observables will be due to errors in implementing those observables, or the readout process itself.

Definitions and Related Results These circuits are built as in Algorithm 3. They are constructed from several layers of Pauli Gadgets, each acting on a random subset of n qubits. In the worst case each Pauli Gadget will demand $4n - 1$ gates: $2n$ Pauli gates, $2(n - 1)$ CX gates, and one RZ gate. In the construction of deep circuits there is discrete randomness over the choice of Pauli string, s, and continuous randomness over a rotation angle $\alpha$.

Discussion By establishing the exponential distribution of the output probabilities from deep circuits, as we do in Appendix A.2, we allow ourselves the capacity to use Heavy Output Generation Benchmarking and Cross-Entropy Benchmarking as introduced in Section 3. This constitutes a novel extension of those approaches to application-motivated benchmarking, and gives us the ability to benchmark application-motivated circuits using polynomially many samples from a device. This provides a novel insight into the capacity of near-term hardware to implement quantum chemistry circuits.

⁵ Note that some compilation strategies may identify that the SWAP gates in the permutation layer may be removed for devices with all-to-all connectivity. We avoid this dependence on the compilation strategy by fixing the connectivity in the uncompiled circuit to be all-to-all.

⁶ Here we do not explore the relationship between the performance of a quantum computing stack when implementing deep circuits and when implementing VQE but regard it as important for future work.

3 Figures of Merit

Suppose a quantum computer is programmed to run a circuit C or a unitary U. Figures of merit compare $p_U$ ($p_C$), the ideal output probabilities for U (C), and $D_U$ ($D_C$), the distributions produced by an implementation, which may be noisy. The remainder of this section outlines three figures of merit and details: their definition, the continuous range of values they can take, their dependence on noise, and the procedure for calculating their value

from samples produced by an implementation. As noted in the introduction, we use continuous figures of merit which require classical resources to compute.

Algorithm 3 The pattern for building deep circuits.

Input: Number of qubits, $n \in \mathbb{Z}$
Worst case depth: $(4n - 1)(3n + 1)$
Output: Circuit, C

    function PhaseGadget($\alpha$, $\{\tilde{q}_1, \ldots, \tilde{q}_p\}$)
        if p = 1 then
            Act $RZ(\alpha)$ on $\tilde{q}_1$
        else
            Act CX between $\tilde{q}_1$ and $\tilde{q}_2$
            PhaseGadget($\alpha$, $\{\tilde{q}_2, \ldots, \tilde{q}_p\}$)
            Act CX between $\tilde{q}_1$ and $\tilde{q}_2$
        end if
    end function

    function Pauli($\{\tilde{q}_1, \ldots, \tilde{q}_p\}$, $(s_1, \ldots, s_p)$)
        if $s_1 = X$ then
            Act H on $\tilde{q}_1$
        else if $s_1 = Y$ then
            Act $RX(\pi/2)$ on $\tilde{q}_1$
        end if
        if p > 1 then
            Pauli($\{\tilde{q}_2, \ldots, \tilde{q}_p\}$, $(s_2, \ldots, s_p)$)
        end if
    end function

    function PauliGadget($\alpha$, qubits, s)
        Pauli(qubits, s)
        PhaseGadget($\alpha$, qubits)
        Act the inverse of Pauli(qubits, s)
    end function

    Initialise n qubits, labelled $q_1, \ldots, q_n$, in the state $|0\rangle$.
    for each layer t up to depth $3n + 1$ do
        ▷ The contents of this for loop constitute a layer. The choice of the number of layers used here is discussed in Appendix A.2.
        Select a random string $s \in \{I, X, Y, Z\}^n$
        Generate random angle $\alpha \in [0, 2\pi]$
        PauliGadget($\alpha$, $\{q_i : s_i \neq I\}$, $(s_i : s_i \neq I)$)
    end for
    Measure all qubits in the computational basis

3.1 Heavy Output Generation Benchmarking

Heavy Output Generation [44] (HOG) is the problem which demands that, given a quantum circuit C as input, strings $x_1, \ldots, x_k$ be generated which are predominantly those that are the most likely in the output distribution of C. That is to say, outputs with the highest probability in the ideal distribution should be produced most regularly.

If the ideal distribution is sufficiently far from uniform, this problem provides a means to distinguish between samples from the ideal distribution and a trivial attempt to mimic such a sampling procedure, namely producing uniformly random strings. Although a simple problem, this task is also conjectured to be hard for a classical computer to perform in general [44].

Importantly, a solution to HOG can be verified by a classical device using polynomially many samples from the real distribution. In combination, these properties make the study of the likely output of a distribution a useful tool in benchmarking near-term quantum devices.

Definitions and Related Results Let $p_C(x) = |\langle x| C |0^n\rangle|^2$ be the probability of measuring the output x in the output probability distribution of an ideal implementation of a circuit C. An output $z \in \{0, 1\}^n$ is heavy for a quantum circuit C if $|\langle z| C |0^n\rangle|^2$ is greater than the median of the set $\{p_C(x) : x \in \{0, 1\}^n\}$.

We can define the probability that samples drawn from a distribution $D_C$ will be heavy outputs in the distribution $p_C$, called the heavy output generation probability of $D_C$, as follows. Here $\delta_C(x) = 1$ if x is heavy for C, and 0 otherwise.

$$\mathrm{HOG}(D_C, p_C) = \sum_{x \in \{0,1\}^n} D_C(x)\, \delta_C(x) \tag{1}$$

For $\mathrm{HOG}(D_C, p_C)$ to help us distinguish between an ideal implementation of C and a trivial attempt to mimic it by generating random bit strings, $\mathrm{HOG}(p_C, p_C)$ should be greater than 0.5.
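As an illustration of the definitions above, the following sketch computes the heavy set and the heavy output generation probability of Equation (1) for a toy two-qubit distribution. The function names and the dictionary representation of distributions are choices made for this example only, and the ideal distribution is assumed to list every bitstring (including those with probability zero) so that the median is taken over the full set.

```python
from statistics import median


def heavy_outputs(ideal_probs):
    """Bitstrings whose ideal probability is strictly above the median."""
    threshold = median(ideal_probs.values())
    return {x for x, p in ideal_probs.items() if p > threshold}


def hog_probability(device_probs, ideal_probs):
    """Equation (1): total probability the device assigns to heavy outputs."""
    heavy = heavy_outputs(ideal_probs)
    return sum(p for x, p in device_probs.items() if x in heavy)


# Toy ideal distribution over two qubits (illustrative numbers only).
ideal = {"00": 0.5, "01": 0.3, "10": 0.15, "11": 0.05}
uniform = {x: 0.25 for x in ideal}

print(sorted(heavy_outputs(ideal)))     # the two most likely strings
print(hog_probability(ideal, ideal))    # ideal device: above one half
print(hog_probability(uniform, ideal))  # fully depolarised device: one half
```

In this toy case the median is 0.225, so "00" and "01" are heavy; an ideal implementation scores about 0.8 while uniformly random strings score 0.5, which is the separation the figure of merit relies on.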

$\mathrm{HOG}(p_C, p_C)$ is expected to be $(1 + \log 2)/2 \approx 0.846574$ [44] for circuit classes whose distribution of measurement probabilities, $p$, is of the exponential form $\Pr(p) = N e^{-Np}$, where $N = 2^n$.^7 This is discussed at length in Appendix A. When the output distributions of a class of circuits are shown to take this form, it is meaningful to define the Heavy Output Generation problem.

^7 This is also commonly referred to as the Porter-Thomas distribution [59].

Problem 1 (Heavy Output Generation [44]) Given a measure $\mu$ over a class of circuits, the family of distributions $\{D_C\}$ is said to satisfy HOG if the following is true.

$$\mathbb{E}_{C \leftarrow \mu}\left[\mathrm{HOG}(D_C, p_C)\right] \geq \frac{2}{3} \tag{2}$$

Indeed, the exponential distribution of the output probabilities of the random circuits defined in [38] allowed for the definition of the quantum volume of a device. This is the largest $n$ for which distributions $\{D_{C_n}\}$ which solve the HOG problem introduced in Problem 1, where $C_n$ are random circuits defined in [38], can be sampled from.

The motivation for the introduction of quantum volume is the classical hardness of solving the HOG problem of Problem 1 for random circuits, under the QUATH assumption of Assumption 1.

Assumption 1 The QUAntum THreshold assumption (QUATH) [44] is that there is no polynomial time classical algorithm that takes as input the description of a random circuit $C \leftarrow \mu$ and which guesses whether $|\langle 0^n | C | 0^n \rangle|^2$ is greater or less than the median value in $\{p_C(x) : x \in \{0,1\}^n\}$ with success probability at least $1/2 + \Omega(1/2^n)$ over the choices of $C$.

As opposed to the statement that HOG is hard, QUATH does not reference sampling, and concerns only the difficulty of approximating amplitudes. QUATH can be evidenced by observing the difficulties of calculating output probability amplitudes [44].

Ideal and Noisy Implementations
HOG is solved efficiently by a quantum computer, simply by implementing the circuit $C$. In the case of extreme noise, where the real distribution $D_C$ converges to the uniform distribution $\mathcal{U}$, $\mathrm{HOG}(D_C, p_C) = 1/2$. This is compared to the case where the output probabilities are exponentially distributed and $D_U = p_U$, when we would expect to have $\mathrm{HOG}(D_C, p_C) = (1 + \log 2)/2$. The continuum of values in between provides a valuable figure of merit, which we call Heavy Output Generation Benchmarking, for a quantum computing stack.

Calculation From Samples
We approximate $\mathrm{HOG}(D_C, p_C)$ in a number of operations which grows exponentially with the number of qubits, but using only a polynomial number of samples from the real distribution $D_C$, by calculating the ideal probabilities $p_C(x)$. To do so we simply calculate the following expression, where $x_1, \ldots, x_k$ are samples drawn from $D_U$.

$$\frac{1}{k} \sum_{i=1,\ldots,k} \delta_C(x_i) \tag{3}$$

By the law of large numbers, this converges to $\mathrm{HOG}(D_U, p_U)$ in the limit of increasing sample size.

Discussion
The connections between HOG and quantum computational supremacy allow us to extract valuable insights into the ability of a quantum computing stack to demonstrate quantum computational supremacy. It provides a minimal, single value with which to compare quantum computing stacks, with an intuitive interpretation. The HOG problem of Problem 1, in particular, is easy to solve on a fault tolerant quantum computer with overwhelming success probability.

As with quantum volume, we too will consider the largest $n$ for which solving the HOG problem of Problem 1 is possible for the circuit classes in Section 2 which have exponentially distributed output probabilities. This is not the case for all circuit classes used here, and for those for which it is not we will explicitly calculate the ideal heavy output probability as a point of comparison. Intuitively, the largest $n$ solving this problem verifies the largest Hilbert space accessible to a quantum computing stack.
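The sample-based estimate of equation (3) can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the experiments: the function names are hypothetical, and the sketch assumes, following [44], that an output is heavy when its ideal probability exceeds the median of all ideal probabilities.

```python
from statistics import median


def heavy_outputs(ideal_probs):
    """Bitstrings whose ideal probability exceeds the median (assumed definition of 'heavy')."""
    med = median(ideal_probs.values())
    return {x for x, p in ideal_probs.items() if p > med}


def hog_exact(dist, ideal_probs):
    """Equation (1): total probability that `dist` assigns to the heavy outputs."""
    heavy = heavy_outputs(ideal_probs)
    return sum(p for x, p in dist.items() if x in heavy)


def hog_estimate(samples, ideal_probs):
    """Equation (3): fraction of the observed samples that are heavy outputs."""
    heavy = heavy_outputs(ideal_probs)
    return sum(1 for x in samples if x in heavy) / len(samples)


# Toy two-qubit example; the ideal distribution must list all 2^n bitstrings.
ideal = {"00": 0.5, "01": 0.3, "10": 0.15, "11": 0.05}
uniform = {x: 0.25 for x in ideal}
samples = ["00", "01", "10", "00"]

estimate = hog_estimate(samples, ideal)   # three of the four samples are heavy
ideal_value = hog_exact(ideal, ideal)     # heavy output probability of a perfect device
noisy_value = hog_exact(uniform, ideal)   # fully depolarised device
```

In this toy example the uniform distribution gives a heavy output probability of exactly 1/2, matching the extreme-noise limit discussed above. Computing `ideal_probs` is the step that requires exponential classical resources.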

3.2 Cross-Entropy Difference

Cross-entropy benchmarking [29] relates to the average probability, in the ideal distribution, $p_U$, of the outputs which are sampled from the real distribution, $D_U$. For distributions which are far from uniform, and with a spread of probabilities of outcomes, this measure can be used to distinguish an ideal from a real implementation. Ideal implementations will regularly produce the higher probability outputs, obtaining a high benchmark value, while even a small shift in the distribution will lower the value.

The value of the cross-entropy difference can be calculated using exponential classical resources, from a polynomial number of samples from a quantum computer, which allows for its utilisation in benchmarking smaller quantum devices [3, 29, 60, 61]. There are also well developed methods for using this quantity to extrapolate from the behaviour of smaller devices to that of larger devices, which might demonstrate quantum computational supremacy [3].

Definitions and Related Results
Intuitively, the entropy, $H(D)$, of a distribution, $D$, as defined in equation (4), measures the expectation of one's 'surprise' at observing samples from $D$. In this case, this is measured by $f_D(x) = -\log(D(x))$, which accordingly decreases with increasing probability of the outcome occurring.

$$H(D) = \sum_{x \in \{0,1\}^n} D(x) \log\left(\frac{1}{D(x)}\right) \tag{4}$$

By extension, the cross-entropy measures one's surprise when sampling from $D$ when expecting $D'$. This may be restated as the additional information required to describe $D$ given a description of $D'$. Formally, cross-entropy is defined as in Definition 1.

Definition 1 (Cross-Entropy) The cross-entropy between two probability distributions $D$ and $D'$ is

$$\mathrm{CE}(D, D') = \sum_{x \in \{0,1\}^n} D(x) \log\left(\frac{1}{D'(x)}\right). \tag{5}$$

Then the cross-entropy difference is simply $\mathrm{CE}(\mathcal{U}, D') - \mathrm{CE}(D, D')$, where $\mathcal{U}$ is the uniform distribution.

Definition 2 (Cross-Entropy Difference) The cross-entropy difference between two probability distributions $D$ and $D'$ is

$$\mathrm{CED}(D, D') = \sum_{x \in \{0,1\}^n} \left(\frac{1}{2^n} - D(x)\right) \log\left(\frac{1}{D'(x)}\right). \tag{6}$$

Therefore, the cross-entropy difference can be thought of intuitively as answering "is the distribution $D'$ best predicted by $D$ or by the uniform distribution?".

A different but related definition sets $f_D(x) = 2^{-n} - D(x)$,^8 in which case the related quantity is referred to as linear cross entropy [3]. In this case the connection to the average probability of the outputs sampled is clearer.

^8 The function $f_D(x)$ may be any which decreases with increasing outcome probability. The choice depends on the relationship between the fidelity of the resulting state, and the standard deviation of the estimator of the associated definition of cross-entropy difference.

Ideal and Noisy Implementations
The cross-entropy, $\mathrm{CE}(D_U, p_U)$, between the output distribution, $p_U$, of a unitary, $U$, and the output distribution of an ideal implementation of $U$, $D_U$, reduces to the entropy of $p_U$. In the case where the probabilities $p_U(x)$ are approximately independent and identically distributed according to the exponential distribution, we have that $H(p_U) = \log 2^n + \gamma - 1$ [29], where $\gamma$ is Euler's constant.

In the case where the probabilities $D(x)$ are uncorrelated with those of $p_U(x)$ we arrive at the following prediction of the cross-entropy [29].

$$\mathbb{E}_U\left[\mathrm{CE}(D_U, p_U)\right] = \log 2^n + \gamma \tag{7}$$

$D(x)$ and $p_U(x)$ are uncorrelated if, for example, $D$ is the uniform distribution, or, in the case of demonstrations of quantum computational supremacy, if $D$ is the output of a polynomial cost classical algorithm [29].

These results allow us to identify the extreme values taken by the cross-entropy difference.

$D_U = p_U$: When the unitary is implemented perfectly, $\mathrm{CED}(D_U, p_U) = 1$.

$D_U = \mathcal{U}$: When samples are generated uniformly at random, $\mathrm{CED}(D_U, p_U) = 0$.
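Definitions 1 and 2 can be checked numerically on small, explicitly known distributions. The sketch below is illustrative only: the function names are hypothetical, natural logarithms are assumed, and every outcome is assumed to have non-zero probability under $D'$ so that the logarithm is defined.

```python
import math


def cross_entropy(d, d_prime):
    """Equation (5): CE(D, D') = sum_x D(x) * log(1 / D'(x))."""
    return sum(p * math.log(1.0 / d_prime[x]) for x, p in d.items() if p > 0)


def cross_entropy_difference(d, d_prime):
    """Difference form: CE(uniform, D') - CE(D, D')."""
    size = len(d_prime)
    uniform = {x: 1.0 / size for x in d_prime}
    return cross_entropy(uniform, d_prime) - cross_entropy(d, d_prime)


def cross_entropy_difference_direct(d, d_prime):
    """Equation (6): sum_x (1/2^n - D(x)) * log(1 / D'(x))."""
    size = len(d_prime)
    return sum((1.0 / size - d.get(x, 0.0)) * math.log(1.0 / q)
               for x, q in d_prime.items())


# Toy two-qubit ideal distribution (all four outcomes have non-zero probability).
ideal = {"00": 0.5, "01": 0.3, "10": 0.15, "11": 0.05}
uniform = {x: 0.25 for x in ideal}

ced_uniform = cross_entropy_difference(uniform, ideal)  # uniform sampling
ced_perfect = cross_entropy_difference(ideal, ideal)    # perfect implementation
```

Uniform sampling gives a cross-entropy difference of 0 for any ideal distribution. The value 1 for a perfect implementation holds in expectation for exponentially distributed output probabilities, so a single small distribution such as the one above will generally give a different positive number.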

As such, the cross-entropy difference gives a value between 0 and 1 which measures the accuracy of the implementation of a unitary, the calculation of which is called Cross-Entropy Benchmarking.

Calculation From Samples
By the law of large numbers, the following expression converges to $\mathrm{CE}(D_U, p_U)$, where $x_1, \ldots, x_k$ are samples drawn from $D_U$.

$$\frac{1}{k} \sum_{i=1,\ldots,k} \log\left(\frac{1}{p_U(x_i)}\right) \tag{8}$$

This can be used by a classical computer to approximate the value for $\mathrm{CED}(D_U, p_U)$. While only a polynomial number of samples $x_i$ are required, the calculation of $p_U(x_i)$ takes exponential time. In our case, to avoid requiring the inverse of 0 in this approximation, we chose to use an approximation to $p_U$. Namely, we approximate it by the larger of $p_U$ and an inverse exponential in the number of qubits, as is inspired by the average case supremacy results related to random circuits [54, 55].

Discussion
The comparison to the uniform distribution which the cross-entropy difference provides is valuable as, if an honest attempt is being made to recreate a distribution, at worst $\mathcal{U}$ could be produced. In addition, the cross-entropy gives an estimate for the average circuit fidelity [29], when the conditions of the above discussion are met, facilitating the characterisation of noise levels in implementations of quantum circuits. While Cross-Entropy Benchmarking on its own cannot be used to distinguish error channels, in combination with the techniques introduced here, it can provide insight into this information.

By approximating the fidelity of smaller circuits, Cross-Entropy Benchmarking allows us to characterise larger ones. This is achieved by combining the fidelities of the smaller circuits which themselves combine to give a larger one. This method has been introduced and employed to benchmark demonstrations of quantum computational supremacy [3]. In that domain, calculating the cross-entropy difference of the larger circuit would otherwise be too computationally costly.

The average circuit fidelity may be calculated by decoupling two halves of the device,^9 performing Cross-Entropy Benchmarking of the circuit built from gates in the larger circuit which act only on each half respectively, and multiplying together the results of both. This approach is feasible when it can be justified, through numerical simulations and experimental implementations, that the average circuit fidelities do combine in this fashion. This is so when the errors on each output are uncorrelated with the amplitude of that output in the ideal probability distribution.

^9 In the work of [3] decoupled, partially coupled, and fully coupled circuits are all investigated to ensure the accuracy of this method of combining fidelities.

3.3 ℓ1-Norm Distance

The ℓ1-norm distance between two probability distributions measures the total difference between the probabilities the distributions assign to elements of their sample space. Such a metric is sufficiently strong that for several classes of quantum circuits it is known that classical simulation of all circuits in the class to within some ℓ1-norm distance of the ideal distribution would contradict commonly held computational complexity theoretic conjectures [48, 54, 62].

Unlike the previous two figures of merit, approximating the ℓ1-norm distance requires a full characterisation of the ideal output distribution. In the cases where few qubits are considered, as is so here, it is possible to perform such characterisations. For larger qubit counts, cross-entropy benchmarking and heavy output generation are the preferred benchmarking schemes.

Definitions and Related Results
In the case of distributions over the sample space $\{0,1\}^n$, the ℓ1-norm distance is defined as follows.

Definition 3 (ℓ1-norm distance) For distributions $D$ and $D'$ over the sample space $\{0,1\}^n$ the ℓ1-norm distance between them is defined as

$$\ell_1(D, D') = \sum_{x \in \{0,1\}^n} \left|D(x) - D'(x)\right|. \tag{9}$$

Ideal and Noisy Implementations
An ideal implementation of a unitary would result in an ℓ1-norm distance of $\ell_1(D_U, p_U) = 0$. However, noise will likely make it incredibly difficult for even fault tolerant quantum computers to achieve an ℓ1-norm

distance of 0 and so bounds, such as that discussed in Theorem 1, are often put on the value instead. Indeed in that case it is sufficient for $\ell_1(D_U, p_U)$ to be bounded for a demonstration of quantum computational supremacy to occur.

Once again, the ℓ1-norm distance takes a continuous range of values, allowing for comparison between implementations of circuits.

Calculation From Samples
In this work we will approximate the ℓ1-norm distance between the ideal and real distributions using samples from the real distribution. Given samples $s = \{x_1, \ldots, x_m\}$ from $D_U$, let $s^x$ be the number of times $x$ appears in $s$. Define $\widetilde{D}_U$ by $\widetilde{D}_U(x) = s^x / m$. Then the approximation we will use for $\ell_1(D_U, p_U)$ is $\ell_1(\widetilde{D}_U, p_U)$.

Discussion
Because of its independence from probability values themselves, the ℓ1-norm distance is regarded as a fair measure of the closeness of distributions. That is to say, it is reasonable to require quantum computers to produce samples from distributions within some ℓ1-norm distance of the ideal distribution. This might not be true for measures of distance, such as multiplicative error, which require that zero probability outcomes be preserved in the presence of noise, but for which very strong connections to quantum computational supremacy also exist [47].

3.4 Metric Comparison

Unfortunately, Cross-Entropy Benchmarking and Heavy Output Generation Benchmarking cannot be used to bound the ℓ1-norm distance [54], which, as noted in Section 3.3, provides strong guarantees of demonstrations of quantum computational supremacy.^10 That is, the ℓ1-norm distance provides uniquely (amongst the metrics studied here) strong assurances about quantum computational supremacy. This comes at the cost of requiring full state vector simulation to calculate it, consuming memory which grows exponentially in the number of qubits. As the circuit widths approach those large enough to demonstrate quantum computational supremacy, memory requirements become the bottleneck [63].

^10 Interestingly, empirical results show a slight negative correlation between ℓ1-norm distance and experimental heavy output probability (normalized to the heavy output probability of an ideal device). This relationship is discussed, with empirical details, in Appendix D.

In the case of Heavy Output Generation Benchmarking and Cross-Entropy Benchmarking only polynomially many single output probabilities are required, allowing the utilisation of Feynman simulators [44]. These compute output bit string amplitudes by adding all Feynman path contributions. This extends the domain of classical simulation by overcoming the memory storage problem, establishing the frontier of what is possible on classical computers [3, 64, 65]. However, this method still requires exponential time to perform and so reaches its own limit for large numbers of qubits.

Since $\mathrm{HOG}(D_U, p_U)$ and $\mathrm{CE}(D_U, p_U)$ are expectations of different functions of ideal output probabilities, $\delta(p_U)$ and $-\log(p_U)$ respectively, over the experimental output distribution, they capture different features of the outputs [54]. In fact $\mathrm{HOG}(D_U, p_U)$ can also be used to approximate circuit fidelity; however, the standard deviation of the estimator is larger than that for $\mathrm{CE}(D_U, p_U)$ [3].

4 Quantum Computing Stack

Each component of a quantum computing stack exerts an influence on overall performance, and identifying the distinct impact of a particular component is often hard. To disentangle these factors, we must clearly identify the components used during benchmarking. Here we detail the components used to build the quantum computing stacks explored in Section 5. The diverse selection of components allows us to investigate a variety of ways of building a quantum computing stack.

4.1 Software Development Kits

We use a combination of tools available via pytket [10, 66] and Qiskit [1, 67]. pytket is a Python module which provides an environment for constructing and implementing quantum circuits, as well as for interfacing with CQC's t|ket⟩, a retargetable compiler for near term quantum devices featuring hardware-agnostic optimisation. Qiskit is an open-source quantum computing software development framework for programming, simulating, and interacting with quantum processors, which also provides a compiler. Details of the versions of the software used are given in Table 2 of Appendix B.

We use three parts of Qiskit in this work. First is the transpiler architecture, which enables users to define custom compilation strategies by executing a series of passes on the input circuit, as discussed in Section 4.2. The second part of Qiskit we use is the library of predefined passes. Finally, a provider is used to access hardware made available over the cloud by IBM Quantum. The provider enables users to send circuits to hardware, retrieve results, and query the hardware for its properties.^11

^11 These properties include the graph connectivity, single- and two-qubit error rates, and qubit T1 and T2 times. Some of the noise-aware compilation strategies we use require knowledge of these properties.

Similarly, we use pytket to generate and manipulate circuits in several ways. Firstly, we use the t|ket⟩ compiler to construct compilation strategies which optimise the input circuit for the target hardware, utilising predefined passes available in t|ket⟩. Secondly, we use pytket to define abstract circuits and to convert t|ket⟩'s native representation of the circuit into a Qiskit QuantumCircuit object, which is then dispatched to IBM Quantum's systems for execution.

4.2 Compilers

Compilers provide tools to construct executable quantum circuits from abstract circuit models. This is done by defining passes which may manipulate a representation of a quantum circuit, often by taking account of limited connectivity architectures, or minimising quantities such as gate depth, but need not perform any manipulation.^12 These passes are composed to form compilation strategies which should output executable quantum circuits.

^12 An example of this is a pass which counts the gates in the circuit.

Quantum compiling is an active area of research [68–75], and there are many pieces of software available for quantum compiling. As noted above, in this work we use two: t|ket⟩ and the compiler available in Qiskit.

For the purposes of this work, the problem of quantum compilation is divided into three tasks.

Placement: Determine onto which physical qubits of a given device the virtual qubits in the circuit's representation should be initially mapped.

Routing: Modify a circuit to conform to the qubit layout of a specific architecture, for example, by inserting SWAP gates to allow non-adjacent qubits to interact [76]. Circuits are rarely designed with the device's coupling map in mind, so this step is important [12].

Optimisation: Work to minimise some property of a circuit. This may be gate count or depth, which is done to improve implementation accuracy by reducing the impact of noise.

Each of these tasks could consider such things as the trade-offs between the connectivity of a particular subgraph of the device and the amount of crosstalk present in that subgraph [77].

Both pytket and Qiskit have multiple placement, optimisation, and routing passes. We compare the performance of 5 compilation strategies built from these passes. Two of them, noise-unaware pytket and noise-unaware Qiskit, compile the circuit without knowledge of the device's noise properties. Another two, noise-aware pytket and noise-aware Qiskit, do take noise properties into account. As a baseline, we consider a simple compilation strategy from pytket using only routing, without optimisation or noise-awareness; we refer to this strategy as only pytket routing. We detail these schemes in Appendix B. The main difference between the noise-aware schemes is that noise-aware pytket prioritises the minimisation of gate errors during placement,^13 whereas noise-aware Qiskit prioritises readout and CX errors [73].

^13 This is true of pytket 0.3.0; as a result of this work, later versions of pytket take into account readout error.

4.3 Devices

We benchmark some of the devices made available over the cloud by IBM Quantum. The devices we use are referred to by the unique names ibmqx2, ibmq_16_melbourne, ibmq_singapore and ibmq_ourense. Each device has a set of native gates into which all gates in a given circuit must be decomposed. For all the devices considered here, the native gates are: an identity operation, I; 3 "u-gates" [78], as defined in equation (10); and a

controlled-NOT (CX) gate.

$$
\begin{aligned}
U_3(\theta, \phi, \lambda) &= \begin{pmatrix} \cos\left(\frac{\theta}{2}\right) & -e^{i\lambda} \sin\left(\frac{\theta}{2}\right) \\ e^{i\phi} \sin\left(\frac{\theta}{2}\right) & e^{i(\lambda + \phi)} \cos\left(\frac{\theta}{2}\right) \end{pmatrix} \\
U_2(\phi, \lambda) &= U_3\left(\frac{\pi}{2}, \phi, \lambda\right) \\
U_1(\lambda) &= U_3(0, 0, \lambda)
\end{aligned} \tag{10}
$$

Two of the device properties used by the noise-aware compilation strategies are their connectivity and calibration data. The connectivity of a device refers to the connectivity of the graph representing how the qubits are coupled to one another. This information is contained in a device's coupling map; the coupling maps of the devices studied here are shown in Appendix C.1 and summarised in Table 1.

Device calibration data includes information about single- and two-qubit error rates, readout error, and qubit frequency, T1, and T2 times. The noise-aware compilation strategies we investigate use the gate error rates and readout error. Full details of noise levels can be found in Appendix C.2 with average values given in Figure 1. This information is updated twice daily, with the data in Figure 1 averaged over the period 2020-01-29 to 2020-02-10 during which time our experiments were conducted.

The results of Section 5 depend heavily on the noise levels of the device at the time at which the computation is implemented. This is doubly true in the case of the noise-aware optimisation schemes, as a circuit optimised at one time may not perform as well later, once the noise levels of the devices have changed. To reduce this effect we endeavoured to compile and run circuits within as short a time interval as possible.

5 Experimental Results

In this section we identify, using the benchmark suite defined in Section 2 and Section 3, which properties of the different levels of the quantum computing stacks of Section 4 result in the best performance. This allows us to suggest means to extract as much computing power as is possible from the devices available now, and in the near future. The benchmark suite is used here to probe the performance of quantum computing stacks in three ways:

Full Stack Benchmarking: In Section 5.1 we perform benchmarks of the full quantum computing stack. Incorporating and thoroughly investigating the compilation strategy, in particular, helps develop an understanding of how circuit compilation influences the performance of the quantum computing stack. In the case of noise-aware compilation strategies, this also highlights how the assumptions made by the strategy about the importance of different kinds of noise impact performance.

Application Motivated Benchmarks: In Section 5.2, by including three quite different circuit classes in our benchmark suite, we explore how a quantum computing stack may perform when implementing a wide array of applications.

Insights from Classical Simulation: In Section 5.3 we explore how benchmarks themselves can assist in the task of developing new noise models. By identifying when benchmark values for real implementations differ from those we expect from simulations using noise models, we can identify noise channels which should be added to the noise models to achieve greater agreement with real devices. This is of particular importance as noise-aware compilation strategies often utilise noise properties.

In the following subsections we present results on each of these topics. For each circuit class and fixed number of qubits, 200 circuits were generated according to the circuit generation algorithms of Section 2. Each circuit is compiled by a given compilation strategy onto a particular device. The compiled circuits were then run on the device, using 8192 repetitions (samples) from each compiled circuit, which generates 8192 bitstrings. The compiled circuits are also classically simulated using a noise model built from the device calibration information at the time of the device run. See Data Availability for access to the full experimental data set.

The resulting bitstrings are then processed according to the figures of merit given in Section 3. The distributions of the figures of merit are compared by their mean, and by their shape, which is aggregated into a box-and-whisker plot. Uncompiled circuits were also perfectly simulated without noise in order to calculate the ideal heavy output probability. These points are referred to as Noise-Free in the figures below.

Device              Vertices  Average Degree  Radius  Minimum Cycle Length
ibmqx2              5         2.4             1       3
ibmq_16_melbourne   15        2 2/3           4       4
ibmq_ourense        5         1.6             2       N/A
ibmq_singapore      20        2.3             4       6

Table 1: Selected graph properties of the coupling maps of devices studied in this work. This table displays: the number of vertices in the graph (corresponding to the number of qubits on the device); the average degree, which is the mean number of edges incident on each vertex; the radius, which is the minimax distance over all pairs of vertices; and the minimum cycle length, which is the smallest number of edges per cycle over all cycles of the graph. See Appendix C.1 for full details of the coupling maps of the devices explored here.

[Figure 1 plot: three bar-chart panels of error rate by device. (a) Readout Error. (b) Error per U2 gate. (c) Error per CX gate.]

Figure 1: Average error rates across devices used in this work. Bars show the mean error rates across the whole device, while error bars give the standard deviation. Devices shown here are: ibmqx2, ibmq_ourense, ibmq_singapore, ibmq_16_melbourne. Further details can be found in Appendix C.2.

5.1 Full Stack Benchmarking

Impact of the Compilation Strategy

The two layers of the quantum computing stack we study are the compilation strategy and the device on which the compiled circuit is run. Using a fixed device and comparing multiple compilation strategies allows us to determine which strategy tends to perform well. Further, we aggregate performance over all compilation strategies as a way of estimating the performance of a "generic" strategy. Similarly, using a fixed strategy and comparing its performance on multiple devices enables a study of how the assumptions made by the strategy about the devices impact performance when those assumptions don't always hold.

Figure 2 displays experimental results making this comparison when implementing square circuits on ibmq_16_melbourne, using heavy output generation probability as the figure of merit. The noise-aware pytket compilation strategy performs somewhat better, on average, than a generic strategy. Because the aggregated information ("All Strategies" in Figure 2) includes aggregation over noise-aware pytket, these results indicate that other compilation strategies perform a bit worse, since the performance of the aggregate is generally lower than that of noise-aware pytket.^14 This reveals both the potential for compilation strategy driven improvements in performance, and the insights into such improvements which full quantum computing stack benchmarking brings.

^14 Note that because we aggregate over all 5 compilation strategies, the distribution of heavy output probabilities in the "All Strategies" category contains five times as many points as that for noise-aware pytket.

Aggregation over compilation strategies is not only useful for identifying strategies which are better in general. Doing so also provides a way of identifying devices which perform well, by "washing out" the effect of the compilation strategy on performance. That is, the strong performance (using a systems-level benchmark) of a given device might be caused by the compilation strategy; to reduce the effect of the strategy, aggregation over several can be done.

For example, Figure 3 shows that by considering performance with a fixed compilation strategy (in this case, noise-aware pytket), ibmq_singapore would be considered to perform similarly to, if not slightly better than, ibmq_ourense, as measured by ℓ1-norm distance. However, aggregating over all strategies, as is done in Figure 4, shows ibmq_ourense to perform better. This suggests that ibmq_ourense might be a better device for a "generic" compilation strategy to compile to.

An instance-by-instance comparison of different compilation strategies also helps us understand their limitations. For example, Figure 5 reveals noise-aware pytket works best at reproducing the ideal distribution of heavy output probabilities of square circuits on ibmq_16_melbourne. This is likely in part due to the routing scheme, as is revealed by the strong performance of only pytket routing.

Similarly, Figure 6 shows that noise-aware pytket is amongst the worst-performing compilation strategies for lower numbers of qubits, while it is amongst the best-performing for higher numbers. This could be a result of the way in which noise-aware pytket prioritises noise in its routing scheme, with gate errors taking precedence.^15

^15 For larger numbers of qubits and deeper circuits, gate errors become more impactful on the total noise, which materialises as an advantage for noise-aware pytket for larger numbers of qubits.

These results highlight the fact that full-stack benchmarking can help provide a more detailed understanding of how the components of a system affect performance.

Noise Level, Connectivity Trade Off

Another particularly important example of this is the examination of the connectedness of the device and its noise levels. More highly-connected architectures typically allow for shallower implementations of a given circuit as compared to less-connected ones, but the noise levels in a more highly-connected architecture may be higher due to crosstalk [79]. This creates a trade-off between

[Figure 2 plot: box-and-whisker chart. Vertical axis: Heavy Outputs Probability; horizontal axis: Number of Qubits (2 to 5). Strategies: noise-aware pytket, Noise-Free, All Strategies.]

Figure 2: Comparison of fixed compilation strategy to average of all strategies, using the heavy outputs probability metric, when running square circuits using the real ibmq_16_melbourne device. Boxes show quartiles of the dataset while the whiskers extend to 1.5 times the IQR past the low and high quartiles. White circles give the mean.
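The summary statistics behind a box-and-whisker plot of this kind can be sketched as follows. This is an illustration only: the function name is hypothetical, the quartile interpolation method is one of several in common use, and the whiskers are assumed to end at the most extreme data points lying within 1.5 times the IQR of the quartiles, which is a common plotting convention rather than something the caption specifies.

```python
from statistics import mean, quantiles


def box_plot_summary(values):
    """Quartiles, whisker ends and mean of one distribution of benchmark values."""
    data = sorted(values)
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers stop at the furthest data points that are still inside the fences.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": q2,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "mean": mean(data),
    }


# Made-up values with one outlier, to show the mean and median diverging.
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 100])
```

Reporting the mean alongside the box, as the white circles in the figures do, exposes skew that the quartiles alone hide: in the made-up data above the single outlier pulls the mean well above the upper quartile.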

[Figure 3 plot: two box-and-whisker panels, "Implementation on Real Device" and "Classical Simulation Using Qiskit Noise Model". Vertical axis: ℓ1 Norm Distance; horizontal axis: Number of Qubits (2 to 7). Devices: ibmq_singapore, ibmqx2, ibmq_16_melbourne, ibmq_ourense.]

Figure 3: Comparison of devices, using the ℓ1-norm distance metric, when running shallow circuits compiled using noise-aware pytket. Both simulations using Qiskit noise models, and implementations on real devices, are included. Boxes show quartiles of the dataset while the whiskers extend to 1.5 times the IQR past the upper and lower quartiles. White circles give the mean.

[Figure 4 plot: box-and-whisker chart. Vertical axis: ℓ1 Norm Distance; horizontal axis: Number of Qubits (2 to 7). Devices: ibmq_singapore, ibmqx2, ibmq_16_melbourne, ibmq_ourense.]

Figure 4: Comparison of real devices, using the ℓ1-norm distance metric, when running shallow circuits compiled using all compilation strategies. Here we compile onto each device using all compilation strategies, including all compiled circuits in this plot. Boxes show quartiles of the dataset while the whiskers extend to 1.5 times the IQR past the upper and lower quartiles. White circles give the mean.
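The ℓ1-norm distance values reported in Figures 3 and 4 are estimated from samples as described in Section 3.3. A minimal sketch of that estimate is given below; the function names are hypothetical, and the ideal distribution is assumed to be supplied as a dictionary over all $2^n$ bitstrings.

```python
from collections import Counter
from itertools import product


def empirical_distribution(samples, n_qubits):
    """Relative frequency s^x / m of every n-bit string x among the m samples."""
    counts = Counter(samples)
    m = len(samples)
    bitstrings = ("".join(bits) for bits in product("01", repeat=n_qubits))
    return {x: counts[x] / m for x in bitstrings}


def l1_distance(d, d_prime):
    """Equation (9): sum over x of |D(x) - D'(x)|."""
    return sum(abs(d[x] - d_prime[x]) for x in d_prime)


# Toy two-qubit example with made-up samples.
ideal = {"00": 0.5, "01": 0.5, "10": 0.0, "11": 0.0}
samples = ["00", "00", "01", "11"]
estimated = empirical_distribution(samples, n_qubits=2)
distance = l1_distance(estimated, ideal)
```

The distance lies between 0, for identical distributions, and 2, for distributions with disjoint support. With few samples the empirical distribution is itself noisy, so the estimate is generally non-zero even for a perfect device.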

[Figure 5 plot: box-and-whisker chart. Vertical axis: Heavy Outputs Probability; horizontal axis: Number of Qubits (2 to 5). Strategies: noise-aware pytket, noise-aware Qiskit, noise-unaware pytket, only pytket routing, noise-unaware Qiskit, Noise-Free.]

Figure 5: Comparison of compilation strategies, using the heavy outputs probability metric, when square circuits are run on the real ibmq_16_melbourne device. Boxes show quartiles of the dataset while the whiskers extend to 1.5 times the IQR past the upper and lower quartiles. White circles give the mean.

[Figure 6 plot: box-and-whisker chart. Vertical axis: ℓ1-Norm Distance; horizontal axis: Number of Qubits (2 to 5). Strategies: noise-aware pytket, noise-aware Qiskit, noise-unaware pytket, only pytket routing, noise-unaware Qiskit.]

Figure 6: Comparison of compilation strategies, using the ℓ1-norm distance metric, when shallow circuits are run on the real ibmq_ourense device. Boxes show quartiles of the dataset while the whiskers extend to 1.5 times the IQR past the low and high quartiles. White circles give the mean.

connectivity and the total amount of noise incurred when running a computation.

As noise affects the accuracy of the computation, this trade-off has practical implications for the performance of a device. Indeed, reducing the connectivity between superconducting qubits is used as a tool to reduce noise levels [79]. This can also be counteracted by decoupling qubits [3] but this is not utilised in the devices studied here.^16

^16 While we focus on the connectivity of superconducting architectures here, more generally the comparison between the limited connectivity of superconducting devices and the completely connected coupling maps of ion trap devices is of interest [14, 21, 23].

Figure 7 shows that devices with lower noise levels (ibmq_singapore and ibmq_ourense) typically outperform devices with higher noise levels (ibmqx2 and ibmq_16_melbourne) despite the latter's higher connectivity. An interesting exception to this is for 4 qubits, where ibmq_16_melbourne performs best, likely because of the 4-qubit cycles in its connectivity graph. This reduces the SWAP operations necessary for implementing the circuit, reducing the overall circuit depth. This reveals the increase in performance that can be expected when the connectivity of the device and the problem instance are similar [12]. Similar results hold for Cross-Entropy Benchmarking, as shown in Figure 8.

In general, we expect that circuits whose structure can naturally be mapped to the connectivity of the device will perform well, whereas those which cannot, will not. In general though, lower-noise devices will tend to perform best.

Comparison with Previous Results

As discussed in Section 2.2, our definition of square circuits differs from previous experiments [38], by allowing for all-to-all connectivity before compilation, as opposed to utilising permutation layers. However, as a naive compilation of square circuits onto a one-dimensional, nearest-neighbour connectivity would recreate the same circuits as used in [38], we might expect the results from our experiments to be similar, and a comparison of the results from these experiments is warranted. Further, in the case of superconducting devices, as are explored here, we would expect the circuit after compilation to be similar, as many SWAP operations will need to take place in both cases.

Figure 7 shows that all quantum computing stacks explored here produce heavy outputs with probability greater than 2/3 on average for circuits acting on at most 3 qubits, with ibmq_ourense typically performing best. Previous results reported that ibmq_singapore could demonstrate a quantum volume, as defined in [38], of $2^4$ [80]. That our experiments produce different results is surprising, given their aforementioned similarity. One uncontrollable variable is the change in the device over the time between previous experiments and this one, which may influence this discrepancy. Indeed, studying the change in the value of volumet-

19 Implementation on Real Device

1.0

0.9

0.8

0.7

0.6

0.5 Device: [ ] ibmq_singapore 0.4 [ ] ibmq_16_melbourne Classical Simulation Using Qiskit Noise Model [ ] ibmq_ourense 1.0 [ ] ibmqx2 0.9 [ ] Noise-Free

0.8

Heavy Outputs0.7 Probability

0.6

0.5

0.4 2 3 4 5 Number of Qubits

Figure 7: Comparison of devices, using the heavy outputs probability metric, when running square circuits compiled using noise-aware pytket. Both simulations using Qiskit noise models, and implementations on real devices, are included. Boxes show quartiles of the dataset while the whiskers extend to 1.5 times the IQR past the upper and lower quartiles. White circles give the mean.

20 Implementation on Real Device

1.25

1.00

0.75

0.50

0.25

0.00

0.25 Device: Classical Simulation Using Qiskit Noise Model [ ] ibmq_ourense 1.25 [ ] ibmqx2

1.00

0.75

Cross Entropy Difference 0.50

0.25

0.00

0.25

2 3 4 5 Number of Qubits

Figure 8: Comparison of devices, using the cross entropy difference metric, when running square circuits compiled using noise-aware pytket. Both simulations using Qiskit noise models, and implementations on real devices, are included. Boxes show quartiles of the dataset while the whiskers extend to 1.5 times the IQR past the upper and lower quartiles. White circles give the mean.

21 ric benchmarks over time would be interesting and has comparable performance to ibmq_ourense for useful, although no such effort is known to us. smaller numbers of qubits. ibmq_singapore outper- forms ibmq_ourense by having more qubits avail- 5.2 Application Motivated Bench- able. This superior performance of ibmq_singapore marks is in comparison to the results of Figure7, where ibmq_ourense was shown to perform well. This jus- The same quantum computing stack will perform tifies our suggestion that shallow circuits should differently when running different applications, as be included in benchmarking suites. Doing so al- the structure of the circuits they require will gen- lows for the exploration of higher qubit requirement erally be different. Differences in performance are computations. In this setting devices that perform seen in the context of our application-motivated poorly when implementing square circuits or deep benchmarks. For example, consider Figure9, which circuits may perform well. shows performance when implementing sparsely connected circuits, and Figure 10, which shows per- Shallow Circuits and `1-Norm Distance formance when implementing chemistry-motivated circuits. In the case of Figure9, the ibmqx2 device Theorem1 provides a convenient criterion for suc- outperforms ibmq_singapore, while in the case of cess in implementing shallow circuits; namely an Figure 10 the reverse is true. `1-norm distance of not more than 1/192 from the ideal distribution. Figure4 explores the closeness Quantum Chemistry to a successful implementation and reveals that, on average over all compilation strategies, the best Figure 10 suggests ibmq_ourense is best for quan- performing device is ibmq_ourense. tum chemistry applications, because it performs Figure6 explores the best performing compila- 17 well when running deep circuits . In particular tion strategies for ibmq_ourense. 
It shows that, Figure 10 indicates that the average circuit fidelity compared to the other strategies, the mean `1- is highest for implementations on ibmq_ourense. norm distance is marginally smaller for noise-aware In Figure 10, all devices converge to the mini- pytket when the number of qubits is larger, while mum value of cross-entropy difference at 4 qubits. noise-unaware Qiskit and noise-aware Qiskit perform To extend an investigation of this sort to more well for fewer qubits. We explore the perfor- qubits would require lower noise levels or chemistry mance of noise-aware pytket further, as the in- motivated circuits which generate exponentially stances with higher qubit counts are relevant for distributed output probabilities at lower depth. use cases of quantum computers. Indeed, Figure3 shows ibmq_ourense and ibmq_singapore perform Shallow Circuits as a Benchmark similarly when the noise-aware pytket optimiser is Figure 11 demonstrates that shallow circuits allow used, despite ibmq_ourense performing better on us to benchmark the behaviour of a quantum com- average. This is likely because ibmq_singapore puting stack for applications involving circuits with has a sub lattice with comparable noise levels to many qubits but low circuit depth [35, 43, 46]. In ibmq_ourense, which noise-aware pytket is able to this case we are able to continue our analysis, be- isolate, while on average the levels are higher. yond that of Figure7, of those devices which per- No device consistently brings the `1-norm dis- form sufficiently well for a smaller number of qubits, tance to within 1/192 of the ideal. However, and which have architectures including more qubits ibmq_ourense seems to slightly outperform the to investigate. other devices, showing the benefit of lower noise The results show ibmq_singapore outperforms levels over high connectivity or high numbers of the comparably sized ibmq_16_melbourne and qubits. 
While this methodology would be impos- sible to extend to the demonstrations of quantum 17This comes with the caveat, as mentioned in Section 2.3, computational supremacy, we hope that exploring that the connection between the quality of an implementa- it for these quantum computational supremacy re- tion of these computational primitives, as measured by this benchmark, and accurate ground state energy calculations lated circuits will provide insights into the best in VQE has not been demonstrated experimentally. quantum computing stack for such demonstrations.
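As a minimal sketch of the success criterion discussed above, the following estimates an ℓ1-norm distance between an ideal output distribution and the empirical distribution of measured bitstrings, then compares it with the 1/192 threshold. It assumes the distance is the plain sum of absolute differences over bitstrings (the paper's own definition is given in Section 3.3 and may differ by a constant factor). The function name `l1_distance` and the toy distribution and samples are hypothetical illustrations, not data from the experiments.

```python
from collections import Counter

# Threshold quoted in the text for a successful shallow-circuit implementation.
THRESHOLD = 1 / 192


def l1_distance(ideal_probs, samples):
    """Sum of |p_ideal(x) - p_empirical(x)| over all bitstrings x.

    ideal_probs: dict mapping bitstring -> ideal probability (assumed known
                 from classical simulation).
    samples:     list of bitstrings returned by the device.
    """
    counts = Counter(samples)
    total = len(samples)
    support = set(ideal_probs) | set(counts)
    return sum(
        abs(ideal_probs.get(x, 0.0) - counts[x] / total) for x in support
    )


# Hypothetical 2-qubit example: the ideal circuit outputs 00 or 11 only.
ideal = {"00": 0.5, "01": 0.0, "10": 0.0, "11": 0.5}
samples = ["00"] * 480 + ["11"] * 500 + ["01"] * 20

dist = l1_distance(ideal, samples)
meets_criterion = dist <= THRESHOLD
print(f"l1 distance = {dist:.4f}, within 1/192: {meets_criterion}")
```

Because the empirical distribution is built from a finite number of samples, an estimate of this kind carries statistical error of its own, which is worth bearing in mind when comparing against a threshold as small as 1/192.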

[Figure 9 plot: box plots of Cross Entropy Difference (vertical axis) against Number of Qubits (horizontal axis, 2 to 7). Legend (Device): ibmq_singapore, ibmqx2, ibmq_16_melbourne, ibmq_ourense.]

Figure 9: Comparison of real devices, using the cross entropy difference metric, when running shallow circuits compiled using noise-aware Qiskit. Boxes show quartiles of the dataset while the whiskers extend to 1.5 times the IQR past the upper and lower quartiles. White circles give the mean.
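For readers unfamiliar with the figure of merit plotted here, the following is a minimal sketch of one common estimator of the cross-entropy difference. It assumes the definition of Boixo et al. [29], namely the cross entropy of a uniform sampler, log(2^n) + γ with γ Euler's constant, minus the sample average of −log p_ideal(x); the paper's own definition is in Section 3.2. The function name and the toy distribution are hypothetical, and the ideal probability of every sampled bitstring is assumed to be non-zero.

```python
import math

EULER_GAMMA = 0.5772156649015329


def cross_entropy_difference(ideal_probs, samples, n_qubits):
    """Estimate the cross-entropy difference from device samples.

    ideal_probs: dict mapping bitstring -> ideal (simulated) probability.
    samples:     list of bitstrings measured on the device.
    Values near 1 are expected for ideal sampling from exponentially
    distributed output probabilities; values near 0 for uniform sampling.
    """
    # Cross entropy between a uniform sampler and the ideal distribution,
    # under the assumption of exponentially distributed probabilities.
    h_uniform = n_qubits * math.log(2) + EULER_GAMMA
    # Sample estimate of the cross entropy between device and ideal outputs.
    h_device = -sum(math.log(ideal_probs[s]) for s in samples) / len(samples)
    return h_uniform - h_device


# Hypothetical 2-qubit ideal distribution and four measured bitstrings.
ideal = {"00": 0.5, "01": 0.25, "10": 0.125, "11": 0.125}
samples = ["00", "00", "01", "10"]

ced = cross_entropy_difference(ideal, samples, n_qubits=2)
print(f"cross-entropy difference = {ced:.4f}")
```

Only the ideal probabilities of the bitstrings actually sampled are needed, which is why a polynomial number of samples suffices even though computing those probabilities is classically expensive.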

[Figure 10 plot: two box-plot panels, "Implementation on Real Device" and "Classical Simulation Using Qiskit Noise Model". Horizontal axis: Number of Qubits (2 to 4). Vertical axis: Cross Entropy Difference. Legend (Device): ibmq_singapore, ibmq_16_melbourne, ibmq_ourense, ibmqx2.]

Figure 10: Comparison of devices, using the cross entropy difference metric, when running deep circuits compiled using noise-aware Qiskit. Both simulations using Qiskit noise models, and implementations on real devices, are included. Boxes show quartiles of the dataset while the whiskers extend to 1.5 times the IQR past the upper and lower quartiles. White circles give the mean.
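The box-plot convention used in these captions can be made concrete with a short sketch. It assumes the widespread plotting convention in which each whisker ends at the most extreme data point lying within 1.5 times the IQR of the nearer quartile, and it assumes linearly interpolated ("inclusive") quartiles; the plotting code used for the figures is not shown in the source, so both choices are assumptions. The function name and input values are hypothetical.

```python
from statistics import mean, quantiles


def box_plot_summary(values, whisker_factor=1.5):
    """Return the quartiles, whisker ends and mean drawn in a box plot."""
    data = sorted(values)
    # Linearly interpolated quartiles over the closed range of the data.
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whisker_factor * iqr
    high_fence = q3 + whisker_factor * iqr
    return {
        "q1": q1,
        "median": q2,
        "q3": q3,
        # Whiskers stop at the last data point inside each fence.
        "whisker_low": min(v for v in data if v >= low_fence),
        "whisker_high": max(v for v in data if v <= high_fence),
        "mean": mean(data),
    }


# Hypothetical per-circuit scores with one outlier.
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 100])
print(summary)
```

In this toy dataset the outlier pulls the mean well above the upper quartile while leaving the box and whiskers unchanged, which is one reason the figures report the mean (white circles) alongside the quartiles.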

[Figure 11 plot: box plots of Heavy Outputs Probability (vertical axis) against Number of Qubits (horizontal axis, 2 to 7). Legend (Device): ibmq_singapore, ibmqx2, ibmq_16_melbourne, ibmq_ourense, Noise-Free.]

Figure 11: Comparison of real devices, using the heavy outputs probability metric, when running shallow circuits compiled using noise-aware pytket. Boxes show quartiles of the dataset while the whiskers extend to 1.5 times the IQR past the upper and lower quartiles. White circles give the mean.
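The heavy outputs probability plotted here can be estimated from samples as in the following minimal sketch. It assumes the usual definition, in which a bitstring is "heavy" if its ideal probability is strictly greater than the median of the ideal output probabilities, and the figure of merit is the fraction of measured bitstrings that are heavy; the paper's definition is in Section 3.1. Function names and the toy data are hypothetical.

```python
from statistics import median


def heavy_outputs(ideal_probs):
    """Bitstrings whose ideal probability exceeds the median probability."""
    med = median(ideal_probs.values())
    return {x for x, p in ideal_probs.items() if p > med}


def heavy_output_probability(ideal_probs, samples):
    """Fraction of measured bitstrings that are heavy outputs."""
    heavy = heavy_outputs(ideal_probs)
    return sum(1 for s in samples if s in heavy) / len(samples)


# Hypothetical 2-qubit ideal distribution (median probability 0.25).
ideal = {"00": 0.4, "01": 0.1, "10": 0.3, "11": 0.2}
# Ten hypothetical measurement outcomes from a device.
samples = ["00"] * 5 + ["10"] * 3 + ["01"] + ["11"]

hop = heavy_output_probability(ideal, samples)
print(f"heavy outputs probability = {hop:.2f}, above 2/3: {hop > 2 / 3}")
```

A uniformly random sampler would score about 1/2 on this measure, which is the baseline against which the 2/3 threshold mentioned in Section 5.1 is compared.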

5.3 Insights from Classical Simulation

The noise present in a non-fault-tolerant quantum computer results in discrepancies between results obtained from running on real hardware and those that would be obtained from an ideal quantum computer. Often, noise models are utilised during classical simulation to investigate the effects of noise and help identify why these discrepancies occur [81]. However, a perfect model of the noise, which could reproduce the results of real hardware (up to statistical error), could require many parameters to completely specify it. Therefore, most noise models consider only a small handful of physical effects. Consequently, discrepancies between the results of noisy simulation and running experiments on real hardware always remain.

Historically, closing the gap between noisy simulation and real hardware required developing noise models of increasing sophistication. Developing them typically requires a great deal of physics expertise to identify new noise channels. Further, new experiments would have to be designed in order to estimate their parameters in the noise model.

Here, we suggest some of the benchmarks conducted in this work could be helpful in identifying whether new noise channels should be incorporated into a noise model. In particular, by isolating the circuit types and coupling maps for which the discrepancies are greatest, it is possible to speculate about the possible causes of the mismatch.

This investigation could also influence the performance of noise-aware compilation strategies, which use properties of the noise. Verifying the accuracy of these noise properties, through verification of the accuracy of the resulting noise models, could improve the performance of noise-aware strategies.

For the devices explored here, the noise models are built using Qiskit. They are derived from a device's properties and include one- and two-qubit gate errors¹⁸ and single-qubit readout errors. We find these noise models are inadequate to explain some of the discrepancies observed in the data.

¹⁸ These are modelled to consist of a depolarising error followed by a thermal relaxation error.

Noise Does Not Just Flatten Distributions

One discrepancy between experiments and noisy simulations is the spread of the data. For example, Figure 7 shows that only in the experimental case do the whiskers of the plot fall below the value 0.5, indicating the heavy outputs are less likely than they would be in the uniform distribution. Some noise type, in particular one which shifts the probability density, rather than uniformly flattening it, is not considered, or is underappreciated, by the noise models used. Identifying that noise channel is left to future work, though we speculate it may be related to a kind of thermal relaxation error.
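The argument above can be illustrated with a toy model. The sketch below assumes that "uniformly flattening" noise acts as a global mixture of the ideal distribution with the uniform distribution, with mixing weight `lam`; this is a deliberately simplified stand-in, not the Qiskit noise model used in the experiments. Under that assumption the expected heavy outputs probability interpolates between its ideal value and 1/2, so it never drops below 1/2. All names and numbers are hypothetical.

```python
from statistics import median


def flattened(ideal_probs, lam):
    """Mix the ideal distribution with the uniform one (weight lam)."""
    n_outcomes = len(ideal_probs)
    return {
        x: (1 - lam) * p + lam / n_outcomes for x, p in ideal_probs.items()
    }


def expected_hop(ideal_probs, actual_probs):
    """Probability that a sample from actual_probs is an ideal heavy output."""
    med = median(ideal_probs.values())
    return sum(actual_probs[x] for x, p in ideal_probs.items() if p > med)


# Hypothetical 2-qubit ideal distribution; heavy outputs are 00 and 10.
ideal = {"00": 0.4, "01": 0.1, "10": 0.3, "11": 0.2}

strengths = [0.0, 0.25, 0.5, 0.75, 1.0]
hops = [expected_hop(ideal, flattened(ideal, lam)) for lam in strengths]
for lam, hop in zip(strengths, hops):
    print(f"lam = {lam:.2f}: expected heavy outputs probability = {hop:.3f}")
```

In this model an expected value below 1/2 would require noise that moves probability mass preferentially onto non-heavy bitstrings, which is the kind of shift the text speculates is missing from the noise models. Finite sampling can also push individual estimates below 1/2, so the sketch speaks only to expected values.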

Noise Models Under-Represent Some Noise Channels

The classical simulations in Figure 7 suggest ibmqx2 should perform similarly to ibmq_ourense in most cases. In fact, it quite consistently performs worse. This is isolated in Figure 8, with the same phenomenon being observed in Figure 3 and Figure 10, showing the behaviour is consistent across all circuit types and figures of merit.

This difference between simulated and experimental results is pronounced in the case of Figure 10, where deep circuits are used. This suggests the noise models may be underestimating the error from time-dependent noise, such as depolarising and dephasing, or from two-qubit gates, which are more prevalent in deep circuits.

Another such example of a two-qubit noise channel, which is explicitly not accounted for in the noise models, is crosstalk. The results in Figure 8 are consistent with the expectation that crosstalk should have the greatest impact on more highly connected devices [79]. As such, crosstalk may be the origin of the discrepancy. Of note is the fact this benchmark wasn't explicitly designed to capture the effects of crosstalk, and yet those effects manifest themselves in its results. We anticipate that including crosstalk-aware passes in compilation strategies [77] would reduce the discrepancy.

6 Conclusion

The performance of quantum computing devices is highly dependent on several factors. Amongst them are the noise levels of the device, the software used to construct and manipulate the circuits implemented, and the applications for which the device is used. The impacts of these factors on the performance of a quantum computing stack are intertwined, making the task of predicting its holistic performance from knowledge of the performance of each component impossible. In order to understand and measure the performance of quantum computing stacks, benchmarks must take this into consideration.

In this work we have addressed this problem by introducing a methodology for performing application-motivated, holistic benchmarking of the full quantum computing stack. To do so we provide a benchmark suite utilising differing circuit classes and figures of merit to assess a variety of properties of the device. This includes the use of three circuit classes: deep circuits and shallow circuits, which are novel to this paper; and square circuits, which resemble random circuits used in other benchmarking experiments [38]. In addition we make use of a diverse selection of figures of merit to measure the performance of the quantum computing stacks considered, namely: Heavy Output Generation Benchmarking, Cross-Entropy Benchmarking, and the ℓ1-norm distance.

In particular, in the form of deep circuits we present an alternative to previous approaches to application-motivated benchmarking. This is by considering circuits inspired by one of the primitives utilised in VQE, namely Pauli gadgets employed for state preparation, rather than VQE itself. Further, while we have found that the performances of quantum computing stacks are indistinguishable when using square circuits and Heavy Output Generation Benchmarking for a large number of qubits, shallow circuits extend the number of qubits for which detail can be observed, while also being consistent with the philosophy of volumetric benchmarking.

We demonstrate this benchmark suite by employing it on ibmqx2, ibmq_16_melbourne, ibmq_ourense, and ibmq_singapore. In doing so we justify our thesis that the accuracy of a computation depends on several levels of the quantum computing stack, and that each layer should not be considered in isolation. For example, identifying that the increased connectivity of a device does not compensate for the increased noise, as we do in Section 5.1, shows the impact of this layer of the stack, and justifies investigating devices with a variety of coupling maps and noise levels. By showing the differing performance between five compilation strategies, we are able to identify, in Section 5.1, how the best compilation strategy to use depends on the device and the dimension of the circuit. This illustrates the dependence of the performance of the quantum computing stack on the compilation layer, and the interdependent influence of the compilation strategy, device and application on the overall performance of the quantum computing stack. In particular, noise-aware compilation strategies often perform well when the noise model used by the strategy is accurate, as discussed in Section 5.3.

In Section 5.2, the wide selection of circuits within the proposed benchmark suite reveals that the same device, evaluated according to a fixed figure of merit, will perform differently when running different applications, whose circuits are compiled by the same compilation strategy. Indeed the comparative performance of (compilation strategy, device) pairs is shown to vary between our circuit classes. This justifies our inclusion of circuit classes which collectively cover a wide selection of applications in the benchmark suite proposed here, and our full quantum computing stack approach.

We foresee the benchmarks conducted in this work providing a means to select the best quantum computing stack, of those explored here, for a particular task, and vice versa. As such we also anticipate that a variety of new quantum computing stacks could be benchmarked in the way described in this work, empowering the user with knowledge about the performance of current quantum technologies for particular tasks.

These benchmarks may, in time, come to complement noise models and calibration information as a means to disseminate information about a device's performance. This parallels the use of the LINPACK benchmarks [6] alongside FLOPS to compare diverse classical computers. Recently, quantum volume, as defined in [38], has started to be adopted as one such metric [82], and we hope the benchmark suite developed here will be incorporated similarly. Further, our benchmarks may facilitate an understanding of how new, or hard-to-characterize, noise affects the practical performance of quantum computers, as implied by the classical simulations of Section 5.3.

The work presented here could be extended in several directions. The first is to examine the impact of incorporating these benchmarks into a compilation strategy. While noise-aware compilation strategies currently use properties of qubits to decide how to compile a circuit, it would be interesting to explore if instead optimising for these benchmarks would change the compilation. The trade-off between the benefits of doing so and the increased compilation time resulting from the time taken to perform the benchmarks should then also be assessed. This information would help in the understanding of the interplay between the amount of classical circuit optimisation performed and the amount by which the performance of a quantum system can be increased.

Second, the philosophy of application-motivated benchmarking could be extended to circuits which are more easily classically simulable. Because of their reliance on classical simulation, the benchmarks introduced here may be used up to, but not after, the point of demonstrating quantum computational supremacy. Hence new circuit classes will need to be introduced which can be classically simulated in this regime. Alternatively, application-motivated benchmarks that are derived from combining benchmarks of smaller devices [3] could be developed.

Third, we envision a need to systematically study how properties of hardware, such as noise levels or connectivity, influence a given device's performance. In this work, we were limited to the particular devices made available by IBM Quantum, which limits our ability to perform such a systematic inquiry. It is nevertheless vital to do so, as the results of Section 5.1 show that changing the hardware can dramatically influence performance. Indeed, this would allow us to understand if the observations made in Section 5.1 are typical, and to explore the existence of other relationships. This could be achieved by implementing this benchmark suite on more devices, or synthetic devices with tunable coupling maps and noise information.

Finally, there is a need to study the correlation between the results of an application-motivated benchmark and the performance of a quantum computing stack at running the application which motivated it. This would show that benchmarking application subroutines provides reliable predictors of performance when running the application itself. While similar work has explored the correlation between the classification accuracy and circuit properties of parametrised quantum circuits [37], comparing the performance of the benchmarks defined here with their applications is a subject for future work. For example, comparing the performance of a stack at implementing deep circuits and running the VQE algorithm would show the extent to which quantum computing stacks that perform well at a particular kind of state preparation circuit also perform well in estimating properties of a wide range of molecules.

Data Availability

The data gathered during the experiments conducted for this work, as presented in Section 5, are available at http://doi.org/10.5281/zenodo.3832121. QASM files representing the circuits executed, the per sample bitstring outputs from device and simulator executions, and device calibration data gathered throughout the course of the experiments, are provided.

Acknowledgements

DM acknowledges support from the Engineering and Physical Sciences Research Council (grant EP/L01503X/1). This work was made possible, in part, by systems built by IBM as part of the IBM Quantum program, and made accessible via membership in the IBM Q Network.

IBM, IBM Q, and Qiskit are trademarks of International Business Machines Corporation, registered in many jurisdictions worldwide. Other product or service names may be trademarks or service marks of IBM or other companies.

References

[1] Gadi Aleksandrowicz, Thomas Alexander, Panagiotis Barkoutsos, Luciano Bello, et al. Qiskit: An Open-source Framework for Quantum Computing, January 2019. URL https://doi.org/10.5281/zenodo.2562111.

[2] Peter J Karalekas, Nikolas A Tezak, Eric C Peterson, Colm A Ryan, Marcus P da Silva, and Robert S Smith. A quantum-classical cloud platform optimized for variational hybrid algorithms. Quantum Science and Technology, 5(2):024003, April 2020. doi:10.1088/2058-9565/ab7559. URL https://doi.org/10.1088%2F2058-9565%2Fab7559.

[3] Frank Arute, Kunal Arya, Ryan Babbush, Dave Bacon, et al. Quantum supremacy using a programmable superconducting processor. Nature, 574(7779):505–510, 2019. ISSN 1476-4687. doi:10.1038/s41586-019-1666-5. URL https://doi.org/10.1038/s41586-019-1666-5.

[4] Simon J. Devitt. Performing quantum computing experiments in the cloud. Phys. Rev. A, 94:032329, September 2016. doi:10.1103/PhysRevA.94.032329. URL https://link.aps.org/doi/10.1103/PhysRevA.94.032329.

[5] S. Debnath, N. M. Linke, C. Figgatt, K. A. Landsman, K. Wright, and C. Monroe. Demonstration of a small programmable quantum computer with atomic qubits. Nature, 536(7614):63–66, August 2016. ISSN 1476-4687. doi:10.1038/nature18648. URL https://doi.org/10.1038/nature18648.

[6] A Petitet, R C Whaley, J Dongarra, and A Cleary. Hpl - a portable implementation of the high-performance linpack benchmark for distributed-memory computers. URL http://www.netlib.org/benchmark/hpl/.

[7] Jack J. Dongarra, Piotr Luszczek, and Antoine Petitet. The linpack benchmark: past, present and future. Concurrency and Computation: Practice and Experience, 15(9):803–820, 2003. doi:10.1002/cpe.728. URL https://onlinelibrary.wiley.com/doi/abs/10.1002/cpe.728.

[8] Jack Dongarra and Piotr Luszczek. TOP500, pages 2055–2057. Springer US, Boston, MA, 2011. ISBN 978-0-387-09766-4. doi:10.1007/978-0-387-09766-4_157. URL https://doi.org/10.1007/978-0-387-09766-4_157.

[9] Ryan LaRose. Overview and Comparison of Gate Level Quantum Software Platforms. Quantum, 3:130, March 2019. ISSN 2521-327X. doi:10.22331/q-2019-03-25-130. URL https://doi.org/10.22331/q-2019-03-25-130.

[10] Seyon Sivarajah, Silas Dilkes, Alexander Cowtan, Will Simmons, Alec Edgington, and Ross Duncan. t|ket⟩: A retargetable compiler for nisq devices. Quantum Science and Technology, 2020. doi:10.1088/2058-9565/ab8e92. URL https://iopscience.iop.org/article/10.1088/2058-9565/ab8e92.

27 [11] Robert S. Smith, Michael J. Curtis, and [18] E. F. Dumitrescu, A. J. McCaskey, G. Hagen, William J. Zeng. A practical quantum in- G. R. Jansen, T. D. Morris, T. Papenbrock, struction set architecture. 2016. URL https: R. C. Pooser, D. J. Dean, and P. Lougov- //arxiv.org/abs/1608.03355. ski. Cloud quantum computing of an atomic nucleus. Phys. Rev. Lett., 120:210501, May [12] Frank Arute, Kunal Arya, Ryan Babbush, 2018. doi:10.1103/PhysRevLett.120.210501. Dave Bacon, et al. Quantum approximate op- URL https://link.aps.org/doi/10.1103/ timization of non-planar graph problems on a PhysRevLett.120.210501. planar superconducting processor, 2020. URL https://arxiv.org/abs/2004.04197. [19] Frank Arute, Kunal Arya, Ryan Babbush, Dave Bacon, et al. Hartree-fock on a supercon- [13] Robin Blume-Kohout and Kevin C. Young. A ducting qubit quantum computer, 2020. URL volumetric framework for quantum computer https://arxiv.org/abs/2004.04174. benchmarks, 2019. URL https://arxiv.org/ abs/1904.05546. [20] Vedran Dunjko and Hans J Briegel. Ma- chine learning & artificial intelligence in [14] Norbert M. Linke, Dmitri Maslov, Martin the quantum domain: a review of recent Roetteler, Shantanu Debnath, Caroline Fig- progress. Reports on Progress in Physics, gatt, Kevin A. Landsman, Kenneth Wright, 81(7):074001, June 2018. doi:10.1088/1361- and Christopher Monroe. Experimental com- 6633/aab406. URL https://doi.org/10. parison of two quantum computing architec- 1088%2F1361-6633%2Faab406. tures. Proceedings of the National Academy of Sciences, 114(13):3305–3310, 2017. ISSN [21] Marcello Benedetti, Delfina Garcia-Pintos, 0027-8424. doi:10.1073/pnas.1618020114. Oscar Perdomo, Vicente Leyton-Ortega, Yun- URL https://www.pnas.org/content/114/ seong Nam, and Alejandro Perdomo-Ortiz. A 13/3305. generative modeling approach for benchmark- [15] Sam McArdle, Suguru Endo, Alán Aspuru- ing and training shallow quantum circuits. Guzik, Simon C. Benjamin, and Xiao npj , 5(1):45, 2019. Yuan. 
Quantum computational chem- ISSN 2056-6387. doi:10.1038/s41534-019- istry. Rev. Mod. Phys., 92:015003, March 0157-8. URL https://doi.org/10.1038/ 2020. doi:10.1103/RevModPhys.92.015003. s41534-019-0157-8. URL https://link.aps.org/doi/10.1103/ RevModPhys.92.015003. [22] Kathleen E. Hamilton and Raphael C. Pooser. Error-mitigated data-driven circuit learning [16] Alexander J. McCaskey, Zachary P. Parks, on noisy quantum hardware. 2019. URL Jacek Jakowski, Shirley V. Moore, Titus D. https://arxiv.org/abs/1911.13289. Morris, Travis S. Humble, and Raphael C. Pooser. Quantum chemistry as a bench- [23] Kathleen E. Hamilton, Eugene F. Du- mark for near-term quantum computers. mitrescu, and Raphael C. Pooser. Gener- npj Quantum Information, 5(1):99, 2019. ative model benchmarks for superconduct- ISSN 2056-6387. doi:10.1038/s41534-019- ing qubits. Phys. Rev. A, 99:062323, 0209-0. URL https://doi.org/10.1038/ June 2019. doi:10.1103/PhysRevA.99.062323. s41534-019-0209-0. URL https://link.aps.org/doi/10.1103/ PhysRevA.99.062323. [17] Pierre-Luc Dallaire-Demers, Michał Stęchły, Jerome F. Gonthier, Ntwali Toussaint Bashige, [24] Madita Willsch, Dennis Willsch, Fengping Jonathan Romero, and Yudong Cao. An appli- Jin, Hans De Raedt, and Kristel Michielsen. cation benchmark for fermionic quantum sim- Benchmarking the quantum approximate op- ulations. 2020. URL https://arxiv.org/ timization algorithm, 2019. URL https:// abs/2003.01862. arxiv.org/abs/1907.02359.

28 [25] Andreas Bengtsson, Pontus Vikstål, Christo- [32] Jonathan Romero, Ryan Babbush, Jarrod R pher Warren, Marika Svensson, et al. Quan- McClean, Cornelius Hempel, Peter J Love, tum approximate optimization of the exact- and Alán Aspuru-Guzik. Strategies for cover problem on a superconducting quantum quantum computing molecular energies using processor, 2019. URL https://arxiv.org/ the unitary coupled cluster ansatz. Quantum abs/1912.10495. Science and Technology, 4(1):014008, October 2018. doi:10.1088/2058-9565/aad3e4. URL [26] G. Pagano, A. Bapat, P. Becker, K. S. Collins, https://iopscience.iop.org/article/10. A. De, P. W. Hess, H. B. Kaplan, A. Kypri- 1088/2058-9565/aad3e4. anidis, W. L. Tan, C. Baldwin, L. T. Brady, A. Deshpande, F. Liu, S. Jordan, A. V. Gor- [33] Sukin Sim, Peter D. Johnson, and Alán shkov, and C. Monroe. Quantum approximate Aspuru-Guzik. Expressibility and entangling optimization of the long-range ising model capability of parameterized quantum circuits with a trapped-ion , 2019. for hybrid quantum-classical algorithms. Ad- URL https://arxiv.org/abs/1906.02700. vanced Quantum Technologies, 2(12):1900070, [27] John Preskill. Quantum computing and the 2019. doi:10.1002/qute.201900070. URL entanglement frontier, 2012. URL https:// https://onlinelibrary.wiley.com/doi/ arxiv.org/abs/1203.5813. abs/10.1002/qute.201900070. [28] Aram W. Harrow and Ashley Montanaro. [34] Abhinav Kandala, Antonio Mezzacapo, Quantum computational supremacy. Nature, Kristan Temme, Maika Takita, Markus 549(7671):203–209, September 2017. ISSN Brink, Jerry M. Chow, and Jay M. Gam- 1476-4687. doi:10.1038/nature23458. URL betta. Hardware-efficient variational quan- http://dx.doi.org/10.1038/nature23458. tum eigensolver for small molecules and quantum magnets. Nature, 549(7671): [29] Sergio Boixo, Sergei V. Isakov, Vadim N. 242–246, September 2017. ISSN 1476- Smelyanskiy, Ryan Babbush, Nan Ding, 4687. doi:10.1038/nature23879. URL Zhang Jiang, Michael J. Bremner, John M. 
https://doi.org/10.1038/nature23879. Martinis, and Hartmut Neven. Characteriz- ing quantum supremacy in near-term devices. [35] Brian Coyle, Daniel Mills, Vincent Danos, and Nature Physics, 14(6):595–600, June 2018. Elham Kashefi. The born supremacy: Quan- ISSN 1745-2481. doi:10.1038/s41567-018- tum advantage and training of an ising born 0124-x. URL https://doi.org/10.1038/ machine. 2019. URL https://arxiv.org/ s41567-018-0124-x. abs/1904.02214. [30] Dominic W. Berry, Graeme Ahokas, Richard [36] Yuxuan Du, Min-Hsiu Hsieh, Tongliang Liu, Cleve, and Barry C. Sanders. Efficient and Dacheng Tao. The expressive power of quantum algorithms for simulating sparse parameterized quantum circuits, 2018. URL hamiltonians. Communications in Mathe- https://arxiv.org/abs/1810.11922. matical Physics, 270(2):359–371, Mar 2007. ISSN 1432-0916. doi:10.1007/s00220-006- [37] Thomas Hubregtsen, Josef Pichlmeier, and 0150-x. URL https://doi.org/10.1007/ Koen Bertels. Evaluation of parameterized s00220-006-0150-x. quantum circuits: on the design, and the rela- tion between classification accuracy, express- [31] Alberto Peruzzo, Jarrod McClean, Peter Shad- ibility and entangling capability. 2020. URL bolt, Man-Hong Yung, Xiao-Qi Zhou, Peter J. https://arxiv.org/abs/2003.09887. Love, Alán Aspuru-Guzik, and Jeremy L. O’Brien. A variational eigenvalue solver on [38] Andrew W. Cross, Lev S. Bishop, Sarah a photonic quantum processor. Nature Com- Sheldon, Paul D. Nation, and Jay M. munications, 5(1):4213, July 2014. ISSN 2041- Gambetta. Validating quantum com- 1723. doi:10.1038/ncomms5213. URL https: puters using randomized model circuits. //doi.org/10.1038/ncomms5213. Phys. Rev. A, 100:032328, September 2019.

doi:10.1103/PhysRevA.100.032328. URL https://link.aps.org/doi/10.1103/PhysRevA.100.032328.

[39] Alexandru Gheorghiu, Theodoros Kapourniotis, and Elham Kashefi. Verification of quantum computation: An overview of existing approaches. Theor. Comp. Sys., 63(4):715–808, May 2019. ISSN 1432-4350. doi:10.1007/s00224-018-9872-3. URL https://doi.org/10.1007/s00224-018-9872-3.

[40] Urmila Mahadev. Classical verification of quantum computations. In 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS), pages 259–267. IEEE, 2018. URL http://ieee-focs.org/FOCS-2018-Papers/pdfs/59f259.pdf.

[41] Wojciech Kozlowski and Stephanie Wehner. Towards large-scale quantum networks. In Proceedings of the Sixth Annual ACM International Conference on Nanoscale Computing and Communication, NANOCOM ’19, New York, NY, USA, 2019. Association for Computing Machinery. ISBN 9781450368971. doi:10.1145/3345312.3345497. URL https://doi.org/10.1145/3345312.3345497.

[42] C. Greganti, T. F. Demarie, M. Ringbauer, J. A. Jones, V. Saggio, I. A. Calafell, L. A. Rozema, A. Erhard, M. Meth, L. Postler, R. Stricker, P. Schindler, R. Blatt, T. Monz, P. Walther, and J. F. Fitzsimons. Cross-verification of independent quantum devices. 2019. URL https://arxiv.org/abs/1905.09790.

[43] Daniel Mills, Anna Pappa, Theodoros Kapourniotis, and Elham Kashefi. Information theoretically secure hypothesis test for temporally unstructured quantum computation (extended abstract). Electronic Proceedings in Theoretical Computer Science, 266:209–221, February 2018. ISSN 2075-2180. doi:10.4204/eptcs.266.14. URL http://dx.doi.org/10.4204/EPTCS.266.14.

[44] Scott Aaronson and Lijie Chen. Complexity-theoretic foundations of quantum supremacy experiments. 2016. URL https://arxiv.org/abs/1612.05903.

[45] Nathan Wiebe, Christopher Granade, and D G Cory. Quantum bootstrapping via compressed quantum hamiltonian learning. New Journal of Physics, 17(2):022005, February 2015. doi:10.1088/1367-2630/17/2/022005. URL https://doi.org/10.1088%2F1367-2630%2F17%2F2%2F022005.

[46] Dan Shepherd and Michael J. Bremner. Temporally unstructured quantum computation. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 465(2105):1413–1439, 2009. doi:10.1098/rspa.2008.0443. URL https://royalsocietypublishing.org/doi/abs/10.1098/rspa.2008.0443.

[47] Michael J. Bremner, Richard Jozsa, and Dan J. Shepherd. Classical simulation of commuting quantum computations implies collapse of the polynomial hierarchy. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 467(2126):459–472, 2011. doi:10.1098/rspa.2010.0301. URL https://royalsocietypublishing.org/doi/abs/10.1098/rspa.2010.0301.

[48] Michael J. Bremner, Ashley Montanaro, and Dan J. Shepherd. Average-case complexity versus approximate simulation of commuting quantum computations. Phys. Rev. Lett., 117:080501, August 2016. doi:10.1103/PhysRevLett.117.080501. URL https://link.aps.org/doi/10.1103/PhysRevLett.117.080501.

[49] Michael J. Bremner, Ashley Montanaro, and Dan J. Shepherd. Achieving quantum supremacy with sparse and noisy commuting quantum computations. Quantum, 1:8, April 2017. ISSN 2521-327X. doi:10.22331/q-2017-04-25-8. URL https://doi.org/10.22331/q-2017-04-25-8.

[50] Richard M. Karp and Richard J. Lipton. Some connections between nonuniform and uniform complexity classes. In Proceedings of the Twelfth Annual ACM Symposium on Theory of Computing, STOC ’80, page 302–309, New York, NY, USA, 1980. Association for Computing Machinery. ISBN 0897910176.

doi:10.1145/800141.804678. URL https://doi.org/10.1145/800141.804678.

[51] J. Misra and David Gries. A constructive proof of vizing’s theorem. Information Processing Letters, 41(3):131–133, 1992. ISSN 0020-0190. doi:10.1016/0020-0190(92)90041-S. URL http://www.sciencedirect.com/science/article/pii/002001909290041S.

[52] Juan Bermejo-Vega, Dominik Hangleiter, Martin Schwarz, Robert Raussendorf, and Jens Eisert. Architectures for quantum simulation showing a quantum speedup. Phys. Rev. X, 8:021010, April 2018. doi:10.1103/PhysRevX.8.021010. URL https://link.aps.org/doi/10.1103/PhysRevX.8.021010.

[53] D Hangleiter, M Kliesch, M Schwarz, and J Eisert. Direct certification of a class of quantum simulations. Quantum Science and Technology, 2(1):015004, February 2017. doi:10.1088/2058-9565/2/1/015004. URL https://doi.org/10.1088%2F2058-9565%2F2%2F1%2F015004.

[54] Adam Bouland, Bill Fefferman, Chinmay Nirkhe, and Umesh Vazirani. Quantum supremacy and the complexity of random circuit sampling. 2018. URL https://arxiv.org/abs/1803.04402.

[55] Ramis Movassagh. Efficient unitary paths and quantum computational supremacy: A proof of average-case hardness of random circuit sampling. 2018. URL https://arxiv.org/abs/1810.04681.

[56] Michael A. Nielsen and Isaac L. Chuang. Quantum computation and quantum information: 10th anniversary edition, 2010.

[57] Alexander Cowtan, Silas Dilkes, Ross Duncan, Will Simmons, and Seyon Sivarajah. Phase gadget synthesis for shallow circuits. Electronic Proceedings in Theoretical Computer Science, 318:214–229, April 2020. ISSN 2075-2180. doi:10.4204/eptcs.318.13. URL http://dx.doi.org/10.4204/EPTCS.318.13.

[58] Panagiotis Kl. Barkoutsos, Jerome F. Gonthier, Igor Sokolov, Nikolaj Moll, Gian Salis, Andreas Fuhrer, Marc Ganzhorn, Daniel J. Egger, Matthias Troyer, Antonio Mezzacapo, Stefan Filipp, and Ivano Tavernelli. Quantum algorithms for electronic structure calculations: Particle-hole hamiltonian and optimized wave-function expansions. Phys. Rev. A, 98:022322, August 2018. doi:10.1103/PhysRevA.98.022322. URL https://link.aps.org/doi/10.1103/PhysRevA.98.022322.

[59] C. E. Porter and R. G. Thomas. Fluctuations of nuclear reaction widths. Phys. Rev., 104:483–491, October 1956. doi:10.1103/PhysRev.104.483. URL https://link.aps.org/doi/10.1103/PhysRev.104.483.

[60] C. Neill, P. Roushan, K. Kechedzhi, S. Boixo, et al. A blueprint for demonstrating quantum supremacy with superconducting qubits. Science, 360(6385):195–199, 2018. ISSN 0036-8075. doi:10.1126/science.aao4309. URL https://science.sciencemag.org/content/360/6385/195.

[61] Sergio Boixo, Vadim N. Smelyanskiy, and Hartmut Neven. Fourier analysis of sampling from noisy chaotic quantum circuits. 2017. URL https://arxiv.org/abs/1708.01875.

[62] Scott Aaronson and Alex Arkhipov. The computational complexity of linear optics. In Proceedings of the Forty-Third Annual ACM Symposium on Theory of Computing, STOC ’11, page 333–342, New York, NY, USA, 2011. Association for Computing Machinery. ISBN 9781450306911. doi:10.1145/1993636.1993682. URL https://doi.org/10.1145/1993636.1993682.

[63] Edwin Pednault, John A. Gunnels, Giacomo Nannicini, Lior Horesh, and Robert Wisnieff. Leveraging secondary storage to simulate deep 54-qubit sycamore circuits. 2019. URL https://arxiv.org/abs/1910.09534.

[64] Benjamin Villalonga, Sergio Boixo, Bron Nelson, Christopher Henze, Eleanor Rieffel, Rupak Biswas, and Salvatore Mandrà. A flexible high-performance simulator for verifying

and benchmarking quantum circuits implemented on real hardware. npj Quantum Information, 5(1):86, 2019. ISSN 2056-6387. doi:10.1038/s41534-019-0196-1. URL https://doi.org/10.1038/s41534-019-0196-1.

[65] Benjamin Villalonga, Dmitry Lyakh, Sergio Boixo, Hartmut Neven, Travis S Humble, Rupak Biswas, Eleanor G Rieffel, Alan Ho, and Salvatore Mandrà. Establishing the quantum supremacy frontier with a 281 pflop/s simulation. Quantum Science and Technology, 5(3):034003, April 2020. doi:10.1088/2058-9565/ab7eeb. URL https://doi.org/10.1088%2F2058-9565%2Fab7eeb.

[66] pytket documentation. URL https://cqcl.github.io/pytket/build/html/index.html.

[67] qiskit documentation. URL https://qiskit.org/documentation/.

[68] Sumeet Khatri, Ryan LaRose, Alexander Poremba, Lukasz Cincio, Andrew T. Sornborger, and Patrick J. Coles. Quantum-assisted quantum compiling. Quantum, 3:140, May 2019. ISSN 2521-327X. doi:10.22331/q-2019-05-13-140. URL https://doi.org/10.22331/q-2019-05-13-140.

[69] Ali JavadiAbhari, Shruti Patil, Daniel Kudrow, Jeff Heckey, Alexey Lvov, Frederic T. Chong, and Margaret Martonosi. Scaffcc: A framework for compilation and analysis of quantum computing programs. In Proceedings of the 11th ACM Conference on Computing Frontiers, CF ’14, New York, NY, USA, 2014. Association for Computing Machinery. ISBN 9781450328708. doi:10.1145/2597917.2597939. URL https://doi.org/10.1145/2597917.2597939.

[70] Aram W. Harrow, Benjamin Recht, and Isaac L. Chuang. Efficient discrete approximations of quantum gates. Journal of Mathematical Physics, 43(9):4445–4451, 2002. doi:10.1063/1.1495899. URL https://doi.org/10.1063/1.1495899.

[71] Joel J. Wallman and Joseph Emerson. Noise tailoring for scalable quantum computation via randomized compiling. Phys. Rev. A, 94:052325, November 2016. doi:10.1103/PhysRevA.94.052325. URL https://link.aps.org/doi/10.1103/PhysRevA.94.052325.

[72] Jeff Heckey, Shruti Patil, Ali JavadiAbhari, Adam Holmes, Daniel Kudrow, Kenneth R. Brown, Diana Franklin, Frederic T. Chong, and Margaret Martonosi. Compiler management of communication and parallelism for quantum computation. In Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’15, page 445–456, New York, NY, USA, 2015. Association for Computing Machinery. ISBN 9781450328357. doi:10.1145/2694344.2694357. URL https://doi.org/10.1145/2694344.2694357.

[73] Prakash Murali, Jonathan M. Baker, Ali Javadi-Abhari, Frederic T. Chong, and Margaret Martonosi. Noise-adaptive compiler mappings for noisy intermediate-scale quantum computers. In Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’19, page 1015–1029, New York, NY, USA, 2019. Association for Computing Machinery. ISBN 9781450362405. doi:10.1145/3297858.3304075. URL https://doi.org/10.1145/3297858.3304075.

[74] Ali J Abhari, Arvin Faruque, Mohammad J Dousti, Lukas Svec, Oana Catu, Amlan Chakrabati, Chen-Fu Chiang, Seth Vanderwilt, John Black, and Fred Chong. Scaffold: language. Technical report, PRINCETON UNIV NJ DEPT OF COMPUTER SCIENCE, 2012. URL https://www.cs.princeton.edu/research/techreps/TR-934-12.

[75] Alexander J McCaskey, Dmitry I Lyakh, Eugene F Dumitrescu, Sarah S Powers, and Travis S Humble. XACC: a system-level software infrastructure for heterogeneous quantum–classical computing. Quantum Science and Technology, 5(2):024002, February

2020. doi:10.1088/2058-9565/ab6bf6. URL https://doi.org/10.1088%2F2058-9565%2Fab6bf6.

[76] Alexander Cowtan, Silas Dilkes, Ross Duncan, Alexandre Krajenbrink, Will Simmons, and Seyon Sivarajah. On the Qubit Routing Problem. In Wim van Dam and Laura Mancinska, editors, 14th Conference on the Theory of Quantum Computation, Communication and Cryptography (TQC 2019), volume 135 of Leibniz International Proceedings in Informatics (LIPIcs), pages 5:1–5:32, Dagstuhl, Germany, 2019. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik. ISBN 978-3-95977-112-2. doi:10.4230/LIPIcs.TQC.2019.5. URL http://drops.dagstuhl.de/opus/volltexte/2019/10397.

[77] Prakash Murali, David C. Mckay, Margaret Martonosi, and Ali Javadi-Abhari. Software mitigation of crosstalk on noisy intermediate-scale quantum computers. In Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’20, page 1001–1016, New York, NY, USA, 2020. Association for Computing Machinery. ISBN 9781450371025. doi:10.1145/3373376.3378477. URL https://doi.org/10.1145/3373376.3378477.

[78] IBM Quantum Experience User Guide. Advanced single-qubit gates, 2019. URL https://quantum-computing.ibm.com/support/guides/user-guide.

[79] Christopher Chamberland, Guanyu Zhu, Theodore J. Yoder, Jared B. Hertzberg, and Andrew W. Cross. Topological and subsystem codes on low-degree graphs with flag qubits. Phys. Rev. X, 10:011022, January 2020. doi:10.1103/PhysRevX.10.011022. URL https://link.aps.org/doi/10.1103/PhysRevX.10.011022.

[80] Jerry Chow and Jay Gambetta. Quantum takes flight: Moving from laboratory demonstrations to building systems, 2020. URL https://www.ibm.com/blogs/research/2020/01/quantum-volume-32/.

[81] Iskren Vankov, Daniel Mills, Petros Wallden, and Elham Kashefi. Methods for classically simulating noisy networked quantum architectures. Quantum Science and Technology, 5(1):014001, November 2019. doi:10.1088/2058-9565/ab54a4. URL https://doi.org/10.1088%2F2058-9565%2Fab54a4.

[82] J. M. Pino, J. M. Dreiling, C. Figgatt, J. P. Gaebler, S. A. Moses, M. S. Allman, C. H. Baldwin, M. Foss-Feig, D. Hayes, K. Mayer, C. Ryan-Anderson, and B. Neyenhuis. Demonstration of the qccd trapped-ion quantum computer architecture, 2020. URL https://arxiv.org/abs/2003.01293.

[83] Joseph Emerson, Yaakov S. Weinstein, Marcos Saraceno, Seth Lloyd, and David G. Cory. Pseudo-random unitary operators for quantum information processing. Science, 302(5653):2098–2100, 2003. ISSN 0036-8075. doi:10.1126/science.1090790. URL https://science.sciencemag.org/content/302/5653/2098.

[84] Joseph Emerson, Etera Livine, and Seth Lloyd. Convergence conditions for random quantum circuits. Phys. Rev. A, 72:060302, December 2005. doi:10.1103/PhysRevA.72.060302. URL https://link.aps.org/doi/10.1103/PhysRevA.72.060302.

[85] Fernando G. S. L. Brandão, Aram W. Harrow, and Michał Horodecki. Local random quantum circuits are approximate polynomial-designs. Communications in Mathematical Physics, 346(2):397–434, September 2016. ISSN 1432-0916. doi:10.1007/s00220-016-2706-8. URL https://doi.org/10.1007/s00220-016-2706-8.

[86] Andrew Fagan and Ross Duncan. Optimising clifford circuits with quantomatic. Electronic Proceedings in Theoretical Computer Science, 287:85–105, January 2019. ISSN 2075-2180. doi:10.4204/eptcs.287.5. URL http://dx.doi.org/10.4204/EPTCS.287.5.

[87] M Blaauboer and R L de Visser. An analytical decomposition protocol for optimal implementation of two-qubit entangling

gates. Journal of Physics A: Mathematical and Theoretical, 41(39):395307, September 2008. doi:10.1088/1751-8113/41/39/395307. URL https://doi.org/10.1088%2F1751-8113%2F41%2F39%2F395307.

[88] Easwar Magesan and Jay M. Gambetta. Effective hamiltonian models of the cross-resonance gate. Phys. Rev. A, 101:052308, May 2020. doi:10.1103/PhysRevA.101.052308. URL https://link.aps.org/doi/10.1103/PhysRevA.101.052308.

[89] Timothy Proctor, Kenneth Rudinger, Kevin Young, Mohan Sarovar, and Robin Blume-Kohout. What randomized benchmarking actually measures. Phys. Rev. Lett., 119:130502, September 2017. doi:10.1103/PhysRevLett.119.130502. URL https://link.aps.org/doi/10.1103/PhysRevLett.119.130502.

[90] E. Knill, D. Leibfried, R. Reichle, J. Britton, R. B. Blakestad, J. D. Jost, C. Langer, R. Ozeri, S. Seidelin, and D. J. Wineland. Randomized benchmarking of quantum gates. Phys. Rev. A, 77:012307, January 2008. doi:10.1103/PhysRevA.77.012307. URL https://link.aps.org/doi/10.1103/PhysRevA.77.012307.

[91] Arnaud Carignan-Dugas, Kristine Boone, Joel J Wallman, and Joseph Emerson. From randomized benchmarking experiments to gate-set circuit fidelity: how to interpret randomized benchmarking decay parameters. New Journal of Physics, 20(9):092001, September 2018. doi:10.1088/1367-2630/aadcc7. URL https://doi.org/10.1088%2F1367-2630%2Faadcc7.

[92] Timothy J. Proctor, Arnaud Carignan-Dugas, Kenneth Rudinger, Erik Nielsen, Robin Blume-Kohout, and Kevin Young. Direct randomized benchmarking for multiqubit devices. Phys. Rev. Lett., 123:030503, July 2019. doi:10.1103/PhysRevLett.123.030503. URL https://link.aps.org/doi/10.1103/PhysRevLett.123.030503.

[93] Jay M. Gambetta, A. D. Córcoles, S. T. Merkel, B. R. Johnson, John A. Smolin, Jerry M. Chow, Colm A. Ryan, Chad Rigetti, S. Poletto, Thomas A. Ohki, Mark B. Ketchen, and M. Steffen. Characterization of addressability by simultaneous randomized benchmarking. Phys. Rev. Lett., 109:240504, December 2012. doi:10.1103/PhysRevLett.109.240504. URL https://link.aps.org/doi/10.1103/PhysRevLett.109.240504.

[94] David C. McKay, Andrew W. Cross, Christopher J. Wood, and Jay M. Gambetta. Correlated randomized benchmarking, 2020. URL https://arxiv.org/abs/2003.02354.

[95] Easwar Magesan, J. M. Gambetta, and Joseph Emerson. Scalable and robust randomized benchmarking of quantum processes. Phys. Rev. Lett., 106:180504, May 2011. doi:10.1103/PhysRevLett.106.180504. URL https://link.aps.org/doi/10.1103/PhysRevLett.106.180504.

[96] Easwar Magesan, Jay M. Gambetta, and Joseph Emerson. Characterizing quantum gates via randomized benchmarking. Phys. Rev. A, 85:042311, April 2012. doi:10.1103/PhysRevA.85.042311. URL https://link.aps.org/doi/10.1103/PhysRevA.85.042311.

[97] A. D. Córcoles, Jay M. Gambetta, Jerry M. Chow, John A. Smolin, Matthew Ware, Joel Strand, B. L. T. Plourde, and M. Steffen. Process verification of two-qubit quantum gates by randomized benchmarking. Phys. Rev. A, 87:030301, March 2013. doi:10.1103/PhysRevA.87.030301. URL https://link.aps.org/doi/10.1103/PhysRevA.87.030301.

[98] David C. McKay, Sarah Sheldon, John A. Smolin, Jerry M. Chow, and Jay M. Gambetta. Three-qubit randomized benchmarking. Phys. Rev. Lett., 122:200502, May 2019. doi:10.1103/PhysRevLett.122.200502. URL https://link.aps.org/doi/10.1103/PhysRevLett.122.200502.

A Exponential Distribution

The exponential distribution, with rate $\lambda$, is a probability distribution with the probability density function

$$\Pr(x) = \lambda e^{-\lambda x}.$$

This is the distribution of waiting times between events in a Poisson process. We are concerned with showing that output probabilities of the circuit classes considered here are exponentially distributed. Such a property is a signature of quantum chaos, and indicates that a class of circuits is approximately Haar random [29, 83, 84]. It also allows for the calculation of both the ideal value of the cross-entropy discussed in Section 3.2, and the ideal heavy output probability as discussed in Section 3.1. This in turn allows us to fully exploit Cross-Entropy Benchmarking and Heavy Output Generation Benchmarking. Here we will argue numerically which of the circuits we introduce in Section 2 generate output probabilities of this form¹⁹, and discuss the implications when they do not.

¹⁹ This numerical approach to demonstrating properties of distributions of output probabilities from particular circuit classes parallels that taken in other work on benchmarking [29, 44, 52, 62].

We also demonstrate why the circuit depths used in Section 2 are necessary to generate output probabilities of this form. To do this we generate 100 circuits of each type and number of layers, where a layer is as defined in the respective Algorithms of Section 2. We then calculate the ideal output probabilities using classical simulation and compare this distribution of output probabilities to the exponential distribution. In the case of square circuits and deep circuits, we notice a better approximation of the exponential distribution by the distribution of output probabilities, measured by the $\ell_1$-norm distance between the two, as the number of layers increases. We can use this to isolate the number of layers at which the difference approaches its minimum.

A.1 Square Circuits

The exponential form of the distribution of the output probabilities from random circuits similar to square circuits has been established [29, 44]. As the procedure we use to generate square circuits, seen in Algorithm 2, differs slightly from that used for other similar random circuits [29, 38, 44], we explore the distribution of its output probabilities here.

The relevant results are seen in Figure 12. In particular, it can be seen from Figure 12b that the minimum value of the $\ell_1$-norm distance between the distribution of output probabilities and the exponential distribution is approached at a number of layers equal to the number of qubits, justifying our choice of layer numbers in Algorithm 2. It may be that asymptotically the number of layers required is sub-linear [29], although for the circuit sizes used here a linear growth in depth is appropriate. Figure 12a illustrates the closeness of fit of the two distributions.

A.2 Deep Circuits

Unlike with square circuits, there is no precedent for utilising deep circuits to generate exponentially distributed output probabilities, as we do here. This allows us to use deep circuits as a uniquely insightful benchmark of the performance of quantum computing stacks, grounded both in the theoretical results of Section 3, and in pertinent applications.

The relevant results are seen in Figure 13. In particular, it can be seen from Figure 13b that the minimum value of the $\ell_1$-norm distance between the distribution of output probabilities and the exponential distribution is approached at a number of layers equal to three times the number of qubits, plus one, justifying our choice of layer numbers in Algorithm 3. Figure 13a illustrates the closeness of fit of the two distributions.

The depth required to achieve an exponential distribution of outcome probabilities with deep circuits is greater than is the case for square circuits. Indeed, random circuits were initially introduced as the shallowest circuits required to generate such output probabilities [29]. This sacrifice in depth is made to achieve a benchmark which is uniquely application motivated, as discussed in Section 2.

A.3 Shallow Circuits

Unlike in the case of square circuits and deep circuits, the output probabilities of shallow circuits are not exponentially distributed. This is unsurprising since random circuits with this limited connectivity are thought to require at least depth

[Figure 12, panel (a): Probability Density against $p_C$. Panel (b): $\ell_1$-norm distance against Number of Layers.]

(a) The distribution of output probabilities from a circuit C, where C is a 5 qubit circuit, from the square circuits class as defined in Algorithm 2.

(b) The $\ell_1$-norm distance between the distribution of output probabilities of square circuits and the exponential distribution $2^n e^{-2^n x}$, where n is the number of qubits. A layer is defined as in Algorithm 2. Colours correspond to numbers of qubits: 2, 3, 4 and 5.

Figure 12: Exponential distribution fitting data for square circuits.

[Figure 13, panel (a): Probability Density against $p_C$. Panel (b): $\ell_1$-norm distance against Number of Layers.]

(a) The distribution of output probabilities from a circuit C, where C is a 5 qubit circuit, from the deep circuits class as defined in Algorithm 3.

(b) The $\ell_1$-norm distance between the distribution of output probabilities of deep circuits and the exponential distribution $2^n e^{-2^n x}$, where n is the number of qubits. A layer is defined as in Algorithm 3. Colours correspond to numbers of qubits: 2, 3, 4 and 5.

Figure 13: Exponential distribution fitting data for deep circuits.
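The comparison plotted in Figures 12b and 13b can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from this work: Haar-random states stand in for classically simulated circuits, the output probabilities are binned into a histogram, and the $\ell_1$-norm distance is taken between the binned empirical mass and the mass the exponential distribution with rate $2^n$ assigns to the same bins. The function names, the number of bins and the cut-off `x_max` are hypothetical choices.

```python
import numpy as np


def haar_random_probabilities(n_qubits, rng):
    """Output probabilities of a Haar-random state (a stand-in for simulating a circuit)."""
    dim = 2 ** n_qubits
    amplitudes = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    probabilities = np.abs(amplitudes) ** 2
    return probabilities / probabilities.sum()


def l1_distance_to_exponential(probabilities, n_qubits, n_bins=30):
    """l1 distance between binned output probabilities and the exponential law with rate 2^n."""
    rate = 2 ** n_qubits
    x_max = 8.0 / rate  # assumed cut-off; larger values are folded into the last bin
    edges = np.linspace(0.0, x_max, n_bins + 1)
    counts, _ = np.histogram(np.minimum(probabilities, x_max), bins=edges)
    empirical_mass = counts / counts.sum()
    cdf = 1.0 - np.exp(-rate * edges)
    cdf[-1] = 1.0  # fold the exponential tail into the last bin as well
    expected_mass = np.diff(cdf)
    return float(np.abs(empirical_mass - expected_mass).sum())


rng = np.random.default_rng(0)
n_qubits = 5
# 100 random instances, mirroring the 100 circuits per setting used in the text.
samples = np.concatenate(
    [haar_random_probabilities(n_qubits, rng) for _ in range(100)]
)
distance_random = l1_distance_to_exponential(samples, n_qubits)

# A flat output distribution (every probability equal to 2^-n) is far from exponential.
flat = np.full(2 ** n_qubits, 2.0 ** -n_qubits)
distance_flat = l1_distance_to_exponential(flat, n_qubits)
```

In this sketch the distance for the random states is small while the flat distribution gives a distance close to its maximum of 2, which is the qualitative behaviour the layer-number study relies on.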

$O(\sqrt{n})$ to create such a feature [52, 54, 85]. This has the unfortunate side effect that the results of Section 3.2 do not apply, and so Cross-Entropy Benchmarking cannot be used.

While the predictions made about the ideal heavy output probability, as discussed in Section 3.1, also do not apply, a study of the heavy output probability is still of interest. In particular, while we cannot connect the benchmark to the HOG problem of Problem 1, we can compare the probability of generating heavy outputs to the ideal probability of producing heavy outputs, as calculated by classical simulation.

B Compilation Strategies

This section details the compilation strategies explored in each of our experiments. For the circuit families and figures of merit investigated here, the compilation strategies we used were designed and empirically confirmed to perform well at the compilation tasks at hand. The versions of the packages used are listed in Table 2.

noise-unaware pytket and noise-aware pytket

The noise-unaware pytket and noise-aware pytket compilation strategies are generated using Algorithm 4. noise-unaware pytket is generated by passing False as input to Algorithm 4, and noise-aware pytket by passing True. Of particular interest are the following functions:

OptimiseCliffords: Simplifies Clifford gate sequences [86].

KAKDecomposition: Identifies two-qubit sub-circuits with more than 3 CXs and reduces them via the KAK/Cartan decomposition [87].

route: Modifies the circuit to satisfy the architectural constraints [76]. This will introduce SWAP gates.

noise_aware_placement: Selects initial qubit placement taking into account reported device gate error rates [10].

line_placement: Attempts to place qubits next to those they interact with in the first few time slices. This does not take device error rates into account.

Algorithm 4 pytket compilation strategies. The passes listed here are named as in the documentation for pytket [66], where additional detail on their actions can be found.

Input: noise_aware ∈ {True, False}

    OptimiseCliffords
    KAKDecomposition

    RebaseToRzRx              ▷ Convert to IBM gate set
    CommuteRzRxThroughCX

    if noise_aware then
        noise_aware_placement
    else
        line_placement
    end if

    route
    decompose_SWAP_to_CX
    redirect_CX_gates         ▷ Orientate CX to coupling map

    OptimisePostRouting       ▷ Optimisation preserving placement and orientation

noise-unaware Qiskit and noise-aware Qiskit

The noise-unaware Qiskit and noise-aware Qiskit compilation strategies, as defined in Algorithm 5, are heavily inspired by level_3_passmanager, a preconfigured compilation strategy made available in Qiskit. noise-unaware Qiskit is generated by passing noise_aware as False in Algorithm 5, and noise-aware Qiskit by passing True.

Where possible we passed stochastic as True in order to use StochasticSwap instead of BasicSwap during the swap mapping pass. In general, StochasticSwap generates circuits with lower depth; however, for the versions listed in Table 2, it proved faulty for some circuit sizes and device coupling maps used in this work. StochasticSwap may also result in repeated measurement of the same qubit, which cannot be implemented. Repeated compilation attempts may therefore be necessary,

and if this fails the circuit is not included in the plots of Section 5. Of particular note are the following functions:

NoiseAdaptiveLayout: Selects initial qubit placement based on minimising readout error rates [73].

DenseLayout: Chooses placement by finding the most connected subset of qubits.

Unroller: Decomposes unitary operation to desired gate set.

StochasticSwap: Adds SWAP gates to adhere to coupling map using a randomised algorithm.

BasicSwap: Produces a circuit adhering to coupling map using a simple rule: CX gates in the circuit which are not supported by the hardware are preceded with necessary SWAP gates.

    Package            Version
    Qiskit [1, 67]     0.12.0
    pytket [10, 66]    0.3.0

Table 2: Packages used in this work, and their corresponding versions.

Algorithm 5 Qiskit compilation strategies. The passes listed here are named as in the documentation for Qiskit [67], where additional detail on their actions can be found.

Input: noise_aware ∈ {True, False}, stochastic ∈ {True, False}

    Unroller

    if noise_aware then
        NoiseAdaptiveLayout
    else
        DenseLayout
    end if

    AncillaAllocation         ▷ Assign idle qubits as ancillas

    if stochastic then
        StochasticSwap
    else
        BasicSwap
    end if

    Decompose(SwapGate)       ▷ Decompose SWAP to CX
    CXDirection               ▷ Orientate CX to coupling map

                              ▷ Gather 2 qubit blocks
    Collect2qBlocks
    ConsolidateBlocks

    Unroller                  ▷ Unroll two-qubit blocks
    Optimize1qGates           ▷ Combine chains of one-qubit gates
    CXDirection

only pytket routing

In this case we perform, in the order as listed, the pytket operations: route, decompose_SWAP_to_CX, and redirect_CX_gates. We then account for the architecture gate set, without any further optimisation.

C Device Data

Two device properties leveraged by our compilation strategies are the coupling maps, describing the connectivity of the qubits and in which directions CX gates can be performed, and the calibration information, describing the noise levels of the device. These properties, and device noise levels in particular, are considered valuable benchmarks of the performance of the device in their own right. These properties are collectively influential in noise-aware compiling, as detailed in Appendix B.

There, circuits are compiled to adhere to the device’s coupling map, while also aiming to minimise some function of the calibration information. Because full quantum computing stack holistic benchmarking encompasses the circuit compilation strategies, it provides a novel way of using device information to benchmark an entire system, instead of simply the physical qubits which comprise it.

C.1 Device Coupling Maps

A coupling map of a device is a graphical representation of how two-qubit gates can be applied across the device. In this representation, each qubit is represented by a vertex, with directed edges joining qubits between which a two-qubit gate can be applied. For the devices considered here, this two-qubit gate is a CX gate, implemented using the cross-resonance interaction of qubits [88].

The direction of the edge is from the control to the target qubit of the CX gate, with bi-directional edges indicating that both qubits can be used as either the control or target. The coupling maps of the devices investigated in this work are shown in Figure 14. For those devices all edges are bi-directional, although this is not typical when the asymmetric CX is employed.

As discussed in Section 4, a trade-off exists between the connectivity of the device and the number of two-qubit gates necessary to implement a given circuit. More highly connected coupling maps typically require fewer two-qubit gates to implement a fixed unitary than less connected ones, owing to the reduced need for SWAP gates to account for discrepancies between the coupling maps of the uncompiled circuit and the device. While this reduced depth can reduce the impact of time-based noise channels, this is counterbalanced by the higher levels of cross-talk experienced by qubits corresponding to vertices with high degree in the device’s coupling map [79].

C.2 Device Calibration Information

The noise-aware tools employed by the compilation strategies explored in this work consider three kinds of errors which can occur, namely: readout error, single-qubit gate error, and two-qubit gate error. For the devices provided through IBM Quantum, this information is contained in calibration data which is accessible using tools in the Qiskit library, and is updated twice daily. The experiments in this paper were conducted between 2020-01-29 and 2020-02-10 with the calibration data in Figure 15 and Figure 16 aggregated over this time period.

An assignment or readout error corresponds to an incorrect reading of the state of the qubit; for example, returning “0” when the proper label is “1”, or vice-versa. The probability of incorrectly labelling the qubit is called the readout error, denoted $\epsilon^a$, and is calculated as

$$\epsilon^a = \frac{\Pr(\text{“0”} \mid |1\rangle) + \Pr(\text{“1”} \mid |0\rangle)}{2}. \tag{11}$$

$\epsilon^a$ is estimated by repeatedly preparing a qubit in a known state, immediately measuring it, and then counting the number of times the measurement returns the wrong label. This value, for the devices explored in this paper, is reported in Figure 15a.

Errors affecting the gates of the device correspond to an incorrect operation applied by the device. There are many ways to quantify the effect of this error, with IBM Quantum’s devices reporting randomized benchmarking (RB) numbers [89, 90]. The RB number, $\epsilon^C$, is estimated by running many self-inverting Clifford circuits, consisting of $m$ layers of gates drawn from the $n$-qubit Clifford group, inverted at layer $m + 1$. The survival probability, which is the probability the input state is unchanged, can then be estimated. Under a broad set of noise models and assumptions [89, 91], this survival probability can be shown to decay exponentially with $m$. Consequently, it can be estimated by fitting a decay curve of the form $A p^m + B$. The RB number is related to $p \in [0, 1]$, called the depolarisation/decay rate, by

$$\epsilon^C = (1 - p)(1 - 1/D), \tag{12}$$

where $D = 2^n$, and $n$ is the number of qubits acted on by the Clifford gates. $\epsilon^C$, which is also referred to as the error per Clifford of the device, is minimised at $p = 1$, in which case the survival probability is constant and set by the state preparation and measurement errors.

The Clifford gates necessary for RB must be compiled to the native gate set of the device. Using an estimate of $\epsilon^C$, an estimate of the error per gate, $\epsilon^g_G$, for a gate $G$, can be obtained by multiplying $\epsilon^C$ by

Figure 14: Coupling maps of the devices studied in this work: (a) ibmqx2, (b) ibmq_singapore, (c) ibmq_16_melbourne, (d) ibmq_ourense. Vertices, represented by blue circles, correspond to qubits, while edges are directed from the control to the target qubits of permitted two-qubit gates.
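As an illustration of the coupling-map representation, the following minimal sketch stores a map as a set of directed (control, target) pairs and uses breadth-first search to measure how far apart two qubits are. The 5-qubit, T-shaped map is a hypothetical example rather than a transcription of any panel of Figure 14, and the SWAP count assumes the simplest routing rule (move one qubit along a shortest path until the pair is adjacent), which is not the routing method of [76].

```python
from collections import deque

# Hypothetical 5-qubit, T-shaped coupling map. Each pair is (control, target);
# listing both orientations makes every edge bi-directional.
UNDIRECTED_EDGES = [(0, 1), (1, 2), (1, 3), (3, 4)]
COUPLING_MAP = {(c, t) for c, t in UNDIRECTED_EDGES} | {
    (t, c) for c, t in UNDIRECTED_EDGES
}


def cx_allowed(control, target, coupling_map=COUPLING_MAP):
    """True if a CX with this control and target is directly supported."""
    return (control, target) in coupling_map


def qubit_distance(source, destination, coupling_map=COUPLING_MAP):
    """Number of edges on a shortest path, ignoring edge direction (BFS)."""
    neighbours = {}
    for a, b in coupling_map:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    distances = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        if current == destination:
            return distances[current]
        for nxt in neighbours.get(current, ()):
            if nxt not in distances:
                distances[nxt] = distances[current] + 1
                queue.append(nxt)
    raise ValueError("qubits are not connected")


def naive_swap_count(control, target, coupling_map=COUPLING_MAP):
    """SWAPs needed to make two qubits adjacent by moving one along a shortest path."""
    return max(qubit_distance(control, target, coupling_map) - 1, 0)
```

For this example map, a CX between qubits 0 and 4 would need two SWAP gates under the naive rule, whereas a more highly connected map would shorten the path and so reduce the overhead.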

a factor related to the average number of uses of G when implementing a random Clifford operation:

$$\epsilon_C \sim \epsilon^g_G \times \#\,\text{uses of } G \text{ per Clifford}. \qquad (13)$$

Values for $\epsilon^g_{U_2}$, the error per gate for U2 gates, can be found in Figure 15b, and $\epsilon^g_{CX}$, that for CX gates, in Figure 16. The commonly reported average fidelity for U3 gates is $1 - (1 - \epsilon^g_{U_2})^2$.

There are many variants of randomized benchmarking, such as direct RB [92], simultaneous RB [93], and correlated RB [94]. For details on the randomized benchmarking protocol used by IBM Quantum, see [93, 95–98].

The experiments necessary for cross-entropy benchmarking may themselves also be used to estimate a depolarisation rate in a similar way to RB [3]. Instead of using random Clifford circuits, however, the random circuits are run. Under the assumption that the action of a random circuit can be described using a depolarising error model (with equal-probability Pauli errors), the Pauli error, $\epsilon_P$, can be estimated as

$$\epsilon_P = (1 - p)(1 - 1/D^2). \qquad (14)$$

Here, p is the depolarisation rate of the survival probability under the action of random circuits, estimated as above. Interestingly, $\epsilon_P$ can be estimated using single and two-qubit RB information.

Several important noise channels, most notably cross-talk, are not included in the device calibration data. As shown in Section 5, the effects of this noise can be inferred through the application-motivated benchmarks we introduce in this work, by showing the trade-off between connectivity of the device and cross-talk [79].

D Empirical Relationship Between Heavy Output Generation Probability and L1 Distance

As discussed in Section 3.4, the theoretical foundations for believing that implementing shallow circuits to within a fixed ℓ1-norm distance constitutes a demonstration of quantum computational supremacy are stronger than for implementations with high heavy output generation probability. That being said, Figure 3 and Figure 11 contain similar features. For example, ibmq_16_melbourne consistently performs the worst, with ibmq_singapore and ibmq_ourense performing the best in both figures of merit. An interesting question, then, is how these two figures of merit generally relate to one another.

If the ℓ1-norm distance were 0, the experimental outcome frequencies would equal the ideal outcome probabilities. Consequently, the heavy output probabilities would be the same between the device and an ideal quantum computer. Because the heavy output probability depends on the circuit in question, when examining the empirical relationship between ℓ1-norm distance and heavy output probability, it is useful to normalise the latter by the heavy output probability of an ideally-implemented circuit. We define the normalised heavy output generation probability as the ratio of the heavy output probability of the device and the heavy output probability from an ideal quantum computer. Therefore, if the ℓ1-norm distance were 0, the normalised heavy output generation probability would be 1.
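As a minimal sketch of the two quantities compared in this appendix, the following Python computes the ℓ1-norm distance and the normalised heavy output generation probability from a table of ideal outcome probabilities and a dictionary of measured counts. The function names and the toy four-outcome distribution are hypothetical and are not taken from the experiments. The sketch assumes that heavy outputs are the bit strings whose ideal probability exceeds the median ideal probability, and that the ℓ1-norm distance is the unnormalised sum of absolute differences.

```python
from statistics import median


def heavy_outputs(ideal_probs):
    """Bit strings whose ideal probability exceeds the median ideal probability.

    Assumption: ideal_probs lists every possible bit string, including
    those with probability 0, so that the median is taken over all outcomes.
    """
    med = median(ideal_probs.values())
    return {x for x, p in ideal_probs.items() if p > med}


def l1_distance(ideal_probs, counts):
    """Sum over outcomes of |ideal probability - experimental frequency|."""
    shots = sum(counts.values())
    outcomes = set(ideal_probs) | set(counts)
    return sum(
        abs(ideal_probs.get(x, 0.0) - counts.get(x, 0) / shots)
        for x in outcomes
    )


def normalised_hog(ideal_probs, counts):
    """Device heavy output probability divided by the ideal one."""
    heavy = heavy_outputs(ideal_probs)
    shots = sum(counts.values())
    device_heavy = sum(counts.get(x, 0) for x in heavy) / shots
    ideal_heavy = sum(ideal_probs[x] for x in heavy)
    return device_heavy / ideal_heavy


# Hypothetical two-qubit example (illustrative numbers only).
ideal = {"00": 0.5, "01": 0.25, "10": 0.125, "11": 0.125}
perfect_counts = {"00": 4, "01": 2, "10": 1, "11": 1}
uniform_counts = {"00": 2, "01": 2, "10": 2, "11": 2}

print(l1_distance(ideal, perfect_counts), normalised_hog(ideal, perfect_counts))
print(l1_distance(ideal, uniform_counts), normalised_hog(ideal, uniform_counts))
```

With counts that match the ideal distribution exactly, the distance is 0 and the normalised probability is 1, in line with the definition above. For this toy distribution, uniform counts give a distance of 0.5 and a normalised probability of 2/3.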

Figure 15: Error per single qubit operations on the devices used in this work. Bars indicate the average error rates; error bars are one standard deviation. Data aggregated based on calibration data collected over the course of our experiments. Devices shown here are: ibmq_singapore, ibmq_16_melbourne, ibmqx2, ibmq_ourense. A logarithmic scale is used. (a) Average readout error. The readout error is the probability the state of a given qubit is incorrectly labelled. (b) Average error per U2 gate. The error per gate is a measure of how accurately the U2 gate is applied.

Figure 16: Average error per CX operation on the devices used in this work. The error per CX gate is a measure of how accurately the CX gate is applied. Bars indicate the average error rates; error bars are one standard deviation. Data aggregated based on calibration data collected over the course of our experiments. Devices shown here are: ibmq_16_melbourne, ibmqx2, ibmq_ourense, ibmq_singapore. A logarithmic scale is used.

As the ℓ1-norm distance increases, the experimental frequencies increasingly differ from the ideal outcome probabilities. Two things may then happen: heavy outputs are produced more regularly, in which case the normalised heavy output generation probability will grow above 1; or less regularly, in which case it will fall below 1. In practice, we expect the distribution produced by the device to converge to the uniform one over all bit strings as the noise increases, so we expect the normalised heavy output generation probability to fall with increasing ℓ1-norm distance.

The empirical relationship between the normalised heavy output generation probability and ℓ1-norm distance is shown in Figure 17. For each circuit, Figure 17 plots the ℓ1-norm distance of the distribution produced by a real device against the normalised heavy output generation probability. As expected, a negative correlation exists between these two figures of merit. For the deepest circuits, and in particular the widest circuits from the deep circuits class, the cluster of points indicates that the normalised heavy output generation probability falls more slowly as the ℓ1-norm distance becomes larger. This is because the minimum value of heavy output generation probability is being reached, which is to say that the output distribution from the real device has converged to the uniform one, while more detail can be extracted by considering the ℓ1-norm distance.

This correlation is encouraging as, in the regime where it becomes impossible to calculate the ℓ1-norm distance, we can be justified in believing that the correlation between the features present in the plots throughout this section persists. This line of reasoning is similar to that used when cross-entropy benchmarking is used to predict demonstrations of quantum computational supremacy in the regime when it too becomes impossible to calculate [3]. Note also that this correlation contrasts with the knowledge that, in general, the probability of producing heavy outputs does not provide an upper bound on the ℓ1-norm distance [54], and reveals that in practice it may be relied upon to do so.
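The expected fall of the normalised heavy output generation probability towards a floor can be illustrated with a toy noise model. The sketch below assumes, purely for illustration, that the device outputs a mixture of the ideal distribution and the uniform distribution, with weight `lam` on the uniform part. This mixture model, the function names and the four-outcome distribution are assumptions and do not come from the experiments reported here. The code fits a least-squares line to the resulting points, in the same spirit as a regression line fitted to measured points.

```python
from statistics import median


def mix_with_uniform(ideal_probs, lam):
    """Assumed noise model: (1 - lam) * ideal + lam * uniform."""
    n_outcomes = len(ideal_probs)
    return {x: (1 - lam) * p + lam / n_outcomes for x, p in ideal_probs.items()}


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares fit y = a*x + b."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical ideal distribution over all two-qubit bit strings.
ideal = {"00": 0.5, "01": 0.25, "10": 0.125, "11": 0.125}
med = median(ideal.values())
heavy = {x for x, p in ideal.items() if p > med}  # assumed heavy-output rule
ideal_heavy = sum(ideal[x] for x in heavy)

l1_values, hog_values = [], []
for lam in (0.0, 0.25, 0.5, 0.75, 1.0):
    noisy = mix_with_uniform(ideal, lam)
    # l1-norm distance between the ideal and the noisy distribution
    l1_values.append(sum(abs(ideal[x] - noisy[x]) for x in ideal))
    # normalised heavy output generation probability of the noisy distribution
    hog_values.append(sum(noisy[x] for x in heavy) / ideal_heavy)

slope, intercept = least_squares_line(l1_values, hog_values)
print(slope, intercept, hog_values[-1])
```

Under this assumed model the relationship is exactly linear: for the toy distribution the fitted slope is -2/3, the intercept is 1, and the normalised probability bottoms out at 2/3 once the output is uniform. Real device data need not follow a single line, since the noise need not act as a uniform mixture.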

[Figure 17 plot: a grid of scatter panels with one column per circuit class (Shallow Circuits, Square Circuits, Deep Circuits) and one row per device (ibmq_singapore, ibmqx2, ibmq_16_melbourne, ibmq_ourense). Horizontal axis: ℓ1-norm distance, from 0.00 to 1.50. Vertical axis: normalised heavy output probability, from 0.2 to 1.2.]

Figure 17: Scatter plot and linear regression line comparing the normalised heavy output generation probability and ℓ1-norm distance. Each point corresponds to one circuit of the class and width as labelled. Colours correspond to the number of qubits, from 2 to 7.
