PHYSICAL REVIEW A 95, 032338 (2017)

Quantum computation with realistic magic-state factories

Joe O’Gorman1 and Earl T. Campbell2 1Department of Materials, University of Oxford, Oxford OX1 3PH, United Kingdom 2Department of Physics and Astronomy, University of Sheffield, Sheffield S3 7RH, United Kingdom (Received 26 August 2016; published 31 March 2017)

Leading approaches to fault-tolerant quantum computation dedicate a significant portion of the hardware to computational factories that churn out high-fidelity ancillas called magic states. Consequently, efficient and realistic factory design is of paramount importance. Here we present the most detailed resource assessment to date of magic-state factories within a surface code quantum computer, along the way introducing a number of techniques. We show that the block codes of Bravyi and Haah [Phys. Rev. A 86, 052329 (2012)] have been systematically undervalued; we track correlated errors both numerically and analytically, providing fidelity estimates without appeal to the union bound. We also introduce a subsystem code realization of these protocols with constant time and low ancilla cost. Additionally, we confirm that magic-state factories have space-time costs that scale as a constant factor of surface code costs. We find that the magic-state factory required for postclassical factoring can be as small as 6.3 million data , ignoring ancilla qubits, assuming 10−4 error gates and the availability of long-range interactions.

DOI: 10.1103/PhysRevA.95.032338

I. INTRODUCTION in constant time, independent of k, by braiding defects in the surface code. Despite this, the same work found that efficiency Architectures for quantum computers must tolerate ex- improvements of block protocols were modest. However, all perimental faults and imperfections, doing so in the most previous work has taken a very pessimistic estimate of the practical and efficient way. One aspect of fault tolerance is fidelity of these protocols, leading to an overestimated cost. We the use of error-correcting codes, which provides a storage present several results improving resource costs and leading to method for protecting from noise. To a more optimistic outlook for realzing magic-state factories. perform quantum computations, additional techniques are We present a method of realizing the Bravyi-Haah block needed to ensure a universal set of quantum gates can be protocols [16] in constant time without relying on braiding implemented fault tolerantly. Most error-correcting codes operations, providing a fast implementation to other architec- natively allow fault-tolerant implementation of gates from the tures. Braiding operations natively support control-NOT (CNOT) Clifford group, a nonuniversal set of gates. Fully functional gates with multiple target qubits in constant time, which is the quantum computation is attained by adding the Toffoli or π/8 key step in the circuit reduction of Ref. [3]. Our approach phase gate to the Clifford group. The prevailing proposal does not require availability of such powerful gates, but rather for performing these gates is to first prepare high-fidelity reduces time costs by borrowing concepts from subsystem magic states, which are then used to inject a gate into the codes [21–24] and notions of gauge fixing [7,9,25]. In main computation. These magic states are needed in vast subsystem codes, operations acting on many qubits are broken quantities, and their preparation requires a significant portion up into more pieces each involving fewer qubits, which we find of a device to operate as a dedicated magic-state factory [1–3]. enables greater parallelization in realizing Bravyi-Haah. Our Alternatives exist to the magic-state paradigm [4–10],butitis proposal is especially applicable to distributed architectures unclear whether they will be feasible substitutes due to worse for quantum computation and provides these devices with an thresholds [11,12]. approach to magic-state distillation that is low in ancilla and Magic-state factories use several rounds of distillation time costs. protocols, and several directions have been explored [15–20] We also show that magic-state factories can make more use to improve efficiency over the original proposal which uses of block protocols than previously thought by using techniques Reed-Muller codes [1]. Notably, an n → k block protocol to reduce and calculate the global output error. Consider a takes n input magic states and output k at higher fidelity, with protocol outputting K qubits with multiqubit global error higher ratios of n to k generally offering greater efficiency rate  . All previous studies have used the union bound, [15–17]. These block protocols do require more complex g also called Boole’s inequality, to assert   K, where K circuits, but there has been limited investigation into the g is the number of magic states and  is the average error full resource cost of these protocols. One advance in this rate of single- outputs, obtained by tracing the qubit direction [3] has shown that block protocols can be realized from the multiqubit state. For an uncorrelated product state, ⊗K K ρg = ρ ,wehaveg = 1 − (1 − ) , so to leading order 2 g = K − O( ), and the union bound is a good estimate for small . However, the output of block protocols can be highly Published by the American Physical Society under the terms of the correlated. Boole’s inequality holds for correlated states, but Creative Commons Attribution 4.0 International license. Further g can be much smaller than K. In distillation protocols distribution of this work must maintain attribution to the author(s) such as Bravyi-Haah, correlations are produced because block and the published article’s title, journal citation, and DOI. protocols reduce the probability of single errors, but pairs or

2469-9926/2017/95(3)/032338(19) 032338-1 Published by the American Physical Society JOE O’GORMAN AND EARL T. CAMPBELL PHYSICAL REVIEW A 95, 032338 (2017) clusters of errors can go undetected and lead to a correlated magic states at an average rate that can just keep up with the error pair. If one error is present, it almost certainly has a “time-optimal” surface code implementation of the algorithm partner. As such, use of the union bound has led to systematic [14], which we further discuss in Sec. VI. overestimation of noise in Bravyi-Haah and other distillation protocols. Here we track these correlations through multiple rounds of a magic-state factory. We take as our starting point II. REALIZING BLOCK PROTOCOLS the triorthogonal codes for producing π/8magicstates[16] Many descriptions of distillation protocols are high level, and a method of producing Toffoli magic states [26,27]. Once leaving open many aspects of how to implement these proto- we begin tracking correlations, it becomes apparent that quality cols. To assess the full resource cost, we require a low-level control of the factory can be improved. Detecting an error in description of distillation protocols in terms of elementary one part of the factory can, due to correlations, indicate a operations, such as one- and two-qubit gates, preparations, likely undetected partner error elsewhere in the factory. We and measurements. We call such a description a realization of introduce the notion of module checking whereby we discard a protocol, and a given protocol can have different realizations magic states brought into disrepute by their correlated partners. with varying costs. The enhanced fidelity of module checking is both analytically This section presents a realization of the Bravyi-Haah estimated and found by numerical Monte Carlo simulations, protocols, giving explicit instructions presented in Fig. 1.The with excellent agreement between these methods. protocol is presented as a four-step process acting on a col- The specification of the distillation protocol is just one as- lection of n noisy |T  states, where |T ∝|0+exp(iπ/4)|1. pect of designing magic-state factories. The size of magic-state The first step uses ancillas to measure operators composed of factories depends on both the distillation protocols and also the Pauli-Z operators, and the second step applies a correction underlying error correction codes, and a significant saving can dependent on the measurement outcomes. The third step be made by judiciously reducing the error correction code uses ancillas to measure operators composed of Pauli-X at earlier rounds of magic-state distillation. The balanced operators, with only certain measurement outcomes kept. After investment of qubits at each round of distillation uses small a successful third step, the k output magic states are within an surface codes for low-fidelity magic states and larger surface encoded state and delocalized across 3k + 8 sites. The fourth codes for high-fidelity magic states. A small fraction of magic step uses measurements to localize the output qubits to specific states in the factory will be high fidelity and require much sites. All these steps are detailed in Fig. 1 and show how larger surface codes. Raussendorf et al. [28] argued that, the multiqubit measurements are broken down into ancilla consequently, magic-state distillation can be achieved at a preparation, two-qubit gates, and single-qubit measurements. constant factor cost over surface code error correction, which Observe in Fig. 1 that each multiqubit measurement involves we also observe and discuss. only four entangling gates, with each such gate designated Taking all factors into consideration, we present a blueprint a distinct colored link. Therefore, the measurement is a for a factory capable of delivering enough magic states to solve four-qubit operator. This is called the weight of the operator. large Shor’s algorithm tasks, beyond the reach of classical When using a single ancilla to measure a stabilizer of weight m, computation, within a surface code quantum computer. Our we need m time steps to perform the required controlled-gate results are summarized in Table I, where we look at both the operations. The low, and constant, weight of our measurements space-time overhead required to produce a Toffoli magic state gives the realization a constant time cost. More commonly, and the physical footprint of the factory required to produce the Bravyi-Haah protocol is presented as requiring only two

TABLE I. The size and time requirements of some examples of magic-state factories. We consider an implementation of Shor’s algorithm requiring 40N 3 Toffoli gates, which dominates the overhead. We realize each of these gates using a single Toffoli magic state or seven T states in parallel [13], whichever proves optimal. In this algorithm, the Toffoli gates are all sequential, so using time-optimal methods [14], the fastest 3 possible run time is 40N tmeas/ff ,wheretmeas/ff is the time taken to make a physical measurement and feed forward the result to selectively perform a single-qubit gate elsewhere in the quantum computer. The number of “physical qubits in factory” neglects qubit cost associated with measuring surface code stabilizers, so for many architectures this number will be doubled. The variable tsc is the time taken to perform a single round of the parallel stabilizer measurements of the surface code, a process involving four CNOT gates, two single-qubit gates and a measurement. We assume throughout that tmeas/ff = 0.1tsc, which is reasonable for a distributed architecture such as ion traps.

Magic states Space-time overhead per Physical qubits in factory (and evaluation time) required magic state in qubit rounds required for time-optimal computation

−3 −4 −3 −4 Type Count pg = 10 pg = 10 pg = 10 , tmeas/ff = 0.1tsc pg = 10 , tmeas/ff = 0.1tsc

−3 −5 −3 −5 Problem tsc = 10 s tsc = 10 s tsc = 10 s tsc = 10 s 1000-bit Shor Toffoli 1010.60 1.41×107 5.35×105 1.73×108 1.73×108 6.30×106 6.30×106 (6.6 weeks) (11 h) (6.6 weeks) (11 h) 2000-bit Shor Toffoli 1011.51 1.66×107 5.71×105 2.18×108 2.18×108 6.97×106 6.97×106 (53 weeks) (3.7 days) (53 weeks) (3.7 days) 4000-bit Shor Toffoli 1012.41 1.94×107 6.12×105 2.50×108 2.50×108 7.69×106 7.69×106 (8 years) (4.2 weeks) (8 years) (4.2 weeks)

032338-2 QUANTUM COMPUTATION WITH REALISTIC MAGIC- . . . PHYSICAL REVIEW A 95, 032338 (2017)

FIG. 1. Explicit circuit for realizing Bravyi-Haah (3k + 8) → k block protocols for k = 2,6,10,14,....Squares indicate the (3k + 8) noisy |T  magic states to be distilled, and circles represent ancilla qubits used to effect measurements on the magic states. Increasing k does not increase the number of time steps in the protocol but increases the number of qubits involved in a block. As we increase k, we add qubits to the tail end, with the protocol translationally invariant along the tail. We report that we have independently confirmed the validity of the protocol for k = 2 by full wave-function simulation, which further confirmed that all single errors are detected and all two errors processes lead to outputs with correlated errors.

measurements of X-type observables that have weight 2k + 4, is the time for the physical gate, d is the code distance, and tsc implying a potentially expensive time cost. However, the is the time for a round of stabilizer measurements. Therefore, concept of gauge subsystem codes provides a method whereby for large distances the time is dominated by 8dtsc. We will such complex measurements can be broken down into a larger also take this as the time cost of a merge operation [29,30] number of simpler measurements [21–24]. In Appendix C we which we use to project two ancillary qubits into the even- present a rigorous demonstration that the Bravyi-Haah code or odd-parity subspace. We assume entangling gates can be can be viewed as a subsystem code, and using a combination performed in parallel, but a qubit can participate in only one of gauge fixing and a cat-state ancilla, we create a circuit of gate at a time. We allow entangling gates to be long range depth 4 for both X and Z measurement sets. We call this as is feasible within distributed architectures for quantum general approach gauge-MSD, where MSD is short for magic computing [31–38], although this may be relaxed at only a distillation. We remark that constant time realization was also modest increase in resources. found by Fowler et al. [3] but was tied to a monolithic braiding Our realization uses a number of ancilla qubits, so that architecture. in addition to the n = 8 + 3k qubits being distilled we also The time complexity of our realization is simply use nanc = 3k + 6 ancilla qubits, giving ntot = 6k + 14 logical qubits in total. These ancillas appear as circles in Fig. 1. With t = 8t + t + 2t + 3t ∼ 8dt , (1) block cnot A prep measure sc these logical qubits encoded in a distance d , the = + 2 where tcnot is the two-qubit gate (e.g., CNOT)√ time, tA is total qubit cost is Ntot (6k 14)d . Therefore, the total 3 the gate time for single-qubit A = (X + Y )/ 2 rotation, space-time cost amounts to Ntottblock ∼ (48k + 112)d , which tprep is the single-qubit preparation time, and tmeasure is the compares well against other approaches (see Appendix A). single-qubit measurement time. Here all operations are applied When using the toric code, A gates are also not naturally fault tolerantly to logical qubits within an error-correcting fault tolerant and thus require their own process of state code. Therefore, the time scales are for fault-tolerant gates. distillation and injection. The time tA to implement such a gate We will assume throughout that logical CNOT gates are applied is therefore the time taken by the logical CNOT which teleports transversally and thus take time tcnot = tg + d × tsc, where tg the gate into the computation plus any Clifford correction that

032338-3 JOE O’GORMAN AND EARL T. CAMPBELL PHYSICAL REVIEW A 95, 032338 (2017) is required, although this can be rolled into a later Clifford This module-branch structure is common to all proposals operation. The resources required for the state distillation of to date. Such explicit terminology has not previously been A are far less than those of the T gate, and the ancilla resource introduced but rather has been left as an implicit consequence is reusable for many A gates [17,39]. As such, we neglect the of statements about correlation avoidance. Establishing clear overhead of these gates as a small additional overhead to the vocabulary about this structure is important as we delve into main process of |T  distillation. the effect of postselecting at different levels on this structure. Previous protocols have considered whether individual blocks III. OVERVIEW succeed or fail; we call this block checking. Below, we outline why it can be advantageous to postselect on the level of the A. Blocks, branches, and modules modules, which are collections of blocks. We propose an We begin by introducing some helpful vocabulary for de- additional quality check, so that the whole module is discarded scribing magic-state factories. Efficient distillation uses n → k whenever any of its blocks fail. We call this module checking. block protocols that take n noisy |T ∝|0+exp(iπ/4)|1 A block will always detect a single incoming error but might and with some probability output k states of a higher fidelity. fail to detect a pair of errors. When a block detects an error, it Such a process we call a block, and the previous section indicates the presence of damaged branches, and since errors described the details of the inner working of such a block. cluster together within branches, this increases the likelihood Now we treat each block as a black box with known relations of errors in other blocks throughout the module. Consider between input and output and consider how these blocks are when two branches fail, each with a correlated error pair; the composed together. first branch sends damaged qubits to blocks 1 and 2, and the Distillation protocols have many levels forming a treelike second branch sends damaged qubits to blocks 2 and 3. Since structure with many branches that merge at points we call blocks 1 and 3 each received a single erroneous qubit, they modules, shown in Fig. 2. Branches contain many qubits, will detect them. But block 2 receives a pair of errors, so they which are potentially correlated. However, the inputs to block may go undetected. A simplified illustration of this process is protocols must not be correlated, so each qubit in a branch shown in Fig. 2. Module checks improve fidelity by preventing must be fed into a different block. Therefore, as we enter a these processes from degrading the output fidelity. Even with module, a branch of Bl qubits is split up so that each qubit module checks, it is possible for a pair of corrupted branches to enters a different block. If each block implements a nl → kl go undetected, but both branches must carry exactly the same protocol, then the whole module can be thought of taking pattern of errors, which is a very rare occurrence. Blnl inputs to Blkl outputs. This entails that Bl+1 = Blkl and that each module has nl branches feeding into it. Initially, IV. THE G-MATRIX FORMALISM = = branches are single qubits, B1 1, and so Bl 1j

032338-4 QUANTUM COMPUTATION WITH REALISTIC MAGIC- . . . PHYSICAL REVIEW A 95, 032338 (2017)

Conditioned on success, the normalized distribution on output and output states have global infidelity = states is Prout(y) Prunnorm(y)/Psuc. Given an explicit form B (l)  l , for G0 and G1, this completes the black box picture of block g + (7) protocol performance. These formulas form the basis upon Al Bl which we build both our analytic and numerical analysis in the where we have made use of the following: following sections. (n1n2···nl ) The G-matrix formalism of Bravyi and Haah has been Al = (1 − ) , (8) significantly generalized [40,41]. This extension provides l ··· − l B = C 2 (1 − )(n1n2 nl 2 ), (9) protocols that convert noisy T magic states into another species l l l capable of injecting complex multiqubit circuits. Included in − = 2l j this framework are protocols, based on G matrices, which Cl ηj (v) . (10) provide resources for implementing Toffoli gates. Protocols j=1 v independently proposed by Jones [27] and Eastin [26] realized The most important quantity in Theorem 1 is Cl.Aftertwo error-suppressed Toffoli gates, and here we consider a variant 2 rounds C2 is simply [ y η1(y) ][ y η2(y)]. After l rounds, based on G matrices that we discuss further in Appendix E.All Cl is a product of l terms. One may approximate the theorem three variants perform identically when we use block checking. (l) ∼ 2l to leading order g Cl . However, our later numerical However, with the G-matrix formalism we can again use investigations found that such an approximation was too module checking to track correlations and achieve superior coarse, and we really need the slightly more complex form error suppression. This is just one additional application given in the theorem. We postpone proof of this result to of module checking; the technique can be deployed in Appendix F. conjunction with the general class of protocols introduced in For the Bravyi-Haah protocols it is easy to verify the Refs. [40,41]. following: ⎧ ⎨3, |y|=2, V. ANALYSIS OF MODULE CHECKING η (y) = 4, |y|=k, (11) BHk ⎩ We present a method of tracking the leading-order errors, 0, otherwise, accounting for correlations, through many rounds of module- so that for k>2wehave checked protocols. At each level of distillation the protocol k k(k − 1) is characterized by a function η that summarizes how well it η (y)m = 4m + 3m = 4m + 3m , (12) BHk 2 2 tolerates leading-order errors. y Definition 1. For every distance-2 G-matrix code that → k → where the binomial factor arises in counting the number of y distills n k qubits, we define a function η : Z2 Z, taking | |= = values where y 2. When k 2, we have a special case as then the |y|=2 terms and |y|=k terms are the all same, so = { | |= = = } η(y): # y : x 2,G0x 0,y G1x , (4) η (y)m = 7m. (13) BH2 |···| | |= y where is the weight y j yj and # counts the number of elements in a set {···}. In other words, the value η(y) counts For the Toffoli protocol we have the number of inputs x such that (1) they are weight 2 [formally, ηTof (y) = 4,y = 0, (14) (|x|=2)], (2) they are undetected by the protocol (formally, so G x = 0), and (3) they give y as output (formally, G x = y). 0 1 m = × m Since η(y) counts the number of lowest-weight errors ηTof (y) 7 4 . (15) leading output y, the total error rate for one round of distillation y can be simply estimated as Given expressions for η(y)m, we can compose these protocols anyway we wish and obtain an estimate of the global error rate as given in Theorem 1.  = η(y) 2 + O(4). (5) g We perform numerical Monte Carlo simulations by sam- y pling from the probability distribution of the raw magic states, |x| N−|x| Counting errors over many rounds is a more subtle problem, where Pr(x) =  (1 − ) , and track its evolution through but we find that η still provides sufficient information to the magic-state factory. perform this calculation. If each round can even use a different Our simulations track the progress of potentially damaging protocol, we label the corresponding function with a subscript. input error configurations of the initial raw magic states though We now state our key result. the factory. Direct Monte Carlo simulation is possible for one Theorem 1. Consider L rounds of distillation with module to two rounds of distillation, but with three or more rounds the checking, with associated functions η1,η2,...ηL. Such a failure events become so rare that brute force simulation fails protocol outputs a multiqubit magic state where the lth-level to provide statistically meaningful data. Rather, we use a “rare- modules succeed with probability event” technique that preselects cases with two or more corrupt branches. A full description of the method employed can be A + B found in Appendix G. We thus investigate the performance of a P  l l (6) suc,l n (Al−1 + Bl−1) l factory which makes use of module checking for a range of raw

032338-5 JOE O’GORMAN AND EARL T. CAMPBELL PHYSICAL REVIEW A 95, 032338 (2017)

FIG. 3. (a) and (b) The cost C of the Bravyi-Haah protocols utilizing both block checking and module checking. It can be seen that only around the transitions to an additional level of distillation is block checking very slightly preferable. magic-state error rates between 0.1% and 1%. We find that the VI. FACTORY OVERHEAD ANALYSIS leading-order analytic estimates match well with the numeric While the “cost” is a useful guide to the performance of results, with the difference between the two being of the order a distillation protocol, it fails to capture several important of a few percent in the investigated parameter regime. In fact, features of real magic-state factories. The very purpose of wherever the block protocols are involved, if k  14, we find magic-state distillation is to supplement the shortcomings the percentage difference between the numerical simulation and analytic estimate is <10%. This discrepancy between the of a particular error-correcting code, which is to say that analytic estimate and numerical simulations is not visible on a magic-state factory is implemented at the level of logical log-log plots presented in Appendix G. qubits, with the quantum information already encoded. As The cost of a protocol is the average number of raw magic such, the more relevant question is not how many noisy input states consumed to produce one higher-fidelity magic state. magic states are required per output state, but rather the number For an n → k protocol, which takes in states with error p,this of the physical qubits that will build up such a factory and the is rate at which the distilled magic states are produced. Both of these numbers are of key importance to determining the size of n C(p) = . (16) a quantum computer and the run time of an algorithm. A single kPs (p) number, the “space-time overhead” of the magic-state factory, l−1 captures these both as a figure of merit, which was also studied For l rounds of distillation we have C (p) = C (p ). l i=1 i i in Ref. [3]. In this section we present a comparison based Figure 3 compares the cost of a Bravyi-Haah magic-state on the space-time overhead of implementing a magic-state factory with block checking and module checking, showing the minimum cost achievable for given output error. For output factory in a surface code quantum computer, utilizing module error rate and success probability we use known expressions checking, where appropriate, and our implementations of the for block checking, and for module checking we use the Reed-Muller, Bravyi-Haah, and Toffoli protocols. success probability and global error given by Theorem 1 with We consider a number of issues in this estimate of the an estimate of the reduced error rate on a single qubit given footprint of a magic-state factory, including (1) balanced by the global error rate estimate divided by the total number investment, the use of smaller surface codes during early of qubits in this factory’s output. rounds of magic-state distillation, and (2) clock-rate zoning, We find that module checking is superior to block checking cycling through distillation attempts faster during early rounds for a large proportion of target error rates and can use up to of magic-state distillation. We will assume throughout that 4 times fewer raw magic states in some regimes. However, we use the method outlined by Li [42] to inject the initial near a transition from j to j + 1 rounds of distillation, raw magic state into a logical surface code. We will therefore module checking loses it advantage and may even be slightly assume throughout that the initial error rate on a magic state = outperformed. The best error rates that can be achieved for before distillation is in 0.4pg. We also use the “rotated- a given number of rounds use low-k block codes; for these lattice” surface codes [29], such that a distance d surface 2 the benefit in the global error rate of module checking over requires d physical data qubits. Of course a practical surface block checking is smaller (see Table II), while the success code requires the use of physical ancilla qubits to make probability is, of course, much inferior. Above the transition the stabilizer measurements of the code; we leave this as higher k values are used, for which the success probability is an extra multiplicative factor to be applied to our overhead much lower, and this washes out the benefit of the superior calculated here, as different physical realizations have different error suppression over block checking, which has a higher requirements. We estimate the surface code distance required to protect a logical qubit up to error rate Pl using the relation success probability. In these regimes, as one might expect, d+1 module checking is not the optimal approach. In those regions Pl(d,pg) = d(100pg) 2 [3]. between the transitions, module checking allows the use of Given an algorithm and some implementation of it, the higher-k protocols, which are more efficient, to achieve the number of magic states required is determined in advance. For error rates of lower-k block-checked protocols. a given distillation protocol we must then determine whether

032338-6 QUANTUM COMPUTATION WITH REALISTIC MAGIC- . . . PHYSICAL REVIEW A 95, 032338 (2017)

TABLE II. The leading coefficient Cl for a variety of protocols with two and three levels of distillation. For clarity we also show Cl in the large block limit (k →∞). When we write BHk, we implicitly assume k>2, as the results differ slightly for the k = 2 case. The final column shows the ratio between the union bound estimate made by utilizing the reduced error rate on a single qubit BH made by Bravyi and Haah and 2l the corresponding estimate of the global error rate given by Cl  . It can be seen that the benefit (in error rate) of module checking scales with both k and the number of rounds of distillation.

4 Level 1 Level 2 Cl limk→∞ Cl limk→∞ k1k2BH/   9 3 27 2 2 3 2 BHk BHk 16 + k1(k1 − 1) 4 + k2(k2 − 1) k k 27k k 1 2 2 2  4 1 2 1 2 3 2 2 Tof BHk 112 4 + k(k − 1) 168k 2352k 2  + 9 − 2 3 BHk Tof 16 2 k(k 1) 28 126k 252k 8 Level 1 Level 2 Level 3  Cl   limk→∞ Cl limk→∞ k1k2k3BH/ 81 9 3 2 2 2 5 3 2 BHk BHk BHk 256 + k1(k1 − 1) 16 + k2(k2 − 1) 4 + k3(k3 − 1) 273.375k k k 2187k k k 1 2 3 2 2  2  1 2 3 1 2 3 9 3 2 2 4 3 3 2 Tof BHk BHk 1792 16 + k1(k1 − 1) 4 + k2(k2 − 1) 12096k k 28 × 3 k k 1 2 2  2  1 2 1 2 + 81 − + 9 − 2 2 5 3 BHk1 BHk2 Tof 256 2 k1(k1 1) 16 2 k2(k2 1) 28 5013k1 k2 20412k1 k2

its magic-state factory is capable of producing magic states of enc to be 0.1glo/k to ensure that the error due to each qubit’s fidelity great enough that there is a high probability that none finite encoding is much less than the reduced error ∼glo/k on of the magic states required for the algorithm fail. Thus, we that qubit. The final output of the factory can then ne encoded set a target global error rate that our factory must achieve: according to target. For a block-checked factory (or the original 15-to-1 Reed- = − 1/N target 1 (Psuc,alg) , (17) Muller protocol) we do not have worry about “protecting” correlated errors. This means we can work backwards from where Psuc,alg is the desired success probability of our algorithm (i.e., the probability that every non-Clifford gate our error target to determine an efficient balanced investment works) and N is the number of successful iterations of the of qubits, as illustrated in Fig. 4. For a local target error of = −14 factory needed to produce the desired number of magic states. ptop 10 the top level of distillation needs vPL(d,enc) < × −14 For example, a magic-state factory which utilizes three rounds 0.1 10 , where v is the space-time volume of a single of Bravyi-Haah distillation with a k value of 10 in each round block of distillation, which is the number of surface code qubits will produce 1015 |T  states after 1012 successful iterations. in the block multiplied by the number of times they undergo A 90% success probability for the algorithm as a whole then d rounds of surface code stabilizer measurements. Again, we −13 use a distance corresponding to a lower error rate by a factor implies target = 1.05 × 10 . We must then check that the factory is capable of producing an output of this quality. If of 10 to suppress any error injected by the logical circuitry the factory is module checked, then this “10-10-10” factory above that which is left over after distillation. The distance −16 required for the next level down can then be determined by, has a global error rate glo = 2.3 × 10 , making this a valid = 3 protocol. However, the estimate of the global error rate of a in this example, pi−1 ptop/35 for the 15-to-1 protocol, 3 −11 which gives distance di−1, and so on until pi >pin. block-checked factory gives 10 × BC ∼ 10 , so this would not be a valid factory for this task.

A. Balanced investment Once we have established that a factory is valid, we can calculate the space-time overhead per magic state that it produces. This is done by calculating the distance of surface code required at each level of distillation di , the length of time in surface code cycles that each round of distillation will take Ti , and the number of logical qubits Qi , including logical ancillas, required at each level of the factory. Determining the distance required at each level requires slightly different methods depending on whether module or block checking is used. To obtain the benefits of module checking we cannot make full use of balanced investment at the lower levels because doing so would inject noise at FIG. 4. The concept of balanced investment. During the initial a rate comparable to or greater than the rate of correlated rounds of distillation smaller surface codes can be used to encode the error. We can estimate the total global error output in the logical information. The factory requires only logical surfaces of the output of a n → k protocol as tot ∼ glo + kenc, where enc distance corresponding to the target error rate of the round in which is the random error rate resulting from the size of the logical that qubit is involved. This target in each round will depend on the encoding chosen. As such we determine the distance of the final error target, the protocol being used to achieve it, and the error code at the intermediate levels based on a desired error rate rate of the raw state.

032338-7 JOE O’GORMAN AND EARL T. CAMPBELL PHYSICAL REVIEW A 95, 032338 (2017)

FIG. 5. Space-time overheads of producing both (a) and (b) |T  states and (c) and (d) |Toff  states. Note that module checking is more beneficial when one of the rounds of distillation is Toffoli.

The length of time Ti for each round of distillation can be without the need to decrease the rate of the factory. As such, simply determined by the protocol used and how many times the time taken for a round to complete in the distillation factory we attempt it before abandoning the round. We assume that is Ti = τi ti , where ti is the number of attempts that you allow measurements can be completed in one time step, and the time at each round. In all our simulations, any “idle time” that the scale of the CNOT gates, preparation, and A gates is dominated qubits experience is counted towards the space-time cost. by the requirement for d rounds of stabilization afterwards. Therefore, we let the time for each of these be d × t .The sc C. Numerical simulations time taken to implement the distillation protocol in round i is then We have therefore arrived at the expression for the full ⎧ space-time volume V in qubit rounds occupied by a factory, ⎨11tsc × d, round i uses Bravyi-Haah, = 12t × d, round i uses Toffoli, V N { } { } τi ⎩ sc (18) (in,pg, , ki , ti ,Psuc,alg) 13tsc × d, round i uses Reed-Muller. r Q T d2 r Q τ t d3 = i=1 i i i = i=1 i i i i , r P r P (19) B. Clock-rate zoning i ki suc,i i ki suc,i

Balanced investment also allows another advantage. In the where ki labels the magic-state protocol used at each round context of surface code computing, the distance of the code is and Pi is the probability that round i of distillation will relevant for not only the spatial dimensions of the computer succeed in producing enough magic states to feed the next but also the execution time. A surface code of distance d round. All rounds must succeed for the factory to successfully r V must undergo d rounds of parallel stabilizer measurements to output i ki magic states. As discussed, is a function of protect from measurement errors. As such, the time taken for several variables, the raw magic-state error rate in, the error a logical operation is proportional to d (except those which of gate operations pg, the total states required N , the protocol can be handled in software), and therefore, using balanced chosen {ki}, the repetitions allowed in each round {ti }, and investment, the initial rounds of distillation will take less time. the probability with which you wish the algorithm to succeed In the hypothetical case that a round of distillation leads to Psuc,alg. a squaring of the input error rate, this would correspond to In practice we calculate V for one, two, and three rounds a doubling of the code distance required by the next round of distillation using combinations of the Bravyi-Haah, Reed- (by the exponential suppression of error with distance of a Muller, and Toffoli protocols with our proposed implementa- subthreshold surface code). Therefore, one can repeat the first tions, with both block and module checking, and a variety of round of distillation twice in the time taken for the second combinations of ti . We then search for the most space-time round of distillation to be completed. This increases the chance efficient method of producing either T or Toffoli magic states, that all the necessary magic states for the second round will given a pg, assuming in = 0.4pg, the number of magic states have been produced in time for the next round of distillation, desired, and an arbitrary choice of 90% for Psuc,alg.

032338-8 QUANTUM COMPUTATION WITH REALISTIC MAGIC- . . . PHYSICAL REVIEW A 95, 032338 (2017)

Figure 5 shows the results of these simulations which far its most expensive part, and choose the minimum Toffoli suggest that module checking can provide an improvement gate-count implementation, which has Toffoli count and depth of a factor of 3 in the space-time overhead in certain parameter equal to 40N 3 for an N bit number [43]. We then determine regimes. We also see that in some regions module checking can the minimum possible space-time overhead per magic state be detrimental by a small amount, such as near the transition for this task and also the smallest possible factory, in terms from i to i + 1 rounds of Bravyi-Haah. of physical qubits, that can produce all the magic states We use the numbers we have generated to estimate the necessary while keeping up with a time-optimal quantum size of a magic-state factory required to perform some computation [14]. postclassical factoring tasks using Shor’s algorithm. Our The smallest possible factory is not necessarily the most results are summarized in Table II, in which we approximate space-time efficient factory possible: these factories tend to Shor’s algorithm as modular exponentiation, as this is by use larger k blocks which can make the factory formidably vast (see Fig. 2) but able to produce more qubits in the same time as lower-k protocols. However, these may well produce magic states much faster than required by the fastest possible implementation of the algorithm. Figure 6 shows that it is possible to use fewer qubits than required by the time-optimal implementation and perform your computation at a reduced speed. Equally, it is, of course, possible to increase the size of the factory to produce magic states at a faster rate. In this case, though, the extra qubit overhead is effectively wasted unless the computer is performing many calculations in parallel. We find that all the magic states required for a time-optimal factorization of a 1000-bit number can be produced with a surface code magic-state factory of 5.6 million “data qubits” if the infidelity of operations on physical qubits is 10−4.Thisis based on the cost of d2 physical qubits to store the information in the rotated lattice surface code [29]. However, for many architectures this number must be doubled to provide ancillas responsible for syndrome extraction. In this case the physical qubit overhead would be ∼11 million. A summary of the time and space overheads for some example Shor tasks can be found in Table I. We selected Shor’s algorithm as a benchmark as the number of non- required has been well studied and these results have been used in previous analyses. We note that due to the large resource overhead that we have demonstrated, early quantum computers may focus on other problems, particularly in

FIG. 6. To produce T magic states at a “time-optimal” rate the factory must, on average, produce states at a rate equal to the fastest rate at which they can be used in sequence by an algorithm. In this paper, where we have assumed that tsc = 10tmeas/ff , this means ten magic states must be produced on average every tsc to keep up with the time-optimal implementation of the algorithm. As this figure makes clear, it is indeed possible to produce a given number of magic states faster (slower) than this using more (fewer) qubits. Each data point represents one possible magic-state factory given the input error and target number of 1016 T magic states; only the factories near the boundary between possible and impossible factories are shown. Not shown in the lower right of each graph are myriad other possible FIG. 7. The scaling of resource cost in qubit rounds per magic factories of lower rate and higher overhead. As seen clearly in (a), state is not worse than that of the surface code. This corroborates the doubling the size of the factory can allow one to more than double predictions of Raussendorf et al. [28] and suggests that the asymptotic the rate of magic-state production when the larger factory allows the overhead scaling ∼O( log(N)3) of the surface code is applicable to use of the higher-k (and therefore higher-rate) block codes. This point universal fault-tolerant computing with gate counts in the regime of is less evident in (b), where only two rounds of distillation can be practical interest. Lines are fitted functions of the form V(pg,N) = sufficient and the higher-k block codes are not necessarily optimal. a log(N)b + c.

032338-9 JOE O’GORMAN AND EARL T. CAMPBELL PHYSICAL REVIEW A 95, 032338 (2017) quantum chemistry. A recent analysis of some such problems phenomenological threshold for the surface code. Although [44] demonstrates that they are solvable with overheads lower a full circuit-based threshold has not yet been determined, it is than we find here, albeit that work makes more optimistic unlikely to challenge that of the surface code due to the higher assumptions of gate times and fidelities. weight stabilizers required. This points towards 3D color codes Of significant interest is how the cost of magic-state dis- requiring physical error rates below 0.1%. Resource costs are tillation compares to the surface code overhead. Raussendorf heavily influenced by proximity to threshold, so 3D color codes et al. (see Sec. 6.2 of Ref. [28]) were the first to note that seem to require significantly lower physical error rates before using balanced investment in a magic-state distillation leads they can start to compete with surface codes augmented by to a constant factor overhead compared to the CNOT gate. Our magic states. Therefore, with current technology and fidelities, numerical results in Fig. 7 show a ratio between T -gate cost and known schemes for avoiding magic states are a false economy. CNOT cost in the range of ∼150–310 when 1010

032338-10 QUANTUM COMPUTATION WITH REALISTIC MAGIC- . . . PHYSICAL REVIEW A 95, 032338 (2017)

S by an Abelian group called the stabilizer , which is a code has stabilizer S, and these groups satisfy S˜ ⊂ S ⊂ Sg. subgroup of the Pauli group. The projector onto the code However, Sg is inflated in size compared to S and is no longer ∝ = ∈ S space is  s∈S s, so that s  for all s . There Abelian. Furthermore, one can verify that Sg does not contain always exists a minimal set of operators {S1,S2,...Sm} that any logical operators as follows. First, note that Bravyi and generates the group, which we denote by S = S1,S2,...,Sm. Haah use triorthogonal (also called triply even) matrices where For the Calderbank-Shor-Steane (CSS) codes, these generators for any f,g ∈ G we have (f,g) = 1 if and only if f = g and can be chosen so that they are all either X type or Z f ∈ G1. type, as we define shortly. If g is a binary vector of As logical operators we take X[l] and Z[l] for each l in length n,weuseZ[g] =⊗n Z and, similarly, X[g] = j:[g]j =1 j G1. From the triorthogonality of G we see that X[l] and ⊗n X .WesayZ[g]areZ-type operators and X[g]are Z[l] anticommute, but X[l] and Z[l] commute when l = l. j:[g]j =1 j X-type operators. Therefore, a CSS code has generators S = To properly describe a subsystem code, where measuring the Z[f1],...,Z[fa],X[g1],...,X[gb]. Commutation of X[g] gauge operators does not damage the logical qubits, we require and Z[f ] is equivalent to (f,g) = 0, where we use the inner that the logical operators are not elements of the gauge group product (f,g):= fg (mod 2). Sg. Recall that the gauge group is defined by vectors that reside ⊥ Bravyi and Haah introduced the notion of a G matrix in the dual code G . Therefore, every gauge operator must G which is a binary matrix composed of two submatrices, G1 have a vanishing inner product with every row in . However, ∈ G = and G0. We label the rows of G as {g1,...,gb,gb+1,...gm}, l , and (l,l) 1, so l is not in the dual space, and the where the rows {g1,...,gb} belong to G0 and the rows logical operators are not gauge operators. This completes our {gb+1,...,gm} belong to G1. These matrices define a CSS proof that the logical operators indeed lie outside the gauge code as follows. The rows of G0 are some set {g1,...,gb} group. that specifies the X-type generators {X[g1],...,X[gb]}.The G⊥ Z-type generators are given less explicitly. Denote as APPENDIX C: REALIZING THE BRAVYI-HAAH the binary vector space orthogonal to both G0 and G1, that ⊥ PROTOCOLS is, G :={g :(f,g) = 0; ∀ f ∈ G0,G1}. Note that Bravyi and ⊥ ⊥ Haah required that G0 ⊂ G . We define G as some (minimal) There are many routes to realizing magic-state distillation. ⊥ matrix with rows {f1,...,fa} that generate the group G Assuming perfect Clifford operations, different realizations under row-wise modular addition. This defines the Z-type suppress errors equally but differ in terms of temporal depth generators {Z[f1],...,Z[fb]}. Note that there exist many and required ancillas. Many of these potential realizations different choices for G⊥, which all result in the same CSS have only been sketched, without a complete assessment of code with Z-type generators {Z[f1],...,Z[fa]}. resources involved. Here we introduce a method particularly This completes the description of the space, suitable for architectures implementing logical gates via although we also need to know how information is stored transversal operations or lattice surgery [29]. Conceptually, within the subspace. We have that the X operator for the kth we are inspired by notions of subsystem codes and gauge- logical qubit is X[gk], where gk is a row of G1,sob + 1  fixing techniques and so call our approach gauge-MSD. We k  m. These are representatives of the logical operators, with consider only even k with k ∈{2,6,...,4m + 2,...} as then equivalent logical operators differing by only a stabilizer. For the Bravyi-Haah codes have transversal T gates. For k ∈ { } Bravyi-Haah protocols all rows in G1 have odd weight, so we 0,4,8,...,4m, . . . , the Bravyi-Haah codes have transversal mayalsotaketheZ operator for the kth logical qubit as Z[gk], T gates only up to a nonlocal Clifford correction. where gk is the kth row of G1. 1. Outline of protocol 2. Subsystem Bravyi-Haah codes Here we present an outline of gauge-MSD, with details of Given a G matrix, we define a subsystem code with how to realize multiqubit Pauli measurements postponed until the next section. First, we specify some notation. Refer back to stabilizer S˜ := Z[g1],...,Z[gb],X[g1],...,X[gb], where { } Appendix B for definitions of the G matrix and G⊥ matrix. Let g1,...,gb are the rows of G0. Notice that now there are equal ⊥ numbers of X and Z stabilizers, and they both correspond to R be a binary matrix such that G · R = 1l (mod 2), which is ensured to exist by virtue of the fact that G⊥ is full rank. the rows of G0. Bravyi and Haah considered a class of matrices Furthermore, let M be a binary matrix such that M · G⊥ = G obeying triorthogonality conditions, which require that G0 ⊂ 0 ⊥ ⊥ G S˜ ⊂ S (mod 2), which must exist since G0 is in the span of G . . Therefore, we have that , with the subsystem code ⊥ having strictly fewer stabilizers than the original Bravyi-Haah Explicit examples of G , R, and M will be given in the next ˜ ∝ section. Measurement outcomes will be recorded in binary: 0 code. We denote the subsystem projector as  s∈S˜ s. We further take the logical operator for the subsystem code for +1 eigenvalues and 1 for −1 eigenvalues. Let O, X , and to be identical to those of the original Bravyi-Haah code. Z be disjoint sets: This leaves some degrees of freedom as neither stabilizers O ={6 + 3j|j = 1,2,...k}, nor logical operators. We define the gauge group Sg := X ={ } Z[f1],...,Z[fa],X[f1],...,X[fa], where {f1,...fa} are 1,2,3 , ⊥ rows of G . Notice that Sg contains S by virtue of Z ={ + + | = } ⊥ 4,5,6,7,8,7 3j,8 3j j 1,2,...,k . (C1) G0 ⊂ G . Let us recap. Our subsystem code is defined by its stabilizer Associated with these sets are binary matrices that allow us to S˜ and gauge group Sg, whereas the original Bravyi-Haah compute Pauli corrections; HZ is a k-by-|X | matrix and HX is

032338-11 JOE O’GORMAN AND EARL T. CAMPBELL PHYSICAL REVIEW A 95, 032338 (2017) a k-by-|Z| matrix as follows: where  = XZ is the full projector onto the Bravyi-Haah stabilizer code space. 4 5 6 7 8 10 11 13 14 16 17 ... We have obtained the desired  projection required by the ⎛ ⎞ 011111 1 0 0 0 0 Bravyi-Haah protocol. But we have picked up an additional ⎜011110 0 1 1 0 0 ...⎟ X,γ . This additional projection results from measuring some = ⎜ ⎟ gauge operators of the subsystem variant of Bravyi-Haah. HZ ⎝011110 0 0 0 1 1 ⎠(C2) . . Thus, the logically encoded qubits are unharmed, but there . . has been a change in the gauge degrees of freedom. 123 In step 5, we perform single-qubit measurements to isolate ⎛ ⎞ the k logical qubits from the n qubits in the subsystem code. 110 Recall that in the original presentation of Bravyi-Haah we ⎜ ⎟ 110 have logical operators Z[l ] and X[l ]forthejth logical H = ⎜ ⎟ (C3) j j X ⎝110⎠ qubit, where l is the jth row of G . The measurements . j 1 . localize these logical operators onto single qubits, up to a Pauli correction depending on the measurement outcomes. where the numbers above the columns correspond to the That is, the measurements cause the state to become stabilized Z X elements in sets and . Last, we will also make use by some new ±Z[w] operators, and every logical Z[lj ] can ⊥ of a k-by-dim(G ) matrix denoted Q that satisfies G1 = be multiplied by these operators to obtain a single-qubit ±Z ⊥ W + QG , where W has Wj,6j+3 = 1, Wj,1 = Wj,2 = 1 and operator acting on the (6 + 3j)th qubit. It is straightforward to is zero everywhere else. An example Q is given in the next verify that the ± sign is corrected to + by the Pauli correction section. X[HZmZ]. To show that X logical operators are also localized We now state the protocol: on a single output qubit, we first multiply X[lj ]byX-type ⊥ (1) Measure all Z[fk], where fk is the kth row of G , gauge operators until it acts on qubits 2, 3, and (6 + 3j). recording outcomes as μ = (μ1,μ2,...,μk). Measuring qubits 1, 2, and 3 completes the localization of = (2) Apply A[w], where w Rμ (mod 2). X[lj ] onto the (6 + 3j)th qubit. The required Pauli operator is ⊥ (3) Measure all X[fk], where fk is the kth row of G , now X[HZmZ + Qγ ], where there is also some dependence recording the outcome as γ = (γ1,γ2,...,γk). on the eigenvalues of gauge operators obtained in step 3. (4) Declare success if Mγ = (0,...,0) (mod 2) and failure otherwise. 2. Implementing Pauli measurements in minimum depth (5) Measure qubits in set X in the X basis, recording Implementing the protocol requires a set of Z measure- outcomes as mX. Simultaneously, measure qubits in set Z ments,followedbyasetofX measurements. In the original in the Z basis, recording outcomes as mZ. (6) Qubits in X and Z are discarded, while qubits in set O standard implementation the X measurements involved many are retained as output as qubits (1,2,...,k). qubits and so were difficult to implement. However, the previous section shows that the difficult X measurements (7) Apply Pauli corrections X[HZmZ]Z[HXmX + Qγ ], or update the Pauli frame accordingly. can be replaced with lower-weight measurements mirroring the Z measurements performed. The complexity of these After steps 1 and 2, the state is deterministically projected measurements depends on the row weights of G⊥. The matrix by Z ∝ ∈S s, where SZ := Z[f1],...,Z[fa], exactly s Z ⊥ G⊥ as in the original Bravyi-Haah protocol. After step 3, the G must generate the space , but there is some freedom γk in how we choose the generating rows. In the work of system is an eigenstate of (−1) X[fk], and we can associate Bravyi-Haah, G⊥ was only implicitly defined (as the dual of some projector with X,γ with this process. The postselection + G), leaving unclear how much time it would take to implement in step 4 ensures the system is an eigenstate of X[gk], where ⊥ ⊥ the required measurements. It is desirable that G is as sparse gk is the kth row of M · G . Since we defined M such that ⊥ as possible to minimize the resource overheads. Therefore, M · G = G0, we have that gk are the rows of G0. It follows we wish to find a very sparse G⊥. We found a family of G⊥ that X,γ = XX,γ , where X is the projector for the X stabilizer of the Bravyi-Haah stabilizer code. Combining steps matrices where all rows, except one, are weight 4 and the + 1–4, we have the projections single exception has weight k 2. We present the exact form of this G⊥ for k = 2, along with the associated R, M, and Q X,γ Z = X,γ XZ = X,γ , (C4) matrices used in the previous section:

⎛ ⎞ 10010110000000 ⎜ ⎟ ⎜01011010000000⎟ ⎜ ⎟ ⎜00111100000000⎟ ⎜ ⎟ ⎜00001001110000⎟ ⊥ = ⎜ ⎟ G ⎜00000101101000⎟, (C5) ⎜00000011011000⎟ ⎜ ⎟ ⎜00000000110110⎟ ⎝ ⎠ 00000000011011 00100010100100

032338-12 QUANTUM COMPUTATION WITH REALISTIC MAGIC- . . . PHYSICAL REVIEW A 95, 032338 (2017)

⎛ ⎞ 100011000 ⎜ ⎟ ⎜010101000⎟ ⎜001110000⎟ ⎜ ⎟ ⎜000000000⎟ ⎜ ⎟ ⎜000100000⎟ ⎜ ⎟ ⎜000010000⎟ ⎜ ⎟ ⎜000001000⎟ 110001000 R = ⎜ ⎟,Q= , ⎜000000000⎟ 110001010 ⎜ ⎟ ⎜000000000⎟ ⎜ ⎟ ⎜000000000⎟ ⎜ ⎟ ⎜000000000⎟ ⎜ ⎟ ⎝001111001⎠ 001111101 001111111 ⎛ ⎞ 010111110 M = ⎝001111010⎠. 111111000 A Mathematica script for generating G⊥ for any k is provided in the Supplemental Material [51], which also verifies that G⊥ is full rank and dual to G. As one further example, we find for k = 6 that ⎛ ⎞ 10010110000000000000000000 ⎜ ⎟ ⎜01011010000000000000000000⎟ ⎜ ⎟ ⎜00111100000000000000000000⎟ ⎜00001001110000000000000000⎟ ⎜ ⎟ ⎜00000101101000000000000000⎟ ⎜ ⎟ ⎜00000011011000000000000000⎟ ⎜ ⎟ ⎜00000000110110000000000000⎟ ⎜ ⎟ ⎜00000000011011000000000000⎟ ⊥ ⎜ ⎟ G = ⎜00000000000110110000000000⎟. (C6) ⎜ ⎟ ⎜00000000000011011000000000⎟ ⎜ ⎟ ⎜00000000000000110110000000⎟ ⎜ ⎟ ⎜00000000000000011011000000⎟ ⎜ ⎟ ⎜00000000000000000110110000⎟ ⎜ ⎟ ⎜00000000000000000011011000⎟ ⎜ ⎟ ⎝00000000000000000000110110⎠ 00000000000000000000011011 00100010100100100100100100

Notice that the bottom row has weight exceeding 4. The on whether the entangling gates can be scheduled in an measurements corresponding to weight-4 rows can be im- economical manner. The scheduling problem is equivalent to plemented with each measurement using a single ancilla in a graph coloring problem, and we find that it can be solved the |+ state and four entangling gates (control Z or control in four time steps (e.g., using four colors) as in Fig. 1.We X depending on the measurements). Therefore, for these independently confirmed this using an automated solver of the measurements it is possible that all these gates can be realized edge colorability problem for k up to 40; see supplementary in four time steps while respecting that a qubit can be involved Mathematica script for details. in only a single entangling gate at a time. Unfortunately, there is a single row of G⊥ with weight k + 2, so using APPENDIX D: REED-MULLER CONNECTIVITY a single ancilla to perform this measurement would result in a growing time cost with k. Therefore, for this single The usual form of the 15-qubit punctured Reed-Muller code ⊗ + ⊗ + measurement we make use of a cat state |0 k 2 +|1 k 2 G matrix is so that the entangling gates can be performed in parallel. The ⎛ ⎞ cat state itself is constructed by merge operators on |+⊗k+2 111111110000000 ⎜ ⎟ qubits. The merge operations project onto |00 00|+|11 11| ⎜111100001111000⎟ = ⎜ ⎟ or |01 01|+|10 10| subspaces and so commute with the GRM ⎝110011001100110⎠. control gates, so the cat state can be built concurrently with 101010101010101 the control gates. This opens the possibility of realizing 111111111111111 each round of measurements in four times steps but depends (D1)

032338-13 JOE O’GORMAN AND EARL T. CAMPBELL PHYSICAL REVIEW A 95, 032338 (2017)

For our purposes an efficient choice of dual is ⎛ ⎞ 000011110000000 ⎜ ⎟ ⎜000000001111000⎟ ⎜ ⎟ ⎜000000001100110⎟ ⎜ ⎟ ⎜000000001010101⎟ ⎜ ⎟ ⊥ = ⎜111100000000000⎟ GRM ⎜ ⎟, ⎜110011000000000⎟ ⎜101010100000000⎟ ⎜ ⎟ ⎜011000000110000⎟ ⎝ ⎠ 001010000010100 100010001000100 (D2) which has a maximum row weight of 4 and a max column weight of 5. There is a five-colorable graph associated with this matrix shown in Fig. 8, similar to what we found for the Bravyi-Haah protocol. It is well known that the Reed- Muller code can also be viewed as a subsystem code, so the Pauli-X measurements can clone the pattern of the Pauli-Z measurements. This approach yields the realization shown in Fig. 8.

APPENDIX E: TOFFOLI PROTOCOL The Toffoli protocol converts eight T states into one Toffoli state, 1 |Tof =√ (−1)abc|a,b,c. (E1) 8 a,b,c∈{0,1} It has previously been described in the circuit picture with its performance found by brute-force counting of errors [26,27]. Here we consider an equivalent protocol (with the same performance when using block checking) but with the G matrix formalism of the Bravyi-Haah protocols. The G matrix achieving this is simply ⎛ ⎞ 11110000 ⎜ 11001100⎟ G = ⎝ ⎠, (E2) Tof 10101010 11111111 FIG. 8. Squares represent magic states to be distilled, and circles where proof that this distills was given in Refs. [40,41]. Here are ancillas used to implement measurements in Reed-Muller. Edges we show in Fig. 9 how to convert this abstract protocol to a show required hardware connections and associated required control- realization with low space-time overhead. phase gates. Edges show time ordering of control-phase gates, e.g., red, purple, green, blue, and gold (dark to light gray), demonstrating APPENDIX F: PROOF OF ERROR TRACKING the required entanglement can be established in five time steps. RM Sometimes we call this the graph representing Gz . When iterating distillation protocols, we have a collection of inputs which may have been outputs of an earlier round. (j) ∈ N We illustrate the structure in Fig. 2, where we introduce the We use x Z2 to denote the error distribution of the terminology of branches and modules. Our proof proceeds input contribution from the jth branch and use probability by considering how an individual module maps probability Pr(x(j)) for the probability of this event. For now we take distributions over its inputs to a probability distribution over this probability as given and note only that for a block that (j) = outputs. If a level-l module uses many nl → kl blocks, then quadratically suppresses noise, we have that Pr(x 0) is, 2l it requires nl branches of inputs from earlier rounds. If each to first order, proportional to  after l rounds of distillation. branch carries Nl qubits, then we need Nl instances of the If a single block of an nl → kl protocol is described by a nl → kl protocol so that the whole module maps Nlnl → Nlkl matrix G, then N copies within a module are described by qubits. The size of Nl equals the product of the kl values for G = G ⊗ 1l N , where ⊗ is the tensor product and 1lN is the all earlier rounds, which can be verified by simply counting identity matrix of dimension N. On the first round, N = 1, so back through the tree. we simply have G = G. Before proceeding we must clarify

032338-14 QUANTUM COMPUTATION WITH REALISTIC MAGIC- . . . PHYSICAL REVIEW A 95, 032338 (2017)

collisions, such as an arbitrary permutation of qubits within any branch. Potentially, such permutations could perform better or worse than the canonical choice, leaving room for further optimization. We continue with the canonical choice as it is particularly natural and amenable to analysis. After a module passes module checking, the output branch has errors distributed as 1 Prl(y) = Pr(x), (F1) Psuc,l {x:G0x=0,G1x=y} where the denominator is a normalization constant accounting for the success probability Psuc,l = Pr(x). (F2)

y {x:G0x=0,G1x=y} For our proof, it is useful to work with unnormalized probabilities that we write as Pl(y) = Pl−1(x), (F3)

{x:G0x=0,G1x=y} where we will renormalize later. Herein, we focus on the smallest weight errors that can lead to a particular y.The no-error case (y = 0) occurs when the initial states had no errors, so

ml Pl(0)  (1 − ) , (F4)

where ml counts the total number of raw qubits needed for l = rounds of distillation, namely, ml j=1,...,l nj . For protocols quadratically suppressing noise, 2l is the smallest number of errors that can evade detection. Therefore, we take the approximation

l l 2 ml −2 Pl(y)  Cl(y) (1 − ) , (F5) l where Cl(y) counts the number of different weight-2 errors that lead to output y. For one round of distillation this is simply

C1(y) = η(y). (F6) Recall that η(y) was defined back in Definition 1 as a function that counts precisely the number of weight-2 errors that lead FIG. 9. Squares represent magic states to be distilled, and circles to a particular output. are ancillas used to implement the stabilizer measurements which Finding Cl(y) for higher levels (l>1) is more involved. project the T states into the Toffoli state. As before, edges show The relation between input and output error strings is y = G1x required hardware connections and associated control-phase gates. (assuming G0x = 0), so x vectors of weight 2 can result in Edges show time ordering of control phase gates, e.g., red, purple, y vectors of a variety of weights, potentially increasing the green, blue, and gold (dark to light gray), demonstrating the required weight, so some care is needed. entanglement can be established in four time steps for the Z stabilizers Consider a module (on some l>1 level) where some of and five time steps for the single X stabilizer, which utilizes just two the incoming branches contain errors. The probability of two ancillas. We call this the graph representing GToff . z branches a and b containing errors is

(a) (b) nl −2 Pl−1(x )Pl−1(x )Pl−1(0) l − l how this tensor product structure relates to the input strings  (a) (b) 2 − ml 2 (j) Cl−1(x )Cl−1(x ) (1 ) . (F7) x. We define δ as a length-nl binary vector with entries 1 in the jth location and 0 everywhere else. It follows that x = These errors may contribute to undetected errors leaving the (j) ⊗ (j) G = (j) ⊗ (j) j (δ x ), so that x j (Gδ x ). This ordering module. Fewer or more branch failures do not provide leading- ensures that no two qubits from the same branch collide into the order contributions. Consider when only one branch contains same block, which must be prevented because of correlations any errors. After this branch is split up and fed into different within branches. We call this canonical ordering, and it is blocks, each nl → kl block can contain at most one error, so assumed throughout. Other orderings exist that prevent such it will be detected. If t>2 branches contain an error, the

032338-15 JOE O’GORMAN AND EARL T. CAMPBELL PHYSICAL REVIEW A 95, 032338 (2017)

− probability will be weighted by t∗2(l 1) , which for small  is a where we have used the shorthand rare process compared to two failed branches. l 2l−j Furthermore, with two erroneous branches a and b,we Cl = Cl(y) = ηj (v) . (F12) (a) = (b) can further deduce that either x x or the error will be y =0 j=1 v detected. To see this, we begin by noting that errors will be = = − ml undetected only if (G0 ⊗ 1l N )x = 0 (mod 2), and we have Equating also Al Pl(0) (1 ) , we find we have reached the expression of Eq. (8). To renormalize the error ⊗ = (a) ⊗ (a) + (b) ⊗ (b) (G0 1l N )x (G0δ ) x (G0δ ) x . (F8) probability Bl, we simply divide through by the total prob- + (a) (b) ability Al Bl, yielding one equation of Theorem 1. The Both G0δ and G0δ are columns of G0 and so are nonzero. denominator Al + Bl represents the success probability of not Since no terms are zero, it can vanish mod 2 only if both just a single module succeeding but all events feeding into G δ(a) = G δ(b) x(a) = x(b) 0 0 and . This is a central point of that module also succeeding. We are actually interested in the the proof, so we will give a second explanation of what is (a) (b) success probability conditioned on previous modules being happening here. Note that if x = x , then without loss of n successful, which is (A + B ) divided by (A − + B − ) l . x(a) l l l 1 l 1 generality had an error in at least one location that is absent This divider comes from the unconditional success probability from x(b). If the error’s location is t, then the tth distillation of an (l − 1)-level module, (Al−1 + Bl−1), and the fact that nl block will receive an input with only one error and must detect of these feed into an l-level module. This completes the proof it. We conclude that the dominate source of errors is from the of Theorem 1. identical failure of pairs of branches. To simplify notation, let u = δ(a) + δ(b) and f = x(a) = x(b), so we have the more concise expression x = u ⊗ f for APPENDIX G: NUMERICAL SIMULATIONS relevant errors, with u being weight 2. Above we noted that 1. Brute-force method (a) (b) an undetected error must satisfy G0δ = G0δ , which is In simulating a magic-state factory it quickly becomes equivalent to G0u = 0 (mod 2). The output from such an apparent that a brute-force method of simulation is inade- error is y = (G1 ⊗ 1l N )(u ⊗ f ) = (G1u) ⊗ f . Therefore, we can now deduce that quate. The “brute-force” method simply involves randomly generating a Boolean string of length l nl to describe the 2 nl −2 Pl(v ⊗ f ) = Pl−1(f ) Pl−1(0) input to the magic-state factory and performing calculations

{u:G0u=0,G1u=v} of the stabilizers and logical output of each module of the factory. The explicit procedure for a three-round factory is by counting over all u that lead to the same v. Therefore, given in Algorithm 1. For each module in round 1, a random 2 input is generated and tested until the stabilizer checks for C (v ⊗ f ) = C − (f ) . l l 1 the module are passed. The logical output of each of these { = = | |= } u:G0u 0,G1u v, u 2 successful modules is saved until enough have been generated The summation is over weight-2 vectors u, but the terms are to feed into round 2. These outputs are shuffled according to independent of u, so we find that Eq. (F8) before entering round 2. Here the module checking is performed again. If all the module checks are passed, then 2 Cl(y) = Cl(v ⊗ f ) = ηl(v)Cl−1(f ) , we again proceed by feeding the logical output of round 2 to round 3, after shuffling. Having reached round 3, we where we have again used the η notation introduced in then determine the characteristics of this module by recording Definition 1. For two rounds of distillation, l = 2, so C − = l 1 where the module check fails and, if it succeeds, whether the C , and we can end the recursion by using Eq. (F6), so that 1 output of the module contains a logical (undetected error). 2 Clearly, a large number of iterations of this protocol are C2(y) = C2(v ⊗ f ) = η2(v)η1(f ) . (F9) required to obtain reliable statistics for the performance of However, if we have more than two rounds, notice that the the factory. For example, our analytic treatment estimates relevant f will again have the form f = v ⊗ f , so that that with a raw magic-state error rate  = 0.001 the rate of undetected logical error of a round-3 module is ∼10−21.As =  ⊗  2 Cl(y) ηl(v)[Cl−1(v f )] such, successfully simulating this by brute force would require   2 2  21 = ηl(v)[ηl−1(v )Cl−2(f ) ] 10 attempts at the algorithm to be made, not just to build  2  4 statistics but to ensure that enough instances of the third-round = η (v)[η − (v )] [C − (f )] (F10) l l 1 l 2 module are generated in the first place. We find that the and so on until we reach C1 and can use Eq. (F6) to end the simulation of two-round factories by brute force is achievable recursion. for 2

032338-16 QUANTUM COMPUTATION WITH REALISTIC MAGIC- . . . PHYSICAL REVIEW A 95, 032338 (2017)

FIG. 10. Numeric vs analytic. (a) Bravyi-Haah block protocols with two and three rounds of distillation. (b) Toffoli protocol followed by one or two rounds of Bravyi-Haah. (c) One or two rounds of Bravyi-Haah followed by the Toffoli protocol. The close correspondence of our analytic treatment of the factory and numerical simulations is demonstrated for two and three rounds of the magic-state distillation with module checking. Points represent simulation data, while the lines are the corresponding analytic estimates as determined by Theorem 1. The protocols are labeled by the k value of each round, which is set to be equal for each round of distillation. The global error, the probability of a logical error anywhere in the output of the factory, is shown. For a “2-2-2” factory this is the probability of an error anywhere in the 8 output magic states; for a “26-26-26” this corresponds to an error anywhere in the 17,576 magic states that this factory produces. error configurations that we know cannot lead to a logical For each of the “corrupt” branches entering round 3, we error. Our chosen method of doing this proceeds as follows. know that it must originate from a round-2 module which We know that a module has a chance of failing only if at itself had two or more corrupt branches entering it. These least two of the branches entering that module contain an branches again would have originated in a round-1 module error. For a module in the third round, we focus on cases (equivalently, a block) which had at least two errors fed to where at least two of its input branches contain errors. We it. We can thus greatly reduce the size of the simulation and do this using a process that can be called preselection, in the number of iterations required by simulating only on a contrast to postselection. In postselection, we sample from subsection of the factory where these errors have occurred a distribution and reject instances that do not meet a certain (see Fig. 11). The probability of an undetected error after the criterion. This is not feasible when the criterion (here having first round was determined analytically for Bravyi-Haah by at least two corrupt branches) is rare, so we instead construct explicitly calculating the weight enumerator a new probability distribution conditioned on the criterion k k/2 + being meet. The first step of the algorithm therefore is to first W(k,) = 2 (1 − 2)3m 4 + (1 − 2)3(2m) decide how many error-containing branches will enter this m is odd m=0 module, given that this number is 2 based on the statistics k k/2 − + + already gathered for the rate of errors in the output of round-2 + 6 (1 − 2)2k m 4 + (1 − 2)3(2m) 8, modules. Later, we analytically adjust for this process of m=0 m=0 preselection. (G1)

032338-17 JOE O’GORMAN AND EARL T. CAMPBELL PHYSICAL REVIEW A 95, 032338 (2017)

Algorithm 1. Brute-force simulation algorithm

1: select protocol: {k1,k2,kToff } 2: generate number of modules in each round: {M1,M2,MToff } {Round One} 3: for i

032338-18 QUANTUM COMPUTATION WITH REALISTIC MAGIC- . . . PHYSICAL REVIEW A 95, 032338 (2017)

   19: take (ii)th string of length n2 from V : v 30: shuffle output of round 2 to firewall correlations W → W  20: measure stabilizers G0(k2)v 31: for l

[1] S. Bravyi and A. Kitaev, Phys. Rev. A 71, 022316 (2005). [29] C. Horsman, A. G. Fowler, S. Devitt, and R. Van Meter, New J. [2] C. Jones, Phys. Rev. A 87, 042305 (2013). Phys. 14, 123011 (2012). [3] A. G. Fowler, S. J. Devitt, and C. Jones, Sci. Rep. 3, 1939 (2013). [30] A. J. Landahl and C. Ryan-Anderson, arXiv:1407.5103. [4]P.W.Shor,inProceedings of the 37th Annual Symposium [31] S. D. Barrett and P. Kok, Phys.Rev.A71, 060310 (2005). on Foundations of Computer Science (IEEE Computer Society [32] L. Jiang, J. M. Taylor, A. S. Sørensen, and M. D. Lukin, Phys. Press, 1996), pp. 56–65. Rev. A 76, 062323 (2007). [5] E. Knill, R. Laflamme, and W. H. Zurek, Science 279, 342 [33] D. K. L. Oi, S. J. Devitt, and L. C. L. Hollenberg, Phys.Rev.A (1998). 74, 052313 (2006). [6] R. Raussendorf, J. Harrington, and K. Goyal, Ann. Phys. (NY) [34] D. L. Moehring, P. Maunz, S. Olmschenk, K. C. Younge, D. N. 321, 2242 (2006). Matsukevich, L. M. Duan, and C. Monroe, Nature (London) [7] H. Bombin, New J. Phys. 17, 083002 (2015). 449, 68 (2007). [8] A. Paetznick and B. W. Reichardt, Phys.Rev.Lett.111, 090505 [35] E. T. Campbell and S. C. Benjamin, Phys.Rev.Lett.101, 130502 (2013). (2008). [9] J. T. Anderson, G. Duclos-Cianci, and D. Poulin, Phys. Rev. [36] E. Campbell and J. Fitzsimons, Int. J. Quantum Inf. 8, 219 Lett. 113, 080501 (2014). (2010). [10] T. Jochym-O’Connor and R. Laflamme, Phys.Rev.Lett.112, [37] H. Bernien, B. Hensen, W. Pfaff, G. Koolstra, M. S. Blok, 010505 (2014). L. Robledo, T. H. Taminiau, M. Markham, D. J. Twitchen, L. [11] B. J. Brown, N. H. Nickerson, and D. E. Browne, Nat. Commun. Childress, and R. Hanson, Nature (London) 497, 86 (2013). 7, 12302 (2016). [38] N. H. Nickerson, J. F. Fitzsimons, and S. C. Benjamin, Phys. [12] S. Bravyi and A. Cross, arXiv:1509.03239. Rev. X 4, 041041 (2014). [13] P. Selinger, Phys.Rev.A87, 042302 (2013). [39] J. T. Anderson, arXiv:1205.0289. [14] A. G. Fowler, arXiv:1210.4626. [40] E. T. Campbell and M. Howard, Phys. Rev. Lett. 118, 060501 [15] A. M. Meier, B. Eastin, and E. Knill, Quantum Inf. Comput. 13, (2017). 195 (2013). [41] E. T. Campbell and M. Howard, Phys. Rev. A 95, 022316 (2017). [16] S. Bravyi and J. Haah, Phys. Rev. A 86, 052329 (2012). [42] Y. Li, New J. Phys. 17, 023037 (2015). [17] N. C. Jones, R. Van Meter, A. G. Fowler, P. L. McMahon, J. [43] A. G. Fowler, M. Mariantoni, J. M. Martinis, and A. N. Cleland, Kim, T. D. Ladd, and Y. Yamamoto, Phys. Rev. X 2, 031007 Phys. Rev. A 86, 032324 (2012). (2012). [44] M. Reiher, N. Wiebe, K. M. Svore, D. Wecker, and M. Troyer, [18] H. Anwar, E. T. Campbell, and D. E. Browne, New J. Phys. 14, arXiv:1605.03590. 063006 (2012). [45] A. Fowler (private communication). [19] E. T. Campbell, H. Anwar, and D. E. Browne, Phys.Rev.X2, [46] G. Duclos-Cianci and K. M. Svore, Phys. Rev. A 88, 042325 041021 (2012). (2013). [20] E. T. Campbell, Phys. Rev. Lett. 113, 230501 (2014). [47] A. J. Landahl and C. Cesare, arXiv:1302.3240. [21] D. Poulin, Phys.Rev.Lett.95, 230504 (2005). [48] G. Duclos-Cianci and D. Poulin, Phys.Rev.A91, 042315 [22] D. Bacon, Phys.Rev.A73, 012340 (2006). (2015). [23] P. Aliferis and A. W. Cross, Phys.Rev.Lett.98, 220502 [49] E. T. Campbell and J. O’Gorman, Quantum Sci. Technol. 1, (2007). 015007 (2016). [24] H. Bombin, Phys. Rev. A 81, 032301 (2010). [50] D. Herr, F. Nori, and S. J. Devitt, New J. Phys. 19, 013034 [25] H. Bombin, R. W. Chhajlany, M. Horodecki, and M. A. Martin- (2017). Delgado, New J. Phys. 15, 055023 (2013). [51] See Supplemental Material at http://link.aps.org/supplemental/ [26] B. Eastin, Phys. Rev. A 87, 032321 (2013). 10.1103/PhysRevA.95.032338 for a Mathematica notebook [27] C. Jones, Phys. Rev. A 87, 022328 (2013). which calculates the dual matrix of G(k) for the Bravyi-Haah [28] R. Raussendorf, J. Harrington, and K. Goyal, New J. Phys. 9, codes and a list of the specific combinations of protocols which 199 (2007). produced the data points in Fig. 5.

032338-19