Novel constructions for the fault-tolerant Toffoli gate

Cody Jones∗ Edward L. Ginzton Laboratory, Stanford University, Stanford, California 94305-4088, USA

We present two new constructions for the Toffoli gate which substantially reduce resource costs in fault-tolerant . The first contribution is a Toffoli gate requiring Clifford operations plus only four T = exp(iπσz/8) gates, whereas conventional circuits require seven T gates. An extension of this result is that adding n control inputs to a controlled gate requires 4n T gates, whereas the best prior result was 8n. The second contribution is a for the Toffoli gate which can detect a single σz error occurring with probability p in any one of eight T gates required to produce the Toffoli. By post-selecting circuits that did not detect an error, the posterior error probability is suppressed to lowest order from 4p (or 7p, without the first contribution) to 28p2 for this enhanced construction. In fault-tolerant quantum computing, this construction can reduce the overhead for producing logical Toffoli gates by an order of magnitude.

I. INTRODUCTION to date. Amy et al. studied classical search methods for decomposing gates like Toffoli into fault-tolerant primi- Fault-tolerant quantum computing is the effort to de- tives [17], and Selinger investigated circuit constructions sign quantum information processors which are resilient with particular emphasis on T -gate count and depth, to sufficiently small (but nonzero) probability of failure where the latter metric allows parallel T gates on dif- in any individual component [1, 2]. Enhanced reliability ferent [18]. We use Selinger’s work as our starting comes at the cost of redundancy, and recent study in this point, as we turn his almost-Toffoli gate into a proper area has focused on minimizing the overhead, or addi- Toffoli gate, using four T gates and some quantum tele- tional resource costs, associated with converting a perfect portation. Finally, the importance of this topic has at- quantum operation into a form compatible with error cor- tracted the attention of other researchers, and Eastin has rection [3–5]. This work focuses on the Toffoli gate, which independently discovered equivalent results [19]. appears in both reversible-classical and quantum logic This paper presents two important results. First, Sec- and which may be defined as Tof |x, y, zi = |x, y, z ⊕ xyi, tion II describes how to implement the Toffoli gate with for x, y, z being binary variables. Unlike many quan- only four T gates and Clifford-group operations [2, 20]. tum gates, the quantum Toffoli gate has a classical ana- Second, Section III introduces a Toffoli construction re- logue, so it is favored as a building block for importing quiring eight T gates that can detect an error in any more complex classical operations, such as binary arith- single T gate. This new circuit is an important devel- metic, into quantum algorithms like Shor’s factoring al- opment for fault-tolerant quantum computing, because gorithm [4, 6, 7] and quantum simulation [4, 8, 9]. For it relaxes the requirements on high-fidelity T gates that these reasons, the Toffoli gate is critically important to are expensive to produce; however, the circuit is prob- quantum computing in general, and improvements in the abilistic, and we discuss its proper usage. Section IV design of the Toffoli gate make the realization of large- presents some analysis of the resource costs and error scale quantum computation more tractable. rates for these circuits. The paper concludes with a brief Several researchers have studied circuit constructions discussion of the impact these results have on large-scale for the Toffoli gate. The most oft-cited implementation quantum computing. is probably the one on page 182 of Ref. [2], which may have been derived from Ref. [10]. As can be seen in Ref. [2], the Toffoli gate is decomposed into smaller quan- II. TOFFOLI USING JUST FOUR T GATES tum gates, each of which can be made fault-tolerant by arXiv:1212.5069v1 [quant-ph] 20 Dec 2012 conventional means [1]. The most nettlesome of these is the T = exp(iπσz/8) gate, which is much more expensive In fault-tolerant quantum computing, the most diffi- in both time and space resources to produce [3, 4, 11–16]; cult quantum gates to produce√ are non-Clifford gates. x z notably, the Toffoli circuit in Ref. [2] uses seven T gates. The Hadamard gate H = (1/ 2)(σ + σ ), the phase z In fact, Ref. [10] contains a construction nearly identi- gate S = exp(iπσ /4), and the CNOT gate are generators cal to one derived here (we use four T gates), except for the Clifford group, as any gate in this group can be for an undesirable (−1) phase on one output state (we produced by combinations thereof, up to a global phase show how to correct this with modest effort). However, that we ignore. However, at least one gate outside the “complete” implementations of the Toffoli gate without Clifford group is required for universal quantum comput- a phase error have used seven T gates in the literature ing. The T gate is often selected because it is the easiest to produce; however, as we explain below, “easy” is rel- ative, and this gate is still quite expensive in computing resources. ∗ [email protected] In most quantum codes, including the surface code [21], 2

(a) † x x x T x x y = y S † = y T † y = y z z z H T H 0 T z z

0 (b) x x

y Z y FIG. 2. An error-detecting Toffoli gate. The measurement = is in the σz basis, and obtaining result |1i indicates an error 0 S H was detected, so the qubits should be discarded. z z

FIG. 1. A circuit construction for Toffoli using four T gates. and construction. (a) The Toffoli? circuit by Selinger [18] that is almost a Toffoli, The construction in Figure 1b can also be used to with the difference being the controlled-S† operation. (b) Our add control- inputs to an existing controlled-G gate, ? circuit combines Toffoli with a phase correction and telepor- where G is any unitary. Replace the CNOT in Figure 1b tation to produce an exact Toffoli gate. The measurement is z with controlled-G (targeting however many qubits G acts in the σ basis, and the double vertical lines indicate that the on), and the result is controlled-controlled-G. By iterat- controlled-Z correction is conditioned on the measurement result being |1i. ing this procedure, one can add n controls to controlled- G using 4n T gates. The best prior result required 8n T gates [18]. non-Clifford gates are produced using an ancilla state that is injected into the circuit [2, 20]. As this ancilla is produced in a faulty manner, it must be purified through III. ERROR-DETECTING TOFFOLI CIRCUIT magic state distillation [12, 14]. The handful of rounds of state distillation required to reach the ∼ 10−12 error Whereas the previous section reduced the number of rates required for quantum algorithms are considerably T gates needed to make a Toffoli, this section addresses expensive, such that a single T gate requires ∼ 100× the the resource-cost problem differently by making each circuit volume (product of qubits and time steps) of a T gate less expensive. The cost of a T gate scales in- CNOT or H gate [4], making its production the dominant versely with the probability p of it having an undetected cost among fault-tolerant gate primitives. This poses an error, with a relationship where circuit volume (qubits × issue for quantum computing, as very many T gates in gates) is O (poly(log(1/p))). We introduce a new Toffoli the form of Toffoli gates are required for typical quantum gate that can detect an error in any one of eight T gates. algorithms like integer factoring or quantum simulation. As a result, the effective error probability of the Tof- The first Toffoli gate construction we present uses four foli gate is 28p2 instead of 4p (we only consider lowest T gates instead of seven, thereby reducing the overhead non-vanishing order throughout this paper since p  1). due to state distillation. Even though twice as many T gates are needed, they can Let us denote the Toffoli? gate as the operation in Fig- tolerate larger error rates, so they are substantially less ure 1a, which requires four T gates and was introduced expensive to produce than would otherwise be necessary. by Selinger [18]. Toffoli and Toffoli? differ only by a The error-detecting Toffoli circuit is rather simple to controlled-S† gate between the control qubits x and y. derive. It consists of two Toffoli? gates acting on a target Beginning with Toffoli?, we need only an ancilla qubit, a qubit which is in a bit-flip code [2], as shown in Fig- phase gate S, and teleportation to implement the exact ure 2. The gate with reversed triangles is the inverse Toffoli gate, as shown in Figure 1b. We first apply the operation (Toffoli?)†. Importantly, the controlled-S and Toffoli? using the same controls the desired Toffoli but controlled-S† gates acting on the same qubits x and y are with an ancilla |0i as target. The erroneous controlled- inverse operations, so they cancel. A logically equivalent S† is corrected by a simple S gate applied to the ancilla. decomposition into T gates is shown in Figure 3; this Afterwards, the CNOT and measurement teleport the dou- circuit is convenient for analyzing how errors propagate. bly conditional NOT operation encoded in the ancilla to We assume that |0i preparation, H, CNOT, and measure- the target qubit of the desired Toffoli. The measurement ment operations are perfect, because fault-tolerant error result determines whether a corrective gate of controlled- correction for these processes is economical compared to Z, which is in the Clifford group, is required to correct T gates. A single σz error in any of the T gates will nec- a (−1) phase resulting from measurement back-action. essarily propagate to the syndrome measurement for this One can readily verify that only four T gates are required bit-flip code, as indicated by the red dashed lines. Upon in this procedure [2, 20]. Note that the inverse gate T † such an event, all of the qubits are discarded. Note that requires the same ancilla-based teleportation circuit as σx errors, if present, do not propagate anywhere since T , so these gates are equivalent in state-distillation cost they commute with the CNOT gates; they have no effect 3

a a Error-detecting Toffoli ancilla 0 H X b b 0 H Z X c H H c ab 0 Z 0 H H 0 0 T x

0 T Toffoli inputs: y 0 T † z H

0 T † FIG. 4. Proper use of an error-detecting Toffoli ancilla. All z 0 T † measurements are in the σ basis. The ancilla-production circuit in the grey box in upper left is probabilistic. A mea- 0 T † surement result of |1i indicates the circuit failed, in which case the qubits are discarded. Because the Toffoli ancilla is 0 T not coupled to any other part of the quantum computation, its production can be repeated until success. The subsequent 0 T CNOT gates and measurements teleport data qubits through the Toffoli gate encoded in the ancilla (cf. p. 488 of Ref. [2]). Clifford-group gates are applied conditional on measurements FIG. 3. An error-detecting Toffoli gate. The red dashed lines showing outcome |1i, as indicated by the parallel double lines. indicate how any single σz error will propagate to the readout qubit. The measurement is in the σz basis, and obtaining result |1i indicates an error was detected, so the qubits should be discarded. As long as the ancilla qubits are initialized is straightforward. The latter requires about the half perfectly to |0i and the CNOT and H gates have no errors, then the resources of the former, under our assumption that only σz errors in the T gates matter, as σx errors cannot T gates are the dominant cost. There is also a mod- propagate to data qubits. If the probability of a σz error in est improvement in the Toffoli error rate (7p becomes each T gate is i.i.d. Bernoulli(p), then the success probability 4p). However, in fault-tolerant quantum computing, this is 1−8p and the a posteriori error probability is 28p2, to lowest result is likely overshadowed by the error-detecting con- order in p. struction. Doubling the number of T gates from four to eight to achieve O(p2) Toffoli error rate is usually the correct deci- on the Toffoli gate. sion. The reason is that this approach is more economical The circuit in Figure 3 must be discarded upon a de- than increasing the accuracy of the T gates through fur- tected error event, which happens with probability 8p. ther magic-state distillation (or other fault-tolerant pro- If this circuit were connected by entanglement to other cedures). Bravyi and Haah present a conjecture in the qubits in a quantum algorithm, all qubits must be dis- context of magic-state distillation, stating that to pro- carded, and the algorithm fails. To avoid this scenario, duce one magic state with error O(p2) requires at least one can produce a Toffoli ancilla [2]. If the circuit fails two input states with error p; hence, the resources needed because of a detected error, then the qubits are discarded, to increase T gate accuracy to O(p2) at least doubles, and but no far-reaching damage occurs since this faulty cir- in all practical cases known to this author, the overhead cuit is not entangled to any data qubits. Conditioned on factor is larger than two (an example case is considered the circuit succeeding, the ancilla is teleported into data below). Moreover, there is no known protocol which satu- qubits to enact a Toffoli gate, using only Clifford gates rates this bound. Multilevel distillation comes arbitrarily and measurement, as shown in Figure 4. Using a repre- close as p → 0, but this limiting case is not always rel- −8 sentative value for T -gate error as p = 10 (we consider evant for finite p, and multilevel protocols require large such a scenario in Section IV), the failure probability for and complex circuits [16]. preparing the Toffoli ancilla is a modest 8 × 10−8, which Under conditions relevant to quantum computing, the negligibly increases the number of times such preparation error-detecting Toffoli in Figure 3 can reach the low er- circuits must be repeated. ror rates required for quantum algorithms with one less round of state distillation, leading to as much as an order- of-magnitude reduction in the resources required to pro- IV. RESOURCE ANALYSIS duce a fault-tolerant Toffoli gate. For example, suppose that we wish to produce a Toffoli gate with error prob- Comparing resource costs between the naive Toffoli ability below 10−12. We presume the “raw” T gate an- gate using seven T gates and our construction using four cilla has a σz-error probability of 10−2. Using the results 4 in Ref. [14], the simple Toffoli gate would require four V. CONCLUSIONS T gates distilled to p = 10−15 using a hybrid scheme of one round of Bravyi-Kitaev (BK) distillation and two The Toffoli gate is an ubiquitous operation in quantum rounds of Meier-Eastin-Knill (MEK) distillation, at an computing, as it plays a key role in many quantum algo- average total cost of 1744.8 raw states. Conversely, the rithms. However, quantum computers that realize these error-detecting Toffoli would require just one round each algorithms are still out of reach. In the meantime, en- of BK and MEK distillation circuits for each of the eight gineering a system capable of large-scale, fault-tolerant T gates distilled to p = 10−8, for a total average cost quantum computation demands that quantum computer of 697.6 raw states. The resource savings factor is 2.5×, architects minimize computing resource costs in terms of just in terms of number of undistilled states needed for execution time and machine size. The constructions in distillation. In practice, the resource savings is ampli- this paper substantially reduce the circuit volume for the fied by another factor of 2× because one less round of fault-tolerant Toffoli gate when one considers how expen- distillation is needed (fewer gates means smaller circuit sive each non-Clifford gate T is to produce. In the case volume). Additionally, the state-distillation sub-circuits of the error-detecting Toffoli gate, the resource savings is for the error-detecting Toffoli gate can use weaker error an order of magnitude in a representative example with correction (i.e. lower code distance, by about a factor T -gate error p = 0.01. The improved fault-tolerant Tof- of two) than the same preparation circuits for the sim- foli gate brings large-scale quantum computing closer to ple Toffoli, which translates to fewer qubits and gates at realization. the hardware level [22]. Relative to the Toffoli circuit using seven T gates, there is an additional savings factor of 7/4. Therefore, the error-detecting circuit reduces to- tal overhead for non-Clifford gates by up to an order of magnitude in this representative example. ACKNOWLEDGMENTS It is also noteworthy that if “raw” T gates can be pro- duced with error rate p = 10−4, then the error-detecting This work was supported by the Univ. of Tokyo Spe- Toffoli has a posterior error probability of approximately cial Coordination Funds for Promoting Science and Tech- 3 × 10−7. This would enable modest quantum computa- nology, NICT, and the Japan Society for the Promo- tions using about 106 Toffolis, such as the multiplication tion of Science (JSPS) through its “Funding Program for of two 1000-bit numbers, without the need for resource- World-Leading Innovative R&D on Science and Technol- intensive magic-state distillation. ogy (FIRST Program).”

[1] John Preskill, “Reliable quantum computers,” Proceed- the transverse Ising model,” Phys. Rev. A 79, 062314 ings of the Royal Society of London. Series A: Mathe- (2009). matical, Physical and Engineering Sciences 454, 385–410 [9] N Cody Jones, James D Whitfield, Peter L McMahon, (1998). Man-Hong Yung, Rodney Van Meter, Al’an Aspuru- [2] Michael A. Nielsen and Isaac L. Chuang, Quantum Com- Guzik, and Yoshihisa Yamamoto, “Faster quantum putation and Quantum Information, 1st ed. (Cambridge chemistry simulation on fault-tolerant quantum comput- University Press, 2000). ers,” New Journal of Physics 14, 115023 (2012). [3] N. Isailovic, M. Whitney, Y. Patel, and J. Kubiatow- [10] Adriano Barenco, Charles H. Bennett, Richard Cleve, icz, “Running a quantum circuit at the speed of data,” David P. DiVincenzo, Norman Margolus, Peter Shor, Ty- in 35th International Symposium on Computer Architec- cho Sleator, John A. Smolin, and Harald Weinfurter, ture, 2008 (ISCA’08) (2008). “Elementary gates for quantum computation,” Phys. [4] N. Cody Jones, Rodney Van Meter, Austin G. Fowler, Rev. A 52, 3457–3467 (1995). Peter L. McMahon, Jungsang Kim, Thaddeus D. Ladd, [11] Xinlan Zhou, Debbie W. Leung, and Isaac L. Chuang, and Yoshihisa Yamamoto, “Layered Architecture for “Methodology for quantum construction,” Quantum Computing,” Phys. Rev. X 2, 031007 (2012). Phys. Rev. A 62, 052316 (2000). [5] Austin G. Fowler and Simon J. Devitt, “A bridge to [12] Sergey Bravyi and Alexei Kitaev, “Universal quantum lower overhead quantum computation,” (2012), Preprint computation with ideal clifford gates and noisy ancillas,” arXiv:1209.0510v3. Phys. Rev. A 71, 022316 (2005). [6] Peter W Shor, “Polynomial-time algorithms for prime [13] R. Raussendorf, J. Harrington, and K. Goyal, “Topo- factorization and discrete logarithms on a quantum com- logical fault-tolerance in cluster state quantum computa- puter,” SIAM J. Comput. 26, 1484–1509 (1997). tion,” New Journal of Physics 9, 199 (2007). [7] Rodney Van Meter and Kohei M. Itoh, “Fast quan- [14] Adam M. Meier, Bryan Eastin, and Emanuel Knill, tum modular exponentiation,” Phys. Rev. A 71, 052320 “Magic-state distillation with the four-qubit code,” (2005). (2012), Preprint arXiv:1204.4221v1. [8] Craig R. Clark, Tzvetan S. Metodi, Samuel D. Gasster, [15] Sergey Bravyi and Jeongwan Haah, “Magic state and Kenneth R. Brown, “Resource requirements for distillation with low overhead,” (2012), Preprint fault-tolerant quantum simulation: The ground state of arXiv:1209.2426v1. 5

[16] Cody Jones, “Multilevel distillation of magic [20] Daniel Gottesman and Isaac L. Chuang, “Demonstrat- states for quantum computing,” (2012), Preprint ing the viability of universal quantum computation using arXiv:1210.3388v1. teleportation and single-qubit operations,” Nature 402, [17] Matthew Amy, Dmitri Maslov, Michele Mosca, and Mar- 390–393 (1999). tin Roetteler, “A meet-in-the-middle algorithm for fast [21] Austin G. Fowler, Ashley M. Stephens, and Peter synthesis of depth-optimal quantum circuits,” (2012), Groszkowski, “High-threshold universal quantum com- Preprint arXiv:1206.0758v2. putation on the surface code,” Phys. Rev. A 80, 052312 [18] Peter Selinger, “Quantum circuits of T-depth one,” (2009). (2012), Preprint arXiv:1210.0974v1. [22] Austin G. Fowler, Matteo Mariantoni, John M. Marti- [19] Bryan Eastin, “Distilling one-qubit magic states into Tof- nis, and Andrew N. Cleland, “Surface codes: Towards foli states,” (2012), private communication. practical large-scale quantum computation,” Phys. Rev. A 86, 032324 (2012).