
A Scalable Decoder Micro-architecture for Fault-Tolerant Quantum Computing

Poulami Das,1 Christopher A. Pattison,2 Srilatha Manne,3 Douglas Carmean,3 Krysta Svore,3 Moinuddin Qureshi,1 and Nicolas Delfosse3,∗
1Georgia Institute of Technology, Atlanta, GA, USA
2California Institute of Technology, Pasadena, CA, USA
3Microsoft Quantum Systems and Microsoft Research, Redmond, WA, USA

Quantum computation promises significant computational advantages over classical computation for some problems. However, quantum hardware suffers from much higher error rates than classical hardware. As a result, extensive quantum error correction is required to execute a useful quantum computation. The decoder is a key component of the error correction scheme whose role is to identify errors faster than they accumulate in the quantum computer, and that must be implemented with minimum hardware resources in order to scale to the regime of practical applications. In this work, we consider surface code error correction, which is the most popular family of error-correcting codes for quantum computing, and we design a decoder micro-architecture for the Union-Find decoding algorithm. We propose a three-stage fully pipelined hardware implementation of the decoder that significantly speeds up the decoder. Then, we optimize the amount of decoding hardware required to perform error correction simultaneously over all the logical qubits of the quantum computer. By sharing resources between logical qubits, we obtain a 67% reduction of the number of hardware units, and the memory capacity is reduced by 70%. Moreover, we reduce the bandwidth required for the decoding process by a factor of at least 30x using low-overhead compression algorithms. Finally, we provide numerical evidence that our optimized micro-architecture can be executed fast enough to correct errors in a quantum computer.

arXiv:2001.06598v1 [quant-ph] 18 Jan 2020

∗[email protected]

Quantum computing promises significant speed-up over conventional computers for specific applications such as integer factorization [1], physics and chemistry simulations [2–4] and database search [5].

The primary obstacle to the implementation of quantum algorithms solving industrial-size problems is the high noise rate in any quantum device, which makes the output of a quantum computation indistinguishable from a random output. A fault-tolerant quantum computer, in which quantum bits, or qubits, are regularly refreshed by quantum error correction [6], is necessary to perform a useful computation on noisy quantum hardware.

Most classical error correction schemes can be adapted to the quantum setting thanks to the CSS construction [7, 8] and the stabilizer formalism [9], providing a quantum version of standard families of classical error-correcting codes [10] such as repetition codes, Hamming codes, Reed-Muller codes, BCH codes, LDPC codes and polar codes.

The main difference with the classical setting is the very high noise rate that affects current quantum hardware, often of the order of 1% per quantum gate, which makes quantum error correction much more challenging to implement. This is because it relies on the measurement of quantum parity checks that are likely to introduce additional noise to the qubits. Moreover, one must be able to implement a universal set of quantum gates on encoded qubits, or logical qubits. In a fault-tolerant quantum computer (FTQC), the computation is performed by alternating logical quantum gates and quantum error correction cycles that remove the noise injected by logical gates. Both quantum error correction and logical quantum gates must be implemented through fault-tolerant gadgets [11–13] to avoid the injection of pathological error configurations that would be uncorrectable by the subsequent correction cycle.

In this work, we consider a fault-tolerant quantum computer based on surface codes [14, 15], which are the most promising family of error-correcting codes for very noisy quantum hardware. Surface codes can correct up to 1% of noise on all the basic components of the quantum computer and they can be implemented on a square grid of qubits using exclusively local quantum gates acting on nearest-neighbor qubits. For comparison, most quantum error-correcting codes, such as the quantum Hamming code, require an error rate below 10−5 [16].

In this paper, we focus on the design of an error decoder, or simply decoder, which is the primary building block in charge of error correction. The decoder takes as an input the syndrome, which is the data extracted from quantum parity check measurements, and it returns an estimation of the error. Given this estimation, the effect of the error can be easily reversed. In a fault-tolerant quantum computer, the decoder must satisfy the three following design constraints.

1. Accuracy Constraint: The decoder must correctly identify the error with high probability.

2. Latency Constraint: The decoder must be fast enough to correct errors within one error correction cycle.

3. Scalability Constraint: The decoder design must be feasible to implement in the regime of practical applications that may require millions of physical qubits.
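The decoder contract described above — syndrome in, error estimate out — can be made concrete with a toy example. The sketch below is ours, not the paper's: it decodes the classical 3-bit repetition code with a lookup table, the smallest instance of the syndrome-driven decoding that the surface code decoders in this paper perform at scale.

```python
# Illustrative only: a toy "error decoder" for the 3-bit repetition code.
# It consumes a syndrome (parity-check outcomes) and returns an error
# estimate, mirroring the decoder role described in the text above.

def extract_syndrome(bits):
    """Two parity checks: s0 = b0 xor b1, s1 = b1 xor b2."""
    return (bits[0] ^ bits[1], bits[1] ^ bits[2])

# Lookup table mapping each syndrome to a most likely single-bit-flip estimate.
DECODE = {
    (0, 0): (0, 0, 0),  # no error detected
    (1, 0): (1, 0, 0),  # flip on bit 0
    (1, 1): (0, 1, 0),  # flip on bit 1
    (0, 1): (0, 0, 1),  # flip on bit 2
}

def correct(bits):
    estimate = DECODE[extract_syndrome(bits)]
    return tuple(b ^ e for b, e in zip(bits, estimate))

# Any single bit-flip on the codeword (0, 0, 0) is reversed.
assert all(correct(err) == (0, 0, 0)
           for err in [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)])
```

Note that the table has one entry per syndrome value, which is exactly why the lookup-table strategy discussed later in Section I D cannot scale to large code distances.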

A number of surface code decoding algorithms have been proposed in the past 20 years [14, 17–68] and many satisfy the Accuracy Constraint, which is often the main motivation. However, it remains unclear whether the decoding can be made fast enough to satisfy the Latency Constraint without degrading the decoder performance [69]. Moreover, the decoding problem is generally studied for a single logical qubit, ignoring the Scalability Constraint, whereas practical applications require hundreds or thousands of logical qubits encoded into millions of physical qubits [3]. A substantial amount of decoding hardware is required to decode simultaneously all the qubits of the quantum computer. Most of the work in quantum error correction decoders focuses on algorithmic aspects of the problem. Here, we consider this problem through the lens of computer architecture and we propose a decoder micro-architecture that satisfies simultaneously our three design constraints.

We propose a decoder micro-architecture based on the Union-Find decoding algorithm [30, 43]. We choose the Union-Find decoder for its accuracy and its simplicity. It is proven to achieve good decoding performance. Moreover, it comes with an almost-linear time complexity and it is also very fast in practice because it requires no floating-point arithmetic and no matrix operations. Finally, the simplicity of the decoder allows us to design a specialized hardware implementation that leads to a significant speed-up. Our main contributions are the following.

1. We propose a hardware implementation of the Union-Find decoder based on three hardware units corresponding to the three steps of the algorithm.

2. We design a three-stage pipeline based on our three hardware units that brings an important speed-up to the Union-Find decoder by parallelizing the implementation of the decoding stages.

3. We observe that the utilization of different components of the Union-Find decoder pipeline varies with the unit type and across the logical qubits, and hence propose an efficient time-division multiplexing that allows logical qubits to share decoding resources within a decoder block without compromising the error correction capabilities.

4. We propose different compression algorithms adapted to a cryogenic setting in order to reduce the bandwidth consumed to send the syndrome data to the decoder.

5. Combining all the previous results, we describe an Error Decoding Architecture (EDA), represented in Fig. 1, that scales the decoder design for a large FTQC while reducing hardware costs. The number of hardware units is reduced by 67% and we obtain a 30× bandwidth reduction while preserving the decoder accuracy. Moreover, we demonstrate by numerical simulations that our architecture leads to a decoder that is fast enough to satisfy the Latency Constraint, despite the high noise rate of quantum hardware.

Items 1 and 2 provide a hardware acceleration of the decoder, and items 3 and 4 lead to a satisfying solution to the Scalability Constraint. Our EDA is optimized carefully in order to guarantee that the error correction capability of the initial UF decoder is preserved, which guarantees that the Accuracy Constraint is satisfied. The numerical simulation of our optimized micro-architecture (item 5) accounts for the limitation imposed by shared hardware resources, which demonstrates that our decoder satisfies simultaneously the three design constraints.

This article exploits a number of ideas from computer architecture such as pipelining and resource optimization. We believe this approach is necessary in order to scale quantum machines. Even though we provide a detailed micro-architecture for the Union-Find decoder, the general principles of our design apply to any decoding algorithm. A key ingredient is the simplicity of the decoder and its decomposition in independent steps, which leads to a natural pipeline and a speed-up by instruction-level parallelism.

Figure 1: Left ("Individual Error Decoders"): a naive decoding architecture with a decoder associated with each logical qubit. Right ("Error Decoding Architecture (EDA)"): our architecture based on optimized N-qubit decoder blocks shared across multiple logical qubits, connected to the qubits through syndrome compression, a high-bandwidth interface, and syndrome decompression. A low-overhead compression algorithm is used to reduce the bandwidth cost. Our optimized decoder block is described in Fig. 12.

I. BACKGROUND AND MOTIVATION

A. Qubits and decoherence

We refer to Preskill's lecture notes [70] and Nielsen and Chuang's book [71] for an overview of the field of quantum computation and quantum information theory. A qubit is the basic unit of information in a quantum computer. A qubit is described by a complex vector |ψ⟩ = α|0⟩ + β|1⟩, which represents the superposition of the two basis states |0⟩ and |1⟩, with α, β ∈ C such that |α|² + |β|² = 1. Without error correction, a quantum state rapidly decoheres due to the accumulation of tiny rotations. By constantly measuring the system, one can project these tiny rotations onto three types of errors, denoted X, Y and Z and called Pauli errors. The bit-flip error X swaps the basis states |0⟩ and |1⟩ and maps the qubit |ψ⟩ into β|0⟩ + α|1⟩. The phase-flip error Z is defined by Z|ψ⟩ = α|0⟩ − β|1⟩. It introduces a relative phase between the two basis states. The error Y corresponds to a simultaneous bit-flip and phase-flip error, i.e., Y = XZ up to an overall phase which does not affect the result of the computation. We use the notation I for the identity operation that corresponds to the error-free case.

By definition of Y, it is enough to correct X-type and Z-type errors. In this work, we focus on the correction of X-type errors. By symmetry of the error correction scheme considered in this article, Z-type errors can be corrected using the exact same mechanisms by swapping the roles of X and Z. For simplicity, we assume that the correction of X-type and Z-type errors is performed independently, although a performance gain is possible by taking into account the correlations between X and Z [72–74].

Figure 2: (a) Distance-three surface code SC(3), showing data qubits, X-type ancilla qubits and Z-type ancilla qubits. Data qubits store the logical information and X-type (resp. Z-type) ancilla qubits are used to extract the syndrome of X-type (resp. Z-type) errors. (b) A set of X-errors detected by non-trivial syndrome values on red nodes.

B. Surface codes

In order to combat decoherence, we must perform the computation on encoded data, also called logical data, which is corrected at regular time intervals using a quantum error correcting code [6–9]. In this work, we focus on the surface code [15, 75], which is the most promising quantum error correcting code for a quantum computing architecture due to its high error threshold, which means that the error correction protocol can be implemented with very noisy quantum hardware. An error rate below 1% for all the components of the quantum computer is sufficient in order to obtain encoded qubits with better quality than our initial physical qubits [76–78]. In order to scale to the massive sizes required for practical applications, it is necessary to build quantum hardware whose fault rate is far below the 1% threshold. This is because error correction does not decrease the error rate sufficiently if the initial qubit error rate is too close to the threshold. In this work, we consider a noise rate of 10−3.

The family of surface codes is the most widely considered candidate for designing a fault-tolerant quantum computer. The distance-d surface code, which we denote SC(d), encodes a logical qubit into a square grid of (2d − 1) × (2d − 1) qubits, alternating data qubits, which store the logical information, and ancilla qubits, used to detect errors, as shown in Figure 2 with the distance-three surface code SC(3). The main reason for the success of the surface code is its locality, which significantly simplifies the quantum chip design. Error correction with surface codes only requires local interactions between ancilla qubits and their nearest neighbor data qubits, that is at most four qubits. The minimum distance d of the code measures the error correction capability. A larger minimum distance d results in an increased error tolerance at the price of an increased qubit overhead. Encoding physical qubits that suffer from an error rate p using a distance-d surface code, we obtain a logical qubit with error rate

pLog(d, p) = 0.15 · (40 · p)^((d+1)/2)    (1)

that we call the logical error rate. This heuristic formula, derived from the numerical results of [79], provides a good approximation of the logical error rate in the regime of low error rate (p << 10−2). It is valid in the context of the current work, that is, when using the Union-Find decoder and for the phenomenological noise model introduced in Section I C.

Throughout this article, we illustrate our design with numerical results for distance-11 surface codes, which is a reasonable distance for a first generation of fault-tolerant quantum computers since it allows the implementation of non-trivial quantum algorithms on logical qubits while keeping the qubit overhead to a few hundred qubits per logical qubit. Assuming a physical error rate of p = 10−3, the logical qubit error rate drops to pLog ≈ 6 · 10−10, allowing for the implementation of large-depth quantum algorithms.

C. The decoding problem

In this section, we review the decoding problem and the graphical formalism introduced in [75]. Quantum error correction is a two-step process. First, a measurement circuit is executed, producing a syndrome bit for each ancilla qubit. All syndrome bits can be extracted simultaneously. Then, a decoding algorithm is used to identify errors on data qubits based on the syndrome information.
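The heuristic logical error rate of Eq. (1) in Section I B, and the distance-11 figure quoted there, can be checked with a one-line transcription of the formula (our own sketch, not code from the paper):

```python
# Direct transcription of Eq. (1): p_Log(d, p) = 0.15 * (40 * p)^((d+1)/2).

def p_log(d, p):
    return 0.15 * (40.0 * p) ** ((d + 1) / 2)

# Distance-11 surface code at physical error rate 10^-3, as in Section I B:
# p_log(11, 1e-3) = 0.15 * 0.04^6 ≈ 6.1e-10, i.e. roughly 6 * 10^-10.
rate = p_log(11, 1e-3)
assert 6.1e-10 < rate < 6.2e-10
```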
To avoid any confusion with other decoding operations used in this architecture, we sometimes refer to the decoder as the error decoder. The decoding subroutine is a purely classical operation that can be performed on highly reliable hardware. On the other hand, the syndrome extraction is implemented on noisy quantum hardware. In order to obtain sufficiently accurate information about the errors occurring on data qubits despite measurement errors, multiple rounds of syndrome extraction are performed. The decoder analyzes d consecutive rounds of syndrome data to produce an estimation of the error induced on the data qubits.

In the absence of measurement errors, the decoding problem can be mapped onto a matching problem in a square grid as shown in Fig. 3. A bit-flip error X acting on a single data qubit is detected by a non-trivial syndrome bit on the incident X-type ancilla qubits as Fig. 2(b) shows. More generally, a chain of X-errors is detected by its endpoints. In order to recover the chain of X-errors given the syndrome values (its endpoints), the basic idea of the decoding algorithm is to build a short chain of X-type errors that matches the detected endpoints. Errors on the boundary of the lattice are detected only on one of their endpoints.

To address the issue of measurement errors, d rounds of syndrome extraction are performed, resulting in a matching problem in a three-dimensional graph [75]. Fig. 3 shows the cubic graph representing errors and syndrome bits for the distance-3 surface code. In what follows, we refer to this graph as the decoding graph.

Figure 3: (a) The planar decoding graph corresponding to a single round of syndrome extraction without measurement error. Qubits are supported on edges and syndrome bits correspond to vertices. Red edges mark the presence of X-type errors on data qubits and red nodes show the non-trivial syndrome bits extracted. (b) To fight against measurement errors, three rounds of syndrome extraction are performed, represented by three layers of the graph (a) connected by vertical edges corresponding to measurement errors. Errors are represented by red edges and the endpoints of an error path are detected by a non-trivial syndrome (red node). No syndrome bit is extracted on the left and right boundaries of the lattice nor on the bottom and top boundaries.

We simulate errors occurring on data qubits and incorrect syndrome values using the phenomenological noise model [75]. Each edge of the decoding graph corresponds to a potential error, with horizontal edges representing X-errors on data qubits and vertical edges corresponding to syndrome bit-flips. We assume that an error occurs on each edge, independently, with probability p. For each vertex v in the bulk of the lattice, a syndrome value s(v) ∈ {0, 1} is extracted, which is the parity of the number of errors incident to v. Just like in the planar case, the goal of the decoder is to estimate the error by matching together the vertices v supporting a non-trivial syndrome s(v) = 1. This formalism allows us to handle both qubit errors and measurement errors in the same way.

The relevance of the phenomenological noise model is justified in [75]. For further study of our decoding architecture tailored to a specific type of qubit, one may consider a more precise noise model such as the circuit-level noise model [75]. In this work, we chose the phenomenological noise model because it is simple enough to develop a good intuition about the decoder and it captures the essential properties of the quantum hardware, which guarantees that all the ideas proposed in the current work generalize to a more precise model and remain relevant for a practical device.

D. Existing Error Decoding Algorithms

In this section, we discuss different decoding strategies. The noise model describes the probability of all possible errors. Given this information, we can derive the probability of each error when a given syndrome is observed. The ultimate goal of the decoder is to return an error whose probability is maximal given the syndrome measured, i.e., a most likely error. A decoder that achieves this performance is said to be a maximum likelihood decoder [75]. For an arbitrary error-correcting code, it is generally not possible to implement a maximum likelihood decoder efficiently [80]. However, in the case of the surface code, several algorithms achieve a good error correction performance. In what follows, we discuss several promising decoding strategies.

Lookup Table (LUT) decoder [81]: This decoder implements a maximum likelihood decoder using a lookup table indexed by the syndrome bits. The corresponding LUT entry stores the correction to be applied to the data qubits. The LUT size grows exponentially with the code distance, making this design unsuitable for large FTQCs.

Minimum Weight Perfect Matching (MWPM) decoder [14]: The MWPM decoder provides an estimation of a most likely error based on a graph pairing algorithm, the MWPM algorithm, that can be implemented in polynomial time [82]. This decoder is one of the most effective in terms of error correction capabilities, even though its worst-case time complexity, O(|V|^3) ∝ O(d^9), where |V| is the size of the decoding graph, makes it too slow for large-distance surface codes. Fowler suggested a parallel implementation of this algorithm that reduces the average time complexity to O(1), although the worst-case complexity remains significant [20]. This decoder relies on large amounts of parallelism from several ASICs for each logical qubit, but this study does not discuss the system architecture or the number of ASICs needed.

Machine Learning (ML) decoder: ML decoders train neural networks with the underlying error probability distribution, and decoding is then treated as an inference problem where the syndrome data is an input to the neural network, which infers the correction [50–68]. ML decoders require substantial computational resources and the size of the training data grows quickly with the code distance. They also require large training times, and are primarily studied for small code distances. There exist some preliminary studies for larger code distances [56, 59] and proposals to obtain better performance through distributed neural networks [64] and hardware platforms [56] such as GPUs, FPGAs, and TPUs [83].

Tensor Network (TN) decoder [46–49]: The probability of each possible error can be represented as a tensor network, which leads to a decoding algorithm that contracts this tensor network. The contraction of the tensor network requires extensive matrix operations which may be hard to scale; however, the algorithm achieves a very good error-correction performance that is optimal [47] or quasi-optimal. Although it has not been studied precisely, tensor network decoders could benefit from a hardware speed-up from neural accelerators such as a TPU [83].

Union-Find (UF) decoder [79, 84]: This is a recently proposed algorithm that offers a correction in almost-linear time O(nα(n)), where α(n) is ≤ 5 for all practical purposes. It uses a Union-Find data structure in order to achieve a similar performance as the MWPM decoder without using computationally intensive matching algorithms.

Table I: Abstract comparison of decoder accuracy, latency and scalability (adapted from [64])

  Decoder | Accuracy          | Latency  | Scalability
  --------|-------------------|----------|------------
  LUT     | Very high         | Low      | Poor
  TN      | Very high         | Moderate | Moderate
  MWPM    | High to very high | Moderate | Moderate
  ML      | High              | Low      | Moderate
  UF      | High              | Low      | High

E. Union-Find decoder

In this section, we review the strategy of the Union-Find decoder [79, 84]. As explained in the introduction, our principal motivation for choosing this decoding algorithm is its rapidity and its simplicity.

Figure 4: The three main steps of the Union-Find decoder for a 2d decoding graph with the surface code SC(7). (a) An X-type error and its syndrome in the decoding graph. The goal is to recover the error given the red syndrome nodes. To mark half-edges we will add a vertex in the middle of each edge of the decoding graph. (b) Cluster Growth: We keep growing clusters around the red nodes by adding half-edges in all directions. The growth of a cluster stops when it contains an even number of red nodes or if it meets the boundary. The top cluster grows only one step. The bottom cluster requires two growth steps. (c) Spanning Tree: Build a spanning tree for each grown cluster in the decoding graph. Ignore half-edges. (d) Peeling: Identify the error on the edges of the spanning trees from the leaves to the root.
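The endpoint rule underlying Figure 4(a) — a chain of errors on the decoding graph is flagged only at its two endpoints, since every interior vertex touches an even number of error edges — can be verified in a few lines. This sketch is ours, purely for illustration:

```python
# Illustrative sketch: the syndrome of a set of error edges is the set of
# vertices touching an odd number of those edges, so an error chain is
# detected only at its endpoints.

from itertools import chain

def syndrome(error_edges):
    """Return the set of vertices with odd error parity (non-trivial syndrome)."""
    flagged = set()
    for v in chain.from_iterable(error_edges):
        flagged ^= {v}  # toggle membership: vertices seen an odd number of times remain
    return flagged

# A connected chain of error edges on a line of vertices 0-1-2-3-4:
error_chain = [(1, 2), (2, 3), (3, 4)]
assert syndrome(error_chain) == {1, 4}  # only the endpoints are flagged
```

Matching flagged vertices in pairs, as the decoders above do, amounts to guessing a plausible chain between each pair of endpoints.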

Table I provides an abstract comparison of prominent decoding algorithms in terms of the three key design constraints highlighted in the introduction: accuracy, runtime and scalability. Given the simplicity of the algorithm and its low time complexity, we use the UF decoder as the default algorithm for our studies. However, the design principles, optimizations, and scalability analysis of the present work will hold true for other decoders as well. Similarly, the syndrome compression analysis in Section IV applies to any decoder-quantum substrate interface irrespective of the decoder and qubit technology in use.

The Union-Find decoder operates in three steps that we review in Figure 4. We illustrate the decoding procedure using a two-dimensional decoding graph since there is no major difference with the cubic case that is relevant in practice. The algorithm takes as an input a set of nodes supporting non-trivial syndrome values and its goal is to recover the error living on the edges of the decoding graph. (a) During the first step, even clusters are grown around non-trivial syndrome nodes. (b) Then, a spanning tree is built for each cluster and is oriented from a root to the leaves. (c) Finally, we build an estimation of the error using the syndrome by traversing the clusters in reverse order.
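Step (c), the peeling pass, can be sketched in software. The following is our own whole-edge simplification of the peeling decoder [84], not the paper's hardware logic: edges of a rooted spanning tree are processed from the leaves toward the root, and an edge is declared erroneous exactly when its lower endpoint carries a syndrome flag, in which case the flag is propagated to the parent.

```python
# Simplified peeling sketch (ours): infer error edges on a rooted spanning
# tree of a cluster, given the set of flagged (non-trivial syndrome) vertices.

def peel(tree_edges, flagged):
    """tree_edges: (parent, child) pairs listed root-first, as a DFS would
    emit them; we process them in reverse order, i.e. leaves first."""
    flagged = set(flagged)
    correction = []
    for parent, child in reversed(tree_edges):
        if child in flagged:
            correction.append((parent, child))
            flagged ^= {child, parent}  # clear the child flag, toggle the parent
    return correction, flagged

# Cluster spanning tree rooted at 0:  0 - 1 - 2, with syndrome flags on 1 and 2.
correction, leftover = peel([(0, 1), (1, 2)], {1, 2})
assert correction == [(1, 2)]  # the single error edge between the two flags
assert leftover == set()       # an even cluster is fully explained
```

Reversing the DFS edge order is exactly why the hardware design in Section II stores visited edges on a stack.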

II. HARDWARE DESIGN AND PIPELINING Syndrome Hold Parity and Traversal Pending Edge Alternate Registers Edge Stack (S1) Fusion Edge Registers Stack In this section, we design three hardware units imple- Stack (FES) Graph-Generator (Gr-Gen) Depth-First Search (DFS) Engine Correction (Corr) Engine menting the three stages of the Union-Find decoder as Figure 5 shows. The Graph Generator (Gr-Gen) pro- Figure 6: Block Diagram for the whole decoding pipepline duces the grown clusters obtained is Figure 4(b). The including the three decoding modules. Depth First Search (DFS) engine generates the spanning forest from Figure 4(c). The correction is implemented as in Figure 4(d) by the Correction (Corr) engine. consists of the Spanning Tree Memory (STM), a Zero To reduce the bandwidth and latency issues it is more Data Register (ZDR), a root table, a size table, parity favorable to operate the decoding circuitry close to the registers, and a fusion edge stack (FES) 1. The size of quantum substrate in a cryogenic regime. Our main de- each component is a function of the code distance d. The sign constraint is the limited hardware resources available STM stores one bit for each vertex, and two bits per in a cold environment. We optimize our design to mini- edge. Two bits per edge are required since clusters grow mize the memory requirement and the number of memory around a vertex or existing cluster boundary by half edge reads. width as per the original decoding algorithm. The ZDR Our implementation of the decoding algorithm ben- indicates whether a row of the STM contains all zeros efits from hardware acceleration in two ways. First, a by storing a "0" for rows where all bits are 0, and a fully pipelined design allows performance improvement "1" for rows where there are one or more non-zero bits. through enhanced parallelism across the different pro- Since the total number of edges in the spanning forest are cessing units. 
While the correction engine works on clus- generally small, the ZDR accelerates the STM traversal. ter i, the DFS engine can build the spanning tree for The fusion edge stack (FES) stores the newly grown edges cluster i − 1. Second, in a general purpose processor, the so that they can be added to existing clusters. The root read latency depends upon where the data is present and table and size table stores the root and size of a cluster, it can range up to several hundreds of cycles if it needs to respectively. be fetched from the off-chip main memory to the on-chip caches. For our specialized hardware, the processing el- The root table entries are initialized to the indices ements can directly access the data stored on-chip that RootTable[i] = i, as shown in Figure 7. The size table require much fewer cycles. In this work, we assume a entries for the non-trivial syndrome bits are initialized to readout time of four cycles to read 32-bit data. 1 as shown in Figure 7. These tables aid the Union() and Find() operations to merge clusters after the growth phase. They are indexed by cluster indices. The tables Depth First Graph Correction Search Error Syndrome Generator Engine are sized for the maximum number of clusters possible Engine Log (Gr-Gen) (Corr) (DFS) which is equal to the total number of vertices in the sur- face code lattice. The tree traversal registers store the Figure 5: Block Diagram of UF decoder pipeline with three vertices of each cluster visited in the Find() operation. units corresponding to the three steps of the UF decoding Since the decoding algorithm grows all odd clusters until algorithm. the parity is even, odd clusters must be detected quickly. To do the same, we use parity registers as shown in Fig- ure 6. The parity registers store 1 bit parity per cluster depending upon whether it is odd or even. For a rea- A. Graph Generator sonable code distance of 11, seven 32-bit registers are enough. 
For larger code distances, we store the addi- tional parity information in the memory and read them The Graph Generator module takes the syndrome as in advance when required to hide the memory latency. an input and generates a spanning forest by growing clusters around non-trivial syndrome bits (non zero syn- The control logic reads the parity registers and grows drome bits). Instead of growing all surrounding edges clusters with odd parity (called the growth phase) by as in Figure 4(b) we only add the edges that reach new vertices. This directly produces a spanning forest with- out extra cost. The spanning forest is built using two 1 Our proposed design is slightly different from the actual UF algo- fundamental graph operations: Union() and Find() [85]. rithm since our objective is not obtain achieve an optimal asymp- Figure 6 shows the design of the three modules that totic complexity but to reduce the cost of hardware resources for implements the decoding algorithm. The Gr-Gen module a given system size 7

writing to the STM, ZDR, and adding newly added edges 4 7 4 that touches other cluster boundaries to the FES. The STM is not updated for edges that connect to other 3 2 6 5 3 2 0 6 7 clusters to prevent double growth. It is updated when 1 0 1 5 clusters are merged by reading from the FES. The logic Root 2 2 4 4 4 7 7 7 4 2 4 4 4 7 4 4 Table checks if a newly added edge connects two clusters by 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 reading the root table entries of the vertices connected Tree Traversal 0 6 by the edge (call these the primary vertices). This is Registers 2 Left: Cluster 0 equivalent to the Find() operation. The vertices visited Right: Cluster 1 on the path to find the root of each primary vertex are (a) (b) stored on the tree traversal registers as shown in Fig- ure 8(a). The root table entries for these vertices are Figure 8: (a) State of two clusters and root table entries. updated to directly point to the root of the cluster to Primary vertices 0 and 6 are now connected by an edge and minimize the depth of the tree for future traversals. This the two clusters needs to be merged. Vertices encountered on operation, called path compression, is a key feature of the the Find() path are saved on the tree traversal registers (b) Union Find algorithm and enables the reduction of the Root table entries updated for vertices encountered on the tree depth, amortizing the cost of Find() operation. For Find() path. example, Figure 8(a) shows the state of two clusters and root table at an instant in time. Let us assume that after a growth step, vertices 0 and 6 are connected and the two clusters must be merged. The tree traversal regis- aided by the information stored on the root table, size ters are used to update the root of vertex 0 as shown in table and FES. Figure 8(b). Since the depth of the tree is compressed during every Find() operation, only a few 32-bit registers are sufficient. The proposed design uses 5 registers per primary vertex. 
If the primary vertices belong to different clusters, the root of the smaller cluster is updated to point to the root of the larger cluster.

Delfosse et al. store the boundary of each cluster in their algorithm [79]. Based on a Monte-Carlo simulation showing that the average cluster diameter remains very small in the noise regime that is relevant for practical applications, we decided to compute the cluster boundary when it is needed in the growth phase, instead of consuming extra memory to store it.²

To summarize, the Gr-Gen module detects odd-parity clusters using the parity registers and grows them by reading and writing to the STM. The cluster growth is aided by the information stored on the root table, the size table and the FES.

B. Depth First Search Engine

The DFS engine processes the STM data produced by the Gr-Gen, which stores the set of grown even clusters. It uses the depth-first search algorithm to generate the list of edges that forms a spanning tree for each cluster in the STM.³ The logic is implemented using a finite state machine and two stacks, as shown in Figure 6. Stacks are used since the order in which edges are visited in the spanning tree must be reversed to perform the correction by peeling [84]. The edge stack stores the list of visited edges, while the pending edge stack is used to queue the next edges to explore in the on-going DFS.

To enable pipelining and improve performance, we design the micro-architecture with an alternate edge stack (Edge Stack 1 in Figure 6). When there is more than one cluster, the correction engine works on the edge list of one of the traversed clusters while the DFS engine traverses the other.
The DFS engine generates the list of edges visited to traverse a cluster using the DFS algorithm; hence, the number of memory reads required is directly proportional to the size of the clusters. By going over the STM row-wise and using the ZDR to visit only the non-zero rows, the effective cost of generating clusters is reduced.

Figure 7: State of the major components of the Gr-Gen module during graph generation (initial state, growth step, final state).
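The DFS traversal that produces a cluster's spanning-tree edge list can be sketched as below. This is an illustrative model, not the finite-state-machine hardware: the hypothetical `adjacency` dictionary stands in for reading grown edges out of the STM.

```python
# Illustrative DFS sketch: produce the spanning-tree edge list of one cluster.
# The pending-edge stack queues edges to explore; the edge stack records the
# visit order, which peeling later consumes in reverse.

def cluster_edge_list(adjacency, start):
    """Return the spanning-tree edges of the cluster containing `start`, in DFS order."""
    edge_stack = []                                      # visited spanning-tree edges
    pending = [(start, n) for n in adjacency[start]]     # pending edge stack
    visited = {start}
    while pending:
        u, v = pending.pop()
        if v in visited:
            continue                                     # edge closes a cycle: not a tree edge
        visited.add(v)
        edge_stack.append((u, v))
        pending.extend((v, w) for w in adjacency[v] if w not in visited)
    return edge_stack
```

Peeling then pops `edge_stack` from the top, i.e. it processes the most recently visited (leaf-side) edges first, which is why a stack rather than a queue is used.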

² By computing the boundary indices during cluster growth, we reduce the required memory capacity by 10% at the cost of 4 adders in the Gr-Gen.
³ A breadth-first search exploration works too, but we prefer DFS since it is generally more memory efficient.

The size of the edge stacks S0 and S1 used by the DFS engine and the Corr engine is given by the maximum number of edges of a spanning tree of a cluster. The size of the spanning tree of a cluster Ci is |V(Ci)| − 1, where |V(Ci)| denotes the number of vertices of the cluster Ci. To fit any possible cluster, one could pick a stack that can store d³ edges, which requires about d³ log₂(d³) bits. For simplicity, we ignore the Fusion Edge Stack and the Pending Edge Stack, which are in general significantly smaller than the two edge stacks S0 and S1. This is because these stacks contain only a small subset of the edges of a cluster.
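The storage formulas above, and the entries of Table II below, can be cross-checked with a short script. This is only a sanity check of the arithmetic, not part of the design; note that d³ log₂(d³) = 3d³ log₂(d).

```python
import math

# Memory estimates, in bits, for the main Gr-Gen/DFS components
# (formulas from Table II; results match the table up to rounding).
def stm_bits(d):
    return 7 * d**3                     # spanning tree memory: ~d^3 nodes, 2 bits/edge

def table_bits(d):
    return 3 * d**3 * math.log2(d)      # root table or size table: d^3 indices of log2(d^3) bits

def edge_stack_bits(d):
    return 3 * d**3 * math.log2(d)      # worst-case stack: d^3 edges, i.e. d^3 * log2(d^3) bits

for d in (5, 15, 25):
    print(d, round(stm_bits(d) / 8), "bytes STM,",
          round(table_bits(d) / 8), "bytes per table")
```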

Table II: Memory requirement for our hardware design of the Union-Find decoder as a function of the minimum distance d of the surface code. Note that we consider the worst case, but the two Edge Stacks can be made significantly smaller, as explained in Section II D.

Figure 9: Peeling for an example error graph performed in the Correction Engine. The status of the edge stack, error log, and syndrome hold registers is shown for each step.

Component          Size in bits     d = 5       d = 15       d = 25
STM                7d³              100 Bytes   3 KBytes     14 KBytes
Tables (×2)        3d³ log₂(d)      100 Bytes   5 KBytes     27 KBytes
Parity Reg.        d³               15 Bytes    400 Bytes    2 KBytes
ZDR                3d³              46 Bytes    1.3 KBytes   6 KBytes
Edge stacks (×2)   3d³ log₂(d)      100 Bytes   5 KBytes     27 KBytes

C. Correction Engine

The correction engine performs the peeling process of the decoder [84] and identifies the Pauli correction to apply. This requires access to the edge list (which is stored on the stack) and to the syndrome bits corresponding to the vertices along the edge list. The syndrome bits could be accessed through the STM; however, this increases the logical complexity, the latency, and the number of memory requests that the STM is required to handle. To reduce the incoming memory traffic for the STM and eliminate the need for additional logic, the syndrome information is saved on the stack along with the edge index information by the DFS engine. The temporary syndrome changes caused by peeling are saved on local registers (the Syndrome Hold Registers shown in Figure 9). The Corr engine also reads the error log of the last surface code cycle and updates the Pauli correction for the current edge. For example, if the error on an edge e0 was Z in the previous logical cycle and it encounters a Z error in the current cycle too, the Pauli error for e0 is updated to I, as shown in Figure 9.

D. Hardware cost

We measure the hardware cost by estimating the amount of memory required. Table II shows the different contributions to the memory requirement.

The spanning tree memory (STM) used by the Gr-Gen and the DFS engine accounts for most of the storage costs. It contains 1 bit per node of the decoding graph and at most 2 bits per edge (only 1 bit on the boundary). The decoding graph is a 3D cubic lattice with about d³ vertices, which leads to a total of 7d³ bits for the STM. The root table and the size table used in the Gr-Gen module contain d³ entries, and each entry consists of an integer index which can be uniquely identified using log₂(d³) bits. Thus, the total sizes of the root and size tables are 3d³ log₂(d) each.

Figure 10: Distribution of the number of vertices per cluster after the growth step of the UF decoder for code distance d = 11 and physical error rate p = 10^-3. This distribution is estimated by a Monte-Carlo simulation using 10⁷ samples.

⁴ This is because a large cluster generally corresponds to an uncorrectable error configuration and, by definition, the probability of such an error is at most p_Log.

The size of the edge stacks can in principle reach the previous upper bound d³ log₂(d³), in the case of a cluster that covers the whole decoding graph.
This makes it the most expensive element of the design. However, the probability that such a large cluster is reached remains extremely small in a practical noise regime.⁴ By analyzing the maximum size of a cluster after the growth step using Monte-Carlo simulations, we optimize the stack size for a chosen code distance d and a given physical error rate p. Figure 10 shows the cluster size distribution for code distance d = 11 and physical error rate p = 10^-3. For these parameters, the probability of a cluster of size larger than 80 is smaller than the logical error rate for this code. One can thus ignore the clusters of size larger than 80 without significantly affecting the error correction performance. This drops the stack memory requirement by a factor of 10x, from 1.7 KBytes to 0.13 KBytes.

To further reduce the memory requirement, the edge stacks can be sized to half the maximum size of a cluster spanning tree (derived from simulation), optimizing for the common case. In the rare event of an overflow, the alternate stack is used.
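The order of magnitude of this 10x saving can be reproduced from the per-entry stack width. This is a back-of-the-envelope check, assuming each stack entry stores an edge index of 3·log₂(d) bits, consistent with the 3d³ log₂(d) worst-case stack size.

```python
import math

def stack_bytes(d, max_edges):
    """Stack capacity in bytes for `max_edges` entries of 3*log2(d) bits each."""
    return max_edges * 3 * math.log2(d) / 8

d = 11
full = stack_bytes(d, d**3)   # stack sized for a cluster covering the whole graph
trunc = stack_bytes(d, 80)    # stack sized for clusters of at most 80 edges
print(f"{full / 1024:.2f} KB -> {trunc:.0f} B")   # roughly 1.7 KB -> ~0.1 KB
```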

In general, the memory required for each of the two DFS stacks is approximately 3S(d, p) log₂(d), where S(d, p) is the minimum integer s such that the probability of having a cluster with more than s edges on the output of the Gr-Gen is below p_Log(d, p). We say that a stack overflow failure occurs if the DFS engine encounters a cluster that does not fit in a stack, that is, a cluster with more than S(d, p) edges. By construction, the stack sizes are optimized in such a way that

p_Sof(d, p) ≤ p_Log(d, p)    (2)

where p_Sof denotes the probability of a decoding failure due to a stack overflow error.

Figure 11: Comparison of the memory capacity required for the MWPM decoder and for our implementation of the UF decoder. Other decoders are much more memory intensive than our design. The LUT decoder requires the storage of more than 2^1000 correction bit-strings for d = 11, and ML decoders cost several MBs to GBs of memory depending on the implementation and the code distance.
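Choosing S(d, p) amounts to a tail-quantile computation over sampled cluster sizes, sketched below. This is illustrative only: the hypothetical `cluster_sizes` input would come from a Monte-Carlo simulation of the Gr-Gen output, and `p_log` from a logical error rate estimate.

```python
def stack_size(cluster_sizes, p_log):
    """Smallest s such that the empirical Pr[cluster has more than s edges] <= p_log,
    estimated from a list of sampled cluster edge counts."""
    n = len(cluster_sizes)
    for s in sorted(set(cluster_sizes)):
        tail = sum(1 for c in cluster_sizes if c > s) / n
        if tail <= p_log:
            return s
    return max(cluster_sizes)
```

Sizing the stack to hold `stack_size(samples, p_log)` edges then guarantees, on the sampled distribution, that the overflow probability satisfies condition (2).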

E. Comparison with other decoders

For comparison, we provide a rough estimate of the memory capacity required for the MWPM decoder. The average number of faults in the decoding graph for a given noise rate is p|E|, where |E| is the number of edges of the decoding graph. For low values of p (such as p = 10^-3), many of these configurations of p|E| faulty edges are non-overlapping, and most of the edges are in the bulk of the lattice, which results in about w = ⌈2p|E|⌉ non-zero syndrome bits. Given a set of w non-zero syndrome nodes, the MWPM decoder generates a complete graph with w + 1 vertices (and w(w + 1)/2 edges) and performs a Minimum Weight Matching algorithm in this graph. A state-of-the-art implementation of this algorithm due to Kolmogorov [82] consumes about 161 bits per edge (4 pointers, 1 integer and 1 bit per edge), which brings the memory capacity for the MWPM decoder to at least

161 · ⌈2p|E|⌉ · (⌈2p|E|⌉ + 1)/2 ≈ 2900 · p² d⁶    (3)

bits. Therein, we use |E| ≈ 3d³. In order to provide
a non-trivial lower bound when d is small, we use the fact that the MWPM decoder must correct any set of (d − 1)/2 faults. Such a set may lead to w = d − 1 non-zero syndrome bits, resulting in a lower bound of 161(d − 1)d/2 ≈ 80d² bits. Taking the best of both lower bounds, we obtain the result of Figure 11, which shows that our UF decoder design requires slightly higher memory than a MWPM decoder for low code distances and outperforms the MWPM decoder for larger distances, making it more scalable. Given that we only consider average-weight fault configurations and not the worst case for the MWPM decoder, we believe that our lower bound on the MWPM memory capacity is very optimistic and that the UF decoder actually surpasses the MWPM decoder in terms of memory capacity much before distance 20, as observed in the figure.

Figure 12: Design of a decoder block that contains N Gr-Gen modules, αN DFS engines and βN Corr engines (with β < α < 1). N logical qubits share a decoder block.

III. RESOURCE OPTIMIZATION

For a system with a large number L of logical qubits, the most straightforward implementation allocates two decoders per logical qubit, one for each type of error, X and Z. Thus, for the baseline design, the decoding logic uses 2L decoders. However, the utilization of each pipeline stage varies, causing under-utilization of certain stages. Ideally, we want to reduce the number of decoders required for the overall system. In this section, we optimize not only the total number of decoders but also the exact number of modules of each type: Gr-Gen, DFS engine and Corr engine.
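As a cross-check of the MWPM memory estimate of Section II E, Eq. (3) and the small-d lower bound can be evaluated directly. This is a numerical restatement of those formulas, using |E| ≈ 3d³ and 161 bits per matching-graph edge; it is not a model of an actual MWPM implementation.

```python
import math

BITS_PER_EDGE = 161  # Kolmogorov's implementation: ~4 pointers, 1 integer, 1 bit per edge

def mwpm_memory_bits(d, p):
    """Best of the two lower bounds on the MWPM decoder memory, in bits."""
    w = math.ceil(2 * p * 3 * d**3)                  # expected non-zero syndrome bits
    average_case = BITS_PER_EDGE * w * (w + 1) // 2  # Eq. (3), before the p^2 d^6 approximation
    worst_case = BITS_PER_EDGE * (d - 1) * d // 2    # w = d - 1 from (d-1)/2 adversarial faults
    return max(average_case, worst_case)
```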

A. Sharing hardware modules

The core component of our optimized architecture is the decoder block shown in Figure 12, which uses a reduced number of pipeline units to perform the decoding of N logical qubits. Our Error Decoding Architecture (EDA), shown in Figure 1, uses L/N decoder blocks to perform error correction over the L logical qubits of the computer. The motivation for our design is that the growth stage implemented by the N Gr-Gen modules is significantly more complex than the DFS stage. Therefore, we expect the DFS engine to wait for a fraction of the time while the Gr-Gen module terminates. Instead of waiting, we prefer to use a smaller number of DFS engines (αL < L) that share the work of L Gr-Gen modules. The value α will be optimized to keep the waiting time of the DFS engines minimum, saving a fraction (1 − α) of the DFS hardware. We proceed in the same way to optimize the number βL of Corr engines.

The hardware overhead of this optimization is the multiplexors and demultiplexors on the datapath, as shown in Figure 12. Memory requests generated by the Corr engine are routed to the correct memory locations using a demultiplexor. The select logic prioritizes the first ready component and uses round-robin arbitration to generate the appropriate select signals for the multiplexors. For example, if four Gr-Gen units share a DFS engine and the second Gr-Gen finishes cluster formation earlier than the other units, it receives access to the DFS engine. The round-robin policy ensures fairness while sharing resources.

B. Decode block design constraint

As explained previously, in order to correct circuit faults and measurement errors, the decoder needs d consecutive rounds of syndrome data, which form a logical cycle. To prevent backlog problems, the decoder must provide a correction before the end of the next logical cycle, when a new decoding request arrives. If a decoder block fails to terminate its work within a logical cycle, errors start accumulating and spreading over the quantum computer. We refer to this type of failure as a timeout failure. In order to ensure that timeout failures do not significantly degrade the decoding performance, we impose the following constraint on the decoder block design:

p_Tof(d, p)/N ≤ p_Log(d, p)    (4)

where p_Tof(d, p) is the timeout failure probability for the decoder block. The timeout failure probability per logical qubit must be lower than the probability of a logical error. This condition ensures that the fault rate on the output of the whole quantum computation is at most doubled due to timeout failures. In what follows, we propose a fast and hardware-efficient decoder block design that respects the constraint (4).

C. Modules runtime simulation

We model the decoder performance by studying the number of reads. The write operations performed are read-modify-write, and the writeback is not on the critical path. We assume a 4-cycle latency for memory accesses and a 4 GHz clock frequency [86, 87].

Denote by C_1, ..., C_m the set of clusters generated by the Gr-Gen module. A single growth step for a cluster C requires reading a set of rows of the STM that cover the cluster. We estimate this number by diam(C)², assuming that the cluster spreads roughly uniformly in all directions. Summing over all clusters and growth steps, the total number of memory requests generated in the Gr-Gen is approximated by the sum

τ_GG = Σ_{i=1}^{m} Σ_{j=1}^{diam(C_i)} j²    (5)

because each growth step increases the diameter by 1. The numbers of memory requests in the DFS engine and in the Corr engine to treat a cluster C_i are both given by the size of its spanning tree, which is |V(C_i)| − 1 where |V(C_i)| is the number of vertices of C_i. Including all clusters, we obtain the estimate

τ_DFS = τ_CE = Σ_{i=1}^{m} |V(C_i)|    (6)

for the total number of reads in the DFS engine or in the Corr engine.

In order to select the DFS engine ratio α, we estimate the ratio between the execution times of the Gr-Gen and the DFS engine by a Monte-Carlo simulation of τ_GG and τ_DFS for the distance-11 surface code with an error rate p = 10^-3, as shown in Figure 13. To estimate these runtimes, we sample families of clusters by generating random errors according to the phenomenological noise model described in Section I C, and by simulating the growth step of the decoder for these errors. This provides us with samples of cluster families from the output of the Gr-Gen module. Our Monte-Carlo simulation produces the result of Figure 13, which shows the correlation between the execution times of the Gr-Gen and the DFS engine. As expected, more time is spent in the Gr-Gen unit. We observe roughly a factor of two between the execution times of the two units, which suggests that one can eliminate half of the DFS units.

Figure 13: Correlation between Gr-Gen and DFS engine execution times for code distance d = 11 and physical error rate p = 10^-3. Each dot corresponds to a random error configuration and the runtimes of the two modules are estimated using Eqs. (5) and (6). Duplicate data points cannot be observed on this plot.
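Equations (5) and (6) translate directly into code. Given the diameters and vertex counts of the sampled clusters (hypothetical inputs standing in for the Monte-Carlo samples), the runtime proxies are:

```python
def tau_gg(cluster_diameters):
    """Eq. (5): sum of j^2 for j = 1..diam(C_i), over all clusters."""
    return sum(j * j for diam in cluster_diameters
               for j in range(1, diam + 1))

def tau_dfs(cluster_vertex_counts):
    """Eq. (6): total number of reads in the DFS (or Corr) engine."""
    return sum(cluster_vertex_counts)
```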

Figure 14: Distribution of the execution time of the (4, 2, 1, 1)-block with code distance 11 and error rate 10^-3. The shaded region has probability smaller than N·p_Log (for N = 2 logical qubits), which means that by interrupting the decoder block after 325 ns, we obtain a timeout failure rate that satisfies condition (4).

D. Optimized decoder block

The results of Section III C encourage us to consider a decoder block with parameters α = 0.5 and β = 1. However, nothing guarantees that this choice will lead to a decoder block that is fast enough to satisfy condition (4). In this section, we design an optimized decoder block for the surface code with distance 11 that satisfies (4) and that can be implemented in only 325 ns in the noise regime p = 10^-3, under the memory frequency and latency assumptions above. This demonstrates that our decoder block is clearly fast enough to perform the surface code decoding. The decoder is actually 30 times faster than the logical cycle time of the distance-11 surface code, which is about 11 µs [88].

We consider the smallest decoder block with α = 0.5 and β = 1. It includes two logical qubits, that is, four error configurations to correct (two X-type and two Z-type), two Gr-Gen units, one DFS engine and one Corr engine. We refer to this optimized design as the (4, 2, 1, 1)-decoder block. Figure 14 shows our estimation of the execution time of the whole block to decode the two logical qubits, obtained by simulating the whole pipeline of the (4, 2, 1, 1)-block with a Monte-Carlo simulation with importance sampling. We observe that by interrupting the decoding after 325 ns, we obtain a block that satisfies (4).

For L logical qubits, the numbers of Gr-Gen units, DFS engines, and Corr engines required are L, L/2, and L/2 respectively in the optimized architecture. Thus, the total numbers of Gr-Gen units, DFS engines, and Corr engines are reduced by 2×, 4×, and 4× respectively.

The total memory capacities required for the baseline design and for our optimized decoder block are summarized in Table III for 1000 qubits encoded with the distance-11 surface code. The previous (4, 2, 1, 1)-block leads to a saving of about 50% of the memory capacity. In order to further reduce the memory requirement, we can use a shared root table and a shared size table between the two Gr-Gen modules of the decoder block. This leads to a slight slowdown of the decoder, because both Gr-Gen modules cannot simultaneously perform the growth step, but the two STMs can be used in parallel. While the first STM is used by a DFS engine, the second one can be used by a Gr-Gen module to grow clusters. A (4, 2, 1, 1)-block with a single root table and a single size table achieves a 70% (3.5X) memory reduction compared to the naive design.

Table III: Reduction in the total memory capacity required to correct both X-type and Z-type errors for 1000 logical qubits with code distance 11 and error rate 10^-3.

Design Component      Baseline   Optimized Design (Savings)
STM (Gr-Gen)          1.97 MB    0.99 MB (2X)
Root Table (Gr-Gen)   3.17 MB    0.79 MB (4X)
Size Table (Gr-Gen)   3.46 MB    0.87 MB (4X)
Stacks (DFS Engine)   1.35 MB    0.34 MB (4X)
Total                 9.96 MB    2.81 MB (3.5X)

IV. SYNDROME DATA COMPRESSION

A major challenge in designing any error decoding architecture is the large bandwidth required between the quantum substrate and the decoding logic. In order to perform error decoding, the syndrome measurement data must be transported from the quantum substrate to the decoding units. For a given qubit plane with L logical qubits, each encoded using a surface code of distance d, about 2d²L bits must be sent at the end of each syndrome measurement cycle, which requires significant bandwidth, on the order of several Gb/s for a reasonable number of logical qubits and code distance. Data transmission at a lower bandwidth, on the other hand, reduces the effective time left for error decoding, since a decoder must provide an estimation of the error within d syndrome measurement cycles. In this section, we consider three compression techniques to reduce the bandwidth needs and we analyze their performance in different noise regimes. We focus on compression schemes
that use simple encoding and do not require large hardware complexity.

A. Can data compression work?

An approach to reduce the memory bandwidth requirements in conventional memory systems is data compression [89–94]. Data compression works well when the data is sparse and has low entropy. To justify the potential of syndrome data compression, we estimate the sparsity of the syndrome data analytically for realistic noise regimes. For a noise strength p in the phenomenological noise model introduced in Section I C, the expected number of faults in the 3D decoding graph, which contains 7d³ fault locations, is about 3d³p. In the case of a distance-11 surface code, we expect only 4 errors on average, resulting in a non-trivial syndrome vector of length ≈ 1,000 with Hamming weight ≤ 8, since each error is detected by at most two non-trivial syndrome bits. The expected Hamming weight of the syndrome vectors drops even further for lower noise strength p. With this in mind, we analyze small block sizes even for very low error rates.

B. Three low-overhead compression techniques

We consider the following three compression methods, which allow for a simple hardware implementation. Figure 15 provides the basic idea of the Sparse Representation and of Dynamic Zero Compression. The Geometry-based compression is a variant of Dynamic Zero Compression that exploits the geometry of the lattice of qubits.

1. Sparse Representation: This is similar to the traditional technique of storing only the indices of the non-zero elements of sparse matrices. Instead of sending a sparse syndrome vector, we send a bit that indicates whether all the syndrome bits are zero, followed by the indices of the non-trivial bits in the case of a non-zero syndrome. With this method, a syndrome vector with length ℓ and Hamming weight w is compressed into 1 + w log₂(ℓ) bits.

2. Dynamic Zero Compression (DZC): We adopt the DZC technique [90], as shown in Figure 15(a), to compress syndrome data. A syndrome vector of length ℓ is grouped into b blocks of m bits each. We store the indicator vector of the non-trivial blocks concatenated with the exact value of the non-trivial blocks. A syndrome with b blocks has length bm. However, if it contains only w non-trivial blocks, it can be transmitted with the DZC technique using only b bits for the non-trivial block indicator vector and wm bits for the non-trivial blocks themselves, that is, a total of b + wm bits.

3. Geometry-based compression (Geo-Comp): This is an adaptation of DZC that also accounts for the geometry of the surface code lattice. The basic idea is that non-trivial syndrome values generally appear in pairs of neighboring bits. We can therefore increase the compression rate of the DZC technique by using square blocks that respect the structure of the lattice. With this block decomposition, two neighboring bits are more likely to fall in the same block, reducing the number w of non-trivial blocks to send.

In this work, we treat X-type and Z-type errors separately. However, for any of the three compression techniques described above, a slightly better compression rate can be obtained by compressing together the syndrome data corresponding to both types of errors. In general, the number and size of the DZC blocks can be adjusted for a given noise model by computing the expected number of non-trivial syndrome blocks. Blocks of larger size improve the compression ratio but also lead to more complex hardware by adding to the logic depth.

Figure 15: (a) Dynamic Zero Compression (DZC). The compressed data consists of the zero indicator bits, which provide the locations of the '000' blocks, followed by the content of the non-zero blocks. (b) Sparse Representation. We can store sparse data efficiently by providing the number of non-zero bits and their indices.

Figure 16: Mean syndrome compression ratio as a function of the code distance d for different physical error rates. We plot the best compression ratio among the three techniques considered in the present work: Sparse Representation, DZC and Geometry-based compression.
We determine the effectiveness of a compression scheme by analyzing the compression ratio:

Compression Ratio = Actual Syndrome Length / Compressed Syndrome Length    (7)

The best compression scheme depends on the noise model. Sparse compression appears to be a good choice in the regime of very low error rates, because the syndrome vector is often trivial and is then sent as a single bit. In a noisier regime and for small-distance codes, we prefer the other compression schemes, DZC and Geometry-based compression. For any noise rate below 10^-3, we achieve a compression rate which varies between 20× for d = 3 and 400× for distance-21 surface codes.

V. DISCUSSION

A. Scalability

In this paper, we use the Union-Find decoder to analyse the scope of architectural optimizations in designing high-performance and scalable decoders. However, the design principles apply to other decoders in general, such as machine learning or graph algorithm-based decoders. For example, machine learning decoders require several network layers. Overall resources can be reduced by pipelining layers in inference and by sharing these layers between logical qubits. Similarly, while our study only focuses on the regular surface code, the same analysis holds true for other types of QEC codes based on Euclidean lattices [95, 96] or color codes [97]. However, it seems non-trivial to adapt the STM used in our design to the non-trivial lattice topology of hyperbolic codes [98–100], and thus a different micro-architecture is needed. Lastly, the syndrome compression is valid for an arbitrary decoder–quantum substrate interface, independent of their types, although it can be improved by exploiting the code structure, as we propose with the geometry-based compression.

B. Assumptions

We assume that all syndrome extraction circuits can be executed in parallel, an assumption used by most decoders [101]. However, the amount of parallelism depends on the qubit technology. A large amount of parallelism is achievable on modern superconducting qubits, although other types of qubits may offer less parallelism.

C. Noise Model

We use the phenomenological noise model for our study, and there is scope to further optimize the design for enhanced noise models and to account for correlation in errors. We consider a physical error rate of 10^-3 because QEC codes cannot lower the logical error rate substantially unless the initial physical error rate is lower than the threshold, which is about 1%. For the system sizes we have considered in this paper, with about 100−1000 qubits, error rates of about 10^-3 are required to run practical applications of scientific and commercial value.

VI. RELATED WORK

Designing a quantum computer requires full-stack solutions [102] and interdisciplinary research [103]. This has led to developments in programming languages [104–129], compilers [116, 130–136], micro-architecture [137–140], control circuits [141–146], and quantum devices. Although existing Noisy Intermediate-Scale Quantum (NISQ) computers [147–151] are expected to scale up to hundreds of qubits and may outperform classical computers for some problems [152–154], they will still be too small to achieve fault tolerance. On the contrary, the scope of a quantum computer greatly broadens in the presence of fault tolerance, and therefore designing FTQCs is an important area of research.

QEC plays a seminal role in FTQCs. In addition to the standard references [70, 71], we suggest the following recent reviews to learn more about QEC codes and fault tolerance [155–157]. Recent experimental results suggest that quantum error correction is reaching an inflection point where the quality of encoded qubits is better than the quality of raw qubits [158, 159]. A few articles consider the hardware aspects of decoder designs, such as discussing the potential of GPUs and ASICs [20, 56] and high-speed circuits [19], or describing the architecture of the neural networks in the case of ML decoders. These studies focus primarily on achieving higher performance and accuracy for a single logical qubit. In order to support the design of large-scale error-corrected quantum devices, more studies on the hardware aspects of decoder designs and their scalability to many logical blocks are necessary.

In a system-level study [138], Tannu et al. identified the requirement of large bandwidth in sending control instructions from the control processor to the qubits and proposed sharing of micro-code between neighboring qubits. Using micro-code to deliver control pulses [139] only addresses communication from the control processor to the qubits, whereas large bandwidth is also essential to transmit syndrome data back from the qubits to the decoders. Since syndromes differ across logical qubits, depending on the error, it is not possible to send one syndrome for multiple qubits (sharing). Therefore, we explore the possibility of using syndrome compression through low-overhead compression schemes.

VII. CONCLUSION

The decoder is a key component of a fault-tolerant quantum computer, which is in charge of translating the output of the syndrome measurement circuits into error types and locations. Many decoding algorithms have been studied in the past 20 years [14, 17–68]. They generally focus on improving either the decoder accuracy or the decoder runtime, or on providing a good compromise between them. In this work, we introduce a third constraint, the scalability constraint, which states that the decoder must scale to the regime of practical applications, for which thousands of logical qubits must be decoded simultaneously, and we design a decoder that satisfies all three design constraints: accuracy, latency and scalability. Namely, the decoder properly identifies the error which occurs (accuracy), it is fast enough to avoid accumulation of errors during the computation (latency), and we propose a resource-efficient hardware implementation that scales to the massive size required for industrial applications (scalability).

In order to achieve the scalability constraint, we study the scope and impact of micro-architectural optimizations in designing decoders for QEC, using the Union-Find decoder as an in-depth case study. We also investigate a system-level framework for an Error Decoding Architecture whereby, instead of using dedicated decoding units per logical unit, we multiplex the decoding resources across neighbouring logical qubits while minimizing the timeout errors due to lack of decoding resources and limiting the possibility of system failure. Finally, we investigate the feasibility of low-cost syndrome compression to reduce the memory bandwidth required for transmitting the syndrome information from the quantum substrate to the decoding hardware. Our solutions reduce the number of decoders by more than 50%, the amount of memory required by 70%, and the memory bandwidth by more than 30x for large FTQCs. Although we use the Union-Find decoder and surface codes for our study, the design principles, optimizations, and results from our study apply to other types of decoders and to certain other QEC codes as well. The compression schemes discussed apply to any qubit technology and decoder.

In addition to substantial hardware savings, our optimized decoder micro-architecture significantly speeds up the Union-Find decoder. Our numerical simulations suggest that our design provides a decoder that is fast enough to perform error correction with superconducting qubits, assuming a surface code syndrome round of 1µs [88]. Further study is necessary to confirm the validity of our model on a real device. Ultimately, we would like to consider an FPGA implementation or the fabrication of an ASIC based on our micro-architecture.
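The Union-Find decoder named above is built on the classical disjoint-set (union-find) data structure [85], which merges growing clusters of syndrome defects in nearly constant amortized time per operation. A minimal sketch of that underlying primitive, with union-by-size and path compression (illustrative software, not the paper's hardware pipeline):

```python
class DisjointSet:
    """Union-find with union-by-size and path compression [85].

    In the Union-Find decoder, clusters of syndrome defects are grown and
    merged; this structure tracks cluster membership of lattice vertices.
    """

    def __init__(self, n):
        self.parent = list(range(n))  # each vertex starts as its own cluster
        self.size = [1] * n           # cluster sizes, valid at roots only

    def find(self, x):
        """Return the root of x's cluster, flattening the path on the way."""
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:  # path compression
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        """Merge the clusters of a and b; return the surviving root."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return ra
        if self.size[ra] < self.size[rb]:  # attach the smaller cluster
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        return ra


# Merging clusters of neighbouring defects:
ds = DisjointSet(6)
ds.union(0, 1)
ds.union(1, 2)
assert ds.find(0) == ds.find(2)   # 0, 1, 2 now form one cluster
assert ds.find(3) != ds.find(0)   # 3 is still isolated
```

Union-by-size keeps the cluster trees shallow and path compression flattens them further, which is why the decoder's cluster-merging step is almost linear in the number of defects.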

[1] Peter W Shor. Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer. SIAM Review, 41(2):303–332, 1999.
[2] Richard P Feynman. Simulating physics with computers. International Journal of Theoretical Physics, 21(6):467–488, 1982.
[3] Markus Reiher, Nathan Wiebe, Krysta M Svore, Dave Wecker, and Matthias Troyer. Elucidating reaction mechanisms on quantum computers. Proceedings of the National Academy of Sciences, 114(29):7555–7560, 2017.
[4] Katherine L Brown, William J Munro, and Vivien M Kendon. Using quantum computers for quantum simulation. Entropy, 12(11):2268–2307, 2010.
[5] Lov K Grover. A fast quantum mechanical algorithm for database search. arXiv preprint quant-ph/9605043, 1996.
[6] Peter W Shor. Scheme for reducing decoherence in quantum computer memory. Physical Review A, 52(4):R2493, 1995.
[7] A Robert Calderbank and Peter W Shor. Good quantum error-correcting codes exist. Physical Review A, 54(2):1098, 1996.
[8] Andrew Steane. Multiple-particle interference and quantum error correction. Proceedings of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences, 452(1954):2551–2577, 1996.
[9] Daniel Gottesman. Stabilizer codes and quantum error correction. arXiv preprint quant-ph/9705052, 1997.
[10] Florence Jessie MacWilliams and Neil James Alexander Sloane. The theory of error-correcting codes, volume 16. Elsevier, 1977.
[11] Peter W Shor. Fault-tolerant quantum computation. In Proceedings of 37th Conference on Foundations of Computer Science, pages 56–65. IEEE, 1996.
[12] Dorit Aharonov and Michael Ben-Or. Fault-tolerant quantum computation with constant error rate. arXiv preprint quant-ph/9906129, 1999.
[13] Panos Aliferis, Daniel Gottesman, and John Preskill. Quantum accuracy threshold for concatenated distance-3 codes. arXiv preprint quant-ph/0504218, 2005.
[14] Eric Dennis, Alexei Kitaev, Andrew Landahl, and John Preskill. Topological quantum memory. Journal of Mathematical Physics, 43(9):4452–4505, 2002.
[15] Austin G Fowler, Matteo Mariantoni, John M Martinis, and Andrew N Cleland. Surface codes: Towards practical large-scale quantum computation. Physical Review A, 86(3):032324, 2012.
[16] Krysta M Svore, David P DiVincenzo, and Barbara M Terhal. Noise threshold for a fault-tolerant two-dimensional lattice architecture. Quantum Information & Computation, 7(4):297–318, 2007.
[17] Austin G Fowler, Adam C Whiteside, Angus L McInnes, and Alimohammad Rabbani. Topological code autotune. Physical Review X, 2(4):041003, 2012.
[18] Austin G Fowler, Adam C Whiteside, and Lloyd CL Hollenberg. Towards practical classical processing for the surface code. Physical Review Letters, 108(18):180501, 2012.
[19] Austin G Fowler, Adam C Whiteside, and Lloyd CL Hollenberg. Towards practical classical processing for the surface code: Timing analysis. Physical Review A, 86(4):042313, 2012.
[20] Austin G Fowler. Minimum weight perfect matching of fault-tolerant topological quantum error correction in average O(1) parallel time. Quantum Information and Computation, 15(1&2):0145–0158, 2015.
[21] Guillaume Duclos-Cianci and David Poulin. Kitaev's Z_d-code threshold estimates. Physical Review A, 87(6):062338, 2013.
[22] Hussain Anwar, Benjamin J Brown, Earl T Campbell, and Dan E Browne. Fast decoders for qudit topological codes. New Journal of Physics, 16(6):063038, 2014.
[23] Fern HE Watson, Hussain Anwar, and Dan E Browne. Fast fault-tolerant decoder for qubit and qudit surface codes. Physical Review A, 92(3):032309, 2015.
[24] Adrian Hutter, Daniel Loss, and James R Wootton. Improved HDRG decoders for qudit and non-abelian quantum error correction. New Journal of Physics, 17(3):035017, 2015.
[25] Guillaume Duclos-Cianci and David Poulin. Fast decoders for topological quantum codes. Physical Review Letters, 104(5):050504, 2010.
[26] Sergey Bravyi and Jeongwan Haah. Quantum self-correction in the 3d cubic code model. Physical Review Letters, 111(20):200501, 2013.
[27] Guillaume Duclos-Cianci and David Poulin. Fault-tolerant renormalization group decoder for abelian topological codes. arXiv preprint arXiv:1304.6100, 2013.
[28] Sean D Barrett and Thomas M Stace. Fault tolerant quantum computation with very high threshold for loss errors. Physical Review Letters, 105(20):200502, 2010.
[29] Thomas M Stace and Sean D Barrett. Error correction and degeneracy in surface codes suffering loss. Physical Review A, 81(2):022317, 2010.
[30] Nicolas Delfosse and Gilles Zémor. Linear-time maximum likelihood decoding of surface codes over the quantum erasure channel. arXiv preprint arXiv:1703.01517, 2017.
[31] Austin G Fowler. Optimal complexity correction of correlated errors in the surface code. arXiv preprint arXiv:1310.0863, 2013.
[32] Nicolas Delfosse and Jean-Pierre Tillich. A decoding algorithm for CSS codes using the X/Z correlations. In 2014 IEEE International Symposium on Information Theory, pages 1071–1075. IEEE, 2014.
[33] Ben Criger and Imran Ashraf. Multi-path summation for decoding 2d topological codes. Quantum, 2:102, 2018.
[34] Yu Tomita and Krysta M Svore. Low-distance surface codes under realistic quantum noise. Physical Review A, 90(6):062320, 2014.
[35] Bettina Heim, Krysta M Svore, and Matthew B Hastings. Optimal circuit-level decoding for surface codes. arXiv preprint arXiv:1609.06373, 2016.
[36] Eric Dennis. Purifying quantum states: Quantum and classical algorithms. arXiv preprint quant-ph/0503169, 2005.
[37] James William Harrington. Analysis of quantum error-correcting codes: symplectic lattice codes and toric codes. PhD thesis, California Institute of Technology, 2004.
[38] James R Wootton and Daniel Loss. High threshold error correction for the surface code. Physical Review Letters, 109(16):160503, 2012.
[39] Adrian Hutter, James R Wootton, and Daniel Loss. Efficient Markov chain Monte Carlo algorithm for the surface code. Physical Review A, 89(2):022326, 2014.
[40] James Wootton. A simple decoder for topological codes. Entropy, 17(4):1946–1957, 2015.
[41] Michael Herold, Michael J Kastoryano, Earl T Campbell, and Jens Eisert. Fault tolerant dynamical decoders for topological quantum memories. arXiv preprint arXiv:1511.05579, 2015.
[42] Nikolas P Breuckmann, Kasper Duivenvoorden, Dominik Michels, and Barbara M Terhal. Local decoders for the 2d and 4d toric code. arXiv preprint arXiv:1609.00510, 2016.
[43] Nicolas Delfosse and Naomi H Nickerson. Almost-linear time decoding algorithm for topological codes. arXiv preprint arXiv:1709.06218, 2017.
[44] Aleksander Kubica and John Preskill. Cellular-automaton decoders with provable thresholds for topological codes. Physical Review Letters, 123(2):020501, 2019.
[45] Naomi H Nickerson and Benjamin J Brown. Analysing correlated noise on the surface code using adaptive decoding algorithms. Quantum, 3:131, 2019.
[46] Andrew J Ferris and David Poulin. Tensor networks and quantum error correction. Physical Review Letters, 113(3):030501, 2014.
[47] Sergey Bravyi, Martin Suchara, and Alexander Vargo. Efficient algorithms for maximum likelihood decoding in the surface code. Physical Review A, 90(3):032326, 2014.
[48] Andrew S Darmawan and David Poulin. Linear-time general decoding algorithm for the surface code. Physical Review E, 97(5):051302, 2018.
[49] David K Tuckett, Christopher T Chubb, Sergey Bravyi, Stephen D Bartlett, and Steven T Flammia. Tailoring surface codes for highly biased noise. arXiv preprint arXiv:1812.08186, 2018.
[50] Giacomo Torlai and Roger G Melko. Neural decoder for topological codes. Physical Review Letters, 119(3):030501, 2017.
[51] Paul Baireuther, Thomas E O'Brien, Brian Tarasinski, and Carlo WJ Beenakker. Machine-learning-assisted correction of correlated qubit errors in a topological code. Quantum, 2:48, 2018.
[52] Stefan Krastanov and Liang Jiang. Deep neural network probabilistic decoder for stabilizer codes. Scientific Reports, 7(1):11003, 2017.
[53] Savvas Varsamopoulos, Ben Criger, and Koen Bertels. Decoding small surface codes with feedforward neural networks. Quantum Science and Technology, 3(1):015004, 2017.
[54] Christopher Chamberland and Pooya Ronagh. Deep neural decoders for near term fault-tolerant experiments. Quantum Science and Technology, 3(4):044002, 2018.
[55] Nishad Maskara, Aleksander Kubica, and Tomas Jochym-O'Connor. Advantages of versatile neural-network decoding for topological codes. Physical Review A, 99(5):052351, 2019.
[56] Nikolas P Breuckmann and Xiaotong Ni. Scalable neural network decoders for higher dimensional quantum codes. Quantum, 2:68–92, 2018.
[57] Ryan Sweke, Markus S Kesselring, Evert PL van Nieuwenburg, and Jens Eisert. Reinforcement learning decoders for fault-tolerant quantum computation. arXiv preprint arXiv:1810.07207, 2018.
[58] Paul Baireuther, MD Caio, B Criger, Carlo WJ Beenakker, and Thomas E O'Brien. Neural network decoder for topological color codes with circuit level noise. New Journal of Physics, 21(1):013003, 2019.
[59] Xiaotong Ni. Neural network decoders for large-distance 2d toric codes. arXiv preprint arXiv:1809.06640, 2018.
[60] Philip Andreasson, Joel Johansson, Simon Liljestrand, and Mats Granath. Quantum error correction for the toric code using deep reinforcement learning. Quantum, 3:183, 2019.
[61] Amarsanaa Davaasuren, Yasunari Suzuki, Keisuke Fujii, and Masato Koashi. General framework for constructing fast and near-optimal machine-learning-based decoder of the topological stabilizer codes. arXiv preprint arXiv:1801.04377, 2018.
[62] Ye-Hua Liu and David Poulin. Neural belief-propagation decoders for quantum error-correcting codes. Physical Review Letters, 122(20):200501, 2019.
[63] Savvas Varsamopoulos, Koen Bertels, and Carmen G Almudever. Designing neural network based decoders for surface codes. arXiv preprint arXiv:1811.12456, 2018.
[64] Savvas Varsamopoulos, Koen Bertels, and Carmen G Almudever. Decoding surface code with a distributed neural network based decoder. arXiv preprint arXiv:1901.10847, 2019.
[65] Thomas Wagner, Hermann Kampermann, and Dagmar Bruß. Symmetries for a high level neural decoder on the toric code. arXiv preprint arXiv:1910.01662, 2019.
[66] Chaitanya Chinni, Abhishek Kulkarni, and Dheeraj M Pai. Neural decoder for topological codes using pseudo-inverse of parity check matrix. arXiv preprint arXiv:1901.07535, 2019.
[67] Milap Sheth, Sara Zafar Jafarzadeh, and Vlad Gheorghiu. Neural ensemble decoding for topological quantum error-correcting codes. arXiv preprint arXiv:1905.02345, 2019.
[68] Laia Domingo Colomer, Michalis Skotiniotis, and Ramon Muñoz-Tapia. Reinforcement learning for optimal error correction of toric codes. arXiv preprint arXiv:1911.02308, 2019.
[69] Austin Fowler. Towards sufficiently fast quantum error correction. Conference QEC 2017, 2017.
[70] John Preskill. Lecture notes for Physics 229: Quantum information and computation. California Institute of Technology, 16, 1998.
[71] Michael A Nielsen and Isaac Chuang. Quantum computation and quantum information, 2002.
[72] Austin G Fowler. Optimal complexity correction of correlated errors in the surface code. arXiv preprint arXiv:1310.0863, 2013.
[73] Nicolas Delfosse and Jean-Pierre Tillich. A decoding algorithm for CSS codes using the X/Z correlations. In 2014 IEEE International Symposium on Information Theory, pages 1071–1075. IEEE, 2014.
[74] David K Tuckett, Stephen D Bartlett, and Steven T Flammia. Ultrahigh error threshold for surface codes with biased noise. Physical Review Letters, 120(5):050505, 2018.
[75] Eric Dennis, Alexei Kitaev, Andrew Landahl, and John Preskill. Topological quantum memory. Journal of Mathematical Physics, 43(9):4452–4505, 2002.
[76] Robert Raussendorf and Jim Harrington. Fault-tolerant quantum computation with high threshold in two dimensions. Physical Review Letters, 98(19):190504, 2007.
[77] Robert Raussendorf, Jim Harrington, and Kovid Goyal. Topological fault-tolerance in cluster state quantum computation. New Journal of Physics, 9(6):199, 2007.
[78] Austin G Fowler, Ashley M Stephens, and Peter Groszkowski. High-threshold universal quantum computation on the surface code. Physical Review A, 80(5):052312, 2009.
[79] Nicolas Delfosse and Naomi H Nickerson. Almost-linear time decoding algorithm for topological codes. arXiv preprint arXiv:1709.06218, 2017.
[80] Min-Hsiu Hsieh and François Le Gall. NP-hardness of decoding quantum error-correction codes. Physical Review A, 83(5):052331, 2011.
[81] Yu Tomita and Krysta M Svore. Low-distance surface codes under realistic quantum noise. Physical Review A, 90(6):062320, 2014.
[82] Vladimir Kolmogorov. Blossom V: a new implementation of a minimum cost perfect matching algorithm. Mathematical Programming Computation, 1(1):43–67, 2009.
[83] Norman P Jouppi, Cliff Young, Nishant Patil, David Patterson, Gaurav Agrawal, Raminder Bajwa, Sarah Bates, Suresh Bhatia, Nan Boden, Al Borchers, et al. In-datacenter performance analysis of a tensor processing unit. In 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA), pages 1–12. IEEE, 2017.
[84] Nicolas Delfosse and Gilles Zémor. Linear-time maximum likelihood decoding of surface codes over the quantum erasure channel. arXiv preprint arXiv:1703.01517, 2017.
[85] Robert Endre Tarjan. Efficiency of a good but not linear set union algorithm. Journal of the ACM (JACM), 22(2):215–225, 1975.
[86] Naveen Muralimanohar, Rajeev Balasubramonian, and Norman P Jouppi. CACTI 6.0: A tool to model large caches. HP Laboratories, 27:28, 2009.
[87] Gyu-hyeon Lee, Dongmoon Min, Ilkwon Byun, and Jangwoo Kim. Cryogenic computer architecture modeling with memory-side case studies. In Proceedings of the 46th International Symposium on Computer Architecture, pages 774–787. ACM, 2019.
[88] Craig Gidney and Martin Ekerå. How to factor 2048 bit RSA integers in 8 hours using 20 million noisy qubits. arXiv preprint arXiv:1905.09749, 2019.
[89] Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi Xin, Onur Mutlu, Phillip B Gibbons, Michael A Kozuch, and Todd C Mowry. Linearly compressed pages: a low-complexity, low-latency main memory compression framework. In Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, pages 172–184. ACM, 2013.
[90] Luis Villa, Michael Zhang, and Krste Asanovic. Dynamic zero compression for cache energy reduction. In Proceedings 33rd Annual IEEE/ACM International Symposium on Microarchitecture. MICRO-33 2000, pages 214–220. IEEE, 2000.
[91] Jishen Zhao, Sheng Li, Jichuan Chang, John L Byrne, Laura L Ramirez, Kevin Lim, Yuan Xie, and Paolo Faraboschi. Buri: Scaling big-memory computing with hardware-based memory expansion. ACM Transactions on Architecture and Code Optimization (TACO), 12(3):31, 2015.
[92] Jungrae Kim, Michael Sullivan, Esha Choukse, and Mattan Erez. Bit-plane compression: Transforming data for better compression in many-core architectures. In 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA), pages 329–340. IEEE, 2016.
[93] Alaa R Alameldeen and David A Wood. Adaptive cache compression for high-performance processors. ACM SIGARCH Computer Architecture News, 32(2):212, 2004.
[94] Gennady Pekhimenko, Vivek Seshadri, Onur Mutlu, Phillip B Gibbons, Michael A Kozuch, and Todd C Mowry. Base-delta-immediate compression: practical data compression for on-chip caches. In Proceedings of the 21st International Conference on Parallel Architectures and Compilation Techniques, pages 377–388. ACM, 2012.
[95] Keisuke Fujii and Yuuki Tokunaga. Error and loss tolerances of surface codes with general lattice structures. Physical Review A, 86(2):020303, 2012.
[96] Nicolas Delfosse, Pavithran Iyer, and David Poulin. Generalized surface codes and packing of logical qubits. arXiv preprint arXiv:1606.07116, 2016.
[97] Hector Bombin and Miguel Angel Martin-Delgado. Topological quantum distillation. Physical Review Letters, 97(18):180501, 2006.
[98] Gilles Zémor. On Cayley graphs, surface codes, and the limits of homological coding for quantum error correction. In International Conference on Coding and Cryptology, pages 259–273. Springer, 2009.
[99] Nicolas Delfosse. Tradeoffs for reliable quantum information storage in surface codes and color codes. In 2013 IEEE International Symposium on Information Theory, pages 917–921. IEEE, 2013.
[100] Nikolas P Breuckmann and Barbara M Terhal. Constructions and noise threshold of hyperbolic surface codes. IEEE Transactions on Information Theory, 62(6):3731–3744, 2016.
[101] Richard Versluis, Stefano Poletto, Nader Khammassi, Brian Tarasinski, Nadia Haider, David J Michalak, Alessandro Bruno, Koen Bertels, and Leonardo DiCarlo. Scalable quantum circuit and control for a superconducting surface code. Physical Review Applied, 8(3):034021, 2017.
[102] Frederic T Chong, Diana Franklin, and Margaret Martonosi. Programming languages and compiler design for realistic quantum hardware. Nature, 549(7671):180–187, 2017.
[103] Margaret Martonosi and Martin Roetteler. Next steps in quantum computing: Computer science's role. arXiv preprint arXiv:1903.10541, 2019.
[104] Dave Wecker and Krysta M Svore. LIQUi|>: A software design architecture and domain-specific language for quantum computing. arXiv preprint arXiv:1402.4467, 2014.
[105] Krysta M Svore, Alan Geller, Matthias Troyer, John Azariah, Christopher Granade, Bettina Heim, Vadym Kliuchnikov, Mariia Mykhailova, Andres Paz, and Martin Roetteler. Q#: Enabling scalable quantum computing and development with a high-level domain-specific language. arXiv preprint arXiv:1803.00652, 2018.
[106] Microsoft. Q#, Accessed: November 19, 2019. https://docs.microsoft.com/en-us/quantum/?view=qsharp-preview.
[107] Andrew W Cross, Lev S Bishop, John A Smolin, and Jay M Gambetta. Open quantum assembly language. arXiv preprint arXiv:1707.03429, 2017.
[108] IBM. Qiskit, Accessed: November 19, 2019. https://qiskit.org/.
[109] Robert S Smith, Michael J Curtis, and William J Zeng. A practical quantum instruction set architecture. arXiv preprint arXiv:1608.03355, 2016.
[110] Rigetti Computing. pyQuil, Accessed: November 19, 2019. http://docs.rigetti.com/en/stable/.
[111] Google. Cirq, Accessed: November 19, 2019. https://github.com/quantumlib/Cirq.
[112] TU Delft. QASM, Accessed: November 19, 2019. https://www.quantum-inspire.com/kbase/qasm/.
[113] N Khammassi, GG Guerreschi, I Ashraf, JW Hogaboam, CG Almudever, and K Bertels. cQASM v1.0: Towards a common quantum assembly language. arXiv preprint arXiv:1805.09607, 2018.
[114] Alexander S Green, Peter LeFanu Lumsdaine, Neil J Ross, Peter Selinger, and Benoît Valiron. Quipper: a scalable quantum programming language. In ACM SIGPLAN Notices, volume 48, pages 333–342. ACM, 2013.
[115] Quipper, Accessed: November 19, 2019. https://www.mathstat.dal.ca/~selinger/quipper/.
[116] Ali J Abhari, Arvin Faruque, Mohammad J Dousti, Lukas Svec, Oana Catu, Amlan Chakrabati, Chen-Fu Chiang, Seth Vanderwilt, John Black, and Fred Chong. Scaffold: Quantum programming language. Technical report, Princeton Univ. NJ Dept. of Computer Science, 2012.
[117] Scaffold, Accessed: November 19, 2019. https://scaffcc.llvm.org.cn/.
[118] Jennifer Paykin, Robert Rand, and Steve Zdancewic. QWIRE: a core language for quantum circuits. In ACM SIGPLAN Notices, volume 52, pages 846–858. ACM, 2017.
[119] QWIRE, Accessed: November 19, 2019. https://github.com/inQWIRE/QWIRE.
[120] Alexander J McCaskey, Dmitry I Lyakh, Eugene F Dumitrescu, Sarah S Powers, and Travis S Humble. XACC: A system-level software infrastructure for heterogeneous quantum-classical computing. arXiv preprint arXiv:1911.02452, 2019.
[121] Qcor, Accessed: November 19, 2019. https://github.com/ORNL-QCI/qcor.
[122] Ville Bergholm, Josh Izaac, Maria Schuld, Christian Gogolin, Carsten Blank, Keri McKiernan, and Nathan Killoran. PennyLane: Automatic differentiation of hybrid quantum-classical computations. arXiv preprint arXiv:1811.04968, 2018.
[123] Xanadu. PennyLane, Accessed: November 19, 2019. https://pennylane.readthedocs.io/en/latest/.
[124] Nathan Killoran, Josh Izaac, Nicolás Quesada, Ville Bergholm, Matthew Amy, and Christian Weedbrook. Strawberry Fields: A software platform for photonic quantum computing. Quantum, 3:129, 2019.
[125] Xanadu. Strawberry Fields, Accessed: November 19, 2019. https://strawberryfields.readthedocs.io/en/latest/.
[126] Axel Dahlberg and Stephanie Wehner. SimulaQron—a simulator for developing quantum internet software. Quantum Science and Technology, 4(1):015001, 2018.
[127] SimulaQron, Accessed: November 19, 2019. http://www.simulaqron.org/.
[128] Damian S Steiger, Thomas Häner, and Matthias Troyer. ProjectQ: an open source software framework for quantum computing. Quantum, 2:49, 2018.
[129] ProjectQ, Accessed: November 19, 2019. https://projectq.ch/.
[130] Andrew Cross. The IBM Q Experience and Qiskit open-source quantum computing software. In APS Meeting Abstracts, 2018.
[131] Prakash Murali, Jonathan M Baker, Ali Javadi Abhari, Frederic T Chong, and Margaret Martonosi. Noise-adaptive compiler mappings for noisy intermediate-scale quantum computers. arXiv preprint arXiv:1901.11054, 2019.
[132] Swamit S Tannu and Moinuddin K Qureshi. A case for variability-aware policies for NISQ-era quantum computers. arXiv preprint arXiv:1805.10224, 2018.
[133] Gushu Li, Yufei Ding, and Yuan Xie. Tackling the qubit mapping problem for NISQ-era quantum devices. arXiv preprint arXiv:1809.02573, 2018.
[134] Alwin Zulehner, Alexandru Paler, and Robert Wille. Efficient mapping of quantum circuits to the IBM QX architectures. In 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE), pages 1135–1138. IEEE, 2018.
[135] Pranav Gokhale, Yongshan Ding, Thomas Propson, Christopher Winkler, Nelson Leung, Yunong Shi, David I Schuster, Henry Hoffmann, and Frederic T Chong. Partial compilation of variational algorithms for noisy intermediate-scale quantum machines. In Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, pages 266–278. ACM, 2019.
[136] Jeff Heckey, Shruti Patil, Ali JavadiAbhari, Adam Holmes, Daniel Kudrow, Kenneth R Brown, Diana Franklin, Frederic T Chong, and Margaret Martonosi. Compiler management of communication and parallelism for quantum computation. In ACM SIGARCH Computer Architecture News, volume 43, pages 445–456. ACM, 2015.
[137] Yongshan Ding, Adam Holmes, Ali Javadi-Abhari, Diana Franklin, Margaret Martonosi, and Frederic Chong. Magic-state functional units: Mapping and scheduling multi-level distillation circuits for fault-tolerant quantum architectures. In 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pages 828–840. IEEE, 2018.
[138] Swamit S Tannu, Zachary A Myers, Prashant J Nair, Douglas M Carmean, and Moinuddin K Qureshi. Taming the instruction bandwidth of quantum computers via hardware-managed error correction. In 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pages 679–691. IEEE, 2017.
[139] Xiang Fu, MA Rol, CC Bultink, J Van Someren, Nader Khammassi, Imran Ashraf, RFL Vermeulen, JC De Sterke, WJ Vlothuizen, RN Schouten, et al. An experimental microarchitecture for a superconducting quantum processor. In Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, pages 813–825. ACM, 2017.
[140] Ali Javadi-Abhari, Pranav Gokhale, Adam Holmes, Diana Franklin, Kenneth R Brown, Margaret Martonosi, and Frederic T Chong. Optimized surface code communication in superconducting quantum computers. In Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, pages 692–705. ACM, 2017.
[141] DJ Reilly. Challenges in scaling-up the control interface of a quantum computer. arXiv preprint arXiv:1912.05114, 2019.
[142] R McDermott and MG Vavilov. Accurate qubit control with single flux quantum pulses. Physical Review Applied, 2(1):014007, 2014.
[143] R McDermott, MG Vavilov, BLT Plourde, FK Wilhelm, PJ Liebermann, OA Mukhanov, and TA Ohki. Quantum–classical interface based on single flux quantum digital logic. Quantum Science and Technology, 3(2):024004, 2018.
[144] Kangbo Li, R McDermott, and Maxim G Vavilov. Hardware-efficient qubit control with single-flux-quantum pulse sequences. Physical Review Applied, 12(1):014044, 2019.
[145] Joseph C Bardin, Evan Jeffrey, Erik Lucero, Trent Huang, Ofer Naaman, Rami Barends, Ted White, Marissa Giustina, Daniel Sank, Pedram Roushan, et al. 29.1 A 28nm bulk-CMOS 4-to-8GHz <2mW cryogenic pulse modulator for scalable quantum computing. In 2019 IEEE International Solid-State Circuits Conference (ISSCC), pages 456–458. IEEE, 2019.
[146] SJ Pauka, K Das, R Kalra, A Moini, Y Yang, M Trainer, A Bousquet, C Cantaloube, N Dick, GC Gardner, et al. A cryogenic interface for controlling many qubits. arXiv preprint arXiv:1912.01299, 2019.
[147] Frank Arute, Kunal Arya, Ryan Babbush, Dave Bacon, Joseph C Bardin, Rami Barends, Rupak Biswas, Sergio Boixo, Fernando GSL Brandao, David A Buell, et al. Quantum supremacy using a programmable superconducting processor. Nature, 574(7779):505–510, 2019.
[148] Jeremy Hsu. CES 2018: Intel's 49-qubit chip shoots for quantum supremacy. IEEE Spectrum Tech Talk, 2018.
[149] The International Business Machines Corporation. IBM raises the bar with a 50-qubit quantum computer. Sighted at Newsroom IBM: https://newsroom.ibm.com/2019-09-18-IBM-Opens-Quantum-Computation-Center-in-New-York-Brings-Worlds-Largest-Fleet-of-Quantum-Computing-Systems-Online-Unveils-New-53-Qubit-Quantum-System-for-Broad-Use, 2019.
[150] IonQ. IonQ harnesses single-atom qubits to build the world's most powerful quantum computer. Sighted at IonQ News: https://ionq.com/news/december-11-2018, 2019.
[151] Alpine Quantum Technologies, Accessed: November 19, 2019. https://www.aqt.eu/.
[152] Jarrod R McClean, Jonathan Romero, Ryan Babbush, and Alán Aspuru-Guzik. The theory of variational hybrid quantum-classical algorithms. New Journal of Physics, 18(2):023023, 2016.
[153] Edward Farhi, Jeffrey Goldstone, and Sam Gutmann. A quantum approximate optimization algorithm. arXiv preprint arXiv:1411.4028, 2014.
[154] Roman Orus, Samuel Mugel, and Enrique Lizaso. Quantum computing for finance: overview and prospects. Reviews in Physics, page 100028, 2019.
[155] Simon J Devitt, William J Munro, and Kae Nemoto. Quantum error correction for beginners. Reports on Progress in Physics, 76(7):076001, 2013.
[156] Barbara M Terhal. Quantum error correction for quantum memories. Reviews of Modern Physics, 87(2):307, 2015.
[157] Earl T Campbell, Barbara M Terhal, and Christophe Vuillot. Roads towards fault-tolerant universal quantum computation. Nature, 549(7671):172, 2017.
[158] Christophe Vuillot. Is error detection helpful on IBM 5Q chips? arXiv preprint arXiv:1705.08957, 2017.
[159] Robin Harper and Steven T Flammia. Fault-tolerant logical gates in the IBM Quantum Experience. Physical Review Letters, 122(8):080504, 2019.