
ISCA 2019 Submission #576 Confidential Draft: DO NOT DISTRIBUTE

Asymptotic Improvements to Quantum Circuits via Qutrits

ABSTRACT

Quantum computation is traditionally expressed in terms of quantum bits, or qubits. In this work, we instead consider three-level qutrits. Past work with qutrits has demonstrated only constant-factor improvements, owing to the log2(3) binary-to-ternary compression factor. We present a novel technique using qutrits to achieve a logarithmic depth (runtime) decomposition of the Generalized Toffoli gate using no ancilla, a significant improvement over linear depth for the best qubit-only equivalent. Our circuit construction also features a 70x improvement in two-qudit gate count over the qubit-only equivalent decomposition. This results in circuit cost reductions for important algorithms like quantum neurons and Grover search. We develop an open-source circuit simulator for qutrits, along with realistic near-term noise models which account for the cost of operating qutrits. Simulation results for these noise models indicate over 90% mean reliability (fidelity) for our circuit construction, versus under 30% for the qubit-only baseline. These results suggest that qutrits offer a promising path towards scaling quantum computation.

1. INTRODUCTION

Recent advances in both hardware and software for quantum computation have demonstrated significant progress towards practical outcomes. In the coming years, we expect quantum computation will have important applications in fields ranging from machine learning and optimization [1] to drug discovery [2]. While early research efforts focused on longer-term systems employing full error correction to execute large instances of algorithms like Shor factoring [3] and Grover search [4], recent work has focused on NISQ (Noisy Intermediate Scale Quantum) computation [5]. The NISQ regime considers near-term machines with just tens to hundreds of quantum bits (qubits) and moderate errors.

Given the severe constraints on quantum resources, it is critical to fully optimize the compilation of a quantum algorithm in order to have successful computation. Prior architectural research has explored techniques such as mapping, scheduling, and parallelism [6, 7, 8] to extend the amount of useful computation possible. In this work, we consider another technique: quantum trits (qutrits).

While quantum computation is typically expressed as a two-level binary abstraction of qubits, the underlying physics of quantum systems is not intrinsically binary. Whereas classical computers operate in binary states at the physical level (e.g. clipping above and below 2.5V), quantum computers have natural access to an infinite spectrum of discrete energy levels. In fact, hardware must actively suppress higher level states in order to achieve the two-level approximation. Hence, using three-level qutrits is simply a choice of including an additional discrete energy level, albeit at the cost of more opportunities for error.

Prior work on qutrits (or more generally, d-level qudits) identified only constant-factor gains from extending beyond qubits. In general, this prior work [9] has emphasized the information compression advantages of qutrits. For example, N qubits can be expressed as N/log2(3) qutrits, which leads to constant-factor improvements of log2(3) ≈ 1.6 in runtimes.

Our approach utilizes qutrits in a novel fashion, essentially using the third state as temporary storage, but at the cost of higher per-operation error rates. Under this treatment, the runtime (i.e. circuit depth or critical path) is asymptotically faster, and the reliability of computations is also improved. Moreover, our approach only applies qutrit operations in an intermediary stage: the input and output are still qubits, which is important for initialization and measurement on real devices [10, 11].

[Figure 1 plot: x-axis "Number of Qubits on Machine", y-axis "Number of Data Qubits"; regions labeled "Infeasible, not enough qubits", "Frontier, no space for ancillas", "Typical", and "Feasible, can use ancillas".]

Figure 1: The frontier of what quantum hardware can execute is the yellow region adjacent to the 45° line. In this region, each machine qubit is a data qubit. Typical circuits rely on non-data ancilla qubits for workspace and therefore operate below the frontier.

The net result of our work is to extend the frontier of what quantum computers can compute. In particular, the frontier is defined by the zone in which every machine qubit is a

data qubit, for example a 100-qubit algorithm running on a 100-qubit machine. This is indicated by the yellow region in Figure 1. In this frontier zone, we do not have room for non-data workspace qubits known as ancilla. The lack of ancilla in the frontier zone is a costly constraint that generally leads to inefficient circuits. For this reason, typical circuits instead operate below the frontier zone, with many machine qubits used as ancillas. Our work demonstrates that ancillas can be substituted with qutrits, enabling us to operate efficiently within the ancilla-free frontier zone.

We highlight the three primary contributions of our work:

1. A circuit construction based on qutrits that leads to asymptotically faster circuits (633N → 38 log2 N) than equivalent qubit-only constructions. We also reduce total gate counts from 397N to 6N.

2. An open-source simulator, based on Google's Cirq [12], which supports realistic noise simulation for qutrit (and qudit) circuits.

3. Simulation results, under realistic noise models, which demonstrate our circuit construction outperforms equivalent qubit circuits in terms of error. For our benchmarked circuits, our reliability advantage ranges from 2x for trapped ion noise models up to more than 10,000x for superconducting noise models. For completeness, we also benchmark our circuit against a qubit-only construction augmented by an ancilla and find our construction is still more reliable.

The rest of this paper is organized as follows: Section 2 presents relevant background about quantum computation and Section 3 outlines related prior work that we benchmark our work against. Section 4 demonstrates our key circuit construction, and Section 5 surveys applications of this construction toward important quantum algorithms. Section 6 introduces our open-source qudit circuit simulator. Section 7 explains our noise modeling methodology (with full details in Appendix A), and Section 8 presents simulation results for these noise models. Finally, we discuss our results at a higher level in Section 9.

2. BACKGROUND

A qubit is the fundamental unit of quantum computation. Compared to their classical counterparts, which take values of either 0 or 1, qubits may exist in a superposition of the two states. We designate these two basis states as |0⟩ and |1⟩ and can represent any qubit as |ψ⟩ = α|0⟩ + β|1⟩ with ‖α‖² + ‖β‖² = 1. ‖α‖² and ‖β‖² correspond to the probabilities of measuring |0⟩ and |1⟩ respectively.

Quantum states can be acted on by quantum gates which (a) preserve valid probability distributions that sum to 1 and (b) guarantee reversibility. For example, the X gate transforms a state |ψ⟩ = α|0⟩ + β|1⟩ to X|ψ⟩ = β|0⟩ + α|1⟩. The X gate is also an example of a classical reversible operation, equivalent to the NOT operation. In quantum computation, we have a single irreversible operation called measurement that transforms a quantum state into one of the two basis states with a given probability based on α and β.

In order to interact different qubits, two-qubit operations are used. The CNOT gate appears both in classical reversible computation and in quantum computation. It has a control qubit and a target qubit. When the control qubit is in the |1⟩ state, the CNOT performs a NOT operation on the target. The CNOT gate serves a special role in quantum computation, allowing quantum states to become entangled so that a pair of qubits cannot be described as two individual qubit states. Any operation may be conditioned on one or more controls.

Many classical operations, such as AND and OR gates, are irreversible and therefore cannot directly be executed as quantum gates. For example, consider the output of 1 from an OR gate with two inputs. With only this information about the output, the value of the inputs cannot be uniquely determined. These operations can be made reversible by the addition of extra, temporary workspace bits initialized to 0. Using a single additional ancilla, the AND operation can be computed reversibly as in Figure 2.

Figure 2: Reversible AND circuit using a single ancilla bit. The inputs are on the left, and time flows rightward to the outputs. This AND gate is implemented using a Toffoli (CCNOT) gate with inputs q0, q1 and a single ancilla initialized to 0. At the end of the circuit, q0 and q1 are preserved, and the ancilla bit is set to 1 if and only if both other inputs are 1.

Physical systems in classical hardware are typically binary. However, in common quantum hardware, such as in superconducting and trapped ion computers, there is an infinite spectrum of discrete energy levels. The qubit abstraction is an artificial approximation achieved by suppressing all but the lowest two energy levels. Instead, the hardware may be configured to manipulate the lowest three energy levels by operating on qutrits. In general, such a quantum computer could be configured to operate on any number of d levels, although as d increases, the number of opportunities for error, termed error channels, also increases. Here, we focus on d = 3, with which we achieve the desired improvements to the Generalized Toffoli gate.

In a three-level system, we consider the computational basis states |0⟩, |1⟩, and |2⟩ for qutrits. A qutrit state |ψ⟩ may be represented analogously to a qubit as |ψ⟩ = α|0⟩ + β|1⟩ + γ|2⟩, where ‖α‖² + ‖β‖² + ‖γ‖² = 1. Qutrits are manipulated in a similar manner to qubits; however, there are additional gates which may be performed on qutrits.

For instance, in quantum binary logic, there is only a single X gate. In ternary, there are three X gates denoted X01, X02, and X12. Each of these Xij for i ≠ j can be viewed as swapping |i⟩ with |j⟩ and leaving the third basis element unchanged. For example, for a qutrit |ψ⟩ = α|0⟩ + β|1⟩ + γ|2⟩, applying X02 produces X02|ψ⟩ = γ|0⟩ + β|1⟩ + α|2⟩. Each of these operations' actions can be found in the left state diagram in Figure 3.

There are two additional non-trivial operations on a single trit. They are the +1 and −1 (sometimes referred to as a +2) operations (with + meaning addition modulo 3). These

operations can be written as X01X12 and X12X01, respectively; however, for simplicity, we will refer to them as X+1 and X−1 operations. A summary of these gates' actions can be found in the right state diagram in Figure 3.

[Figure 3 state diagrams over |0⟩, |1⟩, |2⟩: (left) the swaps X01, X02, X12; (right) the cycles X+1 and X−1.]

Figure 3: The five nontrivial permutations on the basis elements for a qutrit. (Left) Each operation here switches two basis elements while leaving the third unchanged. These operations are self-inverses. (Right) These two operations permute the three basis elements by performing a +1 mod 3 and −1 mod 3 operation. They are each other's inverses.

Other, non-classical, operations may be performed on a single qutrit. For example, the Hadamard gate [13] can be extended to work on qutrits in a similar fashion as the X gate was extended. In fact, all single-qubit gates, like rotations, may be extended to operate on qutrits. In order to distinguish qubit and qutrit gates, all qutrit gates will appear with an appropriate subscript.

Just as single-qubit gates have qutrit analogs, the same holds for two-qutrit gates. For example, consider the CNOT operation, where an X gate is performed conditioned on the control being in the |1⟩ state. For qutrits, any of the X gates presented above may be performed, conditioned on the control being in any of the three possible basis states. Just as qubit gates are extended to take multiple controls, qutrit gates are extended similarly.

One question concerning the feasibility of using higher states beyond the standard two is whether these gates can be implemented and perform the desired manipulations. Qutrit gates have been successfully implemented [14, 15, 16], indicating it is possible to consider higher-level systems apart from qubit-only systems.

In order to evaluate a decomposition of a quantum circuit, we consider quantum circuit costs. The space cost of a circuit, i.e. the number of qubits (or qutrits), is referred to as circuit width. Requiring ancillas increases the circuit width and therefore the space cost of a circuit. The time cost for a circuit is the depth of a circuit. The depth is given as the length of the critical path (in terms of gates) from input to output.

3. PRIOR WORK

3.1 Qudits

Qutrits, and more generally qudits, have been studied in past work both experimentally and theoretically. Experimentally, d as large as 10 has been achieved (including with two-qudit operations) [17], and d = 3 qutrits are commonly used internally in many quantum systems [18, 19].

However, in past work, qudits have conferred only an information compression advantage. For example, N qubits can be compressed to N/log2(d) qudits, giving only a constant-factor advantage [9] at the cost of greater errors from operating qudits instead of qubits. Under the assumption of linear cost scaling with respect to d, it has been demonstrated that d = 3 is optimal [20, 21], although as we show in Section 7 the cost is generally superlinear in d.

The information compression advantage of qudits has been applied specifically to Grover's search algorithm [22, 23, 24, 25] and to Shor's factoring algorithm [26]. Ultimately, the tradeoff between information compression and higher per-qudit errors has not been favorable in past work. As such, the past research towards building practical quantum computers has focused on qubits.

Our work introduces qutrit-based circuits which are asymptotically better than equivalent qubit-only circuits. Unlike prior work, we demonstrate a compelling advantage in both runtime and reliability, thus justifying the use of qutrits.

3.2 Generalized Toffoli Gate

We focus on the Generalized Toffoli gate, which simply adds more controls to the Toffoli circuit in Figure 2. The Generalized Toffoli gate is an important primitive used across a wide range of quantum algorithms, and it has been the focus of extensive past optimization work. Table 1 compares past circuit constructions for the Generalized Toffoli gate to our construction, which is presented in full in Section 4.2.

Among prior work, the Gidney [27], He [28], and Barenco [29] designs are all qubit-only. The three circuits have varying tradeoffs. While Gidney and Barenco operate at the ancilla-free frontier, they have large circuit depths: linear with a large constant for Gidney and quadratic for Barenco. The Gidney design also requires rotation gates for very small angles, which poses an experimental challenge. While the He circuit achieves logarithmic depth, it requires an ancilla for each data qubit, effectively halving the potential of any given quantum hardware. Nonetheless, in practice, most circuit implementations use these linear-ancilla constructions due to their small depths and gate counts.

As in our approach, circuit constructions from Lanyon [30], Ralph [31], and Wang [24] have attempted to improve the ancilla-free Generalized Toffoli gate by using qudits. Both the Lanyon [30] and Ralph [31] constructions, which have been demonstrated experimentally, achieve linear circuit depths by operating the target as a d = N-level qudit. Wang [24] also achieves a linear circuit depth but by operating each control as a qutrit.

Our circuit construction, presented in Section 4.2, has similar structure to the He design, which can be represented as a binary tree of gates. However, instead of storing temporary results with a linear number of ancilla qubits, our circuit temporarily stores information directly in the qutrit |2⟩ state of the controls. Thus, no ancillas are needed.

In our simulations, we benchmark our circuit construction against the Gidney construction [27] because it is the asymptotically best qubit circuit in the ancilla-free frontier zone. We label these two benchmarks as QUTRIT and QUBIT. The QUBIT circuit handles the lack of ancillas by using dirty ancillas, which unlike clean (initialized to |0⟩) ancillas, can

             This Work             Gidney [27]  He [28]  Barenco [29]  Wang [24]             Lanyon [30], Ralph [31]
Depth        log N                 N            log N    N²            N                     N
Ancillas     0                     0            N        0             0                     0
Qudit Types  Controls are qutrits  Qubits       Qubits   Qubits        Controls are qutrits  Target is d = N-level qudit
Constants    Small                 Large        Small    Small         Small                 Small

Table 1: Asymptotic comparison of N-controlled gate decompositions. The total gate count for all circuits scales linearly (except for Barenco [29], which scales quadratically). Our construction uses qutrits to achieve logarithmic depth without ancillas. We benchmark our circuit construction against Gidney [27], which is the asymptotically best ancilla-free qubit circuit.

have an unknown initial state. Dirty ancillas can therefore be bootstrapped internally from a quantum circuit. However, this technique requires a large number of Toffoli gates, which makes the decomposition particularly expensive in gate count.

Augmenting the base Gidney construction with a single ancilla¹ does reduce the constants for the decomposition significantly, although the asymptotic depth and gate counts are maintained. For completeness, we also benchmark our circuit against this augmented construction, QUBIT+ANCILLA. However, the augmented circuit does not operate at the ancilla-free frontier, and it conflicts with parallelism, as discussed in Section 9.

¹ This ancilla can also be dirty.

4. CIRCUIT CONSTRUCTION

In order for quantum circuits to be executable on hardware, they are typically decomposed into single- and two-qudit gates. Performing efficient low-depth and low-gate-count decompositions is important in both the NISQ regime and beyond. Our circuits assume all-to-all connectivity; we discuss this assumption in Section 9.

4.1 Key Intuition

To develop intuition for our technique, we first present a Toffoli gate decomposition which lays the foundation for our generalization to multiple controls. In each of the following constructions, all inputs and outputs are qubits, but we may occupy the |2⟩ state temporarily during computation. Maintaining binary input and output allows these circuit constructions to be inserted into any preexisting qubit-only circuits.

In Figure 4, a Toffoli decomposition using qutrits is given. A similar construction for the Toffoli gate is known from past work [30, 31]. The goal is to perform an X operation on the last (target) input qubit q2 if and only if the two control qubits, q0 and q1, are both |1⟩. First a |1⟩-controlled X+1 is performed on q0 and q1. This elevates q1 to |2⟩ iff q0 and q1 were both |1⟩. Then a |2⟩-controlled X gate is applied to q2. Therefore, X is performed only when both q0 and q1 were |1⟩, as desired. The controls are restored to their original states by a |1⟩-controlled X−1 gate, which undoes the effect of the first gate. The key intuition in this decomposition is that the qutrit |2⟩ state can be used instead of ancilla to store temporary information.

Figure 4: A Toffoli decomposition via qutrits. Each input and output is a qubit. The red controls activate on |1⟩ and the blue controls activate on |2⟩. The first gate temporarily elevates q1 to |2⟩ if both q0 and q1 were |1⟩. We then perform the X operation only if q1 is |2⟩. The final gate restores q0 and q1 to their original state.

4.2 Generalized Toffoli Gate

We now present our circuit decomposition for the Generalized Toffoli gate in Figure 5. The decomposition is expressed in terms of three-qutrit gates (two controls, one target) instead of single- and two-qutrit gates, because the circuit can be understood purely classically at this granularity. In actual implementation and in our simulation, we used a decomposition [14] that requires 6 two-qutrit and 7 single-qutrit physically implementable quantum gates.

Our circuit decomposition is most intuitively understood by treating the left half of the circuit as a tree. The desired property is that the root of the tree, q7, is |2⟩ if and only if each of the 15 controls was originally in the |1⟩ state. To verify this property, we observe the root q7 can only become |2⟩ iff q7 was originally |1⟩ and q3 and q11 were both previously |2⟩. At the next level of the tree, we see q3 could have only been |2⟩ if q3 was originally |1⟩ and both q1 and q5 were previously |2⟩, and similarly for the other triplets. At the bottom level of the tree, the triplets are controlled on the |1⟩ state, and are only activated when the even-index controls are all |1⟩. Thus, if any of the controls were not |1⟩, the |2⟩ states would fail to propagate to the root of the tree. The right half of the circuit performs uncomputation to restore the controls to their original state.

After each subsequent level of the tree structure, the number of qubits under consideration is reduced by a factor of ∼2. Thus, the circuit depth is logarithmic in N. Moreover, each qutrit is operated on by a constant number of gates, so the total number of gates is linear in N.

Our circuit decomposition still works in a straightforward fashion when the control type of the top qubit, q0, activates on |2⟩ or |0⟩ instead of activating on |1⟩. These two constructions are necessary for the Incrementer circuit in Section 5.3.

We verified our circuits, both formally and via simulation. Our verification scripts are available on our GitHub [32].

Figure 5: Our circuit decomposition for the Generalized Toffoli gate is shown for 15 controls (|q0⟩ through |q14⟩) and 1 target (|q15⟩). The inputs and outputs are both qubits, but we allow occupation of the |2⟩ qutrit state in between. The circuit has a tree structure and maintains the property that the root of each subtree can only be elevated to |2⟩ if all of its control leaves were |1⟩. Thus, the U gate is only executed if all controls are |1⟩. The right half of the circuit performs uncomputation to restore the controls to their original state. This construction applies more generally to any multiply-controlled U gate.

5. APPLICATION TO ALGORITHMS

The Generalized Toffoli gate is an important primitive in a broad range of quantum algorithms. In this section, we survey some of the applications of our circuit decomposition.

5.1 Artificial Quantum Neuron

The artificial quantum neuron [33] is a promising target application for our circuit construction, because the algorithm's circuit implementation is dominated by large Generalized Toffoli gates. The algorithm may exhibit an exponential advantage over classical perceptron encoding and it has already been executed on current quantum hardware. Moreover, the threshold behavior of perceptrons has inherent noise resilience, which makes the artificial quantum neuron particularly promising as a near-term application on noisy systems. The current implementation of the neuron on IBM quantum computers relies on ancilla qubits [34], which constrains the circuit width to N = 4 data qubits. Our circuit construction offers a path to larger circuit sizes without waiting for larger hardware.

5.2 Grover's Algorithm

Grover's Algorithm for search over M unordered items requires just O(√M) oracle queries. However, each oracle query is followed by a post-processing step which requires a multiply-controlled gate with N = ⌈log2 M⌉ controls [13]. The explicit circuit diagram is shown in Figure 6.

Figure 6: Each iteration of Grover Search has a multiply-controlled Z gate. Our logarithmic depth decomposition reduces a log M factor in Grover's algorithm to log log M.

Our log-depth circuit construction directly applies to the multiply-controlled gate. Thus, we reduce a log M factor in Grover search's time complexity to log log M via our ancilla-free qutrit decomposition.

5.3 Incrementer

The Incrementer circuit performs the +1 mod 2^N operation on a register of N qubits. While logarithmic circuit depth can be achieved with linear ancilla qubits [35], the best ancilla-free incrementers require either linear depth with large linearity constants [36] or quadratic depth [29]. Using alternate control activations for our Generalized Toffoli gate decomposition, the incrementer circuit is reduced to O(log² N) depth with no ancilla, a significant improvement over past work.

Our incrementer circuit construction is shown in Figure 7 for an N = 8 wide register. The multiply-controlled X+1 gates perform the job of computing carries: a carry is performed iff the least significant bit generates (represented by the |2⟩ control) and all subsequent bits propagate (represented by the consecutive |1⟩ controls). We present an N = 8 incrementer here and have verified the general construction, both by formal proof and by explicit circuit simulation for larger N.

Figure 7: Our circuit decomposition for the Incrementer, mapping inputs |a0⟩ through |a7⟩ to outputs |(a + 1)0⟩ through |(a + 1)7⟩. At each subcircuit in the recursive design, multiply-controlled gates are used to efficiently propagate carries over half of the subcircuit. The |2⟩ control checks for carry generation and the chain of |1⟩ controls checks for carry propagation. The circuit depth is log² N, which is only possible because of our log-depth multiply-controlled gate primitive.

The critical path of this circuit is the chain of log N multiply-controlled gates (of width N/2, N/4, N/8, ...) which act on |a0⟩. Since our multiply-controlled gate decomposition has log-depth, we arrive at a total circuit depth scaling of log² N.

5.4 Arithmetic Circuits and Shor's Algorithm

The Incrementer circuit is a key subcircuit in many other arithmetic circuits such as constant addition, modular multiplication, and modular exponentiation. Further, the modular exponentiation circuit is the bottleneck in the runtime for executing Shor's algorithm for factorization [36, 37]. While a shallower Incrementer circuit alone is not sufficient to reduce the asymptotic cost of modular exponentiation (and therefore Shor's algorithm), it does reduce constants relative to qubit-only circuits.

5.5 Error Correction and Fault Tolerance

The Generalized Toffoli gate has applications to circuits for both error correction [38] and fault tolerance [39]. We foresee two paths of applying these circuits. First, our circuit construction can be used to construct error-resilient logical qubits more efficiently. This is critical for quantum algorithms like Grover's and Shor's which are expected to require such logical qubits. In the nearer term, NISQ algorithms are likely to make use of limited error correction. For instance, recent results have demonstrated that error correcting a single qubit at a time for the Variational Quantum Eigensolver algorithm can significantly reduce total error [40]. Thus, our circuit construction is also relevant for NISQ-era error correction.

6. SIMULATOR

To simulate our circuit constructions, we developed a qudit simulation library, built on Google's Cirq Python library [12]. Cirq is a qubit-based quantum circuit library and includes a number of useful abstractions for quantum states, gates, circuits, and scheduling.

Our work extends Cirq by discarding the assumption of two-level qubit states. Instead, all state vectors and gate matrices are expanded to apply to d-level qudits, where d is a circuit parameter. We include a library of common gates for d = 3 qutrits. Our software adds a comprehensive noise simulator, detailed below in Section 6.1.

In order to verify our circuits are logically correct, we first simulated them with noise disabled. We extended Cirq to allow gates to specify their action on classical non-superposition input states without considering full state vectors. Therefore, each classical input state can be verified in space and time proportional to the circuit width. By contrast, Cirq's default simulation procedure relies on a dense state vector representation requiring space and time exponential in the circuit width. Reducing this scaling from exponential to linear dramatically improved our verification procedure, allowing us to verify circuit constructions for all possible classical inputs across circuit sizes up to widths of 14. Our software is fully open source [32].

6.1 Noise Simulation

Figure 8 depicts a schematic view of our noise simulation procedure, which accounts for both gate errors and idle errors, described below. To determine when to apply each gate and idle error, we use Cirq's scheduler, which schedules each gate as early as possible, creating a sequence of Moments of simultaneous gates. During each Moment, our noise simulator applies a gate error to every qudit acted on. Finally, the simulator applies an idle error to every qudit. This noise simulation methodology is consistent with previous simulation techniques which have accounted for either gate errors [41] or idle errors [42].

Figure 8: This Moment comprises three gates (U1, U2, U3) executed in parallel. To simulate with noise, we first apply the ideal gates, followed by a gate error noise channel on each affected qudit. This gate error noise channel depends on whether the corresponding gate was single- or two-qudit. Finally, we apply an idle error to every qudit. The idle error noise channel depends on the duration of the Moment.

Gate errors arise from the imperfect application of quantum gates. Two-qudit gates are noisier than single-qudit gates [43], so we apply different noise channels for the two. Our specific gate error probabilities are given in Section 7.

Idle errors arise from the continuous decoherence of a quantum system due to energy relaxation and interaction with the environment. The idle errors differ from gate errors in two ways which require special treatment:

1. Idle errors depend on duration, which in turn depends on the schedule of simultaneous gates (Moments). In particular, two-qudit gates take longer to apply than single-qudit gates. Thus, if a Moment contains a two-qudit gate, the idling errors must be scaled appropriately. Our specific scaling factors are given in Section 7.

2. For the generic model of gate errors, the error channel is applied with probability independent of the quantum state. This is not true for idle errors such as T1 amplitude damping, which only applies when the qudit

is in an excited state. This is treated in the simulator by computing idle error probabilities during each Moment, for each qutrit.

Gate errors are reduced by performing fewer total gates, and idle errors are reduced by decreasing the circuit depth. Since our circuit constructions asymptotically decrease the depth, they scale favorably in terms of asymptotically fewer idle errors.

Our full noise simulation procedure is summarized in Algorithm 1. The ultimate metric of interest is the mean fidelity, which is defined as the squared overlap between the ideal (noise-free) and actual output state vectors. Fidelity expresses the probability of overall successful execution. We do not consider initialization errors and readout errors, because our circuit constructions maintain binary input and output, only occupying the qutrit |2⟩ states during intermediate computation. Therefore, the initialization and readout errors for our circuits are identical to those for conventional qubit circuits.

    |Ψ⟩ ← random initial state vector
    |Ψ_ideal⟩ ← circuit applied to |Ψ⟩ without noise
    foreach Moment do
        foreach Gate ∈ Moment do
            |Ψ⟩ ← Gate applied to |Ψ⟩
            GateError ← DrawRand(GateError Prob.)
            |Ψ⟩ ← GateError applied to |Ψ⟩
        end
        foreach Qutrit do
            if Moment has 2-qudit gate then
                IdleErrors ← long-duration idle errors
            else
                IdleErrors ← short-duration idle errors
            end
            Prob. ← [‖M|Ψ⟩‖² for M ∈ IdleErrors]
            IdleError ← DrawRand(Prob.)
            |Ψ⟩ ← IdleError applied to |Ψ⟩
            Renormalize(|Ψ⟩)
        end
    end
    return |⟨Ψ_ideal|Ψ⟩|², the fidelity between ideal and actual output

Algorithm 1: Pseudocode for each simulation trial, given a particular circuit and noise model.

We also do not consider crosstalk errors, which occur when gates are executed in parallel. The effect of crosstalk is very device-dependent and difficult to generalize. Moreover, crosstalk can be mitigated by breaking each Moment into a small number of sub-moments and then scheduling two-qutrit operations to reduce crosstalk, as demonstrated in prior work [44, 45].

6.2 Simulator Efficiency

Simulating a quantum circuit with a classical computer is, in general, exponentially difficult in the size of the input because the state of N qudits is represented by a state vector of d^N complex numbers. For 14 qutrits, with complex numbers stored as two 8-byte floats (complex128 in NumPy), a state vector occupies 77 megabytes.

A naive circuit simulation implementation would treat every quantum gate or Moment as a d^N × d^N matrix. For 14 qutrits, a single such matrix would occupy 366 terabytes, out of range of simulability. While the exponential nature of simulating our circuits is unavoidable, we mitigate the cost by using a variety of techniques which rely only on state vectors, rather than full square matrices. For example, we maintain Cirq's approach of applying gates by Einstein Summation [46], which obviates computation of the d^N × d^N matrix corresponding to every gate or Moment.

Our noise simulator only relies on state vectors, by adopting the quantum trajectory methodology [47, 48], which is also used by the Rigetti PyQuil noise simulator [49]. At a high level, the effect of noise channels like gate and idle errors is to turn a coherent quantum state into an incoherent mix of classical probability-weighted quantum states (for example, |0⟩ and |1⟩ with 50% probability each). The most complete description of such an incoherent quantum state is called the density matrix and has dimension d^N × d^N. The quantum trajectory methodology is a stochastic approach: instead of maintaining a density matrix, only a single state is propagated and the error term is drawn randomly at each timestep. Over repeated trials, the quantum trajectory methodology converges to the same results as full density matrix simulation [49]. Our simulator employs this technique: each simulation in Algorithm 1 constitutes a single quantum trajectory trial. At every step, a specific GateError or IdleError term is picked, based on a weighted random draw.

Finally, our random state vector generation function was also implemented in O(d^N) space and time. This is an improvement over other open source libraries [50, 51], which perform random state vector generation by generating full d^N × d^N unitary matrices from a Haar-random distribution and then truncating to a single column. Our simulator directly computes the first column and circumvents the full matrix computation.

With optimizations, our simulator is able to simulate circuits up to 14 qutrits in width. This is in the same range as other state-of-the-art noisy quantum circuit simulations [52] (since 14 qutrits ≈ 22 qubits). While each simulation trial took several minutes (depending on the particular circuit and noise model), we were able to run trials in parallel over multiple processes and multiple machines, as described in Section 8.

7. NOISE MODELS

In this section, we describe our noise models at a high level, with mathematical details described in Appendix A. We chose noise models which represent realistic near-term machines. We first present a generic, parametrized noise model roughly applicable to all quantum systems. We then present specific parameters, under the generic noise model, which apply to near-term superconducting quantum computers. Finally, we present a specific noise model for trapped ion quantum computers.

7.1 Generic Noise Model

7.1.1 Gate Errors

The scaling of gate errors for a d-level qudit can be roughly summarized as increasing as d⁴ for two-qudit gates and d² for

single-qudit gates. For $d = 2$, there are 4 single-qubit gate error channels and 16 two-qubit gate error channels. For $d = 3$ there are 9 and 81 single- and two-qutrit gate error channels respectively. Consistent with other simulators [49, 42], we use the symmetric depolarizing gate error model, which assumes equal probabilities between each error channel. Under these noise models, the no-error probability of a two-qutrit gate is $(1 - 80p_2)/(1 - 15p_2)$ times that of a two-qubit gate, where $p_2$ is the probability of each two-qubit gate error channel. Similarly, the no-error probability of a single-qutrit gate is $(1 - 8p_1)/(1 - 3p_1)$ times that of a single-qubit gate, where $p_1$ is the probability of each single-qubit gate error channel.

7.1.2 Idle Errors

Our treatment of idle errors focuses on the relaxation from higher to lower energy states in quantum devices. This is called amplitude damping or $T_1$ relaxation. This noise channel irreversibly takes qudits to lower states. For qubits, the only amplitude damping channel is from $|1\rangle$ to $|0\rangle$, and we denote this damping probability as $\lambda_1$. For qutrits, we also model damping from $|2\rangle$ to $|0\rangle$, which occurs with probability $\lambda_2$.

7.2 Superconducting QC

We chose four noise models based on superconducting quantum computers expected in the next few years. These noise models comply with the generic noise model above and are thus parametrized by $p_1$, $p_2$, $\lambda_1$ and $\lambda_2$. The $\lambda_i$ probabilities are derived from two other experimental parameters: the gate time $\Delta t$ and $T_1$, a timescale that captures how long a qudit persists coherently.

As a starting point for representative near-term noise models, we consider parameters for current superconducting quantum computers. For IBM's public cloud-accessible superconducting quantum computers, we have $3p_1 \approx 10^{-3}$ and $15p_2 \approx 10^{-2}$. The duration of single- and two-qubit gates is $\Delta t \approx 100$ ns and $\Delta t \approx 300$ ns respectively, and the IBM devices have $T_1 \approx 100$ µs [43, 53].

However, simulation for these current parameters indicates an error is almost certain to occur during execution of a modest-size 14-input Generalized Toffoli circuit. This motivates us to instead consider noise models for better devices which are a few years away. Accordingly, we adopt a baseline superconducting noise model, labeled as SC, corresponding to a superconducting device which has 10x lower gate errors and 10x longer $T_1$ duration than the current IBM hardware. This range of parameters has already been achieved experimentally in superconducting devices for gate errors [54, 55] and for $T_1$ duration [56, 57] independently. Faster gates (shorter $\Delta t$) are yet another path towards greater noise resilience. We do not vary gate speeds, because idle errors only depend on the $\Delta t/T_1$ ratio, and we already vary $T_1$. In practice however, faster gates could also improve noise resilience.

We also consider three additional near-term device noise models, indexed to the SC noise model. These three models further improve gate errors, $T_1$, or both, by a 10x factor. The specific parameters are given in Table 2. Our 10x improvement projections are realistic extrapolations of progress in hardware. In particular, Schoelkopf's Law, the quantum analogue of Moore's Law, has observed that $T_1$ durations have increased by 10x every 3 years for the past 20 years [58]. Hence, 100x longer $T_1$ is a reasonable projection for devices that are ∼6 years away.

    Noise Model     3p_1      15p_2     T_1
    SC              10^-4     10^-3     1 ms
    SC+T1           10^-4     10^-3     10 ms
    SC+GATES        10^-5     10^-4     1 ms
    SC+T1+GATES     10^-5     10^-4     10 ms

Table 2: Noise models simulated for superconducting devices. Current publicly accessible IBM superconducting quantum computers have single- and two-qubit gate errors of $3p_1 \approx 10^{-3}$ and $15p_2 \approx 10^{-2}$, as well as $T_1$ lifetimes of 0.1 ms [43, 53]. Our baseline benchmark, SC, assumes 10x better gate errors and $T_1$. The other three benchmarks add a further 10x improvement to $T_1$, gate errors, or both.

7.3 Trapped Ion 171Yb+ QC

We also simulated noise models for trapped ion quantum computing devices. Trapped ion devices are well matched to our qutrit-based circuit constructions because they feature all-to-all connectivity [59], and many ions that are ideal candidates for QC devices are naturally multi-level systems.

We focus on the 171Yb+ ion, which has been experimentally demonstrated as both a qubit and qutrit [10, 11]. Trapped ions are often favored in QC schemes due to their long $T_1$ times. One of the main advantages of using a trapped ion is the ability to take advantage of magnetically insensitive states known as "clock states." By defining the computational subspace on these clock states, idle errors caused by fluctuations in the magnetic field are minimized. However, compared to superconducting devices, gates are much slower. Thus, gate errors are the dominant error source for ion trap devices. We modelled a fundamental source of these errors: the spontaneous scattering of photons originating from the lasers used to drive the gates. The duration of single- and two-qubit gates used in this calculation was $\Delta t \approx 1$ µs and $\Delta t \approx 200$ µs respectively [60]. The single- and two-qudit gate error probabilities are given in Table 3.

    Noise Model        p_1             p_2
    TI_QUBIT           6.4 × 10^-4     1.3 × 10^-4
    BARE_QUTRIT        2.2 × 10^-4     4.3 × 10^-4
    DRESSED_QUTRIT     1.5 × 10^-4     3.1 × 10^-4

Table 3: Noise models simulated for trapped ion devices. The single- and two-qutrit gate error channel probabilities are based on calculations from experimental parameters. For all three models, we use single- and two-qudit gate times of $\Delta t \approx 1$ µs and $\Delta t \approx 200$ µs respectively.

8. RESULTS

Figure 9 plots the exact circuit depths for all three benchmarked circuits. The qubit-based circuit constructions from past work are linear in depth and have a high linearity constant. Augmenting with a single borrowed ancilla reduces the

circuit depth by a factor of 8. However, both circuit constructions are surpassed significantly by our qutrit construction, which scales logarithmically in $N$ and has a relatively small leading coefficient.

[Figure 9 plot: "Circuit Depth" versus "Number of Qudits" (25 to 200), logarithmic vertical axis. Fitted curves: QUBIT $\sim 633N$, QUBIT+ANCILLA $\sim 76N$, QUTRIT $\sim 38\log_2(N)$.]

Figure 9: Exact circuit depths for all three benchmarked circuit constructions up to N = 200. Both QUBIT and QUBIT+ANCILLA scale linearly in depth and both are bested by QUTRIT's logarithmic depth.

Figure 10 plots the total number of two-qudit gates for all three circuit constructions. As noted in Section 4, our circuit construction is not asymptotically better in total gate count: all three plots have linear scaling. However, as emphasized by the logarithmic vertical axis, the linearity constant for our qutrit circuit is 70x smaller than for the equivalent ancilla-free qubit circuit and 8x smaller than for the borrowed-ancilla qubit circuit.

[Figure 10 plot: "Two-Qudit Gate Count" versus "Number of Qudits" (25 to 200), logarithmic vertical axis. Fitted curves: QUBIT $\sim 397N$, QUBIT+ANCILLA $\sim 48N$, QUTRIT $\sim 6N$.]

Figure 10: Exact two-qudit gate counts for the three benchmarked circuit constructions. All three plots scale linearly; however the QUTRIT construction has a substantially lower linearity constant.

Our simulations under realistic noise models were run in parallel on over 100 n1-standard-4 Google Cloud instances. These simulations represent over 20,000 CPU hours, which was sufficient to estimate mean fidelity to an error of $2\sigma < 0.1\%$ for each circuit-noise model pair.

The full results of our circuit simulations are shown in Figure 11. All simulations are for the 14-input (13 controls, 1 target) Generalized Toffoli gate. We simulated each of the three circuit benchmarks against each of our noise models (when applicable), yielding the 16 bars in the figure.

[Figure 11 plot, two panels: "Fidelity for Superconducting Models" (noise models SC, SC+T1, SC+GATES, SC+T1+GATES) and "Fidelity for Trapped Ion Models" (noise models TI_QUBIT, BARE_QUTRIT, DRESSED_QUTRIT). Bars: QUBIT, QUBIT+ANCILLA, QUTRIT. Vertical axis: fidelity, 0% to 100%.]

Figure 11: Circuit simulation results for all possible pairs of circuit constructions and noise models. Each bar represents 1000+ trials, so the error bars are all $2\sigma < 0.1\%$. Our QUTRIT construction significantly outperforms the QUBIT construction. The QUBIT+ANCILLA bars are drawn with dashed lines to emphasize that it has access to an extra ancilla bit, unlike our construction.

9. DISCUSSION

Figure 11 demonstrates that our QUTRIT construction (orange bars) significantly outperforms the ancilla-free QUBIT benchmark (blue bars) in fidelity (success probability) by more than 10,000x.

For the SC, SC+T1, and SC+GATES noise models, our qutrit constructions achieve between 57-83% mean fidelity, whereas the ancilla-free qubit constructions all have almost 0% fidelity. Only the lowest-error model, SC+T1+GATES, achieves a modest fidelity of 26% for the QUBIT circuit, but in this regime, the qutrit circuit is close to 100% fidelity.

The trapped ion noise models achieve similar results: the DRESSED_QUTRIT and the BARE_QUTRIT achieve approximately 95% fidelity via the QUTRIT circuit, whereas the TI_QUBIT noise model has only 45% fidelity. Between the dressed and bare qutrits, the dressed qutrit exhibits higher fidelity than the bare qutrit, as expected. Moreover, as discussed in Appendix A.3, the dressed qutrit is resilient to leakage errors, so the simulation results should be viewed as a lower bound on its advantage over the qubit and bare qutrit.

Based on these results, trapped ion qutrits are a particularly strong match to our qutrit circuits. In addition to attaining the highest fidelities, trapped ions generally have all-to-all connectivity [59] within each ion chain, which is critical as our circuit construction requires operations between distant qutrits.

The superconducting noise models also achieve good fidelities. They exhibit a particularly large advantage over ancilla-free qubit constructions because idle errors are significant for superconducting systems, and our qutrit construction significantly reduces idling (circuit depth). However, most superconducting quantum systems only feature nearest-neighbor or short-range connectivity. Accounting for data movement on a nearest-neighbor-connectivity 2D architecture would expand the qutrit circuit depth from $\log N$ to $\sqrt{N}$ (since the distance between any two qutrits would scale as $\sqrt{N}$). However, recent work has experimentally demonstrated fully-connected superconducting quantum systems via random access memory [61]. Such systems would also be well matched to our circuit construction.

For completeness, Figure 11 also shows fidelities for the QUBIT+ANCILLA circuit benchmark, which augments the ancilla-free QUBIT circuit with a single dirty ancilla. Since QUBIT+ANCILLA has linearity constants ∼10x better than the ancilla-free qubit circuit, it exhibits significantly better fidelities. While our QUTRIT circuit still outperforms the QUBIT+ANCILLA circuit, we expect a crossing point where augmenting a qubit-only Generalized Toffoli with enough ancillas would eventually outperform QUTRIT. However, we emphasize that the gap between an ancilla-free and constant-ancilla construction for the Generalized Toffoli is actually a fundamental rather than an incremental gap, because:

• Constant-ancilla constructions prevent circuit parallelization. For example, consider the parallel execution of $N/k$ disjoint Generalized Toffoli gates, each of width $k$ for some constant $k$. An ancilla-free Generalized Toffoli would pose no issues, but an ancilla-augmented Generalized Toffoli would require $\Theta(N/k)$ ancillas. Thus, constant-ancilla constructions can impose a choice between serializing to linear depth or regressing to linear ancillas. The Incrementer circuit in Figure 7 is a concrete example of this scenario: any multiply-controlled gate decomposition requiring a single clean ancilla or more than 1 dirty ancilla would break the parallelism and degrade runtime.

• Even if we only consider serial circuits, given the exponential advantage of certain quantum algorithms, there is a significant practical difference between operating at the ancilla-free frontier and operating just a few data qubits below the frontier.

While we only performed simulations up to 14 inputs in width, we would see an even bigger advantage in larger circuits because our construction has asymptotically lower depth and therefore asymptotically lower idle errors.

Our circuit construction and simulation results point towards promising directions of future work that we highlight below:

• A number of useful quantum circuits, especially arithmetic circuits, make extensive use of multiply-controlled gates. However, these circuits are typically pre-compiled into single- and two-qubit gates using one of the decompositions from prior work, usually one that involves ancilla qubits. Revisiting these arithmetic circuits from first principles, with our qutrit circuit as a new tool, could yield novel and improved circuits like our Incrementer circuit in Section 5.3.

• Relatedly, we see value in a logic synthesis tool that injects qutrit optimizations into qubit circuits, in an automated fashion inspired by classical reversible logic synthesis tools [62, 63].

• While $d = 3$ qutrits were sufficient to achieve the desired asymptotic speedups for our circuits of interest, there may be other circuits that are optimized by qudit information carriers for larger $d$. In particular, we note that increasing $d$ and thereby increasing information compression may be advantageous for hardware with limited connectivity.

Independent of these future directions, the results presented in this work are applicable to quantum computing in the near term, on machines that are expected within the next five years. The net result of this work is to extend the frontier of what is computable by quantum hardware, and hence to accelerate the timeline for practical quantum computing, rather than waiting for better hardware. Emphatically, our results are driven by the use of qutrits for asymptotically faster ancilla-free circuits. Moreover, we also improve linearity constants by two orders of magnitude. Finally, as verified by our open-source circuit simulator coupled with realistic noise models, our circuits are more reliable than qubit-only equivalents. Our results justify the use of qutrits as a path towards scaling quantum computers.

APPENDIX

A. DETAILED NOISE MODEL

We chose noise models that represent realistic near-term machines. We first present a generic, parametrized noise model that is roughly applicable to all quantum systems. Next, we present specific parameters, under the generic noise model, that apply to near-term superconducting quantum computers. Finally, we present a specific noise model for 171Yb+ trapped ions.

A.1 Generic Noise Model

The general form of a quantum noise model is expressed by the Kraus operator formalism, which specifies a set of matrices, $\{K_i\}$, each capturing an error channel. Under this formalism, the evolution of a system with initial state $\sigma =$

$|\Psi\rangle\langle\Psi|$ is expressed as a function $\mathcal{E}(\sigma)$, where:

$$\mathcal{E}(\sigma) = \mathcal{E}(|\Psi\rangle\langle\Psi|) = \sum_i K_i \sigma K_i^\dagger \tag{1}$$

where $\dagger$ denotes the matrix conjugate-transpose.

A.1.1 Gate Errors

For a single qubit, there are four possible error channels: no-error, bit flip, phase flip, and phase+bit flip. These channels can be expressed as products of the Pauli matrices:

$$X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \quad \text{and} \quad Z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$$

which correspond to bit and phase flips respectively. The no-error channel is $X^0 Z^0 = I$ and the phase+bit flip channel is the product $X^1 Z^1$.

In the Kraus operator formalism, we express this single-qubit gate error model as

$$\mathcal{E}(\sigma) = \sum_{j=0}^{1} \sum_{k=0}^{1} p_{jk} (X^j Z^k) \sigma (X^j Z^k)^\dagger \tag{2}$$

where $p_{jk}$ denotes the probability of the corresponding Kraus operator.

This gate error model is called the Pauli or depolarizing channel. We assume all error terms have equal probabilities, i.e. $p_{jk} = p_1$ for $(j, k) \neq (0, 0)$. This assumption of symmetric depolarizing is standard and is used by most noise simulators [42]. Under this model, the error channel simplifies to:

$$\mathcal{E}(\sigma) = (1 - 3p_1)\sigma + \sum_{jk \in \{0,1\}^2 \setminus 00} p_1 (X^j Z^k) \sigma (X^j Z^k)^\dagger \tag{3}$$

For two-qubit gate errors, the Kraus operators are the Cartesian product of the two single-qubit gate error Kraus operators, leading to the noise channel:

$$\mathcal{E}(\sigma) = (1 - 15p_2)\sigma + \sum_{jklm \in \{0,1\}^4 \setminus 0000} p_2 K_{jklm} \sigma K_{jklm}^\dagger \tag{4}$$

where $p_2$ is the probability of each error term and $K_{jklm} = X^j Z^k \otimes X^l Z^m$.

Next, for qutrits, we have a similar form, except that there are now more possible error channels. We now use the generalized Pauli matrices:

$$X_{+1} = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} \quad \text{and} \quad Z_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & e^{2\pi i/3} & 0 \\ 0 & 0 & e^{4\pi i/3} \end{pmatrix}$$

The Cartesian product of $\{I, X_{+1}, X_{+1}^2\}$ and $\{I, Z_3, Z_3^2\}$ constitutes a basis for all 3x3 matrices. Hence, this Cartesian product also constitutes the Kraus operators for the single-qutrit gate error [64, 65, 41]:

$$\mathcal{E}(\sigma) = (1 - 8p_1)\sigma + \sum_{jk \in \{0,1,2\}^2 \setminus 00} p_1 (X_{+1}^j Z_3^k) \sigma (X_{+1}^j Z_3^k)^\dagger \tag{5}$$

and similarly, the two-qutrit gate error channel is:

$$\mathcal{E}(\sigma) = (1 - 80p_2)\sigma + \sum_{jklm \in \{0,1,2\}^4 \setminus 0000} p_2 K_{jklm} \sigma K_{jklm}^\dagger \tag{6}$$

Note that in this model, the dominant effect of using qutrits instead of qubits is that the no-error probability for two-operand gates diminishes from $1 - 15p_2$ to $1 - 80p_2$, as expressed by equations 4 and 6 respectively.

A.1.2 Idle Errors

For qubits, the Kraus operators for amplitude damping are:

$$K_0 = \begin{pmatrix} 1 & 0 \\ 0 & \sqrt{1 - \lambda_1} \end{pmatrix} \quad \text{and} \quad K_1 = \begin{pmatrix} 0 & \sqrt{\lambda_1} \\ 0 & 0 \end{pmatrix} \tag{7}$$

For qutrits, the Kraus operators for amplitude damping can be modeled as [65, 66]:

$$K_0 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \sqrt{1 - \lambda_1} & 0 \\ 0 & 0 & \sqrt{1 - \lambda_2} \end{pmatrix}, \quad K_1 = \begin{pmatrix} 0 & \sqrt{\lambda_1} & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \quad \text{and} \quad K_2 = \begin{pmatrix} 0 & 0 & \sqrt{\lambda_2} \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \tag{8}$$

As discussed in Section 6.1, these noise channels are incoherent (non-unitary), which means that the probability of each error occurring depends on the current state. Specifically, the probability of the $K_i$ channel affecting the state $|\Psi\rangle$ is $\|K_i |\Psi\rangle\|^2$ [13].

A.2 Superconducting QC

We picked four noise models based on superconducting quantum computers that are expected in the next few years. These noise models comply with the generic noise model above and are thus parametrized by $p_1$, $p_2$, $\lambda_1$, and $\lambda_2$. The $\lambda_m$ terms are given by [66]:

$$\lambda_m = 1 - e^{-m \Delta t / T_1} \tag{9}$$

where $\Delta t$ is the duration of the idling and $T_1$ is associated with the lifetime of each qubit.

A.3 Trapped Ion 171Yb+ QC

Based on calculations from experimental parameters for the trapped ion qutrit, we know the specific Kraus operator types for the error terms, which deviate slightly from those in the generic error model. The specific Kraus operator matrices are provided at our GitHub repository [32].

We chose three noise models: TI_QUBIT, BARE_QUTRIT, and DRESSED_QUTRIT. Both TI_QUBIT and DRESSED_QUTRIT take advantage of clock states and thus have very small idle errors. They both would be ideal candidates for a qudit. The BARE_QUTRIT will suffer more from idle errors, as it is not strictly defined on clock states, but will require fewer experimental resources to prepare. Idle errors are very small in magnitude and manifest as coherent phase errors rather than amplitude damping errors as modeled in Section 7.1.2.

We also do not consider leakage errors. These errors could be handled for Yb+ by treating each ion as a $d = 4$ qudit, regardless of whether we use it as a qubit or a qutrit.
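To make the generic idle-error model of A.1.2 and A.2 concrete, the following is a minimal NumPy sketch of one state-dependent idle-error draw for a single qutrit. It is an illustration under stated assumptions, not the authors' simulator: the function names (`damping_probability`, `qutrit_damping_kraus`, `draw_idle_error`) are hypothetical, and the example values $\Delta t = 300$ ns and $T_1 = 1$ ms are taken from the two-qubit gate time and the SC baseline quoted in Section 7.2. It builds the Kraus operators of Eq. (8) from Eq. (9), weights each by $\|K_i|\Psi\rangle\|^2$, draws one, and renormalizes.

```python
import numpy as np


def damping_probability(m, dt, t1):
    """Eq. (9): lambda_m = 1 - exp(-m * dt / T1)."""
    return 1.0 - np.exp(-m * dt / t1)


def qutrit_damping_kraus(dt, t1):
    """Kraus operators K0, K1, K2 of Eq. (8) for one idle period of length dt."""
    lam1 = damping_probability(1, dt, t1)
    lam2 = damping_probability(2, dt, t1)
    k0 = np.diag([1.0, np.sqrt(1.0 - lam1), np.sqrt(1.0 - lam2)]).astype(complex)
    k1 = np.zeros((3, 3), dtype=complex)
    k1[0, 1] = np.sqrt(lam1)  # damping |1> -> |0>
    k2 = np.zeros((3, 3), dtype=complex)
    k2[0, 2] = np.sqrt(lam2)  # damping |2> -> |0>
    return [k0, k1, k2]


def draw_idle_error(state, kraus_ops, rng):
    """Pick one Kraus operator with probability ||K_i |state>||^2 and apply it.

    Returns the renormalized state, the index of the chosen operator,
    and the probabilities that were used for the draw.
    """
    probs = np.array([np.linalg.norm(k @ state) ** 2 for k in kraus_ops])
    probs = probs / probs.sum()  # guard against floating-point drift
    index = rng.choice(len(kraus_ops), p=probs)
    new_state = kraus_ops[index] @ state
    return new_state / np.linalg.norm(new_state), index, probs


# Assumed example values: 300 ns idle period, T1 = 1 ms (SC baseline).
DT = 300e-9
T1 = 1e-3
KRAUS = qutrit_damping_kraus(DT, T1)

rng = np.random.default_rng(seed=0)
excited = np.array([0, 0, 1], dtype=complex)  # qutrit in |2>
new_state, index, probs = draw_idle_error(excited, KRAUS, rng)
print("draw probabilities for |2>:", probs)
```

Because $K_0^\dagger K_0 + K_1^\dagger K_1 + K_2^\dagger K_2 = I$, the three probabilities sum to one for any normalized state. A qutrit in $|0\rangle$ always draws $K_0$ and is left unchanged, which is the state dependence that distinguishes idle errors from the gate errors of A.1.1.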

B. REFERENCES

[1] J. Biamonte, P. Wittek, N. Pancotti, P. Rebentrost, N. Wiebe, and S. Lloyd, “,” Nature, vol. 549, p. 195, Sep 2016.
[2] I. Kassal, J. D. Whitfield, A. Perdomo-Ortiz, M.-H. Yung, and A. Aspuru-Guzik, “Simulating chemistry using quantum computers,” 2010.
[3] P. W. Shor, “Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer,” SIAM J. Comput., vol. 26, pp. 1484–1509, Oct. 1997.
[4] L. K. Grover, “A fast quantum mechanical algorithm for database search,” 1996.
[5] J. Preskill, “Quantum Computing in the NISQ era and beyond,” Quantum, vol. 2, p. 79, Aug. 2018.
[6] Y. Ding, A. Holmes, A. Javadi-Abhari, D. Franklin, M. Martonosi, and F. T. Chong, “Magic-state functional units: Mapping and scheduling multi-level distillation circuits for fault-tolerant quantum architectures,” 2018.
[7] A. Javadi-Abhari, P. Gokhale, A. Holmes, D. Franklin, K. R. Brown, M. Martonosi, and F. T. Chong, “Optimized surface code communication in superconducting quantum computers,” in Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO-50 ’17, (New York, NY, USA), pp. 692–705, ACM, 2017.
[8] G. G. Guerreschi and J. Park, “Two-step approach to scheduling quantum circuits,” Quantum Science and Technology, vol. 3, p. 045003, Jul 2018.
[9] A. Pavlidis and E. Floratos, “Arithmetic circuits for multilevel qudits based on quantum fourier transform,” 2017.
[10] J. Randall, S. Weidt, E. D. Standing, K. Lake, S. C. Webster, D. F. Murgia, T. Navickas, K. Roth, and W. K. Hensinger, “Efficient preparation and detection of microwave dressed-state qubits and qutrits with trapped ions,” Phys. Rev. A, vol. 91, p. 012322, 01 2015.
[11] J. Randall, A. M. Lawrence, S. C. Webster, S. Weidt, N. V. Vitanov, and W. K. Hensinger, “Generation of high-fidelity quantum control methods for multilevel systems,” Phys. Rev. A, vol. 98, p. 043414, 10 2018.
[12] “Cirq: A python framework for creating, editing, and invoking noisy intermediate scale quantum (nisq) circuits.” https://github.com/quantumlib/cirq, 2018.
[13] M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information: 10th Anniversary Edition. New York, NY, USA: Cambridge University Press, 10th ed., 2011.
[14] Y.-M. Di and H.-R. Wei, “Elementary gates for ternary quantum logic circuit,” 2011.
[15] A. Muthukrishnan and C. R. Stroud, “Multivalued logic gates for quantum computation,” Phys. Rev. A, vol. 62, p. 052309, Oct 2000.
[16] A. B. Klimov, R. Guzmán, J. C. Retamal, and C. Saavedra, “Qutrit quantum computer with trapped ions,” Phys. Rev. A, vol. 67, p. 062313, Jun 2003.
[17] M. Kues, C. Reimer, P. Roztocki, L. R. Cortés, S. Sciara, B. Wetzel, Y. Zhang, A. Cino, S. T. Chu, B. E. Little, D. J. Moss, L. Caspani, J. Azaña, and R. Morandotti, “On-chip generation of high-dimensional entangled quantum states and their coherent control,” Nature, vol. 546, p. 622, 06 2017.
[18] T. Bækkegaard, L. B. Kristensen, N. J. S. Loft, C. K. Andersen, D. Petrosyan, and N. T. Zinner, “Superconducting qutrit-qubit circuit: A toolbox for efficient quantum gates,” 2018.
[19] A. Fedorov, L. Steffen, M. Baur, M. P. da Silva, and A. Wallraff, “Implementation of a toffoli gate with superconducting circuits,” Nature, vol. 481, p. 170, Dec 2011.
[20] A. D. Greentree, S. G. Schirmer, F. Green, L. C. L. Hollenberg, A. R. Hamilton, and R. G. Clark, “Maximizing the for a finite number of distinguishable quantum states,” Phys. Rev. Lett., vol. 92, p. 097901, Mar 2004.
[21] M. H. A. Khan and M. A. Perkowski, “Quantum ternary parallel adder/subtractor with partially-look-ahead carry,” J. Syst. Archit., vol. 53, pp. 453–464, July 2007.
[22] Y. Fan, “Applications of multi-valued quantum algorithms,” 2008.
[23] H. Y. Li, C. W. Wu, W. T. Liu, P. X. Chen, and C. Z. Li, “Fast quantum search algorithm for databases of arbitrary size and its implementation in a cavity QED system,” Physics Letters A, vol. 375, pp. 4249–4254, Nov. 2011.
[24] Y. Wang and M. Perkowski, “Improved complexity of quantum oracles for ternary grover algorithm for graph coloring,” in 2011 41st IEEE International Symposium on Multiple-Valued Logic, pp. 294–301, May 2011.
[25] S. S. Ivanov, H. S. Tonchev, and N. V. Vitanov, “Time-efficient implementation of quantum search with qudits,” Phys. Rev. A, vol. 85, p. 062321, Jun 2012.
[26] A. Bocharov, M. Roetteler, and K. M. Svore, “Factoring with qutrits: Shor’s algorithm on ternary and metaplectic quantum architectures,” Phys. Rev. A, vol. 96, p. 012306, Jul 2017.
[27] C. Gidney, “Constructing large controlled nots,” 2015.
[28] Y. He, M.-X. Luo, E. Zhang, H.-K. Wang, and X.-F. Wang, “Decompositions of n-qubit Toffoli Gates with Linear Circuit Complexity,” International Journal of Theoretical Physics, vol. 56, pp. 2350–2361, July 2017.
[29] A. Barenco, C. H. Bennett, R. Cleve, D. P. DiVincenzo, N. Margolus, P. Shor, T. Sleator, J. A. Smolin, and H. Weinfurter, “Elementary gates for quantum computation,” Phys. Rev. A, vol. 52, pp. 3457–3467, Nov 1995.
[30] B. P. Lanyon, M. Barbieri, M. P. Almeida, T. Jennewein, T. C. Ralph, K. J. Resch, G. J. Pryde, J. L. O’Brien, A. Gilchrist, and A. G. White, “Quantum computing using shortcuts through higher dimensions,” 2008.
[31] T. C. Ralph, K. J. Resch, and A. Gilchrist, “Efficient toffoli gates using qudits,” Phys. Rev. A, vol. 75, p. 022313, Feb 2007.
[32] “Repo is private during review period.”
[33] F. Tacchino, C. Macchiavello, D. Gerace, and D. Bajoni, “An artificial neuron implemented on an actual quantum processor,” 2018.
[34] F. Tacchino. Personal Communication.
[35] T. G. Draper, “Addition on a quantum computer,” 2000.
[36] C. Gidney, “Factoring with n+2 clean qubits and n-1 dirty qubits,” 2017.
[37] T. Häner, M. Roetteler, and K. M. Svore, “Factoring using 2n + 2 qubits with toffoli based modular multiplication,” Quantum Info. Comput., vol. 17, pp. 673–684, June 2017.
[38] D. G. Cory, M. D. Price, W. Maas, E. Knill, R. Laflamme, W. H. Zurek, T. F. Havel, and S. S. Somaroo, “Experimental quantum error correction,” Phys. Rev. Lett., vol. 81, pp. 2152–2155, Sep 1998.
[39] E. Dennis, “Toward fault-tolerant quantum computation without concatenation,” Phys. Rev. A, vol. 63, p. 052314, Apr 2001.
[40] M. Otten and S. Gray, “Accounting for errors in quantum algorithms via individual error reduction,” 2018.
[41] D. Miller, T. Holz, H. Kampermann, and D. Bruß, “Propagation of generalized pauli errors in qudit clifford circuits,” 2018.
[42] N. Khammassi, I. Ashraf, X. Fu, C. G. Almudever, and K. Bertels, “Qx: A high-performance quantum computer simulation platform,” in Proceedings of the Conference on Design, Automation & Test in Europe, DATE ’17, (3001 Leuven, Belgium, Belgium), pp. 464–469, European Design and Automation Association, 2017.
[43] “Quantum devices and simulators.” https://www.research.ibm.com/ibm-q/technology/devices/, 2018.
[44] D. Venturelli, M. Do, E. Rieffel, and J. Frank, “Compiling quantum circuits to realistic hardware architectures using temporal planners,” Quantum Science and Technology, vol. 3, p. 025004, Feb 2018.
[45] K. E. C. Booth, M. Do, J. C. Beck, E. Rieffel, D. Venturelli, and J. Frank, “Comparing and integrating constraint programming and temporal planning for quantum circuit compilation,” 2018.
[46] J. Biamonte and V. Bergholm, “Tensor networks in a nutshell,” 2017.
[47] T. A. Brun, “A simple model of quantum trajectories,” American Journal of Physics, vol. 70, no. 7, pp. 719–737, 2002.
[48] R. Schack and T. A. Brun, “A c++ library using quantum trajectories to solve quantum master equations,” 1996.
[49] R. S. Smith, M. J. Curtis, and W. J. Zeng, “A practical quantum instruction set architecture,” 2016.
[50] J. Johansson, P. Nation, and F. Nori, “Qutip: An open-source python framework for the dynamics of open quantum systems,” Computer Physics Communications, vol. 183, pp. 1760–1772, 8 2012.
[51] J. Johansson, P. Nation, and F. Nori, “Qutip 2: A python framework for the dynamics of open quantum systems,” Computer Physics Communications, vol. 184, pp. 1234–1240, 4 2013.
[52] A. Y. Chernyavskiy, V. V. Voevodin, and V. V. Voevodin, “Parallel computational structure of noisy quantum circuits simulation,” Lobachevskii Journal of Mathematics, vol. 39, pp. 494–502, May 2018.
[53] N. M. Linke, D. Maslov, M. Roetteler, S. Debnath, C. Figgatt, K. A. Landsman, K. Wright, and C. Monroe, “Experimental comparison of two quantum computing architectures,” 2017.
[54] R. Barends, J. Kelly, A. Megrant, A. Veitia, D. Sank, E. Jeffrey, T. C. White, J. Mutus, A. G. Fowler, B. Campbell, Y. Chen, Z. Chen, B. Chiaro, A. Dunsworth, C. Neill, P. O’Malley, P. Roushan, A. Vainsencher, J. Wenner, A. N. Korotkov, A. N. Cleland, and J. M. Martinis, “Logic gates at the surface code threshold: Superconducting qubits poised for fault-tolerant quantum computing,” 2014.
[55] E. Barnes, C. Arenz, A. Pitchford, and S. E. Economou, “Fast microwave-driven three-qubit gates for cavity-coupled superconducting qubits,” Phys. Rev. B, vol. 96, p. 024504, Jul 2017.
[56] M. Reagor, W. Pfaff, C. Axline, R. W. Heeres, N. Ofek, K. Sliwa, E. Holland, C. Wang, J. Blumoff, K. Chou, M. J. Hatridge, L. Frunzio, M. H. Devoret, L. Jiang, and R. J. Schoelkopf, “ with millisecond coherence in circuit qed,” Phys. Rev. B, vol. 94, p. 014506, Jul 2016.
[57] N. Earnest, S. Chakram, Y. Lu, N. Irons, R. K. Naik, N. Leung, L. Ocola, D. A. Czaplewski, B. Baker, J. Lawrence, J. Koch, and D. I. Schuster, “Realization of a Λ system with metastable states of a capacitively shunted fluxonium,” Phys. Rev. Lett., vol. 120, p. 150504, Apr 2018.
[58] S. M. Girvin, “Circuit qed: Superconducting qubits coupled to microwave photons,”
[59] K. R. Brown, J. Kim, and C. Monroe, “Co-designing a scalable quantum computer with trapped atomic ions,” npj , vol. 2, p. 16034, 2016.
[60] N. C. Brown and K. R. Brown, “Comparing zeeman qubits to hyperfine qubits in the context of the surface code: 174Yb+ and 171Yb+,” Phys. Rev. A, vol. 97, p. 052301, May 2018.
[61] R. K. Naik, N. Leung, S. Chakram, P. Groszkowski, Y. Lu, N. Earnest, D. C. McKay, J. Koch, and D. I. Schuster, “Random access quantum information processors using multimode circuit quantum electrodynamics,” Nature Communications, vol. 8, no. 1, p. 1904, 2017.
[62] M. Soeken, S. Frehse, R. Wille, and R. Drechsler, “Revkit: An open source toolkit for the design of reversible circuits,” in Reversible Computation (A. De Vos and R. Wille, eds.), pp. 64–76, Springer Berlin Heidelberg, 2012.
[63] D. M. Miller, D. Maslov, and G. W. Dueck, “A transformation based algorithm for reversible logic synthesis,” in Proceedings 2003. Design Automation Conference (IEEE Cat. No.03CH37451), pp. 318–323, June 2003.
[64] V. Karimipour, A. Mani, and L. Memarzadeh, “Characterization of qutrit channels in terms of their covariance and symmetry properties,” Phys. Rev. A, vol. 84, p. 012321, Jul 2011.
[65] M. Grassl, L. Kong, Z. Wei, Z.-Q. Yin, and B. Zeng, “Quantum error-correcting codes for qudit amplitude damping,” 2015.
[66] J. Ghosh, A. G. Fowler, J. M. Martinis, and M. R. Geller, “Understanding the effects of leakage in superconducting quantum-error-detection circuits,” Phys. Rev. A, vol. 88, p. 062329, Dec 2013.
