
Quantum Algorithms

Student Seminar RWTH Aachen

Winter Semester 2020/2021

Contents

1 Gate Set Tomography
  Lucas Marcogliese (under the supervision of Maarten Wegewijs)
  1.1 Introduction
  1.2 Operational representation of quantum processors
    1.2.1 Quantum circuits and open quantum systems
    1.2.2 Superoperator formalism
    1.2.3 Physical constraints in the canonical gauge
    1.2.4 Black box model
  1.3 Breaking self-referentiality in traditional quantum tomography
    1.3.1 The self-referentiality problem
    1.3.2 SPAM errors
  1.4 Linear gate set tomography
    1.4.1 Self-consistent approach to quantum tomography
  1.5 Applications of a gate set estimate
    1.5.1 Predictions for arbitrary sequences
    1.5.2 Figures of merit and the lack of gauge invariance
  1.6 Gauge fixing simulation with pyGSTi
    1.6.1 Overview of pyGSTi algorithm
    1.6.2 Simulation results and analysis
  1.7 Conclusion
  App. 1.A Example: Pauli transfer matrix representation
  App. 1.B Physical constraints on gate set under the blame gauge transformation
  App. 1.C Overcomplete data with least squares
  App. 1.D Algorithm scaling analysis
  App. 1.E Long-sequence GST: commuting errors and amplificational completeness
  App. 1.F Motivation for maximum likelihood estimation
  App. 1.G Target and estimate gate PTMs
  App. 1.H Simulation: native state and effect analysis

2 Variational Quantum Algorithms
  Bavithra Govintharajah (under the supervision of Fabian Hassler)
  2.1 Introduction
  2.2 Variational quantum algorithms
    2.2.1 Outline of the algorithm
    2.2.2 The variational paradigm and general quantum systems
  2.3 Variational ansatz construction
    2.3.1 Arbitrary unitary evolution
    2.3.2 Problem-inspired ansatz construction
    2.3.3 Hardware-efficient ansatz construction
  2.4 Cost function
  2.5 Measurement scheme
    2.5.1 Precision and Chebyshev's inequality
    2.5.2 Measurement scheme optimization
  2.6 Optimization routine
  2.7 Hands-on example: Molecular quantum simulation using the VQE
    2.7.1 Classical pre-processing
    2.7.2 Simulating the Hydrogen molecule
  2.8 Outlook and conclusions
  App. 2.A The variation principle
  App. 2.B Pauli basis

3 Topological …: Codes and Decoders
  Orkun Şensebat (under the supervision of Eliana Fiorelli)
  3.1 Introduction
  3.2 Quantum Error Correction
    3.2.1 The Stabilizer Formalism
    3.2.2 Comparison of Quantum Codes
  3.3 Topological Error Correction
    3.3.1 The …
    3.3.2 Planar Codes
    3.3.3 Code Thresholds
  3.4 Conclusion
  3.5 Acknowledgments

4 Quantum Walk and its Universality
  Kiran Adhikari (under the supervision of Thomas Botzung)
  4.1 Introduction
  4.2 Quick review of Classical Random walk
    4.2.1 Classical random walk on an unrestricted line
  4.3 Quantum walk
    4.3.1 Discrete quantum walk on a line
    4.3.2 Continuous-time quantum walk (CTQW)
    4.3.3 Universal Computation by Quantum walk
  4.4 Conclusion
  App. 4.A Notebook for Discrete quantum walk simulation

5 Adiabatic Quantum Computation
  Ananya Singh (under the supervision of Markus Müller)
  5.1 Introduction
    5.1.1 Overview of the report
  5.2 Adiabatic Theorem
    5.2.1 Adiabatic …
    5.2.2 Landau-Zener transition
  5.3 Adiabatic Quantum Computing
    5.3.1 The ingredients
    5.3.2 Some simple examples
    5.3.3 A simple application - 2-SAT with 3 …
  5.4 Adiabatic Quantum Algorithm
    5.4.1 Needle in the haystack
    5.4.2 Finding the needle adiabatically
  5.5 Universality of AQC
    5.5.1 Gate model can efficiently simulate AQC
  5.6 Summary
    5.6.1 NP-Complete problems
  App. 5.A Adiabatic theorem calculation
  App. 5.B Codes

6 Topological Quantum Error Correction: Magic State Distillation
  Dmitrii Dobrynin (under the supervision of Lorenzo Cardarelli)
  6.1 Introduction
  6.2 Theory of Quantum Error Correction
    6.2.1 Basic Codes
    6.2.2 Stabilizer Formalism
    6.2.3 Fault-tolerance and Universality
    6.2.4 Topological Error Correction: Surface Code
  6.3 Magic State Distillation
    6.3.1 Magic States
    6.3.2 Distillation Algorithm
    6.3.3 Costs of Computation with Magic States
    6.3.4 Implications of Magic States
  6.4 Conclusion
  App. 6.A Program for MSD plots

These notes are the result of a student seminar that took place during the winter semester 2020/2021 at RWTH Aachen. The seminar was geared towards beginning master's students. The goal of the seminar was to introduce the students to 'advanced' quantum algorithms that are not covered in the introductory lecture on … theory. The students were asked to consult the literature, realize some analytical or numerical work on the chosen topic, prepare a talk, as well as these notes. The students were assigned a supervisor that offered guidance. The result of the seminar can be found in these notes. Please be aware that the notes are written by the students with little to no corrections. As such, they should be read critically and taken with a grain of salt. Still we hope that the notes might be of use, in particular as a first introduction and as a guide to future literature.

The supervisors

Chapter 1

Gate Set Tomography

Lucas Marcogliese
Supervisor: Maarten Wegewijs

Gate set tomography (GST) is a quantum algorithm that self-consistently characterizes a quantum processor's set of available operations. By extracting informationally complete sets of fiducial states and effects from the original gate set, it successfully overcomes the self-referentiality problem associated with traditional quantum process tomography. In this report, the derivation of the linear GST algorithm is presented. The intrinsic gauge freedom in tomography is shown to emerge naturally from the Born rule and does not prevent one from using the GST estimate to predict experimental observations for arbitrary sequences. Moreover, the dangers of using standard, gauge-variant figures of merit with GST estimates to quantify device performance are considered and a suitable, gauge-invariant alternative is discussed. Advanced techniques for higher accuracy, namely long-sequence GST, are introduced and an overview of the software package pyGSTi is given. Finally, the results of a simulation probing the influence of pyGSTi's gauge fixing procedure on GST's estimates are analyzed.

1.1 Introduction

Modern quantum computing platforms, such as ion traps, superconducting qubits and semiconductor qubits, all share a common functionality: the register is initialized in some state, a clever quantum algorithm is performed and the resulting output state is measured, yielding a certain outcome. These promising candidates also share a common fragility: they interact with their environment, which is on one hand necessary to control the qubits and on the other hand constitutes a channel for noise and decoherence. Luckily, fault-tolerant quantum error correction (FTQEC) schemes predict the feasibility of universal quantum computation with noisy systems as long as their error rates are below certain thresholds. But how do we characterize a quantum processor to evaluate whether it meets FTQEC thresholds and, more generally, to quantify its overall performance with respect to its competitors, especially across different platforms? Figures of merit such as average gate fidelities obtained by randomized benchmarking are regularly reported in the literature, but they fail to extract individual gate errors. Quantum tomography represents the only technique capable of reconstructing the entire set of quantum logic gates by fitting a model to a dataset acquired from a given set of quantum circuits. However, traditional methods suffer from state preparation and measurement (SPAM) errors in the estimates, made when coping with the self-referentiality of those schemes. Ironically, to capture the full picture and extract the most information from a quantum processor, one must take a step back and become agnostic to the inner workings of the device. Gate set tomography (GST), a self-consistent approach to quantum tomography, accurately characterizes quantum devices using precisely such a black box model.
The present report aims to give readers the necessary theoretical foundation of GST, to equip them with the analytical tools to critically interpret GST estimates and to impart an appreciation for GST's importance in the development of reliable quantum hardware. The report begins with a short introduction on quantum circuits and open quantum systems. The latter is followed by the formulation of an operational representation of quantum tomography via the superoperator formalism, which provides a compact description of quantum circuits. Section 1.3 discusses how traditional methods break the self-referentiality of traditional quantum tomography, ultimately leading to pronounced errors in the estimate. The derivation of the self-consistent GST algorithm is presented in section 1.4, along with its applications in section 1.5 and a discussion of an interesting result of self-consistency: an estimate defined up to a global gauge freedom. Finally, results of a GST simulation with the state-of-the-art software package pyGSTi, probing its gauge fixing procedure, are analyzed in section 1.6.

1.2 Operational representation of quantum processors

1.2.1 Quantum circuits and open quantum systems

Quantum logic experiments are commonly modeled as quantum circuits involving an initial state preparation, a sequence of quantum processes/gates¹ that are ideally unitary (thus reversible) and a measurement. Performed sequentially, these elements of the experiment produce an outcome with a certain probability. The circuit is represented by operations joined graphically by wires, as depicted in Fig. 1.1. Conventionally, the circuit runs from left to right. Here, the opposite direction is preferred in order to remain consistent with the form of the Born rule which, as we will later see, lies at the heart of tomography.

Fig. 1.1: System A coupled to environment E (or ancilla B); the circuit runs from state preparation to measurement [1].

GST aims to characterize noisy devices or, equivalently, open quantum systems. The latter are described by the coupling of the system performing the operations in a quantum circuit, A, to its environment E. Alternatively, the environment can be modeled by an ancilla system B if the information required to model such an environment is inaccessible.

In an open system, the initial state is described by the tensor product of the density matrices $\rho_A \otimes \rho_B$. An operation on the composite (purified) system is still defined by a unitary evolution, but the effective noisy gate $\mathcal{G}$ acting on system A is no longer an ideal, reversible process. By performing an orthogonal measurement on system B² and discarding ("tracing out") the outcomes, we can implement the gate $\mathcal{G}$ in the Kraus form acting on $\rho_A$ for outcomes $n$ (see [2, chapter 4]),

$$\mathcal{G}(\rho_A) = \sum_{n=1}^{N} K_n \rho_A K_n^\dagger \tag{1.1}$$

where the Kraus operators $K_n$ are not necessarily unitary, but satisfy the completeness relation $\sum_n K_n^\dagger K_n = \mathbb{1}_A$ [3].

By applying a similar orthogonal measurement on a composite operation, this time without discarding the outcomes from system B, essentially performing a read-out, the positive operator-valued measure (POVM) can be derived. This measurement is a generalization of the more familiar orthogonal measurement, which is in fact insufficient to provide a unique estimate in tomography (see 1.3). A POVM is a set of effect operators satisfying the completeness relation $\sum_i E_i = \mathbb{1}_A$. Each $E_i$ is associated to an outcome $m_i$ occurring with probability $p_i = \mathrm{Tr}(E_i \rho_A)$. POVMs are important because, in practice, most measurements are not actually orthogonal. For example, a photodetector must destroy a photon to detect it, corresponding to $E_1 = |0\rangle\langle 1|$ and a "click" on the detector [2]. The absence of a photon (no click) would be identified by the effect $E_0 = |0\rangle\langle 0|$. By the completeness relation, the remaining effect operator $\mathbb{1} - E_0 - E_1$ would be an inconclusive result.

¹ These sequences can be separated into layers if gates acting on different qubits are quasi-independent, as delimited by the dotted lines in Fig. 1.1.
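As a concrete illustration of Eq. (1.1), here is a minimal sketch that applies a textbook amplitude-damping channel (an assumed example, not a channel discussed in this report) to a qubit state and verifies the completeness relation; the value of `gamma` is purely illustrative.

```python
import numpy as np

# Amplitude-damping channel with damping probability gamma (illustrative value).
gamma = 0.1
K = [
    np.array([[1, 0], [0, np.sqrt(1 - gamma)]]),  # K_1: no decay
    np.array([[0, np.sqrt(gamma)], [0, 0]]),      # K_2: |1> decays to |0>
]

# Completeness relation: sum_n K_n^dagger K_n = 1_A
assert np.allclose(sum(k.conj().T @ k for k in K), np.eye(2))

def apply_channel(kraus, rho):
    """G(rho) = sum_n K_n rho K_n^dagger, the Kraus form of Eq. (1.1)."""
    return sum(k @ rho @ k.conj().T for k in kraus)

rho_1 = np.array([[0, 0], [0, 1]], dtype=complex)  # pure state |1><1|
rho_out = apply_channel(K, rho_1)

# The trace is preserved, but the map is not unitary: population gamma
# leaks from |1><1| into |0><0|.
assert np.isclose(np.trace(rho_out).real, 1.0)
assert np.isclose(rho_out[0, 0].real, gamma)
```

Note that the Kraus operators here are indeed non-unitary, yet the channel preserves the trace, exactly as stated below Eq. (1.1).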

With this description of open systems, the probability of obtaining outcome $m_i$ from a simple quantum circuit consisting of a single gate can be predicted by the Born rule,

$$p_i = \mathrm{Tr}(E_i\,\mathcal{G}(\rho)) \tag{1.2}$$

as described by quantum theory.

1.2.2 Superoperator formalism

Although the Kraus form is useful to describe open quantum systems, it is an inconvenient representation for quantum circuits. Constructing sequences of gates would require a cumbersome composition of multiple quantum operations expressed in the Kraus form. A compact notation is instead achieved with the superoperator formalism, in which state and effect matrices are encoded into column vectors $\rho \to |\rho\rangle\rangle$ and row vectors $E \to \langle\langle E|$ of length $d^2$, whereas gates are mapped onto $d^2 \times d^2$ superoperator matrices, $\mathcal{G}(\rho) \to G|\rho\rangle\rangle$, where $d = 2^n$ for an $n$-qubit register [4].

The resulting space of operators is called the Hilbert-Schmidt (or Liouville) space, denoted $\mathcal{B}(\mathcal{H}_d)$, and its subspace of Hermitian operators is equipped with the inner product $\langle\langle E|\rho\rangle\rangle = \mathrm{Tr}(E\rho)$ [5]. The Pauli transfer matrix (PTM) is a convenient and intuitive superoperator representation based on an expansion in the Pauli basis $P^{\otimes n}$ where $P = \{I, X, Y, Z\}$ [4], conveniently denoted $|P_j\rangle\rangle \equiv |j\rangle\rangle$. The PTM coefficients of the states, effects and gates can thus be expressed as,

$$|\rho\rangle\rangle = \frac{1}{\sqrt{d}} \sum_{j=0}^{d^2-1} \mathrm{Tr}(\rho P_j)\,|j\rangle\rangle \tag{1.3}$$

$$\langle\langle E| = \frac{1}{\sqrt{d}} \sum_{i=0}^{d^2-1} \mathrm{Tr}(E P_i)\,\langle\langle i| \tag{1.4}$$

$$G_{ij} = \frac{1}{d}\,\mathrm{Tr}(P_i\,\mathcal{G}(P_j)) \tag{1.5}$$

where $|j\rangle\rangle$ and $\langle\langle i|$ are basis vectors with $\langle\langle i|j\rangle\rangle = \delta_{ij}$. See App. 1.A for an example with a mixed state and Section 1.3.2 for examples of gates.

² The orthogonal measurement corresponds to $\mathbb{1}_A \otimes |i_B\rangle\langle i_B|$ where $|i_B\rangle\langle i_B|$ is a projector and $\langle j_B|i_B\rangle = \delta_{ij}$.
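A minimal numerical sketch of Eqs. (1.3)-(1.5) for a single qubit ($d = 2$): it builds the PTM of an $X_{\pi/2}$ rotation and checks that the superoperator picture reproduces the Born rule of Eq. (1.2). The helper names `vec` and `ptm` are my own, not standard API names.

```python
import numpy as np

d = 2
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)
paulis = [I2, X, Y, Z]  # single-qubit Pauli basis P_0..P_3

def vec(op):
    """Components Tr(op P_j)/sqrt(d): Eq. (1.3) for states, Eq. (1.4) for effects."""
    return np.array([np.trace(op @ P).real for P in paulis]) / np.sqrt(d)

def ptm(U):
    """G_ij = Tr(P_i U P_j U^dagger)/d, Eq. (1.5) for a unitary gate."""
    return np.array([[np.real(np.trace(Pi @ U @ Pj @ U.conj().T)) / d
                      for Pj in paulis] for Pi in paulis])

U = (I2 - 1j * X) / np.sqrt(2)                   # X_{pi/2} rotation
rho = np.array([[1, 0], [0, 0]], dtype=complex)  # |0><0|
E1 = np.array([[0, 0], [0, 1]], dtype=complex)   # effect for outcome 1

# Born rule in both pictures: <<E_1| G |rho>> = Tr(E_1 U rho U^dagger)
p_super = vec(E1) @ ptm(U) @ vec(rho)
p_direct = np.real(np.trace(E1 @ U @ rho @ U.conj().T))
assert np.isclose(p_super, p_direct) and np.isclose(p_super, 0.5)
```

A sequence of gates then becomes a plain matrix product of PTMs, which is exactly the convenience that motivates this formalism.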

1.2.3 Physical constraints in the canonical gauge

Real physical experiments yield outcomes with positive and normalized probabilities. States, effects and gates must therefore satisfy a set of mathematical constraints to ensure these properties. As will become clear later (see section 1.5.2), the representation of these constraints is not unique. Nonetheless, the standard choice is the canonical gauge.

In this canonical gauge, normalized probabilities for states and effects are ensured by the constraints $\mathrm{Tr}(\rho) = 1$ and $\sum_i E_i = \mathbb{1}$. If this property is conserved under the transformation of a gate such that $\mathrm{Tr}(\mathcal{G}(\rho)) = \mathrm{Tr}(\rho)$, then $\mathcal{G}$ is called trace-preserving (TP). In the PTM representation, this is given by $\rho_0 = \sum_i (E_i)_0 = 1/\sqrt{d}$ and $G_{0j} = \begin{pmatrix} 1 & 0 & \cdots & 0 \end{pmatrix}$.³

Positive probabilities in the canonical gauge require $\rho$ to be positive semidefinite, which we denote $\rho \geq 0$. Through the inner product, this implies that $E \geq 0$. Once again, if this property is conserved under the transformation of a gate such that $\mathcal{G}(\rho) \geq 0\ \forall\,\rho \geq 0$, then $\mathcal{G}$ is called positivity-preserving. If positivity is also conserved under the transformation on the extended system such that $(\mathcal{G}_A \otimes \mathbb{1}_B)\rho_{AB} \geq 0\ \forall\,\rho_{AB} \geq 0$, then $\mathcal{G}$ is called completely positivity-preserving or completely positive (CP).

The CP constraint is much more challenging to express in the PTM representation than the TP constraint. Instead, we can encode the CP constraint on the gate using an isomorphism between quantum gates and states called the Choi-state operator,

$$\rho_\mathcal{G} = \frac{1}{d}\,(\mathcal{G} \otimes \mathcal{I})\,|\mathbb{1}\rangle\rangle\langle\langle\mathbb{1}| \tag{1.6}$$

where $|\mathbb{1}\rangle\rangle$ is an unnormalized maximally entangled state. The CP constraint requires that $\rho_\mathcal{G}$ must have positive eigenvalues, $\rho_\mathcal{G} \geq 0$, analogous to a real state. This, however, must not be interpreted as an equivalency between gates and states: gates represent transformations and can be repeated in a quantum circuit, whereas states cannot. When referring to both constraints simultaneously, we will hereafter call these CPTP constraints.
Note that CPTP constraints apply only to gates whereas states and effects are subject to positivity and normalization constraints. In some GST implementations, CPTP constraints can be applied to ensure that estimates correspond to real physical operations (see section 1.6.2).
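The TP and CP constraints can be checked numerically. Below is a sketch (the function names `choi`, `is_tp` and `is_cp` are my own, not from this report or from pyGSTi) that builds the Choi-state operator of Eq. (1.6) from a Kraus representation and tests its spectrum, using the amplitude-damping channel as an assumed example.

```python
import numpy as np

def choi(kraus):
    """rho_G = (1/d) (G (x) I) |1>><<1|, Eq. (1.6), with |1>> = sum_m |m>|m>."""
    d = kraus[0].shape[0]
    v = np.eye(d, dtype=complex).reshape(d * d)   # |1>>, unnormalized
    omega = np.outer(v, v.conj())                 # |1>><<1|
    return sum(np.kron(K, np.eye(d)) @ omega @ np.kron(K, np.eye(d)).conj().T
               for K in kraus) / d

def is_tp(kraus, tol=1e-9):
    """Trace preservation: sum_n K_n^dagger K_n = 1."""
    d = kraus[0].shape[0]
    return np.allclose(sum(K.conj().T @ K for K in kraus), np.eye(d), atol=tol)

def is_cp(kraus, tol=1e-9):
    """Complete positivity: the Choi-state operator has no negative eigenvalues."""
    return bool(np.all(np.linalg.eigvalsh(choi(kraus)) > -tol))

gamma = 0.2  # illustrative amplitude-damping strength
K = [np.array([[1, 0], [0, np.sqrt(1 - gamma)]], dtype=complex),
     np.array([[0, np.sqrt(gamma)], [0, 0]], dtype=complex)]
assert is_tp(K) and is_cp(K)
```

Any channel given in Kraus form is automatically CP; the check becomes non-trivial for gate estimates returned by tomography, which need not correspond to a physical map.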

1.2.4 Black box model

With the mathematical preliminaries in place, we can now derive an operational representation for noisy quantum circuits. In tomography, we want to model the system as a black box in order to remain agnostic to the physical details of the device. From an operational standpoint, outcomes are measured in the lab with a certain probability determined by the Born rule,

³ This is equivalent to $\langle\langle\mathbb{1}|\,G = \langle\langle\mathbb{1}|$ since $\langle\langle\mathbb{1}|\,G\,|\rho\rangle\rangle = \langle\langle\mathbb{1}|\rho\rangle\rangle = 1$ with $\langle\langle\mathbb{1}| = \begin{pmatrix} 1 & 0 & 0 & 0 \end{pmatrix}$.

$$p(E_i, \mathcal{G}_k, \rho_j) = \langle\langle E_i|\,\mathcal{G}_k\,|\rho_j\rangle\rangle \tag{1.7}$$

where each operation in the quantum circuit is indexed by a classical control or "button" $i, j, k$ on a black box, shown in Fig. 1.2, containing the quantum device. Note that this is equivalent to Eq. (1.2), with the only difference that sequences can be conveniently written as a product of superoperator matrices $\langle\langle E_i|\,\mathcal{G}_k \cdots \mathcal{G}_1\,|\rho_j\rangle\rangle$ as opposed to a composition in the Kraus form $\mathrm{Tr}(E_i\,\mathcal{G}_k(\cdots\mathcal{G}_1(\rho)))$.

Suppose we choose to perform a binary-outcome qubit experiment. By pressing a sequence of buttons according to a given quantum circuit, the experiment will produce one of two outcomes, labeled 0 and 1 and represented as a blinking green or red light on the black box. By repeating the experiment many times, statistics can be gathered to estimate the probability predicted by the Born rule. In this way, a representation is achieved that is only a function of operations implemented in the lab and can be used to model virtually any quantum computer regardless of the physical platform and detailed workings of the device.

Fig. 1.2: Quantum device in a black box with classical controls for $\rho$, $\mathcal{G}_1, \ldots, \mathcal{G}_K$ and $E$ [6].

The following theoretical treatments of both traditional and self-consistent tomography rely on one common, important assumption: the sequence of possible events resulting from the sequence of button presses on the black box is a Markov process [6]. In other words, the result of a press of a button depends only on the instantaneous device state and is not subject to correlated errors (no drift in parameters). This allows us to write the time average as a separable product [7],

$$\langle \mathrm{Tr}(E_i\,\mathcal{G}_k\,\rho_j)\rangle = \mathrm{Tr}(\langle E_i\rangle\,\langle\mathcal{G}_k\rangle\,\langle\rho_j\rangle)$$

where $\langle\bullet\rangle$ denotes the time average, not the inner product. The assumption greatly simplifies the subsequent framework and is reasonably fulfilled for systems of high enough quality meriting characterization in the first place. Nevertheless, there exist methods to detect violations of Markovianity [8] and efforts to develop correlated error detection techniques [9][7].

1.3 Breaking self-referentiality in traditional quantum tomography

1.3.1 The self-referentiality problem

In quantum theory, the Born rule is normally used to calculate probabilities with a known quantum circuit, including the states and effects. In tomography, the situation is reversed: probabilities are estimated from repeatedly measured outcomes, which are then used to infer the unknown circuit. Traditionally, the effects and/or states were assumed known in order to invert the problem and calculate coefficients for either effects, states or gates. These traditional methods, called state, measurement and process tomography respectively, are examined in this section to expose their inherent self-referential structure, motivating the need for GST. Despite this fundamental issue with the traditional approaches, all forms of tomography share a common requirement: informational completeness.

Definition 1.3.1. A set of states (effects) is informationally complete (IC) if it forms a complete basis for the computational Hilbert-Schmidt (dual) space $\mathcal{B}(\mathcal{H}_d)$ of dimension $d^2$ [1].

With this condition we can state the key result of tomography,

Theorem 1.3.1. Given IC sets of (i) states, (ii) effects or (iii) both states and effects, tomographic estimates of (i) effects, (ii) states or (iii) gates are uniquely determined [1].

Before considering the proof, we discuss some examples. IC sets can be better visualized for a single qubit with $d^2 = 4$ on the Bloch sphere. To uniquely determine a density matrix $\rho$, we need at least 4 linearly independent effects that span $\mathcal{B}(\mathcal{H}_d)$. Since these are rank-1 projective (non-orthogonal) operators, they can be represented on the Bloch sphere (which is usually used to represent states) and form a tetrahedron, two examples of which are shown in Fig. 1.3.

Fig. 1.3: Symmetric IC set $\left\{|0\rangle,\ \frac{1}{\sqrt{3}}\left(|0\rangle + \sqrt{2}|1\rangle\right),\ \frac{1}{\sqrt{3}}\left(|0\rangle + \sqrt{2}e^{i\frac{2\pi}{3}}|1\rangle\right),\ \frac{1}{\sqrt{3}}\left(|0\rangle + \sqrt{2}e^{i\frac{4\pi}{3}}|1\rangle\right)\right\}$ (left) and example of a non-symmetric IC set $\left\{|0\rangle,\ |1\rangle,\ \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle),\ \frac{1}{\sqrt{2}}(|0\rangle + i|1\rangle)\right\}$ (right) of pure states for a single qubit, represented as regular and irregular tetrahedrons in the Bloch sphere [10][11].
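Informational completeness is easy to verify numerically: stack the four projectors as vectors in Hilbert-Schmidt space and check linear independence. A sketch for the non-symmetric set of Fig. 1.3 (the check itself is standard; the variable names are illustrative):

```python
import numpy as np

# Non-symmetric IC set {|0>, |1>, (|0>+|1>)/sqrt(2), (|0>+i|1>)/sqrt(2)}
kets = [
    np.array([1, 0], dtype=complex),
    np.array([0, 1], dtype=complex),
    np.array([1, 1], dtype=complex) / np.sqrt(2),
    np.array([1, 1j], dtype=complex) / np.sqrt(2),
]
projectors = [np.outer(k, k.conj()) for k in kets]

# Flatten each rank-1 projector into a row of a 4 x 4 matrix; the set is IC
# iff this matrix has full rank d^2 = 4, i.e. it spans B(H_d).
M = np.array([P.reshape(4) for P in projectors])
assert np.linalg.matrix_rank(M) == 4  # informationally complete
```

Dropping any one of the four states reduces the rank to 3, so no three of them suffice to reconstruct an arbitrary density matrix.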

More generally, given $d^2 - 1$ effects⁴, probabilities $(p_\rho)_i = \langle\langle E_i|\rho\rangle\rangle$ can be estimated, which yield coefficients that uniquely determine an unknown state $|\rho\rangle\rangle$ by linear inversion,

$$|\rho\rangle\rangle = \begin{pmatrix} \langle\langle E_1| \\ \vdots \\ \langle\langle E_{d^2}| \end{pmatrix}^{-1} \begin{pmatrix} (p_\rho)_1 \\ \vdots \\ (p_\rho)_{d^2} \end{pmatrix} \tag{1.8}$$

where each $\langle\langle E_i|$ is a row vector. Note that the stacked effect matrix is only invertible if the effects form an IC set. In this case, the matrix is of full rank and Eq. (1.8) has a unique

solution, proving the previous theorem. This describes state tomography. Similarly, given an IC set of states, we can perform measurement tomography, $(p_E)_j = \langle\langle E|\rho_j\rangle\rangle$, to uniquely estimate an unknown effect $\langle\langle E|$,

$$\langle\langle E| = \begin{pmatrix} (p_E)_1 & \cdots & (p_E)_{d^2} \end{pmatrix} \begin{pmatrix} |\rho_1\rangle\rangle & \cdots & |\rho_{d^2}\rangle\rangle \end{pmatrix}^{-1} \tag{1.9}$$

where each $|\rho_j\rangle\rangle$ is a column vector. Finally, quantum process tomography (QPT) probes a set of gates $\{\mathcal{G}_k : k \in 1, 2, \cdots, K\}$ with IC sets of both states and effects. The corresponding probabilities $p_{ijk} = \langle\langle E_i|\mathcal{G}_k|\rho_j\rangle\rangle$ allow one to reconstruct the gates,

⁴ The TP constraint, a single equality, reduces the number of required effects by 1.

$$G_k = \begin{pmatrix} \langle\langle E_1| \\ \vdots \\ \langle\langle E_{d^2}| \end{pmatrix}^{-1} \begin{pmatrix} p_{11k} & \cdots & p_{1d^2k} \\ \vdots & \ddots & \vdots \\ p_{d^21k} & \cdots & p_{d^2d^2k} \end{pmatrix} \begin{pmatrix} |\rho_1\rangle\rangle & \cdots & |\rho_{d^2}\rangle\rangle \end{pmatrix}^{-1} \tag{1.10}$$

Although Eqs. (1.8)-(1.10) appear to concede the tomographic estimates directly, they are highly deceptive since they omit explicit reference to a basis. To implement any of these traditional approaches, we need IC sets of states and effects, but determining these effects requires a known set of states and vice versa. The "solution" ultimately depends on the observer. Confronted with this self-referential problem, an observer who assumes nothing about these sets quickly realizes that Eqs. (1.8)-(1.10) become utterly futile. If instead, a naive observer pretends to know the sets of states and effects, essentially breaking this inherent self-referentiality, the estimate becomes riddled with state preparation and measurement (SPAM) errors. The next section delves into the pitfalls of this procedure leading to SPAM errors, precisely the errors that GST will be shown to eliminate in section 1.4.
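The linear inversions of Eqs. (1.8)-(1.10) can be sketched end-to-end in the PTM picture. In this sketch the IC states and effects are assumed perfectly known and the probabilities are exact (infinite statistics), precisely the idealization that is unavailable in the lab and whose failure produces SPAM errors. The gate choice and the helper names `vec` and `ptm` are illustrative.

```python
import numpy as np

d = 2
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)
paulis = [I2, X, Y, Z]

def vec(op):                       # Eqs. (1.3)/(1.4)
    return np.array([np.trace(op @ P).real for P in paulis]) / np.sqrt(d)

def ptm(U):                        # Eq. (1.5)
    return np.array([[np.real(np.trace(Pi @ U @ Pj @ U.conj().T)) / d
                      for Pj in paulis] for Pi in paulis])

# IC states |0>, |1>, |+>, |+i> (Fig. 1.3, right); the same four projectors
# are reused as an IC set of effects.
kets = [np.array([1, 0]), np.array([0, 1]),
        np.array([1, 1]) / np.sqrt(2), np.array([1, 1j]) / np.sqrt(2)]
S = np.column_stack([vec(np.outer(k, k.conj())) for k in kets])  # |rho_j>> columns
E = np.vstack([vec(np.outer(k, k.conj())) for k in kets])        # <<E_i| rows

G_true = ptm((I2 - 1j * X) / np.sqrt(2))   # "unknown" gate: X_{pi/2}

# Simulated data p_ijk = <<E_i| G |rho_j>>, then linear inversion, Eq. (1.10):
P = E @ G_true @ S
G_est = np.linalg.inv(E) @ P @ np.linalg.inv(S)
assert np.allclose(G_est, G_true)
```

The reconstruction is exact only because `E` and `S` are exactly the frames used to simulate the data; feeding the same probabilities to slightly wrong frames is what redistributes errors across the gate set, as the next section shows.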

1.3.2 SPAM errors

Traditionally, the naive position was adopted, assuming a perfectly precalibrated reference frame, which is never truly available in practice. Most physical hardware systems have at most one native state and effect when turned on, i.e., before performing any other operation. For example, a spin qubit can have a native pure state oriented along a very strong static magnetic field $B_z$ and the native projective effect $\langle\langle 0|$ obtained by spin-to-charge conversion. Tomographers must apply gates to these native states and effects to build IC sets and span the linear space containing the Bloch sphere, but these noisy gates are exactly what tomography aims to characterize in the first place! Assuming the IC sets are known corresponds to working in the wrong reference frame and leads to SPAM errors.

To appreciate the impact of SPAM errors, consider a gate set composed of the three non-trivial gates $\{X_{\pi/2}, Y_{\pi/2}, X_\pi\}$, whose matrices are plotted as color maps in the left column of Fig. 1.4 in the PTM representation [4]. These "actual" gates, which include the actual noise on the ideal "target" behaviours, were used to simulate data on which GST and QPT were later performed. Note that, in practice, the actual gates are inaccessible to the experimentalist and one feeds real experimental data to the respective tomography protocol. In this example, only the $Y_{\pi/2}$ gate is faulty, evident by the slightly coloured (2, 2) and (4, 4) diagonal terms in the left column, while the rest are ideal. These gates provide an intuitive example of the PTM representation given by Eq. (1.5): rows 3 and 4 of $X_{\pi/2}$ and $X_\pi$ transform the axes $(Y,Z) \to (-Z, Y)$ and $(Y,Z) \to (-Y, -Z)$ respectively.

Plotted in the right column are the differences between the estimates obtained by quantum process tomography and the actual set. We immediately observe that the first rows in the estimates are always exact due to the TP constraint. More importantly, despite

Fig. 1.4: SPAM errors: distribution of the actual error in $Y_{\pi/2}$ amongst the rest of the QPT estimates $\{I, X_{\pi/2}, Y_{\pi/2}, X_\pi\}$ due to assuming a known set of states and effects [4]. Columns show the actual PTMs (left), GST $-$ actual (center) and QPT $-$ actual (right). GST estimates contain only finite sampling error, invisible on the SPAM error scale.

only the $Y_{\pi/2}$ gate being faulty, its errors are redistributed among all the other gates in the estimated set due to the naive procedure of assuming incorrect IC sets of states and effects. If these sets were chosen correctly, these SPAM errors would be reduced, but verifying this is impossible. The difference between the GST estimates and the target gates is plotted in the center column. Since the problem is treated self-consistently, GST errors are negligible in contrast to QPT and thus invisible on the scale of the QPT SPAM errors. The example deals with simulated data, therefore the error in the GST estimate arises purely from finite sample size, also occurring in QPT. In a real experiment, both QPT and GST estimates are susceptible to decoherence errors that shrink gate eigenvalues to 0 and correlated errors (which we neglect here). However, only QPT is dominated by SPAM errors. The next section deals with the derivation of GST in its most fundamental form: linear gate set tomography.

1.4 Linear gate set tomography

Modern implementations of GST rely on different, sophisticated statistical estimators for the estimation of probabilities from finite sample sizes, a separate issue treated in section 1.6.1 and App. 1.C [12]. The present section introduces the original formulation of GST, linear gate set tomography, for two reasons. Firstly, it intuitively illustrates the self-consistent approach and, secondly, it still serves as a seed for subsequent estimation steps in the software package pyGSTi (see section 1.6).

1.4.1 Self-consistent approach to quantum tomography

The self-consistent method was first proposed in 2013 by an IBM group [13]. There have been similar efforts by a group at Sandia National Laboratories [6] and we here follow their review [1]. The idea consists of absorbing the SPAM operations into the gate set to be estimated. One denotes these specifically as fiducial gates $\mathcal{A}_i$, $\mathcal{B}_j$, which act on native effects/states to produce fiducial effects/states,

$$\langle\langle E_i'| \equiv \langle\langle E|\,\mathcal{A}_i \tag{1.11}$$

$$|\rho_j'\rangle\rangle \equiv \mathcal{B}_j\,|\rho\rangle\rangle \tag{1.12}$$

The set of fiducial gates is called IC if the sets $\{|\rho_j'\rangle\rangle\}_{j=1}^{d^2}$ and $\{\langle\langle E_i'|\}_{i=1}^{d^2}$ they generate form bases. These fiducial gates are in turn constructed self-consistently from the set of original gates which GST seeks to estimate. For example, given the gate set $\mathcal{G} \in \{I, X_{\pi/2}, Y_{\pi/2}\}$, we can construct $\{\mathcal{A}_i\} = \{\mathcal{B}_j\} \in \{I, X_{\pi/2}, Y_{\pi/2}, X_{\pi/2} \cdot X_{\pi/2}\} = \{I, X_{\pi/2}, Y_{\pi/2}, X_\pi\}$ [4]. This set of fiducials builds states $|\rho_j'\rangle\rangle$ and effects $\langle\langle E_i'|$ that form a basis, since the vectors subtend an irregular tetrahedron on the Bloch sphere as shown in Fig. 1.3. Arbitrary states and effects can be built by standard linear combination with coefficients given by probabilities of a tomography experiment.

To analyze tomographic measurements, we need to work with matrices, and the operational representation guarantees that we can work solely with indices corresponding to operations in the lab. However, it is unavoidable, as an intermediate step, to define the vector representations of fiducial states in an unknown basis $|\alpha\rangle\rangle$ and effects in the dual basis $\langle\langle\beta|$,

$$A_{i\alpha} \equiv \langle\langle E|\,\mathcal{A}_i\,|\alpha\rangle\rangle \tag{1.13}$$

$$B_{j\beta} \equiv \langle\langle\beta|\,\mathcal{B}_j\,|\rho\rangle\rangle \tag{1.14}$$

These fiducial vectors are indexed by $i, j$ referring to buttons on the black box and can be stacked into matrices analogously to Eq. (1.10) before inversion, but are in this case constructed self-consistently. Similarly, we can represent the gate $\mathcal{G}_k$ by a matrix with both bases (that can be chosen independently),

$$(G_k)_{\alpha\beta} \equiv \langle\langle\alpha|\,\mathcal{G}_k\,|\beta\rangle\rangle \tag{1.15}$$

With fiducial IC sets $\mathcal{A}_i$ and $\mathcal{B}_j$ with $d^2$ (co)vectors each, we define the GST gate set⁵ $\Theta$ as,

$$\Theta = \left\{\{\mathcal{A}_i\}_{i=1}^{d^2},\ \{\mathcal{G}_k\}_{k=1}^{K},\ \{\mathcal{B}_j\}_{j=1}^{d^2}\right\} \tag{1.16}$$

This combined "gate set" is the object self-consistent tomography seeks to estimate. Although the vectors are expanded in an unknown basis $|\alpha\rangle\rangle$, it will soon become clear that the indeterminacy of a reference frame is an innate result of the self-consistent method, but does not limit the practical value of the final estimate. The question now is: how do we obtain an estimate for $\Theta$ using the self-consistent IC sets that allows us to make predictions of probabilities for arbitrary sequences? Anticipating an eventual linear inversion to solve for $G_k$, which is probed with sequences involving fiducial gates, it becomes relevant to isolate the sequences consisting only of fiducial gates. To this end, we need to first define the null gate.

Definition 1.4.1. Every tomographic gate set $\Theta$ is required to contain the null gate $\mathcal{G}_0 \equiv \mathcal{I}$ [4].

where $\mathcal{I}$ is the identity. This must not be misinterpreted as a strong constraint; it merely means that a measurement can be performed immediately after state preparation, which is a fair assumption. In fact, without this condition, gates would be ill-defined. The null gate simply marks where one gate ends and the next begins.⁶ The trivial null gate is associated to a tomographic null matrix, which contains nontrivial information about the fiducial states and effects. This is the matrix of the measured probabilities of sequences involving pairs of fiducial gates and the (implicit) null gate, i.e. a measurement immediately following state preparation,

⁵ "Gate set" is a misnomer from the literature since $\mathcal{A}_i$ and $\mathcal{B}_j$ are not actually gates.
⁶ It should not be mistaken for a "do-nothing" or "wait" operation since these can in principle be noisy. Characterizing wait times would thus require including a separate, dedicated gate in the set. Otherwise, it can be neglected if the tomographic sequences are much shorter than the decoherence time of the system.

$(P_0)_{ij} \equiv \langle\langle E_i^0 | \mathcal{G}_0 | \rho_j^0 \rangle\rangle = (A \cdot B)_{ij} \quad (1.17)$

Accordingly, this matrix is indexed by buttons $i, j$ that can be pressed in the lab$^7$, as one would expect for measured probabilities. Note how the unknown basis indices only appear in the invariant scalar product of this operational expression. As opposed to the null gate, the tomographic null matrix can be noisy, since it arises from faulty fiducial gates. In practice, the IC fiducials will have finite error and not be perfectly linearly independent, so $P_0$ might be ill-conditioned [4]. The experimentalist must therefore find appropriate fiducial sets that maximize the eigenvalues of $P_0$. The remaining experimental information is contained in the tomographic gate matrices, indexed by gate buttons $k$,

$(P_k)_{ij} \equiv \langle\langle E_i^0 | \mathcal{G}_k | \rho_j^0 \rangle\rangle = (A \cdot G_k \cdot B)_{ij} \quad (1.18)$

where $k \in \{1, \cdots, K\}$. To obtain an estimate for $\mathcal{G}_k$, we apply the inverse of $P_0$ to the left of $P_k$, which yields $G_k$ up to conjugation by $B$,

$G_k = B \cdot P_0^{-1} \cdot P_k \cdot B^{-1} \quad (1.19)$

The estimate is a function of the measured probabilities $P_0^{-1} \cdot P_k$ and is defined up to a global gauge freedom $B \bullet B^{-1}$. Analogous to the traditional methods described by Eqs. (1.8) and (1.9), we can also perform self-consistent state (effect) tomography by fixing the native effect (state) and applying each of the corresponding fiducial gates to obtain coefficients $(P_\rho)_i$ ($(P_E)_j$),

$(P_\rho)_i \equiv \langle E_i \rangle_{\rho^0} = \langle\langle E^0 | A_i \mathcal{G}_0 | \rho^0 \rangle\rangle = A_i \cdot B_0 \quad\Rightarrow\quad B_0 = B \cdot P_0^{-1} \cdot P_\rho \quad (1.20)$
$(P_E)_j \equiv \langle E \rangle_{\rho_j^0} = \langle\langle E^0 | \mathcal{G}_0 B_j | \rho^0 \rangle\rangle = A_0 \cdot B_j \quad\Rightarrow\quad A_0 = P_E \cdot B^{-1} \quad (1.21)$

where $B_0$ and $A_0$ denote the native-state column of $B$ and the native-effect row of $A$, respectively. Note that these estimates are also determined only up to gauge transformations by $B, B^{-1}$. This gauge freedom is unavoidable: it emerges because in tomography the Born rule is used to retrodict the underlying model rather than to predict probabilities. Instead of naively assuming an IC set of states and effects in an arbitrary reference frame, we self-consistently extract it from a sufficiently large set of estimations for the original gates of interest. Self-consistency is achieved up to the gauge freedom that is fundamentally built into quantum theory via the Born rule [9]. GST characterizes gates relative to each other [1], which, importantly, is sufficient to make predictions, as we will see in the next section. Avoiding the SPAM errors of a naive observer allows us to obtain these relative estimates to much greater accuracy than with traditional process tomography based on ad-hoc assumptions, without compromising the success of the powerful applications discussed in the following section.
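The linear-inversion step of Eqs. (1.17)-(1.19) can be sketched numerically. The following is a minimal single-qubit numpy illustration, not pyGSTi: the fiducial choice and all helper names (`ptm`, `rot`, etc.) are ours, and the fiducials are taken ideal so the inversion is exact.

```python
import numpy as np

# Minimal single-qubit sketch of linear-inversion GST, Eqs. (1.17)-(1.19).
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)
paulis = [I2, X, Y, Z]

def ptm(U):
    """Pauli transfer matrix of the unitary channel rho -> U rho U^dag."""
    return np.array([[np.trace(P @ U @ Q @ U.conj().T).real / 2
                      for Q in paulis] for P in paulis])

def rot(axis, theta):
    """Unitary exp(-i theta axis / 2) for a Pauli axis."""
    return np.cos(theta / 2) * I2 - 1j * np.sin(theta / 2) * axis

# Native state/effect |0><0| as PTM (co)vectors, components Tr(P rho)/sqrt(2).
rho0 = np.array([np.trace(P @ np.diag([1., 0.])).real for P in paulis]) / np.sqrt(2)
E0 = rho0.copy()

# Four fiducial gates making the probed states/effects informationally complete.
fids = [ptm(rot(X, 0)), ptm(rot(X, np.pi / 2)), ptm(rot(Y, np.pi / 2)), ptm(rot(X, np.pi))]
A = np.array([E0 @ F for F in fids])        # rows  A_i = <<E^0| F_i
B = np.array([F @ rho0 for F in fids]).T    # cols  B_j = F_j |rho^0>>

G_true = ptm(rot(X, np.pi / 2))             # the gate to be estimated
P0 = A @ B                                  # Eq. (1.17): null-gate matrix
Pk = A @ G_true @ B                         # Eq. (1.18)
# Eq. (1.19): in the lab B is unknown, so G_k is fixed only up to B * B^{-1}.
G_est = B @ np.linalg.inv(P0) @ Pk @ np.linalg.inv(B)
assert np.allclose(G_est, G_true)
```

Because the sketch knows the frame $B$ exactly, the final assertion recovers the gate itself; in a real experiment only $P_0^{-1} P_k = B^{-1} G_k B$ is accessible.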

1.5 Applications of a gate set estimate

$^7$ Note that $(P_0)_{ij} = \langle\langle E_i^0|\mathcal{G}_0|\rho_j^0\rangle\rangle = \sum_\alpha \langle\langle E_i^0|\alpha\rangle\rangle\langle\langle\alpha|\rho_j^0\rangle\rangle = \sum_\alpha A_{i\alpha}B_{\alpha j} = (A \cdot B)_{ij}$.

1.5.1 Predictions for arbitrary sequences

The most natural application of a GST estimate is to calculate probabilities for arbitrary sequences. An accurate model for a quantum processor enables the prediction of complex quantum algorithms on real hardware. Given an arbitrary sequence of gates $\mathcal{G}_{m_N} \cdots \mathcal{G}_{m_n} \cdots \mathcal{G}_{m_1}$, where $m_n \in \{1, 2, \cdots, K\}$, we can calculate the corresponding probability using the Born rule,

$p(\mathcal{G}_{m_N} \cdots \mathcal{G}_{m_n} \cdots \mathcal{G}_{m_1}) = A_0 \cdot G_{m_N} \cdots G_{m_n} \cdots G_{m_1} \cdot B_0 \quad (1.22)$
$= P_E \cdot P_0^{-1} P_{m_N} P_0^{-1} \cdots P_0^{-1} P_{m_n} P_0^{-1} \cdots P_0^{-1} P_{m_1} P_0^{-1} \cdot P_\rho \quad (1.23)$

where the last line was obtained by substituting the fiducial and gate matrices using Eqs. (1.17) and (1.18). Note that the vectors $P_\rho$ and $P_E$ starting and ending the process turn the sequence of tomographic gate matrices into a number. In line with intuition, the inverse null matrix glues all the steps together. The gauge transformations in Eqs. (1.19) and (1.20) cancel, which demonstrates that the probabilities for arbitrary sequences are only a function of tomographic matrices, i.e., of measured probabilities, as they should be. To verify this explicitly, consider an invertible gauge transformation $S$. Plugging this into the Born rule for the simple case of a single gate $\mathcal{G}_k$ yields,

$\langle\langle E_i^0 | \mathcal{G}_k | \rho_j^0 \rangle\rangle \xrightarrow{S} \langle\langle E_i^0 | S^{-1} \cdot S \mathcal{G}_k S^{-1} \cdot S | \rho_j^0 \rangle\rangle \quad (1.24)$
$= \langle\langle E_i^0 | \mathcal{G}_k | \rho_j^0 \rangle\rangle \quad (1.25)$

Thus, predicting probabilities of arbitrary circuits with the GST estimate is gauge invariant, as one would expect for any physical quantity.
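The cancellation in Eqs. (1.24)-(1.25) is easy to confirm numerically. The following toy check, with illustrative random 4x4 PTM-like objects, verifies that an arbitrary invertible gauge transformation $S$ leaves every sequence probability unchanged:

```python
import numpy as np

# Numerical check of Eqs. (1.24)-(1.25): Born-rule probabilities are invariant
# under an arbitrary invertible gauge transformation S. Toy 4x4 model.
rng = np.random.default_rng(0)
rho = rng.normal(size=4)              # stand-in state vector |rho>>
E = rng.normal(size=4)                # stand-in effect covector <<E|
G1, G2 = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))

S = rng.normal(size=(4, 4))           # generic invertible gauge transformation
Sinv = np.linalg.inv(S)

p = E @ G2 @ G1 @ rho                 # original model
p_gauge = (E @ Sinv) @ (S @ G2 @ Sinv) @ (S @ G1 @ Sinv) @ (S @ rho)
assert np.isclose(p, p_gauge)         # predictions agree for every sequence
```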

1.5.2 Figures of merit and the lack of gauge invariance

The complete characterization provided by GST allows one to study quantities that assess the device's performance relative to target models, i.e. the system's intended, ideal characteristics. Prominent figures of merit are the gate fidelity [3],

$\mathcal{F}(\mathcal{G}_k, \mathcal{E}_k) = \min_\rho \left[ \mathrm{Tr}\sqrt{\sqrt{\mathcal{G}_k(\rho)}\, \mathcal{E}_k(\rho) \sqrt{\mathcal{G}_k(\rho)}} \right]^2 \quad (1.26)$

and the diamond norm [5]$^8$,

$|\mathcal{G}_k - \mathcal{E}_k|_\diamond = \sup_\rho \| ((\mathcal{G}_k - \mathcal{E}_k) \otimes \mathcal{I}_d)(\rho) \|_1 \quad (1.27)$

where $\mathcal{G}_k$ is the GST estimate and $\mathcal{E}_k$ is the ideal, target gate of $\mathcal{G}_k$. Both the gate fidelity and the diamond norm have well-motivated operational meanings in quantum information and serve to distinguish two quantum channels [3]. They are used to define thresholds under certain noise models to achieve FTQEC. The detailed interpretation of these figures of merit is beyond the scope of this report (see [14, chapter 3]). However, the gauge freedom in GST estimates can already shed light on their obscure significance in the context of quantifying performance relative to target models. Eqs. (1.26) and (1.27) are gauge variant because $\mathcal{G}_k$ is only defined up to a global gauge transformation, whereas the target $\mathcal{E}_k$ must be defined by the user in some fixed gauge. In practice, this is the canonical gauge, where $\mathcal{E}_k$ is the superoperator of a CPTP map. Introducing an invertible gauge transformation $S$ does not simply cancel out as it does in the Born rule$^9$. Thus, generally $\mathcal{F}(S\mathcal{G}_k S^{-1}, \mathcal{E}_k) \neq \mathcal{F}(\mathcal{G}_k, \mathcal{E}_k)$ and $|S\mathcal{G}_k S^{-1} - \mathcal{E}_k|_\diamond \neq |\mathcal{G}_k - \mathcal{E}_k|_\diamond$. In practice, $\mathcal{G}_k$ is often expressed in an "optimized" gauge (see section 1.6.1). Since FTQEC figures of merit are not gauge-invariant in this context, they can be artificially tuned by gauge transformations, despite the gauge-transformed sets corresponding to equivalent experimental observations. Therefore, these figures of merit must be interpreted with caution when used with GST estimates.

$^8$ Here the trace norm is $\|A\|_1 = \mathrm{Tr}\sqrt{AA^\dagger}$.

This does not mean that there are no well-defined figures of merit, as pointed out recently [5]: one can quantify the accuracy of a model with respect to a target, as a benchmark for processor improvement, by comparing gauge-invariant probabilities computed with the two models for all possible circuits with $K$ gates, a measure called the mean variation error (MVE),

$\frac{1}{N_K} \sum_{ijk} |p_{ij}(\mathcal{G}_k) - p_{ij}(\mathcal{E}_k)| \quad (1.28)$

normalized by the total number of circuits $N_K$. The sole appearance of device button indices $i, j, k$ indicates that the MVE is gauge-invariant, as verified in section 1.6.2.
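The MVE of Eq. (1.28) can be sketched in a few lines. The toy model below (illustrative names and numbers, not pyGSTi) checks that the MVE is unchanged when the estimate is gauge-transformed as a whole, while a gauge-variant measure such as the Frobenius distance to the target gate is not:

```python
import numpy as np

# Sketch of the mean variation error, Eq. (1.28), on a toy 4x4 PTM model.
def probs(A, B, gates):
    """p_ij for one-gate circuits: effect-fiducial i, gate k, state-fiducial j."""
    return np.array([A @ G @ B for G in gates])

def mve(model_a, model_b):
    return np.mean(np.abs(probs(*model_a) - probs(*model_b)))

rng = np.random.default_rng(1)
A, B = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))
tgt_gates = [np.eye(4)]
est_gates = [np.eye(4) + 0.01 * rng.normal(size=(4, 4))]
target = (A, B, tgt_gates)

S = np.diag([1., 2., 2., 2.])              # a blame-style gauge transformation
Si = np.linalg.inv(S)
est_gauged = (A @ Si, S @ B, [S @ G @ Si for G in est_gates])

# MVE is unchanged when the whole estimated gate set is gauge-transformed...
m1 = mve((A, B, est_gates), target)
m2 = mve(est_gauged, target)
assert np.isclose(m1, m2)
# ...whereas the Frobenius distance of a single gate to its target changes.
d1 = np.linalg.norm(est_gates[0] - tgt_gates[0])
d2 = np.linalg.norm(est_gauged[2][0] - tgt_gates[0])
assert not np.isclose(d1, d2)
```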

Using a simple concrete model, the computational part of this report, section 1.6.2, non-trivially illustrates the above issues pointed out in Ref. [5]. This model consists of a mixed initial state with polarization $\epsilon_\rho$, where $\epsilon_\rho = 1$ and $\epsilon_\rho = 0$ correspond to pure and maximally mixed states respectively, and a noisy projective measurement with signal-to-noise ratio $\epsilon_E$, where the limiting cases $\epsilon_E = 1$ and $\epsilon_E = 0$ distinguish a projection along $z$ from a non-informative measurement (trivial identity). Finally, the model includes a gate $\mathcal{G}$ described by some unitary operation $\mathcal{U}$$^{10}$ with amplitude damping $\mathcal{G}_{AD}$ characterized by damping parameter $\gamma$ [5]. Note that here we do not distinguish between the superoperator and its matrix representation. Moreover, the unitary gate $U$ represents the target, ideal gate, whereas the actual gate is given by $\mathcal{G} = \mathcal{U} \cdot \mathcal{G}_{AD}$, which GST seeks to estimate along with the actual, noisy native state and effect. It is tempting to think that the "noise model", i.e. how $\mathcal{G}_{AD}$ introduces noise into the gate $\mathcal{U}$, "describes the physics". However, this cannot be true since, besides states and their transformations by gates, the Born rule also involves observations. This becomes clear by explicitly working out a gauge transformation:

1 0 0 0 0 q 0 0 =   S 0 0 q 0 0 0 0 q

$^9$ The fidelity and diamond norm can in fact be made gauge-invariant, but the gauge transformation must be consistently applied to both $\mathcal{G}_k$, $\mathcal{E}_k$ and the sets of allowable states and effects [5]. Proving this particular case of gauge invariance goes beyond the scope of this work.
$^{10}$ Note that the superoperator $\mathcal{U}$ is given by the transformation $U \bullet U^\dagger$.

where $q \in \mathbb{R}$. The transformation yields,

$|\rho\rangle\rangle = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 0 \\ 0 \\ \epsilon_\rho \end{pmatrix} \xrightarrow{S} S|\rho\rangle\rangle = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 0 \\ 0 \\ q\epsilon_\rho \end{pmatrix} \quad (1.29)$

$\mathcal{G} = \mathcal{U}\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \sqrt{1-\gamma} & 0 & 0 \\ 0 & 0 & \sqrt{1-\gamma} & 0 \\ \gamma & 0 & 0 & 1-\gamma \end{pmatrix} \xrightarrow{S} S\mathcal{G}S^{-1} = \mathcal{U}\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \sqrt{1-\gamma} & 0 & 0 \\ 0 & 0 & \sqrt{1-\gamma} & 0 \\ q\gamma & 0 & 0 & 1-\gamma \end{pmatrix} \quad (1.30)$

$\langle\langle E| = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 0 & 0 & \epsilon_E \end{pmatrix} \xrightarrow{S} \langle\langle E|S^{-1} = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 0 & 0 & \frac{\epsilon_E}{q} \end{pmatrix} \quad (1.31)$

Two important observations can be made. Firstly, the error is exchanged between the state and the effect, which is why $S$ is called the blame gauge transformation [9]: it trades the blame for the error between the two. Secondly, the pre-transformed amplitude damping PTM in Eq. (1.30) describes energy dissipation from the system, such as spontaneous emission, where $|0\rangle$ is an eigenstate. However, the post-transformed PTM represents generalized amplitude damping, which instead describes $T_1$ relaxation, such as a chain of spins coupled to a thermal bath where the eigenstate of the system is a mixed state [3]. Clearly, the blame gauge transformation alters the noise model describing the system without altering any predictions for the experimental observations actually made.$^{11}$ Furthermore, one can show that all of the elements in the gate set satisfy the CPTP constraints represented by the canonical gauge if $q \in [|\epsilon_E|, 1]$ (see App. 1.B). However, if, for instance, we choose $q = 2$, the gate $G$ is no longer CPTP, whereas the predicted probabilities remain unchanged, positive and normalized. In other words, the canonical gauge is not required to predict physical probabilities. While at first glance this may seem surprising or disturbing, the set of allowable states and effects also changes under the gauge transformation, which compensates for the non-canonical $G$. The trouble with non-canonical matrices is that the physical constraints are very difficult to characterize, in contrast to the canonical gauge, where the mathematics of CPTP maps is characterized by the Kraus form (see Eq. 1.1). Again, this issue arises because in tomography the Born rule is used in reverse (see [7, Fig. 1]), a case which does not normally find its way into standard quantum theory textbooks, despite the gauge freedom being a fundamental property of the Born rule.
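The blame gauge transformation above is straightforward to verify numerically. The sketch below applies $S = \mathrm{diag}(1, q, q, q)$ to the damping part of the model alone (dropping the unitary factor of Eq. (1.30) for brevity); the parameter values match those used later in the text:

```python
import numpy as np

# Blame gauge transformation S = diag(1, q, q, q) on the concrete model of
# Eqs. (1.29)-(1.31). Illustrative sketch; unitary factor omitted for brevity.
eps_rho, eps_E, gamma, q = 0.9, 0.5, 0.1, 0.7

rho = np.array([1, 0, 0, eps_rho]) / np.sqrt(2)
E = np.array([1, 0, 0, eps_E]) / np.sqrt(2)
s = np.sqrt(1 - gamma)
G_AD = np.array([[1, 0, 0, 0],
                 [0, s, 0, 0],
                 [0, 0, s, 0],
                 [gamma, 0, 0, 1 - gamma]])

S = np.diag([1., q, q, q])
Sinv = np.linalg.inv(S)
rho_g, E_g, G_g = S @ rho, E @ Sinv, S @ G_AD @ Sinv

# The blame for error moves between state and effect (Eqs. 1.29, 1.31)...
assert np.allclose(rho_g, np.array([1, 0, 0, q * eps_rho]) / np.sqrt(2))
assert np.allclose(E_g, np.array([1, 0, 0, eps_E / q]) / np.sqrt(2))
# ...the damping PTM's lower-left element picks up the factor q (Eq. 1.30)...
assert np.isclose(G_g[3, 0], q * gamma)
# ...while every predicted probability is untouched.
assert np.isclose(E @ G_AD @ rho, E_g @ G_g @ rho_g)
```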
To conclude this section, it is now clear that although the inherent gauge freedom of the Born rule does not limit the practicality of GST's estimates for predicting probabilities, one must exercise caution in interpreting figures of merit when they are expressed in a "fixed" gauge. The consequences of gauge fixing are explored further with a simulation in the following section, along with some advanced considerations for higher accuracy.

1.6 Gauge fixing simulation with pyGSTi

Owing to the interesting gauge freedom that emerges from the Born rule in tomography, this computational section of the report focuses on a simulation that on the one hand highlights the remarkable performance of GST and on the other hand exposes the dangers of the temptation to circumvent this unavoidable gauge freedom by means of a gauge fixing procedure. The simulation was carried out with a sophisticated implementation of GST, the open-source Python software package pyGSTi$^{12}$ developed by a research group at Sandia National Laboratories.

$^{11}$ Note carefully: the two models differ for arbitrary observations, but not for the observations actually made in the experiment. If the experiment were to perform all possible observations, one could tell the difference.

pyGSTi employs advanced techniques to produce estimates that are not limited by finite sample size, but only by the quantum mechanical model. Moreover, it also attempts to deal with the inherent gauge freedom in GST estimates in order to evaluate "meaningful" figures of merit which can be compared to FTQEC thresholds, despite the ambiguity of these quantities in the context of GST, as discussed in section 1.5.2. The algorithm uses a gauge fixing technique called gauge optimization, which minimizes the error, according to some gauge-variant distance measure, between the estimate and a target model provided by the user. The authors acknowledge that this procedure is indeed "fundamentally a hack" [1]. Nonetheless, they motivate it (as a "good hack") with the following three arguments: (1) gauge optimization does not alter gauge-invariant quantities, i.e. errors on circuit-outcome probabilities are not artificially reduced; (2) gauge transformations can, however, change the error of a given estimated gate with respect to the actual gate of the device (see section 1.5.2), which merits minimization using the available experimental intuition, since the system is specifically designed to behave as close as possible to a given target model; and (3) any alternatively sought gauge fixing procedure essentially reduces to gauge optimization$^{13}$ [1]. With this approach, it was claimed that FTQEC subthreshold error rates were achieved on a single trapped-ion qubit using GST-assisted debugging [8]. However, gauge optimization remains a controversial procedure [5]. Aside from demonstrating a practical realization of GST with high accuracy estimates, the purpose of the simulation presented in this section is to study the interaction of the gauge fixing procedure with gauge-equivalent target models and to support the arguments put forward in section 1.5.2 against reporting FTQEC figures of merit obtained from GST estimates.
In the follow- ing, an overview of the pyGSTi algorithm is first presented, followed by the analysis of the simulation results. The original simulation code is available on GitHub: https: //github.com/lumarcogliese/GST_gaugeFixingSimulation.

1.6.1 Overview of pyGSTi algorithm

As input, the pyGSTi algorithm first requires a target model that captures the intended behaviour of the system, consisting of an explicit noise model with a native state $|\rho\rangle\rangle$, an effect $\langle\langle E|$ and a set of gates $\mathcal{G}_1, \cdots, \mathcal{G}_K$. With a user-specified maximal circuit length $L$, it then creates an experiment design with the appropriate list of circuits, including fiducial gate strings, for which experimental or simulated data must be acquired. With this information, pyGSTi can obtain high accuracy estimates via the following two main steps [8]:

$^{12}$ The software and all relevant documentation can be found at: https://www.pygsti.info/.
$^{13}$ The last argument only motivates this specific form of gauge fixing.

1. Long-sequence GST

• Gauge optimization ← target gate set
• (CPTP contraction)

2. Maximum likelihood estimation (MLE)

• (TP-constrained) gauge optimization ← target gate set

Both steps involve high accuracy estimation protocols, gauge optimization steps and the possibility of imposing physical constraints. Each step is broken down in more detail in the following two subsections.

Long-sequence GST

The first step refers to an extended version of the linear GST protocol outlined in section 1.4, which greatly improves accuracy by considering additional, longer sequences. Using this technique, the sampling error barrier, which scales as $\mathcal{O}(1/\sqrt{N})$ with sample size $N$, can be overcome. This is necessary since the number of samples becomes experimentally impractical with increasing qubit register size (see App. 1.D). It turns out that, for random sequences of length $L$, the estimation error scales as $\mathcal{O}(1/\sqrt{LN})$ [1]. This is intuitive, since considering additional, random sequences translates to collecting more data points, hence a larger sample size. However, the improvement is still limited and begs the question: is there a clever alternative to random sequences? Long-sequence GST answers this by increasing sensitivity to errors $\epsilon$ with repeated sequences of gates called germs, $g_m = \mathcal{G}_{m_N} \cdots \mathcal{G}_{m_1}$, $m_n \in \{1, 2, ..., K\}$, where each gate is linearly parametrized to enable data fitting [1]. When each gate is linearly parametrized as a function of the error $\epsilon$, the germ is given by a non-trivial linear combination of its constituent gates' free parameters. The method is called long-sequence GST because it involves raising these germs to the power $\ell$, resulting in carefully crafted long sequences which amplify a germ's eigenvalues. Diagonalizing the germ $g_m(\epsilon) = Q_m(\epsilon)\Lambda_m(\epsilon)(Q_m(\epsilon))^{-1}$ and using the Born rule, we obtain,

$P_{m\ell} = A \cdot [Q_m(\epsilon)\Lambda_m(\epsilon)(Q_m(\epsilon))^{-1}]^\ell \cdot B$
$B \cdot P_0^{-1}P_{m\ell} \cdot B^{-1} = Q_m(\epsilon)(\Lambda_m(\epsilon))^\ell(Q_m(\epsilon))^{-1}$

The outer factors on the left- and right-hand sides represent a gauge freedom and a change of basis with the matrix $Q_m(\epsilon)$, whose column vectors are eigenvectors of $g_m(\epsilon)$. Note that the entries of the diagonal eigenvalue matrix $\Lambda_m(\epsilon)$, whose eigenvalues are a non-trivial function of $\epsilon$, are amplified to the power of $\ell$, effectively increasing the sensitivity to $\epsilon$. If the amplified quantity $\epsilon L$ can be estimated, where $L$ is the total length of the germ power sequence, then the estimation error on $\epsilon$ scales as $\mathcal{O}(1/(L\sqrt{N}))$ [1]. By increasing $\ell$ (and therefore $L$), limited only by the decoherence time of the system, the error breaks the sampling error barrier $\mathcal{O}(1/\sqrt{N})$, since it is experimentally less complicated to increase $L$ than $N$. For further details on long-sequence GST and the types of errors that can be amplified, see App. 1.E.

The long-sequence GST estimate is obtained by fitting a parametrized model $\Theta \in \mathcal{M}$ to the long-sequence data, with $\mathcal{M}$ denoting the fitting domain of parametrized statistical models. Since gates appear more than once in a sequence in long-sequence GST, each tomographic matrix element is fitted with a non-trivial function of a gate's free parameters; the parametrized model is therefore nonlinear. Next, gauge optimization is performed, whereby the Frobenius distance between the long-sequence GST estimate and the user-specified target model is minimized. The Frobenius distance is chosen over the gate fidelity and the diamond distance because the former does not produce unique minima, while the Frobenius distance behaves quite similarly to the diamond distance within gauge-equivalent sets and is computationally cheaper$^{14}$ [1]. Then, the user can optionally apply CPTP constraints, which project the gauge-optimized gate set to the nearest CPTP model in parameter space.
The full details of this step are beyond the scope of this report and are not required to interpret the results (see [1, section V]). Before the second, maximum likelihood estimation (MLE) step, there is in fact an intermediate, iterative $\chi^2$ estimation serving as a seed for MLE, since it is computationally less expensive. However, it has the disadvantage of being a slightly biased estimator, unlike the more robust MLE [8]. The $\chi^2$ step therefore acts merely as a computational speedup, with MLE performed just once at the end. Hence, we hereafter disregard it and focus the discussion on MLE.
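The germ-amplification mechanism described above can be illustrated with a toy numerical example (illustrative, not pyGSTi): a slightly over-rotated $X_{\pi/2}$ gate, repeated $\ell$ times, turns an angle error $\epsilon$ into $\ell\epsilon$ in the eigenvalue phases of the germ power.

```python
import numpy as np

# Toy germ amplification: repeating an over-rotated X(pi/2) PTM ell times
# amplifies the rotation-angle error eps to ell*eps in the eigenvalue phases.
def ptm_rx(theta):
    """PTM of a rotation about x: a rotation block in the (Y, Z) plane."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1, 0, 0, 0],
                     [0, 1, 0, 0],
                     [0, 0, c, -s],
                     [0, 0, s, c]])

eps, ell = 1e-3, 64                   # ell chosen so ell*pi/2 = 0 mod 2*pi
G = ptm_rx(np.pi / 2 + eps)
germ_power = np.linalg.matrix_power(G, ell)

phase = lambda M: np.max(np.angle(np.linalg.eigvals(M)))
dev_single = phase(G) - np.pi / 2     # single-gate sensitivity: eps
dev_power = phase(germ_power)         # ideal phase is 0, so deviation is ell*eps
assert np.isclose(dev_single, eps)
assert np.isclose(dev_power, ell * eps)
```

The diagnostic is a factor-$\ell$ gain in sensitivity, which is why increasing $\ell$ is experimentally cheaper than increasing the sample size $N$.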

Maximum likelihood estimation

A likelihood function $\mathcal{L}$ quantifies how well a parametrized model $\Theta \in \mathcal{M}$ reproduces a given data set $D$ [15]. In other words, the likelihood that $D$ was produced by $\Theta$ is written as $\mathcal{L}(\Theta) = p(D|\Theta)$$^{15}$. Maximizing $\mathcal{L}$ fits the parametrized model to the data set. For a two-outcome experiment and a given sequence $s$, the likelihood function and the associated maximization problem read [8],

$\max_{\Theta \in \mathcal{M}} \log\mathcal{L}(\Theta) = \max_{\Theta \in \mathcal{M}} N \sum_s \left[ f_s \log p_s(\Theta) + (1 - f_s)\log(1 - p_s(\Theta)) \right] \quad (1.32)$

where $N$ is the number of samples, $p_s$ is the probability of measuring a given outcome for sequence $s$ predicted by model $\Theta$, and $f_s$ is the associated observed frequency. It is easily verified that the expression is maximized when $p_s = f_s$ for all $s$. Since we disregard correlated errors, the model's free parameters are independent and identically distributed random variables, such that the likelihood function is locally asymptotically normal [15], meaning that it can be asymptotically approximated by a normal distribution for a large sample of experiments. To facilitate optimization, the logarithm of $\mathcal{L}$ is usually taken in order to obtain a function that is linear in $p_s$. In the case of traditional tomography, each gate appears in a given sequence only once. Therefore, $\Theta$ would be a linear function of its parameters, making the maximization problem convex, with $\log\mathcal{L}(\Theta) \to \log(\mathcal{N}(\mu,\sigma)) = -\left(\frac{\Theta-\mu}{\sigma}\right)^2$ having a unique maximum and admitting polynomial-time solvers. In GST, although the logarithm does reduce the complexity to some degree, gates appear in $s$ more than once; thus the problem remains nonlinear, can be extremely non-convex and is generally NP-hard (for details on computational complexity, see [3, chapter 3.2]). Despite this complexity, MLE is a well-motivated statistical estimator that has been demonstrated to produce high accuracy GST estimates; it is justified further in App. 1.F. Finally, as a very last subroutine, another gauge optimization is performed, where the user can choose to apply TP constraints.
Again, the TP constraints are a simple linear equality and are much simpler than the CPTP contraction. Note that imposing the CPTP constraints after long-sequence GST and the TP constraints after MLE are mutually exclusive options.

$^{14}$ Calculating the diamond distance requires solving semidefinite programs.
$^{15}$ The likelihood function is not to be confused with the opposite case, where the distribution $p(\Theta|D)$ predicts the probability of obtaining a data set with a fixed model.
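The claim that Eq. (1.32) is maximized at $p_s = f_s$ can be checked directly. The following minimal sketch (illustrative numbers) evaluates the two-outcome log-likelihood of a single sequence on a grid and locates its maximum:

```python
import numpy as np

# Two-outcome log-likelihood of Eq. (1.32) for a single sequence s:
# N [ f log p + (1 - f) log(1 - p) ], maximized exactly at p = f.
N, f = 1000, 0.37
p_grid = np.linspace(1e-6, 1 - 1e-6, 100001)
logL = N * (f * np.log(p_grid) + (1 - f) * np.log1p(-p_grid))
p_hat = p_grid[np.argmax(logL)]
assert abs(p_hat - f) < 1e-4
```

With many sequences the terms decouple, so the unconstrained maximum is $p_s = f_s$ for every $s$; the hardness discussed above enters only because the model $\Theta$ cannot reach arbitrary $p_s$ independently.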

1.6.2 Simulation results and analysis

To probe the response of the algorithm to gauge-equivalent target models, data was simulated with the noise model considered in the blame gauge transformation example in section 1.5.2. As a reminder, this involved a mixed initial state, a noisy projective measurement and an amplitude-damped unitary gate $\mathcal{U} \cdot \mathcal{G}_{AD}$. For this simulation, the gate set consisted of two ideal, unitary gates $I$, $X_{\pi/2}$ and one amplitude-damped rotation $Y'_{\pi/2} \equiv Y_{\pi/2} \cdot \mathcal{G}_{AD}$. Under the blame gauge transformation, only the native state, the native effect and the non-unital $Y'_{\pi/2}$ gate acquire a $q$ dependence. The gate set $\Theta_q$ is then given by,

  1    1 0 0 0        0  qγ 0 0 1 γ     √1   0   √1 E Θq = ρq =   , I,Xπ/2,Y (q) =  −  , Eq = 1 0 0 (1.33) | ii 2  0  π/2  0 0 √1 γ 0  hh | 2 q   −    qρ  0 √1 γ 0 0   − − 0 where the (2, 1) matrix element of Yπ/2 is boxed since it scales linearly with blame gauge parameter q and will be used to assess the estimate in the subsequent analysis. The value of q was varied within the range [  , 1] for which Θ predicts positive, normalized | E | q probabilities and each of these gauge transformed models was fed to the algorithm as a target model. Since probabilities are gauge-invariant, the input data remained constant and only the target model was varied. The input parameters were γ = 0.1, ρ = 0.9, E = 0.5, N = 1000 and L = 32. With these values, we obtain a near-pure state and relatively non- informative measurement such that the experiment is intentionally unbalanced in blame for error between preparation and detection. As a reference, the difference between estimate and actual gates in the PTM representation similar to Fig 1.4 are included in App. 1.G. For the sake of the argument, it is assumed that the "actual" noise model of the (sim- ulated) physical system is described by the model for which q = 1, thereby mimicking a situation where the experimentalist assumes the wrong target model, with gauge-equivalent noise models of different q. In the previous sections, the target model was always defined to be the ideal, intended gate set whereas here, the target set includes noise contributions (in the gates, not sampling noise) based on the experimentalist’s intuition. Whether such intuition-inspired gauge fixing adds any value to the estimate is precisely the question we seek to answer. In practice, the actual noise model and the target model might not be connected by a gauge transformation which would generate additional errors and pro- duce convoluted results ultimately complicating the analysis. 
For this reason, the former scenario was selected to isolate the effect of gauge fixing.

The TP- and CPTP-constrained $Y'_{\pi/2}(q)$ estimates were compared to the $Y'_{\pi/2}(q = 1)$ target (the "actual" noise model of the physical system) using the Frobenius distance, plotted below in Fig. 1.5. The distances between the gauge-equivalent target models $Y'_{\pi/2}(q)$ and the $Y'_{\pi/2}(q = 1)$ target model are also plotted in yellow as a reference. We observe that the TP-constrained estimates are entirely biased by the target model, limited only by the estimation error on the order of $\mathcal{O}(1/\sqrt{N}) = \mathcal{O}(10^{-2})$.$^{16}$ The "success" of the gauge optimization is due to the simplicity of the TP constraint, a linear equality that merely corresponds to fixing the first row of the gate matrix to $(1\ 0\ \cdots\ 0)$. Conversely, the distance from the CPTP-constrained estimates to the $q = 1$ target is nearly invariant in $q$. The estimate also has a notably large variance, which can be qualitatively explained by the topography of the likelihood function. pyGSTi uses a fully parametrized model that is nonlinear due to long-sequence GST, which causes the likelihood function to oscillate in parameter space [1], forming an egg-carton-like profile with many local maxima. Taking gauge freedom into account, each of these local maxima traces out a gauge orbit of equal likelihood. Through an earlier example (see section 1.5.2), it was shown that gauge transformations can produce gates that do not satisfy the CPTP constraints without altering the probabilities predicted by the Born rule. This means that the CPTP constraints break the gauge symmetry in parameter space and induce more local maxima within these gauge orbits. Each time CPTP constraints are applied within a gauge optimization step with respect to a new target model, the estimate can be projected to different points on the CPTP boundary in parameter space that lie near different local maxima. These distinct maxima produce the same likelihood, and therefore the same probabilities with the Born rule, but can yield a vast range of gauge-equivalent matrices that are not necessarily close to the "actual" model.

$^{16}$ In this case, long-sequence GST does not break the sampling error barrier, since the system contains non-negligible decoherent errors ($\gamma = 0.1$).

Fig. 1.5: Frobenius distance between $Y'_{\pi/2}(q)$ estimates and the "actual" damped $Y'_{\pi/2}(q = 1)$ gate as a function of blame gauge parameter $q$.

Fig. 1.6: Matrix element $(Y'_{\pi/2})_{21}$ as a function of blame gauge parameter $q$.

In Fig. 1.6, the (2,1) matrix element of the $Y'_{\pi/2}$ estimate is plotted together with the target model value $q\gamma$. Once again, we observe complete biasing of the TP-constrained estimates across the range of blame gauge parameters. Meanwhile, there is a clear discrepancy for $q > 0.8$ between the CPTP estimate and the expected $q\gamma$ dependence. Interestingly, the error becomes worse near $q = 1$. One possible explanation is the following: the upper bound of the CPTP range of $q$ values stems from the restriction on gates with $d^4 - d^2$ free parameters$^{17}$, whereas the lower bound is derived from the positivity and normalization constraint on effects with $d^2$ parameters (see App. 1.D and 1.B). The physical constraints delimit a boundary in parameter space with a dimension equal to the number of free parameters. For example, for a qubit density matrix with 3 free parameters, the positivity and unit-trace condition constrains the Bloch vector to $|\vec{n}| \leq 1$, which amounts to a 3-dimensional boundary, the Bloch sphere. In GST, unconstrained estimates are normally distributed. However, any unconstrained estimate that lies outside the physical space will be projected to the nearest point on the boundary (see [1, section 5]). If the upper and lower bounds are represented by constraints with different dimensions in parameter space, then the respective collapsed estimate distributions will be skewed differently. Namely, the higher-dimensional boundary at $q = 1$ will shift the mean values of the individual parameters more than at $q = 0.5$ (see Fig. 1.10), and the resulting difference between the target model's (2,1) matrix element, located at the boundary in parameter space for $q = 1$, and that of the estimate will increase. The next question is: are the probabilities actually positive and normalized, and how exactly do they change with $q$ if the target models are all gauge-equivalent?

$^{17}$ TP constraints reduce the number of free parameters by $d^2$, since the first matrix row is given by a fixed equality.

Ideally, we would expect no change, but to verify this, we use the gauge-invariant measure, the mean variation error given by Eq. (1.28), between the estimates and the target models for each constrained case, plotted in Fig. 1.7.

Fig. 1.7: Mean variation error between estimates and target model as a function of blame gauge parameter $q$.

All the probabilities were verified to be positive and normalized in both cases. Indeed, the MVE values remain virtually constant. According to this gauge-invariant metric, the CPTP-constrained estimates actually perform better than the TP-constrained estimates. This is not inconsistent with the previous analysis. Although the TP-constrained $Y'_{\pi/2}(q)$ estimate matrix was nearer to the target gate matrix, the accuracy of the predicted probabilities does not rely on the proximity of individual estimate matrices to their targets in a fixed gauge, because the probabilities depend on all the estimate matrices simultaneously. In other words, the MVE is determined by the distance to the maximum of the likelihood function with respect to all parameters, whereas a shift within a particular parameter subspace can in principle be compensated by a shift in another. Furthermore, the MVE depends on the native state, the native effect and all the gates in the set; therefore, the impact of the CPTP contraction on the gate set $\Theta$ must be considered as a whole. While a gauge-variant measure indicates that CPTP constraints reduce estimate accuracy, the gauge-invariant MVE curiously reveals the contrary in this particular example.

To summarize, GST represents a major practical step forward in the estimation of predictive models for quantum computers. It also highlights the interesting gauge freedom that emerges when using the Born rule "in reverse" to determine the model responsible for observed probabilities. The numerical simulation demonstrates GST's ability to self-consistently extract a predictive model with high accuracy, but also that gauge optimization can be completely guided by the target model, which expresses the desires of the experimentalist rather than the realities of the experiment. Quantifying quantum hardware performance with GST-based metrics is becoming increasingly prevalent and warrants further theoretical work: (1) the re-evaluation of standard figures of merit, (2) the development of new gauge-invariant ones and (3) a critical assessment of gauge fixing procedures. As the only algorithm capable of fully characterizing a quantum computer, GST holds promise as a powerful tool to set valuable benchmarks in quantum hardware design.

1.7 Conclusion

GST is an advanced quantum algorithm that estimates a model for a quantum device's set of available gates based on observed probabilities. By treating the device as a black box, GST can reconstruct the gate set of a quantum computer, in principle regardless of the hardware platform. The estimated model can predict arbitrary circuit statistics and be used to calculate figures of merit which quantify the performance of a quantum computer. GST solves the self-referentiality problem associated with traditional quantum tomography by self-consistently introducing informationally complete sets of fiducial gates, constructed from the original gate set, which determine the elements of the gate set relative to each other, i.e., up to a global gauge freedom. The gauge freedom that emerges naturally when applying the Born rule in reverse implies, firstly, that there is no need to use the canonical gauge to obtain positive and normalized probabilities, since GST estimates a model in an unknown gauge. Secondly, it highlights that quantum theory forbids the explicit extraction of a noise model describing the device in a specific gauge. The software package pyGSTi is a sophisticated implementation of GST that can estimate high accuracy models. In the spirit of calculating commonly reported, gauge-variant figures of merit, it also attempts to deal with the gauge freedom via a gauge fixing procedure, which optimizes the estimate with respect to a gauge-fixed target gate set. However, the resulting estimate can be entirely guided by the target model; thus gauge optimization remains a controversial technique. Gauge freedom aside, pyGSTi achieves hyperaccuracy scaling owing to the long-sequence GST protocol, meaning that estimate accuracy is not limited by finite sample size but only by the quantum mechanical model. Additional precision can be gained with sophisticated estimators such as maximum likelihood estimation.
The practicality of GST meanwhile suffers from its scaling with qubit register size, exploding for more than 2 qubits. Despite this shortcoming, no other known algorithm can characterize the individual gate errors even for a single qubit, making GST a unique and indispensable tool for any quantum computer development endeavor.

App. 1.A Example: Pauli transfer matrix representation

Consider the single-qubit mixed state with polarization $\epsilon_\rho$,

$$\rho = \frac{1}{2}\left(I + \epsilon_\rho Z\right) = \frac{1}{2}\begin{pmatrix} 1+\epsilon_\rho & 0 \\ 0 & 1-\epsilon_\rho \end{pmatrix} \qquad (1.34)$$

The state can be written in the PTM representation using Eq. (1.3),

$$(|\rho\rangle\rangle)_1 = \frac{1}{\sqrt{2}}\operatorname{Tr}(\rho I) = \frac{1}{\sqrt{2}} \qquad (1.35)$$

$$(|\rho\rangle\rangle)_2 = \frac{1}{\sqrt{2}}\operatorname{Tr}(\rho X) = \frac{1}{2\sqrt{2}}\operatorname{Tr}\begin{pmatrix} 0 & 1+\epsilon_\rho \\ 1-\epsilon_\rho & 0 \end{pmatrix} = 0 \qquad (1.36)$$

$$(|\rho\rangle\rangle)_3 = \frac{1}{\sqrt{2}}\operatorname{Tr}(\rho Y) = \frac{1}{2\sqrt{2}}\operatorname{Tr}\begin{pmatrix} 0 & -i(1+\epsilon_\rho) \\ i(1-\epsilon_\rho) & 0 \end{pmatrix} = 0 \qquad (1.37)$$

$$(|\rho\rangle\rangle)_4 = \frac{1}{\sqrt{2}}\operatorname{Tr}(\rho Z) = \frac{1}{2\sqrt{2}}\operatorname{Tr}\begin{pmatrix} 1+\epsilon_\rho & 0 \\ 0 & -(1-\epsilon_\rho) \end{pmatrix} = \frac{\epsilon_\rho}{\sqrt{2}} \qquad (1.38)$$
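These traces are easy to check numerically. The following sketch (plain numpy; the polarization value 0.3 is an arbitrary example) builds $\rho$ and computes the four PTM vector elements of Eqs. (1.35)–(1.38):

```python
import numpy as np

# Pauli matrices
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def ptm_vector(rho):
    """PTM column vector |rho>>: entries Tr(rho P)/sqrt(2) for P in {I, X, Y, Z}."""
    return np.array([np.trace(rho @ P).real / np.sqrt(2) for P in (I2, X, Y, Z)])

eps_rho = 0.3                      # arbitrary example polarization
rho = 0.5 * (I2 + eps_rho * Z)     # Eq. (1.34)
print(ptm_vector(rho))             # [1/sqrt(2), 0, 0, eps_rho/sqrt(2)]
```
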

App. 1.B Physical constraints on gate set under the blame gauge transformation

Native state and measurement: $\rho = \frac{1}{2}(\mathbb{1} + \vec{n}_\rho\cdot\vec{\sigma})$ and $E = \frac{1}{2}(\mathbb{1} + \vec{n}_E\cdot\vec{\sigma})$, where $|\vec{n}_\rho| \le 1$ and $|\vec{n}_E| \le 1$.

$$|\rho\rangle\rangle = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 0 \\ 0 \\ q\epsilon_\rho \end{pmatrix} \;\Rightarrow\; |q\epsilon_\rho| \le 1$$

$$\langle\langle E| = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 0 & 0 & \epsilon_E/q \end{pmatrix} \;\Rightarrow\; |\epsilon_E| \le |q|$$

Damped $Y'_{\pi/2}$ gate: positive semidefinite Choi matrix [4].

$$\rho_{Y'_{\pi/2}} = \frac{1}{d^2}\sum_{ij}\left(Y'_{\pi/2}\right)_{ij} P_i \otimes P_j^T$$

$$\rho_{Y'_{\pi/2}} = \frac{1}{4}\begin{pmatrix} 1 & \sqrt{1-\gamma} & \gamma(q-1)+1 & -\sqrt{1-\gamma} \\ \sqrt{1-\gamma} & 1 & \sqrt{1-\gamma} & \gamma(q+1)-1 \\ \gamma(q-1)+1 & \sqrt{1-\gamma} & 1 & -\sqrt{1-\gamma} \\ -\sqrt{1-\gamma} & \gamma(q+1)-1 & -\sqrt{1-\gamma} & 1 \end{pmatrix}$$

with eigenvalues,

$$\lambda_{1,2} = \frac{\gamma}{4}\left(1 \pm q\right) \;\Rightarrow\; |q| \le 1$$

$$\lambda_{3,4} = \frac{1}{4}\left(2 - \gamma \pm \sqrt{(q\gamma)^2 - 4\gamma + 4}\right) \;\Rightarrow\; |q| \le 1$$
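The positivity bound can be verified numerically. The sketch below (plain numpy; the values of $\gamma$ and $q$ are arbitrary) builds the Choi matrix of the damped $Y'_{\pi/2}$ gate and checks that its spectrum matches the closed-form eigenvalues, confirming positive semidefiniteness for $|q| \le 1$:

```python
import numpy as np

def choi_damped_y(gamma, q):
    """Choi matrix of the damped Y'_{pi/2} gate in the blame gauge."""
    a = np.sqrt(1 - gamma)
    b = gamma * (q - 1) + 1
    c = gamma * (q + 1) - 1
    return 0.25 * np.array([[ 1,  a,  b, -a],
                            [ a,  1,  a,  c],
                            [ b,  a,  1, -a],
                            [-a,  c, -a,  1]])

gamma, q = 0.2, 0.7
numeric = np.sort(np.linalg.eigvalsh(choi_damped_y(gamma, q)))
root = np.sqrt((q * gamma)**2 - 4 * gamma + 4)
closed_form = np.sort([gamma * (1 + q) / 4, gamma * (1 - q) / 4,
                       (2 - gamma + root) / 4, (2 - gamma - root) / 4])
print(np.allclose(numeric, closed_form), numeric.min() >= -1e-12)   # True True
```
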

App. 1.C Overcomplete data with least squares

As with most statistical inference methods, the accuracy of an estimate is a function of sampling error, which scales as $\mathcal{O}(1/\sqrt{N})$ where N is the number of samples. Besides repeating the same experiment many times, one can also measure outcomes of sequences involving more than the minimal IC set of fiducial states and effects. Handling this overcomplete data increases accuracy and can help verify self-consistency. The following theory follows the derivation from [1, section 3.C].

Overcomplete data involves acquiring statistics for circuits with sets of $D_A, D_B > d^2$ fiducial gates $A_i$ and $B_j$, such that the dimensions of $A$ and $B$ are $D_A \times d^2$ and $d^2 \times D_B$, respectively. Consequently, the tomographic matrices $P_0$ and $P_k$ inherit the dimensions $D_A \times D_B$. Least squares is among a number of algorithms used to solve overcomplete linear systems. Applying the left and right pseudoinverses of $A$ and $B$ to $P_k$, we obtain the estimate,

$$G_k = (A^T A)^{-1} A^T \cdot P_k \cdot B^T (B B^T)^{-1} = B B^T \left(B P_0^T P_0 B^T\right)^{-1} B P_0^T \cdot P_k \cdot B^T (B B^T)^{-1}$$

where we have used $A = P_0 \cdot B^T \cdot (B B^T)^{-1}$. We cannot directly make use of this result since it contains $B$ matrices sandwiched between tomographic matrices. Defining $B = B_0 \Pi$, with $B_0$ of dimension $d^2 \times d^2$ and projector $\Pi$ of dimension $d^2 \times N_B$, remedies the situation,

$$G_k = B_0 \left(\Pi P_0^T P_0 \Pi^T\right)^{-1} \left(\Pi P_0^T P_k \Pi^T\right) B_0^{-1}$$
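As a sanity check of the pseudoinverse identities used above, the following sketch recovers a gate from synthetic overcomplete tomographic matrices (the random $A$, $B$, and $G$ are illustrative; in a real experiment $A$ and $B$ are of course unknown, so only the gauge-equivalent combination is accessible):

```python
import numpy as np

rng = np.random.default_rng(0)
d2, DA, DB = 4, 7, 7                 # d^2 = 4 (one qubit), overcomplete fiducial sets

G = rng.normal(size=(d2, d2))        # synthetic "true" gate superoperator
A = rng.normal(size=(DA, d2))        # fiducial prep matrix, full column rank
B = rng.normal(size=(d2, DB))        # fiducial measurement matrix, full row rank

P0 = A @ B                           # tomographic matrix without the gate
Pk = A @ G @ B                       # tomographic matrix containing the gate

# Left/right pseudoinverses: pinv(A) = (A^T A)^{-1} A^T, pinv(B) = B^T (B B^T)^{-1}
Gk = np.linalg.pinv(A) @ Pk @ np.linalg.pinv(B)
print(np.allclose(Gk, G))            # True
```
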

Lastly, the least squares cost function, the residuals between observed frequencies and predicted probabilities, must be minimized: $\|P_0 - AB\|^2 = \|P_0(\mathbb{1} - \Pi^T\Pi)\|^2$. The projector $\Pi$ onto the directions with the largest singular values of $P_0$ minimizes this expression [1].

App. 1.D Algorithm scaling analysis

When discussing a quantum algorithm, the same question often arises: how does it scale with qubit register size? The minimal case, standard TP-constrained linear GST for a quantum processor with K gates, involves estimating $N_p$ total free parameters [1],

$$N_p = K N_G + N_\rho + N_E$$

where $N_G, N_\rho, N_E$ are the numbers of free parameters for gates, states and effects, respectively. Gates generally have $d^2 \times d^2$ parameters, but the TP constraint forces the first row of $G_k$ to be $[1, 0, \ldots, 0]$, fixing $d^2$ parameters. A native state has $d^2$ total parameters, but the first element of $|\rho\rangle\rangle$ must be $\frac{1}{\sqrt{d}}$ (1 parameter). A native effect has $d^2$ parameters, and the completeness relation (TP constraint) does not fix any parameters, since none of the POVM's other effects are known. The total number of free parameters is then,

$$N_p = K\left(d^2 \cdot d^2 - d^2\right) + d^2 - 1 + d^2 = K d^4 - (K-2)d^2 - 1$$

For example, given K = 3 gates, $d = 2^2 = 4$ (two qubits) and a sample size of N = 2000 per circuit for an error of < 10%, roughly $1.5 \cdot 10^6$ experiments are required. This is without considering overcomplete data or long-sequence GST! Clearly, GST is experimentally impractical for more than 2 qubits.
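The parameter count above translates directly into experiment counts. A minimal sketch (the 2000-sample figure is the one quoted in the text):

```python
def gst_free_parameters(K, n_qubits):
    """N_p = K*d^4 - (K-2)*d^2 - 1 for TP-constrained linear GST."""
    d = 2 ** n_qubits
    return K * d**4 - (K - 2) * d**2 - 1

samples_per_circuit = 2000
for n in (1, 2, 3):
    Np = gst_free_parameters(K=3, n_qubits=n)
    print(n, Np, Np * samples_per_circuit)
# 2 qubits: Np = 751, i.e. ~1.5e6 experiments at 2000 samples each
```
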

App. 1.E Long-sequence GST: commuting errors and amplificational completeness

Not all errors can be amplified using long-sequence GST. Decoherence errors that involve damping, for example, further suppress errors instead of amplifying them, essentially draining the system of information. For long-sequence GST to succeed in increasing estimation accuracy, the errors must induce perturbations in the germ $g_m$, which can be written as $\epsilon\nabla_{\vec{\theta}}\, g_m$ where $\vec{\theta}$ are the free parameters of $\Theta$, that commute with $g_m$, such that $\lim_{\ell\to\infty}\Lambda^\ell(\epsilon)$ is a function of both $\ell$ and $\epsilon$. In addition, it is possible to construct an amplificationally complete set of germs if this set amplifies all gate set parameters. The interested reader is encouraged to consult the proofs of these statements, beyond the scope of the present work, in [1, Appendix C]. We can consider two examples of coherent errors to illustrate the concept [1]. Consider an over-rotation in X by angle $\epsilon$. The over-rotation matrix has eigenvalues $\lambda_{1,2} = 1$ and $\lambda_{3,4} = e^{\pm i\epsilon}$. Note how the eigenvalues $\lambda_{3,4}$ are a function of $\epsilon$ if one chooses suitable powers $\ell$, such as a logarithmic scaling to circumvent periodicity. Also, rotation matrices about a fixed axis always commute, so we can build a germ $g = X_\pi X_\epsilon$ and raise it to the power $\ell$,

1 0 0 0  ` ` ` 0 1 0 0  (Xπ()) = X X =   (1.39) π ·  0 0 ( 1)` cos(`)( 1)`+1 sin(`) − − 0 0 ( 1)` sin(`)( 1)` cos(`) − − For small over-rotation errors , this germ can successfully amplify parameters above a given sampling noise level with an appropriate set of powers. Tilt errors, are more complex. Consider an Xπ/2 rotation around a slightly wrong axis 1 π  π  described by the operation cos 4 i(cos()σx + sin()σy) sin 4 . The corresponding − π 0 ±i 2 superoperator Xπ/2 in the PTM representation has eigenvalues λ1,2 = 1 and λ3,4 = e 0 4 that do not depend on . Notice that (Xπ/2) = I, thus there are no powers ` that can 0 0 arbitrarily amplify small values of  for g = Xπ/2. We can instead construct g = Xπ/2Yπ/2 with eigenvalues, 1 1 p λ = x2 x (√x 1) x2 2x 3 (1.40) 3,4 − − 2 ± 2 ∓ − − where x = sin(). Since λ3,4 are a function of , the latter can be amplified by germ powers.

App. 1.F Motivation for maximum likelihood estimation

MLE can be motivated by three main arguments: (1) For global maxima within the space of models that satisfy CPTP constraints, the Bayesian estimator (most accurate) can be approximated by the maximum likelihood estimator [13]. (2) It has a deeper statistical foundation. With a simple transformation (see Ref. [6]), one can show that MLE is in fact equivalent to minimizing the relative Shannon entropy between the observed frequencies and the probabilities predicted by the model $\Theta$. The common Shannon entropy of a single random variable represents the amount of information/data required to accurately describe it, whereas the relative entropy is simply a measure of the closeness of two distributions. (3) It allows one to impose CPTP constraints, unlike in linear GST. Thus, with a fully parametrized model seeded close enough to the actual distribution by a long-sequence GST estimate, MLE can achieve high accuracy by exploring the neighbourhood of this estimate in parameter space, guided by a sophisticated, statistically meaningful estimator.
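The equivalence in point (2) between maximizing the likelihood and minimizing the relative entropy is easy to illustrate for a single two-outcome circuit. A toy sketch (the counts are made up, and a simple grid search stands in for a real optimizer):

```python
import numpy as np

counts = np.array([370, 630])                # hypothetical measurement record
freqs = counts / counts.sum()

def neg_log_likelihood(p):
    return -(counts[0] * np.log(p) + counts[1] * np.log(1 - p))

def relative_entropy(p):                     # D(freqs || model probabilities)
    model = np.array([p, 1 - p])
    return float(np.sum(freqs * np.log(freqs / model)))

grid = np.linspace(0.01, 0.99, 9801)
p_mle = grid[np.argmin([neg_log_likelihood(p) for p in grid])]
p_kl = grid[np.argmin([relative_entropy(p) for p in grid])]
print(p_mle, p_kl)                           # both equal the observed frequency 0.37
```
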

App. 1.G Target and estimate gate PTMs

[Figure: PTM panels for the gates $X_{\pi/2}$ and $Y'_{\pi/2}$; columns show (TP-constrained estimate − actual) and (CPTP-constrained estimate − actual).]

Fig. 1.8: Average differences between TP-/CPTP-constrained estimates and actual gates for q = 1.

App. 1.H Simulation: native state and effect analysis

Frobenius distances and 4th vector elements of native state and effect:

Fig. 1.9: Frobenius distance between native state estimates and actual native state as a function of blame gauge parameter q. The distance to the target gates is also plotted as a reference. Since the distance at q = 1 is 0, the target curve exhibits a sharp divergence as it approaches this point.

Fig. 1.10: Frobenius distance between native effect estimates and actual native effect as a function of blame gauge parameter q. The distance to the target gates is also plotted as a reference. Since the distance at q = 1 is 0, the target curve exhibits a sharp divergence as it approaches this point.

Fig. 1.11: Vector element $(|\rho\rangle\rangle)_4$ as a function of blame gauge parameter q. The target varies as $\frac{1}{\sqrt{2}}\epsilon_\rho q$. There is no noticeable change in the estimate's vector element error as a function of q, since neither bound on q is determined by the constraints on the state.

Fig. 1.12: Vector element $(\langle\langle E|)_4$ as a function of blame gauge parameter q. The target varies as $\frac{1}{\sqrt{2}}\epsilon_E/q$. The CPTP-constrained variance increases as the estimate approaches the effect-defined constraint boundary at $q = \epsilon_E = 0.5$ when q is lowered (the plot is read from right to left).

Chapter 2

Variational Quantum Algorithms

Bavithra Govintharajah Supervisor: Fabian Hassler

Simulating large quantum systems or solving large-scale linear algebra problems is classically inefficient due to high computational costs. Quantum computers promise to unlock these uncharted territories, although fault-tolerant quantum computers will likely not be available soon. Quantum devices available at present have serious bottlenecks, such as the limited number of qubits and noise that severely limits circuit depths. Variational Quantum Algorithms (VQAs) have emerged as promising candidates to address the constraints of near-term quantum computing. They employ a quantum-classical hybrid approach to train parametrized quantum circuits. Various VQAs have now been proposed for applications in disciplines spanning physics, machine learning, quantum chemistry, and combinatorial optimization, and they appear to be the best hope for obtaining quantum advantage with near-term devices. Nevertheless, open questions remain, including the trainability, accuracy, and efficiency of VQAs. In this chapter, an introduction to the concept of VQAs is presented. The key structure of these algorithms, their limitations, and their advantages are discussed in detail. Furthermore, an example of applying a VQA to simulate small molecules is presented.

2.1 Introduction

When early quantum devices became available, it was clear that progress on both the algorithmic and the hardware side was going to require proof-of-principle implementations of quantum algorithms. The modest resource requirements and flexibility of variational quantum information processing, compared to other algorithms such as Shor's algorithm and the Quantum Phase Estimation (QPE) algorithm, positioned the variational approach as one of the first milestones in the road-map of quantum computing experiments. The introduction of the first Variational Quantum Algorithms (VQAs) in 2014 coincided with the transition of experimental quantum computing from a primarily academic endeavor to an industrial enterprise. The inevitable question was, and still is: will these early noisy quantum devices deliver an advantage over classical computing and efficient solutions to complex problems? The hope is that this is the case, and it is believed that a viable path towards a definite affirmative answer lies in the further development and improvement of quantum variational approaches.

One important aspect that marks experimental advancements in quantum computing is that the number of available physical qubits must exceed a specific threshold to explore classically intractable problems. At the time of writing this document, the largest circuit-model quantum processors reported comprise just 10-100 qubits with error rates of $10^{-3}$-$10^{-2}$. This is called the Noisy Intermediate Scale Quantum (NISQ) era, a term first coined by John Preskill in 2018 [16]. NISQ devices are characterized by limited coherence times, low gate fidelities, and few qubits. Therefore, NISQ devices will not have adequate resources to implement quantum error correction, and as a consequence, only shallow circuits are executable. However, these machines will be large enough to implement quantum dynamics that cannot be efficiently simulated with existing classical computers, because classical computers are notoriously bad at simulating the dynamics of highly entangled many-particle quantum systems¹.

The very first VQAs, the Variational Quantum Eigensolver (VQE) [17] and the Quantum Approximate Optimization Algorithm (QAOA) [18], paved the way for a new paradigm in quantum algorithm design by considering the limitations of NISQ devices. Before these proposals, quantum algorithmic development was aimed at fault-tolerant quantum computers with error correction capabilities. In contrast, VQAs seek to remove the stringent requirements for coherent evolution by formulating the computation as an approximate optimization problem. Here, a functional² in the space of heuristic functions defined by a variational quantum circuit is optimized. From this perspective, variational quantum information processing and VQAs concern a strategy for finding extrema of quantum functionals, which are defined as mappings from the state space of a quantum system onto a scalar [19]. The flexibility in the definition of these heuristics allows the expansion of the operability of these approaches within the NISQ regime, circumventing the need for quantum error correction.

The purpose of this chapter is to give a short introduction to the art of variational quantum algorithms. The chapter is brief but self-contained; only a solid knowledge of quantum mechanics and the basics of quantum information is assumed. The reader is encouraged to go through Chapter 4 of Ref. [3] beforehand to better understand the implementation of VQAs on actual quantum devices. The chapter begins with a conceptual introduction to VQAs, highlighting the motivation behind the use of variational methods on NISQ devices, under Section 2.2. The subsequent Sections 2.3, 2.4, 2.5 and 2.6 delve deeper into the concepts introduced in 2.2. Finally, a hands-on example is given under Section 2.7, where a detailed analysis is performed on the application of the variational quantum eigensolver to solve the electronic structure problem of the H₂ molecule using Qiskit [20]. Basic familiarity with Hartree-Fock theory and post-Hartree-Fock methods might prove useful in extending the hands-on example, but it is not a requirement to understand the example. At the end of this chapter, the reader will hopefully feel comfortable with the use of VQAs to solve common optimization problems and have a clear enough overview of the theory to adjust various parts of the algorithm to suit one's needs.

¹Even just storing arbitrary quantum states on classical computers is difficult, because the amount of classical resources required to represent a given state is related to the degree of entanglement of the state. This problem only gets worse when studying the dynamics; see Exercise 4.46 in Ref. [3].
²A functional is simply a mapping that takes a function as input and returns a scalar as output.

2.2 Variational quantum algorithms

VQAs arise from three main concepts in variational quantum information processing as given below [19]:

1. Approximate nature of the calculations: instead of guaranteeing an exact solution to the problem, VQAs offer approximate solutions. This gains us flexibility in the amount of resources required, in exchange for giving up performance guarantees on the algorithm.

2. The use of variational quantum circuits: variational quantum circuits (also called variational ansätze, variational forms, or parametrized quantum circuits) are quantum circuits in which some of the unitary operations have tunable parameters, obtained by parameterizing the corresponding quantum gates. These parameters are systematically updated while the algorithm is executed. To give an oversimplified analogy, the variational quantum circuit is like a stencil of the problem where the sizes of the holes can be varied. Computational problems which encode solutions as extreme values of a functional can be naturally described using variational quantum circuits as heuristics. The flexibility in choosing variational circuits allows one to pick circuits that can operate within the capabilities of NISQ devices. This operational robustness thus removes the requirement for error correction.

3. Employing a hybrid quantum-classical computing strategy: the variational quantum circuits require an optimization routine to tune the parameters. This task is allocated to a classical computer. The optimization routine is carried out in a quantum-classical loop via classical communication. This feature motivates the idea of hybrid quantum-classical computing. The key idea of this approach is to 'divide and conquer': the main computational task is divided into sub-tasks that are allocated between the classical and quantum computers. By outsourcing the pre-processing and post-processing involved to a classical computer, it is possible to create algorithms that exhibit an advantage using fewer quantum resources than an algorithm that implements the entire task only on a quantum device.

These elements are combined to create a general strategy that can be applied to find approximate solutions to optimization problems. The main goal of a VQA is to find a function that extremizes the value of specific quantities of interest depending on the problem. For example, in the Variational Quantum Eigensolver (VQE) [17], the goal is to find a quantum circuit that prepares the ground state for a given Hamiltonian, usually the electronic structure problem. Another prominent example of a VQA is the Quantum Approximate Optimization Algorithm (QAOA) [18] where the goal is to find a quantum circuit that prepares the state which maximizes a problem specific cost function with encoded constraints.

2.2.1 Outline of the algorithm

In practice, VQAs are carried out following an iterative procedure on a Quantum Processing Unit (QPU) with continuous feedback to and from a classical central processing unit (CPU)³. A VQA consists of several modular components split between the QPU and CPU, which can be readily connected, extended, and enhanced individually with developments in algorithms and quantum hardware. A full protocol consists of:

³This is similar to the Graphical Processing Unit (GPU) and the CPU working together in a normal computer. In this case, the QPU plays the role of a GPU.

1. An initial state preparation: we start by preparing the initial input state $|\varphi\rangle$;

2. Application of a parametrized unitary: a parametrized unitary $\hat{U}(\vec{\theta})$ is applied to the input state on the QPU, preparing the output state $\hat{U}(\vec{\theta})|\varphi\rangle$. This parametrized unitary is defined by the choice of variational ansatz. It should correspond to a quantum circuit that cannot be efficiently computed using classical resources⁴;

3. A cost function and a method to estimate/evaluate it: the value of the function is estimated using a quantum circuit. Ultimately, the estimation procedure reduces to performing measurements on the output state. To estimate the function to a given accuracy, Step 1 has to be repeated several times to prepare trial states to collect sufficient measurement statistics;

4. A feedback loop with a classical optimization routine: based on the function estimate from Step 3, a classical optimization technique proposes a new set of values for the optimal variational parameters $\vec{\theta}'$.

Steps 1-4 are repeated until some problem-specific convergence criteria are satisfied. The following sections will deal with each of the components in-depth and a summarized diagram of the full protocol is shown in Fig. 2.1.
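Steps 1–4 can be sketched end-to-end for a toy one-qubit problem. Everything below is illustrative: a numpy statevector stands in for the QPU, $\hat{H} = Z$ is the cost observable, and the "optimizer" is a crude shrinking coordinate search:

```python
import numpy as np

H_eigs = (+1.0, -1.0)                         # outcomes of measuring H = Z

def ansatz_state(theta):
    """Steps 1-2: prepare |phi> = |0> and apply U(theta) = Ry(theta)."""
    return np.array([np.cos(theta / 2), np.sin(theta / 2)])

def estimate_cost(theta, shots=4000, rng=np.random.default_rng(1)):
    """Step 3: estimate <H> from finite measurement statistics (simulated shots)."""
    p0 = ansatz_state(theta)[0] ** 2
    n0 = rng.binomial(shots, p0)
    return H_eigs[0] * n0 / shots + H_eigs[1] * (shots - n0) / shots

# Step 4: a crude classical optimizer closes the quantum-classical loop
theta, step = 0.1, 0.5
for _ in range(50):
    theta = min([theta - step, theta, theta + step], key=estimate_cost)
    step *= 0.9
print(theta, estimate_cost(theta))            # settles near theta ~ pi, where <H> ~ -1
```

The ground state of $Z$ is $|1\rangle$, reached at $\theta = \pi$; with finite shots the loop only finds it to within statistical noise, which is exactly the behavior described above.
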

2.2.2 The variational paradigm and general quantum systems

The variational principle in quantum mechanics states that, for a given quantum state $|\Psi\rangle$ and an observable $\hat{O}$, the expectation value

$$\langle\hat{O}\rangle_{|\Psi\rangle} = \frac{\langle\Psi|\hat{O}|\Psi\rangle}{\langle\Psi|\Psi\rangle} \ge \lambda_0, \qquad (2.1)$$

where $\lambda_0$ is the lowest eigenvalue of the operator $\hat{O}$. The equality holds only when $\langle\hat{O}\rangle_{|\Psi\rangle} = \lambda_0$, and $|\Psi\rangle$ is then said to be in the ground state. For a detailed discussion on the variational principle see Appendix 2.A.

In many instances, the eigenstates corresponding to the lowest few eigenvalues and their properties are of primary interest. This is usually the case in physics, since low-energy states play a dominant role in determining the properties of a system at moderate temperatures, and in optimization problems they often encode the optimal solution. The variational principle leads to a simple method to find the ground state of a quantum system with $\hat{O} = \hat{H}$: prepare several parametrized 'trial states' $|\Psi(\vec{\theta})\rangle$, measure the expectation value of the Hamiltonian, and choose the state where the expectation value is minimized. This brings us to the topic below.
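Eq. (2.1) is cheap to verify numerically: random trial states never beat the true ground-state energy. A small numpy check (the 3-qubit "Hamiltonian" is just a random Hermitian matrix):

```python
import numpy as np

rng = np.random.default_rng(7)
M = rng.normal(size=(8, 8)) + 1j * rng.normal(size=(8, 8))
H = (M + M.conj().T) / 2                     # random Hermitian Hamiltonian, 3 qubits
lam0 = np.linalg.eigvalsh(H)[0]              # lowest eigenvalue

def rayleigh(psi):
    """Expectation value <psi|H|psi> / <psi|psi>, Eq. (2.1)."""
    return (psi.conj() @ H @ psi).real / (psi.conj() @ psi).real

samples = [rayleigh(rng.normal(size=8) + 1j * rng.normal(size=8)) for _ in range(1000)]
print(min(samples) >= lam0)                  # True: no trial state beats the ground state
```
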

How are variational methods implemented in a quantum computer?

Let us consider a quantum system $\mathcal{S}$ composed of N qubits, which will act as our quantum computer. Take a Hamiltonian $\hat{H}_\mathcal{Q}$ of a different system $\mathcal{Q}$, which need not have any relation to $\mathcal{S}$ other than acting on a Hilbert space of fewer than N qubits. The goal is to find and study the system $\mathcal{Q}$ using $\mathcal{S}$. The Hamiltonian $\hat{H}_\mathcal{Q}$ arises naturally in the description of physical systems. For example, this Hamiltonian could be derived from a physical system such as a collection of interacting spins or an interacting electronic system.

⁴One should remember that certain sets of quantum gates, such as the Clifford group gates, compose quantum circuits that can be efficiently simulated (i.e., in polynomial time) with classical resources, according to the Gottesman-Knill theorem.

Fig. 2.1: Control flow of a variational quantum algorithm using the hybrid quantum-classical model [19]: the algorithm is implemented on a quantum device by preparing the initial state $|\varphi\rangle = R(\varphi)|0\ldots0\rangle$ and executing the parametrized unitary $U(\vec{\theta})$. The circuit $\mathcal{C}$ implements the measurement scheme to estimate the cost function. To estimate the cost function, the state preparation and necessary measurements are repeated several times to accumulate sufficient statistics. The results are then post-processed on a classical computer, which might include statistical modeling. Also, the circuit $\mathcal{C}$ could be replaced by a circuit that computes gradients of the cost function, which are required by some optimization routines. An iteration of the optimization algorithm generates a new candidate for the optimal variational parameters $\vec{\theta}'$. The cycle between the quantum and classical modules is repeated until convergence. Here, discontinuous arrows represent the transfer of classical information.

Hamiltonians are not the only operators that can be measured on quantum devices; in general, any expectation value of an operator can be obtained by extending this method. Our attention is restricted to the class of operators whose expectation value can be measured efficiently on $\mathcal{S}$ and mapped to $\mathcal{Q}$ [21]. A sufficient condition for this property is that the operator $\hat{O}$ has a decomposition into a polynomial sum of simple operators⁵ as

$$\hat{O} = \sum_\alpha c_\alpha \hat{O}_\alpha \qquad (2.2)$$

where $\hat{O}_\alpha$ acts on $\mathcal{Q}$, $\alpha$ runs over a number of terms polynomial in the size of the system, $c_\alpha$ is a constant coefficient, and each $\hat{O}_\alpha$ has a simple measurement prescription on $\mathcal{S}$. This will enable straightforward measurement of the expectation values of $\hat{O}$ on $\mathcal{Q}$ by weighted summation of projective measurements of $\mathcal{S}$. A simple and relevant example is the decomposition of a Hermitian operator into a weighted sum of Pauli strings (see Appendix 2.B).

After deciding the operator and the corresponding measurement scheme through the polynomial expansion given by Eq. (2.2), the system $\mathcal{S}$ needs to be prepared in the parametrized trial states. How they are prepared is the key element of VQAs and is discussed in detail under 2.3. To give a more general description, the influence of the environment on the trial state preparation must be taken into account. This is better described by an ensemble given by a parametrized density matrix $\rho(\vec{\theta})$. In the ideal case where the state preparation is error-free and a pure state is maintained, $\rho(\vec{\theta}) = |\Psi_{\vec{\theta}}\rangle\langle\Psi_{\vec{\theta}}|$. In the density matrix formalism, the expectation value of an operator is given by

$$\langle\hat{O}\rangle_\rho = \operatorname{Tr}\big[\rho\hat{O}\big] \qquad (2.3)$$

and the variational principle in Eq. (2.1) still holds. It can be rewritten as

$$\langle\hat{O}\rangle(\vec{\theta}) = \langle\hat{O}\rangle_{\rho(\vec{\theta})} = \operatorname{Tr}\big[\rho(\vec{\theta})\hat{O}\big] \ge \lambda_0. \qquad (2.4)$$

The fact that this principle still holds for mixed states is the main reason for the robustness of VQAs to errors and environmental influence. By finding the set of parameters $\vec{\theta}$ that minimizes $\langle\hat{O}\rangle$, one is, in effect, finding a set of experimental parameters that are most likely to produce the ground state on average, potentially effecting a blind purification of the state of the system being produced [21].

⁵Almost all the Hamiltonians describing physical systems, such as the Ising model, the Heisenberg model, and the electronic structure problem, can be written with a polynomial number of terms with respect to the system size [22].
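The mixed-state version, Eqs. (2.3)–(2.4), can be checked the same way: mixing can only average eigenvalues, so the bound survives. A short sketch with a random density matrix (all numbers are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
O = (M + M.conj().T) / 2                 # Hermitian observable O-hat
lam0 = np.linalg.eigvalsh(O)[0]          # lowest eigenvalue

W = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
rho = W @ W.conj().T                     # positive semidefinite by construction
rho /= np.trace(rho).real                # unit trace: a valid density matrix

expval = np.trace(rho @ O).real          # Eq. (2.3)
print(expval >= lam0)                    # True: Eq. (2.4) holds for mixed states
```
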

2.3 Variational ansatz construction

Variational quantum circuit or ansatz construction is a crucial aspect of a VQA that dictates the success of the algorithm. It is a very active field of research, and in most cases finding an optimal ansatz for a problem is more difficult than finding a solution to the computational problem itself. The parameters are continuously varied to prepare many different trial states. One can make this parametrization classical, i.e., given a list of parameters $\vec{\theta} \in \mathbb{R}^{N_{\mathrm{params}}}$, we can prepare a different trial state $|\Psi_{\vec{\theta}}\rangle$ for each $\vec{\theta}$. One might ask whether the classical parametrization makes evaluation of $\langle\hat{O}\rangle$ classically tractable, but this is avoided by using gate angles as parameters to control the quantum circuit. With this key idea in mind, we are now ready to define a variational ansatz formally according to [23].

Definition 2.3.1. A variational ansatz on $n_p$ parameters corresponds to a pair $(U, |\vec{0}\rangle)$, where $U$ is a smooth map from the parameter space $\vec{\theta} \in \mathbb{R}^{n_p}$ to the unitary operator $U(\vec{\theta})$ on $\mathbb{C}^{2^N}$, and $|\vec{0}\rangle \in \mathbb{C}^{2^N}$ is the initial state, which is acted on to generate the variational state $|\psi(\vec{\theta})\rangle = U(\vec{\theta})|\vec{0}\rangle$, with variational energy $E(\vec{\theta}) = \langle\psi(\vec{\theta})|H|\psi(\vec{\theta})\rangle$.

Assuming that the vectors $\vec{\theta}$ are continuously defined, a variational ansatz can be imagined to sweep out a (possibly open) manifold in the Hilbert space of the states. In simpler terms, it explores some smooth space that lies within the allowed Hilbert space. Except for possible boundaries, the dimension of this manifold is precisely $N_{\mathrm{params}}$. This points out a limit of the variational paradigm: to explore the entire $2^N$-dimensional Hilbert space and guarantee that the ground state is found, $\mathcal{O}(2^N)$ parameters [23] and a similar scaling in time are required⁶. So an exponential advantage is not expected. Instead, the 'art' of VQA design is in choosing a manifold that contains points sufficiently close to the true ground state, and the 'art' of application design is in choosing a problem such that 'sufficiently close' is achievable. With these limitations in mind, the variational circuit must be chosen carefully, because the choice of ansatz greatly affects the performance of a VQA. From the perspective of the problem, the ansatz influences both the closeness to the final solution and the convergence speed. Moreover, given the exponentially small fraction of the Hilbert space that will be explored, a randomly generated ansatz cannot be expected to be 'good enough', due to the difficulties faced when optimizing such an arbitrary ansatz. Other than these computational concerns, NISQ hardware has to be taken into account as well.

⁶This doesn't come as a surprise, because performing such a task is known to be a QMA-hard problem [24].
Due to the short coherence times of NISQ devices, deeper circuits are more susceptible to errors, and some gates can be costly to compose with hardware-native gates. Accordingly, most of the ansätze developed to date are categorized as either more problem-inspired or more hardware-efficient, depending on their structure and application. In the following subsections, a short explanation of arbitrary unitary evolution U(θ) and common constructions of both ansatz types are presented.

2.3.1 Arbitrary unitary evolution

Understanding how arbitrary unitary evolutions U(θ) are executed in a quantum circuit is crucial for efficient ansatz construction. An arbitrary unitary evolution can be generated by a Hermitian operator $\hat{g}$, which defines an evolution in terms of the parameter θ,

$$U(\theta) = e^{-i\hat{g}\theta}. \qquad (2.5)$$

From an abstract viewpoint, these evolutions can always be described as the time evolution of the corresponding quantum state, so that the generator $\hat{g}$ is often just the Hamiltonian $\hat{H}$⁷. Since any $\hat{H}$ can be written as a sum of Pauli strings according to Eq. (2.2), it is useful to analyze the case when $\hat{g}$ is a Pauli string $\hat{P} \in \mathbb{P}_N$. This makes

$$U_{\hat{P}}(\theta) = e^{-i\hat{P}\theta}. \qquad (2.6)$$

Taylor expanding Eq. (2.6) and using the properties of the Pauli operators (see Ap- pendix 2.B) yields,

$$U_{\hat{P}}(\theta) = \sum_k \frac{(-i\theta\hat{P})^k}{k!}
= \sum_k \left[\frac{(-i\theta\hat{P})^{2k}}{(2k)!} + \frac{(-i\theta\hat{P})^{2k+1}}{(2k+1)!}\right]
= \sum_k \left[(-i)^{2k}\frac{\theta^{2k}}{(2k)!}\hat{P}^{2k} + (-i)^{2k+1}\frac{\theta^{2k+1}}{(2k+1)!}\hat{P}^{2k+1}\right]
= \sum_k \left[(-1)^k\frac{\theta^{2k}}{(2k)!}\,\mathbb{1} + (-i)(-1)^k\frac{\theta^{2k+1}}{(2k+1)!}\,\hat{P}\right]. \qquad (2.7)$$

The terms are just the Taylor expansions of sine and cosine, and thus $U_{\hat{P}}(\theta)$ becomes a rotation generated by $\hat{P}$, formally analogous to a single-qubit rotation,

$$U_{\hat{P}}(\theta) = \cos\theta\,\mathbb{1} - i\sin\theta\,\hat{P}. \qquad (2.8)$$

It is important to note that although $\hat{P}$, when acting as a unitary operator by itself, does not generate any entanglement between the qubits, $U_{\hat{P}}(\theta)$ is an entangling gate! Not surprisingly, every possible operation on a quantum computer may be approximated by a series of $U_{\hat{P}}(\theta)$ for different Pauli strings $\hat{P}_i$.
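Eq. (2.8) holds for any Pauli string because $\hat{P}^2 = \mathbb{1}$. The sketch below verifies it for the two-qubit string $X \otimes Z$ (the exponential is computed by eigendecomposition, so the identity is not assumed):

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
P = np.kron(X, Z)                                # Pauli string X (x) Z, satisfies P @ P = I

def expm_hermitian(H):
    """e^{-iH} for Hermitian H via eigendecomposition."""
    evals, V = np.linalg.eigh(H)
    return V @ np.diag(np.exp(-1j * evals)) @ V.conj().T

theta = 0.37
U = expm_hermitian(theta * P)
closed_form = np.cos(theta) * np.eye(4) - 1j * np.sin(theta) * P   # Eq. (2.8)
print(np.allclose(U, closed_form))               # True
```
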

⁷Note, however, that the Hamiltonian generating the evolution does not necessarily need to be the operator that describes the energy of the system of interest.

Suzuki-Trotter expansion

The Suzuki-Trotter expansion [25] is a general method to approximate general, difficult-to-implement unitaries of the form of Eq. (2.5) as a function of the parameter θ. This is done by decomposing the generator $\hat{g}$ into a sum of non-commuting operators $\{\hat{o}_k\}$ such that

$$\hat{g} = \sum_k a_k \hat{o}_k, \qquad (2.9)$$

for some coefficients $a_k$. The operators $\hat{o}_k$ are chosen such that the evolution $e^{-i\hat{o}_k\theta}$ is easy to implement. An example of this is the expansion of the Hamiltonian $\hat{H}$ in Pauli strings, as shown previously. The full evolution over θ can now be decomposed into $m \in \mathbb{Z}^+$ smaller, equally sized chunks,

$$e^{-i\hat{g}\theta} = \lim_{m\to\infty}\left(\prod_k e^{-i\frac{a_k\hat{o}_k\theta}{m}}\right)^m. \qquad (2.10)$$

When Pauli strings are used as in Eq. (2.6), these become multi-qubit rotations, which can themselves be decomposed into primitive one- and two-qubit gates.
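Eq. (2.10) is easy to probe numerically: for a generator with two non-commuting Pauli-string terms, the first-order Trotter error shrinks roughly as 1/m. A minimal sketch (the coefficients 0.7, 0.3 and the angle are arbitrary):

```python
import numpy as np

def expm_hermitian(H):
    """e^{-iH} for Hermitian H via eigendecomposition."""
    evals, V = np.linalg.eigh(H)
    return V @ np.diag(np.exp(-1j * evals)) @ V.conj().T

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
o1, o2 = np.kron(X, X), np.kron(Z, np.eye(2))     # non-commuting terms o_k
a1, a2, theta = 0.7, 0.3, 0.8                     # g = a1*o1 + a2*o2

exact = expm_hermitian(theta * (a1 * o1 + a2 * o2))
errors = []
for m in (1, 10, 100):
    step = expm_hermitian(theta * a1 * o1 / m) @ expm_hermitian(theta * a2 * o2 / m)
    errors.append(np.linalg.norm(np.linalg.matrix_power(step, m) - exact))
print(errors)                                     # decreasing roughly as 1/m
```
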

2.3.2 Problem-inspired ansatz construction

Historically, problem-inspired ansätze were the first to be proposed and implemented on actual hardware. The Unitary Coupled Cluster (UCC) ansatz [26], the Variational Hamiltonian Ansatz (VHA) [21, 27], factorized UCC, and adaptive approaches are a few such problem-inspired variational ansätze stemming from quantum chemistry and many-body physics. Within so-called problem-inspired approaches, the unitary evolution is of the form of Eq. (2.5), with generators $\hat{g}$ derived from the system properties. Knowledge about the physics behind the problem Hamiltonian being Trotterized can substantially reduce the number of gates needed to implement the Suzuki-Trotter method. Let us now look at a couple of problem-inspired ansatz examples.

Example 1: The Unitary Coupled-Cluster Ansatz

The UCC ansatz is widely used in quantum chemistry problems where the goal is to simulate a fermionic molecular Hamiltonian and obtain the ground state energy. The idea of implementing UCC on quantum computers arose from the observation that the UCC ansatz, which adds quantum correlations to the Hartree-Fock approximation, is inefficient to represent on a classical device [28]. It was first realized on a photonic processor by Peruzzo and co-workers in 2014 [17].

Typically, we expect the ground state of an interacting system to be a linear combination of low-energy excitations of the corresponding non-interacting problem. In the electronic structure problem this corresponds to excitations of a few particles from the Hartree-Fock state |\Psi_{HF}\rangle of the Hamiltonian \hat{H}. The UCC ansatz proposes a candidate for a ground state based on excitations generated by the cluster operator \hat{T} from |\Psi_{HF}\rangle to low-lying non-interacting higher energy states, as e^{\hat{T}(\theta) - \hat{T}^{\dagger}(\theta)} |\Psi_{HF}\rangle [26]. The cluster operator \hat{T}, which captures the excitation dynamics, is given by⁸

\hat{T} = \hat{T}^{(1)} + \hat{T}^{(2)} + \hat{T}^{(3)} + \ldots = \sum_k \hat{T}^{(k)} \quad \text{(Cluster operator)}, \qquad (2.11a)

\hat{T}^{(1)} = \sum_{i \in \text{empty}, \, k \in \text{filled}} \theta^{i}_{k} \, \hat{c}^{\dagger}_{i} \hat{c}_{k} \quad \text{(Single excitations)}, \qquad (2.11b)

\hat{T}^{(2)} = \sum_{i,j \in \text{empty}, \, k,l \in \text{filled}} \theta^{i,j}_{k,l} \, \hat{c}^{\dagger}_{i} \hat{c}^{\dagger}_{j} \hat{c}_{k} \hat{c}_{l} \quad \text{(Double excitations)}. \qquad (2.11c)

The operator \hat{c}_i refers to the fermionic annihilation operator of the i-th Hartree-Fock orbital, and the sets empty and filled refer to the unoccupied and occupied Hartree-Fock orbitals, respectively. Here, \theta^{i}_{k} and \theta^{i,j}_{k,l} are the free parameters of the problem. Due to the decreasing importance of higher \hat{T}^{(k)}, the series is usually truncated after the second or third term. The ansatz is then termed UCCSD or UCCSDT, referring respectively to the inclusion of single and double, or single, double, and triple excitations from the Hartree-Fock ground state. To implement this ansatz on a quantum computer, second quantization methods are used to map the fermionic operators to qubit operators⁹, resulting in an ansatz of the form of Eq. (2.10); it is then converted to a parametrized ansatz via Trotterization.

Example 2: The Quantum Alternating Operator Ansatz

The Quantum Alternating Operator Ansatz was originally introduced as part of the QAOA algorithm [18], and later generalized as a standalone variational ansatz. One can imagine the QAOA as a Trotterized version of adiabatic evolution, where the order p of the Trotterization determines the precision of the solution. Indeed, the adiabatic evolution can be recovered in the limit p → ∞. The goal of QAOA is to map an input state |\psi_0\rangle to the ground state of the problem Hamiltonian \hat{H}_C by sequentially applying a unitary e^{-i\gamma_j \hat{H}_C} and a mixer unitary e^{-i\beta_j \hat{H}_M}. That is,

U(\vec{\gamma}, \vec{\beta}) = \prod_{j=1}^{p} e^{-i\beta_j \hat{H}_M} e^{-i\gamma_j \hat{H}_C}, \qquad (2.12)

where \vec{\theta} = (\vec{\gamma}, \vec{\beta}). A key strength of QAOA is the fact that the feasible subspace for certain problems is smaller than the complete Hilbert space, and this restriction results in a better performing algorithm.
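A minimal NumPy sketch of the layered circuit of Eq. (2.12); the two-qubit cost Hamiltonian H_C = −Z⊗Z, the transverse-field mixer, and the grid of angles are illustrative assumptions. For this toy problem a single layer (p = 1) already reaches the exact ground energy −1 at suitable angles:

```python
import numpy as np

def expm_herm(H, t):
    """exp(-i H t) for a Hermitian matrix H, via eigendecomposition."""
    evals, evecs = np.linalg.eigh(H)
    return evecs @ np.diag(np.exp(-1j * t * evals)) @ evecs.conj().T

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2)

# Toy problem: cost Hamiltonian H_C = −Z⊗Z (ground energy −1) and the
# standard transverse-field mixer H_M = X⊗I + I⊗X.
HC = -np.kron(Z, Z)
HM = np.kron(X, I2) + np.kron(I2, X)

plus = np.full(4, 0.5, dtype=complex)  # |+⟩⊗|+⟩, the usual QAOA input state

def qaoa_energy(gammas, betas):
    """⟨ψ(γ⃗,β⃗)| H_C |ψ(γ⃗,β⃗)⟩ for the layered circuit of Eq. (2.12)."""
    psi = plus.copy()
    for g, b in zip(gammas, betas):
        psi = expm_herm(HC, g) @ psi   # phase-separation unitary e^{−iγ H_C}
        psi = expm_herm(HM, b) @ psi   # mixer unitary e^{−iβ H_M}
    return (psi.conj() @ HC @ psi).real

# Coarse grid scan over a single layer (p = 1)
grid = np.linspace(0, np.pi, 25)
best = min(qaoa_energy([g], [b]) for g in grid for b in grid)
```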

2.3.3 Hardware-efficient ansatz construction

Problem-inspired ansätze have been shown computationally to ensure faster convergence to a satisfying solution. However, the deeper circuits associated with Trotterization can be challenging to realize experimentally. The hardware-efficient ansatz is a generic name used for variational ansätze aimed at NISQ devices, with a focus on circuit depth reduction. It was introduced in 2017 by Kandala, Mezzacapo and co-workers in an experiment on a 7-qubit superconducting device [29]. The key idea is to generate random parametrized quantum circuits using only the better-performing native gates of the underlying hardware with a particular qubit topology. This avoids the circuit depth overhead arising from translating an arbitrary unitary into the native gates. By doing so, one expects to reduce errors during the computation while still being able to generate meaningful parametrized trial states.

Let us now see how one can construct such an ansatz from basic single-qubit gates and two-qubit entangling gates according to the method presented in [29]. The circuit is constructed from blocks of single-qubit gates and entangling gates applied to multiple qubits simultaneously. A single application of such a block is called a layer. The ansatz circuit generally has multiple such layers, which enables exploration of a larger portion of the Hilbert space. The unitary of a hardware-efficient ansatz with N_l layers is given by

U(\vec{\theta}) = \prod_{k=1}^{N_l} U_k(\theta_k) W_k, \qquad (2.13)

where \vec{\theta} = (\theta_1, \ldots, \theta_{N_l}) are the variational parameters, U_k(\theta_k) = e^{-i\theta_k \hat{O}_k} is the unitary generated by the Hermitian operator \hat{O}_k, and W_k represents the non-parametrized quantum gates. Typically the \hat{O}_k are Pauli strings acting locally on each qubit. In those cases, U_k becomes a product of single-qubit rotations, each defined as in Eq. (2.8). W_k is the entangling unitary constructed from the native gate set. For example, in superconducting qubit architectures CNOT or CZ gates can be used, and in trapped-ion platforms XX gates can be used [30, 31]. The choice of gate set, qubit topology, and gate ordering influences the subspace of the Hilbert space covered by the ansatz and how fast it converges to a solution for a specific problem.

In an ideal case, one would perform entangling gates between all pairs of qubits. However, the reality is different: the limited qubit connectivity, which varies among hardware platforms, restricts which two-qubit entangling gates can be performed directly. One main advantage of this ansatz is its versatility and problem-agnostic nature. It is especially useful for studying Hamiltonians that are similar to the interactions native to the hardware. On the other hand, the problem-agnostic nature also leads to problems in optimization when parameters are initialized randomly.

⁸Given in physicist notation. Computational quantum chemistry software uses the chemist notation, in which the ordering of the operators changes: physicist ijkl → chemist iklj.
⁹There are many variants of the UCC ansatz, some of which reduce the depth of the quantum circuit by considering more efficient methods for compiling the fermionic operators using group theory.
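A toy two-qubit version of the layered construction of Eq. (2.13) can be sketched as follows; the choice of Ry rotations for U_k and a CZ gate for the entangler W_k is an illustrative assumption, not the specific native gate set of [29]:

```python
import numpy as np

def ry(theta):
    """Single-qubit rotation Ry(θ), a special case of Eq. (2.8)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

# Fixed (non-parametrized) entangler W_k: a controlled-Z gate
CZ = np.diag([1, 1, 1, -1]).astype(complex)

def hardware_efficient_ansatz(params, n_layers=2):
    """Two-qubit toy version of Eq. (2.13): alternate a layer of
    parametrized single-qubit rotations with the fixed entangler."""
    params = np.asarray(params).reshape(n_layers, 2)
    psi = np.array([1, 0, 0, 0], dtype=complex)  # start from |00⟩
    for layer in params:
        psi = np.kron(ry(layer[0]), ry(layer[1])) @ psi  # U_k(θ_k)
        psi = CZ @ psi                                   # W_k
    return psi

psi = hardware_efficient_ansatz([0.3, 1.2, -0.7, 0.5])
norm = np.linalg.norm(psi)  # unitary layers preserve the norm
```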

2.4 Cost function

After constructing a variational ansatz, the focus shifts to extracting useful classical information from it. Therefore, the next step in a VQA is to encode the problem into an objective function, sometimes called the cost function. From Section 2.2.2, the cost function is just the expectation value of the operator being measured. Stated precisely, the standard definition of a VQA cost function, denoted f_{\hat{O}}(\vec{\theta}), is the expectation value of an observable \hat{O} \in \text{Herm}(\mathbb{C}^{2^N}), that is,

f_{\hat{O}}(\vec{\theta}) = \langle \psi(\vec{\theta}) | \hat{O} | \psi(\vec{\theta}) \rangle, \qquad (2.14)

where normalization of the wavefunction is assumed. Let us look at a couple of examples of cost functions.

Example 1: Approximating the ground state energy of a physical system

Information about the dynamics of a system is encoded in its Hamiltonian \hat{H}, which naturally arises in the description of all physical systems. In physics, one is often interested in computing the ground state energy E_0 of a Hamiltonian \hat{H}, given by

E_0 = \min_{|\psi\rangle} \frac{\langle \psi | \hat{H} | \psi \rangle}{\langle \psi | \psi \rangle}. \qquad (2.15)

In this case it is apparent that the corresponding cost function is given by

f_{\hat{H}}(\vec{\theta}) = \langle \psi(\vec{\theta}) | \hat{H} | \psi(\vec{\theta}) \rangle. \qquad (2.16)

The UCCSD ansatz from Section 2.3.2 and the hardware-efficient ansatz from Section 2.3.3 are commonly used to solve problems of this kind. This will also be the example problem analysed in Section 2.7.

Example 2: Approximating optimization problems

For problems unrelated to physical systems, encoding into an equivalent qubit Hamiltonian is still possible, thereby opening a path to solving them on a quantum computer. To solve a combinatorial optimization problem using a quantum algorithm, the problem is first encoded onto a quantum system. One of the most widely used models in physics, which is also used to represent optimization problems, is the Ising model. A quantum version of it is obtained easily by replacing the classical spin variables with Pauli Z-operators:

\hat{H}_C \equiv \hat{H}(\hat{\sigma}^z_1, \ldots, \hat{\sigma}^z_n) = -\sum_{1 \le i < j \le n} J_{ij} \, \sigma^z_i \sigma^z_j - \sum_i h_i \, \sigma^z_i, \qquad (2.17)

where J_{ij} and h_i are real numbers, and \sigma^z_i refers to the Pauli Z-operator acting on the i-th qubit. Several NP-complete optimization problems, and even many NP-hard problems, can naturally be expressed as the problem of finding the ground state, or minimum energy configuration, of a quantum Ising Hamiltonian by choosing appropriate values for J_{ij} and h_i [32]. The QAOA ansatz from Section 2.3.2 is usually the preferred ansatz for solving these combinatorial optimization problems, and an example of QAOA applied to a tail-gate assignment problem can be found in Ref. [33].
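The correspondence between Eq. (2.17) and the classical Ising energy can be verified numerically; since H_C is diagonal in the computational basis, its ground energy equals the minimum classical energy over all spin configurations. The couplings J_ij and fields h_i below are arbitrary example values:

```python
import numpy as np
from itertools import product

Z = np.array([[1.0, 0.0], [0.0, -1.0]])
I2 = np.eye(2)

def z_on(i, n):
    """σ^z acting on qubit i of an n-qubit register."""
    out = np.array([[1.0]])
    for j in range(n):
        out = np.kron(out, Z if j == i else I2)
    return out

n = 3
J = {(0, 1): 1.0, (1, 2): -0.5}   # example couplings J_ij (assumed values)
h = [0.2, 0.0, -0.3]              # example fields h_i (assumed values)

# Quantum Ising Hamiltonian of Eq. (2.17)
HC = -sum(Jij * z_on(i, n) @ z_on(j, n) for (i, j), Jij in J.items())
HC = HC - sum(h[i] * z_on(i, n) for i in range(n))

def classical_energy(s):
    """Classical Ising energy for a spin configuration s ∈ {+1, −1}^n."""
    return (-sum(Jij * s[i] * s[j] for (i, j), Jij in J.items())
            - sum(h[i] * s[i] for i in range(n)))

brute = min(classical_energy(s) for s in product([1, -1], repeat=n))
ground = np.min(np.diag(HC))   # H_C is diagonal, so this is its ground energy
```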

To sum up the idea: a cost function defines a hyper-surface, called the cost landscape, that maps the trainable parameters to real numbers. The task of the optimizer is to traverse the cost landscape and find the global minimum. The choice of optimizer depends on how complicated the landscape is, as discussed in detail in Section 2.6. Therefore, in addition to the variational ansatz, the choice of a good objective function is crucial to ensure convergence to the solution. A good objective function should meet the following criteria.

1. Faithfulness: The optimum of the cost function must correspond to the solution of the problem.

2. Efficient estimation: It should be possible to estimate the cost function efficiently by performing a finite number of measurements with classical post-processing.

3. Operational meaningfulness: The value of the cost function should reflect the quality of a solution (e.g. lower cost means a better solution).

4. Trainability: It must be possible to efficiently optimize the parameters to meet the convergence criteria.

Even though the properties given above sound trivial, they are difficult to satisfy when running an actual experiment on NISQ devices with constraints on circuit depth and ancilla requirements.

2.5 Measurement scheme

Let us now address the question of how to evaluate the VQA cost function defined in Eq. (2.14). Using Eq. (2.2) and Theorem 2.B.1, most interesting observables can be written as a linear combination of a polynomial number of Pauli strings with real coefficients, i.e.,

\hat{O} = \sum_{i=1}^{\text{poly}(N)} c_i \, \sigma^{\alpha}_i, \qquad (2.18)

where α = x, y, z identifies the Pauli spin operator. For example, most physically motivated Hamiltonians are k-local; that is, each of the Pauli strings in the sum acts non-trivially on at most k qubits. Examples of k-local systems include Ising and Heisenberg spin systems, van der Waals gases, the strong and weak interactions, and lattice gauge theories. In fact, any system that is consistent with special and general relativity evolves according to local interactions [34]. In a k-local system there are \binom{n}{k} possible subsets of k qubits. Therefore the number of Pauli strings that appear in the decomposition of a k-local Hamiltonian is bounded by a polynomial. When Eq. (2.18) is encoded into a cost function as in Eq. (2.14) we find that

f_{\hat{O}}(\vec{\theta}) = \langle \psi(\vec{\theta}) | \hat{O} | \psi(\vec{\theta}) \rangle = \langle \psi(\vec{\theta}) | \sum_{i=1}^{\text{poly}(N)} c_i \sigma^{\alpha}_i | \psi(\vec{\theta}) \rangle = \sum_{i=1}^{\text{poly}(N)} c_i \langle \psi(\vec{\theta}) | \sigma^{\alpha}_i | \psi(\vec{\theta}) \rangle. \qquad (2.19)

From this, we see that the evaluation of the cost function boils down to evaluating a weighted sum of Pauli-string expectation values, assuming that the values of c_i are known beforehand. The expectation value of a tensor product of an arbitrary number of Pauli operators can be estimated by measuring each of the qubits locally [3]. This is an operation that requires a coherence time of O(1) (i.e. incurring a constant cost in time) under the assumption that parallel qubit rotations and readouts are possible [17].
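Eq. (2.19) can be illustrated directly in NumPy: evaluating each Pauli expectation value separately and summing with the weights gives the same result as measuring the full observable. The Pauli strings, coefficients, and the random test state below are made-up examples:

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2)

def pauli_string(label):
    """'XZ' → X ⊗ Z, etc."""
    out = np.array([[1.0 + 0j]])
    for ch in label:
        out = np.kron(out, {"I": I2, "X": X, "Y": Y, "Z": Z}[ch])
    return out

# Ô = Σ_i c_i σ_i as in Eq. (2.18); terms and coefficients are illustrative.
terms = {"ZZ": 0.5, "XI": -0.3, "IY": 0.8}

rng = np.random.default_rng(1)
psi = rng.normal(size=4) + 1j * rng.normal(size=4)
psi /= np.linalg.norm(psi)   # normalized test state

# Eq. (2.19): weighted sum of individual Pauli expectation values
f_sum = sum(c * (psi.conj() @ pauli_string(p) @ psi).real
            for p, c in terms.items())

# Direct evaluation of ⟨ψ|Ô|ψ⟩ for comparison
O = sum(c * pauli_string(p) for p, c in terms.items())
f_direct = (psi.conj() @ O @ psi).real
```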

2.5.1 Precision and Chebyshev’s inequality

Suppose we wish to estimate E = \langle \psi(\vec{\theta}) | P | \psi(\vec{\theta}) \rangle, for some Pauli string P \in \mathcal{P}_N, up to a precision ε > 0. In order to achieve that, let

P = \sum_{k=1}^{2^N} \lambda_k |\phi_k\rangle\langle\phi_k| \qquad (2.20)

denote the spectral decomposition of P. Here E can be taken as the expected value of a random variable X with possible outcomes \{\lambda_k\}_{k=1}^{2^N} and

p(X = \lambda_k) = |\langle \phi_k | \psi(\vec{\theta}) \rangle|^2. \qquad (2.21)

That is, X is the random variable representing the outcome of measuring the operator P on the state |\psi(\vec{\theta})\rangle. Chebyshev's inequality is a well-known result in probability theory which states that if we take M independent copies of X, denoted X_1, \ldots, X_M, then

p\left( \left| \frac{\sum_{i=1}^{M} X_i}{M} - E \right| \ge \varepsilon \right) \le \frac{\sigma^2}{M \varepsilon^2}, \qquad (2.22)

where \sigma^2 denotes the variance of X. Since the variance of X can be assumed to be bounded above by a constant, from Eq. (2.22) it can be deduced that

M \sim \frac{1}{\varepsilon^2} \qquad (2.23)

measurements must be performed to obtain, with high probability, an estimate of E to within additive precision ε. The limitations of current NISQ hardware have severe implications for how well the above strategy will allow one to estimate the energies E, as the measurement outcomes will be noisy. This inherent statistical nature of the outcomes influences the optimization strategy, as discussed in Section 2.6.
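The M ∼ 1/ε² scaling of Eq. (2.23) can be seen in a small sampling experiment; the single-qubit state, shot budgets, and number of repetitions below are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(7)

# Measuring Pauli Z on |ψ⟩ = cos(θ/2)|0⟩ + sin(θ/2)|1⟩: outcome +1 with
# probability cos²(θ/2), outcome −1 otherwise, so ⟨Z⟩ = cos(θ).
theta = 1.1
p_plus = np.cos(theta / 2) ** 2
exact = 2 * p_plus - 1

def estimate(M):
    """Empirical mean of M simulated measurement shots."""
    outcomes = rng.choice([1, -1], size=M, p=[p_plus, 1 - p_plus])
    return outcomes.mean()

# Empirical spread of the estimator for two shot budgets
reps = 200
spread = {M: np.std([estimate(M) - exact for _ in range(reps)])
          for M in (100, 10000)}

# By Eq. (2.22)/(2.23): 100x more shots should shrink the error roughly 10x.
ratio = spread[100] / spread[10000]
```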

2.5.2 Measurement scheme optimization

The strategy presented above is a rather naive way of estimating the cost function: sample every Pauli string and use the outcome statistics to estimate the corresponding expectation value. As mentioned before, the key advantage of this approach is that the coherence time needed to make a single measurement after preparing the state is O(1). Conversely, the disadvantage is the scaling of the total number of operations as a function of the desired precision ε, namely O(ε⁻²) [17]. Moreover, this scaling also reflects the number of state preparation repetitions required. This has led to an active area of research in measurement optimization strategies. One example of such a technique is to use the commutation properties of Pauli strings to make simultaneous measurements. The problem is then to divide the Pauli strings into a small number of commuting subsets. It turns out that this problem is equivalent to the clique cover problem, which is known to be NP-complete [35]. Researchers have proposed new grouping methods using approximation algorithms for the clique cover problem, with varying degrees of success. By speeding up the measurement scheme with such techniques, it is possible to realize faster algorithms with efficient use of quantum resources.
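A simple greedy grouping into qubit-wise commuting sets (a common relaxation of full commutativity) can be sketched as follows; the Pauli strings are example inputs, and the greedy heuristic is only an approximation to the clique-cover formulation mentioned above:

```python
def qubitwise_commute(p, q):
    """Two Pauli strings commute qubit-wise if, on every qubit, they act
    with the same Pauli or at least one of them acts with the identity."""
    return all(a == b or a == "I" or b == "I" for a, b in zip(p, q))

def greedy_grouping(paulis):
    """Greedily partition Pauli strings into qubit-wise commuting groups;
    each group can be measured with a single circuit setting."""
    groups = []
    for p in paulis:
        for g in groups:
            if all(qubitwise_commute(p, q) for q in g):
                g.append(p)
                break
        else:
            groups.append([p])
    return groups

# Example: all Z-type strings share one measurement setting,
# all X-type strings share another.
paulis = ["ZZI", "ZIZ", "IZZ", "XXI", "IXX", "XIX"]
groups = greedy_grouping(paulis)
```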

2.6 Optimization routine

When choosing the variational ansatz, the hope is that it explores a region of Hilbert space in which the minimum \min_{\vec{\theta}} f_{\hat{O}}(\vec{\theta}) is near the actual ground state energy of \hat{O}. Thus, after making the ansatz choice, the challenge that remains is to find the \vec{\theta} that minimizes the cost function f_{\hat{O}}. To this end, a classical optimization routine is employed. Numerical optimization is a vast field, and identifying and adapting existing optimization techniques suited to quantum variational approaches requires considerable effort, because the success of a VQA depends on the reliability and efficiency of the classical optimization method used. The following should be kept in mind when choosing an optimizer.

1. Due to short coherence times in the NISQ era, complicated analytical gradient circuits cannot be implemented.

2. The optimizer should be resilient to noisy data and to the limited precision of objective function evaluations, which is set by the number of measurement shots.

3. Objective function evaluations take a long time, which means shot-frugal optimizers are favored.

Which classical optimization routine works best with VQAs is an open question. Broadly speaking, the optimization algorithms used in VQAs can be split into two groups:

1. Gradient-based approaches: As the name suggests, gradient-based methods optimize the cost function f_{\hat{O}}(\vec{\theta}) via its gradient \nabla_{\vec{\theta}} f_{\hat{O}}(\vec{\theta}). The calculated gradient indicates the direction in which the cost function shows the greatest change. By moving in the opposite direction, one can find a (local) minimum.

Advantages:

• Convergence properties are well established in the literature.
• When the cost function landscape is simple and smooth, it works incredibly well.

Disadvantages:

• The time taken for convergence is large when the gradient evaluation is expensive.
• Unreliable under noisy gradient evaluations.
• Suffers in the presence of exploding or vanishing gradients.

Weighing the pros and cons, gradient-based optimizers are a good choice when there is prior knowledge of the cost landscape and it is known to be smooth, so that gradient evaluations are efficient.

2. Gradient-free approaches: Gradient-free methods rely on evaluating the function at many points in the cost landscape and inferring a minimum from these evaluations.

Advantages:

• Work well for complicated energy landscapes that are not smooth, as they do not require the ability to evaluate gradients efficiently.
• Work decently even in the presence of vanishing and exploding gradients.

Disadvantages:

• Require several cost function evaluations in general.
• Do not converge as fast as gradient-based approaches in smooth cost landscapes.

Gradient-free approaches are suitable when evaluating the gradient is very expensive and when there is no prior knowledge about the cost landscape.

Effect of hardware noise

As a final remark, it is important to mention that VQAs are somewhat resilient to noise. This is due to the ability to counteract errors in the evaluation by varying the parameters. That is, if an error slightly moves the target minimum, it is still possible to find it by changing the parameters accordingly, as explained in Section 2.2.2. The cost function is statistical by design. Unlike a typical optimization routine in classical computing, the inherently stochastic environment due to the limited number of measurements, hardware noise, and the presence of barren plateaus (i.e. exponentially vanishing gradients) makes the choice of a suitable classical optimizer difficult. This has led to active research into the development of quantum-aware optimizers, which are believed to deal best with the quantum noise in expectation value evaluations.

[Table 2.1: Representative experimental demonstrations of the VQE algorithm using various quantum computer architectures, with columns for architecture, molecule, ansatz, optimization routine, computed state, and reference. The listed architectures include photonic chips, Ca⁺ ion traps, Si qubits, and transmon qubits (including cloud access); molecules and systems include HeH⁺, H₂, H₂⁺, LiH, BeH₂, two chlorophyll units in a ring of the 18-mer LHII complex, and the deuteron; ansätze include UCC and approximate UCC, hardware-specific parametrized ansätze, a parametrized Hamiltonian ansatz with a truncation scheme, and hardware-efficient ansätze; optimization routines include Nelder-Mead, grid scan with local optimization, PSO, and SPSA; computed states are ground states and, in some cases, excited states; references include [17, 29, 36-40]. Abbreviations: SPSA, simultaneous perturbation stochastic approximation; PSO, particle swarm optimization.]

2.7 Qiskit hands-on example: Molecular quantum simulation using the VQE

The VQE is a popular VQA that has already entered the experimental regime, as can be seen in Table 2.1. It is an application of the time-independent variational principle, where both the ansatz preparation and the cost function estimation are implemented on a quantum processor. As a VQA, it is built on the modular structure discussed in Section 2.2.1. The specific details of the VQE can be broken down into iterative steps:

1. Preparation of the trial states by applying the chosen parametrized unitaries.

2. Determination of the expectation value of every term in the qubit representation of the Hamiltonian via an efficient partial tomography. A large number of quantum circuit executions is required to evaluate the expectation values. This step consists of three nested iterations:

(a) Outer iteration: systematically update the parameters \vec{\theta} of the quantum state |\Psi(\vec{\theta})\rangle.
(b) Middle iteration: evaluate the expectation value by calculating the weighted sum of Pauli strings.
(c) Inner iteration: evaluate the expectation value of a single Pauli string through sampling.

3. Parameter update using a classical optimization routine.

4. Repetition of steps 2 and 3 until convergence criteria are satisfied.
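The loop above can be condensed into a toy, noiseless VQE for a single-qubit Hamiltonian; the Hamiltonian Ĥ = Z + 2X, the Ry ansatz, and the grid-scan "optimizer" are illustrative stand-ins for the components of Sections 2.3-2.6, and the energy is computed exactly rather than sampled:

```python
import numpy as np

# Toy problem: Ĥ = Z + 2X on one qubit, with ansatz |ψ(θ)⟩ = Ry(θ)|0⟩.
H = np.array([[1.0, 2.0], [2.0, -1.0]])   # Z + 2X as a matrix
exact = np.min(np.linalg.eigvalsh(H))     # exact ground energy (−√5)

def ansatz(theta):
    """Step 1: prepare the trial state Ry(θ)|0⟩ = (cos θ/2, sin θ/2)."""
    return np.array([np.cos(theta / 2), np.sin(theta / 2)])

def cost(theta):
    """Step 2: the energy expectation ⟨ψ(θ)|Ĥ|ψ(θ)⟩, evaluated exactly
    here instead of via sampled Pauli expectation values."""
    psi = ansatz(theta)
    return psi @ H @ psi

# Steps 3-4: a gradient-free grid scan standing in for SPSA/Nelder-Mead.
thetas = np.linspace(0, 2 * np.pi, 2001)
vqe_energy = min(cost(t) for t in thetas)
```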

This section is based on Refs. [41] and [29]. Let us now apply the theory from the previous sections to the problem introduced in Example 1 of Section 2.4. The goal is to analyze the energy landscape and estimate the ground state energy of the H₂ molecule using the VQE. Fig. 2.2 summarizes the important steps involved in solving this problem.

Before we begin: The calculations and simulations given in this section are performed using Qiskit¹⁰. Since H₂ is a small molecule, the problem is classically tractable and good classical benchmarking results are already available. Hence the reader will be able to replicate the results on their own with the limited computational resources of a personal computer.

2.7.1 Classical pre-processing

As discussed in the Introduction, the focus will be on the tools developed to solve the electronic structure problem. Due to the hybrid approach, choices made during the classical pre-processing steps also influence the overall performance and quantum resource requirements of the algorithm, as discussed in Section 2.4. So let us begin by formulating our problem and translating it into the language of second quantization. This step is classically tractable after some approximations, and for the electronic structure problem there are several computational chemistry packages readily available. In this example, the Python-based Simulations of Chemistry Framework (PySCF) [43] is used to obtain the molecular drivers required as input. The classical pre-processing stage might at first glance seem very simple with PySCF and Qiskit in terms of the number of lines of code required, but it is rich in physics. The aim of this section is to produce a self-contained

¹⁰At the time of writing, version 0.24.0.

Molecular Hamiltonian in real space

Born- Oppenheimer Approximation

Compute orbitals, Hartree-Fock state

Write in second quantized orbital basis

Transform fermionic operators to qubit operators

Update parameters

Apply parametrized unitary

Measure energy expectation values

Fig. 2.2: A flow chart adapted from [42] describing the steps required to compute molecular energies using the VQE. Here, the steps inside the dashed box represent classical preparation.

short summary of the essential knowledge required for quantum computation for molecular quantum mechanics, and the interested reader is referred to Refs. [44] and [45] for further information.

What is the electronic structure problem?

The fundamental goal in electronic structure problems is obvious: solve for the ground-state energy of many-body interacting fermionic Hamiltonians. The Hamiltonian of a molecule consisting of N_α nuclei and N_e electrons is

H = -\sum_i \frac{\hbar^2}{2 m_e} \nabla_i^2 - \sum_\alpha \frac{\hbar^2}{2 M_\alpha} \nabla_\alpha^2 - \sum_{i,\alpha} \frac{e^2}{4\pi\epsilon_0} \frac{Z_\alpha}{|\vec{r}_i - \vec{R}_\alpha|} + \frac{1}{2} \sum_{i \ne j} \frac{e^2}{4\pi\epsilon_0} \frac{1}{|\vec{r}_i - \vec{r}_j|} + \frac{1}{2} \sum_{\alpha \ne \beta} \frac{e^2}{4\pi\epsilon_0} \frac{Z_\alpha Z_\beta}{|\vec{R}_\alpha - \vec{R}_\beta|}, \qquad (2.24)

where M_α, \vec{R}_α and Z_α represent the mass, position and atomic number of the α-th nucleus, and \vec{r}_i is the position of the i-th electron. For conciseness, switching to atomic units and substituting M'_α = M_α / m_e yields

H = -\sum_i \frac{\nabla_i^2}{2} - \sum_\alpha \frac{\nabla_\alpha^2}{2 M'_\alpha} - \sum_{i,\alpha} \frac{Z_\alpha}{|\vec{r}_i - \vec{R}_\alpha|} + \frac{1}{2} \sum_{i \ne j} \frac{1}{|\vec{r}_i - \vec{r}_j|} + \frac{1}{2} \sum_{\alpha \ne \beta} \frac{Z_\alpha Z_\beta}{|\vec{R}_\alpha - \vec{R}_\beta|}. \qquad (2.25)

Since our primary interest lies in solving this for molecules, where a nucleon is over 1000 times heavier than the electron, the Born-Oppenheimer approximation is applied. As a result, the problem is simplified for a given nuclear configuration where one needs to solve only an electronic Hamiltonian given by

H_{\text{elec}} = -\sum_i \frac{\nabla_i^2}{2} - \sum_{i,\alpha} \frac{Z_\alpha}{|\vec{r}_i - \vec{R}_\alpha|} + \frac{1}{2} \sum_{i \ne j} \frac{1}{|\vec{r}_i - \vec{r}_j|}. \qquad (2.26)

The aim is to find the energy eigenstates of H_elec and the corresponding energy eigenvalues to an accuracy of at least 1.6 × 10⁻³ hartree¹¹, known as chemical accuracy. Knowing the energies up to chemical accuracy enables reaction rate calculations within an order of magnitude. This is one of the promises that quantum computers hope to one day deliver.

All about quantization, basis sets and mapping

The state under consideration is an electronic wavefunction. This means the anti-symmetric nature of the wavefunction, a consequence of Pauli's exclusion principle, should be taken into account. The anti-symmetrization can be accounted for in either first or second quantization methods. The names are largely due to historical reasons, but it is important to remember that the choice affects how the physical system is simulated on a quantum computer. First quantization methods retain the anti-symmetric nature of the wavefunctions explicitly, whereas second quantization maintains it through correct exchange statistics and the properties of the field operators. The key difference between the representations becomes more prominent in the context of fermion-to-qubit mappings. Whether the simulation is done in first or second quantization is distinct from whether a grid-based method or a basis set method is used. The details of grid-based methods and basis set methods are beyond the scope of this chapter; a summary of the different methods is given in Table 2.3. In the case of quantum simulation, the number of basis functions in the basis set and the quantization method used determine the number of gate operations required. For NISQ devices this leads to a compromise between the obtainable accuracy and the number of basis functions, due to the limited qubit numbers. The number of qubits required to store the wavefunction in each quantization method is summarized in Table 2.2.

Quantization method | Number of qubits
First quantization, basis set method | N⌈log₂(M)⌉
Second quantization, basis set method | M

Table 2.2: The number of qubits required to store the wavefunction using different quantization methods. N is the number of simulated electrons in the problem, M is the number of spin orbitals used in the basis set. The number of qubits can often be further reduced [41].

¹¹1 hartree = e²/(4πε₀a₀) = 27.211 eV

Table 2.3: Overview of the grid-based methods and basis set methods used in computational many-body problems.

Grid-based methods:
• Real space is described by a spatial grid, and the wavefunction is evaluated at every grid point; the memory required to store the wavefunction scales exponentially with the size of the system.
• Useful when simulating systems where the Born-Oppenheimer approximation is not applicable.
• First quantization: the wavefunction in position representation is explicitly anti-symmetrized over the grid points.
• Second quantization: creation operators \hat{c}^{\dagger} create an electron with a given spin at a grid point; grid-based methods in second quantization have not yet been used in any quantum or classical simulations [41].

Basis set methods:
• The Hamiltonian is projected onto M basis functions f_0, \ldots, f_{M-1}, which approximate electron spin orbitals (a Galerkin discretization); this ensures that the energy converges to the true value from above as M → ∞.
• Resource requirements are reduced by exploiting knowledge of the spatial symmetry and the general form of the orbital functions. The most common basis sets are Slater-type orbitals (STOs), Gaussian-type orbitals (GTOs), and plane wave basis sets.
• The number of basis functions used determines the number of qubits and gate operations required to solve a problem.
• First quantization: the N-electron wavefunction is written as a Slater determinant of the spin orbitals. Second quantization: the state is written in the occupation number representation in Fock space.

Once the quantization step is over, we proceed to the mapping step. Here the operators that act on indistinguishable fermions must be mapped to operators that act on distinguishable qubits. To this end, an encoding method, which is just a map from the fermionic Fock space to the qubit Hilbert space, is applied. This leads to every fermionic state being represented by a qubit state. There are multiple second quantization encoding methods, which are described next.

1. Jordan - Wigner encoding (JW)

• The occupation number of a spin orbital is stored in the |0⟩ and |1⟩ states of a qubit (unoccupied and occupied, respectively).
• The creation operator \hat{c}^{\dagger}_j of the j-th electronic orbital is associated with qubit j via

\hat{c}^{\dagger}_j \leftrightarrow \hat{Z}_1 \otimes \ldots \otimes \hat{Z}_{j-1} \otimes \hat{\sigma}^{+}_j \otimes \hat{I} \otimes \ldots, \qquad (2.27)

which combines two operations: changing the occupation number locally and applying a phase according to the parity (the string of \hat{Z} operators).
• Stores the parity non-locally and the occupation number locally.
• The large weight of the parity-counting phase can be costly for certain quantum simulation algorithms.
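The mapping of Eq. (2.27) can be checked by constructing the operators explicitly and verifying the canonical fermionic anticommutation relations (a NumPy sketch; the σ⁺ = (X − iY)/2 convention is one common choice):

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2)

def kron_all(ops):
    out = np.array([[1.0 + 0j]])
    for o in ops:
        out = np.kron(out, o)
    return out

def jw_creation(j, n):
    """Jordan-Wigner image of ĉ†_j, Eq. (2.27): Z strings on qubits < j
    carry the parity phase; σ⁺ raises the occupation of qubit j."""
    sigma_plus = (X - 1j * Y) / 2   # maps |0⟩ → |1⟩
    return kron_all([Z] * j + [sigma_plus] + [I2] * (n - j - 1))

n = 3
c = [jw_creation(j, n).conj().T for j in range(n)]  # annihilation operators

# Verify {c_i, c†_i} = I and {c_i, c_j} = {c_i, c†_j} = 0 for i ≠ j.
anti = lambda a, b: a @ b + b @ a
ok_same = np.allclose(anti(c[0], c[0].conj().T), np.eye(2 ** n))
ok_diff = (np.allclose(anti(c[0], c[1]), 0)
           and np.allclose(anti(c[0], c[1].conj().T), 0))
```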

2. Parity encoding (P) [46]

• It is the dual version of the JW mapping. • While JW stores the occupation number of each spin orbital in each qubit, the parity mapping stores the parity in each qubit. • Stores the parity locally and occupation number non-locally. • Parity operators are low weight while the occupation operators become high weight.

3. Bravyi-Kitaev encoding (BK) [47]

• Combines the advantages of the JW and P encoding methods through a non-trivial transformation.
• Applies only to systems with a spin orbital number equal to a power of 2.
• The BK mapping allows for simulation of interacting fermion systems on a graph where each node corresponds to a local fermionic mode and each edge represents a pairwise interaction [40].

2.7.2 Simulating the Hydrogen molecule

Now that we have a clear understanding of both the quantum and classical parts of the algorithm, it is time to use it to solve for the ground state energy of the H₂ molecule using the basis set method in second quantization.

Choose the basis set functions. An ideal choice of basis set function should be one that captures the physics of the problem, such that a good representation can be achieved using the smallest possible set. A good basis set [40]

• allows systematic and fast-converging extrapolation to the basis set limit,

• has a mathematical form which facilitates the evaluation of molecular integrals,

• should not only describe energies but also other useful properties.

Let us first consider the Slater-type orbitals with 3 Gaussians (STO-nG with n = 3), the split-valence basis 6-31G, and the correlation-consistent polarized valence double-ζ (cc-pVDZ) basis sets, encoded using the JW transformation, and observe how the number of qubits varies.

import matplotlib.pyplot as plt
import numpy as np
import pylab
import tikzplotlib
from qiskit import Aer
from qiskit.aqua import QuantumInstance, aqua_globals
from qiskit.aqua.algorithms import NumPyEigensolver
from qiskit.aqua.algorithms import VQE
from qiskit.aqua.components.initial_states import Zero
from qiskit.aqua.components.optimizers import SPSA
from qiskit.aqua.operators import WeightedPauliOperator
from qiskit.chemistry import FermionicOperator
from qiskit.chemistry.components.initial_states import HartreeFock
from qiskit.chemistry.components.variational_forms import UCCSD
from qiskit.chemistry.drivers import PySCFDriver, UnitsType
from qiskit.circuit.library import EfficientSU2
from qiskit.ignis.mitigation.measurement import CompleteMeasFitter
from qiskit.providers.aer import QasmSimulator, AerProvider
from qiskit.providers.aer.noise import NoiseModel
from qiskit.test.mock import *

plt.rcParams["figure.dpi"] = 100
plt.style.use("seaborn")

basis_sets = ['sto3g', '631g', 'ccpvdz']
threshold = 0.00000001

def get_qubit_hamiltonian(d, basis_set, map_type='jordan_wigner'):
    """
    Obtain the qubit operators from the fermionic operators after second
    quantization.

    Args:
        d: inter-atomic distance
        basis_set: computational chemistry basis set
        map_type: encoding method

    Returns: qubit Hamiltonian as a weighted sum of Paulis, energy shift
        due to nuclear repulsion, number of particles and number of spin
        orbitals
    """
    driver = PySCFDriver(atom="H .0 .0 .0; H .0 .0 " + str(d),
                         unit=UnitsType.ANGSTROM,
                         charge=0,
                         spin=0,
                         basis=basis_set)
    molecule = driver.run()
    h1 = molecule.one_body_integrals
    h2 = molecule.two_body_integrals
    num_particles = molecule.num_alpha + molecule.num_beta
    num_spin_orbitals = molecule.num_orbitals * 2
    repulsion_energy = molecule.nuclear_repulsion_energy
    fermionic_op = FermionicOperator(h1, h2)
    qubit_op = fermionic_op.mapping(map_type=map_type, threshold=threshold)

    return qubit_op, repulsion_energy, num_particles, num_spin_orbitals


qubit_ops = {}
for set in basis_sets:
    qubit_ops["qubitOp_{0}".format(set)], _, _, _ = get_qubit_hamiltonian(0.735, set)

for label in qubit_ops:
    print("\n--- {} ---".format(label), qubit_ops[label])

--- qubitOp_sto3g --- Representation: paulis, qubits: 4, size: 15

--- qubitOp_631g --- Representation: paulis, qubits: 8, size: 185

--- qubitOp_ccpvdz --- Representation: paulis, qubits: 20, size: 2951

According to the output, qubits is the number of qubits needed and size is the number of Pauli strings in the Hamiltonian written as a weighted sum. For example, the 4-qubit Hamiltonian obtained with the STO-3G basis set and JW encoding above is the sum of 15 weighted Pauli strings given by

$$\begin{aligned}
H &= h_0 I + h_1 Z_0 + h_2 Z_1 + h_3 Z_2 + h_4 Z_3 + h_5 Z_0 Z_1 + h_6 Z_0 Z_2 + h_7 Z_1 Z_2 \\
&\quad + h_8 Z_0 Z_3 + h_9 Z_1 Z_3 + h_{10} Z_2 Z_3 + h_{11} Y_0 Y_1 X_2 X_3 + h_{12} X_0 Y_1 Y_2 X_3 \\
&\quad + h_{13} Y_0 X_1 X_2 Y_3 + h_{14} X_0 X_1 Y_2 Y_3. \qquad (2.28)
\end{aligned}$$
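To make the structure of such a weighted Pauli sum concrete, here is a short NumPy sketch that assembles a Hamiltonian from Pauli-string labels and diagonalizes it exactly. The coefficients below are made up purely to illustrate the structure; the real $h_i$ in Eq. (2.28) come from the molecular integrals computed by the driver:

```python
import numpy as np
from functools import reduce

PAULI = {'I': np.eye(2),
         'X': np.array([[0., 1.], [1., 0.]]),
         'Y': np.array([[0., -1j], [1j, 0.]]),
         'Z': np.diag([1., -1.])}

def pauli_string(label):
    """Tensor product of single-qubit Paulis, e.g. 'ZIIZ' = Z_0 Z_3."""
    return reduce(np.kron, (PAULI[p] for p in label))

# hypothetical coefficients, chosen only to illustrate the structure
terms = {'IIII': -0.81, 'ZIII': 0.17, 'IZII': 0.17,
         'ZZII': 0.12, 'XXYY': 0.04}
H = sum(h * pauli_string(label) for label, h in terms.items())

assert H.shape == (16, 16)            # 4 qubits -> a 2^4 x 2^4 matrix
assert np.allclose(H, H.conj().T)     # Hermitian by construction
ground_energy = np.linalg.eigvalsh(H).min()
```

The exact diagonalization in the last line is what `NumPyEigensolver` does internally for the reference curve below; VQE instead estimates each Pauli term's expectation value from measurements.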

To be able to simulate the problem on a personal computer, it’s better to choose the basis that will result in fewer qubits. Therefore let’s fix the basis set to STO-3G for convenience, but do keep in mind that working in a suitably large basis set is crucial for obtaining accurate results.

bond_lengths = np.linspace(0.1, 3.5, 30)

devices = ["fake_valencia", "fake_valencia_with_mitigation"]

def construct_qinstance(device):
    """
    Create a quantum instance with required backend properties including
    noise models.

    Args:
        device: high-performance simulator backend to be used
            qasm_ideal: ideal multi-shot execution
            fake_valencia: simulation of an actual device backend
            fake_valencia_with_mitigation: error mitigation applied

    Returns: QuantumInstance object
    """
    simulator_backend = Aer.get_backend('qasm_simulator')
    device_backend = FakeValencia()
    qubit_connectivity = device_backend.configuration().coupling_map
    noise_model = NoiseModel.from_backend(device_backend)

    if device == "qasm_ideal":
        qinstance = QuantumInstance(backend=simulator_backend,
                                    seed_simulator=137)

    elif device == "fake_valencia":
        simulator = QasmSimulator(provider=AerProvider(),
                                  method='density_matrix')

        qinstance = QuantumInstance(backend=simulator,
                                    noise_model=noise_model,
                                    coupling_map=qubit_connectivity,
                                    seed_simulator=137,
                                    seed_transpiler=137)

    elif device == "fake_valencia_with_mitigation":
        simulator = QasmSimulator(provider=AerProvider(),
                                  method='density_matrix')

        qinstance = QuantumInstance(backend=simulator,
                                    seed_simulator=137,
                                    seed_transpiler=137,
                                    noise_model=noise_model,
                                    measurement_error_mitigation_cls=CompleteMeasFitter,
                                    cals_matrix_refresh_period=30)
    return qinstance


exact_energies = []

vqe_energies_effSU2 = []
vqe_energies_uccsd = []
vqe_energies_effSU2_fake = []
vqe_energies_effSU2_fake_mit = []

counts_effSU2 = []
values_effSU2 = []
counts_effSU2_fake = []
values_effSU2_fake = []
counts_effSU2_fake_mit = []
values_effSU2_fake_mit = []

counts_uccsd = []
values_uccsd = []

def store_intermediate_result_uccsd(eval_count, parameters, mean, std):
    """
    Callback function for VQE that enables storage of intermediate values.

    Args:
        eval_count: number of evaluations for convergence
        parameters: variational parameters
        mean: mean value
        std: standard deviation

    Returns:
        counts, mean
    """
    counts_uccsd.append(eval_count)
    values_uccsd.append(mean)
    return counts_uccsd, values_uccsd

def store_intermediate_result_effSU2(eval_count, parameters, mean, std):
    counts_effSU2.append(eval_count)
    values_effSU2.append(mean)
    return counts_effSU2, values_effSU2

def store_intermediate_result_effSU2_fake(eval_count, parameters, mean, std):
    counts_effSU2_fake.append(eval_count)
    values_effSU2_fake.append(mean)
    return counts_effSU2_fake, values_effSU2_fake

def store_intermediate_result_effSU2_fake_mit(eval_count, parameters, mean, std):
    counts_effSU2_fake_mit.append(eval_count)
    values_effSU2_fake_mit.append(mean)
    return counts_effSU2_fake_mit, values_effSU2_fake_mit


for dist in bond_lengths:
    qubitOp, shift, num_particles, num_spin_orbitals = \
        get_qubit_hamiltonian(dist, basis_set='sto3g')
    result = NumPyEigensolver(qubitOp).run()
    exact_energy = np.real(result.eigenvalues) + shift
    exact_energies.append(exact_energy)
    initial_state = HartreeFock(num_spin_orbitals,
                                num_particles,
                                two_qubit_reduction=False,
                                qubit_mapping='jordan_wigner')
    optimizer = SPSA(maxiter=200)

    ansatz = [UCCSD(num_orbitals=num_spin_orbitals,
                    num_particles=num_particles,
                    initial_state=initial_state,
                    qubit_mapping='jordan_wigner',
                    two_qubit_reduction=False),
              EfficientSU2(num_qubits=qubitOp.num_qubits,
                           entanglement='linear',
                           initial_state=Zero(qubitOp.num_qubits))]
    for psi in ansatz:
        if psi == ansatz[0]:
            print("---- UCCSD ----")
            vqe = VQE(qubitOp, psi, optimizer,
                      callback=store_intermediate_result_uccsd)
            vqe_result = np.real(vqe.run(construct_qinstance('qasm_ideal'))['eigenvalue'] + shift)
            vqe_energies_uccsd.append(vqe_result)
            print("Interatomic Distance:", np.round(dist, 2), "VQE Result:", vqe_result)

        elif psi == ansatz[1]:
            print("---- Hardware Efficient Ansatz ----")
            vqe = VQE(qubitOp, psi, optimizer,
                      callback=store_intermediate_result_effSU2)
            vqe_result = np.real(vqe.run(construct_qinstance('qasm_ideal'))['eigenvalue'] + shift)
            vqe_energies_effSU2.append(vqe_result)
            print("Interatomic Distance:", np.round(dist, 2), "VQE Result:", vqe_result)

            for device in devices:
                if device == "fake_valencia":
                    vqe = VQE(qubitOp, psi, optimizer,
                              callback=store_intermediate_result_effSU2_fake)
                    vqe_result = np.real(vqe.run(construct_qinstance(device))['eigenvalue'] + shift)
                    vqe_energies_effSU2_fake.append(vqe_result)
                elif device == "fake_valencia_with_mitigation":
                    vqe = VQE(qubitOp, psi, optimizer,
                              callback=store_intermediate_result_effSU2_fake_mit)
                    vqe_result = np.real(vqe.run(construct_qinstance(device))['eigenvalue'] + shift)
                    vqe_energies_effSU2_fake_mit.append(vqe_result)
                print('--- {} ---'.format(device), "Interatomic Distance:",
                      np.round(dist, 2), "VQE Result:", vqe_result)

----UCCSD---- Interatomic Distance: 0.1 VQE Result: 2.7119857007258554 ----Hardware Efficient Ansatz---- Interatomic Distance: 0.1 VQE Result: 3.3340488838888866 ---fake_valencia--- Interatomic Distance: 0.1 VQE Result: 4.139296024741463 ---fake_valencia_with_mitigation--- Interatomic Distance: 0.1 VQE Result: 3.90308358318452

----UCCSD---- Interatomic Distance: 0.24 VQE Result: -0.2514831954357013 ----Hardware Efficient Ansatz---- Interatomic Distance: 0.24 VQE Result: 1.4645029490669041 ---fake_valencia--- Interatomic Distance: 0.24 VQE Result: 1.233062149233312 ---fake_valencia_with_mitigation--- Interatomic Distance: 0.24 VQE Result: 1.4417877166500799

----UCCSD---- Interatomic Distance: 0.38 VQE Result: -0.8781735458694451 ----Hardware Efficient Ansatz---- Interatomic Distance: 0.38 VQE Result: -0.4645055062588499 ---fake_valencia--- Interatomic Distance: 0.38 VQE Result: 0.5091323601539339 ---fake_valencia_with_mitigation--- Interatomic Distance: 0.38 VQE Result: 0.7053884629541658

----UCCSD---- Interatomic Distance: 0.52 VQE Result: -1.077946447019994 ----Hardware Efficient Ansatz---- Interatomic Distance: 0.52 VQE Result: -1.0465178360766256 ---fake_valencia--- Interatomic Distance: 0.52 VQE Result: 0.2101641377953375 ---fake_valencia_with_mitigation--- Interatomic Distance: 0.52 VQE Result: 0.43395689781122093

----UCCSD---- Interatomic Distance: 0.67 VQE Result: -1.126834155799024 ----Hardware Efficient Ansatz---- Interatomic Distance: 0.67 VQE Result: -0.8147573466213908 ---fake_valencia--- Interatomic Distance: 0.67 VQE Result: -0.21131100514472856 ---fake_valencia_with_mitigation--- Interatomic Distance: 0.67 VQE Result: 0.0006752562516127991

----UCCSD---- Interatomic Distance: 0.81 VQE Result: -1.1410969468908667 ----Hardware Efficient Ansatz---- Interatomic Distance: 0.81 VQE Result: -1.0559514724145356 ---fake_valencia--- Interatomic Distance: 0.81 VQE Result: -0.24018451331369006 ---fake_valencia_with_mitigation--- Interatomic Distance: 0.81 VQE Result: -0.028159296028826053

----UCCSD---- Interatomic Distance: 0.95 VQE Result: -1.1066470715561798 ----Hardware Efficient Ansatz---- Interatomic Distance: 0.95 VQE Result: -0.981918023471895 ---fake_valencia--- Interatomic Distance: 0.95 VQE Result: -0.35585277698967366 ......

plt.plot(bond_lengths, exact_energies, label='NumPy Minimum Eigensolver')
plt.scatter(bond_lengths, vqe_energies_effSU2,
            label='VQE - Hardware efficient ansatz', color='g', marker='o')
plt.scatter(bond_lengths, vqe_energies_effSU2_fake,
            label='VQE - Hardware efficient ansatz - Valencia', marker='o')
plt.scatter(bond_lengths, vqe_energies_effSU2_fake_mit,
            label='VQE - Hardware efficient ansatz - Valencia - error mit',
            color='m', marker='o')
plt.scatter(bond_lengths, vqe_energies_uccsd, label='VQE - UCCSD',
            color='r', marker='o')

plt.xlabel('Interatomic distance')
plt.ylabel('Energy')
plt.title(r'$H_2$ Ground State Energy')
plt.legend(loc='upper right')

[Figure: $H_2$ ground state energy as a function of interatomic distance, showing the NumPy Minimum Eigensolver reference curve together with the VQE results for the hardware-efficient ansatz (ideal simulator, Valencia noise model, and Valencia with error mitigation) and for the UCCSD ansatz.]

It is clear from the plot how noise from the hardware can influence the calculation, even for something as small as H2 in the STO-3G basis. Changing to a different basis set will lead to more qubits and deeper circuits. In principle it is possible to achieve reasonably good results with the hardware-efficient ansatz. This simulation is just an exercise and hence does not give extremely good results, but very accurate results were achieved for H2 with the hardware ansatz in Ref. [29]. As for the UCCSD ansatz, it almost coincides with the exact calculation, but the entanglement requirements and deeper circuits will be a problem when extending it to larger molecules. As an exercise, one can try to improve the optimization by using a different optimizer or by initializing the optimizer with a suitable initial value. Another interesting task would be to look at how the optimizer converges. The function store_intermediate_result will store the intermediate values that the user can retrieve through the VQE callback. One can look at the execution counts to understand how the optimizer is performing. This usually falls under the topic of classical optimization and is therefore omitted from this example.

2.8 Outlook and conclusions

Will NISQ devices running VQAs be able to outperform classical algorithms that find solutions to the same computational problems? Nobody knows yet, but there is hope. Even if the early generations of NISQ devices are incapable of competing with classical methods that were honed over decades of research, experimental results encourage the hope that VQAs can soon surpass classical methods, and so spur further scientific advancements. Given that variational algorithms are heuristic and provide only approximate solutions, one might wonder whether there will be value in these strategies once error correction becomes possible, especially compared to quantum algorithms with a proven advantage on the same tasks. The first thing to acknowledge is that composing algorithms with proven advantage is challenging, and so far only a relatively small number of such algorithms have been discovered. Therefore, for many applications, variational algorithms may be the only quantum algorithms available, even in the error-correction era. A second consideration is that variational algorithms are intended to optimize the use of quantum resources, and it is therefore likely that their error-corrected implementations are more efficient than implementations of their counterparts with proven advantage. Consequently, such quantum algorithms might still rival traditional algorithms regarding computational cost. Furthermore, since formal demonstrations of quantum advantage are based on asymptotic considerations, there is a chance for variational algorithms to be more efficient and adequately accurate for particular instances of a problem. Finally, the third consideration is that variational algorithms can be combined with traditional quantum algorithms to obtain even more powerful approaches. A concrete example of this is the combination of QPE and VQE.
Procedures employed for simulating quantum systems with provable speed-up, such as QPE, require the preparation of states with sufficient overlap with the eigenstates of the Hamiltonian to measure eigenvalues with high probability. Using VQE, it will be possible to prepare such states, ultimately boosting the success probability of phase estimation. Finally, it must be pointed out that the success of variational methods ultimately depends on the quality of the quantum devices and the quantum operations they implement. Therefore, maximizing the utility of quantum devices, in particular early NISQ devices, is another way to push quantum computing towards the practical frontier. These improvements can be introduced at the software level by finding more efficient protocols to implement quantum operations common to many algorithms. The work presented here is only a part of the continuously growing body of research in variational algorithms. The journey of this field has just started, and many exciting developments await us.

App. 2.A The variation principle

Consider an arbitrary physical system with Hamiltonian $H$. The time-independent Schrödinger equation then reads

$$H|\varphi_n\rangle = E_n|\varphi_n\rangle; \qquad n = 0, 1, 2, \ldots \qquad (2.29)$$

Although the Hamiltonian $H$ is known, this is not necessarily the case for its eigenvalues $E_n$ and the corresponding eigenstates $|\varphi_n\rangle$. Hence, variational methods are most useful when an exact diagonalization of $H$ is not feasible or not known a priori. Such instances of $H$ also come with a drawback, as it is difficult, if not impossible, to evaluate the error in a calculation when the final exact solution is unknown.
Following Refs. [44, 48, 49], the solution of Eq. (2.29) with the approximate state $|\tilde{0}\rangle$ is recast in the form of a variation principle for the energy, written as an expectation value:

$$E[\tilde{0}] = \frac{\langle \tilde{0}|H|\tilde{0}\rangle}{\langle \tilde{0}|\tilde{0}\rangle}. \qquad (2.30)$$

Here, $|\tilde{0}\rangle$ denotes some approximate eigenstate. The square brackets indicate that the energy functional depends on the form of the wavefunction rather than on a set of parameters¹². As a first step, a one-to-one relationship between the solutions to the Schrödinger equation and the stationary points of the energy functional $E[\tilde{0}]$ is established. Let $|0\rangle$ represent a solution to the Schrödinger equation (2.29) and $|\delta\rangle$ an allowed variation. The approximate state to the eigenstate is then given by

$$|\tilde{0}\rangle = |0\rangle + |\delta\rangle. \qquad (2.31)$$

Inserting (2.31) in (2.30) followed by an expansion in orders of $|\delta\rangle$ around $|0\rangle$ yields

$$\begin{aligned}
E[0+\delta] &= \frac{\langle 0|H|0\rangle + \langle 0|H|\delta\rangle + \langle \delta|H|0\rangle + \langle \delta|H|\delta\rangle}{\langle 0|0\rangle + \langle 0|\delta\rangle + \langle \delta|0\rangle + \langle \delta|\delta\rangle} \\
&= E_0 + \langle 0|H - E_0|\delta\rangle + \langle \delta|H - E_0|0\rangle + O(\delta^2) \\
&= E_0 + O(\delta^2). \qquad (2.32)
\end{aligned}$$

The first-order variation in $E[\tilde{0}]$ therefore vanishes whenever $|\tilde{0}\rangle$ corresponds to one of the eigenstates $|0\rangle$. This shows that the eigenstates of the Schrödinger equation represent stationary points of the energy functional.
Conversely, to show that all stationary points of $E[\tilde{0}]$ represent eigenstates of the Schrödinger equation, let $|0\rangle$ be a stationary point of the energy functional. For a variation $|\delta\rangle$, an expansion of the energy functional around the stationary point similar to (2.32) gives

$$\langle 0|H - E[0]|\delta\rangle + \langle \delta|H - E[0]|0\rangle = 0. \qquad (2.33)$$

For the variation $i|\delta\rangle$,

$$\langle 0|H - E[0]|\delta\rangle - \langle \delta|H - E[0]|0\rangle = 0. \qquad (2.34)$$

Combining Eq. (2.33) and Eq. (2.34), one obtains

$$\langle \delta|H - E[0]|0\rangle = 0. \qquad (2.35)$$

Since this relation holds for any arbitrary $|\delta\rangle$, the eigenvalue equation is then

$$H|0\rangle = E[0]|0\rangle. \qquad (2.36)$$

This shows that each stationary point $E[0]$ of the energy functional $E[\tilde{0}]$ also represents a solution to the Schrödinger equation with eigenvalue $E[0]$. To summarize in one line, the variational principle states that the solutions of the Schrödinger equation (2.29) are equivalent to a variational optimization of the energy functional (2.30).
Due to their flexible nature, variational methods can be adapted to very diverse problem scenarios. They also give great scope to physical intuition in the choice of trial function. It is important to keep in mind that even though good approximations of eigenvalues are obtained rather easily, the approximate states may come with certain completely unpredictable erroneous features that cannot be checked.

¹²In most applications, as in VQAs, the wavefunction is described in terms of a finite set of parameters. In such cases, the usual notation for functions is used.
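The stationarity argument above can be checked numerically. The following sketch (plain NumPy, with a random Hermitian matrix standing in for $H$) verifies that perturbing an exact eigenvector changes the Rayleigh quotient (2.30) only at second order in the perturbation:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
H = (A + A.T) / 2                     # random Hermitian stand-in for H

def energy(v):
    """Rayleigh quotient E[v] = <v|H|v> / <v|v>, the functional (2.30)."""
    return (v @ H @ v) / (v @ v)

evals, evecs = np.linalg.eigh(H)
E0, v0 = evals[0], evecs[:, 0]        # exact ground state and its energy

delta = rng.normal(size=4)
delta /= np.linalg.norm(delta)        # an arbitrary normalized variation

# the first-order variation vanishes: the error is O(eps^2)
errs = [abs(energy(v0 + eps * delta) - E0) for eps in (1e-2, 1e-3)]
assert errs[1] < errs[0] / 10         # error shrinks ~100x, not ~10x
assert energy(v0 + 0.1 * delta) >= E0 # variational upper-bound property
```

The last assertion is the practical content of the variational principle for VQE: any trial state gives an energy at or above the true ground state energy.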

App. 2.B Pauli basis

The contents of this section are from [23].

Definition 2.B.1. The state of an N-qubit quantum register is represented by a norm-1 vector in the Hilbert space $\mathcal{H} = \mathbb{C}^{2^N}$, under the association $|\psi\rangle \equiv e^{i\phi}|\psi\rangle$ for $\phi \in \mathbb{R}$.

Definition 2.B.2. The Pauli basis on N qubits is defined as $\mathcal{P}_N := \{I, X, Y, Z\}^{\otimes N}$, where $I, X, Y, Z$ are the $2 \times 2$ matrices on $\mathbb{C}^2$:

$$I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad Y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad Z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$$

Theorem 2.B.1. The Pauli basis is a basis for the set of $2^N \times 2^N$ complex-valued matrices. It is also a basis for the set of Hermitian matrices if one chooses real coefficients.
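The basis property rests on the Hilbert-Schmidt orthogonality of Pauli strings, $\mathrm{Tr}(PQ) = 2^N \delta_{PQ}$ for $P, Q \in \mathcal{P}_N$: the $4^N$ mutually orthogonal strings exactly span the $4^N$-dimensional matrix space. A brief NumPy check for $N = 2$ (the helper names are ours):

```python
import numpy as np
from itertools import product
from functools import reduce

P1 = {'I': np.eye(2),
      'X': np.array([[0., 1.], [1., 0.]]),
      'Y': np.array([[0., -1j], [1j, 0.]]),
      'Z': np.diag([1., -1.])}

def pauli(label):
    """Tensor product of single-qubit Paulis, e.g. 'XZ' = X (x) Z."""
    return reduce(np.kron, (P1[c] for c in label))

N = 2
labels = [''.join(t) for t in product('IXYZ', repeat=N)]

# Hilbert-Schmidt orthogonality: Tr(P Q) = 2^N * delta_{PQ}
for a in labels:
    for b in labels:
        inner = np.trace(pauli(a) @ pauli(b)).real
        expected = 2**N if a == b else 0.0
        assert abs(inner - expected) < 1e-12
```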

Remark. The Pauli basis is not a group under matrix multiplication, as the single-qubit Pauli matrices pick up a factor of $i$ on multiplication. The closure of the Pauli basis is the Pauli group $\Pi_N = \{\pm 1, \pm i\} \times \mathcal{P}_N$; this is four times as large, and no longer has the basis properties of $\mathcal{P}_N$.

Some useful properties of $\mathcal{P}_N$:

• $P^2 = I$ for all $P \in \mathcal{P}_N$.

• For $P, Q \in \mathcal{P}_N$, either $[P, Q] = 0$ or $\{P, Q\} = 0$, and $P$ commutes with precisely half of $\mathcal{P}_N$.

• Every $P \in \mathcal{P}_N$ with $P \neq I$ has only two eigenvalues, $\pm 1$, and the dimension of the corresponding eigenspaces is precisely $2^{N-1}$ (i.e. each $P$ divides $\mathbb{C}^{2^N}$ in two).

• This division by two may be continued further: given $P, Q \neq I$ such that $[P, Q] = 0$, $P$ and $Q$ divide the Hilbert space into 4 eigenspaces (labeled by combinations of their eigenvalues).

• To generalize, one can form an $[N, k]$ stabilizer group $\mathcal{S}$, generated by $k$ commuting, Hermitian, non-generating elements of $\mathcal{P}_N$ (up to a complex phase); this diagonalizes $\mathbb{C}^{2^N}$ into $2^k$ unique eigensectors of dimension $2^{N-k}$. When $N = k$, these sectors contain single eigenstates, which are called stabilizer states.

• Given such a stabilizer state $|\psi\rangle$ and Hermitian $P \in \mathcal{P}_N$, either $P|\psi\rangle = \pm|\psi\rangle$ or $\langle\psi|P|\psi\rangle = 0$.

Chapter 3

Topological Quantum Error Correction: Codes and Decoders

Orkun Şensebat Supervisor: Eliana Fiorelli

Quantum computation is notoriously vulnerable to noise, to an extent that outside interference remains a major hurdle for the realization of large-scale, fault-tolerant, universal quantum computers. Topological error correcting codes realize large error thresholds and already show promise for real-world quantum computing. For this reason, we offer an introduction to the theory of topological quantum error correction that is targeted even at readers entirely new to the field of quantum information processing. The framework that this theory is built upon is the stabilizer formalism, which is derived somewhat axiomatically yet concisely in the first part. The reader will learn the concepts of toric codes, rotated surface codes and even general surface codes as quickly as possible. We conclude the report with a brief discussion of decoders, present simulated data for the rotated 'surface-17' code, and estimate the error threshold for an independent noise model of random bit flips and an optimal decoder at 7.5%.

3.1 Introduction

The example we concentrate on in this introduction is the bit flip error and bit flip code. In the subsequent section the stabilizer formalism is presented in a mostly axiomatic approach but guided by the then familiar bit flip code. This stabilizer formalism is fundamental for topological error correction and will be the mathematical framework of this report.

The Bit Flip Error

Suppose $\mathcal{H}$ is a 1-qubit Hilbert space with computational basis $\mathcal{B} = \{|0\rangle, |1\rangle\}$. The bit flip error is described by a unitary operator $X$ that acts as

$$X|0\rangle = |1\rangle, \qquad X|1\rangle = |0\rangle.$$

$X$ exchanges the ground state and the excited state. It is represented by one of the Pauli matrices,

$$\sigma_X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},$$

which is why this class of discrete errors is sometimes called Pauli errors.¹ A quantum bit can encode not only two distinct states but an entire continuum of distinct states. As such, it is intuitive that a quantum system is more sensitive to outside interference. We will need a way to detect these errors and to clean up undesired artefacts. If the degradation is too high, we risk losing the encoded quantum information. [3]

Difficulties

These errors are inevitable and a bigger problem than in classical computing, as will be illustrated in this section. One caveat in quantum physics is that we cannot retrieve the quantum information of a qubit to see if it might have changed. If we could, we could simply do some advanced bookkeeping and hope to detect random changes through noise that way. But to read out a qubit, we have to perform a measurement, and it is in the nature of quantum physics that measurement can destroy quantum information. [49]
To illustrate this, imagine we measured the qubit in the computational basis, i.e. measured $Z$. If the initial state $|\alpha\rangle$ were vertically aligned in the Bloch sphere, that is, if it were one of the two eigenstates $|a\rangle \in \mathcal{B}$, then the measurement outcome would be $\pm 1$ with certainty. However, we do not know a priori if the initial state is vertically aligned in the Bloch sphere. It could point in any direction other than the poles, that is, it could be a superposition of the two basis states $|0\rangle$ and $|1\rangle$. In this case, mother nature would flip a coin and destroy the quantum information. After measurement the state would be in one of the two eigenspaces. In simpler terms: a measurement would align a qubit vertically, even if it was not aligned prior to that. A further caveat arises from the no-cloning theorem, which states that quantum information cannot be cloned. [3]
There is still a way around these difficulties. By using a CNOT gate we can repeat the quantum information from our initial qubit. We encode the quantum information by distributing it redundantly across multiple qubits:

$$|0\rangle \mapsto |00\rangle, \qquad |1\rangle \mapsto |11\rangle.$$

Note that this does not clone the first qubit into a second, independent qubit. The two qubits do not form two independent 1-qubit Hilbert spaces $\mathcal{H}_1$. Instead, they live in the highly entangled two-qubit Hilbert space $\mathcal{H}_2 = \mathcal{H}_1 \otimes \mathcal{H}_1$. This entire approach is similar to what someone would do during a phone call with bad reception. If noise is a problem, we simply repeat the last thing we have said.

The Bit Flip Code

Fig. 3.1 shows a simple three qubit circuit with two CNOT gates. It repeats an input qubit twice to end up with an output according to the map

$$|0\rangle \to |000\rangle, \qquad |1\rangle \to |111\rangle$$

$|000\rangle$ and $|111\rangle$ are called code words or logical states and are often denoted $|0_L\rangle$ and $|1_L\rangle$. The code words build the basis of the bit flip code or code space. It is possible to encode the physical input state $|a\rangle \in \mathcal{H}_1$ in the code word $|aaa\rangle$.

¹In the general literature, Pauli operators are commonly denoted with the σ symbol. However, in quantum computation it can be cumbersome to carry all the redundant σ symbols around. We will be using the notation X instead.

Fig. 3.1: Encoding circuit for the bit flip code (the input qubit $|\alpha\rangle$ together with two ancillary qubits initialized to $|0\rangle$).
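A quick linear-algebra sketch of the circuit in Fig. 3.1, using the standard projector construction of the CNOT gate (the helper names are ours), confirms the encoding map:

```python
import numpy as np
from functools import reduce

I2 = np.eye(2)
X = np.array([[0., 1.], [1., 0.]])
P0, P1 = np.diag([1., 0.]), np.diag([0., 1.])   # projectors |0><0|, |1><1|

def kron(*ops):
    return reduce(np.kron, ops)

def cnot(control, target, n=3):
    """CNOT on n qubits via the projector decomposition."""
    ops0 = [I2] * n; ops0[control] = P0
    ops1 = [I2] * n; ops1[control] = P1; ops1[target] = X
    return kron(*ops0) + kron(*ops1)

# the two CNOTs of Fig. 3.1, from the input qubit onto the ancillas
encode = cnot(0, 2) @ cnot(0, 1)

def ket(bits):                         # qubit 1 of the text = index 0 here
    v = np.zeros(8); v[int(bits, 2)] = 1.0
    return v

alpha, beta = 0.6, 0.8                 # an arbitrary input amplitude pair
psi_in = alpha * ket('000') + beta * ket('100')   # |alpha> (x) |00>
psi_out = encode @ psi_in
assert np.allclose(psi_out, alpha * ket('000') + beta * ket('111'))
```

The superposition is carried over into the entangled code word, without the individual amplitudes ever being read out, which is exactly how the encoding sidesteps the no-cloning restriction.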

When we model noise, we give each qubit a probability $p$ to flip, which leaves a probability $1-p$ to remain. If $p$ is too large, say $p > 0.5$, then correct error detection becomes hopeless. Let us therefore assume reasonably small error probabilities. A single flip then occurs with probability $3p(1-p)^2 \in O(p)$. In contrast, the probability to observe two flips at once is $3p^2(1-p) \in O(p^2)$, smaller by a factor $p/(1-p)$. A quantitative analysis of physical and logical error rates can be found in section 3.3.3.
Decoding $|000\rangle$ or $|111\rangle$ is straightforward. But $\mathcal{H}_3$ is much larger than the code space $\mathcal{C} \cong \mathcal{H}_1$. How do we decode a state such as $|010\rangle$? By estimating the most likely error that could cause this particular kind of misalignment. It is much more likely that the logical state initially was $|000\rangle$, but that qubit 2 flipped due to independent noise. It remains possible, however far less likely, that the logical state initially was $|111\rangle$ and that we observed two independent bit flips on qubits 1 and 3 at once. The most likely outcome can be selected through a simple majority vote.
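A small Monte Carlo sketch (with an illustrative value of $p$ and our own function names) confirms that the majority vote fails only at order $p^2$:

```python
import random
random.seed(1)

p = 0.05            # illustrative physical bit-flip probability
trials = 200_000

def majority_vote_fails():
    """True if >= 2 of the 3 qubits flip, so the vote decodes wrongly."""
    flips = sum(random.random() < p for _ in range(3))
    return flips >= 2

rate = sum(majority_vote_fails() for _ in range(trials)) / trials
exact = 3 * p**2 * (1 - p) + p**3     # two or three simultaneous flips
assert abs(rate - exact) < 0.003      # Monte Carlo agrees with the O(p^2) rate
```

For $p = 0.05$ the logical failure rate is below one percent, an order of magnitude better than the physical rate, which is the whole point of the redundant encoding.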

Parity

We are back to our original caveat. If we simply measured our quantum state by measuring the three observables $Z_1$, $Z_2$ and $Z_3$, we would destroy our quantum information. But as illustrated earlier, we are only interested in the alignment of the qubits.²
What if we measured $Z_1 Z_2$? The result is $+1$ if qubits 1 and 2 are in the same computational state and $-1$ if they are in distinct states. We call the first case even parity and the latter odd parity.³ By measuring $Z_1 Z_2$ we measure parity. If parity is odd, then we know qubit 1 or qubit 2 flipped. If it is even, then they have not flipped, or both independently flipped at the same time. However, that does not help detect bit flips on qubit 3. We will require an observable that acts non-trivially on the third qubit. Either of $Z_2 Z_3$ and $Z_1 Z_3$ does the job, and we only need one of them.⁴ If we are able to measure the parity of qubits 1 and 2 and the parity of qubits 2 and 3, we can thus figure out if an error occurred and even triangulate where it occurred.

$Z_1 Z_2$   $Z_2 Z_3$   Error
Even        Even        $I$
Odd         Even        $X_1$
Odd         Odd         $X_2$
Even        Odd         $X_3$

Fig. 3.2: Parity table for the three-qubit code. This look-up table allows one to implement an optimal decoder, as will be presented in section 3.3.3.
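The look-up table of Fig. 3.2 translates directly into a tiny decoder. In this sketch (the helper names are ours; $+1$ denotes even parity and $-1$ odd parity), the syndrome pair indexes the most likely single-qubit error:

```python
# look-up table of Fig. 3.2: syndrome -> most likely error
DECODER = {(+1, +1): 'I', (-1, +1): 'X1', (-1, -1): 'X2', (+1, -1): 'X3'}

def parity(bits, i, j):
    """+1 if qubits i and j agree, -1 otherwise (the Z_i Z_j outcome)."""
    return +1 if bits[i] == bits[j] else -1

def decode(bits):
    """Return the most likely single-qubit error for a 3-bit state."""
    return DECODER[(parity(bits, 0, 1), parity(bits, 1, 2))]

assert decode([0, 0, 0]) == 'I'     # code word: no correction needed
assert decode([0, 1, 0]) == 'X2'    # single flip on qubit 2 triangulated
assert decode([1, 1, 0]) == 'X3'    # works on either code word background
```

Note that the decoder only ever sees the two parity bits, never the encoded amplitudes, which is the point of the parity measurement discussed next.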

²$Z_1 \equiv Z \otimes I \otimes I$.
³For historic reasons, one often finds the word syndrome or error syndrome instead of parity. This wording is borrowed from disease diagnosis.
⁴While $Z \otimes Z \otimes Z$ does detect arbitrary bit flips, it does not help us find out which qubit flipped.

So far we have not explained how measuring parity is any better than measuring the entire qubit via $Z_1$, $Z_2$ or $Z_3$ directly. When we measure a combination of two $Z_i$ operators, we are only extracting the parity information, that is, the information about the alignment of both qubits. The actual quantum state and, thus, the quantum information is neither touched nor destroyed. Our encoded information will remain exactly the same with certainty. To realize parity measurements, extra qubits are required: so-called ancillary qubits. These ancillas will be in the ground or excited state depending on whether the parity is even or odd. [3]

3.2 Quantum Error Correction

3.2.1 The Stabilizer Formalism

In the preceding section our analysis of the bit flip code led us to consider a way to detect errors via unitary operators such as $Z_1 Z_2$. These operators play a special role and are called stabilizer operators because they act on the code space in a way that leaves it invariant. In this section we formulate the basic theory of quantum error correction.

Concepts of Quantum Error Correction

Definition The set

$$G_1 := \{\pm I, \pm iI, \pm X, \pm iX, \pm Y, \pm iY, \pm Z, \pm iZ\}$$

is called the single-qubit Pauli group. To prove that the set actually forms a group, a multiplication table is required, which we forego in this introduction. Note that the signs and factors of $i$ are needed because products like $XY = iZ$ introduce the imaginary unit. The Pauli group can be generalised to $n$ qubits via the tensor product and is denoted $G_n$.
A familiarity with Pauli operators is a requirement to understand quantum error correction. We present a quick review of their properties but refer the reader to [3] and [49] for proofs and a full overview, as it would be redundant to verify the properties explicitly.

Theorem Suppose $n \in \mathbb{N}$ and $\sigma \in G_n$. Then

1. $[\sigma, \sigma'] = 0$ or $\{\sigma, \sigma'\} = 0$ for all $\sigma' \in G_n$,
2. $\sigma^2 = \pm I^{\otimes n}$,
3. $\sigma^\dagger = \pm\sigma$,

where the signs in 2 and 3 are $+$ for elements with phase $\pm 1$. From the second and third properties it follows that Pauli operators have to be unitary. We already illustrated the simplest possible error in the introduction: the bit flip error $X$. In this report, it is sufficient to regard the following discrete set of errors.

Definition Suppose that $n \in \mathbb{N}$ and $j \in \{1, \ldots, n\}$. A unitary operation $E: \mathcal{H}_n \to \mathcal{H}_n$ is called a Pauli error or error on the $j$-th qubit if $E$ is a linear combination of $I$, $X_j$, $Z_j$ and $X_j Z_j$ with $X_j, Z_j \in G_n$.

Quantum states are encoded by a unitary operation into a quantum error-correcting code $\mathcal{C}$. Formally speaking, $\mathcal{C}$ is a subspace of some larger Hilbert space $\mathcal{H}$. [3]
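The commute-or-anticommute property and the role of the phases can be verified directly for the single-qubit case with a few lines of NumPy:

```python
import numpy as np

I = np.eye(2)
X = np.array([[0., 1.], [1., 0.]])
Y = np.array([[0., -1j], [1j, 0.]])
Z = np.diag([1., -1.])

# every pair of single-qubit Paulis either commutes or anti-commutes
for a in (I, X, Y, Z):
    for b in (I, X, Y, Z):
        commutes = np.allclose(a @ b - b @ a, 0)
        anticommutes = np.allclose(a @ b + b @ a, 0)
        assert commutes or anticommutes

# products introduce factors of i, e.g. XY = iZ, hence the phases in G_1
assert np.allclose(X @ Y, 1j * Z)
# elements with a +/-i phase square to -I, so G_1 needs the signs as well
assert np.allclose((1j * X) @ (1j * X), -I)
```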

The Stabilizer

Definition Suppose that $n, k \in \mathbb{N}$. Suppose $\mathcal{S}$ is an abelian subgroup of the $n$-qubit Pauli group $G_n$ and has $n - k$ independent generators. The vector space defined as

$$[n, k] := \{|\alpha\rangle \in \mathcal{H} \mid S|\alpha\rangle = |\alpha\rangle \ \forall S \in \mathcal{S}\} \leq \mathcal{H}$$

is called a stabilizer code. $\mathcal{S}$ is called the stabilizer of $[n, k]$.

So the stabilizer is the group of Pauli operators that leave all code words and, due to linearity, the entire code space invariant. [50]

We already know a set S such that S|α⟩ = |α⟩ from the last section. Let n = 3 and S = ⟨Z_1Z_2, Z_2Z_3⟩. The subspace fixed by Z_1Z_2 is spanned by {|aab⟩ | a, b ∈ Z/2Z} and the subspace fixed by Z_2Z_3 is spanned by {|abb⟩ | a, b ∈ Z/2Z}. The intersection is the set of the states |000⟩ and |111⟩, which span the code space [3, 1]: the bit flip code.

Theorem S1. Suppose that n, k ∈ N and suppose S ≤ G_n is the stabilizer of [n, k] ≠ {0}. Then

1. −I ∉ S
2. [S, S′] = 0 for all S, S′ ∈ S
3. |S| = 2^(n−k)

Proof. 1) If −I ∈ S, then (−I)|α⟩ = |α⟩ would force |α⟩ = 0 for every code word, contradicting [n, k] ≠ {0}.

2) S ≤ G_n, so S and S′ either commute or anti-commute. But SS′, S′S ∈ S, and since SS′ = (−I)S′S would imply −I ∈ S, part 2 follows from part 1.

3) S has n − k independent generators. For an abelian group of self-inverse operators, any element can be written as a bit-string over the generators; the algebra is inherited directly from Z_2^(n−k). The neutral element is represented by the string of zeroes, and by the string of zeroes alone: using the inverse element, no generator can be written as a product of the other generators. Hence every stabilizer element is described by a unique bit-string, and the order must be 2^(n−k).

S = {I, Z_1Z_2, Z_2Z_3, Z_1Z_3} is a stabilizer of [3, 1]. It is isomorphic to the Klein four-group Z/2Z × Z/2Z (every element is self-inverse) and, thus, a group. Evidently, theorem S1 holds: −I is not a stabilizer element, all S ∈ S commute because their constituents are I and Z, which commute with themselves and each other, and S has order 4 = 2^(3−1).

When we describe a code in the stabilizer formalism, we are interested in a minimal set of independent generators. By making use of the group structure we can avoid a lot of extra work. For instance, to prove that an operator A commutes with the stabilizer S = ⟨S_1, S_2, …⟩, it is enough to show that it commutes with the stabilizer generators S_1, S_2, ….⁵ We leave this important missing bit of information as a theorem without a proof and refer the reader to a study of chapter 4 in [3].

Theorem S2. k can be identified as the number of encoded logical qubits.

In our introduction it was rather straightforward to detect and correct the bit flip error. This process is called decoding. Other decoders are not as intuitive. We perform measurements of the stabilizer generators. Their outcomes are called the error syndrome and yield parity information without destroying the encoded quantum information. We then have to decide which error is most likely responsible for this error syndrome. Successful detection allows us to apply the right correction strategy to recover from the error.
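The bit flip example can be verified numerically: projecting onto the joint +1 eigenspace of the two generators recovers a two-dimensional code space spanned by |000⟩ and |111⟩. An illustrative numpy check (not taken from the report):

```python
import numpy as np
from functools import reduce

I2 = np.eye(2)
Z = np.diag([1.0, -1.0])
kron = lambda *ops: reduce(np.kron, ops)

Z1Z2, Z2Z3 = kron(Z, Z, I2), kron(I2, Z, Z)

# Projector onto the simultaneous +1 eigenspace of both generators
P = (np.eye(8) + Z1Z2) @ (np.eye(8) + Z2Z3) / 4
dim = round(np.trace(P).real)              # dimension of the code space

e000, e111 = np.eye(8)[0], np.eye(8)[7]    # |000> and |111>
assert dim == 2                            # one logical qubit: 2 = 2^k, k = 1
assert np.allclose(P @ e000, e000) and np.allclose(P @ e111, e111)
assert 4 == 2 ** (3 - 1)                   # theorem S1.3: |S| = 2^(n-k)
print("code space dimension:", dim)
```

The trace of the projector counts the code space dimension, matching theorem S2 with k = 1.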

Logical Operators

⁵ Using a so-called check matrix we can find a simple algorithm to determine whether a particular set of generators is independent or not.

Topological Quantum Error Correction: Codes and Decoders

Encoding the quantum information inside logical qubits |0_L⟩ and |1_L⟩ can only be of benefit if we can perform operations on them. If we can find operators X_L and Z_L that act on logical qubits the same way the Pauli operators X, Z ∈ G_1 act on physical qubits, then we

can express any Pauli operator in terms of a product of X_L and Z_L. These operators are called logical operators.

Definition. Suppose n ∈ N and X_L, Z_L ∈ G_n. Suppose S is the stabilizer of a code [n, 1] = ⟨|0_L⟩, |1_L⟩⟩. X_L and Z_L act as the logical X operator and the logical Z operator of [n, 1] if

Z_L|0_L⟩ = |0_L⟩,   X_L|0_L⟩ = |1_L⟩,
Z_L|1_L⟩ = −|1_L⟩,  X_L|1_L⟩ = |0_L⟩.

Encoding more than one logical qubit is the exception rather than the rule in this report. Regardless, it should be clear how the definition can be generalised to larger code spaces. One consequence is so fundamental that it makes sense to underline it as a lemma.

Lemma L1. Logical operators X_L and Z_L anti-commute.

Proof.

{X_L, Z_L} = {|0_L⟩⟨1_L| + |1_L⟩⟨0_L|, |0_L⟩⟨0_L| − |1_L⟩⟨1_L|}
           = −|0_L⟩⟨1_L| + |1_L⟩⟨0_L| + |0_L⟩⟨1_L| − |1_L⟩⟨0_L| = 0.
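For the concrete representatives used below for the bit flip code (X_L = XXX and Z_L ∈ {ZII, IZI, IIZ, ZZZ}), lemma L1 can be confirmed directly; a short illustrative numpy sketch:

```python
import numpy as np
from functools import reduce

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.diag([1.0, -1.0])
kron = lambda *ops: reduce(np.kron, ops)

XL = kron(X, X, X)                                   # logical X: XXX
ZL_candidates = [kron(Z, I2, I2), kron(I2, Z, I2),
                 kron(I2, I2, Z), kron(Z, Z, Z)]     # ZII, IZI, IIZ, ZZZ
generators = [kron(Z, Z, I2), kron(I2, Z, Z)]        # ZZI, IZZ

# lemma L1: every Z_L candidate anti-commutes with X_L
for ZL in ZL_candidates:
    assert np.allclose(XL @ ZL + ZL @ XL, 0)

# X_L nevertheless commutes with both stabilizer generators,
# because the two single-qubit sign changes cancel
for S in generators:
    assert np.allclose(XL @ S - S @ XL, 0)
print("anti-commutation and commutation checks passed")
```

The second loop anticipates theorem S3: logical operators commute with the stabilizer without belonging to it.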

These properties are requirements for logical operators. We will look for operators that have the same action and use the definition as a qualifier for candidates. The choice is not unique. For the bit flip code we could choose X_L = X ⊗ X ⊗ X ≡ XXX and Z_L ∈ {ZII, IZI, IIZ, ZZZ}. Note that the stabilizer elements are tensor products of two Z operators, whereas the logical operators are tensor products of an odd number of non-trivial Pauli operators. Because of this odd number of Z operators, swapping the order of X_L and Z_L introduces an odd number of sign changes, resulting in {X_L, Z_L} = 0 in accordance with lemma L1.

We will be working with numerous stabilizers. It is worthwhile to analyse the relationship of the logical operators to the stabilizer group in the following theorem.

Theorem S3. Suppose X_L and Z_L are logical operators that act on a code stabilized by S. Then

1. [X_L, S] = [Z_L, S] = 0 for all S ∈ S
2. X_L, Z_L ∉ S

So X_L and Z_L behave like stabilizer operators without being stabilizer elements themselves. This is of course to be expected if we recall the definition of a stabilizer: X_L and Z_L do not stabilize the logical state; instead, they operate on it per definition. Note that it is sufficient to check the commutators with the stabilizer generators.

Theorem S3 holds for the bit flip code: [X_L, S] = 0 is inherited from {X, Z} = 0, since each generator contains two Z operators and the two sign changes cancel, and [Z_L, S] = 0 is trivial, as the S ∈ S and Z_L are tensor products of Z operators. Furthermore, none of the candidates for logical operators are stabilizer elements:

XXX, ZII, IZI, IIZ, ZZZ ∉ S

3.2.2 Comparison of Quantum Codes

Fig. 3.3 contains an overview of interesting quantum codes and their stabilizer generators. We presented a motivation for the first row in the preceding section.

Code        n   Generators                          X_L     Z_L
Bit flip    3   ZZI, IZZ                            X^⊗n    Z^⊗n
Phase flip  3   XXI, IXX                            Z^⊗n    X^⊗n
5 qubit     5   XZZXI, IXZZX, XIXZZ, ZXIXZ          X^⊗n    Z^⊗n
Steane      7   IIIXXXX, IXXIIXX, XIXIXIX,          X^⊗n    Z^⊗n
                IIIZZZZ, IZZIIZZ, ZIZIZIZ
Shor        9   ZZIIIIIII, IZZIIIIII, IIIZZIIII,    Z^⊗n    X^⊗n
                IIIIZZIII, IIIIIIZZI, IIIIIIIZZ,
                XXXXXXIII, IIIXXXXXX

(The encoding circuits shown in the original figure are omitted here.)

Fig. 3.3: List of [n, 1] quantum codes that encode one logical qubit and their respective stabilizer generators. n is the number of physical qubits. In this short-hand notation AB ≡ A ⊗ B. Most of the logical operators allow for multiple choices; they all belong to the same equivalence class of logical operators.

The Phase Flip Code

We can change the computational base B = {|0⟩, |1⟩} to the Hadamard base {|+⟩, |−⟩}. H is the Hadamard operator, represented by the Hadamard matrix

H = (1/√2) [[1, 1], [1, −1]].

The generators and logical operators of the phase flip code in the second row of fig. 3.3 look similar to those of the bit flip code. That is no coincidence: the phase flip code space is the unitary transformation of the bit flip code space under the Hadamard transformation,

C ↔ C̃ via H^⊗3:  ⟨|000⟩, |111⟩⟩ ↔ ⟨|+++⟩, |−−−⟩⟩.

Consequently, any stabilizer element and logical operator can be derived via the same unitary transformation:

S̃ ↔ H^⊗3 S H^⊗3,   X̃_L ↔ H^⊗3 X_L H^⊗3,   Z̃_L ↔ H^⊗3 Z_L H^⊗3.

This correspondence is to be interpreted in the sense that the phase flip code protects from an arbitrary Z error as opposed to an arbitrary X error. Z errors flip the phase of the |1⟩ component relative to the phase of the |0⟩ component. However, after the change of base B → B_H, the Z errors act just like the X errors do in the computational base B.

The Shor Code

The Shor code combines both the bit flip code and the phase flip code in a single encoding circuit. The resulting code protects from arbitrary bit flip and phase flip errors. [51][3] As a consequence, the Shor code can protect against an arbitrary Pauli error. It even protects against continuous errors. This remarkable result is justified in section 10.2 of [3] and forms the foundation for fault-tolerant quantum computing.
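The Hadamard correspondence between the bit flip and phase flip codes, which the Shor code combines, can be checked numerically; an illustrative numpy sketch:

```python
import numpy as np
from functools import reduce

H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.diag([1.0, -1.0])
kron = lambda *ops: reduce(np.kron, ops)

H3 = kron(H, H, H)
# conjugating the bit flip generators by H^(x3) yields the phase flip generators
assert np.allclose(H3 @ kron(Z, Z, I2) @ H3, kron(X, X, I2))
assert np.allclose(H3 @ kron(I2, Z, Z) @ H3, kron(I2, X, X))

# and the code words |000>, |111> map to |+++>, |--->
plus = np.array([1.0, 1.0]) / np.sqrt(2)
minus = np.array([1.0, -1.0]) / np.sqrt(2)
assert np.allclose(H3 @ np.eye(8)[0], kron(plus, plus, plus))
assert np.allclose(H3 @ np.eye(8)[7], kron(minus, minus, minus))
print("phase flip code = H^(x3) (bit flip code) H^(x3)")
```

Since H is self-inverse, the same conjugation maps the phase flip code back to the bit flip code.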

Other Codes and Fault Tolerance

For reference, fig. 3.3 includes the 5 qubit code and the 7 qubit code; the latter is sometimes called the Steane code and belongs to the important class of CSS codes. [3] The Shor code is a good introduction to quantum error correction and especially the stabilizer formalism. However, the Steane code can also protect against an arbitrary bit flip or phase flip error while requiring only 7 data qubits as opposed to 9.

To determine whether error correction is reliable enough, noise and decoherence have to be simulated. Each qubit, wire and gate can be subject to independent or even biased errors. In fault-tolerant quantum computation, quantum information is not merely protected in storage or transit, but dynamically as it undergoes computation. [3] Fault tolerance can be achieved even with imperfect gates, provided that they are accurate enough relative to the physical error rate p. [51] Since perturbations of the quantum computer have to be modelled in such detail, different encoding circuits can be advantageous under certain models of noise. [3]

3.3 Topological Error Correction

Topology is concerned with the non-local structure⁶ of a geometric object. [50] Topologically speaking, a doughnut and a cup are identical, as the cup has a handle and the doughnut has a hole. In topological quantum information we are able to distribute the quantum information across the entire topology; such computation is fault-tolerant by its very nature. [51] In this section we will explore the topological ideas which form the basis of the toric code.

3.3.1 The Toric Code

The toric code is the simplest possible topological surface code and, as such, serves as a good introduction to the concept of topological error correction. It was introduced by Alexei Kitaev in 1997 in his pioneering paper Fault-tolerant quantum computation by anyons. [51]

Fig. 3.4: A torus. If we constructed the torus by wrapping around a flat surface, the loops would correspond to the edges of the surface. The illustration was taken from Wikimedia Commons⁷.

Surface Codes

From now on we will be dealing with qubits aligned on an L × L grid. The qubits sit on the centres of the edges of the grid, as can be seen in fig. 3.5. Let us topologically arrange the L × L grid on a torus: when we reach the end of an axis, we end up back where we started.

The combination of four Z operators aligned on the edges of a square is called a plaquette operator. Likewise, the combination of four X operators on the ends of a cross makes up a vertex operator. Fig. 3.6 depicts a plaquette and a vertex operator.

Theorem. The stabilizer of the toric code is generated by the set of plaquette and vertex operators.

Proof. Suppose the S_i are plaquette operators and the S′_i are vertex operators, i ∈ {1, …, L²}, that generate the group S.

First note that S ≤ G_(2L²), meaning S directly inherits the properties of the Pauli group G_(2L²).

[S_i, S_j] = 0 is trivial, because their constituent single-qubit Z operators trivially commute. The same holds for the vertex operators: [S′_i, S′_j] = 0.

Of course [S_i, S′_j] = 0 if the stabilizer operators are not neighbours as in fig. 3.6, because then the constituent operators act on different qubits altogether. If S_i and S′_j

⁶ Chapters 3 and 4 of are highly recommended for a more mathematical treatment of topology targeted at physicists.
⁷ https://en.wikipedia.org/wiki/File:ToricCodeTorus.png

Fig. 3.5: Two-dimensional illustration of a 5 × 5 grid aligned on a torus. The qubits on the right and bottom edges are greyed out, because they correspond to the ones on the opposite sides. The illustration of the grid was sourced from [50] and modified in the later figures containing the grid.


Fig. 3.6: Example of a plaquette operator and a vertex operator.

are neighbours, then they always share exactly two qubits, and once more the two sign changes cancel each other. In either case S_i and S′_j commute: [S_i, S′_j] = 0.
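The commutation argument in the proof can be tested exhaustively in the binary symplectic representation, where each Pauli operator is a bit-string (x|z) and two operators commute iff their symplectic product x₁·z₂ + z₁·x₂ vanishes mod 2. The edge indexing below is one possible convention, not taken from the report:

```python
import numpy as np
from itertools import product

L = 3
n = 2 * L**2                                      # qubits on the torus edges
H = lambda r, c: (r % L) * L + (c % L)            # horizontal edge index
V = lambda r, c: L**2 + (r % L) * L + (c % L)     # vertical edge index

def plaquette(r, c):
    """Z on the four edges of a square, as a (x|z) bit-string."""
    op = np.zeros(2 * n, dtype=int)
    for q in (H(r, c), H(r + 1, c), V(r, c), V(r, c + 1)):
        op[n + q] = 1
    return op

def vertex(r, c):
    """X on the four edges meeting at a lattice vertex."""
    op = np.zeros(2 * n, dtype=int)
    for q in (H(r, c), H(r, c - 1), V(r, c), V(r - 1, c)):
        op[q] = 1
    return op

def commute(a, b):
    """Symplectic product over GF(2): zero means the Paulis commute."""
    return (a[:n] @ b[n:] + a[n:] @ b[:n]) % 2 == 0

gens = [plaquette(r, c) for r, c in product(range(L), repeat=2)] + \
       [vertex(r, c) for r, c in product(range(L), repeat=2)]
assert all(commute(a, b) for a in gens for b in gens)
print("all", len(gens), "toric code generators commute")
```

Neighbouring plaquettes and vertices always overlap on zero or two edges, so every symplectic product is even.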


Fig. 3.7: Illustration of an X error and a Z error on the toric code. The adjacent stabilizer elements indicate odd parity; measuring them yields −1 with certainty.

Fig. 3.7 illustrates how we can detect simple bit flip and phase flip errors acting on this stabilizer code. Assuming a single flip, the error will always be accurately detected. Further below we will demonstrate that it is not always possible to identify the actual error with certainty, and how to cope with that. To correct one such error E ∈ {X_i, Z_i} we choose the correction C = E as our recovery strategy. Since E is self-inverse, the result is CE = I. The error followed by the recovery strategy contracts to unity and leaves the code space invariant: any potential action induced by such an error E is undone.

Encoded Qubits

Each plaquette shares each of its four edges with a neighbouring plaquette, meaning the L² plaquettes effectively have two unique edges each, for a total of 2L² edges. The question that arises is how many qubits we can encode in this arrangement. To that end, we must count the independent stabilizer generators and apply theorem S2.

The product of plaquettes always yields an operator that consists of an even number of Zs. When we multiply two plaquettes that do not touch, the result is two disjoint plaquettes with a total of 8 edges, as depicted in fig. 3.8.


Fig. 3.8: Product of two disjoint plaquette operators.

If they do touch like in fig. 3.9a, then they do so on exactly one edge. The Zs on the shared edge annihilate because Z² = I, so we are left with 6 edges, i.e. 6 Zs. In a practical sense, the plaquettes contract.


Fig. 3.9: (a) Product of two neighbouring plaquette operators. (b) String of plaquettes that loops around the torus. (c) Multiplication of the last remaining row of plaquettes.

The product of plaquette stabilizer elements can be illustrated with their edges, as in fig. 3.9. Since the stabilizer is a group, these products must also be stabilizer elements. Note that as we string up plaquettes via multiplication, we end up looping all the way around the torus; the shared edges annihilate, as can be seen in fig. 3.9b.

Fig. 3.10 should raise eyebrows. This almost looks like a single plaquette operator. How do we know which side is inside and which side is outside? The answer is that it does not matter: both stabilizer operators are identical. Furthermore,


Fig. 3.10: Product of all but one of the plaquette operators.

if we multiply all stabilizer generators together, the result is unity. The L² plaquettes therefore cannot form an independent set of generators: there can exist at most L² − 1 independent plaquette stabilizer generators. The thought process for the vertices is analogous, as our choice of grid is arbitrary; anything we did for plaquettes can be done for vertices by shifting the grid by half a unit cell. Applying theorem S2 we can now calculate the number of encoded qubits,

k = n − (2L² − 2) = 2L² − (2L² − 2) = 2.

Count
Physical qubits n                   2L²
Independent stabilizer generators   2L² − 2
Encoded qubits k                    2

Fig. 3.11: Number of encoded qubits in the L × L toric code. Consequently it is a [[2L², 2]] code.

Logical Operators

The logical operators are the cuts shown in fig. 3.12b: loops of X or Z operators that wind all the way around the torus. They commute with every stabilizer generator without being stabilizer elements themselves, as theorem S3 requires, and there are two independent pairs of them, matching the k = 2 encoded qubits from theorem S2.


Fig. 3.12: Logical operators of the toric code.

Error Recovery

It is not always possible to determine the error that occurred just from the error syndrome. Fig. 3.13 illustrates this. Here the error E is X_1X_2. If we choose the recovery C = X_1X_2, then the combination CE contracts to unity. However, if we guessed wrong and chose

C′ = X_3X_4, then the total action would be C′E = X_1X_2X_3X_4. But that is a full vertex and, thus, a stabilizer generator.


Fig. 3.13: Error recovery

There are four distinct error classes, which form a partition; each part is an equivalence class. It is impossible to transform an element of one error class into another error class by applying stabilizer operations. Note how in figs. 3.14b and 3.14c the error chains cross the top and the side boundaries respectively. The error loop in fig. 3.14d crosses both boundaries simultaneously.


Fig. 3.14: Examples of error class representatives

3.3.2 Planar Codes

The toric code requires a closed surface, but it is desirable to design circuits on a flat, planar surface. Such surfaces, however, always have a boundary. In this section we will analyse whether we can construct a code on a planar surface.

Boundaries

Fig. 3.15: Lattice for the minimal surface code [13, 1]

Fig. 3.15 shows a grid aligned on a planar surface. Note that the bottom and top boundaries have a different topology than the left and right boundaries. On the inside, there are two familiar plaquettes and two familiar vertices. However, the top and bottom plaquettes and the left and right vertices only have 3 edges as opposed to 4. Fig. 3.16 shows these weight-3 plaquettes and vertices.

Fig. 3.16: (a) Weight-3 plaquettes. (b) Weight-3 vertices.

In topology, boundaries like those on the left and right side are called primal boundaries. The top and bottom ones, on the other hand, are called dual boundaries, because on the dual lattice they have the same properties as the primal boundaries.

In the last section we enumerated the logical qubits and realised that the multiplication of all L² plaquettes yields unity. For an open surface, which has a boundary, this is no longer the case. The product is illustrated in fig. 3.17: all edges with only one neighbouring vertex do not contract to unity in fig. 3.17a but instead retain an X. Likewise, the edges with only one neighbouring plaquette do not contract to unity in fig. 3.17b but instead retain a Z.

Minimal Surface Code

There are four weight-4 plaquette and vertex operators (the two interior plaquettes and two interior vertices) and eight weight-3 plaquette and vertex operators along the boundaries. Note that they are independent. The lattice carries 13 data qubits, so applying theorem S2 we count 13 − 12 = 1 logical qubit, in agreement with the label [13, 1] in fig. 3.15.


Fig. 3.17: Product of all (a) vertex operators and (b) plaquette operators. The result is no longer unity.

Fig. 3.18: Rotated surface code. The lattice can still be seen at a 45◦ angle.

Rotated Surface Code

We can cut the corners from fig. 3.15 to end up with a special [9, 1] code: the smallest rotated surface code. It is sometimes called the 'surface-17' code, because it needs 9 data qubits plus 8 parity qubits for a total of 17 qubits. What remain are the four X stabilizers and four Z stabilizers depicted in fig. 3.18.⁸

The four stabilizers of each type are chained in such a way that one can always pick the most likely equivalence class for an error and apply the best correction candidate. For larger L × L rotated surface codes with L ≥ 5, an optimal decoder via lookup table becomes combinatorially more complex; we will need more efficient algorithms such as the MWPM algorithm of section 3.3.3.

3.3.3 Code Thresholds

In this section we will demonstrate and quantify how quantum codes can enable fault-tolerant and universal quantum computation. To that end we implemented a model that accounts for independent noise and can represent various quantum codes. The simulation is described in the appendix.

Independent Noise Model

While noise in nature can be biased, it pays off to start with a simple independent noise model for pedagogical reasons. It is sketched in fig. 3.19.

⁸ Picture taken from http://dx.doi.org/10.1088/1367-2630/14/12/123011.

Error   Rate
I       (1 − p)²
X       p(1 − p)
Y       p²
Z       p(1 − p)

Fig. 3.19: Independent noise model [50]

iY = ZX implies that a Y error can be considered a simultaneous X and Z error. Since the model is independent, the process is equivalent to a combination of two separate and independent processes, as in fig. 3.20.

Error   Rate
I       1 − p
X       p

Fig. 3.20: Bit flip noise model [50]
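The decomposition into two independent flip processes can be sampled directly; a small illustrative sampler (the function name is ours, not from the report's code):

```python
import random
from collections import Counter

def sample_pauli_error(p, rng):
    """Independent noise: an X flip and a Z flip each occur with
    probability p; both together give a Y error, since iY = ZX."""
    x, z = rng.random() < p, rng.random() < p
    return {(0, 0): "I", (1, 0): "X", (0, 1): "Z", (1, 1): "Y"}[(x, z)]

rng = random.Random(1)
N, p = 200_000, 0.1
counts = Counter(sample_pauli_error(p, rng) for _ in range(N))
rates = {e: counts[e] / N for e in "IXZY"}
print(rates)  # close to I: (1-p)^2 = 0.81, X = Z: p(1-p) = 0.09, Y: p^2 = 0.01
```

The empirical frequencies reproduce the rates of fig. 3.19, confirming that the two tables describe the same noise process.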

Optimal Decoder

To illustrate the concept, we will once again take a step back from topological error correction and analyse the bit flip code. Its logical error rate can be calculated analytically, so we have a way to verify that the simulation gives an adequate result.

After we prepare the initial state, noise is simulated and the random outcome is cached. Of course it would not be constructive to detect an error by peeking at the simulator's cache: if we were allowed to do that in a real setting, we would not need intricate means to detect an error without altering the quantum information. We have to detect and correct the error blindly, as we learned in this report. Only then can we compare our correction attempt with the cached error and count how many times we succeeded.

The bit flip code can correct one arbitrary bit flip error per code cycle. However, if we encounter two or even three errors, the correction attempt will fail.
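The procedure just described — cache the sampled error, decode blindly from the syndrome, then compare — can be sketched for the bit flip code in a few lines of python. This is our illustrative reimplementation, not the report's simulation code:

```python
import random

# Lookup table of the optimal decoder for the three qubit bit flip code.
# The syndrome bits are the parities measured by Z1Z2 and Z2Z3.
LOOKUP = {(0, 0): (0, 0, 0),   # trivial syndrome: do nothing
          (1, 0): (1, 0, 0),   # most likely error: X on qubit 1
          (1, 1): (0, 1, 0),   # X on qubit 2
          (0, 1): (0, 0, 1)}   # X on qubit 3

def trial(p, rng):
    error = tuple(int(rng.random() < p) for _ in range(3))  # cached X errors
    syndrome = (error[0] ^ error[1], error[1] ^ error[2])   # measured blindly
    correction = LOOKUP[syndrome]
    residual = tuple(e ^ c for e, c in zip(error, correction))
    # the residual is either III (success) or the logical XXX (failure)
    return residual == (0, 0, 0)

def logical_success_rate(p, N=100_000, seed=0):
    rng = random.Random(seed)
    return sum(trial(p, rng) for _ in range(N)) / N

p = 0.1
print(logical_success_rate(p))              # Monte Carlo estimate
print((1 - p) ** 3 + 3 * p * (1 - p) ** 2)  # analytical value 0.972
```

Any single flip is corrected exactly; any double or triple flip leaves the logical XXX behind, which reproduces the analytical rates of fig. 3.21.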

Error           Rate
no flip         (1 − p)³
single flip     3p(1 − p)²
double flip     3p²(1 − p)
triple flip     p³

Fig. 3.21: Analytical contributions to the success rate 1 − p_L and the logical error rate p_L of the three qubit bit flip code [3, 1]

Suppose we encounter the error syndrome (−1, 1), indicating that only qubits 1 and 2 have odd parity and, thus, that qubit 1 should have flipped. The most likely error E is XII. We choose the recovery attempt C = XII, and since CE = III ∈ S our recovery was successful.

The probability of encountering a flip of qubits two and three at the same time is O(p²), i.e. much lower for small enough physical error rates p. Regardless, let us suppose that occurred. We would again encounter the same error syndrome (−1, 1) and choose the recovery attempt C = XII. However, this time the error E equals IXX. Error and recovery yield


Fig. 3.22: Plot of the success rate for the three qubit bit flip code [3, 1], simulated with N = 10⁷ trials per data point. The vertical line is the critical threshold p_c = 50%. The noise model is independent and only simulates X errors, as the bit flip code cannot protect from other errors. Decoding is performed via the optimal decoder using the lookup table from the introduction. Statistical errors are suppressed by the large N.

CE = XXX = X_L ∉ S.

The simulated and analytical success rates are plotted in fig. 3.22.

The act of looking up a strategy from the parity information is called decoding, and an algorithm that performs this action is called a decoder. In this section we demonstrated an optimal decoder: we do not have to guess what the most likely error is. There is no ambiguity, as every error syndrome corresponds to one candidate that wins the majority vote, not to an equivalence class of error candidates.

Critical Threshold

Note that the success rate in fig. 3.22 is better than a linear decline from 100% to 0% for small enough physical error rates p. This means that the 3 qubit code outperforms a single physical qubit regarding protection from independent bit flips. If it did not, we would be better off not encoding the quantum information in this 3 qubit code at all. Beyond 50% that is no longer the case: with a physical error rate that high, we would be better off betting on the losing vote. The value 50% is called the critical threshold p_c. If the physical error rate satisfies p < p_c, then we can increase fault tolerance by increasing the number of encoding data qubits. [3] [50]

In this part we present the adaptation of the last, rather trivial, simulation to the rotated 'surface-17' code. This is surprisingly simple: the implementations for the entire class can be re-used, including most stabilizers. Since we have shown earlier in this section that X and Z errors can be analysed independently, we limit the simulation to X errors for pedagogical reasons. The only challenge is to verify whether a correction is successful or not; this is no longer a simple matter of counting coin flips.


Fig. 3.23: Plot of the success rate for the rotated surface code [9, 1], simulated with N = 10⁵ trials per data point. The vertical line is the critical threshold p_c = 7.5%. The noise model is independent and only simulates X errors. Decoding is performed via the optimal decoder using the lookup table from the last section. Error bars are 1σ and binomial.

To that end we encode the error E and the recovery C as bit-strings and leverage the fact that the group algebra translates to addition over Z/2Z. We compute EC via addition modulo 2 and then have to figure out whether it is a stabilizer operator or a logical operator; it has to be one of the two. For this we act with stabilizer generators on EC and use that S(EC) = EC for all S ∈ S.⁹ The result of this approach is Gaussian elimination with Z₂ bit-strings: either EC can be written as a product of stabilizer generators, i.e. ∏_{i ∈ A} S_i = EC has a solution for some subset A, in which case EC is a stabilizer operator, or it cannot, in which case the remainder is immediately recognisable as a logical operator. This elimination takes at most 9 steps, one for each qubit. The resulting plot can be found in fig. 3.23. We omit a formal fit, as that would mostly pay off when comparing more than one decoder and code, particularly larger rotated surface codes.
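The stabilizer-vs-logical decision just described amounts to a GF(2) rank computation; here is an illustrative sketch (the helper names are ours), applied to the vertex example of fig. 3.13:

```python
import numpy as np

def gf2_rank(M):
    """Rank of a binary matrix via Gaussian elimination over GF(2)."""
    M = np.array(M, dtype=int) % 2
    rank = 0
    for col in range(M.shape[1]):
        pivots = [r for r in range(rank, M.shape[0]) if M[r, col]]
        if not pivots:
            continue
        M[[rank, pivots[0]]] = M[[pivots[0], rank]]
        for r in range(M.shape[0]):
            if r != rank and M[r, col]:
                M[r] ^= M[rank]
        rank += 1
    return rank

def is_stabilizer(generators, ec):
    """EC lies in the stabilizer group iff appending it does not
    increase the GF(2) rank of the generator matrix."""
    G = np.array(generators, dtype=int)
    return gf2_rank(G) == gf2_rank(np.vstack([G, ec]))

# Fig. 3.13 on four qubits: the vertex X1X2X3X4 is a generator.
gens = [[1, 1, 1, 1]]
E = np.array([1, 1, 0, 0])                           # error X1X2
C, C2 = np.array([1, 1, 0, 0]), np.array([0, 0, 1, 1])
assert is_stabilizer(gens, (E + C) % 2)              # CE = I: success
assert is_stabilizer(gens, (E + C2) % 2)             # C'E = X1X2X3X4: success
assert not is_stabilizer(gens, [1, 0, 0, 0])         # a lone X1 is not
print("both recoveries leave the code space invariant")
```

Both recovery attempts of fig. 3.13 are classified as successful, because their residual action lies in the stabilizer group.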

Other Decoders

The optimal decoder is computationally expensive. [3] It is still important, because it provides the upper limit to the thresholds that we can expect from other decoders. Coming up with better decoders and performing benchmark analyses is a field of active research.

One such algorithm is the Minimum-Weight-Perfect-Matching algorithm, or short: MWPM. The lattice is expressed as a graph theory problem and each vertex, edge or plaquette is assigned a weight.


Fig. 3.24: Plot of the success rate for the 3 × 3, 5 × 5 and 7 × 7 toric codes, simulated with N = 10⁵ trials per data point. The noise model is independent and only simulates X errors. Decoding is performed via the MWPM decoder.

The algorithm then optimises the total weight by finding the minimum-weight matching. This is computationally much more efficient than an optimal decoder and comes close to its thresholds. [52][50]

[53] provides a simulation equivalent to the one in this report, albeit for the toric code. The difference is that the stabilizer verification can be done using a simple count for the toric code. The implementation includes an open source MWPM decoder that leverages the networkx python package and can be used to compute the threshold for a non-optimal decoder. Fig. 3.24 shows the critical threshold p_c using the python implementation by [53].
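To give a flavour of the graph-matching step, the following toy sketch pairs up syndrome defects on a periodic line with networkx. The distance function and layout are simplified assumptions of ours, not the decoder of [53]:

```python
import networkx as nx
from itertools import combinations

def ring_distance(a, b, L):
    d = abs(a - b)
    return min(d, L - d)    # shortest path on a periodic lattice of size L

def mwpm_pairing(defects, L):
    """Minimum-weight perfect matching of syndrome defects via the
    blossom algorithm in networkx; negating the weights turns
    max-weight matching into minimum-weight matching."""
    G = nx.Graph()
    for a, b in combinations(defects, 2):
        G.add_edge(a, b, weight=-ring_distance(a, b, L))
    return nx.max_weight_matching(G, maxcardinality=True)

pairs = mwpm_pairing([0, 1, 4, 6], L=8)
total = sum(ring_distance(a, b, 8) for a, b in pairs)
print(pairs, "total correction weight:", total)  # optimal pairing has weight 3
```

The matched pairs are then connected by correction chains; the total weight corresponds to the number of correction operators applied.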

3.4 Conclusion

To summarise: we motivated and introduced the concept of quantum error correction using the bit flip code, which due to its simplicity is very instructive. What follows is a mostly self-contained and axiomatic description of the stabilizer formalism, which is the framework best suited to describing surface codes. We conclude the first part of the report with a comparison of quantum codes that encode one qubit, and hope to offer a useful reference. This groundwork ensures that we are equipped to understand topological quantum codes in the second part. First we discuss error correcting circuits on closed surfaces using Kitaev's toric code. We were then able to transfer the discussion to the important class of topological quantum codes with open surfaces. These include the very useful planar surface codes, which are much simpler to design and scale, since they can be engineered onto a flat and finite space.

What these codes have in common is that they allow us to detect and correct a set of arbitrary Pauli errors below the critical threshold p_c. [51] When we are below that threshold, we can hope to perform fault-tolerant, universal quantum computation and can increase resilience by making the encoding circuits larger. In some cases, such as surface codes, this scaling comes naturally; in other cases we can concatenate different codes to obtain an improved code.

We illustrate the power of the stabilizer formalism by estimating the critical threshold p_c using a python implementation that models independent noise. For pedagogical reasons, section 3.3.3 implements a simulation for the bit flip code. Thanks to the stabilizer formalism, the framework can swiftly be adapted to different codes or noise models. Section 3.3.3 also shows the result for the rotated 'surface-17' code on a 3 × 3 grid of data qubits; we estimate the threshold for X errors to be p_c = 7.5%.
Formulating a recovery strategy is straightforward for the quantum codes in the first part. However, to implement the simulation we need to estimate the most likely error that occurred. In many cases, multiple candidates share the same probability, but it turns out that they all belong to the same equivalence class of errors. As surface codes become larger, they become more robust to errors on the one hand; on the other hand, we observe more and more errors on average. Optimal decoders, i.e. those that compute the most likely equivalence class, do not scale efficiently to larger encoding circuits. Various decoders are under active research, including MWPM algorithms, Bliss algorithms and even deep learning algorithms, in the search for a correction strategy that is sufficiently likely to succeed while maintaining efficiency.

3.5 Acknowledgments

I thank my supervisor Dr. E. Fiorelli for her patience and assistance, and Prof. F. Hassler and Prof. M. Müller for the seminar, which was a great opportunity to learn about the new field of quantum technologies. I also thank the other supervisors, the students and everyone who listened to the presentations for participating in this master's seminar.

Chapter 4

Quantum Walk and its Universality

Kiran Adhikari Supervisor: Thomas Botzung

Quantum walks are an advanced tool that has been used for everything from building exponentially powerful quantum algorithms to universal quantum computation. Starting from the classical random walk, we will review the foundations of both discrete- and continuous-time quantum walks. We will review some of their properties and show the striking differences in probability distribution between the quantum and classical cases. We will also discuss an algorithm based on a continuous-time quantum walk and the computational universality of the quantum walk.

4.1 Introduction

Quantum computation is the interdisciplinary field of building quantum computers, quantum technologies and quantum information systems using the quantum mechanical properties of Nature. One of the main challenges in quantum computation is building better quantum algorithms than their classical counterparts. A review of quantum algorithms, as well as an excellent comprehensive introduction to quantum information and computation, can be found in [3].

Quantum walks are the quantum analogues of classical random walks. Unlike the classical case, where the walker is at a deterministic position at a given time, the quantum walker can be at multiple positions at the same time because of the quantum superposition principle. The tossed coin can also be in a superposition of "head" and "tail". Because of these unique features, the quantum walker can spread across positions faster than the classical walker, a feature which has been exploited for constructing quantum algorithms that are more efficient than classical ones [54–56]. Quantum walks have also been shown to form a universal model of quantum computation [57–59]. A good review of quantum walks can be found in [60, 61].

There are mainly two kinds of quantum walks: discrete and continuous quantum walks. In the discrete case, the corresponding evolution operator of the system is applied only in discrete time steps, while in the continuous case it can be applied at any time.

The structure of this article is as follows: First I will give some physical intuition behind random walks, starting from the classical case in Section 4.2. I will discuss the classical random walk on an unrestricted line, its probability distribution and other features. This will be helpful when comparing the result to the quantum case. Then, using graph theory,

Fig. 4.1: One-dimensional random walk: the walker jumps to the right with probability p and to the left with probability 1 − p.

I will generalize the classical random walk to general graphs. This is followed by an introduction to the quantum walk in Section 4.3, where I will motivate the need for studying quantum walks and give a short history of them. This is followed by a rigorous definition and the necessary terminology for the discrete quantum walk on a line in Section 4.3.1. In Section 4.3.2, I will discuss the continuous-time quantum walk and algorithms based on it. Finally, in Section 4.3.3, I will discuss universal computation by quantum walk, based on Childs' paper [57].

4.2 Quick review of the classical random walk

Quantum walks can be considered counterparts of classical random walks. A simple classical random walk is a random walk on a regular lattice, in which a particle jumps to another position at each step according to some probability distribution. The classical random walk has been very successful in the development of classical algorithms, such as the PageRank algorithm [62]. One of the challenges in quantum computing is developing algorithms that are better than their classical counterparts, and the quantum walk has emerged as a potential candidate for creating such quantum algorithms. Before jumping directly to the quantum case, let us discuss the classical random walk; this will give some useful insights that will be helpful when comparing with the quantum case.

4.2.1 Classical random walk on an unrestricted line

One example that is frequently studied in textbooks is the classical random walk on a line, which consists of a particle jumping either left or right according to a certain probability distribution [63,64]. Let us consider a random walker on a one-dimensional lattice of sites that are a fixed distance apart. The walker jumps to the right with probability p and to the left with probability q = 1 − p. This process is shown in Fig. 4.1. The probability of being at some position m after N jumps is given by the well-known binomial distribution:

    p(m, N) = N! / [((N+m)/2)! ((N−m)/2)!] · p^((N+m)/2) q^((N−m)/2)   (4.1)

Here, N is the number of steps, which can also be seen as the number of time steps taken to reach point m. One can now easily calculate the first moment, or expectation value, of the number of right-jumps n from the binomial distribution:

    E[n] = Np   (4.2)

Similarly, the variance of the displacement is:

    σ²_m = 4Npq   (4.3)

Kiran Adhikari 83
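The moments (4.2) and (4.3) are easy to check numerically. The following sketch (my own code, not part of the original notes) samples many N-step walks and compares the empirical mean and variance of the position m with N(p − q) and 4Npq:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_walk_positions(N, p, trials):
    """Final positions m after N steps of +1 (prob p) or -1 (prob q = 1 - p)."""
    steps = rng.choice([1, -1], size=(trials, N), p=[p, 1 - p])
    return steps.sum(axis=1)

N, p, q = 100, 0.6, 0.4
m = random_walk_positions(N, p, trials=200_000)

print(m.mean())   # close to N(p - q) = 20
print(m.var())    # close to 4Npq = 96, Eq. (4.3)
```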

Fig. 4.2: Graph of the binomial distribution with N = 100 for two different cases, p = 0.6 (left peak) and p = 0.8 (right peak). We can see that the maximum of the curve, i.e. the expectation value, is located close to Np. Figure taken from [63].

For the special case of a symmetric walk, where p = q = 1/2, σ_m = √N. This behavior, in which the distance travelled is proportional to the square root of the time taken, is known as free diffusion.

Random Walk on a general graph

Let G be a connected, undirected graph¹ with n nodes (|V| = n) and m edges (|E| = m). G induces a Markov chain M_G if the states of M_G are the vertices of G and, for all u, v ∈ V, we have

    p_uv = 1/d(u)   if (u, v) ∈ E,
    p_uv = 0        otherwise,

where d(u) is the degree of the vertex u. Now we can formally define the random walk on a general graph. Let G be a graph. A random walk starting from some vertex u ∈ V is the random process:

• s = u

• Repeat

– Choose a neighbor v of u according to the probability distribution P²
– u = v

• Until some stop condition

Examples of such walks include the random walk on a circle or on a three-dimensional mesh.

¹ For a quick review of graph theory, see [65].
² Usually one chooses p_uv = 1/d(u).
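The loop above can be written out directly. Below is a minimal sketch (the example graph and function name are mine), assuming the usual choice p_uv = 1/d(u):

```python
import random

random.seed(1)

# Adjacency list of a small undirected graph: a 4-cycle with one chord.
graph = {0: [1, 3], 1: [0, 2, 3], 2: [1, 3], 3: [0, 1, 2]}

def random_walk(graph, start, steps):
    """Repeatedly move to a uniformly chosen neighbor, i.e. p_uv = 1/d(u)."""
    u = start
    path = [u]
    for _ in range(steps):
        u = random.choice(graph[u])
        path.append(u)
    return path

path = random_walk(graph, start=0, steps=10)
print(path)   # a length-11 sequence of adjacent vertices
```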

4.3 Quantum walk

Even though Feynman's work [66] on quantum mechanical computers can be interpreted as a continuous-time quantum walk, the first paper with the quantum walk as its main topic was by Aharonov et al. in 1993 [67]. Two models of the quantum walk have been studied in the literature:

• The first model is the discrete quantum walk, which has discrete time and space. It consists of two quantum mechanical systems, a walker and a coin, together with evolution operators that are applied to both systems in discrete time steps.

• The second model is the continuous-time quantum walk, which has discrete space but continuous time. The system consists of a walker and an evolution operator that is applied without any time-step restriction. Mathematically, this model is a time evolution described by the Schrödinger equation.

4.3.1 Discrete quantum walk on a line

We will now start our study with the discrete quantum walk. One of the most studied models is the discrete quantum walk on a line (DQWL). The reasons for this are:

• DQWL can be used to build quantum walks on more general graphs.

• DQWL can be used to understand the features of quantum walks necessary for the development of quantum algorithms.

• DQWL can be used to test the quantumness of quantum experiments or quantum computers.

DQWL can be described as the quantum evolution of a walker on a one-dimensional lattice. Such an evolution may be described on a Hilbert space H = H_C ⊗ H_P, where H_C and H_P are the coin and position Hilbert spaces, respectively. Mathematically, DQWL are performed on graphs G(V,E) where each vertex has two edges. The main components of a coined DQWL are a walker, a coin, evolution operators for both walker and coin, and a set of observables. Let us define each term carefully.

Walker: The walker is a quantum system in an infinite but countable-dimensional Hilbert space H_P. Its basis states can be regarded as "position sites" for the walker, denoted by |position⟩ ∈ H_P. The canonical basis states |i⟩_p that span H_P, as well as superpositions of the form Σ_i α_i |i⟩_p with Σ_i |α_i|² = 1, are valid position states. Usually the position is initialized at the origin, |position⟩ = |0⟩_p.

Coin: We will now augment the position Hilbert space H_P with the 'coin' space H_C. The coin lives in a two-dimensional Hilbert space H_C, which may take the two canonical basis states |↑⟩ and |↓⟩ together with any superposition of them. One can then write |coin⟩ = a|↑⟩ + b|↓⟩, where |a|² + |b|² = 1. The coin Hilbert space is internal to each walker. In our example of a one-dimensional lattice, we may picture the coin space as the spin component of an electron on the lattice. Therefore, the total state of the quantum walk lives in H_T = H_P ⊗ H_C, and |ψ⟩_initial = |position⟩_initial ⊗ |coin⟩_initial.

Evolution operators: In the classical random walk, walkers evolve in the system according to some probability distribution. For example, in Fig. 4.1 the walker moves left or right based on the probability of a coin toss, i.e. a binomial distribution. In the quantum version, one applies a coin operator C to render the coin state in a superposition, and randomness is introduced only after measurement, via Born's rule.
Based on the state of the coin, a conditional evolution operator S then shifts the walker to the next position. The choice of coin operator C is arbitrary; depending on the choice, one can obtain quantum walks with very different behavior. For now, let us choose C to be unbiased, meaning that if the coin operator acts on a basis state at site |0⟩, say |↓⟩, then after one iteration we want equal probability of getting head and tail. This corresponds to being at position |1⟩ with probability 1/2 and at |−1⟩ with probability 1/2. A commonly used balanced coin is the Hadamard coin H:

    H = 1/√2 ( 1   1
               1  −1 )   (4.4)

We can now build a conditional shift operator that lets the walker go one step forward or backward depending on the state of the coin. A suitable conditional shift operator can be written as:

    S = |↑⟩⟨↑| ⊗ Σ_i |i+1⟩⟨i| + |↓⟩⟨↓| ⊗ Σ_i |i−1⟩⟨i|   (4.5)

Now we can see that S transforms the basis state |↑⟩⊗|i⟩ to |↑⟩⊗|i+1⟩ and |↓⟩⊗|i⟩ to |↓⟩⊗|i−1⟩. As an example, let us check whether the Hadamard coin is balanced:

    |↑⟩⊗|0⟩  --C-->  1/√2 (|↑⟩ + |↓⟩) ⊗ |0⟩
             --S-->  1/√2 (|↑⟩⊗|1⟩ + |↓⟩⊗|−1⟩)

So the probability of finding the coin in either |↑⟩ or |↓⟩, and hence the walker at |1⟩ or |−1⟩, is 1/2, which is analogous to the classical probability distribution. The operator on the total Hilbert space is therefore U = S·(C ⊗ I), and the state of the discrete quantum walk after t steps is given by:

    |ψ⟩_t = U^t |ψ⟩_initial.   (4.6)

So far, our formalism of DQWL is deterministic.³ To obtain randomness, we have to perform measurements on the coin and position states, so let us study observables on the position and coin states.

Observables: One of the key advantages of the quantum walk over the classical one is a consequence of interference effects after several applications of U. There are various ways of extracting information from the evolved system. One possible way is to perform a measurement on the coin using the measurement observable:

    M_c = α_0 |↑⟩⟨↑| + α_1 |↓⟩⟨↓|   (4.7)

and then to perform a measurement on the position states of the walker using:

    M_p = Σ_i α_i |i⟩⟨i|   (4.8)

In order to show that the quantum walk really differs from its classical counterpart, let us evolve the walk for a few steps starting from the initial condition |ψ⟩_initial = |↓⟩⊗|0⟩ and study the corresponding probability distribution.

³ In fact, if we measured the probability distribution at every step, we would get the usual classical distribution. Quantum mechanics thus dictates that, in order to get the benefit, we need to let the system evolve for some time steps.

    |ψ⟩_initial  --U-->  1/√2 (|↑⟩⊗|1⟩ − |↓⟩⊗|−1⟩)   (4.9)
                 --U-->  1/2 (|↑⟩⊗|2⟩ − (|↑⟩ − |↓⟩)⊗|0⟩ + |↓⟩⊗|−2⟩)   (4.10)
                 --U-->  1/(2√2) (|↑⟩⊗|3⟩ + |↓⟩⊗|1⟩ + |↑⟩⊗|−1⟩ − 2|↓⟩⊗|−1⟩ − |↓⟩⊗|−3⟩)   (4.11)

Just by checking these three steps, we can see that the probability distribution of the quantum walk differs from the classical one. For example, there is a higher probability of being at position −1 in the quantum case than in the classical walk. Table 4.2 lists the quantum probability distribution and Table 4.1 the classical probability distribution when the position register is measured after T steps. Figures 4.3 and 4.4 show the probability distribution after 100 steps with symmetric and asymmetric initial conditions, respectively. The distribution is clearly very different from the Gaussian obtained in the classical case: one sees a two-peaked probability distribution, which is even more pronounced for the symmetric initial condition.

T \ i |  -5  |  -4  |  -3  |  -2  |  -1  |  0   |  1   |  2  |  3   |  4   |  5
  0   |      |      |      |      |      |  1   |      |     |      |      |
  1   |      |      |      |      | 1/2  |      | 1/2  |     |      |      |
  2   |      |      |      | 1/4  |      | 1/2  |      | 1/4 |      |      |
  3   |      |      | 1/8  |      | 3/8  |      | 3/8  |     | 1/8  |      |
  4   |      | 1/16 |      | 1/4  |      | 3/8  |      | 1/4 |      | 1/16 |
  5   | 1/32 |      | 5/32 |      | 5/16 |      | 5/16 |     | 5/32 |      | 1/32

Table 4.1: Probability of being at position i after T steps for a classical random walk.

T \ i |  -5  |  -4  |  -3   |  -2  |  -1  |  0   |  1   |  2  |  3   |  4   |  5
  0   |      |      |       |      |      |  1   |      |     |      |      |
  1   |      |      |       |      | 1/2  |      | 1/2  |     |      |      |
  2   |      |      |       | 1/4  |      | 1/2  |      | 1/4 |      |      |
  3   |      |      | 1/8   |      | 5/8  |      | 1/8  |     | 1/8  |      |
  4   |      | 1/16 |       | 5/8  |      | 1/8  |      | 1/8 |      | 1/16 |
  5   | 1/32 |      | 17/32 |      | 1/8  |      | 1/8  |     | 5/32 |      | 1/32

Table 4.2: The probability of being found at position i after T steps of the quantum random walk on the line. Note that the probabilities start to differ from the classical case at step 3, as illustrated in Eqs. (4.9)–(4.11). Another important feature is the asymmetry of the distribution, which is a consequence of the particular choice of initial state and coin operator. Details of the table can be found in [61].
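The step-3 row of Table 4.2 can be reproduced by applying U = S·(C ⊗ I) directly. The sketch below (my own code, on a line truncated to positions −3…3) starts from |↓⟩⊗|0⟩ and recovers the probabilities 1/8, 5/8, 1/8, 1/8:

```python
import numpy as np

T = 3
P = 2 * T + 1                                  # positions -T..T; index i <-> position i - T
up, down = np.array([1, 0]), np.array([0, 1])

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)   # Hadamard coin, Eq. (4.4)
shift_plus = np.roll(np.eye(P), 1, axis=0)     # sum_i |i+1><i|
shift_minus = np.roll(np.eye(P), -1, axis=0)   # sum_i |i-1><i|
S = (np.kron(shift_plus, np.outer(up, up))
     + np.kron(shift_minus, np.outer(down, down)))   # Eq. (4.5)
U = S @ np.kron(np.eye(P), H)                  # one step of the walk

psi = np.kron(np.eye(P)[T], down)              # |0> (position) ⊗ |down> (coin)
for _ in range(T):
    psi = U @ psi

prob = (np.abs(psi.reshape(P, 2)) ** 2).sum(axis=1)   # trace out the coin
for pos, pr in zip(range(-T, T + 1), prob):
    print(pos, pr)
```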

Remarks on the probability distribution

By now it is clear that the classical probability distribution of a random walk is very different from the quantum one. This is an intricate feature of the quantum world. The classical symmetric random walk on the line after T steps has a variance of σ² = T, so the expected distance from the origin is of the order √T. However, it can be shown that the quantum

Fig. 4.3: The probability distribution of the quantum random walk with Hadamard coin starting in 1/√2 (|↑⟩ + |↓⟩) ⊗ |0⟩ after N = 100 steps. Since odd points have probability zero, only the probability at the even points is shown. The symmetric probability distribution is a feature of the symmetric initial condition and the Hadamard coin. Code for plotting this figure can be found in App. 4.A.

walk has a variance which scales as σ² ∼ T². Thus the expected distance from the origin is of the order σ ∼ T. Therefore, the quantum walk propagates quadratically faster than the classical walk (which, as we saw, obeys a free-diffusion law). In Fig. 4.5, we plot the comparison between the classical and quantum variance.
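The σ² ∼ T² scaling can be verified with the same walk operator. The sketch below (my own code) computes the position variance of the Hadamard walk with the symmetric initial coin for a few values of T; the classical value would be σ² = T:

```python
import numpy as np

def hadamard_walk_variance(T):
    """Position variance after T steps of the Hadamard walk, symmetric start."""
    P = 2 * T + 1
    positions = np.arange(-T, T + 1)
    up, down = np.array([1, 0]), np.array([0, 1])
    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    S = (np.kron(np.roll(np.eye(P), 1, axis=0), np.outer(up, up))
         + np.kron(np.roll(np.eye(P), -1, axis=0), np.outer(down, down)))
    U = S @ np.kron(np.eye(P), H)
    psi = np.kron(np.eye(P)[T], (up + 1j * down) / np.sqrt(2))  # symmetric coin
    for _ in range(T):
        psi = U @ psi
    prob = (np.abs(psi.reshape(P, 2)) ** 2).sum(axis=1)
    mean = (positions * prob).sum()
    return ((positions - mean) ** 2 * prob).sum()

for T in (10, 20, 40):
    print(T, hadamard_walk_variance(T))
```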

Discrete Quantum Walk on a general graph

Now let us generalize the quantum walk to a general graph. Let G = (V,E) be a d-regular graph with |V| = n, and let H_V be the Hilbert space spanned by the states |v⟩ with v ∈ V. Here the graph is finite, in contrast to the unrestricted infinite line of the DQWL. The coin space H_A is an auxiliary Hilbert space of dimension d spanned by the basis states |i⟩, i ∈ {1, ..., d}. The coin operator C is a unitary transformation on H_A. The shift operator S acts on H_V ⊗ H_A as S|a, v⟩ = |a, u⟩, where u is the a-th neighbor of v. Therefore, one step of the quantum walk on a general graph G is again U = S(C ⊗ I). If |ψ⟩_0 is the initial state, then the state of the quantum walk after t steps is given by

    |ψ⟩_t = U^t |ψ⟩_0   (4.12)

4.3.2 Continuous-time quantum walk (CTQW)

So far, we have discussed the discrete-time quantum walk. Now we will focus our attention on the continuous-time quantum walk, introduced early on by Feynman [66] and later rigorously by Farhi and Gutmann [68], in which space is discrete but time is continuous. The continuous-time quantum walk has some features that differ from the discrete case, namely:

• CTQW are described by Markov processes;

Fig. 4.4: The probability distribution of the quantum random walk with Hadamard coin starting in |↑⟩ ⊗ |0⟩ after N = 100 steps. Since odd points have probability zero, only the probability at the even points is shown. The asymmetric probability distribution is a feature of the asymmetric initial condition and the Hadamard coin. Code for plotting this figure can be found in App. 4.A.

• Structure of a graph describes a Hamiltonian under which CTQW evolves.

• No coin operator is used in CTQW.

A nice property of the CTQW is that it can be considered a straightforward quantum version of the classical continuous-time random walk, which is defined as follows. Let G = (V,E) be a graph with |V| = n; then a continuous-time random walk on G can be described by the order-n generator matrix M given by:

    M_ab = −γ   if a ≠ b and (a, b) ∈ G,
    M_ab = 0    if a ≠ b and (a, b) ∉ G,
    M_ab = kγ   if a = b, where k is the valence of vertex a.

Here, γ denotes the jumping rate per unit time between vertices, and the above conditions imply that the random walk can only occur between nodes connected by an edge; M_ab is a matrix element of the stochastic generator matrix M, and only matrix elements between nearest neighbors are non-zero. The probability of being at vertex a at time t is then governed by:

    dp_a(t)/dt = − Σ_b M_ab p_b(t)   (4.13)

These probabilities satisfy the classical normalization condition, i.e.

    Σ_a p_a(t) = 1   (4.14)

Eq. (4.13) is very similar to the Schrödinger equation, apart from the missing factor of i. With this inspiration, let us go to the quantum case. Let H be a Hamiltonian with matrix elements

Fig. 4.5: Comparison of classical vs quantum variance. The quantum walk has a variance which scales as σ² ∼ T², while the classical symmetric random walk has a variance of σ² = T. This demonstrates that the quantum walk can propagate quadratically faster than the classical walk.

given by:

    ⟨a|H|b⟩ = M_ab   (4.15)

We then construct an N-dimensional Hilbert space consisting of the vectors |a⟩, where a = 1, 2, ..., N denotes the graph's vertices as before. The Schrödinger equation is then:

    i d⟨a|ψ(t)⟩/dt = Σ_b ⟨a|H|b⟩ ⟨b|ψ(t)⟩   (4.16)

Similar to the classical case, the probabilities are conserved:

    Σ_a |⟨a|ψ(t)⟩|² = 1   (4.17)

Finally, with Eqs. (4.15) and (4.16), the unitary operator U

    U = exp(−iHt)   (4.18)

defines a continuous-time quantum walk on the graph G, such that the state vector in the Hilbert space at time t is given by the evolution |ψ(t)⟩ = U|ψ(0)⟩.

The models for discrete and continuous quantum walks pose some theoretical and experimental issues. The transformation between the discrete and the continuous quantum walk is not very clear, in contrast to the classical case, where the continuous random walk is the continuum version of the discrete walk, connected via a limiting process. It is also not very natural to have different kinds of quantum diffusion, one of them with an extra particle (the quantum coin), with no clear connection between them.⁴
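Both evolutions are easy to simulate side by side. The sketch below (my own code, for an 8-vertex cycle with γ = 1) solves the classical master equation (4.13) via the spectral decomposition of M and applies U = exp(−iHt) of Eq. (4.18), checking the normalization conditions (4.14) and (4.17):

```python
import numpy as np

n = 8
# Adjacency matrix A of an n-cycle; classical generator M = D - A (gamma = 1).
A = np.roll(np.eye(n), 1, axis=0) + np.roll(np.eye(n), -1, axis=0)
M = np.diag(A.sum(axis=1)) - A

start = np.eye(n)[0]                      # walker starts at vertex 0
t = 2.0

wM, VM = np.linalg.eigh(M)                # M is symmetric
p = VM @ (np.exp(-wM * t) * (VM.T @ start))         # p(t) solves Eq. (4.13)

wH, VH = np.linalg.eigh(A)                # take H = A, cf. Eq. (4.15)
psi = (VH * np.exp(-1j * wH * t)) @ (VH.T @ start)  # psi(t) = exp(-iHt) psi(0)

print(p.sum())              # classical normalization, Eq. (4.14)
print(np.linalg.norm(psi))  # quantum normalization, Eq. (4.17)
```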

Continuous Quantum Walk Based Algorithms:

A lot of effort and resources have been devoted to building both classical and quantum algorithms for solving a variety of search problems. For example, a typical classical random-walk-based algorithm in computer science is the PageRank algorithm, which calculates the importance of web pages by randomly walking among them. One of the main motivations behind building quantum computers is to develop quantum algorithms that are better than their classical counterparts. Farhi and Gutmann made an early attempt to develop a quantum walk algorithm [68], which had a polynomial benefit over the classical algorithm. Later, Childs et al. [55] constructed an oracular problem that can be solved exponentially faster by a quantum walk than by any classical algorithm. The graph they used is known as a glued tree: when two binary trees of equal height are glued such that each of the leaves of one tree is connected to exactly two leaves of the other tree, one gets a glued tree. Examples of such graphs are shown in Figures 4.6 and 4.7. In these graphs, each node that was a leaf in one of the original trees now has degree 3, except for the end points. Childs et al. first considered a graph consisting of two balanced binary trees of height n, shown in Fig. 4.6, and showed that in this case there is no benefit of the quantum over the classical case. They then modified the graph by randomly choosing leaves on the left and connecting them to leaves on the right, as shown in Fig. 4.7; we will call this graph the modified tree G_r. Both the classical random walk and the quantum walk are allowed to go from ENTRANCE to EXIT. Childs et al. proved that on the modified tree G_r, the quantum walk can reach the EXIT point exponentially faster than the classical random walk.⁵ One simple way to understand this is to see what happens for a classical walk on such a modified graph.

⁴ Childs [69] and Strauch [70] have made attempts to connect the discrete and the continuous quantum walk.
In the left tree, the probability of moving one level to the right is twice that of moving one level to the left; in the right tree, the opposite happens. One can thus see that the walk gets lost in the middle of the tree: it gets stuck and has an exponentially small probability of reaching the EXIT node. For the quantum case, we use a quantum walk traversing the modified graph G_r (Fig. 4.7). We define a Hamiltonian H based on the adjacency matrix A of G, with matrix elements given by:

    ⟨a|H|a′⟩ = γ   if a ≠ a′ and aa′ ∈ G,
    ⟨a|H|a′⟩ = 0   otherwise.   (4.19)

Childs et al. used an oracle to obtain the information about the structure of the graph G through the Hamiltonian given by Eq. (4.19). They were then able to prove that it is possible to design a continuous-time quantum walk that traverses the graph G exponentially faster than any classical walk.
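A small-scale version of this setup can be built explicitly. The sketch below is entirely my own construction (function names and parameters are hypothetical): it glues two binary trees of height 2 leaf-to-leaf with a random cycle, evolves the CTQW generated by Eq. (4.19) with γ = 1, and tracks the probability of being at EXIT:

```python
import numpy as np

def glued_trees(height, rng):
    """Two binary trees of the given height, leaves joined by a random cycle.

    My own small-scale construction of the graph of Childs et al.;
    vertex 0 is ENTRANCE and the right root is EXIT.
    """
    t = 2 ** (height + 1) - 1                    # vertices per tree
    A = np.zeros((2 * t, 2 * t))
    for v in range(1, t):
        parent = (v - 1) // 2
        A[v, parent] = A[parent, v] = 1          # left tree, root 0
        A[t + v, t + parent] = A[t + parent, t + v] = 1  # right tree, root t
    left = list(range(2 ** height - 1, t))       # leaves of the left tree
    right = [t + v for v in left]                # leaves of the right tree
    rng.shuffle(right)
    k = len(left)
    for i in range(k):                           # each leaf gets two cross edges
        A[left[i], right[i]] = A[right[i], left[i]] = 1
        A[left[(i + 1) % k], right[i]] = A[right[i], left[(i + 1) % k]] = 1
    return A

rng = np.random.default_rng(0)
A = glued_trees(2, rng)
exit_ = A.shape[0] // 2                          # index of the right root

w, V = np.linalg.eigh(A)                         # H = A, i.e. gamma = 1 in Eq. (4.19)
psi0 = np.zeros(A.shape[0]); psi0[0] = 1         # start at ENTRANCE
p_exit = [abs(((V * np.exp(-1j * w * t)) @ (V.T @ psi0))[exit_]) ** 2
          for t in np.linspace(0, 30, 301)]
print(max(p_exit))   # the walk reaches EXIT with appreciable probability
```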

4.3.3 Universal computation by quantum walk

In the field of quantum computing, universality is one of the most desirable properties of a model of computation: one can show that such a model is capable of simulating any other model of computation, although possibly on different time scales. There have been several efforts to build up such universal quantum computers. In the field of quantum walks, Childs proved that the model using continuous-time quantum walks on graphs is universal for quantum computation [57].⁶ The steps in the proof by Childs are as follows:

1. Propagating a continuous-time quantum walk on a graph G is equivalent to executing a continuous-time quantum-walk-based algorithm. Propagation on a graph occurs via scattering theory.

2. The structure of the graph G depends on the problem we want to solve. However, in most cases the graph G can be divided into subgraphs having a maximum degree of 3.

⁵ However, one should understand that it is not possible to find out the path taken by the quantum walk; it can only find the node named EXIT.
⁶ Universal computation using a discrete-time quantum walk also exists [59].

Fig. 4.6: Balanced tree. The task is to find the node EXIT by a classical or quantum walk coming from ENTRANCE. Figure taken from [62].

3. The graph G is finite in terms of the length of the quantum wires and the number of quantum gates.

4. Quantum wires do not represent qubits; instead, they represent quantum states. This implies that the number of quantum wires in a graph G blows up exponentially with the number of qubits.

5. Universality: a set of gates is universal if any unitary operation may be approximated to arbitrary accuracy by a quantum circuit using only these gates. The main focus of Childs' paper is to simulate a universal gate set for quantum computation by propagating continuous-time quantum walks on different graphs. The gate set Childs uses consists of the controlled-NOT, phase, and basis-changing gates, given in matrix representation as:

    CNOT = ( 1 0 0 0
             0 1 0 0
             0 0 0 1
             0 0 1 0 )

    U_b (phase) = ( 1      0
                    0  e^(iπ/4) )

    U_c (basis changing) = 1/√2 ( 1  i
                                  i  1 )

The single-qubit gates U_b and U_c generate a dense subset of SU(2).

6. Childs' paper provides a theoretical framework; it does not constitute a hardware-oriented proposal for implementing a quantum computer based on continuous-time quantum walks.
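The gate matrices listed in step 5 are easy to sanity-check numerically. The sketch below (my own check, not from Childs' paper) verifies that each gate is unitary and that CNOT is its own inverse:

```python
import numpy as np

CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)
Ub = np.array([[1, 0],
               [0, np.exp(1j * np.pi / 4)]])   # phase gate
Uc = np.array([[1, 1j],
               [1j, 1]]) / np.sqrt(2)          # basis-changing gate

for U in (CNOT, Ub, Uc):
    # unitarity: U U† = I
    assert np.allclose(U @ U.conj().T, np.eye(U.shape[0]))

print(np.allclose(CNOT @ CNOT, np.eye(4)))     # True: CNOT is an involution
```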

Scattering theory on graphs

Childs utilizes the scattering theory of graphs to show the universality of the continuous-time quantum walk. We represent an infinite line of vertices by computational basis states |x⟩ for x ∈ Z.

Fig. 4.7: Modified Balance tree. The task is to find the node EXIT ba a classical walk or quantum walk coming from ENTRANCE. Figure taken from [62].

Each vertex is connected to its nearest neighbors x ± 1. The eigenstates are the momentum states |k̃⟩ with:

    ⟨x|k̃⟩ = e^(ikx)   (4.20)

for k ∈ [−π, π) and ⟨k̃|k̃′⟩ = 2πδ(k − k′). The eigenvalues are 2cos(k). Now, consider a graph G, and produce an infinite graph with adjacency matrix H by attaching semi-infinite lines to N of its vertices. For each j ∈ {1, 2, ..., N} (i.e. for each semi-infinite line attached to G), there is an incoming scattering state of momentum k, denoted by |k̃, sc⟩, given on the lines by:

    ⟨x, j|k̃, sc⟩ = e^(−ikx) + R_j(k) e^(ikx)   (4.21)
    ⟨x, j′|k̃, sc⟩ = T_{j,j′}(k) e^(ikx),  j′ ≠ j   (4.22)

The reflection coefficients R_j(k), the transmission coefficients T_{j,j′}(k), and the states |k̃, sc⟩ are determined by the eigenequation H|k̃, sc⟩ = 2cos(k)|k̃, sc⟩. The eigenstates |k̃, sc⟩, together with the bound states, form a complete orthogonal set of eigenfunctions of H. The propagator for scattering through G can be calculated using these eigenstates.⁷ These tools are now enough to compute the propagation of the continuous-time quantum walk on the graphs, which then act as gates. For a further discussion of this topic, please see Ref. [57].
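The dispersion relation behind Eq. (4.20) — eigenvalue 2cos(k) for the plane wave e^(ikx) — can be checked on a finite ring, where a commensurate momentum makes the relation exact (my own sketch):

```python
import numpy as np

L = 200
x = np.arange(L)
# Adjacency matrix of a ring: each vertex is connected to x-1 and x+1.
A = np.roll(np.eye(L), 1, axis=0) + np.roll(np.eye(L), -1, axis=0)

k = 2 * np.pi * 7 / L            # momentum commensurate with the ring
psi = np.exp(1j * k * x)         # plane wave <x|k~> = e^{ikx}, Eq. (4.20)

print(np.allclose(A @ psi, 2 * np.cos(k) * psi))   # True: A psi = 2cos(k) psi
```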

Universal Gate Set Widgets

As we have already discussed, the universal gate set chosen by Childs consists of the controlled-NOT, phase, and basis-changing gates. We will now discuss each gate individually.

Controlled-NOT gate: The controlled-NOT gate is trivial, because it is sufficient to exchange the quantum wires corresponding to the basis states |10⟩ and |11⟩. The widget for the controlled-NOT gate is shown in Fig. 4.8. This process might look unrealistic, but we should keep in mind that Childs' proposal is of a theoretical kind: it describes the logical process one should perform in order to achieve universal quantum computation.

⁷ We do not need to write the expression for the bound states, as Childs proved that they can be neglected for the scattering process.

Phase gate: For the phase gate, we apply a nontrivial phase to |1⟩ while doing nothing to |0⟩. The widget for the phase gate is shown in Fig. 4.9.

Basis-changing gate: The basis-changing widget is shown in Fig. 4.10. For more details, please check Ref. [57].

Fig. 4.8: CNOT widget. Figure taken from [57].

Fig. 4.9: Phase changing widget. Figure taken from [57].

Fig. 4.10: Basis changing widget. Figure taken from [57].

4.4 Conclusion

In this article we have reviewed the theoretical foundations of the discrete- and continuous-time quantum walk. We started from the classical random walk and then developed a beautiful framework for the discrete quantum walk. We showed some striking differences in the probability distribution and behavior of quantum walks compared to their classical counterparts. Quantum computing is hoped to provide new and faster algorithms than classical computers, and we showed that the continuous-time walk can be a candidate for such algorithms. Using the continuous-time walk, we reviewed the paper by Childs on universal computation by continuous-time quantum walk. The quantum walk is an exciting field of research and is still dotted with open questions and problems. We hope the reader has been motivated by the field and has obtained a general overview of its status and the directions it can take.

App. 4.A Notebook for Discrete quantum walk simulation

In this section, I include the code that I used to plot Figures 4.3 and 4.4, the symmetric and asymmetric discrete quantum walks.

1 Quantum Random Walk

[1]: from numpy import *
     from matplotlib.pyplot import *

[2]: N = 100      # number of random steps
     P = 2*N + 1  # number of positions

1.1 Quantum Coin

The quantum coin has basis states 0 and 1. We toss a quantum coin to decide whether to go left or right (or into a superposition).

[3]: coin0 = array([1, 0])  # |0>
     coin1 = array([0, 1])  # |1>

1.2 Hadamard coin operator

The coin operator, which can be used to flip a quantum coin into a superposition, is:

[4]: C00 = outer(coin0, coin0)  # |0><0|
     C01 = outer(coin0, coin1)  # |0><1|
     C10 = outer(coin1, coin0)  # |1><0|
     C11 = outer(coin1, coin1)  # |1><1|

     C_hat = (C00 + C01 + C10 - C11)/sqrt(2.)

1.3 Shift (step) operator

The step operator moves left or right along the line, depending on the value of the coin:

[5]: ShiftPlus = roll(eye(P), 1, axis=0)
     ShiftMinus = roll(eye(P), -1, axis=0)
     S_hat = kron(ShiftPlus, C00) + kron(ShiftMinus, C11)

1.4 Walk operator

The walk operator combines the coin operator on the coin state and a step operator on the combined coin and position state:

[6]: U = S_hat.dot(kron(eye(P), C_hat))

1.5 Initial State

Let's take the initial state of the system to be a coin in a superposition of left and right, and the walker at position 0:

The initial state of the coin can be changed here.

[7]: posn0 = zeros(P)
     posn0[N] = 1  # array indexing starts from 0, so index N is the central posn
     psi0 = kron(posn0, (coin0 + coin1*1j)/sqrt(2.))

1.6 State after N steps

Walking N steps is just applying the walk operator N times:

[8]: psiN = linalg.matrix_power(U, N).dot(psi0)

1.7 Measurement operator

[9]: prob = empty(P)
     for k in range(P):
         posn = zeros(P)
         posn[k] = 1
         M_hat_k = kron(outer(posn, posn), eye(2))
         proj = M_hat_k.dot(psiN)
         prob[k] = proj.dot(proj.conjugate()).real

[23]: fig = figure()
      ax = fig.add_subplot(111)
      plot(arange(P), prob)
      ax.set_xticklabels(range(-N, N+1, P // 10))
      xlabel('position')
      ylabel('probability of locating walker at position n')
      suptitle('Hadamard walk for N = 100, with symmetric initial condition')
      show()

Probability distribution for a quantum random walk with N = 100, symmetric initial coin

1.8 Changing the initial state

We start with the initial coin state |0>.

[24]: newPosn0 = zeros(P)
      newPosn0[N] = 1  # array indexing starts from 0, so index N is the central posn
      newPsi0 = kron(newPosn0, coin0)

      newPsiN = linalg.matrix_power(U, N).dot(newPsi0)
      newProb = empty(P)
      for k in range(P):
          posn = zeros(P)
          posn[k] = 1
          M_hat_k = kron(outer(posn, posn), eye(2))
          proj = M_hat_k.dot(newPsiN)
          newProb[k] = proj.dot(proj.conjugate()).real

      newfig = figure()
      ax = newfig.add_subplot(111)
      ax.set_xticklabels(range(-N, N+1, P // 10))
      plot(arange(P), newProb)

      xlabel('position')
      ylabel('probability of locating walker at position n')
      suptitle('Hadamard walk for N = 100, with asymmetric initial condition')
      show()

Chapter 5

Adiabatic Quantum Computing

Ananya Singh Supervisor: Markus Müller

Since the early 2000s, Adiabatic Quantum Computation (AQC) has emerged as an alternative way of performing quantum computation. In this report, we will highlight the main principle behind this approach and present solutions to simple problems to show the correctness of the model. Subsequently, we will apply this computational method to a well-known quantum algorithm and discuss its complexity. Finally, we will touch upon universality before briefly discussing the potential of this method and the outlook that lies ahead for it.

5.1 Introduction

Quantum computation and its applications hardly need an introduction, as they are widely known. At the centre of this computational paradigm sits the difficulty of its implementation. In general, there are two approaches one can take to perform quantum computation. The first is the gate model, an intuitive approach that follows the Church-Turing-Deutsch thesis. It is intuitive because it is very similar to the logic implemented in classical computers: electronic gates that can perform all kinds of logic when combined in different ways. There are, however, important differences between the elements we are going to show and their classical counterparts. Unfortunately, the actual realization of gate-based quantum algorithms (the application of quantum gates to some initial quantum state) faces the problem that errors accumulate over many operations, and the resulting decoherence tends to destroy the fragile quantum features needed for the computation.

Hence, an alternative approach has been suggested: adiabatic quantum computation (AQC). Instead of specifying the actual operations we need to run in order to solve our problem, we might want the computer to solve the problem by itself. This idea comes from the fact that physical systems in general naturally evolve towards the minimum of some quantity (energy, etc.) or remain in steady states. It would thus be useful if we could imagine a quantum system whose energy levels we could manipulate and which, after receiving the problem we need solved, could evolve towards a solution of our problem. We can use quantum mechanics and its properties to show that this is indeed possible. The property that unveils this magic of finding the solution on its own is described by what is called the adiabatic theorem.

5.1.1 Overview of the report

In Section 5.2 we will introduce the preliminaries required to understand adiabatic computation, namely the adiabatic theorem and the Landau-Zener transition. From there we will move on to Section 5.3, which dives into the computational aspect of AQC, where we solve a simple example to understand the phenomenology. After this, we use the concepts learned to solve a quantum algorithm adiabatically and simultaneously discuss its complexity. Finally, before summarizing, we briefly talk about the universality of AQC and its equivalence with the quantum circuit model.

5.2 Adiabatic Theorem

The content of this section is inspired by [71] and [72].

In this section we will first discuss the celebrated adiabatic theorem, which forms the basis for the adiabatic model of quantum computation. Essentially, the adiabatic approximation is employed as a standard method in quantum mechanics to derive approximate solutions of the Schrödinger equation when the Hamiltonian is slowly varying. Its central theme can be summarized as follows:

If a quantum system has a time-dependent Hamiltonian and is in an eigenstate at the beginning of the process, it will remain in the corresponding eigenstate of the new Hamiltonian (the one at the end of the process), provided that the evolution is slow enough.

The Hamiltonian of a system provides a complete specification of its time evolution. At time t, let |Ψ(t)⟩ denote the state of the system, whose evolution is governed by the well-known Schrödinger equation

\[ i\hbar\,\frac{d}{dt}\,|\Psi(t)\rangle = H(t)\,|\Psi(t)\rangle, \tag{5.1} \]

where the initial condition of this differential equation is that |Ψ(0)⟩ is the ground state of H(0). To solve this equation we first consider a family of instantaneous eigenstates |ψ_n(t)⟩ of H(t),

\[ H(t)\,|\psi_n(t)\rangle = E_n(t)\,|\psi_n(t)\rangle, \tag{5.2} \]

where E_n(t) are the corresponding eigenenergies with E_1(t) < E_2(t) < ..., so that there is no degeneracy, and the eigenvectors |ψ_n(t)⟩ are chosen to be orthonormal. Next, we represent |Ψ(t)⟩ in the basis formed by the eigenstates |ψ_n(t)⟩,

\[ |\Psi(t)\rangle = \sum_n c_n(t)\,|\psi_n(t)\rangle. \tag{5.3} \]

Inserting this expression into equation (5.1), projecting onto ⟨ψ_k(t)|, and simplifying, we obtain the following evolution equations for the coefficients

\[ i\hbar\,\dot c_k = \Big( E_k - i\hbar\,\langle\psi_k(t)|\dot\psi_k(t)\rangle \Big)\, c_k \;-\; i\hbar \sum_{n\neq k} \frac{\dot H_{kn}}{E_n - E_k}\, c_n, \tag{5.4} \]

with Ḣ_kn ≡ ⟨ψ_k(t)|Ḣ(t)|ψ_n(t)⟩. A detailed treatment with the full calculation can be found in Appendix 5.A, and interested readers can also refer to [71] and [72] for further explanation. The most important piece of equation (5.4) is the rightmost term: for a slowly varying Hamiltonian this term vanishes, so if the system starts in the eigenstate |ψ_k⟩ it stays in |ψ_k⟩, which is precisely the statement of the adiabatic theorem. Mathematically, after making this approximation and integrating, one finds

\[ c_k(t) = c_k(0)\,\exp\{i\theta_k(t)\}\,\exp\{i\gamma_k(t)\}, \tag{5.5} \]

where

\[ \theta_k(t) = -\frac{1}{\hbar}\int_0^t E_k(t')\,dt', \tag{5.6} \]

and

\[ \gamma_k(t) = i\int_0^t \langle\psi_k(t')|\dot\psi_k(t')\rangle\,dt'. \tag{5.7} \]

A condition for the adiabatic regime can be obtained by integrating equation (5.4) and letting the total evolution time T go to infinity. Stated without proof, a general estimate of the time scale on which the adiabatic regime is approached is

\[ T \gg \frac{\varepsilon}{(\Delta E)^2}, \tag{5.8} \]

where

\[ \varepsilon = \max_{s\in[0,1]} \left| \left\langle n(s) \right| \frac{dH(s)}{ds} \left| k(s) \right\rangle \right|, \tag{5.9} \]

and

\[ \Delta E = \min_{s\in[0,1]} \big[ E_n(s) - E_k(s) \big], \tag{5.10} \]

with s = t/T ∈ [0, 1] the (linear) evolution profile.

5.2.1 Adiabatic quantum algorithm

A detailed analysis and discussion of the adiabatic algorithm will be presented in section 5.3, but a short prelude is in order after the treatment of the adiabatic theorem. The adiabatic quantum algorithm works by applying a time-dependent Hamiltonian that interpolates smoothly from an initial Hamiltonian H_0 to a final Hamiltonian H_f. For instance, one can consider a linear interpolation path between the Hamiltonians

\[ H(s) = [1 - s(t)]\,H_0 + s(t)\,H_f, \tag{5.11} \]

with s(0) = 0 and s(T) = 1, where T is the total evolution time, i.e. the run time of the algorithm. We prepare the ground state of H_0 at time t = 0, and the state then evolves from t = 0 to T according to the Schrödinger equation. At time T we measure the state. Following the adiabatic theorem, if there is a non-zero gap between the ground state and the first excited state of H(s) for all s ∈ [0, 1], then the success probability of the algorithm approaches one in the limit T → ∞. How large T should be is given by equation (5.8), where one now replaces the indices n and k by 1 and 0, corresponding to the first excited state and the ground state, respectively. Hence, the total run time T of the algorithm is bounded by a polynomial in the number of qubits as long as ΔE and ε are polynomially bounded.
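As a sketch of the full procedure (not the report's code; the single-qubit H_0, H_f and the run times below are arbitrary illustrative choices), one can integrate the Schrödinger equation along the interpolation (5.11) and watch the success probability approach one for slow evolution:

```python
import numpy as np
from scipy.linalg import expm

sx = np.array([[0, 1], [1, 0]], dtype=complex)
H0 = (np.eye(2) - sx) / 2                  # ground state |+>
Hf = np.diag([0.0, 1.0]).astype(complex)   # ground state |0> encodes the "answer"

def success_probability(T, steps=2000):
    dt = T / steps
    psi = np.array([1, 1], dtype=complex) / np.sqrt(2)   # ground state of H0
    for m in range(steps):
        s = (m + 0.5) * dt / T                           # linear profile s = t/T
        Hs = (1 - s) * H0 + s * Hf                       # eq. (5.11)
        psi = expm(-1j * Hs * dt) @ psi                  # one Schroedinger step
    return abs(psi[0]) ** 2      # overlap with the ground state |0> of Hf

p_fast, p_slow = success_probability(1.0), success_probability(50.0)
print(p_fast, p_slow)            # slow evolution stays in the ground state
```

For this problem the gap of H(s) is bounded below by 1/√2, so a moderate run time already puts the evolution deep in the adiabatic regime.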

Fig. 5.1: Energy eigenvalue spectrum of Hamiltonian (5.12).

5.2.2 Landau-Zener transition

The content of this section is inspired by [72] and the simulations are inspired by [73].

After this short introduction to the adiabatic theorem, we would like to apply the formalism to a simple two-level quantum system with a time-dependent Hamiltonian, varying such that the energy separation of the two states can be approximated as a linear function of time. We then ask for the probability of a (non)adiabatic transition from one level to the other, for which Zener [74] published an exact solution in 1932. Landau [75] had formulated the same model independently, but Zener pointed out a missing factor of two in Landau's calculation. The exact solution of this one-dimensional semi-classical model for nonadiabatic transitions came to be known as the Landau-Zener formula. Consider a system that starts, in the infinite past, in the lower energy eigenstate; we wish to obtain the probability of finding the system in the excited energy eigenstate in the infinite future, a so-called Landau-Zener transition. For an infinitely slow variation of the energy difference, the adiabatic theorem tells us that no such transition takes place, as the system always stays in the instantaneous eigenstate of the evolving Hamiltonian. At non-zero sweep velocities the transition takes place according to the Landau-Zener formula. Here we will not present the analytical treatment; instead we verify the Landau-Zener prediction for a two-level system through simulation. First, consider a slightly idealized 2×2 system given by the Hamiltonian

\[ H(t) = \begin{pmatrix} \frac{\alpha t}{2} & 0 \\ 0 & -\frac{\alpha t}{2} \end{pmatrix}, \tag{5.12} \]

with energy eigenvalues E_1(t) = αt/2 and E_2(t) = −αt/2 corresponding to the eigenstates

Fig. 5.2: Energy eigenvalue spectrum of Hamiltonian (5.13).

|0⟩ and |1⟩. The energy eigenvalue spectrum is shown in Fig. 5.1. The instantaneous eigenstates |0⟩ and |1⟩ are time independent, and it is clear from Fig. 5.1 that the system stays in the same eigenstate from t = −∞ to t = +∞. Next, we consider a slightly more interesting Hamiltonian, obtained by introducing off-diagonal elements:

\[ H(t) = \begin{pmatrix} \frac{\alpha t}{2} & H_{12} \\ H_{12}^{*} & -\frac{\alpha t}{2} \end{pmatrix}, \tag{5.13} \]

with H_12 constant. The energy eigenvalue spectrum is shown by the green lines in Fig. 5.2. At t = 0 the energy eigenvalues are ±|H_12|, and if this magnitude decreases, the two energy eigenvalues come very close together, giving a finite transition probability. This probability is given by the Landau-Zener formula, which we state here without proof:

\[ p_\mathrm{ad.} = 1 - \exp\{-2\pi\,\omega_{12}\,\tau_\alpha\} = 1 - \exp\left( -2\pi\,\frac{|H_{12}|^2}{\hbar\alpha} \right), \tag{5.14} \]

where ω_12 is the Rabi frequency and τ_α ≡ |H_12|/|α| represents the duration of the change. We performed the simulation on the QuTiP platform, where we integrated the dynamics of Hamiltonian (5.13) and obtained the state of the system at any intermediate time t. The results are shown in Fig. 5.3, which displays the intermediate dynamics in terms of the occupation probabilities of the |0⟩ and |1⟩ states. The full code can be found in Appendix (B). It is easy to verify from Fig. 5.3 that the simulation results match the theoretical prediction shown as a black line.
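The report's QuTiP code is given in its appendix; the same check can be sketched with plain NumPy/SciPy by stepping the propagator with midpoint matrix exponentials (the values α = 1 and H_12 = 0.5 are arbitrary choices for this sketch, with ħ = 1):

```python
import numpy as np
from scipy.linalg import expm

alpha, H12 = 1.0, 0.5        # sweep rate and constant off-diagonal coupling

def H(t):                    # eq. (5.13)
    return np.array([[alpha * t / 2, H12],
                     [H12, -alpha * t / 2]], dtype=complex)

t0, t1, M = -40.0, 40.0, 8000
dt = (t1 - t0) / M

# start in the instantaneous ground state at the (large negative) initial time
vals, vecs = np.linalg.eigh(H(t0))
psi = vecs[:, 0]

for m in range(M):           # midpoint-rule propagation
    tm = t0 + (m + 0.5) * dt
    psi = expm(-1j * H(tm) * dt) @ psi

# probability of ending in the instantaneous ground state at the final time
vals, vecs = np.linalg.eigh(H(t1))
p_ad_sim = abs(vecs[:, 0].conj() @ psi) ** 2

p_ad_theory = 1 - np.exp(-2 * np.pi * H12 ** 2 / alpha)   # eq. (5.14)
print(p_ad_sim, p_ad_theory)
```

The simulated probability of staying adiabatic should agree with equation (5.14) up to small finite-time corrections, which vanish as the integration window grows.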

Fig. 5.3: The occupation probabilities of the |1⟩ (red curve) and |0⟩ (blue curve) states of a quantum two-level system. The solid black line is the final-state prediction of equation (5.14).

5.3 Adiabatic Quantum Computing

5.3.1 The ingredients

The groundbreaking work by Farhi et al. [76] (from which the content of this section is taken) shows how adiabatic evolution can be employed to solve computationally hard problems. The three main ingredients required for the algorithm to work are as follows:

a) Initial Hamiltonian H0 chosen in such a manner that the ground state of the system is easy to find/prepare.

b) Final Hamiltonian (or problem Hamiltonian) H_p that encodes the problem (an objective function), chosen so that the ground state of H_p, i.e. its eigenstate with minimum eigenvalue, encodes the solution of the problem.

c) An adiabatic evolution path, a function s(t) that increases from 0 to 1 as t goes from 0 to T, for some elapsed time T. An example of such a path is the linear one, s(t) = t/T.

Following the approach taken by Farhi et al., we take 3-SAT as an example to illustrate and explain the process. A 3-SAT problem contains n Boolean variables and a collection of OR clauses, each containing at most three of those variables. The aim is to find an assignment that satisfies all clauses. 3-SAT is NP-complete, and to date no polynomial-time algorithm has been found. The approach we discuss in this section is purely theoretical and will serve as a basis for understanding the examples in later sections.

To tackle the problem via adiabatic evolution, we use the principles learnt in the previous section together with the three ingredients above. First, we recast the SAT problem into one of minimizing an energy function, i.e. our problem Hamiltonian. Consider an n-bit 3-SAT instance where each bit z_i (i = 1, ..., n) can take the value 0 or 1, and each clause contains at most three of these n variables. Assume clause C contains the variables i_C, j_C and k_C. A basic energy function for this clause should be minimized when C is satisfied. We define

\[ h_C(z_1, z_2, \dots, z_n) = \begin{cases} 0 & \text{if } (z_{i_C}, z_{j_C}, z_{k_C}) \text{ satisfies } C, \\ 1 & \text{otherwise,} \end{cases} \tag{5.15} \]

and the total energy function of the problem as

\[ H(z_1, z_2, \dots, z_n) = \sum_C h_C(z_1, z_2, \dots, z_n). \tag{5.16} \]

This energy function is zero iff all the clauses are satisfied. The next step is to implement this function as a Hamiltonian and reach its ground state, which is equivalent to solving the problem. To do so, we take qubits |z_i⟩ as the elementary components of our system, with basis vectors |0⟩ and |1⟩. The global state |ψ⟩ is the tensor product |z_1⟩|z_2⟩...|z_n⟩, for which we use the shorthand |z_1 z_2 ... z_n⟩. The state space then has 2^n basis vectors, describing every single variable-assignment combination of our problem.
Representing equation (5.16) in quantum terminology,

\[ H_C\,|z_1 z_2 \dots z_n\rangle = h_C\,|z_1 z_2 \dots z_n\rangle, \tag{5.17} \]

and the total energy function has an associated operator given by the sum

\[ H\,|z_1 z_2 \dots z_n\rangle = \sum_C h_C\,|z_1 z_2 \dots z_n\rangle, \tag{5.18} \]

or simply put

\[ H = \sum_C H_C. \tag{5.19} \]

This will serve as our problem Hamiltonian, which we denote H_p. Our goal is to make the system evolve to this Hamiltonian while remaining in its ground state. But in order to achieve that, we first need to prepare the system in the ground state of a Hamiltonian we know. In general, the states of quantum spins oriented along the x axis serve this purpose, as it is fairly easy to prepare a spin in a given orientation along the x axis. We denote our initial/base Hamiltonian by H_0. Lastly, we perform the computation by evolution, which for a linear schedule over a time T has the expression

\[ H(t) = (1 - t/T)\,H_0 + (t/T)\,H_p, \tag{5.20} \]

or, using the dimensionless parameter s = t/T,

\[ H(s) = (1 - s)\,H_0 + s\,H_p. \tag{5.21} \]

This decomposition carries over to the individual clauses of the problem, which means that

\[ H_C(s) = (1 - s)\,H_{0,C} + s\,H_{p,C}, \tag{5.22} \]

and

\[ H(s) = \sum_C H_C(s). \tag{5.23} \]

This implies that at the end of the evolution the system reaches the ground state of the problem Hamiltonian. All we have to do at the end is measure the qubits, whose values are the variable assignment that satisfies the Boolean expression. An important consequence of the above analysis is that it can be used to solve many optimization problems. Starting with simple examples of adiabatic evolution for the case of two qubits, we show a simple application in the next section.
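As a sketch (not the report's code), the diagonal problem Hamiltonian of equations (5.15)-(5.19) can be built by summing clause penalties over all 2^n computational basis states; the clause encoding below is an illustrative choice:

```python
import numpy as np

def problem_hamiltonian(n, clauses):
    """Diagonal H_p for a SAT instance.

    clauses: list of clauses, each a list of (index, wanted) pairs meaning
    the clause is satisfied if ANY listed bit z_index equals `wanted`.
    """
    diag = np.zeros(2 ** n)
    for z in range(2 ** n):
        bits = [(z >> i) & 1 for i in range(n)]
        # h_C = 0 if clause satisfied, 1 otherwise (eq. 5.15); sum over C (eq. 5.16)
        diag[z] = sum(0 if any(bits[i] == w for i, w in c) else 1 for c in clauses)
    return np.diag(diag)

# tiny instance: (z0 OR z1) AND (NOT z0 OR NOT z1)  -> satisfied by z0 != z1
clauses = [[(0, 1), (1, 1)], [(0, 0), (1, 0)]]
Hp = problem_hamiltonian(2, clauses)
energies = np.diag(Hp)
print(energies)   # ground-state energy 0 exactly on the satisfying assignments
```

The ground-state subspace of this diagonal operator is spanned precisely by the satisfying assignments, mirroring the discussion above.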

5.3.2 Some simple examples

We follow Farhi et al. to describe the AQC algorithm for the 2-bit "disagree" problem, whose goal is to output |01⟩ and |10⟩ but not |00⟩ and |11⟩. The objective function on the bits x = x_1 x_2 is

\[ f_\text{disagree}(x) = \begin{cases} 0 & \text{if } x_1 \neq x_2, \\ 1 & \text{otherwise.} \end{cases} \tag{5.24} \]

The problem Hamiltonian: The goal is to build a Hamiltonian whose eigenvalue on each assignment x matches f_disagree(x). A bit of brainstorming and some knowledge of the Pauli σ operators leads to the following problem Hamiltonian

\[ H_p = \frac{1 + \sigma^z_1 \sigma^z_2}{2}. \tag{5.25} \]

It is easy to verify that the ground states of H_p are the doubly degenerate |10⟩ and |01⟩.

The initial Hamiltonian: To form the initial Hamiltonian we use the Pauli-x operator σ^x, whose eigenvectors are |x_+⟩ and |x_−⟩. We define the initial Hamiltonian as

\[ H_0 = \sum_{i=1}^{2} \frac{1}{2}\left(1 - \sigma^x_i\right). \tag{5.26} \]

One can easily verify that the eigenvalues of the above Hamiltonian are 0, 1, 1 and 2, and that the state corresponding to the lowest eigenvalue is |x_+ x_+⟩. Rewriting |x_+ x_+⟩ in terms of the standard basis, we have

\[ |x_+ x_+\rangle = \frac{1}{2}\,|00\rangle + \frac{1}{2}\,|01\rangle + \frac{1}{2}\,|10\rangle + \frac{1}{2}\,|11\rangle. \tag{5.27} \]

This means that the ground state of H_0 is a superposition in which each standard-basis state is equally likely to be observed. As a general rule in AQC, the initial Hamiltonian is specified in a basis orthogonal to that of the final Hamiltonian, so as to obtain an equiprobable superposition like this one. Using an orthogonal basis also makes a type of bad outcome called a "level crossing" unlikely to occur. Lastly, we employ the linear adiabatic evolution path described in equation (5.21) and finally measure the ground state. The simulation results for 2-bit disagree are shown in Fig. 5.4, and the corresponding occupation probabilities are shown in Fig. 5.5 for different evolution rates t/T, with T = 1 for fast evolution and T = 5000 for more nearly adiabatic evolution. It is easy to observe that at faster speeds the probability of staying in the ground state drastically decreases. One obtains analogous results for the 2-bit "agree" case, where the aim is to output |00⟩ and |11⟩ but not |10⟩ and |01⟩; the only difference lies in the definition of the problem Hamiltonian. Another interesting case is 2-bit "imply", where the satisfying assignments are 00, 01 and 11. The relevant level diagram is shown in Fig. 5.6.
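A quick numerical check of these two Hamiltonians (a NumPy sketch, not the report's code):

```python
import numpy as np

I2 = np.eye(2)
sx = np.array([[0, 1], [1, 0]])
sz = np.array([[1, 0], [0, -1]])

# H_p = (1 + sz_1 sz_2)/2  (eq. 5.25): penalizes agreeing bits
Hp = (np.eye(4) + np.kron(sz, sz)) / 2
# H_0 = sum_i (1 - sx_i)/2  (eq. 5.26): ground state |x+ x+>
H0 = (np.kron(I2 - sx, I2) + np.kron(I2, I2 - sx)) / 2

print(np.diag(Hp))                 # diagonal [1, 0, 0, 1]: ground states |01>, |10>
print(np.linalg.eigvalsh(H0))      # eigenvalues 0, 1, 1, 2
vals, vecs = np.linalg.eigh(H0)
print(np.abs(vecs[:, 0]))          # uniform amplitudes 1/2, i.e. |x+ x+> (eq. 5.27)
```

This confirms the doubly degenerate zero-energy subspace of H_p and the equal-amplitude ground state of H_0 quoted above.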

Fig. 5.4: The four eigenvalues of the interpolating Hamiltonian associated with "2-bit disagree". The same levels are associated with "2-bit agree".

5.3.3 A simple application - 2-SAT with 3 qubits

To set the stage for the application, imagine that you went to the supermarket and bought some supplies for the evening. After reaching home you suddenly realize that a friend will be joining you for dinner as well. You would like to serve your friend the best possible dinner with the supplies at hand. After a brief check you notice that you have a dry wine, a salmon and some delicious cheese. But there are some constraints on this dinner. Wine pairs really well with fish, so it would be crazy to serve fish without wine. However, such a light wine would be overwhelmed by the cheese you bought, so you can't possibly serve them together. On the other hand, your friend likes both fish and cheese, so you would like to serve at least one of them. An approach to finding the best possible dinner is to first represent the ingredients by Boolean variables p, q and r, indicating whether or not you will serve them. Further, use the Boolean

Fig. 5.5: The occupation probabilities corresponding to the eigenvalues shown in Fig. 5.4 at different speeds (T = 500: left; T = 1: right).

Fig. 5.6: The four eigenvalues of the interpolating Hamiltonian associated with the 2-bit imply clause.

OR, AND and negation operators to represent the optimization problem. Your Boolean function may look like this

\[ \varphi(p, q, r) = (p \lor r) \land (q \lor r) \land (p \lor q), \tag{5.28} \]

where each bracket represents a clause whose variables are joined by the OR operator, and the clauses are joined by the AND operator. If φ(p, q, r) = TRUE for a particular assignment of truth values to the variables, we say that assignment satisfies φ. Readers are encouraged to verify the satisfying assignment they find against the simulation results we present next. Equation (5.28) is a three-bit example built up from two-bit clauses, so we have an instance of 2-SAT with three bits. We now translate the Boolean expression into our Hamiltonian formalism and impose the constraints on the bits using the knowledge acquired in the previous section. We take the 2-bit imply clause on bits 1 and 2, the 2-bit disagree clause on bits 1 and 3, and the 2-bit agree clause on bits 2 and 3. Although each 2-bit clause has more than one satisfying assignment, the full problem has a single satisfying assignment, which we wish to obtain from AQC. For the corresponding quantum Hamiltonian, H(s) = (1 − s)H_0 + sH_p, we write the sum of Hamiltonians each of which acts on two bits; the problem Hamiltonian has the form

\[ H_p = H^{12}_\text{imply} + H^{13}_\text{disagree} + H^{23}_\text{agree}. \tag{5.29} \]

We performed the simulation using this problem Hamiltonian, and the results are shown in Fig. 5.7. The satisfying assignment we obtain is 011, which means you will serve fish and cheese for dinner.
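The clause bookkeeping behind equation (5.29) can be checked by brute force (a sketch; the penalty conventions mirror the 2-bit clauses defined above):

```python
from itertools import product

# penalty (0 = satisfied) of each 2-bit clause appearing in eq. (5.29)
imply = lambda a, b: 1 if (a, b) == (1, 0) else 0   # a -> b fails only for 10
disagree = lambda a, b: 1 if a == b else 0
agree = lambda a, b: 1 if a != b else 0

def Hp(z1, z2, z3):           # diagonal entry of the problem Hamiltonian
    return imply(z1, z2) + disagree(z1, z3) + agree(z2, z3)

energies = {z: Hp(*z) for z in product((0, 1), repeat=3)}
ground = [z for z, e in energies.items() if e == 0]
print(ground)                 # the unique zero-energy assignment
```

Enumerating all eight assignments shows a unique ground state with energy zero, the assignment 011 reported by the simulation.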

5.4 Adiabatic Quantum Algorithm

After such a preamble on adiabatic quantum computation in the previous sections, we now apply the same principles to a well-known quantum algorithm. For this purpose we chose Grover's

Fig. 5.7: The eight levels of the interpolating Hamiltonian with problem Hamiltonian given by equation (5.29).

search algorithm, whose purpose is well described by the title of the article by its originator, Lov Grover: 'Quantum Mechanics Helps In Searching For A Needle In A Haystack' [77]. Below we describe what kind of search we have in mind, briefly address both the classical setting and the quantum gate model, and in later sections dig deeper into the adiabatic computation and the complexity aspects of the algorithm.

5.4.1 Needle in the haystack

Let us first consider the classical version of Grover's search problem, where we have an unsorted database with N items and a certain item w one is looking for. For instance, one can imagine a telephone directory with N entries and a particular telephone number one is looking for. Additionally, assume that one is only provided with a black box for carrying out this search. This black box (or oracle) can decide whether an item is w or not. Mathematically, one has the Boolean function

\[ f(x) = \delta_{xw} = \begin{cases} 1 & \text{if } x = w, \\ 0 & \text{if } x \neq w. \end{cases} \tag{5.30} \]

The classical oracle allows one to evaluate this Boolean function for any element x of the database. Assuming that each application of the oracle requires one elementary step, a random classical search requires N − 1 steps in the worst case and one step in the best case. Hence, for large N, one requires on average N/2 steps to find the item w, meaning the computational time increases linearly with the size N of the database. In contrast, employing quantum mechanical superposition states (exploiting quantum interference) to address the elements of the database, as is done in the quantum gate model, the computational effort, i.e. the number of oracle calls needed for a search query, scales only as O(√N). Hence, with increasing database size the search time grows much more slowly, and one achieves a quadratic speedup. The basic idea behind the quantum algorithm is to rotate the initial reference state of the qubit system representing the database towards the searched state |w⟩ with the help of a unitary quantum version of the oracle. Here we have stated the result without any rigorous treatment; interested readers are encouraged to check the article by Jones [78] for a detailed explanation.
Next, we will discuss the adiabatic computational aspect of Grover’s algorithm and subsequently address the complexity argument.

5.4.2 Finding the needle adiabatically

This section is inspired by [79].

We now apply the adiabatic evolution method to the problem of finding the needle in our database. For simplicity, we assume there is only a single needle to be identified, and our aim is to find it in minimum time. We use n qubits to label the data, which means the Hilbert space has dimension N = 2^n. We denote the basis states by |i⟩ with i = 0, 1, ..., N−1, while the marked state is denoted by |w⟩. Since we do not know the marked state |w⟩, we choose our initial state as the uniform superposition of all basis states

\[ |\psi_0\rangle = \frac{1}{\sqrt{N}} \sum_{i=0}^{N-1} |i\rangle. \tag{5.31} \]

Following our treatment from section 5.2, we now define the initial Hamiltonian H_0:

\[ H_0 = I - |\psi_0\rangle\langle\psi_0|, \tag{5.32} \]

whose ground state is the state (5.31), with energy zero. Next, we define the problem Hamiltonian H_p:

\[ H_p = I - |w\rangle\langle w|, \tag{5.33} \]

which acts on the basis states as

\[ H_p\,|z\rangle = \begin{cases} |z\rangle & \text{if } z \neq w, \\ 0 & \text{if } z = w. \end{cases} \tag{5.34} \]

One can write the smoothly varying time-dependent Hamiltonian as a linear interpolation between the two Hamiltonians

\[ H(t) = (1 - t/T)\,H_0 + (t/T)\,H_p. \tag{5.35} \]

The algorithm consists of initializing the system in the easy-to-prepare state |ψ_0⟩ and then applying the Hamiltonian H(t) during a time T. The Hamiltonian is now well defined, and the instantaneous eigenvalue equation (5.2), H(t)|ψ(t)⟩ = E_n(t)|ψ(t)⟩, can be solved to obtain the eigenvalues and eigenstates. We performed this simulation for n = 10 qubits; the results are shown in Fig. 5.8, while Fig. 5.9 and Fig. 5.10 show the results for different adiabatic speeds. One can easily observe that for fast evolution, T = 5 (Fig. 5.9), the occupation probability of finding the needle drastically decreases near the point where the eigenvalues come close to each other. In Fig. 5.10, at T = 2000, i.e. for slower evolution, the probability of remaining in the ground state is significantly higher, which means you will find the needle about 4 out of every 5 times. A similar analysis was done in [79] with n = 64 qubits, and the results are shown in Fig. 5.11. We now follow the analysis presented in that paper to highlight the complexity arguments for the adiabatic Grover algorithm. To obtain the complexity one needs to evaluate the matrix element in equation (5.9) and the minimum gap in equation (5.10). It is easy to see that the two lowest energies E_0 and E_1 are separated by the time-dependent gap

\[ \Delta E_{10} = \sqrt{1 - 4\,\frac{N-1}{N}\,s(1-s)}, \tag{5.36} \]

Fig. 5.8: The two lowest eigenvalues of the interpolating Hamiltonian for the Grover problem with 10 bits.

with s = t/T. The minimum gap in equation (5.10), ΔE_10 = 1/√N, occurs at s = 1/2.
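The gap formula (5.36) is easy to confirm numerically by diagonalizing the interpolating Hamiltonian directly (a sketch for n = 6, i.e. N = 64; the marked item w = 0 is an arbitrary choice):

```python
import numpy as np

n = 6
N = 2 ** n
psi0 = np.ones(N) / np.sqrt(N)            # uniform superposition, eq. (5.31)
H0 = np.eye(N) - np.outer(psi0, psi0)     # eq. (5.32)
w = 0                                     # marked item (arbitrary here)
Hp = np.eye(N)
Hp[w, w] = 0                              # eq. (5.33): I - |w><w|

def gap(s):
    e = np.linalg.eigvalsh((1 - s) * H0 + s * Hp)
    return e[1] - e[0]

ss = np.linspace(0.01, 0.99, 99)
predicted = np.sqrt(1 - 4 * (N - 1) / N * ss * (1 - ss))    # eq. (5.36)
numerical = np.array([gap(s) for s in ss])
print(np.max(np.abs(numerical - predicted)))   # agreement to machine precision
print(gap(0.5), 1 / np.sqrt(N))                # minimum gap 1/sqrt(N) at s = 1/2
```

Since H(s) acts nontrivially only in the two-dimensional span of |ψ_0⟩ and |w⟩, the full diagonalization reproduces the closed-form gap exactly.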

The matrix element in equation (5.9) can be recast using the variable transformation t = sT:

\[ \left\langle \frac{dH}{dt} \right\rangle_{1,0} = \left\langle \frac{ds}{dt}\,\frac{d\tilde H}{ds} \right\rangle_{1,0} = \frac{1}{T}\,\left\langle \frac{d\tilde H}{ds} \right\rangle_{1,0}, \tag{5.37} \]

with d\tilde{H}/ds = H_p − H_0. The eigenvalues of \tilde{H}(s) are plotted as a function of s in Fig. 5.11. Therefore, knowing that

\[ \left| \left\langle \frac{d\tilde H}{ds} \right\rangle_{1,0} \right| \leq 1, \tag{5.38} \]

the adiabatic condition in equation (5.8) gives

\[ T \geq \frac{N}{\epsilon}, \tag{5.39} \]

where ϵ ≪ 1. Hence the computation time is of order N, and this search offers no advantage over its classical counterpart. Let us now see how to improve on this adiabatic evolution method. Note that we applied equation (5.8) globally, i.e. to the entire time interval T: we impose a limit on the evolution rate during the whole computation, while the limit is only severe where the gap is small, i.e. near its minimum at s = 1/2. Put differently, the gap is very large at the beginning and at the end of the computation, and one does not need this limit during those intervals. Thus, by dividing T into infinitesimal time intervals dt and applying the adiabaticity condition locally to each of these intervals, we can vary the evolution rate continuously in time, thereby speeding up the computation. In other words, we no longer use a linear evolution function s(t), but adapt the

Fig. 5.9: Occupation probabilities of the eigenstates shown in Fig. 5.8 when T = 5.

evolution rate ds/dt to the local adiabaticity condition. The new condition is obtained by applying equation (5.8) to each infinitesimal time interval:

\[ \Delta E_{10}^2 \geq \left| \frac{ds}{dt} \right| \, \left| \left\langle 1, s \right| \frac{d\tilde H(s)}{ds} \left| 0, s \right\rangle \right|, \tag{5.40} \]

for all times t. Substituting equations (5.36) and (5.37) into equation (5.40) yields

\[ \frac{ds}{dt} = \Delta E_{10}^2 = 1 - 4\left(1 - \frac{1}{N}\right)s(1-s). \tag{5.41} \]

After integration, one obtains the expression

\[ t = \frac{N}{2\sqrt{N-1}} \left[ \arctan\!\big(\sqrt{N-1}\,(2s-1)\big) + \arctan\sqrt{N-1} \right]. \tag{5.42} \]

For large N, one obtains

\[ T \approx \frac{\pi}{2}\sqrt{N}. \tag{5.43} \]

Thus one obtains the quadratic speedup over the classical search that is at least expected from any quantum computation approach. Hence the approach can in a true sense be perceived as the adiabatic evolution version of Grover's algorithm.
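The closed form (5.42) and the scaling (5.43) can be checked numerically (a sketch; ΔE_10 is taken from equation (5.36)):

```python
import numpy as np
from scipy.integrate import quad

N = 10 ** 6

def gap2(s):                   # squared gap, eq. (5.36)
    return 1 - 4 * (1 - 1 / N) * s * (1 - s)

# total time of the local-adiabatic schedule: T = int_0^1 ds / (ds/dt), ds/dt = gap2
T_numeric, _ = quad(lambda s: 1 / gap2(s), 0, 1, points=[0.5], limit=200)

# closed form, eq. (5.42) evaluated at s = 1
T_closed = N / (2 * np.sqrt(N - 1)) * 2 * np.arctan(np.sqrt(N - 1))

print(T_numeric, T_closed, np.pi * np.sqrt(N) / 2)   # T ~ (pi/2) sqrt(N)
```

The integrand is sharply peaked at s = 1/2 (value N), which is exactly where the schedule slows down; the `points` hint tells the quadrature routine where to refine.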

5.5 Universality of AQC

We saw in the previous section that both the quantum gate model and adiabatic computation yield a complexity of the same order in the context of Grover's algorithm. This raises an important question: are the gate model and the adiabatic evolution model equivalent? Essentially, two models are called equivalent if they produce the same results up to a polynomial overhead. Put simply, for any computational problem, the run time of

Fig. 5.10: Occupation probabilities of the eigenstates shown in Fig. 5.8 when T = 2000.

one model is equivalent to the run time of the other, up to a factor that is at most polynomial in the size of the problem. Van Dam, Mosca and Vazirani [80] discuss the power of adiabatic quantum computation, present efficiency bounds for the model, and show that adiabatic computation can be efficiently simulated by the quantum gate model. A proof of this statement proceeds by approximating the adiabatic evolution by a discrete, finite sequence of Hamiltonians that produces the same output as the continuous process; these can in turn be approximated by a set of unitary operations. If an adiabatic computation is performed with complexity O(n^d), then one can always find a sequence of unitary operations which allows the computation to finish via a polynomial number of quantum gates in O(nT) time. This means that the computational power of the adiabatic model is no better than that of the gate model, which is the first half of the equivalence statement. The other direction of the proof, i.e. that a conventional quantum computation can be simulated by an adiabatic model with polynomial overhead, is not trivial and was presented by Aharonov et al. [81]. The proof, while robust and interesting, is much more complex and requires the QMA-complete k-local Hamiltonian formulation; going into detail is beyond the scope of this report. Below we present one part of the proof and encourage interested readers to check [82] for a detailed and complete treatment.

5.5.1 Gate model can efficiently simulate AQC

This particular treatment has been taken from [82].

Let us consider for simplicity a linear evolution path, i.e. an AQC Hamiltonian of the form H(t) = (1 − t/t_f)H_0 + (t/t_f)H_p. This time-dependent Hamiltonian generates the unitary evolution operator

Fig. 5.11: Energy eigenvalue spectrum of the interpolating Hamiltonian for n = 64. Taken from [79].

\[ U(t_f, 0) = \mathcal{T} \exp\left( -i \int_0^{t_f} H(t)\,dt \right). \tag{5.44} \]

Next, it is sufficient to show that the gate model can simulate U(t_f, 0). To this end we approximate the evolution by a product of unitaries generated by the piecewise constant Hamiltonians H'_m = H(m∆t):

\[ U(t_f, 0) \to U'(t_f, 0) = \prod_{m=1}^{M} U'_m = \prod_{m=1}^{M} \exp\left\{ -i\,\Delta t\, H'_m \right\}, \tag{5.45} \]

where ∆t = t_f/M. The error incurred by this approximation is

\[ \big\| U(t_f, 0) - U'(t_f, 0) \big\| \in O\!\left( \sqrt{t_f\,\mathrm{poly}(n)/M} \right). \tag{5.46} \]

Using the Baker-Campbell-Hausdorff formula, one can approximate each term in equation (5.45) by

\[ U'_m \to U''_m = \exp\left\{ -i\,\Delta t \left(1 - \frac{m\Delta t}{t_f}\right) H_0 \right\} \exp\left\{ -i\,\Delta t\, \frac{m\Delta t}{t_f}\, H_p \right\}, \tag{5.47} \]

which incurs an error due to the neglected leading-order commutator term [A, B]/2, i.e.

\[ \big\| U'_m - U''_m \big\| \in O\!\left( \frac{t_f^2}{M^2}\, \big\| [H_0, H_p] \big\| \right). \tag{5.48} \]

Hence, for the M terms in the product (5.45), the dominant error becomes

\[ \Big\| U(t_f, 0) - \prod_{m=1}^{M} U''_m \Big\| \in O\!\left( t_f^2\,\mathrm{poly}(n)/M \right). \tag{5.49} \]

Hence, we can approximate U(t_f, 0) by a product of 2M unitaries provided that M scales as t_f² poly(n). Lastly, depending on the form of the initial Hamiltonian H_0 and the problem Hamiltonian H_p, we may need to further decompose each factor in equation (5.47) into few-qubit unitaries. Thus U(t_f, 0) can be approximated as a product of unitary operators, each of which acts on a few qubits. The scaling of t_f required for adiabatic evolution is inherited by the number of few-qubit unitary operators in the corresponding gate model of the algorithm.
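The error behaviour of the product formula (5.45)-(5.49) can be illustrated on a single qubit (a sketch; the Hamiltonians and t_f below are arbitrary illustrative choices):

```python
import numpy as np
from scipy.linalg import expm

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
H0 = (np.eye(2) - sx) / 2       # initial Hamiltonian, ground state |+>
Hp = (np.eye(2) - sz) / 2       # problem Hamiltonian, ground state |0>
tf = 5.0

def H(t):
    return (1 - t / tf) * H0 + (t / tf) * Hp

def exact_U(steps=4000):        # reference propagator via fine midpoint steps
    dt = tf / steps
    U = np.eye(2, dtype=complex)
    for m in range(steps):
        U = expm(-1j * H((m + 0.5) * dt) * dt) @ U
    return U

def trotter_U(M):               # split product of eq. (5.47)
    dt = tf / M
    U = np.eye(2, dtype=complex)
    for m in range(1, M + 1):
        s = m * dt / tf
        U = expm(-1j * dt * s * Hp) @ expm(-1j * dt * (1 - s) * H0) @ U
    return U

U_ref = exact_U()
errs = [np.linalg.norm(U_ref - trotter_U(M), 2) for M in (50, 200)]
print(errs)                     # operator-norm error shrinks as M grows
```

As equation (5.49) predicts, the operator-norm error of the split product decreases roughly as 1/M at fixed t_f.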

5.6 Summary

In section 5.2 we focused on the adiabatic aspects of quantum computation, and in the subsequent section we saw their potential for solving optimization problems via a simple 2-SAT example. Although adiabatic quantum computation seems different from sequential (circuit-based) quantum computing, the same order of complexity was obtained in section 5.4 in the context of Grover's algorithm, and the arguments put forward in section 5.5 lead to the conclusion that the two models are polynomially equivalent. AQC operates in the (near-)instantaneous ground state of a time-dependent Hamiltonian and is therefore expected to be resilient against relaxation and open-system effects involving the excited states, which makes it a promising candidate for robust computation.

We did not discuss the performance aspects of the simulations we presented. We tested simple problems to demonstrate the correctness of the adiabatic approach, so, as expected, the running times of the classically simulated instances are small and not sufficient for a comparison. Regardless, the idea that a problem can be reduced to an optimization problem and solved within the energy paradigm is very promising. The abstraction presented in this report is only one level above the actual D-Wave machine, a quantum computer that works on the principle of quantum annealing (which finds its roots in AQC). The D-Wave machine is a topic for another report, and we cannot comment on its performance since we do not have access to one; nevertheless, at present it is finding applications in airline scheduling, electronic-structure modeling, preventative healthcare and more. To finish this report, we briefly discuss NP-complete problems as an outlook and mention some studies done in this regard in the context of AQC.

5.6.1 NP-Complete problems

A shared belief among quantum scientists is that quantum computers can solve problems that are considered 'hard' for classical computers. A 'hard' problem in this context is one that takes too long to solve as the input becomes large; more precisely, a classically hard problem is one that cannot be solved by any classical algorithm whose running time grows only polynomially with the length of the input. The question is whether quantum computers can solve these classically difficult problems faster than classical computers. According to computational complexity theory, such problems fall into two large groups: those for which the time required to find a solution grows polynomially with the size of the problem, the class P, and those for which a solution can be verified in polynomial time, the class NP (nondeterministic polynomial). An important subset of NP are the NP-complete (NPC) problems, which have the property that any NP problem can be mapped to an NPC problem in polynomial time. As a consequence, if one could find a polynomial-time algorithm for just one NPC problem, that algorithm could be employed as a subroutine to solve all NP problems in polynomial time. Hundreds of NPC problems are known, but the one we direct our attention to is 3-SAT, as we already encountered the closely related 2-SAT in section 5.3. The only difference is that each clause now contains three variables rather than two.

It is worthwhile to briefly mention studies that tackle 3-SAT via adiabatic quantum computing. Van Dam et al. [80] do not rule out a polynomial-time solution for 3-SAT. In their paper they also work out an Exact Cover problem, another NPC problem, and find that the quantum algorithm succeeds in a time that appears to grow only quadratically with the length of the input.
T. Hogg [83] demonstrated polynomial complexity for 3-SAT up to size n = 30, and Farhi et al. [76] also obtained polynomial complexity for small 3-SAT instances. Note, however, that these studies arrived at such complexity with small problem sizes, which in our view is not enough to support any complexity claim. Studies of larger samples by T. Neuhaus et al. [84], with n = 100 and 80 variables, resulted in exponential complexity. Similarly, M. Žnidarič et al. [85] suggest that for large problem sizes the gap decreases exponentially and the runtime increases exponentially. Many more studies may be done in this regard, and viewpoints on the complexity will differ; nevertheless, much would be gained if, for starters, we attained better results than classical computation and from there aimed at increasing performance. In the course of tackling this problem we would also learn at which instances a problem becomes hard. Approaching it in this manner could help solve a great number of present-day problems (even in areas such as cancer research) and would also lay the foundation for applications yet unknown.

App. 5.A Adiabatic theorem calculation

We wish to solve the following

$i\hbar\,\partial_t\,|\Psi(t)\rangle = H(t)\,|\Psi(t)\rangle$.  (5.50)

We will use a systematic approach by considering a family of instantaneous eigenstates:

$H(t)\,|\psi_n(t)\rangle = E_n(t)\,|\psi_n(t)\rangle$,  (5.51)

with $E_1(t) < E_2(t) < \dots$, so that there is no degeneracy. We expand the state in this basis:

$|\Psi(t)\rangle = \sum_n c_n(t)\,|\psi_n(t)\rangle$.  (5.52)

Inserting this into the Schrödinger equation gives

$i\hbar \sum_n \left( \dot c_n\,|\psi_n(t)\rangle + c_n\,|\dot\psi_n(t)\rangle \right) = \sum_n c_n(t)\,E_n(t)\,|\psi_n(t)\rangle$.  (5.53)

Acting with $\langle\psi_k(t)|$ from the left:

$i\hbar\,\dot c_k = E_k c_k - i\hbar \sum_n \langle\psi_k|\dot\psi_n\rangle\,c_n$,  (5.54)

$i\hbar\,\dot c_k = \left( E_k - i\hbar\,\langle\psi_k|\dot\psi_k\rangle \right) c_k - i\hbar \sum_{n\neq k} \langle\psi_k|\dot\psi_n\rangle\,c_n$.  (5.55)

We relate $\langle\psi_k|\dot\psi_n\rangle$ to a matrix element of $\dot H(t)$ in the space of instantaneous eigenvectors by taking the time derivative of equation (5.51):

$\dot H(t)\,|\psi_n\rangle + H(t)\,|\dot\psi_n\rangle = \dot E_n(t)\,|\psi_n\rangle + E_n(t)\,|\dot\psi_n\rangle$.  (5.56)

Multiplying by $\langle\psi_k(t)|$ with $k \neq n$:

$\langle\psi_k(t)|\dot H(t)|\psi_n\rangle + E_k(t)\,\langle\psi_k(t)|\dot\psi_n\rangle = E_n(t)\,\langle\psi_k(t)|\dot\psi_n\rangle$,  (5.57)

hence

$\langle\psi_k(t)|\dot\psi_n\rangle = \dfrac{\langle\psi_k(t)|\dot H(t)|\psi_n\rangle}{E_n(t) - E_k(t)} = \dfrac{\dot H_{kn}}{E_n - E_k}$.  (5.58)

We can plug equation (5.58) into equation (5.55) to obtain

$i\hbar\,\dot c_k = \left( E_k - i\hbar\,\langle\psi_k|\dot\psi_k\rangle \right) c_k - i\hbar \sum_{n\neq k} \dfrac{\dot H_{kn}}{E_n - E_k}\,c_n$.  (5.59)

If we ignore the extra term (the sum over $n \neq k$), the equation decouples and can be integrated directly:

$c_k(t) = c_k(0)\,\exp\!\left[ \dfrac{1}{i\hbar} \int_0^t E_k(t')\,dt' \right] \exp\!\left[ -\int_0^t \langle\psi_k|\dot\psi_k\rangle\,dt' \right]$,  (5.60)
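The adiabatic approximation underlying Eq. (5.60) can be checked numerically. The sketch below (our own illustration, not part of the report's simulation code; the Landau-Zener Hamiltonian $H(t) = vt\,Z + \Delta X$ and all function names are our choices) integrates the Schrödinger equation and reports the final population of the instantaneous ground state:

```python
import numpy as np

# Two-level Landau-Zener sweep with hbar = 1: H(t) = v*t*Z + delta*X.
# For a slow sweep (small v) the adiabatic theorem predicts that the system
# tracks the instantaneous ground state; for a fast sweep diabatic
# transitions to the excited state appear.
Z = np.diag([1.0, -1.0])
X = np.array([[0.0, 1.0], [1.0, 0.0]])

def ground_population(v, delta=1.0, t_max=20.0, steps=20000):
    ts = np.linspace(-t_max, t_max, steps, endpoint=False)
    dt = ts[1] - ts[0]
    H = lambda t: v * t * Z + delta * X
    # start in the instantaneous ground state at t = -t_max
    psi = np.linalg.eigh(H(ts[0]))[1][:, 0].astype(complex)
    for t in ts:
        # exact unitary step with H frozen at the interval midpoint
        w, V = np.linalg.eigh(H(t + dt / 2))
        psi = V @ (np.exp(-1j * w * dt) * (V.conj().T @ psi))
    g = np.linalg.eigh(H(t_max))[1][:, 0]
    return abs(g.conj() @ psi) ** 2

print(ground_population(v=0.05))  # slow sweep: stays near 1
print(ground_population(v=5.0))   # fast sweep: noticeably below 1
```

With the chosen parameters the minimal gap is $2\Delta = 2$, so $v = 0.05$ is deep in the adiabatic regime while $v = 5$ is not.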

which can be written as

$c_k(t) = c_k(0)\,\exp\{ i\theta_k(t) \}\,\exp\{ i\gamma_k(t) \}$,  (5.61)

where

$\theta_k(t) = -\dfrac{1}{\hbar} \int_0^t E_k(t')\,dt'$  (5.62)

is the dynamical phase and

$\gamma_k(t) = i \int_0^t \langle\psi_k(t')|\dot\psi_k(t')\rangle\,dt'$  (5.63)

is the geometric (Berry) phase.

App. 5.B Codes

One can find the code for the simulations at the following link: https://drive.google.com/drive/folders/183MGXqwi7epAGziIiLOqijl7rQPgl-A7?usp=sharing

Chapter 6

Topological Quantum Error Correction: Magic State Distillation

Dmitrii Dobrynin Supervisor: Lorenzo Cardarelli

In this chapter we present a brief review of one of the most prominent approaches towards fault-tolerant universal quantum computation. In particular, we discuss computation with Clifford group gates, which can be effectively implemented fault-tolerantly, augmented by the injection of so-called Magic States, which are reliably prepared via distillation algorithms. On top of a detailed discussion of the algorithm, we estimate the space costs of the magic distillation factories. The objective of this chapter is to provide the reader with a concise but clear introduction to profound concepts of both resilient and universal quantum computation. To fulfil this goal, we touch on the subject of conventional error-correcting encodings and their topological counterparts, which are very promising for practical realisations. Additionally, we show why contextuality, the generalisation of non-local reality, arises in the context of Magic States and how it might be crucial for the speed-up of quantum computers.

6.1 Introduction

The intrinsic fragility of quantum computers (QC) is a major obstacle on the road towards useful "everyday" quantum computation. More than two decades ago, the pioneering works by Peter Shor [86, 87] seemed to augur a fertile future for QC. However, physicists still face enormous challenges in performing large-scale, fast, and reliable problem-solving using all types of quantum systems. Any computing device is subject to errors. But whereas in classical computing systems we deal with logical gates that are virtually perfect ($p_{\text{error}}$ can reach $10^{-18}$), in QC such tolerances are, to put it lightly, not accessible ($p_{\text{error}} \approx 10^{-3}$). This is why quantum error correction (QEC) comes to the fore. Superficially, QEC tries to mimic classical methods by creating redundancy of information (encoding) and applying a specific kind of majority voting. But the no-cloning theorem demonstrates that there is more to the subject than meets the eye: there is no unitary $\hat U$ that can clone an arbitrary state without disturbing it, i.e. no $\hat U$ such that $\hat U|\alpha, 0\rangle = |\alpha, \alpha\rangle$ for every $|\alpha\rangle$. This and many other hurdles impede the realisation of a QC.

Firstly, is it possible to detect errors by measurements without loss of information due to projections? Secondly, can all possible gates acting on encoded qubits tolerate the accumulation of errors? Thirdly, how are encodings realised in practice, and to what extent do they affect the size of quantum circuits, given that resources are usually limited? The list of problems goes on. One of them is of the greatest importance to this chapter: is there a way to fault-tolerantly implement an arbitrary unitary operation, and what is the cost of such an endeavour? The answers to the first three questions are given in Sec. 6.2. One solution to the last problem is represented by Magic States, discussed in Sec. 6.3. As implied above, the engineering of a universal QC is a challenge on its own. It can be proven that particular types of QC, even those employing purely quantum effects like superposition and entanglement, can be efficiently simulated (polynomially in resources) by classical computers. For this reason, in the last part of the chapter we argue that contextuality is the fundamental physical feature of the quantum world possibly essential for quantum advantage.

6.2 Theory of Quantum Error Correction

6.2.1 Basic Codes

Let us start by introducing some of the most iconic error-correcting codes.

1. 3-qubit repetition or bit-flip code

$|\psi\rangle = a|0\rangle + b|1\rangle \;\rightarrow\; |\psi_{\text{bf}}\rangle = a|000\rangle + b|111\rangle$.  (6.1)

The idea of error correction in this case is rather simple. If a physical qubit suffers a bit-flip,

$|\tilde\psi\rangle = X_1 \otimes I \otimes I\,|\psi_{\text{bf}}\rangle = a|100\rangle + b|011\rangle$,

then one has to measure the $Z_1 Z_2$ and $Z_2 Z_3$ operators, which conserve the state and at the same time indicate the location of the bit-flip, with the measurement results given in Table 6.1. In quantum information these indicators are called error syndromes.

2. 9-qubit Shor code

$|\psi_{\text{Shor}}\rangle = a\,\dfrac{(|000\rangle + |111\rangle)(|000\rangle + |111\rangle)(|000\rangle + |111\rangle)}{2\sqrt 2} + b\,\dfrac{(|000\rangle - |111\rangle)(|000\rangle - |111\rangle)(|000\rangle - |111\rangle)}{2\sqrt 2} \equiv a|0\rangle_L + b|1\rangle_L$,  (6.2)

where $|0\rangle_L$ and $|1\rangle_L$ are called logical basis states. One can check that the Shor code allows for detection and thus correction of $X$, $Z$ and even $XZ$ errors. Recalling that an arbitrary rotation can be written as

$R_X(\alpha)\,R_Z(\beta)\,R_X(\gamma) = c_1 I + c_2 X + c_3 Z + c_4 XZ$,

where the complex numbers $c_1, \dots, c_4$ depend on the angles $\alpha, \beta, \gamma$, we notice that it now becomes possible to correct arbitrary errors. The measurements project the state onto only one term in the sum, giving the corresponding syndromes. As a result, by considering only a discrete set of Pauli errors, we render our state protected from any possible erroneous effect!

Z1Z2   Z2Z3   Error type      Action
 +1     +1    no error        no action
 +1     -1    bit 3 flipped   flip bit 3
 -1     +1    bit 1 flipped   flip bit 1
 -1     -1    bit 2 flipped   flip bit 2

Table 6.1: Syndrome measurement results for the 3-qubit repetition (bit-flip) code. The outcome "+1, +1" corresponding to "no error" stems from the fact that the codespace is defined to be the +1 eigenspace of the stabilizers $S_1 = Z_1 Z_2$, $S_2 = Z_2 Z_3$: $S_i|\psi\rangle = |\psi\rangle$. Table from M. A. Nielsen and I. L. Chuang [3].

3. 5-qubit code

The 9-qubit Shor code is not the smallest code, in the number of physical qubits used, that allows this. In fact, one can show that the smallest number of physical qubits required for the detection of arbitrary single-qubit errors is 5. While being small in the number of physical qubits, this code has a rather complex logical structure:

$|\psi_{5q}\rangle = \dfrac{a}{4}\big( |00000\rangle + |10010\rangle + |01001\rangle + |10100\rangle + |01010\rangle - |11011\rangle - |00110\rangle - |11000\rangle - |11101\rangle - |00011\rangle - |11110\rangle - |01111\rangle - |10001\rangle - |01100\rangle - |10111\rangle + |00101\rangle \big) + \dfrac{b}{4}\big( |11111\rangle + |01101\rangle + |10110\rangle + |01011\rangle + \dots \big) \equiv a|0\rangle_L + b|1\rangle_L$.  (6.3)

The 5-qubit code is very important for the further discussion, since it is used by the magic state distillation algorithm introduced in the next section.

Fig. 6.1: The general idea behind error-correcting codes: by choosing a codespace, one can detect errors by identifying in which of the resulting orthogonal spaces the state is. Every encoding of choice can protect against a specific set of errors. Figure from M. A. Nielsen and I. L. Chuang [3].
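The syndrome logic of Table 6.1 can be reproduced with a few lines of state-vector simulation. The sketch below is our own illustration (not code from the report); it builds the stabilizers as explicit matrices and reads off the measurement outcomes as expectation values:

```python
import numpy as np
from functools import reduce

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.diag([1.0, -1.0])

def kron(*ops):
    return reduce(np.kron, ops)

# Encoded state a|000> + b|111>
a, b = 0.6, 0.8
psi = a * np.eye(8)[0] + b * np.eye(8)[7]

S1, S2 = kron(Z, Z, I2), kron(I2, Z, Z)  # stabilizers Z1Z2 and Z2Z3

def syndrome(state):
    # S1, S2 have eigenvalues +-1 and the (possibly corrupted) state is an
    # eigenstate of both, so the expectation value equals the outcome.
    return (int(round(state @ S1 @ state)), int(round(state @ S2 @ state)))

errors = {"no error": kron(I2, I2, I2), "bit 1": kron(X, I2, I2),
          "bit 2": kron(I2, X, I2), "bit 3": kron(I2, I2, X)}
for name, E in errors.items():
    print(name, syndrome(E @ psi))   # reproduces the rows of Table 6.1
```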

6.2.2 Stabilizer Formalism

The operators that we have used for the detection of errors are called stabilizers. Stabilizers constitute a group and define a codespace $V(S)$ given by the set of eigenstates with the +1 eigenvalue. In turn, in order to fully define a stabilizer group, one chooses an independent set of generators $S_i$, shown for the 5-qubit and 9-qubit Shor codes in Table 6.2. Having $n$ physical qubits at our disposal, we define one logical qubit subspace using $n - 1$ independent stabilizers. One can define the distance $d$ of a code as the minimum number of non-trivial (non-identity) Pauli matrices constituting its logical operators. A quantum error-correcting code with a distance $d \geq 2t + 1$ can correct arbitrary errors acting on $t$ physical qubits. One can

5-qubit code              Shor 9-qubit code
Operators  Construction   Operators  Construction
S1         XZZXI          S1         ZZIIIIIII
S2         IXZZX          S2         IZZIIIIII
S3         XIXZZ          ...        ...
S4         ZXIXZ          S6         IIIIIIIZZ
ZL         ZZZZZ          S7         XXXXXXIII
XL         XXXXX          S8         IIIXXXXXX
                          ZL         XXXXXXXXX
                          XL         ZZZZZZZZZ

Table 6.2: Stabilizers and logical operators of the 5-qubit code and the Shor 9-qubit code. Both codes can correct any single-qubit error (arbitrary rotation). Comparing with Eqs. (6.2) and (6.3), one can see that stabilizers are much more convenient for defining the codespace.

check that the 5-qubit and Shor 9-qubit codes have the same distance $d = 3$, which allows for the correction of errors acting on a maximum of $t = 1$ qubits. There are many convenient features of the stabilizer approach which make it so appealing. For one, if we have $|\psi\rangle$ stabilized by $S$, i.e. $S|\psi\rangle = |\psi\rangle$, then for any unitary operation $U$ we have

$U S U^\dagger U |\psi\rangle = U S |\psi\rangle = U |\psi\rangle$.  (6.4)

This means that $U S U^\dagger$ stabilizes $U|\psi\rangle$ and defines a new, rotated codespace that can also be used for (rotated) error correction. For example, if $U = H^{\otimes 3}$, the bit-flip code transforms as

$a|000\rangle + b|111\rangle \;\rightarrow\; a|{+}{+}{+}\rangle + b|{-}{-}{-}\rangle$,  (6.5)

protecting against single-qubit phase-flip ($Z$) errors. In every stabilizer architecture one has to perform measurements of different operators from the Pauli group. Fig. 6.2 shows a circuit which performs a measurement of any such operator via measurements of only $Z$, making it an essential building block even in the advanced cases (see Sec. 6.2.4).
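Equations (6.4) and (6.5) are easy to verify numerically. The sketch below (our own, not from the report) conjugates the bit-flip stabilizers with $H^{\otimes 3}$ and checks that the rotated operators stabilize the rotated state:

```python
import numpy as np
from functools import reduce

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.diag([1.0, -1.0])
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)

def kron(*ops):
    return reduce(np.kron, ops)

psi = 0.6 * np.eye(8)[0] + 0.8 * np.eye(8)[7]   # a|000> + b|111>
U = kron(H, H, H)

for S in (kron(Z, Z, I2), kron(I2, Z, Z)):
    S_rot = U @ S @ U.T          # H is real, so U^dagger = U^T
    # U S U^dagger stabilizes U|psi>, Eq. (6.4)
    assert np.allclose(S_rot @ (U @ psi), U @ psi)

# Since H Z H = X, the rotated code is stabilized by X1X2 and X2X3,
# i.e. it protects the phase-flip-encoded state of Eq. (6.5).
assert np.allclose(U @ kron(Z, Z, I2) @ U.T, kron(X, X, I2))
print("rotated stabilizers verified")
```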

Fig. 6.2: Quantum circuit implementing a measurement of any operator $S$ with only $\pm 1$ eigenvalues, using an ancilla prepared in $|0\rangle$ and two Hadamard gates.

So far, for the logical operators, the stabilizer measurements and the errors, we have only considered operators from the $n$-qubit Pauli group $G_n$. However, many operators $U$ (e.g. Toffoli) take $g \in G_n$ out of the Pauli group under conjugation: $U g U^\dagger \notin G_n$. In this case the stabilizer approach ceases to work, because the convenient assumption that all operators have $\pm 1$ eigenvalues becomes invalid. To stay within $G_n$ one has to use only the normalizer operators, defined as follows:

$N(G_n): \forall g \in G_n,\; U g U^\dagger \in G_n$.  (6.6)

These specific rotations form the Clifford group, which has only 3 generators (applied as tensor products on $n$ physical qubits):

$\hat H = \dfrac{1}{\sqrt 2}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \qquad \hat S = \begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix}, \qquad \text{CNOT} = \begin{pmatrix} I & 0 \\ 0 & X \end{pmatrix}$.  (6.7)

It turns out that any normalizer operation can be composed of only $O(p^2)$ (i.e. a polynomial) number of operators from the set of Eq. (6.7). This crucial fact is the cornerstone of the proof of the famous Gottesman-Knill theorem [88].

Gottesman-Knill theorem: A quantum computer can be efficiently simulated classically if:

1. only operations from the normalizer of $G_n$ are used;
2. measurements of only Pauli operators are performed;
3. the input states $\rho$ of the circuit belong to the octahedron inside the Bloch sphere given by $|\rho_x| + |\rho_y| + |\rho_z| \leq 1$ (see Fig. 6.3).

As a consequence, circuits satisfying the Gottesman-Knill theorem cannot perform universal computation, since it can be proven that approximating an arbitrary $U$ acting on $N$ qubits to accuracy $\epsilon$ requires a quantum circuit scaling as $O(4^N \log^4(1/\epsilon))$ in the number of gates, rather than polynomially [3].

Fig. 6.3: The octahedron of input states of quantum computation satisfying the Gottesman-Knill theorem. Taking states outside of it, one can emulate gates required to perform universal computation (see state injection in Sec. 6.2.3).

6.2.3 Fault-tolerance and Universality

Having in our possession a reliable stabilizer architecture, we are one step away from making it fault-tolerant. By definition, a fault-tolerant quantum circuit is one that does not accumulate errors. The threshold theorem states that it is possible to perform computation with arbitrarily small errors by concatenation of error-correcting codes. But it is of little interest to have promising fault-tolerant architectures unless they allow us to perform universal computation, which so far they do not. Of course, one can augment the set of Clifford operators (Eq. (6.7)) by any non-Clifford gate and its inverse to promote the set of gates to a universal one. However, performing such a trick in a fault-tolerant fashion is an issue too. One smart method of achieving this is via state injection. Let us now have a look at how, using state injection, one can implement the non-Clifford $\pi/8$ gate

$\Lambda_{\pi/8} = \dfrac{e^{i\pi/8}}{\sqrt 2} \begin{pmatrix} e^{-i\pi/8} & 0 \\ 0 & e^{i\pi/8} \end{pmatrix}$.  (6.8)

The injected state in the circuit in Fig. 6.4b is given by

$|\Theta\rangle = \dfrac{1}{\sqrt 2}\left( |0\rangle + e^{i\pi/4} |1\rangle \right)$.  (6.9)


Fig. 6.4: a) The transversal implementation of the logical CNOT gate in the 7-qubit (Steane) code. Usually only Clifford group gates support transversal realisations, making them fault-tolerant. One erroneous physical CNOT (outlined by red lines) causes only one error per logical block. b) The state injection circuit used for the emulation of non-Clifford group gates. One prepares a special ancilla $|\Theta\rangle$ with the required accuracy below the threshold and injects it into the circuit, which operates using only Clifford group elements.

Applying CNOT to $|\Theta\rangle|\psi\rangle$, where $|\psi\rangle = a|0\rangle + b|1\rangle$, we arrive at

$\text{CNOT}\,|\Theta\rangle|\psi\rangle = \dfrac{1}{\sqrt 2}\left( |0\rangle|\psi\rangle + e^{i\pi/4}\,|1\rangle X|\psi\rangle \right) = \dfrac{1}{\sqrt 2}\left[ \left( a|0\rangle + e^{i\pi/4} b|1\rangle \right)|0\rangle + \left( b|0\rangle + e^{i\pi/4} a|1\rangle \right)|1\rangle \right]$.  (6.10)

Measuring $Z$ on the second qubit and applying the $SX$ operation to the first qubit whenever the outcome is $|1\rangle$, we get for the first qubit at the output:

$\dfrac{1}{\sqrt 2}\left( a|0\rangle + e^{i\pi/4} b|1\rangle \right) = \dfrac{e^{i\pi/8}}{\sqrt 2}\left( e^{-i\pi/8} a|0\rangle + e^{i\pi/8} b|1\rangle \right) = \Lambda_{\pi/8}\,|\psi\rangle$.  (6.11)
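The algebra of Eqs. (6.10)-(6.11) can be checked directly by simulating the injection circuit. The following is a sketch of ours (we assume the convention that $SX$ means "apply $X$, then $S$"):

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
S = np.diag([1.0, 1j])
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0],
                 [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex)

a, b = 0.6, 0.8
psi = np.array([a, b], dtype=complex)
theta = np.array([1, np.exp(1j * np.pi / 4)]) / np.sqrt(2)   # Eq. (6.9)

state = CNOT @ np.kron(theta, psi)       # control |Theta>, target |psi>
# keep the branch where the second qubit is measured as |1>,
# renormalize, and apply the SX correction to the first qubit
first = state.reshape(2, 2)[:, 1]
first = (S @ X) @ (first / np.linalg.norm(first))

# expected result: the pi/8 gate applied to |psi>, up to a global phase
target = np.diag([np.exp(-1j * np.pi / 8), np.exp(1j * np.pi / 8)]) @ psi
phase = first[0] / target[0]
assert np.isclose(abs(phase), 1.0)
assert np.allclose(first, phase * target)
print("injection emulates the pi/8 gate")
```

The $|0\rangle$ measurement branch needs no correction, since the first qubit is then already $a|0\rangle + e^{i\pi/4}b|1\rangle$.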

Remarkably, the state injection circuit itself is built within the stabilizer formalism, at first glance easily making it possible to implement non-Clifford gates fault-tolerantly. However, it still has the significant drawback of requiring the injected state $|\Theta\rangle$ to be ideal: if it is faulty, the injection circuit will emulate gates with corresponding errors. Fortunately, there exist distillation algorithms which allow for reliable preparation of a certain set of states for injection; their discussion is the main goal of this chapter and will be held in the next section. Before we touch on the subject of distillation and the costs of reaching universality, we briefly review one of the most prominent stabilizer architectures today — topological codes. The combination of topological concepts with state injection nowadays forms one of the most promising approaches to universal quantum computation.

6.2.4 Topological Error Correction: Surface Code

There exist two major topological codes: the toric codes and the surface codes, the main difference of the latter from the former being the presence of a boundary. The planar layout, suitable for conventional manufacturing techniques, together with the extraordinarily high error tolerance of surface codes is what makes them so unique and useful. The working principle of the surface code is as follows (see Fig. 6.5). All of the $ZZZZ$ and $XXXX$ operators are commuting independent stabilizers, measurements of which are performed using special black measurement qubits. The information is encoded in white data qubits. The stabilizers are measured (using circuits like that in Fig. 6.2) from the very start, and the resulting state is called the quiescent state. In this particular version of the surface code, the entire structure represents one logical qubit. There are 41 physical qubits and 40 independent stabilizers, leaving only $41 \cdot 2 - 40 \cdot 2 = 2$ degrees of freedom (cf. the bit-flip code with $3 \cdot 2 - 2 \cdot 2 = 2$). The logical $\hat X_L = \hat X_1 \hat X_2 \hat X_3 \hat X_4 \hat X_5$ commutes with the stabilizers but creates a quiescent state $|\Psi_X\rangle = \hat X_L |\Psi\rangle$ different from the initial one, manipulating one degree of freedom. The second introduced operator $\hat Z_L$ anticommutes with $\hat X_L$ and gives $|\Psi_Z\rangle = \hat Z_L |\Psi\rangle$ without changing the stabilizer measurements. Next, if data qubits experience bit-flip errors, the corresponding neighbouring $Z$ stabilizers will indicate this with $-1$ measurement results. This event is tracked using classical computers working in unison with the quantum surface code. The size of a surface code required for reliable computation mainly depends on the error rates of the gates. If these rates are below a certain threshold defined by the surface code implementation, then with increasing distance one gets better logical error rates (see Fig. 6.6).
If the availability of physical elements in a circuit is not an issue, then with the relatively high tolerances of $p \approx 10^{0}$–$10^{-1}\,\%$ the employment of surface codes is a reliable way of designing scalable fault-tolerant quantum computers.

Fig. 6.5: A surface code example with distance $d = 5$. The entire structure shown represents one logical qubit. The white circles denote physical data qubits; the black circles are physical qubits used in stabilizer measurements. Each operator of the form $\hat X\hat X\hat X\hat X$ or $\hat Z\hat Z\hat Z\hat Z$ is an independent stabilizer. $\hat X_L = \hat X_1\hat X_2\hat X_3\hat X_4\hat X_5$ and $\hat Z_L = \hat Z_6\hat Z_7\hat Z_3\hat Z_8\hat Z_9$ are logical operators commuting with the stabilizers and anticommuting with each other: $\{\hat Z_L, \hat X_L\} = 0$. Figure from A. G. Fowler et al. [89].

Fig. 6.6: Logical $X_L$ error rates for surface codes with different distances $d$. For per-step error rates $p$ below $p_{\text{th}} \approx 0.57\%$, logical errors decrease with increasing code distance. The curve $d = 5$ corresponds to the code from Fig. 6.5. Figure from A. G. Fowler et al. [89].

6.3 Magic State Distillation


Fig. 6.7: a) The magic state distillation algorithm takes many noisy states at the input and processes them, giving at the output fewer states after each protocol round but with better fidelity to the pure magic states. b) Illustration of the general approach to universal fault-tolerant quantum computation using magic states. The distillation is realised in the magic factories, whose goal is to supply enough magic states to keep up with the quantum computation.

6.3.1 Magic States

The ancillas used in the state injection circuit belong to the very useful set of states called the magic states (see Fig. 6.8a):

$|T\rangle\langle T| = \dfrac{1}{2}\left[ I + \dfrac{1}{\sqrt 3}\left( \pm X \pm Y \pm Z \right) \right]$,  (6.12)

$|H\rangle\langle H| = \dfrac{1}{2}\left[ I + \dfrac{1}{\sqrt 2}\left( \pm X \pm Y \right) \right]$.  (6.13)

Besides the fact that they allow for emulation of non-Clifford group gates, they also possess a special property: they can be reliably prepared using the magic state distillation (MSD) algorithm [88]. An outline of the distillation is shown in Fig. 6.7a. The general idea is to take many noisy states described by $\rho_{\text{in}}$ and purify them, obtaining fewer states at the output with near-perfect fidelity of $\rho_{\text{out}}$ to the required $|T\rangle$ or $|H\rangle$ states. This approach is iterative, meaning that one can apply it as many times as necessary, tailoring the number of repetitions to the required output accuracy. The more rounds one uses, the more costly the magic state distillation becomes. We give cost estimates right after the detailed discussion of the algorithm steps. As we now have a high-threshold fault-tolerant stabilizer architecture (the surface code) with the ability to promote it to a universal computation system (magic state injection), we are ready to discuss the general structure of our QC (see Fig. 6.7b). The magic states are prepared in special computational units — magic factories. These factories may have different physical realisations, i.e. using various physical gates and/or error-correcting codes. Moreover, one might use factories with more protocol rounds for one type of gate emulation and fewer rounds for others, optimizing state preparation speed to the threshold requirements. The emulation of non-Clifford group gates via injection circuits needs at least one magic state supplied per gate, so one would usually need many factories in order not to leave the quantum computation parts of the circuit idling.


Fig. 6.8: a) The octahedron within the Bloch sphere defined in the Gottesman-Knill theorem, $|\rho_x| + |\rho_y| + |\rho_z| \leq 1$, and examples of the $|T\rangle$ and $|H\rangle$ magic states. Both magic states lie outside of the octahedron and on its rotational symmetry axes. b) The projection of the octahedron onto the $z$–$m$ plane, with all $|T\rangle$ and $|H\rangle$ magic states in that plane. The colored areas represent all the noisy input states of the distillation which can be successfully distilled.
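One can verify directly that the magic states of Eqs. (6.12)-(6.13) violate the octahedron condition of the Gottesman-Knill theorem. A small check of ours (taking the all-plus-sign representatives):

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0 + 0j, -1.0])
I2 = np.eye(2)

rho_T = (I2 + (X + Y + Z) / np.sqrt(3)) / 2   # Eq. (6.12), all + signs
rho_H = (I2 + (X + Y) / np.sqrt(2)) / 2       # Eq. (6.13), all + signs

def bloch_l1(rho):
    # |rho_x| + |rho_y| + |rho_z| with rho_i = Tr(rho sigma_i)
    return sum(abs(np.trace(rho @ P).real) for P in (X, Y, Z))

print(bloch_l1(rho_T))   # sqrt(3) ~ 1.73 > 1: outside the octahedron
print(bloch_l1(rho_H))   # sqrt(2) ~ 1.41 > 1: outside the octahedron
```

Both states are pure (on the Bloch sphere) and lie outside the octahedron, hence they escape the Gottesman-Knill classical-simulability conditions.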

6.3.2 Distillation Algorithm

In this section the distillation of the $|T\rangle$ magic state is introduced. The distillation of $|H\rangle$ states follows an overall similar procedure but with bulkier mathematical operations, and is omitted in the following. The algorithm proceeds according to the following steps [88]:

1. Since there is an entire family of $|T\rangle$ magic states, we restrict ourselves to considering only the one lying in the positive octant, $\rho: \rho_x > 0, \rho_y > 0, \rho_z > 0$, denoted by $|T^+\rangle$. To begin, we have to make sure that the noisy input states satisfy

$F_T(\rho_{\text{in}}) = \langle T^+|\rho_{\text{in}}|T^+\rangle > F_T^{\text{th}} = 0.823 \qquad \left( F_H^{\text{th}} = 0.860 \right)$.  (6.14)

These threshold values of the fidelity distinguish the states that are distillable from those that are not (see Fig. 6.8b). One should keep in mind that the states inside the octahedron are fundamentally useless for MSD due to the Gottesman-Knill theorem.

2. The $|T^+\rangle$ and $|T^-\rangle$ states are eigenstates of the operator

$\Lambda_T = \dfrac{e^{i\pi/4}}{\sqrt 2} \begin{pmatrix} 1 & 1 \\ i & -i \end{pmatrix}$,

which performs a $2\pi/3$ rotation about the corresponding axis. Keeping this in mind, we apply a dephasing transformation given by

$D_T(\rho_{\text{in}}) = \dfrac{1}{3}\left( \rho_{\text{in}} + \Lambda_T \rho_{\text{in}} \Lambda_T^\dagger + \Lambda_T^\dagger \rho_{\text{in}} \Lambda_T \right) = F_T\,|T^+\rangle\langle T^+| + (1 - F_T)\,|T^-\rangle\langle T^-| = \rho$.  (6.15)

It effectively maps our input state onto the direction of the magic state (see Fig. 6.9).

3. In the following we will be projecting our dephased input onto the 5-qubit codespace, and for that we take the tensor product of five input states $\rho$:

$\rho^{\otimes 5} = \left[ F_T\,|T^+\rangle\langle T^+| + (1 - F_T)\,|T^-\rangle\langle T^-| \right]^{\otimes 5} = F_T^5\,|T^{+++++}\rangle\langle T^{+++++}| + F_T^4 (1 - F_T)\,|T^{++++-}\rangle\langle T^{++++-}| + \dots$,  (6.16)

where, for example, $|T^{++++-}\rangle = |T^+ T^+ T^+ T^+ T^-\rangle$.

4. At the next step, using the stabilizers we define the 5-qubit codespace by the following projector:

$\Pi = \dfrac{1}{16} \prod_{j=1}^{4} \left( I + S_j \right)$,  (6.17)

where $S_1 = XZZXI$, $S_2 = IXZZX$, $S_3 = XIXZZ$, $S_4 = ZXIXZ$. Such a simple form of the projector $\Pi$ is due to the fact that the stabilizer operators $S_i$ have only $+1$ and $-1$ eigenvalues ($S_i \in G_n$), so the $+1$ subspace of each $S_i$ can simply be defined by $\Pi_i = \frac{1}{2}(I + S_i)$. In the 5-qubit subspace, logical operators are constructed very intuitively (cf. Tab. 6.2), giving for example $Y_L = Y^{\otimes 5}$ and $\Lambda_{TL} = \Lambda_T^{\otimes 5}$.

5. Next we define the logical $|T_L^\pm\rangle$ states by

$|T_L^\pm\rangle\langle T_L^\pm| \equiv \Pi\,\dfrac{1}{2}\left[ I_L \pm \dfrac{1}{\sqrt 3}\left( X_L + Y_L + Z_L \right) \right] \;\rightarrow\; \Pi = |T_L^+\rangle\langle T_L^+| + |T_L^-\rangle\langle T_L^-|$.  (6.18)

One can also check that

$|T_L^+\rangle = \sqrt 6\,\Pi\,|T^{-----}\rangle, \qquad |T_L^-\rangle = \sqrt 6\,\Pi\,|T^{+++++}\rangle$,  (6.19)

and that they are eigenstates of the logical $\Lambda_{TL}$.

6. The crucial step of the distillation is the projection of $\rho^{\otimes 5}$ onto the 5-qubit codespace. This projection is achieved by observing $+1$ measurement results for all the stabilizers $S_i$. If a measurement is not successful, we simply discard the result and take new input states. Defining $\epsilon = 1 - F_T$, for the projection $\bar\rho_{\text{out}} = \Pi \rho^{\otimes 5} \Pi$ we get

$\bar\rho_{\text{out}} = \dfrac{\epsilon^5 + 5\epsilon^2(1-\epsilon)^3}{6}\,|T_L^+\rangle\langle T_L^+| + \dfrac{(1-\epsilon)^5 + 5\epsilon^3(1-\epsilon)^2}{6}\,|T_L^-\rangle\langle T_L^-| = p^+\,|T_L^+\rangle\langle T_L^+| + p^-\,|T_L^-\rangle\langle T_L^-|$,  (6.20)

with the probability of observing a successful projection given by

$p_s = \mathrm{Tr}\left[ \Pi \rho^{\otimes 5} \right] = p^+ + p^- = \dfrac{\epsilon^5 + 5\epsilon^2(1-\epsilon)^3 + 5\epsilon^3(1-\epsilon)^2 + (1-\epsilon)^5}{6}$.  (6.21)

This success probability is shown in Fig. 6.10b and reaches $1/6$ at $\epsilon \to 0$. Therefore, on average we would need at least $6 \times 5 = 30$ states at the input of the distillation to obtain just one state at the output.
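Both the projector (6.17) and the success probability (6.21) are easy to check numerically. A sketch of ours, with the Pauli strings taken from step 4:

```python
import numpy as np
from functools import reduce

P = {"I": np.eye(2),
     "X": np.array([[0.0, 1.0], [1.0, 0.0]]),
     "Z": np.diag([1.0, -1.0])}

def pauli(string):
    return reduce(np.kron, [P[c] for c in string])

stabs = ["XZZXI", "IXZZX", "XIXZZ", "ZXIXZ"]
Pi = reduce(np.matmul, [(np.eye(32) + pauli(s)) / 2 for s in stabs])

assert np.allclose(Pi @ Pi, Pi)       # Pi is a projector
print(np.trace(Pi))                   # 2.0: a one-logical-qubit codespace

def p_success(eps):
    # Eq. (6.21): probability that all four stabilizers give +1
    return (eps**5 + 5 * eps**2 * (1 - eps)**3
            + 5 * eps**3 * (1 - eps)**2 + (1 - eps)**5) / 6

print(p_success(0.0))   # 1/6: even perfect inputs pass only 1/6 of the time
```

The trace of $\Pi$ being 2 confirms that the four independent stabilizers cut the $2^5 = 32$-dimensional space down to a single logical qubit.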

Fig. 6.9: Steps of one round of the distillation algorithm. $\rho_{\text{in}}$ — input states, $\rho = D_T(\rho_{\text{in}})$ — states after the applied dephasing, $\rho_{\text{out}}$ — output states after selecting only successful projections onto the 5-qubit codespace.


Fig. 6.10: a) Errors $\epsilon = 1 - F_T$ at the input of the distillation (red line) and at the output (blue curve). The threshold $\epsilon_{\text{th}} = 1 - F_T^{\text{th}}$ separates the distillable input states (convergence to $|T\rangle$, i.e. $\epsilon = 0$) from the non-distillable ones (convergence to $\frac{1}{2}\hat I$, i.e. $\epsilon = 0.5$). b) Success probabilities of each round of the distillation. When the algorithm converges to the $|T\rangle$ state, $p_s$ increases, i.e. one needs fewer states to proceed to the next round.

7. We notice that at $\epsilon \to 0$, $\bar\rho_{\text{out}} \to |T_L^-\rangle\langle T_L^-|$, where $|T_L^-\rangle = \sqrt 6\,\Pi\,|T^{+++++}\rangle$. Should it be necessary, one can decode $|T^{+++++}\rangle$ to $|T^+\rangle|0\rangle|0\rangle|0\rangle|0\rangle$, thus obtaining the normalized single-qubit magic state, with an analytical expression for the output fidelity given by (see Fig. 6.10)

$F_T^{\text{out}} = \dfrac{1 + 5t^3}{1 + 5t^2 + 5t^3 + t^5}, \qquad t = \dfrac{1 - F_T}{F_T}$.  (6.22)

8. Taking the output states of the distillation as the input again, as many times as necessary, one can improve the fidelity to match the architecture's requirements on the error rates of the emulated gates.
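Iterating Eq. (6.22) reproduces the threshold behaviour of Fig. 6.10a; a short sketch of ours:

```python
def f_out(f_in):
    # Eq. (6.22): output fidelity after one successful distillation round
    t = (1 - f_in) / f_in
    return (1 + 5 * t**3) / (1 + 5 * t**2 + 5 * t**3 + t**5)

def distill(f_in, rounds=10):
    f = f_in
    for _ in range(rounds):
        f = f_out(f)
    return f

print(distill(0.85))  # above the threshold 0.823: converges towards 1
print(distill(0.80))  # below the threshold: converges towards 0.5 (maximally mixed)
```

The two fixed points $F_T = 1$ ($t = 0$) and $F_T = 1/2$ ($t = 1$) are easy to read off the formula, which is exactly the convergence/divergence picture of Fig. 6.10a.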

6.3.3 Costs of Computation with Magic States

Problem          Magic states required      Physical qubits in factory
                 Type      Count            (p_g = 10^{-4}, t_toff = 0.1 t_sc)
                                            t_sc = 10^{-3} s         t_sc = 10^{-5} s
1000-bit Shor    Toffoli   10^{10.6}        6.3 × 10^6 (7 weeks)     6.3 × 10^6 (11 h)
2000-bit Shor    Toffoli   10^{11.5}        7.0 × 10^6 (53 weeks)    7.0 × 10^6 (3.7 days)

Table 6.3: Costs of MSD using surface codes for the Shor factoring problem. It takes $40N^3$ sequential Toffoli gates to factor an $N$-bit number. Quantum device specifications: $p_g$ — gate error rate, $t_{\text{toff}}$ — Toffoli gate operation time, $t_{\text{sc}}$ — measurement time of the surface code stabilizers. Table from J. O'Gorman and E. T. Campbell [90].

Having at our disposal powerful tools like the surface code and magic state distillation, we arrive at the dilemma of costs versus benefits. To benchmark the approach we use the Shor factoring algorithm, one of the few practically useful quantum algorithms that is exponentially faster than the best known classical counterparts. The reasoning goes as follows [89]:

1. To factor an $N$-bit number, the Shor algorithm requires $40N^3$ sequential Toffoli gates.

2. Each Toffoli gate requires one special Toffoli magic state (not introduced above) or seven $|H\rangle$ magic states (see Eq. (6.13)) supplied in parallel.

3. Depending on the surface code specifications, it may take a different number of physical qubits to reliably prepare one magic state. One can estimate that it takes one magic factory of the size of $O(10^5)$ physical qubits (surface code footprint) to distill a state in $O(10^{-5})$ s.

4. The frequency at which sequential Toffoli gates are needed is roughly $O(10^7)$ Hz. Thus one needs at least $O(10^2)$ factories supplying magic states to keep up with the quantum computation. More definite numbers, with the corresponding factoring times for specific kinds of surface codes, are given in Table 6.3.

5. The time estimates for the magic state approach to running the Shor algorithm are rather striking, since they promise factoring of 1000-bit numbers in as little as hours. In comparison, classical computers have so far only achieved factoring of the 829-bit RSA-250 number, in 2700 high-performance core-years [91]. This dramatic difference becomes less surprising given that the best known classical algorithms for this problem scale super-polynomially in the number of operations.
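The counts and runtimes in Table 6.3 follow from items 1-4 above; a quick arithmetic check of ours (assuming one logical Toffoli every $t_{\text{toff}} = 0.1\,t_{\text{sc}}$, as in the table header):

```python
import math

def shor_costs(n_bits, t_sc):
    toffolis = 40 * n_bits**3          # item 1: sequential Toffoli gates
    seconds = toffolis * 0.1 * t_sc    # one Toffoli per t_toff = 0.1 t_sc
    return toffolis, seconds

for n_bits, t_sc in [(1000, 1e-3), (1000, 1e-5), (2000, 1e-3), (2000, 1e-5)]:
    count, sec = shor_costs(n_bits, t_sc)
    print(f"{n_bits}-bit, t_sc = {t_sc:g} s: "
          f"10^{math.log10(count):.1f} Toffolis, {sec / 86400:.1f} days")
```

This reproduces the $10^{10.6}$ and $10^{11.5}$ Toffoli counts and, e.g., the 3.7-day entry for 2000-bit factoring at $t_{\text{sc}} = 10^{-5}$ s.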

The footprint of the magic state factories compared to the footprint of the computational blocks may vary quite drastically, in all cases, however, remaining significantly costly. One can find approaches which use from $\approx 10\%$ to $\approx 90\%$ of the physical qubits in the factories [89, 92, 93]. Unfortunately, a major reduction of the footprint using smart protocols usually does not result in equally significant improvements in terms of time costs. This means that the best way to reduce the costs of fault-tolerant architectures is simply to correct fewer errors by improving the error rates of the physical elements of the quantum systems (easier said than done).

6.3.4 Implications of Magic States

Universality. What is still left unaddressed are the areas between the octahedron of the Gottesman-Knill theorem and the distillable areas (the colored areas in Fig. 6.8b). It is yet to be proven whether or not these states can be used in specific distillation procedures. Such a proof, however, would have a tremendous corollary: "... it would imply that the Gottesman-Knill theorem provides necessary and sufficient conditions for the classical simulation, and that a transition from classical to universal quantum behaviour occurs at the boundary of the octahedron" [88].

Fig. 6.11: The connection of "magic" to the physical phenomenon of contextuality and the issue of universal computation using MSD. Whether contextuality is indeed the resource for quantum speed-up is yet to be proven.

Contextuality. The source of quantum speed-up has been elusive for quite a while, with the scientific community eventually ascertaining that neither parallelism nor entanglement alone is sufficient for a QC to be advantageous. Nowadays it is very well known that quantum measurement results directly contradict Bell's inequality, which has to be satisfied by any theory meeting the local-reality requirement [94]. As a result, no hidden variables can be introduced to reconcile the quantum world with classical expectations. It was also shown that "the non-locality of quantum theory is a special case of contextuality" [95]. Just as in the non-locality case, one can check special inequalities to determine whether a certain physical system satisfies contextuality or not. In our particular case of quantum computation using state injection, it can be shown that a state is non-contextual with respect to stabilizer measurements if and only if it lies inside the octahedron of Fig. 6.3. The consequence is that contextuality is a necessary resource for universal quantum computation using MSD. Proving the existence of distillation algorithms for the states lying inside the Bloch sphere between the octahedron and the distillable areas would mean that contextuality is also a sufficient resource, and indeed it may be the unique feature of the quantum world powering the mysterious quantum advantage.

6.4 Conclusion

Starting the discussion from the very basic error-correcting codes and the fundamental stabilizer formalism, we have touched on the subject of topological codes and elaborated on the importance of the state injection procedure. The main goal was to introduce the set of magic states, whose very special properties allow the problems of fault tolerance and universality to be solved simultaneously via magic state distillation. In this chapter we have shown how and why magic state distillation and the surface codes together represent a promising candidate for feasible universal quantum computers. The main disadvantage of MSD to date was identified in the significant resource consumption of the distillation factories. In light of this, one should keep in mind that the

emulation of non-Clifford gates using magic states is not a panacea, and other solutions to the problem of universal quantum computation exist [96]. Regardless of the significant practical drawbacks of the magic states, it is astonishing how much power is hidden behind certain states and specific regions of the Bloch sphere. Quite remarkably, besides enabling the reliable solution of problems like number factoring, the magic states have provided us with crucial insights into the long-debated question of the essence of quantum advantage by linking the distillation procedure to the quantum phenomenon of contextuality.

App. 6.A Program for MSD plots

The plots shown in Fig. 6.10 have been created with the Asymptote vector graphics language [97] from data calculated with the Quantum++ library for C++ [98]. In the following, the major parts of the code generating the state errors ε = 1 − F_T (see Eq. (6.14)) of every round of the distillation are shown. Where not commented, the definitions of the code’s instances are assumed to be self-explanatory.
The recursive function takes the input state rho and performs magic_projection using the co.P projector (Eq. (6.17)). The recursion continues until, in this case, the state error eps is less than 10^-3. At each round, ε and the success probability ps are stored and subsequently used by Asymptote for the plots. The CodeOperators and MagicStr structures hold all important definitions for the 5-qubit codespace and the magic states, respectively.
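Before the exact numerics, the qualitative behaviour of the recursion can be sketched with the leading-order error map of the T-type protocol, ε → 5ε², valid for input errors below the threshold ε_th = (1 − √(3/7))/2 ≈ 0.173 [88]. The short Python sketch below is a simplification of the full projection computation, using the same stopping value 10^-3; the function name is illustrative:

```python
def distill_rounds(eps, eps_stop=1e-3, eps_th=0.173):
    """Iterate the leading-order error map eps -> 5*eps**2 of the
    T-type distillation protocol and return the error after each round.
    The recursion only converges for eps below the threshold eps_th."""
    history = [eps]
    while eps_stop < eps < eps_th:
        eps = 5.0 * eps ** 2   # leading-order output error per round
        history.append(eps)
    return history

errors = distill_rounds(0.1)
# each round roughly squares the error, so very few rounds are needed
assert errors[-1] < 1e-3
assert all(later < earlier for earlier, later in zip(errors, errors[1:]))
```

Starting above the threshold (e.g. ε = 0.2 > ε_th), the loop body never runs: such states cannot be distilled by this protocol, in line with Fig. 6.8b.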

// cmat = complex matrix
// kron = Kronecker product of a vector of matrices
// kronpow = Kronecker product of identical matrices
// prj makes a projector from a state

struct CodeOperators{
    cmat S1{kron(std::vector{gt.X, gt.Z, gt.Z, gt.X, gt.Id2})};
    cmat S2{kron(std::vector{gt.Id2, gt.X, gt.Z, gt.Z, gt.X})};
    cmat S3{kron(std::vector{gt.X, gt.Id2, gt.X, gt.Z, gt.Z})};
    cmat S4{kron(std::vector{gt.Z, gt.X, gt.Id2, gt.X, gt.Z})};
    ...
    cmat I5{kronpow(gt.Id2, 5)};
    cmat P{(I5+S1)*(I5+S2)*(I5+S3)*(I5+S4)/16.};
};
static const CodeOperators co;
...
struct MagicStr{
    double T_angle = acos(1./sqrt(3.))/2;
    ...
    MagicStr(){
        Tp = cos(T_angle)*st.z0 + exp(1_i*pi/4)*sin(T_angle)*st.z1;
        Tm = sin(T_angle)*st.z0 + exp(5.*1_i*pi/4)*cos(T_angle)*st.z1;
        Tp_L = sqrt(6.)*co.P*kronpow(Tm, 5);
        Tm_L = sqrt(6.)*co.P*kronpow(Tp, 5);
        Tp_proj = prj(Tp_L);
        Tm_proj = prj(Tm_L);
        ...
    }
    ...
    cmat get_T_L(double eps) const{
        return (1-eps)*Tm_proj + eps*Tp_proj;
    }
    ...

};
static const MagicStr magic;
...

struct round_data{
    double eps;
    double ps;
    round_data(double eps, double ps): eps(eps), ps(ps){}
};
...
round_data magic_projection(cmat rho_in){
    cmat rho_5 = kronpow(rho_in, 5);
    double p_plus  = abs((magic.Tp_L.adjoint()*rho_5*magic.Tp_L)[0]);
    double p_minus = abs((magic.Tm_L.adjoint()*rho_5*magic.Tm_L)[0]);
    double ps = p_plus + p_minus;
    return round_data(p_minus/ps, ps);
}
...
double eps_stop = 0.001;
void recursive(cmat rho, ofstream& output){
    round_data temp = magic_projection(rho);
    double eps = temp.eps;
    double ps = temp.ps;
    output << eps << "," << ps << "\n";
    // continue while the error is still above the stopping value
    // but below the distillation threshold
    if(eps > eps_stop && eps < magic.eps_th){
        recursive(magic.get_T_L(eps), output);
    }
}

Bibliography

[1] E. Nielsen, J. K. Gamble, K. Rudinger, T. Scholten, K. Young, and R. Blume-Kohout, Gate Set Tomography, pp. 1–48 (2020).

[2] S. Haroche and J. M. Raimond, Exploring the Quantum: Atoms, Cavities, and Photons (2010).

[3] M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information (Cambridge University Press, Cambridge, 2000).

[4] D. Greenbaum, Introduction to Quantum Gate Set Tomography, (2015).

[5] J. Lin, B. Buonacorsi, R. Laflamme, and J. J. Wallman, On the freedom in representing quantum operations, New Journal of Physics 21 (2018).

[6] R. Blume-Kohout, J. K. Gamble, E. Nielsen, J. Mizrahi, J. D. Sterk, and P. Maunz, Robust, self-consistent, closed-form tomography of quantum logic gates on a trapped ion qubit, preprint (2013).

[7] C. Jackson and S. Van Enk, Nonholonomic tomography. I. The Born rule as a connection between experiments, Physical Review A 95, 1 (2017).

[8] R. Blume-Kohout, J. K. Gamble, E. Nielsen, K. Rudinger, J. Mizrahi, K. Fortier, and P. Maunz, Demonstration of qubit operations below a rigorous fault tolerance threshold with gate set tomography, Nature Communications 8 (2017).

[9] C. Jackson and S. J. Van Enk, Detecting correlated errors in state-preparation-and-measurement tomography, Physical Review A - Atomic, Molecular, and Optical Physics 92, 1 (2015).

[10] J. M. Renes, R. Blume-Kohout, A. J. Scott, and C. M. Caves, Symmetric informationally complete quantum measurements, Journal of Mathematical Physics 45, 2171 (2004).

[11] J. Johansson, P. Nation, and F. Nori, QuTiP 2: A Python framework for the dynamics of open quantum systems, Computer Physics Communications 184, 1234 (2013).

[12] E. Nielsen, K. Rudinger, T. Proctor, A. Russo, K. Young, and R. Blume-Kohout, Probing quantum processor performance with pyGSTi, Quantum Science and Technology 5 (2020).

[13] S. T. Merkel, J. M. Gambetta, J. A. Smolin, S. Poletto, A. D. Córcoles, B. R. Johnson, C. A. Ryan, and M. Steffen, Self-consistent quantum process tomography, Physical Review A - Atomic, Molecular, and Optical Physics 87, 1 (2013).

[14] J. Watrous, The Theory of Quantum Information (2018).

[15] T. L. Scholten and R. Blume-Kohout, Behavior of the maximum likelihood in quantum state tomography, New Journal of Physics 20 (2018).

[16] J. Preskill, Quantum Computing in the NISQ era and beyond, Quantum 2, 79 (2018).

[17] A. Peruzzo, J. McClean, P. Shadbolt, M.-H. Yung, X.-Q. Zhou, P. J. Love, A. Aspuru-Guzik, and J. L. O’Brien, A variational eigenvalue solver on a photonic quantum processor, Nature Communications 5 (1), 4213 (2014).

[18] E. Farhi, J. Goldstone, and S. Gutmann, A Quantum Approximate Optimization Algorithm, pp. 1–16 (2014).

[19] J. Romero Fontalvo, Variational Quantum Information Processing, Doctoral dissertation, Harvard University, Graduate School of Arts and Sciences (2019).

[20] G. Aleksandrowicz, T. Alexander, P. Barkoutsos, L. Bello, Y. Ben-Haim, D. Bucher, F. J. Cabrera-Hernández, J. Carballo-Franquis, A. Chen et al., Qiskit: An Open-source Framework for Quantum Computing, Quantum Information Science Kit (2019).

[21] J. R. McClean, J. Romero, R. Babbush, and A. Aspuru-Guzik, The theory of variational hybrid quantum-classical algorithms, New Journal of Physics 18 (2), 023023 (2016).

[22] S. Lloyd, Computational Capacity of the Universe, Physical Review Letters 88 (23), 237901 (2002).

[23] Y. Herasymenko and T. E. O’Brien, A diagrammatic approach to variational quantum ansatz construction, arXiv pp. 1–18 (2019).

[24] S. Aaronson, Lecture 16 - QMA, Quantum Complexity Theory– Course No. 6.845, MIT OpenCourseWare (2010).

[25] M. Suzuki, Generalized Trotter’s formula and systematic approximants of exponential operators and inner derivations with applications to many-body problems, Communications in Mathematical Physics 51 (2), 183 (1976).

[26] A. G. Taube and R. J. Bartlett, New perspectives on unitary coupled-cluster theory, International Journal of Quantum Chemistry 106 (15), 3393 (2006).

[27] D. Wecker, M. B. Hastings, and M. Troyer, Progress towards practical quantum variational algorithms, Physical Review A - Atomic, Molecular, and Optical Physics 92 (4), 042303 (2015).

[28] M. H. Yung, J. Casanova, A. Mezzacapo, J. McClean, L. Lamata, A. Aspuru-Guzik, and E. Solano, From transistor to trapped-ion computers for quantum chemistry, Scientific Reports 4 (1), 1 (2014).

[29] A. Kandala, A. Mezzacapo, K. Temme, M. Takita, M. Brink, J. M. Chow, and J. M. Gambetta, Hardware-efficient variational quantum eigensolver for small molecules and quantum magnets, Nature 549 (7671), 242 (2017).

[30] P. Krantz, M. Kjaergaard, F. Yan, T. P. Orlando, S. Gustavsson, and W. D. Oliver, A quantum engineer’s guide to superconducting qubits, Applied Physics Reviews 6 (2), 021318 (2019).

[31] K. Wright, K. M. Beck, S. Debnath, J. M. Amini, Y. Nam, N. Grzesiak, J. S. Chen, N. C. Pisenti, M. Chmielewski, C. Collins, K. M. Hudek, J. Mizrahi, J. D. Wong-Campos, S. Allen, J. Apisdorf, P. Solomon, M. Williams, A. M. Ducore, A. Blinov, S. M. Kreikemeier, V. Chaplin, M. Keesan, C. Monroe, and J. Kim, Benchmarking an 11-qubit quantum computer, Nature Communications 10 (1), 1 (2019).

[32] A. Lucas, Ising formulations of many NP problems, Frontiers in Physics 2, 1 (2014).

[33] P. Vikstål, M. Grönkvist, M. Svensson, M. Andersson, G. Johansson, and G. Ferrini, Applying the quantum approximate optimization algorithm to the tail-assignment problem, Physical Review Applied 14 (3), 034009 (2020).

[34] S. Lloyd, Universal Quantum Simulators, Science 273 (5278), 1073 (1996).

[35] T. C. Yen, V. Verteletskyi, and A. F. Izmaylov, Measuring All Compatible Operators in One Series of Single-Qubit Measurements Using Unitary Transformations, Journal of Chemical Theory and Computation 16 (4), 2400 (2020).

[36] C. Hempel, C. Maier, J. Romero, J. McClean, T. Monz, H. Shen, P. Jurcevic, B. P. Lanyon, P. Love, R. Babbush, A. Aspuru-Guzik, R. Blatt, and C. F. Roos, Quantum chemistry calculations on a trapped-ion quantum simulator, Physical Review X 8, 031022 (2018).

[37] J. I. Colless, V. V. Ramasesh, D. Dahlen, M. S. Blok, M. E. Kimchi-Schwartz, J. R. McClean, J. Carter, W. A. de Jong, and I. Siddiqi, Computation of molecular spectra on a quantum processor with an error-resilient algorithm, Physical Review X 8 (2018).

[38] R. Santagati, J. Wang, A. A. Gentile, S. Paesani, N. Wiebe, J. R. McClean, S. Morley-Short, P. J. Shadbolt, D. Bonneau, J. W. Silverstone, D. P. Tew, X. Zhou, J. L. O’Brien, and M. G. Thompson, Witnessing eigenstates for quantum simulation of Hamiltonian spectra, Science Advances 4, eaap9646 (2018).

[39] E. F. Dumitrescu, A. J. McCaskey, G. Hagen, G. R. Jansen, T. D. Morris, T. Papenbrock, R. C. Pooser, D. J. Dean, and P. Lougovski, Cloud quantum computing of an atomic nucleus, Physical Review Letters 120, 210501 (2018).

[40] Y. Cao, J. Romero, J. P. Olson, M. Degroote, P. D. Johnson, M. Kieferová, I. D. Kivlichan, T. Menke, B. Peropadre, N. P. Sawaya, S. Sim, L. Veis, and A. Aspuru-Guzik, Quantum chemistry in the age of quantum computing, Chemical Reviews 119, 10856 (2019).

[41] S. McArdle, S. Endo, A. Aspuru-Guzik, S. C. Benjamin, and X. Yuan, Quantum computational chemistry, Reviews of Modern Physics 92 (1) (2020).

[42] P. J. O’Malley, R. Babbush, I. D. Kivlichan, J. Romero, J. R. McClean, R. Barends, J. Kelly, P. Roushan, A. Tranter, N. Ding, B. Campbell, Y. Chen, Z. Chen, B. Chiaro, A. Dunsworth, A. G. Fowler, E. Jeffrey, E. Lucero, A. Megrant, J. Y. Mutus, M. Neeley, C. Neill, C. Quintana, D. Sank, A. Vainsencher, J. Wenner, T. C. White, P. V. Coveney, P. J. Love, H. Neven, A. Aspuru-Guzik, and J. M. Martinis, Scalable quantum simulation of molecular energies, Physical Review X 6, 1 (2016).

[43] Q. Sun, T. C. Berkelbach, N. S. Blunt, G. H. Booth, S. Guo, Z. Li, J. Liu, J. D. McClain, E. R. Sayfutyarova, S. Sharma, S. Wouters, and G. K. Chan, PySCF: the python-based simulations of chemistry framework, Wiley Interdisciplinary Reviews: Computational Molecular Science 8 (1), e1340 (2017).

[44] T. Helgaker, P. Jørgensen, and J. Olsen, Molecular Electronic-Structure Theory (Wiley, 2000).

[45] P. Atkins and R. Friedman, Molecular Quantum Mechanics, 5th edition (Oxford University Press, 2010).

[46] J. T. Seeley, M. J. Richard, and P. J. Love, The bravyi-kitaev transformation for quantum computation of electronic structure, The Journal of Chemical Physics 137 (22), 224109 (2012).

[47] S. Bravyi, Fermionic quantum computation, Annals of Physics 298 (1), 210 (2002).

[48] C. Cohen-Tannoudji, B. Diu, and F. Laloë, Quantum Mechanics, Volume 2: Angular Momentum, Spin, and Approximation Methods (Wiley, 2019).

[49] J. J. Sakurai and J. Napolitano, Modern Quantum Mechanics (Cambridge University Press, Cambridge, 1985).

[50] D. Browne, Topological codes and computation, A lecture course given at the Univer- sity of Innsbruck (2014).

[51] A. Y. Kitaev, Fault-tolerant quantum computation by anyons, Annals Phys. 303, 2 (2003).

[52] J. Edmonds, Paths, trees, and flowers, Canadian Journal of Mathematics 17, 449 (1965).

[53] N. Leijenhorst, Quantum error correction: Decoders for the toric code, Bachelor thesis, Delft University of Technology (2019).

[54] N. Shenvi, J. Kempe, and K. B. Whaley, Quantum random-walk search algorithm, Phys. Rev. A 67, 11 (2003).

[55] A. M. Childs, R. Cleve, E. Deotto, E. Farhi, S. Gutmann, and D. A. Spielman, Exponential algorithmic speedup by a quantum walk, in Proceedings of the 35th ACM Symposium on Theory of Computing, San Diego, pp. 59–68 (2003).

[56] A. Ambainis, Quantum search algorithms, SIGACT News 35, 22–35 (2004).

[57] A. Childs, Universal computation by quantum walk, Phys. Rev. Lett. 102, 180501 (2009).

[58] M. S. Underwood and D. L. Feder, Universal quantum computation by discontinuous quantum walk, Phys. Rev. A 82, 042304 (2010).

[59] N. B. Lovett, S. Cooper, M. Everitt, M. Trevers, and V. Kendon, Universal quantum computation using the discrete-time quantum walk, Phys. Rev. A 81, 042330 (2010).

[60] S. Venegas-Andraca, Quantum walks: a comprehensive review, Quantum Inf. Process. 11, 1015–1106 (2012).

[61] J. Kempe, Quantum random walks: An introductory overview, Contemp. Phys 44:4, 307 (2003).

[62] F. Xia, J. Liu, H. Nie, Y. Fu, L. Wan, and X. Kong, Random walks: A review of algorithms and applications, IEEE Trans Knowl Data Eng 4, 95–107 (2020).

[63] F. Spitzer, Principles of random walk (Springer, New York, 1976).

[64] C. Grinstead and J. Snell, Introduction to Probability (American Mathematical Soci- ety, New York, 1997).

[65] J. A. Bondy and U. S. R. Murty, Graph Theory (Springer, London, 2008).

[66] R. P. Feynman, Quantum mechanical computers, Found. Phys. 16, 507–531 (1986).

[67] Y. Aharonov, L. Davidovich, and N. Zagury, Quantum random walks, Phys. Rev. A 48, 1687 (1993).

[68] E. Farhi and S. Gutmann, Quantum computation and decision trees, Phys. Rev. A 58, 915 (1998).

[69] A. Childs, On the relationship between continuous- and discrete-time quantum walk, Commun. Math. Phys. 294, 581–603 (2010).

[70] F. W. Strauch, Connecting the discrete- and continuous-time quantum walks, Phys. Rev. A 74, 030301 (2006).

[71] S. Mostame, Thesis: On Quantum Simulators and Adiabatic Quantum Algorithms (2008).

[72] B. Zwiebach, Lecture Notes, Chapter 6: Adiabatic Approximation (2017).

[73] J. Johansson, P. Nation, and F. Nori, QuTiP: An open-source Python framework for the dynamics of open quantum systems, Comp. Phys. Comm. 184, 1234 (2012).

[74] C. Zener, Non-adiabatic crossing of energy levels, Proceedings of the Royal Society of London A 137, 696 (1932).

[75] L. Landau, Zur Theorie der Energieübertragung, Physikalische Zeitschrift der Sowjetunion 2, 46 (1932).

[76] E. Farhi, J. Goldstone, S. Gutmann, and M. Sipser, Quantum computation by adiabatic evolution, preprint arXiv:quant-ph/0001106 (2000).

[77] L. K. Grover, Quantum mechanics helps in searching for a needle in a haystack, Phys. Rev. Lett. 79, 325 (1997).

[78] J. A. Jones, NMR quantum computation, Progress in Nuclear Magnetic Resonance Spectroscopy 38 (4), 325 (2001), arXiv:quant-ph/0009002.

[79] J. Roland and N. Cerf, Quantum search by local adiabatic evolution, Phys. Rev. A 65, 042308 (2002).

[80] U. Vazirani, How "Quantum" is the D-Wave Machine? (2014).

[81] D. Aharonov, W. van Dam, J. Kempe, Z. Landau, S. Lloyd, and O. Regev, Adiabatic quantum computation is equivalent to standard quantum computation, SIAM Rev. 50 (4), 755 (2008).

[82] T. Albash and D. A. Lidar, Adiabatic quantum computing, Rev. Mod. Phys 90, 015002 (2018).

[83] T. Hogg, Adiabatic Quantum Computing for Random Satisfiability Problems (2004).

[84] T. Neuhaus, M. Peschina, K. Michielsen, and H. De Raedt, Classical and quantum annealing in the median of three-satisfiability, Phys. Rev. A 83, 012309 (2011).

[85] M. Žnidarič and M. Horvat, Exponential complexity of an adiabatic algorithm for an NP-complete problem, Phys. Rev. A 73, 022329 (2006).

[86] P. W. Shor, Algorithms for quantum computation: discrete logarithms and factoring, in Proceedings of the 35th Annual Symposium on Foundations of Computer Science, pp. 124–134 (1994).

[87] P. W. Shor, Scheme for reducing decoherence in quantum computer memory, Phys. Rev. A 52, R2493 (1995).

[88] S. Bravyi and A. Kitaev, Universal quantum computation with ideal Clifford gates and noisy ancillas, Physical Review A 71 (2) (2005).

[89] A. G. Fowler, M. Mariantoni, J. M. Martinis, and A. N. Cleland, Surface codes: Towards practical large-scale quantum computation, Physical Review A 86 (3) (2012).

[90] J. O’Gorman and E. T. Campbell, Quantum computation with realistic magic-state factories, Phys. Rev. A 95, 032338 (2017).

[91] Wikipedia contributors, Integer factorization records — Wikipedia, the free encyclo- pedia, https://en.wikipedia.org/w/index.php?title=Integer_factorization_ records&oldid=1004901716, [Online; accessed 12-February-2021] (2021).

[92] S. Bravyi and J. Haah, Magic-state distillation with low overhead, Phys. Rev. A 86, 052329 (2012).

[93] D. Litinski, Magic State Distillation: Not as Costly as You Think, Quantum 3, 205 (2019).

[94] J. S. Bell, On the problem of hidden variables in quantum mechanics, Rev. Mod. Phys. 38, 447 (1966).

[95] M. Howard, J. Wallman, V. Veitch, and J. Emerson, Contextuality supplies the “magic” for quantum computation, Nature 510 (7505), 351 (2014).

[96] B. J. Brown, A fault-tolerant non-Clifford gate for the surface code in two dimensions, Science Advances 6 (21), eaay4929 (2020).

[97] Asymptote vector graphics language, https://asymptote.sourceforge.io.

[98] Quantum++: a general-purpose C++11 quantum computing library, https://github.com/softwareQinc/qpp.