Three Complexity Classification Questions at the Quantum/Classical Boundary
by
Daniel Grier
Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Doctor of Philosophy at the

MASSACHUSETTS INSTITUTE OF TECHNOLOGY September 2019 © Massachusetts Institute of Technology 2019. All rights reserved.

Author: Department of Electrical Engineering and Computer Science, August 30, 2019 (signature redacted)
Certified by: Scott Aaronson, Professor of Computer Science, The University of Texas at Austin, Thesis Supervisor (signature redacted)
Accepted by: Leslie A. Kolodziejski, Professor of Electrical Engineering and Computer Science, Chair, Department Committee on Graduate Students (signature redacted)

Three Complexity Classification Questions at the Quantum/Classical Boundary
by
Daniel Grier

Submitted to the Department of Electrical Engineering and Computer Science on August 30, 2019, in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Abstract

A central promise of quantum computers is their ability to solve some problems dramatically more efficiently than their classical counterparts. Thus, to understand feasible computation in our physical world, we must turn to quantum rather than classical complexity theory. That said, classical complexity theory has a long and successful history of developing tools and techniques to analyze the power of various computing models. Can we use classical complexity theory to aid our understanding of the quantum world? As it turns out, the answer is yes. There is actually a very fruitful connection between quantum and classical complexity theory, each field informing the other. We will add to this perspective through the lens of classification: attempts to categorize variations of the object of study as thoroughly and completely as possible. First, we will show that every regular language has quantum query complexity $\Theta(1)$, $\tilde{\Theta}(\sqrt{n})$, or $\Theta(n)$. Combining quantum query complexity with these fundamental classical languages not only reveals new structure in these languages, but also leads to a generalization of Grover's famous quantum search algorithm. Second, we will discuss the complexity of computing the permanent over various matrix groups. In particular, this will show that computing the permanent of a unitary matrix is #P-hard. The theorem statement is classical, and yet the proof is almost entirely the result of exploiting well-known theorems in quantum linear optics. Finally, we give a complete classification of Clifford operations over qubits. Although the Clifford operations are classically simulable, they also exhibit distinct quantum behavior, making them a particularly interesting gate set at the quantum/classical boundary.

Thesis Supervisor: Scott Aaronson Title: Professor of Computer Science, The University of Texas at Austin

Acknowledgments

I should probably start my acknowledgments by stating how inadequate they will be in expressing my gratitude for all the people who have contributed to my graduate school experience. My advisor Scott Aaronson might refer to that sentiment as "forehead-bangingly obvious," but sometimes you just have to set the record straight. Of course, Scott is at the top of my list when it comes to people to thank. For all my years, Scott has been an academic heavyweight role model. He bleeds interesting research ideas, maintains a highly successful blog, and has two rambunctious children, all while being one of the most supportive advisors I've seen in the business. Even when forces drew him towards UT Austin during my third year, he fought to keep me financially supported at MIT as long as possible. I will always be grateful to have had Scott as an advisor, and I hope to never lose his infectious passion for science.

On the topic of advisors, I would also like to thank my undergraduate advisor Stephen Fenner at the University of South Carolina for introducing me to computational complexity theory, and more generally to the world of research in theoretical computer science. It seems rather unlikely that I'd be sitting here typing these acknowledgments in this thesis were it not for him giving me some problem about the complexity of a strange combinatorial game many years ago. I will always appreciate his patience and encouragement during these early research years, and he remains a friend and collaborator to this day.

There are many faculty members at MIT who helped me somewhere along the way. First, I would like to thank both Aram Harrow and Ryan Williams for serving on my thesis committee. I would also like to thank those professors who helped to fund my studies through a TA position after Scott moved. Amongst them, I would especially like to mention Ronitt Rubinfeld for her support during this time. Finally, I am incredibly grateful for the kindness of Srini Devadas, who single-handedly made it possible for me to do research full-time for my final semester. I hope the work I did during that time would make him proud.

Compared to many of my peers, I have had relatively few distinct collaborators during my time at MIT, which, I suppose, means that I am all the more grateful for the time I've worked with them: Scott Aaronson, Adam Bouland, Matt Coudron, Stephen Fenner, Ramis Movassagh, Luke Schaeffer, Aarthi Sundaram, and John Watrous. By far, my closest collaborator was the inimitable Luke Schaeffer. Almost no day went by during my time at MIT that didn't have some sort of research check-in with Luke, often lasting two, three, four hours. He is on all but a few of my published papers, including all three I will discuss in this thesis. I can't really imagine what grad school would have been like without Luke, but I'm guessing it would have been less fun and less successful.

I'd also like to thank the entire theory group at MIT. It was an intensely social group of people that knew how to play a mean game of soccer, sing their hearts out during karaoke, and were even pretty good at math if you gave them the chance. I have no doubt that many of them will go on (or have already gone on) to be world-class researchers, and I feel privileged to have been part of the group. Purely out of fear of leaving anybody out, I will not try to enumerate all of my theory friends, but needless to say... I will miss you guys.

I'd also like to thank the various organizations I was part of during my time at MIT that made it that much more enjoyable. First, to the cycling club team at MIT, consider this my formal apology for not training harder. I didn't deserve to be part of such a fun and successful team, but I sure did enjoy being along for the ride. I'd also like to thank my hockey team, who took a chance on me when I messaged them out of the blue asking to join. Sports have always been a large part of my life, and I'm really lucky to have been part of two great teams. Finally, I would like to thank the EECS Communications Lab for hiring me on as a writing advisor. I would especially like to thank my manager Alison Takemura, who not only helped me with my own communication tasks, but also helped me think about helping others as well.

Finally, I would like to thank my entire family (Jon, Marion, and Ben Grier) and my girlfriend Amy for the incredible support and good times over the years. To say that they've helped me become a better and happier person would just be forehead-bangingly obvious.

Previously Published Material

The three main results of this thesis are based on previous papers. Chapter 3 is from a conference paper in joint work with Scott Aaronson and Luke Schaeffer [7]. Chapter 4 and Chapter 5 are based on a conference and workshop paper, respectively, with Luke Schaeffer [49, 50].

Contents

1 Introduction
  1.1 Results
    1.1.1 Quantum query complexity of regular languages
    1.1.2 Hardness results for matrix permanents
    1.1.3 Classification of Clifford operations
  1.2 Related Work

2 Background
  2.1 Quantum Computers
    2.1.1 States
    2.1.2 Operations
    2.1.3 Measurement
  2.2 Query Complexity
    2.2.1 Relationships
  2.3 Complexity Classes
    2.3.1 Decision classes
    2.3.2 Function classes

3 A Quantum Query Complexity Trichotomy for Regular Languages
  3.1 Results Overview
    3.1.1 Proof Techniques
    3.1.2 Related Work
  3.2 Background
    3.2.1 Regular languages
    3.2.2 Query complexity with non-binary alphabets
  3.3 Applications of Star-free Algorithm
    3.3.1 Dynamic AND-OR
    3.3.2 Bounded Dyck language
    3.3.3 Addition
    3.3.4 Length-2 Word Break
    3.3.5 Grid problems
  3.4 Formal Statement of Trichotomy Theorem
    3.4.1 Flattening
    3.4.2 Trichotomy Theorem
    3.4.3 Equivalence of algebraic and regular expression definitions
    3.4.4 Monotonic query complexity
    3.4.5 Structure of the proof
  3.5 Upper Bounds
    3.5.1 Proof techniques
    3.5.2 $\tilde{O}(\sqrt{n})$ algorithm for star-free languages
  3.6 Dichotomy Theorems
  3.7 Lower Bounds
  3.8 Context-Free Languages
    3.8.1 Context-free languages do not obey the trichotomy

4 The Complexity of the Permanent from Linear Optics
  4.1 Results Overview
    4.1.1 Proof Outline
  4.2 Linear Optics Primer
    4.2.1 Linear optics as a symmetric subspace
  4.3 Permanents of Real Orthogonal Matrices
    4.3.1 Constructing the ...
    4.3.2 Postselected Linear Optical Gadgets
    4.3.3 Main Result
  4.4 Permanents over Finite Fields
  4.5 Expanding Permanent Hardness
    4.5.1 Positive Semidefinite Matrix Permanents
    4.5.2 More Permanent Consequences of the Main Result
  4.6 Approximation
  4.7 Gadget Details
    4.7.1 Gadget entries
    4.7.2 Galois Extension

5 A Complete Classification of Clifford Operations over Qubits
  5.1 Results Overview
    5.1.1 Proof Outline
  5.2 Stabilizer Formalism
  5.3 Gates
    5.3.1 Single-qubit Gates
    5.3.2 Multi-qubit Gates
  5.4 Tableaux
    5.4.1 Correspondence between Gates and Tableaux
  5.5 Classes
    5.5.1 Clifford Ancilla Rule
  5.6 Invariants
    5.6.1 Formal invariants
    5.6.2 Subring invariants
    5.6.3 Permutation invariants
  5.7 Equivalence of Generator and Invariant Definitions
  5.8 Circuit Identities
  5.9 Universal Construction
  5.10 Completing the Classification
  5.11 Enumeration

Chapter 1

Introduction

The development of electronic computers is undoubtedly one of the great accomplishments of engineering and mathematics in the past century. Concurrent with the development of the computers themselves, computer scientists have created a litany of algorithms to run on them, from selecting the shortest path in a graph to testing primality. With such problems, larger amounts of data can be processed relatively efficiently by simply giving the computer more time and/or more memory.

Unfortunately, since the dawn of computing, we have also known that there are problems that are simply out of reach for any such computer. Indeed, in 1936 (well before the first electronic computer in 1946) Alan Turing proved that no computer could solve what is known as the halting problem: given a description of a computer program as input, decide whether or not it gets stuck in an infinite loop. Such a theorem statement is predicated on the fact that we can agree on what a "computer" actually is. For this purpose, Turing defined what is now called the "Turing machine," a relatively simple model of computation for problems whose inputs and outputs are encoded by strings of letters from a finite alphabet. The universality of this model is captured by the Church-Turing thesis, which postulates that any algorithm one could execute in our physical world could be simulated by a Turing machine running for a sufficiently long period of time. Thus, to understand the fundamental limits of computation, complexity theorists have focused on understanding the classical Turing machine and the classical resources required to solve certain problems.¹

That said, our physical world appears to be quantum mechanical. There is now widespread belief amongst complexity theorists that classical Turing machines cannot efficiently simulate quantum mechanics, and therefore cannot efficiently simulate a quantum computer (recall that the Church-Turing thesis says nothing about the length of an arbitrary Turing machine simulation). More specifically, there is no known efficient classical algorithm to simulate local Hamiltonian dynamics, to factor numbers, or to sample from the distribution of random linear optical networks, all of which can be efficiently computed by a quantum computer. This apparent discrepancy between the power of classical and quantum Turing machines naturally leads to the study of quantum complexity theory, which seeks to understand the fundamental limits of efficient computation on quantum computers.

One might worry that studying quantum complexity requires us to discard our knowledge of classical computation and reevaluate all problems in the context of quantum mechanics. Happily, such a nightmare scenario is far from reality. One of the primary goals of this thesis is to provide interesting and useful connections between quantum complexity theory and classical complexity theory such that each field can be seen as enhancing the other. In particular, we will explore these questions through the lens of classification.

Properly defining a "classification result" in the most general sense might best be left as a philosophical question, rather than a mathematical one. Still, the results of this thesis all share the goal of understanding all possibilities which could arise from the object of study. For instance, in Chapter 5 we ask for a complete understanding of the possible ways in which one set of gates fails to generate another set.

At times, classification may seem like a tedious exercise² for the completionist, but we do not take this view. On the contrary, many of the interesting features of the results in this thesis are a direct consequence of the classification. First, the process of classification can reveal some interesting behavior we might have otherwise overlooked. Second, a theorem statement may make no reference to a classification, and yet the only known proof may require such a classification. For instance, it is a fact that every infinite set of Clifford gates can be generated by a finite number of gates from that set. However, it is rather unclear how one would prove this statement without first classifying all possible classes of Clifford gates.

¹ Such complexity classes include P, the class of problems which can be solved in polynomial time on a Turing machine; BPP, the class of problems which can be solved in polynomial time with additional randomness; and PSPACE, the class of problems which can be solved by a Turing machine with polynomial memory. For a more detailed look at complexity classes, both classical and quantum, we refer the reader to Section 2.3.

² For instance, just describing the 57 classes of Clifford operations in this thesis requires two figures over two pages (see Figures 5-1 and 5-2).

1.1 Results

This thesis comprises three topics: the query complexity of regular languages (Chapter 3), hardness results for the permanent over matrix groups (Chapter 4), and a classification of Clifford operations (Chapter 5). The introductions below do not serve as a full replacement for the introductions in their respective chapters, but we hope to at least give a high-level description of the results, as well as to put them into the context of our goal of connecting classical and quantum computation.

1.1.1 Quantum query complexity of regular languages

One of the glaring weaknesses of modern complexity theory is its inability to prove unconditional lower bounds on the time and space requirements for specific problems. For this reason, complexity theorists have developed different notions of complexity which are more amenable to our currently available proof techniques. One important such measure is called "query complexity," since the algorithm is only charged for how many bits of the input it accesses (or queries), rather than the time it takes to process those queries. One can also define a quantum query to the input, which leads to a similar notion of quantum query complexity.

In this work, we study the quantum query complexity of another great development in classical complexity theory, the regular languages. A language is regular if it is recognized by a finite-memory Turing machine. In fact, regular languages can be equivalently defined in a myriad of different ways. This robustness of definition, and the fact that they have many desirable closure properties, makes them ideal targets for study.

We give a complete picture of the quantum query complexity landscape for regular languages. That is, every regular language has quantum query complexity $\Theta(1)$, $\tilde{\Theta}(\sqrt{n})$, or $\Theta(n)$, up to some technical caveats. Interestingly, the classical query complexity of regular languages is either $\Theta(1)$ or $\Theta(n)$, a much easier result to prove. Thus, the classification reveals some exploitable structure in a subclass of regular languages, which leads to faster quantum algorithms.

1.1.2 Hardness results for matrix permanents

One of the crowning achievements of classical complexity theory is the notion of reduction, an efficient procedure for converting solutions of one problem to another. Given an arbitrary problem, one can often use reductions to show it belongs in a class of problems which have roughly equivalent complexity. One of the most notable instances of such a reduction was a proof by Valiant that the permanent of an arbitrary matrix is #P-hard to compute. Roughly speaking, this result shows that the ability to compute the permanent is equivalent to the ability to count the number of inputs which make a polynomially-sized circuit evaluate to 1.

Given the fundamental nature of the permanent, it would be nice to extend this hardness result to smaller and smaller classes of matrices. More precisely, what is the computational hardness of computing the permanent even when the underlying matrix has global structure (e.g., invertible, unitary)? As it turns out, techniques from quantum mechanics (and in particular linear optics) are extremely relevant. We adapt an argument of Aaronson [2] to show that the permanent of such matrices (most notably, orthogonal matrices over the real numbers) is still #P-hard to compute. The result even extends to permanents of orthogonal matrices over finite fields with characteristic not equal to 2 or 3, which explains for the first time a strange dichotomy in the complexity of these matrices which was noticed in the 1980s.

1.1.3 Classification of Clifford operations

It is often useful to think of computation from the perspective of circuits. In such a setting, a function is computed by a network of smaller gates, and a function's complexity is determined by the type/quantity of the gates required. It is well-known that the set of all single-qubit gates combined with the CNOT gate is universal, that is, these gates can generate all unitaries on an arbitrary number of qubits. This set of gates is not special. A few random gates will almost always suffice for universality. What is not known are all possible ways in which a gate set can fail to be universal, and there are reasons to believe that a complete classification of all such possibilities is slightly beyond our current mathematical techniques. Thus, we first focus on classifying a restricted set of quantum gates, namely the Clifford gates.

The Clifford gates are an interesting set of gates because they exhibit both quantum and classical properties. On the one hand, the Clifford gates admit a succinct classical description and can be simulated efficiently on a classical computer. On the other hand, the states that arise from the application of Clifford gates are highly entangled and are even used as the basis for a large class of quantum error-correcting codes. Because of their intermediate nature, we view the Clifford gates as an ideal candidate for study.

We give a complete classification of Clifford gate sets in terms of the unitaries that they generate. Amongst other things, the classification will reveal all of the possible symmetries in the gates that are preserved by the circuit-building operations, as well as efficient algorithms for detecting them.

1.2 Related Work

It would be dishonest to suggest that one of the central messages of this thesis (that quantum complexity theory and classical complexity theory are intimately linked) has not been previously explored and even touted in the literature on quantum computing. The intention of this section is not to give a thorough review of all such literature, but to give a flavor for the types of connections that will also appear later in this thesis.

To start, it has long been known that the quantum complexity class BQP, which contains those languages which are computed by polynomial-sized quantum circuits, is contained in the classical complexity class PP (see Section 2.3 for precise definitions) by using the so-called Feynman path integral. In fact, if you give the quantum computer an additional resource, namely, the ability to condition on low-probability measurement outcomes, then these two classes become equal, i.e., PostBQP = PP [1]. So, anything we could say about the classical complexity class PP, we could also say about the quantum complexity class PostBQP and vice versa. This is not simply a quaint-but-useless observation. Since PostBQP is easily shown to be closed under intersection, this equality provides the cleanest known proof that PP itself is closed under intersection, which took some 14 years to prove after it was initially proposed [19].

Quantum proofs for classical theorems such as the one above are not anomalous. For example, every quantum query algorithm computing some function gives rise to a polynomial which also approximates the function.³ Using fewer queries yields a lower-degree polynomial. Equipped with this intuition, de Wolf was able to improve on the best known classical bound for the $\varepsilon$-approximate degree of non-constant symmetric Boolean functions [36].

Even straightforward connections between quantum and classical objects are useful. Consider the problem of trying to store some subset of a list in a table that is as small as possible such that you can answer queries of the form "is item x in the subset" in as few queries to the table as possible. Of course, if you prove a lower bound for the number of quantum queries needed for a table of some size, then you have also proved a lower bound on the number of classical queries needed. In fact, by considering this quantum setting, Radhakrishnan, Sen, and Venkatesh were able to do exactly that and improve the best known classical lower bounds using novel linear-algebraic techniques [69].

Sometimes, even when the connection is not as direct, ideas and techniques from quantum computing are useful in obtaining classical results. One notable example is the proof that there is no polynomial-size linear program whose polytope projects onto the traveling salesman polytope, which used ideas from quantum communication complexity [44]. We do not attempt to give an exhaustive list of quantum proofs for classical theorems, and we direct the interested reader to the excellent survey of Drucker and de Wolf for a more thorough treatment of this topic [38].

³ We say that a polynomial $p(x)$ $\varepsilon$-approximates a Boolean function $f\colon \{0,1\}^n \to \{0,1\}$ if $|p(x) - f(x)| < \varepsilon$ for all $x \in \{0,1\}^n$.

The above examples showcase instances where the language of quantum computing is helpful in understanding classical computing, but of course the reverse is also true. Concepts from classical computing permeate the vocabulary of quantum computing, and there are a myriad of interesting results (from the quantum PCP theorem [65] to [21, 40]) which arose by "adding quantum" to an existing classical result.

Finally, we note that there are also many uses of classical techniques in quantum proofs. As one such example, consider the quantum query complexity of a function, which has emerged as one of the most useful tools for rigorously proving quantum speedups. One might hope that we could use this approach to show that a quantum computer could be exponentially faster than a classical computer on natural Boolean functions that are defined for all inputs. Unfortunately, Beals et al. show that the quantum query complexity of these total Boolean functions is polynomially related to their classical query complexity by using an intermediate complexity measure called block sensitivity [18]. This intermediate complexity measure was actually first developed by Nisan to understand a classical parallel model of computation [67].

Chapter 2

Background

This chapter will be devoted to covering some of the more standard concepts that appear in the remainder of the thesis. We will start with a rather gentle introduction to quantum computing. We then define query complexity and some related complexity measures, which are needed only in Chapter 3 but which we feel are general and important enough to include here. Finally, we include a review of some of the standard complexity classes. In particular, the counting classes will be used extensively in Chapter 4.

2.1 Quantum Computers

Quantum computing is often described as harnessing the weirdness of physics at the atomic level. It may then be surprising that to study quantum computing, no knowledge of Schrödinger's equation, wave-particle duality, Bose-Einstein condensates, etc. is needed.¹ Instead, one needs only to understand some basic linear algebra (which we will assume familiarity with) and a few rules concerning quantum states, operations, and measurements. This section serves to introduce these concepts. For a much more thorough introduction to quantum computing, we refer the reader to the excellent textbook of Nielsen and Chuang [66].

¹ Though of course, if you wanted to build a physical quantum computer, such concepts would become extremely important, and I would recommend reading a different thesis.

2.1.1 States

Let us first focus on the qubit, the quantum equivalent of the classical bit. At its core, a qubit is simply a unit vector of two complex numbers called amplitudes. One could choose to write such a state as a vector $(\alpha, \beta)^T \in \mathbb{C}^2$ such that $|\alpha|^2 + |\beta|^2 = 1$. However, the quantum computing community has adopted what is called bra-ket notation, which facilitates some common calculations. In this notation, a qubit state $|\psi\rangle$ (read "ket psi") is written as

$$|\psi\rangle = \alpha|0\rangle + \beta|1\rangle,$$

where $|0\rangle = (1, 0)^T$ and $|1\rangle = (0, 1)^T$ represent the standard basis column vectors. The state is said to be in the superposition of states $|0\rangle$ and $|1\rangle$. One might be tempted to say that $|\psi\rangle$ is in state $|0\rangle$ with probability $|\alpha|^2$ and state $|1\rangle$ with probability $|\beta|^2$, but this linguistic simplification can lead to trouble later and should be avoided.

Another bit of notational convenience is to write the conjugate transpose of $|\psi\rangle$ as $\langle\psi|$ (read "bra psi"). Thus, the standard inner product between states is written as $(\langle\psi|)(|\varphi\rangle)$ for all $|\psi\rangle$ and $|\varphi\rangle$, and in fact is often simplified even further to $\langle\psi|\varphi\rangle$. So, the fact that a qubit $|\psi\rangle$ is a unit vector can be expressed as

$$\langle\psi|\psi\rangle = \begin{pmatrix} \alpha^* & \beta^* \end{pmatrix} \begin{pmatrix} \alpha \\ \beta \end{pmatrix} = |\alpha|^2 + |\beta|^2 = 1.$$
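As a concrete check of the notation, the following minimal sketch represents kets as plain Python lists of complex amplitudes and computes the inner product by conjugating the first argument. The helper name `bra_ket` and the particular amplitudes $\alpha = 3/5$, $\beta = 4i/5$ are illustrative assumptions, not notation from this thesis.

```python
def bra_ket(psi, phi):
    """Inner product <psi|phi>: conjugate the entries of psi, then take the dot product."""
    return sum(a.conjugate() * b for a, b in zip(psi, phi))

# Standard basis kets |0> = (1, 0)^T and |1> = (0, 1)^T.
ket0 = [1 + 0j, 0 + 0j]
ket1 = [0 + 0j, 1 + 0j]

# An illustrative qubit alpha|0> + beta|1> with alpha = 3/5 and beta = 4i/5.
alpha, beta = 3 / 5, 4j / 5
psi = [alpha * a + beta * b for a, b in zip(ket0, ket1)]

norm_squared = bra_ket(psi, psi)   # |alpha|^2 + |beta|^2, which is 1 for a valid qubit
overlap_01 = bra_ket(ket0, ket1)   # distinct basis states are orthogonal
amplitude_0 = bra_ket(ket0, psi)   # <0|psi> recovers the amplitude alpha
```

Representing states as lists keeps the sketch dependency-free; a numerical library would be the natural choice for anything larger.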

In summary, the single-qubit states are those unit vectors in the Hilbert space $\mathbb{C}^2$. We extend to states with multiple qubits via the tensor product. For instance, the two-qubit state $|\psi_0\rangle \otimes |\psi_1\rangle$ is the tensor product of the two single-qubit states $|\psi_0\rangle = \alpha_0|0\rangle + \beta_0|1\rangle$ and $|\psi_1\rangle = \alpha_1|0\rangle + \beta_1|1\rangle$, and can be expanded as

$$|\psi_0\rangle \otimes |\psi_1\rangle = (\alpha_0|0\rangle + \beta_0|1\rangle) \otimes (\alpha_1|0\rangle + \beta_1|1\rangle) = \alpha_0\alpha_1|00\rangle + \alpha_0\beta_1|01\rangle + \beta_0\alpha_1|10\rangle + \beta_0\beta_1|11\rangle,$$

where we write $|xy\rangle = |x\rangle \otimes |y\rangle$ for $x, y \in \{0, 1\}$. Formally, the $n$-qubit states are those unit vectors in the Hilbert space $(\mathbb{C}^2)^{\otimes n}$. An $n$-qubit state $|\psi\rangle$ is represented by a unit vector of $2^n$ complex amplitudes, written in bra-ket notation as

$$|\psi\rangle = \sum_{x \in \{0,1\}^n} \alpha_x |x\rangle,$$

where $\langle y|x\rangle = 0$ for $x \neq y \in \{0, 1\}^n$.

It is worth noting here that multi-qubit states need not be expressible as tensor products of single-qubit states. In fact, states which cannot be expressed in this way are given a special name: they are called entangled. One particularly famous entangled state is the so-called Bell state or EPR pair:

$$\frac{|00\rangle + |11\rangle}{\sqrt{2}}.$$

Such states will be useful in the quantum gate classification in Chapter 5.
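The tensor-product expansion and the notion of entanglement can be checked numerically on two qubits. The sketch below is illustrative only: the helper names are assumptions, and the product-state test relies on the standard fact that a two-qubit state with amplitudes $(a_{00}, a_{01}, a_{10}, a_{11})$ factors as a tensor product exactly when $a_{00}a_{11} - a_{01}a_{10} = 0$.

```python
import math

def tensor(u, v):
    """Kronecker product of two state vectors: the amplitude of |xy> is u[x] * v[y]."""
    return [a * b for a in u for b in v]

def is_product_state(state, tol=1e-12):
    """Two-qubit test: the state factors as a tensor product of single-qubit
    states exactly when a00 * a11 - a01 * a10 vanishes."""
    a00, a01, a10, a11 = state
    return abs(a00 * a11 - a01 * a10) < tol

s = 1 / math.sqrt(2)
psi0 = [3 / 5, 4 / 5]            # alpha_0|0> + beta_0|1>
psi1 = [s, 1j * s]               # alpha_1|0> + beta_1|1>
product = tensor(psi0, psi1)     # amplitudes ordered |00>, |01>, |10>, |11>

bell = [s, 0, 0, s]              # (|00> + |11>) / sqrt(2)

product_is_separable = is_product_state(product)
bell_is_separable = is_product_state(bell)
```

The determinant test is specific to two qubits; deciding whether a general $n$-qubit state is a product state requires checking each bipartition in a similar way.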

2.1.2 Operations

The quantum operations on states are simply those linear transformations which preserve the norm. Such operations are called unitary. On $n$ qubits, the unitary operations are represented by $2^n \times 2^n$ complex matrices forming the group $U(2^n, \mathbb{C})$. Equivalently, one can check that $U \in U(2^n, \mathbb{C})$ iff $UU^\dagger = U^\dagger U = I$. As an example, consider the Hadamard matrix, which appears all throughout quantum computing:

$$H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}.$$

We have that $H|0\rangle = \frac{|0\rangle + |1\rangle}{\sqrt{2}} =: |+\rangle$ and $H|1\rangle = \frac{|0\rangle - |1\rangle}{\sqrt{2}} =: |-\rangle$. Since $H$ is an involution, we have $H|+\rangle = |0\rangle$ and $H|-\rangle = |1\rangle$. Now it is clear why we cannot think of $|+\rangle$ as "50% in the state $|0\rangle$ and 50% in the state $|1\rangle$." If this were the case, then no linear operation would be able to map $|+\rangle$ to $|0\rangle$. However, this is exactly what the Hadamard operation does.

The Hadamard operation is an example of what is known as a gate: a simple quantum operation from which larger, more complicated operations are constructed. A quantum circuit is an explicit method to compose gates to generate some unitary. For example, it is known that any multi-qubit quantum operation can be generated by a sufficiently long circuit of single- and two-qubit gates.
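The claims about the Hadamard matrix are easy to verify directly. The following is a minimal sketch using nested lists for matrices; the helper names `apply`, `matmul`, and `dagger` are assumptions introduced for illustration.

```python
import math

S = 1 / math.sqrt(2)
H = [[S, S],
     [S, -S]]

def apply(gate, state):
    """Matrix-vector product: apply a gate to a state vector."""
    return [sum(g * a for g, a in zip(row, state)) for row in gate]

def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def dagger(a):
    """Conjugate transpose of a square matrix."""
    n = len(a)
    return [[complex(a[j][i]).conjugate() for j in range(n)] for i in range(n)]

ket0, ket1 = [1, 0], [0, 1]
plus = apply(H, ket0)               # (|0> + |1>) / sqrt(2)
minus = apply(H, ket1)              # (|0> - |1>) / sqrt(2)
back = apply(H, plus)               # H is an involution, so this is |0> up to rounding
hh_dagger = matmul(H, dagger(H))    # the identity up to rounding, so H is unitary
```

Note that `back` returns to $|0\rangle$ only because the two contributions to the $|1\rangle$ amplitude cancel, which is the interference that a "50/50 mixture" reading of $|+\rangle$ cannot account for.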

2.1.3 Measurement

Given that an $n$-qubit state is represented by a vector of $2^n$ amplitudes, one might think that you could use it to store $2^n$ bits of information. Unfortunately, this intuition turns out to be false due to measurement, the process by which information about an unknown quantum state is gathered. More precisely, given a state $|\psi\rangle = \sum_{x \in \{0,1\}^n} \alpha_x |x\rangle$, measurement is a random process which produces some $x \in \{0, 1\}^n$ with probability equal to $|\alpha_x|^2$. Since the vector representing the state has unit norm, this operation (referred to as the Born rule) is well-defined.

The key property of quantum measurement is that it is destructive. After measuring the state $|\psi\rangle$ to receive $x$, the state collapses to $|x\rangle$. So, if we were to measure the state again, then the measurement would produce $x$ with probability 1. Using $|+\rangle$ as our running example, measurement would result in 0 with probability 1/2 and 1 with probability 1/2, collapsing the state to whichever outcome was measured.
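The Born rule and collapse can be simulated classically for small states. The sketch below is a hedged illustration: the function `measure` and its return convention are assumptions, and the pseudorandom generator merely stands in for the randomness of a physical measurement.

```python
import random

def measure(state, rng=random):
    """Sample an outcome x with probability |a_x|^2 (Born rule) and return
    (x, collapsed_state), where the collapsed state is the basis vector |x>."""
    probabilities = [abs(a) ** 2 for a in state]
    x = rng.choices(range(len(state)), weights=probabilities)[0]
    collapsed = [1.0 if i == x else 0.0 for i in range(len(state))]
    return x, collapsed

plus = [2 ** -0.5, 2 ** -0.5]        # |+> = (|0> + |1>) / sqrt(2)
rng = random.Random(0)               # fixed seed so the run is repeatable

# Measuring many fresh copies of |+> gives 0 and 1 about equally often.
outcomes = [measure(plus, rng)[0] for _ in range(10_000)]
frequency_of_one = sum(outcomes) / len(outcomes)

# Measuring the same copy twice repeats the first outcome.
first, collapsed = measure(plus, rng)
second, _ = measure(collapsed, rng)
```

Outcomes are returned as integers indexing the basis states, so for an $n$-qubit state the integer's binary expansion gives the measured string $x$.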

2.2 Query Complexity

This section serves as a brief overview of query complexity, a model of computation where algorithms are charged based on the number of input bits they reveal (the input is initially hidden) rather than the actual computation being done. This may at first appear to be a rather strange measure of complexity since reading the input is often much more efficient than computing whichever function of the input is needed. As it turns out, many problems which have efficient quantum query algorithms also are computed by quantum circuits of reasonable size and depth. From a theoretical standpoint, it is also much easier to prove rigorous lower bounds in the quantum query model than in the circuit model. Let us now formalize the model. To model that the input is hidden, all query algorithms must access their inputs via an indexing oracle-a function which takes some index and outputs the value of the corresponding input bit. We use the standard

notion of oracles in the quantum setting. That is, for oracle function $O: \{0,1\}^n \to \{0,1\}$, the algorithm can apply the $(n+1)$-qubit transformation which flips the last qubit if $O$ applied to the first $n$ qubits evaluates to 1. The quantum query complexity of function $f: \Sigma^* \to \{0,1\}$ is a function $Q(f): \mathbb{N} \to \mathbb{N}$ such that $Q(f)(n)$ is the minimum number of oracle calls for a quantum circuit to decide (with bounded error) the value of $f$ for input strings of length $n$.

One can similarly define deterministic query complexity ($D$), bounded-error randomized query complexity ($R$), and zero-error randomized query complexity ($R_0$) by counting the number of input symbols accessed in these models. Closely related to quantum query complexity is a notion of approximation by polynomials called approximate degree, denoted $\widetilde{\deg}(f)$. The approximate degree of a function $f: [k]^n \to \{0,1\}$ is the minimum degree of a polynomial $p(x_1, \ldots, x_n)$ such that

$$|p(x_1, \ldots, x_n) - f(x_1, \ldots, x_n)| \le \frac{1}{3}$$

for all $x_1, \ldots, x_n \in [k]$.

Let us also define several query complexity measures which are useful tools in proving lower bounds in the more standard models of computation above. Fix a function $f: \Sigma^* \to \{0,1\}$. Let $x \in \Sigma^n$ be some input.² We say that some input symbol $x_i$ is sensitive if changing only $x_i$ changes the value of the function on that input. The sensitivity of $x$ is equal to its total number of sensitive symbols. The sensitivity of $f$, denoted $s(f)$, is the maximum sensitivity over all inputs $x$. Similarly, the block sensitivity at an input is the maximum number of disjoint blocks (i.e., subsets of the input bits) such that changing one entire block changes the value of the function. The block sensitivity of $f$, denoted $bs(f)$, is the maximum block sensitivity over all inputs $x$. A certificate is a partial assignment of the input symbols such that $f$ evaluates to the same value on all inputs consistent with the certificate. The certificate complexity of an input is the minimum certificate size (i.e., the number of bits assigned in the partial assignment). The certificate complexity of $f$, denoted $C(f)$, is the maximum certificate complexity over all inputs.

² It is typical that $\Sigma = \{0,1\}$, and most of the query complexity measures in this section are usually defined with that alphabet in mind. In Section 3.2.2 we will address this change to non-binary alphabets directly.

Finally, when clear from context, we will often let a language denote its characteristic function when used as an argument in the various complexity measures. For example, for language $L \subseteq \Sigma^*$, we will write $Q(L)$ as the quantum query complexity of the function $f_L: \Sigma^* \to \{0,1\}$ where $f_L(x) = 1$ iff $x \in L$.
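For small Boolean functions, the sensitivity, block sensitivity, and certificate complexity defined in this section can be computed by exhaustive search. The following is a minimal sketch under that assumption (binary alphabet, fixed input length `n`, exponential running time); all function names are hypothetical and chosen for illustration only.

```python
from itertools import combinations, product

def flip(x, mask):
    """Flip the bits of x selected by the bitmask."""
    return tuple(bit ^ ((mask >> i) & 1) for i, bit in enumerate(x))

def sensitivity(f, n):
    # Maximum over inputs of the number of single bits whose flip changes f.
    return max(
        sum(f(flip(x, 1 << i)) != f(x) for i in range(n))
        for x in product((0, 1), repeat=n)
    )

def max_disjoint(blocks, start=0, used=0):
    """Largest number of pairwise disjoint bitmasks in blocks (brute force)."""
    best = 0
    for j in range(start, len(blocks)):
        if blocks[j] & used == 0:
            best = max(best, 1 + max_disjoint(blocks, j + 1, used | blocks[j]))
    return best

def block_sensitivity(f, n):
    best = 0
    for x in product((0, 1), repeat=n):
        # Every nonempty block whose flip changes the function value at x.
        blocks = [m for m in range(1, 1 << n) if f(flip(x, m)) != f(x)]
        best = max(best, max_disjoint(blocks))
    return best

def certificate_complexity(f, n):
    inputs = list(product((0, 1), repeat=n))

    def is_certificate(x, positions):
        # Fixing x on these positions must force the value of f.
        return all(f(y) == f(x) for y in inputs
                   if all(y[i] == x[i] for i in positions))

    return max(
        min(size for size in range(n + 1)
            if any(is_certificate(x, s) for s in combinations(range(n), size)))
        for x in inputs
    )

def or3(x):
    return int(any(x))

def maj3(x):
    return int(sum(x) >= 2)
```

On three bits, OR has all three measures equal to 3 (witnessed by the all-zeros input), while majority has all three equal to 2, consistent with $s(f) \le bs(f) \le C(f)$.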

2.2.1 Relationships

There are many relationships between the different complexity measures that will be useful in Chapter 3. For example, the proposition below follows from the fact that some models of computation can easily simulate others.

Proposition 1 ([18]). For all $f: \{0,1\}^* \to \{0,1\}$,

1. $R(f) \le R_0(f) \le D(f)$. 2. $\widetilde{\deg}(f) \le Q(f) \le$

In Section 3.6, we prove a dichotomy theorem for the block sensitivity of a regular language: it is either $O(1)$ or $\Omega(n)$. This is particularly useful since nearly all complexity measures are polynomially related to block sensitivity:

Theorem 2 ([27]). For all $f: \{0,1\}^* \to \{0,1\}$, we have the following relationships for block sensitivity:

Lower bounds: $C(f) \ge bs(f)$, $\widetilde{\deg}(f) = \Omega(\sqrt{bs(f)})$, $R(f) = \Omega(bs(f))$.

Upper bounds: $s(f) \le bs(f)$, $C(f) \le bs(f)^2$, $D(f) \le bs(f)^3$.

Notice that for nearly all complexity measures $M$, we have $bs(f)^a \le M(f) \le bs(f)^b$ for some constants $a, b > 0$. This connection was strengthened recently when Huang showed that the block sensitivity is polynomially related to sensitivity as well.³

Theorem 3 (Huang [56]). For all $f: \{0,1\}^* \to \{0,1\}$, $bs(f) \le 2 s(f)^4$.

Corollary 4. If any query complexity measure in $\{s, bs, C, D, R_0, R, Q, \widetilde{\deg}\}$ is $O(1)$, then all of them are $O(1)$.

³ It is worth pointing out that for the purposes of this thesis, the relation $bs(f) = O(s(f) 4^{s(f)})$ would suffice [78].

2.3 Complexity Classes

This section is devoted to introducing the various complexity classes we use in this thesis. Section 2.3.1 contains many of the standard decision classes, while Section 2.3.2 contains some of the more obscure counting classes which are referenced in Chapter 4.

2.3.1 Decision classes

All the classes in this section are so-called decision classes. Each class is a set of languages $L \subseteq \Sigma^*$, which themselves are subsets of words from the alphabet $\Sigma$. The term "decision" comes from the fact that a model of computation defining some class must give a yes-no answer for each input. The languages in the class are simply those sets associated with "yes" answers. Let's start with the complexity class which is used most often to describe efficient classical computation.

Definition 5. P is the class of languages recognized by deterministic Turing machines in polynomial time.

When adding randomness to the Turing machine, we think of each coin toss as splitting the computation into two nondeterministic paths: one for heads, one for tails. The error model determines how many of these paths must eventually end in a "yes" answer for the input to be considered a "yes" input. The most common type of error is a bounded two-sided error captured by the class below:

Definition 6. BPP is the class of languages recognized by a polynomial-time non-deterministic Turing machine such that

• If the answer is "yes," then at least 2/3 of all paths accept.

• If the answer is "no," then at most 1/3 of all paths accept.

One of the ways to make the above class more powerful is to shrink the gap between the "yes" and "no" paths so that there is only an exponentially small difference between the two.

Definition 7. PP is the class of languages recognized by a polynomial-time non-deterministic Turing machine such that

• If the answer is "yes," then at least 1/2 of all paths accept.

• If the answer is "no," then less than 1/2 of all paths accept.

It is widely believed that this class no longer captures efficient computation because it is unclear how one could detect such a small gap without flipping exponentially many coins. On the other hand, it is one of the smallest classical classes we know which contains efficient quantum computation; i.e., BQP $\subseteq$ PP.

Definition 8. BQP is the class of languages recognized by polynomial-size quantum circuits such that

• If the answer is "yes," then measurement on the first input results in a 1 with probability at least 2/3.

• If the answer is "no," then measurement on the first input results in a 1 with probability at most 1/3.

2.3.2 Function classes

In Chapter 4, we will be concerned with the complexity of calculating the matrix permanent. Since the permanent is a function, its complexity is best captured by function classes. Hence, we will sometimes need the class FP to stand in for P when we are talking about function problems.

Definition 9. FP is the class of functions computable by deterministic Turing machines in polynomial time.

Of course, computing the permanent is, in general, thought to be intractable (i.e., not in FP). We use a variety of different classes to capture the difficulty of computing the permanent (depending on the kind of matrix, underlying field, etc.), but the most important class is #P:

Definition 10. #P is the class of function problems of the form "compute the number of accepting paths of a polynomial-time non-deterministic Turing machine." For example, given a classical circuit of NAND gates as input, the problem of computing the number of satisfying assignments is in #P (and indeed, is #P-complete).

Since #P is a class of function problems (more specifically, counting problems), we often consider $P^{\#P}$ to compare #P to decision classes. Observe that $P^{\#P} = P^{PP}$ since, on the one hand, the #P oracle can count paths to simulate PP, and on the other hand, we can use the PP oracle to binary search (on the number of accepting paths) to count exactly. We add that $P^{\#P} \subseteq$ PSPACE is an upper bound for #P, and Toda's theorem [84] gives PH $\subseteq P^{\#P}$. Fenner, Fortnow, and Kurtz [43] define a very closely related class, GapP, which is also relevant to us.
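The binary search mentioned above can be sketched as follows. The sketch assumes a black-box threshold oracle `at_least(k)` that answers whether the number of accepting paths is at least `k`; how a PP oracle is padded to answer such threshold questions is not modeled here, and all names are hypothetical.

```python
def count_with_threshold_oracle(at_least, total_paths):
    """Recover the exact number of accepting paths from threshold queries.

    at_least(k) is assumed to return True iff the count is >= k.
    Returns the count together with the number of oracle queries used.
    """
    low, high = 0, total_paths  # invariant: low <= count <= high
    queries = 0
    while low < high:
        mid = (low + high + 1) // 2
        queries += 1
        if at_least(mid):
            low = mid
        else:
            high = mid - 1
    return low, queries

# Usage: a machine with 64 paths, 37 of which accept (hidden from the search).
hidden_count = 37
count, queries = count_with_threshold_oracle(lambda k: hidden_count >= k, 64)
```

The number of queries is logarithmic in the number of paths, so it is polynomial in the input length even though the number of paths is exponential.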

Definition 11. GapP is the class of function problems of the form "compute the number of accepting paths minus the number of rejecting paths of a polynomial-time non-deterministic Turing machine."

We have GapP $\supseteq$ #P since we can take a #P problem (manifest as a non-deterministic Turing machine) and at the end of each rejecting path, add a non-deterministic branch which accepts in one half and rejects in the other. In the other direction, any GapP problem can be solved with at most two calls to a #P oracle (one for accepting paths, one for rejecting), and a subtraction. Hence, for most of our results we neglect the difference.

Nonetheless, GapP and #P are different. For one, functions in #P are non-negative (and integral) by definition, whereas functions in GapP can take negative values. The distinction is also important in the context of approximation; Stockmeyer's approximate counting gives a multiplicative approximation to any #P problem in $BPP^{NP}$, whereas it is known that multiplicative approximation to a GapP-hard problem remains GapP-hard under Turing reductions (see Theorem 80).

One cannot even get very bad multiplicative approximations to GapP-hard problems. Even the worst multiplicative approximation will distinguish zero from non-zero outputs, and this problem is captured by the class $C_{=}P$, defined below.

Definition 12. $C_{=}P$ is the class of decision problems solvable by a non-deterministic polynomial-time machine which accepts if it has the same number of accepting paths as rejecting paths.

A good upper bound for $C_{=}P$ is simply PP. This is easily seen once we have the following theorem.

Theorem 13. Suppose $f_1, f_2: \Sigma^* \to \mathbb{Z}$ are functions computable in GapP. Then $f_1 + f_2$, $-f_1$, and $f_1 f_2$ are computable in GapP.

Proof. Let $M_1$ and $M_2$ be non-deterministic machines witnessing $f_1 \in$ GapP and $f_2 \in$ GapP respectively. Then the machines for $f_1 + f_2$, $-f_1$, and $f_1 f_2$ are defined as follows.

1. For $f_1 + f_2$, non-deterministically branch at the start, then run $M_1$ in one branch and $M_2$ in the other.

2. For $-f_1$, take the complement of $M_1$. That is, make every accepting path reject, and make every rejecting path accept.

3. For $f_1 f_2$, run $M_1$ to completion, then run $M_2$ to completion (in every branch of $M_1$). Accept if the two machines produce the same outcome, otherwise reject.

The last construction may require some explanation. Let $a_1, a_2$ be the number of accepting paths of $M_1$ and $M_2$ respectively, and similarly let $b_1, b_2$ be the numbers of rejecting paths. Then there are $a_1 a_2 + b_1 b_2$ accepting paths for the new machine and $a_1 b_2 + a_2 b_1$ rejecting paths, so as a GapP machine it computes

$$a_1 a_2 - a_1 b_2 - a_2 b_1 + b_1 b_2 = (a_1 - b_1)(a_2 - b_2) = f_1(x) f_2(x).$$
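The three constructions in the proof, together with the embedding of #P into GapP described earlier, can be checked on toy examples. In this sketch a machine on a fixed input is modeled simply as a list of booleans, one per computation path (True for accept); this representation and the function names are assumptions for illustration, not part of the formal definitions.

```python
def gap(paths):
    """Accepting paths minus rejecting paths."""
    return sum(1 if accepted else -1 for accepted in paths)

def add(paths1, paths2):
    # Branch at the start, then run one machine in each branch.
    return paths1 + paths2

def negate(paths):
    # Swap accepting and rejecting paths.
    return [not accepted for accepted in paths]

def multiply(paths1, paths2):
    # Run both machines; accept iff they produce the same outcome.
    return [a == b for a in paths1 for b in paths2]

def embed_count(paths):
    # Replace each rejecting path by one accepting and one rejecting path,
    # so the gap of the result equals the number of accepting paths.
    result = []
    for accepted in paths:
        result.extend([True] if accepted else [True, False])
    return result

m1 = [True, True, False]          # gap 1, two accepting paths
m2 = [False, False, False, True]  # gap -2
```

Here `gap(multiply(m1, m2))` equals `gap(m1) * gap(m2)`, matching the identity $(a_1 - b_1)(a_2 - b_2)$ derived above.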

Theorem 13 implies that $C_{=}P \subseteq$ PP because we can square and negate the gap. In other words, we can find a machine such that the gap is always negative (i.e., strictly less than half of all paths accept) unless the original machine had gap zero, in which case the gap is still zero (or, WLOG, very slightly positive). It is also worth noting that co$C_{=}P$ is known to equal NQP, by a result of Fenner et al. [42].

Definition 14. The class NQP contains decision problems solvable by a polynomial-time quantum Turing machine (or, equivalently, a uniform, polynomial-size family of quantum circuits) where we accept if there is any nonzero amplitude on the accept state at the end of the computation.

Quantum classes with exact conditions on the amplitudes (e.g., NQP or EQP) tend to be very sensitive to the gate set or transition amplitudes in the quantum Turing machine. Adleman, DeMarrais, and Huang [8] are careful to define NQP for the case where the transition amplitudes are algebraic and real.

Finally, we specify computational hardness for our finite field problems using a mod $k$ decision version of #P.

Definition 15. For any integer $k \ge 2$ let $Mod_kP$ be the class of decision problems solvable by a polynomial time non-deterministic machine which rejects if the number of accepting paths is divisible by $k$, and accepts otherwise. In the special case $k = 2$, $Mod_kP$ is also known as "parity P," and denoted $\oplus P$.

Clearly $P^{\#P}$ is an upper bound for $Mod_kP$. We are finally ready to state the main hardness result for these counting classes, namely, the celebrated theorem of Toda [84] and a subsequent generalization by Toda and Ogiwara [85]. There are many important consequences of Toda's work, but we only require the following formulation.

Theorem 16 (Toda's Theorem [84, 85]). Let $A$ be one of the counting classes $Mod_kP$, $C_{=}P$, #P, PP, or GapP. Then PH $\subseteq BPP^A$.

This means in particular that, if a problem is hard for any of these classes, then there is no efficient algorithm for the problem unless PH collapses.

Chapter 3

A Quantum Query Complexity Trichotomy for Regular Languages

Regular languages have a long history of study in classical theoretical computer science, going back to Kleene in the 1950s [59]. The definition is extremely robust: there are many equivalent characterizations ranging from machine models (e.g., deterministic or non-deterministic finite automata, $o(\log \log n)$-space Turing machines [81]), to grammars (e.g., regular expressions, prefix grammars), to algebraic structures (e.g., recognition via monoids, the syntactic congruence, or rational series). Regular languages are closed under most natural operations (e.g., union, complement), and also most natural questions are decidable (e.g., is the language infinite?). Perhaps for this reason, regular languages are also a useful pedagogical tool, serving as a toy model for theory of computation students to cut their teeth on.

We liken regular languages to the symmetric¹ Boolean functions. That is, both are a restricted, (usually) tractable special case of a much more general object, and often the common thread between a number of interesting examples. We suggest that these special cases should be studied and thoroughly understood first, to test proof techniques, to make conjectures, and to gain familiarity with the setting.

In this work, we hope to understand the regular languages from the lens of another great innovation of theoretical computer science: query complexity, particularly quantum query complexity.² Not only is query complexity one of the few models in which provable lower bounds are possible, but it is also often the case that efficient algorithms actually achieve the query lower bound. In this case, the query lower bound suggests an algorithm which was otherwise thought not to exist, as was famously the case for Grover's search algorithm.

¹ A symmetric Boolean function $f: \{0,1\}^n \to \{0,1\}$ is such that the value of $f$ only depends on the Hamming weight of the input.

In the case of query complexity, symmetric functions are extremely well-understood with complete characterizations known for deterministic, randomized, and quantum algorithms in both the zero-error and bounded-error settings [18]. However, to the authors' knowledge, regular languages have not been studied in the query complexity model despite the fact that they appear frequently in query-theoretic applications.

For example, consider the OR language, which is recognized by the regular expression (0|1)*1(0|1)*. Similarly, the parity function is just membership in the regular language (0*10*1)*0*. It is well known that the quantum query complexity of OR is $\Theta(\sqrt{n})$, whereas parity is known to require $\Theta(n)$ quantum queries. Yet, there is a two-state deterministic finite automaton for each language. This raises the question: what is the difference between these two languages that causes the dramatic discrepancy between their quantum query complexities? More generally, can we decide the quantum query complexity of a regular language given a description of the machine recognizing it? Are all quantum query complexities even possible? We answer all of these questions in this chapter.
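As a quick sanity check, the two regular expressions above can be compared against direct definitions of OR and parity using Python's `re` module. This is a classical illustration only; it says nothing about query complexity, and the helper names are hypothetical.

```python
import re

OR_REGEX = re.compile(r"(0|1)*1(0|1)*")
PARITY_REGEX = re.compile(r"(0*10*1)*0*")

def in_or_language(word):
    # Full match: the whole word must be described by the expression.
    return OR_REGEX.fullmatch(word) is not None

def in_parity_language(word):
    return PARITY_REGEX.fullmatch(word) is not None
```

The first expression accepts exactly the binary strings containing a 1, and the second accepts exactly those with an even number of 1s.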

The main contribution of this work is the complete characterization of the quantum query complexity of regular languages (up to some technical details), manifest as the following trichotomy: every regular language has quantum query complexity $\Theta(1)$, $\tilde{\Theta}(\sqrt{n})$, or $\Theta(n)$. In the process, we get an identical trichotomy for approximate degree, and dichotomies (in this case, $\Theta(1)$ or $\Theta(n)$) for a host of other complexity measures including deterministic complexity, randomized query complexity, sensitivity, block sensitivity, and certificate complexity.

² A generic introduction to query complexity is given in Section 2.2. We discuss quantum query complexity with non-binary alphabets in Section 3.2.2.

Many of the canonical examples of regular languages fall easily into one of the three categories via well-studied algorithms or lower bounds. For example, the upper bound for the OR function results from Grover's famous search algorithm, and the lower bounds for OR and parity functions are straightforward applications of either the polynomial method [18] or adversary method [11]. Nevertheless, it turns out that there exists a vast class of regular languages which have neither a trivial $\Omega(n)$ lower bound nor an obvious $o(n)$ upper bound resulting from a straightforward application of Grover's algorithm. A central challenge of the trichotomy theorem for quantum query complexity was showing that these languages do actually admit a quadratic quantum speedup.

One such example is the language $\Sigma^*(20^*2)\Sigma^*$, where $\Sigma = \{0, 1, 2\}$. Although there is no finite witness for the language (e.g., to find by Grover search), we show that it nevertheless has an $\tilde{O}(\sqrt{n})$ quantum algorithm. More generally, this language belongs to a subfamily of regular languages known as star-free languages because they have regular expressions which avoid Kleene star (albeit with the addition of the complement operation).³ Like regular languages, the star-free languages have many equivalent characterizations: counter-free automata [64], predicates expressible in either linear temporal logic or first-order logic [58, 64], the preimages of finite aperiodic monoids [74], or cascades of reset automata [62]. The star-free languages are those regular languages which can be decided in $\tilde{O}(\sqrt{n})$ queries. As a result, reducing a problem to any one of the myriad equivalent representations of these languages yields a quadratic quantum speedup for that problem.

Let us take McNaughton's characterization of star-free languages in first-order logic as one example [64]. That is, every star-free language can be expressed as a sentence in first-order logic over the natural numbers with the less-than relation and predicates $\pi_a$ for $a \in \Sigma$, such that $\pi_a(i)$ is true if input symbol $x_i$ is $a$. We can easily express the OR function as $\exists i \; \pi_1(i)$, or the more complicated language $\Sigma^*(20^*2)\Sigma^*$ as

$$\exists i \; \exists k \; \forall j \quad i < k \wedge \pi_2(i) \wedge \pi_2(k) \wedge (i < j < k \implies \pi_0(j)).$$

³ For example, the star-free expression for $\Sigma^*(20^*2)\Sigma^*$ is $\overline{\emptyset} \, 2 \, \overline{\overline{\emptyset} \{1, 2\} \overline{\emptyset}} \, 2 \, \overline{\emptyset}$.

Our result gives an algorithm for this sentence and arbitrarily complex sentences like it. We see this as a far-reaching generalization of Grover's algorithm, which extends the Grover speedup to a much wider range of string processing problems than was previously known.⁴
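To make the first-order characterization concrete, the sentence for $\Sigma^*(20^*2)\Sigma^*$ can be evaluated by brute force over all index pairs and compared against an ordinary regular expression search. This is a classical sketch with hypothetical function names, reading every input symbol; it is not the quantum algorithm of this chapter.

```python
import re

def satisfies_sentence(word):
    """Evaluate: exists i, k with i < k, x_i = x_k = 2, and x_j = 0 for i < j < k."""
    n = len(word)
    return any(
        i < k
        and word[i] == "2"
        and word[k] == "2"
        and all(word[j] == "0" for j in range(i + 1, k))
        for i in range(n)
        for k in range(n)
    )

def matches_regex(word):
    # Membership in Sigma* (2 0* 2) Sigma* is the same as containing 2 0* 2 as an infix.
    return re.search(r"20*2", word) is not None
```

For example, both functions accept "12002" and reject "2012".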

3.1 Results Overview

Our main result is the following:

Theorem 17 (informal). Every regular language has quantum query complexity $\Theta(1)$, $\tilde{\Theta}(\sqrt{n})$, or $\Theta(n)$. Moreover, the quantum time complexity of each language matches its query complexity.

The theorem and its proof have several consequences which we highlight below.

1. Algebraic characterization: We give a characterization of each class of regular languages in terms of the monoids that recognize them. That is, the monoid is either a rectangular band, aperiodic, or finite. In particular, given a description of the machine, grammar, etc. generating the language, we can decide its membership in one of the three classes by explicitly calculating its syntactic monoid and checking a small number of conditions. See Section 3.4.

2. Related complexity measures: Many of the lower bounds are derived from lower bounds on other query measures. To this end, we prove query dichotomies for deterministic complexity, randomized query complexity, sensitivity, block sensitivity, and certificate complexity: they are all either $\Theta(1)$ or $\Theta(n)$ for regular languages. By standard relationships between the measures, this shows that approximate degree and quantum query complexity are either $O(1)$ or $\Omega(\sqrt{n})$. See Section 3.7.

3. Generalization of Grover's algorithm: The BQP algorithm using $\tilde{O}(\sqrt{n})$ queries for star-free regular languages extends to a variety of other settings given that the star-free languages enjoy a myriad of equivalent characterizations. The characterization of star-free languages as first-order sentences over the natural numbers with the less-than relation shows that the algorithm for star-free languages is a broad generalization of Grover's algorithm. See Section 3.5 for the description and proof of the star-free algorithm and Section 3.3 for applications.

4. Star-free algorithm from faster unstructured search: The $\tilde{O}(\sqrt{n})$ algorithm for star-free languages results from many nested calls to Grover search, using the speedup due to multiple marked items. However, a careful analysis reveals that whenever this speedup is required, the marked items are consecutive. We show that these Grover search calls can then be replaced by any unstructured search algorithm. Therefore, any model of computation that has faster-than-brute-force unstructured search will have an associated speedup for star-free languages. Consider, for example, the model of quantum computation of Aaronson, Bouland, Fitzsimons, and Lee in which non-collapsing measurements are allowed [4]. It was shown that unstructured search in that model requires at most $\tilde{O}(n^{1/3})$ queries, and therefore, star-free languages can be solved in $\tilde{O}(n^{1/3})$ queries as well.

⁴ Readers familiar with descriptive complexity will recall that $AC^0$ has a similar, but somewhat more general characterization in first-order logic. It follows that all star-free languages, which have quantum query complexity $\tilde{O}(\sqrt{n})$, are in $AC^0$. Conversely, we will show that regular languages not in $AC^0$ have quantum query complexity $\Omega(n)$. Thus, another way to state the trichotomy is that, very roughly speaking, regular languages in $NC^0$ have complexity $O(1)$, regular languages in $AC^0$ but not $NC^0$ have complexity $\tilde{\Theta}(\sqrt{n})$, and everything else has complexity $\Omega(n)$.

⁵ There are two caveats: the quantum query complexity may oscillate between asymptotically different functions; the quantum query complexity may also be zero. For the formal statement of this theorem see Section 3.4.

Finally, we stress that this trichotomy is only possible due to the extreme uniformity in the structure of regular languages. In particular, the trichotomy does not extend to another basic model of computation, the context-free languages.

Theorem 18. For all limit computable⁶ $c \in [1/2, 1]$, there exists a context-free language $L$ such that $Q(L) = O(n^{c+\epsilon})$ and $Q(L) = \Omega(n^{c-\epsilon})$ for all $\epsilon > 0$. Furthermore, if an additive $\epsilon$-approximation to $c$ is computable in $2^{O(1/\epsilon)}$ time, then $Q(L) = \Theta(n^c)$. In particular, any algebraic $c \in [1/2, 1]$ has this property.

⁶ We say that a number $c \in \mathbb{R}$ is limit computable if there exists a Turing machine which on input $n$ outputs some rational number $T(n)$ such that $\lim_{n \to \infty} T(n) = c$.

In fact, the converse also holds.

Theorem 19. Let $L$ be a context-free language such that $\lim_{n \to \infty} \frac{\log Q(L)(n)}{\log n} = c$. Then, $c$ is limit computable.

3.1.1 Proof Techniques

Most of the lower bounds are derived from a dichotomy theorem for sensitivity: the sensitivity of a regular language is either $O(1)$ or $\Omega(n)$. In particular, we show that the language of sensitive bits for a regular language is itself regular. Therefore, by the pumping lemma for regular languages, we are able to boost any nonconstant number of sensitive bits to $\Omega(n)$ sensitive bits, from which the dichotomy follows.

The majority of the work required for the classification centers around the $\tilde{O}(\sqrt{n})$ quantum query algorithm for star-free languages. The proof is based on Schützenberger's characterization of star-free languages as those languages recognized by finite aperiodic monoids. Starting from an aperiodic monoid, Schützenberger constructs a star-free language recursively based on the "rank" of the monoid elements involved. Roughly speaking, this process culminates in a decomposition of any star-free language into star-free languages of smaller rank. Although this decomposition does not immediately give rise to an algorithm, the notion of rank proves to be a particularly useful algebraic invariant. Specifically, we use it to show that given an $\tilde{O}(\sqrt{n})$-query algorithm for membership in some star-free language $L$, we can construct an $\tilde{O}(\sqrt{n})$-query algorithm for $\Sigma^* L \Sigma^*$. This "infix" algorithm is the key subroutine for much of the general star-free algorithm.

3.1.2 Related Work

We are not the first to study regular languages in a query-complexity setting. One such example is work in property testing by Alon, Krivelevich, Newman, and Szegedy. They show that regular languages can be tested⁷ with $O(1/\epsilon)$ queries [10]. Interestingly, Alon et al. also show that there exist context-free grammars which do not admit constant query property testers [10]. In Section 3.8, we show that context-free languages can have query complexity outside the trichotomy.

A second example comes from work of Tesson and Thérien on the communication complexity of regular languages [83]. As with query complexity, several important functions in communication complexity happen to be regular, e.g., inner product, disjointness, greater-than, and index. They show that for several measures of communication complexity, the complexity is $\Theta(1)$, $\Theta(\log \log n)$, $\Theta(\log n)$, or $\Theta(n)$. Clearly, there are many parallels with this work, but surprisingly the classes of regular languages involved are different. Also, communication complexity is traditionally more difficult than query complexity, yet the authors appear to have skipped over query complexity; we assume this is because quantum query complexity is necessary to get an interesting result.

There are also striking parallels in work of Childs and Kothari, who conjecture a dichotomy for the quantum query complexity of minor-closed graph properties [32]. Minor-closed graph properties are not, to our knowledge, directly related to regular languages, but they are morally similar in that both are very uniform: (almost) every part of the input is treated the same by the property. Childs and Kothari show that such properties have query complexity $\Theta(n^{3/2})$, except for forbidden subgraph properties which are $o(n^{3/2})$ and $\Omega(n)$, and are conjectured to be $\Theta(n)$. Even some of the proof techniques are similar: the proof that forbidden subgraph properties are $\Omega(n)$ could be phrased in terms of block sensitivity, like our $\Omega(\sqrt{n})$ lower bound for non-trivial languages.

Finally, we are aware of one more result on the complexity of star-free languages prior to our work. It is possible to show that star-free languages have $o(n)$ quantum query complexity, just barely enough to separate them from non-star-free languages. This result is a combination of two existing results: Chandra, Fortune, and Lipton [31] show that star-free languages have (very slightly) super-linear size $AC^0$ circuits; Bun, Kothari, and Thaler show that linear size $AC^0$ circuits have (moderately) sublinear quantum query complexity [28]. This connection was pointed out to us by Robin Kothari.

⁷ We say a language $L$ is testable with constantly many queries if there exists a randomized algorithm such that given a word $w \in \Sigma^n$, the algorithm accepts $w$ if $w \in L$, and the algorithm rejects $w$ if at least $\epsilon n$ many positions of $w$ must be changed in order to create a word in $L$. The algorithm is given $O(1/\epsilon)$ many queries to $w$.

3.2 Background

This section introduces both regular languages and basic query complexity measures and their relationships. In particular, we will focus on algebraic definitions of regular languages as they serve as the basis for many of the results in this chapter. Readers familiar with query complexity can skip much of the introduction on that topic, but may still want to read Section 3.2.2 on extending the complexity measures to larger alphabets.

3.2.1 Regular languages

The regular languages are those languages that can be constructed from $\emptyset$, $\{\varepsilon\}$, and singletons $\{a\}$ for all $a \in \Sigma$ using the operations of concatenation (e.g., $AB$), union (e.g., $A \cup B$), and Kleene star ($A^*$).⁸ A regular expression for a regular language is an explicit expression for how to construct the language, traditionally writing | for alternation (instead of union), and omitting some brackets by writing $a$ for $\{a\}$ and $\varepsilon$ for $\{\varepsilon\}$. For example, over the alphabet $\Sigma = \{0, 1\}$, the OR function can be written as the regular expression $\Sigma^* 1 \Sigma^*$, and the language of all strings such that there are no two consecutive 1's is $(0|10)^*(\varepsilon|1)$.

The class of regular languages has extremely robust definitions and many equivalent characterizations. For instance, some machine-based definitions⁹ include those languages accepted by deterministic finite automata (DFA), or by non-deterministic finite automata (NFA), or even by alternating finite automata. Regular languages also arise by weakening Turing machines, for example by making the machine read-only or limiting the machine to $o(\log \log n)$ space.

⁸ Let $A$ be a set of strings. Define $A^* = \{a_1 \cdots a_k : k \ge 0, a_i \in A\}$, that is, the concatenation of zero or more strings in $A$. We will also use $A^+ = \{a_1 \cdots a_k : k \ge 1, a_i \in A\}$ to capture one or more strings.

⁹ We assume familiarity with the basic machine models for regular languages; see [79] for an introduction.

For our purposes, some of the most useful definitions of regular languages are algebraic in nature. In particular, regular languages arise as the preimage of a subset of a finite monoid under monoid homomorphism.¹⁰ First, we say that language $L \subseteq \Sigma^*$ is recognized by a monoid $M$ if there exists a monoid homomorphism $\varphi: \Sigma^* \to M$ (where $\Sigma^*$ is a monoid under concatenation) and a subset $S \subseteq M$ such that

$$L = \{w \in \Sigma^* : \varphi(w) \in S\} = \varphi^{-1}(S).$$

Theorem 20 (folklore). A language is recognized by a finite monoid iff it is regular.
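Recognition by a monoid can be illustrated on the two running examples. In the sketch below, a homomorphism is specified by its values on single symbols and extended to words by multiplying in the monoid; the helper `recognized` and the particular monoids chosen (integers mod 2 under addition for parity, $\{0, 1\}$ under maximum for OR) are illustrative assumptions.

```python
def recognized(word, phi, multiply, identity, accepting):
    """Apply the homomorphism induced by phi to word and test membership in S."""
    value = identity
    for symbol in word:
        value = multiply(value, phi[symbol])
    return value in accepting

def in_even_parity(word):
    # Monoid (Z_2, +) with S = {0}: strings with an even number of 1s.
    return recognized(word, {"0": 0, "1": 1}, lambda a, b: (a + b) % 2, 0, {0})

def in_or(word):
    # Monoid ({0, 1}, max) with S = {1}: strings containing at least one 1.
    return recognized(word, {"0": 0, "1": 1}, max, 0, {1})
```

Both monoids have two elements, mirroring the two-state automata for these languages.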

In fact, starting from a regular language, we can specify a finite monoid recognizing it through the so-called syntactic congruence. Given language $L \subseteq \Sigma^*$, the syntactic congruence is an equivalence relation $\sim_L$ on $\Sigma^*$ such that $x \sim_L y$ if

$$\forall u, v \in \Sigma^*, \quad uxv \in L \iff uyv \in L.$$

Thus, $\sim_L$ divides $\Sigma^*$ into equivalence classes. Furthermore, $\sim_L$ is a monoid congruence because $u \sim_L v$ and $x \sim_L y$ imply $ux \sim_L vy$. This means the equivalence classes of $\Sigma^*$ under $\sim_L$ are actually congruence classes (because they can be multiplied), defining a monoid $M_L$ which we call the syntactic monoid of $L$. Finally, it is not hard to see that the map $\varphi: \Sigma^* \to M_L$, from a string to its congruence class, is a homomorphism. Therefore, by Theorem 20, the syntactic monoid for any regular language is finite.

The most important subclass of regular languages are the star-free languages. These languages are recognized by a variant of regular expressions where complement ($\overline{A}$) is allowed but Kleene star is not. We call these star-free regular expressions. For convenience, star-free regular expressions sometimes contain the intersection operation since it follows by De Morgan's laws.

¹⁰ A monoid $(M, \cdot, 1_M)$ is a set $M$ closed under an associative binary operation $\cdot: M \times M \to M$ with an identity element $1_M \in M$. A monoid homomorphism is a map from one monoid to another that preserves multiplication and identity.

Note that star-free languages are not necessarily finite. For example, E* can also be expressed as 5, the complement of the empty language. Similarly, 0* is 5(E\{0})0, the set of strings which do not contain a string other than 0. Once again, an algebraic characterization of star-free languages will be particularly useful for us. First, we say that a monoid M is aperiodic if for all x E M there exists an integer n > 0 such that

= Xn+1.

Theorem 21 (Schützenberger [74]). A language is recognized by a finite aperiodic monoid iff it is star free.
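The aperiodicity condition can be checked mechanically for a small monoid. The sketch below is an illustration only: it assumes the monoid is given as a list of elements plus a multiplication function (both hypothetical inputs), and it uses the fact that if $x^n = x^{n+1}$ holds for some $n$, it already holds for some $n \leq |M|$.

```python
def is_aperiodic(elements, mul):
    """Return True iff every x satisfies x^n == x^(n+1) for some n >= 1.

    `elements` lists the monoid elements and `mul` is its binary operation;
    both are assumed inputs for this illustration.
    """
    elements = list(elements)
    for x in elements:
        power = x
        # The powers x, x^2, ... are eventually periodic, so |M| steps
        # suffice to find two equal consecutive powers if they exist.
        for _ in range(len(elements)):
            nxt = mul(power, x)
            if nxt == power:
                break
            power = nxt
        else:
            return False
    return True
```

For instance, the two-element monoid $\{0, 1\}$ under multiplication passes this check, while $\mathbb{Z}_2$ under addition (which recognizes parity) fails it.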

We also define a subset of the star-free languages, which we call the trivial languages. Intuitively, the trivial languages are those languages for which membership can be decided by the first and last characters of the input string,¹¹ which we formalize as those languages accepted by trivial regular expressions. A trivial regular expression is any Boolean combination of the languages $a \mid a\Sigma^*a$, $a\Sigma^*b$, and $\varepsilon$ for $a \neq b \in \Sigma$.

The algebraic characterization of trivial languages will need to use both the properties of the monoid and the properties of the homomorphism onto the monoid. To that end, we say that language $L \subseteq \Sigma^*$ is recognized by a monoid homomorphism $\varphi\colon \Sigma^* \to M$ if $L = \{w \in \Sigma^* : \varphi(w) \in S\} = \varphi^{-1}(S)$ for some subset $S \subseteq M$. Finally, a monoid $M$ is a rectangular band if for $r, s, t \in M$, each element is idempotent, $r^2 = r$, and satisfies the rectangular property, $rst = rt$.

Theorem 22 (Section 3.4.3). A language is recognized by morphism $\varphi$ such that $\varphi(\Sigma^+)$ is a finite rectangular band iff it is trivial.

¹¹ More generally, trivial languages are decided by a constant size prefix and/or suffix of the input, but the processing we do to formalize the trichotomy theorem compresses those substrings to length 1. See Section 3.4.

3.2.2 Query complexity with non-binary alphabets

In this section, we discuss how alphabet size affects the various query measures. Recall that the query complexity measures above are usually defined for Boolean functions. Nevertheless, we would like to extend the known relationships between the complexity measures to functions over larger (yet constant) alphabets. While it is true that many of these relationships generalize without too much work, we would like to avoid reproving the results one at a time. Our solution is to simply encode symbols of $\Sigma$ as binary strings of length $\lambda = \lceil \log |\Sigma| \rceil$. If the size of the alphabet $\Sigma$ is not a power of two, we can simply map the extra binary strings to arbitrary elements of $\Sigma$. This maps a language $L \subseteq \Sigma^*$ to a language $L^{\mathrm{bin}} \subseteq \{0,1\}^*$ over binary strings. Since regular languages are closed under inverse morphism, $L^{\mathrm{bin}}$ is regular if $L$ is regular.

It is also easy to see that almost all complexity measures are changed by at most a constant factor when converting to a binary alphabet. For example, $D(L)(n) \leq D(L^{\mathrm{bin}})(\lambda n)$ since for any bit we look at, there is some symbol we can examine that tells us that bit. In the other direction, $D(L^{\mathrm{bin}})(\lambda n) \leq \lambda D(L)(n)$, since we can query the entire encoding of any symbol we query. Similarly, the encoding changes $R_0$, $R$, $Q$, $s$, $C$, and (with some additional work) $\deg$, by at most a constant factor.

The exception is block sensitivity. It is clear that $\mathrm{bs}(L)(n) \leq \mathrm{bs}(L^{\mathrm{bin}})(\lambda n)$, since for any sensitive block of symbols there is some way to flip it, and this changes some block of bits. In the other direction, a block of sensitive bits gives a block of sensitive symbols in the obvious way, but then disjoint blocks of bits will not necessarily map to disjoint blocks of symbols, so it is difficult to say more for general languages.

Theorem 23. Let $L \subseteq \Sigma^*$ be a regular language. Then, there exists constant $c$ such that $\mathrm{bs}(L)(n) \geq c \cdot \mathrm{bs}(L^{\mathrm{bin}})(\lambda n)$ for all $n$.

Proof. We borrow a dichotomy result¹² from Section 3.6, namely Corollary 49: any flat regular language has sensitivity either $O(1)$ or $\Omega(n)$. Since $L$ is a regular language and not necessarily flat, we also borrow Theorem 29 from Section 3.4: membership in $L$ reduces to membership in some flat language based on some finite suffix of the input string. Therefore, for every length $n$, the sensitivity $s(L)$ is either constant or $\Omega(n)$, which we use to split the proof into two cases. If the sensitivity $s(L)$ is constant, then $s(L^{\mathrm{bin}})$ is also constant. This implies that $\mathrm{bs}(L^{\mathrm{bin}})$ is constant by Theorem 3. Therefore, $\mathrm{bs}(L)$ is also constant since $\mathrm{bs}(L)(n) \leq \mathrm{bs}(L^{\mathrm{bin}})(\lambda n)$. If the sensitivity $s(L)$ is not constant, then it is linear by the dichotomy theorem. Therefore, $s(L)(n) \leq \mathrm{bs}(L)(n) \leq \mathrm{bs}(L^{\mathrm{bin}})(\lambda n)$ implies block sensitivity is linear for both languages from which the theorem follows. $\square$

¹² Note that Corollary 49 is true for any alphabet size and does not depend on Theorem 23, so the argument is not circular.

With this theorem, every regular language and its encoding have the same complexity for all of the measures we are interested in, up to constants. Therefore, we will lift known relationships between complexity measures in the Boolean setting to the general alphabet setting without further comment.

3.3 Applications of Star-free Algorithm

We give quantum quadratic speedups for several problems simply by showing that the underlying language is star free. To the authors' knowledge, no quantum quadratic speedups for any of the following problems were previously known.

3.3.1 Dynamic AND-OR

Consider the language $2\Sigma^*2 \setminus \Sigma^*20^*2\Sigma^*$, where $\Sigma = \{0, 1, 2\}$. We call this the dynamic AND-OR language, for reasons which may not be evident from the regular expression alone. Think of the 2's as delimiting the string into some number of blocks over $\{0, 1\}$. We take the OR of each block and the AND of those results to decide if the string is in the language. That is, if there is some pair of consecutive 2's with no intervening 1, then that block evaluates to 0, and the whole string is not in the language.

It has long been known that the quantum query complexity of the AND-OR tree, or more generally Boolean formulas with constant depth, is $\Theta(\sqrt{n})$ [55]. In that case, however, the tree or formula is fixed in advance and not allowed to change with the input. Nevertheless, our quantum algorithm for star-free languages implies that even the dynamic version of the AND-OR language (as well as the dynamic generalization of constant-depth Boolean formulas [17]) can be decided with $\tilde{O}(\sqrt{n})$ queries and, moreover, there is an efficient quantum algorithm.

3.3.2 Bounded Dyck language

Next consider the language of balanced parentheses, where the parentheses are only allowed to nest k levels deep. When k is unbounded, this is called the Dyck language.

When $k = 1$ this is the language of strings of the form ()()...(), which has a simple Grover search speedup: search for (( or )). However, the language quickly becomes more interesting as $k$ increases. Nevertheless, for any constant $k$, this language is known to be star free [35], and therefore has an $\tilde{O}(\sqrt{n})$ quantum algorithm by our classification.
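For reference, a classical linear-scan membership test for the depth-bounded Dyck language might look as follows. This is only a sketch to pin down the language; the function name and the convention that inputs use just the characters `(` and `)` are assumptions of this example.

```python
def in_bounded_dyck(w: str, k: int) -> bool:
    """Balanced parentheses whose nesting depth never exceeds k.

    Assumes w contains only '(' and ')'.
    """
    depth = 0
    for c in w:
        depth += 1 if c == "(" else -1
        if depth < 0 or depth > k:
            return False
    return depth == 0
```

For $k = 1$ and a non-empty input, this agrees with checking that the string starts with `(`, ends with `)`, and contains neither `((` nor `))`, which is the search problem mentioned above.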

3.3.3 Addition

Chandra, Fortune, and Lipton [31] observed that binary addition can be described by a monoid product. Specifically, a product over a monoid $M$ with elements $\{S, R, P\}$ (set, reset, propagate) satisfying

$$xS = S, \quad xR = R, \quad xP = x,$$

for all $x \in M$. The idea is that given two $n$-bit numbers, we map each column to a monoid element (i.e., $00 \to R$; $01, 10 \to P$; $11 \to S$) and then the prefix product to a particular column (starting from the least significant column, so perhaps suffix product is more appropriate) indicates whether there is a carry in the next column ($R, P \Rightarrow$ no carry, $S \Rightarrow$ carry). Chandra et al. show that there are $\mathsf{AC}^0$ circuits for computing all prefix products, and thus binary addition can be computed in $\mathsf{AC}^0$. Since the monoid is aperiodic, our result implies that the product of any prefix can be computed with $\tilde{O}(\sqrt{n})$ queries to the input, and therefore any particular output bit of a binary addition can be computed in the same number of queries.

Similarly, the regular language accepting triples of binary numbers (represented a column at a time) such that the first two sum to the third is star free (the monoid is essentially $M$, adjoin a zero element $\bot$ which arises when the string is inconsistent with any valid addition). This implies that addition can be checked in $\tilde{O}(\sqrt{n})$ quantum queries. Unfortunately, we cannot construct the sum in $\tilde{O}(\sqrt{n})$ queries for information theoretic reasons: if one of the summands is zero then the sum is exactly the other summand, which we should not be able to reconstruct in fewer than $\Omega(n)$ queries.

Furthermore, we can extend these results to the addition over any base $k$, for an integer $k \geq 2$. In fact, we use the exact same monoid. For example, in decimal, if the sum of the digits in a column is more than 9, then a carry will be created. If the sum of the digits is less than 9, then even if there is an incoming carry, there will be no outgoing carry. And if the sum of digits is exactly 9, then a carry will propagate.
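The set/reset/propagate description can be turned into a short classical sketch. The code below is illustrative only: the function names are hypothetical, digits are assumed to be given least significant first, and the prefix products are computed by a plain left-to-right scan rather than by any circuit or quantum procedure.

```python
S, R, P = "S", "R", "P"  # set, reset, propagate


def mul(x, y):
    """Monoid operation: xS = S, xR = R, xP = x."""
    return x if y == P else y


def column_element(a, b, base=2):
    """Map one column of digits to a monoid element."""
    total = a + b
    if total > base - 1:
        return S  # a carry is created
    if total < base - 1:
        return R  # any incoming carry is absorbed
    return P      # an incoming carry is passed along


def carries(xs, ys, base=2):
    """Carry out of each column; digits are least significant first."""
    out, prefix = [], P  # P acts as the identity
    for a, b in zip(xs, ys):
        prefix = mul(prefix, column_element(a, b, base))
        out.append(prefix == S)
    return out


def add(xs, ys, base=2):
    """Add two equal-length digit lists using the carry prefix products."""
    carry_in = [False] + carries(xs, ys, base)
    digits = [(a + b + c) % base for a, b, c in zip(xs, ys, carry_in)]
    digits.append(int(carry_in[-1]))
    return digits
```

The same three-element monoid serves every base; only `column_element` depends on the base, matching the decimal discussion above.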

3.3.4 Length-2 Word Break

Problem 24 (Word Break Problem). Given a finite dictionary of strings $D \subseteq \Sigma^*$ and a string $w \in \Sigma^*$, decide whether $w \in D^*$. That is, can $w$ be written as a concatenation of words in $D$?

There exists a straightforward dynamic program (DP) which solves this problem in polynomial time. Faster solutions exist (e.g., [15]), but still heavily rely on DP. Since DP is sometimes claimed to be incompatible with quantum speedups [13], we find it surprising that our result gives a speedup on the following (limited) special case of the word break problem.
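One common form of the straightforward dynamic program is sketched below. It is a standard textbook formulation, assumed here for illustration rather than taken from the source, and the function name is hypothetical.

```python
def word_break(w, dictionary):
    """Decide whether w is a concatenation of words from `dictionary`.

    reachable[i] is True iff the prefix w[:i] lies in D*.
    """
    words = [d for d in dictionary if d]  # ignore the empty word
    reachable = [False] * (len(w) + 1)
    reachable[0] = True  # the empty prefix is the empty concatenation
    for end in range(1, len(w) + 1):
        reachable[end] = any(
            len(d) <= end and reachable[end - len(d)] and w[end - len(d):end] == d
            for d in words
        )
    return reachable[len(w)]
```

Each table entry depends on earlier entries, which is the sequential dependence that makes the speedup in the special case below somewhat unexpected.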

Theorem 25. Fix a dictionary $D \subseteq \Sigma \cup \Sigma^2$ containing strings of length 1 or 2. Given a string $w \in \Sigma^*$, there is a $\tilde{O}(\sqrt{n})$ query algorithm to decide whether $w \in D^*$.

The result follows from a lemma characterizing the syntactic monoids of such languages.

Lemma 26. Let $D \subseteq \Sigma^*$ be a set of strings of length at most 2. Let $M$ be the flattened syntactic monoid of $D^*$. For any $m \in M$, we show that $m^2 = m^3$. It follows that $M$ is aperiodic.

Proof. It is clear that the identity element $1 \in M$ has the property that $1^2 = 1^3$. For any other $m \in M$, $m \neq 1$, we can find a string $y \in \Sigma^*$ which $\varphi$ maps to $m$. Let $n$ be the length of $y$. We may assume $n$ is even because the monoid is flat.

The statement $m^2 = m^3$ is equivalent to saying that for all $x, z \in \Sigma^*$,

$$xy^2z \in D^* \iff xy^3z \in D^*.$$

We will argue this by showing that for any $w \in D^*$ containing $y^2$, there is a substring $u$ in $y^2$, aligned to the word breaks and of length $n = |y|$. This substring can be pumped up or down, to show the $\Rightarrow$ and $\Leftarrow$ directions respectively.

Now assume $y^2$ is contained in some concatenation of words from $D$, and consider the positions where there are word breaks. If we find two word breaks (including the endpoints of the string, but not both endpoints since then we would pump all of $y^2$!) at the same position modulo $n$, we are done because we immediately have a pumpable substring. In particular, if there are $n + 1$ word breaks within $y^2$, then the pigeonhole principle implies there are two at the same position modulo $n$. We are necessarily close to this limit since $|y^2| = 2n$, and words in $D$ have length at most 2, so the concatenation involves at least $n$ words.

Let us do the math more carefully. Suppose we have a concatenation of words in $D$ with $n$ word breaks (i.e., $n + 1$ words) at $n$ different positions modulo $n$. Since $n$ is even, there must be breaks at both odd and even positions. It follows that at least one of the words in the concatenation has length 1, so the entire concatenation has length at most $2n + 1$. This is just short enough that $y^2$ and the concatenation must share an endpoint. This endpoint plus the $n$ word breaks already in $y^2$ give us $n + 1$ positions to apply the pigeonhole argument from before, finishing the proof.

We also note that this result is tight; if the dictionary contains even a single word of length 3 or more, the query complexity may be $\Omega(n)$. For example, consider $D := \{0, 11, 101\}$, and note that the parity of a bit string $x_1 \cdots x_n$ can be decided by testing whether $1x_11\,1x_21 \cdots 1x_n1$ is in $D^*$.
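The parity reduction can be checked by brute force on short inputs. The sketch below is an illustration under stated assumptions: it encodes each bit $x_i$ as $1x_i1$, tests membership in $D^*$ with a simple dynamic program, and reports membership; the helper names are hypothetical. On the inputs tried, the encoded string is in $D^*$ exactly when the bit string has an even number of 1s.

```python
D = ("0", "11", "101")


def in_d_star(w):
    """Word-break dynamic program specialized to the dictionary D."""
    reachable = [True] + [False] * len(w)
    for end in range(1, len(w) + 1):
        reachable[end] = any(
            len(d) <= end and reachable[end - len(d)] and w[end - len(d):end] == d
            for d in D
        )
    return reachable[len(w)]


def encode(bits):
    """Encode x_1 ... x_n as 1 x_1 1 1 x_2 1 ... 1 x_n 1."""
    return "".join("1" + b + "1" for b in bits)


def has_even_parity_via_word_break(bits):
    """Decide parity of a bit string through membership in D*."""
    return in_d_star(encode(bits))
```

Every word in $D$ uses an even number of 1s, and the encoding adds two 1s per input bit, which is why membership tracks the parity of the input.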

3.3.5 Grid problems

There are many instances of problems on grids which turn into regular languages if one of the dimensions is restricted to be constant. For example, 3-colorability is NP-complete for 4-regular planar graphs [45], and such graphs may be embedded into the grid with rectilinear edges [92]. However, if one dimension of the grid is constant size then the problem becomes regular under a suitable encoding. In this section, we consider a grid problem such that the constant-height restriction is star free. This leads to an efficient $\tilde{O}(\sqrt{n})$ quantum query algorithm, which is otherwise difficult to see.

Problem 27 (Grid Path Problem). Given an m x n grid of cells, some of which are impassable, decide whether there is a path from the bottom left corner to the top right corner.

For constant m, let

$$L = \{w \in (\{0, 1\}^m)^* : \text{grid represented by } w \text{ contains a path}\}$$

be the language of grids which have a path from the lower left corner to the top right corner. First, consider a monotone version of the grid path problem in which the path is only allowed to go up or to the right at each step. In this case, there is a straightforward first order logic characterization of this language, in which the existential quantifiers are used to guess the finitely-many positions at which the path's y-coordinate increases.

Such a direct characterization will not suffice for the language $L$ since there is no succinct way to describe a general path. Instead, we appeal to a more sophisticated approach of Hansen et al. based on a monoid which recognizes this language [53]. Roughly speaking, the monoid elements describe sets of compatible paths between the ends of a grid. Thus, by multiplying the monoid elements corresponding to each column of the grid, one can determine membership in $L$. Hansen et al. show that the monoid is aperiodic, which immediately gives a faster quantum query algorithm using our classification:

Corollary 28 (Combining [53] with star-free algorithm). $Q(L) = \tilde{O}(\sqrt{n})$.

In fact, the monoid elements keep track of multiple disjoint paths through the grid (which is necessary if the path backtracks through a particular section of the grid), so one can decide whether there exist $O(1)$ disjoint paths through the grid.

3.4 Formal Statement of Trichotomy Theorem

The naïve version of the trichotomy theorem states that the quantum query complexity of a regular language is always $\Theta(1)$, $\tilde{\Theta}(\sqrt{n})$, or $\Theta(n)$. Unfortunately, this is not strictly true. We now explain the difficulty and a technique which we call "flattening" that allows us to formalize this statement.

Let us see why flattening is necessary. Consider any language which has large quantum query complexity (e.g., parity) and take its intersection with $(\Sigma^2)^*$, the language of even length strings. When the input length is odd, we know without any queries that the string cannot be in the language. When the input length is even, we have to solve the parity problem, which requires $\Omega(n)$ queries. Thus, the query complexity oscillates drastically between 0 and $\Theta(n)$ depending on the length of the input. Strictly speaking, this means the complexity is neither $\Theta(1)$, $\tilde{\Theta}(\sqrt{n})$, nor $\Theta(n)$; the naïve statement of the trichotomy is false. We want to state the trichotomy only for languages which are length-independent.

Fortunately, a DFA cannot count how many symbols it reads. With finite state, the best a DFA can do is count modulo some constant. Thus, if there is any dependence on length, it is periodic. Similarly, a language may have periodic dependence on position. For example, consider the language of all strings with exactly two 1s. This language is star free and therefore has an $\tilde{O}(\sqrt{n})$ quantum query algorithm. If we further require the 1s to be an even distance apart, the language is no longer star free, but clearly has an $\tilde{O}(\sqrt{n})$ quantum query algorithm. Flattening will reduce this language to a collection of star-free languages, and in general it will remove periodicities not inherent to the query complexity of the language.

Before continuing with flattening, we address a different way to handle length dependence. That is, redefine the quantum query complexity of a function to be the minimum number of quantum oracle calls needed to compute the function on inputs of length up to $n$ (rather than exactly $n$). For this definition, notice that the quantum query complexity is nondecreasing. In Section 3.4.4 we show that the trichotomy theorem holds for all regular languages under this definition as a simple consequence of Theorem 17, the trichotomy theorem for flat languages. To be clear, we will continue to use the standard definition of quantum query complexity for the remainder of the chapter.

3.4.1 Flattening

The main idea behind flattening is to eliminate a language's periodicities by dividing the strings into blocks. For any string $x \in \Sigma^*$ of length $kn$, we can reimagine $x$ as a length-$n$ string over $\Sigma^k$. This operation can be applied to a language by keeping only strings of length divisible by $k$ and projecting them to the alphabet $\Sigma^k$. Flattening a regular language applies this operation to the language for some carefully chosen $k$ to be determined later. Nevertheless, we argue that the language and its flattened version are essentially the same since we are simply blocking characters together. We formalize this in the following theorem.

Theorem 29. Let $L \subseteq \Sigma^*$ be a regular language recognized by a monoid $M$. There exists an integer $p \geq 2$ and a finite family of flat regular languages $\{L_i\}_{i \in I}$ over alphabet $\Sigma^p$ such that testing membership in $L$ reduces (in fewer than $p$ queries) to testing membership in $L_i$ for some $i$. Furthermore, the same monoid $M$ recognizes $L$ and every $L_i$ (although there may be a simpler monoid which recognizes $L_i$).

Before we prove this theorem, let us start with some precise definitions. Let $\varphi\colon \Sigma^* \to M$ be a monoid homomorphism onto a finite monoid. Let $(\Sigma^k)^+$ denote the non-empty strings of length divisible by $k$. The conductor is the least integer $K$ such that $\varphi(\Sigma^K) = \varphi(\Sigma^{nK})$ for all $n \geq 1$. A regular language $L \subseteq \Sigma^*$ recognized by the morphism $\varphi\colon \Sigma^* \to M_L$ onto its syntactic monoid is flat if its conductor is 1.

Once we convert the language to blocks of size $K$ (i.e., alphabet $\Sigma^K$), any congruence class of the monoid containing a non-empty string contains strings of all (non-zero) lengths. This is such a useful property, let us state it explicitly to refer back to it later.

Property 30. Let $L \subseteq \Sigma^*$ be a flat regular language. For any non-empty string $x \in \Sigma^+$, and any non-zero length $k > 0$, there exists a string $y \in \Sigma^k$ of length $k$ such that for any $u, v \in \Sigma^*$, $uxv \in L \iff uyv \in L$.

That is, x and y belong to the same congruence class.

In other words, for any non-empty string x, we can replace (substring) occurrences of x with some string of every (non-zero) length, without changing membership in the language. Notice that a flat regular language cannot have a length dependence, otherwise we would replace the first few letters with something slightly longer or shorter to reduce the problem to whichever nearby length is easiest. However, we still need to show K, and therefore flat regular languages, exist.

Theorem 31. For any homomorphism $\varphi\colon \Sigma^* \to M$ onto a finite monoid, the conductor is finite and computable.

Proof. Let $\lambda\colon \Sigma^* \to \mathbb{N}$ be the homomorphism mapping strings to their lengths. The set $\Lambda_r := \lambda(\varphi^{-1}(r))$ is ultimately periodic, i.e., there exists $p$ such that $\Lambda_r$ and $\Lambda_r + p$ differ at finitely many points. This may be easier to see by mapping $\varphi^{-1}(r)$ to unary, and since the language is still regular, considering the DFA. Let $K'$ be the least common multiple of the periods of $\Lambda_r$ for all $r \in M$. We will take $K$ to be a multiple of $K'$, so we may as well assume without loss of generality that the period of each $\Lambda_r$ is 1. When a set of natural numbers has period 1, it is either finite or cofinite. Take $K$ larger than all the finite exceptions in either case. That is, for all $r$, take $K$ larger than the maximum element in $\Lambda_r$ (if finite) and the maximum element not in $\Lambda_r$ (if cofinite). The result is that each $\Lambda_r \cap K\mathbb{N}$ is one of $\emptyset$, $\{0\}$, $K\mathbb{N}$, or $K\mathbb{N} \setminus \{0\}$. Only the identity class, $\Lambda_1$, can contain 0, so all other $\Lambda_r \cap K\mathbb{N}$ are either $\emptyset$ or $K\mathbb{N} \setminus \{0\}$. We throw away $r \in M$ such that $\Lambda_r \cap K\mathbb{N} = \emptyset$, and the remaining elements have the property, by construction, that they are the images of strings of all lengths divisible by $K$. $\square$
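For small examples the conductor can also be found by direct search. The sketch below is a simplified illustration, not the construction in the proof: it reads $\Sigma^K$ as the strings of length exactly $K$, takes as assumed inputs the set of monoid elements that are images of single symbols together with the multiplication, and uses the fact that the sequence of sets $\varphi(\Sigma^1), \varphi(\Sigma^2), \ldots$ is eventually periodic. All names are hypothetical.

```python
from itertools import count


def conductor(generators, mul):
    """Least K with image(K) == image(n*K) for all n >= 1.

    `generators` is the set of images of single symbols, and image(i)
    denotes the set of images of strings of length exactly i.
    """
    gens = frozenset(generators)
    layers = []       # layers[i - 1] holds image(i)
    first_seen = {}
    current = gens
    while current not in first_seen:
        first_seen[current] = len(layers) + 1
        layers.append(current)
        current = frozenset(mul(a, g) for a in current for g in gens)
    start = first_seen[current]              # first length on the cycle
    period = len(layers) + 1 - start

    def image(length):
        if length <= len(layers):
            return layers[length - 1]
        return layers[start - 1 + (length - start) % period]

    for K in count(1):
        # Beyond length `start` the sequence repeats with `period`, so
        # checking n = 1 .. start + period covers every multiple of K.
        if all(image(n * K) == image(K) for n in range(1, start + period + 1)):
            return K
```

For the parity monoid with both symbols present the search returns 1, while for a monoid that only counts length modulo 2 it returns 2, matching the even-length example that motivated flattening.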

We are finally ready to restate and prove Theorem 29, which states that any regular language can be divided into a collection of flat languages.

Theorem 29. Let $L \subseteq \Sigma^*$ be a regular language recognized by a monoid $M$. There exists an integer $p \geq 2$ and a finite family of flat regular languages $\{L_i\}_{i \in I}$ over alphabet $\Sigma^p$ such that testing membership in $L$ reduces (in fewer than $p$ queries) to testing membership in some $L_i$. Furthermore, the same monoid $M$ recognizes $L$ and every $L_i$.

Proof. Let $p$ be the conductor of $L$. Consider an input $x \in \Sigma^*$ of length $n$. Clearly we can divide $x$ into a string $x' \in (\Sigma^p)^*$ of length $\lfloor n/p \rfloor$, and a remainder $r \in \Sigma^*$ of length less than $p$. For each such $r$, we define the language

$$L_r := \{y \in (\Sigma^p)^* : yr \in L\},$$

slightly abusing notation so that $y$ denotes both a string over $\Sigma^p$ and a string over $\Sigma$. We leave it as an exercise to show that $L_r$ is regular. By construction, $x$ is in $L$ if and only if $x'$ is in $L_r$, so by looking at the length of the input and the last $|r|$ symbols, we have reduced testing membership in $L$ to membership in $L_r$. Finally, let $\hat{\varphi}\colon (\Sigma^p)^* \to M$ denote the extension of $\varphi$ to strings over $\Sigma^p$. Note that we can write $L_r$ as

$$L_r = \{y \in (\Sigma^p)^* : \hat{\varphi}(y)\varphi(r) \in S\} = \{y \in (\Sigma^p)^* : \hat{\varphi}(y) \in \{q \in M : q\varphi(r) \in S\}\} = \hat{\varphi}^{-1}(\{q \in M : q\varphi(r) \in S\}).$$

It follows that $L_r$ is recognized by $M$. By construction, the conductor of $L_r$ is 1, so $L_r$ is flat. $\square$

To summarize, any regular language can be reduced (or flattened) to a collection of flat regular languages. Some of these languages may be easier than others, but they are all length-independent, and thus suitable for our trichotomy theorem.

3.4.2 Trichotomy Theorem

We are now ready to formally state Theorem 17. Technically, there are a few regular languages (even flat languages), which can be decided with zero queries, strictly from the length of the input. This divides the languages into the following four classes (i.e., a tetrachotomy).

Theorem 17. Every flattened regular language has quantum query complexity $0$, $\Theta(1)$, $\tilde{\Theta}(\sqrt{n})$, or $\Theta(n)$ according to the smallest class in the following hierarchy that contains the language.

• Degenerate: One of the four languages $\emptyset$, $\varepsilon$, $\Sigma^*$, or $\Sigma^+$.

• Trivial: The set of languages which have trivial regular expressions.

• Star free: The set of languages which have star-free regular expressions.

• Regular: The set of languages which have regular expressions.

Note that each class is contained in the next. Furthermore, the quantum time complexity of each class matches its query complexity.

Nevertheless, we refer to this classification as a trichotomy. We either think of degenerate and trivial languages under the category of "constant query regular languages" or, alternatively, disregard the degenerate languages entirely because they are uninteresting.

3.4.3 Equivalence of algebraic and regular expression definitions

As it turns out, the regular expression descriptions, some of which were already mentioned in Section 3.2, are not particularly useful for the classification. We will prefer the following algebraic/monoid definitions of the languages, and use them throughout.

Theorem 32. Let L be a regular language.

• $L$ is degenerate if it is recognized by morphism $\varphi$ such that $|\varphi(\Sigma^+)| = 1$.

• $L$ is trivial if it is recognized by morphism $\varphi$ such that $\varphi(\Sigma^+)$ is a finite rectangular band.

• $L$ is star free if it is recognized by a finite aperiodic monoid.

• $L$ is regular if it is recognized by a finite monoid.

Since Theorems 20 and 21 give characterizations for the regular and star-free languages, respectively, we prove the above theorem only for the degenerate and trivial languages.

Proposition 33. A language is recognized by morphism $\varphi$ such that $|\varphi(\Sigma^+)| = 1$ iff it is degenerate.

Proof. Recall that there are only four degenerate languages: $\emptyset$, $\varepsilon$, $\Sigma^*$, or $\Sigma^+$. First, we claim that the morphism $\varphi\colon \Sigma^* \to M_L$ onto the syntactic monoid of each language is such that $|\varphi(\Sigma^+)| = 1$. This calculation is straightforward, and we leave it as an exercise.

Let language $L \subseteq \Sigma^*$ be recognized by morphism $\varphi$ such that $|\varphi(\Sigma^+)| = 1$. For any $x, y \in \Sigma^+$, we have that $\varphi(x) = \varphi(y)$. Therefore, $x \in L$ iff $y \in L$. This only leaves four possible choices of languages based on whether or not $\Sigma^+ \subseteq L$ and whether or not $\varepsilon \in L$. These are exactly the degenerate languages. $\square$

Theorem 34. A language is recognized by morphism $\varphi$ such that $\varphi(\Sigma^+)$ is a rectangular band iff it is trivial.

Proof. Suppose first that $L$ is a regular language recognized by homomorphism $\varphi\colon \Sigma^* \to M$ such that $\varphi(\Sigma^+)$ is a rectangular band. Suppose $a \in \Sigma$ belongs to $L$. We want to show that $a\Sigma^*a$ is also in $L$. For any $w \in \Sigma^+$, we have that $\varphi(a) = \varphi(aa) = \varphi(awa)$, where the first equality comes from idempotence of $M$ and the second equality comes from the rectangular band property. Therefore, if $a \in L$, then so is $a\Sigma^*a$. Similarly, this implies that if $awa \in L$ for $a \in \Sigma$ and $w \in \Sigma^*$, then $a \in L$ and $a\Sigma^*a \subseteq L$. A similar argument shows that if $a \neq b \in \Sigma$ and $awb \in L$ for some $w \in \Sigma^*$, then $a\Sigma^*b \subseteq L$. Finally, membership of $\varepsilon$ is independent of $\varphi$, so it may either be in the language or not in the language.

Now suppose that $L$ is a trivial language. Define monoid $M = (\Sigma \times \Sigma) \cup \{(\varepsilon, \varepsilon)\}$ with operation $(a, b) \cdot (c, d) = (a, d)$ for all $a, b, c, d \in \Sigma$, and $(a, b) = (\varepsilon, \varepsilon) \cdot (a, b) = (a, b) \cdot (\varepsilon, \varepsilon)$. Define morphism $\varphi\colon \Sigma^* \to M$ such that $\varphi(a) = (a, a)$ for $a \in \Sigma \cup \{\varepsilon\}$. Therefore, $\varphi(awb) = (a, b)$ for $a, b \in \Sigma$ and $w \in \Sigma^*$. Define $S \subseteq M$, such that $(a, a) \in S$ if $a \in L$, $(a, b) \in S$ if $a\Sigma^*b \subseteq L$, and $(\varepsilon, \varepsilon) \in S$ if $\varepsilon \in L$. By construction, we claim that $L = \varphi^{-1}(S)$, which completes the proof. $\square$
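The monoid built in the second half of the proof is small enough to write down directly. The following sketch is only an illustration of that construction: pairs of characters stand for the elements of $\Sigma \times \Sigma$, the pair of empty strings plays the role of $(\varepsilon, \varepsilon)$, and the names `EPS`, `mul`, and `phi` are hypothetical.

```python
EPS = ("", "")  # stands for the identity element (epsilon, epsilon)


def mul(x, y):
    """(a, b) * (c, d) = (a, d), with EPS acting as the identity."""
    if x == EPS:
        return y
    if y == EPS:
        return x
    return (x[0], y[1])


def phi(word):
    """Morphism sending each symbol a to (a, a), extended to strings."""
    image = EPS
    for c in word:
        image = mul(image, (c, c))
    return image
```

On a non-empty string, `phi` returns the pair (first symbol, last symbol), so membership in a trivial language reduces to looking up that pair in the accepting set $S$.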

One might wonder why we needed to reference the homomorphism $\varphi$ explicitly in the definition of the degenerate and trivial languages, when the other classes only needed a characterization of the monoid itself. In that case, each class of languages would be a variety. Unfortunately, such a characterization does not exist due to the following theorem of Eilenberg:

Theorem 35 (Eilenberg's Variety Theorem [39]). If $V$ is a class of monoids and $\mathcal{L}$ is the class of regular languages whose syntactic monoids lie in $V$, then $V$ is a monoid variety only if $\mathcal{L}$ is a language variety.¹³

Consider the degenerate language $A = \Sigma^+$ and star-free language $B = \Sigma^*1\Sigma^*$ over alphabet $\Sigma = \{0, 1\}$. We claim that $B$ is the inverse morphism of $A$ by $\chi\colon \Sigma^* \to \Sigma^*$ such that $\chi(0) = \varepsilon$, $\chi(1) = 1$. Since $B$ is clearly nontrivial, the trivial languages are not closed under inverse morphism. Therefore, by the Variety Theorem, the class of trivial languages is not a variety.

¹³ A class of regular languages is a language variety if it is closed under Boolean operations, left and right quotients, and inverse morphisms. For $x \in \Sigma^*$, the left quotient of language $L$ by $x$ is the language $x^{-1}L = \{z : xz \in L\}$. Let $\chi\colon \Sigma^* \to \Sigma^*$ be a homomorphism, and let $L \subseteq \Sigma^*$ be equal to $\bigcup_{m \in S} \varphi^{-1}(m)$ for some subset $S$ of the syntactic monoid. The inverse morphism of $L$ by $\chi$ is the language $\chi^{-1}L = \bigcup_{m \in S} \chi^{-1}(\varphi^{-1}(m))$.

3.4.4 Monotonic query complexity

Let us now consider an alternative to flattening, namely modifying the definition of query complexity so that it is nondecreasing. For this section only, define the quantum query complexity $Q(f)(n)$ of function $f$ to be the minimum number of quantum oracle calls needed to determine the value of $f$ on all strings of length up to $n$. When query complexity is defined in this way, we can prove a quantum query complexity trichotomy theorem for all regular languages as a corollary of our trichotomy theorem for flat languages.

Theorem 36. Let $L \subseteq \Sigma^*$ be any regular language. The quantum query complexity of $L$ is either $0$, $\Theta(1)$, $\tilde{\Theta}(\sqrt{n})$, or $\Theta(n)$.

Proof. By Theorem 29, we have that $L$ is a finite disjoint union of languages $L_r$ where each $r \in \Sigma^*$ has length less than $p$. Technically, $L_r$ is over the alphabet $\Sigma^p$, but we extend strings in $L_r$ to strings over alphabet $\Sigma$ in the obvious way. If all $L_r$ have constant query complexity, then $Q(L) = 0$ or $Q(L) = \Theta(1)$. Therefore, assume there is some $L_r$ such that $Q(L_r) = \omega(1)$. We will show that $Q(L) = \Theta(\max_r Q(L_r))$. Let us consider one algorithm for $L$ on strings of length $np + i$ where $i < p$: query the last $i$ characters of the string to determine $r$, and then use at most $pQ(L_r)(n)$ queries to test the rest. Therefore, we have¹⁴

$$Q(L)(np + i) \leq \max_r \, p\,Q(L_r)(n) + p.$$

In the other direction, notice that by decreasing the length of the string by at most $p$, we can have any remainder string $r$. By the modified definition of query complexity, shortening the length cannot increase the query complexity. Since we can force the query algorithm to solve any smaller instance of a flat language $L_r$, we have

$$Q(L)(np + i) \geq \max_r Q(L_r)(n - 1).$$

That is, $Q(L) = \Theta(\max_r Q(L_r))$ from which the theorem follows. $\square$

¹⁴ We now see the need to separate the constant and non-constant cases. The additive $p$ factor would technically take a 0-query algorithm to a $\Theta(1)$-query algorithm, which we want to avoid.

3.4.5 Structure of the proof

We separate the proof of the trichotomy into two natural pieces: upper bounds (Section 3.5) and lower bounds (Section 3.7). The upper bounds are derived directly from the monoid characterizations of the various classes. Given a flat language, we construct explicit algorithms using at most 0 queries for degenerate languages, 2 queries for trivial languages, $\tilde{O}(\sqrt{n})$ queries for star-free languages, and $n$ queries for regular languages.

The lower bound section aims to prove that these are the only possible classes. First, we show that any non-degenerate language requires at least one quantum query. We then show that any nontrivial language requires $\omega(1)$ quantum queries. At this point, we will appeal to a dichotomy theorem for the block sensitivity of regular languages, which we prove in Section 3.6. From this dichotomy and standard relationships between the complexity measures, we get that any regular language requiring $\omega(1)$ quantum queries actually requires $\Omega(\sqrt{n})$ queries. Finally, we show that any non-star-free language requires $\Omega(n)$ queries, completing the proof.

3.5 Upper Bounds

In this section, we will describe the algorithms for achieving the query upper bounds in Theorem 17. As a warm-up, we will first consider every class besides the star-free languages. Each algorithm will follow trivially from the monoid characterization of each class.

Proposition 37. Any regular language has an $O(n)$-time deterministic algorithm. The trivial languages have constant-time deterministic algorithms. The degenerate languages have 0-query deterministic algorithms.¹⁵

Proof. Let $L \subseteq \Sigma^*$ be a regular language. Let $\varphi$ be the homomorphism onto its syntactic monoid $M_L$ such that $L = \varphi^{-1}(S)$ for some $S \subseteq M_L$. Let $x = x_1 \cdots x_n \in \Sigma^n$. We have that $x \in L$ iff $\varphi(x_1)\varphi(x_2) \cdots \varphi(x_n) \in S$. Since $M_L$ is finite and $\varphi$ is specified by a finite mapping from characters to monoid elements, this product is computable in linear time.

Suppose $L$ is trivial. Consider input $x = ayb$ where $a, b \in \Sigma$ and $y \in \Sigma^*$. By the rectangular band property, we have $\varphi(x) = \varphi(a)\varphi(y)\varphi(b) = \varphi(a)\varphi(b)$. That is, $x \in L$ iff $\varphi(ab) \in S$.

Suppose $L$ is degenerate. Consider some input $x \in \Sigma^*$. If $|x| = 0$, then $x \in L$ iff $\varphi(\varepsilon) \in S$. If $|x| > 0$, then $\varphi(x) \in \varphi(\Sigma^+) = \{s\}$ so $x \in L$ iff $s \in S$. Since the query algorithm knows the length in advance, no queries are needed to determine the membership of $x$.
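The constant-query algorithm for trivial languages can be sketched with an explicit query counter. Everything in this example is an assumption made for illustration: the `Oracle` class, the representation of the accepting set as pairs (first symbol, last symbol), and the convention that a one-symbol input $a$ is accepted iff the pair $(a, a)$ is accepting.

```python
class Oracle:
    """Hypothetical query oracle that counts accesses to the input."""

    def __init__(self, x):
        self._x = x
        self.queries = 0

    def __len__(self):
        return len(self._x)  # the length is known without queries

    def query(self, i):
        self.queries += 1
        return self._x[i]


def decide_trivial(oracle, accept_pairs, accept_empty=False):
    """Decide a trivial language with at most two queries.

    `accept_pairs` holds the accepted (first symbol, last symbol) pairs.
    """
    n = len(oracle)
    if n == 0:
        return accept_empty
    first = oracle.query(0)
    last = first if n == 1 else oracle.query(n - 1)
    return (first, last) in accept_pairs
```

For example, with `accept_pairs = {("a", "b")}` the function decides the language $a\Sigma^*b$ while touching only the two ends of the input, regardless of its length.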

Of course, the existence of these deterministic algorithms implies their corresponding query upper bounds as well. Much more interesting is the $\tilde{O}(\sqrt{n})$ quantum algorithm for star-free languages to which the remainder of this section is dedicated. Much like Proposition 37, we will use the monoid characterization as our starting point for the algorithm; however, before delving directly into the details of the algorithm, we give some techniques and ideas that will be pervasive throughout.

3.5.1 Proof techniques

In this section, we introduce a basic substring search operation and a decomposition theorem (due to Schützenberger) for aperiodic monoids.

Splitting and infix search

Consider the language $L = \Sigma^*20^*2\Sigma^*$ over the alphabet $\Sigma = \{0, 1, 2\}$, that is, the problem of finding a substring of the form $20^*2$. We call the problem of finding a contiguous substring satisfying a predicate infix search. Since $L$ is star free, our trichotomy theorem implies that infix search for the language $20^*2$ is possible with $\tilde{O}(\sqrt{n})$ queries.

¹⁵ Note, the power of constant-time algorithms depends on the particular model of computation. We assume a RAM model where the length of the input string is given, and arithmetic on indices can be performed in constant time.

Consider the following algorithm for $L$: Grover search for an index $i$ in the middle of a substring $20^*2$, searching outwards to verify that there is a substring of the form $20^*$ immediately before the index (suffix search) and a substring of the form $0^*2$ immediately after (prefix search). More precisely, we can use Grover search to check whether a substring is all 0s, then binary search to determine how far the 0s extend on either side of the index, and finally check for 2s on either end.
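The per-index test described above can be sketched classically. In this sketch, which is an illustration rather than the quantum algorithm itself, `all_zeros` stands in for a Grover search for a nonzero symbol and the outer `any` stands in for the Grover search over indices; all function names are assumptions.

```python
def all_zeros(x, lo, hi):
    """Stand-in for a Grover search for a nonzero symbol in x[lo:hi]."""
    return all(c == "0" for c in x[lo:hi])


def zero_run_right(x, i):
    """Largest k such that x[i:i+k] is all zeros, found by binary search."""
    lo, hi = 0, len(x) - i
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if all_zeros(x, i, i + mid):
            lo = mid
        else:
            hi = mid - 1
    return lo


def zero_run_left(x, i):
    """Largest k such that x[i-k:i] is all zeros, found by binary search."""
    lo, hi = 0, i
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if all_zeros(x, i - mid, i):
            lo = mid
        else:
            hi = mid - 1
    return lo


def index_in_match(x, i):
    """Does the gap between x[i-1] and x[i] lie inside a substring 20*2?"""
    left = i - zero_run_left(x, i) - 1    # symbol just before the zeros (suffix search)
    right = i + zero_run_right(x, i)      # symbol just after the zeros (prefix search)
    return left >= 0 and right < len(x) and x[left] == "2" and x[right] == "2"


def contains_20star2(x):
    """Classical stand-in for the outer Grover search over indices i."""
    return any(index_in_match(x, i) for i in range(1, len(x)))
```

The binary searches are valid because "the next $k$ symbols are all 0" is monotone in $k$.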

We introduce a few ideas necessary to prove this algorithm for $L$ is efficient, and to generalize it to arbitrary languages. The first tool we need is Grover search, to help us search for the position of the substring. In particular, we use a version of Grover search which is faster when there are multiple marked items.$^{16}$

Theorem 38 (Grover search). Given oracle access to a string of length $n$ which is 1 on at least $t \ge 1$ indices, there exists a quantum algorithm which returns a random index on which the oracle evaluates to 1 in $O(\sqrt{n/t})$ queries with constant probability.

Next, the solution to $\Sigma^*20^*2\Sigma^*$ used the fact that given an index, we can search outwards for a substring $20^*$ before the index and $0^*2$ after. Notice that the index has "split" the regular language $20^*2$ into two closely related languages. It is not clear every language has this property, so we introduce a notion of splitting for arbitrary regular languages.

Definition 39. We say that a language $L \subseteq \Sigma^*$ splits if there exists a constant $k$ and languages $A_1, \ldots, A_k$, $B_1, \ldots, B_k$ such that $L = \bigcup_{i=1}^{k} A_i B_i$ and for all $x \in L$ and decompositions $x = uv$, there exists $1 \le i \le k$ such that $u \in A_i$ and $v \in B_i$.

$^{16}$ In this section, we will need the speedup from multiple marked items. However, whenever we require the speedup, the marked items will be consecutive. In this case, we can derive the same speedup from any $O(\sqrt{n})$ unstructured search algorithm by searching over indices at fixed intervals (a "grid" on the input). In more detail: we search for a grid size $G$, starting from $n$ and halving until $G$ is less than the number of consecutive marked items (which is unknown). Hence, the set of indices divisible by $G$ will intersect some marked item and the search on $n/G$ indices will succeed in $O(\sqrt{n/G})$ queries. Since the last search dominates the runtime, the entire procedure requires $O(\sqrt{n/t})$ queries. In fact, there are other models of computation where unstructured search uses $O(n^c)$ queries for $c \neq 1/2$ (for instance, [4]). It will turn out that the procedure described above still accelerates search for multiple consecutive marked items. This will translate to an $\tilde{O}(n^c)$-query algorithm for star-free languages. In particular, the runtime in Theorem 40 becomes $\tilde{O}(n^c)$.

Formally, $20^*2$ splits as $(20^*2)\varepsilon \cup (20^*)(0^*2) \cup \varepsilon(20^*2)$. In fact, every star-free language $L \subseteq \Sigma^*$ splits as $\bigcup_i A_i B_i$ where the $A_i$ and $B_i$ are also star free. We will prove this in the next section in Theorem 45. We delay the proof until we have the definitions to show that the languages $A_i$ and $B_i$ are in some sense no harder than the language $L$ itself.

Supposing we can determine membership for $\Sigma^* A_i$ and $B_i \Sigma^*$ efficiently, a combination of Grover search and exponential search will solve the infix search problem, as shown below.

Theorem 40 (Infix search). Let language $L \subseteq \Sigma^*$ split as $\bigcup_{i=1}^{k} A_i B_i$. Suppose $Q(\Sigma^* A_i)$ and $Q(B_i \Sigma^*)$ are $\tilde{O}(\sqrt{n})$ for all $i \in \{1, \ldots, k\}$. Then, $Q(\Sigma^* L \Sigma^*) = \tilde{O}(\sqrt{n})$.

Proof. We perform an exponential search (doubling $\ell$, with $\ell$ initially set to 1) until the algorithm succeeds. Let $x$ be the input and suppose there is a substring of $x$ belonging to $L$ of length at least $\ell$ and at most $2\ell$, for some power of two $\ell$. Search for an index $j$ such that $x_{j-2\ell} \cdots x_{j-1} \in \Sigma^* A_i$ and $x_j \cdots x_{j+2\ell-1} \in B_i \Sigma^*$ for some $i = 1, \ldots, k$. This implies the substring $x_{j-2\ell} \cdots x_{j+2\ell-1}$ is in $\Sigma^* A_i B_i \Sigma^* \subseteq \Sigma^* L \Sigma^*$. Since testing each index requires at most $\tilde{O}(\sqrt{\ell})$ queries and $k$ is constant, there are $\tilde{O}(\sqrt{\ell})$ queries to the string to test a particular index $j$. Recall that we assumed the matching substring has length at least $\ell$, and thus, there are $\ell$ indices of $x$ for which the prefix/suffix queries will return true. Hence, there are at most $O(\sqrt{n/\ell})$ total Grover iterations (Theorem 38), and the final algorithm requires only $\tilde{O}(\sqrt{n})$ queries.
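As a rough accounting of the query count in this proof (a sketch that suppresses constants and treats the polylogarithmic factors as absorbed by $\tilde{O}$), the stage with parameter $\ell$ and the sum over the logarithmically many stages can be written as:

```latex
\underbrace{O\!\left(\sqrt{n/\ell}\right)}_{\text{Grover iterations}}
\cdot
\underbrace{\tilde{O}\!\left(\sqrt{\ell}\right)}_{\text{queries per index test}}
= \tilde{O}\!\left(\sqrt{n}\right),
\qquad
\sum_{\ell \in \{1, 2, 4, \ldots, n\}} \tilde{O}\!\left(\sqrt{n}\right)
= \tilde{O}\!\left(\sqrt{n}\,\log n\right)
= \tilde{O}\!\left(\sqrt{n}\right).
```

The dependence on $\ell$ cancels within each stage, which is why the exponential search costs only a logarithmic factor overall.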

Aperiodic monoids and Schützenberger's proof

At its core, the algorithm for star-free languages uses one direction of Schützenberger's theorem for star-free languages, which we recall from Section 3.2.

Theorem 21. If language $L$ is recognized by a finite aperiodic monoid, then $L$ is star free.

We will show that Schützenberger's proof can be modified to produce a $\tilde{O}(\sqrt{n})$ algorithm for any star-free language starting from the aperiodic monoid recognizing it. Central to this modification will be the notion of splitting introduced in the previous section. In this section we give the basic prerequisites and outline for Schützenberger's proof which will eventually culminate in a formal justification of splitting based on the properties of aperiodic monoids.

Let $M$ be a finite aperiodic monoid recognizing some language $L \subseteq \Sigma^*$. Recall that $L = \bigcup_{m \in S} \varphi^{-1}(m)$ for some homomorphism $\varphi : \Sigma^* \to M$ and subset $S \subseteq M$. The proof proceeds by induction on the rank of a monoid element $m \in M$, defined as $\rho(m) := |M - MmM|$, that is, the number of elements not in the ideal$^{17}$ $MmM = \{amb : a \in M, b \in M\}$. For example, $\rho(1) = 0$. Rank is a particularly useful measure of progress in the induction due to the following proposition:

Proposition 41. For any $p, q \in M$ we have $\rho(p), \rho(q) \le \rho(pq)$.

Proof. $MpqM \subseteq MpM$, so $M - MpqM \supseteq M - MpM$. Therefore, $\rho(p) \le \rho(pq)$. Similarly, $\rho(q) \le \rho(pq)$. □

It will turn out that only the identity of the monoid M has rank 0. First, we show that a product of monoid elements is the identity if and only if every element is the identity.

$^{17}$ Let $M$ be a monoid and $I \subseteq M$ be a subset. We say $I$ is a right ideal if $IM = I$, $I$ is a left ideal if $MI = I$, and $I$ is an ideal if $MIM = I$. For example, for any $m \in M$, $mM$ is a right ideal, $Mm$ is a left ideal, and $MmM$ is an ideal.

Proposition 42. Let $p_1, \ldots, p_n \in M$ be elements in an aperiodic monoid $M$. If $p_1 \cdots p_n = 1$, then $p_1 = \cdots = p_n = 1$.

Proof. It suffices to prove the result for $n = 2$ and induct. Suppose $1 = pq$, and then by repeated substitution,

$$1 = pq = p1q = p^2q^2 = \cdots = p^iq^i$$

for any $i$. Since the monoid is aperiodic, there exists $n \ge 0$ such that $p^{n+1} = p^n$. Therefore,

$$p = p(p^nq^n) = p^{n+1}q^n = p^nq^n = 1.$$

By symmetry, $q$ is also the identity.

Corollary 43. Let $M$ be a finite aperiodic monoid. For any $m \in M$, $\rho(m) = 0$ iff $m = 1$.

Proof. Suppose that $\rho(m) = 0$ for some monoid element $m \in M$. By the definition of rank, we have that $M = MmM$, and in particular $1 \in M$ implies $1 = amb$ for some $a, b \in M$. By Proposition 42, $a = b = m = 1$. □
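To make rank concrete, the following sketch computes $\rho(m) = |M - MmM|$ by brute force for one small aperiodic monoid. The monoid chosen (the transition monoid of a 3-state DFA for "no two consecutive 1s") and all names are assumptions made for illustration; the sketch only spot-checks Proposition 41 and Corollary 43 on this example.

```python
from itertools import product

IDENTITY = (0, 1, 2)
GENERATORS = [(0, 0, 2), (1, 2, 2)]  # hypothetical images of the letters 0 and 1


def compose(f, g):
    """Monoid product fg: apply transformation f first, then g."""
    return tuple(g[q] for q in f)


def generate_monoid(generators, identity):
    """Close the generators under right multiplication, starting from the identity."""
    elements, frontier = {identity}, [identity]
    while frontier:
        m = frontier.pop()
        for g in generators:
            mg = compose(m, g)
            if mg not in elements:
                elements.add(mg)
                frontier.append(mg)
    return elements


M = generate_monoid(GENERATORS, IDENTITY)


def two_sided_ideal(m):
    """The ideal MmM = {amb : a, b in M}."""
    return {compose(compose(a, m), b) for a, b in product(M, repeat=2)}


def rank(m):
    """rho(m) = number of monoid elements outside MmM."""
    return len(M - two_sided_ideal(m))
```

In this example the zero element `(2, 2, 2)` has the largest rank, since its ideal contains only itself.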

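It is not hard to see that $\varphi^{-1}(1)$ is star free. For $\rho(m) > 0$, Schützenberger decomposes $\varphi^{-1}(m)$ as follows.

Theorem 44 (Decomposition Theorem). For any $m \in M$,

$$\varphi^{-1}(m) = (U\Sigma^* \cap \Sigma^*V) \setminus (\Sigma^*C\Sigma^* \cup \Sigma^*W\Sigma^*),$$

where

$$U = \bigcup_{(r,a) \in E} \varphi^{-1}(r)a, \qquad V = \bigcup_{(a,r) \in F} a\varphi^{-1}(r),$$

$$C = \{a \in \Sigma : m \notin M\varphi(a)M\}, \qquad W = \bigcup_{(a,r,b) \in G} a\varphi^{-1}(r)b,$$

and

$$E = \{(r, a) \in M \times \Sigma : r\varphi(a)M = mM,\ rM \neq mM\},$$

$$F = \{(a, r) \in \Sigma \times M : M\varphi(a)r = Mm,\ Mr \neq Mm\},$$

$$G = \{(a, r, b) \in \Sigma \times M \times \Sigma : m \in (M\varphi(a)rM \cap Mr\varphi(b)M) \setminus M\varphi(a)r\varphi(b)M\}.$$

Furthermore, for all $r \in M$ appearing in $E$, $F$, or $G$, $\rho(r) < \rho(m)$.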

Although Theorem 44 is sufficient to prove Schützenberger's theorem, the same inductive approach does not immediately lead to a quantum algorithm for star-free languages. For example, it is not clear how to efficiently decide membership in $U\Sigma^*$ given an algorithm for membership in $U$.$^{18}$ In the next section, we will strengthen our induction hypothesis such that queries of this type are possible. Let us conclude this section with a splitting theorem based on Schützenberger's notion of rank.

Theorem 45. Let $L = \varphi^{-1}(m)$ for monoid element $m \in M$. Then, $L$ splits as

$$L = \bigcup_{pq=m} \varphi^{-1}(p)\varphi^{-1}(q).$$

Furthermore, for all elements of the union, $\rho(p), \rho(q) \le \rho(m)$.

$^{18}$ We will show this is possible, but it requires that the language is regular. In general, a $\tilde{O}(\sqrt{n})$-query algorithm for a language $L$ does not imply a $\tilde{O}(\sqrt{n})$-query algorithm for $L\Sigma^*$. We have a counterexample: consider the language $L$ of strings of the form $\#x_0\#x_1\#x_2\#\cdots\#x_k\#$ such that all $x_i$ are binary strings of the same length and $x_k = x_i$ for some $i < k$. $L$ can be decided in $\tilde{O}(\sqrt{n})$ queries by a Grover search. There is a clear reduction from element distinctness to $L\Sigma^*$, therefore $Q(L\Sigma^*)$ is at least $\Omega(n^{2/3})$.

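Proof. We first verify equality. We have that $L \supseteq \bigcup_{pq=m} \varphi^{-1}(p)\varphi^{-1}(q)$ since

$$\varphi(\varphi^{-1}(p)\varphi^{-1}(q)) = \varphi(\varphi^{-1}(p))\,\varphi(\varphi^{-1}(q)) = pq = m.$$

Furthermore,

$$\bigcup_{pq=m} \varphi^{-1}(p)\varphi^{-1}(q) \supseteq \varphi^{-1}(m)\varphi^{-1}(1) = L.$$

Now, suppose $x \in L$. For any decomposition $x = uv$, we have that $m = \varphi(x) = \varphi(uv) = \varphi(u)\varphi(v)$, so $u \in \varphi^{-1}(p)$ and $v \in \varphi^{-1}(q)$ for $p = \varphi(u)$ and $q = \varphi(v)$, which satisfy $pq = m$. Finally, $\rho(p), \rho(q) \le \rho(pq) = \rho(m)$ by Proposition 41. □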

3.5.2 $\tilde{O}(\sqrt{n})$ algorithm for star-free languages

Recall that our objective is to create an $\tilde{O}(\sqrt{n})$ algorithm for language $\varphi^{-1}(m)$, where $m \in M$ is an arbitrary monoid element. We mimic Schützenberger's proof of Theorem 21 by constructing algorithms for each $\varphi^{-1}(m)$ in the order of the rank of $m$. Implicit in such an argument is a procedure that must convert an efficient query algorithm for $\varphi^{-1}(r)$ into one for $\varphi^{-1}(r)a\Sigma^*$.

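Lemma 46. Let $\varphi : \Sigma^* \to M$ be a monoid homomorphism. Suppose there exists an $\tilde{O}(\sqrt{n})$ membership algorithm for $\varphi^{-1}(m)$ for every $m \in M$ with $\rho(m) < k$. Then there exists an $\tilde{O}(\sqrt{n})$ membership algorithm for $L = \varphi^{-1}(r)a\Sigma^*$ for any $r \in M$ and $a \in \Sigma$ such that $\rho(r) < k$ and $rM \neq r\varphi(a)M$.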

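Proof. Consider a string $x \in \Sigma^*$. The right ideal $\varphi(x_1 \cdots x_i)M$ represents the set of monoid elements we could reach after reading $x_1 \cdots x_i$. These right ideals descend as we read more of the string:

$$M = \varphi(\varepsilon)M \supseteq \varphi(x_1)M \supseteq \varphi(x_1x_2)M \supseteq \cdots \supseteq \varphi(x_1 \cdots x_n)M.$$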

If $x \in L$, then there is some prefix $y$ in $\varphi^{-1}(r)$ which is followed by an $a$.

Notice that $r \in r\varphi(a)M$ implies $rM \subseteq r\varphi(a)M$, and since we have $rM \neq r\varphi(a)M$, we conclude that $r \notin r\varphi(a)M$. In other words, the right ideal descends from something containing $r$ (namely $rM$), to something not containing $r$ (namely $r\varphi(a)M$).

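To decide whether $x$ belongs to $L$, it suffices to find the longest prefix $x_1 \cdots x_i$ such that $r \in \varphi(x_1 \cdots x_i)M$, that is, the longest prefix belonging to the language

$$K := \bigcup_{s \,:\, r \in sM} \varphi^{-1}(s).$$

This is precisely the language of strings/prefixes that could be extended to strings in $\varphi^{-1}(r)$. We can decide membership in $K$ with $\tilde{O}(\sqrt{n})$ queries because $r \in sM$ implies $MrM \subseteq MsM$ and hence $\rho(s) \le \rho(r) < k$.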

It is also clear that $K$ is prefix closed: if $x_1 \cdots x_i \in K$ then $r \in \varphi(x_1 \cdots x_i)M \subseteq \varphi(x_1 \cdots x_{i-1})M$, so $x_1 \cdots x_{i-1} \in K$ as well. The empty prefix is in $K$, and by binary search we can find the longest prefix in $K$. Then, as discussed above, we complete the algorithm by checking whether the prefix is (i) in $\varphi^{-1}(r)$ and (ii) followed by an $a$. If so, then we report $x \in L$, otherwise $x \notin L$. □

We are now ready to state and prove our main theorem.

Theorem 47. For any star-free language $L \subseteq \Sigma^*$, there exists a quantum algorithm which solves membership in $L$ with $\tilde{O}(\sqrt{n})$ queries and $\tilde{O}(\sqrt{n})$ time.

Proof. Let $L = \bigcup_{m \in S} \varphi^{-1}(m)$ for some homomorphism $\varphi : \Sigma^* \to M$ to an aperiodic finite monoid $M$, and $S \subseteq M$. We will show that there is an algorithm for each $\varphi^{-1}(m)$ by induction on the rank of $m$.

Suppose first that $\rho(m) = 0$, implying that $m$ is the identity by Corollary 43. From Proposition 42, we know that a string is in $\varphi^{-1}(1)$ exactly when every character is in $\varphi^{-1}(1)$, i.e.,

$$\varphi^{-1}(1) = \{a \in \Sigma : \varphi(a) = 1\}^*.$$

We can Grover search for a counterexample in $O(\sqrt{n})$ time to decide membership in $\varphi^{-1}(1)$.

Now suppose $\rho(m)$ is nonzero. Our main tool is Theorem 44, which decomposes

$$\varphi^{-1}(m) = (U\Sigma^* \cap \Sigma^*V) \setminus (\Sigma^*C\Sigma^* \cup \Sigma^*W\Sigma^*),$$

where $U, V, C, W \subseteq \Sigma^*$ are as they appear in that theorem statement. We will also make reference to sets $E$, $F$, $G$ from Theorem 44.

To give an algorithm for $\varphi^{-1}(m)$, it suffices to give an algorithm for each component of this Boolean combination: $U\Sigma^*$, $\Sigma^*V$, $\Sigma^*C\Sigma^*$ and $\Sigma^*W\Sigma^*$. Since $U$, $V$, and $W$ are finite unions of simpler languages, it suffices to consider each language in the union separately.

The first component is $U\Sigma^*$, but we have already done most of the work for $U\Sigma^*$ in Lemma 46. Recall

$$U\Sigma^* = \bigcup_{(r,a) \in E} \varphi^{-1}(r)a\Sigma^*,$$

where $E = \{(r, a) \in M \times \Sigma : r\varphi(a)M = mM,\ rM \neq mM\}$. This gives us an $\tilde{O}(\sqrt{n})$-time algorithm for $U\Sigma^*$. By symmetry, there also exists an algorithm for $\Sigma^*V$. Recall that $C = \{a \in \Sigma : m \notin M\varphi(a)M\}$ is a finite set of characters, so membership in $\Sigma^*C\Sigma^*$ is decided by a Grover search for any of those characters.

The last component is $\Sigma^*W\Sigma^*$, where $W$ consists of a union of languages of the form $a\varphi^{-1}(r)b$ with $(a, r, b) \in G$. That is, $m \in M\varphi(a)rM$ and $m \in Mr\varphi(b)M$ but $m \notin M\varphi(a)r\varphi(b)M$. We can use Theorem 45 to split each such language into

$$\bigcup_{pq=r} a\varphi^{-1}(p)\,\varphi^{-1}(q)b.$$

We hope to apply Lemma 46 to $\varphi^{-1}(q)b\Sigma^*$ and (in reverse) $\Sigma^*a\varphi^{-1}(p)$, then use infix search (i.e., Theorem 40) to try to find a substring in $W$, but first we need to verify that all the preconditions of these theorems are met: namely, that the ranks of $p$ and $q$ are small, and that $a$ and $b$ cause the ideal to descend.

First, the decomposition theorem (Theorem 44) gives that $\rho(r) < \rho(m)$, and by Proposition 41, $\rho(p), \rho(q) \le \rho(r)$. Next, suppose that $q\varphi(b)M = qM$. It follows that

$$M\varphi(a)rM = M\varphi(a)pqM = M\varphi(a)pq\varphi(b)M = M\varphi(a)r\varphi(b)M,$$

but we know $m$ is in $M\varphi(a)rM$ and not in $M\varphi(a)r\varphi(b)M$, so we have a contradiction from the definition of $G$. Hence, $q\varphi(b)M \neq qM$, and by a symmetric argument $M\varphi(a)p \neq Mp$, so we have $\tilde{O}(\sqrt{n})$-query algorithms for $\Sigma^*a\varphi^{-1}(p)$ and $\varphi^{-1}(q)b\Sigma^*$ by Lemma 46. Infix search (Theorem 40) then gives an $\tilde{O}(\sqrt{n})$-query algorithm for $\Sigma^*W\Sigma^*$.

This finishes the main theorem for this section. See Algorithm 1 for pseudocode.
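As a classical illustration of the prefix-checking step from Lemma 46 (the PREFIXCHECK routine of Algorithm 1), the sketch below binary searches for the longest prefix lying in the prefix-closed language $K$ and then inspects the next symbol. The example monoid and all names are assumptions, and the direct evaluation of `phi` stands in for the recursive quantum membership tests.

```python
from functools import reduce

# Hypothetical running example: transition monoid of a 3-state DFA for
# "binary strings with no two consecutive 1s" (an aperiodic monoid).
DELTA = {"0": (0, 0, 2), "1": (1, 2, 2)}
IDENTITY = (0, 1, 2)


def compose(f, g):
    """Monoid product fg: apply transformation f first, then g."""
    return tuple(g[q] for q in f)


def phi(x):
    return reduce(compose, (DELTA[c] for c in x), IDENTITY)


def generate_monoid():
    elements, frontier = {IDENTITY}, [IDENTITY]
    while frontier:
        m = frontier.pop()
        for g in DELTA.values():
            mg = compose(m, g)
            if mg not in elements:
                elements.add(mg)
                frontier.append(mg)
    return elements


M = generate_monoid()


def right_ideal(s):
    """The right ideal sM."""
    return {compose(s, m) for m in M}


def prefix_check(x, r, a):
    """Decide x in phi^{-1}(r) a Sigma^*, assuming rM != r phi(a) M."""
    K = {s for s in M if r in right_ideal(s)}   # monoid elements whose preimages form K
    lo, hi = 0, len(x)
    while lo < hi:                              # longest prefix x[:lo] in K
        mid = (lo + hi + 1) // 2
        if phi(x[:mid]) in K:                   # stand-in for a quantum test for K
            lo = mid
        else:
            hi = mid - 1
    return lo < len(x) and x[lo] == a and phi(x[:lo]) == r
```

The binary search is only valid because $K$ is prefix closed; without the precondition $rM \neq r\varphi(a)M$ the longest prefix in $K$ need not be the one that matters.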

3.6 Dichotomy Theorems

In this section, we prove a dichotomy result for block sensitivity. This will be important for the next logical step in the trichotomy theorem: proving lower bounds to match our upper bounds in Section 3.7. The core of this section is a dichotomy theorem for sensitivity, namely that the sensitivity is either $O(1)$ or $\Omega(n)$. This implies an identical dichotomy for block sensitivity, from which the $\Omega(\sqrt{n})$ lower bound on approximate degree follows for all nontrivial languages.

Regular languages are closed under an astonishing variety of natural operations. Our $\Omega(\sqrt{n})$ lower bound begins with one such closure property. Recall that a symbol in a string is sensitive with respect to some input $x$ if changing only that symbol changes the value of the function.

Algorithm 1 Star Free Language Algorithm
▷ The monoid $M$, alphabet $\Sigma$, and homomorphism $\varphi : \Sigma^* \to M$ are fixed and known.

function INFIXSEARCH($x = x[1..n]$, pred)
    ▷ Searches for a substring matching the predicate pred. See Theorem 40.
    for $\ell = 1, 2, 4, \ldots, n$ do
        Grover search for $i$ such that pred($x[\max(1, i-\ell+1)..i]$, $x[i+1..\min(i+\ell, n)]$) is true
        if $i$ found then return TRUE
    return FALSE

function PREFIXCHECK($x, r, a$)
    ▷ This function decides whether $x \in \varphi^{-1}(r)a\Sigma^*$ as described in Lemma 46.
    $H \leftarrow \{s \in M : r \in sM\}$
    Binary search for largest $1 \le i < n$ satisfying $\bigvee_{s \in H}$ MAIN($x[1..i]$, $s$)
    return ($x[i+1] = a$) $\wedge$ MAIN($x[1..i]$, $r$)

function RIGHTIDEAL($x, m$)
    ▷ Checks if $x$ is in $U\Sigma^*$.
    $E \leftarrow \{(r, a) \in M \times \Sigma : r\varphi(a)M = mM,\ rM \neq mM\}$
    for $(r, a) \in E$ do
        if PREFIXCHECK($x, r, a$) then return TRUE
    return FALSE

▷ Define SUFFIXCHECK and LEFTIDEAL likewise. Details omitted.

function IDEAL($x, m$)
    ▷ Checks if $x$ is in $\Sigma^*W\Sigma^*$.
    $G \leftarrow \{(a, r, b) \in \Sigma \times M \times \Sigma : m \in (M\varphi(a)rM \cap Mr\varphi(b)M) \setminus M\varphi(a)r\varphi(b)M\}$
    for $(a, r, b) \in G$ do
        if INFIXSEARCH($x$, $(x_1, x_2) \mapsto \bigvee_{pq=r}$ SUFFIXCHECK($x_1, p, a$) $\wedge$ PREFIXCHECK($x_2, q, b$)) then return TRUE
    return FALSE

function MAIN($x = x[1..n]$, $m$)
    ▷ Decides whether $x$ is in $\varphi^{-1}(m)$.
    if $m = 1$ then
        return $\neg$GROVERSEARCH($\{1, \ldots, n\}$, $i \mapsto \varphi(x[i]) \neq 1$)
    else
        return LEFTIDEAL($x, m$) $\wedge$ RIGHTIDEAL($x, m$) $\wedge$ $\neg$IDEAL($x, m$) $\wedge$ $\neg$GROVERSEARCH($\{1, \ldots, n\}$, $i \mapsto m \notin M\varphi(x[i])M$)

Theorem 48. Let $L \subseteq \Sigma^*$ be a regular language. Define the language $S_L \subseteq \{0, 1\}^*$ of all sensitivity masks as follows.

$$S_L := \{y \in \{0, 1\}^* : \exists x \in \Sigma^* \text{ such that } |x| = |y| \text{ and } x_i \text{ is sensitive in } L \text{ iff } y_i = 1\}$$

Then, $S_L$ is regular.

Proof. This is an exercise in using non-determinism, but since there are a few levels, let us spell out the details. First, let us show that the following language is regular:

$$S'_L := \{(x_1, y_1) \cdots (x_n, y_n) \in (\Sigma \times \{0, 1\})^* : y \text{ is the sensitivity mask of } x\}.$$

How do we go about proving a string is not in $S'_L$? There are two possibilities:

* Find some $i$ such that $y_i = 0$, but changing $x_i$ flips membership in the language.

* Find some $i$ such that $y_i = 1$, but all possible changes to $x_i$ fail to flip membership in the language.

Each of these can be checked by a co-non-deterministic finite automaton. In the first case, we guess a position $i$ where $y_i = 0$, guess the new value of $x_i$, simulate the DFA on both paths and verify that they produce different outcomes. In the second case, we also guess a position $i$ where $y_i = 1$, but now simulate the original DFA for all possible values of $x_i$ and ensure that they are the same. Since there is a coNFA for $S'_L$, we get that $S'_L$ is regular.

Now use non-determinism to reduce $S'_L$ to $S_L$: a string $y_1 \cdots y_n \in \{0, 1\}^*$ is in $S_L$ if we can guess the accompanying $x_1 \cdots x_n \in \Sigma^*$ that puts it in $S'_L$. We conclude that there is an NFA accepting $S_L$, and therefore $S_L$ is regular. □
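The definition of a sensitivity mask can be checked by brute force on small inputs. The sketch below is an illustration of the definition only, not the automaton construction from the proof; the example language $\Sigma^*11\Sigma^*$ over $\{0, 1\}$ and the function names are assumptions.

```python
from itertools import product

SIGMA = "01"


def in_L(x):
    """Hypothetical example language: binary strings containing 11."""
    return "11" in x


def sensitivity_mask(x):
    """Bit i is 1 iff some change to x[i] alone flips membership in L."""
    bits = []
    for i, c in enumerate(x):
        flips = any(in_L(x[:i] + d + x[i + 1:]) != in_L(x) for d in SIGMA if d != c)
        bits.append("1" if flips else "0")
    return "".join(bits)


def masks_of_length(n):
    """The length-n slice of S_L, by enumerating every input x."""
    return {sensitivity_mask("".join(t)) for t in product(SIGMA, repeat=n)}


def sensitivity(n):
    """Sensitivity of L on length-n inputs: the maximum Hamming weight of a mask."""
    return max(mask.count("1") for mask in masks_of_length(n))
```

This enumeration takes exponential time; it is only meant to make the objects in Theorem 48 and Corollary 49 tangible.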

Corollary 49. Let $L$ be a flat regular language. The sensitivity of $L$ is either $O(1)$ or $\Omega(n)$.

Proof. Consider the language of sensitivity masks $S_L$ as defined in Theorem 48. Notice that for a given length $n$, the sensitivity of $L$ is exactly the weight of the maximum Hamming weight string in $S_L$. Suppose the sensitivity is not $O(1)$. Therefore, for any $k$, there exists a string $y_k \in S_L$ with Hamming weight at least $k$. Since $S_L$ is a regular language, it has some pumping length$^{19}$ $p$. We can pump down any block of $p$ consecutive zero bits in $y_k$ such that at least a $1/p$ fraction of the remaining bits are sensitive (or $n < kp$). This implies that sensitivity is $\Omega(n)$ for infinitely many $n$. We can also pump down arbitrary blocks of $p$ bits to decrease the length, so we can make sure sensitivity is $\Omega(n)$ for at least a $1/p$ fraction of $n$. Finally, since $L$ is flat, congruence classes contain strings of all lengths, which allows us to replace some substring of an $\Omega(n)$ sensitive string with a slightly longer or shorter string. In this way, we can construct strings of sensitivity $\Omega(n)$ for all $n$. □

Corollary 50. Let $L$ be a flat regular language. The block sensitivity of $L$ is either $O(1)$ or $\Omega(n)$.

Proof. By Corollary 49, sensitivity is either $O(1)$ or $\Omega(n)$. If sensitivity is $O(1)$ then block sensitivity and all other measures are $O(1)$ by Corollary 4. However, $s(f) \le bs(f)$, so if sensitivity is $\Omega(n)$ then block sensitivity is $\Omega(n)$. It follows that block sensitivity is either $O(1)$ or $\Omega(n)$. □

It follows that the certificate complexity, deterministic complexity, randomized zero-error complexity, and randomized complexity are also $O(1)$ or $\Omega(n)$.

Theorem 51. Let $L$ be a flat regular language. The approximate degree of $L$ is either $O(1)$ or $\Omega(\sqrt{n})$.

Proof. Consider block sensitivity. If block sensitivity is $O(1)$, then so are approximate degree and quantum query complexity by Corollary 4. If block sensitivity is $\Omega(n)$, then we recall that $\widetilde{\deg}(L) = \Omega(\sqrt{bs(L)}) = \Omega(\sqrt{n})$ by Theorem 2. Furthermore, $\frac{1}{2}\widetilde{\deg}(L) \le Q(L)$ by Proposition 1, so quantum query complexity is also $\Omega(\sqrt{n})$. □

It follows that $Q(L)$ is either $O(1)$ or $\Omega(\sqrt{n})$.

$^{19}$ Let $L \subseteq \Sigma^*$ be a regular language. There exists a finite pumping length $p > 0$ such that for all strings $w \in \Sigma^*$ with $|w| \ge p$ there exists a decomposition $w = xyz$ for $x, y, z \in \Sigma^*$ with $|y| > 0$ such that $w \in L \iff (\forall i \ge 0,\ xy^iz \in L)$. This (or a similar statement) is called the "pumping lemma" since the substring $y$ may be repeated ("pumped") arbitrarily many times.

3.7 Lower Bounds

In this section, we will show matching lower bounds for the algorithms described in Section 3.5. In fact, since approximate degree is a lower bound for quantum query complexity, it suffices to prove lower bounds for approximate degree, which is what we will do. Let us start with the simplest case: lower bounds on non-degenerate languages.

Proposition 52. Let $L$ be a flat regular language. If $L$ is not degenerate, then $\widetilde{\deg}(L) \ge 1$.

Proof. Let $\varphi : \Sigma^* \to M_L$ be the homomorphism onto the syntactic monoid of $L$ such that $L = \{\varphi^{-1}(s) : s \in S \subseteq M_L\}$. Since $L$ is not degenerate, there exist $x, y \in \Sigma^+$ such that $\varphi(x) \neq \varphi(y)$. By the definition of the syntactic congruence, there exist strings $u, v \in \Sigma^+$ such that $u \in L$ but $v \notin L$. Since $L$ is flat, each set $\varphi^{-1}(\varphi(u))$ and $\varphi^{-1}(\varphi(v))$ contains strings of all positive lengths. Therefore, any polynomial approximating the membership function for $L$ cannot be constant. □

For the trivial languages, we first prove a theorem about their deterministic complexity. Recall that a deterministic query algorithm is a decision tree: on input $x \in \Sigma^n$, the algorithm queries a particular index of the input. Based on the value of $x$ at that index (one of finitely many possible choices), the algorithm either deduces the membership of $x$ in $L$ or decides to query a different index. The process is repeated until the algorithm can decide membership. The height of the decision tree is the deterministic query complexity of $L$. In particular, if the deterministic query complexity of $L$ is constant, then the height of the decision tree is constant, which implies that the entire tree has constant size (since each node in the tree has constant fan-out).

Theorem 53. Let $L$ be a flat regular language. If $L$ is not trivial, then $D(L) = \omega(1)$.

Proof. We will argue the contrapositive. Suppose $D(L) = O(1)$. That is, for any input $x \in \Sigma^n$, the deterministic algorithm queries a constant-size set of indices to determine membership. Clearly, as $n$ increases, there will be large gaps between the indices which are queried. Since $L$ is flat we have two important consequences: first, any nonempty string which is not queried can correspond to any non-identity element of the syntactic monoid; second, we may assume that any gap of nonzero size can be expanded or contracted to any other nonzero size. It follows that we can move the queries made by the deterministic algorithm (provided that we do not create or destroy any gaps) without changing its correctness. Therefore, let us move all the queries as close to the start or end of the input as possible, maintaining 1-symbol gaps where necessary. Since there are only constantly many queries, there exists a deterministic algorithm which determines membership of $x$ by querying $c$ symbols from the start and end of $x$ for some constant $c$.

Let $\varphi$ be the homomorphism from $\Sigma^*$ onto the syntactic monoid $M_L$ such that $L = \{\varphi^{-1}(s) : s \in S \subseteq M_L\}$. For $x \in \Sigma^*$ of length greater than $2c$, write $x = uwv$ such that $|u| = |v| = c$. We have that membership of $x$ in $L$ is determined completely by prefix $u$ and suffix $v$. We claim that this implies that $\varphi(uwv) = \varphi(uw'v)$ for all $w, w' \in \Sigma^*$. For suppose that $\varphi(uwv) \neq \varphi(uw'v)$. By the definition of the syntactic congruence, there exist strings $a \in \Sigma^*$ and $b \in \Sigma^*$ such that $auwvb \in L$ and $auw'vb \notin L$ (or vice versa). Since $|au| > 0$ and $|vb| > 0$, there exist strings $a_u, b_v \in \Sigma^c$ such that $\varphi(a_u) = \varphi(au)$ and $\varphi(b_v) = \varphi(vb)$. However, $a_uwb_v \in L$ and $a_uw'b_v \notin L$ contradicts the fact that membership in $L$ is determined by a prefix and suffix of length at most $c$. In particular, this holds when $w' = \varepsilon$.

Let us now show that $\varphi(\Sigma^+)$ is a rectangular band. Let $x, y, z \in \Sigma^+$ be nonempty strings, and let $x', z'$ be strings of length $c$ such that $\varphi(x) = \varphi(x')$ and $\varphi(z) = \varphi(z')$. We have that $\varphi(xyz) = \varphi(x'yz') = \varphi(x'z') = \varphi(xz)$.

Finally, we show that $\varphi(\Sigma^+)$ is idempotent. Let $x \in \Sigma^+$. By flatness, we have that $\varphi(x) = \varphi(awb)$ for strings $a, b \in \Sigma^c$. Therefore, we have

$$\varphi(x) = \varphi(awb) = \varphi(ab) = \varphi(abab) = \varphi(xx),$$

where the middle two equalities come from substituting $w = \varepsilon$ and $w = ba$, respectively. □

Corollary 54. Let $L$ be a flat regular language. If $L$ is not trivial, then $\widetilde{\deg}(L) = \Omega(\sqrt{n})$.

Proof. The corollary follows almost immediately from Theorems 51 and 53. Suppose $\widetilde{\deg}(L) = o(\sqrt{n})$. We wish to show that $L$ is trivial. If $D(L) = O(1)$, then we are done by Theorem 53. If $D(L) = \omega(1)$, then approximate degree is also non-constant by Corollary 4. But if $\widetilde{\deg}(L)$ is non-constant, then we must have $\widetilde{\deg}(L) = \Omega(\sqrt{n})$ by Theorem 51. □

Finally, we turn our attention to the star-free languages. Let $\mathrm{MOD}_p$ be the language of bit strings whose Hamming weight is 0 modulo some fixed $p \ge 2$. We need the following theorem:

Theorem 55 (Beals et al. [18]). $\widetilde{\deg}(\mathrm{MOD}_p) = \Omega(n)$ for any $p \ge 2$.

Recall that star-free languages are aperiodic. Therefore, if a language is not star free, then it should exhibit some periodicity in which we can embed some $\mathrm{MOD}_p$ language. We appeal to this intuition in the following theorem.

Theorem 56. Let $L$ be a flat regular language. If $L$ is not star free, then $\widetilde{\deg}(L) = \Omega(n)$.

Proof. Let $M_L$ be the syntactic monoid of $L$, and let $\varphi : \Sigma^* \to M_L$ be the accompanying surjection onto $M_L$. We assume $M_L$ is not aperiodic, so there exists an element $s \in M_L$ such that $s^n \neq s^{n+1}$ for any $n$. Since $M_L$ is finite, we have $s^n = s^{n+p}$ for some $p$ and $n$, and therefore for all sufficiently large $n$. Let us take the minimal $p$ so that $s^n \neq s^{n+i}$ for $0 < i < p$.

Since the language is flat, there exist $a_0, a_1, b \in \Sigma$ such that $\varphi(a_0) = s^p$, $\varphi(a_1) = s$ and $\varphi(b) = s^n$. One might worry that if $s^p$ is equal to the identity, its only preimage is the empty string, as is sometimes true for flat languages. However, because $\varphi(a_1^p) = \varphi(a_1)^p = s^p$, this is not the case. Given string $x \in \{0, 1\}^m$, observe that

$$\varphi(a_{x_1}a_{x_2} \cdots a_{x_m}b) = s^{x_1 + x_2 + \cdots + x_m + n},$$

since $s^{n+p} = s^n$. In other words, the monoid element associated with $a_{x_1} \cdots a_{x_m}b$ is determined by the Hamming weight of $x$ modulo $p$. Therefore, to decide membership of $x$ in $\mathrm{MOD}_p$, it suffices to compute the monoid element for $a_{x_1} \cdots a_{x_m}b$ in $M_L$.

Finally, by the definition of syntactic congruence, any two monoid elements may be distinguished by prepending and appending fixed strings to the input, then testing membership in L. By flatness, we may take those strings to be length zero or one.

Thus, we can determine the monoid element by a constant number of queries to $L$, and therefore compute the Hamming weight modulo $p$. It follows that membership in $L$ has approximate degree $\Omega(n)$ by Theorem 55.

3.8 Context-Free Languages

In this section we will prove that the context-free languages (a slightly larger class of languages containing the regular languages) have query complexities outside the trichotomy. The context-free languages are most often defined either through context-free grammars or through pushdown automata (PDA). It will be easier for us to work with the PDA definition in this section.

One can think of a PDA as a nondeterministic Turing machine which has a read-once input tape and read-write stack. Although the addition of the stack allows PDA to recognize many languages which are not regular, they are still limited in many senses. For instance, context-free languages exhibit a pumping lemma much like the regular languages, and the membership problem is decidable. For a more formal definition we refer the reader to introductory texts [79].

As a simple example, consider the Dyck language over alphabet $\Sigma = \{(, )\}$, which consists of all words with balanced parentheses. We can show this language is context free by constructing a PDA for it. The idea is that the stack contains all of the unmatched left parentheses. When a new parenthesis is read from the input tape, the PDA pushes it onto the stack if it is a left parenthesis or pops an item from the stack if there is a right parenthesis. The PDA accepts if the stack is empty when the input is read entirely.
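The PDA just described is deterministic, so it can be simulated directly with an explicit stack. The following is a minimal sketch; the function name `is_dyck` is an assumption, and the early rejection on an empty stack corresponds to the PDA having no valid move.

```python
def is_dyck(word):
    """Simulate the PDA for the Dyck language over the alphabet {(, )}."""
    stack = []
    for symbol in word:
        if symbol == "(":
            stack.append(symbol)      # push each unmatched left parenthesis
        elif symbol == ")":
            if not stack:
                return False          # nothing to match: the PDA gets stuck
            stack.pop()               # match with the most recent left parenthesis
        else:
            raise ValueError("symbol outside the alphabet")
    return not stack                  # accept iff the stack is empty at the end
```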

3.8.1 Context-free languages do not obey the trichotomy

In general, the easiest way to construct a language with arbitrary query complexity is by padding a hard language. The procedure is simple: take a problem with $\Omega(n)$ query complexity, e.g., parity, and make the input string longer by adding (or padding) irrelevant symbols to the end. For instance, computing the parity of the first $\Theta(n^{2/3})$ bits and ignoring the rest will require $\Theta(n^{2/3})$ queries.
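The padding example can be made concrete as follows. This is a sketch under an assumed convention: the relevant prefix has length $\lfloor n^{2/3} \rfloor$, computed with integer arithmetic to avoid floating-point rounding, and both function names are hypothetical.

```python
def prefix_length(n):
    """Largest integer k with k**3 <= n**2, i.e. floor(n ** (2/3))."""
    k = round((n * n) ** (1.0 / 3.0))
    while k ** 3 > n * n:             # correct any floating-point overshoot
        k -= 1
    while (k + 1) ** 3 <= n * n:      # correct any floating-point undershoot
        k += 1
    return k


def padded_parity(x):
    """Parity of the first floor(n^(2/3)) bits of x; the remaining bits are padding."""
    k = prefix_length(len(x))
    return sum(int(bit) for bit in x[:k]) % 2 == 1
```

Only the first `prefix_length(len(x))` positions ever influence the answer, which is the source of the $\Theta(n^{2/3})$ query complexity.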

Unfortunately, to create a context-free language with arbitrary query complexity, we cannot take such a direct approach. Context-free languages cannot simply count out some fraction of their input as the above example suggests. Instead, let us consider a general procedure for constructing a context-free language $L \subseteq \Sigma^*$ which has quantum query complexity $\Theta(n^c)$ for some $c \in [1/2, 1]$. We construct $L$ from the union of two context-free languages $A$ and $B$. To test membership of some $x \in \Sigma^*$ in $L$, we first test whether or not $x$ belongs to $A$. We always construct $A$ in such a way that membership in $A$ can be decided in $O(\sqrt{n})$ queries, usually through a simple Grover search.$^{20}$ If $x \in A$, then we are done. Otherwise, we can assume that $x \notin A$ when testing membership in $B$. However, $A$ is constructed such that $x \notin A$ will imply that $x$ has been "padded": there is some special symbol in $x$ such that the distance from that symbol to the beginning of the string is approximately $n^c$. Therefore, if $B$ is the language of all strings such that the prefix before the special symbol has even parity, then the query complexity of $L = A \cup B$ is $\Theta(n^c)$.

Let us consider an example of such a language $A \subseteq (\Sigma \cup \{\#\})^*$. First, we enforce that every word in $A$ begins and ends with $\#$. Next, we say that $x \in A$ iff there is some substring $\#y\#$ of $x$ such that $y \in \Sigma^*$ and the length of $y$ is not equal to the total number of $\#$ symbols in $x$. Notice that $x \notin A$ implies that $x = \#x_1\#x_2\#\cdots\#x_k\#$ where $|x_i| \approx \sqrt{n}$. Furthermore, $A$ is context free and the quantum query complexity of $A$ is $\Theta(\sqrt{n})$ by Grover search.
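A direct classical membership test for this example language can be sketched as follows. The binary alphabet and the function name `in_A` are assumptions; a quantum algorithm would instead Grover search over the blocks for one of the wrong length.

```python
def in_A(x, sigma="01"):
    """x is in A iff x begins and ends with '#' and some block between
    consecutive '#' symbols has length different from the number of '#'s."""
    if any(c != "#" and c not in sigma for c in x):
        raise ValueError("symbol outside the alphabet")
    if len(x) < 2 or x[0] != "#" or x[-1] != "#":
        return False
    count = x.count("#")
    blocks = x[1:-1].split("#")       # the strings y of every substring #y#
    return any(len(y) != count for y in blocks)
```

For a string that begins and ends with $\#$ but lies outside $A$, every one of the $k$ blocks has length $k + 1$, so $n = (k+1) + k(k+1) = (k+1)^2$ and each block has length exactly $\sqrt{n}$.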

We will prove a theorem vastly generalizing this approach to create substrings of length $n^c$ for any $c \in [1/2, 1]$ which is limit computable.$^{21}$ A number $c \in \mathbb{R}$ is limit computable if there exists a Turing machine which on input $n$ outputs some rational number $T(n)$ such that $\lim_{n \to \infty} T(n) = c$. We will need two main technical lemmas, both of which define a language similar to $A$ above. The first ensures that the input contains (as a substring) the total length of the input written in binary, and the second simulates arbitrary computation by a Turing machine.

$^{20}$ In fact, the reason we cannot extend this procedure to other exponents $c \in (0, 1/2)$ is due to the fact that we will always incur this cost of Grover search.

$^{21}$ Since the theorem constructs a very contrived language, we note that natural problems can also be embedded into context-free languages, e.g., the element distinctness problem. Given a list of integers $x_1, \ldots, x_n$ such that each $x_i \in \{1, \ldots, m\}$, the element distinctness problem asks if there exists $i \neq j$ such that $x_i = x_j$. Since $m \ge n$, we write each $x_i$ as a string over $\{0, 1\}$, and delimit the $x_i$'s by 2's. The language CF-ED consists of grammar rules: $S \to A2B2A$, $B \to 0B0 \mid 1B1 \mid 2A2$, $A \to 0 \mid 1 \mid 2 \mid \varepsilon \mid AA$. CF-ED accepts strings where some $x_i$ is the reverse of some $x_j$. Thus, if all $x_i$ are represented by palindromes, CF-ED is at least as hard as element distinctness. On the other hand, it is possible to adapt the $O(n^{2/3})$ quantum query algorithm for element distinctness to CF-ED (with a log factor loss) [12, 63].

Lemma 57. Let $K \subseteq \{0, 1, \#_1, \#_2, \$\}^*$ be the language such that

* if $x \in K$, then $x$ ends with $\$y\#_1$, and

* for all $n \ge 6$, there is an $x \in K$ of length $n$ ending in $\$y\#_1$,

where $y$ is the binary representation of $|x|$. Then, $\overline{K}$ is context free, and $Q(K) = O(\sqrt{n})$.

Proof. Let $K_1$ be the language over $\Sigma := \{0, 1, \#_1, \#_2, \$\}$ containing all strings which

* start with $\#_1\#_a$ or $\#_2\$\#_a$,

* end with $\#_1$,

* match $((\#_1|\#_2)\$^*)^*(0|1)^*\#_1$, and

* contain no substring of the form $\#_a(0|1|\$)^i\#_b(0|1|\$)^j\#_c$ such that $\frac{j+1}{i+1} \neq \frac{2b}{a}$, where $a, b, c \in \{1, 2\}$ and $i, j$ are integers.

Let us show that $\overline{K_1}$ is context free as a first step to constructing $K$. We claim there is a context-free language which accepts strings containing a substring of the form $\#_a(0|1|\$)^i\#_b(0|1|\$)^j\#_c$ that violates this ratio condition. Indeed, it is easy to describe the pushdown automaton: nondeterministically guess the position of $\#_a$, push symbols onto the stack as we read the input, read $\#_b$ and pop symbols off the stack at a ratio of 1 stack symbol for each $\frac{2b}{a}$ input symbols. With some attention to detail, the PDA will be able to decide whether $\frac{j+1}{i+1} = \frac{2b}{a}$, and accept if it does not. Since the first three conditions define a regular language, the entire language $\overline{K_1}$ is context free.

The conditions above imply that any string \(z \in K_1\) is of the form

\[ \#_{a_0}\$^*\#_{a_1}\$^* \cdots \$^*\#_{a_{k-1}}\$^*\, b_{\ell-1} \cdots b_1 b_0 \#_1, \]

where \(a_0, \ldots, a_{k-1} \in \{1, 2\}\) and \(b_0, \ldots, b_{\ell-1} \in \{0, 1\}\). Let \(d_i\) be the distance (measured by the difference in indices) between \(\#_{a_i}\) and \(\#_{a_{i+1}}\), for all \(i = 0, \ldots, k-2\). Let \(d_{k-1}\) be the distance from \(\#_{a_{k-1}}\) to the final \(\#_1\). Since \(\#_{a_0}\) is the first symbol and \(\#_1\) is the last, it follows that \(|z| = 1 + \sum_{i=0}^{k-1} d_i\).

Since strings in \(K_1\) start with \(\#_1\#_a\) or \(\#_2\$\#_a\), we have \(d_0 = a_0\). We also have a condition on any three consecutive \(\#_{a_i}\) which translates into \(2\frac{d_i}{a_i} = \frac{d_{i+1}}{a_{i+1}}\). A straightforward induction tells us that \(d_i = a_i 2^i\) for all i, which means

\[ |z| = 1 + \sum_{i=0}^{k-1} a_i 2^i. \tag{3.1} \]

On the other hand, we want \(b_{\ell-1} \cdots b_0\) to be the binary number representation of \(|z|\). That is,

\[ |z| = \sum_{i=0}^{\ell-1} b_i 2^i. \tag{3.2} \]

By combining (3.1) and (3.2), and considering the result modulo powers of 2, one can show that \(b_i = a_i - 1\) for all \(i < k\), and \(b_k = 1\). Let \(K_2\) be the language that accepts if the \(a_i\)s and \(b_i\)s match up as described above. Clearly \(K_2\) is context free because a PDA can easily push the \(a_i\)s onto the stack as it reads them, then pop off and compare as it reads the \(b_i\)s.
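The relationship between (3.1) and (3.2) can be checked numerically. The sketch below is only an illustration, not part of the proof: it assumes the digits \(a_i \in \{1, 2\}\) are read off as the bijective base-2 expansion of \(|z| - 1\), and the function names are hypothetical.

```python
def bijective_digits(v):
    """Digits a_0, a_1, ... in {1, 2} with v = sum(a_i * 2**i)."""
    digits = []
    while v > 0:
        a = 2 if v % 2 == 0 else 1
        digits.append(a)
        v = (v - a) // 2
    return digits


def length_bits(n):
    """Bits b_0, ..., b_k predicted for a string of length n."""
    a = bijective_digits(n - 1)           # (3.1): n = 1 + sum a_i 2^i
    return [ai - 1 for ai in a] + [1]     # b_i = a_i - 1 for i < k, b_k = 1


def marker_positions(n):
    """Indices of the # markers; consecutive gaps are d_i = a_i * 2**i."""
    positions = [0]
    for i, ai in enumerate(bijective_digits(n - 1)):
        positions.append(positions[-1] + ai * 2 ** i)
    return positions


for n in range(2, 500):
    bits = length_bits(n)
    # (3.2): the predicted bits are the binary representation of n.
    assert sum(bit << i for i, bit in enumerate(bits)) == n
    # The last marker is the final symbol of the string.
    assert marker_positions(n)[-1] == n - 1
```

The loop confirms that the bits derived from the \(a_i\) always spell out the length in binary, which is the fact \(K_2\) is designed to test.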

We define \(K := K_1 \cap K_2\) and note that \(\overline{K} = \overline{K_1} \cup \overline{K_2}\) is context free as desired.

There are strings in \(K_1\) of any length n > 2, but to be in \(K_2\), we also need the binary representation of n to fit in \(d_{k-1} - 1\) symbols. We claim the binary representation fits for all n > 6, so there exist strings of those lengths in K.

Finally, we can decide whether a string z of length n is in K in \(O(\sqrt{n})\) time. First, we check if \(z \in K_1\), since the length fixes the positions of \(\#_{a_0}\) through \(\#_{a_{k-1}}\) in the string. We can determine these positions and \(a_0, \ldots, a_{k-1} \in \{1, 2\}\) from the length of the string, and check those positions in O(log n) queries. If z is in \(K_1\) then we check whether the bits \(b_k \cdots b_0\) at the end match the length in O(log n) queries. Finally, we check that all remaining positions are $'s in \(O(\sqrt{n})\) quantum queries by Grover search.

Lemma 58 (folklore [79]). Let N be a k-tape nondeterministic Turing machine. Define the language \(K_N\) which contains strings of the form

\[ C_1 \# C_2 \# C_3 \cdots C_{n-1} \# C_n \]

where \(C_1\) is a valid start configuration of N, \(C_n\) is a valid accepting configuration, and \(C_i\) to \(C_{i+1}\) is a valid transition. Then \(\overline{K_N}\) is context free, and \(Q(K_N) = O(\sqrt{n})\).

Proof. The proof of this theorem follows from the observation that computation is local. Let us sketch the proof. First, we need to fix the encoding of the configuration of a Turing machine. Many different schemes suffice, but let us assume that the encoding consists of the k tapes laid out on top of each other so that each symbol of the encoding includes a k-tuple of the values of the k tapes. We also stipulate that one symbol on each tape is marked with the head and the current state (we can simply expand our alphabet to include these possibilities as well). To verify that one configuration follows properly from the next, the push-down automaton for language \(K_N\) nondeterministically guesses the location on one of the tapes where a violation might occur. It can count to the same position in the tape in the next configuration by pushing all remaining tape symbols onto the stack until the next # symbol. At this point, it can pop these symbols to count back to the same location (this is why each configuration is the reverse of the previous one). All that remains is to check a finite set of conditions. □

We are now ready to construct context-free languages that have quantum query complexities corresponding to limit computable exponents. Although there are several technical details to check in the proof, the central idea is straightforward: Let \(x \in \Sigma^n\) be the input. If x is not in the language defined in Lemma 57, then n will be written in binary on the string. If x is also not in the language defined in Lemma 58, then the input will contain a correct simulation of a Turing machine limit computing some query exponent \(c \in [1/2, 1]\) and verifying that a # symbol has been placed at position \(n^c\). Using Grover search, we can verify membership in these languages in \(O(\sqrt{n})\) time. If x is in neither language, then computing parity on the prefix of the input (up to the # symbol) takes time \(\Theta(n^c)\), from which the theorem follows.

Theorem 18. For all limit computable \(c \in [1/2, 1]\), there exists a context-free language L such that \(Q(L) = O(n^{c+\epsilon})\) and \(Q(L) = \Omega(n^{c-\epsilon})\) for all \(\epsilon > 0\). Furthermore, if an additive \(\epsilon\)-approximation to c is computable in \(2^{O(1/\epsilon)}\) time, then \(Q(L) = \Theta(n^c)\). In particular, any algebraic \(c \in [1/2, 1]\) has this property.

Proof. Let M be the Turing machine computing c in the limit. That is, on input \(1^k\) it outputs a rational approximation \(c_k\) such that \(\lim_{k \to \infty} c_k = c\). Let \(n_k\) be the size of the computation history when computing \(c_k\). Without loss of generality, we assume \(n_k\) is strictly increasing with k. We also assume that the computation history for computing \(n^{c_k}\) from n and \(c_k\) (both written in binary) is of size at most n for all \(n \ge n_k\).²²

Our goal is to construct a context-free language which accepts any input not satisfying the following array of conditions. Note that each condition may require a complicated witness to verify, perhaps as long as the input itself. Therefore, we let the alphabet be tuples \(\Sigma = \Sigma_1 \times \Sigma_2 \times \cdots \times \Sigma_m\) so that there are m independent tracks to work with. Suppose the input has length n, and consider the following six tracks.

1. The first track contains bits and a $ symbol, hopefully at position \(\lceil n^{c_k} \rceil\) in the string.

2. Some Turing machine M limit computes \(c \in [1/2, 1]\). The second track holds a valid computation history of M computing some \(c_k\) from input \(1^k\).

3. The third track contains an incomplete execution of M on \(1^{k+1}\). If the string is long enough to complete the computation of \(c_{k+1}\), then \(c_k\) is obsolete and should not be used.

4. The fourth track contains a binary number (and associated machinery, see Lemma 57) matching the length of the input.

5. The fifth track is the same as the fourth, except the number is the position of $ on track one.

6. The sixth track holds a Turing machine computation history which verifies that the position of $ is \(\lceil n^{c_k} \rceil\), based on the numbers from tracks 2 (\(c_k\)), 4 (\(|z|\)), and 5 (position of $).

²² Exponentiation will run in polynomial time, which is actually polylog(n) since n is written in binary, plus the description size of \(c_k\). In fact, we can be even sloppier and approximate \(n^{c_k}\) with the first \(c_k\) fraction of the bits of n, and still be accurate up to constant factors.

We enforce these conditions with a corresponding array of context-free languages, which reject satisfying strings. The final language will be a union of these languages, so that it rejects a string if and only if all the conditions are satisfied.

We have already seen most of the languages we need. For example, we want to accept if track two is not the computation history of the Turing machine M which computes \(c_k\), but we have seen how to construct such a language in Lemma 58. Similarly for tracks three and six, we can tweak this construction to accept on incomplete computation histories. On track four we want precisely the binary counter language in Lemma 57. Track five is the same thing concatenated with \(\$\Sigma^*\) to ignore symbols after $. Track one is actually just a regular language, \(\Sigma^* \setminus (0|1)^*\$(0|1)^*\).

Each language so far focuses on just one of the tracks, and we need a few "glue" languages to ensure the various tracks interact correctly. The first verifies that \(c_k\), n, and the position of $ (appearing on track two, track four, and track five respectively) match the strings in the input configuration on track six. A second glue language checks that if the starting configuration on track two was \(1^k\), then the starting configuration on track three is \(1^{k+1}\), so that it computes \(c_{k+1}\). The third and final glue language checks that the $ on track one matches the $ on track five. We arrange for all of the glue languages to accept strings which fail these checks, in keeping with the complemented behavior of the other languages.

Suppose we have a string of length n, with \(n_k \le n < n_{k+1}\), which is rejected by all of the languages. It follows that all of the conditions are satisfied, so we can say a lot about the string. First, track two must compute \(c_{k'}\) for some \(k' \le k\), since the history of any later computation would not fit in the string. Track three must contain an incomplete computation of \(c_{k'+1}\) for some \(k' \ge k\), since for small k' it would have finished. But k' is the same in both cases (due to a glue language), and hence k' = k. Tracks four and five generate binary numbers for the length and position of the $ symbol which, by another glue language, are written on the input of track six, along with \(c_k\). Finally, the sixth track verifies that the position is indeed \(\lceil n^{c_k} \rceil\).

We have one final language, which accepts depending on the parity of the bits on the first track up to the $. If all of the other languages reject, then we have argued that $ is at position \(\lceil n^{c_k} \rceil\), so computing the parity takes \(n^{c_k}\) queries, up to constant factors. Checking all the other conditions takes \(O(\sqrt{n})\) queries, so the cost of computing the parity dominates because \(c_k \ge 1/2\). It follows that the quantum query complexity is within a constant factor of \(n^{c_k}\) for n between \(n_k\) and \(n_{k+1}\). For any \(\epsilon > 0\) we have \(|c - c_k| \le \epsilon\) for sufficiently large k, and hence for sufficiently large n, the query complexity is \(Q(L) = O(n^{c+\epsilon})\) and \(\Omega(n^{c-\epsilon})\).

Finally, we note that if the \(c_k\)s converge sufficiently quickly (with respect to computation time, not k) then the query complexity is truly \(Q(L) = \Theta(n^c)\). For example, suppose we have a Turing machine which spends \(2^{O(1/\epsilon)}\) time to output an approximation c' to c such that \(|c - c'| < \epsilon\), for any \(\epsilon > 0\). It does not matter whether the machine outputs a stream of better and better approximations, or takes \(\epsilon\) as input and outputs a sufficiently good approximation. Either way, we can construct a machine which maps \(1^k\) to \(c_k\) with a similar guarantee: the time to compute \(c_{k+1}\) is at most \(2^{O(1/\epsilon_k)}\) where \(\epsilon_k = |c - c_k|\). We claim this is enough to show \(Q(L) = \Theta(n^c)\).

Our construction of L is such that the query complexity is (up to constant multiplicative factors) \(n^{c_k}\) on the entire interval \([n_k, n_{k+1})\). For convenience, define a function \(c(n) \colon \mathbb{N} \to \mathbb{R}\) such that \(c(n) = c_k\) iff \(n \in [n_k, n_{k+1})\). This means \(Q(L) = \Theta(n^{c(n)})\), and taking logs gives \(\left|\frac{\log Q(L)}{\log n} - c(n)\right| = O(1/\log n)\). Recall \(n_{k+1} \le 2^{b/\epsilon_k}\) for some b (and all sufficiently large k) so we have \(|c - c_k| = \epsilon_k \le b/\log n_{k+1}\). It follows that \(|c - c(n)| \le O(1/\log n)\). Together, this implies

\[ \left|\frac{\log Q(L)}{\log n} - c\right| \le \left|\frac{\log Q(L)}{\log n} - c(n)\right| + |c(n) - c| \le \frac{a}{\log n} \]

for some a and for all sufficiently large n. It follows that \(2^{-a} n^c \le Q(L) \le 2^{a} n^c\), so we conclude that for sequences converging sufficiently quickly, \(Q(L) = \Theta(n^c)\). For example, any algebraic number can be computed to \(1/\epsilon\) precision in \(\mathrm{polylog}(1/\epsilon)\) time using Newton's method from a suitable starting point. □

The converse of this theorem also holds.

Theorem 19. Let L be a context-free language such that \(\lim_{n \to \infty} \frac{\log Q(L)}{\log n} = c\). Then c is limit computable.

Proof. Suppose L is context free. Recall that given \(w \in \Sigma^*\), the problem of computing membership of w in L is decidable [79]. Next, we observe that the quantum query complexity can be expressed as the solution (up to logarithmic factors) to a large semidefinite program [72]. That is, there exists a Turing machine which outputs \(\mathrm{Adv}^{\pm}(L)\) such that \(Q(L) = \tilde{\Theta}(\mathrm{Adv}^{\pm}(L))\). Therefore, we can construct a Turing machine which outputs \(\log(\mathrm{Adv}^{\pm}(L))/\log n\), and

\[ \lim_{n \to \infty} \frac{\log(\mathrm{Adv}^{\pm}(L))}{\log n} = \lim_{n \to \infty} \frac{\log \tilde{\Theta}(Q(L))}{\log n} = \lim_{n \to \infty} \frac{\log Q(L)}{\log n} = c. \]

Therefore, c is limit computable.

Chapter 4

The Complexity of the Permanent from Linear Optics

The permanent of a matrix has been a central fixture in computer science ever since Valiant showed how it could efficiently encode the number of satisfying solutions to classic NP-complete constraint satisfaction problems [90]. His theory led to the formalization of many counting classes in complexity theory, including #P. Indeed, the power of these counting classes was later demonstrated by Toda's celebrated theorem, which proved that every language in the polynomial hierarchy could be computed in polynomial time with only a single call to a #P oracle [84].

Let us recall the definition of the matrix permanent. Suppose \(A = (a_{i,j})\) is an n × n matrix over some field. The permanent of A is

\[ \operatorname{per}(A) = \sum_{\sigma \in S_n} \prod_{i=1}^{n} a_{i,\sigma(i)} \]

where \(S_n\) is the group of permutations of {1, 2, ..., n}. Compare this to the determinant of A:

\[ \det(A) = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \prod_{i=1}^{n} a_{i,\sigma(i)}. \]

Since we can compute the determinant in polynomial time (in fact, in \(\mathrm{NC}^2\); see Berkowitz [22]), the apparent difference in complexity between the determinant and permanent comes down to the cancellation of terms in the determinant [75, 91].
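To make the two definitions concrete, the following sketch evaluates both sums directly over all permutations. It is a brute-force illustration that takes n! time, not an efficient algorithm, and the helper names are hypothetical.

```python
from itertools import permutations
from math import prod


def sign(perm):
    """Sign of a permutation given as a tuple of 0-based images."""
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1


def permanent(a):
    """Sum over permutations of the products a[i][sigma(i)], with no signs."""
    n = len(a)
    return sum(prod(a[i][p[i]] for i in range(n)) for p in permutations(range(n)))


def determinant(a):
    """The same sum, with each term weighted by the sign of the permutation."""
    n = len(a)
    return sum(
        sign(p) * prod(a[i][p[i]] for i in range(n)) for p in permutations(range(n))
    )


A = [[1, 2], [3, 4]]
print(permanent(A), determinant(A))  # 10 -2
```

The only difference between the two functions is the sign factor, which is exactly the source of the cancellation mentioned above.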

In his original proof, Valiant [90] casts the permanent in a combinatorial light, in terms of a directed graph rather than as a polynomial. Imagine the matrix A encodes the adjacency matrix of a weighted graph with vertices labeled {1, ..., n}. Each permutation σ on the vertices has a cycle decomposition, which partitions the vertices into a collection of cycles known as a cycle cover. The weight of a cycle cover is the product of the edge weights of the cycles (i.e., \(\prod_{i=1}^{n} a_{i,\sigma(i)}\)). Therefore, the permanent is the sum of the weights of all cycle covers of the graph. Equipped with this combinatorial interpretation of the permanent, Valiant constructs a graph by linking together different kinds of gadgets in such a way that some cycle covers correspond to solutions to a CNF formula, and the rest of the cycle covers cancel out.

Valiant's groundbreaking proof, while impressive, is fairly opaque and full of complicated gadgets. A subsequent proof by Ben-Dor and Halevi [20] simplified the construction, while still relying on the cycle cover interpretation of the permanent. In 2009, Rudolph [73] noticed an important connection between quantum circuits and matrix permanents, a version of a correspondence we will use often in this paper. Rudolph cast the cycle cover arguments of Valiant into more physics-friendly language, which culminated in a direct proof that the amplitudes of a certain class of universal quantum circuits were proportional to the permanent. Had he pointed out that one could embed #P-hard problems into the amplitudes of a quantum circuit, then this would have constituted a semi-quantum proof that the permanent is #P-hard. Finally, in 2011, Aaronson [2] (independently from Rudolph) gave a completely self-contained and quantum linear optical proof that the permanent is #P-hard.

One must then ask, what is gained from converting Valiant's combinatorial proof to Aaronson's linear optical one? One advantage is pragmatic: much of the difficulty of arguments based on cycle cover gadgets is offloaded onto central, well-known theorems in linear optics and quantum computation. We show that the linear optical approach has an even more important role in analyzing permanents of matrices with a global group structure. Such properties can be very difficult to handle in the "cycle cover model." For instance, the matrices which arise from Valiant's construction may indeed be invertible, but this seems to be more accidental than intentional, and a proof of their invertibility appears nontrivial. Adapting such techniques to give hardness results for orthogonal matrices would be extraordinarily tedious. In contrast, using the linear optical framework, we give proofs of hardness for many such matrices.

4.1 Results Overview

We refine Aaronson's linear optical proof technique and show that it can provide new #P-hardness results. First, let us formally define what we mean by #P-hardness. We say that the permanent is #P-hard for a class of matrices if all functions in #P can be efficiently computed with single-call access to an oracle which computes permanents of matrices in that class. That is, the permanent is hard for a function class A if, given an oracle \(\mathcal{O}\) for the permanent, \(A \subseteq \mathrm{FP}^{\mathcal{O}[1]}\).

Our main result is a linear optical proof that the permanent of a real orthogonal matrix is #P-hard. Consequently, the permanent of matrices in any of the classical Lie groups (e.g., invertible matrices, unitary matrices, symplectic matrices) is also #P-hard.

Our approach also reveals a surprising connection between the hardness of the permanent of orthogonal matrices over finite fields and the characteristic of the field. First notice that in fields of characteristic 2, the permanent is equal to the determinant and is therefore efficiently computable. Over fields of characteristic 3, there exists an elaborate yet polynomial time algorithm of Kogan [61] that computes the (orthogonal) matrix permanent. We give the first explanation for why no equivalent algorithm was found for the remaining prime characteristics, establishing a sharp dichotomy theorem: for fields of characteristic 2 or 3 there is an efficient procedure to compute orthogonal matrix permanents, and for all other primes p there exists a finite field¹ of characteristic p for which the permanent of an orthogonal matrix (over that field) is as hard as counting the number of solutions to a CNF formula mod p.² Furthermore, there exist infinitely many primes for which computing the permanent of an orthogonal matrix over \(\mathbb{F}_p\) (i.e., modulo p) is hard.

Finally, we give a polynomial interpolation argument showing that the permanent of a positive semidefinite matrix is #P-hard. This has an interesting consequence due to a recent connection between matrix permanents and experiments with thermal input states [30, 70]. In particular, the probability of a particular experimental outcome is proportional to the permanent of a positive semidefinite matrix which depends on the temperatures of the thermal states. Our result implies that it is hard to compute such output probabilities exactly despite the fact that an efficient classical sampling algorithm exists [70].

¹ We prove that this field is \(\mathbb{F}_{p^4}\), although in some cases \(\mathbb{F}_{p^2}\) or even \(\mathbb{F}_p\) will suffice. See Section 4.4 for more details.

4.1.1 Proof Outline

The main result concerning the #P-hardness of real orthogonal permanents follows from three major steps:

1. Construct a quantum circuit (over qubits) with the following property: If you could compute the probability of measuring the all-zeros state after the circuit has been applied to the all-zeros state, then you could calculate some #P-hard quantity. We must modify the original construction of Aaronson [2], so that all the gates used in this construction are real.

2. Use a modified version of the Knill, Laflamme, Milburn protocol [41] to construct a linear optical circuit which simulates the quantum circuit in the previous step. In particular, we modify the protocol to ensure that the linear optical circuit starts and ends with one photon in every mode. Notice that this is distinct from Aaronson's approach [2] because we can no longer immediately use the dual-rail encoding of KLM. We build new postselected encoding and decoding gadgets to circumvent this problem.

3. Use a known connection (first pointed out by Caianiello [29]) between the transition amplitude of a linear optical circuit and the permanent of its underlying matrix. Because we paid special attention to the distribution of photons across the modes of our linear optical network in the previous step, the success probability of the linear optical circuit is exactly the permanent of the underlying transition matrix. It is then simple to work backwards from this permanent to calculate our original #P-hard quantity.

² Formally, this language is complete for the class \(\mathrm{Mod}_p\mathrm{P}\). By Toda's theorem, we have that \(\mathrm{PH} \subseteq \mathrm{BPP}^{\mathrm{Mod}_p\mathrm{P}}\). See Section 2.3 for a more precise exposition of such counting classes.

The remainder of this chapter is organized as follows: Section 4.2 gives a brief introduction to the linear optical framework and its relationship to quantum computing over qubits. In Section 4.3, we use this framework to show that the permanent of a real orthogonal matrix is #P-hard. A careful analysis in Section 4.4 (and Section 4.7) extends these gadgets to finite fields.³ In Section 4.5, we explore other matrix classes, culminating in a proof that the permanent of a real special orthogonal symplectic involution is #P-hard. Finally, in Section 4.6 we use standard techniques to show that multiplicatively approximating the permanent remains #P-hard for many of the matrix groups previously discussed.

4.2 Linear Optics Primer

In this section we will introduce the so-called "boson sampling" model of quantum computation, which will make clear the connection between the dynamics of noninteracting bosons and the computation of matrix permanents [29, 87]. The most promising practical implementations of this model are based on linear optics and use photons controlled by optical elements such as beamsplitters. We will use the term "linear optics" throughout, although any type of indistinguishable bosons would have the same dynamics.

Let us first consider the dynamics of a single boson. At any point in time, it is in one of finitely many modes. As the system evolves, the particle moves from one of m initial modes to a superposition of m final modes according to a transition matrix of amplitudes. That is, there is an m × m unitary transition matrix \(U \in \mathbb{C}^{m \times m}\), where \(U_{ij}\) is the amplitude of a particle going from mode i to mode j.

The model becomes more complex when we consider a system of multiple particles evolving on the same modes according to the same transition matrix. Let us define states in our space of k bosons in what is called the Fock basis. A Fock state for a k-photon, m-mode system is of the form \(|s_1, s_2, \ldots, s_m\rangle\) where \(s_i \ge 0\) is the number of bosons in the ith mode and \(\sum_{i=1}^{m} s_i = k\). Therefore, the Hilbert space \(\Phi_{m,k}\) which is spanned by the Fock basis states has dimension \(\binom{k+m-1}{k}\). Alternatively, one can think of \(\Phi_{m,k}\) as the symmetrized subspace of \((\mathbb{C}^m)^{\otimes k}\). We've included a full exposition of the Fock space in these terms at the end of this section.

Let \(\varphi\) be the transformation which lifts the unitary U to act on a multi-particle system. On a k-particle system, \(\varphi(U)\) is a linear transformation from \(\Phi_{m,k}\) to \(\Phi_{m,k}\). Let \(|S\rangle = |s_1, s_2, \ldots, s_m\rangle\) be the Fock state describing the starting state of the system, and let \(|T\rangle = |t_1, t_2, \ldots, t_m\rangle\) be the ending state. We have:

³ As is the case with Aaronson's proof, our real orthogonal construction also leads naturally to hardness of approximation results, which we discuss in Section 4.6.

\[ \langle T|\varphi(U)|S\rangle = \frac{\operatorname{per}(U_{S,T})}{\sqrt{s_1! \cdots s_m!\, t_1! \cdots t_m!}} \]

where \(U_{S,T}\) is the matrix obtained by taking \(s_i\) copies of the ith row and \(t_j\) copies of the jth column in U for all \(i, j \in \{1, 2, \ldots, m\}\). We will refer to this formula as the \(\varphi\)-transition formula.⁴

Notice that \(s_1 + \cdots + s_m\) must equal \(t_1 + \cdots + t_m\) in order for \(U_{S,T}\) to be square. This expresses the physical principle that photons are not created or destroyed in the experiment. For example, suppose U is the Hadamard gate and that we wish to apply U to two modes each with a single photon. That is, \(U = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}\) and \(|S\rangle = |1, 1\rangle\). Since the number of photons must be conserved, the resulting state of the system is in some linear combination of \(|2, 0\rangle\), \(|1, 1\rangle\), and \(|0, 2\rangle\). We calculate these amplitudes explicitly below:

4 Once again, we refer readers, especially non-physicists, to Section 4.2.1 for a description of the

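\[
\begin{array}{c|ccc}
|T\rangle & |2,0\rangle & |1,1\rangle & |0,2\rangle \\ \hline
U_{S,T} & \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} & \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} & \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ -1 & -1 \end{pmatrix} \\
\operatorname{per}(U_{S,T}) & 1 & 0 & -1 \\
\langle T|\varphi(U)|S\rangle & \frac{1}{\sqrt{2}} & 0 & -\frac{1}{\sqrt{2}}
\end{array}
\]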

Therefore, when we apply Hadamard in a linear optical circuit to the state \(|1, 1\rangle\) we get the state \(\frac{|2,0\rangle - |0,2\rangle}{\sqrt{2}}\). Indeed, we have derived the famous Hong-Ou-Mandel effect: the photons are noninteracting, yet the final state is clearly highly entangled [54].

Finally, we note that \(\varphi\) expresses the fact that linear optical systems are reversible and can be composed together. This behavior is captured by the following theorem:

Theorem 59 (see Facts 62 and 63 in Section 4.2.1). The map \(\varphi\) is a group homomorphism. Furthermore, if \(U \in \mathbb{C}^{m \times m}\) is unitary, then \(\varphi(U)\) is unitary.
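The Hong-Ou-Mandel amplitudes and the unitarity claim of Theorem 59 can both be checked numerically for small cases. The sketch below is an illustration under stated assumptions: it evaluates the transition formula by brute force, indexes rows by the input occupation S and columns by the output occupation T, and uses hypothetical function names.

```python
from itertools import combinations_with_replacement, permutations
from math import factorial, isclose, prod, sqrt


def fock_states(m, k):
    """All occupation vectors (s_1, ..., s_m) with s_1 + ... + s_m = k."""
    return [
        tuple(modes.count(i) for i in range(m))
        for modes in combinations_with_replacement(range(m), k)
    ]


def permanent(a):
    n = len(a)
    return sum(prod(a[i][p[i]] for i in range(n)) for p in permutations(range(n)))


def amplitude(u, s, t):
    """<T|phi(U)|S> = per(U_{S,T}) / sqrt(s_1! ... s_m! t_1! ... t_m!)."""
    rows = [i for i, si in enumerate(s) for _ in range(si)]  # s_i copies of row i
    cols = [j for j, tj in enumerate(t) for _ in range(tj)]  # t_j copies of column j
    sub = [[u[i][j] for j in cols] for i in rows]
    return permanent(sub) / sqrt(prod(factorial(x) for x in s + t))


def lift(u, k):
    """Matrix of phi(U) on the k-photon Fock basis (entry [T][S])."""
    basis = fock_states(len(u), k)
    return [[amplitude(u, s, t) for s in basis] for t in basis]


h = 1 / sqrt(2)
H = [[h, h], [h, -h]]

# Hong-Ou-Mandel: amplitudes from |1,1> to |2,0>, |1,1>, |0,2>.
hom = [amplitude(H, (1, 1), t) for t in fock_states(2, 2)]
print(hom)  # approximately [0.707, 0.0, -0.707]

# phi(H) is real here, so unitarity means M M^T = I.
M = lift(H, 2)
for i in range(3):
    for j in range(3):
        dot = sum(M[i][x] * M[j][x] for x in range(3))
        assert isclose(dot, 1.0 if i == j else 0.0, abs_tol=1e-12)
```

The middle amplitude vanishes because the two terms of per(H) cancel, which is the interference behind the Hong-Ou-Mandel effect.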

We now state a landmark result in linear optics, which connects the dynamics of a linear optical system with those of a traditional quantum circuit over qubits. Define \(|I\rangle = |0, 1, \ldots, 0, 1\rangle\), the Fock state with a photon in every other mode.

Theorem 60 (Knill, Laflamme, and Milburn [41]). Postselected linear optical circuits are universal for quantum computation. Formally, given a quantum circuit Q consisting of CSIGN⁵ and single-qubit gates, there exists a linear optical network U constructible in polynomial time such that

\[ \langle I|\varphi(U)|I\rangle = \frac{1}{4^{\Gamma}} \langle 0 \cdots 0|Q|0 \cdots 0\rangle, \]

where \(\Gamma\) is the number of CSIGN gates in Q.

⁵ The CSIGN gate, also often referred to as a controlled-Z gate, is the two-qubit operation which applies a minus phase when both of its inputs are 1. That is, \(\mathrm{CSIGN}|x_1 x_2\rangle = (-1)^{x_1 x_2}|x_1 x_2\rangle\) for \(x_1, x_2 \in \{0, 1\}\). It is well-known that CSIGN and single-qubit gates are universal for quantum computation [66].

We will refer to the construction of the linear optical network U from Q in Theorem 60 as the KLM protocol. It will be helpful to give some idea of its proof here.

First, each qubit of Q is encoded in two modes of U in the classic dual-rail encoding. That is, the qubit state \(|0\rangle\) is encoded by the Fock state \(|0, 1\rangle\) and the state \(|1\rangle\) is encoded by the Fock state \(|1, 0\rangle\). Now suppose G is a single-qubit gate in Q. Using the \(\varphi\)-transition formula, it is not hard to see that applying G to the corresponding pair of dual-rail modes in the linear optical circuit implements the correct single-qubit unitary.

Applying a CSIGN gate is trickier. The KLM protocol builds the CSIGN gate from a simpler \(\mathrm{NS}_1\) gate, which flips the sign of a single mode if it has 2 photons and does nothing when the mode has 0 or 1 photon. Using two \(\mathrm{NS}_1\) gates one can construct a CSIGN gate:

[Circuit diagram: wires labeled (i, 0) and (j, 0); gate labels H, \(\mathrm{NS}_1\), CSIGN.]

Figure 4-1: Generating CSIGN from H and \(\mathrm{NS}_1\) [41].

Unfortunately, the \(\mathrm{NS}_1\) gate cannot be implemented with a straightforward linear optical circuit. Therefore, some additional resource is required. The original KLM protocol uses adaptive measurements, that is, the ability to measure in the Fock basis in the middle of a linear optical computation and adjust the remaining sequence of gates if necessary. Intuitively, using adaptive measurements one can apply some \(\varphi\)-transformation and then measure a subset of the modes to "check" if the \(\mathrm{NS}_1\) gate was applied. For simplicity, however, we will assume we have a stronger resource, namely postselection, so we can assume the measurements always yield the most convenient outcome. Putting the above parts together completes the proof of Theorem 60.
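The CSIGN construction from \(\mathrm{NS}_1\) gates can be sanity-checked on the two relevant modes. The sketch below assumes the gate order H, then an ideal \(\mathrm{NS}_1\) on each mode, then H again (one reading of Figure 4-1), and it models \(\mathrm{NS}_1\) as an idealized diagonal sign flip rather than as a postselected linear optical gadget. All function names are hypothetical.

```python
from itertools import combinations_with_replacement, permutations
from math import factorial, prod, sqrt


def fock_states(m, k):
    return [
        tuple(modes.count(i) for i in range(m))
        for modes in combinations_with_replacement(range(m), k)
    ]


def permanent(a):
    n = len(a)
    return sum(prod(a[i][p[i]] for i in range(n)) for p in permutations(range(n)))


def amplitude(u, s, t):
    """<T|phi(U)|S> via the transition formula."""
    rows = [i for i, si in enumerate(s) for _ in range(si)]
    cols = [j for j, tj in enumerate(t) for _ in range(tj)]
    sub = [[u[i][j] for j in cols] for i in rows]
    return permanent(sub) / sqrt(prod(factorial(x) for x in s + t))


def lift_apply(u, vec):
    """Apply phi(U) to a state stored as {occupation tuple: amplitude}."""
    out = {}
    for s, amp in vec.items():
        for t in fock_states(len(s), sum(s)):
            out[t] = out.get(t, 0.0) + amplitude(u, s, t) * amp
    return out


def ns1_all_modes(vec):
    """Idealized NS_1 on every mode: one sign flip per mode holding 2 photons."""
    return {s: amp * (-1) ** sum(1 for x in s if x == 2) for s, amp in vec.items()}


h = 1 / sqrt(2)
H = [[h, h], [h, -h]]


def csign_on_rails(s):
    """H, then NS_1 on both modes, then H, applied to the Fock state s."""
    return lift_apply(H, ns1_all_modes(lift_apply(H, {s: 1.0})))


for s in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(s, {t: round(a, 6) for t, a in csign_on_rails(s).items()})
```

Only the input with a photon in both modes picks up a minus sign, matching the action of CSIGN on the corresponding dual-rail qubits.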

4.2.1 Linear optics as a symmetric subspace

The Fock space

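optical states with multiple photons are described by the symmetric tensor. That is, for \(v_1, v_2, \ldots, v_k \in \mathbb{C}^m\), let

\[ v_1 \odot v_2 \odot \cdots \odot v_k = \frac{1}{k!} \sum_{\sigma \in S_k} v_{\sigma(1)} \otimes v_{\sigma(2)} \otimes \cdots \otimes v_{\sigma(k)} \]

be their symmetric tensor. Notice that the symmetric tensor is invariant under permutations; that is, \(v_1 \odot v_2 \odot \cdots \odot v_k = v_{\sigma(1)} \odot v_{\sigma(2)} \odot \cdots \odot v_{\sigma(k)}\) for any \(\sigma \in S_k\). This captures the physical intuition that the photons are indistinguishable. We can extend the usual inner product to the symmetric setting:

\begin{align*}
\langle v_1 \odot \cdots \odot v_k, w_1 \odot \cdots \odot w_k \rangle
&= \left(\frac{1}{k!}\right)^2 \sum_{\sigma, \rho \in S_k} \langle v_{\sigma(1)}, w_{\rho(1)} \rangle \cdots \langle v_{\sigma(k)}, w_{\rho(k)} \rangle \\
&= \frac{1}{k!} \sum_{\rho \in S_k} \langle v_1, w_{\rho(1)} \rangle \cdots \langle v_k, w_{\rho(k)} \rangle \\
&= \frac{1}{k!} \operatorname{per}\big(\langle v_i, w_j \rangle\big)_{i,j}.
\end{align*}

We are now ready to define an orthonormal basis for \(\Phi_{m,k}\). Let \(e_1, \ldots, e_m\) be the standard basis for \(\mathbb{C}^m\), where \(e_i\) represents a photon in mode i. The basis vectors for \(\Phi_{m,k}\) will be k-fold symmetric tensors of the \(e_i\) vectors. Let \(v_1 \odot \cdots \odot v_k\) be one such symmetric tensor with \(v_j \in \{e_1, \ldots, e_m\}\). Let \(s_i = |\{v_j \mid v_j = e_i\}|\); that is, there are \(s_i\) photons in mode i. We will denote the corresponding basis vector in \(\Phi_{m,k}\) as \(|s_1, \ldots, s_m\rangle\). Formally,

\[ |s_1, \ldots, s_m\rangle = \sqrt{\frac{k!}{s_1!\, s_2! \cdots s_m!}}\, (v_1 \odot v_2 \odot \cdots \odot v_k). \]

Notice that if we specify the symmetric tensor by the \(s_i\) in this way, we lose the relative ordering of the elements \(v_1, \ldots, v_k\). Recall, however, that any choice will do since the symmetric tensor is invariant under permutation.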

Theorem 61. The elements \(|s_1, \ldots, s_m\rangle\) such that \(\sum_{i=1}^{m} s_i = k\) form an orthonormal basis for \(\Phi_{m,k}\).

Proof. First, it should be clear that every symmetrized basis vector of \((\mathbb{C}^m)^{\otimes k}\) corresponds to some element \(|s_1, \ldots, s_m\rangle\) such that \(\sum_{i=1}^{m} s_i = k\). We need now only show orthonormality. For states \(|s_1, \ldots, s_m\rangle\) and \(|t_1, \ldots, t_m\rangle\), we have

\begin{align*}
\langle t_1, \ldots, t_m | s_1, \ldots, s_m \rangle
&= \frac{k!}{\sqrt{s_1! \cdots s_m!\, t_1! \cdots t_m!}} \langle v_1 \odot \cdots \odot v_k, w_1 \odot \cdots \odot w_k \rangle \\
&= \frac{\operatorname{per}\big(\langle v_i, w_j \rangle\big)_{i,j}}{\sqrt{s_1! \cdots s_m!\, t_1! \cdots t_m!}}.
\end{align*}

Since the \(e_i\) form an orthonormal basis for \(\mathbb{C}^m\), we have \(\langle v_i, w_j \rangle = 1\) when \(v_i = w_j\) and 0 otherwise. Therefore, if there exists i such that \(s_i \neq t_i\), then \(\operatorname{per}(\langle v_i, w_j \rangle)_{i,j} = 0\).

Otherwise, the value of this permanent is equal to the number of permutations \(\sigma \in S_k\) such that \(v_{\sigma(j)} = v_j\) for all j. In other words, these permutations only permute the photons within each mode. Since there are \(s_i\) many photons in mode i, there are \(s_i!\) many permutations of photons in that mode. Therefore, \(\operatorname{per}(\langle v_i, v_j \rangle)_{i,j} = s_1!\, s_2! \cdots s_m!\), which completes the proof. □
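The counting argument in this proof is easy to test for small occupation vectors. The following sketch is illustrative, with hypothetical helper names: it builds the 0/1 Gram matrix of standard basis vectors from two occupation vectors and evaluates its permanent by brute force.

```python
from itertools import permutations
from math import factorial, prod, sqrt


def gram_permanent(s, t):
    """per(<v_i, w_j>) where the v's realize occupation t and the w's realize s."""
    v = [i for i, ti in enumerate(t) for _ in range(ti)]  # mode of each v_i
    w = [i for i, si in enumerate(s) for _ in range(si)]  # mode of each w_j
    k = len(v)
    assert len(w) == k, "both states must hold the same number of photons"
    gram = [[1 if v[i] == w[j] else 0 for j in range(k)] for i in range(k)]
    return sum(prod(gram[i][p[i]] for i in range(k)) for p in permutations(range(k)))


def fock_inner_product(s, t):
    """<t|s> = per(<v_i, w_j>) / sqrt(s_1! ... s_m! t_1! ... t_m!)."""
    return gram_permanent(s, t) / sqrt(prod(factorial(x) for x in s + t))


print(gram_permanent((2, 1, 0), (2, 1, 0)))      # 2, i.e. 2! * 1! * 0!
print(fock_inner_product((2, 1, 0), (2, 1, 0)))  # 1.0
print(fock_inner_product((2, 1, 0), (1, 2, 0)))  # 0.0
```

For equal occupations the permanent counts exactly the permutations that shuffle photons within each mode, so the normalization factor cancels it to give 1.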

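Finally, we must describe the transformations of the space \(\Phi_{m,k}\). For \(A \in \mathbb{C}^{m \times m}\), the map \(\varphi(A)\) acts as \(A^{\otimes k}\) on symmetric tensors:

\[ A^{\otimes k}(v_1 \odot \cdots \odot v_k) = \frac{1}{k!} \sum_{\sigma \in S_k} A v_{\sigma(1)} \otimes \cdots \otimes A v_{\sigma(k)} = w_1 \odot \cdots \odot w_k, \]

where \(w_i = A v_i \in \mathbb{C}^m\). We get the following other important properties of \(\varphi\) from this definition: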

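Fact 62. If U is unitary, then \(\varphi(U)\) is unitary.

Proof. If \(U \in \mathrm{U}(m)\), then \(U^{\otimes k} \in \mathrm{U}(m^k)\). □

Fact 63. \(\varphi\) is a group homomorphism: \(\varphi(AB) = \varphi(A)\varphi(B)\) for \(A, B \in \mathbb{C}^{m \times m}\).

Proof. \((AB)^{\otimes k} = A^{\otimes k} B^{\otimes k}\). □

Finally, we are ready to state and prove the \(\varphi\)-transition formula.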

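Theorem 64. For \(A \in \mathbb{C}^{m \times m}\), \(|S\rangle = |s_1, \ldots, s_m\rangle \in \Phi_{m,k}\), and \(|T\rangle = |t_1, \ldots, t_m\rangle \in \Phi_{m,k}\), we have

\[ \langle t_1, \ldots, t_m|\varphi(A)|s_1, \ldots, s_m\rangle = \frac{\operatorname{per}(A_{S,T})}{\sqrt{s_1! \cdots s_m!\, t_1! \cdots t_m!}} \]

provided \(\sum_{i=1}^{m} s_i = \sum_{i=1}^{m} t_i = k\). Here \(A_{S,T}\) is the matrix obtained by taking \(s_i\) copies of the ith row and \(t_j\) copies of the jth column in A for all \(i, j \in \{1, 2, \ldots, m\}\).

Proof. By definition, we have

\begin{align*}
\langle t_1, \ldots, t_m|\varphi(A)|s_1, \ldots, s_m\rangle
&= \frac{k!}{\sqrt{s_1! \cdots s_m!\, t_1! \cdots t_m!}} \langle v_1 \odot \cdots \odot v_k, A w_1 \odot \cdots \odot A w_k \rangle \\
&= \frac{\operatorname{per}\big(\langle v_i, A w_j \rangle\big)_{i,j}}{\sqrt{s_1! \cdots s_m!\, t_1! \cdots t_m!}},
\end{align*}

where \(v_i, w_j \in \{e_1, \ldots, e_m\}\), \(s_i = |\{w_j \mid w_j = e_i\}|\), and \(t_i = |\{v_j \mid v_j = e_i\}|\). Notice that if \(v_i = e_k\), then the ith row of the matrix \((\langle v_i, A w_j \rangle)_{i,j}\) corresponds to the kth row of A. Following this reasoning, we get that \((\langle v_i, A w_j \rangle)_{i,j} = A_{S,T}\). □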

4.3 Permanents of Real Orthogonal Matrices

The first class of matrices we consider are the real orthogonal matrices, that is, square matrices \(A \in \mathbb{R}^{n \times n}\) with \(AA^T = A^TA = I\). This section is devoted to proving the following theorem, which forms the basis for many of the remaining results in this chapter.

Theorem 65. The permanent of a real orthogonal matrix is #P-hard.

The orthogonal matrices form a group under composition, the real orthogonal group, usually denoted \(\mathrm{O}(n, \mathbb{R})\). This is a subgroup of the unitary group, \(\mathrm{U}(n, \mathbb{C})\), which is itself a subgroup of the general linear group \(\mathrm{GL}(n, \mathbb{C})\). Notice then that the hardness result of Theorem 65 will carry over to unitary matrices and invertible matrices.⁶

⁶ See Corollary 79 for a complete list of classical Lie groups for which our result generalizes.

Our result follows the outline of Aaronson's linear optical proof [2] that the permanent is #P-hard. In particular, our result depends on the KLM construction [41], and a subsequent improvement by Knill [60], which will happen to have several important properties for our reduction.

Let us briefly summarize Aaronson's argument. Suppose we are given a classical circuit C, and wish to compute \(\Delta_C\), the number of satisfying assignments minus the number of unsatisfying assignments. Clearly, calculating \(\Delta_C\) is a #P-hard problem. The first thing to notice is that there exists a simple quantum circuit Q such that the amplitude \(\langle 0 \cdots 0|Q|0 \cdots 0\rangle\) is proportional to \(\Delta_C\). The KLM protocol of Theorem 60 implies that there exists a postselected linear optical experiment simulating Q. This results in the following chain which relates \(\Delta_C\) to a permanent.

\[ \operatorname{per}(U_{I,I}) = \langle I|\varphi(U)|I\rangle \propto \langle 0 \cdots 0|Q|0 \cdots 0\rangle \propto \Delta_C. \]

Notice that Aaronson's result does not imply that the permanent of \(U \in \mathrm{U}(n, \mathbb{C})\) is #P-hard since \(U_{I,I}\) is a submatrix of U. If, however, \(|S\rangle = |T\rangle = |1, \ldots, 1\rangle\), then \(U_{S,T} = U\) so the analogous chain relates \(\Delta_C\) directly to the permanent of U, which is a complex unitary matrix. In fact, this is exactly what we will arrange by modifying the KLM protocol. Furthermore, we will be careful to use real matrices exclusively during all gadget constructions, which will result in U being real, finishing the proof of Theorem 65.

In the following subsections, we will focus on the exact details of the reduction and emphasize those points where our construction differs from that of Aaronson.

4.3.1 Constructing the Quantum Circuit

Let $C : \{0,1\}^n \to \{0,1\}$ be a classical Boolean circuit of polynomial size and let

$$\Delta_C := \sum_{x \in \{0,1\}^n} (-1)^{C(x)}.$$

In this section, we prove the following:

Theorem 66. Given $C$, there exists a $p(n)$-qubit quantum circuit $Q$ such that

$$\langle 0|^{\otimes p(n)} Q |0\rangle^{\otimes p(n)} = \frac{\Delta_C}{2^n},$$

where $p(n)$ is some polynomial in $n$. Furthermore, $Q$ can be constructed in polynomial time with a polynomial number of real single-qubit gates and CSIGN gates.

To prove the theorem, it will suffice to implement $O_C$, the standard oracle instantiation of $C$ on $n + 1$ qubits. That is, $O_C|x, b\rangle = |x, b \oplus C(x)\rangle$ for all $x \in \{0,1\}^n$ and $b \in \{0,1\}$. The circuit for $Q$ is depicted below, where $H$ is the Hadamard gate and $Z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$ is the Pauli $\sigma_z$ gate.

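From this construction, we have

$$\langle 0|^{\otimes p(n)} Q |0\rangle^{\otimes p(n)} = \frac{1}{2^n}\Big(\sum_{x \in \{0,1\}^n} \langle x|\langle -|\Big)\, O_C \,\Big(\sum_{x \in \{0,1\}^n} |x\rangle|-\rangle\Big) = \frac{\Delta_C}{2^n}.$$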
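The identity above can be checked numerically on a toy instance. The following is a minimal sketch, not the construction of Theorem 66 itself: it skips the ancillas and the gate-level decomposition and simply evaluates the amplitude of $O_C$ between copies of the state $|+\rangle^{\otimes n}|-\rangle$, which is the state that the Hadamard and $Z$ gates prepare from $|0 \cdots 0\rangle$. The function names and the example circuit `example_and` are illustrative assumptions.

```python
import itertools
import numpy as np

def delta(C, n):
    """Brute-force Delta_C: sum of (-1)^C(x) over all n-bit inputs x."""
    return sum((-1) ** C(x) for x in itertools.product((0, 1), repeat=n))

def oracle_matrix(C, n):
    """Permutation matrix of O_C|x, b> = |x, b XOR C(x)> on n + 1 qubits."""
    dim = 2 ** (n + 1)
    O = np.zeros((dim, dim))
    for idx in range(dim):
        x = tuple((idx >> (n - k)) & 1 for k in range(n))  # bits x_1..x_n
        b = idx & 1                                        # last qubit
        out_idx = (idx & ~1) | (b ^ C(x))                  # flip b iff C(x) = 1
        O[out_idx, idx] = 1.0
    return O

def amplitude(C, n):
    """Amplitude <+...+,-| O_C |+...+,-> obtained via phase kickback."""
    plus = np.array([1.0, 1.0]) / np.sqrt(2)
    minus = np.array([1.0, -1.0]) / np.sqrt(2)
    state = minus
    for _ in range(n):
        state = np.kron(plus, state)  # prepend a |+> qubit
    return float(state @ oracle_matrix(C, n) @ state)

def example_and(x):
    """Hypothetical example circuit: AND of two input bits."""
    return x[0] & x[1]

n = 2
gap = delta(example_and, n)
amp = amplitude(example_and, n)
print(gap, amp, gap / 2 ** n)
```

For any Boolean function on a handful of bits, the printed amplitude should agree with $\Delta_C / 2^n$ up to floating-point error.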

Therefore, to complete the proof, it suffices to construct $O_C$ from CSIGN and single-qubit gates. For now let us assume we have access to the Toffoli gate as well. Since $C$ is a classical Boolean function of polynomial complexity, $O_C$ can be implemented with a polynomial number of Toffoli and NOT gates7 and a polynomial number of ancillas starting in the $|0\rangle$ state [86]. Let us describe, briefly, one way to construct $O_C$. Suppose we are given the circuit $C$ as a network of polynomially many NAND gates. For each wire, with the exception of the input wires, we create an ancilla initially in state $|0\rangle$ and use the NOT gate to put it in state $|1\rangle$. For each NAND gate (in topological ordering, i.e., such that no gate is applied before its inputs have been computed), we apply a Toffoli gate

7 Because we require that all ancillas start in the $|0\rangle$ state, we also need the NOT gate to create $|1\rangle$ ancillas.

targeting the ancilla associated with the output wire, and controlled by the qubits associated with its input wires (whether they are the output of an earlier NAND gate, or an actual input). Hence, the target qubit is in state $|1\rangle$ unless both control qubits are in state $|1\rangle$, simulating a NAND gate. Once we have applied all the gates of $C$, the output of the function will exist in the final ancilla register. We can now apply the same sequence of gates (ignoring the final Toffoli gate) in reverse order, which returns all other ancillas and inputs to their original value. This completes the construction.

Finally, we must construct the Toffoli gate from single-qubit gates and CSIGN gates. Unfortunately, Aaronson's proof [2] uses a classic construction of the Toffoli gate which uses complex single-qubit gates (see, for example, Nielsen and Chuang [66]). This will later give rise to linear optical circuits with complex matrix representations as well.8 Therefore, we will restrict ourselves to CSIGN and real single-qubit gates in our construction of the Toffoli gate. Let us first define $R_y(\theta)$ as the rotation by $\theta$ about the Y-axis. That is, $R_y(\theta) = \cos(\theta/2) I - i \sin(\theta/2) Y$ where $Y$ is the Pauli $\sigma_y$ matrix. For our purposes, we only require the following two matrices:

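$$R_y(\pi/4) = \frac{1}{2}\begin{pmatrix} \sqrt{2+\sqrt{2}} & -\sqrt{2-\sqrt{2}} \\ \sqrt{2-\sqrt{2}} & \sqrt{2+\sqrt{2}} \end{pmatrix}, \qquad R_y(\pi) = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}.$$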

Lemma 67. There exists a circuit of CSIGN, Hadamard, and $R_y(\pi/4)$ gates which implements a Toffoli gate exactly.

Proof. We construct the Toffoli gate from the CSIGN, Hadamard, and $R_y(\pi/4)$ gates in three steps:

1. Construct a controlled-controlled-$R_y(\pi)$ gate (CC-$R_y(\pi)$) from CSIGN and $R_y(\pi/4)$ gates. CC-$R_y(\pi)$ is a three-qubit gate that applies $R_y(\pi)$ to the third qubit if the first two qubits are in the state $|11\rangle$. Notice that CC-$R_y(\pi)$ is already a kind of "poor man's" Toffoli gate. If it were not for the minus sign in the $R_y(\pi)$ gate, we would be done. The construction is given in Figure 4-2. Observe that if either of the two control qubits is zero, then any CNOT gate controlled by that qubit can be ignored. The remaining gates will clearly cancel to the identity. Furthermore, if the two control qubits are in the state $|11\rangle$, then on the last qubit, we apply the operation

$$X R_y(\pi/4)^{-1} X R_y(\pi/4) X R_y(\pi/4)^{-1} X R_y(\pi/4).$$

Since $X R_y(\pi/4)^{-1} X = R_y(\pi/4)$,

$$X R_y(\pi/4)^{-1} X R_y(\pi/4) X R_y(\pi/4)^{-1} X R_y(\pi/4) = R_y(\pi/4)^4 = R_y(\pi).$$

Notice that this construction uses CNOT gates, but observe that a CNOT is a CSIGN gate conjugated by the Hadamard gate:

$$(I \otimes H)\, \mathrm{CSIGN}\, (I \otimes H) = \mathrm{CNOT}.$$

Figure 4-2: Generating CC-$R_y(\pi)$ from the CNOT and $R_y(\pi/4)$ gates.

8 Actually, the proof of Aaronson [2] claims that the final linear optical matrix consists entirely of real-valued entries even though the matrices of the individual single-qubit gates have complex entries. In fact, the matrix does have complex entries, but our construction for Toffoli suffices to fix this error.

9 Although it is known that the Toffoli gate and the set of real single-qubit gates suffice to densely generate the orthogonal matrices (i.e., $O(2^n)$ for every $n > 0$) [76], it will turn out to be both simpler and necessary to have an exact decomposition. In particular, we will need an exact construction of the Toffoli gate in Section 4.4 where we discuss the computation of permanents in finite fields.

2. Construct a non-affine reversible gate from CSIGN and CC-$R_y(\pi)$ gates. By classical, we simply mean that the gate maps each computational basis state to another computational basis state (i.e., states of the form $|x\rangle$ for $x \in \{0,1\}^n$). If this transformation is non-affine, then it suffices to generate Toffoli (perhaps with some additional ancilla qubits) by Aaronson et al. [6]. The construction is shown in Figure 4-3.

Figure 4-3: Generating non-affine classical gate from CC-$R_y(\pi)$ and CSIGN.

3. Use the non-affine gate to generate Toffoli. We give an explicit construction in Figure 4-4. Notice that the fourth qubit is an ancillary qubit starting in the $|0\rangle$ state.10

Figure 4-4: Generating Toffoli gate from non-affine gate in Figure 4-3.

□

This completes the proof of Theorem 66.
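The matrix identities used in step 1 of the proof of Lemma 67 are easy to confirm numerically. The sketch below is only a sanity check of those identities (it does not build the three-qubit circuits of Figures 4-2 through 4-4); the helper `ry` and the variable names are illustrative assumptions, and the first qubit is taken to be the control in the two-qubit gates.

```python
import numpy as np

def ry(theta):
    """Real matrix of R_y(theta) = cos(theta/2) I - i sin(theta/2) Y."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

X = np.array([[0.0, 1.0], [1.0, 0.0]])
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
I2 = np.eye(2)
CSIGN = np.diag([1.0, 1.0, 1.0, -1.0])
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)

R = ry(np.pi / 4)
R_inv = R.T  # a real rotation is orthogonal, so its inverse is its transpose

# Closed form of R_y(pi/4) in terms of nested square roots.
a, b = np.sqrt(2 + np.sqrt(2)), np.sqrt(2 - np.sqrt(2))
closed_form_ok = np.allclose(R, 0.5 * np.array([[a, -b], [b, a]]))

# X R_y(pi/4)^{-1} X = R_y(pi/4).
conj_ok = np.allclose(X @ R_inv @ X, R)

# The operation applied to the target when both controls are 1.
product = X @ R_inv @ X @ R @ X @ R_inv @ X @ R
product_ok = np.allclose(product, ry(np.pi))

# CNOT is CSIGN conjugated by a Hadamard on the target qubit.
cnot_ok = np.allclose(np.kron(I2, H) @ CSIGN @ np.kron(I2, H), CNOT)

print(closed_form_ok, conj_ok, product_ok, cnot_ok)
```

All four checks involve only real matrices, which is the property the reduction relies on.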

4.3.2 Postselected Linear Optical Gadgets

We will construct a postselected linear optical circuit L which will simulate the qubit circuit Q on the all zeros input via a modified version of the KLM protocol. The following chain of relations will hold:11

$$\mathrm{per}(L) = \langle 1, \ldots, 1|\varphi(L)|1, \ldots, 1\rangle \propto \langle 0 \cdots 0|Q|0 \cdots 0\rangle \propto \Delta_C.$$

The first step was to convert from a classical circuit to a quantum circuit. Below we formalize the second step: converting from a quantum circuit to a linear optical circuit.

10 Indeed, this ancillary qubit is necessary because the non-affine gate in Figure 4-3 is an even permutation and the Toffoli gate is an odd permutation on three bits.

11 To clarify, $|0 \cdots 0\rangle$ is a tensor product of qubits in the state $|0\rangle$ and $|1, \ldots, 1\rangle$ is a Fock state with 1 photon in every mode.

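Theorem 68. Given an $n$-qubit quantum circuit $Q$ with a polynomial number of CSIGN and real single-qubit gates, there exists a linear optical circuit $L \in O(4n + 2\Gamma, \mathbb{R})$ such that $\langle 1, \ldots, 1|\varphi(L)|1, \ldots, 1\rangle$ is proportional to $\langle 0|^{\otimes n} Q |0\rangle^{\otimes n}$, with a nonzero proportionality constant that depends only on $n$ and $\Gamma$, where $\Gamma$ is the number of CSIGN gates in $Q$. Furthermore, $L$ can be computed in time polynomial in $n$.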

We now give an explicit construction of $L$ using the original KLM protocol, subsequent improvements by Knill [60], and a new gadget unique to our problem. First, let us recall our main issue with using the original KLM protocol: to prove that permanents of orthogonal matrices are #P-hard, we must have that all modes start and end with exactly one photon. There are two instances in which the original KLM protocol requires a mode to be empty at the beginning and end of the computation. First, the $\mathrm{NS}_1$ gate postselects on the Fock state $|0,1\rangle$, and second, the KLM protocol works in a dual-rail encoding. Therefore, half of the modes in the original KLM protocol start and end empty.

To overcome the first obstacle, we appeal to subsequent work of Knill [60], in which the $\mathrm{NS}_1$ gadget construction for CSIGN is replaced by a single 4-mode gadget $V$, which directly implements CSIGN with two modes postselected in state $|1,1\rangle$. From the matrix gadget

-2 2

1 2 -2V 2 V =_ 3dv _ 6 + 2vf 6 - 2VV - v/3 + vf6 \3 - V6 - /6 - 2vr6 -l/6 + 2 v'-6 - v/3 - v5_ - yv3 + v6-)

we can directly calculate the transition amplitudes of the circuit:

(0, 0, 1, 1|p(V)0, 0, 1, 1) = (0,1,1,p(V)1,0,1,1)= 0

(0, 1, 1, 1|k(V)0,1, 1,1) = (1,0,1,1 (V)0, 1,1,1) = 0

(1, 0, 1, 1|p0(V)|I1, 0, 1, 1) = 1,V 2 (2, 0, 1, 1|po(V)|1, 1, 1, 1) = 0

(1, 1, 1, 1|p(V)|1, 1,1, 1) = W2- (0, 2, 1, 1| p(V)|1, 1, 1, 1) = 0

We now argue that these transition amplitudes suffice to generate a postselected CSIGN. Consider the linear optical circuit depicted in Figure 4-5: the first two inputs of the $V$ gadget are applied to the dual-rail modes which contain a photon whenever their corresponding input qubits of the CSIGN gate are in state $|1\rangle$; the next two modes are postselected in the $|1,1\rangle$ state. First, because we postselect on the final two modes ending in the state $|1,1\rangle$, we only need to consider those transitions for which those two modes end in that state. Secondly, because we use "fresh" ancillary modes for every CSIGN gate, we can always assume that those two modes start in the $|1,1\rangle$ state. This already vastly reduces the number of cases we must consider.

Figure 4-5: Applying a postselected $V$ gadget to generate CSIGN.

Finally, we wish to know what will happen when the first two modes start in the states $|0,0\rangle$, $|0,1\rangle$, $|1,0\rangle$, and $|1,1\rangle$. Our construction will ensure that there is never more than one photon per mode representing one of the dual-rail encoded qubits. For instance, the transition amplitudes of $V$ show that whenever the first two modes of the circuit each start with a photon, there is 0 probability (after postselection) that those photons transition to a state in which one of those modes contains 2 photons and the other contains no photons. We find that all other amplitudes behave exactly as we would expect for CSIGN. Since each of the acceptable transitions (e.g., from the state $|0,1\rangle$ to the state $|0,1\rangle$) has equal magnitude, we only have left to check that $V$ flips the sign of the state whenever the input modes are both in the $|1\rangle$ state, which is indeed the case. Importantly, because $\varphi$ is a homomorphism, we can analyze each such gate separately. Therefore, using the above we can now construct a linear optical circuit where all of our postselected modes for CSIGN start and end with exactly one photon.

We now turn our attention to the dual-rail encoding. Instead of changing the dual-rail encoding of the KLM protocol directly, we will start with one photon in every mode and apply a linear optical gadget to convert to a dual-rail encoding. Of course, the number of photons in the circuit must be conserved, so we will dump these extra photons into $n$ modes separate from the modes of the dual-rail encoding. Specifically, each logical qubit in our scheme is initially represented by four modes in the state $|1,1,1,1\rangle$. We construct a gadget that moves a photon from the first mode to the third mode, postselecting on a single photon in the last mode. That is, under postselection, we get the transition

$$|1,1,1,1\rangle \to |0,1,2,1\rangle,$$

where the first two modes now represent a logical $|0\rangle$ qubit in the dual-rail encoding, the third mode stores the extra photon (which we will reclaim later), and the last mode is necessary for postselection. We call the gadget for this task the encoding gadget $E$, and it is applied to the first, third, and fourth mode of the above state. The matrix12 for $E$ is

E= 0 v/ v/J from which we get the following transition amplitudes:

(1,1,l1o(E)|1, 1) = 0, (2, 0, 1|o(E)|1,1, 1) = 0, (0, 2, 1|p(E)1, 1, 1)=

12 To find $E$, we first define a set of constraints on transition amplitudes. The following equations must hold for this particular encoding gadget to exist: $\langle 1,1,1|\varphi(E)|1,1,1\rangle = 0$, $\langle 2,0,1|\varphi(E)|1,1,1\rangle = 0$, $\langle 0,2,1|\varphi(E)|1,1,1\rangle \neq 0$. That is, starting from the state $|1,1,1\rangle$, there is some nonzero amplitude on the state $|0,2,1\rangle$ and zero amplitude on the states $|1,1,1\rangle$ and $|2,0,1\rangle$. We then solve these constraints using MATHEMATICA.

After applying the encoding gadget to each logical qubit, we can implement the KLM protocol as previously discussed.13 Therefore, the relevant amplitude in the computation of $Q$ is now proportional to the amplitude of the Fock state which has $n$ groups of modes in the state $|0,1,2,1\rangle$ and $2\Gamma$ modes in the state $|1\rangle$ (with $\Gamma$ the number of CSIGN gates in $Q$). Because we want to return to a state which has one photon in every mode, we must reverse the encoding step.14 For this purpose, we construct a decoding gadget $D$, which will not require any extra postselected modes. We apply the gadget to the second and third modes of the logical qubit such that the two photons in the third mode split with some nonzero probability. The matrix for $D$ is

$$D = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix},$$

from which the transition condition $\langle 1,1|\varphi(D)|0,2\rangle = -1/\sqrt{2}$ follows. Nearly any two-mode linear optical gate would suffice here, but $D$, the familiar Hadamard gate, maximizes the norm of the amplitude on state $|1,1\rangle$. If the logical qubit is in state $|1\rangle$, then $D$ is applied to the three-photon state $|1,2\rangle$. Therefore, the resulting amplitude on the two-photon state $|1,1\rangle$ is zero by conservation of photon number.

To complete the proof of the theorem, let the linear optical circuit $L$ simply be the composition of the encoding gadget, the KLM scheme, and the decoding gadget.
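The amplitude quoted for the decoding gadget can be reproduced from the permanent formula for Fock-state transitions. The sketch below assumes the usual convention $\langle S|\varphi(U)|T\rangle = \mathrm{per}(U_{S,T})/\sqrt{s_1! \cdots s_m!\, t_1! \cdots t_m!}$, where $U_{S,T}$ repeats row $i$ of $U$ $s_i$ times and column $j$ $t_j$ times; since $D$ is symmetric, the row/column convention does not affect the result here. The function names are illustrative.

```python
import itertools
import math
import numpy as np

def permanent(M):
    """Permanent by direct expansion over permutations (small matrices only)."""
    n = M.shape[0]
    return sum(
        math.prod(M[i, p[i]] for i in range(n))
        for p in itertools.permutations(range(n))
    )

def transition_amplitude(U, S, T):
    """<S| phi(U) |T> for Fock states S and T (tuples of photon counts)."""
    if sum(S) != sum(T):
        return 0.0  # linear optics conserves the total photon number
    rows = [i for i, s in enumerate(S) for _ in range(s)]
    cols = [j for j, t in enumerate(T) for _ in range(t)]
    sub = U[np.ix_(rows, cols)]
    norm = math.sqrt(math.prod(math.factorial(k) for k in (*S, *T)))
    return permanent(sub) / norm

D = np.array([[1.0, 1.0], [1.0, -1.0]]) / math.sqrt(2)

amp_split = transition_amplitude(D, (1, 1), (0, 2))    # two photons split
amp_blocked = transition_amplitude(D, (1, 1), (1, 2))  # three photons in, two out
print(amp_split, amp_blocked)
```

The first value should equal $-1/\sqrt{2}$ and the second should vanish, matching the two cases discussed for logical $|0\rangle$ and logical $|1\rangle$.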

4.3.3 Main Result

We are finally ready to prove Theorem 65, which we restate formally below:

Theorem 65. The permanent of a real orthogonal matrix is #P-hard. Specifically, given a polynomially sized Boolean circuit $C$, there exist integers $a, b \in \mathbb{Z}$ and a real orthogonal matrix $L$ over a finite Galois extension $\mathbb{Q}(\alpha)$ (where $\alpha = \sqrt{2+\sqrt{2}} + \sqrt{3+\sqrt{6}}$) computable in polynomial time such that $\mathrm{per}(L) = 2^a 3^b \Delta_C$.

13 One might wonder why we cannot simply apply the encoding gadget to the entire input, thus circumventing the need to use Knill's more complicated $V$ gadget to implement CSIGN. Examining Theorem 60 carefully, we see that all the postselection actually happens at the end of the computation. One might be concerned that once we measured the state $|0,1\rangle$ to implement $\mathrm{NS}_1$, those modes would remain in that state. Nevertheless, it is possible to compose the gadgets in such a way to allow for postselection on $|0\rangle$ while maintaining that the desired amplitude is still on the $|1, \ldots, 1\rangle$ state. We omit such a design since $V$ will turn out to have some nice properties, including its minimal usage of ancillary modes.

14 Notice that postselection was required for the encoding gadget, so it does not have a natural inverse.

Proof. We reduce from the problem of calculating $\Delta_C$ for some polynomially sized Boolean circuit $C$ on $n$ bits. By Theorem 66, we first construct the quantum circuit $Q$ from CSIGN and single-qubit gates such that $\langle 0|^{\otimes p(n)} Q |0\rangle^{\otimes p(n)} = \Delta_C / 2^n$. Let $\Gamma$ be the number of CSIGN gates in $Q$. We then convert the qubit circuit $Q$ to a linear optical circuit $L$ on $4p(n) + 2\Gamma$ modes using Theorem 68. Notice that we can assume without loss of generality that $p(n)$ and $\Gamma$ are both even since we can always add an extra qubit to the circuit $Q$ and/or apply an extra CSIGN gate to the $|00\rangle$ state.

Combined with the fact that the output amplitudes of linear optical experiments can be described by permanents via the $\varphi$-transition formula, we have the following chain of consequences

$$\mathrm{per}(L) = \langle 1, \ldots, 1|\varphi(L)|1, \ldots, 1\rangle$$

)©PC") = (J~()P~l)( 1 ®(0|n)Q0

$= 2^a 3^b \Delta_C$, where the last equality comes from the fact that $\Gamma$ and $p(n)$ are even. □

We now turn to the question of how to represent entries of the orthogonal matrix. First, the problem is clearly still hard if we generalize the matrix to arbitrary algebraic numbers (say, represented implicitly with integer polynomials) instead of only $\mathbb{Q}(\alpha)$. More practically, the entries may be represented as floating point numbers, such that the matrix is only approximately orthogonal due to rounding error. To this end, we state without proof the following corollary:

Corollary 69. Given AE Qfx" such that ||A - A||o < 2c' for some orthogonal

103 matrix A E Rnx", the problem of computing per(A) to within additive 2` precision is #P-hard for some constant c.

4.4 Permanents over Finite Fields

Valiant's foundational work on #P is well-known, but his contemporary work on the relationship between the permanent and the class we now know as $\mathrm{Mod}_k\mathrm{P}$ is less appreciated. In another 1979 paper [89], Valiant showed that the permanent modulo $p$ is $\mathrm{Mod}_p\mathrm{P}$-complete, except when $p = 2$, in which case the permanent coincides with the determinant because $1 \equiv -1 \pmod 2$.

Theorem 70 (Valiant [89]). The problem of computing $\mathrm{per}(M) \bmod p$ for a square matrix $M \in \mathbb{F}_p^{n \times n}$ is $\mathrm{Mod}_p\mathrm{P}$-complete for any prime $p \neq 2$ (and in $\mathrm{NC}^2$ otherwise).
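The exceptional case $p = 2$ can be illustrated directly: the permanent and determinant differ only in the signs of their terms, and signs disappear modulo 2. The following is a small brute-force sketch on random integer matrices; the helper names are illustrative, and the expansion over permutations is only practical for tiny matrices.

```python
import itertools
import math
import random

def sign(perm):
    """Sign of a permutation, computed from its inversion count."""
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1

def permanent(M):
    n = len(M)
    return sum(
        math.prod(M[i][p[i]] for i in range(n))
        for p in itertools.permutations(range(n))
    )

def determinant(M):
    n = len(M)
    return sum(
        sign(p) * math.prod(M[i][p[i]] for i in range(n))
        for p in itertools.permutations(range(n))
    )

random.seed(0)
matrices = [
    [[random.randint(-5, 5) for _ in range(4)] for _ in range(4)]
    for _ in range(20)
]
# per(M) and det(M) always have the same parity.
all_agree = all((permanent(M) - determinant(M)) % 2 == 0 for M in matrices)
print(all_agree)
```

Over the integers the two quantities generally differ (for the all-ones $2 \times 2$ matrix the permanent is 2 and the determinant is 0), but their parities always match.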

As discussed in Section 2.3, ModpP-hardness provides evidence for the difficulty of computing the permanent, even modulo a prime. In particular, an efficient algorithm for the problem would collapse the polynomial hierarchy.

In the spirit of our result on real orthogonal matrices, we ask whether the permanent is still hard for orthogonal matrices in a finite field. We are not the first to consider the problem; there is the following surprising theorem of Kogan [61] in 1996.

Theorem 71 (Kogan [61]). Let $\mathbb{F}$ be any field of characteristic 3. There is a polynomial time algorithm to compute the permanent of any orthogonal matrix over $\mathbb{F}$.

In other words, for orthogonal matrices, the permanent is easy to compute for fields of characteristic 2 (since it is easy in general), but it is also easy for fields of characteristic 3 (by a much more elaborate argument)! Could it be that the permanent is easy for all finite fields of some other characteristic? No, it turns out. Using the gadgets from Section 4.3, we prove a converse to Theorem 71.

Theorem 72. Let $p \neq 2, 3$ be a prime. There exists a finite field of characteristic $p$, namely $\mathbb{F}_{p^4}$, such that the permanent of an orthogonal matrix in $\mathbb{F}_{p^4}$ is $\mathrm{Mod}_p\mathrm{P}$-hard.

We prove the theorem by carefully porting Theorem 65 to the finite field setting. Recall that Theorem 65 takes a circuit $C$ and constructs a sequence of gadgets $G_1, \ldots, G_m$ such that

$$\mathrm{per}(G_1 \cdots G_m) = 2^a 3^b \Delta_C, \qquad (4.1)$$

for some $a, b \in \mathbb{Z}$. In general, there is no way to convert such an identity on real numbers into one over finite fields, but all of our gadgets are built out of algebraic numbers. In particular, all of the entries are in some algebraic field extension $\mathbb{Q}(\alpha)$ of the rationals, where $\alpha \approx 4.182173283$ is the largest real root of the irreducible polynomial

$$f(x) = x^{16} - 40x^{14} + 572x^{12} - 3736x^{10} + 11782x^{8} - 17816x^{6} + 11324x^{4} - 1832x^{2} + 1.$$
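As a sanity check on the polynomial above, one can evaluate it at $\alpha = \sqrt{2+\sqrt{2}} + \sqrt{3+\sqrt{6}}$ in high-precision arithmetic. This is only a numerical sketch using Python's standard `decimal` module (it does not establish irreducibility or that $\alpha$ is the largest root); the coefficient list is transcribed from the displayed polynomial and the precision setting is an arbitrary choice.

```python
from decimal import Decimal, getcontext

getcontext().prec = 60  # working precision in decimal digits

# Coefficients of f from degree 16 down to degree 0 (odd powers are absent).
COEFFS = [1, 0, -40, 0, 572, 0, -3736, 0, 11782, 0,
          -17816, 0, 11324, 0, -1832, 0, 1]

def f(x):
    """Evaluate f at x using Horner's rule."""
    acc = Decimal(0)
    for c in COEFFS:
        acc = acc * x + c
    return acc

two, three, six = Decimal(2), Decimal(3), Decimal(6)
alpha = (two + two.sqrt()).sqrt() + (three + six.sqrt()).sqrt()
residual = f(alpha)

print(alpha)     # should begin 4.182173283...
print(residual)  # should be zero up to rounding error
```

Because $f$ contains only even powers of $x$, $-\alpha$ is a root as well, consistent with the sign choices available when taking square roots.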

Each element in $\mathbb{Q}(\alpha)$ can be written as a polynomial (of degree less than 16) in $\alpha$ over the rationals. In Section 4.7.1, we give explicit canonical representations for a set of numbers which generate (via addition, subtraction and multiplication, but not division) the entries of all our gadgets.

Each entry of a gadget $G_i$ is a polynomial in $\alpha$ with rational coefficients, so observe that we can take a common denominator for the coefficients and write the entry as an integer polynomial divided by some positive integer. By the same token, we can take a common denominator for the entries of a gadget $G_i$ and write it as $G_i = \frac{1}{k_i}\hat{G}_i$, where $\hat{G}_i$ is a matrix over $\mathbb{Z}[\alpha]$ and $k_i$ is a positive integer.

Now we would like to take Equation 4.1 modulo a prime $p$. In principle, we can pull $k_1, \ldots, k_m$ out of the permanent, multiply through by $Z = (k_1 \cdots k_m)^n 2^{|a|} 3^{|b|}$ to remove all fractions on both sides, and obtain an equation of the form

$$K\, \mathrm{per}(\hat{G}_1 \cdots \hat{G}_m) = K' \Delta_C,$$

where $K, K'$ are integers and each $\hat{G}_i$ is the integer matrix $k_i G_i$. Then the entire equation is over $\mathbb{Z}[\alpha]$, so if we reduce all the coefficients modulo $p$, we get an equation over $\mathbb{F}_p[\alpha]$.

We show in Section 4.7.1 that for each gadget we use, the denominator $k_i$ may have prime divisors 2, 3, and 23, but no others. Hence, as long as $p \neq 2, 3, 23$ (and in the case $p = 23$, there is an alternative representation we can use, see Section 4.7), we can divide through by $Z$, pull it back inside the permanent as the factors $\frac{1}{k_i}$, and distribute each $\frac{1}{k_i}$ into the corresponding integer matrix $\hat{G}_i$. This gives

$$\mathrm{per}(G_1 \cdots G_m) \equiv 2^a 3^b \Delta_C \pmod p,$$

the equivalent of Equation 4.1, but over $\mathbb{F}_p[\alpha]$. In particular, $G_1, \ldots, G_m$ are now orthogonal matrices in $\mathbb{F}_p[\alpha]$, and $\Delta_C$ has been reduced modulo $p$.

Note that $\mathbb{F}_p[\alpha] \cong \mathbb{F}_p[x]/(f(x))$ is a ring, not a field. If $f(x)$ were irreducible modulo $p$ then it would be a field, but this will never happen for our $f$. Consider the following lemma.

Lemma 73. Let $q$ be a prime power. Suppose $\mathbb{F}_q$ is the subfield of order $q$ contained in the finite field $\mathbb{F}_{q^2}$. Then every element in $\mathbb{F}_q$ has a square root in $\mathbb{F}_{q^2}$.

Proof. Let $a$ be an arbitrary element of $\mathbb{F}_q$. By definition, $a$ has a square root if the polynomial $f(x) := x^2 - a$ has a root. If $f$ has a root in $\mathbb{F}_q$ then we are done. Otherwise, $f$ is irreducible, but has a root in $\mathbb{F}_q[x]/(f(x)) \cong \mathbb{F}_{q^2}$. □

By Lemma 73, the square roots of 2 and 6 are in $\mathbb{F}_{p^2}$, and therefore so are $2 + \sqrt{2}$ and $3 + \sqrt{6}$. Then their square roots are in $\mathbb{F}_{p^4}$, so $\alpha = \sqrt{2+\sqrt{2}} + \sqrt{3+\sqrt{6}}$ is in $\mathbb{F}_{p^4}$. All the other roots of $f$ can be expressed as polynomials in $\alpha$ (see Section 4.7.2), so they are all in $\mathbb{F}_{p^4}$. It follows that $f$ factors over $\mathbb{F}_p$ as a product of irreducible polynomials, each of degree 1, 2, or 4. Suppose $g$ is some irreducible factor of $f$. The ideal $(g(x))$ contains $(f(x))$, so there exists a ring homomorphism $\sigma$ from $\mathbb{F}_p[x]/(f(x))$ to $\mathbb{F}_p[x]/(g(x))$. Note that $\mathbb{F}_p[x]/(g(x))$ is a field because $g(x)$ is irreducible over $\mathbb{F}_p$. Also, $\sigma$ fixes $\mathbb{F}_p$, so we obtain

$$\mathrm{per}(\sigma(G_1) \cdots \sigma(G_m)) = \sigma(\mathrm{per}(G_1 \cdots G_m)) = 2^a 3^b \Delta_C$$

as an equation over the field $\mathbb{F}_p[x]/(g(x))$. For each $i$, $\sigma(G_i)$ is orthogonal in the field $\mathbb{F}_p[x]/(g(x))$ as well:

$$\sigma(G_i)\sigma(G_i)^T = \sigma(G_i G_i^T) = \sigma(I) = I.$$

It follows that $M := \sigma(G_1) \cdots \sigma(G_m)$ is orthogonal.

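Depending on the degree of $g$, the field $\mathbb{F}_p[x]/(g(x))$ is isomorphic to $\mathbb{F}_p$, $\mathbb{F}_{p^2}$, or $\mathbb{F}_{p^4}$. But $\mathbb{F}_{p^4}$ contains $\mathbb{F}_p$ and $\mathbb{F}_{p^2}$, so $M$ can be lifted to a matrix over $\mathbb{F}_{p^4}$. Given the permanent of $M$ in $\mathbb{F}_{p^4}$, we can easily solve for $\Delta_C$, so this completes the proof of Theorem 72.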

Theorem 72 shows that for any prime $p \neq 2, 3$ there is some finite field of characteristic $p$ where computing permanents (of orthogonal matrices) is hard. In particular, $p = 2$ and $p = 3$ are the only cases where the permanent of an orthogonal matrix is easy to compute in every finite field of characteristic $p$, assuming the polynomial hierarchy does not collapse. We will now show that there are primes $p$ for which this problem is hard in any field of characteristic $p$, by showing that it is hard to compute in $\mathbb{F}_p$ (which is contained in every other field of characteristic $p$).

Theorem 74. For all but finitely many primes $p$ that split completely in $\mathbb{Q}(\alpha)$, computing the permanent of an orthogonal matrix over $\mathbb{F}_p$ is $\mathrm{Mod}_p\mathrm{P}$-complete. This is a sequence of primes with density $1/16$ beginning

191,239,241,337,383,433,673,863,911,1103,1151,1249,1583,1871,1873,2017,...

Proof. Recall that in the proof of Theorem 72, if $g$ is an irreducible factor of $f$, then the result applies over the field $\mathbb{F}_p[x]/(g(x)) \cong \mathbb{F}_{p^{\deg g}}$. We showed that $g$ has degree at most 4, but in special cases this can be improved. In particular, we want $g$ to be degree 1 (i.e., a linear factor) for our orthogonal matrix to be over $\mathbb{F}_p$.

First, observe that $\mathbb{Q}(\alpha)$ is a Galois extension of $\mathbb{Q}$. That is, every root of the minimal polynomial for $\alpha$ is in $\mathbb{Q}(\alpha)$. See Section 4.7.2 for details. We apply Chebotarev's density theorem [88], which says that if $K$ is a finite Galois extension of $\mathbb{Q}$ of degree $n$, then the density of primes which split completely in $K$ is $1/n$. We take $K = \mathbb{Q}(\alpha)$, a degree 16 extension of $\mathbb{Q}$. For our purposes, a prime $p$ splits completely if and only if the ideal $(p)$ factors into 16 distinct maximal ideals in the ring of integers of $\mathbb{Q}(\alpha)$. For all but finitely many such primes,15 we also have that $f$ (the minimal polynomial for $\alpha$) factors into distinct linear terms modulo $p$ by Dedekind's theorem. Furthermore, since $\mathbb{Q}(\alpha)$ is a Galois extension, $f$ will split into equal degree factors. Hence, if any factor is linear, then all the factors are linear.

Therefore, according to Chebotarev's theorem, $(1/16)$th of all primes split completely and yield the desired hardness result. We verified the list of primes given in the theorem computationally. □

Note that as a consequence of the proof above, we can also prove a hardness result over $\mathbb{F}_{p^2}$ for 3/16 of all primes. We leave open how hard it is to compute the permanent of an orthogonal matrix over $\mathbb{F}_p$ for the remaining 15/16 of all primes. Other linear optical gadgets can be used for CSIGN instead of $V$, resulting in different field extensions where different primes split. For instance, one can show that there exists an orthogonal gadget for KLM's $\mathrm{NS}_1$ gate for which computing the permanent modulo 97 is hard. However, it seems impossible to design linear optical gadgets that do not involve 2 or 3 photons at a time, in which case writing down $\varphi(L)$ requires $\sqrt{2}$ and $\sqrt{3}$. By quadratic reciprocity, these square roots only exist if $p \equiv \pm 1 \pmod{24}$ (i.e., for about a quarter of all primes), so the remaining primes may require some other technique.
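The congruence condition in the last sentence can be confirmed by brute force. The sketch below uses Euler's criterion to test whether 2 and 3 are both quadratic residues modulo each prime in a small range, and compares the result with the condition $p \equiv \pm 1 \pmod{24}$. The search bound and helper names are arbitrary illustrative choices; the list `LISTED` is copied from the statement of Theorem 74 and is only checked against the congruence, not against the splitting condition itself.

```python
def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def is_quadratic_residue(a, p):
    """Euler's criterion for an odd prime p not dividing a."""
    return pow(a, (p - 1) // 2, p) == 1

primes = [p for p in range(5, 3000) if is_prime(p)]
both_roots = {
    p for p in primes
    if is_quadratic_residue(2, p) and is_quadratic_residue(3, p)
}
plus_minus_one = {p for p in primes if p % 24 in (1, 23)}

# Primes from the statement of Theorem 74.
LISTED = [191, 239, 241, 337, 383, 433, 673, 863, 911, 1103,
          1151, 1249, 1583, 1871, 1873, 2017]

print(both_roots == plus_minus_one)
print(all(p % 24 in (1, 23) for p in LISTED))
print(len(plus_minus_one) / len(primes))  # roughly one quarter
```

Every prime in the list of Theorem 74 lies in one of the two residue classes, as it must, since a field containing $\alpha$ and its conjugates also contains $\sqrt{2}$ and $\sqrt{3}$.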

4.5 Expanding Permanent Hardness

In this section, we try to fill in some of the remaining landscape of matrix permanents. In particular, we will focus on the permanents of positive semidefinite (PSD) matrices and their connection to boson sampling. We will conclude by listing some matrix variants and their accompanying permanent complexities, many of which are simple consequences of the reduction in Section 4.3.

15 Actually, we can compute these primes explicitly as those that divide the index of $\mathbb{Z}[\alpha]$ in the ring of integers of $\mathbb{Q}(\alpha)$. For our choice of field, this number is $19985054955504338544361472 = 2^{75} \cdot 23^2$.

4.5.1 Positive Semidefinite Matrix Permanents

Permanents of PSD matrices have recently become relevant to the expanding theory of boson sampling [70]. Namely, permanents of PSD matrices describe the output probabilities of a boson sampling experiment in which the input is a tensor product of thermal states. Suppose we have a thermal state with $m$ modes. The $i$th mode of the system starts in a state of the form

$$\rho_i = (1 - \tau_i) \sum_{n=0}^{\infty} \tau_i^n |n\rangle\langle n|,$$

where $\tau_i = \langle n_i \rangle / (\langle n_i \rangle + 1)$ and $\langle n_i \rangle$ is the average number of photons one observes when measuring $\rho_i$. In particular, notice that $\tau_i \geq 0$.

Let $U$ be a unitary matrix representing the linear optical network applied to our thermal state. Define $D$ to be the diagonal matrix with $\tau_1, \ldots, \tau_m$ along the diagonal, and let $A = U D U^\dagger$. Since $\tau_i \geq 0$ for all $i$, $A$ is PSD. We can calculate the probability of detecting one photon in each mode:16

$$\Pr[1, \ldots, 1] = \frac{\mathrm{per}(A)}{\prod_{i=1}^{m} (1 + \langle n_i \rangle)}.$$

One might then reasonably ask, "how hard is it to compute such probabilities?" The following theorem answers that question in the exact case.

Theorem 75. The permanent of a positive-definite matrix in $\mathbb{Z}^{n \times n}$ is #P-hard. This implies #P-hardness for the larger class of positive semidefinite matrices.

16 A similar formula arises for detecting 1 photon in each of $k$ distinct modes and 0 photons in the remaining $m - k$ modes.

Proof. It is well-known that the permanent of a 0-1 matrix is #P-hard [90]. Therefore, let $B \in \{0,1\}^{n \times n}$ and consider the matrix

$$A_B = \begin{pmatrix} 0 & B \\ B^T & 0 \end{pmatrix}.$$

Since $\mathrm{per}(B) \geq 0$, we have $\mathrm{per}(B) = \sqrt{\mathrm{per}(A_B)}$. Also observe that $A_B^T = A_B$, so $A_B$ is Hermitian, that is, diagonalizable with real eigenvalues. Furthermore, since $B$ is a 0-1 matrix, the spectral radius of $A_B$ is at most $2n$. Defining $A_B(x) := A_B + xI$, we see that $A_B(x)$ is positive-definite for all $x > 2n$.

Notice now that $\mathrm{per}(A_B(x))$ is a degree-$2n$ polynomial in $x$. Therefore, given an oracle that calculates the permanent of a positive-definite matrix, we can interpolate a monic polynomial through the points $x = 2n + 1, 2n + 2, \ldots, 4n$ to recover the polynomial $\mathrm{per}(A_B(x))$. Since $\mathrm{per}(A_B(0)) = \mathrm{per}(A_B)$, the permanent of a positive-definite matrix under Turing reductions is #P-hard.

We now only have left to prove that the above reduction can be condensed into a single call to the positive-definite matrix permanent oracle. Since the matrix $B$ is a 0-1 matrix, the polynomial $\mathrm{per}(A_B(x))$ has positive integer coefficients, the largest of which is at most $(2n)!$. Therefore, if $x > (2n)!$, then we can deduce the constant term of $\mathrm{per}(A_B(x))$ with a single oracle call. Clearly, this requires at most a polynomial increase in the bit length of the integers used in the reduction. □
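The single-call step at the end of the proof of Theorem 75 can be made concrete on a tiny instance. The sketch below stands in for the oracle with a brute-force permanent and recovers the constant term of $\mathrm{per}(A_B(x))$ by reducing modulo $x$; taking the remainder is one simple way to read off the constant term when $x$ exceeds every coefficient, and is an assumption about how the deduction might be carried out rather than something the proof specifies. The matrix `B` and the helper names are illustrative.

```python
import itertools
import math

def permanent(M):
    """Exact integer permanent by expansion over permutations (tiny M only)."""
    n = len(M)
    return sum(
        math.prod(M[i][p[i]] for i in range(n))
        for p in itertools.permutations(range(n))
    )

def block_matrix(B):
    """A_B = [[0, B], [B^T, 0]] as a list of lists."""
    n = len(B)
    top = [[0] * n + list(row) for row in B]
    bottom = [[B[j][i] for j in range(n)] + [0] * n for i in range(n)]
    return top + bottom

def shifted(A, x):
    """A + x I."""
    return [
        [A[i][j] + (x if i == j else 0) for j in range(len(A))]
        for i in range(len(A))
    ]

B = [[1, 1, 0],
     [0, 1, 1],
     [1, 0, 1]]
n = len(B)
A = block_matrix(B)

x = math.factorial(2 * n) + 1             # x > (2n)! and hence x > 2n
oracle_value = permanent(shifted(A, x))   # the single "oracle call"
constant_term = oracle_value % x          # per(A_B) = per(B)^2
recovered = math.isqrt(constant_term)

print(constant_term, recovered, permanent(B))
```

Here `recovered` should equal `permanent(B)`, since $\mathrm{per}(A_B) = \mathrm{per}(B)\,\mathrm{per}(B^T) = \mathrm{per}(B)^2$.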

Theorem 75 implies that there is some linear optical experiment one can perform with thermal input states for which calculating the exact success probability is computationally difficult. We would like to say that this also precludes an efficient classical sampling algorithm (unless PH collapses), as is done in work by Aaronson and Arkhipov [3] and Bremner, Jozsa, Shepherd [26]. Unfortunately, those arguments rely on the fact that even finding an approximation to their output probabilities is difficult, but the following theorem heavily suggests that such a result cannot exist.

Theorem 76 (Rahimi-Keshari, Lund, Ralph [70]). There exists an efficient classical sampling algorithm for Boson Sampling with thermal input states. Furthermore, multiplicatively approximating the permanent of a PSD matrix is in the class $\mathrm{FBPP}^{\mathrm{NP}}$.

Intuitively, such an algorithm exists because it is possible to write the permanent of a PSD matrix as an integral17 of a nonnegative function, on which we can use Stockmeyer's approximate counting algorithm [82]. Such a representation as a sum of positive terms also implies that the permanent of a PSD matrix is nonnegative.

Notice that this also justifies our use of techniques distinct from the linear optical approach. If we could encode the answer to a GapP-hard problem into the permanent of a PSD matrix as we do with real orthogonal matrices, then multiplicatively approximating the permanent of a PSD matrix would also be GapP-hard under Turing reductions (see Theorem 80 in Section 4.6). On the other hand, Theorem 76 says that such a multiplicative approximation does exist, so

$$\mathrm{PH} \subseteq \mathrm{P}^{\mathrm{GapP}} \subseteq \mathrm{BPP}^{\mathrm{NP}} \subseteq \Sigma_3^{\mathrm{P}}.$$

Therefore, either such a reduction does not exist or the polynomial hierarchy collapses to the third level.

4.5.2 More Permanent Consequences of the Main Result

In this section, we try to give a sense in which our proof for the hardness of the permanent for real orthogonal matrices leads to new hardness results for many classes of matrices. The structure of this section is as follows: we will first restrict as much as possible the class of matrices for which the permanent is #P-hard; we will then observe that the permanent for any larger class of matrices must also be hard, which will show hardness for many natural classes of matrices.

We call a matrix $A$ an involution if $A = A^{-1}$.

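17 Suppose we have a PSD matrix $A = CC^\dagger$ where $C = \{c_{ij}\}$. Then the permanent of $A$ can be expressed as the following expected value over complex Gaussians:

$$\mathrm{per}(A) = \mathop{\mathbb{E}}_{z \sim \mathcal{G}_{\mathbb{C}}(0,1)^n}\left[\prod_{i=1}^{n} \Big|\sum_{j=1}^{n} c_{ij} z_j\Big|^2\right].$$

Theorem 77. Let $A$ be a real orthogonal involution with $\mathrm{per}(A) \geq 0$. The permanent of $A$ is #P-hard.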

Proof. Let $C : \{0,1\}^n \to \{0,1\}$ be a Boolean function for which we want to calculate $\Delta_C$. We will construct a new circuit $C' : \{0,1\}^{n+1} \to \{0,1\}$ such that for $x \in \{0,1\}^n$ and $b \in \{0,1\}$ we have $C'(x, b) = C(x) \vee b$. It is not hard to see then that $\Delta_{C'} = \Delta_C + 2^n$. Importantly, this implies that $\Delta_{C'} \geq 0$. Now let us leverage the reduction in Theorem 65 to build a real orthogonal matrix $B$ such that $\mathrm{per}(B) \propto \Delta_{C'}$. As in the proof of Theorem 75, let

$$A_B = \begin{pmatrix} 0 & B \\ B^T & 0 \end{pmatrix}.$$

Since $\Delta_{C'} \geq 0$, we have $\mathrm{per}(B) \geq 0$, which implies that $\mathrm{per}(B) = \sqrt{\mathrm{per}(A_B)}$. However, since $B$ is orthogonal, we have that $A_B^2 = I$, so $A_B$ is an involution. Furthermore, $A_B = A_B^T$, so $A_B$ is a real orthogonal matrix. Therefore, the permanent of real orthogonal involutions is #P-hard. □

We call a matrix $A$ special if $\det(A) = 1$. Furthermore, a matrix $A$ is symplectic if $A^T \Omega A = \Omega$ where $\Omega = \begin{pmatrix} 0 & I_n \\ -I_n & 0 \end{pmatrix}$. We strengthen Theorem 77 to provide the smallest class of matrices for which we know the permanent is #P-hard.

Theorem 78. Let $A$ be a real special orthogonal symplectic involution with $\mathrm{per}(A) \geq 0$. The permanent of $A$ is #P-hard.

Proof. Let $B$ be a real orthogonal involution, and let $I_n$ be the $n \times n$ identity matrix. Consider the matrix

$$I_2 \otimes B = \begin{pmatrix} B & 0 \\ 0 & B \end{pmatrix}.$$

Notice that

$$\det(I_2 \otimes B) = \det(B)^2 = \det(B^2) = \det(I_n) = 1,$$

where we use that $B$ is an involution (so $B^2 = I_n$) for the third equality. Therefore, $I_2 \otimes B$ is special. It is also easy to verify that $I_2 \otimes B$ is a real orthogonal symplectic involution.

Assuming $\mathrm{per}(B) \ge 0$, we have $\mathrm{per}(B) = \sqrt{\mathrm{per}(I_2 \otimes B)}$. Combining the above with Theorem 77, we get that the permanent of real special orthogonal symplectic involutions is #P-hard. $\square$
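The properties claimed for $I_2 \otimes B$ can be checked exactly on a small instance. The sketch below assumes a hypothetical $B$ (the Householder reflection for the vector $(1, 2, 2)$, which is a rational orthogonal involution) and uses brute-force determinant and permanent routines; none of these names come from the text.

```python
from fractions import Fraction
from itertools import permutations


def leibniz_sum(m, signed):
    """Determinant (signed=True) or permanent (signed=False) by brute force."""
    n = len(m)
    total = Fraction(0)
    for sigma in permutations(range(n)):
        inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                         if sigma[i] > sigma[j])
        term = Fraction(-1 if signed and inversions % 2 else 1)
        for i in range(n):
            term *= m[i][sigma[i]]
        total += term
    return total


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def two_copies(b):
    """I_2 (x) B = [[B, 0], [0, B]]."""
    zeros = [Fraction(0)] * len(b)
    return [list(r) + zeros for r in b] + [zeros + list(r) for r in b]


def omega(n):
    """Symplectic form [[0, I_n], [-I_n, 0]]."""
    return [[Fraction(int(j == i + n) - int(i == j + n)) for j in range(2 * n)]
            for i in range(2 * n)]


# Hypothetical example: Householder reflection I - 2 v v^T / (v^T v), v = (1, 2, 2).
v = [1, 2, 2]
B = [[Fraction(int(i == j)) - Fraction(2 * v[i] * v[j], 9) for j in range(3)]
     for i in range(3)]
M = two_copies(B)

is_special = leibniz_sum(M, signed=True) == 1
is_involution = matmul(M, M) == identity(6)
is_orthogonal = matmul(M, transpose(M)) == identity(6)
is_symplectic = matmul(matmul(transpose(M), omega(3)), M) == omega(3)
per_squares = leibniz_sum(M, signed=False) == leibniz_sum(B, signed=False) ** 2
```

Here $\det(B) = -1$, so $B$ itself is not special, while the doubled matrix is.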

Since the set of $n \times n$ real special orthogonal matrices forms a group $SO(n, \mathbb{R})$, we immediately get #P-hardness for all the matrix groups containing it.

Corollary 79. The permanent of an $n \times n$ matrix $A$ in any of the classical Lie groups over the complex numbers is #P-hard. That is, it is hard for the following matrix groups:

General linear: $A \in GL(n)$ iff $\det(A) \neq 0$

Special linear: $A \in SL(n)$ iff $\det(A) = 1$

Orthogonal: $A \in O(n)$ iff $AA^T = I_n$

Special orthogonal: $A \in SO(n)$ iff $AA^T = I_n$ and $\det(A) = 1$

Unitary: $A \in U(n)$ iff $AA^\dagger = I_n$

Special unitary: $A \in SU(n)$ iff $AA^\dagger = I_n$ and $\det(A) = 1$

Symplectic: $A \in Sp(2n)$ iff $A^T \Omega A = \Omega$ where $\Omega = \begin{pmatrix} 0 & I_n \\ -I_n & 0 \end{pmatrix}$

Proof. Since SO(n, R) is a subgroup of all the stated Lie groups besides the symplectic group Sp(2n), their permanents are #P-hard by Theorem 78. Theorem 78 handles the symplectic case separately.

4.6 Approximation

Much like in Aaronson's paper [2], our hardness reductions for exactly computing the permanent lead naturally to hardness of approximation results as well. Approximation results come in two flavors: additive and multiplicative. For example, Gurvits' algorithm [52] approximates the permanent of a matrix $A$ up to $\varepsilon\|A\|^n$ additive error. We will focus strictly on multiplicative approximation. That is, the result of the approximation should be between $\frac{1}{k}\mathrm{per}(A)$ and $k\,\mathrm{per}(A)$ for some $k$.

We give approximation results only for real orthogonal matrices since it is unclear how to even define multiplicative approximation in a finite field. All of our results follow from the fact that we actually prove GapP-hardness (since we compute the gap, $\Delta_C$, rather than just the number of satisfying assignments). None of the results use anything specific to permanents; they are all GapP folklore, but we state them as permanent results for clarity.

Theorem 80. Suppose $A$ is an oracle that approximates the permanent of a real orthogonal matrix to any multiplicative factor. In other words, $A$ is an oracle for the sign (zero, positive, or negative) of the permanent. Then $\mathsf{GapP} \subseteq \mathsf{FP}^A$.

Proof. We give an $\mathsf{FP}^A$ algorithm for computing $\Delta_C$ for a classical circuit $C$. Since this problem is GapP-hard, we get $\mathsf{GapP} \subseteq \mathsf{FP}^A$. By earlier results, we can construct a real orthogonal matrix with permanent proportional to $\Delta_C$. Then we can apply the oracle to compute the sign of the permanent, and hence the sign of $\Delta_C$. This is helpful, but we can do better. Recall that we can add or subtract two GapP functions (see Section 2.3), so for any integer $k$, we can construct a circuit $C_k$ such that $\Delta_{C_k} = \Delta_C - k$. Then we can apply $A$ to give us the sign of $\Delta_{C_k}$, or equivalently, compare $\Delta_C$ to $k$. In other words, we can use $A$ to binary search for the value of $\Delta_C$, which we know to be an integer in the range $-2^n$ to $2^n$. $\square$
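The binary search in this proof can be sketched as follows. The function `make_sign_oracle` is a hypothetical stand-in for the oracle $A$ applied to the matrix built from $C_k$: it simply returns the sign of $\Delta_C - k$ for a hidden value `delta`, so the circuit and matrix constructions are abstracted away.

```python
def make_sign_oracle(delta):
    """Hypothetical stand-in for A: the sign of delta - k, i.e. of the gap of C_k."""
    def oracle(k):
        value = delta - k
        return (value > 0) - (value < 0)
    return oracle


def recover_gap_by_sign(sign_oracle, n):
    """Binary search for an integer gap known to lie in [-2^n, 2^n]."""
    lo, hi = -2 ** n, 2 ** n
    queries = 0
    while lo < hi:
        mid = (lo + hi) // 2
        sign = sign_oracle(mid)
        queries += 1
        if sign == 0:          # delta == mid
            return mid, queries
        if sign > 0:           # delta > mid
            lo = mid + 1
        else:                  # delta < mid
            hi = mid - 1
    return lo, queries


n = 5
results = {delta: recover_gap_by_sign(make_sign_oracle(delta), n)
           for delta in range(-2 ** n, 2 ** n + 1)}
```

Each query halves the candidate range, so the number of oracle calls is linear in $n$.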

Recall that $\mathsf{C_=P}$ is the class of decision problems solvable by a non-deterministic polynomial-time machine which accepts if it has the same number of accepting paths as rejecting paths. By Toda's theorem, $\mathsf{PH} \subseteq \mathsf{BPP}^{\mathsf{C_=P}}$.

Theorem 81. Suppose $A$ is an oracle that approximates the absolute value of the permanent of a real orthogonal matrix to any multiplicative factor. That is, $A$ tells us whether the permanent is zero. Then $\mathsf{P}^{\mathsf{C_=P}} \subseteq \mathsf{P}^{A}$.

Proof. The problem of computing whether $\Delta_C = 0$ for a classical circuit $C$ is $\mathsf{C_=P}$-hard. But clearly we can construct a real orthogonal matrix from the circuit with permanent proportional to $\Delta_C$, and then apply $A$ to determine if the permanent is zero, and hence whether $\Delta_C$ is zero. Therefore $\mathsf{P}^{\mathsf{C_=P}} \subseteq \mathsf{P}^{A}$. $\square$

Finally, we show that even a very poor approximation to the absolute value of the permanent still allows us to calculate the exact value of the permanent via a boosting argument.

Theorem 82. Suppose $A$ is an oracle that approximates the absolute value of the permanent of an $n \times n$ real orthogonal matrix to within a $2^{n^{1-\epsilon}}$ factor for some $\epsilon > 0$. Then $\mathsf{GapP} \subseteq \mathsf{FP}^A$.

Proof. We give an $\mathsf{FP}^A$ algorithm for computing $\Delta_C$ of a classical circuit. Since this problem is GapP-hard, we get $\mathsf{GapP} \subseteq \mathsf{FP}^A$.

As in Theorem 80, we can construct a circuit $C_k$ such that $\Delta_{C_k} = \Delta_C - k$ for any integer $k$. By applying oracle $A$ to the real orthogonal matrix corresponding to $C_k$, we can get a multiplicative estimate for $|\Delta_C - k|$. Let us assume for the moment that $A$ gives a multiplicative approximation to within a factor of 2, and improve this to $2^{n^{1-\epsilon}}$ later.

Suppose we are given an interval $[a, b]$ guaranteed to contain $\Delta_C$. For instance, $\Delta_C$ is initially in $[-2^n, 2^n]$. Apply $A$ to find an estimate for $\Delta_{C_a} = \Delta_C - a$. Suppose the approximation we get is $x^*$. Then we have

$$a + \tfrac{1}{2}x^* \le \Delta_C \le a + 2x^*.$$

So $\Delta_C$ is in the interval $[a + \tfrac{1}{2}x^*, a + 2x^*] \cap [a, b]$. One can show that this interval is longest when $a + 2x^* = b$, where it has length $\tfrac{3}{4}(b - a)$. Since the interval length decreases by a constant factor each step, we only need $O(n)$ steps to shrink it from $[-2^n, 2^n]$ to length $< 1$, and determine $\Delta_C$.

Finally, suppose we are given an oracle which gives an approximation to within a multiplicative factor $2^{n^{1-\epsilon}}$. Theorem 13 in Section 2.3 lets us construct a circuit $C^m$ (not to be confused with $C_k$) such that $\Delta_{C^m} = (\Delta_C)^m$. The circuit is essentially $m$ copies of $C$, so we can only afford to do this for $m$ polynomial in the size of $C$, otherwise our algorithm is too slow.

The point of $C^m$ is that a factor $\beta$ approximation to $\Delta_{C^m}$ gives a factor $\beta^{1/m}$ approximation of $\Delta_C$ by taking $m$th roots. This is excellent for reducing a constant approximation factor, but when $\beta(n)$ grows with $n$, we must account for the fact that the size of $C^m$ grows with $n$ as well. In particular, the size of $C^m$ scales with $m$, and the dimension of the matrix in our construction scales linearly with $m$ as well.

So, for our algorithm to succeed, we need $\beta(nm)^{1/m} \le 2$ or

$$\beta(nm) \le 2^m$$

for $m$ a polynomial in $n$. Suppose we can afford $m = n^c$ copies of $C$. Then we succeed when $\beta(n^{1+c}) \le 2^{n^c}$, or

$$\beta(n) \le 2^{n^{c/(1+c)}}.$$

Within the scope of polynomial time algorithms, we can make $\frac{1}{1+c}$ less than any $\epsilon$, and thereby handle any $2^{n^{1-\epsilon}}$ approximation factor. $\square$
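The interval-shrinking step for a factor-2 oracle can be sketched as below. The function `make_abs_oracle` is a hypothetical stand-in for $A$: it knows a hidden integer `delta` and returns $|\Delta_C - k|$ distorted by a factor that cycles through $\tfrac12$, $2$, and $1$, so every answer is within the allowed factor of 2. Rounding the endpoints to integers is an extra assumption of this sketch, justified because $\Delta_C$ is an integer.

```python
from fractions import Fraction
from itertools import cycle
from math import ceil, floor


def make_abs_oracle(delta):
    """Hypothetical stand-in for A: estimates |delta - k| within a factor of 2."""
    factors = cycle([Fraction(1, 2), Fraction(2), Fraction(1)])

    def oracle(k):
        return next(factors) * abs(delta - k)
    return oracle


def recover_gap(abs_oracle, n):
    """Shrink [a, b] around the hidden gap until a single integer remains."""
    a, b = -2 ** n, 2 ** n
    steps = 0
    while a < b:
        x = abs_oracle(a)        # estimate x* of delta - a, which is >= 0
        steps += 1
        lo = a + ceil(x / 2)     # delta - a >= x*/2, and delta is an integer
        hi = a + floor(2 * x)    # delta - a <= 2 x*
        a, b = max(a, lo), min(b, hi)
    return a, steps


n = 6
outcomes = {delta: recover_gap(make_abs_oracle(delta), n)
            for delta in range(-2 ** n, 2 ** n + 1)}
```

Since each step leaves at most three quarters of the previous interval, the step count stays linear in $n$.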

The core ideas in both Theorem 80 and 82 were already noticed by Aaronson [2], but we give slightly better error bounds for the latter theorem.

4.7 Gadget Details

As discussed above in Section 4.3 and Section 4.4, our results on orthogonal matrices depend on a collection of gadgets. In the real orthogonal setting (Section 4.3), each gadget is a real orthogonal matrix with algebraic entries, and all entries have clear, compact expressions in terms of radicals. However, in Section 4.4, we wish to reuse the same gadgets over finite fields, and radicals are no longer the best representation.

Instead, we will show that our (real) gadget matrices have entries in $\mathbb{Q}(\alpha)$, the algebraic field extension of the rational numbers by $\alpha$, where $\alpha = \sqrt{2+\sqrt{2}} + \sqrt{3+\sqrt{6}} \approx 4.182173283$ is the largest real root of the irreducible polynomial

$$f(x) = x^{16} - 40x^{14} + 572x^{12} - 3736x^{10} + 11782x^{8} - 17816x^{6} + 11324x^{4} - 1832x^{2} + 1.$$

More specifically, we will write every entry as a polynomial in $\alpha$, with rational coefficients and degree less than 16.
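A quick high-precision check that $\alpha$ is a root of $f$ can be done with Python's `decimal` module. The helper names and the 60-digit working precision are choices made for this sketch only.

```python
from decimal import Decimal, getcontext

getcontext().prec = 60  # working precision in significant digits

# Coefficients of f keyed by exponent (f is even, so odd powers are absent).
F_COEFFS = {16: 1, 14: -40, 12: 572, 10: -3736, 8: 11782,
            6: -17816, 4: 11324, 2: -1832, 0: 1}


def f(x):
    return sum(Decimal(c) * x ** e for e, c in F_COEFFS.items())


def nested_root(a, b):
    """sqrt(a + sqrt(b)) as a Decimal."""
    return (Decimal(a) + Decimal(b).sqrt()).sqrt()


alpha = nested_root(2, 2) + nested_root(3, 6)
residual = abs(f(alpha))
```

A tiny residual is numerical evidence only; it does not replace an exact argument that $f$ is the minimal polynomial of $\alpha$.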

This is a cumbersome representation for hand calculation, but there are some advantages. First, it eliminates any ambiguity about, for instance, which square root of 2 to use in a finite field. Second, we can check the various conditions our gadgets need to satisfy in the field $\mathbb{Q}(\alpha)$, and then argue that the verification generalizes to $\mathbb{F}_p(\alpha)$, with a few caveats. So, without further ado, we present polynomials for a set of reals which generate all the entries of our gadgets.

4.7.1 Gadget entries

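Since $\pm\frac{1}{\sqrt{2}}$ are the only entries in $D$, our decoder gadget, we show how to express those entries as polynomials in $\alpha$.

$$\frac{1}{\sqrt{2}} = \frac{1}{11776}\left(\alpha^{14} - 53\alpha^{12} + 1077\alpha^{10} - 10561\alpha^{8} + 51555\alpha^{6} - 115791\alpha^{4} + 95207\alpha^{2} - 8379\right)$$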

For our encoder gadget E, we also must also express as an element in Q(a). Note that can be obtained as - 1

1 _ 14 - 53a12 + 1077a10 - 1056108 + 5155506 - 115791a4 + 9520702 - 8379 11776

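Showing that the entries of the $R_y(\pi/4)$ gate are in $\mathbb{Q}(\alpha)$ requires the following:

$$\sqrt{2+\sqrt{2}} = \frac{1}{5888}\left(-123\alpha^{15} + 4932\alpha^{13} - 70785\alpha^{11} + 464494\alpha^{9} - 1470141\alpha^{7} + 2209176\alpha^{5} - 1357287\alpha^{3} + 193302\alpha\right)$$

$$\sqrt{2-\sqrt{2}} = \frac{1}{5888}\left(216\alpha^{15} - 8711\alpha^{13} + 126234\alpha^{11} - 841629\alpha^{9} + 2733428\alpha^{7} - 4270353\alpha^{5} + 2799098\alpha^{3} - 466411\alpha\right)$$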

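Finally, we have the V gate. We already have the scalar factor in front and the simpler entries inside, so we just need $\sqrt{3 \pm \sqrt{6}}$ and $\sqrt{6 \pm 2\sqrt{6}}$. These are related by a factor of $\sqrt{2}$, so it suffices to give $\sqrt{3 \pm \sqrt{6}}$.

$$\sqrt{3+\sqrt{6}} = \frac{1}{5888}\left(123\alpha^{15} - 4932\alpha^{13} + 70785\alpha^{11} - 464494\alpha^{9} + 1470141\alpha^{7} - 2209176\alpha^{5} + 1357287\alpha^{3} - 187414\alpha\right)$$

$$\sqrt{3-\sqrt{6}} = \frac{1}{256}\left(15\alpha^{15} - 598\alpha^{13} + 8505\alpha^{11} - 55084\alpha^{9} + 171665\alpha^{7} - 256518\alpha^{5} + 161671\alpha^{3} - 25624\alpha\right)$$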

The numbers above, combined with j and , generate all the entries of our real orthogonal gadgets. Note that the denominators in front of the polynomials above

(e.g., 11776, 5888, 256, 3, etc.) all divide $35328 = 2^9 \cdot 3 \cdot 23$. In other words, this representation is a bad choice for fields of characteristic 2, 3, or 23 because, in those cases, division by 35328 is division by 0. Aside from this restriction, the representation is well-defined for any field containing some root $\alpha$ of the polynomial $f$.
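The arithmetic behind the excluded characteristics is easy to confirm. The sketch below factors 35328 by trial division and checks that each denominator quoted in the text divides it; the function and variable names are illustrative.

```python
def prime_factorization(n):
    """Trial-division factorization, returned as {prime: exponent}."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


DENOMINATORS = [11776, 5888, 256, 3]   # the denominators quoted in the text
COMMON = 35328

all_divide = all(COMMON % d == 0 for d in DENOMINATORS)
bad_characteristics = sorted(prime_factorization(COMMON))
```

The primes in `bad_characteristics` are exactly the characteristics in which some denominator vanishes.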

We should not be surprised that the representation fails for fields of characteristic 2 or 3 because our matrices contain, for instance, the entries $\frac{1}{\sqrt{3}}$ and $\frac{1}{\sqrt{2}}$. We also know the permanent of an orthogonal matrix is easy to compute in fields of characteristic 2 or 3, so it is actually no surprise to find this obstacle to our hardness proof.

On the other hand, we can find no explanation for the requirement $p \neq 23$; it appears to be a quirk of the algebraic number $\alpha$. In fact, a different choice fails for different primes. Consider $\beta \approx 5.596386846$, the largest real root of

$$x^{16} - 56x^{14} - 32x^{13} + 1084x^{12} + 960x^{11} - 9224x^{10} - 8928x^{9} + 37702x^{8} + 33920x^{7} - 73736x^{6} - 53216x^{5} + 63932x^{4} + 23488x^{3} - 21560x^{2} + 3808x - 191.$$

This section is long enough without doing all the same steps for $\beta$, so let us claim without proof that $\mathbb{Q}(\beta) = \mathbb{Q}(\alpha)$. Furthermore, when we represent the matrix entries as polynomials in $\beta$ (we omit the details), the denominators prohibit the use of this representation for fields of characteristic 2, 3, 191, and 3313, but not 23. Hence, for all primes $p$ other than 2 or 3, there is some representation that works for that prime.

4.7.2 Galois Extension

We need $\mathbb{Q}(\alpha)$ to be a Galois extension to apply Chebotarev's theorem, which we use to prove Theorem 74. Another helpful consequence is that if $\alpha$ is in some field, then all the roots of $f$ are also in the field since they can be expressed as polynomials in $\alpha$. The most direct way to prove $\mathbb{Q}(\alpha)$ is a Galois extension is to write all 16 roots of $f$ in terms of $\alpha$. Since $f$ is an even polynomial, half of the roots are just the negatives of the other half, so we restrict our attention to the 8 positive roots.

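$$\begin{array}{ll}
\text{Root} & \text{Polynomial} \\ \hline
0.0234 & \frac{1}{5888}\left(-129\alpha^{15} + 5043\alpha^{13} - 69381\alpha^{11} + 425303\alpha^{9} - 1214867\alpha^{7} + 1629561\alpha^{5} - 919335\alpha^{3} + 122941\alpha\right) \\
0.4866 & \frac{1}{2944}\left(123\alpha^{15} - 4932\alpha^{13} + 70785\alpha^{11} - 464494\alpha^{9} + 1470141\alpha^{7} - 2209176\alpha^{5} + 1357287\alpha^{3} - 190358\alpha\right) \\
1.1057 & \frac{1}{2944}\left(-234\alpha^{15} + 9343\alpha^{13} - 133200\alpha^{11} + 865713\alpha^{9} - 2709218\alpha^{7} + 4054545\alpha^{5} - 2537860\alpha^{3} + 391327\alpha\right) \\
1.5073 & \frac{1}{5888}\left(561\alpha^{15} - 22465\alpha^{13} + 321849\alpha^{11} - 2108561\alpha^{9} + 6681723\alpha^{7} - 10170267\alpha^{5} + 6517531\alpha^{3} - 1055763\alpha\right) \\
1.5690 & \frac{1}{5888}\left(-93\alpha^{15} + 3779\alpha^{13} - 55449\alpha^{11} + 377135\alpha^{9} - 1263287\alpha^{7} + 2061177\alpha^{5} - 1441811\alpha^{3} + 278997\alpha\right) \\
2.5897 & \frac{1}{2944}\left(111\alpha^{15} - 4411\alpha^{13} + 62415\alpha^{11} - 401219\alpha^{9} + 1239077\alpha^{7} - 1845369\alpha^{5} + 1180573\alpha^{3} - 198025\alpha\right) \\
3.0997 & \frac{1}{5888}\left(339\alpha^{15} - 13643\alpha^{13} + 197019\alpha^{11} - 1306123\alpha^{9} + 4203569\alpha^{7} - 6479529\alpha^{5} + 4156385\alpha^{3} - 653825\alpha\right) \\
4.1821 & \alpha
\end{array}$$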
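The table can be checked numerically: evaluating each polynomial at $\alpha$ should give a positive root of $f$ whose first four decimals match the listed value. The sketch below does this with 80-digit decimals. The coefficient lists and the denominators 5888 and 2944 are transcribed from the table as read here and should be treated as assumptions of the sketch; comparing by truncation rather than rounding is also an assumption about how the table was produced.

```python
from decimal import Decimal, getcontext, ROUND_DOWN

getcontext().prec = 80

F_COEFFS = {16: 1, 14: -40, 12: 572, 10: -3736, 8: 11782,
            6: -17816, 4: 11324, 2: -1832, 0: 1}


def f(x):
    return sum(Decimal(c) * x ** e for e, c in F_COEFFS.items())


def nested_root(a, b):
    return (Decimal(a) + Decimal(b).sqrt()).sqrt()


alpha = nested_root(2, 2) + nested_root(3, 6)

# (listed root, denominator, coefficients of alpha^15, alpha^13, ..., alpha^1)
TABLE = [
    ("0.0234", 5888, [-129, 5043, -69381, 425303, -1214867, 1629561, -919335, 122941]),
    ("0.4866", 2944, [123, -4932, 70785, -464494, 1470141, -2209176, 1357287, -190358]),
    ("1.1057", 2944, [-234, 9343, -133200, 865713, -2709218, 4054545, -2537860, 391327]),
    ("1.5073", 5888, [561, -22465, 321849, -2108561, 6681723, -10170267, 6517531, -1055763]),
    ("1.5690", 5888, [-93, 3779, -55449, 377135, -1263287, 2061177, -1441811, 278997]),
    ("2.5897", 2944, [111, -4411, 62415, -401219, 1239077, -1845369, 1180573, -198025]),
    ("3.0997", 5888, [339, -13643, 197019, -1306123, 4203569, -6479529, 4156385, -653825]),
]


def evaluate(denominator, coeffs):
    """Evaluate (1/denominator) * sum_i coeffs[i] * alpha^(15 - 2i)."""
    powers = range(15, 0, -2)
    return sum(Decimal(c) * alpha ** e for c, e in zip(coeffs, powers)) / Decimal(denominator)


values = [evaluate(den, coeffs) for _, den, coeffs in TABLE]
residuals = [abs(f(v)) for v in values]
truncated = [v.quantize(Decimal("0.0001"), rounding=ROUND_DOWN) for v in values]
```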

Chapter 5

A Complete Classification of Clifford Operations over Qubits

A common thread throughout quantum computing is the manner in which a few elementary gates often suffice for universal computation. This "pervasiveness of universality" is noted in a classification theorem for classical reversible gates by Aaronson, Grier, and Schaeffer [6].¹

Of course, the ultimate goal would be a complete classification of quantum gate sets based on the functions over qubits they generate. Unfortunately, not even a full classification of the subgroups of a three-qubit system is known.² Since each class of gates is a subgroup, this suggests that a complete classification remains out of reach.³ This might be surprising given how well we understand random gate sets, and even those that contain particular gates such as CNOT [16, 76]. However, a full classification begets a complete understanding of all possible behaviors, despite their strangeness or rarity. Nevertheless, there has been some recent and encouraging progress on some classification problems, in particular, on classifying Hamiltonians (which can be applied for any period of time), rather than discrete gate sets. For instance, Bouland, Mančinska, and Zhang [24] classified all 2-qubit commuting Hamiltonians, while Childs et al. [33] characterized all 2-qubit Hamiltonians when restricted to circuits over two qubits. Additionally, Aaronson and Bouland [23] completed a classification for linear optics of 2-mode beamsplitters, which relied heavily on the characterization [51] of the finite subgroups of SU(3), underscoring the difficulty of quantum gate classification.

¹ Much of [6] as well as some preliminary results for this chapter were contained in the master's thesis of the author.

² The difficulty in classifying the subgroups of SU(N) arises not from the infinite classes but from the finite ones. In fact, even the finite subgroups of SU(5) remain unclassified. This motivates our focus on finite, discrete classes such as the Clifford group.

³ Although it is worth pointing out that our ancilla model (see Section 5.5) could potentially make the problem easier.

This work contributes a new classification of quantum gate sets by giving a complete classification of the Clifford gates, the set of unitaries normalizing the Pauli group. To provide some context, the Clifford gates are generated by the CNOT gate, the Hadamard gate, and the $\frac{\pi}{4}$-phase gate. It is not hard to see that the Clifford operations on $n$ qubits are a discrete, finite set, so it has always been widely assumed that they do not suffice for universal quantum computation. Indeed, Gottesman and Knill [47] showed that they could be efficiently simulated with a classical computer, despite the fact that Clifford circuits can be highly entangling. Aaronson and Gottesman [5] improved this result to show that Clifford circuits can be simulated in $\oplus\mathsf{L}$, for which the complete problem is the solution of a linear system over $\mathbb{F}_2$.⁴

⁴ In that work, Aaronson and Gottesman proposed the tableau representation of a Clifford gate, which we reinterpret to serve as one of the principal components of our classification theorem. See Section 5.4 for those details.

Clifford circuits are somewhat remarkable in that they may in fact be integral to our eventual development of a general-purpose quantum computer. For instance, the stabilizer formalism, which tracks state evolution through conjugated Pauli elements, underlies many of the important quantum error correcting codes [46]. In fact, the Clifford operations are exactly those operations which can be easily computed transversally in many fault-tolerant schemes of quantum computing (e.g., those using the Shor code [77] or the [[7,1,3]] code [80]).

Our model is motivated in part by the use of Clifford circuits as subroutines of a general quantum computation, much like the transversal gates in a fault-tolerant scheme. We regard the creation of complicated ancilla states as an inherently difficult task, and therefore require that all ancillary qubits used during the computation be returned to their initial state at the end of the computation. This restriction eliminates schemes in which much of the difficulty of the computation is offloaded to the creation of "magic states" which are subsequently consumed by the computation [25, 71]. Unlike these schemes in which the ancillas boost weak gate sets to computational universality, Clifford operations cannot be boosted in our model to generate anything outside the Clifford group.⁵

We also regard the classification of Clifford gates as an important step towards a full classification of quantum gate classes. Although the complete inclusion lattice for general quantum gates will be significantly more complicated than the one we present for the Clifford gates, the classes described here provide a testbed for the techniques used for general quantum gates. This is due to the fact that our lattice for Clifford gates must appear as a sublattice in the complete quantum gate classification. This situation contrasts with the reversible gate classification of Aaronson et al., in which much of the complexity of the lattice is due to the fact that only $|0\rangle$ and $|1\rangle$ ancillas were allowed. Because we allow for arbitrary ancillas in our model, our classification does not suffer from the same issue.

5.1 Results Overview

We wish to determine the set of Clifford operations that can be realized as circuits consisting of gates from a given gate set. Let us briefly explain the circuit building operations we allow (full details in Section 5.5). First, we can combine gates in series or parallel, i.e., their composition or tensor product. We also assume for simplicity that swapping qubits is allowed at any point in the circuit. Finally, each circuit has access to arbitrary quantum ancillas provided that they are returned to their initial states by the end of the computation. Under this model, our main result is the classification of Clifford gates below:

Theorem 83. Any set of Clifford gates generates one of 57 distinct classes of Clifford operations. There are 30 classes (depicted in Figure 5-1) generated by single-qubit gates. The remaining 27 nondegenerate classes are shown in Figure 5-2. Notation for the generators of the classes depicted in those diagrams is given in Section 5.3.

⁵ This was first noticed by Anderson [14].

We list some consequences and highlights of the classification below:

(1) Invariants. Every class can be defined by a collection of invariants, i.e., properties of the Clifford gates which are preserved under our circuit building operations. Formally, we define each invariant based on the tableau representation of the Clifford gate (see sections 5.4 and 5.6). We now describe the broad themes behind the main invariants of the classification. First, there is a three-fold symmetry in the classification, corresponding to the symmetry of the X, Y, and Z elements of the Pauli group. For example, the CNOT gate permutes the X-basis, and permutes the Z-basis, but not the Y-basis. Naturally, there are gates corresponding to CNOT for other choices of bases. In fact, there is a nontrivial class of gates that act as permutations in all three bases (up to sign).

When two classes cannot be distinguished by their high-level basis behavior, we need more refined invariants to separate them. Sometimes the set of single-qubit operations generated by a set of gates serves to easily distinguish two classes. However, consider the $\frac{\pi}{4}$-phase gate and Pauli operations which are common to several classes. Even when combined with these single-qubit gates, CNOT can correlate bits in the Z-basis, whereas CSIGN cannot. In fact, there is a third class which can correlate bits in the Z-basis, but only by orthogonal transformations, separating it from both CSIGN and CNOT.

(2) Finite Generation. Every class can be generated by a single gate on at most four qubits. Also, given a set of gates generating some class, there always exist three gates from that set that generate the same class. Moreover, the classification implies that the canonical set of Clifford generators (CNOT, Hadamard, and phase) is not a minimal set of generators in our model. It turns out that with the aid of ancilla qubits, CNOT and Hadamard generate a phase gate. This is well-known [9, 57], but comes as a simple consequence of our classification theorem.

(3) Ancillas. In general, giving a Clifford gate access to ancillary qubits often increases the set of functions it can compute. A priori, one might suspect that extracting all functionality from a large entangling Clifford gate would require large highly-entangled ancilla states. Nevertheless, our classification shows that only a constant number of one- and two-qubit ancillary states are ever needed. In fact, an even stronger result is true. Namely, our classification holds even when we allow the ancillas to change in an input-independent manner,⁶ as would be natural for a Clifford subroutine in a general quantum computation. See Section 5.5 for further discussion.

(4) Algorithms. Our classification implies a linear time algorithm which, given the tableau of a gate G, identifies which class G belongs to. In fact, to witness that G generates some class in the classification, one only needs to view a constant number of bits of the tableau. These details are discussed in sections 5.9 and 5.10.

(5) Enumeration. For each class $C$ and for all $n$, we give explicit formulas for the number of gates in $C$ on $n$ qubits. The enumeration usually leads to efficient canonical forms for circuits in the various classes. In fact, every class is exponentially smaller than any class strictly containing it. See Section 5.11 for details.

5.1.1 Proof Outline

We can divide the proof into a few major steps. First, we introduce the notion of a tableau, a binary matrix representation of a Clifford circuit. We then present all the classes in the classification and designate them by their generators. An examination of the tableaux of the gates in these classes reveals candidate invariants. We then prove that these candidate invariants are indeed invariant under the circuit building operations in our model. That is to say, if we have two gates whose tableaux satisfy the invariant, then the tableau of their composition satisfies the invariant, and so on for all the other ways to build circuits from gates: tensor products, ancillas, swapping.

⁶ Aaronson et al. called this the "Loose Ancilla Rule," and it does affect their classification of classical reversible circuits.

Figure 5-1: The inclusion lattice of degenerate Clifford gate classes

Figure 5-2: The inclusion lattice of non-degenerate Clifford gate classes. Red, green, blue denote X-, Y-, and Z-preserving, respectively.

At this point, we will have shown that each class has a corresponding invariant, which implies that each class in our lattice is distinct. That is, for any two classes, there is a generator of one that fails to satisfy the invariant of the other. Next, we will show that this correspondence is complete. The generators of a class can construct any gate which satisfies the invariant for that class.

The challenge remains to show that our list of classes is exhaustive. Suppose we are given some gate set $G$, and we wish to identify the class it generates. Clearly, the class generated by $G$ is contained in some class in the lattice, and let $C$ be the smallest such class. The hope is to show that $G$ generates all of $C$. To do this, we use the minimality of $C$. That is, for each class $S \subset C$, there must be some gate $g \in G$ which is not in $S$, otherwise $S$ would be a smaller class containing $G$. We now wish to use $g$ to generate a simpler gate, also violating some invariant of $S$. This is accomplished via the "universal construction," which is a particular circuit built from $g$ and SWAP gates. Finally, we combine the simpler gates to construct the canonical generators for the class $C$ itself.

5.2 Stabilizer Formalism

The one-qubit unitary operations

$$X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad Y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \qquad Z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$$

are known as Pauli matrices. The Pauli matrices are all involutions ($X^2 = Y^2 = Z^2 = I$), and have the following relations between them:

$$XY = iZ \qquad YZ = iX \qquad ZX = iY$$

$$YX = -iZ \qquad ZY = -iX \qquad XZ = -iY$$

It follows that the Pauli matrices generate a discrete group (under multiplication), called the Pauli group $\mathcal{P}$, which consists of sixteen elements: $\{I, X, Y, Z\}$ with phases $\pm 1$ or $\pm i$. The Pauli group on $n$ qubits, $\mathcal{P}_n$, is the set of all $n$-qubit tensor products of elements from $\mathcal{P}$. We define a Pauli string as any element of $\mathcal{P}_n$ with positive phase (i.e., a tensor product of the matrices $I, X, Y, Z$). We frequently omit the tensor product symbol from Pauli strings and write, e.g., $P_1 \cdots P_n$ where we mean $P_1 \otimes \cdots \otimes P_n$.

The Clifford group on $n$ qubits, $\mathcal{C}_n$, is the set of unitary operations which normalize $\mathcal{P}_n$ in the group-theoretic sense. That is, $U \in \mathcal{C}_n$ if $UpU^\dagger \in \mathcal{P}_n$ for all $p \in \mathcal{P}_n$. We leave it as a simple exercise to check that $\mathcal{C}_n$ is indeed a group.
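The stated relations and the size of the single-qubit Pauli group can be confirmed by direct computation. The sketch below represents the matrices as nested Python lists of complex numbers and closes $\{X, Y, Z\}$ under multiplication; the helper names (`matmul`, `scale`, `freeze`, `generated_group`) are illustrative.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def scale(c, m):
    return [[c * entry for entry in row] for row in m]


I = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]

relations_hold = all([
    matmul(X, X) == I, matmul(Y, Y) == I, matmul(Z, Z) == I,
    matmul(X, Y) == scale(1j, Z), matmul(Y, Z) == scale(1j, X),
    matmul(Z, X) == scale(1j, Y), matmul(Y, X) == scale(-1j, Z),
    matmul(Z, Y) == scale(-1j, X), matmul(X, Z) == scale(-1j, Y),
])


def freeze(m):
    """Hashable canonical form for a matrix with Gaussian-integer entries."""
    return tuple(tuple(complex(round(complex(z).real), round(complex(z).imag))
                       for z in row) for row in m)


def generated_group(generators):
    """Close the generators under right multiplication, starting from I."""
    elements = {freeze(I)}
    frontier = [I]
    while frontier:
        g = frontier.pop()
        for h in generators:
            product = matmul(g, h)
            key = freeze(product)
            if key not in elements:
                elements.add(key)
                frontier.append(product)
    return elements


pauli_group = generated_group([X, Y, Z])
```

All entries are in $\{0, \pm 1, \pm i\}$, so the floating-point comparisons here are exact.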

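A Clifford gate is any unitary in $\bigcup_{n \ge 1} \mathcal{C}_n$. A Clifford circuit is a quantum circuit of Clifford gates implementing a unitary transformation on some set of qubits, designated the input/output qubits, while preserving the state of the remaining ancilla qubits. We say that a state $|\psi\rangle$ is stabilized by an operation $U$ iff $U|\psi\rangle = |\psi\rangle$. In other words, $|\psi\rangle$ is in the $+1$ eigenspace of $U$. The Pauli elements and their corresponding stabilized states are below:

$$\begin{array}{ll}
X : |+\rangle = \frac{|0\rangle + |1\rangle}{\sqrt{2}} & -X : |-\rangle = \frac{|0\rangle - |1\rangle}{\sqrt{2}} \\
Y : |i\rangle = \frac{|0\rangle + i|1\rangle}{\sqrt{2}} & -Y : |-i\rangle = \frac{|0\rangle - i|1\rangle}{\sqrt{2}} \\
Z : |0\rangle & -Z : |1\rangle
\end{array}$$

We call the vectors stabilized by non-identity Pauli elements $P$ and $-P$ the $P$-basis. A stabilizer state is any state $U|0 \ldots 0\rangle$ where $U$ is a Clifford gate. For example, $|0\rangle$, $|1\rangle$, $|+\rangle$, $|-\rangle$, $|i\rangle$, and $|-i\rangle$ are the 6 stabilizer states on one qubit. Multi-qubit stabilizer states include $\frac{1}{\sqrt{2}}(|0^n\rangle + |1^n\rangle)$ and $\frac{1}{\sqrt{2^n}}\sum_{x \in \{0,1\}^n} |x\rangle$. In general, stabilizer states are of the form (unnormalized) $\sum_{x \in A} (-1)^{q(x)} i^{\ell(x)} |x\rangle$ where $A$ is an affine space over $\mathbb{F}_2$, $q(x)$ is a quadratic form, and $\ell(x)$ is a linear form [37, 93].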

5.3 Gates

Let us introduce some common Clifford gates used throughout the classification.

5.3.1 Single-qubit Gates

We start with the single-qubit Clifford gates, which by definition permute (up to phases) the X, Y, and Z bases. In fact, the single-qubit Clifford gates correspond to symmetries of the cube (see Figure 5-3) or octahedron. We group the gates by the type of rotation to emphasize this geometric intuition.

Face rotations: The Pauli matrices X, Y, and Z (as gates) correspond to 180° rotations about the X, Y, and Z axes respectively. Similarly, we define $R_X$, $R_Y$, and $R_Z$ to be 90° rotations (in the counterclockwise direction) about their respective axes. Formally,

$$R_X = \frac{I - iX}{\sqrt{2}}, \qquad R_Y = \frac{I - iY}{\sqrt{2}}, \qquad R_Z = \frac{I - iZ}{\sqrt{2}},$$

although in the case of $R_Z$ (also known as the phase gate and often denoted by S or P), a different choice of phase is more conventional. The clockwise rotations are then $R_X^\dagger$, $R_Y^\dagger$, and $R_Z^\dagger$.

Edge rotations: Another symmetry of the cube is to rotate about one of the edges by 180°. Opposing edges produce the same rotation, so we have six gates: $\theta_{X+Y}$, $\theta_{X-Y}$, $\theta_{X+Z}$, $\theta_{X-Z}$, $\theta_{Y+Z}$, $\theta_{Y-Z}$. We define

$$\theta_{P+Q} = \frac{P + Q}{\sqrt{2}}, \qquad \theta_{P-Q} = \frac{P - Q}{\sqrt{2}}$$

for all Pauli matrices $P \neq Q$. Note that $\theta_{X+Z}$ is the well-known Hadamard gate, usually denoted by H.

Vertex rotations: The final symmetry is a 120° counterclockwise rotation around one of the diagonals passing through opposite vertices of the cube. The cube has eight vertices, $(\pm 1, \pm 1, \pm 1)$, and we denote the corresponding single-qubit gates $\Gamma_{+++}, \Gamma_{++-}, \ldots, \Gamma_{---}$. Algebraically, we define

$$\Gamma_{+++} = \frac{I - iX - iY - iZ}{2}, \qquad \Gamma_{++-} = \frac{I - iX - iY + iZ}{2}, \qquad \ldots, \qquad \Gamma_{---} = \frac{I + iX + iY + iZ}{2}.$$

We also define $\Gamma$ (without subscripts) to be the first gate, $\Gamma_{+++}$, since it is the most convenient; conjugation by $\Gamma$ maps X to Y, Y to Z, and Z to X.

[Table 5.1: Single-qubit gates. Columns: Gate, Tableau, Unitary Matrix. Rows include X, Z, $R_X$, $R_Z = S$, $\theta_{X+Z} = H$, $\theta_{X-Z}$, and $\Gamma_{+++}$.]

Figure 5-3: Single-qubit gates as symmetries of the cube

5.3.2 Multi-qubit Gates

We now introduce the multi-qubit Clifford gates relevant to the classification. The SWAP gate, for instance, simply exchanges two qubits. A more interesting example is the controlled-NOT or CNOT gate, and the generalized CNOT gates. A generalized CNOT gate is a two-qubit Clifford gate of the form

$$C(P, Q) := \frac{I \otimes I + P \otimes I + I \otimes Q - P \otimes Q}{2}$$

where $P$ and $Q$ are Pauli matrices. If the first qubit is in the $+1$ eigenspace of $P$ then $C(P, Q)$ does nothing, but if it is in the $-1$ eigenspace of $P$ then $C(P, Q)$ applies $Q$ to the second qubit. Of course, the definition is completely symmetric, so you can also view it as applying $P$ to the first qubit when the second qubit is in the $-1$ eigenspace of $Q$. Observe that $C(Z, X)$ is actually the CNOT gate; it applies a NOT gate to the second qubit when the first qubit is $|1\rangle$ and does nothing when the first qubit is $|0\rangle$. Figure 5-4 shows this equivalence, and illustrates our circuit diagram notation for generalized CNOT gates. Also note that $C(X, Z)$ is a CNOT, but with the opposite orientation (i.e., the second bit controls the first). The rest of the heterogeneous generalized CNOT gates (i.e., $C(P, Q)$ where $P \neq Q$) are the natural equivalents of CNOT in different bases.
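The definition of $C(P, Q)$ is easy to instantiate. The following sketch builds the $4 \times 4$ matrices from the formula, taking the first tensor factor to be the first (most significant) qubit, which is an ordering convention assumed here. It compares $C(Z, X)$, $C(Z, Z)$, and $C(X, Z)$ against the usual CNOT, CSIGN, and reversed CNOT matrices, and checks that every generalized CNOT is an involution.

```python
I = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]


def kron(a, b):
    """Kronecker product of two 2x2 matrices (first factor = first qubit)."""
    return [[a[i][j] * b[k][l] for j in range(2) for l in range(2)]
            for i in range(2) for k in range(2)]


def generalized_cnot(p, q):
    """C(P, Q) = (I(x)I + P(x)I + I(x)Q - P(x)Q) / 2."""
    terms = [(1, kron(I, I)), (1, kron(p, I)), (1, kron(I, q)), (-1, kron(p, q))]
    return [[sum(s * t[r][c] for s, t in terms) / 2 for c in range(4)]
            for r in range(4)]


def matmul4(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


IDENTITY4 = [[int(i == j) for j in range(4)] for i in range(4)]
CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
CSIGN = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, -1]]
CNOT_REVERSED = [[1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0]]

paulis = {"X": X, "Y": Y, "Z": Z}
all_involutions = all(
    matmul4(generalized_cnot(p, q), generalized_cnot(p, q)) == IDENTITY4
    for p in paulis.values() for q in paulis.values())
```

All entries are multiples of $\tfrac12$ or $\tfrac{i}{2}$, so the equality tests are exact in floating point.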

Figure 5-4: CNOT expressed as a C(Z, X) gate

Similarly, $C(Z, Z)$ is better known as the controlled-sign gate (denoted CSIGN), which flips the sign on input $|11\rangle$, but does nothing otherwise. The homogeneous generalized CNOT gates (i.e., $C(P, P)$ for some $P$) are quite different from heterogeneous CNOT gates. For instance, when one CNOT targets the control qubit of another CNOT then it matters which gate is applied first. On the other hand, two CSIGN gates will always commute, whether or not they have a qubit in common.

It turns out that every two-qubit Clifford gate is equivalent (up to a SWAP) to some combination of one-qubit Clifford gates and at most one generalized CNOT gate. Although most classes of Clifford gates can be specified by such two-qubit generators, there are five classes which require a larger generator such as the $T_4$ gate [6, 48].

For all $k \ge 1$, let $T_{2k}$ be a $2k$-qubit gate such that for all $x = (x_1, \ldots, x_{2k}) \in \{0,1\}^{2k}$,

$$T_{2k}|x\rangle = |x_1 \oplus b_x, x_2 \oplus b_x, \ldots, x_{2k} \oplus b_x\rangle$$

where $b_x = x_1 \oplus x_2 \oplus \cdots \oplus x_{2k}$. Intuitively, $T_{2k}$ outputs the complement of the input when the parity of the input is odd and does nothing when the parity of the input is even. In particular, $T_2$ is the lowly SWAP gate. Notice that this is an orthogonal linear function of the input bits, which hints at an invariant which may arise in the classification.
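The action of $T_{2k}$ on computational basis states is a classical map on bit strings, so its stated properties can be checked exhaustively for small $k$. The sketch below (with illustrative helper names) verifies that $T_2$ is SWAP and that $T_4$ is an involution, is linear over $\mathbb{F}_2$, and has an orthogonal matrix.

```python
from itertools import product


def t_gate(bits):
    """Apply T_{2k} to a tuple of 2k bits: complement the input iff its parity is odd."""
    assert len(bits) > 0 and len(bits) % 2 == 0
    b = sum(bits) % 2
    return tuple(x ^ b for x in bits)


def linear_matrix(width):
    """Matrix over F_2 whose j-th column is the image of the j-th unit vector."""
    columns = [t_gate(tuple(int(i == j) for i in range(width))) for j in range(width)]
    return [[columns[j][i] for j in range(width)] for i in range(width)]


def is_orthogonal_mod2(m):
    n = len(m)
    return all(sum(m[i][k] * m[j][k] for k in range(n)) % 2 == int(i == j)
               for i in range(n) for j in range(n))


def xor(u, v):
    return tuple(a ^ b for a, b in zip(u, v))


inputs4 = list(product((0, 1), repeat=4))
t2_is_swap = all(t_gate((a, b)) == (b, a) for a, b in product((0, 1), repeat=2))
t4_is_involution = all(t_gate(t_gate(x)) == x for x in inputs4)
t4_is_linear = all(t_gate(xor(x, y)) == xor(t_gate(x), t_gate(y))
                   for x in inputs4 for y in inputs4)
t4_is_orthogonal = is_orthogonal_mod2(linear_matrix(4))
```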

5.4 Tableaux

Observe that the matrices $I, X, Y, Z$ are linearly independent, and therefore form a basis for the $2 \times 2$ complex matrices. It follows that $\mathcal{P}_n$ spans all $2^n \times 2^n$ complex matrices. Hence, any unitary operation on $n$ qubits can be characterized by its action on $\mathcal{P}_n$. In particular, any gate is characterized by how it acts on

$$p_1 = XI \cdots I, \quad p_2 = ZI \cdots I, \quad \ldots, \quad p_{2n-1} = I \cdots IX, \quad p_{2n} = I \cdots IZ.$$

We call this list the Pauli basis on $n$ qubits, since one can write any element of $\mathcal{P}_n$ as a product of basis elements times a phase ($\pm 1$ or $\pm i$).

Now suppose we are given a Clifford gate, U E C,. By definition, Clifford gates map each Pauli basis element to something in P, which can be written as a product

134 Gate Tableau UnitaryMatrix 1 0 0 0 (1 1 1 0 1 1 0 1 1 -11 C(XX) 0 0 1 0 -1 i1) 1 0 0 1 21 1 1 1 0 1 1 -i -i -i 1 0 1 1 1 1 -1 -~i C(YY) 1 1 1 0 -1 1-z 1 1 0 1 i i 1 1 0 0 1 0 0 0 0 C(Z, Z) = CSIGN 2 z 1 0 10 1 0 0 0 -1 0 1 1 0 0 0 10 1 Ku 0 -i 1 01 C(YX) 11 0 1 0 1 i -i 1 1j 010 1 10 i 1 011/ 00 1 0 0 0 0 1 10 1 0 0 0 0 1 0 0 C(ZY) 0 0 0 1 0 1 0 i 1 0 0 0 0 0 0 0 I) C(X, Z) = CNOT 1 0 0 0 1 1 0

Table 5.2: Two qubit gates. The sign bits are all 0 in the above tableaux so they are omitted.

of basis elements times a phase. That is,$^{7}$

$$U p_j U^\dagger = \alpha_j \prod_{k=1}^{2n} p_k^{M_{jk}}$$

for some bits $M_{j1}, \ldots, M_{j(2n)} \in \{0,1\}$ and some phase $\alpha_j \in \{\pm 1, \pm i\}$. The tableau for $U$ is a succinct representation for $U$ consisting of the binary matrix $M = [M_{jk}]$, and some representation of the phases $\alpha_1, \ldots, \alpha_{2n}$. It turns out that $U$ maps $p_j$ (or any Pauli string) to $\pm 1$ times a Pauli string, never $\pm i$ times a Pauli string. This follows from the fact that the square of any Pauli string is $I \cdots I$, so

$$(U p_j U^\dagger)^2 = U p_j^2 U^\dagger = U (I \cdots I) U^\dagger = +I \cdots I.$$

If the phase in front of the Pauli string in $U p_j U^\dagger$ were $\pm i$, then squaring it would produce a negative phase. Therefore, $U p_j U^\dagger$ is $\pm 1$ times a Pauli string.

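Unfortunately, $\alpha_j$ may still be any one of $\{\pm 1, \pm i\}$. This is because the product $p_{2k-1} p_{2k}$ is $I \cdots I (XZ) I \cdots I = -i\, I \cdots I Y I \cdots I$, with an awkward $-i$ phase. Once we cancel the extra factors of $i$ from $\alpha_j$, we are left with

$$(-1)^{v_j} := \alpha_j \prod_{k=1}^{n} (-i)^{M_{j(2k-1)} M_{j(2k)}}$$

where $v_j \in \{0,1\}$ is the phase bit for row $j$. For example, if $U p_1 U^\dagger$ is $YI \cdots I$ then we have $M_{11} = M_{12} = 1$ and $v_1 = 0$. Then the complete tableau for $U$ is the matrix $M = [M_{jk}]$ and the vector of bits $v = [v_j]$, which we typically write as

$$\left(\begin{array}{ccc|c} M_{11} & \cdots & M_{1(2n)} & v_1 \\ \vdots & \ddots & \vdots & \vdots \\ M_{(2n)1} & \cdots & M_{(2n)(2n)} & v_{2n} \end{array}\right)$$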
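As a small worked check of the phase-bit convention, the sketch below computes $v_j$ from a phase $\alpha_j$ and a row of the binary matrix, using the rule that each qubit whose X bit and Z bit are both set contributes a factor of $-i$. The function name `phase_bit` and the list-of-bits row encoding are assumptions made for illustration.

```python
def phase_bit(alpha, row):
    """Return v_j given the phase alpha_j and the 2n-bit row (M_j1, ..., M_j(2n))."""
    n = len(row) // 2
    value = complex(alpha)
    for k in range(n):
        # Qubit k carries a Y when both its X bit and its Z bit are set;
        # writing Y as a product of basis elements costs a factor of -i.
        if row[2 * k] and row[2 * k + 1]:
            value *= -1j
    if abs(value - 1) < 1e-9:
        return 0
    if abs(value + 1) < 1e-9:
        return 1
    raise ValueError("alpha_j is inconsistent with this row")

# The example from the text with n = 2: U p_1 U^dagger = Y I, so
# alpha_1 = i (because i * XZ = Y) and the row is (1, 1, 0, 0).
v1 = phase_bit(1j, [1, 1, 0, 0])
```

Here `v1` evaluates to 0, matching the example in the text.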

Our ordering of the basis elements (which differs from other presentations [5]) puts Pauli strings on the same qubit (e.g., $XI \cdots I$ and $ZI \cdots I$) side-by-side in the matrix, so the $2 \times 2$ submatrix

$$\begin{pmatrix} M_{2i-1,2j-1} & M_{2i-1,2j} \\ M_{2i,2j-1} & M_{2i,2j} \end{pmatrix}$$

completely describes how the $i$th qubit of the input affects the $j$th qubit of the output. In fact, it will be fruitful to think of the tableau as an $n \times n$ matrix of $2 \times 2$ blocks, along with a vector of $2n$ phase bits. To be clear, the blocks come from $R := \mathbb{F}_2^{2 \times 2}$, the ring of $2 \times 2$ matrices over the field of two elements, $\mathbb{F}_2$. Then the tableau is a matrix in $R^{n \times n}$ (the $n \times n$ matrices over the ring $R$), combined with a vector of phase bits in $\mathbb{F}_2^{2n}$. Each row of the matrix is associated with a pair of phase bits from the vector.

$^{7}$ Note that the order of the terms in the product matters, since the Pauli basis elements do not necessarily commute, so we assume the terms are in the natural order from $p_1$ up to $p_{2n}$.

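However, not every matrix in $R^{n \times n}$ corresponds to a Clifford circuit due to unitarity constraints. To best express these constraints, we define a unary operation $*$ on $R$ such that

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}^* = \begin{pmatrix} d & b \\ c & a \end{pmatrix}.$$

The $*$ operator has the property that

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} a & b \\ c & d \end{pmatrix}^* = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} d & b \\ c & a \end{pmatrix} = \begin{pmatrix} ad + bc & 0 \\ 0 & ad + bc \end{pmatrix} = \det\begin{pmatrix} a & b \\ c & d \end{pmatrix} I.$$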

Additionally,

$$I^* = I, \qquad (M + N)^* = M^* + N^*, \qquad (MN)^* = N^* M^*, \qquad (M^*)^* = M,$$

so $*$ makes $R$ a $*$-ring or ring with involution. We also extend $*$ to an operation on matrices (over $R$) which applies $*$ to each entry and then transposes the matrix. It turns out that a tableau represents a unitary operation if and only if the matrix $M \in R^{n \times n}$ satisfies $MM^* = M^*M = I$. This (intentionally) resembles the definition of a unitary matrix ($UU^\dagger = U^\dagger U = I$), so we will call it the unitarity condition. It corresponds to the unitarity of $U$ as a gate, but $M$ is certainly not a traditional complex unitary matrix (nor unitary over some finite field with conjugation).
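The unitarity condition can be tested mechanically on the binary form of a tableau. The sketch below is illustrative only: the names `star_block`, `star`, and `satisfies_unitarity` are not from the text, phase bits are ignored, and the CSIGN matrix is written in the basis order $p_1 = XI$, $p_2 = ZI$, $p_3 = IX$, $p_4 = IZ$ under the assumption that CSIGN maps $XI \mapsto XZ$ and $IX \mapsto ZX$.

```python
import numpy as np

def star_block(b):
    """The involution * on R: swap the two diagonal entries of a 2x2 block."""
    return np.array([[b[1, 1], b[0, 1]],
                     [b[1, 0], b[0, 0]]], dtype=int)

def star(M):
    """Apply * to each 2x2 block of a 2n x 2n binary matrix, then transpose
    the n x n grid of blocks (block (i, j) moves to position (j, i))."""
    n = M.shape[0] // 2
    out = np.zeros_like(M)
    for i in range(n):
        for j in range(n):
            block = M[2 * i:2 * i + 2, 2 * j:2 * j + 2]
            out[2 * j:2 * j + 2, 2 * i:2 * i + 2] = star_block(block)
    return out

def satisfies_unitarity(M):
    """Check M M* = M* M = I with arithmetic over F_2."""
    identity = np.eye(M.shape[0], dtype=int)
    Ms = star(M)
    return (np.array_equal((M @ Ms) % 2, identity)
            and np.array_equal((Ms @ M) % 2, identity))

# Matrix part of the CSIGN tableau (rows are the images of XI, ZI, IX, IZ).
csign = np.array([[1, 0, 0, 1],
                  [0, 1, 0, 0],
                  [0, 1, 1, 0],
                  [0, 0, 0, 1]])

# A singular block cannot be the tableau of a single-qubit gate.
singular = np.array([[1, 0],
                     [0, 0]])

csign_ok = satisfies_unitarity(csign)
singular_ok = satisfies_unitarity(singular)
```

Under these assumptions `csign_ok` is true and `singular_ok` is false.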

5.4.1 Correspondence between Gates and Tableaux

We will find it useful to switch between gates and tableaux, as one notion often informs the other. In light of that, define T(g) to be the tableau of a Clifford gate g. We will also use M(g) to denote only the matrix part of the tableau of g, as the phase bits of the tableau often prove to be irrelevant. Indeed, most non-degenerate gate sets generate the Pauli group, which alone suffices to set the phase bits of the tableau arbitrarily as follows: applying X to qubit $j$ negates $v_{2j}$ and applying Z to qubit $j$ negates $v_{2j-1}$. Furthermore, there is a surprising connection between individual entries of tableaux and Clifford operations that can be extracted from them (see Section 5.9). If $a \in R$ is invertible, then let $\mathcal{G}(a)$ be the single-qubit gate with $M(\mathcal{G}(a)) = a$ and zeros for phase bits. These are the gates in the first row of Table 5.3. Let $\mathcal{G}(a, i)$ be the gate $\mathcal{G}(a)$ applied to the $i$th qubit.

[Table 5.3 is a 4 × 6 array. Its columns are indexed by the six invertible elements of $R$ and its rows by the phase bits; its entries are the single-qubit Clifford gates: the Pauli gates, the R gates, the θ gates, and the Γ gates.]

Table 5.3: Invertible tableau elements and the corresponding single-qubit gates produced by the universal construction. The row of the table corresponds to the sign bits of the row of the tableau in which the element occurs.

Much like the invertible elements of $R$ correspond to single-qubit Clifford gates, the noninvertible elements correspond to generalized CNOT gates. Indeed, from Table 5.2, we see that the tableau of each generalized CNOT gate consists of noninvertible elements $b \in R$ and $b^* \in R$ along the off-diagonal. Therefore, if $b \in R$ is noninvertible, define CNOT$(b, i, j)$ to be the generalized CNOT on qubits $i$ and $j$ corresponding to the singular matrix $b$. We summarize these gates again in Table 5.4. The tableau $T(\mathrm{CNOT}(b, i, j))$ is the identity tableau except for $b^*$ and $b$ in positions $(i, j)$ and $(j, i)$, respectively. We use the circuit in Figure 5-5 to designate such a gate.

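| Element | Gen. CNOT |
|---|---|
| $\left(\begin{smallmatrix} 0 & 0 \\ 1 & 0 \end{smallmatrix}\right)$ | C(X,X) |
| $\left(\begin{smallmatrix} 1 & 1 \\ 1 & 1 \end{smallmatrix}\right)$ | C(Y,Y) |
| $\left(\begin{smallmatrix} 0 & 1 \\ 0 & 0 \end{smallmatrix}\right)$ | C(Z,Z) |
| $\left(\begin{smallmatrix} 1 & 0 \\ 0 & 0 \end{smallmatrix}\right)$ / $\left(\begin{smallmatrix} 0 & 0 \\ 0 & 1 \end{smallmatrix}\right)$ | C(X,Z) |
| $\left(\begin{smallmatrix} 1 & 0 \\ 1 & 0 \end{smallmatrix}\right)$ / $\left(\begin{smallmatrix} 0 & 0 \\ 1 & 1 \end{smallmatrix}\right)$ | C(Y,X) |
| $\left(\begin{smallmatrix} 1 & 1 \\ 0 & 0 \end{smallmatrix}\right)$ / $\left(\begin{smallmatrix} 0 & 1 \\ 0 & 1 \end{smallmatrix}\right)$ | C(Z,Y) |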

Table 5.4: Noninvertible tableau elements and the corresponding generalized CNOT gates produced by the universal construction.

Figure 5-5: Circuit diagram for the CNOT(b, 1, 2) gate, drawn with the label $b$ on one wire and $b^*$ on the other.

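Finally, we would like to have a direct way to compose two circuits by a simple operation on their tableaux. Suppose we wish to compute the composition of circuits $C_1$ and $C_2$. To compute $T(C_2 \circ C_1)$, we must compute, for each Pauli basis element $p_j$, the product $C_2 C_1 p_j C_1^\dagger C_2^\dagger$. First consider the $j$th row of $T(C_1)$, which gives

$$C_1 p_j C_1^\dagger = \alpha_j \prod_{k=1}^{2n} p_k^{M^{(1)}_{jk}}$$

where $M^{(1)}$ is the binary representation of $M(C_1)$, and $\alpha_j$ is the phase. Similarly,

$$C_2 p_k C_2^\dagger = \beta_k \prod_{\ell=1}^{2n} p_\ell^{M^{(2)}_{k\ell}}.$$

Therefore,

$$C_2 C_1 p_j C_1^\dagger C_2^\dagger = C_2 \left( \alpha_j \prod_{k=1}^{2n} p_k^{M^{(1)}_{jk}} \right) C_2^\dagger = \alpha_j \prod_{k=1}^{2n} \left( C_2 p_k C_2^\dagger \right)^{M^{(1)}_{jk}} = \alpha_j \prod_{k=1}^{2n} \left( \beta_k \prod_{\ell=1}^{2n} p_\ell^{M^{(2)}_{k\ell}} \right)^{M^{(1)}_{jk}}$$

$$\propto \prod_{k=1}^{2n} \prod_{\ell=1}^{2n} p_\ell^{M^{(1)}_{jk} M^{(2)}_{k\ell}} \propto \prod_{\ell=1}^{2n} p_\ell^{\sum_{k=1}^{2n} M^{(1)}_{jk} M^{(2)}_{k\ell}} = \prod_{\ell=1}^{2n} p_\ell^{[M^{(1)} M^{(2)}]_{j\ell}}.$$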

Notice that this implies M(C2 o C1) = M(C1)M(C2 ). Since it is cumbersome to write out explicitly, we did not include the exact phases in the above calculation.

Nevertheless, one can compute the phase bits by tracking the intermediate steps in the above calculation, which includes the multiplication of Pauli basis elements.
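The rule $M(C_2 \circ C_1) = M(C_1) M(C_2)$ is a single matrix product over $\mathbb{F}_2$. The following is a minimal sketch that tracks only the matrix part and ignores phase bits; the function name `compose_matrix` and the two single-qubit matrices (the swap of X and Z, and the map $X \mapsto Y$, $Z \mapsto Z$) are assumptions chosen for illustration.

```python
import numpy as np

def compose_matrix(m1, m2):
    """Matrix part of 'C1 first, then C2': M(C2 o C1) = M(C1) M(C2) over F_2."""
    return (np.asarray(m1) @ np.asarray(m2)) % 2

# Rows are the images of X and Z, written as (X bit, Z bit).
swap_xz = np.array([[0, 1],     # X -> Z
                    [1, 0]])    # Z -> X
x_to_y = np.array([[1, 1],      # X -> Y
                   [0, 1]])     # Z -> Z

# Apply swap_xz first, then x_to_y:  X -> Z -> Z  and  Z -> X -> Y.
first_swap = compose_matrix(swap_xz, x_to_y)

# Apply x_to_y first, then swap_xz:  X -> Y -> Y  and  Z -> Z -> X.
first_phase = compose_matrix(x_to_y, swap_xz)
```

The two results differ, which reflects the fact that the order of composition matters and that the tableau product reverses it.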

5.5 Classes

Informally, the class generated by a set of Clifford gates is the collection of Clifford operations which can be constructed from those gates. Our goal is to determine the set of classes and the relationships between them. We now must lay out the precise operations we use to build circuits, such as composition and tensor product. Let us also introduce and justify our rules concerning the reordering of qubits and ancillary workspace.

First, we will always allow the swapping of qubits. This allows us to consider gates without needing to specify the qubits on which they must be applied. Indeed, we can relabel the input wires however we like. Secondly, we allow the use of quantum ancillas, that is, arbitrary states that we can "hardwire" into the computation. We only stipulate that these ancilla qubits be returned to their initial configuration at the end of the computation.

These ancilla inputs can be viewed as the workspace of the computation. If ancillary qubits are not allowed, then a gate cannot be used to generate a smaller gate because there are not enough inputs to use it at all. Furthermore, if we want to

apply a Clifford operation as a subroutine of a general quantum computation, then we need that the ancilla states do not depend on the input. Otherwise, they would destroy the quantum coherence of the computation.

Finally, we turn to the question of the ancillas themselves. The weakest assumption one could make is that the ancillas are initialized to an unknown state, which the circuit may only change temporarily. This is somewhat artificial since we can, at the very least, initialize the workspace to the all-zeros state. Other classifications suggest that without this assumption the problem becomes dramatically more difficult.$^{8}$ A slightly stronger assumption would be to allow ancillas initialized to computational basis states, but this would break symmetry by introducing a bias towards the Z-basis.$^{9}$

A next natural step would be to permit ancillas initialized to arbitrary stabilizer states. Although this would appear to be circular (i.e., Clifford gates are necessary to implement stabilizer states, which we then use as ancillas in Clifford circuits), the reusability of the ancilla states implies that even if the states are difficult to construct, at least we only have to construct them once. Unfortunately, we are unable to complete the classification in its entirety under this ancilla model. However, we have reduced the problem to finding a single stabilizer state which is stabilized by $\Gamma$ and a permutation, and we conjecture that such a state exists (see Section 5.5.1). Moreover, the conjectured classification matches the one we will present (under a stronger ancilla model).

Finally, we arrive at our chosen model; that is, ancillas initialized to arbitrary quantum states. A priori, these states could be arbitrarily large, and arbitrarily complicated to construct, which is clearly undesirable. It turns out, however, the classification only requires finitely many one or two qubit states, in particular, the eigenstates of the single-qubit Clifford gates and states that are locally equivalent to the Bell state.

$^{8}$ For example, the lattice of classical many-to-one functions over bits is finite when we allow 0/1 inputs to any function, but infinite when we do not allow such freedom [68, 61.

$^{9}$ We can fix the bias by allowing all single-qubit ancillary states: $|0\rangle, |1\rangle, |+\rangle, |-\rangle, |i\rangle, |-i\rangle$. This introduces new classes such as $\langle \theta_{X+Z} \otimes \theta_{X+Z} \rangle$, but we leave the classification under these assumptions as an open problem.

It is worth noting that there is a long line of work showing that weak gate sets, including Clifford gates, are universal for quantum computation when given access to magic states [25, 71]. Importantly, these magic states do not need to be preserved after the computation. Conversely, under our model, we show that arbitrary quantum ancillas cannot boost the power of Clifford gates beyond the Clifford group.

One might assume that we could increase the power of ancillas by letting them change over the course of the computation, as long as the change is independent of the input (that is, from some constant initial state to some possibly different constant final state). Indeed, in the classification of reversible gates [6], these "loose ancillas" collapse a few pairs of classes. Nevertheless, we show that even with loose ancillas, our classification (as presented) still holds. When we formally define class invariants in Section 5.6 (see Theorem 87 in particular), we will see that the invariants hold under the loose quantum ancilla model, and therefore hold for all weaker models. We consider another ancilla model (namely, when only Clifford ancillas are allowed) at the end of the section.

Now let us formally define a class $\mathcal{C}$ to be a set of Clifford gates satisfying the following four rules:

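1. Composition Rule: $\mathcal{C}$ is closed under composition of gates. If $f, g \in \mathcal{C}$, then $f \circ g \in \mathcal{C}$.

2. Tensor Rule: $\mathcal{C}$ is closed under tensor product of gates. If $f, g \in \mathcal{C}$, then $f \otimes g \in \mathcal{C}$.

3. Swap Rule: $\mathcal{C}$ contains the SWAP gate.

4. Ancilla Rule: $\mathcal{C}$ is closed under ancillas. If $f \in \mathcal{C}$ and there exists $g$ such that

$$f(|x\rangle \otimes |\psi\rangle) = g(|x\rangle) \otimes |\psi\rangle$$

for some $|\psi\rangle$ and for all inputs $|x\rangle$ (up to a global phase), then $g \in \mathcal{C}$.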

Let's first see a few simple consequences of the model.

Proposition 84. Let $\mathcal{C}$ be a class of Clifford gates. Then $\mathcal{C}$ contains the $n$-qubit identity gate for any $n$.

Proof. First, $\mathcal{C}$ contains SWAP. It follows that $\mathcal{C}$ contains the two-qubit identity gate since it is the composition SWAP $\circ$ SWAP. By the ancilla rule, we can remove a qubit from the two-qubit identity using any one-qubit state. Hence, the one-qubit identity gate is in $\mathcal{C}$. Finally, $\mathcal{C}$ must contain the $n$-qubit identity gate because it is the tensor product of $n$ one-qubit identity gates. □

Proposition 85. Let $\mathcal{C}$ be a class of Clifford gates. For any $g \in \mathcal{C}$, the inverse, $g^{-1}$, belongs to $\mathcal{C}$.

Proof. Consider the sequence

$$g, \; g \circ g, \; g \circ g \circ g, \; \ldots, \; g^n, \; \ldots$$

Since there are finitely many Clifford gates on $n$ qubits (certainly finitely many tableaux, and one gate per tableau), the sequence must eventually repeat. That is, $g^i = g^j$ for some $1 \leq i < j$. Since every Clifford gate has an inverse, we conclude that $I = g^0 = g^{j-i}$, and hence $g^{-1} = g^{j-i-1}$. In other words, $g^{-1}$ is a Clifford gate, and $g^{-1}$ is a (positive) power of $g$ and therefore in $\mathcal{C}$. □
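The argument of Proposition 85 is constructive: walk through the powers of $g$ until the identity appears, and the previous power is the inverse. The sketch below applies this to the matrix part of a tableau over $\mathbb{F}_2$ only (phase bits are ignored); the function name `inverse_as_power` and the order-3 example matrix are illustrative assumptions.

```python
import numpy as np

def inverse_as_power(M):
    """Return (r, M^r) with r >= 1 and M^r equal to the inverse of M over F_2."""
    M = np.asarray(M, dtype=int) % 2
    identity = np.eye(M.shape[0], dtype=int)
    if np.array_equal(M, identity):
        return 1, M                      # the identity is its own inverse
    seen = set()
    exponent, power = 1, M
    while True:
        key = power.tobytes()
        if key in seen:                  # the powers cycle without reaching I
            raise ValueError("matrix is not invertible over F_2")
        seen.add(key)
        nxt = (power @ M) % 2
        if np.array_equal(nxt, identity):
            return exponent, power       # M^(exponent + 1) = I
        exponent, power = exponent + 1, nxt

# A single-qubit matrix of order 3 (it cycles X -> Y -> Z -> X).
cycle = np.array([[1, 1],
                  [1, 0]])
r, cycle_inv = inverse_as_power(cycle)
```

For this example the inverse is the square of the matrix, so `r` is 2.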

The most practical way to talk about a class $\mathcal{C}$ is by its generators. We say a set of gates $G$ generates a class $\mathcal{C}$ if $G \subseteq \mathcal{C}$ and every class containing $G$ also contains $\mathcal{C}$. We introduce the notation $\langle \cdot \rangle$ for the class generated by a set of gates. Similarly, we say that $G$ generates a specific gate $g$ if $g \in \langle G \rangle$. Our goal is therefore to identify all Clifford gate classes, determine their generators, and diagram the relationships between classes. As it turns out, there are 57 different classes, which we have split across Figure 5-1 (which contains the classes with single-qubit generators) and Figure 5-2 (which contains the multi-qubit classes). Each class is labelled by a set of generators for that class, except for ALL, the class of all Clifford gates; $\top$, the class of all single-qubit Clifford gates; and $\bot$, the class generated by the empty set. Additionally, we abbreviate some class names in Figure 5-1:

• $\theta_{+++}$, $\theta_{+--}$, $\theta_{-+-}$, $\theta_{--+}$ denote the single-qubit classes containing $\Gamma_{+++}$, $\Gamma_{+--}$, $\Gamma_{-+-}$, and $\Gamma_{--+}$ respectively, and three $\theta$ gates each, as indicated by the gray lines.

•0., abbreviates 0,+y or 0_, (it contains both) and similarly for 02, and 6,2.

Some of the lines in Figure 5-1 are gray and dotted, not for any technical reason, but because the lattice would be unreadable otherwise.

In Figure 5-2 each class includes the label of the single-qubit subgroup, even when not all of the single-qubit generators are necessary to generate the class. This is intended to make the relationship between the degenerate and non-degenerate lattices clearer. For example, T4 generates the Pauli group, P, on its own (Lemma 95), but we label the class (T4 , P) to make it clear that the class (T4 ) is above (P) in the lattice.

5.5.1 Clifford Ancilla Rule

Since we are classifying Clifford operations, one might naturally ask what happens when we only allow for stabilizer ancilla. In fact, it's not hard to show that the classification remains unchanged if the following conjecture holds:

Conjecture 86. For any single-qubit Clifford gate $g$, there exists a stabilizer state $|\psi\rangle$ and circuit of SWAP gates $\pi$ such that $g \circ \pi |\psi\rangle = |\psi\rangle$.

This is sufficient to remove single-qubit gates in situations where we would otherwise use an eigenstate.

For many single-qubit gates, there is a trivial stabilizer state which stabilizes it. For instance, X is stabilized by $|+\rangle$, $R_Z$ is stabilized by $|0\rangle$, and many other single-qubit Clifford gates are conjugate to one of these cases. Now consider the gate $\theta_{X+Z}$, whose eigenstates (unnormalized) $(1 \pm \sqrt{2})|0\rangle + |1\rangle$ are not stabilizer states. How then, given the gate $\theta_{X+Z} \otimes \theta_{X+Z}$, does one generate the gate $\theta_{X+Z}$ which acts only on one qubit? Han-Hsuan Lin discovered the first explicit nine-qubit stabilizer state for this task.

Let $\pi$ be a circuit that cyclically permutes qubits 2 through 9, and suppose $\theta_{X+Z}$ is applied to qubit 1. Let $|\psi\rangle$ be the state stabilized by the following commuting Pauli strings,

XXZXZIIII, ZIXZXZIII, XIIXZXZII, ZIIIXZXZI,
XIIIIXZXZ, ZZIIIIXZX, XXZIIIIXZ, ZZXZIIIIX,
YYIIIYIII, -YIYIIIYII, YIIYIIIYI, -YIIIYIIIY,

9 of which are independent. One can check that conjugating each generator by $\theta_{X+Z} \circ \pi$ yields another element of the stabilizer group, so $(\theta_{X+Z} \circ \pi)|\psi\rangle = |\psi\rangle$. In other words, Conjecture 86 holds for $\theta_{X+Z}$, and for all conjugates $\theta_{P+Q}$ by symmetry. All that remains to verify the conjecture is to find a similar state stabilizing the eight remaining gates: the $\Gamma$ gate and its conjugates. It suffices to find a stabilizer state $|\psi\rangle$ and circuit $C$, constructed of SWAP gates and a single $\Gamma$ gate, such that

$$C|\psi\rangle = |\psi\rangle.$$
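Claims of the form "these Pauli strings commute" and "k of them are independent" can be checked with a few lines of code. The following is a sketch under stated assumptions: the helper names are illustrative, signs are ignored for both checks (a sign affects neither commutation nor independence of the underlying strings), and the tenth string is read as -YIYIIIYII, following the pattern of its neighbours in the list.

```python
from itertools import combinations

BITS = {"I": (0, 0), "X": (1, 0), "Z": (0, 1), "Y": (1, 1)}

def strip_sign(pauli):
    """Drop an optional leading '-' from a Pauli string."""
    return pauli.lstrip("-")

def commute(p, q):
    """Two Pauli strings commute iff they clash on an even number of qubits."""
    p, q = strip_sign(p), strip_sign(q)
    if len(p) != len(q):
        raise ValueError("Pauli strings must have equal length")
    clashes = sum(1 for a, b in zip(p, q) if "I" not in (a, b) and a != b)
    return clashes % 2 == 0

def all_commute(strings):
    return all(commute(p, q) for p, q in combinations(strings, 2))

def to_bits(pauli):
    """Encode a Pauli string as an integer, two bits (X bit, Z bit) per qubit."""
    value = 0
    for ch in strip_sign(pauli):
        x, z = BITS[ch]
        value = (value << 2) | (x << 1) | z
    return value

def gf2_rank(strings):
    """Number of independent strings, by Gaussian elimination over F_2."""
    basis = []
    for v in map(to_bits, strings):
        for b in basis:
            v = min(v, v ^ b)        # clear the leading bit of b if present
        if v:
            basis.append(v)
    return len(basis)

generators = [
    "XXZXZIIII", "ZIXZXZIII", "XIIXZXZII", "ZIIIXZXZI",
    "XIIIIXZXZ", "ZZIIIIXZX", "XXZIIIIXZ", "ZZXZIIIIX",
    "YYIIIYIII", "-YIYIIIYII", "YIIYIIIYI", "-YIIIYIIIY",
]
pairwise_commuting = all_commute(generators)
independent_count = gf2_rank(generators)
```

Running this computes `pairwise_commuting` and `independent_count` for the list as transcribed here, which can be compared with the statement in the text.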

5.6 Invariants

Until now, we have defined each class in terms of the generators for that class. It turns out that each class can also be characterized as the set of all gates satisfying a collection of invariants. Section 5.7 formalizes this equivalence. This section focuses on the form of the invariants themselves. Informally, an invariant is a property of gates which is preserved by the circuit building operations. In other words, if a collection of gates all satisfy a particular invariant then any circuit constructed from those gates must also satisfy the invariant. All our invariants are formally defined from the tableaux, but for now we give the following informal descriptions to make the intuition for each invariant clear.

X-, Y-, or Z-preserving: We say a Clifford gate is Z-preserving if it maps Z-basis states to Z-basis states, possibly with a change of phase. The Z-preserving gates include all (classical) reversible gates (e.g., X, CNOT, and $T_4$), gates which only manipulate the sign (e.g., $R_Z$ and CSIGN), and combinations of the two.

Symmetrically, there are X-preserving gates mapping X-basis states to X-basis states, and Y-preserving gates mapping Y-basis states to Y-basis states. Our definitions of classes, gates, invariants, etc., are completely symmetric with respect to X, Y and Z basis, so if some gate or class is X-preserving (resp. Y-preserving or Z-preserving), then there must be a corresponding gate or class which is Y-preserving (resp. Z-preserving or X-preserving). We will often appeal to this symmetry to simplify proofs.

Note that a gate can be any combination of X-, Y-, and Z-preserving. For instance, $T_4$ is X-, Y-, and Z-preserving; CNOT is X-preserving and Z-preserving but not Y-preserving (similarly C(X, Y) and C(Y, Z) fail to be Z-preserving and X-preserving respectively); $R_Z$ is Z-preserving only (similarly $R_X$ is X-preserving and $R_Y$ is Y-preserving); and $\Gamma$ is not X-, Y-, or Z-preserving.

Egalitarian: We say a gate is egalitarian if it is fixed (up to a Pauli operation on each qubit) by the aforementioned X/Y/Z symmetry, that is, the symmetry arising from conjugating all qubits by $\Gamma$ (which cycles X to Y, Y to Z, and Z to X). In particular, this implies that if egalitarian operation $U$ maps Pauli string $P$ to $Q = UPU^\dagger$ under conjugation, then $U$ maps $\Gamma P \Gamma^\dagger$ to

$$U \Gamma P \Gamma^\dagger U^\dagger \propto \Gamma U \Gamma^\dagger \, \Gamma P \Gamma^\dagger \, \Gamma U^\dagger \Gamma^\dagger = \Gamma U P U^\dagger \Gamma^\dagger = \Gamma Q \Gamma^\dagger.$$

Not only are $\Gamma$ and the Pauli matrices themselves egalitarian, but so is $T_4$.

Degenerate: We say a gate is degenerate if each input affects only one output. More precisely, when applying the gate to a string of Paulis, changing one Pauli in the input will change exactly one Pauli in the output. All single-qubit gates are degenerate, and all degenerate gates can be composed of single-qubit gates and SWAP gates.

X-, Y-, or Z-degenerate: A gate is Z-degenerate if it is Z-preserving and flipping any bit of a classical (Z-basis) input to the gate causes exactly one bit of the output to flip. The gate may or may not affect the phase. This class includes several Z-preserving single-qubit gates, like $R_Z$, the Pauli operations, and $\theta_{X+Y}$. It also includes CSIGN because this gate only affects phase, but CNOT is not Z-degenerate because flipping the control bit changes both outputs. Notice that CSIGN is Z-degenerate, but not degenerate. We define X-degenerate and Y-degenerate symmetrically.

X-, Y-, or Z-orthogonal: A gate $G$ is Z-orthogonal if it can be built from $T_4$ and Z-preserving single-qubit gates. The term "orthogonal" comes from the fact that $T_4$ is an orthogonal linear transformation in the Z-basis, but not all Z-orthogonal gates are literally orthogonal transformations in the Z-basis (see, for example, Lemma 95). Similarly for X-orthogonal and Y-orthogonal.

Single-Qubit Gates: There are thirty different classes of single-qubit gates. All of these classes are degenerate, and some can be distinguished by the other invariants above. However, many single-qubit invariants depend on the phase bits of the tableau. For instance, the tableaux of $\theta_{X+Y}$, $\theta_{X-Y}$, and $R_Z$ all have the same matrix part, $\left(\begin{smallmatrix} 1 & 1 \\ 0 & 1 \end{smallmatrix}\right)$, but generate three distinct classes. One can write down explicit invariants for these classes where the phase bits are correlated to the tableau entries, but in most cases we present a single-qubit class as a subgroup of the symmetries of the cube/octahedron, as shown in Figure 5-3.

5.6.1 Formal invariants

An invariant is a property of tableaux which is preserved by the four circuit-building rules.

Swap Rule: Every class contains the SWAP gate, so every invariant we propose must be satisfied by the tableau for SWAP.

Composition Rule: If the invariant holds for two gates, then it must hold for their composition. We have seen that the tableau for the composition of two gates is essentially the matrix product of the two tableaux, except for the phase bits (which are significantly more complicated to update).

Tensor Rule: The tensor product of two gates satisfying the invariant must also satisfy the invariant. Note that the tableau of the tensor product is the direct sum of the tableaux, and phase bits are inherited from the sub-tableaux in the natural way.

Ancilla Rule: The invariant must be preserved when some qubits are used as ancillas. It turns out the ancilla operation reduces the tableau to a submatrix (of non-ancilla rows and columns) and under certain conditions, the corresponding subset of the phase bits. This is somewhat technical, so we prove it in Theorem 87 below.

Theorem 87. Let $G$ be a Clifford gate on $n$ qubits, and suppose there exist states $|\psi\rangle$ and $|\psi'\rangle$ such that

$$G(|x\rangle \otimes |\psi\rangle) = H(|x\rangle) \otimes |\psi'\rangle$$

for all $|x\rangle$, for some unitary $H$ on $m$ qubits. In particular, this is true if we use the ancilla rule to reduce $G$ to $H$, where $|\psi\rangle = |\psi'\rangle$ is the ancilla state. Then

1. $H$ is a Clifford operation,

2. $M(H)$ is obtained by removing the rows and columns corresponding to the ancilla bits from $M(G)$,

3. If every bit (in $M(G)$ as a binary matrix) we remove from a row is zero, then the phase bit for that row is the same in $T(G)$ and $T(H)$.

Proof. Let $P \in \mathcal{P}_m$. Then for all $|x\rangle$,

$$G(P|x\rangle \otimes |\psi\rangle) = H(P|x\rangle) \otimes |\psi'\rangle.$$

On the other hand, $G$ is a Clifford gate, so conjugating the Pauli string $P \otimes I^{n-m}$ by $G$ produces $Q \otimes R$ for $Q \in \mathcal{P}_m$ and $R \in \mathcal{P}_{n-m}$. Equivalently,

$$G(P \otimes I^{n-m}) = (Q \otimes R)G.$$

It follows that

$$\begin{aligned} H(P|x\rangle) \otimes |\psi'\rangle &= G(P|x\rangle \otimes |\psi\rangle) \\ &= (Q \otimes R)\, G(|x\rangle \otimes |\psi\rangle) \\ &= (Q \otimes R)\left(H(|x\rangle) \otimes |\psi'\rangle\right) \\ &= QH(|x\rangle) \otimes R|\psi'\rangle, \end{aligned}$$

so up to phase, $H(P|x\rangle) = QH(|x\rangle)$ for all $|x\rangle$, and $|\psi'\rangle = R|\psi'\rangle$. The first equation implies $HP = QH$, or $HPH^\dagger = Q$. Since $P$ was arbitrary, the conjugation of a Pauli string by $H$ is always another Pauli string, so $H$ is a Clifford gate. In the special case that $P$ (and therefore $P \otimes I^{n-m}$) is a Pauli basis element, $Q \otimes R$ is represented in a row of the binary tableau of $G$. We keep the bits representing $Q$ in the tableau for $H$, since $HPH^\dagger = Q$, which is why $M(H)$ is a submatrix of $M(G)$. Clearly the phase of $Q$ is the same as the phase of $Q \otimes R$ if and only if $R$ is positive. In the special case $R = \pm I^{n-m}$, it is easy to see that $R$ is positive, otherwise

$$|\psi'\rangle = R|\psi'\rangle = -I^{n-m}|\psi'\rangle = -|\psi'\rangle$$

is a contradiction. Hence, the phase for the corresponding row of $T(H)$ is inherited from $T(G)$. □
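Parts 2 and 3 of Theorem 87 translate directly into an operation on the binary matrix. The sketch below is illustrative: the function name `remove_ancillas`, the 0-indexed qubit labels, and the CSIGN matrix (assumed to map $XI \mapsto XZ$ and $IX \mapsto ZX$) are assumptions, and only the matrix part is handled. For each kept row it also reports whether every removed bit is zero, which is the condition under which the phase bit is guaranteed to be inherited.

```python
import numpy as np

def remove_ancillas(M, ancilla_qubits):
    """Delete the rows and columns of the given (0-indexed) ancilla qubits.

    Returns (submatrix, inherited), where inherited[r] is True when every
    removed bit in kept row r is zero (part 3 of Theorem 87).
    """
    M = np.asarray(M) % 2
    n = M.shape[0] // 2
    removed = [2 * q + b for q in ancilla_qubits for b in (0, 1)]
    kept = [i for i in range(2 * n) if i not in removed]
    sub = M[np.ix_(kept, kept)]
    inherited = [not M[r, removed].any() for r in kept]
    return sub, inherited

# CSIGN with the second qubit used as an ancilla.
csign = np.array([[1, 0, 0, 1],
                  [0, 1, 0, 0],
                  [0, 1, 1, 0],
                  [0, 0, 0, 1]])
sub, inherited = remove_ancillas(csign, [1])
```

Here the reduced matrix is the single-qubit identity matrix. Only the second row (the image of Z) is guaranteed to inherit its phase bit, which is consistent with the fact that an ancilla in state $|1\rangle$ turns CSIGN into a Z gate on the remaining qubit.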

As a direct consequence of these rules, our invariants take on a distinctly algebraic flavor. Let us consider, for the sake of illustration, invariants that depend only on the matrix part of the tableau and ignore the phase bits. Then an invariant is equivalent to a set of matrices closed under the four rules above. In particular, the matrices form a group under multiplication as a consequence of the composition rule (and the fact that every gate has finite order). On the other hand, not every group of matrices will correspond to an invariant. For instance, due to the swap rule, the group of matrices must also be closed under arbitrary reordering of the rows and columns. This eliminates, e.g., the group of upper triangular matrices. Similarly, the ancilla rule excludes the special orthogonal group.

In the end, we are left with just two kinds of matrix groups which lead to invariants:

Subring Invariants: Matrices with elements restricted to a particular subring of $R$ (analogous to the real matrices, integer matrices, etc.)

Permutation Invariants: Permutation matrices, except where each 1 entry can be any one of a subset of invertible elements, and each 0 entry comes from a collection of non-invertible elements.

Now we are ready to present formal definitions for these invariants, and show that they really are preserved by the circuit-building rules.

5.6.2 Subring invariants

The first kind of invariant restricts the entries of the tableau to a subring of $R$. That is, given a subring $S \subseteq R$, a gate satisfies the invariant $\mathcal{I}(S)$ if and only if all entries of the tableau are in $S$. There are twelve classes, all near the top of the lattice, of the form $\mathcal{C} = \{\text{all gates satisfying } \mathcal{I}(S)\}$, corresponding to all 12 subrings of $R$ listed below.

• The entire ring, $R$, is technically a subring of itself, and $\mathcal{I}(R)$ is the trivial invariant satisfied by all Clifford gates. Notice that not every matrix over $R$ gives a valid tableau because it must still be unitary.

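• There are four maximal proper subrings of $R$:

$$R_X = \left\{ \begin{pmatrix} a & 0 \\ c & d \end{pmatrix} : a, c, d \in \{0,1\} \right\},$$

$$R_Y = \left\{ \begin{pmatrix} a & b \\ c & d \end{pmatrix} : a, b, c, d \in \{0,1\},\; a + b + c + d = 0 \right\},$$

$$R_Z = \left\{ \begin{pmatrix} a & b \\ 0 & d \end{pmatrix} : a, b, d \in \{0,1\} \right\},$$

$$R_E = \left\{ \begin{pmatrix} a & b \\ b & a + b \end{pmatrix} : a, b \in \{0,1\} \right\}.$$

Our formal definition for Z-preserving gates is the invariant $\mathcal{I}(R_Z)$. The fact that the lower left entry is 0 implies that the gate maps Pauli strings of I and Z to strings of I and Z. Hence, Z-basis strings are mapped to Z-basis strings. Similarly, the X-preserving and Y-preserving invariants are $\mathcal{I}(R_X)$ and $\mathcal{I}(R_Y)$ respectively. The egalitarian invariant, $\mathcal{I}(R_E)$, comes from the subring $R_E$.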

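• The intersection of two subrings is itself a subring, giving us exactly four more subrings ($R_X \cap R_Y$, $R_X \cap R_Z$, $R_Y \cap R_Z$, and $R_X \cap R_Y \cap R_Z$) since the intersection of $R_E$ with any of the others is

$$R_X \cap R_Y \cap R_Z = \left\{ \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \right\},$$

the trivial ring.

• Three more subrings are obtained by taking only self-conjugate elements of $R_X$, $R_Y$, and $R_Z$ respectively. An element $\left(\begin{smallmatrix} a & b \\ c & d \end{smallmatrix}\right)$ is self-conjugate if

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} a & b \\ c & d \end{pmatrix}^*$$

or equivalently, $a = d$. These three subring invariants correspond to the X-orthogonal, Y-orthogonal, and Z-orthogonal classes respectively (i.e., the classes $\langle T_4, P, R_X \rangle$, $\langle T_4, P, R_Y \rangle$, and $\langle T_4, P, R_Z \rangle$).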
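Since $R$ has only sixteen elements, the subring claims above can be confirmed by enumeration. The following is a minimal sketch; all names (`mat`, `add`, `mul`, `star`, `is_subring`, and the set names) are illustrative. It checks closure of the four maximal subrings and of the self-conjugate part of $R_Z$, and computes the intersection of $R_E$ with $R_X$.

```python
from itertools import product

def mat(a, b, c, d):
    return ((a, b), (c, d))

def add(m, n):
    """Entrywise sum over F_2."""
    return tuple(tuple((x + y) % 2 for x, y in zip(r, s)) for r, s in zip(m, n))

def mul(m, n):
    """2x2 matrix product over F_2."""
    return tuple(tuple(sum(m[i][k] * n[k][j] for k in range(2)) % 2
                       for j in range(2)) for i in range(2))

def star(m):
    """The involution *: swap the diagonal entries."""
    return mat(m[1][1], m[0][1], m[1][0], m[0][0])

ZERO, ONE = mat(0, 0, 0, 0), mat(1, 0, 0, 1)
R = [mat(*bits) for bits in product((0, 1), repeat=4)]

R_X = {m for m in R if m[0][1] == 0}                              # upper right is 0
R_Y = {m for m in R if (m[0][0] + m[0][1] + m[1][0] + m[1][1]) % 2 == 0}
R_Z = {m for m in R if m[1][0] == 0}                              # lower left is 0
R_E = {m for m in R
       if m[1][0] == m[0][1] and m[1][1] == (m[0][0] + m[0][1]) % 2}

def is_subring(S):
    """Contains the identity and is closed under addition and multiplication."""
    return ONE in S and all(add(m, n) in S and mul(m, n) in S
                            for m in S for n in S)

self_conjugate_Z = {m for m in R_Z if star(m) == m}

maximal_are_subrings = all(is_subring(S) for S in (R_X, R_Y, R_Z, R_E))
trivial = R_E & R_X
```

The same helpers can be reused to test any other candidate set of $2 \times 2$ binary matrices.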

Theorem 88. For any subring $S \subseteq R$, the property $\mathcal{I}(S)$ is an invariant. That is, the set of matrices over $S$ respects the circuit-building operations.

Proof. Every subring contains $\left(\begin{smallmatrix} 0 & 0 \\ 0 & 0 \end{smallmatrix}\right)$ and $\left(\begin{smallmatrix} 1 & 0 \\ 0 & 1 \end{smallmatrix}\right)$ by definition, and therefore the tableau (phase bits omitted) of the SWAP gate,

$$\begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}$$

satisfies $\mathcal{I}(S)$.

Matrix multiplication is a polynomial in the entries of the two matrices, so composition cannot produce entries outside the subring. Similarly, combining tableaux with tensor products or reducing tableaux to submatrices via ancillas does not introduce any new ring elements; those operations only use elements already present in the tableau. We conclude that $\mathcal{I}(S)$ is an invariant for any subring $S$. □

5.6.3 Permutation invariants

The permutation invariants get their name from the matrix part of their tableaux, which is required to have the structure of a permutation matrix. That is, every row (or column) has exactly one element which is invertible, and the others are non-invertible. Permutation invariants are also sensitive to phase bits. It is natural to associate the unique invertible element in a row with the phase bits for that row, giving the tableau of a single-qubit gate. A permutation invariant $\mathcal{P}(G, S)$ is defined by the set of single-qubit gates $G$ which can be obtained in this way, and the set of non-invertible elements $S$ used to fill the rest of the tableau. In other words, a tableau satisfies $\mathcal{P}(G, S)$ if all entries are from $S$ except exactly one entry per row which, when combined with the phase bits for the row, is the tableau of some gate in $G$.

Note that not all pairs of sets $(G, S)$ produce an invariant. For instance, circuit-building operations will fail to preserve $\mathcal{P}(G, S)$ if $G$ is not a group. The exact relationship between $G$ and $S$ required to produce an invariant is difficult to write down. Roughly speaking, products of elements in $S$ should be zero, products of elements in $G$ should remain in $G$, and products between $S$ and $M(G)$ should be manageable in some sense. Theorem 89 gives a list of $\mathcal{P}(G, S)$ invariants, which will turn out to be exhaustive by Theorem 103, the culminating theorem of this chapter.

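Theorem 89. The following permutation invariants are indeed invariant under the circuit-building operations.

1. If $G$ is a group of single-qubit gates then

$$\mathcal{P}\left(G, \left\{ \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix} \right\}\right)$$

is an invariant for $\langle G \rangle$. All thirty degenerate classes are characterized by invariants of this form.

2. If $\langle X \rangle \subseteq G \subseteq \langle P, R_X \rangle$ is a group of single-qubit gates then

$$\mathcal{P}\left(G, \left\{ \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} \right\}\right)$$

is an invariant for $\langle C(X,X), G \rangle$. These invariants characterize the five X-degenerate classes.

3. If $\langle Y \rangle \subseteq G \subseteq \langle P, R_Y \rangle$ is a group of single-qubit gates then

$$\mathcal{P}\left(G, \left\{ \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} \right\}\right)$$

is an invariant for $\langle C(Y,Y), G \rangle$. These invariants characterize the five Y-degenerate classes.

4. If $\langle Z \rangle \subseteq G \subseteq \langle P, R_Z \rangle$ is a group of single-qubit gates then

$$\mathcal{P}\left(G, \left\{ \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \right\}\right)$$

is an invariant for $\langle C(Z,Z), G \rangle$. These invariants characterize the five Z-degenerate classes.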

Proof. Let $\mathcal{P}(G, S)$ be one of the invariants above. Let $M = \{M(g) : g \in G\}$ be the set of matrices from tableaux in $G$. In all cases, $S$ contains $\left(\begin{smallmatrix} 0 & 0 \\ 0 & 0 \end{smallmatrix}\right)$, and $G$ contains the single-qubit identity operation,

$$\left(\begin{array}{cc|c} 1 & 0 & 0 \\ 0 & 1 & 0 \end{array}\right),$$

so SWAP satisfies the invariant. And clearly the direct sum of two tableaux in $\mathcal{P}(G, S)$ is still in $\mathcal{P}(G, S)$ for any $G$ and $S$.

Now consider the composition of two gates. Each entry in the tableau is a dot product of some row from one tableau with some column from the other. Hence, the entry is a sum of $S \times S$, $S \times M$, $M \times S$, or $M \times M$ products. Observe that the $S \times S$ products are all zero (for the particular sets $S$ above), so we may ignore those products. Recall that the row and column each contain exactly one entry in $M$, so depending on whether those entries align, we get either $SM + MS \subseteq S$ or $M^2 = M$. Furthermore, for any row in one tableau there is exactly one column in the other such that the invertible entries line up. Therefore, exactly one entry in any row (or column) of the composition is in $M$ and the rest are in $S$. Clearly the matrix part of the tableau has the correct form for the invariant.

We must also consider phase bits under composition. Recall that the phase bits associate with the invertible entries of the matrix to produce single-qubit gates. When we multiply two tableaux, these single-qubit gates multiply to produce elements in $G$ (since $G$ is a group), as you would expect. If the non-invertible elements are all zero, then this is the only factor in determining phase bits, so the invariant is preserved by composition.

Now consider the phase bits in the case where $S$ contains nonzero elements, for instance,

$$S = \left\{ \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \right\}.$$

Notice that in this case, both matrices in $S$ have zeros in the bottom row, and the invertible matrices are of the form $\left(\begin{smallmatrix} 1 & a \\ 0 & 1 \end{smallmatrix}\right)$. Hence, every even-indexed row of the tableau (as a binary matrix) is all zeros except for one entry. Using the method of tableau composition in Section 5.4, one can easily show that for these even-indexed rows, the phase bits are exactly what one would get by composing the invertible elements as gates in $G$. For the other half of the rows, the non-invertible elements may flip the phase bits. But we assume $G$ contains the Pauli element (in this case Z) which flips that sign, so the invertible elements and associated phase bits are still in $G$, and therefore the invariant is preserved. The X- and Y-degenerate cases are similar.

Last, we show that $\mathcal{P}(G, S)$ is preserved under ancilla operations. Recall that when we use ancillas, we remove the rows and columns corresponding to those bits. Clearly the elements of the submatrix are still in $M$ and $S$. There is a risk that the invertible element for some row could be in one of the removed columns, but if the submatrix is missing an invertible element in some row then the submatrix is not unitary and the ancilla rule must have been misapplied. Hence, only elements in $S$ are removed in the non-ancilla part of the tableau, and each row still contains exactly one entry in $M$. We appeal to Theorem 87 for the phase bits. The theorem says that removing the ancillas can only change the sign for a row if there is a nonzero entry among the bits of the row that are removed. For example, if $S = \left\{ \left(\begin{smallmatrix} 0 & 0 \\ 0 & 0 \end{smallmatrix}\right), \left(\begin{smallmatrix} 0 & 1 \\ 0 & 0 \end{smallmatrix}\right) \right\}$ then only the top phase bit can change. But changing the top phase bit is the same as applying a Z, and for this case Z is assumed to be in $G$, so the combination of the element in $M$ and the phase bits is still in $G$. Therefore the Z-degenerate $\mathcal{P}(G, S)$ are invariants, and the X-degenerate and Y-degenerate invariants follow by symmetry.

5.7 Equivalence of Generator and Invariant Definitions

We have now defined each class by a set of generators, and by an invariant, but have not yet shown that these definitions coincide. Below is a collection of lemmas which prove this for all classes in our lattice. Note that one direction is always trivial: it is easy to check that the generators defining a class satisfy a particular invariant, and therefore everything they generate (i.e., the class) must satisfy the invariant. We encourage the reader to check these invariants against, say, the tableaux in Table 5.2. For the other inclusion (i.e., every gate satisfying the invariant can be generated by the given generators), we start with an arbitrary gate $g$ satisfying the invariant, and apply gates in the class to $g$ to simplify its tableau step-by-step until it is the tableau of the identity operation. It follows that $AgB = I$ for circuits $A$ and $B$ in the class, which proves $g = A^{-1}B^{-1}$ is in the class. In many cases, the circuit derived this way is a canonical form for the gate, and can be used to count the number of gates on $n$ qubits in a class. Let us start with the degenerate classes.

Lemma 90. Let $G$ be a group of single-qubit gates, and let $g$ be a gate satisfying the permutation invariant $T\bigl(G, \bigl\{\bigl(\begin{smallmatrix} 0 & 0 \\ 0 & 0 \end{smallmatrix}\bigr)\bigr\}\bigr)$. Then there is a circuit for $g$ consisting of a permutation of the inputs followed by a layer of single-qubit gates in $G$.

Proof. Consider the tableau for $g$. Each row or column has exactly one invertible element, so we can read off a permutation $\pi$ from the positions of those elements. Apply SWAP gates to $g$ to remove this permutation, and put the invertible elements on the diagonal. When we pair a diagonal element with the phase bits for that row, we get a single-qubit gate $g_i$ in $G$. Applying the inverse of this gate to qubit $i$ will zero the phase bits for that row, and make $\bigl(\begin{smallmatrix} 1 & 0 \\ 0 & 1 \end{smallmatrix}\bigr)$ the diagonal entry. Once we do this for each row, we have the identity tableau, therefore $g$ is in $\langle G \rangle$ and has a circuit of the desired form. $\square$

Next, we consider the Z-degenerate classes and, by symmetry, the X-degenerate and Y-degenerate classes.

Lemma 91. Let $\langle Z \rangle \subseteq G \subseteq \langle \mathcal{P}, R_Z \rangle$ be a subgroup of Z-preserving single-qubit gates. Let $g$ be any gate satisfying the permutation invariant $T\bigl(G, \bigl\{\bigl(\begin{smallmatrix} 0 & 0 \\ 0 & 0 \end{smallmatrix}\bigr), \bigl(\begin{smallmatrix} 0 & 1 \\ 0 & 0 \end{smallmatrix}\bigr)\bigr\}\bigr)$. Then there is a circuit for $g$ consisting of a layer of single-qubit gates (from $G$), a layer of $C(Z, Z)$ gates, and a permutation.

Proof. Consider the tableau of $g$. We can read off a permutation $\pi$, and a single-qubit gate for each input. Assume we have removed those gates (i.e., we now consider the tableau of $g \pi^{-1} g_1^{-1} \cdots g_n^{-1}$), so the tableau has $\bigl(\begin{smallmatrix} 1 & 0 \\ 0 & 1 \end{smallmatrix}\bigr)$ on the diagonal, all other entries are either $\bigl(\begin{smallmatrix} 0 & 1 \\ 0 & 0 \end{smallmatrix}\bigr)$ or zero, and the phase bits are zero. The non-zero, off-diagonal entries in the matrix indicate the positions of $C(Z, Z)$ gates. Specifically, if the entry in row $i$ and column $j$ is nonzero then there is a $C(Z, Z)$ on qubits $i$ and $j$. Note that because the matrix part of the tableau is unitary, the symmetric entry in row $j$ and column $i$ must also be non-zero. The remainder of the circuit consists of the set of $C(Z, Z)$ gates indicated by the non-zero, off-diagonal entries. Notice that $C(Z, Z)$ gates always commute, so their ordering does not matter. $\square$

Now let us consider four Z-preserving classes which, when we consider symmetry (i.e., the X-preserving and Y-preserving equivalents), cover all but two of the remaining classes.

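Lemma 92. Each class $\langle T_4, \mathcal{P} \rangle$, $\langle T_4, \mathcal{P}, R_Z \rangle$, $\langle C(Z, X), \mathcal{P} \rangle$, and $\langle C(Z, X), \mathcal{P}, R_Z \rangle$ is the set of all gates corresponding to a subring invariant, where the subrings are

$$S_1 = \left\{ \bigl(\begin{smallmatrix} 0 & 0 \\ 0 & 0 \end{smallmatrix}\bigr), \bigl(\begin{smallmatrix} 1 & 0 \\ 0 & 1 \end{smallmatrix}\bigr) \right\}, \qquad S_2 = \left\{ \bigl(\begin{smallmatrix} 0 & 0 \\ 0 & 0 \end{smallmatrix}\bigr), \bigl(\begin{smallmatrix} 0 & 1 \\ 0 & 0 \end{smallmatrix}\bigr), \bigl(\begin{smallmatrix} 1 & 0 \\ 0 & 1 \end{smallmatrix}\bigr), \bigl(\begin{smallmatrix} 1 & 1 \\ 0 & 1 \end{smallmatrix}\bigr) \right\},$$

$$S_3 = \left\{ \bigl(\begin{smallmatrix} 0 & 0 \\ 0 & 0 \end{smallmatrix}\bigr), \bigl(\begin{smallmatrix} 1 & 0 \\ 0 & 0 \end{smallmatrix}\bigr), \bigl(\begin{smallmatrix} 0 & 0 \\ 0 & 1 \end{smallmatrix}\bigr), \bigl(\begin{smallmatrix} 1 & 0 \\ 0 & 1 \end{smallmatrix}\bigr) \right\},$$

$$S_4 = \left\{ \bigl(\begin{smallmatrix} 0 & 0 \\ 0 & 0 \end{smallmatrix}\bigr), \bigl(\begin{smallmatrix} 0 & 1 \\ 0 & 0 \end{smallmatrix}\bigr), \bigl(\begin{smallmatrix} 0 & 0 \\ 0 & 1 \end{smallmatrix}\bigr), \bigl(\begin{smallmatrix} 0 & 1 \\ 0 & 1 \end{smallmatrix}\bigr), \bigl(\begin{smallmatrix} 1 & 0 \\ 0 & 0 \end{smallmatrix}\bigr), \bigl(\begin{smallmatrix} 1 & 1 \\ 0 & 0 \end{smallmatrix}\bigr), \bigl(\begin{smallmatrix} 1 & 0 \\ 0 & 1 \end{smallmatrix}\bigr), \bigl(\begin{smallmatrix} 1 & 1 \\ 0 & 1 \end{smallmatrix}\bigr) \right\}.$$

Proof. In all four classes, elements of the tableau are of the form $\bigl(\begin{smallmatrix} a & b \\ 0 & d \end{smallmatrix}\bigr)$. Suppose $\bigl(\begin{smallmatrix} ? & ? \\ 0 & d_1 \end{smallmatrix}\bigr)$ is the $i$th entry of some row, and $\bigl(\begin{smallmatrix} ? & ? \\ 0 & d_2 \end{smallmatrix}\bigr)$ is the $j$th entry in the same row, where entries labeled by "?" are unconstrained. If we apply a CNOT gate from qubit $i$ to $j$, these entries will be of the form $\bigl(\begin{smallmatrix} ? & ? \\ 0 & d_1 \end{smallmatrix}\bigr)$ and $\bigl(\begin{smallmatrix} ? & ? \\ 0 & d_1 \oplus d_2 \end{smallmatrix}\bigr)$ respectively. That is, the bottom right bits change as though we applied the CNOT gate to those bits. Since a $T_4$ gate can be built from CNOT gates, it will (similarly) affect the bottom right bits as though we are applying a $T_4$.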

Our strategy is to use either CNOT or $T_4$ gates (depending on the class) to perform Gaussian elimination on the bottom right entries of the matrix elements. If we have access to CNOT gates then we literally apply Gaussian elimination, using CNOT to add one column to another, and using SWAP to exchange columns.
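The CNOT case of this strategy can be sketched as ordinary column-based Gaussian elimination over $\mathbb{F}_2$. The code below is only an illustration under simplifying assumptions: it works on a plain invertible binary matrix standing in for the bottom right bits, and the operation labels `"CNOT"` and `"SWAP"` (with their index order) are hypothetical bookkeeping, not the tableau conventions of the text.

```python
def reduce_with_cnot(matrix):
    """Reduce an invertible F_2 matrix to the identity using column operations.

    Returns the reduced matrix and the list of operations applied.
    ("CNOT", r, c) means column r was added to column c;
    ("SWAP", r, p) means columns r and p were exchanged.
    """
    m = [row[:] for row in matrix]
    n = len(m)
    ops = []
    for r in range(n):
        # Find a column at or after position r with a 1 in row r.
        pivot = next((c for c in range(r, n) if m[r][c]), None)
        if pivot is None:
            raise ValueError("matrix is singular over F_2")
        if pivot != r:
            for row in m:
                row[r], row[pivot] = row[pivot], row[r]
            ops.append(("SWAP", r, pivot))
        # Clear every other 1 in row r by adding column r to that column.
        for c in range(n):
            if c != r and m[r][c]:
                for row in m:
                    row[c] ^= row[r]
                ops.append(("CNOT", r, c))
    return m, ops

example = [[1, 1, 0], [0, 1, 1], [1, 1, 1]]
reduced, operations = reduce_with_cnot(example)
```

Each processed row becomes a standard basis vector, so later column operations leave it untouched; this is the same reason the elimination in the proof can proceed row by row.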

If we only have $T_4$ gates then we are in subring $S_1 \subseteq S_2$ or $S_2$, so $\bigl(\begin{smallmatrix} 1 & 0 \\ 0 & 1 \end{smallmatrix}\bigr)$ and $\bigl(\begin{smallmatrix} 1 & 1 \\ 0 & 1 \end{smallmatrix}\bigr)$ are the only elements with a 1 in the bottom right position, and also the only invertible elements. It follows that the number of bottom right bits set to 1 in a row is the same as the number of invertible elements, which must be odd because the matrix is unitary. To reduce the number, we apply a $T_4$ to three 1 bits and a 0 bit (note: we may add a zero bit by adding an ancilla, if necessary), which changes the 0 to a 1 and the 1's to 0's, reducing the number of 1's (or invertible elements) in the row by two. When there is a single 1 left in the row, unitarity conditions imply that it is also the only 1 left in that column, so we may ignore that row and column for the moment and continue to eliminate the rest of the matrix.

Now suppose we have row reduced the matrix, using either CNOT or $T_4$, so that the bottom right entry of every element is 0, except along the diagonal where that bit is 1. At this point, the diagonal element is the only element in a row that can possibly be invertible, therefore the diagonal elements are of the form $\bigl(\begin{smallmatrix} 1 & b \\ 0 & 1 \end{smallmatrix}\bigr)$. Similarly, unitarity conditions imply that the off-diagonal elements are of the form $\bigl(\begin{smallmatrix} 0 & b \\ 0 & 0 \end{smallmatrix}\bigr)$. In other words, the remaining tableau is Z-degenerate, since there is only one invertible element per row or column, and the off-diagonal elements are in $\left\{ \bigl(\begin{smallmatrix} 0 & 0 \\ 0 & 0 \end{smallmatrix}\bigr), \bigl(\begin{smallmatrix} 0 & 1 \\ 0 & 0 \end{smallmatrix}\bigr) \right\}$. We can use either Lemma 90 or Lemma 91 to find a circuit from the remainder, which is in either $\langle \mathcal{P} \rangle$ or $\langle C(Z, Z), \mathcal{P}, R_Z \rangle$, depending on the class. $\square$

There are only two classes remaining, ALL and $\langle T_4, \mathcal{P}, F \rangle$, which we handle specially. For the first, we appeal to Aaronson and Gottesman [5] who give an explicit decomposition for any Clifford gate into layers of CNOT, Hadamard ($\theta_{X+Z}$ in our notation), and phase ($R_Z$) gates.

Lemma 93. Any egalitarian gate $g$ can be constructed from $T_4$, $\mathcal{P}$, and $F$ gates.

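Proof. Egalitarian gates satisfy the invariant that all elements are in the subring

$$\left\{ \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}, \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} \right\}.$$

In fact, this subring is isomorphic to $\mathbb{F}_4$, so it is a field. The unitarity of the matrix in our sense translates to unitarity as a matrix over $\mathbb{F}_4$.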
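The field claim can be confirmed by brute force. The following is a minimal sketch that assumes the four matrices are the zero matrix, the identity, and the two order-three invertible $2 \times 2$ matrices over $\mathbb{F}_2$; the names `W` and `W2` are illustrative.

```python
from itertools import product

def mul(a, b):
    """Multiply two 2x2 matrices over F_2 (tuples of row tuples)."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) % 2 for j in range(2))
        for i in range(2)
    )

def add(a, b):
    """Add two 2x2 matrices over F_2."""
    return tuple(tuple((a[i][j] + b[i][j]) % 2 for j in range(2)) for i in range(2))

ZERO = ((0, 0), (0, 0))
ONE = ((1, 0), (0, 1))
W = ((0, 1), (1, 1))   # plays the role of a primitive element of F_4
W2 = ((1, 1), (1, 0))  # its square
F4 = {ZERO, ONE, W, W2}

closed_under_add = all(add(a, b) in F4 for a, b in product(F4, F4))
closed_under_mul = all(mul(a, b) in F4 for a, b in product(F4, F4))
commutative = all(mul(a, b) == mul(b, a) for a, b in product(F4, F4))
has_inverses = all(any(mul(a, b) == ONE for b in F4) for a in F4 if a != ZERO)
```

All four flags evaluate to true, and `W` satisfies $W^2 = W + 1$, the defining relation of $\mathbb{F}_4$ over $\mathbb{F}_2$.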

Like the other $T_4$ classes, we use Gaussian elimination on the tableau of $g$. Consider a row of the tableau. If the entry in some column is not the identity, then apply $F$ or $F^{-1}$ to the corresponding qubit to make it the identity. By unitarity, there are an odd number of identity elements in the row. We may remove pairs of identity elements with a $T_4$, similar to Lemma 92, until there is only one left and the rest of the row is zero. Unitarity implies the column below the identity element is also zero, and we proceed to eliminate the rest of the tableau. Once the matrix part of the tableau is the identity, we apply Pauli matrices to zero out the phase bits.

We conclude that all egalitarian gates are in $\langle T_4, \mathcal{P}, F \rangle$. $\square$

5.8 Circuit Identities

In this section, we give necessary tools to prove that a set of gates generates, in some sense, "all that one could hope for." Formally, we wish to prove that the gate set generates a particular class in the classification lattice when it is contained in that class but fails to satisfy the invariants of all classes below it. To this end, we give several useful circuit identities that will be used extensively in Section 5.9. The following lemma gives only the aspect of that theorem that is necessary to the classification, that is, the ability to extract single-qubit Clifford operations from the composition of generalized CNOT gates.

Lemma 94. Let $P, Q, R \in \mathcal{P}$, and let $FPF^\dagger = Q$ and $FQF^\dagger = R$. Then

• $C(P, Q)$ and $C(P, R)$ generate $R_P$.

• $C(P, P)$ and $C(P, R)$ generate $R_P$.

• $C(P, P)$ and $C(Q, R)$ generate $F$.

• $C(P, P)$ and $C(Q, Q)$ generate $\theta_{P+Q}$.

Proof. The first two inclusions come from the following identity, which holds whenever $FQF^\dagger = R$ (i.e., regardless of $P$):

[Circuit diagram: reduction to $R_P$ via the ancilla rule.]

Similarly, for the third identity, we get

[Circuit diagram: reduction to $F$ via the swap rule and the ancilla rule.]

and for the final identity

[Circuit diagram: reduction to $\theta_{P+Q}$ via the swap rule and the ancilla rule.]

It might seem strange to reduce non-degenerate gates into less powerful single-qubit gates, but we will eventually see that single-qubit generators are the most crucial. Once we have shown that a particular set of gates generates all single-qubit operations, then that set of gates will generate the class of all Clifford operations provided it contains any non-degenerate gate. All non-degenerate gates generate at least one Pauli, often the entire Pauli group, which is why some single-qubit classes do not appear as the single-qubit subgroup of a non-degenerate class. For instance, consider the CNOT gate where the first qubit controls the second qubit. If we let the first input be $|1\rangle$, then a Pauli $X$ operation is always applied to the second qubit. Similarly, if we let the input to the second qubit be $|-\rangle$, then a Pauli $Z$ operation is always applied to the first qubit. Under the ancilla rule, we now have Pauli $X$ and $Z$ operations, so we can generate $Y$ and the entire Pauli group. Clearly, the same is true for any heterogeneous CNOT gate. However, surprisingly, the following lemma shows that even the $T_4$ gate suffices to generate the entire Pauli group.

Lemma 95. T4 generates the Pauli group.

Proof. Consider the following two circuits:

[Circuit diagrams: two circuits applying $T_4$ with ancilla inputs.]

Under the ancilla rule, the first generates a Pauli $Z$ operation while the second generates a Pauli $X$, from which we can clearly generate the Pauli group. $\square$

There is another way to view the identity of Lemma 95 which will be useful later.

Since $T_4$ is an affine gate over the computational basis states, $T_4 = C_1 C_2 \cdots C_n$ where each $C_i$ is a CNOT gate. Furthermore, $T_4$ and CNOT are their own inverses, so $T_4 = C_n C_{n-1} \cdots C_1$. Finally, because $T_4$ is symmetric when represented as a $4 \times 4$ matrix over $\mathbb{F}_2$, $T_4 = C_1^T C_2^T \cdots C_n^T$. Notice that $C_i^T$ just represents the CNOT gate $C_i$ where the control and target qubits are swapped. Therefore, leveraging the well-known equivalence $\theta_{X+Z}^{\otimes 2} \circ \mathrm{CNOT} \circ \theta_{X+Z}^{\otimes 2} = \mathrm{SWAP} \circ \mathrm{CNOT} \circ \mathrm{SWAP}$, we arrive at the following consequence:$^{10}$

$$\theta_{X+Z}^{\otimes 4} \circ T_4 \circ \theta_{X+Z}^{\otimes 4} = T_4.$$

Similarly, by straightforward calculation, we get

R0 T 4 1 R4 = T4 .

It is now easy to extend old circuit identities into new ones. For instance, conjugating the first circuit in the proof of Lemma 95 by $\theta_{X+Z}$ gates (which does not change the circuit because of the above observations) and pushing the $\theta_{X+Z}$ gates into the inputs, yields the second circuit. This technique is in fact very general and is used in the proof of the lemma below.
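The two structural facts about $T_4$ used in this argument (it is its own inverse, and its matrix over $\mathbb{F}_2$ is symmetric) are easy to check directly. The sketch below assumes the usual action of $T_4$ on computational basis states, namely that it flips all four bits exactly when the input has odd parity; the function names are illustrative.

```python
from itertools import product

def t4(bits):
    """Assumed action of T_4: flip every bit when the input has odd parity."""
    parity = sum(bits) % 2
    return tuple(b ^ parity for b in bits)

# The same map as a matrix over F_2: identity plus the all-ones matrix,
# i.e. zeros on the diagonal and ones everywhere else.
T4_MATRIX = [[int(i == j) ^ 1 for j in range(4)] for i in range(4)]

def apply_matrix(matrix, bits):
    """Matrix-vector product over F_2."""
    return tuple(sum(m * b for m, b in zip(row, bits)) % 2 for row in matrix)

def matmul(a, b):
    """Matrix product over F_2."""
    return [[sum(x * y for x, y in zip(row, col)) % 2 for col in zip(*b)] for row in a]

IDENTITY = [[int(i == j) for j in range(4)] for i in range(4)]

matches_gate = all(apply_matrix(T4_MATRIX, x) == t4(x) for x in product((0, 1), repeat=4))
is_symmetric = all(T4_MATRIX[i][j] == T4_MATRIX[j][i] for i in range(4) for j in range(4))
is_involution = matmul(T4_MATRIX, T4_MATRIX) == IDENTITY
```

Symmetry together with self-inverseness is what lets one reverse the CNOT decomposition and then exchange every control with its target without changing the gate.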

Lemma 96. $T_4$ and $C(P, P)$ generate $R_P$.

Proof. Figure 5-6 shows how to generate $R_Z$ with $T_4$ and $C(Z, Z)$. Using the argument by conjugation above, $T_4$ and $C(P, P)$ generate $R_P$. $\square$

$^{10}$ Recall that $\theta_{X+Z}$ is commonly known as the Hadamard gate.

Figure 5-6: Generating $R_Z$ with $T_4$ and $C(Z, Z)$.

The following lemmas make precise our working assumption that single-qubit gates can significantly bolster the power of non-degenerate gate sets.

Lemma 97. Suppose we have any $C(P, Q)$ gate with any single-qubit gate $G$ that does not preserve the P-basis and any single-qubit gate $H$ that does not preserve the Q-basis. Then $\langle C(P, Q), G, H \rangle = \mathrm{ALL}$.

Proof. We will prove that the class $\langle C(P, Q), G, H \rangle$ contains all single-qubit gates. Then, to prove that the class generates all Clifford operations, it is sufficient to show that it contains a CNOT gate. However, since all generalized CNOT gates are conjugates of each other, this is immediate.

First suppose $P = Q$. Since $G$ does not preserve the P-basis, we can use $G$ to create a $C(R, R)$ gate where $R \neq P$. By Lemma 94, we can generate a $\theta_{P+R}$ gate. Conjugating $C(P, P)$ by $\theta_{P+R}$ on the second qubit yields a $C(P, R)$ gate. Once again leveraging Lemma 94, $C(P, R)$ and $C(P, P)$ generate an $R_P$ gate. Referring to the single-qubit lattice (see Figure 5-1), we see that the class $\langle \mathcal{P}, \theta_{P+R}, R_P \rangle$ contains all single-qubit gates.

Now suppose that $P \neq Q$. Once again, since $G$ does not preserve the P-basis, we can use $G$ to create a $C(R, Q)$ gate. If $R = Q$, then by the logic above, we can use $H$ to generate all single-qubit gates, so suppose $R \neq Q$. By Lemma 94, we can use $C(P, Q)$ and $C(R, Q)$ to generate an $R_Q$ gate. Conjugating both $C(P, Q)$ and $C(R, Q)$ by $H$ appropriately gives a $C(P, S)$ and a $C(R, S)$ gate for some $S \neq Q$, from which we can once again generate an $R_S$ gate. Referring to the single-qubit lattice, we see that the class $\langle \mathcal{P}, R_S, R_Q \rangle$ contains all single-qubit gates. $\square$

Lemma 98. $T_4$ with the class of all single-qubit gates generates ALL.

Proof. It is well known that CNOT, $\theta_{X+Z}$, and $R_Z$ generate all Clifford circuits. Therefore, it will be sufficient to show that $T_4$ plus all single-qubit gates generate CNOT. Under the ancilla rule, it is clear by Figure 5-7 that $T_4$ and $R_Z$ suffice to generate $C(Z, Z)$. Conjugating one qubit of $C(Z, Z)$ by $\theta_{X+Z}$ yields a $C(Z, X) = \mathrm{CNOT}$ gate, completing the proof. $\square$

Figure 5-7: Generating $C(Z, Z)$ with $T_4$ and $R_Z$.

5.9 Universal Construction

Suppose $G$ is an $n$-qubit Clifford gate. It turns out there is a single circuit $\mathfrak{C}(G)$, the universal construction, which can help us extract useful generators (e.g., single-qubit gates, generalized CNOTs, etc.) from $G$. Specifically, the circuit $\mathfrak{C}(G)$ (shown in Figure 5-8) applies $G$ to qubits 2 through $n + 1$, swaps qubits 1 and 2, then applies $G^{-1}$ to qubits 2 through $n + 1$. Note that after we apply $G$, all of the qubits but one go directly into $G^{-1}$, which should intuitively cancel out "most" of the effect $G$ has on those qubits, isolating the effect of $G$ on the swapped qubit. The following theorem makes this intuition more precise.

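Theorem 99. Let $G$ be an $n$-qubit Clifford gate. Then

$$M(\mathfrak{C}(G)) = I_{n+1} + \begin{pmatrix} 1 \\ v \end{pmatrix} \begin{pmatrix} 1 & v^* \end{pmatrix},$$

where $v \in R^{n \times 1}$ is the first column of $M(G)$.

Figure 5-8: Universal Construction $\mathfrak{C}(G)$.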

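Proof. Let $M(G) = \begin{pmatrix} A & B \\ C & D \end{pmatrix}$ with $A \in R$, $B \in R^{1 \times (n-1)}$, $C \in R^{(n-1) \times 1}$, and $D \in R^{(n-1) \times (n-1)}$. Thus,

$$\begin{pmatrix} 1 & 0 \\ 0 & I_{n-1} \end{pmatrix} = M(G) M(G^{-1}) = \begin{pmatrix} A & B \\ C & D \end{pmatrix} \begin{pmatrix} A^* & C^* \\ B^* & D^* \end{pmatrix} = \begin{pmatrix} AA^* + BB^* & AC^* + BD^* \\ CA^* + DB^* & CC^* + DD^* \end{pmatrix},$$

which implies

$$M(\mathfrak{C}(G)) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & A & B \\ 0 & C & D \end{pmatrix} \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & I_{n-1} \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & A^* & C^* \\ 0 & B^* & D^* \end{pmatrix} = \begin{pmatrix} 0 & A^* & C^* \\ A & BB^* & BD^* \\ C & DB^* & DD^* \end{pmatrix}$$

$$= \begin{pmatrix} 0 & A^* & C^* \\ A & 1 + AA^* & AC^* \\ C & CA^* & I_{n-1} + CC^* \end{pmatrix} = I_{n+1} + \begin{pmatrix} 1 \\ A \\ C \end{pmatrix} \begin{pmatrix} 1 & A^* & C^* \end{pmatrix}. \qquad \square$$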
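The matrix identity behind Theorem 99 can be tested numerically on small examples. The sketch below makes two assumptions that are not spelled out in this section: tableaux are represented as binary symplectic matrices with one $2 \times 2$ block per qubit, and the involution $*$ is realized as $m \mapsto \Omega m^T \Omega$, where $\Omega$ is the block-diagonal symplectic form. All function names are illustrative.

```python
def matmul(a, b):
    """Matrix product over F_2 for lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) % 2 for col in zip(*b)] for row in a]

def identity(size):
    return [[int(i == j) for j in range(size)] for i in range(size)]

def omega(qubits):
    """Block-diagonal form with one [[0, 1], [1, 0]] block per qubit."""
    m = [[0] * (2 * qubits) for _ in range(2 * qubits)]
    for q in range(qubits):
        m[2 * q][2 * q + 1] = m[2 * q + 1][2 * q] = 1
    return m

def star(m):
    """Assumed involution: Omega * m^T * Omega, with Omega sized to fit m."""
    transpose = [list(col) for col in zip(*m)]
    return matmul(matmul(omega(len(m[0]) // 2), transpose), omega(len(m) // 2))

def pad(m):
    """Direct sum of a one-qubit identity block with m."""
    out = identity(len(m) + 2)
    for i, row in enumerate(m):
        out[i + 2][2:] = row
    return out

def swap_first_two(qubits):
    """Permutation matrix exchanging the blocks of qubits 1 and 2."""
    m = identity(2 * qubits)
    m[0], m[2] = m[2], m[0]
    m[1], m[3] = m[3], m[1]
    return m

def universal_construction(m):
    """Tableau product for: G on qubits 2..n+1, SWAP(1, 2), then G inverse."""
    n = len(m) // 2
    return matmul(matmul(pad(m), swap_first_two(n + 1)), pad(star(m)))

def rank_one_form(m):
    """I + (1; v)(1  v*), where v is the first block column of m."""
    w = [[1, 0], [0, 1]] + [row[:2] for row in m]
    update = matmul(w, star(w))
    size = len(w)
    return [[(int(i == j) + update[i][j]) % 2 for j in range(size)] for i in range(size)]

# Symplectic matrices in (x1, z1, x2, z2) order; rows are images of X1, Z1, X2, Z2.
CNOT = [[1, 0, 1, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 1, 0, 1]]
HADAMARD_ON_FIRST = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
MIXED = matmul(CNOT, HADAMARD_ON_FIRST)
```

For both `CNOT` and `MIXED`, `universal_construction` and `rank_one_form` return the same matrix, which is the content of the theorem at the level of the matrix part of the tableau (phase bits are ignored here).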

Tableaux of the form $I_n + vv^*$ have relatively simple circuit decompositions in terms of single-qubit generators and generalized CNOT gates, which is formalized in the following theorem.

Theorem 100. Let $v = (1, a_2, a_3, \ldots, a_{2k}, b_1, \ldots, b_\ell) \in R^{n \times 1}$ where each $a_i$ is invertible and each $b_i$ is singular, and let $C$ be a Clifford circuit such that $M(C) = I_n + vv^*$. Then $C$ is equivalent to the circuit consisting of

• a $T_{2k}$ gate on the first $2k$ qubits,

• conjugated by $\mathcal{G}(a_i, i)$ for all $i \in \{2, \ldots, 2k\}$,

• conjugated by $\mathrm{CNOT}(b_i, 1, 2k + i)$ for all $i \in \{1, \ldots, \ell\}$, and

• a final layer of Pauli gates (not conjugated) on every qubit.

That is, $C$ is equivalent to the circuit in Figure 5-9.

Proof. Notice first that because each $a_i$ is invertible, we can conjugate $C$ by $\mathcal{G}(a_i, i)$ for each $i \in \{2, \ldots, 2k\}$, yielding a circuit with the simpler tableau

$$\mathrm{Diag}(1, a_2, \ldots, a_{2k}, 1, \ldots, 1)\,(I_n + vv^*)\,\mathrm{Diag}(1, a_2^*, \ldots, a_{2k}^*, 1, \ldots, 1) = I_n + v'v'^*,$$

where $v' = (1, \ldots, 1, b_1, \ldots, b_\ell)$. Furthermore, conjugating the circuit by the gate $\mathrm{CNOT}(b_i, 1, i + 2k)$ corresponds to the simplification

$$\mathrm{CNOT}(b_i, 1, i + 2k)\,(I_n + v'v'^*)\,\mathrm{CNOT}(b_i, 1, i + 2k) = I_n + v''v''^*,$$

where $v''$ is equal to $v'$ with the exception that entry $i + 2k$ is equal to zero. Repeating this procedure for each $i \in \{1, \ldots, \ell\}$, we arrive at a circuit with a very simple tableau:

$$I_n + (1, \ldots, 1, 0, \ldots, 0)(1, \ldots, 1, 0, \ldots, 0)^T,$$

which is exactly the tableau of a $T_{2k}$ gate applied to the first $2k$ qubits. Notice, finally, that by reversing the procedure and applying the appropriate Pauli gates to each qubit, we can ensure that the tableau of the decomposition is that of $\mathfrak{C}(G)$. $\square$

Theorem 100 leads to a clean circuit decomposition of $\mathfrak{C}(G)$. All that is left to show is that we can actually generate each of the elementary gates that appears in the decomposition under the ancilla rule. First, we will need the following useful lemma, which will allow us to essentially disregard the Pauli operators in the decomposition of the universal construction when applying the ancilla rule.

Figure 5-9: Decomposition of $\mathfrak{C}(G)$.

Lemma 101. Let $G$ be a gate on $n$ qubits which is stabilized by some state $|a\rangle$ on the first $k$ qubits and generates $H$ on the remaining qubits. Furthermore, let $P$ be any gate on the first $k$ qubits. Then $P \circ G$ generates $H \otimes H^{-1}$.

Proof. By supposition we have that $G(|a\rangle \otimes |\psi\rangle) = |a\rangle \otimes H|\psi\rangle$. Therefore $P \circ G(|a\rangle \otimes |\psi\rangle) = P|a\rangle \otimes H|\psi\rangle$. Now apply the inverse of $P \circ G$ to $P|a\rangle \otimes H|\psi\rangle$ with the same first $k$ qubits and $n - k$ new qubits. In the middle, $P$ cancels with $P^{-1}$, and we can remove those $k$ qubits by using $|a\rangle$ as an ancilla. On the remaining qubits, we have $H \otimes H^{-1}$. $\square$

We are finally ready to prove the main theorem of this section.

Theorem 102. Let $G$ be a Clifford gate on $n$ qubits. Furthermore, let

$$v = (1, a_2, a_3, \ldots, a_{2k}, b_1, \ldots, b_\ell) \in R^{n \times 1}$$

be some row of $M(G)$, where each $a_i$ is invertible and each $b_i$ is singular. Then $G$ generates a gate $g$ such that $M(g) = a_i$ for each $i \in \{2, \ldots, 2k\}$, $\mathrm{CNOT}(b_j)$ for all $j \in \{1, \ldots, \ell\}$, and a $T_{2k}$ gate.

Proof. From Theorems 99 and 100, we know that the universal construction $\mathfrak{C}(G)$ can be decomposed as shown in Figure 5-9. The proof will proceed in the following manner. Starting with the decomposition of $\mathfrak{C}(G)$, we show that it generates some elementary gate. We then use that gate to simplify the original decomposition of $\mathfrak{C}(G)$, eventually generating all such gates in this manner.

First notice that for some input $i \in \{2k + 1, \ldots, n\}$, the single-qubit Clifford state $|b_i\rangle$ serves to remove the effect of the generalized CNOT. By Lemma 101, we can remove the last $\ell$ qubits with ancillas, at the expense of creating another (inverted) copy of the remaining $2k$ bits. That is, we have $H \otimes H^{-1}$ where $H$ is as follows:

[Circuit diagram: $T_{2k}$ on the first $2k$ qubits, conjugated by $\mathcal{G}(a_i)$ on qubit $i$ for $i \in \{2, \ldots, 2k\}$, followed by Pauli gates $P_1, \ldots, P_{2k}$.]

Now let $|\phi\rangle = \frac{|00\rangle + |11\rangle}{\sqrt{2}}$ be the Bell state on two qubits. Notice that we can use $|\phi\rangle$ as an ancilla to remove two bits from $T_{2k}$ (i.e., leaving a $T_{2k-2}$). However, the $T_{2k}$ occurs in $H$ conjugated by single-qubit gates, followed by Pauli operations. If we feed the state $\mathcal{G}(a_i, i)^\dagger \otimes \mathcal{G}(a_j, j)^\dagger |\phi\rangle$ to bits $i$ and $j$, the single-qubit gates transform it to $|\phi\rangle$, it removes two bits from the $T_{2k}$, then it is transformed to $P_i \mathcal{G}(a_i, i)^\dagger \otimes P_j \mathcal{G}(a_j, j)^\dagger |\phi\rangle$. We can do the exact same thing to $H^{-1}$, starting with $P_i \mathcal{G}(a_i, i)^\dagger \otimes P_j \mathcal{G}(a_j, j)^\dagger |\phi\rangle$ and going to $\mathcal{G}(a_i, i)^\dagger \otimes \mathcal{G}(a_j, j)^\dagger |\phi\rangle$. Then we swap the two states, and use the ancilla rule to remove them. The net result is that we can remove any two qubits of $H$, as long as we remove the same pair of qubits from $H^{-1}$.

Iterate this procedure until $H$ has been reduced to just the first two qubits, and so has $H^{-1}$. In particular, the $T_{2k}$ gate in the middle is now a $T_2$, which we observe is actually a SWAP gate. From $H$, the remaining circuit is

[Circuit diagram: the two-qubit remainder of $H$.]

and the inverse survives from $H^{-1}$. Remove the swaps, and observe that we have ancillas to remove any of the remaining single-qubit gates, so we can isolate each single-qubit gate, in particular $P_i \circ \mathcal{G}^{-1}(a_i)$, $P_1 \circ \mathcal{G}^{-1}(a_i)$ and their inverses. From this point on, everything we do for $H$ can be repeated for $H^{-1}$, which culminates in removing all the single-qubit gates, at which point we can remove the extra $T_2^\dagger$ gate easily. Hence, let us ignore $H^{-1}$.

Now let us repeat the procedure above starting from $\mathfrak{C}(G)$, but stop short of applying the ancilla rule to qubit $2k + j$. The result is the first circuit depicted below, which is then simplified by an application of the swap rule, and gates $P_i \circ \mathcal{G}^{-1}(a_i)$ and $P_1 \circ \mathcal{G}^{-1}(a_i)$:

[Circuit diagrams: the circuit on qubits $1$, $i$, and $2k + j$ before and after simplification.]

Notice that the topmost qubit is stabilized by $\mathcal{G}^{-1}(a_i)|b_j\rangle$, from which we can see that the ancilla rule immediately generates $\mathrm{CNOT}(b_j)$. Finally, we exploit the identity

$$T_{2k} \left(P \otimes I^{\otimes 2k-1}\right) T_{2k} = I \otimes P^{\otimes 2k-1},$$

which holds up to a global phase. We have the following chain of consequences.

[Circuit diagrams: chain of implications isolating the $T_{2k}$ gate.]

The last implication comes from the fact that we already generated $P_i \circ \mathcal{G}^{-1}(a_i)$ and $P_1 \circ \mathcal{G}^{-1}(a_i)$. $\square$
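The identity $T_{2k}(P \otimes I^{\otimes 2k-1})T_{2k} = I \otimes P^{\otimes 2k-1}$ used in this proof can be sanity-checked at the level of Pauli supports. The sketch assumes, following the tableau in the proof of Theorem 100, that the matrix part of $T_{2k}$ acts on supports as the identity plus the all-ones matrix over $\mathbb{F}_2$; phases are ignored and the function names are illustrative.

```python
def t2k_matrix(k):
    """Assumed F_2 matrix of T_{2k}: identity plus the all-ones matrix."""
    size = 2 * k
    return [[int(i == j) ^ 1 for j in range(size)] for i in range(size)]

def propagate(support, matrix):
    """Row vector times matrix over F_2."""
    size = len(matrix)
    return [sum(s * row[j] for s, row in zip(support, matrix)) % 2 for j in range(size)]

def support_after_t2k(k):
    """Support of a Pauli that starts on qubit 1 only, after conjugation by T_{2k}."""
    first_qubit_only = [1] + [0] * (2 * k - 1)
    return propagate(first_qubit_only, t2k_matrix(k))

supports = {k: support_after_t2k(k) for k in range(1, 6)}
```

In every case the Pauli leaves the first qubit and appears on each of the remaining $2k - 1$ qubits, matching the right-hand side of the identity.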

5.10 Completing the Classification

The final step in the classification is to demonstrate that the classes we have defined are in fact the only classes that exist.

Theorem 103. Let $S$ be some class in the classification, and let $G$ be a collection of gates. Suppose $\langle G \rangle \subseteq S$, but $\langle G \rangle \not\subseteq S'$ for all $S'$ below $S$ in our classification. That is, for all such $S'$ there exists a gate $g \in G$ such that $g \notin S'$. Then $\langle G \rangle = S$.

Proof. Let $C$ be the class generated by $G$. There is a very general strategy for proving that the given class $C$ is indeed one already stated in the classification. For each invariant described in Section 5.6 that $C$ fails to preserve, the universal construction generates a (simple) gate which also fails to satisfy that invariant by Theorem 102. Composing these gates using identities from Section 5.8, one can show that they always generate some class in the classification. This would complete the classification. Nevertheless, we now give a complete sequence of tests to identify the class $C$.

First consider the degeneracy invariant. If $C$ is degenerate, then by Lemma 90 we can decompose each gate into a circuit of single-qubit gates and swap gates. Each single-qubit gate can be extracted with an appropriate tensor product of single-qubit ancillas on the rest. The question therefore reduces to the simple group-theoretic question about the subgroups of single-qubit gates, which can be solved straightforwardly.

Therefore, assume $C$ is non-degenerate. Let us separate the remainder of the proof based on the X-, Y-, Z-preserving invariant. Let $P$, $Q$, and $R$ be distinct Pauli operations.

Suppose first that $C$ is X-, Y-, and Z-preserving. Because every generalized CNOT gate violates one such invariant and $C$ is non-degenerate, some gate in $C$ must have a tableau with multiple invertible elements in some row. Therefore, from the universal construction we extract a $T_4$ gate. By Lemma 95, we have then that $C = \langle T_4, \mathcal{P} \rangle$.

Suppose now that $C$ is P- and Q-preserving but not R-preserving. Because $C$ is non-degenerate, it must generate either a heterogeneous CNOT gate, a homogeneous CNOT gate, or a $T_4$ gate. We wish to show that $C$ contains a $C(P, Q)$ gate, which would imply $C = \langle C(P, Q), \mathcal{P} \rangle$ as desired. First notice that no homogeneous CNOT gate can be P- and Q-preserving. Suppose then that from the universal construction, $C$ generates a $T_4$ gate but no $C(P, Q)$ gate. Since $T_4$ is R-preserving, there must be some single-qubit gate from the universal construction that is not R-preserving but is P- and Q-preserving. It is straightforward to check that no such single-qubit gate exists, which implies that $C$ must contain a $C(P, Q)$ gate.

Suppose now that $C$ is P-preserving but not Q- and R-preserving. This is the most involved case and will require several more subdivisions. It will first be useful to notice that all P-preserving single-qubit gates are also P-orthogonal. Therefore, if $C$ violates the P-orthogonality invariant, then the universal construction must produce some non-degenerate gate which violates P-orthogonality. Therefore, $C$ contains a $C(P, Q)$ gate or a $C(P, R)$ gate. If it contains both via the universal construction, then indeed $C = \langle C(P, Q), R_P, \mathcal{P} \rangle$ by Lemma 94. Otherwise, the universal construction produces some single-qubit gate which is P-preserving but not Q- and R-preserving. Since all heterogeneous CNOT gates generate the Pauli group as well, the class of single-qubit gates must therefore contain an $R_P$ gate.

Therefore let us now assume that $C$ is P-orthogonal but not P-degenerate. Since $C$ is P-orthogonal, it cannot contain a $C(P, Q)$ or $C(P, R)$. However, because $C$ is P-preserving, it must contain a $T_4$ gate; otherwise it would be P-degenerate. Once again, since the $T_4$ gate generates the Pauli group and $C$ is neither Q- nor R-preserving, the class of single-qubit gates must contain an $R_P$ gate, implying that $C = \langle T_4, R_P, \mathcal{P} \rangle$.

Let us then assume that $C$ is P-degenerate. Since $C$ is non-degenerate, it must contain a $C(P, P)$ gate. There are five P-degenerate classes, which are determined by their single-qubit subgroup. Indeed, the five P-degenerate classes correspond to the five P-preserving single-qubit classes containing $P$. Unlike previous cases, such a diversity of classes exists because $C(P, P)$ does not suffice to generate the Pauli group on its own. Once again, the universal construction allows us to extract the $C(P, P)$ gate along with single-qubit gates which suffice to generate every gate in $C$. It is straightforward to see why the entire single-qubit group must arise from the universal construction. If not, then each gate in $C$ could be constructed by gates in a smaller class, a contradiction. This completes the classification of all gates that are P-preserving but not Q- and R-preserving.

Assume then that $C$ is neither P-, Q-, nor R-preserving. If it is egalitarian, then it must contain a $T_4$ gate because it is non-degenerate. Therefore it must contain a single-qubit gate that is egalitarian, but not P-, Q-, or R-preserving. The only such single-qubit class which contains the Pauli group is $\langle F, \mathcal{P} \rangle$. Therefore $C = \langle T_4, F, \mathcal{P} \rangle$.

Finally, let us then assume that $C$ also violates the egalitarian invariant. That is, $C$ violates every invariant so should be equal to the class of all Clifford operations. Suppose the only non-degenerate gate generated by the universal construction is the $T_4$ gate. In particular, this implies that the class of single-qubit gates generated by $C$ must not be X-, Y-, Z-preserving, nor egalitarian. Therefore the single-qubit class of $C$ must contain all single-qubit gates. Therefore by Lemma 98 we generate all Clifford circuits. Similarly, Lemma 94 implies that if the set of generalized CNOT gates generated by the universal construction fails to be P-, Q-, or R-preserving, then $C$ generates a single-qubit gate which fails to be P-, Q-, or R-preserving, respectively. Combining this fact with Lemma 97 implies that $C = \mathrm{ALL}$. $\square$

Corollary 104. Given any set of gates $G$, there is a subset $S \subseteq G$ of at most three gates such that $\langle S \rangle = \langle G \rangle$.

Proof. The result follows by a careful accounting of the gates used in the proof of Theorem 103. We give the argument only for the degenerate classes.

Let $G$ be a set of single-qubit gates with $S \subseteq G$ and such that $\langle S \rangle = \langle G \rangle$. Suppose $S = \{g_1, \ldots, g_k\}$ with $k > 3$. We can assume that each generator in the set $S$ is not contained in the subgroup generated by the other elements in $S$, otherwise we could remove that generator, reducing the size of $S$. Therefore, there is an ascending chain of subgroups $G_0 \subset G_1 \subset \cdots \subset G_k$, where $G_i = \langle g_1, \ldots, g_i \rangle$. Observing Figure 5-1, the longest ascending chain has length four, and in particular, all chains of length four end at the class of all single-qubit gates. Furthermore, that chain must contain one of the subgroups $\langle \mathcal{P}, R_X \rangle$, $\langle \mathcal{P}, R_Y \rangle$, or $\langle \mathcal{P}, R_Z \rangle$, which we can assume is $\langle \mathcal{P}, R_Z \rangle$ by symmetry. Since not all gates in $S$ are Pauli operators, we can assume that $g_1$ is not a Pauli operator. In particular, $g_1 \in \langle \mathcal{P}, R_Z \rangle \setminus \langle \mathcal{P} \rangle$. That is, $g_1$ is one of $R_Z$, $R_Z^\dagger$, $\theta_{X+Y}$, or $\theta_{X-Y}$. Therefore, the class $G_2$ is either $\langle R_Z \rangle$ or $\langle Z, \theta_{X+Y} \rangle$. Clearly, $g_4 \notin \langle \mathcal{P}, R_Z \rangle$ (i.e., it does not preserve the Z-basis), and by a simple case analysis, it is easy to see that $\langle g_1, g_2, g_4 \rangle$ is the entire class of single-qubit operations.

5.11 Enumeration

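Theorem 105. Let $\#\langle \cdot \rangle_n$ denote the number of $n$-qubit gates in a class. Then

$$\begin{aligned}
\#\langle G \rangle_n &= |G|^n \, n! && \text{for } G \text{ a group of single-qubit gates,} \\
\#\langle C(Z, Z), G \rangle_n &= |G|^n \, 2^{n(n-1)/2} \, n! && \text{for } \langle Z \rangle \subseteq G \subseteq \langle \mathcal{P}, R_Z \rangle \text{ a group,} \\
\#\langle C(Z, X), \mathcal{P} \rangle_n &= 4^n \, 2^{n(n-1)/2} \prod_{i=1}^{n} (2^i - 1), \\
\#\langle C(Z, X), \mathcal{P}, R_Z \rangle_n &= 8^n \, 2^{n(n-1)} \prod_{i=1}^{n} (2^i - 1), \\
\#\langle T_4, \mathcal{P} \rangle_n &= 4^n \, a(n), \\
\#\langle T_4, \mathcal{P}, R_Z \rangle_n &= 8^n \, 2^{n(n-1)/2} \, a(n), \\
\#\langle T_4, \mathcal{P}, F \rangle_n &= 4^n \, 2^{n(n-1)/2} \prod_{i=1}^{n} \left(2^i - (-1)^i\right), \\
\#\langle \mathrm{ALL} \rangle_n &= 4^n \, 2^{n^2} \prod_{i=1}^{n} (4^i - 1),
\end{aligned}$$

where

$$a(n) = \begin{cases} 2^{m^2} \prod_{i=1}^{m-1} (2^{2i} - 1), & \text{if } n = 2m, \\ 2^{m^2} \prod_{i=1}^{m} (2^{2i} - 1), & \text{if } n = 2m + 1. \end{cases}$$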

Proof. Most of these numbers follow from the lemmas above. For example, consider the class $\langle C(Z, X), \mathcal{P}, R_Z \rangle$. It follows from Lemma 92 that any gate in this class has a circuit consisting of a layer of $C(Z, X)$ gates, then a layer of $C(Z, Z)$ gates, then a layer of single-qubit gates in $G$. We would like to count the number of possible gates by multiplying the number of possibilities for each layer, but we must be careful that there is no gate with two circuit representations. Suppose for a contradiction that $g_1$ and $g_2$ generate the same gate, but some layer of $g_1$ differs from $g_2$. Then $g_1^{-1} g_2$ is the identity, since $g_1$ and $g_2$ generate the same transformation.

On the other hand, the $C(Z, X)$ layers of $g_1$ and $g_2$ meet in the middle of the circuit for $g_1^{-1} g_2$. If those layers do not generate the same linear transformation, then the combination is some non-trivial linear transformation which is, in particular, not Z-degenerate. The other layers of $g_1$ and $g_2$ are Z-degenerate, so we conclude that $g_1^{-1} g_2$ is not Z-degenerate (if it were, we could invert the outer layers to show that the two middle layers are Z-degenerate). But $g_1^{-1} g_2 = I$ is clearly Z-degenerate, therefore the $C(Z, X)$ layers of $g_1$ and $g_2$ must generate the same linear transformation.

The $C(Z, X)$ layers of $g_1$ and $g_2$ cancel (since we have shown they are equivalent), so they effectively disappear, and we make a similar argument about the $C(Z, Z)$ layers, and then the single-qubit layers. That is, if the $C(Z, Z)$ layers do not contain the same set of $C(Z, Z)$ gates, then we obtain a contradiction because they produce a non-degenerate layer in the middle, implying that $g_1^{-1} g_2 = I$ is non-degenerate. Once we remove the $C(Z, Z)$ layers, the single-qubit layers must be the same or they would leave behind a non-trivial single-qubit gate. We conclude that all layers of $g_1$ and $g_2$ are actually the same, so the number of gates is the product of the number of choices for each layer.

Now the problem is to count the number of choices for each layer. For the single-qubit layer, this is just $n$ independent choices of a single-qubit gate from $\langle \mathcal{P}, R_Z \rangle$, or $8^n$. For the $C(Z, Z)$ layer, there is a choice whether or not to place a $C(Z, Z)$ gate in each of the $\binom{n}{2}$ possible positions, so $2^{n(n-1)/2}$ choices for the layer. For the $C(Z, X)$ layer, observe that $C(Z, X)$ gates generate precisely the set of invertible linear transformations, of which there are

\[ 2^{n(n-1)/2} \prod_{i=1}^{n} (2^i - 1) \]

by a classical argument. Multiplying the three layers, we have a total of

\[ \#\langle C(Z, X), \mathcal{P}, R_Z \rangle_n = 8^n \, 2^{n(n-1)} \prod_{i=1}^{n} (2^i - 1) \]

$n$-qubit transformations generated by $C(Z, X)$, $\mathcal{P}$, and $R_Z$.
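The layer counts above can be checked by brute force for small $n$. The following is a minimal sketch, not part of the original proof: it enumerates all $n \times n$ matrices over $\mathbb{F}_2$, counts the invertible ones by Gaussian elimination, and compares against the product formula. The function names are illustrative, and the matrix representation (one bitmask per row) is an assumption of the sketch.

```python
from itertools import product


def rank_gf2(rows, n):
    """Rank over F_2 of a matrix whose rows are given as n-bit integers."""
    rows = list(rows)
    rank = 0
    for bit in reversed(range(n)):
        # Look for a pivot row (at or below position `rank`) with this bit set.
        pivot = next(
            (i for i in range(rank, len(rows)) if (rows[i] >> bit) & 1), None
        )
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        # Clear this bit from every other row.
        for i in range(len(rows)):
            if i != rank and (rows[i] >> bit) & 1:
                rows[i] ^= rows[rank]
        rank += 1
    return rank


def count_invertible_brute_force(n):
    """Count invertible n x n matrices over F_2 by exhaustive enumeration."""
    return sum(
        1 for rows in product(range(2 ** n), repeat=n) if rank_gf2(rows, n) == n
    )


def count_invertible_formula(n):
    """2^(n(n-1)/2) * prod_{i=1}^n (2^i - 1)."""
    total = 2 ** (n * (n - 1) // 2)
    for i in range(1, n + 1):
        total *= 2 ** i - 1
    return total


def class_size(n):
    """Product of the three layer counts for <C(Z,X), P, R_Z> on n qubits."""
    single_qubit_layer = 8 ** n
    czz_layer = 2 ** (n * (n - 1) // 2)
    czx_layer = count_invertible_formula(n)
    return single_qubit_layer * czz_layer * czx_layer


if __name__ == "__main__":
    for n in range(1, 4):
        print(n, count_invertible_brute_force(n), count_invertible_formula(n))
    print(class_size(2))
```

The enumeration grows as $2^{n^2}$, so it is only practical for very small $n$; it serves as a consistency check on the formula rather than as a way to compute class sizes.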

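The numbers for $\langle G \rangle$, $\langle C(Z, Z), G \rangle$, $\langle C(Z, X), \mathcal{P} \rangle$, $\langle T_4, \mathcal{P} \rangle$, and $\langle T_4, \mathcal{P}, R_Z \rangle$ follow by a similar argument, although for the last two classes we need the fact that $T_4$ generates

\[ \begin{cases} 2^{m^2} \prod_{i=1}^{m-1} (2^{2i} - 1), & \text{if } n = 2m, \\ 2^{m^2} \prod_{i=1}^{m} (2^{2i} - 1), & \text{if } n = 2m + 1 \end{cases} \]

orthogonal transformations on $n$ qubits.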
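The orthogonal count can also be checked exhaustively for small $n$. The sketch below is an illustration under the assumption that "orthogonal" means $M M^T = I$ over $\mathbb{F}_2$; it compares a brute-force count with the two-case formula. Function names are hypothetical.

```python
from itertools import product


def is_orthogonal(rows, n):
    """Check M M^T = I over F_2, with each row of M stored as an n-bit integer."""
    for i in range(n):
        for j in range(i, n):
            # Inner product of rows i and j modulo 2.
            dot = bin(rows[i] & rows[j]).count("1") % 2
            if dot != (1 if i == j else 0):
                return False
    return True


def count_orthogonal_brute_force(n):
    """Count orthogonal n x n matrices over F_2 by exhaustive enumeration."""
    return sum(
        1 for rows in product(range(2 ** n), repeat=n) if is_orthogonal(rows, n)
    )


def count_orthogonal_formula(n):
    """2^(m^2) * prod (2^(2i) - 1), with i up to m-1 if n = 2m, up to m if n = 2m+1."""
    m = n // 2
    upper = m - 1 if n % 2 == 0 else m
    total = 2 ** (m * m)
    for i in range(1, upper + 1):
        total *= 2 ** (2 * i) - 1
    return total


if __name__ == "__main__":
    for n in range(1, 5):
        print(n, count_orthogonal_brute_force(n), count_orthogonal_formula(n))
```

For $n \le 3$ every orthogonal matrix is a permutation matrix, so the counts are $1$, $2$, and $6$; the first non-permutation examples appear at $n = 4$, which is consistent with $T_4$ being a four-qubit gate.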

For the final two classes, we use known expressions (from [34]) for the number of $n \times n$ unitary matrices over $\mathbb{F}_4$ (in the case of $\langle T_4, \mathcal{P}, \Gamma \rangle$) and for the number of $2n \times 2n$ symplectic matrices over $\mathbb{F}_2$ (in the case of ALL). We multiply by $4^n$ in both cases to account for the phase bits, which are completely independent of the matrix part.
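The text cites [34] for these two group orders without restating them. The sketch below assumes the standard formulas $|\mathrm{Sp}(2n, \mathbb{F}_2)| = 2^{n^2} \prod_{i=1}^{n} (4^i - 1)$ and $|\mathrm{U}(n, \mathbb{F}_4)| = 2^{n(n-1)/2} \prod_{i=1}^{n} (2^i - (-1)^i)$ are the ones intended, and checks the symplectic formula by brute force for $n \le 2$. All function names are illustrative.

```python
from itertools import product


def symplectic_order(n):
    """Assumed standard order of Sp(2n, F_2): 2^(n^2) * prod_{i=1}^n (4^i - 1)."""
    order = 2 ** (n * n)
    for i in range(1, n + 1):
        order *= 4 ** i - 1
    return order


def unitary_order(n):
    """Assumed standard order of U(n, F_4): 2^(n(n-1)/2) * prod_{i=1}^n (2^i - (-1)^i)."""
    order = 2 ** (n * (n - 1) // 2)
    for i in range(1, n + 1):
        order *= 2 ** i - (-1) ** i
    return order


def count_symplectic_brute_force(n):
    """Count 2n x 2n matrices over F_2 preserving the standard symplectic form.

    Each row is a 2n-bit integer (x | z); a matrix is symplectic exactly when
    its rows have the same pairwise symplectic products as the standard basis.
    """
    dim = 2 * n
    mask = 2 ** n - 1

    def form(u, v):
        # Symplectic product <u, v> = u_x . v_z + u_z . v_x (mod 2).
        ux, uz = u >> n, u & mask
        vx, vz = v >> n, v & mask
        return (bin(ux & vz).count("1") + bin(uz & vx).count("1")) % 2

    basis = [1 << k for k in range(dim)]
    count = 0
    for rows in product(range(2 ** dim), repeat=dim):
        if all(
            form(rows[i], rows[j]) == form(basis[i], basis[j])
            for i in range(dim)
            for j in range(i + 1, dim)
        ):
            count += 1
    return count


def all_size(n):
    """Size of ALL on n qubits: 4^n phase-bit settings times the matrix part."""
    return 4 ** n * symplectic_order(n)


def t4_p_gamma_size(n):
    """Size of <T_4, P, Gamma> on n qubits, under the same 4^n convention."""
    return 4 ** n * unitary_order(n)


if __name__ == "__main__":
    print(all_size(1), all_size(2))
    print(count_symplectic_brute_force(1), count_symplectic_brute_force(2))
```

As a sanity check, `all_size(1)` gives 24 and `all_size(2)` gives 11520, which are the familiar sizes of the one- and two-qubit Clifford groups modulo global phase.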

Theorem 106. The asymptotic size of each class is as follows.

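\[
\begin{aligned}
\log_2 \#\langle G \rangle_n &= n \log_2 |G| + n \log_2 n - n \log_2 e + \tfrac{1}{2} \log_2 (2\pi n) + O(n^{-1}), \\
\log_2 \#\langle C(Z, Z), G \rangle_n &= n \log_2 |G| + \tfrac{n}{2}(n - 1) + n \log_2 n - n \log_2 e + \tfrac{1}{2} \log_2 (2\pi n) + O(n^{-1}), \\
\log_2 \#\langle C(Z, X), \mathcal{P} \rangle_n &= n^2 + 2n - \alpha + O(2^{-n}), \\
\log_2 \#\langle C(Z, X), \mathcal{P}, R_Z \rangle_n &= \tfrac{3}{2} n^2 + \tfrac{5}{2} n - \alpha + O(2^{-n}), \\
\log_2 \#\langle T_4, \mathcal{P} \rangle_n &= \tfrac{1}{2} n^2 + \tfrac{3}{2} n - \beta + O(2^{-n}), \\
\log_2 \#\langle T_4, \mathcal{P}, R_Z \rangle_n &= n^2 + 3n - \beta + O(2^{-n}), \\
\log_2 \#\langle T_4, \mathcal{P}, \Gamma \rangle_n &= n^2 + 2n + \gamma + O(2^{-n}), \\
\log_2 \#\mathrm{ALL}_n &= 2n^2 + 3n - \beta + O(4^{-n}),
\end{aligned}
\]

where $G$ is the same as in Theorem 105, and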

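\[
\begin{aligned}
\alpha &= -\sum_{i=1}^{\infty} \log_2 (1 - 2^{-i}) \approx 1.7919, \\
\beta &= -\sum_{i=1}^{\infty} \log_2 (1 - 4^{-i}) \approx 0.53839, \\
\gamma &= \sum_{i=1}^{\infty} \log_2 (1 - (-2)^{-i}) \approx 0.27587.
\end{aligned}
\]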

Proof. We take the logarithm of each class size, which separates into a sum of the logarithms of the layers comprising that class, as in Theorem 105. For most layers this is straightforward; the exceptions are the layers of permutations, orthogonal transformations, and general linear transformations. The first we handle with Stirling's approximation. For the other two, we factor out powers of two, leaving a partial sum of a convergent series, which we analyze with a Taylor expansion. The classes $\langle T_4, \mathcal{P}, \Gamma \rangle_n$ and $\mathrm{ALL}_n$ follow by similar techniques. □
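The proof strategy can be illustrated numerically. The sketch below is not from the original text: it evaluates the three series constants by truncating each sum, then compares the exact value of $\log_2 \#\langle C(Z, X), \mathcal{P}, R_Z \rangle_n$ (from the product formula in the proof of Theorem 105) and of $\log_2 n!$ (in the standard form of Stirling's approximation) with their asymptotic expansions. The function names and the truncation at 200 terms are assumptions of the sketch.

```python
import math


def series_constant(term, num_terms=200):
    """Truncated sum of log2(term(i)) for i = 1..num_terms."""
    return sum(math.log2(term(i)) for i in range(1, num_terms + 1))


# Series constants; the tails beyond 200 terms are far below float precision.
ALPHA = -series_constant(lambda i: 1 - 2.0 ** (-i))
BETA = -series_constant(lambda i: 1 - 4.0 ** (-i))
GAMMA = series_constant(lambda i: 1 - (-2.0) ** (-i))


def exact_log2_czx_p_rz(n):
    """log2 of 8^n * 2^(n(n-1)) * prod_{i=1}^n (2^i - 1), using exact integers."""
    size = 8 ** n * 2 ** (n * (n - 1))
    for i in range(1, n + 1):
        size *= 2 ** i - 1
    return math.log2(size)


def asymptotic_log2_czx_p_rz(n):
    """Factor 2^i out of each (2^i - 1): (3/2) n^2 + (5/2) n - alpha."""
    return 1.5 * n * n + 2.5 * n - ALPHA


def stirling_log2_factorial(n):
    """Stirling's approximation to log2(n!), with error O(1/n)."""
    return (
        n * math.log2(n)
        - n * math.log2(math.e)
        + 0.5 * math.log2(2 * math.pi * n)
    )


if __name__ == "__main__":
    print(f"alpha ~ {ALPHA:.4f}, beta ~ {BETA:.5f}, gamma ~ {GAMMA:.5f}")
    for n in (2, 5, 10, 20):
        series_gap = exact_log2_czx_p_rz(n) - asymptotic_log2_czx_p_rz(n)
        stirling_gap = math.log2(math.factorial(n)) - stirling_log2_factorial(n)
        print(n, series_gap, stirling_gap)
```

The first gap is the tail $-\sum_{i > n} \log_2 (1 - 2^{-i})$, which shrinks like $2^{-n}$, while the Stirling gap shrinks only like $1/n$; this matches the two different error terms in the theorem.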

Corollary 107. Let $C$ be any class, and let $G$ be an $n$-qubit gate chosen uniformly at random from $C$. Then

\[ \Pr[G \text{ generates } C] = 1 - O(2^{-n}). \]

Bibliography

[1] Scott Aaronson. Quantum computing, postselection, and probabilistic polynomial-time. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 461(2063):3473-3482, 2005.

[2] Scott Aaronson. A linear-optical proof that the permanent is #P-hard. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 467(2136):3393-3405, 2011.

[3] Scott Aaronson and Alex Arkhipov. The computational complexity of linear optics. In Proceedings of the forty-third annual ACM symposium on Theory of computing, pages 333-342. ACM, 2011.

[4] Scott Aaronson, Adam Bouland, Joseph Fitzsimons, and Mitchell Lee. The space just above BQP. In Proceedings of the 2016 ACM Conference on Innovations in Theoretical Computer Science, pages 271-280. ACM, 2016.

[5] Scott Aaronson and Daniel Gottesman. Improved simulation of stabilizer circuits. Physical Review A, 70(5):052328, 2004.

[6] Scott Aaronson, Daniel Grier, and Luke Schaeffer. The classification of reversible bit operations. In 8th Innovations in Theoretical Computer Science Conference. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2017.

[7] Scott Aaronson, Daniel Grier, and Luke Schaeffer. A quantum query complexity trichotomy for regular languages. arXiv preprint arXiv:1812.04219 (to appear FOCS 2019), 2018.

[8] Leonard M Adleman, Jonathan DeMarrais, and Ming-Deh A Huang. Quantum computability. SIAM Journal on Computing, 26(5):1524-1540, 1997.

[9] Panos Aliferis. Level reduction and the quantum threshold theorem. PhD thesis, Caltech, 2007, arXiv:quant-ph/0703230. Citeseer, 2007.

[10] Noga Alon, Michael Krivelevich, Ilan Newman, and Mario Szegedy. Regular languages are testable with a constant number of queries. SIAM Journal on Computing, 30(6):1842-1862, 2001.

[11] Andris Ambainis. Quantum lower bounds by quantum arguments. Journal of Computer and System Sciences, 64(4):750-767, 2002.

[12] Andris Ambainis. Polynomial degree and lower bounds in quantum complexity: Collision and element distinctness with small range. Theory of Computing, 1(1):37-46, 2005.

[13] Andris Ambainis, Kaspars Balodis, Janis Iraids, Martins Kokainis, Krišjānis Prūsis, and Jevgenijs Vihrovs. Quantum speedups for exponential-time dynamic programming algorithms. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1783-1793. SIAM, 2019.

[14] Jonas T Anderson. On the power of reusable magic states. arXiv preprint arXiv:1205.0289, 2012.

[15] Arturs Backurs and Piotr Indyk. Which regular expression patterns are hard to match? In 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), pages 457-466. IEEE, 2016.

[16] Adriano Barenco, Charles H Bennett, Richard Cleve, David P DiVincenzo, Norman Margolus, Peter Shor, Tycho Sleator, John A Smolin, and Harald Weinfurter. Elementary gates for quantum computation. Physical review A, 52(5):3457, 1995.

[17] David A Barrington and Denis Thérien. Finite monoids and the fine structure of NC^1. Journal of the ACM (JACM), 35(4):941-952, 1988.

[18] Robert Beals, Harry Buhrman, Richard Cleve, Michele Mosca, and Ronald De Wolf. Quantum lower bounds by polynomials. Journal of the ACM (JACM), 48(4):778-797, 2001.

[19] Richard Beigel, Nick Reingold, and Daniel Spielman. PP is closed under intersection. Journal of Computer and System Sciences, 50(2):191-202, 1995.

[20] Amir Ben-Dor and Shai Halevi. Zero-one permanent is #P-complete, a simpler proof. In ISTCS, pages 108-117, 1993.

[21] Charles H Bennett and Gilles Brassard. Quantum cryptography: public key distribution and coin tossing. Theor. Comput. Sci., 560(12):7-11, 2014.

[22] Stuart J Berkowitz. On computing the determinant in small parallel time using a small number of processors. Information processing letters, 18(3):147-150, 1984.

[23] Adam Bouland and Scott Aaronson. Generation of universal linear optics by any beam splitter. Physical Review A, 89(6):062316, 2014.

[24] Adam Bouland, Laura Mančinska, and Xue Zhang. Complexity classification of two-qubit commuting hamiltonians. In Proceedings of the 31st Conference on Computational Complexity, page 28. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2016.

[25] Sergey Bravyi and Alexei Kitaev. Universal quantum computation with ideal Clifford gates and noisy ancillas. Physical Review A, 71(2):022316, 2005.

[26] Michael J Bremner, Richard Jozsa, and Dan J Shepherd. Classical simulation of commuting quantum computations implies collapse of the polynomial hierarchy. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 467(2126):459-472, 2010.

[27] Harry Buhrman and Ronald de Wolf. Complexity measures and decision tree complexity: a survey. Theoretical Computer Science, 288(1):21-43, 2002.

[28] Mark Bun, Robin Kothari, and Justin Thaler. Quantum algorithms and approximating polynomials for composed functions with shared inputs. arXiv preprint arXiv:1809.02254, 2018.

[29] Eduardo R Caianiello. On quantum field theory, i: explicit solution of Dyson's equation in electrodynamics without use of Feynman graphs. Nuovo Cimento, 10:1634-1652, 1953.

[30] Levon Chakhmakhchyan, Nicolas J Cerf, and Raul Garcia-Patron. Quantum-inspired algorithm for estimating the permanent of positive semidefinite matrices. Physical Review A, 96(2):022329, 2017.

[31] Ashok K Chandra, Steven Fortune, and Richard Lipton. Unbounded fan-in circuits and associative functions. In Proceedings of the fifteenth annual ACM symposium on Theory of computing, pages 52-60. ACM, 1983.

[32] Andrew M Childs and Robin Kothari. Quantum query complexity of minor-closed graph properties. SIAM Journal on Computing, 41(6):1426-1450, 2012.

[33] Andrew M Childs, Debbie Leung, Laura Mančinska, and Maris Ozols. Characterization of universal two-qubit Hamiltonians. Quantum Information and Computation, 11(1&2):0019-0039, 2011.

[34] John Horton Conway, Robert T Curtis, Simon P Norton, and Richard A Parker. Atlas of finite groups. Oxford University Press, 1985.

[35] Stefano Crespi-Reghizzi, Giovanni Guida, and Dino Mandrioli. Noncounting context-free languages. Journal of the ACM (JACM), 25(4):571-580, 1978.

[36] Ronald de Wolf. A note on quantum algorithms and the minimal degree of epsilon-error polynomials for symmetric functions. arXiv preprint arXiv:0802.1816, 2008.

[37] Jeroen Dehaene and Bart De Moor. Clifford group, stabilizer states, and linear and quadratic operations over GF(2). Physical Review A, 68(4):042318, 2003.

[38] Andy Drucker and Ronald de Wolf. Quantum proofs for classical theorems. Theory of Computing Graduate Surveys, (2):1-54, 2011. arXiv:0910.3376, ECCC TR03-048.

[39] Samuel Eilenberg. Automata, Languages, and Machines. Academic Press, 1974.

[40] Artur K Ekert. Quantum cryptography based on Bell's theorem. Physical review letters, 67(6):661, 1991.

[41] Emanuel Knill, Raymond Laflamme, and Gerard J Milburn. A scheme for efficient quantum computation with linear optics. Nature, 409:46-52, 2001.

[42] Stephen Fenner, Frederic Green, Steven Homer, and Randall Pruim. Determining acceptance possibility for a quantum computation is hard for the polynomial hierarchy. Proceedings of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences, 455(1991):3953-3966, 1999.

[43] Stephen A Fenner, Lance J Fortnow, and Stuart A Kurtz. Gap-definable counting classes. Journal of Computer and System Sciences, 48(1):116-148, 1994.

[44] Samuel Fiorini, Serge Massar, Sebastian Pokutta, Hans Raj Tiwary, and Ronald De Wolf. Exponential lower bounds for polytopes in combinatorial optimization. Journal of the ACM (JACM), 62(2):17, 2015.

[45] Michael R Garey, David S Johnson, and Larry Stockmeyer. Some simplified NP-complete problems. In Proceedings of the Sixth Annual ACM Symposium on Theory of Computing, pages 47-63. ACM, 1974.

[46] Daniel Gottesman. Stabilizer codes and quantum error correction. PhD thesis, California Institute of Technology, 1997.

[47] Daniel Gottesman. The Heisenberg representation of quantum computers. arXiv preprint quant-ph/9807006, 1998.

[48] Daniel Gottesman. Theory of fault-tolerant quantum computation. Physical Review A, 57(1):127, 1998.

[49] Daniel Grier and Luke Schaeffer. The classification of stabilizer operations over qubits. arXiv preprint arXiv:1603.03999, 2016.

[50] Daniel Grier and Luke Schaeffer. New hardness results for the permanent using linear optics. In 33rd Computational Complexity Conference (CCC 2018). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2018.

[51] Walter Grimus and Patrick Otto Ludl. On the characterization of the SU(3)-subgroups of type C and D. Journal of Physics A: Mathematical and Theoretical, 47(7):075202, 2014.

[52] Leonid Gurvits. On the complexity of mixed discriminants and related problems. In International Symposium on Mathematical Foundations of Computer Science, pages 447-458. Springer, 2005.

[53] Kristoffer Arnsfelt Hansen, Balagopal Komarath, Jayalal Sarma, Sven Skyum, and Navid Talebanfard. Circuit complexity of properties of graphs with constant planar cutwidth. In International Symposium on Mathematical Foundations of Computer Science, pages 336-347. Springer, 2014.

[54] Chong-Ki Hong, Zhe-Yu Ou, and Leonard Mandel. Measurement of subpicosecond time intervals between two photons by interference. Physical review letters, 59(18):2044, 1987.

[55] Peter Hoyer, Michele Mosca, and Ronald de Wolf. Quantum search on bounded-error inputs. In International Colloquium on Automata, Languages, and Programming, pages 291-299. Springer, 2003.

[56] Hao Huang. Induced subgraphs of hypercubes and a proof of the sensitivity conjecture. arXiv preprint arXiv:1907.00847, 2019.

[57] N. Cody Jones, Rodney Van Meter, Austin G. Fowler, Peter L. McMahon, Jungsang Kim, Thaddeus D. Ladd, and Yoshihisa Yamamoto. Layered architecture for quantum computing. Physical Review X, 2(3):031007, 2012.

[58] Johan Anthony Wilem Kamp. Tense logic and the theory of linear order. 1968.

[59] Stephen Cole Kleene. Representations of events in nerve nets and finite automata. Automata Studies [Annals of Math. Studies 34], 1956.

[60] Emanuel Knill. Quantum gates using linear optics and postselection. Physical Review A, 66(5), 2002.

[61] Grigory Kogan. Computing permanents over fields of characteristic 3: Where and why it becomes difficult. In Foundations of Computer Science, 1996. Proceedings, 37th Annual Symposium on, pages 108-114. IEEE, 1996.

[62] Kenneth Krohn and John Rhodes. Algebraic theory of machines. I. Prime decomposition theorem for finite semigroups and machines. Transactions of the American Mathematical Society, 116:450-464, 1965.

[63] Samuel Kutin. Quantum lower bound for the collision problem with small range. Theory of Computing, 1(1):29-36, 2005.

[64] Robert McNaughton and Seymour A Papert. Counter-Free Automata (MIT research monograph no. 65). The MIT Press, 1971.

[65] Anand Natarajan and Thomas Vidick. Low-degree testing for quantum states, and a quantum entangled games PCP for QMA. In 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS), pages 731-742. IEEE, 2018.

[66] Michael A Nielsen and Isaac L Chuang. Quantum Computation and Quantum Information. Cambridge University Press, 2000.

[67] Noam Nisan. CREW PRAMs and decision trees. SIAM Journal on Computing, 20(6):999-1007, 1991.

[68] Emil L Post. The two-valued iterative systems of mathematical logic. Number 5 in Annals of Mathematics Studies. Princeton University Press, 1941.

[69] Jaikumar Radhakrishnan, Pranab Sen, and Srinivasan Venkatesh. The quantum complexity of set membership. In Proceedings 41st Annual Symposium on Foundations of Computer Science, pages 554-562. IEEE, 2000.

[70] Saleh Rahimi-Keshari, Austin P Lund, and Timothy C Ralph. What can quantum optics say about computational complexity theory? Physical review letters, 114(6):060501, 2015.

[71] Robert Raussendorf, Daniel E Browne, and Hans J Briegel. Measurement-based quantum computation on cluster states. Physical review A, 68(2):022312, 2003.

[72] Ben W Reichardt. Span programs and quantum query complexity: The general adversary bound is nearly tight for every boolean function. In Foundations of Computer Science, 2009. FOCS'09. 50th Annual IEEE Symposium on, pages 544-551. IEEE, 2009.

[73] Terry Rudolph. Simple encoding of a quantum circuit amplitude as a matrix permanent. Physical Review A, 80(5):054302, 2009.

[74] Marcel Paul Schützenberger. On finite monoids having only trivial subgroups. Information and control, 8(2):190-194, 1965.

[75] Rimli Sengupta. Cancellation is exponentially powerful for computing the determinant. Information Processing Letters, 62(4):177-181, 1997.

[76] Yaoyun Shi. Both toffoli and controlled-not need little help to do universal quantum computing. Quantum Information & Computation, 3(1):84-92, 2003.

[77] Peter W Shor. Scheme for reducing decoherence in quantum computer memory. Physical review A, 52(4):R2493, 1995.

[78] Hans-Ulrich Simon. A tight Ω(log log n)-bound on the time for parallel RAM's to compute nondegenerated boolean functions. In International Conference on Fundamentals of Computation Theory, pages 439-444. Springer, 1983.

[79] Michael Sipser. Introduction to the Theory of Computation, volume 2. Thomson Course Technology Boston, 2006.

[80] Andrew Steane. Multiple-particle interference and quantum error correction. Proceedings of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences, 452(1954):2551-2577, 1996.

[81] Richard Edwin Stearns, Juris Hartmanis, and Philip M Lewis. Hierarchies of memory limited computations. In Switching Circuit Theory and Logical Design, 1965. SWCT 1965. Sixth Annual Symposium on, pages 179-190. IEEE, 1965.

[82] Larry Stockmeyer. The complexity of approximate counting. In Proceedings of the fifteenth annual ACM symposium on Theory of computing, pages 118-126. ACM, 1983.

[83] Pascal Tesson and Denis Thérien. Complete classifications for the communication complexity of regular languages. In Annual Symposium on Theoretical Aspects of Computer Science, pages 62-73. Springer, 2003.

[84] Seinosuke Toda. PP is as hard as the polynomial-time hierarchy. SIAM Journal on Computing, 20(5):865-877, 1991.

[85] Seinosuke Toda and Mitsunori Ogiwara. Counting classes are at least as hard as the polynomial-time hierarchy. SIAM Journal on Computing, 21(2):316-328, 1992.

[86] Tommaso Toffoli. Reversible computing. In International Colloquium on Automata, Languages, and Programming, pages 632-644. Springer, 1980.

[87] Lidror Troyansky and Naftali Tishby. Permanent uncertainty: On the quantum evaluation of the determinant and the permanent of a matrix. In Proceedings of PhysComp, volume 15, 1996.

[88] Nikolaj Tschebotareff. Die Bestimmung der Dichtigkeit einer Menge von Primzahlen, welche zu einer gegebenen Substitutionsklasse gehören. Mathematische Annalen, 95(1):191-228, 1926.

[89] Leslie G Valiant. Completeness classes in algebra. In Proceedings of the eleventh annual ACM symposium on Theory of computing, pages 249-261. ACM, 1979.

[90] Leslie G Valiant. The complexity of computing the permanent. Theoretical computer science, 8(2):189-201, 1979.

[91] Leslie G Valiant. Negation can be exponentially powerful. In Proceedings of the eleventh annual ACM symposium on Theory of computing, pages 189-196. ACM, 1979.

[92] Leslie G Valiant. Universality considerations in VLSI circuits. IEEE Transactions on Computers, 100(2):135-140, 1981.

[93] Maarten Van den Nest. Classical simulation of quantum computation, the Gottesman-Knill theorem, and slightly beyond. Quantum Information & Computation, 10(3):258-271, 2010.
