Successes and setbacks in P vs NP

A study of fruitful proof methods in complexity theory and the barriers they face towards solving P vs NP

Marlou M. Gijzen
July 8, 2016

Bachelor thesis
Supervisor: prof. dr. Ronald de Wolf

Korteweg-de Vries Instituut voor Wiskunde
Faculteit der Natuurwetenschappen, Wiskunde en Informatica
Universiteit van Amsterdam

Abstract

There are several successful proof techniques in complexity theory, but each has its own barrier towards solving P vs NP. Diagonalization gave us the Hierarchy Theorems, but most diagonalization proofs relativize and cannot solve P vs NP. When considering circuits, we found a lot of lower bounds on complexity classes. But most of the proofs in circuit complexity are natural proofs, and those cannot show that P ≠ NP. Arithmetization turned out useful in interactive proofs, but it induces algebrizing results. Those cannot solve P vs NP either. However, there is slight hope for proving that P ≠ NP by showing upper bounds.

Title: Successes and setbacks in P vs NP
Author: Marlou M. Gijzen, [email protected]., 6127901
Supervisor: prof. dr. Ronald de Wolf
Second grader: dr. Leen Torenvliet
End date: July 8, 2016

Korteweg-de Vries Instituut voor Wiskunde Universiteit van Amsterdam Science Park 904, 1098 XH Amsterdam http://www.science.uva.nl/math

Contents

1 Introduction

2 Preliminaries
2.1 Turing Machines
2.1.1 What are Turing machines?
2.1.2 How the Turing machine solves a problem
2.1.3 Universal and probabilistic Turing machines
2.1.4 Languages and oracle Turing machines
2.2 Computational complexity
2.2.1 Running time and measuring complexity
2.2.2 Classification according to complexity
2.2.3 NP-complete problems and reduction
2.2.4 Measuring complexity with Boolean circuits

3 Diagonalization and relativization
3.1 Diagonalization and complexity
3.1.1 How diagonalization found its way into computer science
3.1.2 What diagonalization is and what it's used for in complexity
3.1.3 The Time Hierarchy Theorem
3.1.4 Other diagonalization results
3.2 The barrier diagonalization imposes: relativization
3.2.1 Relativization and why diagonalization proofs relativize
3.2.2 Relativizing results cannot solve the P vs NP problem
3.2.3 Non-relativizing diagonalization

4 Circuits and natural proofs
4.1 Usage and successes
4.1.1 The size and depth of circuits
4.1.2 Parity is not in AC0
4.1.3 Majority is not in AC0 with parity gates
4.2 The barrier in circuit complexity: natural proofs
4.2.1 What are natural proofs?
4.2.2 Most proofs in circuit complexity naturalize
4.2.3 Natural proofs cannot solve P vs NP
4.2.4 Non-naturalizing results

5 Arithmetization and algebrization
5.1 The usage of arithmetization
5.1.1 What is arithmetization?
5.1.2 Rounds of interaction and deterministic interactive proofs
5.1.3 Probabilistic interactive proofs and the class IP
5.1.4 IP = PSPACE
5.2 The barrier arithmetization imposes: algebrization
5.2.1 What are algebrizing results?
5.2.2 Most results that use arithmetization algebrize
5.2.3 Algebrizing results cannot solve P vs NP
5.2.4 Other results that need non-algebrizing techniques

6 Evaluation of the current situation
6.1 Remarks
6.1.1 P vs NP could be independent of ZFC
6.1.2 The natural proofs barrier
6.1.3 The opinions of researchers about P vs NP
6.2 Other methods for proving P ≠ NP
6.2.1 Proof complexity
6.2.2 Autoreducibility
6.2.3 Showing lower bounds from upper bounds
6.3 Attacking NP-hard problems
6.3.1 and approximation
6.3.2 Average-case and worst-case scenarios
6.3.3

7 Conclusion

8 Samenvatting (Dutch summary)

1 Introduction

"The P versus NP problem has gone from an interesting problem related to logic to perhaps the most fundamental and important mathematical question of our time, whose importance only grows as computers become more powerful and widespread." — in [22]

In our daily lives, we are accustomed to using computers to solve a plethora of problems: calculating costs, determining the shortest route or even simply googling things we want to know. The time needed to compute solutions to problems like this varies with the problem itself. In 1956 Gödel wrote a letter to von Neumann with the first mention of the P vs NP problem. He wondered to what extent we would be able to reduce the time it takes for a Turing machine to solve hard computational problems.

Turing machines describe our notion of computability. The Church-Turing thesis states that anything that is computable (in an intuitive sense) can be computed by a Turing machine. Different alternatives to distinguish between the computable and non-computable have arisen (Post systems, λ-calculus, combinatory logic, µ-recursive functions, Turing machines). These turned out to be equivalent to each other; hence Church declared that they all represent what it means to be computable. This was before computers were even invented [33, Lec. 28]. Since then, we've started using Turing machines in the theory of problem solving. Scientists began to categorise problems according to the duration of the computation. We call the field that studies the complexity (or hardness) of problems complexity theory.

P is the collection of problems that we can solve quickly with a computer. To be more precise: P stands for Polynomial time, problems that a Turing machine can solve in time that grows polynomially in the input size. Consider an algorithm that determines whether a given number is prime. For larger numbers, the algorithm takes more time to compute. For problems in P, the time an algorithm needs can be expressed as a polynomial.
The variable in the polynomial is the size of the input. NP stands for Non-deterministic Polynomial time and refers to problems for which a computer can quickly verify a solution. A computer, a deterministic machine, always performs the exact same computation for the same input; its actions for each situation are predetermined. However, a non-deterministic machine can perform different possible computations for the same input. These different computations are the different paths that lead towards possible solutions of the problem. Non-determinism enables a Turing machine to make guesses during the computation. This way, the machine can always find a path towards a correct solution, if such a path exists. We call this path a certificate. For problems in NP, this path leads the machine to the solution in polynomial time. If we give a deterministic machine this certificate, it is also able to go through it in polynomial time. This is why problems in NP can be verified quickly.

Hence the big question: does P equal NP? If we are able to verify a solution quickly, are we able to find one quickly too? Intuitively, the answer is no; verifying the solution to a sudoku puzzle is much easier than finding one yourself. Yet no-one has been able to prove that this is the case. In 2000, the Clay Mathematics Institute listed seven fundamental unsolved math problems known as the Millennium Problems. A prize of a million (US) dollars will be given to anyone with a solution. Among these problems is P vs NP: we want a proof that shows that P ≠ NP or P = NP.

These classes are of great interest to us, because P envisions the problems we can solve quickly, while NP contains many problems we would like to solve quickly [20]. The scheduling of timetables is one of them. But also protein folding, which is important for biologists. If we had an efficient algorithm for finding the optimal folding of a protein, then we could understand and prevent a lot of diseases far better. Cryptography, which we use for sending information over the internet securely, relies on the hardness of NP problems. If P = NP, then we could easily decrypt messages sent over the internet and find passwords that are used on websites. But we would also be able to live in a world with optimal schedules in public transportation, better weather predictions and improved disease prevention. We could understand stock markets far better and retrace evolution [1, 20]. It seems almost obvious that P ≠ NP. Feynman, a famous physicist, found it hard to accept that P vs NP even was an open problem [2].

However, a proof that P ≠ NP could give us insight into why the problems in NP are hard to solve, and it might enable us to find efficient solutions to those problems in specific cases. But so far we have been unable to give such a proof. The proof techniques that have been traditionally successful for other problems in complexity theory turned out to be insufficient for solving P vs NP. Each faces unique barriers and is thus unable to prove P ≠ NP or P = NP. This thesis studies these proof techniques and explores why P vs NP is so difficult to solve. In Chapter 3, diagonalization and the relativization barrier will be discussed. Chapter 4 is about circuit complexity and natural proofs. In Chapter 5 we will study arithmetization and algebrization. Chapter 6 will discuss some other proof techniques and the current situation of the P vs NP problem.

2 Preliminaries

We first need to discuss different kinds of Turing machines and how they work. With that we can explain how computational complexity is measured and categorised.

2.1 Turing Machines

2.1.1 What are Turing machines?

Alan Turing defined his automatic machine in 1936 [48]. This machine, which we now call a Turing machine, was meant as a "human computer": it has an unlimited supply of paper for its calculations and it has to follow fixed rules, supposedly written in a book. Whenever it's assigned a new job, the rules get altered [47]. More formally:

Definition 2.1 (Turing machine). A Turing machine is a tuple (Γ, Q, δ), where:

• Γ is a finite set of tape symbols, the alphabet.

• Q is a finite set of states.

• δ is a transition function: Q × Γ → Q × Γ × {L, R, S}.

The Turing machine has a tape, meant for reading and writing. A tape is a one-way infinite line of cells, each of which holds a single symbol of Γ. Γ generally contains a blank symbol, a start symbol and the numbers 1 and 0. The tape has a tape head that can read or write symbols on the tape. The machine can only look at one symbol at a time. It can remember those that it has already seen by altering the state it's in. The register holds the state of the machine, which determines its behaviour. There is a begin state, qstart, and an end state, qhalt, that tell the machine when to start and stop [7, Sec. 1.2], [48].

2.1.2 How the Turing machine solves a problem

The Turing machine is able to compute solutions to different problems. We have to formulate an instance of a problem and give it as input to a Turing machine with a suitable δ in qstart. From then on the machine will try to calculate the solution step by step:

Definition 2.2 (Computational step [7, Sec. 1.2]). A single step is described as follows:

1. The machine reads the symbol under the tape head.

2. This symbol is erased and replaced with a symbol from Γ.

3. The state of the machine is replaced with a state from Q.

4. The head moves to the left, right or stays in its position (L, R or S).

The new state and symbol can be the same as the ones before. The transition function δ describes, for every symbol and state, which new symbol and new state should come in place and in which direction the tape head goes next. T_M(x) denotes the number of steps a TM M takes on input x. If a machine is able to compute the solution, then it will eventually reach the end state with the solution written on the tape.

We can consider a problem as a function f. A Turing machine takes an input x and, after taking several computational steps, it can come into the final state and output f(x). We use |x| to denote the length of a string x. The computation of a function is thus defined as follows:

Definition 2.3 (Computing a function [7, Def. 1.3]). Let f : {0, 1}∗ → {0, 1}∗ and let M be a TM (Turing machine). M computes f if for every x ∈ {0, 1}∗, when M starts with x written on its tape, it halts with f(x) on the tape.

A Turing machine can be defined to have more than one tape. Multi-tape Turing machines can compute the same functions as single-tape machines [7, Sec. 1.3]. Thus, without loss of generality we will consider machines with one tape only.

2.1.3 Universal and probabilistic Turing machines

We can represent a Turing machine as a binary string, using its description. For the description, a list of all inputs and outputs of the transition function will suffice, since that function fully determines the behaviour. We use Mα to denote the Turing machine represented by the string α and we use Mα(x) to denote the output that the machine gives on input x. When proving results it will come in handy if we can make two assumptions about the representation scheme. Firstly, we want every string to represent some Turing machine. This can be done by mapping each string that is not a valid description to a trivial machine: one that immediately outputs 0.
Secondly, we want every machine to be represented by infinitely many strings. We ensure this by saying that the description can end with an arbitrary number of 1's that are ignored [7, Sec. 1.4].

We can use the description to simulate the behaviour of a machine on some input x by another, universal, Turing machine. In order to separate the input and the description we use the symbol # from the alphabet of the universal machine.

Theorem 2.4 (Universal Turing machine [7, Th. 1.9]). Let T : N → N be a function. There exists a Turing machine U such that for all x, α ∈ {0, 1}∗, U(α#x) = Mα(x). If Mα halts on input x within O(T(|x|)) steps, then U(α#x) halts within C·T(|x|) log T(|x|) steps, where C is some number independent of x.

Hence we only need one machine to compute everything that is computable. A Turing machine can also use probability:

Definition 2.5 (Probabilistic Turing machine [7, Def. 7.1]). A probabilistic Turing machine (PTM) is a Turing machine with two transition functions δ0 and δ1. At every step, the TM applies δ0 or δ1 with probability 1/2.
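The machine model of Definitions 2.1–2.3 can be sketched in ordinary code. Below is a minimal, illustrative single-tape simulator (names like `run_tm` and the "flip" machine are our own, not from the thesis); the example machine computes the bitwise complement of its input and halts at the first blank.

```python
# A sketch of Definitions 2.1-2.3: a single-tape Turing machine simulator.
BLANK = "_"

def run_tm(delta, tape_input, start="q_start", halt="q_halt", max_steps=10_000):
    """Run a TM given its transition function delta: (state, symbol) ->
    (new_state, new_symbol, move in {'L', 'R', 'S'}). Returns the tape
    contents (blanks stripped) and the number of steps T_M(x)."""
    tape = list(tape_input) or [BLANK]
    head, state, steps = 0, start, 0
    while state != halt and steps < max_steps:
        symbol = tape[head]                       # step 1: read the symbol
        state, new_symbol, move = delta[(state, symbol)]
        tape[head] = new_symbol                   # step 2: replace the symbol
        if move == "R":                           # step 4: move the head
            head += 1
            if head == len(tape):
                tape.append(BLANK)                # the tape is one-way infinite
        elif move == "L":
            head = max(0, head - 1)
        steps += 1
    return "".join(tape).strip(BLANK), steps

# delta for the "flip" machine: walk right, complementing each bit; halt on blank.
flip = {
    ("q_start", "0"): ("q_start", "1", "R"),
    ("q_start", "1"): ("q_start", "0", "R"),
    ("q_start", BLANK): ("q_halt", BLANK, "S"),
}

print(run_tm(flip, "0110"))  # -> ('1001', 5)
```

Each dictionary entry is one row of the transition table δ, so listing all entries is exactly the string description of the machine discussed above.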

2.1.4 Languages and oracle Turing machines

For computation we'll only consider Boolean functions, f : {0, 1}∗ → {0, 1}. The problems associated with these functions are languages or decision problems: questions with a yes-or-no (1 or 0) answer. For a collection of Boolean functions f_n : {0, 1}^n → {0, 1}, one for each n ∈ N, a language L is a set of binary strings: L = {x | f_n(x) = 1 for some n}.

Definition 2.6 (Deciding a language). A machine M decides a language L if it computes the function f : {0, 1}∗ → {0, 1} where f(x) = 1 ⇔ x ∈ L.

An oracle Turing machine is a kind of Turing machine that is able to immediately decide membership in the language used as the oracle. How this is done is not specified: the oracle works as a "black box", much like an "actual" oracle. If O is a language and M an oracle machine with access to O, we denote the output of M on an input x ∈ {0, 1}∗ by M^O(x).

Definition 2.7 (Oracle machines [7, Sec. 3.4]). An oracle Turing machine is a Turing machine that has an extra tape called the oracle tape. It also has the three states qquery, qyes and qno. The oracle of the machine starts to work whenever the machine is in state qquery with y written on the oracle tape. In a single step the machine will then move into qyes if y ∈ O and into qno otherwise.
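In ordinary code, the oracle of Definition 2.7 corresponds to a black-box membership test that the machine may call in a single step. The following sketch is purely illustrative (the oracle language, palindromes, and the names `oracle` and `M_with_oracle` are our own choices, not from the thesis):

```python
# A sketch of Definition 2.7: an "oracle machine" is a procedure that may
# consult a black-box membership test for the oracle language O.
def oracle(y):
    """Black box for O = the set of palindromes; cost: one step per query."""
    return y == y[::-1]

def M_with_oracle(x, oracle):
    """Decide 'x is an even-length palindrome' with one oracle query.
    Calling oracle(x) models writing x on the oracle tape and entering
    q_query; the returned Boolean is the q_yes / q_no answer."""
    return oracle(x) and len(x) % 2 == 0

print(M_with_oracle("abba", oracle))  # -> True
print(M_with_oracle("aba", oracle))   # -> False
```

The point of the abstraction is that M's running time does not depend on how hard O is to decide: each query counts as one step.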

2.2 Computational complexity

2.2.1 Running time and measuring complexity

If we want to talk about the time it takes to solve a problem, then it is more convenient to count the number of steps a Turing machine takes than to measure the number of seconds that computing the function took: we don't want the time for completing a problem to depend on anything other than the nature of the problem itself. The number of steps taken by the machine depends on the input size. So it makes sense to define the running time as a function of the input length.

Definition 2.8 (Running time [7, Def. 1.3]). Let f : {0, 1}∗ → {0, 1} and T : N → N be some functions and let M be a TM. M computes f in T(n)-time if on every input x ∈ {0, 1}^n, M computes f while taking at most T(n) steps.

Exactly counting the number of steps a Turing machine takes is too precise: if we let a Turing machine work with a different numerical system, the computation of the same problem can be done in a different number of steps. We will therefore only consider the highest-growing term in the running-time function. For this we use big-O notation [7, Def. 0.2].

Definition 2.9. Let f, g be two functions from N to N. We say that f = O(g) if there exist c, n0 ∈ N such that for all n > n0, f(n) ≤ c · g(n). We say that f = Ω(g) if g = O(f). We say that f = o(g) if lim_{n→∞} f(n)/g(n) = 0.
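Definition 2.9 can be made concrete with a small numeric check. This is only an illustration, not a proof (asymptotic statements can never be verified by finitely many checks); the functions f, g and the witnesses c, n0 below are our own example:

```python
# A numeric illustration of Definition 2.9 for f(n) = 10n + 5 and g(n) = n^2.
f = lambda n: 10 * n + 5
g = lambda n: n * n

# Candidate witnesses for f = O(g): c = 11, n0 = 10. We spot-check the
# inequality f(n) <= c * g(n) on a finite range (illustrative only).
c, n0 = 11, 10
assert all(f(n) <= c * g(n) for n in range(n0 + 1, 10_000))

# The ratio f(n)/g(n) shrinks toward 0, consistent with f = o(g).
ratios = [f(10 ** k) / g(10 ** k) for k in range(1, 6)]
print(ratios)  # each entry roughly a tenth of the previous one
```

Note the asymmetry: f = O(g) holds here while g = O(f) fails, so g = Ω(f) but not the other way around.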

We use the term complexity to say something about the resources needed to compute a function. By looking at the complexity of problems we are able to compare them according to their difficulty. We can use the running time to measure the complexity, but we can also consider, for example, the amount of working storage used for the computation (space complexity) or the number of gates in a circuit that computes the function (circuit complexity). All these are dependent on the input size and thus we represent them the same way.

2.2.2 Classification according to complexity

We classify languages according to their complexity. Before we introduce P, the class that started it all [14], we need the class DTIME:

Definition 2.10 (The class DTIME [7, Def. 1.12]). Let T : N → N be a function. A language L is in DTIME(T(n)) iff there exists a TM M such that M decides L and T_M(x) ≤ O(T(n)).

The class P consists of languages that a machine can solve in polynomial time [7, Sec. 1.6].

Definition 2.11 (The class P [7, Def. 1.13]). P = ⋃_{c≥1} DTIME(n^c).

Examples of problems in P are determining whether a number is prime and deciding whether a graph has some maximum matching (a maximal set of edges such that no two edges share a vertex). The class NP contains the languages for which a possible answer, the certificate, can be verified in polynomial time.

Definition 2.12 (The class NP [7, Def. 2.1]). L ∈ NP iff there exists a polynomial p : N → N and a TM M that runs in polynomial time such that ∀x ∈ {0, 1}∗: x ∈ L ⇔ ∃u ∈ {0, 1}^{p(|x|)} such that M(x#u) = 1. We call the Turing machine M the verifier for L and u the certificate for x.

A problem is thus in NP if there is a possible solution to the problem for which a Turing machine can verify in polynomial time that it is a correct solution. Consider the Subset Sum Problem: given a list of numbers, is there a subset with the sum equal to some C? The input will be the list of numbers, while the certificate is a subset of these numbers. The Turing machine can then check whether it adds up to C or not [7, Sec. 2.1]. Other examples of problems in NP are factoring and determining whether two graphs are isomorphic.

We have that P ⊆ NP: if L ∈ P, the certificate can be an empty string and the machine can just solve the problem in polynomial time. The classes EXP and NEXP are the exponential-time equivalents of P and NP. If L ∈ NP, we can enumerate and check all possible certificates for the problem in exponential time, so NP ⊆ EXP [7].

For the complexity we can also look at the amount of memory a Turing machine uses:

Definition 2.13 (The class SPACE [7, Def. 4.1]). We say that a language L ∈ SPACE(T(n)) iff there is a TM that decides L on an input of length n using no more than O(T(n)) locations of its tape.

The class of problems which use polynomial space is the following:

Definition 2.14 (The class PSPACE [14]). PSPACE = ⋃_{c>0} SPACE(n^c).

If L ∈ NP, we can go through all possible certificates using polynomial space: we can write one down, check it, erase it and write down the next one. Thus: NP ⊆ PSPACE [7]. Since a machine that works with polynomial space can visit at most an exponential number of configurations, PSPACE ⊆ EXP. Thus: P ⊆ NP ⊆ PSPACE ⊆ EXP ⊆ NEXP. Whether P = PSPACE is another open problem in complexity theory.

We can also define complexity classes with oracles. For a complexity class C and an oracle O, C^O is the class of languages decided by the Turing machines that decide languages in C, only now with oracle access to O.

2.2.3 NP-complete problems and reduction

We can compare the difficulty of decision problems with reductions. If we translate a specific problem to another, we can find a solution by solving the other problem.

Definition 2.15 (Polynomial-time reducibility [7, Def. 2.7]). A language L is polynomial-time reducible to a language L′ (L ≤p L′) if there is a polynomial-time computable function g : {0, 1}∗ → {0, 1}∗ such that for every x ∈ {0, 1}∗, x ∈ L ⇔ g(x) ∈ L′.

With reductions we are able to compare languages based on how hard they are. If L ≤p L′ then L′ is at least as hard as L. If the reduction works both ways, the problems can be considered equally difficult, up to a polynomial slowdown. Since ≤p is transitive, we don't need a direct reduction.

If a problem L′ is at least as hard as any other problem in NP, so L ≤p L′ for every L ∈ NP, we call it NP-hard. If L′ is also in NP, then we call it NP-complete [7, Def. 2.7]. Boolean formulas consist of variables in {0, 1} and the logical operators ∧, ∨ and ¬. We say that such a formula is satisfiable if there exists an assignment that makes the formula true. Satisfiability, or SAT, is the language of all satisfiable Boolean formulas.

Theorem 2.16 (Cook-Levin Theorem [15, 35]). SAT is NP-complete.

Besides SAT, there are many more NP-complete problems: every L ∈ NP such that SAT ≤p L. The previously mentioned Subset Sum Problem is one of these. But also the Travelling Salesman Problem, which asks whether there exists a path of some length that visits specific cities on a map (does a graph have a Hamiltonian path with a weight smaller than or equal to some C?). The Clique Problem, of finding a maximum subset of pairwise adjacent vertices in a graph, is also NP-complete.
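The verifier/search gap of Definition 2.12 can be sketched concretely with the Subset Sum Problem: checking a certificate is fast, while the only obvious way to find one tries all 2^n subsets. The function names below are illustrative, not from the thesis.

```python
# Subset Sum as an NP problem: a polynomial-time verifier and a
# brute-force exponential-time search over all possible certificates.
from itertools import chain, combinations

def verify(numbers, target, indices):
    """Polynomial-time verifier M(x#u): the certificate u is a set of
    indices; accept iff they are distinct, valid, and sum to target."""
    return (len(set(indices)) == len(indices)
            and all(0 <= i < len(numbers) for i in indices)
            and sum(numbers[i] for i in indices) == target)

def solve(numbers, target):
    """Try every possible certificate (2^n of them). No polynomial-time
    algorithm is known for this NP-complete problem."""
    idx = range(len(numbers))
    all_subsets = chain.from_iterable(
        combinations(idx, k) for k in range(len(numbers) + 1))
    for subset in all_subsets:
        if verify(numbers, target, subset):
            return list(subset)
    return None  # no subset sums to target

print(verify([3, 7, 1, 8], 11, [0, 3]))  # -> True  (3 + 8 = 11)
print(solve([3, 7, 1, 8], 11))           # -> [0, 3]
```

Each call to `verify` takes time polynomial in the input length, exactly as the definition of NP demands; it is `solve` that pays the exponential cost.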

2.2.4 Measuring complexity with Boolean circuits

We can also use the Boolean circuit that computes a function for measuring the complexity. A Boolean circuit can be seen as a simplified model of the silicon chips in a modern computer. It is a diagram that shows how to derive an output from a binary input string.

Definition 2.17 (Boolean circuit [7, Sec. 6.1]). An n-input Boolean circuit is a directed acyclic graph with n vertices without incoming edges (the input) and one vertex with no outgoing edges (the output). All vertices except those for the input are gates, for which it uses the

logical operations ∨, ∧ and ¬. The fan-in or fan-out of a gate is the number of incoming or outgoing edges. The ∨ and ∧ gates have fan-in 2 and ¬ has fan-in 1. All gates have fan-out 1. If C is a Boolean circuit and x ∈ {0, 1}∗ an input, the output is denoted as C(x). This is the value of the output vertex. The size of the circuit, |C|, is the number of vertices it contains. The depth is the length of the longest path from an input vertex to the output vertex.

Below is an example of a circuit that computes ¬x1 ∧ (x2 ∨ x3). It has depth 2 and size 6. In layer one, we have the gates ¬ and ∨ and in layer two the output gate ∧.
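This example circuit can also be evaluated gate by gate in code. The representation below (a dict of gates in topological order) is our own illustrative choice, not from the thesis:

```python
# A sketch of the example circuit ¬x1 ∧ (x2 ∨ x3) as a directed acyclic
# graph, matching Definition 2.17: each gate lists its operation and the
# vertices feeding into it (fan-in 2 for ∧/∨, fan-in 1 for ¬).
def eval_circuit(gates, inputs):
    """Evaluate topologically ordered gates: name -> (op, operand names).
    Returns the value of the last gate listed, the output vertex."""
    values = dict(inputs)
    for name, (op, args) in gates.items():
        values[name] = op(*(values[a] for a in args))
    return values[name]

# The circuit of Figure 2.1: size 6 (three inputs, three gates), depth 2.
gates = {
    "g_not": (lambda a: 1 - a,    ["x1"]),          # layer 1
    "g_or":  (lambda a, b: a | b, ["x2", "x3"]),    # layer 1
    "out":   (lambda a, b: a & b, ["g_not", "g_or"]),  # layer 2, output
}

print(eval_circuit(gates, {"x1": 0, "x2": 1, "x3": 0}))  # -> 1
print(eval_circuit(gates, {"x1": 1, "x2": 1, "x3": 1}))  # -> 0
```

Because the graph is acyclic, one pass in topological order suffices; the depth of the circuit is the number of layers such a pass traverses.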

Figure 2.1: Example of circuit

We can classify languages according to the size and depth of the corresponding circuit. There are two types of circuit families: uniform and non-uniform.

Definition 2.18 ([41, Def. 8.13.1]). A circuit family C = {C1, C2, ...} is a collection of circuits in which Cn has n inputs. A T(n)-time (or -space) uniform circuit family contains circuits for which there is a TM M such that on input n, M outputs a description of Cn in T(n) time (or space). Non-uniform circuit families are not restricted to this requirement.

With uniform computation the same Turing machine is used for all input sizes. Uniform circuits compute the same functions as Turing machines. Non-uniform computation allows the usage of a different algorithm for different input sizes. Non-uniform circuits can compute functions that are not computable by any Turing machine, such as a unary version of the Halting Problem that will be discussed in the next chapter. Those circuits might be an impractical choice for complexity, given the Church-Turing thesis. However, if we can prove lower bounds on the size of circuits without taking into account whether they are uniform or not, then we can also apply these bounds to uniform circuits and thus to other models such as Turing machines [41, Sec. 8.13]. A non-uniform computation can be represented as a family of Boolean circuits:

Definition 2.19 (Circuit families and language recognition [7, Def. 6.2]). Let T : N → N be a function. A T(n)-size circuit family is a sequence {Cn}n∈N of Boolean circuits, where Cn has n inputs and ∀n, |Cn| ≤ T(n). For a language L, L ∈ SIZE(T(n)) iff there exists a T(n)-size circuit family {Cn} such that for all x ∈ {0, 1}^n, x ∈ L ⇔ Cn(x) = 1.

The class P contains precisely the languages that have a logspace-uniform polynomial-sized circuit family (a circuit family {Cn} is logspace-uniform if there is a function computable with logarithmic space that maps n to the description of Cn) [7, Sec. 6.2.1]. If we consider non-uniform circuits we obtain the following class:

Definition 2.20 (The class P/poly [7, Def. 6.5]). P/poly = ⋃_c SIZE(n^c).

So P/poly contains languages that are decidable by polynomial-sized circuit families. We have that P ⊆ P/poly. We have now discussed enough basic theory to go into the main part of this thesis.

3 Diagonalization and relativization

Diagonalization can be used for separating complexity classes and it gave us the Time Hierarchy Theorem. However, most diagonalization proofs relativize and therefore cannot solve P vs NP.

3.1 Diagonalization and complexity

Cantor was the first to use the diagonalization argument. He showed the existence of uncountable sets. We call a set X uncountable iff there is no injective function from X to N. If X is countable, such a function exists. We can thus enumerate the elements in X by looking at the image of the injection. Cantor's proof goes as follows: suppose towards a contradiction that we could enumerate the set of infinite sequences of binary digits S. We can now create a new element by flipping the diagonal: changing each digit on the diagonal (so a 0 becomes a 1 and vice versa), see Figure 3.1. This new element, s, is another infinite sequence of binary digits, so s ∈ S. But s was not in our enumeration, since it differs from each element in at least one position. Therefore, such an enumeration isn't possible and S is uncountable.

s1 = 00000000 ···
s2 = 11111111 ···
s3 = 01010101 ···
s4 = 10101010 ···
s5 = 11001100 ···
s6 = 00110011 ···
s7 = 11100011 ···
s8 = 00011100 ···
⋮

s = 10110101 ···

Figure 3.1: The diagonalization argument
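The diagonal flip of Figure 3.1 is mechanical enough to run as code. The sketch below reproduces the eight rows of the figure and constructs s by complementing the diagonal:

```python
# Cantor's diagonal argument on the enumeration of Figure 3.1:
# the i-th digit of s is the complement of the i-th digit of the i-th row,
# so s differs from every enumerated sequence in at least one position.
rows = ["00000000", "11111111", "01010101", "10101010",
        "11001100", "00110011", "11100011", "00011100"]

s = "".join("1" if rows[i][i] == "0" else "0" for i in range(len(rows)))
print(s)  # -> 10110101, as in Figure 3.1

# s disagrees with rows[i] exactly at position i:
assert all(s[i] != rows[i][i] for i in range(len(rows)))
```

Of course the real argument concerns infinite sequences and an infinite enumeration; the finite prefix here only illustrates why the constructed s can never appear in the list.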

3.1.1 How diagonalization found its way into computer science

Inspired by this method and by Gödel's incompleteness theorems, Turing brought the diagonalization method into computer science. He showed that there is no way of deciding whether a Turing machine is circular or circle-free ("If a computing machine never writes down more than a finite number of symbols of the first kind [0 and 1], it will be called circular. Otherwise it is said to be circle-free") [48, Ch. 2]. Nowadays we consider a slightly different notion, known as the Halting Problem. Davis stated it as the problem of determining

whether a Turing machine will eventually halt on an input (i.e., produce an output instead of computing forever) [16, p. 70]. He formulated a function HALT: HALT(α#x) = 1 iff the Turing machine represented by the string α halts on input x after a finite number of steps, and HALT(α#x) = 0 otherwise [7, Ch. 1.5].

Theorem 3.1 (Davis 1958 [16, 7]). No TM can compute the language HALT.

To prove that this is an undecidable problem, we use the same argument as Turing uses in [48, p. 246]: we define a machine that flips on the diagonal and we reach a contradiction when we let it compute itself. We use xH to denote the description of a Turing machine H.

Proof sketch. Assume towards a contradiction that there is a TM M that computes HALT. We can now define a new TM H as follows: if M(x#x) = 1 then loop forever, else halt. Running H on input xH causes a contradiction: if M(xH#xH) = 1 then H loops forever on input xH. However, since M computes HALT, H has to halt on input xH. If M(xH#xH) = 0, H halts, but also cannot halt after a finite number of steps. □
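The diagonal construction in this proof can be sketched in code, with "machines" modeled as Python functions and a hypothetical HALT-decider as a predicate (the names `make_H` and `would_halt` are our own, not from the thesis):

```python
# A sketch of the diagonal machine H from the proof of Theorem 3.1:
# H does the opposite of whatever the candidate decider predicts.
def make_H(would_halt):
    """Build H from a claimed decider would_halt(program, input)."""
    def H(x):
        if would_halt(x, x):   # decider says "x halts on input x"...
            while True:        # ...so H does the opposite and loops forever
                pass
        return 0               # decider said "does not halt": H halts
    return H

# Any concrete decider is wrong about the H built from it. For a decider
# that always answers "does not halt", H halts on itself, refuting it:
M_no = lambda prog, inp: False
H = make_H(M_no)
print(H(H))  # -> 0: H halts on input H, contradicting M_no's answer
```

For a decider that answers "halts", the symmetric branch applies: H would loop forever on itself, again contradicting the prediction. No total decider escapes this trap, which is the content of the theorem.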

3.1.2 What diagonalization is and what it's used for in complexity

Since the diagonalization method made its first appearance in complexity theory, it has successfully been used for separating complexity classes. The purpose of this method is showing the existence of a language that is in one of two complexity classes, but not in the other. It often relies on the fact that there's a correspondence between Turing machines and natural numbers. We previously mentioned that every binary string represents some Turing machine. Since a binary string is essentially a natural number, we can order the Turing machines accordingly. The sequence of all Turing machines T = T1, T2, ... will thus start with the machine that is represented by the binary string of the natural number 1, the second one by 2, etc.

We will now give an idea of how diagonalization can separate two complexity classes. If we want to prove that A ⊄ B, we create some sequence M = M1, M2, ..., such that for every language in class B there is a Turing machine in M that decides it. This can be done by simply taking the ordered sequence of all Turing machines and removing every one that does not decide a language in B. We then define a Turing machine, N, that decides a language as follows: on input 1 it outputs the opposite answer of M1(1), on input 2 it outputs the opposite of M2(2), etc. This is where we flip the diagonal. The language that N decides is different from all languages in B, since for every machine in M it differs in output on at least one input. If we were able to make sure that the language that N decides is in A, we get the desired result: a language which is in A, but not in B.

3.1.3 The Time Hierarchy Theorem

A well-known theorem that uses diagonalization to separate complexity classes is the Time Hierarchy Theorem. It tells us that with more computation time, Turing machines can decide more languages. We call a function T : N → N time (or space) constructible if T(n) ≥ n and

there is a Turing machine that on input of some x outputs the binary representation of T(|x|) in time (or space) O(T(n)) [7, Sec. 1.3].

Time Hierarchy Theorem (Hartmanis, Stearns 1965 [27, Cor. 9.1]). If we have two time-constructible functions V and T satisfying T(n) log T(n) = o(V(n)), then

DTIME(T(n)) ⊊ DTIME(V(n)).

The theorem implies that P ≠ EXP. We will use the following corollary:

Corollary 3.1 (Hartmanis, Stearns [27, Cor. 2.7]). If T(n) ≥ n and x ∈ DTIME(T(n)), then ∀ε > 0 there exists a multi-tape TM that prints out the nth digit of x in (1 + ε)T(n) or fewer steps.

We can now give the proof of Theorem 3.1.3 as in [27].

Proof. Assume that V and T are functions as in the theorem. We create a sequence M = M1, M2, ... of TM's that compute all languages computable in time 2T. This can be done by adding an extra tape and a counter to all TM's. If a TM takes more than 2T(n) steps for a computation on an input x ∈ {0, 1}^n, a mark is added to the extra tape and we will remove the TM from the sequence. Corollary 3.1 now implies that for all L ∈ DTIME(T(n)) there exists an Mi in M such that Mi decides L.

Let U be the universal TM from Theorem 2.4. With this we define a TM N, that on input (xMi#i) outputs 1 − U(xMi#i). Here we flip the diagonal. Then N decides a language L′ ∉ DTIME(T(n)), since it differs on at least one input from every language in DTIME(T(n)).

We know that U runs in Ci·T(n) log T(n) time. Suppose that N begins to simulate U after Di time, for a constant Di. Then N operates in time Di + Ci·T(n) log T(n). But since T(n) log T(n) = o(V(n)), there exists an m0 ∈ N such that Di + Ci·T(m) log T(m) ≤ V(m) for all m > m0. For the m < m0 we could provide N with a table containing U(xMm#m), so the resulting machine N′ runs in V(n) time. We have found a language L′ ∈ DTIME(V(n)), but L′ ∉ DTIME(T(n)). This proves the theorem. □

3.1.4 Other diagonalization results

The diagonalization method has been used successfully for separating many more complexity classes. The Space Hierarchy Theorem, an analogue of the Time Hierarchy Theorem, concerns space-bounded complexity classes.

Definition 3.2 (The class L [14]). L = SPACE(log(n)).

Space Hierarchy Theorem (Stearns, Hartmanis, Lewis [7, 46]). If U and T are two space-constructible functions such that T(n) = o(U(n)), then SPACE(T(n)) ⊊ SPACE(U(n)).

The Space Hierarchy Theorem tells us that L ≠ PSPACE. But diagonalization can be used for other results than separations. Ladner proved the following, using diagonalization:

Theorem 3.3 (Ladner 1975 [34]). If P ≠ NP, then there exists a language L ∈ NP, L ∉ P, that is not NP-complete.

So if P ≠ NP, there are problems besides the NP-complete problems that we cannot efficiently compute. We call such languages in NP \ P that are not NP-complete NP-intermediate languages.

3.2 The barrier diagonalization imposes: relativization

3.2.1 Relativization and why diagonalization proofs relativize

All diagonalization results depend upon certain attributes of Turing machines. The Time Hierarchy Theorem uses a representation of Turing machines by strings, as well as the ability of one Turing machine to simulate other machines. The oracle Turing machine is also governed by these attributes. This implies that the Time Hierarchy Theorem also applies to languages decided by oracle Turing machines [7, Ch. 3.4]. This is known as a relativizing result, since it holds relative to any oracle. If we had a relativizing result showing that C ≠ D for complexity classes C and D, then also C^O ≠ D^O for all oracles O. All the other diagonalization arguments that I previously mentioned relativize as well, as do many others.

3.2.2 Relativizing results cannot solve the P vs NP problem

The following theorem shows that relativizing results alone are insufficient to solve P vs NP. We will give the proof of the theorem according to [7, Ch. 3.4].

Theorem 3.4 (Baker, Gill, Solovay 1975 [9]). There exist oracles A, B such that P^A = NP^A and P^B ≠ NP^B.

Proof. For the oracle A we will take a PSPACE-complete language. Then PSPACE^A ⊆ PSPACE, since A doesn't give PSPACE any more power. Also, PSPACE ⊆ P^A, because we can reduce every L ∈ PSPACE to A. By the same reasoning that gives P ⊆ NP ⊆ PSPACE, we have that P^A ⊆ NP^A ⊆ PSPACE^A. But then P^A = NP^A = PSPACE.

We will now construct a set B and a language L such that L ∈ NP^B, but L ∉ P^B. We let L = {1^n | ∃x ∈ B : |x| = n}, so L(1^n) = 1 iff 1^n ∈ L and 0 otherwise. Let M1, M2, ... be an enumeration of DTIME(2^n/10) oracle TMs. We will create B = ∪_{i≥1} B_i in stages. Let B1 = ∅. At stage i we will determine B_{i+1}. Let n_i be the smallest number that is bigger than the lengths of all strings in B_i. We run M_i^{B_i} on input 1^{n_i}. The behaviour of the oracle is as follows: if M_i^{B_i} queries a string x it has queried before, the oracle gives the same answer. Otherwise, it answers negatively and we decide that x ∉ B_{i+1}. We now want to make sure that M_i^{B_i} disagrees with L on at least one input. If M_i^{B_i} accepts 1^{n_i}, we decide that all strings of length n_i are not in B_{i+1}, so L(1^{n_i}) = 0. If M_i^{B_i} rejects, we can find a string x ∈ {0,1}^{n_i} that M_i^{B_i} has not queried (since M_i^{B_i} can make at most 2^{n_i}/10 queries), and we decide that x ∈ B_{i+1}, so L(1^{n_i}) = 1. For small n we have poly(n) > 2^n/10, but this is not a problem, since there is an infinite number of strings that represent each TM.

If we continue to do this at every stage, then L ∉ P^B. But L ∈ NP^B, since every TM in this class can non-deterministically guess a string of the right length and ask the oracle whether it is in B [7, Ch. 3.4]. □
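The combinatorial heart of the construction — a machine that makes fewer than 2^n oracle queries must miss some string of length n — can be sketched in Python (the helper name `find_unqueried` is hypothetical, introduced only for this illustration):

```python
from itertools import product

def find_unqueried(n, queried):
    """Return some x in {0,1}^n that is not in `queried`.

    Such an x exists whenever |queried| < 2**n, which holds for a
    machine making at most 2**n / 10 oracle queries.
    """
    for bits in product("01", repeat=n):
        x = "".join(bits)
        if x not in queried:
            return x
    raise ValueError("all 2**n strings were queried")

# A machine running in 2**n / 10 steps on input 1^n queries at most
# 2**n / 10 strings of length n, so an unqueried string always exists.
n = 5
queried = {format(i, "05b") for i in range(2**n // 10)}   # 3 queries here
x = find_unqueried(n, queried)
assert len(x) == n and x not in queried
```

Putting this unqueried x into B (or keeping all length-n strings out of B) is exactly how the construction makes M_i^{B_i} disagree with L.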

3.2.3 Non-relativizing diagonalization

We haven't formally defined diagonalization. We could define the diagonalization method as the process of flipping the diagonal. This is usually done by a universal Turing machine that takes the representations of Turing machines by strings as input. A proof that uses the diagonalization method with these techniques is relativizing. Kozen defined the diagonal as a function that flips the outcome of languages on some index. In most diagonalization proofs, such as the proof of the Time Hierarchy Theorem in Section 3.1.3, this index lies precisely on the diagonal line (as Cantor originally used it). But this is of course not necessary. Furthermore, Kozen shows that for any language f that is not in a given complexity class, there exists an index such that f is a diagonal. So if P ≠ NP is provable, then it is also provable by (non-relativizing) diagonalization [32].

4 Circuits and natural proofs

When studying the computation of a function, we can also consider the Boolean circuit that computes it. Proving lower bounds has been particularly successful for circuits with constant depth. But most lower bound proofs in circuit complexity are natural and cannot be used to separate P from NP.

4.1 Usage and successes

From a mathematical point of view, Boolean circuits are a lot simpler than Turing machines. They also allow us to use nonuniform models of computation, where we can use a different algorithm for different input sizes. By looking at multiple models of computation, we might increase our chances of solving P vs NP. Circuits also offer hope for circumventing relativizing techniques: by directly looking at the circuits that compute a function, we are no longer treating machines as black boxes, as we do with oracle machines.

4.1.1 The size and depth of circuits

Since non-uniform circuits can even compute undecidable languages, we need to restrict their size or depth to obtain interesting results. By considering circuits of polynomial size, we could eventually prove that P ≠ NP: we know that P ⊆ P/poly (see Section 2.2.4), so a goal in circuit complexity is showing that NP ⊄ P/poly. The first to gain results by considering the size of circuits was Shannon. He showed that there exist hard functions, which require large circuits for their computation:

Theorem 4.1 (Shannon 1949 [43]). Almost all Boolean functions on n variables can only be computed by Boolean circuits (with fan-in ≤ 2 and ∧, ∨, ¬ gates) of size Ω(2^n/n).

Circuit complexity also has its own, nonuniform, hierarchy theorem. It tells us that small circuits are able to compute fewer functions than large circuits:

Theorem 4.2 (Nonuniform Hierarchy Theorem [7, Th. 6.22]). For all functions T, T′ : N → N with 2^n/n > T′(n) > 10T(n) > n, SIZE(T(n)) ⊊ SIZE(T′(n)).

Apart from size, we can also look at the depth of a circuit. Parallel computing allows us to execute different computations at the same time by using several processing devices. In this case, the depth of a circuit is a more adequate measure for the cost of a computation. There are several complexity classes depending on the depth of a circuit. One of them is NC:

Definition 4.3 (The class NC [14]). For every i, NC^i is the class of decision problems that can be decided by a uniform family of circuits {C_n}, where every C_n has polynomial size and O(log^i n) depth. The allowed gates are ∨, ∧ with fan-in 2, and ¬. NC = ∪_{i≥0} NC^i.

The languages in NC are actually the languages that have efficient parallel algorithms [7, Th. 6.27]. Suppose a circuit has depth d and k·d gates. We can let each of k processors of a parallel computer take on the role of a single gate. Then the computation of gate i in layer j can be performed by processor i at time j. So the running time of a parallel algorithm is given by the depth of the circuit. Because we can simulate a parallel logarithmic-time algorithm with a (regular) polynomial-time one, NC ⊆ P. Another open question is whether P = NC, or, whether every efficient algorithm has an even faster parallel implementation [7, Sec. 6.7.2]. We can also allow the gates to have unbounded fan-in, so the ∨ and ∧ gates can take more than two inputs. The corresponding complexity class is the following:

Definition 4.4 (The class AC [14]). For every i, AC^i is the class of decision problems that can be decided by a nonuniform family of circuits {C_n}, where C_n has polynomial size and O(log^i n) depth. The allowed gates are ∨, ∧ with unbounded fan-in, and ¬. AC = ∪_{i≥0} AC^i.

We have that AC^i ⊆ NC^{i+1}, since every ∨ or ∧ gate with fan-in n can be simulated by a tree of fan-in-2 gates with depth ⌈log n⌉. This implies that NC = AC [7, Sec. 6.7.1].
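The simulation behind AC^i ⊆ NC^{i+1} can be sketched for a single unbounded fan-in OR gate; the balanced binary tree below is a toy illustration (the helper `or_tree` is hypothetical):

```python
import math

# Simulate one unbounded fan-in OR gate by a balanced tree of fan-in-2
# OR gates, and check that the tree has depth ceil(log2(n)).

def or_tree(inputs):
    """Return (value, depth) of a balanced binary OR tree over `inputs`."""
    if len(inputs) == 1:
        return inputs[0], 0
    mid = len(inputs) // 2
    left_val, left_depth = or_tree(inputs[:mid])
    right_val, right_depth = or_tree(inputs[mid:])
    return left_val | right_val, 1 + max(left_depth, right_depth)

xs = [0, 0, 1, 0, 0, 0, 0, 0]
value, depth = or_tree(xs)
assert value == max(xs)                         # same output as one OR gate
assert depth == math.ceil(math.log2(len(xs)))   # logarithmic depth
```

Replacing every unbounded fan-in gate this way multiplies the depth by O(log n), turning O(log^i n) depth into O(log^{i+1} n).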

4.1.2 Parity is not in AC0

Researchers started to focus on the computational power of non-uniform circuits with a low depth. There was hope for a proof that a problem in NP needs circuits with restricted depth. Then, we might be able to gradually remove the restrictions and find superpolynomial-size lower bounds for NP, showing that P ≠ NP [49]. One of the first big successes was the result that circuits in AC0, which are of constant depth, cannot compute Parity.

Definition 4.5 (Parity [7, Th. 14.1]). ⊕(x1, x2, ..., xn) = Σ_{i=1}^n x_i (mod 2).

Theorem 4.6 (Furst et al., Ajtai [24, 5]). ⊕ ∉ AC0.

This result was proven using restrictions on functions. With restrictions, some of the inputs are fixed, while others remain variable. A restriction on a function induces another function on the unrestricted variables. If a circuit C computes a function f, then for a restriction ρ, C^ρ computes f^ρ. Putting a restriction on the parity function still gives the parity function (possibly complemented), only on the unrestricted variables [24]. It can be shown that any circuit in AC0 can be made into a constant function by restricting some, but not all, of the input bits. However, Parity can only be made constant by restricting all input bits. So Parity cannot be computed by a circuit in AC0 [7, Sec. 14.1.1].
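A minimal sketch of restrictions, representing functions as Python callables on bit sequences (the helper `restrict` is hypothetical):

```python
from itertools import product

# A restriction fixes some inputs and leaves the rest free. Restricting
# Parity yields Parity (possibly complemented) on the free variables, so
# it only becomes constant once all variables are fixed; an AND gate, in
# contrast, collapses to a constant after fixing a single input.

def restrict(f, n, rho):
    """rho maps fixed positions to bits; returns the induced function on
    the remaining free positions (in increasing order) and their count."""
    free = [i for i in range(n) if i not in rho]
    def f_rho(ys):
        xs = [rho.get(i) for i in range(n)]
        for i, y in zip(free, ys):
            xs[i] = y
        return f(xs)
    return f_rho, len(free)

parity = lambda xs: sum(xs) % 2
and_fn = lambda xs: int(all(xs))

p_rho, m = restrict(parity, 4, {0: 1})   # fix x0 = 1, three variables free
a_rho, _ = restrict(and_fn, 4, {0: 0})   # fix x0 = 0

# Restricted parity is the complement of parity on the 3 free variables.
assert all(p_rho(ys) == (1 + sum(ys)) % 2 for ys in product([0, 1], repeat=m))
# Restricted AND is already the constant 0.
assert all(a_rho(ys) == 0 for ys in product([0, 1], repeat=m))
```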

4.1.3 Majority is not in AC0 with parity gates

We can take this one step further. Razborov was able to show in [39] that the Majority function is not contained in AC0, even with Parity gates.

Definition 4.7 (Majority [31]). Maj_n(x1, x2, ..., xn) = 1 iff Σ_{i=1}^n x_i ≥ n/2.

Instead of Majority, we'll consider the Threshold function:

Definition 4.8 (Threshold function). T^n_k(x1, x2, ..., xn) = 1 if Σ_{i=1}^n x_i ≥ k, and 0 otherwise.

For k = n/2 the Threshold function is equal to Majority. We will prove the statement for the Threshold function with k = ⌈n/2 + √n/2 + 1/2⌉. The proof of the result that T^n_k ∉ AC0 with parity gates, outlined by Jukna in [31], consists of two parts. First, it is shown that functions decided by small circuits can be approximated by low-degree polynomials. Subsequently, it is proven that the Threshold function cannot be approximated by such polynomials. In order to show the first part, we need the following lemma:

Lemma 4.9. Let f = Π_{i=1}^m f_i, where each f_i : F_2^n → F_2 is a polynomial of degree at most d over F_2. Then for all r ≥ 1, there exists a polynomial g : F_2^n → F_2 of degree at most dr that differs from f on at most 2^{n−r} inputs.

Proof. We will define g randomly and calculate an upper bound on the expected number of differences with f. Let S_1, ..., S_r ⊆ {1, 2, ..., m} be chosen uniformly at random. Then let g be defined as follows:

g = Π_{j=1}^r h_j,    where h_j = 1 − Σ_{i∈S_j} (1 − f_i).

We will calculate the probability that f and g differ on some fixed input a ∈ {0,1}^n. If f(a) = 1, then f_i(a) = 1 for all i, and thus g(a) = 1 = f(a).

If f(a) = 0, then f_{i0}(a) = 0 for at least one i0, and g(a) = 1 iff h_j(a) = 1 for all j. We have that h_j(a) = 1 ⇔ Σ_{i∈S_j} (1 − f_i(a)) = 0 over F_2. The probability that i0 ∈ {1, 2, ..., m} is contained in S_j is 1/2, and fixing the membership choices of all other indices, including or excluding i0 flips this sum, so Pr[h_j(a) = 1] ≤ 1/2. Given f(a) = 0, the probability that g(a) = 1 becomes:

Pr[g(a) = 1] = Pr[∀j : h_j(a) = 1] ≤ 2^{−r},

−r because the events hj(a) = 1 are independent for all j. So P r[f(a) 6= g(a)] ≤ 2 . n For an input a ∈ {0, 1} , let Xa be the indicator variable for the event g(a) 6= f(a). Let X be the sum of all Xa. Then X will be the number of inputs on which f and g dier. The expectation value of X can be calculated as follows:

E[X] = Σ_a E[X_a] = Σ_a Pr[X_a = 1] = Σ_a Pr[g(a) ≠ f(a)] ≤ Σ_a 2^{−r} = 2^{n−r}.

By an averaging (pigeonhole) argument, if E[X] ≤ t, then there exists a choice of the sets S_j for which X ≤ t. This choice gives a function g that fits the requirements of the lemma. □

Now let f be a function computed by a circuit of depth c and size ℓ. We will approximate f at every gate in the circuit. The inputs of f are polynomials of degree 1. The circuit can contain four kinds of gates. For a polynomial p, a ¬ gate gives 1 − p and the ⊕ gate gives the sum of its inputs; these are exact. But the ∧ gate gives the product of its inputs, and the ∨ gate can be constructed from ∧ and ¬, so for these gates we need Lemma 4.9. Since the circuit has ℓ gates, there will be at most ℓ approximations, so g will differ from f on at most ℓ·2^{n−r} inputs. The degrees of the approximating polynomials grow along each path from input to output, and each path contains at most c gates, so the final function g that approximates f will have degree at most r^c. Thus, a function f computed by a small circuit can be approximated by a low-degree polynomial.

For two vectors u and v, we write u ≤ v iff u_i ≤ v_i for all i. Then:

Lemma 4.10. Let f = Π_{i∈S} x_i be a monomial of degree d = |S| ≤ n − 1. If a = (a_1, ..., a_n) is a vector with a_i ∈ {0,1} and Σ_{i=1}^n a_i ≥ d + 1, then over F_2, Σ_{b≤a} f(b) = 0.

Proof. Since f has degree d, f(a) = 0 if there exists an i ∈ S such that a_i = 0; then f(b) = 0 for all b ≤ a, and the sum is 0. If f(a) = 1, then a_i = 1 for all i ∈ S. The number of vectors b ≤ a with f(b) = 1 is then equal to 2^m, with m the number of indices i ∉ S for which a_i = 1. Since Σ_{i=1}^n a_i ≥ d + 1, we have m ≥ 1, so 2^m is an even number and Σ_{b≤a} f(b) = 0 over F_2. □
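Lemma 4.10 can be checked exhaustively for small n. The brute-force sketch below verifies it for n = 4: for every monomial of degree d ≤ n − 1 and every a with at least d + 1 ones, the sum over all b ≤ a is even.

```python
from itertools import combinations, product

n = 4
for d in range(0, n):                        # degree d = |S| <= n - 1
    for S in combinations(range(n), d):
        f = lambda b: all(b[i] == 1 for i in S)   # the monomial prod_{i in S} x_i
        for a in product([0, 1], repeat=n):
            if sum(a) >= d + 1:              # the lemma's hypothesis on a
                below = [b for b in product([0, 1], repeat=n)
                         if all(bi <= ai for bi, ai in zip(b, a))]
                # sum over b <= a of f(b) must vanish over F_2
                assert sum(f(b) for b in below) % 2 == 0
```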

With this lemma we are able to show that the Threshold function cannot be approximated well by a low-degree polynomial:

Lemma 4.11. Let n/2 ≤ k ≤ n. Then every polynomial of degree d ≤ 2k − n − 1 over F_2 differs from T^n_k on at least (n choose k) inputs.

Proof. Let g be a polynomial of degree d ≤ 2k − n − 1 over F_2. Let U = {u | g(u) ≠ T^n_k(u)} and A = {(a_1, ..., a_n) | a_i ∈ {0,1}, Σ_{i=1}^n a_i = k}. We want to show that |U| ≥ (n choose k). For this we create a matrix M = (m_{a,u}): we index the rows and columns according to the members of A and U respectively, and m_{a,u} = 1 iff u ≤ a, and 0 otherwise. If it can be shown that the columns of M span the whole linear space F_2^{|A|}, then |U| ≥ |A| = (n choose k). Let a ∈ A and U_a = {u ∈ U | m_{a,u} = 1}. If it can be shown that for every b ∈ A:

Σ_{u∈U_a} m_{b,u} = 1 if b = a, and 0 otherwise,

then we get the desired result: over all b ∈ A, this sum gives the vector with a 1 only at index a. So all unit vectors are in the column span of M, and the column span thus equals F_2^{|A|}. Let a ∧ b be the vector with (a ∧ b)_i = 1 iff a_i = b_i = 1. Then the number of ones in a ∧ b is the number of indices where a and b both have a one. Since a and b both contain k ones, there are at least n − 2(n − k) = 2k − n of those indices. We have that d ≤ 2k − n − 1, so a ∧ b has at least d + 1 ones. Furthermore, we know that u ≤ a for every u ∈ U_a. So m_{b,u} = 1 iff u ≤ a ∧ b. Then:

Σ_{u∈U_a} m_{b,u} = Σ_{u∈U, u≤a∧b} 1 = Σ_{x≤a∧b} (T^n_k(x) + g(x)) = Σ_{x≤a∧b} T^n_k(x) + Σ_{x≤a∧b} g(x).

By linearity and Lemma 4.10, Σ_{x≤a∧b} g(x) = 0. And Σ_{x≤a∧b} T^n_k(x) = 1 iff a = b: a vector x ≤ a ∧ b has at most k ones, with equality only if x = a ∧ b and a = b. This proves the lemma. □

With Lemmas 4.9 and 4.11 we can prove the following theorem:

Theorem 4.12 (Razborov 1987 [39]). Every circuit with constant depth c, unbounded fan-in, and ∧, ∨ and Parity gates that computes T^n_k with k = ⌈n/2 + √n/2 + 1/2⌉ has size ℓ ≥ 2^{Ω(n^{1/(2c)})}.

Proof. Let f be a function computed by such a circuit of size ℓ, and let r = n^{1/(2c)}. According to Lemma 4.9 there is a polynomial g of degree at most r^c = √n that approximates f, making at most ℓ·2^{n−r} mistakes. Lemma 4.11 tells us that a polynomial of degree √n that approximates T^n_k makes at least (n choose k) = Ω(2^n/√n) mistakes. Since ℓ·2^{n−r} ≥ (n choose k), we get ℓ ≥ 2^{Ω(n^{1/(2c)})}. □

Suppose, towards a contradiction, that we had an efficient (i.e., polynomial-size) circuit of this kind for Majority. We could add zeroes to the end of an input of n bits, such that it becomes a string of length N = 2k. Then Maj_N = T^N_{N/2}, and on such padded inputs this computes T^n_k, so the latter function would also have a polynomial-size circuit. But this contradicts the theorem. We can conclude that Majority is not in AC0 with parity gates.
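The padding argument can be checked directly for small parameters (a toy verification; the helper names `majority` and `threshold` are introduced here for illustration):

```python
from itertools import product

# Appending N - n zeroes to x turns the threshold question
# "sum(x) >= k" into the majority question on N = 2k bits.

def majority(xs):
    return int(sum(xs) >= len(xs) / 2)

def threshold(xs, k):
    return int(sum(xs) >= k)

n, k = 6, 4                      # any k with n/2 <= k <= n works here
N = 2 * k
for x in product([0, 1], repeat=n):
    padded = list(x) + [0] * (N - n)
    assert majority(padded) == threshold(x, k)   # Maj_N on padding = T^n_k
```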

4.2 The barrier in circuit complexity: natural proofs

A common proof strategy in circuit complexity can be described as follows: first, some property of Boolean functions is defined and it is shown that all functions in some complexity class do not possess that property. Then, it is shown that a function f does have that property. This implies that f is not in the complexity class. Both Theorem 4.6 and Theorem 4.12 were proven this way. But Razborov and Rudich argue in [40] that, under a plausible assumption, such lower bound techniques cannot be used to prove that P ≠ NP.

4.2.1 What are natural proofs?

Natural proofs use a certain property that Boolean functions can possess. This property can be identified with a collection of Boolean functions {C_n | n ∈ ω}. Each C_n is itself a collection of Boolean functions on n variables. A function f_n possesses the property iff f_n ∈ C_n (or C_n(f_n) = 1).

Definition 4.13 (Natural proof [40]). The property that corresponds to a natural proof has three characteristics:

Constructiveness: The property C_n is computable in time polynomial in the size of the truth table of f_n. The truth table has size 2^n, so there is an algorithm running in 2^{O(n)} time that computes C_n(f_n).

Largeness: A random Boolean function g_n has probability at least 1/n^k, for some fixed k > 0, of being in C_n.

We call C_n a natural property if it is constructive and large. A property is useful against P/poly if it satisfies the following:

Usefulness: Every sequence of functions f_1, f_2, ..., f_n, ... with f_n ∈ C_n has super-polynomial circuit size. So g_n ∉ C_n for all {g_n} ∈ P/poly.

A property could also be useful against other complexity classes, with an obvious modification in the definition.

There is a motivation behind these characteristics. Constructiveness has an empirical justification: proofs of circuit lower bounds often use combinatorial techniques, like the proof shown above, and it turns out that almost all properties of Boolean functions used in combinatorics are at worst exponential-time decidable [7, 40]. The largeness condition makes sense intuitively: we want a random function g_n to have a non-negligible chance of having the property C_n. In fact, a lower bound on the circuit complexity of one function implies a lower bound on the complexity of many more functions [40, 7]. So a property that only applies to a small number of functions cannot be used for proving lower bounds. Finally, a property C_n has to be useful. Only then can we use it for separating complexity classes.

4.2.2 Most proofs in circuit complexity naturalize

In 1997 Razborov and Rudich showed that all circuit lower bound proofs known at the time naturalize, i.e., are natural proofs [40]. In order to show that a proof naturalizes, one has to find the property that is used and show that it is natural. The validation of the usefulness requirement is generally contained in the proof itself.

Parity ∉ AC0 naturalizes

Functions decided by circuits in AC0 become constant after fixing a number of inputs, but Parity can only become constant after fixing all variables. So the property that was used is C_n = {f_n : {0,1}^n → {0,1} | for all restrictions ρ on fewer than n variables, f_n^ρ is not constant}. C_n has the largeness condition, since a random function is unlikely to become constant after a restriction. It is also constructive: if we let k variables remain unfixed, we can calculate C_n(f) for some f : {0,1}^n → {0,1} by listing all (n choose k)·2^{n−k} = 2^{O(n)} possible restrictions [40].
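This brute-force computation of the property is feasible for tiny n. The sketch below (with the hypothetical helper `has_property`) checks that Parity has the property while AND does not:

```python
from itertools import combinations, product

# f has the property iff no restriction fixing fewer than n variables
# makes f constant. Parity has it; AND does not.

def has_property(f, n):
    for num_free in range(1, n + 1):             # leave >= 1 variable free
        for free in combinations(range(n), num_free):
            fixed = [i for i in range(n) if i not in free]
            for bits in product([0, 1], repeat=len(fixed)):
                rho = dict(zip(fixed, bits))
                values = set()
                for ys in product([0, 1], repeat=num_free):
                    xs = [0] * n
                    for i, b in rho.items():
                        xs[i] = b
                    for i, y in zip(free, ys):
                        xs[i] = y
                    values.add(f(xs))
                if len(values) == 1:             # restriction made f constant
                    return False
    return True

assert has_property(lambda xs: sum(xs) % 2, 3)        # Parity
assert not has_property(lambda xs: int(all(xs)), 3)   # AND
```

The running time is 2^{O(n)} in the number of variables, which is polynomial in the truth-table size 2^n, matching the constructiveness condition.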

Majority ∉ AC0 with parity gates naturalizes

The proof in Section 4.1.3 is a modification of the original proof of Razborov. Our variant maps the Threshold function to a matrix whose rank gives the minimal number of mistakes an approximating polynomial makes. Both proofs are similar, but Razborov's version gives a mapping M from all symmetric functions to matrices [39]. The property is C_n = {f_n : {0,1}^n → {0,1} | rank(M(f_n)) is large} [40]. In the proof, it is shown that all functions in AC0 with Parity gates can be approximated well by a low-degree polynomial. This means that the mapping M on functions in AC0 + ⊕ cannot give matrices with high rank. The property is thus useful against AC0 + ⊕. The calculation of the rank can be done in time polynomial in the truth-table size (constructiveness). And for at least half of all Boolean functions f_n, the rank of the matrix M(f_n) is high (largeness) [40].

4.2.3 Natural proofs cannot solve P vs NP

In their paper, Razborov and Rudich showed that the property C_n of a natural proof can be used to break a pseudorandom function family [40]. We can construct pseudorandom function families from one-way functions. Therefore, if one-way functions exist, then natural proofs cannot show that P ≠ NP. The existence of one-way functions is widely believed, and it implies that P ≠ NP [7, Sec. 9.2].

Definition 4.14 (T(|x|)-strong one-way function [7, Def. 9.4]). A polynomial-time computable function f : {0,1}* → {0,1}* is a T(|x|)-strong one-way function iff for every probabilistic T(|x|)-time algorithm A,

Pr[A(y) = x′ s.t. f(x′) = y] < n^{−c}

for every c and sufficiently large n ∈ N, where x ∈ {0,1}^n is chosen uniformly at random and y = f(x). From any one-way function we can construct a pseudorandom generator. This is a function that creates a longer pseudorandom string of bits on input of some other string, called the seed [7, Sec. 9.2.3]. From this generator we can construct a pseudorandom family of functions:

Definition 4.15 (T(|k|)-secure pseudorandom function family [7, Def. 9.16]). Let {f_k}, k ∈ {0,1}*, be a family of functions such that f_k : {0,1}^{|k|} → {0,1}, and there is a polynomial-time algorithm that, given k ∈ {0,1}* and x ∈ {0,1}^{|k|}, computes f_k(x). This family is T(|k|)-secure pseudorandom iff for every probabilistic oracle Turing machine A that runs in time T(|k|),

|Pr[A^{f_k}(1^n) = 1] − Pr[A^g(1^n) = 1]| < n^{−c}

for every c and sufficiently large n ∈ N, where g : {0,1}^n → {0,1} and k ∈ {0,1}^n are chosen uniformly at random.

We will assume the existence of 2^{n^ε}-strong (subexponentially strong) one-way functions for some fixed ε > 0. From these, we can create a 2^{|k|^{ε′}}-secure pseudorandom function family {f_k} for another ε′ > 0 (ε, ε′ < 1) [7, 28].

We will now discuss why natural proofs cannot show P ≠ NP. It turns out that a natural property enables us to distinguish between oracle access to a pseudorandom function from such a family and oracle access to a uniformly random Boolean function. The following theorem is due to Razborov and Rudich [40]. The proof comes from [7, Th. 23.1].

Theorem 4.16. If subexponentially strong one-way functions exist, then there is no natural lower bound proof useful against P/poly.

Proof. Assume towards a contradiction that there exists a natural lower bound proof that uses some natural property C_n useful against P/poly. There exists a c > 0 such that a uniformly random function on n variables has probability at least 1/n^c of being in C_n (largeness). Let {f_k} be a 2^{|k|^{ε′}}-secure pseudorandom function family. Let A be an algorithm with oracle access to a function h : {0,1}^m → {0,1}, and let n = m^{ε′/2}. The function h can be a uniformly random function, or f_k for some k ∈ {0,1}^m. Define a function g : {0,1}^n → {0,1} as g(x) = h(x0^{m−n}): m − n zeroes are added to the input, which is then sent as input to h. We let A compute C_n(g). If h is f_k, then f_k can be computed in time polynomial in m, and thus also in time polynomial in n. This means that g can be computed in polynomial time, and thus by a polynomial-size circuit. So C_n(g) = 0, since C_n is useful against P/poly. If h is a uniformly random function, then g is also a uniformly random function, and thus C_n(g) = 1 with probability at least 1/n^c. C_n(g) can be computed in 2^{O(n)} time (constructiveness). So A distinguishes between oracle access to f_k and oracle access to a uniformly random function in less than 2^{m^{ε′}} time, which is a contradiction. □

Thus, natural proofs cannot separate NP from P/poly. In order to circumvent this barrier, proofs must violate either the largeness or the constructiveness condition.

4.2.4 Non-naturalizing results

That Majority is not in AC0 even with Parity gates was one of the last remarkable circuit lower bounds for quite some time. After 1987 there were no significant new general lower bounds in circuit complexity [19], until Ryan Williams was able to show that NEXP ⊄ AC0 with MOD_m gates in [49] (a MOD_m gate outputs 1 iff m divides the sum of its inputs, where m > 1 is an arbitrary constant). The result is non-relativizing and non-natural, and it has been called the "first baby step" towards solving P vs NP [18]. This will be discussed in Chapter 6. There has been another noteworthy result: Buhrman et al. showed that MA_exp (the class of languages computable with a two-round public-coin interactive proof with an exponential-time verifier) does not have polynomial-size circuits [11]. This is another result that is non-natural and non-relativizing, since it uses diagonalization and arithmetization respectively [4].

5 Arithmetization and algebrization

Arithmetization is used for representing Boolean functions by polynomials, and it is especially useful in interactive proofs. However, most lower bound proofs that use arithmetization algebrize and cannot solve P vs NP.

5.1 The usage of arithmetization

Arithmetization circumvents the relativization barrier. It has also been used in circuit complexity: Razborov's result that Majority is not in AC0 with parity gates used a representation of Boolean formulas by polynomials, as did the similar result of Smolensky [39, 45].

5.1.1 What is arithmetization?

Arithmetization is used for extending Boolean formulas to polynomials. The arithmetization of a Boolean formula is a polynomial whose outputs coincide with those of the formula on Boolean inputs. This way, we can use the polynomial instead of the formula.

Definition 5.1 (Arithmetization [30, B.2.2]). Arithmetization is the conversion of a Boolean formula ϕ : {0,1}^n → {0,1} to a multivariate polynomial ϕ̃ : F^n → F, for a finite field F. Let ϕ be a Boolean formula on the variables x_1, ..., x_n, such that ϕ has no ∨ symbols. We can define ϕ̃ by induction on the structure of ϕ:

x_i ↦ x_i ∈ F[x_1, ..., x_n]
¬φ ↦ 1 − φ̃
φ ∧ ψ ↦ φ̃ · ψ̃

Now, for all x ∈ {0,1}^n: ϕ(x) = ϕ̃(x).
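A small sketch of Definition 5.1 for a concrete, made-up formula, checking agreement on all Boolean inputs (the names `phi` and `phi_tilde` are illustrative only):

```python
from itertools import product

# Arithmetizing a Boolean formula built from "not" and "and": each
# gate is replaced by 1 - p and p * q respectively. The resulting
# polynomial agrees with the formula on all Boolean inputs.

def phi(x1, x2, x3):                       # not(x1 and not x2) and x3
    return (not (x1 and (not x2))) and x3

def phi_tilde(x1, x2, x3):                 # its arithmetization
    return (1 - x1 * (1 - x2)) * x3

for x in product([0, 1], repeat=3):
    assert int(phi(*x)) == phi_tilde(*x)
```

On non-Boolean field elements the polynomial ϕ̃ of course takes values the formula never does; only the restriction to {0,1}^n is prescribed.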

5.1.2 Rounds of interaction and deterministic interactive proofs

The first results where arithmetization played a major role in the proof concerned interactive proofs. These results were due to Lund et al. and Shamir, who showed that IP = PSPACE [42, 36] (of which more later), and Babai et al., who proved that MIP = NEXP [8]. The class of languages that have probabilistic interactive proofs is IP, while MIP contains languages that have protocols with multiple provers [14]. We will first consider deterministic interactive proofs, which consist of an interaction between a deterministic verifier and prover. The verifier asks the prover questions and verifies its responses. The interaction consists of several rounds:

Definition 5.2 (A k-round deterministic interaction [7, Def. 8.2]). Let f, g : {0,1}* → {0,1}* be functions and k ≥ 0 an integer. A k-round interaction on input x ∈ {0,1}* is defined as a sequence of strings a_1, ..., a_k ∈ {0,1}*:

a1 = f(x)

a2 = g(x, a1)
...

a2i+1 = f(x, a1, . . . , a2i) (2i < k)

a2i+2 = g(x, a1, . . . , a2i+1) (2i + 1 < k)

The output of the interaction is defined as f(x, a1, ..., ak).

In the end, the verifier is either convinced of the statement the prover was trying to prove (f(x, a_1, ..., a_k) = 1), or not. Languages decided by deterministic interactive proof systems correspond to the following class:

Definition 5.3 (The class dIP [7, Def. 8.3]). A language L ∈ dIP iff there is a k-round deterministic interaction between a polynomial-time TM V and a prover P, where k is at most a polynomial in the size of the input x ∈ {0,1}*, such that:
1. If x ∈ L, then there is a prover P : {0,1}* → {0,1}* such that V accepts.
2. If x ∉ L, then for all provers P : {0,1}* → {0,1}*, V rejects.

The prover has unlimited computational power, but it cannot make the verifier accept a false statement. Since the verifier and prover are deterministic, all questions and answers can be announced immediately: there is no need for more than one round of interaction. We have that NP = dIP: the existence of a certificate that can be verified in polynomial time corresponds to the existence of an interaction transcript. Namely, if such a certificate exists, there is a 1-round deterministic interaction in which the verifier accepts. And if there is a transcript a_1, ..., a_k of an accepting k-round interaction, then (a_1, ..., a_k) can serve as a certificate: on input x, a polynomial-time Turing machine V can check that V(x) = a_1, V(x, a_1, a_2) = a_3, ..., V(x, a_1, ..., a_k) = 1 [7, Lem. 8.4].

5.1.3 Probabilistic interactive proofs and the class IP

In probabilistic interactive proofs, true statements get rejected and false statements get accepted with at most a small probability. The verifier is now a probabilistic Turing machine. The verifier V and the prover P engage in a conversation, consisting of several rounds of interaction. Every other round, V generates a random string, the coin, that it uses in its verification. But P cannot see this coin. A formal definition of such an interaction is as follows:

Definition 5.4 (A k-round interaction with private coins [7, Sec 8.1.2]). Let f, g : {0,1}* → {0,1}* be functions, k ≥ 0 an integer, x ∈ {0,1}* an input, and r ∈ {0,1}^m a random string generated by f. A k-round probabilistic interaction with private coins is defined as a sequence of

strings a_1, ..., a_k ∈ {0,1}*:

a1 = f(x, r)

a2 = g(x, a1)
...

a2i+1 = f(x, r, a1, . . . , a2i) (2i < k)

a2i+2 = g(x, a1, . . . , a2i+1) (2i + 1 < k)

The output of the interaction is defined as f(x, r, a1, ..., ak).

Every other round, f uses the random string r. It can use the same part of r each time, but it can also divide r into disjoint sequences and use a different part of r every round. Since the interactions are random variables, the output is too. Because of the randomness in the interaction, the verifier can sometimes accept a false statement. We want correct statements to be accepted with high probability and wrong statements with low probability:

Definition 5.5 (The class IP [14, 7]). A language L ∈ IP iff there is a k-round private coin interaction a_1, ..., a_k between a probabilistic polynomial-time TM V and a prover P, where k is at most a polynomial in the size of the input x ∈ {0,1}*. Furthermore:
1. If x ∈ L, there is a P : {0,1}* → {0,1}* such that V accepts with probability at least 2/3.
2. If x ∉ L, for all P : {0,1}* → {0,1}* the probability that V accepts is at most 1/3.

5.1.4 IP = PSPACE

It was known for some time that IP ⊆ PSPACE [7, Sec. 8.1.1]: for a language L ∈ IP, we can compute the probability that a verifier accepts an input x using polynomial space, which tells us whether x ∈ L or not. Most researchers thought that IP ⊆ PSPACE would be a proper containment, since there are oracles relative to which this is the case. In 1992 Shamir [42], following up on Lund et al. [36], was able to show the opposite, using arithmetization.

Theorem 5.6 ([42, 36]). PSPACE ⊆ IP.

In order to prove this, we will show that a PSPACE-complete language, TQBF, is in IP. This language consists of True Quantified Boolean Formulas; it can be understood as SAT for formulas with quantifiers. The proof below gives an interaction protocol for TQBF, and the presentation is inspired by [7, Sec. 8.3.3].

Proof. Let ψ = ∀x1 ∃x2 ··· ∀x_{n−1} ∃x_n ϕ(x1, ..., xn) be a TQBF with ϕ : {0,1}^n → {0,1}. We will construct a polynomial-time verifier V and a prover P, such that TQBF ∈ IP. The idea is to use arithmetization to let P convince V that ψ is true. Let ϕ̃ be the arithmetization of ϕ over a finite field F_p, for a right choice of p, on which more later. We know that ψ is true iff

Ψ = Π_{b1∈{0,1}} Σ_{b2∈{0,1}} ··· Π_{b_{n−1}∈{0,1}} Σ_{bn∈{0,1}} ϕ̃(b1, ..., bn) ≠ 0.

In order to convince V that Ψ ≠ 0, P will try to show V that Ψ = K for some K ∈ F_p. In the interaction, P will provide V with the univariate polynomial

h(x1) = Σ_{b2∈{0,1}} ··· Π_{b_{n−1}∈{0,1}} Σ_{bn∈{0,1}} ϕ̃(x1, b2, ..., bn).

V will then verify that h(0) · h(1) = K. More rounds will follow, so V can check that P is not cheating. However, there are two problems. Because of the products, the degree of h can get exponentially big. If we had a TQBF with only universal quantifiers, the degree could get as high as 2^n. The value of Ψ can thus also be double-exponential. A polynomial-time verifier cannot read the binary string of a double-exponential number, nor can it receive the possible 2^n coefficients of the polynomial. We will thus work modulo some prime number p to lower the values involved in the interaction. The choice of p is such that Ψ will get a non-zero value and such that all values that arise in the interaction are low enough for V. In order to reduce the degrees of the polynomials that will arise in the interaction, we use a linearization operator Li. For a polynomial q:

Liq(x1, . . . , xm) = (1−xi)q(x1, . . . , xi−1, 0, xi+1, . . . , xm)+xi·q(x1, . . . , xi−1, 1, xi+1, . . . , xm)

This way, xi has power at most 1 in the expression Liq, and q will agree with Liq on inputs in {0, 1}. Then let d be an upper bound on the degree of all polynomials involved, which is known to the verifier. For the interaction, we will also treat the quantifiers ∀xi, ∃xi as operators Ai, Ei:

Aiq(x1, . . . , xm) = q(x1, . . . , xi−1, 0, xi+1, . . . , xm) · q(x1, . . . , xi−1, 1, xi+1, . . . , xm)

Eiq(x1, . . . , xm) = q(x1, . . . , xi−1, 0, xi+1, . . . , xm) + q(x1, . . . , xi−1, 1, xi+1, . . . , xm)

The linearization operator should be applied for every free variable. So instead of Ψ, we will now use the expression

P = A1 L1 E2 L1 L2 · · · An−1 L1 · · · Ln−1 En L1 · · · Ln ϕ̃(x1, . . . , xn)

and let P show that it is equal to K. The protocol can be described in the following way: In the first round, V asks P to prove that ψ is true. In the second round, P says that he will show that P = K, for some K ∈ Fp. In the third round, V can verify the expression P and check that p is indeed a prime number.

Round 4 and 5: A1

1. P provides a polynomial h1(x1) of degree at most d that is supposed to equal L1E2 ··· Lnϕ˜(x1, . . . , xn).

2. V checks whether h1(0) · h1(1) = K.

3. If h1(0) · h1(1) ≠ K, V rejects. Otherwise, V takes a random a1 ∈ Fp, and asks P to show that h1(a1) = L1E2 · · · Ln ϕ̃(a1, x2, . . . , xn).

Round 6 and 7: L1

1. P provides a polynomial h2(x1) that is supposed to equal E2L1L2A3 ··· Lnϕ˜(x1, . . . , xn).

2. V checks whether (1 − a1)h2(0) + a1h2(1) = h1(a1).

3. If this is not the case, V rejects. Otherwise it takes a random a2 ∈ Fp, and asks P to show that h2(a2) = E2L1 ··· Lnϕ˜(a2, x2, . . . , xn).

Round 8 and 9: E2

1. P provides a polynomial h3(x2) that is supposed to equal L1L2A3 ··· Lnϕ˜(a2, x2, . . . , xn).

2. V checks whether h3(0) + h3(1) = h2(a2).

3. If this is not the case, V rejects. Otherwise it takes a random a3 ∈ Fp, and asks P to show that h3(a3) = L1L2A3 · · · Ln ϕ̃(a2, a3, x3, . . . , xn).
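As an aside, the arithmetization and the operators Ai, Ei and Li can be tried out concretely. A minimal sketch follows; the two-variable formula, the prime p = 101 and the 0-based indexing are illustrative choices of this sketch, not part of the protocol itself.

```python
# Quantifiers and linearization as operators on functions over F_p,
# mirroring A_i, E_i and L_i from the protocol (indices are 0-based here).
p = 101  # illustrative prime

def A(i, q):
    """Universal quantifier: multiply out the i-th variable."""
    return lambda *xs: (q(*xs[:i], 0, *xs[i+1:]) * q(*xs[:i], 1, *xs[i+1:])) % p

def E(i, q):
    """Existential quantifier: sum out the i-th variable."""
    return lambda *xs: (q(*xs[:i], 0, *xs[i+1:]) + q(*xs[:i], 1, *xs[i+1:])) % p

def L(i, q):
    """Linearization: make the result have degree at most 1 in the i-th variable."""
    return lambda *xs: ((1 - xs[i]) * q(*xs[:i], 0, *xs[i+1:])
                        + xs[i] * q(*xs[:i], 1, *xs[i+1:])) % p

# Arithmetization of phi(x1, x2) = x1 OR x2.
phi = lambda x1, x2: (1 - (1 - x1) * (1 - x2)) % p

# psi = forall x1 exists x2 : phi, so Psi = A1 E2 phi~ (no free variables).
Psi = A(0, E(1, phi))(0, 0)   # dummy arguments, both variables are bound
print(Psi != 0)               # True: psi is a true TQBF

# L agrees with the original polynomial on Boolean inputs.
q = lambda x1, x2: (x1 * x1 * x2) % p     # degree 2 in x1
lin = L(0, q)                              # degree 1 in x1
print(all(lin(a, b) == q(a, b) for a in (0, 1) for b in (0, 1)))  # True
```

On Boolean points Li changes nothing, but away from {0, 1} it caps the degree, which is exactly what keeps the prover's messages small.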

This will continue in a recursive manner. Every other round, V wants to know whether O q(xj) = K′, for some K′ ∈ Fp and polynomial q, where O can be either Ai, Ei or Li.

The correctness of the protocol can be verified in the following way. If P ≠ K, and for hi(xj) P provides the polynomial q(xj), but (e.g.) hi(0) · hi(1) ≠ hi−1(ai−1), V rejects. Now assume that P ≠ K and P provided a polynomial hi not equal to q. The polynomials hi and q have degree at most d, so the polynomial hi − q has at most d roots. Thus, for a random a ∈ Fp: Pr[hi(a) = q(a)] ≤ d/p, since there are at most d values of a for which hi(a) = q(a). If hi(a) ≠ q(a), P still has to prove an incorrect statement the next round, and Pr_a[hi(a) ≠ q(a)] ≥ 1 − d/p. So every round, V's chance of catching a cheating P is preserved up to a factor of at least (1 − d/p). In the end, V will reject a false statement with probability at least (1 − d/p)^m, with m the number of operators in P, and (1 − d/p)^m > 2/3 with the right choice of p. If P = K for some K ∈ Fp, then P is able to convince V to accept with probability 1.

The number of rounds in the interaction is then equal to 2m + 3. In P there are n + n + (n − 1) + (n − 2) + · · · + 1 operators, so m = O(n²). We can now conclude that TQBF ∈ IP. □

5.2 The barrier arithmetization imposes: algebrization

Arithmetization lets us consider polynomials instead of Boolean functions. This motivates us to consider another kind of oracle. We defined an oracle as a language Lf = {x : f(x) = 1} that an oracle Turing machine has access to. An equivalent definition, one that we will use for the rest of this chapter, is a collection of Boolean functions fm : {0, 1}^m → {0, 1}, one for each m ∈ N. When we show that some proof relativizes, we can give the oracle Turing machine access to the arithmetized polynomials as well. This way we can find another barrier towards solving the P vs NP problem [4].

5.2.1 What are algebrizing results?

Aaronson and Wigderson defined an extension oracle:

Definition 5.7 (Extension oracle over a finite field). Let Am : {0, 1}^m → {0, 1} be a Boolean function and Ãm,F : F^m → F its algebrization over a finite field F. Given an oracle A = (Am), m ∈ N, an extension Ã of A is the collection (Ãm,F), one for each m ∈ N and each finite field F.

Given a complexity class C, C^Ã is the class of languages decided by Turing machines that decide languages in C, only now every machine can query Ãm,F for every m ∈ N and finite field F. This means that with one step the Turing machine can ask whether Ãm,F(x) ≠ 0 for some x ∈ F^m.

With relativization in mind, we can define the notion of algebrizing results:

Definition 5.8 (Algebrization, [4, Def. 2.3]). A complexity class inclusion C ⊆ D algebrizes if C^A ⊆ D^Ã for all oracles A and their finite field extensions Ã. A separation C ⊄ D algebrizes if C^Ã ⊄ D^A for all A, Ã.

There is a reason that in this definition one of the classes has access to the regular oracle, while the other one has access to the extended oracle: only this way do we know how to prove that existing results algebrize. Oracles A and Ã contain functions that coincide on Boolean inputs, so for a class C we have that C^A ⊆ C^Ã. Therefore, all relativizing results algebrize.
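A concrete way to picture an extension oracle is the unique multilinear polynomial over Fp that agrees with a Boolean function on {0, 1}^m. This is a sketch: Definition 5.7 allows any extension, and the choice of the multilinear one and of the prime 97 are assumptions made here for illustration.

```python
from itertools import product

p = 97  # illustrative prime; any finite field works

def multilinear_extension(f, m):
    """The unique multilinear polynomial over F_p agreeing with f on {0,1}^m.

    One natural candidate for the extension A~_{m,F} that an algebrizing
    machine may query at non-Boolean points.
    """
    def f_ext(*xs):
        total = 0
        for b in product((0, 1), repeat=m):
            term = f(*b)
            for bi, xi in zip(b, xs):
                term *= (xi if bi else (1 - xi))   # indicator of point b
            total += term
        return total % p
    return f_ext

AND = lambda x, y: x & y
AND_ext = multilinear_extension(AND, 2)

# Agrees with AND on all Boolean inputs ...
print(all(AND_ext(a, b) == (a & b) for a in (0, 1) for b in (0, 1)))  # True
# ... but can also be queried at non-Boolean points of F_p^2:
print(AND_ext(3, 5))   # 15, i.e. 3 * 5 mod 97
```

Such non-Boolean queries are exactly what a machine with the plain oracle A cannot make, which is why C^A ⊆ C^Ã.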

5.2.2 Most results that use arithmetization algebrize

Aaronson and Wigderson show that a number of non-relativizing results also algebrize [4]. This includes IP = PSPACE and MIP = NEXP. There are also lower bounds in circuit complexity that algebrize (e.g. MAexp ⊄ P/poly, as mentioned in Section 4.2.4). In order to see that PSPACE ⊆ IP algebrizes, we first have to note that TQBF^A is a PSPACE^A-complete language [29, Sec. 3]. TQBF^A is a formula with A-gates in addition to the ¬, ∧ and ∨ symbols. If we follow the proof of Theorem 5.6 and arithmetize the formula, we will end up with a polynomial containing Ã-gates. For the evaluation of this polynomial, V can use its access to Ã. Therefore, PSPACE^A ⊆ IP^Ã [4].

5.2.3 Algebrizing results cannot solve P vs NP

It turns out that algebrization is another barrier in solving P vs NP.

Theorem 5.9 ([4, Thm. 5.1, 5.3]). There exist oracles A, B and extensions Ã, B̃ such that NP^Ã ⊆ P^A and NP^B ⊄ P^B̃.

The theorem and the proof are very similar to the relativization result of Baker et al. in Section 3.2.2. We do need an extra lemma in order to give the proof.

Lemma 5.10 ([4, Lem. 4.5]). Let F be a collection of fields and let f : {0, 1}^n → {0, 1} be a Boolean function. For every F ∈ F, let pF : F^n → F be a polynomial over F extending f. Also let YF ⊆ F^n for each F ∈ F and t := ∑_F |YF|. Then there exists a subset B ⊆ {0, 1}^n with |B| ≤ t, such that for all Boolean functions g : {0, 1}^n → {0, 1} that agree with f on B, there exist polynomials qF : F^n → F such that:

1. qF extends g

2. qF(y) = pF(y) for all y ∈ YF

Proof (of Theorem 5.9). For the oracle A we can take any PSPACE-complete language. Considering the proof of Theorem 5.6, we know that the arithmetization of a language in PSPACE is still in PSPACE. Then, with the same argument as in Section 3.2.2, we have that NP^Ã = NP^A = PSPACE = P^A.

For creating the oracles B = ∪_{i=1}^∞ Bi and B̃ = ∪_{i=1}^∞ B̃i we take the language L consisting of all strings 1^n for which there is a w ∈ B (or B̃) with length n. For every n, we will construct a Boolean function fn : {0, 1}^n → {0, 1}. B will contain these fn, while B̃ will contain the arithmetized f̃n,F for every finite field F.

Let M1, M2, . . . be an enumeration of DTIME(n^log(n)) extension oracle TM's. We will create B̃ in several stages, and with that we will also choose our B. At stage i we will define B̃i+1; B̃1 until B̃i have already been defined. Let ni be the smallest number that is bigger than all strings in B̃i. Then, the simulation of Mi on input 1^ni and the corresponding behaviour of the oracle B̃i are as follows: If Mi queries a string that has been queried in a previous round, we let the oracle give the same answer as it did before. Otherwise: return 0. When Mi^B̃i halts, we put f̃n,F := 0, the constant-0 function, in B̃i+1, for all n for which Mi^B̃i queried a string of that length, except strings of length ni.

For the strings of length ni the behaviour is the following: If Mi^B̃i accepts on input 1^ni, fix f̃ni,F := 0 for all finite F. Then L(1^ni) = 0. If Mi^B̃i rejects on input 1^ni, let, for all F, YF be the set of y ∈ F^ni that Mi^B̃i queried. We have that ∑_F |YF| ≤ ni^log(ni). Let fni and f̃ni,F be the functions such that f̃ni,F corresponds with the queries that have been made. We can find another function f′ni that agrees with fni on all the queried inputs, but f′ni(w) = 1 for some w ∈ {0, 1}^ni that has not been queried. By Lemma 5.10 there exist extensions f̃′ni,F of f′ni that agree with f̃ni,F on the queried inputs, but f̃′ni,F(w) = 1. If we put these f̃′ni,F in B̃i+1, we have that L(1^ni) = 1. So we have found an oracle B̃ such that L ∉ P^B̃, but L ∈ NP^B, with the same argument that Baker et al. use. □

5.2.4 Other results that need non-algebrizing techniques

Algebrization explains why we don't have superlinear circuit lower bounds for NP. Besides P vs NP, there are other open problems that need non-algebrizing techniques, e.g. NP vs P/poly or NEXP vs P/poly. Aaronson and Wigderson propose several methods for evading the arithmetization barrier. We could use "recursive" arithmetization instead: first arithmetize a formula, reinterpret the result as a Boolean function and arithmetize again. This could be done several times. We cannot prove P ≠ NP this way, but other separations might benefit from this method. We could also try to find properties of the polynomial produced by arithmetization that oracles cannot have access to [4].

6 Evaluation of the current situation

There are a few remarks about P vs NP that should still be taken into consideration, and some additional proof techniques that can be discussed.

6.1 Remarks

6.1.1 P vs NP could be independent of ZFC

It is possible that P ≠ NP or P = NP is independent of the chosen mathematical axioms, such as ZFC. So far, there have been several attempts towards proving this. We could start by showing that P ≠ NP is unprovable in weaker logical theories. But even Shannon's Theorem 4.1 turned out to be unprovable in some of these. Ultimately, we want to prove that P vs NP is independent of a strong theory, but our current proof techniques are probably incapable of this [3].

Sipser believes that P vs NP is not formally independent; we just aren't able to solve it yet [44]. To support this assumption, Aaronson formulated the following criterion for mathematical truth: we should expect a mathematical question to have a definite answer if and only if we can phrase the question in terms of a physical process we can imagine.

6.1.2 The natural proofs barrier

Razborov and Rudich provided us with one of the barriers towards solving P vs NP: natural proofs [40]. For each of the properties of a natural proof they were able to give a justification. The justification for constructivity, however, is only empirical. They argue that all properties of Boolean functions that have been used in actual circuit lower bounds are constructive. Nonetheless, it might still be possible to provide a circuit lower bound with a property that does not satisfy constructivity. Furthermore, natural proofs are only a barrier towards showing that P ≠ NP.

When Razborov and Rudich proved that we cannot show that P ≠ NP with natural proofs, they assumed the existence of strong one-way functions. The existence of these functions implies that P ≠ NP [7, Sec. 9.2]. Thus, in order to give their proof they had to make an even stronger assumption than P ≠ NP. Considering all this, we could still use natural proofs as a guide towards the techniques needed for lower bounds. But they do not have to keep us from finding them [21].

6.1.3 The opinions of researchers about P vs NP

Current research mainly focuses on showing that P ≠ NP. However, even Gödel thought that the world of P = NP could be reality. Some researchers think that working towards P ≠ NP is not the right way to solve the problem [26].

"Being attached to a speculation is not a good guide to research planning. One should always try both directions of every problem. Prejudice has caused famous mathematicians to fail to solve problems whose solution was opposite of their expectations, even though they had developed all the methods required." Anil Nerode in [26].

In 2002 and 2012 Gasarch conducted a poll to gather the opinions of researchers in this field about P vs NP. Both times, almost 10% of them thought that P = NP and about 4% thought that P vs NP is independent of ZFC. In 2002, 61% was convinced that P ≠ NP and in 2012 this was 83% [25].

6.2 Other methods for proving P ≠ NP

We discussed the most important proof methods towards showing that P ≠ NP, but there are some notable techniques left.

6.2.1 Proof complexity

The class coNP contains the languages whose complement is in NP. We know that the language TAUTOLOGY of all tautologies is coNP-complete. Proof complexity studies the resources required to prove propositional tautologies. An open problem in this area is whether every Boolean tautology has a short propositional proof. If we are able to show that there is no short certificate certifying that all assignments make a formula true, i.e., TAUTOLOGY ∉ NP, then coNP ≠ NP, which implies P ≠ NP [7, Sect. 2.6]. Recently researchers started a new approach towards proof complexity, using algebraic complexity. This interaction is expected to give many new insights [38].
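The gap between checking and proving can be made concrete: the only generic certificate we know for a tautology is its full truth table, which is exponentially large. A small illustrative sketch (the two formulas below are made-up examples):

```python
from itertools import product

def is_tautology(formula, n):
    """Check a Boolean formula on n variables by trying all 2^n assignments.

    The full truth table is the trivial 'proof' of a tautology, but its
    size grows exponentially with n; proof complexity asks whether every
    tautology also has a proof of polynomial size.
    """
    return all(formula(*bits) for bits in product((False, True), repeat=n))

# Law of the excluded middle, padded with a second variable:
print(is_tautology(lambda a, b: (a or not a) or b, 2))   # True
# a or b is falsified by a = b = False:
print(is_tautology(lambda a, b: a or b, 2))              # False
```

A proof system like Resolution tries to replace the 2^n-row table by a much shorter derivation; whether this is always possible is exactly the open problem above.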

6.2.2 Autoreducibility

Buhrman et al. propose another method for separating complexity classes: using autoreducibility [12]. A language L is autoreducible if there is a polynomial-time oracle machine M^L that accepts L, but M^L(x) does not query whether x ∈ L. They show that complexity classes differ by proving that all languages in one class are autoreducible, while some language in the other isn't. This way, they were able to obtain the result L ≠ PSPACE, as in Section 3.1.4.

These separations rely on a difference in structural properties, so the results do not relativize: there are cases where O is some oracle and C is a class that only contains autoreducible languages, but C^O contains a language that isn't autoreducible. Classes that are inseparable with known diagonalization techniques might be separable with autoreducibility.
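A toy illustration of autoreducibility, exploiting SAT's self-reducibility. The CNF encoding and the brute-force oracle below are choices of this sketch, not the construction of Buhrman et al.; the point is only that the machine decides its input while querying the oracle exclusively on *other* strings.

```python
from itertools import product

def sat_oracle(clauses, n):
    """Brute-force SAT oracle; clauses are sets of literals (+i / -i)."""
    return any(
        all(any((lit > 0) == assignment[abs(lit) - 1] for lit in clause)
            for clause in clauses)
        for assignment in product((False, True), repeat=n)
    )

def restrict(clauses, var, value):
    """Fix variable `var` to `value` and simplify the formula."""
    simplified = []
    for clause in clauses:
        if (var if value else -var) in clause:
            continue                                  # clause already true
        simplified.append(frozenset(l for l in clause if abs(l) != var))
    return simplified

def autoreduce(clauses, n):
    """Decide SAT for `clauses` while querying the oracle only on the
    two *restricted* formulas -- never on `clauses` itself."""
    return sat_oracle(restrict(clauses, 1, False), n) or \
           sat_oracle(restrict(clauses, 1, True), n)

phi = [frozenset({1, 2}), frozenset({-1, 2})]   # (x1 or x2) and (not x1 or x2)
print(autoreduce(phi, 2))                       # True: x2 = True works
print(autoreduce([frozenset({1}), frozenset({-1})], 1))   # False: x1 and not x1
```

Both queried formulas differ from the input (variable x1 has been eliminated), which is what the definition of autoreducibility demands.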

6.2.3 Showing lower bounds from upper bounds

Geometric complexity theory

Ketan Mulmuley has outlined a plan for showing P ≠ NP with the use of Geometric Complexity Theory (GCT). His idea is to flip the (negative) lower bound of P ≠ NP to an equivalent (positive) upper bound: showing that a series of decision problems in representation theory and algebraic geometry is in P.

He wants to achieve this in the following way. If we take an NP-complete function N, we want to show that N cannot be computed by circuits of polynomial size. This implies P ≠ NP. We can show that N cannot be computed by polynomial-sized circuits if we can prove the existence of a series of obstructions. These obstructions are subrepresentations as they are known in representation theory and algebraic geometry. If it can be shown that the construction and verification of these obstructions can be done in polynomial time (i.e., if they are in P), then we have a way of constructing these actual obstructions. That the verification and construction are in P follows from a hypothesis: an extension of the Riemann Hypothesis over finite fields. So the main goal is to prove this hypothesis [37].

Many researchers are enthusiastic about this approach. However, it is expected that it will take decades until it yields results. So far, this method has established some lower bounds in matrix multiplication, using the same representation-theoretic obstructions [13].

Ryan Williams and NEXP ⊄ ACC

In 2010 Williams showed that NEXP is not contained in ACC (AC0 with added MODm gates, where m is a constant) [49]. At the core of his proof is an algorithm that determines satisfiability of a circuit in ACC in less than exponential time (an upper bound). The idea is to replace the original circuit C with another circuit C′, which consists of copies of C in between ∨-gates. In each copy, some of the first inputs get fixed. Then C′ is satisfiable iff C is. This circuit C′ can be transformed into another equivalent circuit C′′ of slightly bigger size with ∧-gates. With matrix multiplication the satisfiability of C′′ can be determined in O(2^(n−log² n)) time.

In order to show that NEXP ⊄ ACC, he starts with assuming that NTIME(2^n) has circuits in ACC. From there, he can deduce that succinct 3-SAT, an NEXP-complete language, also has a circuit in ACC. The algorithm now gives a way to determine the satisfiability of this circuit in O(2^(n−log² n)) time. But this is in contradiction with the Nondeterministic Time Hierarchy Theorem [49].

This was not only an impressive result in circuit complexity, but it also evades all discussed barriers. There are oracles A, Ã for which NEXP^A ⊂ ACC^Ã, so the proof avoids relativization and algebrization. This is because the faster SAT-algorithm uses properties of the circuit that black-box methods do not have access to (the reduction from C′ to C′′ is non-relativizing).

The natural proofs barrier is also evaded, because of the contradiction with the Nondeterministic Time Hierarchy Theorem. Since this is proven with diagonalization, the constructiveness requirement cannot be met [49].
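The first transformation in Williams' proof, from C to C′, can be mimicked on a toy scale. The circuit below and the choice to fix one input are invented for illustration; real ACC circuits and the matrix-multiplication step are far beyond this sketch, which only demonstrates that satisfiability is preserved.

```python
from itertools import product

def or_of_copies(C, k):
    """Williams-style step, toy version: an OR of 2^k copies of C, the
    i-th copy with the first k inputs hard-wired to the i-th bit
    pattern. The result has k fewer inputs yet is satisfiable iff C is."""
    return lambda *ys: any(C(*fixed, *ys) for fixed in product((0, 1), repeat=k))

def satisfiable(circuit, n):
    """Brute-force satisfiability check over all 2^n inputs."""
    return any(circuit(*xs) for xs in product((0, 1), repeat=n))

C = lambda x1, x2, x3: bool(x1 and (x2 ^ x3))   # toy circuit on 3 inputs
C_prime = or_of_copies(C, 1)                     # fix x1 in each copy

print(satisfiable(C, 3), satisfiable(C_prime, 2))   # True True
```

Brute force over C′ searches only 2^(n−k) assignments, but each evaluation touches 2^k copies, so nothing is gained yet; Williams' insight is that the regular structure of the copies (after the further transformation to C′′) can be exploited with fast matrix multiplication, which a black-box method could not do.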

6.3 Attacking NP-hard problems

As said before, most researchers believe that P ≠ NP. To find more evidence for this conjecture, Aaronson studied many ways in which physical processes could solve NP-complete problems. He considers processes ranging from the formation of soap bubbles to protein folding. But these problems form as much an obstacle in the physical world as in mathematics. He proposes the following assumption: that we should consider the intractability of NP-complete problems as a principle of physics. This indicates that we would never be able to live in the world of P = NP [1]. However, in practice we still need to deal with NP-hard problems.

6.3.1 Algorithms and approximation

For a lot of problems in NP, we have no better algorithm than to try all possible options: brute force (or perebor in Russian). Because computers keep getting faster, some instances of NP-complete problems are solvable in reasonable time with brute force. But the running time grows exponentially with the size of the input, so such an algorithm quickly becomes inefficient [22].

In artificial intelligence, Neural Networks and Genetic Algorithms are used regularly. The workings of these algorithms are based on processes of organisms. Neural Networks mimic the workings of our brains, while Genetic Algorithms use Darwin's natural selection and survival of the fittest [10]. The latter have been used to solve SAT [17] and the Traveling Salesman Problem, but they still use resources that scale exponentially with the input size [23].

Approximation can give us reasonably fast algorithms. Arora, for example, has developed an algorithm that finds an approximation to the optimal route for the Euclidean Traveling Salesman Problem in polynomial time. In practice, approximate solutions can be almost as good as solving the problem exactly. If we could approximate some NP-problems with a specific precision, then P ≠ NP might not be such a big issue in real life [22, 6, 7].
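How quickly perebor becomes infeasible can be seen on Subset Sum, an NP-complete problem; the instance below is made up for illustration.

```python
from itertools import combinations

def subset_sum_brute_force(numbers, target):
    """Perebor: try every one of the 2^n subsets of `numbers`."""
    n = len(numbers)
    tried = 0
    for r in range(n + 1):
        for subset in combinations(numbers, r):
            tried += 1
            if sum(subset) == target:
                return True, tried
    return False, tried

# A 'no' instance forces us through all 2^5 = 32 subsets;
# every extra element doubles that number again.
print(subset_sum_brute_force([1, 2, 4, 8, 16], 100))    # (False, 32)
print(subset_sum_brute_force([1, 2, 4, 8, 16], 21)[0])  # True (1 + 4 + 16)
```

At 5 elements this is instantaneous; at 60 elements the same loop would need on the order of 10^18 subsets, which is why faster hardware alone does not tame NP-complete problems.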

6.3.2 Average-case and worst-case scenarios

Someone who first hears about the Traveling Salesman Problem might think that finding an almost optimal path that goes through a number of cities isn't that hard (you might have done this a couple of times already when you were planning a vacation). Our current classification in complexity theory considers problems in the worst-case scenario. In average-case complexity, the hardness of problems that are likely to arise in our daily lives is studied. Some problems are hard in the worst case precisely when they're hard on average. Our current cryptography actually assumes that factoring is hard on average: we have no guarantee that the produced key is a worst-case instance. But we do not know whether NP-complete problems are hard on average precisely when they are hard in worst-case scenarios.

6.3.3 Quantum computing

Quantum computation is fundamentally different from computation as we know it from Turing machines. Instead of regular bits, a quantum computer uses qubits. The state of such a bit is a superposition of the basic states: instead of being just a 0 or a 1, it is both at the same time. The state is a superposition α0|0⟩ + α1|1⟩, such that |α0|² + |α1|² = 1.

Quantum computation has its own complexity classes, such as BQP: the class of decision problems solvable in polynomial time by a quantum Turing machine, where the problem is solved with a probability of at least 2/3 [14]. Quantum computing can increase the speed of an algorithm. But so far it has not been able to solve NP-complete problems in polynomial time. The most efficient algorithm we have is from Grover, and it's able to solve SAT in roughly 2^(n/2) time (instead of 2^n) [7, Th. 10.13].
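Grover's square-root speedup can be simulated classically on a toy scale. The 3-qubit instance and the function below are illustrative assumptions of this sketch; see [7, Th. 10.13] for the real algorithm.

```python
import math

def grover(n_qubits, marked, iterations):
    """Classical simulation of Grover search with one marked item.

    The simulation itself takes time exponential in n_qubits; it only
    illustrates that ~sqrt(N) iterations concentrate the measurement
    probability on the marked state.
    """
    N = 2 ** n_qubits
    amp = [1 / math.sqrt(N)] * N           # uniform superposition
    for _ in range(iterations):
        amp[marked] *= -1                  # oracle: flip the marked sign
        mean = sum(amp) / N                # diffusion: invert about the mean
        amp = [2 * mean - a for a in amp]
    return [a * a for a in amp]            # measurement probabilities

n = 3
steps = round(math.pi / 4 * math.sqrt(2 ** n))   # 2 iterations for N = 8
probs = grover(n, marked=5, iterations=steps)
print(round(probs[5], 3))                        # 0.945
```

After only 2 of the 8 possible queries, the marked item is found with probability 121/128; a classical exhaustive search would need about half of the 8 queries on average.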

7 Conclusion

The P vs NP problem as we know it was first stated in 1971 by Cook in his paper "The Complexity of Theorem-proving Procedures" [15]. Since then, computer scientists have been busy with the P vs NP problem, and more and more have come to believe that P ≠ NP. The proof techniques that were so fruitful in separations or inclusions of other complexity classes turned out to be incapable of solving P vs NP.

Proofs that use diagonalization often relativize: the results also hold when we use oracle Turing machines. But since there are oracles A, B such that P^A = NP^A and P^B ≠ NP^B, diagonalization alone cannot solve P vs NP. This teaches us that if we want to show that a problem in NP is not in P, then we need to use some characteristic of the NP-problem that oracles do not have access to. Arithmetization and the algebrization barrier show us that this characteristic has to be more than just the possibility to arithmetize formulas to low-degree polynomials. Considering Boolean circuits presented us with the barrier of natural proofs. This tells us that the characteristic of the NP-problem cannot be natural, otherwise we could use it to break pseudorandom function families [2].

Ryan Williams was able to combine known techniques such that his proof evaded all barriers. He also proved a lower bound (NEXP ⊄ ACC) by showing an upper bound (determining the satisfiability of ACC circuits in slightly less than exponential time). This is also the idea behind the flip of Mulmuley. Proving upper bounds might be a better method towards solving P vs NP: it is easier to find an example where something can be done than to prove that something cannot be done at all. We could try to evade the barriers with known techniques, but many think that an entirely new method like the GCT of Mulmuley is what we need to solve P vs NP. Research can focus on answering the questions in GCT or on finding lower bounds by showing upper bounds.
But if experience has taught us anything, we could also start looking for a barrier towards solving P vs NP via upper bounds.

8 Summary

Computers use algorithms to solve all kinds of problems for us. An algorithm does this step by step. In complexity theory, we are interested in how long it takes to solve a problem. Because the time in seconds that a computer needs depends on the hardware, we instead look at the number of steps the algorithm uses. Algorithms that solve hard problems use more steps than algorithms that solve easy problems. But if we give the same algorithm a bigger problem to solve, it also needs more steps than for a smaller one. We therefore describe the time needed for solving a problem as a function of the size of the problem. On this basis, complexity theory divides the problems a computer can solve into classes.

The class P contains the problems for which there is an algorithm that needs polynomially many steps: the number of steps can be written as a polynomial in the size of the input. These are exactly the problems that a computer can solve quickly, such as determining whether a number is prime. The class NP contains the problems for which a candidate solution can be checked in polynomially many steps. These are problems that can be verified quickly, but not necessarily solved quickly, just like a sudoku puzzle. Many problems that we would like to solve quickly are in NP, such as scheduling or finding an optimal folding for a protein.

Many people occupy themselves with the question whether P equals NP. If we can quickly verify a solution to a problem, can we then also solve the problem itself quickly? If P = NP, we would not only be able to predict the weather better, but there would also no longer be a big difference between checking a solution and coming up with one ourselves. We all know, however, that solving a sudoku takes more time than checking a filled-in sudoku. That is why most people expect that P ≠ NP.

A proof of P ≠ NP could still give us insight into why problems in NP are hard. With the help of such a proof, we might be able to find a way to efficiently find a solution in some cases after all. Since problems in NP occur frequently in everyday life, but also in other branches of science such as biology and economics, such a proof is of great importance to us.

Besides P and NP there are other classes, such as problems that can be solved in exponentially many steps, or problems for which the computer uses polynomially much memory while solving. Some of these classes have already been proven equal or unequal. Various proof techniques are used for this.

Diagonalization was the first successful proof technique. It makes use of the description of the algorithm that solves the problem. We can describe the algorithm in words or in program code. Just as a computer converts all information into a sequence of zeros and ones, this description too can be written down as a binary sequence. But a binary sequence really represents a number. So we can order all algorithms that solve a problem within a certain time from small to large, corresponding to the number that the description represents. This allows us to go through all these algorithms one by one, which turns out to be very useful for constructing a problem that none of these algorithms can solve. Now it turns out that most proofs that use this method share a certain property: they relativize. And it has been shown that proofs that relativize cannot prove that P = NP or P ≠ NP.

There are more proof methods. We can also look at the circuit that is used to solve a problem, instead of at the algorithm. A circuit is a kind of model of the chips that are used to build a computer. A circuit is also a generalization of Boolean formulas. A Boolean formula consists of variables and the symbols ∧ (and), ∨ (or) and ¬ (not). The variables can be true (1) or false (0). A formula a ∧ b is true if a and b are both true (1). A computer computes this by means of an and-gate. Two wires go into the gate and one wire comes out. If a current runs through both incoming wires, a current also comes out after the and-gate, but otherwise not. A circuit consists of these and-gates, or-gates and inverters, with incoming and outgoing paths. If we can prove that problems in NP cannot be solved by a circuit with polynomially many gates, we have P ≠ NP. Most proofs that consider such circuits, however, turn out to be of the same form: they are natural proofs. Natural proofs, too, form a barrier towards solving P vs NP.

A last important proof method is called arithmetization. Here the Boolean formulas are converted into polynomials, which we can use instead of the formulas. If the formula a ∧ b is true, a and b must both have the value 1. But this is the case exactly when a × b equals 1: if a or b were 0 (false), a × b would also be 0. The ∨ and ¬ symbols can similarly be converted into the mathematical operators times, plus and minus. In this way, a formula with symbols can be written as a polynomial with operators. However, proofs that use arithmetization mostly turn out to possess a certain property too: they are algebrizing proofs. And for these as well it has been proven that they form a barrier towards solving P vs NP.

The P vs NP problem is thus hard, since the most important proof methods that were successful in other cases cannot solve it. Work is being done on a new proof technique, which people hope can prove P ≠ NP. But it is quite possible that a barrier will be found for this one as well.

Acknowledgements

During the creation of this project, Ronald de Wolf was very helpful. I would like to thank him for his guidance, which helped me towards understanding some of the more difficult concepts within this subject and which led to the creation of this thesis. I would also like to thank my fellow student Thijs Benjamins for the useful discussions that we had. I would like to end with my favourite quote from the poll that Gasarch conducted on the P vs NP problem, answering the question whether P = NP [25].

"An alien who knows the answer may say: 'Not, but that's not the right way to formulate the problem. It will be solved after a reformulation of computational complexity.'" Y. C. Tay

Bibliography

[1] Scott Aaronson. “Guest column: NP-complete problems and physical reality”. In: ACM Sigact News 36.1 (2005), pp. 30–52. [2] Scott Aaronson. Has There Been Progress on the P vs. NP Question? Powerpoint Presentation. 2010. [3] Scott Aaronson. “Is P versus NP formally independent?” In: Bulletin of the EATCS 81 (2003), pp. 109–136. [4] Scott Aaronson and Avi Wigderson. “Algebrization: A new barrier in complexity theory”. In: ACM Transactions on Computation Theory (TOCT) 1.1 (2009), p. 2. [5] Miklós Ajtai. “Σ¹₁-formulae on finite structures”. In: Annals of Pure and Applied Logic 24.1 (1983), pp. 1–48. [6] Sanjeev Arora. “Polynomial time approximation schemes for Euclidean traveling salesman and other geometric problems”. In: Journal of the ACM (JACM) 45.5 (1998), pp. 753–782. [7] Sanjeev Arora and Boaz Barak. Computational Complexity: A Modern Approach. Cambridge University Press, 2009. [8] László Babai, Lance Fortnow, and Carsten Lund. “Non-deterministic exponential time has two-prover interactive protocols”. In: Computational Complexity 1.1 (1991), pp. 3–40. [9] Theodore Baker, John Gill, and Robert Solovay. “Relativizations of the P=?NP question”. In: SIAM Journal on Computing 4.4 (1975), pp. 431–442. [10] David Beasley, Ralph R. Martin, and David R. Bull. “An overview of genetic algorithms: Part 1. Fundamentals”. In: University Computing 15 (1993), pp. 58–58. [11] Harry Buhrman, Lance Fortnow, and Thomas Thierauf. “Nonrelativizing separations”. In: Computational Complexity, 1998. Proceedings. Thirteenth Annual IEEE Conference on. IEEE, 1998, pp. 8–12. [12] Harry Buhrman, Lance Fortnow, Dieter van Melkebeek, and Leen Torenvliet. “Separating complexity classes using autoreducibility”. In: SIAM Journal on Computing 29.5 (2000), pp. 1497–1520. [13] Peter Bürgisser and Christian Ikenmeyer. “Explicit lower bounds via geometric complexity theory”. In: Proceedings of the forty-fifth annual ACM Symposium on Theory of Computing. ACM, 2013, pp. 141–150. [14] Complexity Zoo.
https://complexityzoo.uwaterloo.ca/Complexity_Zoo.

[15] Stephen A. Cook. “The complexity of theorem-proving procedures”. In: Proceedings of the third annual ACM symposium on Theory of computing. ACM. 1971, pp. 151–158.
[16] Martin Davis. Computability & unsolvability. Courier Corporation, 1958.
[17] Kenneth A. De Jong and William M. Spears. “Using Genetic Algorithms to Solve NP-Complete Problems”. In: ICGA. 1989, pp. 124–132.
[18] Lance Fortnow. A Breakthrough Circuit Lower Bound. 2010. url: http://blog.computationalcomplexity.org/2010/11/breakthrough-circuit-lower-bound.html (visited on 11/09/2010).
[19] Lance Fortnow. Favorite Theorems: Circuit Lower Bounds. 2014. url: http://blog.computationalcomplexity.org/2014/11/favorite-theorems-circuit-lower-bounds.html (visited on 11/05/2014).
[20] Lance Fortnow. The golden ticket: P, NP, and the search for the impossible. Princeton University Press, 2013.
[21] Lance Fortnow. The Importance of Natural Proofs. 2006. url: http://blog.computationalcomplexity.org/2006/05/importance-of-natural-proofs.html (visited on 10/05/2006).
[22] Lance Fortnow. “The status of the P versus NP problem”. In: Communications of the ACM 52.9 (2009), pp. 78–86.
[23] Bernd Freisleben and Peter Merz. “New genetic local search operators for the traveling salesman problem”. In: Parallel Problem Solving from Nature—PPSN IV. Springer, 1996, pp. 890–899.
[24] Merrick Furst, James B. Saxe, and Michael Sipser. “Parity, circuits, and the polynomial-time hierarchy”. In: Mathematical Systems Theory 17.1 (1984), pp. 13–27.
[25] William I. Gasarch. “Guest column: The second P=?NP poll”. In: ACM SIGACT News 43.2 (2012), pp. 53–77.
[26] William I. Gasarch. “The P=?NP poll”. In: SIGACT News 33.2 (2002), pp. 34–47.
[27] Juris Hartmanis and Richard E. Stearns. “On the computational complexity of algorithms”. In: Transactions of the American Mathematical Society 117 (1965), pp. 285–306.
[28] Johan Håstad, Russell Impagliazzo, Leonid A. Levin, and Michael Luby. “A pseudorandom generator from any one-way function”. In: SIAM Journal on Computing 28.4 (1999), pp. 1364–1396.
[29] Russell Impagliazzo, Valentine Kabanets, and Antonina Kolokolova. “An axiomatic approach to algebrization”. In: Proceedings of the forty-first annual ACM symposium on Theory of computing. ACM. 2009, pp. 695–704.
[30] Brendan Juba. Universal semantic communication. Springer Science & Business Media, 2011.
[31] Stasys Jukna. Extremal combinatorics: with applications in computer science. Springer Science & Business Media, 2011.
[32] Dexter Kozen. “Indexing of subrecursive classes”. In: Proceedings of the tenth annual ACM symposium on Theory of computing. ACM. 1978, pp. 287–295.

[33] Dexter C. Kozen. Automata and Computability. Undergraduate Texts in Computer Science. 1997.
[34] Richard E. Ladner. “On the structure of polynomial time reducibility”. In: Journal of the ACM (JACM) 22.1 (1975), pp. 155–171.
[35] Leonid A. Levin. “Universal sequential search problems”. In: Problemy Peredachi Informatsii 9.3 (1973), pp. 115–116.
[36] Carsten Lund, Lance Fortnow, Howard Karloff, and Noam Nisan. “Algebraic methods for interactive proof systems”. In: Journal of the ACM (JACM) 39.4 (1992), pp. 859–868.
[37] Ketan D. Mulmuley. “On P vs. NP, Geometric Complexity Theory, and the Flip I: a high level view”. In: arXiv:0709.0748 (2007).
[38] Toniann Pitassi and Iddo Tzameret. “Algebraic Proof Complexity: Progress, Frontiers and Challenges”. In: Electronic Colloquium on Computational Complexity 101 (2016).
[39] Alexander A. Razborov. “Lower bounds on the size of bounded depth circuits over a complete basis with logical addition”. In: Mathematical Notes 41.4 (1987), pp. 333–338.
[40] Alexander A. Razborov and Steven Rudich. “Natural proofs”. In: Journal of Computer and System Sciences 55.1 (1997), pp. 24–35.
[41] John E. Savage. Models of Computation: Exploring the Power of Computing. 1998.
[42] Adi Shamir. “IP = PSPACE”. In: Journal of the ACM (JACM) 39.4 (1992), pp. 869–877.
[43] Claude Shannon et al. “The synthesis of two-terminal switching circuits”. In: Bell System Technical Journal 28.1 (1949), pp. 59–98.
[44] Michael Sipser. “The history and status of the P versus NP question”. In: Proceedings of the twenty-fourth annual ACM symposium on Theory of computing. ACM. 1992, pp. 603–618.
[45] Roman Smolensky. “Algebraic methods in the theory of lower bounds for Boolean circuit complexity”. In: Proceedings of the nineteenth annual ACM symposium on Theory of computing. ACM. 1987, pp. 77–82.
[46] Richard Edwin Stearns, Juris Hartmanis, and Philip M. Lewis. “Hierarchies of memory limited computations”. In: Sixth Annual Symposium on Switching Circuit Theory and Logical Design. IEEE. 1965, pp. 179–190.
[47] Alan M. Turing. “Computing machinery and intelligence”. In: Mind 59.236 (1950), pp. 433–460.
[48] Alan M. Turing. “On computable numbers, with an application to the Entscheidungsproblem”. In: Proceedings of the London Mathematical Society 42.2 (1936), pp. 230–265.
[49] Ryan Williams. “Nonuniform ACC circuit lower bounds”. In: Journal of the ACM (JACM) 61.1 (2014), p. 2.
