CS-E4530 Computational Complexity Theory

Lecture 12: Boolean Circuits and Parallel Computation

Aalto University
School of Science
Department of Computer Science
Spring 2020

Topics

- Boolean circuits and circuit complexity
- Parallel algorithms and complexity measures
- Parallel models of computation
- Parallel random access machines (PRAMs)
- The class NC
- Sequential space and parallel time

(C. Papadimitriou: Computational Complexity, Chapters 15 and 16)

12.1 Boolean Circuits and Circuit Complexity

A Boolean circuit with $n$ inputs accepts a string $x = x_1 \ldots x_n$ of length $n$ in $\{0,1\}^*$ if the output of the circuit is true given $x$ as its input (i.e., input $i$ is true if $x_i$ is 1 and false otherwise).

The size of a circuit is the number of gates in it.

To discuss circuits on strings of arbitrary length, we consider families of circuits. A family of circuits is an infinite sequence $C = (C_0, C_1, \ldots)$ of Boolean circuits where $C_n$ has $n$ input variables.

Languages with polynomial circuits

A language $L \subseteq \{0,1\}^*$ has polynomial(-size) circuits if there is a family $C = (C_0, C_1, \ldots)$ such that

1. the size of $C_n$ is at most $p(n)$, for some fixed polynomial $p$; and
2. for all $x \in \{0,1\}^*$, $x \in L$ iff the output of $C_{|x|}$ is true when $x$ is given as input to $C_{|x|}$.

Proposition (12.1)
All languages in P have polynomial circuits.

Proof
By the P-completeness proof of CIRCUIT VALUE: Given a Turing machine $M$, input string $x$, $|x| = n$, and time bound $p(n)$, a variable-free circuit $C_M(x)$ of size $O(p(n)^2)$ can be constructed such that the output of $C_M(x)$ is true iff $M$ accepts $x$. The circuit is the same for all inputs of length $n$ except for the input gates that encode $x$. It is easy to modify the input gates of $C_M(x)$ to variable gates that correspond to the symbols in $x$, resulting in a circuit $C_n^M$.
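To make the definitions of circuit, size, family and acceptance from Section 12.1 concrete, here is a minimal Python sketch. The gate encoding, the function names `evaluate`, `or_family` and `accepts`, and the example language (strings containing at least one 1) are all illustrative assumptions, not part of the lecture. Size is counted here as the total number of gates, input gates included, which is also an assumption about the convention.

```python
from typing import List, Tuple

# Hypothetical encoding: a gate is (kind, predecessors).
# kind is one of "in", "true", "false", "not", "and", "or".
# For "in" gates the single "predecessor" is the index of the input bit.
Gate = Tuple[str, Tuple[int, ...]]


def evaluate(circuit: List[Gate], x: str) -> bool:
    """Evaluate a circuit given in topological order; the last gate is the output."""
    values: List[bool] = []
    for kind, preds in circuit:
        if kind == "in":
            values.append(x[preds[0]] == "1")
        elif kind == "true":
            values.append(True)
        elif kind == "false":
            values.append(False)
        elif kind == "not":
            values.append(not values[preds[0]])
        elif kind == "and":
            values.append(values[preds[0]] and values[preds[1]])
        elif kind == "or":
            values.append(values[preds[0]] or values[preds[1]])
        else:
            raise ValueError(f"unknown gate kind: {kind}")
    return values[-1]


def or_family(n: int) -> List[Gate]:
    """C_n for the example language {x : x contains a 1}: n inputs, n - 1 OR gates."""
    if n == 0:
        return [("false", ())]  # the empty string contains no 1
    gates: List[Gate] = [("in", (i,)) for i in range(n)]
    out = 0  # index of the gate holding the OR of the inputs seen so far
    for i in range(1, n):
        gates.append(("or", (out, i)))
        out = len(gates) - 1
    return gates


def accepts(x: str) -> bool:
    """x is in the language iff C_{|x|} outputs true on input x."""
    return evaluate(or_family(len(x)), x)


if __name__ == "__main__":
    for word in ["", "0000", "0100"]:
        print(repr(word), accepts(word), "size:", len(or_family(len(word))))
```

The family has size at most $2n$ under this counting, so it is a polynomial family in the sense of the definition above; the chain of OR gates has depth $n - 1$, which a balanced tree would reduce to about $\log n$.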
Proposition (12.2)
There are undecidable languages that have polynomial circuits.

Proof
Let $L \subseteq \{0,1\}^*$ be an undecidable language and consider the language $U = \mathrm{tally}(L) = \{1^{\mathrm{num}(1x)} : x \in L\}$, where 'num(z)' denotes the numerical value of binary string $z$. We note:

1. $U$ is undecidable (since $x \in L$ iff $1^{\mathrm{num}(1x)} \in U$, a decision procedure for $U$ would decide $L$).
2. $U$ has polynomial circuits $(C_0, C_1, \ldots)$ where each $C_n$ consists of $n$ input gates and either:
   - $n - 1$ AND-gates (computing the AND of all input bits) if $1^n \in U$; or
   - a constant false output gate if $1^n \notin U$.

Uniformly polynomial circuits

A family $C = (C_0, C_1, \ldots)$ of circuits is said to be uniform if there is a $\log n$-space bounded Turing machine which on input $1^n$ outputs $C_n$.

A language $L \subseteq \{0,1\}^*$ has uniformly polynomial circuits if there is a uniform family of polynomial circuits $C = (C_0, C_1, \ldots)$ that decides $L$.

Theorem (12.3)
A language $L$ has uniformly polynomial circuits iff $L \in \mathrm{P}$.

Proof
($\Rightarrow$) $x \in L$ can be decided in polynomial time by constructing $C_{|x|}$ in $\log |x|$ space (and hence in polynomial time) and evaluating it for input $x$ (in polynomial time).
($\Leftarrow$) By the P-completeness proof of CIRCUIT VALUE: Given a Turing machine $M$ running in time $p(n)$ and input length $n$, the circuit $C_n^M$ from Proposition 12.1 has size $O(p(n)^2)$ and can be constructed in $\log n$ space.

Polynomial circuits and P vs NP

$\mathrm{P} \neq \mathrm{NP}$ is equivalent to

- Conjecture A: NP-complete problems have no uniformly polynomial circuits.

Related(?) Conjecture B: NP-complete problems have no polynomial circuits, uniform or not.

Shannon (1949): Most Boolean functions do not have polynomial (not even subexponential) circuits.

An approach to establish $\mathrm{P} \neq \mathrm{NP}$: Prove Conjecture B for some NP-complete problem.
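Shannon's observation rests on a counting argument, which the following Python sketch illustrates numerically. The counting model is an assumption made here for illustration, following one common textbook presentation: each of the $m$ gates is one of $n + 5$ sorts ($n$ variables, two constants, AND, OR, NOT) and has at most two predecessors among the $m$ gates, giving at most $((n+5)\,m^2)^m$ circuits. The function names are hypothetical. With $m = 2^n / (2n)$ gates, this count falls short of the $2^{2^n}$ Boolean functions on $n$ inputs.

```python
import math
from typing import Tuple


def log2_circuits_upper_bound(n: int, m: float) -> float:
    """log2 of ((n + 5) * m^2)^m, the assumed upper bound on circuits with m gates."""
    return m * math.log2((n + 5) * m * m)


def shannon_gap(n: int) -> Tuple[float, float]:
    """Return (log2 of the circuit bound for m = 2^n / (2n), log2 of the number of functions)."""
    m = 2 ** n / (2 * n)
    return log2_circuits_upper_bound(n, m), float(2 ** n)


if __name__ == "__main__":
    for n in (4, 10, 20):
        circuits, functions = shannon_gap(n)
        # The fraction of functions computable with m gates is at most 2^(circuits - functions).
        print(n, round(circuits, 1), functions)
```

Under these assumptions, the fraction of $n$-input functions computable with $2^n/(2n)$ gates is at most $2$ raised to the (negative) difference of the two returned values, so it vanishes rapidly as $n$ grows. The argument is non-constructive: it says nothing about which functions are hard.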
12.2 Parallel Algorithms and Complexity Measures

The goal of parallel algorithms is to be dramatically better than sequential ones, preferably polylogarithmic, i.e. such that the length of the parallel computation is $O(\log^k n)$ for some $k$.

However, the executions of parallel algorithms should not require inordinately large (superpolynomial) numbers of processors.

Let us now consider parallel computations using processors more powerful than single Boolean gates.

Case: Matrix Multiplication

The task is to compute the product of two $n \times n$ matrices $A$ and $B$. The product $C = AB$ is defined by

$$c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}$$

for indices $i$ and $j$ ranging from 1 to $n$.

There is a sequential algorithm with $O(n^3)$ arithmetic operations.

Example: multiplying two $4 \times 4$ matrices

$$c_{1,1} = a_{1,1}b_{1,1} + a_{1,2}b_{2,1} + a_{1,3}b_{3,1} + a_{1,4}b_{4,1}$$
$$c_{1,2} = a_{1,1}b_{1,2} + a_{1,2}b_{2,2} + a_{1,3}b_{3,2} + a_{1,4}b_{4,2}$$
$$\vdots$$

The same can be achieved in $T = 1 + \log n$ parallel steps with $n^3$ processors and $n^3 + n^2 \times (n - 1) = 2n^3 - n^2$ arithmetic operations.

Example: multiplying two $4 \times 4$ matrices

[Figure: computation DAG for $c_{1,1}$, $c_{1,2}$, ...: a bottom layer of multiplications $a_{i,k} \times b_{k,j}$ feeding a binary tree of additions for each entry.]

The number of processors used by the algorithm can be brought down to any $P$ using Brent's principle: The computation is done in $1 + \log n$ "shifts" (corresponding to the levels of the computation DAG), where each shift $t = 0, \ldots, \log n$ is completed in $\left\lceil \frac{n^3 / 2^t}{P} \right\rceil$ parallel steps, for a total of

$$T_P \le \frac{2n^3}{P} + \log n + 1$$

steps.

With $P \sim \frac{n^3}{\log n}$, one still achieves $T_P = O(\log n)$ parallel time.

Observations

The amount of work done by a parallel algorithm can be no smaller than the time complexity of the best sequential algorithm.

Parallel computation is not the answer to NP-completeness:

total work = parallel time $\times$ number of processors.

If the total amount of work ($\sim$ sequential time) is exponential, then either the number of parallel steps or the number of processors (or both) is exponential.

Brent's Principle in general: Let $T_P$ be the computation time on $P$ processors. Then

$$T_P \le \frac{T_1}{P} + T_\infty.$$

Parallel time and work

The primary complexity measures for parallel computation are parallel time and parallel work.

Let $C = (C_0, C_1, \ldots)$ be a uniform family of Boolean circuits.

- $C$ has parallel time $t(n)$ if for all $n$ the depth of $C_n$ is at most $t(n)$.
- $C$ has parallel work $s(n)$ if for all $n$ the size of $C_n$ is at most $s(n)$.

The class $\mathrm{PT/WK}(t(n), s(n))$ comprises the languages $L \subseteq \{0,1\}^*$ for which there is a uniform family of circuits $C$ deciding $L$ with $O(t(n))$ parallel time and $O(s(n))$ parallel work.

Note: No speedup theorem for circuits $\Rightarrow$ big-Oh notation in the definition.

Case: REACHABILITY

By using the circuit representation of Warshall's algorithm for reflexive and transitive closure of graphs (recall Lecture 7 on "Reductions and Completeness"), we have

$$\text{REACHABILITY} \in \mathrm{PT/WK}(n, n^3)$$

By an adjacency matrix based technique, one achieves

$$\text{REACHABILITY} \in \mathrm{PT/WK}(\log^2 n, n^3 \log n)$$

- Input: Boolean adjacency matrix $A$ of a graph $G = (V, E)$ with self-loops ($A_{ii} = \top$ for all $i$).
- Boolean product of $A$ with itself, $A^2 = AA$, is $A^2_{ij} = \bigvee_{k=1}^{n} (A_{ik} \wedge A_{kj})$ for all $1 \le i, j \le n = |V|$. [$\Rightarrow \mathrm{PT/WK}(\log n, n^3)$]
- Parallel algorithm is obtained by computing the transitive closure $A^*$ of $A$ by the sequence $A, A^2, A^4, \ldots, A^{2^{\lceil \log n \rceil}}$. [$\Rightarrow \mathrm{PT/WK}(\log^2 n, n^3 \log n)$]

Some Other Problems Summarised

1. Arithmetic Operations
   Using the prefix sum technique, the sum of two $n$-bit binary integers can be computed in $O(\log n)$ parallel time and $O(n)$ work. For products of $n$-bit integers, the work becomes $O(n^2 \log n)$.
2. Maximum Flow
   A prime example of a polynomial-time solvable problem that seems to be inherently sequential.
3. The Travelling Salesperson Problem
   Parallelism alone is not sufficient to conquer NP-completeness.
4. Matrix Determinants and Inverses
   There is a polylog parallel time & polynomial work algorithm.

12.3 Parallel Random Access Machines (PRAMs)

- A formal model of parallel computation by a large number of independent processors.
- Each processor can execute its own program and can communicate with other processors instantaneously and synchronously through a large shared memory.
- Complexity closely related to Boolean circuits, hence a useful high-level tool for designing parallel algorithms and understanding parallelism.
- From the practical point of view somewhat unrealistic due to the assumptions of instantaneous communication and concurrent write access to the shared memory.
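As a rough illustration of how an algorithm in this style proceeds in synchronous rounds over shared data, the repeated-squaring scheme for REACHABILITY from Section 12.2 can be simulated sequentially in Python. This is only a sketch: the names `boolean_square` and `transitive_closure` are hypothetical, the matrix plays the role of shared memory, and each call to `boolean_square` stands for one round that a parallel machine would carry out with about $n^3$ processors in $O(\log n)$ time. The simulation itself runs sequentially and does not model processors or memory access conflicts.

```python
from typing import List, Tuple

Matrix = List[List[bool]]


def boolean_square(a: Matrix) -> Matrix:
    """One round: (A^2)_ij = OR over k of (A_ik AND A_kj).

    In parallel, all n^3 conjunctions are independent, and each of the n^2
    disjunctions is an OR-tree of depth about log n. Here they run one by one.
    """
    n = len(a)
    return [
        [any(a[i][k] and a[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def transitive_closure(adj: Matrix) -> Tuple[Matrix, int]:
    """Compute A* by squaring ceil(log2 n) times; returns (closure, number of rounds)."""
    n = len(adj)
    # Add self-loops, so that A^k records paths of length at most k.
    a = [[adj[i][j] or i == j for j in range(n)] for i in range(n)]
    rounds = max(n - 1, 0).bit_length()  # equals ceil(log2 n) for n >= 1
    for _ in range(rounds):
        a = boolean_square(a)
    return a, rounds


if __name__ == "__main__":
    # Example graph (an assumption for illustration): 0 -> 1 -> 2 -> 3, node 4 isolated.
    n = 5
    adj = [[False] * n for _ in range(n)]
    for u, v in [(0, 1), (1, 2), (2, 3)]:
        adj[u][v] = True
    closure, rounds = transitive_closure(adj)
    print("rounds:", rounds)
    print("0 reaches 3:", closure[0][3])
    print("3 reaches 0:", closure[3][0])
```

Because of the self-loops, after $r$ squarings the matrix records all paths of length at most $2^r$, so $\lceil \log n \rceil$ rounds suffice. Multiplying the per-round depth of $O(\log n)$ by the number of rounds gives the $\log^2 n$ parallel time stated in Section 12.2.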
