<<

CS-E4530 Computational Complexity Theory

Lecture 12: Boolean Circuits and Parallel Computation

Aalto University School of Science Department of Computer Science

Spring 2020 Topics

Boolean circuits and Parallel algorithms and complexity measures Parallel models of computation Parallel random access machines (PRAMs) The class NC Sequential space and parallel time (C. Papadimitriou: Computational Complexity, Chapters 15 and 16)

CS-E4530 Computational Complexity Theory / Lecture 12 Department of Computer Science 2/25 12.1 Boolean Circuits and Circuit Complexity

A with n inputs accepts a string x = x1 ...xn of length n in 0,1 if the output of the circuit is true given x as its { }∗ input (i.e., input i is true if xi is 1 and false otherwise). The size of a circuit is the number of gates in it. To discuss circuits on strings of arbitrary length, we consider families of circuits.

A family of circuits is an infinite sequence C = (C0,C1,...) of Boolean circuits where Cn has n input variables.

CS-E4530 Computational Complexity Theory / Lecture 12 Department of Computer Science 3/25 Languages with polynomial circuits A language 0,1 has polynomial(-size) circuits if there is a ⊆ { }∗ family C = (C0,C1,...) such that 1. the size of Cn is at most p(n), for some fixed polynomial p; and 2. for x 0,1 ∗, x L iff the output of C x is true when x is ∈ { } ∈ | | given as input to C x . | | Proposition (12.1) All languages in P have polynomial circuits.

Proof By the P-completeness proof of CIRCUIT VALUE: Given a Turing machine M, input string x, x = n, and time bound p(n), | | a variable-free circuit CM(x) of size O(p(n)2) can be constructed such that the output of CM(x) is true iff M accepts x. The circuit is the same for all inputs of length n except for the input gates that encode x. It is easy to modify input gates of CM(x) to variable gates M that correspond to the symbols in x, resulting in a circuit Cn .

CS-E4530 Computational Complexity Theory / Lecture 12 Department of Computer Science 4/25 Proposition (12.2) There are undecidable languages that have polynomial circuits.

Proof Let L 0,1 be an undecidable language and consider the language ⊆ { }∗ U = tally(L) = 1num(1x) : x L , where ’num(z)’ denotes the numer- { ∈ } ical value of binary string z. We note: 1. U is undecidable.

2. U has polynomial circuits (C0,C1,...) where each Cn consists of n input gates and either: n I n 1 AND-gates (computing the AND of all input bits) if 1 U; or − n ∈ I a constant false output gate if 1 U. 6∈

CS-E4530 Computational Complexity Theory / Lecture 12 Department of Computer Science 5/25 Uniformly polynomial circuits A family C = (C0,C1,...) of circuits is said to be uniform if there is a logn-space bounded Turing machine which on input 1n outputs Cn. A language L 0,1 has uniformly polynomial circuits if there ⊆ { }∗ is a uniform family of polynomial circuits C = (C0,C1,...) that decides L. Theorem (12.3) A language L has uniformly polynomial circuits iff L P. ∈ Proof

( ) x L can be decided in polynomial time by constructing C x in ⇒ ∈ | | log x space (and hence in polynomial time) and evaluating it for input | | x (in polynomial time). ( ) By the P-completeness proof of CIRCUIT VALUE: ⇐ Given a Turing machine M running in time p(n) and input length n, M 2 the circuit Cn from Proposition 2.1 has size O(p(n) ) and can be con- structed in logn space. CS-E4530 Computational Complexity Theory / Lecture 12 Department of Computer Science 6/25 Polynomial circuits and P vs NP P = NP is equivalent to 6 I Conjecture A: NP-complete problems have no uniformly polynomial circuits. Related(?) Conjecture B: NP-complete problems have no polynomial circuits, uniform or not. Shannon (1949): Most Boolean functions do not have polynomial (not even subexponential) circuits. An approach to establish P = NP: Prove Conjecture B for some 6 NP-complete problem.

CS-E4530 Computational Complexity Theory / Lecture 12 Department of Computer Science 7/25 12.2 Parallel Algorithms and Complexity Measures

The goal of parallel algorithms is to be dramatically better than sequential ones, preferably polylogarithmic, i.e. such that the length of the parallel computation is O(logk n) for some k. However, the executions of parallel algorithms should not require inordinately large (superpolynomial) numbers of processors. Let us now consider parallel computations using processors more powerful than single Boolean gates.

CS-E4530 Computational Complexity Theory / Lecture 12 Department of Computer Science 8/25 Case: Matrix Multiplication

The task is to compute the product of two n n matrices A and B. × The product C = AB is defined by n cij = ∑ aikbkj k=1 for indices i and j ranging from 1 to n. There is a sequential algorithm with O(n3) arithmetic operations. Example: multiplying two 4 4 matrices ×

c1,1 = a1,1b1,1 + a1,2b2,1 + a1,3b3,1 + a1,4b4,1

c1,2 = a1,1b1,2 + a1,2b2,2 + a1,3b3,2 + a1,4b4,2 . .

CS-E4530 Computational Complexity Theory / Lecture 12 Department of Computer Science 9/25 The same can be achieved in T = 1 + logn parallel steps with n3 processors and n3 + n2 (n 1) = 2n3 n2 arithmetic × − − operations. Example: multiplying two 4 4 matrices × + c1,1 + c1,2

+ + +

× × × × × a a a a a b b b b 1,1 1,2 1,3 1,4 2,1 ···1,1 2,1 3,1 4,1 ··· The number of processors used by the algorithm can be brought down to any P using Brent’s principle: The computation is done in 1 + logn “shifts” (corresponding to the levels of the computation n3/2t DAG), where each shift t = 0,...,logn is completed in P 2n3 d e parallel steps, for a total of TP P + logn + 1 steps. n3 ≤ With P , one still achieves TP = O(logn) parallel time. ∼ logn

CS-E4530 Computational Complexity Theory / Lecture 12 Department of Computer Science 10/25 Observations

The amount of work done by a parallel algorithm can be no smaller than the of the best sequential algorithm. Parallel computation is not the answer to NP-completeness: total work = parallel time number of processors, × If the total amount of work ( sequential time) is exponential, ∼ then either the number of parallel steps or the number of processors (or both) is exponential.

Brent’s Principle in general: Let TP be the computation time on P processors. Then T1 TP + T∞. ≤ P

CS-E4530 Computational Complexity Theory / Lecture 12 Department of Computer Science 11/25 Parallel time and work

The primary complexity measures for parallel computation are parallel time and parallel work.

Let C = (C0,C1,...) be a uniform family of Boolean circuits. I C has parallel time t(n) if for all n the depth of Cn is at most t(n). I C has parallel work s(n) if for all n the size of Cn is at most s(n). The class PT/WK(t(n),s(n)) comprises the languages L 0,1 for which there is a uniform ⊆ { }∗ family of circuits C deciding L with O(t(n)) parallel time and O(s(n)) parallel work. Note: No speedup theorem for circuits big-Oh notation in the ⇒ definition.

CS-E4530 Computational Complexity Theory / Lecture 12 Department of Computer Science 12/25 Case: REACHABILITY

By using the circuit representation of Warshall’s algorithm for reflexive and transitive closure of graphs (recall Lecture 7 on “Reductions and Completeness”), we have

REACHABILITY PT/WK(n,n3) ∈ By an adjacency matrix based technique, one achieves

REACHABILITY PT/WK(log2 n,n3 logn) ∈

I Input: Boolean adjacency matrix A of a graph G = (V,E) with self-loops (Aii = for all i). > 2 2 Wn I Boolean product of A with itself A = AA is A = (Aik Akj) ij k=1 ∧ for all 1 i,j n = V . [ PT/WK(logn,n3)] ≤ ≤ | | ⇒ I Parallel algorithm is obtained by computing the transitive closure 2 4 2 logn A∗ of A by the sequence A, A , A ,..., A d e . [ PT/WK(log2 n,n3 logn)] ⇒

CS-E4530 Computational Complexity Theory / Lecture 12 Department of Computer Science 13/25 Some Other Problems Summarised

1. Arithmetic Operations Using the prefix sum technique, the sum of two n-bit binary integers can be computed in O(logn) parallel time and O(n) work. For products of n-bit integers, the work becomes O(n2 logn). 2. Maximum Flow A prime example of a polynomial-time solvable problem that seems to be inherently sequential. 3. The Travelling Salesperson Problem Parallelism is not sufficient alone to conquer NP-completeness. 4. Matrix Determinants and Inverses There is a polylog parallel time & polynomial work algorithm.

CS-E4530 Computational Complexity Theory / Lecture 12 Department of Computer Science 14/25 12.3 Parallel Random Access Machines (PRAMs)

A formal model of parallel computation by a large number of independent processors. Each processor can execute its own program and can communicate with other processors instantaneously and synchronously through a large shared memory. Complexity closely related to Boolean circuits, hence a useful high-level tool for designing parallel algorithms and understanding parallelism. From the practical point of view somewhat unrealistic due to the assumptions of instantaneous communication and concurrent write access to the shared memory.

CS-E4530 Computational Complexity Theory / Lecture 12 Department of Computer Science 15/25 The PRAM model

A PRAM program is a set of RAM programs P = (Π1,...,Πq), one for each of the q RAMs. Each RAM Πi executes its own program, has its own program counter and accumulator, i.e. the ith register, but shares all registers (including accumulators and input). For concurrent writes the RAM with the smallest index prevails, i.e. we assume the PRIORITY CRCW PRAM model.1 input registers parallel programs registers 0 1 1 read 1 read 2 ... store 1 store 2 read 3 noop store 3 noop load 1 load 2 jzero f4 jpos t5 load 3 load 3 jump e4 jump e5 f4: load =0 t5: load =1 noop noop e4: store. 4 e5: .store 5 . . 1Cf. Papadimitriou, Note 15.5.7.

CS-E4530 Computational Complexity Theory / Lecture 12 Department of Computer Science 16/25 Uniform PRAM families

The number q of RAMs is a function q(m,n) of the number m of input integers in I = (i1,...,im) and their total length n = l(I). A family of PRAMs P = Pm,n m,n 0 is uniform if there is a { | ≥ } TM which given 1m01n generates q(m,n) and the programs Pm,n = (Πm,n,0,Πm,n,1,...,Πm,n,q(m,n)) all in logarithmic space.

CS-E4530 Computational Complexity Theory / Lecture 12 Department of Computer Science 17/25 Computing functions with PRAMs Let F be a function from finite sequences of integers to finite sequences of integers; and t(n) and q(n) functions from positive integers to positive integers.

Let P = Pm,n m,n 0 be a uniform family of PRAMs. { | ≥ } Definition (12.1) The family P computes F in parallel time t with q processors if for each m,n 0, for Pm,n it holds that: ≥ (i) it has q(m,n) q(n) processors and ≤ (ii) if Pm,n is executed on input I of m integers with total length n, then all q(m,n) RAMs reach a HALT instruction after at most t(n) steps and the k q(m,n) first registers contain the output F(I). ≤

CS-E4530 Computational Complexity Theory / Lecture 12 Department of Computer Science 18/25 PRAMs and circuits PRAMs can simulate circuits: If L 0,1 is in PT/WK(t(n),s(n)), then there is a uniform ⊆ { }∗ PRAM that computes the corresponding function FL in parallel s(n) time O(t(n)) using O( t(n) ) processors. [Brent’s principle!] Circuits can simulate PRAMs: Let F be computed by a uniform PRAM in parallel time t(n) using q(n) processors (t(n),q(n) can be computed from 1n in logarithmic space). Then there is a uniform family of circuits of depth O(t(n)(t(n) + logn)) and size O(q(n)t(n)(nkt(n) + q(n))) which computes the binary representation of F. (Here nk is the time bound of the logspace TM computing the nth PRAM in the family given 1n as its input.)

CS-E4530 Computational Complexity Theory / Lecture 12 Department of Computer Science 19/25 12.4 The Class NC

What would be the class of problems that is satisfactorily solved by parallel computers? A candidate definition (“Nick’s class”): NC = PT/WK(logk n,nk). NC is the class of languages decided by PRAMs in polylogarithmic parallel time and with polynomially many processors. However, the difference between polylog and polynomial is seen sometimes only for big n. For example, consider log3 n and √n: log3 108 > 18000 and √108 = 10000. One possibility is to consider subclasses of NC for j = 1,2,... : j k NCj = PT/WK(log n,n ) — a potential hierarchy of classes.

The class NC2 provides an alternative (more conservative) notion of “efficient parallel computation”.

CS-E4530 Computational Complexity Theory / Lecture 12 Department of Computer Science 20/25 NC vs P Clearly NC P but is NC = P? ⊆ There seem to be problems in P that are inherently sequential. Conjecture: NC = P. 6 Since NC (and NC2) is closed under logspace reductions, P-complete problems are the least likely to be in NC.

Theorem (12.4) CIRCUIT VALUE is P-complete.

Other P-complete problems are e.g. HORNSAT and MAX FLOW(D).

CS-E4530 Computational Complexity Theory / Lecture 12 Department of Computer Science 21/25 12.5 Sequential Space and Parallel Time

Logarithmic space classes and parallel complexity classes are closely related: Theorem (12.5)

NC1 L NL NC2. ⊆ ⊆ ⊆ Proof 1. For the last inclusion note that REACHABILITY is NL-complete (by the “reachability method” of Lecture 6), and belongs to 2 3 PT/WK(log n,n logn) NC2. ⊆ 2. The inclusion in the middle is trivial. 3. For the first inclusion, we have to compose three algorithms that operate in logarithmic space.

CS-E4530 Computational Complexity Theory / Lecture 12 Department of Computer Science 22/25 Proof cont’d

Proof of NC1 L ⊆ The following logspace algorithms are needed: 1. The first generates a circuit C from the given uniform family. 2. The second transforms C into an equivalent circuit/expression E whose gates have all outdegree one (no shared subexpressions). – Each path in C identifies a gate in E. 3. The third evaluates the output gate of the tree-like circuit E. – During the recursive evaluation, it is sufficient to remember the label of the gate being evaluated and its truth value. The composition operates in logarithmic space (cf. Section 8.3).

CS-E4530 Computational Complexity Theory / Lecture 12 Department of Computer Science 23/25 The parallel computation thesis The previous result can be generalised beyond logarithmic space: PT/WK(f (n),kf (n)) SPACE(f (n)) ⊆ NSPACE(f (n)) ⊆ 2 PT/WK(f (n)2,kf (n) ). ⊆ Sequential space and parallel time complexities are polynomially related!

Theorem (12.6) REACHABILITY is NL-complete.

Theorem (12.7) 2SAT is NL-complete.

CS-E4530 Computational Complexity Theory / Lecture 12 Department of Computer Science 24/25 Learning Objectives

Basic concepts of Boolean circuit complexity: languages with polynomial circuits and uniformly polynomial circuits. Parallel models of computation: uniform circuits and PRAMs

The classes NC, NC1, NC2 and their relationship to L,NL,P.

CS-E4530 Computational Complexity Theory / Lecture 12 Department of Computer Science 25/25