Time Complexity (1)

CSCI 2670

Original Slides were written by Dr. Frederick W Maier

Spring 2014

Time Complexity

- So far we’ve dealt with determining whether or not a problem is decidable.
- But even if it is, it might be “too difficult” in practice to decide.
- For a given string w and language L, it might require too much time or too much memory to determine whether or not w ∈ L.
- The time required to solve a problem is called its time complexity.
- The memory required to solve a problem is called its space complexity.
- Complexity theory is the study of the time and space complexity of problems.
- Chapter 7 deals with time complexity.

- Suppose we have a TM to decide A = {0^k 1^k | k ≥ 0}.
- This language is context free and so decidable.
- Informally, the time complexity is the number of steps required by the TM, as a function of the input size.
- We want to know the number of steps needed to determine whether w ∈ A.
- We usually express time complexity as a function of the length of the input string w.
- Note that if different input strings u and v both have length n, it might take more time to process u than v.
- In worst case analysis, we are interested in the maximum number of steps required for an input string of length n.
- In average case analysis, we are interested in the average number of steps required for an input string of length n.

Definition
- If M is a deterministic TM that halts on all inputs, then the time complexity (running time) of M is the function f : N → N, where f(n) is the maximum number of steps M uses on an input of length n.
- We say that M runs in time f(n) and M is an f(n) time Turing machine.

- It is often more convenient to use estimates of f(n) rather than f(n) itself to describe the running time of a TM.
- In asymptotic analysis, we estimate the running time of the algorithm when it is run on large inputs.
- Lower-order terms of the function contribute little to the running time on large inputs and so can be ignored.
- The most common estimates are big-O and small-O (little-O) estimates.

Definition
If f and g are functions such that f, g : N → R+, then f(n) is O(g(n)) iff there exist positive integers c and n0 such that

f(n) ≤ c · g(n) for all n ≥ n0.

g(n) is an asymptotic upper bound for f(n) (c · g(n) is an upper bound of f(n)).

“f(n) is O(g(n))” means that if we ignore constant factors, f(n) ≤ g(n).

Example
- 5n^3 + 2n^2 + 22n + 6 is O(n^3): Let c = 6 and n0 = 10.
- n is O(n^2): Let c = 1 and n0 = 1.
- n^2 is not O(n): No matter what c and n0 are chosen, n^2 > c · n for some n ≥ n0.
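The witnesses c and n0 in the examples can be spot-checked numerically. The Python sketch below is not a proof, since it only tests finitely many values of n. The helper name `holds_up_to` and the cutoff `limit` are illustrative assumptions, not part of the original slides.

```python
def holds_up_to(f, g, c, n0, limit=10_000):
    """Return True if f(n) <= c * g(n) for every n with n0 <= n <= limit."""
    return all(f(n) <= c * g(n) for n in range(n0, limit + 1))


def f(n):
    return 5 * n**3 + 2 * n**2 + 22 * n + 6


def cube(n):
    return n**3


# c = 6 and n0 = 10 hold on the whole tested range.
print(holds_up_to(f, cube, c=6, n0=10))
# The same c fails if we start at n0 = 1, because f(1) = 35 > 6 * 1.
print(holds_up_to(f, cube, c=6, n0=1))
# n^2 is not O(n): any fixed c is overtaken as soon as n > c.
print(holds_up_to(lambda n: n * n, lambda n: n, c=100, n0=1))
```

A passing check only says that no counterexample exists below `limit`; the actual claim still needs the algebraic argument.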

Big O Notation and Logarithms

- The base of a logarithm doesn’t matter when using Big-O notation.
- Note that for any bases a and b, log_b(n) = log_a(n) / log_a(b).
- So, if f(n) ≤ c · log_b(n), then f(n) ≤ c · log_a(n) / log_a(b).
- Letting c1 = ⌈c / log_a(b)⌉, it follows that f(n) ≤ c1 · log_a(n).
- So, if f(n) is O(log_b(n)), f(n) is O(log_a(n)).
- We don’t even bother with the base: f(n) is O(log(n)).

Example

If f(n) = 3n log_2(n) + 5n log_2(log_2(n)) + 2, then f(n) is O(n log(n)).

Arithmetic and Big O Notation

- If f1(n) is O(g1(n)) and f2(n) is O(g2(n)), then
  - f1(n) + f2(n) is O(g1(n)) + O(g2(n)).
  - f1(n) + f2(n) is O(max(g1(n), g2(n))).
- If f(n) appears in an exponent, we can use the Big-O estimate there:
  - 2^(3n^3 + 2n^2 + n + 6) is 2^(O(n^3)).

- Frequently we derive bounds of the form n^c for c > 0. Such bounds are called polynomial bounds.
- Bounds of the form 2^(n^δ) are called exponential bounds when δ is a real number greater than 0.

Small-O Notation

- In a way, Big-O notation says that f(n) is less than or equal to g(n).
- Small-O notation says that f(n) is less than g(n).

Definition
If f and g are functions such that f, g : N → R+, then f(n) is o(g(n)) iff

lim_{n→∞} f(n) / g(n) = 0.

Alternatively, f(n) is o(g(n)) iff for all real constants c > 0, there is an n0 such that f(n) < c · g(n) for all n ≥ n0.

Example
- √n is o(n).
- n is o(n log(log(n))).
- n log(n) is o(n^2).
- n^2 is o(n^3).

Analyzing Algorithms

- Consider TM M1 which decides A = {0^k 1^k | k ≥ 0}. It works in 4 phases.

On input w:
1. Scan the tape, rejecting if a 0 is found to the right of a 1.
2. While both 0s and 1s are still on the tape:
3.     Scan the tape, marking off a single 0 and 1.
4. Reject if a 0 remains but all 1s are marked, or vice versa. If not, accept.

- What is the running time of M1 as a function of n?

- Phase 1 scans once through the tape, taking O(n) steps, where |w| = n. The tape head then returns to the left end, taking another n steps. Phase 1 takes O(n) steps.
- In Phases 2-3, the tape is scanned to check that both 1s and 0s appear; another scan marks off a single 0 and 1.
- In each cycle, 2 symbols are marked, and so the total number of cycles is at most n/2. Each cycle takes O(n) steps, so Phases 2 and 3 together take (n/2) · O(n) = O(n^2) steps.
- In Phase 4, we check to see that all 0s and 1s are marked off. This takes only a single scan of the tape: O(n).
- And so the running time of M1 is O(n) + O(n^2) + O(n) = O(n^2).
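To see the quadratic growth concretely, the following Python sketch mimics M1 on a list standing in for the tape and counts one tape scan as len(tape) steps. This is a rough proxy, not a faithful Turing machine simulation; the function name `m1_decides`, the marker symbol "x", and the step-counting convention are assumptions made for illustration. The input is assumed to be a string over {0, 1}.

```python
def m1_decides(w: str):
    """Mimic M1's phases; return (accepted, cells_scanned)."""
    tape = list(w)
    steps = len(tape)                      # phase 1: one scan of the input
    if "10" in w:                          # a 0 appears to the right of a 1
        return False, steps
    while "0" in tape and "1" in tape:     # phase 2: both symbols remain
        steps += len(tape)                 # phase 3: one scan per cycle
        tape[tape.index("0")] = "x"        # mark off a single 0
        tape[tape.index("1")] = "x"        # mark off a single 1
    steps += len(tape)                     # phase 4: final scan
    accepted = "0" not in tape and "1" not in tape
    return accepted, steps


for k in (10, 20, 40):
    w = "0" * k + "1" * k
    print(len(w), m1_decides(w))
```

Under this counting convention an accepted input of length n = 2k costs n · (k + 2) cell visits, which grows like n^2 / 2.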

Complexity Classes: TIME(t(n))

- Observe that M1 ran in time O(n^2), and it decides A.
- We can classify languages by the algorithms that decide them.

Definition
Let t : N → R+ be a function. The time complexity class TIME(t(n)) is the set of all languages that can be decided in time O(t(n)).

- Observe, e.g., if L ∈ TIME(n^2), then L ∈ TIME(n^3).
- If a decider M for L runs in time O(t(n)), then L ∈ TIME(t(n)).
- So, A = {0^k 1^k | k ≥ 0} is in TIME(n^2).
- Failing to find an O(t(n))-time decider doesn’t imply L ∉ TIME(t(n)).

Analyzing Algorithms

- Consider TM M2 which decides A = {0^k 1^k | k ≥ 0}. It works in 5 phases.

On input w:
1. Scan the tape, rejecting if a 0 is found to the right of a 1.
2. While some 0s and some 1s are on the tape:
3.     Scan the tape. Reject if the number of unmarked symbols is odd.
4.     Scan the tape, crossing off every other 0 (starting with the first 0) and every other 1 (starting with the first 1).
5. Scan the tape. If all symbols are marked, accept. Otherwise reject.

- What’s the running time of M2?

- Phase 1 again takes O(n) steps, as does phase 5.
- To check that some 0s and 1s appear (phase 2) takes O(n) steps.
- Each execution of phases 3 and 4 takes O(n) steps.
- Each execution of phase 4 cuts the number of 0s and 1s by half.
- Phases 3 and 4 run at most 1 + log_2(n) times; phases 2-4 take time O(n log(n)).
- Total running time of M2: O(n) + O(n log(n)) = O(n log(n)).
- As such, A ∈ TIME(n log(n)).
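The halving idea behind M2 can be sketched in Python by tracking only the counts of unmarked 0s and 1s. This abstracts away the tape entirely, so it illustrates the number of rounds rather than the exact step count; the name `m2_decides` and the returned round counter are illustrative assumptions. Crossing off every other symbol starting with the first leaves floor(count / 2) unmarked symbols.

```python
def m2_decides(w: str):
    """Mimic M2 on counts of unmarked symbols; return (accepted, rounds)."""
    if "10" in w:                           # phase 1: a 0 to the right of a 1
        return False, 0
    zeros, ones = w.count("0"), w.count("1")
    rounds = 0
    while zeros > 0 and ones > 0:           # phase 2
        if (zeros + ones) % 2 == 1:         # phase 3: odd total, reject
            return False, rounds
        zeros //= 2                         # phase 4: cross off every other 0
        ones //= 2                          #          and every other 1
        rounds += 1
    return zeros == 0 and ones == 0, rounds  # phase 5


print(m2_decides("0" * 8 + "1" * 8))
print(m2_decides("0" * 6 + "1" * 2))
```

Each round compares one binary digit of the two counts, which is why roughly log_2(n) rounds suffice.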

- TMs M1 and M2 were single tape deterministic machines.
- M3 is a 2-tape machine that decides A.

On input w:
1. Scan tape 1, rejecting if a 0 is found to the right of a 1.
2. Scan across the 0s on tape 1 until the first 1. At the same time, copy the 0s onto tape 2.
3. Scan across the 1s on tape 1 until the end of the input. For each 1 read on tape 1, cross off a 0 on tape 2. If all 0s are crossed off before all the 1s are read, reject.
4. If all the 0s have now been crossed off, accept. If any 0s remain, reject.

- What’s the running time of M3?

- M3 runs in time O(n).
- Note that this running time is the best possible because n steps are necessary just to read the input.

- M2 decides A in time O(n log(n)).
- It turns out that no single tape TM can decide A in time o(n log(n)).
- In fact, any language decidable by a 1-tape TM in time o(n log(n)) is regular.
- Yet a multitape TM can decide A (a nonregular language) in time O(n).
- It is important to realize the following:

The complexity class of a language depends on the computational model.

- Contrast this with the Church-Turing Thesis, which asserts that the computational model doesn’t affect decidability.
- One question to ask is: How do the complexity classes of one computational model relate to those of another model?

Complexity Relationships Among Models: k-tape vs. 1-tape

Theorem
Every t(n) time multitape TM M, where t(n) ≥ n, has an equivalent O(t^2(n)) time single tape TM S.

We already have a way to convert a k-tape TM M into a single tape machine S. We simply need to analyze the behavior of S.

- To simulate the behavior of the multi-tape machine, the single tape machine makes multiple passes across its tape, updating the virtual tapes appropriately.
- If a virtual tape head moves onto an unread blank, the contents of the single tape must be shifted appropriately to make room.

M: a k-tape TM; S: a single tape TM.
- For each step of M, S makes two passes over its tape: one to collect information about the current configuration of M, and one pass to update M’s configuration.
- Note that each of M’s tapes can use at most t(n) tape cells, so the used portion of S’s tape has length O(t(n)) and each pass takes O(t(n)) steps.
- S must simulate each of the t(n) steps of M.
- To update a given portion of S’s tape (representing one of the k tapes), a shift of its contents might be needed. This takes O(t(n)) steps. So to update all k tape portions takes O(k · t(n)) steps, which is O(t(n)) because k is a constant.
- As such, the total running time of S is t(n) · O(t(n)), or O(t^2(n)).

Complexity Relationships Among Models: NTMs vs DTMs

- In a nondeterministic TM N, the computations form a tree.
- Each node represents a configuration.
- Each edge represents a possible transition from one configuration to another.
- The TM accepts w if any branch ends in an accepting configuration.
- It rejects if all branches end in a rejecting configuration.
- In a nondeterministic decider, all computation branches halt.

Definition
Let N be a nondeterministic TM decider. The running time f(n) of N is a function f : N → N where f(n) is the maximum number of steps taken by any of N’s computation branches on any input of length n.

Theorem
Every t(n) time nondeterministic single-tape TM, where t(n) ≥ n, has an equivalent 2^(O(t(n))) time deterministic single-tape TM.

- Observe that we have a way to simulate a nondeterministic TM (NTM) N with a deterministic 3-tape machine D.

An NTM N can be simulated with a deterministic TM (DTM) D.

- D uses 3 tapes:
  - Tape 1: records the input string. It’s reused many times.
  - Tape 2: used as N’s tape.
  - Tape 3: holds a string d_1 d_2 d_3 ... d_n, where each d_i indicates a choice to make at step i. Each d_i is taken from {1, 2, ..., b}, where b is the maximum number of children of any node. The string d_1 d_2 d_3 ... d_n indicates a computation branch of N.

1. The DTM D begins with tapes 2 and 3 empty; tape 1 holds the input string.
2. Wipe tape 2, and copy the input string from tape 1 to tape 2.
3. Simulate the NTM N on tape 2.
   3.1 At each step i, determine the value v of cell d_i on tape 3.
   3.2 If v is a valid transition choice for N, then update tape 2.
   3.3 If not, abort the branch: GOTO step 4.
   3.4 Also abort if the transition represents reject. If it represents accept, D accepts.
4. Update the value on tape 3 by choosing the lexicographically next string. GOTO step 2.
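The replay-from-scratch strategy of D can be sketched in Python. Here a nondeterministic computation is abstracted as a start configuration plus a `successors` function, and the address strings of tape 3 are enumerated by increasing length. The names `deterministic_search`, `successors`, `is_accepting`, and the toy example are all illustrative assumptions; unlike D, the sketch stops at a given `max_depth` (playing the role of t(n)) so that it can also report rejection.

```python
from itertools import product


def deterministic_search(start, successors, is_accepting, b, max_depth):
    """Search a nondeterministic computation tree the way D does."""
    for length in range(max_depth + 1):
        # All address strings of this length, in lexicographic order (tape 3).
        for address in product(range(b), repeat=length):
            config = start                     # restart from the input (tape 1 -> tape 2)
            valid = True
            for choice in address:
                options = successors(config)
                if choice >= len(options):     # invalid choice: abort this branch
                    valid = False
                    break
                config = options[choice]
            if valid and is_accepting(config):
                return True
    return False


# Toy nondeterministic computation: at step i, either skip NUMS[i] or subtract it.
NUMS = [4, 11, 16, 21, 27]


def successors(config):
    i, remaining = config
    if i == len(NUMS):
        return []
    return [(i + 1, remaining), (i + 1, remaining - NUMS[i])]


def is_accepting(config):
    return config[1] == 0


print(deterministic_search((0, 25), successors, is_accepting, b=2, max_depth=len(NUMS)))
```

Replaying every address from the root is wasteful, but it matches the O(t(n)) cost per node used in the analysis.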

- When simulating the NTM N, D uses a breadth first search (it visits all nodes of depth d before visiting nodes of depth d+1). This is due to how strings for tape 3 are chosen (in order of length, and lexicographically within each length).
- Let b be the maximum number of children of any node.
- Every branch of N’s nondeterministic computation tree has a length of at most t(n).
- The time for starting from the root and traveling down to a node is O(t(n)).

- The number of leaves in the tree is at most b^(t(n)).
- The total number of nodes in the tree is less than twice the maximum number of leaves.
- So the number of nodes in the tree is O(b^(t(n))).
- As such, the total running time of D is O(t(n) · b^(t(n))).

- The total running time of D is O(t(n) · b^(t(n))).
- t(n) · b^(t(n)) = 2^(log_2(t(n) · b^(t(n)))) = 2^(log_2(t(n)) + log_2(b^(t(n)))) = 2^(log_2(t(n)) + t(n) · log_2(b)).
- Running time of D is O(2^(log_2(t(n)) + t(n) · log_2(b))) = 2^(O(t(n))).
- From D we can construct a single tape DTM M running in squared time.
- Running time of M is (2^(O(t(n))))^2 = 2^(O(2t(n))) = 2^(O(t(n))).

The Class P

- There is at most a polynomial difference in the running time of a deterministic k-tape TM and a single tape TM.
- There is at most an exponential difference in the running time of a nondeterministic TM and a deterministic TM.
- The difference between algorithms that run in polynomial time and those that run in exponential time is considered great.
- Polynomial time algorithms are considered tractable.
- Exponential time algorithms are rarely usable for large input sizes.
- All reasonable deterministic computational models are polynomially equivalent. That is, any one of them can simulate another with only a polynomial increase in running time.
- For the moment, we will focus on the class of problems that are decidable in polynomial time.
- This is the class P.

Definition
P is the class of languages that are decidable in polynomial time on a deterministic single-tape Turing machine.

P = ⋃_k TIME(n^k)

- The various “reasonable” models of computation that have been developed are all polynomially equivalent: each can be simulated by another with only a polynomial increase in running time.
- As such, the emphasis on deterministic single-tape Turing machines is not important.
- We could replace deterministic single-tape Turing machines with some other reasonable model of computation, and P would be left unaffected.
- That is, P is invariant for all computation models that are polynomially equivalent to deterministic single tape TMs.
- Roughly, P corresponds to problems that are realistically solvable by computers.

A note about encoding schemes

- How a problem is encoded has an effect on the running time of an algorithm.
- For instance, we might choose to encode a graph of n nodes as a list of numbers (nodes) and number pairs (edges). We might choose to encode a TM M as a list of the components of its 7-tuple. Both encodings are reasonable.
- But note that unary notation for encoding numbers (as in the number 17 encoded by the unary string 11111111111111111) is not reasonable because it is exponentially larger than truly reasonable encodings, such as base k notation for any k ≥ 2.
- We only consider “reasonable” encoding schemes, those that can be converted to another reasonable encoding scheme in polynomial time.

Problems in P: PATH

The language PATH is in P: PATH = {⟨G, s, t⟩ | G is a directed graph with a directed path from s to t}.

- A path can be represented as a list of nodes.
- Suppose G has m nodes. Then a path in G has at most length m.
- A brute force approach of checking all potential paths in G would take exponential time.
- The algorithm M below, however, takes polynomial time.

On input ⟨G, s, t⟩ (where G has m nodes):
1. Mark s.
2. Repeat Step 3 until no additional nodes are marked:
3.     Scan the nodes and edges of G. If a is a marked node and (a, b) is an edge and b is unmarked, then mark b.
4. If t is marked, accept. If not, reject.

- Steps 1 and 4 are executed once. They require (at most) scanning G once.
- Step 3 is executed at most m times, since at least one new node is marked in every iteration except the last.
- Executing Step 3 can clearly be done in polynomial time (relative to m).
- So the running time of the algorithm is polynomial.
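The marking procedure translates almost line for line into Python. In this sketch the graph is assumed to be given as a list of directed edge pairs, and the function name `path` is an illustrative choice; it is one straightforward rendering of the four steps, not the only one.

```python
def path(edges, s, t):
    """Decide whether t is reachable from s by repeatedly marking nodes."""
    marked = {s}                           # step 1: mark s
    changed = True
    while changed:                         # step 2: repeat until nothing new is marked
        changed = False
        for a, b in edges:                 # step 3: scan the edges of G
            if a in marked and b not in marked:
                marked.add(b)
                changed = True
    return t in marked                     # step 4


edges = [(1, 2), (2, 3), (4, 1)]
print(path(edges, 1, 3), path(edges, 3, 1))
```

The outer loop runs at most once per node plus one final pass, and each pass looks at every edge once, which gives the polynomial bound.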

Problems in P: RELPRIME

- Integers x and y are relatively prime if their greatest common divisor is 1.
- That is, x and y are relatively prime if 1 is the only positive integer to evenly divide them both.

RELPRIME is in P.

RELPRIME = {⟨x, y⟩ | x, y are relatively prime}

- Observe that a brute force approach of checking every integer n between 2 and, e.g., min(x, y) won’t work, because the magnitude of a number is exponential in the length of any reasonable encoding of it, so the number of candidates is exponential in the input size.
- Instead, the Euclidean algorithm can be used.

- The Euclidean algorithm is encoded in E.
- R uses E to decide RELPRIME.
- If E runs in polynomial time, then R does.

E = On input ⟨x, y⟩ (x, y ∈ N, represented in binary):
1. Repeat until y = 0:
2.     Assign x mod y to x.
3.     Swap x and y.
4. Return x.

R = On input ⟨x, y⟩ (x, y ∈ N, represented in binary):
1. Run E on ⟨x, y⟩.
2. If E returns 1, then accept. Otherwise reject.

- In iteration 1, if x < y, then x and y are swapped. So assume x > y.
- After stage 2, x < y (because x mod y < y). After stage 3, x > y.
- Either x/2 ≥ y or else x/2 < y.
- If x/2 ≥ y, then since x mod y < y, it follows that x mod y < y ≤ x/2.
- If x/2 < y, then x < 2y, and so x mod y = x − y. Note that x − y < x/2 (since x − x/2 < y implies x − y < x/2). ∴ x mod y < x/2.
- So, either way, x drops by at least half. Since x and y are swapped each iteration, the values of each are cut at least in half every other iteration.
- So stages 2-3 are run at most min(2 log_2(x), 2 log_2(y)) times.
- This is proportional to the encoding size of x and y, so the number of iterations is O(n). Each iteration takes polynomial time, and so the algorithm runs in polynomial time.
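The two machines E and R can be written directly in Python. The sketch also counts loop iterations so the logarithmic growth can be observed; the function names `euclid` and `relprime` and the iteration counter are illustrative additions, not part of the slides.

```python
def euclid(x, y):
    """Return (gcd(x, y), iterations) following stages 1-4 of E."""
    iterations = 0
    while y != 0:            # stage 1: repeat until y = 0
        x = x % y            # stage 2: assign x mod y to x
        x, y = y, x          # stage 3: swap x and y
        iterations += 1
    return x, iterations     # stage 4


def relprime(x, y):
    """R: accept exactly when E returns 1."""
    return euclid(x, y)[0] == 1


print(relprime(10, 21), relprime(10, 22))
print(euclid(10**18 + 9, 10**9 + 7))
```

Even for inputs with around 60 bits the iteration count stays small, in line with the bound of roughly twice the bit length.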

In-class Questions

Which of the following pairs of numbers are relatively prime? Show the calculations that led to your conclusions.
1. 81 and 625
2. 375 and 147

Problems in P: CFLs

Every context free language is in P.

- This can be shown using the CYK algorithm, which determines whether string w is derivable by CNF grammar G.
- The algorithm considers the substrings of w, |w| = n, of length i = 1, 2, ..., n, and determines the variables deriving each substring of length i.
- The algorithm begins with substrings of length i = 1.
- For subsequent lengths i, results for previous lengths are combined to provide answers for the current value of i.
- The CYK algorithm is an example of dynamic programming: a problem is subdivided into smaller ones, and solutions to the smaller problems are used to solve the larger one.
- Importantly, solutions to each smaller problem are recorded, so that work is not needlessly repeated.

CYK Algorithm

 1: procedure cyk(w_1 w_2 ... w_n)            // G = (V, Σ, R, S).
 2:   if w = ε and S → ε ∈ R then             // w = ε is a special case.
 3:     return accept
 4:   for i = 1 to n do
 5:     for each A ∈ V do                     // find all variables yielding each w_i
 6:       if A → w_i ∈ R then
 7:         add A to table[i][i]
 8:   for len = 2 to n do                     // compute variables for substrings of length len.
 9:     for i = 1 to n − len + 1 do
10:       j := i + len − 1                    // j is the end of the substring.
11:       for split = i to j − 1 do
12:         for each A → BC ∈ R do
13:           if B ∈ table[i][split] and C ∈ table[split + 1][j] then
14:             add A to table[i][j]
15:   if S ∈ table[1][n] then
16:     return accept
17:   else
18:     return reject
19: end procedure

CYK Algorithm: Analysis

- We can assume that G, and hence its rules, variables, etc., are fixed. That is, they are constant.
- Thus, we are only interested in the algorithm as the size n of the input string changes.
- Steps 2 and 3 can be done in constant time.
- Since V and R are constant, Steps 4-7 take time O(n).
- The loops at 8, 9, and 11 each run at most n times.
- The loop at 12 is bounded by a constant (the number of rules).
- The remaining if statements, etc., can be done in constant time.
- As such, the total running time of the algorithm is O(n^3).
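A runnable version of the procedure helps make the three nested loops visible. The Python sketch below uses 0-based indices and represents a CNF grammar as two lists of rules; this representation, the function signature, and the example grammar for {a^n b^n | n ≥ 1} are assumptions chosen for illustration.

```python
def cyk(w, terminal_rules, binary_rules, start, start_derives_empty=False):
    """Return True iff the CNF grammar derives w.

    terminal_rules: list of (A, a) pairs for rules A -> a
    binary_rules:   list of (A, B, C) triples for rules A -> BC
    """
    n = len(w)
    if n == 0:                                   # w = ε is a special case
        return start_derives_empty
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, symbol in enumerate(w):               # substrings of length 1
        table[i][i] = {A for A, a in terminal_rules if a == symbol}
    for length in range(2, n + 1):               # substring length
        for i in range(n - length + 1):          # start of the substring
            j = i + length - 1                   # end of the substring
            for split in range(i, j):            # last index of the left part
                for A, B, C in binary_rules:
                    if B in table[i][split] and C in table[split + 1][j]:
                        table[i][j].add(A)
    return start in table[0][n - 1]


# Example grammar (an assumption): S -> AB | AT, T -> SB, A -> a, B -> b
TERMINAL = [("A", "a"), ("B", "b")]
BINARY = [("S", "A", "B"), ("S", "A", "T"), ("T", "S", "B")]
print(cyk("aabb", TERMINAL, BINARY, "S"), cyk("aab", TERMINAL, BINARY, "S"))
```

With the grammar fixed, the work is dominated by the loops over `length`, `i`, and `split`, matching the O(n^3) bound.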

In-class Questions

Use the CYK algorithm to determine whether the following grammar generates string babab.

S → AB | AC
A → BC | AA
B → b
C → a

Difficult problems

Problems exist for which no polynomial time algorithms are known.

- A Hamiltonian path through a directed graph (digraph) G is a path that goes through every node exactly once.
- No polynomial time algorithms are known for HAMPATH:

HAMPATH = {⟨G, s, t⟩ | G is a digraph with a Hamiltonian path from s to t}

- A brute force enumeration of all possible paths would decide the language, but it runs in exponential time.
- We would only need to check whether each enumerated path is Hamiltonian.

Polynomial Verifiability

- Observe that it is easy to check that a given sequence of nodes is a path and that it is Hamiltonian.
- HAMPATH is polynomially verifiable. If given a Hamiltonian path p, we can verify that p is a Hamiltonian path in polynomial time. This is much easier than discovering a solution.
- COMPOSITES = {x | x = pq for integers p, q > 1} is also polynomially verifiable. Given p and q, we can easily check that pq = x.

- The complement of HAMPATH might not be polynomially verifiable. (It’s difficult to test for the non-existence of something, in this case a Hamiltonian path.)

Definition
- A verifier for a language A is an algorithm V, where A = {w | V accepts ⟨w, c⟩ for some string c}.
- The verifier halts on every input.
- A polynomial verifier runs in polynomial time in the length of w.
- String c is called a certificate (proof) of w’s membership in A.
- For a polynomial verifier, c’s length is a polynomial of w’s length.
- Language A is polynomially verifiable if it has a polynomial verifier.

- For HAMPATH, the certificate would be a Hamiltonian path from s to t.
- For COMPOSITES, the certificate would be one of x’s divisors.

Polynomial NTM

The following is a nondeterministic Turing machine (NTM) N1 that decides the HAMPATH problem in nondeterministic polynomial time.

N1 = “On input ⟨G, s, t⟩, where G is a directed graph with nodes s and t:
1. Write a list of m numbers, p_1, p_2, ..., p_m, where m is the number of nodes in G. Each number in the list is nondeterministically selected to be between 1 and m.
2. Check for repetitions in the list. If any are found, reject.
3. Check whether s = p_1 and t = p_m. If either fails, reject.
4. For each i between 1 and m − 1, check whether (p_i, p_{i+1}) is an edge of G. If any are not, reject. Otherwise, all tests have been passed, so accept.”
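Stages 2-4 of N1 are deterministic checks on the guessed list, so they double as a polynomial time verifier for HAMPATH with the list as certificate. A minimal Python sketch follows, assuming the graph is given as a collection of nodes and a set of directed edge pairs; the name `verify_hampath` and this input format are illustrative assumptions.

```python
def verify_hampath(nodes, edges, s, t, p):
    """Check that the list p is a Hamiltonian path from s to t."""
    if not nodes:
        return False
    # Stage 2: m entries with no repetitions, all of them nodes of G.
    if len(p) != len(nodes) or set(p) != set(nodes):
        return False
    # Stage 3: the path starts at s and ends at t.
    if p[0] != s or p[-1] != t:
        return False
    # Stage 4: every consecutive pair is an edge of G.
    return all((a, b) in edges for a, b in zip(p, p[1:]))


nodes = [1, 2, 3, 4]
edges = {(1, 2), (2, 3), (3, 4), (1, 3)}
print(verify_hampath(nodes, edges, 1, 4, [1, 2, 3, 4]))
print(verify_hampath(nodes, edges, 1, 4, [1, 3, 2, 4]))
```

Each check is a single pass over the list, so verification is fast even though no fast method for finding such a list is known.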

The Class NP

Definition
NP is the class of languages that have polynomial time verifiers.

- P is clearly a subclass of NP. If a problem can be decided in polynomial time, then a polynomial verifier can be made (ignore the certificate).
- The term NP comes from “nondeterministic polynomial time”.
- NP is the class of languages that can be decided in polynomial time by nondeterministic single-tape Turing machines.

Theorem
A language is in NP iff it is decided by some nondeterministic polynomial time Turing machine.

- The idea is to convert a polynomial time verifier into a nondeterministic polynomial time decider, and vice versa.

Proof.
For the forward direction of this theorem (⇒), let A ∈ NP and show that A is decided by a polynomial time NTM N. Let V be the polynomial time verifier of A that runs in time n^k, where k is an integer constant. Construct N as follows.

N = “On input w of length n:
1. Nondeterministically generate a certificate c of length at most n^k.
2. Run V on input ⟨w, c⟩.
3. If V accepts, accept; if V rejects, reject.”

Proof.
To prove the other direction (⇐), assume that A is decided by a polynomial time NTM N and construct a polynomial time verifier V as follows.

V = “On input ⟨w, c⟩, where w and c are strings:
1. Simulate N on input w, treating each symbol of c as a description of the nondeterministic choice to make at each step (c encodes a computation branch of N).
2. If this branch of N’s computation accepts, accept; otherwise, reject.”

The nondeterministic time complexity class NTIME(t(n)).

Definition
NTIME(t(n)) = {L | L is a language decided by an O(t(n)) time NTM}.

Corollary
NP = ⋃_k NTIME(n^k).

- Like P, NP is invariant under the choice of model.
- The problems CLIQUE and SUBSET-SUM are in NP.

Problems in NP: CLIQUE

- In an undirected graph, a clique is a set of nodes such that each pair in the clique is connected by an edge.
- A k-clique is a clique of k nodes.

Theorem
CLIQUE = {⟨G, k⟩ | G is an undirected graph with a k-clique} is in NP.

Proof Idea: the clique is the certificate.

Proof.
The following is a verifier V for CLIQUE.

V = “On input ⟨⟨G, k⟩, c⟩:
1. Test whether c is a set of k nodes in G.
2. Test whether G contains all edges connecting nodes in c.
3. If both pass, accept; otherwise, reject.”
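The verifier V can be sketched in a few lines of Python. The graph is assumed to be given as a collection of nodes plus a list of undirected edges written as pairs; the name `verify_clique` and this representation are illustrative assumptions.

```python
from itertools import combinations


def verify_clique(nodes, edges, k, c):
    """Check that certificate c is a set of k nodes of G that forms a clique."""
    members = set(c)
    # Test 1: c is a set of k distinct nodes of G.
    if len(c) != k or len(members) != k or not members <= set(nodes):
        return False
    # Test 2: G contains an edge between every pair of nodes in c.
    undirected = {frozenset(e) for e in edges}
    return all(frozenset(pair) in undirected for pair in combinations(members, 2))


nodes = {1, 2, 3, 4}
edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
print(verify_clique(nodes, edges, 3, [1, 2, 3]))
print(verify_clique(nodes, edges, 3, [2, 3, 4]))
```

The second test examines k(k − 1)/2 pairs, which is polynomial in the size of the input.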

Alternative Proof: think of NP in terms of nondeterministic polynomial time Turing machines.

Proof.
N = “On input ⟨G, k⟩, where G is a graph:
1. Nondeterministically select a subset S of k nodes of G.
2. Test whether G contains all edges connecting nodes in S: for every a, b ∈ S (with a, b distinct), check that an edge for a, b exists in G.
3. If yes, accept; otherwise, reject.”

Problems in NP: SUBSET-SUM

- Given S = {x_1, ..., x_n}, and a number t, we want a subset of S that sums to t.

Theorem
SUBSET-SUM is in NP, where SUBSET-SUM is

{⟨S, t⟩ | S = {x_1, ..., x_n} and there is an S′ ⊆ S such that Σ_{y ∈ S′} y = t}.

- For example, ⟨{4, 11, 16, 21, 27}, 25⟩ ∈ SUBSET-SUM because 4 + 21 = 25.
- Note that S and S′ are considered to be multisets and so allow repetition of elements.

Proof Idea: the subset is the certificate.

Proof.
The following is a verifier V for SUBSET-SUM.

V = “On input ⟨⟨S, t⟩, c⟩:
1. Test whether c is a collection of numbers that sum to t.
2. Test whether S contains all the numbers in c.
3. If both pass, accept; otherwise, reject.”
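Because S and the certificate are multisets, the containment test has to respect multiplicities. The Python sketch below handles this with `collections.Counter`; the function name `verify_subset_sum` and the use of Python lists for multisets are illustrative assumptions.

```python
from collections import Counter


def verify_subset_sum(S, t, c):
    """Check that certificate c sums to t and is a sub-multiset of S."""
    # Test 1: the numbers in c sum to t.
    if sum(c) != t:
        return False
    # Test 2: S contains every number in c, at least as many times as c uses it.
    available, used = Counter(S), Counter(c)
    return all(used[y] <= available[y] for y in used)


S = [4, 11, 16, 21, 27]
print(verify_subset_sum(S, 25, [4, 21]))
print(verify_subset_sum(S, 25, [25]))
```

Both tests are linear in the lengths of S and c, so the verifier runs in polynomial time.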

Alternative Proof: think of NP in terms of nondeterministic polynomial time Turing machines.

Proof.
N = “On input ⟨S, t⟩:
1. Nondeterministically select a subset c of numbers in S.
2. Test whether c is a collection of numbers that sum to t.
3. If the test passes, accept; otherwise, reject.”

P versus NP

- P is the class of languages decidable in polynomial time by deterministic TMs.
- NP is the class of languages decidable in polynomial time by nondeterministic TMs.

Or, more colloquially...

- P is the class of languages for which membership can be decided quickly.
- NP is the class of languages for which membership can be verified quickly.

- It should be clear that P is a subset of NP.
- What we don’t know is whether P is a proper subset of NP.
- That is, we don’t know whether P = NP or P ≠ NP.

- This is one of the greatest unsolved problems in theoretical computer science.
- If P = NP, then all problems in NP would have deterministic polynomial time algorithms. That is, they would all be quickly solvable.
- Very few people think that P = NP, but no one has been able to prove that P ≠ NP.

- The best method known for deciding languages in NP deterministically uses exponential time.
- And so, what we do know is that NP ⊆ EXPTIME = ⋃_k TIME(2^(n^k)).
- EXPTIME is the set of languages decidable by deterministic TMs with running times of O(2^(p(n))), where p(n) is a polynomial function of n.
- We don’t know whether NP is in a smaller deterministic complexity class.

- P is closed under union, concatenation, and complement.
- NP is closed under union and concatenation.

Prove that
- P is closed under complement.
- NP is closed under concatenation.

- P is closed under complement.

Proof.
For any language L ∈ P, let M be the TM that decides it in polynomial time. We construct a TM M′ that decides the complement of L in polynomial time:

M′ = “On input ⟨w⟩:
1. Run M on w.
2. If M accepts, reject. If it rejects, accept.”

- NP is closed under concatenation.

Proof.
For any two languages L1, L2 ∈ NP, let M1 and M2 be the NTMs that decide them in polynomial time. We construct an NTM M′ that decides L1 ◦ L2 in polynomial time:

M′ = “On input ⟨w⟩:
1. For each way to cut w into two substrings w = w1 w2:
2.     Run M1 on w1.
3.     Run M2 on w2. If both accept, accept; otherwise continue with the next choice of w1 and w2.
4. If w is not accepted after trying all the possible cuts, reject.”

More Examples in P

- Let CONNECTED = {⟨G⟩ | G is a connected graph}. Show that CONNECTED is in P. Recall that a graph is connected if every node can be reached from every other node by traveling along the edges of the graph.

Proof.
We construct a TM M that decides CONNECTED in polynomial time.

M = “On input ⟨G⟩ where G is a graph:
1. Select the first node of G and mark it.
2. Repeat the following stage until no new nodes are marked:
3.     For each node in G, mark it if it is attached by an edge to a node that is already marked.
4. Scan all the nodes of G to determine whether they all are marked. If they are, accept; otherwise, reject.”

Let n be the number of nodes of G.
- Stage 1: O(1)
- Stage 2: at most n + 1 iterations of the loop
- Stage 3: O(n^2) per iteration
- Stage 4: O(n)
- Therefore, the algorithm runs in O(n^3) time and CONNECTED is in P.
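The four stages of M can be sketched in Python as follows. The graph is assumed to be undirected and given as a list of nodes plus a list of edge pairs whose endpoints all appear in the node list; the name `connected` and the treatment of the empty graph as connected are assumptions of this sketch.

```python
def connected(nodes, edges):
    """Decide whether every node is reachable from the first node."""
    nodes = list(nodes)
    if not nodes:
        return True                        # assumption: the empty graph is connected
    marked = {nodes[0]}                    # stage 1: mark the first node
    changed = True
    while changed:                         # stage 2: repeat until no new marks
        changed = False
        for a, b in edges:                 # stage 3: mark neighbours of marked nodes
            if (a in marked) != (b in marked):
                marked.update((a, b))
                changed = True
    return marked == set(nodes)            # stage 4: are all nodes marked?


print(connected([1, 2, 3], [(1, 2), (2, 3)]))
print(connected([1, 2, 3], [(1, 2)]))
```

The loop structure is the same as in the PATH algorithm, except that edges are followed in both directions.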

- Show that ALL_DFA = {⟨A⟩ | A is a DFA and L(A) = Σ*} is in P.

Proof.

We construct a TM M that decides ALL_DFA in polynomial time.

M = “On input ⟨A⟩ where A is a DFA:
1. Construct DFA B that recognizes the complement of L(A) by swapping the accept and non-accept states in DFA A.
2. Run the decider S of E_DFA on input ⟨B⟩.
3. If S accepts, accept. If S rejects, reject.”

Can we determine if this decider M runs in polynomial time?

Proof.

We construct a TM M that decides ALL_DFA in polynomial time.

M = “On input ⟨A⟩ where A is a DFA:
1. Construct DFA B that recognizes the complement of L(A) by swapping the accept and non-accept states in DFA A.
2. Test whether a path exists from the start state to each accept state in B.
3. If no accept state of B is reachable from the start state, accept; otherwise, reject.”
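The reachability test can be sketched in Python. The DFA is assumed to be given as a set of states, an alphabet, a total transition table `delta` mapping (state, symbol) pairs to states, a start state, and a set of accept states; this encoding and the name `all_dfa` are illustrative assumptions.

```python
def all_dfa(states, alphabet, delta, start, accept):
    """Return True iff the DFA accepts every string over its alphabet."""
    # Step 1: B's accept states are A's non-accept states.
    b_accept = set(states) - set(accept)
    # Step 2: collect every state reachable from the start state.
    reachable, frontier = {start}, [start]
    while frontier:
        q = frontier.pop()
        for a in alphabet:
            r = delta[(q, a)]
            if r not in reachable:
                reachable.add(r)
                frontier.append(r)
    # Step 3: accept iff no accept state of B is reachable.
    return reachable.isdisjoint(b_accept)


# A DFA over {a, b} tracking the parity of a's; it accepts only even counts.
delta = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1}
print(all_dfa({0, 1}, "ab", delta, 0, {0}))
```

Each state is expanded at most once, so the search takes time polynomial in the size of the DFA's description.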

Show the following languages are in P.
1. Let MODEXP = {⟨a, b, c, p⟩ | a, b, c, and p are binary integers such that a^b ≡ c (mod p)}.
2. A permutation on the set {1, 2, ..., k} is a one-to-one, onto function on this set. When p is a permutation, p^t means the composition of p with itself t times. Let PERM-POWER = {⟨p, q, t⟩ | p = q^t where p and q are permutations on {1, 2, ..., k} and t is a binary integer}.

Hint: note that the most obvious algorithm does not run in polynomial time. Try it first where b (for question 1) and t (for question 2) are each a power of 2.