1989: the Computational Complexity of Some Rounding and Survey

The Computational Complexity of Some Rounding and Survey Overlap Problems Kirk Pruhs, Computer Science Department University of Pittsburgh, Pittsburgh, PA 15260 KEY WORDS: NP-complete, Zero-restricted Rounding, Unbiased Rounding, Controlled Rounding 1. Introduction tempts to design efficient algorithms for this prob- In this paper we examine the computational lem have been unsuccessful [Cox, HS, Wat]. In §3, complexity of two classes of problems. The first we explain why these attempts were unsuccessful class of problems involves rounding entries in multi- by showing that both the problem of determining way tables subject to certain restrictions. The sec- whether a 3-way table has a zero-restricted round- ond class of problems involves maximizing the over- ing, and the problem of determining whether a 3- lap between several surveys, with stratified design, way table has an unbiased rounding, are N P-hard. on a common population. In particular, we inves- We discuss, in §2, the implications of a problem tigate whether efficient algorithms exist for these being NP-hard. problems. For some of these problems we give ef- An instance of a survey overlap problem con- ficient algorithms. For the other problems, we use sists of several survey sampling problems, with the theory of NP-completeness to show that it is stratified design, on a common population. (We de- highly unlikely that efficient algorithms exist for fine stratification in §2.) It is possible that both the these problems. stratification and the selection probabilities of the The rounding problems that we consider are sampling units are different in each survey. In sur- the zero-restricted rounding problem and the unbi- vey overlap problems we make the assumption that ased rounding problem. Given a table with ratio- the cost of sampling is roughly proportional to the nal entries, the goal in the zero-restricted round- total number of units sampled in all of the surveys, ing problem is to replace the entries in the ta- i.e., it is cheaper to sample the same unit twice ble with adjacent integers in such a way that the than it is to sample two distinct units. Minimizing marginals are maintained (this is called a zero- the number of distinct units chosen in the different restricted rounding). Given a table with rational surveys would then minimize the cost. Hence, the entries, the goal in the unbiased rounding problem goal in the survey overlap problem is to solve each is randomly generate a zero-restricted rounding in survey in such a way as to minimize the maximum such a way that the expected value of the rounding number of units in the union of the samples. of each entry is equal to the value of that entry. In section 4, we give an O(n 2) time algorithm For applications of these rounding problems see for instances of the survey overlap problem that [CFGH, Cox, HRF, IK]. Cox [Cox], and Causey, consist of two singularly stratified survey sampling Cox, and Ernst [CCE] have given efficient algo- problems. This algorithm is optimal, in terms of rithms for generating unbiased roundings of 2-way minimizing the total number of elements sampled, tables. Their algorithms run in time O(n 3) on n both in the expected case and in the worst case. by n tables. An algorithm A runs in time O(f(n)), We then show that if we generalize this problem by for some function f, if for every sufficiently large allowing three singularly stratified surveys, or two input, A finishes after at most c. f(n) steps, where doubly stratified surveys, then the survey overlap c is some constant and n is the size of the input. problem becomes N P-hard. We also show that the An unbiased rounding of a 1-way table of size n can survey overlap problem is N P-hard for instances be generated in time O(n). consisting of an arbitrary number of unstratified Causey, Cox, and Ernst [CCE], Hess and surveys. Srikantan [HS], Waterton [Wat], and Cox [Cox] 2. Definitions posed the problem of how to generalize these re- We assume familiarity with standard defini- sults to 3-way tables. The first difficulty encoun- tions and concepts from graph theory; see for ex- tered when generalizing these results is that not all ample. An edge 3-coloring of a multigraph is an 3-way tables have zero-restricted roundings [CCE, assignment of one of three colors to each edge that HS]. Given a 3-way table T, one might still hope has the property that each pair of edges incident for an efficient algorithm that determines whether on a common vertex are assigned distinct colors. T has a zero-restricted rounding (or an unbiased A vertex 3-coloring of a multigraph G is an assign- rounding), and if so, generates such a rounding. At- ment of one of three colors to each vertex of G, 747 with the property that each pair of adjacent ver- nomial p(n). The class P is generally regarded as tices are assigned different colors. For notational the class of problems that can be solved by compu- convenience we will represent the three colors by tationally feasible algorithms [GJ]. The class NP the integers 1, 2, and 3. is defined as the the class of problems for which it An m-way array, A- A1 × As × ... Am, is the is possible to, in polynomial time, guess at a po- Cartesian product of m finite sets. Each A/is called tential solution and then verify the correctness of a stratum. An m-way table T is an assignment of a that potential solution. As an example of a prob- positive number, T(al, a2,..., am), to each element lem in NP, consider the edge 3-coloring problem. (al,as,...,am) of A. Each ai is called an index, Given a trivalent multigraph G, the edge 3-coloring and T(al,a2,...,am) is called an entry of T. problem is the problem of determining whether G We denote the floor and ceiling of a number has an edge 3-coloring. It easy to, in polynomial z by, [zJ and Ix], respectively. An integer k is time, guess a color for each edge, and then verify a rounding of rational number z with respect to a that each vertex has exactly one edge incident to rounding base B if ~r- [~J, or ~r- [~]" A table it of each color. R is a rounding of T with respect to a rounding There are many problems, such as the edge base B, if R and T have the same underlying array, 3-coloring problem, that are known to be in N P, and each entry of R is a rounding with respect to but are not known to be P. Garey and Johnson's B of the corresponding entry in T. From now on book [GJ] contains a 100 page list of such prob- we assume, without loss of generality, that B - 1, lems arising from such diverse areas as graph the- and that the range of T is [0, 1) [CCE]. A k-way ory, network design, algebra, number theory, math hyperplane, 0 < k < m, of A is obtained fixing programming, and logic. The P =?NP problem is m-k of the indices, and is denoted in the following the problem of determining whether all problems manner. A(al, ., ., a4, .) is the 3-way hyperplane in NP are solvable by polynomial-time algorithms. derived from A by considering only those elements The P =?NP problem is the most important open of A whose the first index is al and whose fourth problem in theoretical computer science, and has index is a4. We call a 1-way hyperplane a line, and been cited as one of the ten most famous open prob- a 2-way hyperplane a plane. For a table T and a lems in mathematics. hyperplane H, we define #T(H) to be ~aeH T(a). An important concept in the study of the A rounding R of T preserves a hyperplane H if P =?NP problem is NP-completeness (or NP- =g=R(g) is a rounding of #T(H). equivalence). A problem 7~ is N P-hard if a R is a zero-restricted rounding of T if R polynomial-time algorithm for 7~ implies that every preserves all hyperplanes. A probabilistic proce- problem in N P has a polynomial-time algorithm. dure generates a fair rounding R of a table T A problem is NP-equivalent if it is in NP and it is if the expected value of each R(al, as,..., am) is NP-hard. Most natural problems, that are known T(al, a2,..., am). A probabilistic procedure gen- to be in NP, but are not known to be in P, are erates an unbiased rounding R of a table T if the NP-equivalent. procedure fairly generates a zero-restricted round- One way to show that a problem 7~ is NP- ing T. hard is to exhibit a polynomial-time reduction from In the m-way stratified sampling problem the some known NP-hard problem C to 7~. A func- population is stratified by m variables that are tion r from instances of C to instances of T' is a hopefully correlated with the measure variable. To polynomial-lime reduclion if: reduce the variance of the measure variable a sam- 1. r(z) can be computed in time polynomial in ple whose distribution among the strata mirrors the the size of x. populations distribution as closely as possible. We 2.

Load more