Parameterized Algorithms for the Matrix Completion Problem
Robert Ganian 1 Iyad Kanj 2 Sebastian Ordyniak 3 Stefan Szeider 1
Abstract observing a sample from the set of entries of a low-rank matrix, and attempting to recover the missing entries with We consider two matrix completion problems, in the goal of optimizing a certain measure. In this paper, we which we are given a matrix with missing entries focus our study on matrix completion with respect to two and the task is to complete the matrix in a way measures (considered separately): (1) minimizing the rank that (1) minimizes the rank, or (2) minimizes the of the completed matrix, and (2) minimizing the number of number of distinct rows. We study the param- distinct rows of the completed matrix. eterized complexity of the two aforementioned problems with respect to several parameters of in- The first problem we consider—matrix completion terest, including the minimum number of matrix w.r.t. rank minimization—has been extensively studied, and rows, columns, and rows plus columns needed to is often referred to as the low-rank matrix completion prob- cover all missing entries. We obtain new algorith- lem (Candes` & Plan, 2010; Candes` & Recht, 2009; Candes` mic results showing that, for the bounded domain & Tao, 2010; Hardt et al., 2014; Fazel, 2002; Keshavan case, both problems are fixed-parameter tractable et al., 2010a;b; Recht, 2011; Saunderson et al., 2016). A cel- with respect to all aforementioned parameters. We ebrated application of this problem lies in the recommender complement these results with a lower-bound re- systems area, where it is known as the Netflix problem (net). sult for the unbounded domain case that rules out In this user-profiling application, an entry of the input ma- fixed-parameter tractability w.r.t. some of the pa- trix represents the rating of a movie by a user, where some rameters under consideration. entries could be missing. The goal is to predict the missing entries so that the rank of the complete matrix is minimized. The low-rank matrix completion problem is known to be 1. Introduction NP-hard, even when the matrix is over the field GF(2) (i.e., Problem Definition and Motivation. We consider the ma- each entry is 0 or 1), and the goal is to complete the ma- trix completion problem, in which we are given a matrix M trix into one of rank 3 (Peeters, 1996). A significant body (over some field that we also refer to as the domain of the of work on the low-rank matrix completion problem has matrix) with missing entries, and the goal is to complete the centered around proving that, under some feasibility as- entries of M so that to optimize a certain measure. There is sumptions, the matrix completion problem can be solved a wealth of research on this fundamental problem (Candes` efficiently with high probability (Candes` & Recht, 2009; & Plan, 2010; Candes` & Recht, 2009; Candes` & Tao, 2010; Recht, 2011). These feasibility assumptions are: (1) low Elhamifar & Vidal, 2013; Hardt et al., 2014; Fazel, 2002; rank; (2) incoherence; and (3) randomness (Hardt et al., Keshavan et al., 2010a;b; Recht, 2011; Saunderson et al., 2014). Hardt et al. (2014) argue that feasibility assumption 2016) due to its ubiquitous applications in recommender (3), which states that the subset of determined entries in the systems, machine learning, sensing, computer vision, data matrix is selected uniformly at random and has a large (sam- science, and predictive analytics, among others. In these pling) density, is very demanding. In particular, they justify areas, the matrix completion problem naturally arises after that in many applications, such as the Netflix problem, it is not possible to arbitrarily choose which matrix entries are *Equal contribution 1Algorithms and Complexity Group, determined and which are not, as those may be dictated by TU Wien, Vienna, Austria 2School of Computing, DePaul outside factors. The low-rank matrix completion problem 3 University, Chicago, USA Algorithms Group, University of also has other applications in the area of wireless sensor Sheffield, Sheffield, UK. Correspondence to: Robert Ganian
We chart our results in Table 1. Interestingly, in the un- We formally define the problems under consideration below. bounded domain case, both considered problems exhibit BOUNDED RANK MATRIX COMPLETION (p-RMC) wildly different behaviors: While RMC admits XP algo- Input: An incomplete matrix M over GF(p) for a rithms regardless of whether we parameterize by row or fixed prime number p, and an integer t. col, using these two parameterizations for DRMC results Task: Find a matrix M0 consistent with M such that in the problem being FPT and paraNP-hard, respectively. rk(M0) ≤ t. On the other hand, in the (more studied) bounded domain BOUNDED DISTINCT ROW MATRIX COMPLETION (p-DRMC) case, we show that both problems are in FPT (resp. FPTR) w.r.t. all parameters under consideration. Finally, we prove Input: An incomplete matrix M over GF(p) for a fixed prime number p, and an integer t. that 2-DRMC remains NP-hard even if every column and 0 row contains (1) a bounded number of missing entries, or (2) Task: Find a matrix M consistent with M such that dr(M0) ≤ t. a bounded number of determined entries. This effectively rules out FPT algorithms w.r.t. the parameters: maximum Aside from the problem variants where p is a fixed prime number of missing/determined entries per row or column. number, we also study the case where matrix entries range over a domain that is provided as part of the input. In particu- 2. Preliminaries lar, the problems RMC and DRMC are defined analogously to p-RMC and p-DRMC, respectively, with the sole distinc- For a prime number p, let GF(p) be a field of order p; recall tion that the prime number p is provided as part of the input. that each such field can be equivalently represented as the We note that 2-RMC is NP-hard even for t = 3 (Peeters, set of integers modulo p. For positive integers i and j > i, 1996), and the same holds for 2-DRMC (see Theorem 14). we write [i] for the set {1, 2, . . . , i}, and i : j for the set Without loss of generality, we assume that the rows of the {i, i + 1, . . . , j}. input matrix are pairwise distinct. For an m × n matrix M (i.e., a matrix with m rows and 2.2. Treewidth n columns), and for i ∈ [m] and j ∈ [n], M[i, j] de- notes the element in the i-th row and j-th column of M. Treewidth (Robertson & Seymour, 1986) is one of the most Similarly, for a vector d, we write d[i] for the i-th co- prominent decompositional parameters for graphs and has ordinate of d. We write M[∗, j] for the column-vector found numerous applications in computer science. A tree- (M[1, j], M[2, j],..., M[m, j]), and M[i, ∗] for the row- decomposition T of a graph G = (V,E) is a pair (T, χ), Parameterized Algorithms for the Matrix Completion Problem where T is a tree and χ is a function that assigns each value for comb can also be computed in polynomial time by tree node t a set χ(t) ⊆ V of vertices such that the fol- reducing the problem to finding a vertex cover in a bipartite lowing conditions hold: (TD1) for every edge uv ∈ E(G) graph: there is a tree node t such that u, v ∈ χ(t); and (TD2) Proposition 1. Given an incomplete matrix M over GF(p), for every vertex v ∈ V (G), the set of tree nodes t with we can compute the parameter value for comb, along with v ∈ χ(t) forms a non-empty subtree of T . The width of sets R and C of total cardinality comb containing the in- a tree-decomposition (T, χ) is the size of a largest bag mi- dices of covering rows and columns, respectively, in time nus 1. A tree-decomposition of minimum width is called O((n · m)1.5). optimal. The treewidth of a graph G, denoted by tw(G), is the width of an optimal tree decomposition of G. We will assume that the tree T of a tree-decomposition is rooted and 3. Rank Minimization we will denote by Tt the subtree of T rooted at t and write In this section we present our results for BOUNDED RANK S 0 χ(T ) for the set 0 χ(t ). t t ∈V (Tt) MATRIX COMPLETION under various parameterizations.
2.3. Problem Parameterizations 3.1. Bounded Domain: Parameterization by row One advantage of the parameterized complexity paradigm As our first result, we present an algorithm for solving is that it allows us to study the complexity of a problem p-RMC[row]. This will serve as a gentle introduction w.r.t. several parameterizations of interest/relevance. To to the techniques used in the more complex result for p- provide a concise description of the parameters under con- RMC[comb], and will also be used to give an XP algorithm sideration, we introduce the following terminology: We say for RMC[row]. that a • entry at position [i, j] in an incomplete matrix M is Theorem 2. p-RMC[row] is in FPT. covered by row i and by column j. In this paper, we study RMC and DRMC w.r.t. the following parameterizations Proof Sketch. Let R be the (minimum) set of rows that (see Figure 1 for illustration): cover all occurrences of • in the input matrix M. Since ◦ col: The minimum number of columns in the matrix the existence of a solution does not change if we permute M covering all occurrences of • in M. the rows of M, we permute the rows of M so that the rows ◦ row: The minimum number of rows in the matrix M in R have indices 1, . . . , k. We now proceed in three steps. covering all occurrences of • in M. For the first step, we will define the notion of signature: ◦ comb: The minimum value of r + c such that there A signature S is a tuple (I,D), where I ⊆ R and D is a exist r rows and c columns in M with the property that mapping from R \ I to (I → GF(p)). Intuitively, a sig- each occurrence of • is covered one of these rows or nature S specifies a subset I of R which is expected to be columns. independent in M[k + 1 : m, ∗] ∪ I (i.e., adding the rows in I to M[k + 1 : m, ∗] is expected to increase the rank of 1 1 1 0 • 1 M[k + 1 : m, ∗] by |I|); and for each remaining row of R, 0 0 1 0 • 1 S specifies how that row should depend on I. The latter is 0 •• 0 •• carried out using D: For each row in R \ I, D provides a set 1 1 0 1 0 1 of coefficients expressing the dependency of that row on the rows in I. Formally, we say that a matrix M0 that is com- Figure 1. Illustration of the parameters col, row, and comb in an incomplete matrix. Here col = 4, row = 3, and comb = 2. patible with the incomplete matrix M matches a signature (I,D) if and only if, for each row (i.e., vector) d ∈ R \ I, For instance, the aforementiond problem TRIANGULA- d d there exist coefficients ak+1, . . . , am ∈ GF(p) such that TION FROM INCOMPLETE DATA, where a small number d d P d = ak+1M[k+1, ∗]+···+amM[m, ∗]+ i∈I D(d)(i)·i. of distance-sensors are faulty, would result in matrix com- The first step of the algorithm branches through all possible pletion instances where col and row are both small. signatures S. Clearly, the number of distinct signatures is 2 We denote the parameter under consideration in brackets af- upper-bounded by 2k · pk . ter the problem name (e.g., DRMC[comb]). As mentioned For the second step, we fix an enumerated signature S. The in Section 1, both p-RMC and p-DRMC are trivially in algorithm will verify whether S is valid, i.e., whether there FPT when parameterized by the number of missing entries, exists a matrix M0 compatible with M that matches S. To and hence this parameterization is not discussed further. do so, the algorithm will construct a system of |R \ I| equa- Given an incomplete matrix M, computing the parameter tions over vectors of size n, and then transform this into a values for col and row is trivial. Furthermore, the parameter system ΥS of |R\I|·n equations over GF(p) (one equation values satisfy comb ≤ row and comb ≤ col. The parameter for each vector coordinate). For each d ∈ R\I, ΥS contains Parameterized Algorithms for the Matrix Completion Problem
d d 0 one variable for each coefficient ak+1, . . . , am and one vari- 3. replacing x with Γi,x in every equation in Υ . able for each occurrence of • in the rows of R. For instance, the first equation in ΥS has the following form: d[1] = Observe that Υ0 has size O(n · `), and can also be computed d d P ak+1M[k +1, 1]+···+amM[m, 1]+ i∈I D(d)(i)·i[1], in time O(n·`), where n is the number of variables occurring d d 0 where ak+1, . . . , am are variables, and d[1] as well as each in Υ. Furthermore, any solution to Υ can be transformed i[1] in the sum could be a variable or a fixed number. Cru- into a solution to Υ in linear time, and similarly any solution 0 cially, ΥS is a system of at most (k·n) linear equations over to Υ can be transformed into a solution to Υ in linear time GF(p) with at most m + kn variables, and can be solved in (i.e., Υ0 and Υ are equivalent). Moreover, Υ0 contains at time O((m + kn)3) by Gaussian elimination. Constructing least one fewer variable and one fewer equation than Υ. the equations takes time O(m · n). The following proposition is crucial for our proof, and is of During the second step, the algorithm determines whether independent interest. a signature S is valid or not, and in the end, after going Proposition 5. Let Υ be a system of ` quadratic equations through all signatures, selects an arbitrary valid signature over GF(p). Then computing a solution for Υ is in FPT S = (I,D) with minimum |I|. For the final third step, the R parameterized by ` and p, and in XP parameterized only algorithm checks whether |I|+rk(M[k+1 : m, ∗]) ≤ t. We R by `. note that computing rk(M[k + 1 : m, ∗]) can be carried out in time O(nm1.4) (Ibarra et al., 1982). If the above inequal- ity does not hold, the algorithm rejects; otherwise it recom- Proof. Let n be the number of variables in Υ. We distin- 0 guish two cases. If n ≥ `(` + 3)/2, then Υ can be solved putes a solution to ΥS and outputs the matrix M obtained ` 3 2 from M by replacing each occurrence of • at position [i, j] in randomized time O(2 n `(log p) ) (Miura et al., 2014). Otherwise, n < `(` + 3)/2, and we can solve Υ by a by the value of the variable i[j] in the solution to ΥS. The k k2 3 1.4 brute-force algorithm which enumerates (all of the) at most total running time is O((2 ·p )·((m+kn) +nm )) = n `(`+3)/2 2 p < p assignments of values to the variables in Υ. O(2kpk · (m + kn)3). The proposition now follows by observing that the given algorithm runs in time O(2`n3`(log p)2 +p`(`+3)/2`2). Since the the transpose of M has the same rank as M, it p col follows immediately that -RMC[ ] is in FPT. Theorem 6. p-RMC[comb] is in FPTR. Corollary 3. p-RMC[col] is in FPT. Proof Sketch. We begin by using Proposition 1 to compute As a consequence of the running time of the algorithm given the sets R and C containing the indices of the covering in the proof of Theorem 2, we obtain: rows and columns, respectively; let |R| = r and |C| = c, Corollary 4. RMC[row] and RMC[col] are in XP. and recall that the parameter value is k = r + c. Since the existence of a solution for p-RMC does not change 3.2. Bounded Domain: Parameterization by comb if we permute rows and columns of M, we permute the rows of M so that the rows in R have indices 1, . . . , r, and In this subsection, we present a randomized fixed-parameter subsequently, we permute the columns of M so that the algorithm for p-RMC[comb] with constant one-sided error columns in C have indices 1, . . . , c. probability. Before we proceed to the algorithm, we need to introduce some basic terminology related to systems of Before we proceed, let us give a high-level overview of equations. Let Υ be a system of ` equations EQ1, EQ2,. . . , our strategy. The core idea is to branch over signatures, EQ` over GF(p); we assume that the equations are simpli- which will be defined in a similar way to those in Theo- fied as much as possible. In particular, we assume that no rem 2. These signatures will capture information about equation contains two terms over the same set of variables the dependencies among the rows in R and columns in C; such that the degree/exponent of each variable in both terms one crucial difference is that for columns, we will focus is the same. Let EQi be a linear equation in Υ, and let x be only on dependencies in the submatrix M[r + 1 : m, ∗]. In a variable which occurs in EQi (with a non-zero coefficient). each branch, we arrive at a system of equations that needs Naturally, EQi can be transformed into an equivalent equa- to be solved in order to determine whether the signatures tion EQi,x, where x is isolated, and we use Γi,x to denote are valid. Unlike Theorem 2, here the obtained system of the side of EQi,x not containing x, i.e., EQi,x is of the form equations will contain non-linear (but quadratic) terms, and 0 x = Γi,x. We say that Υ is obtained from Υ by substitution hence solving the system is far from being trivial. Once we 0 of x in EQi if Υ is the system of equations obtained by: determine which signatures are valid, we choose one that minimizes the total rank. 1. computing EQ and in particular Γ from EQ ; i,x i,x i For the first step, let us define the notion of signature 0 2. setting Υ := Υ \{EQi}; and that will be used in this proof. A signature S is a tuple Parameterized Algorithms for the Matrix Completion Problem
(IR,DR,IC ,DC ) where: 1. IR ⊆ R; 2. DR is a mapping lects a valid signature S = (I,D) that has the minimum from R \ IR to (IR → GF(p)); 3. IC ⊆ C; and 4. DC is a value of |IR|+|IC |, checks whether |IR|+|IC |+ rk(M[r + mapping from C \ IC to (IC → GF(p)). 1 : m, c : 1 + n]) ≤ t, and either uses this to construct a so- lution (similarly to the proof of Theorem 2), or outputs “no”. We say that a matrix M0 compatible with the incomplete The theorem now follows by observing that the total running matrix M matches a signature (I ,D ,I ,D ) if: R R C C time of the algorithm is obtained by combining the branch- 2 ing factor of branching over all signatures (O(2k ·pk )) with ◦ for each row d ∈ R \ IR, there exist coefficients 2 ad , . . . , ad ∈ GF(p) such that d = ad M0[r + the run-time of Proposition 5 for k many quadratic equa- r+1 m r+1 k2 3 2 k4 d 0 P tions (O(3 n (log p) + p )). In particular, we obtain a 1, ∗] + ··· + amM [m, ∗] + i∈I DR(d)(i) · i; and 2 4 R running time of O(3k · pk · n3). ◦ for each column h ∈ C \ Ic, there exist coefficients h h bc+1, . . . , bn ∈ GF(p) such that h[r + 1 : m] = As a consequence of the running time of the algorithm given h 0 h 0 bc+1M [r + 1 : m, c] + ··· + bnM [r + 1 : m, n] + in the proof of Theorem 6, we obtain: P D (h)(i) · i[r + 1 : m]. i∈IC C Corollary 7. RMC[comb] is in XPR. The number of distinct signatures is upper-bounded by 2 2 2 2r · pr · 2c · pc ≤ 2k · pk , and the first step of the al- 4. BOUNDED DISTINCT ROW MATRIX gorithm branches over all possible signatures S. In the COMPLETION second step, for each enumerated signature S, we check whether S is valid (i.e., whether there exists a matrix M0, Let (p, M, t) be an instance of DRMC. We say that two compatible with the incomplete M, that matches S) in a rows of M are compatible if whenever the two rows differ similar fashion as in the proof of Theorem 2. Here, this at some entry then one of the rows has a • at that entry. The compatibility graph M G(M) results in a system ΥS of |R \ IR| · n + |C \ IC | · (m − r) of , denoted by , is the undi- equations which check the dependencies for rows in R \ IR rected graph whose vertices correspond to the row indices M and columns in C \ IC . For instance, the first equation in of and in which there is an edge between two vertices ΥS for some d ∈ R \ IR has the following form: d[1] = if and only if their two corresponding rows are compatible. ad M[r+1, 1]+···+ad M[m, 1]+P D (d)(i)·i[1], See Figure 2 for an illustration. r+1 m i∈IR R d d where ar+1, . . . , am are variables, DR(d)(i) is a number, and all other occurrences are either variables or numbers. 1 • 0 •• 1 5 Similarly, the second equation in ΥS for some h ∈ C \ IC 1 0 0 1 •• 2 h 1 0 • 1 0 1 1 3 has the following form: h[r + 2] = bc+1M[r + 2, c + 1] + h P 1 0 1 1 0 • 4 ··· + b M[r + 2, n] + DC (d)(i) · i[r + 2], where n i∈IC 1 0 1 1 0 0 h h bc+1, . . . , bn are variables, DC (d)(i) is a number, and all other occurrences are either variables or numbers. Figure 2. Illustration of a matrix and its compatibility graph. The Next, observe that the only equations in ΥS that may con- vertex label indicates the corresponding row number. tain non-linear terms are those for d[j], where j ≤ c, and in 2 We start by showing that DRMC (and therefore p-DRMC) particular ΥS contains at most k equations with non-linear can be reduced to the CLIQUE COVER problem, which is terms (k equations for at most k vectors d in R\IR). We will defined as follows. now use substitutions to simplify ΥS by removing all linear equations; specifically, at each step we select an arbitrary lin- CLIQUE COVER (CC) ear equation EQi containing a variable x, apply substitution Input: An undirected graph G and an integer k. of x in EQi to construct a new system of equations with one fewer equation, and simplify all equations in the new system. Task: Find a partition of V (G) into at most k If at any point we reach a system of equations that contains cliques, or output that no such partition an invalid equation (e.g., 2=5), then ΥS does not have a so- exists. lution, and we discard the corresponding branch. Otherwise, Lemma 8. An instance I = (p, M, t) of DRMC has a after at most |R\IR|·n+|C \IC |·(m−r) ∈ O(kn+km) solution if and only if the instance I0 = (G(M), t) of k2 substitutions, we obtain a system of at most quadratic CC does. Moreover, a solution for I0 can obtained in equations ΨS such that any solution to ΨS can be trans- polynomial-time from a solution for I and vice versa. formed into a solution to ΥS in time O(kn + km). We can now apply Proposition 5 to solve Ψ and mark S as a valid S Proof Sketch. The lemma follows immediately from the signature if Ψ has a solution. S observation that a set R of rows in M can be made identical After all signatures have been processed, the algorithm se- if and only if G(M)[R] is a clique. Parameterized Algorithms for the Matrix Completion Problem
Theorem 9. CC is in FPT when parameterized by the For the remainder of this section, we will introduce the treewidth of the input graph. PARTITIONING INTO TRIANGLES (PIT) problem: Given a graph G, decide whether there is a partition P of V (G) into Proof Sketch. We show the theorem via a standard dynamic triangles. We will often use the following easy observation. programming algorithm on a tree-decomposition of the in- Observation 1. A graph G that does not contain a clique put graph (Bodlaender & Koster, 2008). Namely after com- with four vertices has a partition into triangles if and only puting an optimal tree-decomposition (T, χ) of the input if it has a partition into at most |V (G)|/3 cliques. graph G, which can be achieved in FPT-time w.r.t. the treewidth of G (Kloks, 1994; Bodlaender, 1996; Bodlaen- Theorem 13. DRMC[col] is paraNP-hard. der et al., 2016), we compute a set R(t) of tuples via a bottom-up dynamic programming algorithm for every Proof Sketch. We will reduce from the NP-complete 3-SAT- t ∈ V (T ). In our case (P, c) ∈ R(t) if and only if P is 2 problem (Berman et al., 2003): Given a propositional a partition of G[χ(t)] into cliques and c is the minimum formula φ in conjunctive normal form such that (1) every 0 number such that G[χ(Tt)] has a partition P into c cliques clause of φ has exactly three distinct literals and (2) every with P = { P 0 ∩ χ(t) | P 0 ∈ P0 } \ {∅}. literal occurs in exactly two clauses, decide whether φ is satisfiable. To make our reduction easier to follow, we Note that the above theorem also implies that the well- will divide the reduction into two steps. Given an instance known COLORING problem is FPT parameterized by the (formula) φ of 3-SAT-2, we will first construct an equiv- treewidth of the complement of the input graph. The theo- alent instance G of PIT with the additional property that rem below follows immediately from Lemmas 8 and 9. G does not contain a clique on four vertices. We note that similar reductions from variants of the satisfiability problem Theorem 10. DRMC and p-DRMC are in FPT when to PIT are known (and hence our first step does not show parameterized by the treewidth of the compatibility graph. anything new for PIT); however, our reduction is specifi- cally designed to simplify the second step, in which we will 4.1. p-DRMC construct an instance (M, |V (G)|/3) of DRMC such that Theorem 11. p-DRMC[comb] is in FPT. G(M) is isomorphic to G and M has only seven columns. By Observation 1 and Lemma 8, this proves the theorem Proof. Let (M, t) be an instance of p-DRMC, and let k be since (M, |V (G)|/3) has a solution if and only if φ does. the parameter comb. By Proposition 1, we can compute a set Let φ be an instance of 3-SAT-2 with variables x1, . . . , xn R• of rows and a set C• of columns, where |R• ∪ C•| ≤ k, and clauses C1,...,Cm. We first construct the instance G and such that every occurrence of • in M is either contained of PIT such that G does not contain a clique of size four. in a row or column in R• ∪ C•. Let R and C be the set of For every variable xi of φ, let G(xi) be the graph illustrated rows and columns of M, respectively. Let P be the unique in Figure 3, and for every clause Cj of φ, let G(Cj) be 0 partition of R \ R• such that two rows r and r belong to the graph illustrated in Figure 4. Let f :[m] × [3] → the same set in P if and only if they are identical on all { xo, x¯o | 1 ≤ i ≤ n ∧ 1 ≤ o ≤ 2 } be any bijective k i i columns in C \ C•. Then |P | ≤ (p + 1) , for every P ∈ P, function such that for every j and r with 1 ≤ j ≤ m and since two rows in P can differ on at most |C | ≤ k entries, o • 1 ≤ r ≤ 3, it holds that: If f(j, r) = xi (for some i and o), each having (p + 1) values to be chosen from. Moreover, o then xi is the r-th literal of Cj; and if f(j, r) =x ¯i , then x¯i any two rows in R \ R• that are not contained in the same is the r-th literal of Cj. set in P are not compatible, which implies that they appear The graph G is obtained from the disjoint union of the in different components of G(M) \ R• and hence the set graphs G(x1),...,G(xn),G(C1),...,G(Cm) after apply- of vertices in every component of G(M) \ R• is a subset of P , for some P ∈ P. It is now straightforward to show ing the following modifications: (1) For every j and r that tw(G(M)) ≤ k + (p + 1)k, and hence, tw(G(M)) is with 1 ≤ j ≤ m and 1 ≤ r ≤ 3 add edges forming a 1 2 bounded by a function of the parameter k. The theorem triangle on the vertices lj,r, lj,r, f(j, r); and (2) for ev- 1 2 now follows from Theorem 10. ery i with 1 ≤ i ≤ 2n − m, add the vertices gi , gi 1 2 and an edge between gi and gi . Finally we add edges 4.2. DRMC forming a complete bipartite graph between all vertices in o { gi | 1 ≤ i ≤ 2n − m ∧ 1 ≤ o ≤ 2 } and all vertices in o The proof of the following theorem is very similar to the { hi | 1 ≤ i ≤ n ∧ 1 ≤ o ≤ 2 }. proof of Theorem 11, i.e., we mainly use the observation G that the parameter row is also a bound on the treewidth of This completes the construction of . The following claim the compatibility graph and then apply Theorem 10. concludes the first step of our reduction. Theorem 12. DRMC[row] is in FPT. Claim 1. φ is satisfiable if and only if G has a partition Parameterized Algorithms for the Matrix Completion Problem
1 OLORINGδ=n−4 xi (i, 1, •, •, 0, 0, 0) by 3; denote this problem as (n/3)-C . This 2 problem is NP-hard via a reduction from the PARTITION xi (i, 1, •, 0, •, 0, 0) INTO TRIANGLES problem on K4-free cubic graphs. The xi(i, •, 0, 0, 0, 0, 0) NP-hardness of the latter problem follows from the NP- 2 x¯i (i, 0, •, 0, •, 0, 0) hardness of the PARTITION INTO TRIANGLES problem on x¯1(i, 0, •, •, 0, 0, 0) planar cubic graphs (Cerioli et al., 2008), since a K4 in a i cubic graph must be isolated, and hence can be removed from the start. Finally, using Observation 1, PARTITION Figure 3. An illustration of the gadget G(xi) introduced in the INTO TRIANGLES on K4-free cubic graphs is polynomial- reduction of Theorem 13. The label of each vertex v indicates the time reducible to (n/3)-COLORINGδ=n−4, via the simple row vector R(v). reduction that complements the edges of the graph.
1 n/3 OLORINGδ=n−4 p lj,1(4, 1, j, 1, •, •, 0) Now we reduce from ( )-C to -DRMC by mimicking a standard reduction from 3-coloring to rank l2 (4, 1, j, 1, •, •, 0) j,1 minimization (Peeters, 1996). Given an instance G of (n/3)- 1 1 δ=n−4 hj (•, •, j, 1, 1, 1, •) lj,2(5, 1, j, •, 1, •, 0) COLORING , we construct an n × n matrix M whose h2(•, •, j, 1, 1, 2, •) l2 (5, 1, j, •, 1, •, 0) rows and columns correspond to the vertices in G, as follows. j j,2 The diagonal entries of M are all ones. For an entry at row 1 lj,3(6, 0, j, 1, •, •, 0) i and column j, where i 6= j, M[i, j] = 0 if ij ∈ E(G), 2 and is • otherwise. Finally, we set r = n/3. Observe that lj,3(6, 0, j, 1, •, •, 0) since each vertex in G has n − 4 neighbors, the number of missing entries in any row and any column of M is 3. Figure 4. An illustration of the gadget G(Cj ) introduced in the reduction of Theorem 13. The label of each vertex v indicates the It is not difficult to show that G is a yes-instance of (n/3)- 1 2 COLORINGδ=n−4 if and only if (M, n/3) is a yes-instance row vector R(v); here we assume that f(j, 1) = x4, f(j, 2) = x5, 1 and f(j, 3) =x ¯6. of 2-DRMC. into triangles. Moreover, G does not contain a clique of size The second statement in the theorem follows via a reduction four. from 3-COLORING on graphs of maximum degree at most 4 (Garey et al., 1976), using similar arguments. We will now proceed to the second (and final) step of our reduction, i.e., we will construct an instance (M, |V (G)|/3) 5. Conclusion of DRMC such that: G(M) is isomorphic to G and M has only seven columns. We studied the parameterized complexity of two fundamen- tal matrix completion problems under several parameteri- M contains one row R(u) for every u ∈ V (G). The defini- zations. For the bounded domain case, we painted a posi- tion of R(u) for every vertex u that is part of some gadget tive picture by showing that the two problems are in FPT G(xi) or G(Cj) is illustrated in Figures 3 and 4. Addition- (resp. FPTR) w.r.t. all considered parameters. For the un- o ally, we set R(gj ) = (•, •, •, •, •, •, j) for every j and o bounded domain case, we characterized the parameterized with 1 ≤ j ≤ 2n − m and 1 ≤ o ≤ 2. Using an exhaustive complexity of DRMC by showing that it is in FPT parame- case analysis, one can show that G(M) is indeed isomorphic terized by row, and paraNP-hard parameterized by col (and to G, which concludes the proof of the theorem. hence by comb). For RMC, we could show its membership in XP (resp. XPR) w.r.t. all considered parameters. Three We conclude this section with a hardness result showing that immediate open questions ensue: 2-DRMC remains NP-hard when the number of missing or known entries in each column/row is bounded. ◦ Is it possible to obtain a deterministic algorithm for p-RMC and RMC parameterized by comb? Theorem 14. The restriction of 2-DRMC to instances in which each row and each column contains exactly three ◦ Can we improve our XP (resp. XPR) results for RMC missing entries is NP-hard. The same holds for the restric- to FPT (resp. FPTR) or show that the problems are tion of 2-DRMC to instances in which each row and each W[1]-hard? column contains at most 4 determined entries. ◦ Does a hardness result, similar to the one given in Theorem 14 for p-DRMC, hold for p-RMC? Proof Sketch. Consider the problem of (properly) coloring a graph on n vertices, having minimum degree n − 4 and no Acknowledments. Robert Ganian is also affiliated with independent set of size 4, by n/3 colors, where n is divisible FI MU, Czech Republic. Parameterized Algorithms for the Matrix Completion Problem
References Cygan, M., Fomin, F. V., Kowalik, L., Lokshtanov, D., Marx, D., Pilipczuk, M., Pilipczuk, M., and Saurabh, S. Matrix completion and the Netflix challenge. Parameterized Algorithms. Springer, 2015. See https://en.wikipedia.org/wiki/ Matrix completion. Downey, R. G. and Fellows, M. R. Fundamentals of Parame- terized Complexity. Texts in Computer Science. Springer Backstr¨ om,¨ C., Jonsson, P., Ordyniak, S., and Szeider, S. A Verlag, 2013. ISBN 978-1-4471-5558-4, 978-1-4471- complete parameterized complexity analysis of bounded 5559-1. planning. J. of Computer and System Sciences, 81(7): 1311–1332, 2015. Elhamifar, E. and Vidal, R. Sparse subspace clustering: Berman, P., Karpinski, M., and Scott, A. D. Approximation Algorithm, theory, and applications. IEEE Trans. Pattern hardness of short symmetric instances of MAX-3SAT. Anal. Mach. Intell., 35(11):2765–2781, 2013. Electronic Colloquium on Computational Complexity Endriss, U., de Haan, R., and Szeider, S. Parameterized (ECCC), (049), 2003. complexity results for agenda safety in judgment aggrega- Bessiere, C., Hebrard, E., Hnich, B., Kiziltan, Z., Quim- tion. In Weiss, G., Yolum, P., Bordini, R. H., and Elkind, per, C., and Walsh, T. The parameterized complexity of E. (eds.), Proceedings of AAMAS 2015, the 14th Interna- global constraints. In Fox, D. and Gomes, C. P. (eds.), tional Conference on Autonomous Agents and Multiagent Proceedings of the Twenty-Third AAAI Conference on Systems, pp. 127–136. IFAAMAS/ACM, 2015. Artificial Intelligence, AAAI 2008, Chicago, Illinois, USA, Fazel, M. Matrix rank minimization with applications. PhD July 13-17, 2008 , pp. 235–240. AAAI Press, 2008. thesis, Stanford University, 2002. Bodlaender, H. L. A linear-time algorithm for finding tree- Frieze, A. M., Kannan, R., and Vempala, S. Fast monte- decompositions of small treewidth. SIAM J. Comput., 25 carlo algorithms for finding low-rank approximations. J. (6):1305–1317, 1996. ACM, 51(6):1025–1041, 2004. Bodlaender, H. L. and Koster, A. M. C. A. Combinato- Ganian, R. and Ordyniak, S. The complexity landscape rial optimization on graphs of bounded treewidth. The of decompositional parameters for ILP. Artificial Intelli- Computer Journal, 51(3):255–269, 2008. gence, 257:61 – 71, 2018. Bodlaender, H. L., Drange, P. G., Dregi, M. S., Fomin, F. V., Lokshtanov, D., and Pilipczuk, M. A O(ckn) 5- Garey, M. R., Johnson, D. S., and Stockmeyer, L. J. Some approximation algorithm for treewidth. SIAM J. Comput., simplified NP-complete graph problems. Theoretical 45(2):317–378, 2016. Computer Science, 1(3):237–267, 1976. Candes,` E. J. and Plan, Y. Matrix completion with noise. Gaspers, S. and Szeider, S. Guarantees and limits Proceedings of the IEEE, 98(6):925–936, 2010. of preprocessing in constraint satisfaction and reason- ing. Artificial Intelligence, 216:1–19, 2014. doi: Candes,` E. J. and Recht, B. Exact matrix completion via con- 10.1016/j.artint.2014.06.006. vex optimization. Foundations of Computational Mathe- matics, 9(6):717–772, 2009. Gottlob, G. and Szeider, S. Fixed-parameter algorithms for artificial intelligence, constraint satisfaction, and database Candes,` E. J. and Tao, T. The power of convex relaxation: problems. The Computer Journal, 51(3):303–325, 2006. near-optimal matrix completion. IEEE Trans. Information Survey paper. Theory, 56(5):2053–2080, 2010. Hardt, M., Meka, R., Raghavendra, P., and Weitz, B. Com- Cerioli, M. R., Faria, L., Ferreira, T. O., Martinhon, C. A. J., putational limits for matrix completion. In Proceedings of Protti, F., and Reed, B. A. Partition into cliques for cubic The 27th Conference on Learning Theory, volume 35 of graphs: Planar case, complexity and approximation. Dis- JMLR Workshop and Conference Proceedings, pp. 703– crete Applied Mathematics, 156(12):2270–2278, 2008. 725. JMLR.org, 2014.
Courtois, N., Goubin, L., Meier, W., and Tacier, J. Solving Ibarra, O. H., Moran, S., and Hui, R. A generalization of underdefined systems of multivariate quadratic equations. the fast LUP matrix decomposition algorithm and appli- In Public Key Cryptography, 5th International Workshop cations. J. Algorithms, 3(1):45–56, 1982. on Practice and Theory in Public Key Cryptosystems, PKC 2002, Paris, France, February 12-14, 2002, Pro- Keshavan, R. H., Montanari, A., and Oh, S. Matrix comple- ceedings, volume 2274 of Lecture Notes in Computer tion from a few entries. IEEE Trans. Information Theory, Science, pp. 211–227. Springer, 2002. 56(6):2980–2998, 2010a. Parameterized Algorithms for the Matrix Completion Problem
Keshavan, R. H., Montanari, A., and Oh, S. Matrix com- pletion from noisy entries. Journal of Machine Learning Research, 11:2057–2078, 2010b. Kloks, T. Treewidth: Computations and Approximations. Springer Verlag, Berlin, 1994. Miura, H., Hashimoto, Y., and Takagi, T. Extended al- gorithm for solving underdefined multivariate quadratic equations. IEICE Transactions, 97-A(6):1418–1425, 2014. Peeters, R. Orthogonal representations over finite fields and the chromatic number of graphs. Combinatorica, 16(3): 417–431, 1996. Recht, B. A simpler approach to matrix completion. Journal of Machine Learning Research, 12:3413–3430, 2011. Robertson, N. and Seymour, P. D. Graph minors. II. Algo- rithmic aspects of tree-width. J. Algorithms, 7(3):309– 322, 1986. Saunderson, J., Fazel, M., and Hassibi, B. Simple algo- rithms and guarantees for low rank matrix completion over F2. In Proceedings of The IEEE International Sym- posium on Information Theory, pp. 86–90. IEEE, 2016. van Bevern, R., Komusiewicz, C., Niedermeier, R., Sorge, M., and Walsh, T. H-index manipulation by merging articles: Models, theory, and experiments. Artif. Intell., 240:19–35, 2016.