Biconvex Relaxation for Semidefinite Programming in Computer Vision

Sohil Shah¹, Abhay Yadav¹, Tom Goldstein¹, David Jacobs¹, Carlos Castillo¹, and Christoph Studer²
¹University of Maryland, College Park   ²Cornell University, Ithaca, NY

Abstract

Semidefinite programming has become an indispensable tool in computer vision. Unfortunately, general-purpose solvers for semidefinite programs are too slow to be used on large-scale problems and exhibit excessive memory requirements. We propose a general framework to approximately solve large-scale semidefinite problems (SDPs) at low computational complexity. Our new approach, referred to as biconvex relaxation (BCR), transforms a general SDP into a specific biconvex optimization problem, which can then be solved in the original, low-dimensional variable space at low complexity. The resulting biconvex problem is approximately solved using an efficient alternating minimization (AM) procedure. Since AM has the potential to get stuck in local minima, we propose a general initialization scheme that enables our BCR framework to start close to the global optimum of the original SDP—this is key for our algorithm to quickly converge to optimal or near-optimal solutions. We showcase the efficacy of our approach on two applications in computer vision, namely segmentation and co-segmentation. BCR achieves a comparable solution quality to state-of-the-art SDP methods with speedups ranging between 4× and 35×. At the same time, BCR handles a more general set of SDPs than the fastest previous approaches, which are more specialized.

1. Introduction

Optimization problems involving either integer-valued vectors or matrices with low-rank structure are ubiquitous in computer vision. Graph-cut methods for image segmentation, for example, involve optimization problems where integer-valued variables represent region labels [26, 13, 30, 9]. Problems in multi-camera structure-from-motion [1], manifold embedding [37], and matrix completion [20] require the solution of optimization problems over matrices that are constrained to have low rank. Since the constraints in such optimization problems are non-convex, the design of computationally efficient algorithms that find globally optimal solutions is, in general, a difficult task.

For a wide range of applications [16, 14, 3, 8, 27, 37], non-convex constraints, such as integer-valued vectors and low-rank matrices, can be handled by semidefinite relaxation (SDR) [16]. In this method, a non-convex problem involving a vector of unknowns is "lifted" to a higher-dimensional convex problem involving a positive semidefinite matrix (i.e., a matrix whose eigenvalues are all non-negative) of unknowns, which can then be solved as a semidefinite program (SDP) [31]. While SDR was shown to deliver state-of-the-art performance in a wide range of applications [9, 16, 10, 30, 37, 20], the approach significantly increases the dimensionality of the original optimization problem (i.e., it replaces a vector with a matrix), which typically results in exorbitant computational costs and memory requirements. Nevertheless, SDR leads to SDPs whose globally optimal solution(s) can be found using robust numerical methods.

A growing number of computer-vision applications involve high-resolution images (or even videos) that require the solution of SDPs with an excessive number of variables. General-purpose solvers for SDPs (e.g., interior point methods) require high computational complexity; the worst-case complexity is O(N^6.5 log(1/ε)) [25] for an N×N problem with ε objective error. Very often, N is proportional to the number of image pixels, which is potentially very large.

The prohibitive complexity and memory requirements of solving SDPs exactly with a large number of variables have spawned recent interest in fast, non-convex solvers that avoid lifting, i.e., that operate in the original low-dimensional variable space. For example, recent progress in phase retrieval by Candès et al. [5] has shown that non-convex optimization methods provably achieve a solution quality comparable to exact SDR-based methods at significantly lower complexity. This method operates in the original dimension, which enables its use on high-dimensional problems (reference [5] performs phase retrieval on a 1080×1920-pixel image in less than 22 minutes). Another prominent example is max-norm regularization by Lee et al. [15], which was proposed for solving high-dimensional matrix-completion problems and for approximately performing max-cut clustering. This method was shown to outperform exact SDR-based methods in terms of computational complexity, while delivering acceptable solution quality. While both of these examples outperform classical SDP-based methods, they are limited to very
1 cific problem types, i.e., they perform one of phase retrieval, vex constraints via semidefinite relaxation (SDR) [16] and matrix completion, or max-cut clustering. More importantly, (ii) the resulting problems often come with strong theoreti- these methods are unable to handle complex SDPs that typi- cal guarantees. For example, the Goemans-Williamson [9] cally appear in computer vision. SDR seeks to find a solution to the NP-complete max-cut problem, which can be relaxed to the form (1) and solved 1.1. Contributions using general-purpose methods. Furthermore, the value of In this paper, we introduce a novel framework for approx- the graph cut obtained via this relaxation is known to exceed imately solving general SDPs in a computationally-efficient 80% of the maximum achievable value. manner and with small memory footprint. Our approach, re- In computer vision, a large number of problems can be ferred to as biconvex relaxation (BCR), transforms a general cast as SDPs of the general form (1). For example, [37] SDP into a specific biconvex optimization problem, which formulates image manifold learning as an SDP, [27] uses can then be solved in the original, low-dimensional vari- an SDP to enforce a non-negative lighting constraint when able space at low complexity. The resulting biconvex prob- recovering scene lighting and object albedos, [2] uses an lem is solved using a computationally-efficient alternating SDP for graph matching, [1] proposes an SDP that recovers minimization (AM) procedure. Since AM is prone to get the orientation of multiple cameras from point correspon- stuck in local minima, we propose a general initialization dences and essential matrices, and [20] uses low-rank SDPs scheme that enables our BCR framework to start close to to solve matrix-completion problems that arise in structure- the global optimum of the original SDP—this initialization from-motion and photometric stereo. 
is key for our algorithm to quickly converge to an optimal or near-optimal solution. We showcase the effectiveness of 2.2. SDR for Binary-Valued Quadratic Problems the proposed BCR framework by comparing the resulting A large body of results that use SDR to solve binary la- algorithms to that of highly-specialized SDP solvers for a beling problems exist in the literature. For such problems, select set of problems in computer vision involving image a set of variables take on binary values (e.g., indicating a segmentation and co-segmentation. Our results demonstrate segmentation into two groups) while minimizing a quadratic that BCR enables high-quality results at best-in-class run- cost function that depends on the assignment of pairs of vari- times. Specifically, BCR achieves speedups ranging from ables. Such labeling problems typically arise from Markov 4× to 35× over state-of-the-art competitor methods [34, 17] random fields (MRFs) for which a rich literature on solution for image segmentation and co-segmentation. methods exist (see [33] for a recent survey). Spectral meth- ods, e.g., [26], are often used to solve such binary-valued 2. Background and Relevant Prior Art quadratic problems (BQPs)—the references [13, 30] used We now briefly review semidefinite programs (SDPs) and SDR, inspired by the work of [9] that provides a generalized then discuss prior work on fast, approximate solvers for SDR for the max-cut problem. BQP problems have wide SDPs in computer vision and related applications. applicability to computer vision problems, such as segmen- tation and perceptual organization [13, 34, 11], semantic 2.1. Semidefinite Programs (SDPs) segmentation [35], matching [30, 24], surface reconstruction including photometric stereo and shape from defocus [8], SDPs find use in a large (and growing) number of fields, and image restoration [23]. 
including computer vision, machine learning, signal and im- The key idea of solving BQPs via SDR is to lift the binary- age processing, statistics, communications, and control [31]. N 2 SDPs can be written in the following general form: valued label vector b ∈ {±1} to an N -dimensional ma- trix space by forming the PSD matrix B = bbT , whose minimize hC, Yi non-convex rank-1 constraint is relaxed to PSD matrices N×N Y∈R + B ∈ SN×N with an all-ones diagonal [16]. The goal is then subject to hA , Yi = b , ∀i ∈ E, i i (1) to solve a SDP for B in the hope that the resulting matrix is hAj, Yi ≤ bj, ∀j ∈ B, rank 1 (in this case, SDR found an optimal solution); if B has Y ∈ S+ , higher rank, an approximate solution must be extracted. Ap- N×N proximate solutions can either be obtained from the leading + eigenvector or via randomization methods [16]. where SN×N represents the set of N × N symmetric, pos- hC, Yi = tr(CT Y) itive semidefinite matrices, and is the 2.3. Specialized Solvers for SDPs matrix inner product. The sets E and B contain the indices associated to the equality and inequality constraints, respec- While a number of general-purpose solvers for SDPs tively; the matrices Ai and Aj have appropriate dimensions. exist, such as SeDuMi [28] or SDPT3 [29], their computa- The key advantages of SDPs are that (i) they enable the tional complexity and memory requirements are generally transformation of certain non-convex constraints into con- high. Hence, their use is restricted to low-dimensional prob-

problems. For problems in computer vision, where the number of variables can become comparable to the number of pixels in an image, more efficient algorithms are necessary. To this end, a handful of special-purpose algorithms have been proposed to solve specific problem types arising in computer vision. These algorithms fall into two classes: (i) convex algorithms that solve the original SDP by exploiting the problem structure and (ii) non-convex methods that avoid lifting.

For certain problems in computer vision, it is possible to exactly solve SDPs with much lower complexity than interior point schemes. Most work in this area has focused on BQP problems. Ecker et al. [8] deployed a number of heuristics to speed up the Goemans-Williamson SDR [9] for surface reconstruction. Olsson et al. [23] proposed a spectral method to solve BQP problems that include a linear term, but it is unable to handle inequality constraints. A particularly popular approach is the SDCut algorithm of Wang et al. [34]. This method solves BQPs for some types of segmentation problems by using gradient descent on the dual. This approach leads to a similar relaxation as for BQP problems, but enables significantly lower computational complexity for graph cutting and its variants. To the best of our knowledge, the method by Wang et al. [34] yields state-of-the-art performance; nevertheless, our proposed method is at least an order of magnitude faster, as we show in Section 4.

Another algorithm class consists of non-convex approximation methods that avoid lifting. Because they work with low-dimensional unknowns, such algorithms are potentially more efficient than lifted methods. Simple examples include the Wiberg method [22] for low-rank matrix approximation, which uses Newton-type iterations to minimize a non-convex objective. A number of methods have been proposed for SDPs where the objective function is simply the trace norm of Y (i.e., problem (1) with C = I) and there are no inequality constraints. Approaches include replacing the trace norm with the max norm [15], or using the so-called Wirtinger flow to solve phase-retrieval problems [5]. One of the earliest non-convex approaches is due to Burer and Monteiro [4], who propose an augmented Lagrangian method. While this method is able to handle arbitrary objective functions, it does not naturally support inequality constraints (i.e., without auxiliary slack variables). Furthermore, the convergence of this approach is not well understood, and the method is highly sensitive to the initialization value.

While most of the above-mentioned methods provide best-in-class performance at low computational complexity, they are limited to very specific problems and cannot be generalized to other, more general SDPs.

3. Biconvex Relaxation (BCR) Framework

We now present our framework, referred to as biconvex relaxation (BCR). We then propose an alternating minimization procedure and a suitable initialization method.

3.1. Biconvex Relaxation

Rather than attempting to solve the general SDP (1) directly, we exploit the following key fact: any matrix Y is symmetric positive semidefinite if and only if it has an expansion of the form Y = XX^T. By substituting the factorization Y = XX^T into (1), we are able to remove the semidefinite constraint and arrive at the following problem:

  minimize_{X ∈ R^(N×r)}  tr(X^T C X)
  subject to  tr(X^T A_i X) = b_i, ∀i ∈ E,        (2)
              tr(X^T A_j X) ≤ b_j, ∀j ∈ B.

Above, we used the fact that ⟨C, XX^T⟩ = tr(X^T C X), which is a quadratic function in X, and r = rank(Y).¹

¹Straightforward extensions of our approach allow us to also handle constraints of the form tr(X^T A_k X) ≥ b_k, ∀k ∈ A, as well as complex-valued matrices and vectors. For the sake of simplicity, we will not elaborate on these extensions.

Note that any symmetric matrix A has a (possibly complex-valued) square root L with A = L^T L. Furthermore, we have tr(X^T A X) = tr(X^T L^T L X) = ‖LX‖_F², where ‖·‖_F is the Frobenius (matrix) norm. This enables us to rewrite (2) as follows:

  minimize_{X ∈ R^(N×r)}  tr(X^T C X)
  subject to  Q_i = L_i X, ‖Q_i‖_F² = b_i, ∀i ∈ E,        (3)
              Q_j = L_j X, ‖Q_j‖_F² ≤ b_j, ∀j ∈ B.

If the matrices {A_i}, {A_j}, and C are themselves symmetric positive semidefinite, then the objective function in (3) is convex and quadratic, and the inequality constraints in (3) are convex—the non-convexity of the problem is caused only by the equality constraints. The core idea of BCR, explained next, is to relax these equality constraints.

In the formulation (3), we have lost convexity. Nevertheless, whenever r < N, we achieve a (potentially large) dimensionality reduction compared to the original SDP (1). We now relax (3) into a form that is biconvex, i.e., convex with respect to one group of variables when the remaining variables are held constant. By relaxing the problem into biconvex form, we retain many advantages of the convex formulation while maintaining low dimensionality and speed. In particular, we propose to approximate (1) with the
following biconvex relaxation (BCR):

  minimize_{X, {Q_i}, i ∈ B∪E}  tr(X^T C X) + (α/2) Σ_{i∈E∪B} ‖Q_i − L_i X‖_F² − (β/2) Σ_{j∈E} ‖Q_j‖_F²        (4)
  subject to  ‖Q_i‖_F² ≤ b_i, ∀i ∈ B∪E,

where α > β > 0 are (suitably chosen) regularization parameters. In this BCR formulation, we relaxed the equality constraints ‖Q_i‖_F² = b_i, ∀i ∈ E, to inequality constraints ‖Q_i‖_F² ≤ b_i, ∀i ∈ E, and added negative quadratic penalty functions −(β/2)‖Q_j‖_F², ∀j ∈ E, to the objective function. These negative quadratic penalties promote that the relaxed inequality constraints in E are satisfied with equality. We furthermore replaced the constraints Q_i = L_i X and Q_j = L_j X by quadratic penalty functions in the objective.

Our BCR formulation in (4) has some interesting properties. First, whenever C ∈ S⁺_(N×N), the problem is biconvex, i.e., convex with respect to X when the {Q_i} are held constant, and vice versa. Consider the case of solving a constraint feasibility problem (i.e., problem (1) with C = 0). If Y = XX^T is a solution to (1), then problem (4) attains the objective value −Σ_i b_i, which is the global minimum of the BCR formulation (4). Likewise, it is easy to see that any global minimizer of (4) with objective value −Σ_i b_i must be a solution to the original problem (1).

3.2. Alternating Minimization Algorithm

One of the key benefits of biconvexity is that (4) can be globally minimized with respect to {Q_i} or with respect to X. Hence, it is natural to compute approximate solutions to (4) via alternating minimization (AM). We emphasize that the convergence of AM is well understood for a variety of biconvex problems [7, 6]. AM proceeds by optimizing the problem for the matrices {Q_i} while keeping X fixed, and then optimizing for X while keeping the matrices {Q_i} fixed, in an alternating fashion. The two stages of the proposed AM method for BCR are detailed below.

Stage 1: Minimize with respect to {Q_i}. The objective of the BCR in (4) is quadratic in {Q_i}, and there is no dependence among these matrices. Consequently, the optimal value of Q_i can be found by minimizing the quadratic objective and then re-projecting back onto a Frobenius-norm ball of radius √b_i. The minimizer of the quadratic objective is given by (α/(α−β_i)) L_i X, where β_i = 0 if i ∈ B and β_i = β if i ∈ E. The projection onto the Frobenius-norm ball then leads to the following expansion–re-projection update:

  Q_i ← (L_i X / ‖L_i X‖_F) · min{ √b_i, (α/(α−β_i)) ‖L_i X‖_F }.        (5)

Intuitively, this expansion–re-projection update causes the matrix Q_i to expand if i ∈ E, thus encouraging it to satisfy the relaxed inequality constraints in (4) with equality.

Stage 2: Minimize with respect to X. This stage requires us to solve the following least-squares problem:

  X ← arg min_{X ∈ R^(N×r)}  tr(X^T C X) + (α/2) Σ_{i∈E∪B} ‖Q_i − L_i X‖_F².        (6)

Since the optimality conditions for this problem are linear equations, its solution is simply given by

  X ← (C + α Σ_{i∈E∪B} L_i^T L_i)^(−1) (α Σ_{i∈E∪B} L_i^T Q_i),        (7)

where the matrix inverse may be replaced by a pseudo-inverse, if necessary. Alternatively, one may perform a simple gradient-descent step with a suitable step size, which avoids the costly (pre-)computation of a potentially high-dimensional (pseudo-)inverse.

The resulting AM algorithm to approximately solve the proposed BCR (4) is summarized in Algorithm 1.

Algorithm 1 AM for Biconvex Relaxation
1: inputs: C, {L_i}, b_i, α, and β
2: Compute an initializer for X as in Section 3.3
3: Precompute M = (C + α Σ_{i∈E∪B} L_i^T L_i)^(−1)
4: while not converged do
5:    Q_i ← (L_i X / ‖L_i X‖_F) · min{ √b_i, (α/(α−β_i)) ‖L_i X‖_F }
6:    X ← M (α Σ_{i∈E∪B} L_i^T Q_i)
7: end while
8: output: X

3.3. Initialization

Since the problem (4) is biconvex, a global minimizer can be found with respect to either {Q_i} or X. With the proposed AM procedure, we hope to find a global minimizer in both the matrices {Q_i} and X at low complexity. In practice, however, AM is prone to getting trapped in local minima, especially if the variables have been initialized poorly. We now discuss a principled method for computing an initializer for X that is often close to the global optimum of the BCR problem—our initializer is key for the success of the proposed AM procedure and enables fast convergence.

The papers [21, 5] have considered optimization problems that arise in phase retrieval, where B = ∅ (i.e., there are only equality constraints), C = I is the identity, and Y is rank one. For such problems, the objective of (1) reduces to tr(Y). By setting Y = xx^T, we obtain

  minimize_{x ∈ R^N}  ‖x‖₂²
  subject to  q_i = L_i x, ‖q_i‖₂² = b_i, ∀i ∈ E.        (8)

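Algorithm 1 can be rendered in a few lines of NumPy. The sketch below is our own minimal rendition under simplifying assumptions: equality constraints only (B = ∅), dense matrices, and a fixed iteration budget in place of a convergence test; none of the function or variable names come from a released implementation.

```python
import numpy as np

def bcr_am(C, Ls, bs, X0, alpha=10.0, beta=5.0, iters=100):
    """Alternating minimization for the BCR problem (4), assuming
    equality constraints only, so beta_i = beta for every i."""
    assert alpha > beta > 0
    X = X0.copy()
    # Precompute M = (C + alpha * sum_i L_i^T L_i)^(-1)  (Algorithm 1, line 3)
    M = np.linalg.inv(C + alpha * sum(L.T @ L for L in Ls))
    for _ in range(iters):
        acc = np.zeros_like(X)
        for L, b in zip(Ls, bs):
            LX = L @ X
            nrm = np.linalg.norm(LX)              # Frobenius norm of L_i X
            # Stage 1, update (5): expand by alpha/(alpha - beta),
            # then re-project onto the ball of radius sqrt(b_i).
            scale = min(np.sqrt(b), alpha / (alpha - beta) * nrm)
            acc += L.T @ ((scale / max(nrm, 1e-12)) * LX)
        # Stage 2, update (7): X <- M (alpha * sum_i L_i^T Q_i)
        X = M @ (alpha * acc)
    return X

# Toy feasibility problem (C = 0): a single constraint tr(X^T X) = 4.
N, r = 6, 2
rng = np.random.default_rng(1)
X = bcr_am(np.zeros((N, N)), [np.eye(N)], [4.0],
           X0=0.1 * rng.standard_normal((N, r)))
assert abs(np.linalg.norm(X) - 2.0) < 1e-6   # constraint met: ||X||_F = 2
```

In the toy run, the expansion factor α/(α−β) = 2 grows the iterate each pass until the re-projection caps its norm at √b_i, so the equality constraint is met at convergence.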
In order to find a good initializer for such phase retrieval problems of the form (8), Netrapalli et al. [21] proposed the following strategy. Define

  Z = (1/|E|) Σ_{i∈E} b_i L_i^T L_i.        (9)

Let v be the leading eigenvector of Z and λ the leading eigenvalue. Then x = λv is an accurate approximation to the true solution of (8). In fact, if the matrices L_i are sampled from a random normal distribution, then it was shown in [21, 5] that E‖x* − λv‖₂² → 0 (in expectation) as |E| → ∞, where x* is the true solution to (8).

We are interested in a good initializer for the general problem in (3), where X can be rank one or higher. We focus on problems with equality constraints only—note that one can use slack variables to convert a problem with inequality constraints into this form [31]. Given that C is a symmetric positive definite matrix, it can be decomposed as C = U^T U. By the change of variables X̃ = UX, we can rewrite (1) as follows:

  minimize_{X̃ ∈ R^(N×r)}  ‖X̃‖_F²
  subject to  ⟨Ã_i, X̃X̃^T⟩ = b_i, ∀i ∈ E,        (10)

where Ã_i = U^(−T) A_i U^(−1), and we omitted the inequality constraints. To initialize the proposed AM procedure in Algorithm 1, we make the change of variables X̃ = UX to transform the BCR formulation into the form of (10). Analogously to the initialization procedure in [21] for phase retrieval, we then compute an initializer X̃₀ using the leading r eigenvectors of Z scaled by the leading eigenvalue λ. Finally, we calculate the initializer for the original problem by reversing the change of variables as X₀ = U^(−1) X̃₀.

3.4. Advantages of Biconvex Relaxation

The proposed BCR framework has numerous advantages over other non-convex methods. First and foremost, our approach can be applied to general SDPs. Specialized methods, such as the Wirtinger flow [5] proposed for phase retrieval and the Wiberg method [22] for low-rank approximation, are computationally efficient but restricted in the type of problem they can solve. Similarly, the max-norm regularization in [15] is limited to solving trace-norm-regularized SDPs. The method proposed by Burer and Monteiro [4] is less specialized, but does not naturally support inequality constraints. Furthermore, since BCR problems are biconvex, it is possible to develop corresponding numerical methods that are guaranteed to converge. We emphasize that convergence is guaranteed not only for the proposed AM least-squares method in Algorithm 1 (for which the objective can be shown to decrease monotonically), but also for a broad range of gradient-descent schemes suitable for finding solutions to biconvex problems [38]. In contrast, the method in [4] uses augmented Lagrangian methods with non-linear constraints, for which convergence is not guaranteed.

4. Benchmark Problems

In this section we evaluate our solver using both synthetic and real-world data. We begin with a brief comparison showing that biconvex solvers outperform interior-point methods for general-form SDPs. Of course, for more specific problem forms, specialized solvers for convex problems achieve superior performance to classical interior point schemes. For this reason, we evaluate our proposed method on two important computer vision applications, segmentation and co-segmentation, using public datasets and comparing with state-of-the-art methods. These applications are ideal because (i) it is easy to generate large problems and (ii) customized solvers are available (SDCut and its variants) that exploit problem structure to solve the lifted problem efficiently. This way we can compare our BCR framework against relatively powerful and optimized solvers for the lifted case.

4.1. General-form Problems

We briefly demonstrate that BCR in (4) performs well on general SDPs by comparing it to the widely used SDP solvers SDPT3 and SeDuMi. Note that SDPT3 and SeDuMi use an interior point approach to solve the convex problem in (1). We randomly generate a rank-3 data matrix of the form Y_true = x₁x₁^T + x₂x₂^T + x₃x₃^T. Each vector x_i is a 256-dimensional standard normal vector. We generate a standard normal matrix L and compute C = L^T L. Standard normal matrices {A_i} of dimension 250×250 are sampled, and M measurements b_i are recorded. We report the relative error in the recovered solution Y_rec, measured as |⟨C, Y_rec⟩ − ⟨C, Y_true⟩| / ⟨C, Y_true⟩. We fix α = 1000 and β = 500. Average runtimes for varying numbers of constraints are shown in Figure 1a, while Figure 1b plots the average relative error. Figure 1a shows that our method has better runtime complexity than interior point schemes. Figure 1b shows that the convex interior point methods do not recover the correct solution for small numbers of constraints. Because the lifted SDP has many unknowns, the convex problem becomes under-determined, allowing the objective to go to zero. The proposed BCR approach is able to enforce a rank-3 constraint, which is advantageous when the number of constraints is low.

4.2. Image Segmentation

Consider an image of N pixels containing one major foreground object. The problem of segmenting this image into foreground and background can be solved using graph-based approaches. For these methods, vertices that represent the pixels are bisected into two disjoint sets using spectral clustering approaches including normalized cut [26] and
ratio cut [36]; graph edges encode the similarities between pairs of pixels. This graph cut problem is formulated as

  minimize_{x ∈ {−1,1}^N}  x^T L x,        (11)

which is NP-hard due to the binary-valued vector constraints. Spectral clustering approaches attempt to solve a relaxed version of (11), given by

  minimize_{x ∈ R^N}  x^T L x  subject to  1^T x = 0, ‖x‖₂² = N.        (12)

Here, L is a positive semidefinite Laplacian matrix given by L = D − W, where W is the weighted affinity matrix and D = diag(W1). Although the above problem is non-convex, it can be solved using an eigendecomposition. Equation (12) can be reformulated as follows:

  minimize_{X ∈ S⁺_(N×N)}  tr(L^T X)
  subject to  1^T X 1 = 0        (13)
              diag(X) = 1
              rank(X) = 1.

This problem has been solved approximately in [10, 13, 24] using convex methods after dropping the non-convex rank constraint. This approach, however, does not scale well to a large number of pixels and becomes computationally inefficient when solved using off-the-shelf solvers such as SeDuMi [28] or SDPT3 [29]. In contrast, BCR enables us to incorporate rank constraints while constraining the solution search space, which leads to efficient spectral clustering.

[Figure 1: Results on synthetic data for a varying number of linear constraints. (a) Average solver runtime. (b) Average relative error.]

Furthermore, our BCR is capable of incorporating annotated foreground and background pixel priors [39] using linear equality and inequality constraints. Following the work of [39], we consider three grouping constraints on the pixels: (t_f^T P x)² ≥ κ‖t_f^T P x‖₁², (t_b^T P x)² ≥ κ‖t_b^T P x‖₁², and ((t_f − t_b)^T P x)² ≥ κ‖(t_f − t_b)^T P x‖₁², where κ ∈ [0, 1], P = D^(−1) W is the normalized affinity matrix, and t_f and t_b are indicator variables denoting the foreground and background pixels. Considering these constraints and the formulation in (13) with X = YY^T, the problem can be written in the form of (2) as follows:

  minimize_{Y ∈ R^(N×r)}  tr(Y^T L Y)
  subject to  tr(Y^T A_i Y) = 1, ∀i = 1, ..., N
              tr(Y^T B₁ Y) = 0        (14)
              tr(Y^T B₂ Y) ≥ κ‖t_f^T P x‖₁²
              tr(Y^T B₃ Y) ≥ κ‖t_b^T P x‖₁²
              tr(Y^T B₄ Y) ≥ κ‖(t_f − t_b)^T P x‖₁²,

where r is the rank of the desired solution, B₁ = 11^T, B₂ = P t_f t_f^T P, B₃ = P t_b t_b^T P, B₄ = P(t_f − t_b)(t_f − t_b)^T P, and A_i = a_i a_i^T, where a_i ∈ R^N is a vector of all zeros except for a one at the i-th position. The affinity matrix W is given by

  W_ij = exp(−‖f_i − f_j‖₂²/γ_f² − d(i,j)²/γ_d²)  if d(i,j) < r, and W_ij = 0 otherwise,        (15)

where f_i and f_j are the color histograms of pixels i and j, and d(i,j) is the spatial distance between i and j. By transforming (14) into the BCR form (4), we compute a score vector and the final binary solution.

SDCut, as proposed in [34], presents an efficient algorithm for binary quadratic problems. The paper devises an efficient dual formulation of the SDR. Although SDCut is accurate and fast, it is limited to solving BQP problems rather than general SDP problems. In the following experiments, we compare our proposed approach (i.e., BCR) with the highly customized solver SDCut. We also compare both methods to another fast method, namely biased normalized cut (BNCut) [18]. BNCut is an extension of the normalized cut algorithm [26] that is capable of incorporating prior information for constrained image segmentation. However, this formulation can only support one quadratic grouping constraint per problem.

4.2.1 Experiments

In this section, we evaluate the BCR formulation to segment a single foreground object and compare the runtime and quality of our AM method for BCR with that of SDCut and BNCut. We use the publicly available code for SDCut and
6 T T 2 2 Method BNCut SDCut BCR equal cut 1 x = 0 is replaced by the bias (x δi) ≤ λ . Rank 1 7 2 We can write the image co-segmentation problem as (2) Time (s) 0.08 27.64 0.79 minimize tr(XT AX) Objective 10.84 6.40 6.34 n×r X∈R T (17) Table 1: Results on image segmentation. Numbers are the subject to: tr(X ZiX) = 1, ∀i = 1,...,N T 2 mean over the images in Fig.2. Lower numbers are better. tr(X ∆iX) ≤ λ , ∀i = 1, . . . , q

BNCut with default settings. Following [34], we consider the Berkeley segmentation dataset [19] and segment each image into super-pixels using the VL-Feat [32] toolbox. The super-pixel-wise affinity matrix is computed using (15). For all experiments, we set the parameters α and β to 2 and 0.06, respectively, and limit the number of iterations to 1000. All experiments are conducted on the same computer. Quantitative results are tabulated in Table 1 and qualitative results are compared in Fig. 2. Observe that for all the images considered, our approach gives better foreground-object segmentation than SDCut; for example, the grass between the hind legs of the kangaroo is clearly segmented out as background. Moreover, Table 1 shows that our solver is 35× faster than SDCut and yields a lower objective energy. For all the images, our segmentations are achieved using only rank-2 solutions, whereas SDCut requires rank-7 solutions to obtain similar results.² On the other hand, although BNCut with its rank-1 solutions is much faster than the SDP-based methods, its segmentation results are not on par with those of the SDP approaches.

4.3. Image Co-segmentation

In co-segmentation, the same object is segmented across multiple images simultaneously. The co-segmentation formulation optimizes two criteria: (i) color and spatial consistency within a single image, and (ii) discrimination between foreground and background pixels over multiple images. We follow the work of Joulin et al. [11], which uses discriminative clustering to enforce the second criterion. The problem is given by

    minimize_{x ∈ {±1}^N}  x^T A x                                 (16)
    subject to  (x^T δ_i)^2 ≤ λ^2,  ∀i = 1, ..., M,

where N = Σ_{i=1}^M N_i is the total number of pixels over all images and M denotes the number of images. The matrix A = A_b + (µ/N) A_w, where A_w = I_N − D^{−1/2} W D^{−1/2} is the intra-image affinity matrix and A_b = λ_k (I_N − e_N e_N^T/N)(N λ_k I_N + K)^{−1}(I_N − e_N e_N^T/N) is the inter-image discriminative clustering cost matrix. The matrix K is a kernel based on the χ² distance between SIFT features (see [11] for the details).

Note that the above problem resembles a graph-cut problem, except for the balancing constraint imposed on each image. The corresponding SDR is

    minimize_{X ⪰ 0}  tr(AX)                                       (17)
    subject to  tr(∆_i X) ≤ λ^2,  ∀i = 1, ..., M,
                tr(Z_i X) = 1,    ∀i = 1, ..., N,

where ∆_i = δ_i δ_i^T and Z_i = e_i e_i^T, with e_i ∈ R^N an all-zeros vector with a single one at the ith position. For fairness, we compare to two well-known co-segmentation methods, namely low-rank factorization [12] (denoted LR) and SDCut [11]. Following [12], we recover the optimal score vector x*_p from X* using an eigendecomposition. The final binary solution is obtained by thresholding x*_p.

4.3.1 Experiments

We evaluate all three methods on the Weizman horses³ and MSRC⁴ datasets. There are a total of four classes (horse, car-front, car-back, and face) with 6 ∼ 10 images per class. Each image is over-segmented into 400 ∼ 700 SLIC super-pixels using the VLFeat [32] toolbox, which gives a total of around 4000 ∼ 7000 super-pixels per class. Compared to the image segmentation formulation, this application requires 10× more variables. Note that standard SDP solvers such as SeDuMi and SDPT3 are not effective with such large numbers of variables [34]. We used the publicly available code for the LR algorithm and closely followed the experimental procedure detailed in [34].

The qualitative results for all three methods are presented in Figure 3, while Table 2 provides a quantitative comparison. From Table 2, we observe that on average our method converges ∼ 9.5× faster than SDCut and ∼ 60× faster than LR. The objective value achieved by BCR in (17) is also significantly lower than that of both SDCut and LR. Figure 3 visualizes the final score vector x*_p on selected images; for all classes, SDCut and LR produce visually similar results. Moreover, for BCR the optimal score vector x*_p is obtained from a solution of only rank 2, compared to the rank-3 and rank-7 solutions of SDCut and LR, respectively.

5. Conclusion

We have presented a novel biconvex relaxation framework (short: BCR) that enables the solution of general semidefinite programs (SDPs) at low computational complexity and with a low memory footprint. We have provided an alternating minimization (AM) procedure along with a new initialization procedure which, together, are guaranteed to converge,

² Except for one class with rank 5, all SDCut solutions have rank 7.
³ www.msri.org/people/members/eranb/
⁴ www.research.microsoft.com/en-us/projects/objectclassrecognition/
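The lifting step that connects the binary problem (16) to the SDP (17), and the eigendecomposition-based rounding used to recover the score vector, can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the affinity matrix W, the vector δ, and the perturbation of the solution are random stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8

# Intra-image term A_w = I - D^{-1/2} W D^{-1/2} for a (hypothetical)
# symmetric, nonnegative super-pixel affinity matrix W.
W = rng.random((n, n))
W = 0.5 * (W + W.T)
np.fill_diagonal(W, 0.0)
d_inv_sqrt = np.diag(1.0 / np.sqrt(W.sum(axis=1)))
A_w = np.eye(n) - d_inv_sqrt @ W @ d_inv_sqrt

# Lifting: for X = x x^T, x^T A x = tr(A X) and (x^T delta)^2 = tr(Delta X)
# with Delta = delta delta^T -- the identities that turn (16) into (17).
x = rng.choice([-1.0, 1.0], size=n)
X = np.outer(x, x)
delta = rng.random(n)
Delta = np.outer(delta, delta)
assert np.isclose(x @ A_w @ x, np.trace(A_w @ X))
assert np.isclose((x @ delta) ** 2, np.trace(Delta @ X))

# Rounding: scale the leading eigenvector of a (near-)rank-1 PSD solution
# and threshold its sign to obtain a binary labeling.
X_sdp = X + 0.01 * np.eye(n)            # stand-in for an SDP solution
evals, evecs = np.linalg.eigh(X_sdp)    # eigenvalues in ascending order
x_p = np.sqrt(evals[-1]) * evecs[:, -1]
labels = np.sign(x_p)
labels *= np.sign(labels @ x)           # resolve the global sign ambiguity
```

The sign fix in the last line is only possible here because the ground-truth x is known; in practice the sign of x_p is irrelevant, since both signs encode the same foreground/background split.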

Figure 2: Image segmentation results on the Berkeley dataset. From top to bottom: the original images and the results of BNCut, SDCut, and BCR.

Dataset               horse   face   car-back   car-front
Number of images         10     10          6           6
Variables in BQPs      4587   6684       4012        4017
Time (s)   LowRank     2647   1614        724         749
           SDCut        220    274        180         590
           BCR           15     55         44          42
Objective  LowRank     4.84   4.48       5.00        4.17
           SDCut       5.24   4.94       4.53        4.27
           BCR         4.64   3.29       4.36        3.94
Rank       LowRank       18     11          7          10
           SDCut          3      3          3           3
           BCR            2      2          2           2

Table 2: Summary of co-segmentation results.
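As a quick sanity check of the speedups quoted in the text, the average per-class runtime ratios can be recomputed from the timings in Table 2 (a short Python sketch; the numbers are copied from the table):

```python
# Per-class runtimes (seconds) from Table 2: horse, face, car-back, car-front.
lowrank = [2647, 1614, 724, 749]
sdcut   = [220, 274, 180, 590]
bcr     = [15, 55, 44, 42]

def mean_speedup(baseline, ours):
    """Average of the per-class runtime ratios baseline/ours."""
    ratios = [b / o for b, o in zip(baseline, ours)]
    return sum(ratios) / len(ratios)

print(round(mean_speedup(sdcut, bcr), 1))    # roughly 9.4x vs. SDCut
print(round(mean_speedup(lowrank, bcr), 1))  # roughly 60x vs. LowRank
```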

computationally efficient (even for large-scale problems), and able to handle a variety of SDPs. Comparisons of the biconvex relaxation approach with state-of-the-art methods for specific vision problems, such as segmentation and co-segmentation, show that BCR provides similar or better solution quality with significantly lower runtime. While this paper only shows applications for a select set of computer vision problems, the efficacy of BCR for other problems in signal processing, machine learning, control, etc. is left for future work.

Figure 3: Co-segmentation results on the Weizman horses and MSRC datasets. From top to bottom: the original images and the results of LowRank, SDCut, and BCR.

References

[1] M. Arie-Nachimson, S. Z. Kovalsky, I. Kemelmacher-Shlizerman, A. Singer, and R. Basri. Global motion estimation from point matches. In 3D Imaging, Modeling, Processing, Visualization and Transmission (3DIMPVT), 2012 Second International Conference on, pages 81–88. IEEE, 2012.
[2] X. Bai, H. Yu, and E. R. Hancock. Graph matching using spectral embedding and alignment. In Proceedings of the 17th International Conference on Pattern Recognition, volume 3, pages 398–401, 2004.
[3] S. Boyd and L. Vandenberghe. Semidefinite programming relaxations of non-convex problems in control and combinatorial optimization. In Communications, Computation, Control, and Signal Processing, pages 279–287. Springer, 1997.
[4] S. Burer and R. D. Monteiro. A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization. Mathematical Programming, 95(2):329–357, 2003.
[5] E. J. Candès, X. Li, and M. Soltanolkotabi. Phase retrieval via Wirtinger flow: Theory and algorithms. IEEE Transactions on Information Theory, 61(4):1985–2007, 2015.
[6] J. Douglas Jr. and J. E. Gunn. A general formulation of alternating direction methods. Part I. Parabolic and elliptic problems. Numerische Mathematik, 6:428–453, 1964.
[7] J. C. Duchi and Y. Singer. Efficient online and batch learning using forward backward splitting. Journal of Machine Learning Research, 10:2899–2934, 2009.
[8] A. Ecker, A. D. Jepson, and K. N. Kutulakos. Semidefinite programming heuristics for surface reconstruction ambiguities. In Computer Vision–ECCV 2008, pages 127–140. Springer, 2008.
[9] M. X. Goemans and D. P. Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of the ACM, 42(6):1115–1145, 1995.
[10] M. Heiler, J. Keuchel, and C. Schnörr. Semidefinite clustering for image segmentation with a-priori knowledge. In Pattern Recognition, pages 309–317. Springer, 2005.
[11] A. Joulin, F. Bach, and J. Ponce. Discriminative clustering for image co-segmentation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010.
[12] M. Journée, F. Bach, P.-A. Absil, and R. Sepulchre. Low-rank optimization on the cone of positive semidefinite matrices. SIAM Journal on Optimization, 20(5):2327–2351, 2010.
[13] J. Keuchel, C. Schnörr, C. Schellewald, and D. Cremers. Binary partitioning, perceptual grouping, and restoration with semidefinite programming. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(11):1364–1379, 2003.
[14] J. B. Lasserre. An explicit exact SDP relaxation for nonlinear 0-1 programs. In Integer Programming and Combinatorial Optimization, pages 293–303. Springer, 2001.
[15] J. D. Lee, B. Recht, N. Srebro, J. Tropp, and R. R. Salakhutdinov. Practical large-scale optimization for max-norm regularization. In Advances in Neural Information Processing Systems, pages 1297–1305, 2010.
[16] Z.-Q. Luo, W.-k. Ma, A. M.-C. So, Y. Ye, and S. Zhang. Semidefinite relaxation of quadratic optimization problems. IEEE Signal Processing Magazine, 27(3):20–34, May 2010.
[17] S. Maji, N. K. Vishnoi, and J. Malik. Biased normalized cuts. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2011.
[18] S. Maji, N. K. Vishnoi, and J. Malik. Biased normalized cuts. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2057–2064, 2011.
[19] D. Martin, C. Fowlkes, D. Tal, and J. Malik. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proc. 8th Int'l Conf. Computer Vision, volume 2, pages 416–423, July 2001.
[20] K. Mitra, S. Sheorey, and R. Chellappa. Large-scale matrix factorization with missing data under additional constraints. In Advances in Neural Information Processing Systems, pages 1651–1659, 2010.
[21] P. Netrapalli, P. Jain, and S. Sanghavi. Phase retrieval using alternating minimization. In Advances in Neural Information Processing Systems, pages 2796–2804, 2013.
[22] T. Okatani and K. Deguchi. On the Wiberg algorithm for matrix factorization in the presence of missing components. International Journal of Computer Vision, 72(3):329–337, 2007.
[23] C. Olsson, A. P. Eriksson, and F. Kahl. Solving large scale binary quadratic problems: Spectral methods vs. semidefinite programming. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1–8, 2007.
[24] C. Schellewald and C. Schnörr. Probabilistic subgraph matching based on convex relaxation. In Energy Minimization Methods in Computer Vision and Pattern Recognition, pages 171–186. Springer, 2005.
[25] C. Shen, J. Kim, and L. Wang. A scalable dual approach to semidefinite metric learning. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2601–2608. IEEE, 2011.
[26] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):888–905, 2000.
[27] S. Shirdhonkar and D. W. Jacobs. Non-negative lighting and specular object recognition. In Tenth IEEE International Conference on Computer Vision (ICCV), volume 2, pages 1323–1330. IEEE, 2005.
[28] J. F. Sturm. Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones, 1998.
[29] K. C. Toh, M. Todd, and R. Tütüncü. SDPT3 - a MATLAB software package for semidefinite programming. Optimization Methods and Software, 11:545–581, 1998.
[30] P. H. Torr. Solving Markov random fields using semidefinite programming. In Artificial Intelligence and Statistics, 2003.
[31] L. Vandenberghe and S. Boyd. Semidefinite programming. SIAM Review, 38(1):49–95, July 1996.
[32] A. Vedaldi and B. Fulkerson. VLFeat: An open and portable library of computer vision algorithms (2008), 2012.
[33] C. Wang, N. Komodakis, and N. Paragios. Markov random field modeling, inference & learning in computer vision & image understanding: A survey. Computer Vision and Image Understanding, 117(11):1610–1627, 2013.
[34] P. Wang, C. Shen, and A. van den Hengel. A fast semidefinite approach to solving binary quadratic problems. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2013.
[35] P. Wang, C. Shen, and A. van den Hengel. Efficient SDP inference for fully-connected CRFs based on low-rank decomposition. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015.
[36] S. Wang and J. M. Siskind. Image segmentation with ratio cut. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25:675–690, 2003.
[37] K. Q. Weinberger and L. K. Saul. Unsupervised learning of image manifolds by semidefinite programming. International Journal of Computer Vision, 70(1):77–90, 2006.
[38] Y. Xu and W. Yin. A block coordinate descent method for regularized multiconvex optimization with applications to nonnegative tensor factorization and completion. SIAM Journal on Imaging Sciences, 6(3):1758–1789, 2013.
[39] S. X. Yu and J. Shi. Segmentation given partial grouping constraints. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(2):173–183, 2004.
