Exponential Family Bipartite Matching
Total Page:16
File Type:pdf, Size:1020Kb
The Problem The Model Model Estimation Experiments Exponential Family Bipartite Matching Tibério Caetano (with James Petterson, Julian McAuley and Jin Yu) Statistical Machine Learning Group NICTA and Australian National University Canberra, Australia http://tiberiocaetano.com AML Workshop, NIPS 2008 nicta-logo Tibério Caetano: Exponential Family Bipartite Matching 1 / 38 The Problem The Model Model Estimation Experiments Outline The Problem The Model Model Estimation Experiments nicta-logo Tibério Caetano: Exponential Family Bipartite Matching 2 / 38 The Problem The Model Model Estimation Experiments The Problem nicta-logo Tibério Caetano: Exponential Family Bipartite Matching 3 / 38 The Problem The Model Model Estimation Experiments Overview 1 1 2 2 6 6 3 5 5 3 4 4 nicta-logo Tibério Caetano: Exponential Family Bipartite Matching 4 / 38 The Problem The Model Model Estimation Experiments Overview nicta-logo Tibério Caetano: Exponential Family Bipartite Matching 5 / 38 The Problem The Model Model Estimation Experiments Assumptions Each couple ij has a pairwise happiness score cij Monogamy is enforced, and no person can be unmatched Goal is to maximize overall happiness nicta-logo Tibério Caetano: Exponential Family Bipartite Matching 6 / 38 The Problem The Model Model Estimation Experiments Applications Image Matching (Taskar 2004, Caetano et al. 2007) nicta-logo Tibério Caetano: Exponential Family Bipartite Matching 7 / 38 The Problem The Model Model Estimation Experiments Applications Machine Translation (Taskar et al. 2005) nicta-logo Tibério Caetano: Exponential Family Bipartite Matching 8 / 38 The Problem The Model Model Estimation Experiments Formulation Goal: Find a perfect match in a complete bipartite graph: nicta-logo Tibério Caetano: Exponential Family Bipartite Matching 9 / 38 The Problem The Model Model Estimation Experiments Formulation Solution is a permutation y : 1;:::; n 1;:::; n f g 7! f g Aggregate pairwise happiness of a collective marriage y: i ciy(i) Best collective marriageP y ∗: ∗ y = argmaxy i ciy(i) P Maximum-Weight Perfect Bipartite Matching Problem. Also called Assignment Problem. Exactly solvable in O(n3) (e.g. Hungarian algorithm) nicta-logo Tibério Caetano: Exponential Family Bipartite Matching 10 / 38 The Problem The Model Model Estimation Experiments The Model nicta-logo Tibério Caetano: Exponential Family Bipartite Matching 11 / 38 The Problem The Model Model Estimation Experiments Our Contribution We relax the assumption that we know the scores cij . 1 d In reality we measure edge features xij = (xij ;:::; xij ) Instead we parameterize the edge features and perform Maximum-Likelihood Estimation of the parameters. cij = f (xij ; θ) nicta-logo Tibério Caetano: Exponential Family Bipartite Matching 12 / 38 The Problem The Model Model Estimation Experiments Maximum-Likelihood Marriage Estimator Probability of a match given a graph: p(y x; theta) = exp( φ(x; y); θ g(x; θ)) j h i − y = match; x = graph Most likely match: y ∗ = argmax φ(x; y); θ y h i IDEA: construct φ(x; y) such that y ∗ agrees with best match φ(x; y); θ = c ( ) h i i iy i P nicta-logo Tibério Caetano: Exponential Family Bipartite Matching 13 / 38 The Problem The Model Model Estimation Experiments Maximum-Likelihood Marriage Estimator φ(x; y); θ = c ( ) suggests: h i i iy i φ(x; y) = i Piy(i) ciy(i) = iyP(i); θ I.e. the pairwise happiness is now parameterized. I.e. the goal will be to learn which features of people are more relevant to make them happier collectively (not individually!!) nicta-logo Tibério Caetano: Exponential Family Bipartite Matching 14 / 38 The Problem The Model Model Estimation Experiments Maximum-Likelihood Marriage Estimation `(Y X; θ) = N (g(x n; θ) φ(x n; y n); θ ) j n=1 − h i P Partition function: N N exp(g) = y exp i=1 ciy(i) = exp ciy(i) y i=1 P P X Y :=Biy(i) =Permanent| of{z matrix} B Permanent: ]P-complete | {z } nicta-logo Tibério Caetano: Exponential Family Bipartite Matching 15 / 38 The Problem The Model Model Estimation Experiments Maximum-Likelihood Marriage Estimation For learning we need to do gradient descent in `(θ): N n n n θ`(X; Y ; θ) = θg(x ; θ) φ(x ; y ) r n=1 r − P BAD NEWS θg(x; θ) = φ(x; y)p(y x; θ) = E ∼ ( j θ)[φ(x; y)] r y j y p y x P nicta-logo Tibério Caetano: Exponential Family Bipartite Matching 16 / 38 The Problem The Model Model Estimation Experiments Model Estimation nicta-logo Tibério Caetano: Exponential Family Bipartite Matching 17 / 38 The Problem The Model Model Estimation Experiments Maximum-Likelihood Marriage Estimation GOOD NEWS A sampler of perfect matches has been recently proposed (Huber & Law, SODA ’08), which is O(n4 log n) to generate a sample. This sampler is EXACT. Previous fastest sampler (Jerrum, Sinclair & Vigoda J. ACM ’04) was O(n7 log4 n) and was INEXACT (truncated Markov Chain). This was IMPRACTICAL. nicta-logo Tibério Caetano: Exponential Family Bipartite Matching 18 / 38 The Problem The Model Model Estimation Experiments General Idea of Sampler Construct an upper bound on the partition function Use self-reducibility of permutations to generate successive upper bounds of partial partition functions Use sequence of upper bounds to generate an accept-reject algorithm nicta-logo Tibério Caetano: Exponential Family Bipartite Matching 19 / 38 The Problem The Model Model Estimation Experiments General Idea of Sampler 1 1 1 1 2 2 2 2 3 3 3 3 Ω = X; Ω = 6 Ω1 = x : x1;2 = 1 ; Ω1 = 2 j j f g j j 1 1 2 2 3 3 Ω2 = x : x1;2 = 1; x2;3 = 1 ; Ω2 = 1 f g j j nicta-logo Tibério Caetano: Exponential Family Bipartite Matching 20 / 38 The Problem The Model Model Estimation Experiments The Sampler Let x be a sample obtained after the algorithm is run. Then: p(Ω) = y2Ω w(y) = Z p(Ω1) = w(y) P y2Ω1 p(Ω2) = w(y) = w(x) Py2Ω2 Its probabilityP is: U(Ω1) U(Ω2) = U(Ω2) = w(x) U(Ω) U(Ω1) U(Ω) U(Ω) Z But the probability of accepting is U(Ω) So p(x) = w(x)=U(Ω) = w(x) EXACT SAMPLER Z=U(Ω) Z ) nicta-logo Tibério Caetano: Exponential Family Bipartite Matching 21 / 38 The purpose of this work is to present a perfect sam- possible, then applying the idea in [12] using the new pling algorithm that generates random variates exactly generalization of Bregman’s theorem. from π where the running time of the algorithm is it- For simplicity, in this section the algorithm is self a random variable. In addition, the time used to presented only for 0-1 matrices, and where the δ of generate these variates can be used to approximate the Theorem 1.1 lies in (0, 1]. For arbitrary nonnegative permanent of the matrix at no additional cost. Finally, matrices, see Section 5. Let A be a 0-1 matrix. Then the expected running time of the method is polynomial to determine if the permanent is zero or nonzero, the when the original problem is dense. Hopcroft and Karp algorithm [11] can be used to find a permutation in O(n2.5) time (for weighted bipartite Theorem 1.1. For any 0 and δ (0, 1], there ex- graphs, the Hungarian Algorithm [1] can be employed, ≥ ∈ ists a randomized approximation algorithm whose out- but takes O(n3) time.) When the permanent is nonzero, put comes within a factor of 1 + δ of the permanent of a this method finds a permutation σ with A(i, σ(i)) = 1 nonnegative matrix A with probability at least 1 with for all i, so per(A) 1. Then changing any zeros in A − 1≥ random running time T satisfying: for a function R(n), to α1 = (δ/3)(n!)− increases the permanent by at most a factor of 1 + δ/3. 4 2 1 (1.3) E[T ] = O(n log n + R(n)δ− log − ), 1 s/2 (1.4) P(T > sE[T ]) 2 − , for all s > 0. Definition 2.1. A 0-1 matrix A is γ-dense if every ≤ row and column sum is at least γn. When A is a 0-1 matrix such that all the row and column sums are at least γn and γ (0.5, 1], then When a matrix is (1/2)-dense the choice of α1 need 1.5+0.5/(2γ 1) ∈ R(n) = O(n − ). In particular, if γ .6, not be so extreme. Work in [5, 14] implies that for (1/2)- The Problem The Model 4 2Model Estimation1 ≥ Experiments 3 then the running time is O(n [log n + δ− log − ]). dense matrices, when α1 = (δ/3)n− , the permanent increases by at most a factor of 1 + δ/3. The UpperThe primary Bound tool in this algorithm is a version of Phase I: Nearly Doubly Stochastic Scaling. Bregman’s Theorem similar in form to an inequality of For diagonal matrices X and Y , XAY is a scaling of the Soules [26]. Let us first define: matrix A where each row i is multiplied by X(i, i) and each column j is multiplied by Y (j, j). In our work A r + (1/2) ln(r) + e 1, r 1 must be scaled to be nearly doubly stochastic so that the (1.5) h(r) = − ≥ 1 + (e 1)r, r [0, 1] rows and columns each sum to almost 1. To be precise, − ∈ following the same definition in [18], we say diagonal Our bound is as follows: matrices X and Y scale A to accuracy α2 if Theorem 1.2. Let A be a matrix with entries in [0, 1]. T (2.7) XAY ~1 ~1 < α2, YA X~1 ~1 < α2, Let r(i) be the sum of the ith row of the matrix.