ASYMMETRIC SPARSE HASHING

Xin Gao†, Fumin Shen†∗, Yang Yang†, Xing Xu†, Hanxi Li‡, Heng Tao Shen†

† Center for Future Media & School of Computer Science and Engineering, University of Electronic Science and Technology of China, China
‡ JiangXi Normal University, China

gx [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]

∗ Corresponding author: Fumin Shen. This work was supported in part by the National Natural Science Foundation of China under Project 61502081, Project 61673299, Project 61572108, Project 61602089 and Project 61632007, and in part by the Fundamental Research Funds for the Central Universities under Project ZYGX2014Z007.

ABSTRACT

Learning based hashing has become increasingly popular because of its high efficiency in handling large-scale image retrieval. Preserving the pairwise similarities of data points in the Hamming space is critical in state-of-the-art hashing techniques. However, most previous methods fail to capture the local geometric structure residing in the original data, which is essential for similarity search. In this paper, we propose a novel hashing framework which simultaneously optimizes similarity-preserving hash codes and reconstructs the locally linear structures of data in the Hamming space. Specifically, we learn two hash functions such that the resulting two sets of binary codes can well preserve the pairwise similarity and the sparse neighborhood in the original feature space. By taking advantage of the flexibility of asymmetric hash functions, we devise an efficient alternating algorithm to jointly optimize the hash coding function and high-quality binary codes. We evaluate the proposed method on several large-scale image datasets, and the results demonstrate that it significantly outperforms recent state-of-the-art hashing methods on large-scale image retrieval problems.

Index Terms— binary code, asymmetric hashing, pairwise affinity, sparse representation

1. INTRODUCTION

Hashing has recently become a popular and efficient technique for large-scale image and video retrieval [1][2][3][4][5][6][7][8][9]. Hashing methods aim to map the original high-dimensional features to compact binary codes while preserving the semantic structure of the original features in the Hamming space. Using compact binary codes is much more efficient owing to the extremely fast computation of Hamming distances. Hashing can potentially be used in recommendation systems [10], image classification [11][12], sketch based retrieval [13], action recognition [14], etc.
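To make the efficiency claim concrete, the toy sketch below (not from the paper; the helper names, the 16-bit code length and the random data are illustrative) shows that comparing two binary codes reduces to a single XOR followed by a popcount, which is why retrieval with compact codes is so fast.

```python
import numpy as np

def pack_code(bits: np.ndarray) -> int:
    """Pack a {-1, +1} code vector into an integer bit string."""
    return int("".join("1" if b > 0 else "0" for b in bits), 2)

def hamming_distance(b1: int, b2: int) -> int:
    """Hamming distance between two packed codes: XOR, then count set bits."""
    return bin(b1 ^ b2).count("1")

# Toy example: one 16-bit query code and a handful of database codes.
rng = np.random.default_rng(0)
query = pack_code(rng.choice([-1, 1], size=16))
database = [pack_code(rng.choice([-1, 1], size=16)) for _ in range(5)]

# A linear scan costs one XOR + popcount per database item.
ranking = sorted(range(len(database)),
                 key=lambda i: hamming_distance(query, database[i]))
print(ranking)
```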
A common problem of hashing is to generate similarity-preserving hash functions, which encode similar inputs into nearby binary codes. Recently proposed hashing algorithms focus on learning compact codes from data, a.k.a. learning to hash, and can be roughly grouped into two main categories: unsupervised and supervised hashing. Representative unsupervised hashing methods include Spectral Hashing (SH) [1], Iterative Quantization (ITQ) [15] and Inductive Manifold Hashing (IMH) [16][17]. Supervised methods try to learn hash codes by leveraging supervised semantic information from the data. Many well-known methods fall in this category, such as Binary Reconstructive Embedding (BRE) [18], Kernel-Based Supervised Hashing (KSH) [19] and Supervised Discrete Hashing (SDH) [11].

The appealing property of similarity-preserving hashing methods is that they minimize the gap between the similarities given in the original space and those in the hash coding space. In the hashing literature, pairwise similarity is widely adopted to make the distances of similar pairs in the input and Hamming spaces as consistent as possible. However, most previous hashing approaches fail to explore the local neighborhood structure of the data, which is essential for similarity search. It has been shown that data points in a high-dimensional feature space may lie on a low-dimensional manifold, where each point can be represented by a linear combination of its neighbors on the same manifold [16]. This linear representation is usually sparse and can well capture the local geometric structure of the manifold, which contains valuable information for nearest neighbor search.

Inspired by this, in this work we aim to learn binary codes that not only preserve the pairwise similarities but also capture the underlying local neighborhood structure of the data. The core idea is to construct asymmetric hash functions that simultaneously preserve the pairwise affinity and the local neighborhood structure while mapping to the Hamming space. To achieve this, we optimize the asymmetric binary code inner product to minimize the Hamming distance on similar pairs and maximize it on dissimilar pairs. Meanwhile, by analyzing the sparse representation of the original feature space, we show how to capture the local linearity of the manifold and preserve the learned linear structure in the low-dimensional Hamming space. Our main contributions include:

• We address the importance of preserving the pairwise similarity and the local geometric structure of data in the Hamming space. To this end, we propose a novel asymmetric sparse hashing (ASH) framework that jointly optimizes the Hamming affinity and encodes the geometric structure into binary codes.

• To generate better binary codes, we impose an orthogonality constraint on the projection vectors. To solve the associated binary optimization problem, we adopt an efficient algorithm which alternatingly optimizes over the binary codes and the hash functions in an iterative way.

• We extensively evaluate the proposed ASH approach on image retrieval tasks, and the experiments show that our binary coding method significantly outperforms the state-of-the-art on several large-scale datasets.

2. THE PROPOSED APPROACH

In this section, we first briefly review the asymmetric hash function algorithm and then describe the detailed formulation of the proposed approach. An efficient alternating algorithm is introduced to solve the resulting problem.

2.1. Asymmetric Binary Code Learning

Inspired by the recent work [20], in this paper we dedicate ourselves to learning asymmetric hash functions by addressing the Maximum Inner Product Search (MIPS) problem instead of the approximate nearest neighbor (ANN) problem. Solving the MIPS problem has a critical practical impact. Finding hashing based algorithms for MIPS was considered hard. To address this challenge, [21] shows that it is possible to relax the standard LSH framework to allow asymmetric hash functions, which can solve MIPS efficiently. In contrast to LSH, which uses a single hash function, [21] defines two hash functions h(·) and z(·) to compute the inner product between the query and a database point:

    p = \arg\max_{a \in S} h(x)^\top z(a),    (1)

where h(·) and z(·) are the hash functions for the input query vector and the database points, respectively. ALSH is shown, both theoretically and empirically, to solve the MIPS problem efficiently.

Motivated by ALSH, the asymmetric inner-product binary coding (AIBC) work [20] advocates the asymmetric form of hash functions, which are learned from data rather than generated randomly. In [20], the Hamming affinity given by the code inner product is optimized to match the ground-truth data inner products as follows:

    \min \left\| h(X)^\top z(A) - S \right\|^2,    (2)

where X and A denote two sets of data points, S is the similarity matrix between A and X, and \|\cdot\| is the Frobenius norm. AIBC has been shown to achieve promising results for the MIPS problem. As mentioned above, however, AIBC fails to preserve the local neighborhood structure of the data in the Hamming space, which is suboptimal for similarity search. Also, the optimization problem involved in AIBC differs from the one presented in this work, which further explores the orthogonality of hash functions and sparsity preservation.
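To connect the MIPS formulation (1) with Hamming-distance retrieval: for r-bit codes in {−1, 1}^r, the identity d_H(b, z) = (r − b^T z)/2 holds, so maximizing the asymmetric code inner product h(x)^T z(a) is the same as minimizing the Hamming distance between the two codes. The following sketch illustrates this (the sign-of-random-projection hash functions, variable names and data are only stand-ins, not the functions learned in this paper):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n = 32, 16, 500                    # feature dim, code length, database size

x = rng.standard_normal(d)               # query vector
A = rng.standard_normal((d, n))          # database points, one per column

# Stand-in asymmetric hash functions: h(x) = sgn(W^T x), z(a) = sgn(R^T a).
W = rng.standard_normal((d, r))
R = rng.standard_normal((d, r))
h_x = np.where(W.T @ x >= 0, 1, -1)      # query code, shape (r,)
Z = np.where(R.T @ A >= 0, 1, -1)        # database codes, shape (r, n)

inner = h_x @ Z                          # h(x)^T z(a) for every database point
hamming = (r - inner) / 2                # d_H(h(x), z(a)) = (r - h(x)^T z(a)) / 2

# Problem (1): the MIPS winner in code space is also the Hamming nearest neighbor.
assert np.argmax(inner) == np.argmin(hamming)
print(np.argmax(inner))
```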
2.2. Asymmetric sparse hashing

Let us first introduce some notation. Suppose we have two sets of points X = [x_1, x_2, ..., x_m] and A = [a_1, a_2, ..., a_n], where x_i ∈ R^{d×1} and a_j ∈ R^{d×1}. Denote by ∂E(x) the set of neighborhood samples of x in the dataset A. Let x_i be the input sample and A the dictionary, and assume that x_i can be represented as a sparse linear combination of the dictionary samples in ∂E(x_i) ⊂ A. That is, to relate x_i with its neighbors, we solve the following problem to obtain its linear representation coefficients in terms of A, which contain the geometric structure information around x_i:

    \min_{c_i} \; \frac{1}{2} \| x_i - A c_i \|_2^2 + \lambda \| c_i \|_1
    \quad \text{s.t.} \quad c_{i,j} = 0 \ \text{if} \ a_j \notin \partial E(x_i),    (3)

where c_i = [c_{i,1}, ..., c_{i,n}]^T is the coefficient vector for sample x_i. We use the neighborhood constraint to favor a locally sparse representation, so that x_i is represented as a sparse linear combination of its neighbors. Problem (3) can be solved with various efficient sparse coding algorithms; we apply the Homotopy algorithm [22] to solve this ℓ1-regularized regression problem.

Let C = [c_1, c_2, ..., c_m] ∈ R^{n×m} be the sparse matrix composed of the sparse representations of the dataset X, which encodes the local neighborhood structure between X and A. We would like to preserve this structure in the Hamming space while learning the binary codes. We denote by B = h(X), B = [b_1, b_2, ..., b_m] ∈ {1, −1}^{r×m}, the binary code matrix of the dataset X, and by Z = z(A), Z = [z_1, z_2, ..., z_n] ∈ {1, −1}^{r×n}, the binary code matrix of the dataset A. We then have the following optimization problem:

    \min_{b_i, z_j} \sum_i \Big\| b_i - \sum_j c_{ij} z_j \Big\|^2
    = \sum_i \| h(x_i) - z(A) C_{:i} \|_2^2
    = \| h(X) - z(A) C \|^2,    (4)

where C_{:i} is the i-th column of the matrix C. Minimizing the above objective function can well preserve the locally linear data structure in the Hamming space. The local linear hashing (LLH) work [23] also studies a problem similar to (4) for hashing. However, LLH differs from our work in the following aspects: 1) we focus on asymmetric hashing for the MIPS problem, while LLH is designed with a symmetric setting; 2) our model learns orthogonal hash functions, while LLH does not; 3) we optimize the binary codes in a discrete way, while a relaxed problem (minimizing quantization loss with

equivalent to:

    \min_W \| B - W X \|^2    (10)

    s.t.
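As a concrete reading of the sparse coding step (3), the sketch below builds ∂E(x_i) as the k nearest dictionary samples and uses scikit-learn's Lasso as a stand-in for the Homotopy solver employed in the paper; the function name, the choice of k and the value of λ are illustrative assumptions, and the Lasso scaling of the least-squares term differs slightly from (3).

```python
import numpy as np
from sklearn.linear_model import Lasso

def sparse_neighbor_codes(X, A, k=10, lam=0.05):
    """Approximately solve problem (3) for every column x_i of X.

    X : d x m matrix of samples to encode, A : d x n dictionary matrix.
    Coefficients outside the neighborhood dE(x_i) (here: the k nearest
    dictionary columns) are kept at zero by restricting the dictionary
    before solving the l1-regularized regression.
    Returns C of shape n x m, the sparse coefficient matrix used in (4).
    """
    m = X.shape[1]
    n = A.shape[1]
    C = np.zeros((n, m))
    for i in range(m):
        x = X[:, i]
        # dE(x_i): indices of the k nearest dictionary samples (Euclidean).
        nbr = np.argsort(np.linalg.norm(A - x[:, None], axis=0))[:k]
        # l1-regularized least squares on the restricted dictionary.
        lasso = Lasso(alpha=lam, fit_intercept=False, max_iter=5000)
        lasso.fit(A[:, nbr], x)
        C[nbr, i] = lasso.coef_
    return C
```

Each column of the returned C is sparse by construction, with support confined to the sample's neighborhood, which is exactly the structure that objective (4) then transfers to the binary codes.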
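Similarly, the following toy illustration evaluates objective (4) for given binary codes. The sign-of-projection hash functions and the randomly generated sparse matrix C are placeholders (in the paper, C comes from (3) and the codes are learned); the point is only that (4) is small when every code b_i is close to the sparse combination z(A)C_{:i} of its neighbors' codes.

```python
import numpy as np

rng = np.random.default_rng(1)
d, r, m, n, k = 32, 16, 200, 150, 10     # feature dim, bits, #samples, dict size, #neighbors

X = rng.standard_normal((d, m))
A = rng.standard_normal((d, n))

# Toy sparse coefficient matrix standing in for the solution of (3):
# each column has k nonzero entries on that sample's assumed neighbors.
C = np.zeros((n, m))
for i in range(m):
    nbr = rng.choice(n, size=k, replace=False)
    C[nbr, i] = rng.standard_normal(k)

# Placeholder hash functions h(x) = sgn(W^T x) and z(a) = sgn(R^T a).
W = rng.standard_normal((d, r))
R = rng.standard_normal((d, r))
B = np.where(W.T @ X >= 0, 1.0, -1.0)    # r x m binary codes of X
Z = np.where(R.T @ A >= 0, 1.0, -1.0)    # r x n binary codes of A

# Objective (4): || h(X) - z(A) C ||_F^2, i.e. how well each code b_i matches
# the sparse combination of its neighbors' codes.
loss = np.linalg.norm(B - Z @ C, "fro") ** 2
print(loss)
```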