Fast Algorithm for Anchor Graph Hashing Yasuhiro Fujiwara Sekitoshi Kanai Yasutoshi Ida NTT Communication Science NTT Software Innovation Center NTT Software Innovation Center Laboratories Keio University [email protected] [email protected] [email protected] Atsutoshi Kumagai Naonori Ueda NTT Software Innovation Center NTT Communication Science [email protected] Laboratories [email protected]

ABSTRACT for nearest neighbor search, such as R-tree [15], K-D tree [2], and SR- Anchor graph hashing is used in many applications such as cancer tree [26], which were proposed in the 1980s and 1990s. They need detection, web page classifcation, and drug discovery. It computes � (log�) query time where � is the number of data points. Although the hash codes from the eigenvectors of the matrix representing these approaches yield accurate results, they are not time-efcient the similarities between data points and anchor points; anchors for high-dimensional data due to the curse of dimensionality [18]. refer to the points representing the data distribution. In perform- Specifcally, when the dimensionality exceeds about ten, they are ing an approximate nearest neighbor search, the hash codes of a slower than the brute-force, linear-scan approach [51]. In these ap- query data point are determined by identifying its closest anchor proaches, most of the query cost is spent on verifying a data point points. Anchor graph hashing, however, incurs high computation as a real nearest neighbor [4]. Since the linear-scan approach’s cost since (1) the computation cost of obtaining the eigenvectors is cost increases linearly with dataset size, the query cost should be quadratic to the number of anchor points, and (2) the similarities of constant or sub-linear [47]. Approximate nearest neighbor search the query data point to all the anchor points must be computed. Our improves efciency by relaxing the precision of verifcation, and several approaches were proposed such as Clindex [28], MEDRANK proposal, Tridiagonal hashing, increases the efciency of anchor graph hashing because of its two advances: (1) we apply a graph [6], and SASH [20] in the database community in the 2000s. How- clustering algorithm to compute the eigenvectors from the tridi- ever, these approaches do not ofer sub-linear growth of query cost agonal matrix obtained from the similarities between data points in the worst case. and anchor points, and (2) we detect anchor points closest to the Hashing techniques transform data records into short fxed- query data point by using a dimensionality reduction approach. length codes to reduce storage costs and ofer approximate nearest Experiments show that our approach is several orders of magnitude neighbor search with constant or sub-linear time complexity. The faster than the previous approaches. Besides, it yields high search approximate nearest neighbor search consists of two phases: ofine accuracy than the original anchor graph hashing approach. and online. The ofine phase computes hash codes from the data. The online phase fnds the nearest neighbors to the query data point by hashing (compressing) the query. Locality Sensitive Hashing PVLDB Reference Format: (LSH) is one of the most popular hashing techniques [13]. It uses Yasuhiro Fujiwara, Sekitoshi Kanai, Yasutoshi Ida, Atsutoshi Kumagai, and Naonori Ueda. Fast Algorithm for Anchor Graph Hashing. PVLDB, random projection to map data points into hash codes, and it is 14(6): 916 - 928, 2021. more efcient than a tree-based scheme in searching for nearest doi:10.14778/3447689.3447696 neighbors as it needs fewer disk accesses [13]. In the 2010s, several database researchers proposed LSH variants such as C2LSH [12], 1 INTRODUCTION SRS [46], and QALSH [22]. However, the LSH-based approaches yield inadequate search accuracy since LSH employs random pro- Massive sets of high-dimensional data are now being stored day jections [30]. As a result, they require long hash codes (say, 1024 after day with the rapid development of database systems. This bits) to achieve satisfactory search accuracy [14]. This is because imposes a fundamental challenge to efciently and accurately pro- the collision probability of two codes having the same hash codes cess millions of records of diferent data types such as images, falls as code length increases [37]. However, long codes increase audio, and video [8, 10, 11, 23, 24]. Nearest neighbor search on high- search and storage costs [29, 35]. dimensional data is a fundamental research topic [9, 42]. One con- Anchor graph hashing was proposed to improve the search ef- ventional approach to address the problem uses tree-based schemes ciency and accuracy [36]; anchors refer to the points representing the data distribution in the high-dimensional space. Technically, This work is licensed under the Creative Commons BY-NC-ND 4.0 International it is based on spectral hashing that computes eigendecomposition License. Visit https://creativecommons.org/licenses/by-nc-nd/4.0/ to view a copy of for the similarity matrix among data points to obtain hash codes. this license. For any use beyond those covered by this license, obtain permission by emailing [email protected]. Copyright is held by the owner/author(s). Publication rights Although spectral hashing achieves higher search accuracy than licensed to the VLDB Endowment. LSH, it is difcult to apply to large-scale data since it requires Proceedings of the VLDB Endowment, Vol. 14, No. 6 ISSN 2150-8097. 2 2 doi:10.14778/3447689.3447696 � (� �) time and � (� ) space to obtain all pairwise similarities

916 Table 1: Defnitions of main symbols. Anchor (single-thread) Anchor (single-thread) 5 10 Anchor (40-threads) 0.55 Anchor (40-threads) Symbol Defnition Proposed (single-thread) Proposed (single-thread) Proposed (40-threads) Proposed (40-threads) � Number of data points 0.50 � Number of dimensions 4 10 MAP � Number of anchor points 0.45 Wall clock time [s] � Number of bits in hash codes � Number of closest anchor points 3 10 0.40 X Set of data points (1-1) Ofine phase time (1-2) Search accuracy U Set of anchor points Figure 1: Ofline phase time and MAP for wesad. When a sin- �� �-th eigenvalue to compute hash codes gle thread is used, our approach (“Proposed (single-thread)”) x� �-th data point vector of length � is much faster than the original approach of anchor graph u� �-th anchor point vector of length � hashing (“Anchor (single-thread)”), and our approach’s ef- v� �-th eigenvector to compute hash codes ciency improves with the use of multiple threads (“Proposed Y Embedding matrix of size � × � ( -threads)”). Moreover, our approach yields higher search Z Similarity matrix of size � × � 40 Approximate adjacency matrix of size accuracy than the original approach. We will describe the A � × � of size � � parameter settings detailed in Section 5. M × T of size � × � where � is the number of dimensions. To overcome this drawback, anchor graph hashing computes similarities between data points as the query data point. In this fgure, “Proposed (single-thread)” by exploiting the low-rank property of the approximate adjacency and “Anchor (single-thread)” indicate our approach and the original matrix computed from the similarity between data points and an- anchor graph hashing using a single thread, respectively. “Proposed chor points. As shown in previous studies [31, 34, 36], LSH ofers (40-thread)” corresponds to our approach using 40 threads, and lower search accuracy than other hashing techniques, and anchor “Anchor (40-thread)” is the original approach parallelized by the graph hashing outperforms recent hashing techniques such as bi- proposed approach. As shown in Figure 1-1, our approach is more nary reconstruction embeddings [27], discrete graph hashing [34], efcient than the original approach, and it can increase efciency by and binary autoencoder [3] in terms of efciency. Due to its efec- using multiple threads. As a result, our approach can compute hash tiveness, anchor graph hashing can be used in many applications codes within a half-hour. On the other hand, the original approach such as cancer detection [32], web page classifcation [54], and drug with a single thread takes over eight hours, and it takes nearly two discovery [53]. However, its computation time is high for large- hours even if it uses multiple threads. As shown in Figure 1-2, our scale data. In the ofine phase, to obtain hash codes, it computes approach yields higher search accuracy than the original approach eigenvectors of the matrix obtained from the similarities between since it guarantees no false negatives in the search results. These 2 data points and anchor points. However, it needs � (�� ) time to results indicate that our approach enhances the efciency of anchor obtain the eigenvectors where � is the number of anchor points; graph hashing while achieving high search accuracy. In summary, the computation cost is quadratic to the number of anchor points. the main contributions of this paper are as follows: The online phase fnds the closest anchor points to a query data We propose an efcient approach to anchor graph hashing point from all anchor points to perform the nearest neighbor search. • that computes eigenvectors using a graph clustering algo- Therefore, anchor graph hashing sufers high computation cost. rithm and prunes similarity computations in fnding the This paper proposes , a novel and efcient Tridiagonal hashing closest anchor points by using approximate distances. approach to anchor graph hashing. Its advantage is that it can im- The proposed approach yields higher search accuracy than prove the efciency and search accuracy of anchor graph hashing. • the original approach of anchor graph hashing. This is be- The ofine phase of our approach efciently computes the eigen- cause the proposed approach is theoretically designed to vectors by applying a graph clustering algorithm to the tridiagonal yield the search results of no false negatives. matrix obtained from a similarity matrix between data points and Experiments confrm that our approach achieves higher anchor points. Moreover, the online phase computes approximate • search accuracy while reducing the time to compute the distances using a dimensionality reduction approach to identify an- hash codes compared to the previous approaches; it is up to chor points closest to the query data point efciently. Note that our 3195.7 times faster than the existing approaches. approach is easily parallelized to improve efciency. Besides, we can improve the search accuracy because our approach recursively In the remainder of this paper, we give preliminaries in Section 2, reduces the number of bits in hash codes in fnding similar data introduce our approach in Section 3, describe related work in Sec- points. Our approach is theoretically guaranteed to fnd all data tion 4, show experimental results in Section 5, and provide our points found by the original anchor graph hashing; it is guaranteed conclusions in Section 6. to have no false negatives in the search results. Figure 1 shows the ofine phase time for the wesad dataset, 2 PRELIMINARIES a dataset of stress-afect lab study, including over sixty million We introduce the background to this paper. Table 1 lists the main data points. To assess search accuracy, the fgure also shows Mean symbols. Let � and � be the number of data points and dimensions, Average Precision (MAP) [1] for fnding data points of the same label respectively. Let row vector x� = [�� [1], . . . , �� [�]] correspond to

917 the �-th data point; a set of data points is given as X ,..., . � largest eigenvector-eigenvalue pairs , � � such that 1 > = {x1 x� } {(v� � )}�=1 If � (� < �) is the number of anchor points, U = {u1,..., u� } is �1 ≥ ... ≥ �� > 0 by ignoring the largest eigenvalue �0 = 1 [36]. � � � � a set of anchor points. Anchor points are randomly sampled from If V = [v1,..., v� ] ∈ R × , Σ = diag (�1, . . . , �� ) ∈ R × , and √ 1 1 the data points or centers of k-means clusters for the data points W = �Λ− 2 VΣ− 2 , it obtains embedding matrix Y as follows [36]: [33]. If � is the number of bits in hash codes and Y ∈ {1, −1}�×� is an embedding of � data points, anchor graph hashing computes the Y = sgn (ZW) (4) embedding by minimizing where sgn(·) is the sign function. 1 Í� 2 ⊤ In the online phase, for query data point x, it computes column minY 2 �,� 1 ∥y� − y� ∥ �[�][�] = tr (Y (D − A)Y) = (1) vector z(x) of length � as follows: s.t. 1, 1 �×� , ⊤ , ⊤ � Y ∈ { − } 1 Y = 0 Y Y = I 2 2 ⊤ [�1 exp(−� (x,u1)/�),...,�� exp(−� (x,u�)/�)] (5) In this equation, is the �-th row of representing the �-bits code z(x) = Í� � exp �2 , � y� Y �=1 � (− (x u� )/ )) for data point � , ∥ · ∥ is the �2-norm of a vector, is the � ×� adja- x A where �� ∈ {1, 0} and �� = 1 if and only if u� is one of the � closest cency matrix where �[�][�] corresponds to the similarity between anchor points of x in U; �� = 0 otherwise. Let y be a row vector data point x� and x� , D is a diagonal matrix given as D = diag (A1) of length � that corresponds to �-bits codes for data point , it is ⊤ � x with 1 = [1,..., 1] ∈ R , and I is the identity matrix. The con- computed as follows [36]: straint 1⊤Y = 0 is imposed to maximize each bit’s information, and Y⊤Y = �I forces � bits to be mutually uncorrelated to minimize y = sgn (z(x)W) (6) redundancy among bits. Intuitively, anchor graph hashing embeds Even though anchor graph hashing can efectively compute hash the data points in a low-dimensional Hamming space such that the codes, it incurs high computation cost. The ofine phase needs neighbors in the original space remain neighbors. � (���+��2+���) time to compute hash codes for the given anchor Equation (1) is an integer problem, and solving the problem is points described in [36]; it is quadratic to the number of anchor equivalent to balanced graph partitioning even for a single bit; this 2 points. This is because it needs � (���) time to compute Z, � (�� ) is known to be NP-hard [52]. One approach to relaxing it uses the time to compute , � �3 time to compute its eigenvectors, � ��� � � M ( ) ( ) spectral method to drop the integer constraint of Y ∈ {1, −1} × time to compute , and � �� time to compute . As a result, � � ZW ( ) Y and allow Y ∈ R × . The relaxed problem’s solution consists of in terms of computation cost, the ofine phase has a bottleneck in the � eigenvectors of with minimal eigenvalues except for the A computing eigenvectors from M. In the online phase, anchor graph eigenvalue of zero. Since these eigenvectors are orthogonal to each hashing takes � (�� +��) to compute hash codes for data point x. other, they satisfy the orthogonal constraint after multiplication √ This is because it takes � (��) time to identify the closest anchor by the scale �, that is, ⊤ � holds. Besides, they satisfy the Y Y = I points for obtaining z(x) and � (��) time to compute y. As described constraint of 1⊤Y = 0. This is because the excluded eigenvector in [36], computation cost in the online phase is dominated by the with the eigenvalue of zero is and all the other eigenvectors 1 construction of z(x). Consequently, the computation cost of anchor are orthogonal to it. To obtain the binary codes, the binarization graph hashing is high when handling large-scale data. operation with threshold zero is performed over the solution Y, which forms a Hamming embedding. 3 PROPOSED METHOD Anchor graph hashing approximately computes adjacency ma- This section explains the proposed approach. Section 3.1 shows trix by using anchor points. If is an � � similarity matrix A Z × how our approach computes eigenvectors of the tridiagonal matrix between data points and anchor points, it computes matrix as Z obtained from similarities between data points and anchor points 2  exp(−� (x�,u� )/�) , � � in the ofine phase. Section 3.2 details how our approach fnds the  Í exp �2 , � ∀ ∈ ⟨ ⟩ � [�][�] = �∈⟨�⟩ (− (x� u� )/ ) (2) closest anchor points using a dimensionality reduction approach. 0 otherwise  , Section 3.3 details our algorithms and their properties. Section 3.4  In this equation, ⟨�⟩ are the indices of the � (� ≪ �) closest anchor proposes implementation-oriented approaches to improve our ap- proach’s efciency and search accuracy. We show proofs of lemmas points of x� in U according to �2 distance function �(·); � (� > 0) is and theorems in this section in the Appendix. the bandwidth parameter. Using Z, approximate adjacency matrix can be computed as = −1 ⊤ where = diag ( ⊤ ) [33]. A A ZΛ Z Λ Z 1 3.1 Ofline Phase Computation Each row of Z contains � nonzero elements, which sum to 1 from Equation (2). By following the original paper [36], we assume that In this section, we show our approach to computing � eigenvector- all data points are connected in the graph given by A. eigenvalue pairs to obtain hash codes in the ofine phase. As men- The ofine phase computes the largest eigenvalues �� (� ≥ 0) of tioned in Section 2, anchor graph hashing needs high computation the following � × � symmetric matrix M to obtain hash codes: cost to obtain eigenvectors of the largest eigenvalues. Although the 1 1 power method is a well-known approach to computing eigenval- − ⊤ − M = Λ 2 Z ZΛ 2 (3) ues, we do not use the approach since it is computationally slow; This is because the minimal eigenvalues of A correspond to the it needs � (�3) time to solve the eigenproblem [41]. Instead, we maximal eigenvalues of M. If all the data points are connected, we exploit the bisection method since it can efciently compute just a have �0 = 1; the largest eigenvalue of M is 1. However, �0 = 1 subset of approximate eigenvalues; other approaches such as the QR yields the eigenvector of 1, and such the eigenvector is not useful in algorithm and the divide-and-conquer algorithm compute all the computing hash codes. Therefore, anchor graph hashing computes eigenvalues [7]. We compute the tridiagonal matrix directly from

918 ⊤ ⊤ Z used in the bisection method; we do not compute M to obtain Since P is an orthonormal matrix, we have p� p� = 1 and p� p� = 0 eigenvalues. From approximate eigenvalues, we accurately com- if � ≠ �. Therefore, pute eigenvalues and eigenvectors by using inverse iteration [44]. � 2 ⊤ 2 � ⊤ ⊤ � ⊤ �2 �2 ( � ) =p� M p� − �−1p� M p�−1 − �−1p�−1Mp� − � + � 1 Although our approach does not compute M, we can efectively − ⊤ −1 ⊤ � ⊤ � ⊤ �2 �2 perform approximate nearest neighbor searches. This is because =b� ZΛ Z b� − �−1b� b�−1 − �−1b�−1b� − � + �−1 (14) the tridiagonal matrix has the same eigenvalues as M [7]. − 1 ⊤ 2 2 2 =∥Λ 2 Z b� ∥ − 2�� 1⟨b�, b� 1⟩ − � + � In Section 3.1.1, we describe the approach to compute the tridiag- − − � �−1 onal matrix. Section 3.1.2 shows our approach to compute approxi- where ⟨·⟩ represents the inner product of two vectors. Therefore, mate eigenvalues by the bisection method for the tridiagonal matrix. �� can be recast as follows: Section 3.1.3 describes the approach to compute eigenvectors from q 1 � − 2 ⊤ 2 2� , �2 �2 (15) the obtained approximate eigenvalues by inverse iteration. � = ∥Λ Z b� ∥ − �−1⟨b� b�−1⟩ − � + �−1 p�+1 is also rewritten as follows: This section describes the 3.1.1 Tridiagonal Matrix Computation. − 1 ⊤ approach to compute the tridiagonal matrix, which has the same p�+1 = (Λ 2 Z b� − ��−1p�−1 − �� p� )/�� (16) eigenvalues as M. In our approach, we use the Lanczos method Equations (12), (15), and (16) indicate that we can compute �� , �� , [7] since it is an efective way of computing a tridiagonal matrix and p� from b� . We have the following lemma for computation cost: from a symmetric matrix. If P is an � × � orthonormal matrix, the Lemma 1 (Tridiagonal Matrix Computation). It takes� (���) Lanczos method decomposes M as follows: time to compute T and P from Equation (12), (15), and (16). ⊤ M = PTP (7) Note that the original approach of the Lanczos method needs � (��2) time to compute M [36] and � (�3) time to compute T [41]. where T is an � × � symmetric tridiagonal matrix given by Therefore, this lemma indicates that we can efciently compute T � � 0 that has the same eigenvalues as M from Z without computing M.  1 1   � � �   1 2 2  3.1.2 Approximate Eigenvalues Computation. This section shows  . .   . .  how to compute approximate eigenvalues by using the bisection T =  �2 . .  (8)   method. The bisection method can fnd the �-th approximate eigen-  . .   . �� 1 �� 1  value �˜ corresponding to � in the interval [� , � ]; � and � are,  − −  � � � � � �  0 �� 1 ��  respectively, left and right bounds of �˜ such that � ≤ �˜ ≤ �  −  � � � � [7]. The bisection method computes an approximate eigenvalue by If is the �-column vector of , we have p� P using �(�); �(�) is the number of agreements in sign between con- ⊤ secutive numbers of the sequence (�0 (�), �1 (�), . . . , �� (�)) where �� = p� Mp� (9) �� (�) is given as follows [7]: �� = ∥Mp� − ��−1p�−1 − �� p� ∥ (10) 1 if � 0 � � � (11)  = p�+1 = (Mp� − �−1p�−1 − � p� )/ �  �� (�) = �1 − � if � = 1 (17) where 0 = and 1 is a random vector such that ∥ 1 ∥ = 1. We  � � � � �2 � � otherwise p 0 p p ( � − ) �−1 ( ) − �−1 �−2( ) have � 0 from Equation (10). Since is an orthonormal matrix,  � ≥ P T Here, agreement corresponds to � � � � > 0 and, for example, and have the same eigenvalue � [7]. Besides, the eigenvector of � ( ) �−1( ) M � if signs of the sequence are , , , , we have � � 1; moreover, is obtained as ′ from Equation (7) if ′ is the eigenvector (+ − − +) ( ) = M v� = Pv� v� � � 2 holds if , , , 1. It requires � � time to compute se- of corresponding to � . Although we can obtain eigenvector- ( ) = (+ − − −) ( ) T � quence � � , � � , . . . , � � from Equation (17). The bisection eigenvalue pair , � of from , it takes � �3 time to obtain ( 0 ( ) 1 ( ) � ( )) (v� � ) M T ( ) method fnds eigenvalues in an interval based on the property that T, whose size is � × �, from Equation (9), (10), and (11). 1 �(�) is equal to the number of eigenvalues not smaller than � [7]. We introduce column vector − 2 of length � to ef- b� = ZΛ p� As described in Section 2, we need to compute the � + 1 largest ciently compute . It needs � �� time to compute since the T ( ) b� eigenvalues of M to obtain hash codes. The largest eigenvalue of M number of nonzero elements in is �� and can be obtained Z b� is 1 (i.e., �0 = 1) and the smallest eigenvalue is larger than 0 [33, 50]. through matrix-vector multiplications. Note that is a diagonal Λ Therefore, to obtain eigenvalue �1, a naive approach based on the matrix of size � × �. From Equation (3), we can rewrite �� as bisection method is to set �1 = 1 and �1 = 0, and then iteratively update � and � following � � 2 until it has � � 2 ⊤ − 1 ⊤ − 1 ⊤ 2 1 1 ( 1 + 1)/ ( 1) = �� = 2 2 � = � = ∥ � ∥ (12) p� Λ Z ZΛ p b� b b and �(�1) = 1. After convergence, we accurately compute �1 by exploiting inverse iteration, where we set �˜1 = (�1 + � )/2 as As for �� , we have the following equation: 1 described in the next section. Similarly, we can compute �� (2 ≤ 2 ⊤ � �) by initializing � � and � 0. However, this naive (�� ) =(Mp� − ��−1p�−1 − �� p� ) (Mp� − ��−1p�−1 − �� p� ) ≤ � = �−1 � = approach needs a large number of iterations to update � and � . ⊤ 2 ⊤ ⊤ ⊤ ⊤ � � =p M p� − ��−1p M p�−1 − �� p M p� � � � (13) This is because all eigenvalues used in hash codes are larger than 0, � ⊤ �2 ⊤ � � ⊤ and thus � 0 is not so tight as the left bound for � , whereas we − �−1p�−1Mp� + �−1p�−1p�−1 + � �−1p�−1p� � = � ⊤ ⊤ 2 ⊤ 1 − �� p� Mp� + �� ��−1p� p�−1 + �� p� p� If �� (�) = 0, assuming that �� (�) has the same sign as ��−1 (�).

919 can efectively obtain the right bound by setting � � since � = �−1 � > � and � � in practice. �−1 � �−1 ≈ � To initialize �� , we use a graph clustering algorithm in computing eigenvalues in the order of �1, �2, . . . , �� . Since �� is the (� + 1)-th ′ largest eigenvalue of T, if V� is an � × (� + 1) matrix of eigenvectors, ′ Figure 2: Chain graph of Figure 3: Clusters in the V� gives the optimal solution for the following problem [19]; tridiagonal matrix . chain graph. ′ ⊤ ′ ′ ⊤ T max ′ tr (( ) ) s.t. ( ) � = (18) V� V� TV� V� V I In this equation, we have tr ′ ⊤ Í� � where � 1. edge that corresponds to � , we compute the cut by identifying ((V� ) TV� ) = �=0 � 0 = � ⊤ the edge that maximizes Δ� (� ) in the chain graph of � clusters. To obtain �� , we compute � × (� + 1) matrix H� such that H� H� = I � � by using a graph clustering algorithm for the set of data points X. As shown in Figure 3, if X� is the cluster that includes the edge corresponding to � , � is edge weight between cluster X and Specifcally, we compute �� as follows: � � � X�−1, and ��′ is edge weight between cluster X� and X�+1, Δ�� (�� ) Definition 1 (Left Bound). If X is partitioned into �+1 clusters is computed as follows from Equation (21): X1,..., X�+1 by a graph clustering algorithm, the �-th column vector �� +�� �� +��′ �� +��′ Δ�� (�� ) = + − (23) h� of matrix H� is given as follows: lef(�j) right(�j) |X� | ( p 1/ |X� | if x� ∈ X� In this equation, lef(�j) and right(�j) are the number of data points ℎ� [�] = (19) in the left and right sides of the edge corresponding to � in cluster 0 otherwise � X� , respectively. In the case of Figure 3, lef(�j) = 1 and right(�j) = X X where | � | is the number of data points included in cluster � and the 3 since |X� | = 4. Note that we can incrementally update lef(�j) ⊤ columns in H� are orthonormal to each other; H� H� = I. We compute and right(�j) in � (|X� |) time from the clustering results. It also � left bound � as follows: takes � (� − �) time to fnd the cut that maximizes Δ�� (�� ) from � tr ⊤ Í�−1 � (20) Equation (23). Since �� ≥ 0 from Equation (10), Δ�� (�� ) ≥ 0 from � = (H� TH� ) − �=0 � Equation (23). We can incrementally compute left bound �� based We have the following property for left bound �� : on the following lemma: Lemma 2 (Left Bound). � � For tridiagonal matrix T, � ≤ � holds. Lemma 3 (Incremental computation). Left bound �� can be computed in � (1) time as follows: This lemma indicates that we can compute left bound �� by using a graph clustering algorithm to compute �˜ . � � Δ� � � (24) � � = �−1 + � ( � ) − �−1 While several graph clustering approaches have been proposed, This lemma indicates that we can efciently compute � from such as METIS [25] and the Girvan-Newman algorithm [38], we � Equation (24) instead of Equation (20). Note that � = 1 since adopt the ratio cut algorithm [16] since its object function equals 0 �0 = 1. The next section shows our algorithm based on the bisection tr ⊤ [50]. Specifcally, if � is the objective function of the (H� TH� ) � method with inverse iteration. ratio cut algorithm used to obtain � + 1 clusters, we have �� = ⊤ We use inverse iteration to accu- tr (H� TH� ). As a result, we can compute �� by using �� instead 3.1.3 Eigenvector Computation. ⊤ rately compute the eigenvector-eigenvalue pair ( ′, �� ) of from of tr (H� TH� ) in Equation (20). Intuitively, the ratio cut algorithm v� T partitions a graph by cutting edges in the graph so that edges approximate eigenvalue �˜� obtained by the bisection method. In- between diferent clusters have low weights, and edges within verse iteration is a variant of the power method based on the prop- 1 clusters have high weights. The objective function of the ratio cut erty that inverse matrix (T − �˜� I)− has eigenvector-eigenvalue 1 algorithm to obtain � 1 clusters is given as follows: pair ( ′, ) [44]. Since �˜� is an approximation of �� such that + v� �� −�˜� 1 1 �+1 cut(Xj,X\Xj) �� ≤ �˜� ≤ �� , � �˜ is larger than � �˜ for any eigenvalue �� ≠ �� . �� = Í (21) � − � � − � �=1 |X� | 1 1 Therefore, is the largest eigenvalue of ( − �˜� )− . In general, �� −�˜� T I In this equation, cut(Xj, X\Xj) corresponds to the sum of edge ′ inverse iteration computes eigenvector-eigenvalue pair (v�, �� ) by weights of data points between X� and X\X� given as 1 applying the power method to (T − �˜� I)− [7]. cut X , X X 1 Í T k k′ (22) Although matrix �˜ is sparse, it requires � �3 time to ( j \ j) = 2 xk ∈Xj,xk′ ∈X\Xj [ ][ ] T − � I ( ) compute its inverse since ˜ −1 is a dense matrix [19]. We The original ratio cut algorithm fnds the cut that minimizes (T − �� I) compute the LU decomposition of ˜ to obtain eigenvector- the objective function since clusters should be large groups of data T − �� I eigenvalue pairs since it has desirable properties, as revealed in points [16]. However, we modify the algorithm to fnd the cut that this section. LU decomposition factorizes a matrix into the product maximizes the objective function. This is because we can efectively of a lower and an upper triangular matrix [41]. obtain tight bound � by using a large value of � from Equation (20). � � Specifcally, if ′ ˜ , we compute ′ where and We efciently compute clusters based on the observation that T = T − �� I T = LU L U are lower and upper triangular matrices, respectively. We have the tridiagonal matrix T corresponds to the chain graph of Figure 2, following property for LU decomposition of T′: where �� is the edge weight between data points. In obtaining � + 1 clusters, we fnd a cut from the chain graph of � clusters; we cut Lemma 4 (Zero Elements). In the �-th row of L where � ≥ 3, we edges in the chain graph one by one to obtain clusters. In particular, have �[�][�] = 0 if 1 ≤ � ≤ � − 2 holds. Besides, in the �-th column if Δ�� (�� ) is an increase in the objective function by cutting the of U where � ≥ 3, we have � [�][�] = 0 if 1 ≤ � ≤ � − 2 holds.

920 � � Algorithm 1 Eigenvector Computation � + � [��, �� ] (lines 11-17). It then computes �˜� = 2 (line 18). The Input: tridiagonal matrix T, orthonormal matrix P inverse iteration part initializes random vector w whose �2-norm is Output: eigenvector-eigenvalue pairs , � � 1 (line 19-20), and computes the LU decomposition of �˜ (line {(v� � )}�=1 T − � I 1: �0 = 1; 21). It iteratively uses forward and back substitutions to compute 2: E � � �−1; ′ = { ( � )}�=1 eigenvector v (lines 22-27); forward and back substitutions itera- 3: for � = 1 to � do tively solve a system of linear algebraic equations by using lower 4: for each � (�� ) ∈ E do and upper triangular matrices, respectively [41]. It then computes 5: compute Δ�� (�� ) from Equation (23); the eigenvector-eigenvalue pair (v�, �� ) of M from eigenvector v′ 6: � (�� ) = argmaxE {Δ�� (�� )}; (lines 28-29). We have the following property for Algorithm 1; 7: cut � (�� ) in the chain graph; 8: subtract � (�� ) from E; Lemma 6 (Eigenvector Computation). If �� and �� are the 9: � � ; � = �−1 numbers of iterations in the bisection method and inverse iteration, 10: � � Δ� � � ; � = �−1 + � ( � ) − �−1 respectively, Algorithm 1 requires � (�(�� +�� +�)) time to compute 11: repeat ( �, �� ) � +� the eigenvector-eigenvalue pair v of matrix M from tridiagonal 12: � � � ; = 2 matrix T and its orthonormal matrix P. 13: if �(�) ≤ � + 1 then Unlike the power method that takes � �3 time, we can ef- 14: �� = �; ( ) 15: else fciently compute the eigenvectors of M as shown in Lemma 6. 16: �� = �; Therefore, we can efectively reduce the computation cost of the 17: until �(�) = � + 1 ofine phase of anchor graph hashing. �� +�� 18: �˜� = 2 ; 19: pick a random column vector w of length �; 3.2 Online Phase Computation 20: ; w = w/∥w∥ This section describes our approach for the online phase. As de- 21: compute and from �˜ ; L U T − � I scribed in Section 2, we need to compute vector for query data 22: repeat z(x) point in the online phase. The �-th element of vector has a 23: solve Lw′ = w for w′ by forward substitution; x z(x) 24: solve Uv′ = w′ for v′ by back substitution; nonzero element if the �-th anchor point is one of the closest anchor 25: v′ = v′/∥v′ ∥; points of x; it has a zero element otherwise. However, it needs a 26: w = v′; high computation cost of � (��) time to fnd the closest anchor 27: until v′ reaches convergence points due to the data points’ high dimensionality. ′ 28: v� = Pv ; In fnding the closest anchor points, we use SVD (Singular Value ′ ⊤ ′ 29: �� = (v ) Tv ; Decomposition) to approximate distances since it gives the smallest error in approximating high-dimensional data [45]. For query data point � 1 , . . . , � � with � dimensions, we can use SVD Since is a lower triangular matrix, � � � 0 if � � 1 for its x = [ [ ] [ ]] L [ ][ ] = ≥ + of rank � ′ to obtain its approximation ˜ �˜ 1 ,..., �˜ � ′ of � ′ �-th row. Therefore, Lemma 4 indicates that � � � can be nonzero x = [ [ ] [ ]] [ ][ ] dimensions (� ′ �). We apply SVD to the matrix of � anchor only if � 1 � � in the �-th row; has at most two nonzero ≤ − ≤ ≤ L points ,..., ⊤, which takes � �� log� ′ � � � ′ 2 time elements in each row. As a result, the number of nonzero elements [u1 u�] ( +( + )( ) ) [17]. If Q is a � × � ′ orthonormal matrix obtained by SVD, x˜ is in L of size�×� is� (�). Similarly, the number of nonzero elements computed as x˜ = xQ. Similarly, if u˜� is the approximate vector in U is � (�). From Lemma 4, we have the following property; of length � ′ for anchor point u� , u˜� is computed as u˜� = u� Q. We Lemma 5 (LU Decomposition). It takes � (�) time to compute compute the approximate distance as follows: L and U from T′. Definition 2 (Approximate Distance). If we have ∥x′∥ = In general, it takes � (�3) time to compute LU decomposition p 2 2 ′ p 2 2 ∥x∥ − ∥x˜ ∥ and ∥u� ∥ = ∥u� ∥ − ∥u˜� ∥ , approximate distance [41]. However, this lemma shows that we can efciently compute ˜ �(x, u� ) between query data point x and anchor point u� is given by the LU decomposition of T′. Algorithm 1 shows how to compute eigenvector-eigenvalue pair ˜ q 2 2 ′ , � of based on the bisection method and inverse iteration. �( , � )= ∥ ∥ +∥ � ∥ −2⟨˜, ˜� ⟩−2∥ ′∥∥ ∥cos(� ′ ′ −� ′, ′ ) (25) (v� � ) M x u x u x u x u� u�,a x a Algorithm 1 consists of three parts; graph clustering, bisection � ′ ′ � ′, ′ method, and inverse iteration. The graph clustering part (lines 4-8), In this equation, u�,a and x a are given as follows: partitions the chain graph into � + 1 clusters. The bisection method −1 ⟨u�,a⟩−⟨u˜ �,a˜ ⟩ −1 ⟨x,a⟩−⟨x˜,a˜ ⟩ � ′, ′ = cos ′ ′ , �x′,a′ = cos ′ ′ (26) part computes approximate eigenvalues (lines 9-18). The inverse u� a ∥u� ∥ ∥a ∥ ∥x ∥ ∥a ∥ iteration part computes eigenvector-eigenvalue pairs (lines 19-29). where a ∈ U is a reference anchor point, a˜ is an approximate vector Algorithm 1 frst initializes � 1 and E � � �−1 where ′ p 2 2 0 = = { ( � )}�=1 of a, and ∥a ∥ = ∥a∥ − ∥a˜ ∥ �(�� ) is an edge in the chain graph corresponding to �� (lines 1-2). The graph clustering part computes Δ�� (�� ) for each edge (lines Note that we subject anchor points to SVD in the ofine process. Similarly, in the ofine process, we compute � ′ ′ by setting = 4-5) and cuts an edge in the chain graph based on Δ�� (�� ) (lines u�,a a 2 6-7). The bisection method part initializes �� and �� (lines 9-10), u1,..., u� for anchor points in � (� �) time and compute ∥u� ∥ for and updates them until only a single eigenvalue lies in the interval each anchor point in � (��) time. Therefore, it takes � (� ′) time to

921 Algorithm 2 Closest Anchor Points Algorithm 3 Ofine Phase of Tridiagonal Hashing Input: query data point x, set of anchor points U, orthonormal matrix Q Input: set of data points X, set of anchor points U, number of bits in hash Output: set of the � closest anchor points C codes �, target rank �′ 1: C = ∅; Output: hash code matrix Y ′ ⊤ 2: add � dummy anchor points to C; 1: compute rank-� SVD of matrix [u1,..., u� ] ; 3: a = u0; 2: for � = 1 to � do 4: x˜ = xQ; 3: compute ∥u� ∥; ′ 5: compute ∥x∥, ∥x ∥, and �x′,a′ ; 4: for � = � to � do 6: for � 1 to � do 5: compute � ; = u′,a′ ˜ � 7: compute � (x, u� ) for u� ; ˜ 6: for � = 1 to � do 8: if � (x, u� ) ≤ maxC {� (x, u� )} then 7: compute C� of x� by Algorithm 2; 9: compute � (x, u� ) for u� ; 8: compute Z by Equation (2); 10: if � (x, u� ) ≤ maxC {� (x, u� )} then 9: Λ = diag (Z⊤1); 11: add u� to C; 10: compute T and P by Equation (12), (15), and (16); 12: subtract u� = argmaxC {� (x, u� )} from C; 11: compute , � � by Algorithm 1; {(v� � )}�=1 13: if � ′, ′ ≤ � ′, ′ then u� x a x 12: V = [v1,..., v� ]; 14: a = u� ; 13: Σ = diag (�1, . . . , �� ); 15: compute � ′, ′ ; √ 1 1 x a 14: W = �Λ− 2 VΣ− 2 ; 15: Y = sgn (ZW); ˜ compute approximated distance �(x, u� ) in the online phase. We introduce the following lemma for the approximate distance: Algorithm 4 Online Phase of Tridiagonal Hashing Input: query data point , set of anchor points U, orthonormal matrix , Lemma 7 (Approximate Distance). We have the following prop- x Q hash code matrix , matrix erty for the approximate distance: Y W Output: hash code vector y ˜ �(x, u� ) ≤ �(x, u� ) (27) 1: compute C of x by Algorithm 2; 2: compute z(x) by Equation (5); This lemma indicates that �˜ , has the lower bounding prop- (x u� ) 3: y = sgn (z(x)W); erty for �(x, u� ), which enables us to exactly compute the closest anchor points for query data point x [45]. Algorithm 2 shows the approach to fnding the closest anchor anchor points (line 2-5). If C� is the set of the closest anchor points points. It frst initializes the set of the closest anchor points by to x� , it computes C� of each data points by using Algorithm 2 adding � dummy anchor points whose distances to x are ∞ (line 1- and computes Z and Λ from the closest anchor points (lines 6- 2). It sets the reference anchor point as = 0 such that cos(� ′ ′ − a u u�,u0 9). It computes T and P and obtains eigenvector-eigenvalue pairs � ′ ′ ) = 1 for each anchor point (line 3), and computes approximate by Algorithm 1 (line 10-11). It then computes from and x ,u0 W V Σ ′ vector x˜ (line 4). It then computes ∥x∥, ∥x ∥, and �x′,a′ (line 5). (lines 12-14), and computes Y from W (line 15). In the online phase, It picks up anchor points one by one to compute approximate Algorithm 4 computes the closest anchor points to the query data distances (lines 6-7). If an anchor point can be the closest anchor point using Algorithm 2 (line 1). It then computes z(x) and obtains points, it accurately computes the distance for the anchor point y (lines 2-3). The proposed approach has the following properties: (lines 8-9), and it updates the closest anchor points if necessary Theorem 1 (Computation Cost). (lines 10-12). Since the approximation quality of �˜ , improves Our ofine phase requires (x u� ) � �� ′ � � ′ � ��� �� �� �� �� as the angle between vector ′ and ′ decreases from Equation (25), ( ( + ) + ( + + + � + � )) time. Besides, our online x a � � ′ � � � �� � it changes the reference anchor point based on the angle to the phase needs ( ( + )+ ( + )) time. query data point (lines 13-15). For Algorithm 2, we have Theorem 2 (Hashing Result). The proposed approach yields the same results as the original anchor graph hashing algorithm if the Lemma 8 (Closest Anchor Points). If � is the ratio of the an- fxed-number of bits is used in hash codes to perform an approximate chor points used to compute the exact distances, Algorithm 2 requires nearest neighbor search. � ((�+�)� ′+���) time to identify the closest anchor point to x. Theorem 1 and 2 theoretically indicate that the proposed ap- Lemma 8 indicates that we can reduce the online phase’s com- proach can improve the efciency of anchor graph hashing while putation cost by pruning distance computations when identifying guaranteeing search results’ equivalence. the closest anchor points. Note that we can exploit Algorithm 2 in computing Z in the ofine phase, as shown in the next section. 3.4 Extension 3.3 Hashing Algorithm This section proposes implementation-oriented approaches to im- prove our approach’s efciency and search accuracy. Algorithms 3 and 4 give full descriptions of our ofine and online phases. In the ofine phase, Algorithm 3 computes SVD of matrix 3.4.1 Parallelization. Algorithms 3 and 4 implicitly assume a single [u1,..., u�]⊤ used in determining the approximate distances (line thread. However, they are easily parallelized. Let � be the number 1). It then computes the norms of anchor points and angles between of threads. To fnd the closest anchor points, we randomly divide

922 the anchor points into � sets and identify each set’s closest anchor hashing problem by using an orthonormal matrix and then updates points in parallel. We can efciently obtain the closest anchor points the orthonormal matrix from hash codes by using SVD. This ap- from the identifed anchor points. Moreover, we use block parti- proach iteratively performs these procedures to determine hash tioning to perform efciently. Specifcally, in codes of query data points from the eigenvectors of the adjacency computing X = BC, we perform block partitioning on the input matrix. Discrete spectral hashing computes the spectral solution matrix B and C as follows: with the rotation technique and generates hash codes under the required binary constraint of hash codes [21]. It iteratively com-  B11 B12   C11 C12  B = , C = (28) putes hash codes by using generalized [39], where B21 B22 C21 C22 it recursively transforms a matrix obtained from the anchor graph Then, we have by using an orthonormal matrix obtained by SVD.  B11C11 + B12C21 B11C12 + B12C22  As described in Section 3, the divide-and-conquer algorithm com- X = (29) B21C11 + B22C21 B21C12 + B22C22 putes all eigenvalues of a tridiagonal matrix. It recursively divides the tridiagonal matrix into submatrices, and performs eigendecom- Note that the smaller block matrices B11, B12, ..., C11, C12, ... can be further partitioned recursively. Since the recursive matrix multi- position by multiplying the orthonormal matrices obtained from plications and summations can be performed in parallel, efciency the submatrices. The major computation cost of the approach is 3 is signifcantly improved by using multiple threads. the matrix multiplication as it needs � (� ) time [7]. Coakley et al. proposed to use the fast multiple method (FMM) of one dimension 3.4.2 Hash Codes. As shown in Theorem 2, our approach yields to approximately compute the matrix multiplication in � (� log�) the same search results as the original approach if we fx the number time [5]. However, it is difcult to use their approach rather than our of bits in hash codes. To improve search accuracy, we recursively approach. Our approach can accurately compute eigenvalue �� since reduce the number of bits in the ofine phase. In hashing, the simi- 1 1 is the largest eigenvalue of inverse matrix (T − �˜� I)− based larity of data points of the same hash codes becomes more similar �� −�˜� on the property that �� ≤ �˜� ≤ �� , as described in Section 3.1.3. as code length increases. However, since hash space size exponen- However, approximated eigenvalues obtained by the FMM-based tially increases as the number of bits increases, we can fail to fnd approach do not have this property. Therefore, the FMM-based similar data points if we use long codes. To alleviate this problem, approach cannot be used to compute eigenvalues accurately. we adaptively use several diferent bit numbers in fnding similar The Nyström approach is a technique widely used to approx- data points. Specifcally, we frst perform an approximate nearest imate eigendecomposition. Locally linear landmarks is a recent neighbor search using hash codes of � bits. If we do not fnd any Nyström approach [48]. It projects all points by using a locally data points, we reduce hash code length by � bits, which corre- linear function of nearest sampled points where the locally linear sponds to the smallest eigenvalues used in anchor graph hashing. function is obtained from M. Variational Nyström is the state-of-the- We iteratively perform this procedure until we fnd a data point, or art Nyström approach that uses samples of points similar to locally the number of bits is smaller than �. We have the following property linear landmarks [49]. It computes the eigendecomposition by using for the search result of the recursive approach: the columns of the matrix corresponding to sampled points. Theorem 3 (No False Negative). Let S[x] and S′[x] be sets of similar data points for query data point x output by the original 5 EXPERIMENTAL EVALUATION and recursive approaches, respectively. We have S′[x] ⊇ S[x]. We performed experiments to show the efectiveness of our ap- proach. In this section, “Proposed”, “Anchor”, “VN” “LGHSR”, and This theorem indicates that, if a data point is found by the original “DSH” represent our approach, the original approach of anchor approach, the data point must be found by the recursive approach; graph hashing [36], variational Nyström-based approach [49], large this guarantees no false negatives in the search results. Note that, graph hashing with spectral rotation [31], and discrete spectral only if the original approach cannot fnd any data point do we have hashing [21], respectively. VN is the state-of-the-art Nyström ap- S′[ ] ⊇ S[ ]; otherwise S′[ ] S[ ]. x x x = x proach, and we used it to compute hash codes of anchor graph 4 RELATED WORK hashing since it can approximately compute eigendecomposition. LGHSR and DSH are state-of-the-art anchor graph hashing tech- Hashing techniques can be classifed into data-independent or data- niques as described in the previous section. dependent. LSH is the representative data-independent method We used the connect-4, epsilon, mnist, kitsune, and wesad datasets2. as it computes hash codes by random projections [13]. However, connect-4 is a dataset obtained from Connect Four games, a two- LSH needs long hash codes to achieve high search accuracy [14]. player game in which the players take turns dropping colored disks Recently, data-dependent methods have begun to attract more at- onto the 42 grid spaces formed by seven-columns and six-rows. tention. Spectral hashing is one of the most popular data-dependent The objective of the game is to form a line of four same color disks. methods [52]. It constructs the similarity matrix of data points and Each feature corresponds to each grid space occupied by a player solves the hashing problem via spectral decomposition, which is or empty; the number of dimensions is 126. The number of data inefcient for large-scale data. The anchor graph can efciently points is 67557, and classes correspond to win, loss, or draw for the compute the similarity matrix [33], and it is used in recent hashing frst player; the number of classes is three. epsilon is a dataset used techniques. For example, large graph hashing with spectral rotation in the Pascal large scale learning challenge in 2008. The goal of this is a recent anchor graph-based hashing technique [31]. To compute hash codes, it transforms the solution to spectral relaxation of the 2 https://www.csie.ntu.edu.tw/˜cjlin/libsvm/, https://archive.ics.uci.edu/ml/index.php

923 8 3 108 10 10 Proposed 2 Proposed 7 Proposed (40-threads) Proposed (40-threads) 107 10 10 Anchor Anchor Proposed (single-thread) 2 Proposed (single-thread) 6 6 10 10 VN VN 10 Anchor (40-threads) Anchor (40-threads) LGHSR LGHSR Anchor (single-thread) Anchor (single-thread) 5 105 10 DSH 1 DSH 1 10 4 10 104 10 3 103 10 100 2 2 10 0 10 10 1 1 Wall clock time [s] Wall clock time [s] Wall clock time [s] 10 Wall clock time [s] 10 10-1 0 100 10 -1 -2 10-1 10-1 10 10 connect-4 epsilon mnist kitsune wesad connect-4 epsilon mnist kitsune wesad connect-4 epsilon mnist kitsune wesad connect-4 epsilon mnist kitsune wesad (1) � = 100 (1) � = 100 107 103 Figure 6: Ofline phase time Figure 7: Online phase time Proposed Proposed 106 Anchor Anchor with parallelization. with parallelization. VN 2 VN 105 LGHSR 10 LGHSR DSH DSH 104 101 study [36]. For the proposed approach, we set the target rank of 103 SVD to � ′ 30. We set the number of iterations used in inverse 2 = 10 0 Wall clock time [s] Wall clock time [s] 10 iteration to three by following a previous study [7]. We used the 1 10 power method to compute eigenvalues except for the proposed 100 10-1 connect-4 epsilon mnist kitsune wesad connect-4 epsilon mnist kitsune wesad approach by setting tolerance to 0.001. For LGHSR and DSH, we set (2) � = 300 (2) � = 300 the number of SVD computations to fve. We also set the number 107 Proposed 103 Proposed of generalized power iterations to fve for DSH. The number of 106 Anchor Anchor VN VN 5 sampled data points in VN was set to 0.5�. 10 LGHSR 102 LGHSR DSH DSH 104 All the approaches examined were implemented in C++ with 1 3 10 10 Eigen library. Eigen is a popular C++ template library for linear 2 3 10 0 algebra . Moreover, to parallelize our approach, we implemented it Wall clock time [s] Wall clock time [s] 10 4 101 in OpenMP, a multi-threaded coding interface . We conducted all 100 10-1 experiments on a Linux server with two Intel(R) Xeon(R) E5-2650 connect-4 epsilon mnist kitsune wesad connect-4 epsilon mnist kitsune wesad (3) � = 500 (3) � = 500 v3 CPUs with 2.30 GHz processors. Each CPU has ten physical cores Figure 4: Ofline phase time Figure 5: Online phase time with hyper-threading, and thus the server has 40 logical cores. The of each approach. of each approach. server has 314GB of memory and a 1TB hard disk. We conducted the experiments using a single thread unless otherwise stated. challenge is to recognize objects from a number of visual object classes. This dataset includes images obtained from the Flickr web- 5.1 Processing Time site. It contains 500000 objects 2000 numerical features. The number We evaluated the processing time. Figure 4 shows the results of of classes is two in this dataset. mnist is a dataset of handwritten the ofine phase, and Figure 5 plots the online phase results. The digits. This dataset is a collection of 8100000 grayscale images with number of anchor points was set to � = 100, 300, and 500. Note 784 features. Each image is one digit from “0” to “9”, so the num- that the number of anchor points was set to � = 300 in the original ber of classes is ten. kitsune is a cybersecurity dataset containing paper [36]. For kitsune and wesad, we omit the results of LGHSR diferent network attacks on a commercial IP-based surveillance and DSH since they failed to complete the hash codes within three system and an IoT network. Each data point represents a behavioral months. To evaluate the scalability ofered by our parallelization snapshot of the hosts and protocols and consists of 115 trafc sta- approach, we evaluated the times taken by the ofine and online tistics captured in a temporal window. This dataset includes several phases using 40 threads, see Figure 6 and 7, respectively. In these network attacks, such as DoS, reconnaissance, man-in-the-middle, fgures, “Proposed (40-threads)” is for our approach with 40-threads, and botnet attacks. The numbers of data points and classes are and “Anchor (40-threads)” is for the original approach parallelized 21017596 and ten, respectively. wesad contains data of subjects by our approach. “Proposed (single-threads)” and “Anchor (single- during a stress-afect lab study while wearing physiological and threads)” are results using a single thread. In this experiment, we motion sensors. It includes sensor modalities such as blood volume set the number of anchor points to � = 500. Figure 1 of Section 1 pulse, respiration, body temperature, and three-axis acceleration. evaluates each approach with � = 500. To efectively perform stress-afect detection, we used the dataset’s As shown in Figure 4 and 5, our approach ofers higher efciency polynomial features [40]. In particular, we created second-order than the previous approaches. Specifcally, in the ofine phase, it polynomial features by following the method of [43]. As a result, is up to 7.6, 7.0, 3191.4, and 3195.7, times faster than Anchor, VN, the number of dimensions is 176. The numbers of data points and LGHSR, and DSH, respectively. Moreover, it is up to 2.2, 2.1, 93.6, classes are 60807600 and eight, respectively. and 84.7 times faster than Anchor, VN, LGHSR, and DSH, respec- In each dataset, we randomly sampled 10000 data points to form tively, in the online phase. As described in Section 2, Anchor needs the query set used in the online phase, and we used the other to compute the eigenvalues of M in the ofine phase and closest data points to compute hash codes in the ofine phase. We set the anchor points in the online phase. Since it takes times of � (��2) number of bits in hash codes, �, to 32 and the number of reduced 2 bit, �, to 8. Besides, we set the bandwidth parameter � = � and the 3 http://eigen.tuxfamily.org number of closest anchor points to � = 2 by following a previous 4 http://www.openmp.org/

924 105 12 102 14 Proposed Proposed Proposed 4 Original 10 Naive Inverse 12 10 101 10 connect-4 3 epsilon 10 8 100 8 mnist 2 kitsune 10 6 6 wesad 10-1 101 4 4 Wall clock time [s] Wall clock time [s] Wall clock time [s]

Number of updates -2 2 100 2 10 0 10-1 0 10-3 10 15 20 25 30 35 40 45 50 connect-4 epsilon mnist kitsune wesad connect-4 epsilon mnist kitsune wesad connect-4 epsilon mnist kitsune wesad Target rank Figure 8: Computation time Figure 9: Number of updates Figure 10: Computation time Figure 11: Online phase time to obtain tridiagonal matrix. in bisection method. of inverse iteration. vs. target rank. and � (��) to compute eigenvalues and closest anchor points, re- 5.3 Bisection Method spectively [36, 41], Anchor is slower than our approach. Although As mentioned in Section 3.1.2, we initialize left bound � by using VN can efciently compute eigenvectors, it still needs � (��2) time � the ratio cut algorithm to reduce the number of updates for � to compute M. Therefore, VN is slower than our approach. As � described in Section 4, LGHSR needs to compute the eigendecom- needed in the bisection method. We evaluated the average number of updates needed in computing approximate eigenvalue �˜ (1 � position of the � × � adjacency matrix of the anchor graph in the � ≤ ≤ ofine phase. In the online phase, it computes hash codes from the �); we evaluated �� in this experiment. In Figure 9, “Naive” indicates the results of the approach described in Section 3.1.2, which sets �×� dense matrix of obtained eigenvalues. Therefore, LGHSR incurs signifcantly high computation times compared to our approach. �� = 0 based on the property of �� > 0. In each approach, the process of computing �˜ sets the right bound to � � . We set The ofine phase of DSH computes the eigendecomposition of the � � = �−1 adjacency matrix similar to LGHSR, as described in Section 4. The the number of anchor points to � = 500. online phase of DSH computes the hash codes of each query data Figure 9 shows that our approach reduced the number of updates by up to 28.6% from the naive approach. Since the smallest eigen- point using the � ×� dense matrix of the obtained hash codes in the ofine phase. Therefore, DSH is much slower than our approach. value of T is larger than 0 [33, 50], it is theoretically valid to set � 0. However, since in practice � � , it is difcult to obtain On the other hand, to compute hash codes, we efciently compute � = � ≈ �−1 eigenvectors by applying a graph clustering algorithm to the tridi- tight left bound �� by setting �� = 0. Therefore, the naive approach agonal matrix, as described in Section 3.1. We also compute lower needs a larger number of updates to produce �� . To reduce update bounds of distances to efciently identify the closest anchor points computations, we apply the ratio cut algorithm to the chain graph for each query data point, as described in Section 3.2. Consequently, corresponding to T, and efectively compute the left bound for �� our approach is much faster than previous approaches. as shown in Figure 9. Our approach needs � (���) time to compute As described in Section 3.4.2, our approach can be parallelized by �˜� where �� is a small number, as shown in Figure 9. Therefore, we using multiple threads. As shown in Figure 6 and 7, our paralleliza- can efciently compute approximate eigenvalues. tion approach with 40-thread execution is faster than the proposed approach using a single thread; up to 24.8 times in the ofine phase 5.4 Inverse Iteration and 29.5 times faster in the online phase. This is because we can As described in Section 3.1.3, we use inverse iteration to compute 1 efciently fnd the closest anchor points by dividing the anchor the eigenvectors by using (T − �˜� I)− (1 ≤ � ≤ �). We compute the points into several sets and performing matrix multiplication using LU decomposition of T′ = T − �˜� I to improve efciency. This exper- block partitioning. Since our approach can be easily parallelized, iment examined the processing times needed for inverse iteration we can improve its efciency by using multiple threads. where � = 500. In Figure 10, “Inverse” is the approach that directly 1 computes (T − �˜� I)− for obtaining eigenvectors. 5.2 Lanczos Method As shown in Figure 10, our approach is up to 2867.0 times faster As described in Section 3.1.1, we compute following the Lanczos than the comparable approach. Even though T − �˜� I has a sparse T 1 1 structure, its inversion, (T−�˜� I)− , has a dense structure [19]. There- method by using − 2 . This experiment examined the time b� = ZΛ p� fore, it takes � �3 time to compute �˜ −1; the comparable taken to compute where � 500. In Figure 8, “Original” denotes ( ) (T − � I) T = approach incurs high computation cost. On the other hand, since the original Lanczos method, which gets by computing . T M the number of nonzero elements is � � in the lower and upper Figure 8 shows that our approach achieves shorter computation ( ) triangular matrices obtained by LU decomposition from Lemma 4, times than the original Lanczos method. In particular, it is up to we can compute LU decomposition of ′ in � � time as shown in 63.8 times faster than the original Lanczos method because it takes T ( ) Lemma 5. Consequently, we can efciently compute eigenvectors. � (��2) time to compute M [36]. The original Lanczos method needs � (�3) time to compute from Equation (9), (10), and (11), as T 5.5 Singular Value Decomposition described in Section 3.1.1. On the other hand, we compute T from b� by using Equation (12), (15), and (16) in � (���) time as shown As mentioned in Section 3.2, we exploit the SVD of rank � ′ for ma- in Lemma 1 without computing M. As a result, our approach can trix [u1,..., u�]⊤ to compute approximated distances to improve more efciently compute T than the original Lanczos method. the online phase’s efciency. Since we avoid exact computations of

925 Table 2: MAE vs. target rank. 0.9 �′ connect-4 epsilon mnist kitsune wesad 0.8 10 6.53 102 3.17 102 4.57 105 6.15 101 3.83 102 0.7 × × × × × 0.6 2 2 5 1 2 20 3.78 × 10 3.05 × 10 3.48 × 10 1.50 × 10 1.04 × 10 0.5 MAP 30 2.40 102 2.93 102 2.76 105 3.22 100 2.11 101 0.4 connect-4 × × × × × epsilon 40 1.20 × 102 2.83 × 102 2.35 × 105 7.01 × 10−1 5.56 × 100 0.3 mnist 0.2 kitsune 50 6.17 101 2.72 102 1.97 105 1.49 10−1 1.36 100 wesad × × × × × 0.1 8 16 24 32 Table 3: MAP of each approach. Number of bits Dataset � Proposed Anchor VN LGHSR DSH Figure 12: MAP vs. the number of bits. 100 0.536 0.392 0.335 0.533 0.123 Table 3 indicates that the search accuracy of our approach im- connect-4 300 0.551 0.411 0.361 0.541 0.128 proves with the number of anchor points. This is because anchor 500 0.556 0.412 0.371 0.543 0.149 points represent the data distribution in the high-dimensional space. 100 0.479 0.249 0.208 0.377 0.192 We note that our approach yields higher search accuracy than the epsilon 300 0.500 0.270 0.249 0.383 0.195 500 0.505 0.340 0.324 0.390 0.237 original approach of anchor graph hashing. This is because we 100 0.579 0.578 0.548 0.582 0.345 use the recursive technique to iteratively reduce hash codes, as de- mnist 300 0.689 0.686 0.662 0.709 0.371 scribed in Section 3.4.2. As shown in Theorem 3, there are no false 500 0.732 0.730 0.705 0.759 0.390 negatives in the recursive approach’s search results. Specifcally, if 100 0.798 0.719 0.690 − − the original approach fnds any data points, we have S′[x] = S[x]; kitsune 300 0.807 0.776 0.767 − − the proposed approach fnds the same data points. Furthermore, 500 0.808 0.793 0.775 − − even if the original approach cannot fnd any data points, our ap- 100 0.443 0.317 0.304 − − proach can fnd similar data points since S′[x] ⊇ S[x] holds. As a wesad 300 0.498 0.440 0.433 − − result, our approach can more accurately fnd similar data points 500 0.509 0.458 0.452 − − than the original approach and indeed other previous approaches. As shown in Figure 12, our approach can more accurately fnd similar data points as the number of bits increase. The results in distances in identifying the closest anchor point by using approxi- Table 3, as well as those in Figure 4 and 5, indicate that the proposed mated distances, the target rank of SVD is expected to impact the approach increases the efciency of anchor graph hashing while efciency of our approach. Figure 11 evaluates the processing time improving search accuracy. Moreover, our approach’s search accu- of the online phase for diferent target rank settings. We also eval- racy can be enhanced by increasing the number of anchor points uated approximation errors of approximated distances by setting without a signifcant increase in computation time. diferent target rank values. Table 2 shows the results using Mean Absolute Error (MAE) as the metric [1]. In these experiments, we 6 CONCLUSIONS set the number of anchor points to � = 500. As shown in Figure 11, our approach’s processing time did not This paper addressed the problem of reducing the processing time of vary much with SVD target rank; it is not so sensitive to rank anchor graph hashing. Our approach computes eigenvectors from � ′. Table 2 shows that the mean absolute errors of approximated the tridiagonal matrix by using the ratio cut algorithm. The pro- distances decrease as the target rank increases. As described in posed approach also uses SVD to compute the closest anchor points efciently. Experiments show that our approach is signifcantly Section 3.2, it takes � (� ′) time to compute approximated distance by SVD. Therefore, if � ′ is small, we can efciently compute approx- faster than existing approaches with improved search accuracy. imated distances. On the other hand, we can reduce the number of exact distance computations when � ′ is large. This is because ACKNOWLEDGMENTS we can improve the approximation quality by increasing the target This work was supported by JSPS KAKENHI Grant Number 19H04117. rank, as shown in Table 2. Due to this trade-of, derived from target rank � ′, our approach is not so sensitive to the target rank of SVD. APPENDIX We provide the proofs omitted from Section 3. 5.6 Search Accuracy Proof of Lemma 1. Since we can compute b� in � (��) time and This section shows that the proposal yields an efective approximate �� is obtained from the �2-norm of b� from Equation (12), we can nearest neighbor search in terms of accuracy. Table 3 shows search compute �� in � (��) time. In Equation (15) and (16), we can compute accuracy when using a hash lookup to fnd data points with the 1 − 2 ⊤ in � (��) time since the number of nonzero elements in same label as the query data point. By following the previous study Λ Z b� ⊤ is ��. It also requires � (�) time to compute the inner product of [36], we used Mean Average Precision (MAP) as the metric of Z ⟨ , 1⟩ since the length of is �. Therefore, we can compute search accuracy; MAP is the mean of the average precision scores b� b�− b� � and in � (��) time from Equation (15) and (16). As a result, it for each query data point in fnding data points of the same label [1]. � p� takes � (���) time to compute and . □ Moreover, we evaluated the MAP of our approach for several bit T P ′ numbers, �, in the hash codes with � = 500. In Figure 1 of Section 1, Proof of Lemma 2. Since V� gives the optimal solution for the prob- ⊤ ′ ⊤ MAP was evaluated by setting � = 500 and � = 32. lem of Equation (18), we have tr (H� TH� ) ≤ tr ((V� ) TV� ) and

926 tr ′ ⊤ Í� � . Therefore, we have � tr ′ ⊤ ′ We frst prove that �˜ , 0 holds. Since ((V� ) TV� ) = �=0 � � = ((V� ) TV� ) − Proof of Lemma 7. (x u� ) ≥ � 1 � 1 Í − � tr ⊤ Í − � , which completes the proof. □ ⟨x˜, u˜� ⟩ ≤ ∥x˜ ∥∥u˜� ∥ and cos(� ′, ′ − �x′,a′ ) ≤ 1, we have �=0 � ≥ (H� TH� ) − �=0 � u� a ⊤ 2 2 ′ ′ Proof of Lemma 3. Since �� = tr (H TH� ) holds, left bound � is ∥ ∥ +∥ � ∥ −2⟨˜, ˜� ⟩−2∥ ∥∥ ∥ cos(� ′ ′ −� ′, ′ ) � � x u x u x u� u�,a x a computed as � tr ⊤ Í�−1 � � Í�−1 � . Since � = (H� TH� ) − �=0 � = � − �=0 � 2 ′ 2 2 ′ 2 ′ ′ ≥∥x˜ ∥ +∥x ∥ +∥u˜� ∥ +∥u� ∥ −2∥x˜ ∥∥u˜� ∥−2∥x ∥∥u� ∥ (34) Δ�� (�� ) is the increase in the objective functions by cutting the 2 ′ ′ 2 edge corresponding to � , we have � � � � � =(∥x˜ ∥−∥u˜� ∥) +(∥x ∥−∥u ∥) ≥ 0 � � − �−1 = � − �−1 − �−1 = � Δ�� (�� ) − �� 1. As a result, we can compute � in � (1) time. □ − � ˜ Therefore, �(x, u� ) ≥ 0 holds from Equation (25). Since SVD is an Proof of Lemma 4. Each element of the lower and upper triangular orthonormal transformation, we have matrices of LU decomposition of T′ is given as follows [41]: 2 � 2 � (x, u� ) =Í (� [�]−�� [�])  0 if � < � �=1  2 2 �′ � � � �  1 if � = � (30) =∥x∥ +∥u� ∥ −2 Í �˜ [�]�˜� [�]−2 Í �˜ [�]�˜� [�] (35) [ ][ ] = � 1 �=1 �=�′+1 � ′ [� ][� ]−Í − �[� ][� ]� [� ][� ]  �=1 otherwise 2 2 ′ ′  � [� ][� ] =∥x∥ +∥u� ∥ −2⟨x˜, u˜� ⟩−2∥x ∥∥u ∥ cos� ′, ′  � u� x 0 if � > �  ′ ′  ′ where � ′, ′ is the angle between u and x . As shown in Equa- � [�][�] = � [�][�] if � = 1 (31) u� x � � 1 tion (26), � is the angle between ′ and ′, and � is the angle  � ′[�][�] − Í − �[�][�]� [�][�] otherwise u�,a′ u� a x′,a′  �=1 ′ ′  between and . Since we have cos� ′ ′ ≤ cos(� ′ ′ − � ′, ′ ), x a u�,x u�,a x a We prove Lemma 4 by using mathematical induction. ˜ we have �(x, u� ) ≤ �(x, u� ) from Equation (25) and (35). □ Initial step: show the statement above holds in the case of � = ′ 1. If � 1, we have � � � � [� ][� ] from Equation (30), and Since the size of is � � ′, it takes � �� ′ time = [ ][ ] = � [� ][� ] Proof of Lemma 8. Q × ( ) � ′[�][�] = 0 since T′ = T − �˜� I and � [�][�] = 0 from Equation (8). to compute x˜ = xQ. It also needs � (�) time to compute ∥x∥ and Therefore, �[�][�] = 0. Similarly, if � = 1, we have � [�][�] = ∥x′∥. Therefore, it requires � ((�+�)� ′) time to compute approx- � ′[�][�] = 0 from Equation (31). imate distances for the anchor points. It also takes � (���) time Inductive step: assume that �[�][� ′] = 0 and � [� ′][�] = 0 hold to compute �(x, u� ) after pruning anchor points by using approxi- for the � ′-th element such that 1 ≤ � ′ ≤ � − 1. From Equation (30), mate distances. Therefore, it needs � ((�+�)� ′+���) time to fnd ′ we have � � � � [� ][� ] since � � � ′ 0 holds for all � ′ the closest anchor points to query data point x. □ [ ][ ] = � [� ][� ] [ ][ ] = such that 1 � ′ � 1. Since � ′ � � 0 holds, we have ≤ ≤ − [ ][ ] = Proof of Theorem 1. In the ofine phase, it takes � (�� log� ′+(�+ � � � 0. Similarly, we have � � ′ � 0 from Equation (31) [ ][ ] = [ ][ ] = �)(� ′)2) time to compute SVD. It takes � (��) and � (�2�) time ′ ′ ′ since � [� ][�] = 0 for all � such that 1 ≤ � ≤ � − 1, which to compute the anchor points’ norms and angles between anchor completes the inductive step. □ points, respectively. It requires � ((�+�)�� ′+����) time to obtain and from the closest anchor points. From Lemma 1, we can Proof of Lemma 5. From Lemma 4, L and U is computed as Z Λ compute T and P in � (���) time. We can compute � eigenvector- � ′ [� ][� ]  if � = � + 1 eigenvalue pairs in � (�� (��+�� +�)) time from Lemma 6. Since the  � [� ][� ] �[�][�] = 1 if � = � (32) size of W is � × �, it needs � (���) time to compute Y of size � × �.  As a result, we need � �� ′ � � ′ � ��� �� �� �� �� time  0 otherwise ( ( + )+ ( + + + � + � ))  in the ofine phase. The online phase takes � ((�+�)� ′+���) time � ′[�][�] if � � − 1  = to obtain the anchor points closest to the query data point from � [�][�] =  � ′[�][�]−�[�][� −1]� [�−1][�] if � = � (33) Lemma 8. It takes � (��) time to compute y. Therefore, the online  0 otherwise  phase of our approach needs � (� ′(�+�)+�(�� +�)) time. □ where we assume �[1][0] = � [0][1] = 0 in computing � [1][1]. Equation (32) and (33) indicate that it takes � (1) time to compute Proof of Theorem 2. Our approach uses approximate distances each element of L and U. Since the number of nonzero elements in with the lower bounding property in computing the closest an- L and U is � (�), it needs � (�) time to compute L and U. □ chor points, as shown in Lemma 7. Therefore, it can exactly obtain the closest anchor points [45]. It computes �� of T by inverse itera- From Lemma 3, we can compute � in � 1 Proof of Lemma 6. � ( ) tion from �˜� obtained by the bisection method. The corresponding time. Since it takes � � to compute � � , we need � �� time ( ) ( ) ( �) eigenvector v� to �� can be computed from P. Since T and M have to compute �˜� using the bisection method. From Lemma 5, it takes the same eigenvalues [7], our approach can accurately compute the � (�) time to compute LU decomposition of T − �˜� I. Since the � largest eigenvector-eigenvalue pairs , � � of from . □ {(v� � )}�=1 M T number of nonzero elements in L and U is � (�), we need � (��� ) time to compute v′ using the inverse method. Since the size of P is Proof of Theorem 3. If the original approach fnds data points similar 2 � × �, it requires � (� ) time to compute v from equation of v� = to query data points using � bits, the recursive approach also fnds ′ ′ Pv� . Since the number of nonzero elements in T is � (�) as shown the same data points using � bits; S[x] = S [x]. Otherwise, since in Equation (8), it takes � (�) time to compute �� from equation the original approach fails to fnd similar data points, we have of �� = (v′)⊤Tv′. As a result, we can compute an eigenvector- S[x] = ∅. Since the recursive approach iterates the approximate ′ eigenvalue pair (v�, �� ) of M in � (�(�� +�� +�)) time from T. □ nearest neighbor search, S [x] ⊇ ∅ = S[x] holds. □

927 REFERENCES Trans. Knowl. Data Eng. 14, 4 (2002), 792–808. [29] Mingjie Li, Ying Zhang, Yifang Sun, Wei Wang, Ivor W. Tsang, and Xuemin Lin. [1] S. Margret Anouncia and Ufe Kock Wiil. 2018. Knowledge Computing and its 2020. I/O Efcient Approximate Nearest Neighbour Search based on Learned Applications: Knowledge Computing in Specifc Domains: Volume II. Springer. Functions. In . 289–300. [2] Jon Louis Bentley. 1990. K-d Trees for Semidynamic Point Sets. In SoCG. 187–197. ICDE [3] Miguel Á. Carreira-Perpiñán and Ramin Raziperchikolaei. 2015. Hashing with [30] Wen Li, Ying Zhang, Yifang Sun, Wei Wang, Mingjie Li, Wenjie Zhang, and Xuemin Lin. 2020. Approximate Nearest Neighbor Search on High Dimensional Binary Autoencoders. In CVPR. 557–566. [4] Paolo Ciaccia and Marco Patella. 2000. PAC Nearest Neighbor Queries: Approxi- Data - Experiments, Analyses, and Improvement. IEEE Trans. Knowl. Data Eng. 32, 8 (2020), 1475–1488. mate and Controlled Search in High-Dimensional and Metric Spaces. In ICDE. 244–255. [31] Xuelong Li, Di Hu, and Feiping Nie. 2017. Large Graph Hashing with Spectral [5] Ed S. Coakley and Vladimir Rokhlin. 2013. A fast divide-and-conquer algorithm Rotation. In AAAI. 2203–2209. for computing the spectra of real symmetric tridiagonal matrices. [32] Jingjing Liu, Shaoting Zhang, Wei Liu, Xiaofan Zhang, and Dimitris N. Metaxas. Applied and 2014. Scalable Mammogram Retrieval Using Anchor Graph Hashing. In . Computational Harmonic Analysis 34, 3 (2013), 379 – 414. ISBI [6] Ronald Fagin, Ravi Kumar, and D. Sivakumar. 2003. Efcient Similarity Search 898–901. and Classifcation via Rank Aggregation. In . 301–312. [33] Wei Liu, Junfeng He, and Shih-Fu Chang. 2010. Large Graph Construction for SIGMOD Scalable Semi-Supervised Learning. In . 679–686. [7] William Ford. 2014. with Applications: Using MATLAB. ICML Academic Press. [34] Wei Liu, Cun Mu, Sanjiv Kumar, and Shih-Fu Chang. 2014. Discrete Graph [8] Yasuhiro Fujiwara, Yasutoshi Ida, Junya Arai, Mai Nishimura, and Sotetsu Iwa- Hashing. In NIPS. 3419–3427. mura. 2016. Fast Algorithm for the Lasso based L1-Graph Construction. [35] Wanqi Liu, Hanchen Wang, Ying Zhang, Wei Wang, and Lu Qin. 2019. I-LSH: I/O Proc. Efcient c-Approximate Nearest Neighbor Search in High-Dimensional Space. In VLDB Endow. 10, 3 (2016), 229–240. [9] Yasuhiro Fujiwara, Go Irie, Shari Kuroyama, and Makoto Onizuka. 2014. Scaling ICDE. 1670–1673. [36] Wei Liu, Jun Wang, Sanjiv Kumar, and Shih-Fu Chang. 2011. Hashing with Manifold Ranking Based Image Retrieval. Proc. VLDB Endow. 8, 4 (2014), 341–352. [10] Yasuhiro Fujiwara, Atsutoshi Kumagai, Sekitoshi Kanai, Yasutoshi Ida, and Graphs. In ICML. 1–8. [37] Kejing Lu, Hongya Wang, Wei Wang, and Mineichi Kudo. 2020. VHP: Approxi- Naonori Ueda. 2020. Efcient Algorithm for the b-Matching Graph. In KDD. 187–197. mate Nearest Neighbor Search via Virtual Hypersphere Partitioning. Proc. VLDB [11] Yasuhiro Fujiwara, Makoto Nakatsuji, Hiroaki Shiokawa, Yasutoshi Ida, and Endow. 13, 9 (2020), 1443–1455. Machiko Toyoda. 2015. Adaptive Message Update for Fast Afnity Propagation. [38] M.E.J. Newman. 2003. Fast Algorithm for Detecting Community Structure in Networks. 69 (2003). In KDD. 309–318. Physical Review E [12] Junhao Gan, Jianlin Feng, Qiong Fang, and Wilfred Ng. 2012. Locality-sensitive [39] Feiping Nie, Rui Zhang, and Xuelong Li. 2017. A Generalized Power Iteration Method for Solving Quadratic Problem on the Stiefel Manifold. Hashing Scheme Based on Dynamic Collision Counting. In SIGMOD. 541–552. Sci. China Inf. [13] Aristides Gionis, Piotr Indyk, and Rajeev Motwani. 1999. Similarity Search in Sci. 60, 11 (2017), 112101:1–112101:10. [40] Paul Pavlidis, Jason Weston, Jinsong Cai, and William Noble Grundy. 2001. Gene High Dimensions via Hashing. In VLDB. 518–529. [14] Yuchen Guo, Guiguang Ding, Li Liu, Jungong Han, and Ling Shao. 2017. Learning functional classifcation from heterogeneous data. In RECOMB. 249–255. to Hash With Optimized Anchor Embedding for Scalable Retrieval. [41] William H. Press, Saul A. Teukolsky, William T Vetterling, and Brian P. Flannery. IEEE Trans. 2007. . Cambridge Image Processing 26, 3 (2017), 1344–1354. Numerical Recipes 3rd Edition: The Art of Scientifc Computing [15] Antonin Guttman. 1984. R-Trees: A Dynamic Index Structure for Spatial Search- University Press. [42] Parikshit Ram and Kaushik Sinha. 2019. Revisiting kd-tree for Nearest Neighbor ing. In SIGMOD. 47–57. [16] Lars W. Hagen and Andrew B. Kahng. 1992. New Spectral Methods for Ratio Search. In KDD. 1378–1388. Cut Partitioning and Clustering. [43] Volker Roth and Bernd Fischer. 2008. The Group-Lasso for Generalized Linear IEEE Trans. on CAD of Integrated Circuits and Models: Uniqueness of Solutions and Efcient Algorithms. In . 848–855. Systems 11, 9 (1992), 1074–1085. ICML [17] Nathan Halko, Per-Gunnar Martinsson, and Joel A. Tropp. 2011. Finding Structure [44] Yousef Saad. 2003. Iterative Methods for Sparse Linear Systems. Society for with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Industrial and Applied Mathematics. [45] Dennis Shasha and Yunyue Zhu. 2004. Decompositions. SIAM Rev. 53, 2 (2011), 217–288. High Performance Discovery In Time Series: [18] Sariel Har-Peled, Piotr Indyk, and Rajeev Motwani. 2012. Approximate Nearest Techniques And Case Studies. SpringerVerlag. [46] Yifang Sun, Wei Wang, Jianbin Qin, Ying Zhang, and Xuemin Lin. 2014. SRS: Neighbor: Towards Removing the Curse of Dimensionality. Theory of Computing 8, 1 (2012), 321–350. Solving c-Approximate Nearest Neighbor Queries in High Dimensional Euclidean Space with a Tiny Index. 8, 1 (2014), 1–12. [19] David A. Harville. 2008. Matrix Algebra From a Statistician’s Perspective. Springer. Proc. VLDB Endow. [20] Michael E. Houle and Jun Sakuma. 2005. Fast Approximate Similarity Search in [47] Yufei Tao, Ke Yi, Cheng Sheng, and Panos Kalnis. 2009. Quality and Efciency in High Dimensional Nearest Neighbor Search. In . 563–576. Extremely High-Dimensional Data Sets. In ICDE. 619–630. SIGMOD [21] Di Hu, Feiping Nie, and Xuelong Li. 2019. Discrete Spectral Hashing for Efcient [48] Max Vladymyrov and Miguel Á. Carreira-Perpiñán. 2013. Locally Linear Land- marks for Large-Scale Manifold Learning. In . 256–271. Similarity Retrieval. IEEE Trans. Image Process. 28, 3 (2019), 1080–1091. ECML PKDD [22] Qiang Huang, Jianlin Feng, Yikai Zhang, Qiong Fang, and Wilfred Ng. 2015. [49] Max Vladymyrov and Miguel Á. Carreira-Perpiñán. 2016. The Variational Nys- Query-Aware Locality-Sensitive Hashing for Approximate Nearest Neighbor trom Method for Large-scale Spectral Problems. In ICML. 211–220. [50] Ulrike von Luxburg. 2007. A Tutorial on Spectral Clustering. 17, 4 Search. Proc. VLDB Endow. 9, 1 (2015), 1–12. Stat. Comput. [23] Yasutoshi Ida, Yasuhiro Fujiwara, and Hisashi Kashima. 2019. Fast Sparse Group (2007), 395–416. [51] Roger Weber, Hans-Jörg Schek, and Stephen Blott. 1998. A Quantitative Analysis Lasso. In NeurIPS. 1700–1708. [24] Yasutoshi Ida, Sekitoshi Kanai, Yasuhiro Fujiwara, Tomoharu Iwata, Koh Takeuchi, and Performance Study for Similarity-Search Methods in High-Dimensional and Hisashi Kashima. 2020. Fast Deterministic CUR with Spaces. In VLDB. 194–205. [52] Yair Weiss, Antonio Torralba, and Robert Fergus. 2008. Spectral Hashing. In . Accuracy Assurance. In ICML. 4594–4603. NIPS [25] George Karypis and Vipin Kumar. 1999. Parallel Multilevel Series k-Way Parti- 1753–1760. [53] Yi Zheng, Hui Peng, Xiaocai Zhang, Xiaoying Gao, and Jinyan Li. 2018. Predicting tioning Scheme for Irregular Graphs. SIAM Rev. 41, 2 (1999), 278–300. [26] Norio Katayama and Shin’ichi Satoh. 1997. The SR-tree: An Index Structure for Drug Targets from Heterogeneous Spaces using Anchor Graph Hashing and Ensemble Learning. In . 1–7. High-Dimensional Nearest Neighbor Queries. In SIGMOD. 369–380. IJCNN [27] Brian Kulis and Kristen Grauman. 2009. Kernelized Locality-sensitive Hashing [54] Y. Zheng, C. Sun, C. Zhu, X. Lan, X. Fu, and W. Han. 2015. LWCS: A Large-scale Web Page Classifcation System Based on Anchor Graph Hashing. In . for Scalable Image Search. In ICCV. 2130–2137. ICSESS [28] Chen Li, Edward Y. Chang, Hector Garcia-Molina, and Gio Wiederhold. 2002. 90–94. Clustering for Approximate Similarity Search in High-Dimensional Spaces. IEEE

928