Fast Random Projections Using Lean Walsh Transforms

Abstract

We present a random projection matrix that can be applied to vectors $x \in \mathbb{R}^d$ in $O(d)$ operations if $d \ge k^{2+\delta'}$ for arbitrarily small $\delta'$ and $k = O(\log(n)/\varepsilon^2)$. The projection succeeds with probability $1 - 1/n$ and preserves lengths up to distortion $\varepsilon$ for all vectors such that $\|x\|_\infty \le \|x\|_2\, k^{-(1+\delta)}$ for arbitrarily small $\delta$. Previous results are either not applicable in linear time or require a bound on $\|x\|_\infty$ that depends on $d$.

1 Introduction

We are interested in producing a random projection procedure such that applying it to a vector in $\mathbb{R}^d$ requires $O(d)$ operations. The idea is to first project an incoming vector $x \in \mathbb{R}^d$ into an intermediate dimension $\tilde{d} = d^\alpha$ ($0 < \alpha < 1$) and then use a known construction to project it further down to the minimal Johnson-Lindenstrauss (JL) dimension $k = O(\log(n)/\varepsilon^2)$. The mapping is composed of three matrices, $x \mapsto R A D_s x$. The first, $D_s$, is a random $\pm 1$ diagonal matrix. The second, $A$, is a dense matrix of size $\tilde{d} \times d$ termed the Lean Walsh matrix. The third, $R$, is a $k \times \tilde{d}$ fast Johnson-Lindenstrauss matrix. As $R$, we use the fast JL construction of Ailon and Liberty [2]. Their matrix $R$ can be applied to $A D_s x$ in $O(\tilde{d}\log(k))$ operations as long as $\tilde{d} \ge k^{2+\delta''}$ for arbitrarily small $\delta''$. If $d > k^{(2+\delta'')/\alpha}$ this condition is met.^1 Therefore, for such $d$, applying $R$ to $A D_s x$ is length preserving w.h.p. and accomplishable in $O(\tilde{d}\log(k)) = O(d^\alpha \log(k)) = O(d)$ operations.

The main purpose of this manuscript is to produce a dense $\tilde{d} \times d$ matrix $A$ such that $x \mapsto A D_s x$ is computable in $O(d)$ operations and preserves length w.h.p. for vectors such that $\|x\|_\infty \le \|x\|_2\, k^{-(1+\delta)}$ for arbitrarily small $\delta$.

^1 Later in the manuscript we see that $\alpha$ can be chosen arbitrarily close to 1.
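To make the composition concrete, the following is a minimal NumPy sketch of the map $x \mapsto R A D_s x$. It is an illustration, not the paper's implementation: `apply_A` is an assumed callable applying the $\tilde{d} \times d$ Lean Walsh matrix in $O(d)$ time (a recursive sketch appears at the end of Section 3), and a plain dense $\pm 1$ matrix is substituted for the fast JL matrix $R$ of [2], whose construction is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def project(x, apply_A, k):
    """Sketch of x -> R A D_s x, with a dense +/-1 stand-in for R."""
    d = x.shape[0]
    s = rng.choice([-1.0, 1.0], size=d)   # diagonal of the random sign matrix D_s
    y = apply_A(s * x)                    # A D_s x, lands in dimension d~ = d^alpha
    d_tilde = y.shape[0]
    # Stand-in for the k x d~ fast JL matrix R of Ailon and Liberty [2]:
    # a dense random +/-1 matrix, scaled so squared length is preserved
    # in expectation.
    R = rng.choice([-1.0, 1.0], size=(k, d_tilde)) / np.sqrt(k)
    return R @ y
```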
2 Norm concentration

For a matrix to exhibit the property of a random projection, it is enough for it to preserve the length of any one vector with very high probability:

$$\Pr\big[\, \big|\, \|A D_s x\|^2 - 1 \,\big| \ge \varepsilon \,\big] < 1/n \qquad (1)$$

where $A$ is a column-normalized matrix, $D_s$ is a random diagonal matrix with $\pm 1$ entries (each w.p. $1/2$), $x$ is a general vector (w.l.o.g. $\|x\|_2 = 1$), and $n$ is chosen according to a desired success probability, usually polynomial in the number of projected vectors. Notice that we can replace the term $A D_s x$ with $A D_x s$, where $D_x$ is a diagonal matrix holding the values of $x$ on its diagonal, i.e. $D_x(i,i) = x(i)$, and $s$ is a vector of random $\pm 1$ entries. Denoting $M = A D_x$, we view the term $\|Ms\|$ as a convex function over the product space $\{1,-1\}^d$ from which the variable $s$ is chosen. In his book, Talagrand [4] describes a strong concentration result for convex, Lipschitz-bounded functions over probability product spaces.

Lemma 2.1 (Talagrand [4]). Consider a matrix $M$ and a random entry-wise i.i.d. $\pm 1$ vector $s$. Define the random variable $Y = \|Ms\|_2$. Denote by $\mu$ the median of $Y$ and by $\sigma = \|M\|_{2 \to 2}$ the spectral norm of $M$. The following holds:

$$\Pr[\,|Y - \mu| > t\,] < 4 e^{-t^2/8\sigma^2} \qquad (2)$$

Lemma 2.1 asserts that $\|A D_x s\|$ distributes like a (sub-)Gaussian around its median, with standard deviation $2\sigma$. First, in order to have $\mathbb{E}[Y^2] = 1$ it is necessary and sufficient for the columns of $A$ to be normalized (or normalized in expectation). To estimate the median $\mu$, we substitute $t^2 \to t'$ and compute:

$$\mathbb{E}[(Y - \mu)^2] = \int_0^\infty \Pr\big[(Y - \mu)^2 > t'\big]\, dt' \le \int_0^\infty 4 e^{-t'/(8\sigma^2)}\, dt' = 32\sigma^2$$

Furthermore, by Jensen's inequality, $(\mathbb{E}[Y])^2 \le \mathbb{E}[Y^2] = 1$. Hence $\mathbb{E}[(Y - \mu)^2] = \mathbb{E}[Y^2] - 2\mu\,\mathbb{E}[Y] + \mu^2 \ge 1 - 2\mu + \mu^2 = (1 - \mu)^2$. Combining the two bounds, $|1 - \mu| \le \sqrt{32}\,\sigma$.

By setting $\varepsilon = t + |1 - \mu|$, we get:

$$\Pr[\,|Y - 1| > \varepsilon\,] < 4 e^{-\varepsilon^2/8\sigma^2} \qquad (3)$$

If we set $k = 16\log(n)/\varepsilon^2$ (assuming $\log(n) > 6$), the requirement of equation (1) is met for $\sigma = k^{-1/2}$. We see that a condition on $\sigma = \|A D_x\|_{2\to 2}$ is sufficient for the projection to succeed w.h.p. This naturally defines $\chi$.

Definition 2.1. Let $\chi \subset \mathbb{R}^d$ be the set of all $x$ such that $\|x\|_2 = 1$ and the spectral norm of $A D_x$ is at most $k^{-1/2}$:

$$\chi(A, \varepsilon, n) = \left\{ x \in \mathbb{R}^d \;\middle|\; \|x\|_2 = 1,\ \|A D_x\|_{2 \to 2} \le k^{-1/2} \right\} \qquad (4)$$

where $D_x$ is a diagonal matrix such that $D_x(i,i) = x(i)$ and $k = 16\log(n)/\varepsilon^2$.

Lemma 2.2. For any column-normalized matrix $A$ and a random $\pm 1$ diagonal matrix $D_s$, the following holds:

$$\forall\, x \in \chi(A, \varepsilon, n) \qquad \Pr\big[\, \big|\, \|A D_s x\|^2 - 1 \,\big| \ge \varepsilon \,\big] \le 1/n \qquad (5)$$

Proof. By the definition of $\chi$ (Definition 2.1) and substitution of $\sigma$ into equation (3).

It is convenient to think of $\chi$ as the "good" set of vectors for which $A D_s$ is length preserving with high probability. Our goal is to characterize $\chi(A, \varepsilon, n)$ for a specific fixed matrix $A$, which we term Lean Walsh. It is described in the next section.

3 Lean Walsh transforms

The Lean Walsh Transform, similar to the Walsh Transform, is a recursive tensor product matrix, defined by a constant seed matrix $A_1$ and constructed recursively by Kronecker tensor products, $A_\ell = A_1 \otimes A_{\ell-1}$.

Definition 3.1. $A_1$ is a Lean Walsh seed (or simply 'seed') if: i) $A_1$ is a rectangular, possibly complex, matrix $A_1 \in \mathbb{C}^{r \times c}$ such that $r < c$; ii) $A_1$ has entries of absolute value $r^{-1/2}$, i.e. $|A_1(i,j)| = r^{-1/2}$; iii) the rows of $A_1$ are orthogonal; iv) all inner products between its different columns are bounded, in absolute value, by a constant $\rho < 1$. $\rho$ is called the coherence of $A_1$.

Definition 3.2. $A_\ell$ is a Lean Walsh transform if for all $\ell' \le \ell$ we have $A_{\ell'} = A_1 \otimes A_{\ell'-1}$, where $\otimes$ stands for the Kronecker product and $A_1$ is a seed as in Definition 3.1.

The following are examples of seed matrices:

$$A'_1 = \frac{1}{\sqrt{3}} \begin{pmatrix} 1 & 1 & -1 & -1 \\ 1 & -1 & 1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix}, \qquad A''_1 = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 & 1 \\ 1 & e^{2\pi i/3} & e^{4\pi i/3} \end{pmatrix} \qquad (6)$$

These examples are part of a large family of possible seed matrices. This family includes, amongst other constructions, sub-Hadamard matrices (like $A'_1$) and sub-Fourier matrices (like $A''_1$). A simple construction yields possibly larger seeds:

Fact 3.1. Let $F^c$ be the $c \times c$ discrete Fourier matrix, not normalized, so that $|F^c(i,j)| = 1$. Define $A_1$ to be the matrix consisting of the first $r = c - 1$ rows of $F^c$, normalized by $1/\sqrt{r}$. Then $A_1$ is a viable Lean Walsh seed with coherence $\rho = 1/r$.

Proof. The facts that $|A_1(i,j)| = 1/\sqrt{r}$ and that the rows of $A_1$ are orthogonal are trivial. Moreover, the inner product of two different columns of $A_1$ must be $\rho = 1/r$ in absolute value. Since distinct columns of $F^c$ are orthogonal,

$$\sum_{i=1}^{c} \big(F^c(i,j_1)\big)^* F^c(i,j_2) = 0 \qquad (7)$$

$$\frac{1}{r} \sum_{i=1}^{r} \big(F^c(i,j_1)\big)^* F^c(i,j_2) = -\frac{1}{r} \big(F^c(c,j_1)\big)^* F^c(c,j_2) \qquad (8)$$

$$\big|\big\langle A_1^{(j_1)}, A_1^{(j_2)} \big\rangle\big| = \frac{1}{r} \qquad (9)$$
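As a numerical sanity check of Fact 3.1, the following NumPy snippet builds the seed for $c = 4$ and verifies the three seed properties. It is a sketch: `fourier_seed` is an illustrative name, and the sign convention $F^c(i,j) = e^{-2\pi i \cdot ij/c}$ is an assumption (any DFT sign convention works equally well here).

```python
import numpy as np

def fourier_seed(c):
    """Fact 3.1 seed: first r = c - 1 rows of the unnormalized
    c x c DFT matrix, scaled by 1/sqrt(r)."""
    i, j = np.meshgrid(np.arange(c), np.arange(c), indexing="ij")
    F = np.exp(-2j * np.pi * i * j / c)   # |F(i, j)| = 1 entry-wise
    r = c - 1
    return F[:r, :] / np.sqrt(r)

c = 4
r = c - 1
A1 = fourier_seed(c)
# property ii): all entries have absolute value r^{-1/2}
assert np.allclose(np.abs(A1), r ** -0.5)
# property iii): the rows are orthogonal
G = A1 @ A1.conj().T
assert np.allclose(G - np.diag(np.diag(G)), 0)
# property iv): distinct columns have inner products of magnitude rho = 1/r
C = np.abs(A1.conj().T @ A1)
assert np.allclose(C[~np.eye(c, dtype=bool)], 1 / r)
```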
From elementary properties of Kronecker products we characterize $A_\ell$ using the number of rows, $r$, the number of columns, $c$, and the coherence, $\rho$, of its seed $A_1$.

Fact 3.2. The following facts hold true for $A_\ell$: i) $A_\ell$ is of size $\tilde{d} \times d$, where, setting $c^\ell = d$ and $r^\ell = \tilde{d}$, we have $\tilde{d} = d^\alpha$ with $\alpha = \log(r)/\log(c) < 1$; ii) for all $i$ and $j$, $A_\ell(i,j) \in \pm\tilde{d}^{-1/2}$, which means that $A_\ell$ is column normalized; iii) the rows of $A_\ell$ are orthogonal.

Fact 3.3. The time complexity of applying $A_\ell$ to any vector $z \in \mathbb{R}^d$ is $O(d)$.

Proof. Let $z = [z_1; \ldots; z_c]$, where the $z_i$ are sections of length $d/c$ of the vector $z$. Using the recursive decomposition of $A_\ell$, we compute $A_\ell z$ by first summing over the different $z_i$ according to the values of $A_1$, and then applying the matrix $A_{\ell-1}$ to each sum. Denoting by $T(d)$ the time to apply $A_\ell$ to $z \in \mathbb{R}^d$, we get $T(d) = r\,T(d/c) + O(d)$. By the Master Theorem, and since $r < c$, we have $T(d) = O(d)$; more specifically, $T(d) \le dcr/(c - r)$.

For clarity, we demonstrate Fact 3.3 for the first example seed, $A'_1$:

$$A'_\ell z = A'_\ell \begin{pmatrix} z_1 \\ z_2 \\ z_3 \\ z_4 \end{pmatrix} = \frac{1}{\sqrt{3}} \begin{pmatrix} A'_{\ell-1}(z_1 + z_2 - z_3 - z_4) \\ A'_{\ell-1}(z_1 - z_2 + z_3 - z_4) \\ A'_{\ell-1}(z_1 - z_2 - z_3 + z_4) \end{pmatrix} \qquad (10)$$

In what follows we characterize $\chi(A, \varepsilon, n)$ for a general Lean Walsh transform through the parameters of its seed, $r$, $c$ and $\rho$. The abbreviated notation $A$ stands for the $A_\ell$ of the right size to be applied to $x$, i.e. $\ell = \log(d)/\log(c)$.
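The recursion in the proof of Fact 3.3 translates directly into code. The following NumPy sketch (`lean_walsh_apply` is a hypothetical name, not from the paper) applies $A_\ell$ to a vector of length $c^\ell$ for an arbitrary seed, and checks the result against the explicit Kronecker product for the example seed $A'_1$.

```python
import numpy as np

def lean_walsh_apply(A1, z):
    """Apply A_l = A1 (x) A_{l-1} to z in O(d) time, per Fact 3.3."""
    r, c = A1.shape
    d = len(z)
    if d == 1:                      # recursion bottoms out: A_0 is the 1 x 1 identity
        return z
    blocks = z.reshape(c, d // c)   # z = [z_1; ...; z_c], each of length d/c
    sums = A1 @ blocks              # r combinations sum_j A1(i,j) z_j, cost O(rd)
    # apply A_{l-1} to each partial sum: T(d) = r T(d/c) + O(d)
    return np.concatenate([lean_walsh_apply(A1, sums[i]) for i in range(r)])

# Example with the seed A'_1: d = 4^2 = 16 maps to d~ = 3^2 = 9.
A1 = np.array([[1,  1, -1, -1],
               [1, -1,  1, -1],
               [1, -1, -1,  1]]) / np.sqrt(3)
z = np.arange(16, dtype=float)
y = lean_walsh_apply(A1, z)
assert np.allclose(y, np.kron(A1, A1) @ z)  # matches the dense A_2 = A1 (x) A1
```

A closure such as `apply_A = lambda v: lean_walsh_apply(A1, v)` could serve as the `apply_A` argument of the pipeline sketch in Section 1.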
