
On the Restricted Isometry of the Columnwise Khatri-Rao Product

Saurabh Khanna, Student Member, IEEE, and Chandra R. Murthy, Senior Member, IEEE
Dept. of ECE, Indian Institute of Science, Bangalore, India
{saurabh, cmurthy}@iisc.ac.in

Abstract—The columnwise Khatri-Rao product of two matrices is an important matrix type, reprising its role as a structured sensing matrix in many fundamental linear inverse problems. Robust signal recovery in such inverse problems is often contingent on proving the restricted isometry property (RIP) of a certain system matrix expressible as a Khatri-Rao product of two matrices. In this work, we analyze the RIP of a generic columnwise Khatri-Rao product matrix by deriving two upper bounds for its kth order Restricted Isometry Constant (k-RIC) for different values of k. The first RIC bound is computed in terms of the individual RICs of the input matrices participating in the Khatri-Rao product. The second RIC bound is probabilistic, and is specified in terms of the input matrix dimensions. We show that the Khatri-Rao product of a pair of m × n sized random matrices comprising independent and identically distributed subgaussian entries satisfies k-RIP with arbitrarily high probability, provided m exceeds O(k log n). Our RIC bounds confirm that the Khatri-Rao product exhibits stronger restricted isometry than its constituent matrices for the same RIP order. The proposed RIC bounds are potentially useful in the sample complexity analysis of several sparse recovery problems.

Index Terms—Khatri-Rao product, Kronecker product, compressive sensing, restricted isometry property, covariance matrix estimation, multiple measurement vectors, PARAFAC, CANDECOMP, direction of arrival estimation.

I. INTRODUCTION

The Khatri-Rao product, denoted by the symbol ⊙, is a columnwise Kronecker product, originally introduced by Khatri and Rao in [1]. For any two matrices A = [a_1, a_2, ..., a_p] and B = [b_1, b_2, ..., b_p] of sizes m × p and n × p, respectively, the columnwise Khatri-Rao product A ⊙ B is the mn × p matrix

A ⊙ B = [a_1 ⊗ b_1  a_2 ⊗ b_2  ···  a_p ⊗ b_p],   (1)

where a ⊗ b denotes the Kronecker product [2] of the vectors a and b. That is, each column of A ⊙ B is the Kronecker product of the corresponding columns of the two input matrices A and B. In this article, we refer to the columnwise Khatri-Rao product simply as the Khatri-Rao product or the KR product. Since the Kronecker product A ⊗ B comprises all pairwise Kronecker products of the columns of the input matrices, it can be shown that A ⊙ B = (A ⊗ B)J, where J is a p² × p selection matrix whose columns form a subset of the standard basis of R^{p²} [3].
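To make the definition in (1) concrete, the following NumPy sketch builds A ⊙ B column by column from Kronecker products and checks that its columns are a subset of the columns of A ⊗ B. The helper name khatri_rao and the small test sizes are our own choices for illustration, not part of the paper.

```python
import numpy as np

def khatri_rao(A, B):
    """Columnwise Khatri-Rao product of A (m x p) and B (n x p).

    Column i of the result is kron(A[:, i], B[:, i]), so the output
    has shape (m * n, p), matching the definition in (1).
    """
    m, p = A.shape
    n, p2 = B.shape
    assert p == p2, "A and B must have the same number of columns"
    return np.column_stack([np.kron(A[:, i], B[:, i]) for i in range(p)])

# Small sanity check: A is 2 x 3 and B is 3 x 3, so A ⊙ B is 6 x 3.
rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 3))
AkB = khatri_rao(A, B)
print(AkB.shape)                              # (6, 3)
# Each column of A ⊙ B is also a column of the full Kronecker product A ⊗ B.
full = np.kron(A, B)                          # shape (6, 9)
print(np.allclose(AkB, full[:, [0, 4, 8]]))   # columns i*3 + i are the KR columns
```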
Khatri-Rao product matrices are encountered in several linear inverse problems of fundamental importance. Recent examples include compressive sensing [4], [5], covariance matrix estimation [6], [7], direction of arrival estimation [8] and tensor decomposition [9]. In these examples, the KR product A ⊙ B, for certain m × n sized system matrices A and B, plays the role of the sensing matrix used to generate linear measurements y of an unknown signal vector x according to

y = (A ⊙ B)x + w,   (2)

where w represents the additive measurement noise. It is now well established in the sparse signal recovery literature [10]–[12] that, if the signal of interest x is a k-sparse¹ vector in R^n, it can be stably recovered from its noisy underdetermined linear observations y ∈ R^{m²} (m² < n) in a computationally efficient manner, provided that the sensing matrix (here, A ⊙ B) satisfies the restricted isometry property defined next.

A matrix Φ ∈ R^{m×n} is said to satisfy the Restricted Isometry Property (RIP) [13] of order k if there exists a constant δ_k(Φ) ∈ (0, 1) such that for all k-sparse vectors z ∈ R^n,

(1 − δ_k(Φ)) ‖z‖₂² ≤ ‖Φz‖₂² ≤ (1 + δ_k(Φ)) ‖z‖₂².   (3)

The smallest constant δ_k(Φ) for which (3) holds for all k-sparse z is called the kth order restricted isometry constant, or the k-RIC, of Φ. Matrices with small k-RICs are good encoders for storing/sketching high dimensional vectors with k or fewer nonzero entries [14]. For example, δ_k(A ⊙ B) < 0.307 is a sufficient condition for a unique k-sparse solution to (2) in the noiseless case, and for its perfect recovery via the ℓ₁ minimization technique [15]. As pointed out earlier, in many structured signal recovery problems, the main sensing matrix can be expressed as a columnwise Khatri-Rao product of two matrices. Thus, from a practitioner's viewpoint, it is pertinent to study the restricted isometry property of a columnwise Khatri-Rao product matrix, which is the focus of this work.

A. Applications involving Khatri-Rao matrices

We briefly discuss some examples where it is required to show the restricted isometry property of a KR product matrix.

1) Support recovery of joint sparse vectors from underdetermined linear measurements: Suppose x_1, x_2, ..., x_L are unknown joint sparse signals in R^n with a common k-sized support denoted by an index set S. A canonical problem in multi-sensor signal processing is concerned with the recovery of the common support S of the unknown signals from their noisy underdetermined linear measurements y_1, y_2, ..., y_L ∈ R^m generated according to

y_j = A x_j + w_j,   1 ≤ j ≤ L,   (4)

where A ∈ R^{m×n} (m < n) is a known measurement matrix, and w_j ∈ R^m models the noise in the measurements. This problem arises in many practical applications such as MIMO channel estimation, cooperative wideband spectrum sensing in cognitive radio networks, target localization, and direction of arrival estimation.

¹A vector is said to be k-sparse if at most k of its entries are nonzero.

In [16], the support set S is recovered as the support of γ̂, the solution to the Co-LASSO problem:

Co-LASSO:  min_{γ ≥ 0} ‖vec(R̂_yy) − (A ⊙ A)γ‖₂² + λ‖γ‖₁,   (5)

where R̂_yy ≜ (1/L) Σ_{j=1}^L y_j y_j^T. From compressive sensing theory [11], the RIP of A ⊙ A (also called the self Khatri-Rao product of A) determines the stability of the sparse solution in the Co-LASSO problem. In M-SBL [17], a different support recovery algorithm, A ⊙ A satisfying 2k-RIP can guarantee exact recovery of S from multiple measurements [18].

2) Sparse sampling of stationary graph signals: Let x = [x_1, x_2, ..., x_n]^T ∈ C^n be a stochastic, zero mean, second-order stationary graph signal defined on the n vertices of a graph G. This implies that the graph signal x can be modeled as x = Hn, where H is any valid graph filter [19], and n ∼ N(0, I_n). The covariance matrix R_xx = E[xx^H] can then be expressed as

R_xx = H E[nn^H] H^H = HH^H = U diag(p) U^H,   (6)

where the nonnegative vector p refers to the graph power spectral density of the stationary graph signal x, and the columns of U serve as a Fourier-like orthonormal basis for the graph signal.

As motivated in [19], in many applications, we are interested in reconstructing the sparse graph power spectral density p by observing a small subset of the graph vertices. Let y_1, y_2, ..., y_L denote L independent observations of the subsampled graph signal x, i.e.,

y_j = Φ x_j,   1 ≤ j ≤ L,   (7)

where Φ ∈ {0, 1}^{m×n} is referred to as a binary subsampling matrix with m (≪ n) rows, each row containing exactly one nonzero unity element. To recover the graph power spectral density p from the subsampled observations, we note that

(1/L) Σ_{j=1}^L y_j y_j^H ≈ Φ R_xx Φ^T = Φ U diag(p) U^H Φ^T,

or,  vec( (1/L) Σ_{j=1}^L y_j y_j^H ) ≈ (ΦU* ⊙ ΦU) p,   (8)

the superscript * denoting conjugation without transposition. Here again, the Khatri-Rao product ΦU* ⊙ ΦU satisfying RIP of order 2r guarantees a unique r-sparse solution for p in (8).

3) PARAFAC model for low-rank three-way arrays: Consider an I × J × K tensor X of rank r. We can express X as the sum of r rank-one three-way arrays as X = Σ_{i=1}^r a_i ∘ b_i ∘ c_i, where a_i, b_i, c_i are loading vectors of dimension I, J, K, respectively, and ∘ denotes the vector outer product. The tensor X itself can be arranged into a matrix as X = [vec(X_1), vec(X_2), ..., vec(X_K)]. In the parallel factor analysis (PARAFAC) model [20], the matrix X can be approximated as

X ≈ (A ⊙ B) C^T,   (9)

where A, B and C are the loading matrices with the loading vectors a_i, b_i and c_i as their columns, respectively. In many problems, such as direction of arrival estimation using a 2D antenna array, the loading matrix C turns out to be row-sparse [21]. In such cases, the uniqueness of the PARAFAC model in (9) depends on the restricted isometry property of the Khatri-Rao product A ⊙ B.

Finding the exact kth order RIC of a given matrix X entails computing the extreme singular values of all possible k-column submatrices of X, which is an NP-hard task [22]. In this work, we follow an alternative approach to analyzing the RIP of a KR product matrix: we seek to derive tight upper bounds for its restricted isometry constants.
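The exhaustive search mentioned above is easy to express in code for small problem sizes. The sketch below is our own illustration (the function name k_ric is hypothetical): it computes the exact k-RIC of a matrix by scanning the extreme eigenvalues of the Gram matrix of every k-column submatrix, which is feasible only for small n and k.

```python
import numpy as np
from itertools import combinations

def k_ric(Phi, k):
    """Exact k-th order restricted isometry constant of Phi by brute force.

    delta_k = max over all k-column submatrices Phi_S of
              max(lambda_max(Phi_S^T Phi_S) - 1, 1 - lambda_min(Phi_S^T Phi_S)).
    The search is combinatorial in k, so this is only usable for small problems.
    """
    n = Phi.shape[1]
    delta = 0.0
    for S in combinations(range(n), k):
        eigs = np.linalg.eigvalsh(Phi[:, S].T @ Phi[:, S])
        delta = max(delta, eigs[-1] - 1.0, 1.0 - eigs[0])
    return delta

# Example: a 10 x 20 Gaussian matrix with unit-norm columns.
rng = np.random.default_rng(1)
A = rng.standard_normal((10, 20))
A /= np.linalg.norm(A, axis=0)        # normalize columns
print(k_ric(A, 2))
```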
B. Related Work

Perhaps the most direct approach for analyzing the RICs of the KR product matrix is to use the eigenvalue interlacing theorem [23], which relates the singular values of any k-column submatrix of the KR product of two matrices to the singular values of their Kronecker product. This is possible because any k columns of the KR product can together be interpreted as a submatrix of the Kronecker product. However, barring the maximum and minimum singular values of the Kronecker product, there is no available explicit characterization of its non-extremal singular values that can be used to obtain tight bounds for the k-RIC of the KR product. Bounding the RIC using the extreme singular values of the Kronecker product matrix turns out to be too loose to be useful. In this context, it is noteworthy that an upper bound for the k-RIC of the Kronecker product is derived in terms of the k-RICs of the input matrices in [4], [24]. However, the k-RIC of the Khatri-Rao product is yet to be analyzed.

Recently, [25], [26] gave probabilistic lower bounds for the minimum singular value of the columnwise KR product of two or more matrices. These bounds are limited to randomly constructed input matrices, and are polynomial in the matrix size. In [27], it is shown that for any two matrices A and B, the Kruskal rank² of A ⊙ B has a lower bound in terms of K-rank(A) and K-rank(B). In fact, K-rank(A ⊙ B) is at least as high as max(K-rank(A), K-rank(B)), thereby suggesting that A ⊙ B exhibits a stronger restricted isometry property than both A and B.

A closely related yet weaker notion of restricted isometry constant is the τ-robust K-rank, denoted by K-rank_τ. For a given matrix Φ, K-rank_τ(Φ) is defined as the largest k for which every k-column submatrix of Φ has its smallest singular value larger than 1/τ. In [25], it is shown that the τ-robust K-rank is super-additive, implying that the K-rank_τ of the Khatri-Rao product is strictly larger than the individual K-rank_τ's of the input matrices.

²The Kruskal rank or K-rank of a matrix A is the largest integer r such that any r columns of A are linearly independent.
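The Kruskal-rank facts quoted above can be checked numerically on toy examples. The following sketch is our own (with hypothetical helper names and tiny dimensions): it computes the K-rank by brute force and verifies, for one random instance, that K-rank(A ⊙ B) ≥ max(K-rank(A), K-rank(B)).

```python
import numpy as np
from itertools import combinations

def kruskal_rank(M, tol=1e-10):
    """Largest r such that every set of r columns of M is linearly independent."""
    m, p = M.shape
    for r in range(1, min(m, p) + 1):
        for S in combinations(range(p), r):
            if np.linalg.matrix_rank(M[:, S], tol=tol) < r:
                return r - 1
    return min(m, p)

def khatri_rao(A, B):
    return np.column_stack([np.kron(A[:, i], B[:, i]) for i in range(A.shape[1])])

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 6))
B = rng.standard_normal((3, 6))
kA, kB, kAB = kruskal_rank(A), kruskal_rank(B), kruskal_rank(khatri_rao(A, B))
print(kA, kB, kAB)   # expect kAB >= max(kA, kB); generically kAB reaches 6 here
```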

In [7], the isometry property of the Kronecker product A ⊗ B is analyzed in the ℓ₁-norm sense for the restricted class of input vectors expressible as vectorized d-distributed sparse matrices, wherein A and B are the adjacency matrices of two independent uniformly random δ-left regular bipartite graphs. In this paper, we instead analyze the restricted isometry of the columnwise Khatri-Rao product A ⊙ B, which is equivalent to the RIP of the Kronecker product A ⊗ B with respect to vectorized sparse diagonal matrices. In our work, we assume that the input matrices A and B are random with independent subgaussian elements.

C. Main Contributions

We derive two kinds of upper bounds on the k-RIC of the columnwise Khatri-Rao product of m × n sized matrices A and B. The bounds are briefly described below.

1) A deterministic upper bound for the k-RIC of A ⊙ B in terms of the k-RICs of the input matrices A and B. The bound is valid for k ≤ m, and for input matrices with unit ℓ₂-norm columns.

2a) A probabilistic upper bound for the k-RIC of A ⊙ B in terms of m, n and k, for real valued random matrices A, B containing i.i.d. subgaussian elements. The probabilistic bound is polynomially tight with respect to the input matrix dimension n.

2b) A probabilistic upper bound for the k-RIC of the self KR product A ⊙ A in terms of m, n, and k, where A is a real valued matrix containing i.i.d. subgaussian elements.

A key idea used in our RIC analysis is the fact (stated formally as Proposition 12) that, given any two matrices A and B, the Gram matrix of their Khatri-Rao product, (A ⊙ B)^H (A ⊙ B), can be interpreted as the Hadamard product (elementwise multiplication) of the Gram matrices A^H A and B^H B. The Hadamard product form turns out to be more analytically tractable than the columnwise Kronecker product form of the KR matrix.

Lately, in several machine learning problems, the necessary and sufficient conditions for successful signal recovery have been reported in terms of the RICs of a certain Khatri-Rao product matrix serving as a pseudo sensing matrix [8], [16]. In light of this, our proposed RIC bounds are quite timely, and pave the way towards obtaining order-wise sample complexity bounds in several fundamental learning problems.

The rest of this article is organized as follows. In Secs. II and III, we present our main results: deterministic and probabilistic RIC bounds, respectively, for a generic columnwise KR product matrix. Sec. III also discusses the RIP of the self Khatri-Rao product of a matrix with itself, an important matrix type encountered in the sparse diagonal covariance matrix estimation problem. In Secs. IV and VI, we present some background concepts required for proving the proposed RIC bounds. Secs. V and VII provide the detailed proofs of the deterministic and probabilistic RIC bounds, respectively. Final conclusions are presented in Sec. VIII.

Notation: In this work, bold lowercase letters are used for representing both random variables and vectors. Bold uppercase letters are reserved for matrices. The ℓ₂-norm of a vector x is denoted by ‖x‖₂. For an m × n matrix A, ‖A‖ denotes its operator norm, ‖A‖ ≜ sup_{x ∈ R^n, x ≠ 0} ‖Ax‖₂ / ‖x‖₂. The Hilbert–Schmidt (or Frobenius) norm of A is defined as ‖A‖_HS = ( Σ_{i=1}^m Σ_{j=1}^n |A_{i,j}|² )^{1/2}. The symbol [n] denotes the index set {1, 2, ..., n}. For any index set S ⊆ [n], A_S denotes the submatrix comprising the columns of A indexed by S. The matrices A ⊗ B, A ∘ B and A ⊙ B denote the Kronecker product, Hadamard product and columnwise Khatri-Rao product of A and B, respectively. A ≤ B implies that B − A is a positive semidefinite matrix. A^T, A^H, A^{−1}, and A^† denote the transpose, conjugate transpose, inverse and generalized matrix inverse of A, respectively. E(x) denotes the expectation of the random variable x. P(E) denotes the probability of event E. P(E|E₀) denotes the conditional probability of event E given that event E₀ has occurred.
II. DETERMINISTIC k-RIC BOUND

In this section, we present our first upper bound on the k-RIC of a generic columnwise KR product A ⊙ B, for any two same-sized matrices A and B with normalized columns. The bound is given in terms of the k-RICs of A and B.

Theorem 1. Let A and B be m × n sized real-valued matrices with unit ℓ₂-norm columns, satisfying the kth order restricted isometry property with constants δ_k^A and δ_k^B, respectively. Then, their columnwise Khatri-Rao product A ⊙ B satisfies the restricted isometry property with k-RIC at most δ², where δ ≜ max(δ_k^A, δ_k^B), i.e.,

(1 − δ²) ‖z‖₂² ≤ ‖(A ⊙ B)z‖₂² ≤ (1 + δ²) ‖z‖₂²   (10)

holds for all k-sparse vectors z ∈ R^n.

Proof. The proof is provided in Section V.

Remark 1: The RIC bound for A ⊙ B in Theorem 1 is relevant only when δ_k(A) and δ_k(B) lie in (0, 1), which is true only for k ≤ m. In other words, the above k-RIC characterization of A ⊙ B requires the input matrices A and B to be k-RIP compliant.

Remark 2: Since the input matrices A and B satisfy k-RIP with δ_k(A), δ_k(B) ∈ (0, 1), it follows from Theorem 1 that δ_k(A ⊙ B) is strictly smaller than max(δ_k(A), δ_k(B)). If B = A, the special case of the self Khatri-Rao product A ⊙ A arises, for which

δ_k(A ⊙ A) < δ_k²(A).   (11)

The above implies that the self Khatri-Rao product A ⊙ A is a better restricted isometry than A itself. This observation is in alignment with the expanding Kruskal rank and shrinking mutual coherence of the self Khatri-Rao product reported in [16]. In fact, for k = 2, the 2-RIC bound (11) exactly matches the mutual coherence bound shown in [16].

For k ∈ (m, m²], using ([28], Theorem 1), one can show that δ_k(A ⊙ B) ≤ (√k + 1) δ_√k, where δ_√k = max(δ_√k(A), δ_√k(B)). This bound, however, loses its tightness and quickly becomes unattractive for larger values of k. Finding a tighter k-RIC upper bound for the k > m case remains an open problem.

[Figure 1 appears here: two plots of the k-RICs versus n (top: k = 2, bottom: k = 3), showing δ_k^A, δ_k^B, δ_k^{A⊙B}, and the Theorem 1 bound for δ_k^{A⊙B}.]

Fig. 1. Variation of the k-RICs of A, B, A ⊙ B and the proposed upper bound with increasing input matrix dimensions. The top and bottom plots are for k = 2 and 3, respectively. Each data point is averaged over 10 trials.

To gauge the tightness of the proposed k-RIC bound for A ⊙ B, we present its simulation-based quantification for the case when the input matrices A and B are random Gaussian matrices with i.i.d. N(0, 1/m) entries. Fig. 1 plots δ_k(A), δ_k(B), δ_k(A ⊙ B) and the upper bound (max(δ_k(A), δ_k(B)))² for a range of input matrix dimensions m. The aspect ratio m/n of the input matrices is fixed to 0.5.³ For computational tractability, we restrict our analysis to the cases k = 2 and 3. The RICs δ_k(A), δ_k(B) and δ_k(A ⊙ B) are computed by exhaustively searching for the worst conditioned submatrix comprising k columns of A, B and A ⊙ B, respectively. From Fig. 1, we observe that the proposed k-RIC upper bound becomes tighter as the input matrices grow in size. Interestingly, the experiments suggest that the KR product A ⊙ B satisfies k-RIP in spite of the input matrices A and B failing to do so. A theoretical confirmation of this empirical observation is an interesting open problem.

³While the m × n matrices A and B may represent highly underdetermined linear systems (when m ≪ n), their m² × n sized Khatri-Rao product A ⊙ B can become an overdetermined system. In fact, many covariance matching based sparse support recovery algorithms [16], [17], [29] exploit this fact to offer significantly better support reconstruction performance.
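A minimal version of the experiment behind Fig. 1 can be reproduced as follows. This is our own sketch, not the authors' code: it draws A and B with i.i.d. N(0, 1/m) entries, normalizes their columns so that Theorem 1 applies, and compares δ₂(A), δ₂(B) and δ₂(A ⊙ B) with the Theorem 1 bound (max(δ₂(A), δ₂(B)))².

```python
import numpy as np
from itertools import combinations

def khatri_rao(A, B):
    return np.column_stack([np.kron(A[:, i], B[:, i]) for i in range(A.shape[1])])

def k_ric(Phi, k):
    # Brute-force k-RIC: extreme eigenvalues of every k-column Gram submatrix.
    delta = 0.0
    for S in combinations(range(Phi.shape[1]), k):
        eigs = np.linalg.eigvalsh(Phi[:, S].T @ Phi[:, S])
        delta = max(delta, eigs[-1] - 1.0, 1.0 - eigs[0])
    return delta

rng = np.random.default_rng(3)
m, n, k = 20, 40, 2                       # aspect ratio m/n = 0.5, as in Fig. 1
A = rng.standard_normal((m, n)) / np.sqrt(m)
B = rng.standard_normal((m, n)) / np.sqrt(m)
A /= np.linalg.norm(A, axis=0)            # unit-norm columns (Theorem 1 assumption)
B /= np.linalg.norm(B, axis=0)
dA, dB = k_ric(A, k), k_ric(B, k)
dAB = k_ric(khatri_rao(A, B), k)
print(dA, dB, dAB, max(dA, dB) ** 2)      # typically dAB <= max(dA, dB)**2
```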

III. PROBABILISTIC k-RIC BOUND

The deterministic RIC bound for the columnwise KR product discussed in Section II is useful only when the input matrices have unit norm columns. While the use of column normalized sensing matrices is commonplace in compressive sensing, we are often interested in showing the restricted isometry of the Khatri-Rao product of randomly constructed input matrices whose columns are normalized only in the average sense. This concern is addressed by our second RIC bound, which is of probabilistic type and is applicable to the columnwise KR product of random input matrices containing i.i.d. subgaussian entries. Below, we define a subgaussian random variable and state some of its properties.

Definition 1 (Subgaussian Random Variable): A zero mean random variable x is called subgaussian if its tail probability is dominated by that of a Gaussian random variable. In other words, there exist constants C, K > 0 such that P(|x| ≥ t) ≤ C e^{−t²/K²} for t > 0.

Gaussian, Bernoulli and all bounded random variables are subgaussian random variables. For a subgaussian random variable, its pth order moment grows no faster than O(p^{p/2}) [30]. In other words, there exists K₁ > 0 such that

(E|x|^p)^{1/p} ≤ K₁ √p,   p ≥ 1.   (12)

The minimum such K₁ is called the subgaussian or ψ₂ norm of the random variable x, i.e.,

‖x‖_ψ₂ = sup_{p ≥ 1} p^{−1/2} (E|x|^p)^{1/p}.   (13)

Given a pair of random input matrices with i.i.d. subgaussian entries, Theorem 2 presents an upper bound on the k-RIC of their columnwise KR product.

Theorem 2. Suppose A and B are m × n random matrices with real i.i.d. subgaussian entries, such that EA_ij = 0, EA_ij² = 1, and ‖A_ij‖_ψ₂ ≤ κ, and similarly for B. Then, the kth order restricted isometry constant of (A/√m) ⊙ (B/√m), denoted by δ_k, satisfies δ_k ≤ δ with probability at least 1 − 10 n^{−2(γ−1)} for any γ > 1, provided that

m ≥ 4cγκ_o⁴ (k log n) / δ,

where κ_o = max(κ, 1) and c is a universal positive constant.

Proof. The proof is provided in Section VII.

The normalization constant √m used while computing the KR product (A/√m) ⊙ (B/√m) ensures that the columns of the input matrices A/√m and B/√m have unit average energy, i.e., E‖a_i/√m‖₂² = E‖b_i/√m‖₂² = 1 for 1 ≤ i ≤ n. Column normalization is a key assumption towards correctly modelling the isotropic, norm-preserving nature of the effective sensing matrix (1/m)(A ⊙ B), an attribute desired in most sensing matrices employed in practice.

Theorem 2 implies that

δ_k( (A/√m) ⊙ (B/√m) ) ≤ O( (k log n) / m )   (14)

with arbitrarily high probability. Thus, the above k-RIC bound decreases as m increases, which is intuitively appealing. Interestingly, for fixed k and n, the above k-RIC upper bound for (A/√m) ⊙ (B/√m) decays as O(1/m). This is a significant improvement over the O(1/√m) decay rate [11] already known for the individual k-RICs of the input subgaussian matrices A/√m and B/√m.

Thus, for any m, the Khatri-Rao product exhibits a stronger restricted isometry property, with significantly smaller k-RICs than the k-RICs of the input matrices.

In some applications, the effective sensing matrix is expressible as the self Khatri-Rao product X ⊙ X of a certain system matrix X with itself [16]. In Theorem 3 below, we present a k-RIC bound for self Khatri-Rao product matrices.

Theorem 3. Let A be an m × n random matrix with real i.i.d. subgaussian entries, such that EA_ij = 0, EA_ij² = 1, and ‖A_ij‖_ψ₂ ≤ κ. Then, the kth order restricted isometry constant of the column normalized self Khatri-Rao product (A/√m) ⊙ (A/√m) satisfies δ_k ≤ δ with probability at least 1 − 5n^{−2(γ−1)} for any γ ≥ 1, provided

m ≥ 4c₀γκ_o⁴ (k log n) / δ.

Here, κ_o = max(κ, 1) and c₀ > 0 is a universal constant.

Proof. See Appendix B.

Theorem 3 implies that

δ_k( (A/√m) ⊙ (A/√m) ) ≤ O( (k log n) / m )   (15)

with arbitrarily high probability. The above k-RIC bound for the self Khatri-Rao product scales with m, n, and k in a similar fashion as the asymmetric Khatri-Rao product.

Remark 3: From Theorem 3, the k-RIC of the columnwise Khatri-Rao product of two m × n sized subgaussian matrices can be guaranteed to be less than unity for m = O(k log n). We conjecture that the Khatri-Rao product continues to be k-RIP compliant even when m < k. However, proving such a result is beyond the scope of this work.

IV. PRELIMINARIES FOR DETERMINISTIC k-RIC BOUND

In this section, we present some preliminary concepts and results which are necessary for the derivation of the deterministic k-RIC bound in Theorem 1. For the sake of brevity, we provide proofs only for claims that have not been explicitly shown in their cited sources.

A. Properties of the Kronecker and Hadamard product

For any two matrices A and B of dimensions m × n and p × q, the Kronecker product A ⊗ B is the mp × nq matrix

A ⊗ B = [ a_{11}B  a_{12}B  ···  a_{1n}B ;
          a_{21}B  a_{22}B  ···  a_{2n}B ;
            ⋮        ⋮        ⋱      ⋮   ;
          a_{m1}B  a_{m2}B  ···  a_{mn}B ].   (16)

The following proposition relates the spectral properties of the Kronecker product and its constituent matrices.

Proposition 4 (7.1.10 in [2]). Let A ∈ R^{n×n} and B ∈ R^{p×p} admit eigenvalue decompositions U_A Λ_A U_A^T and U_B Λ_B U_B^T, respectively. Then, (U_A ⊗ U_B)(Λ_A ⊗ Λ_B)(U_A ⊗ U_B)^T yields the eigenvalue decomposition of A ⊗ B.

For any two matrices of matching dimensions, say m × n, their Hadamard product A ∘ B is obtained by elementwise multiplication of the entries of the input matrices, i.e.,

[A ∘ B]_{i,j} = a_{ij} b_{ij}   for i ∈ [m], j ∈ [n].   (17)

The Hadamard product A ∘ B is a principal submatrix of the Kronecker product A ⊗ B [3], [31]. For n × n sized square matrices A and B, one can write

A ∘ B = J^T (A ⊗ B) J,   (18)

where J is an n² × n sized selection matrix constructed entirely from 0's and 1's which satisfies J^T J = I_n.

In Proposition 5, we present an upper bound on the spectral radius of a generic Hadamard product.

Proposition 5. For every A, B ∈ R^{m×n}, we have

σ_max(A ∘ B) ≤ r_max(A) c_max(B),   (19)

where σ_max(·), r_max(·) and c_max(·) are the largest singular value, the largest row ℓ₂-norm and the largest column ℓ₂-norm of the input matrix, respectively.

Proof. See Theorem 5.5.3 in [32].

We now state an important result about the Hadamard product of two positive semidefinite matrices.

Proposition 6 (Mond and Pečarić [33]). Let A and B be positive semidefinite n × n Hermitian matrices and let r and s be two nonzero integers such that s > r. Then,

(A^s ∘ B^s)^{1/s} ≥ (A^r ∘ B^r)^{1/r}.   (20)

In Propositions 7 and 8, we state some spectral properties of correlation matrices and their Hadamard products. Correlation matrices are symmetric positive semidefinite matrices with diagonal entries equal to one. Later on, we will exploit the fact that the singular values of the columnwise KR product are related to the singular values of the Hadamard product of certain correlation matrices.

Proposition 7. If A is an n × n correlation matrix, then A^{1/2} ∘ A^{1/2} is a doubly stochastic matrix.

Proof. See Appendix A.

Proposition 8 (Werner [34]). For any correlation matrices A and B of the same size, we have A^{1/2} ∘ B^{1/2} ≤ I, where A^{1/2} and B^{1/2} are the positive square roots of A and B, respectively.

Proof. Since A is a correlation matrix, from Proposition 7 it follows that A^{1/2} ∘ A^{1/2} is doubly stochastic. Since the rows and columns of A^{1/2} ∘ A^{1/2} sum to unity, we have r_max(A^{1/2}) = c_max(A^{1/2}) = 1. Similarly, r_max(B^{1/2}) = c_max(B^{1/2}) = 1. Then, from Proposition 5, it follows that the largest eigenvalue of A^{1/2} ∘ B^{1/2} is at most unity.
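Propositions 7 and 8 are easy to verify numerically. The sketch below is our own illustration (the correlation-matrix generator and helper names are assumptions, not from the paper): it checks that A^{1/2} ∘ A^{1/2} has unit row and column sums and that the Hadamard product of two such square roots has spectral norm at most one.

```python
import numpy as np

def random_correlation(n, rng):
    """Random n x n correlation matrix: a normalized Gram matrix of random vectors."""
    X = rng.standard_normal((2 * n, n))
    G = X.T @ X
    d = np.sqrt(np.diag(G))
    return G / np.outer(d, d)

def psd_sqrt(M):
    """Positive semidefinite square root via the eigenvalue decomposition."""
    w, V = np.linalg.eigh(M)
    return (V * np.sqrt(np.clip(w, 0, None))) @ V.T

rng = np.random.default_rng(4)
A, B = random_correlation(5, rng), random_correlation(5, rng)
H = psd_sqrt(A) * psd_sqrt(A)          # Hadamard product A^{1/2} o A^{1/2}
print(np.allclose(H.sum(axis=0), 1), np.allclose(H.sum(axis=1), 1))   # doubly stochastic (Prop. 7)
print(np.linalg.norm(psd_sqrt(A) * psd_sqrt(B), 2) <= 1 + 1e-10)      # spectral norm <= 1 (Prop. 8)
```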

B. Matrix Kantorovich Inequalities

Matrix Kantorovich inequalities relate positive definite matrices by inequalities in the sense of the Löwner partial order.⁴ These inequalities can be used to extend the Löwner partial order to the Hadamard product of positive definite matrices. Our proposed RIC bound relies on the tightness of these Kantorovich inequalities and their extensions.

A matrix version of the Kantorovich inequality was first proposed by Marshall and Olkin in [35]. It is stated below as Proposition 9.

Proposition 9 (Marshall and Olkin [35]). Let A be an n × n positive definite matrix. Let A admit the Schur decomposition A = UΛU^T with unitary U and Λ = diag(λ_1, λ_2, ..., λ_n) such that λ_i ∈ [m, M]. Then, we have

A² ≤ (M + m)A − mM I_n.   (21)

The above inequality (21) is the starting point for obtaining a variety of forward and reverse Kantorovich-type matrix inequalities for positive definite matrices. In Propositions 10 and 11, we state specific forward and reverse inequalities, respectively, which are relevant to us.

Proposition 10 (Liu and Neudecker [36]). Let A be an n × n positive definite Hermitian matrix, with eigenvalues in [m, M]. Let V be an n × n matrix such that V^T V = I_n. Then,

V^T A² V − (V^T A V)² ≤ (1/4)(M − m)² I_n.   (22)

Proposition 11 (Liu and Neudecker [37]). Let A and B be n × n positive definite matrices. Let m and M be the minimum and maximum eigenvalues of B^{1/2} A^{−1} B^{1/2}. Let X be an n × p matrix. Then, we have

(X^T B X)(X^T A X)^†(X^T B X) ≥ ( 4mM / (M + m)² ) X^T B A^{−1} B X.   (23)

Proposition 10 can be proved using (21) by pre- and post-multiplying by V^T and V, respectively, followed by completion of squares for the right hand side terms. The proof of Proposition 11 is given in [37].

C. Kantorovich Matrix Inequalities for the Hadamard Products of Positive Definite Matrices

Lemmas 1 and 2 stated below extend the Kantorovich inequalities from the previous subsection to Hadamard products.

Lemma 1 (Liu and Neudecker [36]). Let A and B be n × n positive definite matrices, with m and M denoting the minimum and maximum eigenvalues of A ⊗ B. Then, we have

A² ∘ B² ≤ (A ∘ B)² + (1/4)(M − m)² I_n.   (24)

Proof. Let J be the selection matrix such that J^T J = I_n and A ∘ B = J^T (A ⊗ B) J. Then, by applying Proposition 10 with A replaced by A ⊗ B, and V replaced by J, we obtain

J^T (A ⊗ B)² J − (J^T (A ⊗ B) J)² ≤ (1/4)(M − m)² I_n.

Using Fact 8.21.29 in [2], i.e., (A ⊗ B)² = A² ⊗ B², we get

J^T (A² ⊗ B²) J − (A ∘ B)² ≤ (1/4)(M − m)² I_n.

Finally, by observing that J^T (A² ⊗ B²) J = A² ∘ B², we obtain the desired result.

Lemma 2 (Liu [38]). Let A, B be n × n positive definite correlation matrices. Then,

A^{1/2} ∘ B^{1/2} ≥ ( 2√(mM) / (m + M) ) I_n,   (25)

where the eigenvalues of A and B lie inside [m, M].

Lemma 2 follows from Proposition 11 by replacing A with I_n ⊗ B, B with A^{1/2} ⊗ B^{1/2}, and X with J, where J is the n² × n binary selection matrix such that J^T J = I_n and A ∘ B = J^T (A ⊗ B) J.

⁴The Löwner partial order here refers to the relation "≤". For positive definite matrices A and B, A ≤ B if and only if B − A is a positive semidefinite matrix.

V. PROOF OF THE DETERMINISTIC k-RIC BOUND (THEOREM 1)

The key idea used in bounding the k-RIC of the columnwise KR product A ⊙ B is the observation that the Gram matrix of A ⊙ B can be interpreted as a Hadamard product of the two correlation matrices A^T A and B^T B, as stated in the following proposition.

Proposition 12 (Rao and Rao [39]). For A, B ∈ R^{m×n},

(A ⊙ B)^T (A ⊙ B) = (A^T A) ∘ (B^T B).   (26)

Proof. See Proposition 6.4.2 in [39].
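Proposition 12 is the algebraic identity that drives the rest of the proof, and it is easy to confirm numerically. The following sketch is our own check, with arbitrary small dimensions: it verifies that the Gram matrix of A ⊙ B coincides with the Hadamard product of the Gram matrices of A and B, as stated in (26).

```python
import numpy as np

def khatri_rao(A, B):
    return np.column_stack([np.kron(A[:, i], B[:, i]) for i in range(A.shape[1])])

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 6))
B = rng.standard_normal((4, 6))
AkB = khatri_rao(A, B)
lhs = AkB.T @ AkB                      # Gram matrix of A ⊙ B
rhs = (A.T @ A) * (B.T @ B)            # Hadamard product of the two Gram matrices
print(np.allclose(lhs, rhs))           # True, as stated in (26)
```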

Then, by using the forward and reverse Kantorovich matrix inequalities, we obtain the proposed upper bound for the k-RIC of A ⊙ B in Theorem 1, as explained in the following arguments.

Without loss of generality, let S ⊂ [n] be an arbitrary index set representing the nonzero support of z in (10), with |S| ≤ k. Let A_S denote the m × |S| submatrix of A constituting the |S| columns of A indexed by the set S. Let B_S be constructed similarly. Since δ_k(A), δ_k(B) < 1, both A_S and B_S have full column rank, and consequently the associated Gram matrices A_S^T A_S and B_S^T B_S are positive definite. Further, since A and B have unit norm columns, both A_S^T A_S and B_S^T B_S are correlation matrices with unit diagonal entries. Using Proposition 12, we can write

(A_S ⊙ B_S)^T (A_S ⊙ B_S) = (A_S^T A_S) ∘ (B_S^T B_S).   (27)

Next, for k ≤ m, by applying Lemma 1 to the positive definite matrices (A_S^T A_S)^{1/2} and (B_S^T B_S)^{1/2}, we get

A_S^T A_S ∘ B_S^T B_S ≤ ( (A_S^T A_S)^{1/2} ∘ (B_S^T B_S)^{1/2} )² + (1/4)(M − m)² I_k
                       ≤ I_k + (1/4)(M − m)² I_k,   (28)

where the second inequality is a consequence of the unity bound on the spectral radius of the Hadamard product of correlation matrices, shown in Proposition 8. In (28), M and m are upper and lower bounds for the maximum and minimum eigenvalues of (A_S^T A_S)^{1/2} ⊗ (B_S^T B_S)^{1/2}, respectively. From the restricted isometry of A and B, and by application of Proposition 4, the minimum and maximum eigenvalues of (A_S^T A_S)^{1/2} ⊗ (B_S^T B_S)^{1/2} are lower and upper bounded by √((1 − δ_k^A)(1 − δ_k^B)) and √((1 + δ_k^A)(1 + δ_k^B)), respectively.

By introducing δ ≜ max(δ_k^A, δ_k^B), it is easy to check that the eigenvalues of (A_S^T A_S)^{1/2} ⊗ (B_S^T B_S)^{1/2} also lie inside the interval [1 − δ, 1 + δ]. Plugging m = 1 − δ and M = 1 + δ in (28), and by using (27), we get

(A_S ⊙ B_S)^T (A_S ⊙ B_S) ≤ (1 + δ²) I_k.   (29)

Similarly, by applying Lemma 2 to A_S^T A_S and B_S^T B_S with m = 1 − δ and M = 1 + δ, we obtain

(A_S^T A_S)^{1/2} ∘ (B_S^T B_S)^{1/2} ≥ √(1 − δ²) I_k.

From Proposition 6, we have A_S^T A_S ∘ B_S^T B_S ≥ ( (A_S^T A_S)^{1/2} ∘ (B_S^T B_S)^{1/2} )². Therefore, we can write

A_S^T A_S ∘ B_S^T B_S ≥ (1 − δ²) I_k.

Further, using (27), we get

(A_S ⊙ B_S)^T (A_S ⊙ B_S) ≥ (1 − δ²) I_k.   (30)

Finally, Theorem 1's statement follows from (29) and (30).

VI. PRELIMINARIES FOR PROBABILISTIC k-RIC BOUND

In this section, we briefly discuss some concentration results for functions of subgaussian random variables which will appear in the proofs of Theorems 2 and 3.

A. A Tail Probability for Subgaussian Vectors

The theorem below presents the Hanson-Wright inequality [40], [41], a tail probability for a quadratic form constructed using independent subgaussian random variables.

Theorem 13 (Rudelson and Vershynin [41]). Let x = (x_1, x_2, ..., x_n) ∈ R^n be a random vector with independent components x_i satisfying E x_i = 0 and ‖x_i‖_ψ₂ ≤ K. Let A be an n × n matrix. Then, for every t ≥ 0,

P( |x^T A x − E x^T A x| > t ) ≤ 2 exp( −c min( t² / (K⁴ ‖A‖_HS²), t / (K² ‖A‖) ) ),

where c is a universal positive constant.

The following corollary of the Hanson-Wright inequality concerns the concentration of the weighted inner product of two subgaussian vectors.

Corollary 1. Let u = (u_1, u_2, ..., u_n) ∈ R^n and v = (v_1, v_2, ..., v_n) ∈ R^n be independent random vectors with independent subgaussian components satisfying E u_i = E v_i = 0 and ‖u_i‖_ψ₂ ≤ K, ‖v_i‖_ψ₂ ≤ K. Let D be an n × n matrix. Then, for every t ≥ 0,

P( |u^T D v| > t ) ≤ 2 exp( −c min( t² / (K⁴ ‖D‖_HS²), t / (K² ‖D‖) ) ),

where c is a universal positive constant.

Proof. The desired tail bound is obtained by using the Hanson-Wright inequality in Theorem 13 with x = [u^T  v^T]^T and A = [ 0_{n×n}  D ; 0_{n×n}  0_{n×n} ].
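As a rough numerical illustration of the concentration that Corollary 1 formalizes (our own Monte-Carlo sketch, not from the paper, with arbitrary choices of D and sample sizes), the bilinear form u^T D v for independent standard Gaussian u and v concentrates around zero at a scale governed by the Hilbert–Schmidt norm of D.

```python
import numpy as np

rng = np.random.default_rng(6)
n, trials = 200, 10000
D = rng.standard_normal((n, n)) / n          # a fixed weighting matrix
hs_norm = np.linalg.norm(D, "fro")           # Hilbert-Schmidt norm of D

vals = np.empty(trials)
for t in range(trials):
    u = rng.standard_normal(n)
    v = rng.standard_normal(n)
    vals[t] = u @ D @ v                      # the bilinear form u^T D v

# Empirical tail probabilities at a few multiples of ||D||_HS; they decay rapidly.
for c in (1, 2, 3):
    print(c, np.mean(np.abs(vals) > c * hs_norm))
```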
VII. PROOF OF THE PROBABILISTIC k-RIC BOUND (THEOREM 2)

The proof of Theorem 2 starts with a variational definition of the k-RIC δ_k( (A/√m) ⊙ (B/√m) ), given below:

δ_k( (A/√m) ⊙ (B/√m) ) = sup_{z ∈ R^n, ‖z‖₂ = 1, ‖z‖₀ ≤ k} | ‖( (A/√m) ⊙ (B/√m) ) z‖₂² − 1 |.   (31)

In order to find a probabilistic upper bound for δ_k, we intend to find a constant δ ∈ (0, 1) such that P( δ_k( (A/√m) ⊙ (B/√m) ) ≥ δ ) is arbitrarily close to zero. We therefore consider the tail event

E ≜ { sup_{z ∈ R^n, ‖z‖₂ = 1, ‖z‖₀ ≤ k} | ‖( (A/√m) ⊙ (B/√m) ) z‖₂² − 1 | ≥ δ },   (32)

and show that for m sufficiently large, P(E) can be driven arbitrarily close to zero. In other words, the constant δ serves as a probabilistic upper bound for δ_k( (A/√m) ⊙ (B/√m) ). Let U_k denote the set of all k or less sparse unit norm vectors in R^n. Then, using Proposition 12, the tail event in (32) can be rewritten as

P(E) = P( sup_{z ∈ U_k} | z^T (A ⊙ B)^T (A ⊙ B) z − m² | ≥ δm² )
     = P( sup_{z ∈ U_k} | z^T ( A^T A ∘ B^T B ) z − m² | ≥ δm² )
     = P( sup_{z ∈ U_k} | Σ_{i=1}^n Σ_{j=1}^n z_i z_j (a_i^T a_j)(b_i^T b_j) − m² | ≥ δm² ),   (33)

where a_i and b_i denote the ith columns of A and B, respectively. Further, by applying the triangle inequality and the union bound, the above tail probability splits as

P(E) ≤ P( sup_{z ∈ U_k} | Σ_{i=1}^n z_i² ‖a_i‖₂² ‖b_i‖₂² − m² | ≥ αδm² )
      + P( sup_{z ∈ U_k} | Σ_{i=1}^n Σ_{j=1, j≠i}^n z_i z_j (a_i^T a_j)(b_i^T b_j) | ≥ (1 − α)δm² ).   (34)

In the above, α ∈ (0, 1) is a variational union bound parameter which can be optimized at a later stage. We now proceed to find separate upper bounds for each of the two probability terms in (34).

The first probability term in (34) admits the following sequence of relaxations:

P( sup_{z ∈ U_k} | Σ_{i=1}^n z_i² ‖a_i‖₂² ‖b_i‖₂² − m² | ≥ αδm² )
 (a) ≤ P( sup_{z ∈ U_k} Σ_{i=1}^n z_i² | ‖a_i‖₂² ‖b_i‖₂² − m² | ≥ αδm² )
 (b) ≤ P( max_{1 ≤ i ≤ n} | ‖a_i‖₂² ‖b_i‖₂² − m² | ≥ αδm² )
 (c) = P( ∪_{1 ≤ i ≤ n} { | ‖a_i‖₂² ‖b_i‖₂² − m² | ≥ αδm² } )
 (d) ≤ n P( | ‖a_1‖₂² ‖b_1‖₂² − m² | ≥ αδm² )
 (e) ≤ n P( | ‖a_1‖₂² − m | | ‖b_1‖₂² − m | ≥ αβδm² ) + 2n P( | ‖a_1‖₂² − m | ≥ α(1 − β)δm/2 )
 (f) ≤ 2n P( | ‖a_1‖₂² − m | ≥ √(αβδ) m ) + 2n P( | ‖a_1‖₂² − m | ≥ α(1 − β)δm/2 )
 (g) ≤ 4n P( | ‖a_1‖₂² − m | ≥ (αδm/2)(1 − αδ/4) )
 (h) ≤ 8n exp( −c m α²δ²(1 − αδ/4)² / (4κ_o⁴) )
     = 8 n^{ −( c m α²δ²(1 − αδ/4)² / (4κ_o⁴ log n) − 1 ) }.   (35)

In the above, step (a) follows from the triangle inequality combined with the fact that the z_i²'s sum to one. The inequality in step (b) is a consequence of the fact that any nonnegative convex combination of n arbitrary numbers is at most the maximum among the n numbers. Step (c) is obtained by simply rewriting the tail event for the maximum of n random variables as the union of the tail events for the individual random variables. Step (d) is the application of the union bound over the values of the index i ∈ [n], exploiting the i.i.d. nature of the columns of A and B. Step (e) is the union bound combined with the fact that for any two vectors a, b ∈ R^m, the following triangle inequality holds:

| ‖a‖₂² ‖b‖₂² − m² | ≤ | ‖a‖₂² − m | | ‖b‖₂² − m | + m | ‖a‖₂² − m | + m | ‖b‖₂² − m |.

In step (e), β ∈ (0, 1) is a variational union bound parameter. Step (f) is once again the union bound, which exploits the fact that the columns a_1 and b_1 are identically distributed. Step (g) is obtained by setting β = αδ/4. Lastly, step (h) is the Hanson-Wright inequality (Theorem 13) applied to the subgaussian vector a_1.

Next, we turn our attention to the second tail probability term in (34). We note that

sup_{z ∈ U_k} | Σ_{i=1}^n Σ_{j=1, j≠i}^n z_i z_j (a_i^T a_j)(b_i^T b_j) |
 ≤ sup_{z ∈ U_k} Σ_{i=1}^n Σ_{j=1, j≠i}^n |z_i z_j| |a_i^T a_j| |b_i^T b_j|
 ≤ sup_{z ∈ U_k} ( Σ_{i=1}^n Σ_{j=1, j≠i}^n |z_i z_j| ) ( max_{i, j ∈ supp(z), i≠j} |a_i^T a_j| |b_i^T b_j| )
 ≤ k ( max_{i, j ∈ [n], i≠j} |a_i^T a_j| |b_i^T b_j| ),   (36)

where the second step is the application of Hölder's inequality. The last step uses the fact that ‖z‖₁ ≤ √k for z ∈ U_k. Using (36), and by applying the union bound over the n² possible distinct (i, j) pairs, the second probability term in (34) can be bounded as

P( sup_{z ∈ U_k} | Σ_{i=1}^n Σ_{j=1, j≠i}^n z_i z_j (a_i^T a_j)(b_i^T b_j) | ≥ (1 − α)δm² )
 ≤ n² P( |a_1^T a_2| |b_1^T b_2| ≥ (1 − α)δm²/k )
 ≤ 2n² P( |a_1^T a_2| ≥ √((1 − α)δ/k) m )
 ≤ 2n² exp( −c(1 − α)δm / (κ_o⁴ k) ) = 2 n^{ −( c(1 − α)δm / (κ_o⁴ k log n) − 2 ) }.   (37)

The last inequality in the above is obtained by using the tail bound for |a_1^T a_2| from Corollary 1. Finally, by combining (34), (35) and (37), and setting α = 1/2, we obtain the following simplified tail bound:

P(E) ≤ 8 n^{ −( c m δ²(1 − δ/8)² / (16κ_o⁴ log n) − 1 ) } + 2 n^{ −( cδm / (2κ_o⁴ k log n) − 2 ) }.   (38)

From (38), for m > max( 4γκ_o⁴ k log n / (cδ), 32γκ_o⁴ log n / (cδ²(1 − δ/8)²) ) and any γ > 1, we have P(E) < 10 n^{−2(γ−1)}. Note that, in terms of k and n, the first term in the inequality for m scales as k log n; it dominates the second term, which scales as log n. This ends our proof.

VIII. CONCLUSIONS

In this work, we have analyzed the restricted isometry property of the columnwise Khatri-Rao product matrix in terms of its restricted isometry constants. We gave two upper bounds for the k-RIC of a generic columnwise Khatri-Rao product matrix. The first k-RIC bound, a deterministic bound, is valid for the Khatri-Rao product of an arbitrary pair of input matrices of the same size with normalized columns. It is conveniently computed in terms of the k-RICs of the input matrices. We also gave a probabilistic RIC bound for the columnwise KR product of a pair of random matrices with i.i.d. subgaussian entries. The probabilistic RIC bound is one of the key components needed for computing sample complexity bounds for several machine learning algorithms.

The analysis of the RIP of Khatri-Rao product matrices in this article can be extended in multiple ways. The current RIC bounds can be extended to the Khatri-Rao product of three or more matrices. More importantly, in order to relate the RICs to the dimensions of the input matrices, we had to resort to the randomness in their entries. Removing this randomness aspect of our results could be an interesting direction for future work.

APPENDIX

A. Proof of Proposition 7

Proof. Since A is a correlation matrix, it admits the Schur decomposition A = UΛU^T, with unitary U and eigenvalue matrix Λ = diag(λ_1, λ_2, ..., λ_n). Since A is positive semidefinite, its nonnegative square root exists and is given by A^{1/2} = UΛ^{1/2}U^T. Consider

A^{1/2} ∘ A^{1/2} = ( Σ_{i=1}^n λ_i^{1/2} u_i u_i^T ) ∘ ( Σ_{j=1}^n λ_j^{1/2} u_j u_j^T )
 = Σ_{i=1}^n Σ_{j=1}^n λ_i^{1/2} λ_j^{1/2} ( u_i u_i^T ) ∘ ( u_j u_j^T )
 = Σ_{i=1}^n Σ_{j=1}^n λ_i^{1/2} λ_j^{1/2} ( u_i ∘ u_j )( u_i ∘ u_j )^T.   (39)

The second equality above follows from the distributive property of the Hadamard product, and the last step follows from Fact 7.6.2 in [2]. Using (39), we can show that the rows and columns of A^{1/2} ∘ A^{1/2} sum to one, as follows:

( A^{1/2} ∘ A^{1/2} ) 1 = Σ_{i=1}^n Σ_{j=1}^n λ_i^{1/2} λ_j^{1/2} ( u_i ∘ u_j )( u_i ∘ u_j )^T 1
 = Σ_{i=1}^n Σ_{j=1}^n λ_i^{1/2} λ_j^{1/2} ( u_i ∘ u_j )( u_i ∘ 1 )^T u_j
 = Σ_{i=1}^n Σ_{j=1}^n λ_i^{1/2} λ_j^{1/2} ( u_i ∘ u_j ) u_i^T u_j
 = Σ_{i=1}^n λ_i ( u_i ∘ u_i ) ≜ d  (say).

The above arguments follow from the orthonormality of the columns of U, and repeated application of Fact 7.6.1 in [2]. Note that for k ∈ [n], d(k) = Σ_{i=1}^n λ_i (u_i(k))² = [UΛU^T]_{kk} = A_{kk} = 1. Thus, we have shown that (A^{1/2} ∘ A^{1/2}) 1 = 1. Likewise, it can be shown that 1^T (A^{1/2} ∘ A^{1/2}) = 1^T. Thus, A^{1/2} ∘ A^{1/2} is doubly stochastic.

B. Proof of Theorem 3

Proof. The proof of Theorem 3 follows along similar lines as that of Theorem 2. We consider the tail event

E₁ ≜ { sup_{z ∈ U_k} | ‖( (A/√m) ⊙ (A/√m) ) z‖₂² − 1 | ≥ δ },   (40)

and show that for sufficiently large m, P(E₁) can be driven arbitrarily close to zero, thereby implying that δ is a probabilistic upper bound for δ_k( (A/√m) ⊙ (A/√m) ). Once again, U_k denotes the set of all k or less sparse unit norm vectors in R^n. We note that P(E₁) admits the following union bound:

P(E₁) = P( sup_{z ∈ U_k} | z^T (A ⊙ A)^T (A ⊙ A) z − m² | ≥ δm² )
 = P( sup_{z ∈ U_k} | z^T ( A^T A ∘ A^T A ) z − m² | ≥ δm² )
 = P( sup_{z ∈ U_k} | Σ_{i=1}^n Σ_{j=1}^n z_i z_j (a_i^T a_j)² − m² | ≥ δm² )
 ≤ P( sup_{z ∈ U_k} | Σ_{i=1}^n z_i² ( ‖a_i‖₂⁴ − m² ) | ≥ αδm² )
  + P( sup_{z ∈ U_k} | Σ_{i=1}^n Σ_{j=1, j≠i}^n z_i z_j (a_i^T a_j)² | ≥ (1 − α)δm² ).   (41)

In the above, the second identity follows from Proposition 12. The last inequality uses the triangle inequality followed by the union bound, with α ∈ (0, 1) being a variational parameter to be optimized later. Similar to the proof of Theorem 2, we now derive separate upper bounds for each of the two probability terms in (41).

The first term in (41) admits the following series of relaxations:

P( sup_{z ∈ U_k} | Σ_{i=1}^n z_i² ( ‖a_i‖₂⁴ − m² ) | ≥ αδm² )
 (a) ≤ P( sup_{z ∈ U_k} Σ_{i=1}^n z_i² | ‖a_i‖₂⁴ − m² | ≥ αδm² )
 (b) ≤ P( max_{1 ≤ i ≤ n} | ‖a_i‖₂⁴ − m² | ≥ αδm² )
 (c) ≤ n P( | ‖a_1‖₂⁴ − m² | ≥ αδm² )
 (d) ≤ n P( | ‖a_1‖₂² − m |² ≥ αβδm² ) + n P( | ‖a_1‖₂² − m | ≥ α(1 − β)δm/2 )
 (e) ≤ n P( | ‖a_1‖₂² − m | ≥ √(αβδ) m ) + n P( | ‖a_1‖₂² − m | ≥ α(1 − β)δm/2 )
 (f) ≤ 2n P( | ‖a_1‖₂² − m | ≥ (αδm/2)(1 − αδ/4) )
 (g) ≤ 4n exp( −c m α²δ²(1 − αδ/4)² / (4κ_o⁴) )
     = 4 n^{ −( c m α²δ²(1 − αδ/4)² / (4κ_o⁴ log n) − 1 ) }.   (42)

In the above, step (a) is the triangle inequality. The inequality in step (b) follows from the fact that a nonnegative convex combination of n arbitrary numbers is at most the maximum among the n numbers. Step (c) is a union bound. Step (d) is also a union bounding argument, with β ∈ (0, 1) as a variational parameter, combined with the fact that for any vector a, the triangle inequality | ‖a‖₂⁴ − m² | ≤ | ‖a‖₂² − m |² + 2m | ‖a‖₂² − m | is always true. Step (f) is obtained by choosing the union bound parameter β = αδ/4. Finally, step (g) is the Hanson-Wright inequality (Theorem 13) applied to the subgaussian vector a_1.

Next, we derive an upper bound for the second tail probability term in (41). We observe that

sup_{z ∈ U_k} | Σ_{i=1}^n Σ_{j=1, j≠i}^n z_i z_j (a_i^T a_j)² |
 ≤ sup_{z ∈ U_k} Σ_{i=1}^n Σ_{j=1, j≠i}^n |z_i z_j| (a_i^T a_j)²
 ≤ sup_{z ∈ U_k} ( Σ_{i=1}^n Σ_{j=1, j≠i}^n |z_i z_j| ) ( max_{i, j ∈ supp(z), i≠j} (a_i^T a_j)² )
 ≤ k ( max_{i, j ∈ [n], i≠j} (a_i^T a_j)² ),   (43)

where the second step is the application of Hölder's inequality. The last step uses the fact that ‖z‖₁ ≤ √k for z ∈ U_k. Using (43), and by applying the union bound over the n² possible distinct (i, j) pairs, the second probability term in (41) can be bounded as

P( sup_{z ∈ U_k} | Σ_{i=1}^n Σ_{j=1, j≠i}^n z_i z_j (a_i^T a_j)² | ≥ (1 − α)δm² )
 ≤ n² P( (a_1^T a_2)² ≥ (1 − α)δm²/k )
 = n² P( |a_1^T a_2| ≥ √((1 − α)δ/k) m )
 ≤ n² exp( −c(1 − α)δm / (κ_o⁴ k) ) = n^{ −( c(1 − α)δm / (κ_o⁴ k log n) − 2 ) }.   (44)

The last inequality in the above is obtained by using the tail bound for |a_1^T a_2| from Corollary 1. Finally, by combining (41), (42) and (44), and setting α = 1/2, we obtain the following tail bound:

P(E₁) ≤ 4 n^{ −( c m δ²(1 − δ/8)² / (16κ_o⁴ log n) − 1 ) } + n^{ −( cδm / (2κ_o⁴ k log n) − 2 ) }.   (45)

From (45), we can conclude that for m > max( 4γκ_o⁴ k log n / (cδ), 32γκ_o⁴ log n / (cδ²(1 − δ/8)²) ) and any γ > 1, we have P(E₁) < 5 n^{−2(γ−1)}. Note that, in terms of k and n, the first term in the inequality for m scales as (k log n)/δ; it dominates the second term, which scales as log n.

REFERENCES

[1] C. G. Khatri and C. R. Rao, "Solutions to some functional equations and their applications to characterization of probability distributions," Sankhya: The Indian Journal of Statistics, Series A (1961-2002), vol. 30, no. 2, pp. 167–180, 1968.
[2] D. S. Bernstein, Matrix Mathematics: Theory, Facts, and Formulas (Second Edition). Princeton University Press, 2009.
[3] S. Liu and G. Trenkler, "Hadamard, Khatri-Rao, Kronecker and other matrix products," International Journal of Information and System Sciences, vol. 4, no. 1, pp. 160–177, 2007.
[4] M. F. Duarte and R. G. Baraniuk, "Kronecker compressive sensing," IEEE Trans. Image Process., vol. 21, no. 2, pp. 494–504, Feb. 2012.
[5] A. Koochakzadeh and P. Pal, "Sparse source localization in presence of co-array perturbations," in International Conference on Sampling Theory and Applications (SampTA), May 2015, pp. 563–567.
[6] D. Romero, D. D. Ariananda, Z. Tian, and G. Leus, "Compressive covariance sensing: Structure-based compressive sensing beyond sparsity," IEEE Signal Process. Mag., vol. 33, no. 1, pp. 78–93, Jan. 2016.
[7] G. Dasarathy, P. Shah, B. N. Bhaskar, and R. D. Nowak, "Sketching sparse matrices, covariances, and graphs via tensor products," IEEE Trans. Inf. Theory, vol. 61, no. 3, pp. 1373–1388, Mar. 2015.
[8] W. K. Ma, T. H. Hsieh, and C. Y. Chi, "DOA estimation of quasi-stationary signals with less sensors than sources and unknown spatial noise covariance: A Khatri-Rao subspace approach," IEEE Trans. Signal Process., vol. 58, no. 4, pp. 2168–2180, Apr. 2010.
[9] N. D. Sidiropoulos and A. Kyrillidis, "Multi-way compressed sensing for sparse low-rank tensors," IEEE Signal Process. Lett., vol. 19, no. 11, pp. 757–760, Nov. 2012.
[10] D. L. Donoho, M. Elad, and V. N. Temlyakov, "Stable recovery of sparse overcomplete representations in the presence of noise," IEEE Trans. Inf. Theory, vol. 52, no. 1, pp. 6–18, Jan. 2006.
[11] S. Foucart and H. Rauhut, A Mathematical Introduction to Compressive Sensing. Birkhäuser Basel, 2013.
[12] E. J. Candès, J. K. Romberg, and T. Tao, "Stable signal recovery from incomplete and inaccurate measurements," Communications on Pure and Applied Mathematics, vol. 59, no. 8, pp. 1207–1223, 2006.
[13] E. J. Candès and T. Tao, "Decoding by linear programming," IEEE Trans. Inf. Theory, vol. 51, no. 12, pp. 4203–4215, Dec. 2005.
[14] R. Baraniuk, M. Davenport, R. DeVore, and M. Wakin, "A simple proof of the restricted isometry property for random matrices," Constructive Approximation, vol. 28, no. 3, pp. 253–263, 2008.
[15] T. T. Cai, L. Wang, and G. Xu, "New bounds for restricted isometry constants," IEEE Trans. Inf. Theory, vol. 56, no. 9, pp. 4388–4394, Sept. 2010.
[16] P. Pal and P. P. Vaidyanathan, "Pushing the limits of sparse support recovery using correlation information," IEEE Trans. Signal Process., vol. 63, no. 3, pp. 711–726, Feb. 2015.
[17] D. P. Wipf and B. D. Rao, "An empirical Bayesian strategy for solving the simultaneous sparse approximation problem," IEEE Trans. Signal Process., vol. 55, no. 7, pp. 3704–3716, Jul. 2007.
[18] S. Khanna and C. R. Murthy, "On the support recovery of jointly sparse Gaussian sources using sparse Bayesian learning," CoRR, vol. abs/1703.04930. [Online]. Available: http://arxiv.org/abs/1703.04930
[19] S. P. Chepuri and G. Leus, "Graph sampling for covariance estimation," IEEE Transactions on Signal and Information Processing over Networks, vol. 3, no. 3, pp. 451–466, Sept. 2017.
[20] N. D. Sidiropoulos and A. Kyrillidis, "Multi-way compressed sensing for sparse low-rank tensors," IEEE Signal Processing Letters, vol. 19, no. 11, pp. 757–760, Nov. 2012.
[21] D. Nion and N. D. Sidiropoulos, "A PARAFAC-based technique for detection and localization of multiple targets in a MIMO radar system," in Proc. ICASSP, 2009, pp. 2077–2080.
[22] A. M. Tillmann and M. E. Pfetsch, "The computational complexity of the restricted isometry property, the nullspace property, and related concepts in compressed sensing," IEEE Trans. Inf. Theory, vol. 60, no. 2, pp. 1248–1259, Feb. 2014.
[23] R. A. Horn and C. R. Johnson, Eds., Matrix Analysis. Cambridge University Press, 1986.
[24] S. Jokar, "Sparse recovery and Kronecker products," in 44th Annual Conference on Information Sciences and Systems, Mar. 2010, pp. 1–4.
[25] A. Bhaskara, M. Charikar, A. Moitra, and A. Vijayaraghavan, "Smoothed analysis of tensor decompositions," in Proceedings of the 46th Annual ACM Symposium on Theory of Computing, 2014, pp. 594–603.
[26] J. Anderson, M. Belkin, N. Goyal, L. Rademacher, and J. R. Voss, "The more, the merrier: the blessing of dimensionality for learning large Gaussian mixtures," in 27th Annual Conference on Learning Theory, Jun. 2014, pp. 1135–1164.
[27] N. D. Sidiropoulos and R. Bro, "On the uniqueness of multilinear decomposition of N-way arrays," Journal of Chemometrics, vol. 14, no. 3, pp. 229–239, 2000.
[28] P. Koiran and A. Zouzias, "Hidden cliques and the certification of the restricted isometry property," IEEE Trans. Inf. Theory, vol. 60, no. 8, pp. 4999–5006, Aug. 2014.
[29] S. Khanna and C. R. Murthy, "Rényi divergence based covariance matching pursuit of joint sparse support," in Sig. Process. Advances in Wireless Commun. (SPAWC), Jul. 2017, pp. 1–5.
[30] R. Vershynin, "Introduction to the non-asymptotic analysis of random matrices," 2010.
[31] G. Visick, "A quantitative version of the observation that the Hadamard product is a principal submatrix of the Kronecker product," Linear Algebra and its Applications, vol. 304, no. 1–3, pp. 45–68, 2000.
[32] R. A. Horn and C. R. Johnson, Topics in Matrix Analysis. Cambridge; New York: Cambridge University Press, 1994.
[33] B. Mond and J. E. Pečarić, "Inequalities for the Hadamard product of matrices," SIAM J. Matrix Anal. Appl., vol. 19, no. 1, pp. 66–70, Jan. 1998.
[34] H. J. Werner, "Hadamard product of square roots of correlation matrices," Image, vol. 26, pp. 1–32, Apr. 2001.
[35] A. W. Marshall and I. Olkin, "Reversal of the Lyapunov, Hölder, and Minkowski inequalities and other extensions of the Kantorovich inequality," Journal of Mathematical Analysis and Applications, vol. 8, no. 3, pp. 503–514, 1964.

[36] S. Liu and H. Neudecker, "Several matrix Kantorovich-type inequalities," Journal of Mathematical Analysis and Applications, vol. 197, no. 1, pp. 23–26, 1996.
[37] S. Liu and H. Neudecker, "Kantorovich inequalities and efficiency comparisons for several classes of estimators in linear models," Statistica Neerlandica, vol. 51, no. 3, pp. 345–355, 1997.
[38] S. Liu, "On the Hadamard product of square roots of correlation matrices," Econometric Theory, vol. 18, no. 4, p. 1007, Aug. 2002.
[39] C. R. Rao and M. B. Rao, Matrix Algebra and Its Applications to Statistics and Econometrics. Singapore: World Scientific, 1998.
[40] R. Adamczak, "A note on the Hanson-Wright inequality for random vectors with dependencies," Electron. Commun. Probab., vol. 20, pp. 72–84, 2015.
[41] M. Rudelson and R. Vershynin, "Hanson-Wright inequality and subgaussian concentration," Electron. Commun. Probab., vol. 18, pp. 82–91, 2013.