
Tensor-Tensor Products with Invertible Linear Transforms

Eric Kernfeld^a, Misha Kilmer^a, Shuchin Aeron^b

^a Department of Mathematics, Tufts University, Medford, MA
^b Department of Electrical and Computer Engineering, Tufts University, Medford, MA

Abstract

Research in tensor representation and analysis has been rising in popularity in direct response to a) the increased ability of data collection systems to store huge volumes of multidimensional data and b) the recognition of potential modeling accuracy that can be provided by leaving the data and/or the operator in its natural, multidimensional form. In comparison to matrix theory, the study of tensors is still in its early stages. In recent work [1], the authors introduced the notion of the t-product, a generalization of matrix multiplication for tensors of order three, which can be extended to multiply tensors of arbitrary order [2]. The multiplication is based on a convolution-like operation, which can be implemented efficiently using the Fast Fourier Transform (FFT). The corresponding linear algebraic framework from the original work was further developed in [3], and it allows one to elegantly generalize all classical algorithms from linear algebra. In the first half of this paper, we develop a new tensor-tensor product for third-order tensors that can be implemented by using a discrete cosine transform, showing that a similar linear algebraic framework applies to this new tensor operation as well. The second half of the paper is devoted to extending this development in such a way that tensor-tensor products can be defined in a so-called transform domain for any invertible linear transform. We present the algebraic (modular) structure induced by the new multiplication and resulting norms. Finally, in the spirit of [4], we give a matrix-algebra based interpretation of the new tensor-tensor product, and discuss how to choose the transform defining the product from a design perspective. We demonstrate the applicability of our new framework on two problems: image deblurring and video compression.

Keywords: tensor, multiway, SVD, DCT, linear transformation

Preprint submitted to Linear Algebra and its Applications, July 28, 2014.

1. Introduction

Recently, the t-product has been introduced as a useful generalization of matrix multiplication for tensors of third order (i.e. three-way arrays) [1] and higher [2]. The t-product results in sensible definitions of transpose and orthogonality, and it allows for QR- and SVD-like decompositions [1], where the latter can be leveraged to derive an Eckart-Young like theorem for third-order tensors. Many matrix algorithms carry over simply by replacement of traditional matrix multiplication with the t-product [4]. The t-product has been applied to image deblurring [1], face recognition [5], completion of seismic data [6], and tensor compression [7]. As described in detail in [1], the t-product of a pair of third-order tensors is defined by unfolding tensors into block circulant matrices, multiplying the matrices, and folding the result back up into a three-dimensional array. In [1], it is shown that the t-product can be computed by performing a discrete Fourier transform along the mode-three fibers of each tensor (i.e. into the page), performing pair-wise matrix products for all frontal faces of the tensors in the “transform domain,” and then applying an inverse DFT along the mode-three fibers of the result. In [4, 3, 8], the authors exploit the fact that the t-product between 1 × 1 × n tensors is in fact equivalent to discrete convolution of two vectors, so that the t-product between a pair of third-order tensors amounts to the traditional matrix multiplication algorithm between two matrices whose entries are 1 × 1 × n tensors, where the usual product is replaced by discrete convolution. The paper [9] extends this viewpoint using convolutions indexed by abstract finite groups. It has been shown that convolution multiplication properties extend to different types of convolution [10], and there are corresponding structured matrix diagonalization results involving real-valued fast transforms [11]. Some of these transforms, such as the discrete cosine transform, are known to be highly useful in image processing [12].
Since imaging provides many existing applications of the t-product, in this paper our first goal is to develop a similar tensor-tensor product to be computed with the discrete cosine transform. This has the advantage that we can avoid complex arithmetic, which is required to use the t-product. In the new framework, we can also implement a set of boundary conditions that works better in image deblurring than the ones associated with the t-product [13]. Using the implementation of the t-product as motivation, we then take the natural step of investigating a family of tensor-tensor products defined in the transform domain for any invertible linear transform L. This leaves open the possibility of using transforms for L, such as orthogonal wavelets, that have no interpretation in terms of the convolution, and it adds considerable flexibility in modeling. Indeed, given a diagonalizable operator T on C^{1×1×n}, we can choose L so that for all x, T(x) can be rewritten as a “scalar” multiplication of x by another element of C^{1×1×n} (details are in Section 6). Since linear operators are diagonalizable with probability one, almost any linear operator can be represented solely by scalar multiplication.

This paper is organized as follows. In Section 2, we give notation and definitions, and review some background in order to properly motivate the first new tensor-tensor product, which we call the cosine transform product. The definition of the cosine transform product and consequences are described in Section 3. In Section 4, we abandon the spatial domain definition (which is a feature common to both the t-product and cosine transform products) and define a class of products by appealing to the transform domain. Section 5 is devoted to the generalization of some matrix decompositions and introduction of new norms for tensors. Section 6 discusses how to interpret a product using a given transform and how to come up with an appropriate transform given a desired model. In Section 7, we test the new tools on an application in image deblurring and an application in tensor compression. Conclusions and future work are discussed in Section 8.

2. Background and Notation

Due to the ubiquitous nature of multiway data in science and engineering applications, tensor theory and decompositions have crept into recent literature on signal processing [14, 15], Bayesian statistics [16, 17], and even systems biology [18]. The recent review article by Kolda and Bader [19] provides several illustrations of how much more difficult it is dealing with tensors of order three and higher. The many equivalent definitions of rank, despite their coincidence for matrices, result in genuinely different notions of rank for tensors. Higher-order generalizations of the SVD also take different forms, and none of these SVD generalizations can preserve all the properties of the matrix SVD. In particular, the HOSVD of [20] does not provide the best low rank approximation of a tensor, while a space-saving alternative known as the canonical decomposition can be difficult or impossible to compute. Until the recent work by Kilmer and Martin [1], there was no definition of a closed tensor-tensor product – other previously defined notions of such products either produced a contraction or an expansion of the orders of the tensors involved. In light of these difficulties, the t-product and associated algebraic constructs provide many comforts, among them an Eckart-Young like theorem and decompositions like QR and SVD similar to the matrix case. However, it should be noted that the t-product and associated decompositions are orientation-dependent (i.e. a permutation of the tensor followed by a decomposition does not yield a permutation of the original decomposition). On the other hand, many tensors are oriented in practice, and the t-product has found uses in image processing [1], face recognition [5], completion of seismic data [6], and tensor compression [7]. This paper outlines a family of tensor-tensor products, each of them orientation dependent but still potentially useful in applications.

Figure 1: Frontal slices of a tensor.

2.1. Notation and Terminology

We will denote scalars with regular lowercase font: e.g. d ∈ C. Vectors in C^n will be in lowercase with a vector symbol atop them, and matrices will be bold and uppercase: e⃗1, v⃗ ∈ C^n; X ∈ C^{n×n}. Tensors will be uppercase and calligraphic, with the exception of tubal scalars – that is, 1 × 1 × n tensors. These will be denoted with lowercase bold: x ∈ C^{1×1×n}. The structure imposed on C^{n1×n2×n} by our cosine product is different from the usual interpretation as a way of representing elements of tensor products of complex vector spaces [21]. To distinguish from so-called hypermatrices as representations of multilinear functionals, we adopt a variation on the notation from [8] and use K_n in place of C^{1×1×n} and K_n^{n1×n2} in place of C^{n1×n2×n}. This notation is consistent with the notion that a third-order tensor (of fixed orientation) can be viewed as an n1 × n2 matrix whose elements are the tube fibers a_ij ∈ K_n. Note we have used boldface on the lowercase letters to distinguish these elements, which are tubal scalars, from scalars. Likewise, an element of K_n^{n1} is a length n1 vector of tubal scalars. Such vectors will be calligraphic with arrows: C⃗ ∈ K_n^{n1}. We will also refer to these objects as oriented matrices. The frontal faces of a third order tensor A are obtained by slicing the tensor front to back (see Figure 1). For the sake of convenience, we adopt the convention of indexing frontal faces by labeling with a superscript in the top right. In other

words, the ith frontal face of A is given by A^{(i)}. In Matlab-style notation, A(:, :, i) has the same meaning as A^{(i)}.

Definition. The symbol ⊙ denotes a facewise product of two third order tensors, i.e. X = A ⊙ B if X^{(i)} = A^{(i)} B^{(i)} for every frontal face.

Many of our operations along tube fibers are equivalent to applying a linear operator to a vector. Thus, we introduce one more pair of operations that allow us to move between tube fibers and vectors.
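For concreteness, the facewise product ⊙ can be realized in a few lines of NumPy; the sketch below is ours (the helper name `facewise` is not part of the formal development), and it assumes the paper's orientation where the ith frontal face of a tensor `A` is the slice `A[:, :, i]`.

```python
import numpy as np

def facewise(A, B):
    """Facewise product X = A ⊙ B: multiply corresponding frontal faces.

    A has shape (l, m, n) and B has shape (m, p, n); the ith frontal
    face is the slice A[:, :, i].
    """
    # einsum contracts the shared index j face by face (k is the face index)
    return np.einsum('ijk,jlk->ilk', A, B)

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3, 4))
B = rng.standard_normal((3, 5, 4))
X = facewise(A, B)
# each frontal face of X is the ordinary matrix product of the matching faces
assert all(np.allclose(X[:, :, i], A[:, :, i] @ B[:, :, i]) for i in range(4))
```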

Definition. Let a ∈ K_n. Then vec(a) ∈ C^n is defined

vec(a)_i = a^{(i)}, i = 1, ..., n.

The following inverse operation to vec(⋅) moves vectors in C^n to tube fibers (i.e. elements of K_n).

Definition. Given a⃗ ∈ C^n, then a ∈ K_n with

a^{(i)} = a⃗_i.

2.2. Block Matrix Tools

Motivated by the definition of the t-product in [1], the cosine transform product ∗c is defined using block matrix foldings and unfoldings, which we outline below.

Definition. If A ∈ C^{n1×n2×n}, let A^{(1)}, A^{(2)}, ⋯, A^{(n)} denote its frontal faces. Then we will use mat(A) to denote the following n1n × n2n block Toeplitz-plus-Hankel matrix:

mat(A) ∶=
⎡ A^{(1)}   A^{(2)}    ⋯  A^{(n)}   ⎤   ⎡ A^{(2)}  ⋯        A^{(n)}  0        ⎤
⎢ A^{(2)}   A^{(1)}    ⋯  A^{(n−1)} ⎥ + ⎢ ⋮        ⋰        0        A^{(n)}  ⎥   (1)
⎢ ⋮         ⋱          ⋱  ⋮         ⎥   ⎢ A^{(n)}  0        ⋰        ⋮        ⎥
⎣ A^{(n)}   A^{(n−1)}  ⋯  A^{(1)}   ⎦   ⎣ 0        A^{(n)}  ⋯        A^{(2)}  ⎦

That is, the (i, j) block of the Toeplitz part is A^{(|i−j|+1)}, while the (i, j) block of the Hankel part is A^{(i+j)} when i + j ≤ n, zero when i + j = n + 1, and A^{(2n+2−i−j)} when i + j ≥ n + 2, where 0 denotes the zero matrix of size n1 × n2.

Let ten(⋅) be defined as the inverse of the mat operation: ten(mat(A)) = A. To see that the mat-operation is indeed invertible, and that the ten operation is well-defined, we have provided the algorithm for the ten-operation on an arbitrary n1n × n2n block matrix M as input.

Given M ∈ C^{n1n×n2n}, compute A = ten(M, n1, n2, n):
  A^{(n)} ← bottom left n1 × n2 block of M.
  For i = n−1, ..., 1
      A^{(i)} ← [i-th block of first (block) column of M] − A^{(i+1)}
  end
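The mat(⋅) and ten(⋅) operations are easy to prototype. The Python sketch below (function names ours) builds the block Toeplitz-plus-Hankel unfolding of equation (1), inverts it using only the first block column as in the algorithm above, and checks the round trip on a random tensor.

```python
import numpy as np

def mat(A):
    """Unfold an (n1, n2, n) tensor into the n1*n x n2*n block
    Toeplitz-plus-Hankel matrix of equation (1)."""
    n1, n2, n = A.shape
    M = np.zeros((n1 * n, n2 * n))
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            blk = A[:, :, abs(i - j)]            # Toeplitz part: A^(|i-j|+1)
            s = i + j
            if s <= n:                           # Hankel part: A^(i+j)
                blk = blk + A[:, :, s - 1]
            elif s >= n + 2:                     # Hankel part: A^(2n+2-i-j)
                blk = blk + A[:, :, 2 * n + 1 - s]
            # (when s == n + 1 the Hankel block is zero)
            M[(i - 1) * n1:i * n1, (j - 1) * n2:j * n2] = blk
    return M

def ten(M, n1, n2, n):
    """Invert mat() using only the first block column, as in the text."""
    A = np.zeros((n1, n2, n))
    A[:, :, n - 1] = M[(n - 1) * n1:n * n1, :n2]   # bottom-left block
    for i in range(n - 2, -1, -1):
        A[:, :, i] = M[i * n1:(i + 1) * n1, :n2] - A[:, :, i + 1]
    return A

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 3, 5))
assert np.allclose(ten(mat(A), 2, 3, 5), A)      # round trip recovers A
```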

3. Cosine transform product: definition and properties

We are now ready to define our first new tensor-tensor product between two third-order tensors of appropriate dimensions. We will call this product the cosine transform product to distinguish it from the t-product in [1]. Also in this section, we develop fast computation tools, and we give compatible ideas of Hermitian transpose, unitary tensors, a singular value decomposition, and commutative “scalars”.

Definition. Let the cosine transform product be the operator ∗c ∶ C^{n1×n2×n} × C^{n2×n4×n} → C^{n1×n4×n} that takes an n1 × n2 × n tensor A and an n2 × n4 × n tensor B and produces an n1 × n4 × n tensor as follows: A ∗c B = ten(mat(A) mat(B)).

In Section 4, we will endeavor to put a complete linear-algebraic framework around this operation. This is more elegantly handled if we first discuss efficient implementation of the cosine transform product.

3.1. Computation using a fast transform

Our goal in this section is to show that the cosine transform product can be computed without explicitly forming the block matrices in the definition. Rather, it can be determined more easily in so-called transform space, where the transforms that are applied make use of fast trigonometric transforms that can be implemented efficiently on a computer. The proof that this is possible is based on the definition of the cosine transform product as laid out using block matrix representation. First, note that if y is a 1 × 1 × n tensor, then mat(y) is a 1⋅n × 1⋅n Toeplitz-plus-Hankel matrix with the form (1) where the blocks are 1 × 1. Let Cn denote the n × n orthogonal DCT matrix as in [22].¹ Since it is orthogonal, Cn^{−1} = Cn^T. It is known (see [22]) that Cn mat(y) Cn^T = D is diagonal with diagonal elements di that can be computed from the DCT of the first column of mat(y); specifically, D = W^{−1} diag(Cn (mat(y) e⃗1)), where W = diag(Cn(∶, 1)) is the diagonal matrix made of the first column of the DCT matrix.

¹ This can be computed in Matlab using C = dct(eye(n)).

Now, mat(y) e⃗1 = (I + Z) vec(y), where Z is the n × n (singular) circulant upshift matrix.² Therefore, D = W^{−1} diag(Cn (I + Z) vec(y)). It is convenient for our purposes to store the eigenvalues on the diagonal of D as a tube fiber; we can do that by defining a linear operation on y.
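Before formalizing this operation, the diagonalization claims above can be checked numerically. The sketch below (variable names ours) uses SciPy's orthogonal DCT-II to build the DCT matrix, forms mat(y) for a random tube, confirms that Cn mat(y) Cn^T is diagonal, and recovers the eigenvalues as W^{−1} Cn (I + Z) vec(y).

```python
import numpy as np
from scipy.fft import dct

n = 8
rng = np.random.default_rng(2)
y = rng.standard_normal(n)

# mat(y) for a 1 x 1 x n tube: symmetric Toeplitz plus Hankel, as in eq. (1)
T = np.array([[y[abs(i - j)] for j in range(n)] for i in range(n)])
H = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        s = i + j + 2                       # 1-based block index i + j
        if s <= n:
            H[i, j] = y[s - 1]
        elif s >= n + 2:
            H[i, j] = y[2 * n + 1 - s]
My = T + H

Cn = dct(np.eye(n), axis=0, norm='ortho')   # orthogonal DCT matrix (Matlab dct(eye(n)))
D = Cn @ My @ Cn.T
assert np.allclose(D, np.diag(np.diag(D)))  # the DCT diagonalizes mat(y)

# eigenvalues from the first column: D = W^{-1} diag(Cn (I + Z) vec(y))
Z = np.diag(np.ones(n - 1), 1)              # circulant upshift (nilpotent part)
d = (Cn @ ((np.eye(n) + Z) @ y)) / Cn[:, 0]
assert np.allclose(np.diag(D), d)
```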

Definition. Let L ∶ Kn → Kn be defined as

(L(y))^{(i)} = [W^{−1} Cn (I + Z) vec(y)]_i.

Note that L is an invertible transform. Also, the cost of implementing L (working from right to left as a sequence of matrix-vector products) is O(n lg n) flops because a) I + Z is upper bidiagonal with ones on the bidiagonals, b) Cn times a vector can be implemented with a fast DCT in O(n lg n) flops, and c) applying W^{−1} can be done with point-wise products in O(n) flops. Likewise, computing L^{−1}(v) requires the matrix-vector product

(I + Z)^{−1} (Cn^T (W v⃗)) = (I + Z)^{−1} r⃗,

where the innermost product W v⃗ requires O(n) flops and can be done without explicitly forming W, and the second product, r⃗ = Cn^T (W v⃗), can be done with a call to the inverse DCT routine. The product (I + Z)^{−1} r⃗ is also computed without explicit calculation of (I + Z)^{−1} in O(n) flops, since

(I + Z) x⃗ = r⃗  ⇒  x⃗ = (I + Z)^{−1} r⃗

requires only a backward substitution with an upper bidiagonal matrix. Next, we specify what L means when applied to an ℓ × m × n tensor.

Definition. Let A be an ℓ × m × n tensor. Then L(A) = Â is the ℓ × m × n tensor whose tube fibers âij are computed according to

âij ∶= (Â)ij = L(aij), i = 1, . . . , ℓ; j = 1, . . . , m.

Since, as just noted, mat(y) is diagonalizable by the DCT, it is a straightforward exercise to show that premultiplication of a block matrix with the structure given in (1) by (Cn ⊗ I) and postmultiplication by (Cn^T ⊗ I), where ⊗ denotes the matrix Kronecker product, yields a block diagonal matrix. What we wish to make concrete is that this resulting block diagonal matrix has the frontal slices of L(A) on its diagonal.
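An O(n log n) implementation of L and L^{−1} along a single tube follows the recipe above: a bidiagonal multiply, a fast (inverse) DCT, a diagonal scaling, and, for the inverse, a backward substitution. A Python sketch (function names ours, assuming SciPy's orthogonal DCT):

```python
import numpy as np
from scipy.fft import dct, idct

def L(y):
    """L(y) = W^{-1} Cn (I + Z) vec(y) via a fast DCT: O(n log n)."""
    n = y.size
    t = np.append(y[:-1] + y[1:], y[-1])   # (I + Z) y, upper bidiagonal multiply
    e1 = np.zeros(n); e1[0] = 1.0
    w = dct(e1, norm='ortho')              # first column of Cn (diagonal of W)
    return dct(t, norm='ortho') / w

def Linv(v):
    """L^{-1}(v) = (I + Z)^{-1} Cn^T (W v): inverse DCT, then a
    backward substitution with the upper bidiagonal I + Z."""
    n = v.size
    e1 = np.zeros(n); e1[0] = 1.0
    w = dct(e1, norm='ortho')
    r = idct(w * v, norm='ortho')          # r = Cn^T (W v)
    x = np.empty(n)
    x[-1] = r[-1]
    for i in range(n - 2, -1, -1):         # solve (I + Z) x = r
        x[i] = r[i] - x[i + 1]
    return x

y = np.random.default_rng(3).standard_normal(16)
assert np.allclose(Linv(L(y)), y)          # L is invertible
```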

Lemma. The matrix mat(A) is block diagonalizable by the matrix pair Cn ⊗ I and Cn^{−1} ⊗ I, with diagonal blocks given by the ith frontal slices of L(A).

² In Matlab, this would be Z = diag(ones(n-1,1),1).

Proof. Follows easily from the observation that mat(A) = ∑_{i=1}^{n1} ∑_{j=1}^{n2} mat(aij) ⊗ e⃗i e⃗j^T, where aij ∈ R^{1×1×n} is a tube fiber of A.

With this established, we can prove the following lemma, which shows that the cosine transform product can easily be computed in the transform domain.

Lemma. The cosine transform product can be computed in the transform domain by n pairwise matrix products.

Proof.

mat(A ∗c B) = mat(A) mat(B)
= [Cn^{−1} ⊗ I][Cn ⊗ I] mat(A) [Cn^{−1} ⊗ I][Cn ⊗ I] mat(B) [Cn^{−1} ⊗ I][Cn ⊗ I]
= [Cn^{−1} ⊗ I] blockdiag(L(A) ⊙ L(B)) [Cn ⊗ I]
= mat(L^{−1}(L(A) ⊙ L(B)))

and so

A ∗c B = L^{−1}(L(A) ⊙ L(B)).
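This lemma can be exercised end to end: the block-matrix definition ten(mat(A)mat(B)) should agree with the transform-domain computation L^{−1}(L(A) ⊙ L(B)). The self-contained Python sketch below (helper names ours) applies the transform matrix M = W^{−1} Cn (I + Z) along tube fibers and compares the two routes.

```python
import numpy as np
from scipy.fft import dct

def mat(A):
    # block Toeplitz-plus-Hankel unfolding of eq. (1)
    n1, n2, n = A.shape
    M = np.zeros((n1 * n, n2 * n))
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            blk = A[:, :, abs(i - j)]
            s = i + j
            if s <= n:
                blk = blk + A[:, :, s - 1]
            elif s >= n + 2:
                blk = blk + A[:, :, 2 * n + 1 - s]
            M[(i - 1) * n1:i * n1, (j - 1) * n2:j * n2] = blk
    return M

def ten(M, n1, n2, n):
    A = np.zeros((n1, n2, n))
    A[:, :, n - 1] = M[(n - 1) * n1:, :n2]
    for i in range(n - 2, -1, -1):
        A[:, :, i] = M[i * n1:(i + 1) * n1, :n2] - A[:, :, i + 1]
    return A

n1, n2, n4, n = 2, 3, 4, 5
rng = np.random.default_rng(4)
A = rng.standard_normal((n1, n2, n))
B = rng.standard_normal((n2, n4, n))

# spatial-domain definition: fold the product of the unfoldings
C_spatial = ten(mat(A) @ mat(B), n1, n4, n)

# transform-domain computation: L^{-1}(L(A) ⊙ L(B))
Cn = dct(np.eye(n), axis=0, norm='ortho')
Z = np.diag(np.ones(n - 1), 1)
M_L = np.diag(1.0 / Cn[:, 0]) @ Cn @ (np.eye(n) + Z)   # matrix defining L
L = lambda T: T @ M_L.T            # apply M_L along every tube fiber (mode 3)
Linv = lambda T: T @ np.linalg.inv(M_L).T
Chat = np.einsum('ijk,jlk->ilk', L(A), L(B))           # facewise products
C_transform = Linv(Chat)

assert np.allclose(C_spatial, C_transform)
```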

3.2. Macro vs. micro perspective to computing ∗c products

The t-product in [1] was introduced in a similar way as we have introduced the ∗c product: as a matrix-matrix product between a structured matrix derived from the tensor on the left of the product and a matrix derived from the unfolding of the tensor on the right. We refer to this as the “macro” level view of the tensor-tensor product. In [8] and subsequently [4, 6, 7], a “micro” level view was taken whereby the t-product is equivalently derived by considering a third order tensor as a matrix of tubal scalars, leaving matrix multiplication untouched, but replacing scalar multiplication with the definition required to handle tubal scalars. Indeed, this micro level view holds for the ∗c product as well, as we now show.

Lemma. The (i, j)th entry of the cosine transform product of two appropriately sized tensors A, B has an interpretation as a dot-product over ∗c. That is,

(A ∗c B)ij = ∑_{k=1}^{n2} Aik ∗c Bkj.

Proof.

(A ∗c B)ij = L^{−1}([L(A) ⊙ L(B)]ij)
= L^{−1}(∑_{k=1}^{n2} L(A)ik ⊙ L(B)kj)
= ∑_{k=1}^{n2} L^{−1}(L(A)ik ⊙ L(B)kj)
= ∑_{k=1}^{n2} Aik ∗c Bkj.

We will revisit the significance of this interpretation in Section 6.

3.3. Computational Complexity

Practically, the ∗c product between A ∈ C^{ℓ×m×n}, B ∈ C^{m×p×n} can be computed using Algorithm 1. The steps consist of moving both tensors into the transform domain by applying the transform along tube fibers; computing n matrix-matrix products in the transform domain; and applying the inverse transform along tube fibers to obtain the result. Thus, the number of floating point operations required to compute the product is dominated (see Section **) by ℓm + mp applications of an n-length DCT and its inverse (at O(n lg n) flops each) and n matrix-matrix products at a cost of O(mℓp) flops each.

Algorithm 1: Tensor-Tensor product under ∗c
Given A ∈ C^{ℓ×m×n}, B ∈ C^{m×p×n}
  Â ← L(A) for L from Definition 3.1
  B̂ ← L(B) for L from Definition 3.1
  For i = 1, . . . , n
      Set Ĉ^{(i)} ∶= Â^{(i)} B̂^{(i)}
  End for
  Set C = L^{−1}(Ĉ)

3.4. Basic Algebraic Properties: Identity Element and Associativity

We now begin to uncover the algebraic properties that characterize the new operation. We first demonstrate the existence of an identity element. In fact, the identity element is the same as that given for the t-product in [1].

Definition. Let I ∈ R^{ℓ×ℓ×n} be such that I^{(1)} = Iℓ (the ℓ × ℓ identity matrix) and I^{(j)} = [0] for j ≠ 1.

Note that mat(I) is the nℓ × nℓ identity matrix. Thus, I ∗c A = ten(mat(I) mat(A)) = ten(mat(A)) = A and I is an identity element. (Omitted is the virtually identical proof that A ∗c I = A.) Like the t-product, the cosine transform product is associative. Based on Lemma 3.2 in the last subsection, we need only show that associativity is true for tubal scalars.

Lemma. The cosine transform product is associative over Kn.

Proof.

(a ∗c b) ∗c c = ten[mat(ten(mat(a) ⋅ mat(b))) mat(c)]
= ten(mat(a) mat(b) mat(c))
= ten(mat(a) mat(ten(mat(b) mat(c))))
= a ∗c (b ∗c c).

There are other algebraic concepts that can be readily developed, e.g. scalar commutativity, the Hermitian transpose, unitary tensors, inner products, and norms. However, the family of operators described in the next section includes the cosine transform product. Rather than setting out all the algebra twice, we prefer to first define the new family, then write a single set of proofs that will apply to the t-product, the cosine transform product, and the rest of the family. Thus, we proceed to the next section.

4. Tensor-tensor products with arbitrary invertible linear transforms

The definition of the cosine transform product introduced here, as well as the definition of the t-product in [1], were predicated on forming the appropriate block-structured matrix, which could then be readily block-diagonalized, from an unfolding of the tensor. However, algorithms for implementing those products (refer to Algorithm 1) rely on operating in the transform domain: applying the appropriate 1D transform to the tube fibers, doing the appropriate matrix operations facewise in the transform domain, and applying the inverse transform directly to the tube fibers of the resulting tensor. This leads to a natural question: is it possible to define tensor-tensor products and associated algebraic structure from any invertible linear transformation by working only in the transform domain? If so, which of the desirable properties of the t-product and the cosine transform product require the block matrix setting, and which can be satisfied without it? The answer, as we show below, is that one can develop all of the same algebraic machinery without a structured block-matrix equivalent. We need only an invertible linear transformation to move us back and forth into the so-called transform domain. Indeed, since these results will apply to the cosine transform product, we reduce redundancy in this paper by extending our development in this section to include results not presented earlier. This raises the question: if we define new products based on a fixed invertible linear transformation, which invertible linear transformations are suitable in applications? We confront this problem of interpretation in Section 6, and we believe the material there will facilitate better modeling decisions.

4.1. New products and their algebraic properties

The purpose of this section is to introduce a family of tensor-tensor products that are well defined, with nice algebraic structure, given any fixed, invertible linear transformation L. We now generalize the transformation into transform space from Section 3, allowing any fixed invertible linear L not necessarily identified with the discrete cosine transform.

Definition. Let L ∶ C^{1×1×n} → C^{1×1×n} be invertible and linear, defined by (L(y))^{(i)} = (M vec(y))_i for an invertible n × n matrix M. Likewise, L ∶ C^{n1×n2×n} → C^{n1×n2×n} is defined by applying L to each tube fiber of the argument.

Now we can generalize the concept of multiplication.

Definition. Define ∗L ∶ C^{ℓ×m×n} × C^{m×p×n} → C^{ℓ×p×n} so that (L(A ∗L B))^{(l)} = (L(A))^{(l)} (L(B))^{(l)}.

We note that the cosine transform product is an example of a ∗L-family product, with L given by M = W^{−1} Cn (I + Z) (see Section 3 for details). We also note one representation of this action from the literature (see [19], for example) that could be used to describe it:

L(A) = A ×3 M. Note that from the definition it is easy to show the following:

Lemma. For A ∈ C^{ℓ×m×n}, B ∈ C^{m×p×n}, the (i, j) tube fiber of A ∗L B can be computed from appropriate sums of tubal scalar products; that is,

(A ∗L B)ij = ∑_{k=1}^{m} aik ∗L bkj.

Next, we prove some basic algebraic properties of ∗L.

Proposition. Let Î ∈ R^{m×m×n} be such that Î^{(l)} = Im for l = 1, . . . , n. Then I = L^{−1}(Î) is the identity element.

Proof.

I ∗L A = L^{−1}(Î ⊙ L(A)) = L^{−1}(L(A)) = A.

The proof of right identity is nearly identical. Note that I has tube fibers with 0 entries everywhere except on the diagonal, and on the diagonal the tube fibers are given by L^{−1}(e), where e is a tube fiber of all ones.
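Both observations are easy to confirm numerically for an arbitrary invertible transform. In the Python sketch below (names ours), a random invertible matrix M defines L, the identity tensor is recovered as L^{−1}(Î), and its action under ∗L is checked.

```python
import numpy as np

rng = np.random.default_rng(6)
m, n = 3, 4
M = rng.standard_normal((n, n)) + n * np.eye(n)      # a generic invertible transform
Minv = np.linalg.inv(M)

# note: T @ M.T applies M along every tube fiber T[i, j, :]
Ihat = np.repeat(np.eye(m)[:, :, None], n, axis=2)   # identity on every face
I = Ihat @ Minv.T                                    # I = L^{-1}(Î), tube-wise

# the diagonal tubes of I are L^{-1}(e), with e the tube of all ones
assert np.allclose(I[0, 0, :], Minv @ np.ones(n))

A = rng.standard_normal((m, 2, n))
# I *_L A = L^{-1}(L(I) ⊙ L(A)), with ⊙ the facewise product
IA = np.einsum('ijk,jlk->ilk', I @ M.T, A @ M.T) @ Minv.T
assert np.allclose(IA, A)                            # I acts as the identity
```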

Proposition. The set of tubes in C^{1×1×n} forms a commutative ring with identity under ∗L. This is what we have chosen to call Kn when L is the DCT; if it becomes necessary, we propose the notation (∗L, Kn) to distinguish between rings induced by different transform products.

Proof. The proofs of associativity, commutativity, and distribution over addition follow. For convenience, we prove associativity for third order tensors, with tubes covered as an important case.

(A ∗L B) ∗L C = L^{−1}(L(L^{−1}(L(A) ⊙ L(B))) ⊙ L(C))
= L^{−1}(L(A) ⊙ L(B) ⊙ L(C))
= L^{−1}(L(A) ⊙ L(L^{−1}(L(B) ⊙ L(C))))
= A ∗L (B ∗L C)

a ∗L b = L^{−1}(L(a) ⊙ L(b))
= L^{−1}(L(b) ⊙ L(a))
= b ∗L a

a ∗L (b + c) = L^{−1}(L(a) ⊙ L(b + c))
= L^{−1}(L(a) ⊙ L(b)) + L^{−1}(L(a) ⊙ L(c))
= a ∗L b + a ∗L c

We note that an element of Kn^m is a length-m vector of tubal scalars (i.e. tube fibers). However, such an object is essentially an m × n matrix, oriented into the third dimension. Because of this geometric one-to-one relationship between elements of Kn^m and C^{m×n}, we may refer to elements of Kn^m also as oriented matrices. In the spirit of [8], the set of oriented matrices can thus be thought of as a module over the ring Kn, i.e. a space where the scalars are tubes in Kn instead of elements of C and scalar multiplication is replaced by ∗L. We now show that this module is a free module.

Theorem. (∗L, Kn^m) is a free module.

Proof. Denote by B⃗_j the j-th lateral slice of the m × m × n identity tensor I. Now B⃗_j ∈ Kn^m must have a tube of zeros in every tubal entry but the jth, due to the definition of the identity tensor. In the jth entry appears L^{−1}(e), where e is the 1 × 1 × n tube of ones. Then {B⃗_j}_{j=1}^m is a linearly independent generating set. It is a generating set because, for X⃗ ∈ Kn^m, I ∗L X⃗ = X⃗ expresses X⃗ as a ∗L-linear combination of the lateral slices of I, which are {B⃗_j}_{j=1}^m. It is linearly independent since I ∗L X⃗ = 0 ⇒ X⃗ = 0.

Following [23] as used in [8], we conclude that Kn^m is a free module. As a corollary, it has a unique dimension (m, in this case).

4.2. Hermitian Transpose, norms and inner products

In this section, we define a Hermitian transpose for the new tensor-tensor products. We define a norm so that the new products induce a Hilbert C*-algebra on Kn and a Hilbert C*-module on Kn^m. In doing so, we provide something fundamentally different from what has been discussed previously in the literature for the t-product.

Definition. The Hermitian transpose A^H is the tensor such that L(A^H)(∶, ∶, k) = (L(A)(∶, ∶, k))^H.

Proposition. The multiplication reversal property of the Hermitian transpose holds: A^H ∗L B^H = [B ∗L A]^H.

Proof.

L(A^H ∗L B^H)(∶, ∶, k) = [L(A)(∶, ∶, k)]^H [L(B)(∶, ∶, k)]^H
= [L(B)(∶, ∶, k) L(A)(∶, ∶, k)]^H
= [L(B ∗L A)(∶, ∶, k)]^H.

The result follows from equality in the transform domain because L is invertible.

It is perhaps worth noting that in transform space, the frontal faces of A^H ∗L A are Hermitian positive semidefinite matrices, since

[L(A^H ∗L A)]^{(i)} = (L(A)^{(i)})^H L(A)^{(i)}.

Having defined a Hermitian transpose, for completeness we should note in what sense the new definition matches the previous definitions. Indeed, it is straightforward to prove that the new definition of the Hermitian transpose matches the existing definition of the t-product transpose given in [1]. We want our paper to be self-contained, and working with the t-product would require many additional definitions, so in order to remain focused, we will not give the formal proof here. With a working definition of conjugate transpose, we are in a position to define a few more concepts. The most significant is a new norm for tubes and oriented matrices, which results in additional algebraic structure, and we take a detour from new definitions to prove properties of the norm. The new norm inspires definitions of tubal angle, which are discussed in the Appendix to preserve this section's fluidity.

Definition. Given A⃗, B⃗ ∈ Kn^m, their ∗L-inner product ⟨A⃗, B⃗⟩_{∗L} is B⃗^H ∗L A⃗, an element of Kn.

Definition. ∥a∥_{∗L} ∶ Kn → R is given by max_d |L(a)^{(d)}|.

Note that the above is a valid norm on Kn, since ∥a∥_{∗L} = ∥vec(L(a))∥_∞. Now we can show that the inner product can be used to define a norm on Kn^m.

Definition. Let ∥X⃗∥_{∗L−max} ∶ Kn^m → R be given by ∥X⃗∥_{∗L−max} = ∥⟨X⃗, X⃗⟩_{∗L}∥_{∗L}^{1/2}.

Lemma. The above definition yields a valid norm on Kn^m.

Proof. Let a ∶= ⟨X⃗, X⃗⟩_{∗L}, and note that

L(a) = L(L^{−1}(L(X⃗)^H ⊙ L(X⃗))),

so the leftmost L and L^{−1} cancel. Thus,

∥X⃗∥_{∗L−max} = (max_d |L(a)^{(d)}|)^{1/2}
= max_d {∥L(X⃗)^{(d)}∥_2^2}^{1/2}
= max_d {∥L(X⃗)^{(d)}∥_2},

where we have used the fact that L(X⃗)^{(d)} is a length m vector in C^m. It is now straightforward using this last equality to show that the triangle inequality must hold. It is easy to show that ∥X⃗∥_{∗L−max} is positive homogeneous and zero iff X⃗ = 0. Now we come to an important result of this section.

Proposition. Kn is a Hilbert C*-algebra and Kn^{n1} is a Hilbert C*-module with inner product ⟨A⃗, B⃗⟩_{∗L}, norm ∥A⃗∥_{∗L−max}, and involution given by the Hermitian transpose.

Proof. The key property to show is that ∥x^H ∗L x∥_{∗L} = ∥x^H∥_{∗L} ∥x∥_{∗L} for any tubal scalar x ∈ Kn. The rest follows directly from properties of ∗L along with existing results like II.7.1.3 on page 138 of [24]. Recall that if v is a tubal scalar, then v^{(d)} refers to the dth frontal face, which is a scalar (element of C). Also, |v| is the tubal scalar resulting from taking the magnitude of each entry.

∥x^H ∗L x∥_{∗L} = max_d |(L(x^H) ⊙ L(x))^{(d)}|
= max_d (|L(x)^{(d)}|^2)
= (max_d |L(x)^{(d)}|)(max_d |L(x)^{(d)}|)
= ∥x∥_{∗L}^2.

Since it is easy to see from the definitions that ∥x^H∥_{∗L} = ∥x∥_{∗L}, the proof is complete.
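The C*-identity just proved can also be observed numerically. The sketch below (helper names ours) works with a random invertible M defining L and a random complex tube; the Hermitian transpose of a tube is conjugation in transform space.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5
M = rng.standard_normal((n, n)) + n * np.eye(n)   # a generic invertible transform
Minv = np.linalg.inv(M)

def L(a):    return M @ a
def Linv(v): return Minv @ v

def herm(a):
    """Tubal Hermitian transpose: conjugate entrywise in transform space."""
    return Linv(np.conj(L(a)))

def tub_norm(a):
    """||a||_{*L} = max_d |L(a)^{(d)}|."""
    return np.max(np.abs(L(a)))

x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
# x^H *_L x, computed facewise in the transform domain
xHx = Linv(L(herm(x)) * L(x))
assert np.isclose(tub_norm(xHx), tub_norm(x) ** 2)   # the C*-identity
```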


5. Tensor Decompositions

A common goal in tensor applications is the compressed representation of the tensor through use of an appropriate factorization. The goal is for the compressed representation to maintain a high proportion of the “energy” and features of the original tensor. Thus, we follow [1] and Section 3 in generalizing the matrix SVD and some spectral norms using the new family of products. We begin with the definition of a unitary tensor under the ∗L operation. (Compare with the definitions in Section 3.)

Definition. Q ∈ C^{n1×n1×n} is ∗L-unitary if Q^H ∗L Q = Q ∗L Q^H = I.

Consider the following algorithm (compare to the tensor SVD from [1]):

Algorithm 2: Tensor SVD induced by ∗L
Given an n1 × n2 × n tensor X, compute X̂ = L(X).
For i = 1, . . . , n
    [Û^{(i)}, Ŝ^{(i)}, V̂^{(i)}] ← svd(X̂^{(i)}), so that X̂^{(i)} = Û^{(i)} Ŝ^{(i)} (V̂^{(i)})^H
end
Compute U ← L^{−1}(Û), S ← L^{−1}(Ŝ), V ← L^{−1}(V̂).

Theorem. Given the output of Algorithm 2, the tensors U and V will be ∗L-unitary, and X = U ∗L S ∗L V^H.

Proof. (L(U^H ∗L U))^{(i)} = (L(U^H))^{(i)} (L(U))^{(i)} = (Û^{(i)})^H Û^{(i)} = L(I)^{(i)} by the result of the matrix SVDs. The proof for V follows similarly.
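Algorithm 2 and the theorem can be exercised numerically for a generic invertible transform. In the sketch below (names ours), the facewise SVDs are taken in the transform domain, and the factorization is verified by reconstruction; since the random M used here is real, the Hermitian transpose reduces to a facewise transpose in transform space.

```python
import numpy as np

rng = np.random.default_rng(8)
n1, n2, n = 4, 3, 5
M = rng.standard_normal((n, n)) + n * np.eye(n)    # invertible transform
Minv = np.linalg.inv(M)

L     = lambda T: T @ M.T                          # transform along tube fibers
Linv  = lambda T: T @ Minv.T
starL = lambda A, B: Linv(np.einsum('ijk,jlk->ilk', L(A), L(B)))

X = rng.standard_normal((n1, n2, n))
Xhat = L(X)

Uhat = np.zeros((n1, n1, n)); Shat = np.zeros((n1, n2, n)); Vhat = np.zeros((n2, n2, n))
k = min(n1, n2)
for i in range(n):                                 # facewise SVDs in the transform domain
    U_i, s_i, Vh_i = np.linalg.svd(Xhat[:, :, i])
    Uhat[:, :, i], Vhat[:, :, i] = U_i, Vh_i.T
    Shat[np.arange(k), np.arange(k), i] = s_i      # diagonal faces

U, S, V = Linv(Uhat), Linv(Shat), Linv(Vhat)
VH = Linv(np.transpose(Vhat, (1, 0, 2)))           # Hermitian transpose, facewise
UH = Linv(np.transpose(Uhat, (1, 0, 2)))

assert np.allclose(starL(starL(U, S), VH), X)      # X = U *_L S *_L V^H
Ident = Linv(np.repeat(np.eye(n1)[:, :, None], n, axis=2))
assert np.allclose(starL(UH, U), Ident)            # U is *_L-unitary
```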

Furthermore, S will have diagonal faces, since L(0) = L^{−1}(0) = 0 for any linear, invertible transform. Thus, this decomposition preserves properties of the matrix counterpart, and we refer to it as the tensor-SVD ∗L factorization. Clearly, other factorizations (such as QR) are possible – the main ingredient is to transform the tensor first, do facewise matrix factorizations, and then apply the inverse transform.

The Frobenius norm of an ℓ × m × n tensor A is the square root of the sum of the squares of the entries in the tensor. For matrices, it is unitarily invariant: multiplying by a unitary matrix does not change the Frobenius norm. However, not all of our tensor-tensor products preserve this property. For instance, to find L, Q, and X such that Q is ∗L-unitary but ∥Q ∗L X∥_Fro differs from ∥X∥_Fro, take L to be the transform represented by the matrix [1, 2; 3, 4], Q to be the unitary tensor with L(Q)^{(1)} = I and L(Q)^{(2)} = [1/√2, 1/√2; 1/√2, −1/√2], and X to be given by the array [1, 2; 3, 4].

To obtain unitary invariance and to develop technical tools for use in applications, it makes sense to investigate new operator (spectral) norms induced by the tensor SVD for a given L. We pursue this in the Appendix. We conclude by noting that the t-product in [1] is unitarily invariant, which is particularly nice in that it leads to an Eckart-Young like result. Having carried the development of our new operators from basic properties through matrix decompositions, we will use the next section to demonstrate the uses of some of these tools.

6. Multiplication under ∗L and a matrix algebra perspective

Let x⃗ be an n-length vector, let x denote the tensor representation of that vector, and let T be a diagonalizable linear transformation. The first question one might ask is: does there exist a tubal scalar c and an invertible linear transformation L (which may be different from T) such that T(x⃗) ≡ c ∗L x? The answer is yes, as the following theorem shows.

Theorem. Let T be a diagonalizable linear transformation, so that T = M⁻¹DM. Let M be the matrix defining L, and define c so that L(c) holds the diagonal of D. Then for any x⃗ in the range of T, T(x⃗) = vec(c ∗L x).

Proof. The proof follows directly from the definitions and the observation

T(x⃗) = M⁻¹DMx⃗ = M⁻¹(D(Mx⃗)) = M⁻¹ vec(L(c) ⊙ L(x)).
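The theorem is easy to check numerically. The following sketch is our own construction (M, d, and the tube operations are illustrative): it builds a diagonalizable T = M⁻¹DM and confirms that T(x⃗) agrees with vec(c ∗L x) when L(c) is the diagonal of D.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
M = rng.standard_normal((n, n)) + n * np.eye(n)   # well-conditioned, invertible
d = rng.standard_normal(n)                        # diagonal of D
T = np.linalg.inv(M) @ np.diag(d) @ M             # T = M^{-1} D M

# tube-wise transform L and its inverse, both defined by M
L    = lambda t: M @ t
Linv = lambda t: np.linalg.solve(M, t)

x = rng.standard_normal(n)        # x-vec, viewed as the 1 x 1 x n tube x
c = Linv(d)                       # tube c with L(c) = diagonal of D

# c *_L x = L^{-1}( L(c) ⊙ L(x) ): elementwise product of transformed tubes
prod = Linv(L(c) * L(x))
assert np.allclose(T @ x, prod)   # T(x-vec) = vec(c *_L x)
```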

Now, we can begin to think in the opposite direction. Given L defined by an invertible M, observe that the product x ∗L c is equivalent to the linear transformation Tc(x⃗) = M⁻¹ diag(vec(L(c)))Mx⃗. That is, the action of one scalar on another under ∗L is equivalent to the action of a matrix from the matrix subalgebra

{C = M⁻¹ diag(d⃗)M | d⃗ ∈ Cⁿ}    (2)

on the second scalar's vectorization. In the t-product definition of [1], M is the DFT matrix, and hence the algebra (interpreted in the spatial domain) has a closed form in the case of the t-product: the "algebra of circulants" defined in [4]. We can now say, more generally, that if M is such that we know the structure of this subalgebra in the spatial domain, a similar spatial-domain closed-form interpretation will apply to entire tensor-tensor products, owing to Lemma 4.1. For example, when M = Cn is the discrete cosine transform matrix defined in Section 3, matrices in this class are of a special symmetric Toeplitz-plus-Hankel form (see [22] and references therein for details). Even if we do not know the closed form of this algebra in the spatial domain, we can still make some sense of scalar multiplication if we regard it as applying a linear transformation with known eigenvectors and flexible eigenvalues.

In practice, it may not be obvious which L to choose. Suppose X⃗ contains data. Consider the product X⃗ ∗L c. The ith component of the first column is x_i ∗L c, which is equivalent to T(x⃗_i) for T = M⁻¹ diag(L(c))M. Thus, if one can "learn" a transformation T from the data, one can learn the appropriate L to use. For example, to apply the transformations learned in [25] to RGB video, the video could be treated as a 3 × n2 × n3 object, where n2 is the number of frames and n3 is the number of pixels per frame. The same transformation ought to map one frame to the next regardless of color, and this is the rationale for shaping the tensor so that, in an outer-product-style decomposition, the same scalar multiplies every color of a given frame. Linear algebraic approaches could then be used, with L learned from the data.
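A quick numerical illustration of the DCT case is possible. The sketch below is our own check (using SciPy's orthogonal DCT-II matrix as a stand-in for Cn): a member of the subalgebra (2) is symmetric, and its "cross differences" C[i+1, j+1] − C[i, j] are constant along anti-diagonals, which is exactly the signature of a Toeplitz-plus-Hankel matrix C[i, j] = t(i−j) + h(i+j).

```python
import numpy as np
from scipy.fft import dct

n = 6
rng = np.random.default_rng(2)
Cn = dct(np.eye(n), axis=0, norm='ortho')       # orthogonal DCT-II matrix
d = rng.standard_normal(n)
C = Cn.T @ np.diag(d) @ Cn                      # member of the subalgebra (2)

assert np.allclose(C, C.T)                      # symmetric

# Toeplitz-plus-Hankel check: if C[i,j] = t(i-j) + h(i+j), the cross
# difference C[i+1,j+1] - C[i,j] = h(i+j+2) - h(i+j) depends only on i+j,
# i.e., it is constant along each anti-diagonal.
Dcross = C[1:, 1:] - C[:-1, :-1]
for s in range(2 * (n - 1) - 1):
    vals = [Dcross[i, s - i] for i in range(max(0, s - (n - 2)), min(n - 1, s + 1))]
    assert np.allclose(vals, vals[0])
```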

7. Numerical Results

7.1. Video Compression

We now provide demonstrations of the new tools on two signal processing tasks: compression of video, and recovery of a blurred, noisy image.

First, we illustrate the ∗L product, with L = DCT-IIE, on a video compression demo. Results are displayed for the truncated ∗L-SVD, the truncated t-SVD of [1], and a matricization of the video followed by truncation of the regular SVD. For the ∗L-SVD and t-SVD, we compress by singular value thresholding in the transform domain, removing the lowest scalars from L(S) along with the corresponding row and column fibers. Here, X is the original tensor and Y is the compressed version. We define the compression ratio as (number of pixels)/(number of real floating point numbers stored), and the log compression ratio is plotted against the relative error in decibels:

Error = 20 log10( ‖X(:) − Y(:)‖_F / ‖X(:)‖_F ).    (3)

For the "Basketball" sequence, the DCT-based compression scheme is competitive with the t-SVD, even though that sequence was chosen to illustrate the capabilities of the t-SVD in [26].
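As a sanity check on the compression pipeline, the following sketch (our own; it uses a fixed facewise truncation rank k rather than the global thresholding of L(S) described above, and an orthonormal DCT for L) computes the relative error (3) for increasing k:

```python
import numpy as np
from scipy.fft import dct, idct

def truncated_starL_svd(X, k):
    """Rank-k facewise truncation of the *_L-SVD in the transform domain
    (L = orthonormal DCT along the third mode). Keeping a fixed k triples
    per face is a simplification of global singular-value thresholding."""
    Xhat = dct(X, axis=2, norm='ortho')
    Yhat = np.empty_like(Xhat)
    for i in range(X.shape[2]):
        u, s, vt = np.linalg.svd(Xhat[:, :, i], full_matrices=False)
        Yhat[:, :, i] = (u[:, :k] * s[:k]) @ vt[:k, :]
    return idct(Yhat, axis=2, norm='ortho')

def rel_error_db(X, Y):
    # 20 log10( ||X(:) - Y(:)||_F / ||X(:)||_F ), as in Eq. (3)
    return 20 * np.log10(np.linalg.norm(Y - X) / np.linalg.norm(X))

rng = np.random.default_rng(3)
X = rng.standard_normal((20, 15, 8))
errs = [rel_error_db(X, truncated_starL_svd(X, k)) for k in (2, 5, 10)]
assert errs[0] > errs[1] > errs[2]   # more triples kept -> lower error (dB)
```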

7.2. Deconvolution

A typical discrete model for image blurring [22] is given by

Ax⃗ + n⃗ = b⃗,    (4)

where A denotes the discrete blurring operator and x⃗ is the vectorized form of the m × n image X, so that x⃗_i = X(:, i), etc. Likewise, b⃗ is the vectorized form of the measured image, which includes additive white noise indicated by n⃗. The underlying continuous model is convolution, but since x⃗ represents only part of the scene, it is typical to

[Figure 2: relative error (dB) versus log10 compression ratio for the "Basketball" and "Dash Camera" videos, comparing the t-SVD, ∗L-SVD, and matricized SVD under the three cyclic orientations (1 2 3), (2 3 1), and (3 1 2) of the data.]

impose boundary conditions to make the model more accurate. If the blurring is assumed to be spatially invariant, and zero boundary conditions are imposed, then A would be a block-Toeplitz matrix with Toeplitz blocks. On the other hand, when the scene is not black outside the edge of the image, reflexive boundary conditions are typically employed. Under the assumption of spatially invariant blur with a symmetric point spread function, this gives rise to a symmetric matrix that is block Toeplitz-plus-Hankel with Toeplitz-plus-Hankel blocks, so A has the form (1). Typically, since b⃗ contains noise, one should only approximately solve Ax⃗ ≈ b⃗.

Define g⃗ such that x⃗ = ((I + Z) ⊗ I)g⃗. Then Ax⃗ = Ag⃗. We wish to find d⃗ so that Ax⃗ = Ag⃗ = b⃗ is equivalent to ten(Ag⃗), in order to recast our problem using tensor notation. Notice that, from the definition (with n1 = m, n2 = 1), ten(d⃗) = Q⃗, where Q⃗^(i) is the ith subvector of ((I + Z)⁻¹ ⊗ I)d⃗. So if we set d⃗ = ((I + Z) ⊗ I)b⃗, we obtain an equivalent formulation of our deblurring problem,

A ∗c G⃗ = D⃗.

Although A is an invertible tensor in this example, the image A⁻¹ ∗c D⃗ produces a worthless approximation due to the magnification of the ignored noise. Therefore, we will compute a regularized solution

G⃗_reg = (A + R)⁻¹ ∗c D⃗,

where R is the tensor of all zeros except the first frontal face; the first frontal face is a constant α times the identity matrix. The regularized image X_reg is recovered

[Figure 3]

using the invertible mapping between x⃗ and g⃗ described above. The parameter α is called the regularization parameter. The result for the "optimal" value of α, that is, the choice among a discrete sample of parameters that gave the best approximation to the ground truth, is given on the right of Figure 3.

In this example, tensors have been used to reformulate a problem that can be written in traditional matrix-vector format, with the goal of showing the utility and convenience of third-order tensors acting on matrices. Images are kept in their natural 2D format. Indeed, the above was carried out by creating a tensor ∗c class and overloading the matrix-vector multiplication command in Matlab.
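A minimal sketch of a regularized transform-domain solve is given below. It is our own simplified construction, not the paper's code: every transform-domain face is regularized by αI (whereas the R above places αI only on the first frontal face), the transform is an orthonormal DCT, and all names are illustrative.

```python
import numpy as np
from scipy.fft import dct, idct

def regularized_solve(A, D, alpha):
    """Facewise Tikhonov-regularized solve of A *_L G = D in the DCT
    transform domain: (Ai^T Ai + alpha I) gi = Ai^T di per face."""
    Ahat = dct(A, axis=2, norm='ortho')
    Dhat = dct(D, axis=2, norm='ortho')
    Ghat = np.empty_like(Dhat)
    n = A.shape[0]
    for i in range(A.shape[2]):
        Ai = Ahat[:, :, i]
        Ghat[:, :, i] = np.linalg.solve(Ai.T @ Ai + alpha * np.eye(n),
                                        Ai.T @ Dhat[:, :, i])
    return idct(Ghat, axis=2, norm='ortho')

# tiny synthetic check: noisy observation of a known G
rng = np.random.default_rng(4)
A = rng.standard_normal((8, 8, 5))
G = rng.standard_normal((8, 4, 5))
Dhat = np.einsum('ijk,jlk->ilk', dct(A, axis=2, norm='ortho'),
                 dct(G, axis=2, norm='ortho'))     # D = A *_L G
D = idct(Dhat, axis=2, norm='ortho') + 1e-3 * rng.standard_normal((8, 4, 5))

G_small = regularized_solve(A, D, alpha=1e-6)      # near-inverse
G_large = regularized_solve(A, D, alpha=1e3)       # over-smoothed toward zero
assert np.linalg.norm(G_small - G) < np.linalg.norm(G_large - G)
```

The design trade-off is the usual one for Tikhonov regularization: a tiny α nearly inverts the operator but amplifies noise, while a large α shrinks the solution toward zero; the "optimal" α sits between these extremes, as in the experiment above.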

8. Conclusions and Future Work

Our main contribution in this paper was to define a new family of operators on C^{1×1×n}, extending crucial pieces of linear algebra to include these n-dimensional (aka tubal) scalars. We proved that they form a C*-algebra, defining a norm and an involution, and we showed that tensors can be interpreted as ∗L-linear transformations on a free module represented by C^{n1×1×n3}. We used this theory to provide calculable decompositions of tensors into products of tensors, which we applied to an example problem in video compression. We also used our new framework to represent and approximately invert a blurring operator.

The question of how to generalize other classical concepts, such as rank and determinant, remains to be investigated. In future work, we will also consider whether this framework can be reconstructed in a setting where the transformation L lacks a right inverse. Further investigation into the utility of these results in applications, particularly in machine learning, is currently being undertaken.

9. Appendix

9.1. Angles

Finally, we give two definitions of the angle between two elements of K_{n3}^{n1}. Angles can be of use, for example, in machine learning applications. In recent work [? ], the authors have been able to show that the linear algebraic framework set out here, including norms and angles, is useful in clustering along submodules rather than subspaces. The first angle definition parallels that given in [3], and so we call it the KBHH angle. The second definition takes advantage of the ∗L-max norm, and we call it the C*-angle. As our scalars are elements of K_{n3}, so are our angles. The cosine in the definitions below is taken to act element-wise.

Definition. The KBHH angle θ between A⃗ and B⃗ is given by its cosine:

cos(θ) = (⟨A⃗, B⃗⟩_{∗L} + ⟨B⃗, A⃗⟩_{∗L}) / (2 ‖L(A⃗)‖_F ‖L(B⃗)‖_F).

Definition. The C*-angle θ between A⃗ and B⃗ is given by its cosine:

cos(θ) = (⟨A⃗, B⃗⟩_{∗L} + ⟨B⃗, A⃗⟩_{∗L}) / (2 ‖A⃗‖_{∗L−max} ‖B⃗‖_{∗L−max}).

9.2. Tensor Operator Norms

In order to facilitate applications and illustrate the utility of the new tensor SVD, we now define an operator norm and a convex "nuclear norm."

Definition. If X ∈ K_{n3}^{n1×n2} has ∗L-SVD components U, S, V^H with diagonal elements s_i of S, we define the ∗L-nuclear norm to be

‖A‖_{∗,∗L} = Σ_{j=1}^{n3} ( Σ_{i=1}^{min{n1,n2}} s_i^(j) ).

Lemma. ‖A‖_{∗,∗L} satisfies the triangle inequality.

The proof follows easily from the definition, ultimately involving the use of the triangle inequality on the matrix nuclear norm for each individual face.

Definition. If X ∈ K_{n3}^{n1×n2} has ∗L-SVD components U, S, V^H with diagonal elements s_i of S, we define the ∗L-operator norm to be ‖X‖_{∗L−max} = ‖s_1‖_{∗L−max}.

Lemma. ‖X‖_{∗L−max} is a valid norm.

Proof. The major hurdle is in showing the triangle inequality. First, if we let s_{zj} denote the jth diagonal tube in the tensor SVD of Z := X + Y, then

L(s_{z1})^(d) = ‖L(X + Y)^(d)‖_2
             ≤ ‖L(X)^(d)‖_2 + ‖L(Y)^(d)‖_2
             = L(s_{x1})^(d) + L(s_{y1})^(d).

The proof is now straightforward from the definition and properties of maxima.

Lemma. ‖X‖_{∗L−max} is an operator norm: for X ∈ K_{n3}^{n1×n2} and A⃗ ∈ K_{n3}^{n2},

‖X ∗L A⃗‖_{∗L−max} ≤ ‖X‖_{∗L−max} ‖A⃗‖_{∗L−max}.

Proof. Let X ∈ K_{n3}^{n1×n2} have ∗L-SVD components U, S, V^H with diagonal elements s_i of S. We use the fact that induced matrix 2-norms ‖ ⋅ ‖_2 are given by maximal singular values. To prove the assertion,

‖X ∗L A⃗‖_{∗L−max} = max_d ‖L(X)^(d) L(A⃗)^(d)‖_2
                  ≤ max_d L(s_1)^(d) ‖L(A⃗)^(d)‖_2
                  ≤ (max_d L(s_1)^(d)) (max_{d′} ‖L(A⃗)^(d′)‖_2)
                  ≤ ‖X‖_{∗L−max} ‖A⃗‖_{∗L−max}.
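Both the triangle inequality and the operator-norm property are easy to confirm numerically. The sketch below is our own check, with L fixed to an orthonormal DCT along the third mode, on random tensors:

```python
import numpy as np
from scipy.fft import dct, idct

def starL_max_norm(X):
    """||X||_{*L-max}: the largest facewise matrix 2-norm (i.e., largest
    singular value per face) in the transform domain."""
    Xhat = dct(X, axis=2, norm='ortho')
    return max(np.linalg.norm(Xhat[:, :, i], 2) for i in range(X.shape[2]))

def starL(A, B):
    # A *_L B via facewise products in the transform domain
    Ahat = dct(A, axis=2, norm='ortho')
    Bhat = dct(B, axis=2, norm='ortho')
    return idct(np.einsum('ijk,jlk->ilk', Ahat, Bhat), axis=2, norm='ortho')

rng = np.random.default_rng(5)
X = rng.standard_normal((4, 3, 6))
Y = rng.standard_normal((4, 3, 6))
A = rng.standard_normal((3, 1, 6))   # a lateral slice, playing the role of A-vec

# triangle inequality for the *L-max norm
assert starL_max_norm(X + Y) <= starL_max_norm(X) + starL_max_norm(Y) + 1e-12
# operator-norm (submultiplicativity) property
assert starL_max_norm(starL(X, A)) <= starL_max_norm(X) * starL_max_norm(A) + 1e-12
```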

References [1] M. E. Kilmer, C. D. Martin, Factorization strategies for third-order tensors, Linear Algebra and its Applications 435 (3) (2011) 641–658. doi:http://dx.doi.org/10.1016/j.laa.2010.09.020. URL http://www.sciencedirect.com/science/article/pii/S0024379510004830

[2] C. Martin, R. Shafer, B. LaRue, An order-p tensor factorization with applications in imaging, SIAM Journal on Scientific Computing 35 (1) (2013) A474–A490. doi:10.1137/110841229. URL http://dx.doi.org/10.1137/110841229

[3] M. Kilmer, K. Braman, N. Hao, R. Hoover, Third-order tensors as operators on matrices: A theoretical and computational framework with applications in imaging, SIAM Journal on Matrix Analysis and Applications 34 (1) (2013) 148–172. doi:10.1137/110837711. URL http://dx.doi.org/10.1137/110837711

[4] D. F. Gleich, C. Greif, J. M. Varah, The power and Arnoldi methods in an algebra of circulants, Numerical Linear Algebra with Applications (2012). doi:10.1002/nla.1845. URL http://dx.doi.org/10.1002/nla.1845

[5] N. Hao, M. Kilmer, K. Braman, R. Hoover, Facial recognition using tensor-tensor decompositions, SIAM Journal on Imaging Sciences 6 (1) (2013) 437–463. doi:10.1137/110842570. URL http://dx.doi.org/10.1137/110842570

[6] G. Ely, S. Aeron, N. Hao, M. E. Kilmer, 5D and 4D pre-stack seismic data completion using tensor nuclear norm (TNN), SEG International Exposition and Eighty-Third Annual Meeting, Houston, TX, 2013.

[7] Z. Zhang, G. Ely, S. Aeron, N. Hao, M. Kilmer, Novel Factorization Strategies for Higher Order Tensors: Implications for Compression and Recovery of Multilinear Data, ArXiv e-prints, arXiv:1307.0805.

[8] K. Braman, Third-order tensors as linear operators on a space of matrices, Linear Algebra and its Applications 433 (7) (2010) 1241–1253. doi:http://dx.doi.org/10.1016/j.laa.2010.05.025. URL http://www.sciencedirect.com/science/article/pii/S0024379510002934

[9] C. Navasca, M. Opperman, T. Penderghest, C. Tamon, Tensors as module homomorphisms over group rings, CoRR abs/1005.1894.

[10] S. A. Martucci, Symmetric convolution and the discrete sine and cosine transforms, IEEE Transactions on Signal Processing 42 (5) (1994) 1038–1051.

[11] V. Sanchez, P. Garcia, A. M. Peinado, J. C. Segura, A. J. Rubio, Diagonalizing properties of the discrete cosine transforms, IEEE Transactions on Signal Processing 43 (11) (1995) 2631–2641. doi:10.1109/78.482113.

[12] G. Strang, The discrete cosine transform, SIAM Review 41 (1) (1999) 135–147. doi:10.2307/2653173. URL http://www.jstor.org/stable/2653173

[13] M. K. Ng, R. H. Chan, W.-C. Tang, A fast algorithm for deblurring models with Neumann boundary conditions, SIAM J. Sci. Comput. 21 (3) (1999) 851–866. doi:10.1137/S1064827598341384. URL http://dx.doi.org/10.1137/S1064827598341384

[14] J. Li, L. Zhang, D. Tao, H. Sun, Q. Zhao, A prior neurophysiologic knowledge free tensor-based scheme for single trial EEG classification, IEEE Transactions on Neural Systems and Rehabilitation Engineering 17 (2) (2009) 107–115. doi:10.1109/TNSRE.2008.2008394.

[15] L. De Lathauwer, J. Vandewalle, Dimensionality reduction in higher-order signal processing and rank-(r1, r2, . . . , rn) reduction in multilinear algebra, Linear Algebra and its Applications 391 (2004) 31–55, Special Issue on Linear Algebra in Signal and Image Processing. doi:http://dx.doi.org/10.1016/j.laa.2004.01.016. URL http://www.sciencedirect.com/science/article/pii/S0024379504000886

[16] Y. Yang, D. B. Dunson, Bayesian Conditional Tensor Factorizations for High-Dimensional Classification, ArXiv e-prints, arXiv:1301.4950.

[17] A. Anandkumar, R. Ge, D. Hsu, S. M. Kakade, M. Telgarsky, Tensor decompositions for learning latent variable models, CoRR abs/1210.7559.

[18] S. P. Ponnapalli, M. A. Saunders, C. F. Van Loan, O. Alter, A higher-order generalized singular value decomposition for comparison of global mRNA expression from multiple organisms, PLoS One 6 (12) (2011) e28072. doi:10.1371/journal.pone.0028072.

[19] T. Kolda, B. Bader, Tensor decompositions and applications, SIAM Review 51 (3) (2009) 455–500. doi:10.1137/07070111X. URL http://dx.doi.org/10.1137/07070111X

[20] L. De Lathauwer, B. De Moor, J. Vandewalle, A multilinear singular value decomposition, SIAM Journal on Matrix Analysis and Applications 21 (4) (2000) 1253–1278. doi:10.1137/S0895479896305696. URL http://dx.doi.org/10.1137/S0895479896305696

[21] L.-H. Lim, Tensors and hypermatrices, Handbook of Linear Algebra, 2nd Ed., CRC Press, Boca Raton, FL.

[22] M. Ng, R. Chan, W.-C. Tang, A fast algorithm for deblurring models with Neumann boundary conditions, SIAM J. Sci. Comput. (1999) 851–866.

[23] T. W. Hungerford, Algebra, Springer, 1974.

[24] B. Blackadar, Operator Algebras: Theory of C*-Algebras and Von Neumann Algebras, Springer, 2010. URL http://books.google.com/books?id=wfoCkgAACAAJ

[25] J. Sohl-Dickstein, J. C. Wang, B. A. Olshausen, An unsupervised algorithm for learning lie group transformations, CoRR abs/1001.1027.

[26] Z. Zhang, G. Ely, S. Aeron, N. Hao, M. Kilmer, Novel Factorization Strategies for Higher Order Tensors: Implications for Compression and Recovery of Multilinear Data, ArXiv e-prints, arXiv:1307.0805.
