IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 64, NO. 24, DECEMBER 15, 2016 6533 Polar n-Complex and n-Bicomplex Singular Value Decomposition and Principal Component Pursuit Tak-Shing T. Chan, Member, IEEE, and Yi-Hsuan Yang, Member, IEEE

Abstract—Informed by recent work on tensor singular value de- PCP with c =1has a high probability of exact recovery, though composition and circulant algebra matrices, this paper presents a c can be tuned if the conditions are not met. new theoretical bridge that unifies the hypercomplex and tensor- Despite its success, one glaring omission from the original based approaches to singular value decomposition and robust prin- cipal component analysis. We begin our work by extending the PCP is the lack of complex (and hypercomplex) formulations. principal component pursuit to Olariu’s polar n-complex num- In numerous signal processing domains, the input phase has a bers as well as their bicomplex counterparts. In doing so, we have significant meaning. For example in parametric spatial audio, derived the polar n-complex and n-bicomplex proximity opera- spectrograms have not only spectral phases but inter-channel tors for both the 1 - and trace-norm regularizers, which can be phases as well. For that reason alone, we have recently extended used by proximal optimization methods such as the alternating di- rection method of multipliers. Experimental results on two sets of the PCP to the complex and the quaternionic cases [7]. However, audio data show that our algebraically informed formulation out- there exists inputs with dimensionality greater than four, such as performs tensor robust principal component analysis. We conclude microphone array data, surveillance video from multiple cam- with the message that an informed definition of the trace norm can eras, or electroencephalogram (EEG) signals, which exceed the bridge the gap between the hypercomplex and tensor-based ap- capability of . These signals may instead be repre- proaches. Our approach can be seen as a general methodology for generating other principal component pursuit algorithms with sented by n-dimensional hypercomplex , defined as [8] proper algebraic structures. a = a0 + a1 e1 + ···+ an−1 en−1 , (3) Index Terms—Hypercomplex, tensors, singular value decompo- ∈ sition, principal component, pursuit algorithms. where a0 ,...,an−1 R and e1 ...,en−1 are the imaginary units. Products of imaginary units are defined by an ar- bitrary (n − 1) × (n − 1) multiplication table, and multipli- I. INTRODUCTION cation follows the distributive rule [8]. If we impose the multiplication rules HE robust principal component analysis (RPCA) [1] has received a lot of attention lately in many application areas −  T ej ei ,i= j, of signal processing [2]–[5]. The form of RPCA decom- ei ej = (4) × − poses the input X ∈ Rl m into a low-rank matrix L and a sparse 1, 0, or 1,i= j, matrix S: and extend the algebra to include all 2n−1 combinations of

min rank(L)+λS0 s.t. X = L + S, (1) imaginary units (formally known as multivectors): L,S a = a0 + a1 e1 + a2 e2 + ... where ·0 returns the of nonzero matrix elements. Ow- ing to the NP-hardness of the above formulation, the principal + a1,2 e1 e2 + a1,3 e1 e3 + ... component pursuit (PCP) [1] has been proposed to solve this + ...+ a − e e ...e − , (5) relaxed problem instead [6]: 1,2,...,n 1 1 2 n 1 then we have a [9]. For example, the real, min L∗ + λS1 s.t. X = L + S , (2) L,S complex, and algebras are all Clifford algebras. Yet previously, Alfsmann [10] suggests two families of 2N- where ·∗ is the trace norm (sum of the singular values), · 1 dimensional hypercomplex numbers suitable for signal pro- is the entrywise  -norm, and λ can be set to c/ max(l, m) 1 cessing and argued for their superiority over Clifford algebras. where c is a positive parameter [1], [2]. The trace norm and the One family starts from the two-dimensional hyperbolic numbers  -norm are the tightest convex relaxations of the rank and the 1 and the other one starts from the four-dimensional tessarines,1  -norm, respectively. Under somewhat general conditions [1], 0 with dimensionality doubling up from there. Although initially attractive, the 2N-dimensional restriction (which also affects Manuscript received August 26, 2015; revised May 26, 2016 and July 16, 2016; accepted September 3, 2016. Date of publication September 21, 2016; Clifford algebras) seems a bit limiting. For instance, if we have date of current version October 19, 2016. The associate editor coordinating the 100 channels to process, we are forced to use 128 dimensions review of this manuscript and approving it for publication was Prof. Masahiro (wasting 28). On the other hand, tensors can have arbitrary di- Yukawa. This work was supported by a grant from the Ministry of Science and Technology under the contract MOST102-2221-E-001-004-MY3 and the mensions, but traditionally they do not possess rich algebraic Academia Sinica Career Development Program. structures. Fortunately, recent work on the tensor singular value The authors are with the Research Center for Information Technology In- novation, Academia Sinica, Taipei 11564, Taiwan (e-mail: takshingchan@ 1 2 citi.sinica.edu.tw; [email protected]). Hyperbolic numbers are represented by a0 + a1 j where j =1 and Digital Object Identifier 10.1109/TSP.2016.2612171 a0 ,a1 ∈ R [10]. Tessarines are almost identical except that a0 ,a1 ∈ C [10].

1053-587X © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications standards/publications/rights/index.html for more information. 6534 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 64, NO. 24, DECEMBER 15, 2016 decomposition (SVD) [11], which the authors call the t-SVD, rules [16]. The inverse of p is the number p−1 such that pp−1 =1 has begun to impose more structures on tensors [12]–[14]. Fur- [16]. Olariu named it the polar n-complex algebra because it is thermore, a tensor PCP formulation based on t-SVD has also motivated by the polar representation of a [16] been proposed lately [15]. Most relevantly, Braman [12] has where√ a + jb ∈ C is represented geometrically by its modu- suggested to investigate the relationship between t-SVD and lus a2 + b2 and polar angle arctan(b/a). Likewise, the polar Olariu’s [16] n-complex numbers (for arbitrary n). This is ex- n-complex number in (6) can be represented by its modulus actly what we need, yet the actual work is not forthcoming. So | | 2 2 ··· 2 we have decided to begin our investigation with Olariu’s po- p = a0 + a1 + + an−1 (8) lar n-complex numbers. Of special note is Gleich’s work on the circulant algebra [17], which is isomorphic to Olariu’s po- together with n/2 −1 azimuthal angles, n/2 −2 planar lar n-complex numbers. This observation simplifies our current angles, and one polar angle (two if n is even), totaling n − 1 T work significantly. Nevertheless, the existing tensor PCP [15] angles [16]. To calculate these angles, let [A0 ,A1 ,...,An−1 ] T employs an ad hoc tensor nuclear norm, which lacks algebraic be the discrete Fourier transform (DFT) of [a0 ,a1 ,...,an−1 ] , validity. So, in this paper, we remedy this gap by formulating defined by ⎡ ⎤ ⎡ ⎤ the first proper n-dimensional PCP algorithm using the polar A0 a0 n-complex algebra. ⎢ ⎥ ⎢ ⎥ ⎢ A1 ⎥ ⎢ a1 ⎥ Our contributions in this paper are twofold. First, we have ⎢ ⎥ = F ⎢ ⎥ , (9) ⎣ . ⎦ n ⎣ . ⎦ extended PCP to the polar n-complex algebra and the polar . . n-bicomplex algebra (defined in Section III), via: 1) properly An−1 an−1 exploiting the circulant isomorphism for the polar n-complex −j2π/n numbers; 2) extending the polar n-complex algebra to a new where ωn = e is a principal nth root of unity and ⎡ ⎤ polar n-bicomplex algebra; and 3) deriving the proximal opera- 11··· 1 tors for both the polar n-complex and n-bicomplex matrices by ⎢ ··· n−1 ⎥ 1 ⎢ 1 ωn ωn ⎥ leveraging the aforementioned isomorphism. Second, we have √ ⎢ ⎥ Fn = ⎣ . . .. . ⎦ , (10) provided a novel hypercomplex framework for PCP where al- n . . . . n−1 ··· (n−1)(n−1) gebraic structures play a central role. 1 ωn ωn This paper is organized as follows. In Section II, we review which is unitary, i.e., F∗ = F−1 .Fork =1,...,n/2 −1,the polar n-complex matrices and their properties. We extend this to n n azimuthal angles φ can be calculated from [16] the polar n-bicomplex case in Section III. This leads to the polar k −jφk n-complex and n-bicomplex PCP in Section IV. Experiments Ak = |Ak |e , (11) are conducted in Sections V and VI to justify our approach. We ≤ conclude by describing how our work provides a new direction where 0 φk < 2π. Note that we have reversed the sign of the for future work in Section VII. angles as Olariu was a physicist so his DFT is our inverse DFT. Furthermore, for k =2,...,n/2 −1, the planar angles ψk−1 II. THE POLAR n-COMPLEX NUMBERS are defined by [16] | | In this section we introduce polar n-complex matrices and A1 tan ψk−1 = , (12) their isomorphisms. These will be required in Section IV for the |Ak | formulation of polar n-complex PCP. Please note that the value where 0 ≤ ψk ≤ π/2. The polar angle θ+ is defined as [16] of n here does not have to be a power of two. √ 2|A1 | tan θ+ = , (13) A. Background A0

Olariu’s [16] polar n-complex numbers, which we denote where 0 ≤ θ+ ≤ π. Finally, for even n, there is an additional by Kn ,aren-dimensional (n ≥ 2) extensions of the complex polar angle [16], algebra, defined as √ 2|A1 | − p = a0 e0 + a1 e1 + ···+ an−1 en−1 ∈ Kn , (6) tan θ = , (14) An/2 where a ,a ,...,a − ∈ R. The first is defined 0 1 n 1 where 0 ≤ θ− ≤ π. We can uniquely recover the polar n- to be e0 =1whereas e1 ,...,en−1 are defined by the multipli- complex number given its modulus and the n − 1 angles de- cation table [16] fined above.2 More importantly, the polar n-complex numbers 3 ei ek = e(i+k)modn . (7) are ring-isomorphic to the following matrix representation [16],

We call Re p = a0 the real part of p and Imi p = ai the imag- inary parts of p for i =0, 1,...,n− 1. We remark that our 2Exact formulas can be found in [16, pp. 212–216], especially (6.80), (6.81), imaginary index starts with 0, which includes the real part, to (6.103), and (6.104). We remark that Olariu’s choice of A1 as a reference for the planar and polar angles is convenient but somewhat arbitrary. facilitate a shorter definition of equations such as (34) and (41). 3 A ring isomorphism is a bijective map χ : R → S such that χ(1R )=1S , Multiplication follows the usual associative and commutative χ(ab)=χ(a)χ(b),andχ(a + b)=χ(a)+χ(b) for all a, b ∈ R. CHAN AND YANG: POLAR n-COMPLEX AND n-BICOMPLEX SINGULAR VALUE DECOMPOSITION AND PRINCIPAL COMPONENT PURSUIT 6535

→ n×n 0 χ : Kn R : ⎡ ⎤ following the convention that En = In . It is trivial to show that a0 an−1 an−2 ··· a1 i k (i+k)modn En En = En [20]. Hence the isomorphism is immedi- ⎢ ⎥ n−1 ⎢ a a a − ··· a ⎥ { i } ⎢ 1 0 n 1 2 ⎥ ately obvious. Recall that the group of imaginary units En i=0 ⎢ ⎥ is called cyclic if we can use a single element E to gen- χ(p)=⎢ a2 a1 a0 ··· a3 ⎥ , (15) n ⎢ ⎥ erate the entire algebra, so the algebra in (24) has another name ⎣ ...... ⎦ . . . . . called a cyclic algebra [21]. ··· an−1 an−2 an−3 a0 The circulant isomorphism helps us to utilize recent literature which is a circulant matrix.4 This means that polar n-complex on circulant algebra matrices [17], which simplifies our work multiplication is equivalent to circular convolution. Due to the in the next subsection. The circulant algebra in [17] breaks the circular convolution theorem, it can be implemented efficiently modulus into n pieces such that the original number can be in the Fourier domain [17]: uniquely recovered without the planar and polar angles. How- √ ever, for the  -norm at least, we need a single number for  ◦ 1 Fn (a b)= n(Fn a) (Fn b), (16) minimization purposes. Moreover, although our goal is phase where a, b ∈ Rn ,  denotes circular convolution, and ◦ is the preservation, we do not need to calculate the angles explicitly for Hadamard product. The isomorphism in (15) implies [17]: the PCP problem. Consequently, we will stick with the original definition in (8). χ(1) = In , (17)

χ(pq)= χ(p)χ(q), (18) B. Polar n-Complex Matrices and their Isomorphisms χ(p + q)= χ(p)+χ(q), (19) We denote the set of l × m matrices with polar n-complex × × −1 −1 l m ∈ l m χ(p )= χ(p) , (20) entries by Kn . For a polar n-complex matrix A Kn ,we l×m → ln×mn define its adjoint matrix via χlm : Kn R [17]: for 1,p,q ∈ Kn . From these properties it becomes natural to define the polar n-complex conjugation p¯ by [17] ⎡ ⎤ χ(A11) χ(A12) ... χ(A1m ) χ(¯p)=χ(p)∗ (21) ⎢ ⎥ ⎢ ⎥ ∗ ⎢ χ(A21) χ(A22) ... χ(A2m ) ⎥ where χ(p) denotes the conjugate transpose of χ(p).Thisal- χlm (A)=⎢ ⎥ . (26) ⎣ . . . . ⎦ lows us to propose a new scalar product inspired by its quater- . . .. . nionic counterpart [19], χ(Al1 ) χ(Al2 ) ... χ(Alm ) p, q = Re pq,¯ (22) We will now show that the R-linear map χ (A):Rmn → which we will use later for the Frobenius norm of the polar lm Rln operates in an identical manner as the K -linear map A : n-complex numbers. Note that this differs from the usual defi- n Km → Kl . nition p, q = pq¯ [17] because we need the real restriction for n n Theorem 1: Let A ∈ Kl×m . Then the following holds: the desirable property p, p = |p|2 . To wit, observe that Re p = n 1) χ (I )=I if I ∈ Km ×m ; a =[χ(p)] i pq¯ =[χ(p)χ(q)∗] = mm m mn m n 0 ii for arbitrary , thus Re ii m ×r n 2) χlr(AB)=χlm (A)χmr(B) if B ∈ K ; [χ(p)]ik[χ(q)]ik which is the standard inner product be- n k=1 ∈ l×m tween the underlying elements. The same results can also be 3) χlm (A + B)=χlm (A)+χlm (B) if B Kn ; n−1 ∗ ∗ obtained from Re pq¯ . In other words, if p = i=0 aiei and 4) χlm (A )=χlm (A) ; n−1 −1 −1 q = i=0 bi ei , we get 5) χlm (A )=χlm (A) if it exists. n−1 Proof: 1, 3, and 4 can be verified by direct substitution. 5 Re pq¯ = Re pq¯ = a b . (23) − i i can be derived from 1–2 via the equality AA 1 = I. 2 can be i=0 proven using (15) and (18): An alternative way of looking at the isomorphism in (15) is to consider the circulant matrix as a sum [20], ⎛⎡ ⎤⎞ m m − ··· 0 1 ··· n 1 ⎜⎢ A1k Bk1 A1k Bkr ⎥⎟ χ(p)=a0 En + a1 En + + an−1 En , (24) ⎜⎢ ⎥⎟ ⎜⎢ k=1 k=1 ⎥⎟ where ⎡ ⎤ χ (AB)=χ ⎜⎢ . .. . ⎥⎟ ··· lm ⎜⎢ . . . ⎥⎟ 00 01 ⎝⎣ m m ⎦⎠ ⎢ ⎥ ··· ⎢ 10··· 00⎥ AlkBk1 AlkBkr ⎢ ⎥ k=1 k=1 ⎢ 01··· 00⎥ ∈ n×n En = ⎢ ⎥ R , (25) ⎡     ⎤ ⎢ . . . . . ⎥ m m ⎣ . . . . . ⎦ ··· . . . . . ⎢ χ A1k Bk1 χ A1k Bkr ⎥ ⎢ ⎥ 00··· 10 ⎢ k=1 k=1 ⎥ = ⎢ . .. . ⎥ ⎢  .  .  .  ⎥ 4A circulant matrix is a matrix C where each column is a cyclic shift of ⎣ m m ⎦ its previous column, such that C is diagonalizable by the DFT [18]. More χ AlkBk1 ··· χ AlkBkr concisely, we can write cik = a(i−k )modn . k=1 k=1 6536 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 64, NO. 24, DECEMBER 15, 2016

TABLE I ∈ 2×2 STEP-BY-STEP ILLUSTRATION OF THE CFT FOR A K2 ; SEE (10), (26) AND (27) FOR DEFINITIONS

  In general, due to the properties of the circulant blocks, the CFT can block diagonalize the adjoint matrix of any polar n-complex matrices. Here F = √1 1 1 . 2 2 1 −1 ⎡ ⎤ m m Algorithm 1: t-SVD [11]. χ(A )χ(B ) ··· χ(A )χ(B ) ⎢ 1k k1 1k kr ⎥ l×m ×n ⎢ k=1 k=1 ⎥ Input: X∈C // See footnote 5 for tensor notation. ⎢ ⎥ ⎢ . .. . ⎥ Output: U, S, V = ⎢ . . . ⎥ ⎣ m m ⎦ 1: X←ˆ fft(X , n, 3) // Applies n-point DFT to each tube. ··· χ(Alk)χ(Bk1 ) χ(Alk)χ(Bkr) 2: for i =1:n do k=1 k=1 3: [Uˆ ::i, Sˆ::i , Vˆ ::i ] ← svd(Xˆ ::i) // SVD each frontal slide. = χlm (A)χlm (B). 4: end for 5: U←ifft(Uˆ,n,3); S←ifft(Sˆ,n,3); V←ifft(Vˆ,n,3) In other words, the adjoint matrix χlm (A) is an isomorphic representation of the polar n-complex matrix A.  The above isomorphism is originally established for circulant blockwise through the SVD of cft(A) [11]: matrix-vector multiplication [17], which we have just extended ⎡ ⎤ ⎡ ⎤ ⎡ ⎤∗ ˆ ˆ ˆ to the case of matrix-matrix multiplication. This isomorphism U1 Σ1 V1 ⎢ . ⎥ ⎢ . ⎥ ⎢ . ⎥ simplifies our work both theoretically and experimentally by ⎣ .. ⎦ ⎣ .. ⎦ ⎣ .. ⎦ , (31) allowing us to switch to the adjoint matrix representation where Uˆ n Σˆ n Vˆ n it is more convenient. ˆ ˆ ˆ ∈ l×l then we can use icft(U),icft(Σ), and icft(V) to get U Kn , ∈ l×m ∈ m ×m C. Singular Value Decomposition S Kn , and V Kn where U and V are unitary [11], [17]. This is equivalent to the t-SVD in tensor signal processing ∈ l×m For the SVD of A Kn , we first define the stride-by-s (see Algorithm 1) [11], provided that we store the l × m polar [22] permutation matrix of order m by: n-complex matrix into an l × m × n real tensor,5 then the n- point DFT along all tubes is equivalent to the CFT. Matrix [Pm,s] =[Im ] − −   (27) ik is (m 1) is/m ,k multiplication√ can also be done blockwise in the CFT domain for i, k =0, 1,...,m− 1. This is equivalent to but more suc- with the n scaling as before. cinct than the standard definition in the literature [22]. The D. Proposed Extensions stride-by-s permutation greatly simplifies the definition of the two-dimensional shuffle in the following. We define the circu- In order to study the phase angle between matrices, we define lant Fourier transform (CFT) and its inverse (ICFT), in the same a new polar n-complex inner product as  ∗ ∈ l×m way as [17]: A, B = Re tr(AB ), A, B Kn . (32) ⊗ ⊗ ∗ −1 and use it to induce the polar n-complex Frobenius norm: cft(A)=Pln,l(Il Fn )χlm (A)(Im Fn )Pmn,m, (28)    ˆ ⊗ ∗ −1 ˆ ⊗ A F = A, A . (33) χlm (icft(A)) = (Il Fn )Pln,lAPmn,m(Im Fn ), (29) We propose two further isomorphisms for polar n-complex ma- P (·)P−1 ln × mn l×m → l×mn l×m → lmn where ln,l mn,m shuffles an matrix containing trices via ξ : Kn R and ν : Kn R : n × n diagonal blocks into a block diagonal matrix containing l × m blocks. Please refer to Table I to see this shuffle in action. ξ(A)= [Im0 A, Im1 A,...,Imn−1 A], (34) The purpose of cft(A) is to block diagonalize the adjoint matrix ν(A)= vecξ(A). (35) of A into the following form [17]: ⎡ ⎤ 5By convention, we denote tensors with calligraphic letters. For a three- × × Aˆ 1 dimensional tensor A∈Rn 1 n 2 n 3 , a fiber is a one-dimension subarray de- ⎢ ⎥ ˆ ⎣ . ⎦ fined by fixing two of the indices, whereas a slice is a two-dimensional subarray A = cft(A)= .. , (30) defined by fixing one of the indices [23]. The (i, k, l)-th element of A is denoted by Aikl. If we indicate all elements of a one-dimensional subarray using the Aˆ n MATLAB colon notation, then A:kl, Ai:l ,andAik: are called the column, row and tube fibers, respectively [23]. Similarly, Ai::, A:k : ,andA::l are called the horizontal, lateral, and frontal slides, respectively [23]. Notably, Kilmer, Martin, while icft(Aˆ ) inverts this operation. Here, Aˆ i can be understood and Perrone [11] reinterprets an n × n × n tensor as an n × n matrix as the eigenvalues of the input as produced in the Fourier trans- 1 2 3 1 2 of tubes (of length n3 ). This is most relevant to our present work when polar form order, as noted by [17]. The SVD of A can be performed n3 -complex numbers are seen as tubes. CHAN AND YANG: POLAR n-COMPLEX AND n-BICOMPLEX SINGULAR VALUE DECOMPOSITION AND PRINCIPAL COMPONENT PURSUIT 6537

2 These are the polar n-complex matrix counterparts of the tensor Note that we still have p, p = |p| , because Re pq¯ = Re[χ 6 ∗ n unfold and vec operators, respectively. We end this subsection (p)χ(q) ]ii = Re k=1[χ(p)]ik[χ(q)]ik for arbitrary i, which l×m by enumerating two elementary algebraic properties of Kn , gives the Euclidean inner product (likewise for Re pq¯ ). So given n−1 n−1 which will come in handy when we investigate the trace norm p = i=0 ai ei and q = i=0 biei ,wenowhave later in Theorem 9. The proofs are given below for completeness. ∈ l×m n−1 Proposition 2: If A, B Kn , then the following holds: ∗ T Re pq¯ = Re pq¯ = Re a ¯b . (39) 1) A, B = Re tr(A B)=ν(A) ν(B); i i  2 | | i=0 2) A F = i σi(A) . where σ (A) are the singular values of A obtained from icft(Σˆ) i A. Polar n-Bicomplex Matrices and Their Isomorphisms after steps (30) and (31). Proof: Analogously, we denote the set of l × m matrices with polar l×m ∈ n-bicomplex entries by CKn . The adjoint matrix of A 1) This is a direct consequence of (20) after observing that × × ∗ l m l m → ln×mn ¯ CKn can be defined similarly via χlm : CKn C : Re tr(AB )=Re i,k AikBik. From this we can say that our polar n-complex inner product is Euclidean. As a ⎡ ⎤ χ(A11) χ(A12) ... χ(A1m ) corollary we have A2 = |A |2 . ⎢ ⎥ F i,k ik ⎢ ⎥ 2) As the Frobenius norm is invariant under any unitary χ(A21) χ(A22) ... χ(A2m ) χ (A)=⎢ ⎥ . (40)  2  2 lm ⎢ . . . . ⎥ transformation [24], we can write A F = Σ F = ⎣ . . .. . ⎦ | |2  . . . i σi (A) . χ(Al1 ) χ(Al2 ) ... χ(Alm )

III. EXTENSION TO POLAR n-BICOMPLEX NUMBERS Next we are going to show that the C-linear map χlm (A): √ mn → ln One problem with the real numbers is that −1 ∈/ R; that C C operates in the same manner as the CKn-linear m → l is, they are not algebraically closed. This affects the polar n- map A : CKn CKn . ∈ l×m complex numbers too since their real and imaginary parts consist Theorem 3: Let A CKn . Then we have: ∈ m ×m of real coefficients only. To impose algebraic closure for certain 1) χmm(Im )=Imn if Im CKn ; ∈ m ×r applications, we can go one step further and use complex coeffi- 2) χlr(AB)=χlm (A)χmr(B) if B CKn ; ∈ l×m 3) χlm (A + B)=χlm (A)+χlm (B) if B CKn ; cients instead. More specifically, we extend the polar n-complex ∗ ∗ algebra by allowing for complex coefficients in (6), such that 4) χlm (A )=χlm (A) ; −1 −1 5) χlm (A )=χlm (A) if it exists. p = a0 e0 + a1 e1 + ···+ an−1 en−1 ∈ CKn , (36) Proof: See Theorem 1.  The polar n-bicomplex SVD, inner product and Frobenius where a ,a ,...,a − ∈ C. In other words, both real and imag- 0 1 n 1 norm can be defined following (31), (32) and (33). The illus- inary parts of p now contain complex numbers (effectively tration in Table I still applies. The additional isomorphisms are doubling its dimensions). This constitutes our definition of the l×m → l×2mn l×m → 2lmn defined via ξ : CKn R and ν : CKn R : polar n-bicomplex numbers CKn . The first imaginary unit is − still e0 =1and e1 ,...,en 1 satisfies the same multiplication ξ(A)= [Re Im0 A, Im Im0 A,...,Im Imn−1 A] (41) table in (7). We can now write Re p = Re a0 for the real part ν(A)= vec ξ(A). (42) of p (note the additional Re) and Imi p = ai for the imaginary parts for i =0, 1,...,n− 1 (as before, the imaginary part in- ∈ l×m Proposition 4: If A, B CKn , then the following holds: cludes the real part for notational convenience). The modulus ∗ T 1) A, B = Re tr(A B)=ν(A) ν(B); then becomes  2 | | 2) A F = i σi (A) , 2 2 2 |p| = |a0 | + |a1 | + ···+ |an−1 | , (37) where σi(A) are the singular values of A. Proof: See Proposition 2.  along with the same n − 1 angles in (11–14). For example, if g =(1+2j)+(3+4j)e +(5+6j)e ,wehaveReg =1, 1 2 IV. POLAR n-COMPLEX AND n-BICOMPLEX PCP Im√ 0 g =1+2j,Im1 g =3+4j,Im2 g =5+6j, and |g| = 91. The polar n-bicomplex numbers are ring-isomorphic to PCP algorithms [1], [2] are traditionally implemented by the same matrix in (15), and have the same properties (17–20). proximal optimization [25] which extends gradient projection to The multiplication can still be done in the Fourier domain if the nonsmooth case. Often, closed-form solutions for the prox- desired. The polar n-bicomplex conjugation can be defined in imity operators are available, like soft-thresholding [26] and the same manner as (21). Given our new definition of Re, the singular value thresholding [27] in the real-valued case. scalar product is: A. Equivalence to Real-Valued Proximal Methods p, q = Re pq.¯ (38) To fix our notation, recall that the proximity operator of a function f : Rm → R is traditionally defined as [25]: × × 6Column unfolding reshapes the tensor A∈Rn 1 n 2 n 3 into a matrix M ∈ × n 1 n 2 n 3 1 R by mapping each tensor element Aikl into the corresponding matrix  − 2 ∈ m element M − [23]. proxf z = arg min z x 2 + f(x), x R . (43) i,k+(l 1)n 2 x 2 6538 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 64, NO. 24, DECEMBER 15, 2016

∈ m m   For x Kn or CKn we can use ν(x) instead of x and adjust λ X 1 using (44):    − 2 f(x) accordingly. As z x 2 is invariant under this transfor- F − λ ∈ lm m prox · z = 1 z, z F , (47) f λ 1 | | mation, we can equivalently extend the domain of to Kn or z + CKm without adjusting f(x) in the following. This equivalence n | | establishes the validity of proximal minimization using polar where F is Kn or CKn and z = vec Z.Here z corresponds n-complex and n-bicomplex matrices directly, without needing to the Euclidean norm in (44) and the grouping should follow to convert to the real domain temporarily. the definition of ξ(A) for the respective algebra. Note how each entry corresponds to its real-isomorphic group ξ(·) here. B. The Proximity Operator for the  Norm 1 C. The Proximity Operator for the Trace Norm We deal with the  - and trace-norm regularizers in order. 1 Next we will treat the trace-norm regularizer. We begin our Lemma 5 (Yuan and Lin [28]): Let {x ,...,x } be a  (1) (m ) proof by quoting a classic textbook inequality. In what follows, partition of x such that x = m x . The proximity oper- i=1 (i) σ (A) denotes the singular values of A. ator for the group lasso regularizer λ m x  is i i=1 (i) 2 Lemma 8 (von Neumann [24]): For any A, B ∈ Cl×m ,the     m von Neumann trace inequality holds: λ prox z = 1 − z , z ∈ Rr , (44) λ ·2 (i) ∗ z(i)2 Re tr(AB ) ≤ σi(A)σi (B). (48) + i=1 i m T T T where (y)+ denotes max(0, y), [yi ]i=1 =[y1 ,...,ym ] is a Proof: This is a standard textbook result [24].  × real column vector, and r is the sum of the sizes of x(·). Theorem 9: For any A, B ∈ Kl×m or CKl m , the following  n n Proof: This result is standard in sparse coding [28], [29]. extension to the von Neumann inequality holds: The group lasso is a variant of sparse coding that promotes ∗ group sparsity, i.e., zeroing entire groups of variables at once or Re tr(AB ) ≤ |σi(A)||σi (B)|. (49) not at all. When we put the real and imaginary parts of a polar i n-complex or n-bicomplex number in the same group, group Proof: This theorem embodies the key insight of this sparsity makes sense, since a number cannot be zero unless all paper. Our novel discovery is that we can switch to the block- its constituent parts are zero, as in the next theorem. diagonalized CFT space to separate the sums and switch back: Theorem 6: The polar n-complex or n-bicomplex lasso n−1 ˆ ˆ ∗ ˆ ˆ ∗ 1 2 m Re tr(AB )= Re tr(Ak Bk ) min z − x + λx1 , z, x ∈ F , (45) x 2 2 k=0 n−1 where F is Kn or CKn , is equivalent to the group lasso ≤ σi(Aˆ k )σi (Bˆ k ) 1 − 2   i k=0 min ξ(z x) F + λ ξ(x) 1,2 , (46)  ξ(x) 2 n−1 n−1 2 2 ≤ σ (Aˆ k ) σ (Bˆ k ) where A is defined as |A |2 . k=0 i k=0 i 1,2 i k ik i Proof: The proof is straightforward: = |σ (Aˆ )||σ (Bˆ)|, 1 1 i i z − x2 + λx = |z − x |2 + λ|x | i 2 2 1 2 i i i i where Aˆ = cft(A) and Bˆ = cft(B), respectively. The second

1 line is by Lemma 8 and the third line is due to the Cauchy- = ξ(z − x )2 + λξ(x ) 2 i i 2 i 2 Schwarz inequality. Using Parseval’s theorem, this theorem i is proved.  1 − 2   Theorem 10: The proximity operator for the polar n- = ξ(z x) F + λ ξ(x) 1,2 . 2 n λ |σ (X)| X,  complex or -bicomplex trace norm i i , assuming Z ∈ Kl×m or CKl×m ,is: The first line invokes the definitions of |·|in (8) and (37), n n    while the second line is due to the proposed isomorphisms in λ ∗ prox z = vec U 1 − ◦ Σ V , z ∈ F lm , λ·∗ | | (34) and (41). In other words, we have discovered a method to Σ + solve the novel polar n-complex or n-bicomplex lasso problem (50) using real-valued group lasso solvers. By combining Lemma 5 where z = vec Z, UΣV∗ is the SVD of Z with singular values and Theorem 6, we arrive at the main result of this subsection. Σii = σi(Z), the absolute value of Σ is computed entrywise, Corollary 7: For the entrywise 1 -regularizer λX1 , where and F is Kn or CKn . ∈ l×m l×m X, Z Kn or CKn , we may treat X as a long hypercom- Proof: The proof follows [29] closely except that Theo- plex vector of length lm without loss of generality. Simply rem 9 allows us to extend the proof to the polar n-complex assign each to its own group gi , for all and n-bicomplex cases. Starting from the Euclidean inner prod- 1 ≤ i ≤ lm numbers, and we obtain the proximity operator for uct identity z − x, z − x = z,z−2 z,x + x, x, which CHAN AND YANG: POLAR n-COMPLEX AND n-BICOMPLEX SINGULAR VALUE DECOMPOSITION AND PRINCIPAL COMPONENT PURSUIT 6539 is applicable because of Propositions 2 and 4, we have the Algorithm 2: Polar n-(Bi)complex PCP. following inequality: l×m ∞ Input: X ∈ F ,F ∈{Kn , CKn }, λ ∈ R, μ ∈ R  − 2 | |2 −  | |2 Z X F = σi (Z) 2 Z, X + σi (X) Output: Lk , Sk −1 i i 1: Let S =0, Y = X/ max X ,λ X∞ , k =1 1 1 2 2 2 2: while not converged do ≥ |σi (Z)| − 2|σi (Z)||σi (X)| + |σi (X)| −1 3: Lk+1 ← prox · (X − Sk + μ Yk ) i 1/μk ∗ k ← F − −1 4: Sk+1 prox (X Lk+1 + μk Yk ) 2 λ/μk ·1 = (|σi (Z)|−|σi (X)|) , 5: Yk+1 ← Yk + μk (X − Lk+1 − Sk+1) i 6: k ← k +1 where Theorem 9 is invoked on the penultimate line. Thus: 7: end while 1 Z − X2 + λ |σ (X)| 2 F i i Algorithm 3: Optimized Polar n-(Bi)complex PCP. ∈ l×m ∈{ } ∈ ∈ ∞ 1 Input: X F ,F Kn , CKn , λ R, μ R ≥ (|σ (Z)|−|σ (X)|)2 + λ|σ (X)| 2 i i i Output: L, S i −1 1: Let Sˆ =0, Y = X/ max X2 ,λ X∞ , k =1 1 2 2: Xˆ ← fft(X, n, 3) // Applies n-point DFT to each tube. = |σ(Z)|−|σ(X)| + λσ(X)1 , 2 2 3: Yˆ ← fft(Y, n, 3) which is equivalent to a lasso problem on the (elementwise) 4: while not converged do ˆ ← ˆ − ˆ −1 ˆ modulus of the singular values of a polar n-complex or n- 5: Z X S + μk Y bicomplex matrix. By applying Corollary 7 to the modulus of 6: for i =1:n do ˆ ˆ ˆ ← ˆ the singular values entrywise, the theorem is proved.  7: [U::i , Σ::i , V::i ] svd(Z::i) Unlike the entrywise 1 -regularizer, the proximity operator in 8: end for 9: Σˆ ← prox√F Σˆ Theorem 10 first operates on the entire matrix all at once. Once n/μk ·1 the SVD is computed, the absolute value of its singular values 10: for i =1:n do ˆ ˆ ˆ ˆ ∗ are then calculated entrywise (or real-isomorphic groupwise) to 11: L::i = U::iΣ::i V::i respect the properties of the underlying algebra. 12: end for − ˆ ← F √ ˆ − ˆ 1 ˆ 13: S prox (X L + μk Y) λ n/μk ·1 D. The Extended Formulations of PCP 14: Yˆ ← Yˆ + μk (Xˆ − Lˆ − Sˆ) With the new proximal operators in (47) and (50), we can 15: k ← k +1 finally define the polar n-complex and n-bicomplex PCP: 16: end while 17: L ← ifft(Lˆ,n,3) min L∗ + λS1 s.t. X = L + S , (51) ← ˆ L,S 18: S ifft(S,n,3) ∈ l×m ∈ where X Kn for the polar n-complex PCP and X l×m CKn for the polar n-bicomplex PCP. We can solve this by would work, but good guesses would converge faster [6]. As the same algorithms in [6], except that we should replace the for μk , any increasing sequence can be used, so long as it sat- ∞ soft-thresholding function: isfies the convergence assumptions μ /μ2 < ∞ and ⎧ k=1 k+1 k lim →∞ μ (S − S )=0[6]. As both K and CK are iso- ⎪ x − λ, if x>λ, k k k+1 k n n ⎨ morphic to the circulant algebra, the easiest option is to use S − λ[x]=⎪ x + λ, if x< λ, (52) Gleich’s circulant algebra matrix (CAMAT) toolbox [17] to ⎩⎪ 0, otherwise implement the algorithms. However, CAMAT is slightly slow due to unnecessary conversions to and from frequency domain with proxKn z and proxCKn z for the polar n-complex at each iteration, so we reimplement this algebra from scratch λ·1 λ·1 and n-bicomplex PCP, respectively. The inexact augmented and entirely in the Fourier domain, via (16). See Algorithm 3 Lagrange multiplier (IALM) method, also known as alternating √for our optimized frequency domain implementation. The extra n scaling for the proximity operators is due to the fact that direction method of multipliers, is well-established in the 7 literature and its convergence has long been proven [30]–[32]. MATLAB’s fft is unnormalized. Our adaptation is shown in Algorithm 2. As the constraint X = L + S only uses simple additions, which are elementwise V. N UMERICAL SIMULATIONS by definition, IALM will continue to work without change (via To demonstrate the benefit of algebraic closure in polar n- Proposition 2 and Proposition 4). In the original IALM formu- bicomplex numbers (introduced in Section III), we will numer- lation [6], their choice of Y1 is informed by the dual problem, whereas their μk ’s are incremented geometrically to infinity. 7All the code for this paper (including Algorithms 3 and 4) is available at We will simply follow them here. In theory, any initial value http://mac.citi.sinica.edu.tw/ikala/code.html to support reproducibility. 6540 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 64, NO. 24, DECEMBER 15, 2016

Fig. 1. Recovery success rates for (a) polar 4-complex embedding, (b) polar 2-bicomplex embedding, and (c) quaternionic embedding. Matrix generation and success criteria are detailed in Section V. From top to bottom: results for ε =0.1, 0.05, and 0.01, respectively. Grayscale color indicates the fraction of success (white denoting complete success, black denoting total failure).

ically recover hypercomplex matrices of various ranks from trial a success if the recovered low-rank solution L1 satis-  − ∗  ∗ additive noises with different levels of sparsity using hyper- fies L1 X1 Y1 F / X1 Y1 F <ε. Likewise, the M2 part complex PCP. Low-rank plus sparse matrices can be generated of the trial is deemed successful if the recovered L2 satisfies ∗  − ∗  ∗ using Candes` et al.’s XY + S model [1], where X and Y L2 X2 Y2 F / X2 Y2 F <ε. are m × r matrices with independent and identically distributed The results are shown in Fig. 1 for ε =0.1, 0.05, and 0.01. The (i.i.d.) Gaussian entries from N (0, 1/m), S is an m × m ma- color of each cell indicates the proportion of successful recovery trix with i.i.d. 0-1 entries from Bernoulli(ρ) multiplied by uni- for each (r, ρ) pair across all 10 trials. Results suggest that formly random signs, and r and ρ are the desired rank and quaternions and polar 2-bicomplex numbers have comparable sparsity, respectively. To accommodate complex coefficients, performance up to a sparsity of about 0.16. Both markedly we instead use the complex normal distribution CN(0, I/m) outperform polar 4-complex numbers for all ε. As we decrease for X and Y, and replace the random signs for S with unit- ε to 0.01, the polar 4-complex numbers have completely failed modulus complex numbers whose phases are uniformly dis- while the other two are still working well. It may be argued tributed. Following [1], we consider square matrices of size that the quaternions are better than polar 2-bicomplex numbers m = 400. For each (r, ρ) pair, we conduct 10 trials of the fol- for sparsities above 0.16, but their main weakness is that the lowing simulation. In each trial, we generate two complex ma- dimensionality is fixed at 4 so they are less flexible than polar ∗ ∗ trices, M1 = X1 Y1 + S1 and M2 = X2 Y2 + S2 ,usingthe n-bicomplex numbers in general. In summary, our simulations complexified model described above. Then we embed the two have provided clear evidence for the importance of algebraic complex matrices into one hypercomplex matrix by: closure in hypercomplex systems. 1) Polar 4-complex embedding: the matrices are combined Next, we will use real data to test the practicality of our into (ReM1 )+(ImM1 )e1 +(ReM2 )e2 +(ImM2 )e3 . proposed algorithms. 2) Polar 2-bicomplex embedding: the matrices are combined into M1 + M2 e1 . 3) Quaternionic embedding [7]: the matrices are combined VI. EXPERIMENTS into M1 + M2 j. For each embedding, we perform PCP with a relative er- In this section, we use the singing voice separation (SVS) −7 ror tolerance of 10 , as in [6]. We call the M1 part of the task to evaluate the effectiveness of the polar n-bicomplex PCP. CHAN AND YANG: POLAR n-COMPLEX AND n-BICOMPLEX SINGULAR VALUE DECOMPOSITION AND PRINCIPAL COMPONENT PURSUIT 6541

Algorithm 4: Optimized Tensor RPCA (cf. [15]). l×m ∞ Input: X ∈ F ,F ∈{Kn , CKn }, λ ∈ R, μ ∈ R Output: L, S −1 1: Let Sˆ =0, Y = X/ max X2 ,λ X∞ , k =1 2: Xˆ ← fft(X, n, 3) // Applies n-point DFT to each tube. 3: Yˆ ← fft(Y, n, 3) Fig. 2. Block diagram of a multichannel PCP-SVS system. For our ex- periments, PCP is either polar n-bicomplex PCP, polar 2n-complex PCP, 4: while not converged do ˆ ← ˆ − ˆ −1 ˆ quaternionic PCP [7], or tensor RPCA [15]. 5: Z X S + μk Y 6: for i =1:n do ˆ ˆ ˆ ← ˆ SVS is an instance of blind source separation in the field of 7: [U::i , Σ, V::i ] svd(Z::i) 8: Σˆ ←S [Σˆ] music signal processing, and its goal is to separate the singing 1/μk ˆ ˆ ˆ ˆ ∗ voice component from an audio mixture containing both the 9: L::i = U::i ΣV::i 10: end for singing voice and the instrumental accompaniment (see Fig. 2). − ˆ ← F √ ˆ − ˆ 1 ˆ 11: S prox (X L + μk Y) For applications such as singer modeling or lyric alignment [33], λ n/μk ·1 SVS has been shown an important pre-processing step for better 12: Yˆ ← Yˆ + μk (Xˆ − Lˆ − Sˆ) performance. We consider SVS in this evaluation because PCP 13: k ← k +1 has been found promising for this particular task, showing that 14: end while to a certain degree the magnitude spectrogram of pop music can 15: L ← ifft(Lˆ,n,3) be decomposed into a low-rank instrumental component and a 16: S ← ifft(Sˆ,n,3) sparse voice component [2]. A. Algorithms of 100 full stereo songs of different styles and includes The following versions of PCP-SVS are compared: the synthesized mixtures and the original sources of voice 1) Polar n-bicomplex PCP: the n-channel audio is repre- and instrumental accompaniment. To reduce computa- ··· sented using X1 e0 + + Xn en−1 , where Xi contains tions, we use only 30-second fragments (1’45” to 2’15”) the complex spectrogram for the i-th channel. clipped from each song, which is the only period where all 2) Polar 2n-complex PCP: the n-channel audio is rep- 100 songs contain vocals. The MSD100 songs are divided ··· resented using (Re X1 )e0 +(Im X1 )e1 + +(Re Xn ) into 50 development songs and 50 test songs, but SiSEC e2n−2 +(Im Xn )e2n−1 , where Xi contains the complex requires testing to be done on both sets. We will follow spectrogram for the i-th channel. their convention here. 3) Quaternionic PCP (if applicable) [7]: the two-channel au- 2) The Single- and Multichannel Audio Recordings dio is represented using X1 + X2 j, where Xi contains Database (SMARD).9 This dataset contains 48 measure- the complex spectrogram for the i-th channel. ment configurations with 20 audio recordings each [34]. 4) Tensor RPCA [15]: the same spectrograms are represented SMARD configurations consist of four digits (ABCD): by complex matrices of tubes. The tensor RPCA is used, A denotes the loudspeaker equipment used, B denotes which is defined by: loudspeaker location, C denotes microphone type, and D

min LTNN + λS1,1,2 s.t. X = L + S, (53) denotes microphone array locations. To simulate real life L,S recordings, we require that voice and music come from

where LTNN is defined as the sum of the singular different point sources, that is B =0for voice and 1 for values of all frontal slices of Lˆ (obtained by a Fourier music or vice versa. Secondly, we require C =2for circu- S transform along each tube) and 1,1,2 is defined by lar microphone arrays, because they are better for spatial S  i,k ik: F [15]. To facilitate comparison with polar surround audio recording. Further we choose the first cir- n-bicomplex PCP, we retrofit (53) into our framework: cular array which is closest to the sources, which gives us six audio channels. Finally, we require voice and music min cft(L)∗ + λS1 s.t. X = L + S, (54) L,S to have the same A and D so it makes sense to mix the signals. For each chosen configuration, we mix the first ∈ l×m where X Kn is the input. Our optimized implemen- 30 seconds of soprano with the first 30 seconds of each of tation is shown in Algorithm 4, where all calculations are the music signals (clarinet, trumpet, xylophone, ABBA, done in the frequency domain. bass flute, guitar, violin) at 0 dB signal-to-noise ratio. For B. Datasets soprano, we pad zero until it reaches 30 seconds; for mu- sic, we loop it until it reaches 30 seconds. This creates a The following datasets will be used: repeating music accompaniment mixed with sparser vo- 1) The MSD100 dataset from the 2015 Signal Separation cals. We single out two configurations as the training set Evaluation Campaign (SiSEC).8 The dataset is composed

8http://corpus-search.nii.ac.jp/sisec/2015/MUS/MSD100_2.zip 9http://www.smard.es.aau.dk/ 6542 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 64, NO. 24, DECEMBER 15, 2016

(music from 2020 with soprano from 2120, music from TABLE II 2021 with soprano from 2121), while using the remaining RESULTS FOR MSD100 INSTRUMENTAL (L) AND VOCAL (S), IN DB 10 configurations for testing. For both datasets, we downsample the songs to 22 050 Hz to reduce memory usage, then we use a short-time Fourier trans- form (STFT) with a 1 411-point Hann window with 75% overlap as in [35].

C. Parameters and Evaluation

Following [6], the convergence criteria is X − Lk − −7 Sk F /XF < 10 , and μ is defined by μ0 =1.25/X2 and μk+1 =1.5μk .Thevalueofc is determined by a grid search TABLE III on the training set and is found to be 3 for SiSEC and 2 for RESULTS FOR SMARD INSTRUMENTAL (L) AND VOCAL (S), IN DB SMARD (1 for SMARD with tensor RPCA). The quality of separation will be assessed by BSS Eval toolbox version 3.010 in terms of signal-to-distortion ratio (SDR), source-image-to-spatial-distortion ratio (ISR), source- to-interference ratio (SIR), and sources-to-artifacts ratio (SAR), for the vocal and the instrumental parts, respectively [36]. BSS Eval decomposes each estimated source h into four compo- nents (assuming that the admissible distortion is a time-invariant filter [37]): 2-bicomplex PCP has the highest GNSDR in both L and S, strue espat einterf eartif, sˆh = h + h + h + h (55) followed by polar 4-complex PCP. Both are also slightly better where sˆ is the estimated source, strue is the true source, espat than tensor RPCA on all other performance measures except is the spatial distortion for multi-channel signals, einterf is the GISR and GSAR in L. Overall, the result for L is better than S interference from other sources, and eartif is the artifacts of the because the instruments in this dataset are usually louder than source separation algorithm such as musical noise. The metrics the vocals (as reflected by the GSDR). It can be observed that are then computed as follows [36]: the GNSDR for polar n-(bi)complex PCP are not inferior to that of quaternionic PCP, suggesting that they are good candidates strue SDR = 20 log h , (56) for PCP with four-dimensional signals. h 10  − true sˆh sh For the SMARD dataset, the results are presented in Ta-  true ble III. Both of our proposed algorithms are equally competitive, sh ISRh = 20 log , (57) 10 espat and both clearly outperform tensor RPCA in terms of GNSDR, h GSDR, and GSIR. When we break down the results by config- strue + espat uration, we find that polar n-(bi)complex PCP are better than SIR = 20 log h h , (58) h 10  interf  tensor RPCA in 8 out of 10 configurations. eh  − artif sˆh eh SAR = 20 log . (59) VII. DISCUSSION AND CONCLUSION h 10  artif eh We believe that we have demonstrated the superiority of our All these measures are energy ratios expressed in decibels. proposed hypercomplex algorithms. Theoretically, the tensor Higher values indicate better separation quality. During param- RPCA [15] is computing the nuclear norm in the CFT space eter tuning, h is dropped and the measures are averaged over (54), which is probably due to an erroneous belief that the CFT all sources. From SDR we also calculate the normalized SDR is unitary and thus does not change anything [39]. However, (NSDR) by computing the improvement in SDR using the mix- as t-SVD is based on the circulant algebra, where the singular ture itself as the baseline [38]. We compute these measures for values are also circulants, the two trace norms are not equiva- each song and then report the average result (denoted by the G lent. As a result, we should not have omitted the ICFT, as tensor prefix) for both the instrumental (L) and vocal (S) parts. The RPCA does. This omission is difficult to detect because tensors most important metric is GNSDR which measures the overall themselves do not have enough algebraic structures to guide us. improvement in source separation performance. In contrast, our formulation includes both the CFT and ICFT steps while computing the SVD of a polar n-bicomplex matrix, D. Results as described in the paragraph after (31), which does not violate The results for the MSD100 dataset are shown in Table II. The the underlying circulant algebra. This observation hints at a new best results are highlighted in bold. Broadly speaking, polar role for hypercomplex algebras—to provide additional algebraic structures that serve as a new foundation for tensor factoriza- 10http://bass-db.gforge.inria.fr/ tion. By way of example, let us consider Olariu’s other work, CHAN AND YANG: POLAR n-COMPLEX AND n-BICOMPLEX SINGULAR VALUE DECOMPOSITION AND PRINCIPAL COMPONENT PURSUIT 6543

where a ,a ∈ R for hyperbolic numbers and a ,a ∈ C for Algorithm 5: t-SVD with a Skew-Circulant Representation. 0 1 o 1 the tessarines. Thus, the hyperbolic numbers are isomorphic X∈ l×m ×n Input: C to K whereas the tessarines are isomorphic to CK . Interest- U S V 2 2 Output: , , ingly, the seminal paper on tessarine SVD [45] has advocated 1: X←ˆ sft(X , n, 3) // We use the skew DFT instead. the e1 − e2 form to simplify computations, where they trans- 2: for i =1:n do form the inputs with (a0 ,a1 ) → (a0 + a1 ,a0 − a1 ), perform 3: [Uˆ , Sˆ , Vˆ ] ← svd(Xˆ ) ::i ::i ::i ::i the SVDs, then transform the outputs back with (A0 ,A0 ) → 4: end for ((A + A )/2, (A − A )/2). If we look closely, these are ac- U← Uˆ S← Sˆ V← Vˆ 0 1 0 1 5: isft( ,n,3); isft( ,n,3); isft( ,n,3) tually Fourier transform pairs (as used in Algorithm 1), hence the tessarine SVD can be considered as a special case of t- TABLE IV SVD when n =2. It can also be observed that, when n =1, NUMBER OF DISTINCT (SEMI)GROUPS OF ORDERS UP TO 9 the polar n-complex and polar n-bicomplex PCP degenerate into the real and complex PCP, respectively. It should be em- phasized that the complex numbers are not in Kn , therefore we have introduced CKn for algebraic closure, and its importance has been confirmed by numerical simulations. We further note that the two families of 2N-dimensional hypercomplex numbers introduced by Alfsmann [10] are also commutative group alge- bras diagonalizable by the Walsh-Hadamard transform matrices F2 ⊗···⊗F2 [10], [41]. To conclude, we have extended the PCP to the polar n- complex and n-bicomplex algebras, with good results. Both From the On-Line Encyclopedia of Sequences, http:// N oeis.org/A000688 and http://oeis.org/A001427 algebras are representationally compact (does not require 2 dimensions) and are computationally efficient in Fourier space. We have found it beneficial to incorporate hypercomplex alge- the planar n-complex numbers, which have a skew-circulant braic structures while defining the trace norm. More concretely, representation [16]. As skew circulants are diagonalizable by we have proven an extended von Neumann theorem, together the skew DFT,11 a new kind of t-SVD can be derived easily (see with an adaptation of the group lasso, which in concert enable Algorithm 5). Here sft and isft stands for skew DFT and inverse us to formulate and solve the hypercomplex PCP problem. In skew DFT, respectively. doing so, we are able to incorporate the correct algebraic struc- What is more, the above procedure can be trivially extended tures into the objective function itself. We have demonstrated to any commutative group algebras,12 since the matrix represen- that the hypercomplex approach is useful because it can: 1) in- tation of a commutative group algebra is diagonalizable by the form t-SVD-related algorithms by imposing relevant algebraic DFT matrix for the algebra [41], viz. F ⊗···⊗F where n 1 n m structures; and 2) generate new families of t-SVD’s beyond n to n can be uniquely determined [42]. In other words, we 1 m the circulant algebra. We have also established that tessarine get the commutative group algebraic t-SVD simply by reinter- SVD is a special case of t-SVD, and that the 2N-hypercomplex preting fft and ifft in Algorithm 1 according to the algebra’s DFT family of Alfsmann is amenable to a straightforward exten- matrix, for which fast algorithms are available [42]. Going even sion of t-SVD which we call the commutative group algebraic further, we conjecture that the most fruitful results for hyper- t-SVD. Having formulated the first proper PCP algorithm on complex SVD may originate from regular semigroup algebras cyclic algebras, we would recommend more crossover attempts (i.e., by relaxing the group axioms of identity and invertibility between the hypercomplex and tensor-based approaches for to that of pseudoinvertibility [43]). By doing so, we gain a much future work. larger modeling space (see Table IV) which may be desirable for data fitting applications. At present, harmonic analysis on semigroups [44] is still relatively unexplored in tensor signal ACKNOWLEDGMENT processing. The authors would like to thank the anonymous reviewers for Regarding the hyperbolic numbers and tessarines that Alfs- their numerous helpful suggestions. mann has recommended, we find that both of them share the same circulant representation [10]:   REFERENCES a0 a1 , (60) [1] E. J. Candes,` X. Li, Y. Ma, and J. Wright, “Robust principal component a1 a0 analysis?” J. ACM, vol. 58, no. 3, pp. 1–37, 2011. [2] P. -S. Huang, S. D. Chen, P. Smaragdis, and M. Hasegawa-Johnson, 11 T T “Singing-voice separation from monaural recordings using robust princi- The skew DFT of [a0 ,a1 ,...,an −1 ] is [A0 ,A1 ,...,An −1 ] where − pal component analysis,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal n 1 −πij(2k +1)/n − Ak = i=0 ai e for k =0, 1,...,n 1 [40]. Process., 2012, pp. 57–60. 12Hypercomplex algebras where the real and imaginary units obey the com- [3] Y.Ikemiya, K. Yoshii, and K. Itoyama, “Singing voice analysis and editing mutative group axioms including associativity, commutativity, identity, and based on mutually dependent f0 estimation and source separation,” in Proc. invertibility. IEEE Int. Conf. Acoust., Speech, Signal Process., 2015, pp. 574–578. 6544 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 64, NO. 24, DECEMBER 15, 2016

[4] Y. Peng, A. Ganesh, J. Wright, W. Xu, and Y. Ma, “RASL: Robust align- [32] S. Kontogiorgis and R. R. Meyer, “A variable-penalty alternating di- ment by sparse and low-rank decomposition for linearly correlated im- rections method for convex optimization,” Math. Program., vol. 83, ages,” in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., pp. 29–53, 1998. 2010, pp. 763–770. [33] B. Zhu, W. Li, R. Li, and X. Xue, “Multi-stage non-negative matrix fac- [5] T. Bouwmans and E. H. Zahzah, “Robust PCA via principal component torization for monaural singing voice separation,” IEEE Trans. Audio, pursuit: A review for a comparative evaluation in video surveillance,” Speech, Lang. Process., vol. 21, no. 10, pp. 2096–2107, Oct. 2013. Comput. Vis. Image Understand., vol. 122, pp. 22–34, 2014. [34] J. K. Nielsen, J. R. Jensen, S. H. Jensen, and M. G. Christensen, “The [6] Z. Lin, M. Chen, L. Wu, and Y. Ma, “The augmented Lagrange multiplier single- and multichannel audio recordings database (SMARD),” in Proc. method for exact recovery of corrupted low-rank matrices,” Univ. Illinois Int. Workshop Acoust. Signal Enhancement, 2014, pp. 40–44. at Urbana-Champaign, IL, USA, Tech. Rep. UILU-ENG-09–2215, 2009. [35] T.-S. Chan et al., “Vocal activity informed singing voice separation [7] T.-S. T. Chan and Y.-H. Yang, “Complex and quaternionic principal com- with the iKala dataset,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal ponent pursuit and its application to audio separation,” IEEE Signal Pro- Process., 2015, pp. 718–722. cess. Lett., vol. 23, no. 2, pp. 287–291, Feb. 2016. [36] E. Vincent et al., “The signal separation evaluation campaign (2007– [8] I. L. Kantor and A. S. Solodovnikov, Hypercomplex Numbers.NewYork, 2010): Achievements and remaining challenges,” Signal Process., vol. 92, NY, USA: Springer-Verlag, 1989. pp. 1928–1936, 2012. [9] P. Lounesto, Clifford Algebras and Spinors. Cambridge, U.K.: Cambridge [37] E. Vincent, R. Gribonval, and C. Fevotte, “Performance measurement in Univ. Press, 2001. blind audio source separation,” IEEE Trans. Audio, Speech, Lang. Pro- [10] D. Alfsmann, “On families of 2N -dimensional hypercomplex algebras cess., vol. 14, no. 4, pp. 1462–1469, 2006. suitable for digital signal processing,” in Proc. Eur. Signal Process. Conf., [38] C.-L. Hsu and J.-S. Jang, “On the improvement of singing voice separation 2006, pp. 1–4. for monaural recordings using the MIR-1K dataset,” IEEE Trans. Audio, [11] M. E. Kilmer, C. D. Martin, and L. Perrone, “A third-order generaliza- Speech, Lang. Process., vol. 18, no. 2, pp. 310–319, Feb. 2010. tion of the matrix SVD as a product of third-order tensors,” Tufts Univ., [39] O. Semerci, N. Hao, M. E. Kilmer, and E. L. Miller, “Tensor-based for- Medford, MA, USA, Tech. Rep. TR-2008–4, Oct. 2008. mulation and nuclear norm regularization for multienergy computed to- [12] K. Braman, “Third-order tensors as linear operators on a space of matri- mography,” IEEE Trans. Image Process., vol. 23, no. 4, pp. 1678–1693, ces,” Linear Algebra Appl., vol. 433, pp. 1241–1253, 2010. Apr. 2014. [13] M. E. Kilmer and C. D. Martin, “Factorization strategies for third-order [40] I. J. Good, “Skew circulants and the theory of numbers,” Fibonacci Quart., tensors,” Linear Algebra Appl., vol. 435, pp. 641–658, 2011. vol. 24, no. 2, pp. 47–60, 1986. [14] M. E. Kilmer, K. Braman, N. Hao, and R. C. Hoover, “Third-order tensors [41] M. Clausen and U. Baum, Fast Fourier Transforms. Mannheim, Germany: as operators on matrices: A theoretical and computational framework BI-Wissenschaftsverlag, 1993. with applications in imaging,” SIAM J. Matrix Anal. Appl., vol. 34, no. 1, [42] G. Apple and P.Wintz, “Calculation of Fourier transforms on finite Abelian pp. 148–172, 2013. groups (Corresp.),” IEEE Trans. Inf. Theory, vol. 16, no. 2, pp. 233–234, [15] Z. Zhang, G. Ely, S. Aeron, N. Hao, and M. E. Kilmer, “Novel methods Mar. 1970. for multilinear data completion and de-noising based on tensor-SVD,” in [43] M. Kilp, U. Knauer, and A. V. Mikhalev, Monoids, Acts and Categories Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2014, pp. 3842–3849. With Applications to Wreath Products and Graphs: A Handbook for Stu- [16] S. Olariu, Complex Numbers in N Dimensions. Amsterdam, The dents and Researchers. Berlin, Germany: Walter de Gruyter, 2000. Netherlands: Elsevier, 2002. [44] C. Berg, J. P. R. Christensen, and P. Ressel, Harmonic Analysis on Semi- [17] D. F. Gleich, C. Greif, and J. M. Varah, “The power and Arnoldi methods groups: Theory of Positive Definite and Related Functions.NewYork, in an algebra of circulants,” Numer.Linear Algebra Appl., vol. 20, pp. 809– NY, USA: Springer-Verlag, 1984. 831, 2013. [45] S.-C. Pei, J.-H. Chang, J.-J. Ding, and M.-Y. Chen, “Eigenvalues and [18] P. J. Davis, Circulant Matrices. New York, NY, USA: Wiley, 1979. singular value decompositions of reduced matrices,” IEEE [19] D. P. Mandic, C. Jahanchahi, and C. C. Took, “A quaternion gradient Trans. Circuits Syst. I, Reg. Papers, vol. 55, no. 9, pp. 2673–2685, Oct. operator and its applications,” IEEE Signal Process. Lett., vol. 18, no. 1, 2008. pp. 47–50, Jan. 2011. [20] I. Kra and S. R. Simanca, “On circulant matrices,” Notices Amer. Math. Soc., vol. 59, no. 3, pp. 368–377, 2012. [21] P. M. Cohn, Further Algebra and Applications. London, U.K.: Springer- Tak-Shing T. Chan (M’15) received the Ph.D. de- Verlag, 2003. gree from the University of London, London, U.K., [22] J. Granata, M. Conner, and R. Tolimieri, “The tensor product: A mathe- in 2008. From 2006 to 2008, he was a Scientific Pro- matical programming language for FFTs and other fast DSP operations,” grammer with the University of Sheffield. In 2011, IEEE Signal Process. Mag., vol. 9, no. 1, pp. 40–48, Jan. 1992. he was a Research Associate with the Hong Kong [23] T. G. Kolda and B. W. Bader, “Tensor decompositions and applications,” Polytechnic University. He is currently a Postdoctoral SIAM Rev., vol. 51, no. 3, pp. 455–500, 2009. Fellow in the Academia Sinica, Taipei, Taiwan. His [24] R. A. Horn and C. R. Johnson, Matrix Analysis, 2nd ed. Cambridge, U.K.: research interests include signal processing, cognitive Cambridge Univ. Press, 2013. informatics, distributed computing, pattern recogni- [25] P. L. Combettes and J.-C. Pesquet, “Proximal splitting methods in sig- tion, and hypercomplex analysis. nal processing,” in Fixed-Point Algorithms for Inverse Problems in Sci- ence and Engineering.,vol.49,H.H.Bauschke,R.S.Burachik,P.L. Combettes, V. Elser, D. R. Luke, and H. Wolkowicz, Eds. New York, NY, Yi-Hsuan Yang (M’11) received the Ph.D. degree in USA: Springer-Verlag, 2011, pp. 185–212. communication engineering from the National Tai- [26] D. L. Donoho, “De-noising by soft-thresholding,” IEEE Trans. Inf. Theory, wan University, Taipei, Taiwan, in 2010. Since 2011, vol. 41, no. 3, pp. 613–627, May 1995. he was with Academia Sinica as an Assistant Re- [27] J.-F. Cai, E. J. Candes,` and Z. Shen, “A singular value thresholding algo- search Fellow. He is also an Adjunct Assistant Profes- rithm for matrix completion,” SIAM J. Optim., vol. 20, no. 4, pp. 1956– sor with the National Cheng Kung University, Tainan, 1982, 2010. Taiwan. His research interests include music informa- [28] M. Yuan and Y. Lin, “Model selection and estimation in regression with tion retrieval, machine learning, and affective com- grouped variables,” J. Roy. Stat. Soc. B, vol. 68, no. 1, pp. 49–67, 2006. puting. He received the 2011 IEEE Signal Processing [29] R. Tomioka, T. Suzuki, and M. Sugiyama, “Augmented Lagrangian meth- Society Young Author Best Paper Award, the 2012 ods for learning, selecting, and combining features,” in Optimization for ACM Multimedia Grand Challenge First Prize, the Machine Learning, S. Sra, S. Nowozin, and S. J. Wright, Eds. Cambridge, 2014 Ta-You Wu Memorial Research Award of the Ministry of Science and MA, USA: MIT Press, 2012, pp. 255–285. Technology, Taiwan, and the 2014 IEEE ICME Best Paper Award. He is an [30] P. L. Lions and B. Mercier, “Splitting algorithms for the sum of two author of the book Music Emotion Recognition (CRC Press, 2011) and a tutorial nonlinear operators,” SIAM J. Numer. Anal., vol. 16, no. 6, pp. 964–979, speaker on music affect recognition in the International Society for Music Infor- 1979. mation Retrieval Conference (ISMIR, 2012). In 2014, he served as a Technical [31] J. Eckstein and D. P. Bertsekas, “On the Douglas–Rachford splitting Program cochair of ISMIR, and a Guest Editor of the IEEE TRANSACTIONS method and the proximal point algorithm for maximal monotone oper- ON AFFECTIVE COMPUTING,andtheACMTRANSACTIONS ON INTELLIGENT ators,” Math. Program., vol. 55, pp. 293–318, 1992. SYSTEMS AND TECHNOLOGY.