
Proceedings, APSIPA Annual Summit and Conference 2018 12-15 November 2018, Hawaii

Hypercomplex Principal Component Pursuit via Convex Optimization

Takehiko Mizoguchi∗† and Isao Yamada∗
∗Department of Information and Communications Engineering, Tokyo Institute of Technology, Tokyo, Japan
†NEC Corporation, Kanagawa, Japan
E-mail: {mizoguchi,isao}@sp.ce.titech.ac.jp

Abstract—Expressing multidimensional information as a value in a hypercomplex number system (e.g., H, O, etc.) has great potential, in signal processing applications, to enjoy nontrivial algebraic benefits which are not available in standard real or complex vector systems. Exploiting the sparsity of hypercomplex matrices or vectors plays an important role in utilizing such benefits. In this paper, we propose a new sparsity measure for evaluating the sparsity of hypercomplex matrices. Using this measure, the hypercomplex robust principal component analysis can be relaxed into the hypercomplex principal component pursuit, which can in turn be reduced to a real convex optimization problem. We then derive an algorithmic solution to the hypercomplex principal component pursuit based on a proximal splitting technique. Numerical experiments are performed in the octonion domain and show that the proposed algorithm outperforms part-wise state-of-the-art real, complex and quaternion algorithms.

I. INTRODUCTION

Multidimensional information arises naturally in many areas of engineering and science, since almost all observations have many attributes. Utilizing a hypercomplex number system to represent such multidimensional information is one of the most effective approaches, because we can express multidimensional information not in terms of vectors but in terms of numbers among which the four basic arithmetic operations can be defined. Indeed, hypercomplex numbers have been used in many areas such as computer graphics [1], robotics [2], [3], wind forecasting [4]–[6] and noise reduction in acoustic systems [7]. In the statistical signal processing field, effective utilization of the m-dimensional Cayley-Dickson number system (C-D number system) [8], [9], which is a standard class of hypercomplex number systems [10] including, e.g., the real numbers R, the complex numbers C, the quaternions H, the octonions O and the sedenions S, has been investigated.

A hypercomplex number has one real part and many imaginary parts, and it can represent multidimensional data as a single number for which the four arithmetic operations, including multiplication and division, are available. This fulfills the four arithmetic operations for multidimensional information, which are not available for ordinary real multidimensional vectors. Moreover, thanks to the nontrivial algebraic structure, the multiplication of hypercomplex numbers enjoys algebraic interactions among the real and imaginary parts. Hypercomplex vectors, matrices and tensors can also enjoy these benefits. For example, in 3D object modeling, each point in 3-dimensional space can have multiple attributes such as color, material, intensity, etc., and each attribute may be correlated with the other attributes. Modeling the correlations among attributes in multidimensional data will become more and more important with the popularization of 3D printing [11], virtual reality, medical imaging, etc. Algebraically natural operations in a hypercomplex number system have great potential for such modeling of various correlations (see, e.g., [12]–[16] for color image processing applications). However, because of the "singularity" of higher dimensional C-D number systems (see, e.g., Example 1), few mathematical tools had been established [17]–[20]. To overcome this situation, in our previous works [21]–[23], we proposed several useful mathematical tools for designing advanced algorithms for optimization, learning and low rank approximation in the hypercomplex domain. In [21], we proposed an algebraic real translation clarifying the relation between C-D linear systems and real vector valued linear systems, and successfully designed online learning algorithms which are available in the general C-D domain. Moreover, in [22], [23], we proposed the C-D singular value decomposition, the R-rank (see Section II-C) and a low rank approximation technique, and proposed an algorithmic solution for low rank hypercomplex tensor completion.

One of the other standard approximation methods utilizing a simpler structure of matrices is the robust principal component analysis (RPCA) [24], which separates an input matrix into a low-rank one and a sparse one:

minimize_{L,S ∈ R^{M×N}} rank(L) + λ∥S∥₀  s.t.  M = L + S,    (1)

where ∥·∥₀ denotes the ℓ₀-norm, which counts the number of non-zero entries of S. The RPCA has been successfully used for various signal processing applications such as source separation [25], face recognition [26] and so on. Unfortunately, the formulation (1) is NP-hard, so its convex relaxation, called the principal component pursuit (PCP) [24], is often considered instead of the RPCA. Recently, the PCP has been extended to the hypercomplex domains C and H using the complex matrix isomorphism [27]. However, this extension can only be applied to data of dimension up to four, and it is hard to generalize to the octonion and higher dimensional C-D domains with this isomorphism alone.


In this paper, to establish an RPCA framework in the hypercomplex domain, we first present the Cayley-Dickson principal component pursuit (C-D PCP) as a convex relaxation of the C-D extension of the RPCA. This relaxation is based on the R-rank proposed in [22] and on a new sparsity measure, the ℓ₁-norm of C-D matrices. This sparsity measure can be interpreted as evaluating a group sparsity of real matrices, so its proximity operator can be calculated easily. Hence, the C-D PCP is a convex optimization problem in the real domain, which can be solved by applying proximal splitting techniques to a certain structured convex optimization problem. We finally propose a hypercomplex PCP algorithm (A_m-DRS-PCP) based on the Douglas-Rachford splitting (DRS) [28]. The proposed algorithm is a C-D generalization of a PCP algorithm (DR-PCP) proposed in [29] and can be applied to general C-D domains. Numerical experiments are performed in the context of recovering sparsely corrupted low rank matrices in the octonion domain, and they show that the proposed algorithm successfully utilizes the algebraically natural correlations of the real and all imaginary parts to recover the original matrices, corrupted randomly by noise, much more faithfully than part-wise real, complex and quaternion PCP algorithms.

II. PRELIMINARIES

A. Hypercomplex Number System

Let N and R be respectively the set of all non-negative integers and the set of all real numbers. Define an m-dimensional hypercomplex number a ∈ A_m (m ∈ N \ {0}) expanded on the real field [8] as

a := a₁i₁ + a₂i₂ + ··· + a_m i_m ∈ A_m,  a₁, ..., a_m ∈ R    (2)

based on imaginary units i₁, ..., i_m, where i₁ = 1 represents the vector identity element. Any hypercomplex number is expressed uniquely in the form of (2). The coefficient of each imaginary unit a_ℓ (ℓ = 1, ..., m) is represented as a_ℓ = ℑ_ℓ(a). A multiplication table defines the products of any imaginary unit with each other or with itself (e.g., i₁² = 1, i₂² = −1 and i₁i₂ = i₂i₁ = i₂ for A₂ (=: C)). The addition and the subtraction of two hypercomplex numbers are defined as commutative binomial operations

a ± b := (a₁ ± b₁)i₁ + (a₂ ± b₂)i₂ + ··· + (a_m ± b_m)i_m

for a, b ∈ A_m, where b := b₁i₁ + b₂i₂ + ··· + b_m i_m, b₁, ..., b_m ∈ R. From the unique expression (2), the multiplication of two hypercomplex numbers

ab = (a₁i₁ + a₂i₂ + ··· + a_m i_m)(b₁i₁ + b₂i₂ + ··· + b_m i_m) := Σ_{k=1}^{m} Σ_{ℓ=1}^{m} a_k b_ℓ i_k i_ℓ ∈ A_m

is determined uniquely according to the multiplication table. We also define the conjugate of a hypercomplex number a as

a* := a₁i₁ − a₂i₂ − ··· − a_m i_m.    (3)

In this paper, we consider the hypercomplex number systems which are constructed recursively by the Cayley-Dickson construction (C-D construction, or C-D (doubling) procedure) [8]. The C-D construction is a standard method for extending a number system; it has been used in extending R to C, C to H and H to O. By the C-D construction, an m-dimensional hypercomplex number system A_m is extended to A_{2m} [8], [9] as

z := x + y i_{m+1} ∈ A_{2m},  x, y ∈ A_m,

where i_{m+1} ∉ A_m is the additional imaginary unit for doubling the dimension of A_m, satisfying i_{m+1}² = −1, i₁i_{m+1} = i_{m+1}i₁ = i_{m+1} and i_v i_{m+1} = −i_{m+1}i_v =: i_{m+v} for all v = 2, ..., m. For example, the real number system (A₁ :=) R is extended into the complex number system C (= A₂) by the C-D construction. Note that the value of m is restricted to the form 2ⁿ (n ∈ N). The hypercomplex number systems constructed inductively from the real numbers by the C-D construction are called Cayley-Dickson number systems (C-D number systems). The imaginary units appearing in the C-D number systems have many characteristic properties [21], such as i_α² = −1 and i_α i_β = −i_β i_α (α ≠ β) for all α, β ∈ {2, ..., m}. These properties ensure aa* = Σ_{ℓ=1}^{m} a_ℓ² ≥ 0 for any a ∈ A_m in (2) and a* ∈ A_m in (3), and they enable us to define the absolute value of a C-D number a as |a| := √(aa*).

Example 1.
1) A representative example of a hypercomplex number system is the quaternions H. The quaternion number system is constructed from the complex number system by the C-D construction. A quaternion is a 4-dimensional hypercomplex number defined as

q = q₁ + q₂ı + q₃ȷ + q₄κ ∈ H,  q₁, q₂, q₃, q₄ ∈ R

with the multiplication table

ıȷ = −ȷı = κ,  ȷκ = −κȷ = ı,  κı = −ıκ = ȷ,  ı² = ȷ² = κ² = −1    (4)

by letting m = 4, i₁ = 1, i₂ = ı, i₃ = ȷ and i₄ = κ. From (4), quaternions are not commutative, i.e., pq ≠ qp for p, q ∈ H in general.
2) The octonions O can be constructed from the quaternions H by the C-D construction. Note that the multiplication in O is neither commutative nor associative, i.e., pq ≠ qp and (pq)r ≠ p(qr) for p, q, r ∈ O in general [10]. For the octonion multiplication table, see, e.g., [10].

The C-D number system can be seen as an algebraically natural higher dimensional generalization of our familiar fields, i.e., R and C.

We also define A_m^N := {[x₁, ..., x_N]^⊤ | x_i ∈ A_m (i = 1, ..., N)} for all N ∈ N \ {0}, where (·)^⊤ stands for the transpose. Define ⟨x, y⟩_{A_m^N} := x^H y ∈ A_m for all x, y ∈ A_m^N and ∥x∥_{A_m^N} := ⟨x, x⟩_{A_m^N}^{1/2} for all x ∈ A_m^N, where (·)^H denotes the Hermitian transpose of vectors or matrices (e.g., x^H := [x₁*, ..., x_N*] for x := [x₁, ..., x_N]^⊤ ∈ A_m^N, where x₁, ..., x_N ∈ A_m). We also define the addition of two hypercomplex vectors x + y := [x₁ + y₁, ···, x_N + y_N]^⊤ ∈ A_m^N for x, y (:= [y₁, ..., y_N]^⊤) ∈ A_m^N. Let S := R, S := C or S := A_m (m ≥ 4), and call the elements of S scalars. If we define the left scalar multiplication as αx := [αx₁, ..., αx_N]^⊤ ∈ A_m^N for α ∈ S and x ∈ A_m^N, we have αx + βy ∈ A_m^N for all α, β ∈ S and all x, y ∈ A_m^N. We can also define the right scalar multiplication xα ∈ A_m^N in a similar way.
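To make the arithmetic above concrete, the following is a minimal Python sketch (not from the paper; all identifiers, such as cd_mul and cd_conj, are our own) of C-D arithmetic: an element of A_m (m = 2ⁿ) is stored as its m real coefficients from (2), and products are computed by recursion on pairs via the doubling rule (p, q)(r, s) = (pr − s*q, sp + qr*). This is one common sign convention for the C-D product; the assertions check that it reproduces the quaternion table (4) and the property aa* = Σ a_ℓ² for an octonion.

```python
import numpy as np

def cd_conj(a):
    """Conjugate (3): keep the real part, negate all imaginary parts."""
    b = np.array(a, dtype=float)
    b[1:] = -b[1:]
    return b

def cd_mul(a, b):
    """Cayley-Dickson product of two coefficient vectors of length 2^n,
    using the doubling rule (p, q)(r, s) = (pr - s*q, sp + qr*)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    if len(a) == 1:                        # base case: real numbers
        return a * b
    h = len(a) // 2                        # split into pairs over A_{m/2}
    p, q, r, s = a[:h], a[h:], b[:h], b[h:]
    return np.concatenate([cd_mul(p, r) - cd_mul(cd_conj(s), q),
                           cd_mul(s, p) + cd_mul(q, cd_conj(r))])

# Quaternion checks of (4), with i2 = (0,1,0,0), i3 = (0,0,1,0), i4 = (0,0,0,1):
i2, i3, i4 = np.eye(4)[1], np.eye(4)[2], np.eye(4)[3]
assert np.allclose(cd_mul(i2, i3), i4)               # i j = k
assert np.allclose(cd_mul(i3, i2), -i4)              # j i = -k (non-commutative)
assert np.allclose(cd_mul(i2, i2), -np.eye(4)[0])    # i^2 = -1

# a a* equals the sum of squared coefficients (a real number), here for an octonion:
a = np.random.default_rng(0).standard_normal(8)
aa = cd_mul(a, cd_conj(a))
assert np.isclose(aa[0], np.sum(a ** 2)) and np.allclose(aa[1:], 0.0)
```

Running cd_mul on length-16 vectors gives sedenion arithmetic, where the multiplicativity of |·| is lost (zero divisors appear), one facet of the "singularity" of higher dimensional C-D systems mentioned in the introduction.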


B. Algebraic Translations

In this section, we introduce the algebraic translations of C-D valued vectors and matrices proposed in [21]. A trivial correspondence (mapping) of hypercomplex vectors or matrices to real ones is

(·)^ : A_m^{M×N} → R^{mM×N} : A ↦ Â := [A₁; A₂; ⋮; A_m],    (5)

where A = A₁i₁ + ··· + A_m i_m ∈ A_m^{M×N} and A₁, ..., A_m ∈ R^{M×N}. This correspondence simply concatenates the real and all imaginary parts of a hypercomplex matrix. Obviously, this mapping is invertible, and thus we can also define

(·)ˇ : R^{mM×N} → A_m^{M×N} : Â ↦ A.    (6)

In terms of the mappings (·)^ and (·)ˇ alone, it is hard to obtain the correspondence of the matrix-vector product Ax, so we also introduce the following non-trivial mapping:

(·)~ : A_m^{M×N} → S_{A_m}(M, N) : A ↦ Ã := [L_M^{(1)⊤} Â, ..., L_M^{(m)⊤} Â],    (7)

where the matrix L_M^{(ℓ)} ∈ R^{mM×mM} (ℓ = 1, ..., m) is defined for the m-dimensional hypercomplex number system A_m as

L_M^{(ℓ)} := [  δ_{1,1}^{(ℓ)} I_M    δ_{1,2}^{(ℓ)} I_M   ···   δ_{1,m}^{(ℓ)} I_M
               −δ_{2,1}^{(ℓ)} I_M   −δ_{2,2}^{(ℓ)} I_M   ···  −δ_{2,m}^{(ℓ)} I_M
                    ⋮                    ⋮              ⋱          ⋮
               −δ_{m,1}^{(ℓ)} I_M   −δ_{m,2}^{(ℓ)} I_M   ···  −δ_{m,m}^{(ℓ)} I_M ]

with the M-dimensional identity matrix I_M and

δ_{α,β}^{(γ)} := 1 (if i_α i_β = i_γ),  −1 (if i_α i_β = −i_γ),  0 (otherwise).

S_{A_m}(M, N) ⊂ R^{mM×mN} is the range of (·)~ and is represented as

S_{A_m}(M, N) := {Ã | A ∈ A_m^{M×N}} = {[L_M^{(1)⊤} A′, ..., L_M^{(m)⊤} A′] | A′ ∈ R^{mM×N}} ⊂ R^{mM×mN}.    (8)

Using the imaginary unit vector i := [i₁, i₂, ..., i_m]^⊤ ∈ A_m^m, L_M^{(ℓ)} is also compactly represented as

L_M^{(ℓ)} = ℑ_ℓ(i i^H ⊗ I_M),

where '⊗' is the Kronecker product. Similar to the trivial mapping, (·)~ is also invertible, and thus we can define its inverse S_{A_m}(M, N) → A_m^{M×N} : Ã ↦ A. These mappings have the following useful algebraic properties:

Fact 1 (Algebraic correspondence between real and C-D vectors and matrices [21]). For all A, A′ ∈ A_m^{M×N}, B ∈ A_m^{N×L} and x ∈ A_m^N,
1) (A + A′)^ = Â + Â′ and (αA)^ = αÂ; (A + A′)~ = Ã + Ã′ and (αA)~ = αÃ for all α ∈ R,
2) (A^H)~ = Ã^⊤,
3) ∥x∥_{A_m^N} = ∥x̂∥_{R^{mN}},
4) (AB)^ = ÃB̂ and (Ax)^ = Ãx̂,
5) (AB)~ = ÃB̃ if m ≤ 4, i.e., if A_m = R, C or H.

Example 2. For a quaternion matrix A := A₁ + A₂ı + A₃ȷ + A₄κ ∈ H^{M×N}, Â ∈ R^{4M×N} and Ã ∈ R^{4M×4N} are given as

Â = [A₁          Ã = [A₁  −A₂  −A₃  −A₄
     A₂               A₂   A₁  −A₄   A₃
     A₃               A₃   A₄   A₁  −A₂
     A₄],             A₄  −A₃   A₂   A₁].

Remark 1. Obviously, S_{A_m}(M, N) is an mMN-dimensional real vector space, and the non-trivial mapping (·)~ is guaranteed to be an isomorphism between A_m^{M×N} and S_{A_m}(M, N), regarding A_m^{M×N} as a vector space over R (see Section II-A).
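The following continues the hypothetical Python sketch (reusing cd_mul and cd_conj from above) and implements (5) and (7) for a C-D matrix stored as an (M, N, m) array of coefficients. Rather than assembling the L_M^{(ℓ)} blocks explicitly, it exploits a consequence of Fact 1-4): the ℓ-th column block of Ã equals the hat of A right-multiplied entry-wise by i_ℓ, which reproduces Example 2 for quaternions. The last lines check property 4) of Fact 1 numerically.

```python
def cd_hat(A):
    """(5): A_m^{MxN} -> R^{mMxN}, stacking the coefficient blocks A_1, ..., A_m."""
    M, N, m = A.shape
    return A.transpose(2, 0, 1).reshape(m * M, N)

def cd_scale_right(A, s):
    """Right-multiply every entry of the C-D matrix A by the C-D scalar s."""
    B = np.empty_like(A)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            B[i, j] = cd_mul(A[i, j], s)
    return B

def cd_tilde(A):
    """(7): A_m^{MxN} -> S_{A_m}(M, N) in R^{mMxmN}; the l-th column block
    is hat(A i_l), with np.eye(m)[l] the coefficients of the unit i_{l+1}."""
    m = A.shape[2]
    return np.hstack([cd_hat(cd_scale_right(A, np.eye(m)[l]))
                      for l in range(m)])

# Check of Fact 1-4), hat(Ax) == tilde(A) @ hat(x), for random quaternions:
rng = np.random.default_rng(1)
A, x = rng.standard_normal((3, 2, 4)), rng.standard_normal((2, 1, 4))
Ax = np.zeros((3, 1, 4))
for i in range(3):
    for j in range(2):
        Ax[i, 0] += cd_mul(A[i, j], x[j, 0])
assert np.allclose(cd_hat(Ax), cd_tilde(A) @ cd_hat(x))
```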

C. Singular Value Decomposition, Rank, and Low Rank Approximation

In this section, we introduce the useful notions of the C-D singular value decomposition (C-D SVD) and the R-rank for C-D matrices, originally proposed in [22]. For any C-D matrix A ∈ A_m^{M×N}, there exist orthogonal real matrices U ∈ R^{mM×mM} and V ∈ R^{mN×mN} such that

Ã = UΣV^⊤,    (9)

where Σ := diag(σ₁, ..., σ_r, 0, ..., 0) ∈ R^{mM×mN} is a rectangular diagonal matrix with the positive singular values σ₁ ≥ ··· ≥ σ_r (> 0) of Ã on the diagonal. We call (9) the C-D singular value decomposition (C-D SVD), and we call rank_R(A) := r = rank(Ã) ≤ min(mM, mN) the R-rank of A. The R-rank has a very strong relation to the well-established original ranks [19] in the C-D domain.

Fact 2 (R-rank and original ranks in C-D domain [22]). For the complex (m = 2) and quaternion (m = 4) cases,

rank_R(A) = m rank(A)

holds for all A ∈ A_m^{M×N}.

Originally, the rank of a matrix is defined as the maximum number of its column vectors which are linearly independent. In the quaternion and higher dimensional C-D domains, we can consider left and right linear independence, since left and right scalar multiplications are distinct, so we have to define the rank over A_m carefully. In the quaternion domain, by convention, the rank is defined as the maximum number of columns which are right linearly independent [19], since under this definition the rank becomes equal to the number of positive singular values associated with the right eigenvalues.
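Before turning to the octonion case, here is a small numerical illustration (same hypothetical storage and helpers as above) of (9) and Fact 2: the R-rank is read off from an ordinary real SVD of Ã, and for a rank-one quaternion matrix it equals m = 4.

```python
def cd_rank_R(A, tol=1e-10):
    """R-rank: the number of positive singular values of tilde(A)."""
    s = np.linalg.svd(cd_tilde(A), compute_uv=False)
    return int(np.sum(s > tol * s[0]))

rng = np.random.default_rng(2)
x, y = rng.standard_normal((4, 1, 4)), rng.standard_normal((3, 1, 4))
L = np.zeros((4, 3, 4))                # L = x y^H, a rank-one quaternion matrix
for i in range(4):
    for j in range(3):
        L[i, j] = cd_mul(x[i, 0], cd_conj(y[j, 0]))
print(cd_rank_R(L))                    # 4, in accordance with Fact 2
```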


For the octonion and higher dimensional C-D domains, a concrete definition of the rank has not yet been established to the best of the authors' knowledge. However, the R-rank is explicitly available for general C-D matrices while keeping consistency with the known results (e.g., [19]) through Fact 2. Therefore, the R-rank is a natural generalization of the rank to the hypercomplex domain.

Finally, we discuss a low rank approximation of matrices in the C-D domain. By passing through the Schmidt-Eckart-Young theorem [30], we can also perform a low rank approximation of C-D matrices, in terms of R-rank minimization, by truncating the C-D SVD. Note that this approximation does not always provide a low R-rank matrix in S_{A_m}(M, N), i.e., the existence of a C-D matrix corresponding to the low R-rank approximation is not guaranteed (for details, see [22]). However, there are several cases where the C-D matrix corresponding to the low R-rank approximation is guaranteed to exist. The following fact is one of the examples (other examples can be seen in [22]).

Fact 3 (Inheritance of the special structure of the non-trivial mapping under the shrinkage operator). Let shrink(Ã, τ) be the shrinkage operator given by

shrink(Ã, τ) := UΣ_τ V^⊤    (10)

for (9), with the shrunk diagonal matrix Σ_τ := diag(max{σ₁ − τ, 0}, ..., max{σ_r − τ, 0}, 0, ..., 0). For any A ∈ A_m^{M×N} and τ > 0,
1) if m ≤ 4, the shrinkage operator keeps the special structure of the non-trivial mapping (·)~, i.e., shrink(Ã, τ) ∈ S_{A_m}(M, N);
2) if m > 4, the shrinkage operator does not always keep this structure, i.e., shrink(Ã, τ) ∉ S_{A_m}(M, N) in general.

The shrinkage operator will be used in Section III-B; it often appears as the proximity operator (see Appendix A) of nuclear norm regularization in many contexts of matrix low rank approximation via proximal splitting methods, including the principal component pursuit [24] and low rank matrix (or tensor) completion [31], [32].

III. HYPERCOMPLEX PRINCIPAL COMPONENT PURSUIT

A. Convex Relaxation of Hypercomplex Robust Principal Component Analysis

In this section, we formulate the robust principal component analysis (RPCA) in the C-D domain. Since the R-rank defined in Section II-C is available for the general C-D domain, we can formulate the RPCA in the C-D domain as follows:

minimize_{L,S ∈ A_m^{M×N}} rank_R(L) + λ∥S∥_{0,A_m}  s.t.  M = L + S,    (11)

where λ > 0 and ∥A∥_{0,A_m} is the number of non-zero entries of A ∈ A_m^{M×N}. Obviously from Fact 2, this is a C-D generalization of the RPCA in the real domain. Similar to the real case [24], (11) is NP-hard. To relax (11) into a convex optimization problem, we first newly introduce the ℓ₁-norm of a C-D matrix as follows:

∥A∥_{1,A_m} := Σ_{i=1}^{M} Σ_{j=1}^{N} |A_{i,j}|,  A ∈ A_m^{M×N}.    (12)

For any C-D matrix A ∈ A_m^{M×N}, we can consider the following real matrix:

Ă := [Â_{1,1} ··· Â_{1,N}
        ⋮    ⋱     ⋮
      Â_{M,1} ··· Â_{M,N}] ∈ R^{mM×N},    (13)

where Â_{i,j} ∈ R^m is the coefficient vector of the entry A_{i,j} ∈ A_m under (·)^. Note that the mapping (·)˘ : A ↦ Ă is just a row permutation of (·)^ in (5), and we can define its inverse R^{mM×N} → A_m^{M×N} : Ă ↦ A. Then, we have

∥A∥_{1,A_m} = Σ_{i=1}^{M} Σ_{j=1}^{N} ∥Â_{i,j}∥₂ =: ∥Ă∥_{2,1},

where ∥·∥₂ is the ℓ₂-norm of real vectors. This implies that the ℓ₁-norm of a C-D matrix A can be regarded as a convex function ∥·∥_{2,1} of the real matrix Ă ∈ R^{mM×N}, and that it evaluates the group sparsity (structured sparsity) [33] of Ă. Therefore, we have the following theorem:

Theorem 1 (Proximity operator of ∥·∥_{1,A_m}). For any A ∈ A_m^{M×N}, the proximity operator (see Appendix A) of ∥·∥_{1,A_m} with index τ > 0 can be easily calculated group-wise as

[prox_{τ∥·∥_{1,A_m}}(Ă)]_{i,j} = (Â_{i,j} / ∥Â_{i,j}∥₂) max(0, ∥Â_{i,j}∥₂ − τ)    (14)
                               =: [ST(Ă, τ)]_{i,j},    (15)

where the indices [·]_{i,j} (i = 1, ..., M, j = 1, ..., N) stand for the (i, j)-th group of size m × 1 in prox_{τ∥·∥_{1,A_m}}(Ă) ∈ R^{mM×N}.
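In the hypothetical (M, N, m) coefficient storage used in the earlier sketches (which differs from the arrangement (13) only by the row permutation just noted), the group-wise prox (14)-(15) takes a few lines of Python:

```python
def cd_soft_threshold(A, tau):
    """ST(., tau) of Theorem 1: shrink each entry's length-m coefficient
    vector (one group) toward zero by tau."""
    mag = np.sqrt(np.sum(A ** 2, axis=2, keepdims=True))      # |A_{i,j}|
    with np.errstate(invalid="ignore", divide="ignore"):
        scale = np.where(mag > 0.0, np.maximum(0.0, 1.0 - tau / mag), 0.0)
    return scale * A
```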

If we note that ∥Â_{i,j}∥₂ = |A_{i,j}| and apply (·)ˇ defined in (6) to the right-hand side of (14), we formally obtain the following entry-wise soft-thresholding function of C-D matrices:

[ST(A, τ)]_{i,j} := (A_{i,j} / |A_{i,j}|) max(0, |A_{i,j}| − τ).

This is obviously equivalent to (15), and it is a C-D generalization of the real, complex and quaternion soft-thresholding functions.

By using the ℓ₁-norm discussed above and approximating the R-rank by the nuclear norm, we obtain the following convex optimization problem:

minimize_{L,S ∈ A_m^{M×N}} ∥L̃∥_* + λ∥S∥_{1,A_m}  s.t.  M = L + S,    (16)

where ∥·∥_* is the nuclear norm of real matrices, i.e., the sum of the positive singular values. In this paper, we call the problem (16) the Cayley-Dickson principal component pursuit (C-D PCP).
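The other prox ingredient needed for (16) is the shrinkage operator (10), i.e., singular value soft-thresholding. A sketch under the same assumptions (recall from Fact 3 that for m > 4 the result may leave S_{A_m}(M, N), which is why the algorithm below re-projects onto it):

```python
def shrink(X, tau):
    """Shrinkage operator (10): soft-threshold the singular values of X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt
```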


Obviously, if we set A_m = R (m = 1) or A_m = C or H (m = 2 or 4), (16) is respectively identical to the original PCP in the real domain [24] or to the complex and quaternionic PCP in [27]. Therefore, the C-D PCP is a natural generalization of these problems. Moreover, the C-D PCP can be regarded as a convex optimization problem in the real domain, since the ℓ₁-norm of a C-D matrix can be regarded as a convex function of a real matrix, and it can therefore be solved by proximal splitting techniques.

B. Hypercomplex Principal Component Pursuit via Convex Optimization

In this section, we derive a new algorithm based on the Douglas-Rachford splitting technique [28] to solve the C-D PCP (16) efficiently. Denote the 2-fold Cartesian product of the spaces of real matrices by H₀ := R^{mM×mN} × R^{mM×N}. By defining the inner product

⟨X, Y⟩_{H₀} := (1/2) tr(X₁^⊤ Y₁) + (1/2) tr(X₂^⊤ Y₂) =: (1/2)⟨X₁, Y₁⟩_{R^{mM×mN}} + (1/2)⟨X₂, Y₂⟩_{R^{mM×N}},

where X := [X₁, X₂] ∈ H₀ and Y := [Y₁, Y₂] ∈ H₀ (X₁, Y₁ ∈ R^{mM×mN}, X₂, Y₂ ∈ R^{mM×N}), and the induced norm ∥X∥_{H₀} := √⟨X, X⟩_{H₀}, H₀ becomes a real Hilbert space. First, we reformulate the problem (16) as the unconstrained minimization of the sum of two functions:

minimize_{Z ∈ H₀} f(Z) + g(Z),    (17)

where

f(Z) := f₁(Z₁) + f₂(Z₂) := ∥Z₁∥_* + λ∥Z₂⋆∥_{1,A_m},
g(Z) := ι_{D₁}(Z) := { 0 (if Z ∈ D₁), +∞ (otherwise) },
Z := [Z₁, Z₂] ∈ H₀,
D₁ := {[Z₁, Z₂] ∈ D₂ | M = Z₁⋆ + Z₂⋆} ⊂ D₂,
D₂ := S × R^{mM×N} ⊂ H₀,
S := S_{A_m}(M, N) ⊂ R^{mM×mN},

and Z₁⋆, Z₂⋆ ∈ A_m^{M×N} denote the C-D matrices determined by Z̃₁⋆ = Z₁ and Ẑ₂⋆ = Z₂. Note that the subspace D₁ represents the constraint that the observation M is the sum of the low rank and sparse matrices. This requires that Z₁ belongs to S, which is why we need the subspace D₂.

Apparently this reformulation (17) is equivalent to (16), so all we need is to identify the concrete calculation of the proximity operators of f and g. In the same way as [29], the proximity operator of f is given by

prox_{γf}(X) = [prox_{2γf₁}(X₁), prox_{2γf₂}(X₂)],

where the index 2γ arises from the weights 1/2 in the inner product of H₀. The proximity operator of f₁, i.e., of the nuclear norm, with index 2γ is given by

prox_{2γf₁}(X₁) = shrink(X₁, 2γ).

By Theorem 1, the proximity operator of f₂ reduces to the group-wise soft-thresholding (15) of a real matrix:

prox_{2γf₂}(X₂) = ST(X₂, 2γλ),    (18)

where ST acts on the m coefficients of each C-D entry of X₂ (cf. the permutation relating (5) and (13)). For the function g, the proximity operator of the indicator function ι_{D₁} is the orthogonal projection P_{D₁} onto the subspace D₁, i.e.,

prox_{γg}(X) = P_{D₁}(X) := arg min_{Y ∈ D₁} ∥X − Y∥_{H₀}.

Since D₁ ⊂ D₂ ⊂ H₀, we have, by [34, 5.14, Reduction principle],

P_{D₁}(X) = P_{D₁}|_{D₂} ∘ P_{D₂}(X).

Note that '|_{D₂}' in P_{D₁}|_{D₂} stands for the restriction of the domain to the subspace D₂. The orthogonal projections P_{D₂} : H₀ → D₂ and P_{D₁}|_{D₂} : D₂ → D₁ can respectively be calculated as

P_{D₂}(X) = [P_S(X₁), X₂]

and

P_{D₁}|_{D₂}(X) = (1/2) [M̃ + X₁ − X̃₂⋆, M̂ − X̂₁⋆ + X₂],

where X₁⋆, X₂⋆ ∈ A_m^{M×N} are the C-D matrices determined by X̃₁⋆ = X₁ and X̂₂⋆ = X₂.

For P_S(X₁), let E_{p,q,ℓ} := E_{p,q} i_ℓ ∈ A_m^{M×N} (ℓ = 1, ..., m), where E_{p,q} ∈ R^{M×N} is the matrix whose (p, q)-th entry (p = 1, ..., M, q = 1, ..., N) is 1 and whose other entries are all 0. Then, we can easily verify that

⟨Ẽ_{p,q,ℓ}, Ẽ_{p′,q′,ℓ′}⟩_{R^{mM×mN}} = m (if (p, q, ℓ) = (p′, q′, ℓ′)),  0 (otherwise),

and therefore {(1/√m) Ẽ_{p,q,ℓ}}_{p=1,q=1,ℓ=1}^{M,N,m} is an orthonormal basis of S; thus P_S(X₁) can easily be calculated as

P_S(X₁) = (1/m) Σ_{p=1}^{M} Σ_{q=1}^{N} Σ_{ℓ=1}^{m} ⟨X₁, Ẽ_{p,q,ℓ}⟩_{R^{mM×mN}} Ẽ_{p,q,ℓ}.

Now, we can calculate

prox_{γg}(X) = P_{D₁}|_{D₂} ∘ P_{D₂}(X) = P_{D₁}|_{D₂}([P_S(X₁), X₂]) = (1/2) [M̃ + P_S(X₁) − X̃₂⋆, M̂ − X̂₁⋆⋆ + X₂],

where X₁⋆⋆ ∈ A_m^{M×N} is the C-D matrix determined by X̃₁⋆⋆ = P_S(X₁). Since all the ingredients are identified, we can summarize the proposed hypercomplex principal component pursuit algorithm in Algorithm 1, where (t_k)_{k≥0} ⊂ [0, 2] satisfies Σ_{k≥0} t_k(2 − t_k) = +∞ and γ ∈ (0, +∞). Note that the shrinkage operator does not keep the special structure of (·)~, i.e., shrink(Ã, 2γ) ∉ S in general, so we need the projection onto the structure S. However, in the complex and quaternion domains it keeps the structure, as shown in Fact 3, so L^(k) ∈ S and thus P_S(L^(k)) = L^(k) for all k ≥ 0. In particular, if m = 1 (i.e., A_m = R), Algorithm 1 is identical to the original DRS for the PCP (DR-PCP) proposed in [29]. Lastly, we state the convergence of the proposed algorithm.

Theorem 2 (Convergence of A_m-DRS-PCP). Let the parameters of Algorithm 1 be chosen so that γ ∈ (0, +∞) and (t_k)_{k≥0} ⊂ [0, 2] satisfies Σ_{k≥0} t_k(2 − t_k) = +∞. Then, the output of Algorithm 1 converges to a minimizer of (16).

Remark 2. In this paper, we employ the DRS for solving (16), but it can also be solved by other advanced convex optimization techniques such as the alternating direction method of multipliers (ADMM) [35] and the primal-dual splitting (PDS) [36], [37].
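A deliberately naive sketch of P_S via this orthonormal basis (cost grows as (MNm)², which is fine for a demonstration; cd_tilde is the hypothetical helper from the Section II-B sketch). Conveniently, the coefficient array it accumulates is exactly the C-D matrix X₁⋆⋆ that Algorithm 1 needs in the same step:

```python
def proj_S(X, M, N, m):
    """Project X in R^{mMxmN} onto S = S_{A_m}(M, N) by basis expansion."""
    coeffs = np.zeros((M, N, m))
    for p in range(M):
        for q in range(N):
            for l in range(m):
                E = np.zeros((M, N, m)); E[p, q, l] = 1.0   # E_{p,q,l}
                coeffs[p, q, l] = np.sum(X * cd_tilde(E)) / m
    return cd_tilde(coeffs), coeffs   # P_S(X) and its C-D coefficients (X1**)
```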


Algorithm 1: A_m-Douglas-Rachford splitting for hypercomplex principal component pursuit (A_m-DRS-PCP)

Input: M, (t_k)_{k≥0}, λ, γ
Output: low R-rank L and sparse S
Initialize k ← 0, L^(0) ← 0 ∈ R^{mM×mN}, S^(0) ← 0 ∈ R^{mM×N};
repeat
  L⋆⋆ ← the C-D matrix with L̃⋆⋆ = P_S(L^(k));  S⋆⋆ ← the C-D matrix with Ŝ⋆⋆ = S^(k);
  L⋆ ← (M̃ + P_S(L^(k)) − S̃⋆⋆)/2;
  S⋆ ← (M̂ − L̂⋆⋆ + S^(k))/2;
  L^(k+1) ← L^(k) + t_k (shrink(2L⋆ − L^(k), 2γ) − L⋆);
  S^(k+1) ← S^(k) + t_k (ST(2S⋆ − S^(k), 2γλ) − S⋆);
  k ← k + 1;
until convergence;
L⋆⋆ ← the C-D matrix with L̃⋆⋆ = P_S(L^(k));  S⋆⋆ ← the C-D matrix with Ŝ⋆⋆ = S^(k);
L⋆ ← (M̃ + P_S(L^(k)) − S̃⋆⋆)/2;  S⋆ ← (M̂ − L̂⋆⋆ + S^(k))/2;
[L, S] ← the C-D matrices with L̃ = L⋆ and Ŝ = S⋆;

IV. NUMERICAL EXAMPLES

In this section, we perform some numerical experiments to examine the effectiveness of the proposed method. Following the settings in [29], [38], we randomly generate an input pair (L, S) as follows: L := X_L X_R^H ∈ A_m^{M×N}, where X_L ∈ A_m^{M×r} and X_R ∈ A_m^{N×r} (r < min(M, N)) with all real and imaginary parts of each entry of X_L and X_R drawn i.i.d. from N(0, 1). Note that rank_R(L) does not always agree with mr, since Fact 2 does not hold for m > 4 in general. We choose the support set of S uniformly at random from all support sets of size ρMN (ρ ∈ (0, 1)). All real and imaginary parts of the non-zero entries are independently drawn from U(−256, 256). We fixed λ = 1/√max(M, N) in the experiments. We perform the experiments in the case where A_m = O (m = 8). We compare the proposed method A_m-DRS-PCP with three part-wise DRS-PCP methods, H²-DRS-PCP, C⁴-DRS-PCP and R⁸-DRS-PCP, which split O into H², C⁴ and R⁸, respectively, and estimate all parts separately.

TABLE I
PERFORMANCE COMPARISON

L, S ∈ O^{32×32}, ρ = 0.2, rank_R(L) = 29
  Algorithm      error    # iter.
  A_m-DRS-PCP    2.0e-6   2,044
  H²-DRS-PCP     1.0      3,009
  C⁴-DRS-PCP     8.2e-1   2,265
  R⁸-DRS-PCP     37.5     2,990

L, S ∈ O^{32×32}, ρ = 0.2, rank_R(L) = 58
  Algorithm      error    # iter.
  A_m-DRS-PCP    6.5e-2   2,971
  H²-DRS-PCP     11.9     2,553
  C⁴-DRS-PCP     9.6      1,894
  R⁸-DRS-PCP     78.7     1,924

TABLE I shows the performance comparison of all four algorithms. Fig. 1 and Fig. 2 show the differences of all real and imaginary parts between the estimated low rank matrices and the original matrices, for the rank_R(L) = 58 case of TABLE I, with A_m-DRS-PCP (Fig. 1) and H²-DRS-PCP (Fig. 2). They show that the proposed method A_m-DRS-PCP outperforms all part-wise methods by exploiting all the correlations among the real and imaginary parts. H²-DRS-PCP and C⁴-DRS-PCP perform much better than R⁸-DRS-PCP, since they may utilize these correlations in part. This experiment also shows that C⁴-DRS-PCP is a little better than H²-DRS-PCP; this may be because the correlations captured by the quaternion structure are not so important in this example.

V. CONCLUSIONS

In this paper, we have proposed an algorithmic solution to the hypercomplex principal component pursuit based on a proximal splitting technique. This solution solves the hypercomplex principal component pursuit, which is a convex relaxation of the hypercomplex robust principal component analysis with a new sparsity measure of C-D matrices, and it utilizes useful mathematical tools, including the C-D SVD and the R-rank, based on algebraic translations of C-D number systems. Numerical experiments show that the proposed algorithm separates the observed matrices into the sum of low rank and sparse ones much more faithfully than existing algorithms.

APPENDIX A
DOUGLAS-RACHFORD SPLITTING

The Douglas-Rachford splitting (DRS) [28], [39], [40] is a well-known proximal splitting method that solves the minimization of the sum of two functions

f(x) + g(x),    (19)

where f and g are assumed to be elements of the class, denoted by Γ₀(H), of proper lower semicontinuous convex functions from a real Hilbert space H to R ∪ {+∞}. For a given γ ∈ (0, +∞), the DRS approximates a minimizer of (19) with (prox_{γg}(x_k))_{k≥0} by generating the following sequence (x_k)_{k≥0}:

x_{k+1} ← x_k + t_k {prox_{γf}[2 prox_{γg}(x_k) − x_k] − prox_{γg}(x_k)},    (20)

where (t_k)_{k≥0} ⊂ [0, 2] satisfies Σ_{k≥0} t_k(2 − t_k) = +∞, and the proximity operator [41] of index γ of f ∈ Γ₀(H) is defined as

prox_{γf} : H → H : x ↦ arg min_{y∈H} { f(y) + (1/(2γ)) ∥x − y∥²_H },

with the norm on H denoted by ∥·∥_H. Indeed, if dim(H) < ∞, (prox_{γg}(x_k))_{k≥0} converges to a minimizer of (19) (see, e.g., [42]).
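In code, the abstract iteration (20) takes only a few lines. The sketch below is generic and hypothetical: prox_f, prox_g and the relaxation parameter t are supplied by the caller, and Algorithm 1 is the instance obtained by combining shrink and ST component-wise for prox_f and the projection-based prox_{γg} for prox_g on the product space H₀.

```python
def drs(prox_f, prox_g, x0, gamma=1.0, t=1.0, n_iter=1000):
    """Generic DRS iteration (20) with constant relaxation t in (0, 2)."""
    x = x0
    for _ in range(n_iter):
        y = prox_g(x, gamma)                          # prox_{gamma g}(x_k)
        x = x + t * (prox_f(2.0 * y - x, gamma) - y)  # relaxed DRS update
    return prox_g(x, gamma)   # (prox_{gamma g}(x_k)) approaches a minimizer
```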


Fig. 1. Difference between the original matrix L^opt and the estimated low rank matrix L with A_m-DRS-PCP.

Fig. 2. Difference between the original matrix L^opt and the estimated low rank matrix L with H²-DRS-PCP.

REFERENCES

[1] S. B. Choe and J. J. Faraway, "Modeling head and hand orientation during motion using quaternions," Journal of Aerospace, vol. 113, no. 1, pp. 186–192, 2004.
[2] J. C. K. Chou, "Quaternion kinematic and dynamic differential equations," IEEE Trans. Robot. Autom., vol. 8, no. 1, pp. 53–64, Feb. 1992.
[3] J. S. Yuan, "Closed-loop manipulator control using quaternion feedback," IEEE Trans. Robot. Autom., vol. 4, no. 4, pp. 434–440, Aug. 1988.
[4] D. P. Mandic and S. L. Goh, Complex Valued Nonlinear Adaptive Filters: Noncircularity, Widely Linear and Neural Models. John Wiley and Sons Ltd, 2009.
[5] S. L. Goh, M. Chen, D. H. Popović, K. Aihara, D. Obradovic, and D. P. Mandic, "Complex-valued forecasting of wind profile," Renewable Energy, vol. 31, pp. 1733–1750, 2006.
[6] C. C. Took, D. P. Mandic, and K. Aihara, "Quaternion-valued short term forecasting of wind profile," in Proc. IJCNN, 2010.
[7] J. Benesty, J. Chen, and Y. Huang, "A widely linear distortionless filter for single-channel noise reduction," IEEE Signal Process. Lett., vol. 17, no. 5, pp. 469–472, May 2010.
[8] I. Kantor and A. Solodovnikov, Hypercomplex Numbers: An Elementary Introduction to Algebras. Springer-Verlag, New York, 1989.
[9] D. Alfsmann, "On families of 2^N-dimensional hypercomplex algebras suitable for digital signal processing," in Proc. EUSIPCO, Florence, Italy, 2006.
[10] J. C. Baez, "The octonions," Bulletin of the American Mathematical Society, vol. 39, no. 2, pp. 145–205, 2001.
[11] W. E. Frazier, "Metal additive manufacturing: A review," Journal of Materials Engineering and Performance, vol. 23, no. 6, pp. 1917–1928, Jun. 2014.
[12] S. J. Sangwine and T. A. Ell, "Hypercomplex auto- and cross-correlation of color images," in Proc. IEEE ICIP, 1999, pp. 319–322.
[13] N. L. Bihan and S. J. Sangwine, "Quaternion principal component analysis of color images," in Proc. IEEE ICIP, 2003, pp. 809–812.
[14] S. Miron, N. L. Bihan, and J. I. Mars, "Quaternion-MUSIC for vector-sensor array processing," IEEE Trans. Signal Process., vol. 54, no. 4, pp. 1218–1229, Apr. 2006.
[15] T. A. Ell and S. J. Sangwine, "Hypercomplex Fourier transforms of color images," IEEE Trans. Image Process., vol. 16, no. 1, pp. 22–35, Jan. 2007.
[16] A. M. Grigoryan and S. S. Agaian, Quaternion and Octonion Color Image Processing with MATLAB. SPIE Press Monographs, 2018.
[17] L. A. Wolf, "Similarity of matrices in which the elements are real quaternions," Bulletin of the American Mathematical Society, vol. 42, no. 10, pp. 737–743, 1936.
[18] N. A. Wiegmann, "Some theorems on matrices with real quaternion elements," Canadian Journal of Mathematics, vol. 7, pp. 191–201, 1955.
[19] F. Zhang, "Quaternions and matrices of quaternions," Linear Algebra and Its Applications, vol. 251, pp. 21–57, 1997.
[20] T. Dray and C. A. Manogue, "The octonionic eigenvalue problem," Advances in Applied Clifford Algebras, vol. 8, no. 2, pp. 341–364, 1998.
[21] T. Mizoguchi and I. Yamada, "An algebraic translation of Cayley-Dickson linear systems and its applications to online learning," IEEE Trans. Signal Process., vol. 62, no. 6, pp. 1438–1453, Mar. 2014.
[22] ——, "Hypercomplex tensor completion with Cayley-Dickson singular value decomposition," in Proc. IEEE ICASSP, Calgary, Alberta, Canada, 2018.
[23] ——, "Hypercomplex tensor completion via convex optimization," submitted to IEEE Trans. Signal Process., 2018.
[24] E. J. Candès, X. Li, Y. Ma, and J. Wright, "Robust principal component analysis?" Journal of the ACM, vol. 58, no. 3, pp. 1–37, May 2011.
[25] P.-S. Huang, S. D. Chen, P. Smaragdis, and M. Hasegawa-Johnson, "Singing-voice separation from monaural recordings using robust principal component analysis," in Proc. IEEE ICASSP, 2012, pp. 57–60.
[26] Y. Peng, A. Ganesh, J. Wright, W. Xu, and Y. Ma, "RASL: Robust alignment by sparse and low-rank decomposition for linearly correlated images," IEEE Trans. Pattern Anal. Mach. Intell., pp. 2233–2246, Jan. 2012.
[27] T.-S. T. Chan and Y.-H. Yang, "Complex and quaternionic principal component pursuit and its application to audio separation," IEEE Signal Processing Letters, vol. 23, no. 2, pp. 287–291, Feb. 2016.
[28] P. L. Combettes and J. C. Pesquet, "A Douglas-Rachford splitting approach to nonsmooth convex variational signal recovery," IEEE J. Sel. Topics Signal Process., vol. 1, no. 4, pp. 564–574, Dec. 2007.
[29] S. Gandy and I. Yamada, "Convex optimization techniques for the efficient recovery of a sparsely corrupted low-rank matrix," Journal of Math-for-Industry, vol. 2, pp. 147–156, 2010.
[30] G. W. Stewart, "On the early history of the singular value decomposition," SIAM Review, vol. 35, no. 4, pp. 551–566, 1993.
[31] E. J. Candès and B. Recht, "Exact matrix completion via convex optimization," Foundations of Computational Mathematics, vol. 9, no. 6, pp. 717–772, Dec. 2009.
[32] S. Gandy, B. Recht, and I. Yamada, "Tensor completion and low-n-rank tensor recovery via convex optimization," Inverse Problems, vol. 27, no. 2, 2011.
[33] D. Wipf and B. Rao, "An empirical Bayesian strategy for solving the simultaneous sparse approximation problem," IEEE Trans. Signal Process., vol. 55, no. 7, pp. 3704–3716, Jun. 2007.
[34] F. R. Deutsch, Best Approximation in Inner Product Spaces. Springer-Verlag, New York, 2001.
[35] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, "Distributed optimization and statistical learning via the alternating direction method of multipliers," Foundations and Trends in Machine Learning, vol. 3, no. 1, pp. 1–122, 2011.
[36] L. Condat, "A primal-dual splitting method for convex optimization involving Lipschitzian, proximable and linear composite terms," Journal of Optimization Theory and Applications, vol. 158, no. 2, pp. 460–479, 2013.
[37] B. C. Vũ, "A splitting algorithm for dual monotone inclusions involving cocoercive operators," Advances in Computational Mathematics, vol. 38, no. 3, pp. 667–681, Apr. 2013.
[38] Z. Lin, A. Ganesh, J. Wright, L. Wu, M. Chen, and Y. Ma, "Fast convex optimization algorithms for exact recovery of a corrupted low-rank matrix," UIUC Tech. Rep. UILU-ENG-09-2214, Jul. 2009.
[39] J. Douglas and H. Rachford, "On the numerical solution of heat conduction problems in two and three space variables," Trans. of the American Mathematical Society, vol. 82, pp. 421–439, 1956.
[40] J. Eckstein and D. P. Bertsekas, "On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators," Math. Programming, vol. 55, pp. 293–318, Apr. 1992.
[41] J. J. Moreau, "Fonctions convexes duales et points proximaux dans un espace hilbertien," Comptes Rendus de l'Académie des Sciences (Paris), Série A, vol. 255, pp. 2897–2899, 1962.
[42] I. Yamada, M. Yukawa, and M. Yamagishi, "Minimizing the Moreau envelope of nonsmooth convex functions over the fixed point set of certain quasi-nonexpansive mappings," in Fixed-Point Algorithms for Inverse Problems in Science and Engineering, H. H. Bauschke, R. S. Burachik, P. L. Combettes, V. Elser, D. R. Luke, and H. Wolkowicz, Eds. Springer, 2011, ch. 17, pp. 345–390.
