<<

Independent Low-Rank Matrix Analysis Based on Time-Variant Sub-Gaussian Source Model

Shinichi Mogami∗, Norihiro Takamune∗, Daichi Kitamura†, Hiroshi Saruwatari∗, Yu Takahashi‡, Kazunobu Kondo‡, Hiroaki Nakajima‡, and Nobutaka Ono§ ∗ The University of Tokyo, Tokyo, Japan † National Institute of Technology, Kagawa College, Kagawa, Japan ‡ Yamaha Corporation, Shizuoka, Japan § Tokyo Metropolitan University, Tokyo, Japan

Abstract—Independent low-rank matrix analysis (ILRMA) is distributions can be parametrically changed along with the a fast and stable method for blind audio source separation. degree-of-freedom parameter in Student’s t distribution and Conventional ILRMAs assume time-variant (super-)Gaussian the in the GGD. By changing the of source models, which can only represent signals that follow a super-Gaussian distribution. In this paper, we focus on ILRMA the distributions, we can control how often the source signal based on a generalized Gaussian distribution (GGD-ILRMA) outputs outliers or its expected sparsity. In particular, in sub- and propose a new type of GGD-ILRMA that adopts a time- Gaussian models, i.e., models that follow distributions with variant sub-Gaussian distribution for the source model. By using a platykurtic shape, the source signal rarely outputs outliers. a new update scheme called generalized iterative projection for Therefore, the sub-Gaussian modeling of sources is expected homogeneous source models, we obtain a convergence-guaranteed update rule for demixing spatial parameters. In the experimental to accurately estimate the source spectrogram without ignoring evaluation, we show the versatility of the proposed method, i.e., its important spectral peaks. Furthermore, many audio sources the proposed time-variant sub-Gaussian source model can be follow platykurtic distributions; it is known that musical in- applied to various types of source signal. strument signals obey sub-Gaussian distributions [18]. However, neither conventional t-ILRMA nor GGD-ILRMA I.INTRODUCTION assumes that the source model follows a sub-Gaussian distri- Blind source separation (BSS) [1]–[8] is a technique bution. Both t-ILRMA and GGD-ILRMA can adopt only a for extracting specific sources from an observed multi- super-Gaussian (or Gaussian) model, i.e., models that follow channel mixture signal without knowing a priori infor- distributions with a leptokurtic shape, for the source model. mation about the mixing system. The most commonly In t-ILRMA, this is because the complex Student’s t distribu- used algorithm for BSS in the (over)determined case tion becomes only super-Gaussian for any degree-of-freedom (number of microphones number of sources) is indepen- parameter. In GGD-ILRMA, on the other hand, it is because dent component analysis (ICA)≥ [1]. As a state-of-the-art ICA- the estimation algorithm for the demixing matrix has not yet based BSS method, Kitamura et al. proposed independent been derived for a sub-Gaussian case, although the GGD itself low-rank matrix analysis (ILRMA) [9], [10], which is a uni- can represent a sub-Gaussian distribution depending on its fication of independent vector analysis (IVA) [5], [6] and shape parameter. More specifically, the conventional iterative nonnegative matrix factorization (NMF) [11]. ILRMA assumes projection (IP) [7], which is an algorithm that updates the both statistical independence between sources and a low-rank demixing matrix mainly used in IVA and ILRMA, cannot time- structure for each source, and the frequency- be applied to sub-Gaussian-based GGD-ILRMA owing to wise demixing systems are estimated without encountering mathematical difficulties. the permutation problem [3], [4]. ILRMA is a faster and In this paper, we propose a new type of ILRMA that more stable algorithm than multichannel NMF (MNMF) [12]– assumes time-variant sub-Gaussian source models. This paper [14], which is an algorithm for BSS that estimates the mixing includes three novelties. First, we construct a new update system on the basis of spatial matrices. scheme for the demixing matrix called generalized IP for The original ILRMA based on Itakura–Saito (IS) homogeneous source models (GIP-HSM). Second, we derive arXiv:1808.08056v1 [eess.AS] 24 Aug 2018 assumes a time-variant isotropic complex Gaussian distribu- a convergence-guaranteed update rule for the demixing matrix tion for the source . Hereafter, we refer to in GGD-ILRMA with a shape parameter of four. To the the original ILRMA as IS-ILRMA. Recently, various types of best of our knowledge, this is the world’s first attempt to source generative model have been proposed in ILRMA for ro- model the source signal with a time-variant sub-Gaussian bust BSS. In particular, t-ILRMA [15] and GGD-ILRMA [16], distribution, and we derive the update rule by applying the [17] have been proposed as generalizations of IS-ILRMA with above-mentioned scheme. Third, we show the validity of the a complex Student’s t distribution and a complex general- proposed sub-Gaussian GGD-ILRMA via BSS ized Gaussian distribution (GGD), respectively. In t-ILRMA on music and speech signals. We confirm that the proposed and GGD-ILRMA, the kurtosis of the generative models’ method is a versatile approach to source modeling, i.e., the proposed time-variant sub-Gaussian model can represent super-Gaussian or Gaussian signals as well as sub-Gaussian signals owing to its time-variant nature, whereas the conven- tional models can only represent super-Gaussian or Gaussian signals.

II.PROBLEM FORMULATION

A. Formulation of Demixing Model (a) GGD with β = 2. (b) GGD with β = 4. Let N and M be the numbers of sources and channels, respectively. The short-time Fourier transforms (STFTs) of Fig. 1. Examples of shapes of complex GGD. (a) When β = 2, shape of GGD corresponds to that of Gaussian distribution. (b) When β = 4, GGD is the multichannel source, observed, and estimated signals are platykurtic. defined as

> N sij = (sij1, . . . , sijN ) C , (1) independently defined in each time-frequency slot as follows: >∈ M xij = (xij1, . . . , xijM ) , (2) Y C p(Yn) = p(yijn) > ∈ N yij = (yij1, . . . , yijN ) C , (3) i,j ∈  β  Y β yijn where i = 1,...,I; j = 1,...,J; n = 1,...,N; and m = = exp | | , (7) 2πr 2Γ(2/β) − r β 1 ...,M are the integral indices of the frequency bins, time i,j ijn ijn frames, sources, and channels, respectively, and > denotes the p X rijn = tiknvkjn, (8) transpose. We assume the mixing system k

where the local distribution p(yijn) is defined as a circularly xij = Aisij, (4) symmetric complex Gaussian distribution, i.e., the probability M×N of p(y ) only depends on the power of the complex value where Ai = (ai1,..., aiN ) C is a frequency-wise ijn ∈ y . r is the time-frequency-varying and mixing matrix and ain is the steering vector for the nth source. ijn ijn p is the domain parameter in NMF modeling. Moreover, When M = N and Ai is not a singular matrix, the estimated signal y can be expressed as the variables tikn and vkjn are the elements of the basis ij I×K K×J matrix Tn R and the activation matrix Vn R , ∈ ≥0 ∈ ≥0 yij = Wixij, (5) respectively, where R≥0 denotes the set of nonnegative real numbers. k = 1, ,K is the integral index, and K is set −1 H ··· where Wi = Ai = (wi1,..., wiN ) is the demixing matrix, to a much smaller value than I and J, which leads to the H win is the demixing filter for the nth source, and denotes the low-rank approximation. From (7), the negative log-likelihood Hermitian transpose. ILRMA estimates both W and y from function GGD of the observed signal xij can be obtained as i ij L only the observation xij assuming statistical independence follows by assuming independence between sources: 0 between sijn and sijn0 , where n = n . X 6 GGD = 2J log det Wi L − | | i B. Generative Model and Cost Function in GGD-ILRMA β ! X yijn + | | + 2 log r , (9) GGD-ILRMA utilizes the isotropic complex GGD. The r β ijn i,j,n ijn probability density function of the GGD is where yijn = winxij and we used the transformation of β β  z  random variables from xij to yij. The cost function of p(z) = exp | | , (6) 2πr2Γ(2/β) − rβ GGD-ILRMA (9) coincides with that of IS-ILRMA when β = p = 2. By minimizing (9) w.r.t. Wi and rijn under where β is the shape parameter, r is the scale parameter, and the limitation (8), we estimate the demixing system that Γ( ) is the gamma function. Fig. 1 shows the shapes of the maximizes the independence between sources. · GGD with β = 2 and β = 4. When β = 2, (6) corresponds Fig. 2 shows a conceptual model of GGD-ILRMA. When to the probability density function of the complex Gaussian each of the original sources has a low-rank spectrogram, the distribution with a mesokurtic shape. In the case of 0 < β < spectrogram of their mixture should be more complicated, 2, the distribution becomes super-Gaussian with a leptokurtic where the rank of the mixture spectrogram will be greater shape. In the case of β > 2, the distribution becomes sub- than that of the source spectrogram. On the basis of this Gaussian with a platykurtic shape. assumption, in GGD-ILRMA, the low-rank constraint for each In GGD-ILRMA, we assume the time-variant isotropic estimated spectrogram is introduced by employing NMF. The complex GGD as the source generative model, which is demixing matrix Wi is estimated so that the spectrogram of apply IP because no quadratic function w.r.t. x can majorize xβ. Conventional GGD-ILRMA achieves various types of source generative model: when β = 2, the entry of the source spectrogram follows the complex Gaussian distribution (the same model as that of IS-ILRMA), and when β < 2, the entry of the source spectrogram follows the complex leptokurtic Fig. 2. Principle of source separation in GGD-ILRMA, where x˜m and y˜n distribution. However, a source generative model that follows are time-domain signals of xijn and yijn, respectively. a platykurtic complex GGD is yet to be achieved. Since the marginal distribution of the time-variant super-Gaussian or Gaussian model w.r.t. the time frame becomes only super- the estimated signal becomes a low-rank matrix modeled by Gaussian, any signals that follow sub-Gaussian distributions, TnVn, whose rank is at most K. The estimation of Wi, Tn, such as music signals, cannot be appropriately dealt with by and Vn can consistently be carried out by minimizing (9) in the conventional GGD-ILRMA. a fully blind manner. B. Update Rule for Low-Rank Source Model III.CONVENTIONAL METHOD The update rules for Tn and Vn in IS-ILRMA and GGD- A. Update Rule for Demixing Matrix ILRMA can be derived by the MM algorithm, which is a In IS-ILRMA, the demixing matrix Wi can be efficiently popular approach for NMF. We obtain the following update updated by IP, which can be applied only when the cost rules: function is the sum of log det Wi and the quadratic form p − | |  y β  β+p of win (this corresponds to GGD-ILRMA with β = 2). In P ijn β | | vkjn j β +1 GGD-ILRMA, the update rule of W is also derived using  P p  i  ( k0 tik0nvk0jn)  the majorization-minimization (MM) algorithm [19]. When tikn tikn   , (16) ←  P 1  0 < β 2, we can use the following inequality of weighted  2 j P vkjn  0 tik0nvk0jn arithmetic≤ and geometric to design the majorization k p function:  β  β+p P yijn 2 β | | tikn β y  β   j P β +1  β ijn β ( t 0 v 0 ) p yijn | | + 1 αijn , (10)  k0 ik n k jn  | | ≤ 2 α 2−β − 2 vkjn vkjn   . (17) ijn ←  P 1   2 j P tikn  where αijn is an auxiliary variable and the equality of (10) k0 tik0nvk0jn holds if and only if αijn = yijn . By applying (10) to (9), the majorization function of (9)| can| be designed as See Appendix A for their detailed derivation. X In GGD-ILRMA, the cost function (9) is minimized by GGD 2J log det Wi alternately repeating the update of the demixing matrix W L ≤ − | | i i using (13)–(15) and the update of the low-rank source models X H + J winFinwin + const., (11) Tn and Vn using (16) and (17), respectively. A monotonic i,n decrease in the cost is guaranteed over these update rules. β X 1 H Fin = β xijxij, (12) IV. PROPOSED METHOD 2J α 2−β(P t v ) p j ijn k ikn kjn A. Motivation where the constant term is independent of win. By applying The conventional methods [9], [16] have a limitation that IP to (11) and substituting the equality condition αijn = yijn | | the source signal cannot be appropriately represented when into (12), the update rule for Wi is derived as the signal follows a sub-Gaussian distribution. In this paper, β X 1 H we propose an MM-algorithm-based update rule for GGD- Fin = xijx , (13) 2−β β ij ILRMA to maximize the likelihood based on the sub-Gaussian 2J y (P t v ) p j ijn k ikn kjn source model. To derive the update rule, we also extend the −1 |−1 | win Fin Wi en, (14) problem of demixing matrix estimation into a more general- ← q H ized form and propose its optimization scheme based on GIP- win win 1/(w Finwin). (15) ← in HSM. When β = p = 2, these update rules (13)–(15) coincide with In contrast to the time-variant super-Gaussian or Gaussian those in IS-ILRMA. model, the marginal distribution of the time-variant sub- Note that the update rules (13)–(15) are valid only when Gaussian model w.r.t. the time frame can be sub-Gaussian 0 < β 2, which is equivalent to the condition that the as well as Gaussian or super-Gaussian, depending on its ≤ inequality (10) holds. In fact, when β > 2, it is thought to be time of the scale parameter rijn. For example, the impossible to design a majorization function to which we can time-variant sub-Gaussian model is platykurtic when rijn is constant w.r.t. the time frame, whereas it becomes mesokurtic Hence, letting the right side of (20) be zero, we can obtain or leptokurtic when rijn fluctuates appropriately. This shows the optimal ηin as that the proposed time-variant sub-Gaussian model covers pd η = 2/d. (21) distributions with a wider between platykurtic and in leptokurtic shapes than other conventional source models. The actual difficulty in the optimization of the demixing Therefore, the proposed GGD-ILRMA is expected to have a matrix is the direction optimization, i.e., the minimization of 2 robust performance against the variation of the target signals. 2 log det Ui . Since minimizing log x is equivalent to − | x2| − B. Derivation of GIP-HSM maximizing , we can reformulate this problem as 2 The cost functions of IS-ILRMA and GGD-ILRMA are maximize det Ui s.t. fin(uin) = 1. (22) | | generalized as Since it is generally difficult to solve this problem by a closed I " N # X X form, we apply an approach called vectorwise coordinate = 2 log det Wi + fin(win) + const., (18) descent. In this algorithm, we focus on u , namely, the L − | | in i=1 n=1 Hermitian transpose of a particular row vector of Ui. By N where the constant term is independent of win and fin : C cofactor expansion, we can deform the problem (22) as → is a real-valued function that satisfies the following three 2 R maximize bH u s.t. f (u ) = 1, (23) conditions: in in in in 1) fin(w) is differentiable w.r.t. w at an arbitrary point. where bin is a column vector of the adjugate matrix ∀  N  H 2) c > 0, w C fin(w) c is convex (naturally Bi = bi1 biN of Ui. Since bin only depends ∈ ≤ 0 ··· satisfied when fin(w) is convex). on uin0 (n = n) and is independent of uin, (23) can be ∀ d 6 3) η, fin(ηw) = η fin(w), namely, fin is a homoge- regarded as a function of uin by fixing the other row vectors of neous function of degree d. Ui. Using the method of Lagrange multipliers, the stationary The term fin(win) is determined by the distribution condition is of the source generative model, e.g., fin(win) = H ∂fin P H β β bin(binuin) + λ (uin) = 0, (24) (1/J) w xij /rijn in GGD-ILRMA. H j in ∂uin Here we| show that| the optimization of (18) w.r.t. w is in H composed of “direction optimization” and “scale optimization” where λ is a Lagrange multiplier. Since (binuin) is a scalar, the stationary condition can be rewritten as for each frequency bin. Let uin be an N-dimensional vector that satisfies f (u ) = 1. Then, w can be uniquely repre- ∂f in in in in (u ) b , (25) w = η u η H in in sented as in in in, where in is a positive real value. ∂uin k By regarding f (u) as the norm of u, we can interpret u as in in where the binary relation “x y” means that x is parallel to a unit vector w.r.t. the f -norm. Substituting w = η u in in in in y. In (25), b is representedk in terms of W as into (18), the cost function is represented as in in b = (det U )U −1e I  in i i n X  H −1 −1 −1 = 2 log det η1u1 ηN uN = (det U )(diag(η , . . . , η )W ) e L − ··· i i1 iN i n i=1 = (det U )W −1diag(η , . . . , η )e N  i i i1 iN n X −1 + fin(ηinuin) = (ηin det Ui)Wi en n=1 −1 Wi en, (26) I " ! N # k X Y X d = 2 log ηin det Ui + ηin fin(uin) where diag(c1, . . . , cN ) denotes the N N diagonal matrix − · | | × i=1 n n=1 whose (n, n)th element is cn, and en is an N-dimensional I " N # vector whose nth element is one and whose other elements are X X  d = 2 log det Ui + 2 log ηin + ηin , (19) zero. Since fin is convex, the stationary point of the objective − | | − i=1 n=1 function (23) must also be the optimal point. Therefore, the H cost function (19) that includes (22) monotonically decreases where U =  u u  . Therefore, the minimization i i1 iN with each update of the direction u . of the cost function··· can be interpreted as the minimization of in In conclusion, to minimize the cost function (18), we update log det U for each frequency bin and the minimization of i the vector w by the following two steps in GIP-HSM. (a) −2 log| η +|η d for each source and frequency bin. These in in in Find a vector w0 that satisfies −direction optimization and scale optimization problems are in independent of each other. The optimal ηin can be calculated ∂fin 0 −1 H (win) Wi en. (27) by a closed form because the derivative of the cost function ∂win k w.r.t. η can be written as in (b) Update win as d d 2 d−1 q ( 2 log η + η ) = + dη . (20) 0 d 0 in in in win w 2/(d fin(w )). (28) dηin − −ηin ← in · in The first step (a) and second step (b) correspond to the The quadratic form of the right side in (37) can be deformed 0 direction and scale optimizations, respectively. Note that win as 0 calculated in the first step does not need to satisfy fin(w ) = in H ˜ ˜ H 1 because the scale is automatically adjusted in the second q Qq = tr(Qqq ) 0 0 0   step. In fact, if win is represented as win = ηinuin, the second X 2 X ∗ ∗ = qj q˜ qiq q˜ q˜j step results in | | k k − j i j i6=j q 0 d 0 d pd    win η uin 2/(d η ) = uin 2/d, (29) X 2 X 2 X ∗ ∗ in in = qj q˜j qiq q˜ q˜j. (38) ← · · | | | | − j i at which point both the direction and the scale are optimized. j j i6=j C. Sub-Gaussian ILRMA Based on GIP-HSM Hence, we prove the following inequality hereafter:    Using GIP-HSM, we propose a new update rule in GGD- X 4 X 4 qj q˜j ILRMA whose shape parameter β is set to four (time-variant | | | | sub-Gaussian model). The cost function of GGD-ILRMA with j j  2 β = 4 is written as    X 2 X 2 X ∗ ∗ H 4  qj q˜j qiqj q˜i q˜j . (39) X X winxij ≤ | | | | − = 2J log det Wi + | | + const., (30) j j i6=j J − | | r 4 i i,j,n ijn Let where the constant term is independent of win. It seems f (w ) = X 2 possible to apply GIP-HSM by letting in in x1 = qj , (40) P H 4 4 (1/J) w xij /rijn . In this case, however, it is diffi- | | j | in | j cult to solve (27), which is reduced to a cubic vector equation s X 2 2 w.r.t. w0 . Instead, we apply an MM algorithm to derive an x2 = qi qj , (41) in | | | | update rule that does not contain any cubic vector equations. i6=j Hereafter, we prove the following theorem, and then design a X 2 y1 = q˜j , (42) | | new type of majorization function of (30) using the theorem. j PJ H 4 4 Theorem 1: Let f(w) = (1/J) j=1( w xj /rj ) and s H 2 | | X 2 2 g(w) = (w Gw) , where G is defined in terms of a vector y2 = q˜i q˜j . (43) | | | | w˜ as i6=j  1 1  H = x1 xJ , (31) Since r1 ··· rJ  > H q˜ = q˜1 q˜J = H w˜, (32) 2 2 X 4 ··· x1 x2 = qj 0, (44)  2 ∗ ∗  − | | ≥ q˜ q˜1q˜ q˜1q˜ j k k − 2 · · · − J  ∗ 2 ∗  2 2 X 4  q˜2q˜ q˜ q˜2q˜  y1 y2 = q˜j 0, (45) ˜ − 1 k k · · · − J  − | | ≥ Q =  . . . .  , (33) j  . . .. .    ∗ ∗ 2 and x1, x2, y1, y2 0, it is obvious that q˜J q˜ q˜J q˜ q˜ ≥ − 1 − 2 · · · k k 1 H x1y1 x2y2 0. (46) G = HQH˜ . (34) − ≥ q P 4 J q˜j j | | Furthermore, we obtain the following inequality by applying Then, g(w) satisfies f(w) g(w) for arbitrary w and the the Cauchy–Schwarz inequality: ≤ equality holds when w = w˜. v > u     H u X 2 2 X 2 2 Proof: Let q = q1 qJ = H w. f(w) and x2y2 = qi qj q˜i q˜j ··· t | | | | | | | | g(w) can be written as i6=j i6=j

1 X 4 f(w) = q , (35) j X ∗ ∗ X ∗ ∗ J | | qiq q˜ q˜j qiq q˜ q˜j, (47) j ≥ j i ≥ j i 1 i6=j i6=j g(w) = (qHQq˜ )2. (36) P 4 J q˜j where we used the fact that j | | X X Then, the objective inequality f(w) g(w) holds if and only q q∗q˜∗q˜ = q q∗q˜∗q˜ + q∗q q˜ q˜∗ if ≤ i j i j i j i j i j i j i H qin = qi1n qiJn = H win, (59) ··· in Therefore, (39) can be proven by using the Brahmagupta  2 ∗ ∗  qin qi1nq qi1nq identity, (44), (45), and (49), as follows: k k − i2n · · · − iJn  ∗ 2 ∗   qi2nq qin qi2nq     − i1n k k · · · − iJn X 4 X 4 Qin =  . . . .  , (60) qj q˜j . . .. . | | | |  . . .  j j   ∗ ∗ 2 2 2 2 2 qiJnqi1n qiJnqi2n qin = (x1 x2 )(y1 y2 ) − − · · · k k − 2 − 2 1 H = (x1y1 x2y2) (x1y2 x2y1) Gin = HinQinHin, (61) − − − q P 4 2 J qijn (x1y1 x2y2) j | | ≤ − −1 −1  2 win G W en. (62) X ∗ ∗ ← in i x1y1 qiq q˜ q˜j ≤ − j i i6=j Finally, we operate the following scale optimization by apply-  2 ing (28):    X 2 X 2 X ∗ ∗  > H =  qj q˜j qiqj q˜i q˜j . (50) q = q q = H w , (63) | | | | − in i1n iJn in in j j i6=j q ··· 4 P 4 win win J/(2 j qijn ), (64) It is easy to prove that the equality of (39) holds if w = w˜ ← | | because then q = q˜ holds. which is the scale optimization w.r.t. the fin-norm. Applying Theorem 1, we can design a majorization function In GGD-ILRMA with β = 4, the demixing matrix Wi of (30) as is updated by (58)–(64), and the low-rank models Tn and Vn are updated by (16) and (17), respectively. These update X X H 2 2J log det Wi + J (w Ginwin) + const. rules are derived using the MM algorithm and GIP-HSM, thus J ≤ − | | in i i,n guaranteeing a monotonic decrease of the cost function (30). =: +, (51) J V. EXPERIMENTAL EVALUATION where A. BSS on Music Signals

 1 1  We compared the separation performance of the proposed Hin = xi1 xiJ , (52) ri1n ··· riJn sub-Gaussian GGD-ILRMA (β = 4) with those of conven-  > H q˜in = q˜i1n q˜iJn = H w˜in, (53) tional IS-ILRMA [9] and GGD-ILRMA (β < 2) [16]. We ar- ··· in  2 ∗ ∗  tificially produced monaural dry music sources of four melody q˜in q˜i1nq˜ q˜i1nq˜ k k − i2n · · · − iJn parts (melody 1: main melody, melody 2: counter melody,  ∗ 2 ∗   q˜i2nq˜i1n q˜in q˜i2nq˜iJn midrange, and bass) using Microsoft GS Wavetable Synth, ˜ − k k · · · −  Qin =  . . . .  , (54)  . . .. .  where several musical instruments were chosen to play these   melody parts [20], [21]. Six combinations of sources, Music ∗ ∗ 2 q˜iJnq˜ q˜iJnq˜ q˜in − i1n − i2n · · · k k 1–Music 6, were constructed by selecting typical combinations 1 ˜ H of instruments with different melody parts. The combinations Gin = HinQinHin, (55) q P 4 of dry sources used in this experiment are shown in Table I. J q˜ijn j | | To simulate a reverberant mixture, the observed signals were produced by convoluting the E2A, which where w˜in is an auxiliary variable and the equality of (51) H 2 was obtained from the RWCP database [22], shown in Fig. 3. holds when win = w˜in. Since gin(win) = (winGinwin) is a differentiable, convex, and homogeneous function of As the evaluation score, we used the improvement of the + signal-to-distortion ratio (SDR) [23], which indicates the over- win, we can apply GIP-HSM to minimize . The optimal J all separation quality. An STFT was performed using a 128- condition for the direction of win is determined as ms-long Hamming window with a 64-ms-long shift. The shape ∂gin 0 0 H 0 0 −1 parameter β of the GGD was set to 1, 1.99, 2 in conventional H (win) = (win Ginwin)Ginwin Wi en. (56) ∂win k GGD-ILRMA and 4 in the proposed method, where β = 1.99 is the best parameter according to [16]. The other conditions 0 H 0 Since (win Ginwin) is a scalar, one of the solutions of (56) are shown in Table II. is Fig. 4 shows the average SDR improvements for Music

0 −1 −1 1–Music 6. The proposed sub-Gaussian GGD-ILRMA on win = Gin Wi en. (57) average outperforms the other conventional ILRMAs. This TABLE I COMBINATIONS OF DRY SOURCES TABLE II EXPERIMENTALCONDITIONSFORMUSICANDSPEECHSOURCE Index Source 1 Source 2 SEPARATION Music 1 Fg. (bass) Ob. (melody 1) Music 2 Fg. (bass) Tp. (melody 1) frequency 16 kHz Music 3 Ob. (melody 1) Fl. (melody 2) Number of iterations 1000 Music 4 Pf. (midrange) Ob. (melody 1) Number of bases 20 Music 5 Pf. (midrange) Tp. (melody 1) Number of trials 10 Music 6 Tp. (melody 1) Fl. (melody 2) Domain parameter p = 0.5 Initial demixing matrix Wi identity matrix Entries of initial source uniformly distributed random values model matrices Tn and Vn

10 GGD (β = 1) GGD (β = 1.99) IS (β = 2) GGD (β = 4) 8

6

Fig. 3. Recording conditions of impulse response E2A (T60 = 300 ms) obtained from RWCP database [22]. 4 result confirms that the proposed sub-Gaussian source model SDR improvement [dB] 2 is more appropriate for dealing with music signals than other conventional (super-)Gaussian source models. 0 B. BSS Experiment on Speech Signals Music 1 Music 2 Music 3 Music 4 Music 5 Music 6 Average We also confirmed the separation performance of the pro- Fig. 4. Average SDR improvement of GGD-ILRMA (β = 1), GGD-ILRMA posed sub-Gaussian GGD-ILRMA for speech signals, which (β = 1.99), IS-ILRMA, and proposed GGD-ILRMA (β = 4) for six music are less likely to follow a sub-Gaussian distribution than music signals. “Average” bar graph on right side shows average SDR improvement signals. We used the monaural dry speech sources from the among six music signals for each method. source separation task in SiSEC2011 [24], Speech 1–Speech 4. An STFT was performed using a 256-ms-long Hamming window with a 128-ms-long shift. The other conditions were the same as those of the music source separation experiment. 14 Fig. 5 shows the average SDR improvements for Speech 1– GGD (β = 1) GGD (β = 1.99) Speech 4. The proposed GGD-ILRMA on average outperforms 12 IS (β = 2) GGD (β = 4) the other conventional ILRMAs even for speech signals, which 10 are expected to be sparse and follow super-Gaussian distribu- tions. This shows that the proposed time-variant sub-Gaussian 8 model can appropriately model super-Gaussian signals as well 6 as sub-Gaussian signals owing to its time-variant property, as described in Sect. IV-A. 4

VI.CONCLUSION SDR improvement [dB] 2

We proposed a new type of ILRMA, which assumes that 0 the source signal follows the time-variant isotropic complex 2 sub-Gaussian GGD. By using a new update scheme called − Speech 1 Speech 2 Speech 3 Speech 4 Average GIP-HSM, we obtained a convergence-guaranteed update rule for the demixing matrix. Furthermore, in the experimental Fig. 5. Average SDR improvement of GGD-ILRMA (β = 1), GGD-ILRMA evaluation, we revealed the versatility of the proposed method, (β = 1.99), IS-ILRMA, and proposed GGD-ILRMA (β = 4) for four speech i.e., the proposed time-variant sub-Gaussian source model can signals. “Average” bar graph on right side shows average SDR improvement among four speech signals for each method. deal with various types of source signal, ranging from sub- Gaussian music signals to super-Gaussian speech signals. ACKNOWLEDGMENT [22] S. Nakamura, K. Hiyane, F. Asano, T. Nishiura, and T. Yamada, “Acous- tical sound database in real environments for sound scene understanding This work was partly supported by SECOM Science and and hands-free speech recognition,” in Proc. LREC, 2000, pp. 965–968. Technology Foundation and JSPS KAKENHI Grant Numbers [23] E. Vincent, R. Gribonval, and C. Fevotte,´ “Performance measurement in blind audio source separation,” IEEE Trans. ASLP, vol. 14, no. 4, pp. JP17H06572 and JP16H01735. 1462–1469, 2006. [24] S. Araki, F. Nesta, E. Vincent, Z. Koldovsky,´ G. Nolte, A. Ziehe, REFERENCES and A. Benichoux, “The 2011 signal separation evaluation campaign (SiSEC2011): - audio source separation -,” in Proc. LVA/ICA, 2012, pp. [1] P. Comon, “Independent component analysis, a new concept?” Signal 414–422. Process., vol. 36, no. 3, pp. 287–314, 1994. [2] P. Smaragdis, “Blind separation of convolved mixtures in the frequency APPENDIX domain,” Neurocomputing, vol. 22, no. 1, pp. 21–34, 1998. [3] H. Sawada, R. Mukai, S. Araki, and S. Makino, “A robust and precise A. Derivation of Update Rule for Low-Rank Source Model method for solving the permutation problem of frequency-domain blind T V source separation,” IEEE Trans. ASLP, vol. 12, no. 5, pp. 530–538, 2004. The update rules for n and n in GGD-ILRMA can be [4] H. Saruwatari, T. Kawamura, T. Nishikawa, A. Lee, and K. Shikano, derived by the MM algorithm. In the MM algorithm, we “Blind source separation based on a fast-convergence algorithm com- minimize the majorization function instead of the original cost bining ICA and beamforming,” IEEE Trans. ASLP, vol. 14, no. 2, pp. function. To derive the majorization function in GGD-ILRMA, 666–678, 2006. [5] A. Hiroe, “Solution of permutation problem in ICA we introduce Jensen’s inequality using multivariate probability density functions,” in Proc. ICA, 2006, β β !− p pp. 601–608.  − p X X tiknvkjn [6] T. Kim, H. T. Attias, S.-Y. Lee, and T.-W. Lee, “Blind source separation tiknvkjn φijnk (65) exploiting higher-order frequency dependencies,” IEEE Trans. ASLP, ≤ φijnk k k vol. 15, no. 1, pp. 70–79, 2007. [7] N. Ono, “Stable and fast update rules for independent vector analysis and the tangent-line inequality based on auxiliary function technique,” in Proc. WASPAA, 2011, pp. ! 189–192. X 1 X [8] K. Yatabe and D. Kitamura, “Determined blind source separation via log tiknvkjn tiknvkjn 1 + log ψijn, ≤ ψijn − proximal splitting algorithm,” in Proc. ICASSP, 2018, pp. 776–780. k k [9] D. Kitamura, N. Ono, H. Sawada, H. Kameoka, and H. Saruwatari, “De- (66) termined blind source separation unifying independent vector analysis and nonnegative matrix factorization,” IEEE/ACM Trans. ASLP, vol. 24, where φijnk > 0 and ψijn > 0 are auxiliary variables and no. 9, pp. 1626–1641, 2016. P [10] D. Kitamura, N. Ono, H. Sawada, H. Kameoka, and H. Saruwatari, φijnk satisfies k φijnk = 1. The equalities of (65) and (66) “Determined blind source separation with independent low-rank matrix hold if and only if analysis,” in Audio Source Separation, S. Makino, Ed. Springer, Cham, March 2018, ch. 6, pp. 125–155. tiknvkjn φijnk = , (67) [11] D. D. Lee and H. S. Seung, “Learning the parts of objects by non- P 0 0 k0 tik nvk jn negative matrix factorization,” Nature, vol. 401, no. 6755, pp. 788–791, X 1999. ψijn = tiknvkjn, (68) [12] A. Ozerov and C. Fevotte,´ “Multichannel nonnegative matrix factoriza- k tion in convolutive mixtures for audio source separation,” IEEE Trans. ASLP, vol. 18, no. 3, pp. 550–563, 2010. respectively. By substituting (8) into (9) and applying (65) and [13] A. Ozerov, C. Fevotte,´ R. Blouet, and J. L. Durrieu, “Multichannel (66) to (9), the majorization function of (9) can be designed nonnegative tensor factorization with structured constraints for user- guided audio source separation,” in Proc. ICASSP, 2011, pp. 257–260. as β [14] H. Sawada, H. Kameoka, S. Araki, and N. Ueda, “Multichannel exten-  p +1 β  φ yijn 2t v sions of non-negative matrix factorization with complex-valued ,” X ijnk | | ikn kjn IEEE Trans. ASLP, vol. 21, no. 5, pp. 971–982, 2013. GGD  β +  + const., L ≤ p pψijn [15] S. Mogami, D. Kitamura, Y. Mitsui, N. Takamune, H. Saruwatari, i,j,n,k (tiknvkjn) and N. Ono, “Independent low-rank matrix analysis based on complex (69) Student’s t-distribution for blind audio source separation,” in Proc. MLSP, 2017. where the constant term is independent of tikn and vkjn. By [16] D. Kitamura, S. Mogami, Y. Mitsui, N. Takamune, H. Saruwatari, N. Ono, Y. Takahashi, and K. Kondo, “Generalized independent low- setting the partial derivatives of (69) w.r.t. tikn and vkjn to rank matrix analysis using heavy-tailed distributions for blind source zero, we obtain the update rules (16) and (17), respectively. separation,” EURASIP Journal on Advances in , vol. 2018, no. 28, pp. 1–25, 2018. [17] R. Ikeshita and Y. Kawaguchi, “Independent low-rank matrix analysis based on multivariate complex exponential power distribution,” in Proc. ICASSP, 2018, pp. 741–745. [18] G. R. Naik and W. Wang, “Audio analysis of statistically instantaneous signals with mixed Gaussian probability distributions,” International Journal of Electronics, vol. 99, no. 10, pp. 1333–1350, 2012. [19] D. R. Hunter and K. Lange, “Quantile regression via an MM algorithm,” J. Comput. Graph. Stat., vol. 9, no. 1, pp. 60–77, 2000. [20] D. Kitamura, H. Saruwatari, H. Kameoka, Y. Takahashi, K. Kondo, and S. Nakamura, “Multichannel signal separation combining direc- tional clustering and nonnegative matrix factorization with spectrogram restoration,” IEEE Trans. ASLP, vol. 23, no. 4, pp. 654–669, 2015. [21] D. Kitamura, “Open dataset: songkitamura,” http://d-kitamura.net/en/ dataset en.htm. Accessed 27 May 2018.