arXiv:1909.11294v3 [stat.ML] 11 Jun 2021

Hierarchical Probabilistic Model for Blind Source Separation via Legendre Transformation

Simon Luo¹,², Lamiae Azizi¹,², Mahito Sugiyama³

¹ School of Mathematics and Statistics, The University of Sydney, Sydney, Australia
² Data Analytics for Resources and Environments (DARE), Australian Research Council, Sydney, Australia
³ National Institute of Informatics, Tokyo, Japan

Accepted for the 37th Conference on Uncertainty in Artificial Intelligence (UAI 2021).

Abstract

We present a novel blind source separation (BSS) method, called information geometric blind source separation (IGBSS). Our formulation is based on the log-linear model equipped with a hierarchically structured sample space, which has theoretical guarantees to uniquely recover a set of source signals by minimizing the KL divergence from a set of mixed signals. Source signals, received signals, and mixing matrices are realized as different layers in our hierarchical sample space. Our empirical results on images and time series data demonstrate that our approach is superior to well-established techniques and is able to separate signals with complex interactions.

1 INTRODUCTION

The objective of blind source separation (BSS) is to identify a set of source signals from a set of multivariate mixed signals¹. BSS is widely used for applications which are considered to be the “cocktail party problem”. Examples include image/signal processing [Isomura and Toyoizumi, 2016], artifact removal in medical imaging [Vigário et al., 1998], and electroencephalogram (EEG) signal separation [Congedo et al., 2008]. Currently, there are a number of solutions for the BSS problem. The most widely used approaches are variations of principal component analysis (PCA) [Pearson, 1901, Murphy, 2012] and independent component analysis (ICA) [Comon, 1994, Murphy, 2012]. However, all of these approaches have limitations.

¹ “Mixed signals” and “received signals” are used interchangeably throughout this article.

PCA and its modern variations such as sparse PCA (SPCA) [Zou et al., 2006], non-linear PCA (NLPCA) [Scholz et al., 2005], and Robust PCA [Xu et al., 2010] extract a specified number of components with the largest variance under an orthogonal constraint. They are composed of a linear combination of variables, and create a set of uncorrelated orthogonal basis vectors that represent the source signal. The basis vectors with the N largest variances are called the principal components and are the output of the model. PCA has been shown to be effective for many applications such as dimensionality reduction and feature extraction. However, for BSS, PCA makes the assumption that the source signals are orthogonal, which is often not the case in practical applications.

Similarly, ICA also attempts to find the N components with the largest variance by relaxing the orthogonality constraint. Variations of ICA, such as infomax [Bell and Sejnowski, 1995], FastICA [Hyvärinen and Oja, 2000], and JADE [Cardoso, 1999], separate a multivariate signal into additive subcomponents by maximizing the statistical independence of each component. ICA assumes that each component is non-Gaussian and that the relationship between the source signal and the mixed signal is an affine transformation. In addition to these assumptions, ICA is sensitive to the initialization of the weights as the optimization is non-convex and is likely to converge to a local optimum.

Other potential methods which can perform BSS include non-negative matrix factorization (NMF) [Lee and Seung, 2001, Berne et al., 2007], dictionary learning (DL) [Olshausen and Field, 1997], and reconstruction ICA (RICA) [Le et al., 2011]. NMF, DL and RICA are degenerate approaches to recover the source signal from the mixed signal, which means that they lose information when recovering the source signal. These approaches are more typically used for feature extraction. NMF factorizes a matrix into two matrices with nonnegative elements representing weights and features. The features extracted by NMF can be used to recover the source signal. More recently, more advanced techniques use the short-time Fourier transform (STFT) to transform the signal into the frequency domain and construct a spectrogram before applying NMF [Sawada et al., 2019]. However, NMF does not maximize statistical independence, which is required to completely separate the mixed signal into the source signal, and it is also sensitive to initialization as the optimization is non-convex. Due to the non-convexity, additional constraints or heuristics for weight initialization are often applied to NMF to achieve better results [Ding et al., 2008, Boutsidis and Gallopoulos, 2008]. DL can be thought of as a variation of the ICA approaches which requires an over-complete basis vector for the mixing matrix. DL may be advantageous because additional constraints such as a positive code or a dictionary can be applied to the model. However, since it requires an over-complete basis vector, information may be lost when reconstructing the source signal. In addition, like all the other approaches, DL is also non-convex and it is sensitive to the initialization of the weights.

All previous approaches have limitations such as loss of information or non-convex optimization, and require constraints or assumptions such as orthogonality or an affine transformation which are not ideal for BSS. In the following, we introduce our approach to BSS, called IGBSS (Information Geometric BSS), using the log-linear model [Agresti, 2012], which can introduce relationships between possible states into its sample space [Sugiyama et al., 2017]. Unlike the previous approaches that we mentioned above, our approach does not have the assumptions or limitations that they require. We provide a flexible solution by introducing a hierarchical structure between signals into our model, which allows us to treat interactions between signals that are more complex than an affine transformation. Unlike other existing methods, our approach does not require the inversion of the mixing matrix and is able to recover the sign of the signal. Thanks to the well-developed information geometric analysis of the log-linear model [Amari, 2001], optimization of our method is achieved via convex optimization, hence it always arrives at the globally optimal unique solution. We theoretically show that it always minimizes the Kullback–Leibler (KL) divergence from a set of mixed signals to a set of source signals. We empirically demonstrate that our hierarchical model leads to better separation of signals, including complex interactions such as higher-order feature interactions, than existing methods.

2 FORMULATION

BSS is formulated as a function $f$ that separates a set of received signals $X$ into a set of source signals $Z$, i.e., $Z = f(X)$. For example, if one employs an ICA-based formulation, the BSS problem reduces to $X = AZ$, where the received signal $X \in \mathbb{R}^{L \times M}$ with $L$ signals and the sample size $M$ is an affine transformation of the source signal $Z \in \mathbb{R}^{N \times M}$ with $N$ signals and a mixing matrix $A \in \mathbb{R}^{L \times N}$. The objective is to estimate $Z$ by learning $A$ given $X$. Our approach is different from the classical formulation, where the inverse of the mixing matrix is learnt to recover the source signal, that is $Z = A^{-1}X = BX$.

Our strategy is to treat the three components, $X$, $Z$, and $A$, of BSS as a joint distribution and model it by the log-linear model [Agresti, 2012], which is a well-known energy-based model. We can take non-affine transformations into account and formulate BSS as a convex optimization problem.

2.1 LAYER CONFIGURATION

Let $\Omega$ be a sample space of distributions modeled by the log-linear model, which is composed of possible states of a system of interest. Our key idea is to introduce a hierarchical layered structure into $\Omega$ to achieve BSS. We call this model information geometric BSS (IGBSS) as its optimality is supported by the tight connection between the log-linear model and the information geometric properties of the space of distributions (statistical manifold), which we will show in the following subsections. We implement three layers of BSS, the mixing layer, the source layer, and the received layer, into $\Omega$ in the form of partial orders and learn the joint representation on it using the log-linear model. The log-linear model on a partially ordered set (poset), a set equipped with a partial order “$\le$” [Gierz et al., 2003], was proposed by Sugiyama et al. [2017] and includes (higher-order) Boltzmann machines as an instance [Luo and Sugiyama, 2019]. We use this model to achieve the task of BSS by introducing layered structure as partial orders. The received layer and the source layer represent the input received signal and the output source signal of BSS, respectively, and the mixing layer encodes information of how to mix the source signal. In the following, we consistently assume that $L$ is the number of received signals, $M$ is the sample size, and $N$ is the number of source signals.

[Diagram: the 2×2 product $\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} z_{11} & z_{12} \\ z_{21} & z_{22} \end{pmatrix} = \begin{pmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{pmatrix}$ drawn as three layers of nodes, the mixing layer ($a_{11}, a_{12}, a_{21}, a_{22}$), the source layer ($z_{11}, z_{12}, z_{21}, z_{22}$), and the received layer ($x_{11}, x_{12}, x_{21}, x_{22}$), together with the least element $\bot$.]

Figure 1: An example of our sample space. Dashed lines show removed partial orders to allow for learning. The nodes represent the state of each variable and the arrow shows the direction of the partial ordering.

Let us construct three layers in the sample space $\Omega$ as $\Omega = \{\bot\} \cup \mathcal{A} \cup \mathcal{Z} \cup \mathcal{X}$ and assume that these sets are given as $\mathcal{A} = \{a_{11}, \dots, a_{LN}\}$, $\mathcal{Z} = \{z_{11}, \dots, z_{NM}\}$, and $\mathcal{X} = \{x_{11}, \dots, x_{LM}\}$. The element $\bot$ denotes the least element, and it acts as a partition function of the log-linear

model [Coull and Agresti, 2003] and $F$ is called a model matrix, which represents the relationship between states. The assumption of the log-linear model is that $F$ needs to be non-singular, and Sugiyama et al.
