Audio Source Separation for Music in Low-Latency and High-Latency Scenarios


Audio Source Separation for Music in Low-latency and High-latency Scenarios

Ricard Marxer Piñón

TESI DOCTORAL UPF / 2013

Thesis supervisors: Dr. Xavier Serra and Dr. Jordi Janer
Department of Information and Communication Technologies

This dissertation is dedicated to my family and loved ones.

Acknowledgments

I wish to thank the Music Technology Group (MTG) at the Universitat Pompeu Fabra (UPF) for creating such a great environment in which to work on this research. I especially want to thank my supervisors Xavier Serra and Jordi Janer for giving me this opportunity and their support during the whole process. I also want to express my deepest gratitude to the Yamaha Corporation and their Monet research team, formed by Keita Arimoto, Sean Hashimoto, Kazunobu Kondo, Yu Takahashi and Yasuyuki Umeyama, without whom this work would not have been possible. I also want to thank MTG's signal processing team, composed of Merlijn Blaauw, Jordi Bonada, Graham Coleman, Saso Musevic and Marti Umbert, with whom I had many fruitful discussions that became the seeds of the research conducted here. Another special thanks goes to the Music Cognition Group at the ILLC / University of Amsterdam, with Henkjan Honing, Leigh Smith and Olivia Ladinig, who invited me to stay and do research with them for a while. There I learned a lot about how we humans perceive music and how this can be taken into account when processing these types of signals. Many thanks to all the researchers with whom I have collaborated, discussed and shared great moments over these years.
Some of these are Eduard Aylon, Andreas Beisler, Dmitry Bogdanov, Òscar Celma, Maarten de Boer, Juan José Bosch, Ferdinand Fuhrmann, Jordi Funollet, Cristina Garrido, Emilia Gómez, Perfecto Herrera, Piotr Holonowicz, Enric Guaus, Martín Haro, Cyril Laurier, Oscar Mayor, Owen Meyers, Waldo Nogueira, Hendrik Purwins, Gerard Roma, Justin Salamon, Joan Serrà, Mohamed Sordo, Roberto Toscano, Nicolas Wack and José R. Zapata. I'm sorry if I have forgotten any of you who have aided me so much. Many thanks to my father, Jack Marxer, for his valuable feedback, proofreading and corrections of this document. Finally I would like to thank my family and Mathilde for being there when I most needed them.

Abstract

Source separation in digital signal processing consists in recovering the original signals that were mixed together into a set of observed signals. Solutions to this problem have been studied extensively for musical signals; however, their application to real-world practical situations remains infrequent. Depending on the scenario, there are two main obstacles to their widespread adoption. In some cases the main limitation is high latency and computational cost; in other cases the quality of the results is insufficient. Much work has gone toward improving the quality of music separation under general conditions, but few studies have been devoted to the low-latency, low-computational-cost separation of monaural signals, or to the separation quality of specific instruments. We propose specific methods to address these issues in each of these scenarios independently. First, we focus on methods with low computational cost and low latency. We propose the use of Tikhonov regularization as a method for spectrum decomposition in the low-latency context. We compare it to existing techniques in pitch estimation and tracking tasks, crucial steps in many separation methods.
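To illustrate why Tikhonov regularization is attractive in a low-latency setting, the following toy sketch (not the exact formulation developed in this thesis; the harmonic-template dictionary, the regularization weight and all function names are invented for the example) decomposes a magnitude-spectrum frame x onto a dictionary B of pitch templates via the closed-form ridge solution g = (BᵀB + λI)⁻¹Bᵀx:

```python
import numpy as np

# Toy sketch of Tikhonov (ridge) spectrum decomposition. We approximate a
# magnitude-spectrum frame x as B @ g, where each column of B is a Gaussian
# harmonic-comb template for one pitch candidate, and solve
#   g* = argmin_g ||x - B g||^2 + lam ||g||^2 = (B^T B + lam I)^-1 B^T x.
# Unlike iterative decompositions such as NMF, the solution is closed-form.

def harmonic_templates(f0s, freqs, n_harmonics=10, std_hz=10.0):
    """Dictionary (n_bins x n_pitches) of unit-norm harmonic combs."""
    B = np.zeros((len(freqs), len(f0s)))
    for j, f0 in enumerate(f0s):
        for h in range(1, n_harmonics + 1):
            B[:, j] += np.exp(-0.5 * ((freqs - h * f0) / std_hz) ** 2)
        B[:, j] /= np.linalg.norm(B[:, j])
    return B

def make_decomposer(B, lam=0.1):
    """Precompute the regularized pseudoinverse once; each incoming frame
    then costs a single matrix-vector product -- the low-latency appeal."""
    P = np.linalg.solve(B.T @ B + lam * np.eye(B.shape[1]), B.T)
    return lambda x: P @ x

freqs = np.linspace(0.0, 22050.0, 1025)              # STFT bin frequencies (Hz)
f0s = np.array([110.0, 146.0, 196.0, 262.0, 349.0])  # pitch candidates (Hz)
B = harmonic_templates(f0s, freqs)
decompose = make_decomposer(B)

# A synthetic frame containing pitches f0s[1] and f0s[3], plus a little noise:
rng = np.random.default_rng(0)
x = B[:, 1] + 0.5 * B[:, 3] + 0.01 * rng.standard_normal(len(freqs))
g = decompose(x)
print(np.argsort(g)[-2:])  # indices of the two most active pitch templates
```

The design point the sketch makes is that all the work depending only on the dictionary is done once, up front, so the per-frame cost is independent of how the decomposition was regularized.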
We then use the proposed spectrum decomposition method in low-latency separation tasks targeting singing voice, bass and drums. Second, we propose several high-latency methods that improve the separation of singing voice by modeling components that are often not accounted for, such as breathiness and consonants. Finally, we explore using temporal correlations and human annotations to enhance the separation of drums and complex polyphonic music signals.

Resum

En el camp del tractament digital de senyal, la separació de fonts consisteix en l'obtenció dels senyals originals que trobem barrejats en un conjunt de senyals observats. Les solucions a aquest problema s'han estudiat àmpliament per a senyals musicals. Hi ha, però, dues limitacions principals per a la seva aplicació generalitzada. En alguns casos, l'alta latència i el cost computacional del mètode és l'obstacle principal. En un segon escenari, la qualitat dels resultats és insuficient. Gran part de la recerca s'ha enfocat a la millora de la qualitat de separació de la música en condicions generals, però pocs estudis s'han centrat en el desenvolupament de tècniques de baixa latència i baix cost computacional per a mescles monoaurals, així com en la qualitat de separació d'instruments específics.

Aquesta tesi proposa mètodes per tractar aquests temes en cadascun dels dos casos de forma independent. En primer lloc, ens centrem en els mètodes amb un baix cost computacional i baixa latència. Proposem l'ús de la regularització de Tikhonov com a mètode de descomposició de l'espectre en el context de baixa latència. El comparem amb les tècniques existents en tasques d'estimació i seguiment dels tons, que són passos crucials en molts mètodes de separació. A continuació utilitzem i avaluem el mètode de descomposició de l'espectre en tasques de separació de veu cantada, baix i percussió.
En segon lloc, proposem diversos mètodes d'alta latència que milloren la separació de la veu cantada, gràcies al modelatge de components específics, com la respiració i les consonants. Finalment, explorem l'ús de correlacions temporals i anotacions manuals per millorar la separació dels instruments de percussió i dels senyals musicals polifònics complexos.

Resumen

En el campo del tratamiento digital de la señal, la separación de fuentes consiste en la obtención de las señales originales que han sido mezcladas en un conjunto de señales observadas. Las soluciones a este problema se han estudiado ampliamente para señales musicales. Hay dos limitaciones principales para su adopción generalizada. En algunos casos, la alta latencia y el coste computacional son el mayor obstáculo. En un segundo escenario, la calidad de los resultados es insuficiente. Gran parte de la investigación se ha enfocado en la mejora de la calidad de separación de la música en condiciones generales, pero pocos estudios se han centrado en el desarrollo de técnicas de baja latencia y bajo coste computacional para mezclas monoaurales, así como en la calidad de separación de instrumentos específicos.

Esta tesis propone métodos para tratar estos temas en cada uno de los casos de forma independiente. En primer lugar, nos centramos en los métodos con un bajo coste computacional y baja latencia. Proponemos el uso de la regularización de Tikhonov como método de descomposición del espectro en el contexto de baja latencia.
Lo comparamos con las técnicas existentes en tareas de estimación y seguimiento de los tonos, que son pasos cruciales en muchos métodos de separación. A continuación utilizamos y evaluamos el método de descomposición del espectro en tareas de separación de voz cantada, bajo y percusión. En segundo lugar, proponemos varios métodos de alta latencia que mejoran la separación de la voz cantada, gracias al modelado de componentes que a menudo no se toman en cuenta, como la respiración y las consonantes. Finalmente, exploramos el uso de correlaciones temporales y anotaciones manuales para mejorar la separación de los instrumentos de percusión y de las señales musicales polifónicas complejas.

Résumé

Dans le domaine du traitement du signal, la séparation de sources consiste à obtenir les signaux originaux qui ont été mélangés dans un ensemble de signaux observés. Les solutions à ce problème ont largement été étudiées pour l'application à la musique. Il existe deux principales limitations. La première est une latence et un coût de calcul élevés. La seconde est une qualité insuffisante des résultats. Une grande partie de la recherche actuelle s'est concentrée sur l'amélioration, dans le cas général, de la qualité de séparation de la musique. Peu d'études ont été consacrées à la minimisation de la latence et du coût de calcul des techniques, ainsi qu'à la qualité de séparation d'instruments spécifiques.

Cette thèse propose des méthodes pour aborder ces deux questions indépendamment. Dans un premier temps, nous nous concentrons sur les méthodes à faible coût de calcul et à faible latence. Pour cela, nous proposons d'utiliser la régularisation de Tikhonov en tant que méthode de décomposition spectrale.
Nous la comparons à des techniques existantes dans le cadre de l'estimation et du suivi des tons, qui sont des étapes cruciales pour de nombreux procédés de séparation. Nous avons aussi utilisé et évalué la méthode de décomposition spectrale pour la séparation de voix chantée, de basse et de percussion. Dans un second temps, nous proposons des méthodes à latence élevée. La modélisation des composants qui ne sont souvent pas pris en compte, comme la respiration et les consonnes, nous permet d'améliorer la séparation de voix chantée. Nous explorons l'utilisation de corrélations temporelles et d'annotations manuelles pour améliorer la séparation des instruments de percussion et des signaux musicaux polyphoniques complexes.

Contents

Acknowledgments
Abstract
Resum
Resumen
Résumé
Contents
List of Figures
List of Tables
List of Abbreviations and Symbols
1 Introduction
1.1 Applications
1.2 Motivation and Objectives
1.3 Context of the Study
1.4 Presentation of Contributions
1.5 Organization
2 Audio Source Separation Basics
2.1 Problem Definition and Classification