Independent Component Analysis: a Tutorial
Aapo Hyvärinen and Erkki Oja
Helsinki University of Technology
Laboratory of Computer and Information Science
P.O. Box 5400, FIN-02015 HUT, Espoo, Finland
aapo.hyvarinen@hut.fi, erkki.oja@hut.fi
http://www.cis.hut.fi/projects/ica/

A version of this paper will appear in Neural Networks with the title "Independent Component Analysis: Algorithms and Applications". April 1999

1 Motivation

Imagine that you are in a room where two people are speaking simultaneously. You have two microphones, which you hold in different locations. The microphones give you two recorded time signals, which we could denote by x1(t) and x2(t), with x1 and x2 the amplitudes, and t the time index. Each of these recorded signals is a weighted sum of the speech signals emitted by the two speakers, which we denote by s1(t) and s2(t). We could express this as a linear equation:

x1(t) = a11 s1 + a12 s2    (1)
x2(t) = a21 s1 + a22 s2    (2)

where a11, a12, a21, and a22 are some parameters that depend on the distances of the microphones from the speakers. It would be very useful if you could now estimate the two original speech signals s1(t) and s2(t), using only the recorded signals x1(t) and x2(t). This is called the cocktail-party problem. For the time being, we omit any time delays or other extra factors from our simplified mixing model.

As an illustration, consider the waveforms in Fig. 1 and Fig. 2. These are, of course, not realistic speech signals, but suffice for this illustration. The original speech signals could look something like those in Fig. 1, and the mixed signals could look like those in Fig. 2. The problem is to recover the data in Fig. 1 using only the data in Fig. 2.

Actually, if we knew the parameters a_ij, we could solve the linear equation in (1) by classical methods. The point is, however, that if you don't know the a_ij, the problem is considerably more difficult.

One approach to solving this problem would be to use some information on the statistical properties of the signals s_i(t) to estimate the a_ij. Actually, and perhaps surprisingly, it turns out that it is enough to assume that s1(t) and s2(t), at each time instant t, are statistically independent.
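To make the mixing model concrete, here is a small numerical sketch. The two signals and the mixing parameters a_ij below are invented for illustration (they are not from the paper); the point is that when A is known, recovering s1 and s2 from x1 and x2 is just solving a linear system, and the difficulty of ICA is precisely that A is unknown.

```python
import numpy as np

# Hypothetical mixing parameters a11, a12, a21, a22 (chosen for illustration).
A = np.array([[0.8, 0.3],
              [0.4, 0.9]])

t = np.linspace(0, 1, 200)
s1 = np.sin(2 * np.pi * 5 * t)            # stand-in for the first speech signal
s2 = np.sign(np.sin(2 * np.pi * 3 * t))   # stand-in for the second speech signal
S = np.vstack([s1, s2])                   # sources, shape (2, 200)

X = A @ S                                 # the two microphone recordings x1(t), x2(t)

# With the a_ij known, classical methods (here: solving the linear system)
# recover the sources exactly.
S_hat = np.linalg.solve(A, X)
print(np.allclose(S_hat, S))              # True
```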
This is not an unrealistic assumption in many cases, and it need not be exactly true in practice. The recently developed technique of Independent Component Analysis, or ICA, can be used to estimate the a_ij based on the information of their independence, which allows us to separate the two original source signals s1(t) and s2(t) from their mixtures x1(t) and x2(t). Fig. 3 gives the two signals estimated by the ICA method. As can be seen, these are very close to the original source signals (their signs are reversed, but this has no significance).

Independent component analysis was originally developed to deal with problems that are closely related to the cocktail-party problem. Since the recent increase of interest in ICA, it has become clear that this principle has a lot of other interesting applications as well.

Figure 1: The original signals.

Figure 2: The observed mixtures of the source signals in Fig. 1.

Figure 3: The estimates of the original source signals, estimated using only the observed signals in Fig. 2. The original signals were very accurately estimated, up to multiplicative signs.

Consider, for example, electrical recordings of brain activity as given by an electroencephalogram (EEG). The EEG data consists of recordings of electrical potentials in many different locations on the scalp. These potentials are presumably generated by mixing some underlying components of brain activity. This situation is quite similar to the cocktail-party problem: we would like to find the original components of brain activity, but we can only observe mixtures of the components. ICA can reveal interesting information on brain activity by giving access to its independent components.

Another, very different application of ICA is feature extraction.
A fundamental problem in digital signal processing is to find suitable representations for image, audio or other kinds of data, for tasks like compression and denoising. Data representations are often based on (discrete) linear transformations. Standard linear transformations widely used in image processing are the Fourier, Haar, and cosine transforms. Each of them has its own favorable properties. It would be most useful to estimate the linear transformation from the data itself, in which case the transform could be ideally adapted to the kind of data that is being processed. Figure 4 shows the basis functions obtained by ICA from patches of natural images. Each image window in the set of training images would be a superposition of these windows, so that the coefficients in the superposition are independent. Feature extraction by ICA will be explained in more detail later on.

All of the applications described above can actually be formulated in a unified mathematical framework, that of ICA. This is a very general-purpose method of signal processing and data analysis. In this review, we cover the definition and underlying principles of ICA in Sections 2 and 3. Then, starting from Section 4, the ICA problem is solved on the basis of minimizing or maximizing certain contrast functions; this transforms the ICA problem to a numerical optimization problem. Many contrast functions are given, and the relations between them are clarified. Section 5 covers a useful preprocessing that greatly helps solving the ICA problem, and Section 6 reviews one of the most efficient practical learning rules for solving the problem, the FastICA algorithm. Then, in Section 7, typical applications of ICA are covered: removing artefacts from brain signal recordings, finding hidden factors in financial time series, and reducing noise in natural images. Section 8 concludes the text.

Figure 4: Basis functions in ICA of natural images. The input window size was 16 × 16 pixels. These basis functions can be considered as the independent features of images.
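As a numerical preview of the machinery outlined above, here is a minimal, illustrative sketch of preprocessing (centering and whitening) followed by a one-unit fixed-point iteration of the FastICA type with a tanh nonlinearity and deflation. This is not the paper's full derivation, and the uniform sources and mixing matrix are invented for the demonstration:

```python
import numpy as np

rng = np.random.default_rng(42)

# Two independent, nongaussian (uniform) unit-variance sources and a
# hypothetical mixing matrix.
S = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 5000))
A = np.array([[1.0, 0.6],
              [0.5, 1.0]])
X = A @ S                                    # observed mixtures

# Preprocessing: center, then whiten so the covariance becomes the identity.
X = X - X.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(X))
Z = E @ np.diag(d ** -0.5) @ E.T @ X

# One-unit fixed-point iteration (tanh nonlinearity), with deflation to make
# the second component orthogonal to the first in the whitened space.
W = np.zeros((2, 2))
for i in range(2):
    w = rng.normal(size=2)
    w /= np.linalg.norm(w)
    for _ in range(200):
        wz = w @ Z
        w_new = (Z * np.tanh(wz)).mean(axis=1) - (1 - np.tanh(wz) ** 2).mean() * w
        w_new -= W[:i].T @ (W[:i] @ w_new)   # deflation: remove found directions
        w_new /= np.linalg.norm(w_new)
        converged = abs(abs(w_new @ w) - 1) < 1e-10
        w = w_new
        if converged:
            break
    W[i] = w

S_hat = W @ Z
# Each estimated component should match one source up to sign and permutation.
corr = np.corrcoef(np.vstack([S, S_hat]))[:2, 2:]
print(np.max(np.abs(corr), axis=1))          # both entries close to 1
```

The whitening step is what lets the algorithm search only over orthogonal directions; this is discussed as preprocessing later in the paper.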
2 Independent Component Analysis

2.1 Definition of ICA

To rigorously define ICA, we can use a statistical "latent variables" model. Assume that we observe n linear mixtures x1, ..., xn of n independent components:

x_j = a_j1 s_1 + a_j2 s_2 + ... + a_jn s_n, for all j.    (3)

We have now dropped the time index t: in the ICA model, we assume that each mixture x_j as well as each independent component s_k is a random variable, instead of a proper time signal. The observed values x_j(t), e.g., the microphone signals in the cocktail-party problem, are then a sample of this random variable. Without loss of generality, we can assume that both the mixture variables and the independent components have zero mean: if this is not true, then the observable variables x_i can always be centered by subtracting the sample mean, which makes the model zero-mean.

It is convenient to use vector-matrix notation instead of the sums like in the previous equation. Let us denote by x the random vector whose elements are the mixtures x1, ..., xn, and likewise by s the random vector with elements s1, ..., sn. Let us denote by A the matrix with elements a_ij. Generally, bold lower-case letters indicate vectors and bold upper-case letters denote matrices. All vectors are understood as column vectors; thus x^T, the transpose of x, is a row vector. Using this vector-matrix notation, the above mixing model is written as

x = As.    (4)

Sometimes we need the columns of the matrix A; denoting them by a_j, the model can also be written as

x = Σ_{i=1}^{n} a_i s_i.    (5)

The statistical model in Eq. (4) is called the independent component analysis, or ICA, model. The ICA model is a generative model, which means that it describes how the observed data are generated by a process of mixing the components s_i. The independent components are latent variables, meaning that they cannot be directly observed. Also the mixing matrix is assumed to be unknown. All we observe is the random vector x, and we must estimate both A and s using it. This must be done under as general assumptions as possible.

The starting point for ICA is the very simple assumption that the components s_i are statistically independent.
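A quick numerical check of the notation above (the mixing matrix and component values here are arbitrary, chosen only to exercise the identities): the matrix form x = As and the column-sum form of Eq. (5) produce the same vector, and centering by the sample mean makes the observed mixtures zero-mean as claimed.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
A = rng.normal(size=(n, n))          # hypothetical mixing matrix
s = rng.uniform(-1, 1, size=n)       # one realization of the components

# The matrix form x = As and the column form x = sum_i a_i * s_i agree:
x = A @ s
x_cols = sum(A[:, i] * s[i] for i in range(n))
print(np.allclose(x, x_cols))        # True

# Centering: subtracting the sample mean makes the observed mixtures zero-mean.
X = A @ rng.uniform(-1, 1, size=(n, 1000))
Xc = X - X.mean(axis=1, keepdims=True)
print(np.allclose(Xc.mean(axis=1), 0.0))   # True
```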
Statistical independence will be rigorously defined in Section 3. It will be seen below that we must also assume that the independent components have nongaussian distributions. However, in the basic model we do not assume these distributions known (if they are known, the problem is considerably simplified). For simplicity, we also assume that the unknown mixing matrix is square, but this assumption can sometimes be relaxed, as explained later. Then, after estimating the matrix A, we can compute its inverse, say W, and obtain the independent components simply by

s = Wx.    (6)

ICA is very closely related to the method called blind source separation (BSS) or blind signal separation. A "source" means here an original signal, i.e., an independent component, like a speaker in the cocktail-party problem. "Blind" means that we know very little, if anything, about the mixing matrix, and make few assumptions on the source signals. ICA is one method, perhaps the most widely used, for performing blind source separation.

In many applications, it would be more realistic to assume that there is some noise in the measurements, which would mean adding a noise term to the model. For simplicity, we omit any noise terms, since the estimation of the noise-free model is difficult enough in itself, and seems to be sufficient for many applications.

2.2 Ambiguities of ICA

In the ICA model in Eq. (4), it is easy to see that the following ambiguities will hold:

1. We cannot determine the variances (energies) of the independent components.
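This first ambiguity can be checked directly: multiplying a component s_i by a scalar α while dividing the corresponding column a_i of A by the same α leaves the observed x unchanged, so the data alone cannot fix the components' variances. A small sketch, with the matrix, components, and scale factor invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(2, 2))              # hypothetical mixing matrix
S = rng.uniform(-1, 1, size=(2, 500))    # hypothetical components
X = A @ S

alpha = 3.7
A2 = A.copy()
S2 = S.copy()
A2[:, 0] /= alpha       # shrink the first mixing column ...
S2[0, :] *= alpha       # ... while amplifying the first component
print(np.allclose(A2 @ S2, X))   # True: the observed mixtures are identical
```

Because of this, ICA conventionally fixes the scale by assuming each component has unit variance, which still leaves the sign ambiguous (as seen in the reversed signs of the estimated signals in Fig. 3).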