EE 556 Neural Networks - Course Project
Technical Report
Isaac Gerg and Tim Gilmour
December 12, 2006
Technical Report on the implementation of FastICA and Infomax Independent Component Analysis
1. Introduction
In this project we implemented the FastICA [1] and Infomax [2] algorithms for Independent Component Analysis (ICA). We developed Matlab code based on the equations in the papers and tested the algorithms on a cocktail-party audio simulation. Our original plan was to compare the FastICA algorithm with a specialized "Two-Source ICA" algorithm presented in [3], but after extensive work trying to reproduce the results in [3], we concluded that the algorithm was not robust enough to warrant further analysis and used the better-known Infomax algorithm [2] as the comparison instead. This report contains a brief overview of our implementation of the FastICA and Infomax algorithms, followed by our experimental results and overall conclusions about the comparison between the two algorithms. Our analysis of the two primary papers [1] and [2] is contained in a separate technical summary document.
2. Implementation
We implemented each algorithm in Matlab. We implemented the FastICA algorithm based primarily on the weight update equations in [1] (Eq. (20), p. 7), and the Infomax algorithm based primarily on the weight update equations in [2] (Eqs. (14) and (15), p. 7). To test each algorithm’s ability to correctly separate the sources, we constructed the following test:
1. Simulate two independent sources by reading in two distinct audio files.
2. Simulate the mixing of the two audio sources by creating a 2x2 mixing matrix and using it to mix the two sources. Calculate the signal-to-interference ratio (SIR) of the resulting input mixes to measure the maliciousness of the mixing matrix; a "hard" mix is one in which the SIRs of the two mixes differ substantially.
3. Run the respective ICA algorithm on the input mixes and compute the unmixed sources.
4. Scale and match the estimated components to their associated source components.
5. Measure the signal-to-noise ratio (SNR) of the recovered sources.
We used two public-domain ICA test sounds (source2.wav and source3.wav) from the Helsinki University of Technology demo at http://www.cis.hut.fi/projects/ica/cocktail/cocktail_en.cgi to test both algorithms. The two sources were approximately ten seconds long and consisted of speech from two different subjects, one talking in English and the other in Spanish. We chose speech samples to give the ICA algorithms a rigorous test, because speech (although slightly supergaussian) is harder to separate than sources with highly nongaussian pdfs.
One hundred trials were conducted. The mixing matrix was randomly generated for each trial, with entries uniformly distributed between 0 and 1. For a fair comparison of convergence speed, we manually optimized the learning rates and adjusted the convergence criteria so that convergence was declared (using the mean of all weight matrix values) at approximately 99% of the final optimum weight values. For FastICA, the learning rate was fixed at 0.1 and the minimum delta at 0.00001. For Infomax, we smoothed the weight changes with a simple one-tap IIR filter with parameter 0.9999; the learning rate was fixed at 0.001 and the smoothed minimum delta at 0.46. The audio data were presented repeatedly until convergence was reached.
We chose the hyperbolic tangent as the nonlinearity for FastICA, since the authors of [1] recommend it as a good general-purpose nonlinearity for most input signals. For Infomax we chose the logistic function as the neural network activation function: it is a simple differentiable nonlinearity and produces an anti-Hebbian (anti-saturation) term containing the logistic nonlinearity itself, thus taking advantage of higher than second-order statistics, as its Taylor expansion shows.
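To make the update equations concrete, the fragment below sketches a single iteration of each rule in Matlab. It uses our own variable names and initialization rather than an excerpt of our code, and it assumes the 2 x N data matrix X of mixed signals has already been centered (and, for the FastICA step, whitened).

    % Minimal sketch of one iteration of each weight update (illustrative only).
    N   = size(X, 2);
    eta = 0.001;                            % Infomax learning rate (see above)

    % FastICA fixed-point update for one unit, tanh nonlinearity (cf. [1], Eq. (20))
    w    = randn(2, 1);  w = w / norm(w);   % initial unit-norm weight vector
    wx   = w' * X;                          % current projections, 1 x N
    g    = tanh(wx);                        % nonlinearity
    gp   = 1 - g.^2;                        % derivative of tanh
    wNew = (X * g') / N - mean(gp) * w;     % fixed-point step
    wNew = wNew / norm(wNew);               % renormalize; iterate until |w'*wNew| is near 1

    % Infomax stochastic-gradient update, logistic nonlinearity (cf. [2], Eqs. (14)-(15))
    W  = eye(2);  w0 = zeros(2, 1);         % initial unmixing matrix and bias
    x  = X(:, 1);                           % one sample (in practice, loop over the data)
    u  = W * x + w0;
    y  = 1 ./ (1 + exp(-u));                % logistic activation
    W  = W  + eta * (inv(W') + (1 - 2*y) * x');
    w0 = w0 + eta * (1 - 2*y);

For the full two-source problem, FastICA also requires a decorrelation step between the two weight vectors after each iteration so that both units do not converge to the same component.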
3. Experimental Results
For the two algorithms we plotted the output SNR over the multiple trials, providing a measure of how well the mixed input signals were unmixed. The SNR was computed by dividing the power of the original source signal by the power of the difference between the estimated unmixed output signal and the original source signal (i.e., the "noise"). ICA can recover sources only up to an arbitrary scaling constant and a permutation of source order, so for a proper calculation of the "noise", each unmixed output signal had to be matched and scaled to the corresponding original input source signal (before mixing). The matching was performed by picking the output signal with maximum correlation to the specified input signal; the scaling used the mean of the ratios between the output and input signals at all time instants. We also computed the SIR of each mixing matrix, giving a measure of how difficult the different mixes were to separate. Finally, we measured the time taken to converge in each trial for both algorithms. Figures 1 through 4 below show typical data time plots and histograms. Figures 5 through 7 show statistics for 100 trials of both FastICA and Infomax.
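Before turning to the figures, the following Matlab fragment sketches the matching, scaling, and SNR computation just described; the variable names are ours and the fragment is illustrative rather than our exact code.

    % s is one original source (1 x N); Y is the 2 x N matrix of ICA outputs.
    r1 = corrcoef(Y(1, :), s);              % correlation with the first output
    r2 = corrcoef(Y(2, :), s);              % correlation with the second output
    if abs(r1(1, 2)) >= abs(r2(1, 2))       % pick the output best matching s
        yHat = Y(1, :);
    else
        yHat = Y(2, :);
    end
    alpha  = mean(yHat ./ s);               % mean sample-wise ratio of output to source
    yHat   = yHat / alpha;                  % rescale the output to the source amplitude
    snr_dB = 10 * log10(sum(s.^2) / sum((s - yHat).^2));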
[Figure omitted: time plots of Source 1, Source 2, Observation 1, Observation 2, Estimate of Source 1 (SNR = 41.174 dB), and Estimate of Source 2 (SNR = 31.443 dB).]
Figure 1. Typical audio signal plots: original sources (top two), mixed sources (middle two), unmixed sources (bottom two). The similarity between the original and unmixed signals is evident, and the listed SNRs give the ratio of the original signal to the difference of the original and unmixed signals.
[Figure omitted: histograms of the two sources, the two observations, and the two source estimates.]
Figure 2. Typical signal histograms: original sources (top two), mixed sources (middle two), unmixed sources (bottom two). Notice the supergaussian shape of each histogram, and also that the linearly mixed source histograms are more gaussian than the original or unmixed sources, as expected.
[Figure omitted: weight matrix values versus iteration for FastICA and Infomax.]
Figure 3. Typical FastICA (left) and Infomax (right) weight matrix convergence over time. Note that FastICA converges in fewer iterations, and also that each component in FastICA converges separately.
Figure 4. Scatterplots of the original (top left), mixed (top right), sphered (bottom left) and unmixed (bottom right) data, showing the ICA transformation toward maximally independent directions.
[Figure omitted: unmixing SNR in dB for Source 1 and Source 2 over 100 trials.]
Figure 5. SNR of the original signals to the output noise over all 100 trials, for FastICA (left) and Infomax (right). Means for FastICA: Source 1 = 40.197 dB, Source 2 = 33.646 dB. Means for Infomax: Source 1 = 35.6900 dB, Source 2 = 30.2497 dB.
[Figure omitted: mixing SIR in dB (Mix 1 and Mix 2) with respect to each source over 100 trials.]
Figure 6. SIR of each of the 100 mixtures, taking one source as the signal and the other as the interferer, for FastICA (left) and Infomax (right). Means for FastICA: Source 1 = 7.9332 dB, Source 2 = 7.4199 dB. Means for Infomax: Source 1 = 6.2372 dB, Source 2 = 6.7274 dB.
[Figure omitted: CPU time per trial for each algorithm.]
Figure 7. CPU time of each trial for FastICA (left) and Infomax (right). Mean time for FastICA was 4.6395 seconds, and mean time for Infomax was 3.8991 seconds.
4. Discussion and Conclusions
Both algorithms performed well at separating the two sources (Figure 5), despite the intentionally wide variation in SIR of the two mixed sources (Figure 6). The algorithms are thus robust to a wide range of mixing situations. Mean output SNRs were greater than 30 dB, with FastICA generally providing higher SNR than Infomax. Figure 1, Figure 2, and Figure 4 clearly show the unmixing reconstruction of the original signals, with the time plots, histograms, and scatterplots being almost completely restored. We also listened to the unmixed sounds, and they were nearly indistinguishable from the original sources. The FastICA algorithm generally converged in far fewer iterations than the Infomax algorithm (roughly 100 iterations for FastICA versus roughly 30000 for Infomax), although the CPU time for FastICA was slightly higher. The CPU time could probably be reduced significantly by further code optimizations. Further enhancements to our implementation could include a scheme to vary the learning rate with time instead of using a fixed value as in this experiment; this would reduce trial variability and improve the final solution stability. Additionally, the "natural gradient" method [7] could be employed to eliminate taking the inverse of the weight matrix in the Infomax code, speeding up the algorithm. In conclusion, although both algorithms work well, the FastICA method works slightly better overall than the Infomax method in terms of output SNR and iterations to convergence.
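For reference, one common form of that natural-gradient variant is sketched below; this is a hedged illustration in our own notation rather than part of our implementation, using the same W, w0, x, and eta as the earlier Infomax fragment.

    % Natural-gradient form of the Infomax update (cf. [7]): multiplying the
    % ordinary gradient on the right by W'*W removes the inv(W') term.
    u  = W * x;                                     % linear outputs for one sample x
    y  = 1 ./ (1 + exp(-(u + w0)));                 % logistic activation with bias
    W  = W + eta * (eye(2) + (1 - 2*y) * u') * W;   % no matrix inverse needed
    w0 = w0 + eta * (1 - 2*y);                      % bias update unchanged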
5. References
[1] Hyvärinen, A. "Fast and Robust Fixed-Point Algorithms for Independent Component Analysis." IEEE Transactions on Neural Networks, vol. 10, no. 3, pp. 626-634, 1999.
[2] Bell, A. J. and Sejnowski, T. J. "An Information-Maximization Approach to Blind Separation and Blind Deconvolution." Neural Computation, vol. 7, no. 6, pp. 1004-1034, 1995.
[3] Yang, Zhang, and Wang. "A Novel ICA Algorithm for Two Sources." ICSP Proceedings (IEEE), pp. 97-100, 2004.
[4] Haykin, S. Neural Networks: A Comprehensive Foundation, 2nd ed. Prentice-Hall: Upper Saddle River, NJ, 1999.
[5] Turner, P. Guide to Scientific Computing, 2nd ed. CRC Press: Boca Raton, FL, 2001, p. 45.
[6] Luenberger, D. Optimization by Vector Space Methods. Wiley: New York, 1969.
[7] Amari, S., Cichocki, A., and Yang, H. H. "A New Learning Algorithm for Blind Signal Separation," in Advances in Neural Information Processing Systems, D. Touretzky, M. Mozer, and M. Hasselmo, Eds. MIT Press: Cambridge, MA, 1996, pp. 757-763.