Introduction Non-negative matrix factorization Information geometry Conclusion
RIKEN BSI Seminar
Some applications of non-negative matrix factorization and of information geometry in audio signal processing
Arnaud Dessein (Arshia Cont, Gérard Assayag, Guillaume Lemaitre)
Institute for Research and Coordination of Acoustics and Music, Paris, France
Japanese-French Laboratory for Informatics, Tokyo, Japan
October 22nd 2010
[email protected] October 22nd 2010 RIKEN BSI Seminar
Outline
1 Introduction
2 Non-negative matrix factorization
3 Information geometry
4 Conclusion
Outline
1 Introduction Presentation of the IRCAM Research at the IRCAM Motivations towards NMF and IG
2 Non-negative matrix factorization
3 Information geometry
4 Conclusion
What is the IRCAM?
Status: Institut de Recherche et Coordination Acoustique/Musique. Associated with the Centre Georges Pompidou in Paris. General head: Frank Madlener. Scientific director: Hugues Vinet.
History: 1970: President Georges Pompidou asked Pierre Boulez to found an institution for musical research. 1973: the part underneath Place Igor Stravinsky was finished. 1977: the center opened.
Figures: 150 people: artists, scientists, technicians, administrative staff. Annual budget of 11,000,000 euros. 8 research teams.
Figure: Centre Georges Pompidou (Renzo Piano and Richard Rogers).
Figure: Frank Madlener and Hugues Vinet.
Figure: Georges Pompidou and Pierre Boulez.
Figure: Stravinsky fountain (Tinguely and Niki de Saint Phalle) and scale model of the IRCAM.
Figure: IRCAM (Renzo Piano and Richard Rogers).
What do we do?
Research teams: Instrumental Acoustics. Acoustic and Cognitive Spaces. Perception and Sound Design. Sound Analysis-Synthesis. Musical Representations. Analysis of Musical Practices. Real-Time Musical Interactions. Online Services.
“Working transversally for music research.” Researchers and musicians working together on multidisciplinary projects centered around music and the exploration of sounds.
Research themes: Sound synthesis and processing. Live interaction. Computer-aided composition. Sound spatialization.
Sound synthesis and processing: “Creating new sounds as an extension of acoustic instruments.” Writing of sound, digital signal processing for sound transformation and synthesis, physical modeling, virtual instrument design.
Live interaction: “Interacting with the computer during performances.” Writing of time, performance capture/analysis (audio, gesture), synchronization (event detection, score following, shape recognition), multimedia/multimodality (dance, image, text).
Computer-aided composition: “Using the computer as a reflexive support of composition.” Writing of music, formalizing/producing/managing complex musical structures, assisting composition/orchestration.
Sound spatialization: “Composing space as a dimension of musical expression.” Writing of space, simulation of static/mobile sources and of acoustic spaces, perception and cognition of space.
What do we need?
Figure: Levels of representation of audio, waveform and spectrogram representations.
Fill the gap between signal and symbolic representations.
Devise computational tools for complex real-time settings.
Two approaches:
NMF (non-negative matrix factorization): current trend, structural a priori, reductionist.
IG (information geometry): new trend, no structural a priori, holistic.
Outline
1 Introduction
2 Non-negative matrix factorization Background Proposed system for real-time recognition of multiple sources Sparsity and non-negative decomposition Beta-divergence and non-negative decomposition Results Discussion
3 Information geometry
4 Conclusion
What is NMF?
Standard NMF model [Lee & Seung, 1999]. Let $\mathbf{V} \in \mathbb{R}_+^{n \times m}$ and $r < \min(n, m)$; find $\mathbf{W} \in \mathbb{R}_+^{n \times r}$ and $\mathbf{H} \in \mathbb{R}_+^{r \times m}$ such that:
$$\mathbf{V} \approx \mathbf{W}\mathbf{H}$$
Interpretation:
$$\mathbf{v}_j \approx \mathbf{W}\mathbf{h}_j = \sum_i h_{ij} \mathbf{w}_i$$
$\mathbf{w}_i$: basis vectors. $h_{ij}$: decomposition coefficients.
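The interpretation above can be checked numerically. The following is a minimal sketch with NumPy; the sizes, the random matrices and the variable names are assumptions chosen for illustration, not material from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, r = 6, 5, 2  # hypothetical sizes, with r < min(n, m)

W = rng.random((n, r))  # columns are the non-negative basis vectors w_i
H = rng.random((r, m))  # entries are the non-negative coefficients h_ij
V = W @ H               # a non-negative matrix that factorizes exactly

j = 3
# Column j of V, computed as a matrix-vector product W h_j ...
v_j_from_product = W @ H[:, j]
# ... and as a non-negative combination of the basis vectors.
v_j_from_sum = sum(H[i, j] * W[:, i] for i in range(r))
```

Each column of V is thus a non-negative combination of the same r basis vectors, which is what makes the factors readable as parts.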
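How to solve NMF?
Standard NMF problem.
$$\text{Minimize } C(\mathbf{W}, \mathbf{H}) = \frac{1}{2} \|\mathbf{V} - \mathbf{W}\mathbf{H}\|_F^2 \quad \text{subject to } \mathbf{W} \in \mathbb{R}_+^{n \times r},\ \mathbf{H} \in \mathbb{R}_+^{r \times m}$$
Standard algorithms [Berry et al., 2007, Cichocki et al., 2009]:
Alternating non-negative least squares:
$$\mathbf{H} \leftarrow \arg\min_{\mathbf{H} \in \mathbb{R}_+^{r \times m}} \|\mathbf{V} - \mathbf{W}\mathbf{H}\|_F^2 \qquad \mathbf{W} \leftarrow \arg\min_{\mathbf{W} \in \mathbb{R}_+^{n \times r}} \|\mathbf{V} - \mathbf{W}\mathbf{H}\|_F^2$$
Additive updates:
$$h_{ij} \leftarrow h_{ij} - \mu_{ij} \frac{\partial C(\mathbf{W}, \mathbf{H})}{\partial h_{ij}} \qquad w_{ij} \leftarrow w_{ij} - \eta_{ij} \frac{\partial C(\mathbf{W}, \mathbf{H})}{\partial w_{ij}}$$
Multiplicative updates ($\otimes$ and the fraction bar denote element-wise multiplication and division):
$$\mathbf{H} \leftarrow \mathbf{H} \otimes \frac{\mathbf{W}^T \mathbf{V}}{\mathbf{W}^T \mathbf{W}\mathbf{H}} \qquad \mathbf{W} \leftarrow \mathbf{W} \otimes \frac{\mathbf{V}\mathbf{H}^T}{\mathbf{W}\mathbf{H}\mathbf{H}^T}$$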
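As an illustration of the multiplicative updates, here is a minimal NumPy sketch. The function name `nmf_multiplicative`, the random initialization, the iteration count and the small constant `eps` added to the denominators to avoid division by zero are assumptions made for this example, not details given in the talk.

```python
import numpy as np

def nmf_multiplicative(V, r, n_iter=200, eps=1e-12, seed=0):
    """Factorize V (n x m, non-negative) as W @ H with rank r.

    Uses the multiplicative updates for the Frobenius cost
    C(W, H) = 0.5 * ||V - W H||_F^2. The eps term is an assumption
    added here for numerical safety.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, r)) + eps
    H = rng.random((r, m)) + eps
    costs = []
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        # W <- W * (V H^T) / (W H H^T), element-wise
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        costs.append(0.5 * np.linalg.norm(V - W @ H, "fro") ** 2)
    return W, H, costs

# Hypothetical test matrix of exact rank 4.
rng = np.random.default_rng(1)
V = rng.random((20, 4)) @ rng.random((4, 30))
W, H, costs = nmf_multiplicative(V, r=4)
```

Because the updates only multiply by non-negative ratios, W and H stay non-negative without any explicit projection step.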
How to use NMF for sound analysis?
Model reminder.
$$\mathbf{V} \approx \mathbf{W}\mathbf{H} \qquad \mathbf{v}_j \approx \mathbf{W}\mathbf{h}_j = \sum_i h_{ij} \mathbf{w}_i$$
Usual setting: $\mathbf{V}$: time-frequency representation. $\mathbf{v}_j$: successive frames. $\mathbf{w}_i$: spectral models. $h_{ij}$: activation coefficients.
Examples of application: source separation [Cichocki et al., 2009], but also polyphonic music transcription [Smaragdis & Brown, 2003, Abdallah & Plumbley, 2004, Virtanen & Klapuri, 2006, Raczyński et al., 2007, Bertin et al., 2010, Vincent et al., 2010].
Limits for real-time settings.
How to adapt NMF to real-time settings?
Towards non-negative decomposition:
1 Learn source templates $\mathbf{w}_i$ before decomposition.
2 Stack these templates in a dictionary $\mathbf{W}$ kept fixed during decomposition.
3 Project the incoming audio stream onto the dictionary $\mathbf{W}$ in real time.
Applications: Speech analysis [Sha & Saul, 2005]. Score following [Cont, 2006]. Multi-f0 and multi-instrument recognition [Cont et al., 2007]. Sight-reading evaluation [Cheng et al., 2008]. Polyphonic music transcription [Niedermayer, 2008].
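The projection step with a fixed dictionary can be sketched as follows. This is an illustrative example only: it assumes the Frobenius cost and reuses the multiplicative update for the activations alone, and the function name `decompose_frame`, the iteration count and `eps` are hypothetical choices rather than the system's actual settings.

```python
import numpy as np

def decompose_frame(v, W, n_iter=200, eps=1e-12):
    """Return h >= 0 such that v is approximately W @ h, with W kept fixed."""
    h = np.ones(W.shape[1])
    WtV = W.T @ v  # constant for the current frame
    WtW = W.T @ W  # could be precomputed once for the whole stream
    for _ in range(n_iter):
        # Only the activations are updated; the dictionary never changes.
        h *= WtV / (WtW @ h + eps)
    return h

# Hypothetical dictionary of 5 templates and two incoming frames.
rng = np.random.default_rng(0)
W = rng.random((50, 5))
h_true = np.array([0.0, 2.0, 0.0, 0.5, 0.0])
frames = [W @ h_true, W @ (0.5 * h_true)]
activations = [decompose_frame(v, W) for v in frames]
```

Since each frame is handled independently and the cost per frame is bounded by the fixed number of iterations, this scheme suits a streaming setting better than refactorizing the whole matrix.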
General architecture
Template learning (off-line): sound source database → short-time sound representation $\mathbf{V}^{(k)}$ → non-negative matrix factorization $\mathbf{V}^{(k)} \approx \mathbf{w}^{(k)} \mathbf{h}^{(k)}$ → source templates $\mathbf{w}^{(k)}$, collected in $\mathbf{W}$.
Audio stream decomposition (on-line): auditory scene → short-time sound representation $\mathbf{v}_j$ → non-negative decomposition $\mathbf{v}_j \approx \mathbf{W}\mathbf{h}_j$ → source activations $\mathbf{h}_j$.
Figure: Schema of the general architecture of the system.
Template learning
Goal: learn a dictionary $\mathbf{W}$ of source templates.
Method: apply standard NMF to each sound sample $k$ with a factorization rank $r = 1$.
Example: the sources are the 88 notes of the piano.
Figure: Templates learned for the piano (frequency in Hz, from 0 to 6275, against note, from A0 to A7).
Figure: Schema of templates learning (sound source database → short-time sound representation $\mathbf{V}^{(k)}$ → non-negative matrix factorization $\mathbf{V}^{(k)} \approx \mathbf{w}^{(k)} \mathbf{h}^{(k)}$ → source templates $\mathbf{w}^{(k)}$).
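The template-learning method can be sketched as a rank-1 factorization of each sample, with the resulting vectors stacked as columns of the dictionary. In this minimal example the function names, the Frobenius-cost multiplicative updates and the normalization of each template to unit norm are assumptions made for illustration; the talk does not specify them.

```python
import numpy as np

def rank_one_template(V, n_iter=100, eps=1e-12):
    """Rank-1 NMF of one sample V (n x m): V is approximated by outer(w, h)."""
    n, m = V.shape
    w = np.ones(n)
    h = np.ones(m)
    for _ in range(n_iter):
        # Multiplicative updates specialized to r = 1.
        h *= (w @ V) / ((w @ w) * h + eps)
        w *= (V @ h) / (w * (h @ h) + eps)
    # Unit-norm scaling is an assumption made here for comparability.
    return w / (np.linalg.norm(w) + eps)

def learn_dictionary(samples):
    """Stack one template per sound sample as the columns of W."""
    return np.column_stack([rank_one_template(V) for V in samples])

# Hypothetical samples: one fixed spectrum per source, with a decaying envelope.
rng = np.random.default_rng(0)
true_spectra = [rng.random(40) + 0.1 for _ in range(3)]
envelope = np.exp(-np.linspace(0.0, 3.0, 25))
samples = [np.outer(s, envelope) for s in true_spectra]
W = learn_dictionary(samples)
```

With r = 1, each sample is summarized by a single spectral shape, and its temporal evolution is absorbed by the discarded activation vector h.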
Audio stream decomposition
Goal: obtain in real time the activations of the sources present in the auditory scene.
Method: employ non-negative decomposition to project the audio stream onto $\mathbf{W}$.
Example: chromatic scale on the piano.
Figure: Activations for a chromatic scale (encoding coefficients; template, from A0 to A#7, against time in seconds, from 0 to 4.5).
Figure: Schema of audio stream decomposition (auditory scene → short-time sound representation $\mathbf{v}_j$ → non-negative decomposition $\mathbf{v}_j \approx \mathbf{W}\mathbf{h}_j$ → source activations $\mathbf{h}_j$).
Two approaches investigated to control the decomposition: sparsity, beta-divergence.
What is sparsity?
Definition. A vector $\mathbf{x}$ is sparse when its energy is concentrated in a few coefficients.
Sparsity measures [Karvanen & Cichocki, 2003]:
$$\mathrm{sp}(\mathbf{x}) = \frac{\|\mathbf{x}\|_0}{n} = \frac{\operatorname{card}\{i : x_i \neq 0\}}{n}$$
$$\mathrm{sp}(\mathbf{x}) = \frac{\|\mathbf{x}\|_{0,\varepsilon}}{n} = \frac{\operatorname{card}\{i : |x_i| \geq \varepsilon\}}{n} \quad \text{with } \varepsilon > 0$$
$$\mathrm{sp}(\mathbf{x}) = \frac{\sum_i \tanh\left(|a x_i|^b\right)}{n} \quad \text{with } a > 0 \text{ and } b \geq 1$$
$$\mathrm{sp}(\mathbf{x}) = \frac{\|\mathbf{x}\|_p^p}{n} = \frac{\sum_i |x_i|^p}{n} \quad \text{with } 0 < p \leq 1$$
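The four measures listed above can be written directly in NumPy. This is a small sketch; the function names and the example vector are hypothetical and chosen only to make the definitions concrete.

```python
import numpy as np

def sp_l0(x):
    """Fraction of non-zero coefficients."""
    return np.count_nonzero(x) / x.size

def sp_l0_eps(x, eps):
    """Fraction of coefficients with magnitude at least eps (eps > 0)."""
    return np.count_nonzero(np.abs(x) >= eps) / x.size

def sp_tanh(x, a, b):
    """Smooth approximation of the counting measure (a > 0, b >= 1)."""
    return np.sum(np.tanh(np.abs(a * x) ** b)) / x.size

def sp_lp(x, p):
    """Mean of |x_i|^p (0 < p <= 1)."""
    return np.sum(np.abs(x) ** p) / x.size

# Hypothetical vector: one dominant coefficient and one negligible one.
x = np.array([0.0, 0.0, 3.0, 0.0, 0.001])
values = {
    "l0": sp_l0(x),
    "l0_eps": sp_l0_eps(x, eps=0.01),
    "tanh": sp_tanh(x, a=1.0, b=2.0),
    "lp": sp_lp(x, p=1.0),
}
```

With these definitions a lower value corresponds to a sparser vector, and the thresholded and smooth variants ignore the negligible coefficient that the plain count still registers.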
How to obtain sparse NMF?
Penalty and multiplicative updates [Eggert & Körner, 2004, Virtanen, 2007].
Penalty and alternating non-negative least squares [Albright et al., 2006].
Penalty and projected gradient [Hoyer, 2002].
Constraint and projected gradient [Hoyer, 2004].
$$\mathrm{sp}(\mathbf{x}) = \frac{\sqrt{n} - \|\mathbf{x}\|_1 / \|\mathbf{x}\|_2}{\sqrt{n} - 1}$$
Constraint and second-order cone programming [Heiler & Schnörr, 2005, Heiler & Schnörr, 2006].
Penalty and convex quadratic programming [Zdunek & Cichocki, 2008].
[email protected] October 22nd 2010 RIKEN BSI Seminar 16/42 Penalty and projected gradient [Hoyer, 2002]. Constraint and projected gradient [Hoyer, 2004]. √ n − kxk /kxk sp(x) = √ 1 2 n − 1 Constraint and second-order cone programming [Heiler & Schnörr, 2005, Heiler & Schnörr, 2006]. Penalty and convex quadratic programming [Zdunek & Cichocki, 2008].
Background Introduction Proposed system for real-time recognition of multiple sources Non-negative matrix factorization Sparsity and non-negative decomposition Information geometry Beta-divergence and non-negative decomposition Conclusion Results Discussion How to obtain sparse NMF?
Penalty and multiplicative updates [Eggert & Körner, 2004, Virtanen, 2007]. Penalty and alternate non-negative least-squares [Albright et al., 2006].
[email protected] October 22nd 2010 RIKEN BSI Seminar 16/42 Constraint and projected gradient [Hoyer, 2004]. √ n − kxk /kxk sp(x) = √ 1 2 n − 1 Constraint and second-order cone programming [Heiler & Schnörr, 2005, Heiler & Schnörr, 2006]. Penalty and convex quadratic programming [Zdunek & Cichocki, 2008].
Background Introduction Proposed system for real-time recognition of multiple sources Non-negative matrix factorization Sparsity and non-negative decomposition Information geometry Beta-divergence and non-negative decomposition Conclusion Results Discussion How to obtain sparse NMF?
Penalty and multiplicative updates [Eggert & Körner, 2004, Virtanen, 2007]. Penalty and alternate non-negative least-squares [Albright et al., 2006]. Penalty and projected gradient [Hoyer, 2002].
How to obtain sparse NMF?
Penalty and multiplicative updates [Eggert & Körner, 2004, Virtanen, 2007].
Penalty and alternate non-negative least-squares [Albright et al., 2006].
Penalty and projected gradient [Hoyer, 2002].
Constraint and projected gradient [Hoyer, 2004], with the sparseness measure $\mathrm{sp}(x) = \frac{\sqrt{n} - \|x\|_1 / \|x\|_2}{\sqrt{n} - 1}$.
Figure: Projection onto the s-sparsity cone.
Constraint and second-order cone programming [Heiler & Schnörr, 2005, Heiler & Schnörr, 2006].
Figure: Optimization between the $s_{\min}$- and $s_{\max}$-sparsity cones.
Penalty and convex quadratic programming [Zdunek & Cichocki, 2008].
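As a small illustration of the sparseness measure attributed to [Hoyer, 2004] above, the following Python sketch evaluates it on two extreme cases. The function name and the error handling are assumptions made for the example.

```python
import math

def hoyer_sparseness(x):
    """(sqrt(n) - ||x||_1 / ||x||_2) / (sqrt(n) - 1), a value in [0, 1]."""
    n = len(x)
    l1 = sum(abs(xi) for xi in x)
    l2 = math.sqrt(sum(xi * xi for xi in x))
    if n < 2 or l2 == 0.0:
        raise ValueError("needs at least two coefficients and a non-zero vector")
    return (math.sqrt(n) - l1 / l2) / (math.sqrt(n) - 1.0)

# A single active coefficient is maximally sparse, a flat vector is not sparse.
print(hoyer_sparseness([0.0, 0.0, 5.0, 0.0]))
print(hoyer_sparseness([1.0, 1.0, 1.0, 1.0]))
```

Unlike the counting measures, this one grows with sparsity and is invariant to rescaling of x, which is why a fixed value s defines a cone.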
Proposed approach based on convex quadratic programming
Problem.
Minimize $\frac{1}{2} \|v - Wh\|_2^2 + \lambda_1 \|h\|_1 + \frac{\lambda_2}{2} \|h\|_2^2$
subject to $h \in \mathbb{R}_{++}^r$, $s_{\min} \leq \mathrm{sp}(h) \leq s_{\max}$
Sparsity parameter: $\lambda_1 \geq 0$.
Regularization parameter: $\lambda_2 \geq 0$.
Constraint parameters: $0 \leq s_{\min} < s_{\max} \leq 1$.
Algorithm:
Update h with a sequence of convex quadratic programs.
Approximation of the sparsity cones with tangent planes.
Figure: Approximation of the sparsity cones.
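For non-negative h the penalty $\|h\|_1$ equals the sum of the entries of h, so the objective is a convex quadratic in h. The Python sketch below uses this to minimize the penalized objective with a plain projected gradient. It is not the sequence of quadratic programs described above: the sparsity-cone constraints are ignored, h is allowed to reach zero, and the function names, step size and iteration count are assumptions made for the illustration.

```python
import numpy as np

def objective(v, W, h, lam1, lam2):
    """0.5 * ||v - W h||_2^2 + lam1 * ||h||_1 + 0.5 * lam2 * ||h||_2^2."""
    residual = v - W @ h
    return 0.5 * residual @ residual + lam1 * np.sum(np.abs(h)) + 0.5 * lam2 * h @ h

def penalized_nnls(v, W, lam1=0.0, lam2=0.0, n_iter=500):
    """Projected gradient on the penalized objective over h >= 0.

    Assumption: the sparsity bounds s_min and s_max are not enforced here.
    """
    r = W.shape[1]
    Q = W.T @ W + lam2 * np.eye(r)      # quadratic term of the objective
    c = W.T @ v - lam1                  # linear term, the l1 penalty is linear for h >= 0
    step = 1.0 / np.linalg.norm(Q, 2)   # inverse of the largest eigenvalue of Q (W must be non-zero)
    h = np.zeros(r)
    for _ in range(n_iter):
        h = np.maximum(h - step * (Q @ h - c), 0.0)  # gradient step, then projection
    return h

# Hypothetical toy problem with an identity dictionary.
W = np.eye(2)
v = np.array([3.0, 0.5])
h = penalized_nnls(v, W, lam1=1.0)
print(h, objective(v, W, h, 1.0, 0.0))
```

With an identity dictionary the solution reduces to soft thresholding of v at $\lambda_1$, which shows how a larger $\lambda_1$ switches off weak activations.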
Illustrative example
Encoding coefficients (vertical axis: template, from A0 to A#7; horizontal axis: time in seconds).
Figure: Activations for a chromatic scale, $\lambda_1 = 0$.
Figure: Activations for a chromatic scale, $\lambda_1 = 1$.
Figure: Activations for a chromatic scale, $\lambda_1 = 5$.
Figure: Activations for a chromatic scale, $\lambda_1 = 10$.
Figure: Activations for a chromatic scale, $\lambda_1 = 50$.
Figure: Activations for a chromatic scale, $\lambda_1 = 100$.
What is the beta-divergence?
Definition [Eguchi & Kano, 2001].
Let $\beta \in \mathbb{R}$ and $x, y \in \mathbb{R}_{++}$, the β-divergence from x to y is defined by:
$d_\beta(x|y) = \frac{1}{\beta(\beta - 1)} \left( x^\beta + (\beta - 1) y^\beta - \beta x y^{\beta - 1} \right)$
Particular cases:
Itakura-Saito divergence: $d_{\beta=0}(x|y) = \frac{x}{y} - \log\frac{x}{y} - 1$.
Kullback-Leibler divergence: $d_{\beta=1}(x|y) = x \log\frac{x}{y} + y - x$.
Euclidean distance: $d_{\beta=2}(x|y) = \frac{1}{2}(x - y)^2$.
Generalized distance: $d_\beta(x|y) \geq 0$ and $d_\beta(x|y) = 0$ iff $x = y$.
Scaling property: $d_\beta(\lambda x|\lambda y) = \lambda^\beta d_\beta(x|y)$.
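The general formula is undefined at β = 0 and β = 1, where the Itakura-Saito and Kullback-Leibler expressions are obtained as limits. A minimal Python sketch of the scalar divergence, with the two limit cases handled explicitly, could look as follows. The function name and the input check are assumptions made for the example.

```python
import math

def beta_divergence(x, y, beta):
    """Scalar beta-divergence d_beta(x|y) for strictly positive x and y."""
    if x <= 0 or y <= 0:
        raise ValueError("x and y must be strictly positive")
    if beta == 0:   # Itakura-Saito divergence (limit case)
        return x / y - math.log(x / y) - 1.0
    if beta == 1:   # Kullback-Leibler divergence (limit case)
        return x * math.log(x / y) + y - x
    return (x ** beta + (beta - 1.0) * y ** beta
            - beta * x * y ** (beta - 1.0)) / (beta * (beta - 1.0))

# Scaling property: d_beta(lam * x | lam * y) = lam ** beta * d_beta(x | y).
x, y, lam = 3.0, 1.0, 4.0
for beta in (0.0, 0.5, 1.0, 2.0):
    print(beta, beta_divergence(lam * x, lam * y, beta),
          lam ** beta * beta_divergence(x, y, beta))
```

For β = 0 the two printed values coincide with the unscaled divergence, which illustrates the scale invariance of the Itakura-Saito case.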
How to use the beta-divergence in NMF?
NMF problem with the beta-divergence.
Minimize $D_\beta(V|WH) = \sum_{i,j} d_\beta(v_{ij} \,|\, [WH]_{ij})$
subject to $W \in \mathbb{R}_{++}^{n \times r}$, $H \in \mathbb{R}_{++}^{r \times m}$
Algorithms [Cichocki et al., 2009]:
Multiplicative updates:
$H \leftarrow H \otimes \frac{W^T \left( (WH)^{.\beta-2} \otimes V \right)}{W^T (WH)^{.\beta-1}}$, $W \leftarrow W \otimes \frac{\left( (WH)^{.\beta-2} \otimes V \right) H^T}{(WH)^{.\beta-1} H^T}$
Employed to interpolate between $d_E$ and $d_{KL}$ [Kompass, 2007].
Employed in audio [O'Grady & Pearlmutter, 2008, Bertin et al., 2009, Bertin et al., 2010, Vincent et al., 2010].
Employed in audio for the particular case $d_{IS}$ [Févotte et al., 2009].
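The multiplicative updates can be sketched with NumPy, reading the product ⊗, the fraction bar and the dotted exponents as element-wise operations. This is an illustrative sketch rather than a reference implementation: the random initialization, the fixed number of iterations, the small constant eps in the denominators and the function names are all assumptions.

```python
import numpy as np

def beta_divergence_sum(V, V_hat, beta):
    """D_beta(V | V_hat): sum of element-wise beta-divergences."""
    if beta == 0:
        ratio = V / V_hat
        return float(np.sum(ratio - np.log(ratio) - 1.0))
    if beta == 1:
        return float(np.sum(V * np.log(V / V_hat) + V_hat - V))
    total = np.sum(V ** beta + (beta - 1.0) * V_hat ** beta
                   - beta * V * V_hat ** (beta - 1.0))
    return float(total / (beta * (beta - 1.0)))

def nmf_beta(V, r, beta=1.0, n_iter=200, seed=0, eps=1e-12):
    """Alternate the multiplicative updates of H and W for a positive matrix V."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, r)) + 0.1   # strictly positive initialization
    H = rng.random((r, m)) + 0.1
    for _ in range(n_iter):
        WH = W @ H
        H *= (W.T @ ((WH ** (beta - 2.0)) * V)) / (W.T @ (WH ** (beta - 1.0)) + eps)
        WH = W @ H                 # refresh the model before updating W
        W *= (((WH ** (beta - 2.0)) * V) @ H.T) / ((WH ** (beta - 1.0)) @ H.T + eps)
    return W, H

# Hypothetical toy data: a positive matrix of rank at most 3.
rng = np.random.default_rng(1)
V = rng.random((6, 3)) @ rng.random((3, 8)) + 0.01
W, H = nmf_beta(V, r=3, beta=1.0)
print(beta_divergence_sum(V, W @ H, 1.0))
```

Because each update multiplies the current factor by a ratio of positive quantities, W and H stay positive without any explicit projection.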
Decomposition parameter: $\beta \in \mathbb{R}$.
Algorithm:
1. Initialize h with positive values.
2. Update h until convergence: $h \leftarrow h \otimes \frac{W^T \left( (Wh)^{.\beta-2} \otimes v \right)}{W^T (Wh)^{.\beta-1}}$
Updates tailored to real-time: