Introduction Non-negative matrix factorization Information geometry Conclusion

RIKEN BSI Seminar

Some applications of non-negative matrix factorization and of information geometry in audio signal processing

Arnaud Dessein (Arshia Cont, Gérard Assayag, Guillaume Lemaitre)

Institute for Research and Coordination of Acoustics and Music, Paris, France
Japanese-French Laboratory for Informatics, Tokyo, Japan

October 22nd 2010

Outline

1 Introduction

2 Non-negative matrix factorization

3 Information geometry

4 Conclusion

Outline

1 Introduction Presentation of the IRCAM Research at the IRCAM Motivations towards NMF and IG

2 Non-negative matrix factorization

3 Information geometry

4 Conclusion

What is the IRCAM?

Status: Institut de Recherche et Coordination Acoustique/Musique. Associated with the Centre Georges Pompidou in Paris. General head: Frank Madlener. Scientific director: Hugues Vinet.

History:
1970: the President Georges Pompidou asked Pierre Boulez to found an institution for musical research.
1973: the part underneath Place Igor Stravinsky was finished.
1977: the center opened.

Figures: 150 people: artists, scientists, technicians, administrative staff. 11,000,000 euros of annual budget. 8 research teams.

Figure: Centre Georges Pompidou (Renzo Piano and Richard Rogers).
Figure: Frank Madlener and Hugues Vinet.
Figure: Georges Pompidou and Pierre Boulez.
Figure: Stravinsky (Tinguely and ) and scale model of the IRCAM.
Figure: IRCAM (Renzo Piano and Richard Rogers).

What do we do?

Research teams: Instrumental Acoustics. Acoustic and Cognitive Spaces. Perception and Sound Design. Sound Analysis-Synthesis. Musical Representations. Analysis of Musical Practices. Real-Time Musical Interactions. Online Services.

“Working transversally for music research.” Researchers and musicians working together on multidisciplinary projects centered around music and exploration of sounds.

Research themes:

Sound synthesis and processing. “Creating new sounds as an extension of acoustic instruments.” Writing of sound, digital signal processing for sound transformation and synthesis, physical modeling, virtual instrument design.

Live interaction. “Interacting with the computer during performances.” Writing of time, performance capture/analysis (audio, gesture), synchronization (event detection, score following, shape recognition), multimedia/multimodality (dance, image, text).

Computer-aided composition. “Using the computer as a reflexive support of composition.” Writing of music, formalizing/producing/managing complex musical structures, assisting composition/orchestration.

Sound spatialization. “Composing space as a dimension of musical expression.” Writing of space, simulation of static/mobile sources and of acoustic spaces, perception and cognition of space.

What do we need?

Figure: Levels of representation of audio, waveform and spectrogram representations.

Fill in the gap between signal and symbolic representations.
Devise computational tools for complex real-time settings.
Two approaches:
NMF: current trend, structural a priori, reductionist.
IG: new trend, no structural a priori, holistic.

Outline

1 Introduction

2 Non-negative matrix factorization Background Proposed system for real-time recognition of multiple sources Sparsity and non-negative decomposition Beta-divergence and non-negative decomposition Results Discussion

3 Information geometry

4 Conclusion

What is NMF?

Standard NMF model [Lee & Seung, 1999]. Let $V \in \mathbb{R}_+^{n \times m}$ and $r < \min(n, m)$, find $W \in \mathbb{R}_+^{n \times r}$ and $H \in \mathbb{R}_+^{r \times m}$ such that:

$$V \approx WH$$

Interpretation:

$$v_j \approx W h_j = \sum_i h_{ij} w_i$$

$w_i$: basis vectors. $h_{ij}$: decomposition coefficients.
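As a small illustration of the interpretation above, the following NumPy sketch builds random non-negative factors and checks that column j of WH is the combination of the basis vectors weighted by column j of H. The sizes, the seed and the column index are arbitrary assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, r = 6, 5, 2              # illustrative sizes with r < min(n, m)
W = rng.random((n, r))         # columns w_i: basis vectors
H = rng.random((r, m))         # entries h_ij: decomposition coefficients
V = W @ H                      # exactly factorizable data, for illustration only

j = 3                          # any column index works
v_j = sum(H[i, j] * W[:, i] for i in range(r))   # sum_i h_ij w_i
print(np.allclose(V[:, j], v_j))
```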

How to solve NMF?

Standard NMF problem.

Minimize $C(W, H) = \frac{1}{2} \|V - WH\|_F^2$ subject to $W \in \mathbb{R}_+^{n \times r}$, $H \in \mathbb{R}_+^{r \times m}$.

Standard algorithms [Berry et al., 2007, Cichocki et al., 2009]:

Alternating non-negative least squares:

$$H \leftarrow \arg\min_{H \in \mathbb{R}_+^{r \times m}} \|V - WH\|_F^2 \qquad W \leftarrow \arg\min_{W \in \mathbb{R}_+^{n \times r}} \|V - WH\|_F^2$$

Additive updates:

$$h_{ij} \leftarrow h_{ij} - \mu_{ij} \frac{\partial C(W, H)}{\partial h_{ij}} \qquad w_{ij} \leftarrow w_{ij} - \eta_{ij} \frac{\partial C(W, H)}{\partial w_{ij}}$$

Multiplicative updates, where $\otimes$ and the fraction bar denote element-wise multiplication and division:

$$H \leftarrow H \otimes \frac{W^T V}{W^T W H} \qquad W \leftarrow W \otimes \frac{V H^T}{W H H^T}$$
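The multiplicative updates can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation used in the talk: the function names, the random initialization, the iteration count and the small constant `eps` guarding against division by zero are all assumptions.

```python
import numpy as np

def nmf_multiplicative(V, r, n_iter=200, eps=1e-12, seed=0):
    """Approximate V by W H using multiplicative updates for the Frobenius cost."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, r)) + eps   # random non-negative initialization (assumption)
    H = rng.random((r, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # H <- H * (W^T V) / (W^T W H)
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # W <- W * (V H^T) / (W H H^T)
    return W, H

def cost(V, W, H):
    """C(W, H) = 1/2 * squared Frobenius norm of V - W H."""
    return 0.5 * np.linalg.norm(V - W @ H, "fro") ** 2

rng = np.random.default_rng(1)
V = rng.random((8, 3)) @ rng.random((3, 10))     # synthetic non-negative data of rank 3
W0, H0 = nmf_multiplicative(V, r=3, n_iter=0)    # initialization only
W, H = nmf_multiplicative(V, r=3, n_iter=200)
print(cost(V, W0, H0), cost(V, W, H))
```

Because the factors start non-negative and are only multiplied by non-negative ratios, the non-negativity constraints hold at every iteration without any projection step.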

How to use NMF for sound analysis?

Model reminder.

$$V \approx WH \qquad v_j \approx W h_j = \sum_i h_{ij} w_i$$

Usual setting: $V$: time-frequency representation. $v_j$: successive frames. $w_i$: spectral models. $h_{ij}$: activation coefficients.

Examples of application: source separation [Cichocki et al., 2009], but also polyphonic music transcription [Smaragdis & Brown, 2003, Abdallah & Plumbley, 2004, Virtanen & Klapuri, 2006, Raczyński et al., 2007, Bertin et al., 2010, Vincent et al., 2010].

Limits for real-time settings.

How to adapt NMF to real-time settings?

Towards non-negative decomposition:
1 Learn source templates $w_i$ before decomposition.
2 Stack these templates in a dictionary $W$ kept fixed during decomposition.
3 Project the incoming audio stream onto the dictionary $W$ in real-time.

Applications: Speech analysis [Sha & Saul, 2005]. Score following [Cont, 2006]. Multi-f0 and multi-instrument recognition [Cont et al., 2007]. Sight-reading evaluation [Cheng et al., 2008]. Polyphonic music transcription [Niedermayer, 2008].
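The projection step can be sketched as follows, assuming the Frobenius cost and a multiplicative update applied to the activations only. The solver choice, the function name `decompose_frame`, the random dictionary and the synthetic frames are assumptions made for illustration, not the system described in the talk.

```python
import numpy as np

def decompose_frame(v, W, n_iter=100, eps=1e-12):
    """Estimate h >= 0 such that v is close to W h, with the dictionary W kept fixed."""
    h = np.ones(W.shape[1])
    WtW = W.T @ W          # computed once per frame, since W does not change
    Wtv = W.T @ v
    for _ in range(n_iter):
        h *= Wtv / (WtW @ h + eps)
    return h

rng = np.random.default_rng(0)
W = rng.random((20, 4))                    # hypothetical dictionary: 4 templates, 20 bins
h_true = np.array([0.0, 2.0, 0.0, 0.5])    # two templates are active
stream = [W @ h_true for _ in range(3)]    # stand-in for incoming frames
for v in stream:
    h = decompose_frame(v, W)
    print(np.round(h, 2))
```

Only the activation vector is updated for each frame, so the per-frame cost depends on the dictionary size and the iteration count rather than on the length of the signal.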

General architecture

Figure: Schema of the general architecture of the system.

Template learning (off-line): Sound source database → Short-time sound representation $V^{(k)}$ → Non-negative matrix factorization $V^{(k)} \approx w^{(k)} h^{(k)}$ → Source templates $w^{(k)}$, stacked in $W$.

Audio stream decomposition (on-line): Auditory scene → Short-time sound representation $v_j$ → Non-negative decomposition $v_j \approx W h_j$ → Source activations $h_j$.

Template learning

Goal: learn a dictionary $W$ of source templates.
Method: apply standard NMF to each sound sample $k$ with a factorization rank $r = 1$.
Example: the sources are the 88 notes of the piano.

Figure: Schema of template learning. Template learning (off-line): Sound source database → Short-time sound representation $V^{(k)}$ → Non-negative matrix factorization $V^{(k)} \approx w^{(k)} h^{(k)}$ → Source templates $w^{(k)}$.

Figure: Templates learned for the piano. Axes: note (A0 to A7), frequency (0 to 6275 Hz); color scale from −11 to 0.
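Template learning with rank 1 can be sketched as below. This is a minimal example under stated assumptions: the multiplicative solver, the unit-norm normalization of each template, the function name `learn_template` and the synthetic database are illustrative choices, not details given in the talk.

```python
import numpy as np

def learn_template(Vk, n_iter=100, eps=1e-12):
    """Rank-1 NMF of one sample, Vk close to w h; returns the template w with unit norm."""
    n, m = Vk.shape
    w = np.ones((n, 1))
    h = np.ones((1, m))
    for _ in range(n_iter):
        h *= (w.T @ Vk) / (w.T @ w @ h + eps)
        w *= (Vk @ h.T) / (w @ h @ h.T + eps)
    return (w / (np.linalg.norm(w) + eps)).ravel()   # normalization is an assumption

rng = np.random.default_rng(0)
# Hypothetical database: 3 sources, each a fixed spectrum with a varying gain over 30 frames.
spectra = rng.random((3, 16))
samples = [np.outer(s, rng.random(30) + 0.1) for s in spectra]

# One template per sample, stacked as the columns of the dictionary W.
W = np.column_stack([learn_template(Vk) for Vk in samples])
print(W.shape)
```

In this synthetic case each sample is exactly rank 1, so the learned template matches the underlying spectrum up to scale; real recordings would only be approximated.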

Audio stream decomposition

Goal: obtain in real-time the activations of the sources present in the auditory scene.
Method: employ non-negative decomposition to project the audio stream onto $W$.
Example: chromatic scale on the piano.
Two approaches investigated to control the decomposition: sparsity, beta-divergence.

Figure: Schema of audio stream decomposition. Audio stream decomposition (on-line): Auditory scene → Short-time sound representation $v_j$ → Non-negative decomposition $v_j \approx W h_j$ → Source activations $h_j$.

Figure: Activations for a chromatic scale. Encoding coefficients; axes: time (0 to 4.5 s), template (A0 to A#7); color scale from 0 to 90.

What is sparsity?

Definition. A vector $x$ is sparse when its energy is concentrated in a few coefficients.

Sparsity measures [Karvanen & Cichocki, 2003]:

$$\mathrm{sp}(x) = \frac{\|x\|_0}{n} = \frac{\operatorname{card}\{i : x_i \neq 0\}}{n}$$

$$\mathrm{sp}(x) = \frac{\|x\|_{0,\varepsilon}}{n} = \frac{\operatorname{card}\{i : |x_i| \geq \varepsilon\}}{n} \quad \text{with } \varepsilon > 0$$

$$\mathrm{sp}(x) = \frac{\sum_i \tanh\left(|a x_i|^b\right)}{n} \quad \text{with } a > 0 \text{ and } b \geq 1$$

$$\mathrm{sp}(x) = \frac{\|x\|_p^p}{n} = \frac{\sum_i |x_i|^p}{n} \quad \text{with } 0 < p \leq 1$$
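The four measures translate directly into NumPy. The sketch below is illustrative: the function names, default parameter values and test vector are assumptions. With these definitions, smaller values correspond to sparser vectors.

```python
import numpy as np

def sp_l0(x):
    """Fraction of non-zero coefficients."""
    return np.count_nonzero(x) / x.size

def sp_l0_eps(x, eps):
    """Fraction of coefficients whose magnitude is at least eps (eps > 0)."""
    return np.count_nonzero(np.abs(x) >= eps) / x.size

def sp_tanh(x, a=1.0, b=1.0):
    """Smooth measure based on tanh(|a x_i|^b), with a > 0 and b >= 1."""
    return np.sum(np.tanh(np.abs(a * x) ** b)) / x.size

def sp_lp(x, p=0.5):
    """Measure based on the p-th power of the l_p quasi-norm, with 0 < p <= 1."""
    return np.sum(np.abs(x) ** p) / x.size

x = np.array([0.0, 3.0, 0.0, 0.001])
print(sp_l0(x), sp_l0_eps(x, 0.01), sp_tanh(x), sp_lp(x))
```

The thresholded variant ignores the tiny fourth coefficient that the plain count still treats as non-zero, which is the practical motivation for introducing $\varepsilon$.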

How to obtain sparse NMF?

Penalty and multiplicative updates [Eggert & Körner, 2004, Virtanen, 2007].
Penalty and alternating non-negative least-squares [Albright et al., 2006].
Penalty and projected gradient [Hoyer, 2002].
Constraint and projected gradient [Hoyer, 2004], with the sparsity measure
\[ \mathrm{sp}(x) = \frac{\sqrt{n} - \|x\|_1 / \|x\|_2}{\sqrt{n} - 1} \]
Figure: Projection onto the s-sparsity cone.
Constraint and second-order cone programming [Heiler & Schnörr, 2005, Heiler & Schnörr, 2006].
Figure: Optimization between the $s_{\min}$- and $s_{\max}$-sparsity cones.
Penalty and convex quadratic programming [Zdunek & Cichocki, 2008].
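
As a quick illustration of the sparsity measure used with the constraint of [Hoyer, 2004], here is a minimal NumPy sketch. The function name `hoyer_sparseness` is an assumption made for this example, and the input is assumed to be a non-zero vector with n > 1. Unlike counting-based measures, this one equals 1 for a vector with a single non-zero coefficient and 0 for a vector with equal coefficients.

```python
import numpy as np

def hoyer_sparseness(x):
    """(sqrt(n) - ||x||_1 / ||x||_2) / (sqrt(n) - 1) for a non-zero x, n > 1."""
    x = np.asarray(x, dtype=float)
    n = x.size
    l1 = np.sum(np.abs(x))
    l2 = np.sqrt(np.sum(x ** 2))
    return (np.sqrt(n) - l1 / l2) / (np.sqrt(n) - 1.0)

print(hoyer_sparseness([0.0, 0.0, 4.0, 0.0]))  # single active coefficient
print(hoyer_sparseness([2.0, 2.0, 2.0, 2.0]))  # all coefficients equal
print(hoyer_sparseness([3.0, 1.0, 0.0, 0.0]))  # in between
```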

Proposed approach based on convex quadratic programming

Problem.
\[ \text{Minimize } \frac{1}{2}\|v - Wh\|_2^2 + \lambda_1 \|h\|_1 + \frac{\lambda_2}{2}\|h\|_2^2 \quad \text{subject to } h \in \mathbb{R}_{++}^r,\ s_{\min} \leq \mathrm{sp}(h) \leq s_{\max} \]

Sparsity parameter: $\lambda_1 \geq 0$.
Regularization parameter: $\lambda_2 \geq 0$.
Constraint parameters: $0 \leq s_{\min} < s_{\max} \leq 1$.

Algorithm:
Update h with a sequence of convex quadratic programs.
Approximation of the sparsity cones with tangent planes.

Figure: Approximation of the sparsity cones.
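
To make the quadratic structure of the problem concrete, the sketch below solves a simplified version of it. It is not the algorithm of the talk: the sparsity-cone constraints ($s_{\min}$, $s_{\max}$) and their tangent-plane approximation are left out, h is only constrained to be non-negative, and a plain projected gradient replaces the QP solver. Under these assumptions the objective is the quadratic $\frac{1}{2} h^T Q h + c^T h$ (up to a constant) with $Q = W^T W + \lambda_2 I$ and $c = \lambda_1 \mathbf{1} - W^T v$. The function names are hypothetical.

```python
import numpy as np

def objective(W, v, h, lam1, lam2):
    """0.5 ||v - Wh||^2 + lam1 ||h||_1 + 0.5 lam2 ||h||^2 for h >= 0."""
    r = v - W @ h
    return 0.5 * r @ r + lam1 * np.sum(h) + 0.5 * lam2 * h @ h

def penalized_decomposition(W, v, lam1=0.0, lam2=0.0, n_iter=2000):
    """Projected gradient on 0.5 h'Qh + c'h over h >= 0 (no sparsity cones)."""
    Q = W.T @ W + lam2 * np.eye(W.shape[1])
    c = lam1 - W.T @ v                    # for h >= 0, ||h||_1 = sum(h)
    step = 1.0 / np.linalg.norm(Q, 2)     # 1 / largest eigenvalue of Q
    h = np.zeros(W.shape[1])
    for _ in range(n_iter):
        # Gradient step followed by projection onto the non-negative orthant.
        h = np.maximum(h - step * (Q @ h + c), 0.0)
    return h

# Hypothetical data: 5 templates, an observation mixing two of them.
rng = np.random.default_rng(0)
W = rng.random((20, 5))
v = W @ np.array([0.0, 2.0, 0.0, 0.0, 1.0])

h_plain = penalized_decomposition(W, v, lam1=0.0)
h_sparse = penalized_decomposition(W, v, lam1=5.0)
print(np.round(h_plain, 3), np.round(h_sparse, 3))
```

Raising `lam1` shrinks the l1 norm of the solution, which is the qualitative effect of the sparsity parameter.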

Illustrative example

Figure: Activations for a chromatic scale, for $\lambda_1$ = 0, 1, 5, 10, 50 and 100 (one plot per value). Each plot, titled "Encoding coefficients", shows the templates on the vertical axis (tick labels from A0 to A#7) against time on the horizontal axis (0 to 4.5 s).

What is the beta-divergence?

Definition [Eguchi & Kano, 2001].
Let $\beta \in \mathbb{R}$ and $x, y \in \mathbb{R}_{++}$. The β-divergence from x to y is defined by:
\[ d_\beta(x|y) = \frac{1}{\beta(\beta - 1)} \left( x^\beta + (\beta - 1)\, y^\beta - \beta\, x\, y^{\beta - 1} \right) \]

Particular cases:
Itakura-Saito divergence: $d_{\beta=0}(x|y) = \frac{x}{y} - \log\frac{x}{y} - 1$.
Kullback-Leibler divergence: $d_{\beta=1}(x|y) = x \log\frac{x}{y} + y - x$.
Euclidean distance: $d_{\beta=2}(x|y) = \frac{1}{2}(x - y)^2$.

Generalized distance: $d_\beta(x|y) \geq 0$, and $d_\beta(x|y) = 0$ iff $x = y$.
Scaling property: $d_\beta(\lambda x|\lambda y) = \lambda^\beta\, d_\beta(x|y)$.
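
The definition and its properties can be checked numerically with a short sketch. The function name `beta_divergence` is an assumption made here. The general formula has a zero denominator at β = 0 and β = 1, so these two values are handled separately with the Itakura-Saito and Kullback-Leibler expressions, which are the limits of the general formula.

```python
import numpy as np

def beta_divergence(x, y, beta):
    """Element-wise beta-divergence d_beta(x|y) for x, y > 0."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if beta == 0:   # Itakura-Saito divergence (limit case)
        return x / y - np.log(x / y) - 1.0
    if beta == 1:   # Kullback-Leibler divergence (limit case)
        return x * np.log(x / y) + y - x
    return (x ** beta + (beta - 1.0) * y ** beta
            - beta * x * y ** (beta - 1.0)) / (beta * (beta - 1.0))

x, y, lam = 3.0, 1.0, 2.5
for beta in (0, 0.5, 1, 2, 3):
    d = beta_divergence(x, y, beta)
    scaled = beta_divergence(lam * x, lam * y, beta)
    # Scaling property: the two printed values should agree.
    print(beta, float(d), float(scaled), lam ** beta * float(d))
```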

How to use the beta-divergence in NMF?

NMF problem with the beta-divergence.
\[ \text{Minimize } D_\beta(V|WH) = \sum_{i,j} d_\beta\left(v_{ij} \,|\, [WH]_{ij}\right) \quad \text{subject to } W \in \mathbb{R}_{++}^{n \times r},\ H \in \mathbb{R}_{++}^{r \times m} \]

Algorithms [Cichocki et al., 2009]:
Multiplicative updates:
\[ H \leftarrow H \otimes \frac{W^T \left( (WH)^{.\beta-2} \otimes V \right)}{W^T (WH)^{.\beta-1}} \qquad W \leftarrow W \otimes \frac{\left( (WH)^{.\beta-2} \otimes V \right) H^T}{(WH)^{.\beta-1} H^T} \]

Employed to interpolate between $d_E$ and $d_{KL}$ [Kompass, 2007].
Employed in audio [O’Grady & Pearlmutter, 2008, Bertin et al., 2009, Bertin et al., 2010, Vincent et al., 2010].
Employed in audio for the particular case $d_{IS}$ [Févotte et al., 2009].
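
The multiplicative updates translate almost directly into NumPy, reading ⊗, the fraction bar and the dotted exponent as element-wise product, division and power. The sketch below is an illustration under assumptions: the names `beta_cost` and `nmf_beta_mu`, the random initialization, the fixed iteration count and the test matrix are all choices made here, not taken from the talk.

```python
import numpy as np

def beta_cost(V, WH, beta):
    """D_beta(V|WH): sum of element-wise beta-divergences (V, WH > 0)."""
    if beta == 0:
        return np.sum(V / WH - np.log(V / WH) - 1.0)
    if beta == 1:
        return np.sum(V * np.log(V / WH) + WH - V)
    return np.sum(V ** beta + (beta - 1.0) * WH ** beta
                  - beta * V * WH ** (beta - 1.0)) / (beta * (beta - 1.0))

def nmf_beta_mu(V, r, beta=1.0, n_iter=100, seed=0):
    """Alternate the two multiplicative updates from a positive random start."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, r)) + 0.1
    H = rng.random((r, m)) + 0.1
    costs = []
    for _ in range(n_iter):
        WH = W @ H
        H *= (W.T @ (WH ** (beta - 2) * V)) / (W.T @ WH ** (beta - 1))
        WH = W @ H
        W *= ((WH ** (beta - 2) * V) @ H.T) / (WH ** (beta - 1) @ H.T)
        costs.append(beta_cost(V, W @ H, beta))
    return W, H, costs

# Hypothetical positive data matrix.
V = np.random.default_rng(1).random((8, 12)) + 0.1
W, H, costs = nmf_beta_mu(V, r=3, beta=1.0)
print(costs[0], costs[-1])
```

Because the updates only multiply positive quantities, W and H stay positive as long as V and the initial factors are positive.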

Proposed approach based on multiplicative updates

Problem.
\[ \text{Minimize } D_\beta(v|Wh) \quad \text{subject to } h \in \mathbb{R}_{++}^r \]

Decomposition parameter: $\beta \in \mathbb{R}$.

Algorithm:
1 Initialize h with positive values.
2 Update h until convergence:
\[ h \leftarrow h \otimes \frac{W^T \left( (Wh)^{.\beta-2} \otimes v \right)}{W^T (Wh)^{.\beta-1}} \]

Updates tailored to real-time:
\[ h \leftarrow h \otimes \frac{\left( W^T \otimes (e v^T) \right) (Wh)^{.\beta-2}}{W^T (Wh)^{.\beta-1}} \]
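
The two update rules give the same iterates; the second one moves the product with v out of the loop. The sketch below illustrates this under one assumption that the slide does not state: e is taken to be the all-ones vector of length r, so that $e v^T$ has the same shape as $W^T$. Function names and test data are hypothetical.

```python
import numpy as np

def decompose(W, v, beta=1.0, n_iter=100):
    """Standard update: the product with v is recomputed at each iteration."""
    h = np.ones(W.shape[1])                # positive initialization
    for _ in range(n_iter):
        Wh = W @ h
        h *= (W.T @ (Wh ** (beta - 2) * v)) / (W.T @ Wh ** (beta - 1))
    return h

def decompose_realtime(W, v, beta=1.0, n_iter=100):
    """Same iterates, with W^T ⊗ (e v^T) computed once per incoming frame."""
    e = np.ones(W.shape[1])                # assumption: e is the all-ones vector
    A = W.T * np.outer(e, v)               # r x n matrix, fixed during the loop
    h = np.ones(W.shape[1])
    for _ in range(n_iter):
        Wh = W @ h
        h *= (A @ Wh ** (beta - 2)) / (W.T @ Wh ** (beta - 1))
    return h

# Hypothetical dictionary of 4 templates and one observed frame.
rng = np.random.default_rng(0)
W = rng.random((30, 4)) + 0.05
v = W @ np.array([1.0, 0.0, 2.0, 0.5]) + 0.01

print(np.round(decompose(W, v, beta=1.0), 3))
print(np.round(decompose_realtime(W, v, beta=1.0), 3))
```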

Illustrative example

Figure: Activations for a chromatic scale, for β = 3, 2, 1.5, 1, 0.5 and 0 (one plot per value). Each plot, titled "Encoding coefficients", shows the templates on the vertical axis (tick labels from A0 to A#7) against time on the horizontal axis (0 to 4.5 s).

Drum transcription

Figure: Decomposition of a drum loop. Panels: (a) MU, (b) PGC, (c) SCQP. Each panel shows the activations of the hi-hat, kick, snare and tom templates over time (0 to 5 s); a further plot shows the sparsity over time.

Environmental sound detection in complex scenes

Figure: Decomposition of a complex environmental scene. Panels: (a) MU, (b) PGC, (c) SCQP. Each panel shows the activations of the bell, dog, door, glass, car horn, pan and razor templates over time (0 to 15 s); a further plot shows the sparsity over time.

Polyphonic music transcription

Subjective evaluation: music synthesized with real piano samples (demo).
Objective evaluation: recorded piano music.

Alg.      P1    R1    F1    A1    M1    P2    R2    F2    A2
SCQP     64.1  56.2  59.9  42.7  53.7  25.4  22.3  23.7  13.5
BMU      75.5  67.1  71.1  55.1  56.7  30.0  26.6  28.2  16.4
MU       57.9  58.2  58.1  40.9  53.9  21.4  21.6  21.5  12.0
Hoyer    57.2  56.3  56.8  39.6  54.1  21.0  20.7  20.8  11.6
Vincent  58.1  73.7  65.0  48.1  57.7  20.7  26.3  23.2  13.1
Yeh      33.0  58.8  42.3  26.8  55.1  11.6  20.7  14.9   8.0

Table: Results of the evaluation for recorded piano music.

International evaluation: 2nd rank at MIREX 2010 (against off-line systems) for note-level transcription of polyphonic music (not only piano).
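
The column names of the table are not spelled out on the slide. Assuming that P, R and F stand for precision, recall and F-measure (an assumption made here), the F columns can be cross-checked against the harmonic mean F = 2PR / (P + R). The small sketch below does this for both groups of columns, using only the values of the table.

```python
# Values copied from the table: (P1, R1, F1, P2, R2, F2) per algorithm.
rows = {
    "SCQP":    (64.1, 56.2, 59.9, 25.4, 22.3, 23.7),
    "BMU":     (75.5, 67.1, 71.1, 30.0, 26.6, 28.2),
    "MU":      (57.9, 58.2, 58.1, 21.4, 21.6, 21.5),
    "Hoyer":   (57.2, 56.3, 56.8, 21.0, 20.7, 20.8),
    "Vincent": (58.1, 73.7, 65.0, 20.7, 26.3, 23.2),
    "Yeh":     (33.0, 58.8, 42.3, 11.6, 20.7, 14.9),
}

def f_measure(p, r):
    """Harmonic mean of precision and recall (assumed meaning of F)."""
    return 2.0 * p * r / (p + r)

max_gap = 0.0
for name, (p1, r1, f1, p2, r2, f2) in rows.items():
    gap1 = abs(f_measure(p1, r1) - f1)
    gap2 = abs(f_measure(p2, r2) - f2)
    max_gap = max(max_gap, gap1, gap2)
    print(f"{name:8s} F1 gap {gap1:.3f}  F2 gap {gap2:.3f}")
```

The recomputed values agree with the tabulated ones up to the rounding of the table, which is consistent with the assumed reading of the columns.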

What we (don’t) have

Summary and perspectives.

Representations: Many possibilities. Complex representations for V and W. Tensors for multichannel information [Cichocki et al., 2009].

Audio decomposition: Euclidean geometry and sparsity. More general geometries and beta-divergence. Bayesian interpretation [Févotte et al., 2009, Bertin et al., 2010].

Template learning: Rank-one standard NMF. Extensions of standard NMF. Harmonicity constraints [Bertin et al., 2010, Vincent et al., 2010].

Temporality of events: Extended NMF model [Smaragdis, 2004]. State representation. Non-negative HMM [Mysore, 2010].

Outline

1 Introduction
2 Non-negative matrix factorization
3 Information geometry
    Background
    Proposed system for real-time mining of audio data streams
    Results
    Other applications
    Discussion
4 Conclusion

What is IG?

Statistical differentiable manifold.
Under certain assumptions, a statistical model forms a differentiable manifold:
\[ S = \{ p_\xi = p(x; \xi) : \xi = [\xi^1, \dots, \xi^n] \in \Xi \} \]
Example: $p(x; \xi) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)$ with $\xi = [\mu, \sigma^2]$.

Fisher information metric [Rao, 1945, Chentsov, 1982].
Under certain assumptions, the Fisher information matrix defines the unique Riemannian metric g on S:
\[ g_{ij}(\xi) = \int \partial_i \log p(x; \xi) \cdot \partial_j \log p(x; \xi) \cdot p(x; \xi) \, dx \]

Dual affine connections [Chentsov, 1982, Amari & Nagaoka, 2000].
Under certain assumptions, there is a family of dual affine connections $\{\nabla^{(\alpha)}, \nabla^{(-\alpha)}\}_{\alpha \in \mathbb{R}}$ on (S, g) called α-connections.

What is IG?

Statistical differentiable manifold. Under certain assumptions, a statistical model forms a differentiable manifold:
\[ S = \{ p_\xi = p(x; \xi) : \xi = [\xi^1, \dots, \xi^n] \in \Xi \}. \]

Example: $p(x; \xi) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)$ with $\xi = [\mu, \sigma^2]$.

Fisher information metric [Rao, 1945, Chentsov, 1982]. Under certain assumptions, the Fisher information matrix defines the unique Riemannian metric $g$ on $S$:
\[ g_{ij}(\xi) = \int \partial_i \log p(x; \xi) \, \partial_j \log p(x; \xi) \, p(x; \xi) \, dx. \]

Dual affine connections [Chentsov, 1982, Amari & Nagaoka, 2000]. Under certain assumptions, there is a family of dual affine connections $\{\nabla^{(\alpha)}, \nabla^{(-\alpha)}\}_{\alpha \in \mathbb{R}}$ on $(S, g)$ called $\alpha$-connections.
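As an illustration of the Fisher information metric for the Gaussian example, the following minimal sketch evaluates the integral numerically with a plain Riemann sum. The function name `gaussian_fisher_matrix` and the grid settings are assumptions made for this example, not part of the original talk. For $\xi = [\mu, \sigma^2]$ the known closed form is $g = \mathrm{diag}(1/\sigma^2, 1/(2\sigma^4))$, which the numerical result should approach.

```python
import numpy as np

def gaussian_fisher_matrix(mu, var, half_width=12.0, n=200001):
    """Numerically approximate g_ij = int s_i s_j p dx for a univariate Gaussian.

    The parameters are xi = [mu, var] and s_i is the derivative of log p
    with respect to xi_i (the score). Illustrative helper, not from the talk.
    """
    sd = np.sqrt(var)
    x = np.linspace(mu - half_width * sd, mu + half_width * sd, n)
    dx = x[1] - x[0]
    p = np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    # Scores: d/dmu log p and d/dvar log p.
    scores = np.stack([
        (x - mu) / var,
        -0.5 / var + (x - mu) ** 2 / (2 * var ** 2),
    ])
    # Riemann sum of s_i(x) * s_j(x) * p(x) over the grid.
    return (scores[:, None, :] * scores[None, :, :] * p).sum(axis=-1) * dx

g = gaussian_fisher_matrix(mu=1.0, var=2.0)
print(np.round(g, 6))
```

With `var = 2.0`, the closed form gives diagonal entries 0.5 and 0.125 and zero off-diagonal entries, so the metric depends on the variance but not on the mean.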

How to use IG from a computational viewpoint?

Exponential family. $p_\theta(x) = \exp\left( \theta^T T(x) - F(\theta) + C(x) \right)$ where $F$ is a strictly convex function.

$\theta$: natural parameters. $F(\theta)$: log-normalizer. $C(x)$: carrier measure. $T(x)$: sufficient statistic.

Figure: A taxonomy of exponential families (© 2009, Frank Nielsen). Probability measures are parametric or non-parametric; parametric measures split into exponential families (univariate, with uniparameter, bi-parameter and multi-parameter cases, or multivariate) and non-exponential families (uniform, Cauchy, Lévy skew α-stable). Distributions shown among the exponential families: Bernoulli, Binomial, Poisson, Exponential, Rayleigh, Gaussian, Beta, Gamma, Weibull, Multinomial, Dirichlet.

Bregman divergence. $B_F(\theta_1, \theta_2) = F(\theta_1) - F(\theta_2) - (\theta_1 - \theta_2)^T \nabla F(\theta_2)$ where $F$ is a strictly convex differentiable function.

Links with dually flat spaces, Legendre transform and expectation parameters [Amari & Nagaoka, 2000, Banerjee et al., 2005]:
\[ D_{KL}(p \,\|\, q) = B_F(\theta_q \,\|\, \theta_p) = B_G(\eta_p \,\|\, \eta_q) \]
where $G$ is the Legendre conjugate of $F$ and $\eta = \nabla F(\theta)$ are the expectation parameters.

Generic algorithms that handle many generalized distances (demo) [Banerjee et al., 2005, Cayton, 2008, Cayton, 2009, Nielsen & Nock, 2009, Nielsen et al., 2009, Garcia et al., 2009]:
Centroids and hard clustering (k-means).
Parameter estimation and soft clustering (expectation-maximization).
Ball trees and search queries (nearest-neighbors search, range search).
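The link between the Kullback-Leibler divergence and Bregman divergences can be checked on a small case. The sketch below uses the Poisson family as an assumed example (it is not taken from the talk): its natural parameter is $\theta = \log\lambda$, its log-normalizer is $F(\theta) = e^\theta$, its expectation parameter is $\eta = \lambda$, and the Legendre conjugate is $G(\eta) = \eta\log\eta - \eta$. All function names are illustrative.

```python
import math

def bregman_divergence(F, grad_F, x1, x2):
    """Scalar Bregman divergence B_F(x1, x2) = F(x1) - F(x2) - (x1 - x2) F'(x2)."""
    return F(x1) - F(x2) - (x1 - x2) * grad_F(x2)

def poisson_kl(lam_p, lam_q):
    """Closed-form KL divergence between Poisson(lam_p) and Poisson(lam_q)."""
    return lam_p * math.log(lam_p / lam_q) - lam_p + lam_q

# Log-normalizer F (natural parameters) and its Legendre conjugate G
# (expectation parameters) for the Poisson family.
F, grad_F = math.exp, math.exp
G = lambda eta: eta * math.log(eta) - eta
grad_G = math.log

lam_p, lam_q = 3.0, 5.0
theta_p, theta_q = math.log(lam_p), math.log(lam_q)

kl = poisson_kl(lam_p, lam_q)
b_f = bregman_divergence(F, grad_F, theta_q, theta_p)  # note the swapped order
b_g = bregman_divergence(G, grad_G, lam_p, lam_q)
print(kl, b_f, b_g)
```

The three printed values should agree up to rounding. The argument order is swapped in the natural parameters and preserved in the expectation parameters, as in the identity above.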

How to design real-time systems for audio based on IG?

Scheme:
1 Represent the incoming audio stream with short-time sound descriptors $d_j$.
2 Model these descriptors as probability distributions $p_{\theta_j}$ from a given exponential family.
3 Use the framework of computational information geometry on these distributions. In particular, this makes it possible to define similarity in an information-theoretic setting through divergences.

Potential applications: Audio content analysis. Segmentation of audio streams. Automatic structure discovery of audio signals. Sound processing and synthesis.

General architecture

Figure: Schema of the general architecture of the system. On-line audio stream decomposition: auditory scene, short-time sound representation (descriptors $d_j$), sound descriptors modeling (distributions $p_{\theta_j}$), temporal modeling.

Sound descriptors modeling

Computation of a sound descriptor $d_j$: Fourier or constant-Q transforms for information on the spectral content. Mel-frequency cepstral coefficients for information on the timbre. Many other possibilities.

Modeling with a probability distribution $p_{\theta_j}$ from an exponential family: Categorical distributions. Many other possibilities.

Figure: Sound descriptors modeling.
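One possible reading of this step, given only as a hedged sketch: take the magnitude spectrum of a short frame as the descriptor, normalize it into a categorical distribution, and compare frames with the Kullback-Leibler divergence. The frame length, window, smoothing constant `eps` and function names are assumptions for illustration and do not describe the actual system.

```python
import numpy as np

def frame_to_categorical(frame, eps=1e-12):
    """Map an audio frame to a categorical distribution over frequency bins."""
    window = np.hanning(len(frame))
    magnitude = np.abs(np.fft.rfft(frame * window)) + eps  # eps avoids zero bins
    return magnitude / magnitude.sum()

def kl_divergence(p, q):
    """KL divergence between two categorical distributions with positive entries."""
    return float(np.sum(p * np.log(p / q)))

# Three synthetic frames: two at 440 Hz (different phase), one at 1760 Hz.
sample_rate = 8000
t = np.arange(512) / sample_rate
p_a = frame_to_categorical(np.sin(2 * np.pi * 440 * t))
p_b = frame_to_categorical(np.sin(2 * np.pi * 440 * t + 0.3))
p_c = frame_to_categorical(np.sin(2 * np.pi * 1760 * t))

print("same pitch:     ", kl_divergence(p_a, p_b))
print("different pitch:", kl_divergence(p_a, p_c))
```

Frames with the same spectral content should give a much smaller divergence than frames with different pitches, which is the notion of similarity the scheme relies on.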

Temporal information modeling

Model formation: from signal to symbol. Assumption of quasi-stationary audio chunks. Change detection adapted from CuSum [Basseville & Nikiforov, 1993].

Figure: Model formation at time t.

Factor oracle: from symbol to syntax (and from genetics to music!). Forward transitions: original sequence factors. Backward links: suffix relations, common context.
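The factor oracle can be built incrementally, one symbol at a time, which suits on-line use. The sketch below follows the standard incremental construction on plain symbols (here the letters of the word abbbaab); in the system described in the talk the symbols would be models formed from audio, which this example does not attempt to reproduce. Function and variable names are illustrative.

```python
def build_factor_oracle(word):
    """Build the factor oracle of `word` incrementally.

    Returns (transitions, suffix_links): transitions[s] maps a symbol to a
    target state (forward transitions), and suffix_links[s] is the backward
    link of state s (-1 for the initial state).
    """
    transitions = [{}]
    suffix_links = [-1]
    for i, symbol in enumerate(word):
        new_state = i + 1
        transitions.append({})
        transitions[i][symbol] = new_state  # transition along the word itself
        k = suffix_links[i]
        # Follow backward links, adding missing forward transitions.
        while k != -1 and symbol not in transitions[k]:
            transitions[k][symbol] = new_state
            k = suffix_links[k]
        suffix_links.append(0 if k == -1 else transitions[k][symbol])
    return transitions, suffix_links

def accepts(transitions, query):
    """Return True if `query` labels a path from the initial state."""
    state = 0
    for symbol in query:
        if symbol not in transitions[state]:
            return False
        state = transitions[state][symbol]
    return True

transitions, suffix_links = build_factor_oracle("abbbaab")
print(suffix_links)
print(accepts(transitions, "bbaa"), accepts(transitions, "bab"))
```

Every factor of the word is accepted by construction. The oracle may also accept a few words that are not factors, which is the price paid for its compact size.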

Figure: Factor oracle of the word abbbaab.

Audio segmentation

Figure: Segmentation of the 1st Piano Sonata, 1st Movement, 1st Theme, Beethoven.

Music similarity analysis

Figure: Similarity analysis of the 1st Piano Sonata, 3rd Movement, Beethoven.

Musical structure discovery

Figure: Structure discovery of the 1st Piano Sonata, 3rd Movement, Beethoven.

Query by similarity

Figure: Query by similarity of the 1st Theme over the entire 1st Piano Sonata, 1st Movement, Beethoven.

Audio recombination by concatenative synthesis

Figure: Audio recombination of African drums by concatenative synthesis of congas.

Computer-assisted improvisation

Figure: Computer-assisted improvisation, Fabrizio Cassol and Philippe Leclerc.

What we (don’t) have

Summary and perspectives.

Representations: many possibilities; combinations of descriptors; complex representations.

Descriptors modeling: exponential families and Bregman divergences; mixture models of a given exponential family; other geometries, divergences, metrics.

Temporal modeling: on-line segmentation and factor oracle; on-line clustering and equivalence between symbols; overlap between symbols and other temporal models.

Temporality of events: assumption of quasi-stationarity; non-stationarity modeling; time series.

Outline

1 Introduction

2 Non-negative matrix factorization

3 Information geometry

4 Conclusion

The story so far

Motivations: Fill in the gap between signal and symbolic representations. Devise computational tools for complex real-time settings.

Resources on NMF: http://imtr.ircam.fr/imtr/Realtime_Transcription
- A. Dessein, A. Cont, G. Lemaitre. Real-time polyphonic music transcription with non-negative matrix factorization and beta-divergence. Proc. of the 11th Int. Soc. for Music Information Retrieval Conf. Utrecht, Netherlands, Aug. 2010.
- A. Dessein. Incremental multi-source recognition with non-negative matrix factorization. Master’s Thesis, Université Pierre et Marie Curie, 2009.
- A. Cont. Realtime multiple pitch observation using sparse non-negative constraints. Proc. of the 7th Int. Symp. on Music Information Retrieval. Victoria, Canada, Oct. 2006.

Resources on IG: http://imtr.ircam.fr/imtr/Music_Information_Geometry
- A. Cont, S. Dubnov and G. Assayag. On the information geometry of audio streams with applications to similarity computing. IEEE Transactions on Audio, Speech and Language Processing. In press.
- A. Cont. Modeling musical anticipation: From the time of music to the music of time. PhD thesis, Université Pierre et Marie Curie and University of California San Diego, Oct. 2008.

National research group: IRCAM, Ecole Polytechnique, Thales, etc.
Brillouin seminar: http://www.informationgeometry.org/Seminar/seminarBrillouin.html
IGAIA 2012.

Thank you very much for your attention! Questions?

Bibliography

Abdallah, S. A. & Plumbley, M. D. (2004). Polyphonic music transcription by non-negative sparse coding of power spectra. In Proc. of the 5th Int. Conf. on Music Information Retrieval (pp. 318–325). Barcelona, Spain.

Albright, R., Cox, J., Duling, D., Langville, A. N., & Meyer, C. D. (2006). Algorithms, initializations, and convergence for the non negative matrix factorization. Technical report, NC State University.

Amari, S.-i. & Nagaoka, H. (2000). Methods of information geometry, volume 191 of Translations of Mathematical Monographs. American Mathematical Society.

Banerjee, A., Merugu, S., Dhillon, I. S., & Ghosh, J. (2005). Clustering with Bregman divergences. Journal of Machine Learning Research, 6, 1705–1749.

Basseville, M. & Nikiforov, V. (1993). Detection of abrupt changes: Theory and application. Englewood Cliffs, NJ, USA: Prentice-Hall, Inc.

Berry, M. W., Browne, M., Langville, A., Pauca, V. P., & Plemmons, R. J. (2007). Algorithms and applications for approximate nonnegative matrix factorization. Computational Statistics & Data Analysis, 52(1), 155–173.

Bertin, N., Badeau, R., & Vincent, E. (2010). Enforcing harmonicity and smoothness in Bayesian non-negative matrix factorization applied to polyphonic music transcription. IEEE Transactions on Audio, Speech and Language Processing, 18(3), 538–549.

Bertin, N., Févotte, C., & Badeau, R. (2009). A tempering approach for Itakura-Saito non-negative matrix factorization. With application to music transcription. In Proc. of the IEEE 34th Int. Conf. on Acoustics, Speech and Signal Processing (pp. 1545–1548). Taipei, Taiwan.

Cayton, L. (2008). Fast nearest neighbor retrieval for Bregman divergences. In Proceedings of the 25th International Conference on Machine Learning, volume 307. Helsinki, Finland.

Cayton, L. (2009). Efficient Bregman range search. In Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, & A. Culotta (Eds.), Advances in Neural Information Processing Systems, volume 22 (pp. 243–251). Curran Associates, Inc.

Cheng, C.-C., Hu, D. J., & Saul, L. K. (2008). Nonnegative matrix factorization for real time musical analysis and sight-reading evaluation. In Proc. of the IEEE 33rd Int. Conf. on Acoustics, Speech and Signal Processing (pp. 2017–2020). Las Vegas, NV, USA.

Chentsov, N. N. (1982). Statistical decision rules and optimal inference, volume 53 of Translations of Mathematical Monographs. American Mathematical Society.

Cichocki, A., Zdunek, R., Phan, A. H., & Amari, S.-i. (2009). Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-way Data Analysis and Blind Source Separation. Wiley-Blackwell.

Cont, A. (2006). Realtime multiple pitch observation using sparse non-negative constraints. In Proc. of the 7th Int. Conf. on Music Information Retrieval. Victoria, Canada.

Cont, A., Dubnov, S., & Wessel, D. (2007). Realtime multiple-pitch and multiple-instrument recognition for music signals using sparse non-negative constraints. In Proc. of the 10th Int. Conf. on Digital Audio Effects. Bordeaux, France.

Eggert, J. & Körner, E. (2004). Sparse coding and NMF. In Proc. of the IEEE Int. Joint Conf. on Neural Networks (pp. 2529–2533). Budapest, Hungary.

Eguchi, S. & Kano, Y. (2001). Robustifying maximum likelihood estimation. Technical report, Institute of Statistical Mathematics, Tokyo, Japan.

Févotte, C., Bertin, N., & Durrieu, J.-L. (2009). Nonnegative matrix factorization with the Itakura-Saito divergence: with application to music analysis. Neural Computation, 21(3), 793–830.

Garcia, V., Nielsen, F., & Nock, R. (2009). Levels of details for Gaussian mixture models. In Proceedings of the 9th Asian Conference on Computer Vision, ACCV 2009 (pp. 514–525). Xi’an, China.

Heiler, M. & Schnörr, C. (2005). Learning non-negative sparse image codes by convex programming. In Proc. of the IEEE 10th Int. Conf. on Computer Vision. Beijing, China.

Heiler, M. & Schnörr, C. (2006). Learning sparse representations by non-negative matrix factorization and sequential cone programming. J. of Machine Learning Research, 7, 1385–1407.

Hoyer, P. O. (2002). Non-negative sparse coding. In Proc. of the IEEE 12th Workshop on Neural Networks for Signal Processing (pp. 557–565). Martigny, Switzerland.

Hoyer, P. O. (2004). Non-negative matrix factorization with sparseness constraints. J. of Machine Learning Research, 5, 1457–1469.

Karvanen, J. & Cichocki, A. (2003). Measuring sparseness of noisy signals. In Proc. of the 4th Int. Symposium on Independent Component Analysis and Blind Signal Separation (pp. 125–130). Nara, Japan.

Kompass, R. (2007). A generalized divergence measure for nonnegative matrix factorization. Neural Computation, 19(3), 780–791.

Lee, D. D. & Seung, H. S. (1999). Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755), 788–791.

Mysore, G. J. (2010). A Non-negative Framework for Joint Modeling of Spectral Structure and Temporal Dynamics in Sound Mixtures. PhD thesis, Stanford University, Palo Alto, CA, USA.

Niedermayer, B. (2008). Non-negative matrix division for the automatic transcription of polyphonic music. In Proc. of the 9th Int. Conf. on Music Information Retrieval (pp. 544–549). Philadelphia, PA, USA.

Nielsen, F. & Nock, R. (2009). Sided and symmetrized Bregman centroids. IEEE Transactions on Information Theory, 55(6), 2882–2904.

Nielsen, F., Piro, P., & Barlaud, M. (2009). Tailored Bregman ball trees for effective nearest neighbors. In Proc. of the 25th European Workshop on Computational Geometry (EuroCG) (pp. 29–32). Brussels, Belgium.

O’Grady, P. D. & Pearlmutter, B. A. (2008). Discovering speech phones using convolutive non-negative matrix factorisation with a sparseness constraint. Neurocomputing, 72, 88–101.

Raczyński, S. A., Ono, N., & Sagayama, S. (2007). Harmonic nonnegative matrix approximation for multipitch analysis of musical sounds. In Proc. of ASJ Autumn Meeting (pp. 827–830).

Rao, C. R. (1945). Information and accuracy attainable in the estimation of statistical parameters. Bulletin of the Calcutta Mathematical Society, 37, 81–91.

Sha, F. & Saul, L. K. (2005). Real-time pitch determination of one or more voices by nonnegative matrix factorization. In Advances in Neural Information Processing Systems: Proc. of the 2004 Conf., volume 17 (pp. 1233–1240). Cambridge, MA, USA.

Smaragdis, P. (2004). Non-negative matrix factor deconvolution; extraction of multiple sound sources from monophonic inputs. In Proc. of the 5th Int. Conf. on Independent Component Analysis and Blind Signal Separation, volume 3195 (pp. 494–499). Granada, Spain.

Smaragdis, P. & Brown, J. C. (2003). Non-negative matrix factorization for polyphonic music transcription. In Proc. of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (pp. 177–180). New Paltz, NY, USA.

Vincent, E., Bertin, N., & Badeau, R. (2010). Adaptive harmonic spectral decomposition for multiple pitch estimation. IEEE Transactions on Audio, Speech, and Language Processing, 18(3).

Virtanen, T. (2007). Monaural sound source separation by nonnegative matrix factorization with temporal continuity and sparseness criteria. IEEE Transactions on Audio, Speech, and Language Processing, 15(3), 1066–1074.

Virtanen, T. & Klapuri, A. (2006). Analysis of polyphonic audio using source-filter model and non-negative matrix factorization. In Advances in Models for Acoustic Processing, Neural Information Processing Systems Workshop.

Zdunek, R. & Cichocki, A. (2008). Nonnegative matrix factorization with quadratic programming. Neurocomputing, 71(10–12), 2309–2320.
