Recommending Music with Waveform-based Architectures

ORIOL NIETO

GLOBAL BIG DATA CONFERENCE · SANTA CLARA, CA · JAN 21, 2019

@urinieto · Pandora Confidential

OUTLINE

Background: Collaborative Filtering

Music Recommendation

Demo

Collaborative Filtering: RECOMMENDING “POPULAR” ITEMS

[Figure: a sparse Users × Items (Tracks) matrix in which “?” marks cells with no observed rating]

Collaborative Filtering: PROBLEM OVERVIEW

[Figure: the sparse Users × Items (Tracks) matrix is approximated (≈) by the product of a Users × k matrix of user factors and a k × Items (Tracks) matrix of item factors]

Koren, Y., Bell, R., & Volinsky, C. (2009). Matrix Factorization Techniques for Recommender Systems. Computer, 42(8), 42–49.
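A minimal sketch of this factorization, in the spirit of Koren et al. (2009): user factors U and item factors V are fitted to the observed cells only, by stochastic gradient descent on squared error with L2 regularization, and any “?” cell can then be filled in with the dot product of the corresponding factor vectors. The function names, hyperparameters, and toy ratings are illustrative assumptions, not Pandora's production setup.

```python
import random

def factorize(ratings, n_users, n_items, k=2, lr=0.05, reg=0.01, epochs=1000, seed=0):
    """Approximate a sparse ratings matrix R (n_users x n_items) by U V^T.

    ratings: (user, item, value) triples for the observed cells only;
    the unobserved "?" cells are simply absent from the list.
    Returns U (n_users x k user factors) and V (n_items x k item factors).
    """
    rng = random.Random(seed)
    U = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    V = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            err = r - sum(U[u][f] * V[i][f] for f in range(k))
            for f in range(k):
                uf, vf = U[u][f], V[i][f]
                # SGD step on squared error plus an L2 penalty on both factors.
                U[u][f] += lr * (err * vf - reg * uf)
                V[i][f] += lr * (err * uf - reg * vf)
    return U, V

def predict(U, V, u, i):
    """Estimated rating for user u and item i: the dot product of their factors."""
    return sum(a * b for a, b in zip(U[u], V[i]))

# Toy data: 3 users x 4 tracks; every cell not listed here is a "?".
observed = [(0, 0, 5.0), (0, 1, 4.0), (1, 0, 4.0), (1, 2, 1.0), (2, 2, 5.0), (2, 3, 4.0)]
U, V = factorize(observed, n_users=3, n_items=4)
print(round(predict(U, V, 0, 0), 2))  # should be close to the observed 5.0
print(round(predict(U, V, 0, 3), 2))  # a filled-in "?" cell
```

Only observed cells contribute to the loss, which is why the method cannot say anything about a track nobody has rated yet.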

Collaborative Filtering: LATENT FACTORS

[Figure: tracks positioned in a two-dimensional latent space, with one axis running from Simple Harmony to Complex Harmony and the other from Calm to Aggressive]

Collaborative Filtering: THE GOOD AND THE BAD

The good:
• Rich preference-driven similarity space
• Powerful at matching the right song with the right listener

The bad:
• Latent space is generally not interpretable
• Can only recommend items that have already been rated (what about long-tail content?)

Music Recommendation: WITH COLLABORATIVE SONG FACTORS

[Figure: the Users × Items (Tracks) factorization is recast for music recommendation as a Seeds × Songs matrix, approximated (≈) by the product of a Seeds × k matrix and a k × Songs matrix]
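With k-dimensional factors for every song, recommending from a seed becomes a nearest-neighbour search in the latent space. The sketch below assumes neighbours are ranked by cosine similarity (a natural fit for the cosine-distance loss used later in the talk, but still an assumption here); `rank_similar` and the toy factor vectors are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two factor vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank_similar(seed_id, factors, top_n=4):
    """Ids of the top_n songs closest to seed_id in latent space (seed excluded)."""
    seed = factors[seed_id]
    scored = [(cosine_similarity(seed, vec), song_id)
              for song_id, vec in factors.items() if song_id != seed_id]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [song_id for _, song_id in scored[:top_n]]

# Hypothetical song factors with k = 2.
song_factors = {"seed": [0.9, 0.1], "a": [0.8, 0.2], "b": [-0.5, 0.9], "c": [0.7, -0.1]}
print(rank_similar("seed", song_factors, top_n=2))  # ['a', 'c']
```

At catalog scale this exhaustive scan would normally be replaced by an approximate nearest-neighbour index, but the ranking criterion stays the same.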

Collaborative Filtering: EXAMPLE

             Artist              Title
Query Track  Journey             Don’t Stop Believin’
Ranked 1     The Outfield        Your Love
Ranked 2     Eagles              Hotel California
Ranked 3     Survivor            Eye Of The Tiger
Ranked 4     Queen               We Will Rock You

The Music Genome Project: LARGE-SCALE HUMAN-ANNOTATED DATASET

Attribute examples: Breathy Voice, Nasal Voice, Odd Meter, Has Banjo, Joyful Lyrics, …

Up to ~400 attributes per track

Music Genome Project: EXAMPLE

             Artist              Title
Query Track  Journey             Don’t Stop Believin’
Ranked 1     Journey             Stone In Love
Ranked 2     Jefferson Starship  Find Your Way Back
Ranked 3     David Bowie         Teenage Wildlife
Ranked 4     Thriving Ivory      On Your Side

Estimating Latent Factors

[Figure: the Users × Items (Tracks) factorization with k latent factors per track]

Estimating Latent Factors: FROM THE MUSIC GENOME PROJECT

[Figure: a neural network f(x; θ) maps a track's input x (its N Music Genome Project attributes) to an approximation of its k latent factors y, i.e. f(x; θ) ≈ y. Image: http://blogs-images.forbes.com/kevinmurnane/files/2016/03/google-deepmind-artificial-intelligence-2-970x0-970x646.jpg]

Architecture: Dense Layers, N → 2048 → 2048 → 2048 → k
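A plain forward pass through this dense architecture, as a sketch: each hidden layer is an affine map followed by ReLU, and the last layer outputs k values. ReLU, the He-style random initialization, and the toy widths are assumptions (the slide gives only the layer widths), and batch normalization and dropout are left out to keep the example short.

```python
import random

def init_mlp(sizes, seed=0):
    """Random (weights, biases) per layer for widths like [N, 2048, 2048, 2048, k]."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        std = (2.0 / n_in) ** 0.5  # He-style scale for ReLU layers (an assumption)
        W = [[rng.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((W, [0.0] * n_out))
    return layers

def forward(layers, x):
    """f(x; theta): ReLU after each hidden layer, linear output of size k."""
    h = list(x)
    for depth, (W, b) in enumerate(layers):
        z = [sum(w * v for w, v in zip(row, h)) + bias for row, bias in zip(W, b)]
        h = z if depth == len(layers) - 1 else [max(0.0, v) for v in z]
    return h

# Toy widths: N = 6 attributes, 8 hidden units instead of 2048, k = 3.
mlp = init_mlp([6, 8, 8, 8, 3])
y_hat = forward(mlp, [1.0, 0.0, 0.0, 1.0, 0.0, 1.0])
print(len(y_hat))  # 3
```

The output layer is left linear so that the predicted factors can take any sign, like the collaborative-filtering targets they approximate.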

Estimating Latent Factors: DATA AND OPTIMIZATION

• Data set: {X, Y}, ~900k tracks (from the popular “head” of the catalog)
• Loss: cosine distance, where M is the number of training pairs (x, y):

$$\mathcal{L}(\theta) = \frac{1}{M} \sum_{x \in X,\, y \in Y} \left( 1 - \frac{f(x;\theta)^\top y}{\lVert f(x;\theta) \rVert_2 \, \lVert y \rVert_2} \right)$$

• Dropout 10% in Dense Layers
• Batch Normalization in all layers
• Adam optimizer
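The loss transcribes directly: the average, over the M pairs, of one minus the cosine similarity between the predicted factors f(x; θ) and the target factors y. It is 0 when the two vectors point the same way, 1 when they are orthogonal, and 2 when they are opposite, so lower is better. This is an illustrative, framework-free version; `cosine_distance_loss` is a hypothetical name, and in practice the loss would be computed on mini-batches inside an autodiff framework and minimized with Adam.

```python
import math

def cosine_distance_loss(predictions, targets):
    """Mean over pairs of 1 - <f(x), y> / (||f(x)||_2 * ||y||_2).

    A zero vector makes the cosine undefined and raises ZeroDivisionError.
    """
    if len(predictions) != len(targets) or not predictions:
        raise ValueError("need the same, non-zero number of predictions and targets")
    total = 0.0
    for f_x, y in zip(predictions, targets):
        dot = sum(a * b for a, b in zip(f_x, y))
        norms = math.sqrt(sum(a * a for a in f_x)) * math.sqrt(sum(b * b for b in y))
        total += 1.0 - dot / norms
    return total / len(predictions)

preds = [[1.0, 0.0], [0.0, 2.0]]  # f(x; theta) for two tracks
targs = [[2.0, 0.0], [1.0, 0.0]]  # their collaborative-filtering factors y
print(cosine_distance_loss(preds, targs))  # 0.5 (same direction: 0, orthogonal: 1)
```

Because the loss ignores vector length, only the direction of the predicted factors matters, which matches ranking songs by cosine similarity.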

Estimating Latent Factors: RESULTS

Input                       Cos Distance  # Epochs  Time / Epoch
MGP                         0.30          15        ~4 min

(Lower cosine distance is better.)

Latent Factor Estimations: WITH THE MUSIC GENOME PROJECT

             Artist              Title
Query Track  Journey             Don’t Stop Believin’
Ranked 1     Journey             Stone In Love
Ranked 2     D Drive             A Little Bitta Sunshine
Ranked 3     John Parr           Naughty, Naughty
Ranked 4     Kiss                Turn On The Night

Machine Listening: ESTIMATING THE MUSIC GENOME PROJECT

Pons, J., et al. ISMIR 2018

Estimating Latent Factors: FROM THE MUSIC GENOME PROJECT ESTIMATIONS

[Figure: the same dense network f(x; θ), N → 2048 → 2048 → 2048 → k, with machine-listening estimates of the MGP attributes as input x instead of the human annotations]

Estimating Latent Factors: RESULTS

Input                       Cos Distance  # Epochs  Time / Epoch
MGP                         0.30          15        ~4 min
MGP Estimations             0.44          21        ~4 min

Latent Factor Estimations: WITH MACHINE LISTENING ATTRIBUTES

             Artist              Title
Query Track  Journey             Don’t Stop Believin’
Ranked 1     Dean Friedman       Don’t You Ever Dare
Ranked 2     James Taylor        Stand And Fight
Ranked 3     The Dingoes         Starting Today
Ranked 4     Chuck Girard        The Days Are Young

Estimating Latent Factors: FROM AUDIO

[Figure: a neural network estimating the k latent factors from audio]

Oord, A. Van Den, Dieleman, S., & Schrauwen, B. (2013). Deep Content-based Music Recommendation. Advances in Neural Information Processing Systems, 2643–2651.

Estimating Latent Factors: FROM RAW WAVEFORMS

[Figure: f(x; θ) ≈ y, where the input x is the raw audio waveform and y holds the k latent factors]

Architecture (input: raw waveform x; output: k-dimensional y):
• Conv1D layers (number of filters × filter size): 64 × 3, 64 × 3, 64 × 3, 128 × 3, 128 × 3, 128 × 3, 128 × 3, 128 × 3, 256 × 3, 512 × 7, 512 × 7, 512 × 7
• Max pooling (MP) of size 3 after five of the first eight Conv1D layers
• Auto Pooling over time
• Dense Layers: 1024 → 1024 → 1024 → k
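Auto pooling (McFee et al., 2018) summarizes a feature map over time with a softmax-weighted average controlled by a parameter α: α = 0 gives mean pooling, large positive α approaches max pooling, and large negative α approaches min pooling. The sketch below pools a single channel with a fixed α; inside a network α would be learned (typically one per channel), and `auto_pool` is a hypothetical name.

```python
import math

def auto_pool(values, alpha):
    """Softmax-weighted mean over time: weights proportional to exp(alpha * value)."""
    shift = max(alpha * v for v in values)  # subtract the max exponent for numerical stability
    weights = [math.exp(alpha * v - shift) for v in values]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

frames = [0.1, 0.4, 0.9, 0.2]   # one feature channel over 4 time steps
print(auto_pool(frames, 0.0))   # ≈ 0.4, the mean
print(auto_pool(frames, 50.0))  # ≈ 0.9, the max
```

Learning α lets the network decide per feature whether a brief event or the overall level across the whole patch matters more.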

Reference models: https://github.com/jordipons/music-audio-tagging-at-scale-models (sample-level front end: Lee, J., et al., 2018; auto pooling: McFee, B., et al., 2018)

Estimating Latent Factors: DATA AND OPTIMIZATION

• Data set: {X, Y}, ~900k tracks (from the popular “head” of the catalog)
• 16 kHz, 16-bit waveforms
• 3 patches of 15 seconds per track (~2.7M patches)
• Loss: cosine distance, where M is the number of training pairs (x, y):

$$\mathcal{L}(\theta) = \frac{1}{M} \sum_{x \in X,\, y \in Y} \left( 1 - \frac{f(x;\theta)^\top y}{\lVert f(x;\theta) \rVert_2 \, \lVert y \rVert_2} \right)$$

• Dropout 10% in Dense Layers
• Batch Normalization in all layers
• Adam optimizer
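The data figures are easy to check: at 16 kHz a 15-second patch holds 15 × 16,000 = 240,000 samples, and 3 patches for each of ~900k tracks gives ~2.7M patches. The slide does not say where in a track the patches are taken, so the sketch below simply assumes evenly spaced patches and zero-pads tracks shorter than one patch; `extract_patches` is hypothetical.

```python
SAMPLE_RATE = 16_000        # Hz (16 kHz, as on the slide)
PATCH_SECONDS = 15          # seconds per patch
PATCHES_PER_TRACK = 3

def extract_patches(waveform, n_patches=PATCHES_PER_TRACK,
                    patch_len=PATCH_SECONDS * SAMPLE_RATE):
    """Cut n_patches windows of patch_len samples from one track.

    Assumption: windows are evenly spaced from the start to the end of the
    track; tracks shorter than one patch are zero-padded first.
    """
    samples = list(waveform)
    if len(samples) < patch_len:
        samples += [0.0] * (patch_len - len(samples))
    last_start = len(samples) - patch_len
    if n_patches == 1:
        starts = [last_start // 2]
    else:
        starts = [round(i * last_start / (n_patches - 1)) for i in range(n_patches)]
    return [samples[s:s + patch_len] for s in starts]

print(PATCH_SECONDS * SAMPLE_RATE)      # 240000 samples per patch
print(900_000 * PATCHES_PER_TRACK)      # 2700000, i.e. ~2.7M patches

track = [0.0] * (40 * SAMPLE_RATE)      # a hypothetical 40-second silent track
patches = extract_patches(track)
print(len(patches), len(patches[0]))    # 3 240000
```

Each patch inherits its track's latent factors y as the training target, which is how ~900k labelled tracks become ~2.7M training examples.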

Estimating Latent Factors: RESULTS

Input                       Cos Distance  # Epochs  Time / Epoch
MGP                         0.30          15        ~4 min
MGP Estimations             0.44          21        ~4 min
Spectrogram (35 s patches)  0.37          22        ~2 h
Waveform (15 s patches)     0.34          9         ~5 h

Latent Factor Estimations: WITH WAVEFORMS

             Artist              Title
Query Track  Journey             Don’t Stop Believin’
Ranked 1     The Gaes Band       The Final Countdown
Ranked 2     Survivor            Backstreet Love Affair
Ranked 3     Toto                Angel Don’t Cry
Ranked 4     Orion The Hunter    Dark And Stormy

Estimating Latent Factors: COMBINING MODELS

[Figure: combining the models above into a single network that predicts the k latent factors]

Estimating Latent Factors: LATE-FUSION ARCHITECTURE

[Figure: the waveform branch (Conv1D and max-pooling layers) and the MGP-estimations branch (N inputs) run in parallel; their internal representations (2048 + 1024 values) are concatenated and passed through Dense Layers of 2048, 2048 and 2048 units to the k-dimensional output]

Oramas, S., et al. 2018
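The fusion step itself is concatenation followed by dense layers. The sketch below assumes the two branch embeddings are already computed and uses toy sizes in place of 2048 + 1024; reading the 2048 part as the MGP-estimations branch and the 1024 part as the waveform branch follows the layer widths on the earlier slides but is an assumption, as are the function names and weights. Only one linear output layer is shown instead of the full 2048 → 2048 → 2048 → k stack.

```python
def concat_embeddings(mgp_embedding, audio_embedding):
    """Late fusion: join the two branch embeddings into one vector."""
    return list(mgp_embedding) + list(audio_embedding)

def dense(x, weights, bias, relu=True):
    """One fully connected layer: W x + b, with optional ReLU."""
    out = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]
    return [max(0.0, v) for v in out] if relu else out

# Toy sizes standing in for 2048 (MGP branch) + 1024 (waveform branch); k = 2.
mgp_emb = [0.5, 1.0, 0.0]
audio_emb = [2.0, 1.0]
fused = concat_embeddings(mgp_emb, audio_emb)  # length 3 + 2 = 5
W_out = [[1.0, 0.0, 0.0, 1.0, 0.0],
         [0.0, 1.0, 0.0, 0.0, -1.0]]
b_out = [0.0, 0.5]
print(dense(fused, W_out, b_out, relu=False))  # [2.5, 0.5]
```

Fusing late, on learned representations rather than raw inputs, lets each branch keep its own input-specific architecture while the shared dense layers learn how to weigh them.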

Estimating Latent Factors: RESULTS

Input                       Cos Distance  # Epochs  Time / Epoch
MGP                         0.30          15        ~4 min
MGP Estimations             0.44          21        ~4 min
Spectrogram (35 s patches)  0.37          22        ~2 h
Waveform (15 s patches)     0.34          9         ~5 h
Waveform + MGP Estimations  0.32          20        ~4 min

Latent Factor Estimations: WITH AUDIO + MACHINE LISTENING ATTRIBUTES

             Artist              Title
Query Track  Journey             Don’t Stop Believin’
Ranked 1     Patrick Simmons     Knocking At Your Door
Ranked 2     Night Ranger        When You Close Your Eyes
Ranked 3     Prism               Young And Restless
Ranked 4     The Front           The Truth Hurts

CONCLUSIONS

Collaborative filtering is powerful at recommending popular music

Deep architectures are effective at recommending undiscovered/new music

Waveform architectures outperform spectrogram-based ones when enough music data are available

DEMO TIME

References

Koren, Y., Bell, R., & Volinsky, C. (2009). Matrix Factorization Techniques for Recommender Systems. Computer, 42(8), 42–49.

Lee, J., Park, J., Kim, K. L., & Nam, J. (2018). SampleCNN: End-to-End Deep Convolutional Neural Networks Using Very Small Filters for Music Classification. Applied Sciences, 8(1), 150.

McFee, B., Salamon, J., & Bello, J. P. (2018). Adaptive Pooling Operators for Weakly Labeled Sound Event Detection. IEEE Transactions on Audio, Speech and Language Processing.

Oord, A. Van Den, Dieleman, S., & Schrauwen, B. (2013). Deep Content-based Music Recommendation. Advances in Neural Information Processing Systems, 2643–2651.

Oramas, S., Barbieri, F., Nieto, O., & Serra, X. (2018). Multimodal Deep Learning for Music Genre Classification. Transactions of the International Society for Music Information Retrieval (TISMIR).

Pons, J., Nieto, O., Prockup, M., Schmidt, E., Ehmann, A., & Serra, X. (2018). End-to-End Learning for Music Audio Tagging at Scale. Proc. of the 19th International Society for Music Information Retrieval Conference (ISMIR), Paris, France.

Thanks!