Recommending Music with Waveform-based Architectures
ORIOL NIETO
GLOBAL BIG DATA CONFERENCE SANTA CLARA, CA JAN 21, 2019
@urinieto Pandora Confidential
OUTLINE
Background: Collaborative Filtering
Music Recommendation
Demo
Collaborative Filtering
RECOMMENDING “POPULAR” ITEMS
[Figure: sparse matrix of Users and Items (Tracks); unknown entries are marked "?"]
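Collaborative Filtering
PROBLEM OVERVIEW
[Figure: the sparse matrix of Users and Items (Tracks), with unknown entries marked "?", is approximated (≈) by the product of two factor matrices with inner dimension k, one for Users and one for Items (Tracks)]
Koren, Y., Bell, R., & Volinsky, C. (2009). Matrix Factorization Techniques for Recommender Systems. Computer, 42(8), 42–49.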
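The factorization in the figure can be illustrated with a small sketch. This is not the production system: the toy ratings, the hyperparameters, and the names `factorize` and `predict` are all assumptions made for illustration. It fits user and item factors of dimension k to the observed entries with stochastic gradient descent, in the general spirit of Koren et al. (2009).

```python
import random


def factorize(ratings, n_users, n_items, k=2, lr=0.05, reg=0.01,
              epochs=1000, seed=0):
    """Fit k-dimensional user and item factors to observed ratings.

    ratings is a list of (user, item, value) triples; every other cell
    of the matrix is unknown (the "?" entries) and is simply skipped.
    """
    rng = random.Random(seed)
    users = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    items = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, value in ratings:
            err = value - sum(users[u][f] * items[i][f] for f in range(k))
            for f in range(k):
                uf, vf = users[u][f], items[i][f]
                # Gradient step on squared error with L2 regularization.
                users[u][f] += lr * (err * vf - reg * uf)
                items[i][f] += lr * (err * uf - reg * vf)
    return users, items


def predict(users, items, u, i):
    """Predicted rating: dot product of the user and item factors."""
    return sum(a * b for a, b in zip(users[u], items[i]))


# Toy data (hypothetical): 3 users, 3 tracks, 6 of 9 cells observed.
observed = [(0, 0, 5.0), (0, 1, 4.0), (1, 0, 4.0),
            (1, 2, 1.0), (2, 1, 1.0), (2, 2, 5.0)]
user_factors, item_factors = factorize(observed, n_users=3, n_items=3)

mse = sum((v - predict(user_factors, item_factors, u, i)) ** 2
          for u, i, v in observed) / len(observed)
print("training MSE:", round(mse, 4))
print("estimate for unknown cell (0, 2):",
      round(predict(user_factors, item_factors, 0, 2), 2))
```

Only the observed cells drive the updates, which is why an item nobody has rated never receives a meaningful factor vector.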
Collaborative Filtering
LATENT FACTORS
[Figure: tracks placed in a two-dimensional latent space; example axes: Calm to Aggressive, Simple Harmony to Complex Harmony]
Collaborative Filtering
THE GOOD AND THE BAD
The good:
• Rich preference-driven similarity space
• Powerful at matching the right song with the right listener
The bad:
• Latent space is generally not interpretable
• Can only recommend items that have already been rated (what about long tail content?)
OUTLINE
Background: Collaborative Filtering
Music Recommendation
Demo
Music Recommendation
WITH COLLABORATIVE SONG FACTORS
[Figure: the sparse matrix of Users and Items (Tracks) approximated (≈) by the product of two factor matrices with inner dimension k]
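One way to turn per-track factors into recommendations is to rank tracks by similarity to a query track. The sketch below assumes cosine similarity as the ranking score; the slides do not state which score is used. The factor values and track identifiers are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two factor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_similar(query_id, factors, top_n=4):
    """Return the top_n track ids most similar to the query track."""
    query = factors[query_id]
    scored = [(cosine_similarity(query, vec), track_id)
              for track_id, vec in factors.items() if track_id != query_id]
    scored.sort(reverse=True)  # highest similarity first
    return [track_id for _, track_id in scored[:top_n]]


# Hypothetical 3-dimensional song factors.
song_factors = {
    "track_a": [0.9, 0.1, 0.3],
    "track_b": [0.8, 0.2, 0.4],
    "track_c": [-0.5, 0.9, 0.0],
    "track_d": [0.5, 0.0, 0.5],
}
ranking = rank_similar("track_a", song_factors, top_n=2)
print(ranking)  # ['track_b', 'track_d']
```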
Music Recommendation
WITH COLLABORATIVE SONG FACTORS
[Figure: the same factorization applied to a sparse matrix of Seeds and Songs, again with inner dimension k]
Collaborative Filtering
EXAMPLE
             Artist        Title
Query Track  Journey       Don’t Stop Believing
Ranked 1     The Outfield  Your Love
Ranked 2     Eagles        Hotel California
Ranked 3     Survivor      Eye Of The Tiger
Ranked 4     Queen         We Will Rock You
The Music Genome Project
LARGE-SCALE HUMAN ANNOTATED DATASET
Attribute examples: Breathy Voice, Nasal Voice, Odd Meter, Has Banjo, Joyful Lyrics, …
Up to ~400 attributes per track
Music Genome Project
EXAMPLE
             Artist              Title
Query Track  Journey             Don’t Stop Believing
Ranked 1     Journey             Stone In Love
Ranked 2     Jefferson Starship  Find Your Way Back
Ranked 3     David Bowie         Teenage Wildlife
Ranked 4     Thriving Ivory      On Your Side
Estimating Latent Factors
[Figure: the sparse matrix of Users and Items (Tracks) approximated (≈) by the product of two factor matrices with inner dimension k]
Estimating Latent Factors
FROM THE MUSIC GENOME PROJECT
[Figure: a model f(x; θ) built from dense layers (recovered labels: 2048, 2048, k) whose output y has k dimensions. Image source: http://blogs-images.forbes.com/kevinmurnane/files/2016/03/google-deepmind-artificial-intelligence-2-970x0-970x646.jpg]
Estimating Latent Factors
DATA AND OPTIMIZATION
• Data set: {X, Y}_M
• ~900k tracks (from the “head”)
• Loss: L(θ)
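The setup above can be sketched as a small fully connected network that maps an attribute vector x to k estimated factors, scored against the collaborative factors y with cosine distance. This is a sketch under assumptions: the slides label the hidden layers 2048, but 16 units are used here to keep the example small; the ReLU activations, the value of k, and the input size are hypothetical; and treating cosine distance as the training loss is an assumption, since the slides only report it as a metric.

```python
import math
import random

HIDDEN = 16        # the slides label these layers 2048; reduced for the sketch
K = 4              # hypothetical number of latent factors
N_ATTRIBUTES = 10  # hypothetical input size (the MGP has up to ~400 attributes)


def init_layer(n_in, n_out, rng):
    """Random weights scaled by 1/sqrt(n_in), zero biases."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, weights, biases, relu):
    """One fully connected layer, optionally followed by ReLU."""
    out = [sum(w * xi for w, xi in zip(row, x)) + b
           for row, b in zip(weights, biases)]
    return [max(0.0, v) for v in out] if relu else out


def predict_factors(x, layers):
    """f(x; theta): ReLU hidden layers, linear k-dimensional output."""
    h = x
    for index, (weights, biases) in enumerate(layers):
        h = dense(h, weights, biases, relu=index < len(layers) - 1)
    return h


def cosine_distance(a, b):
    """1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 if norm == 0.0 else 1.0 - dot / norm


rng = random.Random(0)
layers = [init_layer(N_ATTRIBUTES, HIDDEN, rng),
          init_layer(HIDDEN, HIDDEN, rng),
          init_layer(HIDDEN, K, rng)]
x = [rng.random() for _ in range(N_ATTRIBUTES)]   # stand-in attribute vector
y_true = [rng.uniform(-1, 1) for _ in range(K)]   # stand-in target factors
y_pred = predict_factors(x, layers)
distance = cosine_distance(y_pred, y_true)
print("cosine distance of the untrained model:", round(distance, 3))
```

Cosine distance ignores vector length, so it measures only whether the estimated factors point in the same direction as the collaborative ones.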
Input  Cos Distance  # Epochs  Time / Epoch
MGP    0.30          15        ~4m
Latent Factor Estimations
WITH THE MUSIC GENOME PROJECT
             Artist     Title
Query Track  Journey    Don’t Stop Believing
Ranked 1     Journey    Stone In Love
Ranked 2     D Drive    A Little Bitta Sunshine
Ranked 3     John Parr  Naughty, Naughty
Ranked 4     Kiss       Turn On The Night
Machine Listening
ESTIMATING THE MUSIC GENOME PROJECT
Pons, J., et al. ISMIR 2018
Estimating Latent Factors
FROM THE MUSIC GENOME PROJECT ESTIMATIONS
[Figure: dense-layer model with a k-dimensional output; recovered labels: N, 2048, 2048, 2048, k, Dense Layers. Image source: http://blogs-images.forbes.com/kevinmurnane/files/2016/03/google-deepmind-artificial-intelligence-2-970x0-970x646.jpg]
Estimating Latent Factors
RESULTS
Input            Cos Distance  # Epochs  Time / Epoch
MGP              0.30          15        ~4m
MGP Estimations  0.44          21        ~4m
Latent Factor Estimations
WITH MACHINE LISTENING ATTRIBUTES
             Artist         Title
Query Track  Journey        Don’t Stop Believing
Ranked 1     Dean Friedman  Don’t You Ever Dare
Ranked 2     James Taylor   Stand And Fight
Ranked 3     The Dingoes    Starting Today
Ranked 4     Chuck Girad    The Days Are Young
Estimating Latent Factors
FROM AUDIO
[Figure: model with a k-dimensional output. Image source: http://blogs-images.forbes.com/kevinmurnane/files/2016/03/google-deepmind-artificial-intelligence-2-970x0-970x646.jpg]
Oord, A. Van Den, Dieleman, S., & Schrauwen, B. (2013). Deep Content-based Music Recommendation. Advances in Neural Information Processing Systems, 2643–2651.
Estimating Latent Factors
FROM RAW WAVEFORMS
[Figure: a model f(x; θ) whose output y has k dimensions. Image source: http://blogs-images.forbes.com/kevinmurnane/files/2016/03/google-deepmind-artificial-intelligence-2-970x0-970x646.jpg]
Layer labels recovered from the diagram, in extraction order:
Conv1D 64 x 3
Conv1D 64 x 3, MP 3
Conv1D 64 x 3, MP 3
Conv1D 128 x 3, MP 3
Conv1D 256 x 3
Conv1D 512 x 7
Conv1D 512 x 7
Conv1D 512 x 7
Other labels: Dense Layers, Auto Pooling
https://github.com/jordipons/music-audio-tagging-at-scale-models
Lee, J., et al., 2018
McFee, B., et al., 2018
Estimating Latent Factors
DATA AND OPTIMIZATION
• Data set: {X, Y}_M
• ~900k tracks (from the “head”)
• 16kHz - 16 bit waveforms
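The building blocks named in the waveform architecture can be sketched on a toy signal. This is a single-channel illustration, not the model from the talk. It assumes "MP" stands for max pooling, and that "Auto Pooling" refers to softmax-weighted adaptive pooling in the spirit of McFee et al. (2018), where a parameter alpha moves the operator between mean pooling (alpha = 0) and max pooling (large alpha). The function names and the toy signal are hypothetical.

```python
import math

SAMPLE_RATE = 16_000   # 16kHz waveforms, as stated on the slide
PATCH_SECONDS = 15     # patch length listed in the results table
patch_samples = SAMPLE_RATE * PATCH_SECONDS  # samples per input patch


def conv1d(signal, kernel):
    """Valid 1-D cross-correlation, which is what Conv1D layers compute."""
    n = len(kernel)
    return [sum(signal[t + j] * kernel[j] for j in range(n))
            for t in range(len(signal) - n + 1)]


def max_pool(signal, size):
    """Non-overlapping max pooling with window and stride equal to size."""
    return [max(signal[t:t + size])
            for t in range(0, len(signal) - size + 1, size)]


def auto_pool(values, alpha):
    """Softmax-weighted average over time, with weights exp(alpha * value)."""
    shift = max(alpha * v for v in values)  # subtract the max for stability
    weights = [math.exp(alpha * v - shift) for v in values]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


toy_signal = [0, 1, 0, -1, 0, 1, 0, -1, 0]
filtered = conv1d(toy_signal, [1, 0, -1])   # [0, 2, 0, -2, 0, 2, 0]
pooled = max_pool(filtered, 3)              # [2, 2]

print(patch_samples)                 # 240000
print(filtered, pooled)
print(auto_pool([1.0, 2.0, 3.0], 0.0))    # 2.0, the mean
print(auto_pool([1.0, 2.0, 3.0], 50.0))   # close to 3.0, the max
```

A 15 s patch at 16kHz holds 240,000 samples, which is why the stack needs repeated pooling before the dense layers.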
Estimating Latent Factors
RESULTS
Input                      Cos Distance  # Epochs  Time / Epoch
MGP                        0.30          15        ~4m
MGP Estimations            0.44          21        ~4m
Spectrogram (35s patches)  0.37          22        ~2h
Waveform (15s patches)     0.34          9         ~5h
Latent Factor Estimations
WITH WAVEFORMS
             Artist            Title
Query Track  Journey           Don’t Stop Believing
Ranked 1     The Ga es Band    The Final Countdown
Ranked 2     Survivor          Backstreet Love Affair
Ranked 3     Toto              Angel Don’t Cry
Ranked 4     Orion The Hunter  Dark And Stormy
Estimating Latent Factors
COMBINING MODELS
[Figure: model with a k-dimensional output. Image source: http://blogs-images.forbes.com/kevinmurnane/files/2016/03/google-deepmind-artificial-intelligence-2-970x0-970x646.jpg]
Estimating Latent Factors
LATE-FUSION ARCHITECTURE
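[Figure: two branches shown side by side, a Conv1D and max-pooling stack and a dense branch with input size N, each with a k-dimensional output. Partial credit visible in the figure: "Pons, J., McFee,". Image source (truncated in the original): http://blogs-images.forbes.com/kevinmurnane/files/2016/03/google-deepmind-]
Oramas, S., et al. 2018
Estimating Latent Factors
LATE-FUSION ARCHITECTURE
[Figure: dense layers that combine the two branches; recovered labels: 2048, 2048, 2048, k, 2048 + 1024, Dense Layers]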
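Late fusion can be sketched as concatenating the representations produced by the two branches and passing the result through further dense layers. The sketch assumes that the label "2048 + 1024" denotes such a concatenation; the slides do not spell this out. All sizes are shrunk, and the names `late_fusion`, `AUDIO_DIM`, and `ATTR_DIM` are hypothetical.

```python
import random

AUDIO_DIM = 8   # stands in for the larger branch output (2048 on the slide)
ATTR_DIM = 4    # stands in for the smaller branch output (1024 on the slide)
HIDDEN = 16     # fusion layers are labeled 2048 on the slide
K = 3           # hypothetical number of latent factors


def init_layer(n_in, n_out, rng):
    """Random weights scaled by 1/sqrt(n_in), zero biases."""
    scale = n_in ** -0.5
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, weights, biases, relu=True):
    """One fully connected layer, optionally followed by ReLU."""
    out = [sum(w * xi for w, xi in zip(row, x)) + b
           for row, b in zip(weights, biases)]
    return [max(0.0, v) for v in out] if relu else out


def late_fusion(audio_features, attribute_features, layers):
    """Concatenate both branch outputs, then apply the fusion layers."""
    h = list(audio_features) + list(attribute_features)
    for index, (weights, biases) in enumerate(layers):
        h = dense(h, weights, biases, relu=index < len(layers) - 1)
    return h


rng = random.Random(0)
fusion_layers = [init_layer(AUDIO_DIM + ATTR_DIM, HIDDEN, rng),
                 init_layer(HIDDEN, HIDDEN, rng),
                 init_layer(HIDDEN, K, rng)]
audio_features = [rng.random() for _ in range(AUDIO_DIM)]      # from the waveform branch
attribute_features = [rng.random() for _ in range(ATTR_DIM)]   # from the attribute branch
fused_factors = late_fusion(audio_features, attribute_features, fusion_layers)
print(len(fused_factors))  # 3
```

Because the branches are combined only at their outputs, each one can be trained separately, so training the fusion layers involves only dense layers.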
Oramas, S., et al. 2018
Estimating Latent Factors
RESULTS
Input                       Cos Distance  # Epochs  Time / Epoch
MGP                         0.30          15        ~4m
MGP Estimations             0.44          21        ~4m
Spectrogram (35s patches)   0.37          22        ~2h
Waveform (15s patches)      0.34          9         ~5h
Waveform + MGP Estimations  0.32          20        ~4m
Latent Factor Estimations
WITH AUDIO + MACHINE LISTENING ATTRIBUTES
             Artist           Title
Query Track  Journey          Don’t Stop Believing
Ranked 1     Patrick Simmons  Knocking at your Door
Ranked 2     Night Ranger     When You Close Your Eyes
Ranked 3     Prism            Young And Restless
Ranked 4     The Front        The Truth Hurts
CONCLUSIONS
Collaborative filtering is powerful at recommending popular music
Deep architectures are effective at recommending undiscovered/new music
Waveform architectures outperform Spectral ones when enough music data are available
DEMO TIME
References
McFee, B., Salamon, J., Bello, J. P., Adaptive pooling operators for weakly labeled sound event detection, IEEE Transactions on Audio, Speech and Language Processing, 2018.
Koren, Y., Bell, R., & Volinsky, C., Matrix Factorization Techniques for Recommender Systems. Computer, 42(8), 42–49, 2009.
Lee, J., Park, J., Kim, K. L., & Nam, J., SampleCNN: End-to-end Deep Convolutional Neural Networks Using Very Small Filters for Music Classification. Applied Sciences, 8(1):150, 2018.
Oord, A. Van Den, Dieleman, S., & Schrauwen, B., Deep Content-based Music Recommendation. Advances in Neural Information Processing Systems, 2643–2651, 2013.
Oramas, S., Barbieri, F., Nieto, O., Serra, X., Multimodal Deep Learning for Music Genre Classification. Transactions of the International Society for Music Information Retrieval (TISMIR), 2018.
Pons, J., Nieto, O., Prockup, M., Schmidt, E., Ehmann, A., Serra, X., End-to-End Learning for Music Audio Tagging at Scale. Proc. of the 19th International Society for Music Information Retrieval Conference (ISMIR). Paris, France, 2018.
Thanks!
ORIOL NIETO
@urinieto