Recommending Music with Waveform-based Architectures at Scale
ORIOL NIETO
SEMINAR SERIES IN DATA SCIENCE UNIVERSITY OF SAN FRANCISCO FEB 1, 2019
Pandora Confidential OUTLINE
Motivation: The Long Tail
Background: Collaborative Filtering
Music Recommendation
Demo
The Long Tail
[Figure, three builds: Most Popular Tracks, zooming out from the top 0.01 % to the top 1 % to 100 % of the catalog; the 100 % view is annotated "0 spins last week"]
Collaborative Filtering
RECOMMENDING POPULAR MUSIC
[Figure: sparse ratings matrix, Items (Tracks) by Users; "?" marks unknown ratings]
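Collaborative Filtering
PROBLEM OVERVIEW
[Figure: the sparse Items (Tracks) x Users ratings matrix, with "?" for unknown entries, is approximated (≈) by the product of an Items (Tracks) x k matrix and a k x Users matrix]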
Koren, Y., Bell, R., & Volinsky, C. (2009). Matrix Factorization Techniques for Recommender Systems. Computer, 42(8), 42–49.
Collaborative Filtering
PROBLEM FORMULATION
[Figure: Items (Tracks) x Users ratings matrix ≈ (Items (Tracks) x k) times (k x Users), with $q_i$, $p_u$ and $r_{iu}$ marked]
Given Item i and User u:
Rating: $r_{iu}$
Item Latent Factor: $q_i \in \mathbb{R}^k$
User Latent Factor: $p_u \in \mathbb{R}^k$
Rating Approximation: $\hat{r}_{iu} = q_i^T p_u$
$$\operatorname*{argmin}_{q^*, p^*} \sum_{(u,i) \in S} (r_{iu} - q_i^T p_u)^2 + \lambda (\|q_i\|^2 + \|p_u\|^2)$$
where $S$ is the set of (u, i) pairs with known ratings and $\lambda$ is the regularization weight.
Koren, Y., Bell, R., & Volinsky, C. (2009). Matrix Factorization Techniques for Recommender Systems. Computer, 42(8), 42–49.
Collaborative Filtering
LATENT FACTORS
[Figure: two-dimensional latent space with axes labeled Calm vs. Aggressive and Simple Harmony vs. Complex Harmony]
Collaborative Filtering
THE GOOD AND THE BAD
The good:
• Rich preference-driven similarity space
• Powerful at matching the right song with the right listener
The bad:
• Latent space is generally not interpretable
• Can only recommend items that have already been rated (what about long tail content?)
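The regularized objective from the problem formulation (Koren et al., 2009) can be minimized in several ways. The following is a minimal sketch using stochastic gradient descent in plain Python. It is not the system described in the talk: the toy ratings, the helper names (`dot`, `objective`, `init_factors`, `sgd_factorize`) and the hyperparameters (`lam`, `lr`, `epochs`) are all assumptions chosen for illustration.

```python
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def objective(ratings, q, p, lam):
    """Regularized squared error summed over the observed (item, user) pairs."""
    total = 0.0
    for (i, u), r in ratings.items():
        err = r - dot(q[i], p[u])
        total += err ** 2 + lam * (dot(q[i], q[i]) + dot(p[u], p[u]))
    return total


def init_factors(n, k, rng):
    """Small random starting vectors (an assumed initialization)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n)]


def sgd_factorize(ratings, q, p, lam=0.05, lr=0.02, epochs=300, seed=0):
    """Update item factors q and user factors p in place with SGD."""
    rng = random.Random(seed)
    pairs = list(ratings.items())
    for _ in range(epochs):
        rng.shuffle(pairs)
        for (i, u), r in pairs:
            err = r - dot(q[i], p[u])
            for f in range(len(q[i])):
                qi, pu = q[i][f], p[u][f]
                # Gradient step; the constant factor 2 is folded into lr.
                q[i][f] += lr * (err * pu - lam * qi)
                p[u][f] += lr * (err * qi - lam * pu)
    return q, p


# Toy data: made-up ratings keyed by (item, user); missing pairs are unknown.
ratings = {(0, 0): 5.0, (0, 1): 4.0, (1, 0): 4.0,
           (1, 2): 1.0, (2, 1): 2.0, (2, 2): 5.0}
rng = random.Random(42)
q = init_factors(3, 2, rng)  # item latent factors q_i, k = 2
p = init_factors(3, 2, rng)  # user latent factors p_u, k = 2
before = objective(ratings, q, p, lam=0.05)
sgd_factorize(ratings, q, p)
after = objective(ratings, q, p, lam=0.05)
print(f"objective before: {before:.2f}, after: {after:.2f}")
```

Once trained, `dot(q[i], p[u])` gives the approximated rating for any pair, including pairs that were never observed. An item with no ratings at all never has its `q[i]` updated, which is the long tail limitation listed above.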
Music Recommendation
WITH COLLABORATIVE SONG FACTORS
[Figure: Items (Tracks) x Users ratings matrix ≈ (Items (Tracks) x k) times (k x Users)]
Collaborative Filtering
EXAMPLE
              Artist         Title
Query Track   Journey        Don’t Stop Believing
Ranked 1      The Outfield   Your Love
Ranked 2      Eagles         Hotel California
Ranked 3      Survivor       Eye Of The Tiger
Ranked 4      Queen          We Will Rock You
The Music Genome Project
LARGE-SCALE HUMAN ANNOTATED DATASET
Attribute Examples: Breathy Voice, Nasal Voice, Odd Meter, Has Banjo, Joyful Lyrics, …
Up to ~400 attributes per track
Music Genome Project
EXAMPLE
              Artist               Title
Query Track   Journey              Don’t Stop Believing
Ranked 1      Journey              Stone In Love
Ranked 2      Jefferson Starship   Find Your Way Back
Ranked 3      David Bowie          Teenage Wildlife
Ranked 4      Thriving Ivory       On Your Side
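Ranked lists like the two examples above can be produced by a nearest-neighbor search around the query track in a vector space, whether the vectors are collaborative song factors or attribute vectors. The slides do not state which similarity measure is used, so the sketch below assumes cosine similarity. The track ids and vectors are invented for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_similar(query_id, vectors, top_n=4):
    """Return the ids of the top_n tracks most similar to the query track."""
    query = vectors[query_id]
    scored = [(cosine_similarity(query, vec), track_id)
              for track_id, vec in vectors.items() if track_id != query_id]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [track_id for _, track_id in scored[:top_n]]


# Hypothetical 3-dimensional track vectors (real ones would have k dimensions).
vectors = {
    "query_track": [0.9, 0.1, 0.3],
    "track_a": [0.8, 0.2, 0.4],
    "track_b": [0.1, 0.9, 0.2],
    "track_c": [-0.7, 0.1, -0.5],
}
print(rank_similar("query_track", vectors, top_n=3))
```

A linear scan like this is only practical for small catalogs. At catalog scale an approximate nearest-neighbor index would be the usual substitute.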
Estimating Latent Factors
[Figure: Items (Tracks) x Users ratings matrix ≈ (Items (Tracks) x k) times (k x Users)]
Estimating Latent Factors
FROM THE MUSIC GENOME PROJECT
[Figure: Dense Layers network $f(x; \theta)$ estimating the k-dimensional latent factors $y$. Image: http://blogs-images.forbes.com/kevinmurnane/files/2016/03/google-deepmind-artificial-intelligence-2-970x0-970x646.jpg]
Estimating Latent Factors
DATA AND OPTIMIZATION
• Data set $\mathcal{M}$: $\{X, Y\}$
• ~900k tracks (from “head”)
• Loss: $\mathcal{L}(\theta)$
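The slide names a model f(x; θ) built from dense layers and a loss, but the extracted text does not spell out either. Because the results are reported as cosine distance, the sketch below pairs a small dense network with a cosine distance between estimated and target latent factors. The layer sizes, the ReLU activations, the weight initialization and all function names are assumptions for illustration, and no training loop is shown.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for one dense layer (assumed scheme)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, weights, biases, relu):
    """One dense layer: weights @ x + biases, optionally followed by ReLU."""
    out = []
    for row, b in zip(weights, biases):
        z = sum(w * xi for w, xi in zip(row, x)) + b
        out.append(max(0.0, z) if relu else z)
    return out


def f(x, theta):
    """Forward pass; ReLU on hidden layers, linear output of size k."""
    h = x
    for idx, (weights, biases) in enumerate(theta):
        h = dense(h, weights, biases, relu=idx < len(theta) - 1)
    return h


def cosine_distance(y_hat, y):
    """1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(y_hat, y))
    norm_a = math.sqrt(sum(a * a for a in y_hat))
    norm_b = math.sqrt(sum(b * b for b in y))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


# Toy sizes: 6 input attributes -> 8 -> 8 -> k = 4 latent factors.
rng = random.Random(0)
sizes = [6, 8, 8, 4]
theta = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]
x = [rng.random() for _ in range(sizes[0])]        # made-up attribute vector
y = [rng.uniform(-1, 1) for _ in range(sizes[-1])]  # made-up target factors
y_hat = f(x, theta)
print(f"cosine distance for the untrained model: {cosine_distance(y_hat, y):.3f}")
```

Cosine distance ranges from 0 (same direction) to 2 (opposite direction), so lower values in the results tables mean the estimated factors point closer to the collaborative filtering factors.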
Input   Cos Distance   # Epochs   Time / Epoch
MGP     0.30           15         ~4m
Latent Factor Estimations
WITH THE MUSIC GENOME PROJECT
              Artist      Title
Query Track   Journey     Don’t Stop Believing
Ranked 1      Journey     Stone In Love
Ranked 2      D Drive     A Little Bitta Sunshine
Ranked 3      John Parr   Naughty, Naughty
Ranked 4      Kiss        Turn On The Night
Machine Listening
ESTIMATING THE MUSIC GENOME PROJECT
Pons, J., et al. ISMIR 2018
Estimating Latent Factors
FROM THE MUSIC GENOME PROJECT ESTIMATIONS
[Figure: Dense Layers network with sizes N, 2048, 2048, k]
Estimating Latent Factors
RESULTS
Input             Cos Distance   # Epochs   Time / Epoch
MGP               0.30           15         ~4m
MGP Estimations   0.44           21         ~4m
Latent Factor Estimations
WITH MACHINE LISTENING ATTRIBUTES
              Artist          Title
Query Track   Journey         Don’t Stop Believing
Ranked 1      Dean Friedman   Don’t You Ever Dare
Ranked 2      James Taylor    Stand And Fight
Ranked 3      The Dingoes     Starting Today
Ranked 4      Chuck Girad     The Days Are Young
Estimating Latent Factors
FROM AUDIO
[Figure: network estimating the k latent factors from audio]
Oord, A. Van Den, Dieleman, S., & Schrauwen, B. (2013). Deep Content-based Music Recommendation. Advances in Neural Information Processing Systems, 2643–2651.
Estimating Latent Factors
FROM RAW WAVEFORMS
[Figure: model $f(x; \theta)$ estimating the k-dimensional latent factors $y$ from the raw waveform. Layers as labeled (MP: max pooling): Conv1D 64 x 3, Conv1D 64 x 3, MP 3, Conv1D 64 x 3, MP 3, Conv1D 128 x 3, MP 3, Conv1D 256 x 3, Conv1D 512 x 7, Conv1D 512 x 7, Conv1D 512 x 7, with Auto Pooling and Dense Layers]
https://github.com/jordipons/music-audio-tagging-at-scale-models
Lee, J., et al., 2018; McFee, B., et al., 2018
Estimating Latent Factors
DATA AND OPTIMIZATION
• Data set $\mathcal{M}$: $\{X, Y\}$
• ~900k tracks (from “head”)
• 16 kHz, 16-bit waveforms
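The Auto Pooling label in the waveform architecture points to the adaptive pooling operator of McFee et al. (2018), which weights the values along the time axis by a softmax scaled by a parameter α. The sketch below shows that operator for a single channel in plain Python. In the paper α is learned; here it is passed in as a fixed number, and how the operator is wired into the network from the talk is not shown in the slides.

```python
import math


def auto_pool(frames, alpha):
    """Softmax-weighted average of per-frame values for one channel.

    alpha = 0 gives the mean; large positive alpha approaches the max;
    large negative alpha approaches the min.
    """
    scaled = [alpha * x for x in frames]
    shift = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - shift) for s in scaled]
    total = sum(weights)
    return sum(x * w for x, w in zip(frames, weights)) / total


# Made-up activations of one channel over five time frames.
frames = [0.1, 0.4, 0.9, 0.3, 0.2]
for alpha in (0.0, 1.0, 10.0):
    print(alpha, round(auto_pool(frames, alpha), 4))
```

Pooling over time in this way turns a variable number of frames into one value per channel, which is what lets a convolutional front end feed fixed-size dense layers.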
Input                       Cos Distance   # Epochs   Time / Epoch
MGP                         0.30           15         ~4m
MGP Estimations             0.44           21         ~4m
Spectrogram (35s patches)   0.37           22         ~2h
Estimating Latent Factors
RESULTS
Input                       Cos Distance   # Epochs   Time / Epoch
MGP                         0.30           15         ~4m
MGP Estimations             0.44           21         ~4m
Spectrogram (35s patches)   0.37           22         ~2h
Waveform (15s patches)      0.34           9          ~5h
Latent Factor Estimations
WITH WAVEFORMS
              Artist             Title
Query Track   Journey            Don’t Stop Believing
Ranked 1      The Ga es Band     The Final Countdown
Ranked 2      Survivor           Backstreet Love Affair
Ranked 3      Toto               Angel Don’t Cry
Ranked 4      Orion The Hunter   Dark And Stormy
Estimating Latent Factors
COMBINING MODELS
[Figure: network combining models to estimate the k latent factors]
Estimating Latent Factors
LATE-FUSION ARCHITECTURE
[Figure: late-fusion diagram combining a branch of Conv1D and MP blocks with a dense branch (N inputs, k outputs); most labels are truncated in the extraction, including the citation "Pons, J., McFee,"]
Oramas, S., et al. 2018
Estimating Latent Factors
LATE-FUSION ARCHITECTURE
[Figure: Dense Layers with sizes 2048, 2048, k; input labeled 2048 + 1024]
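Late fusion, in the sense of Oramas et al. (2018), combines the outputs of separately trained branches before a final set of layers. The sketch below assumes that the "2048 + 1024" label denotes the concatenation of a 2048-dimensional and a 1024-dimensional branch output. Which branch has which size is not recoverable from the slide, so the branches are named generically. For brevity a single linear layer maps the fused vector to k outputs, whereas the slide shows dense layers of sizes 2048, 2048 and k.

```python
import random


def fuse(branch_a, branch_b):
    """Late fusion by concatenation of two branch outputs."""
    return list(branch_a) + list(branch_b)


def late_fusion(branch_a, branch_b, weights, biases):
    """Concatenate the branch outputs, then apply one linear layer."""
    fused = fuse(branch_a, branch_b)
    return [sum(w * x for w, x in zip(row, fused)) + b
            for row, b in zip(weights, biases)]


# Made-up branch outputs with the sizes read off the slide.
rng = random.Random(0)
branch_a = [rng.random() for _ in range(2048)]
branch_b = [rng.random() for _ in range(1024)]
k = 4  # toy latent dimension
n_fused = len(fuse(branch_a, branch_b))  # 2048 + 1024 = 3072
weights = [[rng.uniform(-0.01, 0.01) for _ in range(n_fused)] for _ in range(k)]
biases = [0.0] * k
y_hat = late_fusion(branch_a, branch_b, weights, biases)
print(n_fused, len(y_hat))
```

Because only the layers after the concatenation are trained in this setup, the branch outputs can be precomputed once per track, which is consistent with the short time per epoch reported for the combined model.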
Estimating Latent Factors
RESULTS
Input                        Cos Distance   # Epochs   Time / Epoch
MGP                          0.30           15         ~4m
MGP Estimations              0.44           21         ~4m
Spectrogram (35s patches)    0.37           22         ~2h
Waveform (15s patches)       0.34           9          ~5h
Waveform + MGP Estimations   0.32           20         ~4m
Latent Factor Estimations
WITH AUDIO + MACHINE LISTENING ATTRIBUTES
              Artist            Title
Query Track   Journey           Don’t Stop Believing
Ranked 1      Patrick Simmons   Knocking at your Door
Ranked 2      Night Ranger      When You Close Your Eyes
Ranked 3      Prism             Young And Restless
Ranked 4      The Front         The Truth Hurts
CONCLUSIONS
The severe length of the Long Tail in music catalogs is real
Collaborative filtering is powerful at recommending music from the “head”
Waveform-based architectures are effective at recommending undiscovered/new music
DEMO TIME
References
McFee, B., Salamon, J., Bello, J. P., Adaptive pooling operators for weakly labeled sound event detection, IEEE Transactions on Audio, Speech and Language Processing, 2018.
Koren, Y., Bell, R., & Volinsky, C., Matrix Factorization Techniques for Recommender Systems. Computer, 42(8), 42–49, 2009.
Lee, J., Park, J., Kim, K. L., & Nam, J. SampleCNN: End-to-end Deep Convolutional neural networks using very small filters for music classification. Applied Sciences, 8(1):150, 2018.
Oord, A. Van Den, Dieleman, S., & Schrauwen, B., Deep Content-based Music Recommendation. Advances in Neural Information Processing Systems, 2643–2651, 2013.
Oramas, S., Barbieri, F., Nieto, O., Serra, X., Multimodal Deep Learning for Music Genre Classification. Transactions of the International Society for Music Information Retrieval (TISMIR), 2018.
Pons, J., Nieto, O., Prockup, M., Schmidt, E., Ehmann, A., Serra, X., End-to-End Learning for Music Audio Tagging at Scale. Proc. of the 19th International Society for Music Information Retrieval Conference (ISMIR). Paris, France, 2018.