Recommending with Waveform-based Architectures at Scale

ORIOL NIETO

SEMINAR SERIES IN DATA SCIENCE, UNIVERSITY OF SAN FRANCISCO, FEB 1, 2019

Pandora Confidential

OUTLINE

Motivation: The Long Tail

Background: Collaborative Filtering

Music Recommendation

Demo

The Long Tail

[Figure: the catalog ordered from the most popular tracks; successive builds mark 0.01 %, 1 %, and 100 % of tracks, and highlight tracks with 0 spins last week.]
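The figure's quantities (how much of the catalog forms the head, how many tracks had 0 spins last week) can be read off per-track spin counts. A minimal sketch, assuming a hypothetical list of weekly spin counts per track; the function name `long_tail_stats` and the default 1 % head cutoff are illustrative choices, not taken from the talk.

```python
from typing import Dict, Sequence


def long_tail_stats(spins: Sequence[int], head_fraction: float = 0.01) -> Dict[str, float]:
    """Summarize how concentrated listening is in a catalog.

    spins: weekly spin count per track (hypothetical input format).
    head_fraction: share of the catalog treated as the "head" (0.01 = top 1 %).
    """
    if not spins:
        raise ValueError("empty catalog")
    ranked = sorted(spins, reverse=True)  # most popular tracks first
    total = sum(ranked)
    head_size = max(1, int(len(ranked) * head_fraction))
    head_spins = sum(ranked[:head_size])
    zero_spin_tracks = sum(1 for s in ranked if s == 0)
    return {
        "head_size": head_size,
        "head_share_of_spins": head_spins / total if total else 0.0,
        "zero_spin_fraction": zero_spin_tracks / len(ranked),
    }


# Toy catalog of 10 tracks; with head_fraction=0.1 the head is a single track.
print(long_tail_stats([500, 120, 30, 5, 1, 0, 0, 0, 0, 0], head_fraction=0.1))
```

Even in this toy catalog, one track takes about three quarters of all spins while half of the tracks were never played, which is the shape the figure describes.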


Collaborative Filtering

RECOMMENDING POPULAR MUSIC

[Figure: users × items (tracks) rating matrix with many unknown entries (?).]

Collaborative Filtering

PROBLEM OVERVIEW

[Figure: the sparse users × items (tracks) matrix is approximated (≈) by the product of a users × k matrix and a k × items matrix.]

Koren, Y., Bell, R., & Volinsky, C. (2009). Matrix Factorization Techniques for Recommender Systems. Computer, 42(8), 42–49.

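Collaborative Filtering

PROBLEM FORMULATION

[Figure: the users × items (tracks) matrix factorized into k-dimensional user and item factors.]

Given item $i$ and user $u$:

Rating: $r_{iu}$
Item latent factor: $q_i \in \mathbb{R}^k$
User latent factor: $p_u \in \mathbb{R}^k$
Rating approximation: $\hat{r}_{iu} = q_i^\top p_u$

$$\arg\min_{q_*,\, p_*} \sum_{(u,i) \in S} \left(r_{iu} - q_i^\top p_u\right)^2 + \lambda\left(\lVert q_i \rVert^2 + \lVert p_u \rVert^2\right)$$

where $S$ is the set of observed (user, item) pairs and $\lambda$ controls the regularization.

Koren, Y., Bell, R., & Volinsky, C. (2009). Matrix Factorization Techniques for Recommender Systems. Computer, 42(8), 42–49.

Collaborative Filtering

LATENT FACTORS

[Figure: tracks placed in a latent space with axes labeled Simple Harmony ↔ Complex Harmony and Calm ↔ Aggressive.]

Collaborative Filtering

THE GOOD AND THE BAD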

The good:
• Rich preference-driven similarity space
• Powerful at matching the right song with the right listener

The bad:
• Latent space is generally not interpretable
• Can only recommend items that have already been rated (what about long tail content?)
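The objective on the PROBLEM FORMULATION slide can be minimized with stochastic gradient descent, one of the two approaches Koren et al. (2009) describe (the other is alternating least squares). A minimal sketch in plain Python, assuming ratings arrive as (user, item, rating) triples; `factorize` and its hyperparameters (k, λ, learning rate, epochs) are illustrative, not the configuration used at Pandora.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def factorize(ratings: Sequence[Tuple[int, int, float]], n_users: int, n_items: int,
              k: int = 2, lam: float = 0.02, lr: float = 0.05, epochs: int = 500,
              seed: int = 0) -> Tuple[List[Vector], List[Vector]]:
    """SGD on sum_{(u,i) in S} (r_iu - q_i^T p_u)^2 + lam * (|q_i|^2 + |p_u|^2).

    ratings: observed (u, i, r_iu) triples, i.e. the set S.
    Returns user factors P (p_u = P[u]) and item factors Q (q_i = Q[i]).
    """
    rng = random.Random(seed)
    P = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    observed = list(ratings)
    for _ in range(epochs):
        rng.shuffle(observed)
        for u, i, r in observed:
            err = r - dot(Q[i], P[u])  # e_iu = r_iu - q_i^T p_u
            for f in range(k):
                q, p = Q[i][f], P[u][f]
                # Koren et al.'s update rules (the factor 2 of the gradient is absorbed into lr)
                Q[i][f] += lr * (err * p - lam * q)
                P[u][f] += lr * (err * q - lam * p)
    return P, Q


# Toy 3 x 3 matrix with 6 observed ratings; (user 0, item 2) is unknown.
S = [(0, 0, 2.0), (0, 1, 1.0), (1, 0, 4.0), (1, 2, 5.0), (2, 1, 1.5), (2, 2, 3.75)]
P, Q = factorize(S, n_users=3, n_items=3)
print(round(dot(Q[2], P[0]), 2))  # estimate of the missing rating for (user 0, item 2)
```

The missing "?" entries are then filled as $\hat{r}_{iu} = q_i^\top p_u$. The same dependence on observed ratings is why the approach cannot place an unrated long-tail track: its $q_i$ never receives an update.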

Music Recommendation

WITH COLLABORATIVE SONG FACTORS

[Figure: users × items (tracks) factorization into k-dimensional factors.]

Collaborative Filtering

EXAMPLE

Rank | Artist | Title
Query Track | Journey | Don't Stop Believing
1 | The Outfield | Your Love
2 | Eagles | Hotel California
3 | Survivor | Eye Of The Tiger
4 | Queen | We Will Rock You

The Music Genome Project

LARGE-SCALE HUMAN ANNOTATED DATASET

Attribute Examples
• Breathy Voice
• Nasal Voice
• Odd Meter
• Has Banjo
• Joyful Lyrics
• …

Up to ~400 attributes per track

Music Genome Project

EXAMPLE

Rank | Artist | Title
Query Track | Journey | Don't Stop Believing
1 | Journey | Stone In Love
2 | Jefferson Starship | Find Your Way Back
3 | David Bowie | Teenage Wildlife
4 | Thriving Ivory | On Your Side
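Both example tables are ranked lists of tracks close to a query track. A minimal sketch of such a ranking, assuming cosine similarity between per-track vectors (collaborative-filtering item factors $q_i$, or Music Genome Project attribute vectors); the slides do not state which similarity measure is used, and the track IDs and vectors below are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def rank_similar(query: str, vectors: Dict[str, Sequence[float]],
                 top_n: int = 4) -> List[Tuple[str, float]]:
    """Return the top_n tracks most similar to `query`, excluding the query itself."""
    q = vectors[query]
    scored = [(tid, cosine(q, v)) for tid, v in vectors.items() if tid != query]
    scored.sort(key=lambda t: t[1], reverse=True)  # most similar first
    return scored[:top_n]


# Hypothetical 2-D vectors; real factors would have k dimensions.
vecs = {"query": [1.0, 0.0], "a": [0.9, 0.1], "b": [0.0, 1.0], "c": [-1.0, 0.0]}
print(rank_similar("query", vecs, top_n=2))
```

Cosine similarity ignores vector length, so a track's overall popularity (which can inflate factor norms) does not by itself push it up the list.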

Estimating Latent Factors

[Figure: users × items (tracks) factorization into k-dimensional factors.]

Estimating Latent Factors

FROM THE MUSIC GENOME PROJECT

[Figure: a model $f(x; \theta)$ maps a track's Music Genome Project attributes $x$ to its k-dimensional latent factor $y$, so that $f(x; \theta) \approx y$.]

[Figure: dense layers N → 2048 → 2048 → k.]

Estimating Latent Factors

DATA AND OPTIMIZATION

• Data set: $\{X, Y\}$, ~900k tracks (from the "head"); $M$ denotes the number of $(x, y)$ pairs
• Loss: cosine distance
$$\mathcal{L}(\theta) = 1 - \frac{1}{M} \sum_{(x, y) \in (X, Y)} \frac{f(x; \theta)^\top y}{\lVert f(x; \theta) \rVert_2 \, \lVert y \rVert_2}$$
• Dropout 10% in dense layers
• Batch normalization in all layers
• Adam optimizer

Estimating Latent Factors

RESULTS

Input | Cos Distance | # Epochs | Time / Epoch
MGP | 0.30 | 15 | ~4 min

Latent Factor Estimations

WITH THE MUSIC GENOME PROJECT

Rank | Artist | Title
Query Track | Journey | Don't Stop Believing
1 | Journey | Stone In Love
2 | D Drive | A Little Bitta Sunshine
3 | John Parr | Naughty, Naughty
4 | Kiss | Turn On The Night

Machine Listening

ESTIMATING THE MUSIC GENOME PROJECT

Pons, J., et al. ISMIR 2018

Estimating Latent Factors

FROM THE MUSIC GENOME PROJECT ESTIMATIONS

[Figure: dense layers N → 2048 → 2048 → k, with estimated Music Genome Project attributes as input.]

Estimating Latent Factors

RESULTS

Input | Cos Distance | # Epochs | Time / Epoch
MGP | 0.30 | 15 | ~4 min
MGP Estimations | 0.44 | 21 | ~4 min

Latent Factor Estimations

WITH MACHINE LISTENING ATTRIBUTES

Rank | Artist | Title
Query Track | Journey | Don't Stop Believing
1 | Dean Friedman | Don't You Ever Dare
2 | James Taylor | Stand And Fight
3 | The Dingoes | Starting Today
4 | Chuck Girad | The Days Are Young

Estimating Latent Factors

FROM AUDIO

Oord, A. Van Den, Dieleman, S., & Schrauwen, B. (2013). Deep Content-based Music Recommendation. Advances in Neural Information Processing Systems, 2643–2651.

Estimating Latent Factors

FROM RAW WAVEFORMS

[Figure: a model $f(x; \theta)$ maps a track's raw waveform $x$ to its k-dimensional latent factor $y$, so that $f(x; \theta) \approx y$.]

Estimating Latent Factors

FROM RAW WAVEFORMS

[Figure: waveform $x$ → Conv1D layers → Auto Pooling → Dense Layers → $y$ (k dimensions).]
• Conv1D layers (filters × filter length): 64 × 3, 64 × 3, 64 × 3, 128 × 3, 128 × 3, 128 × 3, 128 × 3, 128 × 3, 256 × 3, with max pooling (MP) of size 3 after five of these layers
• Three Conv1D 512 × 7 layers
• Auto Pooling
• Dense layers (1024, 1024, 1024) → k

https://github.com/jordipons/music-audio-tagging-at-scale-models
Lee, J., et al., 2018; McFee, B., et al., 2018

Estimating Latent Factors

DATA AND OPTIMIZATION

• Data set: $\{X, Y\}$, ~900k tracks (from the "head")
• 16 kHz, 16-bit waveforms
• 3 patches of 15 seconds per track (~2.7M patches); $M$ denotes the number of $(x, y)$ pairs
• Loss: cosine distance
$$\mathcal{L}(\theta) = 1 - \frac{1}{M} \sum_{(x, y) \in (X, Y)} \frac{f(x; \theta)^\top y}{\lVert f(x; \theta) \rVert_2 \, \lVert y \rVert_2}$$
• Dropout 10% in dense layers
• Batch normalization in all layers
• Adam optimizer

Estimating Latent Factors

RESULTS

Input | Cos Distance | # Epochs | Time / Epoch
MGP | 0.30 | 15 | ~4 min
MGP Estimations | 0.44 | 21 | ~4 min
Spectrogram (35 s patches) | 0.37 | 22 | ~2 h
Waveform (15 s patches) | 0.34 | 9 | ~5 h

Latent Factor Estimations

WITH WAVEFORMS

Rank | Artist | Title
Query Track | Journey | Don't Stop Believing
1 | The Gaes Band | The Final Countdown
2 | Survivor | Backstreet Love Affair
3 | Toto | Angel Don't Cry
4 | Orion The Hunter | Dark And Stormy

Estimating Latent Factors

COMBINING MODELS

Estimating Latent Factors

LATE-FUSION ARCHITECTURE

[Figure: two branches, a Conv1D (Co) and max-pooling (M) network over the waveform and a dense network over an N-dimensional input, fused to predict the k-dimensional latent factor.]

Pons, J., McFee, …; Oramas, S., et al. 2018

Estimating Latent Factors

LATE-FUSION ARCHITECTURE

[Figure: the branch outputs are concatenated (2048 + 1024) and passed through dense layers (2048, 2048, k).]

Oramas, S., et al. 2018

Estimating Latent Factors

RESULTS

Input | Cos Distance | # Epochs | Time / Epoch
MGP | 0.30 | 15 | ~4 min
MGP Estimations | 0.44 | 21 | ~4 min
Spectrogram (35 s patches) | 0.37 | 22 | ~2 h
Waveform (15 s patches) | 0.34 | 9 | ~5 h
Waveform + MGP Estimations | 0.32 | 20 | ~4 min

Latent Factor Estimations

WITH AUDIO + MACHINE LISTENING ATTRIBUTES

Rank | Artist | Title
Query Track | Journey | Don't Stop Believing
1 | Patrick Simmons | Knocking at your Door
2 | Night Ranger | When You Close Your Eyes
3 | Prism | Young And Restless
4 | The Front | The Truth Hurts

CONCLUSIONS

The Long Tail in music catalogs is real, and it is severe

Collaborative filtering is powerful at recommending music from the “head”

Waveform-based architectures are effective at recommending undiscovered/new music
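These comparisons rest on the Cos Distance column of the RESULTS tables, which appears to be the same quantity as the training loss $\mathcal{L}(\theta)$: one minus the mean cosine similarity between estimated factors $f(x; \theta)$ and collaborative-filtering factors $y$, so lower is better. A minimal sketch of that computation, assuming predictions and targets are given as paired lists of k-dimensional vectors; the function name and the `eps` guard are illustrative additions.

```python
import math
from typing import Sequence


def mean_cosine_distance(preds: Sequence[Sequence[float]],
                         targets: Sequence[Sequence[float]],
                         eps: float = 1e-12) -> float:
    """1 - (1/M) * sum_j f_j^T y_j / (||f_j||_2 ||y_j||_2) over M (f_j, y_j) pairs."""
    if not preds or len(preds) != len(targets):
        raise ValueError("need M >= 1 matching (prediction, target) pairs")
    total = 0.0
    for f, y in zip(preds, targets):
        norm_f = math.sqrt(sum(a * a for a in f))
        norm_y = math.sqrt(sum(b * b for b in y))
        # eps guards against division by zero for all-zero vectors
        total += sum(a * b for a, b in zip(f, y)) / max(norm_f * norm_y, eps)
    return 1.0 - total / len(preds)


# One estimate pointing the right way (cosine 1) and one orthogonal to its target (cosine 0).
print(mean_cosine_distance([[1.0, 0.0], [0.0, 2.0]], [[3.0, 0.0], [1.0, 0.0]]))  # 0.5
```

Because only direction matters, an estimate that is a scaled copy of its target scores a distance of 0; this matches how cosine similarity would be used to rank neighbors in the latent space.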

DEMO TIME

References

Koren, Y., Bell, R., & Volinsky, C. (2009). Matrix Factorization Techniques for Recommender Systems. Computer, 42(8), 42–49.

Lee, J., Park, J., Kim, K. L., & Nam, J. (2018). SampleCNN: End-to-End Deep Convolutional Neural Networks Using Very Small Filters for Music Classification. Applied Sciences, 8(1), 150.

McFee, B., Salamon, J., & Bello, J. P. (2018). Adaptive Pooling Operators for Weakly Labeled Sound Event Detection. IEEE Transactions on Audio, Speech and Language Processing.

Oord, A. Van Den, Dieleman, S., & Schrauwen, B. (2013). Deep Content-based Music Recommendation. Advances in Neural Information Processing Systems, 2643–2651.

Oramas, S., Barbieri, F., Nieto, O., & Serra, X. (2018). Multimodal Deep Learning for Music Genre Classification. Transactions of the International Society for Music Information Retrieval (TISMIR).

Pons, J., Nieto, O., Prockup, M., Schmidt, E., Ehmann, A., & Serra, X. (2018). End-to-End Learning for Music Audio Tagging at Scale. Proc. of the 19th International Society for Music Information Retrieval Conference (ISMIR), Paris, France.
