Advanced Course Computer Science Overview (Audio Retrieval) Music Processing

Summer Term 2010  Audio identification (audio fingerprinting) Meinard Müller

Saarland University and MPI Informatik [email protected]  Audio matching

Audio Retrieval  Cover song identification

Overview (Audio Retrieval) Audio Identification

 Audio identification  Allamanche et al. (AES 2001) (audio fingerprinting)  Cano et al. ( IEEE MMSP 2002)  Kurth/Clausen/Ribbrock (AES 2002)  Wang (ISMIR 2003)  Audio matching  Shrestha/Kalker (ISMIR 2004)

 Cover song identification

Audio Identification Audio Identification

Shazam application scenario Shazam application scenario

 User hears music playing in the environment ``The Moment´´

 User records music fragment (5-15 seconds) with Radio – Car, home, work mobile phone TV and cinema  Audio fingerprints are extracted from recording and sent to a service Clubs and bars

 Server identifies audio recording based on fingerprints Cafes, shops, restaurants

 Server sends back metadata (track title, artist) to user

[Wang, ISMIR 2003] [Wang, ISMIR 2003] Audio Identification Audio Identification

Shazam application scenario: Target audience An audio fingerprint is a content-based compact signature that summarizes a piece of audio content

Requirements:

 Discriminative power

 Invariance to distortions

 Compactness

 Computational simplicity

[Wang, ISMIR 2003]

Audio Identification Audio Identification

An audio fingerprint is a content-based compact An audio fingerprint is a content-based compact signature that summarizes a piece of audio content signature that summarizes a piece of audio content

Requirements: Requirements:  Ability to accurately identify an  Recorded query may be item within a huge number of distorted and superimposed with  Discriminative power other items  Discriminative power other audio sources (informative, high entropy)  Background noise  Invariance to distortions  Invariance to distortions  Pitching  Low probability of false positives (audio played faster or slower)  Compactness  Compactness  Equalization  Recorded query excerpt  Compression artifacts (only a few seconds)  Computational simplicity  Computational simplicity  Cropping, framing  …  Large audio collection on the server side (millions of songs)

Audio Identification Audio Identification

An audio fingerprint is a content-based compact An audio fingerprint is a content-based compact signature that summarizes a piece of audio content signature that summarizes a piece of audio content

Requirements: Requirements:  Reduction of complex  Computational efficiency multimedia objects  Discriminative power  Discriminative power  Extraction of fingerprint should  Reduction of dimensionality be simple  Invariance to distortions  Invariance to distortions  Making indexing feasible  Size of fingerprint should be  Compactness  Compactness small  Allowing for fast search  Computational simplicity  Computational simplicity Matching Fingerprints (Shazam) Matching Fingerprints (Shazam)

 Generate list of matching fingerprints (matches between query and database document)

 Each match is represented by a pair (tdatabase , t query ) of time positions  Matching segment is characterized by set M of pairs each having the same time difference

 For each database document (audio file), generate ∩ ∩ tdatabase – tquery = constant for (tdatabase , t query ) ∩ ∩ M reproducible landmarks  Each landmark occurs at a time position  Set of false positives have random time differences  For each landmark, generate a “fingerprint” that  Filter out cruft by doing a histogram on time differences characterizes its location  Score is size of largest histogram peak  Do same for query fragment [Wang, ISMIR 2003] [Wang, ISMIR 2003]

Matching Fingerprints (Shazam) Matching Fingerprints (Shazam)

Scatter plot of matching hash locations Scatter plot of matching hash locations Time (Query) Time (Query) Time

Time (Database) Time (Database)

[Wang, ISMIR 2003] [Wang, ISMIR 2003]

Matching Fingerprints (Shazam) Matching Fingerprints (Shazam)

Scatter plot of matching hash locations Scatter plot of matching hash locations Time (Query) Time (Query) Time

Time (Database) Time (Database) Histogram of time differences Histogram of time differences

Time offset [Wang, ISMIR 2003] Time offset [Wang, ISMIR 2003] Matching Fingerprints (Shazam) Matching Fingerprints (Shazam)

Scatter plot of matching hash locations Scatter plot of matching hash locations Time (Query) Time (Query) Time

Time (Database) Time (Database) Histogram of time differences Histogram of time differences

no peak → no matching segment

Time offset [Wang, ISMIR 2003] Time offset [Wang, ISMIR 2003]

Matching Fingerprints (Shazam) Fingerprints (Shazam)

Scatter plot of matching hash locations Steps: 1. Spectrogram Time (Query) Time

Time (Database) Histogram of time differences  Efficiently computable Frequency Frequency (Hertz)  Standard transform  Robust Matching segment (starting at position 40)

Time (Seconds)

Time offset [Wang, ISMIR 2003] [Wang, ISMIR 2003]

Fingerprints (Shazam) Fingerprints (Shazam) Steps: Steps: 1. Spectrogram 1. Spectrogram 2. Peaks 2. Peaks

 ``Constellation map´´ Problem:

Frequency Frequency (Hertz)  Robust to noise, reverb, Frequency (Hertz)  Individual peaks have low room acoustics entropy  Tend to survive through  Not suitable for indexing voice codec

Time (Seconds) Time (Seconds)

[Wang, ISMIR 2003] [Wang, ISMIR 2003] Fingerprints (Shazam) Fingerprints (Shazam) Steps: Steps: 1. Spectrogram 1. Spectrogram 2. Peaks 2. Peaks 3. Target zone 3. Target zone 4. Pairs of peaks 4. Pairs of peaks

 Fix anchor point  Fix anchor point

Frequency Frequency (Hertz)  Define target zone Frequency (Hertz)  Define target zone  Use pairs of points  Use pairs of points  Use every point as anchor  Use every point as anchor point point

Time (Seconds) Time (Seconds)

[Wang, ISMIR 2003] [Wang, ISMIR 2003]

Indexing (Shazam) Indexing (Shazam)

Definitions:  Hash is formed between anchor point and each point in  N = number of spectral peaks target zone using frequency values and time difference.  p = probability that a spectral peak can be found in (noisy and distorted) query  F = fan-out of target zone, e. g. F = 10  Fan-out (taking pairs of peaks) may cause a  B = #(bits) used to encode spectral peaks and time difference combinatorial explosion in the number of tokens. However, this can be controlled by the size of the traget Consequences :  F · N = #(tokens) to be indexed zone.  2B+B = increase of specifity (2B+B+B instead of 2B)  p2 = propability of a hash to survive  Using more complex hashes increases specificity  p·(1-(1-p) F) = probability of at least on hash survives per anchor point (leading to much smaller hash bucktes) and speed (making the retrieval much faster). Example: F = 10 and B = 10  Memory requirements: F · N = 10 · N  Speedup factor: 2B+B / F2 ~ 10 6 / 10 2 = 10000 (F times as many tokens in query and database, respectively) [Wang, ISMIR 2003] [Wang, ISMIR 2003]

Results (Shazam) Conclusions (Shazam) Test dataset of 10000 tracks Search time: 5 to 500 milliseconds Many parameters to choose:

 Temporal and spectral resolution in spectrogram

 Peak picking strategy

 Target zone and fan-out parameter

1 15 sec Recognition rate Recognition 2 10 sec  5 sec Hash function

 …

Signal/ Noise Ration (dB) [Wang, ISMIR 2003] [Wang, ISMIR 2003] Conclusions (Audio Identification) Overview (Audio Retrieval)

 Identifies audio recording ( not piece of music)  Audio identification (audio fingerprinting)  Highly robust to noise, artifacts, deformations

 May even work to handle superimposed recordings  Audio matching

 Does not allow to identify studio recordings by query taken from recordings  Cover song identification

 Does not generalize to identify different interpretations of the same piece of music

Audio Matching Audio Matching

Various interpretations – Beethoven‘s Fifth  Pickens et al. (ISMIR 2002)  Müller/Kurth/Clausen (ISMIR 2005)  Suyoto et al. (IEEE TASLP 2008) Bernstein  Kurth/Müller (IEEE TASLP 2008) Karajan

Scherbakov (piano)

MIDI (piano)

Audio Matching Audio Matching

General strategy Given: Large music database containing several – recordings of the same piece of music  Normalized and smoothed chroma features – interpretations by various musicians – correlate to harmonic progression – arrangements in different instrumentations – robust to variations in dynamics, timbre, articulation, local tempo Goal: Given a short query audio clip , identify all corresponding audio clips of similar musical content  Robust matching procedure – irrespective of the specific interpretation and instrumentation – efficient – automatically and efficiently – robust to global tempo variations – scalable using index structure Query-by-Example paradigm

[Müller et al., ISMIR 2005] [Müller et al., ISMIR 2005] Feature Design Feature Design Beethoven‘s Fifth: Bernstein

Audio Subband Chroma Statistics Convolution signal decom- energy position distribution Quantization Normalization CENS 88 bands 12 bands Downsampling

Two stages:

Stage 1: Local chroma energy distribution features Stage 2: Normalized short-time statistics

CENS = Chroma Energy Normalized Statistics Resolution: 10 features/second Feature window size: 200 milliseconds

Feature Design Feature Design Beethoven‘s Fifth: Bernstein Beethoven‘s Fifth: Bernstein

Resolution: 10 features/second Resolution: 1 features/second Feature window size: 200 milliseconds Feature window size: 4000 milliseconds

Feature Design Feature Design Beethoven‘s Fifth: Bernstein vs. Sawallisch Beethoven‘s Fifth: Bernstein vs. Sawallisch

Resolution: 10 features/second Resolution: 1 features/second Feature window size: 200 milliseconds Feature window size: 4000 milliseconds Matching Procedure Matching Procedure

Compute CENS feature sequences Query: Beethoven‘s Fifth / Bernstein, first 20 seconds  Database  Query 

Global distance function

Matching Procedure Matching Procedure

Query: Beethoven‘s Fifth / Bernstein, first 20 seconds Query: Beethoven‘s Fifth / Bernstein, first 20 seconds

Best audio matches: 1 Best audio matches: 2

Matching Procedure Matching Procedure

Query: Beethoven‘s Fifth / Bernstein, first 20 seconds Query: Beethoven‘s Fifth / Bernstein, first 20 seconds

Best audio matches: 3 Best audio matches: 4 Matching Procedure Matching Procedure

Query: Beethoven‘s Fifth / Bernstein, first 20 seconds Query: Beethoven‘s Fifth / Bernstein, first 20 seconds

Best audio matches: 5 Best audio matches: 6

Matching Procedure Global Tempo Variations

Query: Beethoven‘s Fifth / Bernstein, first 20 seconds Query: Beethoven‘s Fifth / Bernstein, first 20 seconds

Problem: Karajan is much faster useless

Solution?

Best audio matches: 7

Global Tempo Variations Global Tempo Variations

Query: Beethoven‘s Fifth / Bernstein, first 20 seconds Query: Beethoven‘s Fifth / Bernstein, first 20 seconds

Problem: Karajan is much faster useless Problem: Karajan is much faster useless

Solution: Make Bernstein query faster and comute new Solution: Compute for various tempi Global Tempo Variations Experiments

Query: Beethoven‘s Fifth / Bernstein, first 20 seconds

Problem: Karajan is much faster useless  Audio database > 110 hours, 16.5 GB

Solution: Minimize over all resulting ’s  Preprocessing CENS features, 40.3 MB

 Query clip 20 seconds

 Query response time < 10 seconds

Experiments Experiments

Query: Beethoven‘s Fifth / Bernstein, first 20 seconds Query: Beethoven‘s Fifth / Bernstein, first 20 seconds

Experiments Experiments

Query: Shostakovich, Waltz/Chailly, first 27 seconds Query: Shostakovich, Waltz/Chailly, first 21 seconds Index-based Matching Index-based Matching Indexing stage Quantization

 Feature space

Visualization (3D)

 Convert database into feature sequence (chroma/CENS)  Quantize features with respect to a fixed codebook  Create an inverted file index – contains for each codebook vector an inverted list – each list contains feature indices in ascending order

[Kurth/Müller, IEEE-TASLP 2008]

Index-based Matching Index-based Matching Quantization How to derive a good codebook?

 Codebook selection by unsupervised learning  Feature space – Linde–Buzo–Gray (LBG) algorithm – similar to k-means  Codebook selection – adjust algorithm to spheres of suitable size R  Codebook selection based on musical knowledge  Quantization using nearest neighbors

Index-based Matching Index-based Matching LBG algorithm LBG algorithm Steps: Steps:

1. Initialization of 1. Initialization of codebook vectors codebook vectors 2. Assignment 2. Assignment 3. Recalculation 3. Recalculation 4. Iteration (back to 2.) 4. Iteration (back to 2.) Index-based Matching Index-based Matching LBG algorithm LBG algorithm Steps: Steps:

1. Initialization of 1. Initialization of codebook vectors codebook vectors 2. Assignment 2. Assignment 3. Recalculation 3. Recalculation 4. Iteration (back to 2.) 4. Iteration (back to 2.)

Index-based Matching Index-based Matching LBG algorithm LBG algorithm Steps: Steps:

1. Initialization of 1. Initialization of codebook vectors codebook vectors 2. Assignment 2. Assignment 3. Recalculation 3. Recalculation 4. Iteration (back to 2.) 4. Iteration (back to 2.)

Index-based Matching Index-based Matching LBG algorithm LBG algorithm Steps: Steps:

1. Initialization of 1. Initialization of codebook vectors codebook vectors 2. Assignment 2. Assignment 3. Recalculation 3. Recalculation 4. Iteration (back to 2.) 4. Iteration (back to 2.)

Until convergence Index-based Matching Index-based Matching LBG algorithm for spheres LBG algorithm for spheres

 Example: 2D  Example: 2D  Assignment  Assignment  Recalculation  Recalculation  Projection  Projection

Index-based Matching Index-based Matching LBG algorithm for spheres LBG algorithm for spheres

 Example: 2D  Example: 2D  Assignment  Assignment  Recalculation  Recalculation  Projection  Projection

Index-based Matching Index-based Matching Codebook using musical knowledge Codebook using musical knowledge

 Observation: Chroma features capture  C-Major harmonic information

 C#-Major  Example : C -Major

 Choose codebook to contain n-chords for n=1,2,3,4  Example: C #-Major n 1 2 3 4  Experiments: For more then 95% of all chroma features >50% of energy lies in at most 4 components template

# 12 66 220 495 793 Index-based Matching Index-based Matching Codebook using musical knowledge Quantization

Additional consideration of harmonics in chord templates Orignal chromagram and projections on codebooks

Example: 1-chord C Original LBG-based Model-based

Harmonics 1 2 3 4 5 6 Pitch C3 C4 G4 C5 E5 G5 Frequency 131 262 392 523 654 785 Chroma C C G C E C

Replace by with suitable weights for the harmonics

Index-based Matching Index-based Matching Query and retrieval stage Retrieval results

 Medium sized database – 500 pieces – 112 hours of audio – mostly  Selection of various queries – 36 queries  Query consists of a short audio clip (10-40 seconds) – duration between 10 and 40 seconds  Specification of fault tolerance setting – hand-labelled matches in database – fuzzyness of query  Indexing leads to speed-up factor between 15 and 20 – number of admissable mismatches (depending on query length) – tolerance to tempo variations  Only small degradation in precision and recall – tolerance to modulations

Index-based Matching Conclusions (Index-based Matching) Retrieval results

No index  Described method suitable for medium-sized databases LBG-based index – index is assumed to be in main memory Model-based index – inverted lists may be long  Goal was to find all meaningful matches – high -degree of fault -tolerance required (fuzzyness, mismatches) – number of intersections and unions may explode  What to do when dealing with millions of songs?  Can the quantization be avoided? Average Precision Average  Better indexing and retrieval methods needed! – kd-trees – locality sensitive hashing – … Average Recall Conclusions (Audio Matching) Conclusions (Audio Matching)

Strategy: Absorb variations at feature level Global matching procedure  Strategy: Exact matching and multiple scaled queries – simulate tempo variations by feature resampling  Chroma invariance to timbre – different queries correspond to different tempi – indexing possible  Normalization invariance to dynamics  Strategy: Dynamic time warping  Smoothing invariance to local time deviations – subsequence variant – more flexible (in particular for longer queries) – indexing hard

Application: Audio Matching Application: Audio Matching

Overview (Audio Retrieval) Cover Song Identification

 Audio identification  Gómez/Herrera (ISMIR 2006) (audio fingerprinting)  Casey/Slaney (ISMIR 2006)  Serrà (ISMIR 2007)  Audio matching  Ellis/Polioner (ICASSP 2007)  Serrà/Gómez/Herrera/Serra (IEEE TASLP 2008)

 Cover song identification Cover Song Identification Cover Song Identification

Motivation Goal: Given a music recording of a song or piece of music, find all corresponding music recordings within a huge  Automated organization of music collections collection that can be regarded as a kind of version, interpretation, or cover song. ``Find me all covers of …´´  Live versions  Versions adapted to particular country/region/language  Musical rights management  Contemporary versions of an old song  Radically different interpretations of a musical piece  Learning about music itself  … ``Understanding the essence of a song´´ Instance of document-based retrieval!

Cover Song Identification Cover Song Identification

Nearly anything can change! But something doesn't change. How to compare two different songs? Often this is chord progression and/or melody Song A

Bob Dylan Avril Lavigne Knockin’ on Heaven’s Door key Knockin’ on Heaven’s Door timbre Enter Sandman Nirvana Nirvana Poly [Incesticide ] tempo Poly [Unplugged] Song A Black Sabbath Cindy & Bert Paranoid lyrics Der Hund Der Baskerville AC/DC AC/DC High Voltage recording conditions High Voltage [live] song structure

[Serrà et al., IEEE-TASLP 2009]

Cover Song Identification Cover Song Identification How to compare two different songs? How to compare two different songs?

Chroma Chroma Song A Song A Sequence Sequence

Optimal Transposition

Chroma Chroma Song A Song A Sequence Sequence

 Feature computation  Feature computation  Dealing with different keys

[Serrà et al., IEEE-TASLP 2009] [Serrà et al., IEEE-TASLP 2009] Cover Song Identification Cover Song Identification How to compare two different songs? How to compare two different songs?

Chroma Chroma Song A Song A Sequence Sequence Dyncamic Binary Binary Optimal Optimal Programming Similarity Similarity Score Transposition Transposition Local Matrix Matrix Alignment

Chroma Chroma Song A Song A Sequence Sequence

 Feature computation  Feature computation  Dealing with different keys  Dealing with different keys  Local similarity measure  Local similarity measure  Global similarity measure [Serrà et al., IEEE-TASLP 2009] [Serrà et al., IEEE-TASLP 2009]

Cover Song Identification Cover Song Identification

B A# A Feature computation G# Dealing with different keys G F# F E D# D C# C

 Chroma features 20 40 60 80 100 120 140 Bob Dylan – Knockin’ on Heaven’s Door Avril Lavigne – Knockin’ on Heaven’s Door – correlates to harmonic progression – robust to changes in timbre and instrumentation – normalization introduces invariance to dynamics  Compute average chroma vectors for each song  Consider cyclic shifts of the chroma vectors to  Enhancement strategies simulate transpositions – model for considering harmonics – compensation of tuning differences  Determine optimal shift indices so that the shifted – finer resolution (1, 1/2, 1/3 semitone resolution) chroma vectors are matched with minimal cost → 12/24/36 dimensional chroma features [Gómez, PhD 2006]  Transpose the songs accordingly

Cyclic Chroma Shifts Cyclic Chroma Shifts  Given chroma vectors  Feature space:  Fix a local cost measure  Compute cost between x and shifted y  Chroma vector:

 Cyclic shift operator: x y

1  Composition of shifts: , 0.5 Cost  Note: 0 0 1 2 3 4 5 6 7 8 9 10 11 Shift index Cyclic Chroma Shifts Cyclic Chroma Shifts  Given chroma vectors  Given chroma vectors  Fix a local cost measure  Fix a local cost measure  Compute cost between x and shifted y  Compute cost between x and shifted y

x x σ (y) σ 2(y)

1 1

0.5 0.5 Cost Cost

0 0 0 1 2 3 4 5 6 7 8 9 10 11 0 1 2 3 4 5 6 7 8 9 10 11 Shift index Shift index

Cyclic Chroma Shifts Cyclic Chroma Shifts  Given chroma vectors  Given chroma vectors  Fix a local cost measure  Fix a local cost measure  Compute cost between x and shifted y  Compute cost between x and shifted y

x x σ 3(y) σ 4(y)

1 1

0.5 0.5 Cost Cost

0 0 0 1 2 3 4 5 6 7 8 9 10 11 0 1 2 3 4 5 6 7 8 9 10 11 Shift index Shift index

Cyclic Chroma Shifts Cyclic Chroma Shifts  Given chroma vectors  Given chroma vectors  Fix a local cost measure  Fix a local cost measure  Compute cost between x and shifted y  Compute cost between x and shifted y

x x σ 5(y) σ 6(y)

1 1

0.5 0.5 Cost Cost

0 0 0 1 2 3 4 5 6 7 8 9 10 11 0 1 2 3 4 5 6 7 8 9 10 11 Shift index Shift index Cyclic Chroma Shifts Cyclic Chroma Shifts  Given chroma vectors  Given chroma vectors  Fix a local cost measure  Fix a local cost measure  Compute cost between x and shifted y  Compute cost between x and shifted y

x x σ 7(y) σ 8(y)

1 1

0.5 0.5 Cost Cost

0 0 0 1 2 3 4 5 6 7 8 9 10 11 0 1 2 3 4 5 6 7 8 9 10 11 Shift index Shift index

Cyclic Chroma Shifts Cyclic Chroma Shifts  Given chroma vectors  Given chroma vectors  Fix a local cost measure  Fix a local cost measure  Compute cost between x and shifted y  Compute cost between x and shifted y

x x σ 9(y) σ 10 (y)

1 1

0.5 0.5 Cost Cost

0 0 0 1 2 3 4 5 6 7 8 9 10 11 0 1 2 3 4 5 6 7 8 9 10 11 Shift index Shift index

Cyclic Chroma Shifts Cyclic Chroma Shifts  Given chroma vectors  Given chroma vectors  Fix a local cost measure  Fix a local cost measure  Compute cost between x and shifted y  Compute cost between x and shifted y  Minimizing shift index: 3

x x σ 11 (y) σ 3(y)

1 1

0.5 0.5 Cost Cost

0 0 0 1 2 3 4 5 6 7 8 9 10 11 0 1 2 3 4 5 6 7 8 9 10 11 Shift index Shift index Cyclic Chroma Shifts Cyclic Chroma Shifts

B B  What is a good local cost A#  What is a good local cost A# A A meaure for chroma space? G# meaure for chroma space? G# G G F# Euclidean? Cosine distance? F# F ? F ? E E D# d D# D D C# α C# C C

1 2 3 1 2 3  Is the chroma space Euclidean? Probably not! For example, C is musically closer to G than C #

 Idea: Usage of very coarse binary cost measure that indicates the same tonal root [Serrà et al., IEEE-TASLP 2009]

Cyclic Chroma Shifts Cyclic Chroma Shifts

 Original local cost measure Cost matrix based on c Binary cost matrix based on cb

 Binary cost measure

1 Song A Song

0

for Song B Song B [Serrà et al., IEEE-TASLP 2009] [Serrà et al., IEEE-TASLP 2009]

Cyclic Chroma Shifts Cover Song Identification Think positive! How to compare two different songs? Cost matrix based on c Binary similarity matrix Chroma Song A Sequence Dyncamic Binary Optimal Programming Similarity Score 1 Transposition Local Matrix Alignment

Chroma Song A Sequence Song A Song

 Feature computation -1  Dealing with different keys  Local similarity measure Song B Song B  Global similarity measure [Serrà et al., IEEE-TASLP 2009] [Serrà et al., IEEE-TASLP 2009] Cover Song Identification Local Alignment How to compare two different songs? Assumption: Chroma Song A Two songs are considered as similar if they contain Sequence possibly long subsegments that possess a similar Dyncamic Binary Optimal Programming harmonic progression Similarity Score Transposition Local Matrix Alignment Task: Chroma Song A Sequence Let X=(x 1,…,xN) and Y=(y 1,…,yM) be the two chroma sequences of the two given songs, and let S be the  Feature computation resulting similarity matrix. Then find the maximum similarity  Dealing with different keys of a subsequence of X and a subsequence of Y.  Local similarity measure  Global similarity measure [Serrà et al., IEEE-TASLP 2009]

Local Alignment Local Alignment

Note:  Classical DTW Global correspondence X This problem is also known from bioinformatics. between X and Y The Smith-Waterman algorithm is a well-known algorithm Y for performing local sequence alignment ; that is, for determining similar regions between two nucleotide or  Subsequence DTW protein sequences. Subsequence of Y corresponds X to X

Strategy: Y We use a variant of the Smith-Waterman algorithm.  Local Alignment

Subsequence of Y corresponds X to subequence of X

Y

Local Alignment Local Alignment Computation of accumulated score matrix D Example: Knockin’ on Heaven’s Door

from given binary similarity (score) matrix S Knockin' on Heaven's Door 1 Binary similarity 140 0.8 matrix

120 0.6 1

0.4 100

0.2

80

0 Bob Dylan Bob

60 -0.2 Bob Dylan Bob  Zero-entry allows for jumping to any cell without penalty 40 -0.4-1  g penalizes`` inserts´´ and ``delets´´ in alignment -0.6  Best local alignment score is the highest value in D 20  Best local alignment ends at cell of highest value -0.8

50 100 150 200 250 300 Guns and Roses  Start is obtained by backtracking to first cell of value zero Guns and Roses Local Alignment Local Alignment Example: Knockin’ on Heaven’s Door Example: Knockin’ on Heaven’s Door

Knockin' on Heavens's Door Knockin' on Heavens's Door 94.2 94.2 90 90 140 90 Accumulated 140 90 Accumulated

80 score matrix 80 score matrix 120 120

70 70

100 100 Cell with max. 60 60 60 60 score = 94.2

80 50 80 50 Bob Bob Dylan Bob Dylan 40 40 60 60 Bob Dylan Bob Dylan Bob 30 30 30 30 40 40

20 20

20 20 10 10

0 0 50 100 150 200 250 300 50 100 150 200 250 300 Guns and Roses Guns and Roses Guns and Roses Guns and Roses

Local Alignment Local Alignment Example: Knockin’ on Heaven’s Door Example: Knockin’ on Heaven’s Door

Knockin' on Heaven's Door Knockin' on Heaven's Door 94.2 94.2 90 90 140 90 Accumulated 140 90 Accumulated

80 score matrix 80 score matrix 120 120

70 70

100 Cell with max. 100 Cell with max. 60 60 score = 94.2 94.2 60 60 score = 94.2

80 50 80 50 Bob Bob Dylan Bob Dylan 40 40 60 Alignment path 60 Alignment path Bob Dylan Bob Dylan Bob 30 30 of maximal score 30 30 of maximal score 40 40

20 score Accumulated 20

20 20 Alignment path 10 10

0 0 50 100 150 200 250 300 50 100 150 200 250 300 Guns and Roses Guns and Roses Guns and Roses Guns and Roses

Local Alignment Cover Song Identification Example: Knockin’ on Heaven’s Door Query: Bob Dylan – Knockin’ on Heaven’s Door

Knockin' on Heaven's Door Retrieval result: 90 140 90 Accumulated

80 score matrix Rank Recording Score 120 70 1. Gunsand Roses:Knockin‘ On Heaven’s Door 94.2 100 Cell with max. 60 60 score = 94.2 2. AvrilLavigne:Knockin‘OnHeaven’sDoor 86.6

80 50 3. WyclefJean:Knockin‘OnHeaven’sDoor 83.8 Bob Bob Dylan 40 60 Alignment path 4. Bob Dylan: Not For You 65.4 Bob Dylan Bob 30 30 of maximal score 40 5. Gunsand Roses:Patience 61.8 20 6. Bob Dylan: Like A Rolling Stone 57.2 20 Matching 10 subsequences 7.-14. … 0 50 100 150 200 250 300 Guns and Roses Guns and Roses Cover Song Identification Conclusions (Cover Song Identification)

Query: AC/DC – Highway To Hell  Harmony-based approach Retrieval result:

Rank Recording Score  Binary cost measure a good trade-off between robustness and expressiveness 1. AC/DC:Hard Asa Rock 79.2 2. Hayseed Dixie: Dirty Deeds Done Dirt Cheap 72.9  Measure is suitable for document retrieval, but seems to 3. AC/DC: Let There Be Rock 69.6 be too coarse for audio matching applications 4. AC/DC: TNT (Live) 65.0 5.-11. …  Every song has to be compared with any other 12. Hayseed Dixie: Highway To Hell 30.4 → method does not scale to large data collection 13. AC/DC: Highway To Hell Live (live) 21.0 14. …  What are suitable indexing methods?

Conclusions (Audio Retrieval)

Retrieval Audio Audio Cover song task identification matching identification

Identification Concrete audio Different Different recording interpretations versions

Query Short fragment Audio clip Entire song (5-10 seconds) (10-40 seconds)

Retrieval level Subsequence Subsequence Document Features Spectral peaks Chroma Chroma (abstract) (harmony) (harmony)

Indexing Hashing Inverted lists No indexing