Automatic Audio and Lyrics Alignment
Diploma thesis submitted for the academic degree of Diplom-Ingenieur in the field of study Computer Science

Submitted by: Andreas Kothmeier, 0155676
Carried out at: Institut für Computational Perception
Supervisor: Univ.-Prof. Dr. Gerhard Widmer
Linz, August 2006

Acknowledgements

I would like to thank the following people:

1) Univ.-Prof. Dr. Gerhard Widmer* for his support and good suggestions while I was writing this diploma thesis.
2) Dipl. Inf. Tim Pohle* for his introduction to digital audio processing and his support in the early implementation stage.
3) Dipl.-Ing. Peter Knees* for providing the topic of this diploma thesis and for his support.
4) Dipl.-Ing. Klaus Seyerlehner* for recommending books on digital audio processing.
5) Clemens Raab for his support with the implementation of the band-pass filter.

* Department of Computational Perception, Johannes Kepler University Linz

Abstract

English Version:
Computers are beginning to replace the hi-fi system in living rooms all over the world. This opens up more and more possibilities for the user to obtain additional information about the song currently playing. One such kind of information is the lyrics, which are the subject of this diploma thesis. The goal is to provide a program that automatically aligns the lyrics to the audio signal, so that tedious manual scrolling through the lyrics while listening to a song is no longer necessary.

German Version (in English translation):
Nowadays, computers are increasingly displacing the hi-fi system in the living room. This gives the user more and more new ways to obtain additional information about the song currently playing. One such piece of information is the lyrics, which are the main topic of my diploma thesis. The lyrics of a song are to be synchronized automatically with the audio signal. This allows the user to read along with the lyrics while listening to the song, without any manual scrolling.

Contents

1 INTRODUCTION
  1.1 DEVELOPMENT OF INTELLIGENT MUSIC PROCESSING
  1.2 MOTIVATION
    1.2.1 Application Areas for Automatic Lyrics Alignment
    1.2.2 Main Reason for Choosing this Topic
  1.3 GOAL
2 BASIC KNOWLEDGE
  2.1 DIGITAL REPRESENTATION OF AUDIO DATA
  2.2 DFT, FFT & INVERSE DFT
    2.2.1 DFT
    2.2.2 FFT
    2.2.3 Inverse DFT
  2.3 WINDOW FUNCTIONS - THE HANN WINDOW
  2.4 CALCULATING THE FREQUENCY OF A NOTE
3 THE PROCESS OF LYRICS ALIGNMENT
  3.1 APPROACH 1: CHI HANG WONG ET AL.
  3.2 APPROACH 2: ALEX LOSCOS ET AL.
  3.3 APPROACH 3: WANG ET AL.
    3.3.1 Structural Element Level Alignment
      3.3.1.1 Beat Detector
      3.3.1.2 Measure Detector
      3.3.1.3 Chorus Detector
      3.3.1.4 Section Processor
    3.3.2 Line Level Alignment
      3.3.2.1 Vocal Detector
      3.3.2.2 Line Processor
    3.3.3 System Integration
      3.3.3.1 Section Level Alignment
      3.3.3.2 Line Level Alignment
    3.3.4 Evaluation
  3.4 OVERVIEW OF MY APPROACH
  3.5 DIFFERENCES BETWEEN EXISTING APPROACHES AND MINE
    3.5.1 Wang et al. vs. My Approach
    3.5.2 Goto vs. My Approach
4 IMPLEMENTATION
  4.1 ADVANTAGES & DISADVANTAGES OF JAVA VS. MATLAB
  4.2 IMPLEMENTATION DETAILS
    4.2.1 Applying the FFT
    4.2.2 Band Pass Filter
    4.2.3 Similarity Comparison
    4.2.4 Finding Line Segments
      4.2.4.1 The Basic Concept of Finding Line Segments
      4.2.4.2 Differences to Goto’s Approach
      4.2.4.3 Detecting Modulated Chorus Sections
      4.2.4.4 Line Segments vs. Tracks
    4.2.5 Merging Tracks
      4.2.5.1 Eliminate Redundant Line Segments
      4.2.5.2 Reconstruct Missing Line Segments
    4.2.6 Finding Chorus Track
      4.2.6.1 Half-Length Sub-Segments
      4.2.6.2 Calculating the Track Score
    4.2.7 Aligning Lyrics
      4.2.7.1 Preparations of the Lyrics
      4.2.7.2 Audio and Lyrics Alignment
5 USER MANUAL
  5.1 THE MAIN WINDOW
    5.1.1 Step1 – Selecting the source file
    5.1.2 File Info
    5.1.3 Step2 – Settings & feature extraction
    5.1.4 Step3 – After feature extraction
    5.1.5 Plotted Audio Signal
    5.1.6 Listen
    5.1.7 Line Segments Selection
    5.1.8 The Log
    5.1.9 Progress Bar
  5.2 THE AUDIO & LYRICS ALIGNMENT WINDOW
    5.2.1 The Tracks