Separation of Vocal and Non-Vocal Components from Audio Clip Using Correlated Repeated Mask (CRM)
University of New Orleans
ScholarWorks@UNO
University of New Orleans Theses and Dissertations
Summer 8-9-2017

Mohan Kumar Kanuri
[email protected]

Recommended Citation:
Kanuri, Mohan Kumar, "Separation of Vocal and Non-Vocal Components from Audio Clip Using Correlated Repeated Mask (CRM)" (2017). University of New Orleans Theses and Dissertations. 2381. https://scholarworks.uno.edu/td/2381

Separation of Vocal and Non-Vocal Components from Audio Clip Using Correlated Repeated Mask (CRM)

A Thesis Submitted to the Graduate Faculty of the University of New Orleans
in partial fulfillment of the requirements for the degree of
Master of Science in Engineering – Electrical

By
Mohan Kumar Kanuri
B.Tech., Jawaharlal Nehru Technological University, 2014
August 2017

This thesis is dedicated to my parents, Mr. Ganesh Babu Kanuri and Mrs. Lalitha Kumari Kanuri, for their constant support, encouragement, and motivation. I also dedicate this thesis to my brother, Mr. Hima Kumar Kanuri, for all his support.

Acknowledgement

I would like to express my sincere gratitude to my advisor Dr.
Dimitrios Charalampidis for his constant support, encouragement, and patient guidance throughout the completion of my thesis and degree requirements. His innovative ideas, encouragement, and positive attitude have been an asset to me throughout my Master's studies and in achieving my long-term career goals. I would also like to thank Dr. Vesselin Jilkov and Dr. Kim D. Jovanovich for serving on my committee, and for their support and motivation throughout my graduate research, which enabled me to complete my thesis successfully.

Table of Contents

List of Figures
Abstract
1. Introduction
 1.1 Sound
 1.2 Characteristics of sound
 1.3 Music and speech
2. Scope and Objectives
3. Literature Review
 3.1 Repetition used as a criterion to extract different features in audio
  3.1.1 Similarity matrix
  3.1.2 Cepstrum
 3.2 Previous work
  3.2.1 Mel Frequency Cepstral Coefficients (MFCC)
  3.2.2 Perceptual Linear Prediction (PLP)
4. REPET and Proposed Methodologies
 4.1 REPET methodology
  4.1.1 Overall idea of REPET
  4.1.2 Identification of repeating period
  4.1.3 Repeating segment modeling
  4.1.4 Repeating patterns extraction
 4.2 Proposed methodology
  4.2.1 Lag evaluation
  4.2.2 Alignment of segments based on the lag t
  4.2.3 Stitching the segments
  4.2.4 Unwrapping and extraction of repeating background
5. Results and Data Analysis
6. Limitations and Future Recommendations
7. Bibliography
Vita

List of Figures

Figure 1. Intensity of sound varies with the distance
Figure 2. Acoustic processing for similarity measure
Figure 3. Visualization of drum pattern highlighting the similar region on the diagonal
Figure 4. Cepstrum coefficients calculation
Figure 5. Matlab graph representing X[k], X̂[k] and c[n] of a signal x[n]
Figure 6. Building blocks of the Vembu separation system
Figure 7. Process of building MFCCs
Figure 8. Process of building PLP cepstral coefficients
Figure 9. Depiction of musical work production using different instruments and voices
Figure 10.
REPET methodology summarized into three stages
Figure 11. Spectral content of drums using different window lengths for STFT
Figure 12. Segmentation of magnitude spectrogram V into 'r' segments
Figure 13. Estimation of background and unwrapping of signal using ISTFT
Figure 14. Alignment of segment for positive lag
Figure 15. Alignment of segment for negative lag
Figure 16. Stitching of CRM segments
Figure 17. Unwrapping of repeating patterns in audio signal
Figure 18. SNR of REPET and CPRM for different audio clips
Figure 19. Foreground extracted by REPET and CPRM for Matlab-generated sound
Figure 20. Foreground extracted by REPET and CPRM for Priyathama
Figure 21. Foreground extracted by REPET and CPRM for the Desiigner Panda song

Abstract

Extraction of the singing voice from music is one of the ongoing research topics in the fields of speech recognition and audio analysis. In particular, this topic finds many applications in the music field, such as determining music structure, lyrics recognition, and singer recognition. Although many studies have been conducted for the separation of