Exploring Use of Non-Negative Matrix Factorization for Lossy Audio Compression Student: Bc
Total Page:16
File Type:pdf, Size:1020Kb
ASSIGNMENT OF MASTER’S THESIS Title: Exploring use of non-negative matrix factorization for lossy audio compression Student: Bc. Tomáš Drbota Supervisor: doc. Ing. Ivan Šimeček, Ph.D. Study Programme: Informatics Study Branch: System Programming Department: Department of Theoretical Computer Science Validity: Until the end of winter semester 2019/20 Instructions The aim of this work is to research existing methods for lossy audio compression and to apply non-negative matrix factorization (NMF) in order to improve the compression ratio without a noticeable loss in quality. Therefore, the goals are as following: 1) Become familiar with state of the art digital audio encoding and compression methods [1]. 2) Become familiar with non-negative matrix factorization, its properties and applications in digital audio processing [2][3]. 3) Design and implement a proof of concept audio codec utilizing NMF for audio compression, or extend an existing codec with this functionality [4]. 4) Find a suitable method for measuring perceived audio quality. 5) Measure the resulting audio quality and compare the quality and compression ratio to existing solutions. References [1] BOSI, Marina and Richard E. GOLDBERG, 2003. Introduction to digital audio coding and standards. Boston: Kluwer Academic Publishers. [2] Wang, Y., & Zhang, Y. (2013). Nonnegative Matrix Factorization: A Comprehensive Review. IEEE Transactions on Knowledge and Data Engineering, 25(6), 1336-1353. [3] RECOSKIE, Daniel and MANN, Richard, 2014, Constrained Nonnegative Matrix Factorization with Applications to Music Transcription. University of Waterloo Technical Report CS-2014-27 [online]. 23 December 2014. [4] VALIN, Jean-Marc, MAXWELL, Gregory, TERRIBERRY, Timothy B. and VOS, Koen, 2013, High-Quality, Low-Delay Music Coding in the Opus Codec. [online]. October 2013. doc. Ing. Jan Janoušek, Ph.D. doc. RNDr. Ing. Marcel Jiřina, Ph.D. Head of Department Dean Prague June 19, 2018 Master’s thesis Exploring use of non-negative matrix factorization for lossy audio compression Bc. Tom´aˇsDrbota Department of Theoretical Computer Science Supervisor: doc. Ing. Ivan Simeˇcek,Ph.D.ˇ May 8, 2019 Acknowledgements I would like to thank my supervisor for always being quick to consult any arising issues. My thanks also go to my family and friends for supporting me throughout the entire time of my studies. Declaration I hereby declare that the presented thesis is my own work and that I have cited all sources of information in accordance with the Guideline for adhering to ethical principles when elaborating an academic final thesis. I acknowledge that my thesis is subject to the rights and obligations stipu- lated by the Act No. 121/2000 Coll., the Copyright Act, as amended. In accor- dance with Article 46(6) of the Act, I hereby grant a nonexclusive authoriza- tion (license) to utilize this thesis, including any and all computer programs incorporated therein or attached thereto and all corresponding documentation (hereinafter collectively referred to as the “Work”), to any and all persons that wish to utilize the Work. Such persons are entitled to use the Work in any way (including for-profit purposes) that does not detract from its value. This authorization is not limited in terms of time, location and quantity. In Prague on May 8, 2019 . Czech Technical University in Prague Faculty of Information Technology c 2019 Tom´aˇsDrbota. All rights reserved. This thesis is school work as defined by Copyright Act of the Czech Republic. It has been submitted at Czech Technical University in Prague, Faculty of Information Technology. The thesis is protected by the Copyright Act and its usage without author’s permission is prohibited (with exceptions defined by the Copyright Act). Citation of this thesis Drbota, Tom´aˇs. Exploring use of non-negative matrix factorization for lossy audio compression. Master’s thesis. Czech Technical University in Prague, Faculty of Information Technology, 2019. Abstrakt Algoritmus nez´aporn´ehorozkladu matic byl ´uspˇesnˇepouˇzit pro mnoho r˚uzn´ych probl´em˚u,kter´esouvis´ıs anal´yzouvelk´ehomnoˇzstv´ıdat a z nich plynouc´ıch souvislost´ı. Vyuˇz´ıv´ase napˇr. pro rozpozn´av´an´ıobliˇcej˚u,separaci sign´al˚uˇci kompresi obr´azk˚u. C´ılemt´etopr´ace je prozkoumat moˇzn´evyuˇzit´ınez´aporn´ehorozkladu matic pro ztr´atovou kompresi zvuku. Bude naprogramov´anreferenˇcn´ıkod´era dekod´er zvuku vyuˇz´ıvajic´ıNMF, a s n´ımbudou potom provedeny r˚uzn´eexperimenty. V´ysledkybudou zmˇeˇreny a srovn´any v˚uˇciexistuj´ıc´ımn´astroj˚umpro kompresi zvuku. Kl´ıˇcov´a slova ztr´atov´akomprese, zvuk, komprese dat, nmf, k´odov´an´ı Abstract Non-negative matrix factorization has been successfully applied in various scenarios involving analysis of large chunks of data and finding patterns in them for later use. It’s used to perform things such as face recognition, source separation or image compression among others. The purpose of this thesis is to research possible uses of non-negative matrix factorization in the problem of lossy audio compression. A reference vii audio encoder and decoder using NMF will be implemented and various ex- periments using this encoder will be conducted. The results will be measured and compared to existing audio compressing solutions. Keywords lossy compression, audio, data compression, nmf, encoding viii Contents 1 Introduction 1 2 Digital audio 3 2.1 Important terms . 3 2.1.1 Sampling . 3 2.1.2 Nyquist frequency . 4 2.1.3 Quantization . 5 2.1.4 Windowing . 7 2.1.5 Transient . 9 2.1.6 Aliasing . 9 2.1.7 Spectral leakage . 9 2.1.8 Scalloping . 9 2.2 Digital audio representation . 10 2.2.1 Time domain representation . 10 2.2.2 Frequency domain representation . 11 2.3 Psychoacoustics . 15 2.3.1 Pitch . 15 2.3.2 Loudness . 17 2.3.3 Auditory masking . 17 3 Non-negative matrix factorization 19 3.1 Linear dimensionality reduction . 19 3.2 NMF definition . 20 3.3 Classification . 20 3.3.1 Basic NMF . 20 3.3.2 Constrained NMF (CNMF) . 21 3.3.3 Structured NMF (SNMF) . 21 3.3.4 Generalized NMF (GNMF) . 21 3.4 Properties . 22 ix 3.5 Algorithms . 22 3.5.1 Cost function . 22 3.5.2 Update rules . 23 3.5.3 Initialization . 23 3.6 Use in digital audio processing . 24 4 Data compression 25 4.1 Lossy audio compression . 25 4.1.1 State of the art . 25 4.1.2 Compression using NMF . 28 4.2 Lossless data compression . 29 4.2.1 Huffman coding . 29 5 Design 31 5.1 WAVE file . 31 5.2 ANMF File structure . 32 5.3 Encoder . 33 5.3.1 ANMF-RAW . 33 5.3.2 ANMF-MDCT . 34 5.3.3 ANMF-STFT . 35 5.4 Decoder . 39 6 Implementation 41 6.1 Used 3rd party libraries . 41 6.1.1 click . 41 6.1.2 pytest . 42 6.1.3 dahuffman . 42 6.1.4 SciPy . 42 6.2 Command line utility . 43 6.3 Audio library . 43 6.4 Compression library . 44 6.4.1 Utility functions . 44 6.4.2 NMF implementation . 47 6.4.3 ANMF-RAW . 48 6.4.4 ANMF-MDCT . 48 6.4.5 ANMF-STFT . 50 7 Evaluation 53 7.1 Methodology . 53 7.2 GstPEAQ . 54 7.3 Evaluating results . 54 7.3.1 Audio examples . 55 7.3.2 Bitrate . 55 7.3.3 ANMF-RAW . 55 x 7.3.4 ANMF-MDCT . 57 7.3.5 ANMF-STFT . 58 7.3.6 Comparison . 63 8 Conclusion 65 Bibliography 67 A List of Symbols 73 B Abbreviations 75 C Contents of enclosed CD 77 xi List of Figures 2.1 Example audio signal . 4 2.2 Windowing example . 8 2.3 Spectral leakage example . 10 2.4 Example audio spectrogram . 13 2.5 Bark scale . 16 2.6 Auditory masking . 18 3.1 NMF classification . 20 4.1 MP3 encoding scheme . 26 4.2 Comparison of various audio codecs . 28 5.1 Encoder overview . 33 5.2 ANMF-RAW Encoder . 33 5.3 ANMF-MDCT Encoder . 34 5.4 ANMF-STFT Encoder . 36 5.5 ANMF-STFT quantized phase frequencies . 37 5.6 ANMF-STFT quantized magnitude coefficient frequencies . 38 5.7 Decoder overview . 39 7.1 ANMF-RAW cost function . 56 7.2 ANMF-MDCT cost function . 58 7.3 ANMF-STFT cost function . 59 7.4 ANMF-STFT rank versus bitrate . 62 7.5 ANMF-STFT chunk size versus bitrate . 63 xiii List of Tables 2.1 Digital audio notation . 5 5.1 ANMF file structure . 32 5.2 Serialized matrix structure . 32 5.3 Serialized quantized matrix structure . 32 5.4 ANMF-RAW data structure . 34 5.5 ANMF-RAW structure of each data chunk . 34 5.6 ANMF-MDCT data structure . 35 5.7 ANMF-MDCT structure of each data chunk . 35 5.8 ANMF-STFT data structure . 39 5.9 ANMF-STFT structure of each magnitude submatrix . 39 7.1 PEAQ - Objective Difference Grade table . 54 7.2 List of tested audio files . 55 7.3 ANMF-RAW rank experiment results . 56 7.4 ANMF-RAW average runtime in seconds . 56 7.5 ANMF-MDCT rank experiment results . 59 7.6 ANMF-STFT rank experiment results . 61 7.7 ANMF-STFT chunk size experiment results . 62 7.8 ANMF-STFT µW experiment results . 63 7.9 ANMF-STFT µH experiment results . ..