
ASSIGNMENT OF MASTER’S THESIS

Title: Exploring use of non-negative matrix factorization for lossy audio compression
Student: Bc. Tomáš Drbota
Supervisor: doc. Ing. Ivan Šimeček, Ph.D.
Study Programme: Informatics
Study Branch: System Programming
Department: Department of Theoretical Computer Science
Validity: Until the end of winter semester 2019/20

Instructions

The aim of this work is to research existing methods for lossy audio compression and to apply non-negative matrix factorization (NMF) in order to improve the compression ratio without a noticeable loss in quality. The goals are therefore as follows:
1) Become familiar with state of the art encoding and compression methods [1].
2) Become familiar with non-negative matrix factorization, its properties and applications in digital audio processing [2][3].
3) Design and implement a proof of concept audio codec utilizing NMF for audio compression, or extend an existing codec with this functionality [4].
4) Find a suitable method for measuring perceived audio quality.
5) Measure the resulting audio quality and compare the quality and compression ratio to existing solutions.

References

[1] BOSI, Marina and Richard E. GOLDBERG, 2003. Introduction to Digital Audio Coding and Standards. Boston: Kluwer Academic Publishers.
[2] WANG, Y. and ZHANG, Y., 2013. Nonnegative Matrix Factorization: A Comprehensive Review. IEEE Transactions on Knowledge and Data Engineering, 25(6), 1336-1353.
[3] RECOSKIE, Daniel and MANN, Richard, 2014. Constrained Nonnegative Matrix Factorization with Applications to Music Transcription. University of Waterloo Technical Report CS-2014-27 [online]. 23 December 2014.
[4] VALIN, Jean-Marc, MAXWELL, Gregory, TERRIBERRY, Timothy B. and VOS, Koen, 2013. High-Quality, Low-Delay Music Coding in the Opus Codec [online]. October 2013.

doc. Ing. Jan Janoušek, Ph.D.          doc. RNDr. Ing. Marcel Jiřina, Ph.D.
Head of Department                     Dean

Prague June 19, 2018

Master’s thesis

Exploring use of non-negative matrix factorization for lossy audio compression

Bc. Tomáš Drbota

Department of Theoretical Computer Science
Supervisor: doc. Ing. Ivan Šimeček, Ph.D.

May 8, 2019

Acknowledgements

I would like to thank my supervisor for always being quick to consult any arising issues. My thanks also go to my family and friends for supporting me throughout the entire time of my studies.

Declaration

I hereby declare that the presented thesis is my own work and that I have cited all sources of information in accordance with the Guideline for adhering to ethical principles when elaborating an academic final thesis. I acknowledge that my thesis is subject to the rights and obligations stipulated by the Act No. 121/2000 Coll., the Copyright Act, as amended. In accordance with Article 46(6) of the Act, I hereby grant a nonexclusive authorization (license) to utilize this thesis, including any and all computer programs incorporated therein or attached thereto and all corresponding documentation (hereinafter collectively referred to as the "Work"), to any and all persons that wish to utilize the Work. Such persons are entitled to use the Work in any way (including for-profit purposes) that does not detract from its value. This authorization is not limited in terms of time, location and quantity.

In Prague on May 8, 2019 ......

Czech Technical University in Prague
Faculty of Information Technology
© 2019 Tomáš Drbota. All rights reserved.
This thesis is school work as defined by the Copyright Act of the Czech Republic. It has been submitted at Czech Technical University in Prague, Faculty of Information Technology. The thesis is protected by the Copyright Act and its usage without the author's permission is prohibited (with exceptions defined by the Copyright Act).

Citation of this thesis: Drbota, Tomáš. Exploring use of non-negative matrix factorization for lossy audio compression. Master's thesis. Czech Technical University in Prague, Faculty of Information Technology, 2019.

Abstrakt

The non-negative matrix factorization algorithm has been successfully applied to many different problems involving the analysis of large amounts of data and the relationships hidden within them. It is used, for example, for face recognition, signal separation or image compression. The aim of this thesis is to explore the possible use of non-negative matrix factorization for lossy audio compression. A reference audio encoder and decoder utilizing NMF will be implemented, and various experiments will be conducted with it. The results will be measured and compared against existing audio compression tools.

Keywords lossy compression, audio, data compression, nmf, encoding

Abstract

Non-negative matrix factorization has been successfully applied in various scenarios involving analysis of large chunks of data and finding patterns in them for later use. It is used for tasks such as face recognition, source separation or image compression, among others. The purpose of this thesis is to research possible uses of non-negative matrix factorization in the problem of lossy audio compression.

A reference audio encoder and decoder using NMF will be implemented and various experiments using this encoder will be conducted. The results will be measured and compared to existing audio compression solutions.

Keywords lossy compression, audio, data compression, nmf, encoding

Contents

1 Introduction

2 Digital audio
   2.1 Important terms
      2.1.1 Sampling
      2.1.2 Nyquist frequency
      2.1.3 Quantization
      2.1.4 Windowing
      2.1.5 Transient
      2.1.6 Aliasing
      2.1.7 Spectral leakage
      2.1.8 Scalloping
   2.2 Digital audio representation
      2.2.1 Time domain representation
      2.2.2 Frequency domain representation
   2.3 Psychoacoustics
      2.3.1 Pitch
      2.3.2 Loudness
      2.3.3 Auditory masking

3 Non-negative matrix factorization
   3.1 Linear dimensionality reduction
   3.2 NMF definition
   3.3 Classification
      3.3.1 Basic NMF
      3.3.2 Constrained NMF (CNMF)
      3.3.3 Structured NMF (SNMF)
      3.3.4 Generalized NMF (GNMF)
   3.4 Properties
   3.5 Algorithms
      3.5.1 Cost function
      3.5.2 Update rules
      3.5.3 Initialization
   3.6 Use in digital audio processing

4 Data compression
   4.1 Lossy audio compression
      4.1.1 State of the art
      4.1.2 Compression using NMF
   4.2 Lossless data compression
      4.2.1 Huffman coding

5 Design
   5.1 WAVE file
   5.2 ANMF file structure
   5.3 Encoder
      5.3.1 ANMF-RAW
      5.3.2 ANMF-MDCT
      5.3.3 ANMF-STFT
   5.4 Decoder

6 Implementation
   6.1 Used 3rd party libraries
      6.1.1 click
      6.1.2 pytest
      6.1.3 dahuffman
      6.1.4 SciPy
   6.2 Command line utility
   6.3 Audio library
   6.4 Compression library
      6.4.1 Utility functions
      6.4.2 NMF implementation
      6.4.3 ANMF-RAW
      6.4.4 ANMF-MDCT
      6.4.5 ANMF-STFT

7 Evaluation
   7.1 Methodology
   7.2 GstPEAQ
   7.3 Evaluating results
      7.3.1 Audio examples
      7.3.2 Bitrate
      7.3.3 ANMF-RAW
      7.3.4 ANMF-MDCT
      7.3.5 ANMF-STFT
      7.3.6 Comparison

8 Conclusion

Bibliography

A List of Symbols

B Abbreviations

C Contents of enclosed CD


List of Figures

2.1 Example audio signal
2.2 Windowing example
2.3 Spectral leakage example
2.4 Example audio spectrogram
2.5 Bark scale
2.6 Auditory masking

3.1 NMF classification

4.1 MP3 encoding scheme
4.2 Comparison of various audio codecs

5.1 Encoder overview
5.2 ANMF-RAW Encoder
5.3 ANMF-MDCT Encoder
5.4 ANMF-STFT Encoder
5.5 ANMF-STFT quantized phase frequencies
5.6 ANMF-STFT quantized magnitude coefficient frequencies
5.7 Decoder overview

7.1 ANMF-RAW cost function
7.2 ANMF-MDCT cost function
7.3 ANMF-STFT cost function
7.4 ANMF-STFT rank versus bitrate
7.5 ANMF-STFT chunk size versus bitrate


List of Tables

2.1 Digital audio notation

5.1 ANMF file structure
5.2 Serialized matrix structure
5.3 Serialized quantized matrix structure
5.4 ANMF-RAW data structure
5.5 ANMF-RAW structure of each data chunk
5.6 ANMF-MDCT data structure
5.7 ANMF-MDCT structure of each data chunk
5.8 ANMF-STFT data structure
5.9 ANMF-STFT structure of each magnitude submatrix

7.1 PEAQ - Objective Difference Grade table
7.2 List of tested audio files
7.3 ANMF-RAW rank experiment results
7.4 ANMF-RAW average runtime in seconds
7.5 ANMF-MDCT rank experiment results
7.6 ANMF-STFT rank experiment results
7.7 ANMF-STFT chunk size experiment results
7.8 ANMF-STFT µW experiment results
7.9 ANMF-STFT µH experiment results
7.10 ANMF-STFT versus MP3/Opus


Chapter 1

Introduction

In today's age of smartphones and other portable electronic devices capable of connecting to the Internet, nearly everyone has access to a huge library of various media, including music and other audio. However, to transmit or store all of this data in its raw uncompressed form, a large amount of bandwidth and storage would be required. It is for that reason that we must employ some sort of compression. Lossy audio codecs such as MP3, AAC, AC3, Vorbis, Opus etc. have been prevalent in this field, as they are capable of compressing an audio signal into even a tenth of its size without a perceivable difference in quality (to our ears, at least). As such, this thesis will look into using non-negative matrix factorization as a possible method for lossy compression of audio. It has been employed before for audio analysis, but works describing its applications specifically for audio compression are limited. First, we will have a look at what digital audio actually is and at the important terms surrounding it. We will explore how it works, how it is recorded and stored in a computer, and also what concerns go into analysing and compressing it. Then, in chapter 3, non-negative matrix factorization is defined and described. We will talk about what it is, what it is realistically used for, how to use and implement it, and lastly its applications in the field of digital audio. After that, we finally get to data compression. State of the art lossy audio codecs will be examined along with some lossless encodings that will be necessary later. Existing research into audio compression using NMF will also be summarized. In chapters 5 and 6, I design and implement an NMF encoder and decoder (called ANMF). There are three different variants depending on what's being compressed - ANMF-RAW for encoding raw audio, and ANMF-MDCT along with ANMF-STFT for encoding properly pre-processed audio. The file structure along with all the implementation details including used libraries will be described in great detail.


And finally, we take the implementation and evaluate the results. Using GstPEAQ as an objective measuring method, we compare our implementation to some of the reference codecs, namely MP3 and Opus, and draw conclusions from the results.

Chapter 2

Digital audio

Sound as we know it can be defined as a physical wave travelling through air or another medium. [1] It can be measured as a change in air pressure surrounding an object and converted into an electrical signal, for example by a microphone. Once we have this electrical representation of the wave, we can convert it back and consequently play it using speakers. In the real world, these sound waves are generally composed of many different kinds of waves, with differing frequencies and amplitudes. The human ear can tell the difference between high frequencies (whistling) and low frequencies (drums), and knowledge of this will be useful later when we are discussing audio encoding. Please see Table 2.1 for the notation used throughout this thesis.

2.1 Important terms

In this section several terms used throughout this thesis will be described.

2.1.1 Sampling

Sampling in this context refers to the way we convert an analogue signal into a digital one. When we use something like a microphone, what's happening is that it measures the pressure waves generated by the sound around it at regular intervals. The result of this operation is the amplitude of the wave, or the size of the vibration at that point in time. To obtain an actual usable sound wave of various frequencies and amplitudes, we will need many samples - generally tens of thousands per second. The rate at which we collect these samples is called the sampling rate, denoted Fs.


Figure 2.1: An example of an audio signal represented in PCM form.

2.1.2 Nyquist frequency

It is proven that we can fully represent an arbitrary signal x(t) by its samples xn [2], but there are some conditions to this. If we don't sample a given signal enough times per second, there might be frequencies high enough that our sampler won't be able to record them correctly. The Sampling Theorem states that we can only accurately record all the frequencies in a signal if our sampling rate Fs is at least twice as large as the largest frequency Fmax contained in the signal. [3] In essence, a continuous signal can only be fully represented by its samples if:

Fs ≥ 2Fmax (2.1)

Looking at it from another angle, it also means that given a sampling rate Fs, the highest frequency we can record is equal to Fs/2, and that is what's called the Nyquist frequency or Nyquist limit.


Table 2.1: Digital audio notation
t        a time value in seconds
τ        a "slow" time, a time index with a lower resolution than t
ξ        a frequency value in hertz
Fs       the sampling rate of an audio signal
z        a complex number
φ(z)     the phase of a complex number
|z|      the magnitude of a complex number
Q(x)     a function representing a uniform quantizer
∆        the step size of a uniform quantizer
F(x)     a function representing µ-law compression
x(t)     the amplitude of a continuous signal at a time t
xn       the amplitude of a discrete signal indexed by n
w(t)     a continuous windowing function at a time t
wn       a discrete windowing function indexed by n
X(ξ)     the frequency component of a signal for a frequency ξ
S(ξ)     the Fourier transform of a continuous signal
Sk       the discrete Fourier transform of a discrete signal
S(τ, ξ)  the short-time Fourier transform of a continuous signal
Sk,ξ     the discrete short-time Fourier transform of a discrete signal
Mk       the modified discrete cosine transform of a discrete signal

2.1.3 Quantization

Quantization is a process where we restrict a large set of values, possibly continuous, into a smaller set, generally discrete. Each of the values from the smaller set represents a quantization level, and has a range of values that fall into it. For example, when we sample an analogue signal, what we get back is the amplitude represented by a voltage, which can be considered a set of infinitely many real numbers. In order to use these values on our computers for digital processing, these voltages must first be quantized. [2] An example of a common choice would be signed 16-bit PCM, where each sample is converted to an integer between $-2^{15} = -32768$ and $2^{15} - 1 = 32767$, representing the quantized amplitude for the given sample. Quantization is also common in digital audio processing when we already have discrete values, e.g. when quantizing MDCT values (described later) in order to further reduce the range and allow for more efficient compression. There are two kinds of quantization: uniform and non-uniform. The main difference between them is that when using uniform quantization, the quantization levels are uniformly spaced, i.e. the difference between any two successive quantized values is the same when the value is converted to the original range, whereas with non-uniform quantization the difference between different levels may vary.

2.1.3.1 Uniform quantization

The practical part uses uniform quantization extensively, therefore it's important to mention some extra details surrounding it. In a uniform quantizer, there is a step size, denoted ∆, which is the value difference between two successive quantization levels. We differentiate between a mid-rise quantizer and a mid-tread quantizer. The main difference between them is how they handle quantization of the area around 0. [4] If we imagine quantization in the form of a staircase function, then in a mid-rise quantizer the "rise" of the stairs is the part that crosses the 0 point, whereas in a mid-tread quantizer it's the "flat" part of the staircase-like function (that one treads on), hence the naming. In practice, in the context of a uniform quantizer, this means that a mid-rise quantizer (as opposed to a mid-tread quantizer) does not have a value to represent the zero value of the given range, but instead has the zero value as a classification threshold. A mid-tread quantizer is defined as:

$$Q_t(x) = \Delta \cdot \left\lfloor \frac{x}{\Delta} + \frac{1}{2} \right\rfloor \tag{2.2}$$

A mid-rise quantizer is defined as:

$$Q_r(x) = \Delta \cdot \left( \left\lfloor \frac{x}{\Delta} \right\rfloor + \frac{1}{2} \right) \tag{2.3}$$
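To make the distinction concrete, here is a minimal NumPy sketch of the two quantizers from (2.2) and (2.3) (the function names are mine, for illustration only):

import numpy as np

def quantize_mid_tread(x, delta):
    # Q_t(x) = delta * floor(x/delta + 1/2); zero is a quantization level
    return delta * np.floor(x / delta + 0.5)

def quantize_mid_rise(x, delta):
    # Q_r(x) = delta * (floor(x/delta) + 1/2); zero is a decision threshold
    return delta * (np.floor(x / delta) + 0.5)

x = np.array([-0.24, -0.05, 0.0, 0.05, 0.24])
print(quantize_mid_tread(x, 0.1))  # [-0.2  0.   0.   0.1  0.2]
print(quantize_mid_rise(x, 0.1))   # [-0.25 -0.05  0.05  0.05  0.25]

Note how the mid-tread quantizer maps small values to exactly zero, while the mid-rise quantizer never outputs zero.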

2.1.3.2 µ-law

The technique called µ-law (or mu-law) companding employs non-uniform quantization in order to reduce the quantization error and required bandwidth of a signal transmitting system. In practice it is mostly used in telephony and other applications that deal with a limited dynamic range. Companding is a word combined from compressing and expanding: compression, because the non-uniform quantization reduces the amount of bits needed to encode the peak amplitudes, and expanding on the other end to cancel the effect of the non-uniform quantization. There is one more intermediate step between them, specifically uniform quantization of the compressed signal. [5] It has several important effects:

• the larger values get a bit smaller by getting rid of the least significant bits

• the lower values become higher due to conversion to a logarithmic scale


• some perceptible noise is masked due to boosting of the intended signal

µ-law compression is defined as: [6]

$$F(x) = \operatorname{sgn}(x) \frac{\ln(1 + \mu|x|)}{\ln(1 + \mu)} \quad \text{for } -1 \le x \le 1 \tag{2.4}$$

And the inverse function for expanding:

$$F^{-1}(y) = \operatorname{sgn}(y) \frac{1}{\mu} \left( (1 + \mu)^{|y|} - 1 \right) \quad \text{for } -1 \le y \le 1 \tag{2.5}$$

where µ is the compression parameter. The higher the value of µ, the higher a companded value F(x) will be on average, and this increase will be more noticeable for lower x.
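A minimal sketch of the companding pair (2.4) and (2.5), assuming the signal is already normalized to [−1, 1] (µ = 255 is the value commonly used in telephony):

import numpy as np

def mu_law_compress(x, mu=255.0):
    # F(x) = sgn(x) * ln(1 + mu*|x|) / ln(1 + mu)
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

def mu_law_expand(y, mu=255.0):
    # F^-1(y) = sgn(y) * (1/mu) * ((1 + mu)^|y| - 1)
    return np.sign(y) * ((1.0 + mu) ** np.abs(y) - 1.0) / mu

x = np.linspace(-1.0, 1.0, 9)
assert np.allclose(mu_law_expand(mu_law_compress(x)), x)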

2.1.4 Windowing

A windowing function wn is a function that is equal to 0 outside of some chosen interval, where its value is defined by an expression. It is generally symmetrical, though this is not a rule. Windowing functions are often applied to audio signals by multiplying them with the windowing function before further processing, to avoid various artifacts. There is no one "best window"; there's usually a trade-off, for example a window used to prevent aliasing will lower the accuracy of frequency identification (as is the case in the Hann window). [2] In fact, most modern audio codecs use different kinds of windows depending on the musical characteristics of the current block and the ones that follow. [7]

2.1.4.1 Rectangular window

The rectangular window is the simplest one, as it does not modify the original signal at all, and as such is often used to represent e.g. a part of a periodic signal. It's equivalent to:

$$w^R_n = 1 \tag{2.6}$$

2.1.4.2 Hann window

A Hann window is a simple sine-based window, defined as:

$$w^H_n = \sin^2\left(\frac{\pi n}{N}\right) \tag{2.7}$$

This window is what we'll be using most of the time with the short-time Fourier transform.


Figure 2.2: An example of windowing: a sine function windowed by MLT.

2.1.4.3 Modulated

Modulated lapped transform (MLT) is another sine-based window, used for example in the MP3 codec. It has a comparatively low computational complexity and is simple to implement. [8] It's defined as:

$$w^M_n = \sin\left(\frac{\pi}{2N}\left(n + \frac{1}{2}\right)\right) \tag{2.8}$$

It is what we will be using with the MDCT due to its simplicity of implementation compared to the results it provides.
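Both windows are a few lines of NumPy; in this sketch N follows the notation of (2.7) and (2.8) respectively, so the MLT window spans 2N samples:

import numpy as np

def hann_window(N):
    # w_n = sin^2(pi * n / N), per (2.7)
    n = np.arange(N)
    return np.sin(np.pi * n / N) ** 2

def mlt_window(N):
    # w_n = sin(pi/(2N) * (n + 1/2)), per (2.8), over a 2N-sample block
    n = np.arange(2 * N)
    return np.sin(np.pi / (2 * N) * (n + 0.5))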

2.1.4.4 Kaiser-Bessel derived window

The Kaiser-Bessel-derived window (KBD) is designed to be used with the modified discrete cosine transform (MDCT), and is used for example in the AC-3 codec. It’s defined as follows [2]:


$$w^D_n = \begin{cases} \sqrt{\dfrac{\sum_{i=0}^{n} w_i}{\sum_{i=0}^{N/2} w_i}} & \text{if } 0 \le n < \frac{N}{2} \\[2ex] \sqrt{\dfrac{\sum_{i=0}^{N-1-n} w_i}{\sum_{i=0}^{N/2} w_i}} & \text{if } \frac{N}{2} \le n < N - 1 \\[2ex] 0 & \text{otherwise} \end{cases} \tag{2.9}$$

2.1.5 Transient

A transient is a high amplitude sound with a short duration. It's significant because during block-based audio encoding, if a transient occurs on the border between two blocks it may cause what's called pre-echo - essentially a type of artifact where a sound is heard before it should occur. To mitigate artifacts caused by transients, it's important to use proper windowing.

2.1.6 Aliasing

Aliasing is a type of artifact that occurs when the sampling rate is too low. Due to the way sampling works, frequencies above the Nyquist limit are mirrored to lower frequencies, creating a noticeable distortion. [2] This can be eliminated by either choosing an appropriately higher sampling rate, or by passing the signal through a low-pass filter to eliminate the higher frequency content that would cause aliasing.

2.1.7 Spectral leakage

As we will see later, the Fourier transform (and other Fourier-related transforms) effectively projects the signal into infinity, as if it was a periodic signal. If we apply the transform to an unmodified signal, it's similar to using a rectangular window on a periodic function, where the signal ends abruptly at the edges. This in turn leads to spectral leakage - the resulting Fourier transform will produce non-zero values even for frequencies that aren't contained in the original signal at all. The way to combat spectral leakage is, again, by using proper windowing. By smoothing out the function using a window so that it approaches 0 along the edges of the signal, we can mitigate some spectral leakage, although not all of it.

2.1.8 Scalloping

Scalloping is a phenomenon that occurs when using a discrete Fourier (or Fourier-related) transform. Since the frequencies are split into ranges rather than exact values, it's possible that a single frequency may fall into two adjacent bins, providing inaccurate frequency content readings.


Figure 2.3: An example of a sine function’s spectral leakage after application of a rectangular window. Image by Walter Dvorak published under the CC BY 3.0 license.

To deal with scalloping, a large enough block size in relation to the sampling rate must be chosen to lower the ranges of the bins.

2.2 Digital audio representation

Most commonly, the amount of air pressure is sampled many times a second, and after being processed this information is stored as a discrete-time signal using numerical representations - this is what's known as a digital audio signal. This entire process is called digital audio encoding. By sampling the audio signal, we will potentially be losing out on some information, but given a high enough sampling rate, the result will be imperceptible to the human ear. For general purpose audio and music, the usual sampling rate is 48 kHz, alternatively 44.1 kHz from the compact disc era. Once we have our samples, there are two distinct ways we can represent, or encode, them. Both of them have many different data models for encoding [1], but in this work I am only going to focus on the most relevant ones.

2.2.1 Time domain representation

In the time domain, the signal is simply represented as a function of time, where t is the time and x(t) is the raw amplitude, or air pressure, at that point. [2]

This is the most straightforward representation since it directly correlates to how the signal is being captured in the first place. However, as we will see later, this format is not ideal for storing audio data with any sort of compression.

2.2.1.1 PCM

In the time domain, the most basic encoding we can use is PCM (Pulse Code Modulation). After sampling a signal at uniform intervals, the discrete values are quantized; that is, each range of values is assigned a symbol in (usually) binary code. For example, using 16-bit signed PCM, each sample will be represented as a 16-bit signed integer, or in the case of multiple channels, N 16-bit signed integers, where N is the number of channels. PCM serves as a good base for what we are going to talk about next - frequency domain representation and encoding.

2.2.2 Frequency domain representation

While a sequence of amplitude samples is simple for a computer to store and work with, it's difficult to run any sort of meaningful analysis on such data. To better grasp the structure of the audio we're working with, it would be helpful to be able to decompose it into its basic building blocks, so to speak. And that's where frequency based representation comes in. The goal here is to represent the signal not as a function of time, but rather as a function of frequency X(ξ). That is, instead of having a simple sequence of amplitudes, we will have information about the magnitude of each component from a set of frequency ranges. This description alone is generally more compact than the PCM representation [2], on top of providing us with useful information about the signal, so it will serve as a good entry point to our compression schemes.

2.2.2.1 Fourier transform

The Fourier transform is the first and arguably the most used tool for converting a signal from a function of time x(t) into a function of frequency X(ξ). It is based on the Fourier series, which is essentially a representation of a periodic function as the linear combination of sines and cosines. [9] However, the main difference is that our function need not be periodic. The Fourier transform of a continuous signal x is defined as: [10]

$$S(\xi) = \int_{-\infty}^{\infty} x(t) e^{-2\pi i t \xi} \, dt \tag{2.10}$$


If we inspect the formula, we can notice that the Fourier transform essentially projects our signal into infinity - this wouldn't be a problem if it was a periodic signal, but sampled audio is generally constrained by time. To prevent spectral leakage as a result, we must window the signal before processing it. [11] The output is a complex number, which provides us with the means to find the magnitude and phase offset for the sinusoid of each frequency ξ. The Fourier transform can also be inverted, providing us with an easy way to obtain the original signal back from its frequency components. The inverse transform is defined as:

$$x(t) = \int_{-\infty}^{\infty} S(\xi) e^{2\pi i t \xi} \, d\xi \tag{2.11}$$

However, seeing as our signal is discretely sampled, we will need to modify our transform accordingly. The discrete Fourier transform of a discrete signal x0, x1, ..., xN−1 is: [?]

$$S_k = \sum_{n=0}^{N-1} x_n e^{-2\pi i k n / N} \tag{2.12}$$

And our inverse is:

$$x_n = \frac{1}{N} \sum_{k=0}^{N-1} S_k e^{2\pi i k n / N} \tag{2.13}$$

In the discrete form, rather than finding the frequency content for a specific frequency ξ, we find the content of the k-th frequency bin. Since we have fewer values to work with, the frequency range is quantized to a degree, and each bin contains a uniformly sized range of frequencies rather than a specific one. The way that works is as follows: if we, for example, run the discrete Fourier transform on 1152 samples recorded with a sampling rate Fs = 44100, we will end up with 1152 frequency bins, each covering a frequency range of Fs/N = 44100/1152 ≈ 38 Hz. However, due to the Nyquist limit, the only useful frequencies go up to Fs/2 = 44100/2 = 22050 Hz, and so we only need to be concerned with the first half of the bins, i.e. 1152/2 = 576 bins, as the latter half mirrors the first and does not contain any useful information. Due to the nature of this process, if we run the Fourier transform on our whole signal, we will only be able to analyse it as a whole. That means we won't be able to tell which parts of, for example, a song are quiet or if there are any parts with very high frequencies - we lose our temporal data. To alleviate this problem, we can run the Fourier transform on smaller chunks of the signal, analyse them separately and later join them back into the original signal. That is the essence of the short-time Fourier transform.
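The bin arithmetic above can be verified with a short NumPy sketch (the 1 kHz test tone is an arbitrary choice for illustration):

import numpy as np

Fs = 44100                      # sampling rate in Hz
N = 1152                        # block length in samples
n = np.arange(N)
x = np.sin(2 * np.pi * 1000 * n / Fs)   # a 1 kHz test tone

S = np.fft.rfft(x)              # one-sided spectrum: N/2 + 1 bins
bin_width = Fs / N              # ~38.3 Hz per bin
peak = np.argmax(np.abs(S))
print(peak, peak * bin_width)   # bin 26, ~995 Hz - 1 kHz up to bin resolution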


2.2.2.2 Short-time Fourier transform

When using the short-time Fourier transform, or STFT for short, we first split the signal into smaller segments of equal size and then run the Fourier transform on those separately. As such, our output can be projected into two dimensions - specifically a frequency spectrum as a function of time, a spectrogram.

Figure 2.4: An example of an audio spectrogram for the signal from Figure 2.1.

Doing it this way will let us see how the frequency components change over time instead of taking the spectrum of the entire signal. As with the regular Fourier transform, we'll need to window each segment of the signal, but there is a caveat. Since we have windowed segments, we may be losing some information at the edge of each segment, leading to artifacts, and furthermore we may be losing information about transients. To solve this, we'll need to introduce overlapping windows - however, having an overlap will increase the amount of coefficients required. The continuous version is defined as: [?]

$$S(\tau, \xi) = \int_{-\infty}^{\infty} x(t) w(t - \tau) e^{-2\pi i t \xi} \, dt \tag{2.14}$$

where w is the window function.


But again, as we have discrete samples, we will need to use a discrete short-time Fourier transform, specifically:

$$S_{k,\xi} = \sum_{n=-\infty}^{\infty} x_n w_{n-k} e^{-2\pi i \xi n} \tag{2.15}$$

And similarly to the regular Fourier transform, the short-time Fourier transform is also invertible. [12] STFT is commonly used for audio analysis (e.g. for generating spectrograms), but in this case it will be used as a means for our NMF compression.
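SciPy, which the practical part also builds on, provides a ready-made discrete STFT; a minimal round-trip sketch with a Hann window and 50% overlap could look like this (the block sizes are illustrative):

import numpy as np
from scipy import signal

Fs = 44100
n = np.arange(Fs)                        # one second of audio
x = np.sin(2 * np.pi * 440 * n / Fs)     # a 440 Hz tone

# Hann-windowed STFT; f holds the bin frequencies, tau the frame times
f, tau, S = signal.stft(x, fs=Fs, window='hann', nperseg=1024, noverlap=512)

# the inverse recovers the signal thanks to the overlapping windows
_, x_rec = signal.istft(S, fs=Fs, window='hann', nperseg=1024, noverlap=512)
print(np.allclose(x, x_rec[:len(x)]))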

2.2.2.3 Modified discrete cosine transform

Modified discrete cosine transform, or MDCT for short, has become the dominant means of lossy high-quality audio coding. [13] It is what's known as a lapped transform. This means that when transforming a block into its MDCT coefficients, the basis function overlaps the block's boundaries. [14] In practice, what this means is that while we have blocks with overlapping windows as in the short-time Fourier transform, the number of coefficients remains the same as without overlapping, while retaining the relevant properties. As the name suggests, MDCT is based on the Discrete cosine transform [15], namely DCT-IV, where the main difference is the addition of the lapping mentioned above. The type then specifies the boundary conditions, to determine what values the transform should have for points outside the original range. What makes MDCT simpler to work with compared to the Fourier transform is that not only do we not need more coefficients despite overlapping, they are also real numbers as opposed to complex numbers, lowering the amount of bits necessary to store them. It is a linear function $f : \mathbb{R}^N \to \mathbb{R}^{N/2}$, defined as: [16]

$$M_k = \sum_{n=0}^{N-1} x_n \cos\left(\frac{\left(2n + 1 + \frac{N}{2}\right)(2k + 1)\pi}{2N}\right) \tag{2.16}$$

for k = 0, 1, ..., N/2 − 1. It is assumed that x_n is already windowed by an appropriate windowing function w. MDCT is also invertible, and its inversion is defined as:

$$\bar{x}_n = \frac{2}{N} \sum_{k=0}^{N/2 - 1} M_k \cos\left(\frac{\left(2n + 1 + \frac{N}{2}\right)(2k + 1)\pi}{2N}\right) \tag{2.17}$$


for n = 0, 1, ..., N − 1. It's important to note that the inverse transformed sequence x̄_n by itself does not correspond to the original signal x_n [17]. To achieve perfect invertibility, we must add subsequent overlapping blocks of the inverted MDCT (IMDCT). This method is called time domain aliasing cancellation [18], or TDAC for short. As the name suggests, it mainly helps remove artifacts on the boundaries between transform blocks.
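A direct (unoptimized) NumPy sketch of (2.16) and (2.17) makes the TDAC mechanism concrete: with a window satisfying the Princen-Bradley condition, such as the sine (MLT) window from (2.8), overlap-adding two half-overlapping inverse blocks reconstructs the overlapped samples exactly:

import numpy as np

def mdct(x):
    # (2.16): N windowed samples -> N/2 coefficients
    N = len(x)
    n = np.arange(N)[:, None]
    k = np.arange(N // 2)[None, :]
    C = np.cos((2 * n + 1 + N / 2) * (2 * k + 1) * np.pi / (2 * N))
    return x @ C

def imdct(M):
    # (2.17): N/2 coefficients -> N time-aliased samples
    N = 2 * len(M)
    n = np.arange(N)[:, None]
    k = np.arange(N // 2)[None, :]
    C = np.cos((2 * n + 1 + N / 2) * (2 * k + 1) * np.pi / (2 * N))
    return (2.0 / N) * (C @ M)

N = 8
w = np.sin(np.pi / N * (np.arange(N) + 0.5))     # sine window, length N
x = np.random.rand(3 * N // 2)                   # 1.5 blocks of samples
a, b = x[:N] * w, x[N // 2:] * w                 # two half-overlapping blocks
ra, rb = w * imdct(mdct(a)), w * imdct(mdct(b))  # window again on synthesis
print(np.allclose(ra[N // 2:] + rb[:N // 2], x[N // 2:N]))  # True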

2.3 Psychoacoustics

Apart from time-frequency representations being generally more compact, they also give us the ability to analyse, isolate or modify the frequency composition of a given signal. This comprises a large chunk of the audio compressing process. The field of psychoacoustics studies sound perception - that is, how our ears work and how we perceive different kinds of sounds. There are many different characteristics to sound that need to be taken into account for a proper psychoacoustic analysis [19], split into several categories, namely:

tonal includes pitch, timbre, melody and harmony
dynamic based on loudness
temporal involves time, duration, tempo and rhythm
qualitative represents the harmonic constitution of the tone

For music, it's important to balance these four qualities appropriately. For compression, the most important qualities in the scope of this work are going to be tonal (pitch) and dynamic (loudness).

2.3.1 Pitch

Pitch is a characteristic that comes from a frequency. The difference between the two is that pitch is our subjective perception of the tone, whereas a frequency is an objective measure. Despite this fact, pitch is often quantified as a frequency using hertz as its unit. The lower bound of human hearing is around 20 Hz, whereas the upper bound is most commonly cited as 20 000 Hz, or 20 kHz. [20] In a laboratory environment, people have been found to hear as low as 12 Hz. As people age, our hearing gets progressively worse, and a healthy adult younger than 40 years can generally perceive frequencies only up to 15 kHz. [19] The human ear is capable of distinguishing different frequencies fairly accurately, though accuracy gets lower with increasing frequency.

It's easier for our ears to tell a difference between 500 Hz and 520 Hz compared to the difference between 5000 Hz and 5020 Hz. [21] Furthermore, if we hear two different tones simultaneously, but their frequencies are close enough to one another, we may perceive them as a combination of tones rather than separate tones. Frequency ranges, or bands, where this phenomenon happens are called critical bands. [22] It's also possible for one tone to mask the other entirely, and then we get what's called auditory masking. [23] Based on the knowledge of the existence of these critical bands, it's possible to devise a system that specifies the range of each band in human hearing. One such scale that is commonly used is called the Bark scale.

2.3.1.1 Bark scale

The Bark scale ranges from 1 to 24 Barks, where each Bark corresponds to a single critical band of human hearing. [24] The perceived difference in pitch between each band should be the same, despite the scale not growing linearly in terms of frequency ranges. Specifically, until around 500 Hz the scale is roughly linear, but above that it has a more logarithmic growth. [25] The Bark scale is commonly used as a reference for audio encoding codecs, as we will see later. Knowledge of these critical bands allows for more educated bit allocation during the quantization process when compressing a frequency domain representation.

Figure 2.5: The Bark scale. Image by ”Swpb” licensed under CC BY-SA 4.0.


2.3.2 Loudness

What people often describe as loudness is really called sound pressure level, and it's measured in decibels (dB); however, it has some shortcomings when it comes to psychoacoustic analysis. It is defined as follows: [26]

$$L_p = 20 \log_{10}\left(\frac{p}{p_0}\right) \ \mathrm{dB} \tag{2.18}$$

where p is the sound's sound pressure and p0 is a reference sound pressure, also called the threshold of human hearing. While this metric is very popular, it doesn't account for the fact that different frequencies have a different perceived loudness to a person's ears. [19] There has been a lot of research in recent years into how different frequencies impact our perception and hearing [27], but that is out of the scope of this work. For more information about the exact definitions of loudness, refer to [19].

2.3.3 Auditory masking

As mentioned above, when it comes to audio masking, and therefore audio compression, we must not only take into account the critical bands as per e.g. the Bark scale, but also their intensity. For example, a lower frequency sound may mask one of a higher frequency, but the other way around does not apply. [23] Modern audio encoders take this into account and, using this knowledge, are able to eliminate sounds that exist in the original signal but are not perceivable by humans. There are two important kinds of masking effects - simultaneous masking and temporal masking. [7] Simultaneous masking is what I have hinted at above - when there are two sounds within the same critical band, the dominant one may mask other frequencies within the same band. This can be compensated to a degree by increasing the volume of the masked sound. Temporal masking does not occur in the frequency domain, but in the time domain. The essence is that a stronger tonal component may mask a weaker one if they appear within a small window of time in succession.


Figure 2.6: A hypothetical example of how a masker can shift the hearing threshold of a signal by 16 dB. Image created by Alan De Smet and published in the public domain.

Chapter 3

Non-negative matrix factorization

In today's age of big data and various other data-intensive fields, it's important to have ways to quickly analyse large datasets and ideally find patterns within them. Non-negative matrix factorization is one of the paradigms suitable for that task. In layman's terms, what NMF does is that when we are given a large set of data, for example a matrix representing books and their review scores from people, we can extract certain hidden "features" from it using NMF, in this case representing e.g. various genres (basis matrix) and how prominent they are in a given book (weight matrix). And then, using these two matrices, we are able to estimate or even predict what kind of books a user would like - this is a simple example of a possible recommendation algorithm.

3.1 Linear dimensionality reduction

Non-negative matrix factorization, or NMF, falls under linear dimensionality reduction techniques. These are used widely for noise filtering, feature selection or compression, among others. LDR can be defined as follows: [28]

$$x_j \approx \sum_{k=1}^{r} w_k h_j(k) \quad \text{for some weights } h_j \in \mathbb{R}^r \tag{3.1}$$

where given a data set of size n, we define $x_j \in \mathbb{R}^p$ for 1 ≤ j ≤ n, r < min(p, n), and $w_k \in \mathbb{R}^p$ for 1 ≤ k ≤ r. What this effectively means is that we represent p-dimensional data points in an r-dimensional linear subspace, with basis elements w_k and data coordinates given by vectors h_j. LDR defined in this manner is equivalent to low-rank matrix approximation, which is the essence of non-negative matrix factorization.


3.2 NMF definition

Non-negative matrix factorization solves the following NP-hard [29] problem: Given a non-negative matrix V , find non-negative matrix factors W and H such that:

$$V \approx WH \tag{3.2}$$

That is, given a set of multivariate n-dimensional data vectors, we place these vectors in the columns of an n × m matrix V, where m is the amount of examples we have. We then approximately factorize this matrix into two different matrices: an n × r matrix W and an r × m matrix H. We generally choose r < min(n, m) (though this is not required) so that the two matrices are smaller than the original matrix V, essentially compressing it. [30] The variable r is usually referred to as the rank of the approximation.
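To get a feel for the decomposition, here is a small sketch using scikit-learn's NMF solver (an off-the-shelf library chosen purely for illustration; the thesis's own implementation is described in chapter 6):

import numpy as np
from sklearn.decomposition import NMF

V = np.random.rand(1152, 200)      # non-negative n x m data matrix
r = 32                             # rank of the approximation

model = NMF(n_components=r, init='random', max_iter=500, random_state=0)
W = model.fit_transform(V)         # n x r basis matrix
H = model.components_              # r x m coefficient matrix

# 1152*32 + 32*200 stored values now stand in for 1152*200 originals
print(np.linalg.norm(V - W @ H))   # quality of the approximation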

3.3 Classification

NMF is still a relevant research topic, and has been explored by researchers from many different fields, including mathematicians, statisticians, computer scientists or biologists. Given the wide range of use, over time this has led to different variations of and additional constraints on the algorithms. Therefore, a taxonomy system was proposed in [29], outlined below.

Figure 3.1: The NMF classification as per [29].

3.3.1 Basic NMF

This is the basic model which only enforces non-negativity, and which all the following ones build upon. However, due to its unconstrained nature, there are many possible solutions, which may cause the algorithm's performance to vary. The further constraints outlined below help in the search for unique solutions and in optimizing for specific scenarios.


3.3.2 Constrained NMF (CNMF)

Constrained NMF imposes additional constraints on the resulting matrices, namely:

Sparse NMF SPNMF, sparseness constraint

Orthogonal NMF ONMF, orthogonality constraint

Discriminant NMF DNMF, couples discriminant information along with the decomposition

NMF on manifold MNMF, preserves local topological properties

3.3.3 Structured NMF (SNMF)

Structured NMF modifies standard factorization formulations:

Weighed NMF WNMF, attaches weights to different elements relative to their importance

Convolutive NMF CVNMF, considers time-frequency domain factorization

Non-negative Matrix Trifactorization NMTF, decomposes the data into three matrices

3.3.4 Generalized NMF (GNMF)

Generalized NMF can be considered a broader variant of Basic NMF, where conventional data types or factorization modes may be replaced with something different. It's split as follows:

Semi-NMF relaxes the non-negativity constraint on a specific factor matrix

Non-negative Tensor Factorization NTF, generalizes the model to higher dimensional tensors

Non-negative Matrix-set Factorization NMSF, extends the data sets from matrices to matrix-sets

Kernel NMF KNMF, non-linear model of NMF


3.4 Properties

The additional constraint of non-negativity is important, as results show that it leads to a naturally higher sparseness in both the basis matrix (W) and the encoding matrix (H). Additionally, non-negativity leads to a parts-based representation, which is similar to how our brains are presumed to work, basically combining parts in an additive manner to form a whole instead of subtracting. [31] This sparseness makes it even easier to further compress the resulting matrices, saving us more space. However, this isn't without any downsides. While the concept of adding parts together seems to make a lot of sense, there is an issue. Since NMF employs a holistic approach, the additive parts learned by it in an unsupervised mode only consider features on a global level, and do not allow for representation of spatially localized features. [32] So while on paper NMF might seem better than PCA or SVD for a parts-based representation, it comes at a cost of increased complexity, and since both PCA and SVD have a more compact spectrum than NMF, we must consider if this is worth the trade-off. [29]

3.5 Algorithms

For the purposes of this thesis, we will only consider algorithms for Basic NMF. While NMF has been widely used in sound analysis etc., as we'll see below, its use for audio compression specifically is rare and there are limited resources to provide insight into utilizing possible constraints; therefore we will only be using the standard version. Finding a decomposition of a matrix V into matrices W and H is an NP-hard problem, and as such, the resulting matrices are generally only approximated over a number of iterations of an optimization algorithm. What this means in practice is that the result we find is likely sub-optimal, i.e. a local minimum.

3.5.1 Cost function

When using iterative updates, in each step of the process we need to evaluate the quality of the approximation. The function that does this is called the cost function, or objective function. There are two simple, commonly used functions. Firstly, we can use the squared Euclidean distance: [33]

$$\|A - B\|^2 = \sum_{ij} (A_{ij} - B_{ij})^2 \tag{3.3}$$

This is lower bounded by 0, which it equals only if A = B.


Another metric we can use is based on Kullback-Leibler divergence, and is defined as follows: [30]

$$D(A\|B) = \sum_{ij}\left(A_{ij} \log\frac{A_{ij}}{B_{ij}} - A_{ij} + B_{ij}\right) \tag{3.4}$$

3.5.2 Update rules

With the cost function in place, we now need a rule to apply in each iteration to try and minimize the value of the cost function. It has been found that a good compromise between speed and ease of implementation is to use what's called multiplicative update rules. [30] Despite being over 15 years old, they are still very commonly used for exactly this reason. The Euclidean distance ||V − WH|| is non-increasing under the following update rules, and stays unchanged exactly when W and H are at a stationary point of the distance:

$$H_{a\mu} \leftarrow H_{a\mu} \frac{(W^T V)_{a\mu}}{(W^T W H)_{a\mu}} \tag{3.5}$$

$$W_{ia} \leftarrow W_{ia} \frac{(V H^T)_{ia}}{(W H H^T)_{ia}} \tag{3.6}$$

And the divergence D(V||WH) is likewise non-increasing under the following update rules, staying unchanged exactly when W and H are at a stationary point of the divergence:

$$H_{a\mu} \leftarrow H_{a\mu} \frac{\sum_i W_{ia} V_{i\mu}/(WH)_{i\mu}}{\sum_k W_{ka}} \tag{3.7}$$

$$W_{ia} \leftarrow W_{ia} \frac{\sum_\mu H_{a\mu} V_{i\mu}/(WH)_{i\mu}}{\sum_v H_{av}} \tag{3.8}$$
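Put together, basic NMF with the Euclidean rules (3.5) and (3.6) fits in a few lines of NumPy (a sketch; the small eps guarding against division by zero is my addition):

import numpy as np

def nmf_euclidean(V, r, iterations=200, eps=1e-9):
    # approximate V (n x m) by W (n x r) @ H (r x m), all non-negative
    n, m = V.shape
    W = np.random.rand(n, r)
    H = np.random.rand(r, m)
    for _ in range(iterations):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # rule (3.5)
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # rule (3.6)
    return W, H

V = np.random.rand(64, 48)
W, H = nmf_euclidean(V, r=8)
print(np.linalg.norm(V - W @ H) ** 2)          # cost (3.3), non-increasing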

3.5.3 Initialization

Before we can use our update rules to iteratively optimize the cost function, we first need some initial values for the matrices W and H. In practice, different initializations generally yield different solutions, so this may be worth considering. [34] The most basic way of initialization is to simply fill the matrices with random values. This generally works decently, but you somewhat lose out on controlling the composition of the matrices. A small way to improve this is to generate random matrices a couple of times and pick the one with the lowest cost function value, but the issue remains. As per [34], there are many different ways to initialize the matrices, usually relating to constraints on the matrices, e.g. initialize H in such a way

that none of the values are above a certain threshold, etc. But given the low amount of research on audio compression using NMF, it's difficult to gauge which of these might work better for audio than others, and as such this work will not elaborate on them further.

3.6 Use in digital audio processing

In digital audio, non-negative matrix factorization is mostly used as a tool for analysis rather than compression. The ability to extract hidden features from a given signal is useful for a number of tasks. For example, NMF sees use in audio separation tasks. [35] The gist of this lies in first creating a spectrogram of the signal using STFT and then using NMF decomposition to isolate different kinds of sounds or instruments from it, roughly represented by the basis matrix. Another example of use is musical transcription. [10] The idea here is to process an input signal via STFT, further filtering and NMF, to isolate individual notes from e.g. a piano piece. Both of these methods have seen some success, but as we will see later, applying NMF to compression rather than analysis is not a simple task and might not even be worth it.

Chapter 4

Data compression

Compression can be split into two kinds - lossy and lossless. Using "lossless" in the context of audio is a bit misleading, since sampling itself is a lossy process, but using a high enough sampling rate we will not notice any difference, so sampled audio without any lossy compression will be our baseline. For audio, lossless compression generally means taking some form of digital audio representation and losslessly compressing this data. This will preserve the signal in its entirety with a reduced bit-rate. However, due to the size of such audio (an audio CD could only fit about 80 minutes of such music sampled at 44.1 kHz), it's become more common to use a lossy format. Lossy compression implies that there will be loss of data, and while this is true, thanks to the application of various psychoacoustic principles the size of audio can be greatly reduced without altering human perception, leading to vastly smaller bit-rates at no real cost. This work focuses on lossy audio compression, therefore only lossy codecs will be considered for comparison.

4.1 Lossy audio compression

4.1.1 State of the art

Due to its qualities of efficiently compacting energy and mitigating artifacts at block boundaries, MDCT is the most commonly used transformation in modern lossy audio coding, and is employed in the most popular audio formats including MP3, Opus, Vorbis or AAC. In this section, I will elaborate on some of the more popular ones to get an idea of what considerations go into writing a modern . They are also the ones that will be used for comparison with my own encoding scheme.


Figure 4.1: The MP3 encoding scheme, simplified.

4.1.1.1 MP3

MP3, or MPEG-1 Layer III, was standardized in 1991 and has since become widespread throughout a multitude of electronic devices as the de-facto standard for music storage. Its simplified encoding scheme can be seen in Figure 4.1. It's a very powerful compression/decompression scheme capable of reducing the bit-rate of an audio stream by up to a factor of 12 without any noticeable (to humans) quality degradation. In other words, to transmit CD quality audio, it needs a bitrate of 128 kbps. [7] The core of MP3 compression is the Modified discrete cosine transform. The signal represented in its PCM form is first split into 32 subbands using an analysis polyphase filterbank, and each of those is further split into 18 MDCT bins, so overall we end up with 576 MDCT frequency bins per frame. These bins are then sorted into 22 scalefactor bands, which roughly correlate to the 24 critical bands of human hearing.

The point of these bands is that you may individually scale each of them up or down depending on how much precision you need for that specific frequency range. This is usually done by dividing and rounding the values in the band, losing a certain amount of information; this process will be reversed during decoding. The signal is also analyzed using the Fourier transform, which gives us frequency information for the signal in the same frame, and we can use this information to determine how much to scale each scalefactor band - e.g. if there's some weak sound that will be masked by another, we can assign the band it's in lower precision, saving data. [36] Once we have the scaled and quantized data, we use Huffman encoding to losslessly compress these values, and format this output into the final bitstream, encoding our audio.

4.1.1.2 Opus (CELT)

The Opus codec has been standardized fairly recently, in 2012 [37], compared to other widespread audio codecs. Opus was created from two core technologies - Skype's SILK codec, based on linear prediction, and Xiph.Org's CELT codec, based on the MDCT. [38] As a result, Opus is capable of performing in three different modes:

SILK used for speech signals
CELT used for high-bitrate speech and music
Hybrid both SILK and CELT used simultaneously

This thesis will mainly focus on Opus CELT due to it being more general purpose than SILK, which will make it easier to use for comparison with an NMF-based codec. SILK uses a technique known as linear predictive coding, whose complexity and difficulty of implementation exceeds the scope of this work. Opus is designed with real-time constraints in mind, that is, for example network music performances which require very low delays. Despite that, however, its compression is comparable to codecs with higher delays intended for streaming or storage. This makes it a good candidate to test against in this work, alongside MP3. CELT (Constrained Energy Lapped Transform) is based on the MDCT similar to MP3, but the main difference is that Opus uses specifically sized bands with ranges similar to the Bark scale in order to preserve the spectral envelope of the signal. Similar to MP3, Opus CELT works on the basis of quantizing MDCT coefficients, but utilizes various kinds of prediction, focuses more on handling transients, and is much more complex in general. Through various optimizations Opus achieves similar results but at a reduced bitrate.


Figure 4.2: This figure attempts to summarize the results of a collection of listening tests, comparing various lossy audio codecs. Image attributed to Jean-Marc Valin and licensed under CC BY 3.0.

4.1.2 Compression using NMF

This section is separate from the state of the art, as NMF is not really used for audio compression in practice. There is one notable example that will be covered and to an extent implemented here; for more details refer to [39]. The proposed method uses a metric previously formulated in [40] called noise-to-mask ratio, or NMR for short. They successfully apply this metric to non-negative matrix factorization, establishing a custom cost function to represent NMR, and also propose update rules based on Euclidean distance for minimizing this cost function. Results showed that NMR performed better than other, more general cost functions. Audio compressed using NMF + NMR had better perceptual quality, and the method managed to encode signals with lots of transient sounds in better quality. However, NMR in its current form is part of a proprietary algorithm known as PEAQ [41], and while the standard is public, it is under-specified and a conforming implementation is not available, so this work will not use it. This modified NMF is then used in what the paper refers to as object-based audio coding, and the principle is fairly simple to understand. Instead of using MDCT for conversion to the frequency domain, they use STFT, obtaining the frequency spectrum as a set of complex values.

They then take the magnitude and phase of each of these complex values to obtain the magnitude spectrogram and phase information. The magnitude spectrogram is then compressed using NMF (+NMR) among other things, whereas the phase information is entropy encoded separately and does not utilize NMF at all. The conclusion is that while object-based audio coding done this way achieves reasonable bit-rate and quality on the magnitude spectrogram, it is difficult to efficiently encode the phase information, and as such the bit-rate of this approach proved to be too high to be usable in practice compared to other, more common, MDCT based methods.

4.2 Lossless data compression

Despite focusing on lossy compression, there are some parts in this work that will be compressed losslessly. Specifically, certain coefficients will be compressed using Huffman coding, so it's important to define that first.

4.2.1 Huffman coding

Huffman code refers to an optimal prefix coding scheme often employed for lossless compression. It's used to compress a message that only consists of members from a finite set of symbols known beforehand. [42] The goal is to create a dictionary that maps each value to a sequence of bits, where none of the sequences is a prefix of another one. This means that there are no ambiguities when decoding a Huffman encoded message, and we do not need to store any information about where one code begins and where it ends. The main characteristic of a Huffman code is that the more frequent a symbol is, the shorter the code it will be assigned. So in the case of a system where a certain symbol is very frequent, a message consisting of mostly those symbols will be compressed greatly, with no loss of data. Algorithm 1 describes the compression process, at the end of which we obtain a Huffman tree. By traversing this tree from the root, assigning every left child a 0 and every right child a 1, and concatenating these bits as we reach the symbols in the leaves, we obtain the Huffman code for the given symbol. The asymptotic complexity of this algorithm is O(n log n): essentially, we need O(log n) time per priority queue operation, and there are O(n) iterations. Often, the dictionary itself has to be encoded as well, but as the frequencies are roughly the same for each audio file, the tree can be fixed directly into the implementation.

29 4. Data compression

Input: a list of symbols and their probabilities
Output: a Huffman tree

queue ← new PriorityQueue();
foreach item x in input do
    node ← new Node();
    node.symbol ← x.symbol;
    node.prob ← x.probability;
    node.leftChild, node.rightChild ← null;
    queue.enqueue(node, node.prob);
end
while queue.length > 1 do
    node1 ← queue.dequeue();
    node2 ← queue.dequeue();
    newNode ← new Node();
    newNode.leftChild ← node1;
    newNode.rightChild ← node2;
    newNode.prob ← node1.prob + node2.prob;
    queue.enqueue(newNode, newNode.prob);
end
Algorithm 1: Huffman code compression
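For reference, the same construction can be sketched in Python using the standard heapq module as the priority queue (the actual implementation uses the dahuffman library instead, see chapter 6):

import heapq
import itertools

def huffman_code(probs):
    # probs: {symbol: probability}; internal nodes are (left, right) tuples
    count = itertools.count()          # tie-breaker for equal probabilities
    heap = [(p, next(count), sym) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:               # merge the two least probable nodes
        p1, _, left = heapq.heappop(heap)
        p2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, next(count), (left, right)))
    codes = {}
    def walk(node, prefix):            # left child gets 0, right child 1
        if isinstance(node, tuple):
            walk(node[0], prefix + '0')
            walk(node[1], prefix + '1')
        else:
            codes[node] = prefix or '0'
    walk(heap[0][2], '')
    return codes

print(huffman_code({'a': 0.5, 'b': 0.25, 'c': 0.25}))
# {'a': '0', 'b': '10', 'c': '11'} - more frequent symbols get shorter codes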

Chapter 5

Design

In this chapter I will give an overview of how this audio codec is designed, explaining the various decisions I made along the way. I have decided to call this new codec ANMF (which stands for Audio-NMF), using files with the extension .anmfx, where x represents the compression method. It will be implemented as a command line utility. There are currently three different ANMF formats that you can choose from, and their main difference is which audio representation is being compressed by NMF. They are as follows:

ANMF-RAW denoted by r, compresses the signal in PCM form (time domain)

ANMF-MDCT denoted by m, compresses the signal transformed with MDCT (frequency domain)

ANMF-STFT denoted by s, compresses the signal transformed with STFT (frequency domain)

5.1 WAVE file

WAVE, WAV or Waveform audio is a file format for storing digitized audio, created as a joint design by the Microsoft Corporation and the IBM Corporation. It is built on top of the chunk-based RIFF format. For details on the specific structure of a WAVE file, please refer to [43]. It stores raw uncompressed audio samples in PCM format along with some metadata and will serve as both the standard input and output of the ANMF codec. Most commonly used formats can be converted to and from WAV as well, and thus it will serve as a good baseline. Samples in this format can be represented by different datatypes; I chose 16-bit signed integers, i.e. each sample's amplitude is represented by a whole number between −32768 and 32767. The sample rate can vary, but a good standard value the experiments will use is 44.1 kHz, which corresponds to audio CD quality.

5.2 ANMF file structure

The base container for the compressed ANMF file is the same no matter which encoding method is used. The bytes are saved in little endian byte order. Please refer to Table 5.1 and each encoding method's table for the specific file structure. The first eleven bytes are fixed apart from the method specification, but after that the layout varies greatly.

Table 5.1: ANMF file structure

Bytes   Data type   Description
0-3     char[]      identifier string "ANMF"
4       char        method used, can be 'S', 'R' or 'M'
5-6     uint16      number of channels
7-10    uint32      sample rate
11-?    enc data    encoded data depending on the method

When serializing matrices to a file, the structure in Table 5.2 will be used, denoted by a "matrix(dt)" datatype in the following tables, where dt stands for the datatype used for the matrix elements.

Table 5.2: Serialized matrix structure

Bytes (relative)   Data type   Description
0-3                uint32      number of rows in the matrix
4-7                uint32      number of columns in the matrix
8-(8 + x − 1)      dt          row-wise values of the matrix, x = rows · columns

There is also one more kind of matrix, a matrix using Huffman encoded quantized values. This will be denoted by a "quant matrix" datatype. For its specification, please see Table 5.3.

Table 5.3: Serialized quantized matrix structure

Bytes (relative)   Data type   Description
0-3                uint32      number of rows in the matrix
4-7                uint32      length L of the Huffman encoded byte stream
8-(8 + L − 1)      byte[]      Huffman encoded byte stream representing the matrix


5.3 Encoder

The encoder is responsible for taking a raw audio file and encoding the data within, producing a compressed version of the original. Please refer to Figure 5.1 for a visual representation of the process.

Figure 5.1: A high level overview of the ANMF audio encoder.

Next, each format’s encoding process will be outlined (third step in the figure). If the audio file has multiple channels, this process is repeated on each separately.

5.3.1 ANMF-RAW

Figure 5.2: The encoding scheme for ANMF-RAW.

ANMF-RAW works on the principle of applying NMF directly to the PCM audio samples xn. However, the samples are initially an array of 16-bit signed integers, and as such, they need to be processed before NMF can be used. A chunk shape is specified to determine how many rows and columns each matrix will have before NMF. We choose a target matrix shape of 1152 × 200 to match the number of samples per frame in the other methods. The sample array is then padded with zeroes at the end so that every matrix can have the same shape and number of elements. The amount of padding must be written to the output so that we know the length of the original array when decoding. Once the array is padded, we iterate over the samples and split them into equal chunks of size rows · columns. Each chunk is then "folded" to produce a matrix of the desired shape. We thereby obtain a matrix of signed integers, so in order to be able to use NMF, we first need to get rid of all the negative values. To do that, we increment each chunk by the absolute value of its smallest element, guaranteeing that the lowest value in the matrix is ≥ 0. Once we have this matrix, we proceed by applying basic random-initialized Euclidean-based NMF on it, obtaining the basis matrix W and coefficient matrix H. Lastly, for each chunk, we write the value the matrix was incremented by, and the two decomposition matrices.
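As an illustration, the following is a minimal NumPy sketch of this per-chunk preparation; the function names and the nmf callback are assumptions made for the example, not the codec's actual API.

import numpy as np

def split_samples(samples, rows=1152, cols=200):
    # pad with zeroes so the sample count divides evenly, then "fold" into matrices
    padding = -len(samples) % (rows * cols)
    padded = np.concatenate([samples, np.zeros(padding, dtype=samples.dtype)])
    return padded.reshape(-1, rows, cols), padding

def encode_raw_chunk(chunk, nmf, rank=40, max_iter=3000):
    chunk = chunk.astype(np.int32)   # widen from int16 so the shift cannot overflow
    min_val = abs(chunk.min())       # offset that removes negative values
    W, H = nmf(chunk + min_val, rank, max_iter)
    return min_val, W.astype(np.float32), H.astype(np.float32)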


Table 5.4: ANMF-RAW data structure

Bytes (relative)   Data type      Description
0-3                uint32         number of zeroes used to pad the samples
4-7                uint32         number of chunks
8-?                data chunk[]   NMF-compressed data chunks (refer to Table 5.5)

Table 5.5: ANMF-RAW structure of each data chunk

Bytes (relative)   Data type         Description
0-7                float64           absolute value that the matrix was incremented by
8-?                matrix(float32)   matrix W
?-?                matrix(float32)   matrix H

Figure 5.3: The encoding scheme for ANMF-MDCT.

5.3.2 ANMF-MDCT

In ANMF-MDCT, as the name suggests, the PCM input will be transformed using MDCT as per Section 2.2.2.3. Since this is a lapped transform, each transformed block will have a 50% overlap with the following block. We choose a frame size of 2N = 1152 and split the signal into blocks of that size; each block thus contains N = 576 samples of its own and shares another 576 with the following one. The decision to have a block size of N = 576 stems from the fact that this gives us the same frequency resolution that e.g. MP3 uses (as seen in Section 4.1.1.1), which proved to be enough for human hearing. As before, we first pad the signal at the end with zeroes to align it to the desired block size. To prevent loss of data in the first and the last block due to the overlapping, we further pad the signal with an array of zeroes equal in size to one block, that is N zeroes both at the beginning and the end of the signal. Then, we apply a windowing function on each block to bring the values near the edges closer to 0 to help mitigate spectral leakage. We use the MLT window w^M_n as defined in Section 2.1.4.3. Finally, we apply the MDCT on each of the windowed blocks and obtain a matrix of MDCT coefficients in the form of real numbers. This matrix is then split into smaller chunks. For example, if the MDCT matrix contains 576 rows and 1100 columns, we might split it into submatrices sized 576 × 200, with the last one being 576 × 100, as no padding is necessary here. The number of chunks is written to the output. We increment the MDCT matrix by its lowest value to get rid of negative values. Basic random-initialized Euclidean-based NMF is then run on each of the chunks separately and the decomposition matrices serialized.

Table 5.6: ANMF-MDCT data structure

Bytes (relative)   Data type      Description
0-3                uint32         number of zeroes used to pad the samples
4-7                uint32         number of MDCT submatrix chunks
8-?                data chunk[]   NMF-compressed MDCT chunks (refer to Table 5.7)

Table 5.7: ANMF-MDCT structure of each data chunk

Bytes (relative)   Data type         Description
0-7                float64           absolute value that the matrix was incremented by
8-?                matrix(float32)   matrix W
?-?                matrix(float32)   matrix H

5.3.3 ANMF-STFT

The design of ANMF-STFT is based on the solution suggested in [39], with some changes, along with only utilizing open source solutions. Like with ANMF-MDCT, we choose a frame size of N = 1152, leading to a frequency resolution of 576 bins. We begin by padding the signal to a multiple of N so that it can be split into equal parts. We then use STFT with 50% overlap and a block size of N, which means we end up with twice the coefficients compared to MDCT, but this is not a major issue. During STFT, we must again window each block, leading to overlapping windows. We use the Hann window w^H_n for this (as defined in Section 2.1.4.2). Once STFT is finished, we end up with a matrix of Fourier transform coefficients in the form of complex numbers. Trying to apply NMF on the complex numbers directly would yield similar results to ANMF-MDCT, so we have to approach this differently. If we visualise the complex valued elements in the complex plane, we can instead represent each element z as two separate values:

magnitude also called the modulus; geometrically, it is the distance from 0

phase also called the argument; geometrically, it is the angle from the real axis

To obtain the phase φ of a complex number z = x + iy, we can use the following formula:

φ(z) = arg(z) = arctan2(y, x) (5.1)


Figure 5.4: The encoding scheme for ANMF-STFT.

And to obtain the magnitude |z| of the complex number:

|z| = √(x² + y²)    (5.2)

By calculating the magnitude and phase of every element in the STFT matrix individually, we obtain the magnitude spectrogram and the phase spectrogram, respectively. We now need to encode both of them individually. For the phase matrix, my experiments showed that applying NMF on it leads to a very noticeable loss in quality, so instead I opted for a different solution that ultimately ends up saving more space than NMF would. The phase matrix contains values ranging from −π to π. These values are uniformly quantized into 8 levels (n_p = 3 bits) as per Section 2.1.3.1. Due to the relative frequency of the boundary values −π and π, a mid-tread quantizer is used. The frequency of each quantization level is visible in Figure 5.5. Using these frequencies, we are able to construct a Huffman coding table (refer to Section 4.2.1) and use it to losslessly encode the quantized phase matrix using at most 4 bits per value.
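A short sketch of this phase-path quantization under the stated parameters (mid-tread, 8 levels over [−π, π]) follows; the resulting index matrix is what the Huffman coder would then consume. The function names are illustrative.

import numpy as np

LEVELS = 8
STEP = 2 * np.pi / (LEVELS - 1)   # quantization step over [-pi, pi]

def quantize_phase(phase):
    # mid-tread: round to the nearest reconstruction level, return indices 0..7
    return np.round((phase + np.pi) / STEP).astype(np.uint8)

def dequantize_phase(indices):
    # map the indices back to the reconstruction levels
    return indices * STEP - np.pi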

Figure 5.5: Frequency of each quantization level in the phase spectrogram. Data taken from the average of all the example audio files.

The magnitude spectrogram is where we actually use NMF. As it is effectively a matrix of signal amplitudes, it is less prone to noticeable quality loss if the values deviate a little. We split the magnitude matrix column-wise into smaller submatrices, e.g. with chunk size = 500. We increment each submatrix by its lowest value to get rid of negative values and then run basic random-initialized Euclidean-based NMF, obtaining the weight matrix W and coefficient matrix H. For the next step, we apply µ-law compression to both decomposition matrices, as defined in Section 2.1.3.2. In order to be able to do that, we first scale both matrices to the range [0, 1]. We do this by applying the following formula to all elements in both matrices:

M′_ij = (M_ij − min(M)) / |max(M) − min(M)|    (5.3)


To restore the original scale later, we need to save both the minimum and maximum value, otherwise they would be lost in the non-uniform quantization. For the µ-law compression, we experiment with the value of µ later, but a good start is µ_W = 10⁴ and µ_H = 10⁵. We do this because we want to lower the amount of bits needed to represent each value without losing the values near zero - that would lead to a large loss of quality, or in some cases, loss of the signal itself. In the weight matrix, we then convert the compressed values into 32-bit unsigned integers, as going below that proved to cause loss of signal data, and this integer matrix is then serialized into the file. However, in the case of the coefficient matrix H, we can go a lot further. We use a uniform quantizer with 32 levels (n_h = 5 bits) and we opt for a mid-tread quantizer again, as we want to be able to reconstruct a value of 0, i.e. feature not present. Similarly to the phase matrix, we take these quantized values and look at their frequencies. The results of this analysis are visible in Figure 5.6.
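A sketch of the µ-law companding step for values already scaled to [0, 1], assuming the standard compression characteristic that Section 2.1.3.2 refers to:

import numpy as np

def mu_compress(x, mu):
    # expands resolution near zero, where most NMF factor values lie
    return np.log1p(mu * x) / np.log1p(mu)

def mu_expand(y, mu):
    # inverse of the companding above
    return np.expm1(y * np.log1p(mu)) / mu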

Figure 5.6: Frequency of each quantization level in the decomposed coefficients matrix of the magnitude spectrogram. Data taken from the average of all the example audio files.

Using this information, we can construct another Huffman coding table, separate for this matrix. As we can see, even despite the µ-law compression, a lot of values still fall to a quantized value of 0. In a Huffman table built from these frequencies, the value 0 is therefore assigned the shortest code, i.e. we only need 1 bit for it. The longest code in a table constructed this way for 32 levels of quantization will only require 11 bits. Overall, we save a lot of space storing the matrix this way as opposed to storing the values as 32-bit floats or integers.

Table 5.8: ANMF-STFT data structure

Bytes (relative)   Data type      Description
0-3                uint32         number of zeroes used to pad the samples
4-7                uint32         number of STFT submatrix chunks
8-?                quant matrix   quantized phase matrix
?-?                data chunk[]   NMF-compressed magnitude chunks (refer to Table 5.9)

Table 5.9: ANMF-STFT structure of each magnitude submatrix

Bytes (relative)   Data type        Description
0-7                float64          absolute value that the matrix was incremented by
8-15               float64          minimum value before scaling to [0, 1]
16-23              float64          maximum value before scaling to [0, 1]
24-?               matrix(uint32)   µ-law compressed matrix W
?-?                quant matrix     quantized µ-law compressed matrix H

5.4 Decoder

Similar to the encoder, the decoder simply reverses the encoding process as seen in Figure 5.7. As this process is fairly straightforward for each of the methods, it won’t be elaborated on further.

Figure 5.7: A high level overview of the ANMF audio decoder.


Chapter 6

Implementation

This chapter explains the specific details of implementing the audio codec outlined in the previous chapter. The codec was implemented entirely using Python 3 (x64) [44] and the source code is available both as part of this thesis and on my personal GitHub profile at https://github.com/argoneuscze/AudioNMF. The codec can be roughly split into three parts:

Command line utility the command line interface (CLI), interacts with the user

Audio library a set of classes and functions responsible for reading/writing audio files

Compression library a set of classes and functions responsible for encoding and decoding ANMF files

6.1 Used third-party libraries

The codec uses several libraries for tasks where a tried and tested method is both faster and more reliable than implementing my own. A brief description of each of them follows.

6.1.1 click

The Click [45] library lets you create command line interfaces in a descriptive manner without having to manually read user input. This helps prevent errors regarding unexpected parameters and such. It also automatically generates the help message describing the various options and arguments, all of which makes it suitable for a project like this.


6.1.2 pytest

pytest [46] is a framework for writing and executing tests in Python. To ensure certain functions work correctly and give expected results, several unit tests have been written.

6.1.3 dahuffman

To create Huffman trees and use Huffman coding, a library called dahuffman [47] was utilized. It determines the frequencies either from a dictionary containing value → frequency pairs, or it infers them automatically from existing data (as per Section 4.2.1). Once the frequencies are determined, the library is capable of losslessly encoding and decoding arbitrary data containing the symbols from the Huffman tree into a serializable byte array. If needed, the Huffman table including the exact code for each symbol can be printed to validate it works correctly. A wrapper class called HuffmanCoder was made around this library, to facilitate easier usage and to be able to easily separate different coders using varying dictionaries. It has convenience methods encode_int_matrix and decode_int_matrix. When encoding, it flattens the matrix into an array, dahuffman encodes the array using the associated Huffman table, and finally the function returns both the bytes and the number of rows in the original matrix. Decoding does the reverse - it takes these bytes and the number of rows, decodes the bytes into an array of values, and then re-shapes the array into the original matrix.
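A sketch of what such a wrapper can look like on top of dahuffman's HuffmanCodec; the method names follow the text, but the body is an approximation, not the codec's actual source.

import numpy as np
from dahuffman import HuffmanCodec

class HuffmanCoder:
    def __init__(self, frequencies):
        # frequencies: dict mapping each symbol to its frequency
        self.codec = HuffmanCodec.from_frequencies(frequencies)

    def encode_int_matrix(self, matrix):
        # flatten row-wise, encode, and remember the row count for decoding
        data = self.codec.encode(matrix.flatten().tolist())
        return data, matrix.shape[0]

    def decode_int_matrix(self, data, rows):
        values = self.codec.decode(data)
        return np.array(values).reshape(rows, -1)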

6.1.4 SciPy

SciPy [48] is short for Scientific Python and, as the name suggests, this library provides the bulk of the mathematical support for the implementation. SciPy as a whole is more of a stack, or ecosystem, containing various sub-libraries (covered below). However, the SciPy library itself provides some algorithms which are used as described later.

6.1.4.1 NumPy

NumPy [49] is part of the SciPy stack and its most important feature for us is that it provides an easy way to create and manipulate N-dimensional array objects; in our case, for the most part, 2D matrices. It has a friendly syntax and all the core functions for creating matrices, mapping functions over them, multiplying them etc., all of which are very important. It also supports complex numbers and operations on them seamlessly. Even though it is a Python library, a lot of its performance critical parts are written in lower level languages such as C and are optimized for complex calculations. NumPy is heavily utilized in this work, as the problem of NMF mostly concerns 2D matrices and NumPy is ideal for working with those.

6.1.4.2 Matplotlib

Matplotlib [50] is also part of the SciPy stack and it is best described as a Python plotting library, capable of producing various (primarily 2D) plots and diagrams from data. As it is closely integrated with SciPy, it can natively plot data structures created by e.g. NumPy without any boilerplate code necessary. Matplotlib is responsible for most of the diagrams visible in this thesis.

6.2 Command line utility

The command line part of the program (and its entry point) was created using the Click library. The CLI lets you specify which kind of compression you want to use, between ANMF-RAW, ANMF-MDCT and ANMF-STFT, defaulting to ANMF-STFT. When decompressing, it automatically determines the compression type from the file's extension. The bulk of the work of the command line utility is to pass the input file's handle to the audio library and obtain a reference to the parsed audio data. This data is then passed to the compression library along with a reference to the output file handle, and once the target file is compressed or decompressed, the program returns.

6.3 Audio library

The audio library is capable of reading samples from audio files, or creating audio files from samples; in both cases, the only supported format right now is 16-bit signed PCM WAV. To facilitate reading and writing WAV files, I use the methods contained in the scipy.io.wavfile namespace, specifically scipy.io.wavfile.read and scipy.io.wavfile.write. When a WAV file is read, its raw samples and metadata are stored in an internal class called AudioData. This class can then either call the compression library to save it as a compressed ANMF file, or it may call into SciPy again to save it into a WAV file.


6.4 Compression library

The compression library is a collection of three classes:

NMFCompressorRaw class responsible for compressing and decompressing ANMF-RAW data

NMFCompressorMDCT class responsible for compressing and decompressing ANMF-MDCT data

NMFCompressorSTFT class responsible for compressing and decompressing ANMF-STFT data

Each of those has two functions: compress(audio_data, file_handle) and decompress(file_handle, audio_data). For compression, it reads the audio data and writes it to the target file. For decompression, it reads the raw data from the file and creates a new AudioData object, updating the reference with the decompressed data. The encoding process has already been described in Section 5.3, so I will only include specific implementation details here. The decoding process performs the same operations inverted, in the same order, so it will not be elaborated on in detail. Unless mentioned otherwise, all matrix operations are implemented using NumPy and its associated data structures and functions. Writing data to a file is done using the struct module from the Python standard library. During decoding, when the audio data is finalized, it must be converted back to signed 16-bit integers, so that it can be saved to a WAV file.

6.4.1 Utility functions To avoid code repetition, a lot of functionality is shared between the vari- ous methods using different helper functions. The most important ones are described below.

6.4.1.1 Matrix (de)serialization There are two core functions for matrix serialization and deserialization to/from a file:

• serialize_matrix(fd, matrix, dtype=’f’)

• deserialize_matrix(fd, dtype=’f’)

For serialization, the input matrix is taken and converted to the target datatype, by default a 32-bit float, but this can be changed by the caller. Then, two 32-bit unsigned integers are written representing the number of rows and columns in the matrix. After that, the matrix is flattened row-wise and each value is written to the file individually. Deserialization works on the same principle. First we need to read the number of rows and columns, and then read rows × columns × sizeof(datatype) bytes to obtain a data array, which is then re-shaped to the original matrix and returned.
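A sketch of the two functions using the struct module (little endian, per Section 5.2); for brevity the values are packed in one call here rather than written individually.

import struct
import numpy as np

def serialize_matrix(fd, matrix, dtype='f'):
    rows, cols = matrix.shape
    fd.write(struct.pack('<II', rows, cols))          # header: rows, columns
    fd.write(struct.pack('<%d%s' % (rows * cols, dtype),
                         *matrix.flatten()))          # row-wise values

def deserialize_matrix(fd, dtype='f'):
    rows, cols = struct.unpack('<II', fd.read(8))
    count = rows * cols
    raw = fd.read(count * struct.calcsize(dtype))
    values = struct.unpack('<%d%s' % (count, dtype), raw)
    return np.array(values).reshape(rows, cols)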

6.4.1.2 Array padding

If we want to run some transformation on an array of values, we may need the array’s length to be divisible by a certain number, to be able to split it into equal parts. There is a utility function for this with the signature array_pad(array, n), where array is the array we want to pad and n is the number whose multiple the array’s length should be. We first find the necessary padding by calculating padding = −length(array) mod n and then add that amount of zeroes to the end of the array, returning both the new array and the amount of padding. The padded zeroes need to be removed again during decoding, otherwise we’ll end up with a different signal than the original. There is a variant of this function for convenience called array_pad_split(array, n), which does the same as the former, but also splits the array into n-sized chunks, returning a list of the new subarrays and the padding at the end.
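The padding helpers are small enough to sketch in full; this mirrors the description above, assuming NumPy arrays.

import numpy as np

def array_pad(array, n):
    # append zeroes so len(array) becomes a multiple of n
    padding = -len(array) % n
    padded = np.concatenate([array, np.zeros(padding, dtype=array.dtype)])
    return padded, padding

def array_pad_split(array, n):
    # same, but also split the padded array into n-sized chunks
    padded, padding = array_pad(array, n)
    return np.split(padded, len(padded) // n), padding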

6.4.1.3 Matrix splitting

Similarly to the previous functions, there are times where we have a matrix and we want to split it into smaller matrices. There is a function for this called matrix_split(matrix, n). It takes an input matrix sized x × y and splits it into ⌈x/n⌉ submatrices, each sized n × y (the last one possibly smaller). There is no padding necessary in this case.

6.4.1.4 Range scaling

There are a few instances where a value (or more often, an entire matrix of values) needs to be scaled to a certain range, e.g. for µ-law companding. The design chapter already describes a simple equation to scale a range of values to the [0, 1] range (Equation 5.3), but the actual implementation allows scaling from an arbitrary range to an arbitrary range. This function is called scale_val and you can see its code in Listing 6.1.


Listing 6.1: Function for scaling value ranges

def scale_val(x, old_min, old_max, new_min, new_max):
    old_range = abs(old_max - old_min)
    new_range = abs(new_max - new_min)
    val = (((x - old_min) / old_range) * new_range) + new_min
    return val

6.4.1.5 Uniform quantization

In ANMF-STFT, it is often needed to uniformly quantize a range of values into N levels. To make this easier, a special class UniformQuantizer was implemented. It has three parameters:

min val the minimum value in the range

max val the maximum value in the range

levels the desired amount of quantization levels

Internally, it uses a mid-tread quantizer as defined in Equation 2.2. Rather than returning the quantized value itself, it returns an index in the range [0, levels − 1]. This means that, given a UniformQuantizer and the index of a quantized value, the actual quantized value x can be trivially determined as:

x = min val + index · step (6.1)

Where step is defined as:

step = (max val − min val) / (levels − 1)    (6.2)

It has two methods: quantize_value(value) and dequantize_index(index), depending on what you want to do.
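A sketch of the class following Equations 6.1 and 6.2 (illustrative, not the actual source):

class UniformQuantizer:
    def __init__(self, min_val, max_val, levels):
        self.min_val = min_val
        self.levels = levels
        self.step = (max_val - min_val) / (levels - 1)   # Equation 6.2

    def quantize_value(self, value):
        # mid-tread: round to the nearest level, clamp to [0, levels - 1]
        index = round((value - self.min_val) / self.step)
        return max(0, min(self.levels - 1, index))

    def dequantize_index(self, index):
        return self.min_val + index * self.step          # Equation 6.1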

6.4.1.6 NMF helpers

This set of functions helps use NMF by doing any processing necessary both before and after, and serves as a sort of interface between the codec and NMF. It includes three functions:

• nmf_matrix(matrix, max_iter, rank)

• nmf_matrix_original(W, H, min_val)


• increment_by_min(matrix)

increment_by_min takes an input matrix, finds min val = |min(matrix)|, and increments every element in the matrix by min val, ensuring that the matrix does not contain any negative values. nmf_matrix takes an input matrix, applies increment_by_min on it and then calls NMF, obtaining the weight and coefficient matrices. It then returns both of those matrices along with the minimum value the matrix was initially incremented by. nmf_matrix_original takes the decomposition matrices W and H along with the minimum value min val, multiplies the two matrices and subtracts min val from each element of the new matrix, to ensure the original range of values is preserved. It then returns the final approximated matrix.
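Put together, the helpers could look roughly like this, assuming the NMF class from Section 6.4.2 below with defaults for its initialization, cost and update arguments:

import numpy as np

def increment_by_min(matrix):
    # shift by |min| so the matrix contains no negative values
    min_val = abs(matrix.min())
    return matrix + min_val, min_val

def nmf_matrix(matrix, max_iter, rank):
    shifted, min_val = increment_by_min(matrix)
    W, H = NMF(shifted, max_iter, rank).factorize()   # defaults: random init,
    return W, H, min_val                              # Euclidean cost and updates

def nmf_matrix_original(W, H, min_val):
    # approximate the original matrix and undo the shift
    return W @ H - min_val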

6.4.2 NMF implementation

Basic NMF has been implemented in a class called NMF. It is a basic implementation for approximating a matrix V as a product of two matrices WH. The class constructor takes six different arguments:

1. input matrix

2. maximum number of iterations

3. approximation rank

4. initialization function

5. cost function

6. update function

There are a few important functions:

• The validate() function ensures that the input matrix does not contain any negative values, raising an exception otherwise.

• Cost function evaluation is done by the eval_cost() function.

• factorize() stands at the core of the class as it does the actual factorization, returning the factors W and H. Please refer to Listing 6.2 for the source code.

In its default settings, it uses the Euclidean cost function (Equation 3.3) and Euclidean updates (Equation 3.5), with the initial elements of W and H being randomly initialized from the range [0, max(V )].


Listing 6.2: Python code for the factorize function

def factorize(self):
    self.initialize(self)
    last_cost = None
    for i in range(self.max_iter):
        self.update(self)
        cost = self.eval_cost(self)
        if cost == last_cost:
            break
        last_cost = cost
    return self.W, self.H
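The listing does not show the default update function; below is a sketch of the standard multiplicative update rules for the Euclidean cost (the Lee-Seung form that Equation 3.5 refers to), with a small epsilon added to avoid division by zero:

import numpy as np

EPS = 1e-12  # guards the divisions

def update_euclidean(self):
    # H <- H * (W^T V) / (W^T W H);  W <- W * (V H^T) / (W H H^T)
    self.H *= (self.W.T @ self.V) / (self.W.T @ self.W @ self.H + EPS)
    self.W *= (self.V @ self.H.T) / (self.W @ self.H @ self.H.T + EPS)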

6.4.3 ANMF-RAW

In my implementation of the ANMF-RAW method, there are three parameters that can be changed:

CHUNK SHAPE a 2-tuple containing the number of rows and columns that specifies the shape of each chunk before NMF is applied

NMF MAX ITER maximum number of iterations of NMF before the algorithm is stopped

NMF RANK target rank of the NMF approximation

If CHUNK_SHAPE is set to None, the entire array will be turned into a square matrix of dimensions ⌈√N⌉ × ⌈√N⌉, where N is the number of samples in the audio signal for the given channel. However, this generally leads to a huge loss in quality and it is therefore recommended to always specify a shape tuple. I use the utility function array_pad_split to first split the samples into equal parts of size corresponding to the shape of the desired matrix. Then, each of these parts is re-shaped into the proper shape using numpy.reshape, and the values are converted from signed 16-bit integers to signed 32-bit integers to be able to increment the values. The number of chunks along with the extra padding at the end is then written to the output. For each submatrix from the original sample array, we apply NMF using the utility function nmf_matrix, and then write the minimum value to the file along with the serialized matrices W and H.

6.4.4 ANMF-MDCT

ANMF-MDCT uses the following parameters:

FRAME SIZE how many samples a frame should contain, determines frequency resolution, must be an even number


NMF CHUNK SIZE number that specifies the number of frames in each submatrix before NMF is applied

NMF MAX ITER maximum number of iterations of NMF before the algorithm is stopped

NMF RANK target rank of the NMF approximation

As no easy-to-use library was available, I had to implement MDCT on my own. The signal first needs to be padded to the block size using array_pad. Before we can apply MDCT, the signal is windowed using the MLT window, i.e. in every frame of size N = FRAME SIZE, we apply Equation 2.8 to every sample. In the implementation, there are two different algorithms for calculating the MDCT of the input signal. One is called mdct_slow and the other mdct_fast. The slow variant is a direct implementation of Equation 2.16, with the inverse following Equation 2.17. Unfortunately, as the name suggests, it is rather slow for use in production and only served as a reference for unit testing the fast algorithm. The fast variant instead relies on two things:

1. MDCT can be directly derived from a Discrete cosine transform (DCT) [16]

2. DCT can be calculated quickly with the help of the fast Fourier transform (FFT) [51]

Specifically, MDCT can be derived from a type 4 DCT (DCT-IV). We take each block of FRAME SIZE samples and split it into four equal parts (a, b, c, d). Then, due to the boundary conditions, the MDCT of this block is equal to a DCT-IV of the inputs (−cR − d, a − bR), where xR refers to x in reverse order. To calculate the DCT-IV of the resulting array, we use the function scipy.fftpack.dct(input, type=4). We normalize the resulting array by dividing every element in it by 2, making its output equal to the slow algorithm. The function finishes by returning the MDCT coefficient matrix along with the length of padding at the end of the array. Once we have our MDCT, the resulting MDCT matrix is split using matrix_split(mdct, NMF_CHUNK_SIZE). We write the length of the padding followed by the number of split submatrices as 32-bit unsigned integers. Then, for each MDCT submatrix, we apply the utility function nmf_matrix, write the minimum value and then serialize both matrices. The decoding process is fairly straightforward other than inverting the MDCT, so that will be described next.
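A sketch of the fast path for a single windowed block, following the rearrangement described above; the variable names are illustrative and the exact normalization in the actual code may differ.

import numpy as np
from scipy.fftpack import dct

def mdct_fast(block):
    # block: one windowed frame of FRAME_SIZE samples,
    # returns FRAME_SIZE / 2 MDCT coefficients
    a, b, c, d = np.split(block, 4)
    # boundary conditions: MDCT(a, b, c, d) == DCT-IV(-c_R - d, a - b_R)
    rearranged = np.concatenate((-c[::-1] - d, a - b[::-1]))
    return dct(rearranged, type=4) / 2   # divide by 2 to match the direct formula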


If we want to compute the inverse MDCT (IMDCT), it gets a bit tricky. We first run an inverse DCT-IV using scipy.fftpack.idct(input, type=4), giving us back the array in the form of (−cR − d, a − bR), which we need to turn back into (a, b, c, d). By cleverly inverting, reversing and concatenating these subarrays, we are able to obtain an array in the form of (a − bR, b − aR, c + dR, d + cR) (notice that some of the values are redundant). We apply the window again and multiply all the elements by two (to cancel out the division in the MDCT) and obtain nearly the original array. As the final step, we need to cancel out the "extra" values such as −bR. To do that, we take two adjacent overlapping blocks and add them together, leaving us with the original array (a, b, c, d).

6.4.5 ANMF-STFT

ANMF-STFT uses the same parameters as ANMF-MDCT and some additional ones:

FRAME SIZE how many samples a frame should contain, determines frequency resolution, must be an even number

NMF CHUNK SIZE number that specifies the number of frames in each submatrix before NMF is applied

NMF MAX ITER maximum number of iterations of NMF before the algorithm is stopped

NMF RANK target rank of the NMF approximation

MU LAW W parameter µ for µ-law companding of the weight matrix W

MU LAW H parameter µ for µ-law companding of the coefficient matrix H

SciPy contains a function for the entire STFT calculation, called scipy.signal.stft. It handles windowing, overlapping and padding by itself, so it is very handy. At its core, it uses the fast Fourier transform [52]. We choose an overlap of 50% by setting the number of data points per segment to FRAME SIZE and the number of overlapping points to FRAME SIZE / 2. For the window function, we choose the Hann window (Section 2.1.4.2). Before further processing, the output STFT matrix is transposed (for the shape to be consistent with the matrix from MDCT), and then we find the phase matrix and the magnitude matrix by calling numpy.angle and numpy.absolute, respectively, on every element of the matrix. The magnitude matrix is split into chunks of size NMF CHUNK SIZE using matrix_split, while the phase matrix is quantized using a UniformQuantizer in the range [−π, π] with 8 quantization levels, and this matrix of quantized values is then Huffman encoded.
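The front end of the encoder then reduces to a few library calls; a sketch with the parameters described above, assuming samples holds one channel of the WAV input:

import numpy as np
from scipy.signal import stft

FRAME_SIZE = 1152

def stft_spectrograms(samples, fs=44100):
    # Hann window, 50% overlap; Z has one column per frame before transposing
    f, t, Z = stft(samples, fs=fs, window='hann',
                   nperseg=FRAME_SIZE, noverlap=FRAME_SIZE // 2)
    Z = Z.T                               # one row per frame, as in the MDCT path
    return np.absolute(Z), np.angle(Z)    # magnitude and phase spectrograms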


Several things are then written to the file in order: the number of magnitude submatrices (uint32), the number of rows in the phase matrix (uint32), the length of the Huffman encoded phase matrix (uint32) and the Huffman encoded phase matrix bytes themselves. The magnitude submatrices are then processed in order. First we use basic NMF via nmf_matrix, then scale both decomposition matrices to the [0, 1] range using scale_val, and then µ-law compress both using their respective MU LAW [W|H] parameter as per Equation 2.4. For matrix W, we use scale_val to scale it to the range [0, 2³²] and change its datatype to unsigned 32-bit integers. In the case of matrix H, it is uniformly quantized in the range [0, 1] with 32 quantization levels, and then, similar to the phase matrix, it is Huffman encoded using its associated Huffman table. Then, for each submatrix, in order, we write to the file: the value we need to subtract after multiplying WH (float64), the original minimum and maximum of the matrix before it was scaled to [0, 1] (float64), the scaled 32-bit integer matrix W using serialize_matrix, and finally the encoded matrix H in the same manner as the phase matrix. When decoding, the process is again the same but in reverse order. To invert the STFT, we call scipy.signal.istft using the same parameters as before. It estimates the original signal using the algorithm described in [53]. For this algorithm to work properly and attain perfect reconstruction of the original signal, the COLA (Constant OverLap Add) constraint [54] must be met. Using a Hann window at 50% overlap guarantees the satisfaction of this constraint.


Chapter 7

Evaluation

This chapter contains an evaluation of the different algorithms, including experiments with various parameters and a comparison to existing solutions from Section 4.1.1.

7.1 Methodology

Audio compression quality is most often evaluated using a series of listening tests (such as the one pictured in Figure 4.2). However, due to a lack of resources to conduct such tests, a different method must be used, although such methods are generally not as accurate. One of the most used methods for objectively evaluating perceptible audio quality is PEAQ [41]; however, its specification is insufficient and its implementations proprietary. The exact algorithm and parameters are not known for the most part. It comes in two variants:

PEAQ Basic intended for real-time use

PEAQ Advanced a more comprehensive model, intended for non real-time use

Even though neither of these is directly available, luckily, there are a few open-source alternatives that try to implement similar algorithms, with their quality measured by comparing their output to PEAQ, and by proxy, to listening tests. One of the more prominent open-source solutions is GstPEAQ, which according to its paper [55] performs better than the other implementations. So while it does not conform to the PEAQ recommendation directly, its results are within an acceptable margin, and thus this thesis will use GstPEAQ for quality evaluation.


7.2 GstPEAQ

GstPEAQ is a plugin for GStreamer [56] (a pipeline-based multimedia framework) and its source code is freely available at [57]. It implements both the Basic and Advanced mode of PEAQ as specified in [41]; however, as the standard is under-specified, educated guesses had to be made at points. It is also important to mention that I could not get the Advanced mode to work without errors, and as such, testing will be performed in Basic mode only. Just like PEAQ, the algorithm's main output is a value known as the Objective Difference Grade (ODG), which evaluates the perceptible impairment (quality difference) between the provided audio and the reference audio. It uses various psychoacoustic "features" of the signal to determine the grade, the details of which won't be covered here - please refer to either paper for specifics. The ODG scale contains real values from 0 to −4, ranging from an imperceptible difference to one very annoying to the human ear. Please refer to Table 7.1 to see the full scale.

Table 7.1: PEAQ - Objective Difference Grade table

ODG     Impairment description
 0.0    Imperceptible
−1.0    Perceptible, but not annoying
−2.0    Slightly annoying
−3.0    Annoying
−4.0    Very annoying

7.3 Evaluating results

In this section we will experiment with various parameters for all the encoding methods, find a good compromise between bitrate and ODG, and then compare those results to the reference codecs (MP3, Opus) at similar bitrates. The evaluation process works in these steps:

1. compress example WAV file using ANMF

2. decompress ANMF back to a WAV file

3. measure ODG between old and new WAV file

In the case of MP3 or Opus files, they will be decoded to WAV for comparison. These scripts were run on an x64 machine with an Intel Core i5-4670K @ 4.2 GHz and 16 GB of DDR3 RAM.


7.3.1 Audio examples

There is a total of four example audio files in WAV format that will be used for testing; please refer to Table 7.2 for a list. All of the examples are ∼5-10 second excerpts from audio files using a 44.1 kHz sampling rate and 16-bit signed integer samples. These files can be found in the examples folder of the implementation.

Table 7.2: List of tested audio files

ID   File name        Description
00   piano16t.wav     Clear sounding piano sounds
01   henry16t.wav     Average quality English voice
02   swave16t.wav     Simple electronic music
03   taleena16t.wav   Complex music including lyrics

7.3.2 Bitrate

The bitrate is the amount of bits that a computer needs to process per unit of time; in this context, it means how many bits are needed to play back 1 second of an audio file. To find the bitrate of a WAV file with a sampling rate Fs, we can use the following equation:

bitrate = channelCount · Fs · bitsP erSample (7.1)

So in regards to compressing the example files, our target bitrate is at most 2 · 44100 · 16 = 1411200 bits per second (or ∼ 1411 kbps). To calculate the number of elements in an NMF-approximated matrix V of size m × n with approximation rank r, we can use the following equation:

size(NMF (V )) = r(m + n) (7.2)

Proof: NMF approximates a matrix m × n into two matrices m × r and r × n, therefore the total size is mr + rn = r(m + n).

7.3.3 ANMF-RAW

For ANMF-RAW, we need to find a compromise between the size of one chunk and the rank of the NMF. First, we'll fix the shape to 1152 × 200 and experiment with the rank. The maximum number of iterations has been fixed to 3000, since past that point the cost function is almost stable, as seen in Figure 7.1.


Figure 7.1: The value of the cost function per iteration during NMF in ANMF- RAW at rank = 30.

In Table 7.3, you can see the results of changing the rank of the NMF approximation. The rows correspond to the example IDs from Table 7.2, and the columns correspond to the ODG for different rank values.

Table 7.3: ANMF-RAW rank experiment results

ID   r = 30    r = 40    r = 45    r = 60    r = 75
00   −3.873    −3.830    −3.811    −3.749    −3.650
01   −3.908    −3.902    −3.895    −3.861    −3.844
02   −3.868    −3.850    −3.837    −3.800    −3.759
03   −3.744    −3.519    −3.361    −2.825    −2.461

It’s also worth mentioning that the runtime necessary for compression scales with the rank, as seen in Table 7.4.

Table 7.4: ANMF-RAW average runtime in seconds

ID   r = 30     r = 40     r = 45     r = 60     r = 75
00   63.378s    65.913s    67.082s    71.684s    75.839s

Interestingly enough, sample ID 03 shows the largest improvement despite being the most complex of the four, but when listening to the decompressed audio, it still sounds quite distorted. Another looming issue is that at that rank, the size of the compressed file approaches the original filesize, so we will need to lower the rank and try a lower chunk size instead. To calculate the bitrate of ANMF-RAW, we must take into account the chunk size and the rank of the NMF approximation and then use Equation 7.2. There is some overhead for the metadata, but as it spans only a few bytes, it will be disregarded. The bitrate of ANMF-RAW with a chunk size of m × n and rank r can be calculated as:

bitrate^RAW = r(m + n) · (Fs / (mn)) · bitsPerValue · channelCount    (7.3)

Therefore, for a chunk 1152 × 200 with rank 75, we get:

bitrate^RAW = 75 · (1152 + 200) · (44100 / (1152 · 200)) · 32 · 2 = 1242150    (7.4)

Our target is 1411 kbps and our compression at 1242 kbps is judged as "annoying" by GstPEAQ. We will try lowering both the chunk size and the rank to see if it has an effect. It was mentioned in Section 4.1.1.1 that MP3 can compress music by up to a factor of 12 without noticeable loss in quality. Here, with this method, we are going to aim for at least around half that. We take a rank of 40 and a target bitrate of 750 kbps. We pick e.g. 600 for m and solve the equation above, obtaining n ≈ 200. However, when trying out these new parameters, the ODG barely changed from the previous shape, and further testing of this method was thus stopped due to unsatisfactory results.

7.3.4 ANMF-MDCT

For MDCT, we fix the frame size to 1152 for a good balance between frequency resolution and storage necessary. As seen in Figure 7.2, the approximation converges in the same manner as ANMF-RAW (using a chunk size of 200 and a rank of 40), but the values are much smaller. We will go with 3000 iterations as before, though, to get more accurate results. Learning a lesson from the ANMF-RAW experiments, it's worth figuring out whether ANMF-MDCT is even practical at all. To do that, we need to be able to calculate the bitrate of ANMF-MDCT first. Assuming rank r, sampling rate Fs, frame size Sf and chunk size Sc, our bitrate is:

bitrate^MDCT = r(Sf/2 + Sc) · (Fs / (½ · Sf · Sc)) · bitsPerValue · channelCount    (7.5)


Figure 7.2: The value of the cost function per iteration during NMF in ANMF- MDCT at rank = 40.

And as before, our initial goal is a target bitrate of 750 kbps. To achieve that, we either use a higher rank and higher chunk size, or vice versa. First, I tried to fix the rank to a relatively high value, specifically r = 60. If we fill the known values into the equation and solve for Sc, we obtain a chunk size of Sc ≈ 371. Similarly, if we go with a small chunk size, e.g. Sc = 100, we obtain a rank of r ≈ 23. However, as we can see in Table 7.5, the ODG values are less than usable (and it sounds terrible to my ears as well). This seems to be mostly due to the fact that the phase data contained in the MDCT coefficients is very sensitive to inaccuracies and throws the signal off. Out of curiosity, I tried increasing the frame size as well, but this only led to the signal being unrecognizable and there doesn’t seem to be a good reason to do it in the first place. Due to the fact that ANMF-MDCT did not produce usable results at a comparatively high bitrate, it will not be tested further.

Table 7.5: ANMF-MDCT rank experiment results

ID   r = 23, Sc = 100    r = 60, Sc = 371
00   −3.910              −3.911
01   −3.913              −3.913
02   −3.909              −3.910
03   −3.878              −3.877

7.3.5 ANMF-STFT

ANMF-STFT seemed to produce the best results during preliminary testing when I was implementing the codec, so given that and the large number of parameters, this method will be tested thoroughly. For baseline parameters that seemed to work well, we choose frame size Sf = 1152, chunk size Sc = 500, µW = 10⁴ and µH = 10⁵. In Figure 7.3 we notice NMF converges a lot sooner than in the previous methods, so we will set the maximum number of iterations to only 1000.

Figure 7.3: The value of the cost function per iteration during NMF in ANMF- STFT at rank = 40. Logarithmic scale is used for better visibility.

Calculating the bitrate here is going to be a bit different. Since we use Huffman coding, the bitrate will vary, as each value uses a different amount of bits, but we can still estimate it. To estimate the "average" bitrate, we will take the Huffman table and interpret it as a random variable, specifically XP for the phase spectrogram and XH for the coefficient matrix of the magnitude spectrogram. As its output, we'll use the amount of bits necessary to encode a value. For the probabilities of these values, we can use the data we obtained before, as seen in Figures 5.5 and 5.6. We then find the expected values E[XP] and E[XH] for each of them. The expected value E[X] of a discrete random variable X is, simply put, an average of the possible values weighted by their respective probabilities. Formally, it is defined as [58]:

E[X] = Σᵢ xᵢpᵢ    (7.6)

By applying this model to the Huffman tables, we find that E[XP] ≈ 3.072 bits and E[XH] ≈ 2.952 bits. We can then proceed with the bitrate calculation. To calculate the bitrate of ANMF-STFT, it will be easiest to calculate the bits necessary for the phase matrix and the coefficient matrix separately, and then add them together. The equations are as follows. For the phase matrix:

bitrate^PHASE = E[XP] · (Sf/2) · (Fs / (½ · Sf))    (7.7)
              = E[XP] · Fs    (7.8)

For the magnitude matrix:

bits^W = r · Sc · 32    (7.9)

bits^H = r · (Sf/2) · E[XH]    (7.10)

bitrate^MAG = (bits^W + bits^H) · (Fs / (½ · Sf · Sc))    (7.11)
            = (r · Sc · 32 + r · (Sf/2) · E[XH]) · (Fs / (½ · Sf · Sc))    (7.12)
            = r(32 · Sc + (Sf/2) · E[XH]) · (Fs / (½ · Sf · Sc))    (7.13)

And finally:

bitrate^STFT = (bitrate^PHASE + bitrate^MAG) · channelCount    (7.14)
             = [E[XP] · Fs + r(32 · Sc + (Sf/2) · E[XH]) · (Fs / (½ · Sf · Sc))] · channelCount    (7.15)
             = (E[XH] · r / Sc + 64 · r / Sf + E[XP]) · Fs · channelCount    (7.16)
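As a sanity check, Equation 7.16 is easy to evaluate directly; the snippet below reproduces the ∼487 kbps figure quoted next, assuming the rank r = 40 used in Figure 7.3 for the baseline configuration (the rank is not stated explicitly among the baseline parameters).

def stft_bitrate(r, S_c, S_f=1152, E_XP=3.072, E_XH=2.952, F_s=44100, channels=2):
    # Equation 7.16, result in bits per second
    return (E_XH * r / S_c + 64 * r / S_f + E_XP) * F_s * channels

print(stft_bitrate(r=40, S_c=500))   # ~487780 bits/s, i.e. ~487 kbps (baseline)
print(stft_bitrate(r=50, S_c=250))   # 568023.68, matching Equation 7.17 below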


Substituting the variables for our reference values, we end up with a bitrate of ∼487 kbps, which is lower than the previous methods, and at the same time, subjectively speaking, it actually sounds fairly promising. However, upon closer inspection of the equations, an alarming realization sets in. If we calculate the bitrate of the phase spectrogram, we find out it equals 3.072 · 44100 = 135475.2. This means that the phase spectrogram alone is taking up roughly 135 kbps (per channel!), which is already more than the average MP3 file, and we are already compressing it with a total of only 3 bits per value. We run into the issue highlighted in [39], namely that there doesn't seem to be a good way to compress only the phase spectrogram. As mentioned earlier, applying NMF to the phase spectrogram does not work and produces barely recognizable sounds. Regardless, I'm at least going to experiment with what I still can. The next step is to stabilize this bitrate while experimenting with the variables and measuring the quality. Once we find the most influential factors and end up with satisfactory audio quality, we can try lowering the bitrate further to try and match existing codecs.

Table 7.6: ANMF-STFT rank experiment results

ID   r = 30    r = 40    r = 50    r = 60    r = 70
00   −3.529    −3.217    −3.041    −2.838    −2.768
01   −2.140    −1.884    −1.714    −1.658    −1.499
02   −3.816    −3.758    −3.696    −3.640    −3.576
03   −2.491    −1.282    −1.081    −1.033    −0.806

Looking at Table 7.6, we can notice that increasing the rank improves the ODG. However, we can't increase the rank ad infinitum, since any increase means two things:

1. the required bitrate increases

2. the required number of NMF iterations increases

In short, the higher the rank, the more space it will take, while also drastically increasing the needed computation time. Using Equation 7.13 and plotting the function (Figure 7.4), we find that the bitrate grows linearly with the rank. At rank 60, we are already at a large value of ∼162 kbps. Next up, I will experiment with the chunk size, keeping the other parameters at their reference values. In Table 7.7, we can see that the smaller the chunk size, the better the quality. This makes sense, since if the chunk is smaller, the input matrix to NMF is also smaller, which makes it easier to converge, but at the same time increases the bitrate. Again, by using Equation 7.13 and plotting it for Sc (Figure 7.5), we can notice a certain trend.

61 7. Evaluation

Figure 7.4: Bitrate (kbps) per channel in relation to NMF rank for Sc = 500.

Table 7.7: ANMF-STFT chunk size experiment results

ID   Sc = 100    Sc = 250    Sc = 400    Sc = 550    Sc = 700
00   −2.598      −3.170      −3.211      −3.272      −3.291
01   −1.149      −1.612      −1.892      −1.925      −1.877
02   −3.495      −3.685      −3.769      −3.758      −3.772
03   −1.303      −1.322      −1.415      −1.681      −2.163

Unlike the rank, the chunk size changes the bitrate non-linearly - it grows sharply as the chunk size shrinks - so while lower is better, we need to be careful. Between the experiments with rank and chunk size, I have decided to go for a compromise. To strike a balance between quality and bitrate, experiments from here on will use rank r = 50 and chunk size Sc = 250. Finally, we move on to the last two parameters: µW and µH. For results, please refer to Tables 7.8 and 7.9. µW and µH don't really seem to have too much effect, but it appears that the reference values µW = 10⁴ and µH = 10⁵ were pretty accurate in terms of what seems to give the best results. One thing of note is that µH has a rather large jump for audio example ID 03 (electronic music). I believe this is due to the fact that it contains sounds of similar frequencies, which doesn't play along well with µ-law companding for lower values of µ.


Figure 7.5: Bitrate (kbps) per channel in relation to chunk size for r = 40.

Table 7.8: ANMF-STFT µW experiment results

ID   µW = 10²    µW = 10³    µW = 10⁴    µW = 10⁵    µW = 10⁶
00   −2.932      −2.828      −2.862      −2.872      −2.821
01   −1.524      −1.483      −1.538      −1.577      −1.583
02   −3.618      −3.594      −3.532      −3.591      −3.536
03   −1.057      −1.113      −1.039      −0.995      −1.014

In conclusion, our final values are frame size Sf = 1152, chunk size Sc = 250, 1000 NMF iterations with an approximation rank r = 50, and for µ-law companding we have µW = 10⁴ and µH = 10⁵. If we plug these values into Equation 7.16, we obtain the following bitrate:

bitrate^STFT = (2.952 · 50 / 250 + 64 · 50 / 1152 + 3.072) · 44100 · 2 = 568023.68    (7.17)

7.3.6 Comparison

Due to the low quality of ANMF-RAW and ANMF-MDCT, they will be omitted from the comparison, as it is not deemed necessary.


Table 7.9: ANMF-STFT µH experiment results

ID   µH = 10²    µH = 10³    µH = 10⁴    µH = 10⁵    µH = 10⁶
00   −2.946      −2.863      −2.833      −2.813      −2.883
01   −2.730      −2.279      −1.573      −1.474      −1.545
02   −3.405      −3.617      −3.610      −3.566      −3.591
03   −0.864      −0.950      −1.077      −0.840      −1.161

Our best lossy audio compression method, ANMF-STFT, has reached a bitrate of ∼568 kbps, which is about four times the bitrate of the average MP3 file, and while the quality is "good enough", there are noticeable issues with it. Furthermore, experiments have shown that this form of compression does not seem to particularly excel at any specific kind of audio, though from my personal listening it was usable for speech (ID 01) and clear piano sounds (ID 00). While the ODG impairment for music (ID 03) was not too high, the difference was noticeable enough to convince a user to turn to another format. For comparison, we will use MP3 with a target bitrate of 320 kbps, and for Opus, a target bitrate of 256 kbps. For both of these, those bitrates are more or less the highest practically used values, and they were picked to try and somewhat match the bitrate of ANMF-STFT. To encode and decode the WAV files into and from the respective formats, certain tools were used. For MP3, I used LAME [59], and for Opus, I used opus-tools [60]. Afterwards, the ODG was measured using GstPEAQ. Please refer to Table 7.10 for the results.

Table 7.10: ANMF-STFT versus MP3/Opus

ID   320 kbps MP3   256 kbps Opus
00   −0.025         −0.088
01   −0.052          0
02   −0.044          0
03    0              0

As you can see, there is no contest. Both of the reference codecs perform much better than ANMF-STFT at a far lower bitrate.

Chapter 8

Conclusion

The point of this thesis was to design and implement an audio codec with the ultimate goal of compressing audio. While this goal was technically met, the result leaves much to be desired. In total, three different methods were designed and implemented:

ANMF-RAW compresses audio using NMF in the time domain directly

ANMF-MDCT compresses audio using NMF in the frequency domain via modified discrete cosine transform

ANMF-STFT compresses audio using NMF in the frequency domain via short-time Fourier transform

Out of these three methods, only ANMF-STFT ended up being usable in practice. However, similarly to the research it was inspired by, I ended up with a large portion of the data in the form of an STFT phase matrix that wasn't feasibly compressible given current knowledge of this particular topic. NMF is ultimately only a small part of the compression, and improving the NMF methodology won't reduce the filesize by much. In the case of both ANMF-RAW and ANMF-MDCT, the data is far too fragile to be approximated without carefully crafted constraints, in which case a proper quantization (similar to MP3 and Opus) will yield far more reliable results than an approximation, i.e. NMF. The main output of this work is therefore a command line tool capable of compressing arbitrary audio in WAV format to a third of its size, but it is still no match for contemporary codecs such as MP3 or Opus, which are capable of reducing the bitrate by a factor of up to twelve, with far less noticeable quality loss. In its current form, it is not applicable to real-time use, but if the file structure were altered somewhat, it would definitely be possible. It would require the data from each channel to be interleaved similar to a WAV file, so that it can be read sequentially as a stream of frames. Decompression is not a problem, as that boils down to simple matrix multiplication for the most part, which is not an expensive process given today's technology. One thing that wasn't explored in depth in terms of implementation was the application of psychoacoustics, to further trim "unnecessary" sounds contained in the signal. While this is most likely feasible and would save some bytes, there is already a noticeable quality loss even when trying to preserve as much information as possible, and as such, the application of psychoacoustics would not make this codec any more competitive with its counterparts, which are able to compress audio without any apparent distortion. Considering the small number of works exploring this particular topic, I believe this thesis will serve as a decent reference for when somebody else stumbles across the same problem. It's also worth mentioning that this might be the first public implementation for this specific problem, and as such it might help with the understanding of the methodology behind this process and possibly lead to improvements in the future.

Bibliography

[1] You, Y. Audio Coding: Theory and Applications. Springer US, 2010.

[2] Bosi, M.; Goldberg, R. E. Introduction to digital audio coding and stan- dards. Kluwer Academic Publishers, 2003.

[3] Marks, R. J., II. Introduction to Shannon Sampling and Interpolation Theory. 01 1991, ISBN 0387973915, doi:10.1007/978-1-4613-9708-3.

[4] Gersho, A. Quantization. IEEE Communications Society Magazine, volume 15, no. 5, Sep. 1977: pp. 16–16, ISSN 0148-9615, doi:10.1109/MCOM.1977.1089500.

[5] Brokish, C. W.; Lewis, M. A-Law and mu-Law Companding Implementations Using the TMS320C54x. Dec 1997. Available from: http://www.ti.com/lit/an/spra163a/spra163a.pdf

[6] Waveform Coding Techniques. February 2006. Available from: https://www.cisco.com/c/en/us/support/docs/voice/h323/8123-waveform-coding.html

[7] Raissi, R. The Theory Behind Mp3. 2002. Available from: http://mp3-tech.org/programmer/docs/mp3_theory.pdf

[8] Malvar, H. S. Lapped transforms for efficient transform/subband coding. IEEE Transactions on Acoustics, Speech, and Signal Processing, volume 38, no. 6, June 1990: pp. 969–978, ISSN 0096-3518, doi:10.1109/29.56057.

[9] Shatkay, H. The Fourier Transform - A Primer. Technical report, Providence, RI, USA, 1995.

[10] Recoskie, D.; Mann, R. Constrained Nonnegative Matrix Factorization with Applications to Music Transcription. University of Waterloo Technical Report CS-2014-27, Dec 2014. Available from: https://cs.uwaterloo.ca/sites/ca.computer-science/files/uploads/files/cs-2014-27.pdf

[11] Heinzel, G.; Rüdiger, A.; et al. Spectrum and spectral density estimation by the Discrete Fourier transform (DFT), including a comprehensive list of window functions and some new flat-top windows. Max Planck Inst, volume 12, 01 2002.

[12] Selesnick, I. W. Short-Time Fourier Transform and Its Inverse. Apr 2009. Available from: http://eeweb.poly.edu/iselesni/EL713/STFT/stft_inverse.pdf

[13] Wang, Y.; Vilermo, M. Modified discrete cosine transform - Its implications for audio coding and error concealment. Advances in Engineering Software - AES, volume 51, 01 2012.

[14] Malvar, H. S. Signal Processing with Lapped Transforms. Norwood, MA, USA: Artech House, Inc., 1992, ISBN 0890064679.

[15] Ahmed, N.; Natarajan, T.; et al. Discrete Cosine Transform. IEEE Transactions on Computers, volume C-23, no. 1, Jan 1974: pp. 90–93, ISSN 0018-9340, doi:10.1109/T-C.1974.223784.

[16] Babu, S. P. K.; Subramanian, K. Fast and Efficient Computation of MDCT/IMDCT Algorithms for MP3 Applications. 2013.

[17] Princen, J.; Bradley, A. Analysis/Synthesis filter bank design based on time domain aliasing cancellation. Acoustics, Speech and Signal Processing, IEEE Transactions on, volume 34, 11 1986: pp. 1153–1161, doi:10.1109/TASSP.1986.1164954.

[18] Princen, J.; Johnson, A.; et al. Subband/transform coding using filter bank designs based on time domain aliasing cancellation. 05 1987, pp. 2161–2164, doi:10.1109/ICASSP.1987.1169405.

[19] Olson, H. Music, Physics and Engineering. Dover Books, Dover Publications, 1967, ISBN 9780486217697. Available from: https://books.google.cz/books?id=RUDTFBbb7jAC

[20] Rosen, S.; Howell, P. Signals and Systems for Speech and Hearing. Acoustical Society of America Journal, volume 94, 12 1993: p. 163, doi:10.1121/1.407176.

[21] "smacdon". Critical Bands in Human Hearing. Oct 2018. Available from: https://community.plm.automation.siemens.com/t5/Testing-Knowledge-Base/Critical-Bands-in-Human-Hearing/ta-p/416798


[22] Fletcher, H. Auditory Patterns. Rev. Mod. Phys., volume 12, Jan 1940: pp. 47–65, doi:10.1103/RevModPhys.12.47. Available from: https://link.aps.org/doi/10.1103/RevModPhys.12.47

[23] Gelfand, S. Hearing: An Introduction to Psychological and Physiological Acoustics. 01 1990, 187 pp., doi:10.1201/b14858.

[24] Fastl, H.; Zwicker, E. Psychoacoustics: Facts and Models. Berlin, Heidelberg: Springer-Verlag, 2006, ISBN 3540231595.

[25] Hermes, D. J. The auditory filter. Available from: http://home.ieis.tue.nl/dhermes/lectures/soundperception/04AuditoryFilter.html

[26] Behar, A. Intensity and sound pressure level. Journal of The Acoustical Society of America - J ACOUST SOC AMER, volume 76, 08 1984, doi:10.1121/1.391117.

[27] Kuwano, S.; Namba, S.; et al. Advantages and Disadvantages of A-weighted Sound Pressure Level in Relation to Subjective Impression of Environmental Noise. Noise Control Engineering Journal - NOISE CONTR ENG J, volume 33, 11 1989, doi:10.3397/1.2827748.

[28] Gillis, N. The Why and How of Nonnegative Matrix Factorization. Regularization, Optimization, Kernels, and Support Vector Machines, volume 12, 01 2014.

[29] Wang, Y.-X.; Zhang, Y.-J. Nonnegative Matrix Factorization: A Comprehensive Review. IEEE Transactions on Knowledge and Data Engineering, volume 25, no. 6, 2013: pp. 1336–1353, doi:10.1109/tkde.2012.51.

[30] Lee, D.; Seung, H. Algorithms for Non-negative Matrix Factorization. Adv. Neural Inform. Process. Syst., volume 13, 02 2001.

[31] Lee, D.; Seung, H. S. Learning the Parts of Objects by Non-Negative Matrix Factorization. Nature, volume 401, 11 1999: pp. 788–791, doi:10.1038/44565.

[32] Li, S.; Hou, X.; et al. Learning Spatially Localized, Parts-Based Representation. 01 2001, pp. 207–212, doi:10.1109/CVPR.2001.990477.

[33] Paatero, P. Least Squares Formulation of Robust Non-Negative Factor Analysis. Chemometrics and Intelligent Laboratory Systems, volume 37, 05 1997: pp. 23–35, doi:10.1016/S0169-7439(96)00044-5.

[34] Naik, G. Non-negative Matrix Factorization Techniques: Advances in Theory and Applications. 09 2015, ISBN 978-3-662-48331-2, doi:10.1007/978-3-662-48331-2.


[35] Févotte, C.; Vincent, E.; et al. Single-channel audio source separation with NMF: divergences, constraints and algorithms. 01 2017.

[36] Wilburn, T. The AudioFile: Understanding MP3 compression. Oct 2007. Available from: https://arstechnica.com/features/2007/10/the-audiofile-understanding-mp3-compression/2/

[37] Valin, J.-M.; et al. Definition of the Opus Audio Codec. RFC 6716, RFC Editor, September 2012. Available from: http://www.rfc-editor.org/rfc/rfc6716.txt

[38] Valin, J.-M.; Maxwell, G.; et al. High-Quality, Low-Delay Music Coding in the Opus Codec. 135th Audio Engineering Society Convention 2013, 01 2013: pp. 73–82.

[39] Nikunen, J.; Virtanen, T. Object-based audio coding using non-negative matrix factorization for the spectrogram representation. 05 2010.

[40] Nikunen, J.; Virtanen, T. Noise-to-mask ratio minimization by weighted non-negative matrix factorization. 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, 2010: pp. 25–28.

[41] ITU-R. Method for objective measurements of perceived audio quality. Recommendation ITU-R BS.1387, 01 2006.

[42] Huffman, D. A. A Method for the Construction of Minimum-Redundancy Codes. Proceedings of the IRE, volume 40, no. 9, Sep. 1952: pp. 1098–1101, ISSN 0096-8390, doi:10.1109/JRPROC.1952.273898.

[43] Sapp, C. S. WAVE PCM soundfile format. Available from: http://soundfile.sapp.org/doc/WaveFormat/

[44] The Python Language Reference. Available from: https://docs.python.org/3/reference/

[45] Pallets team. Click. 2014. Available from: https://palletsprojects.com/p/click

[46] Krekel, H.; Oliveira, B.; et al. pytest 4.1.1. 2004. Available from: https://github.com/pytest-dev/pytest

[47] Lippens, S. dahuffman. 2017. Available from: https://github.com/soxofaan/dahuffman

[48] Jones, E.; Oliphant, T.; et al. SciPy: Open source scientific tools for Python. 2001–, [Online; accessed 2019/04/14]. Available from: http://www.scipy.org/


[49] Oliphant, T. NumPy: A guide to NumPy. USA: Trelgol Publishing, 2006–, [Online; accessed 2019/04/14]. Available from: http://www.numpy.org/

[50] Hunter, J. D. Matplotlib: A 2D graphics environment. Computing In Science & Engineering, volume 9, no. 3, 2007: pp. 90–95, doi:10.1109/MCSE.2007.55.

[51] Makhoul, J. A fast cosine transform in one and two dimensions. IEEE Transactions on Acoustics, Speech, and Signal Processing, volume 28, no. 1, February 1980: pp. 27–34, ISSN 0096-3518, doi:10.1109/TASSP.1980.1163351.

[52] Oppenheim, A. V.; Schafer, R. W. Discrete-Time Signal Processing. Upper Saddle River, NJ, USA: Prentice Hall Press, third edition, 2009, ISBN 0131988425, 9780131988422.

[53] Griffin, D.; Lim, J. Signal estimation from modified short-time Fourier transform. IEEE Transactions on Acoustics, Speech, and Signal Processing, volume 32, no. 2, April 1984: pp. 236–243, ISSN 0096-3518, doi:10.1109/TASSP.1984.1164317.

[54] Borß, C.; Martin, R. On the construction of window functions with constant-overlap-add constraint for arbitrary window shifts. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), March 2012, ISSN 2379-190X, pp. 337–340, doi:10.1109/ICASSP.2012.6287885.

[55] Holters, M.; Zölzer, U. GstPEAQ – An Open Source Implementation of the PEAQ Algorithm. 12 2015.

[56] Taymans, W.; Baker, S.; et al. GStreamer 1.8.3 Application Development Manual. United Kingdom: Samurai Media Limited, 2016, ISBN 9789888406654, 9888406655.

[57] Holters, M. GstPEAQ. 2015. Available from: https://github.com/HSU-ANT/gstpeaq

[58] Ross, S. M. Introduction to Probability Models. 1972.

[59] About LAME. Available from: http://lame.sourceforge.net/about.php

[60] Opus downloads. Available from: http://opus-codec.org/downloads/


Appendix A

List of Symbols

t        symbol representing a time value in seconds
τ        symbol representing a "slow" time, a time index with a lower resolution than t
ξ        symbol representing a frequency value in hertz
Fs       symbol representing a sampling rate of an audio signal
z        symbol representing a complex number
φ(z)     symbol representing the phase of a complex number
|z|      symbol representing the magnitude of a complex number
Q(x)     function representing a uniform quantizer
∆        symbol representing the step size of a uniform quantizer
F(x)     function representing µ-law compression
x(t)     function representing the amplitude of a continuous signal at a time t
xn       sequence representing the amplitude of a discrete signal indexed by n
w(t)     continuous windowing function at a time t
wn       discrete windowing function indexed by n
X(ξ)     function representing the frequency component of a signal for a frequency ξ
S(ξ)     the Fourier transform of a continuous signal
Sk       the discrete Fourier transform of a discrete signal
S(τ, ξ)  the short-time Fourier transform of a continuous signal
Sk,ξ     the discrete short-time Fourier transform of a discrete signal
Mk       the modified discrete cosine transform of a discrete signal
r        rank of an NMF approximation
Sf       frame size
Sc       chunk size


Appendix B

Abbreviations

NMF non-negative matrix factorization

DFT discrete Fourier transform

STFT short-time Fourier transform

FFT fast Fourier transform

DCT discrete cosine transform

MDCT modified discrete cosine transform


Appendix C

Contents of enclosed CD

README.txt ........ the file with CD contents description
src ............... the directory of source codes
    audionmf ...... implementation sources
    thesis ........ the directory of LATEX source codes of the thesis
text .............. the thesis text directory
    thesis.pdf .... the thesis text in PDF format
