Bachelor Thesis

BACHELOR THESIS

Perceived Audio Quality in Ogg Vorbis and AAC in Compressed Popular Music in Bit Rates between 96 kbit/s, 128 kbit/s and 192 kbit/s

Erica Engborg
2016

Bachelor of Arts, Audio Engineering
Supervisor: Jonas Ekeroot
Luleå University of Technology
Department of Arts, Communication and Education

Abstract

Audio quality in music has proved to be important to a broad audience. Studies show that 67.1 % of young listeners (Olive, 2011) and 72 % of experienced listeners (Pras et al., 2009) can distinguish lossy coded audio from lossless coded audio. Ogg Vorbis and AAC are two common lossy codecs in music streaming services. Factors that affect the perceived audio quality are genre, bit rate, codec and instrument. An ABX test was performed with 20 trained listeners. The excerpts tested were three popular songs from 2015 and 2016, coded in AAC and Ogg Vorbis at 96 kbit/s, 128 kbit/s and 192 kbit/s. The results showed that the subjects could only hear a difference between uncoded and lossy coded audio for one excerpt, coded in AAC at 96 kbit/s. This result does not exactly match those of earlier studies, where subjects could hear the difference at higher bit rates, but there are still some similarities. This study is too small to support any firm conclusions, and more research has to be done on these codecs. The music streaming services that use these codecs already use higher bit rates than those tested here, so the quality is already good enough.

Table of contents

Abstract
1. Introduction
1.1 Background
1.1.1 AAC
1.1.2 Ogg Vorbis
1.2 Previous research
1.2.1 Comparison of Vorbis and other codecs in terms of quality
1.2.2 Comparison of Vorbis, AAC and other codecs in terms of quality
1.2.3 Different sound qualities in different genres
1.2.4 Conclusions from earlier research
1.3 Purpose
2. Method
2.1 Stimuli
2.1.1 Ogg Vorbis encoding and decoding
2.1.2 AAC encoding and decoding
2.2 Listening environment
2.3 Test subjects
2.4 Test design and test instructions
3. Results
3.1 Statistical analysis
3.2 Results analysis
4. Discussion
5. References
Appendix 1 – Spectrograms
Appendix 2 – Technical information about the coded songs

1. Introduction

The way we listen to and consume music has changed over the last ten years. The compact disc (CD) introduced 44.1 kHz/16-bit audio to the consumer market. The accessibility of applications on the internet that allow the consumer to buy digital music instead of physical CDs has led many people to use the internet to download and stream music. Ten years ago limited bandwidth was a problem, and audio files with a resolution of 44.1 kHz in sample rate and 16 bits led to slow streaming and downloading. To make music more accessible, it was often compressed by an audio codec to reduce the file size. In order to reduce the file size, bits were thrown away. That kind of bit reduction can lead to audible artifacts. Bit reduction in music is still widely used on the internet in music streaming services, audio in web TV and DAB radio. The lossy compression of the audio can lead to artifacts and quality degradation depending on the compression ratio. The higher the compression ratio, the more the quality degrades. At the expense of audio quality, this leads to smaller audio files and faster transmission.

Olive (2011) summarized three earlier studies that “provide some evidence that listeners generally prefer CD-quality to MP3 formats, particularly at lower bit rates (< 92-128 kbit/s) where the MP3 artifacts become audible”. Olive’s “study was partially motivated by the current popular myth – reported widely in the media - that young consumers are either indifferent to the quality of reproduced sound, or they actually prefer bad sound to good sound”. By good sound he meant CD quality, and by bad sound he meant lossy compressed audio. The test result showed that people less than 20 years old prefer CD quality to MP3 quality. Similar results have been found in a comparable test with experienced listeners (Olive, 2011).

In Olive’s (2011) test, 67.1 % of inexperienced teenagers preferred the higher quality, as did 72 % of the experienced listeners in the test by Pras et al. (2009). It is therefore important to maintain the quality and to ensure that lossy compressed audio contains as few audible artifacts as possible, so that it resembles CD quality and keeps a preferred quality even after compression.

Two common codecs that are used for music streaming and music downloading are Ogg Vorbis and AAC. Ogg Vorbis is used in Spotify, a music streaming service with 20 million paying users (Karlsson, 2015). AAC is used in iTunes and Apple Music, which provide music downloading and streaming for nearly 800 million accounts (Arora, 2015).

1.1 Background

1.1.1 AAC

AAC is a proprietary codec, which means that the creators are obligated to guarantee a predictable result (Pan, 1995). Players that are intended to be compatible with AAC therefore need to pay for a license and conform to specifications that achieve the intended result. The AAC codec is the result of further development of the MPEG-1 Layer III (also known as MP3) coding, from which it uses many features. The problem with MPEG-1 Layer III is that it is designed to be backward compatible. That results in some compromises that could be improved in a new codec, such as the way bands are divided. To be backward compatible with MPEG-1 Layer II and MPEG-1 Layer I, the audio must be divided into 32 spectrally equally wide bands on a linear frequency scale. These bands are then divided into 6 or 18 sub bands depending on the amount of transients in each band. In the case of many short transients, more sub bands are needed to hide the bit reduction artifacts (Sayood, 2005).

Salomon and Motta (2010) describe the bit reduction procedure in the following way: the audio samples are grouped into 32 sub bands by polyphase filters. The audio is transformed from the time domain to the frequency domain by using the modified discrete cosine transform (MDCT). The SPL level of the audio for each sub band is calculated. Each band is processed separately to optimize the quality of the lossy compression. A combination of two psychoacoustic models and the number of available bits determines how much noise there will be in each band. The algorithm determines which psychoacoustic model is used for each frame. MPEG-1 Layer III mostly uses model 2, which is more computationally demanding (Salomon & Motta, 2010). Pan (1995) presents the two psychoacoustic models that are used in the MPEG Layer III and AAC codecs as:

• Model 1 identifies and separates the nontonal and the tonal components of the audio so they can be treated differently.
• Model 2 is more complex than model 1. It locates the tonal and the nontonal components in the audio. To locate these sounds the model uses data from the two previous windows to predict the next. The predictability probably makes the second model better at discriminating between tonal and nontonal audio.

The models take advantage of the human ear’s inability to hear sounds just before and after a transient. Transients have the ability to mask a sound before and after the attack. The masking is longer after a transient than before it. Bands that are inaudible to the human ear (that are below the audible SPL level) are removed. The removal of bits in the bands results in noise that is ideally masked because it lies below the audible SPL level. The noise can also be masked spectrally by a tone. If a tone is played, the threshold of audibility will increase for frequencies around this tone, and if another tone is played at lower amplitude, spectrally near the first tone, the second tone will be masked (Salomon & Motta, 2010).

The noise in perceptual coding is mainly quantization noise that arises when the audible bands are quantized. This noise can be shaped with different noise shaping techniques in the AAC codec to have less audible impact on the audio signal. Temporal noise shaping (TNS) shapes the noise around transients so that the masking effect will cover the noise. This technique improves the quality especially on voice signals (Salomon & Motta, 2010). Perceptual noise substitution (PNS) improves the bit reduction by determining the level of the noise and replacing it with a gain value, so that the noise can be added in the decoding. That can result in a lower bit rate with the same quality (Painter & Spanias, 2000).

1.1.2 Ogg Vorbis

The Vorbis codec is an open-source codec and does not charge users any license [...] variable bit rate (VBR). Montnémery and Sandvall (2004) say that VBR can be used to allow the number of bits to vary over time at a “fixed” bit rate. Each frame does not need to contain the same number of bits. With VBR, each frame contains the number of bits needed to avoid audible artifacts as well as possible within the available number of bits.
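The idea that frames share a bit budget unevenly under VBR can be illustrated with a minimal sketch. It is not how Vorbis or AAC actually allocate bits: the function name `allocate_vbr_bits`, the per-frame "demand" scores and the proportional rule are all assumptions made for illustration. The sketch only shows that the average bit rate can stay near a target while individual frames receive different numbers of bits.

```python
def allocate_vbr_bits(frame_demands, target_kbps, frame_ms):
    """Split a fixed bit budget across frames in proportion to demand.

    frame_demands: hypothetical non-negative scores, one per frame
                   (e.g. higher for frames with many transients).
    target_kbps:   average bit rate to aim for, in kbit/s.
    frame_ms:      frame duration in milliseconds.
    """
    n = len(frame_demands)
    if n == 0:
        return []
    # kbit/s * ms = bits, so the total budget stays in integer arithmetic.
    total_bits = target_kbps * frame_ms * n
    total_demand = sum(frame_demands)
    if total_demand == 0:
        # No frame is more demanding than another: split the budget evenly.
        return [total_bits // n] * n
    # Floor division keeps the sum at or below the budget.
    return [total_bits * d // total_demand for d in frame_demands]


# Four 20 ms frames at a 128 kbit/s target; the second frame is assumed
# to be three times as demanding as the others.
print(allocate_vbr_bits([1, 3, 1, 1], 128, 20))  # [1706, 5120, 1706, 1706]
```

In this example the demanding frame receives 5120 bits while the others receive 1706 each, and the total stays within the 10240 bits that a constant 128 kbit/s stream would spend on the same four frames.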
