Listening Test Conduction Handbook

Roberto Amorim, PITA
May 2005
Version 0.3 – December 2006

Summary

Acknowledgements
Introduction
The ABC/hr test methodology
Steps to conduct a successful listening test
    Test discussion
    Preparing the sample packages
    Creating the config files and finishing packaging
    Starting the test
    Preparing the results
    Making Plots
    Finishing at Last
Opening the Cans of Worms
High bitrate testing
VBR versus CBR
Dealing with ranked references
Hints
Annex 1: Model test readme

Acknowledgements

I have lots of people to thank for all the experience and knowledge I gathered while conducting listening tests from mid-2003 to mid-2004.

First and foremost, Mr. Darryl Miyaguchi (ff123). If it wasn’t for him, I wouldn’t have even started conducting tests. He was always there to provide support and guidance whenever a particularly tricky issue surfaced. He also helped me a lot fending off clueless critics and debating with clued-in ones. He’s also a very smart guy and a great friend.

Then, the people that helped me with valuable criticism and comments: Juha Laaksonheimo (JohnV), Gian-Carlo Pascutto (Garf), Francis Niechcial (Guruboolez), Darin Morrison (Dibrom), [proxima], tigre and so many others. They helped me make my tests much better and more trustworthy.

Also, the people that supported me with software they developed and with hosting of samples: Darryl Miyaguchi, Thomas Schumm (Phong), Benno K. (schnofler), Paul Harris (verloren), Menno Bakker, Darin Morrison and spoon.

Darryl Miyaguchi, Thomas Schumm and Jason Anthony were extremely helpful proof-reading this document.

Last but not least, the listening test participants. They contributed their time and hearing to make my tests meaningful and informative.

To all of you, my heartfelt thanks.
Curitiba, May 2005

Introduction

Blind tests were created as a method to test a product in situations where the tester’s psychology could influence the test results. Such a phenomenon is often called the “placebo effect”. The name comes from medication effectiveness tests (probably the field where blind tests are most widely used), and refers to test participants who display changes in their condition even though they were taking the placebo – usually tablets containing only sugar or flour. In that case, psychology alone is affecting their reaction to the placebo.

A similar phenomenon can happen when the test depends on the auditory system. Subjectivists (people who prefer to rely on subjective opinions) want to believe some piece of audio hardware sounds better than another just because it costs 20 times more (a claim quickly dismissed by objectivists as placebo or “snake oil” until objective proof is produced). The battle between subjectivists and objectivists is an endless one, and has seen some of its highest points at the rec.audio.opinion Usenet newsgroup. At times, the threads there become ablaze in vitriolic rage as holy wars between the two rival groups take place.

To counter the claims of the subjectivists, the objectivists created a method called ABX to reliably compare two audio signals. In this test, the person taking the test sits in front of a testing device connected to the signal sources A and B. The test consists of several trials, and in each one the signal from one of the sources is routed to the loudspeaker while the other signal is muted. Sources are picked randomly at each trial. If the participant can’t consistently differentiate source A from source B, the signals are considered to sound the same to that person, and any claim about superior quality in one device is considered to be a result of the placebo effect.

Besides countering placebo, this methodology is also effective against bias. For instance, it’s likely that a die-hard Free Software enthusiast will claim Vorbis is better than WMA, no matter the situation. In a blind environment, he won’t be able to let that prejudice influence his test results.

In the late nineties, the International Telecommunications Union (ITU) created a method to compare several audio sources at once, with attention to the relative quality among sources and not only the ability to detect differences among them. That method is called ABC/hr, and it is the core of audio codec listening tests.

The ABC/hr test methodology

The International Telecommunications Union created the ABC/hr (ABC/hidden reference) methodology as a solution to test their own telecommunication codecs easily and reliably while avoiding interferences related to placebo. It is officially called “Methods For The Subjective Assessment Of Small Impairments In Audio Systems Including Multichannel Sound Systems” and is referenced as document ITU-R BS.1116-1 (http://www.itu.int/rec/R-REC-BS.1116-1-199710-I/e).

Here is how a test is assembled:

Fig. 1: Example of a test going on in ABC/hr for Java

1. The test conductor prepares the samples beforehand by encoding them with the encoders being tested and decoding said samples back to .wav.
2. The samples and the uncompressed reference are loaded into the comparer program. Advanced ABC/hr comparers like ABC/hr for Java can automatically detect and correct offset and gain differences among samples.
3. The program randomizes the order of the samples to be compared, so that it’s pretty much impossible to know what sample is being tested at a given time. It then creates sliders for each sample. The sliders come in left-right pairs. One of them is the uncompressed reference and the other is the sample to be tested. The position of the reference and the sample is also randomized (a sketch of this randomization appears right after this list).
4. The test starts. By clicking the play button under a slider, the test participant listens to the stream represented by that slider. Clicking the “Ref” play button, he/she can listen to the reference in that test without being told which slider is actually the reference.
5. The bar at the bottom is used to select a range of audio data that the participant deems particularly interesting for that test. When a range is selected, the other parts of the sample aren’t played.
6. Once he/she has listened to the samples, the listener moves the sliders he/she believes to be the samples, attributing scores to the feature being tested (usually absolute quality). He/she can click the notepad on top of the moved slider to comment on that sample.
7. Having finished the test, the listener saves his/her results and reads them, or sends them to the test administrator in case they were encrypted to avoid tampering.
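To make the randomization in step 3 concrete, here is a minimal Python sketch of the kind of shuffling an ABC/hr comparer performs: the coded samples are presented in random order so the listener cannot infer which codec produced which slider, and for every slider pair the hidden reference is assigned to a random side. This is only an illustration of the idea, not the actual logic of ABC/hr for Java; the build_trials helper and the file names are made up for the example.

    import random

    def build_trials(reference, encoded_samples, seed=None):
        """Assemble ABC/hr trials: shuffle the presentation order of the coded
        samples and, for each left/right slider pair, randomly decide which
        side holds the hidden reference."""
        rng = random.Random(seed)
        order = list(encoded_samples)
        rng.shuffle(order)                 # listener cannot tell which codec is which
        trials = []
        for sample in order:
            pair = [("reference", reference), ("coded", sample)]
            rng.shuffle(pair)              # hide the reference on a random side
            trials.append({
                "left": pair[0],
                "right": pair[1],
                "coded_sample": sample,    # kept so the results can be decoded later
            })
        return trials

    # One trial per codec under test (file names are placeholders)
    trials = build_trials(
        "sample01_original.wav",
        ["sample01_codec_a.wav", "sample01_codec_b.wav", "sample01_codec_c.wav"],
    )
    for trial in trials:
        print(trial["left"][1], "vs", trial["right"][1])

Because the comparer records internally which side of each pair actually holds the reference, the test conductor can later check whether a participant scored the hidden reference below the maximum, which is exactly the safeguard discussed next.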
One of the biggest strengths of this test is the fact that the reference is hidden next to the sample. That way, clowns who took the test and just moved sliders randomly have a big chance of being caught, because they will end up ranking the hidden reference several times. Of course, not all people ranking references are necessarily trying to mess up your test, or deaf, as will be discussed later on.

The ABC/hr method can be used to test not only audio codecs, but also video codecs, audio hardware, and several other products. This document will focus on audio codec tests alone, since audio hardware tests are very difficult to set up properly and, while the software tools to set up an audio codec test are easily available, the same doesn’t apply to video codec testing.

So you got sold on the idea and want to conduct your very own public listening test, but have doubts about how to proceed? I’m here to help.

Steps to conduct a successful listening test

In this section, I’ll try to guide you through the steps of conducting a listening test, from the early discussion to the publication of the results.

First and foremost, find a place to discuss your test. You should be open to criticism and suggestions, otherwise people won’t be very interested in your test. Probably the most friendly place to hold such a discussion is the Hydrogenaudio forum (http://www.hydrogenaudio.org), with what is probably the largest population of objectivists on the Internet, and the occasional subjectivist who serves as laughing stock.
