
Filmuniversität Babelsberg KONRAD WOLF

Interactive design for perceptually motivated HRTF selection

Interaktives Design für wahrnehmungsmotivierte HRTF-Auswahl

Master's thesis by

Vensan Mazmanyan

November, 2017

Submitted by: Vensan Mazmanyan

Degree program: Master for picture

Matriculation number: 5353

Supervisor: Dipl.-Ing. Felix Fleischmann

Reviewer: Prof. Dr.-Ing. Klaus Hobohm

Appendix: USB stick - video examples

Acknowledgments

The following work was accomplished at the Fraunhofer Institute for Integrated Circuits (IIS) in Erlangen, Germany.

I am very grateful to Dipl.-Ing. Felix Fleischmann for engaging with the idea of my thesis and for his invaluable support. I would also like to especially thank Dipl.-Ing. Jan Plogsties for his help during the preliminary research prior to formulating the thesis and for his essential advice afterwards. Special thanks also to Dipl.-Tonmeister Ulli Scuda for all the opportunities and resources provided for the implementation of the test design.

My warmest thanks to all who have supported me with advice and feedback during my research and to all who have participated in the listening tests.

Vensan Mazmanyan

List of abbreviations

3D - Three-dimensional
6DoF - Six degrees of freedom
API - Application programming interface
AR - Augmented reality
ARI - Acoustics Research Institute
BEM - Boundary element method
BRIR - Binaural room impulse response
CIPIC - Center for Image Processing and Integrated Computing
DIRAC - short name of the Dirac impulse or Dirac function, named after the English theoretical physicist Paul Dirac
DFEQ - Diffuse-field equalization
DSP - Digital signal processing
EQ - Equalizer
FIR - Finite impulse response
GPU - Graphics processing unit
GUI - Graphical user interface
HMD - Head-mounted display (e.g. Oculus Rift™, HTC Vive™ etc.)
HPIR - Headphone impulse response
HRIR - Head-related impulse response
HRTF - Head-related transfer function
IIS - Institut für Integrierte Schaltungen
ILD - Interaural level difference
IR - Impulse response
IRCAM - Institut de Recherche et Coordination Acoustique/Musique
ISD - Interaural spectral differences
ITD - Interaural time difference
ITU-R - International Telecommunication Union, Radiocommunication Sector
LS - Loudspeaker
LTI - Linear time-invariant (system)
MLS - Maximum length sequence
MUSHRA - Multi-Stimulus Test with Hidden Reference and Anchor
PRTF - Pinna-related transfer function
QE - Quality element (in the physical domain) [Jekosch 2004]
QF - Quality feature (in the perceptual domain) [Jekosch 2004]
QoE - Quality of Experience [Jekosch 2004]
QoS - Quality of system or service [Jekosch 2004]
SOFA - Spatially Oriented Format for Acoustics
TF - Transfer function
VR - Virtual reality

Abstract

The development of immersive media has led to increased attention to the reproduction possibilities of spatial sound design. In the field of VR technology, binaural sound reproduction over headphones represents an important aspect of personalization. The creation of individual HRTFs remains difficult even today and is only possible under special conditions. Because well-matching HRTFs are decisive for the overall perceived quality of binaural content, different solutions are being investigated to approximate the individual HRTF or to adapt another HRTF. With the development of new exchange formats for HRTFs and their support by various newly developed binaural systems, it becomes possible to examine existing HRTF databases and to evaluate them perceptually.

This motivated the author to investigate the effects and interplay of concrete technical HRTF parameters and perceptual criteria. As part of the research, a database of individually measured HRTFs was created and analyzed in conjunction with other existing HRTF databases. Based on the results, listening tests were conducted to examine perceptual criteria and the responsible technical parameters. In the process, multivariate relations between different criteria were also identified. A procedure for perceptually motivated HRTF selection was proposed and tested in a VR implementation in Unity. The results showed that the listeners were able to listen critically within a VR environment and to express their preference based on isolated criteria and parameters. The statistical evaluation showed that the selected HRTFs brought no significant improvement over individual HRTFs, the HRTF of a dummy head, and a numerically averaged HRTF. However, a validation of the overall quality in the context of a complex reproduction was not possible due to the restriction of the rendering to a single dynamic object.

Contents

1 Introduction

2 Fundamentals
  2.1 Binaural hearing
  2.2 VR
  2.3 HRTF acquisition methods
  2.4 HRTF selection methods
    2.4.1 Seeber-Fastl method
    2.4.2 DOMISO method

3 Proposed HRTF selection method
  3.1 General considerations
  3.2 Proposed method overview
  3.3 BRIR measurements
    3.3.1 Measurement setup
    3.3.2 Postprocessing
    3.3.3 HRTF database preparation in SOFA format
  3.4 Selecting the most relevant perceptual criteria and isolation of the responsible technical parameters
    3.4.1 Perceptual criteria
    3.4.2 Possible relevant technical parameters
    3.4.3 Analysis of HRTF databases
    3.4.4 ITD analysis
    3.4.5 Main pinna notch analysis

4 Implementation of the VR-environment
  4.1 System overview
  4.2 VR-rendering
  4.3 Controller
  4.4 GUI

5 Preliminary listening tests
  5.1 Sound coloration and spectral dynamics
  5.2 Externalization, generic HRTF, DFEQ and main notch effect
  5.3 Horizontal localization
  5.4 Vertical localization
  5.5 Data analysis and interpretation

6 Summary of the selected QF-QE pairs. Selection design.

7 Selection and validation tests
  7.1 Database preparation for the selection procedure
  7.2 Selection test
  7.3 Validation test

8 Discussion and outlook

9 References

10 List of figures

11 Declaration of authorship / Eidesstattliche Erklärung

1 Introduction

The recent rise of consumer-oriented VR technologies and applications has drawn attention to personalized spatial audio rendering solutions. In order to enable a completely immersive experience for the end-user, not only an immersive 360° visual environment but also a plausible three-dimensional (3D) audio listening experience through headphones is needed. This represents a challenge in many aspects of content creation and rendering that need to be addressed [Rumsey 2016].

If we compare conventional multichannel audio reproduction over loudspeakers with spatial audio reproduction over headphones, there is one obvious difference that has to be pointed out. In contrast to loudspeaker reproduction, headphone reproduction utilizes a binaural perceptual model that exists as an additional layer in the rendering engine. This model, often referred to by the generalized term HRTF, contains the so-called binaural cues, which provide information about the acoustic effects caused by the human head, torso and outer ear (as well as the room acoustics, if present) when interacting with an acoustic wave front emitted by a sound source in three-dimensional space. This information is stored as a set of transfer functions describing those effects for a given source position; when they are applied to an incoming signal, the human cognitive system reacts to the sound similarly to how it would if the signal originated from a real acoustic source positioned in the natural surrounding space.

‘Authenticity, in this context, means that the subjects at the receiving end do not sense a difference between their actual auditory events and those which they would have had at the recording position when the recording was made.’ [Jens Blauert 1997]

Jens Blauert, 1997

Many researchers throughout the years have tested and analyzed the nature of the binaural cues of human spatial hearing, studying the principles that enable us to localize the direction of a sound, to evaluate its distance, to analyze the interaction between sound sources and the reflective environment they are placed in, etc. There are still many unanswered questions concerning the psychophysics of human hearing that need to be further investigated, but one thing has become clear so far: the perceived binaural effect can be significantly different for different individuals, which is due to a complex mix of many factors involving not only the purely physical qualities of the human body structure, but among others also personal auditive experience and education, health condition, the type of sound source, headphone qualities, rendering artifacts, signal parameters etc.

A highly regarded example within those studies is binaural rendering for a particular person by applying a perceptual model based on the actual anthropometric features of that same person, a perceptual model also known as the individual HRTF. Theoretically, this is the only way to achieve the most precise approximation of the individual perception under real, natural (without headphones) listening conditions.

To date, the process of acquiring a personal HRTF remains quite expensive and time-consuming, and it really could not be categorized as a user-friendly experience. This is why such procedures are normally conducted by rather large research institutions or universities. However, with the creation of publicly available databases of HRTFs measured from real persons, in conjunction with the recently developed HRTF exchange file format [Sridhar, Tylka, and Choueiri 2017] and its support by a growing number of consumer binaural engines, a new aspect of the binaural auditive experience has been brought up. It is now possible to load different HRTFs into a binaural renderer as part of a VR application and to experience the same given content with different perceptual models. As a result, the quality assessment of the binauralization is no longer limited only to criteria concerning the match or difference between the spatial perception of a binaural virtual sound source and the natural perception of a real source. The perceptual model can now also be discussed as a matter of personal preference with respect to the particular VR environment and application [B. Boren and Roginska 2011].

All the questions concerning this new 'aesthetic' or 'hedonistic' aspect of the perceptual model have motivated the current work, in which an effort has been made to provide some insight into the perceptual evaluation of spatial audio reproduction in the environment of consumer-based VR rendering. Questions like:

- How would a listener use an opportunity to exchange the provided HRTF for another one, based on perceptual criteria?
- Would they use it at all, or would they not notice any difference because of the overwhelming VR experience?
- If they would use it, what criteria would they use to decide?
- Would they try to approximate the perception they know from a real environment?
- Would they instead rely on something else that seems more appealing in the context of a virtual environment?
- How would the visual component affect their perception and/or preference?

Further considerations lead to questions of a higher order, which ultimately marked the main hypothesis of this study.

Would it be possible for an individual listener to use some kind of interactive interface in a VR environment that would allow them to select particular qualities of the binaural perceptual model in real time, according to their subjective liking?

If so, how would they then rate their choice against established alternatives, like their personal HRTF or an HRTF from a dummy head, with respect to those particular criteria?

‘One important component is that even though technology rapidly evolves, humans who use it do not. It is therefore crucial to understand how our sensory systems function, especially when matched with artificial stimulation.’ [LaValle 2017]

Steven M. LaValle, 2017

2 Fundamentals

2.1 Binaural hearing

Spatial hearing is of great importance for the human cognitive system. It provides information about the surrounding environment and the particular occurrences within it, whose relevance can cover a wide palette of categories: informative, emotionally appealing, aesthetically pleasing, energizing, relaxing, distracting, dangerous, life-threatening and many others. Because the human anthropomorphic structure does not provide primary control over the acoustic flow entering the auditory system, a complex auditory analysis is constantly maintained by the central nervous system. In other words, by having our ears open all the time, we are always aware of everything happening in the space around us [Moore 2012]. This points to the ability to quickly and precisely identify all surrounding events: not only their character and environmental significance, but also their relation to the listener's own spatial position.

Similar to the way we can see with two eyes, binaural hearing is based on combining the information from the two ears in the brain, creating a robust impression that confers on the stimulus a special character of perspective known as three-dimensional (3D) depth and localisation [Avan, Giraudet, and Büki 2015].

‘Both in the visual and auditory modalities, this character contributes to creating ‘objects’, which are easier to segregate and identify than what would have happened if a single receiver had been available.’ [Avan, Giraudet, and Büki 2015]

Avan/Giraudet/Büki, 2015

Researchers have contributed to the investigation and understanding of the human hearing apparatus in order to enable the implementation of binaural technology in many different applications.

‘The advent of microcomputers and, consequently, the availability of the necessary computational power for real-time processing of audio signals have initiated and fostered the development of a new technology, "binaural technology," which has established itself as an enabling technology in many fields of application, such as information and communication systems, measurement technology, hearing aids, speech technology, multimedia systems, and virtual reality.’ [Jens Blauert 1997]

Jens Blauert, 1997

Figure 1: Ear anatomy. Source: Blausen.com staff (2014). "Medical gallery of Blausen Medical 2014". WikiJournal of Medicine 1 (2). DOI:10.15347/wjm/2014.010. ISSN 2002-4436

The physical layout of the human auditory system may be divided into the outer, middle, and inner ear [Gelfand 2016]. The outer ear consists of the pinna and the ear canal, which propagate the sound waves to the eardrum (middle ear). Their exact shape varies between individuals [Monge 2011]; thus the acoustic impact they produce can differ significantly between listeners. However, the aspect of morphological variance is still being investigated [Purkait 2016].

‘The knowledge of the role of the external ears in spatial hearing and the availability of quantitative data for modeling external ears paves the way for various applications, for example, creating auditory events at prescribed directions and distances in a subject’s perceptual space.’ [Jens Blauert 1997]

Jens Blauert, 1997

The exact mechanisms of human binaural hearing are still not completely clear, but a number of related phenomena are already being discussed and extensively investigated. The distinction between the different attributes of auditory events is important for better isolation and understanding of the causes of those effects. Localization, localization blur, summing localization, cones of confusion, the precedence effect, the cocktail-party effect, binaural masking etc. are all effects discovered and summarized thanks to a systematic approach and diligent tests.

It is common to divide the attributes of a signal entering the auditory system into two main categories: monaural, consisting mainly of time and level differences between the individual spectral components of each single ear input signal; and interaural, consisting mainly of time and level differences between corresponding spectral components of the two ear input signals. Additional parameters, like head movements during listening or the involvement of other senses simultaneously with the occurrence of the auditive event, also contribute to the final perceptual effect and must be considered. The improvement of localization confidence and the reduction of front-back confusion thanks to head movements is discussed in several studies. The actual parameters of the stimulus also play an important role by affecting the proportions between the different time, level and spectral cues [Jens Blauert 1997].
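For reference, the two classic interaural cues can be written down compactly. The following is a standard textbook formulation added here for illustration only; the symbols (arrival times τ and ear input spectra H) are chosen for this sketch and do not come from the cited sources:

```latex
\mathrm{ITD} = \tau_L - \tau_R \quad \text{(typically below about 1 ms)}, \qquad
\mathrm{ILD}(f) = 20\,\log_{10}\frac{|H_L(f)|}{|H_R(f)|}\ \mathrm{dB}
```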

To be able to formally discuss auditive localization effects, it is necessary to relate all occurrences to a particular spatial coordinate system. Unless stated differently, the following convention for spatial coordinates will be used throughout this work.

Figure 2: Left - polar coordinates top view (horizontal plane). Right - polar coordinates side view (median plane).
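To make the convention concrete, the following minimal Matlab sketch converts a direction given in the polar convention of Figure 2 into Cartesian coordinates. It assumes azimuth increases towards the listener's left (as stated in section 3.3.1) and elevation upward from the horizontal plane; the function name and the axis orientation (x front, y left, z up) are illustrative choices, not part of the original text.

```matlab
% Polar direction (azimuth, elevation in degrees; radius in meters)
% to Cartesian coordinates: x points to the front, y left, z up.
function [x, y, z] = directionToCartesian(azDeg, elDeg, r)
    x = r .* cosd(elDeg) .* cosd(azDeg);
    y = r .* cosd(elDeg) .* sind(azDeg);   % positive azimuth -> left
    z = r .* sind(elDeg);
end
```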

2.2 VR

When we say virtual reality, a common first association is with consumer head-mounted displays (HMDs) like the Oculus Rift™. However, the exact historical origin of the concept is a subject of discussion. First elements of the idea of virtual reality can be traced all the way back to cave paintings from thousands of years ago [LaValle 2017]. Actual hardware designs relating to modern devices can be found in the nineteen-thirties¹.

A common point of discussion is what exactly VR is. It would probably be incorrect to try to describe it in only one sentence. Still, there are some suggestions.

‘Defining Virtual Reality can prove to be a difficult task because there is no standard definition for it. It is said to be an oxymoron, as it is referred to by some school of thought as “reality that does not exist”.’ [Bamodu and Ye 2013]

Bamodu, Oluleke and Ye, Xu Ming, 2013

‘Virtual reality - a medium composed of interactive computer simulations that sense the participant’s position and actions and replace or augment the feedback to one or more senses, giving the feeling of being mentally immersed or present in the simulation (a virtual world).’ [Sherman and Craig 2003]

Sherman, William R and Craig, Alan B, 2003

‘Inducing targeted behavior in an organism by using artificial sensory stimulation, while the organism has little or no awareness of the interfer- ence.’ [LaValle 2017]

Steven M. LaValle, 2017

Another term related to virtual reality is augmented reality, which denotes the enhancement of the real environment by the means of virtual reality.

‘Augmented Reality (AR) superposes synthetic elements like 3D objects, multimedia contents or text information onto real-world images (Hsieh and Lin, 2011), increasing its possibilities of interaction with the user.’ [Jorge Martín-Gutiérrez and Beatriz Añorbe-Díaz 2017]

J. M. Gutiérrez et al., 2016

The two-way influence between VR and AR and their derivations motivates the need for a more precise definition and also provokes concepts concerning the categorization of mixed realities [Milgram and Kishino 1994].

¹ https://www.vrs.org.uk/virtual-reality/history.html

‘The conventionally held view of a Virtual Reality (VR) environment is one in which the participant-observer is totally immersed in, and able to interact with, a completely synthetic world. Such a world may mimic the properties of some real-world environments, either existing or fictional; however, it can also exceed the bounds of physical reality by creating a world in which the physical laws ordinarily governing space, time, mechanics, material properties, etc. no longer hold. What may be overlooked in this view, however, is that the VR label is also frequently used in association with a variety of other environments, to which total immersion and complete synthesis do not necessarily pertain, but which fall somewhere along a virtuality continuum.’ [Milgram and Kishino 1994]

Milgram, Paul and Kishino, Fumio, 1994

Figure 3: Simplified representation of a ‘virtuality continuum’. Source: Giovanni Vincenti

‘VR systems can be classified into 3 major categories. These are, non-immersive, immersive and semi-immersive, based on one of the important features of VR, which is immersion and the type of interfaces or components utilized in the system. Non-Immersive VR system, also called Desktop VR system, Fish tank or Window on World system is the least immersive and least expensive of the VR systems, as it requires the least sophisticated components. It allows users to interact with a 3D environment through a stereo display monitor and glasses, other common components include space ball, keyboard and data gloves. Its application areas include modeling and CAD systems. Immersive VR system on the other hand is the most expensive and gives the highest level of immersion; its components include HMD, tracking devices, data gloves and others, which encompass the user with computer generated 3D animation that give the user the feeling of being part of the virtual environment. One of its applications is in virtual walk-through of buildings. Semi-Immersive VR system, also called hybrid systems ... or augmented reality system, provides high level of immersion, while keeping the simplicity of the desktop VR or utilizing some physical model. Example of such system includes the CAVE (Cave Automatic Virtual Environment) and an application is the driving simulator. Distributed-VR also called Networked-VR is a new category of VR system, which exists as a result of rapid development of internet. Its goal is to remove the problem of distance, allowing people from many different locations to participate and interact in the same virtual world through the help of the internet and other networks.’ [Bamodu and Ye 2013]

Bamodu, Oluleke and Ye, Xu Ming, 2013

Despite the attractive new experience that VR technology promises, there are also ethical and philosophical considerations. Some authors are concerned about the impact of VR on our social and economic life.

‘With the development of new VR technologies (e.g. goggles, gloves, interfaces, etc.), virtual reality will be able to approximate the real world with greater and greater accuracy. High-end flight simulators are already nearly indistinguishable from the real thing. In time, virtual reality environments, which are able to simulate "real world experience," will be available to the general public. In other words, virtual reality will become indistinguishable from the real, at least in terms of perceptual and cognitive processing. With the development of life-like virtual worlds (often replicating the real world in perfect virtual detail), virtual reality (i.e. "The Virtual World") will come into its own.’ [Cline 2005]

Mychilo Stephenson Cline, 2005

A modern consumer VR system consists of three main components. The first one comprises all input utilities and controllers; they enable the subject to actively engage with the VR environment. The second component is the VR engine, which maintains the rendering and interaction process. The third main component is represented by all output devices: all units providing feedback from the VR engine back to the subject [Bamodu and Ye 2013]. This rather complex constellation presents some challenges to current consumer rendering solutions. Concerning the video rendering, because the HMD is placed right in front of the eyes, the inter-pixel space appears as visible artifacts over the whole visual area, which spoils the immersion to some extent. Newer, higher-resolution systems are currently in development, but this relates to other difficulties concerning GPU performance requirements. It is widely suggested that for seamless visual performance the video frame needs to be refreshed at least 90 times per second. This high frame rate combined with a higher pixel resolution is not achievable with the available consumer hardware at the time of writing (Nov. 2017). Another problematic aspect of the VR system remains the head-tracking latency and the overall system reaction times. A common term to describe the amount of time it takes to update the display in response to a change in head orientation and position is the so-called motion-to-photon latency [LaValle 2017].

‘At Oculus we believe the threshold for compelling VR to be at or below 20ms of latency. Above this range, users tend to feel less immersed and comfortable in the environment. When latency exceeds 60ms, the disjunction between one's head motions and the motions of the virtual world start to feel out of sync, causing discomfort and disorientation.’ [Oculus 2017]

Oculus Best Practices, 2017

Figure 4: Example of visual artifacts in a typical HMD. Top - around 70 megapixels per eye. Bottom - around 1.2 megapixels per eye. Source: http://www.varjo.com/

Diverse practical aspects limit the user experience when wearing an HMD.

‘A head-mounted display also weighs considerably more than a pair of glasses. The longer a heavy device is worn, the more tired the users will become. Further, HMDs usually require cables to carry the video and audio signals to and from the display. Tracking systems often have cables as well, so HMDs have a large cable burden. Cables limit the movement of the user, allowing them to walk only a short distance or rotate no more than one full turn.’ [Sherman and Craig 2003]

Sherman, William R and Craig, Alan B, 2003

Figure 5: The input/output loop in a typical VR system. Source: Shmuel Csaba Otto Traian

Despite all current challenges and uncertainties, the VR market has shown increasing growth over the last couple of years. Some analysts foresee exponential growth of the total industry revenue, reaching 80 billion US dollars by 2025 [Sachs 2016], with video games and video entertainment holding the biggest share.

The importance of binaural technology for the immersive experience in this new medium is unquestionable. This motivates improving the existing state of the art by addressing issues in many aspects, one of the most important of which is the personalization of the rendering system. The VR environment relies on effective control over our senses. In order to achieve plausible results and fulfill a wide range of expectations among the largest possible audience, a purely technical approach alone is probably not going to ensure success. Addressing subjective preference concerning the rendering system and offering individual customization options could effectively contribute to the broader acceptance of this new reality.

2.3 HRTF acquisition methods

‘The process of scattering/diffraction/reflection by human anatomical structures converts the information in sound fields into binaural pressure signals. In the case of a free field and a fixed listener, the transmission from a point sound source to each of the two ears can be regarded as a linear time-invariant (LTI) process, and characterized by head-related transfer function (HRTF). HRTFs contain the main spatial information of sound sources, and are thus vital for research on binaural hearing.’ [Xie 2013]

Bo-sun Xie, 2012

The examination and acquisition of HRTFs has a fairly long tradition. The classical approach relies on actual acoustic measurement and storage of the acquired TFs as (Dirac) impulse responses (IRs). A typical case requires an acoustically controlled environment and treated room acoustics. A professional loudspeaker (LS), or multiple loudspeakers, is used as the source of the excitation signal. The subject or dummy head is positioned accordingly to ensure precise orientation, and is often restricted in movement by means of a special construction which holds the subject perfectly steady. Pressure-field microphones are typically placed in the blocked ear canal of the subject to record the excitation signal emitted by the LS and scattered by the human head and ears on its way to the ear entrance.

Figure 6: Typical block diagram of acoustic HRTF measurements.

Understandably, this process is rather expensive and can get quite tedious for the subjects in the case of individual human measurements, especially if many source positions need to be measured.

Because of the impracticality of acoustic measurements, other approaches are being investigated. A possible alternative to measurement is numerical calculation, the most common variant of which is the boundary element method (BEM) [Huttunen and Vanne 2017].

‘BEM is an extensively used method for acoustic radiation and scattering problems ... , in which the boundary problem of the wave or Helmholtz equation is first converted into a boundary surface integral. The boundary surface is then made discrete, into a mesh of elements, resulting in a set of linear algebra equations.’ [Xie 2013]

Bo-sun Xie, 2012

While this alternative method effectively avoids the common difficulties of acoustic measurements, other constraints apply. To accurately reconstruct the anatomical features of a person into the 3D model required by the method, a highly accurate scanning device is needed, capable of probing the skin surface of the head and pinna with sufficiently high resolution [Reichinger et al. 2013]. Moreover, the acquired 3D model needs to be examined for possible artifacts, which have to be fixed in a post-processing environment [Ziegelwanger, Kreuzer, and Majdak 2016]. Another aspect of the BEM procedure is the significant computational power required to calculate the individual HRTF based on the 3D mesh [Young, Tew, and Kearney 2016].

Another alternative which deserves attention is the application of artificial neural networks, an approach also known as deep learning [Kaneko, Suenaga, and Sekine 2016]. Deep learning is part of the bigger family of machine learning. The application of the method in this particular case consists of the development of a special learning algorithm which is capable (after a training period) of recognizing relations between morphological patterns and reconstructing the individual HRTF corresponding to those patterns. The big advantage of this approach is that it could eventually achieve consistent results based on two-dimensional anthropometric input like a common picture of the head and pinna. However, in order to reach that level, a large, high-quality input base for the learning process is required, as well as diligent tests and optimizations during the training stage.

2.4 HRTF selection methods

As opposed to the analytical and numerical approaches described above, the purely subjective selection of an HRTF out of an existing pool based on listening tests represents a different way of acquiring an HRTF that could be perceptually plausible and more appealing [Grasser, Rothbucher, and Diepold 2014]. In the following, two known selection methods are briefly presented.

2.4.1 Seeber-Fastl method

The objective of the Seeber-Fastl method is to investigate a fast selection procedure for subjects without particular training. The procedure is divided into two stages [Seeber and Fastl 2003]. In the first selection stage the subjects listen to 12 HRTFs and select 5 of them based on a score from 0 to 9 given for the greatest spatial perception in the frontal area. In the second stage the 5 preselected HRTFs are rated according to a new set of criteria:

- The direction of the sound is perceived from -40° left to 40° right, but not further outside
- The sound moves horizontally in equally-spaced steps
- The sound has a constant elevation at all times
- The sound is perceived in the frontal plane, at a constant distance, and preferably far away

White noise bursts are used as the test signal, structured as five sequential 30 ms pulses separated by 70 ms pauses, with 5 ms Gaussian slopes applied, at 60 dB SPL. The database used is the AUDIS catalogue [Blauert et al. 1998]. The 12 HRTFs are used to generate virtual sound sources on the horizontal plane so that the subject can evaluate the localization effect with different HRTFs. The subjects use a laser pointer to indicate the perceived direction. The test design is implemented as a Matlab GUI; a PC keyboard is used for input.
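For illustration, such a test stimulus can be generated with a few lines of Matlab; the sample rate and the exact Gaussian slope construction are assumptions, since they are not specified here.

```matlab
% Five 30 ms white noise bursts with 70 ms pauses and 5 ms Gaussian slopes.
fs    = 48000;                                   % assumed sample rate
burst = randn(round(0.03*fs), 1);
slope = exp(-0.5 * (linspace(-3, 0, round(0.005*fs))').^2);  % rising edge
burst(1:numel(slope)) = burst(1:numel(slope)) .* slope;
burst(end-numel(slope)+1:end) = burst(end-numel(slope)+1:end) .* flipud(slope);
gap   = zeros(round(0.07*fs), 1);
stim  = repmat([burst; gap], 5, 1);
stim  = 0.5 * stim / max(abs(stim));   % normalize; SPL calibration is separate
```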

The findings of this selection method confirmed the feasibility of the approach, and 'directional anchors' were suggested for future implementations.

2.4.2 DOMISO method

The DOMISO method builds upon the findings of the Seeber-Fastl method, but a different selection design was proposed [Yairi, Iwaya, and Suzuki 2008]. It utilizes a Swiss-style tournament approach to account for larger HRTF databases. All the HRTFs in the database are first clustered into groups of 32 HRTFs after cepstral analysis. In the original tests the main criterion given is localization on the horizontal plane. The 32 HRTFs in every group compete in a Swiss-style tournament and all the winners are validated in round-robin comparisons.
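To illustrate the tournament idea, here is a minimal Matlab sketch of a knockout-style reduction of one candidate group. It is a simplified stand-in for DOMISO's Swiss-style tournament (a Swiss system plays fixed rounds and ranks by score rather than eliminating), and the function name and the preference callback are illustrative assumptions, not the original implementation.

```matlab
% Knockout selection over a group of HRTF indices. preferAB is a handle
% that plays candidates a and b to the listener and returns true if a wins.
function winnerIdx = tournamentSelect(hrtfIdx, preferAB)
    pool = hrtfIdx(:)';
    while numel(pool) > 1
        next = [];
        for k = 1:2:numel(pool)-1
            if preferAB(pool(k), pool(k+1))
                next(end+1) = pool(k);      %#ok<AGROW>
            else
                next(end+1) = pool(k+1);    %#ok<AGROW>
            end
        end
        if mod(numel(pool), 2) == 1
            next(end+1) = pool(end);        % odd pool: last entry gets a bye
        end
        pool = next;
    end
    winnerIdx = pool;
end
```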

While both the Seeber-Fastl and DOMISO methods show partial improvements for the particular examined criteria, their effectiveness could be questioned [Grasser, Rothbucher, and Diepold 2014].

Alternative perceptually based approaches try to make use of cross-modal learning effects in order to 'recalibrate' the perceptual degradation [Zhong, Zhang, and Yu 2015], [Mayenfels 2015].

3 Proposed HRTF selection method

3.1 General considerations

As already discussed in the previous chapter, the limitations concerning individual HRTF acquisition motivate the need for a more detailed observation and understanding of the dependencies between particular HRTF parameters and their influence on particular perceptual effects. If such dependencies can be clearly identified and isolated, this would allow a partial parametrization of those elements, which could then be adjusted on demand by the end-user according to their preference and the actual application. If a selection procedure proves to be effective and easy to manage, this could provide a reasonable compromise compared to an individually acquired HRTF, and possibly a better solution compared to a dummy head HRTF without customization options.

The availability of HRTF databases in the recently developed universal HRTF file container format [AES 2015] provides convenient ways to analyze and perceptually compare different HRTFs in an application-oriented environment, with the opportunity to observe the listener's response to particular qualities of the binaural rendering under complex conditions. A traditional approach to the assessment of HRTFs is the conducting of listening tests, normally carried out in a controlled environment under very strict conditions without any visual distractions, so that the listener is not biased in their judgment of the audio rendering quality. These kinds of tests provide reliable and reproducible evidence of the listener's reactions under those specific conditions. However, in a real-world scenario, many aspects besides the pure auditive experience can have a significant impact on the listener's judgment. In the case of contemporary consumer VR rendering, which involves an HMD and ultimately isolates the consumer in an environment very different from the natural one, the complexity of the multi-modal stimulus can have a significant perceptual impact that needs to be examined in its particular context. This way, a broader range of relevant dependencies that need to be analyzed can be revealed, and a more profound understanding of the significance of particular QEs and/or QFs under typical application conditions can be provided. On the other hand, this is a good opportunity to evaluate the margins in overall QoE between HRTFs acquired using significantly different approaches. An individually measured HRTF is generally regarded as a ground truth in many aspects, but there is still little clear evidence about the performance of the individual HRTF in the context of a commercial VR application. A similar interest would also be valid for an HRTF acquired from a typical dummy head.

3.2 Proposed method overview

The main objective of the proposed selection procedure in this study is to investigate a specific approach to HRTF manipulation and selection that gives the listener much greater freedom to dynamically interact with the stimulus and, essentially, with all perceptual criteria of interest: a type of controlled interaction in which only the criterion of interest is affected by the change of conditions, while the effect is presented in the context of many other perceptual criteria. This can be implemented in a full-featured VR environment in order to isolate the subject from surrounding distractions, and also to enable a complex, high-level

experience, which closely approximates a typical application scenario. A game-like interaction allows very efficient engagement with many focused QFs, which could potentially enable better overall perceptual conditions [Pike and Stenzel 2017]. In order to provide effective support and guidance to the listener, so that they do not get lost in the process of interaction, a strategically designed set of isolated criteria is used as a sequence of perceptual tasks, in which only one QE is offered to the listener for adjustment by selection. By dynamically adjusting this isolated technical parameter, the listener has to decide which setting corresponds to their personal liking with respect to the examined criterion. Their decision in a particular trial is acknowledged and retained in the trials that follow. By the completion of the whole sequence, the listener ideally will have designed a more appealing HRTF than the one they started with, just by expressing their personal preference under the given conditions and criteria in every trial.

One challenge to address is the selection of the isolated criteria that have the most significant impact on the QoE during binaural listening, and the isolation of the corresponding technical parameter that most significantly influences a given criterion. Another important aspect is the implementation of the interactive environment that presents the selection sequence to the listener based on audio-visual stimuli rendered in real time: an environment with the important feature of being able to switch dynamically between different HRTFs, and containing a basic interactive VR GUI to acquire the listener's input concerning their preference in an efficient but sufficiently user-friendly fashion.

To be able to fulfill the aforementioned requirements and deal with the existing challenges, the following steps were taken.

Throughout the current study a measurement session was conducted in which BRIRs from 67 listeners were measured and their anthropomorphic features were protocoled by means of hand measurements and 3D scanning. The gathered BRIRs were processed further to eliminate the influence of the room acoustics and to acquire the dry individual HRTFs. The newly created HRTF database was then analyzed and compared with other existing HRTF databases. These databases were analyzed to look for common patterns, to evaluate the data statistically and to localize potential parameters that could relate to certain perceptual criteria. Informal and preliminary tests were conducted in order to localize the most significant relations between the examined QFs and QEs. After the perceptual criteria and technical parameters were selected, an HRTF database containing 243 HRTFs in SOFA format was created for the purpose of the proposed selection procedure. The test design was implemented in a VR renderer that provided dynamic real-time HRTF convolution of a single dynamic object, as well as HRTF switching and a basic GUI. A total of 16 selection procedures and validations were conducted. In 12 of the validations, experienced listeners compared their perceptually selected HRTF with their individual HRTF, with an average HRTF and with a dummy head HRTF. In the remaining 4 validations, the listeners compared their perceptually selected HRTF only with an average HRTF and a dummy head HRTF.

In the following chapters, the steps mentioned above will be described in detail.

3.3 BRIR measurements

To be able to evaluate the QoE of a selected HRTF after the selection procedure, a robust validation method is required. This could be a direct comparison against the individual HRTF, considered the reference for a particular listener. At the same time, a comparison against a dummy head HRTF, usually considered a compromise, would also be interesting, because this kind of HRTF originates from a statistical approximation and is not directly related to a particular individual.

In order to acquire a sufficient number of individual HRTFs from potential participants in the subsequent listening tests, 67 individual HRTFs of listeners with experience in the field of spatial audio research were measured.

3.3.1 Measurement setup

The measurements were made in a reference listening room [Silzle et al. 2009] at Fraunhofer IIS in Erlangen (Germany) within 5 days.

Figure 7: Reference listening room (ITU-R BS.1116) "Mozart" - Fraunhofer IIS, Erlangen (Germany)

The room has a room-in-room construction with rectangular geometry: length 9.3 m, width 7.5 m, height 4.2 m. The net floor area is 69.75 m² and the volume is 300 m³. The playback system consists of 60 loudspeakers, mainly attached to free-hanging circular trusses and also to the walls. The room meets the NR 10 noise specification and is adapted to the wide range of requirements of demanding audio scientists and engineers.

Figure 8: Subject 1 - measurement setup

Figure 9: Subject 1 with in-ear microphones

The subjects were seated in the middle of the room on a special chair with adjustable height, without wheels and with a custom head support, as shown in Figure 8. In order to maintain correct lateral orientation, every subject had to wear a head-strapped laser pointer showing the viewing direction at all times. HTM-1 microphones, normally bundled with the Smyth Realiser A8, were used together with a custom-designed external preamplifier providing phantom power. The microphone capsules were placed at the blocked ear canals of the subjects, as demonstrated in Figure 9. The outputs of the preamplifier were fed into a MADI-based audio router, which then sent them to a PC running Matlab; Matlab was used to manage the whole playback and recording routine. Twenty-six LS (Dynaudio BM6A MKII) were selected as source positions in a circular layout with the following coordinates with respect to the subject:

1. Azimuth 30°, Elevation 0°
2. Azimuth -30°, Elevation 0°
3. Azimuth 0°, Elevation 0°
4. Azimuth 180°, Elevation 0°
5. Azimuth 110°, Elevation 0°
6. Azimuth -110°, Elevation 0°
7. Azimuth 90°, Elevation 0°
8. Azimuth -90°, Elevation 0°
9. Azimuth 135°, Elevation 0°
10. Azimuth -135°, Elevation 0°
11. Azimuth 45°, Elevation 0°
12. Azimuth -45°, Elevation 0°
13. Azimuth 60°, Elevation 0°
14. Azimuth -60°, Elevation 0°
15. Azimuth 30°, Elevation 39°
16. Azimuth -30°, Elevation 39°
17. Azimuth 0°, Elevation 39°
18. Azimuth 180°, Elevation 39°
19. Azimuth 110°, Elevation 39°
20. Azimuth -110°, Elevation 39°
21. Azimuth 90°, Elevation 39°
22. Azimuth -90°, Elevation 39°
23. Azimuth 135°, Elevation 39°
24. Azimuth -135°, Elevation 39°
25. Azimuth 45°, Elevation 39°
26. Azimuth -45°, Elevation 39°

As shown in the second chapter, the coordinate convention uses positive values on the left side of the coordinate system. The excitation signal used was a 4-second-long logarithmic sine sweep at a 48000 Hz sample rate, with a start frequency of 80 Hz and a stop frequency of 24000 Hz. By applying a 10° lateral rotation of the subject (rotating the chair) once to the left and once to the right of the 0° azimuth axis, it was possible to record additional source directions in order to achieve a greater spatial resolution of the personal HRIR dataset, with a total of 74 directions. After the BRIR measurements, HPIR measurements of a Beyerdynamic DT 770 PRO were made per subject. Additionally, the position of the subject's head, without the subject present, was measured with a measurement microphone to be able to linearize the transfer function of the electro-acoustic system. Subsequently, the anthropometric data of all subjects was measured and protocoled by hand and categorized according to the CIPIC convention.
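For illustration, the excitation signal described above can be generated as follows. This is a minimal Matlab sketch of the standard exponential-sweep construction (Farina's method); the inverse filter and the deconvolution step are assumptions about the processing chain, not something documented in this section.

```matlab
% 4 s logarithmic sine sweep, 80 Hz to 24 kHz at 48 kHz, plus the matching
% inverse filter used to deconvolve recordings into impulse responses.
fs = 48000; T = 4; f1 = 80; f2 = 24000;
t  = (0:1/fs:T-1/fs)';
L  = T / log(f2/f1);                       % sweep rate constant
sweep = sin(2*pi*f1*L*(exp(t/L) - 1));
invFilt = flipud(sweep) .* exp(-t/L);      % decaying envelope whitens the sweep
% Deconvolving a recorded response y gives the impulse response:
% ir = conv(y, invFilt);  % the direct sound sits near sample numel(sweep)
```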

3.3.2 Postprocessing

After all measurements were completed, the gathered BRIRs were further processed to make them suitable for general-purpose use.

For every subject, three HRTF sets were prepared. The first one contains the 4096-sample-long BRIRs of the 26 main loudspeaker directions listed above, trimmed at the beginning by the overall system delay and faded out at the end with a 2048-sample-long, asymmetrical, right-sided-only Tukey window applied to samples 2049-4096. The second set consists of the pure HRTFs windowed to the first room reflection (the floor reflection, see Fig. 10), which occurs at roughly 140 samples from the beginning of the IR, calculated as the time difference between the direct sound path and the reflected sound path between the loudspeaker membrane and the ear entrance.
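As a concrete illustration of the fade-out described above for the first set, here is a minimal Matlab sketch (requires the Signal Processing Toolbox; the taper shape is an assumption, since only the window's length and right-sided application are stated):

```matlab
% Fade out the last 2048 samples of a 4096-sample BRIR with the right
% half of a Tukey window. brir is one measured channel (column vector).
fadeLen = 2048;
w = tukeywin(2*fadeLen, 1);            % ratio 1 gives a Hann-shaped taper
fadeOut = w(fadeLen+1:end);            % right-sided half only
brirTrimmed = brir;
brirTrimmed(fadeLen+1:2*fadeLen) = brirTrimmed(fadeLen+1:2*fadeLen) .* fadeOut;
```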

The third set takes the pure HRTFs and applies a smooth low-frequency compensation via a 6 dB/oct IIR filter to account for the low-frequency energy drop caused by the shorter HRIR length. After the compensation, the resulting IRs were stored with a length of 256 samples. For the 140- and 256-sample-long sets, the DFEQs were calculated and stored separately as IRs, to be applied when needed. The HPIR was also stored separately.

Figure 10: First room reflection

After the HRTF sets for all subjects were prepared, they were all stored in SOFA format for more convenience during further analysis or perceptual evaluation.
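A minimal sketch of this export step with the freely available Matlab/Octave SOFA API (see the sofaconventions.org link in section 3.4.3); the variable names are illustrative, and the metadata handling is reduced to the essentials.

```matlab
% Store one subject's HRIR set as SimpleFreeFieldHRIR.
% irs: [M directions x 2 receivers x N samples]; pos: M x 3 rows of
% [azimuth elevation radius] in the SOFA spherical convention.
SOFAstart;
obj = SOFAgetConventions('SimpleFreeFieldHRIR');
obj.Data.IR = irs;
obj.Data.SamplingRate = 48000;
obj.SourcePosition = pos;
obj = SOFAupdateDimensions(obj);
SOFAsave('subject_01.sofa', obj);
```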

3.3.3 HRTF database preparation in SOFA format

3.4 Selecting the most relevant perceptual criteria and isolation of the responsible technical parameters

3.4.1 Perceptual criteria

In order to test the feasibility of a perceptually motivated HRTF selection procedure that could help the listener find a solution matching their personal preference, a specific approach is needed in which the qualities of the desired HRTF are categorized into a number of perceptual criteria [Reardon et al. 2017]. These criteria ideally represent all main auditive aspects that affect the QoE and could potentially be of interest to the listener [Olko et al. 2017]. Selecting criteria that are as mutually isolated as possible could have a significant advantage in this sense, because it would potentially allow the different HRTF parameters responsible for the different perceptual effects to be isolated more precisely, and could also reduce the time the listener needs to make a selection. Recent research offers a set of perceptual attributes resulting from a consensus vocabulary protocol produced by a panel of sound engineers, experts in spatial audio and binaural mixing [Simon, Zacharov, and Katz 2016]. Because, similarly to the current work, these criteria are placed in the context of improving the selection of non-individual HRTFs when individual HRTF measurements are not available for a given listener, the proposed perceptual attributes represent a suitable starting point for the design of the investigated interactive HRTF selection procedure. In order to further optimize the design structure and minimize the required rendering complexity, the perceptual attributes were reduced to a total of five: sound coloration, front-back differentiation, externalization, horizontal localization and vertical localization. Because the pure HRTF does not contain room information, all other aspects and criteria concerning the room acoustic environment or the qualities of the rendering engine were neglected.

3.4.2 Possible relevant technical parameters

After the perceptual criteria were selected, the next step was to find the most efficient ways to affect those criteria, in order to be able to offer sufficiently distinctive options to the listeners, so that they can more easily express their preference.

As already pointed out at the beginning, the exact function of the human hearing apparatus is still not completely clear, which forces the assumption that there could be multivariate relations between the physical parameters and the perceptual criteria [Silzle 2007]: one single parameter could affect more than one criterion, or a single criterion could be affected by a mix of parameters in particular proportions. This is why special attention was needed to identify possible interactions between parameters.

Another noteworthy point is the significance of the rendering system and its influence on the perception. Currently, most of the efforts to establish standards concerning VR technology aim at application development, production and distribution formats², but the issues concerning the personal consumer binaural rendering system are still not being addressed in a direct manner.

² https://www.khronos.org/news/press/khronos-announces-vr-standards-initiative
http://www.bbc.co.uk/rd/projects/binaural-broadcasting
http://www.vr-if.org/guidelines/

Multiple studies have shown the significance of the headphone transfer function for the quality of the binauralization [Schärer and Lindau 2009], and despite some efforts to address this issue [B. B. Boren et al. 2014], the common assumption remains that the end-user will just use the headphones at their disposal, no matter the difference in QoE. Moreover, there is still no clear consensus concerning the target transfer function [Fleischmann, Silzle, and Plogsties 2012] that a reference headphone should have, which can be partially explained by the high variance between the transfer functions of individual ear canals [A. T. Christensen et al. 2013] and the difficulties of acquiring them [Hoffmann, F. Christensen, and Hammershøi 2013].

This brings up once more the aspect of personal preference concerning the magnitude response provided by a particular headphone with respect to the particular ear canal properties and the given headphone placement on the head. Because the consumer's headphone choice in most cases depends not on particular standards or guidelines ensuring optimal QoE of VR content, but only on personal preference and/or economics, the impact of that choice on the QoE of the binauralized content has to be considered on the rendering side of the reproduction chain. Such considerations naturally focus attention on the common technical aspects of consumer headphones as a valid parameter significantly affecting the overall perception of sound coloration [Olive, Welti, and Khonsaripour 2017a], [Olive, Welti, and Khonsaripour 2017b]. Another important aspect is the diffuse-field equalization of the headphone magnitude response and its relevance to perceptual preference. Despite the common assumption that DFEQ is a necessity when it comes to binaural audio reproduction, perceptual evaluations of this equalization do not always show a clear consensus across diverse content [Lorho 2009].

Based on those common technical aspects of headphone TFs and the diffuse-field equalization, it was decided to use the individual DFEQ from the HRTF measurements as a technical parameter for broadband adjustment of the spectrum that significantly affects the subjective evaluation of the overall sound coloration. After some informal listening tests, implemented in Max 7 (MSP)³ using the IRCAM Spat⁴ package and conducted in a smaller circle, it was decided to present three sufficiently distinctive options to the listener: the DFEQ and two derivatives with differently scaled magnitudes, factors 0.8 and 1.2 respectively, chosen as a good compromise between noticeability and effectiveness.
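A minimal Matlab sketch of how such scaled DFEQ variants could be derived; scaling the equalizer magnitude in the dB domain and rebuilding the filter with fir2 are assumptions about the exact processing, which is not specified above.

```matlab
% Scale a DFEQ's magnitude response (in dB) by a factor, e.g. 0.8 or 1.2,
% and rebuild it as an FIR filter. Requires the Signal Processing Toolbox.
function eqScaled = scaleDfeq(eqIr, factor, nTaps)
    n     = 2^nextpow2(numel(eqIr));
    magDb = 20*log10(max(abs(fft(eqIr, n)), eps));
    half  = magDb(1:n/2+1) * factor;             % scale the EQ depth in dB
    fAx   = linspace(0, 1, n/2+1);               % normalized frequency axis
    eqScaled = fir2(nTaps, fAx, 10.^(half/20));  % use an even order, e.g. 512
end
```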

The perceptual effect of externalization is also commonly described as 'out-of-the-head hearing', which naturally involves the subjective perception of the distance between one or more sound sources in three-dimensional space and the listener's own position. It is already clear that the presence of room acoustics can significantly affect the perception in this context; still, the perceptual effects of externalization in the context of a VR room simulation require further investigation [Klein et al. 2017]. This becomes even more important in the case of a 6DoF-based VR environment. As already pointed out, the current work focuses on the pure HRTF model. The aspect of room acoustics is not part of the investigated HRTF selection procedure. Accordingly, the HRTFs used in this experiment did not contain any room reflections or reverb.

http://www2.iis.fraunhofer.de/mpeghaa/papers/AES137_MPEG-H_v14_final.pdf
³ https://cycling74.com/
⁴ http://forumnet.ircam.fr/product/spat-en/

Figure 11: IIS database, DFEQ of subject 67 with two scaled versions. Blue - original DFEQ, yellow - scaled DFEQ with factor 0.8, red - scaled DFEQ with factor 1.2

An interesting insight into the importance of the dynamics of the magnitude response as part of the pure HRTF is given in the investigation of Juha Merimaa [Merimaa 2010], which shows an inverse relation between the criteria of timbral coloration and externalization with respect to the spectral dynamics. His investigation showed that by manipulating the spectral dynamics, a trade-off between coloration and externalization according to the particular application is possible. This has motivated the selection of the spectral dynamics of an HRTF as the technical parameter to influence the perceptual criterion externalization, but also to test the expected effect on sound coloration, given that the two criteria can influence each other. A newer study also confirms such a relation [Salmon et al. 2017]. Initial informal listening tests showed that spectral dynamics scaling with factors 0.5 and 2 represents a good compromise between noticeability and excessive sound coloration (a sketch of this manipulation follows later in this subsection).

Often the effect of externalization is also considered as the ability to differentiate between auditive events occurring in front of or behind the listener's head. Because the impact of ITDs on lateral localization has already been intensively investigated [Jens Blauert 1997], [Estrella 2011], the general assumption is that front-back differentiation and externalization are mostly due to monaural cues contained in the spectrum and less due to ITDs, given their absence in the case of a source without low-frequency components placed on the median plane [Langendijk and Bronkhorst 2002]. Other considerations concern the importance of individual ear symmetry, which has been partly confirmed in a recent study [Bomhardt and Fels 2017], or the importance of the acoustic resonances of the outer ear, also known as pinna notches [Clarke and Lee 2017]. All these factors need to be considered when searching for the most relevant responsible parameter, and because the exact relations still remain to be discovered, a decision was made to statistically analyze the individual HRTFs from the IIS, ARI and IRCAM databases and look for further clues.

The human perception of the direction of auditive events depends on many factors simultaneously, whose relations change depending on the source movement and position [Jens Blauert 1997]. Those exact relations are still being investigated, but there are already indications pointing to the important role of the ITD for the localization of sources on the horizontal plane [Wightman and Kistler 1992]. This has motivated a number of researchers to look for different possibilities to acquire or approximate the individual ITD as alternatives to an acoustic measurement [Estrella 2011], [Juras, Miller, and Roginska 2015]. Given the existing information and scientific work to date, it is clear that the ITD is highly significant for localization on the horizontal plane, which makes it the natural technical parameter responsible for this perceptual criterion. Still, a preliminary test was conducted to verify the degree of parameter change in the context of the VR environment. The manipulation of the ITD has already been investigated and perceptually validated [Busson, Nicol, and Katz 2005], so the tools required for controlling the parameter are already known.
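Returning to the spectral-dynamics manipulation selected above (scaling factors 0.5 and 2), here is a minimal Matlab sketch of the idea: the log-magnitude is split into a smooth trend and a fine-structure residual, and only the residual is scaled. The moving-average smoother, its window size and the minimum-phase reconstruction are illustrative assumptions, not Merimaa's exact procedure.

```matlab
% Compress (factor < 1) or expand (factor > 1) an HRIR's spectral dynamics.
function hOut = scaleSpectralDynamics(h, factor)
    n     = 2^nextpow2(numel(h));
    magDb = 20*log10(max(abs(fft(h, n)), eps));
    trend = movmean(magDb, 65);               % coarse spectral envelope
    magSc = trend + factor*(magDb - trend);   % scale only the fine structure
    lin   = 10.^(magSc/20);
    % rebuild a causal IR with the new magnitude (minimum-phase idiom)
    hMin  = real(ifft(exp(conj(hilbert(log(lin))))));
    hOut  = hMin(1:numel(h));
end
```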

The elevation cues responsible for height perception, also known as localization on the median plane, are generally considered monaural and heavily dependent on the PRTF [Spagnol, Geronazzo, and Avanzini 2010], [Fan, Wan, and McMullen 2016]. While several pieces of evidence for the relation between the pinna resonances and height perception have already been discovered [Spagnol and Avanzini 2015], the exact contribution of the PRTF compared to other parameters like shoulder reflections or head scattering effects is still to be clarified. This motivated many researchers to examine the structure of the pinna and the elements responsible for the resonances, also known as pinna notches, in an attempt to extract and manipulate them [Raykar, Duraiswami, and Yegnanarayana 2005]. While the relation between the frequency positions of the notches and the elevation angle is obvious, there is still no clear evidence whether, on the perceptual side, this relation is valid only for a particular spectral range or up to a particular frequency border [Schönstein and Katz 2010], or whether it is valid for the whole spectrum [Ghorbal et al. 2017]. Middlebrooks introduced the idea of reducing the inter-subject differences in HRTF spectra by means of frequency scaling [Middlebrooks 1999]. Because of its efficiency in that particular case and its potential to affect the examined criteria on a larger scale, it was decided to test frequency scaling as a possible way of effectively influencing the perception of elevation.

3.4.3 Analysis of HRTF databases

A logical step in the search for the QEs responsible for particular perceptual effects was the comparative analysis of existing databases with individually measured HRTFs. Thanks to the SOFA standard, several high-quality HRTF databases are freely available for download and analysis5, which enables statistical comparisons in terms of different HRTF parameters like ITD margins, direction-dependent spectral patterns, anthropomorphic features etc. For the purpose of the current research, a deeper look was taken into the databases from IIS, ARI, IRCAM and CIPIC. Their relatively similar technical specifications as well as the well documented additional information were important factors, since suitable sets for comparisons and statistical analyses were required. Apart from these, there are also other freely available sets, but dissimilarities in the specifications, like incompatible spatial layout or resolution, missing meta-data etc., narrowed the choice to the aforementioned databases.

3.4.4 ITD analysis

For the purpose of the ITD analysis, the SOFA HRTFs were loaded in Matlab and cross-correlation (xcorr) was used to determine the time lag between the left and right ears, given in samples.
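A minimal sketch of this step is shown below, assuming the SOFA Matlab API (SOFAload) and hypothetical names such as 'subject.sofa' and dirIdx; the thesis's actual script may differ in detail.

    % Minimal sketch: estimate the ITD in samples for one measurement
    % direction by cross-correlating the two ear impulse responses.
    Obj = SOFAload('subject.sofa');                 % SOFA Matlab API
    hrirL = squeeze(Obj.Data.IR(dirIdx, 1, :));     % left ear IR, direction dirIdx
    hrirR = squeeze(Obj.Data.IR(dirIdx, 2, :));     % right ear IR

    [c, lags] = xcorr(hrirL, hrirR);                % full cross-correlation
    [~, iMax] = max(abs(c));                        % lag of maximum correlation
    itdSamples = lags(iMax);                        % ITD in samples (sign = leading ear)
    itdSeconds = itdSamples / Obj.Data.SamplingRate;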

Figure 12: Raw ITDs from all 67 subjects of the IIS database - all LS from elevation 0°

5https://www.sofaconventions.org/mediawiki/index.php/Main_Page

Figure 13: Raw ITDs from all 67 subjects of the IIS database - all LS from elevation 39°

Figure 14: ITDs sorted from min to max from all 67 subjects of the IIS database - all LS from elevation 0°

Figure 15: ITDs sorted from min to max from all 67 subjects of the IIS database - all LS from elevation 39°

The plots show the characteristic time delay curve describing the delay change based on the source position with respect to the subject. It is to be noted that the artifacts of the ITD curve seen when the source fires from the area around 90° and -90° are most probably caused by irregularities in the sitting position of the subjects. The difference between the smallest and biggest ITD on the middle loudspeaker ring (ear height elevation) is 14 samples, and the difference on the upper loudspeaker ring (elevation = 39°) is 7 samples. The mean values are roughly 39 and 23 samples for the middle and upper ring respectively, and the standard deviations are roughly 2.9 and 1.5. An analytical comparison between the signal measurements and the anthropometric measurements showed agreement concerning the ITDs.

A similar analysis was made with the ARI and IRCAM databases.

Figure 16: ITDs all subjects - ARI, all LS from elevation 0°

Figure 17: ITDs all subjects - IRCAM, all LS from elevation 0°

3.4.5 Main pinna notch analysis

Because of the significance of the PRTF for the binaural cues and spatial hearing, a detailed look was taken into the pinna resonance with the highest energy contribution. This resonance, called the main pinna notch throughout the rest of this work, in most cases causes an obvious narrow-band energy drop in the magnitude spectrum of the HRTF in the frequency range above 5 kHz [Iida et al. 2007]. In order to gain an overview of the main notch behavior with respect to the source position on the horizontal plane, the magnitude spectra from the IIS, ARI, Listen (IRCAM) and CIPIC HRTF databases were analyzed. The HRTFs from all subjects of the IIS database were plotted to provide some initial impressions about the position and size of the main pinna notch across subjects.

As seen in the figures that follow, despite the obvious differences concerning the position and shape of the main pinna notch, a subtle trend could be recognized showing some difference in the distribution depending on whether the sound source is situated in front of or behind the listener. This becomes even clearer when visualizing the magnitude spectra of all source directions per subject at elevation 0°. With many of the subjects, the frequency deviation of the main pinna notch could be clearly followed even if its shape varies with the source direction. It seems that when the sound source is behind the listener, the main notch6 position almost always lies higher in the frequency spectrum than when the sound source is in front. Moreover, the frequency area of the main notch at azimuth 0° has less energy compared to azimuth 180°. The spatial color plot from subject 19 represents a good example of this relation.

Figure 18: IIS - 67 subjects, left ears, azimuth 0°, elevation 0°

Figure 19: IIS - 67 subjects, right ears, azimuth 0°, elevation 0°

Figure 20: IIS - 67 subjects, left ears, azimuth 180°, elevation 0°

Figure 21: IIS - 67 subjects, right ears, azimuth 180°, elevation 0°

After some informal listening tests in Max 7, the author recognized the effect of the main notch position, and the difference in its position between azimuth 0° and azimuth 180°, as a significant technical parameter affecting the perception of the sound source in front of or behind the head.

The next step was deciding which exact notch frequency positions should be offered to the listeners for selection. In order to find these, a purely statistical approach was used to define the subsets covering the largest number of potential matches. The wish to base this on as much data as possible motivated the analysis of several HRTF databases.

6 The term notch is consistently applied in this particular context only to systematically illustrate a particular trend. For many subjects/directions it would be arguable whether the dips in the spectrum could be described as notches.

Figure 22: IIS database - Subject 19, spectral plot of left and right ears over all LS with elevation 0°. The horizontal black line shows the main notch frequency for all directions. Note the differences between directions 0° and 180°

To be able to accomplish the analysis in an efficient manner, a script was developed in Matlab to automatically detect and report the notch frequencies in a predefined range.
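The following is a minimal sketch of such a detection step, not the thesis's actual script; the FFT size, the search range and the use of findpeaks are assumptions for illustration, and hrir is a hypothetical variable holding one ear's impulse response.

    % Minimal sketch (assumptions: search range and FFT size are illustrative).
    % Detect the deepest narrow-band dip ("main notch") above 5 kHz.
    fs   = 48000;
    nfft = 1024;
    H = 20*log10(abs(fft(hrir(:), nfft)));        % magnitude spectrum in dB
    H = H(:).';                                   % row vector, matching f
    f = (0:nfft-1) * fs / nfft;                   % frequency axis in Hz
    inRange = f >= 5000 & f <= 13000;             % predefined search range

    % Dips in the spectrum are peaks of the inverted spectrum.
    [~, iNotch] = findpeaks(-H(inRange), 'SortStr', 'descend', 'NPeaks', 1);
    fRange = f(inRange);
    mainNotchHz = fRange(iNotch);                 % reported notch frequency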

Figure 23: Histogram of the main notches from all left ears from the IIS, ARI, IRCAM, CIPIC databases at azimuth 0°, elevation 0° (304 samples, 76 bins)

Figure 24: Histogram of the main notches from all right ears from the IIS, ARI, IRCAM, CIPIC databases at azimuth 0°, elevation 0° (304 samples, 76 bins)

Figure 25: Histogram of the main notches from all left and right ears combined from the IIS, ARI, IRCAM, CIPIC databases at azimuth 0°, elevation 0° (608 samples, 76 bins)

The three frequencies selected as most typical and sufficiently distant from each other were 7000 Hz, 8400 Hz and 9800 Hz. It is to be noted that for the purpose of the proposed HRTF selection procedure, the ear asymmetry normally present with individual measurements has not been taken into account in the HRTF manipulations applied.

4 Implementation of the VR-environment

4.1 System overview

Figure 26: IIS - mobile VR workstation consisting of Windows PC, MacPro and PC laptop. Unity VR rendering to Oculus Rift via NVIDIA GTX1080. Immersive triple display configuration with 4K KVM switch. RME Madiface XT, RME ADI-2 Pro, Avid Artist Mix. Dual headphones output with independent volume control.

The actual design of the VR workstation used for the test-design implementation was the subject of many considerations regarding different application scenarios, and was improved over several months. The mobility of the system allows for easy relocation and deployment in different rooms at the Fraunhofer IIS. The dual Mac-and-PC design, in conjunction with the dynamic KVM assignment and the flexible audio routing over MADI, enables a wide palette of combinations, including dynamic switching between headphones and loudspeakers.

For the purpose of the HRTF selection procedure, the system was located in a small edit suite with moderate sound proofing. The PC configuration utilizes an extremely silent construction.

4.2 VR-rendering

The VR environment was implemented in Unity 3D as separate scenes containing basic visual elements. The newly developed external Unity plug-in SOFAlizer-for-Unity7 enabled dynamic binaural synthesis as a C# asset attached to a VR object. It enabled the flexible scene design required to prepare stimuli suitable for the different trials, as well as dynamic switching between different HRTFs in SOFA format. Being able to use the Oculus Rift as a head-tracker for both audio and video is a clear advantage. The binaural renderer was specifically compiled with the automatic normalization of the loaded HRTFs disabled. Because it does not apply any interpolation during rendering, but only convolution in the frequency domain, all the HRTFs prepared for the selection procedure were interpolated beforehand to 3° azimuth resolution and 5° elevation resolution using the 'pchip' algorithm in Matlab.
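A minimal sketch of such a pre-interpolation step for one elevation ring is given below; the data layout (hrirRing as a matrix of HRIRs over azimuth) and the 15° measured grid are assumptions for illustration, and interpolating raw time-domain IRs sample-wise is only one simple possibility.

    % Minimal sketch (assumed layout): densify one elevation ring from a 15°
    % measured azimuth grid to the target 3° resolution using 'pchip'.
    % hrirRing: [nAz x nSamples] HRIRs, azMeas: measured azimuths in degrees.
    azMeas  = 0:15:345;                        % illustrative measured grid
    azDense = 0:3:357;                         % target 3° azimuth resolution

    % Wrap the first measurement to 360° so the interpolation closes the circle.
    azWrap   = [azMeas, 360];
    hrirWrap = [hrirRing; hrirRing(1, :)];

    % interp1 with 'pchip' interpolates column-wise, i.e. per IR sample index.
    hrirDense = interp1(azWrap, hrirWrap, azDense, 'pchip');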

4.3 Controller

Figure 27: Xbox One controller used by the subjects as dynamic tactile input in the VR environment. D-pad horizontal axis controls the sound level, D-pad vertical axis controls the ratings between the conditions (HRTFs). The 'Task' button shows/hides the trial description. X, A, B, Y on the right were used to switch between the conditions. The bumper and trigger on the front side were dynamically programmed for each trial.

7https://github.com/sofacoustics/SOFAlizer-for-Unity

4.4 GUI

Figure 28: Screenshot showing the start screen of the trial 'horizontal localization'. On pressing 'Task' on the controller, the description disappears and the trial begins. A second press pauses the trial and brings back the description.

Figure 29: Screenshot from the actual selection. The red cross indicates deviation of the viewing direction from the green bar. On the top left some ratings are visible on the sliders. The currently selected condition is B with 57 points.

5 Preliminary listening tests

Because the effect of the DFEQ was already widely discussed in a variety of studies, and obvious enough throughout the informal listening tests, it was decided to proceed with testing other parameters and their perceptual impact in a VR environment. Furthermore, it was important to investigate the relation between the two criteria sound coloration and externalization.

5.1 Sound coloration and spectral dynamics

This test aimed at judging the ability of the listeners to recognize differences in the spectral TF, particularly changes in the spectral dynamics, and to express their preference according to the perceptual criterion 'natural sounding'. A total of 17 listeners participated. For 10 of the listeners the manipulation of the spectral dynamics was applied to their individually measured HRTFs; for the remaining 7 listeners the same manipulation was applied to a dummy head HRTF - the spherical far-field compilation of the KU 100 from the publicly available database of TH Köln8. In order to gain a better understanding of the impact of spectral dynamics scaling on the perception of natural sound coloration, three candidates were offered for selection. One was the unaltered original HRTF. The second was a manipulated version of the HRTF with the spectral dynamics reduced by a factor of 0.3. The third was a manipulated version of the HRTF with the spectral dynamics increased by a factor of 3. The manipulation was applied element-wise in the range 1.5-12 kHz via a custom designed window function, gradually increasing from factor 1 at the range borders to the defined scaling factor in the center area of the window. The range was selected so that it covers all three potentially selected notch frequencies.
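The sketch below illustrates one way such a windowed scaling could look; the raised-cosine window, the dB-domain formulation and the moving-average envelope are assumptions for illustration, not the custom design used in the thesis (HdB is a hypothetical magnitude response in dB).

    % Minimal sketch (assumptions: dB-domain scaling around a smoothed envelope;
    % the window shape is illustrative). Scale the spectral dynamics in 1.5-12 kHz.
    fs   = 48000;
    nfft = numel(HdB);                         % HdB: row vector, magnitude in dB
    f    = (0:nfft-1) * fs / nfft;

    % Raised-cosine window: 0 at the range borders, 1 in the center area.
    w = zeros(size(f));
    band = f >= 1500 & f <= 12000;
    w(band) = 0.5 - 0.5*cos(2*pi*(f(band) - 1500)/(12000 - 1500));

    % Blend from factor 1 at the borders to the target factor in the center.
    scale = 3;                                 % e.g. 0.3 or 3 as in the test
    gain  = 1 + w*(scale - 1);

    % Scale deviations from a coarse spectral envelope, keeping overall level.
    ref    = movmean(HdB, 31);
    HdBout = ref + gain .* (HdB - ref);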

Figure 30: IIS - Subject 1, spectral dynamics scaling - left ear, azimuth 30°, elevation 0°. Blue - original magnitude, red - scaling factor 0.3, yellow - scaling factor 3

Figure 31: IIS - Subject 1, spectral dynamics scaling - right ear, azimuth 330°, elevation 0°. Blue - original magnitude, red - scaling factor 0.3, yellow - scaling factor 3

8http://audiogroup.web.th-koeln.de/index.html

The stimuli used were four music tracks of different genres, which the listener could switch at will via the Xbox controller's bumpers and triggers.

- Classic: Beethoven, Symphony N.5, fourth movement "Allegro"
- Pop: Sia, "Unstoppable"
- Heavy metal: Audioslave, "Cochise"
- Jazz: Gershwin, "Summertime"

From all 4 pieces the loudest chorus parts, each around 30 seconds long, were selected and looped, so that a more efficient comparison is possible in the absence of transport controls. The loudness differences were reduced but not completely eliminated in order to retain the natural genre-based loudness differences. For this test all stimuli were pre-binauralized in Matlab (conv) so that the left stereo channel was convolved with the HRTF IR at azimuth 30°, elevation 0°, and the right stereo channel was convolved with the HRTF IR at azimuth 330°, elevation 0°. This arrangement mimics a standard stereo loudspeaker layout that is perceptually familiar to the listener. The rendering in the VR environment was implemented without head-tracking and without visual cues related to the stimuli. This was important to eliminate possible dynamic sound coloration caused by the renderer during IR switching, and also to focus the listener's attention only on the timbral qualities of the sound rendering, without distraction by other related criteria or issues. The main task was to rate the three conditions with different spectral dynamics. The listeners were instructed to focus solely on the sound coloration and to try to ignore any other effects possibly caused by the manipulation. At first, they had to confirm whether they heard differences between the offered conditions. If they did not, they had to rate all conditions equally at 50 points. If there were differences, the listeners had to decide which condition provided the most natural sound coloration and give it 100 points. The condition providing the least natural sound coloration had to receive 0 points. The third remaining condition could be rated freely, and could be rated equally with one of the other two if there was no perceivable difference.
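A minimal sketch of this pre-binauralization, assuming hypothetical file and variable names (hrir30 and hrir330 as [N x 2] two-ear HRIRs of equal length for the two directions):

    % Minimal sketch: pre-binauralize a stereo track over two virtual
    % loudspeakers at azimuth 30° and 330°, elevation 0°.
    [x, fs] = audioread('stimulus.wav');       % x: [nSamples x 2] stereo track

    % Each channel is convolved with both ears of its HRIR; the ear signals
    % of the two virtual loudspeakers are then summed.
    earL = conv(x(:,1), hrir30(:,1)) + conv(x(:,2), hrir330(:,1));
    earR = conv(x(:,1), hrir30(:,2)) + conv(x(:,2), hrir330(:,2));

    y = [earL, earR];
    y = y / max(abs(y(:)));                    % avoid clipping on export
    audiowrite('stimulus_binaural.wav', y, fs);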

Figure 32 shows the results of the test from all subjects.

One of the interesting observations is that out of the 10 subjects with individual HRTFs, only 2 rated their unprocessed HRTFs as most natural sounding. The majority of the listeners found the reduced spectral dynamics to sound more natural than the increased spectral dynamics; still, three listeners had the exact opposite perception. It must be pointed out that reducing the spectral dynamics naturally also reduces the ISDs, as described by Merimaa [Merimaa 2010], even when applied peak-based and independently to each ear. This could explain the reports from some of the subjects about a noticeable change in the symmetry of the sound image between the candidates. The HRTF of the KU 100, because of its nature, contains comparably fewer ISDs than the individual HRTFs. This could partially explain the differences between the judgments based on individual HRTFs and the judgments based on the KU 100 HRTF. Still, a clear consensus concerning the perceptual effect of the spectral dynamics manipulation is difficult to see, despite the obvious significance of the parameter.

Figure 32: Preliminary listening test sound coloration - table with the results from 17 listeners. 100 = most natural, 0 = least natural. The green highlighting indicates that the reduced spectral dynamics was rated more natural sounding than the increased spectral dynamics. The yellow highlighting indicates the opposite. The listeners having their own HRTFs as input for the manipulation are marked in red; the blue highlighting indicates the dummy head HRTF.

5.2 Externalization, generic HRTF, DFEQ and main notch effect

In order to be able to test the influence of notch presence or absence on the externalization perception, it was important to have an HRTF base that is as neutral as possible. The KU 100 was considered a valid option, but because of its synthetic nature, it was difficult to recognize a direct relation to the available individual HRTFs based on the spectrum plots. This motivated a rather experimental approach to the problem.

A generic HRTF was synthesized from the IIS database by averaging all 67 HRTFs. The absolute spectra were averaged over all subjects, for each direction separately. The same was done with the ITDs, separately for the two available loudspeaker rings. The result was an average HRTF over all subjects, with a much smoother magnitude spectrum than the individual HRTFs, while still showing a closer relation to them than the KU 100. After some initial listening tests it was soon recognized that, perceptually and systematically, the newly synthesized HRTF, called simply 'generic' in the following, is the better input for further manipulations.
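A minimal sketch of this averaging, under an assumed data layout (hrirAll as a 4-D array of all subjects' HRIRs); the zero-phase reconstruction shown is an illustrative simplification, since the thesis recombines the averaged magnitude spectra with the separately averaged ITDs:

    % Minimal sketch (assumed layout): average the absolute spectra of all
    % subjects per direction and ear.
    % hrirAll: [nSubjects x nDirections x 2 x nSamples] time-domain HRIRs.
    nfft  = 512;
    H     = fft(hrirAll, nfft, 4);             % spectra along the sample axis
    Hmean = squeeze(mean(abs(H), 1));          % [nDirections x 2 x nfft]

    % Back to the time domain; shown here as a zero-phase reconstruction only
    % (the averaged ITDs would be reapplied afterwards).
    hrirGen = ifft(Hmean, nfft, 3, 'symmetric');
    hrirGen = circshift(hrirGen, nfft/2, 3);   % center the zero-phase IRs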

To test the effect of the main pinna notch on the perception of externalization in the context of different DFEQs, a separate scene in Unity was created. It contained a moving audio-visual object representing a talking male person reciting a short text passage, looped over time. A slow shuttling movement was intentionally restricted to around azimuth 0° (±15°), elevation 0°, to focus on the binaural cues in the frontal area. Head-tracking was enabled and the Xbox controller was programmed so that the listener could 'teleport' the object behind them (azimuth +180°) at a button press, retaining the shuttling pattern. A separate option was also provided to pause the movement completely if desired. Another object was placed in the scene to indicate the required viewing direction. The task description also stated that the listeners were allowed to close their eyes if desired, and that they could experiment with different object teleporting speeds.

The first offered condition was the pure generic HRTF; the second was the generic HRTF with an added notch at 8400 Hz; the third condition had a broadband reduction of the DFEQ's spectral dynamics by a factor of 0.8, with the same notch applied scaled by a factor of 2.

Figure 33: IIS - generic HRTF, right ear - azimuth 0°, elevation 0°. Red - pure generic, yellow - generic with notch, blue - generic with scaled DFEQ and notch.

Figure 34: IIS - generic HRTF, right ear - azimuth 180°, elevation 0°. Red - pure generic, yellow - generic with notch, blue - generic with scaled DFEQ and notch.

To provide a better overview of the manipulation across all azimuth directions, the following surface plots show the differences of the TFs between the three conditions offered for rating according to the criterion externalization.

Figure 35: IIS - generic HRTF, left ear at elevation 0°.

Figure 36: IIS - generic HRTF, right ear at elevation 0°.

The notch application was implemented as an automated script in Matlab with options to change different parameters such as the notch frequency at azimuth 0° and 180°, as well as the notch shape.
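A minimal sketch of such a parametric notch insertion is shown below; the Gaussian dip shape, depth and width are illustrative assumptions and not the thesis's actual parameters (HdB again denotes a hypothetical magnitude response in dB):

    % Minimal sketch (assumptions: Gaussian dip in the dB domain; depth and
    % width are illustrative). Insert a notch with a switchable center frequency.
    fs      = 48000;
    nfft    = numel(HdB);
    f       = (0:nfft-1) * fs / nfft;      % frequency axis of HdB
    fc      = 8400;                        % notch center in Hz (e.g. azimuth 0°)
    depthDb = -12;                         % notch depth in dB
    bwHz    = 1200;                        % notch width (FWHM) in Hz

    notch = depthDb * exp(-(f - fc).^2 / (2*(bwHz/2.355)^2));
    HdBnotched = HdB + notch;              % apply in the dB domain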

Figure 37: IIS - generic HRTF with notch, left ear at elevation 0°.

Figure 38: IIS - generic HRTF with notch, right ear at elevation 0°.

Figure 39: IIS - generic HRTF with scaled DFEQ and notch, left ear at elevation 0°.

Figure 40: IIS - generic HRTF with scaled DFEQ and notch, right ear at elevation 0°.

As with the sound coloration test, the listeners were forced to give 100 and 0 points to two of the conditions if they heard differences concerning the perceptual criterion externalization. Figure 41 shows the results of the test from all subjects.

Similarly to the sound coloration test, the variation of the selected technical parameters showed a significant impact on the externalization perception. Still, it could be seen that the manipulation could lead to different perceptual extremes with different subjects. Despite the efforts put into explaining the importance of focusing on the single questioned criterion, some of the subjects reported after the test that their judgment was influenced by the difference in sound coloration. It is also worth mentioning that, although no normalization was applied to the HRTFs after the manipulations, none of the listeners reported loudness differences between the conditions. A couple of subjects reported that the perceivable differences were bigger when the object was placed behind them. It was also reported that closing the eyes helped some of the subjects to better perceive the change in effect. Other listeners reported that quickly alternating the object's position from front to back was helpful as well. Finally, some of the subjects took more than 10 minutes for their final decision, so it is possible that learning effects took place.

Figure 41: Preliminary listening test externalization - table with the results from 18 listeners. 100 = most externalized, 0 = least externalized. The green highlighting indicates that the generic with DFEQ and notch scaling was rated more externalized sounding compared to the pure generic. The yellow highlighting indicates the opposite.

5.3 Horizontal localization

In order to test the effect of ITD scaling on horizontal localization in a VR environment, a new Unity scene was created. It contained a static object that could be moved by the listener to two predefined directions on the horizontal plane. A helicopter object was used as stimulus, representing an optimal broadband signal most probably already known to the subject. The listener was instructed how to change the object's direction with the Xbox controller, first to -90° (right hand side), then to -60° (right hand side), and the task was to judge, for those two directions, which condition caused an auditive change of the direction of the sound event on the horizontal plane. They were forced to give 0 points to the condition causing a change in direction tending towards azimuth 0°, and to give 100 points to the condition causing a change in direction tending towards azimuth 180°. They were forced to begin with -90° in order to first quickly notice the change in effect, and then switch to -60° to make the final decision with less impact from the 'cone of confusion'.
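The thesis does not detail the ITD scaling implementation; the following is a minimal sketch of one common approach, shifting the two ears' impulse responses in opposite directions by an amount derived from an onset-based ITD estimate (hrirL/hrirR are hypothetical single-ear HRIRs with sufficient leading and trailing zeros):

    % Minimal sketch (assumption: onset-based ITD scaling; not necessarily the
    % thesis's method). Scale the interaural delay of one HRIR pair.
    scale = 1.2;
    [~, onL] = max(abs(hrirL));            % crude onset estimate, left ear
    [~, onR] = max(abs(hrirR));            % crude onset estimate, right ear
    itd   = onL - onR;                     % ITD in samples
    shift = round((scale - 1) * itd / 2);  % split the change between both ears

    hrirLs = circshift(hrirL(:),  shift);  % needs leading/trailing zero padding
    hrirRs = circshift(hrirR(:), -shift);  % so the wrap-around stays silent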

Figure 42: IIS generic HRTF - ITD scaling, middle ring. Blue - original ITD, yellow - ITD scaled by factor 1.2, red - ITD scaled by factor 0.8

Figure 43: IIS generic HRTF - ITD scaling, upper ring. Blue - original ITD, yellow - ITD scaled by factor 1.2, red - ITD scaled by factor 0.8

It must be noted that in this particular VR setup, the viewing angle of the Oculus Rift covered not more than ±40°, which is why the subjects had to move their head to be able to see the visual position of the object. Again, a visual marker in the VR environment indicated the required viewing direction to enable optimal judgment conditions. Figure 44 shows the results of the test from all subjects. Despite the surprising results from listeners 14 and 16, a clear consensus among the remaining 16 subjects shows that increasing the ITDs causes a perceptual change in the localization towards azimuth 180°, and that decreasing the ITDs causes a perceptual change in the localization towards azimuth 0°. Many of the listeners expressed uncertainty when the object was placed at -90° (right hand side). This is most probably due to the effect of the 'cones of confusion', causing degradation of localization precision when sound sources are placed directly on the corresponding plane axis. Still, by subsequently positioning the object at -60° (right hand side), possible confusions were avoided and the final decision was reached easily.

Figure 44: Preliminary listening test horizontal localization - table with the results from 18 listeners. 100 = the direction changes towards azimuth 180°, 0 = the direction changes towards azimuth 0°. The green highlighting indicates that increasing the ITD causes a change in the localization towards azimuth 180°. The yellow highlighting indicates the opposite.

5.4 Vertical localization

In order to test the effect of frequency scaling on the elevation perception, a new Unity scene was created and three different conditions were offered. One was the unaltered generic HRTF, the second was the generic HRTF with frequency scaling by a factor of 1.15 applied on the upper loudspeaker ring, and the third was the generic HRTF with frequency scaling by a factor of 0.85 applied on the upper loudspeaker ring. The energy change caused by the IR resampling was intentionally not compensated, to account for the natural level drop with increasing sound source height.
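Since the text mentions IR resampling, the following minimal sketch shows frequency scaling implemented that way; the use of rat and resample and the zero-padding back to the original length are assumptions for illustration (hrir is a hypothetical single-ear HRIR):

    % Minimal sketch (assumption: frequency scaling via IR resampling).
    % A factor of 1.15 compresses the IR in time, shifting spectral features
    % up in frequency by roughly 15% when played back at the original rate.
    scaleFactor = 1.15;
    [p, q] = rat(scaleFactor, 1e-4);       % rational approximation of the factor
    hrirScaled = resample(hrir(:), q, p);  % shorter IR -> spectrum shifted up

    % Pad or truncate back to the original length for the renderer.
    n = numel(hrir);
    hrirScaled(end+1:n) = 0;
    hrirScaled = hrirScaled(1:n);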

Figure 45: IIS generic HRTF - left ear. Blue - azimuth 90°, elevation 0°; red - azimuth 90°, elevation 39°

Figure 46: IIS generic HRTF - right ear. Blue - azimuth 90°, elevation 0°; red - azimuth 90°, elevation 39°

Figure 47: IIS generic HRTF - frequency scaling, left ear, azimuth 90°, elevation 30°. Blue - original spectrum, yellow - frequency scaling by factor 1.15, red - frequency scaling by factor 0.85

Figure 48: IIS generic HRTF - frequency scaling, right ear, azimuth 90°, elevation 30°. Blue - original spectrum, yellow - frequency scaling by factor 1.15, red - frequency scaling by factor 0.85

The VR implementation of the test in Unity was based on a static elevated object placed at azimuth 90° (left hand side), elevation 30°. This direction was chosen to avoid possible 'cones of confusion' effects and to enable easier judgment of the perceptual elevation effect. To account for the inevitable timbral colorations with the different conditions, four different sound sources/objects were offered, so that perceptual evaluation with different stimulus types was possible. The listeners could switch freely between those sources via the Xbox controller. At the end, they had to assign 100 points to the condition sounding most elevated and give 0 points to the condition sounding least elevated. Figure 49 shows the results of the test from all subjects.

It was rather surprising to discover a very clear consensus among all subjects, with one exception, showing that frequency scaling by a factor of 1.15 caused a higher elevation perception and frequency scaling by a factor of 0.85 caused a lower elevation perception than the unaltered generic HRTF. Moreover, many of the subjects reported that the task was rather easy to accomplish. It is to be noted that, while all listeners heard the timbral changes, none of them reported unusual artifacts or pitch changes with any particular condition.

Figure 49: Preliminary listening test vertical localization - table with the results from 18 listeners. 100 = most elevated sounding, 0 = least elevated sounding. The green highlighting indicates that frequency scaling with factor 1.15 caused a higher elevation perception than the original and scaling with factor 0.85 caused a lower elevation perception than the original. The yellow highlighting indicates the opposite.

5.5 Data analysis and interpretation

To summarize the results of the preliminary tests, it was confirmed that the tested technical parameters significantly influenced the questioned perceptual criteria in the context of a VR environment and audio-visual stimuli.

The most confident results were acquired for the criteria horizontal localization and vertical localization, where the technical borders of the manipulation influenced the extremes on the perceptual side with small variance between the subjects. Because these two trials were described by many subjects as easy, it was decided to slightly reduce the parameter change values for the ITD scaling and the spectral frequency scaling, in order to maintain better statistical coverage and provide potentially more precise options in the following selection test. ITD scaling with factors 1.15 and 0.85, and spectral frequency scaling with factors 1.12 and 0.88, were chosen.

For the criterion externalization, the HRTF manipulation also proved to have a significant perceptual effect, but there was less obvious consensus among the subjects about the direction of the effect change. Nevertheless, the presence of a notch at 8400 Hz proved to influence the externalization effect even with different DFEQs.

The exact notch frequency position varies between people, and because it was already decided to split front-back differentiation and externalization into two different selection trials, notch frequency selection is applied in the front-back differentiation trial. Spectral dynamics scaling is applied in the externalization trial, covering all three possibly selected notch frequencies - 7000 Hz, 8400 Hz and 9800 Hz.

For the criterion sound coloration, the spectral dynamics scaling proved to have an impact on the perceptual effect, but there were many different interpretations among the subjects about the direction of the effect change. This means that an influence of a spectral dynamics change on the sound coloration is to be expected, but with an effect that is difficult to predict.

In the following charts (see fig. 50) the averaged results are presented as an overview of the preliminary tests. Because the tests were aimed strictly at identifying perceptual effects and not listening experience, the point bars shown are not to be confused with a preference ranking. They only represent an average trend of what perceptual effect could be expected from a particular HRTF manipulation concerning an isolated criterion.

Figure 50: Average results from the preliminary listening tests (100 points = corresponds fully to the criterion). From left to right: 1. Sound coloration (17 listeners) → Which one sounds most natural? (100 = most natural, 0 = least natural) 2. Externalization (18 listeners) → Which one of the three conditions sounds most externalized? (100 = most externalized, 0 = least externalized) 3. Horizontal localization (18 listeners) → Which one tends to sound towards azimuth 180° and which one tends to sound towards azimuth 0°? (100 = tends towards azimuth 180°, 0 = tends towards azimuth 0°) 4. Vertical localization (18 listeners) → Which one sounds most elevated? (100 = most elevated, 0 = least elevated)

6 Summary of the selected QF-QE pairs. Selection design.

A total of five perceptual quality features and five physical quality elements were selected for further implementation.

- DFEQ → Sound coloration

- Main notch frequency position → Front-back differentiation

- Spectral dynamics scaling → Externalization

- ITD scaling → Horizontal localization

- Spectral frequency scaling → Vertical localization

Throughout several preliminary tests inside the implemented VR test-design, the manipulated technical parameters proved to have an impact on particular isolated criteria, perceivable by the subjects in the VR environment. All tests went without accidents or other issues concerning the health and well-being of the subjects.

It was confirmed that there are multivariate relations between the criteria sound coloration and externalization that need to be considered when designing the HRTF selection procedure.

The numerically synthesized average HRTF from the IIS database was selected as the base input for the selection procedure, because it approximates an individual HRTF (from the IIS database) more adequately and because it is more convenient for parametric manipulation.

Based on the observations so far, the following selection design was implemented.

Trial N.1 Sound coloration: The subject is presented with 4 pre-binauralized music tracks of different genres and must select the most pleasant sound coloration by switching between three different DFEQs with scaled magnitude responses (as shown in section 3.4.2). The three scaling factors are 1, 1.2 and 0.8. The listener is forced to try out all 4 music tracks. Head-tracking is not active.

Trial N.2 Front-back differentiation: The subject is presented with a dynamically jumping auditive object (male speech) with active head-tracking. The auditive object jumps rather quickly between two identical static visual objects. At the position of each visual object, a short sentence is recited while the auditive object is stationary. The two visual objects are represented by loudspeakers. One is placed at azimuth 0°, elevation 0°, the other at azimuth 180°, elevation 0°. The subject must switch between three conditions to select between three different main notch positions - 7000 Hz, 8400 Hz, 9800 Hz - and decide with which condition the perceivable direction of the audio object is easiest to identify as coming from the front or the back loudspeaker.

Trial N.3 Externalization: The subject is presented with a dynamically moving audio-visual object (talking male head) with active head-tracking. The auditive object moves rather slowly and oscillates between azimuth 30°, elevation 0° and azimuth -30°, elevation 0°. The subject must switch between the conditions to select one of three different spectral scaling factors - 0.5, 1, 2. They must decide with which condition the perceivable externalization or distance to the audio-visual object feels most realistic. The listener is forced to look forward at a visual anchor element.

Trial N.4 Horizontal localization: The subject is presented with an audio-visual object (helicopter) with active head-tracking. They have to move the object to azimuth -90°, elevation 0° with the tactile controller, then switch between three conditions with differently scaled ITDs, with factors 1, 1.15 and 0.85. The listener must decide with which condition the perceivable horizontal localization of the object feels most realistic. They are instructed to look straight forward at a visual anchor element during selection. After experiencing the effect with the object placed at azimuth -90°, they are forced to move the object to azimuth -60° and azimuth -30° and repeat the process to make the final decision.

Trial N.5 Vertical localization: The subject is presented with a static audio-visual object that can be dynamically exchanged (helicopter, male speech, music playing over a loudspeaker, violin) with active head-tracking. The object is placed statically at azimuth 90°, elevation 30°. The subject must switch between three conditions with different frequency scalings, with factors 1, 1.12 and 0.88, and decide with which condition the perceivable vertical localization of the object feels most realistic. They are instructed to look forward at a visual anchor element during selection. They are forced to try out all 4 objects and repeat the process to make the final decision.

A decision made in a particular trial about a certain parameter is retained and equally applied to all conditions of the next trial. After completing the selection procedure, the final selected condition (also called the 'winner' HRTF for convenience) is validated in a second listening test after a short break.

To validate the selected winner HRTF, the same trials will be offered to the listener to compare the winner HRTF against the unaltered generic HRTF, the KU 100 and the individual HRTF. All conditions in the validation stage will be rated with a score between 0 and 100.

Finally, the scores from all subjects will be averaged over the isolated trials.

7 Selection and validation tests

7.1 Database preparation for the selection procedure

Because of the implementation effort involved in a test-design of this kind, some aspects of the rendering system needed to be taken into account.

All of the parameter manipulations were made off-line, so that the computational requirements were kept low enough to enable the test-design to run on a consumer platform. This means that a new HRTF database had to be created for the purpose of the selection process, containing all possible outcomes. These pre-made HRTFs incorporated all chosen parameter changes, whose values were motivated by actual statistical examples, covering all 243 possible selection outcomes (five trials with three conditions each, i.e. 3^5 combinations).

Because the current implementation of the audio renderer required strict file names in order to load the HRTFs, separate Unity projects in a nested structure were created to be able to quickly navigate from trial to trial.

To overcome this hierarchical challenge and keep track of the selection progress, a proprietary file naming scheme was necessary that contained a coded description of the particular HRTF manipulation as well as the individual selection path.
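The actual naming scheme is not documented here; purely as an illustration of the idea, a hypothetical encoding of one selection path could look like this:

    % Hypothetical naming sketch - the real proprietary scheme differs.
    dfeq = 1.2; notchHz = 8400; sd = 0.5; itd = 1.15; fscale = 1.12;
    name = sprintf('gen_DFEQ%.2f_NOTCH%d_SD%.2f_ITD%.2f_FS%.2f.sofa', ...
                   dfeq, notchHz, sd, itd, fscale);
    % -> 'gen_DFEQ1.20_NOTCH8400_SD0.50_ITD1.15_FS1.12.sofa'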

Figure 51: Selection scheme - 243 possible outcomes. Only the branches from starting point A are shown. The same structure was applied for B and C.

7.2 Selection test

Some short video examples from the selection procedure are to be found on the associated USB stick.

Figure 52: Example protocol from one selection procedure.

7.3 Validation test

The validation of the selected 'winner' HRTF from the selection stage was carried out as a direct comparison against the HRTF of the KU 100, the generic HRTF and, if available, the individual HRTF. The comparisons were based on the same VR scenes and criteria used during the selection stage. The only difference was that this time the conditions had to be rated between 0 and 100 according to the subjective preference for the current criterion.

Figure 53: Results from all 12 listeners with individual HRTFs.

As shown by the results of the 12 listeners whose comparison included their own HRTFs, the differences between the ratings are generally too small to recognize a clear trend. Only in trial 1 do the generic and 'winner' HRTFs have a significantly higher score than the individual HRTF and the KU 100, with non-overlapping confidence intervals in different ranges. In trial 2 all conditions have close ratings. It must be pointed out that the individual HRTF and the KU 100 score slightly better than the generic and 'winner' HRTFs, but still within the same confidence range. The scores in trial 3 are similar, only this time the KU 100 shows lower performance compared to the rest, with a confidence range roughly 20 points lower. In trial 4 we again see close results with a slight lead for the generic and 'winner' HRTFs. In trial 5, despite the close results, the KU 100 again performs worse than the generic and 'winner' HRTFs, with the generic HRTF slightly in the lead.

Figure 54: Results from all 4 listeners without individual HRTFs.

The validation test with the 4 listeners without individual HRTFs shows a similar picture. All condition ratings are fairly close; only in trials 1 and 5 does the KU 100 show significantly lower performance than the generic and 'winner' HRTFs.

Based on the statistical evaluation of the results from the listening tests, the conclusion can be made that the 'winner' HRTF did not achieve a significant perceptual improvement over the generic HRTF in the isolated trials. The proposed selection method, as designed, failed to deliver the expected results.

One thing to note is the performance of the generic HRTF, which significantly outperformed the KU 100 in trials 1 and 5.

8 Discussion and outlook

In the current study, several relations between HRTF parameters and isolated perceptual criteria were confirmed, which allowed for a targeted influence on those criteria.

Throughout a series of listening tests, multivariate interrelations between QEs and QFs were investigated and evaluated in a VR environment. With very few exceptions, the majority of the listeners were able to familiarize themselves with the proposed interface relatively quickly. No complaints concerning the experience or potentially related health issues were reported during or after the tests.

The VR implementation of the suggested test-design proved its feasibility, because the listeners were able to distinguish between the conditions in combination with visual stimuli in an interactive environment. Many listeners reported that the visual element helped in making a decision.

The validation of the selection procedure has not shown a statistically significant improvement of the perceptual quality with respect to the five isolated perceptual criteria. It seems that the multivariate relations between the QFs and QEs were not handled effectively enough by the selection design, so that a particular decision on a given criterion caused side effects on other criteria evaluated at an earlier or later stage. Nevertheless, the majority of listeners were able to identify and express their personal preference at a particular selection stage, and for the most part the judgments were made confidently.

Because of renderer limitations, it was not possible to validate the performance of the conditions in the context of a complex audio-visual scene (with many spatial objects simultaneously), so no clear statement about the overall listening experience with the different conditions was possible based on the evaluation of the isolated criteria. It must also be pointed out that the validation stage brought a new dimension by offering conditions differing in many parameters simultaneously. That could have affected the listeners' judgment when transitioning from the selection to the validation stage, which were separated by a short pause.

Some insights during the investigation brought up further interesting aspects that need to be discussed. The individual HRTF, generally regarded as a reference in many studies, was rated surprisingly low in many of the tests compared to the alternatives. The reasons for this outcome need to be investigated further. It could be assumed that the ear asymmetry of the individual HRTFs probably contributed to the low score in the sound coloration trial.

One other possible factor could be that the perceptual impact of the particular VR context has its own environmental dimensions and causes a different spatial awareness than a real natural environment. This would mean that occurrences perceived as realistic in a real environment could be perceived as less realistic in a VR environment, given the same perceptual model.

Another noteworthy point is that the numerically synthesized generic HRTF, along with its derivatives, performed consistently well in many perceptual aspects. One explanation would be that, because of its comparably smoother frequency response, it sounded timbrally more natural to the majority of listeners regardless of the particular questioned criterion. Another possible reason could be the dynamic spatial properties of this HRTF. Understandably, it possesses higher uniformity compared to the other alternatives. Because of the averaging, the transitions from the TF of one direction to another are generally smooth and monotonic. This could have played a role in the quality of the dynamic sound coloration effects during head or object movements.

Considering the gathered observations about the perceptual expectations in a VR environment, the subjective preferences concerning the auditive experience, and the complexity involved, it is debatable whether a single HRTF can meet all perceptual criteria equally well.

Figure 55 illustrates the complexity of the multivariate relations between the perceptual criteria (blue area) and the physical parameters (orange area). Throughout this study, some of those relations were obvious (bold arrow connections), others can be assumed (normal arrow connections) based on various reports from the listeners, and others are still to be tested (dotted arrow connections).

Figure 55: Multivariate relations between QFs and QEs in the context of HRTF evaluation as examined in the current study. Figure design inspired by Dr.-Ing. Andreas Silzle [Silzle 2007]

9 References

AES (2015). AES standard for file exchange - Spatial acoustic data file format. Audio Engineering Society, Standard AES69-2015.
Avan, Paul, Fabrice Giraudet, and Béla Büki (2015). "Importance of binaural hearing". In: Audiology and Neurotology 20.Suppl. 1, pp. 3–6.
Bamodu, Oluleke and Xu Ming Ye (2013). "Virtual reality and virtual reality system components". In: Advanced Materials Research. Vol. 765. Trans Tech Publ, pp. 1169–1172.
Blauert, Jens (1997). Spatial hearing: the psychophysics of human sound localization. MIT Press.
Blauert, J et al. (1998). "Der AUDIS-Katalog menschlicher Außenohrübertragungsfunktionen (The AUDIS-catalogue of human head-related transfer functions)". In: Fortschritte der Akustik - DAGA'98, pp. 174–175.
Bomhardt, Ramona and Janina Fels (2017). "The influence of symmetrical human ears on the front-back confusion". In: Audio Engineering Society Convention 142. Audio Engineering Society.
Boren, Braxton B et al. (2014). "PHOnA: a public dataset of measured headphone transfer functions". In: Audio Engineering Society Convention 137. Audio Engineering Society.
Boren, Braxton and Agnieszka Roginska (2011). "The Effects of Headphones on Listener HRTF Preference". In: Audio Engineering Society Convention 131. Audio Engineering Society.
Busson, Sylvain, Rozenn Nicol, and Brian Katz (2005). "Subjective investigations of the interaural time difference in the horizontal plane". In: 118th Audio Engineering Society Convention.
Christensen, Anders T et al. (2013). "Magnitude and phase response measurement of headphones at the eardrum". In: Audio Engineering Society Conference: 51st International Conference: Loudspeakers and Headphones. Audio Engineering Society.
Clarke, Jade Raine and Hyunkook Lee (2017). "The Effects of Decreasing the Magnitude of Elevation-Dependent Notches in HRTFs on Median Plane Localization". In: Audio Engineering Society Convention 142. Audio Engineering Society.
Cline, Mychilo Stephenson (2005). Power, madness, and immortality: The future of virtual reality. Mychilo Cline.
Estrella, Jorgos (2011). "Real time individualization of interaural time differences for dynamic binaural synthesis". In:
Fan, Ziqi, Yunhao Wan, and Kyla McMullen (2016). "Quantitatively Validating Subjectively Selected HRTFs for Elevation and Front-Back Distinction". In: International Community on Auditory Display.
Fleischmann, Felix, Andreas Silzle, and Jan Plogsties (2012). "Identification and evaluation of target curves for headphones". In: Audio Engineering Society Convention 133. Audio Engineering Society.
Gelfand, Stanley A (2016). Hearing: An introduction to psychological and physiological acoustics. CRC Press.
Ghorbal, Slim et al. (2017). "Pinna morphological parameters influencing HRTF sets". In:

Grasser, Thomas, Martin Rothbucher, and Klaus Diepold (2014). Auswahlverfahren für HRTFs zur 3D Sound Synthese. Tech. rep. Lehrstuhl für Datenverarbeitung.
Grijalva, Felipe (2016). A Manifold Learning Approach for Personalizing HRTFs from Anthropometric Features. IEEE/ACM.
Hoffmann, Pablo F, Flemming Christensen, and Dorte Hammershøi (2013). "Insert earphone calibration for hear-through options". In: Audio Engineering Society Conference: 51st International Conference: Loudspeakers and Headphones. Audio Engineering Society.
Hsieh, Min-Chai and Hao-Chiang Koong Lin (2011). "A conceptual study for augmented reality e-learning system based on usability evaluation". In: J. Communications in Information Science and Management Engineering 1.8, pp. 5–8.
Huttunen, Tomi and Antti Vanne (2017). "End-To-End Process for HRTF Personalization". In: Audio Engineering Society Convention 142. Audio Engineering Society.
Iida, Kazuhiro et al. (2007). "Median plane localization using a parametric model of the head-related transfer function based on spectral cues". In: Applied Acoustics 68.8, pp. 835–850.
Jekosch, Ute (2004). "Basic Concepts and Terms of 'Quality', Reconsidered in the Context of Product-Sound Quality". In: acta acustica united with Acustica 90.6, pp. 999–1006.
Martín-Gutiérrez, Jorge, Carlos Efrén Mora, Antonio González-Marrero, and Beatriz Añorbe-Díaz (2017). "Virtual technologies trends in education". In: EURASIA Journal of Mathematics Science and Technology Education 13.2, pp. 469–486.
Juras, Jordan, Chris Miller, and Agnieszka Roginska (2015). "Modeling ITDs based on photographic head information". In: Audio Engineering Society Convention 139. Audio Engineering Society.
Kaneko, Shoken, Tsukasa Suenaga, and Satoshi Sekine (2016). "DeepEarNet: Individualizing Spatial Audio with Photography, Ear Shape Modeling, and Neural Networks". In: Audio Engineering Society Conference: 2016 AES International Conference on Audio for Virtual and Augmented Reality. Audio Engineering Society.
Klein, Florian et al. (2017). "Training on the Acoustical Identification of the Listening Position in a Virtual Environment". In: Audio Engineering Society Convention 143. Audio Engineering Society.
Langendijk, Erno HA and Adelbert W Bronkhorst (2002). "Contribution of spectral cues to human sound localization". In: The Journal of the Acoustical Society of America 112.4, pp. 1583–1596.
LaValle, Steven M. (2017). Virtual Reality. Copyright Steven M. LaValle.
Lorho, Gaëtan (2009). "Subjective evaluation of headphone target frequency responses". In: Audio Engineering Society Convention 126. Audio Engineering Society.
Mayenfels, Thomas (2015). "Equity research virtual and augmented reality". In:
Merimaa, Juha (2010). "Modification of HRTF Filters to Reduce Timbral Effects in Binaural Synthesis, Part 2: Individual HRTFs". In: Audio Engineering Society Convention 129. Audio Engineering Society.
Middlebrooks, John C (1999). "Virtual localization improved by scaling nonindividualized external-ear transfer functions in frequency". In: The Journal of the Acoustical Society of America 106.3, pp. 1493–1510.

Milgram, Paul and Fumio Kishino (1994). "A taxonomy of mixed reality visual displays". In: IEICE TRANSACTIONS on Information and Systems 77.12, pp. 1321–1329.
Monge, Janet M (2011). "Ear Photographs: Examination and Forensic Potential". In: Wiley Encyclopedia of Forensic Science.
Moore, Brian CJ (2012). An introduction to the psychology of hearing. Brill.
Oculus, Team (2017). Oculus Best Practices, Version 310-30000-02. 2017 Oculus VR.
Olive, Sean, Todd Welti, and Omid Khonsaripour (2017a). "A Statistical Model that Predicts Listeners' Preference Ratings of In-Ear Headphones: Part 1 - Listening Test Results and Acoustic Measurements". In: Audio Engineering Society Convention 143. Audio Engineering Society.
— (2017b). "A Statistical Model that Predicts Listeners' Preference Ratings of In-Ear Headphones: Part 2 - Development and Validation of the Model". In: Audio Engineering Society Convention 143. Audio Engineering Society.
Olko, Marta et al. (2017). "Identification of Perceived Sound Quality Attributes of 360º Audiovisual Recordings in VR Using a Free Verbalization Method". In: Audio Engineering Society Convention 143. Audio Engineering Society.
Pike, Cleopatra and Hanne Stenzel (2017). "Direct and Indirect Listening Test Methods - A Discussion Based on Audio-Visual Spatial Coherence Experiments". In: Audio Engineering Society Convention 143. Audio Engineering Society.
Purkait, Ruma (2016). "External ear: An analysis of its uniqueness". In: Egyptian Journal of Forensic Sciences 6.2, pp. 99–107.
Raykar, Vikas C, Ramani Duraiswami, and B Yegnanarayana (2005). "Extracting the frequencies of the pinna spectral notches in measured head related impulse responses". In: The Journal of the Acoustical Society of America 118.1, pp. 364–374.
Reardon, Gregory et al. (2017). "Evaluation of Binaural Renderers: A Methodology". In: Audio Engineering Society Convention 143. Audio Engineering Society.
Reichinger, Andreas et al. (2013). "Evaluation of methods for optical 3-D scanning of human pinnas". In: 3DTV-Conference, 2013 International Conference on. IEEE, pp. 390–397.
Rumsey, Francis (2016). "Virtual Reality: Mixing, Rendering, Believability". In: Journal of the Audio Engineering Society 64.12, pp. 1073–1077.
Sachs, Goldman (2016). "Equity research virtual and augmented reality". In:
Salmon, François et al. (2017). "Optimization of Interactive Binaural Processing". In: Audio Engineering Society Convention 143. Audio Engineering Society.
Schärer, Zora and Alexander Lindau (2009). "Evaluation of equalization methods for binaural signals". In: Audio Engineering Society Convention 126. Audio Engineering Society.
Schönstein, David and Brian Katz (2010). HRTF selection for binaural synthesis from a database using morphological parameters. Proceedings of the 20th International Congress on Acoustics, ICA 2010.
Seeber, Bernhard U and Hugo Fastl (2003). "Subjective selection of non-individual head-related transfer functions". In: Georgia Institute of Technology.

Sexton, Connor (2017). "Immersive Audio: Optimizing Creative Impact without Increasing Production Costs". In: Audio Engineering Society Convention 143. Audio Engineering Society.
Sherman, William R and Alan B Craig (2003). "Understanding Virtual Reality - Interface, Application, and Design". In: Presence 12.4, pp. 441–442.
Silzle, Andreas, et al. (2009). Vision and Technique behind the New Studios and Listening Rooms of the Fraunhofer IIS Audio Laboratory. Audio Engineering Society, Convention paper 7672.
Silzle, Andreas (2007). Generation of Quality Taxonomies for Auditory Virtual Environments by Means of Systematic Expert Survey. Shaker, 2008.
Simon, Laurent, Nick Zacharov, and Brian Katz (2016). Perceptual attributes for the comparison of head-related transfer functions. J. Acoust. Soc. Am. 140 (5), November 2016.
Spagnol, Simone and Federico Avanzini (2015). "Frequency estimation of the first pinna notch in head-related transfer functions with a linear anthropometric model". In: Proceedings of the 18th International Conference on Digital Audio Effects (DAFx-2015), pp. 231–236.
Spagnol, Simone, Michele Geronazzo, and Federico Avanzini (2010). "Structural modeling of pinna-related transfer functions". In: Proc. Int. Conf. on Sound and Music Computing (SMC 2010). Vol. 34.
Sridhar, Rahulram, Joseph G Tylka, and Edgar Choueiri (2017). "A Database of Head-Related Transfer Functions and Morphological Measurements". In: Audio Engineering Society Convention 143. Audio Engineering Society.
Struck, Christopher and Steve Temme (2015). "Headphone Response: Target Equalization Trade-offs and Limitations". In: Audio Engineering Society Convention 139. Audio Engineering Society.
Wightman, Frederic L and Doris J Kistler (1992). "The dominant role of low-frequency interaural time differences in sound localization". In: The Journal of the Acoustical Society of America 91.3, pp. 1648–1661.
Xie, Bosun (2013). Head-related transfer function and virtual auditory display. J. Ross Publishing.
Yairi, Satoshi, Yukio Iwaya, and Yôiti Suzuki (2008). "Individualization feature of head-related transfer functions based on subjective evaluation". In: Proc. of International Conference on Auditory Display (ICAD2008), Paris.
Young, Kat, Tony Tew, and Gavin Kearney (2016). "Boundary element method modelling of KEMAR for binaural rendering: Mesh production and validation". In:
Zhong, Xiaoli, Jie Zhang, and Guangzheng Yu (2015). "Recalibration of Virtual Sound Localization Using Audiovisual Interactive Training". In: Audio Engineering Society Convention 139. Audio Engineering Society.
Ziegelwanger, Harald, Wolfgang Kreuzer, and Piotr Majdak (2016). "A priori mesh grading for the numerical calculation of the head-related transfer functions". In: Applied Acoustics 114, pp. 99–110.

10 List of Figures

1  Ear anatomy. Source: Blausen.com staff (2014). "Medical gallery of Blausen Medical 2014". WikiJournal of Medicine 1 (2). DOI: 10.15347/wjm/2014.010. ISSN 2002-4436 ... 4
2  Left - polar coordinates, top view (horizontal plane). Right - polar coordinates, side view (median plane) ... 5
3  Simplified representation of a 'virtuality continuum'. Source: Giovanni Vincenti ... 7
4  Example of visual artifacts in a typical HMD. Top - around 70 megapixels per eye. Bottom - around 1.2 megapixels per eye. Source: http://www.varjo.com/ ... 9
5  The input/output loop in a typical VR system. Source: Shmuel Csaba Otto Traian ... 10
6  Typical block diagram of acoustic HRTF measurements ... 11
7  Reference listening room (ITU-R BS.1116) "Mozart" - Fraunhofer IIS, Erlangen (Germany) ... 17
8  Subject 1 - measurement setup ... 18
9  Subject 1 with in-ear microphones ... 18
10  First room reflection ... 19
11  IIS database, DFEQ of subject 67 with two scaled versions. Blue - original DFEQ, yellow - DFEQ scaled by factor 0.8, red - DFEQ scaled by factor 1.2 ... 22
12  Raw ITDs from all 67 subjects of the IIS database - all LS at elevation 0° ... 24
13  Raw ITDs from all 67 subjects of the IIS database - all LS at elevation 39° ... 25
14  ITDs sorted from min to max from all 67 subjects of the IIS database - all LS at elevation 0° ... 25
15  ITDs sorted from min to max from all 67 subjects of the IIS database - all LS at elevation 39° ... 25
16  ITDs of all subjects - ARI, all LS at elevation 0° ... 26
17  ITDs of all subjects - IRCAM, all LS at elevation 0° ... 26
18  IIS - 67 subjects, left ears, azimuth 0°, elevation 0° ... 27
19  IIS - 67 subjects, right ears, azimuth 0°, elevation 0° ... 27
20  IIS - 67 subjects, left ears, azimuth 180°, elevation 0° ... 27
21  IIS - 67 subjects, right ears, azimuth 180°, elevation 0° ... 27
22  IIS database - Subject 19, spectral plot of the left and right ears over all LS at elevation 0°. The horizontal black line shows the main notch frequency for all directions. Note the differences between directions 0° and 180° ... 28
23  Histogram of the main notches from all left ears of the IIS, ARI, IRCAM and CIPIC databases at azimuth 0°, elevation 0° (304 samples, 76 bins) ... 29
24  Histogram of the main notches from all right ears of the IIS, ARI, IRCAM and CIPIC databases at azimuth 0°, elevation 0° (304 samples, 76 bins) ... 29
25  Histogram of the main notches from all left and right ears combined of the IIS, ARI, IRCAM and CIPIC databases at azimuth 0°, elevation 0° (608 samples, 76 bins) ... 30

26  IIS - mobile VR workstation consisting of a Windows PC, a Mac Pro and a PC laptop. Unity VR rendering to Oculus Rift via NVIDIA GTX 1080. Immersive triple-display configuration with a 4K KVM switch. RME Madiface XT, RME ADI-2 Pro, Avid Artist Mix. Dual headphone outputs with independent volume control ... 32
27  Xbox One controller used by the subjects as dynamic tactile input in the VR environment. The D-pad horizontal axis controls the sound level, the D-pad vertical axis controls the ratings between the conditions (HRTFs). The 'Task' button shows/hides the trial description. X, A, B, Y on the right were used to switch between the conditions. The bumper and trigger on the front side were dynamically programmed for each trial ... 33
28  Screenshot showing the start screen of the trial 'horizontal localization'. On pressing 'Task' on the controller the description disappears and the trial begins. A second press activates pause and brings back the description ... 34
29  Screenshot from the actual selection. The red cross indicates the deviation of the viewing direction from the green bar. At the top left some ratings are visible on the sliders. The currently selected condition is B with 57 points ... 34
30  IIS - Subject 1, spectral dynamics scaling - left ear, azimuth 30°, elevation 0°. Blue - original magnitude, red - scaling factor 0.3, yellow - scaling factor 3 ... 35
31  IIS - Subject 1, spectral dynamics scaling - right ear, azimuth 330°, elevation 0°. Blue - original magnitude, red - scaling factor 0.3, yellow - scaling factor 3 ... 35
32  Preliminary listening test on sound coloration - table with the results from 17 listeners. 100 = most natural, 0 = least natural. Green highlighting indicates that the reduced spectral dynamics was rated as more natural sounding than the increased spectral dynamics; yellow highlighting indicates the opposite. Listeners having their own HRTFs as input for the manipulation are marked in red; blue highlighting indicates the dummy-head HRTF ... 37
33  IIS - generic HRTF, right ear - azimuth 0°, elevation 0°. Red - pure generic, yellow - generic with notch, blue - generic with scaled DFEQ and notch ... 39
34  IIS - generic HRTF, right ear - azimuth 180°, elevation 0°. Red - pure generic, yellow - generic with notch, blue - generic with scaled DFEQ and notch ... 39
35  IIS - generic HRTF, left ear at elevation 0° ... 39
36  IIS - generic HRTF, right ear at elevation 0° ... 39
37  IIS - generic HRTF with notch, left ear at elevation 0° ... 40
38  IIS - generic HRTF with notch, right ear at elevation 0° ... 40
39  IIS - generic HRTF with scaled DFEQ and notch, left ear at elevation 0° ... 40
40  IIS - generic HRTF with scaled DFEQ and notch, right ear at elevation 0° ... 40

41  Preliminary listening test on externalization - table with the results from 18 listeners. 100 = most externalized, 0 = least externalized. Green highlighting indicates that the generic HRTF with DFEQ and notch scaling was rated as more externalized sounding than the pure generic; yellow highlighting indicates the opposite ... 42
42  IIS generic HRTF - ITD scaling, middle ring. Blue - original ITD, yellow - ITD scaled by factor 1.2, red - ITD scaled by factor 0.8 ... 43
43  IIS generic HRTF - ITD scaling, upper ring. Blue - original ITD, yellow - ITD scaled by factor 1.2, red - ITD scaled by factor 0.8 ... 43
44  Preliminary listening test on horizontal localization - table with the results from 18 listeners. 100 = the direction changes towards azimuth 180°, 0 = the direction changes towards azimuth 0°. Green highlighting indicates that increasing the ITD causes a change in the localization towards azimuth 180°; yellow highlighting indicates the opposite ... 44
45  IIS generic HRTF - left ear. Blue - azimuth 90°, elevation 0°; red - azimuth 90°, elevation 39° ... 45
46  IIS generic HRTF - right ear. Blue - azimuth 90°, elevation 0°; red - azimuth 90°, elevation 39° ... 45
47  IIS generic HRTF - frequency scaling, left ear, azimuth 90°, elevation 30°. Blue - original spectrum, yellow - frequency scaling by factor 1.15, red - frequency scaling by factor 0.85 ... 45
48  IIS generic HRTF - frequency scaling, right ear, azimuth 90°, elevation 30°. Blue - original spectrum, yellow - frequency scaling by factor 1.15, red - frequency scaling by factor 0.85 ... 45
49  Preliminary listening test on vertical localization - table with the results from 18 listeners. 100 = most elevated sounding, 0 = least elevated sounding. Green highlighting indicates that frequency scaling by factor 1.15 caused a higher elevation perception than the original and scaling by factor 0.85 caused a lower elevation perception than the original; yellow highlighting indicates the opposite ... 47
52  Example protocol from one selection procedure ... 53
55  Multivariate relations between QFs and QEs in the context of HRTF evaluation as examined in the current study. Figure design inspired by Dr.-Ing. Andreas Silzle [Silzle 2007] ... 58

11 Declaration of authorship / Eidesstattliche Erklärung

Declaration of authorship

I hereby certify that this thesis has been composed by me and is based on my own work under the supervision of Dipl.-Ing. Felix Fleischmann. No other person’s work has been used without due acknowledgment in this thesis. All references and verbatim extracts have been quoted, and all sources of information, including graphs and data sets, have been specifically acknowledged.

Eidesstattliche Erklärung

Hiermit erkläre ich, dass ich die vorliegende Arbeit selbstständig unter der Betreuung durch Dipl.-Ing. Felix Fleischmann angefertigt habe. Sämtliche Stellen der Arbeit, die im Wortlaut oder dem Sinn nach anderen gedruckten oder im Internet verfügbaren Werken entnommen sind, habe ich durch genaue Quellenangaben kenntlich gemacht.

Vensan Mazmanyan
Erlangen, 13. November 2017
