Interactive Design for Perceptually Motivated HRTF Selection
Filmuniversität Babelsberg KONRAD WOLF

Interactive design for perceptually motivated HRTF selection
(German title: Interaktives Design für wahrnehmungsmotivierte HRTF Auswahl)

Master thesis by Vensan Mazmanyan
November 2017

Submitted by: Vensan Mazmanyan
Degree program: Master Sound for picture
Student ID number: 5353
Supervisor: Dipl.-Ing. Felix Fleischmann
Reviewer: Prof. Dr.-Ing. Klaus Hobohm
Appendix: USB stick with video examples

Acknowledgments

The following work was accomplished at the Fraunhofer Institute for Integrated Circuits in Erlangen, Germany. I am very grateful to Dipl.-Ing. Felix Fleischmann for engaging with the idea of my thesis and for his invaluable support. I would also like to especially thank Dipl.-Ing. Jan Plogsties for his help during the preliminary research prior to formulating the thesis and for his essential advice afterwards. Special thanks also to Dipl.-Tonmeister Ulli Scuda for all the opportunities and resources provided for the implementation of the test design. My warmest thanks to all who have supported me with advice and feedback during my research and to all who have participated in the listening tests.

Vensan Mazmanyan

List of abbreviations

3D - Three-dimensional
6DoF - Six degrees of freedom
API - Application programming interface
AR - Augmented reality
ARI - Acoustics Research Institute
BEM - Boundary element method
BRIR - Binaural room impulse response
CIPIC - Center for Image Processing and Integrated Computing
DIRAC - Short name for the Dirac impulse or Dirac function, named after the English theoretical physicist Paul Dirac
DFEQ - Diffuse-field equalization
DSP - Digital signal processing
EQ - Equalizer
FIR - Finite impulse response
GPU - Graphics processing unit
GUI - Graphical user interface
HMD - Head-mounted display (e.g. Oculus Rift, HTC Vive etc.)
HPIR - Headphone impulse response
HRIR - Head-related impulse response
HRTF - Head-related transfer function
IIS - Institut für Integrierte Schaltungen (Institute for Integrated Circuits)
ILD - Interaural level difference
IR - Impulse response
IRCAM - Institut de Recherche et Coordination Acoustique/Musique
ISD - Interaural spectral differences
ITD - Interaural time difference
ITU-R - International Telecommunication Union, Radiocommunication Sector
LS - Loudspeaker
LTI - Linear time-invariant (system)
MLS - Maximum length sequence
MUSHRA - Multi-Stimulus Test with Hidden Reference and Anchor
PRTF - Pinna-related transfer function
QE - Quality element (in the physical domain) [Jekosch 2004]
QF - Quality feature (in the perceptual domain) [Jekosch 2004]
QoE - Quality of Experience [Jekosch 2004]
QoS - Quality of system or service [Jekosch 2004]
SOFA - Spatially Oriented Format for Acoustics
TF - Transfer function
VR - Virtual reality

Abstract (Zusammenfassung)

The development of immersive media is drawing increased attention to the reproduction possibilities of spatial sound design. In the field of VR technology, binaural sound reproduction over headphones represents an important aspect of personalization. Creating individual HRTFs remains difficult even today and is possible only under special conditions. Because suitable HRTFs are decisive for the overall perceived quality of binaural content, various solutions are being investigated for approximating the individual HRTF or adapting a different HRTF. With the development of new exchange formats for HRTFs and their support by various newly developed binaural systems, it becomes possible to examine existing HRTF databases and to evaluate them perceptually.

This motivated the author to investigate the effects and the interaction of specific technical HRTF parameters and perceptual criteria. As part of the research, a database of individually measured HRTFs was created. It was analyzed together with other existing HRTF databases. Based on the results, listening tests were conducted to test perceptual criteria and the technical parameters responsible for them. Multivariate relationships between different criteria were also identified. A method for perceptually motivated HRTF selection was proposed and tested in a VR implementation in Unity. The results showed that the listeners were able to listen critically within a VR environment and to express their preference on the basis of isolated criteria and parameters. The statistical evaluation showed that the selected HRTFs brought no significant improvement over individual HRTFs, the HRTF of a dummy head and a numerically averaged HRTF. However, validating the overall quality in the context of a complex reproduction was not possible, because the rendering was limited to one dynamic object.

Contents

1 Introduction
2 Fundamentals
  2.1 Binaural hearing
  2.2 VR
  2.3 HRTF acquisition methods
  2.4 HRTF selection methods
    2.4.1 Seeber-Fastl method
    2.4.2 DOMISO method
3 Proposed HRTF selection method
  3.1 General considerations
  3.2 Proposed method overview
  3.3 BRIR measurements
    3.3.1 Measurement setup
    3.3.2 Postprocessing
    3.3.3 HRTF database preparation in SOFA format
  3.4 Selecting the most relevant perceptual criteria and isolation of the responsible technical parameters
    3.4.1 Perceptual criteria
    3.4.2 Possible relevant technical parameters
    3.4.3 Analysis of HRTF databases
    3.4.4 ITD analysis
    3.4.5 Main pinna notch analysis
4 Implementation of the VR-environment
  4.1 System overview
  4.2 VR-rendering
  4.3 Controller
  4.4 GUI
5 Preliminary listening tests
  5.1 Sound coloration and spectral dynamics
  5.2 Externalization, generic HRTF, DFEQ and main notch effect
  5.3 Horizontal localization
  5.4 Vertical localization
  5.5 Data analysis and interpretation
6 Summary of the selected QF-QE pairs. Selection design.
7 Selection and validation tests
  7.1 Database preparation for the selection procedure
  7.2 Selection test
  7.3 Validation test
8 Discussion and outlook
9 References
10 List of figures
11 Declaration of authorship / Eidesstattliche Erklärung

1 Introduction

The recent rise of consumer-oriented VR technologies and applications has drawn attention to personalized spatial audio rendering solutions. To enable a completely immersive experience for the end user, not only an immersive 360° visual environment but also a plausible three-dimensional (3D) audio listening experience over headphones is needed. This poses challenges in many aspects of content creation and rendering that need to be addressed [Rumsey 2016].

If we compare conventional multichannel audio reproduction over loudspeakers with spatial audio reproduction over headphones, one obvious difference has to be pointed out. In contrast to loudspeaker reproduction, headphone reproduction uses a binaural perceptual model that exists as an additional layer in the rendering engine. This model, often referred to by the generalized term HRTF, contains the so-called binaural cues, which carry information about the acoustic effects caused by the human head, torso and outer ear (as well as the room acoustics, if present) when they interact with an acoustic wavefront emitted by a sound source in three-dimensional space.
This information is stored as a set of transfer functions describing those effects for a given source position. When it is applied to an incoming signal, the human cognitive system reacts to the sound much as if the signal originated from a real acoustic source positioned in the natural surrounding space.

'Authenticity, in this context, means that the subjects at the receiving end do not sense a difference between their actual auditory events and those which they would have had at the recording position when the recording was made.' [Jens Blauert 1997]

Over the years, many researchers have tested and analyzed the binaural cues of human spatial hearing, studying the principles that enable us to localize the direction of a sound, to estimate its distance, to analyze the interaction between sounds and the reflective environment in which they are placed, and so on. Many questions concerning the psychophysics of human hearing remain unanswered and need further investigation, but one thing has become clear so far: the perceived binaural effect can differ significantly between individuals. This is due to a complex mix of factors that involves not only the purely physical properties of the human body but also, among others, personal listening experience and training, health condition, the type of sound source, headphone quality, rendering artifacts and signal parameters.

A highly regarded example from those studies is binaural rendering for a particular person using a perceptual model based on that person's actual anthropometric features, a model also known as an individual HRTF. Theoretically, this is the only way to achieve the most precise approximation of individual perception under natural listening conditions (without headphones).
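As an illustration of the statement earlier in this chapter that an HRTF set is applied to an incoming signal, the following minimal Python sketch filters a mono signal with one pair of head-related impulse responses (HRIRs, the time-domain counterpart of an HRTF pair) for a single source position. It is not the rendering engine used in this thesis. The function name render_binaural, the sampling rate and the toy HRIR values are hypothetical and chosen only so that the resulting ITD and ILD are easy to read off; NumPy is assumed to be available.

```python
import numpy as np


def render_binaural(mono, hrir_left, hrir_right):
    """Convolve a mono signal with a left/right HRIR pair.

    Returns an array of shape (2, len(mono) + len(hrir) - 1):
    row 0 is the left-ear signal, row 1 the right-ear signal.
    Both HRIRs are assumed to have the same length.
    """
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right])


fs = 48000  # sampling rate in Hz (assumed for this example)

# Toy HRIR pair, NOT measured data: a source on the listener's left.
# The right ear receives the sound 24 samples later and at half amplitude.
hrir_left = np.zeros(64)
hrir_left[0] = 1.0
hrir_right = np.zeros(64)
hrir_right[24] = 0.5

# A unit impulse as the "incoming signal", so the output equals the HRIRs.
mono = np.zeros(128)
mono[0] = 1.0

out = render_binaural(mono, hrir_left, hrir_right)

# Read the two most basic binaural cues back from the rendered signal.
onset_left = int(np.argmax(np.abs(out[0])))
onset_right = int(np.argmax(np.abs(out[1])))
itd_ms = (onset_right - onset_left) / fs * 1000.0
ild_db = 20.0 * np.log10(np.max(np.abs(out[0])) / np.max(np.abs(out[1])))

print(f"ITD: {itd_ms:.2f} ms, ILD: {ild_db:.2f} dB")  # ITD: 0.50 ms, ILD: 6.02 dB
```

With measured HRIRs the same two convolutions would be applied per source position; a real HRIR pair additionally carries the direction-dependent spectral shaping of the pinna, head and torso, which this toy pair deliberately omits.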
To date, the process of acquiring a personal HRTF remains quite expensive and time-consuming, and it can hardly be called user-friendly. This is why such procedures