Improving Broadcast Accessibility for Hard of Hearing Individuals Using Object-Based Audio Personalisation and Narrative Importance
Improving broadcast accessibility for hard of hearing individuals using object-based audio personalisation and narrative importance

Lauren Alison Ward

Supervisors: Dr. Ben Shirley, Prof. Bill Davies

Acoustics Research Centre
School of Computing, Science and Engineering
Doctorate of Philosophy in Acoustics and Audio Engineering
University of Salford
May 2020

"We're all born mad. Some remain so."
– Samuel Beckett, Waiting for Godot

This work is dedicated to the memory of Gordon 'Tere' Manns, who never let an idea pass him without interrogating it.

Abstract

Technological advances in broadcasting can be the impetus for advances in accessibility services. For the 11 million individuals in the United Kingdom with some degree of hearing loss, the advent of object-based broadcasting and its personalisation features has the potential to facilitate a transition towards more accessible broadcast audio.

Part I of this work conducts a systematic review of previous object-based accessibility research, identifying the personalisation of redundant non-speech objects as a potentially high-impact yet unexplored area of research. Guided by these findings, and the results of a survey of end-user needs, the specific research questions of this work are then developed as:

1. What is the relationship between redundant non-speech audio objects and broadcast speech intelligibility, for normal and hard of hearing listeners?

2. Can a system be designed which allows end-users to control the balance between audio objects for dramatic content, which is simple to use and preserves comprehension?

Part II of this work shows that the presence of redundant non-speech sounds improves speech recognition in noise for normal hearing listeners, even when the sound partially masks the speech.
Subsequent investigations show that this effect also exists within hard of hearing cohorts, and that the benefit yielded by non-speech sounds can be predicted by the severity of hearing loss in an individual's better hearing ear.

Part III translates these novel findings into practical broadcast accessibility technology, through the development of a new conceptual framework called 'Narrative Importance'. Based on this framework, production tools and an end-user interface are developed and deployed in a large-scale public trial. The results of this trial demonstrate that this new approach to accessible audio can deliver content which is more enjoyable, with reduced energetic masking of speech, whilst still maintaining the creative integrity and comprehension of the content.

Acknowledgements

Research is far from an individual achievement and there are so many friendly giants who have let me stand on their shoulders to get here; those who have contributed to this research in ways big and small, weird and wonderful.

First of all, to my fellow inhabitants of the fish bowl (and those who managed to escape). In particular, thank you Will, for always being willing to dig through R code with me, and Philippa, for being my speech intelligibility buddy and the greatest experiment day organiser the world has ever seen.

To the Acoustics Research Centre lab staff, past and present: for last-minute experiment participation, lending audiometers and long chats about stats over Oreos.

To the production staff who have let me pick their brains: Martyn Whitley, Eloise Whitmore, Howard Bargroff and Martyn Harries. Without you I would not know the narrative importance of the door in the 'Rover's Return'.

To the members of the S3A Project, thank you for adopting me and providing the foundations upon which much of this research has been built.

To BBC R&D Northlab, for being the most brilliant workplace and for providing an endless supply of happiness cake on the kitchen table.
In particular, my immense gratitude goes to the audio team. Working with you has turned the crazy, un-achievable goals I set for myself at the beginning of this PhD into a reality. Particular thanks go to Jon Francombe for creating the first version of the NI control, colloquially known as "the knob". Thank you for making the ideas we had into something real and tangible. To Matthew Paradis, for creating the second NI control and the producer experiment interface – for your tireless hours spent wrangling the WebAudio API, I am truly grateful. To Rick Hughes for the first NI metadata plug-in, and to Manos Chourdakis for his work on the second.

To my dear sister-in-law Katherine Tucker, for many hours spent transcribing IPA, and Catherine Robinson, for re-recording all the R-SPIN sentences.

Ginormous thanks go to Laura Russon, Rhys Davies, Gabriella Leon and the Casualty Post Production team. Your enthusiasm for trying new things made the A&E Audio trial not only possible, but a joy to be part of.

My gratitude goes to those who put their money behind this project: the Institute of Acoustics, EPSRC, ISCA, ACM, the IET, BBC R&D and, most of all, the General Sir John Monash Foundation. Without the Sir John Monash Foundation and the belief and guidance of Peter Binks and Judith Landsberg, none of this would have been possible.

To those who have edited and applied copious amounts of red pen to this work, I, and the subsequent readers of this thesis, thank you: Chris Baume, Chris Pike, Hannah Clawson, Cammi Motley, Matteo Torcoli, Bill Davies and Will Bailey. And most of all, to those who have critiqued this work cover to cover: Pamela Ward, Kate Tucker, Mike Armstrong and Ben Shirley.

And finally, those special few. My family, who instilled in me the curiosity and the wanderlust that led me to a PhD on the other side of the world and cheered me on throughout it. Mike Armstrong, for all the looong discussions and kitchen table editing sessions.
You attract interesting people into your orbit and I am proud to count myself among them. Ben Shirley – more colleague and friend than supervisor. It has been a joy to work with you, busk presentations with you and drink fine tequila with you. Thank you for rolling the dice on a random Australian, thank you for introducing me to everyone worth knowing and thank you for being an excellent human being. And to Matthew Tucker – my champion and the sane, steady presence in my frenetic brainstorm. We are better as us. From the bottom of my heart, thank you.

Declaration

I hereby declare that, except where specific reference is made to the work of others, the contents of this thesis are original and have not been submitted in whole or in part for consideration for any other degree or qualification in this, or any other, university. This dissertation contains fewer than 100,000 words including appendices. This thesis is my own work and contains nothing which is the outcome of work done in collaboration with others, except as specified in the following section, Collaborative work. Sections of collaborative work are also noted throughout the thesis text as footnotes at the beginning of the relevant sections of work. Sections of work in this thesis which had been published at the time of writing are noted in the List of Publications.

Lauren Alison Ward
May 2020

Collaborative work

• Chapter 3
  – In Section 3.4.4, coding of free-text responses was undertaken as part of the directed content analysis of the survey results. This was undertaken by two additional researchers to improve the reliability of the coding: Dr. Ben Shirley and Dr. Alex Wilson. All analysis of the codings was undertaken by the author.

• Chapter 7
  – The voice recordings for the R2SPIN in Section 7.4.1 were made by Catherine Robinson. All other work in developing the R2SPIN was completed by the author.
  – In Section 7.4.2, the phonetic transcription of the original and re-recorded R-SPIN sentences was undertaken by Katherine Tucker. The summaries of these transcriptions and all subsequent usage of the transcriptions were performed by the author.

• Chapter 8
  – The data in Sections 8.2.2 and 8.2.3 were collected by the author and a team of additional researchers: Dr. Lara Harris, Philippa Demonte, Zuza Podwinska, Dr. William Bailey, Georgia Shirley and Dr. Ben Shirley. All analysis of these data was completed by the author.

• Chapter 9
  – The initial NI control prototype, described in Section 9.3, was developed collaboratively by Dr. Ben Shirley, Dr. Jon Francombe and the author, as part of the S3A: Future Spatial Audio for the Home project. Dr. Jon Francombe coded the function and interface for this prototype and performed the NI metadata assignment to the audio objects in 'The Turning Forest' and the 'Protest' scene used in Section 9.4.
  – In Section 9.4, the focus groups were facilitated by Dr. Ben Shirley, with notes taken by Philippa Demonte. Coding of the data was undertaken by an additional researcher to improve the reliability of the coding: Dr. Ben Shirley. All analysis of the codings was undertaken by the author.

• Chapter 10
  – The initial version of the NI metadata assignment plug-in, described in Section 10.2.1, was developed as part of the S3A: Future Spatial Audio for the Home project. It was coded by Dr. Rick Hughes, with support from Dr. Jon Francombe and Dr. James Woodcock. All work resulting from the use of this plug-in described in Section 10.2 was completed by the author.
  – The interface for the online metadata assignment task in Section 10.3.1 was collaboratively designed by the author and Dr. Matthew Paradis, and implemented in the WebAudio API by Dr. Matthew Paradis. All other data collection and analysis undertaken in Section 10.3 was completed by the author.
• Chapter 11
  – The trial described in Chapter 11 was a collaboration between the author, the University of Salford, BBC R&D and BBC Studios' Casualty programme.