Audio Processing and Loudness Estimation Algorithms with Ios Simulations

Audio Processing and Loudness Estimation Algorithms with iOS Simulations by Girish Kalyanasundaram A Thesis Presented in Partial Fulfillment of the Requirements for the Degree Master of Science Approved September 2013 by the Graduate Supervisory Committee: Andreas Spanias, Chair Cihan Tepedelenlioglu Visar Berisha ARIZONA STATE UNIVERSITY December 2013 ABSTRACT The processing power and storage capacity of portable devices have improved considerably over the past decade. This has motivated the implementation of sophisticated audio and other signal processing algorithms on such mobile devices. Of particular interest in this thesis is audio/speech processing based on perceptual criteria. Specifically, estimation of parameters from human auditory models, such as auditory patterns and loudness, involves computationally intensive operations which can strain device resources. Hence, strategies for implementing computationally efficient human auditory models for loudness estimation have been studied in this thesis. Existing algorithms for reducing computations in auditory pattern and loudness estimation have been examined and improved algorithms have been proposed to overcome limitations of these methods. In addition, real-time applications such as perceptual loudness estimation and loudness equalization using auditory models have also been implemented. A software implementation of loudness estimation on iOS devices is also reported in this thesis. In addition to the loudness estimation algorithms and software, in this thesis project we also created new illustrations of speech and audio processing concepts for research and education. As a result, a new suite of speech/audio DSP functions was developed and integrated as part of the award-winning educational iOS App 'iJDSP.” These functions are described in detail in this thesis. Several enhancements in the architecture of the application have also been introduced for providing the supporting framework for speech/audio processing. Frame-by-frame processing and visualization functionalities have been developed to facilitate speech/audio processing. In addition, facilities for easy sound recording, processing and audio rendering have also been i developed to provide students, practitioners and researchers with an enriched DSP simulation tool. Simulations and assessments have been also developed for use in classes and training of practitioners and students. ii DEDICATION To my parents. iii ACKNOWLEDGMENTS This thesis would not have been possible without the support of a number of people. It is my pleasure to start by heartily thanking my advisor, Prof. Andreas Spanias, for providing me with the wonderful opportunity of being a Research Assistant with his supervision and pursuing a thesis for my Master’s degree. I thank him for his support and for providing me with all the resources for fuelling a fruitful Masters program. I also thank Prof. Cihan Tepedelenlioglu and Dr. Visar Berisha for agreeing to be part of my defense committee. In particular, I would like to thank Dr. Tepedelenlioglu for his support for the first six months of my assistantship on the PV array project. My interactions with him were great learning experiences during the initial phases of my assistantship. I also thank ‘Paceco Corporation’ for providing the financial support during my involvement in the initial stages of my assistantship. And, I thank Henry Braun and Venkatachalam Krishnan for all their support during this project. Dr. Harish Krishnamoorthi played a particularly important role by providing me with all the required materials and software during the initial stage of my work on auditory models. The lengthy discussions with him provided me with crucial insight to work towards the initial breakthroughs in the research. I would like to specially thank Dr. Mahesh Banavar , Dr. Jayaraman Thiagarajan and Dr. Karthikeyan Ramamurthy for all those stimulating discussions, timely help, suggestions and continued motivation which were vital for every quantum of my progress. I also thank Deepta Rajan, Sai Zhang, Shuang Hu, Suhas Ranganath, Mohit iv Shah, Steven Sandoval, Brandon Mechtley, Dr. Alex Fink, Bob Santucci, Xue Zhang, Prasanna Sattigeri and Huan Song for all their support. The best is always reserved for the last. I express my deepest and warmest thanks to my parents for placing their faith in me. Their contributions are beyond the scope of this thesis. I thank all friends and relatives for their support. And I finally thank the grace of the divine for everything. v TABLE OF CONTENTS Page LIST OF TABLES .................................................................................................................... x LIST OF FIGURES ................................................................................................................. xi CHAPTER 1 INTRODUCTION ..................................................................................................... 1 1.1 Human Auditory Models Based Processing and Loudness Estimation .............. 3 1.2 Computation Pruning for Efficient Loudness Estimation ................................... 6 1.3 Automatic Control of Perceptual Loudness ......................................................... 7 1.4 Interactivity in Speech/Audio DSP Education ..................................................... 7 1.5 Contributions ....................................................................................................... 12 1.6 Organization of Thesis ........................................................................................ 13 2 PSYCHOACOUSTICS AND THE HUMAN AUDITORY SYSTEM ................ 15 2.1 The Human Auditory System ............................................................................. 15 The Outer Ear ...................................................................................................... 15 The Middle Ear ................................................................................................... 15 The Inner Ear ...................................................................................................... 17 2.2 Psychoacoustics .................................................................................................. 18 The Absolute Threshold of Hearing ................................................................... 19 Auditory Masking ............................................................................................... 22 Critical Bands ..................................................................................................... 27 3 HUMAN AUDITORY MODELS AND LOUDNESS ESTIMATION ................ 31 3.1 Loudness Level and the Equal Loudness Contours (ELC) ............................... 31 vi 3.2 Steven’s Law and the ‘Sone’ Scale for Loudness .............................................. 35 3.3 Loudness Estimation from Neural Excitations .................................................. 36 3.4 The Moore and Glasberg Model for Loudness Estimation ............................... 41 Outer and Middle Ear Transformation ............................................................... 42 The Auditory Filters: Computing the Excitation Pattern .................................. 45 Specific Loudness Pattern .................................................................................. 47 Total Loudness .................................................................................................... 49 Short-term and Long-term Loudness ................................................................. 50 3.5 Moore and Glasberg Model: Algorithm Complexity ........................................ 51 4 EFFICIENT LOUDNESS ESTIMATION - COMPUTATION PRUNING TECHNIQUES ....................................................................................................... 54 4.1 Estimating the Excitation Pattern from the Intensity Pattern ............................ 59 Interpolative Detector Pruning ........................................................................... 61 Exploiting Region of Support of Auditory Filters in Computation Pruning .... 62 4.2 Simulation and Results ....................................................................................... 62 Experimental Setup ............................................................................................. 63 Performance of Proposed Detector Pruning ...................................................... 64 5 LOUDNESS CONTROL ........................................................................................ 68 5.1 Background ......................................................................................................... 68 5.2 Loudness Based Gain Adaptation ...................................................................... 70 5.3 Preserving the Tonal Balance ............................................................................. 77 5.4 Loudness Control System Setup......................................................................... 80 6 SPEECH/AUDIO PROCESSING FUNCTIONALITY IN IJDSP ........................ 82 vii 6.1 The iJDSP Architecture: Enhancements ............................................................ 83 Framework for Blocks with Long Signal Processing Capabilities ................... 83 Frame-by-Frame Processing and Visualization ................................................. 85 Planned DSP Simulations ................................................................................... 88 6.2 Developed DSP Blocks .....................................................................................

Audio Processing and Loudness Estimation Algorithms with Ios Simulations

An Assessment of Sound Annoyance As a Function of Consonance

Relating Optical Speech to Speech Acoustics and Visual Speech Perception

Improving Opus Low Bit Rate Quality with Neural Speech Synthesis

Lecture 16: Linear Prediction-Based Representations

Source-Channel Coding for CELP Speech Coders

AN INTRODUCTION to MUSIC THEORY Revision A

Samia Dawood Shakir

End-To-End Video-To-Speech Synthesis Using Generative Adversarial Networks

The Perception of Pure and Mistuned Musical Fifths and Major Thirds: Thresholds for Discrimination, Beats, and Identification

Linear Predictive Modelling of Speech - Constraints and Line Spectrum Pair Decomposition

A Biological Rationale for Musical Consonance Daniel L

Beethoven's Fifth 'Sine'-Phony: the Science of Harmony and Discord