Speech Codec Intelligibility Testing in Support of Mission-Critical Voice Applications for LTE

NTIA Report 15-520
Stephen D. Voran
Andrew A. Catellier
U.S. DEPARTMENT OF COMMERCE • National Telecommunications and Information Administration
September 2015

DISCLAIMER

Certain commercial equipment and materials are identified in this report to specify adequately the technical aspects of the reported results. In no case does such identification imply recommendation or endorsement by the National Telecommunications and Information Administration, nor does it imply that the material or equipment identified is the best available for this purpose.

PREFACE

The work described in this report was performed by the Public Safety Communications Research Program (PSCR) on behalf of the Department of Homeland Security (DHS) Science and Technology Directorate. The objective was to quantify the speech intelligibility associated with a range of digital audio coding algorithms in various acoustic noise environments. This report constitutes the final deliverable product for this project. The PSCR is a joint effort of the National Institute of Standards and Technology and the National Telecommunications and Information Administration.

CONTENTS

Preface
Figures
Tables
Abbreviations/Acronyms
Executive Summary
1. Background
    1.1 Speech Intelligibility Factors
    1.2 Speech Intelligibility Reference
2. Audio Codecs
3. Speech and Noise
    3.1 Speech
    3.2 Noise
    3.3 Processing Speech and Noise
4. Objective Estimation of Speech Intelligibility
    4.1 Selecting SNRs
    4.2 Selecting Codec Modes
5. Modified Rhyme Testing
    5.1 Listening Lab
    5.2 An MRT Trial
    5.3 MRT Structure
    5.4 Test Subjects and Procedure
6. Analysis and Discussion
    6.1 Number and Distribution of Trials
    6.2 MRT Data Analysis
    6.3 Analog FM Reference
    6.4 Other Codec Modes
    6.5 Comparisons
7. Conclusions
8. References
Acknowledgements

FIGURES

Figure 1. Estimated speech intelligibility example results for five NB codec modes and AFM in club noise.
Figure 2. Estimated speech intelligibility example results for five NB codec modes and AFM in siren noise.
Figure 3. Number of codec modes that have estimated intelligibility not lower than AFM in siren noise.
Figure 4. Photo depicting MRT lab setup.
Figure 5. Screenshot of the MRT voting interface.
Figure 6. AFM intelligibility for each noise environment.
Figure 7. Intelligibility vs. data rate for all 28 codec modes in saw noise environment.
Figure 8. Intelligibility vs. data rate for all 28 codec modes in club noise environment.
Figure 9. Intelligibility vs. data rate for all 28 codec modes in coffee noise environment.
Figure 10. Intelligibility vs. data rate for all 28 codec modes in siren noise environment.
Figure 11. Intelligibility vs. data rate for all 28 codec modes in alarm noise environment.
Figure 12. Intelligibility vs. data rate for all 28 codec modes in quiet environment.
Figure 13. Hypothesis test outcomes for 24 non-reference codec modes organized by increasing data rate and audio bandwidth. Light blue indicates intelligibility lower than AFM. White indicates intelligibility the same as AFM. Light yellow indicates intelligibility higher than AFM.

TABLES

Table 1. Audio codec modes considered in this study.
Table 2. Noise environments considered in this study.
Table 3. SNR selected for each noise type.
Table 4. List of 28 codec modes with bandwidth and data rate.
Table 5. Number of successful trials (out of 432 total trials) for each condition.
Table 6. Intelligibility (R) for each condition (0 ≤ R ≤ 1).
Table 7. Example table comparing Codec Mode C with AFM.
Table 8. Values of the chi-squared (χ²) statistic for testing the null hypothesis.
Table 9. Hypothesis test outcomes for 168 conditions. A minus sign with light blue shading indicates intelligibility lower than AFM, an equal sign with no shading indicates intelligibility the same as AFM, and a plus sign with light yellow shading indicates intelligibility higher than AFM.

ABBREVIATIONS/ACRONYMS

3GPP Third Generation

Speech Coding and Compression

Digital Speech Processing— Lecture 17

Surround Sound Processed by Opus Codec: a Perceptual Quality Assessment

Communications

Speech Compression Using Discrete Wavelet Transform and Discrete Cosine Transform

THE FUTURE of IDEAS This Work Is Licensed Under a Creative Commons Attribution-Noncommercial License (US/V3.0)

Speech Coding Using Code Excited Linear Prediction

Intelligibility of Selected Speech Codecs in Frame-Erasure Conditions

An Ultra Low-Power Miniature Speech CODEC at 8 kb/s and 16 kb/s Robert Brennan, David Coode, Dustin Griesdorf, Todd Schneider Dspfactory Ltd

Cognitive Speech Coding Milos Cernak, Senior Member, IEEE, Afsaneh Asaei, Senior Member, IEEE, Alexandre Hyafil

TR 126 959 V15.0.0 (2018-07)

A Novel Speech Enhancement Approach Based on Modified DCT and Improved Pitch Synchronous Analysis