Speech Codec Intelligibility Testing in Support of Mission-Critical Voice Applications for LTE

NTIA Report 15-520

Stephen D. Voran
Andrew A. Catellier

U.S. DEPARTMENT OF COMMERCE • National Telecommunications and Information Administration

September 2015

DISCLAIMER

Certain commercial equipment and materials are identified in this report to specify adequately the technical aspects of the reported results. In no case does such identification imply recommendation or endorsement by the National Telecommunications and Information Administration, nor does it imply that the material or equipment identified is the best available for this purpose.

PREFACE

The work described in this report was performed by the Public Safety Communications Research Program (PSCR) on behalf of the Department of Homeland Security (DHS) Science and Technology Directorate. The objective was to quantify the speech intelligibility associated with a range of digital audio coding algorithms in various acoustic noise environments. This report constitutes the final deliverable product for this project. The PSCR is a joint effort of the National Institute of Standards and Technology and the National Telecommunications and Information Administration.

CONTENTS

Preface
Figures
Tables
Abbreviations/Acronyms
Executive Summary
1. Background
  1.1 Speech Intelligibility Factors
  1.2 Speech Intelligibility Reference
2. Audio Codecs
3. Speech and Noise
  3.1 Speech
  3.2 Noise
  3.3 Processing Speech and Noise
4. Objective Estimation of Speech Intelligibility
  4.1 Selecting SNRs
  4.2 Selecting Codec Modes
5. Modified Rhyme Testing
  5.1 Listening Lab
  5.2 An MRT Trial
  5.3 MRT Structure
  5.4 Test Subjects and Procedure
6. Analysis and Discussion
  6.1 Number and Distribution of Trials
  6.2 MRT Data Analysis
  6.3 Analog FM Reference
  6.4 Other Codec Modes
  6.5 Comparisons
7. Conclusions
8. References
Acknowledgements

FIGURES

Figure 1. Estimated speech intelligibility example results for five NB codec modes and AFM in club noise.
Figure 2. Estimated speech intelligibility example results for five NB codec modes and AFM in siren noise.
Figure 3. Number of codec modes that have estimated intelligibility not lower than AFM in siren noise.
Figure 4. Photo depicting MRT lab setup.
Figure 5. Screenshot of the MRT voting interface.
Figure 6. AFM intelligibility for each noise environment.
Figure 7. Intelligibility vs. data rate for all 28 codec modes in saw noise environment.
Figure 8. Intelligibility vs. data rate for all 28 codec modes in club noise environment.
Figure 9. Intelligibility vs. data rate for all 28 codec modes in coffee noise environment.
Figure 10. Intelligibility vs. data rate for all 28 codec modes in siren noise environment.
Figure 11. Intelligibility vs. data rate for all 28 codec modes in alarm noise environment.
Figure 12. Intelligibility vs. data rate for all 28 codec modes in quiet environment.
Figure 13. Hypothesis test outcomes for 24 non-reference codec modes organized by increasing data rate and audio bandwidth. Light blue indicates intelligibility lower than AFM. White indicates intelligibility the same as AFM. Light yellow indicates intelligibility higher than AFM.

TABLES

Table 1. Audio codec modes considered in this study.
Table 2. Noise environments considered in this study.
Table 3. SNR selected for each noise type.
Table 4. List of 28 codec modes with bandwidth and data rate.
Table 5. Number of successful trials (out of 432 total trials) for each condition.
Table 6. Intelligibility (R) for each condition (0≤R≤1).
Table 7. Example table comparing Codec Mode C with AFM.
Table 8. Values of the chi-squared (χ²) statistic for testing the null hypothesis.
Table 9. Hypothesis test outcomes for 168 conditions. A minus sign with light blue shading indicates intelligibility lower than AFM, an equal sign with no shading indicates intelligibility the same as AFM, and a plus sign with light yellow shading indicates intelligibility higher than AFM.

ABBREVIATIONS/ACRONYMS

3GPP Third Generation