Transcoding: A Look at How Transcoding Can Help IPX Providers Climb up the Value Chain

07 360 VISION 2016
FUTURE PROOFING YOUR DIGITAL BUSINESS
www.hottelecom.com
Sponsored by: www.radisys.com

TABLE OF CONTENTS
Before we transcode we need to encode...
Different approaches to transcoding
Carriers' transcoding opportunity
Where is the value?
About HOT TELECOM
About Radisys

EVOLUTION CREATES COMPLEXITY

The world of telecom services is evolving faster than at any time in the past, as customer expectations for always-on and faster solutions grow with each new handset design.

In this new world, a plethora of applications make use of the high-definition audio and video capabilities of modern mobile handsets. These applications require data processing that would eclipse a serious gaming PC from just a few years back.

While this is great for consumers and the industry, this evolution also comes with a massive increase in overall service complexity. For the foreseeable future, these evolved handset capabilities will still need to interwork with much less powerful earlier-generation handsets, or even a standard telephone.

IPX providers responsible for interconnecting calls between disparate endpoints and networks could fall back to the lowest-quality, lowest-complexity service. However, as subscribers now demand a better quality of experience, this simple route is not always the right one.

IPX providers are in an ideal position to offer the evolved transcoding services required for communications interoperability, partly because of their position as the interconnecting engine of the global telecom network, and partly because of the efficiency they can generate by sharing a transcoding infrastructure investment across many customer groups.

In contrast, while each originating service provider could provide the transcoding needed to make international calls work, its transcoding requirements are more sporadic and variable, so investing in transcoding solutions is less cost-effective for it.

It could be said that basic transcoding is just a necessary function to make a call work when the end devices or networks are unable to establish a call directly. As such, it could be one of those capabilities that does not create any visible differentiator for IPX providers.

So why should IPX providers bother when simpler options exist? One strong driver is the growing adoption of VoLTE services, with many more advanced interactive services creating opportunities for IPX providers that have a future-proof vision for delivering transcoding and other value-added services to their service provider customers.

This 360 Vision will explain not only what transcoding really is, but also why it is becoming increasingly important for IPX providers to support it. We will also discuss the challenges of offering transcoding services and, finally, outline potential real-life use cases that will convince you to add this capability to future-proof your business.

Steve Heap, CTO

TRANSCODING: SIMPLIFYING THE DIGITAL EVOLUTION

BEFORE WE TRANSCODE WE NEED TO ENCODE...

Interactive voice and video communication services involve the exchange of sounds and images. Audio encompasses a wide range of frequencies, from sounds that we feel rather than hear through to the highest notes that we might not even be able to discern. Our eyes see a broad range of colors and perceive images in 3D. When we try to transmit this information to another location, we need a way to convert it into the basic building blocks of modern telecoms: packets of digital information.
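To make "packets of digital information" concrete, the sketch below works through the arithmetic for G.711, the 64 kbit/s codec of the TDM telephone network, when it is carried over IP. It is an illustrative sketch rather than part of the original analysis: it assumes 8 kHz sampling at 8 bits per sample, counts only the 40 bytes of RTP (12), UDP (8) and IPv4 (20) headers with no link-layer overhead, and the function name `g711_stream` and the packetization times are hypothetical choices.

```python
"""Back-of-the-envelope bandwidth for a G.711 voice stream carried over RTP/UDP/IPv4."""

SAMPLE_RATE_HZ = 8_000                   # G.711 samples narrowband audio at 8 kHz
BITS_PER_SAMPLE = 8                      # each sample is companded to 8 bits (A-law or mu-law)
RTP_UDP_IPV4_HEADER_BYTES = 12 + 8 + 20  # RTP + UDP + IPv4, no options, no link layer


def g711_stream(ptime_ms: int) -> dict:
    """Payload size, packet rate and bitrates for one packetization time (ms)."""
    codec_bps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE               # 64,000 bit/s
    samples_per_packet = SAMPLE_RATE_HZ * ptime_ms // 1000
    payload_bytes = samples_per_packet * BITS_PER_SAMPLE // 8
    packets_per_second = 1000 / ptime_ms
    # Every packet carries the same fixed header overhead on top of the audio payload.
    ip_bps = (payload_bytes + RTP_UDP_IPV4_HEADER_BYTES) * 8 * 1000 / ptime_ms
    return {
        "codec_kbps": codec_bps / 1000,
        "payload_bytes": payload_bytes,
        "packets_per_second": packets_per_second,
        "ip_kbps": ip_bps / 1000,
    }


if __name__ == "__main__":
    for ptime in (10, 20, 30):
        print(ptime, g711_stream(ptime))
```

Under these assumptions, 20 ms packets carry 160 bytes of audio each, 50 times a second, so the 64 kbit/s codec stream becomes about 80 kbit/s on the IP network; shorter packets raise the relative header overhead further.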
A coder/decoder algorithm (or codec) encodes an analog audio soundwave or a video stream into 1s and 0s for transmission across the network, so that it can be decoded at the far end to recreate the original audio or video as accurately as possible.

CODECS, CODECS AND MORE CODECS...

Although more bandwidth is available now than at any time in history, there continue to be bottlenecks and costs associated with its provision, especially toward the edge of mobile networks. Due to these constraints, codec technology has evolved to deliver ever-improving voice and video quality while compressing as much information as possible into the lowest bandwidth, to minimize transmission costs.

But another big technology change needs to be overlaid on codec evolution: the move from TDM to IP. TDM (Time Division Multiplexing) basically assigns a full circuit (64 kbit/s) to a voice call and uses that bandwidth to encode a narrow approximation of the original audio. The evolution to IP packet networks meant that service providers had to encode the original audio into a series of data packets.

As a result, we have gone from a world of TDM circuit networks encoded with G.711 (the basic codec, which compresses a very narrow range of audio frequencies into a 64 kbit/s stream) to a variety of open source and commercial codecs used with IP packet networks. As the diagram below explains, modern codecs have managed to support a massive improvement in quality, up to full-band stereo, in basically the same bandwidth allocation as the narrow, "dead sounding" audio of the old telephone call. Simply put, HD voice and beyond is quickly becoming the new norm.

However, with the growing number of codecs comes growing complexity. Service providers now have to be in a position to support most of these encoding technologies to ensure that the latest handsets will communicate with legacy devices, and interwork with other service providers that are at different stages of their network evolution.

[Figure: Types of audio codecs currently in service. Source: opus-codec.org, Radisys]

Voice calls in IP networks are vulnerable to lost packets, noise, delay and jitter (variation in delay); hence modern codecs, along with the transcoding functions in IP networks, are also designed to compensate for these impairments and minimize their impact on the real-time audio experience.

THE ROLE OF THE HANDSET

Voice over IP calls use a signaling system (known as SIP) to establish and control the call. During call setup, these signaling messages also carry codec "capability and preference" information from the originating equipment to the terminating device. Each endpoint lists the specific codecs it can support, in priority order. If there is no match between the available codecs, the network needs to quickly insert a transcoding function in the call path during call setup, to establish a call between incompatible handsets or networks.

WHAT IS TRANSCODING REALLY?

Transcoding is the interception of the media (the voice and/or video) in real time and its modification to make it work with the equipment in the distant network. If the source and destination endpoints discover during call setup negotiation that they support the same codec, transcoding is not required. However, when a codec mismatch is identified, transcoding will be required somewhere in the call path.

But the real value-add of transcoding comes from honoring the best-quality codec offered by the source and destination, to get the best possible call performance from that specific session: what we like to call agile transcoding.

Voice transcoding

When it comes to voice, current "phones" have a defined set of codecs, and service providers pre-select certain ones to be preferred. For instance, 2G and 3G mobile networks use AMR-NB as a low-bandwidth way of providing basic narrowband audio service.

As customers sought out better quality, some service providers added a wideband version of the codec (AMR-WB) to 3G smartphones and provided the transcoding needed to make calls between these HD-enabled phones and older 2G/3G devices work.

With the rollout of LTE (and particularly Voice over LTE, or VoLTE), the entire network structure is finally moving from its traditional TDM architecture to an end-to-end IP environment. New codecs for both audio and video continue to be developed and to grow in complexity. Agile transcoding allows service providers to carefully select which codecs to deploy in order to maximize utilization and minimize cost.

In situations with incompatible endpoints, service providers could elect a simple strategy: transcode any high-definition source down to the lowest common denominator (narrowband voice).

While you could use narrowband end-to-end on a call, there are benefits to applying transcoding and keeping the VoLTE portion of the call on an HD codec. Agile transcoding would allow the AMR-WB and Enhanced Voice Services (EVS) codecs to be used for better bandwidth efficiency over the expensive air interface, with transcoding within the IPX network for connectivity to endpoints using older handset technology.

Finally, communications app designers are much less constrained in which codecs they use. They tend to prefer high-definition open source codecs, which differentiate their audio quality while carrying no licensing fees, minimizing their costs. As each new technology rolls out, such as WebRTC, more advanced codecs are often …

… options to compress real-time video into a stream of packets, making it far more complex to transcode than voice. Yet video is the future of communication, with LTE providing high-speed IP connectivity to the handset and to consumers, and a growing number of networks offering video over LTE (ViLTE) services to their customers.

… back the quality to the level supported by the oldest video device on the video call, but rather delivers the highest quality of experience for end users when it comes to live media.

Going beyond the needs of end-to-end calling (whether audio or video), the …

… depending on the case being dealt with, and we will provide answers to most of these in the remainder of this 360 Vision document.
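The codec negotiation and agile transcoding ideas described under THE ROLE OF THE HANDSET and Voice transcoding can be sketched in a few lines. This is a simplified, hypothetical model rather than a SIP/SDP implementation: each endpoint is reduced to an ordered list of SDP-style codec names, matching follows the spirit of the SDP offer/answer model (RFC 3264) by taking the offerer's most preferred codec that the answerer also lists, and the `AUDIO_RANK` quality ordering and the `agile_plan` policy (keep each call leg on the best codec it supports, transcode only when the legs differ) are assumptions made for illustration.

```python
from typing import Optional, Sequence, Tuple

# Assumed quality ranking by audio bandwidth, for illustration only:
# 1 = narrowband, 2 = wideband, 3 = EVS (up to full band).
AUDIO_RANK = {"PCMU": 1, "PCMA": 1, "AMR": 1, "AMR-WB": 2, "EVS": 3}


def first_common(offer: Sequence[str], answer: Sequence[str]) -> Optional[str]:
    """Offer/answer-style match: the offerer's most preferred codec that the
    answerer also supports, or None when a transcoder has to be inserted."""
    supported = set(answer)
    for codec in offer:  # the offer is in the originating endpoint's priority order
        if codec in supported:
            return codec
    return None


def best_codec(codecs: Sequence[str]) -> str:
    """Highest-ranked codec one call leg supports (unknown names rank lowest;
    on a tie, the codec listed first wins)."""
    return max(codecs, key=lambda c: AUDIO_RANK.get(c, 0))


def agile_plan(offer: Sequence[str], answer: Sequence[str]) -> Tuple[str, str, bool]:
    """Hypothetical agile policy: run each leg on its best codec and report
    whether the network must transcode between the two legs."""
    originating, terminating = best_codec(offer), best_codec(answer)
    return originating, terminating, originating != terminating


if __name__ == "__main__":
    volte = ["EVS", "AMR-WB", "AMR"]      # VoLTE handset offer, most preferred first
    legacy = ["AMR"]                      # older narrowband handset
    print(first_common(volte, legacy))    # AMR: narrowband end to end
    print(agile_plan(volte, legacy))      # ('EVS', 'AMR', True): transcode in the IPX
    print(first_common(volte, ["PCMA"]))  # None: no shared codec at all
```

In the example, plain matching settles on narrowband AMR end to end, while the agile plan keeps the VoLTE leg on EVS and leaves the IPX to transcode to AMR; with no shared codec at all, `first_common` returns None and a transcoder is needed whatever the policy.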
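Jitter, described earlier as variation in delay, is commonly quantified with the interarrival jitter estimator defined for RTP in RFC 3550 (section 6.4.1): J = J + (|D| − J)/16, where D is the change in transit time between consecutive packets. The sketch below is a minimal Python rendering of that formula; the input format (pairs of RTP timestamp and arrival time, both in RTP clock units) and the function name are assumptions of this sketch, and it is not presented as the measurement method of any particular transcoder.

```python
from typing import Iterable, Tuple


def interarrival_jitter(packets: Iterable[Tuple[int, int]]) -> float:
    """Running RFC 3550 interarrival jitter estimate.

    packets: (rtp_timestamp, arrival_time) pairs in arrival order, both expressed
    in RTP clock units (8,000 ticks per second for narrowband audio such as G.711).
    """
    jitter = 0.0
    previous = None
    for rtp_ts, arrival in packets:
        if previous is not None:
            prev_ts, prev_arrival = previous
            # D(i-1, i): how much the one-way transit time changed between packets
            d = (arrival - prev_arrival) - (rtp_ts - prev_ts)
            # First-order smoothing with gain 1/16, as specified in RFC 3550
            jitter += (abs(d) - jitter) / 16
        previous = (rtp_ts, arrival)
    return jitter


if __name__ == "__main__":
    # 20 ms G.711 packets: the RTP timestamp advances 160 ticks per packet.
    steady = [(0, 1000), (160, 1160), (320, 1320), (480, 1480)]
    # Same stream, but the third packet arrives 40 ticks (5 ms) late.
    bumpy = [(0, 1000), (160, 1160), (320, 1360), (480, 1480)]
    print(interarrival_jitter(steady))  # 0.0
    print(interarrival_jitter(bumpy))   # 4.84375
```

A single late packet nudges the estimate up gradually rather than all at once; the 1/16 gain makes the value track sustained variation in delay rather than one-off spikes.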