NTT Technical Review, Vol. 14, No. 11, Nov. 2016
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Surround Sound Processed by Opus Codec: a Perceptual Quality Assessment
28. Konferenz Elektronische Sprachsignalverarbeitung 2017, Saarbrücken SURROUND SOUND PROCESSED BY OPUS CODEC: APERCEPTUAL QUALITY ASSESSMENT Franziska Trojahn, Martin Meszaros, Michael Maruschke and Oliver Jokisch Hochschule für Telekommunikation Leipzig, Germany [email protected] Abstract: The article describes the first perceptual quality study of 5.1 surround sound that has been processed by the Opus codec standardised by the Internet Engineering Task Force (IETF). All listening sessions with up to five subjects took place in a slightly sound absorbing laboratory – simulating living room conditions. For the assessment we conducted a Degradation Category Rating (DCR) listening- opinion test according to ITU-T P.800 recommendation with stimuli for six channels at total bitrates between 96 kbit/s and 192 kbit/s as well as hidden references. A group of 27 naive listeners compared a total of 20 sound samples. The differences between uncompressed and degraded sound samples were rated on a five-point degradation category scale resulting in Degradation Mean Opinion Score (DMOS). The overall results show that the average quality correlates with the bitrates. The quality diverges for the individual test stimuli depending on the music characteristics. Under most circumstances, a bitrate of 128 kbit/s is sufficient to achieve acceptable quality. 1 Introduction Nowadays, a high number of different speech and audio codecs are implemented in several kinds of multimedia applications; including audio / video entertainment, broadcasting and gaming. In recent years the demand for low delay and high quality audio applications, such as remote real-time jamming and cloud gaming, has been increasing. Therefore, current research objectives do not only include close to natural speech or audio quality, but also the requirements of low bitrates and a minimum latency. -
Tr 126 959 V15.0.0 (2018-07)
ETSI TR 126 959 V15.0.0 (2018-07) TECHNICAL REPORT 5G; Study on enhanced Voice over LTE (VoLTE) performance (3GPP TR 26.959 version 15.0.0 Release 15) 3GPP TR 26.959 version 15.0.0 Release 15 1 ETSI TR 126 959 V15.0.0 (2018-07) Reference DTR/TSGS-0426959vf00 Keywords 5G ETSI 650 Route des Lucioles F-06921 Sophia Antipolis Cedex - FRANCE Tel.: +33 4 92 94 42 00 Fax: +33 4 93 65 47 16 Siret N° 348 623 562 00017 - NAF 742 C Association à but non lucratif enregistrée à la Sous-Préfecture de Grasse (06) N° 7803/88 Important notice The present document can be downloaded from: http://www.etsi.org/standards-search The present document may be made available in electronic versions and/or in print. The content of any electronic and/or print versions of the present document shall not be modified without the prior written authorization of ETSI. In case of any existing or perceived difference in contents between such versions and/or in print, the only prevailing document is the print of the Portable Document Format (PDF) version kept on a specific network drive within ETSI Secretariat. Users of the present document should be aware that the document may be subject to revision or change of status. Information on the current status of this and other ETSI documents is available at https://portal.etsi.org/TB/ETSIDeliverableStatus.aspx If you find errors in the present document, please send your comment to one of the following services: https://portal.etsi.org/People/CommiteeSupportStaff.aspx Copyright Notification No part may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm except as authorized by written permission of ETSI. -
Enhanced Voice Services (EVS) Codec Management of the Institute Until Now, Telephone Services Have Generally Failed to Offer a High-Quality Audio Experience Prof
TECHNICAL PAPER Fraunhofer Institute for Integrated Circuits IIS ENHANCED VOICE SERVICES (EVS) CODEC Management of the institute Until now, telephone services have generally failed to offer a high-quality audio experience Prof. Dr.-Ing. Albert Heuberger due to limitations such as very low audio bandwidth and poor performance on non- (executive) speech signals. However, recent developments in speech and audio coding now promise a Dr.-Ing. Bernhard Grill significant quality boost in conversational services, providing the full audio bandwidth for Am Wolfsmantel 33 a more natural experience, improved speech intelligibility and listening comfort. 91058 Erlangen www.iis.fraunhofer.de The recently standardized Enhanced Voice Service (EVS) codec is the first 3GPP commu- nication codec providing super-wideband (SWB) audio bandwidth for improved speech Contact quality already at 9.6 kbps. At the same time, the codec’s performance on other signals, Matthias Rose like music or mixed content, is comparable to modern audio codecs. The key technology Phone +49 9131 776-6175 of the codec is a flexible switching scheme between specialized coding modes for speech [email protected] and music signals. The codec was jointly developed by the following companies, repre- senting operators, terminal, infrastructure and chipset vendors, as well as leading speech Contact USA and audio coding experts: Ericsson, Fraunhofer IIS, Huawei Technologies Co. Ltd, NOKIA Fraunhofer USA, Inc. Corporation, NTT, NTT DOCOMO INC., ORANGE, Panasonic Corporation, Qualcomm Digital Media Technologies* Incorporated, Samsung Electronics Co. Ltd, VoiceAge and ZTE Corporation. Phone +1 408 573 9900 [email protected] The objective of this paper is to provide a brief overview of the landscape of commu- nication systems with special focus on the EVS codec. -
Parametric Coding for Spatial Audio
Master Thesis : Parametric Coding for Spatial Audio Author : Bertrand Fatus Supervisor at Orange : St´ephane Ragot Supervisor at KTH : Sten Ternstr¨om July-December 2015 1 2 Abstract This thesis presents a stereo coding technique used as an extension for the Enhanced Voice Services (EVS) codec [10] [8]. EVS is an audio codec recently standardized by the 3rd Generation Partnership Project (3GPP) for compressing mono signals at chosen rates from 7.2 to 128 kbit/s (for fixed bit rate) and around 5.9 kbit/s (for variable bit rate). The main goal of the thesis is to present the architecture of a parametric stereo codec and how the stereo extension of EVS may be built. Parametric stereo coding relies on the transmission of a downmixed signal, sum of left and right channels, and the necessary audible cues to synthesize back the stereo image from it at the decoding end. The codec has been implemented in MATLAB with use of the existing EVS codec. An important part of the thesis is dedicated to the description of the implementation of a robust downmixing technique. The remaining parts present the parametric coding architecture that has been adapted and used to develop the EVS stereo extension at 24.4 and 32 kbit/s and other open researches that have been conducted for more specific situ- ations such as spatial coding for stereo or binaural applications. Whereas the downmixing algorithm quality has been confronted to subjective testing and proven to be more efficient than any other existing techniques, the stereo extension has been tested less extensively. -
Ts 126 244 V12.5.0 (2016-10)
ETSI TS 1126 244 V12.5.0 (201616-10) TECHNICAL SPECIFICATION Digital cellular telecommmmunications system (Phasee 2+) (GSM); Universal Mobile Telelecommunications System ((UMTS); LTE; Transpaparent end-to-end packet switchedd sstreaming service (PSS); 3GPPP file format (3GP) (3GPP TS 26.2.244 version 12.5.0 Release 12) 3GPP TS 26.244 version 12.5.0 Release 12 1 ETSI TS 126 244 V12.5.0 (2016-10) Reference RTS/TSGS-0426244vc50 Keywords GSM,LTE,UMTS ETSI 650 Route des Lucioles F-06921 Sophia Antipolis Cedex - FRANCE Tel.: +33 4 92 94 42 00 Fax: +33 4 93 65 47 16 Siret N° 348 623 562 00017 - NAF 742 C Association à but non lucratif enregistrée à la Sous-Préfecture de Grasse (06) N° 7803/88 Important notice The present document can be downloaded from: http://www.etsi.org/standards-search The present document may be made available in electronic versions and/or in print. The content of any electronic and/or print versions of the present document shall not be modified without the prior written authorization of ETSI. In case of any existing or perceived difference in contents between such versions and/or in print, the only prevailing document is the print of the Portable Document Format (PDF) version kept on a specific network drive within ETSI Secretariat. Users of the present document should be aware that the document may be subject to revision or change of status. Information on the current status of this and other ETSI documents is available at https://portal.etsi.org/TB/ETSIDeliverableStatus.aspx If you find errors in the present document, please send your comment to one of the following services: https://portal.etsi.org/People/CommiteeSupportStaff.aspx Copyright Notification No part may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm except as authorized by written permission of ETSI. -
Overview of the Evs Codec Architecture
OVERVIEW OF THE EVS CODEC ARCHITECTURE Martin Dietz1, Markus Multrus2, Vaclav Eksler3, Vladimir Malenovsky3, Erik Norvell4, Harald Pobloth4, Lei Miao5, Zhe Wang5, Lasse Laaksonen6, Adriana Vasilache6, Yutaka Kamamoto7, Kei Kikuiri8, Stephane Ragot9, Julien Faure9, Hiroyuki Ehara10, Vivek Rajendran11, Venkatraman Atti11, Hosang Sung12, Eunmi Oh12, Hao Yuan13, Changbao Zhu13 1Consultant for Fraunhofer IIS, 2Fraunhofer IIS, 3VoiceAge, 4Ericsson AB, 5Huawei Technologies Co. Ltd., 6Nokia Technologies, 7Nippon Telegraph and Telephone Corp., 8NTT DOCOMO, INC., 9Orange, 10Panasonic, 11Qualcomm Technologies, Inc., 12Samsung Electronics Co., Ltd., 13ZTE Corporation ABSTRACT a high-level overview paper, specific details of the codec are described in various companion papers [1]-[16]. The recently standardized 3GPP codec for Enhanced Voice 2. KEY FUNCTIONALITIES IN THE EVS CODEC Services (EVS) offers new features and improvements for low- delay real-time communication systems. Based on a novel, 2.1. Switched Speech/Audio Coding at Low Delay switched low-delay speech/audio codec, the EVS codec Earlier generations of 3GPP codecs for voice services, such as contains various tools for better compression efficiency and AMR [30] and AMR-WB [20] are based on the principles of higher quality for clean/noisy speech, mixed content and music, speech coding. The EVS codec is the first codec to deploy including support for wideband, super-wideband and full-band content-driven on-the-fly switching between speech and audio content. The EVS codec operates in a broad range of bitrates, is compression at low algorithmic delay of 32 ms and bitrates highly robust against packet loss and provides an AMR-WB down to 5.9 kbps (average) or 7.2 kbps (constant) as used in interoperable mode for compatibility with existing systems. -
Transcoding a Look at How Transcoding Can Help Ipx Providers Climb up the Value Chain
07 360 VISION 2016 TRANSCODING A LOOK AT HOW TRANSCODING CAN HELP IPX PROVIDERS CLIMB UP THE VALUE CHAIN FUTURE PROOFING YOUR Digital BUSINESS www.hottelecom.com Sponsored by: www.radisys.com www.hottelecom.com www.radisys.com EVOLUTION CREATES COMPLEXITY table of ContentS The world of telecom services is evolving In contrast, while each originating service faster than at any time in the past, as customer provider could provide the necessary expectations for always-on and faster solutions transcoding to make international calls work, Before we transcode grow with each new handset design. their transcoding requirements are more sporadic and variable, and for them investing we need to Encode... 5 In this new world, there are a plethora of in transcoding solutions is therefore less cost applications making use of the high definition effective. audio and video capabilities of modern mobile handsets. And these applications require It could be said that basic transcoding is just a data processing that would eclipse a serious necessary function to make a call work when gaming PC from just a few years back. the end devices or networks are unable to DIFFERENT APPROACHES directly establish a call. As such, it could be TO TRANSCODING 11 While this is great for consumers and the one of those capabilities that does not create industry, this evolution also comes with a any visible differentiator for IPX providers. massive increase in overall service complexity. So why should IPX providers bother when For the foreseeable future, these evolved simpler options exist? One strong driver is the handset capabilities will still need to interwork growing adoption of VoLTE services, with many CARRIERS' TRANSCODING with much less powerful earlier generation more advanced interactive services driving handset, or even a standard telephone. -
The Importance of Mobile Audio Sponsored By
White Paper The Importance of Mobile Audio Sponsored by SUMMARY When it comes to smartphones, virtually everyone knows and cares about things like the number of megapixels for the onboard cameras, the resolution of the screen, what kind of video it can shoot, and just about everything else related to visual quality. Ask most people about the quality of the audio features, however, and you’ll probably get little more than a quizzical look and shrug of the shoulders. That’s unfortunate because in the same way that pairing a beautiful new 65” or larger 4K TV with a tiny, powered Bluetooth speaker as a sound system would ruin the overall media experience, not thinking about the differences that audio systems can make with smartphones doesn’t make any sense either. If you want to enjoy the highest possible audio quality when listening to music, get the best possible audio response time while gaming and enjoy the highest quality voice calls while talking, then digging into the details of what a mobile sound system is capable of—including both the smartphone and any earbuds, headphones or speakers to which they are connected—is important to do. First, however, you need to understand a bit more about how digital audio and wireless audio works. “Audio quality is an incredibly important part of the overall AV experience that modern smartphones provide, but few people understand that the only way to maintain that quality is through an end-to-end focused technology solution.”—Bob O’Donnell, Chief Analyst TECHnalysis Research, ©2021 1 | Page www.technalysisresearch.com The Importance of Mobile Audio INTRODUCTION While some people view them primarily as mobile data terminals, today’s smartphones are also tremendously powerful sources of audio-driven entertainment and communication. -
Ts 103 624 V1.1.1 (2019-11)
ETSI TS 103 624 V1.1.1 (2019-11) TECHNICAL SPECIFICATION Characterization Methodology and Requirement Specifications for the ETSI LC3plus codec 2 ETSI TS 103 624 V1.1.1 (2019-11) Reference DTS/STQ-279 Keywords codec, listening quality, speech ETSI 650 Route des Lucioles F-06921 Sophia Antipolis Cedex - FRANCE Tel.: +33 4 92 94 42 00 Fax: +33 4 93 65 47 16 Siret N° 348 623 562 00017 - NAF 742 C Association à but non lucratif enregistrée à la Sous-Préfecture de Grasse (06) N° 7803/88 Important notice The present document can be downloaded from: http://www.etsi.org/standards-search The present document may be made available in electronic versions and/or in print. The content of any electronic and/or print versions of the present document shall not be modified without the prior written authorization of ETSI. In case of any existing or perceived difference in contents between such versions and/or in print, the prevailing version of an ETSI deliverable is the one made publicly available in PDF format at www.etsi.org/deliver. Users of the present document should be aware that the document may be subject to revision or change of status. Information on the current status of this and other ETSI documents is available at https://portal.etsi.org/TB/ETSIDeliverableStatus.aspx If you find errors in the present document, please send your comment to one of the following services: https://portal.etsi.org/People/CommiteeSupportStaff.aspx Copyright Notification No part may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm except as authorized by written permission of ETSI. -
5G Multimedia Standardization
5G Multimedia Standardization Frédéric Gabin1, Gilles Teniou2, Nikolai Leung3 and Imre Varga4 1Chairman of 3GPP SA4, Standardisation Manager at Ericsson, France 2Vice Chairman of 3GPP SA4, Senior Standardisation Manager at Orange, France 3Vice Chairman of 3GPP SA4, Director of Technical Standards at Qualcomm, Philippines 4Chairman of 3GPP SA4 EVS SWG, Director of Technical Standards at Qualcomm, Germany E-mail: [email protected]; [email protected]; [email protected]; [email protected] Received 30 March 2018; Accepted 3 May 2018 Abstract In the past 10 years, the Smartphone device and its 4G Mobile Broadband Connection supported the now well-established era of video multimedia services. Future mass market multimedia services are expected to be highly immersive and interactive. This paper presents an overview of 5G multimedia aspects as specified by 3GPP for various services that will be provisioned over the 5G network. Specifically, we cover the evolution of streaming services for 5G, Virtual Reality 360◦ video streaming, real-time speech and audio communication services VR evolution and user generated multimedia content. Keywords: AR, VR, Audio, Video, Codec, Immersive, Live, Streaming, Multimedia. 1 Introduction In the past 10 years, the Smartphone device and its 4th generation (4G) Mobile Broad-Band (MBB) connection supported the now well-established era of video multimedia services. Future mass market multimedia services Journal of ICT, Vol. 6 1&2, 117–136. River Publishers doi: 10.13052/jicts2245-800X.618 This is an Open Access publication. c 2018 the Author(s). All rights reserved. 118 F. Gabin et al. are expected to be highly immersive and interactive. -
Speech Codec Intelligibility Testing in Support of Mission-Critical Voice Applications for LTE
NTIA Report 15-520 Speech Codec Intelligibility Testing in Support of Mission-Critical Voice Applications for LTE Stephen D. Voran Andrew A. Catellier report series U.S. DEPARTMENT OF COMMERCE • National Telecommunications and Information Administration NTIA Report 15-520 Speech Codec Intelligibility Testing in Support of Mission-Critical Voice Applications for LTE Stephen D. Voran Andrew A. Catellier U.S. DEPARTMENT OF COMMERCE September 2015 DISCLAIMER Certain commercial equipment and materials are identified in this report to specify adequately the technical aspects of the reported results. In no case does such identification imply recommendation or endorsement by the National Telecommunications and Information Administration, nor does it imply that the material or equipment identified is the best available for this purpose. iii PREFACE The work described in this report was performed by the Public Safety Communications Research Program (PSCR) on behalf of the Department of Homeland Security (DHS) Science and Technology Directorate. The objective was to quantify the speech intelligibility associated with a range of digital audio coding algorithms in various acoustic noise environments. This report constitutes the final deliverable product for this project. The PSCR is a joint effort of the National Institute for Standards and Technology and the National Telecommunications and Information Administration. v CONTENTS Preface..............................................................................................................................................v -
Convolutional Neural Networks to Enhance Coded Speech
1 Convolutional Neural Networks to Enhance Coded Speech Ziyue Zhao, Huijun Liu, Tim Fingscheidt, Senior Member, IEEE, Abstract—Enhancing coded speech suffering from far-end maintaining the speech perceptually undistorted, since the acoustic background noise, quantization noise, and potentially SNR drops and more importantly, only the mean squared error transmission errors, is a challenging task. In this work we propose (MSE) is minimized in the Wiener filter [7]. Therefore, some two postprocessing approaches applying convolutional neural net- works (CNNs) either in the time domain or the cepstral domain perceptually-based postfilters have been proposed to reduce the to enhance the coded speech without any modification of the perceptual degradation caused by low bitrate codecs. Formant codecs. The time domain approach follows an end-to-end fashion, enhancement postfilters [8], [9] emphasize the peaks of the while the cepstral domain approach uses analysis-synthesis with spectral envelope while further suppressing the valleys to cepstral domain features. The proposed postprocessors in both reduce the impact of quantization noise in coded speech, domains are evaluated for various narrowband and wideband speech codecs in a wide range of conditions. The proposed since the formants are perceptually more important than the postprocessor improves speech quality (PESQ) by up to 0.25 spectral valleys. This type of postfilter typically consists of MOS-LQO points for G.711, 0.30 points for G.726, 0.82 points three parts [9]: The core short-term postfilter to enhance the for G.722, and 0.26 points for adaptive multirate wideband codec formants, a tilt correction filter to compensate the low-pass tilt (AMR-WB).