Enhanced Voice Services – EVS IWAENC 2014 Presentation

September 2014
Confidential and Proprietary - Qualcomm Technologies, Incorporated. All Rights Reserved.

Topics of this Presentation
− Benefits of EVS
− Standardization framework
− Specifications
− Algorithmic overview and special aspects
− Delay and complexity
− Embedding EVS in 3GPP system
− Quality characterization
− Deployment / commercialization aspects

EVS – Next Gen 3GPP Speech Coding for Improved User Experience in Telephony
− AMR: 4.75 – 12.2 kbps
− AMR-WB: 6.6 – 23.85 kbps
− EVS: 5.9 – 128 kbps
[Figure: quality of AMR, AMR-WB and EVS plotted against time; axis marks 1995, 2002, 2014.]

3GPP Voice Service Evolution
              AMR     AMR-WB   EVS
Standards     2000    2002     2014
Commercial    2001    2010     2015
[Figure label: WCDMA+]

Different Voice Solutions
[Figure: comparison of carrier grade voice and best effort voice. Recoverable labels:]
− Solutions: CS Voice, IMS/RCS VoIP, OTT-VOIP
− Providers: operator provided service (twice), 3rd party provided service, portal provided service
− Clients: TAPI dialer, operator client, 3rd party client, HLOS client
− Networks: CDMA, UMTS, VoMBB, VoLTE, VoWiFi, VoWAN
− Annotations: "Very reliable, fully interoperable, but lacks personalization"; "Ensures Consistent & Seamless Rich Voice Experience"; "Interoperable but Limited to VoLTE Coverage"; "Lacks IP Mobility, Connectivity & Interop with other mVoIP"
− Codec families: EVRC Family, AMR Family, Customized Family, EVS Family, Internet Family, G.7xx Family

What is EVS?
3GPP Speech Conversation / Telephony Coder
EVS – Enhanced Voice Services
− Next generation 3GPP speech coding
− Following the successful FR, HR, EFR, AMR, AMR-WB codecs
− Designed for packet-switched networks / mobile VoIP
− VoLTE is a key target application
− Application in other networks
− AMR-WB interoperable mode
− Rel-12 Work Item in 3GPP preceded by a Study Item TR 22.813
Key features
− Super-wideband speech (32 kHz sampling) – improved speech quality
− Source-controlled variable bit-rate operation – improved capacity
− Designed for VoIP – improved robustness
− Improved music performance
− Wide bit-rate range and all bandwidths for maximum flexibility
− Backward interoperable mode to AMR-WB
Standardization process
− Qualification phase
− Selection phase
− Characterization phase

3GPP EVS is the Next Generation Speech Coder
Speech quality determines user experience
− Ensuring voice quality on new VoLTE deployments
− EVS addresses all networks – mobile VoIP with QoS, best effort VoIP, CS
3GPP goals of Enhanced Voice Services (EVS) standardization
− Feature-rich coder
  − Designed for VoIP applications such as MTSI in TS 26.114
  − It is further desirable that the benefits of EVS are available for users of other networks such as CS
  − NB, WB, SWB bandwidths, FB optional, high robustness mode
  − Bit rates: 7.2, 8, 9.6, 13.2, 16.4, and 24.4 kb/s gross rates that comply with LTE TBSs; 32, 48, 64, 96, 128 kb/s
− Quality improvements – improving user experience
  − Better quality in VoLTE and UMTS (with no new RAB)
  − Evolution path: EVS provides SWB at around 13 kbps – lower rate and lower delay SWB than other industry coders without sacrificing quality
  − Better quality for music and mixed content in conversational applications
− Capacity improvements – increasing system efficiency
  − VBR at 5.9 kbps provides high capacity mode
− Robustness improvements – optimized behavior in VoIP applications
  − More robust NB/WB through significantly better error resilience
  − High robustness mode

EVS - Enhanced Voice Services
The Ultimate Codec of Choice for Mobile Telephony
[Figure: codecs ordered by increasing voice quality.]
− AMR (NB): toll quality narrowband voice @ 12.2 kbps
− AMR-WB (WB): improves voice clarity and intelligibility @ 12.65 kbps
− EVS (SWB): more natural sounding speech and improved music quality @ 13.2 kbps
EVS @ 13.2 kbps provides Super Wideband Voice Quality at comparable bit-rate to AMR & AMR-WB

EVS Benefits
Better Capacity
− Super Wideband: 13.2 – 128 kbps
− Wideband: 5.9 – 128 kbps
− Narrowband: 5.9 – 13.2 kbps
− Support of Source-Controlled Variable Bit Rate operation
Enhanced Error Resiliency
− Optimized for VoLTE and Circuit Switched Networks
− Improved robustness to packet loss compared to AMR-WB
Superior Quality
− Extended audio bandwidth: 50 Hz to 16 kHz
− Better quality NB and WB voice compared to AMR & AMR-WB
− Entertainment quality music coding
[Figure: audio bandwidths of Narrowband (Voice), Wideband (Voice, ~7 kHz) and Super Wideband (Voice and Music, 16 kHz). Annotations: "Low frequencies increase naturalness, presence and comfort"; "High frequencies improve voice clarity and intelligibility"; "Reproduces better audio and music".]

EVS Performance
[Figure: three bar charts of listening-test scores. Panels: "Enhanced Error Resiliency: Error Resilience Improvement for 3GPP delay loss profile (6% FER)"; "Superior Voice Quality: Speech coding performance (with background car noise)"; "Enhanced In-Call Music Quality: Music coding performance (3% FER)". Codecs compared: AMR-WB, EVS, Opus.]

EVS – Solution for Each Situation
[Figure: EVS operating modes under Jitter Buffer Management, grouped by bandwidth (NB, WB, SWB, FB, Stereo). Recoverable labels: new EVS modes (CBR and VBR), AMR-WB interop modes, speech and music content, some modes marked optional; rates 7.2-13.2 kb/s, 5.9 kb/s (avg), 6.6-23.85 kb/s, 7.2-128 kb/s, 13.2-128 kb/s.]
Why Deploy EVS?
− Better Capacity: same NB/WB quality as legacy
− Better Music: near AAC quality at much lower delay
− Better Quality: same capacity as legacy NB/WB
− Improved Error Resilience: much better than AMR-WB, VoIP optimizations

3GPP EVS Standardization Process in Rel-12
Requirements phase – design constraints and performance requirements
Candidate coders
− 13 companies submitted a candidate by 16 November 2012
− Ericsson, Fraunhofer, Huawei, Motorola, Nokia, NTT, NTT DoCoMo, Orange, Panasonic, Qualcomm, Samsung, VoiceAge, ZTE
− Standardization by competition
Qualification phase
− Aim is to keep the most promising 5 candidates for selection
− Extensive testing
− 12 experiments, each candidate is tested in-house and in another listening lab
− Global Analysis Lab performs collection and analysis of test results
− Qualification meeting in March 2013 agreed on 5 candidates
All proponents announced a collaborative development of a joint candidate
Selection phase – single joint candidate
− Codec selection is based on extensive testing in neutral listening labs
− Selection meeting in August 2014 agreed to adopt the joint candidate as EVS standard
− Agreement on most EVS specifications
Characterization phase
− Aim is to test the coder performance for all conditions and special signals / conditions
Approval of remaining EVS Specifications and Technical Report

EVS Rel-12 Standardization Timeline
[Figure: timeline from June 2014 into 2015. Recoverable labels:]
− Milestones: submission of EVS executable for testing; 3GPP SA4#80; 3GPP SA4#80bis: codec selection and approval of specifications; SA approval of EVS standard; 3GPP SA4#81: approval of EVS Technical Report and floating-point spec
− Dates marked: June 27, Aug 4, Aug 30, Sep 15, Nov 6, Dec 10
− Phases: Selection Testing, Characterization Testing
− EVS Prototypes Available for Preliminary Lab/Field Testing
− EVS Engineering Build Available For IOT and Field Trials
− 2015: EVS over 3G UTRAN CS Work Item (Rel-13)

EVS Is A Global Collaboration
Broad Industry Support Across the Ecosystem
12 Party Collaboration: Qualcomm, Fraunhofer, Samsung, VoiceAge, Nokia, Ericsson, Panasonic, Huawei, NTT, ZTE, NTT DoCoMo, Orange
Keys For Successful Deployment
• Codec hw/sw support (e.g., chipset, IMS/RCS client, voice pre/post-proc, etc.)
• Super Wideband terminal acoustic designs
• Infra support (IMS, gateways, etc.)
• Test Equipment Support (call box, IMS, SWB acoustics, voice quality)
• EVS support in voice services outside of mobile ecosystem (e.g., wireline VoIP, Enterprise VoIP & Video Telephony, etc.)

EVS Design Requirements
− Superwideband (0-16 kHz) Coding of Speech better than AMR-WB
− Wideband (0-8 kHz) Coding of Speech better than AMR-WB; inclusion of AMR-WB IO
− Narrowband (0-4 kHz) Coding of Speech better than AMR
− Source Controlled Variable Rate Coding
− Improved Coding of Music for In-call Music (Music on hold and Ringback)
− Improved Error Resilience for both Circuit Switched and Packet Switched Communication and VoIP Capability
− Constraints on Frame Length, Max. Algorithmic Delay, Complexity, JBM, Rate Switching, PLC, RTP Payload Format, VAD/DTX/CNG

EVS Requirements in SWB at Low Rates
(NWT: not worse than; reference codec rates in kbit/s)
Clean speech (-26, -16, -36 dBov), FER 0%, DTX On†/Off
− 13.2 kbit/s: NWT G.722.1C @ 32
− 16.4 kbit/s: NWT G.722.1C @ 48
− 24.4 kbit/s: NWT G.718B @ 36
Clean speech (-26 dBov), FER x = 3%, 6%, DTX Off (On† for 13.2)
− 13.2 kbit/s: NWT G.722.1C @ 48, x% FER
− 16.4 kbit/s: NWT G.719 @ 48, x% FER
− 24.4 kbit/s: NWT G.719 @ 56, x% FER
Noisy speech (Car, Office, Street; -26 dBov), FER 0%, DTX On‡/Off
− 13.2 kbit/s: NWT G.722.1C @ 24 when EVS DTX off; NWT AMR-WB @ 19.85 DTX on when EVS DTX on
− 16.4 kbit/s: NWT G.722.1C @ 32 when EVS DTX off; NWT AMR-WB @ 23.05 DTX on when EVS DTX on
− 24.4 kbit/s: NWT G.722.1C @ 48 when EVS DTX off; NWT AMR-WB @ 23.85 DTX on when EVS DTX on
Noisy speech (Car, Office, Street; -26 dBov), FER x = 3%, 6%, DTX Off (On‡ for 13.2)
− 13.2 kbit/s: NWT G.722.1C @ 24, x% FER and DTX off; NWT AMR-WB @ 19.85, x% FER and DTX on when EVS DTX on
− 16.4 kbit/s: NWT G.722.1C @ 32, x% FER
− 24.4 kbit/s: NWT G.722.1C @ 48, x% FER

3GPP EVS Specifications
Spec No.    Title                                             Status: agreed
TS 26.441   EVS Codec General Overview                        For approval
TS 26.442   EVS Codec ANSI C code (fixed-point)               For approval
TS 26.443   EVS Codec ANSI C code (floating point)            Draft
TS 26.444   EVS Codec Test Sequences                          For approval
TS 26.445   EVS Codec Detailed Algorithmic Description        For approval
TS 26.446   EVS Codec AMR-WB Backward Compatible Functions    For approval
TS 26.447   EVS Codec Error Concealment
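
The slides list the EVS bit rates and the bit-rate range of each audio bandwidth, but not the payload size that each rate implies. The following is a rough Python sketch of that arithmetic. It assumes a 20 ms frame, which is the EVS frame length according to the EVS specifications rather than anything stated in these slides, and every name in it (`bits_per_frame`, `cbr_rates_for`, the constants) is hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: all names are hypothetical, not part of any EVS API.

# Assumption: 20 ms frame length (taken from the EVS specifications, not the slides).
FRAME_MS = 20

# Constant bit rates listed on the slides, in bits per second.
EVS_CBR_RATES_BPS = [7200, 8000, 9600, 13200, 16400, 24400,
                     32000, 48000, 64000, 96000, 128000]

# Bit-rate range per audio bandwidth from the "EVS Benefits" slide, in bits per second.
# The 5.9 kb/s lower bound is the average rate of the variable bit-rate mode.
BANDWIDTH_RANGES_BPS = {
    "NB": (5900, 13200),
    "WB": (5900, 128000),
    "SWB": (13200, 128000),
}


def bits_per_frame(rate_bps: int, frame_ms: int = FRAME_MS) -> int:
    """Return the number of coded bits in one frame at a constant bit rate."""
    return rate_bps * frame_ms // 1000


def cbr_rates_for(bandwidth: str) -> list[int]:
    """Return the listed constant bit rates that fall inside a bandwidth's range."""
    low, high = BANDWIDTH_RANGES_BPS[bandwidth]
    return [rate for rate in EVS_CBR_RATES_BPS if low <= rate <= high]


if __name__ == "__main__":
    for rate in EVS_CBR_RATES_BPS:
        bits = bits_per_frame(rate)
        print(f"{rate / 1000:6.1f} kb/s -> {bits:4d} bits ({bits // 8} bytes) per frame")
    print("SWB rates (b/s):", cbr_rates_for("SWB"))
```

Under the 20 ms assumption, 13.2 kb/s corresponds to 264 bits, or 33 bytes, per frame, and every listed constant rate yields a whole number of bytes. The 5.9 kb/s variable bit-rate mode is left out of the rate list because the slides give it only as an average.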