6 Best Practices for Launching VoWiFi, VoLTE and EVS


White Paper

Launching Next-Gen Voice Services

Powered by new wireless and IP technology, we live in an increasingly connected world. Now we expect to be able to access services anytime and anywhere and seamlessly move between locations. Oh, and the quality needs to be great, everywhere! With the advent of Voice over Wi-Fi (VoWiFi) and Voice over LTE (VoLTE) and a range of supporting technologies including the Enhanced Voice Services (EVS) codec, IP Multimedia Subsystem (IMS) and new "carrier-grade" Wi-Fi standards, providers are now able to deliver next-generation voice services with unprecedented quality and accessibility.

The latest development in the next generation of voice services is the introduction of the EVS codec. EVS encodes input audio signals with a bandwidth of up to 20 kHz, the full bandwidth of audio perceptible to the human ear. EVS-encoded speech is more faithfully reproduced than speech encoded with previous-generation codecs such as AMR-WB and AMR-NB. In short, audio over EVS sounds more like the real thing. On the technical side, the EVS codec also holds a lot of promise: it has been touted to perform better than previous-generation codecs when signal levels are poor and to improve call quality for connections to lower-bandwidth codecs.

Figure 1: The EVS codec encodes audio inputs up to 20 kHz (VoLTE EVS), whereas previous-generation codecs supported narrower bandwidths of up to 7 kHz (3G AMR-WB) and up to 3.4 kHz (3G AMR-NB).

Over the past few years, Spirent has worked closely with the industry's first round of providers rolling out next-generation voice services. Based on that experience, Spirent has developed best practices for assuring the successful launch of next-generation voice services with a focus on VoLTE, VoWiFi and EVS-enabled voice services.
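To put the Figure 1 bandwidths in perspective, the short sketch below works out the minimum sampling rate each audio bandwidth implies and how much wider each codec's band is than narrowband. The bandwidth figures come from Figure 1; the Nyquist rule (sample at no less than twice the highest frequency) is general signal-processing background rather than something stated in this paper, and the function names are illustrative only.

```python
# Audio bandwidths as given in Figure 1 (Hz).
CODEC_AUDIO_BANDWIDTH_HZ = {
    "AMR-NB": 3_400,
    "AMR-WB": 7_000,
    "EVS": 20_000,
}


def min_sampling_rate_hz(audio_bandwidth_hz: int) -> int:
    """Nyquist rule of thumb: at least twice the highest audio frequency.

    This is background theory, not a figure taken from the white paper.
    """
    return 2 * audio_bandwidth_hz


def bandwidth_gain(codec: str, reference: str = "AMR-NB") -> float:
    """How many times wider a codec's audio band is than the reference codec's."""
    return CODEC_AUDIO_BANDWIDTH_HZ[codec] / CODEC_AUDIO_BANDWIDTH_HZ[reference]


if __name__ == "__main__":
    for codec, bandwidth in CODEC_AUDIO_BANDWIDTH_HZ.items():
        print(
            f"{codec}: {bandwidth / 1000:.1f} kHz audio band, "
            f"needs >= {min_sampling_rate_hz(bandwidth) / 1000:.1f} kHz sampling, "
            f"{bandwidth_gain(codec):.1f}x the AMR-NB band"
        )
```

The comparison is only about audio bandwidth; it says nothing about bit rate or perceived quality, which the test results later in the paper address.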
This white paper details the key challenges providers face as they launch these services and shares Spirent's recommended best practices and lessons learned. The white paper also illustrates key principles with test results of next-gen voice services from actual operational networks.

Assurance Challenges for Launching Next-Gen Voice Services

Evaluating Inter-service, Inter-codec Calling (VoLTE & VoWiFi)

The first challenge we will examine is how to assess the performance of EVS voice services within a VoLTE network. To do this comprehensively it's important to test how well the service works not only when placing calls between EVS-capable devices, but also when placing calls to the previous generation of devices that use narrowband and wideband codecs, including phones outside the mobile network reached via the PSTN (public switched telephone network).

In each of these scenarios different parts of the mobile network are exercised and different transcoding is required. It's possible (and common) that the voice service may be working well in one scenario but exhibit an issue in another scenario. That means it's critical to test VoLTE to VoLTE calls, VoLTE to 3G calls and VoLTE to Landline calls as well as making sure all the combinations of narrowband and wideband codecs are exercised.

An identical challenge exists for evaluating EVS services over VoWiFi: it's critical to test VoWiFi to VoWiFi calls, VoWiFi to 3G calls and VoWiFi to Landline calls, and again to make sure all combinations of codecs are exercised.

To illustrate the importance of testing these various combinations, we'll share results from some mobile-to-mobile tests we performed on a live operational network enabled with EVS and AMR-WB codecs. Our tests focused on comparing EVS and AMR-WB speech quality in a variety of typical user locations. The chart on the left of Figure 2 shows that EVS to EVS calling was superior in terms of speech quality when compared to AMR-WB to AMR-WB calling. The chart on the right side of Figure 2 shows cross-codec test results for AMR-WB to EVS codec calling (and vice versa). During this particular test, we observed a problem with transcoding which led to degraded speech quality for cross-codec connectivity only. After this problem was identified, the carrier was able to isolate the root cause to an IMS firmware issue which was quickly fixed.

Figure 2: Test results for EVS, AMR-WB and cross-codec EVS / AMR-WB calling in an operational network. Left panel: speech quality (POLQA MOS), mean and max, for AMR-WB and EVS calls. Right panel: speech quality for EVS to AMR calls, mean and max, for the AMR-WB party and the EVS party, annotated "Lower quality than AMR-AMR".

Troubleshooting Mobile-to-Mobile Issues (VoLTE & VoWiFi)

The next challenge we'll look at is how to troubleshoot mobile-to-mobile voice service issues in a VoLTE network. Tests based on mobile-to-mobile calling are often the only way to "exercise" a new service or technology because the service only works on a subset of devices or for certain network infrastructure. If the test results reveal that a voice services problem exists, it can be challenging to pinpoint and troubleshoot exactly what could be causing it:
• Is it a device issue?
• Is it an LTE access network issue?
• Is it a core/IMS network issue?
• Is the problem occurring in the uplink of one device or the downlink of the other?

These challenges also apply to VoWiFi networks: is it a device issue, a network issue or perhaps a problem with the venue Wi-Fi network or its backhaul to the core network? The nature of Wi-Fi means congestion, interference and latency are additional possible causes of the problem and are often outside the direct control of the mobile operator.

Following is a practical example of a set of mobile-to-mobile calling issues which could have multiple root causes and are therefore extremely challenging to troubleshoot. We performed over 1,500 mobile-to-mobile calls in multiple locations served by operational Wi-Fi and 3G (UMTS) networks. We used two pairs of mobile phones, where one pair was making Wi-Fi to Wi-Fi calls and the other was making Wi-Fi to 3G calls. In Figure 3, which shows the test results, we can see Wi-Fi to Wi-Fi performance is superior compared to Wi-Fi to 3G in terms of call completion, call setup and speech quality. Because the Wi-Fi to 3G calls in this test campaign include an uplink and downlink on each technology, it is extremely challenging to isolate the root causes of the performance differences observed.

Figure 3: Over 1,500 VoWiFi calls made in multiple locations in an operational network. Two scenarios: Wi-Fi to Wi-Fi and Wi-Fi to 3G Mobile. Panels: call completion success rate (Wi-Fi to Wi-Fi was more reliable, 92% vs. 87%); call setup time in seconds (Wi-Fi to Wi-Fi had faster call setup, 5.1 vs. 8.6); speech quality in POLQA MOS (Wi-Fi to Wi-Fi had better speech quality, 3.3 vs. 2.5). Call Completion Success = Successful Initiation + Successful Retention (No Drop).

Evaluating VoLTE-VoWiFi Handoff

The next challenge we'll examine is evaluating the impact of handoff between VoLTE and VoWiFi on user experience. More and more mobile network operators are using VoWiFi as a way of easing the load on their LTE / 3G networks and expanding their coverage. An important aspect of this policy is to ensure seamless handover of calls from the VoLTE network to the VoWiFi network (and vice versa) without the user perceiving discontinuity in the call or degradation in audio quality.
As the call hands over from one access technology to the other, the routing through the backhaul and core IMS network often changes, and there are a number of places where things can go wrong. Figures 4 and 5 depict measurements of speech quality for VoLTE to VoWiFi handoffs in an operational network. For these measurements, we evaluated speech quality before, during and after a handoff. The test scenario included the following steps: establish a call on VoLTE, emulate a user walking into Wi-Fi coverage (by adjusting the attenuation of the Wi-Fi signal transmitted by the access point), wait for the handoff to Wi-Fi, stay on Wi-Fi and collect multiple speech samples, and then emulate walking out of Wi-Fi coverage (again by varying Wi-Fi signal levels) until the call transitions back to VoLTE.

In the chart at the top of Figure 4, MOS values before the handoffs were very strong, averaging a score around 4. However, during the handoff we see scores that fluctuate between 1.5 and 2, with one example just below 3, reflecting a substantial degradation in user experience. The bottom graph in Figure 4 shows the waveform of the WAV file for the speech during the handoff. During the handoff period from 5.7 seconds to 7.2 seconds, speech was almost non-existent.
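The before/during/after comparison described above can be sketched as a small grouping step over scored speech samples. This is a minimal illustration, not Spirent's tooling: the `SpeechSample` type, the function names, the handoff window and every MOS value below are hypothetical placeholders chosen only to resemble the pattern reported for Figure 4.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional


@dataclass
class SpeechSample:
    """One scored speech sample; times are seconds from the start of the call."""
    start_s: float
    end_s: float
    mos: float


def phase_of(sample: SpeechSample, handoff_start_s: float, handoff_end_s: float) -> str:
    """Label a sample by where it falls relative to the handoff window."""
    if sample.end_s <= handoff_start_s:
        return "before"
    if sample.start_s >= handoff_end_s:
        return "after"
    return "during"  # the sample overlaps the handoff window


def mean_mos_by_phase(
    samples: List[SpeechSample], handoff_start_s: float, handoff_end_s: float
) -> Dict[str, Optional[float]]:
    """Average MOS per phase; None when a phase has no samples."""
    grouped: Dict[str, List[float]] = {"before": [], "during": [], "after": []}
    for sample in samples:
        grouped[phase_of(sample, handoff_start_s, handoff_end_s)].append(sample.mos)
    return {phase: (mean(scores) if scores else None) for phase, scores in grouped.items()}


# Hypothetical data: five 8-second samples, with a 1.5-second handoff
# window falling inside the third sample.
EXAMPLE_SAMPLES = [
    SpeechSample(0.0, 8.0, 4.1),
    SpeechSample(10.0, 18.0, 3.9),
    SpeechSample(20.0, 28.0, 1.8),
    SpeechSample(30.0, 38.0, 4.0),
    SpeechSample(40.0, 48.0, 4.0),
]

if __name__ == "__main__":
    print(mean_mos_by_phase(EXAMPLE_SAMPLES, handoff_start_s=25.7, handoff_end_s=27.2))
```

Treating any sample that overlaps the window as "during" is a deliberate simplification: a speech sample is scored as a whole, so a short gap inside it drags down the score for the entire sample.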