ITU-T G.729.1 Scalable Codec for New Wideband Services
ITU-T STANDARDS

Imre Varga, Qualcomm Inc.
Stéphane Proust, France Telecom
Hervé Taddei, Huawei Technologies

IEEE Communications Magazine • October 2009 • 0163-6804/09/$25.00 © 2009 IEEE

Author notes: The work presented in this article was performed for Siemens Enterprise Communications. I. Varga currently works for Qualcomm Inc. The work presented in this article was performed for Siemens AG. H. Taddei currently works for Huawei Technologies.

ABSTRACT

G.729.1 is a scalable codec for narrowband and wideband conversational applications standardized by ITU-T Study Group 16. The motivation for the standardization work was to meet the new challenges of VoIP in terms of quality of service and efficiency in networks, in particular regarding the strategic rollout of wideband service. G.729.1 was designed to allow smooth transition from narrowband (300–3400 Hz) PSTN to high-quality wideband (50–7000 Hz) telephony by preserving backward compatibility with the widely deployed G.729 codec. The scalable structure allows gradual quality increase with bit rate. A low-delay mode makes the coder especially suitable for high-quality speech communication. The article presents the standardization goals and process, an overview of the coding algorithm, and the codec performance in various conditions.

INTRODUCTION

The International Telecommunication Union, Telecommunication Standardization Sector (ITU-T) has conducted a substantial amount of work in standardization of speech and audio codecs. In recent years, ITU-T has focused on flexibility enhancement in which ITU-T pioneered scalable coding. Scalable coding is a highly flexible coding technology involving a core layer with multiple embedded layers [1]. The core layer provides the minimum bits needed for the decoder to resynthesize the speech signal with a minimum (core) quality. Additional layers aim at improving the quality.

G.729.1 [2] is the first speech codec with an embedded scalable structure built as an extension of an already existing standard. It offers full backward bitstream interoperability at 8 kb/s with the much used G.729 standard [3] in voice over IP (VoIP) infrastructures. G.729.1 is one of the best for wideband speech quality, and its quality is preserved regardless of the access modes and device capabilities thanks to strong robustness to IP packet losses. G.729.1 scalable structure allows for dynamic instant bit rate selection by simple truncation of the bitstream at any component of the communication chain such as gateways or other devices combining multiple data streams. This feature provides high flexibility by easy adaptation to various service requirements and interconnected networks and terminals. With such bit rate adaptation, optimum speech quality is provided according to service and network constraints, and packet droppings that severely impair the overall quality are limited. Thus, the G.729.1 scalable codec minimizes transcoding and ensures smooth introduction of new features without breaking existing services.

This article summarizes the G.729.1 standardization process, the main targeted applications, and related requirements and constraints. It presents the most important characteristics and the performance of G.729.1. For more algorithmic details, the reader may refer to [2, 4, 5].

STANDARDIZATION PROCESS

G.729.1 standardization was launched in January 2004 under the nickname G.729EV. In July 2005 the qualification test results showed that two candidate codecs (from France Telecom and from a consortium of three companies: Siemens, Matsushita, and Mindspeed) passed all the qualification phase requirements. Two more candidates (from ETRI and VoiceAge) had very few marginal failures and qualified as well. Following the SG16 chairman's advice, the qualified candidate companies agreed to work together on a single best possible candidate. The performance of the new combined algorithm was thoroughly examined during the characterization phase against an agreed set of requirements and objectives. G.729EV completed electronic balloting (Alternative Approval Process [AAP]) in May 2006. The approved standard is published as ITU-T Recommendation G.729.1 as well as G.729 Annex J.

After approval, several improvements were added [2]:
• Annex A (January 2007) contains the Real-Time Transport Protocol (RTP) payload format, capability identifiers, and parameters for the signaling of G.729.1 capabilities using H.245.
• Annex B (February 2007) defines an alternative implementation of the G.729.1 algorithm using floating point arithmetic.
• Annex C (June 2008) specifies discontinuous transmission (DTX) and comfort noise generation (CNG) schemes in fixed point.
• Annex D (November 2008) is the corresponding floating point specification.
• The low-delay mode functionality originally used for the narrowband layers (8–12 kb/s) was added to the first wideband layer at 14 kb/s of G.729.1 (August 2007). The motivation was to address applications such as VoIP in enterprise networks where low end-to-end delay is crucial.

GOALS AND APPLICATION SCENARIOS

The primary applications for G.729.1 are packetized speech over wireline networks (VoIP, voice over asynchronous transfer mode [VoATM], IP phones, etc.) like G.729, but also high-quality audio/videoconferencing. G.729.1 is designed to offer a single coding format that can adapt to the demands of all integrated services over IP. Improvement in sound quality is achieved by extending the audio bandwidth. Bandwidth scalability is very useful since it allows the flexibility to switch audio bandwidth fully dynamically when interoperating with existing infrastructure. Bit rate scalability allows simple adjustment of the bit rate to network or terminal capabilities and multiplexing in gateways. Simultaneous scalability in bit rate and audio bandwidth allows the codec to cope with multiple access technologies for heterogeneous networks and terminals.

In order to obtain a codec with the desired capabilities, some design constraints were set during the standardization process. They are described in the next section.

DESIGN CONSTRAINTS

Besides the usual requirements for a speech coder, specific constraints were identified: interoperability with G.729 / G.729 Annex A including Voice Activity Detector (VAD)/DTX, audio bandwidth scalability from narrowband to wideband, and bit rate scalability covering the range from 8 to 32 kb/s. The frame size was set to 20 ms, which corresponds to the usual packet size in VoIP applications.

Limiting the complexity was highlighted as a key factor: complexity must not exceed 40 weighted million operations per second (WMOPS), and the memory size must be below 30 kwords (16-bit words) for the RAM and below 64 kwords for the ROM.

Although significantly higher than the 15 ms delay of legacy G.729, the delay requirement was [...] granularity as an important objective and to not exceed 2 kb/s rate intervals above 14 kb/s.

For the targeted wireline applications, setting the minimum bit rate for wideband capability at around 14 kb/s was found appropriate in the committee. Introduction of an additional narrowband layer at 12 kb/s to offer quality improvement (close to G.711 public switched telephone network [PSTN] quality) serves users with narrowband G.729-based devices.

As usual for ITU-T speech and audio coding Recommendations, it was required that the specification be written in modular ANSI-C code using the 16–32 bit fixed-point basic operators set provided in the ITU-T Software Tool Library [6].

Once the design constraints were set, the desired quality requirements could be set, and those are described next.

QUALITY REQUIREMENTS

Quality requirements were set to a subset of five bit rates: two in narrowband (8 and 12 kb/s) and three in wideband (14, 24, and 32 kb/s). Additionally, gradual quality increase was required from 14 to 32 kb/s in 2 kb/s intervals.

Given all the constraints of scalability, interoperability, and complexity limitations, the quality requirements were set in comparison with already deployed wideband coders (e.g., G.722) at bit rates above 14 kb/s. For lower bit rates, the requirements were set such that the codec would offer significant improvement over narrowband services for VoIP customers.

As G.729.1 is primarily dedicated to speech signals for conversational services, the performance for music signals was not the major criterion. Yet, since applications like music on hold and sharing music ("hear what I hear") require a good quality level at 32 kb/s, the committee agreed to set appropriate requirements. Table 1 introduces some quality requirements/objectives set for the G.729.1 standardization process in the five tested bit rates for some conditions. A more complete set is given in [7, 8].

TECHNICAL OVERVIEW OF G.729.1

ITU-T G.729.1 is an 8–32 kb/s scalable narrowband/wideband coder. The codec operates on 20 ms frames (called superframes), although the core layer and the first enhancement layer at 12 kb/s operate on 10 ms frames similar to G.729. A major novelty was that G.729.1 provides bit rate and bandwidth scalability at the same time. Bit rates at 8 and 12 kb/s are narrowband. The wideband rates range from 14 to 32 kb/s at 2 kb/s intervals.
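
The rate structure described in the article (8 and 12 kb/s narrowband, 14 to 32 kb/s wideband in 2 kb/s steps, 20 ms superframes) and the idea of adapting the bit rate by truncating the bitstream can be illustrated with a small Python sketch. It is not taken from the Recommendation: the function names are hypothetical, and it assumes a raw superframe whose layers are stored in order with no payload header, which is a simplification of the RTP payload format defined in Annex A.

```python
FRAME_MS = 20  # superframe duration stated in the article

# Bit rates in kb/s as listed in the article.
NARROWBAND_RATES = [8, 12]
WIDEBAND_RATES = list(range(14, 33, 2))  # 14, 16, ..., 32
ALL_RATES = NARROWBAND_RATES + WIDEBAND_RATES


def frame_bytes(rate_kbps: int) -> int:
    """Bytes in one 20 ms superframe at the given bit rate."""
    bits = rate_kbps * FRAME_MS  # (kb/s) * (ms) = bits
    return bits // 8


def bandwidth(rate_kbps: int) -> str:
    """Audio bandwidth class associated with a listed bit rate."""
    if rate_kbps not in ALL_RATES:
        raise ValueError(f"{rate_kbps} kb/s is not a listed rate")
    return "narrowband" if rate_kbps in NARROWBAND_RATES else "wideband"


def best_rate(available_kbps: float) -> int:
    """Highest listed rate that does not exceed the available rate."""
    candidates = [r for r in ALL_RATES if r <= available_kbps]
    if not candidates:
        raise ValueError("available rate is below the 8 kb/s core layer")
    return max(candidates)


def truncate_frame(frame: bytes, available_kbps: float) -> bytes:
    """Hypothetical helper: keep only the leading bytes of an embedded
    superframe (assumed header-free) that fit the available rate."""
    return frame[: frame_bytes(best_rate(available_kbps))]


if __name__ == "__main__":
    for rate in ALL_RATES:
        print(rate, "kb/s:", frame_bytes(rate), "bytes,", bandwidth(rate))
```

Under these assumptions, a gateway holding a full 32 kb/s superframe (80 bytes) could forward only its first 20 bytes to a terminal limited to 8 kb/s, without re-encoding the signal.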