A New Parametric Stereo and Multi Channel Extension for MPEG-4 Enhanced Low Delay AAC (AAC-ELD)
Total Page:16
File Type:pdf, Size:1020Kb
A new parametric Stereo and Multi Channel Extension for MPEG-4 Enhanced Low Delay AAC (AAC-ELD) Mar´ıaLuis Valero1, Andreas H¨olzer3, Markus Schnell 1, Johannes Hilpert1, Manfred Lutzky 1, Jonas Engdeg˚ard2, Heiko Purnhagen2, Per Ekstrand2 and Kristofer Kj¨orling2 1Fraunhofer Institute for Integrated Circuits IIS, Am Wolfsmantel 33, 91058 Erlangen, Germany 2Dolby Sweden AB, G¨avlegatan 12A, 11330, Stockholm, Sweden 3DSP Solutions GmbH & Co. KG, Maximilianstraße 16, 93047 Regensburg, Germany Correspondence should be addressed to Manfred Lutzky ([email protected]) ABSTRACT ISO/MPEG standardizes two communication codecs with low delay: AAC-LD is a well established low delay codec for high quality communication applications such as video conferencing, tele-presence and Voice over IP. Its successor AAC-ELD offers enhanced bit rate efficiency being an ideal solution for broadcast audio gateway codecs. Many existing and upcoming communication applications benefit from the transmission of stereo or multi channel signals at low bit rates. With Low Delay MPEG Surround, ISO has recently standardized a low delay parametric extension for AAC-LD and AAC-ELD. It is based on MPEG Surround technology with specific adaption for low delay operation. This extension comes along with a significantly improved coding efficiency for transmission of stereo and multi channel signals. 1 INTRODUCTION uses a reduced frame length resulting in an algorith- mic delay of 20 ms rather than the 55 ms of AAC [2] In 1999, ISO/MPEG defined the audio codec for packet based transmission. The AAC-LD tech- Low Delay AAC (AAC-LD) [1], intended for bi- nology provides excellent sound quality for typical directional (low latency) communication. The de- audio content at bit rates between 48 and 64 kb/s sign of the codec is based on the AAC format, but per channel. In order to further increase the bit rate level/intensity differences and measures of correlation/coherence between the audio channels and 1. INTRODUCTION can be represented in an extremely compact way. At the same time, a monophonic or two-channel downmix In 2004, the ISO/MPEG Audio standardization group signal of the sound material is created and transmitted to started a new work item on efficient and backward tLuishe de Valerocoder ettog al.ether with the spatial cue information. Low Delay MPEG Surround compatible coding of high-quality multi-channel sound The downmix can be conveyed to the receiver using using parametric coding techniques. Specifically, the known audio coders for monophonic or stereophonic signals. On the decoding side, the transmitted downmix technology to be developed should be based on the efficiency, a low delay derivative of the Spectral Band stereophonic down mix of the input signal is gener- signal is expanded into a high quality multi-channel Spatial Audio Coding (SAC) approach that extends Replication (SBR) [3] tool was adopted by AAC-LD. ated by the spatial audio encoder. A compact para- output based on the spatial parameters. traditional approaches for coding of two or more This combination is known as Enhanced Low Delay metric description of the spatial properties of the channels in a way that provides several significant AAC (AAC-ELD)[4] which was finalized at MPEG multi channel input signal is conveyed next to the advantages, both in terms of compression efficiency and Due to the reduced number of audio channels to be in 2008 [5]. AAC-ELD allows the coding of "super- down mix. On the receiver side, the down mix sig- features. Firstly, it allows the transmission of multi- transmitted (e.g. just one channel for a monophonic wideband" signals (i.e. full audio bandwidth) at bit nal is expanded to the multi channel outputx ^ ...x ^ channel audio at bitrates, which so far only allowed for downmix signal), the Spatial Audio Coding approach 1 N rates down to 24 kb/s while the delay is still low - by applying the spatial parameters in the spatial au- the transmission of monophonic audio. Secondly, by its provides an extremely efficient representation of multi- underlying structure, the multi-channel audio signal is approximatelychannel audio 32sign ms.als. DuringFurtherm theore, standardizationit is backward dio decoder. process the main focus was mostly on mono signals. transmitted in a backward compatible way. As such, the compatible on the level of the downmix signal: A During the standardization of MPEG Surround, the However, MPEG offers some interesting technologies technology can be used to upgrade existing distribution receiver device without a spatial audio decoder will main focus was compression efficiency, flexibility and for the coding of multi channel content, e.g. MPEG infrastructures for stereo or mono audio content (radio simply present the downmix signal. scalability of the technology. A processing delay low Surround, which has recently been adapted for low channels, Internet streaming, music downloads etc.) enough to allow for an efficient combination with delay codecs. Such tools allow codecs to provide towards the delivery of multi-channel audio while Conceptually, this approach can be seen as an communication audio codecs was not on the require- goodenhan qualitycement o forf se stereoveral k orno multiwn tec channelhniques, contentsuch as atan retaining full compatibility with existing receivers. ments list at that time. After an intense development process, the resulting muchadvanc lowered me bittho ratesd for asjoi itnt isst possibleereo cod withing o discretef multi- MPEG Surround specification was finalized in the codingchannelof si eachgnals channel. [4], a generalization of Parametric MPEG Spatial Audio Object Coding (SAOC) has second half of 2006 [30]. Stereo [5] [6] to multi-channel application, and an been a follow-up standardization process in ISO, 2exte MPEGnsion of SURROUNDthe Binaural C ANDue Cod MPEGing (BC SPATIALC) scheme converting the principles of Spatial Audio Coding This paper summarizes the results of the standardization [7] AUDIO[8] tow OBJECTards using CODING more th (SAOC)an one transmitted from a 'channel-oriented' approach towards an 'au- process by describing the underlying ideas and downmix channel [9]. From a different viewing angle, dio objects' approach. The technology was recently providing an overview of the MPEG Surround Finalizedthe Spatial in Au 2007,dio C MPEGoding ap Surroundproach ma (ISO/IECy also be finalized in early 2010. SAOC supports interactive technology. Special attention is given to the results of 23003-1considere [6])d an is ex thetens ISOion o standardf well-kno forwn themat parametricrix surround manipulation of levels and spatial rendering posi- the recent MPEG Surround verification tests that assess multischeme channels (Dolby extension Surround of/Pr monophonicologic, Logicor 7,stereo- Circle tions of independent audio objects at the receiver the technology’s performance in several operation phonicSurround audio etc.) codecs.[10] [11] MPEGby transm Surroundission of isded basedicated side. SAOC re-uses an MPEG Surround decoder modes as they would be used in typical application on(spa thetial technicalcue) side i principlesnformationof to Spatialguide th Audioe multi- Codingchannel when rendering an arbitrary number of audio ob- scenarios introducing multi-channel audio into existing (SAC).reconstru Thection building process blocks and arethus shown achie inve Figureimpro 1.ved jects to a multi channel loudspeaker setup. In order subjective audio quality [1]. audio services. to support interactive teleconferencing application scenarios, a low delay operation mode of SAOC pro- Sections 2 and 3 illustrate the basic approach and the cessing as well as a low delay multi channel renderer MPEG standardization process that was executed to (LD MPEG Surround) were specified in ISO/IEC develop the MPEG Surround specification. The core of 23003-2 [8]. the MPEG Surround architecture and its further extensions are described in Sections 4 and 5. Finally, 3 LOW DELAY MPEG SURROUND system performance and applications are discussed. 3.1 Introduction 2. SPATIAL AUDIO CODING BASICS FigFigureure 1: Pr 1:inciSACple of block Spatia diagraml Audio C [7]oding The low delay variant of MPEG Surround is directly derived from the original MPEG Surround specifica- In a nutshell, the general underlying concept of Spatial tion but optimized towards a low algorithmic delay. Audio Coding can be outlined as follows: Rather than Due to the combination of bitrate-efficiency and In order to motivate the applied changes, all possible performing a discrete coding of the individual audio Abac spatialkward c audioompat encoderibility, SA actsC te aschn ao preprocessorlogy can be us toed ato delay sources are identified in the following section. input channels, a system based on Spatial Audio Coding conventionalenhance a lar perceptualge number audioof ex coder.isting m Ito derivesno or s per-tereo captures the spatial image of a multi-channel audio ceptuallyservices fr relevantom stereo parametersphonic (or m likeono levelphonic differences) to multi- 3.2 Delay Analysis of MPEG Surround signal into a compact set of parameters that can be used andchann correlationel transmiss betweenion in a c channelsompatible fromfashio then. T multio this to synthesize a high quality multi-channel representation channelaim, the (e.g.existi 5.1ng channels)audio trans inputmissio signaln chanxn1e...l cxaNrr.ie Thes the MPEG Surround comprises an algorithmic delay ex- from a transmitted downmix signal. Figure 1 illustrates signaldownm analysisix signal in, a MPEGnd the