ISO/IEC MPEG-4 High-Definition Scalable Advanced Audio Coding*

PAPERS ISO/IEC MPEG-4 High-Definition Scalable Advanced Audio Coding* RALF GEIGER,1 RONGSHAN YU,2** JU¨ RGEN HERRE,1 SUSANTO RAHARDJA,2 ([email protected]) ([email protected]) ([email protected]) ([email protected]) SANG-WOOK KIM,3 XIAO LIN,2 and MARKUS SCHMIDT1 ([email protected]) ([email protected]) ([email protected]) 1Fraunhofer IIS, Erlangen, Germany 2Institute for Infocomm Research, Singapore 3Samsung Electronics, Suwon, Korea Recently the MPEG Audio standardization group has successfully concluded the standardization process on technology for lossless coding of audio signals. A summary of the scalable lossless coding (SLS) technology as one of the results of this standardization work is given. MPEG-4 scalable lossless coding provides a fine-grain scalable lossless extension of the well-known MPEG-4 AAC perceptual audio coder up to fully lossless reconstruction at word lengths and sampling rates typically used for high-resolution audio. The underlying innova- tive technology is described in detail and its performance is characterized for lossless and near lossless representation, both in conjunction with an AAC coder and as a stand-alone compression engine. A number of application scenarios for the new technology are discussed. 0 INTRODUCTION dia types, such as DVD-Audio [11], SACD [12], HD- DVD [11], or Blu-Ray Disc [13]. In the realm of audio, Perceptual coding of high-quality audio signals has ex- this is achieved by employing lossless formats with high perienced a tremendous evolution over the past two de- resolution (word length) and/or high sampling rate. cades, both in terms of research progress and in worldwide There exist several proprietary lossless formats, the deployment in products. Examples for successful applica- most prominent of which are MLP Lossless [14], [15], tions include portable music players, Internet audio, audio DTS-HD [16], and Apple Lossless [17]. Furthermore, sev- for digital media (such as VCD, DVD), and digital broad- eral freeware coding systems have gained some promi- casting. Several phases of international standardization nence through the Internet. Among them are FLAC [18], have been conducted successfully [1]–[3]. Many recent Monkey’s Audio [19], and OptimFROG [20]. research and standardization efforts focus at achieving On the other end of the bit-rate scale, approaches for good sound quality at even lower bit rates [4]–[9] to ac- scalable perceptual audio coding have been developed in commodate storage and transmission channels with lim- the recent years. Within the International Telecommuni- ited quality (such as terrestrial and satellite broadcasting cation Union (ITU) scalability in wide-band audio coding and third-generation mobile telecommunication). Never- has been adopted only recently [21]. Within MPEG-4 Au- theless, for other scenarios with higher transmission band- dio [3], scalability has been developed and adopted some- width available, there is a general trend toward providing time earlier [22]–[24]. This includes the realm of speech the consumer with a media experience of extremely high coding where scalability is provided both for harmonic fidelity [10], as it is frequently associated with the terms vector excitation coding (HVXC) [25], [26] and code- “high definition” and “high resolution” and dedicated me- excited linear prediction (CELP) [27], [28]. In this context the ISO/MPEG Audio standardization group decided to start a new work item to explore tech- *Manuscript received 2006 March 23; revised 2006 October nology for lossless and near-lossless coding of audio sig- 24 and November 27. nals by issuing a call for proposals for relevant technology **Currently with Dolby Laboratories, San Francisco, CA. in 2002 [29]. Three specifications emerged from this call J. Audio Eng. Soc., Vol. 55, No. 1/2, 2007 January/February 27 GEIGER ET AL. PAPERS as amendments to the MPEG-4 Audio standard. First the mation to produce losslessly/near-losslessly coded audio standard on lossless coding of 1-bit oversampled signals for high-definition applications. [30] specifies the lossless compression of highly oversampled 1-bit sigma–delta-modulated audio signals as 1.2 AAC Background they are stored on the SACD media under the name Direct Since the MPEG-4 SLS coder has been designed to Stream Digital (DSD) [31]. Second the audio lossless cod- operate as an enhancement to MPEG-4 AAC [3], [34], its ing (ALS) specification [32] describes technology for the structure is closely related to that of the underlying AAC lossless coding of PCM-coded signals at sampling rates up core coder [35]. This section sketches the architectural to 192 kHz as well as floating-point audio. features of MPEG-4 AAC as they are relevant to the SLS This paper provides an overview of the third descen- enhancement technology. dant, the scalable lossless coding (SLS) specification [33], The underlying AAC codec provides efficient percep- which extends traditional methods for perceptual audio tual audio coding with high quality and achieves broadcast coding toward lossless coding of high-resolution audio quality at a bit rate of about 64 kbps per channel [36]. Fig. signals in a scalable way. Specifically, it allows to scale up 3 gives a very concise view of the AAC encoder’s struc- from a perceptually coded representation (MPEG-4 ad- ture. The audio signal is processed in a blockwise spectral vanced audio coding, AAC) to a fully lossless representa- representation using the modified discrete cosine trans- tion with high definition, including a wide range of inter- form (MDCT) [37]. The resulting 1024 spectral values are mediate near-lossless representations. The paper starts quantized and coded considering the required accuracy as with an explanation of the general concept, discusses the demanded by the perceptual model. This is done to mini- underlying novel technology components, and character- mize the perceptibility of the introduced quantization dis- izes the codec in terms of compression performance and tortion by exploiting masking effects. Several neighboring complexity. Finally a number of application scenarios are spectral values are grouped into so-called scale factor briefly outlined. bands, sharing the same scale factor for quantization. Prior to the quantization/coding (Q/C) tool, a number of pro- 1 CONCEPT AND TECHNOLOGY OVERVIEW cessing tools operate on the spectral coefficients in order to improve coding performance for certain situations. The This section provides an overview of the principle and most important tools are the following. the structure of the scalable lossless coding technology and its combination with the AAC perceptual audio coder. • The temporal noise shaping (TNS) tool [38] carries out This combination will be referred to as high-definition predictive filtering across frequency in order to achieve advanced audio coding (HD-AAC) in the following. a temporal shaping of the quantization noise according to the signal envelope and in this way optimize temporal 1.1 General System Structure masking of the quantization distortion. Fig. 1 shows a very high-level view of the structure of • The M/S stereo coding tool [39] provides sum/ the HD-AAC encoder. The input signal is first coded by an difference coding of channel pairs, exploits interchannel AAC (“core layer”) encoder. The SLS algorithm uses the redundancy for near-monophonic signals, and avoids output to enhance the system’s performance toward loss- binaural unmasking. less coding, resulting in an enhancement layer. The two layers of information are subsequently multiplexed into 1.3 HD-AAC/SLS Enhancement one high-definition bit stream. Decoding of an HD-AAC The SLS scalable lossless enhancement works on top of bit stream is illustrated in Fig. 2. From the HD-AAC bit this AAC architecture. The structures of an HD-AAC en- stream, decoders are able either to decode the perceptually coder and decoder are shown in Figs. 4 and 5. coded AAC part only, or to use the additional SLS infor- In the encoder the audio signal is converted into a spectral representation using the integer modified discrete cosine transform (IntMDCT) [40], [41]. This transform represents an invertible integer approximation of the MDCT and is well-suited for lossless coding in the frequency domain. Other AAC coding tools, such as mid/side coding or temporal noise shaping, are also considered and performed on the IntMDCT spectral coefficients in an invertible integer fash- ion, thus maintaining the similarity between the spectral val- Fig. 1. HD-AAC encoder structure. ues used in the AAC coder and in the lossless enhancement. Fig. 2. HD-AAC decoder structure. Fig. 3. Structure of AAC encoder (simplified). 28 J. Audio Eng. Soc., Vol. 55, No. 1/2, 2007 January/February PAPERS ISO/IEC MPEG-4 SCALABLE AAC The link between the perceptual core and the scalable as it results from the noise-shaping process of the AAC lossless enhancement layer of the coder [42] is provided perceptual coder, and thus takes advantage of the AAC by an error-mapping process. The error-mapping process perceptual model. removes the information that has already been coded in the The following sections will describe the principles un- perceptual (AAC) path from the IntMDCT spectral coef- derlying the design of SLS in greater detail by focusing on ficients such that only the resulting (IntMDCT) residuals its IntMDCT filter bank, error-mapping strategy, and en- are coded in the enhancement encoder, and in this way the tropy coding parts. coding efficiency benefits from the underlying AAC layer. The error-mapping process also preserves the probability 2 FILTER BANK distribution skew of the original IntMDCT coefficients and thus permits an efficient encoding of the residual by means 2.1 IntMDCT of two bit-plane coding processes, namely, bit-plane Golomb The IntMDCT, as introduced in [40], is an invertible coding (BPGC) [43] and context-based arithmetic coding integer approximation of the MDCT, which is obtained by (CBAC) [44], and a so-called low-energy-mode encoder, utilizing the “lifting scheme” [45] or “ladder network” all of which will be described later in further detail.

ISO/IEC MPEG-4 High-Definition Scalable Advanced Audio Coding*

DVB DASH Webinar 13 June 2018 DVB DASH: an Overview Simon Waller (Samsung) DVB Codecs and DVB DASH Chris Poole (BBC) DVB DASH in the Wild Martin Schmalohr (IRT)

Lossless Audio Codec Comparison

Overview of Technologies for Immersive Visual Experiences: Capture, Processing, Compression, Standardization and Display

Communications/Information

A Forensic Database for Digital Audio, Video, and Image Media

MPEG Surround Audio Coding on TI Floating-Point Dsps

Ardour Export Redesign

Audio Coding for Digital Broadcasting

Saracon Manual

(A/V Codecs) REDCODE RAW (.R3D) ARRIRAW

Ogg Audio Codec Download

Multimedia Compression Techniques for Streaming