A New Model-Based Algorithm for Optimizing the MPEG-AAC in MS-Stereo Olivier Derrien, Gael Richard

A new model-based algorithm for optimizing the MPEG-AAC in MS-stereo Olivier Derrien, Gael Richard To cite this version: Olivier Derrien, Gael Richard. A new model-based algorithm for optimizing the MPEG-AAC in MS- stereo. IEEE Transactions on Audio, Speech and Language Processing, Institute of Electrical and Electronics Engineers, 2008, 16 (8), pp.1373-1382. 10.1109/TASL.2008.2002068. hal-00467510 HAL Id: hal-00467510 https://hal.archives-ouvertes.fr/hal-00467510 Submitted on 26 Mar 2010 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. 1 A new model-based algorithm for optimizing the MPEG-AAC in MS-stereo Olivier Derrien and Gaël Richard, Senior Member, IEEE Abstract— In this paper, a new model-based algorithm Middle (M) and Sides (S) channels. M and S are coded for optimizing the MPEG-Advanced Audio Coder (AAC) instead of L and R and the reverse transformation is per- in MS-stereo mode is presented. This algorithm is an ex- tension to stereo signals of prior work on a statistical model formed at the decoder side. In MPEG-AAC, LR and MS of quantization noise. Traditionally, MS-stereo coding ap- mode can be used alternatively for each frequency subband proaches replace the Left (L) and Right (R) channels by the and each frame. The moderate coding gain is compen- Middle (M) and Sides (S) channels, each channel being in- sated by a small amount of side-information (one bit per dependently processed, almost like a monophonic signal. In contrast, our method proposes a global approach for coding subband). Other linear transformations have been pro- both channels in the same process. A model for the quan- posed in the literature: Inter-channel decorrelation with tization error allows us to tune the quantizers on channels a Karhunen-Loeve transform [5], inter-channel prediction M and S with respect to a distortion constraint on the re- constructed channels L and R as they will appear in the [6], [7], and more recently a time-aligned version of the MS decoder. This approach leads to a more efficient percep- transformation [8]. With all these techniques, the coding tual noise-shaping and avoids using complex psychoacoustic gain is increased for some signals, but at the expense of models built on the M and S channels. Furthermore, it pro- vides a straightforward scheme to choose between LR and additional side information. MS modes in each subband for each frame. Subjective lis- Parametric stereo coding is another popular scheme for tening tests prove that the coding efficiency at a medium increasing the coding efficiency. A core monophonic coder bitrate (96 kbits/s for both channels) is significantly bet- is used in combination with additional parameters that de- ter with our algorithm than with the standard algorithm, without increase of complexity. scribe the stereo information. The resulting auxiliary bit- Keywords— Perceptual audio coding, MPEG-AAC, MS- stream usually requires very few coding bits, but the orig- stereo, statistical model, quantization, scalefactor, bitrate inal signals in channels L and R can not be totally recov- constraint, distortion constraint, optimization algorithm. ered, even for very high bitrates. Thus, parametric stereo schemes are suitable for low bitrate applications. Orig- I. Introduction inally, a simple parametric stereo mode, called Intensity Stereo (IS), was specified in the MPEG-AAC standard. The MPEG-4 Advanced Audio Coder (AAC) is the lat- It consists of coding only the M channel and an inter- est international standard for high-quality lossy audio cod- channel intensity difference parameter for each subband. ing [1], [2]. Its application field is still expanding, includ- Since, many studies have been carried out on parametric ing consumer audio equipment and digital video broad- stereo (see for instance [9], [10]) and in the latest exten- casting. This codec has been derived in several profiles sion of MPEG-AAC, called HE-AAC v2 [11], the paramet- i.e. variations, for different applications: Low Complexity ric stereo mode can be considered as an improved version (LC-AAC), Low Delay (LD-AAC), High Efficiency (HE- of the original IS mode: more parameters can be used AAC/AACPlus) etc. The MPEG-AAC is a frame-based to describe the stereo image (intensity difference, cross- transform-coder. Its apparent complexity is due to a large correlation, phase/time difference). However, the typical variety of coding parameters, which make the optimization bitrate for the HE-AAC v2 is quite low: 24kbps for both process difficult to engineer and recent publications show channels. that AAC optimization is still a current issue [3]. In this paper, we consider high-quality/high-bitrate ap- The MPEG-AAC is a multichannel codec, designed for plications and focus on the MS-stereo mode for the MPEG- stereo and surround audio applications. An AAC audio AAC, especially on the implementation of the optimization stream can include single channels and channel pairs. A algorithm which is strongly related to the coding efficiency. single channel corresponds to a monophonic audio scene, a In a previous paper [12], we proposed a new algorithm for channel pair to a stereophonic scene (Left and Right chan- the single channel case, based on a statistical model of the nels). With the basic coding scheme for a channel pair, quantization noise. In the informative annex of the MPEG- called LR-stereo in MPEG-AAC, each channel is processed AAC standard [1], an implementation of the coding algo- as a monophonic signal. However, when a stereo signal ex- rithm is described. In this paper, it will be referred to as hibits significant inter-channel redundancy, the LR mode is the standard algorithm. Compared to this algorithm, our quite ineffective. Improving the coding efficiency by remov- method exhibits a lower complexity and a better sound ing the redundancy is possible with stereo coding modes. quality for the same bitrate. In this paper, we extend this A popular method for inter-channel decorrelation is the model to the MS-stereo case, and propose a new efficient sum-difference transformation [4]. This technique, also re- algorithm for coding a channel pair. ferred to as MS joint channel coding, consists of a linear This article is divided in three parts. First, we briefly combination of the Left (L) and Right (R) channels to get describe the MPEG-AAC codec and the MS-stereo mode. 2 CODER Optimizationloop Psychoacoustic Sub-band Scalefactor model Masking optimization Threshold Scalefactors Time- Huffman Bit- domain MDCT Quantized MDCT Sub-band coding stream signal coefficients quantization coefficients audiosignal DECODER control Scalefactors Bit- Huffman Time- decoding Quantized Reverse MDCT stream iMDCT domain coefficients quantization coefficients signal Fig. 1. Synopsis of a MPEG AAC codec. Then, after recalling the main results of the monophonic consists of finding the scaling parameters A(s) which max- model, we describe our stereophonic model and the new op- imize the audio quality under a bitrate constraint. This timization algorithm. Finally, we compare our algorithm to is generally implemented with an iterative algorithm in- the standard MPEG-AAC, both in terms of audio quality cluding quantization and Huffman coding modules, and a and computational complexity. psycho-acoustic model. Both psychoacoustic model and optimization algorithm are not specified in the standard, II. MPEG-AAC MS-stereo mode in order to allow for future advances in technology that will A. Quantization and coding improve the coding efficiency. Figure 1 presents the general scheme of a MPEG-AAC B. The MS-stereo mode codec. The audio signal is segmented in variable-length analysis windows (256 or 2048 samples) with 50% over- When the audio signal is a channel pair, we denote XL(k) lap. Over each window, the signal is transformed in a fre- and XR(k) the MDCT coefficients corresponding respec- quency domain with a Modified Discrete Cosine Transform tively to channels Left and Right. The MS transformation (MDCT) [13]. In this paper, we denote X(k) the MDCT is defined by: coefficients corresponding to a single channel, over the cur- X k 1 X k X k rent analysis window. k is a frequency index. Variable M ( ) = 2 [ L( )+ R( )] (3) length frequency subbands are defined as non-overlapping 1 XS(k) = [XL(k) XR(k)] subsets of frequency indexes: k kmin(s) kmax(s) 2 − where s is a subband index. Subband∈ { width K· · ·increases} It can be used independently for each subband. A one- along the frequency scale. bit flag per subband indicates whether the MS transforma- The MDCT coefficients are quantized subband by sub- tion is used. In MS mode, X and X are quantized and band according to: M S coded instead of XL and XR. On the decoder side, the re- 3 ˆ ˆ X(k) 4 constructed MDCT coefficients XM and XS are obtained i(k)= (1) after the reverse quantization process. Finally the reverse R A(s) ! MS transformation is performed: where A(s) is a scaling parameter, is a rounding func- Xˆ (k) = Xˆ (k)+ Xˆ (k) tion and i(k) are the quantization indexes.R A(s) follows a L M S (4) logarithmic scale: 1 φ s XˆR(k) = XˆM (k) XˆS(k) A(s)=2 4 ( ) (2) − where φ is an integer parameter called scalefactor.

Load more