7MPEG Advanced Audio Coding
Total Page:16
File Type:pdf, Size:1020Kb
7 MPEG advanced audio coding MPEG advanced audio coding (AAC) and its successors are currently the most powerful members in the MPEG family for high-quality, multichannel, digital au- dio compression. MPEG AAC provides a various selection of operating modes and configurations, and therefore supports a wide range of applications from low bit- rate Internet audio to multichannel broadcasting services. In this chapter, we will introduce the main features of the AAC multichannel coding system. 7.1. Introduction to advanced audio coding As part of the MPEG-2 standard (ISO/IEC 13818-7) [60], MPEG-2 AAC was fi- nalized in 1997. Initially the MPEG-2 AAC was called the MPEG-2 non-backward compatible (NBC) coding [15]. The standardization of MPEG-2 NBC started after the completion of the MPEG-2 multichannel audio standard in 1994 [59]. At that time, the MPEG audio subgroup wished to develop a new multichannel coding standard that allows an even higher quality other than the first MPEG-2 multi- channel audio standard, which is backward compatible with MPEG-1. The devel- opment goal of this new multichannel audio coding algorithm is to achieve indis- tinguishable quality according to the EBU definition [11] at the bit rate of 384 kbps or lower for five full-bandwidth channel signals, without any backward compati- bility constrain. MPEG-4 AAC, which is also referred to as MPEG-4 general audio (T/F) coder [58], added several new audio tools on top of MPEG-2 AAC. Some of the new tools borrow ideas from speech coding to even lower the encoding bit rate while maintaining the similar sound quality. Others modified certain existing MPEG-2 AAC coding blocks to enable a few desirable features and provide more function- alities. With a typical bit rate of 64 kbps per channel, MPEG-4 AAC can achieve near-transparent coding for CD-quality original input sound. Even at very low bit rates down to 16 kbps, MPEG-4 AAC still exhibits excellent performance. A scalable audio coding standard is also included in MPEG-4 audio. MPEG-4 version 1 provides large-step scalability, which integrated several existing MPEG-4 audio coding schemes to deliver a scalable audio bitstream. AAC is adopted as part of this large-step scalable audio coding architecture to provide enhancement layer bitstream. A fine-grain scalable (FGS) audio framework was developed in MPEG-4 92 MPEG advanced audio coding Scalable sampling rate Main Low complexity 6kHz 12 kHz 18 kHz 20 kHz Figure 7.1. Three MPEG-2 AAC profiles [60]. audio version 2 [65]. The new FGS audio codec is called bit-sliced arithmetic cod- ing (BSAC). It replaces the AAC’s quantization and entropy coding module to en- able the decoder to reconstruct the audio signal more and more accurately. MPEG-4 high-efficiency AAC (HE AAC) is the next generation of AAC stan- dard, which utilizes AAC as the core coder and extends the higher-frequency com- ponent and stereo information with the help of a set of limited parameters. Since this new algorithm is a model-based parametric extension to AAC, it further reduces the bit rate and is capable to reproduce a CD-quality stereo sound at 24 kbps, which is about 64 times smaller than the original file. In this chapter, audio tools in all AAC-related standards including MPEG-2 AAC, MPEG-4 AAC, MPEG-4 BSAC, and MPEG-4 HEAAC will be discussed in detail. 7.2. MPEG-2 AAC 7.2.1. Overview of MPEG-2 AAC 7.2.1.1. Profiles MPEG-2 AAC provides three profiles that can be selected by end-users according to their desired complexity and quality requirements. These three profiles are main profile, low-complexity (LC) profile, and scalable sampling rate (SSR) profile, and their relationships are shown in Figure 7.1. (1) Main profile. AAC main profile, abbreviated as AAC main, provides the best audio quality at any given bit rate among all three profiles. All coding tools except the gain control are enabled in AAC main. Thus both the memory require- ment and the computational complexity of this configuration are higher than the other two profiles. However, the main profile AAC decoder is more powerful and is capable to decode a bitstream generated by a low-complexity AAC encoder. (2) Low-complexity profile. AAC low-complexity profile, abbreviated as AAC LC, does not use the gain control and prediction tools. Moreover, the temporal MPEG-2 AAC 93 Psychoacoustic Threshold model Input signal Block type Quantization Pre- Gain Spectral Transform & processing control processing entropy coding Multiplex Compressed bitstream Figure 7.2. The block diagram of the AAC encoder. noise shaping (TNS) filter has a lower order than that of the AAC main codec. With this configuration, the AAC codec consumes considerably lower memory and processing power requirements. However, degradation of the reconstructed sound quality is not obvious. In fact, among all three profiles, AAC LC is the most popular profile adopted in industry. (3) Scalable sampling rate profile. AAC scalable sampling rate profile, abbrevi- ated as AAC SSR, is capable to provide a frequency scalable signal. The gain control is activated when the AAC codec is operating in this profile. Except the gain con- trol module, other audio tools are operating at the same configuration as the AAC LC profile. Therefore, the memory requirement and the computational complex- ity of the AAC SSR codec are also significantly lower than that of the AAC main codec. 7.2.1.2. High-level block diagrams Figures 7.2 and 7.3 are high-level block diagrams of AAC encoder and decoder, respectively. At the encoder side, input signals are first processed by the preprocessing module if it is necessary. Depending on the required bit rate and the sampling frequency of the input signal, the preprocessing module downsamples the original signal so that better quality can be achieved with the limited bit budget. If the in- put signal has out-of-range amplitude, it will also be processed in this module. If the encoder is operating in SSR profile, signals are then processed by the gain con- trol. Otherwise, they directly go to the transform block to map the time-domain signal to frequency domain and obtain spectral data. A serial of spectral process- ing is followed to remove irrelevant and unimportant information. Finally, the processed spectral data is quantized and noiseless coded. Throughout the entire encoding process, the output from the psychoacoustic model, such as block type 94 MPEG advanced audio coding Compressed bitstream Demultiplex Block type Entropy decoding Inverse Inverse Inverse Post- spectral & transform gain processing inverse quantization processing control Output signal Figure 7.3. The block diagram of the AAC decoder. and masking thresholds, is used as a guide line to control each encoding module. The compressed spectral data as well as that side information generated in each coding module are all multiplexed in the bitstream. When the decoder receives a compressed bitstream, it first demultiplexes it. Compressed spectral data is then decoded by entropy decoder and inverse quan- tizer. Side information extracted from the bitstream is used to control the inverse spectral processing, inverse transform, and inverse gain control if activated. If the original sound input to the encoder has out-of-range amplitude, then the post- processing module restores the reconstructed signal back to its original amplitude range. 7.2.2. Psychoacoustic model Psychoacoustic model plays the most important role in the perceptual audio cod- ing. The output of this model controls almost every major coding block in the AAC encoder. The main task of this psychoacoustic model is to calculate masking thresholds. Any quantization noise added in the later quantization block is imper- ceptible if its energy is smaller than the masking thresholds. Ideally, if all encoding errors are masked by the signal itself, then the reconstructed sound is perceptu- ally indistinguishable from the original input. However, if the available bit budget does not allow all spectral data to meet masking thresholds, psychoacoustic model needs to guide the quantizer so that the minimum perceptual noise will be added by loosening the psychoacoustic requirements. 7.2.3. Gain control The gain control block, which is only activated in the SSR profile, includes a polyp- hase quadrature filter (PQF), gain detectors, and gain modifiers. The PQF filter- bank separates the input signal to four equal-width frequency bands. For example, if input sound has 48 kHz sampling rate, then four-band output from the PQF MPEG-2 AAC 95 filter contains signals of 0–6 kHz, 6–12 kHz, 12–18 kHz, and 18–24 kHz. Hence, the bandwidth scalability can be achieved by discarding one or more upper band signals. The distinct feature of this structure allows the decoder to reconstruct a reduced bandwidth signal with lower computational complexity. The gain detec- tor gathers information from the output of the PQF filter and let the gain modi- fier know which one of these four bands needs amplitude adjustment and how it should be done. The inverse gain control module in the decoder consists of gain compensators and inverse polyphase quadrature filter (IPQF). Four gain compensators corre- sponding to four equal-width frequency bands are required if the full-bandwidth signal needs to be reconstructed. Each gain compensator recovers gain control data then restores the original signal dynamics. The resulting outputs after the gain compensation are combined and synthesized by the IPQF to generate final PCM data. 7.2.4. Transform 7.2.4.1. Modified discrete cosine transform The conversion between time-domain signals as the input of the encoder or the output of the decoder and their corresponding frequency representations is a fun- damental component of the MPEG AAC audio coder.