Trends in Standardization of Audio Coding Technologies
FEATURE

Tomoyasu Komori, Advanced Television Systems Research Division

An ordinance from the Ministry of Internal Affairs and Communications (MIC) was issued in 2011 on revision of the audio coding formats of 8K Super Hi-Vision (8K) broadcasting using 22.2 multichannel (22.2 ch) sound. The ordinance makes it possible to use 22.2 ch sound in broadcasting satellite (BS) digital broadcasts and other media. In particular, it specifies that digital broadcast audio formats conform to either MPEG-4 Advanced Audio Coding (AAC) or Audio Lossless Coding (ALS). The Association of Radio Industries and Businesses (ARIB) revised ARIB STD-B32 accordingly. These revisions set the maximum number of audio input channels for digital broadcasts to “22 channels and two low-frequency effect (LFE) channels”, and added MPEG-4 AAC and ALS to the available formats. This article describes the latest trends in standardization and audio coding formats for 3D sound.

1. Introduction

In Japan, audio encoding formats were revised in 2011 by the issuing of MIC ordinance No. 87, “Standard digital broadcasting formats for television broadcasting1)”, to enable 8K broadcasts with 22.2 ch sound. The ordinance increases the maximum number of input audio channels for BS and Communications Satellite (CS) digital broadcasts from 5.1 ch (5 channels and 1 LFE channel) to 22.2 ch (22 channels and two LFE channels). Audio encodings for 8K broadcasts were also regulated to conform to the MPEG-4 AAC standard2), which is the most efficient lossy compression coding, or to MPEG-4 ALS3), which is a lossless compression coding.

ARIB revised its standard, ARIB STD-B32, “Video codings, audio codings and multiplexing methods for digital broadcasting4)”, in response to the MIC ordinance. In this revision, regulations were added with detailed specifications supporting 22.2 ch audio modes in the MPEG-4 AAC audio coding5). For MPEG-4 ALS audio coding, regulations were added on the number of channels and constraints such as the prediction order.

This article describes these trends in international and domestic standardization and introduces the latest 3D audio coding scheme, called MPEG-H 3D Audio, which was standardized in February 2015.

2. Overview of 22.2 ch sound

22.2 ch is a 3D sound format with a total of 24 channels arranged in three layers6). There are nine channels in the top layer, above the viewing position, ten channels in the middle layer, at the level of the viewers’ ears, three channels in the bottom layer, below the viewer’s position, and two LFE channels. The arrangement and labels of the channels in the 22.2 ch sound system are shown in Figure 1. NHK set requirements for a highly realistic sound format suitable for 8K broadcasts, conducted subjective evaluations showing that the 22.2 ch sound system meets these requirements, and has been contributing to standardization of the format in Japan and internationally6).

Figure 1: 22.2 ch audio channel placement and labels
Top layer (9 channels): TpFL, TpFC, TpFR, TpSiL, TpC, TpSiR, TpBL, TpBC, TpBR
Middle layer (10 channels): FL, FLc, FC, FRc, FR, SiL, SiR, BL, BC, BR
Bottom layer (3 channels + 2 LFE channels): BtFL, BtFC, BtFR, LFE1, LFE2

3. Overview of MPEG-4 AAC standard and ALS standard

3.1 Compression encoding technology for audio

There are two main types of encoding technology used for compression of audio signals.
(a) Coding methods that consider auditory characteristics: with these methods, the distortion produced by the encoding is either completely or almost completely undetectable acoustically, even with compression.
(b) Methods that attempt to eliminate redundancy in the audio data using techniques such as waveform prediction or statistical methods: if the original signal can be perfectly reproduced from the received data, it is called a lossless encoding.
AAC is a type (a) method, while ALS is a type (b) method.

3.2 Overview of MPEG-4 AAC

MPEG-4 AAC is standardized in International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) 14496-3 Subpart 4. MPEG-4 AAC is an extension of MPEG-2 AAC (ISO/IEC 13818-7)7); it can efficiently encode audio signals such as music and can handle multichannel signals such as 22.2 ch in addition to monaural and stereo.

MPEG-4 AAC is a type of frequency-domain compression encoding, which encodes by analyzing frequency components of the audio signal and using techniques such as masking*1 to achieve high compression rates by exploiting the characteristics of human hearing. A block diagram of audio encoding using auditory characteristics is shown in Figure 2. To break down audio into frequency components, MPEG-4 AAC uses a “transform coding” method, which uses the Modified Discrete Cosine Transform (MDCT) to convert the signal directly into a frequency domain signal. When performing transform coding, the long window (block) used to transform the signal from the time domain into the frequency domain is 2,048 samples, but this can be changed adaptively to 256-sample blocks if a finer time resolution is needed.

MPEG-4 AAC has several audio object types*2, but broadcast services currently only use “Low Complexity” (LC), which has a good balance between the size of the decoder circuit and sound quality.

With MPEG-4 AAC, almost no distortion due to encoding can be detected, even when compressing a stereo signal to approximately 1/12 of its original size, into the range of 128 to 144 kbps.

Figure 2: Block diagram of audio coding using psychoacoustic model (blocks: Audio signal; Time ⇒ Frequency transform; Quantization, Encoding; Bitstream formatting; Encoded bitstream; Psychoacoustic model)

*1 The phenomenon by which a sound is obscured by another sound so that it cannot be heard or seems as though its volume is low.
*2 A classification in MPEG-4 audio according to the codecs and tools that can be used.

3.3 Differences between MPEG-2 AAC and MPEG-4 AAC

MPEG-2 AAC (ISO/IEC 13818-7) and MPEG-4 AAC (ISO/IEC 14496-3 Subpart 4) use almost the same tools for compressing audio signals, but MPEG-4 AAC adds an encoding tool called Perceptual Noise Substitution (PNS)*3. When encoding audio, much of the required bit rate is for transmitting the MDCT coefficients obtained from transforming the audio signal into the frequency domain. PNS reduces the bit rate by treating noise-like signals within a scale-factor band*4 as noise and sending only the power information for that band. That information is then used to add noise of a suitable level when reconstructing the audio signal during decoding.

*3 A tool that replaces noise with a small amount of data when encoding signals and adds a noise waveform at the receiving side.
*4 A group of MDCT coefficients for neighboring frequencies.

3.4 Overview of MPEG-4 ALS

MPEG-4 ALS was standardized as ISO/IEC 14496-3:2007 Amd.2 MPEG-4 Audio Lossless Coding in March 2006. It is a type of lossless encoding that predicts each sample from past samples using linear prediction and can exactly reproduce the original waveform, even for multichannel signals and signals with high sampling rates. The input audio signal is analyzed in order to calculate the linear prediction parameters and the prediction residual. The parameters and residual are variable-length encoded and formatted into the encoded bitstream (Figure 3). The amplitude of the prediction residual is generally small compared with the original signal, and this characteristic can be used to compress the amount of data relative to the uncompressed data by 15% to 70%.

Figure 3: Basic architecture of MPEG-4 ALS encoding and decoding (blocks: Audio signal; Linear predictive coding; Linear predictive parameters and Prediction error, each followed by Variable length coding; Bitstream formatting; Encoded bitstream)

4. ARIB STD-B32 revisions

Several revisions to ARIB STD-B32 were made to support ultra-high-definition television in advanced BS digital broadcasts. In addition to supporting 22.2 ch audio input signals, functionality was standardized for the down-mixing*5 parameters used when 22.2 ch audio encoded in MPEG-4 AAC is received on devices with 5.1 ch audio or stereo, along with formats for transmitting these parameters. Dialog enhancement*6 and dialog switching functions*7 were also introduced to extend conventional broadcast services. There are also some restrictions on the parameters that can be used with MPEG-4 ALS.

Note that in the MPEG-4 audio encoding standard, there is a wide range of sampling frequencies and numbers of channels that can be used, but ordinances and bulletins from MIC, and the ARIB standards, specify that 8K broadcasts must use a sampling frequency of 48 kHz and quantization of 16 bits or greater. Table 1 gives technical formats for audio applicable to each digital broadcast standard format (from 2011 MIC ordinances No. 87 and No. 94).

Separate numbers were also assigned in the MPEG-4 audio encoding standard for commonly used audio systems such as two-channel stereo and 5.1 ch audio. Table 2 gives the numbering for the channel configurations and number of channels usable with MPEG-4 AAC and ALS. Note that 22.2 ch audio is assigned the number 13.

*5 A way of converting a multi-channel audio signal into a signal consisting of fewer channels.
*6 A function that allows the volume of dialog (voices) within a

4.1 Revisions for transmitting AAC down-mix coefficients

When down-mixing from multichannel stereo with more than 5.1 channels (audio modes with channel configuration numbers 7, 11, 12, 13, and 14) to two-channel stereo, the signals are first down-mixed to 5.1 ch sound, and then to two-channel stereo.
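The two-stage order described above can be illustrated with a minimal sketch. Everything numeric here is an assumption for illustration: the gain `k`, the omission of the LFE channel from the stereo mix, the output labels `Lt`/`Rt`, and the toy first-stage matrix `TOY_TO_5_1` are hypothetical and are not the coefficients defined in ARIB STD-B32. The function names are likewise invented for this example. Only the set of channel configuration numbers and the order of the two stages come from the text.

```python
import math

# Channel configuration numbers that the text lists as having more than 5.1 channels.
MULTISTAGE_CONFIGS = {7, 11, 12, 13, 14}


def apply_matrix(matrix, frame):
    """Mix one frame of samples; matrix maps output label -> {input label: gain}."""
    return {
        out: sum(gain * frame[src] for src, gain in gains.items())
        for out, gains in matrix.items()
    }


def stereo_from_5_1(frame, k=1.0 / math.sqrt(2.0)):
    """Second stage: 5.1 ch to two-channel stereo.

    ASSUMPTION: the gain k and dropping the LFE channel are illustrative only;
    in a broadcast the coefficients are transmitted parameters.
    """
    matrix = {
        "Lt": {"L": 1.0, "C": k, "Ls": k},
        "Rt": {"R": 1.0, "C": k, "Rs": k},
    }
    return apply_matrix(matrix, frame)


def downmix_to_stereo(frame, config_number, to_5_1_matrix=None):
    """Down-mix to stereo, going through 5.1 ch first when the text requires it."""
    if config_number in MULTISTAGE_CONFIGS:
        if to_5_1_matrix is None:
            raise ValueError("a first-stage (to 5.1 ch) matrix is required")
        frame = apply_matrix(to_5_1_matrix, frame)  # stage 1: to 5.1 ch
    return stereo_from_5_1(frame)                   # stage 2: to stereo


# HYPOTHETICAL first-stage matrix using only a few of the 22.2 ch labels;
# the remaining channels are left out to keep the example short.
TOY_TO_5_1 = {
    "L": {"FL": 1.0},
    "R": {"FR": 1.0},
    "C": {"FC": 1.0},
    "LFE": {"LFE1": 1.0},
    "Ls": {"SiL": 0.5, "BL": 0.5},
    "Rs": {"SiR": 0.5, "BR": 0.5},
}

toy_frame = {"FL": 1.0, "FR": 0.0, "FC": 0.0, "LFE1": 0.0,
             "SiL": 1.0, "SiR": 0.0, "BL": 0.0, "BR": 0.0}
stereo = downmix_to_stereo(toy_frame, config_number=13, to_5_1_matrix=TOY_TO_5_1)
print(stereo)
```

Keeping each stage as a separate gain matrix mirrors the order given in the text: a receiver with 5.1 ch output can stop after the first stage, while a stereo receiver applies both.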