Digital Audio Compression

Digital Audio Compression By Davis Yen Pan begins with a summary of the basic audio digitiza- Abstract tion process. The next two sections present detailed descriptions of two relatively simple approaches to Compared to most digital data types, with the audio compression: "-law and adaptive differential exception of digital video, the data rates associ- pulse code modulation. In the following section, the ated with uncompressed digital audio are substan- paper gives an overview of a third, much more so- tial. Digital audio compression enables more effi- phisticated, compression audio algorithm from the cient storage and transmission of audio data. The Motion Picture Experts Group. The topics covered many forms of audio compression techniques offer a in this section are quite complex and are intended range of encoder and decoder complexity, compressed for the reader who is familiar with digital signal audio quality, and differing amounts of data com- processing. The paper concludes with a discussion pression. The "-law transformation and ADPCM of software-only real-time implementations. coder are simple approaches with low-complexity, low-compression, and medium audio quality algo- Digital Audio Data rithms. The MPEG/audio standard is a high- complexity, high-compression, and high audio qual- The digital representation of audio data offers ity algorithm. These techniques apply to general au- many advantages: high noise immunity, stability, dio signals and are not specifically tuned for speech and reproducibility. Audio in digital form also al- signals. lows the efficient implementation of many audio processing functions (e.g., mixing, filtering, and equalization) through the digital computer. Introduction The conversion from the analog to the digital do- Digital audio compression allows the efficient stor- main begins by sampling the audio input in regular, age and transmission of audio data. The various au- discrete intervals of time and quantizing the sam- dio compression techniques offer different levels of pled values into a discrete number of evenly spaced complexity, compressed audio quality, and amount levels. The digital audio data consists of a sequence of data compression. of binary values representing the number of quan- This paper is a survey of techniques used to com- tizer levels for each audio sample. The method of press digital audio signals. Its intent is to provide representing each sample with an independent code useful information for readers of all levels of ex- word is called pulse code modulation (PCM). Figure perience with digital audio processing. The paper 1 shows the digital audio process. 00110111000... 11001100100... ANALOG ANALOG AUDIO PCM PCM AUDIO INPUT VALUES VALUES OUTPUT ANALOG-TO-DIGITAL DIGITAL SIGNAL DIGITAL-TO-ANALOG CONVERSION PROCESSING CONVERSION Figure 1 Digital Audio Process Digital Technical Journal Vol. 5 No. 2, Spring 1993 1 Digital Audio Compression According to the Nyquist theory, a time-sampled signals.[3,4] The federal standards 1015 LPC (linear signal can faithfully represent signals up to half the predictive coding) and 1016 CELP (coded excited lin- sampling rate.[1] Typical sampling rates range from ear prediction) fall into this category of audio com- 8 kilohertz (kHz) to 48 kHz. The 8-kHz rate covers pression. a frequency range up to 4 kHz and so covers most of the frequencies produced by the human voice. The 48-kHz rate covers a frequency range up to 24 kHz "-law Audio Compression and more than adequately covers the entire audi- " ble frequency range, which for humans typically ex- The -law transformation is a basic audio compres- tends to only 20 kHz. In practice, the frequency sion technique specified by the Comité Consultatif range is somewhat less than half the sampling rate Internationale de Télégraphique et Téléphonique because of the practical system limitations. (CCITT) Recommendation G.711.[5] The transformation is essentially logarithmic in nature and al- The number of quantizer levels is typically a power lows the 8 bits per sample output codes to cover of 2 to make full use of a fixed number of bits per au- a dynamic range equivalent to 14 bits of linearly dio sample to represent the quantized values. With quantized values. This transformation offers a com- uniform quantizer step spacing, each additional bit pression ratio of (number of bits per source sample) has the potential of increasing the signal-to-noise /8 to 1. Unlike linear quantization, the logarithmic ratio, or equivalently the dynamic range, of the step spacings represent low-amplitude audio sam- quantized amplitude by roughly 6 decibels (dB). The ples with greater accuracy than higher-amplitude typical number of bits per sample used for digital values. Thus the signal-to-noise ratio of the trans- audio ranges from 8 to 16. The dynamic range ca- formed output is more uniform over the range of pability of these representations thus ranges from amplitudes of the input signal. The "-law transfor- 48 to 96 dB, respectively. To put these ranges into mation is perspective, if 0 dB represents the weakest audi- & ' ble sound pressure level, then 25 dB is the mini- PSS 127 ln@I C "jxjA for x ! H y a ln(1+") mum noise level in a typical recording studio, 35 dB IPU 127 ln@I C "jxjA for x ` H ln(1+") is the noise level inside a quiet home, and 120 dB is the loudest level before discomfort begins.[2] In terms of audio perception, 1 dB is the minimum au- dible change in sound pressure level under the best where m = 255, and x is the value of the input sig- conditions, and doubling the sound pressure level nal normalized to have a maximum value of 1. The amounts to one perceptual step in loudness. CCITT Recommendation G.711 also specifies a similar A-law transformation. The "-law transformation Compared to most digital data types (digital video is in common use in North America and Japan for excluded), the data rates associated with uncom- the Integrated Services Digital Network (ISDN) 8- pressed digital audio are substantial. For example, kHz-sampled, voice-grade, digital telephony service, the audio data on a compact disc (2 channels of au- and the A-law transformation is used elsewhere for dio sampled at 44.1 kHz with 16 bits per sample) re- the ISDN telephony. quires a data rate of about 1.4 megabits per second. There is a clear need for some form of compression to enable the more efficient storage and transmis- Adaptive Differential Pulse Code sion of this data. Modulation The many forms of audio compression techniques Figure 2 shows a simplified block diagram of an differ in the trade-offs between encoder and decoder adaptive differential pulse code modulation (AD- complexity, the compressed audio quality, and the PCM) coder.[6] For the sake of clarity, the figure amount of data compression. The techniques pre- omits details such as bit-stream formatting, the pos- sented in the following sections of this paper cover sible use of side information, and the adaptation the full range from the "-law, a low-complexity, blocks. The ADPCM coder takes advantage of the low-compression, and medium audio quality algo- fact that neighboring audio samples are generally rithm, to MPEG/audio, a high-complexity, high- similar to each other. Instead of representing each compression, and high audio quality algorithm. audio sample independently as in PCM, an ADPCM These techniques apply to general audio signals and encoder computes the difference between each au- are not specifically tuned for speech signals. This dio sample and its predicted value and outputs the paper does not cover audio compression algorithms PCM value of the differential. Note that the AD- designed specifically for speech signals. These al- PCM encoder (Figure 2a) uses most of the compo- gorithms are generally based on a modeling of the nents of the ADPCM decoder (Figure 2b) to compute vocal tract and do not work well for nonspeech audio the predicted values. 2 Digital Technical Journal Vol. 5 No. 2, Spring 1993 Digital Audio Compression The following section describes the ADPCM algorithm proposed by the Interactive Multimedia Asso- ciation (IMA). This algorithm offers a compression X[n] + D[n] C[n] + (ADAPTIVE) factor of (number of bits per source sample)/4 to 1. QUANTIZER – Other ADPCM audio compression schemes include the CCITT Recommendation G.721 (32 kilobits per second compressed data rate) and Recommendation Xp[n – 1] (ADAPTIVE) (ADAPTIVE) G.723 (24 kilobits per second compressed data rate) PREDICTOR DEQUANTIZER Xp[n] standards and the compact disc interactive audio + Dq[n] compression algorithm.[7,8] + + The IMA ADPCM Algorithm. The IMA is a consor- tium of computer hardware and software vendors cooperating to develop a de facto standard for com- (a) ADPCM Encoder puter multimedia data. The IMA’s goal for its audio compression proposal was to select a public-domain audio compression algorithm able to provide good C[n] Dq[n] + Xp[n] compressed audio quality with good data compres- (ADAPTIVE) + DEQUANTIZER sion performance. In addition, the algorithm had to + be simple enough to enable software-only, real-time Xp[n – 1] (ADAPTIVE) decompression of stereo, 44.1-kHz-sampled, audio PREDICTOR signals on a 20-megahertz (MHz) 386-class computer. The selected ADPCM algorithm not only meets these goals, but is also simple enough to en- (b) ADPCM Decoder able software-only, real-time encoding on the same computer. Figure 2 ADPCM Compression and Decompression The simplicity of the IMA ADPCM proposal lies in the crudity of its predictor. The predicted value of the audio sample is simply the decoded value of the immediately previous audio sample. Thus the The quantizer output is generally only a (signed) predictor block in Figure 2 is merely a time-delay representation of the number of quantizer levels.

Load more