White Paper

Reducing the complexity of sub-band ADPCM coding to enable high-quality audio streaming from mobile devices

A technical white paper by Neil Smyth and David Trainor, APTX
www.aptx.com


ABSTRACT

The number of consumer audio applications demanding high quality audio compression and communication across networks continues to grow. While consumers increasingly demand higher audio quality, devices such as portable media players and wireless headsets also require low computational complexity, low power dissipation and practical transmission bit rates to help conserve battery life. This paper discusses research undertaken to lower the complexity of existing high-quality sub-band ADPCM coding schemes to better satisfy these conflicting criteria.

INTRODUCTION

The widespread popularity of portable media devices has allowed consumers to enjoy audio/video entertainment, games and online content at their convenience. The only limitation is the awkward necessity of wires to connect these devices to displays, earphones or car/home entertainment systems. Wireless audio streaming removes this constraint, isolating the speaker device and allowing users to keep their mobile devices in a safe or unobtrusive location during playback.

A major obstacle to wireless audio streaming is that the mobile device and/or speaker system are typically power restricted. If the application involves video playback there is also a requirement for low latency in order to provide acceptable lip-synch. These requirements have led to the use of ADPCM coders rather than perceptual coders in wireless audio/video devices, due to their relatively low complexity and latency. As the storage capacities of these devices continue to grow, consumers increasingly exploit the quality benefits of higher bit rates when compressing their audio content. As consumers become accustomed to high quality wired audio playback they will demand that this quality is maintained when enjoying the convenience of wireless streaming. All of these factors lead to the opposing design constraints of high quality and low complexity.

A number of audio compression algorithms are used for wireless audio streaming, including SBC [1] and Enhanced apt-X [2]. The Philips-developed SBC algorithm was selected as a mandatory codec for use in Bluetooth to ensure interoperability between products. Bluetooth SBC is a frame-based variable rate APCM codec with low complexity processing overhead, an adaptive quantization step size and lower latency than optional Bluetooth codecs such as MP3 and AAC.


Enhanced apt-X is a frameless ADPCM codec (see Figure 1 for a basic overview) with a fixed compression ratio of 4:1, an adaptive quantization step size and predictive differential coding. The use of prediction and differential coding in such an ADPCM codec provides reduced levels of quantization noise and thus quality improvements over an APCM codec. However, a significant obstacle to deployment of ADPCM codecs in comparison to APCM is the additional computational cost associated with the prediction process.

Fig. 1: ADPCM sub-band codec block diagram: (a) ADPCM sub-band codec, (b) ADPCM encoder, (c) ADPCM decoder

The difference in latency between the SBC and Enhanced apt-X codecs should also be considered. The frameless structure of Enhanced apt-X provides extremely low latencies. In contrast, the framed SBC codec requires buffering of frame data prior to encoding, thereby incurring additional algorithmic delay. When considering the impact of framed and frameless coding, the buffering and transmission delays incurred in the wireless transmission system should also be considered. If the wireless transmission system is designed to accommodate low latency operation, the low latencies associated with a frameless streaming codec such as Enhanced apt-X will be apparent and desirable.

The contrasting algorithmic structures of SBC and Enhanced apt-X have led these two algorithms to be adopted as the basis for research into lowering the complexity of high-quality sub-band ADPCM coding.

The research goals of lowering the complexity of ADPCM coding are as follows:

• Reduce the power dissipation
• Lower the computational complexity
• Enhance the audio quality
• Reduce memory storage/access requirements

This paper outlines the results achieved thus far in developing a high quality audio codec for use in wireless applications. Here we describe how careful selection of the sub-band filter bank can reduce processing requirements while maintaining audio quality. It is also explained how the additional complexity associated with ADPCM coding, as opposed to APCM coding, can be mitigated by careful application of adaptive prediction techniques.

Additional methods for reducing the computational complexity are also described, highlighting their advantages and disadvantages. These methods include utilizing stereo intensity coding to discard perceptually unimportant side channel information, using large frame lengths to improve processing efficiency and introducing favourable conditions for entropy coding in order to achieve significant bit-rate and computational savings.

SUB-BANDING FILTER BANK ANALYSIS

The sub-band filter bank is a major component of an APCM/ADPCM codec in terms of computational complexity and audio quality. The primary purpose of the sub-band filter bank is to translate the input PCM samples into a number of sub-bands that can be independently coded. This enables the frequency-dependent psychoacoustic properties of the human hearing system to be exploited.

A high quality and low complexity sub-band filter bank design is obtained by balancing these two competing requirements. The full spectrum of target hardware must be considered (RISC microprocessor, DSP, FPGA, ASIC) to account for the wide variety of applications and scenarios in which such a codec could be deployed. This requires some general measures of complexity to be used for comparative purposes. Perhaps the most important measure in terms of power dissipation is the execution time of the algorithm. The faster the filter bank can complete its task, the greater the possibility of the hardware platform entering a low power state. If the filter bank architecture and hardware platform allow parallel processing to occur, it is desirable for the implementation to employ these features. For software implementations, however, it is known that lowering the execution time is preferable to incurring the increased power dissipation associated with parallelism [3, 4, 5].

A filter bank that requires a relatively low number of memory accesses has two possible advantages: (a) lower power dissipation attributed to less switching of the data and address buses, and (b) faster execution time attributed to fewer memory access delays.

The following analysis considers four different filter bank architectures and discusses their relative advantages and disadvantages.

QUASI-LINEAR PHASE IIR

An IIR filter can be used to create a quasi-linear phase halfband filter that maintains approximately linear phase within the passband. Quasi-linear phase IIR (QLPIIR) filters are an interesting alternative to the FIR filters used in the QMF due to the reduced number of coefficients required to achieve the same transition roll-off and stopband attenuation characteristics. In this research MATLAB has been used to design a halfband quasi-linear phase IIR filter with a stopband attenuation of 70 dB. The analysis IIR filter prototype is described in Figure 2.

The allpass filters are defined as follows:
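In generic form (a standard structure, shown here as an assumption rather than the specific 70 dB MATLAB design), a quasi-linear phase halfband IIR pairs a pure delay branch with a cascade of allpass sections:

```latex
% Standard two-branch halfband IIR structure (generic form, not the
% paper's specific design): replacing one polyphase allpass branch
% with a pure delay z^{-M} (M odd) is what yields the approximately
% linear passband phase.
H_{\mathrm{LP}}(z) = \tfrac{1}{2}\!\left(z^{-M} + A(z^{2})\right),
\qquad
H_{\mathrm{HP}}(z) = \tfrac{1}{2}\!\left(z^{-M} - A(z^{2})\right),
\qquad
A(z^{2}) = \prod_{k} \frac{c_{k} + z^{-2}}{1 + c_{k} z^{-2}}
```

Because the two branches share the allpass outputs, the low-pass and high-pass halves are obtained with one addition and one subtraction, which is part of the coefficient saving over an FIR QMF.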


Fig. 2: QLP IIR prototype analysis filter

WAVELET PACKET DECOMPOSITION

Similarly to the QMF and quasi-linear phase IIR, the high-pass and low-pass components of the discrete wavelet transform (DWT) can be used in the construction of a multi-level wavelet packet decomposition (WPD) by means of a network of halfband filters.

DAUBECHIES 4 AND 6

The Daubechies 4 and Daubechies 6 DWTs were evaluated. They possess an irregular processing structure with a relatively high number of coefficients compared to CDF 9/7 (see below), but they offer a lower latency.

COHEN-DAUBECHIES-FEAUVEAU (CDF) 9/7

The Cohen-Daubechies-Feauveau 9/7 wavelet belongs to a family of biorthogonal wavelets that offer an invertible and symmetric structure. Lifting decomposition of CDF 9/7 provides the polyphase matrix representation of Equation 4, also described by Figure 3. The inverse CDF 9/7 wavelet transform is implemented by simple inversion of the forward transform. The lifting scheme coefficients of CDF 9/7 are all derived from an irrational number. In a fixed-point implementation the use of an irrational number results in rounding differences and quantization distortion between the forward and inverse wavelet transforms. This distortion can be reduced by modifying the underlying irrational number [6].


Fig. 3: CDF 9/7 discrete wavelet transform using lifting decomposition
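To make the lifting structure concrete, the sketch below implements the forward transform in floating-point C++ using the widely published lifting constants. It is a generic illustration, not the paper's fixed-point Kalimba/ARM implementation, and the boundary handling and final scaling convention are assumptions.

```cpp
#include <cstddef>
#include <vector>

// Minimal floating-point sketch of the forward CDF 9/7 transform via
// lifting, using the widely published Daubechies-Sweldens lifting
// coefficients. The paper's fixed-point variant and the rounding
// refinement of [6] are not reproduced; boundary handling is
// simplified to edge clamping rather than full symmetric extension.
void cdf97_forward(std::vector<double>& x)
{
    const double alpha = -1.586134342;   // first predict step
    const double beta  = -0.052980119;   // first update step
    const double gamma =  0.882911076;   // second predict step
    const double delta =  0.443506852;   // second update step
    const double zeta  =  1.149604398;   // normalization

    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(x.size());

    // Clamped access stands in for symmetric boundary extension.
    auto at = [&](std::ptrdiff_t i) {
        if (i < 0) i = 0;
        if (i >= n) i = n - 1;
        return x[static_cast<std::size_t>(i)];
    };

    for (std::ptrdiff_t i = 1; i < n; i += 2)   // predict 1 (odd samples)
        x[i] += alpha * (at(i - 1) + at(i + 1));
    for (std::ptrdiff_t i = 0; i < n; i += 2)   // update 1 (even samples)
        x[i] += beta * (at(i - 1) + at(i + 1));
    for (std::ptrdiff_t i = 1; i < n; i += 2)   // predict 2
        x[i] += gamma * (at(i - 1) + at(i + 1));
    for (std::ptrdiff_t i = 0; i < n; i += 2)   // update 2
        x[i] += delta * (at(i - 1) + at(i + 1));

    // Scaling conventions differ between references; one common choice:
    for (std::ptrdiff_t i = 0; i < n; ++i)
        x[i] = (i % 2 == 0) ? x[i] * zeta : x[i] / zeta;
}
```

The inverse transform simply runs the steps in reverse order with negated coefficients, matching the "simple inversion" of the forward transform noted above.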

COSINE MODULATED FILTER BANKS (CMFB)

An N-band cosine modulated filter bank is constructed from a prototype low-pass filter that possesses a cut-off frequency of Fs/(4N). Cosine functions are then used to modulate the low-pass filter and form N band-pass filters, each with a bandwidth of Fs/(2N). Various methods can be used to efficiently construct this filter bank. For the purposes of this research the 4 and 8 sub-band variants of the Bluetooth SBC filter bank were used.
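For context, the generic pseudo-QMF modulation takes the form below; the exact phase convention used by the SBC filter bank differs in detail, so this should be read as the textbook form rather than the SBC specification:

```latex
% Generic cosine modulation of a length-L prototype p[n] into N
% band-pass analysis filters (textbook pseudo-QMF form; the SBC
% standard uses its own phase convention).
h_{k}[n] = p[n]\,
\cos\!\left[\frac{\pi}{N}\left(k + \tfrac{1}{2}\right)
\left(n - \frac{L-1}{2}\right) + (-1)^{k}\frac{\pi}{4}\right],
\quad k = 0,\ldots,N-1
```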

QUADRATURE MIRROR FILTER (QMF)

For the purposes of this research the 4 sub-band QMF of the Enhanced apt-X algorithm is analysed. A linear phase QMF filter is used to produce a symmetric high-pass and low-pass filter with near-perfect reconstruction. This halfband filter can be arranged into a tree structure in which each branch downsamples and splits the signal into two sub-bands.
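The defining QMF relation (a standard result, stated here for context rather than taken from the paper) constrains the high-pass filter to be the mirror of the low-pass prototype, which cancels aliasing between the two branches:

```latex
% Standard two-channel QMF relations (Esteban-Galand): the high-pass
% analysis filter mirrors the low-pass prototype, and the synthesis
% choices below cancel the aliasing term exactly, leaving only the
% distortion transfer function T(z).
H_{1}(z) = H_{0}(-z) \;\Longleftrightarrow\; h_{1}[n] = (-1)^{n} h_{0}[n],
\qquad
F_{0}(z) = H_{0}(z),\; F_{1}(z) = -H_{1}(z)
\;\Rightarrow\;
T(z) = \tfrac{1}{2}\left(H_{0}^{2}(z) - H_{0}^{2}(-z)\right)
```

With a linear phase FIR prototype (apart from trivial two-tap cases), T(z) cannot be a pure delay, hence the near-perfect rather than perfect reconstruction noted above.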

EVALUATION OF SUB-BAND FILTER BANKS

The filter bank architectures were evaluated in two stages. In the first stage, modelling of the algorithms was performed using MathWorks MATLAB and floating-point arithmetic. The Enhanced apt-X ADPCM algorithm was used as the basis for this model, with modifications to the analysis-synthesis filter bank. This provided a practical evaluation of the filter banks where the signal is subjected to quantization noise, albeit using floating-point arithmetic.

The model provided measurements of latency and THD+N, as shown in Table 1 and Figure 4 respectively. The quasi-linear phase IIR closely matches the THD+N achieved using the QMF filter while offering lower latency. Floating-point implementations of the wavelet filters are shown to offer the best quality while also providing the lowest latency.


Table 1: Filter bank latency comparison for 4 bands

The second stage of evaluation involved transferring the most promising filter bank structures from MATLAB to fixed-point realizations. For this purpose both assembly implementations on the Cambridge Silicon Radio (CSR) Kalimba DSP and C++ implementations on the ARM9E platform were created.

Fig. 4: THD+N comparison of various filter banks when used in Enhanced apt-X



Table 2 describes the execution time of the different filter bank structures using the CSR Bluetunes 2 development boards [7]. For comparative purposes the number of MAC instructions per sample is also shown. Once DSP evaluation was concluded the designs were migrated to the ARM9E RISC microprocessor using fixed-point C++. Table 3 indicates the execution time achieved using the RealView tools with an ADPCM and APCM codec.

Table 2: Sub-band filter bank execution time comparison using a modified 4 sub-band Enhanced apt-X encoder and the CSR Kalimba DSP

Table 3: Encoder ARM9E execution time comparison using RealView real-time system model

Notable results here indicate that while the QMF utilizes a large number of MAC operations, the concurrent load/store capabilities of the DSP allow for relatively fast operation in comparison to the general purpose RISC. It is interesting to note that while the arithmetic complexity of the quasi-linear phase IIR is low, matching that of SBC, the irregular structure and numerous delay lines in this architecture incur significant performance penalties on both software platforms. The markedly lower performance of the frame-based ADPCM codec can be attributed to code scheduling techniques being particularly suitable for the APCM codec and the other filter bank types, but not the QLPIIR filter bank. The superior design on the DSP platform is the CDF 9/7 wavelet filter bank, its efficiency leading us to discard the D4 and D6 DWTs. The SBC CMFB is shown to be an efficient solution, but the CDF 9/7 filter is shown to offer the fastest execution time on both software platforms.

ADAPTIVE PREDICTION

A prediction filter in an ADPCM codec is used to generate predicted signal values from which differential signals are obtained (see Figure 1). It is these difference signals that are coded and transmitted in the compressed bitstream. If the residual tends to zero the quantization noise will be reduced, as the adaptive quantization step size in an ADPCM codec will also tend to zero. To achieve optimally low levels of quantization noise it is important to use adaptive prediction with a convergence rate that tracks the audio signal as accurately as possible.
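As a sketch in conventional notation (the generic pole-zero ADPCM predictor form, consistent with the two-pole, variable-zero configuration described below but not reproduced from the paper), the difference signal and prediction can be written as:

```latex
% Conventional pole-zero ADPCM predictor (generic form): d[n] is the
% difference signal that is quantized and transmitted; r[n] is the
% reconstructed sample available to both encoder and decoder.
d[n] = x[n] - \hat{x}[n], \qquad
\hat{x}[n] = \sum_{i=1}^{2} a_{i}\, r[n-i]
           + \sum_{j=1}^{Z} b_{j}\, d_{q}[n-j], \qquad
r[n] = \hat{x}[n] + d_{q}[n]
```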

The inverse quantized sample values are typically used to update the filter coefficients as this data is available to both encoder and decoder and ensures that they are synchronized. Adaptive prediction filters operate independently within each sub-band. For the purposes of this research a sign-sign LMS filter was used to provide a basis for the prediction filter. This is a low complexity filter that incrementally updates its coefficients using a sign correlation between the current input signal and previous input signals. This LMS filter is configured to use two pole coefficients per sub-band with a variable number of zero coefficients.
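A minimal C++ sketch of such a predictor follows. The two-pole, variable-zero structure and the use of sign-only increments match the description above; the struct name, step sizes and floating-point arithmetic are illustrative assumptions (the real codecs use fixed-point).

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Sketch of a sign-sign LMS pole-zero predictor for one sub-band,
// with two pole taps and a configurable number of zero taps.
// Step sizes mu_a and mu_b are illustrative, not apt-X values.
struct SubbandPredictor {
    std::array<double, 2> a{};       // pole coefficients
    std::vector<double>   b;         // zero coefficients
    std::array<double, 2> r_hist{};  // past reconstructed samples
    std::vector<double>   d_hist;    // past inverse-quantized differences
    double mu_a = 1.0 / 4096.0;      // illustrative pole step size
    double mu_b = 1.0 / 1024.0;      // illustrative zero step size

    explicit SubbandPredictor(std::size_t zeros)
        : b(zeros, 0.0), d_hist(zeros, 0.0) {}

    static double sgn(double v) { return (v > 0.0) - (v < 0.0); }

    // Prediction from past reconstructed samples and past differences.
    double predict() const {
        double p = a[0] * r_hist[0] + a[1] * r_hist[1];
        for (std::size_t j = 0; j < b.size(); ++j)
            p += b[j] * d_hist[j];
        return p;
    }

    // Called with the inverse-quantized difference dq after predict();
    // dq is available to encoder and decoder alike, which keeps the
    // two predictors synchronized.
    void update(double dq) {
        const double rec = predict() + dq;  // reconstructed sample
        const double s = sgn(dq);
        // Sign-sign increments: only the signs of the error and of the
        // past signals are used; in a fixed-point realization these
        // reduce to conditional adds of the step size.
        for (std::size_t j = 0; j < b.size(); ++j)
            b[j] += mu_b * s * sgn(d_hist[j]);
        a[0] += mu_a * s * sgn(r_hist[0]);
        a[1] += mu_a * s * sgn(r_hist[1]);
        // Advance the delay lines.
        d_hist.insert(d_hist.begin(), dq);
        d_hist.pop_back();
        r_hist[1] = r_hist[0];
        r_hist[0] = rec;
    }
};
```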

NUMBER OF ZERO COEFFICIENTS

Each zero coefficient is used as a filter tap in the prediction convolution and must be updated, leading to increased complexity as the number of zero coefficients increases. Increasing the number of zero coefficients lowers the mean prediction error but raises the computational complexity. Therefore the number of zero coefficients should be chosen based on the perceptual importance of a sub-band and an acceptable computational cost.


Fig. 5: Mean prediction error and relative complexity of the lowest frequency sub-band of an ADPCM encoder for a variable number of zero coefficients

Figure 5 shows that using fewer than six coefficients results in a steep increase in mean prediction error, while using more than six results in rapidly diminishing improvement. The conclusion can therefore be drawn that six coefficients should be used as a minimum, with more coefficients applied, where tolerable, to those sub-bands of perceptually higher importance.

ADAPTIVE UPDATE RATE

While varying the number of zero coefficients provides a proportional change in complexity, the convergence rate and audio quality vary with it. We therefore examined the coefficient update process and determined that it can be applied selectively, dependent on the variance of the audio material. Initial investigations using MATLAB and a variant of the Enhanced apt-X algorithm showed that monitoring the variance of the prediction error provides an accurate means of determining when the prediction update should be performed. The principle behind this concept is that if the prediction error is sufficiently small, the zero coefficients are already adequately converged and need not be updated.

Mathematical modelling of this concept indicated that an average 33% reduction in the number of multiply-accumulate operations associated with zero coefficient updates can be achieved with no significant perceptual loss in audio quality. Refinement of this concept led to an adaptive scheme where the instantaneous prediction error is classified to lie within a finite number of ranges. Within each of these ranges a suitable pre-determined rate for the number of zero and pole coefficient updates is chosen within each sub-band. This rate relates the number of times coefficients are updated to the number of prediction filtering iterations for which they are used, e.g. silence uses a rate of 1.25% while a transient signal uses a rate of 50%. Care is taken with transient signals, such that any significant increase in prediction error causes an immediate coefficient update to occur. This adaptive scheme exploits the redundancies found in invariant signals while maintaining the perceived audio quality.

Fig. 6: Signal variance in a sub-band and its corresponding update rate
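The sketch below illustrates one way such a classifier could be arranged. The smoothing factor, variance boundaries and transient-detection ratio are illustrative assumptions; only the 1.25% and 50% example rates are taken from the text.

```cpp
// Sketch of the prediction-error variance classification described
// above. Constants are illustrative placeholders.
struct UpdateRateController {
    double mean    = 0.0;   // running mean of the prediction error
    double err_var = 0.0;   // running variance estimate
    unsigned phase = 0;     // prediction iteration counter

    // Returns true when the LMS coefficients should be updated on
    // this prediction filtering iteration.
    bool should_update(double pred_error) {
        const double w = 1.0 / 64.0;            // illustrative smoothing
        mean += w * (pred_error - mean);
        const double dev = pred_error - mean;
        err_var += w * (dev * dev - err_var);   // EW variance estimate

        // Any significant jump in error forces an immediate update,
        // protecting transients (and aiding recovery after bit errors).
        if (dev * dev > 16.0 * err_var) return true;

        // Otherwise map the variance range to an update interval,
        // e.g. near-silence updates 1 iteration in 80 (1.25%).
        unsigned interval;
        if      (err_var < 1e-6) interval = 80;  // ~silence: 1.25%
        else if (err_var < 1e-3) interval = 8;   // steady-state material
        else                     interval = 2;   // transient-like: 50%
        return (++phase % interval) == 0;
    }
};
```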

A major concern with such a scheme is the effect of a loss of synchronization between the encoder and decoder coefficient update rates. The devised scheme is backwards adaptive, requiring no side information to convey the coefficient update rate. It is therefore important that the decoder can recover from errors and that the predicted signal values will converge. In such a situation the prediction filter itself provides protection against the packet loss and bit errors which can occur over a wireless transmission medium. Transmission errors will result in an increase in the variance of the prediction error, instigating an increase in the coefficient update rate and subsequently recovery from the error.

STEREO INTENSITY CODING

Stereo intensity coding is a well understood tool [8] that expands upon the premise of joint stereo coding. The joint stereo coding scheme involves the selective translation of the left/right stereo signal into a mid/side (or sum/difference) signal and is best employed when an adaptive bit allocation scheme is used. Optimal use of joint stereo coding is achieved when (a) the left and right channels of a stereo signal are highly correlated, (b) the side signal contains only small differences that can be aggressively compressed without loss in perceptual quality and (c) the mid signal can accurately represent both the left and right channels.


As the equal-loudness curves indicate, the threshold of hearing varies across frequency ranges. While the human ear is relatively insensitive to the loudness of frequencies below 200 Hz, at 3-4 kHz human hearing is considered to be at its most sensitive. The frequency range of 100-1500 Hz is the region in which the human ear is most sensitive to phase differences, i.e. stereo depth is perceptually important. At frequencies above this range the wavelength is comparable to the distance between the ears and the perceived phase becomes ambiguous.

Stereo intensity coding expands upon the concept of joint stereo coding by analyzing the side channel in terms of the level and frequency range associated with each sub-band. The psychoacoustic test performed to determine whether stereo depth is perceptually important involves determining if the mean side channel level is greater than a predefined threshold for each sub-band. This results in highly correlated stereo signals being coded as a mono signal. The level threshold of each sub-band is set such that the probability of reducing the stereo depth increases exponentially with frequency. This psychoacoustic test is deliberately simple so as to reduce complexity. If low complexity were not a design requirement, more elaborate psychoacoustic analysis could be used. As an example, the masking relationship between neighbouring sub-band frequencies and their levels could also be exploited.
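A sketch of this per-band test is given below; the function name and threshold handling are illustrative, with only the mean-side-level comparison taken from the description above.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Sketch of the per-sub-band psychoacoustic test: the left/right pair
// is translated to mid/side, and the side signal is kept only if its
// mean level exceeds the band's threshold. The thresholds themselves
// (rising with frequency) are assumptions, not the tuned figures from
// this research. Assumes left and right are the same, non-zero length.
bool keep_side_signal(const std::vector<double>& left,
                      const std::vector<double>& right,
                      double band_threshold)
{
    double side_level = 0.0;
    for (std::size_t i = 0; i < left.size(); ++i) {
        const double side = 0.5 * (left[i] - right[i]); // side (difference)
        side_level += std::fabs(side);
    }
    side_level /= static_cast<double>(left.size());     // mean side level

    // Below the threshold, stereo depth is judged perceptually
    // unimportant and the band collapses to the mid (sum) signal only.
    return side_level > band_threshold;
}
```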

Fig. 7: Performance benefits of intensity coding

Figure 7 describes the reduction in execution times obtained using a modified version of SBC and intensity coding. As the number of sub-bands increases there is a proportional reduction in the processing requirements and bit rate. While there is a measurable reduction in SNR as the number of sub-bands increases, this distortion is perceptually indistinguishable from the original. When a side sub-band signal is intensity coded it is omitted from the coded bitstream. The bits saved can be re-allocated to other sub-bands to improve their quantization noise. Alternatively the bit rate reduction can be used by a Variable Bit Rate (VBR) scheme to improve audio quality. VBR rate control will reduce the bit rate of undemanding sections of audio content while maintaining a constant audio quality; the resulting lower bit rates and processing complexity will in turn reduce power dissipation.

THROUGHPUT AND COMPLEXITY REDUCTION

ADAPTIVE FRAME LENGTH

A modified version of Bluetooth SBC has been created which utilizes non-standard block sizes of 2^N samples, where N = 2 to 9. In addition to significant bit rate reductions, larger frame lengths enable greater leverage of code scheduling and vectorization techniques. For example, critical code sections such as sub-band filtering loops or quantization will consume less processing overhead as they will be called on fewer occasions and will process data more efficiently when those data sets are larger. Figure 8 describes the significant complexity reductions that were achieved with this SBC variant and the ARM9E platform.

Fig. 8: Bit rate and execution time improvements for 4-band joint stereo SBC with a bitpool of 29 using large non-standard block sizes

Unfortunately this efficiency improvement introduces two significant issues: distortion and increased latency. Latency increases proportionately with frame length. While this is irrelevant in wireless applications involving only audio sources, with a video source the lip-synch misalignment will become apparent. Therefore, increased frame lengths and lossless coding should not be used in joint audio/video applications.


Distortion artifacts are introduced if the quantization is not appropriate over the length of the frame. Gibbs phenomenon [9] can be observed when a large attack signal is preceded by a stationary region, as the quantization noise is smeared forwards and backwards in time across the transient region. The human ear is more sensitive to backwards smearing as there is a smaller period of time before a transient signal in which noise will not be perceived. This is referred to as pre-echo and it leads to increased quantization noise in the stationary regions preceding the attack signal, which will not be masked by the human hearing system to the same extent as any noise which follows.

In order to achieve complexity reductions while ensuring low distortion levels a compromise solution has been developed. This solution adaptively changes the frame length to achieve the optimal temporal and spectral resolution that minimizes the quantization noise. The scheme employed in this research monitors the variance of the wideband signal energy rather than that of each individual sub-band in an effort to reduce computation. When the energy variance begins to increase significantly the encoder will reduce the frame length in an attempt to maintain a low signal variance within each frame and therefore reduce quantization noise. If the signal energy decreases significantly the frame length is not reduced, as the human ear will mask the resulting quantization noise.
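A sketch of this heuristic follows. The decision thresholds, the power-of-two range (taken from the 2^N, N = 2 to 9 block sizes above) and the use of a simple frame-energy ratio in place of a full variance measure are all illustrative assumptions.

```cpp
// Sketch of the wideband-energy frame length heuristic described
// above: the frame length is halved when energy rises sharply and
// allowed to grow back otherwise. Constants are illustrative.
struct FrameLengthController {
    unsigned frame_len = 512;             // current frame length (2^N)
    static constexpr unsigned kMin = 4;   // 2^2 samples
    static constexpr unsigned kMax = 512; // 2^9 samples
    double prev_energy = 0.0;

    unsigned next_frame_length(double frame_energy) {
        const double ratio = (prev_energy > 0.0)
                           ? frame_energy / prev_energy : 1.0;
        prev_energy = frame_energy;

        if (ratio > 4.0 && frame_len > kMin) {
            frame_len /= 2;   // attack: improve temporal resolution
        } else if (ratio >= 0.25 && frame_len < kMax) {
            frame_len *= 2;   // steady signal: favour coding efficiency
        }
        // A large energy *drop* (ratio < 0.25) leaves the frame length
        // unchanged, since post-masking hides the quantization noise.
        return frame_len;
    }
};
```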

ENTROPY CODING

The probability density function (PDF) of the samples within each sub-band approximates a Gaussian distribution. When the sub-band samples are distributed in this manner they become highly suitable for lossless coding. Higher degrees of redundancy are available to lossless coding schemes when the same quantization level is applied to larger sets of data. Increased processing requirements are associated with the addition of lossless coding. In order to achieve the goal of reduced complexity, a number of entropy coding schemes with well documented low complexity characteristics were investigated. These schemes included Exponential-Golomb, Golomb-Rice, Huffman and arithmetic coders. While the addition of any lossless coding will introduce additional processing requirements, as shown in Figure 8 the efficiency gains introduced by larger frame lengths are significant. Table 4 describes the coding efficiencies introduced to Bluetooth SBC when an adaptive frame length scheme and Golomb-Rice coding of the quantized samples are introduced.
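Since Golomb-Rice coding was ultimately paired with the adaptive frame length scheme, a minimal sketch of the Rice variant (power-of-two divisor) is shown below. The bit container and the zig-zag fold for signed quantizer indices are illustrative choices, not details from this research.

```cpp
#include <cstdint>
#include <vector>

// Sketch of Golomb-Rice coding of a non-negative quantizer index with
// divisor 2^k: the quotient is sent in unary, the remainder in k raw
// bits. Bit-packing is simplified to a vector<bool>.
void golomb_rice_encode(std::uint32_t value, unsigned k,
                        std::vector<bool>& bits)
{
    const std::uint32_t q = value >> k;            // quotient -> unary
    for (std::uint32_t i = 0; i < q; ++i) bits.push_back(true);
    bits.push_back(false);                         // unary terminator
    for (unsigned b = k; b-- > 0; )                // remainder -> k bits
        bits.push_back(((value >> b) & 1u) != 0);
}

// Signed quantizer indices can first be folded to non-negative values
// (zig-zag mapping): 0, -1, 1, -2, 2, ... -> 0, 1, 2, 3, 4, ...
std::uint32_t zigzag(std::int32_t v)
{
    return (v >= 0) ? 2u * static_cast<std::uint32_t>(v)
                    : 2u * static_cast<std::uint32_t>(-(v + 1)) + 1u;
}
```

Because the divisor is a power of two, encoding reduces to shifts and masks, which is what gives the scheme its well documented low complexity relative to Huffman or arithmetic coding.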


Table 4: Computational efficiencies achieved with SBC using 8 sub-bands, SNR allocation method, joint stereo channel mode and a bitpool of 58

CONCLUSION

This paper discusses some of the research undertaken in developing a high quality audio codec for wireless audio streaming applications. The selection of a sub-banding filter and its impact on the computational requirements of the codec are discussed, with a CDF 9/7 based wavelet packet decomposition filter selected as the optimal choice in terms of quality and complexity. We also describe the use of conditional coefficient update techniques to reduce the average computational requirements of an LMS prediction algorithm as used in an ADPCM codec. We demonstrate how this adaptive prediction technique can be used to reduce complexity without adversely affecting quality.

The complementary use of adaptive frame lengths and lossless coding of quantized samples is discussed. While such a scheme is unsuitable for low latency applications, it is shown to achieve significant reductions in both bit rate and computation. The overall goal of this research is to combine the developed coding tools to create a high quality audio compression algorithm designed specifically for wireless streaming purposes. To meet this objective, further work will expand upon and explore the results obtained thus far. The important area of robustness over a wireless network will also be investigated.

REFERENCES

1) Audio Video WG, Bluetooth Special Interest Group (SIG), Inc., "Advanced Audio Distribution Profile Specification", April 2007.
2) APTX, http://www.aptx.com/, Accessed July 2009.
3) H. Tomyiama, H. T. Ishihara, A. Inoue, and H. Yasuura, "Instruction scheduling for power reduction in processor-based system design," Design, Automation, Test Eur., pp. 23-26, 1998.
4) M. Lee, V. Tiwari, S. Malik and M. Fujita, "Power analysis and minimization techniques for embedded DSP software," Fujitsu Scientific and Technical Journal, 31(2), pp. 215-229, 1995.
5) X. Zhuang and S. Pande, "Parallelizing load/stores on dual-bank memory embedded processors," ACM Transactions on Embedded Computing Systems (TECS), Vol. 5, Issue 3, pp. 613-657, August 2006.


6) Z. Guangjun, C. Lizhi and C. Huowang, "A simple 9/7-tap wavelet filter based on lifting scheme," Proceedings of the 2001 International Conference on Image Processing, Volume 2, pp. 249-252, October 2001.
7) CSR BlueCore, http://www.csr.com, Accessed July 2009.
8) J. Herre, K. Brandenburg and D. Lederer, "Intensity Stereo Coding," Proc. 96th Audio Eng. Soc. (AES) Convention, AES, 1994, preprint 3799.
9) J. W. Gibbs, "Fourier Series," Nature 59, 200 and 606, 1899.

MORE INFORMATION

For more information contact us: [email protected]
www.aptx.com

APTX – the apt-X® licensing company
APT Licensing Limited
Whiterock Business Park
729 Springfield Road
Belfast, BT12 7FP
Northern Ireland, UK
