Analysis of Inter-Channel Dependencies in Audio Lossless Block Coding
Communication Papers of the Federated Conference on Computer Science and Information Systems, pp. 135–139
DOI: 10.15439/2018F181, ISSN 2300-5963, ACSIS, Vol. 17

Cezary Wernik (West Pomeranian University of Technology in Szczecin, al. Piastów 17, 70-310 Szczecin, Poland, Email: [email protected])
Grzegorz Ulacha (West Pomeranian University of Technology in Szczecin, al. Piastów 17, 70-310 Szczecin, Poland, Email: [email protected])

Abstract—In this paper the basics of predictive data modeling (using the mean square error minimization method) for lossless audio compression are presented. The described research focuses on inter-channel analysis and on setting the range of prediction depending on the frame size. In addition, the concept of data flow using inter-channel dependencies and an original, effective and flexible method of saving prediction coefficients are presented.

I. INTRODUCTION TO LOSSLESS AUDIO DATA COMPRESSION

Minimization of data storage and transmission costs is one of the most important issues of teleinformatics. Data compression is a tool that helps to reduce this cost. Many compression algorithms exist, and the most effective ones are adapted to a specific data type. Compression methods can be divided into lossy and lossless; this research focuses on the latter, limited to the coding of audio data.

Important purposes of lossless audio compression include recording storage, saving records with high-quality sound on commercial media (e.g. DVD, Blu-ray), and selling songs in online music stores to more demanding customers who are not satisfied with the quality of the mp3 format [5]. Moreover, lossless mode is often required at the stage of music processing in a studio, in advertising materials, and in the production of radio and television programs and films (post-production [1]), because lossy coding applied at each iteration of sound editing may accumulate additional distortions.

At the beginning of the 21st century, many effective proposals for the MPEG-4 Lossless Audio Coding standard were developed [4], but we cannot ignore the fact that a branch of amateur solutions develops independently, using algorithms that are not fully presented in scientific publications. For example, OptimFrog [14] and Monkey's Audio [15] belong to the most efficient programs for lossless audio compression.

Modern compression methods usually involve two steps: data decomposition, followed by compression with one of the efficient entropy methods. The most effective of these are arithmetic coding and Huffman coding [10] and its variations, such as the Golomb [3] and Rice [8] codes. In the case of audio coding, the Gilbert-Moore block code, which is a variation of the arithmetic code [7], is also used.

Publications in the field of lossless audio compression focus mainly on the data decomposition stage. There are two basic types of modeling. The first is the use of linear prediction [2], [9] or non-linear prediction (e.g. using neural networks [6]). The second type is the use of transformations such as DCT (MPEG-4 SLS [12]) or wavelets, but current research shows that this type is slightly less efficient for lossless coding. Therefore, predictive methods are used in most cases.

Sections II and III present the basics of audio modeling, including the calculation of prediction coefficients (based on the mean square error minimization method) and a typical method of saving them. Section IV presents the proposed flexible and effective method of saving prediction coefficients, and Section V presents the analysis of inter-channel dependencies. Section VI describes an effective way of coding prediction errors (an adaptive Golomb code), while Section VII presents the summary and a comparison of the efficiency of the proposed method against other known solutions.

II. BASICS OF AUDIO MODELING

In lossless audio codecs, a typical linear predictor of order r is used for modeling: the predicted value x̂(n) of the currently coded sample x(n) is a weighted average of the r previous signal samples. The simplest predictive models are those with fixed coefficients, including the DPCM constant model that uses the previous sample, x̂(n) = x(n − 1). The use of a linear predictor allows encoding only the prediction errors, i.e. the differences e(n) between the actual value and the predicted value (rounded to the nearest integer), which are most often small values oscillating near zero:

e(n) = x(n) − x̂(n).    (1)

In this way we obtain a difference signal in which the distribution of the errors e(n) is similar to the Laplace distribution, which allows efficient coding with one of the static or adaptive entropy methods, such as a Golomb-Rice code or an arithmetic code (see Fig. 1 in Section V).
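To make this decomposition step concrete, the short sketch below (an illustration added here, not code from the paper; the function name prediction_residuals and the test signal are our own) computes the residuals e(n) of formula (1) for a given set of predictor weights, using the DPCM constant model as an example.

```python
import numpy as np

def prediction_residuals(x, w):
    """e(n) = x(n) - round(x_hat(n)), where x_hat(n) is the weighted
    sum of the r previous samples with weights w = [w_1, ..., w_r].
    The first r samples are left unchanged here; a real codec would
    use a simplified lower-order model for them."""
    x = np.asarray(x, dtype=np.int64)
    r = len(w)
    e = x.copy()
    for n in range(r, len(x)):
        x_hat = np.dot(w, x[n - r:n][::-1])   # w_1*x(n-1) + ... + w_r*x(n-r)
        e[n] = x[n] - int(round(x_hat))       # formula (1), rounded to an integer
    return e

# DPCM constant model: x_hat(n) = x(n - 1), i.e. w = [1.0]
x = [100, 102, 103, 103, 101, 98, 97]
print(prediction_residuals(x, [1.0]))         # residuals oscillate near zero
```

The residual sequence produced in this way is what is then passed to the Golomb-Rice or arithmetic coder mentioned above.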
Usually a method called forward adaptation is used: the encoder must have access to the whole frame before encoding, which means that the calculated coefficients also have to be sent to the decoder. For this reason, when developing a method with forward adaptation, one should work out an effective way of choosing the frame size, the prediction order, and the accuracy with which the prediction coefficients are saved. This is a temporally asymmetrical method, since the decoder works relatively quickly, reading only the header information associated with a given frame and decoding based on formula (1). The time disproportion of forward adaptation (a longer coding time in relation to decoding) is not a significant drawback, since the coding operation is most often carried out once, while decoding is performed many times.

A typical forward adaptation solution is used in MPEG-4 ALS, where a frame is approximately 43 ms long (the frame length counted in samples depends on the sampling frequency; the maximum, at f = 192 kHz, is N = 8192). We can also introduce a longer-term unit, a group of frames called a super-frame, in which the number of samples N can reach hundreds of thousands. Although MPEG-4 ALS uses frames with a maximum size of N = 2^13, there is no obstacle to increasing the frame length in newer solutions. Their length may be limited by the requirement of free access to the data in real time, which results from the needs of e.g. studio sound processing. For this purpose, MPEG-4 ALS provides independent access to a frame set every 500 ms (no need to decode previous data to correctly decode any frame), which introduces a limit of N_max = 24 000 samples in one super-frame when the sampling frequency is 48 kHz. Above this value one has to give up the possibility of free access. However, this is not a problem for archiving and transmission applications, e.g. when someone wants to purchase a whole artist's album from an online store.

In this paper, we consider the application of super-frames with a length of 20 seconds. The analysis was made on 16 fragments of recordings, each a few dozen seconds long (stereo, 16-bit samples, 44 100 samples per second), covering various genres of music as well as male and female speech, available in the base [16].

III. CALCULATION OF PREDICTION COEFFICIENTS USING MMSE METHOD

It has been widely accepted that for audio signals the calculation of prediction coefficients using the Mean Square Error Minimization (MMSE) method gives very good results. The Levinson-Durbin algorithm is typically used here; it does not require the calculation of the inverse matrix, but it is able to calculate the model coefficients in an iterative manner for subsequent orders of prediction. In this way, the computational complexity is reduced from O(n³) to O(n²). An additional advantage is that the reflection coefficients k = [k_1, k_2, ..., k_r]^T, often referred to as PARCOR (partial correlation) coefficients, belong to the interval (−1, 1) and can be coded effectively (they are subjected to a quantization process, e.g. using 7 bits). Thanks to this, the size of the header containing the set of coefficients of a given model is not large, even if quite high prediction orders are used [5].

In contrast to the PARCOR format, which assumes the stationarity of the audio signal, the approach proposed here allows efficient coding of prediction coefficients obtained in any way (e.g. with different types of suboptimal algorithms that minimize the entropy value or the bitwise average in each coded frame) [10].

When the assumption about the stationarity of the audio signal is rejected, the autocovariance method should be used, in which the vector of prediction coefficients w = [w_1, w_2, ..., w_r]^T is calculated from the matrix equation [10]:

w = R⁻¹ · p,    (2)

where R is a square matrix of size r×r with elements R(i, j) such that

R(i, j) = Σ_{n=0}^{N−1} x(n − i) · x(n − j),    (3)

and p is a vector of size r×1 with elements p(j) such that

p(j) = Σ_{n=0}^{N−1} x(n) · x(n − j),    (4)

where N is the number of samples in the frame. It is assumed that for the first r samples in the first frame of a super-frame a simplified prediction model of lower order is used.
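As a concrete illustration of formulas (2)-(4), the sketch below (our example, not code from the paper; treating samples before the start of the frame as zeros is an assumption made only to keep the example self-contained) builds R and p for a frame and solves the matrix equation for w.

```python
import numpy as np

def mmse_coefficients(x, r):
    """Order-r prediction coefficients w = R^(-1) * p, formula (2),
    with R(i, j) and p(j) built as in formulas (3) and (4).
    Samples before the start of the frame are treated as 0 here;
    in practice they would be taken from the previous frame."""
    x = np.asarray(x, dtype=np.float64)
    N = len(x)
    xp = np.concatenate([np.zeros(r), x])     # xp[r + n] corresponds to x(n)

    R = np.empty((r, r))
    p = np.empty(r)
    for i in range(1, r + 1):
        p[i - 1] = np.sum(xp[r:r + N] * xp[r - i:r + N - i])                     # formula (4)
        for j in range(1, r + 1):
            R[i - 1, j - 1] = np.sum(xp[r - i:r + N - i] * xp[r - j:r + N - j])  # formula (3)

    return np.linalg.solve(R, p)              # w = R^(-1) * p, formula (2)

# example: an order-2 predictor fitted to a synthetic tonal frame
frame = 1000.0 * np.sin(0.1 * np.arange(2048))
print(mmse_coefficients(frame, 2))            # close to [2*cos(0.1), -1]
```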
IV. ENCODING PREDICTION COEFFICIENTS METHOD

The method of saving header data proposed in this work assumes that from the set of prediction coefficients w we choose the one with the highest absolute value. We mark its position in the vector as i_max and its value as w_max. The value of w_max is first projected onto a 32-bit float type, and the index i_max is saved as an 8-bit integer (for r ≤ 256). All other coefficients are coded on b + 1 bits after normalizing their values with respect to w_max and scaling them appropriately, in a manner consistent with the formula:

ŵ_i = ⌊(w_i / w_max · 1/2 + 1/2) · (2^(b+1) − 1)⌋.    (5)
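A sketch of this header layout is given below (our illustration, not the paper's reference code; the scaling constant 2^(b+1) − 1 and the floor operation follow our reading of the partly garbled formula (5) and should be treated as an assumption). It stores w_max as a 32-bit float and i_max as an 8-bit index, quantizes the remaining coefficients to b + 1 bits, and shows the inverse mapping a decoder would apply.

```python
import struct
import numpy as np

def encode_coefficients(w, b):
    """Pack a coefficient vector: w_max as a 32-bit float, i_max as an
    8-bit index (r <= 256), and every other coefficient quantized to
    b + 1 bits after normalization by w_max (our reading of formula (5))."""
    w = np.asarray(w, dtype=np.float64)
    i_max = int(np.argmax(np.abs(w)))
    w_max = float(np.float32(w[i_max]))            # projected onto a 32-bit float

    header = struct.pack('>fB', w_max, i_max)      # 4 bytes + 1 byte
    levels = (1 << (b + 1)) - 1                    # 2^(b+1) - 1
    q = [int(np.floor((wi / w_max * 0.5 + 0.5) * levels))   # formula (5)
         for i, wi in enumerate(w) if i != i_max]
    return header, q

def decode_coefficients(header, q, b):
    """Rebuild an approximation of the coefficient vector from the header."""
    w_max, i_max = struct.unpack('>fB', header)
    levels = (1 << (b + 1)) - 1
    w = [(qi / levels - 0.5) * 2.0 * w_max for qi in q]
    w.insert(i_max, w_max)                         # put the reference coefficient back
    return w

w = [1.73, -0.81, 0.12, -0.05]
header, q = encode_coefficients(w, b=7)            # b + 1 = 8 bits per coefficient
print(q)
print(decode_coefficients(header, q, b=7))
```

With b = 7 each coefficient other than w_max costs one byte, so the header stays small even for high prediction orders, while the reconstruction error is bounded by a single quantization step of 2·w_max / (2^(b+1) − 1).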