Towards Optimal Compression of Meteorological Data: a Case Study of Using Interval-Motivated Overestimators in Global Optimization
Total Page:16
File Type:pdf, Size:1020Kb
Towards Optimal Compression of Meteorological Data: A Case Study of Using Interval-Motivated Overestimators in Global Optimization Olga Kosheleva Department of Electrical and Computer Engineering and Department of Teacher Education, University of Texas, El Paso, TX 79968, USA, [email protected] 1 Introduction The existing image and data compression techniques try to minimize the mean square deviation between the original data f(x; y; z) and the compressed- decompressed data fe(x; y; z). In many practical situations, reconstruction that only guaranteed mean square error over the data set is unacceptable. For example, if we use the meteorological data to plan a best trajectory for a plane, then what we really want to know are the meteorological parameters such as wind, temperature, and pressure along the trajectory. If along this line, the values are not reconstructed accurately enough, the plane may crash; the fact that on average, we get a good reconstruction, does not help. In general, what we need is a compression that guarantees that for each (x; y), the di®erence jf(x; y; z)¡fe(x; y; z)j is bounded by a given value ¢, i.e., that the actual value f(x; y; z) belongs to the interval [fe(x; y; z) ¡ ¢; fe(x; y; z) + ¢]: In this chapter, we describe new e±cient techniques for data compression under such interval uncertainty. 2 Optimal Data Compression: A Problem 2.1 Data Compression Is Important At present, so much data is coming from measuring instruments that it is necessary to compress this data before storing and processing. We can gain some storage space by using lossless compression, but often, this gain is not su±cient, so we must use lossy compression as well. 2 Olga Kosheleva 2.2 Successes of 2-D Image Compression In the last decades, there has been a great progress in image and data com- pression. In particular, the JPEG2000 standard (see, e.g., [TM02]) uses the wavelet transform methods together with other e±cient compression tech- niques to provide a very e±cient compression of 2D images. Within this standard, we can select di®erent bitrates (i.e., number of bits per pixel that is required, on average, for the compressed image), and depend- ing on the bitrate, get di®erent degrees of compression. When we select the highest possible bitrate, we get the lossless compres- sions that enables us to reconstruct the original image precisely. When we decrease the bitrate, we get a lossy compression; the smaller the bitrate, the more the compressed/decompressed image will di®er from the original image. In principle, it is possible to use these compression techniques to compress 2D measurement data as well. 2.3 Compressing 3-D Data: Layer-by-Layer Approach It is also possible to compress 3D measurement data f(x; y; z), e.g., meteoro- logical measurements taken in di®erent places (x; y) at di®erent heights z. One possibility is simply to apply the 2D JPEG2000 compression to each horizontal layer f(x; y; z0). 2.4 Compressing 3-D Data: An Approach that Uses KLT Transform Another possibility, in accordance with Part 2 of JPEG2000 standard [TM02], is to ¯rst apply the Karhunen-Lo`eve (KLT) transform to each vertical line. Speci¯cally, we: ² compute the average value 1 X f¹(z) = ¢ f(x; y; z) N x;y of the analyzed quantity at a given height z, where N is the overall number of horizontal points (x; y); ² compute the covariances between di®erent heights: 1 X V (z ; z ) = ¢ (f(x; y; z ) ¡ f¹(z )) ¢ (f(x; y; z ) ¡ f¹(z )); 1 2 N 1 1 2 2 x;y ² ¯nd the eigenvalues ¸k and the eigenvectors ek(z) of the covariance matrix V (z1; z2); we sort these eigenvectors into a sequence e1(z); e2(z);::: so that j¸1j ¸ j¸2j ¸ ::: ; Towards Optimal Compression of Meteorological Data 3 ² ¯nally, we represent the original 3D data values f(x; y; z) as a linear com- bination X ¹ f(x; y; z) = f(z) + ak(x; y) ¢ ek(z) k of the eigenvectors ek(z). 2.5 An Approach that Uses KLT Transform: Details How can we implement this approach? ² The ¯rst two steps are straightforward. ² The third step (computing eigenvalues and eigenvectors) is a known com- putational problem for which many standard software packages provides an e±cient algorithmic solution. ² So, to specify how this method can be implemented, we must describe how the last step can be implemented, i.e., how we can represent the original 3D data as a linear combination of the eigenvectors. First, why is such a representation possible at all? It is known (see, e.g., [TM02]), that in general, the eigenvalues of a covariance matrix form a ortho- normal basis in the space of all Nz-dimensional vectors e = fe(z)g = (e(1); e(2); : : : ; e(Nz)): By de¯nition of a basis this means, in particular, for every (x; y), the corre- sponding di®erence vector def ¹ dx;y(z) = f(x; y; z) ¡ f(z) can be represented as a linear combination of these eigenvalues, i.e., that for every z, we have X ¹ dx;y(z) = f(x; y; z) ¡ f(z) = ak(x; y) ¢ ek(z); k where by ak(x; y), we denoted the coe±cient at ek(z) in the corresponding expansion. From this formula, we get the desired representation of f(x; y; z). How can we actually compute this representation? Since the vectors ek(z) form an orthonormal basis, we can conclude that for the expansion of the vector dx;y, each coe±cient ak(x; y) at ek(z) is a dot (scalar) product of the vector dx;y and the k-th vector ek(z) from the basis, i.e., that def X ak(x; y) = dx;y ¢ ek = dx;y(z) ¢ ek(z): z Substituting the expression for dx;y(z) into this formula, we conclude that X ¹ ak(x; y) = (f(x; y; z) ¡ f(z)) ¢ ek(z): z 4 Olga Kosheleva Comment. Actually, instead of using this explicit (thus fast-to-compute) for- mula, we can use even faster formulas of the so-called fast KLT transform (see, e.g., [TM02]). 2.6 KLT Transform: Result As a result of the KLT approach, we represent the original 3-D data as a sequence of the horizontal slices ak(x; y): ² the ¯rst slice a1(x; y) corresponds to the main (1-st) eigenvalue; ² the second slice a2(x; y) corresponds to the next (2-nd) eigenvalue; ² etc., with the overall intensity decreasing from one slice to the next one. Next, to each of these slices ak(x; y), we apply a 2D JPEG2000 compression with the appropriate bit rate bk depending on the slice k. 2.7 Decompressing 3-D Data: KLT-Based Approach To reconstruct the data, we so the following: ² First, we apply JPEG2000 decompression to each slice; as a result, we get [bk] the values eak (x; y). ² Second, based on these reconstructed slices, we now reconstruct the origi- nal 3-D data data as X e ¹ [bk] f(x; y; z) = f(z) + eak (x; y) ¢ ek(z): k 2.8 Data Compression as an Optimization Problem: an Informal Introduction Di®erent bit rate allocations bk lead to di®erent quality of (lossy) compression, i.e., to di®erent quality of the corresponding decompressed data. Among all possible compression schemes that guarantee the given recon- struction accuracy (in some reasonable sense), we want to ¯nd the scheme for which the average bitrate def 1 X b = ¢ bk Nz k is the smallest possible. In some cases, the bandwidth is limited, i.e., we know the largest possible average bitrate b0. In such cases, among all compression schemes with b · b0, we must ¯nd a one for which the compression/decompression error is the smallest possible. Towards Optimal Compression of Meteorological Data 5 2.9 JPEG2000 Approach Is Tailored Towards Image Processing and Towards Mean Square Error (MSE) In the JPEG2000-based approach, for compressing measurement data, we use image compression techniques. The main objective of image compression is to retain the quality of the image. From the viewpoint of visual image quality, the image distortion can be reasonably well described by the mean square di®erence MSE (a.k.a. L2- norm) between the original image I(x; y) and the compressed-decompressed image Ie(x; y). As a result, sometimes, under the L2-optimal compression, an image may be vastly distorted at some points (x; y), and this is OK as long as the overall mean square error is small. 2.10 For Data Compression, MSE May Be a Bad Criterion When we compress measurement results, our objective is to be able to repro- duce each individual measurement result with a certain guaranteed accuracy. In such a case, reconstruction that only guaranteed mean square error over the data set is unacceptable: for example, if we use the meteorological data to plan a best trajectory for a plane, what we really want to know are the meteorological parameters such as wind, temperature, and pressure along the trajectory. If along this line, the values are not reconstructed accurately enough, the plane may crash. The fact that on average, we get a good reconstruction, does not help. 2.11 An Appropriate Criterion for Data Compression What we need is a compression that guarantees the given accuracy for all pixels, i.e., that guarantees that the L1-norm max jf(x; y; z) ¡ fe(x; y; z)j x;y;z is small.