A Practical Assessment of Process Data Compression Techniques
Ind. Eng. Chem. Res. 1998, 37, 267-274

Matthew J. Watson, Antonios Liakopoulos, Dragana Brzakovic, and Christos Georgakis*

Chemical Process Modeling and Control Research Center, Lehigh University, Bethlehem, Pennsylvania 18015

Plant data are used to compare the effectiveness of wavelet-based methods with other compression techniques. The challenge is to treat the data so that the maximum compression ratio is achieved while the important features are retained in the compressed data. Wavelets have properties that are desirable for data compression. They are localized in time (or space) and in frequency. This means that important short-lived high-frequency disturbances can be preserved in the compressed data, and these disturbances may be differentiated from slower, low-frequency trends. Besides discrete wavelet transforms, linear interpolation, discrete cosine transform, and vector quantization are also used to compress data. The transform-based compression algorithms perform better than the linear interpolation methods, such as swinging door, that have been used traditionally in the chemical process industries. Among these techniques, the wavelet-based one compresses the process data with excellent overall and best local accuracy.

* Author to whom correspondence should be addressed: Iacocca Hall, Lehigh University, 111 Research Drive, Bethlehem, PA 18015. Telephone: (610)758-5432. Fax: (610)758-5297. E-mail: [email protected].

S0888-5885(97)00401-6 CCC: $15.00 © 1998 American Chemical Society. Published on Web 01/05/1998.

1. Introduction

There is often a need to retrieve large quantities of archival data for the purposes of plant diagnostics or model identification and validation. With the globalization of the operations of many companies, the archival and recalling locations might be separated by several thousand miles. To speed up retrieval from archive to requesting computer, the data need to be transmitted in compressed form.

The problem of data compression for any type of engineering data has the same basis: maximize the compression ratio while maintaining as many of the desirable features of the signal as possible. The features that are retained depend upon the type of compression technique used.

The objective of any data compression algorithm is to represent a given data set with another, smaller data set. To accomplish this, a data compression algorithm takes advantage of any redundancy or repetition in the data. Frequently, data compression can be carried out for the purpose of separating the useful features of the data from those not needed. Most data compression algorithms consist of one or a combination of the following: data transform, quantization, and coding.

In contrast to the vast amount of research in data compression as applied to image or acoustic signals, there have been few studies of data compression in the process industries. Hale and Sellars (1981) describe a data compression algorithm used in industry, whereby a signal is approximated by a piecewise linear function. The swinging door algorithm (Bristol, 1990), which is a variation of the piecewise linear theme, is also in use in the process industry. Feehs and Arce (1988) describe how vector quantization can be used to compress process trend recordings. Hu and Arce (1988) later studied the application of subband decomposition before vector quantizing the signal. Taking the subband decomposition of a signal is closely related to taking the wavelet transform. Bakshi and Stephanopoulos published a series of papers studying how wavelets can be applied to extract temporal features and detect faults in process signals (Bakshi and Stephanopoulos, 1994a,b). More recently, the problem of compressing chemical process data through the use of wavelets has been addressed (Bakshi and Stephanopoulos, 1996; Watson et al., 1994). These papers present qualitative comparisons of some data compression algorithms and give some quantitative comparisons for short-time data sets.

The objective of this paper is to describe three ways in which data compression is achieved: piecewise linear functional approximation, application of a data transform and the discarding of the insignificant transform coefficients, and vector quantization. These methods are described in sections 2.1, 2.2, and 2.3, respectively. The data compression methods described are then used to compress large sets of real process data. One of the data sets contains four flow rate measurements from an Amoco depropanizer (MacFarlane, 1993). Each variable in this data set contains 1399 data points. The other set is temperature data from duPont’s Falcon project (Moser, 1994), and each variable consists of 17 152 time sequence measurements. Comparisons between the compression methods are presented and discussed in sections 3 and 4, mostly in terms of how the error, the difference between the reconstructed and original signal, varies as a function of compression ratio.

2. Compression Methods and Algorithms

2.1. Piecewise Linear Compression. In piecewise linear compression a signal is assumed to continue in a straight line, within an error bound, until a point lying outside of the error bound forces a recording to be made. A new line is then assumed and the algorithm continues. In this type of data compression, where the recording time step varies, the measured value and its date-time tag must be recorded. If each date-time tag is assumed to require the same amount of memory as one recorded measurement, the compression ratio is

$$\text{compression ratio} = \frac{\text{no. of original measurements}}{2 \times \text{no. of recorded measurements}}$$

The boxcar algorithm makes a recording when the current value differs from the last recorded value by an amount greater than or equal to the predetermined recording limit (error bound) for that variable. The previous value processed should be recorded, not the current value that triggered the recording, as in Figure 1. The boxcar algorithm performs best when the process runs for long stretches of steady-state operation.

Figure 1. Boxcar compression. (Redrawn from Hale and Sellars, 1981.)

The backward slope algorithm projects a recording limit into the future on the basis of the slope of the previous two recorded values. The previous value is recorded if the current value lies outside of the recording limit. Once a value is recorded, a new line and recording limit are projected into the future, and the algorithm is repeated (Figure 2).

Figure 2. Backward slope compression. (Redrawn from Hale and Sellars, 1981.)

The boxcar and backward slope algorithms can be combined into a method that is a hybrid of the two. If a value lies outside of the backward slope recording limit, the method reverts to the boxcar until a recording is made. If the boxcar test fails first, the method continues with backward slope until a recording is made. If both tests fail, a recording is made and the algorithm starts over. The swinging door algorithm (Bristol, 1990) is similar to the backward slope algorithm, except that the recording limit is based on the slope of the line between the previously recorded value and the current measured values. When the current measured value has exceeded the error bound, defined by the recording limit, the value at the previous time step is recorded and the algorithm is repeated.

These compression algorithms have a minimal computational load and were developed at a time when algorithms that achieved higher compression ratios did not justify the additional computation. Today’s computational environment allows the more complex algorithms discussed below to be applied effectively to process data.

2.2. Data Transforms. In the continuous formulation, a data transform is nondestructive or lossless, in that no information is lost when the transform is made. Typically the transform has an inverse that can perfectly reconstruct the original data. Examples of some commonly used transforms are Laplace, Fourier, and recently the wavelet transform. A linear transform often compacts most of the information of the original data set into a smaller number of vector components. For example, the discrete cosine transform performs a mapping from the time to the frequency domain, and often a signal vector has very little energy in the higher frequency regions of the spectral band. This property, known as compaction, implies that a large portion of the components of the transformed signal vector are likely to be very close to zero and may often be neglected entirely (Gersho and Gray, 1992). Setting these coefficients to zero is known as thresholding and is the basis for data compression through functional approximation. When coefficients are neglected, the transform is no longer lossless in that the reconstructed signal differs from the original.

In the remainder of this section we briefly present the definitions of the transforms considered in this paper. Some properties of the transforms relevant to the discussion of the results are also reviewed.

The continuous Fourier transform is given by

$$H(f) = \int_{-\infty}^{\infty} h(t)\, e^{-2\pi i f t}\, dt \tag{1}$$

where $f$ is the frequency and $h(t)$ is the function in the time domain. The discrete form of the same transform is

$$H_n = \sum_{k=0}^{N-1} h_k\, e^{-2\pi i k n/N} \tag{2}$$

where $h_k$, $k = 0, 1, \ldots, N-1$, is the time sequence array of length $N$, and $H_n$ is the discrete Fourier transform coefficient. The discrete and continuous Fourier transforms are related, to a first approximation, by $H(f_n) \approx \Delta H_n$, where $\Delta$ is the sampling interval in the time domain.

The discrete cosine transform of $N$ data points is given by

$$F_k = \sqrt{\frac{2}{N}}\, \nu_k \sum_{j=0}^{N-1} f_j \cos\left[\frac{(2j+1)k\pi}{2N}\right], \qquad k = 0, 1, 2, \ldots, N-1 \tag{3}$$

where

$$\nu_k = \begin{cases} 1/\sqrt{2} & \text{if } k = 0 \\ 1 & \text{otherwise} \end{cases}$$

$f_j$ is the discrete time data sequence and $F_k$ is the $k$th discrete cosine transform coefficient.
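As an illustration of thresholding with the discrete cosine transform of eq 3, the following is a minimal sketch and not the authors' implementation. It assumes the normalization of eq 3 is the orthonormal one (prefactor sqrt(2/N), with the k = 0 weight equal to 1/sqrt(2)), so that the inverse transform is the transpose of the forward one. The function names `dct`, `idct`, `keep_largest`, and `rms_error`, the synthetic test signal, and the rule of keeping the m largest-magnitude coefficients are all assumptions made for the example.

```python
import math


def dct(f):
    """Forward DCT as read from eq 3 (orthonormal form, assumed)."""
    n = len(f)
    coeffs = []
    for k in range(n):
        nu = 1.0 / math.sqrt(2.0) if k == 0 else 1.0
        s = sum(f[j] * math.cos((2 * j + 1) * k * math.pi / (2 * n))
                for j in range(n))
        coeffs.append(math.sqrt(2.0 / n) * nu * s)
    return coeffs


def idct(coeffs):
    """Inverse of dct(); valid because the transform matrix is orthonormal."""
    n = len(coeffs)
    f = []
    for j in range(n):
        s = 0.0
        for k in range(n):
            nu = 1.0 / math.sqrt(2.0) if k == 0 else 1.0
            s += nu * coeffs[k] * math.cos((2 * j + 1) * k * math.pi / (2 * n))
        f.append(math.sqrt(2.0 / n) * s)
    return f


def keep_largest(coeffs, m):
    """Thresholding (illustrative rule): zero all but the m largest-magnitude coefficients."""
    order = sorted(range(len(coeffs)), key=lambda k: abs(coeffs[k]), reverse=True)
    keep = set(order[:m])
    return [c if k in keep else 0.0 for k, c in enumerate(coeffs)]


def rms_error(a, b):
    """Root-mean-square difference between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# Synthetic slow trend (hypothetical data, not from the paper's data sets).
signal = [50.0 + 5.0 * math.sin(2.0 * math.pi * j / 64.0) for j in range(64)]
coeffs = dct(signal)

# Reconstruct from 8 and from 16 retained coefficients and compare the errors.
recon_8 = idct(keep_largest(coeffs, 8))
recon_16 = idct(keep_largest(coeffs, 16))
print("RMS error, 8 of 64 coefficients: ", rms_error(signal, recon_8))
print("RMS error, 16 of 64 coefficients:", rms_error(signal, recon_16))
```

Because the transform is orthonormal under this reading, the squared reconstruction error equals the sum of squares of the discarded coefficients, so retaining more coefficients can never increase the error. The direct O(N²) summation is chosen for readability; a fast transform would be used for data sets of the size considered in the paper.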