Ind. Eng. Chem. Res. 1998, 37, 267-274

A Practical Assessment of Process Data Compression Techniques

Matthew J. Watson, Antonios Liakopoulos, Dragana Brzakovic, and Christos Georgakis*

Chemical Process Modeling and Control Research Center, Lehigh University, Bethlehem, Pennsylvania 18015

Plant data are used to compare the effectiveness of wavelet-based methods with other compression techniques. The challenge is to effectively treat the data so that the maximum compression ratio is achieved while the important features are retained in the compressed data. Wavelets have properties that are desirable for data compression. They are localized in time (or space) and in frequency. This means that important short-lived high-frequency disturbances can be preserved in the compressed data, and these disturbances may be differentiated from slower, low-frequency trends. Besides discrete wavelet transforms, linear interpolation, discrete cosine transform, and vector quantization are also used to compress data. The transform-based compression algorithms perform better than the linear interpolation methods, such as swinging door, that have been used traditionally in the chemical process industries. Among these techniques, the wavelet-based one compresses the process data with excellent overall and best local accuracy.

1. Introduction

There is often a need to retrieve large quantities of archival data for the purposes of plant diagnostics or model identification and validation. With the globalization of the operations of many companies, this task also implies that the archival and recalling locations might be separated by several thousand miles. To speed up retrieval time from archive to requesting computer, the data need to be transmitted in compressed form.

The problem of data compression for any type of engineering data has the same basis: maximize the compression ratio while maintaining as much of the desirable features of the signal as possible. The features that are retained depend upon the type of compression technique used.

The objective of any data compression algorithm is to represent a given data set with another, smaller data set. To accomplish this, a data compression algorithm takes advantage of any redundancy or repetition in the data. Frequently, data compression can be carried out for the purpose of separating the useful features of the data from those not needed. Most data compression algorithms consist of one or a combination of the following: data transform, quantization, and coding.

In contrast to the vast amount of research in data compression as applied to image or acoustic signals, there have been few studies of data compression in the process industries. Hale and Sellars (1981) describe a data compression algorithm used in industry, whereby a signal is approximated by a piecewise linear function. The swinging door algorithm (Bristol, 1990), which is a variation of the piecewise linear theme, is also in use in the process industry. Feehs and Arce (1988) describe how vector quantization can be used to compress process trend recordings. Hu and Arce (1988) later studied the application of subband decomposition before vector quantizing the signal. Taking the subband decomposition of a signal is closely related to taking the wavelet transform. Bakshi and Stephanopoulos published a series of papers studying how wavelets can be applied to extract temporal features and detect faults in process signals (Bakshi and Stephanopoulos, 1994a,b). More recently, the problem of compressing chemical process data through the use of wavelets has been addressed (Bakshi and Stephanopoulos, 1996; Watson et al., 1994). These papers present qualitative comparisons of some data compression algorithms and give some quantitative comparisons for short-time data sets.

The objective of this paper is to describe three ways in which data compression is achieved: piecewise linear functional approximation, application of a data transform and the discarding of the insignificant transform coefficients, and vector quantization. These methods are described in sections 2.1, 2.2, and 2.3, respectively. The data compression methods described are then used to compress large sets of real process data. One of the data sets contains four flow rate measurements from an Amoco depropanizer (MacFarlane, 1993); each variable in this data set contains 1399 data points. The other set is temperature data from duPont's Falcon project (Moser, 1994), and each variable consists of 17 152 time sequence measurements. Comparisons between the compression methods are presented and discussed in sections 3 and 4, mostly in terms of how the error, the difference between the reconstructed and original signal, varies as a function of compression ratio.

2. Compression Methods and Algorithms

2.1. Piecewise Linear Compression.
In piecewise linear compression a signal is assumed to continue in a straight line, within an error bound, until a point lying outside of the error bound forces a recording to be made. A new line is then assumed, and the algorithm continues.

* Author to whom correspondence should be addressed: Iacocca Hall, Lehigh University, 111 Research Drive, Bethlehem, PA 18015. Telephone: (610) 758-5432. Fax: (610) 758-5297. E-mail: [email protected].


Figure 1. Boxcar compression. (Redrawn from Hale and Sellars, 1981.)

Figure 2. Backward slope compression. (Redrawn from Hale and Sellars, 1981.)

In this type of data compression, where the recording time step varies, the measured value and its date-time tag must be recorded. If each date-time tag is assumed to require the same amount of memory as one recorded measurement, the compression ratio is

compression ratio = (no. of original measurements)/(2 × no. of recorded measurements)

The boxcar algorithm makes a recording when the current value differs from the last recorded value by an amount greater than or equal to the predetermined recording limit (error bound) for that variable. The previous value processed should be recorded, not the current value which caused the triggering of the recording, as in Figure 1. The boxcar algorithm performs best when the process runs for long stretches of steady-state operation.

The backward slope algorithm projects a recording limit into the future on the basis of the slope of the previous two recorded values. The previous value is recorded if the current value lies outside of the recording limit. Once a value is recorded, a new line and recording limit are projected into the future, and the algorithm is repeated (Figure 2).

The boxcar and backward slope algorithms can be combined into a method that is a hybrid of the two. If a value lies outside of the backward slope recording limit, the method reverts to the boxcar until a recording is made. If the boxcar test fails first, the method continues with backward slope until a recording is made. If both tests fail, a recording is made and the algorithm starts over. The swinging door algorithm (Bristol, 1990) is similar to the backward slope algorithm, except that the recording limit is based on the slope of the line between the previously recorded value and the current measured value. When the current measured value has exceeded the error bound, defined by the recording limit, the value at the previous time step is recorded and the algorithm is repeated.

These compression algorithms have a minimal computational load and were developed at a time when algorithms that achieved higher compression ratios did not justify the additional computation. Today's computational environment allows more complex algorithms, discussed below, to be applied effectively to process data.
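The recording logic of these algorithms is easy to state in code. The following is a minimal Python sketch of the boxcar variant; the function name and the use of sample indices as date-time tags are our own illustrative choices, not part of the original implementations.

    def boxcar_compress(signal, limit):
        # Store (date-time tag, value) pairs; here the sample index
        # stands in for the date-time tag.
        records = [(0, signal[0])]
        previous = signal[0]
        for i in range(1, len(signal)):
            # Trigger when the current value departs from the last recorded
            # value by at least the recording limit (error bound).
            if abs(signal[i] - records[-1][1]) >= limit:
                # Record the previous value, not the triggering value (Figure 1).
                records.append((i - 1, previous))
            previous = signal[i]
        records.append((len(signal) - 1, signal[-1]))  # keep the final point
        return records

    signal = [5.0, 5.1, 5.05, 5.6, 5.7, 5.65, 5.2, 5.1]
    records = boxcar_compress(signal, limit=0.4)
    # Each record stores a value plus a tag, hence the factor of 2 in the
    # compression ratio defined above:
    ratio = len(signal) / (2.0 * len(records))

The backward slope and swinging door variants differ only in how the recording limit is projected forward; the record keeping and the compression ratio calculation are identical.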
2.2. Data Transforms.

In the continuous formulation, a data transform is nondestructive or lossless, in that no information is lost when the transform is made. Typically the transform has an inverse that can perfectly reconstruct the original data. Examples of some commonly used transforms are the Laplace, the Fourier, and, recently, the wavelet transform. A linear transform often compacts most of the information of the original data set into a smaller number of vector components. For example, the discrete cosine transform performs a mapping from the time to the frequency domain, and often a signal vector has very little energy in the higher frequency regions of the spectral band. This property, known as compaction, implies that a large portion of the components of the transformed signal vector are likely to be very close to zero and may often be neglected entirely (Gersho and Gray, 1992). Setting these coefficients to zero is known as thresholding and is the basis for data compression through functional approximation. When coefficients are neglected, the transform is no longer lossless, in that the reconstructed signal differs from the original.

In the remainder of this section we briefly present the definitions of the transforms considered in this paper. Some properties of the transforms relevant to the discussion of the results are also reviewed.

The continuous Fourier transform is given by

H(f) = \int_{-\infty}^{\infty} h(t) e^{-2\pi i f t} \, dt    (1)

where f is the frequency and h(t) is the function in the time domain. The discrete form of the same transform is

H_n = \sum_{k=0}^{N-1} h_k e^{-2\pi i k n / N}    (2)

where h_k, k = 0, 1, ..., N - 1, is the time sequence array of length N, and H_n is the nth discrete Fourier transform coefficient. The discrete and continuous Fourier transforms are related, to a first approximation, by H(f_n) ≈ ∆H_n, where ∆ is the sampling interval in the time domain.
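Equation 2 follows the same sign and scaling convention as standard FFT routines, which is how the transform would be computed in practice. A short Python check (our own illustration, using numpy) makes the convention explicit:

    import numpy as np

    N = 8
    h = np.random.rand(N)
    k = np.arange(N)
    # Direct evaluation of eq 2 for each coefficient index n.
    H = np.array([np.sum(h * np.exp(-2j * np.pi * k * n / N)) for n in range(N)])
    assert np.allclose(H, np.fft.fft(h))  # numpy's FFT uses the convention of eq 2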

The discrete cosine transform of N data points is given by

F_k = \nu_k \sqrt{2/N} \sum_{j=0}^{N-1} f_j \cos\left[\frac{(2j+1)k\pi}{2N}\right],  k = 0, 1, 2, ..., N - 1    (3)

where

\nu_k = \begin{cases} \sqrt{1/2} & \text{if } k = 0 \\ 1 & \text{otherwise} \end{cases}

f_j is the discrete time data sequence and F_k is the kth discrete cosine transform coefficient. Algorithms for computing the discrete cosine transform can be found in other references (Ahmed et al., 1974; Elliott and Rao, 1982; Rao and Yip, 1990).

The integral wavelet transform can be written as

\hat{f}(a,b) = |a|^{-1/2} \int_{-\infty}^{\infty} f(t) \, \psi\left(\frac{t-b}{a}\right) dt    (4)

where ψ(t) is a wavelet function, known as the mother wavelet. If the wavelet transform \hat{f}(a,b) is evaluated at the position b = k/2^j and with dilation a = 2^{-j}, and if the wavelet dilates and translates ψ_{j,k}(t) = 2^{j/2} ψ(2^j t - k) are orthonormal, then the wavelet coefficients are

c_{j,k} = \hat{f}\left(\frac{1}{2^j}, \frac{k}{2^j}\right)    (5)

where j and k are integers. Most functions and discrete data sets can be expressed in terms of a doubly infinite series as follows:

f(t) = \sum_{j,k=-\infty}^{\infty} c_{j,k} \psi_{j,k}(t)    (6)

For a discrete data set and an orthonormal wavelet basis, the wavelet transform coefficients are given by

c_{j,k} = 2^{j/2} \sum_{l=-\infty}^{\infty} f(l) \, \psi(2^j l - k)    (7)

where f(l) is the lth element of the discrete time data sequence (Chui, 1992). In this work we extensively use Daubechies' wavelets (Daubechies, 1988, 1992), which are orthonormal and have compact support. They can be expressed in terms of finite impulse response (FIR) filter coefficients, which simplifies the calculation of the wavelet transform coefficients and its inverse and allows for fast computation.

Figure 3. Discrete wavelet transform and power spectrum of a signal whose frequency varies with time. (a) The signal is sin 50πt³; (b) frequency spectrum of the signal; (c) multiresolution analysis representation of the wavelet transform.

There are several ways of taking the discrete wavelet transform. The most computationally efficient is multiresolution analysis (Mallat, 1989; Strang, 1989). Daubechies' wavelets and the multiresolution analysis form of the discrete wavelet transform are used in this paper. The multiresolution analysis output from a signal can be represented as a gray-level image on the time-frequency plane, where dark areas represent coefficients with large magnitude and white areas represent zero coefficients (Taswell, 1993), or in three-dimensional space, where the three axes are time, frequency, and coefficient magnitude. Figure 3a shows the function sin 50πt³ sampled at 128 discrete points over the time interval t = [0, 1]. In contrast to the Fourier and cosine transforms, which can only resolve a function in terms of its frequency components, the wavelet transform simultaneously resolves a set of data into its time and frequency (translation and dilation) components. Thus, short-lived high-frequency components of the data can be distinguished and, if necessary, separated from slower temporal trends. The frequency (Fourier) spectrum, in Figure 3b, shows what frequencies are present in the entire signal but tells one nothing of where those frequencies occur. In contrast, the discrete wavelet transform of the signal, represented on a time-frequency plane using multiresolution analysis in Figure 3c, gives information on the magnitude of frequencies over discrete time intervals. Each discrete wavelet transform coefficient is represented within a rectangular time-frequency box, known as a tile, of area ∆t∆f = constant. Information about low frequencies is given over a longer period of time, corresponding to a wide, short tile. Information at high frequencies, while blurred over a larger frequency range, is given over a smaller time interval, corresponding to a narrow, tall tile.
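As a concrete sketch of this decomposition, the fragment below assumes the PyWavelets package (pywt), our choice of tool, which implements the multiresolution (filter bank) form of the transform with Daubechies' FIR filters:

    import numpy as np
    import pywt

    t = np.linspace(0.0, 1.0, 128, endpoint=False)
    f = np.sin(50 * np.pi * t**3)      # the signal of Figure 3a

    # Multiresolution analysis: a cascade of FIR filtering and downsampling.
    # coeffs holds the coarsest approximation followed by the detail
    # coefficients at successively finer scales.
    coeffs = pywt.wavedec(f, 'db5')    # Daubechies' fifth-order wavelet

    # Before any coefficients are discarded the transform is lossless:
    # the inverse reconstructs the signal to machine precision.
    assert np.allclose(pywt.waverec(coeffs, 'db5'), f)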
Once the discrete transform has been calculated, compression is achieved by rounding the smallest coefficients to zero until either a desired compression ratio is reached or the accuracy of the reconstructed signal has exceeded a desired bound. The frequent occurrence of the zero transform coefficients can be taken advantage of, for example, by storing the non-zero coefficients in the form {sign(w_k)(k + |w_k|/|w_k|_∞)} (Kantor, 1993), where w_k are the transform coefficients that have been sorted in descending order. The non-zero coefficients, in this form, must be stored with the length of the original vector and |w_k|_∞, the maximum transform coefficient. The compression ratio is then given by

compression ratio = (length of original vector)/(number of non-zero coefficients + 2)    (8)

2.3. Quantization.

Quantization is similar to rounding numbers to a desired level of accuracy, but it is not necessarily restricted to rounding to the nearest whole number or to a decimal place. In the scalar case the real line is divided up into line segments known as cells. The cells are smaller (closer together) in the region of the real line where a measurement is most likely to occur. Outside of this region the cells can be made larger without a significant loss in accuracy. Scalar quantization can be extended to vector quantization where, instead of dividing the real line into line segments, the R^k space is divided into smaller regions of dimension k (Gersho and Gray, 1992; Linde et al., 1980). For example, in two dimensions a cell becomes a polygon and in three dimensions a polyhedron. Each cell, i, is assigned a single value (vector), y_i, which is the midpoint of that cell. The collection of values (vectors), y_i, is known as the codebook. A data point, or vector of points, that lies within cell i is approximated by the value (vector) y_i. Quantization is lossy in that a quantized signal cannot be perfectly reconstructed or inverted to give the exact original signal. The advantage is much higher compression ratios.

Figure 4. Amoco and duPont data sets.

Vector quantization was implemented by taking sequential blocks of data of a single variable, of the same dimension as the codebook elements, and quantizing the blocks. For example, if a three-dimensional codebook was used, blocks of three data points were quantized sequentially. This approach was used here since the motivation for compressing the data is primarily to decrease transmission time of individual measurements.

To calculate the codebook, with a given number of elements, N, and dimension, k, an iterative procedure must be used (Gersho and Gray, 1992). The iteration is generally based on a subset of the entire data set, known as training data. While the optimality of the codebook can be guaranteed for the compression of the training data, it cannot be guaranteed for the entire data set to be compressed. The ability of the vector quantizer to compress data effectively depends on the training data that are used to calculate the codebook.

Once a decision has been made about the number of elements N in the codebook and the dimension k of each element, the compression ratio can be calculated on the basis of the number of bits required to represent the signal before and after quantization. To calculate the memory required after the signal has been quantized, we have assumed it has been coded. The idea of coding is to assign a binary code to the codebook indices i. Numerals from another base, for example hexadecimal, could be used here also. The theoretical limit to the average number of bits needed to represent a given set of characters is given by the information entropy function (Held and Marshall, 1991), measured in bits:

H = -\sum_{i=1}^{N} P_i \log_2 P_i    (9)

where P_i is the probability of the ith codebook element occurring and N is the number of cells (quantization intervals). Since the P_i's are probabilities, \sum_i P_i = 1. In general, each of the elements of a codebook has an approximately equal probability of occurring, so that P_i = 1/N in eq 9. For this reason, coding the index with a binary number of constant bitlength is good enough. Then, from eq 9, the number of bits required to code the indices, i = 1, 2, 3, ..., N, from the codebook is log2(N) rounded up to the nearest integer. A 64-bit computer requires k × 64 bits of memory to represent each group of data to be quantized by a k-dimensional codebook. The compression ratio is then

compression ratio = (k × 64)/log2(N)    (10)

since the memory required to store the codebook (= 64Nk bits) can be neglected for large data sets. For example, a three-dimensional codebook containing 128 elements has a compression ratio of 3 × 64/log2(128) = 27.43.
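The following Python sketch shows the core of such an iterative codebook design together with the block-wise quantization step. It is a plain Lloyd-type iteration with random initialization, standing in for the pairwise nearest neighbor initialization and LBG training used later in this paper; all function names are our own.

    import numpy as np

    def train_codebook(training, n_codes, n_iter=50, seed=0):
        # training: float array of shape (n_vectors, k). Each iteration
        # assigns every training vector to its nearest code vector, then
        # moves each code vector to the centroid of its cell.
        rng = np.random.default_rng(seed)
        codebook = training[rng.choice(len(training), n_codes, replace=False)].copy()
        for _ in range(n_iter):
            dist = ((training[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
            cell = dist.argmin(axis=1)
            for i in range(n_codes):
                members = training[cell == i]
                if len(members) > 0:
                    codebook[i] = members.mean(axis=0)
        return codebook

    def quantize(signal, codebook):
        # Split the signal into sequential k-dimensional blocks and store
        # only the index of the nearest codebook vector for each block.
        k = codebook.shape[1]
        blocks = signal[: (len(signal) // k) * k].reshape(-1, k)
        dist = ((blocks[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        return dist.argmin(axis=1)

Per eq 10, a design with k = 3 and N = 128 gives a compression ratio of 3 × 64/log2(128) = 27.43, independent of the signal length.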
3. Results

Two groups of plant data were used to test the following data compression methods: piecewise linear functional approximation, vector quantization, and data transforms (Fourier, cosine, and wavelet). The first set of data is flow rate measurements from one of Amoco's depropanizing distillation columns (MacFarlane, 1993). The second set of data is from duPont's Falcon project (Moser, 1994). Two sets of data from the first group and two from the second group are shown in Figure 4.

The % relative global error is used in the evaluation of the effectiveness of the compression algorithms. It is calculated as the ratio of the L2 norm of the difference between the original and reconstructed signals to the L2 norm of the original signal:

% relative global error = 100 × \sqrt{\frac{\sum_i (f_i - \hat{f}_i)^2}{\sum_i f_i^2}}    (11)

where f_i denotes the ith element of the original signal and \hat{f}_i is the ith element of the reconstructed signal. The % relative maximum error, which gives a localized measure of the error, is based on the absolute value of the maximum difference between the original and reconstructed signals and is expressed as a fraction of the L∞ norm of the signal:

% relative maximum error = 100 × \frac{\max_i |f_i - \hat{f}_i|}{\max_i |f_i|}    (12)
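In code, the two error measures of eqs 11 and 12 are one-liners (a Python sketch; the function names are our own):

    import numpy as np

    def relative_global_error(f, f_hat):
        # Eq 11: ratio of L2 norms, as a percentage.
        return 100.0 * np.sqrt(np.sum((f - f_hat) ** 2) / np.sum(f ** 2))

    def relative_maximum_error(f, f_hat):
        # Eq 12: worst pointwise deviation relative to the L-infinity norm.
        return 100.0 * np.max(np.abs(f - f_hat)) / np.max(np.abs(f))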


Figure 5. Comparison of piecewise-linear compression algorithms: -·-, boxcar; - -, backward slope; - - -, boxcar and backward slope; solid line, swinging door.

Figure 6. Vector quantization of data: ●, scalar codebook; +, two-dimensional codebook; -·-, three-dimensional codebook; - - -, four-dimensional codebook; - -, five-dimensional codebook; solid line, six-dimensional codebook. The left-most point of each curve represents the design with the largest number of code elements.

3.1. Piecewise Linear Compression.

For the piecewise linear case, the relation between compression ratio and error was found by varying the recording limit, making the linear approximations, and comparing the reconstructed signal with the original. The results are shown in Figure 5. Note that the recording limit depends on the range of the signal. For example, the reboiler flow has a range of values from 5.9 to 8.1, and its recording limit was varied from 0.1 to 0.5, whereas the feed flow has a range of 3500-5500, so its recording limit was considerably larger. Figure 5 shows that the performances of the piecewise linear methods are comparable, an expected result considering that the same interpolating function is used to reconstruct the signals, although the polynomials used to establish the recording limit vary between zeroth and first order. In terms of the local error, it was found that the swinging door algorithm compressed the data most effectively (Watson, 1996).

3.2. Vector Quantizer Compression.

The Amoco and Falcon data sets were quantized with codebooks of various dimensions and sizes. The dimension was varied between 1 (scalar) and 6 for codebooks of size 16, 32, and 64. For codebooks containing 128 elements the dimension was varied from 1 to 4, and for codebooks with 256 elements the dimension was 1 and 2. The Amoco data sets are relatively small, and the entire set is used as the training data in the calculation of each codebook. The duPont data sets are much larger and, to simulate what would be done in a practical setting, training data sets are selected. This also reduces the computational load. An initial codebook was found iteratively using the pairwise nearest neighbor algorithm (Gersho and Gray, 1992), and the initial codebook was then trained using the LBG algorithm, named after Linde, Buzo, and Gray (1980). The training ratio (the ratio of the number of training vectors to the number of codebook vectors) was 8.

Figure 6 shows the effect of codebook size and dimension on vector quantization compression performance for the Amoco and Falcon data. The general trend is clear: the most effective codebook has a high dimension and a large number of elements.
Figure 7. Comparison of transforms' effectiveness in compressing data: solid line, Daubechies' fifth-order wavelet; - -, Fourier transform; - - -, cosine transform.

3.3. Transform Compression.

Figure 7 shows the % relative global error as a function of compression ratio for the wavelet, discrete cosine, and Fourier transforms. The curves were generated by taking the transform of the entire signal and varying the number of the transform coefficients that are considered small and are thus set to zero. The compression ratio is varied by changing the number of discarded coefficients, and the reconstructed signal is compared with the original to calculate the relative global error. The wavelet transform of the signal was taken using Daubechies' compactly supported orthonormal wavelets (Daubechies, 1988). Information entropy was used to evaluate the order of the wavelet that would compress the signal most effectively, although, in general, all the different wavelets perform equally effectively. In fact, a set of 20 wavelets (Taswell, 1993) was tested, and it was found that there was no significant variation in the compression performance (Watson, 1996).

From Figure 7 it is obvious that the wavelet and cosine transforms have performed better than the Fourier transform. It is well-known that in general the discrete cosine transform is better at compressing data than the discrete Fourier transform (Rao and Yip, 1990). It might be considered a surprise that the cosine did as well as the wavelet transform, but it should be noted that the comparison is made on the basis of global error and not the local error. Comments on the smaller local error achieved by the wavelet transform compression are presented in the Discussion.
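The procedure behind the curves of Figure 7 can be sketched in a few lines of Python. The helper below again assumes pywt, and the particular retained-coefficient counts are our own illustrative choices:

    import numpy as np
    import pywt

    def wavelet_compress(f, keep, wavelet='db5'):
        # Transform, zero all but the 'keep' largest-magnitude coefficients,
        # and reconstruct.
        flat, slices = pywt.coeffs_to_array(pywt.wavedec(f, wavelet))
        flat[np.argsort(np.abs(flat))[:-keep]] = 0.0
        coeffs = pywt.array_to_coeffs(flat, slices, output_format='wavedec')
        return pywt.waverec(coeffs, wavelet)

    t = np.linspace(0.0, 1.0, 128, endpoint=False)
    f = np.sin(50 * np.pi * t**3)
    for keep in (64, 32, 16, 8):
        f_hat = wavelet_compress(f, keep)[: len(f)]
        err = 100.0 * np.sqrt(np.sum((f - f_hat) ** 2) / np.sum(f ** 2))  # eq 11
        print(keep, round(err, 2))  # fewer retained coefficients, larger error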

Table 1. Effect of Using a Different Codebook on the Reconstruction Error of Data Set dP1

                                           relative global error (%)
    codebook dimension   compression ratio   codebook (a)   codebook (b)
            1                   10               0.2            0.3
            2                   25               0.2            0.5
            3                   45               0.2            0.8
            4                   50               0.2            0.8
            5                   70               0.2            1.0
            6                   80               0.2            1.0

Figure 8. Overall comparison of the compression algorithms: solid line, wavelet; -·-, discrete cosine transform; - - -, vector quantization; - -, swinging door.

4. Discussion

To facilitate the comparison of the compression techniques, only the best from each family of algorithms are considered in this section. Among the linear methods, the swinging door performed the best. From the transform methods we select, for this comparison, the wavelet transform and its closest competitor, the cosine transform. Highlights of the Results section are shown in Figure 8.

Clearly the swinging door algorithm does not compress data as effectively as the other two compression algorithms. However, the advantage of using a piecewise linear method is its simplicity. The recording limit can be matched to the intrinsic noise level of a measurement so that only significant changes and trends are recorded. Many process variables have long periods of steady-state or pseudo-steady-state behavior, and piecewise linear functions are well suited to describing this type of behavior. Reconstruction of the compressed data is simply a matter of drawing straight lines between the recorded measurements and need not involve a computer if hard copies of the recorded measurements are available. Insight can be gained from the compressed data without having to first decode the data. The disadvantages are the date-time tag required for each recording and the varying time step between measurements. If each measurement has its own date-time tag, much of the compression is lost. Also, the variable time step can be problematic for tasks such as model identification.

The performance of a vector quantizer is strongly dependent on the dimension and the size of the codebook. To compress data most effectively, large codebooks of a high dimension are required. However, the design of a codebook, even for a small set of training data, is extremely time consuming. For example, if the codebook is to contain 64 one-dimensional elements and a training ratio of 8 is used, the training data consist of 512 elements. The first exhaustive search for the nearest neighbor (see section 2.3) consists of \sum_{n=1}^{511} n comparisons, the second of \sum_{n=1}^{510} n comparisons, and so on. Finding the 64 nearest neighbors requires \sum_{k=64}^{511} \sum_{n=1}^{k} n = 22 325 856 comparisons, as the short check below confirms.
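A quick Python verification of this double sum (our own arithmetic check):

    # 448 merge steps take the training set from 512 elements down to 64.
    total = sum(n for k in range(64, 512) for n in range(1, k + 1))
    assert total == 22_325_856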
Only the six-dimensional vector quantization results are shown in Figure 8, since this gave the lowest reconstruction errors for a given compression ratio. The codebook sizes shown are 16, 32, and 64, which correspond to decreasing compression ratios and data reconstruction errors. Designing a codebook of this nature for a single process variable requires an unreasonably large amount of computer time, especially if the data set is large. For example, the design of a six-dimensional codebook containing 64 elements for data sets dP1 or dP2 takes several hours of CPU time on an IBM Powerstation 320H workstation, using Matlab algorithms. The designing of a separate codebook for every measured variable of a large processing plant is simply infeasible.

The justification for this computational effort is that once the codebook has been calculated, it can be used to quantize an infinite length of data. But the quantization is only effective as long as the data are similar to the training data. If the training data used to design the codebook are not selected appropriately, a suboptimal quantizer results. A case in point is the failure to select enough training data to cover the entire range of measurements that can be expected from the measuring device. Choosing appropriate training data requires either a lot of data, and hence a lot of computer time, or engineering judgment. Neither of these requirements is satisfactory for most problems. Figure 8 shows that the quantizer compresses data more effectively than wavelet transforms when the codebook design is based on the entire data set, as was the case for the Amoco flow rate data. In a typical application, the length of the data set to be quantized is considerably longer than that of the Amoco flow rate data used in this study, and thus the training data used to design the codebook are a fraction of the total data available.

Vector quantization is not a global method. That is, a codebook that compresses one data set effectively may not compress another data set very well. Table 1 illustrates this point. The codebook of one data set was used to compress another data set with a similar range. Codebook (a) is the original codebook intended for Falcon data set 1 (dP1). The compression ratio that corresponds to an error of 0.2% is found, and codebook (b), from another similar data set, is used to compress data set 1 to the same compression ratio. The error values are listed in Table 1. Clearly, as the dimension of codebook (b) increases, the error of the quantization increases. If the codebook from a data set with a vastly different range were used instead, the relative global error would be considerably worse. Ideally, a separate codebook must be calculated for each process data set that is to be compressed.

The calculation of the compression ratio for a technique based on functional approximation is independent of the number of bits required by the computer to store each measurement.

However, the calculation of the compression ratio for vector quantization, eq 10, is directly proportional to the computer's bitlength. A lower bitlength shifts the plots of compression ratio versus error, in Figures 6 and 8, to the left, thereby lowering the compression performance of the quantizer.

Figure 9. Falcon data set 3 (top) and the difference between reconstructed and original data for vector quantization (VQ), discrete cosine transform (DCT), and discrete wavelet transform (DWT). Note that the DWT scale has been magnified 10 times.

The trends that are of interest in the process industries are usually dynamic in nature and should be reconstructed accurately from the compressed data. Data that contain long periods of steady-state behavior with intermittent regions of dynamic behavior are not compressed very effectively with a vector quantizer. It is not always possible to capture all of the dynamic characteristics of the data, since vector quantization is essentially an averaging process. That is to say, if the steady-state behavior is more prevalent in the data, the vector quantizer gives more weight to reconstructing the steady-state part of the data more accurately than the transient or intermittent dynamic behavior. To illustrate the point, Figure 9 shows a data set, also from duPont's Falcon project, that contains long periods of steady-state behavior. The vector quantizer incurs large errors around the points where there is a large step change, whereas the wavelet-based technique shows almost no difference between the original and compressed signal around the step changes.

From Figure 8, it might appear, at first sight, that using wavelets to compress data has no significant advantage over the discrete cosine transform. It shows that the discrete cosine transform compresses the data as effectively as the discrete wavelet transform and in some cases more effectively. However, wavelets are well suited to describing long periods of steady-state behavior followed by abrupt changes because of their ability to resolve a signal simultaneously into its time and frequency components. Sinusoidal bases tend to introduce large errors in the region of the abrupt change when some of the coefficients are thresholded. This was illustrated in Figure 9, where the difference between the reconstructed and original signal is largest around the step changes. This is due to the well-known Gibbs phenomenon. To illustrate the point, we computed the discrete cosine transforms of a step function and a discrete time impulse and truncated the number of coefficients that are used in the reconstruction series.

Figure 10. Gibbs phenomenon for third-order Daubechies' wavelet and sinusoidal basis: (a) step function reconstructed from a truncated wavelet series; (b) step function reconstructed from a truncated discrete cosine series; (c) discrete time impulse reconstructed from a truncated wavelet series; (d) discrete time impulse reconstructed from a truncated discrete cosine series.

In Figure 10, two signals, a step and a spike, each consisting of 200 points, have been reconstructed from truncated wavelet and cosine series. Of the 200 series coefficients, 166 were set to zero for the step function, and 181 were set to zero for the spike. It can be seen in Figure 10 that wavelets describe a step function more accurately, even though the same number of coefficients are thresholded. Notice also that the maximum value of the discrete time impulse function, Figure 10c,d, is considerably less than 1 when it is reconstructed from a truncated cosine series, whereas the truncated wavelet series preserves the original magnitude of the spike.

Figure 11. Comparison of transforms' effectiveness in compressing data: solid line, wavelet transform; - - -, Fourier transform; - -, cosine transform.

The comparisons have, up until now, been based entirely on the relative global error defined in eq 11, which is based on the L2 norm. If the maximum relative error is used as a basis of comparison instead, the picture is more complete. Figure 11 shows the % relative maximum error, eq 12, as a function of compression ratio. Clearly, the wavelet transform has a consistently lower relative maximum error. As was shown in Figure 10, the error is largest for a truncated sinusoidal series when there is a sudden change, such as a step or spike. Wavelets, however, are better suited to describe sudden changes because the reconstruction series combines dilated and translated versions of the mother wavelet. These properties allow a sudden, time-localized change to be captured in a smaller number of larger wavelet coefficients.
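The behavior in Figure 10 is straightforward to reproduce. The sketch below (Python, assuming pywt and scipy; the effective retained-coefficient count differs slightly from the paper's because of boundary handling in the discrete wavelet transform) reconstructs a 200-point step from heavily truncated cosine and wavelet series:

    import numpy as np
    import pywt
    from scipy.fft import dct, idct

    f = np.zeros(200)
    f[100:] = 1.0                                  # a step function, as in Figure 10

    # Truncated cosine series: keep the 34 largest of the 200 DCT coefficients.
    c = dct(f, norm='ortho')
    c[np.argsort(np.abs(c))[:-34]] = 0.0
    step_cos = idct(c, norm='ortho')               # rings near the step (Gibbs)

    # Truncated third-order Daubechies wavelet series, same number retained.
    w, s = pywt.coeffs_to_array(pywt.wavedec(f, 'db3'))
    w[np.argsort(np.abs(w))[:-34]] = 0.0
    coeffs = pywt.array_to_coeffs(w, s, output_format='wavedec')
    step_wav = pywt.waverec(coeffs, 'db3')[:200]

    # The wavelet reconstruction tracks the step much more closely.
    print(np.abs(f - step_cos).max(), np.abs(f - step_wav).max())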
This was The comparisons have, up until now, been made based illustrated in Figure 9, where the difference between entirely on the relative global error defined in eq 11, the reconstructed and original signal is largest around which is based on the L2 norm. If the maximum relative the step changes. This is due to the well-known Gibb’s error is used as a basis of comparison instead, the phenomenon. To illustrate the point, we computed the picture is more complete. Figure 11 shows the % discrete cosine transforms of a step function and a relative maximum error, eq 12, as a function of com- discrete time impulse and truncated the number of pression ratio. Clearly, the wavelet transform has a coefficients that are used in the reconstruction series. consistently lower relative maximum error. As was 274 Ind. Eng. Chem. Res., Vol. 37, No. 1, 1998 shown in Figure 10, the error is largest for a truncated Bakshi, B. R.; Stephanopoulos, G. Representation of process sinusoidal series when there is a sudden change, such trendssIII. Multiscale extraction of trends from process data. as a step or spike. Wavelets, however, are better suited Comput. Chem. Eng. 1994a, 18 (4), 267-302. Bakshi, B. R.; Stephanopoulos, G. Representation of process to describe sudden changes because the reconstruction trendssIV. Induction of real-time patterns from operating data series combines dilated and translated versions of the for diagnosis and supervisory control. Comput. Chem. Eng. mother wavelet. These two factors allow a sudden, 1994b, 18 (4), 303-332. time-localized change to be captured in a smaller Bakshi, B. R.; Stephanopoulos, G. Compression of chemical process number of larger wavelet coefficients. data through functional approximation and feature extraction. There are some issues that remain for future work. AIChE J. 1996, 42 (2), 477-492. Bristol, E. H. Swinging door trending: Adaptive trend recording? Two of them are discussed here. Firstly non-zero Advances in Instrumentation and Control; Instrument Society transform coefficients can be quantized and encoded of America: Research Triangle Park, NC, 1990; Vol. 45, pp 749- prior to storage to increase the compression ratio. The 754. effect of this on the error of reconstruction needs to be Chui, C. K. An Introduction to Wavelets; Academic Press: San evaluated. Secondly, it would be of interest to compare Diego, 1992. the effectiveness of the wavelet transform with the Daubechies, I. Orthonormal bases of compactly supported wave- lets. Commun. Pure Appl. Math. 1988, 41, 909-996. Karhunen-Loeve transform. The Karhunen-Loeve Daubechies, I. Ten Lectures on Wavelets; Society for Industrial and transform is an optimal data transform, although there Applied Mathematics: Philadelphia, 1992. is no fast algorithm and the set of basis functions, or Elliott, D. F.; Rao, K. R. Fast Transforms: Algorithms, Analyses, vectors, is determined on a case by case basis (Rao and Applications; Academic Press, Inc: New York, 1982. Yip, 1990). Feehs, R. J.; Arce G. R. Vector quantization for data compression of trend recordings. Technical Report, Udel-EE 88-11-1; Uni- versity of Delaware: Newark, DE, 1988. 5. Conclusions Gersho, A.; Gray R. M. Vector Quantization and Signal Compres- In this paper, several ways in which process data can sion; Kluwer Academic Publishers: Boston, 1992. be compressed were described. The different compres- Hale, J.; Sellars, H. Historical data recording for process comput- sion methods were applied to long sets of real plant data, ers. Chem. Eng. Prog. 
Acknowledgment

The authors thank Amoco and duPont for supplying the data.

Literature Cited

Ahmed, N.; Natarajan, T.; Rao, K. R. Discrete cosine transform. IEEE Trans. Comput. 1974, C-23, 90-93.

Bakshi, B. R.; Stephanopoulos, G. Representation of process trends. III. Multiscale extraction of trends from process data. Comput. Chem. Eng. 1994a, 18 (4), 267-302.

Bakshi, B. R.; Stephanopoulos, G. Representation of process trends. IV. Induction of real-time patterns from operating data for diagnosis and supervisory control. Comput. Chem. Eng. 1994b, 18 (4), 303-332.

Bakshi, B. R.; Stephanopoulos, G. Compression of chemical process data through functional approximation and feature extraction. AIChE J. 1996, 42 (2), 477-492.

Bristol, E. H. Swinging door trending: Adaptive trend recording? Advances in Instrumentation and Control; Instrument Society of America: Research Triangle Park, NC, 1990; Vol. 45, pp 749-754.

Chui, C. K. An Introduction to Wavelets; Academic Press: San Diego, 1992.

Daubechies, I. Orthonormal bases of compactly supported wavelets. Commun. Pure Appl. Math. 1988, 41, 909-996.

Daubechies, I. Ten Lectures on Wavelets; Society for Industrial and Applied Mathematics: Philadelphia, 1992.

Elliott, D. F.; Rao, K. R. Fast Transforms: Algorithms, Analyses, Applications; Academic Press: New York, 1982.

Feehs, R. J.; Arce, G. R. Vector quantization for data compression of trend recordings. Technical Report Udel-EE 88-11-1; University of Delaware: Newark, DE, 1988.

Gersho, A.; Gray, R. M. Vector Quantization and Signal Compression; Kluwer Academic Publishers: Boston, 1992.

Hale, J.; Sellars, H. Historical data recording for process computers. Chem. Eng. Prog. 1981, 77 (11), 38-43.

Held, G.; Marshall, T. Data Compression: Techniques and Applications: Hardware and Software Considerations; Wiley: New York, 1991.

Hu, T. W.; Arce, G. R. Application of subband decomposition to process control data. Technical Report Udel-EE 88-12-1; University of Delaware: Newark, DE, 1988.

Kantor, J. C. Wavelet Toolbox Reference, Version 1.1; University of Notre Dame: Notre Dame, IN, 1993.

Linde, Y.; Buzo, A.; Gray, R. An algorithm for vector quantizer design. IEEE Trans. Commun. 1980, COM-28, 84-95.

MacFarlane, R. Amoco Corporation, personal communication, 1993.

Mallat, S. G. Multiresolution approximations and wavelet orthonormal bases of L²(R). Trans. Am. Math. Soc. 1989, 315 (1), 69-87.

Moser, A. R. E. I. du Pont de Nemours and Company, personal communication, 1994.

Rao, K. R.; Yip, P. Discrete Cosine Transform: Algorithms, Advantages, Applications; Academic Press: Boston, 1990.

Strang, G. Wavelets and dilation equations: A brief introduction. SIAM Rev. 1989, 31 (4), 614-627.

Taswell, C. WavBox 3: Wavelet Toolbox for Matlab; Stanford University: Stanford, CA, 1993.

Watson, M. J. Wavelet Techniques in Process Data Compression. Master's Thesis, Lehigh University, Bethlehem, PA, 1996.

Watson, M. J.; Liakopoulos, A.; Brzakovic, D.; Georgakis, C. Wavelet techniques in data compression and dynamic model identification. Research Progress Report 19; Chemical Process Modeling and Control Research Center, Lehigh University: Bethlehem, PA, 1994.

Received for review June 2, 1997
Revised manuscript received October 15, 1997
Accepted October 17, 1997 (X)

(X) Abstract published in Advance ACS Abstracts, December 15, 1997.

IE970401W