ARTICLE IN PRESS

Control Engineering Practice 12 (2004) 917–929

Estimating product composition profiles in batch distillation via partial least squares regression

Eliana Zamprogna (a), Massimiliano Barolo (a,*), Dale E. Seborg (b)

(a) Dipartimento di Principi e Impianti di Ingegneria Chimica (DIPIC), Università di Padova, Via Marzolo, 9, 35131 Padova PD, Italy
(b) Department of Chemical Engineering, University of California, Santa Barbara, CA 93106, USA

Received 15 February 2003; accepted 24 November 2003

Abstract

The properties of two multivariate regression techniques, principal component analysis and partial least squares (PLS) regression, are exploited to develop soft sensors able to estimate the product composition profiles in a simulated batch distillation process using available temperature measurements. The estimators' performance is evaluated with respect to several issues, such as pre-processing of the calibration and validation sets, number of measurements used as sensor inputs, presence of noise in the input measurements, and use of lagged measurements. A simple augmentation of the conventional PLS regression approach is also proposed, which is based on the development and sequential use of multiple regression models. The results prove that the PLS estimators can provide accurate composition estimations for a batch distillation process. The computational requirements are very low, which makes the estimators attractive for on-line use.

© 2004 Elsevier Ltd. All rights reserved.

Keywords: Batch distillation; Composition estimators; Soft sensors; Partial least squares regression; Principal component analysis

* Corresponding author. Tel.: +39-0498275473; fax: +39-0498275461. E-mail address: [email protected] (M. Barolo).
0967-0661/$ - see front matter © 2004 Elsevier Ltd. All rights reserved. doi:10.1016/j.conengprac.2003.11.005

1. Introduction

Batch distillation is a well-known unit operation that is widely used in the fine chemistry, pharmaceutical, biochemical, and food industries to process small amounts of materials with high added value. The success of batch distillation as a method of separation is undoubtedly due to its operational flexibility. A single batch column can separate a multicomponent mixture into several products within a single operation; conversely, if the separation were carried out continuously, either a train of columns or a multi-pass operation would be required. Also, whenever completely different mixtures must be processed from day to day, the versatility of a batch column is unexcelled. These attributes are crucial for quickly responding to a market demand characterised by short product lifetimes and severe specification requirements.

Batch columns can be operated in three different ways: at constant reflux ratio (with variable distillate composition), at constant distillate composition (with variable reflux ratio), and at total reflux. A combination of these three basic modes can be used to optimize the performance of the separation. Whatever the operating mode, proper operation of a batch column requires knowledge of the product compositions during the entire duration of the batch. Although product composition can be measured on-line, it is well known that on-line analyzers are complex pieces of equipment that are expensive and difficult to maintain; they also entail significant measurement delays, which can be detrimental from the control point of view (Leegwater, 1992). Therefore, to circumvent these disadvantages, it is possible to estimate the product composition on-line, rather than measuring it. The use of such inferential composition estimators (or software sensors) has long been suggested to assist the monitoring and control of continuous distillation columns. Several applications have been reported in the literature, for both simulated and experimental columns (Joseph & Brosilow, 1978; Yu & Luyben, 1988; Lang & Gilles, 1990; Mejdell & Skogestad, 1991; Baratti, Bertucco, Da Rold, & Morbidelli, 1995; Chien & Ogunnaike, 1997; Kano, Miyazaki, Hasebe, & Hashimoto, 2000). However, the issue of composition estimation in batch distillation columns has received very little attention.

Quintero-Marmol, Luyben, and Georgakis (1991) and Quintero-Marmol and Luyben (1992) compared the performances of a steady-state composition estimator, a quasi-dynamic estimator (QDE), and an extended Luenberger observer (ELO) for a ternary batch column. They found that the ELO provided the best performance. However, they noted that the observer was quite sensitive to the accuracy of assumed vapor–liquid equilibria and to the assumed initial compositions; moreover, the estimator's performance rapidly degraded when the tray temperature measurements (i.e., the observer's inputs) were affected by noise. Similar issues were noted by Barolo and Berto (1998), who used the composition estimates generated by an ELO within a nonlinear strategy for composition control in a conventional batch rectifier. They also observed that the estimator accuracy tended to degrade if the tray hydraulics was taken into account and the number of trays was large. Han and Park (2001) used Luyben's QDE to estimate the distillate composition, and to control the estimated composition profile of a batch rectifier.

In order to improve the estimator's robustness to process/model mismatch and measurement noise, Barolo, Pistillo, and Trotta (2000) developed an extended Kalman filter (EKF) to reconstruct the product composition profiles of a middle-vessel batch column from temperature measurements. They showed that, while the robustness to measurement noise was generally improved with respect to an ELO, the estimation performance was greatly affected by the location of measurement sensors. The state vector initialization and filter "tuning" were more difficult than for the ELO case; moreover, the improvements in the estimator's performance were obtained at the expense of a much larger computational load. Oisiovici and Cruz (2000, 2001) developed an EKF to infer the product composition of a batch rectifier, and applied it within a globally linearizing control scheme to control operation at constant distillate composition. They pointed out that accurate description of vapor–liquid equilibria is very important for the performance of the filter. They also observed that the filter performance usually improved when larger sets of secondary measurements were used, and/or when the […] was increased; however, these options increased the estimator's complexity and the computational burden. Similar results were obtained by Venkateswarlu and Avantika (2001), who also pointed out the difficulty in tuning the EKF matrices.

To summarize the results reported in the above papers, QDEs do not seem to be accurate enough for actual monitoring and/or control applications in batch distillation. ELOs are more reliable, and have modest computational requirements, but they may be difficult to initialize, and suffer from very poor robustness to process/model mismatch and to measurement noise. EKFs are much more robust to mismatch and noise, but their performance still depends heavily on the thermodynamic modeling of vapor–liquid equilibria; they are also difficult to initialize and tune, and require considerable computational effort for on-line use.

In all of the batch distillation studies cited above, the composition estimator was obtained based on a fundamental (i.e., first-principles) model of the process. In this paper, we pursue a different approach by developing an estimator based on an empirical process model. The objective of this research is to evaluate the applicability of multivariate regression techniques to develop a composition soft sensor for a conventional batch rectifier. This approach is potentially very profitable, because most of the disadvantages of estimators based on a physical model can be resolved using an empirical estimator. In fact, a priori knowledge about vapor–liquid equilibria behavior is not required in this latter case. Also, the estimator does not require composition initialization, and is computationally simple, which is desirable for on-line implementation.

Partial least squares (PLS) regression is a widely used multivariate regression technique, and its application to the development of composition estimators for chemical processes has gained vast interest (Kourti & MacGregor, 1995; Yin, 1998; Kourti, 2002). This projection method is used to extract the information contained in available process data, and to project it onto a low-dimensional space defined by new variables called latent variables. Several applications of PLS regression to soft sensor development have been reported for continuous distillation processes (Mejdell & Skogestad, 1991; Park & Han, 1998; Hong, Jung, & Han, 1999; Kano et al., 2000; Shin, Lee, & Park, 2000), while the potential of extending the use of this technique to batch distillation has received relatively little attention. This may be due to the fact that PLS regression was originally developed for continuous steady-state process systems, and its extension to discontinuous processes raises some difficulties. Recently, this technique has indeed been extended to the analysis, on-line monitoring and diagnosis of batch processes (Nomikos & MacGregor, 1995; Duchesne & MacGregor, 2000), and successful applications have been reported (Wold, Kettaneh-Wold, & Skagerberg, 1989; Kourti, Nomikos, & MacGregor, 1995; Zheng, McAvoy, Huang, & Chen, 2001). However, in the large majority of these cases, the use of PLS regression is limited only to the estimation of the final quality of the batch product, whereas in batch distillation the knowledge of the composition profile during the entire batch is required. Fletcher, Morris, and Martin (2002) developed a local estimation approach based on dynamic PLS regression (Kaspar & Ray, 1993) to monitor the performance of a batch fermentation process during the entire operation.

Apparently, no previous applications of PLS regression to the development of a composition estimator for batch distillation monitoring have been reported. In this paper, alternative composition soft sensors based on PLS regression are developed and evaluated for a simulated conventional batch distillation process operated at constant reflux ratio. The preliminary results reported by Zamprogna, Barolo, and Seborg (2002) are extended. In particular, a technique based on principal component analysis (PCA) is developed to pre-process the input data set, and several PLS regression approaches are considered to estimate the product composition not only at the end of the process but also during the entire duration of the batch. Several issues are addressed, such as the effect of the number of measurements used as soft sensor inputs, the effect of measurement noise, and the effect of augmenting the input data with lagged measurements. Also, a novel PLS regression approach is proposed, which is based on the development and sequential use of individual regression models for the different portions of the batch duration.

This paper is organized as follows. Section 2 provides the theoretical background for PCA and PLS regression. The process model and operating strategy are described in Section 3, while issues concerning process data generation and pre-processing are discussed in Section 4. Section 5 provides details on the soft sensor development, and evaluates the effects of different factors on the estimator's performance. An augmentation of the PLS estimator is proposed in Section 6, and the conclusions for the research are presented in Section 7.

2. Multivariate regression techniques

This section provides a review of two important multivariate regression techniques, PCA and PLS.

2.1. Principal component analysis

PCA explains the variance contained in a set of correlated process variables by projecting the data onto a low-dimensional space defined by new uncorrelated variables, called principal components (Geladi & Kowalski, 1986). The original process variables are collectively represented as an $m \times n$ matrix K, where m is the number of samples and n is the number of variables. This transformation, which consists of an orthogonal regression in the n-dimensional space of the original variables, is performed so that the observation matrix K is factored into two matrices: the score matrix T ($m \times s$) and the principal component (or loading) matrix P ($n \times s$), where s is the specified number of principal components. The data matrix is then decomposed as

$$K = T P^{T} + E, \tag{1}$$

where E is an $m \times n$ matrix of residuals that contains that part of K that is left out of the regression. The principal components, which are aligned along the s columns of the matrix P, are the eigenvectors corresponding to the s largest eigenvalues of the covariance matrix of K.

The covariance matrix of K is defined as

$$\mathrm{cov}(K) = \frac{K^{T} K}{m - 1}, \tag{2}$$

and the relationship between the covariance matrix and each loading vector $p_i$ is

$$\mathrm{cov}(K)\, p_i = \lambda_i p_i, \tag{3}$$

where $\lambda_i$ is the eigenvalue associated with the eigenvector $p_i$, and provides a measure of the amount of variance of the original data described by the score-loading vector pair $t_i$, $p_i$. Because these pairs are in descending order of $\lambda_i$, the first pair captures the largest amount of variance of any pair in the decomposition. Each subsequent pair captures the greatest possible amount of variance remaining after subtracting $t_i p_i^{T}$ from K. The total variance of the original data retained in the PCA transformation is defined as the summation of the variance expressed by the s principal components accounted for in the regression space.

Computationally, the loading vectors can be obtained sequentially using the Nonlinear Iterative PArtial Least Squares (NIPALS) algorithm (Geladi & Kowalski, 1986; Wold, Esbensen, & Geladi, 1987), which ensures that the Euclidean norm of the residual matrix E is minimized for the given number of principal components. The optimal number of principal components s can be assessed using a number of methods, with cross-validation being the most reliable and widely used (Kourti & MacGregor, 1995). The residual Q and the Hotelling's $T^2$ statistics are widely used metrics to determine how well a sample conforms to the regression model (Jackson, 1991).

2.2. Partial least squares regression

PLS is conceptually similar to PCA, except that it reduces the dimensions of two sets of data (an $m \times n_X$ input data set X and an $m \times n_Y$ output data set Y) simultaneously, finding the directions (latent variables, LVs) in the input space that are most predictive for the output space (Kourti & MacGregor, 1995). A detailed description of the PLS algorithm and its mathematical formulation are provided by Geladi and Kowalski (1986). The PLS algorithm decomposes the original X and Y matrices into two lower-dimensional score matrices T ($m_Y \times k$) and U ($m_X \times k$), which represent the projection of the original matrices X and Y onto the latent variable space, plus two residual matrices E ($m_Y \times n_Y$) and F ($m_X \times n_X$), which contain that part of Y and X that is left out of the regression:

$$Y = T P^{T} + E, \qquad X = U Q^{T} + F. \tag{4}$$

The latent variables are aligned along the k columns of the two score matrices, T ($m_Y \times k$) and U ($m_X \times k$), and are ordered in such a way that the amount of information (variance) of the original data described by each variable decreases as the number of latent variables increases. The PLS transformation is performed so that the score vectors of each ith latent variable are mutually related through an inner linear relationship:

$$u_i = b_i t_i + h_i, \tag{5}$$

where $b_i$ is a coefficient determined by minimizing the norm of the residual vector $h_i$.

The optimal number of latent variables k can be assessed using a number of methods, with cross-validation being the most reliable and widely used (Kourti & MacGregor, 1995; Jackson, 1991). The latent vectors are usually calculated iteratively using the NIPALS algorithm, or the SIMPLS algorithm. The latter ensures lower computational load and faster convergence (De Jong, 1993).

Because most practical problems are nonlinear, nonlinear PLS techniques have been developed in order to maintain the robust generalization property of the conventional (i.e., linear) PLS approach and, at the same time, represent any nonlinear relationships existing between X and Y. These techniques retain the framework of linear PLS, but use a nonlinear relationship for each pair of latent variables $u_i$ and $t_i$. This relationship can be generally represented as

$$u_i = f_i(t_i) + h_i, \tag{6}$$

where $f_i(\cdot)$ stands for a nonlinear vector function. For example, it could be a second-order polynomial regressor (polynomial PLS; Wold et al., 1989), a spline function (spline PLS; Wold, 1992), or a feedforward artificial neural network (ANN PLS; Qin & McAvoy, 1992). These linear and nonlinear techniques are generally referred to as static PLS algorithms.

When process variables are also characterized by auto-correlation in time, dynamic PLS can be used. In this approach, a relatively large number of past samples of the variables (lagged values) are included at each sampling instant in the original input data matrix X (Ricker, 1988; Lakshminarayanan, Shah, & Nandakumar, 1997), or in both the input and output matrices, X and Y (Qin & McAvoy, 1996). The augmented data matrices are then processed using conventional linear or nonlinear PLS regression algorithms.

Conventional PLS assumes that the data are given by two-dimensional matrices. However, for batch processes, data matrices are typically arranged in the form of three-dimensional arrays, where different batch runs are organized as rows, the measurement variables are in the columns, and their time evolution occupies the third dimension. Each horizontal slice through this three-dimensional array contains the trajectories of all the variables from a single batch; each vertical slice collects the values of all the variables for all the batches at the same time instant. To extend the conventional PLS approach to batch data sets, these batch data arrays have to be unfolded to create two-dimensional arrays. One possible way to rearrange the original arrays is to stack one horizontal slice after the other, as shown in Fig. 1. This unfolding procedure is particularly useful when the number of recorded samples varies from batch to batch. As an alternative, the Multiway PLS approach (Nomikos & MacGregor, 1995; Kosanovic, Dahl, & Piovoso, 1996) provides for $X_B$ to be unfolded by putting each of its vertical slices side by side to the right, starting with the one corresponding to the first time interval. The three-way array is therefore decomposed in such a way that all measurements collected over the entire duration of a
batch are aligned along one row, and each row of the new matrix X represents one batch. This unfolding procedure makes it possible to analyze the variability among the batches in X by summarizing the information carried in the original data set with respect to both variables and their time variation. However, it is possible to resort to this unfolding paradigm only if all the batches in the data set have the same time duration (i.e., they have the same number of time samples), which is not the case for batch distillation.

Fig. 1. Arrangement of the batch data $X_B$ (batches × variables × samples) in PLS (a) and in MWPLS (b).

To determine how well the original data conform to the PLS regression space, the mean squared (MSQ) error is defined as:

$$\mathrm{MSQ}_i = \frac{\sqrt{(y_i - \hat{y}_i)(y_i - \hat{y}_i)^{T}}}{m}, \tag{7}$$

where $y_i$ is the row vector of measurements referring to the generic ith output variable y, $\hat{y}_i$ is the vector of its estimates obtained from the soft sensor, and m is the total number of samples recorded.

3. The process

The process considered in this work is the separation of a zeotropic ternary mixture in a conventional batch rectifier with 20 trays, which is operated according to the constant-reflux strategy described by Luyben (1991). The column is initially operated at total reflux. When the distillate composition meets the desired quality specification, the distillate withdrawal is started, and products and slop cuts are sequentially collected from the top and segregated in separate tanks. The heaviest cut is recovered from the reboiler at the end of the batch. Details about slop-cut withdrawal and handling are given in Luyben's paper. The process objective is to recover each component of the feed at a given minimum purity level. Namely, the mole fraction of the key component in each product must be equal to, or greater than, 0.95.

A software sensor is developed to estimate the instantaneous mole fraction of the light and intermediate components in the distillate stream ($x_{D,1}$ and $x_{D,2}$, respectively), and the mole fraction of the heavy component in the reboiler ($x_{B,3}$), which are the key compositions needed for process monitoring. To this purpose, it is supposed that temperature measurements are available from the still pot and four additional trays (trays 5, 10, 15, 20, considering a bottom-top numbering scheme), as suggested by Quintero-Marmol et al. (1991).

The process is modeled by a system of differential and algebraic equations, which is extensively described in Barolo and Berto (1998). The only difference with respect to that model is that a tray holdup of 5 mol is considered in the present study. This model will be referred to as "the process" hereafter.

4. Process data generation and pre-processing

The data sets needed to develop the PLS regression model of the process were generated by repeatedly running the first-principles model of the batch column under different operating conditions.

Nineteen batch operations were simulated by varying the initial feed composition, boilup rate and reflux rate from batch to batch, as summarized in Table 1. This somewhat mimics the actual situation encountered by a batch distillation process, whose feed and operating conditions may widely change from batch to batch. For each operation, the trajectories of all process variables were monitored throughout the entire duration of the batch, and recorded using a sampling period of 36 s. For each ith batch, the recorded temperature measurements $T_B$, $T_5$, $T_{10}$, $T_{15}$, $T_{20}$ were arranged as column vectors and used to compose the input data matrix $X_i$. Similarly, the measurements of the mole fraction of the light and intermediate components in the distillate stream, and the heavy component in the bottoms were used to assemble the output matrix $Y_i$. Because the simulated batches have different durations, matrices $X_i$ and $Y_i$ have different numbers of samples.

The simulated data sets were divided into two groups: data from the first 11 batches were used to compute the parameters of the multivariate regression model (calibration sets); the remaining eight batches were used to test the accuracy of the PLS regression (validation sets).

Table 1
Characterization of the simulated batch runs included in the data set

Batch run   x_F              V (mol/h)   D (mol/h)   t_F (h)
Calibration data
T01         0.25/0.60/0.15   105         52.50       3.96
T02         0.33/0.33/0.34   100         47.84       3.28
T03         0.40/0.20/0.40   130         68.06       2.42
T04         0.10/0.80/0.10    85         44.73       5.81
T05         0.10/0.20/0.70    90         42.65       2.27
T06         0.35/0.15/0.50    95         50.26       2.55
T07         0.20/0.45/0.35   105         49.52       3.33
T08         0.50/0.40/0.10   100         53.19       3.88
T09         0.60/0.25/0.15   115         53.99       3.80
T10         0.55/0.15/0.30   110         51.16       3.66
T11         0.10/0.05/0.85    80         42.78       1.13
Validation data
V01         0.45/0.40/0.15   110         58.82       3.38
V02         0.20/0.30/0.50    95         51.07       2.85
V03         0.33/0.50/0.17    75         40.54       5.00
V04         0.75/0.10/0.15    85         40.47       5.01
V05         0.08/0.40/0.52   100         50.00       3.32
V06         0.20/0.10/0.70   120         56.33       1.37
V07         0.45/0.05/0.50    80         37.38       4.14
V08         0.65/0.10/0.25   105         56.45       3.66

Note: For each batch run, the values of feed composition x_F, boilup rate V, distillate rate D, and final batch time t_F are specified.

PLS regression is data dependent. As a consequence, analyzing and pre-processing the data used for PLS model calibration and validation is of paramount importance and requires particular care. First, data from abnormal operations need to be detected and removed from the database, as they generally make the process identification more difficult, and result in a regression model which is not representative of the ordinary process behavior for the considered operating region (Kano et al., 2000). Second, a suitable scaling procedure for the available data set needs to be adopted, as proper data normalization can favor the determination of the most representative PLS multiplane (Kourti & MacGregor, 1995). These two issues will be addressed in the following subsection.

4.1. Data analysis and pre-processing

The data sets selected for calibration and validation were investigated in order to verify that no anomalous batches were included in the representative databases. To this purpose, a method was developed to detect abnormal batch runs by exploiting the properties of PCA. According to this approach, each simulated operation can be characterized by using a feature vector composed of the mole fraction of the light and intermediate component in the feed ($x_{F,1}$, $x_{F,2}$), the boilup rate V, the reflux rate R, and the total duration of the batch $t_F$. The feature vectors corresponding to the batch operations used for calibration and validation were collected into two matrices ($K_{fc}$ and $K_{fv}$, respectively), in such a way that the ith feature vector of a data set is aligned along the ith row of the corresponding matrix. The dimensions of the feature matrices $K_{fc}$ and $K_{fv}$ are therefore 11 × 5 and 8 × 5, respectively.

PCA was then performed on the calibration feature matrix $K_{fc}$, in order to define a lower-dimensional space that represents the original informative content of this matrix.

A subspace of two principal components was used to project the original feature matrix $K_{fc}$, as this guarantees that more than 80% of the original data variance is explained. The value of the residual Q and $T^2$ statistics in the principal component space was calculated for each calibration and validation set, and is reported in Fig. 2.

All the calibration sets conform fairly well to the selected principal component space, as the 95% statistical confidence limits are never violated. Conversely, the plot of score residuals Q calculated for the validation data reveals that batch runs 4, 7 and 8 in the validation sets are anomalous, since the value of their residuals exceeds the 95% confidence limit calculated for the calibration data. Close inspection of these data sets revealed that they were indeed characterized by an unusual development of the operation, as the bottom of the column empties before the composition in the still
pot reaches the desired specification. These data sets were therefore excluded from the database.

Fig. 2. Value of residuals Q and Hotelling's $T^2$ statistic and their 95% confidence limit for (a) the calibration sets, and (b) the validation sets. [Panels: residuals Q with limit $Q_{95} = 0.2$ and Hotelling's $T^2$ with limit $T^2_{95} = 9.5$, plotted against set number.]

The input and output matrices corresponding to the calibration sets were then arranged according to the procedure schematically represented in Fig. 1, obtaining the comprehensive input and output calibration data matrices $X_c$ and $Y_c$. Similarly, the input and output matrices (corresponding to the five validation batches remaining after purging the original database of the anomalous sets) were used to build the validation data matrices $X_v$ and $Y_v$.

Calibration and validation data were scaled to zero mean and unit variance. This normalization procedure was shown to yield soft sensors with better estimation performance than alternative scaling methods (Zamprogna, 2002).

5. PLS soft sensors

Linear, polynomial, spline and ANN PLS transformations were carried out on the calibration data sets $X_c$ and $Y_c$ to determine a regression model that relates the characteristics of the recorded temperature profiles to the changes in the dominant component mole fraction in each product. Three latent variables were retained in the regression models, this optimal number having been determined using cross-validation.

Table 2 shows the total percentage of variance of the original calibration data captured by the obtained regression models. The values of the estimation error index MSQ calculated for the validation data are also reported for each soft sensor. As can be seen, the total percentage of explained variance of $X_c$ and $Y_c$ is not strongly affected by the adopted regression method, since its value is roughly the same for all the soft sensors. On the contrary, the values of the prediction error index MSQ calculated with respect to the validation data show that the choice of PLS regression model affects the estimation accuracy of the resulting soft sensor more markedly. In particular, the estimation performance of the linear and polynomial PLS soft sensors can be considered equally good. Conversely, the accuracy of estimation is poorer when spline or ANN PLS regression is adopted. The value of MSQ calculated for these latter soft sensors is remarkably higher for $x_{D,1}$ and $x_{D,2}$. The deterioration of the estimation accuracy for $x_{B,3}$ is instead moderate. On the whole, the best overall estimation performance is provided by the linear PLS soft sensor, as it yields the minimum value of total MSQ.

Fig. 3 reports the comparison between the actual value of the product composition profiles and their estimates provided by a linear PLS soft sensor. All the validation data are represented in this figure. It is clear that a composition estimator can indeed be developed using a PLS regression model of the process: the profiles of the estimated product composition provided by the soft sensor match the actual composition profiles quite accurately. Note that the accuracy of bottom composition estimation needs to be good only by the end of each batch, since the bottom product is not withdrawn continuously.

Fig. 3. Comparison between the actual value of the product compositions and their estimates provided by a linear PLS soft sensor (validation data). [Panels: $x_{D,1}$, $x_{D,2}$ and $x_{B,3}$ versus time sample; curves: actual and estimated.]

It is useful to remark that, unlike in structured estimators, no information on the system thermodynamics needs to be provided to the PLS
Table 2 Total percent of variance and validation MSQ estimation error calculated for linear and nonlinear PLS soft sensors (variance refers to calibration data, MSQ to validation data)

| PLS regression method | Variance explained, Xc (%) | Variance explained, Yc (%) | MSQ × 10^3, xD,1 | MSQ × 10^3, xD,2 | MSQ × 10^3, xB,3 | MSQ × 10^3, total |
|---|---|---|---|---|---|---|
| Linear | 98.05 | 88.37 | 2.17 | 2.84 | 2.06 | 7.07 |
| Polynomial | 98.05 | 87.66 | 2.49 | 2.84 | 2.03 | 7.37 |
| Spline | 98.03 | 88.79 | 3.14 | 3.64 | 2.39 | 9.19 |
| ANN | 98.05 | 87.18 | 3.05 | 3.52 | 2.35 | 8.92 |

estimator (although this information is needed to generate the calibration and validation databases, if experimental data are not available). Moreover, no initialization of the composition profiles is required, and the composition estimates are obtained almost instantaneously, as the computational load required by the PLS soft sensor is very low.

5.1. Effect of measurement noise

Normally distributed noise of zero mean and standard deviation σ was added to the process inputs for both the calibration and validation data sets in order to determine whether the presence of measurement noise can undermine the estimation accuracy of the PLS estimators. In particular, the effect of low-level noise (σ = 0.1 °C) and high-level noise (σ = 1.0 °C) in the process data is evaluated.

As shown in Table 3, the cumulative percentage of explained data variance decreases slightly when noisy input data are used to develop the linear and nonlinear estimators, with respect to the nominal case in which noise-free input data were considered. In particular, this reduction is larger as the noise level increases. However, the presence of measurement noise does not appreciably affect the overall estimation accuracy of the PLS models. In fact, Fig. 4 reveals that the MSQ error remains nearly constant as the noise level varies.

Because of the inherent features of PLS regression, the first few latent variables capture most of the valuable information contained in the process data, while the random noise is typically associated with the higher-order latent variables. Therefore, measurement noise is usually eliminated when the original data are projected onto the lower-dimensional PLS space.

5.2. Effect of the number of temperature inputs

As mentioned in Section 3, five temperatures, evenly distributed along the column, were used as soft sensor inputs, according to the indications provided by Quintero-Marmol et al. (1991). In order to investigate the effect of using a different number of temperature inputs (nX), linear and nonlinear PLS soft sensors (with

Table 3 Total percent of variance captured by linear and nonlinear three-dimensional PLS soft sensors when model inputs are affected by normally distributed noise with zero mean and standard deviation σ (calibration data)

| PLS regression method | σ = 0 °C, Xc block | σ = 0 °C, Yc block | σ = 0.1 °C, Xc block | σ = 0.1 °C, Yc block | σ = 1.0 °C, Xc block | σ = 1.0 °C, Yc block |
|---|---|---|---|---|---|---|
| Linear | 98.05 | 88.37 | 98.03 | 88.35 | 97.87 | 87.78 |
| Polynomial | 98.05 | 87.66 | 98.03 | 87.65 | 97.87 | 87.06 |
| Spline | 98.03 | 88.79 | 98.01 | 88.73 | 97.88 | 88.30 |
| ANN | 98.05 | 87.18 | 98.03 | 87.21 | 97.57 | 86.60 |
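The percentages in Table 3 are cumulative variances captured in the X and Y blocks by a PLS model with three latent variables. As an illustration only, the following sketch shows how such figures could be computed for a linear PLS model with the classical NIPALS iteration, and how zero-mean Gaussian noise of standard deviation sigma can be added to the inputs. The function name `pls_nipals`, the synthetic data and all numerical settings are assumptions made for this example; they are not the authors' code or data.

```python
import numpy as np

def pls_nipals(X, Y, n_lv, tol=1e-10, max_iter=500):
    """Linear PLS (NIPALS) on mean-centred copies of X and Y.

    Returns the cumulative percentage of variance captured in the
    X block and in the Y block after n_lv latent variables.
    """
    E = X - X.mean(axis=0)                      # X residual matrix
    F = Y - Y.mean(axis=0)                      # Y residual matrix
    ss_x, ss_y = (E ** 2).sum(), (F ** 2).sum()
    for _ in range(n_lv):
        # Start from the Y column with the largest residual sum of squares.
        u = F[:, [np.argmax((F ** 2).sum(axis=0))]]
        for _ in range(max_iter):
            w = E.T @ u
            w /= np.linalg.norm(w)              # X weights
            t = E @ w                           # X scores
            q = F.T @ t
            q /= np.linalg.norm(q)              # Y loadings (unit length)
            u_new = F @ q                       # Y scores
            converged = np.linalg.norm(u_new - u) < tol * np.linalg.norm(u_new)
            u = u_new
            if converged:
                break
        p = E.T @ t / (t.T @ t)                 # X loadings
        b = (u.T @ t) / (t.T @ t)               # inner (score-to-score) coefficient
        E = E - t @ p.T                         # deflate X
        F = F - (b * t) @ q.T                   # deflate Y
    return 100 * (1 - (E ** 2).sum() / ss_x), 100 * (1 - (F ** 2).sum() / ss_y)

# Synthetic stand-in data (hypothetical): three hidden factors drive
# five "temperatures" (X) and three "compositions" (Y).
rng = np.random.default_rng(0)
n = 400
scores = rng.normal(size=(n, 3))
X = scores @ rng.normal(size=(3, 5))
Y = scores @ rng.normal(size=(3, 3)) + 0.3 * rng.normal(size=(n, 3))

captured = {}
for sigma in (0.0, 0.1, 1.0):                   # noise standard deviations
    X_noisy = X + sigma * rng.normal(size=X.shape)
    captured[sigma] = pls_nipals(X_noisy, Y, n_lv=3)
    print(f"sigma = {sigma}: X {captured[sigma][0]:.2f} %, Y {captured[sigma][1]:.2f} %")
```

In this toy setting the noise-free inputs have exactly three underlying factors, so three latent variables reproduce the X block completely; with noisy inputs part of the X variance is left in the discarded higher-order directions, which is the mechanism the text attributes to PLS noise rejection.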

Fig. 4. Estimation error MSQ for linear, polynomial, spline and ANN PLS soft sensors when input data corrupted by noise of different level are used (validation data). [Four bar-chart panels (Linear PLS, Polynomial PLS, Spline PLS, ANN PLS); vertical axis: MSQ × 10^3; bars for xD,1, xD,2, xB,3 and Total; legend: noise free, σ = 0.1 °C, σ = 1.0 °C.]

three latent variables) were developed assuming that measurements from all 20 trays of the column and the reboiler (nX = 21) were available. Alternatively, linear and nonlinear PLS estimators were obtained considering three temperature measurements as model inputs (TB, T10 and T20; nX = 3). Because the number of original input temperatures and the number of latent variables retained by the regression models are the same in the latter case, the total percentage of variance captured for the Xc matrix is 100% when nX = 3.

As can be seen by comparing Table 4 to Table 2, all the estimators capture approximately the same amount of information of the original calibration input data Xc that was explained by the corresponding model using five temperature measurements. However, the percentage of represented variance for the output data Yc is lower when 21 temperatures are used, and this indicates that a larger portion of the information about the composition dynamics is lost in this case. When three temperature inputs are considered instead, the amount of information described for Yc is approximately the same as that obtained for the nominal case.

Table 4 Total percent of variance captured by linear and nonlinear three-dimensional PLS soft sensors when different numbers of temperature measurements (nX) are used as model inputs (calibration data)

| PLS regression method | nX = 3, Xc block | nX = 3, Yc block | nX = 21, Xc block | nX = 21, Yc block |
|---|---|---|---|---|
| Linear | 100.00 | 87.59 | 98.09 | 83.39 |
| Polynomial | 100.00 | 88.24 | 98.08 | 82.10 |
| Spline | 100.00 | 87.67 | 98.03 | 83.98 |
| ANN | 100.00 | 88.81 | 98.08 | 82.45 |

Fig. 5, which compares the validation MSQ error obtained for the soft sensors using different numbers of temperature inputs, provides further proof of the reduced capability of the PLS models using 21 measurement inputs to describe the actual dynamics of the process with respect to the nominal case (nX = 5). It was also observed that the computational load required for the calculation of the optimal regression parameters increases, because of the larger number of input variables incorporated in the data matrices. Consequently, the regression procedure gets lengthier, particularly so for the spline PLS and ANN PLS. Conversely, the reduction of the number of temperatures in the input matrix does not deteriorate the estimation accuracy of the model.

These results confirm that the number and location of the temperature inputs play an important role in the estimator accuracy, to the point that an inappropriate choice of the sensor inputs may reduce the estimation accuracy (Kano et al., 2000). Therefore, further investigation is needed in order to determine the optimal subset of temperature measurements to be fed to the estimator. A study addressing this issue is under development, and results will be reported elsewhere.

5.3. Dynamic PLS regression

The PLS estimators considered so far are static in nature; dynamics is obtained by simply placing static estimations side by side. This may seem somewhat conflicting with the fact that batch distillation is an inherently dynamic process. For this reason, the possibility of developing an intrinsically dynamic estimator was explored.

Fig. 5. Estimation error MSQ for linear, polynomial, spline and ANN PLS soft sensors when a different number of temperature inputs are used (validation data). [Four bar-chart panels (Linear PLS, Polynomial PLS, Spline PLS, ANN PLS); vertical axis: MSQ × 10^3; bars for xD,1, xD,2, xB,3 and Total; legend: nX = 3, nX = 5, nX = 21.]

Table 5 Total percent of variance captured by linear and nonlinear three-dimensional PLS soft sensors when L lagged inputs are considered in the data matrices (calibration data)

| PLS regression method | L = 0, Xc block | L = 0, Yc block | L = 2, Xc block | L = 2, Yc block | L = 4, Xc block | L = 4, Yc block | L = 6, Xc block | L = 6, Yc block |
|---|---|---|---|---|---|---|---|---|
| Linear | 98.05 | 88.37 | 97.51 | 88.20 | 97.07 | 87.93 | 96.64 | 87.64 |
| Polynomial | 98.05 | 87.66 | 97.51 | 87.50 | 97.07 | 87.25 | 96.64 | 86.98 |
| Spline | 98.03 | 88.79 | 97.49 | 88.42 | 97.05 | 88.01 | 96.63 | 87.68 |
| ANN | 98.05 | 87.18 | 97.51 | 78.04 | 97.07 | 86.70 | 96.64 | 86.30 |

Fig. 6. Estimation error MSQ for linear, polynomial, spline and ANN Dynamic PLS soft sensors obtained considering input matrices with different number of lagged inputs (validation data). [Four bar-chart panels (Linear PLS, Polynomial PLS, Spline PLS, ANN PLS); vertical axis: MSQ × 10^3; bars for xD,1, xD,2, xB,3 and Total; legend: Lags = 0, 2, 4, 6.]
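The dynamic PLS variants compared in Table 5 and Fig. 6 work on input matrices augmented with lagged temperature samples. A minimal sketch of one way to build such a matrix is given below. The helper name `add_lags` is hypothetical, and dropping the first L rows (whose history is incomplete) is an assumption; the paper does not state how the initial samples of each batch were treated.

```python
import numpy as np

def add_lags(X, n_lags):
    """Append n_lags past samples of every input to each row of X.

    Row k of the result is [x(k), x(k-1), ..., x(k-n_lags)].
    The first n_lags rows are dropped because their history is incomplete
    (an assumption of this sketch).
    """
    n_samples = X.shape[0]
    if not 0 <= n_lags < n_samples:
        raise ValueError("n_lags must satisfy 0 <= n_lags < number of samples")
    # Block `lag` holds the samples delayed by `lag` steps, aligned row by row.
    blocks = [X[n_lags - lag : n_samples - lag] for lag in range(n_lags + 1)]
    return np.hstack(blocks)

# Tiny example: 6 time samples of 2 inputs, augmented with 2 lags.
X_demo = np.arange(12.0).reshape(6, 2)
X_aug = add_lags(X_demo, 2)
print(X_aug.shape)    # (4, 6)
print(X_aug[0])       # [4. 5. 2. 3. 0. 1.]
```

Under this construction, five temperature inputs with L = 6 lags give 5 × 7 = 35 columns, which illustrates why the regression becomes heavier as lags are added. When the data come from several batches, the lags would have to be built batch by batch so that no row mixes samples from two different runs, and the output matrix must be trimmed by the same L rows.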

The extension of the conventional PLS algorithm for dynamic modeling was achieved through the augmentation of the original input data matrices Xc and Xv with lagged values. The augmented matrix Xc was subsequently processed using conventional linear and nonlinear PLS regression paradigms, and the performance of the obtained multivariate models was tested using the augmented matrix Xv.

As shown in Table 5, the number of lagged temperature samples included in the input data set affects the cumulative percentage of variance explained by the PLS models. In fact, the total variance captured decreases slightly as the number of input lags increases, for all the regression models.

However, the augmentation of the input matrices does not affect the estimation performance of the resulting soft sensors significantly, as can be inferred from Fig. 6. The value of MSQ in fact does not change substantially for different numbers of lags included in the original data.

A major drawback faced when performing dynamic PLS is that the addition of a relatively large number of lagged values in the input data matrix causes a substantial increase in the dimension of this matrix. This implies that the computational load required for the development of the multivariate regression becomes considerably heavier, and the regression procedure gets much lengthier. This is particularly true for spline PLS and ANN PLS, which are based on computationally more intensive algorithms. In addition, the resulting model is more cumbersome to manipulate, due to the presence of a higher number of parameters.

6. Multiple PLS soft sensors

As shown in Fig. 3, a linear PLS soft sensor is able to provide a fairly accurate estimation performance. However, some mismatch can be observed between the actual and estimated composition profiles, particularly so for xD,1 and xD,2. In order to improve the estimator performance, a simple augmentation of the conventional PLS methods was considered that takes into account the peculiar characteristics of the batch distillation process. Namely, the proposed approach consists of subdividing the data recorded for each batch into subsets, each of which corresponds to a particular operating period (i.e., startup; main cut 1 withdrawal; slop cut 1 withdrawal; ...). For each period, separate models using linear and nonlinear PLS regression algorithms can be developed. The PLS models of the same type (linear, polynomial, spline or ANN) obtained for each period are then used sequentially to estimate the whole composition profile. The composition estimators obtained through this regression procedure are referred to as multiple PLS (MPLS) soft sensors.

This modeling strategy is motivated by the fact that each batch goes through a series of phases with substantially different characteristics. In fact, because of the operating procedure adopted, the process dynamic regime moves from a condition of total reflux to a condition of constant reflux. At the same time, the process experiences large excursions in the tray compositions, due to the sequential movement of the light and intermediate components from the bottom to the top of the column. These changes in the column dynamic regime and composition distribution are reflected in the variability of the temperature profile, which is used in the PLS regressors to reconstruct the entire composition profiles. In contrast to conventional PLS, the MPLS method can handle this complexity and variety of information about the different process phases directly, and this fact can potentially improve the model accuracy.

Fig. 7. Comparison between the actual value of the product compositions and their estimates provided by the linear MPLS soft sensor. [Three stacked panels (xD,1, xD,2, xB,3; vertical axis 0.00 to 1.00) versus time sample (0 to 1600); legend: Actual, Linear MPLS.]

In order to develop the MPLS models, the original data for each batch run were partitioned into five sections, corresponding to the total reflux phase and to the withdrawal of the products and slop cuts obtained from the top of the column. Linear and nonlinear MPLS models were developed.

Table 6 reports the overall value of MSQ calculated for the linear and nonlinear MPLS regression models when validation data are considered. The spline PLS regression results in a very large MSQ. It was already mentioned that the model parameters are generally more difficult to tune for a spline PLS soft sensor because of the inherent complexity of its regression algorithm (Wold, 1992). Because the MPLS method requires the tuning and sequential use of several single spline PLS models, the use of spline regression within this augmented approach might not be profitable, as evidenced by the results obtained.

When the other MPLS soft sensors are considered, the value of the estimation error obtained for xB,3 is in general slightly higher than that obtained using the corresponding conventional PLS soft sensors (see Table 2). However, this increase can be considered nearly negligible. Conversely, it can be observed that the value of MSQ calculated for xD,1 and xD,2 is markedly lower for the soft sensors based on the MPLS regression approach.

This is particularly so for the linear MPLS soft sensor, which results in the lowest total MSQ error and therefore provides the most accurate estimation performance. As shown in Fig. 7, this soft sensor is capable of describing the composition dynamics very accurately, and its estimation accuracy is clearly superior to that achieved by the conventional PLS approach.
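The MPLS idea described in this section (one PLS model per operating period, used sequentially) can be sketched as follows for the linear case. This is an illustration under stated assumptions, not the authors' implementation: the function names (`fit_pls`, `predict_pls`, `fit_mpls`, `predict_mpls`) are hypothetical, the PLS weights are obtained from a singular value decomposition of the cross-product matrix rather than by NIPALS iteration, the operating period of every sample is assumed to be known through an integer label, MSQ is taken here as a plain mean squared error, and the data are synthetic.

```python
import numpy as np

def fit_pls(X, Y, n_lv):
    """Linear PLS fit; returns (x_mean, y_mean, B) with Y ~ (X - x_mean) @ B + y_mean."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    E, F = X - x_mean, Y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_lv):
        # Weight vector: dominant left singular vector of the cross-product E'F.
        w = np.linalg.svd(E.T @ F, full_matrices=False)[0][:, 0]
        t = E @ w                       # scores
        p = E.T @ t / (t @ t)           # X loadings
        q = F.T @ t / (t @ t)           # Y loadings
        E = E - np.outer(t, p)          # deflate both blocks
        F = F - np.outer(t, q)
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = (np.column_stack(v) for v in (W, P, Q))
    B = W @ np.linalg.solve(P.T @ W, Q.T)   # regression matrix in original X space
    return x_mean, y_mean, B

def predict_pls(model, X):
    x_mean, y_mean, B = model
    return (X - x_mean) @ B + y_mean

def fit_mpls(X, Y, phase, n_lv):
    """One PLS model per operating period; `phase` holds an integer label per sample."""
    return {int(ph): fit_pls(X[phase == ph], Y[phase == ph], n_lv)
            for ph in np.unique(phase)}

def predict_mpls(models, X, phase):
    """Apply each period model to the samples carrying its label."""
    n_out = next(iter(models.values()))[2].shape[1]
    Y_hat = np.full((X.shape[0], n_out), np.nan)   # NaN marks samples with unknown label
    for ph, model in models.items():
        rows = phase == ph
        Y_hat[rows] = predict_pls(model, X[rows])
    return Y_hat

# Synthetic data (hypothetical): five periods, each with its own linear
# input-output relation, five inputs and three outputs.
rng = np.random.default_rng(1)
n_phase, n_per_phase = 5, 200
A = rng.normal(size=(3, 5))
C = rng.normal(size=(n_phase, 3, 3))
offsets = rng.normal(size=(n_phase, 3))

def make_batch():
    phase = np.repeat(np.arange(n_phase), n_per_phase)
    s = rng.normal(size=(phase.size, 3))
    X = s @ A + 0.01 * rng.normal(size=(phase.size, 5))
    Y = np.einsum("ij,ijk->ik", s, C[phase]) + offsets[phase]
    return X, Y, phase

Xc, Yc, phase_c = make_batch()          # calibration set
Xv, Yv, phase_v = make_batch()          # validation set
single = fit_pls(Xc, Yc, n_lv=3)
multi = fit_mpls(Xc, Yc, phase_c, n_lv=3)
msq_single = np.mean((Yv - predict_pls(single, Xv)) ** 2)
msq_multi = np.mean((Yv - predict_mpls(multi, Xv, phase_v)) ** 2)
print(f"single PLS MSQ: {msq_single:.4f}, multiple PLS MSQ: {msq_multi:.4f}")
```

The sketch sidesteps the practical difficulty of deciding on-line when to switch from one model to the next, because the period label is supplied with the data. In a real application that label would have to come from the operating procedure or from some other switching logic.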

Table 6 MSQ error calculated for the linear and nonlinear MPLS soft sensors (validation data)

| PLS regression method | MSQ × 10^3, xD,1 | MSQ × 10^3, xD,2 | MSQ × 10^3, xB,3 | MSQ × 10^3, total |
|---|---|---|---|---|
| Linear | 1.31 | 1.60 | 2.09 | 5.00 |
| Polynomial | 1.71 | 1.87 | 2.83 | 6.41 |
| Spline | 6.17 | 21.86 | 12.19 | 40.22 |
| ANN | 1.84 | 1.88 | 2.48 | 6.20 |

7. Concluding remarks

In this study, a PLS-based soft sensor was developed for a simulated batch distillation process in order to estimate the composition of the distillate stream and of the bottom product using secondary process information provided by temperature measurements. PCA was used to analyze the available process data and identify the anomalous operations to be excluded from the database. Several soft sensors were developed using linear and nonlinear PLS regression, and their estimation performances were compared. Various issues were addressed, such as the effect of the number of measurement inputs, the effect of noise in the model input variables, and the effect of the augmentation of the original process data with lagged input measurements.

With respect to the characterization of the variables used as sensor inputs, it was shown that an effective composition estimation can be achieved even when not all of the available temperature measurements are used as input data to calibrate the PLS regression models; on the contrary, the reduction of the number of temperatures in the input matrix does not necessarily deteriorate the estimation accuracy of the model. These results confirm the importance of proper data selection in the development of regression soft sensors, and motivate further investigation in order to determine the optimal number and location of the temperature measurements to be used as soft sensor inputs.

The estimators' performance is not undermined by the presence of measurement noise, as the inherent properties of PLS regression make it possible to segregate and eliminate data disturbances. No remarkable improvement of the estimation accuracy was observed when employing dynamic PLS regression. On the contrary, the resulting soft sensors are generally more complex and difficult to calibrate.

The estimator performance improves significantly when using the proposed regression approach, multiple PLS regression, particularly so as far as the estimation of the distillate composition is concerned. Finally, the computing power required by the estimators is generally very low, which makes them attractive for on-line use.

For the practical implementation of these soft sensors, however, some issues should be considered. For example, since the sensors rely on multiple (temperature) measurements, they are vulnerable to sensor malfunctions, and therefore care should be taken in identifying any sensor faults. On the other hand, measurement noise should not be an issue, as was discussed. Switching between models in the multiple PLS approach may not be easy to achieve in practice, because a bad composition estimation may drive the soft sensor to switch from one model to another at the wrong time, with the risk of further worsening the composition estimation. An alternative approach could be to develop a single model for each composition to be estimated; thus, a multiple-model estimator would eventually be obtained, but with no need for switching between models. This approach should prove convenient also when a larger number of composition profiles need to be estimated (i.e., in the case of feeds with more than three components).

Acknowledgements

This research was carried out in the framework of the MIUR-PRIN 2002 project "Operability and controllability of middle-vessel distillation columns" (ref. no. 2002095147 002).

References

Baratti, R., Bertucco, A., Da Rold, A., & Morbidelli, M. (1995). Development of a composition estimator for binary distillation columns. Application to a pilot plant. Chemical Engineering Science, 50, 1541–1550.
Barolo, M., & Berto, F. (1998). Composition control in batch distillation: Binary and multicomponent mixtures. Industrial and Engineering Chemistry Research, 37, 4689–4698.
Barolo, M., Pistillo, A., & Trotta, A. (2000). Issues in the development of a composition estimator for a middle-vessel batch column. In E. F. Camacho, L. Basañez, J. A. de la Puente (Eds.), Advanced control of chemical processes 2000–IFAC ADCHEM 2000 (pp. 923–928). Oxford, UK: Elsevier.
Chien, I., & Ogunnaike, B. A. (1997). Modeling and control of a temperature-based high-purity distillation column. Chemical Engineering Communications, 158, 71–105.
De Jong, S. (1993). SIMPLS: An alternative approach to partial least squares regression. Chemometrics and Intelligent Laboratory Systems, 18, 251–263.
Duchesne, C., & MacGregor, J. F. (2000). Multivariate analysis and optimization of process variable trajectory for batch processes. Chemometrics and Intelligent Laboratory Systems, 51, 125–137.
Fletcher, N. M., Morris, A. J., & Martin, E. B. (2002). Local linear and nonlinear multi-way partial least squares batch modelling. Edited preprints of b'02–15th IFAC World Conference, Barcelona, Spain, July 21–26.
Geladi, P., & Kowalski, B. R. (1986). Partial least-squares regression: A tutorial. Analytica Chimica Acta, 185, 1–17.
Han, M., & Park, S. (2001). Profile position control of batch distillation based on a nonlinear wave model. Industrial and Engineering Chemistry Research, 40, 4111–4120.
Hong, S. J., Jung, J. H., & Han, C. (1999). A design methodology of a soft sensor based on local models. Computers and Chemical Engineering, 23, S351–S354.
Jackson, J. E. (1991). A user's guide to principal components. New York, USA: Wiley.
Joseph, B., & Brosilow, C. B. (1978). Inferential control of processes. Part I: Steady state analysis and design. AIChE Journal, 24, 485–492.
Kano, M., Miyazaki, K., Hasebe, S., & Hashimoto, I. (2000). Inferential control system of distillation compositions using dynamic partial least squares regression. Journal of Process Control, 10, 157–166.
Kaspar, M. H., & Ray, W. H. (1993). Dynamic PLS modeling for process control. Chemical Engineering Science, 48, 3447–3461.
Kosanovic, K. A., Dahl, K. S., & Piovoso, M. J. (1996). Improved process understanding using multiway principal component analysis. Industrial and Engineering Chemistry Research, 35, 138–146.
Kourti, T. (2002). Process analysis and abnormal situation detection: From theory to practice. IEEE Control Systems Magazine, 22(5), 12–25.
Kourti, T., & MacGregor, J. F. (1995). Tutorial: Process analysis, monitoring and diagnosis, using multivariate regression methods. Chemometrics and Intelligent Laboratory Systems, 28, 3–21.
Kourti, T., Nomikos, P., & MacGregor, J. F. (1995). Analysis, monitoring and fault diagnosis of batch processes using multiblock and multiway PLS. Journal of Process Control, 4, 277–284.
Lakshminarayanan, S., Shah, S. L., & Nandakumur, K. (1997). Modeling and control of multivariable processes: Dynamic PLS approach. AIChE Journal, 43, 2307–2322.
Lang, L., & Gilles, E. D. (1990). Nonlinear observers for distillation columns. Computers and Chemical Engineering, 14, 1297–1301.
Leegwater, H. (1992). Industrial experience with double quality control. In W. L. Luyben (Ed.), Practical distillation control. New York, USA: Van Nostrand Reinhold.
Luyben, W. L. (1991). Multicomponent batch distillation. 1. Ternary systems with slop recycle. Industrial and Chemical Engineering Research, 27, 642–657.
Mejdell, T., & Skogestad, S. (1991). Estimation of distillation compositions from multiple temperature measurements using partial-least-squares regression. Industrial and Engineering Chemistry Research, 30, 2543–2555.
Nomikos, P., & MacGregor, J. F. (1995). Multi-way partial least squares in monitoring batch processes. Chemometrics and Intelligent Laboratory Systems, 30, 97–108.
Oisiovici, R. M., & Cruz, S. L. (2000). State estimation of batch distillation columns using an extended Kalman filter. Chemical Engineering Science, 55, 4667–4680.
Oisiovici, R. M., & Cruz, S. L. (2001). Inferential control of high-purity multicomponent batch distillation columns using an extended Kalman filter. Industrial and Engineering Chemistry Research, 40, 2628–2639.
Park, S., & Han, C. (1998). A nonlinear soft sensor based on multivariate smoothing procedure for quality estimations in distillation columns. Computers and Chemical Engineering, 24, 871–877.
Qin, S. J., & McAvoy, T. J. (1992). Non-linear PLS modelling using artificial neural networks. Computers and Chemical Engineering, 16, 379–391.
Qin, S. J., & McAvoy, T. J. (1996). Nonlinear FIR modeling via a neural net PLS approach. Computers and Chemical Engineering, 20, 147–159.
Quintero-Marmol, E., & Luyben, W. L. (1992). Inferential model-based control of multicomponent batch distillation. Chemical Engineering Science, 47, 887–898.
Quintero-Marmol, E., Luyben, W. L., & Georgakis, C. (1991). Application of an extended Luenberger observer to the control of multicomponent batch distillation. Industrial and Engineering Chemistry Research, 30, 1870–1880.
Ricker, N. L. (1988). The use of biased least squares estimators for parameters in discrete-time pulse response model. Industrial and Engineering Chemistry Research, 27, 343–350.
Shin, J., Lee, M., & Park, S. (2000). Design of a composition estimator for inferential control of distillation columns. Chemical Engineering Communications, 178, 221–248.
Venkateswarlu, C., & Avantika, S. (2001). Optimal state estimation of multicomponent batch distillation. Chemical Engineering Science, 56, 5771–5786.
Wold, S. (1992). Nonlinear partial least squares modelling. II. Spline inner relation. Chemometrics and Intelligent Laboratory Systems, 14, 71–84.
Wold, S., Esbensen, K., & Geladi, P. (1987). Principal component analysis. Chemometrics and Intelligent Laboratory Systems, 2, 37–52.
Wold, S., Kettaneh-Wold, N., & Skagerberg, B. (1989). Non-linear PLS modelling. Chemometrics and Intelligent Laboratory Systems, 7, 53–65.
Yin, K. K. (1998). Multivariate statistical methods for fault detection and diagnosis in chemical process industries: A survey. Trends in chemical engineering, 4, 233–241.
Yu, C. C., & Luyben, W. L. (1988). Control of multicomponent distillation columns using rigorous composition estimators. In Distillation and adsorption 1997, IChemE Symposium Series No. 104 (pp. A29–A69). London, UK: IChemE.
Zamprogna, E. (2002). Development of virtual sensors for batch distillation monitoring and control using multivariate regression techniques. Ph.D. dissertation, Department of Chemical Engineering, University of Padova, Italy.
Zamprogna, E., Barolo, M., & Seborg, D. E. (2002). Development of a soft sensor for a batch distillation column using linear and nonlinear PLS regression techniques. Edited preprints of b'02–15th IFAC World Conference, Barcelona, Spain, July 21–26.
Zheng, L. L., McAvoy, T. J., Huang, Y., & Chen, G. (2001). Application of multivariate statistical analysis in batch processes. Industrial and Engineering Chemistry Research, 40, 1641–1649.