Multiple Model Identification for a High Purity Distillation Column Process Based on EM Algorithm
Abstract
Due to the strong nonlinearity and transition dynamics between different operating points of the high purity distillation column process, it is difficult to use a single model for modeling such a process. Therefore, the multiple model based approach is introduced for modeling the high purity distillation column plant under the framework of the expectation maximization (EM) algorithm. In this paper, autoregressive exogenous (ARX) models are adopted to construct the local models of this chemical process at different operating points, and the EM algorithm is used for identification of local models as well as the probability that each local model takes effect. The global model is obtained by aggregating the local models using an exponential weighting function. Finally, the simulation performed on the high purity distillation column demonstrates the effectiveness of the proposed method.
1. Introduction
The increasing complexity of the process control systems poses a great challenge for process modeling and parameter identification and control. In practice, most of the industrial processes show a strong nonlinearity and a dynamic nature. Therefore, it is of interest for both academic researchers and industrial practitioners to study how to model these systems and to achieve satisfactory or even optimal control performance. Linear modeling technologies have been quite sophisticated after decades of development; however, due to the inborn nonlinearity of some production processes, the performance of single linear model based controllers or optimizers may be compromised or even unsatisfactory [1]. To overcome the limitations imposed by the nonlinearity of the process, various strategies have been developed for nonlinear process modeling. Some of the popular modeling approaches have been proposed and applied in industrial processes, such as artificial neural networks [2, 3], support vector regression and its extension least squares support vector regression [4], Gaussian process regression [5], and hybrid methods [6]. The obvious feature of these methods is that they do not need any prior knowledge about the system. However, these methods do not always capture the process dynamics well and may lose validity when the system transits among a large range of operation.
The multiple model approach is a good compromise which produces complex models or controllers by piecing a number of simpler subsystems together. Some multiple modeling strategies and applications have been investigated because they are able to represent complex nonlinear processes using the linearization technique [7]. The terminology of linear parameter varying (LPV) modeling method, which is featured by its linear structure and varying model parameters, was introduced by Shamma and Athans [8, 9]. The work on identification of multimodel LPV models can be found in much literature. For example, Zhu and Xu proposed an LPV model based on blended linear models and added weights at the input side [10, 11]. The identification of LPV models in an input-output setting with Box-Jenkins (BJ) model structure was addressed by Zhao et al. and prediction error method was employed for parameter interpolation [12]. Huang et al. studied the work of the multiple LPV model identification using several weighting functions with two scheduling variables [13]. The multiple model based recursive least square parameter identification was illustrated by Chen and Liu [14]. Jin et al. proposed a multiple model LPV approach for nonlinear systems under the framework of the expectation maximization (EM) algorithm [15]. It is an important step on LPV identification in which the model identity is treated as the missing variable in the EM scheme. Based on the EM algorithm, the local models are synthesized with process data to obtain a global model for approximating a nonlinear process. Deng and Huang extended the identification method to nonlinear parameter varying systems in which the local model takes the form of state space equation. The identification problem is solved by using the particle filter based on EM algorithm [16]. The efficiency is illustrated on numerical examples and a pilot-scale hybrid water tank. Chen et al. continued to study the uncertain scheduling variable problem in multiple model based nonlinear identification. Both the linear and the nonlinear dynamics of scheduling variables are taken into account [17].
The distillation column is considered as one of the most important unit operations in chemical engineering [18, 19]. It becomes a benchmark problem in chemical process control for nonlinear modeling technology. The difficulties in modeling and controlling distillation columns lie in their highly nonlinear characteristics. Modeling of distillation columns may be classified into three groups: fundamental modeling, empirical modeling, and hybrid modeling [18]. In this paper, a hybrid solution based on the multiple models is adopted, which combines the two main advantages of linear and nonlinear models. The distillation column model is a nonlinear global model but can be represented as a reunion of linear models, one linear model for each operating point (local model or submodel), where each local model works at different operating point and can be identified through the iteration procedure of the EM algorithm. Finally, the global nonlinear model can be obtained by using the exponential function which takes full account of each operating point.
The remainder of this paper is organized as follows: the high purity distillation column used to test the multiple model method is described in Section 2. Section 3 lays out the mathematical formulation of the multiple model approach in which the ARX model is used as the local model. In Section 4, the identification procedure under the EM algorithm is provided. The simulation on the high purity distillation column with multiple inputs is given in Section 5. Finally, conclusions are given in Section 6.
2. Description of a High Purity Distillation Column
Distillation is simply defined as a process in which a liquid or vapor mixture of two or more substances is separated into its component fractions of desired purity, by the application and removal of heat [20]. The process is based on the fact that the vapor of a boiling mixture will be richer in the components that have lower boiling points. In most cases, the distillation is operated at a continuous steady state. New feed is always being added to the distillation column and products are always being removed. A high purity distillation column is strongly nonlinear, and any realistic study should take this into account [21, 22]. Distillation column has become a favorite subject in the process control engineering field, including the areas of soft-sensor, process modeling, and control. There are a variety of configurations for distillation columns, and each performs specific types of separations [23].
- (1)
two components (binary separation);
- (2)
one feed and two products;
- (3)
constant relative volatility;
- (4)
constant molar flows (same vapor flow on all stages);
- (5)
no vapor holdup;
- (6)
total condenser.
Variable | Description | Unit | |
---|---|---|---|
1 | F | Feed rate | kmol/min |
2 | yD | Distillate top product composition | mole fraction |
3 | xB | Distillate bottom product composition | mole fraction |
4 | LT | Reflux flow | kmol/min |
5 | VB | Boil-up flow | kmol/min |

3. Problem Formulation Using Multiple Model Method
The missing data set is the random variable I = {I1, I2, …, IN} denoting the identity of each local model, namely, Cmis = {I}. The values of input variables, output variables, and the scheduling variable are observed or given a priori, so Cobs = {U, Y, W, T1:M}. The parameters to be estimated from process input and output data under the framework of the EM algorithm are Θ = {θi, oi, σ} i=1,2,…,M.
4. The ARX Model Identification under the EM Algorithm
4.1. The EM Algorithm Review
- (1)
Initialization: give an initial value of the model parameter vector.
- (2)
E-step: the Q-function is calculated, given the current parameter Θ estimated from the previous iteration,
(6) - (3)
M-step: maximize the Q-function with respect to Θ to obtain the new iterative parameter estimate.
- (4)
Iteration: step 2 and step 3 are repeated until the changes in parameters after each iteration are within a specified tolerance level.
The above steps ensure that the log likelihood function of the observed data increases at every step. So the EM algorithm is guaranteed to converge to some maximum of the likelihood function [15]. The convergence performance under general conditions is given in Wu [27]. The flowchart of the algorithm is shown in Figure 2.

4.2. Formulation of the Multiple Model Approach
- (1)
logp(yk∣xk, Θi),
- (2)
p(Ik = i∣wk, T1:M, oi),
- (3)
,
4.3. The Summary of the Identification Procedure
Now we give the identification procedure under the frame of the EM algorithm as follows.
(1) Initialization. Set the initial value of parameters to be estimated.
(2) E-step. Using the current parameter Θold, calculate , where p(Ik = i | wk, T1:M, oi) can be computed by (5) and logp(yk | xk, Θi) is calculated from (15).
Here a nonlinear numerical optimization is required to obtain the optimal oi between the omin and omax. A constrained nonlinear optimization function named “fmincon” in optimization toolbox of “Matlab” is adopted to search for the optimal oi at each of the iteration steps [15].
(4) Iteration. Repeat step (2) and step (3), until the converge condition is satisfied. Here the converge condition is defined as the changes of the estimated value between two iterations which are as small as the level 10−6.
5. A Distillation Column Simulation
Nonlinear system identification is much more complex than linear system identification. In this section, a high purity distillation column is used to test the performance of the multiple model approach based on the EM algorithm.
It has been shown that the purer the products get, the more nonlinear the system becomes [20]. A distillation column is also a typical example for a MISO system in which there are strong interactions between the variables. In this paper, choosing LV-configuration is motivated by the fact that the LV-configuration is the most commonly used in industrial practice.
Here, the process input variables available for control purpose are the reflux (LT) and the boil-up rate (VB). The top product composition (yD) variable is the output. The disturbances to a distillation column can come from many sources. Besides the two manipulated variables, they can come from the feed (feed rate and feed composition), from the pressure inside the column and from the cooling water, and so forth. The distillation column is very sensitive to the feed rate disturbance. The feed rate is assumed to be measurable; therefore, its effect can be incorporated in the EM algorithm. In this paper, the feed rate F is treated as the scheduling variable [23]. When the column is operated over a relatively wide operating region instead of at a single set point, it reveals a significant nonlinear behaviour.
In the simulation, white noise with a variance of 0.05 is added in the simulated output data for unknown process noise. The observed input and output data for the high purity distillation column are shown in Figures 3 and 4. The scheduling variable is designed with three operating points at 0.8 kmol/min, 1.0 kmol/min, and 1.2 kmol/min, respectively, which were given in Figure 5. In order to achieve high identification performance the random binary excitation signals with small magnitude are added in the input variables. The feed rate increases by a fixed step size and no additional excitation signal is added. Then applying the multiple model modeling approach and using the EM algorithm to estimate the parameters of each local model, the simulation results of self-validation tests indicate the efficiency of the proposed method. The comparison and the weighting function curve are given in Figures 6 and 7.






To further test the performance of the identified multiple model, the cross-validation is also carried on the case. The data used for cross-validation as shown in Figures 8 and 9 are generated under different feed rates from Figure 5. Two different operating points are selected in the cross-validation simulation as in Figure 10. It can be seen from Figure 11 that the real process data and the model prediction are in good agreement with each other which effectively confirms the efficiency of the proposed algorithm.





6. Conclusions
A multiple model modeling algorithm based on EM algorithm is proposed and applied for a high purity distillation column reaction process. Taking a full consideration of the dynamic, nonlinearity, and complexity, the parameters of each local model are identified under different operating points. The global model can be obtained by using a weighting function, which adopts an exponential form about scheduling variable under the framework of EM algorithm. Simulation results show that the fusion model of multiple models based on the EM algorithm is very effective in the chemical process of a high purity distillation column.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
The authors thank the financial support by the National Natural Science Foundation of China (nos.: 21206053, 21276111, and 61273131) and the paper is partly supported by the 111 Project (B12018), the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD), the Fundamental Research Funds for the Central Universities (JUDCF10064), and Jiangsu Innovation Program for Graduates (CXZZ11_0464).