## Brazilian Journal of Chemical Engineering

*Print version* ISSN 0104-6632; *On-line version* ISSN 1678-4383

### Braz. J. Chem. Eng. vol.17 n.4-7 São Paulo Dec. 2000

#### http://dx.doi.org/10.1590/S0104-66322000000400008

**SOFT SENSORS WITH WHITE- AND BLACK-BOX APPROACHES FOR A WASTEWATER TREATMENT PROCESS**

**D. Zyngier^{1}, O.Q.F. Araújo^{2} and E.L. Lima^{1*}**

^{1}Programa de Engenharia Química/COPPE/Universidade Federal do Rio de Janeiro, P.O. Box 6850, 21945-970, Telephone: +(55)(21) 590-2241, Fax: +(55)(21) 290-6626, Rio de Janeiro - RJ, Brazil

^{2}Escola de Química, Departamento de Engenharia Química, Universidade Federal do Rio de Janeiro, Av. Brigadeiro Trompovski s/n, Bl. E, 21945-970, Telephone: +(55)(21) 590-3192, Fax: +(55)(21) 590-4991, Rio de Janeiro - RJ, Brazil

*(Received: September 7, 1999; Accepted: April 6, 2000)*

Abstract - The increasing degradation of water resources makes it necessary to monitor and control process variables that may disturb the environment but are very difficult to measure directly, either because no physical sensors are available or because they are too expensive. In this work, two soft sensors are proposed for monitoring the concentrations of nitrate (NO) and ammonium (NH) ions and of carbonaceous matter (CM) during nitrification of wastewater. One of them is based on reintegration of a process model to estimate NO and NH and on a feedforward neural network to estimate CM. The other estimator is based on Stacked Neural Networks (SNN), an approach that gives the predictor robustness. After simulation, both soft sensors were implemented in an experimental unit using FIX MMI (Intellution, Inc.) automation software as an interface between the process and MATLAB 5.1 (The Mathworks Inc.) software.

Keywords: estimators, stacked generalization, neural networks, wastewater treatment.

**INTRODUCTION**

From the beginning of the industrial era until the 1970s, wastewater treatment received little attention. However, because of the progressive deterioration of water resources, more effort is now being devoted to treating effluents before they are discharged. Governmental agencies have developed stricter regulations specifying effluent quality, and so more complex wastewater treatment plants have had to be built in order to remove specific nutrients, such as nitrogen and phosphorus (Gernaey *et al.*, 1998).

However, in a wastewater treatment unit, some process variables may be very difficult to measure directly, either because no physical sensors are available or because they are too expensive. Nitrogen removal is a special case, since the permitted levels are set by legislation and should hence be monitored. An alternative in such cases is to employ soft sensors, which provide online estimates of difficult-to-measure variables through calculations that may involve auxiliary measurable variables. Interest in soft sensors has been renewed in the last decade by developments in computer processing capability, which have reduced the time required for mathematical calculations. A number of estimators can be used as soft sensors; Extended Kalman Filters (Wilson *et al.*, 1998) and neural networks (Albiol *et al.*, 1995) are just two of the many possibilities.

In this work, two soft sensors are proposed and implemented in an experimental unit. The monitored process is the nitrification of wastewater, which is of great importance during the nitrogen removal phase of biological treatment of wastewater. The variables inferred by the soft sensors are the concentrations of nitrate (NO) and ammonium (NH) ions, and of carbonaceous matter (CM). These were chosen because their maximum allowed concentrations are specified in legislation.

**ESTIMATORS**

State estimators can be based on a process model from which process variables can be inferred. This is called a white-box approach, where the physical correlations between process inputs and outputs are known. One example is the Kalman filter (Wilson *et al.*, 1998), which accounts for process and measurement noise when inferring process variables.

The Kalman filter was originally developed for linear systems. It is based on two main steps: a prediction step, where the future (one step ahead) values of the process variables are predicted through the process model, and a correction step, where the difference between the predicted and measured values of the variables is weighted through the filter's gain matrix. This is where the stochastic nature of the process is considered, represented by the process and measurement noise covariances.
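The two steps described above can be sketched for the simplest possible case, a single state with a single measurement. This is only an illustration: the function name `kalman_step` and all numerical values (`a`, `h`, `q`, `r` and the measurements) are assumptions chosen for the example, not taken from this work.

```python
def kalman_step(x_est, p_est, z, a, h, q, r):
    """One predict/correct cycle of a scalar linear Kalman filter.

    x_est, p_est: previous state estimate and its variance
    z: new measurement; a: state transition; h: measurement gain
    q, r: process and measurement noise variances (illustrative values)
    """
    # Prediction step: propagate the estimate one step ahead with the model.
    x_pred = a * x_est
    p_pred = a * p_est * a + q
    # Correction step: weight the prediction/measurement mismatch by the gain.
    gain = p_pred * h / (h * p_pred * h + r)
    x_new = x_pred + gain * (z - h * x_pred)
    p_new = (1.0 - gain * h) * p_pred
    return x_new, p_new, gain


x, p = 0.0, 1.0
for z in [1.2, 0.9, 1.1, 1.0]:
    x, p, k = kalman_step(x, p, z, a=1.0, h=1.0, q=0.01, r=0.1)
    print(f"measurement={z:.2f} estimate={x:.3f} variance={p:.4f} gain={k:.3f}")
```

A larger `r` relative to `q` lowers the gain, so the filter trusts the model more than the measurement; the reverse makes the estimate follow the measurements more closely.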

**Extended Kalman Filter (EKF)**

When the process is nonlinear, a linearization can be conducted using the nonlinear model in the prediction step and the linearized version when calculating the gain matrix, which is used in the correction step. This is a description of the Extended Kalman Filter (EKF), which has been used with success in many highly nonlinear processes, such as polymerization and bioprocesses (Crowley and Choi, 1998; Robertson *et al.*, 1996; Woo *et al.*, 1996; Zorzetto and Wilson, 1996; Myers *et al.*, 1996; Kozub and MacGregor, 1992).
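As a rough sketch of this idea, the scalar example below uses the nonlinear model for prediction and a finite-difference linearization for the gain. The model (`model`, a Monod-type consumption term), the measurement function (`sensor`) and every numerical value are hypothetical stand-ins, not the process model used in this work.

```python
def ekf_step(x_est, p_est, z, f, h, q, r, eps=1e-6):
    """One cycle of a scalar Extended Kalman Filter (illustrative sketch)."""
    # Prediction step: the nonlinear model itself propagates the estimate.
    x_pred = f(x_est)
    # Linearization of the model around the previous estimate (central difference).
    f_jac = (f(x_est + eps) - f(x_est - eps)) / (2.0 * eps)
    p_pred = f_jac * p_est * f_jac + q
    # The gain uses the linearized measurement function at the predicted state.
    h_jac = (h(x_pred + eps) - h(x_pred - eps)) / (2.0 * eps)
    gain = p_pred * h_jac / (h_jac * p_pred * h_jac + r)
    # Correction step: the residual uses the nonlinear measurement function.
    x_new = x_pred + gain * (z - h(x_pred))
    p_new = (1.0 - gain * h_jac) * p_pred
    return x_new, p_new


def model(x, dt=0.1, mu=2.0, k_s=1.0):
    """Hypothetical Monod-type substrate consumption, one Euler step."""
    return x - dt * mu * x / (k_s + x)


def sensor(x):
    """Hypothetical direct measurement of the state."""
    return x


# Start the filter from a deliberately wrong initial estimate.
x_true, x_est, p_est = 10.0, 5.0, 4.0
for _ in range(10):
    x_true = model(x_true)
    x_est, p_est = ekf_step(x_est, p_est, sensor(x_true), model, sensor,
                            q=0.01, r=0.04)
print(f"true={x_true:.3f} estimate={x_est:.3f}")
```

In a multivariable setting the two derivatives become Jacobian matrices and the scalar divisions become matrix inversions, but the order of operations stays the same.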

To cope with the particular features of a system (for example, infrequent or delayed measurements), some variations on the EKF have been developed. Kozub and MacGregor (1992), for instance, obtained good results for state identification in a polymerization process using an EKF to which a second EKF was coupled to improve the initial state estimates (which are needed to start the EKF algorithm). Crowley and Choi (1998) suggested that the estimates for the variables with the smaller sampling period be recalculated each time a measurement of the variable with the larger sampling period becomes available. Another proposal (Lukasse *et al.*, 1999) was to re-estimate the model parameters using the Kalman filter and then reintegrate the process model. This scheme was applied to a wastewater treatment plant composed of a continuous flow reactor.

**Neural Networks**

Neural networks (NN) are among the best-known black-box predictors. Their complex structure makes it possible to establish highly nonlinear correlations between input and output variables, so that they can represent a wide variety of processes (Morris *et al.*, 1994).

A black-box approach is understood as one where the physical significance of the correlation between process inputs and outputs is unknown. A great advantage of this approach is that there is no need to develop a mathematical model to describe the process, thus saving time and effort.

Nevertheless, care should be taken when using neural networks. Because the mathematical correlation obtained lacks physical meaning, the extrapolation capacity of these predictors is generally very limited. Therefore, data used for developing the neural network need to represent the whole operation range of interest, so that it "learns" what the best correlation between the variables is (Glassey *et al.*, 1994).

Depending on the process nature and its operating scale, obtaining large amounts of representative data can be a very difficult or even impossible task (Schenker and Agarwal, 1996; Zhang *et al.*, 1994).

**Stacked Generalization**

A challenge when developing a new NN is choosing its architecture: it is difficult to know, *a priori*, which configuration will capture most of the process characteristics.

If a different configuration (number of hidden layers and neurons, NN inputs, activation function) is chosen, NN performance will probably be affected. To select the best architecture, each candidate NN must be tested with a validation data set after the parameter adjustment phase ("training"). The development of different types of NN with subsequent validation may be very time-consuming.

An approach that has shown good results is to combine several individual, architecturally simpler neural networks into a stacked neural network (SNN) with improved robustness. Wolpert (1992), who introduced stacked generalization, states that the purpose of this technique is to achieve greater generalization accuracy, as opposed to learning accuracy. This means that, even though the predictor may not have the best performance on the training data, it is able to capture process behavior adequately and is thus more robust.

**ESTIMATOR BASED ON PROCESS MODEL**

A soft sensor was built based on a simplified model of the process previously developed by Coelho (1998). Since the EKF has had a number of successful applications to bioprocesses (Crowley and Choi, 1998; Myers *et al.*, 1996; Zorzetto and Wilson, 1996), it was the first choice of soft sensor for this system. The inferred variables were initially the concentrations of nitrate and ammonium ions and of carbonaceous matter. FIX MMI automation software (Intellution, Inc.) was configured to provide a "user-friendly" interface in the experimental unit, thus making it easy to monitor the process.

This system has two sets of delayed measurements: each ion concentration can be measured offline, with a sampling period of 60 minutes, while carbonaceous matter, which is determined by the Chemical Oxygen Demand method, can only be determined at 3-hour intervals. In order to deal with these delayed, offline measurements, two alternative soft sensors (SS) were considered:

SS1: A reintegration method, where the process model is reintegrated at each CM update. The previous CM values (from the sampling instant to the updating moment) are replaced by the values obtained by reintegration, while the other states are estimated by an EKF. When there is an NH and NO update, the values previously estimated for the three variables are replaced by the ones obtained by the reintegration.

SS2: An Iterated EKF, consisting of, at the moment of CM update, re-estimating NH and NO through EKF (considering their intermediate measurements) until the present moment is reached.
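The reintegration idea behind SS1 can be illustrated with a minimal sketch. It covers only the replacement of stored estimates when a delayed laboratory result arrives and leaves out the EKF part. The one-state model in `euler_step`, the rate constant, the time step and the sample values are all hypothetical and are not the model of Coelho (1998).

```python
def euler_step(x, dt, rate):
    """Hypothetical one-state model: first-order consumption, dx/dt = -rate * x."""
    return x + dt * (-rate * x)


def reintegrate(history, k_sample, measured, dt, rate):
    """Replace the estimates from the sampling instant onward.

    history: stored estimates, one per time step
    k_sample: index at which the offline sample was taken
    measured: laboratory value that has just become available
    """
    updated = list(history)
    updated[k_sample] = measured
    # Integrate the model again from the sampling instant to the present.
    for k in range(k_sample + 1, len(updated)):
        updated[k] = euler_step(updated[k - 1], dt, rate)
    return updated


# Open-loop estimates started from an inaccurate initial condition.
dt, rate = 1.0, 0.02
history = [50.0]
for _ in range(10):
    history.append(euler_step(history[-1], dt, rate))

# The result for a sample taken at step 4 only becomes available at step 10.
corrected = reintegrate(history, k_sample=4, measured=40.0, dt=dt, rate=rate)
print(round(history[-1], 3), round(corrected[-1], 3))
```

Estimates before the sampling instant are left untouched; only the interval between sampling and the arrival of the result is recomputed.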

Experimental data from Coelho (1998) were used to evaluate the performance of each soft sensor. SS1 showed superior performance to SS2 for NH and NO estimation, but neither soft sensor achieved good results for CM estimation.

CM inference was probably unsatisfactory because of the weak mathematical influence that the nitrogen compounds have on CM in the process model. Hence, CM estimation was then approached as a black-box model which, combined with NO and NH inference, formed a gray-box predictor (Fig. 1).

A feedforward neural network was employed, which, according to Montague and Morris (1994), is the most commonly used network in studies with neural networks.

There is experimental evidence of an indirect correlation between NH, NO and CM, and the process has two distinct patterns through time: the filling phase and the batch reaction phase. For these reasons, two networks were trained, both with three inputs (NH and NO from the EKF, and dissolved oxygen, which is an online measurement), six hidden neurons and one output (CM). The activation function used was the hyperbolic tangent function.
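For illustration, the forward pass of a network with this 3-6-1 layout can be written in a few lines. The weights below are random placeholders rather than trained values, the input values are arbitrary, and the linear output neuron is an assumption, since the text specifies only the hyperbolic tangent activation.

```python
import math
import random


def forward(inputs, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: tanh hidden layer followed by a linear output (assumed)."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


random.seed(0)
n_in, n_hidden = 3, 6  # inputs: NH, NO, dissolved oxygen; output: CM
w_hidden = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
b_hidden = [random.uniform(-1, 1) for _ in range(n_hidden)]
w_out = [random.uniform(-1, 1) for _ in range(n_hidden)]
b_out = random.uniform(-1, 1)

# Number of adjustable parameters in this architecture.
n_params = n_hidden * n_in + n_hidden + n_hidden + 1

# Placeholder (scaled) values for NH, NO and dissolved oxygen.
cm_estimate = forward([0.2, -0.1, 0.5], w_hidden, b_hidden, w_out, b_out)
print(n_params, cm_estimate)
```

With two such networks, one per process phase, the estimator would switch between parameter sets when the filling phase ends and the batch reaction phase begins.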

**ESTIMATOR BASED ON STACKED NEURAL NETWORKS**

Stacked Neural Networks are built through the combination of several individual neural networks. An example of an SNN structure composed of three individual NN can be seen in Figure 2, where w_{i} is the weight of the ith stacked NN.

Some authors have recently used Stacked Neural Networks (SNN) as predictors (Zhang *et al.*, 1998 and 1997; Sridhar *et al.*, 1996). They seem to agree that no general rule exists for determining the stacking weights used to combine estimators. Sridhar *et al.* (1996) used Least Squares Regression, while Zhang *et al.* (1998 and 1997) recommended Principal Component Regression (Eq. 4). Other possibilities include adopting equal weights for all stacked NN (Eq. 1), which is equivalent to calculating an average of the individual NN outputs, or using a weighted average calculated from the individual NN training errors (Eqs. 2 and 3).

In this work, four different types of stacking weights were evaluated:

w_{i} = 1 / NS (1)

(2)

(3)

(4)

where NS corresponds to the number of stacked NN, NRSS is the normalized residual sum of squares and t_{i} are the principal components of the outputs of each individual NN.

Equation 1 corresponds to averaging the individual NN outputs to obtain the SNN prediction. Equations 2 and 3 represent two different ways of calculating a weighted average of the individual NN, both assuming that the larger the NN training error, the smaller its weight, while Equation 4 is the mathematical expression for the Principal Component Regression technique.

Figure 3 shows how the standard deviation of the weights assigned to each NN, relative to their mean, evolves as NS increases for a given data set. As NS increases, the standard deviations of the type 2 (W2) and type 3 (W3) stacking weights tend to zero. This means that, provided there are enough individual NN, adopting the average of the individual NN outputs (Eq. 1) is practically equivalent to adopting one of the proposed weighted averages (Eqs. 2 and 3). The W1 standard deviation is always zero, because in this case all individual NN are assigned the same weight.

It is worth noting that, although one might have expected weighted averages based on training errors to outperform a simple average, the results for large enough NS do not support this expectation. A plot of each individual NN's contribution to the SNN prediction error displays a pattern as NS increases: the NN error contributions obtained with W1, W2 and W3 are very similar. However, when stacking weights of type 4 (W4, obtained with the PCR technique) were used, no evident pattern was observed.
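A small sketch may help to make the difference between equal and error-based weights concrete. The equal-weight function follows directly from the description of Equation 1. The function `inverse_error_weights` is only one plausible way of giving smaller weights to networks with larger training error; it is an assumption made for illustration and is not claimed to reproduce Equation 2 or 3. Principal Component Regression is not sketched here, and the NRSS values and outputs are invented.

```python
def equal_weights(n_stacked):
    """Equal stacking weights: every individual NN gets 1/NS."""
    return [1.0 / n_stacked] * n_stacked


def inverse_error_weights(nrss):
    """Assumed scheme: weights proportional to 1/NRSS, normalized to sum to one."""
    inverse = [1.0 / e for e in nrss]
    total = sum(inverse)
    return [v / total for v in inverse]


def stacked_prediction(outputs, weights):
    """SNN output as the weighted sum of the individual NN outputs."""
    return sum(w * y for w, y in zip(weights, outputs))


training_nrss = [0.10, 0.20, 0.40]  # invented training errors of three NN
outputs = [1.0, 1.2, 0.8]           # invented outputs of the same three NN

w_equal = equal_weights(len(outputs))
w_error = inverse_error_weights(training_nrss)
print(stacked_prediction(outputs, w_equal))
print(stacked_prediction(outputs, w_error))
```

In this sketch the network with the smallest training error receives the largest weight, and both weight sets sum to one, so the stacked output stays within the range of the individual outputs.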

Twenty-five individual NN were trained for CM estimation for this system. Data were separated into training and validation sets. Figure 4 shows the 25 individual NN errors (calculated through normalized residual sum of squares - NRSS) for the training and validation sets.

An important aspect of stacking NN is knowing when the stacking process should stop; that is, when adding another network no longer improves the trade-off between cost and benefit, where cost is related to the number of NN parameters and benefit is the reduction in NRSS. The stacking process is shown in Figure 5 as the percentage reduction in NRSS achieved as NS increased, with the NN stacked in increasing order of architectural complexity. In this case, 18 individual NN were chosen as the optimal number of networks to stack.
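One way to automate this kind of decision is sketched below. It is not the procedure used in this work, where the choice was made from the curve in Figure 5. The NRSS normalization (residual sum of squares divided by the total sum of squares of the target), the equal stacking weights, the threshold `min_gain` and the data are all assumptions made for the example.

```python
def nrss(pred, target):
    """Residual sum of squares normalized by the target's total sum of squares (assumed)."""
    mean_t = sum(target) / len(target)
    ss_res = sum((p - t) ** 2 for p, t in zip(pred, target))
    ss_tot = sum((t - mean_t) ** 2 for t in target)
    return ss_res / ss_tot


def stacked_nrss_curve(nn_preds, target):
    """NRSS of the equal-weight SNN built from the first NS networks, NS = 1..N."""
    curve = []
    for ns in range(1, len(nn_preds) + 1):
        avg = [sum(p[k] for p in nn_preds[:ns]) / ns for k in range(len(target))]
        curve.append(nrss(avg, target))
    return curve


def choose_ns(curve, min_gain=0.01):
    """Stop at the first NS where one more NN reduces NRSS by less than min_gain,
    measured as a fraction of the single-network NRSS."""
    for ns in range(1, len(curve)):
        if (curve[ns - 1] - curve[ns]) / curve[0] < min_gain:
            return ns
    return len(curve)


# Invented validation data: three NN ordered by increasing complexity.
target = [1.0, 2.0, 3.0, 4.0]
nn_preds = [
    [1.2, 2.2, 3.2, 4.2],
    [0.8, 1.8, 2.8, 3.8],
    [1.0, 2.0, 3.0, 4.1],
]
curve = stacked_nrss_curve(nn_preds, target)
print(curve, choose_ns(curve))
```

A stricter or looser `min_gain` shifts the stopping point, which is the same trade-off between added parameters and reduced error that the text describes.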

An interesting point is that quite a few of the individual NN with many parameters did not do so well in the validation phase. Although they might be expected to diminish SNN performance, that only happened when using W4. The other stacking weight types did not seem to have been affected by them.

The same procedure was repeated for the NO and NH concentrations, for which the optimal numbers of networks to stack were 16 and 14, respectively. The structures of the individual neural networks are given in Table 1, where DO stands for dissolved oxygen concentration, t stands for time, and ORP stands for the variation in the redox potential. The variables CMf, NOf and NHf stand for the respective CM, NO and NH feed concentrations.

**APPLICATION TO A BIOPROCESS**

Coelho (1998) developed an optimal operational strategy for an experimental bench-scale Sequencing Batch Reactor (SBR) unit (35 liters), which can be seen in Figure 6. The author found that nitrogen removal was highest when the reactor was filled in pulses with aeration, followed by an anoxic batch reaction phase. Coelho also developed a simplified mathematical model for this unit and adjusted its parameters with experimental data, which were used in the development of the soft sensors presented here.

**Gray-Box Predictor**

The performance of the gray-box predictor on a validation data set can be seen in Figure 7, where CM concentration was evaluated by the Chemical Oxygen Demand (COD) method.

One of the advantages of the proposed soft sensor is that it infers NH and NO from a simplified process model, which makes it easier to maintain and does not require the adjustment of many parameters. In addition, phenomenological models are usually able to represent the system even during small deviations in process operating conditions.

**Black-Box Predictor**

The black-box soft sensor presented in Figure 8 also showed good results for the validation data set. Its development and operation are simpler than those of the gray-box predictor, since a process model is not required and there is no need for updating during the reaction.

It must be remembered that this soft sensor should be retrained each time the process operating conditions are altered, since its performance is strongly dependent on the data set used for training.

**CONCLUSIONS AND DISCUSSION**

In this work, two soft sensors were proposed for inferring variables that are difficult to measure online in a wastewater treatment process.

The first one was based on the reintegration of a simplified process model at each delayed nitrate and ammonium ion measurement, while carbonaceous matter was inferred by a feedforward neural network. Good results were achieved, indicating that this soft sensor, besides not having any parameters to adjust, is robust, since it performed well even though the process model deviates somewhat from the experimental data. Nevertheless, phenomenological models may require greater effort during the development phase.

The second soft sensor overcomes this problem by using black-box models. Since its inputs are variables that are available online, this sensor does not require updating with offline measurements, which makes it easier to operate. The disadvantages of this soft sensor are the same as those of any black-box model: it cannot be used for predictions outside the range covered by the training data. This implies retraining the predictor each time the process conditions are altered.

**ACKNOWLEDGEMENTS**

The authors would like to thank CAPES (Fundação Coordenação de Aperfeiçoamento de Pessoal de Nível Superior), CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico) and FAPERJ (Fundação de Amparo à Pesquisa no Estado do Rio de Janeiro) for their financial support.

**REFERENCES**

Albiol, J., Campmajó, C., Casas, C. and Poch, M., Biomass Estimation in Plant Cell Cultures: A Neural Network Approach, Biotechnol. Prog., 11, 88-92 (1995).

Coelho, M.A.Z., Wastewater Nitrification Process Modeling and Optimization in a Sequential Batch Reactor, Ph.D. Thesis - COPPE/UFRJ (1998).

Crowley, T.J. and Choi, K.Y., Experimental Studies on Optimal Molecular Weight Distribution Control in a Batch-Free Radical Polymerization Process, Chem. Engng. Sci., 53, No. 15, 2769-2790 (1998).

Gernaey, K., Vanderhasselt, A., Bogaert, H., Vanrolleghem, P. and Verstraete, W., Sensors to Monitor Biological Nitrogen Removal and Activated Sludge Settling, Journal of Microb. Methods, 32, 193-204 (1998).

Glassey, J., Montague, G.A., Ward, A.C. and Kara, B.V., Artificial Neural Network Based Experimental Design Procedures for Enhancing Fermentation Development, Biotech. Bioeng., 44, 397-405 (1994).

Kozub, D.J. and MacGregor, J.F., State Estimation for Semi-Batch Polymerization Reactors, Chem. Engng. Sci., 47, No. 5, 1047-1062 (1992).

Lukasse, L.J.S., Keesman, K.J. and van Straten, G., A Recursively Identified Model for Short-Term Predictions of NH4/NO3-Concentrations in Alternating Activated Sludge Process, J. Proc. Control, 9, 87-100 (1999).

Montague, G. and Morris, J., Neural-Network Contributions in Biotechnology, TIBTECH, 12, 312-324 (1994).

Morris, A.J., Montague, G.A. and Willis, M.J., Artificial Neural Networks: Studies in Process Modelling and Control, Trans. IChemE, 72 Part A, 3-19 (1994).

Myers, M.A., Kang, S. and Luecke, R.H., State Estimation and Control for Systems with Delayed Offline Measurements, Comp. Chem. Engng., 20, No. 5, 585-588 (1996).

Robertson, D.G., Lee, J.H. and Rawlings, J.B., A Moving Horizon-Based Approach for Least-Squares Estimation, AIChE Journal, 42, No. 8, 2209-2224 (1996).

Schenker, B. and Agarwal, M., Cross-Validated Structure Selection for Neural Networks, Comp. Chem. Engng., 20, No. 2, 175-186 (1996).

Sridhar, D.V., Seagrave, R.C. and Bartlett, E.B., Process Modeling Using Stacked Neural Networks, AIChE Journal, 42, No. 9, 2529-2539 (1996).

Wilson, D.I., Agarwal, M. and Rippin, D.W.T., Experiences Implementing the Extended Kalman Filter on an Industrial Batch Reactor, Comp. Chem. Engng., 22, No. 11, 1653-1672 (1998).

Wolpert, D.H., Stacked Generalization, Neural Networks, 5, 241-259 (1992).

Woo, W.W., Svoronos, S.A., Sankur, H.O., Bajaj, J. and Irvine, S.J.C., In-Situ Estimation of MOCVD Growth Rate via a Modified Kalman Filter, AIChE Journal, 42, No. 5, 1319-1340 (1996).

Zhang, Q., Reid, J.F., Litchfield, J.B., Ren, J. and Chang, S.-W., A Prototype Neural Network Supervised Control System for *Bacillus thuringiensis* Fermentations, Biotech. Bioeng., 43, 483-489 (1994).

Zhang, J., Martin, E.B., Morris, A.J. and Kiparissides, C., Inferential Estimation of Polymer Quality Using Stacked Neural Networks, Computers Chem. Engng., 21, S1025-S1030 (1997).

Zhang, J., Martin, E.B., Morris, A.J. and Kiparissides, C., Prediction of Polymer Quality in Batch Polymerization, Chem. Engng. Journal, 69, 135-143 (1998).

Zorzetto, L.F.M. and Wilson, J.A., Monitoring Bioprocesses Using Hybrid Models and an Extended Kalman Filter, Comp. Chem. Engng., 20, S689-S694 (1996).

*To whom correspondence should be addressed