TUTORIAL FOR MIXTURE-PROCESS EXPERIMENTS WITH AN INDUSTRIAL APPLICATION

This article presents a tutorial on mixture-process experiments and a case study of a chemical compound used in the delay mechanism for starting a rocket engine. The compound consists in a threecomponent mixture. Besides the mixture components, two process variables are considered. For the model selection, the use of an information criterion showed to be efficient in the case under study. A linear regression model was fitted. Through the developed model, the optimal proportions of the mixture components and the levels of the process variables were determined.


INTRODUCTION
Formulations of Mixture Experiments (ME) are commonly found in chemical, pharmaceutical and food industries, as well as in other industrial segments. In those experiments, the factors are proportions of the components in a mixture and the response is a variable that characterizes the quality of the product, assumed as a function of component proportion. In those experiments, the sum of component proportions always equals to one. In certain situations, there may be other factors, in addition to the mixture components, that affect the characteristics of the mixture. Such factors are named process variables, which are frequently included in the experiment as factorial designs. Therefore, we intend to determine not only the optimal proportions of the mixture components, but also the optimal levels of the process variables. The aim of this article is to provide a tutorial on Mixture-Process Experiments (MPE).
A chemical mix experiment will be presented, as part of a subsystem of a mechanism to delay the activation of a rocket engine, described in Vieira & Dal Bello (2006) and Dal Bello & Vieira (2008). The aim of the experiment is to find the ideal formulation, so that the expected burning time value is according to the design's specification, that is, 8 seconds. Such time is optimal for *Corresponding author the rocket's range. As it will be seen, several formulations lead to a time of 8 seconds, being thus convenient to find a formulation that provides the lowest variance for future responses. The mixture components are Zarfesil, Ground Glass and Nitrocellulose. In addition to these mixture variables, other two variables that may also affect the characteristics of the mixture were also taken into consideration. The first is the granulometry of the mixture; the second variable consists of the insertion -or not -of an orifice for gas escape during the burn reaction, making this reaction always to occur at a pressure which is approximately the atmospheric pressure. Cornell (2002) is the main reference on ME, being the Chapter 7 dedicated to MPE cases. In it, a comprehensive and detailed exposition can be found. Myers & Montgomery (2002) dedicate Chapters 12 and 13 to ME and MPE, thus comprising a good introduction to the topic. Piepel (2004) summarizes a survey related to mixture experiments for a period of 50 years, ranging from 1955 to . Prescott et al. (2002 propose a quadratic model as an alternative to the models traditionally used in ME (Scheffé models). Cornell (2000), Cornell (2002, Chapter 6), Cornell & Gorman (2003) and Khuri (2005) carried out comparative studies between models that they named as slack-variable models and Scheffé models. Piepel (2007) compares the CSLM (Component Slope Linear Model) with the SLM (Scheffé Linear Model) and the CLM (Cox Linear Model). They conclude that the models SLM, CLM and CSLM are mathematically equivalent and provide the same statistics for a given ME. The differences lie in the interpretations of their coefficients.

LITERATURE REVIEW
Goos & Donev (2006) describe an algorithm to plan experiments in blocks involving mixtures. They show that, for restricted and unrestricted experimental regions, the resulting design of experiments is statistically more efficient than the options of experiments in blocks presented in the literature. Goos & Donev (2007) describe an algorithm to plan split-plot experiments in cases involving mixture and process variables. They use an optimization criterion for the choice of experimental points and show that it is preferable to spread the replications all over the experiment region, instead of concentrating them in central points. in addition to the mixture components and process variables (controlled factors), there are uncontrolled factors in the productive process (noise variables), although they may be controlled in laboratory experiments. The authors address models that allow them to choose the controllable variable values (mixture and process) that make the process more robust in relation to the noise variables.
The restrictions presented in Equation (1) are shown in Figure 1, in the event of a mixture of two and three components. The feasible region of the two-component mixture is represented by a line segment and, in case of a three-component mix, it is represented by a triangle. Generally, in case of three-component mixtures, the restricted experimental region is represented by using a 3-coordinate system, as shown in Figure 2. Each edge of the triangle corresponds to a binary mixture, and the vertices of the triangle correspond to the formulations with pure components. The possible ternary mixtures are located inside the triangle. In this case, only two dimensions are needed to represent the experiment. As each component is represented by a vertex, a geometric figure with three apexes and two dimensions -that is, an equilateral triangle -represents the restricted factor space of a ternary mixture.
In many ME there are limitations in the component proportion, making the experimental space a sub-region of the original space. Therefore, upper and/or lower limits are established in the proportions, represented as follows: where L i is the lower limit and U i is the upper limit of the component proportion i.   When the upper and lower limits for the proportions of one mixture are established, the experimental region is reduced to a sub-region of the original region. In these cases, the coordinates of the sub-regions may be redefined in terms of "pseudo"-components. The pseudocomponents are defined as a function of the original components and of one of their limits (lower or upper).
We then have two types of pseudocomponents: the L-pseudocomponents in relation to the lower limit, and the U-pseudocomponents in relation to the upper limit. According to Cornell (2002, p. 134), the main reason to use pseudocomponents is that it is usually easier to plan the experiment and fit the model. Myers & Montgomery (2002, p. 655) recommend the use of pseudocomponents to fit mixture models where there are component constraints, resulting in mild to high levels of multicollinearity. Generally speaking, a mixture model that uses pseudocomponents will have lower levels of multicollinearity than the same model with the original components.
The L-pseudocomponents are defined as: where L = q i=1 L i . In order to obtain original components (x i ), just use the Equation (3): The U-pseudocomponents are defined as: where In order to calculate respective original components (x i ), just use the Equation (5): Where there are upper and lower constraints simultaneously, the choice of using L-pseudocomponents, v i , or U-pseudocomponents, u i , depends on the shape of the experimental region. Where (1 − L) < (U − 1), the choice if for L-pseudocomponents, and where (1 − L) ≥ (U − 1), the choice is for U-pseudocomponents (Cornell, 2002, p. 160).

Models for Mixture Experiments
The models which are traditionally used in mixture experiments are Scheffé's canonical polynomials (Cornell, 2002). Scheffé's Quadratic Model is as follows: where the β's are the models' parameter coefficients. Note that this model does not have the intercept, as it is eliminated by a simplification originated from the basic limitation presented in Equation (1).
Scheffé's Cubic Model is as follows: Scheffé's Special Cubic Model is the same as the Cubic Model, excluding the terms For the cases where there are high levels of multicollinearity, the use of slack-variable models is recommended (Cornell, 2002, Chapter 6). In a mixture of q components, being aware of the proportion of the first (q − 1) components, it is possible to determine also the proportion of component q, since Therefore, x q is defined as the slack variable, which will not be part of the composition of the model terms. In order to balance out the lack of the slack variable, an intercept is included to the model. Therefore, for a mixture of q components there are q possibilities of slack variables, that is, q different models with slack variable.
The Quadratic Model with slack variable: The Cubic Model with slack variable:

Model for Process Variables
An adequate model for r process variables z 1 , z 2 , . . . , z r , involving second-order terms is (Cornell, 2002): where the α's are the model's parameter coefficients for process variables. The experiment for the process variables may be a factorial design at two or more levels. In order to include terms with the variable z 2 j in the model, it is necessary an experiment with at least three levels of each process variable and a total number of points sufficient to fit and test the model. In order to fit a model without the variable z 2 j , considering only the main effects of the process variables and the interactions among them, only two levels of each process variable are necessary (Myers & Montgomery, 2002).

Mixture Model Including Process Variables
The experimental design is established through a combination of the mixture variables with the process variables. In addition, any model for the mixture may be combined with the model for the process variables, in order to represent mixture-process-type problems. Consider f (x) the for the mix and g(z) the model for the process variables. Then, the additive model combined for the mixture-process case is (Prescott, 2004) For example, the additive combined model, which includes Scheffé's cubic model for the mixture presented in Equation (8) and the reduced quadratic model, considering only the main effects of the process variables and the where the γ 's are the parameters for the mixture's combined model including process variables. The lower indexes of γ refer to mixture variables, whereas the upper ones refer to process variables. Observe that the combined model presented in Equation (13) does not have the intercept originated from the model for the process variables. The intercept is eliminated from the combined model, since it has a linear dependence with the terms γ 0 i x, due to the constraint presented in Equation (1). Those additive models do not take into account the effects of process variables on the linear and non-linear properties of the mixture components. Alternative models were suggested, with the introduction "crossed" terms in f (x) and g(z). The complete cross-over model is The combined multiplicative model that includes Scheffé's cubic model for mixtures and the reduced quadratic model for the process variables. This combined model is represented as follows: However, the combined models that include the additive and multiplicative model terms simultaneously may also be considered.

D-OPTIMAL DESIGN OF EXPERIMENTS
When there are constraints in the proportions of the mixture components, the use of experiments generated according to some optimization criterion is recommended (Cornell, 2002, p. 400). Cornell (2002) describes four alphabetic optimization criteria (A-optimal, D-optimal, G-optimal and V-optimal) for the choice of the experimental points. Such criteria are based on the optimization of some information matrix function W W , where W is a matrix (n × p), whose elements are the proportions of the mixture components (x i ), the levels of the process variables (z i ) and the x i and z i functions (such as interactions), where n is the number of observations in the experiment and p is the number of model parameters. The general combined mixture model with inclusion of process variables is represented in matrix form: For n observations, y is a vector (n×1) of the observations, β is a vector ( p×1) of the coefficients and ε is a vector (n × 1) of the random errors. In the classical linear regression model, ε is considered with multivariate normal distribution, that is, ε ∼ N 0, Iσ 2 .
The estimation vector for the coefficients isβ = W W −1 W y, the variance-covariance matrix is var(β) = σ 2 W W −1 . The prediction value for the response at point w (w is a matrix line W) isŷ(w) and its variance is The statistical software Design-Expert , developed and distributed by the company Stat-Ease, uses the D-optimal criterion for the choice of the experimental points. Myers & Montgomery (2002) define the D-optimal criterion using the moment matrix M = W W/n. According to these authors, a D-optimal experiment is the one that makes the determinant of the moment matrix, |M|, to be maximized. They show that the determinant of the moment matrix has the following property: With all that, and assuming that the error are normally distributed, independent and with constant variance the determinant W W is inversely proportional to the squared volume of the confidence region on the regression coefficients. When W W is small, it means that the inverse of W W is large and, therefore, the volume of the confidence region is large; thus, the estimated β is not regarded as good (Myers & Montgomery, 2002, Appendix 7).
Consequently, the D-optimal experiment design is the one that minimizes the volume of the confidence ellipsoid on β, which is achieved by maximizing the determinant W W . By observing Equation (18), we may infer that maximizing the determinant W W is equivalent to maximizing the determinant of the moment matrix, |M|.

DELAY MIX EXPERIMENT
The chemical mix under study is part of a subsystem of a mechanism to delay the ignition of a rocket engine, which evolved from the R&D stage in laboratory scale to be currently produced at industrial scale. The aim of the experiment is to find the ideal formulation, so that the expected burning time value is according to the project's specification, that is, 8 seconds. Such time is optimal for the maximization of the rocket's range. As it will be seen, several formulations lead to a time of 8 seconds, being thus convenient to find a formulation that provides the lowest variance for future responses.  Table 1 shows the experimental design of the delay mix, as well as the experiments' responses. Table 1 -Reduced experiment of D-Optimal delay mix with L-pseudocomponents.

Delay Mix Experiment with Process Variables
In addition to the mixture variables, other two variables that may also affect the characteristics of the mixture were also taken into consideration. The first one is the granulometry (z 1 ), which may be regarded as a categorical process variable, and will be varied at two levels. The granulometry currently used in the production of the delay mix is the 20-30 (level [−1]); however, the granulometry 25-30 (level [1]) is believed to provide a reduction in the variability of the burning times. The second variable (z 2 ) may be regarded as a design variable, as it is a modification of the original delay mechanism design. The insertion of an orifice for gas exhaustion during the burn reaction was suggested, making this reaction always to occur at a pressure which is approximately the atmospheric pressure. Currently, the burning reaction occurs in a confined environment, at a pressure under transient conditions. The insertion of an orifice for gas escape is expected to contribute to the reduction of response variability. That design variable may be seen as a categorical process variable, which will also be experimented in two levels (without orifice [−1] and with orifice [1]). Therefore, the problem of the delay mix may be addressed as a mixture experiment with three components, including two category process variables, which may be represented as z l = −1, 1; l = 1, 2.
With the constraints in the mixture components, the resulting experimental region becomes a subregion of the original experimental space. Under such conditions, Cornell (2002) recommends the use of some computer algorithm for the choice of experimental points according to some optimization criterion, arising out of some previously selected candidate points. The software Design-Expert 7 offers the option of planning restricted mixture experiments in the proportions of the components and including process variables. For the choice of experimental points arising out of some previously selected candidate points, the software uses the D-optimal criterion, which was presented in Section 5. For the delay mix problem, Design-Expert generated a D-Optimal experiment according to Table 2 and Figure 4, which also presented the results obtained in the random execution sequence in which the experiments were performed.
In Figure 4, each quadrant represents one possible combination of the process variables levels. These quadrants represent the plot of restricted experimental region of the mix variables in the 3-coordinate system (see Fig. 2), containing the experimental points.  At first, a model using the stepwise, forward or backward model was fitted. The backward method provided a better model than those obtained through the forward and stepwise methods.
The ME data modeling may become very complex when the region of definition of the components is very limited. When the interval of a variable is substantially smaller than the interval of other variables, there may be high level of multicollinearity. This collinearity may turn the estimators of model coefficients unstable and very inflated. In this situation, the model fitting may not be simple, as the columns of W (covariates matrix) may be nearly linearly dependent. As a result, W W may be nearly singular, and the least-squares estimators of the parameters in β may have great standard deviations and be highly correlated, besides the fact that the estimates depend on the precise location of the experimental points. In this context, stepwise, forward and backward selection of variables may result in arbitrary selection of variables that belong to the model (Harrell, 2001, p. 64). Therefore, the selection of the "best" model arising out of a set of candidate models may be very complex. One alternative is to use criteria based on the information theory. Once all the terms that may be part of the MPE model are known, we may then use model selection criteria based on Information Theory. AI C c = ln σ 2 p + n + p n − p − 2 (24) whereσ 2 p is the estimator of maximum likelihood of the error variance.   The lack-of-fit test was not significant, with a p-value of 0.9775.
As it will be seen below, the Model M2, presented in Equation (25), is better than the Model M1, presented in Equation (23).
Myers & Montgomery (2002) recommend the use of studentized residuals to check the following assumptions: normality, independence and constant variance. The studentized residuals (r i ) are defined as follows: where e i = y i −ŷ i and h ii are elements of the hat matrix diagonals H = W(W W) −1 W .
Figures 5 to 8 present the diagnosis plots used to check the adequacy of the fitted model. In the normal probability plot for the studentized residuals shown in Figure 5, we may observe that there is no indication that the normality assumption should not be accepted, as there are no points way off the alignment.
In order to check the independence assumption, there is the studentized residuals plot for the observations in the order in which the experiments were performed (see Fig. 6).
As the residuals from the plot shown in Figure 6 are randomly distributed and without any obvious trend to correlate them, there is no reason to suspect that the independence assumption is not valid. As the residuals shown in the plot from Figure 7 are randomly distributed around zero, there is no reason to suspect that the additivity assumption should not be accepted.

Normal quantiles
In order to check the assumption of constant variance, the plot of absolute values of studentized residuals versus fitted values, which is shown in Figure 8 is used.
The plot shown in Figure 8 does not present any indication of growth of variance with the increase in the fitted value.
Therefore, the adequacy of Model M2 was checked. It is also worth pointing out that the same diagnosis plots (not shown here) were built for Models M1, which enabled the verification of adequacy of this model.

RESPONSE OPTIMIZATION
This section presents the response optimization taking Models M1 and M2 into account. We may observe that the Model M2 provides the lowest variance of a future response at the point of interest. As mentioned above, a response of 8 seconds is desirable in the delay mix experiment. Several formulations may result in prediction of new response equal to 8 seconds. Consequently, a desirable aim is to minimize the variance of a new response among the combinations of formulations and process variables that result in an expected response value of 8 seconds. Figures 9 to 12 present contour plots of response prediction and standard deviation of the mean response as a function of the L-pseudocomponents and the level of process variables (z 1 = 1 and z 2 = 1), considering the Models M1 and M2.
By analyzing the plots in Figures 9 to 12, we may observe that, when the 8-second contour curve is overlapped with the contour curves of standard deviation, a minimum standard deviation of    the mean near 0.27 (Model M2) and 0.25 (Model M1) must be obtained. However, a stricter optimization procedure must be performed.
The estimation vector for the coefficients isβ = W W −1 W y, the variance-covariance matrix is var(β) = σ 2 W W −1 . The prediction value for the response at point w (w is a matrix line W) is represented byŷ(w) and its variance var ŷ(w) = σ 2 w W W −1 w. Suppose any given point w 0 in the space of the mixture components and the process variables, the model in the point may be represented as follows: y (w 0 ) = w 0 β + ε.
By analyzing the solutions, we may conclude that the two models provide practically the same mix formulation at the optimal point. However, the Model M2 is chosen, since it provides better variance (0.5757) for the estimation of future responses.

CONCLUSIONS
A tutorial for mixture-process experiments was presented, as well as the mixture experiment with the inclusion of process variables. There is a wide range of combined models to represent the problems of the mixture-process type. In addition, the constraints in mixture experiments are very common, thus making the resulting experimental region a sub-region of the original region.
The data analysis in very restricted mixture regions may be very complex due to the very nature of the mix variables, with possible multicollinearity among some of the terms of the model concerned, which may result in very unstable and inflated coefficient estimators, resulting in arbitrary selection of important variables. With a view to reducing the effect of this type of problem, the use of pseudocomponents was chosen, instead of actual components.
At first, the backward model selection method was used and, then, a model selection method based on the Information Theory, which proved to be more efficient in the case under study.
Then a model was selected, with which was possible to determine the proportion of each of the three components of the chemical mix and the level of the two process variables, so that the prediction of the delay time has minimum variance and expected value of 8 seconds.
It is worth pointing out that the proposed modification of process in relation to the granulometry of the mix and the introduction of the orifice in the delay mechanism result in the reduction of the variance of a new response at the point of interest.