Print version ISSN 0103-5053
J. Braz. Chem. Soc. vol.15 no.2 São Paulo Mar./Apr. 2004
João A. Bortoloti (I); João Carlos de Andrade (I); Roy E. Bruns (I, II, *)
(I) Instituto de Química, Universidade Estadual de Campinas, CP 6154, 13083-970, Campinas - SP, Brazil
(II) Instituto de Química, Universidade de São Paulo, CP 26077, 05513-970 São Paulo, SP, Brazil
Two experiment reduction procedures for split-plot designs are investigated using a data set containing 160 experiments, consisting of 80 duplicate results for the optimization of a water-acetone-N,N-dimethylformamide mixture with HCl, o-dianisidine and H2O2 reagent system for the analytical determination of Cr(VI). Stabilities of the model coefficients and ANOVA mean squares are used as quality criteria to judge the effectiveness of the procedures. Only the procedure that avoids the possibility of eliminating entire replicates for any given set of process variable conditions seems to be feasible, since it does not result in loss of valuable modeling information. Its ANOVA mean square values remain stable for up to a 30% replicate reduction, whereas its model coefficients are relatively constant for even a 70% replicate reduction. Since complete split-plot designs involving both process and mixture variables require large numbers of experiments, the economy gained by performing incomplete split-plot designs makes their use more attractive.
Keywords: split-plot, optimization, ANOVA
Optimization problems in chemistry often involve both process and mixture variables. However, multivariate strategies applied in chemistry are normally restricted to either process [1,2] or mixture [3] variables using well-known experimental designs. As a consequence, possibly significant interaction effects involving both process and mixture variables cannot be detected, let alone measured. Including both process and mixture variables in an optimization procedure increases the size of the experimental program so that complete randomization in the execution of trial runs is generally not feasible. Furthermore, some variables may be easily adjusted from one level to another for each new experiment, whereas others cannot be included in a completely random experimental design while still maintaining a feasible optimization program; an example is a process variable such as temperature, for which the attainment of equilibrium conditions could delay experimentation. In split-plot procedures [4-7] such operational difficulties are minimized since subsets of experiments are set up. The subsets are performed in random order, as are all the experiments within a given subset. Complete randomization is restricted since the process (or mixture) variables can be maintained at constant values for each subset and only the mixture (or process) variables are randomly adjusted, which facilitates the operational procedure. However, a price is paid for this simplification. The ANOVA for a split-plot design is much more complicated than the standard ANOVA, since the former has two sources of error, the so-called main-plot and sub-plot errors, in comparison with only one error source for completely randomized designs. ANOVA tables are used to calculate standard errors for model coefficients in order to determine the significant model terms. Such error determinations are mathematically more complex for split-plot designs than for completely random designs.
Replication of experiments is the most reliable way of estimating errors. In split-plot experiments randomization is also restricted within replicates. In other words, one complete set of experiments is performed following the split-plot procedure, and its replicate is performed later.
Although split-plot experimentation is relatively common in agricultural experiments [8], its use in chemistry is very limited. An early article discusses mixed process variable-mixture variable designs [9], and more recently our group carried out a complete split-plot design [10] to determine simultaneously the optimum values of the proportions of water, acetone and N,N-dimethylformamide as a reaction medium (mixture variables) and the concentrations of HCl, o-dianisidine and H2O2 as reagents (process variables) for the analytical determination of Cr(VI), using the 450 nm absorption of the oxidized species of o-dianisidine as the system response.
The objective of this paper is to show how the split-plot experimental procedure can be simplified by reducing the number of experiments that must be performed. Two alternative approaches are investigated. Success of the reduction procedure is judged by the stabilities in the ANOVA mean square values as well as in the model coefficients as the number of experiments is reduced.
The experimental data used in the calculations are taken from Table 3 of reference 10. The split-plot design is shown in Figure 1. Ten mixture formulations were tested at each of eight different process variable combinations. The mixture formulations are represented as points within the small triangles, whereas the process variable conditions are represented by the vertices of the 2^3 factorial cube. Since all the experiments were performed in duplicate, a total of 10×8×2 = 160 experiments were performed. The ten mixture formulations within each triangle were performed in random order, and a random order was also used to perform the eight process variable combinations. This kind of restricted randomization was carried out within each replicate set.
Calculations for the complete split-plot design were carried out as in earlier work [10], using the SAS statistical package [11] and a MATLAB-based computational program developed in the authors' laboratory [12]. The incomplete split-plot designs were generated by randomly deleting data from the complete data set in order to simulate possible data set reduction procedures. Calculations for these designs were carried out using another MATLAB-based program currently under development in our laboratory.
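The restricted randomization described above can be sketched in a few lines of Python. This is only an illustration, not the authors' SAS or MATLAB code: the function name, the coded levels (-1, +1) and the integer mixture labels are assumptions introduced here, and only the counts (10 mixtures, 2^3 process conditions, 2 replicates) come from the text.

```python
import random
from itertools import product


def split_plot_run_order(n_mixtures=10, n_replicates=2, seed=0):
    """Return the runs as (replicate, process_condition, mixture) tuples.

    Randomization is restricted: within each replicate the eight process
    conditions (main plots) are shuffled, and within each process condition
    the mixtures (sub-plots) are shuffled.
    """
    rng = random.Random(seed)
    runs = []
    for rep in range(1, n_replicates + 1):
        # Vertices of the 2^3 factorial cube in coded units (assumed coding).
        conditions = list(product((-1, 1), repeat=3))
        rng.shuffle(conditions)
        for cond in conditions:
            mixtures = list(range(1, n_mixtures + 1))
            rng.shuffle(mixtures)
            runs.extend((rep, cond, mix) for mix in mixtures)
    return runs


runs = split_plot_run_order()
print(len(runs))  # 160 = 10 x 8 x 2
```

Each block of ten consecutive runs shares one process condition, which is what allows the hard-to-change variables to be held fixed while the mixtures are varied.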
Two data deletion schemes, represented in Figure 2, were employed. Figure 2a represents a simple split-plot design with two process variables and three mixture variables, for which only three mixture experiments are performed within each triangle. This figure shows the complete design. Figure 2b illustrates one method of data reduction. Experiments are randomly deleted, but under the condition that no point in the experimental mixture design is left without at least one measurement for all possible process variable combinations. This ensures that essential modeling information is not lost on data reduction. An alternative data deletion scheme, represented in Figure 2c, was also tested. Here there was little restriction on deleting data, and both replicate experiments for some mixture-process variable level combinations can be deleted. All mixture formulations are present in the design, although they may not be present for all process variable combinations. This is indicated by the blank circles in Figure 2c, which show that no experiments were carried out at those points. This deletion scheme proved inadequate for the calculations, since it resulted in a significant loss of modeling information. On the other hand, the scheme represented in Figure 2b, which contains only solid (2 experiments) or half-solid (1 experiment) circles, underwent no loss of modeling information. The detailed results are discussed below.
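A minimal sketch of the Figure 2b scheme, assuming the 8 × 10 layout of the full design, is given below. The function name and the dictionary representation are hypothetical; the only rule taken from the text is that a process condition-mixture combination may lose one of its two replicates but never both.

```python
import random
from itertools import product


def restricted_deletion(n_delete, n_conditions=8, n_mixtures=10, seed=0):
    """Delete n_delete runs while keeping at least one run in every cell.

    A cell is one (process condition, mixture) combination. The result maps
    each cell to the list of replicate numbers (1 and/or 2) still present.
    """
    rng = random.Random(seed)
    cells = list(product(range(n_conditions), range(n_mixtures)))
    if not 0 <= n_delete <= len(cells):
        raise ValueError("at most one run per cell can be deleted")
    kept = {cell: [1, 2] for cell in cells}
    # Pick distinct cells, so that no cell can lose both of its replicates.
    for cell in rng.sample(cells, n_delete):
        kept[cell].remove(rng.choice([1, 2]))
    return kept


kept = restricted_deletion(24)
print(sum(len(reps) for reps in kept.values()))  # 136 runs remain
print(min(len(reps) for reps in kept.values()))  # 1: no cell is empty
```

In the terms of Figure 2b, a cell with two remaining replicates is a solid circle and a cell with one is a half-solid circle. The blank circles of Figure 2c cannot occur under this rule.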
Results and Discussion
Two criteria were used to decide whether reducing the number of measurements degrades the quality of model determination. First, the behavior of the most important regression model coefficients as a function of the number of replicates was investigated. Ideally these coefficients should vary within their standard errors as the number of replicates is decreased. Second, the mean square values of the split-plot ANOVA were examined for different numbers of replicate measurements. It is also important that these values remain stable, since they are used to determine the standard errors in the model coefficients. In order to determine which coefficients should be included in the correct model, t-tests are applied to the model coefficients using these standard errors. Of course, one expects to derive statistically equivalent models if the number of replicate measurements is sufficient for accurate error estimation.
Alternative experiment reduction procedure
In the complete design there are 8 process conditions with 10 mixture compositions in duplicate, resulting in a total of 160 experiments. For the first incomplete design the reduction scheme represented by Figure 2c was applied to the complete experimental data set. Five possibilities were tested, with 3, 4, 5, 6 and 8 mixtures in duplicate for each process condition. First, 3 mixture compositions were randomly selected for each process condition and their duplicate results were included in the calculation, giving a total of 3×8×2 = 48 experiments. Each of the ten compositions appeared at least once to avoid the loss of too many regression degrees of freedom. This method was repeated for designs where 4, 5, 6 and 8 mixture compositions were randomly selected for each process condition, resulting in 64, 80, 96 and 128 experiments to be included in the calculations. The generated models were compared to the models obtained using the complete design.
In the complete design the mathematical model that presented the best results was the bi-linear-quadratic one. For this reason all the models determined using the incomplete designs were compared to it. Of the incomplete designs, the model obtained using eight mixture compositions in duplicate for each process condition presented model coefficients in best agreement with those obtained from the complete design. This required the execution of 128 experiments. Statistically significant parameters were determined by comparing model coefficients with their calculated errors. On comparing the parameters of the models in Table 1 it is evident that there are large variations in values. Figure 3 shows how the parameters of the models vary with the number of replicates. The only model in reasonable agreement with the one determined using the complete design is that of the eight-duplicate design.
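The selection step of this alternative scheme can be sketched as follows. The text does not say how the requirement that every composition appear at least once was enforced, so the simple redraw-until-covered loop used here is an assumption, as are the function name and the integer labels.

```python
import random


def unrestricted_selection(k, n_conditions=8, n_mixtures=10, seed=0):
    """Pick k mixtures (kept in duplicate) for each process condition.

    The draw is repeated until every mixture appears under at least one
    process condition. All other condition-mixture cells are left empty.
    """
    rng = random.Random(seed)
    all_mixtures = set(range(n_mixtures))
    while True:
        selection = [sorted(rng.sample(range(n_mixtures), k))
                     for _ in range(n_conditions)]
        if set().union(*selection) == all_mixtures:
            return selection


selection = unrestricted_selection(3)

# Number of experiments for each design size: k mixtures x 8 conditions x 2.
counts = {k: 2 * k * 8 for k in (3, 4, 5, 6, 8)}
print(counts)  # {3: 48, 4: 64, 5: 80, 6: 96, 8: 128}
```

With k = 3, seven of the ten mixtures are absent at any given process condition, which is the situation shown by the blank circles in Figure 2c.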
Preferred experiment reduction procedure
The experiment reduction scheme represented by Figure 2b, applied to the complete experimental data set, resulted in the split-plot ANOVA mean square values listed in Table 2 and the significant model coefficients given in Table 3. The mean square values are graphed in Figure 4 as a function of the number of duplicates, whereas the corresponding model coefficient graph is shown in Figure 5. All the mean square (MS) values remain relatively stable when seven to ten duplicates are used, as seen in Figure 4. Below six duplicate determinations the sub-plot mean square value falls from above 0.7 to a plateau around 0.55. The main-plot mean square shows a larger sensitivity to replicate reduction when the total number of duplicates is six or lower, whereas seven to ten duplicate designs produce very similar main-plot MS values. The replicate mean square is seen to remain close to the zero line of the graph in Figure 4 for six to ten duplicate experiments but undergoes an order of magnitude increase for design results with only four or five duplicates. Inspection of the values in Table 2 permits a more detailed investigation of the changes in the replicate, main-plot error, main by sub-plot interaction and sub-plot error mean squares. They are all close to the zero line in Figure 4. The main-plot, sub-plot and main × sub-plot interaction mean squares show similar behavior. They have relatively constant values (intervals of 0.22 to 0.24, 0.71 to 0.77 and 0.01 to 0.02, respectively) for experiments with seven to ten duplicates and different but relatively constant values (0.15 to 0.23, 0.54 to 0.56 and 0.004 to 0.006) for three to six duplicate experiments. The sub-plot error values are lower than 0.005 for seven to ten duplicate experiments and higher than 0.010 for three to six duplicate designs. The MS of the main-plot error undergoes the most drastic variation in its value on changing from six to five duplicates.
In general, the split-plot ANOVA MS values are relatively insensitive to change as long as seven to ten replicate experiments are performed. Larger variations occur when six or fewer duplicates are included in the calculation.
The model coefficients, however, are much less sensitive to replicate reduction, as can be seen by studying Table 3 and Figure 5. The most important coefficients of the truncated model, b1, b2, b13 and b11,
all have relatively constant values as indicated by their essentially horizontal behaviors in Figure 5. Considering designs with as few as three and as many as ten duplicates, b1 varies between 0.914 and 0.968; b2 between 0.167 and 0.173; b13 between 0.909 and 1.054; and b11 between +0.225 and +0.335. These variations have magnitudes close to the corresponding standard errors of the model coefficients reported in Ref. 10, namely ±0.04 (b1), ±0.01 (b2), ±0.09 (b13) and ±0.04 (b11).
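One way to make this comparison concrete is to express half of each coefficient's observed range (its largest deviation from the midpoint of the range) as a fraction of the corresponding standard error. The numbers below are those quoted in the text. The half-range criterion is an illustrative choice made here, not a test used by the authors.

```python
# Extreme values of each coefficient over the three- to ten-duplicate designs.
coef_range = {
    "b1": (0.914, 0.968),
    "b2": (0.167, 0.173),
    "b13": (0.909, 1.054),
    "b11": (0.225, 0.335),
}
# Standard errors quoted from Ref. 10.
std_err = {"b1": 0.04, "b2": 0.01, "b13": 0.09, "b11": 0.04}

ratios = {}
for name, (low, high) in coef_range.items():
    half_range = (high - low) / 2
    ratios[name] = half_range / std_err[name]
    print(f"{name}: half-range = {half_range:.4f}, "
          f"half-range / standard error = {ratios[name]:.2f}")
```

A ratio near or below one means the coefficient stays within roughly one standard error of the midpoint of its range as replicates are removed.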
A 30% reduction in the number of replicate experiments was possible for the water-acetone-N,N-dimethylformamide mixture with the HCl, o-dianisidine and H2O2 reagent system for the determination of Cr(VI) without serious deterioration in model coefficient or ANOVA mean square values. While this corresponds to a modest decrease of 15% in the total number of experiments needed to perform this split-plot design, it does show that these designs provide models that are quite robust to missing data that could result from the execution of large numbers of experiments. Other procedures for reducing the number of experiments in split-plot designs, such as the use of cumulative probability graphs, are also presently under investigation.
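The relation between the replicate reduction and the saving in total experiments can be checked with a short calculation. It assumes that a design with d duplicates keeps d of the ten mixtures in duplicate at each of the eight process conditions and runs the remaining mixtures once. This reading is consistent with the 30% and 15% figures above but is not spelled out in the text, and the function name is hypothetical.

```python
N_CONDITIONS = 8
N_MIXTURES = 10


def experiments_needed(n_duplicated):
    """Total runs when n_duplicated mixtures per condition are run twice."""
    singles = N_MIXTURES - n_duplicated
    return N_CONDITIONS * (2 * n_duplicated + singles)


full = experiments_needed(10)
for d in (10, 7, 3):
    total = experiments_needed(d)
    saving = 100 * (full - total) / full
    print(d, total, f"{saving:.0f}%")
# prints:
# 10 160 0%
# 7 136 15%
# 3 104 35%
```

Under this assumption the 70% replicate reduction, for which the model coefficients were still relatively constant, would correspond to a 35% saving in total experiments.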
The authors thank FAPESP for providing financial support. J.A.B. thanks FAPESP for a doctoral fellowship, and J.C.A. and R.E.B. thank CNPq for research fellowships.
1. Box, G. E. P.; Hunter, W. G.; Hunter, J. S.; Statistics for Experimenters, John Wiley: New York, 1978.
2. Barros Neto, B.; Scarminio, I. S.; Bruns, R. E.; Como Fazer Experimentos, Editora da Unicamp: Campinas, Brazil, 2001.
3. Cornell, J. A.; Experiments with Mixtures: Designs, Models, and the Analysis of Mixture Data, 2nd ed., Wiley: New York, 1990.
4. Cornell, J. A.; J. Qual. Technol. 1988, 20, 2.
5. Cornell, J. A.; J. Am. Stat. Assoc. 1971, 66, 42.
6. Montgomery, D. C.; Design and Analysis of Experiments, 3rd ed., Wiley: New York, 1991, p. 468.
7. Wooding, W. M.; J. Qual. Technol. 1973, 5, 16.
8. See, for example: Korwar, G. R.; Radder, G. D.; Agroforest. Syst. 1994, 25, 95; Scherer, E. E.; Rev. Bras. Cienc. Solo 1999, 22, 4; Mahungu, S. M.; Diaz, S.; J. Food Sci. 1999, 47, 279.
9. Duineveld, C. A. A.; Smilde, A. K.; Doornbos, D. A.; Anal. Chim. Acta 1993, 227, 455.
10. Reis, C.; Andrade, J. C.; Bruns, R. E.; Moran, R. C. C. P.; Anal. Chim. Acta 1998, 369, 269.
11. SAS Institute; SAS User's Guide, Ver. 8.02, SAS Institute Inc.: USA, 2001.
12. Bortoloti, J. A.; MSc. Dissertation, Universidade Estadual de Campinas, Brazil, 2000.
Received: March 31, 2003
Published on the web: March 24, 2004
FAPESP helped in meeting the publication costs of this article