TWO-STAGE INFERENCE IN EXPERIMENTAL DESIGN USING DEA: AN APPLICATION TO INTERCROPPING AND EVIDENCE FROM RANDOMIZATION THEORY

In this article we propose the use of Data Envelopment Analysis (DEA) measures of efficiency, under constant returns to scale and input equal to unity, in the analysis of multidimensional nonnegative responses in the design of experiments. The approach agrees with the standard Analysis of Variance (Covariance) for univariate responses and simplifies the statistical analysis in the multivariate case. The best treatments provided by the analysis optimize a combined output defined by shadow prices, which are the solutions of the DEA problem. The approach is particularly useful for the analysis of intercropping (crop mixtures) experiments. In this context we discuss two examples. To properly address the issue of correlation and non-normality of DEA measurements in different experimental plots we validate the results via Randomization Theory.


Introduction
In a designed experiment one measures a response, real valued or multidimensional, and carries out a regression analysis modeling the response as a function of qualitative effects (dummy variables) and possibly some quantitative variables (Montgomery, 2004;Hinkelman & Kempthorne, 2007a, 2007bRyan, 2007). These exogenous variables, under the control of the experimenter, are called contextual variables in the Data Envelopment Analysis (DEA) literature. Typical in this class of statistical procedures are the Analysis of Variance (only qualitative effects) and the Analysis of Covariance (qualitative and quantitative variables) models. Whenever the response is multidimensional the analysis is complex and the statistical findings are hard to interpret. For this reason in many applications one notices real efforts to reduce the dimensionality of the response vector to ease the statistical analysis.
Such is the case with intercropping experiments in agriculture, where a response, such as yield, is measured for each crop in a finite set of crops that are sown together in each experimental plot. The observation in each plot is therefore a vector of correlated yields since the crops compete for available resources. Typically, treatments applied to the experimental plots are to be compared on the basis of the yield vector. Indexes like total yield value, for example, transform the response vector to a nonnegative real variable, easing the statistical analysis relative to the use of multivariate analysis. The interest in intercropping is growing, as can be seen, for instance, in Blaser  Here we suggest a new approach to assess treatment differences in any designed experiment whose objective is to optimize response in some sense, where response is defined by a vector of yields. The proposal is to use DEA to generate a measure of efficiency to each experimental plot. This one-dimensional score can be analyzed by the classical statistical methods. DEA estimates of efficiency are flexible in the sense that they can be derived imposing or not a set of weights reflecting the importance of each component of the response vector (yield). If a designed experiment has a real valued nonnegative response, the DEA analysis as proposed here is equivalent to the usual statistical analysis carried out in Analysis of Variance (Covariance) models. In the multivariate case, besides simplifying the analysis, it endows the statistical conclusions with optimal properties in the sense that best treatments represent best yield practices relative to the complete set of treatments under analysis.
The exposition is divided into four sections. Section 2 defines the DEA efficiency measurements we deal with in the article. Section 3 shows the Analysis of Variance for two intercropping experiments using efficiency measurements (seen as scores or indexes) as responses to treatments. Section 4 is on validation of the statistical data analysis via Randomization Theory. Finally, Section 5 summarizes the main conclusions and the suggestions put forward in the article.

Data Envelopment Analysis
From an Economics point of view, DEA has for objective the assessment of efficiency of a set of production units, the so-called Decision-Making Units (DMUs). DEA uses linear programming problems to identify DMUs that represent the best production practices in the sense that they make the best use of the inputs available to produce outputs. These units are considered efficient and act as benchmarks to the other DMUs.
The efficiency of production of a DMU, in a typical DEA analysis, is defined by the ratio of a weighted average of the outputs by a weighted average of the inputs. These weights are shadow prices. They are distinct for outputs and inputs, and are determined by linear programming. For each DMU, the linear programming problem is set in such a way to maximize the efficiency ratio. Subjective or preconceived perceptions of the investigator about the relative importance of outputs and inputs may be included as restrictions in the linear programming problem.
There are two classical DEA models, the CCR (Charnes et al., 1978) and the BCC (Banker et al., 1984). The two models differ according to the convexity restriction imposed in the BCC DEA model. Two orientations are possible in any DEA model. One may look for maximal outputs given input levels, or one may look for minimal input utilization, given output levels.
In DEA modeling one has to define the DMUs, the production variables (inputs and outputs), the orientation and the model. In a designed experiment the DMUs are the experimental plots arranged in accordance with the layout of the experimental design. The nonnegative response vector of each experimental plot defines the output. Typically, treatments are applied to the experimental plots, like different management practices, and one is concerned with the assessment of treatment differences on the basis of the output vector. To measure the efficiency of each experimental plot we assume unitary inputs for all units. This idea is not strange in the DEA literature; see Lovell & Pastor (1999) Leta et al. (2005). The model we consider is CCR with output orientation. We note that there are no issues of scale here, since the BCC solution for unit inputs coincide with the CCR solution, as stated and proved by Lovell & Pastor (1999). The assumption of unitary inputs is adequate for a designed experiment. It puts all experimental plots on the same basis for comparisons, and real differences in the response vector are therefore due to error control variables and to the influence of contextual variables, like treatments and other quantitative variables affecting the experimental plots and under the control of the investigator. Error control is typically achieved by design choice. In this context, using DEA scores as the response variable, the effects of contextual variables are to be assessed in a second stage, using a linear model where the classical assumptions (Steel et al., 1996) of the Analysis of Variance (Covariance) are postulated for the DEA efficiency measurements.
We now define the general DEA CCR model. Assume that each DMU (experimental plot) k, 1... = k n, is a production unit that uses m inputs (non-negatives, not all zeros) ik x , 1...
The quantities v i and u j are the shadow prices (weights) of inputs and outputs, respectively.
In some instances the investigator may feel that some outputs should be regarded as more important than others. If such is the case, it is possible to define a DEA model incorporating these perceptions. As discussed in Thanassoulis et al. (2004), there are many approaches for incorporating value judgements in DEA. One of them is the Assurance Region of Type I, which introduces the relative ordering or values of the outputs, by adding the restrictions 1 + ≤ ≤ j j j j u u α β on the output shadow prices in the classic DEA models. In these restrictions, j α and j β are user-specified constants to reflect value judgements regarding the relative importance of outputs j and 1 + j , i.e., they are assigned by the decision-maker, here the investigator that is conducting the experiment. It is important to notice that the solutions for this problem (unitary input and multiple outputs) under constant returns and variable returns to scale still coincide.
We note here that is not necessary to assume an underlying frontier as a data generating process for the yield observations. We are concerned with the population of responses generated using empirical DEA for the experimental plots available for the experiment, classified according to the treatment arrangement laid out by the experimental design. Random allocation of treatment levels to experimental plots typically validate the statistical analysis resulting from the model induced by the design, which becomes the data generating model for the efficiency measurements. The main interest in the analysis is in the comparison of treatments, once these responses have been adjusted for the other quantitative covariates.
Responses may be defined as the DEA measurements themselves or by some monotonic transformation of these measurements, like ranks. However, it should be pointed out here that the design environment with yield responses does not violate typical production assumptions. It is not unreasonable to impose Assumptions A1, A4-A6 of Simar & Wilson (2007), as well as the separability condition A2 in the same article, since the covariates, in a typical design of experiments application, are measured without error and are under the control of the investigator. They are not correlated with the error term, which is the experimental error responsible for the variation among experimental units treated alike. They are unaffected by treatments and they are observed before the study (Kutner et al., 2005).
The dimension of the output vector relative to the size of the experiment is important in DEA. If the dimension is too large, one may have many efficient plots and this fact may obscure the study of treatments differences. The problem however does not exist for univariate yields and also for intercropping experiments, since the dimension of the output vector, in many applications, is two or three. The constant input helps to lessen this problem. It should be said here that a large dimension of the output vector would put hard problems also for a standard multivariate analysis of the design.
In Section 4 we continue with the discussion of the validity of the two-stage procedure involving DEA responses in a designed experiment.

Intercropping
Here we consider two experiments: a randomized complete block and a split-plot design over a factorial structure. It's important to consider the two cases since they define different levels of complexity and may lead to different results regarding the validity of statistical tests in the corresponding Analysis of Variance (ANOVA).

Intercropping of Maize and Bean
The first experiment we consider is the intercropping of maize and bean laid out as a randomized block design. The reference is Steel et al. (1996). The randomized block design is used when the experimental plots are grouped in such way that within each group the plots are homogeneous. Each group is called a block and the number of experimental plots within each block may be the same or not. Treatments are randomly assigned to the experimental plots within each block. Blocking is used to control experimental error. The primary interest in the analysis is on treatment differences. Suppose one has n experimental plots, grouped in b blocks, and v treatments. Each block has j k experimental plots and each treatment appears i r times. One must have If the experimental plot u has response u y corresponding (possibly) to u x , a vector of observations on controllable quantitative contextual variables or covariates, block j, and treatment i, one postulates ρ ρ ρ is the vector of block (parameter) effects, ′ β is the parameter vector associated with the covariates, and the u ε are iid random errors normally distributed with mean zero. Typically, the design matrix for the block design is singular but ′ β , and block and treatment contrasts, of the form  The DEA model should be run with 56 DMUs (14 treatments and 4 blocks), but there were missing data in two of them (treatment 7 -block 2, and treatment 9 -block 4). Thus, it was run with 54 DMUs, one single and constant input (equal to unit) and two outputs, bean yield (BY) and maize yield (MY). The analysis of variance of the experiment based on efficiency responses is shown in Table 2. The statistical analysis is carried out assuming the data generating process of the block design with DEA efficiency measurements as the observations on the dependent variable. Treatment contrasts of interest were also included in the analysis.
Efficiency seems to be the same for all variety combinations of maize and bean used in intercropping (rows (d) and (e), Table 2). Overall, there is no significant gain in efficiency of sole plots over intercropping (row (a), Table 2). Intercropped beans however are significantly more efficient than sole beans (row (b), Table 2). An interesting aspect of this intercropping experiment is that it shows a smaller risk for intercropped plots. Efficiency variation of intercropped plots is dominated by efficiency variation in sole plots. This remark is evident from Figure 1 where treatments 1-8 refer to intercropping and treatments 9-14 refer to sole crops (treatments 1-8 show less dispersion than treatments 9-14).
Efficiency responses belong to the interval (0.1], and treatment variances seem to be heterogeneous. In this context, one may perform an analysis of variance considering ranks as the response variable. The approach is non-parametric. We notice that this analysis leads to similar conclusions.

Intercropping of Groundnut and Maize
The second experiment we consider is the intercropping of groundnut and maize in a randomized block design laid out as a split plot. The underlying principle of the split plot is this: the levels of a factor A (treatment A) are applied to the experimental plots arranged as in the randomized block design. Each experimental plot is then divided into subplots to which the levels of a second factor B (treatment B) are applied. In this context each experimental plot becomes a block for factor B. The randomization is carried out in two stages. First the levels of A are randomized over the (whole) experimental plots. Then the levels of B are randomized over the subplots. The reference is Steel et al. (1996).
The data generating process for the split plot assuming the randomized block design is as follows. Let µ ρ α δ αδ represent an overall mean, block effects, factor A effects, factor B effects and the interaction A*B (joint) effects. As before, the design matrix is singular but all parametric functions of interest are estimable. Covariates may also be superimposed in the context of the design layout. Table 3 shows production and efficiency data for an intercropping experiment involving maize and groundnuts. Response for each crop is measured in tons/acre. The experiment was carried out in the substation of Morwa, in Samuru, Nigeria. It is described in Carvalho (1988). The whole experimental plot treatment levels are defined by factor A with 5 levels, which are repeated in 4 blocks. The levels of A are defined by the plant densities (maize: groundnut) 1:3, 1:2, 2:3, 4:9, and 1:6. The subplot treatment, factor B, comprises two levels of nitrogen treated as qualitative. We denote by 1-5 the plant densities and by 1-2 the nitrogen levels. Figure 2 shows clearly an increase in efficiency due to nitrogen level 2 independently of the plant density (treatments Tx2 have higher response that treatments Tx1). It is also evident the dominance of plant density 4. A non-parametric analysis of variance is shown in Table 4. The response considered for this analysis is rank of efficiency. The ranks are assumed to follow the data generating process of the split plot with the randomized block design. The interaction A*B is significant.    Table 5 provides information on the average efficiency rank per treatment (means over efficiency ranks for the intercropping of maize and groundnut). Comparison of two A means, at the same or different levels of B, involve both main effect A and interaction A*B. The two error component are involved, the whole plot error and the subplot error. The variance of the difference is given by An unbiased estimate of ( )  We see that the visual impression of Figure 2 is confirmed. Responses associated with nitrogen level 2 are dominant. Treatment combination 42 has the highest efficiency. However it does not differ significantly (5%) from T32 or T52.
To compare the two levels of factor B, averaging over all densities, only the variance 2 ε σ is involved and a standard t-test may be carried out (Milliken & Johnson, 1984). Based on the ANOVA of the design (Table 4), the observed difference should be at least 3.67 to be declared significant at the 5% level. We see that level 2 significantly dominates level 1. We emphasize at this point that a similar analysis using non-transformed data leads to the same conclusions.
If the investigator feels that maize is more important than groundnut, this perception can be incorporated in the analysis using DEA models with Assurance Region weights restrictions approach ( 1 ≥ maize groundnut u u ), as discussed in Section 2. It leads to the data in Table 6 and the analysis shown in Table 7 when the responses are ranks. The interaction A*B is no longer significant at the 5% level. At this level of significance the least significant difference for A comparisons is 7.49. The A means are 12.13, 20.62, 26.00, 36.63 and 10.93, respectively. The best density is level 4, which is superior significantly to densities 1, 2, and 5 and marginally to density 3. The least significant difference for comparison of nitrogen levels is 3.05 and the observed mean difference is 10.8. Level 2 is superior as before.

Validation
The assumptions behind the statistical analysis carried in both examples of the previous section are strong and can be questioned. Typically, errors are uncorrelated and normally distributed. DEA efficiency responses cannot be normally distributed. The use of DEA efficiency measures as a response variable in regression models has been questioned recently in the econometric literature. See Simar & Wilson (2007). Besides normality, another issue in this discussion is that the analysis is carried out in two stages and in the first stage a correlation between experimental units is induced by the computation (and the definition) of the response variable and the potential association of covariates with the error term. In this context, one loses the distributional properties of the least squares estimators. For designed experiments however, the analysis of variance is known to be robust against many departures from the linear models defining the data generating process. Including non-normality and correlation among the experimental plots responses. On the other hand, covariates are not considered random in a designed experiment.
The random allocation of treatments to the experimental plots is the key feature that validates the statistical analysis besides allowing a method to verify if the significance levels induced by the normal theory are correct or not. Fisher (1925) extensively promoted this experimental process. A good discussion on this theory may also be seen in Hinkelman & Kempthorne (2007a, 2007b and Kutner et al. (2005).
Based on the robustness of the statistical analysis of designed experiments, we should stress once again that the issues of correlation and non-normality are of secondary importance in many applications. For example, with unitary inputs and a single response, the measure of efficiency proposed here amounts simply to divide the values of the response by its sample maximum. Since t and F statistics are invariant by scale transformations, the statistical results are unaffected by the two-stage process. The situation with multiple outputs is more complex. Our experience is that the correlation is not sufficiently strong to invalidate the analysis. In a previous study (Souza et al., 2007), we did not find that the correlation was significant.
A way to overcome the discussion on whether or not the classical analysis of variance is valid is to derive the distribution, induced by the randomization of treatments to the experimental plots, of the test statistics of concern, and compare them with the distribution generated under normal theory, i.e., under the model assumed to have generated the data.
For a complete randomized block design with r blocks and v treatments there are ( ) ! r v possible assignments of treatments to experimental plots. Consider the F statistics used to test equality of treatment effects. Each treatment assignment generates a value for F. The collection of these F values defines the distribution of F under randomization. Typically the number of possible treatment allocations is large and one works with a random sample from this population. Here we use 10,000 random allocations, which will generate 10,000 F values. We note that in these simulations the DEA responses are fixed in each experimental plot. Only the treatment allocations change.
For the split plot arranged in r blocks with a levels of factor A applied to main plots and b levels of factor B applied to the subplots, there will be ( ) ! ! r a b possible treatment allocations to the experimental plots. The statistics of main concern here are the F test statistics for the effects A, B and the interaction A*B. Here we also use 10,000 random allocations.
For a more complete discussion in randomization test see, for instance, Edgington & Onghena (2007).

Intercropping of Maize and Bean
We used SAS v9.1.3 to program a Macro to generate 10,000 values of the F statistic used for testing equality of treatment effects under normal theory. The SAS procedures used were PROC PLAN and PROC GLM. Table 8 shows the quantiles of the two distributions involved.
The quantiles of the two distributions are close. The analyses with normal theory and with randomization are coincident. The empirical p-value does not point to any significant discrepancy.

Intercropping of Groundnut and Maize
Here, also, a macro SAS was developed to generate values of the three F statistics associated with the effects A, B, and A*B. Table 9 shows the quantiles of the distributions of these test statistics under normality and under randomization.
Although the approximations for the split plot are not as good as for the previous randomized block design, the distributions reasonably agree. Empirical p values are 0.0254, < 0.0001 and 0.0489, for A, B and A*B, respectively. They are practically the same as those reported in the ANOVA analysis reported in Table 4.

Conclusions
We propose the use of DEA output oriented efficiency measurements, under constant returns to scale, assuming a unitary input, as the response of each experimental plot in a designed experiment whenever the output vector is defined by non-negative yields. The technique consists in applying the normal theory or non-parametric (rank) methods to the efficiency measurements. This proposed approach, with the validation test, has never been used before in intercropping experiments.