A NON-ARCHIMEDEAN DEA MODEL TO ASSESS GROUP COMPARISONS

We consider the use of the non-Archimedean infinitesimal epsilon in DEA-CCR models. The application of interest is the performance measurement of the research centers of the Brazilian Agricultural Research Corporation. We characterize an assurance region for the non-Archimedean element and suggest a value for it. Types of DMUs are compared using fractional regression models and quasi-maximum likelihood inference. We conclude that the research centers aimed at studying specific agricultural products are dominant. The classic DEA-CCR performance measures and the solution provided by the non-Archimedean model have a Spearman correlation over 90%.


INTRODUCTION
Since 1996, the Brazilian Agricultural Research Corporation (Embrapa) has monitored the production of its research centers with the use of an evaluation system based on a single performance output and three inputs. The inputs of the process are capital, operational expenses and labor. The performance indicator (output) is a weighted average of 28 production attributes, classified into four production categories: (a) technical-scientific production; (b) production of technical publications; (c) transfer of technologies and promotion of image; (d) development of technologies, products and processes. An efficiency measure is computed using Data Envelopment Analysis (DEA). See Souza et al. (1999) and Avila et al. (2014) for more details on Embrapa's performance system. The system of weights used to combine individual output performance indicators into a single variable is complex. It uses the Law of the Categorical Judgments of Thurstone (Torgerson, 1958; Souza, 2002) and it is dependent on a normalization system that makes the individual performance indicators scale free.
In the context of the company's new strategic guidelines for RD&I management and the proposition of a revised performance evaluation system, Souza & Gomes (2015a) recently suggested the use of a three dimensional output, rearranging the attributes in three categories: (a) technical-scientific production; (b) production of technical publications; (c) other activities. The system of weights used in these groupings is objective and based on Factor Analysis. In this context, the weighting system is data oriented and does not involve management pre-judgments regarding the variables' importance. This is a major technical improvement compared to the previous approach. All the performance classification is based on the ranks of the performance variables, measured on a per capita basis, as explained in Section 2. It is important to emphasize that pure ratios are not used here as production variables, although the use of ratios in DEA seems to be a valid approach according to the recent literature (Olsen et al., 2015). The approach we propose motivates the use of a DEA-CCR model with three inputs and three outputs. We point out that this new approach has much appeal, basically for two reasons. Firstly, the company no longer uses the older method of performance evaluation, which is described in detail in Souza et al. (1999) and in Avila et al. (2014). Secondly, a weighting system common to all units is easier to understand at the management level and eliminates operational biases. Our objective here is to present an alternative approach robust to the determination of weights and, at the same time, useful to obtain a potential measure of goal achievement, as described in Souza & Gomes (2015a), allowing a convenient way to monitor production policies. In this context it is critical to have non-zero weights for all production variables in the DEA solution. A full discussion of the performance evaluation system and of the DEA score used by Embrapa through the period 1996-2009 is presented in Avila et al. (2014).
*Corresponding author. Embrapa, Secretaria de Gestão e Desenvolvimento Institucional - SGI, PqEB Av. W3 Norte final, 70770-901 Brasília, DF, Brasil. E-mails: geraldo.souza@embrapa.br; eliane.gomes@embrapa.br
Typically in the standard DEA models, efficient units have slacks and therefore they do not satisfy the concept of efficiency in the sense of Pareto-Koopmans (Cooper et al., 2007). The existence of outputs or inputs with null weights in the multiplier solution of the DEA-CCR model suggests the use of a non-Archimedean constant in the evaluation process. In many applications of DEA small values have been used for this constant ($10^{-6}$ and smaller). See Charnes et al. (1996). The arbitrary choice of this constant is not advisable, because it can lead to erroneous solutions (numerical problems), as emphasized in Ali & Seiford (1993a, 1993b) and Cooper et al. (2007). Mehrabian et al. (2000) present a method for the calculation of an assurance region for the non-Archimedean constant $\varepsilon$. Taking into account the assurance region of Mehrabian et al. (2000), which is a closed and bounded interval, and the discussion in Cooper et al. (2007), we use the upper bound of the assurance region as our choice for the non-Archimedean constant. As a referee pointed out, the use of non-Archimedean constants may lead to inadequacies in the sense that the Pareto efficient target in the solution does not correspond to the target used to measure efficiency. In our application we think that such inadequacies are compensated by the presence of non-zero weights for inputs and outputs, since zero weights are not acceptable to company managers. From a practical point of view we may not eliminate variables like scientific production or personnel costs, for example, from an efficiency measurement solution.
The statistical inference in models where a DEA response is a dependent variable as a function of a set of covariates presents technical difficulties in two contexts. Firstly, the DEA calculations induce correlations among the DMUs. Of a more complex nature is the potential association of contextual variables with the error term. Simar & Wilson (2007, 2011) [...] (1996) approach for two stage regressions. This approach is of our concern here, since the contextual variables are DMU types, unrelated to the performance score and therefore non-endogenous. Gomes et al. (2008) give indication that the correlation induced by the design is not statistically significant for analysis of variance inferences when DEA responses are subjected only to treatment effects, validating, in this context, the fractional regressions of Ramalho et al. (2010).
Our discussion proceeds as follows. In Section 2 we describe the performance model being considered by Embrapa. In Section 3 we present the approach used to derive an assurance region and a value for the non-Archimedean constant $\varepsilon$. Section 4 describes the fractional regression models we used to assess type differences in performance. Section 5 is on statistical results. Finally, in Section 6 we summarize our findings and present our conclusions.

EMBRAPA'S PERFORMANCE MODEL
The performance model now under discussion in Embrapa is based on an input oriented DEA-CCR measure, with three inputs and three outputs, computed for each of the 37 research centers. Actually the company comprises 42 research units classified into three types of research centers: product, eco-regional and thematic. Five of the 42 units were recently created (2010-2012) and were eliminated from the analysis due to lack of data. The centers of the type classification 'product' endeavor to provide feasible solutions in research, development and innovation for specific agricultural products (e.g., soybean, corn and sorghum, cotton, cattle etc.). The eco-regional and thematic centers are concerned, respectively, with research envisaging sustainability of the agriculture in the different Brazilian eco-regions (e.g., swamplands, savannah etc.) and in basic themes of interest (e.g., satellite monitoring, soil types, informatics and instrumentation for agriculture etc.).
The input components are labor costs, capital costs and operational costs. The output components are technical-scientific performance (TSP), production of technical publications (PTP), and other production activities (OPA). The category TSP is based on publications of scientific articles in indexed journals, book chapters, articles and abstracts in congress proceedings and thesis supervision. The component PTP is based on publications classified as technical circulars, technical communications, research bulletins and other technical documents. The category OPA is defined mainly by the variables: field days, demonstrative and observation units, scientific methodology and environmental monitoring. All variables are normalized by the number of employees in each DMU (per capita basis) and, later, transformed into ranks. The normalization reduces variability and facilitates comparisons, reducing scale differences. The transformation into ranks eliminates the influence of atypical observations, makes the responses scale free, and lends nonparametric properties to the normality-based statistical methods used in the determination of weights for the attributes in each performance dimension. The method used is maximum likelihood Factor Analysis. See Souza & Gomes (2015a) for further details about the construction of TSP, PTP and OPA. Table 1 shows the values of the six performance attributes for the year 2009.
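The per capita normalization followed by the rank transformation can be sketched as below. This is only an illustration: the function names are hypothetical, the numbers are made up, and the use of average ranks for ties is an assumption, since the text does not state how ties are handled.

```python
def average_ranks(values):
    """Ranks 1..n in increasing order; tied values share their average rank
    (tie rule is an assumption, not stated in the article)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the block of observations tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def per_capita_ranks(totals, employees):
    """Divide each DMU total by its number of employees, then rank."""
    per_capita = [t / e for t, e in zip(totals, employees)]
    return average_ranks(per_capita)


# Hypothetical data: three DMUs with totals of one production variable
example = per_capita_ranks([100, 30, 60], [50, 10, 30])
```

In this toy case the per capita values are 2.0, 3.0 and 2.0, so the first and third DMUs share the average of ranks 1 and 2.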

General concepts
Given a set of production units (Decision Making Units - DMUs), with known values for inputs and outputs, the DEA approach aims at calculating an efficiency measure for each DMU. In general, the efficiency measure is the ratio between two weighted averages of outputs and inputs, respectively. The weights are derived by means of mathematical programming and maximize each DMU's performance.
The two most popular DEA models are the CCR and the BCC (Cooper et al., 2007), which differ in the scale assumptions for the efficient frontier: constant returns to scale in the CCR case and variable returns to scale in the BCC case. In the search for the efficient frontier there are two possible radial orientations: input and output orientations. The input orientation is a cost minimization view and the output orientation is a revenue maximization view. The two orientations are equivalent under the constant returns hypothesis, in the sense that they induce the same efficiency measure. The primal (multipliers formulation) DEA-CCR is defined as (1) and the dual (envelope formulation) as (2). Here the production vector of DMU $k$ is $(x_k, y_k)$, where $x_k$ denotes an $r$-dimensional input and $y_k$ an $s$-dimensional output. The DMU under evaluation has observed production vector $(x_o, y_o)$. The primal for the DEA-BCC is obtained by adding a common scale factor to the objective function and to the inequality restriction in (1). For the envelope DEA-BCC formulation, one must add the restriction $\sum_{k=1}^{n} \lambda_k = 1$ to (2).
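For reference, the displays of the two programs are not reproduced in this text. A sketch of the standard input oriented DEA-CCR pair, written in the notation above and following the usual textbook presentation (Cooper et al., 2007), is given below. The symbols $u$ (output weights), $v$ (input weights), $\theta$ and $\lambda_k$ are the conventional ones and are assumed here rather than taken from the original displays.

```latex
% Multipliers formulation (primal), DMU o under evaluation
\max_{u, v} \; u' y_o
\quad \text{s.t.} \quad
v' x_o = 1, \qquad
u' y_k - v' x_k \le 0 \;\; (k = 1, \dots, n), \qquad
u \ge 0, \; v \ge 0.

% Envelope formulation (dual)
\min_{\theta, \lambda} \; \theta
\quad \text{s.t.} \quad
\sum_{k=1}^{n} \lambda_k x_k \le \theta x_o, \qquad
\sum_{k=1}^{n} \lambda_k y_k \ge y_o, \qquad
\lambda_k \ge 0 \;\; (k = 1, \dots, n).
```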

Non-Archimedean DEA Models and Assurance Region for ε (non-Archimedean element)
The classic DEA models may attribute unit efficiency to DMUs that are not efficient in the sense of Pareto-Koopmans (weak efficiency due to the presence of non-null slacks). See Cooper et al. (2007). The necessary and sufficient condition for a DMU to be Pareto-Koopmans efficient is a unit efficiency score with all input and output slacks equal to zero. One way to impose Pareto-Koopmans efficient solutions is to add the restrictions $u_j, v_i \ge \varepsilon > 0$, $\forall j, i$, on the weights, where $\varepsilon$ denotes the non-Archimedean positive constant. Thus, the primal model is (3) and the corresponding dual is (4). General DEA models with restrictions on the weights may be seen in Angulo Meza & Lins (2002), Thanassoulis et al. (2004), Portela & Thanassoulis (2006), for instance.
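A minimal sketch of the multipliers model with the lower bound $\varepsilon$ on every weight is shown below, assuming SciPy's `linprog` as the solver (the article itself reports using SIAD and a SAS macro). The function name `ccr_multiplier` and the toy data are hypothetical; setting `eps=0` gives the classic DEA-CCR score.

```python
import numpy as np
from scipy.optimize import linprog


def ccr_multiplier(X, Y, o, eps=0.0):
    """Input oriented CCR multipliers LP for DMU `o`.

    X: (n, r) inputs, Y: (n, s) outputs, eps: lower bound on all weights.
    Returns (score, u, v). Raises if the LP has no solution for this eps.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n, r = X.shape
    s = Y.shape[1]
    # Decision vector is (u_1..u_s, v_1..v_r); linprog minimizes, so negate u'y_o
    c = np.concatenate([-Y[o], np.zeros(r)])
    A_ub = np.hstack([Y, -X])                       # u'y_k - v'x_k <= 0 for all k
    b_ub = np.zeros(n)
    A_eq = np.concatenate([np.zeros(s), X[o]])[None, :]  # v'x_o = 1
    b_eq = [1.0]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(eps, None)] * (s + r), method="highs")
    if not res.success:
        raise ValueError("LP not solved (eps may be too large): " + res.message)
    return -res.fun, res.x[:s], res.x[s:]


# Toy data (hypothetical): two inputs, one output. DMU 2 uses the same first
# input as DMU 0 but more of the second, so it is only weakly efficient.
X = [[1.0, 2.0], [2.0, 1.0], [1.0, 3.0]]
Y = [[1.0], [1.0], [1.0]]
classic_score, _, _ = ccr_multiplier(X, Y, o=2, eps=0.0)
bounded_score, u_w, v_w = ccr_multiplier(X, Y, o=2, eps=0.1)
```

In this toy case the classic score of the weakly efficient unit is one, while the bounded model lowers it and returns strictly positive weights. The value `eps=0.1` is chosen only to make the effect visible in the example.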
In general, there is no solution to the problems (3) or (4) without imposing restrictions on the values of the non-Archimedean constant. As discussed in Ali & Seiford (1993b) and in Cooper et al. (2007), it is quite tempting to use a small numerical value for $\varepsilon > 0$, such as $10^{-5}$ or $10^{-6}$, to obtain a solution close to the standard DEA measure. However, the use of very small quantities may lead to numerical problems. Ali (1990) presents a sensitivity analysis using different values of $\varepsilon > 0$ and shows the numerical implications in the efficiency calculations. Ali & Seiford (1993a) show that an assurance interval for $\varepsilon$ is obtained by imposing the bound $\varepsilon < 1/\min_{k=1,\dots,n;\; i=1,\dots,r} x_{ik}$. Mehrabian et al. (2000) show that the limit proposed by Ali & Seiford (1993a) is not valid in general and propose a new assurance region.
In this article we opted for using the assurance region proposed by Mehrabian et al. (2000). Our motivation was to choose $\varepsilon$ as the largest possible constant to guarantee feasibility, keeping away from numerical problems as well. The assurance region is $(0, \varepsilon^*]$, where $\varepsilon^* = \min\{\varepsilon^*_1, \dots, \varepsilon^*_n\}$, $n$ being the number of DMUs. In this expression $\varepsilon^*_o$ ($o = 1, \dots, n$) is the solution of (5). The corresponding dual is (6).
The fractional regression model postulates that the expected value of the performance measure is a monotonic function of the linear construct $\mu = x\theta$. To estimate $\theta$ from the observations $(x_i, y_i)$, $i = 1, \dots, n$, one seeks $\theta$ maximizing the quasi-likelihood function (7).
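The display of the quasi-likelihood is not reproduced in this text. As a hedged reference, the Bernoulli quasi-log-likelihood commonly associated with the fractional regression of Papke & Wooldridge (1996) has the form below, where $G(\cdot)$ denotes the mean function, so that $E(y_i \mid x_i) = G(x_i\theta)$. That the missing display coincides with this form is an assumption.

```latex
\ell(\theta) \;=\; \sum_{i=1}^{n}
\Big\{ y_i \log G(x_i \theta) + (1 - y_i) \log\big[1 - G(x_i \theta)\big] \Big\}
```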
It can be shown that if the function of the mean is specified correctly, then the quasi-maximum likelihood estimator $\hat\theta$ is consistent and asymptotically normal with covariance matrix $V$, where the matrix $V$ is estimated by (8). This is the result that allows valid statistical inference involving the parameter $\theta$. The hypothesis of non-correlated observations is necessary but not likely to be of importance in our application, as emphasized in Gomes et al. (2008). The alternative to this approach is the bootstrap.
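The estimation step can be sketched as follows, assuming a logistic mean function, Newton iterations on the Bernoulli quasi-log-likelihood, and the usual robust (sandwich) covariance estimate $A^{-1} B A^{-1}$ without finite-sample correction. Whether this matches the estimator (8) of the article term by term is an assumption; the function name and the data are hypothetical.

```python
import numpy as np


def fractional_logit(X, y, max_iter=50, tol=1e-10):
    """Quasi-maximum likelihood fit of E(y|x) = 1 / (1 + exp(-x theta)).

    Returns (theta_hat, V_hat), with V_hat the sandwich covariance estimate.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    theta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        g = 1.0 / (1.0 + np.exp(-X @ theta))
        score = X.T @ (y - g)                       # gradient of the quasi-log-likelihood
        A = (X * (g * (1.0 - g))[:, None]).T @ X    # minus the Hessian
        step = np.linalg.solve(A, score)
        theta = theta + step
        if np.max(np.abs(step)) < tol:
            break
    g = 1.0 / (1.0 + np.exp(-X @ theta))
    A = (X * (g * (1.0 - g))[:, None]).T @ X
    B = (X * ((y - g) ** 2)[:, None]).T @ X         # outer product of the scores
    A_inv = np.linalg.inv(A)
    return theta, A_inv @ B @ A_inv


# Hypothetical scores for two groups; second column is a group indicator
X_demo = [[1, 0], [1, 0], [1, 1], [1, 1]]
y_demo = [0.2, 0.4, 0.6, 0.8]
theta_hat, V_hat = fractional_logit(X_demo, y_demo)
fitted_ref = 1.0 / (1.0 + np.exp(-theta_hat[0]))
fitted_alt = 1.0 / (1.0 + np.exp(-(theta_hat[0] + theta_hat[1])))
```

With only group indicators as regressors the model is saturated, so the fitted means reproduce the group averages of the response; the inference then rests entirely on the covariance estimate.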

STATISTICAL RESULTS
Table 2 shows the standard DEA-CCR scores with the corresponding multipliers (weights).We used the SIAD software (Angulo Meza et al., 2005).
Table 3 shows the non-Archimedean DEA-CCR scores (Perform), as well as the values of the multipliers.The results were obtained via SAS programming, adapting the macro ORDEA of Emrouznejad & Ho (2012).
The performance measurements (Perform) have positive rank correlation of 0.929 with the standard DEA-CCR scores. This shows that inadequacies induced by the non-Archimedean solution are not strong enough to invalidate the analysis. The maximum value of the non-Archimedean constant that can be used is 0.002652, leading to the assurance region interval $(0;\ 2.652 \times 10^{-3}]$ (column Epsil in Table 3) and to the measurements Perform. For the non-Archimedean solution the relative weights of labor costs, operational costs and capital are 19.4%, 24.6% and 56.0%, respectively. These are computed as follows. In Table 2 we computed averages for each input column and measured the relative participation of each column mean. Labor is the relatively cheapest component for the system. In relation to the output components the figures are 66.9%, 20.3% and 12.8% for technical-scientific production, production of technical publications and other activities, respectively. This dictates the direction to improve production efficiency in the company. The shadow price of technical-scientific production dominates the performance score.
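The relative participation of each column mean can be sketched as below. The function name and the multiplier values are hypothetical and serve only to illustrate the arithmetic; the actual multipliers are those reported in the tables.

```python
def relative_weights(rows):
    """Share of each column mean in the sum of the column means.

    rows: one list of multipliers per DMU, columns in a fixed order.
    """
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    total = sum(means)
    return [m / total for m in means]


# Hypothetical input multipliers for two DMUs (three input columns)
shares = relative_weights([[0.1, 0.2, 0.7], [0.3, 0.2, 0.5]])
```

Here the column means are 0.2, 0.2 and 0.6, so the third component carries 60% of the total weight.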
Ceteris paribus, DMUs with good technical-scientific production and low values of capital expenses will show good performance. Notice that there are no null weights in this solution.
The distribution of Perform is shown in Figure 1, which displays a non-parametric estimate of the unknown probability density function. The graph suggests overlapping populations, with a dominating group.
Figure 2 shows box plots corresponding to the distribution of Perform by type of DMU. The type distributions differ, and DMUs classified as type product have superior performance. The average performances are 0.610, 0.421 and 0.260 for product, eco-regional and thematic research centers, respectively. Outliers were not observed. The formal statistical analysis with quasi-maximum likelihood estimation confirms the impression conveyed by Figures 1 and 2. The model was fitted with the logistic distribution function, with the mean value of the response variable dependent on the linear construct $\mu_i = b_0 + b_1 d_{1i} + b_2 d_{2i}$. In this expression, the variables $d_{1i}$ and $d_{2i}$ are indicators of the types product and thematic, respectively. The $b_j$ are unknown parameters. The vector of estimates $\hat{b}$ of $b = (b_0, b_1, b_2)$ is assessed using the statistic $l = \hat{b}\,\hat{V}^{-1}\hat{b}'$, distributed as chi-square with three degrees of freedom under the hypothesis $b = 0$. We have $l = 8.234$, with p-value of 0.041. The pairwise differences between the types thematic and product, product and eco-regional, and thematic and eco-regional are assessed using estimates of the quantities $b_1 - b_2$, $b_1$, $b_2$, respectively. The test statistics are $z_j = u_j \hat{b}' / \sqrt{u_j \hat{V} u_j'}$, $j = 1, 2, 3$, where $u_1 = (0, 1, -1)$, $u_2 = (0, 1, 0)$ and $u_3 = (0, 0, 1)$, respectively. Under the null hypotheses of no differences they are distributed as the standard normal. We have $z_1 = 2.616$, $z_2 = 0.264$, $z_3 = -1.782$, with p-values 0.009, 0.792 and 0.075, respectively. The statistics indicate the dominance of the type product over thematic. The difference between types thematic and eco-regional is only marginally significant.
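The test statistics above can be sketched as follows. The estimates `b_demo` and the covariance matrix `V_demo` are hypothetical placeholders, since the fitted values are not reproduced in this text; only the reported statistics (8.234, 2.616, 0.264, -1.782) come from the article. The chi-square tail probability uses the closed form valid for three degrees of freedom.

```python
import math

import numpy as np


def normal_two_sided_p(z):
    """Two-sided p-value of a standard normal statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def chi2_sf_3df(x):
    """Upper tail probability of a chi-square with three degrees of freedom."""
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))


def wald_joint(b, V):
    """Quadratic form b V^{-1} b' for the joint hypothesis b = 0."""
    b = np.asarray(b, dtype=float)
    return float(b @ np.linalg.solve(np.asarray(V, dtype=float), b))


def wald_contrast(b, V, u):
    """Statistic u b' / sqrt(u V u') for the linear contrast u."""
    b, V, u = (np.asarray(a, dtype=float) for a in (b, V, u))
    return float(u @ b / math.sqrt(u @ V @ u))


# Hypothetical estimates and covariance, for illustration only
b_demo = [-0.5, 1.0, -0.4]
V_demo = np.diag([0.04, 0.09, 0.16])
z_demo = wald_contrast(b_demo, V_demo, (0, 1, -1))
l_demo = wald_joint(b_demo, V_demo)

# p-values implied by the statistics reported in the text
p_joint = chi2_sf_3df(8.234)
p_pairs = [normal_two_sided_p(z) for z in (2.616, 0.264, -1.782)]
```

The contrast vectors play the role of $u_1$, $u_2$, $u_3$ in the text; replacing the placeholders by the fitted $\hat{b}$ and $\hat{V}$ would reproduce the reported statistics.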

SUMMARY AND FINAL CONSIDERATIONS
In this article we used a DEA-CCR model with a non-Archimedean constant $\varepsilon > 0$ to compute performance measurements for 37 research centers of the Brazilian Agricultural Research Corporation. The model is based on a three dimensional output vector, constructed from the observed ranks of several production variables grouped into three performance categories: technical-scientific publications, technical publications, and other production activities. The input vector is defined by ranks on labor expenses, operational costs, and capital expenses. All production variables were previously normalized by the DMU labor quantity. An effort was made to use a weighting system not involving management pre-judgments. This was achieved via Factor Analysis.
The motivation for the non-Archimedean approach was dictated by the interest in obtaining a weight distribution indicating usage of all input and output components. In this context, efficient units are Pareto-Koopmans efficient. The non-Archimedean $\varepsilon$ was chosen as the upper limit of the assurance region proposed in Mehrabian et al. (2000).
The non-Archimedean performance measurements induced a rank correlation of 93% with the standard DEA-CCR measurements, indicating small differences in classification between the two solutions. The upper bound for $\varepsilon$ is 0.002652. The relative weights of personnel expenses, operational costs and capital expenses are, respectively, 19.4%, 24.6% and 56.0%. Labor is the cheapest component for the system. In regard to the output components, the relative weights are 66.9%, 20.3% and 12.8%, respectively, indicating a strong dominance of publications of any type (technical-scientific publications and technical publications: 87.2%). The corresponding values for the standard DEA-CCR models are 76.4%, 15.6%, and 7.8% for outputs and 36.6%, 22.4%, and 47.4% for inputs. The order of importance is the same for outputs in the two models. In average terms, if one wants to perform better it seems that a greater impact on efficiency would be achieved by increasing scientific production and reducing operational and capital expenses. The non-Archimedean solution amplifies this need and produces smoother differences between publications and other activities. The output shadow prices may be used as weights in a goal evaluation system where actual production is compared with target production.
Mehrabian et al. (2000) show that the limit proposed by Ali & Seiford (1993a) is not valid in general and propose a new assurance region. This work is seminal in the area and motivates the approaches of Jahanshahloo & Khodabakhshi (2004), Alirezaee (2005) and MirHassani & Alirezaee (2005).
Our aim in this article is the comparison of treatments (groups) in an analysis of variance design with DEA responses. Ramalho et al. (2010) discuss the use of fractional regression models in more general contexts, where a vector $x$ of contextual variables affects linearly the DEA response. They consider two modeling alternatives. Here we choose the alternative derived from the model of Papke & Wooldridge (1996) for binary responses. If $y$ denotes the DEA response, $x$ the vector of contextual attributes (in our application a set of group indicator variables), and $G(\cdot)$ a nonlinear function with values in $[0, 1]$, it is postulated that $E(y|x) = G(x\theta)$. The function $G(\cdot)$ is increasing. Usual choices for $G(\cdot)$ are the logistic $G(x\theta) = e^{x\theta}/(1 + e^{x\theta})$ and $G(\cdot) = \Phi(\cdot)$, where $\Phi(\cdot)$ is the probability distribution function of the standard normal. In fact, Papke & Wooldridge (1996) suggest as appropriate specification any smooth distribution function.

Figure 1 - Probability density function for the non-Archimedean measure.

Figure 2 - Box plots for the non-Archimedean measure by DMU type.

Table 1 - Production observations and types of research centers. Ranks for inputs and weighted average of ranks for outputs.