Economic efficiency of Embrapa's research centers and the influence of contextual variables

In this paper we measure, with a DEA BCC model, the economic efficiency of the research centers of Embrapa (Brazilian Agricultural Research Corporation). We model DEA economic efficiency as a linear function of the contextual variables revenue generation capacity, partnership intensity, improvement of administrative processes, cost rationalization, and the size and type of research centers. The model is a dynamic panel that assumes a structured correlation matrix within time periods and an unstructured one across time periods. To better discriminate among efficient units, we use a heuristic efficiency score that aggregates the efficiencies relative to the original and inverted DEA frontiers. The positive effect of partnerships on the efficiency scores does not support the view that Embrapa's performance evaluation process discourages integration and cooperation among its research centers. In terms of statistical significance, improvement of administrative processes is the most important of the contextual variables.


Introduction
The assessment of the productive efficiency of a public research institution is of fundamental importance for its administration. As pointed out in Souza et al. (1999), in times of competition and budget restrictions, a research institution needs to know how much it can increase its production, with quality, without using additional resources. Available resources can be managed better if managers have meaningful quantitative measurements of the production process at their disposal.
In this context, the Brazilian Agricultural Research Corporation (Embrapa) has monitored, since 1996, the production process of its 37 research centers using a nonparametric DEA (Data Envelopment Analysis) production model, which provides a measure of technical efficiency of production for each research center. For more details see Souza et al. (1997, 1999) and Souza & Ávila (2000).
The economic efficiency measure proposed here for Embrapa's production model uses a four-dimensional output vector, obtained from 28 marginal production indicators, and total production costs as a single input. The DEA model imposes variable returns to scale and input orientation. This approach differs markedly from the current use of DEA at Embrapa, which is based on the notion of technical efficiency and uses a single output measure (a weighted average of the relative indexes of the four categories) and a three-dimensional input vector (personnel costs, operational costs and capital expenditures), under the assumption of constant returns to scale. The motivation for a combined output measure and constant returns to scale is to make units more comparable and to avoid too many ties in the evaluation process: under variable returns to scale, efficiency measurements are larger, because each unit is compared only with units of approximately the same size. In this context, our approach with a four-dimensional output is a compromise.
Besides assessing production efficiency, managers are very interested in identifying exogenous covariates, or contextual variables, that may affect or cause economic efficiency. Identifying these variables is of managerial importance because it helps pinpoint the management practices that lead to efficient units. The general objective of this article is to study the use of DEA to measure the production efficiency of Embrapa's research centers. Within this objective there are three specific ones. The first is to propose, as an alternative to the current model, a more general measure of efficiency. This measure uses a multidimensional output and total costs as input, and it is a Farrell measure of economic efficiency under a technology with variable returns to scale. The multidimensional output is defined by grouping the marginal output indicators into four production categories: scientific production; production of technical publications; technology diffusion and image; and development of technologies, products and processes. The evolution of the economic efficiency measurements through time is investigated with a dynamic panel model (Greene, 2002).
As a second objective, we propose the use, as in Leta et al. (2005), of a combined index to discriminate among DMUs that are DEA efficient under variable returns to scale. This index is a weighted average of efficiency measurements relative to the classical and inverted DEA frontiers. Here, the weights are chosen to minimize the variance of the combined index.
Finally, the third specific objective is to model the log efficiency data with an autoregressive model that specifies a linear dependence on a set of contextual variables. The model assumes contemporaneous correlation and unstructured time covariances. This approach is original in the context of two-stage regressions based on DEA models, since it takes into account the correlation among research units induced by the way economic efficiency measurements are calculated. In particular, we are interested in the effect of an indicator of partnership intensity on efficiency, because the use of economic efficiency in Embrapa's evaluation process has been criticized on the grounds that it generates unwanted competition and inhibits collaboration among research centers.

Production data
The efficiency model currently used by Embrapa considers a combined measure of output. This is a weighted average of marginal indexes, each measuring a particular type of production relative to a pre-specified standard. Typically, this normalization is defined, for each year, by the average of the variable over all research centers. The use of a joint output measure is motivated by the fact that fitting a production frontier assumes a homogeneous production process, and this hypothesis does not hold strictly at Embrapa. Although all research units produce some amount of every production variable considered, and use the same inputs, they differ in their perceptions of the relative importance of each production category. A research center specialized in biotechnology, for instance, views Scientific Production as the most important category, while a product-oriented center typically values Development of Technologies, Products and Processes most. Embrapa addresses this homogeneity problem with a variable system of weights: each research unit has its own weights. Note that such a system of weights cannot be specified automatically using DEA, owing to the lack of homogeneity of the units being evaluated. Another relevant problem, also related to the use of DEA, is that an excessive number of production variables tends to make all research centers efficient.
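A minimal sketch of the normalization and aggregation just described, assuming each marginal indicator is divided by its mean over all research centers in the same year and the normalized indicators are then combined as a weighted average. For simplicity the sketch uses one common set of weights for all centers, whereas Embrapa's system lets weights vary by unit; the function name `category_output` and the toy numbers are hypothetical.

```python
def category_output(indicators, weights):
    """Weighted average of mean-normalized marginal indicators (one year).

    indicators[c][m]: value of marginal indicator m for research center c.
    weights[m]: exogenous weight of indicator m (common to all centers here,
    a simplifying assumption). Assumes every indicator has a positive mean.
    """
    n_centers = len(indicators)
    # Normalization standard: the average of each indicator over all centers.
    means = [sum(row[m] for row in indicators) / n_centers
             for m in range(len(weights))]
    total_weight = sum(weights)
    return [sum(w * row[m] / means[m] for m, w in enumerate(weights)) / total_weight
            for row in indicators]


# Two hypothetical centers, two indicators of one production category.
print(category_output([[2, 10], [4, 30]], weights=[1, 3]))
```

Because every normalized indicator averages 1 across centers and the weights are shared, the aggregate index also averages 1, so a value above 1 reads as above-average production in that category.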
Defining an a priori system of weights that yields a univariate output measure allowing comparisons among units is a complex task. In Embrapa's case, the weights were defined by a survey of about 500 researchers and all company administrators. Each participant was asked to rate the importance of a given output variable and production category on a scale from 1 (least important) to 5 (most important). Perceptions were expressed for the production categories and for the variables within each category. The model used to analyze these data is known as the Law of Categorical Judgment, derived from the Law of Comparative Judgment proposed by Thurstone (1927). The objective of the analysis is to transform the ordinal scale of individual perceptions into an interval scale representative of the population from which the sample of appraisers was drawn. The relative importance of each variable can then be determined from estimable differences on a continuous scale. More details on this process may be found in Souza & Ávila (2000) and Souza (2002). The weights derived from Thurstone's technique are similar to those generated by the AHP method (Saaty, 1994).
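The idea behind the Law of Categorical Judgment can be illustrated with the classical successive-intervals approximation, assuming equal (unit) discriminal dispersions and complete data. This is a sketch, not the procedure of Souza & Ávila (2000): the helper `categorical_judgment_scale`, the clipping constant and the rating counts are all assumptions. Under the model, the normal deviate of the cumulative proportion of ratings up to boundary k for item j equals t_k − s_j, so averaging over boundaries recovers each item's scale value s_j up to a common constant.

```python
from statistics import NormalDist


def categorical_judgment_scale(counts, clip=0.01):
    """Scale values under a simplified Law of Categorical Judgment.

    counts[j][k] = number of respondents who put item j in category k
    (k = 0 is 'least important', k = K-1 'most important').
    Assumes unit, equal discriminal dispersions and no empty rows.
    Returns one scale value per item, shifted so the smallest is 0.
    """
    std_normal = NormalDist()
    scale = []
    for row in counts:
        total = sum(row)
        cumulative = 0
        z_scores = []
        for k in range(len(row) - 1):  # K - 1 category boundaries
            cumulative += row[k]
            # Clip proportions away from 0 and 1 so inv_cdf stays finite.
            p = min(max(cumulative / total, clip), 1 - clip)
            z_scores.append(std_normal.inv_cdf(p))
        # z_jk = t_k - s_j, so s_j equals a common constant minus mean_k(z_jk).
        scale.append(-sum(z_scores) / len(z_scores))
    lowest = min(scale)
    return [s - lowest for s in scale]


# Hypothetical counts on the 1-5 scale for two output variables.
ratings = [[2, 3, 10, 40, 45],   # mostly rated important
           [40, 30, 15, 10, 5]]  # mostly rated unimportant
print(categorical_judgment_scale(ratings))
```

How the resulting interval-scale values are turned into the weights actually used by Embrapa is not reproduced here.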
The 28 dimensionless marginal production indicators are classified into 4 categories: Scientific Production; Production of Technical Publications; Development of Technologies, Products and Processes; and Technology Diffusion and Image. The category of Scientific Production includes articles published in refereed journals, book chapters, and abstracts and articles in proceedings of technical meetings. The category of Production of Technical Publications groups the publications that research centers produce for agribusinesses and producers. Typical of this category are technical instructions and recommendations, written in simplified language for extension workers and farmers, which contain recommendations on the use of agricultural production systems. The category of Development of Technologies, Products and Processes groups production indicators related to a research unit's effort to make its production available to society as a final product, such as new cultivars and plant varieties. Finally, the category of Technology Diffusion and Image includes variables related both to Embrapa's effort to make its products known to the public and to the promotion of its image.
Besides efficiency measurements, Embrapa's evaluation model also uses contextual variables (Ávila, 2002). These are: the capacity to obtain research financing from sources other than government (Treasury) resources (RECF), improvement of administrative processes (IAPROC), intensity of partnerships (PART), and rationalization of costs (COSTS). The classification variables of interest are type and size. According to their research objectives, the centers are classified as Product Oriented (15), Basic Themes (9) and Ecoregional (13). Three sizes are considered (small, medium and large).
In this article, the production and contextual data refer to the years 2001, 2002 and 2003.

Methods
The objective of DEA is to measure the production efficiency of units called Decision-Making Units (DMUs). DEA is an optimization process applied to each DMU to estimate an efficient, piecewise linear frontier. The estimated frontier is composed of the DMUs that represent the best production practices in the sample (Pareto-efficient units). These units are peers, or benchmarks, for the inefficient ones.
The relative efficiency of a given DMU is defined as the ratio of a weighted sum of its outputs to a weighted sum of its inputs. The weights are shadow prices of outputs and inputs obtained from a linear programming problem (LPP) that maximizes this ratio for the DMU under evaluation. One advantage of DEA frontier estimation over competing production model specifications is that it allows both multiple inputs and multiple outputs in the calculation of an efficiency measure, with or without incorporating subjective perceptions on the part of the investigator.
There are two classic DEA models: the CCR model (also known as CRS, for constant returns to scale) and the BCC model (also known as VRS, for variable returns to scale). The CCR model (Charnes et al., 1978) assumes proportionality between inputs and outputs. The BCC model, due to Banker et al. (1984), replaces the proportionality assumption with one of convexity.
There are typically two possible (radial) orientations in a DEA model. A DEA efficiency measure is input oriented when one seeks to minimize input use without altering the output level. It is output oriented when one seeks to increase production while keeping input usage constant. The term radial refers to the fact that the input reduction (output increase) is carried out by the same proportionality constant for every input (output) component.
Every DEA model has two equivalent formulations, known as the Envelopment and the Multipliers formulations, which correspond to the primal and the dual of an LPP. Put simply, the Envelopment formulation defines a feasible production region and projects each DMU onto the frontier of this region. Inefficient DMUs lie below the efficient frontier, and the efficient DMUs define it. The Multipliers formulation (described above) maximizes, for each DMU, the ratio of a weighted sum of outputs to a weighted sum of inputs. In this formulation the weights are the LPP's decision variables, and they are chosen to be the most favorable to each DMU.
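In (1) and (2) below we present the input-oriented DEA BCC model in its Multipliers and Envelopment formulations, respectively. Each DMU $k$, $k = 1, \dots, n$, is a production unit that uses $r$ inputs $w_{ik}$, $i = 1, \dots, r$, to produce $s$ outputs $y_{jk}$, $j = 1, \dots, s$; $w_{io}$ and $y_{jo}$ are the inputs and outputs of the DMU $o$ under evaluation.

$$
\begin{aligned}
\max_{u,\,v,\,u_*}\quad & \sum_{j=1}^{s} u_j y_{jo} - u_* \\
\text{s.t.}\quad & \sum_{i=1}^{r} v_i w_{io} = 1, \\
& \sum_{j=1}^{s} u_j y_{jk} - \sum_{i=1}^{r} v_i w_{ik} - u_* \le 0, \quad k = 1, \dots, n, \\
& u_j \ge 0,\; v_i \ge 0,\; u_* \text{ unrestricted in sign}
\end{aligned}
\tag{1}
$$

$$
\begin{aligned}
\min_{\theta,\,\lambda}\quad & \theta \\
\text{s.t.}\quad & \theta\, w_{io} - \sum_{k=1}^{n} \lambda_k w_{ik} \ge 0, \quad i = 1, \dots, r, \\
& \sum_{k=1}^{n} \lambda_k y_{jk} \ge y_{jo}, \quad j = 1, \dots, s, \\
& \sum_{k=1}^{n} \lambda_k = 1, \quad \lambda_k \ge 0
\end{aligned}
\tag{2}
$$

In (1), $v_i$ and $u_j$ are the weights for inputs and outputs, respectively, and $u_*$ is a scale factor: when positive, the DMU operates in a region of decreasing returns to scale; when negative, in a region of increasing returns to scale; when zero, under constant returns to scale. In (2), $\theta$ is the efficiency score of DMU $o$ and the $\lambda_k$ are the weights of the peer DMUs that define its projection on the frontier. Intuitively, a DMU is efficient if, at the scale at which it operates, it is the best practice in terms of the use of available resources (best virtual output/input ratio).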
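The input-oriented envelopment formulation (2) can be solved as one linear program per DMU. The sketch below assumes NumPy and SciPy are available and uses `scipy.optimize.linprog` with the HiGHS solver; the function name `bcc_input_efficiency` and the toy data are hypothetical. Setting `vrs=False` drops the convexity constraint and gives the CCR score, which makes the two classic models easy to compare on the same data.

```python
import numpy as np
from scipy.optimize import linprog


def bcc_input_efficiency(X, Y, vrs=True):
    """Input-oriented DEA scores from the envelopment LP.

    X: (n, r) array of inputs, Y: (n, s) array of outputs, one row per DMU.
    vrs=True adds the convexity constraint sum(lambda) = 1 (BCC model);
    vrs=False omits it (CCR model). Returns theta* for each DMU (1.0 = efficient).
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n, r = X.shape
    s = Y.shape[1]
    scores = np.empty(n)
    for o in range(n):
        # Decision vector z = (theta, lambda_1, ..., lambda_n); minimise theta.
        c = np.r_[1.0, np.zeros(n)]
        # Inputs:  sum_k lambda_k * x_ik - theta * x_io <= 0
        A_in = np.hstack([-X[o][:, None], X.T])
        # Outputs: -sum_k lambda_k * y_jk <= -y_jo
        A_out = np.hstack([np.zeros((s, 1)), -Y.T])
        A_ub = np.vstack([A_in, A_out])
        b_ub = np.r_[np.zeros(r), -Y[o]]
        # Convexity constraint that distinguishes BCC from CCR.
        A_eq = np.r_[0.0, np.ones(n)][None, :] if vrs else None
        b_eq = [1.0] if vrs else None
        bounds = [(None, None)] + [(0, None)] * n
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                      bounds=bounds, method="highs")
        if not res.success:
            raise RuntimeError(f"LP failed for DMU {o}: {res.message}")
        scores[o] = res.fun
    return scores


# Toy data (hypothetical): one input (total cost), one output, DMUs A-D.
X = [[2], [4], [6], [5]]
Y = [[1], [4], [5], [2]]
print(bcc_input_efficiency(X, Y))             # BCC (VRS) scores
print(bcc_input_efficiency(X, Y, vrs=False))  # CCR (CRS) scores
```

With these numbers the BCC model rates A, B and C as efficient and D at 8/15 (about 0.533), while CCR gives 0.5, 1, 5/6 and 0.4. Note that A is BCC-efficient only because it has the smallest input and C only because it has the largest output, anticipating the point made by Ali (1993) below.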
As pointed out by Ali (1993), the BCC model considers efficient any DMU with the (strictly) largest value of one of the output vector components, regardless of its input vector, and any DMU with the (strictly) smallest value of one of the input vector components, regardless of its output vector, since such units have no other units to be compared with. DMUs in this situation cannot be considered truly efficient and require further investigation. Ali (1993) also points out that any unit satisfying condition (3) will be efficient, regardless of the DEA model chosen.
DEA BCC models typically show a relatively large number of efficient units. The literature discusses several approaches for discriminating among efficient units. Angulo Meza & Lins (2002) present a review of the models available to increase discrimination in DEA (and consequently to improve the ranking process). As the authors point out, the models can be divided into two groups: those that incorporate a priori information from the decision maker (for instance, DEA models with weight restrictions and Value Efficiency Analysis models), and those that use no prior information (for instance, super-efficiency, cross-evaluation and multi-objective DEA models).
Leta et al. (2005) suggest using the inverted frontier to help discriminate among units. The inverted frontier, proposed by Yamada et al. (1994) and Entani et al. (2002), gives a pessimistic evaluation of the DMUs. It is derived by taking inputs as outputs and vice versa, and it admits two interpretations: it consists either of the DMUs with the worst managerial practices, and could then be called the inefficient frontier, or of the DMUs with the best practices from an opposite point of view (for instance, the price of a product is an input for the buyer, who seeks to minimize it, and an output for the seller, who seeks to maximize it). Lins et al. (2005) use this second interpretation in real estate evaluation.
In Leta et al. (2005), the first interpretation led to the suggestion of a composed index that weighs the two efficiency measures derived from the classical and inverted DEA frontiers. Under their approach, the best DMUs must show a reasonable level of efficiency relative to the classical frontier and a poor level of efficiency relative to the inverted frontier. The composed index is a weighted average of the efficiency measurement relative to the classical frontier and one minus the efficiency measurement relative to the inverted frontier.

The assumption on the error structure in (5) implies nonlinear parametric restrictions, so the model becomes nonlinear. In this context, and given the presence of lagged values of the response variable on the right-hand side of the equation, we used instrumental variables and nonlinear three-stage least squares (Gallant, 1987) in the estimation. As instruments we used the contextual variables assumed to be uncorrelated with the error component.

DEA-BCC model
The DEA CCR model currently used by Embrapa is not applied directly to all research units. The population of DMUs is divided into three groups by a multivariate cluster analysis of their cost vectors. The DEA analysis is then carried out within groups, which justifies the CCR approach, since units within a group are homogeneous in size.
As an alternative to this model, we use a DEA BCC model with input orientation, one input (total costs) and 4 outputs (Scientific Production; Production of Technical Publications; Technology Diffusion and Image; Development of Technologies, Products and Processes). Within each production category, the weights leading to the corresponding aggregate output measure are defined exogenously. All research centers are evaluated jointly; differences in size are accounted for by the BCC restrictions. Table 1 shows the economic efficiency under DEA BCC of Embrapa's research centers for each of the three years under study. A relatively large number of units are efficient. This can be seen in two ways. First, favorably, since in theory the result indicates that a large number of units operate efficiently under variable returns to scale. Second, unfavorably, since some of the efficiency measurements may be spuriously generated by the BCC assumption. Additional analyses are necessary to further discriminate among the units.

Using a Heuristic Index of Efficiency
As pointed out in Section 2.2, there is a plethora of methods in the DEA literature to discriminate among efficient units. In this article we use a slight variation of the model suggested by Leta et al. (2005), which combines the efficiency results relative to the classical and inverted DEA frontiers. Our approach differs in the choice of the weights attributed to each efficiency measurement.
Here, the weights are chosen to minimize the variance of the efficiency index. Note that this approach also differs from the DEA-Savage model (Pimenta & Soares de Mello, 2005), which uses the concepts of optimism and pessimism to assign weights to the efficiency measurements. By minimizing the differences between DMUs, the proposed approach is expected to separate the units without distorting the classification much.
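The composed index minimizing variance is called here the Heuristic Index of Efficiency (HIEf), since it is not an efficiency index in the strict sense of the term but an index resulting from the aggregation of two efficiency measurements. For DMU $k$, HIEf is defined by the formula

$$\mathrm{HIEf}_k = a\,E_k + (1 - a)\,(1 - E^{inv}_k),$$

where $E_k$ and $E^{inv}_k$ are the efficiencies of DMU $k$ relative to the classical and inverted frontiers. With $F_k = 1 - E^{inv}_k$, the value of $a$ that minimizes the variance of HIEf over the DMUs is

$$a = \frac{\operatorname{Var}(F) - \operatorname{Cov}(E, F)}{\operatorname{Var}(E) + \operatorname{Var}(F) - 2\operatorname{Cov}(E, F)}.$$

However, if $a$ is negative, one should determine $a$ by nonlinear programming, minimizing the variance (a quadratic function of $a$) subject to the restriction $0 \le a \le 1$. The discrimination among units in Table 2 is better than in Table 1.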
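Because the variance of a·E_k + (1 − a)(1 − E^inv_k) is a convex quadratic in a, the constrained minimizer is simply the unconstrained one clipped to [0, 1], so no general nonlinear solver is needed. A minimal stdlib sketch, assuming the classical and inverted-frontier efficiencies have already been computed (for example with an LP solver); the function name `hief` and the sample scores are hypothetical.

```python
def hief(eff, inv_eff):
    """Heuristic Index of Efficiency with a minimum-variance weight.

    eff[k]: efficiency of DMU k relative to the classical frontier.
    inv_eff[k]: efficiency of DMU k relative to the inverted frontier.
    Returns (a, index) where index[k] = a*eff[k] + (1 - a)*(1 - inv_eff[k]).
    """
    n = len(eff)
    f = [1.0 - e for e in inv_eff]
    mean_e, mean_f = sum(eff) / n, sum(f) / n
    var_e = sum((e - mean_e) ** 2 for e in eff) / n
    var_f = sum((g - mean_f) ** 2 for g in f) / n
    cov = sum((e - mean_e) * (g - mean_f) for e, g in zip(eff, f)) / n
    denom = var_e + var_f - 2 * cov  # = Var(E - F) >= 0
    a = 0.5 if denom == 0 else (var_f - cov) / denom
    # Convex quadratic in a: clipping gives the minimum over 0 <= a <= 1.
    a = min(max(a, 0.0), 1.0)
    return a, [a * e + (1 - a) * g for e, g in zip(eff, f)]


a, index = hief([1.0, 1.0, 0.8, 0.6], [0.5, 0.9, 1.0, 0.7])
print(a, index)
```

The two DMUs tied at 1.0 on the classical frontier are separated by their inverted-frontier scores, which is the discrimination the index is meant to provide.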

The influence of contextual variables in DEA economic efficiency
To further analyze the efficiency measurements of Embrapa's research centers, we now turn to the identification of contextual variables that may affect (cause) economic efficiency, as measured by the input-oriented DEA BCC model. Table 3 displays the estimation results from nonlinear three-stage least squares. First, we found no evidence that β changes over time. The contextual variables RECF, PART, COSTS and IAPROC are jointly significant (the Wald statistic has a p-value of 0.020). The negative signs of the corresponding coefficients indicate that these variables increase economic efficiency. The most important variables in the group are improvement of administrative processes (IAPROC) and intensity of partnerships (PART). Thus, research centers with more partners are associated with higher efficiency levels, ceteris paribus.
Research centers of the Basic Themes and Product types are more efficient than Ecoregional centers. Large and small research centers are more efficient than medium-sized ones. The overall effect of size, however, is marginal.
As measures of goodness of fit, we considered three values of R², one for each year (2001, 2002 and 2003). Each R² is the square of the correlation between the observed and fitted values of the response variable. The values were 0.449, 0.419 and 0.500, respectively.
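Since each R² is defined as the squared correlation between observed and fitted values, it can be computed year by year independently of the estimation method. A stdlib-only sketch; the function name `squared_correlation` is hypothetical.

```python
def squared_correlation(observed, fitted):
    """R^2 computed as the squared Pearson correlation of observed and fitted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_f = sum(fitted) / n
    s_of = sum((o - mean_o) * (f - mean_f) for o, f in zip(observed, fitted))
    s_oo = sum((o - mean_o) ** 2 for o in observed)
    s_ff = sum((f - mean_f) ** 2 for f in fitted)
    return s_of ** 2 / (s_oo * s_ff)
```

Applying it separately to each year's observed and fitted responses yields one R² per year, as in the values reported above. Unlike the usual 1 − SSE/SST, this definition stays between 0 and 1 even for estimators such as three-stage least squares that do not minimize the residual sum of squares of a single equation.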
Finally, one might think that the marginal significance of the contextual variables RECF, PART, COSTS and IAPROC is a consequence of multicollinearity. This is not the case. The condition indexes of the design matrix defined by these variables are 11.82, 12.60 and 11.28 for 2001, 2002 and 2003, respectively, well below the threshold of 30 suggested in the literature as the limit for near-singularity (Souza, 1998). Also, the variance inflation factors (VIFs), as defined in Souza (1998), are all close to 1, a further indication of low multiple correlation among the contextual variables.
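Both diagnostics can be sketched in a few lines of NumPy. The sketch assumes the common Belsley, Kuh and Welsch conventions (every column, intercept included, scaled to unit length before the singular value decomposition) and VIF_j taken as the j-th diagonal element of the inverse correlation matrix; these may differ in detail from the definitions in Souza (1998), and the function names are hypothetical.

```python
import numpy as np


def condition_index(X, add_intercept=True):
    """Largest condition index of the design matrix.

    Each column, including the intercept, is scaled to unit Euclidean length
    before taking the ratio of the largest to the smallest singular value.
    """
    X = np.asarray(X, dtype=float)
    if add_intercept:
        X = np.column_stack([np.ones(X.shape[0]), X])
    X_scaled = X / np.linalg.norm(X, axis=0)
    singular_values = np.linalg.svd(X_scaled, compute_uv=False)
    return singular_values.max() / singular_values.min()


def variance_inflation_factors(X):
    """VIF_j = j-th diagonal element of the inverse correlation matrix of the regressors."""
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    return np.diag(np.linalg.inv(R))
```

Here each row of `X` would hold one research center's values of RECF, PART, COSTS and IAPROC for a given year. Orthogonal regressors give a condition index of 1 and VIFs of 1, the ideal case against which the reported values can be read.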

Conclusions
The objectives proposed in this article were achieved. We suggested a new measure of production efficiency to evaluate Embrapa's research system, defined by its 37 research centers. The efficiency measure uses multiple outputs and total costs, and it is a Farrell measure of economic efficiency. To further discriminate among efficient units, we used the concept of the inverted frontier to build a composed index with minimum variance.
For the period 2001-2003 we investigated the influence of contextual variables of administrative interest on the efficiency measure, using a dynamic panel data model that incorporates contemporaneous and serial correlation. We concluded that all the contextual variables considered (capacity to obtain own research financing, intensity of partnerships, improvement of administrative processes and rationalization of costs) are positively associated with economic efficiency. The type and size of a research center also influence economic efficiency: Basic Themes and Product centers dominate, as do large and small research centers.
In this context we highlight the effect that intensity of partnerships has on the economic efficiency measurements. In response to the criticism that the evaluation process would discourage research centers from cooperating with each other, we showed that higher values of partnership associations are related to larger values of economic efficiency.
The model used in this article was essentially classical DEA. The resulting efficiency indexes are objective and have a clear economic interpretation. This approach, however, may not be the best for managers. We are directing new research toward incorporating managerial perceptions into the model through restrictions on the shadow prices generated by DEA.