SPECIFYING WEIGHT RESTRICTION LIMITS IN DATA ENVELOPMENT ANALYSIS WITH THE WONG AND BEASLEY AND CONE RATIO METHODS

This study presents a new approach for defining weight restrictions in Data Envelopment Analysis (DEA) in the one output, multiple inputs case, using the results of a Linear Regression model (LRM) developed with the same DEA variables. The limits of the Wong-Beasley and Cone Ratio methods are thus chosen without interference from a decision maker, with DEA weight search intervals defined from the estimated standardized coefficients of a linear regression (which represent the statistical importance of the inputs for the definition of the DEA efficiency scores). As an example, Constant Returns to Scale (CRS) DEA models were estimated with the unrestricted, Wong-Beasley and Cone Ratio methods, applied to a dataset of hospital admissions (output) and numbers of beds and health professionals (inputs) for 20 public hospitals in the year 2016, and the resulting rankings were compared using Spearman correlations. The regression model had R² = 0.89, with coefficients 0.43 (professionals) and 0.54 (beds), and the Spearman correlation between rankings was at least 0.84. In conclusion, the rankings were consistent and interpretable, and the approach circumvents the need for subjective intervention by a decision maker when defining weight restrictions in DEA.


INTRODUCTION
Data Envelopment Analysis (DEA) is a widely used methodology for the performance evaluation of Decision Making Units (DMUs). Among the areas where it has been successfully applied are the evaluation of schools, health units and financial institutions (Gonçalves et al., 2007; Kontodimopoulos et al., 2009; Gonçalves et al., 2013; Zhou et al., 2018). Its main advantage is the use of actual data to define the evaluation standards, so that no extraneous "gold standard" is needed for the comparison. Thus, for example, health units may be compared against the best units among those analyzed, according to predefined, relevant variables.
A problem sometimes faced by DEA is that it depends on an input/output optimization procedure to obtain its main comparison parameters (the efficiency scores), but the original DEA formulation of Charnes et al. (1978, 2013) allows the weights used in this procedure to vary freely. Thus, inadequate "null weights" can be obtained, effectively eliminating important variables from the analysis. While some methodologies proposed to deal with this problem ("weight restriction" procedures) are now widely used, most of these approaches still depend on the subjective intervention of analysts (Mecit & Alp, 2012; Podinovski, 2016).
The present paper proposes a new approach: the use of Linear Regression models to define the weight variation limits without the interference of a decision maker in the one output, multiple inputs DEA case (that is, considering only the statistical importance of the variables for the DEA model). Two classical weight restriction methods are used in conjunction with the technique proposed here: Wong and Beasley's method (W-B) and the Cone Ratio method (Wong & Beasley, 1990; Charnes et al., 2013). For comparison, results for the traditional, unrestricted method are also presented.

The Constant Returns to Scale (CRS) model
The aim of a DEA model is to compare the relative performance of similar DMUs, taking into account defined inputs and outputs. In the case of multiple inputs and/or outputs, the original DEA formulation by Charnes et al. (1978), shown below, defines efficiency as the ratio between a weighted sum of outputs and a weighted sum of inputs, assuming constant returns to scale and strictly positive input and output levels for all DMUs.
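The CRS model measures the efficiency of DMU $j_0$ and is calculated as:

$$\max \; h_0 = \frac{\sum_{r=1}^{s} U_r \, y_{rj_0}}{\sum_{i=1}^{m} V_i \, x_{ij_0}} \tag{1}$$

subject to:

$$\frac{\sum_{r=1}^{s} U_r \, y_{rj}}{\sum_{i=1}^{m} V_i \, x_{ij}} \le 1, \quad j = 1, \dots, n; \qquad U_r, V_i \ge \varepsilon \tag{2}$$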

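The W-B restrictions on the virtual output and input proportions are then defined as:

$$a_r \le \frac{U_r \, y_{rj}}{\sum_{k=1}^{s} U_k \, y_{kj}} \le b_r \tag{3}$$

$$c_i \le \frac{V_i \, x_{ij}}{\sum_{k=1}^{m} V_k \, x_{kj}} \le d_i \tag{4}$$

where $y_{rj}, x_{ij} > 0$ are the values of DMU $j$, $U_r$ = the weight given to output $r$, $V_i$ = the weight given to input $i$, $[a_r, b_r]$ and $[c_i, d_i]$ = the bounds set for the virtual proportions of output $r$ and input $i$, $n$ = the number of DMUs, $s$ = the number of outputs, $m$ = the number of inputs and $\varepsilon$ = a positive infinitesimal number.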
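To make the ratio model (1) and (2) concrete, the following is a minimal Python sketch for the one output, two inputs case used in the example; it is not the software used in this study. Assumptions: $\varepsilon$ is taken as 0, the input weights are normalized to $V_1 = t$ and $V_2 = 1 - t$ (harmless, because the ratios do not change when all weights are scaled), and the function name `crs_efficiency` and the toy data are hypothetical. With a single output the output weight cancels, so $h_0$ becomes a one-dimensional search over $t$, and the maximum can be found exactly by checking the interval ends and the points where the input-cost lines of two DMUs intersect.

```python
from itertools import combinations


def crs_efficiency(x1, x2, y, j0, t_lo=0.0, t_hi=1.0):
    """Ratio-form CRS efficiency h0 of DMU j0 for one output and two inputs.

    Hypothetical helper: input weights are normalized as V1 = t, V2 = 1 - t,
    which leaves the ratios in (1)-(2) unchanged; epsilon is taken as 0, so a
    null weight (t = 0 or t = 1) is allowed. With a single output, the output
    weight cancels in every ratio. Returns (h0, t_star).
    """
    n = len(y)
    # Virtual input per unit of output of DMU j is a straight line in t:
    # (t * x1[j] + (1 - t) * x2[j]) / y[j] = a[j] * t + b[j]
    a = [(x1[j] - x2[j]) / y[j] for j in range(n)]
    b = [x2[j] / y[j] for j in range(n)]

    def h(t):
        # h0(t) = [y0 / (v.x0)] / max_j [y_j / (v.x_j)] = best cost / own cost
        best = min(a[j] * t + b[j] for j in range(n))
        return best / (a[j0] * t + b[j0])

    # Between breakpoints of the lower envelope, h is a ratio of two linear
    # functions (hence monotone), so the maximum is at an end or a breakpoint.
    candidates = {t_lo, t_hi}
    for j, k in combinations(range(n), 2):
        if a[j] != a[k]:
            t = (b[k] - b[j]) / (a[j] - a[k])
            if t_lo < t < t_hi:
                candidates.add(t)
    t_star = max(sorted(candidates), key=h)
    return h(t_star), t_star


# Hypothetical toy data: one output (all equal to 1) and two inputs.
x1 = [1, 2, 4, 3, 1]
x2 = [4, 2, 1, 3, 6]
y = [1, 1, 1, 1, 1]
for j in range(len(y)):
    score, t = crs_efficiency(x1, x2, y, j)
    print(f"DMU {j}: h0 = {score:.3f}, V1 = {t:.3f}, V2 = {1 - t:.3f}")
```

In this toy data the last DMU reaches $h_0 = 1$ only with $V_2 = 0$, a null weight of the kind that weight restrictions are meant to prevent.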
The Cone Ratio method (Charnes et al., 2013) builds on the concept of assurance region (AR) developed by Thompson et al. (1990) and allows restrictions on the relationships between the weights, such as:

$$k_{iw} \le \frac{V_i}{V_g} \le k_{it}, \qquad k_{rw} \le \frac{U_r}{U_o} \le k_{rt} \tag{5}$$

where $V_i$ and $V_g$ are input weights, and $U_r$ and $U_o$ are output weights of the DEA model. The bounds $k_{iw}$, $k_{it}$, $k_{rw}$ and $k_{rt}$ can incorporate the opinion of a decision maker or any other information.
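With two inputs and the normalization $V_1 = t$, $V_2 = 1 - t$ (an assumption of this sketch, not part of the Cone Ratio method itself), the input restriction in (5) reduces to an interval for $t$, because $V_1/V_2 = t/(1 - t)$ increases with $t$. The helper names and the bounds in the usage line are hypothetical.

```python
def cone_ratio_t_interval(k_w, k_t):
    """Translate k_w <= V1/V2 <= k_t into bounds on t, where V1 = t, V2 = 1 - t.

    Hypothetical helper for the two-input case, assuming 0 < k_w <= k_t.
    Each ratio bound k maps to t = k / (1 + k).
    """
    if not 0 < k_w <= k_t:
        raise ValueError("expected 0 < k_w <= k_t")
    return k_w / (1 + k_w), k_t / (1 + k_t)


def in_assurance_region(v1, v2, k_w, k_t):
    """True if the input weights satisfy k_w <= v1/v2 <= k_t (with v2 > 0)."""
    return k_w * v2 <= v1 <= k_t * v2


t_lo, t_hi = cone_ratio_t_interval(0.5, 1.0)  # hypothetical bounds
print(t_lo, t_hi)
```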
In the unrestricted case, the efficiency measure $h_0$ represents the radial factor by which the inputs of an inefficient DMU should be proportionally reduced to make it efficient.
In the CRS model with additional input/output restrictions, $h^*_0$ need not coincide with this factor; that is, the resulting input targets may or may not lead a DMU to efficiency (Allen et al., 1997; Charnes et al., 2013). However, they may still be considered as targets for improving DMU performance.
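For the unrestricted case, the radial input targets are simply $h^*_0$ times the observed inputs. A minimal sketch with a hypothetical DMU and a hypothetical function name; it ignores any slacks that may remain after the radial projection, so the projected point may be only weakly efficient.

```python
def radial_input_targets(h0, inputs):
    """Input targets h0 * x_i0 for a DMU under the unrestricted CRS model.

    Sketch of the radial projection only; remaining slacks are not computed.
    """
    if not 0 < h0 <= 1:
        raise ValueError("h0 must lie in (0, 1]")
    return [h0 * x for x in inputs]


# Hypothetical DMU with inputs (3, 3) and efficiency 2/3.
print(radial_input_targets(2 / 3, [3, 3]))
```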

Multiple Linear Regression
The well-known Multiple Linear Regression model represents the relationship between two or more variables (also called predictors, inputs, or independent or explanatory variables; Draper & Smith, 2014) and a dependent (response or output) variable by fitting a linear equation to the data. The independent variables are commonly represented as $x_{ij}$ ($i = 1, \dots, m$; $j = 1, \dots, n$), and the dependent variable as $y_j$ (Draper & Smith, 2014; Darlington & Hayes, 2016).
The population regression model for the $m$ explanatory variables $x_{1j}, x_{2j}, \dots, x_{mj}$, taken to be fixed (not random variables), is $y_j = \beta_0 + \beta_1 x_{1j} + \beta_2 x_{2j} + \dots + \beta_m x_{mj} + e_j$, and describes how the response variable $y_j$ changes with the independent variables. It is assumed that the response variable $y$ has constant variance for each level of the independent variables. The parameters $\beta_0, \beta_1, \dots, \beta_m$ are estimated from the sample by $\hat\beta_0, \hat\beta_1, \dots, \hat\beta_m$.
Since the observed values $y_j$ have varying population means $\mu_j$, the Multiple Regression model can be expressed as response variable = model function + random error, where "model function" represents $\beta_0 + \beta_1 x_{1j} + \beta_2 x_{2j} + \dots + \beta_m x_{mj}$, and the random error represents the deviations $e_j$ of the observed values $y_j$ from their population means $\mu_j$. In the classical model estimated by Ordinary Least Squares (OLS), the errors are assumed to be independent, normally distributed random variables with mean zero and variance $\sigma^2$. The fitted values $\hat\beta_0 + \hat\beta_1 x_{1j} + \hat\beta_2 x_{2j} + \dots + \hat\beta_m x_{mj}$ are written as $\hat y_j$, and the estimated residuals $\hat e_j$ are given by $y_j - \hat y_j$ (the difference between observed and fitted values in a sample). The OLS estimates are obtained by minimizing the sum of the squared residuals.
In the significance tests for the independent variables of the Multiple Regression model, the null hypothesis is that the coefficient $\beta_i$ equals 0, and the test statistic $t$ is the parameter estimate $\hat\beta_i$ divided by its standard error $S(\hat\beta_i)$; under the null hypothesis, it follows a Student $t$ distribution with $n - m - 1$ degrees of freedom when the model is estimated from a sample of size $n$ with $m$ independent variables. A confidence interval for the parameter $\beta_i$ may then be computed as $\hat\beta_i \pm t^* S(\hat\beta_i)$, with $t^*$ the corresponding critical value of the $t$ distribution for, e.g., 95% confidence (Draper & Smith, 2014; Darlington & Hayes, 2016).
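A sketch of the $t$ statistic and the interval $\hat\beta_i \pm t^* S(\hat\beta_i)$. The standard library has no $t$ quantile function, so $t^*$ is passed in: for $n = 20$ and $m = 2$ there are 17 degrees of freedom, and the 95% two-sided value is about 2.110 (`scipy.stats.t.ppf(0.975, 17)` returns it if SciPy is available). The standard-error helper assumes the two-predictor standardized case, where $S(\hat\beta_i) = \sqrt{(1 - R^2)/((n - m - 1)(1 - r_{12}^2))}$; the function names and the correlation $r_{12}$ in the usage line are hypothetical.

```python
from math import sqrt


def standardized_coef_se(r2, r_12, n, m=2):
    """Standard error of a standardized coefficient with two predictors."""
    return sqrt((1 - r2) / ((n - m - 1) * (1 - r_12 ** 2)))


def coefficient_inference(beta_hat, se, t_crit):
    """t statistic for H0: beta = 0 and the interval beta_hat +/- t_crit * se."""
    if se <= 0:
        raise ValueError("standard error must be positive")
    half_width = t_crit * se
    return beta_hat / se, (beta_hat - half_width, beta_hat + half_width)


se = standardized_coef_se(0.89, 0.5, 20)  # r_12 = 0.5 is hypothetical
print(coefficient_inference(0.43, se, 2.110))
```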

Additional restrictions in the CRS model
Additional restrictions on the inputs and outputs, such as those given in (3) and (4), may be introduced to reflect a judgment of the relative importance of the variables in the model, with the virtual input $V_i x_{ij}$ representing the contribution of input $i$ to DMU $j$ with weight $V_i$. In the special case studied here, where $s = 1$ (one output variable), the contribution of the output to DMU $j$ is 100% ($U_r = 1$).
The regression equation for the inputs $x_{1j}, x_{2j}, \dots, x_{mj}$ is given by $y_j = \beta_0 + \beta_1 x_{1j} + \beta_2 x_{2j} + \dots + \beta_m x_{mj} + e_j$, and $\hat\beta_0, \hat\beta_1, \dots, \hat\beta_m$ are the sample estimates of the population parameters $\beta_0, \beta_1, \dots, \beta_m$ mentioned above. The $c_i$ and $d_i$ limits are then initially defined by substituting the positive standardized regression coefficients, that is, $\hat\beta_i = V^1_i$, into the Wong-Beasley expression:

$$c_i \le \frac{V^1_i \, x_{ij}}{\sum_{k=1}^{m} V^1_k \, x_{kj}} \le d_i$$

(with $V^1_i$ the regression-estimated weight for input $i$). After the coefficient of each variable in the above expression is replaced by its regression estimate, a set of $n$ values is obtained for each input variable. The minimum and the maximum of these values specify the limits of the Wong and Beasley restrictions. Thus, for $j = 1, \dots, n$:

$$c_i = \min_{j} \frac{V^1_i \, x_{ij}}{\sum_{k=1}^{m} V^1_k \, x_{kj}} \tag{6}$$

$$d_i = \max_{j} \frac{V^1_i \, x_{ij}}{\sum_{k=1}^{m} V^1_k \, x_{kj}} \tag{7}$$

Solving (4) for $V_i$ and introducing $V^1_i$ in $\sum_{k=1}^{m} V_k x_{kj}$, one obtains:

$$\frac{c_i \, Y_j}{x_{ij}} \le V_i \le \frac{d_i \, Y_j}{x_{ij}} \tag{8}$$

with $Y_j = \sum_{k=1}^{m} V^1_k x_{kj}$. Since the limits are chosen as minimum and maximum values (Equations (6) and (7)), they tend not to be too close to each other. Therefore, this procedure avoids the problem that may arise with very close limits, which imply a reduction of the set of production possibilities (topological space) of the DEA problem and, consequently, may make the DEA linear program infeasible (Sarrico & Dyson, 2004). Also, given that statistically significant coefficients $V^1_i > 0$ indicate the importance of the variables in the DEA model, these estimates may be used to generate restriction intervals for the virtual input proportions.

Pesquisa Operacional, Vol. 38(3), 2018
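The computation of $c_i$ and $d_i$ in (6) and (7), together with the bounds on $V_i$ for a given DMU obtained by solving (4) with the regression weights inside the sum, can be sketched as follows. The input matrix is hypothetical; only the weights 0.43 and 0.54 come from the regression reported in this study, and the helper names are illustrative.

```python
def wong_beasley_limits(x, v1):
    """Limits c_i, d_i: min and max over DMUs of the virtual input proportions
    computed with the regression weights V^1_i.

    x[i][j] is input i of DMU j (strictly positive); v1[i] > 0 is the
    regression weight of input i. Also returns Y_j = sum_k V^1_k x_kj.
    """
    m, n = len(x), len(x[0])
    totals = [sum(v1[k] * x[k][j] for k in range(m)) for j in range(n)]
    props = [[v1[i] * x[i][j] / totals[j] for j in range(n)] for i in range(m)]
    return [min(p) for p in props], [max(p) for p in props], totals


def weight_bounds(i, j, x, c, d, totals):
    """Bounds on V_i for DMU j: c_i * Y_j / x_ij <= V_i <= d_i * Y_j / x_ij."""
    return c[i] * totals[j] / x[i][j], d[i] * totals[j] / x[i][j]


# Hypothetical inputs for two DMUs (rows are inputs, columns are DMUs).
x = [[10, 20],   # input 1 (e.g. health professionals)
     [20, 10]]   # input 2 (e.g. beds)
c, d, totals = wong_beasley_limits(x, [0.43, 0.54])
print(c, d)
print(weight_bounds(0, 0, x, c, d, totals))
```

With two inputs the two proportions of each DMU add up to one, so $c_1 + d_2 = 1$ and $d_1 + c_2 = 1$ for any data.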

Confidence intervals for the Cone Ratio method
The upper and lower bounds of the 95% regression confidence intervals may be used to define weight restrictions through the Cone Ratio method when $s = 1$. If $k_{iw}$ and $k_{it}$ are the minimum and maximum of the resulting weight ratios, a relationship of the type $k_{iw} \le V_i / V_g \le k_{it}$ ($i \ne g$) may be defined, where $V_i$ and $V_g$ are input weights in the DEA model. In this case, $k_{iw}$ and $k_{it}$ are, respectively, the minimum and the maximum of the set formed by the ratio of the lower confidence limits and the ratio of the upper confidence limits of each pair of inputs. For instance, with two input variables one would have:

$$k_{1w} = \min\left\{\frac{L_1}{L_2}, \frac{U_1}{U_2}\right\}, \qquad k_{1t} = \max\left\{\frac{L_1}{L_2}, \frac{U_1}{U_2}\right\}$$

(where $L_i$ and $U_i$ are, respectively, the lower and upper confidence limits of $\hat\beta_i$ in the regression model). For the general case of $m$ inputs, $2C^2_m$ ratios of the type $(\hat\beta_i \pm t^* S(\hat\beta_i)) / (\hat\beta_g \pm t^* S(\hat\beta_g))$ are available and, consequently, $2C^2_m$ inequalities (restrictions of type (5) for inputs) are generated as:

$$V_i - k_{iw} V_g \ge 0, \qquad -V_i + k_{it} V_g \ge 0, \qquad i < g \tag{9}$$

It is then possible to specify the weight restrictions as $W (v, u)^T \ge 0$, in which $v = (V_i, V_g)$ is the vector of input weights and $u = (U)$ is the single-element vector of the output weight. Thus, $W$ is formed with rows $(1 \; -k_{iw} \; 0)$, $(-1 \; k_{it} \; 0)$ and $(0 \; 0 \; 1)$, the first two corresponding to the weight restrictions and the third to the non-negativity of the output weight $U$.
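A sketch of how the rows of $W$ can be assembled from the confidence limits, following the pairing described above (lower limit over lower limit, upper limit over upper limit) for every pair of inputs. The function names and the confidence limits in the usage lines are hypothetical, and positive lower limits are assumed so that the ratios are meaningful.

```python
from itertools import combinations


def cone_ratio_rows(lower, upper):
    """Rows of W such that W @ (V_1, ..., V_m, U) >= 0, built from regression
    confidence limits (lower[i], upper[i]) of each input coefficient.

    For every pair i < g, the ratios lower[i]/lower[g] and upper[i]/upper[g]
    give k_w (min) and k_t (max); a final row keeps the output weight U >= 0.
    """
    if min(lower) <= 0:
        raise ValueError("lower confidence limits must be positive")
    m = len(lower)
    rows, bounds = [], {}
    for i, g in combinations(range(m), 2):
        ratios = (lower[i] / lower[g], upper[i] / upper[g])
        k_w, k_t = min(ratios), max(ratios)
        bounds[(i, g)] = (k_w, k_t)
        low = [0.0] * (m + 1)
        low[i], low[g] = 1.0, -k_w      # V_i - k_w * V_g >= 0
        high = [0.0] * (m + 1)
        high[i], high[g] = -1.0, k_t    # k_t * V_g - V_i >= 0
        rows += [low, high]
    rows.append([0.0] * m + [1.0])      # U >= 0
    return rows, bounds


def satisfies(rows, weights):
    """Check W @ weights >= 0 row by row."""
    return all(sum(w * v for w, v in zip(row, weights)) >= 0 for row in rows)


# Hypothetical 95% confidence limits for two input coefficients.
rows, bounds = cone_ratio_rows([0.2, 0.3], [0.66, 0.78])
print(bounds)
print(satisfies(rows, [0.75, 1.0, 1.0]))
```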

Example
The database used in the present application was obtained from the records of 20 public hospitals in the city of Rio de Janeiro, Brazil. The results of the Linear Regression model were used to identify the restriction intervals for the input variables of the DEA model. Once the $c_i$, $d_i$ limits explained above (Equations (6) and (7)) were defined, a CRS model was built with these W-B weight limits for the DEA variables (Wong & Beasley, 1990). Weight limits were then also obtained through the Cone Ratio method (Charnes et al., 2013), using the 95% confidence intervals of the Linear Regression parameters, as previously described. Thus, three rankings were obtained: one with Wong and Beasley's method, one with the Cone Ratio method and another with the traditional, unrestricted CRS method. Finally, these rankings were compared by means of Spearman rank correlations between their scores.

RESULTS
The data from the analyzed hospitals can be seen in Table 1, and Table 2 presents the restriction intervals together with the Linear Regression results. The proportion of variation explained by the Linear Regression model was R² = 0.89. The independence of the residuals was checked with the Durbin-Watson statistic and accepted (d = 1.98). The residuals had mean zero and SD = 0.95, and the normality and homoscedasticity assumptions were accepted by visual inspection.
Table 3 presents the DMU rankings obtained with each method (W-B, Cone Ratio and unrestricted). As mentioned, the W-B variation limits for the inputs were obtained from Equations (6) and (7) by substituting $V^1_1 = 0.43$ and $V^1_2 = 0.54$ (Linear Regression model coefficients, Table 2), resulting in the restriction intervals [0.47, 0.77] for the input health professionals and [0.23, 0.53] for the number of beds. The rows of the matrix $W$, which represent the relationship between the weights in the Cone Ratio method, were derived from (9) in the form $(1 \; -k_{1w} \; 0)$, $(-1 \; k_{1t} \; 0)$ and $(0 \; 0 \; 1)$, where $V_1$ and $V_2$ represent the weights of health professionals and beds, respectively. Solving the inequalities above for $V_1$ and
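As a consistency check on the reported W-B intervals: with two inputs, the virtual input proportions of each DMU sum to one, so $c_2 = 1 - d_1$ and $d_2 = 1 - c_1$, which reproduces the interval for beds from the one for health professionals:

```latex
\frac{V_1 x_{1j}}{V_1 x_{1j} + V_2 x_{2j}} + \frac{V_2 x_{2j}}{V_1 x_{1j} + V_2 x_{2j}} = 1
\;\Longrightarrow\;
c_2 = 1 - d_1 = 1 - 0.77 = 0.23, \qquad d_2 = 1 - c_1 = 1 - 0.47 = 0.53
```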

DISCUSSION AND CONCLUSION
Few studies have discussed techniques for specifying the limits of weight restrictions in DEA models. Thompson et al. (1986) were the first to propose the use of such restrictions to improve DEA classification metrics, and other studies on this subject (Charnes et al., 2013; Halme et al., 1999) used predefined DMUs as gold standards or looked into ways of incorporating the subjective experience of decision makers into the DEA model. DEA applications require the output to be positively correlated with the inputs, and the Cone Ratio method uses the concept of assurance regions (ARs) to impose constraints on the relative magnitude of the weights (in the present case, as defined by (9)). The main difference between the two methods used here is that, instead of defining ranges for the virtual input proportions, as the W-B method does, the ARs define ranges for the ratios of the weights (Charnes et al., 2013).
It should be noted, however, that both methods still require the subjective input of a decision maker to set the weight bounds, and an important characteristic of the approach introduced here is that the restrictions do not depend on such opinions: the proposed restriction intervals are obtained directly from the available data. Wong and Beasley's approach is known to yield infeasible linear programs in some cases (Sarrico & Dyson, 2004), a limitation that is avoided in the Cone Ratio method (Charnes et al., 2013) and by the Wong and Beasley limits proposed here, as indicated by Equations (6) to (8).
As noted, the mathematical structure of DEA models allows a DMU to be considered efficient because some of the model input variables are assigned null weights. These null weights, however, imply that the corresponding variable is effectively irrelevant to the model and should have been excluded from the analysis. The use of LRM coefficients with the Cone Ratio method precludes this possibility, yielding a more meaningful and interpretable model. Wong and Beasley's restriction (Equation (4)) is based on the values of the regression coefficients, while the Cone Ratio method uses the lower and upper confidence limits of these coefficients.
Although the intervals obtained were similar, they are not mathematically equivalent; that is, one cannot be derived directly from the other.
DEA restriction methods (e.g., Wong & Beasley; Cone Ratio) do not specify the limits themselves, relying instead on subjective information introduced by a decision maker. As mentioned, the original contribution presented here addresses the subjectivity in the process of choosing these limits: the limits are chosen without interference from a decision maker, using a weight search interval defined from the coefficients of a linear regression, which represent the statistical importance of the input variables for the definition of the efficiency scores.
When comparing this method with an unrestricted model, the unrestricted scores were greater than or equal to those of the restricted models. This is expected, since adding constraints to a maximization problem cannot increase its optimal value: in the search for an optimal solution for each DMU, the unrestricted model can favor, for example, a single variable, overestimating its importance and assigning zero weight to another (Unsal & Örkcü, 2016). For instance, this happened to DMU H15, which achieved 100% efficiency in the unrestricted model but had zero weight for the "beds" variable.
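The effect described above can be reproduced on hypothetical data with a small Python sketch for one output and two inputs. Assumptions: $\varepsilon = 0$, weights normalized to $V_1 = t$ and $V_2 = 1 - t$, and an illustrative assurance region $0.5 \le V_1/V_2 \le 1$; the function names and data are not from this study. Because the restricted search covers a subset of the unrestricted interval, no restricted score can exceed the unrestricted one, and the DMU that is efficient only with a zero weight on input 2 loses its efficiency once that weight is bounded away from zero.

```python
from itertools import combinations


def crs_efficiency(x1, x2, y, j0, t_lo=0.0, t_hi=1.0):
    """Ratio-form CRS efficiency for one output and two inputs, with V1 = t and
    V2 = 1 - t restricted to [t_lo, t_hi] (hypothetical helper, epsilon = 0)."""
    n = len(y)
    a = [(x1[j] - x2[j]) / y[j] for j in range(n)]  # slope of input cost per output
    b = [x2[j] / y[j] for j in range(n)]            # intercept of the same line

    def h(t):
        return min(a[j] * t + b[j] for j in range(n)) / (a[j0] * t + b[j0])

    # Maximum lies at an interval end or where two cost lines intersect.
    candidates = {t_lo, t_hi}
    for j, k in combinations(range(n), 2):
        if a[j] != a[k]:
            t = (b[k] - b[j]) / (a[j] - a[k])
            if t_lo < t < t_hi:
                candidates.add(t)
    t_star = max(sorted(candidates), key=h)
    return h(t_star), t_star


def cone_interval(k_w, k_t):
    """k_w <= V1/V2 <= k_t expressed as bounds on t = V1 / (V1 + V2)."""
    return k_w / (1 + k_w), k_t / (1 + k_t)


# Hypothetical data: DMU 4 only reaches h0 = 1 by giving input 2 a zero weight.
x1 = [1, 2, 4, 3, 1]
x2 = [4, 2, 1, 3, 6]
y = [1, 1, 1, 1, 1]
t_lo, t_hi = cone_interval(0.5, 1.0)  # hypothetical assurance region
for j in range(len(y)):
    free, t_free = crs_efficiency(x1, x2, y, j)
    restricted, _ = crs_efficiency(x1, x2, y, j, t_lo, t_hi)
    print(f"DMU {j}: unrestricted {free:.3f} (V2 = {1 - t_free:.2f}), "
          f"restricted {restricted:.3f}")
```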
In conclusion, this paper introduced a novel methodology for defining weight restrictions in DEA models in the one output, multiple inputs case, taking into account the information obtained from a Linear Regression developed from the DEA inputs and output. Since one of the assumptions of the DEA model is the existence of an association between inputs and outputs, constructing the scores from linear regression weights is a natural option, given that an LRM explains part of the variation of the output from the variation of each input. This procedure was used in conjunction with Wong and Beasley's and the Cone Ratio DEA methods, and yielded a consistent and interpretable ranking of 20 public hospitals. The problem of the subjective introduction of restrictions by a decision maker can thus be circumvented.
The Frontier Analyst Professional software (Hussan & Jones, 2009) was used for the DEA model estimation, StatSoft Inc. software (StatSoft, 2013) for the linear regression modeling, and the SIAD platform for the Cone Ratio estimation (Angulo Meza et al., 2005). The input variables of the linear regression model were the number of hospital beds and the number of health professionals (physicians, nurses and administrative personnel), and the dependent variable (output) was the number of hospital admissions in the year 2016 (CNES, 2018; DATASUS, 2018).

Table 1 - Input and output variables for a DEA study, 20 public hospitals in Rio de Janeiro city, Brazil, 2016. Output: admissions. Inputs: number of beds and number of health professionals.

Table 2 - Results of a Linear Regression model with inputs (number of beds, health professionals) and output (admissions), 20 public hospitals, Rio de Janeiro city, Brazil, 2016. Model R² = 0.89 (p = 0.000031).

Table 3 - DMU rankings from the DEA models developed with the help of the unrestricted, the W-B and the Cone Ratio weight restriction methods. In lieu of subjective restrictions introduced by a decision maker, the values of the inputs × output LRM coefficients (with the W-B method) or of the LRM confidence intervals (with the Cone Ratio method) were used to define the weight variation.

… (Unsal & Örkcü, 2016). In our manuscript, a new procedure based on this idea was presented for the one output, multiple inputs case, taking into account the estimated standardized coefficients of a linear regression for the definition of DEA weight search intervals. The present paper used two weight restriction methods widely found in the DEA literature: Wong-Beasley's and the Cone Ratio methods. The former proposes that the decision maker or analyst should set an upper and a lower bound, $[c_i; d_i]$, in order to define suitable limits for the importance of input $i$ to DMU $j$. The latter (Khalili et al., 2010; Charnes et al., 2013) …