SciELO - Scientific Electronic Library Online

Pesquisa Operacional

Print version ISSN 0101-7438; On-line version ISSN 1678-5142

Pesqui. Oper. vol.38 no.3 Rio de Janeiro Sept./Dec. 2018 



Leonardo Macrini1  * 

Antonio Carlos Gonçalves2 

Renan M.V.R. Almeida3 

Carlos Patricio Samanez4  **

1Instituto Três Rios, Departamento de Ciências Econômicas e Exatas (DCEEx), Universidade Federal Rural do Rio de Janeiro, RJ, Brazil. E-mail:

2Instituto de Ciências Exatas, Departamento de Matemática, Universidade Federal Rural do Rio de Janeiro, RJ, Brazil. E-mail:

3Programa de Engenharia Biomédica, COPPE, Universidade Federal do Rio de Janeiro, Rio de Janeiro, RJ, Brazil. E-mail:

4Departamento de Engenharia Industrial, Pontifícia Universidade Católica do Rio de Janeiro, Rio de Janeiro, RJ, Brazil.


This study presents a new approach for the definition of weight restrictions in Data Envelopment Analysis (DEA) for the one output, multiple inputs case, using the results of a Linear Regression model (LRM) developed with the same DEA variables. Thus, the limits of the Wong-Beasley and Cone Ratio methods are chosen without interference from a decision maker, with DEA weight search intervals defined from the estimated standardized coefficients of a linear regression (which represent the statistical importance of the inputs for the definition of the DEA efficiency scores). As an example, weight restrictions for a DEA model (Constant Returns to Scale (CRS)) were obtained through the unrestricted, Wong-Beasley and Cone Ratio methods applied to a dataset consisting of hospital admissions (output), number of beds and number of health professionals (inputs) in the year 2016; and rankings were compared by a Spearman correlation procedure. The regression model had $R^2 = 0.89$ with coefficients 0.43 (professionals) and 0.54 (beds); and the Spearman correlation among rankings was at least $R_S = 0.84$. In conclusion, rankings were consistent and interpretable, and the approach circumvents the need for a subjective intervention by a decision maker when defining weight restrictions in DEA.

Keywords: Data Envelopment Analysis; Weight Restrictions; Wong-Beasley method; Cone Ratio method


Data Envelopment Analysis (DEA) is a widely used methodology for performance evaluation of Decision Making Units (DMUs). Some of the areas where it has been successfully used include the evaluation of schools, health units and financial institutions (Gonçalves et al., 2007; Kontodimopoulos et al., 2009; Gonçalves et al., 2013; Zhou et al., 2018). Its main advantage is the use of actual data for the definition of evaluation standards, so that no extraneous “gold standard” is needed for the comparison. Thus, for example, health units may be compared against the best units among those analyzed, according to predefined, relevant variables.

A problem sometimes faced by DEA is that it depends on an input/output optimization procedure in order to develop its main comparison parameters (the efficiency scores), but the original DEA formulation of Charnes et al. (1978, 2013) allows for a complete variation of the weights used in this procedure. Thus, inadequate “null weights” can be obtained, eliminating important variables from the analysis. While some proposed methodologies are now widely used to deal with this problem (“weight restriction” procedures), most of these approaches are still dependent on the subjective interference of analysts (Mecit & Alp, 2012; Podinovski, 2016).

The present paper proposes a new approach: the use of Linear Regression models to define the weight variation limits without the interference of a decision maker in the one output, multiple inputs DEA case (that is, considering only the statistical importance of the variables for the DEA model). Two classical weight restriction methods are used in conjunction with the technique proposed here: Wong and Beasley’s method (W-B) and the Cone Ratio method (Wong & Beasley, 1990; Charnes et al., 2013). In addition, results for the traditional, unrestricted method are also presented.


2.1 The Constant Returns to Scale (CRS) model

The aim of a DEA model is to compare the relative performance of similar DMUs, taking into account defined inputs and outputs. In the case of multiple inputs and/or outputs, in the original DEA formulation by Charnes et al. (1978) below, efficiency is defined as the ratio between a weighted sum of outputs and a weighted sum of inputs, assuming constant returns to scale and that all input and output levels for all DMUs are strictly positive.

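The CRS model measures the efficiency of DMU $j_0$ and is calculated as:

$$h_0 = \max \frac{\sum_{r=1}^{s} U_r y_{rj_0}}{\sum_{i=1}^{m} V_i x_{ij_0}} \qquad (1)$$

subject to:

$$\frac{\sum_{r=1}^{s} U_r y_{rj}}{\sum_{i=1}^{m} V_i x_{ij}} \le 1, \quad j = 1, \ldots, n \qquad (2)$$

W-B restrictions are then defined as:

$$a_r \le \frac{U_r y_{rj}}{\sum_{r=1}^{s} U_r y_{rj}} \le b_r \qquad (3)$$

$$c_i \le \frac{V_i x_{ij}}{\sum_{i=1}^{m} V_i x_{ij}} \le d_i; \quad U_r, V_i \ge \varepsilon, \quad \forall r \text{ and } i \qquad (4)$$

where $y_{rj}, x_{ij} > 0$ are the values of a DMU (with index $j$), $U_r$ = the weight given to output $r$, $V_i$ = the weight given to input $i$, $n$ = the number of DMUs, $s$ = the number of outputs, $m$ = the number of inputs and $\varepsilon$ = a positive infinitesimal number.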

The Cone Ratio method (Charnes et al., 2013) considers the concept of assurance region (AR) developed by Thompson et al. (1990), and allows setting restrictions (relationships between the weights) such as:

$$k_{iw} \le \frac{V_i}{V_g} \le k_{it}, \ i \ne g \quad \text{and/or} \quad k_{rw} \le \frac{U_r}{U_o} \le k_{rt}, \ r \ne o \qquad (5)$$

where $V_i$ and $V_g$ represent the weights of the inputs, and $U_r$ and $U_o$ the weights of the outputs of the DEA model, respectively. The values $k_{iw}$, $k_{it}$, $k_{rw}$ and $k_{rt}$ can incorporate the opinion of a decision maker or any other information.

In the non-restricted case, the efficiency measure $h_0$ represents the radial factor that determines by how much an input should be changed in order to turn a DMU from inefficient into efficient. In the CRS model with additional input/output restrictions, it is not necessary for $h_0^*$ to actually approach this factor, that is, the input targets may or may not lead a DMU to efficiency (Allen et al., 1997; Charnes et al., 2013). However, these may still be considered as targets for DMU performance improvement.
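For the special case treated in this paper (one output) with only two inputs, the unrestricted ratio model (1)-(2) can be evaluated without a linear programming solver. The sketch below is an illustration only and is not the procedure used by the authors, who relied on dedicated DEA software. It assumes that the input weights are normalized to V = (t, 1 - t), that the ε lower bound is ignored, and that the optimum is searched over the breakpoints of the resulting piecewise function. The function name `crs_efficiency`, its arguments and the choice of six hospitals (rows copied from Table 1) are assumptions made for the example.

```python
from itertools import combinations

# (admissions, beds, health professionals); rows copied from Table 1.
HOSPITALS = {
    "H9": (1501, 83, 603),
    "H13": (343, 126, 981),
    "H14": (1187, 63, 590),
    "H15": (1095, 79, 363),
    "H16": (231, 34, 89),
    "H18": (1645, 97, 565),
}


def crs_efficiency(dmus, target, ratio_bounds=None):
    """CRS efficiency of `target` for one output and two inputs.

    Input weights are normalized to V = (t, 1 - t); this loses no
    generality because the ratio in (1) is invariant to rescaling V.
    `ratio_bounds` = (k_w, k_t) optionally restricts the ratio of the
    first input weight to the second one (a Cone Ratio type constraint).
    The epsilon lower bound on the weights is ignored.
    """
    lo, hi = 0.0, 1.0
    if ratio_bounds is not None:
        k_w, k_t = ratio_bounds
        lo, hi = k_w / (1.0 + k_w), k_t / (1.0 + k_t)

    # Weighted input per unit of output is linear in t: g_j(t) = a_j + b_j * t.
    lines = {name: (x2 / y, (x1 - x2) / y) for name, (y, x1, x2) in dmus.items()}

    # The optimum lies at an interval end or where two DMUs' lines cross.
    candidates = {lo, hi}
    for (a_j, b_j), (a_k, b_k) in combinations(lines.values(), 2):
        if b_j != b_k:
            t = (a_k - a_j) / (b_j - b_k)
            if lo < t < hi:
                candidates.add(t)

    a_0, b_0 = lines[target]

    def score(t):
        # Best (lowest) weighted input per output among all DMUs,
        # divided by the target's own value at the same weights.
        best = min(a + b * t for a, b in lines.values())
        return best / (a_0 + b_0 * t)

    return max(score(t) for t in candidates)


scores = {name: crs_efficiency(HOSPITALS, name) for name in HOSPITALS}
for name, value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {100 * value:.2f}")
```

On this six-hospital subset, H9, H14, H15 and H18 are rated efficient, while H16 and H13 score approximately 86.04 and 14.90, in line with the unrestricted column of Table 3. For more than two inputs or several outputs the breakpoint search no longer applies, and a linear programming formulation would be needed.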

2.2 Multiple Linear Regression

The well-known Multiple Linear Regression model represents the relationship between two or more independent variables (also called predictors, inputs or explanatory variables (Draper & Smith, 2014)) and a dependent (response or output) variable by fitting a linear equation to data. The independent variables are commonly represented as $x_{ij}$ ($i = 1, \ldots, m$; $j = 1, \ldots, n$), and the dependent variable as $y_j$ (Draper & Smith, 2014; Darlington & Hayes, 2016).

The population regression model for the $m$ explanatory variables $x_{1j}, x_{2j}, \ldots, x_{mj}$, taken to be fixed (not random variables), is $y_j = \beta_0 + \beta_1 x_{1j} + \beta_2 x_{2j} + \cdots + \beta_m x_{mj} + e_j$, and describes how the response variable $y_j$ changes with the independent variables. It is assumed that the response variable $y$ has constant variance for each level of the independent variables. The sample-fitted values $\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_m$ estimate $\beta_0, \beta_1, \ldots, \beta_m$.

Since the observed values for $y_j$ have variable population means $\mu_j$, the Multiple Regression model can be expressed as response variable = model function + random error, where “model function” represents $\beta_0 + \beta_1 x_{1j} + \beta_2 x_{2j} + \cdots + \beta_m x_{mj}$, and the random error or residual represents the deviations ($e_j$) of the observed values $y_j$ from their population means $\mu_j$. In the commonly used Ordinary Least Squares (OLS) estimation method, the residuals are assumed to be independent random variables, normally distributed with mean zero and variance $\sigma^2$. The values fitted by the equation $\hat{\beta}_0 + \hat{\beta}_1 x_{1j} + \hat{\beta}_2 x_{2j} + \cdots + \hat{\beta}_m x_{mj}$ are written as $\hat{y}_j$, and the estimated residuals $\hat{e}_j$ are given by $y_j - \hat{y}_j$ (the difference between observed and fitted values in a sample).

The OLS model is estimated by minimizing the sum of the squares of the residuals.
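When both the response and the predictors are standardized, the least-squares solution for two predictors has a closed form in terms of the pairwise Pearson correlations. The sketch below illustrates that case with hypothetical data (not the hospital data of this study); the function names `pearson` and `standardized_ols` are assumptions introduced for the example.

```python
from math import sqrt


def pearson(u, v):
    """Pearson correlation between two equally long sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / sqrt(var_u * var_v)


def standardized_ols(x1, x2, y):
    """Standardized OLS coefficients and R^2 for two predictors."""
    r_y1, r_y2, r_12 = pearson(y, x1), pearson(y, x2), pearson(x1, x2)
    denom = 1.0 - r_12 ** 2  # zero only if x1 and x2 are perfectly collinear
    beta_1 = (r_y1 - r_y2 * r_12) / denom
    beta_2 = (r_y2 - r_y1 * r_12) / denom
    r_squared = beta_1 * r_y1 + beta_2 * r_y2
    return beta_1, beta_2, r_squared


# Hypothetical data, constructed so that y = 2*x1 + 3*x2 holds exactly.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]
y = [2.0 * a + 3.0 * b for a, b in zip(x1, x2)]

beta_1, beta_2, r_squared = standardized_ols(x1, x2, y)
print(f"standardized coefficients: {beta_1:.3f}, {beta_2:.3f}; R^2 = {r_squared:.3f}")
```

Because the toy response is an exact linear function of the predictors, the fit is perfect; with real data the residual sum of squares would be positive and $R^2$ below one.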

In the significance tests for the independent variables of the Multiple Regression model, the null hypothesis is that the coefficient $\beta_i$ is equal to 0, and the test statistic $t$ is the parameter estimate ($\hat{\beta}_i$) divided by its estimated standard deviation ($S(\hat{\beta}_i)$), which follows a Student-t distribution with $n - m - 1$ degrees of freedom when the model is estimated in a sample of size $n$ and has $m$ independent variables. A confidence interval for the parameter $\beta_i$ may then be computed from $\hat{\beta}_i$ as $\hat{\beta}_i \pm t^* S(\hat{\beta}_i)$, with $t^*$ as the respective critical value of the t-distribution for, e.g., 95% confidence (Draper & Smith, 2014; Darlington & Hayes, 2016).
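As a small numerical illustration of the interval $\hat{\beta}_i \pm t^* S(\hat{\beta}_i)$, the sketch below uses the standardized coefficients of the example in this paper (0.43 and 0.54) with $n = 20$ and $m = 2$, hence 17 degrees of freedom. The standard error of 0.18 is an assumption: it is not reported in the paper and was chosen here because it is consistent with the published 95% intervals. The critical value 2.110 is the tabulated two-sided 95% value for 17 degrees of freedom.

```python
def t_statistic(beta_hat, std_error):
    """Test statistic for H0: beta_i = 0."""
    return beta_hat / std_error


def confidence_interval(beta_hat, std_error, t_crit):
    """Interval beta_hat +/- t_crit * std_error."""
    half_width = t_crit * std_error
    return beta_hat - half_width, beta_hat + half_width


T_CRIT_95_DF17 = 2.110    # tabulated Student-t value, 17 d.f., two-sided 95%
ASSUMED_STD_ERROR = 0.18  # assumption: not reported in the paper

COEFFICIENTS = {"health professionals": 0.43, "number of beds": 0.54}

intervals = {
    name: confidence_interval(beta_hat, ASSUMED_STD_ERROR, T_CRIT_95_DF17)
    for name, beta_hat in COEFFICIENTS.items()
}

for name, (low, high) in intervals.items():
    t_value = t_statistic(COEFFICIENTS[name], ASSUMED_STD_ERROR)
    print(f"{name}: t = {t_value:.2f}, 95% CI = [{low:.2f}, {high:.2f}]")
```

Under this assumed standard error the intervals round to [0.05, 0.81] and [0.16, 0.92].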

2.3 Additional restrictions in the CRS model

Additional restrictions to the inputs and outputs given in (3) and (4) may be introduced, reflecting a judgment of the relative importance of the variables in the model, with the virtual input $V_i x_{ij}$ representing the contribution of the input $i$ to DMU $j$ with weight $V_i$. In the special case studied here, where $r = 1$ (one output variable), the contribution of the output to DMU $j$ is 100% ($U_r = 1$).

The regression equation for the inputs $x_{1j}, x_{2j}, \ldots, x_{mj}$ is given by $y_j = \beta_0 + \beta_1 x_{1j} + \beta_2 x_{2j} + \cdots + \beta_m x_{mj} + e_j$, and $\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_m$ are the mentioned sample estimates of the population parameters $\beta_0, \beta_1, \ldots, \beta_m$. Then, the $c_i$ and $d_i$ limits are initially defined by substituting the positive coefficients of the standardized regression variables, that is, $\hat{\beta}_i = V_i^1$, in the Wong-Beasley expression:

$$\frac{V_i x_{ij}}{\sum_{i=1}^{m} V_i x_{ij}}, \quad \forall i, j$$

($V_i^1$: the regression-estimated weight for input $i$).

After replacing the coefficients of each variable in the above expression by their regression estimates, a set of $n$ values is obtained for each input variable. The minimum and the maximum of these values specify the limits defined by the Wong and Beasley restrictions. Thus, for $j = 1, \ldots, n$:

$$c_i = \min_j \frac{V_i^1 x_{ij}}{\sum_{i=1}^{m} V_i^1 x_{ij}} \qquad (6)$$

$$d_i = \max_j \frac{V_i^1 x_{ij}}{\sum_{i=1}^{m} V_i^1 x_{ij}} \qquad (7)$$

Solving (4) for $V_i$ and introducing $V_i^1$ in $\sum_{i=1}^{m} V_i x_{ij}$, one obtains:

$$\frac{Y c_i}{x_{ij}} \le V_i \le \frac{Y d_i}{x_{ij}} \qquad (8)$$

with $Y = \sum_{i=1}^{m} V_i^1 x_{ij}$.

Since the limits are chosen according to minimum and maximum values (Equations (6) and (7)), they tend not to be too close. Therefore, this procedure avoids the problem that may arise with very close limits, which imply a reduction of the set of production possibilities (topological space) of the DEA problem and, consequently, the infeasibility of the DEA linear program (Sarrico & Dyson, 2004). Also, given that statistically significant estimates $V_i^1 > 0$ indicate the importance of the variables in the DEA model, these estimates may be used to generate restriction intervals for the virtual input proportions.
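The limits in (6) and (7) and the weight bounds in (8) can be sketched as follows. This is an illustration under stated assumptions, not the authors' computation: the regression weights 0.43 and 0.54 are applied to the raw input levels of four hospitals copied from Table 1, whereas the scaling of the variables used in the paper is not specified, so the resulting limits are not expected to match the intervals reported in the example. The function names `wong_beasley_limits` and `weight_bounds` are hypothetical.

```python
# Regression-estimated weights V_i^1: (health professionals, number of beds).
REG_WEIGHTS = (0.43, 0.54)

# (health professionals, beds) for H1, H9, H15 and H16, copied from Table 1.
INPUTS = [(2039, 302), (603, 83), (363, 79), (89, 34)]


def wong_beasley_limits(inputs, reg_weights):
    """Limits c_i (eq. 6) and d_i (eq. 7) over all DMUs."""
    shares = []
    for x in inputs:
        virtual = [v * x_i for v, x_i in zip(reg_weights, x)]
        total = sum(virtual)
        shares.append([v_i / total for v_i in virtual])
    m = len(reg_weights)
    c = [min(s[i] for s in shares) for i in range(m)]
    d = [max(s[i] for s in shares) for i in range(m)]
    return c, d


def weight_bounds(x, reg_weights, c, d):
    """Bounds on each V_i for one DMU with input levels x (eq. 8)."""
    big_y = sum(v * x_i for v, x_i in zip(reg_weights, x))
    return [(big_y * c_i / x_i, big_y * d_i / x_i) for x_i, c_i, d_i in zip(x, c, d)]


c, d = wong_beasley_limits(INPUTS, REG_WEIGHTS)
print("c =", [round(v, 3) for v in c], "d =", [round(v, 3) for v in d])
for x in INPUTS:
    print(x, [(round(lo, 4), round(hi, 4)) for lo, hi in weight_bounds(x, REG_WEIGHTS, c, d)])
```

With two inputs the virtual shares of each DMU add up to one, so the lower limit of one input and the upper limit of the other are complementary; the intervals reported in the example, [0.47, 0.77] and [0.23, 0.53], show the same pattern.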

2.4 Confidence intervals for the Cone Ratio method

The upper and lower bounds of the (95%) regression confidence intervals may be used to define weight restrictions through the Cone Ratio method when $r = 1$. If $k_{iw}$ and $k_{it}$ are the minimum and maximum of these weight values, a relationship of the type $k_{iw} \le V_i / V_g \le k_{it}$ ($i \ne g$; $t \ne w$) may be defined, where $V_i$ and $V_g$ represent the input weights in the DEA model. In this case, $k_{iw}$ and $k_{it}$ are respectively the minimum and the maximum values in the set generated by all pairs of lower confidence limit ratios and upper confidence limit ratios. For instance, if two variables were available, one would have: $k_{iw} = \min(LL_1/LL_2;\ UL_1/UL_2)$; $k_{it} = \max(LL_1/LL_2;\ UL_1/UL_2)$ ($LL$ and $UL$ are respectively the lower and upper confidence limits in the regression model). For the general case of $m$ inputs, $2\binom{m}{2}$ relationships of the type $(\hat{\beta}_i \pm t S(\hat{\beta}_i)) / (\hat{\beta}_g \pm t S(\hat{\beta}_g))$ are available and, consequently, $2\binom{m}{2}$ inequalities (restrictions of type (5) for inputs) are generated as:

$$V_i - k_{iw} V_g \ge 0 \quad \text{and} \quad -V_i + k_{it} V_g \ge 0 \qquad (9)$$

It is possible, then, to specify the weight restrictions as a matrix $W (v, u) \ge 0$, in which $v = (V_i, V_g)$ is the input vector of weights and $u = (U)$ is the single-element vector representing the output of the model. Thus, $W$ is formed with rows $(1, -k_{iw}, 0)$, $(-1, k_{it}, 0)$ and $(0, 0, 1)$, corresponding to the weight restrictions and to the single-value vector above.
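For two inputs, the construction of $k_{iw}$, $k_{it}$ and of the rows of $W$ can be sketched as below, using the 95% confidence limits reported in Table 2 (health professionals as input 1, number of beds as input 2). The helper names `cone_ratio_bounds`, `restriction_rows` and `satisfies` are assumptions introduced for the illustration.

```python
def cone_ratio_bounds(ci_i, ci_g):
    """k_w and k_t from the confidence limits of inputs i and g."""
    (ll_i, ul_i), (ll_g, ul_g) = ci_i, ci_g
    ratios = (ll_i / ll_g, ul_i / ul_g)
    return min(ratios), max(ratios)


def restriction_rows(k_w, k_t):
    """Rows of W acting on the weight vector (V_i, V_g, U), as in (9)."""
    return [(1.0, -k_w, 0.0), (-1.0, k_t, 0.0), (0.0, 0.0, 1.0)]


def satisfies(rows, weights):
    """True if W applied to the weight vector is componentwise >= 0."""
    return all(sum(w * z for w, z in zip(row, weights)) >= 0.0 for row in rows)


CI_PROFESSIONALS = (0.05, 0.81)  # 95% CI from Table 2
CI_BEDS = (0.16, 0.92)           # 95% CI from Table 2

k_w, k_t = cone_ratio_bounds(CI_PROFESSIONALS, CI_BEDS)
W = restriction_rows(k_w, k_t)

print(f"k_w = {k_w:.2f}, k_t = {k_t:.2f}")
for row in W:
    print(row)
```

A weight vector such as (0.5, 1.0, 1.0) lies inside the resulting cone, whereas equal input weights (1.0, 1.0, 1.0) violate the second row.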

2.5 Example

The database used in the present application was obtained from hospital records in 20 public hospitals in the city of Rio de Janeiro, Brazil. The Frontier Analyst Professional® software (Hussan & Jones, 2009) was used for the DEA model estimation, the Statsoft Inc software (Statsoft, 2013) was used for the linear regression modeling, and the SIAD platform was used for the Cone Ratio estimation (Angulo Meza et al., 2005). The input variables of the linear regression model were the number of hospital beds and the number of health professionals (physicians, nurses and other administrative personnel), and the dependent variable (output) was the number of hospital admissions in the year 2016 (CNES, 2018; DATASUS, 2018).

The results of the Linear Regression model were used in order to identify the restriction intervals for the input variables of the DEA model. Once the $c_i$, $d_i$ limits explained above (Equations (6) and (7)) were defined, a CRS model was built, providing the W-B weight limits for the DEA variables (Wong & Beasley, 1990). Following that, weight limits were also obtained through the Cone Ratio method (Charnes et al., 2013), using the 95% confidence intervals of the Linear Regression parameters, as previously described. Thus, three classification results were obtained, one using Wong and Beasley’s method, one with the Cone Ratio and another one with the unrestricted, traditional CRS method. Finally, these rankings were compared by means of a Spearman correlation on their scores.


The data from the analyzed hospitals can be seen in Table 1, and Table 2 presents the restriction intervals together with the Linear Regression results. The percentage of explained variation in the Linear Regression model was $R^2 = 0.89$. The residual independence hypothesis was checked through the Durbin-Watson statistic and accepted (d statistic = 1.98). Residuals had mean = zero and SD = 0.95, and the Normality and homoscedasticity assumptions were accepted by visual inspection.

Table 1 Input and output variables for a DEA study, 20 public hospitals in Rio de Janeiro city, Brazil, 2016. Output: Admissions. Inputs: number of beds and health professionals

Hospital   Admissions (output)   Surgery beds (input)   Health professionals (input)
H1         3753                  302                    2039
H2         3803                  293                    2159
H3         2464                  233                    1288
H4         4518                  304                    2163
H5         2567                  232                    951
H6         4883                  287                    2665
H7         3277                  250                    1569
H8         3475                  304                    1474
H9         1501                  83                     603
H10        4336                  376                    1931
H11        2162                  166                    1609
H12        2282                  175                    1107
H13        343                   126                    981
H14        1187                  63                     590
H15        1095                  79                     363
H16        231                   34                     89
H17        395                   47                     401
H18        1645                  97                     565
H19        1401                  108                    732
H20        1849                  122                    1438

Table 2 Results of a Linear Regression model with inputs (number of beds, health professionals) and output (admissions), 20 public hospitals, Rio de Janeiro city, Brazil, 2016. Model $R^2 = 0.89$ (p = 0.000031).

Input                  Standardized coefficient   p-value   95% CI
Health professionals   0.43                       0.027     [0.05, 0.81]
Number of beds         0.54                       0.007     [0.16, 0.92]

Table 3 presents the DMU rankings according to the methods used (W-B, Cone Ratio and unrestricted). As mentioned, the W-B limits of variation of the inputs were obtained from Equations (6) and (7), substituting $V_1 = 0.43$ and $V_2 = 0.54$ (Linear Regression model coefficients, Table 2), resulting in the restriction intervals [0.47, 0.77] for the input health professionals and [0.23, 0.53] for the number of beds. The rows of the matrix $W$, which represent the relation between the weights in the Cone Ratio method, were derived from (9) as:

$$0.31 \le \frac{V_1}{V_2} \le 0.88$$

where $V_1$ and $V_2$ represent the weights of health professionals and beds, respectively. Solving the inequalities above for $V_1$ and $V_2$, one obtains $V_1 - 0.31 V_2 \ge 0$ and $-V_1 + 0.88 V_2 \ge 0$, in which the coefficients are the rows of the matrix $W$.

Table 3 DEA efficiency scores for 20 public hospitals developed with the help of the unrestricted, the W-B and Cone Ratio weight restriction methods. In lieu of subjective restrictions introduced by a decision maker, the values of the inputs × output LRM coefficients (with the W-B method) or of the LRM CIs (with the Cone Ratio) were used for defining weight variation. 

Hospital W-B Cone Ratio Unrestricted
H9 100.00 100.00 100.00
H18 100.00 100.00 100.00
H14 97.18 100.00 100.00
H15 91.41 88.12 100.00
H6 87.98 90.44 90.44
H4 82.72 82.72 82.72
H5 76.62 72.64 89.48
H7 75.72 75.72 75.72
H12 75.16 75.16 75.16
H8 73.58 71.52 78.83
H19 73.27 73.27 73.27
H10 72.29 70.88 75.75
H2 71.52 71.62 71.62
H1 70.28 70.27 70.28
H11 66.56 69.13 65.13
H3 63.99 63.47 65.23
H16 55.33 49.41 86.04
H20 47.68 47.68 47.68
H17 44.54 45.33 45.33
H13 14.29 14.90 14.90

The Spearman correlation test indicated a correlation between rankings ($R_S$: W-B × Cone Ratio = 0.98, p < 0.0001; W-B × Unrestricted = 0.89, p < 0.0001; Cone Ratio × Unrestricted = 0.84, p < 0.0001), showing the convergence of the restricted rankings with the unrestricted model.
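The rank comparison can be sketched with a plain implementation of the Spearman coefficient (Pearson correlation of average ranks, so that tied scores such as several 100.00 values are handled). The sketch below is an illustration on a subset of six hospitals copied from Table 3, so its result is not comparable with the coefficients reported for all 20 hospitals; the function names `average_ranks` and `spearman` are assumptions.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2.0 + 1.0
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def spearman(u, v):
    """Spearman correlation; undefined if either sequence is constant."""
    ru, rv = average_ranks(u), average_ranks(v)
    n = len(ru)
    mean_u, mean_v = sum(ru) / n, sum(rv) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(ru, rv))
    var_u = sum((a - mean_u) ** 2 for a in ru)
    var_v = sum((b - mean_v) ** 2 for b in rv)
    return cov / (var_u * var_v) ** 0.5


# Scores copied from Table 3 for H9, H15, H6, H5, H16 and H13.
WB_SCORES = [100.00, 91.41, 87.98, 76.62, 55.33, 14.29]
UNRESTRICTED_SCORES = [100.00, 100.00, 90.44, 89.48, 86.04, 14.90]

rho = spearman(WB_SCORES, UNRESTRICTED_SCORES)
print(f"Spearman correlation on the subset: {rho:.3f}")
```

The p-values reported in the paper would additionally require a significance test, which is not covered by this sketch.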


Few studies have discussed techniques dealing with the problem of specifying limit restrictions in DEA models. Thompson et al. (1986) were the first to propose the use of such restrictions in order to improve DEA classification metrics, and other studies on this subject (Charnes et al., 2013; Halme et al., 1999) tried to use predefined DMUs as gold standards or looked into ways to incorporate the subjective experiences of decision makers into the DEA model. However, DEA applications demand an output to be positively correlated to inputs, and this characteristic allowed for the development of a number of weight restriction procedures based on the correlation of inputs and outputs (e.g. Gonçalves et al., 2007; Mecit & Alp, 2012; Gonçalves et al., 2013; Unsal & Örkcü, 2016). In the present paper, a new procedure based on this idea was presented for the one output, multiple inputs case, taking into account the estimated standardized coefficients of a linear regression for the definition of DEA weight search intervals.

The present paper used two weight restriction methods widely found in the DEA literature: Wong-Beasley’s and the Cone Ratio methods. The former proposes that the decision maker or analyst should set an upper and a lower bound, $[c_i; d_i]$, in order to define suitable limits for the importance of the input $i$ to the DMU $j$. The latter (Khalili et al., 2010; Charnes et al., 2013) uses the concept of assurance regions (ARs) to impose constraints on the relative magnitude of the weights (in the present case as defined by (9)). As mentioned, the main difference between these two methods is that, instead of defining absolute weight ranges, the ARs define ranges for the ratio of the weights (Charnes et al., 2013).

It should be noted, however, that both methods still require the subjective input of a decision maker in the setting of weight bounds, and an important characteristic of the approach introduced here is that restrictions do not depend on such opinions. The restriction intervals proposed are obtained directly from the available variables. It is known that Wong and Beasley’s approach can yield unsolvable linear problems (Sarrico & Dyson, 2004), a limitation that is avoided in the Cone Ratio method (Charnes et al., 2013) and by the Wong and Beasley limits proposed here, as indicated by Equations (6)-(8).

As said, the mathematical structure of DEA models allows a DMU to be considered efficient because some of the model input variables are assigned null weights. These null weights, however, imply that the offending variable is actually not relevant for the model, and should have been excluded from the analysis. The use of LRM coefficients associated with the Cone Ratio method precludes this possibility, yielding a more meaningful and interpretable model. Wong and Beasley’s estimation (Equation 4) is based on the values of the regression coefficients, while the Cone Ratio method uses the lower and upper confidence interval limits of these coefficients. Although the intervals obtained were similar, they are not mathematically equivalent, that is, it is not possible to directly arrive at one of them starting from the other.

DEA restriction methods (e.g. Wong & Beasley; Cone Ratio) do not directly specify limits, relying, instead, on subjective information introduced by a decision-maker. As mentioned, the original treatment presented here is concerned with the subjectivity in the process of choosing these limits. Thus, here, limits are chosen without interference from a decision maker, allowing for a weight search interval defined from the coefficients of a linear regression, which represent the statistical importance of the input variables for the definition of the efficiency scores.

When comparing this method with an unrestricted model, it could be seen that unrestricted scores were greater than or equal to those of the restricted models. This is due to the fact that, in the search for an optimal solution for each DMU, the model can favor, for example, a single variable, overestimating its importance and assigning zero weight to another (Unsal & Örkcü, 2016). For instance, this happened to DMU H15, which achieved 100% efficiency in the unrestricted model, but which had weight zero for the “beds” variable.
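The mechanism described above can be illustrated with hospital H16, whose unrestricted score in Table 3 (86.04) is well above its restricted scores. The sketch below is a simplification: it compares H16 only with H15 (rather than with all DMUs) and varies the share of the total input weight placed on beds. The function name `relative_score` and the grid of weight shares are assumptions made for the example; the data rows are copied from Table 1.

```python
# (admissions, beds, health professionals); rows copied from Table 1.
H15 = (1095, 79, 363)
H16 = (231, 34, 89)


def relative_score(dmu, peer, beds_share):
    """Output per weighted input of `dmu` relative to `peer`.

    `beds_share` in [0, 1] is the weight on beds; the remaining
    weight (1 - beds_share) is placed on health professionals.
    """

    def productivity(unit):
        admissions, beds, professionals = unit
        weighted_input = beds_share * beds + (1.0 - beds_share) * professionals
        return admissions / weighted_input

    return productivity(dmu) / productivity(peer)


SHARES = (0.0, 0.25, 0.5, 0.75, 1.0)
profile = [(share, relative_score(H16, H15, share)) for share in SHARES]

for share, value in profile:
    print(f"beds weight share {share:.2f}: relative score {100 * value:.2f}")
```

In this pairwise comparison the score of H16 is highest when the beds variable receives zero weight and falls steadily as more weight is placed on beds, which is the kind of behavior that weight restrictions are meant to prevent.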

In conclusion, this paper introduced a novel methodology for defining weight restrictions in DEA models in the one output, multiple inputs case, taking into account the information obtained from a Linear Regression developed from the DEA inputs and output. Since one of the assumptions of the DEA model is the existence of an association between inputs and outputs, the construction of the scores based on linear regression weights is a natural option, given that an LRM explains part of the variation of the outputs from the variation of each input. This procedure was used in conjunction with Wong and Beasley’s and the Cone Ratio DEA methods, and yielded a consistent and interpretable ranking of 20 public hospitals. The problem of the subjective introduction of restrictions by a decision maker can thus be circumvented.


1 ANGULO ML, BIONDI NL, SOARES DE MELLO JCB, GOMES EG. 2005. ISYDS - Integrated System for Decision Support: A software package for data envelopment analysis model. Pesquisa Operacional, 25(3): 493-503.

2 ALLEN R, ATHANASSOPOULOS A, DYSON RG & THANASSOULIS E. 1997. Weights restrictions and value judgments in Data Envelopment Analysis: evolution, development and future direction. Annals of Operations Research, 73: 13-34.

3 CHARNES A, COOPER WW & RHODES E. 1978. Measuring the efficiency of decision-making units. European Journal of Operational Research, 6: 429-444.

4 CHARNES A, COOPER WW, LEWIN AY & SEIFORD LM. 2013. Data envelopment analysis: Theory, methodology, and applications. Springer, New York.

6 CYLUS J, PAPANICOLAS I & SMITH PC. 2017. Using data envelopment analysis to address the challenges of comparing health system efficiency. Global Policy, 8(S2): 60-68.

7 DARLINGTON RB & HAYES AF. 2016. Regression analysis and linear models: Concepts, applications, and implementation. Guilford Publications, New York.

8 DRAPER NR & SMITH H. 2014. Applied regression analysis. Somerset: Wiley Series in Probability and Statistics, New York.

10 MECIT ED & ALP I. 2012. A new proposed model of restricted data envelopment analysis by correlation coefficients. Applied Mathematical Modelling, 37: 3407-3425.

11 GONÇALVES C, LINS MPE & ALMEIDA RMVR. 2007. Data Envelopment Analysis for evaluating public hospitals in Brazilian state capitals. Revista de Saúde Pública, 41: 427-435.

12 GONÇALVES C, ALMEIDA RMVR, LINS MPPE & SAMANEZ CP. 2013. Canonical correlation analysis in the definition of weight restrictions for data envelopment analysis. J Applied Statistics, 40: 1-12.

13 HALME M, JORO T & KOHONEN PWJ. 1999. A value efficiency approach to incorporating preference information in Data Envelopment Analysis. Management Science, 45: 103-115.

14 HUSSEIN A & JONES M. 2015. Frontier Analyst Professional software White Paper - version 4.0. Banxia Software. Access on June 2017.

15 KHALILI M, CAMANHO AS, PORTELA MCAS & ALIREZAEE MR. 2010. The measurement of relative efficiency using Data Envelopment Analysis with assurance regions that link inputs and outputs. European Journal of Operational Research, 203: 761-770.

16 KONTODIMOPOULOS N, PAPATHANASIOU ND, TOUNTAS Y & NIAKAS D. 2009. Separating managerial inefficiency from influences of the operating environment: An application in dialysis. Journal of Medical Systems, 34: 397-405.

17 UNSAL MG & ÖRKCÜ HH. 2016. A note for weight restriction in the ARIII model based on correlation coefficients. Applied Mathematical Modelling, 40: 10820-10827.

18 PODINOVSKI VV. 2016. Optimal weights in DEA models with weight restrictions. European Journal of Operational Research, 254: 916-924.

19 SARRICO CS & DYSON RG. 2004. Restricting virtual weights in Data Envelopment Analysis. European Journal of Operational Research, 59: 17-34.

20 STATSOFT. ELECTRONIC STATISTICS TEXTBOOK. 2013. StatSoft. Access on February 2018. Tulsa, OK.

21 THOMPSON RG, SINGLETON FD, THRALL RM & SMITH BA. 1986. Comparative site evaluations for locating a high-energy physics lab in Texas. Interfaces, 16: 35-49.

22 THOMPSON RG, LANGEMEIER LN, LEE CT, LEE F & THRALL RM. 1990. The role of multiplier bounds in efficiency analysis with application to Kansas farming. Journal of Econometrics, 46: 93-108.

23 WONG YHB & BEASLEY JE. 1990. Restricting weight flexibility in Data Envelopment Analysis. Journal of the Operational Research Society, 41: 829-835.

24 ZHOU H, YANG Y, CHEN Y & ZHU J. 2018. Data envelopment analysis application in sustainability: The origins, development and future directions. European Journal of Operational Research, 264(1): 1-16.

Received: February 02, 2017; Accepted: August 22, 2018

*Corresponding author.



This is an open-access article distributed under the terms of the Creative Commons Attribution License.