1 INTRODUCTION

Data Envelopment Analysis (DEA) is a widely used methodology for evaluating the performance of Decision Making Units (DMUs). It has been successfully applied to the evaluation of schools, health units and financial institutions (Gonçalves et al., 2007; Kontodimopoulos et al., 2009; Gonçalves et al., 2013; Zhou et al., 2018). Its main advantage is that the evaluation standards are defined from the observed data themselves, so no external "gold standard" is needed for the comparison. For example, health units may be compared against the best units among those analyzed, according to predefined, relevant variables.

A problem sometimes faced by DEA is that its main comparison parameters (the *efficiency scores*) result from an optimization over input and output weights, and the original DEA formulation of Charnes et al. (1978, 2013) leaves these weights completely free to vary. As a result, inadequate "null weights" can be obtained, effectively eliminating important variables from the analysis. Although some methodologies for dealing with this problem ("weight restriction" procedures) are now widely used, most of them still depend on the subjective judgment of analysts (Mecit & Alp, 2012; Podinovski, 2016).

The present paper proposes a new approach for the one output, multiple inputs DEA case: the use of Linear Regression models to define the limits of weight variation without the interference of a decision maker, that is, considering only the statistical importance of the variables for the DEA model. Two classical weight restriction methods are used in conjunction with the proposed technique, Wong and Beasley's method (W-B; Wong & Beasley, 1990) and the Cone Ratio method (Charnes et al., 2013), and results for the traditional, unrestricted method are also presented.

2 MATERIALS AND METHODS

2.1 The Constant Returns to Scale (CRS) model

The aim of a DEA model is to compare the relative performance of similar DMUs, taking into account defined inputs and outputs. In the original DEA formulation by Charnes et al. (1978), presented below, efficiency with multiple inputs and/or outputs is defined as the ratio between a weighted sum of outputs and a weighted sum of inputs, assuming constant returns to scale and strictly positive input and output levels for all DMUs.

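The CRS model measures the efficiency $h_0$ of DMU $j_0$. In its linear (multiplier) input-oriented form, it is calculated as:

$$\max \; h_0 = \sum_{r=1}^{s} U_r \, y_{rj_0} \tag{1}$$

subject to:

$$\sum_{i=1}^{m} V_i \, x_{ij_0} = 1, \qquad \sum_{r=1}^{s} U_r \, y_{rj} - \sum_{i=1}^{m} V_i \, x_{ij} \le 0, \quad j = 1, \ldots, n, \qquad U_r, V_i \ge \varepsilon. \tag{2}$$

W-B restrictions are then defined as:

$$c_r \le \frac{U_r \, y_{rj}}{\sum_{q=1}^{s} U_q \, y_{qj}} \le d_r, \quad r = 1, \ldots, s; \; j = 1, \ldots, n, \tag{3}$$

$$c_i \le \frac{V_i \, x_{ij}}{\sum_{k=1}^{m} V_k \, x_{kj}} \le d_i, \quad i = 1, \ldots, m; \; j = 1, \ldots, n, \tag{4}$$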

where $y_{rj}, x_{ij} > 0$ are the output and input values of DMU $j$, $U_r$ is the weight given to output $r$, $V_i$ is the weight given to input $i$, $n$ is the number of DMUs, $s$ is the number of outputs, $m$ is the number of inputs, $\varepsilon$ is a positive infinitesimal number, and $c_r, d_r$ ($c_i, d_i$) are the lower and upper bounds on the proportion of the virtual output (input) in the total.

The Cone Ratio method (Charnes et al., 2013) builds on the concept of assurance region (AR) developed by Thompson et al. (1990) and allows setting restrictions (relationships between the weights) such as:

$$k_{iw} \le \frac{V_i}{V_g} \le k_{it}, \qquad k_{rw} \le \frac{U_r}{U_o} \le k_{rt},$$

where $V_i$ and $V_g$ represent the weights of the inputs, and $U_r$ and $U_o$ the weights of the outputs of the DEA model. The values $k_{iw}$, $k_{it}$, $k_{rw}$ and $k_{rt}$ can incorporate the opinion of a decision maker or any other information.

In the non-restricted case, the efficiency measure $h_0$ represents the radial factor that determines by how much the inputs of a DMU should be proportionally reduced in order to turn it from inefficient into efficient. In the CRS model with additional input/output restrictions, this radial interpretation need not hold, and the projected input levels are not necessarily attainable within the original production possibility set (Allen et al., 1997; Charnes et al., 2013). However, these projections may still be considered as targets for DMU performance improvement.
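For the one output, two inputs setting of the example in Section 2.5, the multiplier problem can be solved exactly without a general-purpose LP solver. The Python sketch below is our own illustration, not the software used in this study, and it assumes $\varepsilon = 0$, i.e. the weights are only required to be non-negative. It writes the input weights as $V = (t, 1 - t)$, uses the fact that for fixed $t$ the best output weight is $U = \min_j V \cdot x_j / y_j$, and evaluates the score only at the interval endpoints and at pairwise intersections of the lines $V \cdot x_j / y_j$: on each linear piece of the lower envelope the score is a ratio of two linear functions of $t$, hence monotone, so its maximum lies at one of these candidates. The function name and the `ratio_bounds` argument, which imposes an assurance-region bound $k_w \le V_1/V_2 \le k_t$ of the Cone Ratio type, are hypothetical.

```python
from itertools import combinations
from math import inf


def crs_scores_two_inputs(y, x1, x2, ratio_bounds=(0.0, inf)):
    """Input-oriented CRS scores for one output and two inputs (epsilon = 0).

    ratio_bounds = (k_w, k_t) restricts the weight ratio V1 / V2 to [k_w, k_t].
    """
    n = len(y)
    k_w, k_t = ratio_bounds
    # Parameterize V = (t, 1 - t), so that V1 / V2 = t / (1 - t).
    t_lo = k_w / (1.0 + k_w)
    t_hi = 1.0 if k_t == inf else k_t / (1.0 + k_t)

    # For DMU j, V.x_j / y_j = a[j] * t + b[j] is linear in t.
    a = [(x1[j] - x2[j]) / y[j] for j in range(n)]
    b = [x2[j] / y[j] for j in range(n)]

    # Candidates: interval endpoints plus every pairwise line intersection in
    # the interval (a superset of the breakpoints of the lower envelope).
    candidates = {t_lo, t_hi}
    for j, k in combinations(range(n), 2):
        if a[j] != a[k]:
            t = (b[k] - b[j]) / (a[j] - a[k])
            if t_lo <= t <= t_hi:
                candidates.add(t)

    # Lower envelope min_j V.x_j / y_j, i.e. the largest feasible U for this V.
    envelope = {t: min(a[j] * t + b[j] for j in range(n)) for t in candidates}

    scores = []
    for o in range(n):
        # Rescaling V so that V.x_o = 1 gives h_o = y_o * envelope / (V.x_o).
        best = max(
            y[o] * envelope[t] / (t * x1[o] + (1.0 - t) * x2[o])
            for t in candidates
        )
        scores.append(best)
    return scores
```

With the Table 1 data (taking $x_1$ = health professionals and $x_2$ = beds) and no restrictions, this sketch gives about 0.7028 for H1, 0.1490 for H13 and 0.8604 for H16, in line with the unrestricted column of Table 3. We have not checked how the inputs were scaled in the reported restricted runs, so scores obtained here with `ratio_bounds=(0.31, 0.88)` should not be expected to reproduce the Cone Ratio column.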

2.2 Multiple Linear Regression

The well-known Multiple Linear Regression model represents the relationship between two or more independent variables (also called *predictors*, *inputs* or *explanatory* variables; Draper & Smith, 2014) and a dependent (*response* or *output*) variable by fitting a linear equation to the data. The independent variables are commonly represented as $x_{ij}$ ($i = 1, \ldots, m$; $j = 1, \ldots, n$) and the dependent variable as $y_j$ (Draper & Smith, 2014; Darlington & Hayes, 2016).

The population regression model for the $m$ explanatory variables $x_{1j}, x_{2j}, \ldots, x_{mj}$, taken to be fixed (not random variables), is

$$y_j = \beta_0 + \beta_1 x_{1j} + \beta_2 x_{2j} + \cdots + \beta_m x_{mj} + e_j,$$

and describes how the response variable $y_j$ changes with the independent variables. It is assumed that the response variable $y$ has constant variance for each level of the independent variables. The sample-fitted values $\hat{y}_j$ are obtained from the estimates $\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_m$ of the parameters $\beta_0, \beta_1, \ldots, \beta_m$.

Since the observed values $y_j$ vary about their population means $\mu_j$, the Multiple Regression model can be expressed as **response variable** = **model function** + **random error**, where "model function" represents $\beta_0 + \beta_1 x_{1j} + \beta_2 x_{2j} + \cdots + \beta_m x_{mj}$, and the random error (residual) represents the deviations $e_j$ of the observed values $y_j$ from their population means $\mu_j$. In the commonly used Ordinary Least Squares (OLS) estimation method, the residuals are assumed to be independent random variables, normally distributed with mean zero and variance $\sigma^2$. The values fitted by the equation, $\hat{y}_j = \hat\beta_0 + \hat\beta_1 x_{1j} + \cdots + \hat\beta_m x_{mj}$, estimate the means $\mu_j$, and the sample residuals are $y_j - \hat{y}_j$.

The OLS model is estimated by minimizing the sum of the squares of the residuals.

In the significance tests for the independent variables of the Multiple Regression model, the null hypothesis is that the coefficient $\beta_i$ is equal to 0, and the test statistic $t = \hat\beta_i / S(\hat\beta_i)$ is based on the parameter estimate $\hat\beta_i$ and its standard error $S(\hat\beta_i)$. Under the null hypothesis, $t$ follows a $t_{(n-m-1)}$ distribution when the model is estimated in a sample of size $n$ and has $m$ independent variables. A confidence interval for the parameter $\beta_i$ may then be computed as $\hat\beta_i \pm t^{*} S(\hat\beta_i)$, with $t^{*}$ as the respective critical value of the $t$-distribution for, e.g., 95% confidence (Draper & Smith, 2014; Darlington & Hayes, 2016).
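These quantities can be computed directly for the two-predictor model of the example ($m = 2$). The plain-Python sketch below is an illustration, not the Statsoft routine used in the study: it obtains the standardized coefficients from the correlation matrix, uses the standard error $\sqrt{(1 - R^2)/((n - m - 1)(1 - r_{12}^2))}$, which is the same for both standardized coefficients when $m = 2$, and builds $\hat\beta_i \pm t^{*} S(\hat\beta_i)$. The critical value $t^{*}$ must be supplied by the caller (about 2.110 for 95% confidence with $n - m - 1 = 17$ degrees of freedom); the function name is hypothetical.

```python
from math import copysign, inf, sqrt
from statistics import mean, stdev


def standardized_ols_two_predictors(y, x1, x2, t_crit):
    """R^2 plus standardized coefficients, t statistics and CIs (m = 2)."""
    n, m = len(y), 2

    def corr(u, v):
        # Pearson correlation with sample (n - 1) normalization.
        mu, mv = mean(u), mean(v)
        num = sum((p - mu) * (q - mv) for p, q in zip(u, v))
        return num / ((n - 1) * stdev(u) * stdev(v))

    r_y1, r_y2, r_12 = corr(y, x1), corr(y, x2), corr(x1, x2)
    # Solve the standardized normal equations [[1, r12], [r12, 1]] beta = r_xy.
    beta1 = (r_y1 - r_y2 * r_12) / (1 - r_12 ** 2)
    beta2 = (r_y2 - r_y1 * r_12) / (1 - r_12 ** 2)
    r2 = beta1 * r_y1 + beta2 * r_y2
    # Shared standard error; clamp tiny negative round-off when R^2 is ~1.
    se = sqrt(max(0.0, 1 - r2) / ((n - m - 1) * (1 - r_12 ** 2)))
    results = []
    for beta in (beta1, beta2):
        t = beta / se if se > 0 else copysign(inf, beta)
        results.append({"beta": beta, "t": t,
                        "ci": (beta - t_crit * se, beta + t_crit * se)})
    return r2, results
```

A shared standard error is consistent with Table 2, where both 95% confidence intervals have the same half-width (0.38).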

2.3 Additional restrictions in the CRS model

Additional restrictions on the inputs and outputs, of the types given in (3) and (4), may be introduced to reflect a judgment of the relative importance of the variables in the model, with the *virtual input* $V_i x_{ij}$ representing the contribution of input $i$, with weight $V_i$, to DMU $j$. In the special case studied here, with a single output ($s = 1$), the contribution of the output to DMU $j$ is always 100%, since $U_1 y_{1j} / (U_1 y_{1j}) = 1$, so only the input restrictions (4) are informative.

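The regression equation for the inputs $x_{1j}, x_{2j}, \ldots, x_{mj}$ is given by $\hat{y}_j = \hat\beta_0 + \hat\beta_1 x_{1j} + \cdots + \hat\beta_m x_{mj}$, with estimated coefficients $\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_m$. Then, the $c_i$ and $d_i$ limits are initially defined by substituting the positive coefficients of the standardized regression variables for the weights in the virtual input proportion of (4), that is,

$$\frac{\hat\beta_i \, x_{ij}}{\sum_{k=1}^{m} \hat\beta_k \, x_{kj}}, \qquad \hat\beta_i > 0 \;\; (\forall i).$$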

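After replacing the coefficients of each variable in the above expression by their regression estimates, a set of $n$ values is obtained for each input variable. The minimum and the maximum of these values specify the limits used in the Wong and Beasley restrictions. Thus, over $j = 1, \ldots, n$:

$$c_i = \min_{j = 1, \ldots, n} \frac{\hat\beta_i \, x_{ij}}{\sum_{k=1}^{m} \hat\beta_k \, x_{kj}}, \tag{6}$$

$$d_i = \max_{j = 1, \ldots, n} \frac{\hat\beta_i \, x_{ij}}{\sum_{k=1}^{m} \hat\beta_k \, x_{kj}}. \tag{7}$$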

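Solving (4) for $V_i$ and introducing the limits $c_i$ and $d_i$ from (6) and (7), the W-B restrictions become linear in the weights:

$$c_i \sum_{k=1}^{m} V_k \, x_{kj} \le V_i \, x_{ij} \le d_i \sum_{k=1}^{m} V_k \, x_{kj}, \qquad i = 1, \ldots, m; \; j = 1, \ldots, n. \tag{8}$$

Since the limits are chosen as minimum and maximum values (equations (6) and (7)), they tend not to be too close. This procedure therefore avoids the problem that arises with very close limits, which shrink the production possibility set (topological space) of the DEA problem and may consequently render the DEA linear program infeasible (Sarrico & Dyson, 2004). Indeed, by construction, the weights $V_i = \hat\beta_i$, rescaled to satisfy the CRS normalization, satisfy (8) for every DMU, so the restricted problem always has a feasible solution. Also, given that the statistically significant coefficients are positive, the lower limits $c_i$ are strictly positive, so no input can receive a null weight.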
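The limits in (6) and (7) reduce to a minimum and a maximum, over DMUs, of the input shares obtained with the regression coefficients in place of the weights. The Python sketch below (function names hypothetical) computes them and checks restriction (4) in its linear form $c_i \sum_k V_k x_{kj} \le V_i x_{ij} \le d_i \sum_k V_k x_{kj}$. The text refers to standardized regression variables, but the exact scaling of $x_{ij}$ behind the intervals reported in Section 3 is not stated, so the sketch simply uses whatever positive input values it is given. By construction, the weights $V_i = \hat\beta_i$ always pass the check, which is why the restricted problem keeps a feasible solution.

```python
def wong_beasley_limits(betas, x):
    """(c_i, d_i) for each input from positive regression coefficients.

    betas: one positive coefficient per input (length m).
    x: one row per DMU, each a list of m positive input values.
    """
    if any(b <= 0 for b in betas):
        raise ValueError("the procedure uses positive coefficients only")
    m = len(betas)
    # Share of each virtual input, with V_i replaced by beta_i, for every DMU.
    shares = []
    for row in x:
        total = sum(b * v for b, v in zip(betas, row))
        shares.append([betas[i] * row[i] / total for i in range(m)])
    # c_i and d_i: minimum and maximum share of input i over all DMUs.
    return [(min(s[i] for s in shares), max(s[i] for s in shares))
            for i in range(m)]


def satisfies_wong_beasley(weights, x, limits, tol=1e-12):
    """Check c_i * sum_k V_k x_kj <= V_i x_ij <= d_i * sum_k V_k x_kj for all i, j."""
    for row in x:
        total = sum(v * xv for v, xv in zip(weights, row))
        for i, (c, d) in enumerate(limits):
            vx = weights[i] * row[i]
            if not (c * total - tol <= vx <= d * total + tol):
                return False
    return True
```

For two inputs, the lower limit of one input and the upper limit of the other always add up to 1, as in the intervals [0.47, 0.77] and [0.23, 0.53] reported in Section 3.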

2.4 Confidence intervals for the Cone Ratio method

The upper and lower bounds of the 95% regression confidence intervals may be used to define weight restrictions through the Cone Ratio method when $s = 1$. If $k_{iw}$ and $k_{it}$ are the minimum and maximum of the ratios formed from these confidence limits, a relationship of the type

$$k_{iw} \le \frac{V_i}{V_g} \le k_{it} \tag{9}$$

can be imposed, where $V_i$ and $V_g$ represent the input weights in the DEA model. In this case, $k_{iw}$ and $k_{it}$ are, respectively, the minimum and the maximum values in the set formed by the ratio of the lower confidence limits and the ratio of the upper confidence limits. For instance, with two input variables one would have

$$k_{iw} = \min\left(\frac{LL_1}{LL_2}, \frac{UL_1}{UL_2}\right), \qquad k_{it} = \max\left(\frac{LL_1}{LL_2}, \frac{UL_1}{UL_2}\right),$$

where $LL$ and $UL$ are, respectively, the lower and upper confidence limits in the regression model. For the general case of $m$ inputs, the same computation is applied to each pair of inputs $(i, g)$, $i \ne g$, replacing the indices 1 and 2 by $i$ and $g$.

It is then possible to specify the weight restrictions as a matrix inequality $W \, (v, u)^{\top} \ge 0$, in which $v = (V_i, V_g)$ is the vector of input weights and $u = (U)$ is the single-element vector of the output weight. Thus, $W$ is formed with rows $(1, -k_{iw}, 0)$, $(-1, k_{it}, 0)$ and $(0, 0, 1)$, corresponding to the two weight restrictions in (9) and to the non-negativity of the output weight:

$$W = \begin{pmatrix} 1 & -k_{iw} & 0 \\ -1 & k_{it} & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$
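The bounds $k_{iw}$, $k_{it}$ and the rows of $W$ can be obtained mechanically from the confidence limits. In the Python sketch below (function names hypothetical), all confidence limits are assumed strictly positive, as in Table 2, and the general $m$-input case is handled by computing one pair of bounds for each ordered pair of inputs $(i, g)$; this pairwise treatment is our reading of the general case.

```python
from itertools import permutations


def cone_ratio_bounds(lower, upper):
    """Bounds k_w <= V_i / V_g <= k_t for every ordered pair of inputs (i, g).

    lower, upper: lower (LL) and upper (UL) confidence limits, one per input,
    all assumed strictly positive.
    """
    bounds = {}
    for i, g in permutations(range(len(lower)), 2):
        ratios = (lower[i] / lower[g], upper[i] / upper[g])
        bounds[(i, g)] = (min(ratios), max(ratios))
    return bounds


def restriction_matrix(k_w, k_t):
    """Rows of W for the stacked vector (V_i, V_g, U), so that W (v, u)^T >= 0."""
    return [
        [1.0, -k_w, 0.0],  # V_i - k_w * V_g >= 0, i.e. V_i / V_g >= k_w
        [-1.0, k_t, 0.0],  # -V_i + k_t * V_g >= 0, i.e. V_i / V_g <= k_t
        [0.0, 0.0, 1.0],   # U >= 0
    ]
```

With the Table 2 limits, `cone_ratio_bounds([0.05, 0.16], [0.81, 0.92])[(0, 1)]` returns `(0.3125, 0.8804...)`, i.e. the values 0.31 and 0.88 used in Section 3.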

2.5 Example

The database used in the present application was obtained from hospital records of 20 public hospitals in the city of Rio de Janeiro, Brazil. The Frontier Analyst Professional® software (Hussan & Jones, 2009) was used for the DEA model estimation, the Statsoft Inc software (Statsoft, 2013) for the linear regression modeling, and the SIAD platform for the Cone Ratio estimation (Angulo Meza et al., 2005). The input variables of the linear regression model were the number of *hospital beds* and of *health professionals* (physicians, nurses and administrative personnel), and the dependent variable (output) was the number of hospital *admissions* in the year 2016 (CNES, 2018; DATASUS, 2018).

The results of the Linear Regression model were used to identify the restriction intervals for the input variables of the DEA model. Once the $c_i$, $d_i$ limits explained above (Equations (6) and (7)) were defined, a CRS model incorporating these W-B weight limits was estimated (Wong & Beasley, 1990). Weight limits were then also obtained through the Cone Ratio method (Charnes et al., 2013), using the 95% confidence intervals of the Linear Regression parameters, as previously described. Thus, three classifications were obtained: one with Wong and Beasley's method, one with the Cone Ratio method and one with the unrestricted, traditional CRS model. Finally, these rankings were compared by means of Spearman correlations between their scores.
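Because several hospitals tie (for instance, the 100.00 scores in Table 3), the rank correlation should use average ranks for ties. The Python sketch below (function names hypothetical) computes Spearman's coefficient as the Pearson correlation of average ranks, which is the usual tie-corrected form; it does not compute p-values.

```python
from statistics import mean


def average_ranks(values):
    """Ranks 1..n, with tied values receiving the average of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1 .. end+1
        for k in order[pos:end + 1]:
            ranks[k] = avg
        pos = end + 1
    return ranks


def spearman(a, b):
    """Spearman's rho as the Pearson correlation of the average ranks."""
    ra, rb = average_ranks(a), average_ranks(b)
    ma, mb = mean(ra), mean(rb)
    cov = sum((p - ma) * (q - mb) for p, q in zip(ra, rb))
    var_a = sum((p - ma) ** 2 for p in ra)
    var_b = sum((q - mb) ** 2 for q in rb)
    return cov / (var_a * var_b) ** 0.5
```

Passing, for example, the W-B and Unrestricted columns of Table 3 as the two lists returns their rank correlation.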

3 RESULTS

The data for the analyzed hospitals are shown in Table 1, and Table 2 presents the restriction intervals together with the Linear Regression results. The percentage of variation explained by the Linear Regression model was $R^2 = 0.89$. The residual independence hypothesis was checked with the Durbin-Watson statistic and accepted ($d = 1.98$). The residuals had mean zero and SD = 0.95, and the normality and homoscedasticity assumptions were accepted by visual inspection.
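The Durbin-Watson statistic is $d = \sum_{t=2}^{n}(e_t - e_{t-1})^2 / \sum_{t=1}^{n} e_t^2$; values near 2, such as the reported 1.98, indicate no first-order autocorrelation of the residuals. A minimal Python sketch (function name hypothetical) follows. Note that for cross-sectional data like these the statistic depends on the order in which the observations are listed, which the text does not specify.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences over sum of squares."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den
```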

Table 1. Output and inputs of the 20 hospitals analyzed.

Hospital | Admissions (output) | Surgery beds (input) | Health professionals (input) |
---|---|---|---|
H1 | 3753 | 302 | 2039 |
H2 | 3803 | 293 | 2159 |
H3 | 2464 | 233 | 1288 |
H4 | 4518 | 304 | 2163 |
H5 | 2567 | 232 | 951 |
H6 | 4883 | 287 | 2665 |
H7 | 3277 | 250 | 1569 |
H8 | 3475 | 304 | 1474 |
H9 | 1501 | 83 | 603 |
H10 | 4336 | 376 | 1931 |
H11 | 2162 | 166 | 1609 |
H12 | 2282 | 175 | 1107 |
H13 | 343 | 126 | 981 |
H14 | 1187 | 63 | 590 |
H15 | 1095 | 79 | 363 |
H16 | 231 | 34 | 89 |
H17 | 395 | 47 | 401 |
H18 | 1645 | 97 | 565 |
H19 | 1401 | 108 | 732 |
H20 | 1849 | 122 | 1438 |

Table 2. Linear Regression results (dependent variable: admissions).

Input | Standardized coefficient | p-value | 95% CI |
---|---|---|---|
Health professionals | 0.43 | 0.027 | [0.05, 0.81] |
Number of beds | 0.54 | 0.007 | [0.16, 0.92] |

Table 3 presents the DMU rankings according to the methods used (W-B, Cone Ratio and unrestricted). As mentioned, the W-B limits of variation of the inputs were obtained from equations (6) and (7), substituting $\hat\beta_1 = 0.43$ and $\hat\beta_2 = 0.54$ (Linear Regression model coefficients, Table 2), which resulted in the restriction intervals $[0.47, 0.77]$ for the input *health professionals* and $[0.23, 0.53]$ for *number of beds*. The rows of the matrix $W$, which represent the relation between the weights in the Cone Ratio method, were derived from (9) and the confidence limits in Table 2 as:

$$0.31 \approx \min\left(\frac{0.05}{0.16}, \frac{0.81}{0.92}\right) \le \frac{V_1}{V_2} \le \max\left(\frac{0.05}{0.16}, \frac{0.81}{0.92}\right) \approx 0.88,$$

where $V_1$ and $V_2$ represent the weights of *health professionals* and *beds*, respectively. Rearranging these inequalities gives $V_1 - 0.31 V_2 \ge 0$ and $-V_1 + 0.88 V_2 \ge 0$, whose coefficients form the first two rows of the matrix $W$.

Table 3. Efficiency scores (%) of the hospitals by method.

Hospital | W-B | Cone Ratio | Unrestricted |
---|---|---|---|
H9 | 100.00 | 100.00 | 100.00 |
H18 | 100.00 | 100.00 | 100.00 |
H14 | 97.18 | 100.00 | 100.00 |
H15 | 91.41 | 88.12 | 100.00 |
H6 | 87.98 | 90.44 | 90.44 |
H4 | 82.72 | 82.72 | 82.72 |
H5 | 76.62 | 72.64 | 89.48 |
H7 | 75.72 | 75.72 | 75.72 |
H12 | 75.16 | 75.16 | 75.16 |
H8 | 73.58 | 71.52 | 78.83 |
H19 | 73.27 | 73.27 | 73.27 |
H10 | 72.29 | 70.88 | 75.75 |
H2 | 71.52 | 71.62 | 71.62 |
H1 | 70.28 | 70.27 | 70.28 |
H11 | 66.56 | 69.13 | 65.13 |
H3 | 63.99 | 63.47 | 65.23 |
H16 | 55.33 | 49.41 | 86.04 |
H20 | 47.68 | 47.68 | 47.68 |
H17 | 44.54 | 45.33 | 45.33 |
H13 | 14.29 | 14.90 | 14.90 |

The Spearman correlation test indicated strong agreement between the rankings ($R_S$: W-B × Cone Ratio = 0.98, $p < 0.0001$; W-B × Unrestricted = 0.89, $p < 0.0001$; Cone Ratio × Unrestricted = 0.84, $p < 0.0001$), showing the convergence of the restricted rankings with that of the unrestricted model.

4 DISCUSSION AND CONCLUSION

Few studies have discussed techniques for specifying limit restrictions in DEA models. Thompson et al. (1986) were the first to propose the use of such restrictions to improve DEA classification metrics, and other studies on this subject (Charnes et al., 2013; Halme et al., 1999) used predefined DMUs as gold standards or looked into ways of incorporating the subjective experience of decision makers into the DEA model. However, DEA applications require outputs to be positively correlated with inputs, and this characteristic has allowed the development of a number of weight restriction procedures based on the correlation between inputs and outputs (e.g. Gonçalves et al., 2007; Mecit & Alp, 2012; Gonçalves et al., 2013; Unsal & Örkcü, 2016). In this paper, a new procedure based on this idea was presented for the one output, multiple inputs case, using the estimated standardized coefficients of a linear regression to define the DEA weight search intervals.

The present paper used two weight restriction methods widely found in the DEA literature: the Wong-Beasley and the Cone Ratio methods. The former requires the decision maker or analyst to set a lower and an upper bound, $[c_i, d_i]$, defining suitable limits for the importance of input $i$ to DMU $j$. The latter (Khalili et al., 2010; Charnes et al., 2013) uses the concept of *assurance regions* (ARs) to impose constraints on the relative magnitude of the weights (in the present case, as defined by (9)). The main difference between the two methods is that, instead of bounding the share of each virtual input, the ARs define ranges for the *ratio* of the weights (Charnes et al., 2013).

It should be noted, however, that both methods still require the subjective input of a decision maker in setting the weight bounds, whereas an important characteristic of the approach introduced here is that the restrictions do not depend on such opinions: the restriction intervals are obtained directly from the available variables. Wong and Beasley's approach is known to yield infeasible linear programs in some cases (Sarrico & Dyson, 2004), a limitation that is avoided in the Cone Ratio method (Charnes et al., 2013) and by the Wong and Beasley limits proposed here, as indicated by equations (6)-(8).

As noted above, the mathematical structure of DEA models allows a DMU to be considered efficient because some of the model's input variables are assigned null weights. Such null weights, however, imply that the offending variable is not actually relevant to the model and should have been excluded from the analysis. The use of linear regression model (LRM) coefficients together with the Cone Ratio method precludes this possibility, yielding a more meaningful and interpretable model. The Wong and Beasley limits (equations (6) and (7)) are based on the values of the regression coefficients, while the Cone Ratio method uses the lower and upper limits of their confidence intervals. Although the intervals obtained were similar, they are not mathematically equivalent; that is, one cannot be derived directly from the other.

Existing DEA restriction methods (e.g. Wong & Beasley; Cone Ratio) do not specify the limits themselves, relying instead on subjective information provided by a decision maker. The approach presented here addresses this subjectivity in the choice of the limits: they are set without interference from a decision maker, using a weight search interval defined from the coefficients of a linear regression, which represent the statistical importance of the input variables for the definition of the efficiency scores.

When this method was compared with an unrestricted model, the unrestricted scores were found to be greater than or equal to those of the restricted models. This is because, in searching for the optimal solution for each DMU, the unrestricted model can favor, for example, a single variable, overestimating its importance and assigning zero weight to another (Unsal & Örkcü, 2016). This happened, for instance, to DMU H15, which achieved 100% efficiency in the unrestricted model but had zero weight for the "beds" variable.

In conclusion, this paper introduced a novel methodology for defining weight restrictions in DEA models for the one output, multiple inputs case, using the information obtained from a Linear Regression fitted to the DEA inputs and output. Since one of the assumptions of the DEA model is the existence of an association between inputs and outputs, constructing the scores from linear regression weights is a natural option, given that an LRM explains part of the variation of the output from the variation of each input. This procedure was used in conjunction with Wong and Beasley's and the Cone Ratio DEA methods, and yielded a consistent and interpretable ranking of 20 public hospitals. The problem of the subjective introduction of restrictions by a decision maker can thus be circumvented.