PREDICTION OF EFFICIENCY IN COLOMBIAN HIGHER EDUCATION INSTITUTIONS WITH DATA ENVELOPMENT ANALYSIS AND NEURAL NETWORKS

This paper presents the results of research applying data envelopment analysis (DEA) together with artificial neural networks (ANN) to higher education institutions in Colombia during the years 2011-2013, with the purpose of evaluating the technical efficiency of these institutions and subsequently making predictions based on a group of management indicators. Information provided by the Ministry of National Education was used as the data source. The results show that this two-stage approach provides DEA with the predictive potential that it otherwise lacks, enhancing its evaluative qualities; this is also evident in the various research papers consulted. The results also show that 50% of the models built have correct classification rates of 64.58% and 58.33% for the training and validation datasets, respectively.


INTRODUCTION
Currently, it is a priority to take into account principles of rational use of resources and economic efficiency in the administration of public education institutions at their different levels, in order to strengthen their institutional duties (teaching, research and extension). For this reason, the different levels of governance and organization involved in the management of higher education develop strategies and plans to increase efficiency in universities and thus improve their performance. The allocation of public resources and their efficient use are closely related, and for this reason researchers have devoted considerable effort to evaluating the efficiency of different education systems around the world.
In this regard, it should be noted that there are quite noticeable differences in the resources available to the different Higher Education Institutions (HEIs) to carry out their institutional tasks, which in some way affects the results they obtain. Therefore, it is quite possible to find universities that perform better than others with more resources. If we also consider that under the flagship program of the Ministry of National Education, Ser Pilo Paga (Being Smart Pays Off), 98% of the resources are received by private universities (Observatory of the Colombian University, 2016), it would be interesting to review the efficiency of Colombian public universities as competitive economic units.
This research uses the output-oriented CCR model (Charnes et al., 1978) of Data Envelopment Analysis (DEA) to determine whether an HEI is efficient, and subsequently neural networks to predict whether a new observation (HEI) can be classified as efficient. The data used in this study correspond to the management indicators of Colombian public universities during the period 2011-2013 and are taken from the database of the Colombian Ministry of National Education. Efficiency is understood here as the ability of an HEI to obtain the maximum output indicators from a given set of input indicators.

THEORETICAL FRAMEWORK
Data Envelopment Analysis (DEA) is an approach to evaluate the performance of a set of homogeneous entities called Decision Making Units (DMUs) that transform multiple inputs into multiple outputs (Cooper et al., 2011). Charnes, Cooper and Rhodes developed the initial DEA model, building on earlier work by Farrell in 1957 (Charnes et al., 1978). This technique uses linear programming models to compare production units that use the same types of resources and produce the same types of products. This generates an efficient frontier and efficiency indices for the group of production units studied. Because DEA has been applied not only to business firms but also to government and non-profit agencies such as schools and hospitals, the term "Decision Making Unit" (DMU) was introduced to cover, in a flexible manner, any such entity. The performance of a DMU is efficient if and only if it is not possible to improve any input or output without worsening any other input or output (Cooper et al., 2011). In this way, Colombian state universities can be treated as DMUs.
The CCR model determines the relative efficiency of $n$ DMUs, where each unit uses $m$ inputs (resources) $x_1, x_2, \ldots, x_m$ to produce $s$ outputs (products) $y_1, y_2, \ldots, y_s$; this requires solving a linear programming model for each unit.
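The linear program itself is not recoverable from the text above. As a hedged reference point, the standard textbook envelopment form of the output-oriented CCR model is sketched below; the symbols $\phi$, $\lambda_j$ and the subscript $o$ (the unit under evaluation) are notation assumed here and are not taken from the paper, whose own formulation may differ (for example, it may use the multiplier form).

```latex
\[
\begin{aligned}
\max_{\phi,\,\lambda}\quad & \phi \\
\text{s.t.}\quad & \sum_{j=1}^{n} \lambda_j x_{ij} \le x_{io}, \qquad i = 1, \ldots, m, \\
& \sum_{j=1}^{n} \lambda_j y_{rj} \ge \phi\, y_{ro}, \qquad r = 1, \ldots, s, \\
& \lambda_j \ge 0, \qquad j = 1, \ldots, n.
\end{aligned}
\]
```

Here $x_{ij}$ and $y_{rj}$ denote the amount of input $i$ used and output $r$ produced by DMU $j$. The optimal value satisfies $\phi^{*} \ge 1$; a unit with $\phi^{*} = 1$ (and no remaining slack) lies on the efficient frontier, and the efficiency score is commonly reported as $1/\phi^{*}$.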
The DEA methodology is one of the main techniques for evaluating the performance of productive units in both the public and the private sector. Thus, this technique has been used in the financial sector (Sathye, 2001).
Artificial Neural Networks (ANNs) simulate the information processing of the human brain, including its ability to memorize and relate facts and circumstances, by means of an interconnected set of simple processing elements, units or nodes, and are a powerful tool for modeling non-linear functions through a process of adaptation or learning from a set of training patterns (Gurney, 2003). These units or nodes receive and transmit signals analogously to biological neural networks (Mehrotra et al., 1997). Neural networks were developed from the McCulloch and Pitts model, whose object of study is the computation performed by neurons and which showed that the human neural system can be imitated by a mathematical algorithm (Shabanpour et al., 2017). These computer algorithms consist of many simple, highly interconnected elements that produce different signals based on the weighted sum of the input signals they receive, and they are able to model complex non-linear relations (Schalkoff, 1997). The type of network is defined by its architecture, which represents the connection structure between the nodes, by its method of determining the connection weights, and by the activation function it uses. The architecture of a network consists of the grouping of neurons into layers; there are monolayer and multilayer networks (Fausett, 1994).
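The weighted-sum behaviour of a single node described above can be illustrated with a minimal sketch. The logistic activation, the function names and all numeric values are assumptions chosen for illustration; the text does not specify them.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic activation (an assumed choice; other activations are possible)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(inputs, weights, bias):
    """Signal emitted by one node: activation of the weighted sum of its inputs."""
    z = bias + sum(w * x for w, x in zip(weights, inputs))
    return sigmoid(z)


# Hypothetical input signals, connection weights and bias.
signal = neuron_output(inputs=[0.5, -1.0, 2.0], weights=[0.4, 0.3, -0.2], bias=0.1)
print(signal)
```

Training a network amounts to adjusting the weights and biases of many such nodes so that the final output matches the training patterns as closely as possible.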

Data Envelopment Analysis and Neural Networks
Neural networks have been used in multiple applications and, recently, in combination with data envelopment analysis as a second-stage predictive technique to evaluate the efficiency of productive units. Among these works we can highlight the following: (Çelebi

Data Envelopment Analysis and Higher Education
The evaluation of the efficiency of higher education institutions is an area of great interest. The DEA methodology is particularly useful in the evaluation of educational institutions since the economic value of many of the inputs and outputs is difficult to determine, so weighting them is considered appropriate for universities (Colbert et al., 2000). Thanassoulis (2016

METHODOLOGY
In this research, we carry out a quantitative study, supported by Data Envelopment Analysis (DEA), of 32 universities within the Colombian Public University System during the years 2011 to 2013. We propose a DEA-CCR-O model (output-oriented CCR model) with four inputs and six outputs. The inputs are the total number of Academic Staff, Administrative Staff Spending, Financial Resources and Physical Resources. The performance indicators (outputs) are the numbers of students enrolled in postgraduate and in undergraduate degrees, Results Saber PRO, Indexed Journals, Research publications and Professor Mobility. An output orientation is used because reducing resources for an inefficient university would be unreasonable: it would imply cutting academic and administrative staff as well as financial resources.
The DEA model is used to determine the technical efficiency index of each Colombian public HEI and thus to identify the institutions that present the best practices in their performance. Subsequently, neural networks are applied to the efficiency results obtained with the DEA model in order to predict whether a new HEI is classified as efficient or not, and the predictive power of the neural network is validated. The architecture used in this research is a monolayer neural network.

HEIs included in the analysis
In this efficiency assessment of public HEIs, the 32 Colombian public universities belonging to the System of State Universities (SUE, for its Spanish acronym) are considered for three consecutive years (2011-2013). In the analysis stage, the HEIs were randomly assigned labels from HEI-1 to HEI-32 in order to ensure the confidentiality of the evaluation results. The institutions considered in the study are shown in Table 1.

The data and variables
The data for the variables used in this study are publicly available on the website of the Colombian Ministry of National Education (2015). Data envelopment analysis requires the identification of input variables, which correspond to the resources used to carry out the institutional duties, and output variables, which are identified as the products or objectives of the HEIs. The selected variables are used both to determine whether or not a given HEI is efficient and for the subsequent prediction through neural networks. Four inputs and six outputs are employed, following previous research (Tsolas & Charles, 2015; Selim & Bursalıoglu, 2015; Ramzi & Ayadi, 2016).

Input variables
The first input is the total number of Academic Staff (in full-time equivalents), which includes professors, associate professors, assistant professors, casual professors and lecturers. The second is Administrative Staff Spending, which covers the expense of the non-academic staff that administers the teaching and research process. The third input is Financial Resources, which represents the monetary resources provided by the State and those generated by the university itself. The fourth input is Physical Resources, the area of built physical space available for university activities (academic and administrative).

Output variables
The first and second outputs are the numbers of students enrolled in postgraduate and undergraduate degrees, measured in full-time equivalent students. The third output is Results Saber PRO, the total number of students in the upper quintile of the State exam scores. The fourth output is Indexed Journals, the number of academic journals of the HEI classified in the different categories of the Publindex of Colciencias. The fifth output is Research publications, the number of papers published by the HEI's researchers in different journals. The sixth output is Professor Mobility, the number of professors linked to mobility programmes supported by the HEI to which they belong.

Data Envelopment Analysis (DEA)
The efficiency results of the CCR-O model for the years 2011 to 2013 are shown in Table 2. With 32 universities observed over 3 years, there are 96 observations available.
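To make concrete how an efficiency score becomes the efficient/inefficient label used in the second stage, the sketch below computes output-oriented CCR scores for the special case of one input and one output, where the linear program reduces to comparing output-to-input ratios. This is a deliberate simplification: the study uses four inputs and six outputs, which requires a linear programming solver. The data and function name are hypothetical.

```python
def ccr_output_scores(inputs, outputs):
    """Output-oriented CCR efficiency for the one-input, one-output case.

    Under constant returns to scale, the frontier is set by the best
    output/input ratio; each unit's score is its ratio divided by that best
    ratio, which equals 1/phi for the expansion factor phi >= 1.
    """
    ratios = [y / x for x, y in zip(inputs, outputs)]
    best = max(ratios)
    return [r / best for r in ratios]


# Hypothetical data for four units (not taken from the study).
unit_inputs = [10.0, 20.0, 30.0, 40.0]
unit_outputs = [5.0, 16.0, 18.0, 32.0]

scores = ccr_output_scores(unit_inputs, unit_outputs)
# A unit is labelled efficient when its score equals 1 (up to rounding).
labels = [score >= 1.0 - 1e-9 for score in scores]
print(list(zip(scores, labels)))
```

The resulting Boolean labels play the role of the categorical response that the neural network is later trained to predict.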

Neural networks
The results in Table 2 indicate that, of the 96 observations, 40 are considered efficient and 56 inefficient. For the modeling of the neural network, the 96 observations were divided into 72 for training (train) and 24 for validation (test). The neural network is validated through repeated holdout, which consists of drawing random samples for the training and validation sets. This procedure is repeated k times: k models are constructed and evaluated, which improves the estimate of the performance of the classification model.
The neural network was modeled with the caret and nnet packages of the R software. The best parameters for the neural network, size and decay, are determined through the caret package. The model obtained is an artificial neural network (ANN) (10, 3, 1), that is, with 10 inputs, 3 hidden neurons and 1 output, with 37 weights and a decay parameter of 0.1, as shown in Figure 1. The inputs used for the ANN are the 10 variables of the DEA model mentioned above.
Figure 1 shows the representation of the fitted neural network. Black arcs indicate a positive weight, gray arcs a negative weight, and the thickness of an arc the magnitude of the weight; we can observe a positive weight of great relative magnitude between the hidden neuron H3 and the response (the categorical output variable that indicates whether or not the HEI is efficient). After validation by repeated holdout with k = 100 (that is, repeating the holdout 100 times), the success rate for the training data varies between a minimum of 34.72% and a maximum of 81.94%, with an average of 61.76%, and the success rate for the validation data varies between a minimum of 33.33% and a maximum of 83.33%, with an average of 58.33%.
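The repeated holdout procedure can be sketched as follows. This is an illustrative Python version, not the R code used in the study; the function name, the fixed seed and the use of simple (non-stratified) random sampling are assumptions.

```python
import random


def repeated_holdout_splits(n_obs, n_train, k, seed=0):
    """Return k random (train, validation) index splits of n_obs observations."""
    rng = random.Random(seed)  # fixed seed only for reproducibility of the sketch
    indices = list(range(n_obs))
    splits = []
    for _ in range(k):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        train = sorted(shuffled[:n_train])
        validation = sorted(shuffled[n_train:])
        splits.append((train, validation))
    return splits


# 96 observations split 72/24, as in the text; k = 100 repetitions.
splits = repeated_holdout_splits(n_obs=96, n_train=72, k=100)
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

In each repetition a model would be fitted on the training indices and its correct classification rate recorded on both subsets, yielding k rates per subset to summarise.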
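The figure of 37 weights can be checked with a short calculation, assuming a fully connected network with one bias term per hidden and output neuron and no direct input-to-output connections. The function name is illustrative.

```python
def count_weights(n_inputs, n_hidden, n_outputs):
    """Number of weights in a fully connected single-hidden-layer network.

    Each hidden neuron has one weight per input plus a bias, and each
    output neuron has one weight per hidden neuron plus a bias.
    """
    hidden_layer = (n_inputs + 1) * n_hidden
    output_layer = (n_hidden + 1) * n_outputs
    return hidden_layer + output_layer


# (10 + 1) * 3 + (3 + 1) * 1 = 33 + 4 = 37
total_weights = count_weights(10, 3, 1)
print(total_weights)
```

With 72 training observations, a network of this size has roughly two observations per weight, which is one reason a weight-decay penalty is useful.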

The confusion matrix of the training data shown in
Figure 3 shows the box plots of the 100 correct classification rates obtained when validating the neural network by repeated holdout, for both the training data and the validation data, showing that in at least 75% of cases this rate is above 50%.
The confidence intervals for the correct classification rates were constructed by the quantile method.
In Figure 4 we can observe a positive weight of great relative magnitude between the hidden neuron H3 and the response (the categorical output variable that indicates whether the HEI is efficient or not).
The confusion matrices of the training data and validation data are shown in Table 6 and Table 7, respectively. Figure 5 displays the ROC (Receiver Operating Characteristic) curve for the training data, indicating an AUC (Area Under ROC Curve) of 0.841, which shows that the neural network model has a good classification capacity.
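A quantile-based (percentile) interval over the repeated-holdout rates, as mentioned above, can be sketched as follows. The 95% level, the linear interpolation rule and the sample rates are assumptions, since the text does not state them.

```python
def percentile_interval(values, confidence=0.95):
    """Empirical quantile interval with linear interpolation between order statistics."""
    ordered = sorted(values)
    n = len(ordered)
    alpha = 1.0 - confidence

    def quantile(q):
        position = q * (n - 1)
        lower = int(position)
        upper = min(lower + 1, n - 1)
        fraction = position - lower
        return ordered[lower] + fraction * (ordered[upper] - ordered[lower])

    return quantile(alpha / 2), quantile(1.0 - alpha / 2)


# Hypothetical correct classification rates from a few holdout repetitions.
rates = [0.50, 0.55, 0.58, 0.60, 0.62, 0.65, 0.70]
low, high = percentile_interval(rates)
print(low, high)
```

The interval simply discards the most extreme rates on each side, so it makes no assumption about the shape of the distribution of rates.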
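For reference, the AUC reported above can be read as the probability that a randomly chosen efficient HEI receives a higher predicted score than a randomly chosen inefficient one. The sketch below computes it by direct pairwise comparison; the scores and labels are hypothetical, and this is not necessarily how the study computed its value.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count one half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted scores; label 1 = efficient, 0 = inefficient.
predicted = [0.9, 0.8, 0.35, 0.6, 0.2]
actual = [1, 1, 1, 0, 0]
area = auc(predicted, actual)
print(area)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.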

CONCLUSION
The results show that the application of DEA together with neural networks allows us to predict whether or not an HEI can be classified as efficient, with an adequate correct classification rate. Furthermore, this two-stage approach gives DEA a predictive potential that it otherwise lacks, thus enhancing its evaluative qualities. From the point of view of state management, the proposed model can be of great benefit to decision makers because, when analyzing the performance of universities by geographic area, they can confirm that the distribution of financial resources from the State should be redesigned to improve the overall efficiency of the system. In other words, this approach can lead to a more realistic distribution of resources based on the results obtained. In this study, 50% of the models built have correct classification rates of 64.58% and 58.33% for training and validation data, respectively. Further research is recommended to identify the most influential variables when predicting whether or not an HEI is efficient, and to include in the neural network variables that were not considered when determining efficiency by means of DEA, such as the transparency index of each HEI, and to evaluate the resulting behavior of the neural network. The use of machine learning techniques other than neural networks is also recommended.
It is also recommended to model the neural network as a predictive model of the numerical value of the efficiency index, rather than of the efficient or inefficient classification used in this study.

Figure 5 - ROC curve of estimated neural network.

Table 1 - HEI considered in the study.
Source: Compiled by authors.

Table 3
Figure 2 displays the ROC (Receiver Operating Characteristic) curve for the training data, indicating an AUC (Area Under ROC Curve) of 0.894, which shows that the neural network model has a good classification capacity.
Pesquisa Operacional, Vol. 39(2), 2019

Table 2 - Efficiency of HEIs, years 2011 to 2013.

Table 3 - Confusion matrix training data.

Table 4 - Confusion matrix validation data.
ROC curve of estimated neural network.

Table 6 - Confusion matrix training data.

Table 7 - Confusion matrix validation data.