A PROBABILISTIC APPROACH APPLIED TO THE CLASSIFICATION OF COURSES BY MULTIPLE EVALUATORS Annibal

How can the perceived influence of a course on its alumni's skills be measured? This paper describes the use of CPP-TRI as a tool to address this problem. The method is applied here in the context of an M.Sc. course evaluation. Levels of impact and importance previously determined for different features provide the framework for the analysis. Classifications by different groups of evaluators are combined. Taking into account the subjectivity in the assessments, CPP-TRI treats them as realizations of random variables. The combination of the evaluations is performed by computing joint probabilities, which avoids the assignment of weights to evaluators. Interval classifications between a hostile and a benevolent limit are provided, offering the educational evaluator a deeper understanding of the results. An additional study is performed here on the classification of the features. A total of sixteen features are sorted.


INTRODUCTION
This paper studies the problem of combining the assessments of a group of evaluators about features of a project or institution, evaluated by their impact and by their importance. By impact is meant the effectiveness of the presence of the feature, and by importance the significance of such presence for the objectives of the project or institution.
Levels of impact and importance previously determined for the different features provide a framework for the analysis. The information extracted from the data about which features are more or less present in the institution or project, and more or less relevant to its aims, forms a basis for managers to decide what to improve and what to explore.
The classification procedure approaches the sorting problem by randomizing the trichotomic decision of classifying any evaluated alternative as superior, inferior or equivalent to previously determined profiles, based on the different levels of impact and importance that they may have for the evaluators. The object of evaluation is finally allocated to the class for which the probabilities of receiving evaluations above and below the class profiles are nearest to each other. This method is named CPP-Tri by Sant'Anna et al. (2015) to emphasize its use of the composition of probabilistic preferences (Sant'Anna, 2002) in a trichotomic procedure. Its first step consists of obtaining probability distributions for the profiles that define the classes in advance, as well as for the value judgments of the impact and the importance of the features by each evaluator. Probabilistic comparisons between the evaluations by each evaluator and the profiles are then performed. A composition of these probabilistic preferences into global scores concludes the classification procedure.
To combine the preferences of different evaluators, a method based on previously specified probability distributions frees the decision-maker from the burden of assigning weights or determining threshold parameters. Instead, general assumptions on the form of the joint distribution are employed, which may be modified, if judged necessary, to take into account special information about the decision-makers.
Here, features of a graduate course are examined by two different groups of evaluators. The goal is that the course be classified as high level for the impact of the features classified as of higher importance.
The next section brings a review of the relevant literature. Section 3 describes the probabilistic framework and the trichotomic classification procedure. Section 4 presents the features evaluated and the data set studied. Section 5 applies the methodology. Section 6 concludes the paper.

CPP-TRI
The problem faced here is analogous to the sorting of alternatives that are evaluated under multiple criteria into classes determined by a variable number of profiles described by evaluations on the same criteria in Sant'Anna et al. (2014).
The initial problem is that of classifying a course in one of five ordered classes. What makes this problem more complex is that evaluators of different types (students, coordinators, and managers of the students in the companies where they work) apply different criteria associated with the desired qualities of the graduates. The evaluation of a different feature, from a different point of view, by a different evaluator is treated as the application of a different criterion.
A second problem addressed here is that of exploring the data to classify the features. Five ordered classes are considered again.
The classification procedure adopted is similar to that proposed by Almeida Dias et al. (2012). The classes are simply identified by numbers from 1 to 5, and the evaluators are asked to make their assignments according to each feature in terms of grades from 1 to 5. The final classification is made in terms of the proximity of the vectors of evaluations to the vectors of identifying profiles.
In the present case, instead of identifying the classes by a larger number of profiles, only one profile is used to identify each class, as in ELECTRE-TRI-C (Almeida Dias et al., 2010). This unique profile has, for each of its coordinates, one of the five values employed to measure the feature represented in that coordinate.
The main difference between the method herein and those of the ELECTRE-TRI family is that it employs a preliminary probabilistic transformation proposed in Sant'Anna (2002). The probabilistic transformation is based on assuming random disturbances affecting the evaluations.
Following the principles of classical statistical modeling, independence between the evaluations of different alternatives and identical normal distributions for the random disturbances in the measurements are assumed. If there is knowledge advising other distributions, they can be employed without substantial changes in the procedure. Nevertheless, the influence of the form of the distribution is limited in this technique, because the trichotomic classification compares only the sizes of intervals that always include the tails of the distribution.
Employing these assumptions, each numerical evaluation is replaced by a probability distribution centered on it. The comparisons between observed evaluations and representative profiles are then made in probabilistic terms. The whole technique involves the following three stages:

i) Computation of preference probabilities by each criterion
As described above, the initial numerical assessments by each criterion are treated as means of probability distributions. Assuming normality, mean and variance are enough to determine each distribution. A common variance may be estimated from the observed variance of the vector of profiles provided to represent the different classes.

ii) Computation of probabilistic distances from evaluations to profiles
Once a probability distribution for the evaluation by each criterion is determined, the probability of each alternative being above or below each profile is computed for each criterion. Assuming independence, the probabilities of being above and below all the profiles of a class are given by the product of the probabilities of each being above or below each profile.

iii) Allocation in the nearest class
After the probabilities of the alternative being above or below the profiles of a class are computed, it is possible to calculate the difference between these probabilities. A null value for this difference has a null probability, while positive probabilistic distances favor the classification above the class or in it and negative values, on the contrary, indicate that the alternative should not be classified above the class. This difference is calculated for all classes, and the alternative is allocated to the class for which the absolute value of the difference is the lowest.
To formally describe the technique, consider m criteria, n classes and, to simplify, only one profile for each class. For each k from 1 to m, the alternative to be classified receives a numerical evaluation $a_k$. For each i from 1 to n, the i-th class is represented by the profile $C_i$. These representative profiles are vectors of m coordinates. Let $(C_{i1}, \ldots, C_{im})$ denote the representative profile of class i. The classes are ordered:
$$C_{ik} \le C_{(i+1)k} \quad \text{for all } i \text{ and } k. \tag{1}$$
Each $a_k$ is thought of as the mean of a normal distribution. The variances of these distributions will be given by the observed variances $V_k$ in the sample of profile coordinates $\{C_{1k}, C_{2k}, \ldots, C_{nk}\}$.
Let $X_k$ represent a random variable with a normal distribution of mean $a_k$ and variance $V_k$, and let $A^+_{ik}$ and $A^-_{ik}$ denote, respectively, the probabilities of $X_k$ presenting a value above and below the k-th value of the profile $C_i$. By independence,
$$A^+_i = \prod_{k=1}^{m} A^+_{ik} \tag{2}$$
and
$$A^-_i = \prod_{k=1}^{m} A^-_{ik}. \tag{3}$$
The probabilistic score for the i-th class is given by
$$\delta_i = A^+_i - A^-_i. \tag{4}$$
The alternative is assigned to the class i with the minimum absolute value for $\delta_i$.
A tie may occur in this computation. The alternative will then be allocated to the two neighboring classes, reflecting the imprecision of the assessments.
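The three stages described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function name `cpp_tri`, the tolerance `tol` used to detect ties, the example grades and the use of the population variance (divisor n) for $V_k$ are all assumptions made here.

```python
from math import prod
from statistics import NormalDist, pvariance


def cpp_tri(evaluation, profiles, tol=1e-12):
    """Sort one alternative into the class(es) with the smallest |delta_i|.

    evaluation: m grades a_k; profiles: n tuples (C_i1, ..., C_im).
    Returns (classes, deltas); classes are 1-based and hold more than
    one entry when a tie occurs.
    """
    m = len(evaluation)
    # V_k: variance of the k-th coordinates of the profiles. The population
    # form (divisor n) is an assumption made for this sketch.
    sigmas = [pvariance([p[k] for p in profiles]) ** 0.5 for k in range(m)]
    deltas = []
    for profile in profiles:
        # A-_ik = P(X_k < C_ik) for X_k ~ Normal(a_k, V_k); A+_ik = 1 - A-_ik
        below = [NormalDist(a, s).cdf(c)
                 for a, s, c in zip(evaluation, sigmas, profile)]
        above = [1.0 - p for p in below]
        # Independence: joint probabilities are products over the criteria
        deltas.append(prod(above) - prod(below))
    best = min(abs(d) for d in deltas)
    classes = [i + 1 for i, d in enumerate(deltas) if abs(d) - best <= tol]
    return classes, deltas


# Five classes, three criteria, impact scale -2..2 (illustrative values only)
profiles = [(v, v, v) for v in (-2, -1, 0, 1, 2)]
classes, deltas = cpp_tri((1, 1, 2), profiles)
print(classes, [round(d, 3) for d in deltas])
```

Since the evaluation of each feature by each evaluator is treated as a separate criterion, the grades given by the members of a group can be passed together as one `evaluation` vector.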
From the beginning we deal with rough approximations. The numbers that the evaluators provide are their intuitions about the centers of probability distributions. To generate more information on the presence of uncertainty in the classification, the classification derived under the hypothesis of independence between the evaluations by the different evaluators may be compared to that resulting from a different assumption on correlation. For instance, assuming maximal correlation between evaluations by different criteria instead of independence, $A^+_i$ and $A^-_i$ will be given by equations (5) and (6), which replace the products over the criteria by the minimum and the maximum of the corresponding probabilities. Other classifications may be generated with the same aim of gathering information on the uncertainty in the final classification. Sorting processes based on more and less benevolent rules to classify above and below the representative profiles, determined by fixed rates of reduction applied to the probabilities of being above or below the representative profiles, may be applied.
A benevolent classification determined by the reduction percent c (written here as a fraction, e.g. c = 0.5 for 50%) will place alternative A in the class $C_c(A)^+$ whose index i minimizes the absolute value of the difference
$$A^+_i - (1 - c)\,A^-_i. \tag{7}$$
Analogously, a hostile classification for the same reduction percent will place alternative A in the class $C_c(A)^-$ whose index i minimizes the absolute value of the difference
$$(1 - c)\,A^+_i - A^-_i. \tag{8}$$
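A minimal sketch of the interval classification follows. It rests on an assumed reading of the rule: the benevolent limit reduces the probability of being below the profile by the factor (1 − c), and the hostile limit reduces the probability of being above it by the same factor. The function names, the common variance of 2 and the example grades are hypothetical choices made for illustration.

```python
from math import prod
from statistics import NormalDist


def joint_probabilities(evaluation, profile, variance):
    """Return (A+_i, A-_i) for one class profile, assuming independence."""
    sigma = variance ** 0.5
    below = [NormalDist(a, sigma).cdf(c) for a, c in zip(evaluation, profile)]
    above = [1.0 - p for p in below]
    return prod(above), prod(below)


def interval_classification(evaluation, profiles, variance=2.0, c=0.5):
    """Return the (hostile, benevolent) class limits, numbered from 1."""
    pairs = [joint_probabilities(evaluation, p, variance) for p in profiles]

    def nearest(score):
        # Class whose modified difference is closest to zero
        return 1 + min(range(len(pairs)), key=lambda i: abs(score(*pairs[i])))

    # Hostile: shrink the probability of being above the profile (assumed form)
    hostile = nearest(lambda up, down: (1 - c) * up - down)
    # Benevolent: shrink the probability of being below the profile (assumed form)
    benevolent = nearest(lambda up, down: up - (1 - c) * down)
    return hostile, benevolent


profiles = [(v, v, v) for v in (-2, -1, 0, 1, 2)]
print(interval_classification((1, 1, 2), profiles, c=0.5))
```

With c = 0 both limits coincide with the classification that minimizes the absolute value of the unmodified difference, so the width of the interval reflects only the effect of the reduction.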

COMPOSITION OF FEATURES EVALUATIONS
The probabilistic approach is here applied to replicate the evaluations of the M.Sc. course. The impact of the course by different features and the importance of these features had been evaluated by a set of 21 students who took the course, four managers of the company that employs them, 20 teachers of the course and nine coordinators of courses at other universities evaluated together with the course in a Brazilian evaluation system.
The classifications were into five classes, each represented by a unique profile with all evaluations equal to one of the five values of a Likert scale. For impact, a scale with values −2, −1, 0, 1 and 2 was employed, and for importance, values from 0 to 4. The variance of 2 that characterizes samples of these five values was assumed for all the distributions. To produce a new global evaluation of the course, the different experts' evaluations of the impact of different features were treated separately. This approach resulted in hundreds of evaluations of the course, which were combined in different ways. First, all evaluations were considered as individually final. Then, the evaluations by the members of each group with the same relation to the course were combined.
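The variance of 2 can be checked directly. It corresponds to the variance of the five scale values computed with divisor n; with divisor n − 1 the value would be 2.5. The short check below uses Python's standard `statistics` module, and the variable names are chosen here for illustration.

```python
from statistics import pvariance, variance

impact_scale = [-2, -1, 0, 1, 2]
importance_scale = [0, 1, 2, 3, 4]

# Divisor n: sum of squared deviations 10 divided by 5
population = (pvariance(impact_scale), pvariance(importance_scale))
# Divisor n - 1: sum of squared deviations 10 divided by 4
sample = (variance(impact_scale), variance(importance_scale))

print(population, sample)
```

Both scales share the same spread because one is a shift of the other by two units, which is why a single variance can serve for impact and importance.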
All these evaluations confirmed the location of the course obtained in previous analyses by Nepomuceno et al. (2010) and Nepomuceno & Costa (2012) as "good", which corresponds to classification in the fourth of the five classes considered in ascending order. This result follows from the classification of the impact of the course in that class in 21 of 52 evaluations by managers (corresponding to 40% of this subsample), 46 of 134 evaluations by coordinators (33%), 120 of 145 evaluations by teachers (83%) and 146 of 308 evaluations by students (47%).

CLASSIFICATION OF THE FEATURES
A sorting of the features was then performed, considering their evaluations in terms of impact and importance. Following Kahneman's (2011) preference for the contribution of outer views, this last study was based only on the evaluations of the managers of the students and the coordinators of other courses. The teachers and students of the course were not considered. Thus, 58 alternatives were evaluated (16 features × 2 aspects for the coordinators plus 13 × 2 for the managers), as the coordinators evaluated the 16 features on impact and importance and the managers did not evaluate features 1, 3 and 10, because they were considered unable to assess the effect of the course on such personal subjects.
The ordered classes and their unique profiles have the same form as in the previous analysis. The evaluations of, respectively, managers of the students and coordinators of other courses are presented in Tables 2 and 3. The letter N in a cell of these tables indicates that the evaluator did not evaluate the feature. Table 4 shows the final classifications for the global comparisons in terms of joint preference. Assuming independence, this joint preference is computed as the product of the probabilities derived from the evaluations by the different individuals.
The classification thus obtained in terms of the impact of the course was that the features of improvement in market suitability, problem-solving abilities, and ability to employ technical skills are classified in a higher class than the others by both coordinators and managers.The first feature, employability, may be included in this group of highest preference as it was not evaluated by the managers and was placed in the highest class by the coordinators.The features of sociability, research ability, self-esteem, and fluency were classified in the highest class with respect to impact only by the managers.
Regarding the importance of the features, sociability was elevated to the highest level by both groups, replacing the ability to employ technical skills, which was considered less important by the coordinators. Whereas the features of entrepreneurship and critical thinking were elevated to the highest class by the managers, oratory skills and personal life (this last feature not having been evaluated by the managers) were classified by the coordinators as the least important.
Pesquisa Operacional, Vol. 36(3), 2016
Thus, when evaluating importance and impact, the managers of the students working at their companies appear more benevolent than the coordinators of similar courses.This difference may be due to the members of the latter group having a better knowledge of similar projects while being less involved with the participants of the project.
The classification of the features was employed to review the evaluation of the course.In this stage, the impact of only the features of highest importance was considered.Then it could be seen that, by those three features most important for the coordinators, the course would be classified in the highest impact class for market suitability and problem solving and would be classified as good for the impact on sociability.For the managers, the classification by the features in the highest class for importance is in the highest impact class for seven of the nine features of highest importance, the other two features resulting in the classification in the fourth class.
To evaluate the effect of the hypothesis of independence, a second classification was performed assuming that the evaluations from members of the same group were highly correlated. In this second approach, the probabilities of classification above and below each class are computed by equations (5) and (6), respectively, i.e., by the minimum and the maximum of the probabilities of the respective classification by the different members of the group. The results of this second classification are presented in Table 5.
It can be seen in Table 5 that the assumption of high correlation resulted in all features being classified in the fourth class with respect to impact. With respect to importance, there is slightly higher discrimination, but it is still lower than that found by the evaluation under the hypothesis of independence. This is due to the downgrading of some features. In fact, under this last assumption, even for the managers, only three features are placed in the highest level: sociability, problem-solving, and technical skills. For the coordinators, all features classified in the highest level under independence now fall to the fourth level. Among those previously classified in the fourth level, the use of technical skills as well as self-esteem and disinhibition also fall one level and are now added to oratory and personal life in the third level.
In another assessment of the variability of the results, Tables 6 and 7 present the results of taking a benevolent and a hostile approach to obtain class intervals. Percentage changes of 50% in the limits are employed.
Another indication of the lower discriminative power of the second approach is the widening of the ranges of the interval classification of the different features, resulting from a larger number of divergences between the hostile and benevolent classifications. Considering importance, for instance, when the changes in the limits are applied, three differences are observed in the assessments by the coordinators if correlation is assumed, of which only one is maintained under the independence assumption. In the assessments by the managers, this relation increases to ten against four.
For impact, the total number of divergences jumps from two to four for coordinators and from four to 13 for managers when independence is replaced by maximal correlation. In fact, it can be seen in Table 6 that all 13 features evaluated by the managers had a hostile classification lower than the benevolent classification if maximal correlation is assumed. It is also interesting to notice in Tables 6 and 7 that all the divergences when independence is assumed affect only features for which divergence is already observed when correlation is assumed.

CONCLUSION
The probabilistic approach here employed to combine the evaluations by a group of evaluators was successfully applied to classify features in terms of importance and impact and to evaluate a course.
The final evaluation as "Good" obtained in a previous analysis was confirmed. In addition, the results indicate that the impact was evaluated as higher for the features evaluated as more important.
New information was also obtained on the perceived importance of the features. Of the 16 features examined, clear agreement was obtained on the classification in the highest importance class of two distinct features: market suitability and problem solving.
Issues related to the mathematical modeling of the classification problem were also addressed. Empirical evidence was obtained favoring the idea that a higher discriminative power is obtained if independence between the disturbances is assumed.
A question that arises in situations where the perceptions of several evaluators are combined is how to reduce the stress on the evaluators to reach coherent judgments. The knowledge that they do not need to weight criteria, because a probabilistic composition will be able to deal with the assignment of importance, may help to simplify matters for the evaluators along the decision process.
Another interesting aspect of employing a probabilistic approach to combine preferences like those considered here is that it allows the final results to be provided in terms of interval classifications between a hostile and a benevolent limit. The probabilistic formulation of these limits offers the decision maker a deeper understanding of the results.
Finally, it should be noted that this approach can be extended to other situations where weighting features or weighting evaluators may be inadequate.
extended Electre III, Mustajoki et al. (2005) employed interval numbers in a combination of SMART with the SWING (von Winterfeldt & Edwards, 1986) criteria weighting method. Mahdavi et al. (2008), Dheena & Mohanraj (2011) and Eiselt & Marianov (2014) addressed TOPSIS. Wang et al. (2008) and Fan et al. (2010) extended the use of the matrices of pairwise comparison of AHP to stochastic measures. Nefeslioglu et al. (2013) and Feizizadeh et al. (2014) also took AHP as a starting point. Chang & Wang (2009) and Merigó et al. (2014) used the fuzzy approach to extend MAUT approaches. Chou & Lin (2009), van der Pas et al. (2010), Liu et al. (2011) and Perez et al. (2014) also developed MAUT models. Dealing with the sorting problem, there is the work of Janssen & Nemery (2013), using interval parameters to take uncertainty into account and extending the models of Nemery & Lamboray (2008), based on PROMETHEE, and of Ishizaka et al. (2012), based on AHP. In another direction, Almeida Dias et al. (2010, 2012) developed ELECTRE-TRI-C and ELECTRE-TRI-nC to extend ELECTRE-TRI. While ELECTRE-TRI places the alternatives into classes separated by boundaries determined by fictitious alternatives with vectors of performances determined according to the multiple criteria, in ELECTRE-TRI-C a vector of central performances replaces the boundary vectors. ELECTRE-TRI-nC extends ELECTRE-TRI-C, allowing for a larger number of vectors determining each class. Different applications of multicriteria approaches to the problem of evaluating educational features are presented in Ishizaka (2012), Silva et al. (2012), Bortoluzzi et al. (2013) and Menezes & Pizzolato (2014).
Course of Management Systems of Universidade Federal Fluminense performed by Nepomuceno et al. (2010) and Nepomuceno & Costa (2015). Table 1 presents the different features of the course, identified by changes in the qualifications of the students. This selection of features has an initial source in Politis & Siskos (2004) and Neves (2005). Politis & Siskos (2004) evaluated a Greek Engineering Department using perceptions from students, graduates and also from companies where they work. Neves (2005) adapted the features used in Politis & Siskos (2004) to evaluate a Brazilian M.Sc. course under a strategic viewpoint. Based on these and other works indicated in Table 1, a draft of a questionnaire was produced and submitted to a group of experts with extensive experience in the management of similar courses in Brazil. Features F7 to F16 were included following the analysis in Nepomuceno & Costa (2012) of the suggestions of the group of experts.

Table 4 - Classification of features by impact and importance under independence.

Table 5 - Classification of features assuming maximal correlation.

Table 6 - Hostile and benevolent classifications with respect to impact.

Table 7 - Hostile and benevolent classifications with respect to importance.