Prediction of course completion by students of a university in Brazil

Completing an undergraduate course within the time predicted by the curriculum is desirable for young people and for society. The aim was to verify the reliability, sensitivity and specificity of a broad set of predictors of the academic performance of university students who completed the undergraduate course within the time predicted by the curricula, using a data mining methodology based on the Support Vector Machines algorithm. A simple approach is proposed for predicting course completion by students at a university in Brazil. The dataset comprises 170 students who finished the course and 117 who did not. With the proposed methodology, course completion was predicted with an accuracy of 79.5% using the 19 original variables. An accuracy of 75% was obtained using only 5 variables: course, year of the course, gender, and initial and final academic performance.

The conclusion of the undergraduate course by university students in the time predicted by the curriculum is desirable for young people and for society, considering the personal and economic costs involved. Entering the university is permeated by changes in routine and new challenges, including academic challenges (Monteiro, Tavares & Pereira, 2008; Osse & Costa, 2011). In the school context, academic performance constitutes the main outcome indicator, having direct implications for the conclusion or not of the undergraduate course in the time predicted by the curriculum. School performance can be defined as an estimated measurement of what a student learned from an instructional process, usually operationalized through ratings or grades (Cascón, 2000), which, for university students, may be influenced by various conditions. The aim of this study was to highlight the influence of mental health and social skill variables on academic performance and its associations with demographic and educational characteristics.
The relationship between academic performance and school anxiety was approached by Hernández-Pozo et al. (2008) with Mexican students; those who had academic difficulties presented increased rates of depression, anxiety, suicidal ideation, and impoverished interpersonal relationships. Similarly, with Syrian university students, Gonçalves et al. (2012) found that the lower the academic performance at the end of the course, the higher the depression scores at both the beginning and the end of the course, with these scores increasing as expectations of academic performance decreased. The authors emphasized that poor school performance can act as a factor that contributes to the development of depressive symptoms. They found that depression at the beginning of the course may be transient and part of the adjustment of the student to academic life; however, in some cases, these symptoms may constitute a source of vulnerability, contributing to the development of other psychological problems. The relationship between depression and academic performance was studied by Turner et al. (2012), with 1,280 US university students from different years and courses, from the first to the final year. They found that students with indicators of depression symptoms in the second, third and fourth quartiles were more likely to present poor school performance, with a lower cumulative mean grade, even after adjusting for age, gender, school year, race/ethnicity, substance use and level of credit card debt.
With respect to demographic characteristics, an influence of gender on mental health problems has been verified, with women presenting a greater likelihood of developing mental disorders (Adlaf, Gliksman, Demers & Newton-Taylor, 2001) and tending to seek more help from psychological services (Peres, Santos & Coelho, 2004), although having more social skills (Al-Alawneh, Meqdadi, Al-Refai, Khdair & Malkawi, 2011; Bolsoni-Silva, Loureiro, Rosa & Oliveira, 2010; Ozben, 2013) and better academic performance (Ballester, 2012; Nemer, Fausto, Silva-Fonseca, Ciomei & Quintaes, 2013). In predictive studies, age was identified as a predictor of mental health (Lima, Domingues & Cerqueira, 2006; Mohsen & Mansoor, 2009; Oladipo & Ogungbamila, 2013); these authors found relationships between being older and presenting mental health problems. Regarding the type of housing and the mental health of the university student, living alone, compared to living with parents, proved to be a strong predictor of problems (Cerchiari, Caetano & Faccenda, 2005; Facundes & Ludemir, 2005; Lima et al., 2006). Adlaf et al. (2001) and Shamsuddin et al. (2013), however, found that increased psychological distress was not associated with the type of housing. Concerning the influence of work on mental health, Cerchiari et al. (2005) verified a relationship between work and presenting mental health problems, with the first assessing minor mental problems and the second substance use. In contrast, Mounsey, Vandehey and Diekhoff (2013) found that work has importance for indicators of anxiety, but not for those of depression. Living with family, being married and belonging to the ethnic majority were associated with good mental health conditions in the study by Cerchiari et al. (2005).
The school characteristic of the student's year in the university also seems to influence the social skills repertoire (Bolsoni-Silva et al., 2010) and mental health problems. It has been highlighted that mental disorders are more frequent in the first academic year (Bolsoni-Silva & Loureiro, 2015b; Fang et al., 2010; Adlaf et al., 2001), although Shamsuddin et al. (2013) reported a different pattern, with older students of earlier years generally presenting more indicators of anxiety and depression. Another school characteristic associated with mental health problems concerns the area of the course, which has been addressed in the studies of Cerchiari et al. (2005), Neves and Dalgalarrondo (2007) and Bolsoni-Silva and Loureiro (2015). These studies indicate a higher prevalence of mental health problems in students from the area of exact sciences, followed by the humanities. Bandeira et al. (2005), however, found a higher prevalence of problems in students of the Social Sciences, and Al-Alawneh et al. (2011) found that students of humanities courses were more adept in team-working skills than students of exact sciences.
Analyzing these studies, it was determined that certain mental health, demographic and school conditions may have a negative effect on academic performance. Conversely, social skills have been considered protective factors for adaptation in different contexts of life, given that "... a good social skills repertoire illustrates the quality of social relationships, with contributions for personal satisfaction, professional achievement and quality of life" (Del Prette & Del Prette, 2013, p. 48).
Regarding the influence of gender on social skills and academic performance, Iturra et al. (2012) demonstrated differences between men and women, with women presenting positive correlations between academic performance and both defensive assertiveness and assertiveness with friends, and negative correlations between academic performance and hostile attitude, disregard for the rights of others, general aggression and covert hostility. Similarly, Aguilhar (2008) found that women were more assertive in school situations and men in emotional situations, and that these relationships are accentuated with age. Ozben (2013) found that women were more skilled, had more satisfaction with life and were less lonely, compared to men, which can contribute to academic performance.
Regarding the school variables, the year of the course also seemed to influence the social skills repertoire, with freshmen presenting more difficulties than second-year students (Al-Alawneh et al., 2011). Gomes and Soares (2013) identified, with students of the first two semesters, positive correlations between social skills, academic expectations and performance. Artino et al. (2012) conducted a study with students at different stages of medical school training and found that students with higher grades were those that most sought help, with negative correlations.
The influence of social skills on mental health has also been investigated, with an association between a social skills deficit and social anxiety (Angélico, Crippa & Loureiro, 2012; Bolsoni-Silva & Loureiro, 2014), as well as depression (Bolsoni-Silva & Loureiro, 2015a; Segrin, 2000), verified in university students.
With regard to the interfaces between the social skills, mental health and academic performance variables, Al-Dubai, Al-Naggar, Alshagga & Rampal (2011) suggested a possible dependence among the three variables, indicating that the lack of emotional support, an aspect linked to social skills, may be related to higher levels of stress, a condition that can lead to problems of academic performance, such as avoidance. López-Bárcena et al. (2009) found that difficulties in academic performance were related to mental health problems and social skill deficits. Analyzing these studies, it was found that only a few had investigated the relationship between academic achievement, mental health and social skills in association with sociodemographic characteristics and school variables, such as the academic year, area of knowledge and course (Aguilhar, 2008; Iturra et al., 2012; López-Bárcena et al., 2009; Oliveira, Dantas & Banzato, 2008), which indicates the need for further studies on this topic.
Satisfactory academic performance is the key element for completing the undergraduate course within the time predicted by the curriculum. The identified studies addressed diverse variables, but almost always in specific combinations, which can be a limiting factor in identifying the predictive capacity of a wide range of indicators. Therefore, the aim was to verify the reliability, sensitivity and specificity of a broad set of predictors of the academic performance of university students who completed the undergraduate course within the time predicted by the curricula, using a data mining methodology based on the Support Vector Machines algorithm.

Participants
Participants were 287 students of a public university in Brazil, 141 male and 146 female (mean age = 20 years; SD = 3.16), from different areas of knowledge: exact (n = 126), human (n = 111) and biological (n = 50) sciences. Of these, 170 completed the course in the time predicted by the curricula and 117 did not. The participants responded to the following instruments: General Questionnaire; Evaluation of Social Skills, Behaviors and Contexts for University Students Questionnaire (QHC-University Students; Bolsoni-Silva & Loureiro, 2015c); Social Skills Inventory (IHS-Del Prette; Del Prette & Del Prette, 2001); and Structured Clinical Interview for DSM-IV (SCID; Del Bel et al., 2001). School data related to academic performance were also collected from the student records, namely: mean scores obtained by the participant at the beginning of the course, defined as up to one semester prior to the middle of the course; and completion of the undergraduate course within the time predicted (YES/NO).

Data Analysis
A prediction problem involves identifying to which category (undergraduate course completed or not completed) a new student will belong, based on a training set of samples whose category is already known. The set of variables associated with a sample is represented as a vector. Machine learning entails the use of mathematics, statistics and computational methods aimed at finding efficient and accurate algorithms for classification.
Learning algorithms for classification have been successfully deployed in a variety of applications, such as food science (Maione et al., 2016) and animal science (Aguiar et al., 2012), among others. The learning stages of our prediction problem are as follows. Starting with an existing collection of labeled samples, the data were divided into a training set and a test set. k-fold cross-validation was used for model selection and performance evaluation. The original set is partitioned into k subsets of equal size. Of the k subsets, one is set apart for testing, while the remaining k-1 subsets are used for training. This process is repeated k times, with each of the k subsets used exactly once as the test set (Witten, Frank & Hall, 2011). Here, k = 20 was used. After applying the classification algorithm and analyzing its accuracy using all the original features, the same methods were applied to a reduced number of selected features, obtained by a feature selection algorithm. The accuracy and computational complexity of the algorithm can be affected by the total number of variables in a problem. Using a variable selection algorithm presupposes that a data set may contain variables that provide no additional information beyond the selected features. In this work, the correlation-based feature selection (CFS) subset algorithm was used. CFS uses a search algorithm together with a heuristic that evaluates the merit of feature subsets. This heuristic measures the suitability of a feature subset by considering the usefulness of each variable for predicting the class label along with the level of intercorrelation among the variables (Hall, 1998).
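The two procedures described above can be illustrated with a minimal Python sketch. It is not the Weka implementation used in the study: the function names are hypothetical, the shuffling seed is an arbitrary choice, and the merit function follows the commonly stated form of Hall's CFS heuristic, k·r_cf / sqrt(k + k(k-1)·r_ff), where r_cf is the mean feature-class correlation and r_ff the mean feature-feature correlation. Because 287 samples cannot be split into 20 subsets of exactly equal size, the folds here differ in size by at most one.

```python
import math
import random


def k_fold_indices(n_samples, k, seed=0):
    """Split range(n_samples) into k folds whose sizes differ by at most one."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seed is an arbitrary, assumed choice
    # Fold i takes every k-th shuffled index, starting at position i.
    return [indices[i::k] for i in range(k)]


def cross_validation_splits(n_samples, k, seed=0):
    """Yield (train, test) index lists; each fold is the test set exactly once."""
    folds = k_fold_indices(n_samples, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cfs_merit(feature_class_corrs, mean_feature_feature_corr):
    """CFS merit of a feature subset: k*r_cf / sqrt(k + k*(k-1)*r_ff)."""
    k = len(feature_class_corrs)
    r_cf = sum(feature_class_corrs) / k
    return (k * r_cf) / math.sqrt(k + k * (k - 1) * mean_feature_feature_corr)


# 287 students and k = 20, as in the study.
splits = list(cross_validation_splits(n_samples=287, k=20))
```

Under this heuristic, a subset whose features correlate with the class but not with each other scores higher than an equally predictive but redundant subset, which is why a small subset can retain most of the predictive information.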
The well-known classification algorithm SVM was used in the present study. The Weka (Waikato Environment for Knowledge Analysis) software was used to run the algorithm. Weka contains a collection of visualization tools and algorithms for data analysis and classification, and is available free of charge under the GNU General Public License (Witten et al., 2011).
An intuitive explanation of the algorithm used in this study is provided below. A more in-depth discussion can be found in Tan, Steinbach and Kumar (2006).

Support Vector Machines
SVM was introduced by the Russian mathematician Vapnik and is one of the most widely used classification algorithms. It was extended to the non-linear case by Boser, Guyon and Vapnik in 1992 (Boser et al., 1992). The soft margin, which uses slack variables for cases that are not linearly separable, was introduced in 1995 (Cortes & Vapnik, 1995).
SVM is based on a procedure that finds a special type of linear model called the maximum-margin hyperplane. To picture a maximum-margin hyperplane, consider a two-class dataset whose classes are linearly separable, meaning that there is a hyperplane in the input space that classifies all training instances correctly. The maximum-margin hyperplane is the one that gives the greatest separation between the classes. The instances that are closest to the maximum-margin hyperplane are called support vectors. Each class has at least one support vector, and often more than one. What is critical is that the set of support vectors uniquely defines the maximum-margin hyperplane for the learning problem (Witten et al., 2011).
The classifier should choose a hyperplane that minimizes errors when classifying new samples. Decision boundaries with greater margins tend to generate fewer errors than those with smaller margins (Tan et al., 2006). Therefore, a linear SVM is a classifier that searches for the hyperplane with the widest margin.
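The geometry described above can be made concrete with a small Python sketch. It assumes the usual linear SVM formulation, in which a sample x is classified by the sign of w·x + b and the margin hyperplanes are w·x + b = +1 and w·x + b = -1, giving a margin width of 2/||w||. The toy data and the hyperplane are hypothetical and hand-picked for illustration; no optimizer is run, and none of this reproduces the study's Weka model.

```python
import math


def decision_value(w, b, x):
    """Signed score w.x + b; its sign tells on which side of the hyperplane x lies."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(w, b, x):
    """Return +1 or -1 according to the side of the hyperplane."""
    return 1 if decision_value(w, b, x) >= 0 else -1


def margin_width(w):
    """Distance between the hyperplanes w.x + b = +1 and w.x + b = -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


# Hypothetical, linearly separable 2-D toy data: (features, class label).
samples = [((2.0, 2.0), 1), ((3.0, 3.0), 1), ((0.0, 0.0), -1), ((-1.0, 0.0), -1)]

# Hand-picked hyperplane x1 + x2 = 2, scaled so the closest samples score +1 or -1.
w, b = (0.5, 0.5), -1.0

# Support vectors are the samples lying exactly on the margin (y * score == 1).
support_vectors = [x for x, y in samples
                   if abs(y * decision_value(w, b, x) - 1.0) < 1e-9]
```

In this toy case the two support vectors, one per class, are the closest pair of opposite-class points, and the hyperplane is the perpendicular bisector of the segment joining them; the other two samples could be moved further away without changing the boundary.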

Experiments and accuracy tests
Tests were performed to measure algorithm performance as follows: first the original 19 attributes were used, and then the 5 most highly ranked attributes, selected by the CFS Subset Eval procedure. To compare the models, three performance criteria were employed:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Sensitivity = TP / (TP + FN)
Specificity = TN / (TN + FP)
where, for a sample t and a class A, a true positive (TP) means that t is predicted to belong to A and does in fact belong to it; a false positive (FP) means that t is predicted to belong to A but does not actually belong to it; a true negative (TN) means that t is not predicted to belong to A and does not in fact belong to it; and a false negative (FN) means that t is not predicted to belong to A but does in fact belong to it. These values can be represented by a matrix, called the confusion matrix (Table 1).
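As an illustration of how such criteria are obtained from the four confusion-matrix counts, the following Python sketch uses the standard definitions of accuracy, (TP + TN) / (TP + TN + FP + FN), sensitivity, TP / (TP + FN), and specificity, TN / (TN + FP). The labels below are hypothetical and are not the study's data; the function names are likewise illustrative.

```python
def confusion_counts(actual, predicted, positive):
    """Count true/false positives and negatives with respect to the positive class."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive and a != positive:
            fp += 1
        elif p != positive and a != positive:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Fraction of all samples classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def sensitivity(tp, fn):
    """Fraction of actual positives that were predicted positive."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of actual negatives that were predicted negative."""
    return tn / (tn + fp)


# Hypothetical labels: "yes" = completed the course in the predicted time.
actual = ["yes", "yes", "yes", "no", "no", "yes", "no", "no"]
predicted = ["yes", "yes", "no", "no", "yes", "yes", "no", "no"]

tp, fp, tn, fn = confusion_counts(actual, predicted, positive="yes")
acc = accuracy(tp, fp, tn, fn)
sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
```

Reporting sensitivity and specificity alongside accuracy matters when the classes are unbalanced, as with 170 completers and 117 non-completers, because accuracy alone does not show which class the errors fall on.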

Results
Table 1 presents the confusion matrices obtained using 19 and 5 variables, as well as the accuracy, sensitivity and specificity results for each case. The 19 variables that entered the model of final academic performance were: period of the course, area of the course, course, year of the course, marital status, whether dating or not, type of housing, gender of the respondent, age of the respondent, working or not, six behavioral categories of social skills (QHC-University Students, IHS-Del Prette), measure of mental health (SCID-IV) and two initial measures of academic performance. The analysis with five variables included: area of the course, year of the course, gender of the respondent, initial academic performance and final academic performance.
The accuracy for classification using all 19 features was 79.5% and 75% when using 5 variables.

Discussion
Two predictive models were identified for the academic performance of students who completed the undergraduate course in the time predicted by the curricula, through statistical models whose reliability, sensitivity and specificity were verified for demographic, mental health, course and social skill predictors. The model including all the variables of the study correctly classified approximately 80% (79.5%) of the students with respect to final performance, which is consistent with the literature, although no studies that dealt with all the variables simultaneously were identified. A model with only five variables correctly classified 75% of the students, and included the variables gender, type of course, year of the course, grades and initial academic performance. The difference in prediction percentage is small despite the use of far fewer variables, which is relevant for reducing cost and broadening applicability.
As in previous studies (Cerchiari et al., 2005; Facundes & Ludemir, 2005; Lima et al., 2006), living alone, compared to living with family or in student accommodation, was associated with problems; however, these studies noted that the housing situation influenced mental health and, in the present study, it was associated with satisfactory academic performance. Adlaf et al. (2001) and Shamsuddin et al. (2013) found no relationship between housing and mental health; however, these authors did not evaluate academic performance, which is the object of this study. Working also helped explain the final academic performance, seeming to agree with Cerchiari et al. (2005) and Mounsey et al. (2013), who found that paid work was a protective factor for the mental health of the university student and, conversely, in disagreement with Khan et al. (2007), who found no such relationship. However, again, these authors did not focus on academic performance. Future studies can control the mental health and academic performance variables in order to elucidate these findings.
Studies in the area commonly show that women have more mental health difficulties (Adlaf et al., 2001) and tend to seek psychological help more often (Peres, Santos & Coelho, 2004); conversely, they also have more social skills (Al-Alawneh et al., 2011; Bolsoni-Silva et al., 2010; Ozben, 2013) and present better academic performance (Ballester, 2012; Nemer et al., 2013) than men, which can function as protective factors for academic performance, in agreement with the data of this study. The year of the course is cited in the literature as influencing mental health (Adlaf et al., 2001; Gonçalves et al., 2012; Shamsuddin et al., 2013) and the social skills repertoire (Bolsoni-Silva et al., 2010), which means that these are relevant variables when the focus is on academic achievement, as was also observed in this study. The university course also influences mental health (Bandeira et al., 2005; Bolsoni-Silva & Loureiro, 2015; Cerchiari et al., 2005; Neves & Dalgalarrondo, 2007), social skills (Al-Alawneh et al., 2011) and academic performance (López-Bárcena et al., 2009), which was confirmed in the present study.
The social skills and mental health of university students may be variables that are less dependent on contextual conditions, such as the professional choice of a particular course/area of graduation, as they reflect more the conditions of the personal learning history of the young people and, accordingly, may influence academic performance in the medium term, requiring interventions as early as possible to prevent school dropout. Thus, school variables, such as type of course and the academic year of the student, present specificities that require preventive measures; that is, courses in the exact, human and biological sciences require study behaviors from the student, whereby it would be the responsibility of the course coordination to help in the adaptation of students who do not present, on entering university, the behaviors necessary for good academic performance. For example, in humanities courses public speaking and teamwork skills are required, these being social skills that should be assessed and promoted in those students who have more difficulties, thus preventing mental health problems. In the case of courses in the areas of exact and biological sciences, the contents taught are almost always much more complex than those learned in the previous steps, without the students being shown how they will use them in future professional practice.
The risk variables can also be combined, i.e., men present fewer mental health problems; however, they are less skilled and, for those in the areas of exact science, there is the additional risk of having more academic difficulties. A possible overlapping of conditions is highlighted here, expressed by the presence of more men in the exact science courses, and of those with fewer social skills, who possibly question less and have more difficulty seeking clarification, which favors this more negative academic outcome.
Therefore, this simultaneous analysis evidences a set of variables that correspond to cumulative risks for a negative final academic performance, that is, failure to complete the course in the time predicted. Thus, students of the exact science courses (Cerchiari et al., 2005; Neves & Dalgalarrondo, 2007; Bolsoni-Silva & Loureiro, 2015), students of the early years (Bolsoni-Silva & Loureiro, 2015b; Adlaf et al., 2001; Gonçalves et al., 2012) and male university students (Aguilhar, 2008; Iturra et al., 2012; López-Bárcena et al., 2009; Oliveira et al., 2008) require prevention and protection, especially those who present academic difficulties at the beginning of the courses. Accordingly, the university could have a welcome policy for students that included evaluation of these variables, favoring preventive interventions and guidelines for this high-risk population. In general, the findings of this study seem to agree with previous studies that have found relationships between social skill, mental health and academic performance variables (Al-Dubai et al., 2011; López-Bárcena et al., 2009), as well as between these constructs and sociodemographic measures (Aguilhar, 2008; Iturra et al., 2012; López-Bárcena et al., 2009; Oliveira et al., 2008).
It is concluded that experiences in the university may hinder adaptation and good academic performance (Hussain et al., 2013; Ramírez et al., 2009; Shamsuddin et al., 2013; Wristen, 2013) and affect the mental health of the student (Baptista et al., 2012; Bitsika & Sharpley, 2012; Fang et al., 2010; Gonçalves et al., 2012; Osada et al., 2010). Conversely, social skills can promote mental health (Adlaf et al., 2001; Bolsoni-Silva & Loureiro, 2014; Bolsoni-Silva & Loureiro, 2015a), adaptation (Monteiro et al., 2008), academic performance (López-Bárcena et al., 2009) and remaining in the university. These skills can be taught and, with this, the effect on mental health and academic performance can be immediately minimized, which in turn can also have an influence on the lives of young people. As the social skills and mental health of university students are variables with more personal than contextual characteristics, they can influence, in the medium term, not only academic performance, but also the quality of life of the young people. Therefore, interventions are required as soon as possible to avoid university dropout and to allow the young people to make the best use of their personal resources.
Different authors have indicated that emotional support and/or support from the university is important to prevent mental health problems (Facundes & Ludemir, 2005; Lima et al., 2006; Williams & Galliher, 2006). The support could be, on one hand, academic, considering the specificities of the areas, such as teaching students of the exact areas how to study, with public speaking practice for those of the human sciences. On the other hand, it could be directed towards personal relationships, such as making friends (Facundes & Ludemir, 2005; Lima et al., 2006; Neves & Dalgalarrondo, 2007), interacting with parents as an adult (Neves & Dalgalarrondo, 2007), feeling accepted (Facundes & Ludemir, 2005) and adapting to the new city of residence (Facundes & Ludemir, 2005).
Considering that entering university implies changes and challenges for young adults, including academic challenges (Monteiro et al., 2008; Osse & Costa, 2011), with the rating or grade (Cascón, 2000) being the main outcome indicator, this study contributed to the production of knowledge in the area by clarifying which variables are predictive of performance and completion of undergraduate courses, studying multiple variables with the same sample. Such data can be instrumental for institutions and professionals interested in this subject and in preventing university dropout.
It should be noted that the university, as a matter of public policy, should be concerned with assessing and promoting mental health and social skills in order to increase adherence to the university and good academic achievement, making good use of the resources invested and the windows of opportunity that academic training opens for young people.

Conclusion
This study found an influence of multiple variables on academic performance, from sociodemographic and course measures to those related to behavior, in this case social skills and mental health. The study helps to fill a gap in the literature, as it assessed, using a sophisticated mathematical model, the mutual influence of these variables, finding two predictive models. One of the models, which predicted the final academic performance with 75% accuracy, includes sociodemographic and course measures, which may be useful in planning strategies in the academic context that give special attention to these conditions, thus favoring low-cost prevention measures and the promotion of good performance. The other model also includes the behavioral variables of mental health and social skills, which highlights the need for the university to prevent problems by providing specialized services that offer treatment and personal development. As a limitation of the study, it is necessary to point out that with this method it is possible to state that these measures mutually influence each other; however, it is not possible to specify in which directions these influences work, which may be the subject of further studies. Other studies may also include observational measures of social skills and expand the sample to other universities and regions.

Table 1
Confusion matrices using 19 variables and 5 variables. Accuracy, specificity and sensitivity results using 19 and 5 variables.