Abstract
Paper aims
This paper compares the performance of Bayesian additive regression trees (BART), Random Forest (RF), and the logistic regression model (LRM) in developing credit scoring models.
Originality
BART is rarely used to analyze credit scoring data. The database, provided by Serasa-Experian, contains information on direct retail consumer credit operations. Credit bureau variables are also uncommon in academic papers.
Research method
Several models were fitted, and their performance was compared using standard methods.
Main findings
The analysis confirms that BART outperforms the LRM on the data analyzed. RF outperformed the LRM only on the balanced sample. The best-fitting BART model outperformed RF.
Implications for theory and practice
The results suggest that using BART or RF may yield better-performing credit scoring models.
Keywords
Credit; Machine learning; Logistic regression; BART; Random Forest