# Abstract

This study used real data from a Brazilian financial institution on Consumer Direct Credit (CDC) transactions granted to clients residing in the Distrito Federal (DF) to construct credit scoring models using Logistic Regression and Geographically Weighted Logistic Regression (GWLR). The aims were: to verify whether the factors that influence credit risk differ according to the borrower’s geographic location; to compare the set of models estimated via GWLR with the global model estimated via Logistic Regression in terms of predictive power and financial losses for the institution; and to verify the viability of using GWLR to develop credit scoring models. The metrics used to compare the models were the corrected Akaike information criterion (AICc), model accuracy, the percentage of false positives, the total value of false positive debt, and the expected monetary value of portfolio default compared with the observed monetary value of defaults. The models estimated for each region of the DF differed in their variables and coefficients (parameters), leading to the conclusion that credit risk was influenced differently in each region studied. Logistic Regression and GWLR produced very similar results in terms of predictive power and financial losses for the institution, and the study demonstrated the viability of using GWLR to develop credit scoring models for the target population.

Keywords: credit risk; geographically weighted logistic regression; credit scoring

# 1. INTRODUCTION

The main activity of commercial banks is financial intermediation, which consists of raising financial resources and lending them to third parties under pre-established conditions, such as payment period, installment value, and interest rate (Hand & Henley, 1997). Because it involves the expectation of future repayment, all credit granted is exposed to risk.

The topic of “risk management” drew attention in the financial sector after the publication of the Basel accords, a set of documents that serve as a basis for regulating and monitoring the sector. Advances in technology and computing, together with the development of quantitative methods, have contributed to the creation of different tools for measuring risk, bringing significant gains in the financial management of institutions.

Credit risk can be defined as the possibility of financial losses associated with borrowers or counterparties not fulfilling their respective obligations under the agreed terms, with the devaluation of loan contracts because of a deterioration in borrowers’ risk classifications, with reductions in earnings or remunerations, with advantages conceded in renegotiations, and with recovery costs (Brazilian Central Bank [BACEN], 2009). It is one of the main risks to which financial institutions are exposed.

The models used to measure risk when granting credit are called credit scoring models. Because they involve lower costs and offer greater agility, objectivity, and predictive power in credit granting decisions, credit scoring models have become popular and are widely used in the financial sector (Hand & Henley, 1997).

Lessmann, Baesens, Seow, and Thomas (2015) carried out a comprehensive study of the classification methodologies used for developing credit scoring models and identified logistic regression as the standard methodology in the financial sector.

Logistic regression is a multivariate analysis technique that aims to explain the relationship between a random binary dependent variable and a set of independent predictive variables (Hosmer & Lemeshow, 2000).
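For reference, the model described above can be written as follows. The notation is an assumption introduced here for illustration and is not taken from the original text: $Y_i$ is the binary response for borrower $i$, $x_{i1}, \dots, x_{ip}$ are the predictive variables, and $\beta_0, \dots, \beta_p$ are the parameters to be estimated.

$$
\pi_i = P(Y_i = 1 \mid x_{i1}, \dots, x_{ip}) = \frac{\exp\left(\beta_0 + \sum_{k=1}^{p} \beta_k x_{ik}\right)}{1 + \exp\left(\beta_0 + \sum_{k=1}^{p} \beta_k x_{ik}\right)},
\qquad
\ln\left(\frac{\pi_i}{1 - \pi_i}\right) = \beta_0 + \sum_{k=1}^{p} \beta_k x_{ik}.
$$

The second expression shows that the model is linear in the log-odds, which is why each coefficient can be read as the change in log-odds per unit change in the corresponding variable.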

Financial institutions use various credit scoring models, which are applied when evaluating different types of clients or credit operations to be contracted. The predictive variables that compose each model can be different, with the aim of improving predictions for the target population.

Geographical location (space) and its relationship with credit risk is the topic of some published studies. Among the most recent, Stine (2011) analyzed the evolution of defaults on real estate loans in US counties between 1993 and 2010, covering the periods before and after the subprime crisis, and found evidence of spatial correlation between default rates in these counties.

Fernandes and Artes (2016) used the Ordinary Kriging methodology to create a variable that reflects spatial risk and applied Logistic Regression to verify the existence of spatial correlation in defaults on loans taken out by small and medium-sized enterprises (SMEs), using data from the SERASA credit bureau. The authors developed models with and without the spatial risk variable and confirmed that including this variable improves credit scoring model performance.

The Geographically Weighted Regression (GWR) technique, proposed by Brunsdon, Fotheringham, and Charlton (1996), is used to model spatially heterogeneous (non-stationary) processes; that is, processes that vary (whether in mean, median, variance, etc.) from region to region. The basic idea of GWR is to fit a regression model to each region in the data set, using the geographical location of the other observations to weight their contribution to the parameter estimates. Applications of GWR can be found in different areas of research, such as Geography (See et al., 2015), Health (Gilbert & Chakraborty, 2011), and Economics (Huang & Leung, 2002).
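The idea of fitting one weighted model per location can be sketched in a few lines. The example below is only an illustration under stated assumptions, not the procedure used in this study: the data are synthetic, the Gaussian kernel and the bandwidth of 1.0 are arbitrary choices, and the names `gaussian_weights` and `fit_weighted_logit` are hypothetical helpers written for this sketch. It applies the idea to a logistic model, fitting each local model by Newton-Raphson on a distance-weighted log-likelihood.

```python
import numpy as np

def gaussian_weights(coords, center, bandwidth):
    """Kernel weights that shrink as the distance from `center` grows."""
    dist = np.linalg.norm(coords - np.asarray(center, dtype=float), axis=1)
    return np.exp(-0.5 * (dist / bandwidth) ** 2)

def fit_weighted_logit(x, y, weights, n_iter=50, tol=1e-8):
    """Maximise the weighted log-likelihood by Newton-Raphson.

    Returns the vector [intercept, slope(s)].
    """
    design = np.column_stack([np.ones(len(y)), x])  # add intercept column
    beta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-design @ beta))
        gradient = design.T @ (weights * (y - prob))
        hessian = (design * (weights * prob * (1.0 - prob))[:, None]).T @ design
        step = np.linalg.solve(hessian, gradient)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Synthetic data (assumption): the effect of x is +2 in the west, -2 in the east.
rng = np.random.default_rng(0)
n = 4000
coords = rng.uniform(0.0, 10.0, size=(n, 2))
x = rng.normal(size=n)
true_slope = np.where(coords[:, 0] < 5.0, 2.0, -2.0)
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-true_slope * x))).astype(float)

# Global model: every observation has weight 1.
beta_global = fit_weighted_logit(x, y, np.ones(n))
# Local models: observations are weighted by proximity to each regression point.
beta_west = fit_weighted_logit(x, y, gaussian_weights(coords, (2.5, 5.0), bandwidth=1.0))
beta_east = fit_weighted_logit(x, y, gaussian_weights(coords, (7.5, 5.0), bandwidth=1.0))

print("global slope:", beta_global[1])
print("west slope:  ", beta_west[1])
print("east slope:  ", beta_east[1])
```

In this toy setting the global slope averages the two opposite local effects and ends up close to zero, while the two local fits recover slopes of opposite sign. That is the kind of spatial non-stationarity a single global formula cannot represent.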

Atkinson, German, Sear, and Clark (2003) used Geographically Weighted Logistic Regression (GWLR) to analyze how the relationship between erosion and geomorphological controls depends on geographical location in a region of Wales. The binary (dummy) response variable used in that study was the presence or absence of erosion in the areas studied. Applying GWLR resulted in the estimation of models with different parameters (distinct models) for each area studied, revealing the need to adopt different practices to avoid erosion, depending on the region.

This article used data related to transactions involving Consumer Direct Credit (CDC), granted by a Brazilian financial institution to clients residing in the Distrito Federal (DF). The aims were as follows: to verify whether the factors that influence credit risk differ according to borrowers’ geographical locations; to compare the set of models estimated via GWLR with the global model estimated via Logistic Regression, in terms of predictive power and financial losses for the institution; and to verify the viability of using the GWLR technique to develop credit scoring models.

Although the central idea of this article, verifying whether space influences credit risk, is similar to that of Stine (2011) and Fernandes and Artes (2016), the target population and the methodology used are different, and no studies were found in the literature that used GWLR to develop credit scoring models.

One advantage of GWLR over other techniques lies in estimating a model for each region in the study, allowing these models to be distinct in their variables and parameters (Atkinson et al., 2003), whereas a global model, represented by only one formula, may not represent local variations adequately. In relation to credit, different study regions can involve different risks, and if this phenomenon is observed, models that consider local characteristics can better differentiate the credit risk of borrowers residing there and generate financial gains for the institution.

Another difference from other studies on this topic, and a further advantage of GWLR, is that a different sample is used to develop each local model: greater weight is given to borrowers who are geographically closer, and distant observations that fall outside the radius defined by the weighting function are not used.
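A small sketch may help to show how a weighting function with a finite radius produces a different effective sample for each local model. The bi-square kernel used below is a common choice in the GWR literature, but it is an assumption here: the text above does not state which weighting function was used. The coordinates, the bandwidth of 12 distance units, and the function name `bisquare_weights` are likewise hypothetical.

```python
import numpy as np

def bisquare_weights(coords, center, bandwidth):
    """Bi-square kernel: weight decays with distance and is exactly 0 beyond `bandwidth`."""
    dist = np.linalg.norm(
        np.asarray(coords, dtype=float) - np.asarray(center, dtype=float), axis=1
    )
    weights = (1.0 - (dist / bandwidth) ** 2) ** 2
    return np.where(dist < bandwidth, weights, 0.0)

# Hypothetical borrower locations (arbitrary distance units).
borrowers = np.array([
    [0.0, 0.0],
    [3.0, 4.0],
    [6.0, 8.0],
    [20.0, 0.0],
    [24.0, 3.0],
])

# Weights seen by two different local models, centred on two regression points.
w_a = bisquare_weights(borrowers, center=(0.0, 0.0), bandwidth=12.0)
w_b = bisquare_weights(borrowers, center=(20.0, 0.0), bandwidth=12.0)

print("sample used by model A:", np.flatnonzero(w_a))
print("sample used by model B:", np.flatnonzero(w_b))
```

With these inputs, the first three borrowers contribute to the model centred at the origin and the last two do not, while the reverse holds for the model centred at (20, 0). Within each sample, nearer borrowers receive larger weights.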

Questions regarding endogeneity are not addressed in this study and could be raised by researchers in future papers.

In addition to this introduction, the second section of the article presents the geographically weighted logistic regression methodology and the process for developing the models, the third presents the results obtained, and the fourth sets out the conclusion.

# 2. METHODOLOGY

The flowchart presented in Figure 1 details all of the stages carried out in the process of developing the models in this study.

Figure 1. Flowchart of the stages in developing the models.

# 4. CONCLUSION

In this article, real data from a Brazilian financial institution on Consumer Direct Credit transactions granted to clients residing in 19 regions of the Distrito Federal were used to develop credit scoring models using two different methodologies: Logistic Regression and Geographically Weighted Logistic Regression.

The Logistic Regression methodology is widespread in the financial sector and was used in this study to develop a global credit scoring model for the whole Distrito Federal.

The Geographically Weighted Logistic Regression methodology is still rarely used; it uses the borrower’s geographical location to weight observations when developing different models for each region studied.

The indicators used to compare the models developed via the two methodologies were very close, and based on the results obtained, the methodologies can be considered similar in terms of predictive power and financial losses for the institution.
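To make the comparison indicators concrete, the sketch below computes accuracy, the percentage of false positives, the total false positive debt, and the expected versus observed monetary value of default for a toy portfolio. Several points are assumptions made for illustration and are not confirmed by the text: a "positive" is taken to mean a borrower classified as a good payer, so a false positive is a borrower classified as good who defaulted; the cutoff of 0.5 is arbitrary; the expected default value is taken as the sum of predicted default probability times debt, with no recovery; and the five-borrower portfolio and the function name `portfolio_metrics` are hypothetical.

```python
import numpy as np

def portfolio_metrics(defaulted, pd_hat, debt, cutoff=0.5):
    """Comparison indicators for a scored portfolio (definitions are assumptions)."""
    defaulted = np.asarray(defaulted, dtype=bool)
    pd_hat = np.asarray(pd_hat, dtype=float)   # predicted probability of default
    debt = np.asarray(debt, dtype=float)       # outstanding debt per borrower

    predicted_default = pd_hat >= cutoff
    # Assumed definition: classified as a good payer, but actually defaulted.
    false_positive = ~predicted_default & defaulted

    return {
        "accuracy": float(np.mean(predicted_default == defaulted)),
        "false_positive_pct": float(100.0 * np.mean(false_positive)),
        "false_positive_debt": float(debt[false_positive].sum()),
        "expected_default_value": float((pd_hat * debt).sum()),
        "observed_default_value": float(debt[defaulted].sum()),
    }

# Hypothetical portfolio of five borrowers.
metrics = portfolio_metrics(
    defaulted=[0, 0, 1, 1, 0],
    pd_hat=[0.1, 0.6, 0.8, 0.3, 0.2],
    debt=[1000.0, 2000.0, 1500.0, 500.0, 3000.0],
)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Running the same function on the scores of a global model and on the scores of a set of regional models, over the same borrowers, yields directly comparable indicators for the two approaches.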

The study demonstrated that some variables were significant for all of the regions, whereas others were significant only for particular regions, which leads to the conclusion that credit risk is influenced by different factors depending on the region studied.

It was also observed that all of the regression models developed using GWLR (regional models) presented different values for the coefficients (parameters) of the variables, showing that the weights (importance) of the variables varied from region to region.

The results demonstrated the viability of applying the GWLR methodology to develop credit scoring models for the target population in this study. The formulas obtained are applicable only to this population; however, it is believed that the methodology could be extended to other credit transactions and spatial levels (e.g., neighborhoods, municipalities, federal units).

Owing to the great advances in computing and technology in recent decades, institutions that grant credit have robust credit risk evaluation systems, which makes the implementation and use of a set of models estimated via GWLR viable.

Regarding the limitations of the study, the use of few predictive variables meant that the models produced a narrow range of scores.

The Formal Income variable was categorized so that its classes were monotonic with respect to relative risk; however, the values of their coefficients were inverted. Studies considering another categorization or target population should be carried out to verify the relevance of this variable for credit risk.

For future studies, it is suggested that: the GWLR methodology be applied to develop credit scoring models for other target populations (for example, different credit transactions or geographical regions); comparisons be carried out with other methodologies (such as Support Vector Machines or Boosting); other predictive variables be used; the GWLR methodology be applied to develop models in other areas of a financial institution, such as strategy and marketing; and other functions, such as the Log Binomial, be used to develop geographically weighted models.

# REFERENCES

• Anselin, L. (1995). Local indicators of spatial association - LISA. Geographical Analysis, 27(2), 93-115.
• Atkinson, P. M.; German, S. E.; Sear, D. A.; Clark, M. J. (2003). Exploring the relations between riverbank erosion and geomorphological controls using geographically weighted logistic regression. Geographical Analysis, 35(1), 58-82.
• Banco Central do Brasil (2009). Resolução CMN nº 3.721, de 30/04/2009. Retrieved from http://www.bcb.gov.br
• Brunsdon, C.; Fotheringham, A. S.; Charlton, M. E. (1996). Geographically weighted regression: a method for exploring spatial nonstationarity. Geographical Analysis, 28(4), 281-298.
• Crook, J. N.; Edelman, D. B.; Thomas, L. C. (2007). Recent developments in consumer credit risk assessment. European Journal of Operational Research, 183(3), 1447-1465.
• Fernandes, G. B.; Artes, R. (2016). Spatial dependence in credit risk and its improvement in credit scoring. European Journal of Operational Research, 249(2), 517-524.
• Fotheringham, A. S.; Brunsdon, C.; Charlton, M. (2002). Geographically weighted regression: the analysis of spatially varying relationships. Chichester: John Wiley & Sons.
• Gilbert, A.; Chakraborty, J. (2011). Using geographically weighted regression for environmental justice analysis: Cumulative cancer risks from air toxics in Florida. Social Science Research, 40(1), 273-286.
• Hand, D. J.; Henley, W. E. (1997). Statistical classification methods in consumer credit scoring: a review. Journal of the Royal Statistical Society: Series A (Statistics in Society), 160(3), 523-541.
• Hosmer, D. W.; Lemeshow, S. (2000). Applied logistic regression. Hoboken, NJ: John Wiley & Sons.
• Huang, Y.; Leung, Y. (2002). Analysing regional industrialisation in Jiangsu province using geographically weighted regression. Journal of Geographical Systems, 4(2), 233-249.
• Hurvich, C. M.; Simonoff, J. S.; Tsai, C. L. (1998). Smoothing parameter selection in nonparametric regression using an improved Akaike information criterion. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 60(2), 271-293.
• Lessmann, S.; Baesens, B.; Seow, H. V.; Thomas, L. C. (2015). Benchmarking state-of-the-art classification algorithms for credit scoring: An update of research. European Journal of Operational Research, 247(1), 124-136.
• Moran, P. A. (1950). Notes on continuous stochastic phenomena. Biometrika, 37(1/2), 17-23.
• See, L.; Schepaschenko, D.; Lesiv, M.; McCallum, I.; Fritz, S.; Comber, A.; Obersteiner, M. (2015). Building a hybrid land cover map with crowdsourcing and geographically weighted regression. ISPRS Journal of Photogrammetry and Remote Sensing, 103, 48-56.
• Stine, R. (2011). Spatial temporal models for retail credit. In Proceedings of the Credit Scoring and Credit Control Conference, Edinburgh, UK.
*Paper presented at the XL ANDAP Congress, Costa do Sauípe, BA, Brazil, September 2016.

# Publication Dates

• Publication in this collection: Apr 2017