Geographically Weighted Logistic Regression Applied to Credit Scoring Models

This study used real data from a Brazilian financial institution on transactions involving Consumer Direct Credit (CDC), granted to clients residing in the Distrito Federal (DF), to construct credit scoring models via Logistic Regression and Geographically Weighted Logistic Regression (GWLR) techniques. The aims were: to verify whether the factors that influence credit risk differ according to the borrower’s geographic location; to compare the set of models estimated via GWLR with the global model estimated via Logistic Regression, in terms of predictive power and financial losses for the institution; and to verify the viability of using the GWLR technique to develop credit scoring models. The metrics used to compare the models developed via the two techniques were the AICc informational criterion, the accuracy of the models, the percentage of false positives, the sum of the value of false positive debt, and the expected monetary value of portfolio default compared with the monetary value of defaults observed. The models estimated for each region in the DF were distinct in their variables and coefficients (parameters), with it being concluded that credit risk was influenced differently in each region in the study. The Logistic Regression and GWLR methodologies presented very close results, in terms of predictive power and financial losses for the institution, and the study demonstrated viability in using the GWLR technique to develop credit scoring models for the target population in the study.


INTRODUCTION
The main activity of commercial banks is financial intermediation, which consists of raising financial resources and lending them to third parties under pre-established conditions, such as payment period, installment value, and interest rate (Hand & Henley, 1997). As it involves the expectation of future receipt, all credit granted is exposed to risk.
The topic of risk management drew attention in the financial sector after the publication of the Basel accords, a set of documents that serve as a basis for regulation and monitoring of the sector. Advances in technology and computing, together with the development of quantitative methods, have contributed to the creation of different tools for measuring risk, bringing significant gains in the financial management of institutions.
Credit risk can be defined as the possibility of financial losses occurring, associated with borrowers or counterparties not fulfilling their respective obligations in the agreed terms, with the devaluation of loan contracts because of a deterioration in borrowers' risk classifications, with reductions in earnings or remunerations, with advantages conceded in renegotiations, and with recovery costs (Brazilian Central Bank [BACEN], 2009). It is one of the main risks that financial institutions are exposed to.
The models used to measure risk when granting credit are called credit scoring models. Because they involve lower costs and greater agility, objectivity, and predictive power in credit granting decisions, credit scoring models have become popular and are widely used by the financial sector (Hand & Henley, 1997). Lessmann, Baesens, Seow, and Thomas (2015) carried out a comprehensive study on the classification methodologies used for developing credit scoring models and indicated logistic regression as the standard methodology in the financial sector.
Logistic regression is a multivariate analysis technique that aims to explain the relationship between a random binary dependent variable and a set of independent predictive variables (Hosmer & Lemeshow, 2000).
Financial institutions use various credit scoring models, which are applied when evaluating different types of clients or credit operations to be contracted. The predictive variables that compose each model can be different, with the aim of improving predictions for the target population.
Geographical location (space) and its relationship with credit risk is the topic of some published studies. Among the most recent, Stine (2011) analyzes the evolution of defaults on real estate loans in US counties between 1993 and 2010, covering the pre- and post-subprime crisis periods, and finds evidence of a spatial correlation between default rates in these counties. Fernandes and Artes (2015) used the Ordinary Kriging methodology to create a variable that reflects spatial risk and applied the Logistic Regression technique to verify the existence of a spatial correlation in defaults on loans taken out by small and medium-sized enterprises (SMEs), using data from the SERASA credit bureau. The authors developed models with and without the spatial risk variable and confirmed that the inclusion of this variable improves credit scoring model performance.
The Geographically Weighted Regression (GWR) technique, proposed by Brunsdon, Fotheringham, and Charlton (1996), is used to model spatially heterogeneous (non-stationary) processes; that is, processes that vary (whether in mean, median, variance, etc.) from region to region. The basic idea of GWR is to fit a regression model to each region in the data set, using the geographical location of the other observations to weight the parameter estimates. Applications of the GWR technique can be observed in different areas of research, such as Geography (See et al., 2015), Health (Gilbert & Chakraborty, 2011), and Economics (Huang & Leung, 2002). Atkinson, German, Sear, and Clark (2003) used Geographically Weighted Logistic Regression (GWLR) to analyze the dependency on geographical location of the relationship between erosion and geomorphologic controls in a region of Wales. The dummy (binary) variable used in that study was the presence or absence of erosion in the areas studied. Applying the GWLR technique resulted in the estimation of models with different parameters (distinct models) for each area studied, revealing the need to adopt different practices to avoid erosion, depending on the region.
This article used data related to transactions involving Consumer Direct Credit (CDC), granted by a Brazilian financial institution to clients residing in the Distrito Federal (DF). The aims were as follows: to verify whether the factors that influence credit risk differ according to borrowers' geographical locations; to compare the set of models estimated via GWLR with the global model estimated via Logistic Regression, in terms of predictive power and financial losses for the institution; and to verify the viability of using the GWLR technique to develop credit scoring models.
Although the central idea in this article of verifying whether space influences credit risk is similar to that of Stine (2011) and Fernandes and Artes (2015), the target population and methodology used are different, and no studies were found in the literature that used the GWLR technique for the development of credit scoring models. One advantage of applying the GWLR technique in relation to the others lies in estimating a model for each region in the study, allowing these models to be distinct in their variables and parameters (Atkinson et al., 2003), whereas a global model, represented by only one formula, may not represent local variations adequately. In relation to credit, different study regions can involve different risks, and if this phenomenon is observed, models that consider local characteristics can better differentiate the credit risk for borrowers residing there and generate financial gains for the institution.
Another difference from other studies on this topic, and an advantage of the GWLR technique, involves the use of different samples in developing each local model, giving greater weight to borrowers who are closer geographically, and not using distant information that is outside the radius defined by the weighting function.
Questions regarding endogeneity are not addressed in this study and could be raised by researchers in future papers.
In addition to this introduction, the second section of the article presents the geographically weighted logistic regression methodology and the process for developing the models, the third shows the results obtained, and the fourth sets out the conclusion.

METHODOLOGY
The flowchart presented in Figure 1 details all of the stages carried out in the process of developing the models in this study.

Database
The data related to this study refer to transactions involving Consumer Direct Credit (CDC) granted by a Brazilian financial institution to clients residing in the Distrito Federal (DF). These transactions are paid in installments over periods between 0 and 36 months and have a maximum contract value of R$30,000.00.
The territorial division of the DF used in this study was composed of 19 regions, shown in Figure 2.
The sample included all loans granted between December 2013 and September 2014, involving 10 rounds of borrowing and a total of 22,132 different loan contracts. Payment performance on these loans was monitored in the twelve months subsequent to the contract agreement date, and those that exceeded 90 days in arrears in any of these months were labeled as being in default (Y=1). Because loan arrears performance involves different moments in time, this database is classified as being of the panel data type.
The predictive variables selected to compose the models were: Age, Income, Level of Education, Borrower's Time of Relationship with the Institution, Loan Contract Period, SELIC, Unemployment Rate, and Inflation (IPCA). These variables refer to the time credit is taken out (a single point in time), thus involving cross-sectional type data.
The latitude and longitude geographical coordinates for the regions used in the study, needed to apply the GWLR technique, were obtained from the IBGE website; they refer to the central point in each region and are equal for borrowers residing in the same region.
The database was subdivided into model development and validation samples. The data manipulation, as well as the univariate, bivariate, and spatial indicator calculations, along with those for developing the global model via logistic regression analysis, were carried out using the SAS software. The GWLR models were developed using the GWR4 software.

Spatial indicators
Moran's I (Moran, 1950) is one of the most widely used global indicators for verifying the existence of spatial correlation. Global indicators present a single measure of spatial tendency for the whole region being studied; they allow the hypothesis of spatial dependency between regions to be tested for the variable of interest, and are used in exploratory data analysis. The formula is given by:

Whereas the global indicators assume that all of the regions studied can be represented by a single value, the local indicators of spatial association (LISA), developed by Anselin (1995), are used to verify the existence of spatial correlation within the geographical units studied and to look for regional differences (peculiarities). The presence of areas with significant local indices is an indication of spatial heterogeneity (non-stationarity).
The Moran Local Index formula is given by:

The database used in applying the Moran Global and Local Indices was the total database of records (without subdivision of samples) and the variable tested was the regional default rate, calculated via the following formula:

In this study the Moran Global Index was used to verify the existence of spatial correlation in the default rate between the regions in the DF. The Moran Local Index was used to verify the existence of regions with different default rates in relation to the others. The existence of significant regions (the confidence level used for the Moran Local Index was 95%) may indicate that the regression models developed for these regions are different in relation to the models for the other regions in the study, which may warrant applying the GWLR to this target population.
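The two indices described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the textbook forms of the global and local Moran statistics, takes the regional default rate to be the number of defaulted contracts divided by the number of contracts in the region, and uses a hypothetical four-region neighborhood matrix with made-up rates. The function names are illustrative and do not come from the study.

```python
from typing import List, Sequence


def default_rate(defaults: int, contracts: int) -> float:
    """Assumed definition: defaulted contracts over all contracts in a region."""
    return defaults / contracts


def global_moran(values: Sequence[float], weights: Sequence[Sequence[float]]) -> float:
    """Textbook global Moran's I for one variable and a spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all spatial weights
    cross = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in dev)


def local_moran(values: Sequence[float], weights: Sequence[Sequence[float]]) -> List[float]:
    """Textbook local Moran index (LISA), one value per region."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    m2 = sum(d * d for d in dev) / n  # second moment of the deviations
    return [dev[i] / m2 * sum(weights[i][j] * dev[j] for j in range(n)) for i in range(n)]


# Hypothetical example: four regions in a row, each neighboring the next one.
rates = [0.10, 0.20, 0.30, 0.40]
w = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
moran_i = global_moran(rates, w)
lisa = local_moran(rates, w)
```

With these definitions the local indices sum to the global index multiplied by the total of the spatial weights, which gives a quick consistency check on any implementation.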

Geographically Weighted Regression
According to Fotheringham, Brunsdon, and Charlton (2002), given a basic linear regression model, the equivalent expression for the GWR is given by:

It is noted from the expression above that the model parameters represented by the function β_k(u_i, v_i) vary according to the values (u_i, v_i), which represent the latitude and longitude geographical coordinates for observation (region) i, resulting in a different model for each region in the study. The assumptions of the classical linear regression model remain in place for GWR.
The matrix form for estimating the GWR parameters is given by:

in which W(u_i, v_i) is a diagonal matrix, different for each point i of coordinates (u_i, v_i), containing the weights w_ij on its main diagonal, obtained via the weighting functions, or kernels. Substituting the value 1 for all the weights w_ij yields the identity matrix, which, substituted in (5), turns it back into the classical linear regression model.
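For reference, the GWR model and its weighted least squares estimator are usually written as follows in the literature. The notation here (intercept term, x_{ik}, error term) is an assumption and may differ from the authors' own equations.

```latex
% GWR model: one set of coefficients per location (u_i, v_i)
y_i = \beta_0(u_i, v_i) + \sum_{k} \beta_k(u_i, v_i)\, x_{ik} + \varepsilon_i

% Locally weighted least squares estimator
\hat{\boldsymbol{\beta}}(u_i, v_i)
  = \left( X^{\top} W(u_i, v_i)\, X \right)^{-1} X^{\top} W(u_i, v_i)\, \mathbf{y},
\qquad
W(u_i, v_i) = \operatorname{diag}\left( w_{i1}, w_{i2}, \ldots, w_{in} \right)
```

Setting W(u_i, v_i) to the identity matrix reduces the estimator to the ordinary least squares form (X^T X)^{-1} X^T y, which matches the remark about the classical linear regression model.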
The two main weighting functions found in the literature are the Normal or Gaussian function and the Bisquare function. The formulas for both functions are presented in Table 1.
Table 1 Weighting functions or kernels.

Weighting Functions
Weighting Function Formulas

Fixed Gaussian
Fixed Bisquare
It is noted from Table 1 that there are two types of expressions for each of the Gaussian and Bisquare functions, which differ in the method of choosing the bandwidth parameter b (whether fixed or variable). The d_ij parameter contained in the weighting functions represents the distance from point i to point j, the b parameter is the fixed bandwidth (smoothing parameter), and the b_i(k) parameter represents the adaptive bandwidth, with the letter k representing the number of neighbors closest to point i.
The bandwidth parameter controls the variance in the weighting function; for this reason, in situations in which the data are not equally distributed between regions, use of the adaptive bandwidth is recommended. When developing a model via GWR using the fixed bandwidth, it should be specified by its value in units of distance; when using the adaptive bandwidth, however, a fixed number k of closest neighbors to be used in the models should be defined, and based on this quantity k, the value of the bandwidth varies between the regions being studied.
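The fixed and adaptive variants can be illustrated with a short Python sketch. The kernel expressions used here are common forms from the GWR literature and are an assumption; the exact constants in Table 1 or in GWR4 may differ. The function names are illustrative, and the adaptive bandwidth is taken to be the distance from point i to its k-th nearest neighbor, with the list of distances excluding point i itself.

```python
import math
from typing import Sequence


def gaussian_weight(d_ij: float, b: float) -> float:
    """Gaussian kernel (assumed form): weight decays smoothly and never reaches zero."""
    return math.exp(-0.5 * (d_ij / b) ** 2)


def bisquare_weight(d_ij: float, b: float) -> float:
    """Bisquare kernel (assumed form): weight is exactly zero at or beyond the bandwidth."""
    if d_ij >= b:
        return 0.0
    return (1.0 - (d_ij / b) ** 2) ** 2


def adaptive_bandwidth(distances_from_i: Sequence[float], k: int) -> float:
    """Adaptive bandwidth b_i(k): distance from point i to its k-th nearest neighbor."""
    return sorted(distances_from_i)[k - 1]


# Hypothetical distances (in km) from one region's central point to four others.
distances = [5.0, 1.0, 3.0, 2.0]
b_fixed = 2.5                                # fixed bandwidth, same for every region
b_local = adaptive_bandwidth(distances, 2)   # adaptive bandwidth with k = 2

fixed_weights = [gaussian_weight(d, b_fixed) for d in distances]
adaptive_weights = [bisquare_weight(d, b_local) for d in distances]
```

With a fixed bandwidth the same b applies everywhere, so sparse regions may end up with few effective observations; the adaptive variant widens or narrows b so that each local model draws on the same number of neighbors.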

Geographically Weighted Logistic Regression
When the response variable is binary, GWR should be applied via Geographically Weighted Logistic Regression (GWLR), in which the formula for obtaining the probability of the event of interest occurring is given by:

or, equivalently, in the form:

in which π(x_j) is the probability of the j-th client defaulting and the function β_k(u_i, v_i) represents the parameters (coefficients) of the k variables in the model, which vary according to the region i of latitude and longitude coordinates (u_i, v_i).
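The two equivalent forms referred to above are commonly written as follows. The intercept term and the covariate notation x_{jk} are assumptions introduced here for illustration.

```latex
% Probability form
\pi(x_j) = \frac{\exp\left( \beta_0(u_i, v_i) + \sum_{k} \beta_k(u_i, v_i)\, x_{jk} \right)}
                {1 + \exp\left( \beta_0(u_i, v_i) + \sum_{k} \beta_k(u_i, v_i)\, x_{jk} \right)}

% Logit form
\ln\left( \frac{\pi(x_j)}{1 - \pi(x_j)} \right)
  = \beta_0(u_i, v_i) + \sum_{k} \beta_k(u_i, v_i)\, x_{jk}
```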
The GWLR parameters are estimated via the maximum likelihood method, and the GWLR likelihood function is represented by the following expression:

By applying the natural logarithm transformation (ln) and developing the formula, we obtain:

The W(u_i, v_i) matrix described in (6) contains the weights w_ij (calculated via the weighting functions shown in Table 1) and is used to geographically weight the observations in the estimation of each set of parameters β_k(u_i, v_i). That is, this matrix assigns a higher weight to the observations geographically closest to region i in the estimation of its parameters, and a lower or zero weight (depending on the weighting function chosen) to the observations most distant from region i. The W(u_i, v_i) matrix also varies according to the location of each borrower and enters the likelihood function in the following way:

Similar to the logistic regression model, after differentiating (11) with respect to β(u_i, v_i) and equating to zero, the model parameters are estimated using iterative numerical methods, such as the iteratively reweighted least squares (IRLS) method. It should be noted that this maximization procedure is carried out for each one of the functions related to each region i in the study.
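The local estimation step can be sketched for a single region as follows. This is a minimal illustration and not the GWR4 implementation: it maximizes a geographically weighted log-likelihood with Newton-Raphson updates, which for the logit link coincide with IRLS. The function names, the tiny linear solver, and the toy data are all hypothetical.

```python
import math
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_local_logit(X: Sequence[Sequence[float]], y: Sequence[int],
                    w: Sequence[float], max_iter: int = 50) -> List[float]:
    """Weighted logistic regression for one region.

    X: rows of covariates with a leading 1.0 for the intercept.
    y: 1 for default, 0 otherwise.
    w: geographic weights w_ij of each observation j relative to region i.
    """
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        gradient = [0.0] * p
        information = [[0.0] * p for _ in range(p)]
        for x_j, y_j, w_j in zip(X, y, w):
            eta = sum(b * x for b, x in zip(beta, x_j))
            pi_j = 1.0 / (1.0 + math.exp(-eta))
            for a in range(p):
                gradient[a] += w_j * (y_j - pi_j) * x_j[a]
                for c in range(p):
                    information[a][c] += w_j * pi_j * (1.0 - pi_j) * x_j[a] * x_j[c]
        step = solve(information, gradient)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-10:
            break
    return beta


# Toy data: intercept plus one dummy variable, eight hypothetical borrowers.
X = [[1.0, 0.0]] * 4 + [[1.0, 1.0]] * 4
y = [0, 0, 0, 1, 0, 1, 1, 1]
beta_equal = fit_local_logit(X, y, [1.0] * 8)          # all weights 1: global model
beta_local = fit_local_logit(X, y, [1.0] * 7 + [0.0])  # last borrower outside the kernel
```

Running the same routine once per region, each time with that region's own weight vector, yields the set of local models; with all weights equal to 1 it reproduces the global logistic regression.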
Initially, four different models were developed, one using each of the weighting functions presented in Table 1. The best model based on AICc was selected for comparison with the global model and for comparison between the local models (the models generated for each region in the DF) in terms of the significance of the variables that composed the final formula and the estimates of the coefficients of the variables.

Comparison Between the Models
The metrics used to compare the models developed via GWLR and Logistic Regression were: the AICc informational criterion (Hurvich, Simonoff, & Tsai, 1998), the accuracy of the models, the percentage of false positives, the sum of the value of false positive debt, and the expected monetary value of portfolio defaults compared with the monetary value of defaults observed.
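For reference, the corrected Akaike criterion is usually written in the following general form, where L is the maximized likelihood, n the number of observations, and K the number of parameters. For geographically weighted models K is typically replaced by an effective number of parameters; whether GWR4 computes the criterion exactly this way is an assumption here.

```latex
\mathrm{AICc} = -2 \ln L(\hat{\beta}) + 2K + \frac{2K(K + 1)}{n - K - 1}
```

Lower values indicate a better trade-off between fit and complexity, which is why the model with the lowest AICc is preferred.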
The accuracy of the models and the percentage of false positives were obtained via the confusion matrix (Table 2), adapted from Crook, Edelman, and Thomas (2007).
According to Table 2, there are two types of error that a classifying model can commit: rejecting good clients (False Negative, FN), or approving bad clients (False Positive, FP). The latter, also known as a Type II Error, is considered to be the worse of the two errors, since these clients would be approved and could generate financial losses for the institution. Thus, the FP percentage was one of the metrics used to compare the models.
The expected monetary value of portfolio defaults was calculated using the formula for the expected value of a discrete distribution, given by:

in which n is the total number of borrowers in a portfolio, x_i is the outstanding balance on the credit transaction for borrower i, and P(Y_i = 1) is the probability of borrower i defaulting, as produced by the credit models. This value was compared with the sum of defaulting client debts, with the aim of verifying which model comes closest to the real default value.
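The comparison metrics can be sketched in Python as follows. The sketch follows the convention of Table 2, in which "positive" means a client classified as good, and assumes that a borrower is classified as bad when the estimated default probability is at or above the cut-off; that boundary rule, the function names, and the toy portfolio are assumptions. The expected monetary value is computed as the sum over borrowers of x_i times P(Y_i = 1), as described in the text.

```python
from typing import Dict, Sequence


def confusion_counts(y: Sequence[int], scores: Sequence[float], cutoff: float) -> Dict[str, int]:
    """Confusion matrix where 'positive' means classified as a good client.

    y: 1 for observed default (bad client), 0 otherwise (good client).
    scores: estimated probability of default.
    """
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for y_i, s_i in zip(y, scores):
        classified_bad = s_i >= cutoff  # assumed boundary rule
        if y_i == 0 and not classified_bad:
            counts["TP"] += 1  # good client classified as good
        elif y_i == 1 and classified_bad:
            counts["TN"] += 1  # bad client classified as bad
        elif y_i == 1 and not classified_bad:
            counts["FP"] += 1  # bad client classified as good (approved)
        else:
            counts["FN"] += 1  # good client classified as bad (rejected)
    return counts


def accuracy(counts: Dict[str, int]) -> float:
    """Proportion of TP and TN in relation to the total."""
    return (counts["TP"] + counts["TN"]) / sum(counts.values())


def false_positive_debt(y, scores, balances, cutoff: float) -> float:
    """Sum of outstanding balances of bad clients that the model would approve."""
    return sum(x for y_i, s, x in zip(y, scores, balances) if y_i == 1 and s < cutoff)


def expected_default_value(scores, balances) -> float:
    """Expected monetary value of defaults: sum of balance times default probability."""
    return sum(x * p for p, x in zip(scores, balances))


# Hypothetical portfolio of four borrowers.
y = [0, 0, 1, 1]
scores = [0.10, 0.40, 0.20, 0.50]
balances = [100.0, 200.0, 300.0, 400.0]

counts = confusion_counts(y, scores, cutoff=0.30)
acc = accuracy(counts)
fp_debt = false_positive_debt(y, scores, balances, cutoff=0.30)
expected = expected_default_value(scores, balances)
observed = sum(x for y_i, x in zip(y, balances) if y_i == 1)
```

Comparing `expected` with `observed` mirrors the last metric in the text: the model whose expected value lies closer to the observed sum of defaulted debt is preferred on that criterion.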

Univariate and Bivariate Analyses
The results on general default rates and those by region are shown in Tables 3 and 4, and the spatial distribution of default rates is found in Figure 6. As shown in Table 3, the general default rate in the DF was 27.66%; thus, it can be observed in Table 4 that only seven regions (Lago Sul, Cruzeiro, Brasília, Guará, Lago Norte, Taguatinga, and Núcleo Bandeirante) have lower default rates than the general average. It is also noted that the Lago Sul region presented the lowest default rate of the regions studied, followed by the Cruzeiro and Brasília regions. As can be observed in Figure 6, the three regions are located in the center of the Distrito Federal.
Also by analyzing Figure 6, it is noted that the greater the distance from the central point in the DF, the more default rates increase (represented by the darkest areas on the map). The Santa Maria, Recanto das Emas, and Paranoá regions stand out in negative terms by presenting the worst default rates.
The frequencies were calculated, along with the mean, median, maximum, minimum, and quartile statistics for the candidate variables for composing the models, and as there were no inconsistencies, missing values, or outliers, no variable was removed in this stage of the study.
The bivariate analysis consisted of calculating the cross frequency between the predictive variables and the response variable, with the aim of identifying the variables that differentiate credit risk among the target population in the study. The variables were categorized based on Relative Risk (14), and using this categorization, dummy variables were created to compose the models.
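A cross-frequency calculation of this kind can be sketched as follows. The definition of relative risk used here is an assumption: the share of all good clients that fall in a category divided by the share of all bad clients that fall in the same category. It was chosen because it agrees with the reading in the text that a lower relative risk corresponds to a greater credit risk, but the authors' formula (14) may differ. The function name and the counts are hypothetical.

```python
from typing import Dict, Tuple


def relative_risk(cross_freq: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Relative risk per category (assumed definition).

    cross_freq maps each category to (number of good clients, number of bad clients).
    Values above 1 mean the category concentrates good clients; below 1, bad clients.
    """
    total_good = sum(good for good, _ in cross_freq.values())
    total_bad = sum(bad for _, bad in cross_freq.values())
    return {
        category: (good / total_good) / (bad / total_bad)
        for category, (good, bad) in cross_freq.items()
    }


# Hypothetical cross frequency for one variable with two categories.
table = {"category_a": (30, 10), "category_b": (20, 40)}
rr = relative_risk(table)
```

Categories with similar relative risk can then be merged, and one dummy variable created per resulting class.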

All attributes of the unemployment rate and inflation variables presented similar levels of credit risk, and for this reason, they were excluded from the study. The categories for the other variables are found in Table 5.
It is observed in Table 5 that borrowers with higher Formal Incomes presented a lower credit risk. It is also observed that the higher the borrower's Level of Education, the lower the risk, with PhDs presenting a much higher relative risk than the rest. The results also indicated that the older the borrower and the shorter the loan period, the lower the credit risks. With relation to the borrower's time of relationship with the institution, those with shorter times presented a greater credit risk.
The SELIC rate is the basic interest rate in the Brazilian economy. An increase in the SELIC makes it more expensive for financial institutions to raise funds, which consequently makes credit transactions more expensive. Higher interest rates in credit transactions reduce the purchasing power of borrowers, and because of this, it is expected that the higher the SELIC rate, the greater defaults and credit risk will be. However, as observed in Table 5, the results obtained were the opposite of those expected, with lower relative risk (greater credit risk) for SELIC values below 10.00%, and lower credit risk for values above 10.00%. Even in light of these results, the decision was made to maintain the SELIC rate variable in the study because it was the only remaining macroeconomic variable. Subsequent studies using a more comprehensive target population should be conducted to better assess this variable.
Based on this categorization, dummy variables were created to be used for composing the regression models.

Spatial Indicators
The next stage in the study involved applying the Moran Global and Local Indices with the aim of verifying the existence of a spatial correlation between the default rate variable and the individual regions in the study population.
The Moran Global Index presented a value of 0.05, indicating an almost null spatial dependency. Figure 7 presents the Moran dispersion map (Moran, 1950), in which the regions colored in red tones present positive spatial dependency, whereas the regions colored in blue tones present negative spatial dependency. The "Low-Low" type regions presented the lowest default rates, followed by the "Low-High", "High-Low", and "High-High" regions. These results can be considered as spatial clusters of the default rate variable. This information could be used by the financial institution to define the target population in loan recovery campaigns, in which obtaining payment from clients residing in the "High-High" regions should be the initial focus of activities, with the aim of improving the company's financial results.
The results found for the Moran Local Index, using a 95% confidence level, are presented in the Moran Map in Figure 8.

Global Model via Logistic Regression
The global model was developed using the development sample, containing 10,944 records.
The variables used in developing the model were all of the dummies created based on the categorizations presented in Table 5. Using the stepwise variable selection method, the variables with p-values under 0.10 (10% level of significance), which were selected to compose the final logistic regression model (global model), are presented in Table 6. The SELIC variable was not significant and was not selected to compose the final global regression model. One possible explanation for this fact is the use of a short loan contract period, leading to few distinct values for this variable.
Moreover, the coefficients for the Formal Income variable were inverted, in which the best income bands (d_income1 and d_income2) obtained worse coefficients in relation to the worst band (d_income4, the coefficient for which is zero). This result can be explained by the variable's behavior, with inversions of relative risk in its value ranges when categorized granularly. Another possible explanation is that the categorization was carried out based on total records and the model was developed using the development database, which covers a smaller number of records.
The nomenclature for the dummy variables follows the nomenclature for the categories shown in Table 5. For example, the dummy d_age1 represents the age category "> 55 years old" and is the best category of this variable with relation to credit risk, and the dummy d_education4 represents clients in the category "Incomplete College Degree or lower level of education", with this being the worst category for the Level of Education variable with relation to credit risk.
Response variable Y involves the occurrence of defaults (Y=1) as the event of interest, with the probability resulting from the logistic regression and GWLR models referring to the probability of this event occurring; that is, of the client defaulting. Thus, it can be noted in Table 6 that all of the global regression coefficients, except for the Formal Income variable, are coherent, since the best categories for each variable with relation to credit risk presented lower coefficients in relation to the higher risk categories for the same variable; that is, the presence of the best categories for each variable reduces the probability of a client defaulting. This analysis is called congruence analysis; it is important for verifying whether there are inversions in the coefficients and whether categorization of the variables was carried out correctly.
The value found for the AICc informational criterion of the global model was 12,098.29, with this value being used for comparison with the models estimated via GWLR, the results from which are presented below.

Local Models via Geographically Weighted Logistic Regression (GWLR)
As described in the methodology, four models using the GWLR were developed, one for each weighting function shown in Table 1. The predictive variables used were those selected by the logistic regression model, shown in Table 6.
The best model using GWLR, following the AICc criterion, was the Adaptive Gaussian model, with a value of 2,022 closest neighbors to estimate the adaptive bandwidths.
Table 7 contains the descriptive statistics of the coefficients estimated by the GWLR model, in which it is noted that the averages for the coefficients were very close to the coefficients for the global model presented in Table 6. It is noted in Table 8 that the Intercept was significant for all the regions in the Distrito Federal and varied from -1.3922 to -1.2005, indicating a regional difference between the values estimated.
With relation to the borrower's age, the variables d_age1 and d_age5 were significant for all of the regions in the Distrito Federal, whereas the variables d_age2 and d_age4 were not significant for some regions, indicating that the borrower's age influences risk differently, depending on the region studied.
The d_education4 variable was also significant for all of the regions in the Distrito Federal, presenting a small variation in coefficients between the regions.
With relation to the borrower's Time of Relationship with the institution, the variables d_time_rel1 and d_time_rel4 were significant for all of the regions in the Distrito Federal, whereas the d_time_rel2 variable was not significant for the Cruzeiro region.
With relation to the borrower's Income, the d_income1 variable was significant for all of the regions in the Distrito Federal, whereas the d_income2 variable was significant only for the regions of Candangolândia, Gama, Núcleo Bandeirante, Recanto das Emas, Riacho Fundo, Samambaia, Santa Maria, and Taguatinga, indicating that the borrower's Income also influences credit risk differently between the regions.
The variables d_pd_contract1 and d_pd_contract2, which represent the Loan Contract Period, were significant for all of the regions in the Distrito Federal.

Comparison Between the Models
The comparison between the Logistic Regression model and the GWLR Adaptive Gaussian model was made using the following metrics: AICc Informational Criterion, Accuracy, Percentage of False Positives, Sum of Value of False Positive Debt, and Expected Monetary Value of Defaults in the portfolio compared with the monetary value of defaults observed.
Except for the AICc informational criterion, calculated when developing the model, the other metrics were calculated based on the validation database, composed of 11,188 records.
Table 9 shows the descriptive statistics for the scores obtained by both of the selected models in the validation sample.
The means for the model scores were very close, with a difference only in the third decimal place; however, the model using GWLR presented a greater range of scores. The use of few predictive variables meant that the scores produced by the models did not present values greater than 0.585 and 0.639.
To calculate the confusion matrix, a cut-off point had to be defined in terms of score, so that borrowers could be classified as good or bad (0 or 1). This cut-off point was defined based on the shortest distance between Sensitivity and Specificity and its value was 0.30.
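One way to read the cut-off rule is to scan a grid of candidate cut-offs and keep the one for which the absolute difference between Sensitivity and Specificity is smallest. The sketch below implements that reading; the grid, the boundary rule (score at or above the cut-off means classified as bad), the function names, and the toy scores are assumptions rather than the authors' procedure.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y: Sequence[int], scores: Sequence[float],
                            cutoff: float) -> Tuple[float, float]:
    """Sensitivity and specificity with default (Y=1) as the event of interest."""
    defaulters = [s for y_i, s in zip(y, scores) if y_i == 1]
    payers = [s for y_i, s in zip(y, scores) if y_i == 0]
    sensitivity = sum(s >= cutoff for s in defaulters) / len(defaulters)
    specificity = sum(s < cutoff for s in payers) / len(payers)
    return sensitivity, specificity


def choose_cutoff(y: Sequence[int], scores: Sequence[float],
                  candidates: Sequence[float]) -> float:
    """Return the candidate cut-off with the smallest |sensitivity - specificity|."""
    def gap(cutoff: float) -> float:
        sensitivity, specificity = sensitivity_specificity(y, scores, cutoff)
        return abs(sensitivity - specificity)
    return min(candidates, key=gap)


# Hypothetical scores for six borrowers (three good, three bad).
y = [0, 0, 0, 1, 1, 1]
scores = [0.1, 0.2, 0.4, 0.3, 0.5, 0.6]
best = choose_cutoff(y, scores, candidates=[0.2, 0.3, 0.4, 0.5])
```

Balancing the two rates in this way treats errors on good and bad clients symmetrically; a lender that weighs approved defaulters more heavily could choose a different criterion.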

In Table 11, all of the values obtained for the metrics of the two models were also very close, with the model using GWLR being the one with the best (lowest) AICc informational criterion and the best (highest) Accuracy, which indicates a better percentage of hits, and a lower percentage of False Positives. The model using LR was slightly better in the metrics Sum of the Value of False Positives (this metric can be considered an estimate of the monetary value that would be granted and enter into default, resulting in financial loss for the institution) and Expected Value of Defaults, since the sum of the value of debt of all of the contracts in default (Y=1) in the validation database was R$ 12,026,290.09, and the value that comes closest is that of the model using LR.

CONCLUSION
In this article, real data were used from a Brazilian financial institution on transactions involving Consumer Direct Credit, granted to clients residing in 19 regions in the Distrito Federal, to develop credit scoring models using two different methodologies: Logistic Regression and Geographically Weighted Logistic Regression.
The Logistic Regression methodology is quite widespread in the financial sector, and was used in this study to develop a global credit scoring model for the whole Distrito Federal.
The Geographically Weighted Logistic Regression methodology is rarely used in this context and uses the borrower's geographical location to weight observations when developing different models for each region studied.
The indicators used for comparison between the models developed via the two methodologies were very close, and based on the results obtained, the methodologies can be considered similar in terms of their power to predict financial losses for the institution.
The study demonstrated that some variables were significant for all of the regions, whereas others were significant only for particular regions, leading to the conclusion that credit risk is influenced by different factors, depending on the region studied.
It was also observed that all of the regression models developed using GWLR (regional models) presented different values for the coefficients (parameters) of the variables, showing that the weights (importance) of the variables varied from region to region.
The results demonstrated the viability of applying the GWLR methodology for developing credit scoring models for the target population in this study. The formulas obtained are applicable only to this population; however, it is believed that this methodology could be extended to other credit transactions and spatial levels (e.g., neighborhoods, municipalities, federal units).
Due to the great advances in computing and technology in recent decades, institutions granting credit have robust credit risk evaluation systems, which makes the implementation and use of a set of models estimated via GWLR viable.
With relation to the limitations of the study, the use of few predictive variables meant that the models presented low ranges of scores.
Categorization of the Formal Income variable was carried out so that the classes were monotonic with relation to relative risk; however, the values of their coefficients were inverted. Studies considering another categorization or target population should be carried out to verify the relevance of this variable for credit risk.
For future study topics, it is suggested that: the GWLR methodology be applied to develop credit scoring models for other target populations (for example, different credit transactions or geographical regions); comparisons be carried out with other methodologies (such as Support Vector Machines or Boosting); other predictive variables be used; the GWLR methodology be applied to develop models in other areas of a financial institution, such as strategy and marketing; or other functions be used, such as the Log Binomial, to develop geographically weighted models.

Figure 1. Flowchart of the stages in developing the models. Source: Prepared by the authors.
The development and validation samples were split according to the date a transaction was contracted: the development sample is composed of the first five rounds (December 2013 to April 2014), totaling 10,944 records, and the validation sample is composed of the final five rounds (May to September 2014), totaling 11,188 records.

Figure 2. Territorial division of the Distrito Federal used in the study. Source: Prepared by the authors.
Figure 3 illustrates the bandwidth in the weighting function and Figures 4 and 5 exemplify the use of fixed or adaptive bandwidths.
The sum of the outstanding balance of all borrowers classified as FP was measured to verify the monetary value that would enter into default due to classification error in the model. The accuracy of the model is calculated using the proportion of TP and TN in relation to the total, as in the following formula: Accuracy = (TP + TN) / (TP + TN + FP + FN)

R. Cont. Fin. - USP, São Paulo, v. 28, n. 73, p. 93-112, jan./abr. 2017

Figure 6. Spatial distribution of default rates in the Distrito Federal. Source: Prepared by the authors.

Figure 8. Moran Map with 95% confidence. Source: Prepared by the authors.

Table 2
Confusion Matrix. Note. TP: True Positive, number of good clients classified as good; TN: True Negative, number of bad clients classified as bad; FP: False Positive, number of bad clients classified as good; FN: False Negative, number of good clients classified as bad. Source: Adapted from Crook, Edelman, and Thomas (2007).

Table 3
Distribution of frequencies of response variable Y. Source: Prepared by the authors.

Table 4
Default rates by DF region.

Table 5
Categorization and Relative Risk of the variables.

Table 6
Final variables in the global model and respective coefficients.

Table 7
Statistics of the coefficients estimated in the Adaptive Gaussian GWLR model.

Table 8 contains the final formula of the models estimated via Adaptive Gaussian GWLR for the 19 regions in the DF.

Table 8
Formulas of Local Regressions estimated via the Adaptive Gaussian GWLR model.

Table 10
Confusion Matrix of the models using LR.
Source: Prepared by the authors.

Table 9
Descriptive Analysis of Model Scores.
Source: Prepared by the authors.
It can be noted in Table 10 that the models presented very close results with regard to client classification.

Table 11 contains all of the metrics used for comparison between the models, in which a small difference is noted between the values of the indicators in the two models.

Table 11
Comparison between the LR and GWLR models