Regionalization of precipitation with determination of homogeneous regions via fuzzy c-means

Knowledge about precipitation is indispensable for hydrological and climatic studies because precipitation data support projects related to water supply, sanitation, drainage, flood and erosion control, reservoirs, agricultural production, hydroelectric facilities and waterway transportation, among others. In this context, methodologies are used to estimate precipitation in unmonitored locations. The objectives of this work are to i) identify homogeneous regions of precipitation in the Tocantins-Araguaia Hydrographic Region (TAHR) via the fuzzy c-means method, ii) regionalize and estimate the probability of occurrence of monthly and annual average precipitation using probability distribution models, and iii) regionalize and estimate the precipitation height using multiple regression models. Three homogeneous regions of precipitation were identified, and the performance indices of the regional probability distribution models were satisfactory for estimating average monthly and annual precipitation. The regional multiple regression models estimated the annual mean precipitation satisfactorily. For the average monthly precipitation, the estimates of the multiple regression models were satisfactory only when the months were grouped into dry and rainy seasons. Therefore, our results show that the methodology developed can be used to estimate precipitation in unmonitored locations in the TAHR.


INTRODUCTION
Precipitation is one of the most important hydrological variables. Its scarcity or excess directly affects society, influencing water supply, drainage, flood and erosion control systems, agricultural production, generation of energy, etc. However, precipitation monitoring is generally confined to scattered points, leaving gaps in more isolated and difficult to access areas, which highlights the importance of methods that allow hydrological information to be obtained. Thus, the development of techniques for estimating precipitation has become relevant. Regionalization is a possible technique that can provide hydrological data at low cost. Several works, such as Arellano-Lara and Escalante-Sandoval (2014), Asong, Khaliq and Wheater (2015), Shahana Shirin and Thomas (2016) and Fazel et al. (2018), are examples of the application of precipitation estimates in several regions. Regionalization is a well-known methodology, and its importance lies in obtaining hydrological information in places without monitoring. In addition, with this technique, zoning of the land based on physical and hydrological characteristics can generate a greater understanding of the distribution and intensity of rainfall and streamflow in a specific region.
According to Samuel, Coulibaly and Metcalfe (2011), regionalization consists of the use of a set of methods that attempt to transfer information from one place to another in river basins, for the purpose of filling in missing information in a given region considered homogeneous. To regionalize precipitation, mathematical and statistical procedures are applied to the historical data series and to the physical and climatic characteristics of the river basins using hydrological models, which, after being calibrated and validated, are able to estimate the precipitation in the homogeneous regions.
The best known models for estimating precipitation are those based on spatial interpolation, statistical methods and satellite estimation. Spatial interpolation models include the Thiessen polygon, kriging and isohyetal methods. Among the statistical models, we highlight the probability distribution functions (PDF) and multiple regression analysis (MRA). Satellite estimates are obtained from observations of the atmosphere, captured by microwaves and transformed into precipitation data by specific algorithms that require advanced technology. Spatial interpolation methods mainly consider precipitation. Mathematical and statistical models, such as those derived from multiple regression, correlate several of the variables that exert some influence on the element studied to improve the results.
Numerous studies related to the estimation of precipitation and its probability of occurrence, through MRA and PDF, have been published. Chifurira and Chikobvu (2014) developed a simple predictive model of precipitation using multiple regression with climatic determinants (southern oscillation and sea level pressures) for Zimbabwe, Africa. This model had a reasonable fit at a significance level of 5% and is easily applied. Chatzithomas, Alexandris and Karavitis (2015) used multiple regression models to estimate the annual and monthly means of precipitation in the Viotikos Kefissos basin in Greece. In this study, the authors used 17 rainfall gauge stations and three independent variables (elevation, location and direction of storms), verifying that the regression models had excellent results when compared with the kriging method. Das and Umamahesh (2016) used a multiple regression model constructed with principal components and fuzzy clusters to estimate the behavior of precipitation between 2008 and 2100, and found good results for the Godavari basin in India.
Li, Brissette and Chen (2014) evaluated the performance of six precipitation probability distributions (exponential, gamma, Weibull, normal, mixed exponential and hybrid exponential) for the Loess Plateau in China, identifying the normal function as the best with which to simulate the distributions of monthly and annual frequency. Yuan et al. (2018) tested five different probability distribution functions to predict the distribution of the occurrence of the maximum hourly annual precipitation. The quality of the fit was assessed using the chi-square test, which indicated that the log-Pearson function had the best overall fit for the maximum hourly annual precipitation in most regions of Japan.
Thus, regionalization and precipitation estimates are the main objectives of this study, which is motivated by the regions of the Amazon that still lack rainfall gauge stations with long series of records. One example of these regions is the TAHR. In this case, the homogeneous regions were determined via the fuzzy c-means clustering technique. Probability distribution functions and regional models, determined through multiple regression, were employed for precipitation height estimates.

Study area
The TAHR is located between 0º 30' and 18º 05' south and 45º 45' and 56º 20' west (Figure 1). It has an elongated configuration, with a south-north direction, following the predominant direction of the main watercourses, the Tocantins and Araguaia Rivers, which join in the northern part of the region; from that point on, the watercourse is called the Tocantins River, which empties into the Marajó Bay. The total area of the TAHR is 918,822 km², covering part of the midwestern, northern and northeastern regions of Brazil. The TAHR occupies 11% of the national territory and includes the states of Goiás (21.4%), Tocantins (30.2%) and Pará (30.3%), among others.
Figure 1. Tocantins-Araguaia Hydrographic Region (TAHR).
Gomes et al.
The TAHR has great importance for the development of the country since it provides electricity for Brazil through the Tucuruí Hydroelectric Power Plant (HPP) and is important for mining, agribusiness, agriculture and livestock farming. According to studies conducted by the National Water Agency (ANA, 2006), the average annual precipitation is approximately 1,837 mm, and the mean flow rate is approximately 13,624 m³/s; the evapotranspiration is 1,371 mm, representing 75% of the precipitation (the average annual evapotranspiration of the country is 1,134 mm, or 63% of the precipitation); and the average surface runoff coefficient is 0.30. According to ANA (2016a), 109.5 thousand hectares of irrigable areas were registered in this region in 2014 (Figure 2). The most relevant land use and occupation activities are categorized into urbanized areas, crops, forests, pastures and agricultural establishments (Figure 3).

Data sources
Precipitation data from 92 stations located in the TAHR and available in the ANA database (ANA, 2016b) were used (Table 1). The stations were chosen based on the length of their historical series; the chosen stations had the longest data series. Despite gaps found in the daily series, the annual and monthly accumulated data were not compromised. The data consistency methodology adopted by ANA (2012) prioritizes the degree of homogeneity of the data, correcting possible errors.
To calibrate the models used in the regionalization, 83 stations were used, and 9 target stations were used in the validation (Figure 1). Altitude information and station coordinates are available in the ANA database. The mean annual precipitation (P), altitude (H), latitude (la) and longitude (lo) of each rainfall gauge station were used to identify the homogeneous regions of precipitation and to develop regional models of precipitation estimation. Of the 92 stations used, 70 have 30 years of data, and the remaining 22 have between 17 and 28 years.

Homogeneous regions
One of the conditions necessary for the application of regionalization is the identification of homogeneous regions, that is, regions that have hydrological similarities. The identification of hydrologically homogeneous regions has two purposes: to impose boundaries between regions and to hydrologically characterize the regions. The identification of homogeneous regions can be performed in several ways. However, the most widely adopted method in hydrological and environmental studies is cluster analysis. The applications developed by Satyanarayana and Srinivas (2011), Dikbas et al. (2011), Santos, Lucio and Silva (2014), (2015) and Pessoa, Blanco and Gomes (2018) are examples of the successful use of cluster analysis to identify hydrologically homogeneous regions, demonstrating its efficacy.

Fuzzy c-means (FCM)
The nonhierarchical fuzzy c-means method was initially proposed by Dunn (1973) and then generalized by Bezdek (1981). Known as fuzzy clustering, it is based on the premise that a data set can be partitioned into p groups according to the degree of membership of each element in one or more groups. The fuzzy c-means partition is generated by minimizing the objective function (Equation 1) through the iterative FCM algorithm, which yields the degree of membership of each element in each cluster. Therefore, in this technique, each element belongs to a group with a certain degree of membership, and an initial estimate of the number of groups is required.

$$J = \sum_{i=1}^{n} \sum_{j=1}^{p} u_{ij}^{m} \, d^{2}(X_i, C_j) \qquad (1)$$
where n is the number of data points; p is the number of groups; u_ij is the degree of membership of the sample X_i in the j-th cluster; m is the fuzzy parameter; d is the Euclidean distance between X_i and C_j; X_i is a data vector, with i = 1, 2,..., n, representing a data attribute; and C_j is the center of a fuzzy cluster.
The fuzzy parameter (m) is also known as the fuzzy weight exponent and controls the level of fuzziness in the classification process. The cluster assignment is defined by the highest degree of membership of each element analyzed. Thus, for a given X_i, its highest degree of membership determines the group to which it belongs.
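The iteration described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the code used in the study: the function name `fuzzy_c_means`, the random initialization, the stopping tolerance and the six two-attribute sample points are all hypothetical, and the standard update rules for centers and memberships are assumed.

```python
import math
import random


def fuzzy_c_means(points, p, m=1.9, max_iter=300, tol=1e-8, seed=0):
    """Return (centers, memberships) for p fuzzy groups; m > 1 is the fuzzy parameter."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships; each row is normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(p)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(max_iter):
        # Center update: C_j = sum_i(u_ij^m * X_i) / sum_i(u_ij^m).
        centers = []
        for j in range(p):
            w = [u[i][j] ** m for i in range(n)]
            sw = sum(w)
            centers.append(tuple(
                sum(w[i] * points[i][k] for i in range(n)) / sw
                for k in range(dim)))
        # Membership update from the Euclidean distances d(X_i, C_j).
        shift = 0.0
        for i in range(n):
            d = [math.dist(points[i], c) for c in centers]
            if min(d) == 0.0:
                # The point coincides with a center: full membership there.
                hits = [1.0 if dj == 0.0 else 0.0 for dj in d]
                new = [h / sum(hits) for h in hits]
            else:
                inv = [dj ** (-2.0 / (m - 1.0)) for dj in d]
                new = [v / sum(inv) for v in inv]
            shift = max(shift, max(abs(a - b) for a, b in zip(new, u[i])))
            u[i] = new
        if shift < tol:
            break
    return centers, u


# Hypothetical, already standardized attributes for six stations.
stations = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3),
            (5.0, 5.1), (5.2, 4.9), (4.9, 5.2)]
centers, u = fuzzy_c_means(stations, p=2)
# Each station is assigned to the group with its highest membership.
labels = [row.index(max(row)) for row in u]
```

In practice the attributes (here P, H, la and lo in the study) would usually be standardized first so that no single variable dominates the Euclidean distance; that preprocessing step is an assumption of this sketch.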

PBM index
The PBM index proposed by Pakhira, Bandyopadhyay and Maulik (2004), whose name is an acronym of the initials of the authors' names, serves to validate the number of clusters or subsets formed from a set of data by evaluating whether the clusters are well defined and separated. The PBM index is a maximization criterion; therefore, the higher its value, the better the quality of the partition. It is defined as the product of three factors (Equation 2), and its maximization ensures that the partition has a small number of compact groups with a large separation between at least two of them.
$$PBM(K) = \left( \frac{1}{K} \cdot \frac{E_1}{E_K} \cdot D_K \right)^{2} \qquad (2)$$
where K is the number of clusters; E_1 is the sum of the distances of each sample to the geometric center of all samples; E_K is the sum of the within-group distances between each sample and the center of its group; and D_K represents the maximum separation between any pair of group centers.
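As an illustration of how the three factors interact, the sketch below computes the PBM index for a hard (crisp) assignment of samples to groups. The function name `pbm_index` and the four one-dimensional sample points are hypothetical, and the crisp form of the index is assumed here; a fuzzy variant would weight each distance by the membership degree.

```python
import math


def pbm_index(points, labels, centers):
    """PBM index of a crisp partition; higher values indicate a better partition."""
    k = len(centers)
    dim = len(points[0])
    # Geometric center of all samples, used for E_1.
    grand = tuple(sum(pt[a] for pt in points) / len(points) for a in range(dim))
    e1 = sum(math.dist(pt, grand) for pt in points)
    # E_K: distances of each sample to the center of its own group.
    ek = sum(math.dist(pt, centers[lab]) for pt, lab in zip(points, labels))
    # D_K: largest distance between any two group centers.
    dk = max(math.dist(ci, cj) for ci in centers for cj in centers)
    return ((1.0 / k) * (e1 / ek) * dk) ** 2


# Hypothetical one-dimensional data with two obvious groups.
pts = [(0.0,), (1.0,), (10.0,), (11.0,)]
good = pbm_index(pts, [0, 0, 1, 1], [(0.5,), (10.5,)])
bad = pbm_index(pts, [0, 1, 0, 1], [(5.0,), (6.0,)])
```

The compact, well-separated partition scores far higher than the interleaved one, which is the behavior the index relies on when comparing candidate numbers of groups.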

Heterogeneity test (H)
The H measure (Equation 3), which is used in hydrology and meteorology, was proposed by Hosking and Wallis (1993) and aims to verify the degree of heterogeneity of a region by comparing the observed variability to the variability expected for a homogeneous region, based on L-moment statistics. H helps verify the homogeneity of the regions formed by the clustering.
$$H = \frac{V - \mu_V}{\sigma_V} \qquad (3)$$
where V is the weighted standard deviation (the observed dispersion measure), μ_V is the arithmetic mean of the statistics V_j obtained by simulation, and σ_V is the standard deviation of the simulated V_j values. According to the test of significance, if H < 1, the region is considered to be "acceptably homogeneous"; if 1 ≤ H < 2, the region is "possibly homogeneous"; and if H ≥ 2, the region must be classified as "definitely heterogeneous."
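The calculation can be sketched as follows, assuming that V is the record-length-weighted standard deviation of the at-site sample L-CVs, as in the usual Hosking and Wallis formulation. The function names are hypothetical, and the simulated V values are taken as an input: the simulation of homogeneous regions itself is not shown.

```python
import math
from statistics import mean, stdev


def l_cv(sample):
    """Sample L-CV (ratio of the second to the first L-moment)."""
    x = sorted(sample)
    n = len(x)
    b0 = sum(x) / n
    b1 = sum(j * x[j] for j in range(n)) / (n * (n - 1))
    return (2.0 * b1 - b0) / b0


def weighted_dispersion(site_samples):
    """V: standard deviation of at-site L-CVs weighted by record length."""
    sizes = [len(s) for s in site_samples]
    t = [l_cv(s) for s in site_samples]
    t_regional = sum(n * ti for n, ti in zip(sizes, t)) / sum(sizes)
    return math.sqrt(
        sum(n * (ti - t_regional) ** 2 for n, ti in zip(sizes, t)) / sum(sizes))


def heterogeneity(v_observed, v_simulated):
    """H = (V - mean of simulated V) / standard deviation of simulated V."""
    return (v_observed - mean(v_simulated)) / stdev(v_simulated)


def classify(h):
    """Apply the thresholds H < 1, 1 <= H < 2 and H >= 2."""
    if h < 1.0:
        return "acceptably homogeneous"
    if h < 2.0:
        return "possibly homogeneous"
    return "definitely heterogeneous"


# Hypothetical simulated dispersion values and an observed V equal to their mean.
h_value = heterogeneity(0.05, [0.04, 0.05, 0.06])
```

A negative H simply means that the observed dispersion is smaller than the average dispersion of the simulated homogeneous regions, so the region still falls in the "acceptably homogeneous" class.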

Probability Distribution Functions -PDF
In hydrology, PDFs provide a projection of what may happen in the future based on the frequency of past occurrences. Thus, to model the frequency of hydrological data, it is necessary to study their occurrence and to establish the probability that the variable will be larger or smaller than a given value. Several probability distribution functions have been used to verify precipitation behavior and variability. Among these, we use the normal, two-parameter gamma, log-normal and Weibull distributions because they fit monthly and annual precipitation totals well, and some of them are highlighted in Li, Brissette and Chen (2014), Caldeira et al. (2015) and Yuan et al. (2018).
The chi-square test (χ²) was used to select the PDF that best fit the probability values of monthly and annual precipitation. The choice of this test is justified because it is one of the most commonly used tests for frequency distributions. In the calibration of the PDFs, simulations were carried out using a computer code called PDF, created to generate the occurrence frequencies of annual and monthly average precipitation heights of each station in the homogeneous regions formed by the fuzzy c-means clustering. The PDFs selected in the calibration were then evaluated by their fit at the 9 target stations, which were not used in the calibration step. Thus, the frequency distribution of the target stations was determined by the best PDF obtained in the calibration.
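To make the idea of fitting a PDF and reading off a probability of occurrence concrete, the sketch below fits a log-normal distribution to a short series using only the Python standard library. It is not the "PDF" code mentioned above: the function names, the fitting by the mean and standard deviation of the log-transformed values, and the annual totals are all assumptions made for illustration.

```python
import math
from statistics import NormalDist, mean, stdev


def fit_lognormal(values):
    """Return (mu, sigma) of the natural logarithms of the data."""
    logs = [math.log(v) for v in values]
    return mean(logs), stdev(logs)


def lognormal_cdf(x, mu, sigma):
    """Probability that the variable does not exceed x under the fitted model."""
    return NormalDist(mu, sigma).cdf(math.log(x))


# Hypothetical annual precipitation totals (mm) for one station.
annual_totals = [1450.0, 1620.0, 1580.0, 1710.0, 1390.0, 1660.0, 1530.0, 1800.0]
mu, sigma = fit_lognormal(annual_totals)

# Probability of an annual total above 1,700 mm under the fitted model.
p_exceed = 1.0 - lognormal_cdf(1700.0, mu, sigma)
```

The same pattern applies to the other candidate distributions: estimate the parameters from the calibration series, compute the estimated frequencies per class, and compare them with the observed frequencies.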

Goodness-of-fit test - Chi-square (χ²)
The chi-square test (Equation 4) was used to select the probability function best adjusted to the observed data. The test is based on the sum of the squared deviations between the observed and estimated frequencies. In this work, the chi-square test was applied with two degrees of freedom and a significance level of 5%, since these are the values most commonly used in the application of this test. Thus, the critical value of χ² is equal to 5.99 for all functions. For the probability distribution to be considered adequate, the calculated value of χ² must be smaller than the tabulated critical value (Corder and Foreman, 2009).
$$\chi^{2} = \sum \frac{(f_o - f_e)^{2}}{f_e} \qquad (4)$$
where f_o is the frequency observed (mm), and f_e is the frequency (mm) estimated by the probability function.
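A short sketch shows the statistic and where the critical value of 5.99 comes from. The function names and the sample frequencies are hypothetical. For two degrees of freedom the chi-square distribution function has the closed form 1 - exp(-x/2), so the critical value can be computed without statistical tables.

```python
import math


def chi_square_statistic(observed, expected):
    """Sum of squared deviations between observed and estimated frequencies."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


def chi_square_critical_df2(alpha=0.05):
    """Critical value for 2 degrees of freedom: solve exp(-x/2) = alpha."""
    return -2.0 * math.log(alpha)


# Hypothetical observed and estimated class frequencies.
f_obs = [10.0, 20.0, 30.0]
f_est = [12.0, 18.0, 30.0]

stat = chi_square_statistic(f_obs, f_est)
critical = chi_square_critical_df2(0.05)
# The fitted distribution is accepted when the statistic is below the critical value.
adequate = stat < critical
```

This closed form only holds for two degrees of freedom; other values require tables or a numerical routine.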

Multiple regression models
According to Hair et al. (2005), this technique can be used to verify the relationship between a single dependent variable and several independent variables. The objective of this method is to use the independent variables, whose values are known, to predict the values of the dependent variable studied. The relationship between the dependent variable and the independent variables can be represented by a linear model (Equation 5).
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_i X_i + \varepsilon \qquad (5)$$
where Y is the dependent or predicted variable; X_1, X_2,…, X_i are the independent or explanatory variables; β_0, β_1, β_2,…, β_i are the regression coefficients; and ε denotes the residuals of the regression. The dependent variable (Y), represented by the precipitation (P), was determined by applying multiple regression to the independent variables (elevation - H, latitude - la, and longitude - lo). The parameters β_0, β_1, β_2 and β_3 were determined by the least squares method. Thus, precipitation was determined by the following regression models: linear (Equation 6), potential (Equation 7), exponential (Equation 8) and logarithm (Equation 9).
These models were chosen because they have been successful in estimating hydrological variables. Most studies involving regression models use only latitude, longitude and altitude, which are the variables most often available. However, this does not prevent satisfactory results in the estimation of precipitation, as shown, for example, by Teixeira-Gandra, Damé and Simonete (2015) and Chatzithomas, Alexandris and Karavitis (2015).
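The least squares fit can be sketched in plain Python by solving the normal equations. Everything named here is hypothetical: the helper functions, the six station records and the coefficients used to generate the synthetic precipitation. The potential model is assumed to have the form P = β0 · H^β1 · la^β2 · lo^β3, which becomes linear after taking logarithms; this form is an assumption, since Equations 6 to 9 are not reproduced in this text. Latitude and longitude are taken as positive magnitudes so that logarithms are defined.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def least_squares(rows, y):
    """Coefficients [b0, b1, ...] of y = b0 + sum(b_k * x_k) by least squares."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


def fit_power(rows, y):
    """Fit the assumed potential model by regressing ln(y) on ln(x_k)."""
    log_rows = [[math.log(v) for v in r] for r in rows]
    b = least_squares(log_rows, [math.log(v) for v in y])
    return [math.exp(b[0])] + b[1:]


# Hypothetical stations: (altitude in m, |latitude|, |longitude|) in degrees.
stations = [(200.0, 5.0, 48.0), (350.0, 7.5, 49.5), (500.0, 10.0, 47.0),
            (650.0, 12.5, 50.0), (300.0, 15.0, 51.0), (800.0, 9.0, 46.0)]
# Synthetic precipitation generated from a known linear relation.
precip = [2500.0 - 0.4 * h - 50.0 * la + 5.0 * lo for h, la, lo in stations]

b_lin = least_squares(stations, precip)
p_lin = [b_lin[0] + b_lin[1] * h + b_lin[2] * la + b_lin[3] * lo
         for h, la, lo in stations]

# Synthetic precipitation generated from a known power relation.
precip_pow = [3000.0 * h ** -0.05 * la ** -0.1 * lo ** 0.02 for h, la, lo in stations]
b_pow = fit_power(stations, precip_pow)
p_pow = [b_pow[0] * h ** b_pow[1] * la ** b_pow[2] * lo ** b_pow[3]
         for h, la, lo in stations]
```

Because the synthetic data follow the assumed relations exactly, the fitted models reproduce them; with real station data the residuals would be evaluated with the performance criteria.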

Performance criteria
In the calibration of the regression models, the mean annual and monthly precipitation values at the rainfall stations of the formed groups were used. To evaluate the proposed regression models, we chose the performance criteria presented in Table 2. According to Nash and Sutcliffe (1970) and Rencher and Christensen (2012), the coefficient of determination (R²) and the Nash coefficient are analogous, and the R² value varies between 0 and 1. An R² value of 0.9 indicates that 90% of the total variability in the response variable is accounted for by the independent variables. The root mean squared error (RMSE) corresponds to the mean magnitude of the estimation errors. According to Chai and Draxler (2014), the closer the value is to zero, the higher the quality of the estimated values. The percentage relative error, E (%), and the mean relative root square error, ε (%), are coefficients used in several areas of science. According to Jose (2017), the first evaluates the performance of the model considering the percentage difference between the observed and estimated values, and the second prioritizes the adjustment of the relative values, weighting higher or lower values. These coefficients are among the most used in applications of prediction models of hydrological variables, as observed in Mekanik et al. (2013), Chifurira and Chikobvu (2014), Supriya, Krishnaveni and Subbulakshmi (2015), Chatzithomas, Alexandris and Karavitis (2015) and Das and Umamahesh (2016).
For validation, 9 target stations were adopted. Based on the location and altitude data, the precipitation was estimated by applying the regression model defined in the calibration. Thus, it was possible to compare the observed and estimated mean annual and monthly precipitation of each target station. The mean percentage relative error, E (%) (Table 2), was used as a reference in the validation of the performance of the regression models since it considers the observed and estimated values directly, allowing a more direct and objective analysis.
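The criteria can be sketched as follows. Table 2 is not reproduced in this text, so the formulas below are common textbook definitions and should be read as assumptions, especially the one used for ε (%); the function name `performance` and the observed and estimated values are hypothetical.

```python
import math


def performance(po, pe, n_predictors=3):
    """Performance criteria between observed (po) and estimated (pe) values."""
    n = len(po)
    mo, me = sum(po) / n, sum(pe) / n
    ss_res = sum((o - e) ** 2 for o, e in zip(po, pe))
    ss_tot = sum((o - mo) ** 2 for o in po)
    cov = sum((o - mo) * (e - me) for o, e in zip(po, pe))
    var_e = sum((e - me) ** 2 for e in pe)
    # R2 taken here as the squared linear correlation between po and pe.
    r2 = cov ** 2 / (ss_tot * var_e)
    return {
        "R2": r2,
        "R2_a": 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1),
        "NASH": 1.0 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        # Mean percentage relative error (assumed definition).
        "E": 100.0 / n * sum(abs(o - e) / o for o, e in zip(po, pe)),
        # Root of the mean squared relative error, in percent (assumed definition).
        "eps": 100.0 * math.sqrt(
            sum(((o - e) / o) ** 2 for o, e in zip(po, pe)) / n),
    }


# Hypothetical observed and estimated mean annual precipitation (mm).
observed = [1000.0, 2000.0, 1500.0, 1800.0, 1200.0, 1600.0]
estimated = [1100.0, 1900.0, 1500.0, 1800.0, 1200.0, 1600.0]
metrics = performance(observed, estimated)
```

Reporting several criteria together is useful because each one emphasizes a different aspect of the fit: the Nash coefficient and R² measure explained variability, whereas E (%) and RMSE measure the size of the errors.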

Homogeneous regions
In the formation of homogeneous regions, 63 clustering runs were performed, varying the fuzzy parameter from 1.2 to 2.0 and the number of groups from 2 to 15. However, it was observed that the larger the number of groups, the lower the value of the PBM index. Only tests with up to 8 groups were considered since the PBM index tended to decrease for more than 8 groups. The best partition was selected by the PBM index, which reached its highest value (Figure 4) with three groups and a fuzzy parameter equal to 1.9.
Regions I and II present mean annual precipitation of approximately 1,600 and 1,700 mm, respectively, while Region III presents approximately 2,400 mm. Loureiro, Fernandes and Ishihara (2015), who used geostatistical interpolation in the region, identified that the precipitation totals decrease from north to south but did not define homogeneous regions. In the present work, in addition to confirming this result, it was possible to define three homogeneous regions by fuzzy c-means clustering. In the heterogeneity test (H), values of 0.047, -0.0049 and -0.7874 were obtained for Regions I, II and III, respectively, indicating acceptably homogeneous regions, since H < 1.

PDF applied to annual average precipitation
The PDFs from normal, log-normal, gamma (two parameters) and Weibull distributions had good adherence in the chi-square test since their values were all below the table value of 5.99, as can be observed in Table 3.
However, the log-normal distribution showed a better graphical adjustment between the observed and estimated frequencies. Thus, the log-normal function is the most appropriate model for estimating the probability of occurrence of annual precipitation in homogeneous regions I, II and III of the TAHR.
To validate the log-normal function in the homogeneous regions, 9 target stations, three per homogeneous region, were tested using the chi-square test. The test values are below 5.99 (Table 4), validating the log-normal function. The graphical analysis of Figure 6 shows the good adjustment of the probability of occurrence of annual mean precipitation at the target stations in the TAHR. According to Naghettini and Pinto (2007), because the log-normal variable is positive and has a nonfixed asymmetry coefficient greater than zero, this distribution has a parametric form that is adequate to estimate monthly, quarterly or annual precipitation heights.

PDF applied to monthly average precipitation
The average monthly precipitation probabilities of each region were evaluated for adherence to the probability models (normal, log-normal, gamma and Weibull) by the chi-square test. The results of the chi-square test (Table 5) show that the gamma function had the fewest values without adherence.
In a general evaluation of the adjusted graphs, the most adequate adjustments occurred in November, December and January, whereas the least adequate adjustments occurred in April, June and July. This result was observed based on the number of times the chi-square values were above the chosen threshold (5.99), with a significance level of 5% and 2 degrees of freedom. To validate the gamma function, the probabilities of occurrence of monthly average precipitation at the target stations were generated by this function. The results of this validation indicate a good adjustment of the gamma function, since the values of the chi-square test were all adequate, as can be observed in Table 6 and in the graphs of the observed and estimated probabilities of occurrence of average monthly precipitation (Figures 7, 8 and 9).
In comparison with other probability functions, the gamma function has presented good adjustments in the predictions of the probability of occurrence of monthly precipitation. Sampaio et al. (2006) and Amburn, Lang and Buonaiuto (2015), for example, used different PDFs to estimate precipitation probabilities, and the gamma function had the best result for monthly precipitation data.
The results of Table 5 show that there are many values with adherence in the normal, log-normal and Weibull models. However, according to Kist and Virgem Filho (2015), the adherence of a distribution to the data does not necessarily mean that the adjustment is good, only that there was not enough evidence in the series for rejection. Thus, because four different distributions were tested and some presented values considered adherent, we cannot totally rule out the use of these functions in the studied region; the other PDFs could be adopted in this region if they pass other measures of calibration and validation. This analysis is also valid for the annual data series, in which the probability functions were also determined to be adequate by the chi-square test (Table 3).
According to Murta et al. (2005), the gamma function, from the statistical point of view, does not behave as if evenly distributed around the mean value, but rather shows irregular and large deviations around the mean value. This function could guarantee a better result in the study of average monthly precipitation if the average value of the series is not influenced by the results. Thus, the goodness-of-fit test (Table 6) and the graph adjustment (Figures 7, 8 and 9) confirm that the gamma model is valid for application in the TAHR.

Multiple regression models for annual mean precipitation estimates
The multiple regression models were tested considering three independent variables (altitude, latitude and longitude) from the set of stations representing each homogeneous region. Thus, using the results of the performance criteria, we determined the best model for estimating the dependent variable.
In homogeneous regions I and II, in relation to R², R²_a and NASH, the models were not significant, with R² values varying from 0.39 to 0.46 (Table 7). In homogeneous region III, the models were more significant, with R² values of 0.67 to 0.74. In terms of percentage, this coefficient represents how much of the variability in precipitation is explained by the independent variables (altitude, latitude and longitude). Thus, the linear model explains 46% and 41% (0.46 and 0.41 - Table 7) of the variability in precipitation in regions I and II, respectively, presenting the highest R² value among the models for these regions. In homogeneous region III, this percentage was much better, at 74%. Considering E (%), ε (%) and RMSE, the models would perform well in the estimation of precipitation, since the errors obtained are less than 7% and 0.7%, respectively, and the RMSE presented minimum values. Therefore, the linear model is the most significant for the estimation of the annual precipitation in regions I, II and III, as it also presents the highest R² and Nash values (Table 7).
To validate the linear model, the percentage relative error, E (%), between the observed precipitation (Po) of the target stations and the precipitation estimated (Pe) by the linear model (Figure 10) was calculated. The percentage errors obtained by the linear model were lower than 9% for almost all of the target stations. Only for the Fazenda Marajá station, which belongs to homogeneous region II, was the error greater than 10%. In contrast, for the Pirenopolis station, also located in homogeneous region II, the error was only 0.16%, the lowest value (Figure 10). In general, the errors between the observed and estimated heights were acceptable.

Regression models for the rainy and dry season
The multiple regression models did not perform well in estimates of monthly mean precipitation. The highest relative percentage errors occurred in the dry months, and the lowest errors occurred in the rainy months. Thus, the multiple regression was conducted separately for the dry and rainy seasons, in an attempt to obtain more representative and adequate models for the estimation of average monthly precipitation. Following this method, the rainy months were considered to be November, December, January, February, March and April, and the dry months were considered to be May, June, July, August, September and October. This analysis was performed using the monthly average values of the rainy and dry months from each station in the homogeneous regions formed by the fuzzy c-means clustering. Thus, multiple regression was applied with the linear, potential, exponential and logarithm models, adopting the mean precipitation of the rainy and dry seasons as the dependent variable. For the rainy months, the R² and Nash values obtained from the regression models were all below 0.39 in homogeneous regions I and II (Table 8), indicating a weak relationship between precipitation and the independent variables.
The logarithm model, for example, can explain only 21% and 17% of the precipitation variability in homogeneous regions I and II, respectively (0.21 and 0.17 - Table 8). The percentage errors (E, ε) were below 6.4% and 0.46%, respectively, and the RMSE was minimal, indicating that the models may be useful even though R² is low. In homogeneous region III, for the rainy season, all models presented values of 0.99 for the Nash coefficient, which indicates that they are excellent estimators. The R² ranged from approximately 0.64 to 0.73.
In the validation of the rainy season data, the regression parameters obtained from the calibration of the linear model were applied to the information from the target stations (altitude, latitude and longitude). The percentage relative error was determined between Po and the Pe calculated by the linear model. The Tucuruí station presented the maximum error of 13% (Figure 12) in the estimation of monthly precipitation for the rainy season. However, the mean relative error was 5.6%, indicating that the model performed adequately for the rainy season in the 3 homogeneous regions.
In the validation of the dry season data, the observed precipitation (Po) values were compared with the precipitation values obtained by the potential model. The mean errors found were less than 10%. Although the Faz. Babilônia and Cametá stations presented errors of 12.78% and 14.23% (Figure 13), respectively, the potential model performed well in estimating the average monthly precipitation, with a mean error of 6.86% for the three homogeneous regions.
Judging by the RMSE values alone (Tables 7, 8 and 9), all the models evaluated could be considered good estimators, since all values were close to zero. However, when the other criteria are compared, some models are not satisfactory. To avoid this type of misjudgment, the Nash coefficient, R², the percentage relative error, E (%), and the mean relative error, ε (%), were also evaluated when choosing the most appropriate model.
According to Nash and Sutcliffe (1970), the Nash coefficient allows the efficiency of a model to be defined, and its value is analogous to the coefficient of determination (R²); the closer the value is to 1, the better the model representation. In the results obtained, we can see that the value of R² approaches the Nash value. However, in the evaluation of multiple regression models, R² is the most important measure, as observed by Fumo and Rafe Biswas (2015), Alexander, Tropsha and Winkler (2015) and Bardak et al. (2016). Thus, the R² value is the most relevant value to consider when choosing a regression model; however, its evaluation is more consistent when it is integrated with the other performance criteria.
The proposed methodology can be considered acceptable for estimating precipitation since it analyzed the results of six performance criteria, compared observed and estimated precipitation using dispersion graphs and tested the proposed models with stations that were not considered in the calibration of the models. Through this methodology, the probability of occurrence of precipitation, as well as monthly and annual precipitation, can be estimated satisfactorily in locations without monitoring, knowing only the location and altitude of a given point within the basin studied. Table 10 shows the multiple regression models for estimating annual and monthly precipitation heights, in dry and rainy seasons, in the three homogeneous regions formed in the TAHR.

CONCLUSION
The fuzzy c-means clustering technique, together with the PBM index and the H test, was able to form distinct groups with well-defined precipitation averages and a spatial distribution of the homogeneous regions consistent with the recorded rainfall. In homogeneous regions I and II, formed to the southwest and center-west of the TAHR, respectively, smaller pluviometric volumes were determined. For homogeneous region III, located in the north, a higher pluviometric volume was determined, as was to be expected because the Amazon forest lies to the north of the TAHR and the Brazilian cerrado lies to the south.
Annual precipitation estimates performed well, both with the use of the probability distribution functions and through the use of multiple regression models. However, for the estimation of monthly averages, the regression models presented better estimates when considering dry and rainy seasons. The monthly values were estimated satisfactorily using the probability functions without the need to consider dry and rainy seasons.
The performance criteria used in the validation of the multiple regression models provide a better analysis of the results when used in an integrated way. The multiple regression models obtained use easy-to-obtain input variables, making them a useful tool for locations lacking precipitation data. Thus, the methodology developed can assist in the planning and management of other river basins in terms of precipitation estimation.

Figure 4. PBM index as a function of the number of groups.

Figure 6. Probability of occurrence of observed and estimated annual mean precipitation at the target stations.

Figure 7. Probability of occurrence of observed and estimated monthly mean precipitation at the target stations - Homogeneous Region I.

Figure 8. Probability of occurrence of observed and estimated monthly mean precipitation at the target stations - Homogeneous Region II.

Figure 9. Probability of occurrence of observed and estimated monthly mean precipitation at the target stations - Homogeneous Region III.

Figure 11. The 1:1 line for average annual precipitation and average monthly precipitation - rainy and dry season.

Figure 12. Percent errors by homogeneous region and target station for monthly mean precipitation - rainy season.

Figure 13. Percent errors by homogeneous region and target station for monthly mean precipitation - dry season.

Table 1. TAHR rainfall gauge stations considered in the study.

Table 2. Performance criteria of multiple regression models.

Table 3. Chi-square test for the mean annual precipitation probability functions.

H. R. Result of the chi-square Normal Log-Normal Gamma Weibull
H. R. -Homogeneous Region.

Table 4. Chi-square values in the validation of the log-normal function for the annual series.

Table 5. Chi-square test with PDFs - probability distribution functions.

Columns: HR I, HR II and HR III for each of the four probability distribution functions.
The gamma function had only 2 unsuitable values, while the normal, log-normal and Weibull functions had 8, 7 and 5 values without adherence, respectively. This result indicates that, with the exception of the months of April and July (HR II and HR III), the gamma function offered values lower than the table value (5.99), indicating that it adjusted well to the observed frequencies of occurrence of monthly precipitation. Thus, the gamma PDF had the best adherence in the chi-square test for monthly precipitation.

Table 6. Chi-square test with frequencies observed and estimated by the gamma function at the target stations.

Table 8. Performance criteria of the models for the rainy season.

Table 9. Performance criteria for the models of the dry season.

R² - coefficient of determination; R²_a - adjusted coefficient of determination; E (%) - average percentage relative error; ε (%) - mean relative root square error; NASH - Nash-Sutcliffe coefficient; RMSE - root mean squared error.
... other models, suggesting that the potential model is best for the prediction of monthly precipitation in this region.