PERFORMANCE OF THE PROBABILITY DISTRIBUTION MODELS APPLIED TO HEAVY RAINFALL DAILY EVENTS

Probabilistic studies of hydrological variables, such as heavy daily rainfall events, are an important tool to support the planning and management of water resources, especially for the design of hydraulic structures and the assessment of erosive rainfall potential. In this context, we aimed to analyze the performance of three probability distribution models (GEV, Gumbel and two-parameter Gamma), whose parameters were estimated by the Moments Method (MM), Maximum Likelihood (ML) and L-Moments (LM). These models were fitted to long-term series of maximum daily rainfall from 8 rain gauges located in Minas Gerais state. To evaluate and discuss the performance of the probability distribution models, the non-parametric Filliben test was applied first; where it did not discriminate between fits, the Anderson-Darling and Chi-squared tests were also applied. The Gumbel probability distribution model showed an adequate fit in 87.5% of the cases. Among the assessed probability distribution models, GEV fitted by the LM method was adequate for all studied rain gauges and can be recommended. Considering the number of adequate cases, the MM and LM methods performed better than the ML method, each with 83.3% of adequate cases against 79.2% for ML.


INTRODUCTION
Probabilistic studies of hydrological variables such as heavy rainfall are an important element in supporting water resources planning and management. One feature of great interest is the frequency of maximum daily rainfall, whose behavior is strongly associated with the asymptotic (extreme value) distributions (Mello; Silva, 2005).
Several studies in the literature have investigated probability distribution models for extreme values of climate variables, especially the Gumbel and Generalized Extreme Value (GEV) models, which have often produced the best fits. In a study of intense rainfall in the São Francisco Basin, Silva and Clarke (2004) concluded that the Gumbel distribution could not be recommended for data sets throughout the basin. Sansigolo (2008), comparing the Normal, Gumbel, Fréchet, Weibull, Log-Normal and Pearson probability distribution models fitted to maximum daily rainfall and maximum absolute temperature data sets for the city of Piracicaba, SP state, concluded that the Gumbel distribution performed best. Araújo et al. (2010) evaluated the Gumbel, Gamma, Log-Normal, Normal, Weibull and Beta probability distribution models applied to long-term daily maximum temperature for Iguatu, Ceará, and concluded that all the models were adequate; however, according to the Lilliefors test, the best and worst models were, respectively, the Normal and the Gumbel. Mello and Silva (2005) fitted the Gumbel distribution to long-term maximum rainfall series from seven rain gauges in the Upper Grande River region. They studied the effects of the Maximum Likelihood and Moments estimation methods on the parameters of the Gumbel distribution, also analyzing the effects on the estimated parameters of the intense rainfall equation. The authors found that Maximum Likelihood produced the lowest Chi-square values and concluded that it performed better.
The methods for fitting a given probability distribution model, including the Moments, Maximum Likelihood and L-moments methods, can lead to different results. According to Naghettini and Pinto (2007), the Moments Method (MM) is the simplest and, normally, the least efficient. The Maximum Likelihood method (ML) estimates the parameters that maximize the likelihood of the observed sample under the given distribution. However, in some cases, small samples can produce ML estimators that are comparable or even inferior to those of other methods. The L-Moments method (LM) produces parameters comparable in quality to those of the ML method; for small samples, however, LM can be more accurate and thus perform better (Naghettini; Pinto, 2007).
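To make the differences between the three estimation methods concrete, the sketch below applies them to the Gumbel distribution in plain Python. It is an illustrative sketch, not the procedure used in the study: the function names are hypothetical, the ML likelihood equation is solved by simple bisection (one of several possible numerical choices), the LM estimates use the usual unbiased probability-weighted moments b0 and b1, and the sample is assumed to be non-constant.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def gumbel_mm(x):
    """Moments Method: match the sample mean and standard deviation."""
    alpha = math.sqrt(6.0) * statistics.stdev(x) / math.pi
    mu = statistics.fmean(x) - EULER_GAMMA * alpha
    return alpha, mu


def gumbel_lm(x):
    """L-moments: l1 and l2 from unbiased probability-weighted moments b0, b1."""
    xs = sorted(x)
    n = len(xs)
    b0 = statistics.fmean(xs)
    b1 = sum(i * v for i, v in enumerate(xs)) / (n * (n - 1))
    l1, l2 = b0, 2.0 * b1 - b0
    alpha = l2 / math.log(2.0)
    mu = l1 - EULER_GAMMA * alpha
    return alpha, mu


def gumbel_ml(x, rel_tol=1e-12):
    """Maximum Likelihood: solve the likelihood equation for alpha by bisection."""
    n = len(x)
    xmin, mean = min(x), statistics.fmean(x)

    def weights(alpha):
        # exp(-(x - xmin)/alpha): shifting by xmin avoids underflow for small alpha
        return [math.exp(-(v - xmin) / alpha) for v in x]

    def score(alpha):
        # alpha - mean + sum(x*w)/sum(w); increasing in alpha, its root is the ML alpha
        w = weights(alpha)
        return alpha - mean + sum(wi * v for wi, v in zip(w, x)) / sum(w)

    lo, hi = 1e-9 * (max(x) - xmin), (mean - xmin) + 1.0  # score(lo) < 0 < score(hi)
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if score(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    alpha = 0.5 * (lo + hi)
    mu = xmin - alpha * math.log(sum(weights(alpha)) / n)
    return alpha, mu
```

Applied to the same series, the three functions generally return close but not identical values of α and μ; the differences tend to grow for short or highly dispersed series.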
Non-parametric goodness-of-fit tests, such as Kolmogorov-Smirnov, Lilliefors, Shapiro-Wilk and Cramér-von Mises, among others (Campos, 1983; Assis; Arruda; Pereira, 1996), are used to compare the observed frequencies of a given variable against the respective probabilities (estimated frequencies), in order to verify whether the sample can be considered to originate from a population with that theoretical distribution. The Anderson-Darling test is an alternative to the Chi-Square and Kolmogorov-Smirnov tests: it gives more weight to the tails of the frequency distribution and is therefore more recommended for asymptotic distributions (Naghettini; Pinto, 2007).
In this context, this study aimed to analyze the performance of three probability distribution models (GEV, Gumbel and two-parameter Gamma), whose parameters were estimated by the Moments, Maximum Likelihood and L-moments methods, applied to long-term series of maximum daily rainfall from eight rain gauges located in Minas Gerais state.

MATERIAL AND METHODS
Long-term series of maximum daily rainfall, obtained from the National Water Agency (ANA/HIDROWEB, 2012), were used for eight rain gauges located in zones of medium to high heavy rainfall (Mello et al., 2007) in Minas Gerais state: Barbacena, Carmo da Mata, Desterro Melo, Divinópolis, Estiva, Ibituruna, Itapecirica and Ouro Preto. All the series had at least 34 years of complete and recent data, with the period from 1977 to 2007 common to all of them (Table 1 and Figure 1). According to the National Center for Disaster Monitoring and Alert (CEMADEN, 2014), the municipalities of Barbacena, Desterro do Melo and Ouro Preto are located in high landslide risk areas, which makes this study strategically relevant for actions aimed at minimizing these impacts. Three probability density functions (PDF) commonly applied to extreme values were used: Generalized Extreme Value, GEV (Equation 1), two-parameter Gamma (Equation 2) and Gumbel (Equation 3), whose parameters were estimated by the Moments Method (MM), Maximum Likelihood (ML) and L-Moments (LM) (Naghettini; Pinto, 2007).
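Figure 1 - Location of the rain gauges studied in Minas Gerais state.

$$ f(x) = \frac{1}{\alpha}\left[1 - \xi\,\frac{x-\mu}{\alpha}\right]^{\frac{1}{\xi}-1} \exp\left\{-\left[1 - \xi\,\frac{x-\mu}{\alpha}\right]^{\frac{1}{\xi}}\right\} \tag{1} $$
where x is the hydrological variable (annual maximum daily rainfall) and α, μ and ξ are the scale, location and shape parameters of this distribution.

$$ f(x) = \frac{x^{\upsilon-1}\, e^{-x/\beta}}{\beta^{\upsilon}\,\Gamma(\upsilon)}, \quad x > 0 \tag{2} $$
where β and υ are the scale and shape parameters of this distribution.

$$ f(x) = \frac{1}{\alpha}\exp\left[-\frac{x-\mu}{\alpha} - \exp\left(-\frac{x-\mu}{\alpha}\right)\right] \tag{3} $$
where α and μ are the scale and location parameters of the distribution.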
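Equations 1 to 3 can be written as plain Python functions, which is convenient for checking a fitted model numerically. This is a minimal sketch under stated assumptions: the GEV uses the shape convention 1 − ξ(x − μ)/α (so ξ > 0 gives an upper bound), β is taken as the Gamma scale and υ as its shape, and the function names are hypothetical. Other texts and libraries use the opposite GEV sign or a Gamma rate parameter, so the convention must be matched before comparing parameter values.

```python
import math


def gumbel_pdf(x, alpha, mu):
    """Gumbel (maxima) density with scale alpha and location mu."""
    z = (x - mu) / alpha
    return math.exp(-z - math.exp(-z)) / alpha


def gev_pdf(x, alpha, mu, xi):
    """GEV density with scale alpha, location mu, shape xi (assumed sign convention)."""
    if abs(xi) < 1e-12:
        return gumbel_pdf(x, alpha, mu)  # xi -> 0 limit is the Gumbel density
    t = 1.0 - xi * (x - mu) / alpha
    if t <= 0.0:
        return 0.0  # outside the support of the distribution
    return (1.0 / alpha) * t ** (1.0 / xi - 1.0) * math.exp(-t ** (1.0 / xi))


def gamma2p_pdf(x, beta, upsilon):
    """Two-parameter Gamma density with scale beta and shape upsilon (assumed roles)."""
    if x <= 0.0:
        return 0.0
    # Work in logs so large shape values do not overflow x**upsilon or Gamma(upsilon)
    log_f = ((upsilon - 1.0) * math.log(x) - x / beta
             - upsilon * math.log(beta) - math.lgamma(upsilon))
    return math.exp(log_f)
```

As ξ approaches zero the GEV density tends to the Gumbel density, which is why the Gumbel model can be seen as a special case of the GEV.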
With the parameters estimated by the MM, ML and LM methods for the Gumbel, GEV and two-parameter Gamma probability distribution models, the non-parametric Filliben test (Equations 4 to 6) was applied first to evaluate how well these models represent the respective data sets. For the distributions and methods whose fits were statistically indistinguishable under this test, two additional non-parametric tests were used: Anderson-Darling (Equation 7) and Chi-squared (Equation 8), at the 0.05 significance level (Naghettini; Pinto, 2007).
The Filliben test is based on the linear correlation coefficient r between the observations sorted in ascending order {x(1), x(2), ..., x(i), ..., x(N)} and the theoretical quantiles {w1, w2, ..., wi, ..., wN}, which are calculated by:
$$ w_i = F_X^{-1}(q_i) \tag{4} $$
where F_X^{-1} is the inverse of the cumulative distribution function (CDF) and q_i is the empirical probability corresponding to rank i:
$$ q_i = \frac{i - a}{N + 1 - 2a} \tag{5} $$
where N is the sample size and a = 0.40 (GEV), a = 0.44 (Gumbel) and a = 0.40 (Gamma). Formally, the Filliben test statistic is given by:
$$ r_{calc} = \frac{\sum_{i=1}^{N} \left(x_{(i)} - \bar{x}\right)\left(w_i - \bar{w}\right)}{\sqrt{\sum_{i=1}^{N} \left(x_{(i)} - \bar{x}\right)^2 \sum_{i=1}^{N} \left(w_i - \bar{w}\right)^2}} \tag{6} $$
The r calc values are compared against the critical values r crit, obtained at the 0.05 significance level. If r calc > r crit, the sample can be represented by the respective distribution.
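The Filliben statistic of Equations 4 to 6 can be sketched for the Gumbel case as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the sample values are invented for demonstration, and r crit is not computed here, since it must still be taken from published tables for the chosen distribution, sample size and significance level.

```python
import math


def plotting_positions(n, a):
    """Empirical non-exceedance probabilities q_i = (i - a) / (N + 1 - 2a), Equation 5."""
    return [(i - a) / (n + 1.0 - 2.0 * a) for i in range(1, n + 1)]


def gumbel_quantile(q, alpha, mu):
    """Inverse of the Gumbel CDF F(x) = exp(-exp(-(x - mu)/alpha)), used in Equation 4."""
    return mu - alpha * math.log(-math.log(q))


def pearson_r(x, w):
    """Linear correlation coefficient between two equal-length sequences, Equation 6."""
    mx, mw = sum(x) / len(x), sum(w) / len(w)
    sxw = sum((a - mx) * (b - mw) for a, b in zip(x, w))
    sxx = sum((a - mx) ** 2 for a in x)
    sww = sum((b - mw) ** 2 for b in w)
    return sxw / math.sqrt(sxx * sww)


def filliben_r_gumbel(sample, alpha, mu, a=0.44):
    """r calc for a fitted Gumbel distribution; a = 0.44 is the value given for Gumbel."""
    x_sorted = sorted(sample)
    q = plotting_positions(len(x_sorted), a)
    w = [gumbel_quantile(qi, alpha, mu) for qi in q]
    return pearson_r(x_sorted, w)


# Hypothetical annual maximum daily rainfall values (mm), not data from the study.
sample = [62.0, 75.5, 81.2, 90.4, 95.0, 101.3, 110.8, 118.6, 132.1, 150.7]
r_a = filliben_r_gumbel(sample, alpha=20.0, mu=80.0)
r_b = filliben_r_gumbel(sample, alpha=35.0, mu=60.0)
print(round(r_a, 4), round(r_b, 4))  # identical: r does not depend on alpha or mu
```

Because the Gumbel quantile is a linear function of the reduced variate −ln(−ln q) with positive slope α, any pair (α, μ) yields the same r calc for a given sample; for the GEV and Gamma models, whose quantiles also depend on a shape parameter, r calc does change with the estimation method.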
The non-parametric Anderson-Darling test gives strong weight to the tails of the distribution, where the largest (or smallest) observations of the sample can greatly alter the quality of the fit. The test is based on the difference between the empirical cumulative distribution function F_N(x) and the theoretical CDF F_X(x) of a continuous random variable. The Anderson-Darling statistic is:
$$ AD^2 = -N - \frac{1}{N}\sum_{i=1}^{N} (2i - 1)\left[\ln P_1\left(X < x_i\right) + \ln P_2\left(X > x_{N+1-i}\right)\right] \tag{7} $$
where N is the sample size, i is the position of each observation in the series sorted in ascending order, P1(X < x) is the probability of non-exceedance and P2(X > x) is the probability of exceedance. If AD² < p(α), the null hypothesis (H0) that the PDF is adequate is accepted. In this study, p(α) = 0.757 was adopted for all PDFs.
In the Chi-squared test, the null hypothesis (H0) is tested by comparing the observed and theoretical frequencies in each class into which the sample data are grouped. The test statistic is given by:
$$ \chi^2_{calc} = \sum_{i=1}^{n} \frac{\left(f_{obs,i} - f_{theor,i}\right)^2}{f_{theor,i}} \tag{8} $$
where n is the number of classes, and f_obs,i and f_theor,i are, respectively, the observed and theoretical frequencies in class i.
The critical value of this test (χ² tab) is obtained for a number of degrees of freedom equal to the number of frequency classes minus the number of parameters of the PDF minus one, at the 0.05 significance level. If χ² calc < χ² tab, the PDF is suitable for the series studied. Furthermore, χ² calc aggregates the squared deviations between observed and theoretical frequencies over all classes of the series, so it can also be used to compare the PDFs, indicating not only whether a PDF is adequate but also which model is the most appropriate.
Ciênc. Agrotec., Lavras, v.38, n.4, p.335-342, jul./ago., 2014

RESULTS AND DISCUSSION
Table 2 presents the basic statistics of the maximum daily rainfall data sets for each rain gauge. The Barbacena rain gauge shows the lowest values of the dispersion indicators, namely the standard deviation (SD), range of variation (RV) and coefficient of variation (CV), followed by the Estiva and Ouro Preto rain gauges. In contrast, Itapecirica, Desterro Melo and Ibituruna present the highest values of these indicators. These basic statistics are relevant to the performance of the probability distribution models, mainly through their influence on the performance of the adjustment methods.
Table 3 presents the results of the Filliben test for all combinations of PDF, adjustment method and rain gauge.
In this initial analysis of the PDF fits, the Gumbel distribution model was adequate in 87.5% of the cases, GEV in 83.3% and Gamma in 75%. These results differ from those of Araújo et al. (2010), in which the Gumbel distribution performed worse than the other models tested for Iguatu, CE. It is also worth noting that, for each rain gauge, the Gumbel distribution produced the same r calc value for the three adjustment methods (MM, ML and LM). This is expected, because the Gumbel quantiles w_i = μ − α ln(−ln q_i) are a linear function of −ln(−ln q_i) with positive slope α, and the linear correlation coefficient is invariant to such transformations; hence r calc does not depend on the estimated parameters.
The Filliben test indicated that the GEV distribution was inadequate in two combinations with MM and in two others with ML. Two of these cases occurred at the Ouro Preto rain gauge; the other two occurred at the Desterro Melo (ML) and Estiva (MM) rain gauges. Another aspect worth noting is that the GEV-LM combination was statistically adequate for all eight rain gauges, showing no adequacy problems. The two-parameter Gamma distribution model had six cases of non-adequacy, three of them at a single station (Desterro Melo, for all three adjustment methods).
In addition, there were two cases of non-adequacy at the Itapecirica rain gauge (ML and LM).
From the analysis of the performance of the tested PDFs, the MM and LM methods each provided adequate fits in 83.3% of the cases, while the ML method did so in 79.2% of them. The only combination that was adequate for all stations was the LM method associated with the GEV distribution. Similar results were obtained by Beijo; Vivanco and Muniz (2009), Blain (2011) and Quadros; Queiroz and Vilas Boas (2011), supporting the good performance of this PDF for maximum daily rainfall. However, these results differ from those of Blain and Camargo (2012), who found better performance for GEV-ML.
Comparing the r calc values across rain gauges, except for Barbacena, Carmo da Mata, Divinópolis and Itapecirica, the values obtained with GEV fitted by LM are higher than those obtained with the Gumbel and Gamma models. With similar purposes, Silva and Clarke (2004), Mello and Silva (2005) and Hartmann; Moala and Mendonça (2011) found higher precision for the Gumbel distribution fitted by ML.
Regarding the rain gauge data sets, only three showed adequacy for all three distributions and all three adjustment methods (Barbacena, Carmo da Mata and Ibituruna). The others always presented some combination that was not adequate for a PDF, with the Ouro Preto station showing five cases (55.6%) of non-adequacy. This behavior may be associated with orographic events, since the Ouro Preto region has mountainous topography, bounded by the "Serra de Ouro Preto" to the north and the "Serra Itacolomi" to the south (Sobreira; Fonseca, 2001). The Desterro Melo station presented four cases (44.4%) of non-adequacy.
Also regarding the Ouro Preto rain gauge, the Gumbel distribution model was not statistically adequate under any adjustment method, which motivated extending the analysis with the Chi-squared (χ²) and Anderson-Darling tests. The results of these tests are presented in Table 4.
Based on the Chi-squared test, the Gumbel distribution was adequate in 87.5% of the cases. All cases of non-adequacy occurred at a single station (Ibituruna), regardless of the adjustment method, which can be associated with the high dispersion of its data set (Table 2): some classes had low frequencies, so classes had to be grouped to meet a minimum frequency. Ferreira (2005) also pointed out this limitation, highlighting the fragility of this test in the analysis of asymptotic series. Except for the data sets of Barbacena, Desterro Melo and Ibituruna, the lowest Chi-squared values were obtained with the L-moments method (LM).
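The class grouping described above can be sketched for a fitted Gumbel distribution. This is illustrative only: the function names are hypothetical, and both the class limits and the minimum expected frequency of 5 per class are assumptions (a common rule of thumb), not values stated in the study. The critical χ² tab still has to be read from a table for the returned degrees of freedom.

```python
import math


def gumbel_cdf(x, alpha, mu):
    """Gumbel CDF F(x) = exp(-exp(-(x - mu)/alpha))."""
    return math.exp(-math.exp(-(x - mu) / alpha))


def chi_square_gumbel(sample, alpha, mu, edges, min_expected=5.0, n_params=2):
    """Chi-squared statistic (Equation 8) with adjacent classes merged.

    edges: ascending interior class limits; classes are (-inf, e1], (e1, e2], ..., (ek, inf).
    Returns (chi2, degrees of freedom, merged observed, merged expected).
    """
    n = len(sample)
    bounds = [-math.inf] + list(edges) + [math.inf]
    observed, expected = [], []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        observed.append(sum(1 for v in sample if lo < v <= hi))
        p_lo = 0.0 if lo == -math.inf else gumbel_cdf(lo, alpha, mu)
        p_hi = 1.0 if hi == math.inf else gumbel_cdf(hi, alpha, mu)
        expected.append(n * (p_hi - p_lo))

    # Merge adjacent classes until each group reaches the minimum expected frequency.
    obs_m, exp_m = [], []
    acc_o, acc_e = 0, 0.0
    for o, e in zip(observed, expected):
        acc_o, acc_e = acc_o + o, acc_e + e
        if acc_e >= min_expected:
            obs_m.append(acc_o)
            exp_m.append(acc_e)
            acc_o, acc_e = 0, 0.0
    if acc_o or acc_e:  # leftover upper-tail classes join the last group
        if exp_m:
            obs_m[-1] += acc_o
            exp_m[-1] += acc_e
        else:
            obs_m.append(acc_o)
            exp_m.append(acc_e)

    chi2 = sum((o - e) ** 2 / e for o, e in zip(obs_m, exp_m))
    dof = len(exp_m) - n_params - 1  # classes - parameters - 1
    return chi2, dof, obs_m, exp_m
```

Merging reduces the number of classes, and therefore the degrees of freedom, which is one reason the test loses power on highly dispersed series.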
Applying the Anderson-Darling test to the Gumbel distribution model showed that only 58.3% of the cases were adequate, indicating that this test is more rigorous in assessing the fit of the PDF.
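A minimal sketch of Equation 7 for a fitted Gumbel distribution is shown below, using the critical value p(α) = 0.757 adopted in the text. The function names are hypothetical, and the exceedance probability is computed with `math.expm1` as a numerical precaution for the upper tail; this is an assumption of the sketch, not something described in the study.

```python
import math


def gumbel_cdf(x, alpha, mu):
    """Non-exceedance probability F(x) of the Gumbel distribution."""
    return math.exp(-math.exp(-(x - mu) / alpha))


def gumbel_sf(x, alpha, mu):
    """Exceedance probability 1 - F(x), via expm1 to keep precision in the upper tail."""
    return -math.expm1(-math.exp(-(x - mu) / alpha))


def anderson_darling_gumbel(sample, alpha, mu):
    """AD^2 statistic of Equation 7 for a fitted Gumbel distribution."""
    x = sorted(sample)
    n = len(x)
    total = 0.0
    for i in range(1, n + 1):
        p1 = gumbel_cdf(x[i - 1], alpha, mu)  # P1(X < x_i), non-exceedance
        p2 = gumbel_sf(x[n - i], alpha, mu)   # P2(X > x_{N+1-i}), exceedance
        total += (2 * i - 1) * (math.log(p1) + math.log(p2))
    return -n - total / n


AD_CRITICAL = 0.757  # p(alpha) value adopted in the text for all PDFs
```

The weights (2i − 1) and the logarithms of small tail probabilities make the extreme observations dominate the sum, which is consistent with the greater rigor of this test reported above.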
It is noteworthy that the non-adequacy cases were concentrated in three rain gauges (Ouro Preto, Itapecirica and Desterro Melo), regardless of the parameter estimation method. As all the studied data sets span more than 41 years, it can be inferred that series length should not have been a determining factor in parameter estimation, in contrast to the statement by Naghettini and Pinto (2007) that, for the ML method, series with a larger amount of data provide more satisfactory results.

Table 1 - Details about the rain gauges analyzed and their respective observed periods.

Table 2 - Basic statistics of the maximum daily rainfall data sets for each rain gauge studied.

Table 3 - Results of the Filliben test for the GEV, Gumbel and two-parameter Gamma (Gamma 2P) probability distribution models, with the MM, ML and LM adjustment methods.