## Ciência e Agrotecnologia

*Print version* ISSN 1413-7054

### Ciênc. agrotec. vol.38 no.4 Lavras Jul./Aug. 2014

#### https://doi.org/10.1590/S1413-70542014000400003

**AGRICULTURAL SCIENCES**

**Performance of the probability distribution models applied to heavy rainfall daily events**

**Desempenho de distribuições de probabilidades aplicadas a eventos extremos de precipitação diária**

**Rosângela Francisca de Paula Vitor Marques^{I}; Carlos Rogério de Mello^{II}; Antônio Marciano da Silva^{II}; Camila Silva Franco^{II}; Alisson Souza de Oliveira^{II}**

^{I}Universidade Federal de Lavras/UFLA – Departamento de Engenharia/DEG – Cx. P. 3037 – 37200-000 – Lavras – MG – Brasil – rosarecursoshidricos@posgrad.ufla.br

^{II}Universidade Federal de Lavras/UFLA – Departamento de Engenharia/DEG – Lavras – MG – Brasil

**ABSTRACT**

Probabilistic studies of hydrological variables, such as heavy daily rainfall events, are an important tool to support the planning and management of water resources, especially for the design of hydraulic structures and the assessment of erosive rainfall potential. In this context, we analyzed the performance of three probability distribution models (GEV, Gumbel and two-parameter Gamma), whose parameters were estimated by the Moments Method (MM), Maximum Likelihood (ML) and L-Moments (LM). The models were fitted to long-term series of maximum daily rainfall from 8 rain gauges located in Minas Gerais state. To evaluate the performance of the probability distribution models, the Filliben test was applied first and, where it did not identify differences, the Anderson-Darling and Chi-squared tests were also applied. The Gumbel probability distribution model showed the best performance, being adequate in 87.5% of the cases. Among the assessed models, GEV fitted by the LM method was adequate for all studied rain gauges and can be recommended. Considering the number of adequate cases, the MM and LM methods (83.3% each) performed better than the ML method (79.2%).

**Index terms:** Probability distribution models, intense rainfall, statistical inference, non-parametric statistical tests.

**RESUMO**

Estudos probabilísticos de variáveis hidrológicas, como a precipitação pluvial diária máxima, constituem um importante instrumento de apoio para o planejamento e gestão de recursos hídricos, principalmente quando associados ao dimensionamento de estruturas hidráulicas e potencial erosivo. Neste contexto, objetivou-se analisar o desempenho de três distribuições de probabilidades (GEV, Gumbel e Gama a dois parâmetros), cujos parâmetros foram ajustados pelos métodos dos Momentos (MM), da Máxima Verossimilhança (ML) e dos Momentos-L (LM), aplicados às séries históricas de precipitação diária máxima de 8 estações pluviométricas, localizadas no centro-oeste de Minas Gerais. Para a verificação da melhor combinação entre distribuição de probabilidade e método de estimativa dos parâmetros, aplicou-se o teste de aderência de Filliben e, complementarmente, quando não identificadas diferenças, utilizaram-se os testes de Anderson-Darling e Qui-quadrado. A distribuição de probabilidades de Gumbel apresentou melhor desempenho, com ajuste adequado em 87,5% dos casos. Entre as distribuições de probabilidades avaliadas, a GEV ajustada por LM apresentou aderência para todas as estações pluviométricas, podendo ser indicada. Considerando o número de ajustes adequados, os métodos de estimação dos parâmetros MM e LM (83,3% cada) apresentaram melhor desempenho do que o método ML (79,2%).

**Termos para indexação:** Distribuição de probabilidades, chuvas intensas, inferência estatística, testes estatísticos não paramétricos.

**INTRODUCTION**

Probabilistic studies of hydrological variables such as heavy rainfall are an important element for supporting water resources planning and management. Of particular interest is the frequency analysis of maximum daily rainfall, whose behavior is strongly associated with asymptotic distributions (Mello; Silva, 2005).

Several studies have investigated probability distribution models for extreme values of climate variables, especially the Gumbel and Generalized Extreme Value (GEV) models, which have generally produced the best fits. In a study of intense rainfall in the São Francisco Basin, Silva and Clarke (2004) concluded that the Gumbel distribution could not be recommended for data sets throughout the basin. Sansigolo (2008) compared the Normal, Gumbel, Fréchet, Weibull, Log-Normal and Pearson probability distribution models fitted to maximum daily rainfall and maximum absolute temperature data sets for Piracicaba, SP state, and concluded that the Gumbel distribution performed best. Araújo et al. (2010) evaluated the Gumbel, Gamma, Log-Normal, Normal, Weibull and Beta probability distribution models applied to long-term daily maximum temperature series for Iguatu, Ceará, and concluded that all the models were adequate. However, according to the Lilliefors test, the best and worst models were, respectively, the Normal and the Gumbel.

Mello and Silva (2005) fitted the Gumbel distribution to long-term maximum rainfall series from seven rain gauges in the Upper Grande River region. They studied the effects of the Maximum Likelihood and Moments estimation methods on the parameters of the Gumbel distribution, and also on the estimated parameters of the intense rainfall equation. The authors found that Maximum Likelihood produced the lowest Chi-square values and concluded that it performed better.

The methods for fitting a given probability distribution model, including Moments, Maximum Likelihood and L-Moments, can lead to different results. The Moments Method (MM), according to Naghettini and Pinto (2007), is the simplest and usually the least efficient. The Maximum Likelihood method (ML) selects the parameter values that maximize the likelihood of the observed sample under the given distribution. However, for small samples it can produce estimators comparable or even inferior to those of other methods. The L-Moments method (LM) produces parameters comparable in quality to those produced by the ML method; for small samples, however, the LM method can be more accurate and therefore perform better (Naghettini; Pinto, 2007).
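To make the contrast between the three estimation methods concrete, the sketch below fits a Gumbel distribution to a single sample by MM, ML and LM. It is a minimal illustration under stated assumptions, not the procedure used in this study: it assumes the Gumbel form F(x) = exp{-exp[-(x - μ)/α]} with α acting as a scale parameter, the ML equation is solved by simple bisection, and the sample `annual_maxima` and all function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

# Hypothetical annual maximum daily rainfall values (mm); not data from this study.
annual_maxima = [62.1, 75.4, 58.3, 91.0, 66.7, 80.2, 70.5, 104.8, 55.9, 73.6]


def fit_gumbel_mm(sample):
    """Method of Moments: match the sample mean and standard deviation."""
    n = len(sample)
    mean = sum(sample) / n
    variance = sum((x - mean) ** 2 for x in sample) / (n - 1)
    alpha = math.sqrt(6.0 * variance) / math.pi
    mu = mean - EULER_GAMMA * alpha
    return mu, alpha


def fit_gumbel_lm(sample):
    """L-Moments: match the first two sample L-moments."""
    xs = sorted(sample)
    n = len(xs)
    b0 = sum(xs) / n
    b1 = sum(i * x for i, x in enumerate(xs)) / (n * (n - 1))
    l1, l2 = b0, 2.0 * b1 - b0
    alpha = l2 / math.log(2.0)
    mu = l1 - EULER_GAMMA * alpha
    return mu, alpha


def ml_residual(sample, alpha):
    """Residual of the ML equation for alpha; it is zero at the ML estimate."""
    x_min = min(sample)  # shift the exponent to avoid numerical underflow
    weights = [math.exp(-(x - x_min) / alpha) for x in sample]
    weighted_mean = sum(x * w for x, w in zip(sample, weights)) / sum(weights)
    return alpha - sum(sample) / len(sample) + weighted_mean


def fit_gumbel_ml(sample, iterations=200):
    """Maximum Likelihood: solve the ML equation for alpha by bisection."""
    mean = sum(sample) / len(sample)
    x_min = min(sample)
    high = mean - x_min   # the residual is non-negative at this value
    low = high * 1e-6     # the residual is negative at this value
    for _ in range(iterations):
        mid = 0.5 * (low + high)
        if ml_residual(sample, mid) > 0.0:
            high = mid
        else:
            low = mid
    alpha = 0.5 * (low + high)
    mean_exp = sum(math.exp(-(x - x_min) / alpha) for x in sample) / len(sample)
    mu = x_min - alpha * math.log(mean_exp)
    return mu, alpha


for name, fit in (("MM", fit_gumbel_mm), ("ML", fit_gumbel_ml), ("LM", fit_gumbel_lm)):
    mu, alpha = fit(annual_maxima)
    print(f"{name}: mu = {mu:.2f} mm, alpha = {alpha:.2f} mm")
```

Running the three functions on the same sample gives three different parameter pairs, which is the effect discussed above; how large the differences are depends on the sample size and on the shape of the data.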

Non-parametric statistical tests such as Kolmogorov-Smirnov, Lilliefors, Shapiro-Wilk and Cramér-von Mises, among others (Campos, 1983; Assis; Arruda; Pereira, 1996), are used to compare the observed frequency of a given variable against the respective probability (estimated frequency), in order to verify whether the sample values can be considered as drawn from a population with a given theoretical distribution. The Anderson-Darling test is an alternative to the Chi-square and Kolmogorov-Smirnov tests, as it gives more weight to the tails of the frequency distribution and is therefore more recommended for asymptotic distributions (Naghettini; Pinto, 2007).

In this context, this study aimed to analyze the performance of three probability distribution models (GEV, Gumbel and two-parameter Gamma), whose parameters were estimated by the Moments, Maximum Likelihood and L-Moments methods, applied to long-term series of maximum daily rainfall from eight rain gauges located in Minas Gerais state.

**MATERIAL AND METHODS**

Long-term series of maximum daily rainfall, obtained from the National Water Agency (ANA/HIDROWEB, 2012), were used from eight rain gauges located in zones of medium to high heavy rainfall (Mello et al., 2007) in Minas Gerais state: Barbacena, Carmo da Mata, Desterro Melo, Divinópolis, Estiva, Ibituruna, Itapecirica and Ouro Preto. All the series had at least 34 years of complete and recent data, and the period between 1977 and 2007 is common to all of them (Table 1 and Figure 1). According to the National Center for Disaster Monitoring and Alert (CEMADEN, 2014), the municipalities of Barbacena, Desterro do Melo and Ouro Preto are located in high landslide risk areas, which makes this study strategically important for actions aimed at minimizing the impacts.

Three Probability Density Functions (PDF) suited to extreme values were used: Generalized Extreme Value - GEV (Equation 1), two-parameter Gamma (Equation 2), and Gumbel (Equation 3), whose parameters were estimated by the methods of Moments (MM), Maximum Likelihood (ML) and L-Moments (LM) (Naghettini; Pinto, 2007).

Where x is the hydrological variable (annual maximum daily rainfall), α, μ and ξ are the parameters of this distribution.

Where β and υ are the parameters of this distribution.

Where α and μ are the parameters of the distribution.
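For reference, commonly used forms of the three densities, written with the symbols defined above, are given below. These are textbook forms offered as an assumption about Equations 1 to 3, not a transcription of them: α is taken as a scale parameter and μ as a location parameter, ξ as the GEV shape parameter (its sign convention varies between references), and β and υ as the Gamma scale and shape parameters, respectively. Some references write the Gumbel density with the reciprocal of α.

$$f(x) = \frac{1}{\alpha}\left[1 + \xi\,\frac{x-\mu}{\alpha}\right]^{-\frac{1}{\xi}-1} \exp\left\{-\left[1 + \xi\,\frac{x-\mu}{\alpha}\right]^{-\frac{1}{\xi}}\right\} \quad \text{(GEV)}$$

$$f(x) = \frac{x^{\upsilon-1}\, e^{-x/\beta}}{\beta^{\upsilon}\,\Gamma(\upsilon)} \quad \text{(two-parameter Gamma)}$$

$$f(x) = \frac{1}{\alpha}\exp\left\{-\frac{x-\mu}{\alpha} - \exp\left[-\frac{x-\mu}{\alpha}\right]\right\} \quad \text{(Gumbel)}$$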

With the parameters estimated by the MM, ML and LM methods for the Gumbel, GEV and two-parameter Gamma probability distribution models, the Filliben non-parametric test was applied first (Equations 4 to 6) to evaluate how well these models represent the respective data sets. For distributions and methods whose fits were statistically equivalent, two other non-parametric tests were used: Anderson-Darling (Equation 7) and Chi-square (Equation 8), at a significance level of 0.05 (Naghettini; Pinto, 2007).

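The Filliben test is based on the linear correlation coefficient *r* between the observations sorted in ascending order {*x*(1), *x*(2), ..., *x*(*i*), ..., *x*(*N*)} and the theoretical quantiles {*w*1, *w*2, ..., *wi*, ..., *wN*}. Each theoretical quantile is obtained from the inverse of the cumulative distribution function, $F_X^{-1}$, evaluated at the empirical probability *qi* corresponding to the classification order *i* (Equation 4).

The empirical probability is given by the plotting position formula (Equation 5):

$$q_i = \frac{i - a}{N + 1 - 2a}$$

where N is the sample size and a = 0.40 (GEV), a = 0.44 (Gumbel) and a = 0.40 (Gamma).

Formally, the Filliben test statistic is the correlation coefficient (Equation 6):

$$r_{calc} = \frac{\sum_{i=1}^{N} \left(x_{(i)} - \bar{x}\right)\left(w_i - \bar{w}\right)}{\sqrt{\sum_{i=1}^{N} \left(x_{(i)} - \bar{x}\right)^{2} \sum_{i=1}^{N} \left(w_i - \bar{w}\right)^{2}}}$$

The r_{calc} values are compared against those of r_{crit}, which are obtained at a significance level of 0.05. If r_{calc} > r_{crit}, the sample can be represented by the respective distribution.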
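The Filliben procedure can be sketched in a few lines for the Gumbel case. This is an illustration under assumptions, not the authors' implementation: *qi* is treated as a non-exceedance probability, so the quantiles are computed as w_i = F^{-1}(q_i); the Gumbel quantile function assumes α is a scale parameter; the sample and the parameter values are hypothetical; and r_{crit} must still be taken from published tables, which are not reproduced here.

```python
import math


def plotting_positions(n, a):
    """Empirical probabilities q_i = (i - a) / (n + 1 - 2a) for i = 1..n."""
    return [(i - a) / (n + 1.0 - 2.0 * a) for i in range(1, n + 1)]


def gumbel_quantile(q, mu, alpha):
    """Inverse of the Gumbel CDF F(x) = exp(-exp(-(x - mu) / alpha))."""
    return mu - alpha * math.log(-math.log(q))


def filliben_r(sample, quantile, a):
    """Correlation between the sorted sample and the theoretical quantiles."""
    xs = sorted(sample)
    ws = [quantile(q) for q in plotting_positions(len(xs), a)]
    x_mean = sum(xs) / len(xs)
    w_mean = sum(ws) / len(ws)
    s_xw = sum((x - x_mean) * (w - w_mean) for x, w in zip(xs, ws))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    s_ww = sum((w - w_mean) ** 2 for w in ws)
    return s_xw / math.sqrt(s_xx * s_ww)


# Hypothetical annual maxima (mm) and hypothetical Gumbel parameters.
sample = [62.1, 75.4, 58.3, 91.0, 66.7, 80.2, 70.5, 104.8, 55.9, 73.6]
r_calc = filliben_r(sample, lambda q: gumbel_quantile(q, 67.0, 11.8), a=0.44)
print(f"r_calc = {r_calc:.4f}")  # compare with the tabulated r_crit at 0.05
```

In this sketch the Gumbel quantiles are a linear function of μ and α, and a correlation coefficient does not change under such a transformation. As a result, r_{calc} for the Gumbel distribution is the same whichever parameter values are supplied, so under these assumptions the statistic cannot distinguish between estimation methods for this distribution.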

The Anderson-Darling non-parametric test gives strong weight to the tails of the distribution, where the largest (or smallest) observations of the sample can greatly alter the quality of the fit. The test is based on the difference between the empirical, FN(x), and theoretical, FX(x), cumulative distribution functions of continuous random variables. The Anderson-Darling test statistic, AD^{2}, is defined by Equation 7.

In Equation 7, N is the sample size, i is the position of each observation in the series sorted in ascending order, P1 (X < xi) is the probability of non-exceedance and P2 (X > xi) is the probability of exceedance. If AD^{2} < p (α), the null hypothesis (H_{0}) of adequacy of the PDF is accepted. In this study, p (α) = 0.757 was considered for all PDF.
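A minimal sketch of the Anderson-Darling computation follows. It assumes the standard form of the statistic, AD² = -N - (1/N) Σ (2i - 1)[ln F(x_(i)) + ln(1 - F(x_(N-i+1)))], which may differ in notation from Equation 7; the Gumbel CDF assumes α is a scale parameter, and the sample and parameter values are hypothetical. Only the threshold 0.757 comes from the text.

```python
import math

AD_CRITICAL = 0.757  # p(alpha) adopted in the text for all PDFs


def gumbel_cdf(x, mu, alpha):
    """Gumbel non-exceedance probability F(x) = exp(-exp(-(x - mu) / alpha))."""
    return math.exp(-math.exp(-(x - mu) / alpha))


def anderson_darling(sample, cdf):
    """Standard Anderson-Darling statistic for a fully specified CDF."""
    xs = sorted(sample)
    n = len(xs)
    total = 0.0
    for i in range(1, n + 1):
        p_non_exceed = cdf(xs[i - 1])    # P(X < x_(i))
        p_exceed = 1.0 - cdf(xs[n - i])  # P(X > x_(N-i+1))
        total += (2 * i - 1) * (math.log(p_non_exceed) + math.log(p_exceed))
    return -n - total / n


# Hypothetical annual maxima (mm) and hypothetical Gumbel parameters.
sample = [62.1, 75.4, 58.3, 91.0, 66.7, 80.2, 70.5, 104.8, 55.9, 73.6]
ad2 = anderson_darling(sample, lambda x: gumbel_cdf(x, 67.0, 11.8))
print(f"AD2 = {ad2:.3f}, adequate: {ad2 < AD_CRITICAL}")
```

Because each logarithm grows quickly in magnitude as a probability approaches 0 or 1, a single extreme observation that the fitted distribution considers very unlikely can dominate the sum, which is the tail sensitivity described above.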

In the Chi-squared test, the null hypothesis (H_{0}) is tested by comparing the observed and theoretical frequencies in each class of grouped sample data. The test statistic is given by:

$$\chi^{2}_{calc} = \sum_{i=1}^{n} \frac{\left(f_{obs_i} - f_{theor_i}\right)^{2}}{f_{theor_i}}$$

where n is the number of classes, and f_{obsi} and f_{theori} are, respectively, the observed and theoretical frequencies in class i.

The critical value of this test (χ^{2}_{tab}) is obtained from the degrees of freedom, given by the number of frequency classes minus the number of parameters of the PDF minus one, at a significance level of 0.05. If χ^{2}_{calc} < χ^{2}_{tab}, the PDF is suitable for the series studied. Furthermore, the χ^{2}_{calc} values reflect a mean squared error to which all the data of the series contribute. Thus, they can be used to compare and discuss the performance of the PDF, indicating not only whether a PDF is adequate but also which model is the most appropriate.
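The decision rule of the Chi-squared test can be sketched as follows. The class frequencies in the example are hypothetical, the function names are illustrative, and the short table of critical values holds the standard upper 5% points of the Chi-squared distribution for 1 to 5 degrees of freedom.

```python
# Upper 5% critical values of the Chi-squared distribution by degrees of freedom.
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def chi_squared(observed, theoretical):
    """Sum over classes of (observed - theoretical)^2 / theoretical."""
    return sum((o - t) ** 2 / t for o, t in zip(observed, theoretical))


def degrees_of_freedom(n_classes, n_parameters):
    """Number of classes minus number of fitted parameters minus one."""
    return n_classes - n_parameters - 1


def is_adequate(chi2_calc, dof):
    """True when the calculated statistic is below the tabulated value."""
    return chi2_calc < CHI2_CRIT_05[dof]


# Hypothetical frequencies for 5 classes and a two-parameter PDF (e.g. Gumbel).
observed = [6, 11, 9, 5, 3]
theoretical = [5.2, 10.9, 9.6, 5.4, 2.9]
chi2_calc = chi_squared(observed, theoretical)
dof = degrees_of_freedom(len(observed), n_parameters=2)
print(f"chi2_calc = {chi2_calc:.3f}, dof = {dof}, adequate: {is_adequate(chi2_calc, dof)}")
```

With few classes and a three-parameter PDF such as GEV, the degrees of freedom shrink quickly (for example, 5 classes leave only 1), which is one reason class grouping affects the outcome of this test.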

**RESULTS AND DISCUSSION**

Table 2 presents the basic statistics of the maximum daily rainfall data sets for each rain gauge. The Barbacena rain gauge shows the lowest dispersion indicators, represented by the standard deviation (SD), range of variation (RV) and coefficient of variation (CV), followed by the Estiva and Ouro Preto rain gauges. On the other hand, Itapecirica, Desterro Melo and Ibituruna present the highest values of these indicators. These basic statistics are relevant to the performance of the probability distribution models studied, mainly through their influence on the performance of the fitting methods.

Table 3 presents the results of the Filliben test applied to all possible combinations of PDF, fitting method and rain gauge.

In the initial analysis of the PDF fits, the Gumbel distribution model was adequate for 87.5% of the cases, GEV for 83.3% and Gamma for 75%. These results differ from those obtained by Araújo et al. (2010), in which the Gumbel distribution performed worse than the other models tested for Iguatu, CE. It is also worth noting that, for each rain gauge, the Gumbel distribution presented equal r_{calc} values for the three fitting methods (MM, ML and LM), considering only those that were statistically adequate.

According to the Filliben test, the GEV distribution was inadequate for two combinations with MM and for two others with ML. Two of these cases occurred at the Ouro Preto rain gauge. The other two occurred at the Desterro Melo and Estiva rain gauges, for ML and MM, respectively. Another aspect worth noting is that the GEV-LM combination was statistically adequate for all eight rain gauges, that is, it showed no adequacy problems. The two-parameter Gamma distribution model had six cases of non-adequacy, three of them at a single station (Desterro Melo, for all three fitting methods). In addition, there were two cases of non-adequacy for the Itapecirica rain gauge (ML and LM).
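The percentages reported for the Filliben test can be checked with a short calculation, assuming that each distribution was evaluated for 8 rain gauges × 3 estimation methods = 24 combinations. The counts of 4 (GEV) and 6 (Gamma) non-adequate cases come from the text; the count of 3 for Gumbel is inferred from the reported 87.5%.

$$\text{Gumbel: } \frac{24 - 3}{24} = 87.5\%, \qquad \text{GEV: } \frac{24 - 4}{24} \approx 83.3\%, \qquad \text{Gamma: } \frac{24 - 6}{24} = 75\%$$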

Regarding the performance of the fitting methods, MM and LM each provided adequate fits in 83.3% of the cases, while ML did so in 79.2%. The only combination that was adequate for all stations was the LM method with the GEV distribution. Similar results were obtained by Beijo; Vivanco and Muniz (2009), Blain (2011) and Quadros; Queiroz and Vilas Boas (2011), supporting the good performance of this PDF for maximum daily rainfall. However, these results differ from those obtained by Blain and Camargo (2012), who found better performance for GEV-ML.

Comparing the r_{calc} values among the rain gauges, except for Barbacena, Carmo da Mata, Divinópolis and Itapecirica, the values obtained with GEV fitted by LM are higher than those obtained with the Gumbel and Gamma models. With similar purposes, Silva and Clarke (2004), Mello and Silva (2005) and Hartmann; Moala and Mendonça (2011) found higher precision for the Gumbel distribution when fitted by ML.

Regarding the rain gauge data sets, only three (Barbacena, Carmo da Mata and Ibituruna) were adequate for all three distributions and all three fitting methods. The others always presented some combination that was not adequate, and the Ouro Preto station had five cases of non-adequacy out of nine (55.6%). This behavior may be associated with orographic effects, because the region of Ouro Preto has mountainous topography, with the "Serra de Ouro Preto" as its northern limit and the "Serra Itacolomi" as its southern limit (Sobreira; Fonseca, 2001). Desterro Melo presented four cases of non-adequacy (44.4%).

Also for the Ouro Preto rain gauge, none of the fitting methods associated with the Gumbel distribution model was statistically adequate, which motivated extending the analysis by applying the Chi-squared (χ^{2}) and Anderson-Darling tests. The results of these tests are presented in Table 4.

Based on the Chi-squared test, the Gumbel distribution was adequate in 87.5% of the cases. The cases of non-adequacy occurred at a single station (Ibituruna), regardless of the fitting method, which may be associated with the high dispersion of its data set (Table 2): this leads to low frequencies in some classes, which then have to be grouped to meet a minimum frequency. This limitation was also pointed out by Ferreira (2005), who highlighted the fragility of this test in the analysis of asymptotic series. It was also found that, except for the data sets of Barbacena, Desterro Melo and Ibituruna, the lowest Chi-squared values were obtained with the L-Moments method (LM).

The Anderson-Darling test applied to the Gumbel distribution model indicated that only 58.3% of the cases were adequate, which reflects the greater rigor of this test in evaluating the fit of the PDF. The non-adequacy cases were concentrated in three rain gauges (Ouro Preto, Itapecirica and Desterro Melo), regardless of the parameter estimation method. As all studied data sets are more than 41 years long, it can be inferred that the size of the series was not a determining factor in the estimation of the parameters, in contrast to the statement of Naghettini and Pinto (2007) that, for the ML method, series with more data provide more satisfactory results.

The data sets of the Barbacena, Carmo da Mata and Estiva rain gauges were adequate in all statistical tests. On the other hand, the series that was hardest to fit was Ouro Preto, which was adequate only by the Chi-squared test.

**CONCLUSIONS**

Based on the Filliben, Chi-squared and Anderson-Darling tests, the Gumbel distribution model presented the best performance, followed by GEV and Gamma.

The performance of the Gumbel distribution model maintained the same pattern when the Chi-squared test was applied, but decreased substantially when the Anderson-Darling test was applied.

The GEV distribution model fitted by the L-Moments (LM) method was statistically adequate for all rain gauges studied and can be recommended for probability studies in regions classified as having medium to high rainfall intensity.

Considering the number of adequate fits, the MM and LM estimation methods performed better than the ML method.

The maximum daily rainfall series of Barbacena, Carmo da Mata and Ibituruna were the only ones that were adequate for all possible combinations of PDF and parameter estimation method, with the Filliben test as reference.

**REFERENCES**

ANA - Agência Nacional das Águas. Sistema de Informações Hidrológicas. Available at: http://hidroweb.ana.gov.br/. Accessed on 10/04/2012.

ARAÚJO, E. M. et al. Aplicação de seis distribuições de probabilidade a séries de temperatura máxima em Iguatu – CE. **Revista de Ciência Agronômica**. 41(1):36-45, 2010.

ASSIS, F. N.; ARRUDA, H. V. de; PEREIRA, A. R. **Aplicações de estatística à climatologia**: teoria e prática. 1 ed. Pelotas: Universidade Federal de Pelotas, 1996. 161p.

BEIJO, L. A.; VIVANCO, M. J. F.; MUNIZ, J. A. Análise Bayesiana no estudo do tempo de retorno das precipitações máximas em Jaboticabal (SP). **Ciência e Agrotecnologia**. 33(1):261-270, 2009.

BLAIN, C. G. Cento e vinte anos de totais extremos de precipitação pluvial máxima diária em Campinas, Estado de São Paulo: análises estatísticas. **Bragantia**. 70(3):722-728, 2011.

BLAIN, C. G.; CAMARGO, M. B. P. Probabilistic structure of an annual extreme rainfall series of a coastal area of the state of São Paulo, Brazil. **Engenharia Agrícola**. 32(3):552-559, 2012.

CAMPOS, H. de. **Estatística Experimental Não Paramétrica**. 4 ed. Piracicaba: ESALq, 1983. 349 p.

CEMADEN - Centro Nacional de Monitoramento e Alertas naturais. Available at: http://www.cemaden.gov.br/municipiosprio.php. Accessed on 25/03/2014.

FERREIRA, D. F. **Estatística Básica**. 1. ed. Lavras: Editora UFLA, 2005. 664 p.

HARTMANN, M.; MOALA, F. A.; MENDONÇA, M. A. Estudo das precipitações máximas anuais em Presidente Prudente. **Revista Brasileira de Meteorologia**. 26(4):561-568, 2011.

MELLO, C. R.; SILVA, A. M. Métodos estimadores dos parâmetros da distribuição de Gumbel e sua influência em estudos hidrológicos de projeto. **Irriga**. 10(4):318-334, 2005.

MELLO, C. R. et al. Erosividade mensal e anual da chuva no Estado de Minas Gerais. **Pesquisa Agropecuária Brasileira**. 42(4):537-545, 2007.

NAGHETTINI, M.; PINTO, E. J. A. **Hidrologia Estatística**. Belo Horizonte: CPRM, 2007. 552p.

QUADROS, L. E.; QUEIROZ, M. M. F.; VILAS BOAS, M. A. Distribuição de frequência e temporal de chuvas intensas. **Acta Scientiarum. Agronomy**. 33(3):401-410, 2011.

SANSIGOLO, C. A. Distribuições de extremos de precipitação diária, temperatura máxima e mínima e velocidade do vento em Piracicaba, SP. **Revista Brasileira de Meteorologia**. 23(3):341-246, 2008.

SOBREIRA, F. G.; FONSECA, M. A. Impactos físicos e sociais de antigas atividades de mineração em Ouro Preto, Brasil. **Revista Portuguesa de Geotecnia**, Lisboa, nº 92, p. 5-28. 2001.

SILVA, B. C.; CLARKE, R. T. Análise estatística de chuvas intensas na Bacia do Rio São Francisco. **Revista Brasileira de Meteorologia**. 19(3):265-272, 2004.

Received on April 4, 2014

Approved on June 23, 2014