Probability distribution of heavy rainfall and determination of IDF in the city of Caruaru – PE

In the design of hydraulic engineering works, the estimation of design precipitation is fundamental. Rainfall depends on several factors, so it is more practical to treat it as a stochastic process. In this sense, the Gumbel (GUM), two-parameter Log-Normal (LN2P), Generalized Extreme Value (GEV), Fréchet with two and three parameters (FRE2P and FRE3P), Weibull with two and three parameters (W2P and W3P), Gamma (GAM2P), and Pareto with two and three parameters (PAR2P and PAR3P) distributions were evaluated for fitting the annual maximum daily precipitation (AMDP) in the city of Caruaru (Agreste region of Pernambuco). A series of AMDP was used, based on data obtained from the National Water Agency (Agência Nacional de Águas, ANA). The Anderson-Darling (AD), Kolmogorov-Smirnov (KS), and Pearson chi-square (χ²) adherence tests and the determination coefficient (R²) were used to assess the quality of fit of the distributions. The Maximum Likelihood Method presented a better fit quality than the Method of Moments. The GEV distribution obtained the best results for the AD test with both parameter estimation methods. Among the adherence tests used, the AD test was considered the most restrictive. To verify the quality of fit of the IDF relations, the Willmott performance coefficient was used. For all distributions employed in this study, the Willmott performance coefficients were above 0.99, indicating an excellent fit of the IDF relations, with determination coefficients close to 1.0.


INTRODUCTION
For the proper management of the waters of a region, hydrological knowledge is necessary, including the characterization of water flow and of extreme events of maximum and minimum precipitation. In the case of intense rain events, studies can be applied to understand the hydrological behavior of river basins to control floods, as well as to estimate design flows for hydraulic structures (Caldeira et al., 2015). As a rain event is a continuous random variable, it can be represented by theoretical probability distributions (Gandini and Queiroz, 2018). Accordingly, Junqueira Jr. et al. (2015) indicate that fitting the probabilistic model that best describes the process is necessary for the estimation of extreme events. There are several probabilistic models applicable to modeling annual maximum events of hydrological variables. In Brazil, fitting with simpler theoretical probability models has been commonly observed, such as the Log-Normal (with two and three parameters) and Gumbel distributions (Caldeira et al., 2015). However, according to Back (2001), even if a distribution provides a good fit for a series of data, its application cannot be generalized, and it is recommended that several distributions be tested for a data set. Thus, Blain and Meschiatti (2014) analyzed the performance of the Wakeby, Kappa, and Generalized Extreme Value distributions in estimating annual maximum total precipitation (daily, two-day, and three-day) in the city of Campinas-SP. Borges and Thebaldi (2016) used the Gumbel, Fréchet, Gamma, and Log-Normal (with two and three parameters) models in the analysis of AMDP for the municipality of Formiga-MG.
Another important factor in the characterization of rainfall is the estimation of the parameters of the distributions of random variables, which can be done numerically. In general, the Method of Moments (MM) (Oliveira et al., 2008; Silva and Oliveira, 2017), the Maximum Likelihood Method (ML) (Alves et al., 2013; Rodrigues et al., 2013; Cotta et al., 2016; Santos et al., 2018), or both (Mello and Silva, 2005; Franco et al., 2014; Alcântara et al., 2019a; 2019b) are widely used in the literature.
In this sense, the state of Pernambuco has different rainfall regimes throughout its territory. In general, the pluviometric index increases toward the coast, where extreme precipitation is more significant and intense. According to Ferreira et al. (2018), monthly rainfall in the state of Pernambuco has greater variability and less predictability in the regions near Zona da Mata and Agreste than in the Sertão mesoregion. In addition to this characteristic, the rains in the Agreste of the state are increasingly concentrated in isolated events, which increases annual maximum daily precipitation (AMDP), resulting in a higher occurrence of extremely dry events compared to other mesoregions (Nóbrega et al., 2015). Thus, this study evaluated the Gumbel (GUM), two-parameter Log-Normal (LN2P), Generalized Extreme Value (GEV), Fréchet with two and three parameters (FRE2P and FRE3P), Weibull with two and three parameters (W2P and W3P), Gamma (GAM2P), and Pareto with two and three parameters (PAR2P and PAR3P) distributions fitted to AMDP events in the city of Caruaru, in the Agreste region of the state of Pernambuco. It also obtained and evaluated the parameters of the intensity-duration-frequency (IDF) equations from the aforementioned probabilistic models.

Location and characterization of the experimental area
The study was carried out for the city of Caruaru, in the Agreste region of the state of Pernambuco, 130 km from the capital, Recife. Caruaru is located in an area with a tropical climate of semi-arid type. However, due to its modest altitude, it presents less severe aridity, with hot and dry summers and mild and relatively rainy winters. According to the Köppen classification, the climate of Caruaru is classified as hot and humid tropical, with a dry season in winter (Medeiros et al., 2018).

Acquisition of rainfall data
The data were collected on the Hidroweb portal of the National Water Agency (ANA, 2019). The chosen station was the rainfall station with the code 835106 (Latitude -8.302792; Longitude -36.010798), operated by the Mineral Resources Research Company (Companhia de Pesquisa de Recursos Minerais, CPRM), under the responsibility of ANA.
The years 2006 and 2012 had no data. The maximum daily rainfall in those years was obtained using data from pluviometric stations of the Pernambuco Water and Climate Agency (APAC, 2019). The analysis of the years of interest drew on data from three stations (24, 211, and 484), from which the values 48.9 mm and 49.4 mm were obtained for 2006 and 2012, respectively. With this information, the missing data of the historical series of the studied station were filled in, resulting in the data in Table 1.

Empirical Distribution
The data obtained were organized in decreasing order, and from these values, the empirical function was determined using the California Method (Equation 1).

$$F_{emp} = \frac{i}{n} \qquad (1)$$
where Femp is the empirical exceedance frequency; n is the size of the historical series; and i is the position occupied by the value in the series sorted in decreasing order.
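The ranking step described above can be illustrated with a minimal Python sketch. It assumes the California plotting position Femp = i/n applied to the series sorted in decreasing order; the function name and the sample values are hypothetical and are not the Caruaru series of Table 1.

```python
def california_exceedance(values):
    """Return (value, F_emp) pairs for a series sorted in decreasing order.

    F_emp = i / n, where i is the 1-based rank and n is the series length
    (California plotting position, assumed here).
    """
    ordered = sorted(values, reverse=True)
    n = len(ordered)
    return [(x, i / n) for i, x in enumerate(ordered, start=1)]


# Hypothetical annual maximum daily precipitation values, in mm.
sample = [62.0, 48.0, 101.3, 55.4, 75.5]
pairs = california_exceedance(sample)
for value, f_emp in pairs:
    print(f"{value:7.1f} mm  F_emp = {f_emp:.2f}")
```

With this plotting position the largest value receives the frequency 1/n and the smallest receives 1, so the empirical return period of the largest event equals the length of the series.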

Parameter Estimation
In this study, two methods were used: the Maximum Likelihood Method (ML) and the Method of Moments (MM). Naghettini and Pinto (2007) describe these methods.
The FindDistributionParameters function was used to fit the data to each probabilistic distribution. This function receives the rain data, the probabilistic model, and the parameter estimator as input. The parameter estimator is the numerical method used to estimate the values of the parameters.

Table 2. Probability Density Functions used and the description of the respective parameters.
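As an illustration of what a moment-based estimator does, the sketch below fits only the Gumbel distribution by matching the sample mean and standard deviation. It is a Python approximation written for this text, not the routine used in the study, which relied on FindDistributionParameters; the function names and sample values are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def gumbel_moments(values):
    """Estimate Gumbel (location, scale) by the Method of Moments.

    Uses mean = location + gamma * scale and std = pi * scale / sqrt(6).
    """
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation (assumption)
    scale = std * math.sqrt(6) / math.pi
    location = mean - EULER_GAMMA * scale
    return location, scale


def gumbel_quantile(location, scale, return_period):
    """Precipitation depth associated with a return period T in years."""
    p_non_exceed = 1.0 - 1.0 / return_period
    return location - scale * math.log(-math.log(p_non_exceed))


# Hypothetical annual maximum daily precipitation values, in mm.
sample = [62.0, 48.0, 101.3, 55.4, 75.5]
loc, scale = gumbel_moments(sample)
for T in (2, 10, 50, 100):
    print(T, round(gumbel_quantile(loc, scale, T), 1))
```

The quantile function shows how the fitted parameters feed the later steps: each return period yields a one-day depth that can then be disaggregated into shorter durations.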

Adherence Test
To assess whether the theoretical exceedance probability (Fexc) adequately fits the empirical frequency (Femp), the Anderson-Darling (AD) (Equation 12), Kolmogorov-Smirnov (KS) (Equation 13), and Pearson's chi-square (χ²) (Equation 14) adherence tests were used.
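To make the idea of an adherence test concrete, the following Python sketch computes the Kolmogorov-Smirnov statistic as the largest gap between the empirical step function and a theoretical cumulative distribution. It assumes the standard two-sided definition of the statistic, which may differ in notation from Equation 13; the Gumbel CDF and all names are illustrative, and no critical values or p-values are computed.

```python
import math


def gumbel_cdf(x, location, scale):
    """Non-exceedance probability of the Gumbel distribution."""
    return math.exp(-math.exp(-(x - location) / scale))


def ks_statistic(values, cdf):
    """Largest absolute difference between the empirical and theoretical CDFs.

    Both sides of each step of the empirical CDF are checked.
    """
    ordered = sorted(values)
    n = len(ordered)
    d = 0.0
    for i, x in enumerate(ordered, start=1):
        f = cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


# Hypothetical data and parameters, for illustration only.
sample = [62.0, 48.0, 101.3, 55.4, 75.5]
d_stat = ks_statistic(sample, lambda x: gumbel_cdf(x, 59.0, 16.0))
print(round(d_stat, 3))
```

A smaller statistic indicates closer agreement; the decision at the 5% level additionally requires the critical value or p-value for the given sample size.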
Rev. Ambient. Água vol. 16 n. 1, e2555 - Taubaté 2021
The determination coefficient (R²) was used to quantify the quality of the statistical adjustments (Equation 15). R² measures the proportion of the variance in the observed values that is explained by the fitted values, with an ideal value of 100%.
$$R^2 = 1 - \frac{SQE}{SQT} \qquad (15)$$
where SQE is the sum of squared residuals and SQT is the total sum of squares.
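A short Python sketch of the determination coefficient follows, assuming the usual definition R² = 1 - SQE/SQT built from the two sums defined above. The function name and the example numbers are hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination, R^2 = 1 - SQE / SQT.

    SQE: sum of squared residuals between observed and predicted values.
    SQT: total sum of squares of the observed values around their mean.
    """
    mean_obs = sum(observed) / len(observed)
    sqe = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sqt = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sqe / sqt


# Hypothetical empirical and theoretical frequencies.
f_emp = [0.2, 0.4, 0.6, 0.8, 1.0]
f_theo = [0.22, 0.37, 0.61, 0.83, 0.97]
print(round(r_squared(f_emp, f_theo), 4))
```

A value of 1 corresponds to the 100% mentioned in the text, and a model no better than the mean of the observations gives 0.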

Rainfall Disaggregation
The method of disaggregating rainfall developed by DAEE/CETESB (1980) adopts the average factor of 1.14 for the transformation of maximum rainfall of 1 day into a rainfall of 24 hours. Factors of 0.85; 0.82; 0.78; 0.72; and 0.42 are used to reduce the rainfall of 24 hours into rains of 12h, 10h, 8h, 6h, and 1h, respectively. The 30-minute rainfall is obtained by multiplying the 1-hour rainfall by 0.74; and to obtain the 25, 20, 15, 10, and 5 min rains, the rain of 30 min is multiplied by 0.91; 0.81; 0.70; 0.54; and 0.34, respectively.
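The chain of factors above can be written as a small Python sketch. The factors are those quoted in the text from DAEE/CETESB (1980); the function names, the dictionary layout keyed by duration in minutes, and the 100 mm example are illustrative assumptions.

```python
DAY_TO_24H = 1.14  # 1-day rainfall -> 24-hour rainfall
FROM_24H = {720: 0.85, 600: 0.82, 480: 0.78, 360: 0.72, 60: 0.42}
H1_TO_30MIN = 0.74  # 1-hour rainfall -> 30-minute rainfall
FROM_30MIN = {25: 0.91, 20: 0.81, 15: 0.70, 10: 0.54, 5: 0.34}


def disaggregate(one_day_mm):
    """Return rainfall depths (mm) keyed by duration in minutes."""
    depths = {}
    p24 = one_day_mm * DAY_TO_24H
    depths[1440] = p24
    for minutes, factor in FROM_24H.items():
        depths[minutes] = p24 * factor
    p30 = depths[60] * H1_TO_30MIN
    depths[30] = p30
    for minutes, factor in FROM_30MIN.items():
        depths[minutes] = p30 * factor
    return depths


def intensity_mm_per_h(depth_mm, minutes):
    """Convert a depth accumulated over a duration into mm/h."""
    return depth_mm * 60.0 / minutes


# Hypothetical 1-day maximum of 100 mm.
depths = disaggregate(100.0)
for minutes in sorted(depths):
    print(minutes, round(depths[minutes], 2),
          round(intensity_mm_per_h(depths[minutes], minutes), 2))
```

For a 1-day depth of 100 mm, the sketch gives 114 mm in 24 hours, 47.88 mm in 1 hour, and about 35.43 mm in 30 minutes, which are the values later converted to intensities for the IDF fit.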

Intense rain equation
For the characterization of extreme rainfall, it is necessary to determine empirical equations called intensity-duration-frequency (IDF) equations, or intense rainfall equations (Equation 16) (Bertoni and Tucci, 1993), where i is the precipitation intensity in mm/h; T is the precipitation return period in years; t is the duration of precipitation in minutes; and a, b, c, and d are statistical adjustment parameters.
The NonLinearModelFit function was used for performing the non-linear adjustment of the IDF parameters. This function receives as input the disaggregated rain data and the distribution parameters and makes the numerical adjustment of the IDF parameters.
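The sketch below evaluates an IDF relation in Python once its parameters are known. It assumes the form commonly used in Brazilian studies, i = a·T^b / (t + c)^d, which is consistent with the variables listed for Equation 16 but is not reproduced from it; the parameter values are hypothetical and are not the fitted values of Table 5. The non-linear fitting step itself is not shown.

```python
def idf_intensity(a, b, c, d, return_period_years, duration_min):
    """Rainfall intensity (mm/h) from an assumed IDF form.

    i = a * T**b / (t + c)**d, with T in years and t in minutes.
    """
    return a * return_period_years ** b / (duration_min + c) ** d


# Hypothetical parameters, for illustration only.
a, b, c, d = 1000.0, 0.2, 10.0, 0.75
for T in (2, 10, 50, 100):
    row = [round(idf_intensity(a, b, c, d, T, t), 1) for t in (5, 30, 60, 1440)]
    print(T, row)
```

In this form the intensity grows with the return period through the exponent b and decays with duration through c and d, which is why a single parameter set can generate the whole family of curves for different return periods.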
The routine used to calculate the distribution parameters, the IDF parameters, and the adjustments can be observed in: https://github.com/ravellys/PAER.
As a method to interpret the performance of the distributions' adjustment, the performance index C was used. According to Camargo and Sentelhas (1997), performance is classified as excellent for C values higher than 0.85.
Table 3 shows the parameters of each distribution estimated by MM and ML, with the respective adherence tests and determination coefficients, for the Caruaru rainfall data set. The GUM distribution presents position and scale parameters close to those of the GEV distribution. These parameters were estimated according to MM and ML, respectively. In a study similar to this one, for the municipalities of Afogados da Ingazeira, Recife, Rio Formoso, Petrolina and Toritama, Alcântara et al. (2019a) also found this similarity. There is also a similarity between the parameters estimated for GUM, W2P, LN2P, GAM2P, FRE2P, and GEV according to MM and those of FRE3P estimated according to ML. On the other hand, the values obtained for PAR2P, W3P, and PAR3P show significant differences between the MM and ML estimates. It is also possible to observe that the value estimated by PAR3P is very different from the other two, according to MM and ML.
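The performance index C mentioned above can be sketched in Python as follows. The sketch assumes the definition usually attributed to Camargo and Sentelhas (1997), C = r × D, where r is the Pearson correlation coefficient and D is the Willmott index of agreement; the function names and the sample intensities are hypothetical.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between two series."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    dev_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed))
    dev_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    return cov / (dev_o * dev_p)


def willmott_d(observed, predicted):
    """Willmott index of agreement (1 means perfect agreement)."""
    mean_o = sum(observed) / len(observed)
    num = sum((p - o) ** 2 for o, p in zip(observed, predicted))
    den = sum((abs(p - mean_o) + abs(o - mean_o)) ** 2
              for o, p in zip(observed, predicted))
    return 1.0 - num / den


def performance_index_c(observed, predicted):
    """Performance index C = r * D (assumed definition)."""
    return pearson_r(observed, predicted) * willmott_d(observed, predicted)


# Hypothetical intensities (mm/h): disaggregated data vs. IDF estimates.
obs = [10.0, 20.0, 30.0, 40.0]
est = [12.0, 19.0, 33.0, 38.0]
c_index = performance_index_c(obs, est)
print(round(c_index, 3), "excellent" if c_index > 0.85 else "below excellent")
```

Because C multiplies a measure of precision (r) by a measure of accuracy (D), it only approaches 1 when the estimates are both well correlated with and close in magnitude to the reference values.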

RESULTS AND DISCUSSION
The results of the tests of adherence and determination coefficient (R²) were obtained and ordered in decreasing sequence, and are presented in Table 4.
In the analysis of the statistical tests, the distributions whose results were less than 0.05 (significance level of 5%) were considered unsatisfactory. In general, ML yielded more satisfactory distributions than MM, which was also observed by Back (2001) and Alves et al. (2013). For MM, the PAR2P and FRE3P distributions failed all adherence tests. According to ML, PAR2P was also unsatisfactory for all analyses, while the PAR3P distribution did not obtain good results in either case. W2P was another distribution that did not present a sound adjustment in the tests at the 5% significance level. This finding contrasts with the literature, where two-parameter distributions are often reported as excellent (Silva et al., 2012; Aragão et al., 2013; Finkler et al., 2015). Considering each adherence test individually, the AD test proved to be the most rigorous, failing five distributions: FRE2P, W3P, PAR2P, and FRE3P (by MM), and PAR2P (by ML). When comparing the rigor of the adherence tests, Beskow et al. (2015) and Franco et al. (2014) affirm the severity of the AD test compared to KS and χ². This may be due to the greater precision of the AD test in the upper and lower tails of the distribution (Naghettini and Pinto, 2007). In this perspective of rigor, Douka and Karacostas (2018) indicate that AD is more appropriate for assessing extreme precipitation events. The KS test was the least rigorous, as pointed out by Caldeira et al. (2015), failing only the PAR2P and FRE3P distributions (by MM) and PAR2P (by ML). The coefficient of determination R² showed a good or very good fit (> 0.90) for all distributions, except for PAR2P and FRE3P (by MM) and PAR2P (by ML); the last of these failed the ML adherence tests at all significance levels.
Almost as strict as the AD test, the χ² test rejected four distributions: W2P, PAR2P, and FRE3P (by MM), and PAR2P (by ML). Finkler et al. (2015) note that this test is rigorous in the interpretation of results and should be considered when choosing the most appropriate function for series of minimum flows.
Gumbel distribution has been used in several studies of extreme rainfall, showing a better adjustment to 60% of the data series of 100 pluviometric stations in the state of Santa Catarina (Back, 2001). However, based on the previously discussed arrangements, it was not satisfactory for the test in the present study, neither by MM nor ML. This result reinforces the need for evaluating different estimation methods. Other studies have also found better adherence to other distributions when compared to Gumbel, such as Weibull (Aragão et al., 2013) and GEV (Ben-Zvi, 2009).
As for MM, the GEV distribution showed the best results for the AD and KS tests, in addition to having the best R². The FRE2P distribution showed an excellent result for the χ² test, coming very close to unity, even though it failed the AD test. For ML, the FRE3P distribution, which failed every statistical test under MM, obtained the best results for the AD and KS tests. It also tied with the GUM distribution for second position in the χ² test and presented an R² very close to unity (0.990). Another highlight is the GEV distribution, which, as in the previous case, performed well in all tests, showing an R² tied with the FRE3P distribution. Some works found in the literature point to better adequacy of GEV compared to GUM (Alves et al., 2013; Franco et al., 2014; Nguyen and Nguyen, 2016). The LN2P distribution presented the best result for the χ² test and the R².
Regarding the Weibull distribution, using three parameters (W3P) instead of two (W2P) produced a slight improvement in the adherence tests and in R², except for the AD test by MM. Even with this improvement, the Weibull distribution was not as satisfactory. The sound adherence of the two-parameter Log-Normal distribution (LN2P) in this study may be related to the fact that the variable is positive and has an asymmetry coefficient greater than zero, which makes this distribution widely applied in studies of maximum precipitation (Naghettini and Pinto, 2007).
Given these results, it is possible to infer the distributions that best fit the data. For MM, these were GEV, LN2P, and GUM. For ML, the three best distributions were LN2P, GEV, and FRE3P. Figure 1 shows a comparison between them, where their theoretical curves are an interpolation of the empirical data. For MM, the three distributions show very similar behavior, with the GUM distribution deviating slightly from the pattern of the others for high-frequency events. This result was expected, considering that these distributions were the best classified by the adherence tests and R². Also, the similar results for the GEV and LN2P distributions are reflected in their almost overlapping curves.
The distribution of GEV also showed a proper adjustment by the ML. The same result can be seen in Blain (2013), who analyzed a pluviometric station in the city of Campinas, in the state of São Paulo from the year 1980 to 2012. It is noticeable that the three distributions are almost entirely overlapping, indicating the consistency of the excellent results obtained in the tests for them and their capability of representing the behavior of the rain data. In comparison with the theoretical frequency curves obtained by MM, the curves obtained by ML visually fit better with the empirical frequency data. This result was also expected because, according to the results of the two methods, the parameters obtained by ML did better in all tests and also had a better R² determination coefficient.
With the parameters of the theoretical frequency distributions estimated by both methods (MM and ML), it was possible to determine the IDF parameters that fit Equation 16, which represent one of the final objectives of this work. Table 5 presents the parameters 'a', 'b', 'c', and 'd' from MM and ML with their respective statistical criteria: R², RMSE, r, D, and C. For MM, parameters 'c' and 'd' presented identical values for all distributions. The 'b' parameter shows little variation between distributions. The 'a' parameter, however, showed the greatest disparity in values across the distributions. Focusing on the distributions that passed the adherence tests and obtained better results, namely GUM, GEV, and LN2P, the results of 'a' are relatively similar. The results for 'a' that are most discrepant from the others belong precisely to the distributions that did not fit the data by any test, FRE3P and PAR2P.
As for ML, it is worth noting that parameters 'c' and 'd' showed the same values as their counterparts estimated by MM. The values of parameter 'b' varied more than those of its counterpart estimated by MM. The 'a' parameters estimated by ML are similar to those estimated by MM for their respective distributions. Exceptions are the PAR2P and FRE3P distributions, which did not pass any adherence tests when their parameters were estimated by MM, even though the FRE3P distribution was well adjusted when its parameters were estimated by ML.
The parameters estimated by both MM and ML achieved optimum performance for all distributions, showing that the IDF equation can describe the theoretical behavior of all distributions, even those that failed the adherence tests. In a similar study for the capitals of the northeastern Brazilian states, Silva and Oliveira (2017) obtained very satisfactory results for the GUM distribution. For all cities, it had an excellent performance, with C ranging from 0.95 to 0.99 and R² of 0.99 in all capitals. In addition, Santos et al. (2009) carried out similar work for the state of Mato Grosso, obtaining an average R² determination coefficient of 0.98. Figure 2 shows the IDF curves for the return times of 2, 10, 50, and 100 years for the distributions that adhered to the empirical data for MM (Figure 2A) and ML (Figure 2B). The GEV distribution by MM (Figure 2A) obtained the best result in the AD test. Hence, its curve is better able to describe extreme events, even though, in comparison with the other curves, it predicts less intense rains for the same duration at the return times of 2 and 10 years. For return times of 50 and 100 years, its curve was in the middle, while the PAR3P distribution overestimated the rain intensity for these return times.
In the case of the curves developed from ML (Figure 2B), the IDF curves of FRE3P and GEV overlap throughout, making it impossible to separate them visually. This outcome was expected since both obtained surprisingly similar results in the adherence tests. Both were also tied with the best performance in the AD test, so they are recommended to predict extreme events. Their curves occupy an intermediate position among the others. It is interesting to notice that the FRE2P distribution curve (ML) estimates the lowest intensities for the 2-year return time. However, as the return time increases, its expected intensities become much higher than the others, reaching estimates close to 200 mm/h for a 100-year return time (ML).

CONCLUSIONS
Both estimation methods were able to adjust the parameters of the theoretical distributions to the annual maximum daily precipitation data. However, ML was more effective than MM. Distribution parameters estimated by ML fared better than those estimated by MM in all cases, except for FRE2P, which did not adjust in any of the cases. The three distributions that stood out for MM were GEV, LN2P, and GUM. For ML, the three best distributions according to the statistical tests and validated by the R² were LN2P, GEV, and FRE3P. The GEV distribution also showed a reasonable adjustment by MM.
The GEV distribution by MM obtained the best result in the AD test, so its curve is the best able to describe extreme events. In the case of curves made from ML, the IDF curves of FRE3P and GEV obtained surprisingly similar results in the adherence tests. Both were also tied with the best performance in the AD test, so they are recommended to predict extreme events. All IDF equations had optimal adjustments, in addition to R² values that indicate a fit of excellent quality. It was also found that it is possible to use the GUM-MM distribution to predict precipitation events with satisfactory accuracy within the parameters used in this work.