STANDARDIZED PRECIPITATION INDEX BASED ON PEARSON TYPE III DISTRIBUTION

The initial step in calculating the Standardized Precipitation Index (SPI) is to determine a probability density function (pdf) that describes the precipitation series under analysis. Once this pdf is determined, the cumulative probability of an observed precipitation amount is computed. The inverse normal function is then applied to the cumulative probability. The result is the SPI. This article assessed the changes in SPI final values when computed based on the two-parameter Gamma (Gam) and Pearson Type III (PE3) distributions (SPIGam and SPIPE3, respectively). Monthly rainfall series available from five weather stations of the State of São Paulo were chosen for this study. Considering quantitative and qualitative assessments of goodness-of-fit (evaluated at 1-, 3-, and 6-month precipitation totals), the PE3 distribution seems to be a better choice than the Gam distribution in describing the long-term rainfall series of the State of São Paulo. In addition, the number of SPI time series that could be seen as normally distributed was higher when this drought index was computed from the PE3 distribution. Thus, the use of the Pearson Type III distribution within the calculation algorithm of the SPI is recommended in the State of São Paulo.


INTRODUCTION
The Standardized Precipitation Index (SPI) was developed by McKee et al. (1993, 1995) as a drought indicator that standardizes rainfall deficits and excesses on a temporal and regional basis. As described by Keyantash and Dracup (2002), the SPI represents observed rainfall as a standardized departure with respect to a rainfall probability distribution function.
Since McKee et al. (1993), the SPI has been used by several authors in climate variability studies. Hayes et al. (1999) applied the SPI algorithm to describe drought conditions in the State of Texas, USA. Zhai et al. (2009) calculated this …
Thus, the aim of this study was to evaluate the changes in SPI final values when computed based on the two-parameter Gamma and the Pearson Type III distributions in the State of São Paulo, Brazil.

MATERIAL AND METHODS
Monthly rainfall series available from five weather stations (Campinas, 1890 to 2008; Pindorama, 1951 to 2008; Presidente Prudente, 1960 to 2008; Ribeirão Preto, 1937 to 2008; and Ubatuba, 1935 to 2008) were chosen for this study. These five stations represent climatically dissimilar areas of the State of São Paulo, ranging from the coast (Ubatuba), where there is no dry season, to the western region of the State (Presidente Prudente), where there is a marked dry season during the winter months (Figure 1).
The pdf of the Gam distribution is defined by:

$$ gam(x) = \frac{x^{\alpha-1}\, e^{-x/\beta}}{\beta^{\alpha}\,\Gamma(\alpha)}, \qquad x, \alpha, \beta > 0 \qquad (1) $$

Since Equation 1 is undefined at x = 0 (no rainfall event), the final cumulative probability becomes:

$$ H_g(x) = q_g + (1 - q_g)\, Gam(x) \qquad (2) $$

where q_g is the empirical probability of x = 0 and Gam(x) is the cumulative distribution function associated with Equation 1. The empirical factor q_g can be estimated as the ratio between the number of zeros (n) and the number of data in the sample (N), that is, q_g = n/N. The two parameters of the Gam distribution are α, the shape parameter, and β, the scale parameter. The quantity Γ(α) is the gamma function (Wilks, 2006). Following Thom (1966), Blain (2005), and Wilks (2006), α and β were estimated by the maximum likelihood method.
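Equations 1 and 2 can be sketched in Python as follows, assuming NumPy and SciPy are available. This is an illustration, not the study's code: it uses SciPy's numerical maximum likelihood fit (`scipy.stats.gamma.fit` with the location fixed at zero) instead of Thom's (1966) closed-form approximation, fits only the non-zero totals, and estimates q_g as n/N. The function name `gamma_mixed_cdf` and the rainfall values are hypothetical.

```python
import numpy as np
from scipy import stats


def gamma_mixed_cdf(x_obs, x_eval):
    """Hypothetical helper: H_g(x) = q_g + (1 - q_g) * Gam(x)  (Equation 2).

    x_obs:  precipitation totals of one calendar month (zeros allowed)
    x_eval: amounts at which the cumulative probability is wanted
    """
    x_obs = np.asarray(x_obs, dtype=float)
    nonzero = x_obs[x_obs > 0]
    q_g = np.count_nonzero(x_obs == 0) / x_obs.size   # q_g = n / N
    # Maximum likelihood fit of shape (alpha) and scale (beta); floc=0 keeps
    # the two-parameter form of Equation 1.
    alpha, _, beta = stats.gamma.fit(nonzero, floc=0)
    gam_cdf = stats.gamma.cdf(np.asarray(x_eval, dtype=float), alpha, scale=beta)
    return q_g + (1.0 - q_g) * gam_cdf, alpha, beta, q_g


# Illustrative totals (mm), not data from the study: two zeros out of ten.
totals = [0.0, 12.5, 30.1, 0.0, 55.0, 8.2, 41.7, 19.9, 3.4, 27.3]
h, alpha, beta, q_g = gamma_mixed_cdf(totals, [0.0, 20.0, 60.0])
print(q_g, h)
```

Because Gam(0) = 0, a zero-rainfall month receives H_g = q_g, which is why the size of q_g matters for the driest SPI values.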
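The PE3 distribution has a three-parameter pdf. Following Hosking and Wallis (1997), for γ > 0 it is described by:

$$ \alpha = \frac{4}{\gamma^{2}}, \qquad \beta = \frac{1}{2}\,\sigma\,|\gamma|, \qquad \xi = \mu - \frac{2\sigma}{\gamma} \qquad (3) $$

$$ pe(x) = \frac{(x-\xi)^{\alpha-1}\, e^{-(x-\xi)/\beta}}{\beta^{\alpha}\,\Gamma(\alpha)}, \qquad \xi < x < \infty \qquad (4) $$

where μ, σ, and γ are, respectively, the location (mean value of the series), scale, and shape parameters; α and β in Equations 3 and 4 are auxiliary quantities derived from σ and γ, not the Gam parameters of Equation 1.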
According to Hosking and Wallis (1997), the quantile function of the PE3 distribution has no explicit analytical form. Since Equation 4 is undefined for x < μ - 2σ/γ, the cumulative probability becomes:

$$ H_p(x) = q_p + (1 - q_p)\, PE3(x) \qquad (5) $$

$$ PE3(x) = \frac{G\left(\alpha, \frac{x-\xi}{\beta}\right)}{\Gamma(\alpha)} \qquad (6) $$

where q_p is the empirical probability of values x < μ - 2σ/γ, and G(α, z) = ∫₀^z t^(α-1) e^(-t) dt is the (lower) incomplete gamma function. As pointed out by Hosking and Wallis (1997), if γ equals zero, the distribution is Normal and the range of x becomes -∞ < x < ∞. Following Hosking and Wallis (1997) and Guttman (1999), the parameters were estimated by the method of L-moments. According to Hosking (1990), L-moment fitting tends to be preferred for small samples; for moderate and large sample sizes, the two estimation methods (L-moments and maximum likelihood) usually give similar results. In this context, Guttman (1999) recommends at least 50 years of rainfall data for SPI calculations.
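The L-moment fit and Equations 3 to 6 can be sketched as below, assuming NumPy and SciPy. Assumptions to keep in mind: the sample L-moments are the usual unbiased estimators, α is obtained from the L-skewness τ3 with the rational approximation reported by Hosking and Wallis (1997), the fit uses the whole sample including zeros (the text does not say), and PE3(x) is evaluated with `scipy.special.gammainc`, which is the regularized form G(α, z)/Γ(α). All function names are hypothetical.

```python
import math

import numpy as np
from scipy import special, stats


def sample_lmoments(x):
    """Unbiased sample L-moments: l1, l2 and L-skewness t3 = l3 / l2."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    j = np.arange(1, n + 1)
    b0 = x.mean()
    b1 = np.sum((j - 1) / (n - 1) * x) / n
    b2 = np.sum((j - 1) * (j - 2) / ((n - 1) * (n - 2)) * x) / n
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    return b0, l2, l3 / l2


def fit_pe3_lmoments(x):
    """(mu, sigma, gamma) using a rational approximation for alpha(t3)."""
    l1, l2, t3 = sample_lmoments(x)
    t = abs(t3)
    if t < 1e-6:                       # gamma ~ 0: Normal limit, sigma = l2 * sqrt(pi)
        return l1, l2 * math.sqrt(math.pi), 0.0
    if t >= 1.0 / 3.0:
        z = 1.0 - t
        alpha = (0.36067 * z - 0.59567 * z**2 + 0.25361 * z**3) / (
            1.0 - 2.78861 * z + 2.56096 * z**2 - 0.77045 * z**3)
    else:
        z = 3.0 * math.pi * t**2
        alpha = (1.0 + 0.2906 * z) / (z + 0.1882 * z**2 + 0.0442 * z**3)
    sigma = l2 * math.sqrt(math.pi * alpha) * math.exp(
        special.gammaln(alpha) - special.gammaln(alpha + 0.5))
    return l1, sigma, math.copysign(2.0 / math.sqrt(alpha), t3)


def pe3_cdf(x, mu, sigma, gamma):
    """PE3 cdf (Equation 6) through the regularized lower incomplete gamma."""
    x = np.asarray(x, dtype=float)
    if gamma == 0.0:
        return stats.norm.cdf(x, loc=mu, scale=sigma)
    alpha = 4.0 / gamma**2             # Equation 3
    beta = 0.5 * sigma * abs(gamma)
    xi = mu - 2.0 * sigma / gamma
    if gamma > 0:
        return special.gammainc(alpha, np.clip((x - xi) / beta, 0.0, None))
    return 1.0 - special.gammainc(alpha, np.clip((xi - x) / beta, 0.0, None))


def pe3_mixed_cdf(x_obs, x_eval):
    """Hypothetical helper for Equation 5: H_p(x) = q_p + (1 - q_p) * PE3(x)."""
    x_obs = np.asarray(x_obs, dtype=float)
    mu, sigma, gamma = fit_pe3_lmoments(x_obs)
    xi = mu - 2.0 * sigma / gamma if gamma > 0 else -np.inf
    q_p = np.mean(x_obs < xi)          # empirical probability of x < xi
    return q_p + (1.0 - q_p) * pe3_cdf(x_eval, mu, sigma, gamma), q_p


rain = [0.0, 0.0, 1.0, 2.0, 3.0, 5.0, 8.0, 13.0, 21.0, 34.0]   # illustrative only
h_p, q_p = pe3_mixed_cdf(rain, [0.0, 10.0, 40.0])
print(q_p, h_p)
```

Unlike the Gam case, the lower bound ξ is estimated from the data, so q_p can be zero even when the sample contains zero-rainfall months.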
Although several authors, such as Thom (1966), Blain (2005), Blain et al. (2007), Blain et al. (2009), and Blain (2009), have already assessed the robustness of the Gam distribution in describing monthly precipitation series of the State of São Paulo, the use of the PE3 distribution in describing these series has received little attention. Since the SPI is a probability-based index, assessing the robustness of the PE3 distribution in describing precipitation series may be seen as the first (and perhaps the most important) step in evaluating the possibility of adopting this three-parameter distribution within the SPI calculation algorithm.

Two frequently used goodness-of-fit tests are the Kolmogorov-Smirnov (KS) test and the chi-squared (χ²) test. According to Wilks (2006), the χ² test operates more naturally for discrete random variables, since the range of the data must be divided into discrete classes to implement it. For continuous distributions, the KS test, which compares the empirical and theoretical cdfs, is more powerful than the χ² test. As pointed out by Panofsky and Brier (1968), if the parametric (theoretical) distribution differs greatly from the empirical distribution, the KS and/or χ² values will be large. The null hypothesis (H0) associated with both tests states that the empirical distribution does not differ significantly from the parametric distribution under analysis. Because the samples tested in this work are the same samples used to estimate the distribution parameters, the KS test in this setting is known as the Lilliefors test. The critical KS/Lilliefors value (hereafter referred to simply as KS) for rejecting or accepting H0 depends on the sample size, on the shape of the parametric distribution and, of course, on the significance level chosen by the user (Crutcher, 1975). The confidence in accepting or rejecting the theoretical distributions was measured by the p-value (p). The KS and χ² tests were applied to evaluate the goodness-of-fit of the Gam and PE3 distributions to the observed rainfall series. A p-value smaller than 0.05 was taken as evidence that H0 should be rejected. It is also worth mentioning that the critical χ² value depends, among other things, on the number of classes into which each monthly series was divided.
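Both tests can be sketched for the Gam case as below, assuming NumPy and SciPy and a sample with zeros already removed. Because α and β are estimated from the tested sample, the p-value that `scipy.stats.kstest` reports would be too lenient; here a Lilliefors-type p-value is approximated by parametric bootstrap (simulate from the fitted distribution, refit, recompute D). That bootstrap is a swapped-in technique, not necessarily the tables used in the study. The χ² test uses k equiprobable classes of the fitted distribution, with k = 6 chosen arbitrarily. Function names are hypothetical.

```python
import numpy as np
from scipy import stats


def lilliefors_gamma(x, n_boot=200, seed=0):
    """KS statistic D against a fitted gamma, with a bootstrap p-value."""
    x = np.asarray(x, dtype=float)
    a, _, scale = stats.gamma.fit(x, floc=0)
    d_obs = stats.kstest(x, stats.gamma(a, scale=scale).cdf).statistic
    rng = np.random.default_rng(seed)
    exceed = 0
    for _ in range(n_boot):
        # Simulate under H0, re-estimate the parameters, and recompute D
        sim = stats.gamma.rvs(a, scale=scale, size=x.size, random_state=rng)
        a_s, _, s_s = stats.gamma.fit(sim, floc=0)
        d_sim = stats.kstest(sim, stats.gamma(a_s, scale=s_s).cdf).statistic
        exceed += int(d_sim >= d_obs)
    return d_obs, (exceed + 1) / (n_boot + 1)


def chi2_gamma(x, k=6):
    """Chi-squared test with k equiprobable classes of the fitted gamma."""
    x = np.asarray(x, dtype=float)
    a, _, scale = stats.gamma.fit(x, floc=0)
    inner_edges = stats.gamma.ppf(np.arange(1, k) / k, a, scale=scale)
    counts = np.bincount(np.searchsorted(inner_edges, x), minlength=k)
    expected = np.full(k, x.size / k)
    # ddof=2: two parameters were estimated from the same sample
    return stats.chisquare(counts, expected, ddof=2)
```

Changing k changes the χ² degrees of freedom and the critical value, which is the dependence on the number of classes noted above.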
Graphical comparisons of the rainfall data and the fitted PE3 distributions were also carried out using probability-probability (pp) plots. The pp plots show the empirical cumulative probability as a function of the fitted parametric cdf (Wilks, 2006). The calculation algorithms for the KS test, the χ² test, and the pp plots can be found in Wilks (2006).
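The coordinates of a pp plot can be computed as in the sketch below, assuming NumPy and SciPy. The Weibull plotting position i/(n + 1) used for the empirical probabilities is an assumption; Wilks (2006) discusses several alternatives. The function names are hypothetical, and `cdf` can be any fitted cumulative distribution function.

```python
import numpy as np
from scipy import stats


def pp_points(x, cdf):
    """Pairs (empirical, fitted) cumulative probabilities for a pp plot."""
    x = np.sort(np.asarray(x, dtype=float))
    empirical = np.arange(1, x.size + 1) / (x.size + 1)   # Weibull plotting position
    fitted = cdf(x)
    return empirical, fitted


def max_departure_from_1to1(x, cdf):
    """Largest vertical distance between the pp points and the 1:1 line."""
    empirical, fitted = pp_points(x, cdf)
    return float(np.max(np.abs(fitted - empirical)))


# Usage: a fitted distribution close to the data keeps the points near 1:1.
data = [0.75, 0.25, 0.5]
print(pp_points(data, stats.uniform.cdf))
```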
The SPI values were estimated based on the PE3 and Gam distributions for three time scales (1, 3, and 6 months). The time scale associated with an SPI value is labelled by the month at the end of the analysis period. For instance, the 3-month SPI value observed at the end of May is related to the precipitation total observed during March, April, and May. From both probability distributions (PE3 and Gam), the cumulative probability of an observed precipitation amount was computed. The inverse normal (Gaussian) function, with mean zero and variance one, was then applied to the cumulative probability [Hp(x) and Hg(x), respectively]. The result is the SPI. The SPI drought categories defined by McKee et al. (1993) are shown in Table 1.
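The time-scale convention and the final transformation can be sketched as follows, assuming NumPy and SciPy. Totals are labelled by the last month of the window, as in the May example above; in practice a distribution would be fitted separately to each calendar month's k-month totals before the inverse normal is applied. Function names and values are hypothetical.

```python
import numpy as np
from scipy import stats


def rolling_totals(monthly, k):
    """k-month totals labelled by the last month (NaN for the first k - 1 months)."""
    monthly = np.asarray(monthly, dtype=float)
    out = np.full(monthly.size, np.nan)
    out[k - 1:] = np.convolve(monthly, np.ones(k), mode="valid")
    return out


def spi_from_probability(h):
    """SPI = inverse standard normal (mean 0, variance 1) of H(x)."""
    return stats.norm.ppf(np.asarray(h, dtype=float))


# January..June (mm, illustrative); the 3-month total at May is March + April + May.
monthly = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
print(rolling_totals(monthly, 3)[4])   # 120.0
print(spi_from_probability([0.5, 0.0228]))
```

An H of exactly 0 or 1 maps to an infinite SPI, which is one reason the treatment of zero rainfall (q_g, q_p) matters.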
As pointed out by Hayes et al. (1999), because SPI values fit a normal distribution, one can expect these values to be within one standard deviation 68% of the time, within two standard deviations 95% of the time, and within three standard deviations 99% of the time. A related interpretation would be that an SPI value of less than -1.0 occurs 16 times in 100 years, an SPI of less than -2.0 occurs two to three times in 100 years, and an SPI of less than -3.0 occurs once in approximately 200 years. Thus, as also pointed out by Hayes et al. (1999), because of this characteristic of the standard normal distribution (Equations 7 and 8), the SPI by itself cannot identify regions (or seasons) that may be more "drought prone" than others.

Table 1 - Standardized Precipitation Index (SPI) values and the associated drought categories (MCKEE et al., 1993)

SPI values        Drought category
0 to -0.99        mild drought
-1.00 to -1.49    moderate drought
-1.50 to -1.99    severe drought
≤ -2.00           extreme drought
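Under the standard normal assumption, each dry category of Table 1 has a fixed expected frequency whatever the station, which is why the SPI alone cannot single out drought-prone regions. The sketch below, assuming SciPy, classifies an SPI value with the Table 1 limits and computes those expected frequencies. The interval convention (each class includes its less negative limit, so -1.00 is moderate and -2.00 is extreme) is read from Table 1; the names are hypothetical.

```python
from scipy import stats

# (category, lower limit, upper limit) for the dry classes of Table 1;
# a value belongs to a class when lower < SPI <= upper.
CATEGORIES = [
    ("mild drought", -1.0, 0.0),
    ("moderate drought", -1.5, -1.0),
    ("severe drought", -2.0, -1.5),
    ("extreme drought", float("-inf"), -2.0),
]


def drought_category(spi):
    """Dry category of Table 1 (McKee et al., 1993); None when SPI > 0."""
    for name, lower, upper in CATEGORIES:
        if lower < spi <= upper:
            return name
    return None


def expected_frequencies():
    """Probability of each dry category if the SPI is standard normal."""
    return {name: stats.norm.cdf(upper) - stats.norm.cdf(lower)
            for name, lower, upper in CATEGORIES}


print(drought_category(-1.85), drought_category(-2.16))
print(expected_frequencies())
```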
Regression analyses were used to compare SPIPE3 and SPIGam final values. Since the SPI values are (at least approximately) normally distributed, the regression coefficients were obtained by linear least-squares fitting. The significance of the coefficient of determination (R²) was assessed with the F test.
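The regression step can be sketched as below, assuming NumPy and SciPy and taking SPIGam as the predictor and SPIPE3 as the response (the text does not state which is which). For a simple linear regression, the F statistic for R² is F = (n - 2) R² / (1 - R²), with 1 and n - 2 degrees of freedom. The function name is hypothetical.

```python
import numpy as np
from scipy import stats


def regress_spi(spi_gam, spi_pe3):
    """Least-squares line SPIPE3 = a + b * SPIGam, with an F test on R^2."""
    x = np.asarray(spi_gam, dtype=float)
    y = np.asarray(spi_pe3, dtype=float)
    n = x.size
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (intercept + slope * x)
    r2 = 1.0 - np.sum(residuals**2) / np.sum((y - y.mean())**2)
    f_stat = (n - 2) * r2 / (1.0 - r2)        # F with (1, n - 2) degrees of freedom
    p_value = stats.f.sf(f_stat, 1, n - 2)
    return slope, intercept, r2, p_value
```

A slope near 1 and an intercept near 0 indicate that the two SPI versions agree on average; extreme values can still differ, as discussed in the results.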
According to Wu et al. (2007), an SPI distribution can be considered non-normal when its associated statistics meet three criteria simultaneously: i) the Shapiro-Wilk statistic (W) is less than 0.960; ii) the p-value is less than 0.100; and iii) the absolute value of the median is greater than 0.05. Otherwise, the SPI distribution is considered normal. The W statistic is the ratio of the best estimator of the variance, based on the square of a linear combination of the order statistics, to the usual corrected sum-of-squares estimator of the variance. The p-value is the probability associated with W (Wu et al., 2007). Since being standardized on a regional and temporal basis is an important feature of the SPI algorithm, this normality test was applied to both the SPIPE3 and SPIGam series.
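The three criteria translate directly into code. This sketch assumes `scipy.stats.shapiro` as the Shapiro-Wilk implementation (the software used in the study is not stated) and drops non-finite SPI values before testing; the function name is hypothetical.

```python
import numpy as np
from scipy import stats


def is_non_normal_wu(spi):
    """Wu et al. (2007) rule: non-normal only if all three criteria hold."""
    spi = np.asarray(spi, dtype=float)
    spi = spi[np.isfinite(spi)]
    w, p = stats.shapiro(spi)
    return bool(w < 0.960 and p < 0.100 and abs(np.median(spi)) > 0.05)
```

Requiring all three conditions makes the rule conservative: a series with a low W but a median close to zero is still treated as normal.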
It is worth mentioning that the presence of non-random components within a time series (such as trends and/or serial correlation) may affect the stability of the underlying statistics of the data sample. In this view, the Mann-Kendall (MK) trend test (Kendall and Stuart, 1967; Sneyers, 1975) is frequently used to detect trends in time series data (Hamed and Rao, 1998). A positive MK value indicates an increasing trend, and a negative MK value indicates a decreasing trend. If the MK value falls within [-1.96, 1.96], the trend is taken as nonsignificant (H0 cannot be rejected at the 5% significance level). The runs test (Z) for randomness was also applied to the monthly precipitation series. If -1.96 < Z < 1.96, the hypothesis that the sequence is free of serial correlation (H0) cannot be rejected at the 5% significance level.
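Both screening statistics can be sketched with NumPy alone. Assumptions: the MK variance omits the tie correction (a simplification that may matter for series with many zero months), and the runs test counts runs above and below the median (Wald-Wolfowitz form), discarding values equal to the median. The study does not specify which variants were used; the function names are hypothetical.

```python
import numpy as np


def mann_kendall_z(x):
    """Mann-Kendall Z statistic (variance without tie correction)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    i, j = np.triu_indices(n, k=1)            # all pairs with j > i
    s = np.sign(x[j] - x[i]).sum()
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        return (s - 1) / np.sqrt(var_s)       # continuity correction
    if s < 0:
        return (s + 1) / np.sqrt(var_s)
    return 0.0


def runs_test_z(x):
    """Runs test about the median; values equal to the median are discarded."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    above = x[x != med] > med
    n1 = np.count_nonzero(above)
    n2 = above.size - n1
    n = n1 + n2
    runs = 1 + np.count_nonzero(above[1:] != above[:-1])
    mean_r = 2.0 * n1 * n2 / n + 1.0
    var_r = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n**2 * (n - 1))
    return (runs - mean_r) / np.sqrt(var_r)
```

A steadily increasing series gives a large positive MK value; the same series has only two runs about its median, so its runs-test Z is strongly negative, flagging dependence.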

The monthly time scale
As indicated by the runs test, all monthly precipitation series can be seen as free of significant serial correlation. H0 of the MK test was rejected only for the monthly series of May (Ribeirão Preto; MK = 2.05) and August (Ubatuba; MK = -2.32). All other monthly series can be seen as free of trends (at the 5% significance level). Inspection of Equations 7 and 8 indicates that if Hg differs significantly from Hp, the SPIGam values may also differ significantly from the SPIPE3 values. Thus, as pointed out by Guttman (1999), standardizing the calculation algorithm of this drought index is necessary to provide all users with a common basis for both spatial and temporal comparison of SPI values.
According to Equations 1 and 4, while the Gam distribution has a fixed lower bound (x = 0), the PE3 distribution bound depends on the values of its own parameters (x < μ - 2σ/γ). Tables 3 and 4 show, for each monthly series, the median, the standard deviation, and the number of cases in which the parametric PE3 and/or Gam distributions could not be defined [x = 0 for gam(x); x < μ - 2σ/γ for pe(x)].
Table 3 indicates that gam(x) could not be defined for a higher number of (zero) rainfall events than pe(x). Consequently, the empirical factor has a smaller contribution (weight) to SPIPE3 values (through q_p) than to SPIGam values (through q_g). Table 4 indicates the opposite feature.
According to the KS test (Tables 5 and 6), the monthly rainfall series are, in general, consistent with the proposition that they were drawn from a population with PE3 and/or Gam distributions. The χ² test (not shown) led to the same conclusion. However, for the monthly series of Ribeirão Preto, the PE3 seems to be a better choice than the Gam distribution, since H0 was rejected for the Gam distribution in June.
Figures 2 to 4 show the pp plots for the weather stations of Campinas, Presidente Prudente, and Ubatuba for the months of January, April, July, and October. These four months represent, respectively, the summer, autumn, winter, and spring seasons.
Figures 2 to 4 indicate that the fitted PE3 distribution corresponds well to the data through most of its range, since the parametric cumulative probability, evaluated at each data value, is quite close to the empirical cumulative probability. The pp points lie very close to the 1:1 line. The same feature (not shown) is also observed for the other monthly series and the other weather stations. Considering the quantitative (KS and χ² tests) and the qualitative (pp plots) assessments of the goodness-of-fit, the PE3 distribution is adequate for SPI calculations in the State of São Paulo, Brazil.
Figures 5 and 6 show the regression analysis between the SPIPE3 and SPIGam time series. The regression was fitted to all SPI values at each weather station.
The linear regression models (coefficient of determination, R², slope, and intercept; Figures 5 and 6) show no significant … All intercept values are also close to zero. However, for a few extreme negative SPI values, there are remarkable differences between SPIPE3 and SPIGam values.
For the weather station of Campinas, the largest difference between SPIPE3 and SPIGam occurred in March 1908 (SPIPE3 = -2.5; SPIGam = -3.6). However, both SPI values indicate the same drought category (extreme drought; Table 1). In terms of drought categories, the largest differences between SPIPE3 and SPIGam occurred when no rainfall was recorded in June (1929, 1936, 1963, 1966, 1979, 1984, 1986, and 2002) and September (1986 and 2002). In both cases, while the SPIPE3 indicated severe drought conditions, the SPIGam indicated extreme drought conditions.
For the weather station of Pindorama, while the SPIPE3 indicated severe drought conditions in October 1984 (24 mm) and October 1985 (18 mm), the SPIGam indicated extreme drought conditions. The opposite occurred in April 1952 and April 1953, when no precipitation was observed: the SPIPE3 indicated extreme drought conditions, whereas the SPIGam indicated severe drought conditions.
For the weather station of Presidente Prudente, the SPIPE3 indicated extreme drought conditions in May 1967 and May 1981, when no precipitation was observed. For these same zero-precipitation events, the SPIGam indicated severe drought conditions. The precipitation observed in November 1970 was 32 mm; for this rainfall amount, the SPIPE3 and SPIGam values were, respectively, -1.85 (severe drought) and -2.16 (extreme drought).
For the weather station of Ubatuba (Figure 6), while the SPIPE3 indicated severe drought conditions in April 1963 (27 mm) and August 1988 (8 mm), the SPIGam indicated extreme drought conditions. For the weather station of Ribeirão Preto, in October 1944 (26.4 mm) and February 2005 (50.2 mm), while the SPIPE3 indicated severe drought conditions, the SPIGam indicated extreme drought conditions.
Figures 7 and 8 show the frequency distributions of the dry SPI categories (Table 1) for the five weather stations. The limitation of the SPI in identifying regions that may be more drought prone than others becomes evident, since the frequencies of occurrence of dry events were approximately the same among the five weather stations. Also according to Figures 7 and 8, the frequency of SPIPE3 extreme drought events tends to be lower than that of SPIGam extreme drought events. For the station of Ribeirão Preto, the difference between the frequencies of SPIPE3 and SPIGam mild drought events exceeds 5%.
As indicated by Tables 7 and 8, almost all non-normal SPI distributions occur during the winter season. Further studies should investigate the reasons for this non-normality, which seems to be related to the high probability of zero rainfall during the dry season in the State of São Paulo. In this view, following Wu et al. (2007), it is worth mentioning that Equation 7 computes both negative and positive SPI values. Consequently, to have balanced negative and positive values within an SPI time series, t (Equation 8) must be distributed symmetrically under the two situations (i) 0 < Hg(x) or Hp(x) ≤ 0.5 and (ii) 0.5 < Hg(x) or Hp(x) ≤ 1.
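Equations 7 and 8 are referred to here but not reproduced in this section. Assuming they are the rational approximation of the inverse standard normal commonly used in SPI programs (Abramowitz and Stegun, formula 26.2.23, absolute error below about 4.5e-4), the sketch shows how t is computed from H for H ≤ 0.5 and from 1 - H otherwise, and how that choice sets the sign of the SPI. The constants are those of that approximation; the function name is hypothetical.

```python
import math

# Constants of the Abramowitz and Stegun (26.2.23) rational approximation
C0, C1, C2 = 2.515517, 0.802853, 0.010328
D1, D2, D3 = 1.432788, 0.189269, 0.001308


def spi_approx(h):
    """Approximate inverse standard normal of a cumulative probability h."""
    if not 0.0 < h < 1.0:
        raise ValueError("h must lie strictly between 0 and 1")
    if h <= 0.5:                       # situation (i): negative SPI
        t = math.sqrt(math.log(1.0 / h**2))
        sign = -1.0
    else:                              # situation (ii): positive SPI
        t = math.sqrt(math.log(1.0 / (1.0 - h) ** 2))
        sign = 1.0
    return sign * (t - (C0 + C1 * t + C2 * t**2) / (1.0 + D1 * t + D2 * t**2 + D3 * t**3))
```

Because t depends only on the distance of H from the nearer tail, spi_approx(h) equals -spi_approx(1 - h); a balanced SPI series therefore needs H values spread symmetrically about 0.5.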
Comparing the number of zero-precipitation events in Tables 3 and 4 with the values in Tables 7 and 8, it becomes evident that in the locations (and months) with a high probability of zero rainfall (Pindorama, Presidente Prudente, and Ribeirão Preto), the lowest possible value of Hg(x) and/or Hp(x) is far from zero. Thus, the high number of zeros observed during the winter season unbalanced Equation 8, and as a result the SPI time series were non-normally distributed (Table 7). In contrast, for the weather station of Ubatuba, where all monthly precipitation totals are greater than zero, the departures of Equation 8 were symmetrically distributed; thus, the SPI was normally distributed in all months (Table 8).
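The effect of zeros can be made concrete. With Equation 2, every zero-rainfall month receives the same cumulative probability q, so the lowest SPI attainable in that month is the inverse normal of q, and all zero months pile up at that single value. This is a small sketch, assuming zeros are assigned H = q exactly as in the mixed distribution (some SPI implementations treat zeros differently) and using illustrative data rather than station records; the function name is hypothetical.

```python
import numpy as np
from scipy import stats


def lowest_spi(monthly_totals):
    """Lowest attainable SPI if zero-rainfall months receive H = q (q = n / N)."""
    x = np.asarray(monthly_totals, dtype=float)
    q = np.count_nonzero(x == 0) / x.size
    # Every zero month maps to this same value; with no zeros there is no floor.
    return stats.norm.ppf(q) if q > 0 else -np.inf


# 30% zeros: no month can fall below about -0.52, i.e. beyond "mild drought".
print(lowest_spi([0, 0, 0, 5, 10, 20, 30, 40, 50, 60]))
```

The negative side of such a series is both truncated and concentrated at one value, which is consistent with the winter non-normality described above.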
Whatever the reasons, the number of non-normal SPIPE3 distributions (3) was lower than the number of non-normal SPIGam distributions (9). In this view, once again, the PE3 distribution seems to be a better choice than the Gam distribution for developing a drought indicator that standardizes rainfall deficits and excesses on a temporal and regional basis.

Other time scales (3 and 6 months)
The SPI was designed as a drought index that recognizes the importance of time scales in the analysis of water availability and water use. Thus, it is necessary to assess the robustness of the Gam and PE3 distributions in describing precipitation series at time scales longer than one month.
For the 3-month time scale (Table 9), H0 associated with the two-parameter gamma distribution was rejected for the periods ending in July and August (Campinas), September (Ribeirão Preto), and January and December (Ubatuba). For the 6-month time scale, H0 associated with the Gam distribution (Table 9) was rejected for the periods ending in May (Campinas), May and October (Pindorama), April and November (Ribeirão Preto), and May and August (Ubatuba). Also according to Table 9, H0 associated with the Pearson Type III distribution could not be rejected for any period at either time scale. Furthermore, in most cases, the p-value associated with the PE3 distribution was numerically greater than the p-value associated with the Gam distribution. Thus, the p-values in Table 9 indicate that the Gam distribution fits the precipitation data less well than the PE3 does. As was the case for the monthly time scale, the use of the PE3 distribution is once again recommended.
For both the 3- and 6-month time scales, all SPIPE3 series can be seen as normally distributed. At Ribeirão Preto, the 3- and 6-month SPIGam series calculated for the period ending in June are non-normally distributed.

CONCLUSIONS
Since the SPI was developed as a standardized (on a regional and temporal basis) drought index, the detected differences between SPIPE3 and SPIGam final values indicate the need to adopt a single, common distribution model in calculating the SPI in the State of São Paulo, Brazil. Considering the applied goodness-of-fit tests, the Pearson Type III distribution seems to be a better choice than the Gamma distribution in describing the long-term rainfall series of the State of São Paulo. In addition, the number of SPI time series that could be seen as normally distributed was higher when this drought index was computed from the Pearson Type III distribution. Thus, the use of the Pearson Type III distribution within the calculation algorithm of the SPI is recommended in the State of São Paulo, Brazil.

Figure 5 - Linear regression analysis between Standardized Precipitation Index values when computed based on the Pearson Type III (SPIPE3) and Gamma (SPIGam) distributions

Figure 8 -Frequency distributions of monthly dry SPI categories for the weather station of Ubatuba (1935 to 2008), State of São Paulo, Brazil.

Table 2 - Runs test (Z) and Mann-Kendall test (MK) applied to five monthly precipitation series of the State of São Paulo, Brazil

Table 3 - Median, standard deviation (Sd), and number of cases (months) in which the Pearson Type III distribution (#PE3) and the gamma distribution (#Gam) could not be defined at five weather stations of the State of São Paulo, Brazil. The median and the Sd are given in millimeters

Table 4 - Median, standard deviation (Sd), and number of cases (months) in which the Pearson Type III distribution (#PE3) and the gamma distribution (#Gam) could not be defined at the weather station of Ubatuba, SP, Brazil (1935/2008). The median and the Sd are given in millimeters

Table 6 - Lilliefors/Kolmogorov-Smirnov test applied to the gamma (KSGam) and Pearson Type III (KSPE) distributions fitted to monthly rainfall series. μ, σ, γ, α, and β are the parameters of the distributions. The p-value is shown in brackets.