Verification of the stationarity of flow series in the Iguaçu River basin

The purpose of this technical note is to verify the stationarity of flows in the Iguaçu River basin, considering 14 fluviometric stations. Three sets of annual flow series were studied: mean flows, maximum flows and 7-day minimum flows. First, an exploratory analysis of the data was performed, based on the establishment of two change points in the characteristics of the flows and on parametric and nonparametric statistical tests of equality of mean and variance. Finally, compound tests were applied considering the basin divided into Upper and Lower Iguaçu. The exploratory analysis suggested a change in the trend of the flows, in the 1970s at some stations and in the 1980s at others. Therefore, the series were divided in two different ways: at the midpoint and at the point suggested by the exploratory analysis. Among the statistical tests, the Mann-Whitney test was preferred because it does not depend on the underlying distribution of the flow series and because it is highly recommended by other authors. It was concluded that the change in flows occurred over a relatively short interval of time and can be treated as a step-type (jump) non-stationarity. In general, the most recent change occurred downstream of the União da Vitória fluviometric station. A change in the behavior of the mean and maximum flows was evident; however, this was not observed for the minimum flows.


INTRODUCTION
Studies related to water resource management and projects for the use of water resources are, in general, based on estimates of surface water availability. These estimates are obtained, preferably, from observed flow series. Where flow series are lacking, they may also be obtained through statistical analysis of precipitation data.
Since hydrological studies are a relevant tool for the decision-making of management agencies, it is important that the estimated water availability represents the current and, most importantly, the future reality of the flow.
In this scenario, the importance of identifying the stationarity of rainfall and flow series becomes evident, in order to plan and guarantee the future availability of water resources, in quality and quantity. For an accurate estimate of availability, flow series covering long periods are needed in order to minimize sampling error. At the same time, long records make it plausible that the statistical properties of the variables of interest vary over time, either because of natural or anthropogenic climate effects or because of changes in land use.
According to Tozzi (2014), water resource planning that takes into consideration the stochastic nature of hydrological variables developed notably in the 20th century, with the use of sophisticated statistical techniques. The essential principles assumed in these techniques are: (i) the statistical characteristics of the historical series will be reproduced in the future, that is, the series are stationary; and (ii) the samples used in the studies are statistically representative.
However, long series, with up to 100 years of observations, have often been shown to be non-stationary, due to climate variability, land use change, or even the construction of hydraulic works. Therefore, identifying and subsequently adjusting for non-stationarity in time series of flow or precipitation is very important, as it ensures that studies, projects and actions related to water resources work with data that are more representative and aligned with the region's future hydrological reality.
In the State of Paraná, the Iguaçu River basin is the largest in area and, due to its importance, it has been the subject of a substantial number of studies. This basin stands out for its great hydroelectric potential, which is why it has been the subject of most of these studies since the 1950s. At that time, an interbasin transfer of the Negro River (a tributary of the Iguaçu River) to the Serra do Mar (the coastal mountain range) was considered. In the 1970s, a detailed study discarded this transfer and proposed a sequence of hydroelectric power plants along the Iguaçu River which, after some modifications, were implemented in the 20th century (Canambra Engineering Consultants Limited, 1969; Companhia Paranaense de Energia, 1974, 1978). Recently, studies have been performed to analyze the stationarity of the flows of the Iguaçu River, with the intention of verifying any changes in their behavior and, consequently, in the firm energy of the hydroelectric plants on the Iguaçu River (Pedrozo & Fill, 2017; Fill, 2017).
With this scenario in mind, the main purpose of this technical note is to present the results of the study carried out by Tozzi (2014) on flow stationarity in the Iguaçu River basin, reviewing methods for detecting non-stationarity in hydrological flow series, in order to contribute to the ongoing discussion of the topic.
To meet this objective, the following steps were carried out: (i) exploratory data analysis; and (ii) hypothesis tests.
According to Kundzewicz & Robson (2000), the visual examination of the data is part of a group of techniques called Exploratory Data Analysis (EDA), which involves the use of charts and other methods, such as heuristic enquiry, to explore and understand series over time, and is an essential first step in analyzing stationarity.
Hypothesis tests are formal procedures of statistical inference used to analyze the characteristics of a population from a sample of observations (Spanos, 1980; Devore, 2012).

Data used
The Iguaçu River basin is located in the South of Brazil, between latitudes 25°05'S and 26°45'S and longitudes 48°57'W and 54°50'W, and has a drainage area of 65,558 square kilometers (Paraná, 2007). It is a sub-basin of the Paraná River basin, shown in Figure 1. The source of the Iguaçu River is in the western portion of the Serra do Mar (the coastal mountain range), at an altitude of around 1,200 m. The river passes through the Curitiba Plateau and then crosses the Ponta Grossa Plateau. From Porto Vitória it runs across the Guarapuava Plateau to its mouth on the Paraná River.
For the case study, fourteen fluviometric stations in the Iguaçu River basin were selected, as listed in Table 1. These stations were selected mainly for their long periods of observation, location, data quality and continuity of their records. In addition, they have been the subject of many hydrological studies, most of which focused on the design of hydroelectric plants, in which their records were carefully analyzed and, where necessary, corrected. Thus, the records constituting these flow series were accepted as correct.
Observed flows were used since there are no regulating reservoirs upstream of the fluviometric stations, except at Cataratas Falls. The series of average, maximum and minimum flows were produced from the data available on the Hidroweb website of the Agência Nacional de Águas (2013).
The stations shown in Table 1, except the Porto Amazonas, União da Vitória and Santa Clara stations, have gaps in their records; these were filled, when possible, using hydrological regionalization techniques. Where it was not possible to fill the gaps, the periods of the respective series were redefined, removing from the analysis the years with excessive gaps.

Exploratory analysis and regression tests
Initially, graphs were prepared with the hydrographs and with the flows accumulated over time. This method, and particularly the accumulated flow approach, was used by Fill (2011) to detect the probable time of change in the characteristics of the series. A graph showing the deviations of the average annual flows from the long-term average was also prepared. A regression line of the flow against the year of occurrence was also fitted. This graph makes it possible to observe whether or not there is a linear trend of the flow with time. In addition, a graph was prepared with the moving averages of the flows.
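The series plotted in these exploratory graphs can be derived from an annual flow series in a few lines. The following is only a minimal sketch: the function names and the trailing definition of the moving average are assumptions made for illustration, not the procedure documented by the authors.

```python
from itertools import accumulate


def cumulative_flows(flows):
    """Running total of the annual flows (the accumulated-flow curve)."""
    return list(accumulate(flows))


def deviations_from_mean(flows):
    """Deviation of each annual flow from the long-term average."""
    mean = sum(flows) / len(flows)
    return [q - mean for q in flows]


def moving_average(flows, window=10):
    """Trailing moving average; one value per complete window of years."""
    return [
        sum(flows[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(flows))
    ]


# Hypothetical annual flows (m3/s), used only to exercise the functions.
example = [10, 12, 11, 13, 15, 14]
accumulated = cumulative_flows(example)
smoothed = moving_average(example, window=3)
```

A change in the slope of the accumulated-flow curve is what suggests a change in the mean flow; the curve itself does not test its significance.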
To verify whether the trend visually suggested by the flow charts is significant, Student's t-test was applied to the slope coefficient β of the flow regression (Johnston, 1984), in order to test the null hypothesis $H_0: \beta = 0$.
Besides the graphs, a technique called Rescaled Adjusted Partial Sums (RAPS) was also applied, proposed by Buishand (1984) and suggested by Alemaw & Chaoka (2002) as an important preliminary technique for the visual inspection of hydrological series.
According to Buishand (1984), the RAPS of a series $Y_t$ is defined as:
$$X_k = \sum_{t=1}^{k} \frac{Y_t - \bar{Y}}{S_Y}, \quad k = 1, \ldots, n$$
where: $X_k$ = RAPS at the limit k; $Y_t$ = value of the variable at time t; $\bar{Y}$ = sample mean; $S_Y$ = sample standard deviation; n = sample size; k = upper limit (counter) of the current sum.
According to Alexandre (2009), in this technique a pronounced "peak" or "depression" may indicate a change in the series: a positive slope of the RAPS curve indicates periods in which the values are above the mean of the series, while a negative slope indicates periods in which they are below it. The change points are chosen, at first, by analyzing in the RAPS graph the period of greatest amplitude between a "peak" and a "depression", or vice versa.
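The RAPS curve is a cumulative sum of standardized deviations, so it can be computed directly from the annual series. The sketch below assumes the sample standard deviation (n - 1 in the denominator); the helper `largest_excursion_index` is a hypothetical convenience for locating the most pronounced "peak" or "depression" and is not part of the cited method.

```python
import statistics


def raps(values):
    """Rescaled Adjusted Partial Sums X_k for k = 1..n."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation S_Y
    partial_sums = []
    running = 0.0
    for y in values:
        running += (y - mean) / sd
        partial_sums.append(running)
    return partial_sums


def largest_excursion_index(values):
    """0-based index of the largest absolute RAPS value (candidate change point)."""
    curve = raps(values)
    return max(range(len(curve)), key=lambda k: abs(curve[k]))


# Hypothetical series with a step change after the fourth value.
series = [1, 1, 1, 1, 3, 3, 3, 3]
curve = raps(series)
candidate = largest_excursion_index(series)  # index of the last value before the step
```

By construction the last RAPS value is zero, since the deviations from the sample mean sum to zero. The index returned is only a candidate to be examined with statistical tests.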
For each division identified by the RAPS method, the linear regression of the accumulated average, maximum and minimum flows was performed again, testing the null hypothesis that the slope coefficients of the two sub-periods are equal ($\beta_1 = \beta_2$).
The exploratory data analysis was performed for all variables analyzed, namely: (i) average annual flow, (ii) maximum annual flow, and (iii) 7-day minimum annual flow. It is worth mentioning that the visual analysis of the charts only allows the identification of trends in the series and of points where non-stationarities are entirely evident.
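The two regression tests used in this section, $H_0: \beta = 0$ for a single series and $H_0: \beta_1 = \beta_2$ for two sub-periods, can be sketched as follows. The slope-difference statistic shown here (difference of slopes divided by the root of the summed squared standard errors) is one common formulation and is an assumption; the source does not state which variant was used. Function names are illustrative, and the statistics must still be compared with a Student's t critical value for the stated degrees of freedom.

```python
import math


def fit_line(x, y):
    """Least-squares slope, intercept and standard error of the slope."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    return slope, intercept, se_slope


def slope_t(x, y):
    """t statistic for H0: beta = 0 (n - 2 degrees of freedom).

    Assumes the fit is not perfect, otherwise the standard error is zero.
    """
    slope, _, se_slope = fit_line(x, y)
    return slope / se_slope


def slope_difference_t(x1, y1, x2, y2):
    """t statistic for H0: beta1 = beta2 (n1 + n2 - 4 degrees of freedom)."""
    b1, _, se1 = fit_line(x1, y1)
    b2, _, se2 = fit_line(x2, y2)
    return (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)


# Hypothetical data: years as x, annual (or accumulated) flows as y.
years = [1, 2, 3, 4, 5]
flows = [2, 4, 5, 4, 5]
t_trend = slope_t(years, flows)
```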

Statistical tests
The application of the statistical tests takes place in three stages. First, the hypothesis of normality of the data is tested, as required for the application of parametric tests, and the serial autocorrelation guides the decision on whether the tests are applied directly to the flows (or their logarithms) or to the residuals of an autoregressive model. Next, parametric tests of the homogeneity of subsamples, or of the significance of the slope of the regression between the flow and the year of occurrence, are applied. Finally, non-parametric tests are applied.
To analyze the normality of the series, in addition to the skewness coefficient and Chi-square tests, Tröger et al. (2004) also mention the Filliben and Kolmogorov-Smirnov tests as more powerful. Accordingly, the Kolmogorov-Smirnov test was chosen, as recommended by Loucks & van Beek (2005).
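As an illustration of the normality check, the Kolmogorov-Smirnov statistic for the logarithms of the flows can be computed as below. This is a sketch under stated assumptions: the normal distribution is fitted with the sample mean and standard deviation of the logarithms, and the 5% critical value uses the large-sample approximation 1.36/√n, which strictly applies to a fully specified distribution and is conservative when the parameters are estimated from the same sample. The function names are hypothetical.

```python
import math
from statistics import NormalDist, fmean, stdev


def ks_statistic_log_normal(flows):
    """KS distance between the empirical CDF of ln(Q) and a fitted normal CDF."""
    logs = sorted(math.log(q) for q in flows)
    n = len(logs)
    fitted = NormalDist(fmean(logs), stdev(logs))
    distance = 0.0
    for i, x in enumerate(logs, start=1):
        cdf = fitted.cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x; check both sides.
        distance = max(distance, i / n - cdf, cdf - (i - 1) / n)
    return distance


def approx_critical_value_5pct(n):
    """Large-sample approximation of the 5% KS critical value (assumption)."""
    return 1.36 / math.sqrt(n)


# Hypothetical annual flows (m3/s).
flows = [120.0, 95.0, 210.0, 150.0, 180.0, 99.0, 134.0, 160.0]
d = ks_statistic_log_normal(flows)
reject_normality = d > approx_critical_value_5pct(len(flows))
```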
To verify the degree of dependence between successive data in a series, the sample autocorrelation coefficient (ρ) was calculated, considering a Markovian temporal dependency model for the logarithms of the flows and following the method proposed by Loucks & van Beek (2005) to test the hypothesis $H_0: \rho = 0$.
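A minimal sketch of this step is given below. It computes the lag-1 sample autocorrelation, compares it with the approximate large-sample bound z/√n (an assumption, not necessarily the exact procedure of Loucks & van Beek, 2005), and, if dependence is detected, returns the residuals of a first-order autoregressive (Markov) model. In the context of the text, the input would be the logarithms of the flows; all names are illustrative.

```python
import math


def lag1_autocorrelation(values):
    """Sample lag-1 autocorrelation coefficient r1."""
    n = len(values)
    mean = sum(values) / n
    numerator = sum(
        (values[t] - mean) * (values[t + 1] - mean) for t in range(n - 1)
    )
    denominator = sum((v - mean) ** 2 for v in values)
    return numerator / denominator


def is_serially_dependent(values, z_crit=1.96):
    """Reject H0: rho = 0 when |r1| exceeds z_crit / sqrt(n) (approximation)."""
    return abs(lag1_autocorrelation(values)) > z_crit / math.sqrt(len(values))


def ar1_residuals(values):
    """Residuals e_t = (x_t - mean) - r1 * (x_{t-1} - mean), for t = 2..n."""
    n = len(values)
    mean = sum(values) / n
    r1 = lag1_autocorrelation(values)
    return [
        (values[t] - mean) - r1 * (values[t - 1] - mean) for t in range(1, n)
    ]


# Hypothetical usage with the logarithms of annual flows.
log_flows = [math.log(q) for q in [120.0, 95.0, 210.0, 150.0, 180.0, 99.0]]
series_for_tests = (
    ar1_residuals(log_flows) if is_serially_dependent(log_flows) else log_flows
)
```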
Subsequently, homogeneity hypothesis tests were applied to assess the stationarity of the series.
For this research, two parametric tests were selected: Student's t-test and Snedecor's F-test. The classic Student's t-test evaluates the equality of the population means of the subseries, and Snedecor's F-test evaluates the equality of the population variances. Both assume that the samples are taken from normal populations. The basic formulation of these tests can be found in Devore (2012).
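For two subseries, the statistics of these tests can be sketched as follows. The pooled-variance form of the t statistic and the larger-over-smaller form of the F ratio are the textbook versions and are assumed here; the resulting values must still be compared with critical values of the t and F distributions, which are not computed in this sketch.

```python
import math
from statistics import fmean, variance


def student_t_statistic(sample_a, sample_b):
    """Two-sample t statistic with pooled variance (n1 + n2 - 2 degrees of freedom)."""
    n1, n2 = len(sample_a), len(sample_b)
    pooled = (
        (n1 - 1) * variance(sample_a) + (n2 - 1) * variance(sample_b)
    ) / (n1 + n2 - 2)
    return (fmean(sample_a) - fmean(sample_b)) / math.sqrt(
        pooled * (1 / n1 + 1 / n2)
    )


def snedecor_f_statistic(sample_a, sample_b):
    """Ratio of the larger to the smaller sample variance."""
    var_a, var_b = variance(sample_a), variance(sample_b)
    return max(var_a, var_b) / min(var_a, var_b)


# Hypothetical subseries (for example, logarithms of flows before and after a change).
before = [1.0, 2.0, 3.0]
after = [2.0, 4.0, 6.0]
t_value = student_t_statistic(before, after)
f_value = snedecor_f_statistic(before, after)
```

Used together, the two statistics indicate whether a difference between the subseries lies in the mean, in the variance, or in both.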
The non-parametric tests used in this research in order to check stationarity are: Spearman's coefficient, Mann-Whitney, Wald-Wolfowitz and Smirnov.
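As an example of one of these tests, the Mann-Whitney statistic for two subseries can be sketched as below. The sketch uses average ranks for ties and the normal approximation of U without tie correction, which is an assumption suitable for moderately long series; the function names are illustrative.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """1-based ranks of the values, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def mann_whitney(sample_a, sample_b):
    """Return (U of sample_a, z score, two-sided p-value by normal approximation)."""
    n1, n2 = len(sample_a), len(sample_b)
    ranks = average_ranks(list(sample_a) + list(sample_b))
    rank_sum_a = sum(ranks[:n1])
    u_a = rank_sum_a - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = (u_a - mean_u) / sd_u
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return u_a, z, p_value


# Hypothetical subseries of annual flows before and after a suspected change.
u, z, p = mann_whitney([1, 2, 3], [4, 5, 6])
```

Because only the ranks enter the statistic, the result is the same whether the flows or their logarithms are used.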
Very often, when studying changes in hydrological series, data from several measurement stations in a region are used, presuming a similarity between them. In this case, it is suggested to apply compound tests to the set of results, because there is a probability α of erroneously rejecting the null hypothesis (type I error) at each specific location. Fill (1994) shows in detail how to apply a compound test.
The application of a statistical test with probability α of type I error to a single location is equivalent to establishing a confidence interval for the test statistic. However, it is possible to extend a statistical test to a set of m locations, all supposedly subject to the same null hypothesis. This type of test is called a compound test (Fill, 1994). Let K be the number of locations at which the null hypothesis is rejected. Regarding the application of the original test to each location as a Bernoulli experiment, K is a binomial random variable with $p = \alpha$ and $N = m$. In this case, a critical value $k_{crit}$ for K can be determined, supposing the null hypothesis is true:
$$P_r = P(K \ge k_{crit} \mid H_0) = \sum_{k=k_{crit}}^{m} \binom{m}{k} p^{k} (1-p)^{m-k}$$
where: $P_r$ = probability of type I error for the compound test; m = number of locations; p = probability of type I error of the individual test; $H_0$ = null hypothesis.
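The critical number of rejections follows directly from the binomial distribution described above. The sketch below assumes independent tests across locations, as in the Bernoulli formulation; the significance levels in the example (5% for both the individual and the compound test) are illustrative assumptions, while m = 7 corresponds to the seven stations considered in each sub-basin.

```python
from math import comb


def binomial_tail(k_min, m, p):
    """P(K >= k_min) for K ~ Binomial(m, p)."""
    return sum(
        comb(m, k) * p ** k * (1 - p) ** (m - k) for k in range(k_min, m + 1)
    )


def critical_rejections(m, p, p_r):
    """Smallest k such that P(K >= k | H0) <= p_r, or None if no such k exists."""
    for k in range(1, m + 1):
        if binomial_tail(k, m, p) <= p_r:
            return k
    return None


# Seven stations per sub-basin; 5% levels are assumed for illustration.
k_crit = critical_rejections(m=7, p=0.05, p_r=0.05)  # -> 2
```

With these values, a single rejection among seven stations occurs by chance with probability of about 0.30 under the null hypothesis, so the compound test rejects only when at least two stations reject individually (probability of about 0.044).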

Exploratory data analysis
In the exploratory data analysis, the descriptive statistics of the samples were calculated first, which, according to Naghettini & Pinto (2007), summarize in a simple way the distribution pattern of the variables in question.
Subsequently, exploratory analysis graphs were created in order to visually identify possible trends or changes in the series. Simple and accumulated time plots, 10-year moving average and median, and linear regressions were prepared, as shown in Figure 2.
From the visual analysis of the charts for the fourteen fluviometric stations, it is possible to observe that most of them have a slight positive trend in the average and maximum flows. In the 7-day minimum flows this trend is visible only at some stations. The trend is most prominent in the average flows, which in general present a steeper slope.
The results of the linear regression confirmed the visual analysis, with a significant positive trend being observed in the average annual flows at most stations (eleven of the fourteen studied); in the maximum and minimum flows, the positive trend occurs at approximately half of the stations. Figure 3 shows the magnitude of the Student's t-test result, with the stations represented by circles of proportional area.
To visually identify the possible date of change associated with the positive trend confirmed by linear regression, the RAPS method was applied. Figure 4 shows the RAPS chart for the Santa Clara fluviometric station (65825000), with a more pronounced "peak" in the early 1980s, indicating a possible change in behavior from that point onward in relation to the previous period. This pattern is verified at all fourteen fluviometric stations. This "peak" does not prove the existence of a change, but it warns of a possibility that should be analyzed with statistical tests.
In order to confirm this conclusion, and to better identify the year in which a change in flow characteristics occurs, a chart of the accumulated flows was produced, as suggested by Fill (2011). In Figure 5, the graph for Santa Clara (65825000) [...] to the accumulated flows, as shown in Figure 6. The blue circles identify the observed classes.
RBRH, Porto Alegre, v. 25, e10, 2020
From Figure 6, it is possible to notice that the first group of stations coincides with the upper part of the basin, where urbanization and agricultural development are less recent, whereas the second group has its stations located in the downstream part of the basin, where agricultural development happened more recently. The exception to this pattern is the Cataratas Falls fluviometric station, which drains almost the entire basin, includes both regions, and is therefore susceptible to the influence of the hydroelectric plants located upstream.
Thus, the stations were divided into two groups: in the first, the series were divided in half, from 1930 to 1969 and from 1971 to 2005; in the second, the series were divided into the periods from 1930 to 1979 and from 1981 to 2005. The first division was made in the middle of each historical series, while the second division was inspired by the subjective visual analysis of Figures 4, 5 and 6.
In order to verify the significance of the change in slope for the regression lines of the accumulated flows in a more precise way, Student's t test was used again.
Considering the division of each historical series in half, in the regressions performed on the accumulated annual flows, the null hypothesis $\beta_1 = \beta_2$ was rejected in all cases for average flows and accepted in only one case for maximum flows (65010000). In the case of minimum flows, there were nine cases of acceptance and five cases of rejection of the null hypothesis. For the purposes of this technical note, the stations were grouped into two sub-basins called Alto (Upper) and Baixo (Lower) Iguaçu (upstream and downstream of União da Vitória). Based on this division, it was found that in Alto (Upper) Iguaçu there were five cases of acceptance and only two of rejection of the null hypothesis, and in Baixo (Lower) Iguaçu four cases of acceptance and three of rejection.
For the division of the subsamples based on the year defined by the visual analysis of the temporal graphs, also considering the accumulated annual flows, the null hypothesis $\beta_1 = \beta_2$ was accepted for average flows only at the Fazendinha (65010000) and Porto Vitória (65365000) stations, and in none of the cases for maximum flows. In the case of minimum flows, there were five cases of acceptance and nine cases of rejection of the null hypothesis. When the stations were divided into Alto (Upper) and Baixo (Lower) Iguaçu (upstream and downstream of União da Vitória), there were three cases of acceptance of the null hypothesis in Alto (Upper) Iguaçu and two cases in Baixo (Lower) Iguaçu.
It was then concluded that there was a change in the average and maximum annual flows. For the minimum annual flows there was a change in only some of the stations analyzed.

Hypothesis testing
For the Kolmogorov-Smirnov test, the null hypothesis was that the random variable can be modeled by the normal distribution. Since the probability distribution of flows can often be described by a log-normal distribution (Chow, 1954; Loucks & van Beek, 2005), the Kolmogorov-Smirnov test was applied to the logarithms of the average, maximum and minimum annual flows, as suggested in Fill (2011). In this case, the test accepted the null hypothesis that the random variable (ln Q) can be modeled by the normal distribution, which means a log-normal flow distribution.
Regarding the autocorrelation coefficient, only the Fazendinha station (65010000) rejected the null hypothesis that the logarithms of the average and minimum flows are serially independent. In this case, the tests were applied to the residuals of the flow logarithms, according to the model proposed in Fill et al. (2012).
After verifying normality and autocorrelation, parametric and nonparametric statistical tests were applied to more rigorously check the stationarity of the hydrological series.
Parametric tests were always applied to flow logarithms, while non-parametric tests used the flow information directly. Average, maximum and minimum annual flows were considered.
For the Student's t, Snedecor's F and Mann-Whitney tests, two different subsample divisions were used for Alto (Upper) and Baixo (Lower) Iguaçu. In the first, the series were divided in half, into the periods 1930 to 1969 and 1971 to 2005; in the second, the series were divided into the periods 1930 to 1979 and 1981 to 2005. It is important to note that the Spearman and Wald-Wolfowitz tests do not require the definition of subsamples. The results of the various tests are shown in Tables 2 and 3, together with the slope test of the regression lines.
From the results shown in Table 3, it is possible to notice that the behavior of the Wald-Wolfowitz test is not consistent with the other tests, which is why it was disregarded in the analysis. The basin was also divided into Alto (Upper) and Baixo (Lower) Iguaçu, with the division at União da Vitória, since the physiographic features of these sub-basins (slope, soil type and land use, precipitation) are notably different. There was a certain inconsistency in the results obtained, with several locations appearing stationary and others non-stationary. There was also considerable variation in the results between average, maximum and minimum flows, as well as between Alto (Upper) and Baixo (Lower) Iguaçu. Comparing the cases in which the subsamples were split in the middle of the total period with those split at the year suggested by the visual analysis of the temporal graphs, no remarkable differences were perceived.
Considering the Alto (Upper) and Baixo (Lower) Iguaçu basins as hydrologically distinct, but supposing that each of these sub-basins is homogeneous in terms of flow stationarity, a compound test was considered for the seven stations located in each of the sub-basins. [...] it can be concluded that non-stationarities, when they exist, are essentially in the mean and not in the variance of the flows.
These interpretations allow us to affirm, with a reasonable degree of certainty, that the flows in the Iguaçu River basin are not stationary in the mean but can be assumed to be stationary in the variance.

CONCLUSIONS
This technical note addresses a relevant issue for the management of water resources: the stationarity of hydrological series.
Non-stationarity can be detected through the application of statistical tests, among which non-parametric tests are preferable, as they do not depend on arbitrary assumptions about the underlying flow distributions (Loucks & van Beek, 2005). For this reason, the Mann-Whitney test is proposed to check stationarity whenever there is some evidence of a defined period in which the statistical properties changed. When this evidence does not exist, Spearman's test can be recommended, since it does not require division into subsamples.
The parametric Student's t and Snedecor's F tests, despite being strictly valid only for normally distributed variables, make it possible to identify whether the non-stationarity of the series is in the mean, in the variance or in both. As flows in general are not normally distributed, suitably transformed variables must be used in this case.
In this context, the compound test can be seen as a type of regional stationarity analysis. In that sense, it is suggested to divide the hydrographic basin into homogeneous sub-basins. In the case analyzed, the sub-basins adopted, Alto (Upper) and Baixo (Lower) Iguaçu, have very different physiographic patterns (slope, soil type, land use). The application of the compound test to each of these sub-basins leads to the conclusion that there is no stationarity in any of the annual series of average, maximum and 7-day minimum flows.
In the individual analysis of each fluviometric station, the results revealed both cases in which the stationarity hypothesis could be rejected and cases in which it could not. There was also a greater tendency for non-stationarity in Alto (Upper) Iguaçu, since in Baixo (Lower) Iguaçu the individual analyses for each station were not conclusive, with a tendency for stationarity in the minimum flows.
The Wald-Wolfowitz test did not present results coherent with the other tests performed and therefore, it is believed, cannot be recommended for stationarity analyses.
A method for the analysis of series with serial dependence, proposed by Fill et al. (2012), was also used, applying the tests to the presumably independent residuals.