Statistical validity of water quality time series in urban watersheds

Continuous monitoring of water quality is a complex activity. It generates extensive databases with time series of many variables and monitoring sites, which require statistical methods to extract information. The application of statistical methods for frequency analysis of time series depends on meeting the basic assumptions of randomness, homogeneity, independence, and stationarity. However, despite its importance, verification of these assumptions is unusual in the water quality literature. Therefore, the present study tests the water quality time series of the Upper Iguaçu Basin against these hypotheses. Rejection was observed in 15%, 26%, 51% and 31% of the series for randomness, homogeneity, independence, and stationarity, respectively. The results show a strong relation between monitoring strategy, data assessment, and compliance with the basic statistical assumptions for the analysis of water quality time series. Although possible solutions to these issues exist, standard monitoring strategies, with irregular frequencies and a lack of representativeness of periods outside business hours, act as an obstacle to their implementation.


INTRODUCTION
Monitoring the quality of water resources is a complex activity. It requires the development, constant maintenance and updating of strategies for obtaining representative data and consistent information for the management of water resources. Continuity of monitoring gives rise to extensive databases containing time series of various variables and monitoring points. The analysis of these data requires the application of appropriate statistical methods, especially in urban basins, where the processes that alter water quality are diverse and produce high variability.
The application of statistical methods for frequency analysis of time series depends fundamentally on the statistical validity of the series, which is linked to the basic assumptions of randomness, homogeneity, independence and stationarity (CHADEE; SHARMA, 2003; MERZ; THIEKEN, 2005; NAGHETTINI; PINTO, 2007; HULLEY; CLARKE; WATT, 2015). In hydrological studies, the verification of these hypotheses is frequently recommended. In the water quality literature, however, references that recommend or perform such verification are rare.
Water quality time series are used in trend analysis (COSTA; OLIVEIRA; SOUZA, 2011; DAHM et al., 2014), multivariate statistical analyses (WANG et al., 2013; TORRES; LEMOS; MAGALHÃES JUNIOR, 2016), analysis of variance (GONÇALVES et al., 2011), mathematical modeling (KONDAGESKI; FERNANDES, 2009; KNAPIK et al., 2013; VIEIRA et al., 2013), duration curves (MARIN et al., 2007; FORMIGONI et al., 2011), and linear regression analysis (HIRSCH; MOYER; ARCHFIELD, 2010), among other analyses, but typically without prior verification of the assumptions for analysis. Costa, Oliveira and Souza (2011) analyzed the presence of trends in time series of annual averages of water quality data in the Upper Iguaçu Basin. The annual averages were used as a solution to the problem of dependence and seasonality in the series. However, compliance with the basic assumptions for frequency analysis in time series was not verified, and therefore there is no guarantee that the annual averages are statistically representative.
The concepts regarding the assumptions are clearly defined in hydrological studies. Randomness is observed when fluctuations of values have natural causes, such as river flows when not influenced by the presence of a dam. Independence is observed when previous observations do not influence the next observations in a chronological sequence. A time series is homogeneous when all the data come from a single population. Different populations can be generated in the same series by the influence of phenomena such as El Niño or seasonality. Finally, a series is said to be stationary when population parameters, such as mean and standard deviation, do not change over time (NAGHETTINI; PINTO, 2007).
In water quality studies there is no clear and appropriate definition of these concepts to explain cause and effect relationships. This absence is equivalent to considering the series random, independent, homogeneous and stationary, but without proper justification. Environmental data are susceptible to non-randomness, dependence, non-homogeneity, and non-stationarity, due to seasonality, the occurrence of extreme events, and urbanization, among other factors. Hence, the need for a more cautious analysis, especially in urban basins, is evident.
The careless application of traditional statistical methods to series with these characteristics produces unreliable information on parameters and their associated uncertainties, that is, on confidence intervals, p-values, and standard deviations, among others (GILBERT, 1987; HELSEL; HIRSCH, 2002; MCBRIDE, 2005).
In this paper, the basic statistical assumptions for frequency analysis are verified for the water quality time series of the Upper Iguaçu Basin. These series represent the technical basis of several relevant studies for the management of water resources. The results are accompanied by a critical analysis of the application to water quality data, a discussion of the causes and effects of non-compliance, and recommendations on the development of monitoring strategies and approaches to the observed series.

MATERIAL AND METHODS
The research was developed from a database consolidated between 2005 and 2011 for research projects in the Upper Iguaçu Basin and complemented with five monitoring campaigns in 2012. The time series were checked for compliance with the assumptions for frequency analysis by applying statistical methods that test the hypotheses of randomness, homogeneity, independence, and stationarity.
Prior to the verification of compliance with the basic assumptions, characteristics of the time series, such as sample size, variability, sampling frequencies and seasonality, were analyzed.

Study area
The Upper Iguaçu Basin (Figure 1) is located in the State of Paraná, in the Metropolitan Region of Curitiba (RMC). It covers 15 municipalities, including Curitiba, the capital of the State of Paraná. Its drainage area is approximately 3,000 km² and is located between latitudes -25.23 and -25.83 and longitudes -48.96 and -49.69. The Iguaçu River crosses the State of Paraná in the east-west direction, extending for 1,320 km, of which the initial 90 km are in the Upper Iguaçu Basin.
Urbanization is mainly concentrated in the basins of the Barigui, Belém, Atuba, and Palmital rivers, on the right bank of the Iguaçu River. Approximately 15% of the area of the basin is urbanized. The other land uses in the basin are mainly divided between temporary crops, fields and other types of vegetation, with a predominance of agricultural uses, thus leaving little forest cover in the basin (COELHO, 2013).

Research and database
The detailed analysis of water quality time series, questioning their representativeness and validity for subsequent frequency analysis, is original and distinct from other studies in the Upper Iguaçu Basin. Studies such as Kondageski and Fernandes (2009), Pitrat (2010), Knapik et al. (2013), and Costa, Oliveira and Souza (2011) used water quality time series in multivariate statistical analysis, trend analysis, mathematical modeling of water quality, generation of duration curves, and box plots, among other analyses, but without addressing the issues discussed in this article.
These surveys contributed to the consolidation of a database currently composed of 12 monitoring sites (Table 1), with 8 sites in the Iguaçu River and 4 in the main tributaries, presented in Figure 1, with 34 water quality variables.

Monitoring
The monitoring sites coincide with sites in the national information system on water resources, which are officially monitored by the local water agency, the Instituto das Águas do Paraná. It is important to emphasize that the monitoring described here is independent of the monitoring performed by that institution and therefore does not necessarily follow the same methods and techniques.
The analyzed database is composed of data between 2005 and 2012 (43 monitoring campaigns), except in the tributaries (PA, AT, BE and BA, see Table 1) of the Iguaçu river and IG7, monitored only in 2012 by the present research, between August and December, in a monthly time step (5 campaigns).
The monitoring campaigns were typically held on business days between 8:00 a.m. and 5:00 p.m. There were no records of campaigns during the night, weekends and holidays.

Flow estimates
All sites, except IG1, AT and BE, have a streamflow gauging station, whose data and time series, obtained by the Instituto das Águas do Paraná, are available in the Brazilian national water information system (Hidroweb). They also have a time series measured by this and other previously cited studies, which is included in the present work.
In years prior to 2012, flows were estimated from a single staff gauge reading, which was then entered into a rating curve provided by the Instituto das Águas do Paraná. However, in 2012, most stations were found to have problems such as a missing staff gauge, large cross-sectional changes, and outdated rating curves. Thus, flow rates in 2012 were determined from the average specific flow rate for the Upper Iguaçu Basin (Equation 1) presented in Marin et al. (2007), where q is the average specific flow rate (m³/s·km²) and p is the permanence (%). The specific flow q was calculated as a function of the continuous flow of a reference site. Owing to its better conditions (staff gauge, location, and recommendations of network users), IG7 (Hidroweb code 65035000, Porto Amazonas) was used as the reference. From its flow time series, a duration curve was determined. With this, the IG7 flow permanence values in each campaign (calculated by the rating curve) were obtained. Then, q was calculated and multiplied by the drainage area of each site.

Hypothesis testing
The hypotheses of randomness, homogeneity, independence, and stationarity were tested by the runs, Wilcoxon-Mann-Whitney, Wald and Wolfowitz, and Kendall tests, respectively. The presence of seasonality was evaluated by the Kruskal-Wallis and medians tests. The tests were applied to the time series of 34 monitoring variables in terms of quantity (loads) and quality (concentrations and other units), for 12 monitoring sites, with significance level α = 5%. The Wilcoxon-Mann-Whitney, Kruskal-Wallis, and medians tests were performed in IBM SPSS Statistics 20; the Wald and Wolfowitz test in Excel; and the Kendall test in XLSTAT 2013 (Addinsoft).

Runs test
The null hypothesis (H0) that the series is random was tested against the alternative hypothesis (H1) that the series is not random. Let m be the number of elements of one type and n the number of elements of another type in a sequence of N = m + n binary events. The test consists of counting the number of groups of equal consecutive elements in the sequence, or number of runs (r). In order to transform the numerical time series X into binary events, the median was used as the threshold, that is, each observation was classified according to whether it lies above or below the median. If m and n are ≤ 20, a specific table is used, with the expected minimum and maximum values of r as a function of m and n. If m or n is > 20, the normal distribution is used as an approximation of the distribution of r by Equations 2, 3 and 4. The test is bilateral and H0 is rejected when p-value < 0.05 (SIEGEL; CASTELLAN JUNIOR, 2006).
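The runs test described above can be illustrated with a minimal Python sketch. It is not the procedure used in the study: the function name `runs_test` is hypothetical, observations equal to the median are dropped (an assumption), and the normal approximation is applied without continuity correction regardless of sample size, whereas for m and n ≤ 20 the tabulated critical values would apply.

```python
import math


def runs_test(series):
    """Runs test around the median; returns (r, m, n, z, p_value)."""
    ordered = sorted(series)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        median = ordered[mid]
    else:
        median = 0.5 * (ordered[mid - 1] + ordered[mid])

    # Binary sequence: True above the median, False below.
    # Assumption: values equal to the median are discarded.
    signs = [x > median for x in series if x != median]
    m = sum(signs)            # observations above the median
    n = len(signs) - m        # observations below the median
    total = m + n
    if m == 0 or n == 0 or total < 3:
        raise ValueError("series too short or constant for the runs test")

    # A new run starts at every change of sign.
    r = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)

    # Normal approximation of the distribution of r (no continuity correction).
    mean_r = 2.0 * m * n / total + 1.0
    var_r = 2.0 * m * n * (2.0 * m * n - total) / (total ** 2 * (total - 1.0))
    z = (r - mean_r) / math.sqrt(var_r)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided
    return r, m, n, z, p_value
```

A steadily increasing series produces only two runs (all low values followed by all high values), which is the situation in which a trend leads to rejection of randomness.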

Wilcoxon-Mann-Whitney
In this test, the null hypothesis (H0) is that the series is homogeneous, while the alternative hypothesis (H1) is that the series is not homogeneous.
Let m be the number of cases in the sample of group X and n the number of cases in the sample of group Y. Assuming that the samples are independent, the observations of both groups are combined and ranked in ascending order. The value of the test statistic, W_x, is the sum of the ranks of the smaller group (SIEGEL; CASTELLAN JUNIOR, 2006).
When m and n are ≤ 10, a specific table is used to determine the exact probability associated with the occurrence of any W_x as extreme as the observed one when H0 (the series is homogeneous) is true. For m or n > 10, the distribution of W_x approaches a normal distribution and the significance of an observed value of W_x can be determined by Equations 5, 6 and 7 (SIEGEL; CASTELLAN JUNIOR, 2006).
In the context of water quality time series, groups X and Y are the first and second halves of the series, respectively. The test is bilateral and H0 is rejected when p-value < 0.05.
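As an illustration of the split-half procedure, the following Python sketch computes W_x for the first half of a series and its normal approximation. The names `average_ranks` and `split_half_wilcoxon` are hypothetical, ties receive average ranks without a tie correction of the variance, and a continuity correction of 0.5 is assumed; the study itself used SPSS, so this is only a sketch of the general technique.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def split_half_wilcoxon(series):
    """Compare first and second halves; returns (w_x, z, p_value)."""
    values = list(series)
    m = len(values) // 2          # first half (the smaller group if N is odd)
    n = len(values) - m           # second half
    if m == 0:
        raise ValueError("series too short")
    ranks = average_ranks(values)
    w_x = sum(ranks[:m])          # rank sum of the first half

    total = m + n
    mean_w = m * (total + 1) / 2.0
    sd_w = math.sqrt(m * n * (total + 1) / 12.0)

    # Assumed continuity correction: shrink |W_x - mean| by 0.5, never below 0.
    diff = w_x - mean_w
    z = math.copysign(max(abs(diff) - 0.5, 0.0), diff) / sd_w
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided
    return w_x, z, p_value
```

When the two halves come from clearly different levels, as in a series with a strong trend, the rank sum of the first half moves far from its expected value and homogeneity is rejected.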

Wald and Wolfowitz
Considering a sample {X_1, X_2, ..., X_N} and the differences x_i determined between X_i and its mean X̄, the statistic R is calculated by Equation 8. Under the independence hypothesis, R follows a normal distribution with mean and variance given by Equations 9 and 10, respectively. The test statistic is then determined by Equation 11.
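A minimal Python sketch of the Wald and Wolfowitz statistic follows. It uses the form commonly given in hydrology textbooks, with s_2 and s_4 the sums of the second and fourth powers of the deviations; that this matches Equations 8 to 11 exactly is an assumption, and the function name `wald_wolfowitz` is hypothetical.

```python
import math


def wald_wolfowitz(series):
    """Wald-Wolfowitz independence test; returns (r, t, p_value)."""
    n = len(series)
    if n < 3:
        raise ValueError("at least three observations are required")
    mean = sum(series) / n
    dev = [x - mean for x in series]  # deviations from the sample mean

    # R: sum of products of consecutive deviations, closed circularly
    # by multiplying the first and last deviations.
    r = sum(dev[i] * dev[i + 1] for i in range(n - 1)) + dev[0] * dev[-1]

    s2 = sum(d ** 2 for d in dev)
    s4 = sum(d ** 4 for d in dev)

    # Mean and variance of R under the independence hypothesis.
    mean_r = -s2 / (n - 1)
    var_r = ((s2 ** 2 - s4) / (n - 1)
             + (s2 ** 2 - 2.0 * s4) / ((n - 1) * (n - 2))
             - s2 ** 2 / (n - 1) ** 2)

    t = (r - mean_r) / math.sqrt(var_r)
    p_value = math.erfc(abs(t) / math.sqrt(2.0))  # two-sided
    return r, t, p_value
```

At the 5% significance level, independence is rejected when the absolute value of the returned statistic exceeds 1.96.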
For the stationarity test, let x_il be the measurement of season i in year l, K the number of seasons and L the number of years. The seasonal Kendall test checks for a monotonic trend in the data, considering each season separately. Hence, distinct trends may be identified for distinct seasons.
Considering the seasons as the months of the year, for each month the observations of each year are used to calculate the statistics S_i and S according to Equations 12 and 13, respectively. The variance of S is calculated by Equation 14, where n_i is the number of observations in season i. After S and Var(S) are calculated, Z is calculated according to Equation 15. In the bilateral case, H0 is rejected if the absolute value of Z exceeds the critical value of the standard normal distribution at the adopted significance level, with Z following a normal distribution (GILBERT, 1987; HELSEL; HIRSCH, 2002). Due to the presence of dependence in the series, the Hamed and Rao (1998) correction was applied by the software.
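The seasonal Kendall calculation can be sketched in Python as follows. This is a simplified illustration, not the XLSTAT procedure used in the study: the function name `seasonal_kendall` and the input layout (a dictionary mapping each season to its chronologically ordered values) are assumptions, and neither the tie correction of the variance nor the Hamed and Rao (1998) correction for dependence is included.

```python
import math


def sign(value):
    """Return -1, 0 or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def seasonal_kendall(data_by_season):
    """Seasonal Kendall trend test; returns (s, var_s, z, p_value).

    data_by_season maps a season label (e.g. a month) to the list of
    observations of that season in chronological order.
    """
    s_total = 0
    var_total = 0.0
    for values in data_by_season.values():
        n_i = len(values)
        # S_i: concordant minus discordant pairs within the season.
        s_i = sum(sign(values[l] - values[k])
                  for k in range(n_i - 1)
                  for l in range(k + 1, n_i))
        s_total += s_i
        # Variance of S_i without tie correction (assumption).
        var_total += n_i * (n_i - 1) * (2 * n_i + 5) / 18.0

    if var_total == 0.0:
        raise ValueError("each season needs at least two observations")

    # Continuity-corrected standard normal statistic.
    if s_total > 0:
        z = (s_total - 1) / math.sqrt(var_total)
    elif s_total < 0:
        z = (s_total + 1) / math.sqrt(var_total)
    else:
        z = 0.0
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided
    return s_total, var_total, z, p_value
```

Because S is summed over seasons, a trend is detected only when the within-season comparisons point in a consistent direction, which is what keeps the seasonal cycle itself from being mistaken for a trend.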

Kruskal-Wallis
The Kruskal-Wallis test evaluates the null hypothesis that the medians of k data groups are the same, against the alternative hypothesis that at least one pair of groups has different medians. Each of the N observations of the k groups is replaced by its rank, i.e. rank 1 is assigned to the lowest value and rank N to the highest, considering the observations as a single series. Then, the average rank is calculated in each group and in all groups combined.
The test statistic is given by Equation 17, where N is the number of observations in the combined sample, k is the number of groups, n_j is the number of cases in the j-th group, R̄_j is the average of ranks in the j-th group and R̄ is the average of ranks in the combined sample. For k > 3 and n_j > 5 in each group, the sampling distribution of KW is approximated by the chi-square distribution with k − 1 degrees of freedom.
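The Kruskal-Wallis statistic can be sketched in Python from the average ranks of each group. The function names are hypothetical, ties receive average ranks without a tie correction (an assumption), and the sketch returns only the statistic; for four seasonal groups (3 degrees of freedom) the 5% critical value of the chi-square distribution is 7.815.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def kruskal_wallis(groups):
    """Kruskal-Wallis statistic for a list of groups (lists of values)."""
    pooled = [x for group in groups for x in group]
    ranks = average_ranks(pooled)
    total = len(pooled)
    grand_mean_rank = (total + 1) / 2.0   # average rank of the combined sample

    weighted_sum = 0.0
    start = 0
    for group in groups:
        n_j = len(group)
        mean_rank_j = sum(ranks[start:start + n_j]) / n_j
        weighted_sum += n_j * (mean_rank_j - grand_mean_rank) ** 2
        start += n_j

    return 12.0 * weighted_sum / (total * (total + 1))
```

Groups whose values are interleaved yield a statistic close to zero, while groups occupying separate ranges of values yield a large statistic and rejection of equal medians.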

Medians test
Finally, the last method considered tests hypotheses similar to those of the Kruskal-Wallis test. The median of the combined sample of the k groups is determined and the observations higher and lower than the median are replaced by + and −, respectively. The test statistic is given by Equation 18, where i = 1 for the + category or 2 for the − category, n_ij is the number of cases in category i of group j, k is the number of groups, and E_ij is the average of n_1j and n_2j. For large samples, the values of χ² obtained follow a chi-square distribution with (r − 1)(k − 1) degrees of freedom, r being the number of categories. For the application of the Kruskal-Wallis and the medians tests, four groups were defined, corresponding to the four seasons of the year.
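A short Python sketch of the medians test statistic, following the definition of E_ij given above as the average of the + and − counts of each group. The function name `medians_test` is hypothetical and observations equal to the combined median are discarded, which is an assumption. With four seasonal groups and two categories there are 3 degrees of freedom, for which the 5% critical value of the chi-square distribution is 7.815.

```python
def medians_test(groups):
    """Chi-square statistic of the medians test for a list of groups."""
    pooled = sorted(x for group in groups for x in group)
    mid = len(pooled) // 2
    if len(pooled) % 2:
        median = pooled[mid]
    else:
        median = 0.5 * (pooled[mid - 1] + pooled[mid])

    chi2 = 0.0
    for group in groups:
        above = sum(1 for x in group if x > median)   # "+" category
        below = sum(1 for x in group if x < median)   # "-" category
        # Expected count per category: average of the two observed counts.
        expected = (above + below) / 2.0
        if expected == 0.0:
            continue  # group contains only values equal to the median
        chi2 += (above - expected) ** 2 / expected
        chi2 += (below - expected) ** 2 / expected
    return chi2
```

A group lying entirely on one side of the combined median contributes the most to the statistic, whereas a group split evenly around it contributes nothing.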

Time series characteristics
Figure 2 shows a small amount of summer data compared to the other seasons. Likewise, fewer data are available for the tributaries than for the main river. The time series presented irregular frequency between 2005 and 2013 (Figure 3), with intervals from biweekly to quarterly. Long intervals without monitoring are observed in 2006-2008, 2008-2009 and 2011-2012. The phosphorus series, except P T, as well as the COLIF TOTAL, COLIF FECAL, COD, N T, OD WINKLER and S SED series, have considerably fewer data than other variables at all sites. In terms of loads, some series are shorter because of the lack of flow data, especially at sites IG1, IG3, IG4 and IG6.
The boxplots for concentration, load, and other variables (T, pH, COND, TURB, Q, S SED, COLIF TOTAL, COLIF FECAL and SECCHI), in general, showed high variability, a large number of outliers, and positive asymmetry, suggesting non-normality of the series. Some of these characteristics can be observed in the flow boxplots (data between 2005 and 2012) presented in Figure 4. The basin presents a considerable increase in loads and flows downstream of site IG3.

Hypothesis testing
The results of the hypothesis tests are presented in this section. Except in Table 2, for each parameter, the first row contains the results for the water quality time series and the second row the results for the quantity time series. Variables absent from the tables showed no rejection at any monitoring site, and missing sites did not show rejection for any variable.
The tests were applied to a total of 720 series (34 variables × 12 sites for quality and 26 variables × 12 sites for loads). The results are analyzed in terms of percentage of rejections and p-values. However, sites IG7, PA, AT, BE and BA did not present rejections in the tests of seasonality, randomness, and homogeneity, possibly due to the small amount of data (5 observations) and the consequent lack of evidence for rejection. For these tests, the total number of series considered in calculating the percentages of rejection is 420.

Seasonality
Table 2 shows that the seasonal cycle exerts a significant influence on 9% of the series.In most cases, rejections are observed in both tests.The rejections in NH 4 + and N ORG series from IG3, T in all sites, and some forms of solids in IG2A are highlighted.
According to the results, the seasonal cycle seems to exert a greater influence on the organic contribution, represented by variables such as SSV, NH4+, N ORG and N KJELDAHL, than on the hydrological cycle (Q). The smaller influence on the hydrological cycle is associated with the way the data were collected, only on business days and without regard to sampling during rainfall. On the other hand, the greater influence on water quality is associated with significant temperature variations and with diffuse pollution, since in rainy periods, mainly with the first rains, the contribution of diffuse pollution increases.
Therefore, interferences in the physical, chemical and biological processes throughout the seasons are expected. This emphasizes the importance of the representativeness of the series, that is, the series should represent all the variability of the parameter or variable being analyzed. Representativeness is associated with the sampling frequency. Thus, irregularly collected flow data will hardly represent the temporal variability of flows accurately.
The presence of a trend in the series introduces high variability within the groups corresponding to each season when older data are grouped with the most recent. This makes it difficult to differentiate the groups.

Randomness
Table 3 shows that the randomness hypothesis was rejected in 15% of the series, half of them with p-value < 0.01, especially for the COND variable. The low p-values for COND can be explained by the apparent trend of the series, shown in Figure 5.
It is observed that from 2008 the values of conductivity are higher than the median, unlike the older ones.Since the method consists in analyzing the number of oscillations (runs) around the median (4 in this case), the rejections are due to the low number of runs.
This result shows that non-randomness may be a consequence of the analysis of long periods in which the series are non-stationary or non-homogeneous.The increasing trend and non-homogeneity in these series are confirmed by trend and homogeneity tests applied.
The causes of these effects may be real changes in water quality conditions or changes in the form and/or frequency of measurements, methods of analysis, etc., i.e. changes in monitoring strategy. This emphasizes the importance of investigating the origin of non-randomness.

Homogeneity
Table 4 shows the rejection of homogeneity in 26% of the series, half of them with p-value < 0.01. Once again the COND variable is highlighted. Even if the chance of type I error (rejecting H0 when it is true) was 10% (α), there would be rejection in several series.
The test application method consists of dividing the series in half and comparing the two groups, so the rejections indicate high temporal variability of the series and possible non-stationarity (trend).
Non-homogeneity implies the presence of two or more populations in the same sample, so the p-values, confidence intervals, and other statistical inferences obtained from this type of time series do not represent any of the tested populations. The analysis of historical series should be based on graphical analysis and knowledge about probable causes of non-homogeneity, for example, dates on which methods changed and seasonality, among others.
An alternative to avoid the non-homogeneity of the COND series shown in Figure 5 would be to use part of the series, for example, from the year 2009, unless the objective is trend analysis.

Independence
Rejection of H0 (independence) in favor of H1 (dependence) was observed in 51% of the series. The test statistics can be seen in Table 5 and Table 6. Values well above the critical value (1.96) are observed, with emphasis on DBO, NH4+, N ORG and COD, which are representative of the organic content of pollution.
For the tested series, it is observed that the irregular monitoring frequency in the longer series, predominantly with monthly and quarterly frequencies and long periods of time without data, results in high values of the test statistic with a maximum of 53.72 (Table 5), indicating a strong dependence relationship.
The results of the shorter series indicate that the monthly frequency also provides sufficient evidence to reject the hypothesis of independence in several series, however, with considerably lower values of the test statistic, with a maximum value of 17.18 (Table 6).
In water quality, when the sampling frequency is higher (daily, hourly, etc.), there is no evidence that the water quality will be better or worse in the next interval, since the concentrations are not only functions of the flows, but also of the highly variable contributions of loads in the system.On the other hand, when the frequency of monitoring is lower (monthly and quarterly), a certain behavior is expected in terms of water quality due to the seasonality of the climate and the hydrological cycles, which interfere with temperatures, rainfall volume and increased diffuse pollution in the rainier seasons.
These results indicate that the complexity in establishing the dependence relationship may be associated with the increase in the frequency of water quality monitoring. However, due to the seasonal effect, the longer series (greater than one year) may still be correlated. These relationships should be the object of future research.
An alternative for non-rejection of the independence hypothesis in the water quality time series is the analysis of data separated by season or, according to Gilbert (1987) and McBride (2005), the use of methods adapted to this characteristic or the use of techniques for the removal of seasonality and trends.

Stationarity
In Table 7 (sites with longer time series), rejections of H0 (stationarity) in favor of H1 (trend) were observed in 31% of the series (420 series tested). Overall, higher values of the S statistic are associated with p-values well below 0.05 (Table 8). Of the total observed rejections, 55% indicate a decreasing (negative) trend and 45% an increasing (positive) trend. In Table 9 (sites with shorter time series), there is an increasing trend at some sites and variables as the data approach the summer (dates between August and December 2012), but with values of S associated with p-values close to 0.05. The term trend, in essence, refers to long-term changes, and caution should be taken in this interpretation. The increasing contribution associated with the reduction of decomposition products indicates a possible reduction of biodegradability, consistent with the increasing trend observed in the pH variable, which negatively affects the action of microorganisms.

Synthesis of results
Almost all water quality variables were rejected in at least one of the assumptions at most monitoring sites. Some series, such as the STT and Q at IG6, met all the assumptions and did not reflect seasonal effects. It is important to note that these series were obtained by the same monitoring strategy and under the same seasonal conditions as the other series, which indicates that other factors determine their compliance. The absence of seasonality is associated mainly with the non-representativeness of the flow series, since the flow series in the Iguaçu River basin show seasonal variations.
The understanding of the applied methods allowed the establishment of cause and effect relations between randomness, homogeneity and independence and the presence or absence of seasonality and trends. Seasonality may give rise to cyclical (non-random) variability, to data significantly higher or lower in different seasons (non-homogeneity), and to data decreasing or increasing in a chronological sequence as they approach a particular season of the year (dependence).
The presence of trends may produce significant differences between recent and old data (non-homogeneity), its slope may result in few runs (non-randomness), and it may make part of the variability of future data predictable from the last data obtained (dependence).
However, the results show that seasonality and trend alone cannot determine whether the assumptions of randomness, homogeneity, and independence are met. Table 10 shows that the absence of seasonality and trends in the flow series is associated with compliance with all assumptions in IG4, IG5 and IG6, but not in IG3. The lack of trend may be associated with non-compliance, as in IG1, but also with compliance, as in IG4, IG5 and IG6. In this way, it is understood that all assumptions must be tested prior to frequency analysis in water quality time series. Some causes of non-compliance with basic assumptions can be verified graphically, as can possible solutions to work around the problem.

CONCLUSIONS
The verification of the basic assumptions for frequency analysis in water quality series in the Upper Iguaçu Basin evidenced the need for a more cautious approach to the analysis of large databases. Assuming randomness, homogeneity, independence and stationarity, as is typical in studies of water quality, can be misleading for most of the series.
The application of the tests allowed the identification of important characteristics of the series and revealed deficiencies of the typical strategies of water quality monitoring. The fulfillment of the basic assumptions for the analysis of water quality time series is essentially related to the monitoring strategies and to the techniques used to approach the series for analysis. As surface water quality variability is strongly related to anthropogenic activities, monitoring strategies should generate information on the cycles of variation corresponding to the various human activities, i.e. daily and weekly cycles, as well as holidays.
Likewise, the annual cycle, due to seasonality, and other events that significantly interfere with water quality conditions through diffuse pollution should also be monitored. The data for each cycle or event should be compared and, if they are significantly different, analyzed separately.
However, due to the high costs, the intrinsic difficulties of monitoring and the lack of planning, the typical monitoring
and comparison with expected values as a function of m and n (SIEGEL; CASTELLAN JUNIOR, 2006).

In the unilateral case H 0 is rejected in favor of the presence of an increasing trend if Z 0

(r − 1)(k − 1) degrees of freedom, where r is the number of categories and k the number of groups (SIEGEL; CASTELLAN JUNIOR, 2006).

Figure 3. Distribution of sampling frequency (each line is a campaign).

Figure 5. Median and conductivity time series at site IG4.
From IG1 to IG6, increasing trends are concentrated on variables such as N ORG, N T, P T, COLIF FECAL, COND, TURB and pH, indicating the increasing contribution of organic matter in the basin. Decreasing trends are observed in variables such as DBO, P ORG DISS and COD, which represent the products of decomposition.

Table 2. Rejection of H0 (equality between seasons) in seasonality tests.

Table 3. Rejections of H0 (series is random) in randomness test.

Table 5. Rejections of H0 (independence), sites with longer time series.

Table 6. Rejections of H0 (independence), sites with shorter time series.

Table 7. Rejections of H0 (stationarity), sites with longer time series, S statistic.

Table 8. Rejections of H0 (stationarity), sites with longer time series, p-values.

Table 10. Comparison among rejections in tests for flow Q.