Estimating flood recurrence uncertainty for non-stationary regimes

Abstract Assuming non-stationarity in flood frequency models is still controversial due to uncertainty in estimates. In this study, a hierarchical Bayesian framework for flood frequency analysis is presented without assuming the stationarity hypothesis. We account data and model uncertainty in all modelling steps and use the Pardo River, Brazil, as study case. Results showed the presence of increasing trends in floods in Pardo River. The stationary model underestimated floods compared to the non-stationary model. Physical-based covariates models performed better than time-based showing the importance of adding physical covariates to explain the trend behavior. The presented model is adaptable to other case. Finally, this study provided guidance for the flood recurrence estimation under non-stationary conditions.


INTRODUCTION
An accurate characterization of magnitude, frequency, duration, and timing of floods is key for water management (Merz et al., 2012).The estimation of flood recurrence is useful in many applications, including natural stream design, urban zoning and planning, and design of hydraulic structures (Gordon et al., 2004;Montanari et al., 2009).The Flood frequency analysis (FFA) is a method of fitting a probability distribution to a dataset to estimate the future occurrence of an event of interest, allowing us to determine the probability of flood's magnitude and frequency, and therefore, supporting water management practices and policies (Rao & Hamed, 2000;Brunner et al., 2021;Slater et al., 2021).
Traditional FFA methods require flood data set to be: i) independent and identically distributed, and ii) stationary (i.e., statistical properties do not change over time).When assuming stationarity, we're implying that the chosen return period (RP) in the structure's design will be the same by the end of its planned design life (Salas & Obeysekera, 2014).
However, as many studies have identified significant trends in hydrological series, the stationarity assumption does not always support.Trends in hydrological series were already identified in many studies in different regions.The 100-year flood may become more common in many watersheds in the United States (Vogel et al., 2011), trends in magnitude of snowmelt floods were identified in Canada (Cunderlik & Ouarda, 2009), annual streamflow in Europe (Stahl et al., 2012) and downward annual maxima flood trends in the southeast and southwest regions of Australia (Ishak et al., 2013).In Brazil, Detzel et al. (2011) identified trends in inflow series of hydroelectric plants.However, a more recent study by Berghuijs et al. (2017) found smaller increases in the frequency and magnitude of extreme floods in Brazil compared to Europe and the United States when analyzing multi-continental changes.Bartiko et al. (2019) observed more pronounced trends in the frequency rather than the magnitude of floods on an annual scale in Brazil.Despite this, the authors noted that flood events are becoming more frequent and intense in regions of Brazil with wet conditions and less frequent and intense in drier regions.
Although it is difficult to distinguish the exact cause of trends, researchers have identified several potential drivers, including atmospheric factors such as natural climate variability and human-induced climate change, catchment characteristics such as urbanization and land use changes, and alterations to rivers and streams resulting from in-stream channel engineering (Merz et al., 2012, Bloschl et al., 2015).According to Milly et al. (2008), anthropogenic effects on Earth's climate and its consequences on water cycle makes it unfeasible to consider the stationary assumption.
Conversely, some authors (Montanari & Koutsoyiannis, 2014;Serinaldi & Kilsby, 2015) argue that stationarity should be the default assumption in hydrologic series modeling due to uncertainties added to the nonstationary model, affecting its reliability, as well as the difficulty to demonstrate the main driver in hydrologic series changes.Therefore, these authors suggest there should be strong evidence of changes in extreme events before assuming non-stationarity.
To assess uncertainty in non-stationary models, the Bayesian analysis seems a promising approach.The Bayesian framework considers the addition covariates and, therefore, reduce uncertainties (Serinaldi & Kilsby, 2015).Some of the usual FFA methods, such as the Maximum-Likelihood may have some drawbacks related to instability in the likelihood function of small sample sizes, leading to estimates with high bias and variance (Hosking et al., 1985).
Numerous studies in FFA (Martins & Stedinger, 2000;El Adlouni et al., 2006;Gado & Nguyen, 2016;Bhat et al., 2019) use model selection methodologies that penalizes models with higher number of parameters (Burnham & Anderson, 2004) instead of considering the direct uncertainty caused by data and model parameterization (Vehtari et al., 2017).Bayesian studies often does not integrate over the posterior of parameters by considering point estimates like the posterior mean (Šraj et al., 2016).Bootstrapping is a common approach to account for prediction uncertainty by calculating confidence intervals (Serinaldi & Kilsby, 2015;Gado & Nguyen, 2016).However, its use is not straightforward for the non-stationary case, when we commonly have hierarchical data structure (e.g., precipitation), whereas the bootstrap was originally designed for data modelled as i.i.d (Field & Welsh, 2007;Ren et al., 2010).Additionally, bootstrap does not consider informative priors, a drawback as no expert knowledge is used to narrow model uncertainty in a world where hydrological data is scarce.
Therefore, due to the necessity to better understand uncertainty in non-stationary flood recurrence credibility intervals, we present a hierarchical Bayesian model for flood risk estimation of non-stationarity rivers using GEV distribution.First, we provide guidelines to FFA under a fully Bayesian approach.We use the GEV distribution as likelihood in our Bayesian model and set its parameters to vary depending on covariates.Secondly, using the hydrological data of Pardo River watershed in southeast Brazil, we assess the proposed method by analyzing the flood recurrence intervals under the non-stationary assumption.We incorporate the use of precipitation information as explainable variable of non-stationarity and compare with the time covariant (in years) case.

MATERIAL AND METHODS
We used Bayesian inference for flood estimation (Figure 1).We used annual maximum streamflow and GEV as likelihood for all models.We set time and precipitation as covariates (99 th quantile and total annual precipitation) and compared results Gomes et al.

3/12
to stationary model.After sampling from Markov Chain Monte Carlo (MCMC), we assessed model convergence and quantified the goodness of fit for all tested models.We selected the Pardo River as study case to assess model performance.

Bayesian theory and MCMC
The Bayesian theorem involves calculating the probability of a set of parameters (θ ) given a set of observations (X ).Unlike frequentist approaches, which treat the variable of interest as a constant value, Bayesian theory considers it as a random variable to account for the uncertainty in its exact value.This uncertainty is represented by a probability distribution that can be calculated from the observations.Some advantages of Bayesian method are its ability to provide the parameters' full posterior distribution, instead of only a point estimate, which allows the evaluation of uncertainties in the model, known as credible intervals.
To employ the Bayesian approach, we integrate the prior distribution, P(θ), which represents our prior knowledge about the parameter θ, with the likelihood of the dataset, P(X|θ).In cases where we lack robust prior information about the parameters, it is advisable to choose an uninformative prior distribution, which can be uniform or flat.Conversely, if we possess sufficient background information, we can set informative priors to better inform the analysis.The Bayes' theorem (Equation 1) is expressed as follows: is the posterior distribution, or the probability of the hypothesis given the data or evidence X ; ( ) P θ is the prior distribution; ( | )  P X θ is the likelihood function, or the probability of the data under the hypothesis, and it is usually given by a known distribution (for instance, GEV distribution); ( ) P X makes the posterior distribution of probabilities for each hypothesis sum to one, and for this reason, is called normalizing constant.
For simple problems ( ) P X can be obtained analytically.However, in non-trivial cases where the parameter vector is large, computational methods are required such as the Markov Chain Monte Carlo (MCMC; Gelman et al., 2013).
MCMC construct a Markov Chain whose stationary distribution is the target distribution (i.e., Posterior) after a large number of steps.Among the several MCMC techniques, the Metropolis Hastings (MH) (Metropolis et al., 1953;Hastings, 1970) has been widely used in hydrology studies (Han & Coulibaly, 2017).It consists of sampling from a target distribution by making a random proposal for new parameter values and deciding if those values are incorporated to the chain or if they are discarded based on defined criteria.This procedure is repeated until the desired number of iterations is achieved, attaining the samples from the target distribution in the accepted parameters values.Estimating flood recurrence uncertainty for non-stationary regimes Nonetheless, a characteristic and drawback of MH is its own randomness, called the "random walk" since it tends to waste computational time or explore narrow regions of the parameters space in complex problems.Hence, Hamiltonian Monte Carlo (HMC) algorithm is a more efficient way to explore the parameter space avoiding the random walk and sensitivity to correlated parameters (Hartmann & Ehlers, 2017).Traditional HMC requires the specification of two parameters, and it is highly sensitive to them: step size and number of steps.In this matter, the No U-Turn Sampler (NUTS) use a recursive algorithm that eliminates the need to set a number of steps.NUTS performs equally or more efficiently than traditional HMC (Hoffman & Gelman, 2014).
In this work, we used the the Probabilistic Programming framework written in Python, PyMC3 (Salvatier et al., 2016).

Stationary and non-stationary models
The Generalized Extreme Value (GEV) distribution is commonly used to model annual streamflow maxima (Katz et al., 2002).It includes three distribution families according to the shape parameter (κ ): Gumbel ( 0 κ = ), Fréchet ( 0 κ < ) and Weibull ( 0 κ > ).GEV distribution in its cumulative form (Equation 2) is represented by: ( ) where when 0 κ > (Weibull).µ, a, κ are the location, scale and shape parameters, respectively.To calculate quantiles with non-exceedance probability P, the inverse of the cumulative distribution function is used.
Reasonable shape parameters lie between -0.5 and 0.5 due to GEV restrictions regarding skew and variance.Due to these mathematical restrictions and based on hydrological experience, it is recommended a ( ) prior centered in -0.1 (Martins & Stedinger, 2000).Moreover, for heavy tailed cases, most common in hydrological practice, the choice of parameters 6 and 9 gives better quantile estimates (Park, 2005).For location and scale, when no information is available, uninformative parameters are chosen.

Convergence checking
As in any other optimization method, the convergence of the MCMC process must be analyzed for a proper inference, as we are assuming that the samples are derived from the true posterior distribution.We can determine the minimum number of samples required to ensure a reasonable approximation.There are several known methods to perform convergence diagnostics.Once no method works in every case, it is recommended a combination of strategies to evaluate MCMC sampler convergence (Cowles & Carlin, 1996;El Adlouni et al., 2006).In this study, we chose Gelman and Rubin (Gelman & Rubin, 1992) and Geweke (Geweke, 1991) tests due to their ease interpretation and wide use in the literature (Najafi & Moradkhani, 2013;Nguyen et al., 2014;Seidou et al., 2012).
The Gelman and Rubin test is based on the comparison of the variation within the chains in relation to the total variation across the chains for all the iterations.Several Markov chains are run in parallel and the square root of the ratio between average within-chain and average between-chains variance of the likelihood are computed during the iterations.This ratio should ideally be equal to 1 if convergence is achieved.
The Geweke test splits the MCMC resulting chain in two parts and compare their mean.If the chain has converged, their mean should be similar.With this test is also possible to determine the burn-in period of the simulation (or how many initial iterations to discard), which would be equivalent to the smallest early portion of the chain that passes the diagnostic (El Adlouni et al., 2006).

Model selection
Cross-validation (CV) is primarily used to estimate model performance when limited data is available.In the standard approach, called k-fold CV, we split the dataset in k smaller sets.For each subset we sampled a group as a holdout dataset and took the remaining groups as a training dataset.We fitted the model using the training set and then, evaluated the model using the discarded set.In the Leave-One-Out cross-validation (LOO), we iterated through each data point to test model performance.
To implement K-fold cross-validation we need to partition the data repeatedly and fit the model on every partition, which is computationally costly.To approximate LOO, importance sampling we used the Pareto Smoothed Importance Sampling (PSIS) (Vehtari et al., 2017).One advantage of PSIS-LOO is that the method is more robust than the Watanabe-Akaike information criterion (WAIC) in the finite case with weak priors or influential observations.As PSIS-LOO accounts for the whole posterior it is a fully Bayesian method, unlike AIC or DIC.Therefore, we used PSIS-LOO to assess prediction accuracy.Models with lower scores are preferred than the others.One concern regarding using LOO is the fact that we are using "future" data to validate past data, as we hold-out one data at a time for validation (Bürkner et al., 2020).However, LOO is able to select a model which can describe well the structure in the time series.The main assumption we have to make is that it is likely that such model we choose as the "best" would also make good predictions for future data as it did for the calibration data.

CASE STUDY
The Pardo River Basin was selected to assess the presented model in this study.Streamflow and precipitation records were obtained from Hidroweb from the Brazilian National Water Agency (ANA).Based on non-stationarity evidence in literature (Santos et al., 2016) we selected streamflow data from the Clube de Regatas station (code 61834000), located in the city of Ribeirão Preto, São Paulo Brazil.The streamflow timeseries correspond to 1941-2014 period.Data from nearby precipitation gauge station (code 02147004), was used.We considered a 5% maximum gap to include each year into the used dataset.After this pre analysis, and to pair both streamflow and precipitation data, we considered the period of 1945-2012.
Gomes et al.

5/12
Annual flood time series, annual precipitation percentile 99 th , and total annual precipitation were grouped from the time series dataset.All three variables indicate an increasing trend (Figure 2).
All models in this study use GEV distribution as likelihood in the Bayesian framework.We chose to vary the location parameter in function of time and precipitation.The shape parameter was assumed as constant in time for the principle of parsimony.The models are summarized as: • GEV µ a κ : A stationary case, where all parameters are constant; • ( ) : A non-stationary case, where the location parameter µ) is linearly dependent on the total annual precipitation.Time (t ) was counted from the beginning of the record for 1 GEV .
As indicated in section 2.2, Beta prior for the shape parameter was used for all models.For all other parameters and hyper-parameters uninformative priors were set.Details on distributions and boundaries are presented in Supplementary Material S1.
To make predictions we extrapolate covariates.As for 1 GEV covariate is time, extrapolation is evident, for the precipitationbased models we performed a linear regression of recorded precipitations and time to forecast the near-future precipitation data using the assumption floods will change in the same rate as in the recorded period.To guarantee the chain has converged to its equilibrium value, we drawn multiple long chains and compared marginal parameter densities when estimating FFA.To choose the "long chain" quantity we checked different (computationally feasible) chain lengths from 1,500 to 10,000 and respective impacts on posterior summaries such as Highest Posterior Densities (HPD; most likely values of the parameters with a fixed probability, e.g., 95%), mean and median values.There is no rule to choose chain lengths, but there was no evidence of longer chains needed in our sensitivity analysis.In our study, all trace plots converged (Figure 3).Parameter estimates do not move quickly from the central tendency after a few iterations from the initial value.This pattern means we have a good starting value (other trace plots not shown here).Chain length values can also be chosen depending on burn in period identified by model converge diagnostics.Geweke and Gelman and Rubin tests showed convergence for a burn-in-period of 500 iterations and chain length of 2500 (parameters not shown here).Multiple chains were generated and then retested using convergence diagnostics.They did not show considerable variation in terms of convergence parameters.Due to the results of Geweke and Gelman and Rubin tests, we assume that the model achieved convergence.It is important to note that if some of the above methods fail convergence detection, the alternative is to try a different parametrization strategy, such as different priors and respective hyper parameters, or resetting model structure.Changing to more informative priors, rebuilding model setup, and even gathering more data might improve model convergence.

Parameter estimation
Table 1 shows the parameter estimation results for the studied models considering time and rainfall-based covariates.We found similar shape estimations (negative and close to zero) regardless of the studied model.For all the non-stationary models we found the angular coefficient for the location parameter 1 m ) to be positive not only for the posterior mean, but for the whole considered HPD.This confirms our suspicions of an increasing trend in Pardo River.In Compared to data mean (650 m 3 /s) it represents approximately 0.9% of the flow discharge per year.For hydraulic structures with long design lifespan this increase could be impactful under stationarity for the whole design period.GEV performed best (964.6 over 967.9), but errors overlap.Taking in to account the uncertainty in their estimation, the PSIS-LOO seems quite flat across these two models, meaning uncertainty mostly overlaps across models.This indicates that even though 1 GEV performed best, it does not differ much from the stationary model in terms of PSIS-LOO.In this case, model complexity could not outweigh its benefit in absolute fit than the simpler model (even when there is an established increase trend).PSIS-LOO should guide our decision about the best model.However, it should not be the only metric to decide which model to effectively choose.Also, one limitation from PSIS-LOO or others cross-validation metrics is that they can only provide a relative test of model quality.In other words, PSIS-LOO cannot answer if the model is good in absolute sense (Vehtari et al., 2017).This means that we are comparing which model performs best with these specific chosen parameters.To find more informative priors and test different covariates that may improve model accuracy is one of the main focus on Bayesian Flood Frequency Analysis (Han & Coulibaly, 2017).

Comparison and prediction
Figure 5 shows flood prediction from years 2022-2050 to the 2 and 100 years return period for our models.2 and 100 years were chosen to represent a common and a rare (and potentially catastrophic) event.
1 GEV overestimated more than other non-stationary models and presents higher uncertainty even for low return period values such as the 2-year presented.As LOO accounts for the whole posterior, the higher uncertainty in 1 GEV agrees with higher LOO values for this model compared to the precipitation-based ones.

1
GEV also shows higher differences in trend attenuation (the slope hyperparameter) then the other non-stationary, which shows the value of adding physical-based covariates to non-stationary models for narrowing uncertainty.The 99 th quantile for the maximum predicted precipitation for year 2050 is 60.6 mm, which means a discharge of 1552 m 3 /s for the expected 100 years flood considering the posterior mean of our model.This value is 2.1% higher than the predicted value using 1Ptotal GEV for 2022-year, 1520 m 3 /s.All non-stationary models that have time as covariate, are extrapolations of the trend pattern and infer about events that have not been observed.However, there is no guarantee that the increasing pattern of floods in Pardo River will hold indefinitely.Future data may indicate change in the non-stationarity intensity.Therefore, it is important to identify what are the main drivers of non-stationarity to predictions (Viglione et al., 2016).
Considerations on the time frame predictions must be taken parsimoniously, while prediction data include a new layer of uncertainty.Even though a linear regression is fair for comparing different models, it severely underestimates the variance of the series and is unsuitable for a long-time ahead extrapolation.An alternative is using Regional Climate Models (Giorgi, 2019), which can provide more reliable information of climate-related variables in a changing climate (Gebrechorkos et al., 2019).It is also important to note that to design bigger hydraulic structures such as dams, we need to estimate decamillennial floods.In the proposed framework, it would require covariates to be extrapolated 10,000 years in the future as well, which would clearly incorporate more uncertainty to the problem.
Our results showed significant differences in quantile estimates between stationary and non-stationary models, with lower discharge values for the stationary case.Unjustified stationarity assumptions could lead to an increase in risk failure of hydraulic structures.Our results suggest that precipitation is a good covariate to explain the non-stationarity characteristic in Pardo River corroborating with other studies (Prosdocimi et al., 2014;Šraj et al., 2016;Viglione et al., 2016 ).(Zhang et al., 2019;Dong et al., 2019;Liu et al., 2014).Using multiple explanatory variables can be an attractive option because it may help to better explain the variability in the data.However, it is important to study these variables carefully since including more variables does not necessarily lead to a reduction in uncertainty (Zhang et al., 2019).
As the integration of physical variables into non-stationary FFA methods seems promising, they still rely on extrapolation as in the time-based 1 GEV and have their own forecast uncertainty.The Bayesian method can deal with this uncertainty by adding a new level in model hierarchy.Naturally, this adds complexity to the model and their implications on how to model uncertainty should yet be explored which is out of the scope of this study.The representativeness of river discharge time series also must be considered when performing FFA.Small size and insufficient datasets are associated with even higher uncertainties and consequently unreliable estimations.

CONCLUSIONS
The Bayesian framework for flood forecast presented in this study is flexible and adaptable to all cases.By presenting uncertainty in prediction return periods decision makers have the option to be even more cautious on flow return levels when designing hydraulic structures.The addition of covariates and expert knowledge to update priors were extremely valuable to increase model accuracy and make better predictions.
The limitation of the research work is still the uncertainty in extrapolating trends, as examining the assumption of the stationarity.Several aspects from the future work can be tackled from this study: consideration of regional data to better estimate priors, evaluation of Bayesian model flexibility and estimation uncertainty regarding the combination of multiple physical covariates, understand the role of other physical variables such as radar data, climate data and remotely sensed data.

Figure 1 .
Figure 1.Workflow and summary of used techniques.

Figure 3
Figure 3 presents the trace plots for the

Figure 2 .
Figure 2. Time series in the Pardo River of discharge, annual 99 th quantile precipitation and total annual precipitation.In gray, linear regression for time series.

Figure 3 .
Figure 3. Trace plots and posterior densities of GEV 1 parameters in Pardo River.Different colors represent multiple simulated chains.

1
GEV the hyperparameter 1 m can give us an indication of how much flow discharge increases per year.
log-score multiplied by -2 (lower Deviance is better).The model that has total annual precipitation as covariate resulted as the best model in our study considering the cross-validation analysis.Comparing 0 GEV and 1 GEV in terms of the PSIS-LOO score, 1

Figure 4 .
Figure 4. Models' comparison in terms of and WAIC.Empty circle represents the values of LOO and respective black error bars associated with them are the values of the standard deviation of LOO.Gray plots correspond to Watanabe-Akaike Information Criterion (WAIC).

Table 1 .
Summary of the posterior estimates for all four models using time and rainfall-based covariates.Estimated parameters (posterior mean) are followed by 94% credible intervals (in brackets).