Physical-based time series model applied on water table depths dynamics characteristics simulation

Time series modelling applied to study water table depths monitoring data is an elegant way to model irregular and continuous data. When successive observations are dependent, future values may be predicted from past observations, and target parameters can be estimated. These may include expected values of groundwater levels, or probabilities that critical levels are exceeded at certain times or during certain periods. These target parameters are estimated with the purpose of obtaining characteristics of the development of a certain domain in time and such characteristics can, for instance, be extrapolated to future situations. In a system identification approach, is it possible to establish the dynamic relationship between water table perturbations and climatological events, vegetation, hydrogeological local conditions, management and groundwater abstraction. The aim of this work was demonstrate the use of a physical-based time series model to stablish the relationship between precipitation and water table depths from hydrogeological monitoring data. The results enabled to infer about water table dynamics even when it is affected by different climatological patterns, simulating mean, maximum and minimum states.


INTRODUCTION
A common thread in environmental studies is the need for methodological tools capable of describing and predicting these complex and, typicall high-dimensional processes (CRESSIE; HOLAN, 2011). Environmental time series studies are fundamental to a large objective of sustainability and adaptation. To know how, and at last analysis, why enviromental processes changes over time gives governs and police makers a means of a rational and informed decision making processes.
A general class of time series models based on linear regression are described by Box and Jenkins (1976) while Hipel and McLeod (1994) present several applications of these methods in hydrology, as well extension developed between both publications. Even with recent physical-mechanistic spatially distributed models developements, simple time series models continue being used in a large variety of applications, but they do not stop evolving with collaborations of disciplines such as systems identification, engineering control, signal processing and filtering in their developments . The advantages of time series models over more complex models are their accuracy, ease of construction and robustness of the statistical bases that govern the models. This is mainly due to time-series models considering the system as a whole, while other models are based on the representative elementary volume (REV). When considering REV, the modeler automatically impute the notion of spatial scale to modeling, which requires a lot of data to solve the model.
Applications of time series analysis can be used to fill and complete irregular hydraulic head series from precipitation data (YI;LEE, 2003); to evaluate the interaction between groundwater and surface water (HATCH et al., 2006), or even be extended to the evaluation of multiple stresses like baseline flow, pumping, vegetation, climate, dams, in the behavior of groundwater levels (VON ASMUTH et al., 2008). More complex time series models can estimate groundwater recharge (YIHDEGO; WEBB, 2011), capture nonlinear soil drainage behavior (PETERSON; WESTERN, 2014) or even define recharge response time from precipitation (HOCKING; KELLY, 2016;MANZIONE et al., 2017).
The aim of this work was to use the dynamic relationship between precipitation and groundwater levels oscillation established by a time series model to simulate groundwater characteristics, such as medium and extreme levels, from hydrogeological monitoring data, in order to provide relevant information for the planning of water use and rational use of aquifer systems.

TIME SERIES MODELLING
Time series models are a systematic and empirical way to estimate and predict the temporal behaviour of a dynamic hydrological process. The initial aim of a time series analysis is the realization of inferences about the properties or basic characteristics of the generator mechanism of the stochastic process of the observations of the series (MANZIONE, 2015). After its formulation, the mathematical model is used to test some hypothesis or theories related with the process and make prediction of future values of the time series. The correct representation of the hydrological response determines a relevant point of the study to be considered in the management of these resources. Several stochastic methods and modelling schemes have been developed for hydrological processes. In temporal stochastic processes, the future is determined only partially from past values. Exact predictions are impossible to obtain and should be replaced by the idea that future values may occur within a given range of probabilities that is conditioned on the knowledge of past values. The future state of a stochastic process is predicted from a set of realizations, which can be described as the result of a probabilistic experiment. Predicting the future state of water resources is necessary for the application of water resources policies and decision-making systems in real time. The lack of knowledge about the physical processes of the hydrological cycle has made the application of statistical methods increase in the field of predictions and synthetic data generation in the incorporation and uncertainties and analysis of extreme events (KARAMOUZ et al., 2013).

TRANSFER-FUNCTION NOISE (TFN) MODELS
Transfer-function models relate the behaviour of the series under investigation to present and past values of one or more series. If another variable is affecting the value of the variable under study Zt, the effect of this variable can be verified using a transfer function model. One or more deterministic transfer components and a noise component are distinguished, being additive parts of the model. The transfer component describes the part of groundwater level that can be explained by an input consisting of a linear transformation of a time series of that input variable. The noise model describes the autoregressive structure of these differences between observed values of the aquifer level and the sum of the transfer components. The input of the noise model is a series of independent and identically distributed disturbances with zero mean and finite and constant variance, which is a white noise. The general form of a noise transfer function model can be seen in Figure 1,   The basic idea behind this transfer-function noise modelling is to divide the observed series (output) into a number of components related to causes (inputs) that influence the phenomenon, and an unknown noise component. TFN models are generally applied to distinguish natural and anthropic components from groundwater series (VAN GEERA;ZUUR, 1997). If an input series {Xt} is considered, the TFN model is defined as: is the transfer component, and: is the noise component. The subscript b is a pure delay, which corresponds to the number of time steps after an input action causes a reaction at the output of the system. The extent of this for multiple input series is direct. The transfer component of Equation 2 can be rewritten as: The weights 0 1 2 , , ,... ν ν ν form the impulse-response function ( ) where ω and δ are the coefficients of the impulse-response function B. The theoretical impulse-response (IR) function reflects the same autoregressive and moving average characteristics of a theoretical autocorrelation function (ACF) (HIPEL; MCLEOD, 1994). In models that empirically explain the influence of the hydrogeological regime on a phenomenon related to it, the number of explanatory variables (inputs) should be kept as low as possible, in order to avoid overestimated behaviors or coincident correlations (VON ASMUTH; KNOTTERS, 2004).

MODELLING WATER TABLE DEPTHS USING THE PIRFICT MODEL
The behaviour of a linear input and output system can be completely characterized by its IR function (ZIEMER et al., 1998;VON ASMUTH et al., 2002). For water table depths, physical-mechanistic models of groundwater flow can explain the dynamic relationship between the precipitation incident in an area and the response in the groundwater levels. However, predictions about water table depths can be provided and obtained by much less complex TFN models, being generally as accurate as those obtained by deterministic models (KNOTTERS, 2001). In TFN models, one or more deterministic transfer components and a noise are determined as additive components. The transfer components describe the part of the process of water levels oscillation that can be explained by the input series (precipitation, evapotranspiration, pumping, abstractions, river flow, among others) from a linear transformation of these input series. The noise model describes the autoregressive structure of the differences between the observed levels and the sum of the transfer components. The input of the noise model is a series of independent and identically distributed perturbations, with zero mean and finite and constant variance, which is white noise. The PIRFICT (Predefined Impulse Response Function In Continuous Time) model is an alternative to TFN models at discrete time intervals presented by Von Asmuth et al. (2002). In the PIRFICT model, the input block pulse is transformed into an output series by a continuous-time transfer function. The coefficients of this function do not depend on the frequency of observation.

Estimating groundwater systems response characteristics
Assuming linearity in the system, a series of water table depths is a transformation of a series of surplus/deficit precipitation. This transformation is completely governed by the IR function. For the case of a simple linear system with no phreatic disturbances, which is influenced only by surplus/deficit precipitation, the following TFN model (written as an integral convolution) can be used to describe the dynamic relationship between water table depths and surplus/deficit precipitation (VON ASMUTH et al., 2002): The local drainage level d is obtained from the data as follows: with N as the number of water table depths observations. TFN models are identified through the choice of mathematical functions that describe the IR relationship and the autoregressive structure of the noise. This identification can be done in two ways: 1. Interactively: using correlation structures in the available data and model diagnostics; 2. Physically: based on insight into the behaviour of the analysed system. Following the physical system identification, the IR function describes the way in which each water table will respond to a pulse caused by precipitation. In this respect, it is possible to make an analogy to a unit hydrograph (VON ASMUTH et al., 2002), where after a precipitation event there will be changes in the base flow and increase in the surface, subsurface and groundwater flow. A typical IR function resembles a probability distribution function with strong asymmetry (Figure 2).
The area and shape of the IR function depend heavily on in situ hydrological circumstances. Where by chance the resistance to flow near the nearest drainage is low, the water table will show a rapid drop in the levels after a precipitation event and consequently the area of the IR function will be small. This also reflects the memory of the hydrogeological system at a precipitation event, which may be small as in the previous example or large example when the porous media is extensive and the levels deeper. The parameter θ (t) is a Pearson type III distribution function (PIII df) (ABRAMOWITZ; STEGUN, 1965). The option for this type of function is given by its flexible nature, adjusting to a wide range of hydrological responses. Assuming linearity, the deterministic component of the water table dynamics is completely described by the moments of the IR function. In this case, the parameters can be defined according to Von Asmuth et al. (2002): where A, a, n, are the adjusted curve parameters, Γ(n) is the Gamma function, with parameter α controlling the exponential decay rate of φ(t), and σ 2 r is the residuals variance. The PIII df was able to model oscillations in the water tables in a similar and comparable way to Box-Jenkins TFN models, but with many more parameters (VON ASMUTH et al., 2002).
Equation 10 and its parameters present the physical sense embedded in the dynamic relationship between precipitation and response in the aquifer, as described in Von Asmuth and Knotters (2004). Parameter A is related to drainage resistance (the area of the IR function is equal to the ratio of the mean water table depth to the average recharge). The parameter a is determined by the soil storage coefficient (porosity) and parameter n by the convection time and dispersion of the precipitation by the unsaturated zone. The physical bases are explained by transfer functions of a series of linear reservoirs (NASH, 1958). For proposals where it is intended to model the response of a basin in general, the idealization of the basin as a reservoir of linear storage is the most elementary among several levels of conceptualization that involves (BODO; UNNY, 1987). The parameter n shows the number of reservoirs and a is equal to the inverse of the normally used reservoir coefficient. As Knotters and Bierkens (2000) explain, a simple linear reservoir (PIII df with n = 1) is equal to a simple physical model of one-dimensional soil column, discarding lateral flow and the functionality of the unsaturated zone. Von Asmuth and Knotters (2004) call attention that care should be taken when interpreting the parameters of the PIII df, in physical sense, because of their lumped and empirical nature. Further details on the identification of IR functions and the physical interpretation of their parameters can be found in Von Asmuth (2012).

Simulating water table dynamics characteristics
According to Knotters and Van Walsum (1997), examples of statistical characteristics describing the groundwater regime are the mean groundwater level (MGL), mean highest groundwater level (MHGL) and mean lowest groundwater level (MLGL). These statistics are called mean groundwater levels ( VAN HEESEN, 1970;VAN DER SLUIJS;GRUIJTER, 1985). These statistics are used, for example, to analyze or model the relationship between the hydrogeological regime and factors such as crop growth (FEDDES et al., 1988;VAN DAM, 2000), soil conditions or ecological conditions (GROOTJANS, 1985;WITTE et al., 1992;VAN EK et al., 2000). In addition, they are used to generate scenarios about extreme conditions of groundwater levels for rural and urban planning or even public policy formulations ( VAN HEESEN, 1970;BOUCNEAU et al., 1996;BIERKENS et al., 2000;MANZIONE et al., 2010).
Within a system identification approach such as that of the PIRFICT model, it is important that the signal between the oscillation of the groundwater levels and the climatological variables is high frequency so that the TFN can model the groundwater dynamics. Van Heesen (1970) recommends a minimum of eight years of groundwater observations to calculate MGL, while Knotters and Van Walsum (1997) show that there is still considerable variation in MGLs at scales above eight years. Since series of groundwater levels are not available for long periods of time, it is possible from the dynamic relationship between water table depths and climatological variables, such as precipitation, to use this signal to extend the series of observations for the same period of the climatic series. For this procedure, Manzione et al. (2012) proposes the following steps:  2. Realizations of the noise process are generated by stochastic simulation and next added to deterministic series resulting in realizations of series of water table depths. Realizations of the noise process can be generated either by random sampling from a normal distribution with zero mean and residual variance, or by resampling from the fitted residuals; 3. From the previous steps, N realizations of the stochastic simulation are generated. Statistics representing the prevailing hydrologic conditions are calculated from the water table depths probability density function (PDF) for each t instant.

PIRFICT model applications
Groundwater level time series are generally collected manually, tend to be non-equidistant and often containing missing data (VON ASMUTH et al., 2002). As previously noted, the PIRFICT model is capable of dealing with any data frequency because it is continuous over time, with the intervals of the output series not being determined by the frequency of the input series. In addition, the PIRFICT model offers an additional advantage when calibrating TFN models in irregular series, compared to autoregressive models combined with the Kalman filter (KNOTTERS; , since the format of the transfer function is not restricted to an exponential format (VON ASMUTH;BIERKENS, 2005).
Von Asmuth et al. (2008) extend the formulation of the PIRFICT model for multiple input series, such as precipitation, evapotranspiration, river base flow, pumping tests, anthropogenic interventions, inclusion of trends, steps and nonlinearities. For a complete description of the PIRFICT model, its formulation, applications and case studies it is recommend the work of Von Asmuth (2012) and for more studies with field data Yihdego andWebb (2011), Obergfell et al. (2013) and Manzione et al. (2017).
The EEcSB was regulated by Decree 22.337 of June 7, 1984(SÃO PAULO, 1984, which instituted its formation with an area of 4,371 hectares within the limits of the Santa Bárbara State Forest, of which 2,712 hectares of native vegetation (cerrado, Figure 3. Location of the Santa Bárbara Ecological Station (SBES) and State Forrest in the limits of Águas de Santa Barbara municipality, Brazil. Source: Santarosa (2016). marshes and gallery forest) dividing the space with the reforestation with pine and eucalyptus trees. The area is under responsibility of the Instituto Florestal do Estado de São Paulo (Forrest Institute on São Paulo /Brazil).
The geological formations in the region are the sandstones of the Adamantina and Marília Formations, belonging to the Bauru Group, with predominance of the Adamantina formation in the EEcSB (MELO; DURIGAN, 2011;CPRM, 2006). According to Ross and Moroz (1996), EEcSB is located in the Paraná Sedimentary Basin (morphostructure) and in the Western Plateau (morphosculture), with relief forms predominantly of broad and low hills with altitudes between 556 and 705 m (SANTAROSA, 2016).
The climate characteristic of the region, according to Koeppen's classification, is Cwa or tropical subhumid (hot weather with dry winter), presenting average monthly temperatures of 16 °C in the coldest month and 23 °C in the hottest month (CEPAGRI, 2016). The total annual rainfall varies from 1,000 to 2,086 mm, and can reach 30 mm monthly in winter. The average annual temperature is around 18 °C, with maximums in January between 22 °C and 30 °C and minimum temperatures of 18 °C in the coldest month (MELO; DURIGAN, 2011).

Water table monitoring series
For water table analysis, 44 wells were distributed in the Guarantã (14 wells), Bugre (13 wells), Santana (12 wells) and Passarinho (5 wells) sub basins. The wells were monitored on a semi-montly frequency from September 5, 2014 until October 29, 2015, when level measurements were carried out with monthly frequency until September 2, 2016. This change was for purposes since one year after monitoring the wells, it was noticed that there were no large oscillations within 2 weeks. The wells have heterogeneous depth, varying from 2.94 to 7.68 m. Figure 4 shows the layout of the wells in the study area.

Climatological series
The climatologic data used in this study were historical series of rainfall registered in a manual rain gauge installed in the EEcSB recording data since January 1987. The monthly series are the total precipitated in the period, summing up 29 years and 8 months of observations until August 2016.

Data modelling and water table depths simulation
The PIRFICT model was used for data modelling. To calculate the MGL, MLGL and MHGL, 1,000 model realizations were simulated. The analyzes of the PIRFICT model are performed using the Menyanthes software. Further details on input and output data as well as the complete resolution of the PIRFICT model can be found in Von .

RESULTS AND DISCUSSIONS
In general, Shapoori et al. (2015) point out that TFN models simulate an output observed in the model at a given point in time as a weighting of data from recent forces that are influencing levels (e. g. the function and transfer) plus a correlation term for simulated output not explained by the prevailing forces (e. g. the noise).
With the time series that recorded the monthly precipitations in the period between January 1987 and August 2016, the PIRFICT Figure 4. Locations of the monitoring wells at EEcSB. model was calibrated using this input variable for all wells, totalling 44 time series models. The models presented a good fit (Table 1), with a mean explained variance percentage (EVP) of 81.69%, with the worst adjustment being 64.40% and the best adjustment 92.90%.
The root mean squared error (RMSE) and root mean squared innovation (RMSI) values were also considered low, denoting acceptable adjustments. The IR functions were examined to verify the quality of these adjustments, ratifying the EVP, RMSE and RMSI values found, but are not discussed in this article. In the EEcSB it is possible to observe short memory systems with fast responses, varying as a function of the distance to the nearest drainage and thickness of the unsaturated zone, with strong influence of the precipitation and with linear tendencies of elevation in the levels between September 2014 and August 2016. Figure 5 shows in the upper part the calibration of the PIRFICT model for well S2, considered a good example of adjustment since it had an EVP value close to the average of the 44 wells (82.50%). The lower part of Figure 5 shows the oscillation of the precipitation series and how much it was responsible for the elevation of levels from September 2014 to August 2016. Precipitation was responsible for an oscillation of almost 2 meters between November 2014 and March of 2016, with an inflection in the curve in the winter of 2015.
From these adjustments, 1,000 model realizations were simulated for each well and then the MGL, MLGL and MHGL were calculated (Table 2).  The simulation presented moments of more superficial levels, others of deeper levels, agreeing with the precipitation patterns. There was a difference of 1.16 m between the mean values of MGL and MHGL. Based on the extreme values, the difference between the MLGL and MHGL minimum values ranged from 3.08 to 5.01 m. These values can assist planners and decision makers in estimating well depths, location of water-dependent activities, estimating exploitable volumes, and even artificial recharge practices. From such information, water systems can become more resilient to extreme events, avoiding adverse effects on urban supply, economic activities, agriculture and the environment. Figure 6 shows an example of the simulation performed for well S2. The dots in red are the observations, the green lines superimposed on the 1,000 simulated series, bounded by dashed lines representing the limits of the confidence intervals above and below 5%. The width of the intervals was approximately 1 m between the maximum and minimum values. In general, the simulated series remained within the established intervals, not denoting bias of the simulations.
Since the monitoring period of the simulated wells comprised two periods of climatic anomalies (end of the drought -2014and ENSO 2015, we compared the values of MGL only calculated for the period of the series with the simulated values, denoting a good relationship as can be seen in Figure 7. The coefficient of determination of the regression line (R 2 ) between simulated and observed characteristics was 0.89, presenting a Pearson correlation coefficient of 0.94. In addition to the MGL calculation, the simulation allows to calculate probabilities of occurrence of extreme levels from the confidence intervals (MANZIONE et al., 2010). This good agreement of the data shows that the simulation of the PIRFICT model from the relationship between the precipitation and water table levels oscillation reproduced the main characteristics of the groundwater dynamics in the EEcSB during the study period. The uncertain nature of the data also gives rise to errors. There may be uncertainties related to the data, groundwater levels observed, sensor accuracy,  climatological bases, measurements, approximations and estimates in general, other than those associated with the description of the phenomenon in question by the model and its calibration. To estimate more realistic scenarios, it is recommended to use longer series, avoiding periods of climatic anomalies such as those occurring during the monitoring of the levels in the EEcSB, in order to avoid distortions in the output series.

CONCLUSIONS
Based on the analysis performed in this study, it can be concluded that: • The transfer-function noise in continuous time PIRFICT model presented good adjustments when using monthly rainfall series; • The system identification revealed systems with a short memory, fast response and with strong seasonal influence of the precipitation; • The water table depths simulation from longer series of precipitation made it possible to calculate the statistics on groundwater levels in the EEcSB, respecting the established confidence intervals, without bias in the estimates; • The patterns found in the simulations are consistent with the precipitation input series, presenting moments of more superficial levels, and others of deeper levels.