Hydrological modeling in a basin of the Brazilian Cerrado biome

The Brazilian Cerrado biome (BCB) is among the 25 biodiversity hotspots identified worldwide and covers the recharge areas of important aquifers and rivers in South America. Increasing deforestation has been threatening water availability in this region. To assist water-resource management in the BCB, this study models the daily streamflow in a basin of the Cerrado using two approaches: a process-based model (Soil and Water Assessment Tool, SWAT) and a data-driven model (Artificial Neural Network, ANN). The performance of the models was evaluated using the Nash-Sutcliffe coefficient (NSE), the coefficient of determination (R²) and flow duration curves (FDC). The results indicate that the SWAT (NSE ≥ 0.61; R² ≥ 0.68) and ANN (NSE > 0.91; R² ≥ 0.79) models are suitable tools for daily streamflow modeling in the studied basin, with the ANN model being the more accurate. Based on the FDC, the ANN model was also better than the SWAT model for all frequencies evaluated. Thus, the ANN model is a promising new approach for daily streamflow modeling in this region. Moreover, the results of this study can help water-resource managers in planning and implementing appropriate water allocation and conservation measures in the Brazilian Cerrado biome.


INTRODUCTION
The Brazilian Cerrado biome is one of the most important areas in South America: it covers the recharge areas of important aquifers and rivers and spans approximately 204 million ha, which corresponds to 24% of Brazil (Medrado and Lima, 2014), and it is recognized as the "cradle of Brazil's water" (Lima, 2011). The biome is also one of the 25 biodiversity hotspots in the world, with high biological diversity and endemism. It has suffered loss of vegetation due to agriculture and pasture expansion (Silva et al., 2006), which has resulted in a large number of endangered species (Myers et al., 2000; Rodrigues et al., 2020). These agriculture and pasture expansions also threaten the streamflow from watersheds (Silva and Bates, 2002); however, the understanding of their impacts on streamflow in the Cerrado biome is still limited (Beuchle et al., 2015). Improving the knowledge base on hydrological modeling in the Cerrado biome is therefore important for water-resource management, since it allows the quantification of current and future water availability, which is essential to ensure water security and economic development.
Hydrological models provide a representation of the processes in the hydrological cycle of a basin and help in understanding, predicting and managing water resources (Devia et al., 2015). Among the various hydrological models that have been developed, the Soil and Water Assessment Tool (SWAT) is one of the most widely applied for basin-scale simulation around the world. SWAT is a semi-conceptual and semi-distributed hydrological model developed by the Agricultural Research Service (ARS/USA) and Texas A&M University in the early 1990s (Arnold et al., 1998). SWAT has been used to predict streamflow time series, hydrological processes and water balance, and to evaluate the impacts of climate change, land-use change and different management practices on the surface hydrological cycle, sediment yield and water quality (Abbaspour et al., 2007). Many researchers have applied the SWAT model to different basins. Pontes et al. (2016) applied the SWAT hydrological model to estimate daily and monthly discharges for the Camanducaia River Basin, Brazil, and to evaluate the performance of these calibrations in a contiguous drainage basin. Monteiro et al. (2015) used two precipitation grids, CFSR (Climate Forecast System Reanalysis) and WFDEI (WATCH Forcing Data methodology applied to ERA-Interim), as inputs to a SWAT model for river discharge simulation in the Tocantins catchment, Brazil. Choubin et al. (2019) compared the SWAT and Identification of Hydrographs and Components from Rainfall, Evaporation, and Stream (IHACRES) models in streamflow regionalization for the Karkheh River Basin, Iran. Rodrigues et al. (2020) applied a SWAT model to simulate the monthly streamflow for three basins of the Brazilian Cerrado biome. Alvarenga et al. (2020) compared the SWAT and Variable Infiltration Capacity (VIC) models in monthly streamflow simulation for the Verde River Watershed, located in Minas Gerais state in southern Brazil.
However, despite the wide use of the SWAT model, it requires a large amount of temporal and spatial data, maps, and input parameters, that are sometimes hard to predict (Makwana and Tiwari, 2014).

Hydrological modeling in a basin of the … Rev. Ambient. Água vol. 16 n. 1, e2639 - Taubaté 2021
Therefore, it is not applicable (or presents a worse performance) to basins in which input data for hydrological modeling are not available (Yaseen et al., 2019). This is the case for most of the Brazilian Cerrado biome basins, where there is a significant lack of data (Nóbrega et al., 2017).
An alternative to conceptual models is the Artificial Neural Network (ANN), which is an empirical model capable of learning nonlinear relationships between the variables of a process and relating inputs and outputs without requiring a detailed understanding of its physical characteristics. This feature makes the ANN an effective tool for modeling complex hydrological processes (Talebizadeh and Moridnejad, 2011). Thus, ANNs have been widely applied to solve several water-resource problems and were found to be a powerful tool for streamflow simulation (Aichouri et al., 2015). Kothari and Gharde (2015) applied an ANN for the streamflow modeling of the Savitri catchment, India. Zhou et al. (2018) forecasted the monthly streamflow of the Jinsha River by using three ANN architectures: extreme learning machine, radial basis function network, and Elman network. Vilanova et al. (2019) applied an ANN to simulate daily streamflows for Brazilian Atlantic Rainforest basins. Papalaskaris (2020) applied an ANN for the daily low streamflow forecast of Iokastis Stream, Kavala City, NE Greece, NE Mediterranean Basin. However, no study has applied the ANN model to simulate streamflow in basins of the Brazilian Cerrado biome.
The implementation of models with different approaches to the same problem allows exploring the advantages and disadvantages of the models and finding the best and most efficient structure for a given region. Few studies have compared SWAT and ANN models for streamflow estimation (Tan et al., 2020). Demirel et al. (2009) assessed the performance of SWAT and ANN models for daily flow forecasting in the Pracana Basin, Portugal, and determined that the ANN model yielded the highest accuracy ratio. Noori and Kalin (2016) assessed the performance of SWAT and ANN models for daily runoff simulation in the Gwinnett and Atlanta watersheds, and determined that the ANN model showed the better performance. Jimeno-Sáez et al. (2018) assessed the performance of SWAT and ANN models for daily streamflow simulation in different climatic zones of Peninsular Spain, and reported that the ANN model performed better for maximum values, whereas the SWAT model performed better for minimum values. To the best of our knowledge, no study has compared the SWAT and ANN models for streamflow estimation in a Brazilian basin. Thus, such assessments are still limited in the literature and require further investigation.
In this context, to assist water-resource management in the Cerrado biome, this study assesses the suitability of the SWAT and ANN models to simulate the daily streamflow in the Manuel Alves da Natividade River Basin (MRB), located in the Brazilian Cerrado biome, and compares their performances to determine which is more appropriate for the studied basin. The MRB was selected because of its great importance: it serves several hydroelectric projects, and one of the major irrigation projects in Brazil, the Manuel Alves irrigation project, is installed in it (Tocantins, 2012). This study provides a basis for local public administrations to improve basin management strategies in the MRB. It may also help extend this line of research to other regions of the world.

Study area
The Manuel Alves da Natividade River Basin (MRB) has a drainage area of 14,344 km² (Figure 1) and is one of the main sub-basins of the Tocantins-Araguaia River Basin (TARB), which in turn is the largest basin located entirely within Brazilian territory. The MRB lies fully within the Cerrado biome, and the streamflow it produces directly feeds the Luís Eduardo Magalhães hydropower plant, which has an installed capacity of 903 MW (ONS, 2020); the MRB corresponds to 25% of the plant's incremental contributing area. It also feeds the Manuel Alves irrigation project, with an irrigable area of 20 thousand hectares, which is one of the major irrigation projects in Brazil (Tocantins, 2012). The MRB is located in southeastern Tocantins state, northern Brazil, between latitudes 11°09'45" and 12°14'54" S and longitudes 46°33'04" and 48°18'40" W, and is bounded by the basins of the Palma River to the south, the Balsas River to the north, the Tocantins River to the west and the São Francisco River to the east. According to the Köppen classification, the basin climate is Aw (tropical savanna), characterized by a rainy summer and a dry winter (Kottek et al., 2006). The average annual precipitation is 1500 mm and the average annual temperature is 25°C.

Soil and Water Assessment Tool (SWAT)
SWAT is a physically based, semi-distributed hydrological model that is free of charge (available at http://swat.tamu.edu/) and widely used across the world. To execute the model, various inputs are required, such as hydrometeorological data, soil and land-use maps and a digital elevation model (DEM) of the basin. The SWAT model divides the basin into multiple sub-basins based on the river network and topography. Each sub-basin is further divided into Hydrologic Response Units (HRU), which consist of unique combinations of soil class, land use and slope (Arnold et al., 1998). SWAT simulations are based on the water balance; they are performed for each HRU and accumulated to obtain the totals for the sub-basins and the basin. Equation 1 describes the water balance adopted by the SWAT model.
$$SW_t = SW_0 + \sum_{i=1}^{t} \left( R_{day} - Q_{surf} - E_a - W_{seep} - Q_{gw} \right) \qquad (1)$$
where SWt is the final amount of soil water for the day (mm), SW0 is the initial amount of soil water for the day (mm), Rday is the total rainfall for the day (mm), Qsurf is the surface runoff for the day (mm), Ea is the evapotranspiration for the day (mm), Wseep is the total amount of water that seeps through the base of the soil profile for the day (mm), and Qgw is the groundwater flow for the day (mm). For a detailed description of the SWAT model, refer to Arnold et al. (1998) and Neitsch et al. (2011).
To perform the hydrological simulation, SWAT requires hydrometeorological data (rainfall, maximum and minimum air temperatures, solar radiation, relative humidity, and wind speed) and geospatial data, which include the digital elevation model (DEM), the land-use map and the soil map. For the MRB, daily hydrometeorological data were obtained from seven rainfall gauge stations and one weather station.
For the streamflow modeling, the SWAT model was set up for the period between 1983 and 2005 with daily precipitation, maximum and minimum temperature, solar radiation, wind speed and relative humidity data. The first three years were used as a warm-up period to reduce the uncertainties regarding the initial conditions of the surface domain (Mello et al., 2008). The period from 1986 to 1995 was used for calibration, whereas the period from 1996 to 2005 was used for validation.
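The daily water balance described above can be illustrated with a short Python sketch. It is only an illustration of the bookkeeping, not SWAT's implementation: the function name, the dictionary keys and the three days of values are hypothetical and are not data from this study.

```python
def soil_water_balance(sw0, days):
    """Accumulate a daily soil water balance (all quantities in mm).

    sw0  : initial soil water content
    days : list of dicts with keys r_day, q_surf, e_a, w_seep, q_gw
    Returns the soil water content at the end of each day.
    """
    sw = sw0
    series = []
    for d in days:
        # Rainfall adds water; runoff, evapotranspiration, seepage
        # and groundwater flow remove it.
        sw += d["r_day"] - d["q_surf"] - d["e_a"] - d["w_seep"] - d["q_gw"]
        series.append(sw)
    return series


# Hypothetical values for three days (mm), for illustration only.
example_days = [
    {"r_day": 20.0, "q_surf": 5.0, "e_a": 4.0, "w_seep": 2.0, "q_gw": 1.0},
    {"r_day": 0.0, "q_surf": 0.0, "e_a": 4.0, "w_seep": 1.0, "q_gw": 1.0},
    {"r_day": 10.0, "q_surf": 2.0, "e_a": 3.0, "w_seep": 1.0, "q_gw": 1.0},
]
result = soil_water_balance(100.0, example_days)
print(result)
```

In SWAT itself this balance is evaluated for each HRU and the results are aggregated to the sub-basin and basin levels.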
The parameters used in calibration (Table 1) were selected based on previous studies conducted in the Cerrado biome, such as Oliveira et al. (2019), Rodrigues et al. (2020) and Amorim et al. (2020), and on the Latin Hypercube One-factor-At-a-Time (LH-OAT) sensitivity analysis method (Van Griensven et al., 2006). LH-OAT was applied using the SUFI2 algorithm in the SWAT-CUP software. Following the sensitivity analysis, the daily streamflow calibration was performed manually by changing one parameter at a time, as done by Nyeko (2015), Pereira et al. (2016) and Rodrigues et al. (2020), and the validation was performed by running the SWAT model with the parameters obtained in the calibration period.
Note (Table 1): Prefixes "V," "R" and "A" correspond to the operations "replace by a given value," "relative change, in which the existing parameter value is multiplied by a factor" and "add to the existing parameter value," respectively.
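The three change operations named in the note can be sketched in Python as follows. This is a hedged illustration: the function name is hypothetical, the numbers are placeholders rather than the calibrated values of Table 1, and the relative change is assumed to multiply the existing value by (1 + given value), which is the usual SWAT-CUP convention.

```python
def apply_change(current, method, value):
    """Apply one calibration change to a parameter value.

    method "V": replace the parameter by the given value
    method "R": relative change, assumed here to be current * (1 + value)
    method "A": add the given value to the current parameter value
    """
    if method == "V":
        return value
    if method == "R":
        return current * (1.0 + value)
    if method == "A":
        return current + value
    raise ValueError(f"unknown change method: {method!r}")


# Placeholder numbers, for illustration only.
replaced = apply_change(0.048, "V", 0.02)   # new value is 0.02
scaled = apply_change(80.0, "R", -0.10)     # 10% reduction
shifted = apply_change(31.0, "A", 5.0)      # absolute increase of 5
print(replaced, scaled, shifted)
```

Changing one parameter at a time, as in the manual calibration described above, amounts to calling such a function for a single parameter, rerunning the model and comparing the performance metrics.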

Artificial Neural Network (ANN)
The ANN model is a data-driven mathematical model that was developed to imitate the structure of the neural network of the human brain and has been widely applied to solve water-resource problems (Minns and Hall, 1996). To execute the model, prior knowledge of the physical characteristics of the process is not required, only hydrological and meteorological data from the basin. The ANN model uses interconnected artificial neurons that receive input information (x1, x2, ..., xn), perform operations and provide an output (y). Each connection between neurons (or synapse) has an associated intensity, expressed by a weight (w1, w2, ..., wn). The weights represent the degree of importance of each input to the neuron and are obtained during neural network training. Each neuron determines a net input value (net) as the sum of the inputs multiplied by their respective weights. Once the net input is determined, it is converted into the activation value of the neuron, which is a function of the net input (y = f(net)). An activation function precedes the transfer function and has the task of passing the signal obtained through the inputs to the transfer function (Haykin, 2007).
Multilayer perceptron (MLP) ANN models have hidden layers and, according to Haykin (2007), have three main characteristics: a) the model of each neuron has a non-linear activation function; b) the network has at least one hidden layer; and c) the network has a high degree of connectivity between its processing elements.
In this study, a feedforward neural network was defined with three layers: input, internal (or hidden) and output layers. This type of network is widely used in signal filtering, data compression, pattern recognition and inter-comparison of patterns. For this neural network, the hyperbolic tangent activation function (tansig) was used for the neurons of the middle layer, the linear activation function (purelin) for the output neuron, and the Bayesian regularization algorithm (trainbr) for network training (Haykin, 2007).
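As a minimal sketch of the single-neuron computation described above, the following Python function forms the weighted sum of the inputs and passes it through an activation function. The bias term and the choice of the hyperbolic tangent as the default activation are assumptions made for illustration, and the input and weight values are arbitrary.

```python
import math


def neuron_output(inputs, weights, bias=0.0, activation=math.tanh):
    """Compute y = f(net), where net is the weighted sum of the inputs.

    The bias term is an assumption of this sketch (default 0).
    """
    net = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(net)


# Arbitrary illustrative inputs and weights.
y = neuron_output([0.5, -0.2, 0.8], [0.4, 0.1, -0.3])
print(y)
```

During training, the weights are adjusted so that the outputs of the network approach the observed values.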
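A forward pass through a three-layer network of this kind can be sketched as below. The sketch is written in Python for illustration and is not the MATLAB implementation used in the study; the function name, the bias terms and all weight values are hypothetical placeholders, and training is not shown.

```python
import math


def mlp_forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass: tanh hidden layer followed by one linear output neuron.

    hidden_weights : one list of input weights per hidden neuron
    hidden_biases  : one bias per hidden neuron (assumed in this sketch)
    output_weights : one weight per hidden neuron
    """
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(hidden_weights, hidden_biases)
    ]
    # Linear output neuron: weighted sum of hidden activations plus bias.
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


# Two inputs, two hidden neurons, one output; placeholder weights.
y = mlp_forward(
    x=[0.2, -0.4],
    hidden_weights=[[0.5, -0.1], [0.3, 0.8]],
    hidden_biases=[0.0, 0.1],
    output_weights=[1.2, -0.7],
    output_bias=0.05,
)
print(y)
```

The hidden-layer size is the main structural choice in such a network, which is why it is varied systematically when selecting an architecture.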
To structure the MLP artificial neural network, 17 input nodes were used, referring to rainfall, maximum temperature, minimum temperature, solar radiation, potential evaporation, mean temperature, relative humidity, wind speed and streamflow of the previous day. To avoid polarization of the neural network and delays in the learning process, the input variables were normalized between -1 and 1. For this purpose, each variable was divided by the maximum absolute value of the corresponding set of values. In the middle layer, the number of neurons ranged from 1 to 10, and 30 replicates were generated for each number of neurons. Because the goal of the network was to model the streamflow of the MRB, only one neuron was used in the output layer. Figure 3 shows the MLP artificial neural network architectures analyzed.
To generate the MLP, the MATLAB® software was used. To compare the results of the various neural models, Tukey's test (TSD - Tukey's Significant Difference) was applied to determine whether there was a significant difference (5% significance level) between the mean Nash-Sutcliffe coefficients (NSE) (Tukey, 1949).
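The normalization step described above, dividing each variable by its maximum absolute value so that it falls between -1 and 1, can be sketched in Python as follows. The function name and sample values are hypothetical, and returning zeros for an all-zero series is a choice made only for this sketch.

```python
def normalize_by_abs_max(values):
    """Scale a series to the range [-1, 1] by its maximum absolute value."""
    peak = max(abs(v) for v in values)
    if peak == 0:
        # An all-zero series cannot be scaled; return it unchanged as zeros.
        return [0.0 for _ in values]
    return [v / peak for v in values]


# Illustrative values only (for example, a short temperature anomaly series).
scaled = normalize_by_abs_max([2.0, -4.0, 1.0])
print(scaled)  # [0.5, -1.0, 0.25]
```

Each input variable is scaled separately, so that variables with large magnitudes, such as streamflow, do not dominate those with small magnitudes.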

Model Performance Evaluation
To evaluate the performance of the SWAT and ANN models in both the calibration/training and validation periods, the coefficient of determination (R²) and the Nash-Sutcliffe coefficient (NSE) were used. R² describes the proportion of the variance in the observed data that can be explained by the model. It ranges from 0 to 1, and the higher its value, the better the fit. R² is calculated by Equation 2 and, in general, values greater than 0.6 are considered acceptable (Moriasi et al., 2015).
$$R^2 = \left[ \frac{\sum_{i=1}^{n} \left( Q_{obs,i} - \overline{Q}_{obs} \right) \left( Q_{sim,i} - \overline{Q}_{sim} \right)}{\sqrt{\sum_{i=1}^{n} \left( Q_{obs,i} - \overline{Q}_{obs} \right)^2} \, \sqrt{\sum_{i=1}^{n} \left( Q_{sim,i} - \overline{Q}_{sim} \right)^2}} \right]^2 \qquad (2)$$
where Qobs is the observed streamflow (m³ s⁻¹), Qsim is the simulated streamflow (m³ s⁻¹), Q̅obs is the observed mean streamflow (m³ s⁻¹), Q̅sim is the simulated mean streamflow (m³ s⁻¹), and n is the number of data points.
NSE indicates how well the plot of observed versus simulated data fits the 1:1 line. It ranges from -∞ to 1, and the higher the value, the better the fit of the simulated to the observed data (Krause et al., 2005). NSE is calculated using Equation 3:
$$NSE = 1 - \frac{\sum_{i=1}^{n} \left( Q_{obs,i} - Q_{sim,i} \right)^2}{\sum_{i=1}^{n} \left( Q_{obs,i} - \overline{Q}_{obs} \right)^2} \qquad (3)$$
Moriasi et al. (2015) proposed the following classification for NSE using a daily time step for simulations: NSE > 0.8, the model is considered very good; 0.7 < NSE < 0.8, the model is considered good; and 0.5 < NSE < 0.7, the model is considered satisfactory.
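The two coefficients and the NSE classification described above can be computed with a few lines of Python. This is a sketch under stated assumptions: the function names are hypothetical, the sample series are made up, the handling of the exact boundary values (0.7 and 0.8) is a choice of this sketch, and labeling values of 0.5 or less as "not satisfactory" is an assumption, since the text above lists only the three upper classes.

```python
def nse(obs, sim):
    """Nash-Sutcliffe coefficient of simulated versus observed values."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def r_squared(obs, sim):
    """Coefficient of determination (squared linear correlation)."""
    mean_obs = sum(obs) / len(obs)
    mean_sim = sum(sim) / len(sim)
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_sim = sum((s - mean_sim) ** 2 for s in sim)
    return cov ** 2 / (var_obs * var_sim)


def classify_nse(value):
    """Daily time-step classes; boundary handling is an assumption."""
    if value > 0.8:
        return "very good"
    if value > 0.7:
        return "good"
    if value > 0.5:
        return "satisfactory"
    return "not satisfactory"


# Made-up streamflow series (m3/s), for illustration only.
observed = [10.0, 12.0, 15.0, 11.0, 9.0]
simulated = [11.0, 12.5, 14.0, 10.0, 9.5]
print(nse(observed, simulated), r_squared(observed, simulated))
print(classify_nse(0.67))
```

Note that R² is insensitive to systematic bias: a simulation that is exactly twice the observed series still yields R² = 1, whereas NSE penalizes it, which is one reason both metrics are reported together.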

Calibration and validation of the SWAT model
The SWAT model was calibrated from 1986 to 1995 and validated from 1996 to 2005 using daily streamflow data from the Fazenda Lobeira gauging station, which delimits the MRB. The performance of the SWAT model in terms of R² was acceptable (R² > 0.6), with values of 0.70 and 0.68 for the calibration and validation periods, respectively. Regarding the NSE, values of 0.67 and 0.61 were obtained for the calibration and validation periods, respectively. These values allow classifying the model as "satisfactory" in both periods (Moriasi et al., 2015). These results demonstrate that the model is able to satisfactorily simulate the observed daily streamflow. Studies on the application of SWAT in different basins in Brazil have used NSE to evaluate the performance of the simulations. Durães et al. (2011) evaluated the performance of the SWAT model in the hydrological simulation of the Paraopeba River Basin, with a drainage area of 10,200 km², and obtained an NSE value of 0.79 for both the calibration and validation periods. Pereira et al. (2016) evaluated the performance of the SWAT model in hydrological simulations of the Pomba River Basin, with a drainage area of 8,600 km², and obtained an NSE of 0.76 in both the calibration and validation periods. Rodrigues et al. (2020) evaluated the performance of the SWAT model in the monthly streamflow simulation of three hydrographic basins located in the Cerrado biome, and obtained NSE values ranging from 0.56 to 0.84 and from 0.70 to 0.81 for the calibration and validation periods, respectively. Table 1 presents the parameters that were used in the SWAT model, which were selected from the literature review and the LH-OAT sensitivity analysis method, and the final calibrated values of the best simulation generated in calibration for the MRB. A detailed description of the parameters used in this study can be found in Neitsch et al. (2011).

Training and validation of ANN model
The ANN model was developed using the same hydrometeorological dataset and the same training/calibration (1986 to 1995) and validation (1996 to 2005) periods used in the SWAT model. The training and validation of the ANN model were carried out with the number of neurons in the middle layer (NNML) between 1 and 10, and the NSE results for the different ANN architectures are presented in Table 2. Considering the classification of Moriasi et al. (2015) as reference, the NSE values for the ANNs are classified as "very good" (>0.8). The results showed an increase in the quality of the simulation when increasing the NNML up to 3. However, above this threshold, no improvement in the NSE values was observed with increasing NNML.
Figure 4 shows the results of Tukey's statistical test at the 5% significance level for the mean NSE obtained from the 30 replicates for each ANN architecture. Taking the result of the network with a single neuron in the middle layer as reference (Figure 4a), a significantly different behavior of the ANNs with a higher NNML is observed. For the network with the NNML set to 2 (Figure 4b), the behavior was statistically similar to that of the networks with the NNML set to 3 and 4, and distinct from the others. The networks with the NNML set to 3 and 4 were significantly different from the network with the NNML set to 1. Networks with NNML ≥ 5 behaved analogously to one another and were significantly different from the networks with the NNML set to 1 and 2. Thus, the structure with 5 neurons in the middle layer was chosen as the proper structure for the Manuel Alves da Natividade River Basin and for the remainder of the analysis, since it was the least complex among those with the best results. The R² values (for NNML = 5) were 0.97 and 0.79 for the training and validation periods, respectively.

Comparison of model performance
The process-based SWAT model and the data-driven ANN model were compared to study their suitability for simulating the daily streamflow in a basin of the Brazilian Cerrado biome. Table 3 shows the comparison of the SWAT and ANN models for the calibration/training and validation periods. The results suggest that the ANN model performed better (higher NSE and R²) than SWAT for the entire simulation period. This result is similar to those obtained by Makwana and Tiwari (2017), Jimeno-Sáez et al. (2018) and Ahmadi et al. (2019), who observed a better performance of the ANN model than the SWAT model, considering both the R² and NSE coefficients.
SWAT and ANN performances can also be observed by means of scatter plots (Figure 5a, b) for the calibration/training and validation periods, respectively. From these plots, it can be observed that the scattered SWAT points are concentrated above the 1:1 line (perfect fit), which means a tendency to overestimate the streamflow. In addition, the ANN model presented better estimates for almost all the streamflow values. In general, the scattered ANN points are closer to the 1:1 line than the SWAT points, which means greater accuracy.
Figure 6 shows the observed daily rainfall and streamflow data along with the streamflow data simulated by the ANN (a) and the SWAT (b) models for the calibration and validation periods. The results indicate that both models performed well when compared against the observed streamflow. However, the adherence of the simulated to the observed data shows an underestimation of the maximum streamflow values by both models. The peak discharges are related to surface runoff as a response to intense rainfall events. However, the magnitude of the maximum streamflow depends not only on the magnitude of the intense rainfall but also on other factors, such as soil moisture, topography and land use, making the modeling of maximum events a hard task in the context of continuous simulations. In contrast, for recession periods, a greater agreement between the curves was observed. The better performance of the models in the recession periods is explained by the predominance of groundwater runoff, which is easier to model than direct surface runoff, given that its genesis is related to the discharge of the aquifer, following Darcy's law for flow in porous media.
To further assess the difference between the SWAT and ANN models in the streamflow simulation in the MRB, flow duration curves (FDC) were developed (Figure 7). It can be observed that the simulated curves presented good agreement with the observed one.
Table 4 shows the percentage difference between the streamflows simulated by the SWAT and ANN models and the observed streamflows for different frequencies of exceedance. The best results were obtained with the ANN model, whose error rates were lower than those of the SWAT model for all simulated streamflows. The ANN model had a very good performance in estimating maximum values and a good performance in estimating minimum values, with error rates ranging from 0.5 to 20.3%. In turn, the SWAT model had a good performance in estimating maximum values, but in estimating average values its performance was poor, with error rates ranging from 4.5 to 83.1%. Similar studies in the literature report comparable results. Singh (2016), evaluating the ANN and SWAT models to simulate the streamflow for an agricultural watershed in India, reported that the ANN model estimates the streamflow values more accurately and with less uncertainty. Koycegiz and Buyukyildiz (2019), evaluating the ANN and SWAT models to simulate the streamflow at the headwater of the Çarsamba River, located in the Konya Closed Basin, Turkey, reported that the ANN model was more successful than the SWAT model. Ahmadi et al. (2019), evaluating the ANN and SWAT models to simulate the daily, monthly, and annual streamflows for the Kan Watershed, Iran, reported that the ANN model performed better in all streamflow simulations. However, studies with different results are also found in the literature. Zakizadeh et al. (2020) analyzed the performance of the SWAT and ANN models in the runoff simulation for the Darake Watershed, Iran. They found that, for the maximum values, the performance of the ANN model was better than that of SWAT, and for the minimum values the performance of the SWAT model was better than that of the ANN. Kim et al. (2015) analyzed the SWAT and ANN models for streamflow estimation at the Samho gauging station on the Taehwa River, Korea. They found that the ANN was better at estimating high flows, while the SWAT model was better at simulating low flows. Pradhan et al. (2020) analyzed the SWAT and ANN models to simulate daily streamflow in three river basins in different climatic regions of Asia. They found that, in general, the ANN model performed better than SWAT in two basins, whereas the SWAT model performed better than the ANN in the other basin. Also, the SWAT model was found to be better for low-flow simulation and the ANN model performed better for high-flow simulation in the three river basins. Jimeno-Sáez et al. (2018) analyzed the SWAT and ANN models to simulate streamflow in two catchments in Peninsular Spain. They reported that SWAT was more successful in simulating lower flows, while the ANN was superior at estimating higher flows in all cases. Thus, no single model is best for all regions, which highlights the importance of this kind of analysis for the proper management of water resources.
Table 4. Observed streamflows, in m³ s⁻¹, with exceedance frequencies of 5% (Q5%), 10% (Q10%), 20% (Q20%), 50% (Q50%), 90% (Q90%) and 95% (Q95%), and the percentage differences (∆Q) between the streamflows simulated by the SWAT and ANN models and the observed streamflows.
Therefore, our results suggest that both the ANN and SWAT models are suitable for simulating the daily streamflow in the MRB, with the ANN model showing better performance.
However, although the ANN model showed better streamflow simulation ability than SWAT, it is a data-driven model that does not consider the hydrological processes involved. Therefore, it is not recommended for situations in which the physical characteristics of the basin are altered, such as simulating hydrologic impacts under climate or land-use change scenarios, since the rainfall network and land-use types change. In contrast, the process-based SWAT model describes the rainfall-streamflow transformation processes in detail; thus, it can better simulate the response of streamflow to changes in environmental factors than the data-driven model.
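The flow-duration comparison used in this section (streamflows at given exceedance frequencies and their percentage differences from the observed values) can be sketched in Python as below. The rank-based estimate of the exceedance flow and the sign convention of the percentage difference are assumptions of this sketch, since the exact procedure is not specified in the text, and the flow values are made up.

```python
import math


def exceedance_flow(flows, percent):
    """Flow equaled or exceeded `percent` % of the time (rank-based estimate)."""
    ranked = sorted(flows, reverse=True)
    n = len(ranked)
    # Smallest rank k such that k of the n values cover at least `percent` %.
    k = max(1, math.ceil(percent * n / 100))
    return ranked[k - 1]


def percent_difference(simulated, observed):
    """Percentage difference of a simulated value relative to the observed one."""
    return 100.0 * (simulated - observed) / observed


# Made-up daily streamflows (m3/s), for illustration only.
observed = [float(v) for v in range(1, 11)]
simulated = [v * 1.1 for v in observed]

for p in (5, 50, 95):
    q_obs = exceedance_flow(observed, p)
    q_sim = exceedance_flow(simulated, p)
    print(p, q_obs, round(percent_difference(q_sim, q_obs), 1))
```

Low exceedance percentages (such as Q5%) correspond to high flows and high percentages (such as Q95%) to low flows, so the comparison covers both ends of the flow regime.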

CONCLUSIONS
The objective of this study was to model the daily streamflow in a basin of the Brazilian Cerrado biome using two approaches, a process-based model (the Soil and Water Assessment Tool, SWAT) and the data-driven ANN model, and to compare their performance to determine which is more appropriate for the studied basin. It was concluded that both SWAT and ANN were suitable to simulate daily streamflow in the MRB. However, the ANN model was better than the SWAT model during the calibration and validation periods in all evaluations performed.
The multilayer perceptron artificial neural network was used to model daily streamflow in the MRB based on rainfall, streamflow and meteorological data. The best configuration had 5 neurons in the middle layer. The statistical coefficients showed very good performance in the daily streamflow modeling. In addition, one of the advantages of the ANN model is that it does not require any physical characteristics of the basin and, therefore, its implementation is easier. However, it does not consider hydrological processes; thus, the ANN model cannot be used to simulate the streamflow if the physical characteristics of the basin change. Therefore, the ANN is recommended for situations requiring real-time streamflow predictions, as is the case for civil surveillance.
The SWAT hydrological model produced adequate results for daily streamflow modeling in the MRB. It is concluded that the greater complexity of this method, the need for hydrological experience and the higher computational demand are justified in situations in which the analysis of the processes of the hydrological cycle is important in a particular drainage basin, for example in the simulation of land-use change or climate-change scenarios.
Therefore, the results of this study can help in understanding the capabilities of both models to simulate daily streamflow in the MRB, and contribute to the proper management of water resources in the Brazilian Cerrado biome.