Modeling coordinated operation of multiple hydropower reservoirs at a continental scale using artificial neural network: the case of Brazilian hydropower system

Reservoirs considerably affect river streamflow and need to be accurately represented in environmental impact studies. Modeling reservoir outflow is a challenge for hydrological studies, since reservoir operations vary with flood risk, economic and demand aspects. The Brazilian Interconnected Energy System (SIN) is an example of a unique and complex system of coordinated operation composed of more than 160 large reservoirs. We proposed and evaluated an integrated approach to simulate daily outflows from most of the SIN reservoirs (138) using an Artificial Neural Network (ANN) model, distinguishing run-of-the-river and storage reservoirs and testing cases in which outflow and level data were or were not available as input. We also investigated the influence of the proposed input features (14), related to reservoir water balance, seasonality, and demand, on the simulated outflow. As a result, we verified that the outputs of the ANN model were mainly influenced by local water balance variables, such as the reservoir inflow of the present day and the outflow of the day before. However, other features, such as the water level of 4 large reservoirs that represent different regions of the country and serve as a proxy for hydropower demand through water availability, seemed to influence reservoir outflow estimates to some extent. This result indicates advantages in using an integrated approach rather than looking at each reservoir individually. In terms of data availability, we tested scenarios with (WITH_Qout) and without (NO_Qout and SIM_Qout) observed outflow and water level as input features to the ANN model. The NO_Qout model is trained without outflow and water level, while the SIM_Qout model is trained with all input features but is fed with simulated outflows and water levels rather than observations.
These 3 ANN models were compared with two simple benchmarks: outflow is equal to the outflow of the day before (STEADY) and the outflow is equal to the inflow of the same day (INFLOW). For run-of-the-river reservoirs, an ANN model is not necessary as outflow is virtually equal to inflow. For storage reservoirs, the ANN estimates reached median Nash-Sutcliffe efficiencies (NSE) of 0.91, 0.77 and 0.68 for WITH_, NO_ and SIM_Qout respectively, compared to a median NSE of 0.81 and 0.29 for the STEADY and INFLOW benchmarks respectively. In conclusion, the ANN models presented satisfactory performances: when outflow observations are available, WITH_Qout model outperforms STEADY; otherwise, NO_Qout and SIM_Qout models outperform INFLOW.


INTRODUCTION
River dams are structures that affect natural streamflow and need to be accurately represented in hydrological studies (Zajac et al., 2017). Reservoirs assume a regulation function, modifying river flow duration curves and attenuating flow peaks (Ayalew et al., 2013; Li et al., 2010; Vogel et al., 2007; Volpi et al., 2018). These structures hold water on land, approximately tripling the mean age of river water worldwide, which affects not only the natural hydrograph but also the sediment flux and the re-oxygenation of surface water (Vörösmarty et al., 1997). Regarding water volume, reservoir operations and irrigation are responsible for reducing global river discharge by approximately 2.1% (Biemans et al., 2011).
Worldwide, Lehner et al. (2011) estimated that there are nearly 3 million impoundments larger than 0.1 ha; as a consequence, only 36% of rivers longer than 1,000 km are free-flowing, i.e. present natural connectivity (Grill et al., 2019). Nowadays, nearly 60,000 large dams (defined as over 15 m in height or impounding more than 3 million cubic meters) are listed in the World Register of Dams database, of which 20% to 25% are intended for hydropower purposes (International Commission on Large Dams, 2020). The number of hydropower reservoirs is also consistently growing. In 2014, over 3,000 major dams with capacity over 1 MW were planned to be built, mostly in developing countries (Zarfl et al., 2014). The large number of existing and planned reservoirs and their cumulative effect on streamflow justify an explicit representation of hydropower reservoirs for accurate streamflow estimates.
There are several difficulties in estimating reservoir operation and outflows, as they are influenced by fluctuations in demand, downstream conditions, costs of other sources of energy, bed floor leakage and irregular inflow series. These challenges are more pronounced in a large-scale context (national, continental, global). In several cases, operating systems deal with a cascade of reservoirs, which demands complex optimization techniques to maximize energy production (Liu et al., 2011; Pereira & Pinto, 1985; Zahraie & Karamouz, 2004).
Some studies have proposed simplified operation schemes to estimate reservoir outflows in a large-scale context. For the sake of general applicability, these outflow simulations use only inflow and storage as input (Hanasaki et al., 2006; Shin et al., 2019), although extra information about the reservoir purpose, such as water demand for irrigation and maximum discharge for flood control, might be needed in the optimization process (Haddeland et al., 2006) or as a limiting condition (Zhao et al., 2016). In general, these simplified operation schemes are composed of a few linear equations and are tested on global/continental hydrological models with a monthly time step; they yield adequate results for limited-data situations in a large-scale context.
On the other hand, machine learning techniques are interesting alternatives for evaluating specific reservoirs and their operation when enough data is provided. Machine learning techniques are able to represent highly non-linear relations and can autonomously detect patterns and provide predictions. Artificial neural networks (ANN), for example, have proved useful for optimizing reservoir operations (Carneiro & Farias, 2013; Chaves & Chang, 2008; Senthil Kumar et al., 2013) and estimating reservoir inflows (Paz et al., 2008; Valipour et al., 2013). Different machine learning techniques have been used to simulate the operation of specific reservoirs, such as decision-tree methods (Yang et al., 2016), support vector machines, single-layer ANN and deep learning (Zhang et al., 2018). On a broader scale, Ehsani et al. (2016) proposed a general reservoir operation scheme (GROS) based on ANN, suitable for large-scale modelling. The authors coupled an ANN with a water balance model to simulate reservoir storage and release from reservoir inflow, using the output variables as input for the next steps. GROS performed better than other simplified methods of estimating operation.
Keywords: Reservoir outflow estimation; Machine learning.

[...] energy sources and water availability. Thus, Brazil's interconnected system of hydropower reservoirs is an interesting example that requires an integrated approach for modeling outflows rather than using local and specific features only.
In this paper, we simulate daily outflows of most of the hydropower reservoirs connected to the SIN using ANN and assess the model's capacity to represent a coordinated system. This work is a proof of concept that machine learning techniques can model individual reservoirs of a complex hydropower system in a large-scale context. We proposed several input features related not only to local reservoir water balance, seasonality, and demand, but also to information from other reservoirs. The relevance of each variable was evaluated by distinguishing run-of-the-river and storage reservoirs and by testing cases in which outflow and water level data were or were not available as input. We have not focused on proposing an optimized operation; instead, we simulated daily outflows that can be useful for environmental impact studies on specific river reaches or for coupling to hydrological models to understand effects at the basin scale.

METHODOLOGY
The national interconnected hydrothermal energy system (SIN)
All major Brazilian dams are operated considering the SIN, which concentrates almost 68% of the national electricity production capacity (Operador Nacional do Sistema Elétrico, 2019). Due to the significantly diverse hydrological characteristics, the SIN is currently divided into regional interconnected subsystems (South, Southeast/Central-west, Northeast, and North), where all the reservoirs within each subsystem are treated as a single equivalent reservoir. Currently, there are more than 160 reservoirs in the SIN (Figure 1). The system is coordinated by the ONS (Operador Nacional do Sistema Elétrico), which tries to minimize spills and maximize hydroelectric energy production in order to avoid the use of expensive and air-polluting thermal energy, while also guaranteeing consumptive water uses and environmental restrictions (Operador Nacional do Sistema Elétrico, 2019). This is a hard task because it must consider the randomness of inflows, the expansion of the system, downstream flows, future demands, current reservoir storage, etc. (Zambon, 2015). A system-wide operation strategy prevails over individual ones, and the operation of a given hydropower plant affects other units downstream. The system allows hydropower plants to dispatch and transfer energy to another region where reservoir storage is low, avoiding the use of local thermal plants.

Artificial neural network model
An ANN is a self-learning technique that estimates an output variable given a sufficient amount of data. It is composed of an input layer (formed by the input features), one or more hidden layers, and an output layer, which contains the information learned by the neural network. Input features multiplied by coefficients (synaptic weights) feed an activation function, resulting in "neurons" (nodes) that build a first hidden layer. Following the same steps, neurons of the first hidden layer are multiplied by synaptic weights to generate the neurons of the next layer, and so on. A trained ANN has optimized weight matrices, which are usually obtained through an iterative method called backpropagation (Rumelhart et al., 1986). This method compares observations to ANN outputs, generating deltas that are propagated backwards through the network to correct the weight matrices, from the output to the input layer. The term ANN usually refers to arrangements with few hidden layers, whereas neural networks composed of many hidden layers are often called deep-learning techniques (Shen, 2018).
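The forward pass and one backpropagation update described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: bias terms are omitted, the loss is a squared error on a single sample, and the function names `forward` and `backprop_step` and the learning rate value are hypothetical.

```python
import math

def sigmoid(z):
    """Sigmoidal activation function."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, theta_h, theta_o):
    """Forward pass of a single-hidden-layer ANN with one output.

    x: list of n scaled input features
    theta_h: n x h nested list (input -> hidden weights)
    theta_o: list of h weights (hidden -> output)
    """
    n_hidden = len(theta_o)
    hidden = [sigmoid(sum(x[i] * theta_h[i][j] for i in range(len(x))))
              for j in range(n_hidden)]
    out = sigmoid(sum(h_j * w_j for h_j, w_j in zip(hidden, theta_o)))
    return hidden, out

def backprop_step(x, y, theta_h, theta_o, lr=0.5):
    """One gradient-descent update for one sample (assumed squared-error loss)."""
    n_hidden = len(theta_o)
    hidden, out = forward(x, theta_h, theta_o)
    # Delta at the output neuron: error times the sigmoid derivative.
    delta_o = (out - y) * out * (1.0 - out)
    # Deltas propagated backwards to each hidden neuron.
    delta_h = [delta_o * theta_o[j] * hidden[j] * (1.0 - hidden[j])
               for j in range(n_hidden)]
    new_theta_o = [theta_o[j] - lr * delta_o * hidden[j] for j in range(n_hidden)]
    new_theta_h = [[theta_h[i][j] - lr * delta_h[j] * x[i] for j in range(n_hidden)]
                   for i in range(len(x))]
    return new_theta_h, new_theta_o
```

Repeating `backprop_step` over the training samples is the iterative correction of the weight matrices that the paragraph refers to.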
In this paper, we propose an ANN with a "nearly" single hidden layer (Figure 2) and sigmoidal activation functions to predict outflow at time t (OUT0). The input variables are submitted to feature scaling (normalization) in order to guarantee an equal range between features and enable weight comparison. We selected input features related to i) the reservoir water balance; ii) time, to account for seasonality and demand; and iii) the state of other reservoirs: upstream and downstream reservoir water levels (UPST and DOWN, respectively) and energy availability (LRL) inferred from the water levels of large SIN reservoirs (UHE).
The features related to the water balance are often the most relevant variables and are sometimes used exclusively as input to an ANN for operation prediction. We selected as input: inflow at time t (INF0), inflow one and two days before (INF1 and INF2, respectively), and water level and outflow one and two days before (LEV1, LEV2, OUT1, OUT2).
Time variables are important to account for seasonality and demand. The time-of-the-year feature consists of the day within a year, from 1 to 365, and was adapted to a circular representation (i.e. $\sin(2\pi \cdot \mathrm{day}/365)$ and $\cos(2\pi \cdot \mathrm{day}/365)$) in order to provide continuity from one year to the next (e.g. from December 31st to January 1st). The weekday feature is important in hydropower operation since there is a significant reduction in energy demand on weekends, when industries stop working. WDAY assumes a value of 1 on weekends and holidays and 0 on workdays. Continuous time refers to the number of days since the beginning of a reservoir's operation and was included to capture eventual changes in operation due to changes in energy demand.
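The time features above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `time_features`, the dictionary keys other than WDAY, and the way holidays are supplied (a set of dates) are hypothetical and not taken from the paper.

```python
import math
from datetime import date

def time_features(day, start_of_operation, holidays=frozenset()):
    """Return the time-related inputs for one calendar day.

    day, start_of_operation: datetime.date objects
    holidays: set of datetime.date objects (hypothetical input)
    """
    day_of_year = day.timetuple().tm_yday
    # Circular encoding so that December 31st and January 1st are neighbours.
    angle = 2.0 * math.pi * day_of_year / 365.0
    return {
        "sin_day": math.sin(angle),
        "cos_day": math.cos(angle),
        # WDAY = 1 on weekends (Saturday=5, Sunday=6) and holidays, else 0.
        "WDAY": 1 if day.weekday() >= 5 or day in holidays else 0,
        # Continuous time: days elapsed since the reservoir began operating.
        "continuous_time": (day - start_of_operation).days,
    }

# Example call with an illustrative start date.
features = time_features(date(2019, 11, 2), date(2019, 1, 1))
```

With this encoding the raw day number jumps from 365 to 1 at the turn of the year, while the sine and cosine pair changes only slightly, which is the continuity the text asks for.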
Finally, the operation of Brazilian hydropower reservoirs is integrated and optimized to generate the most energy for the interconnected system. Thus, we also considered information from other reservoirs. We evaluated the water level of one reservoir upstream and another downstream, when applicable, in order to account for level restrictions, safety operations (e.g. flood control) and maximum energy generation for the cascade system of reservoirs. These variables are the UPST and DOWN features. In addition, the accumulated water volume of other hydropower reservoirs indicates the water availability for energy generation and, consequently, the energy demand for that specific reservoir. For example, if most of the large hydropower plants are operating at high water levels, the country's energy demand is likely to be met, and the operator of a specific reservoir might decide to reserve water for times of scarcity. We therefore selected 4 large storage reservoirs from different regions of Brazil (North, Northeast, Southeast and South) as a proxy for the current potential of hydropower energy generation: Tucuruí, Sobradinho, Marimbondo and Foz do Areia. These reservoirs are important in terms of energy production and are old enough to provide long time series of inflow, outflow and water level. The water levels at time t-1 from these 4 reservoirs (UHE1, 2, 3 and 4) are input to a first node named LRL (large reservoirs level), which is the only neuron in the first hidden layer; it can, however, be interpreted as a neuron of the input layer (Figure 2) if we consider that this ANN has only a single hidden layer.
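One way to picture the LRL node is the sketch below, which combines the four scaled reservoir levels into a single activation and appends it to the locally derived inputs. It rests on assumptions: a sigmoidal activation for LRL, min-max scaling as the normalization, no bias term, and the hypothetical names `min_max_scale`, `lrl_node` and `build_input_vector`.

```python
import math

def min_max_scale(value, lower, upper):
    """Scale a raw value to [0, 1] (min-max scaling is an assumption)."""
    return (value - lower) / (upper - lower)

def lrl_node(uhe_levels, weights):
    """Combine the scaled levels UHE1..UHE4 at time t-1 into the LRL activation."""
    z = sum(w * level for w, level in zip(weights, uhe_levels))
    return 1.0 / (1.0 + math.exp(-z))

def build_input_vector(local_features, uhe_levels, lrl_weights):
    """Append LRL to the other scaled inputs, as if it were an input neuron."""
    return list(local_features) + [lrl_node(uhe_levels, lrl_weights)]

# Example with illustrative scaled levels and weights (not from the paper).
inputs = build_input_vector([0.4, 0.6], [0.8, 0.5, 0.7, 0.9], [1.0, 1.0, 1.0, 1.0])
```

Treating LRL as one extra entry of the input vector is what allows the rest of the network to be read as a single-hidden-layer ANN.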

Data acquisition and ANN training
The ANN input data and output observations were obtained from the Brazilian Reservoir Monitoring System (SAR) database of the Brazilian National Water Agency (ANA). The SAR database provided inflow, outflow, and water level time series from 159 reservoirs connected to the SIN as of the day it was accessed (Nov/2019) (Table 1). We then selected reservoirs that had at least 8 years of data (not necessarily consecutive) with information on all input features (Figure 3). Thus, 21 reservoirs were discarded, leaving 138, of which 71 are classified by the ONS as run-of-the-river and 67 as storage reservoirs.
Data were randomly split into training data (60%), cross-training data (20%) and validation data (20%). The training data are used in the optimization process, but the weight matrices are selected based on the model's fit to the cross-training data. The backpropagation algorithm is fed exclusively with the training data, while the cross-training data help to select unbiased and non-overfitted weight matrices from earlier iterations of the algorithm. Then, the validation data, which were left untouched, indicate the ANN performance.
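A random 60/20/20 split of the daily samples can be sketched as below. This is an illustration only: the function name `split_indices` and the fixed seed are hypothetical, and the paper does not state how the random draw was implemented.

```python
import random

def split_indices(n_samples, fractions=(0.6, 0.2, 0.2), seed=0):
    """Shuffle sample indices and split them into training,
    cross-training and validation subsets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_train = round(fractions[0] * n_samples)
    n_cross = round(fractions[1] * n_samples)
    train = indices[:n_train]
    cross = indices[n_train:n_train + n_cross]
    valid = indices[n_train + n_cross:]  # remainder, left untouched until the end
    return train, cross, valid

train_idx, cross_idx, valid_idx = split_indices(1000)
```

The training subset drives the weight updates, the cross-training subset is used only to pick which set of weights to keep, and the validation subset is reserved for reporting performance.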
A complexity test was performed in order to identify an adequate number of neurons in the hidden layer. We selected two large reservoirs that are important in terms of energy production for the complexity test, each representing a different type of hydropower facility: run-of-the-river reservoirs (Itaipu) and storage reservoirs (Furnas). We assessed ANN arrangements with 1, 2, 3, 5, 7, 10, 15 and 20 hidden neurons, and in each configuration the ANN was trained 10 times in order to obtain a more reliable and representative result that does not depend on the algorithm's starting point.
After deciding on an adequate number of hidden neurons, the ANN was trained specifically for each reservoir with the training data. We used a momentum term of 0.96 (Rumelhart et al., 1986) and an initial learning rate of 0.0001 that is adapted based on the error evolution (Vogl et al., 1988), both techniques being applied to accelerate the convergence of the gradient descent. The backpropagation algorithm was run three times to improve the chances of reaching a good set of weight matrices, and the best result was selected based on the cross-training sample.
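The combination of a momentum term with a learning rate that adapts to the error evolution can be sketched as follows. This is a minimal sketch under stated assumptions, not the training code of the study: the growth and shrink factors (1.05 and 0.7), the rule of rejecting a step that increases the error and resetting the momentum, and the names `train_adaptive` and `loss_and_grad` are illustrative choices; only the momentum of 0.96 and the initial learning rate of 0.0001 come from the text.

```python
def train_adaptive(loss_and_grad, weights, lr=1e-4, momentum=0.96,
                   grow=1.05, shrink=0.7, n_iter=500):
    """Gradient descent with momentum and an error-driven learning rate.

    loss_and_grad: callable returning (loss, gradient list) for a weight list
    grow, shrink: assumed adaptation factors for the learning rate
    """
    velocity = [0.0] * len(weights)
    loss, grad = loss_and_grad(weights)
    for _ in range(n_iter):
        trial_velocity = [momentum * v - lr * g for v, g in zip(velocity, grad)]
        trial_weights = [w + v for w, v in zip(weights, trial_velocity)]
        trial_loss, trial_grad = loss_and_grad(trial_weights)
        if trial_loss <= loss:
            # Error decreased: accept the step and speed up.
            weights, velocity = trial_weights, trial_velocity
            loss, grad = trial_loss, trial_grad
            lr *= grow
        else:
            # Error increased: reject the step, drop momentum and slow down.
            velocity = [0.0] * len(weights)
            lr *= shrink
    return weights, loss, lr

def quadratic(w, target=(1.0, -2.0)):
    """Toy loss used only to exercise the optimizer."""
    loss = sum((wi - ti) ** 2 for wi, ti in zip(w, target))
    grad = [2.0 * (wi - ti) for wi, ti in zip(w, target)]
    return loss, grad

final_weights, final_loss, final_lr = train_adaptive(quadratic, [0.0, 0.0])
```

In the study the loss would be the ANN error on the training data, with the weights kept at the iteration that fits the cross-training data best.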

ANN assessment
Information about the reservoir level and previous releases is essential to estimate the present outflow. However, these data are not always available, or the latency period is too long for immediate applications. So, we have shaped this ANN model to fit a short [...]

These three ANN models (WITH_Qout, NO_Qout and SIM_Qout) are compared with two simple benchmarks: outflow at time t is equal to outflow at time t-1, which we call the steady hypothesis (STEADY); and outflow is equal to inflow (INFLOW). The performances are evaluated in terms of the normalized root mean squared error (NRMSE) and the Nash-Sutcliffe efficiency (NSE), which are given by: [...] where $Q$ is the simulated outflow; $Q_o$ is the observed outflow; $\bar{Q}$ is the mean simulated outflow; $\bar{Q}_o$ is the mean observed outflow; and $n$ is the number of samples.
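The two performance metrics and the two benchmarks can be sketched as below. The NSE follows its standard definition. The NRMSE is normalized here by the mean observed outflow, which is an assumption, since other normalizations (e.g. by the range) are also in use. The function names are hypothetical.

```python
import math

def nse(sim, obs):
    """Nash-Sutcliffe efficiency: 1 minus the ratio of the squared errors
    to the variance of the observations around their mean."""
    mean_obs = sum(obs) / len(obs)
    num = sum((s - o) ** 2 for s, o in zip(sim, obs))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den

def nrmse(sim, obs):
    """RMSE divided by the mean observed outflow (assumed normalization)."""
    rmse = math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(obs))
    return rmse / (sum(obs) / len(obs))

def steady_benchmark(outflow):
    """STEADY: the estimate for day t is the observed outflow of day t-1.
    Returns (simulated, observed) aligned from the second day onwards."""
    return outflow[:-1], outflow[1:]

def inflow_benchmark(inflow, outflow):
    """INFLOW: the estimate for day t is the inflow of the same day."""
    return list(inflow), list(outflow)

# Example with illustrative daily series (not data from the study).
outflow = [100.0, 110.0, 105.0, 120.0, 115.0]
inflow = [90.0, 130.0, 100.0, 140.0, 95.0]
nse_steady = nse(*steady_benchmark(outflow))
nse_inflow = nse(*inflow_benchmark(inflow, outflow))
```

An NSE of 1 means a perfect match, while an NSE of 0 means the estimate is no better than the mean of the observations.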
Seventeen features were proposed for this ANN arrangement, related to water balance, time or demand. We evaluated the influence of every feature on the ANN output in order to understand which input variables are more relevant. We adopted the Weight method (Garson, 1991 apud Gevrey et al., 2003), which basically multiplies normalized synaptic weights from layer to layer. In this specific case, there is just one hidden layer, thus there are only two weight matrices (and a third to build the LRL neuron, which was not evaluated; see Figure 2). The Weight method was conducted as follows: i) consider the first weight matrix ($\Theta^h$), with dimensions $n \times h$ (number of features × number of neurons); ii) $\Theta^h$ is normalized by dividing each component by the sum of the coefficients related to each neuron (column); iii) the second weight matrix ($\Theta^o$), with dimensions $h \times 1$ (number of neurons × number of outputs), is also normalized; iv) finally, the influence of the input features on the output is given by the dot product of the normalized weight matrices.
In Equation 3, $n$ is the number of input features; $h$ is the number of hidden neurons; $\Theta^h$ is the first weight matrix, from the input to the hidden layer; $\Theta^h_x$ is the specific row of $\Theta^h$ that corresponds to the weights of feature $x$; and $\Theta^o$ is the second weight matrix, from the hidden to the output layer. As the proposed ANN has only one output, the second weight matrix ($\Theta^o$) has just one column. Thus, the weight influence of feature $x$ on the output ($W_x$, the synaptic weight factor) is given by the dot product of the normalized row corresponding to $x$ in the first weight matrix and the normalized column of the second weight matrix (Equation 3).
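Steps i) to iv) of the Weight method can be sketched as follows. This is an illustration under stated assumptions: absolute values of the weights are used in the normalization, as is common for Garson-type weight methods but not stated explicitly in the text, and the function name `weight_factors` is hypothetical.

```python
def weight_factors(theta_h, theta_o):
    """Synaptic weight factor W_x of each input feature.

    theta_h: n x h nested list (input -> hidden weights)
    theta_o: list of h weights (hidden -> single output)
    """
    n, h = len(theta_h), len(theta_o)
    # ii) normalize each column (hidden neuron) of the first matrix.
    col_sums = [sum(abs(theta_h[i][j]) for i in range(n)) for j in range(h)]
    norm_h = [[abs(theta_h[i][j]) / col_sums[j] for j in range(h)]
              for i in range(n)]
    # iii) normalize the single column of the second matrix.
    out_sum = sum(abs(w) for w in theta_o)
    norm_o = [abs(w) / out_sum for w in theta_o]
    # iv) dot product of each normalized row with the normalized column.
    return [sum(norm_h[x][j] * norm_o[j] for j in range(h)) for x in range(n)]

# Example: three features, two hidden neurons (illustrative weights).
factors = weight_factors([[2.0, 1.0], [1.0, 1.0], [1.0, 2.0]], [1.0, 3.0])
```

With this normalization the factors of all features sum to one, so each value can be read as a share of the total influence on the output.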

RESULTS
The complexity test indicates the number of hidden neurons that should be used in the ANN in order to provide an efficient and accurate performance. Figure 4 shows the ANN performance, in terms of NRMSE, for different numbers of neurons in the hidden layer. The NRMSE converged to 20% in Furnas and to 4.75% in Itaipu, which suggests that an even larger number of hidden neurons would not improve the ANN performance. The run-of-the-river reservoir converged with fewer hidden neurons: 5 in Itaipu compared to 7 in Furnas. This was expected since there is almost no range of storage volume for reservoir operation in a run-of-the-river reservoir and, by definition, outflow is mostly governed by inflow. It is common practice to select the smallest number of hidden neurons that provides a good performance. Since the same ANN arrangement is applied to all reservoirs, we adopted an ANN composed of 10 hidden neurons as a cautious alternative.

Figure 3. Data quantification of all 159 SIN reservoirs available in the SAR database. Blue dots represent reservoirs selected in this study, red dots are the discarded reservoirs and the red line represents the threshold of eight years of data. The black triangles represent four specific reservoirs named above.
The ANN training seemed to succeed. First results show that usual regression problems such as overfitting were avoided, since training (TRAIN), cross-training (X-TRAIN) and validation (VALID) data presented similar performances (Figure 5). Also, outflow estimation was reasonable, as median NRMSEs were around 14% and 18% for the WITH_Qout and NO_Qout ANN models, respectively.
The influence of every input feature on the predicted variable was evaluated in terms of synaptic weights through the Weight method. Although this method has no physical meaning, it indicates which input features are more influential on the ANN output. Figure 6 illustrates the Weight method results as a box plot using the 138 selected SIN reservoirs as samples. The input variables DOWN and UPST were not considered in this specific test since they are not applicable to all reservoirs.
For the WITH_Qout model, the inflow at time t (INF0) is the feature that has the most influence on the ANN predicted variable (OUT0), followed by the outflow at time t-1 (OUT1). However, this sample analysis is biased since outflows of run-of-the-river reservoirs are largely dominated by inflow. Analyzing each type of reservoir individually, we can see that the outflow of run-of-the-river reservoirs is indeed governed by inflow, but for storage reservoirs OUT1 has more influence on the predicted variable than INF0.
For the NO_Qout model, outflow and level are not input variables, thus INF0 largely influences the ANN output. However, looking at storage reservoirs, other features that are related to time or demand presented a high weight factor as well. The LRL feature, for example, is indirectly related to hydropower demand based on the water level of large SIN reservoirs. This feature presented a relatively high synaptic weight factor, which indicates the benefits of an integrated approach to simulating hydropower reservoirs in Brazil.
The ANN was able to predict outflow relatively well for most SIN reservoirs (Figure 7). When outflow and level observations were available as input to the ANN (WITH_Qout), the sample median NRMSE (NSE) was 14% (0.95) compared to 25% (0.83) and 24% (0.85) of the STEADY and INFLOW benchmarks, respectively, and the NRMSE upper (NSE lower) quartile was 19% (0.90) compared to 38% (0.74) and 59% (0.08). If outflow and level observations were unknown, the ANN performance deteriorated. In general terms, SIM_Qout presented a performance similar to INFLOW, but for storage reservoirs SIM_Qout performed much better. NO_Qout performed better on average than INFLOW and STEADY, but for storage reservoirs exclusively, STEADY was slightly better. Given this ANN arrangement and these input features, the results indicate that it is better to train an ANN without non-observed variables (OUT and LEV; NO_Qout) than to train an ANN with all features and use simulated variables as input (SIM_Qout). The latter was adopted by GROS (Ehsani et al., 2016), which provided adequate results at large scale; however, it is important to remark that the authors used a different ANN structure with more hidden layers and fewer input features.
The performance of outflow predictions was strongly dependent on the type of hydropower reservoir. Errors were much smaller for run-of-the-river reservoirs than for storage reservoirs, which was expected since the outflow of the former is easier to estimate. Indeed, the INFLOW assumption performed as well as WITH_Qout for run-of-the-river reservoirs; however, for storage reservoirs its performance worsens considerably. This indicates that storage reservoirs must be well represented in a hydrological model in order to provide accurate estimates of downstream discharge, rather than only simulating natural streamflow. Furthermore, NO_Qout provided results similar to STEADY for storage reservoirs. While the former is a good option if no outflow and level data are available for the simulation period, the latter becomes a simple alternative to represent storage reservoir releases if the previous day's outflow is known.
Figure 8 illustrates the ANN results through outflow hydrographs of four storage reservoirs: Furnas (a), Jurumirim (b), Sobradinho (c) and Tucuruí (d). It can be seen that the ANN provides better estimates than INFLOW. The ANN was able to represent reservoirs with high regulation capacity that significantly impact natural streamflow regimes. In fact, WITH_Qout outflow closely approximates the observations, while outflows from NO_Qout and SIM_Qout reproduce seasonal tendencies adequately but are rarely accurate at a daily scale. The weekday feature can be detected in the ANN outflow hydrographs as a 7-day cycle in which outflow reduces on the last day; this variable seems to be important for predicting outflow, especially in Furnas. In Tucuruí in particular, the ANN NRMSE varied from 9% (WITH_Qout) to 17% (SIM_Qout), while INFLOW presented an NRMSE of 33%. In terms of the Nash-Sutcliffe efficiency, ANN performances varied from 0.99 (WITH_Qout) to 0.95 (SIM_Qout), while INFLOW reached 0.81. These results indicate that Tucuruí has a relatively well-defined operation and that the ANN was able to capture it.

CONCLUSION
This paper offered a first evaluation of the potential of using machine learning techniques to simulate a complex coordinated system of hydropower reservoirs such as the Brazilian SIN. We proposed an ANN model to predict daily outflow from most of the hydropower reservoirs connected to the SIN, given water balance, time, and demand input variables.
We used 14 input features and assessed their influence on the model output. As expected, model outflow is mainly influenced by water balance variables, such as the outflow of previous days and inflow. However, features such as the water levels in 4 large representative reservoirs (LRL), which serve as a proxy for hydropower demand through the water availability in different regions of the country, seemed to influence reservoir outflow predictions to some extent, indicating advantages of an integrated assessment of SIN reservoirs.
The ANN was trained with (WITH_Qout) and without (NO_Qout and SIM_Qout) reservoir water level and outflow observations as input features to represent usual situations of outflow estimation. The ANN results were compared to two simple benchmarks: outflow is equal to the outflow of the day before (STEADY) and outflow is equal to inflow (INFLOW). There is a significant difference between estimating outflow for run-of-the-river and for storage reservoirs, since the latter considerably modify the natural streamflow hydrograph. Using an ANN for run-of-the-river reservoirs seemed unnecessary, as inflow is almost equal to outflow. However, for storage reservoirs, the ANN model outperformed the benchmarks. When outflow data is available, the WITH_Qout ANN model (median NSE = 0.91) outperforms STEADY (median NSE = 0.81). When outflow data is not available, ANN performance deteriorates, dropping to median NSEs of 0.77 (NO_Qout) and 0.68 (SIM_Qout); however, it is still far superior to INFLOW (median NSE = 0.29).
In conclusion, we have simulated daily outflow from 138 reservoirs individually, but using input features that include information from other reservoirs as well, and the model performance has been superior to the benchmarks. These results indicate that an integrated approach benefits the simulation of a coordinated hydropower system and that machine learning techniques are interesting tools for estimating reservoir outflow. However, other ANN arrangements, other possible input features and/or deep learning techniques might improve outflow predictions (Zhang et al., 2018), given this large amount of data and high nonlinearity. The ANN model reasonably approximates reservoir operations and becomes an interesting tool for large-scale analysis of streamflow impacts. However, this is a general model and does not substitute for specific reservoir models that include important local information such as water supply demands, flood control, and environmental legislation for management purposes.