Monthly streamflow forecast for National Interconnected System (NIS) using Periodic Auto-regressive Endogenous Models (PAR) and Exogenous (PARX) with climate information

This study aims to find a seasonal streamflow forecast model simultaneous to all stations of SIN using periodic autoregressive models with exogenous variables (PARX) using climate indexes. Comparing the results from PAR and PARX Models, this research analyzes the impact on forecasts by using climate information. The proposed models for streamflow forecast has been carried out using natural streamflow data from Operador Nacional do Sistema (ONS) and statistical techniques (such as multiple linear regression and stepwise method to choose explanatory variables). On 27 climate indexes utilized, 4 of them are suggested in this work. The performance analysis methodology is based on the ELECTRE method further the NASH coefficient, the mean absolute percentage error, the multi-criteria distance and correlation. Forecasts with one month lead, the PAR models present better results for most stations of SIN within seasons DJF, MAM, and JJA, while for SON season there is greater efficiency from PARX model. This kind of model shows better performance during dry season in the basins at Northern Brazil – Amazonas and Araguaia-Tocantins; Central-Eastern Brazil – Eastern Atlantic and the most rivers located in the Paraná basin.


INTRODUCTION
The Brazilian power grid has a great contribution in the energy from hydroelectric plants.The Brazilian hydroelectricity generation and distribution capacity is directly influenced by the hydrological variability of the streamflows and the interconnection of the hydroelectric power plants, whose forecasts and uncertainties should be considered in the system operation and planning (ALEXANDRE, 2012;ANEEL, 2002).
In general terms, the quality of the hydrological forecasts affects the performance of the system operation.Thus, methodological improvements of hydrological scenarios or inflows generation contribute to the advance of the planning process and the operation schedule of the National Integrated System (NIS).
The monthly streamflow forecasting currently used by the National System Operator (ONS) can be performed through two methods.The first one uses rain-flow models with daily time-step added to the stochastic model of weekly streamflow forecast -PREVIVAZ, which can predict up to six weeks lead time.The generation of streamflow scenarios is obtained from the processing of the GEVAZP model, which usually consider 1,000 scenarios generated from a historical series of natural streamflow with 12 previous months.The second method can be performed more directly by the use of the PREVIVAZM software, which has a calculation structure similar to that of PREVIVAZ with a monthly time-step.The PREVIVAZM model was elaborated as a tool for special studies to verify the conditions of the supply of the power demand.The models utilized in both methods use the Periodic AutoRegressive model (PAR) or the Periodic AutoRegressive model with Moving Averaging (PARMA) (ALEXANDRE, 2012).
Studies performed since the 1990s pointed out to the use of climatic variables could reduce uncertainties due to the high correlation of these variables with hydrological variables (SOUZA FILHO;LALL, 2003;FOLEY et al., 2002;BARROS et al., 2004).On global scale, can be highlighted the works of Dettinger and Diaz (2001), Amarasekera et al. (1997) carried out in the tropical zones of South America and Africa and that of Stuck et al. (2006) and Garcia and Mechoso (2005) in South America.The studies of Carriello et al. (2005), Soares et al. (2006) and Kim and Vissotto (2003) analyzed the relation between Sea Surface Temperature (SST) and the hydrography in Brazil.In terms of specific regional basins, the works of Cardoso and Silva Dias (2006) and Berri et al. (2002), focused on basins in the Brazilian southeast; Foley et al. (2002) and Barros et al. (2004) analyzed basins of the Brazilian northwest showed the good representative capacity from SST variations associated to the streamflow variability in Brazil.
In order to take advantage of the climatic information, statistical models of monthly streamflow forecasting have been proposed in several studies, proving that the use of climatological indexes as explanatory variables in mathematical models provides ability to explain regimes of intranual inflows.Among those studies, we can highlight the ones by Uvo and Graham (1998), Souza Filho and Lall (2003, 2004), Silva and Molion (2004), Rohn et al. (2003), andRocha et al. (2007).
The current paper has the objective of developing a simultaneous seasonal flow prediction model for all NIS sites by Periodic Autoregressive Models with Exogenous Variables (PARX) by using climatic indexes, preserving the spatial structure through the correlation of the residuals.

National Interconnected System (SIN)
The National Interconnected System (SIN) manages the generation and transmission of electricity in Brazil.The SIN is a hydrothermal system in large scale with prominently hydropower plants, just 1.7% of installed capacity in Brazil are isolated systems located over Amazon region (ONS, 2016).
For planning and management, the ONS adopts streamflow forecasting by using subsets of hydrological output in each basin, denominated base station.At the rest of the sites, the streamflows is predicted through monthly linear regression based on the data from the base stations to complement the forecast for the whole SIN.ONS currently works with a total of 88 Base Stations (BS), which are representative of the various regional hydrological regimes found in Brazil (ONS, 2010).These BS are listed in Figure 1.

Periodic Autoregressive Model (PAR)
The method is based on periodic autoregressive models and for the maintenance of the spatial correlation.,Among the BS, the noise correlation (CRD) is utilized and defined at the moment of the calibration of the regression models.The first procedure to establish the PAR model is to choose the explanatory variables by using stepwise technique.
The standardization of the monthly series is the first stage of calculation of most hydrological series analyzes, as follows ( ) Z represents standardized streamflow in month (m=1,2,3...12) for the time t.Since t the number of the months in the time series, then it varies from 1 to 12*N, where N is the total of years., t m Q is streamflow at a given BS in period of months t in m 3 /s, m µ is the seasonal streamflow average of m (unit m 3 /s), and m σ is the seasonal streamflow standard deviation in month m in m 3 /s.
The PAR model (p) is referred by the index "p" and denominated the order or autoregressive terms from the model.In general, "p" is a vector that establish the order of each element given on a PAR period (p 1 , p 2 ,..., p 12 ).The standard formulation for a standardized variable "Z" could be formally descripted with P is the order of the Autoregressive Period operator "m", and t a the time series of noise, they are independent with mean zero and variance ( )   2 m a σ .The noises are defined at the moment of the calibration of the models, based on the difference between the expected inflows and the observed ones.The maintenance of the spatial correlation in the prediction of streamflow is performed through the use of the monthly covariance matrix of the noise between the BS to generate the random component of the error following a normal multivariate distribution.

Periodic Autoregressive Exogenous Model with noise correlated (PARX-CRD)
The PARX-CRD model generation strategy is similar to PAR-CRD, in order to consider previous months as predictors of the month to be forecasted and by maintaining the spatial correlations.However, it adds climatic variables as possible predictors, similar to Equation 2 where − m 1 j β is the predictor climatic variable j for the month m.Monthly data were obtained from (NOAA, 2014).As criterion for choosing the available indices, all indexes that have some physical relationship to the streamflow variability in South America were used in the correlation analysis.This analysis pointed to the possibility of using 22 indices added to the SST Index of the SST in South Atlantic at the Confluence of the Brazil-Malvinas (ASBM) defined by Cataldi (2008).
Then the possibility of creating other climate indices was analyzed.First, the predominant hydrological regime of the 88 SB was determined, with a predominant streamflows above the annual average in the first semester and streamflows below the annual average in the second semester.Data from the International Research Institute for Climate and Society (IRI) data were analyzed using data from Sea Surface Temperature (SST) and Zonal Wind (ZW).
In order to evaluate the influence of the climatic indexes and their representativeness in streamflow series, 27 indexes were considered and originated from circulation and atmospheric dynamics and oceanic conditions.Table 1 summarizes all the indexes used in this study.

Performance evaluation of the models
The evaluation procedure of this work considers the forecast with one month lead time.Due to the small number of forecasts, they will be grouped in quarterly blocks: DJF, MAM, JJA, and SON.Monthly streamflow forecast for National Interconnected System (NIS) using Periodic Auto-regressive Endogenous Models (PAR) and Exogenous (PARX) with climate information In order to evaluate the performance of estimated (calibration) and predicted (validation) streamflows in relation to the observed streamflows for hydroelectric power plants, underlying metrics applied by ONS (2010) are used.The results are presented in Table 2 in relation to the median of the generated scenarios.
The NASH coefficient expresses the efficiency of making better predictions with the value 1 (one), representing 100% efficiency.Negative values indicate that the performance of the model is worse than the performance of the forecast model considering the value of the monthly historical average only.
The mean absolute error (EMPA) represents the differences between the observed and simulated standardized values in relation to the observed streamflow.Values close to zero indicate near-perfect prediction.By rising the difference term, the EMPA tends to give greater weight to the large discrepancies between the observed and predicted fields.
The correlation (CORREL) can assume values between -1 and 1 therefore perfect anticorrelation and perfect correlation.Moreover, there is also the total absence of correlation verified with a result equal to zero.This index has the capacity to detect phase correspondence between the series.For the calculation of this index, the median of the forecasts was grouped by seasons and by station, then calculation was made in relation to the observed series.
ONS (2010) developed an index called Multicriteria Distance -DM, which uses the NASH and EMPA indicators, according to Table 2.

Electra
One of the tools widely used to aid in decision problems is Multi-criteria or Multi-objective Decision Analysis.This approach suggests that there is usually no alternative that is the best across  Silveira et al.
all criteria.In order to define the best model for the electric sector, the method used will be Electre, as mentioned by Braga and Gobetti (1997) .The evaluation criteria NASH, EMPA, CORREL and DM will be exploited for this approach.

Electre I
The central idea of the Electre I method is to separate the total set of alternatives from those that are preferred in most evaluation criteria and do not cause an unacceptable level of dissatisfaction in the other criteria.This method is mainly applied in the treatment of discrete alternatives evaluated qualitatively, for a partial ordering of the alternatives.Two indexes are used that measure the advantage and the disadvantage of each alternative in relation to the others.They are: C(a,b) referring to the concordance index with the affirmative aSb and D(a,b) corresponding to the discordance index with the affirmative aSb.The agreement between two alternatives a and b represents the willingness to choose the alternative a in place of b.The concordance index can be understood as a weighted percentage of the criteria for which alternative a is preferred to alternative b.
The first of those indexes is given by the equation where W + is the sum of weights of the criteria for condition a is preferred to b, Wc is the sum of the weights of the criteria for condition a = b, and W -corresponding to the sum of weights of the criteria for the b is the preferred to a.
The concept of disagreement is complementary to that of agreement and represents the discomfort experienced in choosing the alternative to over alternative b, and is determined by: where R Z* k -Z - k , e Z ak the evaluation of a in relation to criterion k.Z* k is the o best degree of evaluation obtained for the criterion k, and Z - k the worse degree evaluation obtained for criterion k.
The relation the preference is defined to establish value of threshold (p, q), between 0 and 1, such as the alternative a is preferred to alternative b, being verified that aSb if, only if, { ( , )   ( , ) To this analysis, it was utilized: p=0.70 and q varying according to the selected indexes, as Table 3.

Electra II
The procedure of ordering of the method Electra II is built in two stages: progressive classification and regressive classification.The final classification is given through average of classification obtained in each stage.The progressive classification could be descripted as following.G F (t) is a subset of G r , where G F (0) =G F ; and G f (t) a subset of G f , where G f (0) =G f , fixes t=0.Therefore, chosen the nodes do not dominated in G F (t).This set is denoted by C. The non-dominated C nodes C are also selected.This set is denoted by A(t).For each x ∈ A(t) is obtained a classification from ( ) ′ = + v x t 1.In this way, the sets G F (t) and G f (t) are reduced from G F (t+1)=G F (t)-A (t) and G f (t+1)=G f (t)-A(t).
If G F (t+1) for empty set, the process stops.Otherwise, t=t+1 and the procedure is repeat from step 2. In the regressive classification are reverted the directions of the arrows in G F and G f obtained an classification a (x), the same way as ( ) ′ v x .The rating is reset by doing ( ) ( ) x X.The final classification of alternatives, ( ) _ v x is calculated by defining the function m(x) as The values are ordered following m(x).

RESULTS
In the determination of the monthly regressions for the 88 SB under study, standardized streamflow data between 1 and 11 months lead time (lag1 to 11) were tested as explanatory variables.It should be mentioned that the streamflow used in this study were standardized according to Equation 1.
By using the stepwise, we selected the explanatory variables, as presented in Table 4.In the PAR model there is a major influence of the lag1 autoregressive variable, present in most (87.7%) of the SB regressions through all months.The streamflow with lag2 appears in 17.4% of the equations, especially in the months of July and August on the Grande Basin.While for the streamflow with lag3 the frequency is of 12.9% and for the streamflows with lag4 and 5 they have the frequency of 9.1%.
As shown in Table 4, January, March, April, July, and August have the second most commonly used explanatory variables, which are formed by streamflow in two months lead time, lag2.However, for the months of May, September, and October, the variables with the second highest level of explanation are those formed by streamflows with lag3; Variables with lag4 and lag5 appear more Monthly streamflow forecast for National Interconnected System (NIS) using Periodic Auto-regressive Endogenous Models (PAR) and Exogenous (PARX) with climate information frequently in December and June, respectively.February has the second explanatory variable most used in the regressions in lag11 flows, with 11 months lead time.
The monthly regressions of SB are built for 6 variables.In 44.1% of these regressions only one variable is required to explain the variability of the dependent term on the equation.According to Table 5 for the equation with two and three variables the percentage is 32.7% and 14.6%.It is estimated that 31 equations, 2.9% of the total, do not have variables, so they do not have definite regressions.Therefore, they will always forecast the average at the station.
January and February are the months in which most of the regressions have only one variable, 69% and 58%, respectively.It is verified for September only 15% of the regressions are defined with one variable only and 43% of the regressions require two variables, thus showing an exception to the rule.
Among the numbers of variables in the regression for the different months, it is verified that the month of May is where the largest number of regressions requires a group of six variables, 18.3% of the 12-month regressions, followed by July with 14.7%;For the group with five variables, the months with the highest percentages are those of June with 12.8%, October and April, both with 10.1%;In the equations with four variables, October and April have 12.5% and 11.9%, respectively; While for the regressions with three variables, the month of February is the one with the highest percentage (16.1%)and October the second highest with 10.6%;In the equations with two variables, March and January have 16.3% and 13.7%, respectively; Regressions with one variable only are observed with higher frequencies in January (20.3%) and November (15.9%).There is an influence on the time series of streamflow in the Atlantic and Pacific SST months represented by the PDO, AMO, NAO, NINO 3 and NINO 1.2 indexes.In the FMA months the TNI variable stands out, as in the season JJA, AMO and TNI are the most representative exogenous variables.In January there is the inclusion of the variable ZW3 in 18 equations, this index is proposed in this paper.

Comparison between PARX-CRD and PAR-CRD
It is observed in Figure 3 that the mean monthly value of the EMPA of the series of estimated values of 30% ranging between 8 and 135%.For the values in the periods of DJF, MAM, JJA, and SON as averages are 34%, 30%, 23%, and 34%, respectively.
For all the periods under analysis, there is an increase of the Mean Absolute Error for SB between 70 and 88, referring to the basins of the Iguaçu River (Paraná basin) and Uruguay, as EMPA for regions of 53% for the forecasting series, DJF (44%), MAM (45%), JJA (63%), and SON (59%).
During MAM months, the forecast capacity increases subtly to a larger area belonging to the Paraná basin, except for its northwest region.The regions along Amazonas, Tocantins-Araguaia, Southeast Atlantic and North Atlantic Rivers remain with very low predictability, with Nash below zero in some of these stations.
In JJA the change is quite drastic; however, a very disparity remains in the different regions according to the NASH coefficients (Figure 4).Thus, good results are observed in the Central-Eastern region of the river basin Paraná, Southeast Atlantic and northern Uruguay Basins.While northwestern and northern of the Paraná basin, the Amazon region, and Araguaia-Tocantins remain with NASH values below zero.
In the period of SON the predictability begins to fall again.The spatial distribution of the NASH coefficients remains the same as in the JJA period, but the predictive capacity along these SB decreases considerably.
For the MAM season the prediction capacity improves for a larger area of the Paraná Basin, except for the northwest region of the basin.The regions of the Amazonas, Tocantins-Araguaia, Southeast Atlantic and North Atlantic Rivers remain very low predictability.
Figure 5 shows that the PAR model is in the last quarters DJF, MAM and JJA, while PARX is better in the SON quarter.

CONCLUSION
The analysis of the prediction models for 1 lead month showed a better performance in general of PAR models than the PARX ones at the periods of December, January and February (DJF); March, April and May (MAM) and June, July, and August (JJA), based on the Electre II multi-criteria test.For the period of September, October and November (SON) the PARX model presents better predictions than the PAR in most of the stations.
Studies pointed out to the improvement of the performance of streamflow forecast models by the insertion of explanatory variables from climatic indexes (CATALDI, 2008;SOUZA FILHO;LALL, 2003).However, it was observed that the process described is not confirmed for all regions and all months of the year.This is probably due to the strong signal of the persistence in the PAR model, forecasts with more antecedence may have different results than those shown in this paper.Nevertheless, the PARX model shows good performance (based on the Electra II multicriteria test) for the dry period of the basins in the north of Brazil, Amazonas and Araguaia-Tocantins; and in the central east of Brazil in the Eastern Atlantic and most of the rivers in the Paraná Basin.
Considering the performance of the climatic forcing in the dynamics of the precipitation processes, the information added shown by the PARX for a month lead time in the months of SON suggests that the investigation of climatic indexes can be an alternative for the reduction of the uncertainties associated to the forecast for the SIN.It should be highlighted that the SON is the transition period between the dry and rainy season on most parts of Brazil, so the ability of models based on persistence may not be enough to capture this onset, requiring information from the large scale characteristics of the atmosphere-ocean and their impacts on the hydrological cycle.
∅ is Autoregressive Period operator, m

Figure 1 .
Figure 1.Spatial distribution of Base Stations utilized in this analysis.

Figure 2 .
Figure 2. Localization of the regions where suggested climate index were extracted.

Figure 5 .
Figure 5.Comparison between PAR and PARX using the Electre method, the y-axis indicates the score obtained by each technique when considering all the base stations.In those stations where Electre did not identify the best methodology 0.5 score were recorded for PAR and PARX.
The observational data base used as input for PAR and PARX models are streamflow from ONS since 1931 to 2008.Exogenous variables into PARX models are climate indexes made available by International Research Institute for Climate and Society (IRI).The period from 1931 to 1998 covers calibration time of the models, while the verification period is defined 1999-2008.

Table 1 .
Climate Indexes as input into PARX model.

Table 2 .
Index for evaluation of the models.
t QO -mean observed streamflow in time interval t; DM -multi-criterion distance.

Table 4 .
Frequency of time that variable was utilized during such month for PAR model.

Table 5 .
Number of variables by equation in PAR model.

Table 6 .
Explanatory variables utilized in most equation of monthly regression with PARX-CRD for 12 months in each 88 Principal Components.