Regionalization of hydrological model parameters for the semi-arid region of northeast Brazil

ABSTRACT This study analyzes the regionalization of hydrological model parameters in data-scarce semi-arid regions, focusing on part of the Brazilian semi-arid region covering the states of Ceará, Rio Grande do Norte, Paraíba and Pernambuco. The hydrological model adopted was MODHAC (Modelo Hidrológico Auto-Calibrável, Self-Calibrating Hydrological Model). Forty-five fluviometric stations were selected, each with at least eight consecutive years of consistent data from the year 2000 onward, in addition to records from earlier years. The main regionalization criterion was the proximity between the available fluviometric stations; selection based on physical and climatic properties was also evaluated for some stations. The factors affecting the quality of the flow series and of the rainfall monitoring network were analyzed. The influence of reservoirs on these data and the deactivation of many rainfall monitoring stations were the most frequent problems. Considering different performance indicators, such as the Nash coefficient computed on the square root of flows, which was satisfactory in 63% of the cases according to reference values presented in the literature, the procedure was reasonably successful in quantifying high and mean flows, but in general did not fit low flows well.


INTRODUCTION
Hydrological models have been used to simulate flow series from observed or forecast rainfall. However, such models require calibration of their parameters with pluviometric and fluviometric series of adequate length and quality. Such series are not always available, which makes it difficult to establish reliable parameters for simulating water availability in basins or sub-basins without flow data. Regionalizing the parameters of a hydrological model calibrated in one basin, to simulate the water availability of a similar basin without hydrological series, is therefore a strategy that can bring substantial benefits to water resources planning. This paper analyzes the possibility of regionalizing the parameters of a hydrological model with several known applications in studies and projects in the Brazilian semi-arid region and discusses the associated challenges. Pilgrim et al. (1988) explain that arid and especially semi-arid regions are often in delicate hydrological equilibrium: their whole hydrological behavior (and thus the values of the model parameters) can be altered by prolonged sequences of wet or dry weather. In these regions, rainfall also tends to be more variable in both space and time than in wetter areas.
According to Huang et al. (2016), most hydrological models described in the literature represent runoff well in humid regions, but not in semi-arid watersheds. Al-Qurashi et al. (2008), citing several authors, attribute this difficulty to the general scarcity of data on precipitation, flow, soil properties and initial moisture conditions; the influence of seasonal and interannual variability of vegetation; the complexity of watercourse morphology; the difficulty of quantifying overflow losses; and inaccuracy in estimating potential evaporation.
The term regionalization denotes the transfer of information from one river basin to another. According to Samuel et al. (2011), most methods for regionalizing hydrological model parameters are based on three sets of criteria: physical similarity between basins, spatial proximity, and regression-based approaches. A detailed analysis of experiences in regionalizing hydrological model parameters is presented by Beck et al. (2016). The authors describe experiments in different parts of the world involving models with 1 to 28 calibrated parameters. In their own work, they apply the HBV model (BERGSTRÖM, 1992; VIS, 2012), with 14 parameters to be calibrated, to 1787 watersheds in different regions of the world with areas from 10 to 10,000 km², considering physio-climatic factors in the regionalization analysis. Based on performance criteria, the authors considered the results satisfactory, particularly for basins less than 5,000 km away from those for which the parameters were calibrated. Beck et al. (2016) also discuss various regionalization approaches that seek to identify flow signatures (such as the slope of the flow duration curve and the base flow rate). The literature records an extensive discussion of the results obtained with different regionalization criteria. Many authors report good results from transferring parameter sets according to some measure of climatic and/or physiographic similarity (KOKKONEN et al., 2003; MCINTYRE et al., 2005; PARAJKA et al., 2005; OUDIN et al., 2008; LI et al., 2009; REICHL et al., 2009; WALLNER et al., 2013; SINGH et al., 2014; SELLAMI et al., 2014; GARAMBOIS et al., 2015).
Even under similar conditions, the scale of the studied watershed often limits the transfer of information. A small sub-basin located in a region considered homogeneous for several larger basins may not behave hydrologically like the larger basins in which it is nested, as shown by the scale-effect studies of Girardi et al. (2011).
Other authors, on the other hand, report many analyses in which transferring parameter values based only on the physical proximity of the watersheds led to good regionalization results. Patil and Stieglitz (2014) discuss regionalization criteria based on physical-climatic similarity and on proximity between basins, noting that in many applications there is no clear evidence favoring one criterion over the other. Oudin et al. (2008) and Zhang et al. (2015), for river basins in Austria, France, and Australia, reported better results in transferring hydrological model parameters when the proximity between basins was taken into account. Petheram et al. (2012) reached the same conclusion in a study of flow regionalization for sparsely gauged watersheds in Northern Australia.
Several authors have proposed regionalizing flow duration curves as the hydrological response, or one of the "signatures", of the watershed (LIU; HAN, 2010; PINHEIRO; NAGHETTINI, 2010; WESTERBERG et al., 2011; PRAFULLA; YILMAZ; GUPTA, 2012; COSTA et al., 2014). In the last of these articles, the authors apply a flow duration curve regionalization methodology to watersheds in two Brazilian regions. One of them consists of intermittent rivers in the state of Ceará, and the results there were inferior to those obtained for basins with perennial rivers.

Runoff modeling in semi-arid regions
The semi-arid region has been a major challenge for the application of hydrological models due to data scarcity, especially of flow data. According to McIntyre and Al-Qurashi (2009), the selection of a rainfall-runoff model for an arid or semi-arid region should consider the spatial characteristics of precipitation, the variability and nonlinearity of losses, and the availability and quality of data.
Few hydrological models have been developed specifically for arid or semi-arid regions. To overcome this difficulty, many studies use general-purpose models with no specific formulation for semi-arid regions. Al-Qurashi et al. (2008), seeking to represent the spatial variability of soil and rainfall during 27 hydrological events, applied the distributed KINEROS2 model to an arid watershed in Oman. Based on performance indicators, the authors conclude that the model validation was not satisfactory for all events or for all the calibration strategies tested to evaluate the highest flows. They state that their results are consistent with the experience of other modelers in arid and semi-arid climates and that further research is needed, especially on rainfall observation and spatial modeling.

Another experience with hydrological modeling in the semi-arid is reported by Adam et al. (2017) in their study of the El Hawad watershed in the semi-arid region of Sudan, where the average annual rainfall is about 250 mm. This work used the SCS Curve Number model, which, according to the authors, enabled the estimation of runoff depths consistent with rainfall for different soil moisture conditions. Another case is the study of Kan et al. (2017) for nine Chinese river basins: three in humid regions, three in semi-arid/semi-humid regions and three in arid regions. Kan et al. (2017) investigated the application of three hydrological models to flood forecasting: XAJ (Xinanjiang model, based on runoff generation by saturation excess, suitable for humid and semi-humid regions), NS (Northern Shaanxi model, based on runoff generation by infiltration excess, assumed to be common in arid areas) and MIX (based on vertical mixing, combining saturation excess and infiltration excess in runoff generation).
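For readers unfamiliar with the Curve Number method mentioned above, the sketch below computes direct runoff depth from a rainfall depth using the standard textbook form in metric units, with potential retention S = 25400/CN - 254 (mm) and initial abstraction Ia = 0.2 S. This is an illustration under those assumptions, not the exact implementation used by Adam et al. (2017); their treatment of soil moisture conditions (normally handled by adjusting CN) is not reproduced here.

```python
def scs_cn_runoff(p_mm: float, cn: float, ia_ratio: float = 0.2) -> float:
    """Direct runoff depth (mm) for a rainfall depth p_mm using the SCS Curve Number method.

    Assumes the common metric form S = 25400/CN - 254 and Ia = ia_ratio * S.
    Antecedent moisture conditions are assumed to be already reflected in `cn`.
    """
    if not 0 < cn <= 100:
        raise ValueError("CN must be in the interval (0, 100]")
    s = 25400.0 / cn - 254.0   # potential maximum retention (mm)
    ia = ia_ratio * s          # initial abstraction before runoff starts (mm)
    if p_mm <= ia:
        return 0.0             # all rainfall is abstracted, no direct runoff
    return (p_mm - ia) ** 2 / (p_mm - ia + s)


# Example: 50 mm of rain on a basin with CN = 80.
runoff_mm = scs_cn_runoff(50.0, 80.0)
```

With CN = 100 the retention is zero and all rainfall becomes runoff, which is a quick sanity check on the formula.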
The results confirmed the complexity of flood forecasting in the drier basins. All the models tested performed satisfactorily in humid watersheds, and only the NS model was applicable in arid watersheds. For the semi-arid/semi-humid basins, the XAJ and MIX models, which consider saturation excess in runoff generation, performed better than the NS model, which is based only on infiltration excess.
A study using the HEC-HMS hydrological model is presented by Wang et al. (2016) for the Hailiutu watershed in semi-arid northwestern China, calibrated for 1978 to 1992. In this case, the model systematically underestimated winter and spring flows and some summer flows, due to the discrepancy between the nonlinear rainfall-runoff response of the basin and the linear structure of the Soil Moisture Accounting (SMA) model. Another example is the work of Traore et al. (2014) in the Koulountou River basin, a tributary of the Gambia River located in the Republic of Guinea (Conakry). This study involved two hydrological models, GR4J and GR2M, developed by the French research institute CEMAGREF, today IRSTEA (National Research Institute of Science and Technology for Environment and Agriculture). According to the authors, comparing observed and calculated flows, both models achieved satisfactory results under the established criteria: the Nash coefficient and its variants computed on the square roots and on the decimal logarithms of the flows were all greater than 0.7. These results provided parameter sets for both models that would allow missing flows to be reconstructed from rainfall.
McIntyre and Al-Qurashi (2009) discuss the IHACRES model, which they define as semi-empirical and which was developed jointly by the Institute of Hydrology (IH) in the United Kingdom and the Center for Resource and Environmental Studies (CRES) of the Australian National University. The model applies a unit hydrograph to total runoff after deducting losses, which are nonlinearly related to precipitation. It was initially designed for watersheds in temperate climates and was progressively adapted to represent runoff in temporary rivers. It is currently available in lumped and semi-distributed versions, with daily or sub-daily time steps. The authors report applications in several semi-arid basins in Australia and Oman, noting that the best results are obtained with the semi-distributed version. However, they emphasize the limitations of this type of model when the quantity and quality of hydrological data are not adequate.
Regarding modeling experiences in the Brazilian semi-arid, Cabral et al. (2017), for example, satisfactorily calibrated the HEC-HMS model (Hydrologic Engineering Center - Hydrologic Modeling System) for a region between the Zona da Mata and the Agreste, in the São Miguel river basin in the state of Alagoas. The model was used to simulate flows from both observed and radar-estimated precipitation, in order to compare them with the observed flows. However, their results showed that the flows simulated with radar-estimated precipitation underestimated the magnitude of the peak flow and the volume, while adequately representing the timing of the peak, with good Nash-Sutcliffe coefficient values (0.75 to 0.79). An experience in the semi-arid of the state of Paraíba is described by Felix and Paz (2016) for the Piancó basin. They used the drainage area of the fluviometric station of the same name as a research sub-basin, simulating flows with the distributed MGB-IPH model (Large Basin Model). This study reveals the difficulty of the model in representing the lower flows of the Piancó River. According to the authors, simulated and measured flows agree very well on the flow duration curve for flows above 50 m³/s, which are exceeded 5% of the time. On the other hand, there is no runoff approximately 40% of the time, but the model maintains a residual flow. This behavior was also identified by other authors discussed in this article and results from applying a type of model that has no mathematical formulation for rivers with semi-arid characteristics.
An evaluation of the suitability of general-purpose models for semi-arid regions was carried out by Huang et al. (2016). Four classic hydrological models, TOPMODEL, Xinanjiang (XAJ), SAC-SMA and Tank, were selected and applied to three semi-arid basins in northern China. Based on the analysis and comparison of the results of these classic models, the authors developed four new models, which they designated as flexible, seeking to improve the modeling response. The application of flexible models aims to identify the dominant runoff processes.
Among the classic models, the authors suggest that the high flexibility of the nonlinear components in the XAJ and SAC-SMA models may explain why these models performed better than TOPMODEL in the semi-arid basins.

MODHAC simulation model
MODHAC (Modelo Hidrológico Auto-Calibrável, Self-Calibrating Hydrological Model), although a general-purpose model like the vast majority of rainfall-runoff models, was also developed to account for semi-arid characteristics (Lanna, 1997). The model was used in the hydrological studies of the Ceará State Water Resources Plan and in the Master Plans of several basins in Bahia. Martins et al. (2006) used the model in an operational study of the reservoirs of the Jaguaribe-Metropolitan system in Ceará. Given these many applications, MODHAC was chosen to evaluate the possibility of regionalization in the semi-arid area of the present study, although it is a general-purpose model.
The selection of the model, study areas, and data used in this research was significantly influenced by the Northeast Atlas project (ANA, 2006). This project was coordinated by ANA (Agência Nacional de Águas, National Water Agency) and involved numerous actors, such as state water resources bodies and other institutions. In the Atlas hydrological studies for the regions of the states of Ceará, Rio Grande do Norte, Paraíba and Pernambuco, the MODHAC model was used to estimate the water availability provided by several reservoirs in the Northeast region.
For the Atlas, in turn, the model was chosen based on the studies of the São Francisco water transfer project (Brasil, 2001; ENGECORPS; HARZA, 2000). In the present study, MODHAC was run with the parameters adjusted in the Atlas studies, complemented by new calibrations for areas not covered by the Atlas or whose adjustments were considered unsatisfactory (initially based on the R² and Pbias performance indicators); these parameters were also applied to more recent data series.
MODHAC is a lumped hydrological model, i.e., it does not explicitly consider the spatial variability of hydrological processes or of the physiographic characteristics of the basin, but it can be used as a semi-distributed model for a basin composed of interconnected sub-basins. The model simulates the land phase of the hydrological cycle, that is, the transformation processes that lead to river flows. Heuristic (manual) calibration of the parameters was chosen instead of the model's automatic calibration option, to ensure better sensitivity and refinement, but mainly to draw on the experience of the applications mentioned above.
Water storage in the basin is simulated through three conceptual reservoirs: surface, subsurface and underground storage. Precipitation first passes through a corrective filter, needed because of deficiencies in the rainfall data arising from low rainfall or systematic reading errors. The corrected precipitation then supplies potential evapotranspiration (PET), which may or may not be fully met. Any remaining water feeds the surface reservoir (vegetation and topographic depressions), where the unmet PET is reevaluated; this also occurs without rain if the reservoir is not empty. If the surface reservoir overflows, surface runoff occurs, together with percolation to the subsurface and underground reservoirs. Water in the subsurface reservoir can meet the remaining PET and contributes to base runoff; if this reservoir overflows, it produces hypodermic runoff (interflow). Water in the underground reservoir also contributes to base runoff and, if it overflows, flows into the subsurface reservoir, which may then release it as runoff. Figure 1 schematically shows the processes of the hydrological cycle in MODHAC.
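The order of operations described above for the surface reservoir can be illustrated with a single time step of a simple bucket. This is a heavily simplified sketch, not MODHAC's actual equations (Lanna, 1997): the corrective rainfall filter is omitted, and the split of the overflow between runoff and percolation uses a hypothetical constant fraction `perc_fraction`. Only `rspx`, the maximum capacity of the surface reservoir, corresponds to a parameter named in this paper (RSPX).

```python
from dataclasses import dataclass


@dataclass
class SurfaceStepResult:
    storage: float      # surface storage at the end of the step (mm)
    aet: float          # evapotranspiration actually met (mm)
    unmet_pet: float    # PET still unmet, passed on to the subsurface reservoir (mm)
    runoff: float       # surface runoff (mm)
    percolation: float  # water sent to subsurface/underground storage (mm)


def surface_step(p: float, pet: float, storage: float, rspx: float,
                 perc_fraction: float = 0.3) -> SurfaceStepResult:
    """One step of a simplified surface reservoir (illustrative, not MODHAC's formulation).

    perc_fraction is a HYPOTHETICAL share of the overflow sent to percolation.
    """
    if not 0.0 <= perc_fraction <= 1.0:
        raise ValueError("perc_fraction must be between 0 and 1")
    # 1) Precipitation first supplies PET directly.
    direct_et = min(p, pet)
    p_left, pet_left = p - direct_et, pet - direct_et
    # 2) Remaining rainfall enters the surface reservoir.
    storage += p_left
    # 3) Unmet PET is reevaluated against surface storage (also happens without rain).
    et_from_storage = min(storage, pet_left)
    storage -= et_from_storage
    pet_left -= et_from_storage
    # 4) Water above the capacity RSPX overflows as runoff plus percolation.
    excess = max(0.0, storage - rspx)
    storage -= excess
    percolation = perc_fraction * excess
    return SurfaceStepResult(storage, direct_et + et_from_storage, pet_left,
                             excess - percolation, percolation)
```

Each step conserves mass: rainfall plus initial storage equals evapotranspiration, runoff, percolation and final storage.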
The model requires the calibration of 14 parameters, described in Table 1. Despite this large number, some of them can be considered constant or even disregarded in the calibration, depending on the characteristics of the basin being modeled, which reduces the adjustment effort.

Figure 1. Schematic representation of the processes of the hydrological cycle in MODHAC. Source: (Lanna, 1997).
The underground reservoir simulates water storage in the lower soil layers. It encompasses the aquifer and gives rise to a more delayed underground or base runoff that sustains streamflow during droughts. Two capacities of this reservoir must be considered, both of which are model parameters: RSBX, the full capacity, and RSBY, which controls the leakage, i.e., the contribution of this reservoir to runoff. When storage is below RSBY, leakage is controlled by the ASBX parameter; when it is above RSBY, leakage is controlled by ASBY.
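The threshold rule for the underground reservoir can be sketched as follows. The paper states only which parameter controls leakage below and above RSBY; the linear form used here (leakage = coefficient × storage, with ASBX and ASBY treated as fractions per time step) is an assumption for illustration, and MODHAC's actual expressions (Lanna, 1997) may differ. Overflow above the full capacity RSBX is returned separately because, as described above, it flows into the subsurface reservoir.

```python
def underground_step(storage: float, recharge: float, rsbx: float, rsby: float,
                     asbx: float, asby: float) -> tuple[float, float, float]:
    """One step of an underground reservoir with a storage-dependent leakage rule.

    ASSUMPTION: leakage is linear in storage, using ASBX below RSBY and ASBY otherwise.
    Returns (end storage, base flow, overflow to the subsurface reservoir), all in mm.
    """
    for name, coeff in (("asbx", asbx), ("asby", asby)):
        if not 0.0 <= coeff <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    storage += recharge
    # Water above the full capacity RSBX spills to the subsurface reservoir.
    overflow = max(0.0, storage - rsbx)
    storage -= overflow
    # RSBY switches between the two leakage coefficients.
    coeff = asbx if storage < rsby else asby
    baseflow = coeff * storage
    storage -= baseflow
    return storage, baseflow, overflow
```

Below RSBY the reservoir drains slowly (small ASBX), which is the mechanism that sustains base flow in dry months.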

Physical characteristics of the Brazilian semi-arid region and limitations of hydrological data
Semi-arid regions, such as most of the Brazilian Northeast, have large irregularities in rainfall volume and in its temporal and spatial distribution. For this reason, several strategies are adopted to regulate water availability throughout the dry period, such as the thousands of dams built to store and redistribute water for various purposes during droughts (CIRILO et al., 2017). Historical series of hydrological data often do not exist, due to lack of monitoring, or are too short; this is especially true of stage observations and of the discharge measurement campaigns needed to establish rating curves and, from them, determine flows. Moreover, reservoirs alter the natural runoff conditions, so this effect must be identified when simulations are performed.
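To make the role of rating curves concrete, the sketch below fits the widely used power-law form Q = a(h - h0)^b by least squares on log-transformed stage-discharge measurements. The paper does not state which rating-curve form the monitoring agencies use; the power law, the assumption that the zero-flow stage h0 is known, and the synthetic data are all illustrative.

```python
import math
from typing import Sequence


def fit_rating_curve(stages: Sequence[float], discharges: Sequence[float],
                     h0: float) -> tuple[float, float]:
    """Fit Q = a * (h - h0)**b via linear regression of log Q on log(h - h0).

    ASSUMPTION: h0 (stage of zero flow) is known and all stages exceed it.
    """
    xs = [math.log(h - h0) for h in stages]
    ys = [math.log(q) for q in discharges]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(my - b * mx)
    return a, b


def discharge(h: float, a: float, b: float, h0: float) -> float:
    """Flow for stage h; zero at or below the zero-flow stage h0."""
    return a * (h - h0) ** b if h > h0 else 0.0


# Synthetic measurements generated with a = 10, b = 1.6, h0 = 0.2 (hypothetical values).
stages = [0.5, 1.0, 1.5, 2.0, 3.0]
flows = [10.0 * (h - 0.2) ** 1.6 for h in stages]
a_fit, b_fit = fit_rating_curve(stages, flows, h0=0.2)
```

Without repeated discharge measurements the coefficients cannot be updated, which is why interrupted measurement campaigns degrade the flow series.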
The northeastern semi-arid region is characterized by intermittent flow in most of its rivers, which stop flowing for most of the year and become perennial only where they reach wetter regions, near their mouths on the Atlantic Ocean. Moreover, the crystalline basement, which covers more than 80% of the region, has characteristics that most hydrological models do not represent with respect to river-aquifer storage and water exchange. Especially when simulations use a time step of one day or less, flow is observed to cease soon after precipitation, indicating almost no storage or groundwater flow. In addition, runoff losses can be considerable (for example, recharge into fractures of the crystalline rock). Therefore, given these characteristics, the modeling is restricted to simulating surface runoff, because the physical environment limits the groundwater component.
Regarding the hydrological monitoring networks, the low density of fluviometric stations with continuous data series of adequate length for studies and projects should be noted. The rainfall network once comprised a significant number of stations, at the time when SUDENE (Northeast Development Superintendence) was active in the sector, until the 1980s. However, most of these stations have since been deactivated. ANA (National Water Agency) and related state agencies are currently working to rebuild the hydrometric network, but long historical series are not yet available.
Another relevant aspect is the presence of dams along most rivers, built from the 1970s onward. River flow series are therefore affected by reservoir operation. Thus, hydrological analyses should consider both physical interventions in the basins and the reduction in the number of rainfall monitoring stations.
The study region is part of the Brazilian northeastern semi-arid and lies in the states of Ceará, Rio Grande do Norte, Paraíba and Pernambuco, between parallels 03° 30' and 09° 30' S and meridians 41° 30' and 35° 13' W, mostly within UTM Zone 24 South. This territory has an area of 281,965 km², comprising 108,956 km² (73.20%) of Ceará, 48,752 km² (92.22%) of Rio Grande do Norte, 45,788 km² (81.01%) of Paraíba and 78,469 km² (79.91%) of Pernambuco; it covers 79.11% of the total area of these states, 17.75% of the Northeast and 3.31% of Brazil's land area, as shown in Figure 2.
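The area figures above can be cross-checked with a few lines of arithmetic: the four partial areas should add up to the stated total, and dividing each partial area by its percentage recovers the implied area of each state, from which the 79.11% share follows. This is only an internal consistency check on the numbers in the text; it does not use official state areas, and the 17.75% and 3.31% shares are not checked because they depend on external totals.

```python
# Partial areas (km²) inside the study region and the share (%) of each state they
# represent, as given in the text.
partial = {
    "Ceará": (108_956, 73.20),
    "Rio Grande do Norte": (48_752, 92.22),
    "Paraíba": (45_788, 81.01),
    "Pernambuco": (78_469, 79.91),
}

# The study area should be the sum of the four partial areas.
total_study_km2 = sum(area for area, _ in partial.values())

# Implied total area of each state = partial area / share.
implied_state_km2 = {state: area / (pct / 100.0) for state, (area, pct) in partial.items()}

# Share of the four states' combined area covered by the study region.
share_pct = 100.0 * total_study_km2 / sum(implied_state_km2.values())

print(total_study_km2)      # 281965
print(round(share_pct, 2))  # 79.11
```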
The wettest quarter is February, March and April in most of Ceará and Rio Grande do Norte (except its eastern portion) and in western and central Paraíba. In Pernambuco the wettest quarter varies by region: in the west, most rain falls in January, February and March, while on the east coast the most significant rainfall occurs in May, June and July. Between these extremes of the state the rainy season is not well defined, depending on the atmospheric phenomena that generate precipitation. According to the Pluviometric Atlas of Brazil project, developed by CPRM (2011), mean annual rainfall isohyets in the study area range from 400 mm to 1,300 mm. In the semi-arid portion of each state, mean annual rainfall ranges from about 400 mm to 1,200 mm in Ceará (average around 750 mm), from 500 mm to 1,300 mm in Rio Grande do Norte (average 632 mm), from 400 mm to 1,000 mm in Paraíba (average 636 mm) and from 500 mm to 1,000 mm in Pernambuco (average 581 mm).

Fluviometric stations
From the ANA inventory and the database built with CPRM and ANA data, 170 stations with flow data were selected for the states of Ceará, Rio Grande do Norte, Paraíba and Pernambuco. Stations on the São Francisco River have drainage areas extending outside the study area and are not part of this research.
From these 170 stations, those were selected whose flow series had at least eight consecutive years of consistent (quality-controlled) data to represent the observed flows in MODHAC and whose drainage areas lay partially or entirely within the semi-arid polygon. This restriction led to the selection of 45 stations, shown in Figure 2, whose physical and climatic characteristics are given in Table 2. However, of these 45 stations only four have consistent flow
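The "at least eight consecutive years" selection rule can be expressed as a longest-run check on the years with complete data at each station. In this sketch, what counts as a complete (consistent) year, for example no missing months after consistency analysis, is an assumption, and the station codes are hypothetical.

```python
from typing import Dict, Iterable, List


def longest_consecutive_run(years: Iterable[int]) -> int:
    """Length of the longest run of consecutive calendar years."""
    best = run = 0
    prev = None
    for y in sorted(set(years)):
        # Extend the run if this year follows the previous one, otherwise restart it.
        run = run + 1 if prev is not None and y == prev + 1 else 1
        best = max(best, run)
        prev = y
    return best


def select_stations(complete_years: Dict[str, Iterable[int]], min_years: int = 8) -> List[str]:
    """Codes of stations with at least `min_years` consecutive complete years."""
    return sorted(code for code, years in complete_years.items()
                  if longest_consecutive_run(years) >= min_years)


# Hypothetical stations: A has 2000-2007 complete; B has a gap in 2002.
stations = {
    "A": range(2000, 2008),
    "B": [2000, 2001, 2003, 2004, 2005, 2006, 2007, 2008, 2009],
}
selected = select_stations(stations)  # ['A']
```

Station B has nine complete years in total but only seven consecutive ones, so it fails the criterion; the same check applies to the eight-year rule for rainfall stations.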

Pluviometric stations
Two rainfall databases were used to calculate the average rainfall series weighted by Thiessen polygons, the method adopted in this study. The first comes from the Northeast Atlas project and covers 1933 to 2000 (originally 1933 to 2001); the second consists of data available from ANA, including more recent records, so that the studied period matches the data from the fluviometric stations. The first database was generated through consistency analysis and series extension, which standardized the data from 830 rainfall stations to the same length. Inconsistencies were detected in the 2001 data of 56 stations, so the series were truncated at 2000.
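The Thiessen-weighted basin rainfall is an area-weighted mean of the station values, P = Σ wᵢPᵢ / Σ wᵢ, where wᵢ is the area of the Thiessen polygon of station i inside the drainage area. The sketch below assumes the polygon areas were already computed in a GIS (that step is not shown) and uses hypothetical station names. It refuses months with missing station data, because dropping a station would strictly require recomputing the polygons.

```python
from typing import Dict


def thiessen_average(rain_mm: Dict[str, float], polygon_area_km2: Dict[str, float]) -> float:
    """Area-weighted mean rainfall over a basin using Thiessen polygon areas as weights."""
    missing = [st for st in polygon_area_km2 if st not in rain_mm]
    if missing:
        # Polygons are only valid for the full station set; recompute them instead.
        raise ValueError(f"no rainfall value for stations: {missing}")
    total_area = sum(polygon_area_km2.values())
    weighted = sum(area * rain_mm[st] for st, area in polygon_area_km2.items())
    return weighted / total_area


# Hypothetical basin with three stations and one month of rainfall.
areas = {"A": 30.0, "B": 50.0, "C": 20.0}
rain = {"A": 10.0, "B": 20.0, "C": 30.0}
basin_rain = thiessen_average(rain, areas)  # 19.0 mm
```

This explains why station deactivation matters: each lost station changes the polygons, and therefore the basin rainfall series that drives the model.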
The second database, with rainfall data from 2000 onward, was obtained from ANA and CPRM. These sources present discontinuities in the rainfall series of some stations, and no treatment was applied to extend the series of deactivated or newly installed stations to a common period. Therefore, only rainfall stations with at least one continuous eight-year data series were selected for this second database.
Thus, the corrected Northeast Atlas database has 830 rainfall stations from 1933 to 2000, and the second database, covering the period after 2000, has data from 228 ANA rainfall stations with no missing data for at least the period 2000 to 2007.

Evaporimetric data
The potential evapotranspiration (PET) data used in the Northeast Atlas project were estimated from evaporation data, a methodology also adopted in this study. The calculations used the data available from the climatological station nearest to the drainage area of the fluviometric station to be modeled. Thus, most of the calibrations/simulations in the present study kept the evapotranspiration data used in the Northeast Atlas project, while for new simulations evapotranspiration data were estimated from the INMET (Brazilian National Institute of Meteorology) database.

RESULTS
After the data were selected for the specified periods, the chosen hydrological model was calibrated, also taking into account the results of previous studies, and then cross-validated to assess whether the calibrated parameters were suitable for other basins.

Calibration of the model parameters and performance indicators
In the calibration process, results from the Northeast Atlas project were used where the selected basins coincided, and parameter values were newly adjusted for areas not covered by the Atlas. The evaluation was done at a monthly time step, comparing the flows calculated by MODHAC with the flows observed at the corresponding fluviometric stations.
The results were evaluated by comparing the flows observed and calculated by MODHAC, using the following performance indicators: NSE, Nash-Sutcliffe efficiency; r², coefficient of determination; RMSE, root mean square error; Pbias%, percent bias; RSR, ratio of the root mean square error to the standard deviation of the measured data; and AAPE, average absolute percentage error. The Nash-Sutcliffe efficiency coefficient (NSE) ranges from -∞ to 1 and is obtained from Equation (1):

$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n}\left(Q_{obs,i} - Q_{sim,i}\right)^2}{\sum_{i=1}^{n}\left(Q_{obs,i} - \overline{Q}_{obs}\right)^2} \qquad (1)$$

where $Q_{obs,i}$ and $Q_{sim,i}$ are the observed and simulated flows in month $i$, $\overline{Q}_{obs}$ is the mean observed flow and $n$ is the number of months. The vast majority of authors use the NSE as one of the main performance indicators, some of them at both daily and monthly time steps. Zappa (2002), for instance, proposes NSE values above 0.5.
It is also recommended to combine the Nash-Sutcliffe efficiency coefficient with other performance indicators. Table 3 gives the minimum and maximum values of NSE, Pbias, and RSR for calibration and validation from various works compiled by Moriasi et al. (2007). The authors gathered studies reporting several performance indicators and selected these three to jointly represent modeling accuracy. A similar analysis by Van Liew et al. (2007) reached the same reference performance ranges for the NSE and Pbias indicators.
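The joint use of NSE, RSR and Pbias can be sketched as a rating function. The thresholds below are the ones commonly cited from Moriasi et al. (2007) for monthly streamflow; Table 3 of this paper remains the reference for the values actually adopted. Combining the three indicators by taking the worst of the three individual ratings is a choice made for this illustration, not a rule stated in the text. Pbias follows the Moriasi convention (observed minus simulated, in percent).

```python
LEVELS = ["unsatisfactory", "satisfactory", "good", "very good"]


def _nse_level(nse: float) -> int:
    if nse > 0.75:
        return 3
    if nse > 0.65:
        return 2
    if nse > 0.50:
        return 1
    return 0


def _rsr_level(rsr: float) -> int:
    if rsr <= 0.50:
        return 3
    if rsr <= 0.60:
        return 2
    if rsr <= 0.70:
        return 1
    return 0


def _pbias_level(pbias: float) -> int:
    a = abs(pbias)  # over- and underestimation are rated symmetrically
    if a < 10:
        return 3
    if a < 15:
        return 2
    if a < 25:
        return 1
    return 0


def moriasi_rating(nse: float, rsr: float, pbias: float) -> str:
    """Overall rating = worst of the three individual ratings (illustrative choice)."""
    return LEVELS[min(_nse_level(nse), _rsr_level(rsr), _pbias_level(pbias))]
```

For example, a simulation with NSE = 0.8 and RSR = 0.45 is still rated only "satisfactory" if its Pbias is 20%, which is why a high NSE alone is not taken as sufficient.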
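In addition to the indicators cited, several authors use NSE variants in which the square root (NSE sqrtQ) or the decimal logarithm (NSE logQ) of the flows replaces the flows themselves in the Nash-Sutcliffe formula, as indicated in Equations (2) and (3), with the same notation as Equation (1):

$$\mathrm{NSE}_{\sqrt{Q}} = 1 - \frac{\sum_{i=1}^{n}\left(\sqrt{Q_{obs,i}} - \sqrt{Q_{sim,i}}\right)^2}{\sum_{i=1}^{n}\left(\sqrt{Q_{obs,i}} - \overline{\sqrt{Q_{obs}}}\right)^2} \qquad (2)$$

$$\mathrm{NSE}_{\log Q} = 1 - \frac{\sum_{i=1}^{n}\left(\log_{10} Q_{obs,i} - \log_{10} Q_{sim,i}\right)^2}{\sum_{i=1}^{n}\left(\log_{10} Q_{obs,i} - \overline{\log_{10} Q_{obs}}\right)^2} \qquad (3)$$

According to Traore et al. (2014), NSE values closer to unity indicate a good fit of the highest flows, NSE sqrtQ of the average flows and NSE logQ of the lowest flows. Pushpalatha et al. (2012) review studies by several authors using other transformations of the NSE indicator and conclude that NSE sqrtQ provides more adequate information about the errors in simulating both high and low flows, in agreement with the results of Oudin et al. (2006).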
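The indicators listed in this subsection can be computed from paired monthly series as sketched below, with NSE and its square-root and decimal-log variants following the forms of Equations (1) to (3). Two choices here are assumptions not stated in the text and matter in intermittent rivers: a small offset (by default 1% of the mean observed flow) is added before taking log10 so that zero flows do not break NSE logQ, and AAPE is averaged only over months with non-zero observed flow. Flows are assumed non-negative, and the observed series must not be constant.

```python
import math
from typing import Dict, Optional, Sequence


def _nse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """1 - sum of squared errors / sum of squared deviations of obs from their mean."""
    mean_o = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    var = sum((o - mean_o) ** 2 for o in obs)
    return 1.0 - sse / var


def performance(obs: Sequence[float], sim: Sequence[float],
                log_eps: Optional[float] = None) -> Dict[str, float]:
    """Performance indicators for paired, non-negative observed/simulated flows."""
    if len(obs) != len(sim) or len(obs) < 2:
        raise ValueError("obs and sim must be paired series with at least two values")
    n = len(obs)
    mean_o, mean_s = sum(obs) / n, sum(sim) / n
    std_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs) / n)
    std_s = math.sqrt(sum((s - mean_s) ** 2 for s in sim) / n)
    rmse = math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / n)
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(obs, sim)) / n
    # ASSUMPTION: offset keeps log10 defined for zero flows.
    eps = 0.01 * mean_o if log_eps is None else log_eps
    # ASSUMPTION: AAPE skips months with zero observed flow (relative error undefined).
    rel = [abs(o - s) / o for o, s in zip(obs, sim) if o > 0]
    return {
        "NSE": _nse(obs, sim),
        "NSE_sqrtQ": _nse([math.sqrt(o) for o in obs], [math.sqrt(s) for s in sim]),
        "NSE_logQ": _nse([math.log10(o + eps) for o in obs],
                         [math.log10(s + eps) for s in sim]),
        "r2": (cov / (std_o * std_s)) ** 2,
        "RMSE": rmse,
        # Moriasi et al. (2007) sign convention: positive = underestimation.
        "PBIAS": 100.0 * (sum(obs) - sum(sim)) / sum(obs),
        "RSR": rmse / std_o,
        "AAPE": 100.0 * sum(rel) / len(rel) if rel else float("nan"),
    }


metrics = performance([2.0, 4.0, 6.0, 8.0], [3.0, 3.0, 7.0, 7.0])
```

Because RMSE and the standard deviation both divide by n here, RSR² equals 1 - NSE, so RSR restates the NSE on a different scale; AAPE, in contrast, gives a small month with a modest absolute error the same weight as a flood month, which explains why it penalizes poor low-flow fits so strongly.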
The calibration analysis is presented below, highlighting the external factors that distorted the adjustments. The performance indicator values calculated for the nine reference stations in the model calibration are summarized in Table 4. The analysis of each station is relevant to understanding the peculiarities that must be taken into account in the regionalization process. Table 5 presents the calibrated parameter values for the nine reference gauge stations.

Iguatu station - 36160000
This station is located on the Jaguaribe River, in the state of Ceará. The modeled periods were 1962 to 1980, 1980 to 2000 and 2001 to 2009. According to Table 4, the parameters calibrated with the data of the Icó station led to the best result, meeting the criteria established by Moriasi et al. (2007) and also representing the minimum flows satisfactorily (NSE logQ > 0.5).

Influence of evaporation data
To identify how much the estimated evapotranspiration data affect the modeling, the sensitivity of the model to the source of these data was evaluated. For this, flows at the Iguatu station were simulated in two situations: using INMET data from the Iguatu climatological station, and using the average evaporation of the four climatological stations adopted in the Northeast Atlas project. The INMET data are climatological normals obtained with a Piché evaporimeter. Table 6 shows the considerable differences in evaporation values, and Figure 3 shows their effect on the simulated flows. Hence the need to keep the same potential evapotranspiration dataset when comparing results of parameter regionalization.

Icó station - 36290000
The Icó station, located on the Salgado River, is, like Iguatu, in the semi-arid region of Ceará. The periods studied are 1959 to 1987, 1988 to 1999 and 2000 to 2007. Some issues must be considered when analyzing the flow data at this station; in particular, modeling quality deteriorates in the period from 1988 to 1999. From 1988 to 1993 the historical series shows zero flow between July and December, which is not observed in the previous 20 years. On the other hand, flood peaks were observed in the period from 1990 to 1996 (Figure 4) without any compatible rainfall. The São João stream, a tributary of the Salgado River, joins it upstream of the Icó station, and the Lima Campos dam is located on this tributary. It is suspected that, because of a six-year drought, this dam may have released water seasonally through the tributary to the Salgado River for regularization, which would explain the decrease in modeling efficiency relative to the period from 1959 to 1987. Nevertheless, the Icó station showed the best performance among all the stations analyzed, with calibrated parameter values approaching the highest rating proposed by Moriasi et al. (2007) and an adequate fit of the minimum flows.

Oiticica station -34741000
This station lies in the far west of Ceará, in the Parnaíba River basin (Sub-basin 34). It was installed only in 2004, so the study used the period with consistent flow data available, from 2005 to 2011. The performance indicator values obtained for this station, presented in Table 4, are classified as satisfactory according to the criteria of Moriasi et al. (2007). The NSE logQ indicates that the fit for low flows was poor, a trend also reflected in the AAPE indicator, which is very restrictive because it weights all flows equally, regardless of their magnitude.
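The contrast between NSE, NSE logQ, and AAPE can be illustrated with a short Python sketch. It assumes the usual definitions: NSE computed on raw, square-root, and log-transformed flows, and AAPE as the mean of absolute relative errors. The offset `eps` used to handle zero flows and the skipping of zero-flow months in AAPE are assumptions for illustration, not choices stated in the study, and the flow series are hypothetical.

```python
import math

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / sum of squared deviations from the observed mean."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst

def nse_sqrt(obs, sim):
    """NSE on square-root flows: peaks weigh less, mid-range flows more."""
    return nse([math.sqrt(o) for o in obs], [math.sqrt(s) for s in sim])

def nse_log(obs, sim, eps=0.01):
    """NSE on log flows: emphasizes low flows.

    ASSUMPTION: eps is added before the log so that zero flows
    (common in intermittent rivers) do not break the transform.
    """
    return nse([math.log(o + eps) for o in obs], [math.log(s + eps) for s in sim])

def aape(obs, sim):
    """Average absolute percentage error: every month weighs the same.

    ASSUMPTION: months with zero observed flow are skipped (division by zero).
    """
    pairs = [(o, s) for o, s in zip(obs, sim) if o > 0]
    return 100.0 * sum(abs(s - o) / o for o, s in pairs) / len(pairs)

# Hypothetical monthly flows (m3/s): small absolute errors in the two driest months.
obs = [0.1, 0.2, 5.0, 20.0, 60.0, 3.0]
sim = [0.3, 0.5, 5.5, 19.0, 58.0, 2.5]
print(f"NSE={nse(obs, sim):.3f}  NSE logQ={nse_log(obs, sim):.3f}  AAPE={aape(obs, sim):.1f}%")
```

In this toy series NSE stays close to 1 while AAPE exceeds 60%, because errors of a few tenths of m³/s in the driest months are large in relative terms. A simulation that merely reproduces the observed mean gives NSE = 0, so a negative NSE means the model does worse than that constant.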

Piancó station -37340000
The Piancó fluviometric station is located in the Piranhas-Açu River basin. The periods from 1964 to 1981, 1982 to 1999, and 1999 to 2006 were considered. Changing the observation period did not affect the performance indicators, which ranged from satisfactory to very good in all the intervals analyzed. According to Table 4, the fit is good even for low flows.

Mossoró station -37090000
This station is located in the Apodi basin, in the semi-arid region of Rio Grande do Norte. The most sensitive parameter in the Mossoró modeling was RSPX, which represents the maximum capacity of the surface reservoir. As in other cases analyzed, this parameter appears to compensate for the storage effect of upstream reservoirs.
Considering only the Nash-Sutcliffe coefficient, NSE fell from 0.79 in the period 1987 to 1998 to 0.20 in the period 1998 to 2005. This drop in performance is attributed to the reduction in the number of rain gauges with consistent data, from 38 in the first period to only 11 in the subsequent one. Nevertheless, according to Table 4, the indicators suggested by Moriasi et al. (2007) were satisfactory for the entire period, although NSE logQ showed an unsatisfactory fit for low flows.

Poço de Pedras station -38850000
The MODHAC simulations for this station, located on the Taperoá River (a tributary of the Paraíba River) in the state of Paraíba, behaved differently over the study period. From 1986 onward, the quality of the fit worsened, probably because of the Taperoá II reservoir (built in 1983) upstream of the station. This is supported by the drop in mean monthly flow from 8.14 m³/s in the first period to 1.50 m³/s (a reduction of about 82%), even though mean monthly rainfall over the station's drainage area remained at 47 mm in both periods.
Even so, only the PBias and NSE logQ indicators were unsatisfactory.

Toritama station -39130000
The Toritama river gauge station, in the state of Pernambuco, is the closest to the headwaters of the Capibaribe River, located upstream of the Jucazinho Dam and downstream of the Poço Fundo reservoir. Between 1973 and 1986, the simulation results were good (NSE = 0.76), but they worsened in the following years, probably because of the Poço Fundo reservoir, which began operating in 1987.
The indicators proposed by Moriasi et al. (2007), presented in Table 4 for the whole period, fall slightly short of the acceptable thresholds. The negative NSE logQ shows that the fit for low flows was also poor: a negative NSE means that the simulation is a worse predictor than the mean of the observations.

Ilha Grande station -48880000
This fluviometric station is located on Navio Creek, in the Pajeú River basin, downstream of the Barra do Juá reservoir, which began operating in 1982. The values obtained for the NSE logQ, NSE sqrtQ, and PBias indicators were unsatisfactory.

Capivara station -39540000
This fluviometric station is on the upper course of the Una River, in the Central Agreste region of Pernambuco. The periods selected were 1978 to 1993 and 1998 to 2006. The first calibration yielded NSE = 0.82. Between the two periods, the number of available rain gauges fell from 17 to only 3, which may explain the reduced quality of the fit.
Considering the entire period, the adjustment was not satisfactory, as indicated in Table 4.

Evaluation of parameter value transfer
To compare the characteristics of the basins under analysis, Table 1 presents the following indicators: basin area (km²); mean annual flow, Qm (mm/year); mean annual rainfall, PMA (mm); runoff coefficient, Cesc (Qm/PMA); and percentage of area located on fractured aquifer, Cris (%). These were calculated for the area controlled by each of the 45 fluviometric stations. Crystalline soils predominate, which is typical of the semi-arid region of northeastern Brazil.
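The descriptors in Table 1 are related by simple unit conversions. The Python sketch below assumes that Qm is obtained by expressing the mean discharge (m³/s) as a runoff depth over the basin area, using a mean year of 365.25 days; the study does not state how Qm was computed, and the basin values are hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 86400  # ASSUMPTION: mean year length

def mean_annual_flow_mm(q_m3s, area_km2):
    """Mean discharge (m3/s) expressed as runoff depth over the basin, Qm (mm/year)."""
    volume_m3 = q_m3s * SECONDS_PER_YEAR      # annual volume
    depth_m = volume_m3 / (area_km2 * 1e6)    # km2 -> m2
    return depth_m * 1000.0                   # m -> mm

def runoff_coefficient(qm_mm, pma_mm):
    """Cesc = Qm / PMA, both in mm/year (dimensionless)."""
    return qm_mm / pma_mm

def crystalline_pct(crystalline_km2, area_km2):
    """Cris: share of the basin area lying on fractured (crystalline) aquifer, in %."""
    return 100.0 * crystalline_km2 / area_km2

# Hypothetical basin, not one of the 45 stations.
qm = mean_annual_flow_mm(q_m3s=10.0, area_km2=10_000.0)
print(round(qm, 1), round(runoff_coefficient(qm, 800.0), 3), crystalline_pct(8_500.0, 10_000.0))
# -> 31.6 0.039 85.0
```

Runoff coefficients of a few percent, as in this example, are what one would expect for crystalline semi-arid basins, where most rainfall is lost to evapotranspiration.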
Among the problems mentioned, the most prominent are the influence of reservoirs on the observed flows and the deactivation of a considerable portion of the rain gauges in recent years. Given these problems in the flow and precipitation series, the climate similarity indicators are also expected to be affected. It was therefore decided to test the transferability of the calibrated parameters between regions on the basis of spatial proximity. The procedure consists of using the parameter set calibrated at one station to simulate the flow series at neighboring stations and then evaluating the goodness-of-fit indicators.
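The study does not specify how proximity was measured. The Python sketch below assumes great-circle (haversine) distance between gauge coordinates; basin centroids would be an equally plausible choice. Station names and coordinates are hypothetical. The complete parameter set of the nearest donor would then be used to simulate the receiving station.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance (km) between two points given in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def rank_donors_by_proximity(target, donors):
    """Donor names sorted from nearest to farthest.

    target: (lat, lon) of the receiving station; donors: {name: (lat, lon)}.
    """
    return sorted(donors, key=lambda name: haversine_km(*target, *donors[name]))

# Hypothetical gauges in decimal degrees (southern latitudes, western longitudes).
target = (-6.5, -39.0)
donors = {"donor_A": (-6.4, -38.9), "donor_B": (-7.5, -37.5), "donor_C": (-5.0, -40.5)}
print(rank_donors_by_proximity(target, donors))
# -> ['donor_A', 'donor_B', 'donor_C']
```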
A commonly raised issue concerns the block transfer of parameter values. Bárdossy (2007) and Oudin et al. (2008) emphasize that parameter values are interdependent and should therefore be treated as a set and transferred as a whole, so as to preserve the correlations among the calibrated values. The parameters are also interdependent with the evaporation data, especially in monthly simulations such as those in this paper. Patil and Stieglitz (2014), in turn, point out that, because some parameters are more sensitive than others during calibration, whether to transfer parameters as a block should be decided according to the characteristics of each parameter.

Regionalization result
The simulation results of several parameter cross-applications were analyzed. Starting from the flow calibrations at the selected stations listed in Table 2, the parameter sets were transferred to simulate runoff in nearby basins.
Two examples of parameter transfer are presented as flow permanence (duration) curves, for the Iguatu station in Ceará and the Floresta station in Pernambuco. Figures 5a, 5b, 6a, and 6b show the permanence curves of the flows observed at these stations (the "receiving basins") together with the curves generated with the parameters calibrated for the "donor basins". For Iguatu, the parameters come from the calibration at the Icó station; for Floresta, they come from the calibrations at the Poço de Pedras, Ilha Grande, and Piancó stations. In both cases, the curves were split into two parts to highlight very low flows, which the models did not reproduce properly. Costa et al. (2014) identified the same behavior when applying the synthetic permanence curve regionalization method to Ceará basins with daily data. This is probably due to limitations of the general-purpose models used in both studies (MODHAC and Rio Grande).
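A permanence curve ranks the flows and assigns each one the percentage of time it is equalled or exceeded. The Python sketch below assumes the Weibull plotting position, a common choice that the study does not confirm; the 60% split point and the flow values are hypothetical, chosen only to show how the low-flow tail can be separated for inspection.

```python
def permanence_curve(flows):
    """Return (exceedance %, flow) pairs using the Weibull plotting position.

    Flows are sorted in descending order; the i-th largest value (1-based)
    is equalled or exceeded p = 100 * i / (n + 1) percent of the time.
    """
    ranked = sorted(flows, reverse=True)
    n = len(ranked)
    return [(100.0 * i / (n + 1), q) for i, q in enumerate(ranked, start=1)]

def split_curve(curve, threshold_pct):
    """Split the curve into the high/medium-flow part and the low-flow tail."""
    upper = [pt for pt in curve if pt[0] <= threshold_pct]
    tail = [pt for pt in curve if pt[0] > threshold_pct]
    return upper, tail

# Hypothetical monthly flows (m3/s) for an intermittent river, including zero flows.
flows = [5.0, 0.0, 12.0, 0.5, 30.0, 2.0, 0.0, 8.0, 1.0]
curve = permanence_curve(flows)
upper, tail = split_curve(curve, threshold_pct=60.0)
print(curve[0], curve[-1], len(upper), len(tail))
# -> (10.0, 30.0) (90.0, 0.0) 6 3
```

Plotting the tail separately, often on a log axis, makes discrepancies between observed and simulated low flows visible even when the upper part of the two curves nearly coincides.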
Overall, 113 analyses were performed across the 45 stations, applying the parameters of the "donor" stations 13/17 to simulate flows over the periods for which data were available. Figure 7 shows box plots of all analyses for the NSE, NSE sqrtQ, NSE logQ, PBias, and RSR indicators. NSE values above 0.50 were obtained in 40% of all simulations, and 51% had NSE above 0.36. NSE sqrtQ and NSE logQ above 0.50 were obtained in 63% and 26% of the simulations, respectively. RSR values below 0.70 and absolute PBias below 25% were obtained in 40% and 72% of the results, respectively. However, only 39% of the results met the satisfactory criteria of Moriasi et al. (2007) for NSE, RSR, and PBias simultaneously. Table 7 shows the regionalization performance indicators for the Iguatu and Floresta examples. The best results were obtained with the parameters of the "donor stations" Icó, for Iguatu, and Piancó, for Floresta.
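The joint criterion can be sketched in Python using the streamflow thresholds of Moriasi et al. (2007) for a "satisfactory" rating: NSE > 0.50, RSR ≤ 0.70, and |PBIAS| < 25%, with PBIAS positive when the model underestimates. The indicator tuples below are hypothetical and do not reproduce the study's 113 analyses.

```python
import math

def fit_indicators(obs, sim):
    """Return NSE, RSR and PBIAS (%) for paired flows."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    nse = 1.0 - sse / sst
    rsr = math.sqrt(sse) / math.sqrt(sst)  # RMSE / SD of observations (n cancels)
    pbias = 100.0 * sum(o - s for o, s in zip(obs, sim)) / sum(obs)  # > 0: underestimation
    return nse, rsr, pbias

def moriasi_satisfactory(nse, rsr, pbias):
    """True if all three streamflow indicators are at least 'satisfactory'."""
    return nse > 0.50 and rsr <= 0.70 and abs(pbias) < 25.0

def share_satisfactory(results):
    """Percentage of (nse, rsr, pbias) tuples meeting all three criteria at once."""
    ok = sum(1 for r in results if moriasi_satisfactory(*r))
    return 100.0 * ok / len(results)

# Hypothetical indicator sets from four transfer tests.
results = [(0.85, 0.39, -5.0), (0.40, 0.77, 10.0), (0.60, 0.63, 30.0), (0.55, 0.67, 12.0)]
print(share_satisfactory(results))
# -> 50.0
```

Because RSR equals √(1 − NSE) when both are computed on the same series, NSE > 0.50 corresponds to RSR < 0.707. The RSR threshold is therefore only marginally stricter than the NSE one, and the joint share is driven mainly by NSE and PBias.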

Evaluation of selected "donor" fluviometric stations
The reference stations listed in Table 4 were chosen based on their spatial distribution, so that each could represent a nearby group of stations, and on the better quality of their historical series compared with the others. To evaluate the representativeness of this selection, an analysis was performed with all 45 stations listed in Table 2, using the physio-climatic properties of the basins under study (area, mean annual flow, mean annual precipitation, runoff coefficient, and percentage of area on fractured aquifer). The aim was to evaluate regionalization criteria based on these variables. For this purpose, the method for measuring physical similarity between watersheds described by Kay et al. (2007) and adapted by Oudin et al. (2010) was adopted. Similarity is defined by the Euclidean distance in the space of the physical and climatic properties, with each property normalized by its standard deviation over the complete set. This is done using Equation (8):

$$d_{a,b} = \sqrt{\sum_{j=1}^{J} w_j \left( \frac{x_{a,j} - x_{b,j}}{\sigma_{X,j}} \right)^2} \tag{8}$$
In Equation (8), $j = 1, \dots, J$ indexes the basin descriptors, $x_{a,j}$ and $x_{b,j}$ are the values of descriptor $j$ for basins $a$ and $b$, $\sigma_{X,j}$ is the standard deviation of descriptor $j$ over the complete set of basins, and $w_j$ is the weight assigned to descriptor $j$. The weights adopted were 0.15 for area, 0.225 each for mean flow and mean precipitation, 0.25 for the runoff coefficient, and 0.15 for the crystalline basement fraction (summing to 1.0). These values were chosen to give greater importance to the variables derived from hydrological data. The Euclidean distance of Equation (8) was calculated for all 45 stations listed in Table 2.
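The similarity ranking can be sketched in Python with the weights quoted above. This is a minimal sketch of a weighted Euclidean distance, assuming that each weight multiplies the squared standardized difference and that the sample standard deviation is used. The basin names and descriptor values are hypothetical.

```python
import math
import statistics

# Descriptor weights quoted in the text; they sum to 1.0.
WEIGHTS = {"area": 0.15, "qm": 0.225, "pma": 0.225, "cesc": 0.25, "cris": 0.15}

def descriptor_sd(basins):
    """Standard deviation of each descriptor over the complete set of basins.

    ASSUMPTION: sample standard deviation (statistics.stdev); the text does not say.
    """
    return {k: statistics.stdev([b[k] for b in basins.values()]) for k in WEIGHTS}

def distance(a, b, sd):
    """Weighted Euclidean distance between basins a and b on standardized descriptors."""
    return math.sqrt(sum(w * ((a[k] - b[k]) / sd[k]) ** 2 for k, w in WEIGHTS.items()))

def rank_by_similarity(target, basins):
    """Candidate donors sorted from most to least similar to the target basin."""
    sd = descriptor_sd(basins)
    candidates = [name for name in basins if name != target]
    return sorted(candidates, key=lambda name: distance(basins[target], basins[name], sd))

# Hypothetical descriptors: area (km2), Qm (mm/yr), PMA (mm), Cesc (-), Cris (%).
basins = {
    "target": {"area": 1000, "qm": 50, "pma": 800, "cesc": 0.0625, "cris": 80},
    "B1": {"area": 1200, "qm": 55, "pma": 820, "cesc": 0.067, "cris": 75},
    "B2": {"area": 5000, "qm": 20, "pma": 600, "cesc": 0.033, "cris": 95},
    "B3": {"area": 800, "qm": 90, "pma": 1100, "cesc": 0.082, "cris": 40},
}
print(rank_by_similarity("target", basins))
```

Standardizing by the standard deviation keeps area (thousands of km²) from overwhelming the runoff coefficient (hundredths). The weights then express the relative importance given to each descriptor.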
To compare the results of the selected donor stations, the Iguatu station, in Ceará, and the Floresta station, in Pernambuco, were chosen as examples; their results are presented in Table 7.
For the Floresta station, the shortest Euclidean distances were 0.435 for Piancó, 0.598 for Capivara, 0.701 for Poço de Pedras, and 0.856 for Ilha Grande. Except for Capivara, these stations coincide with the donors chosen for Floresta by proximity, with Piancó ranked first and the ranking of Ilha Grande and Poço de Pedras inverted.
For the Iguatu station, the metric selected the Oiticica station as the "donor basin". The shortest-Euclidean-distance criterion ranked the Icó station only sixth among the 9 stations chosen as "donors", yet Icó ultimately gave the best fit. For the stations for which regionalization could not be established with any of the reference stations (Croatá, Saudoso, Fazenda Paraná, Caicó, Bodocongó, Caraúbas, and Inajá), the Euclidean distance criterion pointed to potential "donors" that did not meet the minimum flow data requirements; therefore, those stations were not selected.

CONCLUSIONS
This study involved compiling an extensive hydrological and spatial database for the semi-arid region of northeastern Brazil, analyzing precipitation, flow, and evaporation series in four states.
The fragility of the observed data was evident, requiring a detailed analysis of the flow and precipitation series and of the external effects that degrade the quality of the flow data. It is also necessary to check for dams and for the reduction in the number of rain gauges that remained in operation. These changes are reflected in the climate indicators, making it more difficult to establish similarity relationships. This compilation of data and information sought to explain the reduced quality of the simulations for the most recent periods. The regionalization of model parameters gave better fits for watersheds in states with older and more consistent hydrological series, such as Ceará. In Pernambuco, the results were impaired by the absence of fluviometric stations with long, consistent, and available series.
Part of the non-regionalized area results from reservoir interference. This does not mean that the parameters used cannot produce good results, but the study would need to be extended to consider only the incremental areas downstream of the reservoirs and to complement their inflows.
No scale effect was observed in the regionalization analyses: where the parameter values of the "donor basins" were rejected, basin area had no noticeable influence. The deficiencies in the rainfall data and the influence of reservoirs were judged responsible for the reduced quality of the simulations for the most recent periods. Therefore, it cannot safely be concluded that the calibrated parameter values are inappropriate for the data series from 2001 onward. The short records or long gaps of other stations prevented their use as "donors", which would have broadened the scope of the regionalization.
Another essential issue to be analyzed is the hydrological model used. Petheram et al. (2012), for example, consider that modeling results depend more on the quality of the input data than on the model; however, certain behaviors of intermittent river basins are often not reproduced by the equations that represent the simulated processes.
The regionalization results are moderate, as the performance indicators show. In 63% of the cases, NSE sqrtQ values were at least satisfactory, indicating an adequate fit for mean flows. On the other hand, only 26% of the NSE logQ values (the low-flow indicator) were satisfactory, and the three indicators NSE, RSR, and PBias were simultaneously satisfactory in only 39% of the cases, according to the criteria of Moriasi et al. (2007). The challenge of improving regionalization for data-scarce semi-arid regions lies in a more physically based parameterization, in which a significant portion of the parameters can be calculated directly from basin characteristics, making parameter values easier to transfer.