Viability of CLIGEN in the climatic conditions of Paraná state, Brazil

Studies on hydrology, agrometeorology, soil loss and climate change scenarios depend on weather information, which may not be available. Weather generators, such as the CLIGEN, can synthesize daily climate series that are statistically similar to the observed data. The objective of this study was to evaluate the CLIGEN in generating series under the climatic conditions of Paraná, Brazil, which show a transition between Cfa and Cfb climates. Observed data from 20 weather stations from 1975 to 2009 were compared with synthetic series generated with the same number of years. The mean and standard deviation of the number of wet days, daily precipitation, normalized storm peak intensity, solar radiation, maximum and minimum temperatures and dew point were analysed. The coefficient of determination was less than 0.91 in two stations. Under the evaluated conditions, the CLIGEN showed restrictions in simulating the normalized storm peak intensity; for the remaining variables, it proved viable for synthesizing daily climate series statistically similar to the observed data.


Introduction
The generation of weather series continues to be an object of research in climatology, hydrology and agrometeorology. Its importance can be seen in fields such as the sensitivity analysis of weather-dependent models and the evaluation of climate change impact scenarios (Ng & Panu, 2010). The generation of synthetic meteorological data is relevant because it allows the study of future scenarios of soil loss (Nearing et al., 2004) or of agricultural and hydrological systems (Evangelista et al., 2006) under climate change.
The CLIGEN is a stochastic generator of climate data and a component of the Water Erosion Prediction Project (WEPP) model (Nicks et al., 1995). It was adopted by Yu (2005) in the evaluation of soil loss in Sydney, Australia, where it was calibrated for periods with significantly increased rainfall. The author observed that alterations in the daily amount may overestimate the impact on the estimates of runoff and soil loss, whereas alterations in the frequency of wet days may underestimate such an impact. Amorim et al. (2010) compared the performance of the USLE, RUSLE and WEPP models, the latter with CLIGEN 4.3, under Brazilian edaphoclimatic conditions with natural rainfall. The best estimates were obtained with WEPP, which also presented the best general performance. In his thesis, Amorim (2004) recommends that, under Brazilian conditions, a reliable estimate requires precise calibration of both WEPP and CLIGEN, mainly of the parameters obtained by indirect estimation. Yu (2003) highlights that, unlike other weather generators such as WGEN, USCLIMATE, GEM and WM2, the CLIGEN generates storm duration, peak storm intensity and time to peak. The author also considered the CLIGEN to be unbeatable among the stochastic generators because of the number of variables generated and the size of the required database. He also points out that the input parameters are low-order moment statistics, which can be routinely calculated for a large number of locations. Evangelista et al. (2006) compared a 50-year synthetic series with data collected between 1972 and 2001 in the region of Viçosa, MG, and concluded that the CLIGEN was efficient in estimating the main climatic elements.
The State of Paraná presents a climatic transition that enables the occurrence of tropical to temperate climates (Caramori et al., 2008), which is attributed to a peculiar combination of altitude variation, latitude and the presence of the Tropic of Capricorn (23º 27' S). Therefore, because of this climatic diversity, the data collected in the State offer suitable conditions for evaluating the viability of climatic models.
The objective of this work was to evaluate the viability of the CLIGEN as a climatic model to generate synthetic series of meteorological data in the climatic conditions of Paraná State, by taking advantage of its climatic transition, for application in simulation studies.

Material and Methods
Paraná State is located in the south of Brazil, between parallels 22º 30' S and 26º 43' S and meridians 48º 00' W and 54º 38' W, under Cfa subtropical climate, with hot summers and summer rainfall concentration, without defined dry season, and Cfb typical temperate climate, with mild summers and no defined dry season.
Twenty stations (Figure 1) with records of more than 30 years of data were selected. Out of these stations, only 9 had solar radiation data. Table 1 shows the basic data of the stations, such as the presence of an actinograph, latitude, longitude, altitude, climatic classification and the morpho-physiographic region of the State.

Figure 1. Location of the selected agrometeorological stations in Paraná (▲)
The data collected from 1975 to 2009 were used to constitute the database, which totaled 264,504 records with the following fields: code, year, month, day, precipitation, rainfall duration, storm peak within 60, 30, 15 and 10 min, average relative humidity, daily radiation, mean temperature, minimum, maximum and average day temperature. Relative humidity and average day temperature were necessary to estimate the dew point by means of the Magnus-Tetens formula (Murray, 1967).
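As an illustration of the dew point estimate mentioned above, the following is a minimal sketch of a Magnus-type approximation. The coefficient values (17.27 and 237.7 ºC) and the function name are assumptions chosen for the example; the text does not state which constants were used in the study.

```python
import math

# Magnus-type coefficients. These particular values are an assumption
# made for this sketch; they are not taken from the study.
A = 17.27
B = 237.7  # degrees Celsius


def dew_point_c(temp_c: float, rel_humidity_pct: float) -> float:
    """Approximate dew point (deg C) from air temperature and relative humidity."""
    if not 0.0 < rel_humidity_pct <= 100.0:
        raise ValueError("relative humidity must be in (0, 100]")
    # gamma combines the humidity term with the saturation term at temp_c
    gamma = math.log(rel_humidity_pct / 100.0) + A * temp_c / (B + temp_c)
    return B * gamma / (A - gamma)


if __name__ == "__main__":
    # At 100% relative humidity the dew point equals the air temperature.
    print(round(dew_point_c(20.0, 100.0), 2))
    print(round(dew_point_c(25.0, 50.0), 2))
```

With this form, saturated air returns the air temperature itself, which is a quick sanity check on any choice of coefficients.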
CLIGEN is a stochastic weather generator that produces daily series based on monthly historical averages and absolute parameters from a single geographical point. The daily simulated data are: accumulated precipitation and duration; normalized storm peak intensity; time between start and peak intensity; maximum and minimum temperature; dew point; solar radiation and wind direction.
The normalized storm peak intensity is the ratio between the maximum precipitation intensity and the average precipitation intensity of the rainfall. It is dimensionless and always higher than 1. The time between the start of the rainfall and the peak is a dimensionless quantity, expressed as a proportion of the total duration of the rainfall.
The absolute input parameters are: identification (name); latitude; longitude; number of years recorded; type of single-storm rainfall distribution; altitude; maximum precipitation in 30 min and in 6 h, TP5 and TP6 respectively; and the time between the start and the peak of the rainfall.
Rainfall distribution is classified into 4 types, defined by the Soil Conservation Service (SCS) of the United States Department of Agriculture (USDA) and detailed in Figure 2 (Soil Conservation Service, 1986). Types 1 and 2 (SCS 1 and 1A) occur on the Pacific Ocean shore. Type 4 (SCS 3) occurs in part of the Gulf of Mexico shore and the Atlantic shore, whereas type 3 (SCS 2) occurs in the remaining parts of the territory.
The CLIGEN does not simulate storm peak intensity and time to peak when the rainfall distribution type is not defined. Considering the importance of these variables in the hydrological component of the models, and since no such definition exists for the region, rainfall distribution type 4 was attributed to all the stations so that the variables could be generated.
The maximum accumulated precipitation (in inches) with 100-year recurrence for durations of 0.5 h (30 min) and 6 h are, respectively, the values of the parameters TP5 and TP6. For Paraná, these parameters were obtained from Fendrich (2003).
The time between the start and the peak is parameterized by the cumulative distribution of the normalized time to peak in 12 classes, with increments of 0.0833. In other words, the first class holds the proportion of storms whose peak occurred before 8.33% of the rainfall duration had elapsed. The limits of the classes are: 0, 0.0833, 0.1667, 0.25, 0.3333, 0.4167, 0.5, 0.5833, 0.6667, 0.75, 0.8333, 0.9167 and 1 (Nicks et al., 1995).
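The 12-class cumulative distribution described above could be computed from observed storms roughly as follows. This is only a sketch: the function name, the input layout (time to peak and duration in the same unit) and the handling of values that fall exactly on a class limit are assumptions, not details given in the text.

```python
from bisect import bisect_left

N_CLASSES = 12
# Upper limits of the 12 classes: 1/12, 2/12, ..., 12/12
UPPER_LIMITS = [(k + 1) / N_CLASSES for k in range(N_CLASSES)]


def time_to_peak_cdf(time_to_peak, duration):
    """Cumulative proportion of storms whose peak falls within each class.

    Both inputs are sequences in the same time unit (e.g. minutes).
    Assumption: a ratio exactly on a class limit is counted in the lower class.
    """
    counts = [0] * N_CLASSES
    n_storms = 0
    for tp, d in zip(time_to_peak, duration):
        if d <= 0:
            continue  # skip records without a valid duration
        ratio = min(max(tp / d, 0.0), 1.0)  # normalized time to peak
        idx = min(bisect_left(UPPER_LIMITS, ratio), N_CLASSES - 1)
        counts[idx] += 1
        n_storms += 1
    if n_storms == 0:
        raise ValueError("no storms with positive duration")
    cdf, running = [], 0
    for c in counts:
        running += c
        cdf.append(running / n_storms)
    return cdf


if __name__ == "__main__":
    # Three hypothetical storms with peaks at 5%, 50% and 100% of duration
    print(time_to_peak_cdf([5, 30, 60], [100, 60, 60]))
```

The last element of the returned list is always 1, which matches the upper class limit of the parameterization.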
The monthly historical parameters are:
1. Mean daily precipitation of the wet days and standard deviation; coefficient of asymmetry of the distribution; probability of occurrence of a wet day after a wet day; probability of occurrence of a wet day after a dry day; and average 30-min peak precipitation intensity.
2. Mean maximum daily air temperature and standard deviation; mean minimum daily air temperature and standard deviation.
3. Mean daily solar radiation and standard deviation.
4. Mean dew point.
5. Wind: speed, percentage of time in each direction, standard deviation and coefficient of asymmetry for the 16 directions, and calm periods.
A wet day was defined as one in which daily precipitation was above 0 mm.
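The two wet-day transition probabilities listed in item 1 can be estimated from a daily precipitation series as sketched below. The function name and the input layout are hypothetical, and the sketch assumes a gap-free chronological series; the wet-day threshold (precipitation above 0 mm) follows the definition in the text.

```python
from collections import defaultdict


def wet_day_transition_probs(months, precip_mm):
    """Return {month: (P(wet | previous wet), P(wet | previous dry))}.

    months and precip_mm are parallel sequences for consecutive days.
    Each transition is assigned to the month of the current day.
    """
    # per month: [wet after wet, days after wet, wet after dry, days after dry]
    counts = defaultdict(lambda: [0, 0, 0, 0])
    for i in range(1, len(precip_mm)):
        prev_wet = precip_mm[i - 1] > 0.0
        cur_wet = precip_mm[i] > 0.0
        c = counts[months[i]]
        if prev_wet:
            c[1] += 1
            c[0] += int(cur_wet)
        else:
            c[3] += 1
            c[2] += int(cur_wet)
    result = {}
    for month, (ww, n_w, wd, n_d) in counts.items():
        p_ww = ww / n_w if n_w else float("nan")
        p_wd = wd / n_d if n_d else float("nan")
        result[month] = (p_ww, p_wd)
    return result


if __name__ == "__main__":
    # Six hypothetical January days
    print(wet_day_transition_probs([1] * 6, [0.0, 2.0, 3.0, 0.0, 0.0, 1.0]))
```

A real series with missing days would need those gaps excluded before counting transitions.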
The parameters related to wind were not considered, as they are used to estimate snow accumulation and melting, phenomena regarded as non-existent in Paraná. The input parameters of the CLIGEN are expressed in imperial units, and the results in the International System of Units. The methodology used to obtain the parameters is explained in detail in Nicks et al. (1995).
A file with the parameters specified above was generated. They were calculated by using the historical series for each agrometeorological station selected. Version 5.3 of the CLIGEN was used, which was available for download at the WEPP page of the United States Department of Agriculture (USDA, 2012). The daily synthetic climate data were generated with the same number of years as the historical series of the station.
To evaluate the performance of the model, the study considered that the generation of synthetic climate series has the objective of obtaining (daily, monthly and annual) meteorological variables that are statistically similar to the historical records (Dubrovsky, 1997).
Table 1. List of agrometeorological stations selected for the study and characteristics of location and presence (*) of actinograph (Act.)
The monthly data of the synthetic series were evaluated in relation to the historical data for the following climate variables: number of wet days; daily precipitation; normalized storm peak intensity; maximum temperature; minimum temperature; solar radiation and dew point. In all cases, the monthly average and standard deviation were evaluated as measures of position and dispersion, respectively.
The statistical characteristics of the synthetic series generated by the CLIGEN, with 264,446 records, were compared to the historical series collected from 1975 to 2009, in accordance with Dubrovsky's (1997) considerations. Thus, the monthly averages and the respective standard deviations of each climate variable were estimated for the historical and synthetic series of each station. Before that, however, the annual precipitation average and its standard deviation at each station were evaluated.
For the statistical evaluation, the estimated (synthetic) and the observed (historical) data were fitted to a linear equation passing through the origin, that is, with the linear coefficient (α) equal to zero. The perfect adjustment occurs when the angular or regression coefficient (β) equals 1. Following Bussab's (1988) guidelines, the observed data were attributed to the dependent variable (Y), whereas the estimated data were attributed to the independent variable (X).
A regression analysis was conducted and the hypothesis H0: β = 1 was tested by means of Student's t test, verifying the tendency of the model to underestimate (β > 1) or overestimate (β < 1) the climate variable. The notation used for the test result was: ns (non-significant), it is not possible to affirm that β ≠ 1, and it was considered that β = 1; *, β ≠ 1 (5%); and **, β ≠ 1 (1%). The coefficient of determination (R²) was also estimated, which, for better adjustment, should remain close to 1.
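The regression through the origin and the test of β = 1 can be sketched as follows. This is an illustrative implementation under the usual least-squares assumptions, with n − 1 degrees of freedom because only one coefficient is estimated; the function name is hypothetical, and the resulting t statistic would still have to be compared with a tabulated critical value at the chosen significance level.

```python
import math


def regression_through_origin(simulated, observed):
    """Fit observed = beta * simulated and test H0: beta = 1.

    Returns (beta, se_beta, t_stat, dof). X is the simulated series and
    Y the observed series, as in the text.
    """
    n = len(simulated)
    if n != len(observed) or n < 2:
        raise ValueError("need two series of equal length with n >= 2")
    sxx = sum(x * x for x in simulated)
    if sxx == 0:
        raise ValueError("simulated series is all zeros")
    sxy = sum(x * y for x, y in zip(simulated, observed))
    beta = sxy / sxx
    sse = sum((y - beta * x) ** 2 for x, y in zip(simulated, observed))
    dof = n - 1  # one estimated parameter (no intercept)
    se_beta = math.sqrt(sse / dof / sxx)
    if se_beta == 0.0:
        # perfect fit: the statistic is 0 if beta == 1, otherwise unbounded
        t_stat = 0.0 if beta == 1.0 else math.copysign(math.inf, beta - 1.0)
    else:
        t_stat = (beta - 1.0) / se_beta
    return beta, se_beta, t_stat, dof


if __name__ == "__main__":
    sim = [1.0, 2.0, 3.0, 4.0]   # hypothetical simulated monthly means
    obs = [1.1, 1.9, 3.2, 3.8]   # hypothetical observed monthly means
    print(regression_through_origin(sim, obs))
```

A value of β above 1 means the observed values exceed the simulated ones on average, which corresponds to underestimation by the model.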
The observed and synthetic results of each station were evaluated with the model efficiency coefficient proposed by Nash & Sutcliffe (1970) (NS) as a measure of adjustment to the curve (Eq. 1), in accordance with the criteria established by the American Society of Civil Engineers (ASCE) for the evaluation of models in watersheds (ASCE, 1993).
When the observed values are close to the average, the denominator of Eq. 1 tends to zero and the NS coefficient takes negative values, even with relatively small deviations. Therefore, the NS efficiency coefficient provides better results when the coefficient of variation of all the observed data is high (ASCE, 1993).
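The behaviour described above can be reproduced with a short sketch of the Nash-Sutcliffe coefficient in its standard form, NS = 1 − Σ(O − S)² / Σ(O − Ō)², where O are observed values, S simulated values and Ō the observed mean. The function name and the sample values are hypothetical.

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - SSE / (sum of squares about the observed mean)."""
    n = len(observed)
    if n == 0 or n != len(simulated):
        raise ValueError("need two non-empty series of equal length")
    mean_obs = sum(observed) / n
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    ss_mean = sum((o - mean_obs) ** 2 for o in observed)
    if ss_mean == 0:
        raise ValueError("observed series has no variance; NS is undefined")
    return 1.0 - sse / ss_mean


if __name__ == "__main__":
    # Simulating the observed mean everywhere gives NS = 0
    print(nash_sutcliffe([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))
    # Observations with little variation: a small bias already gives NS < 0
    print(nash_sutcliffe([10.0, 10.1, 9.9], [10.3, 10.3, 10.3]))
```

In the second call the simulated values differ from the observations by about 3%, yet the coefficient is strongly negative because the observed values barely vary around their mean.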
The geographic distribution was mapped by interpolating the NS efficiency coefficient of the average and standard deviation for the climate variables that allowed it, that is, those with values between 0 and 1 and with enough variability to justify the spatial exploration of the index. The Inverse Distance Weighted (IDW) geoprocessing technique was used to verify the influence of geographic location on the NS efficiency coefficient.
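A minimal sketch of inverse distance weighting is given below for illustration. The power of 2, the use of planar coordinates and the function name are assumptions; the text does not report the IDW settings or the software used.

```python
import math


def idw(stations, x, y, power=2.0):
    """Inverse distance weighted estimate at point (x, y).

    stations: iterable of (x, y, value) tuples in planar coordinates.
    power=2 is an assumed default, not a setting reported in the study.
    """
    num = 0.0
    den = 0.0
    for sx, sy, value in stations:
        dist = math.hypot(x - sx, y - sy)
        if dist == 0.0:
            return value  # the point coincides with a station
        weight = 1.0 / dist ** power
        num += weight * value
        den += weight
    if den == 0.0:
        raise ValueError("no stations supplied")
    return num / den


if __name__ == "__main__":
    # Two hypothetical stations with NS values 0.0 and 1.0
    pts = [(0.0, 0.0, 0.0), (2.0, 0.0, 1.0)]
    print(idw(pts, 1.0, 0.0))  # midway between both stations
    print(idw(pts, 0.5, 0.0))  # closer to the first station
```

Because the weights decay with distance, the estimate near a station is dominated by that station's value, which is why isolated extreme stations shape the interpolated map around them.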

Results and Discussion
The average annual precipitation and the standard deviation of the historical (observed) and synthetic (simulated) data for the evaluated stations are presented in Figure 3. The simulated annual average precipitation varied from 1,437.1 to 2,410.1 mm, with standard deviation ranging from 173.2 to 303.3 mm.
The statistical coefficients obtained, all close to 1, showed that the estimate of the annual average was coherent; however, the coefficient of regression was statistically different from 1, with 1% significance, highlighting a tendency of the CLIGEN to overestimate the annual average of precipitation, though by only 1.8% (β = 0.9818). The standard deviation (dispersion) of the simulated data was lower than the observed variation, presenting a coefficient of regression (β) of 1.47, significantly different from 1.
The statistical coefficients of the number of wet days per month are presented in Table 2, where it is observed that the simulated estimates of the monthly averages were close to the observations in all the stations and in all the coefficients.The coefficients of regression were not statistically different from 1, both for the average and the standard deviation, indicating that there is no tendency to overestimate or underestimate, even at Est_09 with coefficient of regression 1.4519 for the standard deviation.
The lowest values of the coefficients of determination and NS efficiency of the average were 0.9961 and 0.81272, both at Est_20. The NS coefficient for the standard deviation presented negative values in all stations, which indicated a deficiency of the model in generating data with dispersion equivalent to the observed one, though the coefficient of regression did not differ statistically from 1 (β = 1).
The average daily precipitation of the wet days presented statistical coefficients close to 1, signaling good adjustment of the simulation with the historical values (Table 3), not only in the average but also in the standard deviation.The coefficient of regression for the average was not different from 1 in 75% of the stations with 5% significance.In the remaining stations, the lowest (overestimation) value was 0.9668, at Est_14 (Guarapuava).The lowest efficiency coefficient was 0.7599 at Est_07 (Paranavaí).

$$NS = 1 - \frac{\sum_{i=1}^{n} (O_i - S_i)^2}{\sum_{i=1}^{n} (O_i - \bar{O})^2} \qquad (1)$$
where $O_i$ is the i-th observed value, $S_i$ is the corresponding simulated value and $\bar{O}$ is the average of the n observations.
The interpretation of the NS efficiency coefficient is complex, as it may take values that tend to -∞, which makes the interpretation meaningless. When the value of NS is close to 1, it is possible to affirm that the model approaches perfect adjustment. However, when it is close to 0, it means that the model cannot predict values better than the average of the observations.
The variation of the NS efficiency coefficient of the average enabled the interpolation and design of a distribution map, presented in Figure 4. The interpolation of the NS efficiency coefficient shows a tendency of decreasing efficiency towards the north of the State, whereas the regions with the highest values are in the south and on the coast of the State.
For the standard deviation, 66% of the stations presented coefficients of regression different from 1, with at least 0.05 significance.The highest coefficient of regression was 1.0828, observed at Est_14 (Guarapuava).
The variation of the NS efficiency coefficient of the standard deviation also enabled the interpolation and design of a distribution map, presented in Figure 5. The interpolation of the NS efficiency coefficient shows the same tendency as the one presented by the average of the average daily precipitation, although with higher variation of values.
Probably, the observed tendency, both for the average and the standard deviation, is due to the alterations in the formation of rainfall that occur in Paraná between the summer and the winter. In the summer there is intense formation of convective rainfalls which, because of the high temperatures, are more frequent in the north of the State. In the winter, frontal rainfalls (cold fronts) predominate and generally spread all over the State. The CLIGEN would find
Vaghefi & Yu (2011) stated that, unlike other models, the CLIGEN can also generate the parameters that describe the rainfall pattern, such as the duration of the precipitation, the normalized storm peak intensity, and the time between start and peak. Nevertheless, the NS efficiency coefficients of the average and standard deviation of the normalized storm peak intensity presented negative values.
The performance of the CLIGEN for this variable may have several causes. Firstly, the low coefficient of variation of the average (4.7%) may have affected the NS efficiency coefficient, as described by ASCE (1993). Another reason is the rainfall distribution in Paraná, which may not fit type 4 or any of the other types proposed by the SCS-USDA.
The difficulty in obtaining a maximum intensity of synthetic rainfall statistically similar to the observed one is also reported by other authors. Evangelista et al. (2006) reported considerable percentage variations in storm peak intensity using 30 years of data in Viçosa, MG; Oliveira et al. (2005) found high percentage variations in the instantaneous maximum precipitation intensities, working with 29 years of data from 11 stations in Rio de Janeiro-RJ. Both used the CLIGEN. Yu (2003) found systematic overestimation of rainfall intensity and erosivity. The author used data from 43 Australian stations with 24 to 62 years of records and attributes this effect to the particular type of rainfall assumed by the CLIGEN. The complexity of collecting data on the temporal distribution of rainfall is also observed in other methods, like the Chicago or "Bureau of Reclamation" methods mentioned by Bertoni & Tucci (2007).
Table 4 presents the statistical coefficients of the monthly averages of the normalized storm peak intensity and their standard deviations between the observed and simulated data. It is possible to observe that the coefficients of determination (R²) of the monthly averages and standard deviations are close to 1, which indicates very good adjustment of the model.
However, in 45% of the stations, the coefficient of regression for the average presented significant difference from 1, that is, the predicted values could not be considered equal to the observed ones. Among those stations, only Est_10 (Nova Cantu) showed a tendency of underestimation, with 1% significance. The coefficient of regression was different from 1 for the standard deviation in 75% of the stations.
Table 4. Angular coefficient (b) of the linear regression of the monthly average of the normalized storm peak intensity between the historical (observed) and synthetic (simulated) data, coefficient of determination (R²) and Nash-Sutcliffe (NS) efficiency coefficient, per station
The averages and standard deviations of the maximum (Table 5) and minimum (Table 6) temperatures present values close to 1 for the three statistical coefficients in all the stations, showing very high proximity between the observed and the simulated data. A coefficient of regression different from 1 was observed at station Est_02 for the average of the maximum and minimum temperatures, and at stations Est_11, Est_15 and Est_19 for the standard deviation of the minimum temperature.
However, even in such stations the NS efficiency coefficient indicated good adjustment of the simulated data.
Nevertheless, one should take into account the considerations of Harmel et al. (2002) regarding the occurrence of months in which the distribution of temperature is not normal but slightly skewed, unlike the normal distribution assumed by the model. Lopes (2005) claims that this would not affect the averages, but it could cause the generation of extreme temperatures higher than the observed data. The data confirm this possibility, as the absolute maximum daily temperature found in the synthetic data was 45.8 ºC at Est_07, against an observed value of 41.5 ºC, a difference of 4.3 ºC. However, at Est_15 the observed absolute maximum was 37.5 ºC, while the synthetic maximum was 38.2 ºC, a difference of only 0.7 ºC, which supports Lopes's (2005) statement as a possibility rather than a rule.
The results of the coefficients related to the monthly average of the solar radiation are presented in Table 7, where it is possible to observe that, for the averages, all the coefficients are close to 1, showing very high proximity between the observed and the simulated data.Only station Est_04 (Bandeirantes) presents coefficient of regression that differs statistically from 1, underestimating the generated values.
Table 5. Angular coefficient (b) of the linear regression of the monthly average of the maximum temperatures between the historical (observed) and the synthetic (simulated) data, coefficient of determination (R²) and Nash-Sutcliffe (NS) efficiency coefficient, per station
Table 6. Angular coefficient (b) of the linear regression of the monthly average of the minimum temperatures between the historical (observed) and the synthetic (simulated) data, coefficient of determination (R²) and Nash-Sutcliffe (NS) efficiency coefficient, per station
Table 7. Angular coefficient (b) of the linear regression of the monthly average of the daily global solar radiation between the historical (observed) and the synthetic (simulated) data, coefficient of determination (R²) and Nash-Sutcliffe (NS) efficiency coefficient, per station
For the standard deviations of the solar radiation (Table 7), the coefficient of regression of all the stations differed from 1, with 5% significance, revealing a tendency to underestimate the dispersion of the radiation data, except at Est_04. The coefficient of determination ranged from 0.7984 at Est_04 to 0.9943 at Est_11; however, the NS efficiency coefficient remained negative for the majority of the stations. Only in two stations, Est_03 and Est_11, did this coefficient present values above zero. The results indicate a deficiency in generating synthetic data with variations similar to the observed ones.
The assumptions used to synthesize the daily series of solar radiation are the same as for the maximum, minimum and dew point temperatures, that is, the data should present normal distribution. Nevertheless, the solar radiation presented underestimation of the standard deviation that was not observed in the maximum and minimum temperatures. Probably, this is because the CLIGEN estimates the parameters independently, without considering that, in practice, there is dependence between the climate variables. Wet days interfere with solar radiation; therefore, the solar radiation distribution would be partially linked to the distribution of dry days.
The statistical coefficients of the average and standard deviation of the dew point are presented in Table 8. Only stations Est_11 (1.0045) and Est_12 (1.0043) had a coefficient of regression of the average statistically different from 1. The coefficients of determination and NS efficiency followed the same tendency, and the lowest value was obtained at station Est_13, with an NS efficiency coefficient of 0.9893. For the other stations, the coefficients were either very close to 1 or equal to 1 to four decimal places.
Table 8. Angular coefficient (b) of the linear regression of the monthly average of the dew point between the historical (observed) and synthetic (simulated) data, coefficient of determination (R²) and Nash-Sutcliffe (NS) efficiency coefficient, per station
The coefficient of regression for the standard deviation of the dew point oscillated between values higher and lower than 1, showing both overestimation and underestimation. However, 55% of the stations showed good estimates, presenting no statistical difference from 1. Station Est_12 presented the lowest coefficient of regression (0.7369), whereas station Est_19 presented the highest (1.0817).
The coefficients of determination were close to 1, with the lowest value at Est_08 (0.9870), whereas the NS efficiency coefficient varied between -2.1479 (Est_12) and 0.9331 (Est_17). Considering the variation found and the existence of only one negative value, it was possible to interpolate this coefficient, as presented on the map in Figure 6.
The interpolation of the NS efficiency coefficient for the standard deviation of the average dew point shows that in the center-south and southwest regions of the State the model is more efficient in estimating variability. The region where efficiency is the lowest is the East, near the Atlantic shore, clearly influenced by the coast stations (Est_11 and Est_12), which present close to zero and negative values.
Some stations have uncommon local characteristics that can influence the performance of the model in the generation of synthetic data when compared with the observed data. This is the case of stations Est_12 and Est_11, which are on the coast, almost at sea level (40 and 59 m), and were probably affected by the proximity of the ocean in the distribution of dew point, estimated from relative humidity and average temperature.
The performance of the model was similar across the evaluated variables. Overall, the estimated average was more accurate than the dispersion (standard deviation), which was generally underestimated. In some variables, such as the maximum temperature, there was no underestimation of dispersion. However, the normalized storm peak intensity showed variations that indicate a difficulty of the CLIGEN in estimating this climate variable.
As described by Srikanthan & McMahon (2001) and Lopes (2005), stochastic models usually do not preserve the variance of the precipitation data. This probably occurs because the models do not account for the factors that drive real long-term (low-frequency) variations, for instance large-scale air circulation patterns with periods lasting several years, or even weather changes and the El Niño (or La Niña) effects of the Southern Oscillation (ENSO).
By studying weather changes and the variances between grouped years in the United States, Zhang & Garbrecht (2003) and Zhang (2003) identified underestimation of the standard deviations of the monthly averages in 4 and 5 stations respectively, with more than 50 years of data. However, when they stratified the historical series into dry and wet years, generating two sets of input parameters, they obtained frequencies that were relatively well represented by the model. Lopes (2005), studying the parameterization of the CLIGEN with data collected between 1985 and 2003 at Castelo Branco

Figure 2. Rainfall distribution types defined by the Soil Conservation Service (Soil Conservation Service, 1986) and used by CLIGEN

Figure 3. Annual average precipitation (a) and standard deviation (b) of the historical (observed) and synthetic (simulated) data for the agrometeorological stations, with the coefficients of regression of the line adjustment (b), of determination of the adjustment (R²) and of NS efficiency (Nash & Sutcliffe)
Table 2. Coefficient of regression (b) of the linear regression of the monthly average of the number of wet days between the historical (observed) and synthetic (simulated) data, coefficient of determination (R²) and Nash-Sutcliffe (NS) efficiency coefficient, per station

Figure 6. Interpolation of the NS (Nash & Sutcliffe) efficiency coefficient of the standard deviation (S.Desvi.) of the monthly average of dew point (Dew P.) and interpolated stations (▲)