Development and evaluation of neural network models to estimate daily solar radiation at Córdoba, Argentina

The objective of this work was to develop neural network models of backpropagation type to estimate solar radiation based on extraterrestrial radiation data, daily temperature range, precipitation, cloudiness and relative sunshine duration. Data from Córdoba, Argentina, were used for development and validation. The agreement between observed values and the estimates obtained by the neural networks was assessed for different combinations of inputs. The estimates showed root mean square errors between 3.15 and 3.88 MJ m-2 d-1. The latter corresponds to the model that calculates radiation using only precipitation and daily temperature range. In all models, the results follow the seasonal pattern of solar radiation well. These results indicate that this methodology performs adequately and is suitable for estimating complex phenomena such as solar radiation.


Introduction
The effects of climate variability and change on agricultural systems have led to increased interest in the study of the interactions between crops and weather. Crop simulation models are useful tools to assess climate impact on agriculture, and contribute to the evaluation of management alternatives (Boote et al., 1996; Podestá et al., 2004).
Model design makes it possible to study analogous situations, to define a production process in detail and to show its extrapolation in time (Bocco et al., 2000). Models of crop growth, development and yield in a region, and the implementation of different hydrological or biophysical models, need weather data. Where meteorological data are lacking, these analysis procedures cannot be used, so different methods of assessment have to be implemented (De Jong & Stewart, 1993).
Solar radiation affects crop growth, and it is used in numerical models to estimate soil humidity, photosynthesis, and potential evapotranspiration (Ball et al., 2004). Solar radiation is an infrequently measured meteorological variable, compared to temperature and rainfall (Liu & Scott, 2001; Weiss & Hays, 2004); many agricultural regions of Argentina lack radiation data, which therefore have to be estimated (Grossi Gallegos, 1998). To calculate the solar radiation that reaches the earth's surface, empirical and physical models can be used (Noia et al., 1993a, 1993b; Flores Tovar & Baldasano, 2001).
Empirical models to assess solar radiation were applied in Argentina by different authors: Alonso et al. (2002) used them in fifteen locations distributed all over the country; Podestá et al. (2004) applied them in Buenos Aires and Pergamino; and De la Casa et al. (2003) used them at different places in the province of Córdoba. Elizondo et al. (1994), Mohandes et al. (1998a) and Reddy & Ranjan (2003) estimated global solar radiation using neural networks (NN), a mathematical modelling technique. NN are a structure of neurons joined by nodes that transmit information to other neurons, which give a result by means of mathematical functions (Hilera & Martínez, 2000). The NN learn from the existing information through training, a process by which their parameters (weights) are adjusted so as to provide an output close to the desired one. In this way, they acquire the capacity to estimate responses of the same phenomenon.
The use of NN to predict weather phenomena can be found in the work of Zurada (1992) and Clair & Ehrman (1998), among others. NN were used in hydrology to predict rainfall and runoff in basins with a range of characteristics (Shamseldin, 1997; Brahm & Varas, 2003). Njau (1997) and Benvenuto & Marani (2000) developed models to predict temperature; Alexiadis et al. (1998) and Mohandes et al. (1998b) used NN to predict wind speed.
The objective of this work was to develop neural network models capable of estimating daily solar radiation values from different meteorological data as input, and to assess the behaviour of such networks and the errors of the estimates when different meteorological variables are considered.

Material and Methods
Eight NN models of the multilayer feed-forward perceptron type were designed. All of them included three neuron layers: input, hidden and output (Figure 1). Each model was built with a different number of neurons in the input layer ($E_i$) (Table 1), which received the following daily values as input patterns: extraterrestrial radiation (ETR, in MJ m-2 d-1), which is calculated on the basis of latitude and the day of the year, according to Chassériaux (1990), and considers a solar constant of 1,370 W m-2, adjusted by the earth's position; square root of the temperature range (SRTR, in ºC^0.5), following the procedure used by Liu & Scott (2001); precipitation, considered in two different ways, as a power function with exponent 0.05 (PP^0.05, in mm^0.05), according to Alonso et al. (2002), and as a binary function with value 1 for occurrence and 0 for days with no precipitation (PP_bin) (Liu & Scott, 2001); cloudiness observed at 14 hours local time (C_14, in oktas); and the relative sunshine duration (h/H, in %, in which h is the sunshine duration and H is the day duration, both in hours). The binary representation of precipitation does not imply an important loss of information since, according to our data, in 95% of the cases with precipitation the values of the power function differ from unity by less than 0.15.
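The input transformations described above can be illustrated with a short Python sketch. It is not the authors' code: the function and key names are hypothetical, and the extraterrestrial radiation is computed with the widely used FAO-56 daily formula as a stand-in, since the exact expressions of Chassériaux (1990) are not given in the text. Only the solar constant of 1,370 W m-2 (0.0822 MJ m-2 min-1) is taken from the paper.

```python
import math

# 1,370 W m-2 expressed in MJ m-2 min-1 (1370 * 60 / 1e6).
SOLAR_CONSTANT = 0.0822


def extraterrestrial_radiation(lat_deg, day_of_year):
    """Daily ETR in MJ m-2 d-1 (FAO-56 formula, assumed here as a stand-in)."""
    phi = math.radians(lat_deg)
    # Inverse relative earth-sun distance and solar declination.
    dr = 1.0 + 0.033 * math.cos(2.0 * math.pi * day_of_year / 365.0)
    decl = 0.409 * math.sin(2.0 * math.pi * day_of_year / 365.0 - 1.39)
    # Sunset hour angle.
    ws = math.acos(-math.tan(phi) * math.tan(decl))
    return (24.0 * 60.0 / math.pi) * SOLAR_CONSTANT * dr * (
        ws * math.sin(phi) * math.sin(decl)
        + math.cos(phi) * math.cos(decl) * math.sin(ws)
    )


def input_pattern(lat_deg, day_of_year, tmax, tmin, precip_mm,
                  cloud_oktas, sunshine_h, day_length_h):
    """Build the candidate inputs for one day (hypothetical key names)."""
    return {
        "ETR": extraterrestrial_radiation(lat_deg, day_of_year),
        "SRTR": math.sqrt(tmax - tmin),            # square root of temperature range
        "PP005": precip_mm ** 0.05,                # power function of precipitation
        "PPbin": 1 if precip_mm > 0 else 0,        # occurrence of precipitation
        "C14": cloud_oktas,                        # cloudiness at 14 h, in oktas
        "hH": 100.0 * sunshine_h / day_length_h,   # relative sunshine duration, %
    }


# Illustrative values only (not data from the paper); latitude of Córdoba.
day = input_pattern(-31.43, 355, tmax=30.0, tmin=14.0, precip_mm=10.0,
                    cloud_oktas=4, sunshine_h=7.0, day_length_h=14.0)
print(day)
```

For a 10 mm rainfall the power function gives about 1.12, which is within 0.15 of unity; this is consistent with the observation that the binary indicator loses little information.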
The hidden layer for all models was built with five neurons ($O_j$) and the output layer was built with only one neuron (S) that indicates the daily estimated solar radiation (SR, in MJ m-2 d-1).
Records of this meteorological information for the period 1988/1991 were taken from the Córdoba Observatory Station, in Argentina (31º26'S; 64º11'W; 438 m), which belongs to the Servicio Meteorológico Nacional (SMN). The global daily solar radiation data for the same period were supplied by the Centro de Investigaciones Acústicas y Luminotécnicas (Cial) (31º26'S; 64º11'W; 549 m), which belongs to the Red Solimétrica Nacional of the Universidad Nacional de Córdoba.
The steps of the training algorithm of the proposed networks are described, according to Hilera & Martínez (2000), as follows: initialise the weights in the net with random values (step 1); read an input pattern $X_p = (x_{p1}, x_{p2}, ..., x_{pN})$ and the desired output $d = (d_1, d_2, ..., d_M)$ associated with such input, corresponding to the records of the years 1988/1989 (step 2); generate the output calculated by the net for the presented input. To do so, the values of the responses in each layer are obtained, until the output layer is reached (step 3). The sub-steps to be followed are described in the next paragraphs.
The net input to the hidden neurons (net) is calculated first. For a hidden neuron j ($O_j$): $net_{pj} = \sum_{i=1}^{N} w_{ji} x_{pi} + \theta_j$, in which p is the p-th training vector; j is the j-th hidden neuron; $w_{ji}$ is the weight of the connection between $E_i$ and $O_j$; and $\theta_j$ is the minimum threshold to be achieved by the neuron for its activation. Based on these inputs, the outputs of the hidden neurons (y) are calculated, using the sigmoid activation function f: $y_{pj} = f(net_{pj})$. To obtain the results of each neuron k in the output layer, the same is done: $net_{pk} = \sum_{j} w_{kj} y_{pj} + \theta_k$ and $y_{pk} = f(net_{pk})$. Once all neurons have an activation value for a given input pattern, the algorithm continues by calculating the error for each neuron, except for those in the input layer (step 4). For neuron k in the output layer, if the response is $(y_1, y_2, ..., y_M)$, such error ($\delta$) can be expressed as $\delta_{pk} = (d_{pk} - y_{pk}) f'(net_{pk})$, and for the sigmoid function in particular: $\delta_{pk} = (d_{pk} - y_{pk}) y_{pk} (1 - y_{pk})$.
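The factor y(1 - y) in the sigmoid form of the output error comes from the derivative of the activation function. The short derivation below assumes the standard logistic sigmoid and writes $net_{pk}$ for the net input of output neuron k (notation assumed here for illustration):

```latex
\begin{align}
f(net) &= \frac{1}{1 + e^{-net}} \\
f'(net) &= \frac{e^{-net}}{\left(1 + e^{-net}\right)^{2}}
         = \frac{1}{1 + e^{-net}} \cdot \frac{e^{-net}}{1 + e^{-net}}
         = f(net)\,\bigl(1 - f(net)\bigr) \\
\delta_{pk} &= (d_{pk} - y_{pk})\, f'(net_{pk})
             = (d_{pk} - y_{pk})\, y_{pk}\,(1 - y_{pk}),
             \qquad y_{pk} = f(net_{pk})
\end{align}
```

Since $y(1 - y)$ is at most 0.25 (at y = 0.5) and tends to zero as y approaches 0 or 1, the error signal becomes small when an output neuron saturates.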
If neuron j is not an output neuron, the partial derivative of the error cannot be calculated directly. Thus, the result is obtained from known values and others that can be evaluated. The resulting formula is: $\delta_{pj} = f'(net_{pj}) \sum_{k} \delta_{pk} w_{kj}$, which for the sigmoid function becomes $\delta_{pj} = y_{pj} (1 - y_{pj}) \sum_{k} \delta_{pk} w_{kj}$, in which the error in the hidden layers depends on all the terms of the error in the output layer. For this reason the algorithm is called backpropagation.
The error in a hidden neuron is proportional to the sum of the known errors that are produced in the neurons connected to its output, each of them multiplied by the weight of the connection. The inner thresholds of the neurons are adapted in a similar way, considering that they are connected through weights to auxiliary inputs with constant value.
In order to update the weights, a recursive algorithm is used, starting with the output neurons and working backwards until the input layer is reached (step 5). These weights are adjusted in the output layer neurons as $w_{kj}(t+1) = w_{kj}(t) + \alpha \delta_{pk} y_{pj}$; and in the hidden layer neurons as $w_{ji}(t+1) = w_{ji}(t) + \alpha \delta_{pj} x_{pi}$.
In both cases, to accelerate the learning process, a learning rate ($\alpha$) equal to 1 is included, and to correct the direction of the error, a momentum term $\gamma (w_{kj}(t) - w_{kj}(t-1))$ is added in the case of an output neuron, and $\gamma (w_{ji}(t) - w_{ji}(t-1))$ in the case of a hidden neuron, with a rate $\gamma$ equal to 0.7.
This process is repeated n times, so that an acceptably low square error ($E_p$) for all the learned patterns can be reached (step 6): $E_p = \frac{1}{2} \sum_{k=1}^{M} (d_{pk} - y_{pk})^2$. In this learning phase of the designed net, a thousand repetitions were done to ensure that the minimum square error was less than 0.01.
The model validation phase was done with the weights generated in this first stage, and the registered data of the years 1990/1991. The number of records for each model can be observed in Table 1.
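Steps 1 to 6 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the initial weight range (-0.5 to 0.5), pattern-by-pattern updating, and the toy data are all hypothetical. It also assumes that inputs and the target are scaled so that the target lies between 0 and 1, which a sigmoid output neuron requires; the text does not describe the scaling used. The five hidden neurons, the single output neuron, the learning rate of 1 and the momentum rate of 0.7 follow the description above.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(patterns, targets, n_hidden=5, alpha=1.0, gamma=0.7,
          epochs=1000, seed=0):
    """Backpropagation with momentum for one hidden layer and one output."""
    rng = random.Random(seed)
    n_in = len(patterns[0])
    # Step 1: random weights. The last entry of each row is the threshold,
    # handled as a weight on an auxiliary input fixed at 1.
    w_hid = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
             for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    dw_hid = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
    dw_out = [0.0] * (n_hidden + 1)
    history = []
    for _ in range(epochs):                      # step 6: repeat
        total_error = 0.0
        for x, d in zip(patterns, targets):      # step 2: read a pattern
            xb = list(x) + [1.0]
            # Step 3: forward pass through hidden and output layers.
            y_hid = [sigmoid(sum(w * v for w, v in zip(row, xb)))
                     for row in w_hid]
            yb = y_hid + [1.0]
            y = sigmoid(sum(w * v for w, v in zip(w_out, yb)))
            total_error += 0.5 * (d - y) ** 2    # square error E_p
            # Step 4: error terms for the output and hidden neurons.
            delta_out = (d - y) * y * (1.0 - y)
            delta_hid = [yj * (1.0 - yj) * delta_out * w_out[j]
                         for j, yj in enumerate(y_hid)]
            # Step 5: update weights; gamma * previous change is the
            # momentum term gamma * (w(t) - w(t-1)).
            for j in range(n_hidden + 1):
                dw_out[j] = alpha * delta_out * yb[j] + gamma * dw_out[j]
                w_out[j] += dw_out[j]
            for j in range(n_hidden):
                for i in range(n_in + 1):
                    dw_hid[j][i] = (alpha * delta_hid[j] * xb[i]
                                    + gamma * dw_hid[j][i])
                    w_hid[j][i] += dw_hid[j][i]
        history.append(total_error)
    return w_hid, w_out, history


# Toy data for illustration only: two scaled inputs, target in (0, 1).
toy_rng = random.Random(1)
toy_x = [(toy_rng.random(), toy_rng.random()) for _ in range(20)]
toy_d = [0.25 + 0.5 * a for a, _ in toy_x]
_, _, errors = train(toy_x, toy_d, epochs=200)
print(errors[0], errors[-1])  # summed square error in the first and last pass
```

The error returned for each pass is the sum of $E_p$ over all patterns, so it can be compared against a tolerance such as 0.01 to decide whether more repetitions are needed.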

Results and Discussion
The results of the validation process of all models allowed the calculation of different statistics of the error and the correlation coefficient between observed and estimated values of solar radiation for Córdoba (Table 2). All models present good performance in the estimation of solar radiation.
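The validation statistics reported in Table 2 (RMSE, MAE and the linear correlation coefficient r) follow their usual definitions, which can be sketched as below. The function name and the sample values are hypothetical and are not data from this study.

```python
import math


def validation_stats(observed, estimated):
    """Return (RMSE, MAE, r) between observed and estimated values."""
    n = len(observed)
    errors = [e - o for o, e in zip(observed, estimated)]
    rmse = math.sqrt(sum(err ** 2 for err in errors) / n)
    mae = sum(abs(err) for err in errors) / n
    # Pearson linear correlation coefficient.
    mean_o = sum(observed) / n
    mean_e = sum(estimated) / n
    cov = sum((o - mean_o) * (e - mean_e)
              for o, e in zip(observed, estimated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    r = cov / math.sqrt(var_o * var_e)
    return rmse, mae, r


# Illustrative daily radiation values in MJ m-2 d-1 (not from the paper).
obs = [10.0, 20.0, 30.0]
est = [12.0, 18.0, 33.0]
rmse, mae, r = validation_stats(obs, est)
print(round(rmse, 2), round(mae, 2), round(r, 3))
```

RMSE penalises large individual errors more heavily than MAE, so a model that misses the highest radiation values shows a wider gap between the two.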
The results of the models that, using the same input, differ only in the consideration of precipitation (M1 and M2; M3 and M4; M7 and M8) show a similar behaviour; therefore, in weather stations where rainfall is not measured, the occurrence or absence of the phenomenon can be used without any loss in the quality of radiation estimation.
When no data on sunshine duration are available, the records of cloudiness at 14 hours become relevant in the estimation of solar radiation, as shown in models M6, M7 and M8. The models that consider only temperature and precipitation information (M3 and M4) are the ones which show greater errors in the estimation. However, this difference is not relevant.
In order to analyse the performance of the models that present better adjustment (M1 and M2), a dispersion diagram of observed and estimated solar radiation values was produced for the validation stage of model M1 (Figure 2). This model tends to underestimate the highest values of observed solar radiation. For daily radiation over 25 MJ m-2 d-1 the underestimation reaches, on average, 15%.
In the analysed models, the temporal evolution of the estimated values of solar radiation shows a seasonal pattern that closely follows that of the observed values. As an example, Figure 3 shows the temporal evolution for model M1. The maximum estimated values tend to underestimate the observed ones. This becomes more apparent during summer.
The results of the models agree with similar research on the estimation of solar radiation. Both Liu & Scott (2001) and Podestá et al. (2004) observed that the models considering precipitation as a binary variable gave similar results to those which used rainfall data (mm) as an input.
The results obtained applying different neural networks show similar values to those found previously. Liu & Scott (2001) estimated solar radiation in different cities in Australia using temperature and precipitation data. Their root mean square error (RMSE) values were between 2.01 and 5.44 MJ m-2 d-1.
The comparison of the results for the different models shows that the exclusion of relative sunshine duration as an input variable increases the error and reduces the correlation coefficient in the validation stage. Different authors also noted the importance of sunshine duration in the estimation of solar radiation. Alonso et al. (2002) found a higher root mean square error in the estimation of solar radiation, for different locations in Argentina, when a model based on temperature range and daily precipitation was used. This error is higher than the one found with the traditional Angström-Prescott method, which uses relative sunshine duration as input data. The average values of RMSE were 2.55 MJ m-2 d-1 when sunshine duration was used, and increased to 3.87 MJ m-2 d-1 when temperature and precipitation were used. Podestá et al. (2004) reported an average RMSE of 1.72 MJ m-2 d-1 for the Pampa Húmeda in Argentina, with estimations based on relative sunshine duration. When using temperature and precipitation, such errors reached an average of 3.76 MJ m-2 d-1. The values of the correlation coefficient decreased from 0.98 to 0.90, respectively (Table 2).
When calculating solar radiation in Córdoba on the basis of relative sunshine duration and cloudiness, De la Casa et al. (2003) found RMSE of 3.60 and 3.13 MJ m-2 d-1, respectively. The error increased to 3.74 MJ m-2 d-1 when temperature range was used, and to 3.82 MJ m-2 d-1 when a second degree function of precipitation was considered.
Even if their errors are slightly higher, it is very important to consider the M3 and M4 models, which use only temperature and precipitation as input, since in Argentina most of the stations only have instruments to measure these variables; these models are therefore a very useful tool.

Conclusions
1. The modelling process using neural networks is effective to estimate solar radiation when only a limited number of meteorological variables is available.
2. These models closely reproduce the patterns of the temporal evolution of solar radiation.
3. The results of the models that differ only in the way the rainfall is considered show a similar behaviour; the models considering only temperature and precipitation information are the ones that show greater errors in the estimation; however, this difference is not relevant.

Figure 1. Scheme of a neural network of the multilayer perceptron type (Model M1).

Table 1. Variables and input patterns of the training and validation phases for the Neural Network Models.

Table 2. Root mean square error (RMSE), mean absolute error (MAE), and linear correlation coefficient (r) in the validation phase for the different models.