Calibration of Methods to Estimate Solar Irradiance in Northeastern Pará

Two models for estimating solar irradiance were calibrated for six locations in northeastern Pará (Belém, Cametá, Conceição do Araguaia, Marabá, Soure, and Tucuruí). The first is the Angström-Prescott equation (AP), which requires observations of sunshine duration. The second is a modified version of Hargreaves' radiation formula (MH), which requires observations of daily maximum and minimum air temperatures. Both models were calibrated to estimate daily and monthly solar radiation, and a seasonal calibration (i.e., separate coefficients for the dry and wet seasons) was also tested in each location. On average, the performance of AP is about 74% higher than that of MH for daily estimates (excluding Soure) and 83% higher for monthly estimates (excluding Soure and Tucuruí). For daily estimates, seasonally calibrated equations slightly improve the performance index of AP (by 0.68%) and improve the performance of MH in most locations. Both models perform much better when estimating monthly than daily solar radiation; for AP, the performance index increases by 10.95%.


Introduction
The ratio of global horizontal irradiance to extraterrestrial irradiance has a linear relationship with relative sunshine duration (the ratio of sunshine hours to day length), which is described by the Angström-Prescott equation (AP). However, its intercept (a_s) and slope (b_s) must be calibrated to provide reliable estimates (Allen et al., 1998). The calibration is performed by linear regression and requires simultaneous measurements of global horizontal irradiance and sunshine duration. Moreover, Allen et al. (1998) recommend Hargreaves' radiation formula to estimate the fraction of extraterrestrial radiation that reaches Earth's surface when sunshine measurements are not available. This formula is attractive because it requires only the difference between daily maximum and minimum air temperatures, which are widely available, but it also requires calibration.
When calibrated coefficients for AP are not available, Allen et al. (1998) recommend a_s = 0.25 and b_s = 0.50. On the other hand, Glover and McCulloch (1958) argue that a_s = 0.29 cos(ϕ), where ϕ is the latitude, and b_s = 0.52 provide reliable estimates of solar irradiance for a wide range of locations. We have not found studies that calibrate or even validate these widely used coefficients and equations for many places in northeastern Pará. Studies carried out for other locations in Brazil (Carvalho et al., 2011; Medeiros; Bezerra, 2017; Silva, 2014) have shown that calibrated AP coefficients usually differ from those recommended by Allen et al. (1998) and by Glover and McCulloch (1958). Solar radiation strongly influences evapotranspiration calculated by the FAO-56 Penman-Monteith equation (Allen et al., 1998), as shown by Carvalho et al. (2011). Evapotranspiration is a major component of the water balance in most locations, and its estimates are widely used in irrigation, crop modelling, climate risk assessment, and other applications. Therefore, when measurements of incoming solar radiation are not available, it must be estimated as accurately as possible. In this work, the coefficients a_s and b_s of AP were fitted for six locations in northeastern Pará and compared with the coefficients recommended by Allen et al. (1998) and by Glover and McCulloch (1958). A modified version of Hargreaves' formula (MH), which includes an intercept, was also calibrated. We aim to answer whether: (I) calibrated coefficients for AP differ from those recommended by Allen et al. (1998) and by Glover and McCulloch (1958); (II) coefficients calibrated separately for the dry and wet seasons improve the performance of AP or MH for daily estimates of solar irradiance; and (III) AP and MH can properly estimate monthly average solar irradiance.
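For a sense of how much the two default coefficient sets differ, the minimal sketch below evaluates AP in the form R_s = (a_s + b_s n/N) R_a with both sets. The inputs (R_a = 35 MJ m⁻² day⁻¹, n = 6 h, N = 12 h, and a latitude of about 1.45° S for Belém) are illustrative assumptions, not data from this study, and the function names are hypothetical.

```python
import math


def angstrom_prescott(ra: float, n: float, big_n: float, a_s: float, b_s: float) -> float:
    """R_s (MJ m-2 day-1) from R_s = (a_s + b_s * n / N) * R_a."""
    return (a_s + b_s * n / big_n) * ra


def fao_defaults() -> tuple:
    """Coefficients recommended by Allen et al. (1998) when no calibration is available."""
    return 0.25, 0.50


def latitude_method(lat_deg: float) -> tuple:
    """Glover and McCulloch (1958): a_s = 0.29 cos(phi), b_s = 0.52."""
    return 0.29 * math.cos(math.radians(lat_deg)), 0.52


# Illustrative inputs only (not measured data from this study).
lat_belem = -1.45          # approximate latitude of Belem, degrees
ra, n, big_n = 35.0, 6.0, 12.0

rs_fao = angstrom_prescott(ra, n, big_n, *fao_defaults())
rs_lat = angstrom_prescott(ra, n, big_n, *latitude_method(lat_belem))
print(f"FAO defaults: {rs_fao:.2f}  latitude method: {rs_lat:.2f} MJ m-2 day-1")
```

Near the equator cos(ϕ) ≈ 1, so the latitude method gives a_s + b_s ≈ 0.81 against 0.75 for the FAO defaults; with these inputs it returns about 19.25 versus 17.50 MJ m⁻² day⁻¹, roughly 10% more radiation for the same sunshine ratio.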

Weather data
We used weather data recorded at 12 weather stations operated by the Brazilian National Institute of Meteorology (INMET), covering six locations (Fig. 1 and Table 1): Belém (BEL), Cametá (CAM), Conceição do Araguaia (CON), Marabá (MAR), Soure (SOU), and Tucuruí (TUC). Each of these municipalities has one synoptic weather station (SWS) and one automated weather station (AWS).
Hourly incoming solar radiation (R_s) and daily maximum and minimum air temperatures were obtained from each AWS. Total daily R_s was obtained by summing the hourly R_s recorded from 9:00 to 21:00 UTC, which corresponds to daytime at these locations (6:00 to 18:00 local time, UTC−3). Days with one or more missing hourly records during daytime were disregarded. Daily sunshine duration (n) was obtained from the SWS in each location.
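A minimal sketch of the daily aggregation rule, assuming hourly records keyed by UTC hour and already converted to MJ m⁻² (INMET files may report hourly radiation in kJ m⁻², in which case divide by 1000 first). The text does not say whether the 21:00 UTC record is included or whether a timestamp marks the start or end of its hour, so the inclusive 9 to 21 window below is an assumption; `daily_total_rs` is a hypothetical helper.

```python
from typing import Mapping, Optional


def daily_total_rs(hourly: Mapping[int, Optional[float]],
                   first_hour: int = 9, last_hour: int = 21) -> Optional[float]:
    """Sum hourly R_s (assumed MJ m-2 per hour) over daytime UTC hours.

    Returns None when any daytime hour is missing, mirroring the rule that
    such days are disregarded. The inclusive 9..21 window is an assumption.
    """
    total = 0.0
    for hour in range(first_hour, last_hour + 1):
        value = hourly.get(hour)
        if value is None:
            return None  # missing daytime record -> discard the whole day
        total += value
    return total


# Made-up values: 13 complete daytime hours, then the same day with one gap.
complete_day = {h: 1.5 for h in range(9, 22)}
gappy_day = dict(complete_day)
gappy_day[12] = None

print(daily_total_rs(complete_day))  # 19.5
print(daily_total_rs(gappy_day))     # None
```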

Calibrated equations
Two mathematical models for estimating solar radiation were calibrated: (I) the Angström-Prescott equation (Eq. (1)), which requires daily sunshine duration, and (II) a modified Hargreaves' radiation formula (Eq. (2)), which requires the difference between daily maximum and minimum air temperatures (ΔT).
$$\frac{R_s}{R_a} = a_s + b_s\,\frac{n}{N} \qquad (1)$$

$$\frac{R_s}{R_a} = a_h + b_h\,\sqrt{T_{max} - T_{min}} \qquad (2)$$

where R_s - solar irradiance (MJ m⁻² day⁻¹); R_a - extraterrestrial solar irradiance (MJ m⁻² day⁻¹); n - sunshine duration (h); N - potential sunshine duration, i.e., day length (h); T_max - maximum daily air temperature (°C); T_min - minimum daily air temperature (°C); a_s, b_s, a_h, and b_h - empirical coefficients. Both equations were calibrated to estimate daily totals and monthly averages of solar irradiance. For daily estimates, we also tested seasonal (i.e., dry-season and wet-season) coefficients for the Angström-Prescott equation (S-AP) and for the modified Hargreaves' radiation formula (S-MH). The dry season was defined as the months in which potential evapotranspiration exceeds total precipitation (Pereira, 2005), following the regional pattern established by Amanajás and Braga (2013), who identified the three driest months as well as the transition months between the dry and wet seasons in the study region.
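Because Eqs. (1) and (2) are linear in their coefficients once R_s/R_a is taken as the response, each calibration reduces to a simple linear regression. The standard-library sketch below regresses R_s/R_a on n/N for AP and on √(T_max − T_min) for MH, and fits S-AP by splitting days with a dry-season flag. The helper names and the noise-free synthetic data are illustrative assumptions; the authors performed their fits in R.

```python
import math
from typing import Dict, Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = a + b*x; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def calibrate_ap(rs, ra, n, big_n) -> Tuple[float, float]:
    """Estimate (a_s, b_s) of Eq. (1) by regressing R_s/R_a on n/N."""
    return fit_line([ni / Ni for ni, Ni in zip(n, big_n)],
                    [r / a for r, a in zip(rs, ra)])


def calibrate_mh(rs, ra, tmax, tmin) -> Tuple[float, float]:
    """Estimate (a_h, b_h) of Eq. (2) by regressing R_s/R_a on sqrt(T_max - T_min)."""
    return fit_line([math.sqrt(hi - lo) for hi, lo in zip(tmax, tmin)],
                    [r / a for r, a in zip(rs, ra)])


def calibrate_ap_by_season(rs, ra, n, big_n,
                           is_dry: Sequence[bool]) -> Dict[str, Tuple[float, float]]:
    """S-AP: fit separate (a_s, b_s) to dry-season and wet-season days."""
    result = {}
    for label, flag in (("dry", True), ("wet", False)):
        idx = [i for i, d in enumerate(is_dry) if d == flag]
        result[label] = calibrate_ap([rs[i] for i in idx], [ra[i] for i in idx],
                                     [n[i] for i in idx], [big_n[i] for i in idx])
    return result


# Synthetic, noise-free data generated from a_s = 0.25, b_s = 0.50 (not study data).
ra = [35.0] * 5
big_n = [12.0] * 5
n = [0.0, 3.0, 6.0, 9.0, 12.0]
rs = [(0.25 + 0.50 * ni / Ni) * a for ni, Ni, a in zip(n, big_n, ra)]
print(calibrate_ap(rs, ra, n, big_n))  # approximately (0.25, 0.5)
```

A regional fit (R-AP or R-MH) would pass the concatenated records of all six stations to the same functions.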
Additionally, the Angström-Prescott equation and the modified Hargreaves' equation were calibrated with the pooled data from all locations, to obtain general equations valid for the whole region. They are abbreviated as R-AP and R-MH, respectively.
Daily extraterrestrial irradiance (R_a, Eq. (3)) and potential sunshine duration, or day length (N), were calculated as functions of station latitude and day of year (J). Based on these calculated variables and on the recorded weather data, relative incoming radiation (R_s/R_a) and relative sunshine duration (n/N) were then calculated.
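The equations for R_a and N are not reproduced in this text. A common formulation, and a plausible one given that the authors cite Allen et al. (1998), is the FAO-56 procedure sketched below; treat it as an assumption rather than the authors' exact equations. It combines the solar constant G_sc = 0.0820 MJ m⁻² min⁻¹, the inverse relative Earth-Sun distance d_r, the solar declination δ, and the sunset hour angle ω_s, with N = 24 ω_s / π. The function name is hypothetical.

```python
import math
from typing import Tuple

G_SC = 0.0820  # solar constant, MJ m-2 min-1 (FAO-56 value)


def ra_and_daylength(lat_deg: float, doy: int) -> Tuple[float, float]:
    """Daily extraterrestrial radiation R_a (MJ m-2 day-1) and day length N (h).

    FAO-56 formulation (Allen et al., 1998); assumed here, not taken from this text.
    """
    phi = math.radians(lat_deg)
    angle = 2.0 * math.pi * doy / 365.0
    dr = 1.0 + 0.033 * math.cos(angle)                # inverse relative Earth-Sun distance
    delta = 0.409 * math.sin(angle - 1.39)            # solar declination (rad)
    ws = math.acos(-math.tan(phi) * math.tan(delta))  # sunset hour angle (rad)
    ra = (24.0 * 60.0 / math.pi) * G_SC * dr * (
        ws * math.sin(phi) * math.sin(delta)
        + math.cos(phi) * math.cos(delta) * math.sin(ws)
    )
    big_n = 24.0 * ws / math.pi                       # potential sunshine duration (h)
    return ra, big_n


ra, big_n = ra_and_daylength(-1.45, 1)  # approximate latitude of Belem, 1 January
print(f"R_a = {ra:.1f} MJ m-2 day-1, N = {big_n:.2f} h")
```

As a check, FAO-56's worked example for 3 September (J = 246) at 20° S gives R_a ≈ 32.2 MJ m⁻² day⁻¹ and N ≈ 11.7 h, which this sketch reproduces. Near the equator the arccos argument stays well inside [-1, 1]; polar latitudes would need clamping.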
Monthly averages of R_s were computed from all days with a complete set of records. Since AWS data usually contain some small gaps, a tolerance of 20% of days with missing data was adopted for computing monthly averages of R_s; months in which fewer than 80% of the days had a full set of R_s records were therefore excluded from the analysis. The average monthly R_a was calculated as the average of the daily totals of R_a from Eq. (3), excluding all days with missing R_s records. The same procedure and 20% tolerance were adopted for n and N.
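The 80% completeness rule, with R_a averaged over the same complete days, can be sketched as follows. The function name is hypothetical, and the input layout (one list entry per calendar day, None for a day without a full set of R_s records) is an assumption about how the data are organized.

```python
from typing import List, Optional, Tuple


def monthly_means(daily_rs: List[Optional[float]], daily_ra: List[float],
                  min_fraction: float = 0.8) -> Optional[Tuple[float, float]]:
    """Monthly mean R_s and R_a (MJ m-2 day-1) for one calendar month.

    daily_rs has one entry per calendar day, None where the day lacks a full
    set of R_s records. Months with fewer than min_fraction complete days
    return None; R_a is averaged over the same complete days only.
    """
    complete = [(rs, ra) for rs, ra in zip(daily_rs, daily_ra) if rs is not None]
    if not complete or len(complete) / len(daily_rs) < min_fraction:
        return None
    mean_rs = sum(rs for rs, _ in complete) / len(complete)
    mean_ra = sum(ra for _, ra in complete) / len(complete)
    return mean_rs, mean_ra


# A 30-day month with 24 complete days is kept; with 23 it is excluded.
print(monthly_means([18.0] * 24 + [None] * 6, [36.0] * 30))  # (18.0, 36.0)
print(monthly_means([18.0] * 23 + [None] * 7, [36.0] * 30))  # None
```

The same function applies unchanged to n and N.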

Validation of results and statistical analysis
The performance of each model was assessed with the following statistical indicators: the Pearson correlation coefficient (r) (Eq. (8)); Willmott's index of agreement (d) (Willmott, 1981) (Eq. (9)), which indicates how well model estimates reproduce observed data; the performance index (c) of Camargo and Sentelhas (1997) (Eq. (10)), the product of r and d, which aggregates the information provided by both indicators into a single value; the mean error (me) (Eq. (11)), used here to compare models in the analysis of variance; the root mean square error (rmse) (Eq. (12)), which indicates how far estimates are from their respective observed values, on average; and the percent bias error (pbias) (Eq. (13)), which indicates the overall tendency of a model to underestimate (negative sign) or overestimate (positive sign) a variable relative to observed data. All equation fitting, comparisons and analyses were performed in R.

$$r = \frac{\sum_{i=1}^{N}(P_i - \bar{P})(O_i - \bar{O})}{\sqrt{\sum_{i=1}^{N}(P_i - \bar{P})^2}\,\sqrt{\sum_{i=1}^{N}(O_i - \bar{O})^2}} \qquad (8)$$

$$d = 1 - \frac{\sum_{i=1}^{N}(P_i - O_i)^2}{\sum_{i=1}^{N}\left(|P_i - \bar{O}| + |O_i - \bar{O}|\right)^2} \qquad (9)$$

$$c = r\,d \qquad (10)$$

$$me = \frac{1}{N}\sum_{i=1}^{N}(P_i - O_i) \qquad (11)$$

$$rmse = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(P_i - O_i)^2} \qquad (12)$$

$$pbias = 100\,\frac{\sum_{i=1}^{N}(P_i - O_i)}{\sum_{i=1}^{N} O_i} \qquad (13)$$

where P_i is the predicted value on day or month i; O_i is the observed value on day or month i; P̄ is the average of predicted values; Ō is the average of observed values; and N is the number of samples (i.e., days or months with the necessary recorded weather data).
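Eqs. (8) to (13) translate directly into code. This standard-library sketch computes all six indicators for paired observed and predicted series; the function name and the dictionary keys are illustrative choices, and the sign conventions follow the text (positive me and pbias indicate overestimation).

```python
import math
from typing import Dict, Sequence


def performance_indicators(obs: Sequence[float], pred: Sequence[float]) -> Dict[str, float]:
    """Compute r, d, c, me, rmse and pbias (Eqs. 8-13) for paired observed/predicted values."""
    n = len(obs)
    o_bar = sum(obs) / n
    p_bar = sum(pred) / n
    # Pearson correlation coefficient (Eq. 8)
    cov = sum((p - p_bar) * (o - o_bar) for o, p in zip(obs, pred))
    r = cov / math.sqrt(sum((p - p_bar) ** 2 for p in pred) * sum((o - o_bar) ** 2 for o in obs))
    # Willmott's index of agreement (Eq. 9)
    sse = sum((p - o) ** 2 for o, p in zip(obs, pred))
    potential = sum((abs(p - o_bar) + abs(o - o_bar)) ** 2 for o, p in zip(obs, pred))
    d = 1.0 - sse / potential
    bias_sum = sum(p - o for o, p in zip(obs, pred))
    return {
        "r": r,
        "d": d,
        "c": r * d,                            # Camargo and Sentelhas index (Eq. 10)
        "me": bias_sum / n,                    # mean error (Eq. 11); > 0 means overestimation
        "rmse": math.sqrt(sse / n),            # root mean square error (Eq. 12)
        "pbias": 100.0 * bias_sum / sum(obs),  # percent bias (Eq. 13); > 0 means overestimation
    }


# Made-up example: every estimate is 2 MJ m-2 day-1 too high.
observed = [10.0, 20.0, 30.0]
predicted = [12.0, 22.0, 32.0]
print(performance_indicators(observed, predicted))
```

Here r = 1, me = rmse = 2, and pbias = 10%, while d ≈ 0.985. A constant offset leaves r unchanged but lowers d, which is why c = r·d penalizes bias that r alone ignores.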
Each equation was calibrated with 70% of the available records, chosen randomly, and validated with the remaining 30% (Tables 2 and 3). Table 4 shows the calibrated coefficients for AP and MH. Regression analysis of the calibration data shows significance at the 1% level for AP and MH in all locations, except for Soure, where MH is not significant. Except for the intercepts of CAM and TUC, all calibrated AP coefficients differ at the 1% level from those recommended by Allen et al. (1998) and by Glover and McCulloch (1958), which reinforces the need for calibration. The same holds for the coefficients of the Angström-Prescott equation calibrated for all locations together (R-AP).
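The 70/30 random split and the test of whether a recommended coefficient (e.g., a_s = 0.25) differs from a fitted one at the 1% level can be sketched as below. The seed, the helper names, and the use of a normal approximation (z ≈ 2.576) instead of Student's t are assumptions; with hundreds of daily records the two give nearly the same interval, whereas R's `confint()` on an `lm` fit uses t quantiles.

```python
import math
import random
from statistics import NormalDist
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_70_30(records: Sequence[T], seed: int = 42) -> Tuple[List[T], List[T]]:
    """Randomly assign about 70% of records to calibration and the rest to validation."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def ols_with_se(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float, float]:
    """OLS fit y = a + b*x; returns (a, b, se_a, se_b) from the residual variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    s2 = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)  # residual variance
    se_b = math.sqrt(s2 / sxx)
    se_a = math.sqrt(s2 * (1.0 / n + mx ** 2 / sxx))
    return a, b, se_a, se_b


def differs_at_1pct(estimate: float, se: float, reference: float) -> bool:
    """True if `reference` lies outside a normal-approximation 99% interval around `estimate`."""
    z = NormalDist().inv_cdf(0.995)  # about 2.576
    return abs(estimate - reference) > z * se


calibration, validation = split_70_30(list(range(100)))
print(len(calibration), len(validation))             # 70 30
print(differs_at_1pct(0.30, 0.01, reference=0.25))   # True: 0.05 > 2.576 * 0.01
```

In practice x would be n/N (or √ΔT) and y would be R_s/R_a from the calibration subset, with each recommended coefficient tested against its fitted counterpart.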

Estimates of solar irradiance for daily time steps
Although the calibrated Angström-Prescott equation has a low coefficient of determination (Table 5) in some locations, such as Belém and Tucuruí, similar values are commonly reported in other studies calibrating AP: for example, r² = 0.83 in Seropédica, RJ (Carvalho et al., 2011), and r² = 0.714 and r² = 0.515 in Parnaíba, PI, during the wet and dry seasons, respectively (Andrade Júnior et al., 2012). One factor that may contribute to this large variability is the distribution of solar irradiance throughout the day. Around noon, irradiance is typically greater than 970 W m⁻², while near sunrise or sunset it is usually below 270 W m⁻². Therefore, days on which the incident solar radiation is concentrated around noon should receive higher total irradiance than days on which it is concentrated near sunrise or sunset, even if both days have the same sunshine duration. Ambas and Baltas (2014) performed a spectral analysis of hourly solar radiation in western Macedonia and observed that the clearness index (R_s/R_a) is about 18% higher around noon; precise estimates of solar irradiance from sunshine duration must therefore consider how the periods of direct radiation are distributed throughout the day. Unfortunately, this information is not widely available. Table 6 shows the calibrated coefficients for S-AP and S-MH. In Soure, both S-AP and S-MH and their respective slopes are significant at the 1% level, whereas MH is not. This indicates a strong seasonal variation in the dependence of daily total radiation on √ΔT in this location.
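The argument that the timing of sunshine matters, not only its duration, can be checked with a toy calculation. The sketch assumes a hypothetical half-sine clear-sky profile peaking at 970 W m⁻² at local noon, with sunrise at 6:00 and sunset at 18:00, and compares the energy received during three sunny hours around noon with three sunny hours just after sunrise. It is an illustration only, not a model used in this study.

```python
import math


def clear_sky_irradiance(hour: float, peak: float = 970.0,
                         sunrise: float = 6.0, sunset: float = 18.0) -> float:
    """Hypothetical half-sine clear-sky irradiance (W m-2) at a local solar hour."""
    if hour <= sunrise or hour >= sunset:
        return 0.0
    return peak * math.sin(math.pi * (hour - sunrise) / (sunset - sunrise))


def energy_mj(start: float, end: float, steps_per_hour: int = 60) -> float:
    """Energy (MJ m-2) received between two local hours, by the midpoint rule."""
    dt = 1.0 / steps_per_hour                        # step length in hours
    n_steps = round((end - start) * steps_per_hour)
    watt_hours = sum(clear_sky_irradiance(start + (k + 0.5) * dt) * dt for k in range(n_steps))
    return watt_hours * 3600.0 / 1e6                 # W h m-2 -> J m-2 -> MJ m-2


# Two days with the same sunshine duration (n = 3 h) but different timing.
noon = energy_mj(10.5, 13.5)
morning = energy_mj(6.0, 9.0)
print(f"noon: {noon:.1f} MJ m-2, morning: {morning:.1f} MJ m-2")
```

Both windows correspond to n = 3 h, yet the noon window receives roughly 2.6 times more energy (about 10.2 versus 3.9 MJ m⁻²), so n/N alone cannot distinguish the two days.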
Confidence intervals for the intercept and slope are also usually wider during the dry season, which may be due to the smaller number of samples (days) in that season (see Table 2): since this is a rainy region, the dry season is shorter than the wet season. Silva Dornelas, Silva, and Oliveira (2006) observed higher standard deviations of the AP intercept and slope during the dry season in Brasília, DF, which they attributed to increased dust and biomass burning. This could also explain the variation of the confidence intervals throughout the year in Table 6, in addition to the smaller number of dry-season samples mentioned above. The variation of a_s + b_s between the dry and wet seasons (Table 6) also supports this hypothesis.
As expected, a_s + b_s is usually higher during the wet season than during the dry season (Table 6), except in Belém and Tucuruí. The sum of both coefficients in AP and S-AP represents the maximum theoretical fraction of extraterrestrial radiation that can reach Earth's surface on a clear-sky day, since it is the value of AP when n/N = 1 (see Eq. (1)). The seasonal difference in a_s + b_s may be related to differences in atmospheric transparency throughout the year. According to Guyon et al. (2003), the particle number concentration in the Amazonian atmosphere increases from about 400 cm⁻³ in the wet season to about 4,000 cm⁻³ in the dry season. These authors attribute this massive increase to extensive seasonal biomass burning caused by deforestation and pasture clearing. Higher aerosol concentrations could reduce atmospheric transparency during the dry season and, therefore, lower a_s + b_s. Another factor contributing to higher atmospheric transparency during the wet season is aerosol scavenging, which may occur through different processes, such as nucleation scavenging and impaction scavenging (in-cloud and below-cloud). According to Ohata et al. (2016), nucleation scavenging controls the removal efficiency of accumulation-mode aerosols. Similar patterns for a_s + b_s (higher during the wet season and lower during the dry season) and for the confidence intervals of a_s and b_s (wider during the dry season and narrower during the wet season) have also been found in other locations, such as Natal, RN (Medeiros et al., 2017), where a_s + b_s is usually greater than 0.7 during the wet season and smaller than 0.7 during the dry season, and Seropédica, RJ (Carvalho et al., 2011), where a_s + b_s averages 0.75 in summer (wet season) and 0.71 in winter (dry season). Table 7 shows the statistical indicators obtained by comparing estimated and measured R_s in the validation data.
AP-LAT (AP with the latitude-based coefficients of Glover and McCulloch, 1958) always has the highest percent bias error and mean error, which indicates a strong tendency of this model to overestimate R_s. This tendency is also observed, to a lesser extent, for AP-FAO (AP with the coefficients recommended by Allen et al., 1998) in some locations, such as BEL, CON, and TUC. The poor performance of MH is shown by its low Willmott's index of agreement (d) (Willmott, 1981) and performance index (c) of Camargo and Sentelhas (1997). In Soure, although S-MH performs poorly compared to AP or S-AP, the improvement obtained by using S-MH instead of MH is remarkable.
Since MH is not significant in Soure, any value of √ΔT yields approximately the same R_s/R_a; therefore, all the variation in R_s estimated by MH in Soure comes from the variation of R_a throughout the year. This explains the extremely low performance of MH in Soure. The tendency of AP-LAT to overestimate R_s mentioned above is clearly seen in Fig. 2: in Belém, AP-LAT estimates R_s around 25 MJ m⁻² day⁻¹ when the measured R_s is around 20 MJ m⁻² day⁻¹, an overestimation of roughly 25%.
The AP equation calibrated for the whole region (R-AP) behaves similarly to AP-FAO and AP-LAT in that it tends to overestimate R_s in Belém. In addition, the comparison of residual means shows that the residuals of R-AP usually differ from those of the locally calibrated AP. The MH equation calibrated for the whole region (R-MH) performs very poorly, as shown by its low r, d and c, and therefore cannot produce reliable estimates of R_s at daily time steps.

Table 6 - Coefficients of the seasonal Angström-Prescott equation (S-AP) and the seasonal modified Hargreaves equation (S-MH), with their respective confidence intervals (p ≤ 0.01), for each location and season. ** Significant at 1% level; * significant at 5% level.

Estimates of monthly averages of solar radiation
Table 8 shows the calibrated coefficients for AP and MH in each location. Regression analysis of the calibration data shows significance at the 1% level for both AP and MH in all locations, except for Soure, where only AP is significant at the 1% level. Table 9 shows that both AP and MH are much more precise for monthly averages of solar irradiance than for daily irradiance. Even MH, which produces very imprecise estimates of daily totals, could be safely used to estimate monthly solar radiation in some locations, such as Marabá, Conceição do Araguaia, and Cametá. This increase in precision for monthly estimates usually occurs because the relative error of one day counterbalances the relative error of another, resulting in a smaller error over a multi-day period (Allen et al., 1998). The tendency of AP-FAO and AP-LAT to overestimate solar radiation, discussed above, is also seen in the monthly estimates (Table 9 and Fig. 4). In all six locations, AP-LAT has the greatest mean error and the greatest absolute pbias, which indicates that this model produces the largest overestimations. The calibrated models (AP and MH) are more accurate than the non-calibrated models (AP-FAO and AP-LAT); however, the calibrated Hargreaves equation (MH) is much less precise than the calibrated Angström-Prescott equation (AP). This difference is noticeable in Fig. 4, where the points comparing estimates with observed data are much more dispersed for MH than for AP. Even for monthly estimates of R_s, the MH equation calibrated for the whole region (R-MH) performs poorly, as shown by its low r, d and c, and therefore should not be used to calculate mean R_s from mean ΔT in any of these locations.

Table 9 - Pearson correlation coefficient (r), Willmott's index of agreement (d), performance index (c) of Camargo and Sentelhas (1997), mean error (me), root mean square error (rmse), and percent bias error (pbias).
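The error-cancellation argument can be illustrated with made-up numbers for six days: daily errors of both signs, several percent each, largely cancel in the period mean, whereas a systematic 20% overestimation, like that of a biased coefficient set, survives averaging unchanged. The data and function names are hypothetical.

```python
from typing import Sequence


def mean_abs_relative_error(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Average of the daily |P_i - O_i| / O_i."""
    return sum(abs(p - o) / o for o, p in zip(obs, pred)) / len(obs)


def relative_error_of_mean(obs: Sequence[float], pred: Sequence[float]) -> float:
    """|mean(P) - mean(O)| / mean(O): the error left after averaging over the period."""
    mean_o = sum(obs) / len(obs)
    mean_p = sum(pred) / len(pred)
    return abs(mean_p - mean_o) / mean_o


# Hypothetical daily R_s (MJ m-2 day-1) for six days.
obs = [20.0, 18.0, 22.0, 19.0, 21.0, 20.0]
random_like = [22.0, 16.5, 23.0, 17.5, 22.5, 19.0]  # errors of both signs
biased = [1.2 * o for o in obs]                     # systematic 20% overestimation

for label, pred in (("random-like", random_like), ("biased", biased)):
    print(label, round(mean_abs_relative_error(obs, pred), 3),
          round(relative_error_of_mean(obs, pred), 3))
```

With errors of both signs, the mean daily relative error is about 7% but the error of the six-day mean is only about 0.4%; with the systematic bias, both stay at 20%.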

Conclusions
1. Calibrated coefficients in the Angström-Prescott equation significantly increase the accuracy of estimates in most locations, the exception being Cametá. The increase in accuracy is particularly high in Belém, where calibration reduces the percent bias error from 24.2% to -0.4%.
2. Daily estimates of solar irradiance by Hargreaves' radiation formula usually have low precision; therefore, this model is not recommended for daily estimates in most locations. The highest daily precision of MH occurs in Marabá, where the calibrated Hargreaves' radiation formula has a performance index c = 0.644, whereas the calibrated Angström-Prescott equation has c = 0.838.
3. Both the Angström-Prescott equation and the modified Hargreaves' radiation formula show a great improvement in accuracy when used to estimate monthly average solar radiation instead of daily total solar radiation. The Angström-Prescott equation remains more accurate than the modified Hargreaves' radiation formula; however, the latter can be safely used to estimate monthly solar irradiance in Cametá, Conceição do Araguaia, and Marabá.
4. When calibrated coefficients for the Angström-Prescott equation are not available, the coefficients recommended by FAO (Allen et al., 1998) are preferable to those obtained by the latitude method.