CROP MODELING WITH LESS DATA: THE FAO MODEL FOR SOYBEAN YIELD ESTIMATION

Crop growth simulation models such as WOFOST and DSSAT are useful, but require several inputs that sometimes are not available, especially in developing areas. In addition, measured data is usually time and labor-intensive. In search of faster and easier methods for soybean estimates, this study presents a lower input requiring methodology for yield estimation. This study combines the FAO-33 yield model with the agroecological zone approach for soybean yield estimations using mostly indirect data. Sowing and harvest dates and yield were collected from 74 soybean commercial farms. Agrometeorological data from the European Centre for Medium-Range Weather Forecasts (ECMWF) were used. Fifty farms (66%) were used to calibrate the model and 24 farm areas (33%) were used for evaluation purposes. Two methodologies (FAO-56 and Thornthwaite and Mather) for water balance and actual evapotranspiration (ETa) estimations were used. The comparison of yield estimations and observations showed that the use of low data input to obtain reasonable accuracy, with a mean error of −310 kg ha and a mean absolute percentage error of 23.3%.


INTRODUCTION
Global food issues are frequently raised due to the growing demand for food in the world market and fluctuations in commodity prices (Sakamoto et al., 2014), pointing to the importance of studies that reduce market speculation and assist in food management. In this sense, local and regional estimates of crop yield are important for macro-and microeconomic management (Johnson, 2014). Thus, proposing ways to obtain agricultural statistics reliably and quickly is essential (Johnson, 2014), and one way to obtain such information is through the remote sensing technique. There are different methods to do so. Moges et al. (2007) used a hand-held optical sensor to estimate sorghum yield in Oklahoma -USA, obtaining an in-season estimated yield derived from green Normalized Difference Vegetation Index (NDVI) correlated with final grain yield (r = 0.71). Gusso et al. (2017) developed a satellite remote sensingbased procedure to estimate soybean production before crop harvest, with coefficients of determination ranging from 0.91 to 0.98. Richetti et al. (2018) used machine learning algorithms with Enhanced Vegetation Index (EVI) for soybean in Brazil and obtained a mean error of 3.5 kg ha −1 , RMSD of 373 kg ha −1 , and Willmott's d of 0.85.
Another approach consists of crop growth mechanistic models such as WOFOST (World Food Studies) (Boogaard et al., 2014) or DSSAT (Decision Support System for Agrotechnology Transfer) (Jones et al., 2003), which are reliable and present accurate results. However, the minimum data requirements are sometimes high for developing or less technologically advanced regions. Antle et al. (2017) presented future strategies for agricultural systems and modeling, indicating that data are the foundation for the science and the analysis of agricultural systems. However, better data may be the greatest need and challenge to achieve the next generation of research and modeling for agricultural systems. In addition, the lack of information for model inputs is still a challenge in many developing areas. There is little information on agrometeorological and soil data, and more complex inputs, such as the soil fertility level, are very difficult to obtain in some cases. Thus, this study aimed to assess whether a low input model is capable of acceptable accuracy results.

Study area
The study area comprised of 74 monitored farm areas with an average size of 102 ha, in the state of Paraná (southern Brazil), located between parallels 22°29′ S and 26°43′ S and the meridians 48°2′ W and 54°38′ W (Figure 1), with an area of 199,308 km 2 . The climate in the state of Paraná is mainly a humid subtropical climate (Aparecido et al., 2016), with a predominance of clay-textured Oxisols (EMBRAPA, 2009). The state of Paraná accounts for almost 18% of the Brazilian production (CONAB, 2018), which means a soybean production higher than that of China, the world's fourthlargest soybean producer (FAOSTAT, 2018). The observed yield (Yobs) and sowing and harvest dates from 74 farms ( Figure 1) from 2007/2008 to 2013/2014 growing seasons were used. In the state of Paraná, soybean is usually sown from September to November and harvested from January to April. A total of 46 farms were used in the 2013/2014 growing season, 13 in 2012/2013, four in 2011/2012, three in 2010/2011, three in 2008/2009, three in 2007/2008, and two in 2009/2010. Some farms belong to partners, but most of the data (2012/2013 and 2013/2014 growing seasons) were obtained from field research developed during the Embrapa project named MAPAGRI (Mapping Agricultural Activity in Brazil), developed and completed in 2014 by several universities and research institutions in Brazil and financed by Embrapa Agricultural Informatics.

Data
Agrometeorological data, quality, and reliability are essential in the estimation of agricultural yield. Thus, climate data from the European Centre for Medium-Range Weather Forecasts (ECMWF) were used considering the poor network of weather stations in the state, with only 59 weather stations in operation, the red tape for obtaining data, the irregular spatial distribution of the stations, and the lack of real-time data availability (IAPAR, 2015). Johann et al. (2016) conducted a study on ECMWF climate elements in the state of Paraná and concluded that the precipitation and ET0 data had a significant uncertainty when compared to meteorological station data. However, Hagedorn et al. (2008) pointed out that raw ensemble forecasts of surface temperatures were biased, but approximately 70% of the improvement in ECMWF could be attributed to a simple correction of the mean bias. Therefore, the calibration of the model by HI should correct the ECMWF uncertainty.
The used agrometeorological data from ECMWF are available in a grid with a 10-day time resolution and a 0.25degree spatial resolution (± 25 × 25 km). This grid format was called virtual station (VS) (  (Dee et al., 2011). Initially, the 10day agrometeorological data were interpolated by inverse distance weighting (IDW) at a 250 × 250-m spatial resolution. Then, the average values of each agrometeorological data were extracted for each farm area.
The information of the total available soil water (TAW) for the state of Paraná was generated from the total water-holding capacity, as presented by Farias et al. (2007), from soil types obtained from the New Soil Map of Brazil (EMBRAPA, 2011). It is the only used soil information and input for both water balance methods.
The soybean crop was considered as described and characterized by Allen et al. (1998). The crop water stress coefficient (ky = 0.8) and the p-depletion factor (0.5) were fixed for the entire growth cycle. The crop index (kc) and root depth (Zr) values were used according to the soybean phenological stage, as presented by FAO (Allen et al., 1998).

Water balance and yield estimation
The FAO-33 model requires four inputs to estimate the yield (Yest) of a crop, namely: maximum yield (Yx), crop Engenharia Agrícola, Jaboticabal, v.41, n.2, p.196-203, mar./apr. 2021 reference evapotranspiration (ETc), actual evapotranspiration (ETa), and crop water stress coefficient (ky). Yx can be determined differently, from historical or experimental data. However, the use of historical data requires reliable historical information, and experimental data are costly and timeconsuming. Therefore, Yx was estimated for this study using the agro-ecological zone approach (Kassam & Higgins, 1981), which considers the average temperature to estimate potential production. The crop evapotranspiration (ETc) and actual evapotranspiration (ETa) were obtained from two methods: the FAO-56 water balance (FAOWB) (Allen et al., 1998) and the Thornthwaite and Mather water balanced (TMWB) (Black, 2007). The calibration of the model was performed through a bias adjustment factor, consisting of comparison between field data and the data estimated by the model. All the procedures ( Figure 2) were conducted for each farm. The harvest index (HI) was used as the bias adjustment factor, minimizing to zero the difference between the observed yield (Yobs from farms) and the estimated yield (Yest from the FAO-33 model). For this, the Simplex LP method of the Solver tool in Excel was used. Fifty out of the 74 farm areas were used to determine HI and the other 24 farm areas were used to validate the results. The HI was used to determine the potential yield (Yx) and as a calibration factor for the model, making the process as simple as possible.

FAO-56 water balance (FAOWB)
The FAOWB uses depletion in the root zone (Equation 1) to assess the crop water stress. where: , is the depletion in the root zone at the end of time t (mm); , −1 is the depletion in the root zone at previous time t−1 (mm); is precipitation at time t (mm); is the runoff at time t (mm); is irrigation at time t (mm); is the capillary rise at time t (mm); , is the crop evapotranspiration at time t (mm), and is the deep percolation at time t (mm).
The values capillary rise, deep percolation, runoff, and irrigation were not considered in this study, as they are extremely hard to be determined and the state of Paraná has no significant irrigation areas. One way of calculating ETa,t is through the ks,t factor, that is, the crop water stress coefficient at time t (Equation 2 quantifies the reduction factor relative to the evaporation of soil water availability. where: ETa,t is the actual evapotranspiration at time t (mm), and ks,t is the water stress coefficient at time t. where: TAWt is the total water available at a certain soil depth at time t (mm), and RAWt is the readily available water for the crop at time t (mm).
The readily available water at time t (RAWt, Equation 4) represents the amount of water that the plant needs no effort to use. It is calculated by multiplying the TAWt value by the p coefficient (p = 0.5) (Allen et al., 1998), which is the depletion coefficient representing the difficulty that the plant undergoes because the amount of water in the soil is not readily available.
The amount of depletion ( , , Equation 5) is dependent on the TAWt value and , −1 .

Thornthwaite and Mather water balance (TMWB)
The TMWB method calculates the water balance differently using the same data. First, it determines the difference between precipitation (Pt) and ETc, preserving the positive or negative signs. Subsequently, it estimates both the accumulated negative at time t (ANt, mm; Equation 6) and the water stored in the soil at time t (WSSt, mm; Equation 7). The accumulated negative is the sum of the sequence of negative values of the months in which DIFt has a negative value, showing that rained less than the soil lost water during the period, in this case dekadly, indicating the potential for soil dryness.
After determining WSSt, it is possible to verify alterations in soil water at time t (ALTt, mm; Equation 8). It refers to the difference between WSSt values while maintaining the positive and negative signs, showing whether the amount of water in the soil increased.
The ETa,t value (mm; Equation 9) is calculated considering water alterations in the soil, never with negative values.
The ETa,t value allows verifying if there was a water deficit at time t (DEFt, mm; Equation 10), that is, DEFt indicates the lack of crop evapotranspiration.
The method also introduces the excess water at time t (EXCt, mm; Equation 11). EXC is the amount of water that is lost during rainy periods and can be determined when WSSt is lower than TAWt, showing the water supply in the soil at time t, and when WSSt is equal to TAWt, showing that the soil is storing its maximum water capacity.
Thus, water balance is the difference between water excess and deficit at a given period.

Ecological zone approach for potential yield (Yx) estimation
The Yx value is the sum of potential gross yield at time t (PPRt, kg ha −1 ; Equation 12).
The crop gross yield is determined by the potential gross yield of the standard crop at time t (PPBt, kg ha −1 ; Equation 13) and the harvest index (HI, dimensionless), responsible for the model calibration in the study. = 0.265455 * * .

Q0,t is the radiation at the top of the atmosphere at time t (Cal cm −2 );
DRt is the Sun-Earth relative distance at time t (UA;  80) ), where: DJ is the Julian day.
The HI value was determined by minimizing the difference between the actual yield (Yact, from farm areas) and the estimated yield (Ya, from the FAO-33 model) to zero, using the Simplex LP method of the Solver tool in Excel. The methods FAO-56 water balance (FAOWB) and Thornthwaite and Mather water balance (TMWB) were used to estimate ETa. Fifty out of the 74 farm areas were used to determine HI and the other 24 farm areas were used to validate the results. The HI was used to determine the potential yield (Yx), and the higher the HI value, the higher the potential yield. The HI value was used as a calibration factor for the model, making the process as simple as possible.

FAO-33 yield model
The traditional FAO-33 model (Equation 21) shows that the relative reduction in yield is related to the corresponding relative reduction in evapotranspiration (Steduto et al., 2012). It was operated in a 10-day stage, and the final yield was obtained by the sum of estimated yields for the growing season. where: Yx and Ya are the potential and estimated yield, respectively (t ha −1 ); ETc and ETa are the crop and actual evapotranspiration, respectively (mm), and ky is the yield coefficient that translates the crop yield sensitivity to water stress (dimensionless).

Statistical analysis
The metrics consisted of the mean absolute error (MAE; Equation 22), which measures the magnitude of the errors in a set of estimates, the mean error (ME; Equation 23), the root mean square error (RMSE; Equation 24), which indicates the size of the error generated by the model, that is, the model performance assessment criteria, and the mean absolute percentage error (MAPE; Equation 25), which shows, in relative terms, the errors of the estimates. where: n is the number of farms; Yest is the estimated yield, and Yobs is the observed yield from farm areas.

Model calibration
The model calibration was performed based on 50 farms using HI. The results show that the average HI values of 0.2625 and 0.2093 for FAOWB and TMWB, respectively, are recommended (Table 1). Both HI values presented a homogeneous coefficient of variance, and the results of the ttest showed 5% significance, indicating a significant difference between the two index values. Therefore, a single index cannot be used for both methods. However, the results are the same as those shown in Section "Water balance", regardless of the used water balance method. Additionally, the indices found in the literature do not allow obtaining statistically satisfactory yield estimates using agrometeorological models (Araújo et al., 2011). Steduto et al. (2012) recommended that HI should be calibrated for each region. Furthermore, the authors found HI values from 0.25 to 0.40 for oilseed crops using a diverse methodology to estimate Yx. The results of this study corroborate with Steduto et al. (2012) and the pilot project (Richetti et al., 2015) conducted at a regional level. Kemanian et al. (2007) proposed a linear model to estimate HI values for barley (0.40 to 0.60), spring wheat (0.38 to 0.46), and sorghum (0.45 to 0.53).

Water balance
The soil water balance and ETa for each farm area were estimated by the FAOWB and TMWB methods. No significant differences were observed between water balances. Therefore, both methods could monitor water stress during the soybean cycle. An example is the farm chosen at random ( Figure 3) and designated as No. 34 (P-34 in Figure  1) for the 2013-2014 growing season. However, Katerji et al. (2011) used the FAOWB method under saline environment conditions for potato and broad beans and concluded that ks calculations underestimated the evapotranspiration measured during the crop cycle by 12%, on average. Rojas (2007) concluded that the FAOWB method together with data from remote sensing can be used to conduct operational predictions of corn yield. Schwantes et al. (2016) compared the TMWB model with data from a polymer tensiometer in a field experiment with soybean and concluded that the time course of the soil water storage profiles calculated with water balances based on the evapotranspiration models measured with the polymer tensiometer was very similar. Thus, the TMWB method can lead to an immediate response in yield. An immediate decrease in yield is observed with water stress, while the FAOWB method leads to a higher reduction in the relative yield, but not immediate. Sowing in the example farm (P-34) was carried out at the first dekad of October, with harvest at the first dekad of February. ETa presents its first decrease at the first dekad of stage IV (Figure 4), leading to a decrease in the relative yield. The next reduction in ETa occurred when the cycle was ending, and the water requirements were lower than in other stages of the crop. Therefore, there is no relative yield loss.  I II II II II III III IIIIVIVIVIV

Yield estimations
Sowing and harvest dates varied according to the location of each area, but most of the farms carried sowing in the first 10-day of October and harvest in the second 10-day of February. The ME value showed an underestimation of 310 kg ha −1 and the MAPE value of 23%, regardless of the chosen water balance method. Therefore, both methods can be used equally to estimate soybean yield (Table 2). These results are slightly lower than those obtained by Araújo et al. (2011), who found an estimated yield of 3567 kg ha −1 , but for only one region and growing season in the state of Paraná. The average estimated yield was 12% higher than the average yield officially reported in Brazil and 6% higher than the average observed in the state (CONAB, 2018). Most farms presented observed yields of around 3100 to 3700 kg ha −1 , with a coefficient of variation of 21%. An HI value of 0.26 was used as a calibration index in this study. Moreover, the indices obtained in the literature do not allow obtaining statistically satisfactory yield estimates with the agrometeorological models (Araújo et al., 2011). Steduto et al. (2012) recommended that HI values should be calibrated for each region. Furthermore, these authors used a diverse methodology for estimating Yx and found that the HI value for oilseed crops ranged from 0.25 to 0.40. The results obtained using the FAO-56 method corroborated with Steduto et al. (2012) and a pilot project (Richetti et al., 2015) conducted at the regional level. Kemanian et al. (2007) proposed a linear model to estimate HI values for barley (0.40 to 0.60), spring wheat (0.38 to 0.46), and sorghum (0.45 to 0.53).

CONCLUSIONS
This study estimated soybean yield using only rainfall, ETo, average temperature, TAW, sowing and harvest dates, and yield as inputs. The use of low inputs, in which field data are almost unnecessary (sowing and harvest dates and yield from farms), is a quick way to evaluate agricultural production. The method underestimated the observed values, with a mean error of −310 kg ha −1 . Field verifications are necessary, although costly, as there are many displacement difficulties, some producers are closed and mistrusted, and others have no record of previous growing seasons. Soybean yield estimations using HI values of 0.2625 (using FAOWB) and 0.2093 (using TMWB) are recommended. Furthermore, an improvement in the estimates can be obtained using the dual crop coefficient (Allen et al., 2005). Another improvement is the use of an equation/function for HI rather than a single value for all areas, as presented by Kemanian et al. (2007). This study has been implemented in simple software, in which the calculations will be assessed on a refined spatial-temporal scale and daily based.