Validation of white oat yield estimation models using vegetation indices

Remote sensing has practical applications in crop production forecasting. However, remote sensing studies are scarce for crops such as white oats, although such studies could indicate the potential of the technique for this crop. The aim of this study was to evaluate the validation accuracy of white oat biomass and grain yield estimates from spectral models previously calibrated using two vegetation indices (NDVI and IRVI) at three phenological stages. The mean values of NDVI and IRVI were related to white oat grain and biomass yield to obtain regression equations. Accuracy was assessed by the coefficient of determination (R²), root mean square error (RMSE) and mean bias error (MBE). The models were calibrated using data from a field experiment carried out in 2017 and validated with data from the same experiment conducted in 2018. The models had good generalization capacity for estimating white oat yield, especially biomass yield. Models parameterized at more advanced phenological stages showed lower estimation error. Models calibrated with the vegetation index IRVI had lower estimation error than those calibrated with NDVI.

Remote sensing has several practical applications across crops, such as production forecasting, thematic mapping for variable-rate fertilization and irrigation, and planning of the destination of production by farmers. For production estimates, spectral indices obtained by remote sensing can indicate crop production at regional and national scales before harvest, helping to define the price of agricultural products and the destination of production. Currently, in several countries, crop production is forecast only after harvest, a process that is costly and time-consuming (CONAB 2018).
In addition, remote sensing can support recommendations for variable-rate fertilization and irrigation. Vegetation indices accurately represent the physiological state of the crop (Rissini et al. 2015), and yield reflects the interaction of the edaphic conditions to which the crop was subjected (Grohs et al. 2009). Therefore, crop yield can be estimated as a function of vegetation indices and, based on this estimated yield, thematic maps can be generated to define specific management areas. This is because higher-yielding sites have better conditions for crop development and can receive fewer resources, such as water and fertilizer, than lower-yielding areas.
Among the spectral indices used in remote sensing of agricultural areas, vegetation indices are the most widely used. They include NDVI (Normalized Difference Vegetation Index) and IRVI (Inverse Ratio Vegetation Index), calculated from the spectral response at different wavelengths. These indices are widely used to define specific management areas and to estimate crop yield. However, the accuracy with which agronomic attributes are estimated varies with the crop and phenological stage evaluated, and with the vegetation index and model used (Ramírez et al. 2011; Bolton et al. 2013).
Model calibration is an essential step in the estimation of crop yield. Defining the phenological stage and index that give the highest accuracy will influence the producer's decisions regarding both production management and the agricultural management of subsequent crops. However, model validation is even more important because it verifies the generalization capacity of the model. Due to interannual and temporal variations of the indices, models may exhibit wide variations in accuracy, especially at the initial stages of evaluation (Raun et al. 2005). Therefore, accuracy must be assessed for each crop to obtain reliable results.
In this context, remote sensing studies are scarce for crops such as white oats, although such studies could indicate the potential of the technique for this crop and the best phenological stages at which to perform the evaluation. White oat is a crop of high worldwide importance, with a planted area of 9.52 × 10⁶ ha and an average yield of 2.47 Mg·ha⁻¹ (USDA 2018). It is used for grain production for human food and animal feed, for forage and hay production, and for straw production in no-tillage systems (Damian et al. 2017). This study aimed to evaluate the validation accuracy of spectral models for estimating white oat biomass and grain yields, previously calibrated using two vegetation indices (NDVI and IRVI) at three phenological stages.
The experiment was conducted at the School of Agricultural and Veterinarian Sciences of the São Paulo State University (Unesp), Jaboticabal, SP, Brazil (21°14'44"S, 48°17'00"W and altitude of 545 m) from May to August 2018. The regional climate, according to the Köppen classification, is Aw, subtropical, with relatively dry winters and rainy summers (Alvares et al. 2013).
White oat (cv. IAC 7) was sown on May 2, 2018, at a seeding rate of 80 kg·ha⁻¹, using seeds with a 95% germination rate and a spacing of 17 cm between rows. Treatments were arranged in a strip design and consisted of five irrigation levels: 11%, 31%, 60%, 87% and 100% of the water volume evapotranspired by the crop (ETc). The experimental plots were 5 m long and 2.4 m wide. The first 50 cm on each side of the plots were considered as border. Additional information on the definition of treatments and the experimental area can be found in Coelho et al. (2018).
The ground sensor used was the portable GreenSeeker HandHeld™. This active sensor automatically generates two vegetation indices, NDVI and IRVI, from the spectral response measured in the red (650 nm) and near-infrared (770 nm) bands. Data were obtained manually by moving the GreenSeeker 0.50 m above the white oat canopy, covering a sampled area of 5.9 m² per plot. The models whose accuracy was verified were those of Coelho et al. (2018). Those authors calibrated models at four phenological stages of white oat for the same experimental area in the 2017 season, but accuracy was tested here for only three stages because the models were not significant at stage 4 (pseudostem appearance) (Table 1).
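As an illustration of how the two indices relate to the red and near-infrared bands, the following minimal sketch computes both from a pair of reflectance values. NDVI follows its standard definition; IRVI is assumed here to be the red to near-infrared ratio, which is how the index is commonly defined for this type of sensor but is not stated explicitly in the text. The reflectance values and function names are hypothetical.

```python
def ndvi(red: float, nir: float) -> float:
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    return (nir - red) / (nir + red)


def irvi(red: float, nir: float) -> float:
    """Inverse Ratio Vegetation Index, assumed here to be Red / NIR."""
    return red / nir


# Hypothetical canopy reflectances (fractions), not measured values.
red_650 = 0.05  # red band, 650 nm
nir_770 = 0.45  # near-infrared band, 770 nm

ndvi_value = ndvi(red_650, nir_770)
irvi_value = irvi(red_650, nir_770)

print(f"NDVI = {ndvi_value:.3f}")
print(f"IRVI = {irvi_value:.3f}")
```

Under this assumption the two indices carry the same information, since IRVI = (1 - NDVI) / (1 + NDVI); they differ in how they scale with canopy density, which may affect the fit of a regression model.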
NDVI and IRVI readings were taken at three phenological stages of the crop: rubbery (8), appearance of flag leaf sheath (10) and kernel watery ripe (10.5.4), according to the Feekes and Large phenological scale for winter cereals (Large 1954). Each replicate had a mean value of NDVI and IRVI, which was used to estimate white oat grain and biomass yield through the calibrated models (Coelho et al. 2018). White oat grain yield was determined by harvesting a 2 m² area in each plot, with moisture content standardized at 0.14 g·g⁻¹. To obtain the biomass, white oat shoots were harvested within a 0.5 m² area in each plot, and the samples were dried in a forced-air circulation oven at 65 °C for 72 hours and weighed to determine the dry matter. Accuracy in the validation of the models was assessed by the coefficient of determination (R²), root mean square error (RMSE) (Eq. 1) and mean bias error (MBE) (Eq. 2).

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(Yest_i - Yobs_i\right)^{2}} \quad (1)$$

$$\mathrm{MBE} = \frac{1}{N}\sum_{i=1}^{N}\left(Yest_i - Yobs_i\right) \quad (2)$$

where: N = number of data; Yobs_i = observed values; Yest_i = estimated values.
Figure 1 shows the performance of grain yield estimation in the validation of the models generated for each white oat phenological stage with the vegetation indices NDVI and IRVI. Models generated at the most advanced stage of the cycle (10.5.4) showed lower error than those generated at earlier stages (8 and 10). Regardless of stage, all models overestimated grain yield. The lowest error was obtained by the model of stage 10.5.4. Considering the average grain yield of the validation data (2,839 kg·ha⁻¹), the models with NDVI and IRVI overestimated white oat yield by 27%, and there were no major differences in accuracy (R²) or mean errors (RMSE and MBE) between the models.
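The validation metrics used in this study (R², RMSE and MBE) can be sketched as follows. This is a minimal illustration, not the authors' implementation: MBE is taken as estimated minus observed, so that positive values indicate overestimation, and R² is assumed to be the squared Pearson correlation between observed and estimated values. The yield values are hypothetical.

```python
import math


def rmse(obs, est):
    """Root mean square error between observed and estimated values."""
    return math.sqrt(sum((e - o) ** 2 for o, e in zip(obs, est)) / len(obs))


def mbe(obs, est):
    """Mean bias error (estimated minus observed); positive = overestimation."""
    return sum(e - o for o, e in zip(obs, est)) / len(obs)


def r_squared(obs, est):
    """Squared Pearson correlation between observed and estimated values."""
    n = len(obs)
    mean_o, mean_e = sum(obs) / n, sum(est) / n
    s_oe = sum((o - mean_o) * (e - mean_e) for o, e in zip(obs, est))
    s_oo = sum((o - mean_o) ** 2 for o in obs)
    s_ee = sum((e - mean_e) ** 2 for e in est)
    return s_oe ** 2 / (s_oo * s_ee)


# Hypothetical yields (kg/ha): every estimate is 600 kg/ha too high.
observed = [2000.0, 2500.0, 3000.0, 3500.0]
estimated = [2600.0, 3100.0, 3600.0, 4100.0]

print("RMSE:", rmse(observed, estimated))
print("MBE: ", mbe(observed, estimated))
print("R2:  ", r_squared(observed, estimated))
```

The example is deliberately built with a constant offset: R² is perfect while RMSE and MBE are both 600 kg·ha⁻¹. This shows why R² alone cannot reveal a systematic overestimation and why the error metrics are reported alongside it.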
For biomass yield estimation, the models generated at more advanced phenological stages also led to lower errors than those generated at earlier stages (Fig. 2). Regardless of stage, all models overestimated biomass yield. However, the highest overestimation occurred for the models of phenological stage 8, with MBE above 3,500 kg·ha⁻¹, while the mean errors of the models for the other stages were much lower. The lowest mean bias error (MBE) was obtained for the model of stage 10.5.4. Considering the average biomass yield of the validation data (10,919 kg·ha⁻¹), the models with NDVI and IRVI overestimated white oat biomass by 1.36% and 0.90%, respectively. For biomass, the mean bias errors of the models generated for IRVI were lower at stages 10 and 10.5.4 than those generated for NDVI, whereas the accuracy (R²) did not differ significantly.
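To put the relative overestimations in absolute terms, the short sketch below converts each percentage into the mean bias it implies, assuming that the percentages are expressed relative to the average observed yield of the validation data, as the text suggests. The function name is hypothetical, and the results are derived arithmetic, not values reported by the study.

```python
def implied_bias(overestimate_pct: float, mean_observed: float) -> float:
    """Absolute mean bias (kg/ha) implied by a relative overestimation (%)."""
    return overestimate_pct / 100.0 * mean_observed


# Percentages and mean observed yields as quoted in the text.
grain_bias = implied_bias(27.0, 2839.0)           # grain yield, NDVI and IRVI
biomass_bias_ndvi = implied_bias(1.36, 10919.0)   # biomass yield, NDVI
biomass_bias_irvi = implied_bias(0.90, 10919.0)   # biomass yield, IRVI

print(f"Grain yield:    {grain_bias:.1f} kg/ha")
print(f"Biomass (NDVI): {biomass_bias_ndvi:.1f} kg/ha")
print(f"Biomass (IRVI): {biomass_bias_irvi:.1f} kg/ha")
```

Under this assumption, the grain yield bias (roughly 770 kg·ha⁻¹) is several times larger than the biomass bias (roughly 100 to 150 kg·ha⁻¹), even though the average biomass yield is almost four times the average grain yield.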
Although the same management practices as in the previous season (2017) were applied and the readings were taken at the same phenological stages, there was a considerable overestimation of white oat grain yield. This occurs because vegetation indices exhibit interannual variability (Grohs et al. 2011). Hence, the same index values obtained in different years do not always correspond to the same grain yields. Biotic and abiotic factors such as temperature, relative humidity, solar radiation and disease incidence may affect crop yield, leading to different yields for the same index values (Rissini et al. 2015).
Considering that more than 60% of the interannual variability in crop yield can be explained by climate variability (Ray et al. 2015), the 27% overestimation of white oat yield by the models is considered acceptable. In a study on the spatial variability of wheat grain yield using NDVI values for model validation, Grohs et al. (2011) observed intermediate correspondence (48%) between observed and predicted values. According to these authors, grain yield prediction based only on NDVI values is unstable throughout the crop cycle. For biomass yield, these authors observed a correspondence of 81%. The estimation of biomass yield using NDVI is therefore more accurate than the estimation of grain yield, because the latter is highly influenced by climate conditions (Raun et al. 2005).
In most studies on model calibration and validation, the dataset comes from a single agricultural year, with part of the samples used for model calibration and another part for validation (Zeleke et al. 2011; Gomes et al. 2014). This validation method yields higher apparent accuracy because all samples were subjected to the same environmental conditions, so there is no interannual variability of the indices to reduce the accuracy and increase the error of the models. In studies in which validation is carried out using data from other study sites and/or agricultural years, the estimates of the calibrated models vary by 10% to 30%; despite that, the authors still recommend these models for estimating crop growth and yield (Nassif et al. 2012; Bertolin et al. 2017).
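The difference between the two validation strategies can be illustrated with a minimal sketch. A linear model is calibrated on part of one season and then validated either on held-out samples from the same season or on a second season. All values are synthetic and constructed only for illustration: the second season has the same index-to-yield slope but a yield level 600 kg·ha⁻¹ lower, mimicking an interannual shift. The linear form and the function names are assumptions.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    return slope, mean_y - slope * mean_x


def mean_bias(x, y, slope, intercept):
    """Mean of (estimated - observed) for a fitted line."""
    return sum(slope * a + intercept - b for a, b in zip(x, y)) / len(x)


# Synthetic season 1: yield = 1000 + 3000 * NDVI (kg/ha).
ndvi_s1 = [0.50, 0.60, 0.70, 0.80, 0.55, 0.75]
yield_s1 = [1000 + 3000 * v for v in ndvi_s1]

# Synthetic season 2: same slope, but 600 kg/ha lower at every NDVI.
ndvi_s2 = [0.50, 0.60, 0.70, 0.80]
yield_s2 = [400 + 3000 * v for v in ndvi_s2]

# Calibrate on the first four samples of season 1.
slope, intercept = fit_line(ndvi_s1[:4], yield_s1[:4])

# Validate on held-out samples of season 1 and on all of season 2.
same_year_bias = mean_bias(ndvi_s1[4:], yield_s1[4:], slope, intercept)
next_year_bias = mean_bias(ndvi_s2, yield_s2, slope, intercept)

print(f"Same-year hold-out bias: {same_year_bias:.1f} kg/ha")
print(f"Next-year bias:          {next_year_bias:.1f} kg/ha")
```

In this constructed case the same-season hold-out shows essentially no bias, while the second season exposes the full 600 kg·ha⁻¹ shift. Real data would also contain noise, but the contrast shows why validation across years is the more demanding test.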
The present study shows that the biomass yield (BY) of white oats was estimated with greater accuracy than the grain yield (GY). Thus, the definition of specific management areas for white oats will be more accurate if based on BY. In addition, plotting the relative GY (RGY) as a function of the absolute BY (Fig. 3) shows that areas with higher BY have higher grain yield, even if the absolute values do not match between the evaluation years. This is because, as discussed earlier, the interannual variation of GY is greater than that of BY, since climatic variations interfere more with GY. However, when the analysis is performed using the RGY within each year, areas with higher BY have higher GY, since the slopes (angular coefficients) and intercepts (linear coefficients) of the equations generated to estimate RGY as a function of BY were similar for 2017 and 2018 (Fig. 3). Another use indicated by Fig. 3 is the estimation of white oat grain yield from the RGY. From the NDVI values at each phenological stage of white oats, it is possible to estimate the biomass yield and, from this value, the RGY in each area. For BY above 15,000 kg·ha⁻¹, for example, the RGY of a given area will be above 75% (Fig. 3). As the estimation accuracy was lower for GY than for BY, estimating the RGY as a function of BY is more accurate, because of the high correlation between RGY and BY (R² > 0.85).
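The idea behind the relative grain yield can be sketched as follows. The text does not define how RGY was normalized, so the sketch assumes that each plot's grain yield is expressed as a percentage of the highest grain yield observed in the same year. The yield values are hypothetical and chosen so that the two years differ in absolute level but not in relative pattern.

```python
def relative_yield(grain_yields):
    """Grain yield of each plot as a percentage of the year's maximum.

    Assumed normalization; the source does not state the reference value.
    """
    reference = max(grain_yields)
    return [100.0 * g / reference for g in grain_yields]


# Hypothetical grain yields (kg/ha) for three plots ordered by biomass yield.
gy_year_a = [2000.0, 3000.0, 4000.0]
gy_year_b = [1500.0, 2250.0, 3000.0]  # same plots, 25% lower yields

rgy_year_a = relative_yield(gy_year_a)
rgy_year_b = relative_yield(gy_year_b)

print("RGY, year A (%):", rgy_year_a)
print("RGY, year B (%):", rgy_year_b)
```

Under this normalization a uniform interannual shift in grain yield cancels out, so the relation between RGY and BY can remain stable across years even when absolute GY does not.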
The models showed good generalization capacity for estimating white oat yield, especially biomass yield. Models generated at stages closer to white oat flowering showed lower estimation error. Models calibrated with the vegetation index IRVI had lower estimation error for white oat biomass yield than those calibrated with NDVI. Estimating relative grain yield from biomass yield gives greater estimation accuracy.