The use of ALOS/PALSAR data for estimating sugarcane productivity

Some models have been developed using agrometeorological and remote sensing data to estimate agricultural production; however, the use of SAR images is expected to improve their performance. The main objective of this study was to estimate sugarcane production using a multiple linear regression model that considers agronomic data and ALOS/PALSAR images obtained in the 2007/08, 2008/09 and 2009/10 cropping seasons. Model performance was evaluated by the coefficient of determination, t-test, Willmott agreement index (d), random error and standard error. The model explained 79%, 12% and 74% of the variation in observed production in the 2007/08, 2008/09 and 2009/10 cropping seasons, respectively. Model performance for the 2008/09 cropping season was poor because of a long drought in that season. When the three seasons were considered together, the model explained 66% of the variation. The results show that SAR-based yield prediction models can help sugar mill technicians improve such estimates.


INTRODUCTION
Sugarcane (Saccharum spp.) is of great economic importance to Brazil, the world's largest producer of sugar and ethanol and the leader in production technology. The crop occupies approximately 10% of the total cultivated land and 1% of the total land available for agriculture in the country (GOLDEMBERG, 2007), and it has been gaining increasing prominence worldwide because of its high efficiency in biofuel production (AGUIAR et al., 2009; MENDONÇA et al., 2011). Since the National Alcohol Program was implemented in Brazil in the 1970s, sugarcane production has increased from 67,759,180 tons in 1970 to 384,165,158 tons in 2006 (IBGE, 2010). This trend continues owing to the increasing demand for ethanol from the automobile industry (MORAES, 2011). In the state of São Paulo alone, the largest sugarcane producer in Brazil, the planted area increased by 1.88 million hectares from 2003 to 2008 (RUDORFF et al., 2010).
Productivity estimates are important for planning activities in the sugarcane agribusiness sector and operations at the mills. In general, the agricultural productivity of a mill's sugarcane areas is estimated before the start of harvest (March/April) by the technicians who manage the fields; they observe crop development and assign productivity values based on experience and on information acquired in previous years (PICOLI et al., 2009). Some models for estimating sugarcane productivity have been developed using agrometeorological and/or remote sensing data (e.g., SUGAWARA et al., 2007; PICOLI et al., 2009; SIMÕES et al., 2009; FERNANDES et al., 2011). These models use climatic, agronomic and/or remote sensing data as input parameters to estimate agricultural productivity. Spectral models rely on variables such as vegetation indices derived from remote sensing images, which are related to crop vigor and, therefore, to agricultural productivity.
New advances have been made regarding the potential use of satellite synthetic aperture radar (SAR) images in agronomic models to estimate crop productivity. LIN et al. (2009) found a correlation between the leaf area index (LAI) of sugarcane plantations in China and backscatter coefficients (σ°) obtained in the C band by the ENVISAT ASAR sensor. In that study, the authors built two empirical models to estimate LAI from the ratio of σ° in the HV and HH polarizations (σ°HV/σ°HH), obtaining coefficients of determination of 0.93 and 0.88, respectively. BAGHDADI et al. (2009) obtained a coefficient of determination of 0.87 between X-band data from the TerraSAR-X satellite and the NDVI (Normalized Difference Vegetation Index) of sugarcane fields on Reunion Island. These authors also observed that the L-band HH/HV backscatter of sugarcane from the ALOS/PALSAR sensor increased by about 4 dB as NDVI ranged from 0.2 (young sugarcane, plowed soil and harvested sugarcane) to 0.8 (maturing sugarcane). Accordingly, this study aimed to build an agronomic-spectral model to estimate agricultural productivity in sugarcane plots using radar data from the ALOS/PALSAR sensor (L-band, HH polarization) and agronomic data from the 2007/08, 2008/09 and 2009/10 cropping seasons.
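The two quantities mentioned above can be computed as in the minimal sketch below. NDVI uses its standard definition from near-infrared and red reflectance. For the cross-polarization ratio, it assumes both σ° values are already in dB, in which case the linear ratio σ°HV/σ°HH becomes a simple difference. The function names and input values are hypothetical and do not reproduce the models of LIN et al. (2009) or BAGHDADI et al. (2009).

```python
import math


def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    return (nir - red) / (nir + red)


def db_to_linear(db: float) -> float:
    """Convert a backscatter value from dB to linear power units."""
    return 10 ** (db / 10)


def cross_pol_ratio_db(sigma_hv_db: float, sigma_hh_db: float) -> float:
    """sigma0_HV / sigma0_HH expressed in dB.

    Both inputs are 10*log10 of linear power, so the linear ratio
    becomes a difference of the dB values.
    """
    return sigma_hv_db - sigma_hh_db


if __name__ == "__main__":
    hv, hh = -15.0, -9.0  # hypothetical sigma0 values (dB)
    ratio_lin = db_to_linear(hv) / db_to_linear(hh)
    # Both routes give the same ratio in dB
    print(round(cross_pol_ratio_db(hv, hh), 3), round(10 * math.log10(ratio_lin), 3))  # -6.0 -6.0
    print(round(ndvi(0.5, 0.1), 3))  # hypothetical reflectances
```

Working in dB keeps the ratio additive, which is convenient when σ° images have already been calibrated to decibels.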

MATERIALS AND METHODS
The study area covers sugarcane fields in the northeast of São Paulo State, bounded by 20°46' to 21°50' S latitude and 47°16' to 48°14' W longitude. The region has a typical tropical climate, with a rainy period from January to March (historical mean precipitation total of 607 mm); January and February are the warmest months (historical mean temperature of 24.3°C). The period from July to September is drier (historical mean precipitation total of 118.2 mm), and June is the coldest month (historical mean temperature of 18.6°C) (UNESP, 2011). According to SRTM (Shuttle Radar Topography Mission) data interpolated to a 30 m grid by VALERIANO & ROSSETTI (2012), elevation in the area ranges from 500 to 800 m and the terrain varies from flat to gently undulating.

Sugarcane varieties
Three sugarcane varieties predominated in the study area: RB85-5156 (early cycle), harvested early in the season (April-May); RB92-5345 (mid-cycle), harvested mid-season (May-July); and RB86-7515 (late cycle), harvested late in the season (August-December). Variety RB85-5156 has high tillering, mainly in ratoon crops, erect stems that become decumbent in later stages, thin to medium stem diameter, bright-green color, presence of cracks and a large volume of straw (HOFFMANN et al., 2008). It also has a strong ability to sprout from ratoons and high precocity (Table 1). Variety RB92-5345 is characterized by fast development, erect growth habit, difficult straw removal, medium stem diameter, purple-yellowish color when exposed to the sun, and prominent wax and oval buds. It also has high sucrose content, high productivity and high fiber content at the beginning of the harvest. Variety RB86-7515 shows rapid growth, tall stature, upright growth habit, high stem density, purplish-green color that intensifies when exposed to the sun, and easy straw removal. It is drought tolerant and sprouts well from ratoons even under green (unburned) harvesting, and it has high sucrose content, rapid growth and high productivity (HOFFMANN et al., 2008).

Radar image data
The PALSAR images used in this study were acquired on February 19, 2007; February 22, 2008; and February 24, 2009, in Fine Beam Single polarization (FBS) mode, with HH polarization, 6.25 m spatial resolution, ascending orbit, an incidence angle of 38° and 16-bit radiometric resolution. The images were pre-processed to correct radiometric and geometric effects. Radiometric correction consisted of converting the digital amplitude values of the images to σ° in decibels (dB) (SHIMADA et al., 2009) (Eq. 1). According to ROSENQVIST et al. (2007), this conversion enables multi-temporal analysis of radar images of a given site. Geometric correction of the PALSAR images was based on an orthorectified Landsat ETM+ image (GeoCover) of March 23, 2001, available on the website of the University of Maryland, United States (http://glcfapp.glcf.umd.edu:8080/esdi/index.jsp), using ground control points; the images were projected to the UTM (Universal Transverse Mercator) system, WGS84 datum, zone 23S (GRANT & DYKSTRA, 2004).
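The radiometric conversion can be sketched as below, assuming the commonly used PALSAR form σ° = 10·log10(DN²) + CF, where DN is the amplitude digital number. The value CF = -83.0 dB is an assumption (it is the figure often quoted for PALSAR level 1.5 products); the value for a specific product should be taken from AUIG (2009). The function name and the sample window are hypothetical.

```python
import numpy as np

# ASSUMPTION: -83.0 dB is the calibration factor commonly quoted for
# PALSAR level 1.5 amplitude products; use the value given in AUIG (2009)
# for the actual product.
CF_DB = -83.0


def dn_to_sigma0_db(dn, cf_db=CF_DB):
    """Convert PALSAR amplitude digital numbers (DN) to sigma0 in dB.

    Implements sigma0 = 10 * log10(DN**2) + CF. Pixels with DN <= 0
    (no data) are returned as NaN instead of producing -inf.
    """
    dn = np.asarray(dn, dtype=np.float64)
    sigma0 = np.full(dn.shape, np.nan)
    valid = dn > 0
    sigma0[valid] = 10.0 * np.log10(dn[valid] ** 2) + cf_db
    return sigma0


if __name__ == "__main__":
    # Hypothetical DN values from a 2 x 2 image window (0 = no data)
    window = np.array([[3000, 5000], [0, 4000]])
    print(np.round(dn_to_sigma0_db(window), 2))
```

Masking non-positive DNs avoids taking the logarithm of zero at image borders or no-data areas.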
Sugarcane backscatter values were analyzed by calculating the mean σ° within each plot, grouped by variety.
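Plot-level averaging by variety can be sketched as follows. The text does not say whether means were taken over dB values or over linear power, so both options are offered; the arithmetic mean of dB values is used by default as an assumption. Each plot is weighted equally when the per-variety mean is formed. The data layout `(plot_id, variety, pixels)` is hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean


def plot_mean_sigma0(pixels_db, linear=False):
    """Mean sigma0 (dB) of the pixels inside one plot.

    linear=False: arithmetic mean of the dB values.
    linear=True: mean of linear power values, converted back to dB.
    """
    if linear:
        return 10 * math.log10(mean(10 ** (v / 10) for v in pixels_db))
    return mean(pixels_db)


def mean_sigma0_by_variety(plots, linear=False):
    """plots: iterable of (plot_id, variety, pixel sigma0 values in dB).

    Returns {variety: mean of the per-plot means}, so each plot has equal
    weight regardless of how many pixels it contains.
    """
    per_variety = defaultdict(list)
    for _plot_id, variety, pixels_db in plots:
        per_variety[variety].append(plot_mean_sigma0(pixels_db, linear))
    return {variety: mean(values) for variety, values in per_variety.items()}


if __name__ == "__main__":
    # Hypothetical pixel values, not data from the study
    plots = [
        ("p1", "RB85-5156", [-10.0, -12.0]),
        ("p2", "RB85-5156", [-11.0, -11.0]),
        ("p3", "RB86-7515", [-9.0]),
    ]
    print(mean_sigma0_by_variety(plots))
```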

Agronomic Data
The model to estimate sugarcane productivity was built by analyzing the following agronomic variables: production environment, cutting stage, sugarcane variety, cutting date and previous-crop productivity. The production environment is defined as a function of physical, hydrological, morphological, chemical and mineralogical conditions; the type of topsoil management/tillage; liming; fertilization; the amounts of vinasse and filter cake applied; straw, in the case of direct seeding; weed and pest control; subsurface soil properties; and, above all, the regional climate (rainfall, temperature, solar radiation and evapotranspiration). The production environment is therefore the sum of the interactions among surface and subsurface features and climate conditions (PRADO, 2005). Environments are ranked from A to E, from best to worst.
The cutting stage indicates how many times a plant has been harvested. Productivity decreases with each cut (CTC, 2004). The extent of this decrease over successive cuts depends on the genetic potential of the variety, since annual regrowth requires considerable energy, and on the soil fertility and other production factors that must be maintained (ROSSETO et al., 2010). The plots studied were between the second and fifth cuts.
The cutting date was included in the model to test the hypothesis that sugarcane plots harvested at the end of the previous season tend to be more productive because they develop during the period of heaviest precipitation, from September to December, and are therefore not penalized by the drought that occurs in the middle of the season (May-August).
The productivity of the previous season carries considerable information about the productivity of the current crop because, as mentioned earlier, sugarcane productivity tends to drop as the cutting stage increases. This information is therefore indicative of current productivity. All agronomic data for the study area were provided by the mill.

Model variables
Four multiple linear regression models were generated to estimate sugarcane productivity per plot: one for each cropping season (2007/08, 2008/09 and 2009/10) and a fourth for the three seasons together. Initially, the model comprised the following variables: backscatter from the February PALSAR image of each season, sugarcane variety, production environment, cutting stage, harvest date and plot productivity in the previous harvest.
The sugarcane variety variable comprised three classes (RB85-5156, RB92-5345 and RB86-7515); the production environment variable comprised four classes, from environment B (there was no environment A in the studied plots) to environment E (the worst); and the cutting stage variable comprised four classes (2nd, 3rd, 4th and 5th cut). These qualitative variables were transformed into dummy variables, and the Best Subsets technique (NETER et al., 1996) was then applied to identify, within the originally specified set, the best subset of predictors for estimating sugarcane productivity. This made it possible to obtain well-fitting estimation models with the smallest possible number of predictors, based on the coefficient of determination.
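Dummy coding and a Best Subsets search can be sketched as below. This is a minimal exhaustive-search illustration, not the exact procedure or software used by the authors: for each subset size it fits every combination of candidate columns by ordinary least squares and keeps the one with the highest R², which requires 2^p - 1 fits for p candidate columns (feasible for the handful of variables here). The helper names and the synthetic data are hypothetical.

```python
import itertools

import numpy as np


def dummies(values, levels):
    """0/1 indicator columns for the given levels of a qualitative variable.

    Pass only the levels to keep; any level left out acts as the reference
    class (this avoids perfect collinearity with the intercept).
    """
    values = np.asarray(values)
    return np.column_stack([(values == level).astype(float) for level in levels])


def r_squared(X, y):
    """Coefficient of determination of an OLS fit of y on X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def best_subsets(X, y, names):
    """For each subset size k, keep the predictor subset with the highest R^2."""
    p = X.shape[1]
    best = {}
    for k in range(1, p + 1):
        for cols in itertools.combinations(range(p), k):
            r2 = r_squared(X[:, list(cols)], y)
            if k not in best or r2 > best[k][1]:
                best[k] = ([names[c] for c in cols], r2)
    return best


if __name__ == "__main__":
    # Synthetic example (not the study data): y depends on x0 and x1 only.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 3))
    y = 5 + 2 * X[:, 0] + 3 * X[:, 1] + 0.1 * rng.normal(size=50)
    for k, (subset, r2) in best_subsets(X, y, ["x0", "x1", "x2"]).items():
        print(k, subset, round(r2, 3))
```

Comparing the best R² at each subset size shows where adding predictors stops paying off, which is how a small model can be chosen.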
Residual analysis was performed to check the homoscedasticity, normality and independence of the residuals. The presence of multicollinearity, that is, linear dependence among the predictors, was also checked. Outliers were identified from residual plots.
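The text does not name the diagnostics used, so the sketch below shows two common choices as assumptions: the variance inflation factor (VIF) for multicollinearity, and the Durbin-Watson statistic for first-order autocorrelation of residuals (values near 2 suggest independence). Normality and homoscedasticity checks (e.g., residual plots or a Shapiro-Wilk test) are not reproduced here.

```python
import numpy as np


def vif(X):
    """Variance inflation factor for each predictor column of X (no intercept column).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j
    on all other columns plus an intercept. Large values (often > 10)
    flag multicollinearity.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = []
    for j in range(p):
        target = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ beta
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)


def durbin_watson(resid):
    """Durbin-Watson statistic of a residual series (ordered, e.g., by plot)."""
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)


if __name__ == "__main__":
    # Synthetic predictors: x2 is almost a copy of x0
    rng = np.random.default_rng(1)
    x0, x1 = rng.normal(size=100), rng.normal(size=100)
    x2 = x0 + 0.01 * rng.normal(size=100)
    print(np.round(vif(np.column_stack([x0, x1, x2])), 1))
```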
After the analyses mentioned above, agricultural productivity was calculated at the plot scale. The models were evaluated using the σ values and approximately 20% of all plots, randomly selected and not used to build the productivity estimation models. Productivity was estimated for these selected plots with the models, and a simple linear regression was then fitted between estimated and actual productivity. The models were also evaluated with the modified Willmott index d, which measures the agreement between observed and estimated data (WILLMOTT et al., 1985). In addition to d, the mean squared error (MSE), random error (RE) and standard error (SE) were computed (ALLEN, 1986). The coefficient of determination (R²) indicated model accuracy, that is, the proportion of the variation in the dependent variable explained by the independent variables, and the p-value indicated whether the model was significant at the 5% level. The average productivity estimated by the models (APE) was calculated after the models were tested on the 20% validation plots, i.e., by estimating productivity for each plot and taking the arithmetic mean. The actual average productivity (AAP) was also calculated from field records of total production and harvested plot area.
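The validation step can be sketched as below: a random hold-out of about 20% of plots, the Willmott index of agreement, MSE, and the R² of the simple regression between estimated and observed productivity. The source does not give the formula it used for d; the function supports both the squared form (power=2) and the absolute-value "modified" form (power=1) of WILLMOTT et al. (1985), and which one applies here is an assumption. RE and SE as defined by ALLEN (1986) are not reproduced. All names are hypothetical.

```python
import numpy as np


def holdout_split(n_plots, frac=0.2, seed=0):
    """Randomly assign about `frac` of the plot indices to validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_plots)
    n_val = int(round(frac * n_plots))
    return idx[n_val:], idx[:n_val]  # (calibration indices, validation indices)


def willmott_d(pred, obs, power=2):
    """Willmott index of agreement (1 = perfect agreement).

    power=2: squared-error form; power=1: absolute-value (modified) form.
    """
    pred, obs = np.asarray(pred, float), np.asarray(obs, float)
    o_bar = obs.mean()
    num = np.sum(np.abs(pred - obs) ** power)
    den = np.sum((np.abs(pred - o_bar) + np.abs(obs - o_bar)) ** power)
    return 1.0 - num / den


def mse(pred, obs):
    """Mean squared error between estimated and observed productivity."""
    pred, obs = np.asarray(pred, float), np.asarray(obs, float)
    return float(np.mean((pred - obs) ** 2))


def r2_simple(pred, obs):
    """R^2 of the simple linear regression between estimated and observed
    values (equal to the squared Pearson correlation)."""
    return float(np.corrcoef(pred, obs)[0, 1] ** 2)
```

Fixing the random seed makes the hold-out set reproducible, so the same validation plots can be reused when comparing models.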
Finally, a t-test was used to compare the productivity estimated by the models with actual productivity. This test made it possible to analyze the differences between the mean productivity estimated by the models and the mean actual productivity.
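The text does not state which form of t-test was used. Because estimated and actual values refer to the same plots, the sketch below assumes a paired t-test and computes the statistic and its degrees of freedom with the standard library; |t| is then compared with the two-tailed critical value of Student's t at the 5% level. If SciPy is available, `scipy.stats.ttest_rel` returns the same statistic together with a p-value. The function name is hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t(estimated, actual):
    """Paired t statistic for H0: mean(estimated - actual) = 0.

    Returns (t, degrees_of_freedom).
    """
    diffs = [e - a for e, a in zip(estimated, actual)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two plots")
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    return mean(diffs) / (sd / math.sqrt(n)), n - 1


if __name__ == "__main__":
    # Hypothetical productivities (t/ha) for four validation plots
    t, df = paired_t([10, 12, 14, 16], [9, 12, 13, 16])
    print(round(t, 4), df)
```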

RESULTS AND DISCUSSION
The Best Subsets technique showed that the best subset of predictors, within the initially specified set, was obtained with six variables. Beyond that point the coefficient of determination (R²) tends to stabilize, so actual productivity would not be estimated more accurately if all available variables were used in the regression model.
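Table 3 reports the adjusted coefficient of determination. The source does not give its formula; the standard definition, for n plots and p predictors, is shown below. It explains why extra predictors are not worthwhile once R² has stabilized: each added predictor increases the factor (n - 1)/(n - p - 1). As a purely hypothetical illustration, with n = 40 and R² = 0.80, R²adj is about 0.764 with p = 6 but about 0.748 with p = 8.

```latex
R^2_{\text{adj}} = 1 - \left(1 - R^2\right)\frac{n - 1}{n - p - 1}
```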
The model suggested by the Best Subsets technique included the following qualitative (dummy) variables: a) sugarcane variety RB92-5345, the most productive of the three varieties, which produced between 6 and 20 t ha⁻¹ more than the other varieties per plot in 2007/08, between 13 and 15 t ha⁻¹ more in 2008/09, and approximately 14 t ha⁻¹ more in 2009/10; b) for the production environment variable, only class B was included in the model, because it is the best environment among the studied plots; and c) the cutting stage variable was always entered as the earliest cut present (e.g., in 2007/08 there was sugarcane in the 2nd and 3rd cuts, and the 2nd-cut variable was included in the model), except in the model for all seasons, in which all cutting stages were included. The quantitative variables were: a) σ° values from the February image of each harvest season, when sugarcane reaches its maximum development (a period that begins in November and ends in April) (CASTRO, 1999); b) cutting date, because later-harvested plots tend to be less penalized, as their growth period coincides with the period of greatest precipitation; and c) previous-crop productivity, which helps to explain the variation in productivity of the current crop.
Table 2 shows that all variables selected are significant at the 10% level in at least one model, although not every variable is significant in every model; such variables were nevertheless retained because they were considered relevant. For example, only the variable RB92-5345 was not significant in the 2009/10 productivity estimation model. Analysis of the coefficients (β) showed that the backscatter variable from the February PALSAR image always contributed positively to estimated productivity, because σ° values ranged from -8 to -13.6 dB and the corresponding β values were always negative. The previous-crop productivity variable contributed least, but it was considered important because it is indicative of the productivity of the current crop.
TABLE 2. Coefficients estimated by regression analysis (coef), t-test (t) and p-values (p).
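The sign argument above can be made concrete with a worked product. The coefficient β_σ = -1.5 t ha⁻¹ dB⁻¹ below is hypothetical and is not a value from Table 2; only the σ° range (-8 to -13.6 dB) comes from the text. A negative coefficient multiplied by a negative σ° gives a positive term throughout the observed range.

```latex
\beta_{\sigma}\,\sigma^{\circ} = (-1.5)(-8) = 12\ \text{t ha}^{-1},
\qquad
\beta_{\sigma}\,\sigma^{\circ} = (-1.5)(-13.6) = 20.4\ \text{t ha}^{-1}
```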
The most accurate productivity estimation model was that for the 2007/08 season, which explained 79.5% of the variation in productivity (Table 3). The t-test also showed that the relationship between the dependent and independent variables is significant. Furthermore, this model had the lowest average variability about the regression line (Figure 1a). When the model was applied to a new data set (20% of the total plots) for evaluation, the result was close to that obtained previously, with a coefficient of determination of 71% and a p-value close to zero for that season. The model for the 2008/09 cropping season could not estimate agricultural productivity with good precision, possibly because of a long drought during that season. Figure 2 shows a water shortage from the beginning of March 2008, immediately after the PALSAR image was acquired, to December 2008, which harmed sugarcane development. The model performed poorly (adjusted R² = 0.12) because it includes no agrometeorological variable that could capture problems in plant development. Thus, the model could not accurately estimate productivity in that cropping season, which was less productive owing to the long drought. However, when average productivity was calculated over all 2008/09 plots, the model assessed it properly.
The actual average productivity and the average productivity estimated by the model for the plots were both 84.8 t ha⁻¹, with standard deviations of 14.6 t ha⁻¹ for the actual data and 13.4 t ha⁻¹ for the estimated data, showing that, on average, the model estimated sugarcane productivity with good precision. Adding an agrometeorological variable to the estimation model would be very important, as shown by FERNANDES et al. (2011), who used ECMWF model data to estimate sugarcane productivity in the state of São Paulo. However, ECMWF data have a much coarser spatial resolution (25 km × 25 km) than that used in this study (12.5 m × 12.5 m); tests were carried out with interpolated ECMWF data, but the results were not satisfactory. Other models, such as those of RIZZI & RUDORFF (2007) and PICOLI et al. (2009), use meteorological data from the ETA model; however, ETA data also have a resolution (15 km × 15 km) incompatible with the model proposed in this study.
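The scale mismatch described above can be quantified by counting how many 12.5 m cells fall inside one meteorological grid cell, using only the resolutions stated in the text. A single ECMWF value would be shared by millions of model cells, and a single ETA value by over a million.

```latex
\left(\frac{25\,000\ \text{m}}{12.5\ \text{m}}\right)^{2} = 2000^{2} = 4\times10^{6},
\qquad
\left(\frac{15\,000\ \text{m}}{12.5\ \text{m}}\right)^{2} = 1200^{2} = 1.44\times10^{6}
```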
The productivity estimation models for the 2009/10 cropping season and for the three cropping seasons together explained 74% and 66% of the variability in actual productivity (Figures 1c and 1d), with standard errors of 8.4 and 6.3 t ha⁻¹, respectively. The model for the three cropping seasons together was tested on a new data set (plots), yielding a coefficient of determination of 66%. This model probably did not perform better because it included data from the 2008/09 cropping season, which, as mentioned above, was atypical.
For the 2009/10 cropping season, there were not enough data to set aside a group of plots for evaluating the estimation model, because the plots of variety RB86-7515 were not harvested in that season.
Values of the modified index of agreement d close to one indicate almost perfect agreement between estimated and actual data. Thus, for the 2007/08 and 2009/10 cropping seasons and for the three seasons together (d > 0.9), the model estimated sugarcane productivity more accurately than for the 2008/09 season (d = 0.65). This may be because the sugarcane suffered from drought during the 2007/08 cropping season, which affected its performance in the 2008/09 cropping season. The t-test comparing the mean productivity estimated by the model with the actual mean productivity revealed no significant difference in any cropping season at the 5% significance level, as can also be seen from the similar estimated and actual average productivity values in Table 3. These results suggest that other factors not considered in this study (e.g., meteorological variables and field variables such as fertilization) also influenced sugarcane productivity. Another hypothesis is that the temporal resolution of the variables is inadequate to represent their effects on productivity, and that the model should be computed fortnightly or even monthly. Most importantly, the model lacks an index that penalizes production when drought occurs, even though drought strongly affects plant development.

CONCLUSIONS
The agricultural productivity of sugarcane plots could be estimated with the developed models, whose variables were backscatter (PALSAR HH), sugarcane variety (RB92-5345), production environment, cutting date, cutting stage and productivity in the previous cropping season. The models were effective, especially for estimating the average productivity of the sugarcane plots, for which the estimates were very close to the actual values. The results indicate the need to include agrometeorological variables to explain variations in sugarcane productivity related to climatic events such as drought.
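$$\sigma^{\circ} = 10\log_{10}\left(DN^{2}\right) + CF \qquad (1)$$
where σ° is the backscatter coefficient (dB), DN is the digital image value (amplitude), and CF corresponds to a correction factor whose values are found in AUIG (2009).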

TABLE 3.
Actual average productivity (AAP) and corresponding standard deviation (s), average productivity estimated by the model (APE) and standard deviation (s), adjusted coefficient of determination (R² adjusted), variability about the regression line (S), index of agreement (d), random error (RE), standard error (SE), mean squared error (MSE) and t-test of the agronomic models for the 2007/08, 2008/09 and 2009/10 crops and for the three crop cycles together.