Modeling of the Rainfall and R-Factor for Tocantins State, Brazil

The state of Tocantins is inserted in the new Brazilian agricultural frontier and has shown enormous potential for expansion of the agricultural lands. However, there is a lack of more elaborate scientific information for better planning and guide agricultural activities, especially regarding the soil and water conservation. Tocantins has a relevant rainfall spatial variability and, consequently, rainfall erosivity. Thus, this work aimed to develop models to estimate the mean monthly and annual rainfall and means rainfall erosivity factor (R-factor) of the Universal Soil Loss Equation (USLE) for the state of Tocantins. For that, 97 historical series of daily rainfall were studied, considering a standard period of 25 years (1985-2009). The fitting of the models adjustment datasets consists of 86 rain-gauges, out of which, 11 were randomly selected and used solely for model validation. Afterward, maps of these variables were generated based on the regression-kriging procedure. The fitted models presented precision statistical that allow characterizing them as “good quality”, with a coefficient of determination higher than 0.68 for rainfall models and 0.65 for the R-factor model, besides acceptable bias. Therefore, the application of the models can be successfully carried out, aiding the agricultural planning in the state of Tocantins.


INTRODUCTION
The state of Tocantins belongs to the northern region of Brazil and is inserted in the Tocantins-Araguaia basin. Its economic activities of the state are focused on cattle breeding, forestry, and grain production, mainly soybean, corn, and rice. The agri-business has considerable potential for expansion since around 50 % of the soils are suitable for agriculture (Seplan, 2016). However, its expansion is hampered by the lack of studies on climate recognition with the necessary spatial and temporal discretization to subsidize agricultural planning.
The rainfall regime in the state of Tocantins is considerably variable in time and space. In the Northwest, along with the Amazon Biome, the rainfall is more uniformly distributed throughout the year. However, in the Cerrado biome, mainly in the southern, the dry season is more severe and usually occurs between May and September (Viola et al., 2014). These differences in the rainfall regime are the limiting factors for recognizing the agricultural suitability of the state. Viola et al. (2014) already mapped R-factor for the state of Tocantins, considering monthly and annual distribution. However, a linear regression model for R-factor prediction was not adjusted, and they developed an R-factor map using simple ordinary kriging. Thus, a more detailed spatial and temporal study of R-factor is indispensable, which can be done by means regression-kriging procedure (Mello et al., 2013). Furthermore, in cited study, neither estimation nor fitting regression models for mean rainfall was developed, which is rather important for crop production planning. Even though the occurrence of rainfall is associated with complex atmospheric mechanisms, there is a highly recognized empirical relationship between its magnitude and location (latitude, longitude, and altitude). The influence of the geographical location and elevation on the rainfall pattern needs to be incorporation in the estimation models, become them more accurate to estimate for locations without data using only these coordinates as inputs (Marquínez et al., 2003;Mello and Silva, 2009;Mello et al., 2013).
Rainfall erosivity (R-factor -long-term average annual rainfall erosivity) can also be estimated using its geographic location since available rainfall erosivity data, e.g., pluviograms, are very limited. In this sense, Meusburger et al. (2012) report that latitude and longitude did not affect significantly the estimation of rainfall erosivity in Switzerland. Nevertheless, Mello et al. (2013) and Mello et al. (2015), while modeling R-factor in Brazil and its regions, observed the importance of the geographic location on the behavior of this factor, since Brazil is a continental country, affected by different climatic types. In addition, altitude is another important variable that needs to be tested in these statistical models (Panagos et al., 2015). Mello et al. (2015) presented four equations to estimate rainfall erosivity for the geographical regions of Brazil. Image processing was performed at a pixel level of 0.04 km 2 (40,000 m 2 ), and the corresponding equation to the North-Center-West region, where Tocantins is located. The results presented by Mello et al. (2015) allow comparing the R-factor of different states within the national scenario. However, in the case of the state of Tocantins, the estimated erosivity values did not provide R-factor estimates with enough details for local planning, which is relevant for local stakeholders, such as farmers, engineers, or agronomists since the authors used much less observed climate information. Likewise, Oliveira et al. (2012) also provided a good overall understanding of the occurrence of larger and smaller values of erosivity throughout the country. The authors mapped rainfall erosivity for Brazil reviewing available erosivity equations, in focalized locales, to interpolate and estimate erosivity values for the entire country. It was also reported that some regions, mainly Northern Brazil, lacked of rainfall erosivity studies. Trindade et al. (2016) studied rainfall erosivity of Brazilian hydrographical basins. They reported that the Tocantins-Araguaia basin presented the highest and lowest values of R-factor and precipitation simultaneously. In this sense, this study suggested the need of a more detailed analysis for Tocantins, since it is a state in which anthropic activities are rapidly changing the land use, mainly due to agricultural expansion, and the use of more precise tools for spatially distribution of rainfall and R-factor can be further explored.
Multivariate models are widely disseminated for the estimation of hydrological and climatological variables due to their wide range of applications, and ease handle. In this context, multivariate geostatistical interpolators have been taken place for the above-mentioned purpose, highlighting regression-kriging (Mello et al., 2015). Regression-kriging considers, at first, to map at pixel-to-pixel the variable based on statistical models using geographical variables as inputs; then, the residuals generated by comparison between predicted and observed data are taken. These values are then mapped by ordinary kriging; finally, this latter map is summed to the previous one, which allows removing the bias from the regression models (Mello et al., 2013).
The hypothesis is that statistical models using geographical variables to generate a regression equation can provide a better estimation of both month rainfall and annual erosive potential at a farm level scale (pixel of 90 m 2 ). Thus, this work aimed: (1) developing statistical models for mean monthly and annual rainfall, and R-factor for Tocantins, by fitting regression models using geographical explanatory variables, such as latitude, longitude, and altitude; and (2) mapping these climatic variables by means regression-kriging procedure, as tools for environmental planning.

Physiographical characteristics of Tocantins and database
Tocantins has an area of 277,298 km 2 and is the 9th largest Brazilian state. The Thornthwaite type-climates are: Mega-thermal (C1w2A'a'), dry sub-humid with large summer water surplus; Mega-thermal (C2wA'a'), sub-humid with moderate winter water deficiency; and Mega-thermal (B1wA'a'), humid with moderate winter water deficiency (Souza et al., 2019).
The relief varies between flat and gently undulated in most of the state. In the middle region of the Araguaia river basin, in the extreme Southwestern area of the state, lies Ilha do Bananal, at an altitude below 200 m. In the Southeast, on the border with the states of Bahia and Piauí, in the Serra Geral and Serra de Tabatinga, and center-state, in the Serra do Lajeado, the altitude is greater than 1,200 m. In figure 1, it is possible to see the SRTM digital elevation model for the state of Tocantins.
The dataset consisted of 97 long-term daily rainfall data obtained from the Hydrology Information System of the National Water Agency (Hidroweb/ANA). We standardized these series considering a standard period of 25 years (from 1985 to 2009) as, after this period, some rain-gauges were shutoff, which impeded to use more recent datasets. The dataset for rainfall and rainfall erosivity modeling consisted of 86 rain-gauges, out of which 11 were randomly selected exclusively to validate the models.
The R-factor for each rain-gauge was estimated by means of R = f (Rc or p) equations (Rc is the Fournier's index and p the mean monthly rainfall). These equations were fitted for locations provided with pluviographic records located in the neighboring area of Tocantins (Table 1).
To define which equation was applied for each one of the rain-gauges within the Tocantins State, we adapted the method developed by Mello et al. (2013). In this procedure, the influence area of each R-factor equation (Main Station - Figure 1) was determined combining the Precipitation Concentration Index (PCI) (Oliver, 1980) with Thiessen Polygon procedure. The PCI allows to understand how rainfall is distributed over time, giving a perspective of rainfall concentration, which is related to rainfall erosivity and Thiessen Polygon gives a geometrical influence area of a given rain-gauge. Thus, we defined the rain-gauges under the same PCI magnitude within a given polygon as characterized by Thiessen procedure. This methodology was also adopted by Viola et al. (2014) and Mello et al. (2015). In figure 1b, the location of rain-gauges in Tocantins and its whereabouts are presented, as applied in the modeling and validation process, as well as the location and area of influence of each main station (rain-gauge with an equation for R-factor).
The independent variables for statistical modeling were the longitude, latitude, and altitude. For the first two, we considered the module of the geographical coordinates in decimal degree. The third one, in meters, was extracted from the digital elevation model, with a 90-meter resolution.

Development of the multivariate statistical models
The development of the multivariate statistical models was based on the fitting multiple linear models and combining independent variables (latitude -LA, longitude -LO, and altitude -A). The variables were tested according to their significance and collinearity and selected by a backward process of multiple regression. This procedure assesses  all variables in a regression, excluding the non-significant ones by Student's t-test, as well as those that present collinearity. Software SAS for Windows was applied for the models fitting. The information on rainfall was totalized into monthly and annual periods. Due to the reduced rainfall amount normally observed in July, August, and September, it was opted to group the entire accumulated rainfall of this quarter into a single statistical model.
For the evaluation of the models, it was applied the adjusted coefficient of determination (R a 2 ), the average absolute error (E), in %, and the tendency of the models (T), in %.
For the final validation of the models, the parameters Error and Tendency were applied to 11 stations which were separated exclusively for this purpose (Figure 1b). In addition to these statistical parameters, scattered estimated points in 1:1 line graphics were designed, with the purpose to visually analyze the precision and accuracy of the estimations.
The tendency of the models (T) allows to verify the average trends of the estimations obtained from the models, with respect to the observed values. Thus, the closer to zero the value of this coefficient, the better the model will represent the modeled phenomenon. Van Liew et al. (2007) present the following classification for tendency: |T| <10 %, very good; 10 % < |T| <25 %, satisfactory and |T| >25 %, the model will produce improper estimations.

Rainfall and R-factor mapping based on regression-kriging
Afterward, rainfall and R-factor were mapped by means regression-kriging (Meusburger et al., 2012;Mello et al., 2013;Oliver and Webster, 2014;Borrelli et al., 2016;Meddi et al., 2016), with the aid of the Geographical Information System ArcGIS. In this procedure, the altitude extracted from the SRTM elevation digital model was used, and rainfall and R-factor maps were made under format raster, with 90-meter spatial resolution, using the models. After that, the residue for each location was calculated (observed -estimated). To map the residues, ordinary kriging was used adjusting the exponential semivariogram model (Mello et al., 2007;Silva et al., 2010;Aquino et al., 2012;Mello et al., 2013;Viola et al., 2014). Thus, the initial maps generated by statistical models were added to the respective residue maps to obtain the final maps.

Models for rainfall and R-factor
Tables 2 and 3 display the parameters of the statistical models for the rainfall variables in the state of Tocantins. Due to the difficulty to fit models with good performance for the months of July, August, and September -littler rainfall amount observed during the dry season-, it was opted to group them in a quarter period. It is possible to observe that based on the values of the adjusted coefficient of determination (R a 2 ), and good quality of the precision statistics, the fitted models presented good performance, especially for May (R a 2 = 0.920), July-August-September (0.89), April (0.877), and annual (0.823).
The models for both January and November presented inferior R adjust 2 , nevertheless acceptable performance.
The adjusted models for some months had non-significant parameters under Student's t-test, which can be justified for the improvement of the final quality of the model. Such procedure was also adopted by Mello and Silva (2009) and Mello et al. (2013).
A statistical model to R-factor estimate is presented in table 4. A coefficient of determination (R 2 ) of 0.652 was computed. Figure 2 presents the dispersion of the estimated values around the 1:1 line, average error (E), and tendency (T) obtained from the validation process. By analyzing the tendency of the models, it is possible to observe that, except for June, which had an overestimated tendency of 26.5 % (model for June), all the models presented great performance tendency (varying from -0.5 to -7.1 %) (Van Liew et al., 2007).
The values for average error (E) varied between 11.2 and 46.3 %, being the highest value for the model in June, since rainfall accumulation during this month was very low (Viola et al., 2014). This resulted in inversely proportional behavior with high estimation errors (Figure 2).
Figure 2 also shows the dispersion of the estimates generated by the model fitted to the R-factor around 1:1 line. Likewise, the dispersion observed for rainfall, the R-factor presented good distribution around the line, as its tendency is low (-6.7 %). However, for erosivity values greater than 10,000 MJ mm ha -1 h -1 yr -1 -very strong erosivity (Foster et al., 1981) -, the proposed model underestimated the R-factor values ( Figure 2). Consequently, locales with estimated high rainfall erosive potential may even present more severe water erosion problems than those estimated by the model. The error was 20.4 %, considering that the State of Tocantins has widely R-factor values, ranging from 6,500 to 14,000 MJ mm ha -1 h -1 yr -1 (Viola et al., 2014). Thus, we highlight the good performance of the proposed model for R-factor estimation for the state of Tocantins.

Mapping of average rainfall and R-factor for the state of Tocantins using regression-kriging
The exponential semivariogram model fitted for the residuals from the statistical models are presented in figure 3. It is possible to notice a strong spatial dependence structure as the values of the degree of dependence (GD) are larger than 75 % for most models, and the good adherence of the experimental values to the exponential model. The estimated range reached up to 554.6 km, being that November presenting a lower value (75 km). Thus, it is possible to observe that regression-kriging may produce good results for rainfall and R-factor mapping in the state of Tocantins.
The results of rainfall and R-factor maps, developed based on regression-kriging, are displayed in figures 4 and 5.   Daly et al. (1994) mapped orographic precipitation in a complex terrain in the United States of America as a function of geographic coordinates and elevation. Results of precipitation-elevation regression models showed R 2 values ranging from 0.55 to 0.70, producing usable estimates of precipitation. For the state of Minas Gerais, Mello and Silva (2009) fitted linear statistical models for the rainfall as affected by geographical coordinates and elevation, obtaining R 2 varying between 0.510 to 0.802, and reported that the models have good performance. Both studies concluded that, because of the physical-environmental elements being highly and naturally variable, the models may be applied. Meddi et al. (2016) incorporated longitude and altitude into the modeling of rainfall erosivity in Argelia, reporting the great importance of longitude as an explanatory variable for rainfall erosivity in that place. In the present study, the coefficients of determination were comparable to the above-mentioned studies and can be rated as good quality; especially considering that the state of Tocantins is located within a climatic transition region between the biomes of Cerrado and Amazon. Meusburger et al. (2012) developed a multiple regression model to estimate the average annual rainfall erosivity (R-factor) in Switzerland, obtaining a determination coefficient of 0.76, using precipitation data as explanatory variables, in addition to geographic coordinates. Mello et al. (2013) developed statistical models to estimate R-factor for Brazilian regions, obtaining R 2 of 0.678, 0.749, 0.681, and 0.658 for the South, Southeast, North-Center-West, and Northeast regions, respectively. The model fitted in the present study, the coefficient of determination was close to the ones obtained by Mello et al. (2013), being considered statistically acceptable. Regarding the lower R 2 obtained for dry seasons, Marquínez et al. (2003) also observed that issue and, to overcome that, the authors adjusted a model with a greater number of the input variables, reducing, therefore, the degree of freedom, and indeed some the collinearity. However, this situation demonstrates that the modeling tends to perform better if other more complex variables are incorporated.

Models for rainfall and R-factor
Conversely, when the driest months (July, August, and September) were placed in the same group, the model generated an average error similar to those found for the wet season months. Thus, it is important to highlight the good fitting of the models, since the state of Tocantins comprises the Cerrado and Amazon biomes, as well as their transition region, being close to the Northeastern region of Brazil, which influences the rainfall pattern.
By applying the R-factor model for the North-Center-West region proposed by Mello et al. (2013), using our dataset, we computed an average error of 38.6 % and an overestimated tendency of 18.9 %, despite of their model having a better statistical indicators. Hence, it is possible to say that the proposed model is consistent with the regional   one presented by Mello et al. (2013), nevertheless estimations were more accurate when our model was used.
Mapping of average rainfall and R-factor for the state of Tocantins using regression-kriging Viola et al. (2010) also developed maps of the statistical models produced by Mello and Silva (2009) for the state of Minas Gerais. In that study, the authors reported overestimated tendencies, except for the dry months, for which an underestimated tendency was observed.
The annual rainfall map ( Figure 5e) together with validation of the statistical model ( Figure 2) suggested that estimation of rainfall values may be based solely on geographical coordinates and altitude. It is also possible to highlight that monthly models were developed and may become useful for the planning of agricultural activities in the region. This is significant for local stakeholders since scarcity of reliable hydroclimatic data is a consequence of a limited rain-gauge.  Figure 5. Maps generated by regression-kriging for rainfall estimated data for the quarter July-September (a), October (b), November (c), December (d), annual (e), and R-factor (f).
Accordingly, the R-factor map (Figure 5f) had the same behavior as the map of rainfall erosivity in Tocantins, which was presented by Viola et al. (2014) based on ordinary kriging. However, in Bico do Papagaio, located in the far northern area of the state, the estimated values in the present study were greater than the ones reported by Viola et al. (2014). This difference was most likely due to the equation used for R-factor estimates by Viola et al. (2014), which was fitted to Belém, in Amazon biome. This, however, does not correspond to this region of the state of Tocantins, as it may be observed with the methodology applied in our study, i.e. geographical or climatic influence (rainfall concentration patterns) are more similar to those of Conceição do Araguaia (Figure 1).

CONCLUSIONS
The statistical models fitted for the estimation of rainfall and R-factor had adjustments that corresponded to good quality predictors. Thus, it is possible to use them for the estimation of average annual rainfall and R-factor for the state of Tocantins.
In general, according to the tendency analysis performed, the models were rated as very good, except for June.
The maps generated by regression-kriging enables to evaluate the spatial distribution of rainfall and erosivity in the state of Tocantins. The highest values of annual precipitation and erosivity were observed in the Northern region of the State.