Mapping wood volume in seasonally dry vegetation of Caatinga in Bahia State, Brazil

ABSTRACT The Caatinga biome in Brazil comprises the largest and most continuous expanse of the seasonally dry tropical forest (SDTF) worldwide; nevertheless, it is among the most threatened and least studied, despite its ecological and biogeographical importance. The spatial distribution of volumetric wood stocks in the Caatinga and the relationship with environmental factors remain unknown. Therefore, this study intends to quantify and analyze the spatial distribution of wood volume as a function of environmental variables in Caatinga vegetation in Bahia State, Brazil. Volumetric estimates were obtained at the plot and fragment level. The multiple linear regression techniques were adopted, using environmental variables in the area as predictors. Spatial modeling was performed using the geostatistical kriging approach with the model residuals. The model developed presented a reasonable fit for the volume m3 ha with r2 of 0.54 and Root Mean Square Error (RMSE) of 10.9 m3 ha–1. The kriging of ordinary residuals suggested low error estimates in unsampled locations and balance in the under and overestimates of the model. The regression kriging approach provided greater detailing of the global wood volume stock map, yielding volume estimates that ranged from 0.01 to 109 m3 ha–1. Elevation, mean annual temperature, and precipitation of the driest month are strong environmental predictors for volume estimation. This information is necessary to development action plans for sustainable management and use of the Caatinga SDTF in Bahia State, Brazil.


Introduction
Seasonally dry tropical forests (SDTF) harbor a significant proportion of global plant biodiversity (Silva et al., 2017;Banda-R et al., 2016;Lima et al., 2021), contributing extensively to biogeochemical cycles and providing numerous ecosystem services, including water quality control, carbon storage, and carbon sequestration (Althoff et al., 2016;Siyum, 2020).The knowledge of ecological patterns and processes in SDTF, including for the Caatinga region in northeastern Brazil, has grown gradually in recent years (Barros et al., 2021); nevertheless, these ecosystems remain chronically understudied.Despite their ecological and biogeographical importance (Queiroz et al., 2017;Pennington et al., 2006), SDTF are among the most threatened forest ecosystems and may be thus at greater risk than humid forests (Apgaua et al., 2015;Lima et al., 2017;Sunderland et al., 2015).The current spatial understanding of the SDTF structure has been largely generated using remote sensing approaches that provide spatially explicit values related to forest area, canopy cover, topography, soil, and climate variables.This information is widely used in statistical and geostatistical models to generate predictive maps of forest attributes (Crowther et al., 2015;Oliveira et al., 2019).These maps have improved our understanding of the morphoclimatic characteristics of the Caatinga biome; however, they do not provide population estimates, densities, or wood volume stocks, which are essential factors for sustainable forest management.An accurate analysis of the spatial distribution of timber stock of Caatinga dry vegetation from a local perspective is necessary to guide action plans aimed at environmental sustainability, especially in Bahia State, Brazil.Geostatistical modeling is an alternative mainly because it can predict a variable of interest in unsampled locations and map its spatial distribution using interpolated information from a sampled area (Seidel and Oliveira, 2016).As the volumetric stock of seasonally dry vegetation across the Caatinga is correlated with environmental covariates (Silveira et al., 2019), this study aims to quantify and spatialize volumetric wood stock for Caatinga vegetation in Bahia State to support more sustainable forest management.We analyzed the potential influence of key environmental variables on the spatial behavior of ecosystem wood volume and created predictive maps of the potential volume distribution for the entire Caatinga area of Bahia State.

Materials and Methods
The study site comprises 54 % of the state of Bahia, Brazil.We used forest inventory data conducted from 868 plots distributed in 40 fragments of Caatinga dry vegetation (42°5' W; 13°7' S, altitude 545 m) (Figure 1).The average altitude is around 200 m and the highest point is 2,033 m above sea level, located in Serra do Barbado, Bahia, Brazil.The study region comprises a mosaic of thorny shrubs and mostly SDTF biome (Queiroz et al., 2017).There is a wide variation of typologies and landscapes on a local scale, which mainly reflect variations in water availability (IBGE, 2019).
The predominant climate is BSwh, BSh, and BWh (Köppen classification), characterized by arid climate with a rainy season in the summer and well-defined dry periods in the winter, with precipitation below 500 mm and average temperature above 18 °C (Alvares et al., 2013).The soils that occur in more significant proportions are Latosols, Neosols, Ultisols, Plintosols, and Luvisols (Santos, 2018).

Data analysis
The dendrometric data were obtained from forest inventories distributed in 40 fragments.The inventories were carried out in plots assembled from three different sources, Inema, RMFC, and ForestPlots.net(ForestPlots. net et al., 2021), representing research and local consultancies covering the broad aspect of the caatinga vegetation in Bahia State (Table 1).
Plot sizes varied according to the objectives of each acquisition source (Lopez-Gonzalez et al., 2009).Most plots had a standard size of 20 × 20 m (400 m 2 ), while others were 10 × 20 m (200 m 2 ), 20 × 30 m (600 m 2 ), 15 × 50 (7500 m 2 ), 20 × 50 (1000 m 2 ), and 100 × 100 m (1 ha).The inclusion diameter was not fixed and the initial CBH (circumference measured at breast height) value showed variation within a range of 4 to 15.7 cm, which was measured on the circumference at 1.30 m above ground level.For better research development, the inclusion diameter CBH ≥ 10 cm was standardized and the extreme outlier values were removed.According to the typologies and fragments in the sampled points, local equations already developed and consolidated were applied to the vegetation divided into arboreal and shrubby Caatinga, which presented R 2 of 0.98, for arboreal caatinga: V = −9.53089× dbh 2.00951 × h 0.84063 and for shrubby caatinga: V = −9.33235× dbh 2.01714 × h 0.66644 .A local equation developed by Pereira et al. (2021) was used for the data referring to the Contendas do Sincorá National Forest, which used the Linearized Spurr model specifically for the locality in question with R 2 of 0.86, described according to the equation: LnV = −9.935921+ 1.026668 Ln(dbh 2 × h).Where, V = volume (m 3 ); dbh = diameter breast height of 1.30 cm from the ground (cm); h = height (m).
The sites were verified for tree and shrub formations in Caatinga and Caatinga/Cerrado transition areas.Due to the existing volumetric variability at the plot level, the data were grouped at the fragment level and converted to one hectare equivalent (m 3 ha -1 ) to obtain more accurate estimates on a broader scale.After estimating the volumetric stock, the data were associated to a set of predictor variables at each location.We selected 47 geospatial covariates grouped into different subsets: topoclimatic, land cover, and ecosystem heterogeneity.The covariates were obtained using satellite remote sensing and globally distributed terrestrial weather stations in raster format.Each raster layer is a spatially explicit grid image, where each pixel represents the value of the covariate described.
Then, the individual volume of the trees was obtained using local topoclimatic variables composed of elevation (relief of the area) and the set of 19 bioclimatic variables (Table 2) from the WorldClim 2.1 database (https://www.worldclim.org/data/worldclim21.html),obtained with a resolution of 1 km 2 (Fick and Hijmans, 2017).The geographic and topographic variables were extracted from the map provided by NASA-SRTM (Shuttle Radar Topography Mission) with a spatial resolution of 100 m, referring to the area altitude (elevation).The latitude and longitude covariates were obtained from databases collected in the field.The covariates of land cover and spatial heterogeneity of the global habitat were obtained from the EarthEnv platform (http://www.earthenv.org//).The information refers to global products of land cover and ecosystem heterogeneity from remote sensing at 30 arcseconds (1 km) resolution.Itprovides consensus information on 12 land cover classes at 1 km resolution (available in full and reduced) and texture characteristics of Enhanced Vegetation Index (EVI) acquired by the Moderate Resolution Imaging Spectroradiometer (MODIS) (Table 3) (Tuanmu and Jetz, 2015).
All variables that fit the model, except for latitude and longitude, were processed in the R software (R Core Team, 2021).To obtain the values corresponding to each study site (at the plot and/or fragment level), the geospatial covariates were initially obtained for the state of Bahia and later, the extraction of environmental values was performed using the coordinates of each site.For each point sampled using the raster:extract function of the R raster package (Rocchini et al., 2021) in each raster file of the geospatial covariates.This information was then stored and saved in a final matrix and used as predictor variables in the model.
The multiple linear regression model was used at first to create a spatial forecast of the volumetric stock: Where, Y = volume (m 3 ha -1 ); α: model intercept; β: regression coefficients, X: environmental variables; ε: random error.No marked independence was assumed within the full set of biophysical variables extracted from the raster layers compiled from the GIS (Geographic Information System) due to the inherently interactive nature of climate, topography, land cover, and environmental heterogeneity in the study site.However, the variance inflation value (VIF) was calculated for each covariate to explain any collinearity between the geospatial covariates and those with a VIF value ≥ 5 were excluded.Then, the step-by-step variable selection procedure was applied using the stepAIC function from R's car package (Weisberg and Fox, 2010).This function builds all possible candidate submodels nested in the global model, identifies the most plausible subset of covariates, and ranks them according to the corrected values of the Akaike Information Criterion (AICc) and the likelihood weights of the AIC (AICcw).The final equation was verified for the assumptions of independence, normality, and homoscedasticity of the residuals via the graphical analysis and by the Durbin-Watson, Shapiro-Wilk, and Breusch-Pagan tests, respectively.The coefficient of determination (R 2 ) and standard error measurements of the estimate (RMSE) of the final equation was analyzed to ensure that the model was adequate.This modeling aimed to establish an explanatory model of the volumetric stock based on geospatial covariates that potentially govern the spatial distribution of volume in the Caatinga vegetation in Bahia State.
The spherical, exponential, and Gaussian geostatistical theoretical models were adjusted to the experimental semivariograms from the residuals of the final equation.In this procedure, the methods of Maximum Likelihood, Ordinary Least Squares, and Weighted Least Squares were adopted, considering the stationarity assumption of the intrinsic hypothesis (Table 4) (Goovaerts and Goovaerts, 1997).
Cross-validation was used as a criterion, calculating the reduced mean error (EMR) and the standard deviation of the reduced mean error (SER), and overestimation of the model to evaluate the performance and select the semivariogram model that best fits the data set (Morais et al., 2017).In the adjustment of the theoretical models of the experimental semivariograms, the parameters nugget effect (τ2), threshold (σ2), and reach (ϕ) were determined.For the analysis of the degree of Spatial Dependence (DE) the relation τ2 / (τ2 σ2) and the intervals proposed by Cambardella et al. (1994), which consider it a strong spatial dependence (DE < 25 %), moderate (25 % < ED < 75 %), and weak (DE > 75 %).From the selected spatial model, the regression residuals were interpolated by ordinary kriging, obtaining the residual map.
To apply the regression kriging, continuous georeferenced cells with dimensions of 100 × 100 m were created for the environmental covariates selected in the resulting volumetric equation with the aid of the ArcMap 10.5 program (ESRI, 2019).Each cell contained the information of the predictor variables.However, this map presents trends in the estimates and the residual map of ordinary kriging was added through map algebra for its correction, obtaining the final, unbiased map of the volume spatial distribution for the entire Caatinga area in the state of Bahia.
For the predictive validation of ordinary kriging, Standardized Mean (MS), Root Mean Standardized Error Square (RMSS), Mean Standard Error (ASE), Square Root of Mean Error (RMS) were calculated from crossvalidation, which provided accuracy of the estimates, based on the data set (Barni et al., 2016;Silveira et al., 2019).All computations and analyses were performed with the support of R's geoR package (Ribeiro Junior et al., 2001) and the final maps were generated in ArcGIS version 10.5 (ESRI, 2019).

Results and Discussion
The average volume of wood estimated by forest inventory was 25.82 m 3 ha -1 (± 19.51).The minimum value was 0.16 m 3 ha -1 , while the maximum was 173.64 m 3 ha -1 , resulting in high variability for the vegetation studied (CV = 75.6 %) (Figure 2).This high variability is expected for the Caatinga dry vegetation because the ecophysiological conditions that promote tree trunk growth and shape are functionally related to specific environmental factors, such as water stress, temperature, soil type, and topography (Chaturvedi et al., 2013;Dexter et al., 2018;García-Cervigón et al., 2020;Mattos et al., 2015).Therefore, the combination of these factors can result in locations with high volume concentration while other areas may have a lower volumetric stock distribution (Reis et al., 2020;Silveira et al., 2019).
The multiple regression analysis confirms these hypotheses by suggesting that all parameters related to the selected environmental variables are significant (p < 0.05) (Table 5) and the residuals show homogeneity of variance (Figure 3A) and are normally distributed (Shapiro-Wilk p < 0.01; Figure 3B).The total variation of volumetric data explained by regression (R 2 ) was 54 %, while the root mean square error (RMSE) was 10.9 m 3 ha -1 .These values are considered acceptable due to the significant variation found for wood volume in the region and agree with other studies (Almeida et al., 2014;Reis et al., 2020;Silveira et al., 2019).In similar ecosystems, these studies found variations in the R 2 value of 53 %, 55 %, and 60 %, respectively, and slightly larger error measurements.
The modeling captures this significant variation due to the incorporation of environmental variables into the equation.Therefore, vegetation volume stocks are possibly not explained only by the dendrometric data of diameter and height but also by the inherently sitespecific environmental conditions, such as temperature, precipitation, and altitude (Balvanera and Aguirre, 2006;Song et al., 2021;Tilk et al., 2017) or even other ecological factors intrinsically linked to vegetation types, such as different successional stages, structures, diversity, and anthropization levels (Segura et al., 2002;Álvarez-Yépiz et al., 2008;Lima et al., 2017;Rito et al., 2017).
In general, the average annual temperature (BIO1), precipitation of the driest month (BIO14), and the relief of the area are prominent factors for the variation in wood volume in the Caatinga in Bahia State (Figure 4).In tropical dry forests, these specific variables directly influence the environmental stress of the area, controlling water availability in the soil and in the plant (Ocón et al.,  Junior et al., 2019).The removal of native vegetation for crops is a procedure that favors the advance of mechanized agriculture, especially in areas of dense vegetation and predominantly flat relief (Silveira et al., 2019).These trends in land use, with the increase in cultivated areas and reduction of dense caatinga vegetation with greater volume, were also observed by Blackie et al. (2014) for the northeastern region in Brazil.On the other hand, the Simpson diversity index is the only positive predictor variable, suggesting that the greater the richness and diversity of species, the greater the volumetric stock found in the region, as expected for most mature, dry tropical forests (Banda-R et al., 2016;Powers et al., 2009).
The residual semivariogram analysis shows a nugget effect of 14.7, a sill effect of 117.6, and a distance 2021), and reflecting in growth patterns, trunk shape, and consequently in wood volume (Brandeis et al., 2005;Derroire et al., 2016).
Volumetric stocks are also significantly influenced by the indices or occurrences of the area with cultivated and managed vegetation, mixed trees, and shrubs.For example, the negative values of the regression coefficients suggest that the volumetric stocks tend to decrease with each increase in the occurrence of these areas.This pattern is expected because land use characteristics generally correlate negatively with wood stocks in any natural forest (Álvarez-Yépiz et al., 2008;Burgos and Maass, 2004;Colón and Lugo, 2006).Furthermore, changes in ecological patterns can alter the vegetation structure and consequently the volumetric distribution at different spatial scales (Segura et al., 2002;Monteiro   of 14.3453 km.These parameters were adjusted using the distance of 44.3198 km and indicate the exponential model with greater accuracy with a reduced mean error of 0.004 m 3 ha -1 , overestimation of -24.75 m 3 ha -1 , and standard deviation of 2.24 m 3 ha -1 (Figure 5).The nugget effect value indicates that the semivariogram model explained the residual variation (Dixon and Uddameri, 2016).The degree of spatial dependence (GD) of the study was 11 %, indicating a robust spatial dependence of the residues, according to Cambardella et al. (1994), classified as strong (< 25 %), moderate (25 % to 75 %), poor (> 75 %).This result indicates that the spatial model explained approximately 46 % (1 -R 2 ) of the wood volume variation.The semivariogram corresponds to an intrinsic characteristic of regionalization.The intrinsic hypothesis is a fundamental concept in the theory of regionalized variables (Calder and Cressie, 2009), implying that the intrinsic function describes the spatial behavior of the regionalized variable within the space.Spatial variation is stationary if the variogram is the same in any sample (Sen, 1989).From a practical perspective, areas with an extensive range of variation in the estimated values tend to interfere directly with the performance of the semivariogram, as observed in this study.Another relevant point is that the shape and size of the sample configuration can affect the theoretical estimators and the description of the spatial dependence structure and the spatial estimates of unmeasured values (Kestring et al., 2015).Therefore, one of the main limitations of using geostatistical techniques is the number of samples needed to form the ideal sampling grid, which spatially represents the distribution of the variable under study (Araújo et al., 2019).
The spatial distribution map of residuals (Figure 6) shows a balanced distribution between under and overestimates, suggesting model adequacy and reliable estimates for observations in unsampled locations (Silveira et al., 2019).
These results are like those found by Vasques et al. (2016), Silveira et al. (2019), andLi et al. (2020).The authors applied the regression kriging for soil carbon and water mapping and for volume and biomass estimation in tropical dry forests.They concluded that regression kriging is more expressive when residues showed symmetrical spatial distribution.In addition, the low values of the estimation errors obtained corroborate these results (Standardized Mean = 0.0257; Square Root of the Mean Standard Error = 0.87; Mean Standard Error = 11.5; and Square Root of the Mean Error = 9.58), considered acceptable for geostatistical kriging estimates (Benítez et al., 2016).Both underestimated (negative) and overestimated (positive) values demonstrate not only the excellent performance of the regression model, but also the standard kriging map of wood volume stock residues (Silveira et al., 2019).
The global predictive map generated by the regression model (Figure 7) and the map corrected by the regression kriging technique (Figure 8) show the same pattern in the distribution of the volumetric stock, with estimates ranging from 0.01 m 3 ha -1 to ~ 110 m 3 ha -1 , showing a similar variation with the data observed at the plot level.The mesoregion corresponding to the Vale dos São Francisco presented the lowest volumetric stocks, as indicated in the map.In this location, municipalities were sampled, namely Sento Sé, Xique-Xique, Gentio do Ouro, and Campo Alegre de Lourdes.The northeastern and central-northern parts of Bahia also revealed lower wood stocks for most areas that comprise these regions.Places with the highest concentrations of wood volume are in the central-northern region and part of the centralsouthern areas of the state.
The decrease in wood volume from the central to the northern part of the state can be explained mainly by precipitation, temperature, and karst relief associated to soils with limestone in the valleys and slopes (IBGE, 2019).Low water availability in the soil in these areas associated to the presence of rocky outcrops with the occurrence of slabs in slightly flat areas also represents a limiting factor for the development of biometric characteristics of plants, resulting in low values of volume wood biomass (Wagner et al., 2014(Wagner et al., , 2016;;Zappi et al., 2015).This region is also characterized by a set of mountains that reach high altitudes and rough relief, resulting from a high erosion rate and occupation of the areas by rocky outcrops, limiting vegetation development (IBGE, 2019).
The central-southern portion of the state holds the remnants with the most significant volumes of wood, ranging from 55 to 108.2 m 3 ha -1 .These regions have greater water availability (IBGE, 2019), providing better conditions for plant growth (Chaturvedi et al., 2013;Rozendaal et al., 2020;Sanaei et al., 2018).However,  small patches of low volumetric stock are visible in the central portion of this region (less than 20 m 3 ha -1 ).Three factors can explain this result.First, it may reflect areas with anthropic disturbances, such as exploitation of vegetation for charcoal and firewood production, crops, and livestock, which are in an advanced degradation stage, leading to a lower wood volume in this region.Second, the low volume concentration in these areas may be due to climatic effects related to a geographic barrier, creating an unfavorable situation for plant growth.An extensive mountain range constitutes this geographic barrier range with high altitudes, which produces an orographic effect on the rainfall regime in specific locations in the central portion of the state Bahia (Silveira et al., 2019).In addition, cold fronts that come from the southern region of Brazil reach part of this region, decreasing humidity in the winter months (IBGE, 2019), therefore directly influencing the biomechanical development of plants (Lines et al., 2012) and leading to the death of some trees in periods of drought (Aguirre-Gutiérrez et al., 2019;Krishnadas et al., 2021).Third, these sites tend to have sandier soils, such as Cambisol and Litholic Neosol (Santos, 2018;IBGE, 2019).These soils generally have low fertility, which creates unfavorable conditions for plant growth, along with the physical characteristics, low precipitation, and high temperatures (Cao and Sanchez-Azofeifa, 2017;Silveira et al., 2019).These interconnected factors regulate the availability and volumetric stock in the caatinga and are hypotheses that deserve further investigation.Maps that predict thoroughly wood volume are needed, as they can simplify research into priority areas for sustainable timber production, conservation unit creation, and biomass and carbon mapping.These advances are urgently needed in light of increasing deforestation rates and different forms of land use that potentially threaten the Caatinga biome.

Conclusions
Using geostatistics is a promising tool to generate predictive maps of volumetric wood stocks.This information is necessary to develop action plans for sustainable management and use of the Caatinga seasonally dry forests in Bahia.The spatial distribution of volume stocks is partly controlled by temperature, precipitation, relief, and vegetation heterogeneity.The regression model suggests an excellent potential to estimate volumetric stock from environmental variables to predict wood volume where they are not measured.Additional studies with a larger sample population and using other variables can improve the model for the Caatinga dry vegetation.

Figure 1 -
Figure 1 -Distribution of plot data by fragments, obtained for Caatinga in the state of Bahia, northeastern Brazil.

Figure 2 -
Figure 2 -Wood volume stock distribution of Caatinga vegetation in Bahia State, northeastern Brazil.

Figure 3 -
Figure 3 -Regression residuals and estimated wood volume (A); distribution of the residual relative to the normal distribution on a reference line (B).

Figure 4 -
Figure 4 -Variables selected by regression model to estimate wood volume in the Caatinga in Bahia.The description of the variables is given in Table 3. Bioclimatic variables: BIO 1 = Average annual temperature (°C); BIO 14 = Rainfall of the driest month (mm); Land Cover Consensus: VCM = Cultivated and Managed Vegetation; AM = Mixed Trees; ARB = Shrubs; Global Habitat Heterogeneity: COR = Correlation; SIM = Simpson's index.

Figure 5 -Figure 6 -
Figure 5 -Experimental univariate semivariogram for the residual volumetric stock of (m 3 ha -1 ) as a function of environmental variables of Caatinga vegetation in the state of Bahia, northeastern Brazil.

Figure 7 -
Figure 7 -Global model of wood volume (m 3 ha -1 ) for Caatinga vegetation in Bahia State, northeastern Brazil.

Figure 8 -
Figure 8 -Wood volume map (m 3 ha -1 ) obtained by the regression kriging for Caatinga vegetation in the state of Bahia, northeastern Brazil.

Table 1 -
The database obtained, collection locations, and numbers of plots.

Table 2 -
Bioclimatic variables, latitude, and longitude used to adjust the volumetric prediction model in the Caatinga in the state of Bahia, northeastern Brazil.
BIO 16 Precipitation of the wettest quarter (mm) BIO 17 Rainfall in the driest quarter (mm) BIO 18 Precipitation of the warmest quarter (mm) BIO 19 Rainfall of the coldest quarter (mm) Elev Elevation (m) Mapping of wood volume in dry forest Sci.Agric.v.80, e20220161, 2023

Table 3 -
Land cover variables and spatial heterogeneity of the global habitat used to adjust the volumetric prediction model in the Caatinga in the state of Bahia, northeastern Brazil.

Table 4 -
Theoretical semivariogram models adjusted to assess the spatial dependence of the Caatinga volume for the state of Bahia, northeastern Brazil.