A close look at above ground biomass of a large and heterogeneous Seasonally Dry Tropical Forest, the Caatinga, in northeastern Brazil

This work focuses on characterizing and understanding the aboveground biomass of the Caatinga in a semiarid region of northeastern Brazil. The quantification of Caatinga biomass is limited by the small number of field plots, which are inadequate for addressing the biome's extreme heterogeneity. Satellite-derived biomass products can capture spatial and temporal changes, but they have not been validated for seasonally dry tropical forests. Here we combine a compilation of published field phytosociological observations with a new 30 m spatial resolution satellite biomass product. The two datasets were significantly correlated, and the satellite estimates consistently captured the wide variability of biomass across the different physiognomies (2-272 Mg/ha). Based on the satellite product, we show that in 2000 about 50 percent of the region had very low biomass (<2 Mg/ha) and that most of the biomass (86%) was concentrated in only 27% of the area. Our work confirms other estimates of biomass, 39 Mg/ha (9-61 Mg/ha), and of carbon, 0.79 PgC. The satellite products, together with ground-based estimates, have the potential to improve forest management in the Caatinga and other seasonally dry tropical forests through an improved approximation of spatial variability and of how it relates to climate, and to support numerical modeling experiments in


INTRODUCTION
The largest continuous area of seasonally dry tropical forest (SDTF) is estimated to be in the semiarid region of northeastern Brazil known as the Caatinga (Miles et al. 2006). For centuries, the Caatinga environment has been subject to cycles of land conversion, abandonment, and regrowth. However, the net effect of these changes and the actual contribution of SDTFs to the global carbon (C) cycle are still uncertain (Poorter et al. 2016).
A clearer understanding of the spatial distribution of Caatinga vegetation and its properties, particularly biomass, is essential for strategic development planning and for preserving the biome's functions, human services, and biodiversity (Miles et al. 2006). Scientific studies of Caatinga biomass have usually been based on localized field plots and limited by scarce financial resources. The region is characterized by highly heterogeneous vegetation and land use, so extrapolation from small-scale, sparse field data alone does not accurately characterize the landscape. A larger-scale monitoring tool with high spatial resolution, such as a satellite biomass product, is therefore essential for achieving better biomass estimates (Adams 1999). Here we analyze a high-resolution map and quantify the biomass of the Caatinga dry tropical forest based on ground-based observations and satellite biomass estimates (Zarin et al. 2016).
A review of the fragmented character of the Caatinga biome vegetation
The Brazilian semiarid region is dominated by a particular seasonally dry tropical forest called the Caatinga biome. For simplicity, in this work we refer to the entire geographic region, located predominantly in northeastern Brazil, as the Caatinga region (IBGE 2004). Within the Caatinga geographic region the vegetation physiognomy is highly variable and has been classified by different authors as different expressions of the Caatinga biome (Andrade-Lima 1981, Silva et al. 1993, RADAMBRASIL 1983).
While the predominant vegetation is xerophilous thorn woodland (a combination of shrubs and small trees) with a seasonal herbaceous layer, other forms also occur, including mosaics of semi-deciduous and evergreen forests at moister, higher-altitude sites (classified as Atlantic Forest biome) and cactus scrublands and rocky soils in drier regions. The large heterogeneity of the region has been attributed in the literature to varying edapho-climatic properties within the geographic region, including orographic effects, geomorphology, degree of dissection of the landscape, slope, wind exposure, soil depth, and soil physical and chemical composition (Sampaio 1995, Andrade-Lima 1981, Araujo et al. 2005). In addition to this edaphic and climatic spatial variability, most of the Caatinga has already been either completely converted from its native vegetation or modified (Casteleti et al. 2000) by deforestation, selective logging, degradation, and reforestation. Together these effects result in a highly fragmented vegetation landscape of different physiognomies, stages of regeneration, and land use practices.
The importance of the biome and the threats due to human activities
The Caatinga has high plant biodiversity that is well adapted to the severity of the semiarid climate. There are approximately 1,700 species of trees and shrubs, more than 300 of which are endemic (Moro et al. 2014, Pagano et al. 2013). The Caatinga vegetation is a resource for human needs, including wood for fuel, construction, and charcoal, as well as fruits, fibers, latex, carnauba-based waxes, medicines, apiculture, diverse fodder for cattle, and ornamental plants (Sampaio 1995, Giulietti et al. 2003, Sampaio et al. 2006). The biome is also threatened by local and non-local human activities. Less than 2% of the region is protected from exploitation, and the protected areas are generally small (Casteleti et al. 2000, Pagano et al. 2013, Leal et al. 2005). Human activities such as deforestation, timbering, agriculture, and cattle ranching have changed the Caatinga landscape and contributed to erosion, aggradation of rivers, desertification of large areas, disruption of the hydrological cycle, reduced water quality, increased carbon loss, and threats to biodiversity (Sampaio 1995). For the period 2002-2008, the average deforestation rate is estimated at 0.33% per year (IBAMA 2010). The region is also vulnerable to a changing climate (Guerreiro et al. 2013, Marengo et al. 2016, Marengo et al. 2018). Near-term (e.g. 2040) temperature and precipitation changes are projected to be 0.5-1 °C and -10 to -20%, respectively, and up to 3.5-4.5 °C and -40 to -50%, respectively, by 2100 (PBMC 2014). Although the response of Caatinga vegetation to climate change is not well understood, the increased water stress may lead to significant desertification of larger areas of the region (Oyama & Nobre 2003, Salimon & Anderson 2018, Silva de Miranda et al. 2018).

The biomass estimate in the region
Efforts have been made to quantify the total above ground biomass of the entire Caatinga, from the early 1990s (Sampaio 1995) to the early 2000s (Sampaio & Costa 2011), and at individual sites from ground-based estimates (Costa et al. 2014, Albuquerque et al. 2015, Pereira Jr et al. 2016, Amorim et al. 2005, Cabral et al. 2013). In the last two decades there have been more than 100 publications exploring the floristics and phytosociology of different Caatinga physiognomies, but they do not address biomass content (Moro et al. 2014, Araujo et al. 2005). A spatially explicit map of the biome's biomass has not been created. This is due to two main factors, the extremely fragmented Caatinga landscape and the lack of records of historical land use practices and regeneration capacity, which make extrapolation from field data points very difficult. In this study we explore the spatial distribution of above ground biomass across the Caatinga. We perform an extensive compilation of floristic and phytosociological inventories of woody vegetation from scientific publications on the Caatinga. We use this dataset to estimate above ground biomass and compare it with high spatial resolution (30 m) satellite remote sensing estimates of biomass (Zarin et al. 2016). We carry out a comprehensive analysis of the spatial variability of biomass in the region, its relation to the different physiognomies, and the role of climate in modulating it.

MATERIALS AND METHODS
The focus of this work is the Caatinga geographic region in semiarid northeastern Brazil (Figure 1). The geomorphology is predominantly plains, with plateaus and mountains up to 1,000 m (Figure S1a). Changes in elevation shape the spatial patterns of temperature (Figure S1d). Annual precipitation is distributed irregularly and unpredictably in time (Sampaio 1995), and its amount also varies in space, from 300 mm/yr up to 2000 mm/yr (Figure S1b). The soils have complex characteristics and are spatially heterogeneous, ranging from deep sandy soils in the western regions to shallow crystalline soils dominating much of the remaining area (Figure S1e). Most of the rivers are intermittent (Sampaio 1995). Detailed maps of Caatinga properties published in government reports are presented in the supplementary material for convenience (Appendix S1, Figure S1a-h): altitude, precipitation, dry season length, temperature, soil class, and soil texture.
The potential Caatinga physiognomies range from bare rocky ground and thorn woodland (shrub/arboreal) to evergreen broadleaf tropical forest. The definition of the Caatinga physiognomies and their geographic occurrence across the region have been addressed by several authors (RADAMBRASIL 1983, Andrade-Lima 1981, Silva et al. 1993) and revisited by others (Prado 2003, Sampaio & Rodal 2000, Velloso et al. 2002, Moro et al. 2014, Giulietti et al. 2003). High-resolution satellite images have been used to help map them (Rocha 2004, Eva et al. 2002). However, the reviewed literature still reports difficulty in defining the limits of the transitions between physiognomies and in dealing with the high heterogeneity within them. In this work we use as a reference the classification of potential vegetation physiognomies of Rocha (2004) (Figure 1). A brief description of the main physiognomies can be found in the supplementary material (Appendix S1).

Field data bibliographic review and biomass estimation
We assembled phytosociological inventory data within the Caatinga region from 69 papers and literature reviews, which provided the basic information for estimating biomass at 104 specific locations (Table SI).
The papers were published between 1995 and 2015, and not all of them report the year in which the inventory was carried out. The reported field inventory designs were very similar for most studies. The majority are based on inventories of all individuals taller than 1 m with stem diameter at ground level (DGL) greater than 3 cm or diameter at breast height (DBH) greater than 5 cm. The number and size of the plots varied between reports, ranging from 0.1 to 3 ha, depending on the representativeness estimated in each study. Most studies describe the physiognomy of the vegetation. A description of each measurement site, including reference, location, physiognomy, mean plant structural properties, and biomass estimates, is presented in Table SI.
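The inventory cut-offs and plot sizes above determine how a plot-level biomass density is obtained. The sketch below, offered only as an illustration, applies the typical inclusion rule (DGL > 3 cm or DBH > 5 cm, height > 1 m), sums per-stem biomass, and scales it to Mg/ha. It assumes a generic power-law allometry, AGB_kg = a·D^b, whose coefficients `a` and `b` are hypothetical placeholders; they are not the coefficients of Sampaio & Silva (2005) or Chave et al. (2005). The `Stem` record is likewise a hypothetical structure, not the layout of Table SI.

```python
from dataclasses import dataclass

# Typical inventory cut-offs reported in the compiled studies
MIN_DIAMETER_CM = {"ground": 3.0, "breast": 5.0}  # DGL > 3 cm or DBH > 5 cm
MIN_HEIGHT_M = 1.0


@dataclass
class Stem:
    diameter_cm: float
    height_m: float
    measured_at: str  # "ground" (DGL) or "breast" (DBH); hypothetical field


def is_included(stem: Stem) -> bool:
    """Apply the diameter cut-off for the measurement height and the 1 m height cut-off."""
    return (stem.diameter_cm > MIN_DIAMETER_CM[stem.measured_at]
            and stem.height_m > MIN_HEIGHT_M)


def plot_agb_mg_per_ha(stems, plot_area_ha, a=0.1, b=2.4):
    """Sum per-stem biomass (kg) from AGB_kg = a * D**b and scale to Mg/ha.

    The defaults for a and b are HYPOTHETICAL placeholders, not published values.
    """
    if plot_area_ha <= 0:
        raise ValueError("plot area must be positive")
    total_kg = sum(a * s.diameter_cm ** b for s in stems if is_included(s))
    return total_kg / 1000.0 / plot_area_ha  # kg -> Mg, then per hectare
```

Swapping in a Caatinga-specific or pan-tropical equation would only change the per-stem expression; the inclusion rule and the per-hectare scaling stay the same.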
Considering the limitations imposed by the variation in available information between publications, we chose to make the simplest estimate of above ground biomass, in order to include as many sites as possible. We estimated above ground biomass by applying an allometric equation to the compiled ground-based data. We analyzed two different allometric equations: the Caatinga-specific equation developed by Sampaio & Silva (2005), and the global dry forest equation developed by Chave et al. (2005), which is used in the satellite biomass retrieval. We present the results of the Caatinga-specific equation (Sampaio & Silva 2005), because it was derived from local forest data and is the one most used by the Caatinga scientific community. The comparison between the two allometric equations is discussed in the supplementary material (Appendix S1, Figure S2). Biomass was estimated with the allometric equations for the 104 published sites reviewed, and a total of 70 data points met the criteria for comparison with satellite data. The criteria took into consideration uncertainties in the field-plot locations and in the year of measurement reported in the papers. The sites were classified for quality assurance based on satellite images (Google Earth Professional) and on the geographical coordinates, and all data were flagged with the corresponding class: #1 located over an urban area; #2 deforested area; #7 location defined from a paper map or description; #8 location represents an average of a region defined by more than one site; #9 located in an area of preservation; #10 geographic coordinates given by the paper. Data from sites with class 2 or less, or that did not report explicit coordinates, were removed from the analyses but remain in Table SI for reference.
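The site-selection rule above can be written as a simple filter. This is a minimal sketch, assuming each compiled site is stored as a record with a quality flag and optional coordinates; the `Site` structure and its field names are hypothetical, and the flag labels reproduce only the classes listed in the text (classes 3 to 6 are not described there).

```python
from dataclasses import dataclass
from typing import List, Optional

# Quality-assurance classes as listed in the text
FLAG_LABELS = {
    1: "located over an urban area",
    2: "deforested area",
    7: "location defined from a paper map or description",
    8: "average of a region defined by more than one site",
    9: "located in an area of preservation",
    10: "geographic coordinates given by the paper",
}


@dataclass
class Site:  # hypothetical record layout
    name: str
    flag: int
    lat: Optional[float]
    lon: Optional[float]
    agb_mg_ha: float


def usable_for_satellite_comparison(site: Site) -> bool:
    """Drop sites with class 2 or less (urban, deforested) or without explicit coordinates."""
    has_coords = site.lat is not None and site.lon is not None
    return site.flag > 2 and has_coords


def select_sites(sites: List[Site]) -> List[Site]:
    """Return the subset of sites retained for the field vs satellite comparison."""
    return [s for s in sites if usable_for_satellite_comparison(s)]
```

Excluded sites are only filtered out of the comparison, which matches the text's note that they remain in Table SI.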

Satellite Biomass Map
This work uses a wall-to-wall 30 m resolution map of aboveground woody biomass density across the tropics, available at climate.globalforestwatch.org (Zarin et al. 2016). The AGB retrieval, briefly described here, follows the methodology presented in Baccini et al. (2012) but replaces the MODIS data with Landsat 7 ETM+ satellite products. The 30 m Landsat data were chosen because they balance spatial detail with acceptable uncertainty. The Landsat datasets included reflectance in the red, near-infrared, and two shortwave infrared bands (corresponding to bands 3, 4, 5, and 7), the Normalized Difference Vegetation Index (NDVI), the Normalized Difference Infrared Index (NDII), and percent tree canopy cover from the Hansen et al. (2013) Global Forest Change dataset. The methodology uses the statistical relationship between ground-based measurements of forest biomass density (derived from the diameter at breast height, DBH, of all live trees with DBH > 5 cm) and collocated Geoscience Laser Altimeter System (GLAS) LiDAR waveform metrics, as described by Baccini et al. (2012). It uses allometric equations from Chave et al. (2005) to estimate the biomass density of more than 40,000 GLAS footprints throughout the tropics. The GLAS-derived estimates of biomass density were related to continuous gridded variables, such as Landsat imagery and products, elevation, and biophysical variables (Baccini et al. 2004), to generate a pan-tropical map of aboveground live woody biomass density at 30 m resolution for circa 2000 (Zarin et al. 2016). A more detailed description of the methodology is given in Zarin et al. (2016).
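Two of the Landsat-derived predictors listed above are normalized band differences. The sketch below computes them from reflectance arrays, as an illustration only. It assumes NDVI = (NIR - red)/(NIR + red) with ETM+ bands 4 and 3, and NDII = (NIR - SWIR)/(NIR + SWIR) using ETM+ band 5 as the shortwave infrared band. The choice of band 5 for NDII is an assumption; the text does not specify which shortwave band the retrieval used.

```python
import numpy as np


def normalized_difference(a, b):
    """(a - b) / (a + b), returning NaN where the denominator is zero."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = a + b
    with np.errstate(divide="ignore", invalid="ignore"):
        nd = (a - b) / denom
    return np.where(denom == 0, np.nan, nd)


def ndvi(red_b3, nir_b4):
    """NDVI from ETM+ red (band 3) and near-infrared (band 4) reflectance."""
    return normalized_difference(nir_b4, red_b3)


def ndii(nir_b4, swir_b5):
    """NDII from ETM+ near-infrared (band 4) and, by assumption, shortwave infrared band 5."""
    return normalized_difference(nir_b4, swir_b5)
```

The functions accept scalars or whole reflectance rasters, so they can be applied directly to gridded Landsat scenes.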
The satellite biomass product was compared to the field data estimates by three methods. The first was a direct comparison of each site with the corresponding satellite pixel estimate (referred to as Satpix). The second compared the field data with the average of the satellite data within a 500 m radius around the site location (referred to as Sat500). For this comparison, non-vegetated or very sparsely vegetated pixels (< 2 Mg/ha) were filtered out to avoid underestimating the average, since the goal is to compare vegetated areas between ground and satellite estimates (Figure S3). In the third method, the masked satellite product (biomass >= 2 Mg/ha) was averaged by physiognomy type (Figure 1) to quantify how biomass is distributed within physiognomies.
ANDREA D.A. CASTANHO et al.
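The Sat500 and physiognomy averages can be sketched with array operations. This illustration assumes the biomass map is a 2-D array on a projected grid of square 30 m pixels, that each field site has already been converted to a (row, col) pixel index, and that the physiognomy map has been rasterized onto the same grid as integer class codes. Those preprocessing steps are not described in the text. The function names are hypothetical.

```python
import numpy as np

PIXEL_SIZE_M = 30.0
VEG_THRESHOLD = 2.0  # Mg/ha; pixels below this are treated as non-vegetated


def sat500(agb, row, col, radius_m=500.0, pixel_size_m=PIXEL_SIZE_M,
           threshold=VEG_THRESHOLD):
    """Mean AGB of vegetated pixels whose centres lie within radius_m of (row, col)."""
    rows, cols = np.indices(agb.shape)
    dist_m = np.hypot(rows - row, cols - col) * pixel_size_m
    sel = (dist_m <= radius_m) & (agb >= threshold)
    return float(agb[sel].mean()) if sel.any() else float("nan")


def physiognomy_means(agb, classes, threshold=VEG_THRESHOLD):
    """Mean and standard deviation of vegetated AGB within each physiognomy class."""
    out = {}
    for cid in np.unique(classes):
        vals = agb[(classes == cid) & (agb >= threshold)]
        if vals.size:
            out[int(cid)] = (float(vals.mean()), float(vals.std()))
    return out
```

For a full Landsat-resolution raster one would compute distances only in a small window around each site rather than over the whole grid. The masking logic is the same either way.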

RESULTS
The field data plots available in the literature are not evenly distributed across the biome and are concentrated mostly in the northeast of the region (Figure 2). The above ground biomass estimated from field data and from satellite shows strong variability across the region (Figures 2 and 3); the reasons for this are explored in the Discussion. Field-based estimates of biomass vary greatly (5-118 Mg/ha), as do the satellite estimates (0-272 Mg/ha) (Table I, Figures 2 and 3). A full description of each field site, with its AGB estimate and the corresponding satellite data, is presented in Table SI.
The median biomass for the 70 sites used in this study is 43 Mg/ha (interquartile range 25-61 Mg/ha). Previous biomass estimates for the region (also based on ground measurements at individual sites) were about 40 Mg/ha (Sampaio & Costa 2011), comparable to the 48 Mg/ha reported by the Forest Resources Assessment (FRA 2015). The satellite-based estimate for areas with biomass >= 2 Mg/ha within a 500 m buffer around each site (Sat500) is 48 Mg/ha (interquartile range 32-76 Mg/ha).
The total above ground C estimate for the Caatinga region based on the satellite retrieval is 0.796 PgC (assuming a carbon fraction of 0.45; Pereira Jr et al. 2016, Souza et al. 2013). This is comparable to Sampaio & Costa (2011), who extrapolated a total carbon stock of 0.720 PgC for a vegetated area of about 400,000 km². Results from the FRA (2015), published by the Food and Agriculture Organization of the United Nations (FAO), estimate a total above ground C of 1.003 PgC for forest and other wooded land (465,901 km²) in 2000. Beyond their different methodologies, the main differences among these studies are the vegetated area considered and the extrapolation methods; taking these into account, the reference estimates are quite comparable to the satellite estimate.
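The unit conversion behind these stocks is simple: mean AGB (Mg/ha) × area (100 ha per km²) × carbon fraction gives Mg C, and 1 Pg = 10⁹ Mg. The sketch below shows this arithmetic. As a check, a mean of 40 Mg/ha over 400,000 km² with a fraction of 0.45 gives 0.72 PgC, the same as the Sampaio & Costa (2011) total. Inverting the FRA figures under the same 0.45 fraction gives about 47.8 Mg/ha, close to the 48 Mg/ha quoted above. That inversion is only illustrative, because the text does not state which carbon fraction FRA used.

```python
HA_PER_KM2 = 100.0
MG_PER_PG = 1e9  # 1 Pg = 1e15 g = 1e9 Mg


def carbon_stock_pgc(mean_agb_mg_ha, area_km2, carbon_fraction=0.45):
    """Total above ground carbon (PgC) from a mean biomass density and an area."""
    return mean_agb_mg_ha * area_km2 * HA_PER_KM2 * carbon_fraction / MG_PER_PG


def implied_mean_agb(total_pgc, area_km2, carbon_fraction=0.45):
    """Mean biomass density (Mg/ha) implied by a total carbon stock and an area."""
    return total_pgc * MG_PER_PG / (area_km2 * HA_PER_KM2 * carbon_fraction)


print(round(carbon_stock_pgc(40.0, 400_000), 3))    # 0.72
print(round(implied_mean_agb(1.003, 465_901), 1))   # 47.8
```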

Quantitative Comparison between Field data and Satellite data
An individual pixel-to-site comparison (Satpix) of satellite and field data is inaccurate because of the lack of precise information on the field coordinates and on the shape of the sampled area relative to the satellite pixel (Figure S4). To avoid this uncertainty, and assuming that the field plot is representative of the average vegetation of its surroundings, we compare the field estimates to the mean satellite estimate within a 500 m radius around each reported field site, masked with a non-vegetation threshold of 2 Mg/ha (Sat500; Figure 4). The two datasets are significantly correlated (70 points; Spearman non-parametric correlation 0.54, p < 0.05). Because both variables are independent and uncertain, we used reduced major axis regression, which treats errors in both variables symmetrically, rather than ordinary least squares regression (intercept = -5.82, slope = 1.27) (Legendre & Legendre 2012). The differences between the estimates show a normal distribution close to zero and no trend relative to the mean AGB; however, there is a difference in the median, with satellite estimates 6.8 Mg/ha greater, an increase of about 15 percent of the mean (Figure 4b).

Figure 2. Estimated ground site above ground biomass (AGB) from phytosociological data from the literature review across the Caatinga region (circles colored by AGB); light gray lines represent geomorphic limits. The inset boxes (A to J) are for reference to specific regions highlighted in the supplementary material (Appendix S1). The full description of each field site and references, and the estimates of AGB and corresponding satellite data, are presented in Table SI.
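The comparison statistics in the paragraph above (Spearman correlation, reduced major axis fit, median of the satellite minus field differences) can be sketched with NumPy alone. The reduced major axis slope is sign(r)·sd(y)/sd(x), and the fitted line passes through the two means. This sketch does not compute p-values; `scipy.stats.spearmanr` provides one if SciPy is available. Reading "difference in the median" as the median of the paired differences is our assumption.

```python
import numpy as np


def average_ranks(x):
    """Ranks starting at 1, with tied values given the mean of their ranks."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(x.size)
    ranks[order] = np.arange(1, x.size + 1)
    for v in np.unique(x):
        tied = x == v
        if tied.sum() > 1:
            ranks[tied] = ranks[tied].mean()
    return ranks


def spearman_r(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


def rma_fit(x, y):
    """Reduced major axis regression: returns (slope, intercept)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r = np.corrcoef(x, y)[0, 1]
    slope = np.sign(r) * y.std(ddof=1) / x.std(ddof=1)
    intercept = y.mean() - slope * x.mean()
    return float(slope), float(intercept)


def median_difference(field, sat):
    """Median of the paired differences, satellite minus field."""
    return float(np.median(np.asarray(sat, dtype=float) - np.asarray(field, dtype=float)))
```

Unlike ordinary least squares, swapping x and y in `rma_fit` gives the inverse line. This symmetry is the reason for using it when both axes carry uncertainty.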

Biomass spatial variability and physiognomies
The satellite-based biomass estimate provides a first detailed approximation of the spatial distribution of biomass in this understudied region. A qualitative discussion of the Caatinga physiognomies and the biomass map is given in the supplementary material (Appendix S1).
The biomass estimated from the field data and the respective Sat500 estimates were grouped by the physiognomy described for each plot (Figure 5, open and gray boxes, respectively). The results quantify the mean biomass in each physiognomy and its variability, ranging from thorn woodlands (40 Mg/ha) to mountain evergreen forests (80 Mg/ha). The satellite and field estimates of average above ground biomass in the different physiognomies are highly correlated (r = 0.9, p = 0.015) and close to the 1:1 line (Figure S5), which suggests that the satellite product properly identifies the differences in biomass among vegetation types.

Figure 4. (a) Scatter plot comparison between above ground biomass (AGB) from field data (estimated with an allometric equation) and the satellite estimate (Sat500, biomass averaged within a 500 m radius around the site location). The uncertainty bars represent the propagated satellite uncertainties on the y axis (satellite product pixel uncertainty and standard deviation of biomass in the area around each site) and an assumed minimum field uncertainty of 20 percent on the x axis. The black continuous line represents the reduced major axis (RMA) regression, the dashed lines the upper and lower limits of the 95% CI, and the gray continuous line the 1:1 line. The figure also presents boxplot statistics of the field and satellite datasets. (b) Difference between satellite and field biomass estimates, with the corresponding propagated uncertainties of satellite and field data (field uncertainty 50%).
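The per-physiognomy grouping and the correlation of group means described above can be sketched as follows. The sketch assumes each site is available as a hypothetical (physiognomy, field AGB, Sat500 AGB) tuple. It uses the population standard deviation within each group and the Pearson correlation between the group means. It illustrates the procedure and does not reproduce the reported r = 0.9.

```python
from collections import defaultdict

import numpy as np


def group_by_physiognomy(records):
    """records: iterable of (physiognomy, field_agb, sat500_agb) tuples."""
    groups = defaultdict(lambda: ([], []))
    for phys, field, sat in records:
        groups[phys][0].append(field)
        groups[phys][1].append(sat)
    summary = {}
    for phys, (f, s) in groups.items():
        summary[phys] = {
            "n": len(f),
            "field_mean": float(np.mean(f)), "field_sd": float(np.std(f)),
            "sat_mean": float(np.mean(s)), "sat_sd": float(np.std(s)),
        }
    return summary


def correlation_of_means(summary):
    """Pearson correlation between field and satellite group means."""
    f = [v["field_mean"] for v in summary.values()]
    s = [v["sat_mean"] for v in summary.values()]
    return float(np.corrcoef(f, s)[0, 1])
```

Because each physiognomy contributes one point regardless of its number of sites, sparsely sampled classes such as Carrasco weigh as much as well-sampled ones in this correlation.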
The averages of the satellite biomass estimates within the areas delimited by each physiognomy (Figure 1) (Figure 5, red dots) also show a consistent trend of biomass across the vegetation physiognomies. For most physiognomies, the average field data and the corresponding satellite data are close to the average over the entire area of that physiognomy in the potential vegetation map (illustrated by the proximity of the red dots to the centers of the boxes). The Carrasco physiognomy, a transition between Caatinga and Cerrado, is an exception, with satellite estimates higher than field estimates. The Carrasco region is highly heterogeneous and covers a very broad area, and the field data network there is sparse; more field inventories are therefore likely needed to characterize its AGB representatively. The average biomass of each physiognomy (Figure 5, red dots) is mapped in Figure S7.

Biomass and climate
Expanding the analysis to the entire region, the distribution of satellite biomass estimates can be quantified as a function of precipitation amount (Figure 6) and dry season length (Figure S6). It shows a consistent trend: a higher fraction (70-90%) of denser vegetation (40 to > 80 Mg/ha) in more humid regions (precipitation > 1000 mm/yr and dry season shorter than 2 months) and a higher fraction (70-90%) of lower biomass (0 to 40 Mg/ha) in drier regions (precipitation < 600 mm/yr and dry season longer than 8 months) (Figure 6 and Figure S6).
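A distribution like the one described above can be tabulated by binning co-registered biomass and precipitation grids. The sketch assumes both grids are already on the same pixels. It uses illustrative bin edges based on the thresholds in the text: precipitation below 600, 600-1000, and above 1000 mm/yr; AGB of 0-40, 40-80, and above 80 Mg/ha. Bins are closed on the left, and empty precipitation bins return NaN. It counts pixels, so it assumes equal-area pixels.

```python
import numpy as np


def class_fractions_by_bin(agb, precip, precip_edges, agb_edges):
    """For each precipitation bin, the fraction of pixels in each AGB class."""
    agb = np.ravel(agb)
    precip = np.ravel(precip)
    p_idx = np.digitize(precip, precip_edges) - 1  # bin i: edges[i] <= x < edges[i+1]
    a_idx = np.digitize(agb, agb_edges) - 1
    n_p, n_a = len(precip_edges) - 1, len(agb_edges) - 1
    counts = np.zeros((n_p, n_a))
    valid = (p_idx >= 0) & (p_idx < n_p) & (a_idx >= 0) & (a_idx < n_a)
    np.add.at(counts, (p_idx[valid], a_idx[valid]), 1)
    totals = counts.sum(axis=1, keepdims=True)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(totals > 0, counts / totals, np.nan)


precip_edges = [0, 600, 1000, np.inf]  # mm/yr, illustrative
agb_edges = [0, 40, 80, np.inf]        # Mg/ha, illustrative
```

The same function applies to dry season length by passing month-based edges in place of the precipitation edges.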

Biomass and corresponding areas
According to the satellite product, in 2000 about 50% of the region had biomass < 2 Mg/ha, accounting for about 1% of the total Caatinga biomass (Figure 7). About 23% of the area had biomass between 2 and 40 Mg/ha, representing 14% of the total biomass. About 20% of the Caatinga area had estimated AGB of 40-80 Mg/ha, representing about 55% of the total biomass. The wettest regions cover only about 7% of the Caatinga but, with about 80-130 Mg/ha, account for 31% of the total biomass (Figure 7).

Figure 5. Above ground biomass (AGB) by vegetation physiognomy (Figure 1). The white boxes represent the average field AGB within each physiognomy; the gray boxes represent the average satellite AGB estimates within a 500 m radius around the field plot locations in each physiognomy. The box centers represent the mean value, and the box lengths represent plus and minus one standard deviation. Red dots represent the mean and respective standard deviation (light gray bars) of the satellite AGB averaged over the entire areas defined by the physiognomy map (Figure 1).
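The area and biomass shares in the paragraph above come from the same histogram, weighted in two ways. The area fraction counts pixels per AGB interval, and the biomass fraction sums their AGB. The sketch below assumes equal-area pixels; a latitude/longitude grid would need per-pixel area weights. It uses interval edges of 0, 2, 40, and 80 Mg/ha, following the classes in the text.

```python
import numpy as np


def area_and_biomass_fractions(agb, edges):
    """Fraction of pixels (area) and of total AGB falling in each interval of edges."""
    agb = np.ravel(np.asarray(agb, dtype=float))
    idx = np.digitize(agb, edges) - 1  # interval i: edges[i] <= AGB < edges[i+1]
    n = len(edges) - 1
    area = np.array([(idx == i).sum() for i in range(n)], dtype=float) / agb.size
    biomass = np.array([agb[idx == i].sum() for i in range(n)]) / agb.sum()
    return area, biomass


edges = [0, 2, 40, 80, np.inf]  # Mg/ha, following the intervals in the text
```

A heavily skewed result, such as half of the area holding about 1% of the biomass, appears directly as the mismatch between the two returned arrays.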

DISCUSSION AND CONCLUSIONS
The results show high biomass variability within each physiognomy (Figure 5). The complexity of Caatinga biomass can be summarized by three factors (Figure 8). The first factor is the synergy between climate (i.e. precipitation amount, including topographic effects, and its distribution in time, i.e. dry season length) and edaphic properties. Climate defines the macro-variability in Caatinga biomass associated with the different physiognomies (from bare ground, herbaceous vegetation, deciduous thorn woodlands, and dry deciduous/semi-deciduous forest through mountain humid forests). Within the macro level there is a meso-variability associated with the current land use of the region (preserved areas, abandoned areas, pasture, selective logging). Finally, within the meso-variability there is a third level (micro-variability): the actual stand age of regeneration of the area (ranging from 0 to the total time since the land was abandoned for regrowth). In this work we could address the first factor at the level of physiognomies, together with the relation of the biomass distribution to some climatic variables. The other two levels (land use and regeneration) were not addressed here because of the lack of information, not only in the field database but also over the wider extent of the Caatinga region. However, the impact of the second and third levels on Caatinga biomass could be assessed in future work through time series of biomass data from the field and/or from satellites (Zarin et al. 2016).
[...] and the shape and area of the field plot, but this information is usually not available. Another factor is related to the heterogeneity of the region. In very heterogeneous regions, the comparison of coarse-resolution satellite data with field data is compromised by the size of the satellite footprint and by the representativeness of the field plot size.
There is thus a trade-off between enlarging the satellite averaging area to be more representative and avoiding the inclusion of too much small-scale heterogeneity in the statistics. The different allometric equations used for the field data (Sampaio & Silva 2005) and in the satellite retrieval (Chave et al. 2005) may also explain part of the discrepancy in the mean. The comparison between them (Figure S2) shows that the Chave et al. (2005) equation used in the satellite retrieval, with a wood density of 0.55 g/cm³, overestimates AGB relative to the local allometric equation (Sampaio & Silva 2005).
Both methodologies carry uncertainties associated with the use of a single allometric equation for all AGB estimates in this heterogeneous region. As pointed out by Chave et al. (2005), more forest-specific allometric equations are required to derive more accurate biomass estimates. For the Caatinga, Sampaio et al. (2010) pointed out that information on stand age (young, successional, or old-growth forest) may also play an important role in developing an appropriate allometric equation. An accurate physiognomy map and description of the vegetation (e.g. probability density functions of wood density and height) could greatly improve the allometric equations and the biomass estimates for both satellite retrievals and field-based estimates. A regionally planned, long-term field observation network providing consistent monitoring of strategic vegetation physiognomies and stand ages would greatly improve satellite validation and calibration.

Figure 7. Fraction of the total Caatinga area in each above ground biomass interval (left column) and fraction of the total Caatinga AGB represented by each interval (right column).
Despite the uncertainties of the different methodologies, the results show that this satellite product (Zarin et al. 2016) consistently identifies the wide range of biomass in this fragmented and extremely heterogeneous domain, which would be extremely challenging to characterize from field observations alone. The database built in this work is of great utility for understanding the effect of climate on the different vegetation physiognomies and on total above ground biomass density. Understanding how climate determines biomass distribution in semiarid regions across the globe is critical for assessing the effect of current and future climate change toward drier conditions in this region. The database can support numerical modeling experiments aimed at more accurate simulation of the different physiognomies in this semiarid region, an important component of better predicting their response to climate change. Together, uncontrolled land use change and climate change may lead to an irreversible desertification scenario in the region. Less than 2 percent of the Caatinga area is protected, and a better, high spatial resolution understanding of biomass distribution is important to help plan the expansion of preservation areas.
For future analyses, accurate annual satellite-based maps of biomass are important to help understand the contribution of the Caatinga region to the carbon stock of seasonally dry tropical forests globally and its changes, quantify deforestation and forest growth, understand the dynamic responses of biomass to climate change, and plan for greater preservation.

[...] satellite mean AGB (Zarin et al. 2016) and corresponding standard deviation of the 30 m resolution data averaged within a 500 m radius around the field coordinates.
Figure S4. a) Scatter plot comparison of estimated AGB between field data and the corresponding satellite pixel (Zarin et al. 2016, 30 m resolution AGB map); b) Difference between field and satellite biomass estimates as a function of the mean biomass estimate, with the corresponding uncertainty represented by the error bars.
Figure S5. Scatter plot of field and satellite estimates of biomass averaged by physiognomy as defined in the field database (Table SI). The number of ground data points in each physiognomy is shown; the extent of the ellipses indicates the standard deviation of AGB for the corresponding physiognomies, for field (x axis) and satellite (y axis) estimates (correlation coefficient r = 0.9, significance 0.015). The black continuous line represents the 1:1 line.
Figure S6. Fraction of biomass in given ranges as a function of mean dry season length. Biomass is based on the satellite biomass map, and dry season length on Figure S1b, across the Caatinga region.
Figure S7. Potential above ground biomass estimate based on the vegetation type map (Figure 1) and the average satellite AGB for each vegetation type (Figure 5, red dots).