MIXED MODELS FOR NUTRIENTS PREDICTION IN SPECIES OF THE BRAZILIAN CAATINGA BIOME

ABSTRACT Nutrient prediction models applied to tree species of the Brazilian Caatinga can be a crucial tool for understanding this biome. This study aimed to fit a mixed model to predict nitrogen (N), phosphorus (P), and potassium (K) content in tree species native to the Caatinga biome in Floresta municipality, Pernambuco State (PE), Brazil. The following species, considered the most important in the area, were evaluated: Poincianella bracteosa (Tul.) L.P.Queiroz, Mimosa ophtalmocentra Mart. ex Benth, Aspidosperma pyrifolium Mart, Cnidoscolus quercifolius (Mull. Arg.) Pax. & Hoffm, and Anadenanthera colubrina var. cebil (Griseb.) Altschul. Four trees representing the average circumference of each diameter class were harvested for NPK quantification. The Spurr model was evaluated for NPK prediction, and the inclusion of species as a random effect was significant (p < 0.05) in all models. The Spurr model with fixed and random effects presented better statistics than the fixed-effect models for all parameters and all nutrients. The generated NPK prediction equations can be a useful tool for understanding the impact of wood extraction on the Caatinga's biogeochemical cycles and for guiding forest management strategies in semi-arid regions of the world.


INTRODUCTION
Brazilian forest conservation is a priority because of its diversity (Soares-Filho et al., 2014): the remaining forests cover 60% of the country and harbor a large share of the world's forest species across different biomes (Oliveira et al., 2018). The Brazilian Caatinga is one of the largest remaining areas of tropical dry forest in the world (Miles et al., 2006) and a complex ecosystem characterized by high environmental variability (Moura et al., 2016).
In recent years, increasing population density has put pressure on the biome's natural resources and caused changes in land cover, mainly affecting native vegetation. Accurate information on land-use change in the Caatinga is limited, but in 2009 the biome retained 53.4% of its original vegetation cover (Beuchle et al., 2015), making it one of the most threatened ecosystems in the country (Arnan et al., 2018). Firewood extraction, pasture, and agricultural settlements are the main human activities affecting its vegetation (Aguiar et al., 2014; Althoff et al., 2018).
Forest biomass is one of the main energy sources in the region, with 10 million m³ of wood harvested per year (Gariglio et al., 2010). Wood extraction to supply this energy demand intensifies impacts on the carbon and nutrient cycles (Moura et al., 2016; Althoff et al., 2018). The removal of large amounts of nutrients can lead to soil depletion and severe adverse effects on long-term productivity (Aquino et al., 2017; Gómez-García et al., 2016; Macedo et al., 2023; Yan et al., 2017). A better understanding of nutrient dynamics in these ecosystems, mainly of nitrogen, phosphorus, and potassium, can help in managing wood harvesting and provide better estimates of greenhouse gas emissions and removals in the region (Althoff et al., 2018).
Nutrient prediction models are a crucial tool for understanding the impact of wood extraction on biogeochemical cycles in the Caatinga biome and for guiding forest management strategies (He et al., 2018). Studies with traditional models have been developed in Brazil (Barbeiro et al., 2009; Abreu et al., 2016; Oliveira et al., 2018). However, most datasets used for biomass and nutrient modeling in tropical forests have heterogeneous structures, with samples from different sites and high species diversity (Miguel et al., 2013; Grau et al., 2017). Because of this heterogeneity, traditional regression models tend to produce large estimation errors.
Mixed models can be a promising alternative for modeling heterogeneous environments. They are often used to analyze data collected across broad areas (Groom et al., 2012; Hu et al., 2018; Poudel et al., 2018; Özkale and Kuran, 2018). Thus, this study aimed to fit a mixed model to predict nitrogen (N), phosphorus (P), and potassium (K) in native species of the Caatinga biome.

Study Area
The study was carried out in a 50 ha area (8°30´37" S and 37°59´07" W) of Caatinga vegetation, part of the 6,000 ha Itapemirim Farm, located in Floresta municipality, in the São Francisco mesoregion of Pernambuco State, Brazil.
The Floresta municipality is part of the Pajeú River watershed. According to the Köppen classification, the region's climate is BSh (hot semi-arid, steppe climate). The mean annual rainfall is 503 mm, with a rainy period from January to April, and the mean annual temperature is 26.1 ºC. The municipality covers 3,643.97 km², and its mean altitude is 323 m (Araújo Filho et al., 2001).

Dataset
The forest inventory was carried out by sampling 40 plots of 20 × 20 m (400 m²), spaced 80 m apart and 50 m from the border, with an inclusion level of 6 cm circumference at 1.30 m above the ground (circumference at breast height, CBH).
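As a quick check on the design, the arithmetic below gives the plot area, the total sampled area, and the sampling intensity. It is a sketch that assumes the 40 plots were laid out within the 50 ha study area described above; under that assumption the inventory covered 1.6 ha, or about 3.2% of the area.

```python
# Sampling intensity implied by the inventory design (plain arithmetic).
N_PLOTS = 40
PLOT_SIDE_M = 20.0        # square plots of 20 m x 20 m
STUDY_AREA_HA = 50.0      # assumption: the 50 ha study area is the sampled population

plot_area_m2 = PLOT_SIDE_M ** 2                      # 400 m2 per plot
sampled_area_ha = N_PLOTS * plot_area_m2 / 10_000    # 1 ha = 10,000 m2
sampling_intensity_pct = 100 * sampled_area_ha / STUDY_AREA_HA

print(f"plot area: {plot_area_m2:.0f} m2")
print(f"sampled area: {sampled_area_ha:.2f} ha")
print(f"sampling intensity: {sampling_intensity_pct:.1f}%")
```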
The following five species were selected as the most important, according to the Importance Value Index (IVI) from a prior forest inventory (Alves et al. 2017): Poincianella bracteosa (Tul.) L.P.Queiroz, Mimosa ophtalmocentra Mart. ex Benth, Aspidosperma pyrifolium Mart, Cnidoscolus quercifolius (Mull. Arg.) Pax. & Hoffm, and Anadenanthera colubrina var. cebil (Griseb.) Altschul.

Nutrient Quantification
The NPK quantification in the aerial part was based on the diametric structure found in a new forest inventory. Individuals of the five most important species were divided into five circumference classes of 3 cm amplitude, starting from a circumference at breast height (CBH) of 6 cm. Four trees representative of the average circumference of each class were harvested for analysis of aerial-part nutrients. Ten individuals per species were harvested, for a total of 50 trees.
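A minimal Python sketch of the class scheme described above, useful for sorting field circumferences before choosing trees to harvest. The helper names are hypothetical, and the boundary rule (lower limit inclusive, upper limit exclusive, last class closed at 21 cm) is an assumption, since the text gives only the starting CBH, the amplitude, and the number of classes.

```python
from typing import Optional

# Class layout described in the text: five classes, 3 cm wide, starting at CBH = 6 cm.
CLASS_START_CM = 6.0
CLASS_WIDTH_CM = 3.0
N_CLASSES = 5

def circumference_class(cbh_cm: float) -> Optional[int]:
    """Return the 1-based class (1 = 6-9 cm, ..., 5 = 18-21 cm), or None if out of range.

    Assumption: lower limits are inclusive, upper limits exclusive, and the last
    class is closed at 21 cm (the text does not say whether it is open-ended).
    """
    if cbh_cm < CLASS_START_CM:
        return None  # below the 6 cm inclusion level
    index = int((cbh_cm - CLASS_START_CM) // CLASS_WIDTH_CM) + 1
    return index if index <= N_CLASSES else None

def class_center_cm(index: int) -> float:
    """Central circumference of a class, e.g. 7.5 cm for class 1."""
    return CLASS_START_CM + (index - 0.5) * CLASS_WIDTH_CM

# Hypothetical field circumferences (cm)
sample_cbh = [6.0, 8.9, 9.0, 14.2, 20.9, 21.0, 5.5]
print({cbh: circumference_class(cbh) for cbh in sample_cbh})
```

With these rules a tree of 9.0 cm CBH falls in class 2, and one of 21.0 cm falls outside the five classes.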
To cover the diameter classes, individuals were chosen at random, excluding partially harvested, burned, or fallen trees. The CBH of each chosen tree was measured and converted to diameter at breast height (DBH), and the total (Ht) and commercial (Hc) heights were measured. After these dendrometric measurements, trunk, branches, and leaves were separated, and samples of each component were sent for laboratory analysis.
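The CBH-to-DBH conversion mentioned above is usually done by assuming a circular stem cross-section, so DBH is CBH divided by π. Under that assumption, the 6 cm inclusion level and the 21 cm upper limit of the fifth class correspond to roughly:

```latex
\mathrm{DBH} = \frac{\mathrm{CBH}}{\pi},
\qquad \frac{6\ \mathrm{cm}}{\pi} \approx 1.91\ \mathrm{cm},
\qquad \frac{21\ \mathrm{cm}}{\pi} \approx 6.68\ \mathrm{cm}
```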
The total fresh weight and the fresh weight of the samples, both obtained in the field, were used to calculate the dry biomass of each aerial component of the 50 sampled trees. Dry-matter extracts for P and K analyses were obtained by wet digestion with HNO₃:HCl (2:1), while extracts for N were obtained by sulfuric digestion. Phosphorus (P) was analyzed by UV-visible colorimetry at 420 nm, and potassium (K) by flame emission photometry.
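The expression for dry biomass is not reproduced in this text. A common choice, assumed here and not confirmed by the source, is the ratio method: the total fresh weight of each component is multiplied by the dry-to-fresh weight ratio of its oven-dried sample. The function name and the weights below are hypothetical.

```python
def dry_biomass_kg(total_fresh_kg: float, sample_fresh_g: float, sample_dry_g: float) -> float:
    """Dry biomass of one aerial component (trunk, branches or leaves).

    Assumption: the component's field fresh weight is scaled by the dry/fresh
    ratio of its oven-dried sample (the usual ratio method).
    """
    if sample_fresh_g <= 0:
        raise ValueError("sample fresh weight must be positive")
    return total_fresh_kg * sample_dry_g / sample_fresh_g

# Hypothetical tree: (total fresh kg, sample fresh g, sample dry g) per component
tree = {
    "trunk": (12.0, 500.0, 280.0),
    "branches": (6.0, 400.0, 220.0),
    "leaves": (2.0, 200.0, 80.0),
}
dry = {part: dry_biomass_kg(*values) for part, values in tree.items()}
print(dry, "total:", round(sum(dry.values()), 3), "kg")
```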
The samples were divided among three laboratories because of limited resources and equipment during the research. Nitrogen analyses were performed at the Plant Biochemistry laboratories of the Universidade Federal Rural de Pernambuco, whereas phosphorus and potassium analyses were conducted at the Laboratory of Organic Chemistry of the Department of Agronomy of the Universidade Federal do Piauí (Bom Jesus-PI campus) and at the Universidade Estadual de Londrina, respectively. Nutrient concentration was determined in g kg⁻¹, and the total nutrient amount of each sampled tree was obtained by multiplying the concentration (g kg⁻¹) by the total dry biomass.
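The per-tree nutrient amount described above is a unit conversion: concentration (g kg⁻¹) times dry biomass (kg) gives grams, and dividing by 1000 gives kilograms, the unit used for N, P, and K in the fitted equations. The component values below are hypothetical, chosen only to show the arithmetic.

```python
def nutrient_content_kg(concentration_g_per_kg: float, dry_biomass_kg: float) -> float:
    """Nutrient amount in a component: g/kg x kg = g, then g -> kg."""
    return concentration_g_per_kg * dry_biomass_kg / 1000.0

# Hypothetical component dry biomass (kg) and N concentration (g/kg) for one tree
components = {
    "trunk": (6.72, 8.5),
    "branches": (3.30, 10.0),
    "leaves": (0.80, 25.0),
}
tree_n_kg = sum(nutrient_content_kg(conc, mass) for mass, conc in components.values())
print(f"N in the aerial part: {tree_n_kg:.5f} kg")
```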

Fitting Equations
The Spurr model (1952), in linear form, was fitted with green biomass, diameter, and total height data. It was fitted by the maximum likelihood method in the R programming language (R Core Team, 2014), using the glm2 package. Fit was evaluated by the Akaike information criterion (AIC), the correlation coefficient (r_yŷ) between observed and predicted values, the percentage root mean square error (RMSE%), bias, and graphical analysis of the residuals (Binoti et al., 2015).
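The fitted equation itself is not reproduced in this text. The sketch below assumes the log-linear form suggested by the symbol definitions given for the mixed models (ln Y = β0 + β1 ln DBH + β2 ln Ht + ε) and fits it by ordinary least squares, which coincides with maximum likelihood for Gaussian errors in a linear model. It is plain Python, not the glm2 package, and the data are synthetic. AIC uses the Gaussian log-likelihood on the log scale; r, RMSE%, and bias (mean of observed minus predicted, one common definition) are computed on the original scale after a simple exp back-transformation without bias correction.

```python
import math
import random

def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    for col in range(p):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * c for a, c in zip(A[r], A[col])]
                b[r] -= f * b[col]
    return [b[i] / A[i][i] for i in range(p)]

def fit_log_model(dbh, ht, y):
    """Assumed form: ln(Y) = b0 + b1 ln(DBH) + b2 ln(Ht) + e, Gaussian errors."""
    X = [[1.0, math.log(d), math.log(h)] for d, h in zip(dbh, ht)]
    ln_y = [math.log(v) for v in y]
    beta = ols(X, ln_y)
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    n = len(y)
    sigma2 = sum((o - f) ** 2 for o, f in zip(ln_y, fitted)) / n  # ML estimate
    loglik = -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)
    aic = -2 * loglik + 2 * (len(beta) + 1)  # +1 for the error variance
    return beta, [math.exp(f) for f in fitted], aic

def fit_stats(obs, pred):
    """r between observed and predicted, RMSE% (relative to the observed mean) and bias."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    resid = [o - p for o, p in zip(obs, pred)]
    rmse_pct = 100 * math.sqrt(sum(e * e for e in resid) / n) / mo
    bias = sum(resid) / n
    sxy = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    sxx = sum((o - mo) ** 2 for o in obs)
    syy = sum((p - mp) ** 2 for p in pred)
    return {"r": sxy / math.sqrt(sxx * syy), "rmse_pct": rmse_pct, "bias": bias}

# Synthetic data (50 trees) generated from hypothetical coefficients
random.seed(42)
dbh = [random.uniform(2.0, 7.0) for _ in range(50)]
ht = [random.uniform(2.0, 6.0) for _ in range(50)]
y = [math.exp(-6.0 + 2.0 * math.log(d) + 0.5 * math.log(h) + random.gauss(0, 0.1))
     for d, h in zip(dbh, ht)]

beta, pred, aic = fit_log_model(dbh, ht, y)
stats = fit_stats(y, pred)
print([round(b, 3) for b in beta], round(aic, 1), stats)
```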
Equations based on the Spurr model were also fitted as linear mixed models, with random intercepts and random slope coefficients and species as the random effect. Mixed models, also known as mixed-effects or hierarchical models, incorporate both fixed and random effects: fixed effects describe the relationships between the independent variables and the dependent variable, while random effects account for variation among groups (here, species) that is not explained by the fixed effects.
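To illustrate how a random intercept captures species-level variation, the sketch below fits the simplest case: a balanced one-way model y_ij = μ + a_i + e_ij, with variance components from the classical ANOVA (moment) estimators and species effects predicted by BLUP shrinkage toward the overall mean. This is a simplified stand-in, not the REML fit from nlme used in the study, and it omits covariates and random slopes; the species labels and values are hypothetical (for example, residuals of ln N after removing the fixed effects).

```python
def random_intercepts(groups):
    """Balanced one-way random-intercept model y_ij = mu + a_i + e_ij.

    groups: dict mapping a species label to its list of observations.
    Returns (mu, var_a, var_e, {species: predicted a_i}).
    """
    k = len(groups)
    n = len(next(iter(groups.values())))
    if k < 2 or n < 2 or any(len(v) != n for v in groups.values()):
        raise ValueError("sketch needs >= 2 groups of the same size >= 2")
    means = {g: sum(v) / n for g, v in groups.items()}
    mu = sum(means.values()) / k
    msb = n * sum((m - mu) ** 2 for m in means.values()) / (k - 1)
    msw = sum((x - means[g]) ** 2 for g, v in groups.items() for x in v) / (k * (n - 1))
    var_e = msw
    var_a = max((msb - msw) / n, 0.0)  # ANOVA estimate, truncated at zero
    denom = var_a + var_e / n
    shrink = var_a / denom if denom > 0 else 0.0  # BLUP shrinkage toward the mean
    return mu, var_a, var_e, {g: shrink * (m - mu) for g, m in means.items()}

# Hypothetical data for three species (e.g. residuals on the log scale)
example = {"sp1": [-1.0, -0.8, -1.2], "sp2": [0.0, 0.2, -0.2], "sp3": [1.0, 1.2, 0.8]}
mu, var_a, var_e, a_hat = random_intercepts(example)
print(round(mu, 3), round(var_a, 4), round(var_e, 4), {g: round(v, 4) for g, v in a_hat.items()})
```

Species whose means lie far from the overall mean receive large predicted effects, but each is pulled toward zero; the pull grows as the between-species variance shrinks relative to the within-species variance.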
The mixed models were fitted by restricted maximum likelihood (REML) in R (R Core Team, 2014), using the nlme package, and the same selection criteria used for the fixed models were applied. The significance of including random effects on the intercept and slopes was verified by the likelihood ratio test (Resende et al., 2014): the difference (D) between the deviances [−2log(L)] of the models with and without the random effect was compared with the tabulated χ² value at the 5% significance level. After mixed linear modeling, the resulting model can be complete, partially complete (random effects associated with only some parameters of the original model), or even a fixed-effect model, when the random effects are non-significant. Where: Ln = natural (Napierian) logarithm; N = nitrogen, in kg; P = phosphorus, in kg; K = potassium, in kg; DBH = diameter at breast height, in cm; Ht = total tree height, in m; β0 to β2 = fixed parameters of the model; a_i = random intercept for the i-th species; b_1i = random slope coefficient for the i-th species; ε_i ~ N(0, σ²) = random error.
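The likelihood ratio test described above compares the deviances [−2log(L)] of the models without and with the random effect. A minimal sketch, assuming the random effect adds a single variance parameter so that D is compared with χ² on 1 degree of freedom (3.841 at 5%); the deviance values are hypothetical. For 1 degree of freedom the χ² upper-tail probability equals erfc(√(D/2)), which lets the standard library compute the p-value.

```python
import math

CHI2_CRIT_1DF_5PCT = 3.841  # tabulated chi-square value, 1 df, 5% level

def lr_test(deviance_without: float, deviance_with: float):
    """Compare -2 log(L) of nested models without and with a random effect.

    Assumption: the random effect adds one variance parameter (1 df).
    """
    d = deviance_without - deviance_with
    p_value = math.erfc(math.sqrt(max(d, 0.0) / 2.0))  # P(chi2 with 1 df > d)
    return d, p_value, d > CHI2_CRIT_1DF_5PCT

# Hypothetical deviances for one nutrient model
d, p, significant = lr_test(deviance_without=120.4, deviance_with=110.2)
print(f"D = {d:.2f}, p = {p:.4f}, significant at 5%: {significant}")
```

Because a variance cannot be negative, some authors regard this χ² comparison as conservative; the sketch keeps the plain comparison described in the text.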

RESULTS
Mixed models allow the incorporation of both fixed and random effects, which can help explain the sources of variability in the data and improve the accuracy of the results. In both fixed and mixed forms, the Spurr model equations showed significant estimates for the fixed-effect parameters (Table 1).
Random coefficients of the Spurr model were estimated for each species to predict NPK content in the study region (Table 2).
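Species-specific predictions combine the fixed parameters (Table 1) with each species' random coefficients (Table 2). The sketch assumes the log-linear Spurr form with a random intercept a_i and a random slope b_1i on ln DBH; attaching b_1i to DBH and all numbers are assumptions for illustration, not the published estimates. A species without estimated random effects falls back to zero random effects, that is, the population-mean prediction.

```python
import math

# Hypothetical coefficients for illustration only; the real values are in Tables 1 and 2.
FIXED = {"b0": -6.0, "b1": 2.0, "b2": 0.5}
RANDOM = {  # species -> (a_i, b_1i): random intercept and (assumed) random DBH slope
    "Poincianella bracteosa": (0.10, -0.05),
    "Aspidosperma pyrifolium": (-0.08, 0.04),
}

def predict_kg(dbh_cm: float, ht_m: float, species: str = "") -> float:
    """ln(Y) = (b0 + a_i) + (b1 + b_1i) ln(DBH) + b2 ln(Ht), back-transformed with exp."""
    a_i, b1_i = RANDOM.get(species, (0.0, 0.0))  # unknown species -> random effects = 0
    ln_y = ((FIXED["b0"] + a_i)
            + (FIXED["b1"] + b1_i) * math.log(dbh_cm)
            + FIXED["b2"] * math.log(ht_m))
    return math.exp(ln_y)

for sp in ["Poincianella bracteosa", "Aspidosperma pyrifolium", "new species"]:
    print(sp, round(predict_kg(5.0, 4.0, sp), 4))
```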
The residuals were adequately distributed along a straight line, with mean close to zero and constant variance, and the hypothesis of homogeneity was not rejected for the equations with random effects on DBH and Ht (Figure 1). According to the likelihood ratio test, the inclusion of species as a random effect was significant (p < 0.05) in all models; thus, the final models included both fixed and random effects (Table 3).
For potassium, Model 10, which includes a random effect only on the slope of the height variable, had the lowest AIC among the models tested (Table 4) and was therefore the best model. By the same criterion, Model 6 was the best model for nitrogen and Model 9 the best for phosphorus.
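Model selection by AIC is done separately for each nutrient, since AIC values are only comparable among models fitted to the same response and data. In the sketch below the model labels echo the text, but the AIC values are invented, so only the selection logic is meaningful.

```python
# Hypothetical AIC values; the labels echo the text, the numbers are NOT from Table 4.
aic = {
    "N": {"fixed": 52.3, "Model 6": 41.8, "Model 9": 44.0, "Model 10": 43.1},
    "P": {"fixed": 60.2, "Model 6": 55.4, "Model 9": 51.7, "Model 10": 53.0},
    "K": {"fixed": 48.9, "Model 6": 40.2, "Model 9": 39.5, "Model 10": 37.6},
}

def best_by_aic(candidates):
    """Return the lowest-AIC model and each model's difference from it (delta AIC)."""
    best = min(candidates, key=candidates.get)
    return best, {m: round(v - candidates[best], 1) for m, v in candidates.items()}

for nutrient, candidates in aic.items():
    best, delta = best_by_aic(candidates)
    print(nutrient, best, delta)
```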

DISCUSSION
Mixed-effects models offer a flexible and powerful tool for analyzing pooled data while estimating both fixed and random model parameters. The fixed effects are population-average values, similar to parameters obtained by ordinary least squares regression, whereas random effects can be estimated for each hierarchical level in a data set and for various parameters in a model (Ou et al., 2016). These models are essential tools in the forestry sector because they provide an adequate framework for assessing forest growth and condition. Mixed models allow calibration for a given location or tree and can provide individual- and species-specific predictions (Miguel et al. 2013; Huff et al. 2018).
The significance of all fixed-effect parameters confirms the importance of including DBH and Ht as predictor variables (Calegario et al., 2015). In a mixed model, if response-variable information is available for a new species, the random coefficients are estimated from that species' specific response instead of the population mean response. For the average population, the random-coefficient vector of a new individual has an expected value of zero (Burkhart and Tomé, 2012).
The significance of species as a random effect in all models indicates that species can be included as an additional predictor of tree NPK content (Garber and Maguire 2003; Huff et al. 2018) to improve the precision of the estimates. The statistics of the mixed-effects models were superior to those of the fixed-effect models for predicting NPK in native species, which highlights the improvement due to the inclusion of random effects (Adame et al., 2008; Crecente-Campo et al., 2010; Ruslandi et al., 2017).
The residual distribution was considered adequate. The few residuals outside the expected range were negligible, being a small number relative to the sample size, and did not substantially affect the model estimates (Gouveia et al., 2015). When a sample is available to estimate the random effects, a mixed model performs better than a fixed model (Temesgen et al., 2008). This is supported by the residuals of all equations with random effects, which had a smaller amplitude than those of the equation in fixed form.
These results suggest that the models used in the study are reliable and provide accurate estimates of the effects of the variables analyzed. In particular, residuals that follow a straight line with a mean close to zero suggest that the models are unbiased and that the random effects included in the equations effectively account for the variability in the data. Furthermore, the constant variance of the residuals indicates that the models perform consistently across the range of the predictor variables, suggesting that the relationships between the variables are consistent throughout the dataset. The results are therefore likely to be robust and may apply to similar datasets (Bates et al., 2015).
In the present study, the improvement in NPK predictions due to the inclusion of random effects agrees with Huff et al. (2018), who stated that including species as a random effect improves the estimates of mixed models compared with fixed ones. Other variables can also be included as random effects, such as forest type, region or site quality classes, precipitation, soil, elevation, and other geographical characteristics (Meng et al., 2007; Boubeta et al., 2015; Ou et al., 2016; Özçelik et al. 2018).
Morphological differences between species, together with intraspecific differences caused by climatic and other environmental factors, require specific equations to predict biomass in different regions (Huff et al., 2018). Thus, the mixed-model approach to modeling species macronutrients in the Caatinga biome is an alternative for obtaining accurate predictions.
New studies including environmental variables could further improve the estimates. Mixed models offer a flexible framework for hierarchical data, and their generalized forms extend this approach to non-normal responses. Finally, the generated equations can support decision-making and guide policies toward better conservation practices in the Caatinga biome.

CONCLUSION
Including species as a random effect reduced the RMSE by at least 4% in the mixed models compared with the fixed models. Thus, the proposed equations capture the effect of each species and can be applied to better estimate NPK in trees of the Caatinga biome.
The generated equations can be a useful tool for understanding the impact of wood extraction on the biogeochemical cycles of the Caatinga biome and for supporting forest management strategies in semi-arid regions of the world.
Overall, the use of mixed models in the study of tree nutrition in Caatinga ecosystems can provide a more comprehensive understanding of the complex relationships between nutrient availability, tree physiology, and ecosystem dynamics, ultimately contributing to more effective and sustainable management strategies for these valuable and threatened ecosystems.

Table 1 - Estimates of fixed-effect parameters of the Spurr model for predicting the NPK content of native species, for trees located in Floresta municipality, Pernambuco State, Brazil.
Table 2 - Random-effect estimates of the Spurr equation for predicting NPK content in native species located in Floresta municipality, Pernambuco State, Brazil.

Table 3 - Likelihood ratio test for the equations predicting NPK in native species in Floresta municipality, Pernambuco State, Brazil. MLE: maximum likelihood estimation. The Spurr model with fixed and random effects showed better statistics than the fixed models for all parameters and nutrients.

Table 4 - Precision statistics of the Spurr model in its fixed and mixed forms in Floresta municipality, Pernambuco State, Brazil.