Diameter distribution in a Brazilian tropical dry forest domain: predictions for the stand and species

Currently, there is a lack of studies on the correct use of continuous distributions for dry tropical forests. Therefore, this work aims to investigate the diameter structure of a Brazilian tropical dry forest and to select suitable continuous distributions by means of statistical tools for the stand and the main species. Two subsets were randomly selected from 40 plots. Diameter at base height was obtained. The following functions were tested: log-normal, gamma, Weibull 2P and Burr. The best fits were selected by Akaike’s information criterion. Overall, the diameter distribution of the dry tropical forest was better described by negative exponential curves with positive skewness. The forest studied showed diameter distributions with decreasing probability for larger trees. This behavior was observed for both the main species and the stand. The generalization of the function fitted for the main species shows that the development of individual models is needed. The Burr function showed good flexibility to describe the diameter structure of the stand and the behavior of the Mimosa ophthalmocentra and Bauhinia cheilantha species. For Poincianella bracteosa, Aspidosperma pyrifolium and Myracrodruon urundeuva, better fitting was obtained with the log-normal function.


INTRODUCTION
Diameter distributions are crucial decision-making tools for forest management. They directly affect choices concerning silvicultural and harvesting activities. For instance, the timing and intensity of thinning and harvesting, as well as the harvesting equipment, depend on the diameter distributions (Robinson and Hamann 2011). Furthermore, they are applied as inputs to growth models and are sometimes the subject of growth modeling themselves. Information on the current diameter distribution of a forest stand allows prediction of its future structure, which provides even better support for sustainable forest management (Clutter et al. 1983, Borders et al. 1987, Vanclay 1994, Podlaski 2006).
Many distinct forest domains have diameter distributions that vary in width and shape, which provide valuable information on the growing forest stock (Assmann 1970, Ferreira et al. 1998, Rubin et al. 2006). The descriptive analysis of diameter distribution through biomathematical models is an implicit prediction technique of the current yield, since it provides the number of trees per hectare based on the height of each diameter class (Cao and Burkhart 1984, Cao 2004). Therefore, it yields very detailed information on the stand structure (Gorgoso-Varela and Rojo-Alboreca 2014, Martínez-Antúnez et al. 2015).
Dry tropical forests comprise about half of the world's tropical forests (Murphy and Lugo 1986, Sabogal 1992, Powers et al. 2009). Information about heterogeneity, structure and diversity of Brazilian dry forests is provided by Huntley and Walker (1982, p. 25-47). Nevertheless, they are among the most threatened and least studied forest domains; hence they may be at higher risk than rainforests (Ghazoul 2002, McLaren et al. 2005, Miles et al. 2006, Portillo-Quintero and Sanchez-Azofeifa 2010, Aide et al. 2012, Gillespie et al. 2012). In Brazil, several other studies have been and are being developed on this theme, showing satisfactory results for hypotheses about changes in diversity and structure of forest stands and species of dry forests (Bucher 1982, Kirmse et al. 1987, Oliveira et al. 2007, Leal et al. 2010, Gunkel et al. 2013, Apgaua et al. 2015).
Until now, investigations of the correct management of dry forests were mainly carried out in Africa (Blackie et al. 2014) and Asia (Sagar and Singh 2005, Nath et al. 2006, McShea et al. 2011). Overall, there is a lack of studies on the diameter structure and modeling of dry forest stands and species (Fallahchai and Shokri 2014, Hussain et al. 2014, Sampaio et al. 2010). Investigations of this nature will provide better estimates of production per diameter class and allow ordination and regulation of productivity for the stands and species (Zheng and Zhou 2010).
Therefore, this work aims to investigate the diameter structure of a Brazilian tropical dry forest and to select suitable continuous distributions by means of fitting and validation of probability density functions. The following questions arise: What is the behavior of the diameter distributions of the stand and the species? Which probability density functions better describe the diameter distribution of the stand and the species? Which functions do not provide suitable data fitting for the dry forest area under study?

STUDY SITE
The investigation was carried out on a farm that is mainly exploited under legal forest management, located in Floresta county, Pernambuco State (8°30´37"S and 37°59´07"W), in the northeast region of Brazil. The domain is dry forest of the xerophilous type (Caatinga), characterized by shrub-tree vegetation with cactus plants and herbaceous strata (IBGE 2012). The farm area is about 6000 ha (Fig. 1).

FOREST INVENTORY
The forest inventory started in 2008 in a 50 ha area and followed the recommendations of the protocol for measurements of permanent plots (Comitê Técnico Científico da Rede de Manejo Florestal da Caatinga 2005). Data were obtained based on an acceptable error of 20% and a probability of 90%. Forty forest plots of 20 x 20 m (400 m²) were placed in the study site to measure all shrub-tree individuals with circumference at the base (Cb, 0.30 m above ground level) above 6 cm. Values were converted into diameter at the base (Db, 0.30 m above ground level) prior to fitting. Equivalent diameter was calculated according to Burkhart and Tomé (2012) for plants with more than one stem.
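The conversion from circumference to diameter, and the pooling of several stems into one equivalent diameter, can be sketched as below. This is a minimal illustration, not the authors' procedure: it assumes the usual definition of equivalent diameter as the diameter of a single circle whose basal area equals the sum of the stem basal areas, and the function names and stem measurements are hypothetical.

```python
import math


def circumference_to_diameter(c_b):
    """Convert circumference at the base (cm) into diameter at the base (cm)."""
    return c_b / math.pi


def equivalent_diameter(stem_diameters):
    """Diameter of one circle with the same basal area as all stems together.

    Assumed definition: d_eq = sqrt(sum of d_i squared).
    """
    return math.sqrt(sum(d ** 2 for d in stem_diameters))


# Hypothetical plant with three stems; circumferences at 0.30 m, in cm.
circumferences = [12.0, 9.5, 7.0]
diameters = [circumference_to_diameter(c) for c in circumferences]
d_eq = equivalent_diameter(diameters)
print(f"stem diameters (cm): {[round(d, 2) for d in diameters]}")
print(f"equivalent diameter (cm): {d_eq:.2f}")
```

Because basal area is proportional to the square of the diameter, the equivalent diameter is always at least as large as the thickest stem and smaller than the plain sum of the stem diameters.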

DIAMETER DISTRIBUTION MODELING
Histograms of the frequencies of individual number per hectare per diameter class were generated in order to check the distribution pattern of the tree species and stand.
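A frequency table of individuals per hectare per diameter class can be built along the following lines. The sketch is illustrative only: the class width, lower limit, number of plots and diameters are hypothetical values, and the only figure taken from the text is the plot size of 400 m².

```python
def density_per_class(diameters, class_width, d_min, sampled_area_ha):
    """Return {class lower limit (cm): trees per hectare}, ordered by class."""
    counts = {}
    for d in diameters:
        index = int((d - d_min) // class_width)  # class the diameter falls into
        lower = d_min + index * class_width
        counts[lower] = counts.get(lower, 0) + 1
    return {lower: n / sampled_area_ha for lower, n in sorted(counts.items())}


# Hypothetical subset: 20 plots of 400 m2 each, expressed in hectares.
n_plots = 20
sampled_area_ha = n_plots * 400 / 10000

# Hypothetical diameters at the base (cm).
sample = [2.0, 2.5, 3.1, 4.9, 5.0, 5.2, 8.4, 11.0]
table = density_per_class(sample, class_width=3.0, d_min=2.0,
                          sampled_area_ha=sampled_area_ha)
for lower, density in table.items():
    print(f"{lower:5.1f} to {lower + 3.0:5.1f} cm: {density:.2f} trees/ha")
```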
The distribution curves were subjected to fitting and validation through diameter distribution models. Plot data were randomly and equally divided into one database used for fitting and another used for validation. New class numbers and intervals were determined by this procedure. Exploratory data analysis (EDA) was performed prior to modeling with the aim of investigating data characteristics. This procedure avoids substantial errors and partial analyses with unsuitable data. The analysis reveals normality and outliers through the box-plot graphic technique (Ruppert 2011).

Tested probability density functions (PDF). Where: x is the center of the diameter class (x > 0) (cm); π is the constant "pi" (3.1416); μ is the arithmetic mean of the natural logarithm of the diameter (cm) in function 1; σ is the standard deviation of the natural logarithm of the diameter (cm) in function 1; exp is the exponential function; ln x is the natural logarithm of the diameter (cm) in function 1; Γ is the gamma function, $\Gamma(\alpha) = \int_0^{\infty} t^{\alpha-1} e^{-t}\,dt$; k, α, β and γ are function parameters: α and k are the shape parameter in function 2 and the location parameter in function 4 (α > 0); β is the scale parameter in functions 2, 3 and 4 (β > 0); γ is the shape parameter (γ > 0); α can be replaced by the minimum diameter of the stand (α = xmin) in function 4.
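For reference, the four candidate densities tested here (log-normal, gamma, Weibull 2P and Burr) are commonly written as shown below. These are standard textbook forms, not a transcription of the paper's table: the numbering 1 to 4, the assignment of α, β, γ and k to each function, and the choice of the Burr type XII form without a location parameter are all assumptions made for illustration.

```latex
% Standard textbook forms (assumed parameterization), for x > 0.
\begin{align}
f_1(x) &= \frac{1}{x\,\sigma\sqrt{2\pi}}
          \exp\!\left[-\frac{(\ln x-\mu)^2}{2\sigma^2}\right]
          && \text{(log-normal)} \\
f_2(x) &= \frac{x^{\alpha-1}\,e^{-x/\beta}}{\beta^{\alpha}\,\Gamma(\alpha)}
          && \text{(gamma)} \\
f_3(x) &= \frac{\gamma}{\beta}\left(\frac{x}{\beta}\right)^{\gamma-1}
          \exp\!\left[-\left(\frac{x}{\beta}\right)^{\gamma}\right]
          && \text{(Weibull 2P)} \\
f_4(x) &= \frac{\gamma k}{\beta}\left(\frac{x}{\beta}\right)^{\gamma-1}
          \left[1+\left(\frac{x}{\beta}\right)^{\gamma}\right]^{-(k+1)}
          && \text{(Burr type XII)}
\end{align}
```

In this parameterization the Burr density has two shape parameters (γ and k) and one scale parameter (β), which is what gives it more flexibility than the two-parameter alternatives.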
The following descriptive statistics were determined: mean, standard deviation, variation coefficient, skewness and kurtosis (Machado et al.
The parameters of the PDFs were obtained by maximum likelihood with the MASS (Venables and Ripley 2002) and fitdistrplus (Delignette-Muller and Dutang 2015) packages of the R software (R Core Team 2015), version 3.2.3.
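To make the maximum likelihood step concrete, the sketch below fits the log-normal function, the one candidate whose estimates have a closed form. It is a Python stand-in for illustration, not the R workflow used in the study; the function names and the sample diameters are hypothetical. The other three functions need numerical optimization, which is what packages such as fitdistrplus provide.

```python
import math


def fit_lognormal_mle(diameters):
    """Closed-form maximum likelihood estimates (mu, sigma) of the log-normal."""
    logs = [math.log(d) for d in diameters]
    n = len(logs)
    mu = sum(logs) / n
    # The ML estimator of sigma divides by n, not n - 1.
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / n)
    return mu, sigma


def lognormal_loglik(diameters, mu, sigma):
    """Log-likelihood of the sample under a log-normal(mu, sigma) density."""
    return sum(
        -math.log(d * sigma * math.sqrt(2.0 * math.pi))
        - (math.log(d) - mu) ** 2 / (2.0 * sigma ** 2)
        for d in diameters
    )


# Hypothetical diameters at the base (cm).
sample = [2.1, 2.4, 3.0, 3.3, 4.1, 5.2, 6.8, 9.5, 14.0]
mu_hat, sigma_hat = fit_lognormal_mle(sample)
loglik = lognormal_loglik(sample, mu_hat, sigma_hat)
print(f"mu = {mu_hat:.4f}, sigma = {sigma_hat:.4f}, log-likelihood = {loglik:.3f}")
```

Moving either estimate away from its fitted value lowers the log-likelihood, which is the defining property of the maximum likelihood solution.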

GOODNESS OF FIT TEST
The goodness of fit in different class intervals was assessed with the Kolmogorov-Smirnov test (Schneider et al. 2009). The calculated value Dn is obtained from Fo(x), the cumulative observed frequency; Fe(x), the cumulative expected frequency; and n, the number of observations.
Dn was compared with the tabulated Kolmogorov-Smirnov value at a probability level of 99% (Aigbe 2014, Diamantopoulou et al. 2015). This test was used to check the following hypotheses of the two-sided test: H0 = the observed diameters follow the proposed distributions; and H1 = the observed diameters do not follow the proposed distributions.
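The Kolmogorov-Smirnov comparison on class frequencies can be sketched as follows. The sketch assumes the statistic takes the form Dn = max |Fo(x) - Fe(x)| / n, with Fo and Fe as absolute cumulative frequencies, which is an assumption about the expression used. It also replaces the tabulated critical value with the large-sample approximation 1.63 / sqrt(n) for a 1% significance level. All frequencies are hypothetical.

```python
import math


def ks_statistic(observed, expected):
    """Largest gap between cumulative observed and expected class frequencies,
    divided by the number of observations (assumed form of Dn)."""
    n = sum(observed)
    cum_obs = 0.0
    cum_exp = 0.0
    max_gap = 0.0
    for obs, exp in zip(observed, expected):
        cum_obs += obs
        cum_exp += exp
        max_gap = max(max_gap, abs(cum_obs - cum_exp))
    return max_gap / n


def ks_critical_value(n, coefficient=1.63):
    """Large-sample approximation of the critical value (1.63 for alpha = 0.01)."""
    return coefficient / math.sqrt(n)


# Hypothetical observed and estimated frequencies per diameter class.
observed = [120, 60, 25, 10, 5]
expected = [115.0, 66.0, 24.0, 9.0, 6.0]

d_n = ks_statistic(observed, expected)
d_crit = ks_critical_value(sum(observed))
reject_h0 = d_n > d_crit  # True would mean the fitted function is rejected
print(f"Dn = {d_n:.4f}, Dcrit = {d_crit:.4f}, reject H0: {reject_h0}")
```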
Furthermore, the quality of the fit was also assessed by the Akaike Information Criterion (AIC), which allows models to be compared (Lima et al. 2014): $AIC = -2\ln(L) + 2k$, where L is the likelihood and k is the number of parameters.
The best functions were chosen by means of statistical scores, with the aim of summarizing the results and making the selection process easier.
The weighted value was determined by assigning weights to the calculated statistics. The functions were ranked according to their efficiency, with weight 1 attributed to the most effective function and increasing weights to the remaining functions. The weighted value was obtained as follows:

$WV = \sum_{i} Nr_i \cdot W_i$

Where: WV is the weighted value of the function in the ranking; W_i is the weight of the i-th position; Nr_i is the number of records obtained in the i-th position.
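The selection procedure can be illustrated with a short ranking sketch. It assumes AIC = -2 ln(L) + 2k and WV equal to the sum of Nr_i times W_i, with W_i equal to the rank position i, so a lower WV means a better function. The log-likelihoods and Dn values below are hypothetical numbers chosen only to exercise the code; they are not results from the study.

```python
from collections import Counter


def aic(log_likelihood, n_params):
    """Akaike Information Criterion; lower is better."""
    return -2.0 * log_likelihood + 2.0 * n_params


def rank_positions(scores):
    """Map each function to its position (1 = lowest, i.e. best, score)."""
    ordered = sorted(scores, key=scores.get)
    return {name: position for position, name in enumerate(ordered, start=1)}


def weighted_value(positions):
    """WV = sum of Nr_i * W_i, taking the weight W_i as the position i."""
    counts = Counter(positions)  # Nr_i: how often position i was obtained
    return sum(nr_i * w_i for w_i, nr_i in counts.items())


# Hypothetical fits: (log-likelihood, number of parameters) and Dn values.
fits = {"log-normal": (-510.2, 2), "gamma": (-518.9, 2),
        "Weibull 2P": (-523.4, 2), "Burr": (-508.0, 3)}
ks_dn = {"log-normal": 0.031, "gamma": 0.058, "Weibull 2P": 0.064, "Burr": 0.027}

aic_scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}
rank_aic = rank_positions(aic_scores)
rank_ks = rank_positions(ks_dn)

wv = {name: weighted_value([rank_aic[name], rank_ks[name]]) for name in fits}
best = min(wv, key=wv.get)
print(wv, "best:", best)
```

Note that the extra parameter of the three-parameter function is penalized by the 2k term, so it is ranked first on AIC only when its gain in log-likelihood exceeds that penalty.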

SKEWNESS AND KURTOSIS
The skewness and kurtosis coefficients were calculated according to the methodology recommended by Ruppert (2011).
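A minimal sketch of moment-based skewness and kurtosis is given below. It uses the standardized third and fourth moments with divisor n, under which the normal distribution has kurtosis 3. Whether this matches the exact estimator applied in the study is an assumption, and the sample is hypothetical.

```python
import math


def skewness_kurtosis(values):
    """Standardized third and fourth sample moments (divisor n)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    skew = sum(((v - mean) / sd) ** 3 for v in values) / n
    kurt = sum(((v - mean) / sd) ** 4 for v in values) / n
    return skew, kurt


# Hypothetical right-skewed sample of diameters at the base (cm).
sample = [2.0, 2.2, 2.5, 3.0, 3.4, 4.0, 6.5, 12.0]
skew, kurt = skewness_kurtosis(sample)
print(f"skewness = {skew:.3f}, kurtosis = {kurt:.3f}")
```

A positive skewness, as in this sample, corresponds to a tail extending toward the larger diameter classes.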

FUNCTION VALIDATION
The functions that showed the best goodness of fit according to the Kolmogorov-Smirnov test were submitted to validation with another database. The validation analysis consisted of predicting the frequencies per diameter class based on the function parameters obtained in fitting. According to Palahí et al. (2002), two statistics must be considered: Bias (%) and relative efficiency (E). Where: E is the relative efficiency (%); Yi is the observed density (number of trees/ha); Ῡ is the average density (number of trees/ha); Ŷi is the estimated density per diameter class (number of trees/ha); n is the number of observations. Bias (%) is also named "relative tendency". These scores provide the error and quality measurements of the validated functions; hence lower values of Bias and higher values of relative efficiency are desirable.
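The two validation statistics can be sketched as follows. The forms used are assumptions: relative bias as the mean residual expressed as a percentage of the mean observed density, and relative efficiency as 100 times one minus the ratio of the residual to the total sum of squares. The densities are hypothetical.

```python
def relative_bias(observed, predicted):
    """Mean residual as a percentage of the mean observed density (assumed form)."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_residual = sum(y - y_hat for y, y_hat in zip(observed, predicted)) / n
    return 100.0 * mean_residual / mean_obs


def relative_efficiency(observed, predicted):
    """Share of total variation explained by the function, in percent (assumed form)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    return 100.0 * (1.0 - ss_res / ss_tot)


# Hypothetical densities per diameter class (trees/ha) in the validation data.
observed = [300.0, 150.0, 60.0, 25.0, 10.0]
predicted = [290.0, 158.0, 63.0, 22.0, 9.0]

bias = relative_bias(observed, predicted)
efficiency = relative_efficiency(observed, predicted)
print(f"Bias = {bias:.2f}%, E = {efficiency:.2f}%")
```

With this sign convention, a positive bias means the function underestimates the observed densities on average.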

RESULTS
Descriptive statistics of Db are shown in Table III. The highest and lowest Db values found were 40.7 cm and 1.9 cm, respectively. About 47.91% of the individuals belong to P. bracteosa; hence the average Db of this species was very close to the value determined for the stand.
Among the species of highest importance value, M. urundeuva showed the lowest skewness and the largest interquartile range. The highest skewness and kurtosis were found for B. cheilantha (Fig. 2).
Although the box plot provides general information on location and dispersion, the most relevant feature is the tail of the distribution.
Outliers may adversely affect decisions based on the data. The skewness coefficients, mode, median and mean show positively skewed distributions, since the highest Db concentration is located on the left side. This pattern means that the tails of the distributions extend to the right side, where the mean is higher than the mode and the median. Kurtosis coefficients imply that most distributions are platykurtic, corresponding to curves that are flatter than the normal curve, with positive excess.
B. cheilantha had the highest individual density (96%), significantly concentrated in the lowest class. This result assures similarity between median and mode. Data from M. ophthalmocentra and A. pyrifolium showed unimodal distributions with positive skewness, meaning that the central tendency measurements are concentrated not within the first class but within the following classes.
The Kolmogorov-Smirnov test accepts the hypothesis of statistical similarity between expected and observed frequencies. However, the fits that showed a significant difference are inadequate to describe the database. Overall, the log-normal function showed the lowest AIC values for the stand and for the P. bracteosa, A. pyrifolium, M. urundeuva and B. cheilantha species. For the stand, M. ophthalmocentra and B. cheilantha, the fit was better with the Burr function (Table IV).
In addition to ranking, fitting quality may be attested by the frequency histograms and the total frequency estimations (Fig. 3). These results allow a preliminary visual appraisal of the histograms. The worst fits to the database were reported for the gamma and Weibull 2P functions, for both the stand and the species. The gamma function underestimated the frequency for the initial classes and overestimated the frequency for the intermediate classes. These results are attributed to the low flexibility of these functions, which generate decreasing curves.
The weighted values of the statistical scores classify the main functions that are valid to predict the diameter structure through model validation.
The structural patterns of diameter sizes were better described by the log-normal and Burr functions. The bias values (%) indicate a slight underestimation of frequencies in some diameter classes, although the bias values were low. The efficiency of the validated and accepted functions was suitable. Therefore, they provided effective results for function selection, since the validation statistics qualify as reliable those predictions for which the functions explain more than 90% of the total variation (Table V).
The description of the diameter distributions at the stand and individual species levels in the different sampling databases confirms the skewness of the distributions. The cumulative distribution curves are shown in Figure 4. We selected these distributions because they fully describe the probability distribution of a real-valued random variable X. The validation of the functions in the new database supports their use for selecting candidate models.
The functions correctly describe the tails of the distributions, except for the species M. urundeuva. A slight deviation of the log-normal function was noted in the left tail of the observed distribution, although it did not amount to a statistical difference. This behavior was expected due to the low asymmetry value found for this species.

DISCUSSION
The diameter distribution is a key method to describe the uniformity and growth of a stand. It provides crucial information for forest inventories on different levels of structure and dynamics of the area regarding variability of density within size classes (Assmann 1970). If the diameter distribution depicts a single peak with distortion of the density concentration to the left, this pattern must be kept despite silvicultural interventions (Rubin et al. 2006, Podlaski and Zasada 2008, Zheng and Zhou 2010). Some species stood out by the higher number of individuals within the initial classes, which defines the distribution as the "inverted-J" type. Other species showed the mode close to, but not within, the first class. Those types of distribution are classified as unimodal with positive skewness (Meyer 1952, McLaren et al. 2005, Podlaski and Roesch 2014).
The individual distribution of a single species by histograms is an attempt to evaluate its development stages, since the tool provides the proportions of individuals by class (Meyer 1952, Assmann 1970, Silva and Soares 2002). François De Liocourt (1898) reported that the ratio between the number of trees of successive diameter classes follows a decreasing geometric series, often with the "inverted-J" shape in natural forests. Each species has specific development and adaptation abilities; hence the diameter amplitude of the stand does not necessarily represent those of the species.
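The decreasing geometric series attributed to De Liocourt can be checked with a simple quotient between successive classes, as sketched below. The function name and the class frequencies are hypothetical; a roughly constant quotient greater than 1 is what the "inverted-J" pattern implies.

```python
def liocourt_quotients(frequencies):
    """Ratio between the frequency of each class and that of the next larger class."""
    return [
        frequencies[i] / frequencies[i + 1]
        for i in range(len(frequencies) - 1)
    ]


# Hypothetical trees per hectare in successive diameter classes.
classes = [400, 200, 100, 50]
quotients = liocourt_quotients(classes)
print(quotients)
```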
In this study, small class amplitudes were considered (maximum of 3 cm). The distribution behavior provides information on the successional stages of the area; hence it is possible to describe the structure of the whole forest or of one species due to the great number of individuals sampled within the first diameter class (Podlaski 2006).
This type of decreasing distribution in dry forests indicates that the regeneration is continuously happening as a consequence of the species' ability to adapt to extremely dry environments (Segura et al. 2003, Powers et al. 2009, Gunkel et al. 2013, Ferreira et al. 2015).
The analyses of the stand detected the highest individual concentration in the first diameter classes. This is mainly attributed to the high density and dispersion of the P. bracteosa species, which is often the dominant species in different successional stages of dry forests located in the Brazilian northeast region (Figueiredo et al. 2012, Ferreira et al. 2015). These data support the hypothesis that P. bracteosa has a relatively slow initial growth, but a strong resistance to drought and a great capacity to compete for light. Moreover, it has the ability to resprout from stumps and roots (Sampaio et al. 1998).
The wide diameter variability justifies the irregularity of the distribution; hence it may be necessary to fit models that express the forest structure by transformation of variables or that have parameters with estimation methods with more refined searches, such as the likelihood method (Gove 2003, Taubert et al. 2013) or neural networks (Diamantopoulou et al. 2015). For P. bracteosa, A. pyrifolium and M. urundeuva the best fitting was provided by the log-normal function, probably due to the transformation of the diameter variable. This procedure results in variance stabilization; hence it is an alternative method to correct the heterogeneity and reduce the amplitude by reduction of the deviations in relation to the average (Bliss and Reinker 1964, Nanang 1998, Schneider et al. 2009).
A difficulty is related to the traditional statistical methods used, which are not suitable. Overall, it is inferred that observed abundances fit the values predicted by different models if the goodness-of-fit tests do not show significant deviation. This is a mistaken application of significance-test logic because the distribution model is tested as the null hypothesis. As a consequence, acceptance will depend more on the power of the test than on the quality of the fit. Besides, those tests are not suitable to compare different models, since they evaluate the fit to one distribution at a time. Multiple tests cause other problems and are often inconclusive because the fits of different models may seem equal. The likelihood method is a potential solution to create protocols for the simultaneous comparison of many rival hypotheses. In this case, a simple alternative is to select the models according to information indexes (AIC) (Prado 2009).
The functions adequately generalize the database from dry natural forests, whose distribution is decreasing. Therefore, they are often applied to forest measurements (Sheykholeslami et al. 2011, Hussain et al. 2014).
Overall, the databases randomly selected for fitting and validation did not negatively affect the behavior of the distribution and the performance of the models. This is attributed to the sample properties that directly interfere in the dispersion measurements, central tendency, skewness and kurtosis (Ruppert 2011, Lima et al. 2014). In addition to empirical plots, descriptive statistics may help to select models to describe a distribution among a set of parametric distributions. Skewness and kurtosis are especially useful for this purpose. A non-zero skewness reveals a lack of symmetry of the empirical distribution, while the kurtosis value quantifies the weight of the tails in comparison to the normal distribution, for which the kurtosis is equal to 3 (Delignette-Muller and Dutang 2015).
For the decreasing distributions, increases in value concentration around the lowest classes result in higher kurtosis. Although kurtosis is often explained as the "flatness degree" of a frequency distribution, this parameter actually indicates the degree of value concentration around the center of the distribution (Ruppert 2011). Graphically, this peculiarity was associated with curves with more extended tails in the intermediate diameter classes and a sharper frequency peak in the initial classes; hence the mode of the distribution was characterized more clearly. In dry tropical forests, these characteristics of the diameter distribution described by skewness and kurtosis may be influenced by the forest dynamics. Precipitation indexes cause structural and ecological changes in tree growth and affect the distribution, usually decreasing skewness and kurtosis values (Nath et al. 2006, Hiltner et al. 2015).
As the diameters increase, the calculated probability falls exponentially to the right, and the selected functions generalize those estimations and capture the distribution displacement (Robinson and Hamann 2011, Burkhart and Tomé 2012). Although there was variation in class number for the stand and species in the fitting and validation, the log-normal and Burr functions fitted the observed data well and allowed feasible generalization (Nord-Larsen and Cao 2006).
The highest concentration of individuals for both the stand and the species points to a short life cycle with limited size due to genetic characteristics or short regeneration time, meaning that the forest is at the beginning of the regeneration process (Swaine et al. 1990, Sabogal 1992, Sagar and Singh 2005, Aide et al. 2012, Gunkel et al. 2013). Another factor may be the limited growth potential in the area, hindering the development of individuals into higher diameter classes (Gillespie and Jaffré 2003, Powers et al. 2009, Apgaua et al. 2015).
By using the diameter distribution tool, decision making concerning interventions in the structure of the most representative species of the dry forests will be more reliable. The generalization of the function demonstrates the need to develop individual models that better describe the data and allow the dynamics, competition indexes, skewness and kurtosis of the dry tropical forest distribution to be understood.

Figure 1 - Study site: inside details and location from the wider to specific level.

Figure 2 - Box plot of the diameter and potential outliers for the fit and validation databases of the highest importance value (HIV) and other species. Where: 1 = P. bracteosa; 2 = M. ophthalmocentra; 3 = A. pyrifolium; 4 = M. urundeuva; 5 = B. cheilantha. The black spots indicate the average values of the diameters for each species in their database.

Figure 3 - Diameter distributions and fitted curves of the models for the stand and HIV species.

Figure 4 - Cumulative diameter distribution and validation of the fitted functions for the stand and HIV species.

ROBSON B. DE LIMA et al.

TABLE I Phytosociological parameters of the five tree species of highest importance value (HIV).
Superscript numbers correspond to importance value. DE is the density; RDE is the relative density; DO is the dominance; RDO is the relative dominance; FQ is the frequency; RFQ is the relative frequency; and RIV is the relative importance value.
* Kurtosis is the degree of relative flattening or elevation of a distribution, usually determined in relation to the normal distribution. A distribution is called leptokurtic if the curve has a relatively high peak, with negative excess, i.e. a kurtosis coefficient < 0.263; it is platykurtic if the curve has a flatter top, with positive excess, i.e. a kurtosis coefficient > 0.263; the intermediate curve is called mesokurtic, with kurtosis coefficient = 0.263.
If the absolute value of the skewness coefficient lies between 0.15 and 1, the skew is considered moderate. If it is higher than 1, the skew is strong.

TABLE III Descriptive statistics of the data sets of diameter at base height (Db).
* Number of observations sampled in 40 plots.

TABLE IV Classification ranking for the diameter distribution models fitted for the stand and HIV species.
(*): hypothesis rejection value; (ns): not significant; Obs. means observed; Est. means estimated; Dcrit. is the tabulated value of the Kolmogorov-Smirnov test; AIC is the Akaike information criterion.