Height-diameter models for three subtropical forest types in southern Brazil

Total tree height (h) is often difficult to measure in natural forests. Regression models based on easily measured variables such as DBH (d) can be an alternative, provided that their assumptions are validated. The aims of this study are to: (i) calibrate specific and generic h-d models for three forest types (Seasonal Deciduous Forest, DEC; Mixed Ombrophilous Forest, MIX; and Dense Rainforest, DEN) in Santa Catarina state, testing the regression assumptions and evaluating model quality; and (ii) test whether the h-d relationship differs between forest types. The dataset (1,766 measured tree h and 3,150 estimated h) was collected by the Santa Catarina Forest and Floristic Inventory (IFFSC) in 418 systematically located sample plots. Models were calibrated for two datasets, one containing hypsometer measurements, the other h estimates made by field crews. Specific models were calibrated for species with at least 30 sampled trees. Residual normality, randomness and heteroskedasticity were evaluated by analytical methods. Confidence bands were generated by the Working-Hotelling method; a z test for means was applied to compare models based on the two datasets. Statistics such as the corrected Akaike Information Criterion provided evidence that logarithmic models fitted the data best. The models calibrated for the two datasets were statistically different for DEN and MIX. Differences in h-d relationships were found between forest types. Calibrated h-d models are an alternative for studying the relationships between these variables and for assessing vertical structure patterns of forest communities when h measurements are not feasible; however, in situations where more accurate h values are needed, they will not always provide reliable predictions.

Introduction
Some of the dendrometric variables usually assessed in forest inventories depend on each other, and the nature of their dependence (e.g. linear, quadratic, logarithmic or exponential) allows estimates through mathematical models that express the relationship between these variables. Mathematical modeling is useful when the dependent variable is costly to measure and the independent variables are easily obtainable. Calibrating regression models is the mathematical technique often used to construct estimates of a dependent variable.
Total tree height (h) is one of the variables that are difficult to measure, especially in tropical forests, where several layers and a discontinuous and dense canopy occur (Hu, 1992; Vanclay, 1994). This measurement is usually taken with an instrument (hypsometer) that records the angle and the distance to a tree in order to estimate its total height using trigonometric relations (Crecente-Campo et al., 2010).
This procedure is subject to systematic and random errors due to factors such as the distance between hypsometer operators and trees, device type, training level of operators, as well as crown form and total tree height, all of which can affect measurement accuracy (Silva et al., 2012). Since the height of a tree is correlated with its diameter at breast height (d), an alternative solution to this problem is the use of h-d models, which relate these two variables. In fact, the h-d relationship is one of the most important elements of forest structure (Tewari; Gadow, 1999).
Most h-d relationship studies refer to even-aged stands, probably due to the aforementioned difficulties of data collection in native forests. There are few studies concerning tropical forests (Fang; Bailey, 1998; Batista; Couto; Marquesini, 2001; Feldpausch et al., 2011; Scaranello et al., 2012). Batista, Couto and Marquesini (2001) analyzed the h-d relationship in an equatorial rainforest in Maranhão (Brazil) and in the Atlantic rain forest with Tabebuia cassinoides (Lam.) DC. in São Paulo (Brazil). Nonlinear models had a better biological foundation and performed better than linear models. Machado et al. (2008) tested 13 h-d models to estimate total height of Araucaria angustifolia (Bert.) O. Kuntze in Paraná state, Brazil. The authors achieved the best estimates with the logarithmic model proposed by Curtis (1967). Caetano et al. (2014) found that a second-order polynomial model was the best for Bertholletia excelsa Bonpl. intercropped with Hevea brasiliensis (Willd. ex A. Juss.) Muell. Arg. in Minas Gerais state, Brazil. Scaranello et al. (2012) studied the h-d relationship on an altitudinal gradient in the Atlantic Forest (Serra do Mar State Park, Brazil); the authors applied 11 h-d models, obtained the best estimates from the Weibull and Chapman-Richards nonlinear models, and concluded that models calibrated for specific altitude ranges produced better results than generic models. Feldpausch et al. (2011) concluded that different h-d relationships occur in different tropical forest types and that the h-d relationship is influenced by climatic, topographic and forest structural variables. Fang and Bailey (1998) investigated the h-d relationship of fallen trees on Hainan Island (China) and, among 33 mathematical models, the best estimates were generated by an exponential model for height curves published by Meyer (1940).
Tewari and Gadow (1999) used median regressions and the 5% and 95% percentile curves of the bivariate SBB distribution to study the h-d relationship of a pure Acacia tortilis Hayne stand in India and of a mixed forest in Germany containing Fraxinus excelsior L., Fagus sylvatica L. and Acer sp. The percentile curves of the SBB distribution indicated that the variation in h for a given d is less pronounced for larger trees. Huang, Price and Titus (2000) calibrated 27 h-d models for Picea glauca (Moench) Voss in the Alberta Boreal Forest region and found that logistic models produced more satisfactory predictions. The authors found that the models behave differently in different ecoregions and that applying a model adjusted for one ecoregion to another can bias the estimates, with overestimates of up to 29.5% or underestimates of up to 21.92%.
Dendrometric data collected by the first cycle of Santa Catarina Forest and Floristic Inventory (IFFSC) in southern Brazil give a unique and unprecedented opportunity to explore the quantitative and qualitative aspects of the state's forest cover. IFFSC applied a systematic sample design, measured between 2007 and 2010 at equal probability sample points (Vibrans et al., 2010). Thus, the objectives of this study were to (i) calibrate specific and generic h-d models for three forest types in Santa Catarina state and to test the regression assumptions; (ii) choose the best model based on statistical indicators; (iii) compare the calibrated models for the measured and estimated datasets; (iv) compare the h-d relationships of each forest type.

Study location
Santa Catarina state is located between latitudes 26° and 29° S and longitudes 48° and 53° W, with an area of 95,346 km². Three forest types, manifestations of important variations of geomorphological, climatic and edaphic site conditions, were covered in this study (Klein, 1978; Vibrans et al., 2010): Seasonal Deciduous Forest (DEC), Mixed Ombrophilous Forest with Araucaria (MIX) and Dense Rainforest (DEN). According to the Köppen-Geiger climatic classification (Kottek et al., 2006), Santa Catarina has two climatic types: Cfa (fully humid temperate climate with warm summer) and Cfb (fully humid temperate climate with cool summer). The climates are defined primarily by the temperature difference due to altitudinal variation. Long-term (> 30 years) averages of mean annual temperature are 18.35, 16.36 and 18.90 °C for DEC, MIX and DEN, respectively, while mean annual precipitation is 1,646, 1,632 and 1,574 mm (EPAGRI, 2002). The altitude range (m a.s.l.) of the sample plots is 503-898 m for DEC, 514-1,560 m for MIX and 2-1,195 m for DEN.

Data collection and analysis
Data used in this study were obtained by IFFSC between 2007 and 2011 (Vibrans et al., 2010), using the Brazilian National Forest Inventory methodology (Freitas et al., 2010). The sampling design features a systematic distribution of sample points located at the intersections of a 10 km × 10 km grid that covers the entire state, with the exception of DEC, for which a 5 km × 5 km grid was used due to its highly fragmented status.
Of the 418 sample plots, 78 were located in forest remnants of DEC, 143 in MIX and 197 in DEN. The IFFSC sample plot consists of a cluster of four crosswise 1,000 m² subplots (20 m × 50 m), each one located at a distance of 30 m from the plot center. Inside the sample plot limits, the field crews measured the d (diameter at breast height, 1.30 m) of every tree with d ≥ 10 cm; total tree heights were visually estimated by the field crews, and the total heights of up to eight trees per sample plot, spanning all diameter classes, were measured with a hypsometer (Vibrans et al., 2010). In total, 1,766 measured tree heights and 3,150 estimated heights were used to calibrate specific and generic h-d models as well as to examine the reliability of visual estimates made by the field crews.
The h-d models (Table 1) were calibrated in a generic way for two datasets: (i) h measured with hypsometers and (ii) h estimated by field crews. For further reference, these data are treated as the measured dataset and the estimated dataset.
Considering that the height for a given diameter can vary significantly between species (King, 1996) and regions (Huang; Price; Titus, 2000; Feldpausch et al., 2011), h-d models were also calibrated specifically for some species and for each forest type, in order to verify possible improvements in model quality or characteristic h-d relations. Specific models were calibrated for species having a sample size of at least 30 trees, with the exception of Ocotea puberula in MIX (22 trees) and Clethra scabra in DEN (29 trees) (Table 2).
A random sample of 3,250 trees was drawn from the entire estimated dataset to make the data easier to handle. For the measured dataset, this procedure was unnecessary, given the smaller number of measured trees.
The normality of the datasets was evaluated using the Kolmogorov-Smirnov test with α = 0.05. Outliers were detected using the standardized scores (mean 0 and standard deviation 1) of the indexes h/d and d/h, which showed a normal distribution according to the Shapiro-Wilk test with α = 0.05; values whose absolute score was greater than 3 were considered outliers (for normal data, about 99.7% of standardized scores lie within ± 3 standard deviations). Excluding outliers, the tree species with a sufficient number of measured individuals (n ≥ 30) for each forest type in Santa Catarina are presented in Table 2.
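The outlier screening described above can be sketched as follows. This is a minimal illustration in Python, not the authors' implementation: the function name `flag_outliers` and its arguments are hypothetical, and the same function can be applied to the d/h index by swapping the two inputs.

```python
import statistics

def flag_outliers(heights, diameters, threshold=3.0):
    """Return the indices of trees whose standardized h/d index
    exceeds `threshold` in absolute value.

    heights and diameters are equal-length sequences of positive numbers.
    The threshold of 3 follows the rule stated in the text.
    """
    ratios = [h / d for h, d in zip(heights, diameters)]
    mean = statistics.fmean(ratios)
    sd = statistics.stdev(ratios)  # sample standard deviation
    return [i for i, r in enumerate(ratios) if abs((r - mean) / sd) > threshold]
```

Note that with very small samples no score can exceed 3 (the largest possible standardized score for n observations is (n - 1)/√n), so the rule is only meaningful for reasonably large datasets such as the ones used here.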
The h-d models were linearized for calibration using the ordinary least squares method. The root mean square error (RMSE), the pseudo-coefficient of determination (R²*; Naesset, 2011) and the corrected Akaike information criterion (AICc) were calculated for each model, all based on back-transformed data (original scale). The R²* was used since the assumptions underlying R² are not completely satisfied when using nonlinear models (Anderson-Sprecher, 1994).
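The calibration and evaluation steps can be illustrated with a short Python sketch. Several choices here are assumptions made for illustration only and are not taken from the study: the log-log form ln(h) = b0 + b1 ln(d) stands in for the models of Table 1, RMSE is computed with n in the denominator, AICc uses the common least-squares form n ln(RSS/n) + 2k with the small-sample correction, and k = 3 counts the two coefficients plus the error variance. All function names are hypothetical.

```python
import math

def fit_loglog(d, h):
    """Ordinary least squares fit of ln(h) = b0 + b1 * ln(d); returns (b0, b1)."""
    x = [math.log(v) for v in d]
    y = [math.log(v) for v in h]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

def predict(d, b0, b1):
    """Back-transform predictions to the original scale (no bias correction)."""
    return [math.exp(b0 + b1 * math.log(v)) for v in d]

def fit_statistics(h, h_hat, k=3):
    """Return (RMSE, pseudo-R2, AICc) computed on the original scale.

    k is the assumed number of estimated parameters.
    """
    n = len(h)
    rss = sum((obs - pred) ** 2 for obs, pred in zip(h, h_hat))
    mean_h = sum(h) / n
    tss = sum((obs - mean_h) ** 2 for obs in h)
    rmse = math.sqrt(rss / n)
    pseudo_r2 = 1.0 - rss / tss
    aic = n * math.log(rss / n) + 2 * k
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)
    return rmse, pseudo_r2, aicc
```

Under these assumptions, candidate models fitted to the same data would be ranked by `aicc`, with lower values indicating the preferred model.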
Overall regression significance was assessed using the F test (α = 0.05) of the regression's analysis of variance. A model ranking procedure was used to select the model with the best performance: among the significant models (for both overall and parameter significance), the one with the lowest AICc was selected. For the best models, residuals were analyzed to check the basic regression assumptions: normality was investigated using the Kolmogorov-Smirnov test, randomness using the Runs test and homoskedasticity using the Brown-Forsythe test (Neter et al., 1996). The F test for lack of fit (Zar, 2010) was conducted on the linearized scale to assess whether a linear function adequately describes the relationship between the variables. All the aforementioned hypothesis tests used α = 0.05. Confidence bands were generated by the Working-Hotelling method (Neter et al., 1996) with α = 0.05. The interpretation of these bands is analogous to the interpretation of confidence intervals for each regression estimate. However, the confidence bands are wider because they relate to the regression line as a whole and not to specific estimates.
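For reference, the Working-Hotelling band for a simple linear regression on the linearized scale can be written as below. This is the textbook form for a model with one predictor; the notation (X for the transformed predictor, Y for the transformed response, MSE for the residual mean square) is introduced here for illustration and is not taken from the study.

```latex
\hat{Y}_h \pm W \, s\{\hat{Y}_h\}, \qquad
W^2 = 2\,F(1-\alpha;\, 2,\, n-2), \qquad
s^2\{\hat{Y}_h\} = \mathrm{MSE}\left[\frac{1}{n} +
  \frac{(X_h - \bar{X})^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2}\right]
```

A pointwise confidence interval would use $t(1-\alpha/2;\, n-2)$ in place of $W$; since $W$ is larger than this t quantile, the band is wider than the pointwise intervals at every $X_h$.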
Next, each pair of models calibrated for the measured and estimated datasets was compared. A simulation of 80 random values of d (10-80 cm range) was conducted and h was predicted by each of the two models being compared. The mean values of those 80 predicted h were compared through a z test for independent means (Triola, 1999) with α = 0.05. This procedure was conducted 100 times for all the best-performing models (generic and specific) in order to prevent hasty inferences based on a single simulation of d values.
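The simulation procedure can be sketched in Python as follows. The sketch assumes that d values are drawn uniformly from the 10-80 cm range (the sampling distribution is not stated in the text) and that a two-sided test is used; the function names and the `seed` argument are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean, variance

def z_test_means(sample_a, sample_b):
    """Two-sided z test for the difference between two independent means.

    Returns (z, p_value), using the sample variances as plug-in estimates.
    """
    se = math.sqrt(variance(sample_a) / len(sample_a)
                   + variance(sample_b) / len(sample_b))
    z = (fmean(sample_a) - fmean(sample_b)) / se
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value

def compare_models(model_a, model_b, n_values=80, n_runs=100,
                   d_min=10.0, d_max=80.0, alpha=0.05, seed=0):
    """Count how many of n_runs simulations reject equality of mean predicted h.

    model_a and model_b are callables mapping a diameter (cm) to a height (m).
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_runs):
        d = [rng.uniform(d_min, d_max) for _ in range(n_values)]
        _, p_value = z_test_means([model_a(v) for v in d],
                                  [model_b(v) for v in d])
        if p_value < alpha:
            rejections += 1
    return rejections
```

Repeating the test over many simulated sets of d values, as the text describes, shows whether the conclusion depends on a particular draw.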
To study the differences in the h-d relationship among the three forest types, it is necessary to analyze the h-d models within a common range of the predictor variable (d). Graybill and Iyer (1994) proposed that a statistical test should not be used by itself but in combination with the construction of simultaneous confidence intervals using dummy variables, based on Student's t distribution; these intervals show the differences between the regressions instead of the single, static conclusion that a hypothesis test yields. The intervals must be constructed simultaneously to avoid inflating the overall significance level. Bonferroni corrections are a usable approach for this purpose, resulting in conservative confidence intervals. The confidence intervals for the regression parameters (α = 0.05) were constructed for a minimum d of 10 cm and a maximum of 80 cm, which is the most sampled diameter range. The models considered in this analysis were the generic ones calibrated for the measured dataset.
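As a general illustration of the Bonferroni approach (the exact parameterization used in the study is not given in this section), a set of $g$ simultaneous intervals with family confidence level $1-\alpha$ takes the form below, where $\hat{\Delta}_j$ denotes an estimated difference between two regressions (for example, at d = 10 cm or d = 80 cm), $s\{\hat{\Delta}_j\}$ its standard error and $\nu$ the residual degrees of freedom. These symbols are introduced here for illustration only.

```latex
\hat{\Delta}_j \pm t\!\left(1 - \frac{\alpha}{2g};\, \nu\right) s\{\hat{\Delta}_j\},
\qquad j = 1, \dots, g
```

An interval that contains zero gives no evidence of a difference between the two regressions at that value of d.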

Results and discussion
The results of the calibration process are presented in Table 3 for the two datasets. The measured data are plotted over their calibrated models to show the dispersion around the regression line, circumventing the need for residual plots. Since the residual analysis was based on analytical procedures, visual analysis was not necessary. The best-fitted models for each dataset were also plotted on the same Cartesian plane. These graphs for DEC, MIX and for all data (Overall generic) are shown in Figures 1 and 2.
The model residuals were subjected to Kolmogorov-Smirnov (normality), Runs (randomness) and Brown-Forsythe (heteroskedasticity) tests. All models presented an expected residual behavior; the tests showed no evidence to reject the null hypothesis (p ≥ 0.05). Therefore, the residuals may be regarded as satisfying the aforementioned assumptions. The F tests for overall regression significance rejected the null hypothesis of absence of regression for every calibrated model. All the F tests for lack of fit presented non-significant p-values (p ≥ 0.05), except the generic model of DEN (model n°9) and Overall generic model (model n°9).
The species-specific and generic models calibrated for the measured dataset presented substantially lower AICc than those for the estimated dataset. This suggests that data variance is higher for the heights estimated by field crews. This is reflected in the results of the z tests for means, which rejected the null hypothesis for the DEN, MIX and Overall generic models (all p-values < 0.05 across the 100 runs), meaning that the models calibrated for each dataset did not estimate the same mean height. For DEC, the z test failed to reject the null hypothesis (all p-values ≥ 0.05). For all species-specific models the z tests were not significant, except for Ocotea puberula in DEC.
Different h-d relationships between the forest types were observed, as found by Feldpausch et al. (2011) for different forest types distributed throughout the tropics. These differences can be verified by examining the difference between the adjusted parameters for the regressions (Feldpausch et al., 2011). Table 4 shows the simultaneously generated confidence intervals for these parameters.
For both minimum and maximum d, the only confidence interval containing zero is for the difference between DEC and MIX, providing no evidence to reject the null hypothesis; so, there is no evidence for assuming differences in the h-d relationship between the two forest types. The DEN results indicate significant differences from the other two forest types for both minimum and maximum d. Figure 3 shows the simultaneous confidence intervals for the three models with d ranging from 10 to 80 cm.
Among the 10 calibrated h-d models, the logarithmic models of Loetsch, Zohrer and Haller (1973) (models n°5 and 6) proved to be the most effective in 86% of cases. Their advantage is associated, in part, with the fact that the logarithmic scale tends to correct the heteroskedasticity that arises from the positive correlation between the mean and variance of the dataset (Sokal; Rohlf, 1987). Log-linear models have good qualities, such as satisfactory precision and easy adjustment, and are widely used for allometric purposes (Brown; Gillespie; Lugo, 1989; Fang; Bailey, 1998; Nogueira et al., 2008; Breidenbach et al., 2014; Westfall, 2014; McRoberts et al., 2015). However, when model parameters are estimated using data transformed to the logarithmic scale and prediction is desired on the original scale, an adjustment term must be added to the prediction to compensate for the bias introduced by the transformation (Baskerville, 1972). This procedure is often ignored by the Brazilian forestry community. In this study, the addition of the correction terms to the predictions was considered negligible because the residual variance (s²res) was less than or equal to 0.03; predictions were little affected by a multiplicative factor of 1.02.
The use of multivariate models may be an alternative that could improve the prediction and explanation of h (Neter et al., 1996). Although the purpose of mathematical modeling is to provide an average estimate of the response variable, models calibrated for highly variable data will not generate fully reliable predictions.
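The size of the log-transformation bias correction mentioned above can be checked with a short calculation. In the usual form of the Baskerville (1972) correction for natural-log models, the back-transformed prediction is multiplied by a factor that depends only on the residual variance on the log scale; the notation below is introduced for illustration.

```latex
\hat{h}_{\text{corrected}} = \exp\!\left(\widehat{\ln h}\right)
  \cdot \exp\!\left(\frac{s^2_{res}}{2}\right),
\qquad
\exp\!\left(\frac{0.03}{2}\right) = \exp(0.015) \approx 1.015
```

With the largest residual variance reported (0.03), the factor is about 1.015, which rounds to the 1.02 cited in the text, that is, a correction of roughly 1.5% in predicted h.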
The calibration of h-d models appears to be an alternative for studying different relationships between these variables (Feldpausch et al., 2011; Scaranello et al., 2012) and for assessing vertical structure patterns of forest communities when h measurements are not feasible; however, in situations where more accurate h values are needed, they will not always provide reliable predictions.
The comparison between the models adjusted for the measured and estimated datasets showed that most of the species-specific models predicted the same mean values for h. This result demonstrates the overall quality of the visual estimates made by IFFSC's field crews. Thus, visual estimation may be a feasible operational technique, since instrumental measurements are often laborious and time-consuming to obtain. However, the generic models did not predict the same mean h (except for DEC) for the measured and estimated datasets. These facts, together with the higher AICc of generic models, may be evidence that species-specific models are more appropriate when predictions of h are desired. In addition, species-specific models are also appropriate for analyzing the different h-d relationships between species (King, 1996) or regions/sites (Huang; Price; Titus, 2000; Scaranello et al., 2012).
The variability within the h-d data is an obstacle to the calibration of models for native forests. There is a large dispersion of the observed values around the model-predicted values, as also found by Huang et al. (2000). The R²* values were not large, with a maximum of 0.66. However, the F tests revealed that all regressions were statistically significant, providing evidence for the existence of correlations between h and d, but not necessarily of a causal relationship between these variables. As with site conditions in even-aged forest stands, tree height is probably influenced by other random explanatory factors not included in this study. Some factors may explain the results concerning the h-d relationships of the three forest types. Variables such as basal area are closely related to vegetation structure and can produce variation in h-d allometry (King et al., 2009; Feldpausch et al., 2011). This observation made by Feldpausch et al. (2011) may, in part, explain the different h-d relationships observed for DEN in relation to the other forest types. The total basal area (m² ha⁻¹) of this forest type proved to be statistically different from that of the other forest types (Vibrans et al., 2012). The influence of basal area is explained by the high competition for light, under which individuals tend to grow first in height to reach the upper forest layers (Feldpausch et al., 2011). King et al. (2009) observed different h-d relationships in two well-developed forests in Malaysia with different basal areas. These differences were also observed by King (1981) and Hummel (2000), who found that trees growing in more open environments have larger diameters compared to trees growing in closed environments such as tropical rainforests; therefore, basal area should perhaps be included in new models as a predictor variable to account for competition.
Geomorphological and climatic factors may also influence the h-d relationship, as well as altitude (Huang et al., 2000; Lawton, 1982; Feldpausch et al., 2011; Scaranello et al., 2012). While DEN occurs in coastal sedimentary plains and the granulitic complex of the coastal range at altitudes between sea level and 1,200 m, MIX occurs on the sedimentary and basaltic plateaus of Santa Catarina, at altitudes ranging from 500 to 1,500 m a.s.l., where the winter season is characterized by the frequent occurrence of ground frost and temperatures that are considerably lower than those of the coastal region. According to Woodward (1993), wind speed increases with increasing altitude. The higher altitude of the Santa Catarina plateau exposes the vegetation to stronger winds and may influence the h-d relationship, as observed by Lawton (1982). Scaranello et al. (2012) noticed that the decrease in air and soil temperature at higher altitudes can also influence h-d relationships.

Conclusions
The regression assumptions for all the presented h-d models were validated; this allows for estimates with valid confidence intervals. The high variance within h-d datasets of native forests is a limitation for the calibration of models that are intended to be used as reliable predictors of h. The calibration of h-d models proved to be an alternative for studying different relationships between these variables, although in situations where more accurate h values are needed they will not always provide reliable predictions. Compared to the generic models, the species-specific ones are preferable when predictions of h are desired. Most of the species-specific models calibrated for the measured and estimated datasets estimated the same mean h. This is evidence that visual estimation can be used when field crews are appropriately trained; it is necessary to measure the h of some trees in the field to create a reference for further estimates. There are differences in the h-d relationship of DEN when compared to the other two forest types. However, no significant differences were found in the h-d relationships between MIX and DEC, possibly due to the similar climatic and geomorphological conditions as well as the structural and floristic similarity of these two forest types.