Modeling and mapping basal area of Pinus taeda L . plantation using airborne LiDAR data

Basal area (BA) is a good predictor of timber stand volume and forest growth. This study developed predictive models using field and airborne LiDAR (Light Detection and Ranging) data for estimation of basal area in Pinus taeda plantation in south Brazil. In the field, BA was collected from conventional forest inventory plots. Multiple linear regression models for predicting BA from LiDAR-derived metrics were developed and evaluated for predictive power and parsimony. The best model to predict BA from a family of six models was selected based on corrected Akaike Information Criterion (AICc) and assessed by the adjusted coefficient of determination (adj. R2) and root mean square error (RMSE). The best model revealed an adj. R2=0.93 and RMSE=7.74%. Leave one out cross-validation of the best regression model was also computed, and revealed an adj. R2 and RMSE of 0.92 and 8.31%, respectively. This study showed that LiDAR-derived metrics can be used to predict BA in Pinus taeda plantations in south Brazil with high precision. We conclude that there is good potential to monitor growth in this type of plantations using airborne LiDAR. We hope that the promising results for BA modeling presented herein will stimulate to operate this technology in Brazil.


INTRODUCTION
In Brazil, pine plantations cover 1.59 million hectares, and are the most important long fiber source for pulp and paper production-accounting for approximately 20.54% of the country's total reforested area (Ibá 2015).Most of the pine plantations are concentrated in south Brazil, with 34.1 and 42.4% of the total reforested area located in Paraná and Santa Catarina states (Ibá 2015).Pinus taeda L., whose common name is loblolly pine, is extremely valuable since it is the most planted tree species in the region (Kohler et al. 2014).
Forest inventory in loblolly pine is typically conducted annually to monitor forest growth in Brazil-allowing managers to identify problematic CARLOS A. SILVA et al. conditions during initial growth stages, and determine optimal harvest time.Inventories measure tree diameter and basal area.Basal area describes the state of the stand and it aids directly the thinning decisions (Reukema 1975, Emmingham andGreen 2003).Diameter and basal area distributions are therefore used in many forest management planning packages for predicting stand volume and growth (Gobakken and Naesset 2004).The most accurate method of estimating basal area in loblolly pine plantations is to physically sample it in the field.However, field measurements of forest attributes over large areas are limited by budgets and time, making them impractical (Hummel et al. 2011, Silva et al. 2014).
Light Detection and Ranging (LiDAR), is a powerful and well-suited technology for mapping many forest attributes, including basal area.LiDAR is very useful for providing high resolution, threedimensional information of vertical and horizontal forest structures and the underlying topography (Silva et al. 2016a).In forestry, key LiDAR applications include high accuracy retrieval of tree density, stem volume, above ground carbon, leaf area index and basal area (e.g.Naesset and Bjerknes 2001, Andersen et al. 2005, Hudak et al. 2006, Silva et al. 2014).LiDAR has been widely used to quantify and map forest attributes in conifer and deciduous forests types (e.g., Naesset 1997, Naesset and Bjerknes 2001, Popescu et al. 2003, Popescu 2007).However, in Brazil LiDAR has not been widely used to predict forest attributes in pine plantations as it has been used to predict and map forest attributes in Eucalyptus plantations (e.g., Silva et al. 2014, 2016a, 2017, Carvalho et al. 2015).
In many countries, LiDAR has already moved from the research arena to operational reality in the area of forest resource inventory.In Brazil, the use of LiDAR for predicting forest inventory is still really new and therefore requires more research to test and validate this technology in Brazilians pine forest plantations.Thus far only Zandoná et al. (2008) andMüller et al. (2014) have used airborne LiDAR data for forest inventory purposes in loblolly pine plantations in south Brazil.
This study was designed to test the accuracy of LiDAR-based approach to quantify BA in P. taeda plantation, in southern Brazil.The specific aims were to: i) determine which LiDAR-derived metrics are most useful to predict BA; ii) derive, test and select the best model for predicting BA; and iii) apply the best model across the landscape to map the spatial distribution of BA at the standlevel.

STUDY AREA DESCRIPTION
The study area consisted of four P. taeda plantations located within the Telêmaco Borba municipality in the state of Paraná, Southern Region of Brazil (Fig. 1).The climate of the region is characterized as warm and temperate (Cfa) (Köppen and Geiger 1928), with mean annual precipitation of 1378 mm and the mean annual temperature of 18.4º C. The regions topography is complex, ranging from 618 m to 905 m.The plantations are managed by Klabin Celulose S/A, a pulp company, and all the trees were planted evenly along a 2.5 x 2.5 m or 3.0 x 2.0 m grid configuration, resulting in a mean tree density of 1,600 and 1,667 trees per ha, respectively.FIELD DATA COLLECTION A total of 50 rectangular plots of approximately 500 -620 m 2 each were established in stands ranging in ages from three to nine years old.All plots were georeferenced with a geodetic GPS with differential correction capability (Trimble Pro-XR).
For each GPS location, we recorded data for a time period ranging from 5-10 min, which allowed us to reduce the horizontal error to the level of 10 cm.In each plot, all individual trees were measured for diameter at breast height (DBH) at 1.30 m, and the total basal area (BA; m 2 ha -1 ) was then calculated for each sample plot using the following the equation: Where: BA n is basal area in m 2 ha -1 for the plot n; DBH is the diameter at breast height (1.30 m) in cm for tree i; PA is plot area in m 2 .Summary statistics of BA per stand ages are presented in Table I.

LIDAR DATA ACQUISITION AND PROCESSING
LiDAR data were obtained using a Harrier 68i sensor mounted on a CESSNA 206 aircraft.Table II describes the characteristics and precision of the LiDAR data.LiDAR data processing consisted of several steps that ingested the LiDAR point cloud data and provided two major outputs: the digital terrain model (DTM), and the LiDAR-derived canopy structure metrics.All of the LiDAR processing was performed using the US Forest Service FUSION/LDV 3.42 software toolkit (McGaughey 2015).
Initially the Catalog tool in FUSION/LDV was used to generate a descriptive report of the LiDAR data set.A filtering algorithm, based on Kraus and Pfeifer (1998) and available in the Groundfilter tool, was applied to differentiate between ground and vegetation points.DTMs were generated using the classified ground points with a spatial resolution of one meter using GridSurfaceCreate.Afterwards, the ClipData function was applied to normalize heights and to assure that the z coordinate for each point corresponded to the height above ground and not the orthometric elevation of the single point.The PolyClipdata function was then used to subset the LiDAR points within each of the 50 in situ- measured sample plots, and the CloudMetrics tool was applied to compute the LiDAR-derived metrics using all the available returns of the point cloud.Finally, the GridMetrics function was used to generate the same LiDAR metrics as computed with CloudMetrics, but now within grid cells of 25 m spatial resolution across the landscape.The list of the LiDAR metrics computed in this are presented in the Table III.

MODELING DEVELOPMENT AND ASSESSMENT
Pearson's correlation (r) was used to identify highly correlated predictor variables (r>0.9);redundant predictors were subsequently excluded to help guard against overfitting the predictive model.We determined the best LiDAR metrics using the regsubsets function in the Leaps package in R (R Development Core Team 2015), which selects the best subset of predictor variables using the Mallows Cp statistic (Mallows 1973).We then used the Lm linear model function in R to define the prospective linear regression models.To measure the quality of each model, the corrected Akaike information criterion (AICc) (Akaike 1973(Akaike , 1974) ) was calculated and ranked accordingly by minimum AIC.Model ranking was done using AICc since we had small sample size (n/p < 40, where n is number of samples and p is number of parameters of the model) (Hurvich and Tsai 1989).Residuals from all prospective models were analyzed graphically and tested for normality using the Shapiro-Wilk test (Shapiro and Wilk 1965) and for heteroscedasticity using the Breusch-Pagan test (Breusch and Pagan 1979).
BA was positively skewed, causing poor model fits at the tails of a distribution because ordinary least squares (OLS) regression assumes a normal distribution in the response variable.Therefore, natural logarithm (ln) transforms were applied to the response variable before modeling.The final BA prediction on the natural scale was obtained by multiplying the output prediction by the correction factor of exp (0.5 x MSE), were MSE is the mean square error of the residuals.The precision of the model predictions were then evaluated in terms of Person's correlation (r; eq.2), adjusted coefficient of determination (adj.R²; eq.3 and 4), and Root Mean Square Error (RMSE; eq.5 and 6), both absolute (m 2 ha -1 ) and relative (%):

Elevation accuracy 15cm
Operating altitude 666 m

Scan frequency 300 kHz
Pulse density 4 pulses/m² ( ) Where n is the number of plots, is the observed value for plot i, and is the predicted value for plot i, and are the mean of the observed and predicted BA values, p is the number of independent variables.Statistical equivalence tests were employed (Robinson 2015) to assess whether the predicted BA is statistically similar (i.e., equivalent) to the field-based, observed BA.We defined an acceptable accuracy as a relative RMSE below 15%.Based on the AICc values, the best model was selected and evaluated by means of a leave-oneout cross-validation -LOOCV strategy.We used the yaImpute package in R (Crookston and Finley 2008) to apply the best model across the landscape to map the spatial distribution of BA of P. taeda at the stand level.An overview of the methodology is outlined in Fig. 2.

RESULTS
According to Pearson's correlation (r), 24 (80%) of the LiDAR-derived metrics were highly correlated (r>0.9).The remaining six metrics not highly correlated were HSD, HCV, HSKE, HKUR, H99TH and COV.Even though the number of metrics was greatly reduced, these metrics still represented the canopy height (e.g., H99TH), cover (COV) and  variation (HCV).Table IV summarizes the Pearson statistics.

HMAX
The best metrics ranked according to the regsubset for predicting BA were H99TH, HCV, COV, HSKE, HKUR.Table V shows the competing models and diagnostic statistics for predicting plot-level BA from LiDAR-derived metrics.Six best subset models were developed, and for these models the adj.R² ranged from 0.84 to 0.97.The AICc ranged from -108.67 to -67.30, absolute RMSE ranged from 1.96 to 3.04 m 2 ha -1 , and relative RMSE ranged from 7.54 to 11.71%.
We observed that the inclusion of more than three metrics did not improve the model fit significantly.Therefore, the best model according to the lower AICc values was the model that included H99TH, HCV and COV metrics as predictors of BA.This model produced adj.R 2 and RMSE (%) of 0.93 and 7.74%, respectively.The leave-one-out cross validation showed a high model's stability with adj.R 2 and RMSE (%) of 0.92 and 8.31%, respectively.Equivalence plots of observed versus predicted BA via best model and leave-one-out cross validation also indicated that predicted and observed BA were statistically equivalents (Fig. 3).Moreover, the model residuals exhibited both normality and homogeneity of variances (p > 0.05) when evaluated by Shapiro-Wilk and Breusch-Pagan tests, respectively.The W and Chisquare statistics for Shapiro-Wilk and The MSE (0.005) for best BA model was substituted in the correction factor equation and the predicited BA was multiplied by the computed correction factor of 1.002 to converted the predicions from natural-logarithm-transformed scale to natural scale.Thus , the predicted BA of P. taeda for the 50 sample plots in this study ranged from 13.43 to 39.13 m 2 ha -1 (Fig. 4).Moreover, the selected best model was applied across the extent of the study area to map the BA at the landscape level.Fig. 5 shows the spatial distribution of the predicted BA for three stands with ages ranging from 4 to 8 years old.Predicted BA (m 2 ha -1 ) on natural scale per hectare for stands A, B and C were approximately 17.59 (± 2.37); 24.87 (± 3.59); 28.48 (± 5.68) m 2 ha -1 , respectively.

DISCUSSION
LiDAR is an important forest inventory data source when combined with appropriately designed sample plots in the field and with appropriate modeling tools (Hudak et al. 2006), and even though LiDAR data have been used widely for forest inventory (Naesset 1997, Maltamo et al. 2004, Hudak et al. 2006, 2014, Silva et al. 2016b), this is the first time that airborne LiDAR has been applied to predict and map basal area at plot-and landscape-levels pine plantations of south Brazil.
LiDAR is increasingly being used as a means for obtaining forest structural measurements over    large areas, and many LiDAR metrics can be used as candidates for predicting forest structure attributes (Hyyppa et al. 2008, Wulder et al. 2008, Van Leeuwen and Nieuwenhuis 2012, Treiz et al. 2012, Silva et al. 2014).In the P. taeda plantation environment due the homogeneity of canopy structure, we observed that most of LiDAR-derived metrics have been identified as highly correlated.
In this study, due the high homogeneity of this type of plantation, H99TH, HCV and COV provided accurate predictions of BA.These metrics most efficiently described the canopy structure of the forest, by capturing the majority of canopy structure variation.Although we selected just three variables to assess BA, the most important and number of metrics to include in the LiDAR-derived model will also depend on forest type and the variability of the response variable.For instance, Holmgren (2004) also included other LiDAR-derived metrics, such as H99TH and COV, in a BA predicting model for a mixed forest in south-western Sweden.On the other hand, Hudak et al. ( 2006) used both LiDAR and image-derived metrics to predict BA in mixed forests located in north-central Idaho, USA.
Although the cost of LiDAR data acquisition was not a central objective in this study, it is nonetheless an important factor to consider.The acquisition cost of LiDAR data is still high, especially in Brazil where the there are few vendors.However, LiDAR offers opportunities for rapidly estimating forest attributes, such as BA, over extensive areas with high accuracy (Hummel et al. 2011), and greatly reduces the amount of time and effort required for field-based measurements.In this study, we predicted and mapped BA at the landscape level at a spatial resolution of 25 m and with RMSE at plot-level lower than the 15% established in the materials and methods.In the Figure 5, we presented some of the basal area maps generated from the best LiDAR-derived model.The spatial distribution and variability of BA at stand level was not evaluated in this study, however, differences in BA at stand level may be related to topography, age and management practices of forest stands.Moreover, differences in BA may also depend on other factors such as i) type and intensity of land use before the establishment of a forest stand, ii) amount of soil compaction and reduction of soils fertility (Parker et al. 2007), iii) genetic variability (Emhart et al. 2006); iv) plant stress due high stand density (Sharma 2013), which could be described as competition index (Rivas et al. 2005, Uzoh andOliver 2008) and v) site index.

CONCLUSIONS
This study presented a simplified framework for predicting and mapping BA of P. taeda plantations using airborne LiDAR data.The most useful LiDAR metrics for assessing BA were H99TH, HCV and COV.The best model showed high accuracy for predicting BA with overall RMSE less than 10%.High spatial resolution maps of BA were created from the LiDAR data once a robust model was developed.We conclude that this LiDARbased approach is capable of providing suitably accuracy estimates of BA at plot and stand-levels for operational use in inventories of south pine plantations in southern Brazil.We hope that the promising results for forest inventory modeling presented herein will stimulate to operational use of this technology in Brazil.

Figure 1 -
Figure 1 -Location of study area in shouth Brazil.The black points indicate the location of the Pinus taeda stands.
COVCanopy Cover (Percentage of first return above 1.30m)CARLOS A. SILVA et al.

Figure 4 -
Figure 4 -Observed versus predicted BA (natural scale) from the best model and LOOCV across plantation ages for the sample plots (n=50).

Figure 3
Figure 3 -a) Observed versus predicted BA (natural scale) from the adjusted model; b) Observed versus predicted BA from LOOCV; (n=50).The grey polygon represents the ± 25% region of equivalence for the intercept, and the black vertical bar represents a 95% of confidence interval for the intercept.The predicted BA from model and LOOCV are equivalents to the fieldbased BA for the intercept if the black vertical bar is completely within the grey polygon.If the grey polygon is lower than the black vertical bar, the predicted BA is biased low; and if it is higher than the black vertical bar, the predicted BA is biased high.The grey dashed line represents the ± 25% region of equivalence for the slope, and if the solid black line is contained completely within the grey dashed line, the pairwise measurements are equal.A bar that is wider than the region outlined by the grey dashed lines indicates highly variable predictions.The white dots are the pairwise measurements, and the solid black line is a best-fit linear model for the pairwise measurements.The black dashed line represented the relationship 1:1.

Figure 5 -
Figure 5 -Predicted BA (natural scale) at landscape level.a) stand with ages ranging between 3 -5 years; b) stand with ages ranging between 5 -7 years; c) stand with age ranging between 7 -9 years; 1) Density plot of the predicted BA ; and 2) predicted BA map at landscape level with 25 m of spatial resolution.