Assessing biomass based on canopy height profiles using airborne laser scanning

This study aimed to map the stem biomass of an even-aged eucalyptus plantation in southeastern Brazil based on canopy height profile (CHP) statistics using wall-to-wall discrete return airborne laser scanning (ALS), and to compare the results with alternative maps generated by ordinary kriging interpolation from field-derived measurements. The assessment of stem biomass with ALS data was carried out using regression analysis methods. Initially, CHPs were determined to express the distribution of laser point heights in the ALS cloud for each sample plot. The probability density function (pdf) used was the two-parameter Weibull distribution, whose parameters were subsequently used as explanatory variables to model stem biomass. ALS metrics such as height percentiles, dispersion of heights, and proportion of points were also investigated. A simple linear regression model of stem biomass as a function of the Weibull scale parameter showed high correlation (adj. R² = 0.89). The alternative model considering the 30th percentile and the Weibull shape parameter slightly improved the quality of the estimation (adj. R² = 0.93). Stem biomass maps based on the Weibull scale parameter doubled the accuracy of the ordinary kriging approach (relative root mean square error = 6 % and 13 %, respectively).

represented by a probability function is a convenient way of summarizing and retrieving the canopy vertical form (Coops et al., 2007). The Weibull probability density function (pdf) has been used to model vertical profiles thanks to its flexibility in characterizing different types of vegetation, and its parameters of scale and shape were successfully correlated with above-ground attributes, such as height of trees, density of stems, and diameter at breast height (Coops et al., 2007; Mori and Hagihara, 1991). In Brazil, ALS technology has been used for terrain modeling, but applications involving assessment and mapping of biomass in eucalypt stands are still incipient (Packalén et al., 2011; Silva et al., 2014; Vauhkonen et al., 2011). This study aimed to map the stem biomass stock based on canopy height profile statistics using ALS data of an even-aged eucalypt plantation in southeastern Brazil, and to compare the results with alternative maps generated by ordinary kriging interpolation from field-derived measurements.

Study site
The study site is located in the state of São Paulo, southeastern Brazil (22°58'04" S; 48°43'40" W) (Figure 1). The area is approximately 200 ha and is composed of a 6.5-year-old Eucalyptus grandis (W. Hill ex Maiden) plantation with seedlings coming from a fourth-generation seed orchard (Campoe et al., 2012). The stands were planted in December 2002, with an approximate density of 1600 trees ha⁻¹ (3.75 m × 1.6 m) following minimum site preparation (Gonçalves et al., 2013). The Köppen climate type is Cfa; with mean annual temperature equal

Introduction
Current methods of forest inventory are based on the direct surveying of trees and sampling of ground plots (Campos and Leite, 2013). Statistical models are generated to predict average estimates of stand attributes, and their spatial variability is assessed only sporadically (Zhou et al., 2013). Data interpolation techniques can be used to assess stand spatial structure, yet they are limited by the sampling intensity of ground plots, which is usually insufficient to yield precise estimates (Bouvier et al., 2015; Viana et al., 2012). On the other hand, integrating remotely sensed data is a real possibility, as it can be acquired over large areas, within a short time and at reasonable prices (Hummel et al., 2011). Airborne laser scanning (ALS) technology is a promising remote sensing tool, as it can provide data on multiple strata of the canopy, while passive sensors cannot (Magnussen and Boudewyn, 1998; Naesset, 1997). The laser sensor records three-dimensional coordinates over the surveyed area by measuring the time a laser pulse takes to return to the aircraft after being reflected by the target. The result is a georeferenced 3D point cloud with high spatial resolution (Baltsavias, 1999; Reutebuch et al., 2005).

Field measurements and plot summaries
Figure 1 shows the network of inventory ground plots in the study site. They were divided into 22 training plots and 21 validation plots. Plots 1 to 12 (457 m² to 574 m²) were established for research related to ecophysiology. Their location covered the site's productivity range after a census of diameter at breast height was carried out (Campoe et al., 2012). In addition, plots 13 to 22 (811 m² to 881 m²) were designed specifically for ALS investigations. Plots 23 to 43 (240 m²) are permanent ground plots of continuous inventory owned by the forest producer.
Field measurements were conducted in July 2009 in the training plots and in July 2008 in the validation plots (Table 1). Diameter at breast height (DBH, 1.3 m above ground level) and total tree height were measured on all plots. Stem dry biomass was estimated using a site-specific allometric equation derived from destructive sampling in Sep 2008 (Campoe et al., 2012) (Eq. 1), where: B_i = dry stem biomass per tree (kg per tree); D_i = diameter at breast height (cm); H_i = total height (m).
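Because the models later in the text work with stem biomass in Mg ha⁻¹ per plot, the per-tree estimates have to be aggregated and scaled by plot area. The following is a minimal sketch of that unit conversion only; the function name and the example plot are hypothetical, and the allometric equation itself is not reproduced here.

```python
def plot_biomass_mg_per_ha(tree_biomass_kg, plot_area_m2):
    """Scale per-tree dry stem biomass (kg) to a plot-level stock in Mg ha^-1."""
    if plot_area_m2 <= 0:
        raise ValueError("plot area must be positive")
    total_kg = sum(tree_biomass_kg)
    # kg m^-2 -> Mg ha^-1: divide by 1000 (kg -> Mg) and multiply by 10000 (m^2 -> ha)
    return total_kg / plot_area_m2 * 10.0


# Hypothetical plot: 38 trees of 100 kg each on a 240 m^2 plot
example_stock = plot_biomass_mg_per_ha([100.0] * 38, 240.0)
```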

Airborne laser scanning (ALS) data
The ALS survey was undertaken during Apr 2009, a period of the year with maximum stable leaf area index (LAI) (le Maire et al., 2011). The flight mission was conducted by a twin-engined light aircraft equipped with a discrete-return small footprint laser scanner (laser wavelength = 1064 nm). The parameters from the flight mission and the scanning process were as follows: flight height = 900 m; flight speed = 132 km h⁻¹; swath width = 235 m; swath overlap = 30 %; scanning angle = 15°; scanning frequency = 74 Hz; frequency of pulse emission = 110 kHz; footprint = 21 cm; point density = 6.5 pts m⁻²; standard deviation of point density = 2 pts m⁻².
GPS observations from ground and aircraft were processed to obtain a unique, adjusted kinematic solution in a known coordinate system, using the Waypoint GraphNav software. A digital elevation model (DEM) was generated with a one-meter resolution, after ground points were labeled with the Multiscale Curvature Classification algorithm (MCC-LiDAR) (Evans and Hudak, 2007). The DEM was subtracted from all point elevations to remove topographic variations (normalization process). Then, the normalized point cloud was clipped to the same locations as the inventory field plots. The Fusion LiDAR Toolkit software was used to normalize the point cloud (clipdata), to clip plots using shapefiles (polyclipdata) and to extract ALS metrics (cloudmetrics).
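The normalization step can be illustrated with a short sketch. It is not the Fusion implementation: the function name, the nearest-cell lookup, and the grid layout (row 0 at the southern edge, 1 m cells) are assumptions made only for illustration.

```python
def normalize_heights(points, dem, x_min, y_min, cell=1.0):
    """Subtract the DEM elevation under each (x, y, z) return.

    dem is a list of rows; dem[row][col] holds the ground elevation of the
    cell whose lower-left corner is (x_min + col * cell, y_min + row * cell).
    Returns falling outside the grid are dropped.
    """
    normalized = []
    for x, y, z in points:
        col = int((x - x_min) // cell)
        row = int((y - y_min) // cell)
        if 0 <= row < len(dem) and 0 <= col < len(dem[0]):
            normalized.append((x, y, z - dem[row][col]))
    return normalized


# Hypothetical 2 x 2 DEM (m above datum) and three returns
dem = [[100.0, 101.0], [102.0, 103.0]]
cloud = [(0.5, 0.5, 120.0), (1.5, 1.5, 130.0), (5.0, 5.0, 100.0)]
heights_above_ground = normalize_heights(cloud, dem, x_min=0.0, y_min=0.0)
```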
We selected metrics based on a priori knowledge in an attempt to improve the generalization capacity of the models (Bouvier et al., 2015). Among the metrics extracted from each ALS plot, we focused on height percentiles. A percentile x is the height z below which x % of the points in the ALS cloud are found. We tested the percentiles that best predicted basal area and volume in Zonete et al. (2010): first return percentiles 10, 30, 70, and 90. Moreover, we selected metrics such as mean height, standard deviation, and variance of heights, all of them also derived from first returns. These metrics were calculated disregarding laser points with a height less than 2 m (Naesset, 2002; Zonete et al., 2010). We also parameterized a density metric, which we named p_understory, defined as the ratio of the number of laser points between 0.3 and 15 m in height to the number of laser points between the ground (0 m) and 15 m in height. The 15-meter threshold was set based on visual inspection of the point cloud and on field experience. In this case, all return echoes were used to increase the sample of points in the understory layer.
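The metrics described above can be sketched as follows. This is an illustration under stated assumptions, not the cloudmetrics implementation: the function names are hypothetical, the percentile uses linear interpolation between order statistics (other conventions exist), and p_understory is read as the ratio of returns in 0.3 to 15 m over returns in 0 to 15 m.

```python
import statistics


def percentile(values, p):
    """Height below which p % of the values fall (linear interpolation)."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no values")
    k = (len(ordered) - 1) * p / 100.0
    lo = int(k)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (k - lo)


def als_metrics(first_return_heights, all_return_heights,
                min_height=2.0, understory_low=0.3, understory_top=15.0):
    """Plot-level ALS metrics from normalized heights (m)."""
    # Height metrics: first returns only, ignoring points below 2 m
    canopy = [h for h in first_return_heights if h >= min_height]
    metrics = {"P%d" % p: percentile(canopy, p) for p in (10, 30, 70, 90)}
    metrics["mean"] = statistics.mean(canopy)
    metrics["st_dev"] = statistics.stdev(canopy)
    metrics["variance"] = statistics.variance(canopy)
    # Density metric: all returns, ratio of (0.3-15 m) to (0-15 m) counts
    below_top = [h for h in all_return_heights if 0.0 <= h <= understory_top]
    in_understory = [h for h in below_top if h >= understory_low]
    metrics["p_understory"] = (
        len(in_understory) / len(below_top) if below_top else float("nan")
    )
    return metrics
```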

Apparent canopy height profiles (CHPs)
From a set of possible theoretical distributions, we chose the Weibull pdf due to its flexibility in characterizing foliage distributions of different types of vegetation and its potential for fitting skewed data, which is a common feature of forest-derived ALS data (Coops et al., 2007;Dean et al., 2009;Lovell et al., 2003;Magnussen et al., 1999).
The apparent canopy height profiles (CHPs) were obtained by curve-fitting all ALS returns with a height greater than 5 m. The threshold aimed to exclude laser points on the ground, in shrubs, and in dominated trees, to eliminate the bimodal effect on the vertical profile (Coops et al., 2007; Lovell et al., 2003). Curve-fitting analysis was carried out applying the maximum likelihood estimation technique (Cohen, 1965) using fitdistr in R (R Foundation for Statistical Computing; MASS package). The Weibull pdf with two parameters (scale and shape) is shown in Eq. (2):

$$f(z) = \frac{\beta}{\alpha}\left(\frac{z}{\alpha}\right)^{\beta-1}\exp\left[-\left(\frac{z}{\alpha}\right)^{\beta}\right] \qquad (2)$$

where: z = laser point height (m); α = scale parameter; β = shape parameter.
A Weibull pdf with a shape parameter equal to 1 reduces to an exponential distribution, while shape values between 3 and 4 will approximate a normal curve. The scale parameter is at the 63.2nd percentile of the distribution (McCool, 2012). The Weibull scale and shape parameters were obtained from the CHPs at each training plot, and they were used as candidate predictors for stem biomass modeling.
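As an illustration of the fitting step, the sketch below estimates the two Weibull parameters by maximum likelihood in plain Python. It is not the fitdistr implementation: the function name is hypothetical, the shape is found by bisection on the profile-likelihood score equation, and the sample is simulated. The last line checks empirically that roughly 63.2 % of the heights fall at or below the fitted scale.

```python
import math
import random


def fit_weibull_mle(heights, tol=1e-9):
    """Return (scale, shape) maximizing the two-parameter Weibull likelihood."""
    if len(heights) < 2 or min(heights) <= 0:
        raise ValueError("need at least two strictly positive heights")
    n = len(heights)
    logs = [math.log(h) for h in heights]
    mean_log = sum(logs) / n
    top = max(heights)  # rescale by the maximum to avoid overflow in h ** shape

    def score(shape):
        # Zero of this increasing function is the MLE of the shape parameter
        weights = [(h / top) ** shape for h in heights]
        weighted_log = sum(w * l for w, l in zip(weights, logs)) / sum(weights)
        return weighted_log - 1.0 / shape - mean_log

    lo, hi = 1e-3, 1.0
    while score(hi) < 0:
        hi *= 2.0
        if hi > 1e4:
            raise ValueError("no root found; are all heights identical?")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) < 0:
            lo = mid
        else:
            hi = mid
    shape = 0.5 * (lo + hi)
    scale = top * (sum((h / top) ** shape for h in heights) / n) ** (1.0 / shape)
    return scale, shape


# Simulated returns above 5 m (hypothetical scale = 25 m, shape = 8)
random.seed(1)
sample = [h for h in (random.weibullvariate(25.0, 8.0) for _ in range(2000)) if h > 5.0]
scale_hat, shape_hat = fit_weibull_mle(sample)
share_below_scale = sum(h <= scale_hat for h in sample) / len(sample)
```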

Regression modeling of stem biomass with ALS data
The linear relationships between the ALS predictors and stem biomass were explored by carrying out a t-test on the paired-sample correlation. We excluded ALS metrics from further analysis when Pearson's correlation coefficient (ρ) was not significant at the 99 % confidence level. Regression models were fit by the ordinary least squares method, and the ALS predictors were selected using the best subset approach (Lumley, 2009). We used the variance inflation factor (VIF) to detect multicollinearity of explanatory variables (Fox and Monette, 1992). Models with VIF greater than 5 were excluded from further analysis (d'Oliveira et al., 2012). Graphical analyses of residuals and hypothesis testing were performed to check the assumptions underlying linear regression theory.
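The screening logic can be sketched as below. The function names are hypothetical; the critical value 2.845 is the two-sided 1 % point of Student's t with 20 degrees of freedom (22 plots minus 2), taken from standard tables, and the VIF shortcut holds only for models with exactly two predictors.

```python
import math

T_CRIT_99_DF20 = 2.845  # two-sided 1 % critical value, 20 degrees of freedom


def pearson_r(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def keep_metric(metric, biomass, t_crit=T_CRIT_99_DF20):
    """True when the metric correlates significantly with stem biomass."""
    r = pearson_r(metric, biomass)
    return abs(correlation_t_statistic(r, len(metric))) > t_crit


def vif_two_predictors(x1, x2):
    """With exactly two predictors, both share VIF = 1 / (1 - r12^2)."""
    r12 = pearson_r(x1, x2)
    return 1.0 / (1.0 - r12 * r12)
```

A two-predictor model would then be dropped when `vif_two_predictors(x1, x2) > 5`, matching the threshold stated in the text.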
The regression models were submitted to leave-one-out cross validation (Picard and Cook, 1984). This method uses the training data set without one of its observations (n − 1) to predict the value removed from the sample (n is the sample size). This process occurs n times, so all observations are excluded once. The cross validation output was assessed by the relative root mean square error (rRMSE) statistic (Meng et al., 2009; Nyström et al., 2012) (Eq. 3):

$$\mathrm{rRMSE} = \frac{100}{\bar{y}}\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \qquad (3)$$

where: rRMSE = relative root mean square error (%); y_i = observed stem biomass in plot i (Mg ha⁻¹); ŷ_i = predicted stem biomass in plot i (Mg ha⁻¹); n = number of observations; ȳ = mean of observed stem biomass (Mg ha⁻¹). Furthermore, we compared observed values of stem biomass in the validation dataset with predictions by the best regression models. To make such a comparison, we used Pearson's correlation coefficient statistic (Eq. 4):

$$\hat{\rho}_{XY} = \frac{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)\left(Y_i - \bar{Y}\right)}{\sqrt{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2\sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)^2}} \qquad (4)$$

where: ρ̂_XY = sample Pearson's correlation coefficient; X_i and Y_i = observed values of variables X and Y; X̄ and Ȳ = means of variables X and Y.
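A minimal sketch of leave-one-out cross validation for a simple linear regression, reporting the rRMSE as a percentage of the observed mean, is given below. The function names are hypothetical and the n divisor in the error term is an assumption; the predictor could be, for example, the Weibull scale parameter.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return my - slope * mx, slope


def loocv_rrmse(x, y):
    """Leave-one-out cross-validated rRMSE (%) of a simple linear model."""
    n = len(x)
    squared_error = 0.0
    for i in range(n):
        # Refit without observation i, then predict the held-out value
        b0, b1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        squared_error += (y[i] - (b0 + b1 * x[i])) ** 2
    return 100.0 * math.sqrt(squared_error / n) / (sum(y) / n)
```

Both arguments are expected to be Python lists of equal length, since the held-out observation is removed by list slicing.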

Stem biomass interpolation with ordinary kriging
We used ordinary kriging to interpolate stem biomass from the training dataset as an alternative method (Viana et al., 2012). The experimental semivariograms were obtained according to Eq. (5):

$$\hat{\gamma}(h_k) = \frac{1}{2N_k}\sum_{i=1}^{N_k}\left[\varepsilon(x_i) - \varepsilon(x_i + h)\right]^2 \qquad (5)$$

where: γ̂(h_k) = semivariance estimate of class k at distance h; h_k = mean distance of class k; N_k = number of pairs observed in class k; ε(x_i) = residual (random error) observed at x_i; x_i = position i with coordinates x and y.
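The experimental semivariogram can be illustrated with the sketch below, which bins all point pairs into distance classes of equal width and averages half the squared differences per class. The function name and binning scheme are assumptions made for illustration; automap chooses its own distance classes.

```python
import math


def experimental_semivariogram(coords, values, lag_width, n_lags):
    """Return (mean distance, semivariance, pair count) per distance class."""
    sq_diff = [0.0] * n_lags
    dist_sum = [0.0] * n_lags
    counts = [0] * n_lags
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            h = math.dist(coords[i], coords[j])
            k = int(h // lag_width)  # index of the distance class
            if k < n_lags:
                sq_diff[k] += (values[i] - values[j]) ** 2
                dist_sum[k] += h
                counts[k] += 1
    return [
        (dist_sum[k] / counts[k], sq_diff[k] / (2.0 * counts[k]), counts[k])
        for k in range(n_lags)
        if counts[k] > 0
    ]
```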
The following types of theoretical models were tested: spherical, exponential, linear, and Gaussian, and we selected the one with the smallest residual sum of squares (Hiemstra et al., 2009). Ordinary kriging interpolation was carried out according to Eq. (6):

$$\hat{Z}(x_0) = \sum_{i=1}^{n}\lambda_i Z(x_i) \qquad (6)$$

where: Ẑ(x_0) = stem biomass estimate at position x_0; n = number of observations; Z(x_i) = observed value of stem biomass at position x_i; λ_i = weight assigned to observation Z(x_i), in which the sum of weights is 1 (Viana et al., 2012). The fitting of theoretical semivariograms and the ordinary kriging were conducted using autofitVariogram and autoKrige in R (automap package). As for the regression models, we used leave-one-out cross validation and the validation dataset to assess the ordinary kriging performance.
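To make the weighting concrete, the sketch below solves the ordinary kriging system for one target location. It is not the automap implementation: the function names are hypothetical, the Gaussian model uses one common parameterization (nugget plus partial sill times 1 − exp(−(h/a)²)), and the linear system is solved by plain Gaussian elimination. The semivariogram parameters in the example are illustrative, not the fitted ones.

```python
import math


def gaussian_semivariance(h, nugget, partial_sill, range_par):
    """Gaussian semivariogram model; zero at h = 0 by definition."""
    if h == 0.0:
        return 0.0
    return nugget + partial_sill * (1.0 - math.exp(-((h / range_par) ** 2)))


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ordinary_kriging(coords, values, target, model):
    """Return (estimate, weights) at target; the weights sum to 1."""
    n = len(coords)
    # Semivariances between observations, bordered by the unbiasedness constraint
    a = [[model(math.dist(coords[i], coords[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [model(math.dist(c, target)) for c in coords] + [1.0]
    weights = solve(a, b)[:n]  # last unknown is the Lagrange multiplier
    return sum(w * z for w, z in zip(weights, values)), weights


# Hypothetical plots (m) and stem biomass (Mg ha^-1)
model = lambda h: gaussian_semivariance(h, 100.0, 1000.0, 500.0)
plots = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0), (100.0, 100.0)]
biomass = [150.0, 170.0, 160.0, 180.0]
estimate, weights = ordinary_kriging(plots, biomass, (50.0, 50.0), model)
```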

Spatial representation of stand attributes
The prediction maps were generated with a spatial resolution of 16 m, approximately the size of the validation plots. We also generated two extra basal area maps (linear regression and ordinary kriging) to visually compare results with a former basal area map obtained from a census of DBH in Feb 2008 (Campoe et al., 2012). To our knowledge, no study has yet compared the quality of stem biomass maps generated from regression with ALS data and from ordinary kriging of field-derived data in even-aged eucalypt plantations.

the best fit for the Weibull pdf in the canopy upper layer (Figure 3), partially due to crowns occluding the penetration of laser beams at lower parts (Harding et al., 2001). The integration of terrestrial laser scanning (TLS) data with ALS-derived CHPs was presented as an alternative for improving canopy modeling at lower layers (Zhao et al., 2013).
The Weibull shape parameter (β_CHP) ranged from 5.7 to 11.3, confirming the negative asymmetry observed in the histograms of Figure 2 (Table 2). All the ALS metrics except β_CHP were positively correlated with the stem biomass. The metric P10 did not present strong correlation with the stem biomass, and it was excluded from further analysis (Table 3).
The observed and predicted values for the stem biomass regressed on P30 and β_CHP (model 1) and on α_CHP (model 2) are shown in Figure 4. The observations close to the 1:1 diagonal indicate a good model fit. This was also corroborated by the leave-one-out cross validation, which yielded an rRMSE of 5 % for both models.
Figure 5 shows the Gaussian semivariogram for the stem biomass constructed from the training dataset. The model presented a nugget effect equal to 128 (Mg ha⁻¹)² and a sill equal to 1129 (Mg ha⁻¹)² at a range of 574 m. The leave-one-out cross validation of the ordinary kriging resulted in an rRMSE equal to 13 %, which was 160 % greater than that of the regression models. The predictions from the regression models showed slightly higher correlation with the validation dataset than the ordinary kriging (ρ = 0.8, ρ = 0.82, and ρ = 0.71, respectively, Figure 6). The result is coherent with the work of Meng et al. (2009), who studied different methods of kriging to estimate the basal area in pine forests in the state of Georgia, USA. They found that applying regression kriging using Landsat ETM+ data as the auxiliary variable improved the results compared to ordinary kriging interpolation (R² = 0.9 and R² = 0.75, respectively).
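For reference, the 160 % figure follows from the two cross-validated rRMSE values reported in this section (5 % for the regression models and 13 % for ordinary kriging), taking the regression value as the baseline:

```latex
\frac{\mathrm{rRMSE}_{\text{kriging}} - \mathrm{rRMSE}_{\text{regression}}}{\mathrm{rRMSE}_{\text{regression}}} \times 100\,\%
  = \frac{13 - 5}{5} \times 100\,\% = 160\,\%
```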
The map of the stem biomass generated from the metrics P30 and β_CHP was shown to be sensitive to the overlapping effect along adjacent flight lines (pixels scattered in the northwest-southeast direction) (Figure 7).
All maps show a gradient of productivity similar to that described by Campoe et al. (2012); i.e., lower elevations in the terrain had higher stem biomass stock than the highest locations. However, the maps differed considerably in relation to local spatial patterns. A visual validation using basal area, a variable collinear with stem biomass, is shown in Figure 8.

Discussion
There was great variability between the observed apparent canopy height profiles (CHPs), even though all trees in the stand are about the same age. The CHP is a signature of the forest structure, and is useful in a variety of contexts, like monitoring spatial and temporal changes (Coops et al., 2007), mapping homogeneous strata (Nelson et al., 2003), and identifying vegetation types (Harding et al., 2001; Jaskierniak et al., 2011). Dean et al. (2009) were able to retrieve DBH values in a 36-year-old even-aged loblolly pine (Pinus taeda L.) stand from the variables height to the base of live crown and height to crown median, which were retrieved from ALS-derived CHPs (n = 17, R² = 0.97).
The ALS metric P30 presented the greatest correlation with stem biomass. d'Oliveira et al. (2012) observed a similar correlation between the height percentile P25 (all returns) and above-ground biomass in the Amazon rainforest, Brazil. In planted forests, similar results were observed by Stephens et al. (2012) and Zonete et al. (2010). According to Stephens et al. (2012), the lower percentiles can combine information on tree height and crown density, in which denser stands will present greater values for such a metric. The significant correlation obtained between the scale parameter of the Weibull distribution (α_CHP) and stem biomass was consistent with the work of Mori and Hagihara (1991). They succeeded in correlating the scale parameter of the Weibull distribution with DBH, when studying the vertical foliage profile from a hinoki stand (Chamaecyparis obtusa (Sieb. et Zucc.) Endl.) in central Japan. The improvement from adding the field-derived variable density of trees together with α_CHP seems to be promising owing to the potential of ALS data to quantify trees at the stand level (Görgens et al., 2015b; Popescu et al., 2003; Oliveira et al., 2012).
The shape parameter of the Weibull distribution (β_CHP) was the only ALS metric negatively correlated with stem biomass. This result was also observed in Coops et al. (2007) in relation to the DBH variable in different mixed stands of Vancouver Island, Canada. The Weibull pdf shape parameter is related to the degree of data dispersion in the distribution. With the scale parameter fixed, the greater the value of the shape parameter, the smaller the curve width at the mode. In models where β_CHP was used together with P30, there was a mediation effect; i.e., β_CHP correlated positively with stem biomass. The hypothesis is that for similar canopy height layers, a smaller vertical dispersion of heights is indicative of homogeneity, which would lead to more productive stands (Stape et al., 2010).
The ALS metric p_understory showed only moderate correlation with stem biomass and did not capture the variation in the forest horizontal structure as expected. This is because the most productive plots also had more ALS points intercepted by the crown, thereby underestimating the density of trees in the understory layer. Additionally, observations were influenced by the point density heterogeneity within the ALS data, partially created by the overlapping of swaths during the flight (Bater et al., 2011; Görgens et al., 2015a). The selected regression models had at most two explanatory variables, and they were able to explain 93 % of the stem biomass variation. Stephens et al. (2012) explained 70 % of total carbon stock in forests of New Zealand with just one ALS height metric, observing that the addition of a density metric slightly improved the quality of the model (2 %), but a third variable did not bring any improvement.
The stem biomass maps constructed from the linear models with ALS metrics and from the ordinary kriging interpolation were consistent with the existing gradient of productivity shown by Campoe et al. (2012). However, we observed from the validation statistics and visual inspection that the regression models fitted from the ALS metrics (P30 and α_CHP) generated more realistic maps, corroborating the initial hypothesis.

Figure 2 - Apparent eucalypt canopy height profiles (CHPs) per training plot. The vertical axis shows the laser point heights and the horizontal axis is the probability density of occurrence in each height class. The solid lines are the Weibull probability density functions fitted to the observed airborne laser scanning (ALS) data (histograms).

Figure 3 - Quantile-quantile plot per training plot. The vertical axis shows the sample quantiles for laser point heights and the horizontal axis illustrates the Weibull distribution quantiles.

Figure 4 - Eucalypt stem biomass (B) regressed on P30 and β_CHP (model 1, from Table 3) in the upper layer and regressed on α_CHP (model 2) in the lower layer. The horizontal bars in the left column represent the prediction intervals with 95 % confidence level.

Figure 5 - Gaussian semivariogram of eucalypt stem biomass built from the training dataset. The points represent the number of observed pairs in class k.

Figure 6 - Eucalypt stem biomass (B) in the validation dataset: predicted and observed. The regression models shown on the left and center correspond to models 1 and 2, from Table 3; ρ = Pearson's correlation coefficient; yr = years.

Figure 7 - Eucalypt stem biomass prediction maps. In the upper and middle layer are linear regression models 1 and 2, from Table 3. The lower layer was derived from ordinary kriging interpolation (the white dots represent the training plots location).

Figure 8 - Eucalypt basal area (G) maps. The map in the center is generated from a census of diameter at breast height (DBH) when the forest was 5.3 years old. The map on the left is G regressed on α_CHP, while the map on the right was derived from ordinary kriging interpolation.

Table 2 - Summary of metrics resulting from processing the airborne laser scanning (ALS) data of the 22 training plots in the eucalypt plantation site.
Stat. = statistics; α_CHP = Weibull distribution scale parameter; β_CHP = Weibull distribution shape parameter; P## = percentiles in height ##; Mean = mean height of first returns; St. dev. = standard deviation of heights from first returns; Variance = variance of heights from first returns; p_understory = proportion of points in the canopy understory layer; min. = minimum observed value among plots; avg. = average value among plots; max. = maximum observed value among plots; st. dev. = standard deviation of mean values.