INTRODUCTION
The Amazon forest is the world’s largest tropical forest, covering 342 million hectares of Brazilian territory, of which 73.6 million are under sustainable use (Brazilian Forest Service, 2017). A description of forest structure and the quantification of wood and biomass, combined with accurate topography, are necessary for the development and execution of forest management plans and for monitoring programs in the Brazilian Amazon biome (D’Oliveira et al., 2012). Forestry development in the Brazilian Amazon is expected to adopt innovative 21st-century technologies that have less impact on the environment and incorporate the knowledge of local populations (Becker, 2001). In this context, remote sensing has the potential to provide the information required to advance scientific understanding of the environment and to facilitate sustainable resource use (Foody, 2003).
Light Detection and Ranging (LiDAR) is an active remote sensing technology that provides detailed three-dimensional information on forest structure (Dubayah and Drake, 2000). LiDAR sensors measure the distance to objects from the time elapsed between the emission and the return of laser pulses (Lefsky et al., 2002). Airborne Laser Scanning (ALS) is a LiDAR-based method in which the sensor is mounted on an aircraft and its position and orientation are recorded using a differential global positioning system (GPS) and an inertial measurement unit (IMU) (Hyyppä et al., 2008). ALS data have been shown to be an alternative to costly and labor-intensive field inventories in the Brazilian Amazon for estimating carbon and biomass and for monitoring structural changes caused by selective logging, as well as an efficient tool for REDD carbon monitoring systems (Asner et al., 2004; D’Oliveira et al., 2012; Asner et al., 2014; Andersen et al., 2014).
Among the many uses of LiDAR technology, airborne systems have become the dominant remote sensing technology for individual tree detection (ITD), providing highly accurate information on large forest areas in a considerably short time. However, individual-level approaches have not been studied as widely as the traditional plot-level methodology (Silva et al., 2012; Silva et al., 2014), although in the past 10 years the number of studies exploring individual tree detection and crown delineation, and developing novel high-performance algorithms, has grown considerably (Zhen et al., 2016). Individual tree detection may be used to measure height and crown diameter (Popescu et al., 2003), as well as basal area and volume (Silva et al., 2016). Isolating individual trees and extracting measurements at the individual level makes it possible to study the habitat and behavior of wildlife (Zhao et al., 2012), monitor forest regeneration, reduce forest inventory fieldwork and decrease uncertainties in estimates of aboveground biomass (Dalponte et al., 2016).
Over the past years, algorithms based on different principles of tree detection have been developed, such as raster-based, point cloud-based and tree shape reconstruction methods (Zhen et al., 2016). Raster-based methods were developed first and have been studied for longer on passive remote sensing data, such as aerial and satellite imagery; they are divided into three groups: crown delineation, tree top detection and geographic object-based image analysis (GEOBIA) (Zhen et al., 2016). Chen et al. (2006) successfully applied crown delineation using watershed segmentation in deciduous forests, and Kwak et al. (2007) estimated height from individually delineated tree crowns in a mixed forest in South Korea. Local maxima filtering (Popescu et al., 2003; Falkowski et al., 2008) and derived methods are the most common tree top detection methods and are still being improved; for example, Silva et al. (2016) used local maxima and pit-free canopy height models to enhance ITD accuracy in longleaf pine (Pinus palustris L.) forest in the southern United States. Region growing was considered an efficient ITD approach by Hyyppä et al. (2009), and more recently it has been used to delineate crowns from tree tops detected by local maxima (Silva et al., 2016; Dalponte and Coomes, 2016). Novel GEOBIA methods have received special attention in the improvement of raster-based approaches, as they are effective for both passive and active remote sensing data sources (Zhen et al., 2016). For instance, Jakubowski et al. (2013) obtained more precise delineation of the crowns of intermediate and suppressed trees, as well as more accurate height estimates, using object-based segmentation of LiDAR-derived canopy height models (CHMs) than using 3D LiDAR point cloud segmentation in mixed conifer forest in the Sierra Nevada, United States.
Studies attempting to segment individual trees in tropical forests using high-resolution imagery have faced many difficulties due to the complex structure and high biodiversity of these ecosystems, which often cause smaller trees to be underestimated (Asner et al., 2002). The same challenges were found in attempts using airborne hyperspectral data and a mean-shift clustering algorithm (Féret et al., 2013). Imputation of biomass and carbon density using tree-level estimates, in contrast to area-based estimates, is emerging as a viable alternative for providing input to fine-scale models of forest biomass. However, the successful application of this approach relies strongly on the continuous improvement of automatic segmentation of individual crowns from high-resolution LiDAR (Graves et al., 2018). In a tropical forest in Panama, Ferraz et al. (2016) used a combination of crown segmentation and a 3D clustering algorithm to delineate crowns and impute aboveground biomass (AGB) at tree level. Jucker et al. (2016) demonstrated the use of LiDAR individual crown metrics as input to biomass models with a global-scale dataset, including data from Brazilian tropical forests. Similarly, Graves et al. (2018) used image segmentation on CHMs to calculate individual crown metrics and successfully estimate the biomass of isolated trees in tropical agricultural landscapes. Coomes et al. (2017), working in dense humid forests in Malaysia, pointed out that a key factor contributing to uncertainty in carbon density estimates is the over-segmentation of large trees and under-segmentation of sub-canopy trees. As reported by Coomes et al. (2017), models derived from ITD metrics have the advantages of being similar to existing carbon allometric models, of making it possible to identify and minimize sources of uncertainty and bias, and of yielding prediction errors that depend less on plot size.
Given these findings, the study and development of algorithms capable of correctly detecting individual crowns and extracting their attributes may support more precise estimates of biomass and carbon from remote sensing data in tropical forests.
Assessing the accuracy of ITD methods is difficult because there is no standardized procedure for comparing the performance of different algorithms unless multiple algorithms are tested on a single study area using the same accuracy parameters. In addition, some studies that compare ITD approaches on point cloud data lack field data for validation (Kaartinen et al., 2012; Zhen et al., 2016). Therefore, following similar comparison frameworks, and with the objective of assessing the strengths and weaknesses of ITD algorithms, this study tested different approaches to automated individual tree detection from LiDAR data and explored their potential application to Brazilian Amazon forests.
MATERIAL AND METHODS
Study area and field data collection
The study site is situated in Jamari National Forest, state of Rondônia, northern Brazil. The predominant vegetation is characterized as Open Ombrophilous Forest, with dominance of palm trees, lianas, and a high diversity of large trees. The understory is composed of the same species that occupy the highest strata, but in younger stages. The area is under sustainable forest management in a federal concession, and low-impact logging was carried out in the past (Longo et al., 2016). The climate is classified as Am (tropical monsoon) in Köppen’s climate classification, with annual precipitation of 2403 mm and a mean temperature of 26 °C (Alvares et al., 2013).
Twenty field plots of 2500 m2 (50 x 50 m) were established. All trees with a diameter at breast height (DBH, measured at 1.30 m) above 35 cm were included in the inventory. Trees with a DBH above 10 cm were measured inside a 5 x 50 m subplot (Figure 1, D). The four corners of each plot were georeferenced using a Trimble GeoXH 6000 differential GNSS receiver, with an estimated post-processed accuracy of < 0.5 m. The local X and Y coordinates of tree stems were calculated from the two closest dGNSS points, with distances measured using a measuring tape.

FIGURE 1 Location of the study area. (A) Brazil and State of Rondônia; (B) Rondônia and Jamari National Forest; (C) LiDAR flight coverage; (D) Field plots and sub-plots; (E) Location of field plots within the flight coverage
LiDAR data collection and processing
This study was conducted with data made freely available by the Sustainable Landscapes Project - EMBRAPA and USDA. The LiDAR data were acquired in September 2013, and the field inventory was conducted in December of the same year. An Optech Orion sensor mounted on an aircraft was used to survey a total of 500 hectares. The characteristics and precision of the LiDAR data are presented in Table 1. The point cloud was processed using LAStools software (Isenburg, 2019) and the lidR package (Roussel and Auty, 2018) in the R environment (R Core Team, 2017).
TABLE 1 LiDAR flight details.
Attributes | Values |
Average laser pulse density | 30.94 pulses/m2 |
Average flight altitude | 853 m |
Field of view | 11.1° |
Scanning frequency | 67.5 Hz |
Datum | Sirgas 2000 / UTM zone 20 S |
In the initial processing, a summary report of the point cloud was generated using the lasinfo function of LAStools. Then, all points with duplicated XYZ coordinates were removed using lasduplicate, so that only unique points were kept. Subsequently, the returns were classified as ground and vegetation points, using lasground and lasclassify respectively. The lasnoise function was then used to label as noise all returns that had five or fewer neighboring points within a 12 m x 12 m x 12 m grid cell. Thereafter, the lasheight function was used to generate a normalized point cloud by computing the height of each return above the surface defined by the ground points, and the lasclip tool was used to subset the data into separate clouds corresponding to the 20 plots established in the field. In the R environment, the normalized point clouds were used to create a Canopy Height Model (CHM) with 1-meter resolution, using the grid_canopy function of lidR.
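The rasterization step can be illustrated with a minimal sketch. It is written in Python for illustration only (the study used the grid_canopy function of lidR in R) and assumes the simplest possible rule, in which each cell keeps the highest normalized return that falls inside it. The function name canopy_height_model and the dictionary output are hypothetical, and the rule actually applied by grid_canopy depends on the options chosen.

```python
import math


def canopy_height_model(points, resolution=1.0):
    """Rasterize normalized returns into a sparse CHM.

    points: iterable of (x, y, z), where z is height above ground in meters.
    Returns a dict mapping (col, row) cell indices to the highest z in that cell.
    Assumption: highest-return-per-cell rule, no interpolation of empty cells.
    """
    chm = {}
    for x, y, z in points:
        cell = (math.floor(x / resolution), math.floor(y / resolution))
        if z > chm.get(cell, float("-inf")):
            chm[cell] = z
    return chm


# Three returns; the first two fall in the same 1 m cell.
returns = [(0.2, 0.3, 5.0), (0.8, 0.9, 12.5), (1.1, 0.2, 3.0)]
print(canopy_height_model(returns, resolution=1.0))
```

Halving the resolution argument to 0.5 gives the finer grid used for Method 1, at the cost of more empty cells when pulse density is low.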
Individual Tree Detection algorithms
A total of four methods were tested on the dataset (Table 2), selected based on their performance in past studies, ease of reproduction and availability in free, open-source software. The methods adopt different principles for tree detection: some are entirely raster-based, while others combine raster and point cloud analysis (Table 2).
TABLE 2 Summary of tested ITD methods.
ID | Method | Method Group |
1 | Watershed | Raster |
2 | Silva et al. (2016) | Raster |
3 | Dalponte and Coomes (2016) | Raster + Point Cloud |
4 | Coomes et al. (2017) | Raster + Point Cloud |
Method 1: Watershed.
Originally reported by Vincent and Soille (1991), watershed segmentation simulates the immersion of a digital grayscale model in water using a queue of pixels. As the water level rises, the floods coming from different minima meet, and the “dams” built where they meet correspond to the watershed lines. We applied this method using the lidR package, which returns the maximum points, corresponding to treetops, and delineates individual crowns. A 0.5 m resolution CHM was used specifically for this method, as it gave the best performance in a prior empirical test. This algorithm, as well as Method 3, was applied using the lastrees function of the lidR package for R.
Method 2: Silva et al. (2016).
This method uses a 3 x 3 meter local maxima window to find treetops on a smoothed CHM. Smoothing is applied to remove noise and to generate a pit-free canopy model, in order to reduce over-segmentation errors. The 3 x 3 meter window size was chosen empirically after testing different sizes. Subsequently, an initial crown area was delimited for each tree by a variable crown buffer, calculated by multiplying the LiDAR-derived tree height by a crown radius to total height factor, for which the default value of 0.6 was used. The last step consisted of isolating the polygons corresponding to individual trees through a centroidal Voronoi tessellation approach. This workflow was performed using the rLiDAR package for R (Silva et al., 2017): first, the FindTreesCHM function was used to find treetops, and then the ForestCAS function was used to grow the crown limits.
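The treetop search and the crown buffer can be sketched as follows. This is an illustrative Python sketch, not the rLiDAR implementation: the CHM is assumed to be a list of rows with 1 m pixels, the functions local_maxima and crown_buffer_radius are hypothetical names, the 2 m minimum height is an assumption introduced here, and smoothing and the Voronoi tessellation are omitted.

```python
def local_maxima(chm, window=3, min_height=2.0):
    """Return (row, col, height) for pixels that are strictly higher than
    every other pixel inside a window x window neighborhood.

    Assumptions: chm is a list of equal-length rows, window is odd, and
    pixels below min_height (a hypothetical threshold) are ignored.
    """
    half = window // 2
    rows, cols = len(chm), len(chm[0])
    tops = []
    for r in range(rows):
        for c in range(cols):
            h = chm[r][c]
            if h < min_height:
                continue
            neighbours = [
                chm[rr][cc]
                for rr in range(max(0, r - half), min(rows, r + half + 1))
                for cc in range(max(0, c - half), min(cols, c + half + 1))
                if (rr, cc) != (r, c)
            ]
            if all(h > n for n in neighbours):
                tops.append((r, c, h))
    return tops


def crown_buffer_radius(height, factor=0.6):
    """Initial crown buffer radius: tree height times the radius/height factor."""
    return height * factor


chm = [
    [5.0, 6.0, 5.0, 4.0],
    [6.0, 20.0, 6.0, 5.0],
    [5.0, 6.0, 7.0, 15.0],
    [4.0, 5.0, 6.0, 7.0],
]
for row, col, height in local_maxima(chm):
    print(row, col, height, crown_buffer_radius(height))
```

Because the window is fixed, a wide crown with several local bumps more than one pixel apart yields several treetops, which is the over-detection behavior that a fixed window is prone to in large-crowned forest.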
Method 3: Dalponte and Coomes (2016).
This three-step algorithm (1) uses a Gaussian low-pass filter to smooth the CHM and eliminate sharp changes on the surface; (2) applies a 3 x 3 meter window to locate local maxima in the CHM; and (3) grows a region from each local maximum, searching for nearby lower pixels that presumably belong to the same crown. Finally, the algorithm creates a 2-D convex hull polygon for each crown using the point cloud information.
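The region-growing step can be sketched in simplified form. The Python code below is an illustration under stated assumptions, not the algorithm as implemented in lidR or itcSegment: a neighboring pixel joins a crown if it is not higher than the pixel it is reached from, is at least a fraction rel_threshold of the seed height, and lies within max_radius pixels of the seed. The function name grow_crowns and both parameter values are hypothetical, and the smoothing and convex hull steps are omitted.

```python
from collections import deque


def grow_crowns(chm, seeds, rel_threshold=0.45, max_radius=10):
    """Grow one crown per seed over a CHM given as a list of rows.

    seeds: list of (row, col) treetop positions.
    Returns a label grid: 0 = unassigned, k = crown of the k-th seed (1-based).
    """
    rows, cols = len(chm), len(chm[0])
    labels = [[0] * cols for _ in range(rows)]
    queue = deque()
    for label, (r, c) in enumerate(seeds, start=1):
        labels[r][c] = label
        queue.append((r, c, r, c))  # current pixel and its seed
    while queue:
        r, c, sr, sc = queue.popleft()
        seed_height = chm[sr][sc]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            if labels[nr][nc] != 0:
                continue
            h = chm[nr][nc]
            within_reach = abs(nr - sr) <= max_radius and abs(nc - sc) <= max_radius
            # Accept only pixels that descend from the current one and are
            # still tall enough relative to the treetop.
            if within_reach and h <= chm[r][c] and h >= rel_threshold * seed_height:
                labels[nr][nc] = labels[sr][sc]
                queue.append((nr, nc, sr, sc))
    return labels


profile = [[4.0, 10.0, 20.0, 12.0, 3.0, 9.0, 15.0, 8.0]]
print(grow_crowns(profile, seeds=[(0, 2), (0, 6)]))
```

In this one-row profile the low pixel between the two treetops stays unassigned, so the two crowns remain separate.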
Method 4: Coomes et al. (2017).
This procedure differs from Method 3 in that it uses a variable-size window to search for local maxima. The window size varies with height: a higher local maximum pixel is assumed to belong to a larger tree, and therefore a larger window is applied. A dataset of over 5000 trees with crown diameter and height measured in field plots in Ombrophilous and Seasonal forests throughout the Brazilian Legal Amazon, provided by the Sustainable Landscapes Project, was used to model the relationship between crown diameter and height. Similarly to Coomes et al. (2017), we used quantile regression, with the quantreg package in R, to fit a linear model with tau = 0.9, i.e., with 90% of the data below the regression line. Coomes et al. (2017) specifically adapted this method to minimize the omission of small trees and the over-segmentation of large trees in tropical forest. The algorithm was applied using the itcLiDARallo function of the itcSegment package in R (Dalponte, 2018).
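The effect of a height-dependent window can be sketched as follows. This Python example is illustrative only and is not the itcLiDARallo implementation: the linear relationship between crown diameter and height uses hypothetical coefficients (intercept and slope), since the fitted quantile regression coefficients are not reported here, and the function name variable_window_maxima and the 2 m minimum height are assumptions.

```python
def variable_window_maxima(chm, resolution=1.0, intercept=1.0, slope=0.3,
                           min_height=2.0):
    """Treetop search with a window that widens with pixel height.

    The expected crown diameter is modeled as intercept + slope * height
    (hypothetical coefficients); half of it, in pixels, sets the search
    radius. A pixel is a treetop only if no other pixel in its window is
    equally high or higher.
    """
    rows, cols = len(chm), len(chm[0])
    tops = []
    for r in range(rows):
        for c in range(cols):
            h = chm[r][c]
            if h < min_height:
                continue
            diameter = intercept + slope * h
            half = max(1, int(diameter / (2.0 * resolution)))
            is_top = True
            for rr in range(max(0, r - half), min(rows, r + half + 1)):
                for cc in range(max(0, c - half), min(cols, c + half + 1)):
                    if (rr, cc) != (r, c) and chm[rr][cc] >= h:
                        is_top = False
            if is_top:
                tops.append((r, c, h))
    return tops


# One wide crown with two bumps (19 m and 20 m) two pixels apart.
profile = [[5.0, 19.0, 10.0, 20.0, 10.0, 6.0, 5.0]]
print(variable_window_maxima(profile))
print(variable_window_maxima(profile, intercept=2.0, slope=0.0))
```

With the height-dependent window only the 20 m pixel is kept, whereas setting slope to zero reproduces a fixed 3 x 3 window and returns both bumps as separate treetops.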
Automated matching of reference and candidate trees
The accuracy assessment was performed with an automated tree matching algorithm that links predicted trees to reference trees in the field plots. This fully automated procedure was based on a previous ITD comparison study by Kaartinen et al. (2012). In a benchmark study, Eysn et al. (2015) applied a similar automated methodology to validate ITD methods; because it is simple and easy to reproduce, this procedure was preferred over manual assessment by trained interpreters.
The matching process consists of measuring the Euclidean distance from each reference tree to nearby test trees within a search radius. When more than one test tree falls inside the search radius, the candidate with the smallest ΔD (distance between reference and test tree) is assigned to the reference tree and considered a correct detection, or true positive (TP), while the other tree is marked as a commission error (FP) and as over-detection. Reference trees with no match are considered false negatives (FN), or omissions. The search radius was fixed at 6 m. This value was chosen based on crown diameters manually measured on the CHM for 32 reference trees selected from the study site: the average crown diameter was 12.36 m (a radius of 6.18 m), and thus a rounded value of 6 m was used. In conifer forests, Kaartinen et al. (2012) and Eysn et al. (2015) used a 5-meter radius in automated tree matching.
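The matching rule can be sketched as a greedy nearest-neighbor search. The Python code below is a simplified illustration, not the R workflow used in the study: the function name match_trees is hypothetical, reference trees are processed in the order given, each test tree can be matched only once, and every unmatched test tree is counted as a false positive.

```python
import math


def match_trees(reference, candidates, radius=6.0):
    """Match detected trees to reference trees within a search radius.

    reference, candidates: lists of (x, y) coordinates in meters.
    Returns counts of TP, FP, FN and of reference trees with more than one
    candidate inside the radius ("over").
    """
    unused = set(range(len(candidates)))
    tp = fn = over = 0
    for rx, ry in reference:
        near = sorted(
            (math.hypot(cx - rx, cy - ry), i)
            for i, (cx, cy) in enumerate(candidates)
            if i in unused and math.hypot(cx - rx, cy - ry) <= radius
        )
        if not near:
            fn += 1  # omission: no candidate within the radius
            continue
        tp += 1
        unused.discard(near[0][1])  # the closest candidate is the match
        if len(near) > 1:
            over += 1  # additional candidates signal over-detection
    fp = len(candidates) - tp
    return {"TP": tp, "FP": fp, "FN": fn, "over": over}


reference_trees = [(0.0, 0.0), (20.0, 20.0)]
detected_trees = [(1.0, 0.0), (3.0, 0.0), (50.0, 50.0)]
print(match_trees(reference_trees, detected_trees))
```

In this toy case the first reference tree has two candidates within 6 m, so it yields one true positive and one over-detection, while the second reference tree is omitted.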
The matching algorithm returns recall (r), precision (p) and F-score (Sokolova et al., 2006; Li et al., 2012), according to equations 1, 2 and 3:
r = TP / (TP + FN) (1)
p = TP / (TP + FP) (2)
F-score = 2rp / (r + p) (3)
The over-detection rate (O%) is the number of reference trees with more than one test tree assigned, divided by the total number of TP. This workflow was also implemented in R.
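As a worked example, the three scores can be computed from the counts of a single method. The Python sketch below uses the standard definitions of recall, precision and F-score given by Li et al. (2012) and the TP, FP and FN counts reported for Method 2 in Table 5; the function name detection_scores is hypothetical.

```python
def detection_scores(tp, fp, fn):
    """Recall, precision and F-score from matched-tree counts."""
    recall = tp / (tp + fn)        # share of reference trees detected
    precision = tp / (tp + fp)     # share of detections that are correct
    f_score = 2 * recall * precision / (recall + precision)
    return recall, precision, f_score


# Counts reported for Method 2 (Silva et al., 2016) in Table 5.
r, p, f = detection_scores(tp=171, fp=201, fn=122)
print(f"r = {r:.2f}, p = {p:.2f}, F-score = {f:.2f}")  # r = 0.58, p = 0.46, F-score = 0.51
```

The F-score is the harmonic mean of recall and precision, so it rewards a method only when both omission and commission errors are low.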
RESULTS
Tables 3 to 5 summarize the results of this study. The total number of trees counted in the twenty field plots was 259 (Nobs). Method 4 had the smallest error in the number of trees detected, underestimating trees by 93 individuals (Table 3). At plot level, all methods were biased. Method 2 showed the least biased results, underestimating by an average of 4.42%, or 0.9 trees per plot (Table 4). Method 3 strongly overestimated the number of trees, both overall and at plot level. Methods 1 and 4 performed very similarly (Table 5) and were the two least effective algorithms, mainly because of their low recall (r).
TABLE 3 Number of individual trees detected (Np), absolute and relative error by method.
ID | Method | Np | Error | Error (%) |
1 | Watershed | 225 | -68 | -30.22 |
2 | Silva | 372 | 96 | 25.81 |
3 | Dalponte and Coomes | 584 | 291 | 49.83 |
4 | Coomes | 200 | -93 | -46.50 |
TABLE 4 Error in number of trees detected compared to reference trees by field plot. Positive values mean overestimation error, and negative values mean underestimation.
Plot | Nobs | Method 1 | Method 2 | Method 3 | Method 4 |
1 | 22 | -11 | 0 | 14 | -13 |
2 | 21 | -8 | -2 | 9 | -11 |
3 | 22 | -9 | -7 | 4 | -8 |
4 | 13 | -1 | 5 | 16 | -5 |
5 | 22 | -7 | -1 | 11 | -10 |
6 | 16 | -2 | 5 | 11 | -3 |
7 | 18 | -6 | 6 | 18 | -7 |
8 | 20 | -8 | 3 | 11 | -5 |
9 | 22 | -11 | -2 | 5 | -12 |
10 | 21 | -10 | -3 | 9 | -9 |
11 | 28 | -18 | -8 | -1 | -20 |
12 | 24 | -20 | -6 | 6 | -17 |
13 | 22 | -8 | -2 | 7 | -12 |
14 | 28 | -11 | -8 | 6 | -13 |
15 | 23 | -13 | -7 | 9 | -13 |
16 | 19 | -9 | -1 | 18 | -10 |
17 | 20 | -6 | -3 | 9 | -10 |
18 | 13 | -5 | 1 | 13 | -3 |
19 | 19 | -3 | 5 | 22 | -6 |
20 | 14 | -5 | 7 | 20 | -6 |
Mean | | -8.55 | -0.90 | 10.85 | -9.65 |
Bias (%) | | -42.01 | -4.42 | 53.32 | -47.42 |
TABLE 5 Accuracy assessment.
ID | Method | TP | FP | FN | r | p | F-score | O% |
1 | Watershed | 106 | 119 | 181 | 0.37 | 0.47 | 0.41 | 6 |
2 | Silva | 171 | 201 | 122 | 0.58 | 0.46 | 0.51 | 34.5 |
3 | Dalponte | 190 | 394 | 103 | 0.65 | 0.33 | 0.43 | 44.2 |
4 | Coomes | 115 | 71 | 114 | 0.35 | 0.50 | 0.41 | 5 |
Method 2, which correctly detected 58% of all trees, achieved the best overall result in terms of F-score (0.51). Method 4 was the most precise, with the lowest number of falsely detected trees (FP), a precision of 50% and the lowest over-detection rate (O%). Method 3, which detected 65% of the trees, obtained the highest recall (percentage of correct detections) and the lowest omission error (FN) (Table 5).
DISCUSSION
The successful detection of individual trees in the Amazon forest may support alternative methods of estimating biomass and allow new ecological and habitat investigations. In our study, the automated process using Method 3 was able to detect and match 65% of the trees referenced in the field inventory; however, considering the tradeoff between TP and detection errors (FP, FN), Method 2 was considered the most efficient. Omission was the main source of error in Methods 1 and 4, whereas commission was the more important source of error in Methods 2 and 3.
The workflow proposed by Silva et al. (2016), Method 2, achieved the best F-score in this study, outperforming the other ITD methods. However, the results reported in the original work, in longleaf pine forest in the southern United States, were considerably better, with an average of 82% of trees correctly detected and a balanced number of missed and falsely detected trees, with a tendency toward omission. Here, false positive trees were the main source of error, most likely because of the inflexibility of the search window for large trees, which often assigns multiple trees where a single crown covers a large area. The same effect was observed in Method 3, which also uses a fixed window size, but with considerably more incorrect detections. When testing larger window sizes, we found that FP errors decreased but TP also decreased considerably. Silva et al. (2016) observed a precision of 94% and an F-score of 0.90, in contrast to 46% and 0.51, respectively, in this study. Although this algorithm was considered the best overall, its high FP errors suggest that Method 2 might not be optimal for attribute estimation in this type of forest.
The approach proposed by Dalponte and Coomes (2016), Method 3, showed the highest recall but also the poorest precision. As reported by Dalponte and Coomes (2016), this method was efficient at detecting larger trees, while it tended to omit trees of smaller height and crown width in an uneven-aged forest dominated by Picea abies (L.) Karst. As discussed above, the fixed search window tends to over-detect large trees, which in Method 3 resulted in an over-detection rate (O%) of 44.2%. To address these deficiencies and adapt the algorithm to map carbon in Asian tropical forests, Coomes et al. (2017) proposed a variable-size window (Method 4). Comparing the two workflows, Method 4 substantially reduced commission errors, which are the main deficiency of Method 3. Method 4 also showed a low O%, meaning that correctly detected trees were rarely split into multiple crowns. This is an important parameter when the final objective of ITD is to extract and impute individual tree measurements, such as basal area and biomass, since crown dimension is emerging as a viable variable for predicting biomass (Goodman et al., 2014). Promising results have been reported for ITD-derived crown metrics, such as crown area and diameter, as input to biomass allometric models for tropical forest (Jucker et al., 2016; Coomes et al., 2017). Furthermore, Method 4 might have performed better if local tree crown measurements had been available for establishing the relationship between height and search window size. Coomes et al. (2017) point out that their algorithm detected slightly less than 10% of the trees in the 10-30 cm size classes in tropical forests in Malaysia, which is related to the search for maximum points on rasterized canopy models, although 97% of dominant trees were correctly detected.
Method 1 (Watershed) showed good precision relative to the other methods and a low O%, which is probably related to its ability to delineate large and irregularly shaped crowns using a strategy different from the treetop search on which the other tested methods are based. In this study, however, watershed was the algorithm that omitted the most reference trees. Ayrey et al. (2017) found that watershed segmentation significantly outperformed local maxima in dense uneven-aged conifer forest, with detection rates of 49% to 58%. The authors suggest a more robust approach, applying the algorithm to multiple stacked layers to increase the detection of understory and overtopped trees, which resulted in an 11% improvement in the number of trees correctly detected. Reitberger et al. (2009) proposed a watershed-based method capable of detecting understory trees, with a 12% improvement over the conventional watershed method.
Raster CHM-based methods, such as watershed and local maxima, have been reported as ineffective at detecting lower vegetation (Chen et al., 2006; Popescu et al., 2007; Shendryk et al., 2016). Kwak et al. (2007) found that pine trees were more easily delineated than Quercus sp. in mixed forests in South Korea using the watershed algorithm. In this study, fixed-window local maxima often assigned multiple trees to one reference tree when the search window covered only part of a large crown. Another characteristic of tropical forests that may have affected ITD performance is that the highest point of a crown is likely to be farther from the main stem than in pine trees. These particularities are challenges for ITD. Furthermore, algorithm parameters such as window size and CHM resolution were observed to have a considerable impact on detection rates. Therefore, these parameters should be adapted to each forest type and to the characteristics of the point cloud in order to achieve the best tree detection results.
To date, few studies have compared available automated ITD algorithms in tropical forests using LiDAR data, although many comparison studies have been conducted in conifer forests. Current methods rely on premises that work very well for temperate forests but are not efficient for tropical forest. However, much effort is being made to develop and improve ITD for the tropics, as in the studies by Graves et al. (2017) and Coomes et al. (2017). Ferraz et al. (2016) applied an individual tree-level approach using a 3D adaptive mean shift algorithm, a procedure computed directly on the point cloud that decomposes it into clusters of points corresponding to individual tree crowns, and successfully predicted AGB in a tropical forest in Panama. Hu et al. (2017) applied a similar approach to a mixed conifer and broadleaf forest in China, reporting the detection of 86% of trees overall, 48% of suppressed trees and 77% of intermediate trees, which shows the potential of this methodology for Brazilian forests. ITD methods that use complete 3D information are emerging as a viable alternative for tree detection in complex forests. The central advantage of 3D methods is that processing raw point clouds uses all the horizontal and vertical information (Hamraz et al., 2017), although these methods can be very computationally intensive. Methods that take the shape of the crown into account may be a possible solution for tropical forest. For example, Li et al. (2012) developed an algorithm for conifer forests that uses the conical crown shape and the spacing between trees as premises. The studies by Wan-Mohd-Jaafar et al. (2017) in tropical forests in Malaysia and by Figueiredo et al. (2014) in a tropical forest in the state of Acre, Brazil, have demonstrated the viability of using LiDAR to measure individual crown metrics to model biomass and volume, with the analysis performed manually, and Jucker et al. (2014) point out the potential of using individual crown metrics and height for AGB estimation, especially in large trees. Therefore, once an automated process is established, large areas could be inventoried using less time and fewer resources. Applying to each forest stratum the ITD algorithm that is most efficient for it could be a strategy to tackle the heterogeneity of tropical forest; e.g., Method 4 is efficient for large trees, while a 3D method could be used to detect overlapped trees. Furthermore, since the detection of large trees has achieved good performance in past studies, the detection of trees in the lower strata is the next step for future investigations.
CONCLUSION
Automated tree detection was able to detect up to 65% of field-referenced trees from the LiDAR point cloud in the Amazon tropical forest. The most effective algorithm was the method proposed by Silva et al. (2016), although omission and commission errors were high in all of the procedures tested. Current CHM-based methods are ineffective at detecting trees in the lower strata. The complexity and heterogeneity of forest formations in the Amazon remain a challenge for current tree detection algorithms. Robust methods that take into account the shape of the crowns and the complex structure of tropical forests are a possible solution to improve detection and precision rates.