POINT CLOUD GENERATION FROM GAUSSIAN DECOMPOSITION OF THE WAVEFORM LASER SIGNAL WITH GENETIC ALGORITHMS

: Recent developments in LIDAR technology lead to the availability of the waveform systems, which capture and digitize the whole return of the emitted LASER pulse. As many objects may cause multiple returns in the same echo, one task is to detect and separate different echoes within the same digitized measurement. In this paper the results of a study aimed at LASER signal waveform decomposition using genetic algorithms are introduced. The proposed method is based on the Gaussian decomposition approach and analyzes each digitized return to compute one or more points. Initially, the number of peaks contained in the waveform is determined by a simple peak detection method, with a local maximum point algorithm. When more than one peak is detected, genetic algorithms are applied to estimate the amplitude, time and standard deviation of each peak within the digitized signal. With this methodology it was possible to increase the number of points by approximately 17 % compared to the point cloud obtained using commercial software. The best results were obtained in areas with high vegetation, and thus the methodology can be applied to the generation of denser points cloud in forest areas.


Introduction
Airborne Laser Scanner (ALS) systems use LIDAR (Light Detection And Ranging) technology to compute a dense points cloud of the Earth's surface and the objects above it.This is carried out by measuring the travel time of a LASER (Light Amplification by Stimulated Emission of Radiation) pulse emitted, which reaches the surface of the object and is backscattered.A small portion is reflected towards the sensor, which acts as a receiver and registers the travelling time (Wehr and Lohr 1999).The physical properties of the LASER system are wavelength (μm), pulse duration (ns), amount of energy (μJ), pulse repetition frequency (kHz) and beam divergence (mrad) (Baltsavias 1999).
A very special feature of the ALS systems is its high data collection frequency.For topographic mapping, ALS systems emit short pulses with a duration of 5 to 10 nanoseconds (ns) and wavelength ranging from 800 to 1,550 nanometers (nm).The recording of the return signal is usually performed with a sample rate of 1 ns.As the beam has a small divergence, depending on the height, the measurements are made within a small footprint, around 0.2 m at 1 km of flight height (Wagner et al. 2004).The data density of the ALS systems is currently being increased with the use of larger acquisition and scanning frequencies and even the use of multiple pulses.In addition, one of the main advantages is the possibility of the LASER beam to penetrate forest canopy, which enables the measurement of points on the ground in vegetation covered areas (Wagner et al. 2006).
The first ALS systems were able to record one or two returned echoes in discrete mode.Recent developments enable to record the entire reflected pulse and store it using a digitizer.These systems are called a full-waveform (FWF).The digitized signal is later analyzed to identify multiple echoes within the recorded data and compute the respective point coordinates based on the peaks contained in the waveform (WF).The pulse duration and the cross-section inclination are other information that can be extracted from the WF and can also be used to derive information about the nature of the surface.The advantage of the FWF is the possibility to locate surfaces discontinuities and distinguish terrain from low vegetation, less than 1 meter height.The classification of points that hit vegetation or ground in forest areas can be improved using the reflectivity, roughness and slope of the targets that intercept the LASER beam.This information can also be used for many specific applications such as city modeling, bridge and power line detection (Mallet et al. 2009).WF is also useful to identify vegetation species and estimate wood volume and biomass.It also turns it easier to estimate canopy height, shape, density and vertical structure, as well detect different levels of vegetation (Blair, Rabine and Hofton 1999).A disadvantage of using the FWF system is the increase in data volume and the need of specific software for data processing.
As the FWF enables recording the complete return of the emitted beam, it is possible to identify more of one object along the trajectory of the LASER pulse.For this purpose, it is necessary to detect the peaks in the digitized and recorded signal.The intensity and shape of these peaks provide information to estimate physical parameters of each illuminated object.There are different alternatives to detect the peaks, as simple peak detection (SPD) methods, correlation techniques, deconvolution and decomposition, besides other approaches (Mallet et al. 2009).
According Wagner et al. (2004) simple peak detection can be performed based on standard methods like: threshold, center of gravity, local maximum point, crossing with zero of the second derivative and fractionation constant algorithms.The quality of the range computation depends on the method and overlapping peaks can be omitted.The performance of the peak detectors depends on the effective scattering cross section, range and noise level.
Another alternative is the use of the correlation principle, comparing the received signal to the emitted one.For this purpose, it is used the normalized cross-correlation, which relates the WF of the emitted pulse and the received signal through a time delay function (Bretar et al. 2008).
The WF decomposition using appropriate parametric functions is one of the techniques that allow extracting additional information for each return, but require longer processing time, since they require the estimation of initial parameters and an optimization step.For this purpose it can be used methods such as the Non-linear least-squares approach using Levenberg-Marquardt optimization algorithm and the maximum likelihood estimation using Expectation-Maximization algorithm (Zhu et al. 2011).
In this paper, a new method to decompose the digitized signal into single or multiple returns is presented.The method is based on the use of genetic algorithms and the Gaussian decomposition.In order to describe the approach, the paper is organized as follows.First, the principles of WF processing are introduced in section 2. The study area and WF data are described in section 3. The methodology is described in section 4, while the experiments are presented in chapter 5. Finally, the conclusions and recommendations are listed in section 6.

FWF processing
FWF ALS systems were developed in the 1980s for bathymetric surveys, designed for mapping underwater topography in coastal and surface waters (Mallet and Bretar 2009).The U. S. National Bulletin of Geodetic Sciences, 24(2):270-285, Apr-Jun, 2018 Aeronautics and Space Administration (NASA) developed the Scanning Lidar Imagery of Echo Recovery (SLICER) and Laser Vegetation Imaging Sensor (LVIS) systems for mapping forest areas (Blair, Rabine and Hofton 1999).
Figure 1 illustrates difference between the measurement of discrete echoes and the full-waveform recording.In the discrete form system detects the time when the returned signal is greater than a threshold, and therefore it produces just two measurements in the example.The first return corresponds to the tree and the second one is the result of the mixture of the reflection of the scrub and the terrain.In the full-waveform digitization, the complete return signal is recorded at discrete times.Some examples of commercial ALS FWF systems are the Leica ALS60, Optech ALTM 3100, Riegl LMS-Q560, Toposys Falcon III and Topoye Mark II (Mallet e Bretar 2009).The Optech Airborne Laser Terrain Mapper (ALTM) Pegasus HD500 system surveyed the WF data used in this study.This system features maximum scanning frequencies of 140 Hz and pulse repetition frequency of 500 kHz, with maximum operation height of 2.5 km.It uses an oscillating mirror to scan the surface transversely to the flight direction.The LASER beam is emitted at 1,064 nm, 250 µJ, pulse width 4 ns, beam divergence of 0.25 mrad what it generates a footprint size of 0.25 m at flight height of 1 km.The field of view (FOV) is ± 32.5°.The system allows the discrete recording and also the complete digitizing of the WF of return signal by using the Intelligent Waveform Digitizer (IWD).When a discrete form is used for recording the data, it computes a points cloud in real time.The IWD is a module which records the amplitude of the return signal with 12 bits, generating amplitudes with 4,096 digital numbers (DN).The sampling interval is 1 ns and the maximum duration of the emitted pulse is equal to 40 ns and the received signal is 499 ns.The values of the amplitudes present offset of 200 DN.After recording the WF through the IWD, the points cloud can be computed in the post-processing step using the Digitizer Data Retrieval (DDR) software, which also enables the visualization of the recorded WF and detected peaks (Optech 2012).Sciences, 24(2):270-285, Apr-Jun, 2018

Gaussian Decomposition
The result of the digital encoder is a list of amplitudes recorded at fixed time steps.The problem consists basically of two steps.First, the amount of peaks in the signal has to be computed.Then the form of the single returns needs to be modeled.The returns can be determined by SPD methods or WF decomposition (Wang et al. 2016).
A common practice to model each peak (j) is based on the assumption that a single return is symmetric and follows a Gaussian distribution (Wagner et al. 2006).Under this assumption, three parameters are sufficient to reconstruct the reflected signal of a single surface: the mean (μj), amplitude (Aj) and standard deviation (σj).These parameters correspond to the mean time of the peak, height and width of the base in abscissa axis.Equation 1shows the Gaussian function (fG,j) for each time (xj).Figure 2 shows the Gaussian parameters (μ, A, σ) for the last peak in the WF of received signal. (1) Figure 2: Gaussian parameters in the last peak of the WF.
The decomposition algorithms determine the parameters of each peak considering that the sum of all peaks is equal to the recorded signal plus superimposed noise (bj), as described in Equation 2.More details and modeling functions can be seen in Mallet and Bretar (2009). (2)

Range computation
Bulletin of Geodetic Sciences, 24(2):270-285, Apr-Jun, 2018 The range is computed based on the time of the LASER pulse needs to travel from the emission to the surface and back to the receiver, is the principle Time-Of-Flight (TOF) (Wehr and Lohr 1999).
The LASER pulse propagates at the speed of light (c) in the atmosphere between the sensor and the target.For the range computation is necessary to use the propagation index (n), according to Equation 3, temperature (T) and pressure (P) (Optech 2012).
The parameters (T) and (P) in Equation 3are mean values, in Celsius (C) and hectopascal (hPa), respectively.If these parameters are known in air and at the ground, the mean values should be used.The range (R) is calculated by Equation 4 using the time of flight (Δt) of the emitted laser pulse and the returned signal (Optech 2012).
The travel time (Δt) can be computed from Equation 5.The signal return time (T1) is obtained in the ASCII file with the WF data.The emitted pulse time (t0) and peak time (ti) of the return signal is obtained in Gaussian decomposition of the WF.
The range resolution (ΔR) is function of the time resolution (Δdt) and can be computed from Equation 6 (Wehr and Lohr 1999).
A common sampling rate for ALS FWF systems is 1 ns, applying these value in Equation 6 the range resolution is 0.15 m.However, the length measurement with the WF depends on how the target backscatters the LASER beam towards the sensor.Table 1 shows an example of the WF recorded by the IWD.The transmitted signal is represented by the code 0 and the received signal by the code 1.The GPS Time (GPST) determines the correspondence between the WF of the emitted and received signals, in addition to providing the location of the coordinates of the point of emission of the LASER beam in the trajectory of the aircraft with the ALS system.198 195 196 198 200 200 200 199 198 196 195 193 190 187 185 186 186 192] Figure 3 shows the plot of the WF data shown in Table 1, as well as the mean time of the peaks computed using the local maximum point method.The red line shows two peaks, corresponding to two objects.The time of the received signal (T1) is 5,295.96ns, as displayed on Table 1  Table 2 shows the difference of the ranges computed with the DDR software and the SPD method.
The differences are below the range resolution.

Georeferencing
ALS systems have three main units, one for range measurement, a second unit for computation of the position and the last one to measure the attitude of the platform.The first unit measures the range between the sensor and the illuminated surface.The second unit provides the aircraft's trajectory through the Global Navigation Satellite System (GNSS) and the last one measures the attitude with an Inertial Navigation System (INS).The coordinates of the trajectory are referenced to a geodetic reference system (Wehr and Lohr 1999).
The phase center of the GNSS receiver does not coincide with the optical center of the LASER measurement unit and the coordinate reference systems are not parallel.Therefore it is necessary to perform rotations and translations to compute the coordinates in the reference frame, according to Equation 7, as described in Filin (2003). [ In Equation ( 7) the Cartesian coordinates (X, Y, Z) of each point is referenced to the geodetic reference system.They are obtained from the coordinates of the phase center of the GNSS receiver (X0, Y0, Z0) in the local frame, the rotation matrix (RINS) from body frame to reference frame defined by the local vertical, the lever arm offset vector (δx, δy, δz), which relates the phase center of the GNSS receiver with the optical center of the LASER unit, the rotation matrix (Rm) that represents the relative rotation of the LASER unit in relation to the inertial system, the rotation (RS) of the scanner, the range (ρ) to the target and the components (ēx, ēy, ēz) that model random errors (Filin 2003).

Study Area and WF Data
The WF data used in this study were provided by LACTEC.These were surveyed in August 2012, with the ALTM Pegasus HD500 system, on the city of Ponta Grossa -Paraná.The flight was performed with the mean height to 800 m, FOV of ± 32.5º, scanning and emission frequencies of 23 Hz and 500 kHz, aircraft speed of 135 kts and flight direction of east -west.The study area is illustrated with the red polygon in Figure 4a with Google Earth image of March 2012 as background.
The ellipsoidal height in the study area ranges from 792.5 m to 808.8 m. Figure 4c shows the tridimensional view of the points cloud.

Methodology
Initially the number of peaks in the WF was determined with the local maximum point, in a Simple Peak Detection (SPD) algorithm implemented in Matlab.When a single peak is detected, the standard deviation is computed from the amplitude samples and this is modeled with a Gaussian function.Then the difference between the original digitized values and the modeled curve is computed.If the difference is less than the threshold of 20 DN the process stop, otherwise the WF is decomposed with the Genetic Algorithm (GA) and, if the stop criterion is reached, the parameters of each peak are computed.After, the range is computed the georeferencing procedure is performed for the points cloud computation.Figure 5 shows the flowchart of this methodology.

Multiple peaks detection and modeling
In this step, the series of amplitudes is analyzed in order to identify the presence of multiple returns within the same WF.As described in Equation 2, noise may be present in the data.Therefore a threshold of 320 DN was used to filter weak signals and to avoid peaks caused by noise."A priori", the number of peaks, their location and amplitude are unknown.The aim of this step is to count the peaks and determine their time and amplitude.
The proposed solution is based on genetic algorithms.Genetic algorithms are optimization methods that aim at solving a proposed problem within an iterative process that resembles natural evolution.The problem needs to be parameterized and the parameters to be defined.The search process starts with a set of random solutions.The parameters of a possible solution are then codified as a genetic chain (represented by chromosomes), for example using a binary chain that describes one individual.A set of individuals (solutions) is called a population or generation.The adequacy of each solution in respect to the problem is then evaluated.This enables to rank the individuals according to its fitness or adequacy to the problem.The fitness value can be based, for instance, according to an error rate.The best solutions are then identified and the worst discarded.The best solutions from one population are combined to form a new population.By combining two individuals with high fitness, it is expected that new individuals will be generated, with better fitness, because they would be able to inherit the best characteristics of each individual.So, it is expected that the new population will be better than the previous one, leading to an optimization process.This is repeated until some condition, for example number of populations or minimal fitness, is achieved (Negnevitsky 2005).
The combination of two solutions can follow different principles.This operation is called crossover and applied to shuffle the different features of two individuals to create a new solution that inherits characteristics of each parent.The simplest crossover method is the one-point crossover and can be described as follows: Two chromosomes are chosen, a point along the chromosome is selected, then the pieces to the left of that point are exchanged between the chromosomes, producing a pair of offspring chromosomes.The selection of the parents is performed randomly, considering only the best fitted individuals.This grants that only good solutions are combined.
In order to speed up the process and avoid the stagnation at a local minimum, new genetic material is introduced through a mutation operator.After each iteration, an individual is altered randomly, by choosing one of its bits randomly and inverting its value, which produces a new value for one of the parameters.
In the experiments, the crossing and mutation probabilities adopted were 30 % and 20 %.

Fitness function
The computation of the fitness is defined for each specific problem.It is a function that reflects how the result of the model approaches the original data.Considering Equations 1 and 2, the modeled curve is the sum of single Gaussian functions, each one defined in terms of its mean, amplitude and standard deviation.
In this study, the difference between the digitized original values and the modeled function is used to measure the fitness of each solution, as described in Equation 8, where the difference (DIF) is computed using the observed curve (COBS) and the modeled curve (CMOD).
The iterative process searches for the best parameters of the Gaussian curves that better approach the original curve.The fitness value of the best individual is measured after each iteration, in order to monitor the evolutionary process.It is considered that convergence is reached when the fitness value lies above 95% and then the process is stopped.In some cases, the iterative process reaches the maximum allowed iterations without reaching the global minimum of the difference.In such cases, the maximal difference is computed along the time axis and compared to an established threshold of 20 DN in amplitude.If it is greater than this limit, the search process is repeated increasing the number of searched peaks.This condition was introduced specially to solve combinations of multiple curves, where the peaks of different Gaussians are superimposed.If the difference is less than the threshold, the process is terminated.

Codification of an individual
Bulletin of Geodetic Sciences, 24(2):270-285, Apr-Jun, 2018 For the codification of the solution as a genetic chain, the desired parameters and its format have to be specified.A single peak is described by three parameters: mean time, maximal amplitude and standard deviation.As the IWD system stores a maximum of 499 samples in order to reduce the amount of data, and in order to make digitizing technically feasible due to the high frequency operation of the ALTM Pegasus HD 500 system (Optech 2012), the binary representation of the mean time (μj) is made with 9 bits.The amplitude (Aj) was coded using 12 bits (4,096 DN), according to the IWD output.The standard deviation (σj) is a real value and was represented using 16 bits and controls the width of the base of the single peak in the abscissa axis.So, a binary chain representing a single Gaussian uses 37 bits.To represent two Gaussians, 74 bits are used, and so on.Figure 6 shows the binary codification of an individual with amplitude value equal to 1,562 ND (12 bits), mean time of 75 ns (9 bits) and standard deviation of 3.046875 (16 bits).
For the representation of a signal composed of more than one peak, for example n-peaks, n*3 parameters must be determined (A1, μ1, σ1, A2, μ2, σ2, …, An, μn, σn).Each Gaussian (G) can be described by the set of parameters (Aj, μj, σj).For example, the combination of two modeled curves (CMOD) is represented by Equation 9.

Points cloud computation
After the detection and computation of the peaks in the digitized data, the genetic chain is decomposed and translated into values of the Gaussian parameters.One of the most important results is the time, which enables to compute the range to the illuminated point on the surface.Using Equation 4, the range for each point is computed.Then it is projected to a 3D Cartesian coordinate system, defined by the optical center of the ALS system.Subsequently, the 3D coordinates of the point are computed in the geodetic frame, according the Equation 7. Finally, an ASCII file is generated with the UTM coordinates (E, N) and ellipsoidal height (h).In this file is added the amplitude, distance, standard deviation and GPST for each computed point.

Results
The results obtained with the proposed methodology are shown in this section.Figure 7 shows a simple WF with relatively well defined peaks.The emitted pulse is represented with the black solid line and the detected peak with the gray triangle.The SPD algorithm and the GA were able to determine five returns, which are represented by the red triangles and the five Gaussians are plotted with dotted lines.The solid red line represents the WF of the received signal.The second and third returns have low amplitude values, 234 and 236 DN respectively, but are above the amplitude threshold of 220 ND previously defined.The DDR software could not detect these returns, probably because it uses a noise filter with amplitude above 220 DN.Gaussians 2, 3 and 4 in Figure 8 represent intermediate returns that did not cause a local maximum, and therefore it was not possible to detect them with SPD algorithm.The overlaps of the three Gaussians reduces the symmetry of the curve.
As a result of the detection of more peaks in some series, the number of points was increased in the point cloud.Table 3 illustrates the relative quantity of points computed using the DDR software and the GA methodology.For the point cloud generated with the 12 strips, there was an absolute increase of 3,021 points, which represents a relative increase by approximately 17 %.The point cloud computed by the DDR software has 17,880 points, while the GA-based method has 20,901 points.Figures 9 and 10 illustrate the points of the first strip (L1) computed with the DDR software and the GA method, respectively.The GA points cloud is denser, with around 23 % more points.Two regions are outlined with red circles in Figure 10.In areas with low vegetation (region 1), there was no significant increase of points.In the second region, a region covered by high vegetation, it can be seen that there was a significant increase of points.In order to confirm the results obtained in the areas of high vegetation, another study area was chosen with predominance of high and dense vegetation, again more points were obtained that were generated with DDR software.The increase in the number of points was approximately 28 %.

Conclusions
In this study, the waveform decomposition was performed using genetic algorithms with simple peak detection and the maximum local point method.The results were compared to reference values obtained using a commercial software.The results proved that it was possible to increase the number of points by approximately 17 %.It was verified that the best results were obtained in areas covered by high vegetation, where the amount of points can be increased up to 28 %.
It was also verified that the computed ranges are compatible to those of the commercial software, because they lie within the range precision of 0.15 m.When complex waveforms are analyzed, errors of 1 ns can be occur in the determination of the mean time of the peaks, because of the larger standard deviation.Consequently, errors in range measurements are larger than the nominal precision.
Finally, it is recommended to improve the peaks detection algorithm by testing other genetic operations, such as different crossover or selection methods, which could speed up the detection process.
Figure3shows the plot of the WF data shown in Table1, as well as the mean time of the peaks computed using the local maximum point method.The red line shows two peaks, corresponding to two objects.The time of the received signal (T1) is 5,295.96ns, as displayed on Table1.The time of each peak of the received signal (ti) is equal to 31 ns and 60 ns, respectively, corresponding to the first (t1) and second (t2) returns.The peak position of the emitted pulse (t0) is 21 ns.Considering the temperature of 16.8°C and pressure of 928.2 hPa at flight high, the range of each peak can be computed and the values are 795.497m and 799.845 m.

Figure 3 :
Figure 3: Time measurement with peaks in WF of received signal.

Figure 4 :
Figure 4: Study area and point cloud.

Figure 8
Figure 8 illustrates a complex WF example.Using the SPD method two peaks were determined and are represented as red triangles.Using the GA methodology, five peaks were detected.Again, the WF of the received signal is represented as solid line red, while each Gaussian is displayed as dotted lines.Using the DDR software, four returns were computed.

Figure 9 :
Figure 9: Points of line 1 computed with DDR software.

Figure 10 :
Figure 10: Points of line 1 computed with GA method.

Table 2 :
Comparison between the ranges calculated with the DDR software and SPD method.

Table 3 :
Number of Points