
SPATIAL OUTLIERS DETECTION ALGORITHM (SODA) APPLIED TO MULTIBEAM BATHYMETRIC DATA PROCESSING

Abstract:

The amount of data produced by echosounders has increased dramatically with swath systems such as multibeam echo sounders and interferometric sonars, providing a considerable improvement in the representation of submerged terrain, especially when it comes to detecting objects hazardous to the navigator. However, the available processing algorithms have not kept pace with this evolution; manual processing, or at least constant intervention, is usually necessary, which makes this task arduous and highly subjective. In addition, statistical inconsistencies are not uncommon in most of the available algorithms and filters. SODA (Spatial Outliers Detection Algorithm) was recently presented as a methodology directed, at first, to echosounder data treatment. In this context, this article aims to evaluate the efficiency of SODA on real data from a multibeam sonar. With the δ-Method, SODA identified up to 74% of the spikes, i.e., 28 of the 38 confirmed spikes in the data set, reaffirming the strength of the methodology in the search for spikes in echosounder data.

Keywords:
Spikes; Outliers; Multibeam Sonar; Multibeam Data Processing

1. Introduction

Hydrographic surveys deal with the configuration of the ocean floor, rivers, lakes, dams, ports, and so on. Generally, these surveys focus on the measurement of depths, tides, ocean currents, gravity and terrestrial magnetism, and on the determination of chemical and physical properties of the water, with the main goal of compiling data for generating reliable nautical charts and publications (IHO, 2005). Performing these surveys therefore requires specific knowledge of the physical environment, underwater acoustics, the various equipment and sensors, as well as the appropriate procedures to comply with the requirements and recommendations defined by Brazilian and international regulations (IHO, 2008; Jong et al., 2010; DHN, 2014; Ferreira et al., 2015; Ferreira et al., 2016).

The main task of a hydrographic survey is collecting spatially georeferenced depths. This task takes place via so-called echosounder surveys. In the current scenario, the products generated by swath bathymetry systems show high gains in resolution and accuracy, both planimetric and altimetric (depth), together with a large data densification that describes the submerged bottom almost completely (IHO, 2005; USACE, 2013; Maleika, 2015). When it comes to shallow water sampling, this data density is even higher. An increasing number of hydrographic services have adopted multibeam technology as their main methodology for collecting echosounder data for chart production and updating (IHO, 2008; Instituto Hidrográfico, 2009; LINZ, 2010; NOAA, 2011; USACE, 2013; DHN, 2014).

Data collected by such technology, when used for producing or updating nautical charts, as well as in related activities, are expected to be free from abnormal data: spikes and tops. However, a bathymetric survey has several sources of uncertainty, as described, for example, in IHO (2008). In a depth measurement, spikes can depend, among other factors, on sonar quality and on the bottom detection algorithm (phase detection, amplitude detection, Fourier transform, etc.). They can also arise from secondary lobe detection, multiple reflections, air bubbles on the transducer, and reflections in the water column caused mainly by algae, fish shoals, the deep scattering layer¹, thermal variations and suspended sediments (Urick, 1975; Jong et al., 2010).

Manual processing of bathymetric data is still common nowadays, especially in the stage of outlier elimination. However, as the density of collected data has increased, this procedure has become time-consuming. Furthermore, when spike elimination is performed manually by a hydrographer, this stage shows a high degree of subjectivity and is therefore more sensitive to human errors. Even less efficient systems are already capable of collecting thousands of points per hour of survey, which can make traditional data processing more costly than the hydrographic survey itself (Calder & Mayer, 2003; Calder & Smith, 2003; Bjørke & Nilsen, 2009; Lu et al., 2010; Vicente, 2011).

In response to the increase in the volume of data collected by multibeam surveys, researchers began in the 1990s to develop methodologies and computer-aided processing algorithms to facilitate the hydrographer's task (Vicente, 2011). According to Debese (2007), these algorithms estimate depth at a given location, and some are also able to evaluate the estimation process qualitatively. CUBE (Combined Uncertainty and Bathymetry Estimator), presented by Calder (2003), stands out as one of the most promising algorithms for semi-automatic processing of multibeam data and is implemented in several commercial processing packages.

Several authors have developed research on the detection of anomalous values in bathymetric data from acoustic sonars. Ware et al. (1991) presented an outlier detection process based on the analysis of sample statistical properties. Likewise, Eeg (1995) proposed a statistical test to validate the size of the clusters used to detect spikes. Debese & Bisquay (1999), Motao et al. (1999), Debese (2007) and Debese et al. (2012) applied M-estimators, while Calder & Mayer (2003) used the Kalman filter to process bathymetric data automatically. Similarly, Bottelier et al. (2005) used kriging techniques, and Bjørke & Nilsen (2009) presented a spike detection technique based on the construction of trend surfaces. Moreover, Lu et al. (2010) developed an algorithm based on the robust LTS (Least Trimmed Squares) estimator.

However, these methodologies are mostly difficult to apply, only semi-automated, or implemented exclusively in commercial packages. In addition, most of these techniques rely on theoretical assumptions that are rarely met and/or verified, such as presuming that the variables are independent and normally distributed.

As an alternative to the available techniques, Ferreira (2018) presented a new methodology for locating spikes, entitled SODA (Spatial Outlier Detection Algorithm). In that study, the author thoroughly described the developed algorithm and validated it via computational simulations. SODA has a series of robust characteristics: it takes into account the autocorrelation and the non-normality of the data. In addition, its code is open source.

Thus, this study aims to evaluate the efficiency of SODA in processing real data. Bathymetric data collected with multibeam technology were used, preprocessed in agreement with the specifications of Brazilian regulations. The depth and position data submitted to the SODA methodology were later compared to traditional (manual) processing.

2. SODA - SPATIAL OUTLIERS DETECTION ALGORITHM

SODA, presented by Ferreira (2018), is a methodology developed for spike detection in bathymetric point clouds, specifically data from multibeam echosounders, interferometric sonars and airborne laser bathymetry systems. Although the focus of SODA is on bathymetric data, one can easily modify it to identify outliers in any set of georeferenced data, such as in Topography, Geodesy, Photogrammetry, Mining, etc. (Ferreira, 2018).

The theoretical development of SODA is based on theorems from classical statistics and geostatistics, which is intended to make it robust and efficient. Its operating principle consists in applying the steps described in the flowchart below (Figure 1).

Figure 1: Flowchart of the logical process of the SODA tool.

As can be seen in Figure 1, the first step of applying SODA consists in importing the point cloud. In this step, the three-dimensional coordinates in XYZ format (shapefile or text file) are imported, in which X and Y represent the planimetric coordinates (local, projected or geodetic) and Z denotes the reduced depth. An exploratory analysis is performed subsequently.
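As an illustration of this step, a minimal sketch of the import and exploratory summary (Python with NumPy; the file name and the whitespace-delimited X Y Z layout are assumptions of ours, not SODA's actual R implementation):

```python
import numpy as np

# Hypothetical input: a text file with one "X Y Z" record per line,
# where Z is the reduced depth.
pts = np.loadtxt("survey_area.xyz")       # shape (n, 3)
x, y, z = pts[:, 0], pts[:, 1], pts[:, 2]

# Simple exploratory summary of the depths.
print(f"n = {z.size}")
print(f"depth range = {z.min():.2f} to {z.max():.2f} m")
print(f"mean = {z.mean():.2f} m, variance = {z.var(ddof=1):.4f} m**2")
```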

The next step, perhaps the most important of the entire process, is verifying the presence or absence of spatial independence between the depth data, a condition assumed by the statistical techniques supporting SODA (Morettin & Bussab, 2004; Seo, 2006). This evaluation uses the semivariogram (Matheron, 1965; Ferreira et al., 2015).
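For illustration, a sketch of Matheron's classical (isotropic) experimental semivariogram estimator, which underlies this check (Python/SciPy; the binning choices are ours, and this is not the semivariogram routine used by SODA):

```python
import numpy as np
from scipy.spatial.distance import pdist

def experimental_semivariogram(xy, z, n_bins=15, max_frac=0.5):
    """Matheron's estimator: gamma(h) = mean of 0.5*(z_i - z_j)^2 per lag bin.

    Note: pairwise distances are O(n^2) in memory; fine for modest samples.
    """
    h = pdist(xy)                                      # pairwise distances
    g = 0.5 * pdist(z[:, None], metric="sqeuclidean")  # 0.5 * (z_i - z_j)^2
    edges = np.linspace(0.0, max_frac * h.max(), n_bins + 1)
    lags, gamma = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (h >= lo) & (h < hi)
        if mask.any():
            lags.append(h[mask].mean())
            gamma.append(g[mask].mean())
    return np.array(lags), np.array(gamma)
```

A semivariogram that is essentially flat from the origin (pure nugget effect) suggests spatial independence; a curve rising towards a sill indicates spatial dependence.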

Given the spatial nature of bathymetric data, a georeferenced depth sample is expected to be spatially dependent. In this case, SODA uses geostatistical tools to remove the spatial autocorrelation and thus continue with the procedures: a geostatistical modeling is carried out to generate the standardized residuals (SRs), which are acknowledged as homogeneous, non-tendentious, independent and normally distributed (Santos et al., 2017). Given the complexity of a coherent geostatistical analysis, human intervention is required at this stage (Ferreira et al., 2013). Conversely, there may be cases where irregular sampling or other factors lead to an independent sample, as shown by Ferreira (2018).

Once the spatial independence of the variable to be analyzed (depth or SR) has been verified, the SODA application continues. In the next step, the sample is spatially segmented in order to preserve the spatial character of the analysis (local analysis). SODA employs segmentation in circles, in which the search radius is defined by the user or by a spatial analysis; in the latter case, a radius equal to three times the minimum distance between points is used.

From a given radius, the algorithm constructs a circle centered on each depth or SR, identifying and storing all the data inside this circle as a subsample. All subsequent analyses are performed only on these subsamples. The independence of the subsamples is guaranteed by the following theorem: if X1, ..., Xk are independent random variables and g1, ..., gk are functions such that Yj = gj(Xj), j = 1, ..., k, then Y1, ..., Yk are independent random variables. The demonstration of this theorem, as well as examples, can be found in Mood (1950) and Mood et al. (1974).
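A sketch of this segmentation step (Python/SciPy k-d tree; only the 3× minimum-distance default comes from the text, the rest is illustrative):

```python
import numpy as np
from scipy.spatial import cKDTree

def circle_subsamples(xy, radius=None):
    """Return, for each point, the indices of all points inside its circle."""
    tree = cKDTree(xy)
    if radius is None:
        # Default from the text: 3x the minimum distance between points.
        d, _ = tree.query(xy, k=2)          # k=2: the point itself + nearest
        radius = 3.0 * d[:, 1].min()
    return tree.query_ball_point(xy, r=radius), radius
```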

In the subsequent step, as discussed, local analyses are performed on the subsamples generated previously. SODA employs three outlier detection techniques: the Adjusted Boxplot (Vandervieren & Hubert, 2004), the Modified Z-Score (Iglewicz & Hoaglin, 1993) and the δ-Method (Ferreira, 2018). The latter was developed together with SODA and, like the other methods mentioned above, is described in detail by Ferreira (2018).
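For reference, the Modified Z-Score of Iglewicz & Hoaglin (1993) flags an observation when |M_i| > 3.5, with M_i = 0.6745(x_i − median)/MAD; a sketch of its application to one subsample (the handling of a zero MAD is our assumption):

```python
import numpy as np

def modified_zscore_outliers(z, cutoff=3.5):
    """Iglewicz & Hoaglin (1993): M_i = 0.6745 * (z_i - median) / MAD."""
    med = np.median(z)
    mad = np.median(np.abs(z - med))         # median absolute deviation
    if mad == 0.0:
        return np.zeros(z.size, dtype=bool)  # degenerate subsample: flag nothing
    return np.abs(0.6745 * (z - med) / mad) > cutoff
```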

The δ-Method, given by Equation 1, is a new outlier detection threshold based on robust estimators with spatial characteristics.

Threshold = Q2_Local ± cδ   (1)

where Q2_Local is the median of the subsample and c and δ are constants that depend on the data variability. The constant c, set by the user, assumes the value 1 for irregular terrains or artificial channels (high variability), 2 for slopes (average variability) and 3 for flat terrain (low variability). The constant δ can be understood as a weight and is determined automatically by the algorithm through the evaluation of the global (NMAD_Global) and local (NMAD_Local) normalized median absolute deviations: δ = 0.5(NMAD_Global + NMAD_Local) if NMAD_Global > NMAD_Local; otherwise, δ = NMAD_Global.
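A hedged sketch of Equation 1 (using the usual 1.4826 normalization constant for the NMAD; our reading of the rule for δ follows the text above, and this is not SODA's R code):

```python
import numpy as np

def nmad(z):
    """Normalized median absolute deviation: 1.4826 * MAD."""
    return 1.4826 * np.median(np.abs(z - np.median(z)))

def delta_method_outliers(z_local, z_global, c=3):
    """Equation 1: flag values outside Q2_local +/- c*delta.

    c is set by the user: 1 (high variability), 2 (slopes), 3 (flat terrain).
    """
    n_g, n_l = nmad(z_global), nmad(z_local)
    delta = 0.5 * (n_g + n_l) if n_g > n_l else n_g
    q2 = np.median(z_local)
    return (z_local < q2 - c * delta) | (z_local > q2 + c * delta)
```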

Once the outlier detection thresholds have been applied, the last SODA stage is performed: the probability that each observation is an outlier (P_outlier) is determined for each of the three techniques, based on the number of times the observation was analyzed (N_analyzed) and the number of times it was flagged as an outlier (N_outlier) (Equation 2):

P_outlier (%) = (N_outlier / N_analyzed) × 100   (2)

For instance, suppose that, for a given search radius, observation i was subsampled 40 times and hence analyzed 40 times by each of the three outlier detection techniques. If, out of those 40 times, observation i was flagged as a spike by the δ-Method 10 times, then P_outlier = 10/40 = 0.25; that is, observation i has a 25% probability of being a spike under the δ-Method cutoff. SODA performs this computation jointly and automatically for the other thresholds as well.
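Putting the pieces together, a sketch of the bookkeeping behind Equation 2, reusing circle_subsamples from the segmentation sketch above (only one detector is shown; SODA tracks all three techniques jointly):

```python
import numpy as np

def p_outlier(xy, z, detector, radius=None):
    """Run `detector` on every circular subsample and apply Equation 2."""
    subsamples, _ = circle_subsamples(xy, radius)
    n_analyzed = np.zeros(z.size)
    n_flagged = np.zeros(z.size)
    for idx in subsamples:
        idx = np.asarray(idx)
        if idx.size < 7:           # minimum subsample size (Ferreira, 2018)
            continue
        n_analyzed[idx] += 1
        n_flagged[idx] += detector(z[idx])
    with np.errstate(invalid="ignore"):
        return np.where(n_analyzed > 0, n_flagged / n_analyzed, 0.0)
```

For the δ-Method, which also needs the global sample, one could pass, for example, detector=lambda zs: delta_method_outliers(zs, z, c=3) and then flag as spikes all points whose returned probability meets the chosen P_threshold.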

Finally, once the user sets a standard P_threshold, SODA spatially plots all observations, highlighting the spikes detected by the thresholds in use, i.e., all data for which P_outlier ≥ P_threshold. Afterwards, the hydrographer inspects the flagged data to confirm whether they are in fact spikes and then deletes them. In all instances, a new XYZ file is created for each of the techniques discussed above: the Adjusted Boxplot, the Modified Z-Score and the δ-Method.

According to the study presented by Ferreira (2018), SODA shows very interesting characteristics. In controlled environments, SODA proved efficient in detecting depths characterized as spikes. Its behavior in the presence of submerged structures (sandbars, rocks and shipwrecks) was another promising outcome of the simulations: SODA did not erroneously flag any of these features.

3. Experiments and results

The real data used for the practical evaluation of SODA were obtained through a partnership with the company A2 Marine Solution. The hydrographic survey took place in April 2017, near the turning basin of the Integrator Port Terminal Luiz Antônio Mesquita (TIPLAM, abbreviated in Portuguese). The turning basin is the area in front of berthing facilities, reserved for the turning maneuvers necessary for berthing and unberthing.

TIPLAM is located in the city of Santos, state of São Paulo, Brazil, and currently handles about 2.5 million tons per year, being responsible mainly for the discharge of sulfur, phosphate rock, fertilizers and ammonia.

For the bathymetric data collection, a swath bathymetry system was used, composed of an R2 Sonic multibeam echosounder, model Sonic 2022, integrated with an Applanix inertial navigation system, model I2NS. Planning, execution and data analysis followed the recommendations of NORMAM-25 (DHN, 2014), category A, and S-44 (IHO, 2008), Special Order.

The surveys were first pre-processed in the Hysweep software (Hypack, 2012), which consisted of the following steps:

  • Conversion of collected data from the various sensors into Hysweep format;

  • Analysis of auxiliary sensor data (attitude, latency, speed of sound, tide, etc.), aiming to identify possible flaws and, if necessary, interpolating or rejecting anomalous data;

  • Junction of datagrams²;

  • Computation of Total Propagated Uncertainty (Horizontal and Vertical);

  • Computation of three-dimensional coordinates in XYZ format (reduced georeferenced depths), and

  • Filtering duplicated points.

In this last step, all duplicated points, mainly from overlapping point clouds, were removed. This reduction of the original sample considerably shortened the processing time. Subsequently, the study area was selected and the three-dimensional coordinates in XYZ format (reduced georeferenced depths) were exported.

Figure 2 illustrates the study area.

Figure 2: Study area.

With the spatial data set and its XYZ coordinates, the SODA application may continue. The points were imported and submitted to the exploratory analysis shown in Table 1.

Table 1: Descriptive statistics of the study area.

From the variance (Table 1), it can be concluded that the data show low variability (Warrick & Nielsen, 1980). Moreover, the coefficients of asymmetry and kurtosis, which quantify, respectively, the deviation of the depth distribution from a symmetrical distribution and its degree of flatness, indicate a distribution that is left-skewed (negative), leptokurtic and, a priori, with a large concentration of values around the mean. In short, the distribution is non-normal and/or affected by abnormal values.

Figure 3 shows some graphs, constructed and generated by SODA, that aid in the exploratory analysis.

Figure 3: Exploratory graphical analysis.

The depth histogram, together with the density curve, provides indications regarding data normality, which may or may not be corroborated by the Q-Q plot analysis. A Q-Q plot permits checking how well the empirical (real) frequency distribution of the data matches a given probability distribution: the quantiles of the empirical distribution are plotted against the theoretical quantiles of the probability distribution, in this case the normal distribution. If the empirical distribution is normal, the graph appears as a straight line (Höhle & Höhle, 2009). The graphical analysis indicated the non-normality of the data, which could later be confirmed by univariate normality tests should spatial independence hold.
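As an illustration (SciPy's probplot, assuming matplotlib is available; z is the depth vector from the import sketch above):

```python
import matplotlib.pyplot as plt
from scipy import stats

# Empirical depth quantiles against theoretical normal quantiles;
# a straight line suggests normality (Höhle & Höhle, 2009).
stats.probplot(z, dist="norm", plot=plt)
plt.show()
```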

Finally, the Tukey boxplot shows, in general terms, the presence of possible outliers. Nevertheless, such an assumption cannot be confirmed, since the Tukey method does not consider the spatial dependence structure of the depths and its cut-off thresholds are derived from the normal distribution. Observing the histogram, however, some depths do fall slightly apart from the average.

Figure 4 presents the point cloud under study, which shows some depths disagreeing with their neighboring values. It should be pointed out that some of these possible spikes coincide with the abnormal depths detected by the Tukey method.

Figure 4: Spatial data set of the study area.

According to the flowchart shown in Figure 1, the spatial independence analysis is performed after the exploratory analysis. If spatial independence is confirmed, the methodology proceeds to determining the search radius and segmenting the sample. Otherwise, a geostatistical analysis must be performed to obtain the standardized residuals (SRs), and all subsequent analyses are performed on the SRs. SODA builds three semivariograms that allow, through visual analysis, the confirmation (or not) of spatial independence.

Figure 5 shows the semivariograms, in which spatial independence is confirmed by the presence of a pure nugget effect. This is a rare case, since spatial autocorrelation is an intrinsic characteristic of georeferenced data; for this reason, the process was redone, even though the results obtained proved identical.

The search radius is preferably determined from a spatial analysis; in this case, a radius equal to three times the minimum distance between points is selected, which here yielded a radius of 1.799 meters. From this radius, the algorithm segments the sample and applies the three proposed outlier detection thresholds to each subsample. Initially, the Adjusted Boxplot, Modified Z-Score and δ-Method thresholds detected, respectively, 2 028 (25.07%), 1 204 (14.88%) and 161 (1.99%) possible spikes. As the area is flat, the constant c of the δ-Method was set to 3 (plain terrain, low variability).

A comparison between the thresholds used by SODA in the study area shows a 95.65% agreement between the δ-Method and the Modified Z-Score, i.e., out of the 161 possible spikes detected by the δ-Method, 154 were also detected by the Modified Z-Score. Similarly, the agreement between the δ-Method and the Adjusted Boxplot was 57.76%, and between the Modified Z-Score and the Adjusted Boxplot, 43.35%.

Figure 5: Experimental semivariograms with 75%, 50% and 25% of the maximum distance.

In the next step, SODA refines the analysis by calculating, for each of the three techniques, the probability that each datum is a spike. For this step, P_threshold = 0.5 was used for the Adjusted Boxplot and the δ-Method and P_threshold = 0.8 for the Modified Z-Score, as suggested by Ferreira (2018). The methodology associated with the Adjusted Boxplot, Modified Z-Score and δ-Method then detected 66 (0.82%), 24 (0.30%) and 34 (0.42%) spikes, respectively. Out of the 24 spikes detected by the Modified Z-Score, 16 were also detected by the Adjusted Boxplot and by the δ-Method, an agreement of approximately 66%. The agreement between the δ-Method and the Adjusted Boxplot was nearly 38.23%, suggesting that the 16 outliers located in agreement with the Modified Z-Score are not the same ones.

Figure 6 shows the study area with the detected spikes highlighted in red.

Figure 6: Spikes detected by SODA from the Adjusted Boxplot (a), Modified Z-Score (b) and δ-Method (c) thresholds.

It is noteworthy that this data set was processed on a machine running Windows 10, with 8 GB of RAM (partially dedicated to the R software (R Core Team, 2017)) and an Intel® Core™ i7-4500U 1.80 GHz (2.40 GHz) processor. The processing time was approximately 9 minutes.

In order to thoroughly evaluate the robustness of the proposed methodology, particularly of the δ-Method, the study area was also submitted to manual processing. In this phase, the hydrographer visualizes the data through a graphical interface, which can present the full and partial point clouds, bathymetric profiles and backscatter images, among other information, and decides, based on qualitative and analytical judgement, which depths are valid.

In general, the study area is relatively small, with a flat relief that facilitates the manual search for spikes. Moreover, the data were collected by a high-quality multibeam system, which made manual processing reasonably simple. It is important to point out that depths collected with low-quality multibeam sonars, interferometric sonars or bathymetric LiDAR (Light Detection and Ranging) systems are frequently very noisy, affecting the speed of manual processing. Obviously, in shallow water or in large areas, the problems surrounding manual processing become aggravated.

Through manual processing, 38 spurious features (approximately 0.47% of the data) were detected, as shown in Figure 7.

Figure 7: Spikes detected through manual processing.

Taking the manual processing as a reference, the proposed methodology, particularly with the δ-Method threshold, proved quite robust, obtaining an agreement of 82.35%: out of the 34 spikes located by the δ-Method, 28 were also identified by manual processing. The success rate was 74%, since the δ-Method detected 28 out of the 38 spikes identified in the data set.

The undetected points are almost entirely outliers of very low magnitude, around 0.30 meters. Therefore, the failure to detect them probably causes neither losses nor gains to the final products.

Two main reasons explain this behavior. The first relates to border points: as can be seen in Figure 8, SODA failed to detect a few spikes at the extremities of the study area. This stems from the design of the methodology, since spike searches occur on subsamples formed from circles; at the edge of the point cloud, the analysis is affected by a reduced number of verifications (redundancy). In addition, at least 7 points are required within each circle for the outlier detection techniques to be applied (Ferreira, 2018).

The second reason is associated with spike agglomeration, which may bias the local analysis, and/or with spikes located in coverage failures. In these cases, the redundancy of the analysis is also reduced and, consequently, the reliability of the process decreases. In more extreme cases, spikes located in holidays (coverage gaps) or in areas of low point density may not be analyzed at all, since the subsamples tend to contain fewer than seven depths. Therefore, SODA is most effective when the point cloud is dense and free of coverage failures.

Analyzing the δ-Method in particular, it flagged six valid depths as spikes. When investigating these depths, some of them were located in areas with coverage failures, where, as discussed above, the methodology may not be flawless. The others (ID 6026 and ID 2184), when analyzed locally, showed depths diverging from their neighbors by a magnitude near 50 centimeters, indicating, therefore, failures in the manual processing.

Besides, as Table 2 shows, these depths obtained a P_outlier of approximately 50%. This means that if the user had set P_threshold = 51%, the method would have flagged only 2 false spikes (ID 6026 and ID 2184).

Table 2: Valid depths marked as spikes by the δ-Method.

The Adjusted Boxplot overestimated spike detection, flagging depths that, compared to the manual processing, are not characterized as spikes. Of the sixty-six probable spikes detected, only 17 were identified through manual processing. Therefore, the agreement and the success rate of this threshold were 25.76% and 44.74%, respectively.

The incorrectly detected points, when analyzed locally along partial raw bathymetric profiles, are depths that deviate only slightly from their neighbors, i.e., local noise. For navigation purposes, most points detected by this threshold are in fact valid depths; none of the probable spikes represents a direct hazard to navigation, and thus their exclusion does not cause losses in the final products. In fact, it may even represent gains, since the method proved to be an excellent tool for smoothing bathymetric surfaces. Consequently, in cases where smoothing of the submerged terrain is desired, for instance when constructing contour lines or digital elevation models, this threshold can be significantly useful.

On the other hand, even though the Adjusted Boxplot threshold found 66 probable outliers out of 38 possible, it obtained only 44.74% accuracy. Analysis of the undetected spikes reveals the previously discussed issue: the methodology shows low efficiency in locating spikes at the extremities of the study area, in holidays and in areas of low point-cloud density. The failure to detect spurious features in spike agglomerations, such as the one in the central part of the study area (Figure 8), must also be highlighted.

Finally, the Modified Z-Score threshold underestimated spike detection. Twenty-four spikes were found, of which 16 are anomalous depths, i.e., an agreement of 66.67% and a success rate of 42.15%. Overall, this threshold was more efficient than the Adjusted Boxplot. Analyzing the undetected spikes, the Modified Z-Score showed the same problems as before: agglomerated spikes, spikes at the extremities of the study area and spikes in holiday areas were not detected. Conversely, eight valid depths were mistakenly flagged as spikes; six of these were also flagged by the Adjusted Boxplot, and the other two (ID 6377 and ID 7801) differ by about 20 centimeters from their neighborhood.

Figure 8 shows the spikes detected by manual processing and those located by the proposed methodology in association with the outlier detection techniques.

Figure 8: Spikes excluded through manual processing and spikes detected by SODA from the Adjusted Boxplot (a), Modified Z-Score (b) and δ-Method (c) thresholds.

Based on the above, the proposed methodology associated with the δ-Method proved quite robust, obtaining about a 74% success rate on the real data, while the Adjusted Boxplot and Modified Z-Score thresholds reached 44.73% and 42.10%, respectively. In general, all thresholds, when compared to manual processing, proved to be powerful tools for locating and eliminating spikes. Some adjustments, though, such as the definition of the constant c for the δ-Method, are still required.

4. Final Remarks

This work aimed to evaluate the SODA methodology on real bathymetric data, using bathymetric lines surveyed with multibeam technology. The data were collected in a moderately small area with a flat submerged bottom, which made manual processing reasonably easy; the manually processed data were used as the reference for the SODA validation.

The proposed spike detection strategy proved to be a versatile alternative of great potential compared to currently used methodologies, especially manual treatment.

In this regard, it is important to stress that the proposed method reduces the degree of subjectivity involved in manually identifying spikes, since it is a semi-automatic approach in which the hydrographer simply indicates a few parameters, such as the constant c (δ-Method). If preferred, one may also set the search radius and/or modify the P_threshold, instead of performing analyses based on personal experience. Thus, besides being less subjective, the proposed approach does not require inspecting large data sets point by point, making it a less time-consuming process.

Even existing semi-automatic methods, which may be more complex to apply, are almost entirely based on theoretical assumptions that are rarely met and/or verified, for example, the assumption that the studied variables are independent and normally distributed; the proposed tool, by contrast, explicitly analyzes normality and spatial independence. Besides minimizing the hydrographer's interference, the use of theorems from classical statistics and geostatistics strengthens SODA theoretically.

Among the thresholds used by SODA, the δ-Method showed superior performance. The Adjusted Boxplot threshold overestimated the number of spikes in the data set; however, SODA associated with the Adjusted Boxplot can be a useful tool for smoothing bathymetric models prior to the generation of contour lines. Unlike the results presented by Ferreira (2018), for real data the Modified Z-Score threshold underestimated the number of spikes in the data set.

Although the focus here was on multibeam survey data, the proposed methodology can be applied to airborne laser bathymetry systems, interferometric sonars and even related areas such as Topography, Geodesy, Photogrammetry, Mining, Statistics, etc. Future tests and simulations are recommended on different terrains, aiming at improving the algorithm in terms of processing time, definition of the search radius for areas with coverage failures and definition of the constant c of the δ-Method.

Regarding the determination of the search radius, the simultaneous application of different search radii during processing is also suggested. This would allow greater redundancy of analyses, as well as a possible improvement of results on point clouds with holidays. Another matter to be elucidated is the analysis at the extremities of point clouds, even though a computational solution does not seem feasible; a practical alternative would be to extrapolate the study area during field surveys. Finally, the application of this methodology in related areas is desirable.

ACKNOWLEDGMENT

The authors would like to thank GEPLH (Study and Research Group on Hydrographic Surveys) and DEC (Department of Civil Engineering) for making this work possible.

References

  • BJØRKE, J. T. & NILSEN, S. Fast trend extraction and identification of spikes in bathymetric data. Computers & Geosciences, v. 35, n. 6, p. 1061-1071, 2009.
  • BOTTELIER, P.; BRIESE, C.; HENNIS, N.; LINDENBERGH, R.; PFEIFER, N. Distinguishing features from outliers in automatic Kriging-based filtering of MBES data: a comparative study. Geostatistics for Environmental Applications, Springer, p. 403-414, 2005.
  • CALDER, B. R. & MAYER, L. A. Automatic processing of high-rate, high-density multibeam echosounder data. Geochemistry, Geophysics, Geosystems, v. 4, n. 6, 2003.
  • CALDER, B. R. & SMITH, S. A time/effort comparison of automatic and manual bathymetric processing in real-time mode. In: Proceedings of the US Hydro 2003 Conference, The Hydrographic Society of America, Biloxi, MS. 2003.
  • CALDER, B. R. Automatic statistical processing of multibeam echosounder data. The International Hydrographic Review, v. 4, n. 1, p. 53-68, 2003.
  • DEBESE, N. & BISQUAY, H. Automatic detection of punctual errors in multibeam data using a robust estimator. The International Hydrographic Review , v. 76 n. 1, p. 49-63, 1999.
  • DEBESE, N. Multibeam Echosounder Data Cleaning Through an Adaptive Surface-based Approach. In: US Hydro 07 Norfolk, 18p., 2007.
  • DEBESE, N.; MOITIÉ, R.; SEUBE, N. Multibeam echosounder data cleaning through a hierarchic adaptive and robust local surfacing. Computers & Geosciences , v. 46, p. 330-339, 2012.
  • DHN - Diretoria de Hidrografia e Navegação. NORMAM 25: Normas da Autoridade Marítima para Levantamentos Hidrográficos. Marinha do Brasil, Brasil, 52p., 2014.
  • EEG, J. On the identification of spikes in soundings. The International Hydrographic Review , v. 72, n. 1, p. 33-41, 1995.
  • FERREIRA, I. O. Controle de qualidade em levantamentos hidrográficos. Tese (Doutorado). Programa de Pós-Graduação em Engenharia Civil, Departamento de Engenharia Civil, Universidade Federal de Viçosa, Viçosa, Minas Gerais, 216p., 2018.
  • FERREIRA, Í. O.; RODRIGUES, D. D.; NETO, A. A.; MONTEIRO, C. S. Modelo de incerteza para sondadores de feixe simples. Revista Brasileira de Cartografia, v. 68, n. 5, p. 863-881, 2016.
  • FERREIRA, Í. O.; RODRIGUES, D. D.; SANTOS, G. R. Coleta, processamento e análise de dados batimétricos. 1ª ed. Saarbrucken: Novas Edições Acadêmicas, v. 1, 100p., 2015.
  • FERREIRA, Í. O.; SANTOS, G. R.; RODRIGUES, D. D. Estudo sobre a utilização adequada da krigagem na representação computacional de superfícies batimétricas. Revista Brasileira de Cartografia , Rio de Janeiro, v. 65, n. 5, p. 831-842, 2013.
  • HÖHLE, J. & HÖHLE, M. Accuracy assessment of digital elevation models by means of robust statistical methods. ISPRS Journal of Photogrammetry and Remote Sensing, v. 64, n. 4, p. 398-406, 2009.
  • HYPACK, Inc. Hypack - Hydrographic Survey Software User Manual. Middletown, USA, 1784p., 2012.
  • IGLEWICZ, B. & HOAGLIN, D. How to detect and handle outliers. Milwaukee, Wis.: ASQC Quality Press, 87p., 1993.
  • IHO - International Hydrographic Organization. C-13: IHO Manual on Hydrography. Mônaco: International Hydrographic Bureau, 540p., 2005.
  • IHO - International Hydrographic Organization. S-44: IHO Standards for Hydrographic Surveys. Special Publication n. 44-5th. Mônaco: International Hydrographic Bureau , 36p., 2008.
  • INSTITUTO HIDROGRÁFICO. Especificação Técnica para Produção de cartografia hidrográfica. Marinha Portuguesa, Lisboa, Portugal, v. 0, 24p., 2009.
  • JONG, C. D.; LACHAPELLE, G.; SKONE, S.; ELEMA, I. A. Hydrography. 2ª ed. Delft University Press: VSSD, 354p., 2010.
  • LINZ - Land Information New Zealand. Contract Specifications for Hydrographic Surveys. New Zealand Hydrographic Authority, V. 1.2, 111p., 2010.
  • LU, D.; LI, H.; WEI, Y.; ZHOU, T. Automatic outlier detection in multibeam bathymetric data using robust LTS estimation. In: 3rd International Congress on Image and Signal Processing (CISP), IEEE, v. 9, p. 4032-4036, 2010.
  • MALEIKA, W. The influence of the grid resolution on the accuracy of the digital terrain model used in seabed modeling. Marine Geophysical Research, v. 36, n. 1, p. 35-44, 2015.
  • MATHERON, G. Les variables régionalisées et leur estimation. Paris: Masson, 306p., 1965.
  • MOOD, A. M. Introduction to the theory of statistics. McGraw-Hill series in probability and statistics, 564p., 1950.
  • MOOD, A. M.; GRAYBILL, F. A.; BOES, D. C. Introduction to the Theory of Statistics. McGraw-Hill International, 577p., 1974.
  • MORETTIN, P. A. & BUSSAB, W. O. Estatística básica. 5ª ed. São Paulo: Editora Saraiva, 526p., 2004.
  • MOTAO, H.; GUOJUN, Z.; RUI, W.; YONGZHONG, O.; ZHENG, G. Robust method for the detection of abnormal data in hydrography. The International Hydrographic Review , v. 76, n. 2, p. 93-102, 1999.
  • NOAA - National Oceanic and Atmospheric Administration. Field Procedures Manual. Office of Coast Survey, 2011.
  • R CORE TEAM. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/, 2017.
  • SANTOS, A. M. R. T.; SANTOS, G. R.; EMILIANO, P. C.; MEDEIROS, N. G.; KALEITA, A. L.; PRUSKI, L. O. S. Detection of inconsistencies in geospatial data with geostatistics. Boletim de Ciências Geodésicas, v. 23, n. 2, p. 296-308, 2017.
  • SEO, S. A review and comparison of methods for detecting outliers in univariate data sets. Master Of Science, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, USA, 59p., 2006.
  • URICK, R. I. Principles of Underwater Acoustics. Toronto: McGraw-Hill, 1975.
  • USACE - U.S. Army Corps of Engineers. Hydrographic Surveying. Engineer Manual n. 1110-2-1003. Department of the Army. Washington, D. C. USA, 2013.
  • VANDERVIEREN, E. & HUBERT, M. An Adjusted Boxplot for skewed distributions. Proceedings in Computational Statistics, p. 1933-1940, 2004.
  • VICENTE, J. P. D. Modelação de dados batimétricos com estimação de incerteza. Dissertação (Mestrado). Programa de Pós-Graduação em Sistemas de Informação Geográfica Tecnologias e Aplicações, Departamento de Engenharia Geográfica, Geofísica e Energia, Universidade de Lisboa, Portugal, 158p., 2011.
  • WARE, C.; KNIGHT, W.; WELLS, D. Memory intensive statistical algorithms for multibeam bathymetric data. Computers & Geosciences , v. 17, n. 7, p. 985-993, 1991.
  • WARRICK, A.W. & NIELSEN, D.R. Spatial variability of soil physical properties in the field. In: HILLEL, D. Applications of soil physics. New York: Academic Press, p.319-344, 1980.
  • 1
    It consists of a layer of plankton that varies in depth throughout the day (IHO, 2005).
  • 2
    Complete and independent data entity. In this case, it refers to the data generated by the various sensors.

Publication Dates

  • Publication in this collection
    2 Dec 2019
  • Date of issue
    2019

History

  • Received
    27 July 2018
  • Accepted
    17 Dec 2018