AN OUTLIER DETECTION METHOD IN GEODETIC NETWORKS BASED ON THE ORIGINAL OBSERVATIONS

The observations in geodetic networks are measured repeatedly, and the mean values of these original observations are used in the network adjustment step. The mean operator is a kind of least squares estimation (LSE). LSE provides optimal results when the random errors are normally distributed. If one of the original repeated observations contains an outlier, the magnitude of this outlier decreases because the mean value of the original observations is used in the network adjustment and in the outlier detection. In this case, the reliability of the outlier detection methods decreases, too. Since the original repeated observations are independent, they can be used in the adjustment model directly instead of their estimated mean value. In this study, to show the effect of taking the mean of the original repeated observations, a leveling network that contains both outward run and return run observations was simulated. Tests for outlier and the Huber and Danish methods were applied to two different cases. First, the mean values of the original observations (outward run and return run) were used; then all original observations were considered in the outlier detection. The reliabilities of the methods were measured by the Mean Success Rate (MSR). According to the obtained results, the second case yields more reliable results than the first case.


INTRODUCTION
Geodetic networks (leveling networks, horizontal control networks or Global Navigation Satellite System (GNSS) networks) are established, and their observations are measured at least twice. For various reasons, e.g. the environment, the attention of the user or the condition of the equipment, some outliers may occur in the observations. These outliers can affect the estimated parameters and their variances, and the resulting wrong results may lead to wrong conclusions. Therefore, outliers in the observations must be detected. There are two main approaches to outlier detection in geodesy: Tests for outlier (BAARDA 1968; POPE 1976) and robust methods (HUBER 1981; HAMPEL et al. 1986; ROUSSEEUW and LEROY 1987; KOCH 1999).
The least squares estimation (LSE) plays an important role in both Tests for outlier and robust methods. It is very sensitive to deviations from the model assumptions (HAMPEL et al. 1986) and spreads the effects of the outliers over all residuals (HEKIMOGLU et al. 2011). Each observation in a geodetic network is measured at least twice, and then the mean value is taken. The mean operator is a kind of LSE. The magnitude of the outlier is smeared by the LSE over the residuals of the other observations, and therefore the outlier detection methods are sometimes not successful. To avoid this effect, the outlier analysis must be based on the original (initial) observations. For example, in a leveling network a height difference is measured as an outward run and a return run. These are called original observations here. Their mean value is estimated and used for the network adjustment. It is known that the outward run $h_{oi}$ and the return run $h_{ri}$ are independent of each other. If the outward run or the return run is contaminated, half of the outlier ($\Delta$) arises in the mean value, $(h_{oi} + h_{ri})/2 + \Delta/2$. Since the outlier $\Delta$ is replaced by $\Delta/2$ in the ordinary adjustment, detecting small outliers becomes more difficult. Therefore, each of the repeated observations should be used in the adjustment model.

Bol. Ciênc. Geod., sec. Artigos, Curitiba, v. 20, n° 3, p. 578-589, jul-set, 2014.
In order to measure the capacity of an outlier detection method, Hekimoglu and Koch (1999, 2000) proposed using the Mean Success Rate (MSR), that is, the number of successes divided by the total number of experiments in the simulation study. In robust statistics, the breakdown point is also used to measure the global capacity (i.e. reliability) of robust methods (XU 2005; YOUCAI 1995). The MSR has been used in regression analysis (HEKIMOGLU and KOCH 1999; HEKIMOGLU and KOCH 2000; HEKIMOGLU and BERBER 2003; HEKIMOGLU 2005) and also in geodetic networks (HEKIMOGLU and ERENOGLU 2007; ERENOGLU and HEKIMOGLU 2010; HEKIMOGLU et al. 2011; HEKIMOGLU and ERDOGAN 2013).
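As a minimal sketch of how the MSR could be computed in a simulation, the function below counts an experiment as a success only when the set of flagged observations equals the set of truly contaminated ones. This success criterion and the name `mean_success_rate` are illustrative assumptions, not taken from the cited studies.

```python
def mean_success_rate(detected, contaminated):
    """Return the share of experiments in which detection succeeded.

    detected[j] and contaminated[j] hold the observation indices flagged
    by the method and truly contaminated in experiment j. The "exact
    match" success criterion is an assumption made for this sketch.
    """
    if len(detected) != len(contaminated) or not contaminated:
        raise ValueError("need one detection result per experiment")
    successes = sum(
        1 for found, truth in zip(detected, contaminated)
        if set(found) == set(truth)
    )
    return successes / len(contaminated)


# Three hypothetical experiments: the first succeeds, the other two fail.
msr = mean_success_rate([[4], [2], []], [[4], [3], [5]])
```

A looser criterion (for example, counting a run as successful whenever the true outlier is among the flagged observations) would give higher rates, so the criterion should be stated alongside any reported MSR.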
In this study, the effect of taking the mean value of the original repeated observations was investigated. A leveling network that consists of outward runs and return runs was simulated. Outliers with different magnitude intervals were added to the observations to obtain contaminated observations. Tests for outlier and robust methods were applied to these contaminated working samples, and the MSRs of the methods were obtained. According to the results, the case that considers all original observations in the adjustment model is more reliable than the case that considers the mean value of the original observations in the ordinary adjustment model.

LINEAR MODELS
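The Gauss-Markov linear model for geodetic networks is given as follows (KOCH 1999):

$$\mathbf{l} + \mathbf{v} = \mathbf{A}\hat{\mathbf{x}} \quad (1)$$

$$\hat{\mathbf{x}} = (\mathbf{A}^T\mathbf{P}\mathbf{A})^{-1}\mathbf{A}^T\mathbf{P}\,\mathbf{l} \quad (2)$$

$$\mathbf{Q}_{xx} = (\mathbf{A}^T\mathbf{P}\mathbf{A})^{-1} \quad (3)$$

$$\mathbf{Q}_{vv} = \mathbf{P}^{-1} - \mathbf{A}\mathbf{Q}_{xx}\mathbf{A}^T \quad (4)$$

where $\mathbf{l}$ is the $n \times 1$ vector of observations, $\hat{\mathbf{x}}$ is the $u \times 1$ parameter (unknown) vector, $\mathbf{A}$ is the $n \times u$ coefficient matrix, $\mathbf{v}$ is the residual vector, $\mathbf{P}$ is the diagonal $n \times n$ weight matrix, $\mathbf{Q}_{xx}$ is the cofactor matrix of the parameter vector, $\mathbf{Q}_{vv}$ is the cofactor matrix of the residual vector, $n$ is the number of observations and $u$ is the number of unknown parameters.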
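The least squares solution of the Gauss-Markov model and the cofactors of the residuals can be sketched in a few lines of plain Python. The three-line network below (point 1 held fixed at 100.000 m, a 3 mm loop misclosure) and the helper names `matmul`, `inverse` and `adjust` are hypothetical and serve only as an illustration; they are not the network or software of this study.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)]
            for row in X]


def inverse(M):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(M)
    aug = [[float(a) for a in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular normal matrix (datum not fixed?)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        d = aug[col][col]
        aug[col] = [a / d for a in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def adjust(A, l, p):
    """Least squares adjustment with diagonal weights p.

    Returns the parameters x, the residuals v = A x - l and the diagonal
    of Q_vv = P^-1 - A Q_xx A^T.
    """
    At_P = [[a * w for a, w in zip(col, p)] for col in zip(*A)]  # A^T P
    Qxx = inverse(matmul(At_P, A))
    x = [row[0] for row in matmul(Qxx, matmul(At_P, [[li] for li in l]))]
    v = [sum(a * xj for a, xj in zip(row, x)) - li for row, li in zip(A, l)]
    A_Qxx = matmul(A, Qxx)
    qvv = [1.0 / w - sum(aq * a for aq, a in zip(aq_row, row))
           for w, aq_row, row in zip(p, A_Qxx, A)]
    return x, v, qvv


# Hypothetical network: unknown heights H2, H3; H1 is fixed.
H1 = 100.000
A = [[1, 0], [-1, 1], [0, 1]]              # lines 1-2, 2-3, 1-3
l = [H1 + 2.256, 2.990, H1 + 5.249]        # reduced observations in m
p = [1.0, 1.0, 1.0]                        # equal weights
x, v, qvv = adjust(A, l, p)
```

In this example the 3 mm misclosure is distributed equally over the three lines, which illustrates how LSE spreads a discrepancy over all residuals. A numerical library would normally replace the hand-written matrix helpers.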

Tests for Outlier
Outlier detection procedures were proposed for geodesy by Baarda (1968) and Pope (1976). It is assumed that outliers are rare in the observations; they are called "bad" observations, and their expected value is larger than 3σ. If an observation $l_i$ has an outlier $\delta l_i$, the hypothesis

$$H_0: E(\delta l_i) = 0 \quad \text{against} \quad H_1: E(\delta l_i) \neq 0 \quad (5)$$

is tested. If the a priori variance of unit weight $\sigma_0^2$ is known, the normalized residual is computed as the test statistic (the covariance matrix of the observations is diagonal):

$$w_i = \frac{|v_i|}{\sigma_0\sqrt{q_{vv_i}}} = \frac{|v_i|}{\sigma_{v_i}} \quad (6)$$

where $q_{vv_i}$ is the $i$th diagonal element of $\mathbf{Q}_{vv}$ and $\sigma_{v_i}$ is the standard deviation of the $i$th residual.
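If the a priori variance is not known, the studentized residual is computed by using the a posteriori variance $s_0^2 = \mathbf{v}^T\mathbf{P}\mathbf{v}/(n-u)$. The test statistic of the Pope test (τ-test) is given as (POPE 1976):

$$\tau_i = \frac{|v_i|}{s_0\sqrt{q_{vv_i}}} \quad (7)$$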
If the level of significance α applies to all observations jointly, the level of significance for each observation must be

$$\alpha_0 = 1 - (1-\alpha)^{1/n} \approx \alpha/n \quad (8)$$

where the critical value of the τ-distribution follows from the Student t-distribution, $\tau_{f,\alpha_0} = \sqrt{f\,t^2_{f-1,\alpha_0} / (f - 1 + t^2_{f-1,\alpha_0})}$ with $f = n - u$, and α is generally chosen as 0.05 (KOCH 1999). The Baarda and Pope tests in geodetic networks are iterative methods. Only the observation with the largest normalized or studentized residual is tested in one cycle of the iterations. If this observation is rejected, it is removed, and the remaining observations are adjusted again. This procedure is carried out until no more outliers are detected (SCHWARZ and KOK 1993).
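One cycle of the test with known a priori variance can be sketched as follows, using only the Python standard library. The two-sided normal critical value, the function names and the numbers in the example are assumptions made for illustration; the Pope test would need a τ (or t) quantile instead, which the standard library does not provide.

```python
import math
from statistics import NormalDist


def per_observation_alpha(alpha, n):
    """Significance level per observation for an overall level alpha."""
    return 1.0 - (1.0 - alpha) ** (1.0 / n)


def normal_critical_value(alpha0):
    """Two-sided critical value of the standard normal distribution."""
    return NormalDist().inv_cdf(1.0 - alpha0 / 2.0)


def snoop_once(v, qvv, sigma0, alpha0):
    """Test only the largest normalized residual, as in one test cycle.

    Returns (index, statistic, rejected). If rejected, the caller would
    remove that observation and repeat the adjustment.
    """
    w = [abs(vi) / (sigma0 * math.sqrt(q)) for vi, q in zip(v, qvv)]
    i = max(range(len(w)), key=w.__getitem__)
    return i, w[i], w[i] > normal_critical_value(alpha0)


# Hypothetical residuals (mm) and residual cofactors from an adjustment.
index, statistic, rejected = snoop_once(
    v=[0.4, -2.8, 0.9], qvv=[0.5, 0.5, 0.5], sigma0=1.0, alpha0=0.001
)
```

With these numbers the second observation has the largest normalized residual and exceeds the critical value, so it would be removed before the next adjustment.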

Robust Methods
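The robust M-estimation, a generalized form of maximum likelihood estimation, was introduced by Huber (1964). The normal equation system of the M-estimation is non-linear. To solve it, iteratively reweighted LSE is used (KOCH 1999). The M-estimations of the Huber and Danish methods were used in this paper (KRARUP et al. 1980; HUBER 1964):

$$\hat{\mathbf{x}}^{(k+1)} = (\mathbf{A}^T\bar{\mathbf{P}}^{(k)}\mathbf{A})^{-1}\mathbf{A}^T\bar{\mathbf{P}}^{(k)}\,\mathbf{l} \quad (9)$$

$$\bar{\mathbf{P}}^{(k)} = \mathbf{P}\,\mathbf{W}(\mathbf{v}^{(k)}), \quad k = 1, 2, \dots \quad (10)$$

$$\mathbf{W}(\mathbf{v}^{(k)}) = \mathrm{diag}(w_i^{(k)}) \quad (11)$$

$$\mathbf{v}^{(k)} = \mathbf{A}\hat{\mathbf{x}}^{(k)} - \mathbf{l} \quad (12)$$

where $\mathbf{E}$ is the identity matrix (the initial choice of $\mathbf{W}$), $k$ is the iteration number and $c$ is the tuning constant. For the first iteration, $\hat{\mathbf{x}}$ is estimated from Eq. (2). Then, at each iteration step the diagonal elements $w_i^{(k)}$ of the weight matrix are changed according to the related weight function $W(v_i^{(k)})$, and $\hat{\mathbf{x}}$ and $\mathbf{v}$ are recalculated.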

The M-Estimation of Huber
The number of iterations $k$ and the tuning constant $c$ are chosen as 5 and $1.5\sigma_0$, respectively.

The M-Estimation of Danish
The number of iterations $k$ and the tuning constant $c$ are chosen as 5 and $1.5\sigma_0$, respectively. If the absolute value of a residual estimated at the last iteration is greater than $3\sigma_0$, the observation is detected as an outlier; otherwise it is regarded as a good observation.
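The following sketch shows the two weight functions in their commonly cited forms together with the reweighting loop and the $3\sigma_0$ decision rule. It is an illustration under stated assumptions: the exact weight expressions used in this study are not reproduced here (several variants of the Danish weight exist), and the model is reduced to a single unknown (a robust mean of repeated observations) so that the example stays short. All names and numbers are hypothetical.

```python
import math


def huber_weight(v, c):
    """Huber weight: 1 inside [-c, c], c/|v| outside (common form)."""
    a = abs(v)
    return 1.0 if a <= c else c / a


def danish_weight(v, c):
    """Danish weight: 1 inside [-c, c], exp(-(v/c)^2) outside.

    This is one commonly cited variant and is an assumption here.
    """
    a = abs(v)
    return 1.0 if a <= c else math.exp(-((a / c) ** 2))


def robust_mean(obs, sigma0, weight, iterations=5, c_factor=1.5):
    """Iteratively reweighted LSE for the one-parameter model l + v = x."""
    c = c_factor * sigma0
    x = sum(obs) / len(obs)                 # start from the ordinary LSE
    for _ in range(iterations):
        w = [weight(x - l, c) for l in obs]
        total = sum(w)
        if total == 0.0:                    # all weights underflowed
            break
        x = sum(wi * l for wi, l in zip(w, obs)) / total
    residuals = [x - l for l in obs]
    flagged = [i for i, v in enumerate(residuals)
               if abs(v) > 3.0 * sigma0]    # 3*sigma0 decision rule
    return x, flagged


# Hypothetical repeated observations; the last one is contaminated.
obs = [10.1, 9.9, 10.0, 10.2, 15.0]
x_huber, flagged_huber = robust_mean(obs, 0.1, huber_weight)
x_danish, flagged_danish = robust_mean(obs, 0.1, danish_weight)
```

Both estimators start from the ordinary mean (11.04), move back towards the bulk of the data within the five iterations, and flag only the last observation. The Danish weights fall off much faster than the Huber weights, which is why a guard against a vanishing weight sum is included.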

MOTIVATION
The observations of geodetic networks are measured repeatedly, and the mean values of these observations are used in the network adjustment and also in the outlier detection. These repeated observations are independent, and if one of them contains an outlier, its effect is reduced by computing the mean value. As a result, the reliability of the outlier detection method decreases. If all original observations are used in the outlier analysis (i.e. not their mean values), more reliable results can be obtained.

Leveling Network
In a leveling network the height differences are measured as an outward run ($h_{oi}$) and a return run ($h_{ri}$). In the outlier detection step, the means of the outward runs and return runs are used; as a result, the effects of the outliers sometimes become smaller and may even disappear. Furthermore, LSE smears the effect of an outlier over the residuals of the other observations. If this effect can be eliminated, more reliable results can be obtained. This can be achieved by using the original observations without computing their mean.
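A small worked example makes the reduction concrete. The sketch below compares the size of an outlier relative to the standard deviation of a single run with its size relative to the standard deviation of the mean of two runs; it ignores the network geometry, and the function names are illustrative assumptions.

```python
import math


def standardized_outlier(delta, sigma):
    """Outlier size in units of the standard deviation of one run."""
    return abs(delta) / sigma


def standardized_outlier_in_mean(delta, sigma, repeats=2):
    """Outlier size after averaging `repeats` independent runs.

    One run carries the outlier delta, so the mean is shifted by
    delta / repeats while its standard deviation is sigma / sqrt(repeats).
    """
    return (abs(delta) / repeats) / (sigma / math.sqrt(repeats))


# A 5 mm outlier on a line with sigma = 1 mm.
single_run = standardized_outlier(5.0, 1.0)            # 5.0
mean_of_two = standardized_outlier_in_mean(5.0, 1.0)   # 5 / sqrt(2)
```

Although the mean is more precise than a single run, the outlier shrinks faster than the standard deviation, so its standardized size drops by a factor of $\sqrt{2}$ when the outward and return runs are averaged.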
To investigate the effect of using all original observations for outlier detection, the two leveling networks given in Figs. 1 and 2 are considered. These networks are based on the same observations. The network given in Fig. 1 contains the mean values of the outward runs and return runs. The analysis applied to this network is called the "classical approach". The network given in Fig. 2 contains all original observations, i.e. the outward runs and the return runs separately. The analysis applied to this network is called the "new approach".
Each height difference $h_i$ in Fig. 1 is the mean value of the outward run ($h_{oi}$) and the return run ($h_{ri}$) in Fig. 2, i.e. $h_i = (h_{oi} + h_{ri})/2$. Also, $h_{oi}$ and $h_{ri}$ are independent of each other. The outward run, the return run or both of them may be contaminated. If the outward run or the return run is contaminated, half of the outlier ($\Delta$) arises in the mean value ($h_i + \Delta/2$).

To demonstrate the reliability of the new approach, the networks given in Figs. 1 and 2 were considered. The heights of the six points were $H_1 = 100.000$ m, $H_2 = 102.256$ m, $H_3 = 105.246$ m, $H_4 = 106.245$ m, $H_5 = 104.946$ m and $H_6 = 103.486$ m. First the error-free height differences $h_{0i}$ ($i = 1, 2, \dots, 13$) and then the height differences for the outward run ($h_{oi}$) and the return run ($h_{ri}$) were computed. To obtain the measurements of the height differences, the random errors ($e_{oi}$ and $e_{ri}$) were generated from a normal distribution and added to the height differences. The precision was taken as $\sigma_0\sqrt{S}$ ($\sigma_0 = 1$ mm/$\sqrt{\text{km}}$), where $S$ is the length of the leveling line in km. For the classical approach, the precisions of the means of the original observations were estimated and used. The lengths of the leveling lines in Figs. 1 and 2 varied between 0.85 km and 1.9 km.

Thus, the measurements of the height differences ($h_{oi}$, $h_{ri}$; $i = 1, 2, \dots, 13$) were computed. The random errors $e_{oi}$ were 0.92, 0.21, -0.65, -1.01, 0.59, -0.47, 0.24, 1.87, -1.55, -1.90, -1.91, -0.64, -0.15 mm, and $e_{ri}$ were 0.60, 1.52, -0.48, 0.10, -0.67, 0.36, -0.89, -0.19, -0.26, 1.58, 0.83, -0.40, -0.69 mm. To generate one contaminated height value $\tilde{h}_i$, the random error $e_i$ was replaced by the outlier $dh_i$ as follows:

$$\tilde{h}_i = h_{0i} + dh_i \quad (13)$$

Erdogan, B.
In this section the following cases are tested:
I. The observations do not include any outlier.
II. The outward run ($h_{o5}$) is contaminated with +5 mm magnitude.
III. The return run ($h_{o7}$) is contaminated with -10 mm magnitude.
IV. The outward run and return run ($h_{o2}$ and $h_{r11}$) are contaminated with +10 mm magnitude.
V. The outward run and return run ($h_{o8}$ and $h_{r10}$) are contaminated with -20 mm and +1000 mm magnitude, respectively.

To compare the new approach with the classical approach, these five cases were analysed. Table 1 and Table 2 show the outliers that were detected by the classical approach and the new approach, respectively.
Table 1 - The outliers detected by the methods for the classical approach.
For the first case, none of the methods detected any outlier; they were successful when the observations did not include an outlier.
Table 2 - The outliers detected by the methods for the new approach.
For the second case, the methods in the classical approach did not detect the outlier, whereas the methods in the new approach detected the outlier ($h_{o5}$) successfully. In the classical approach the magnitude of the outlier decreases, so all methods are unsuccessful. For the third case, all methods used in the classical approach and in the new approach detected the true outlier.

For the fourth case, there were two outliers in the observations. In the classical approach, the Baarda and Huber methods detected only one of them, the Danish method detected both outliers, and Pope's test did not detect any outlier. All methods in the new approach were able to detect both outliers successfully.
For the fifth case, there were two outliers, and the magnitude of one of them was very large. All methods were successful in this case.

MONTE CARLO SIMULATION RESULTS
The success of the robust methods and Tests for outlier changes from one sample to another because the random errors are different (HEKIMOGLU and KOCH 1999; HEKIMOGLU and KOCH 2000). Therefore, to assess the reliability of the new approach, 10 000 working samples were simulated and analysed. For the Monte Carlo simulation the networks given in Figs. 1 and 2 were considered.

Classical Approach
The random errors, the measurements for the outward run and return run (i.e. the height differences) and the outliers were generated as described in the previous section. The outlier was added only to the outward run or to the return run. One hundred random error vectors $\mathbf{e}$ were generated, and one hundred good samples were then obtained by adding only the random errors to the height differences ($h_{0i}$). In addition, each sample was contaminated by one and by two outliers 100 times. Thus, 10 000 contaminated samples were obtained for one outlier and for two outliers separately (HEKIMOGLU and ERENOGLU 2007; ERENOGLU and HEKIMOGLU 2010; HEKIMOGLU et al. 2011; HEKIMOGLU and ERDOGAN 2013).
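The sample generation described above can be sketched as follows. Several details are assumptions made for this illustration and are not stated in the text: the outlier magnitude is drawn uniformly from the chosen interval, its sign is random, and the three error-free height differences and line lengths are hypothetical. The function name `simulate_contaminated_samples` is likewise illustrative.

```python
import math
import random

SIGMA0 = 1.0  # mm per sqrt(km), as in the simulated network


def simulate_contaminated_samples(h0, lengths_km, n_vectors=100,
                                  n_contaminations=100, n_outliers=1,
                                  low=3.0, high=6.0, seed=0):
    """Return a list of (observations, contaminated_indices) pairs.

    h0 holds error-free height differences in mm. Each of n_vectors
    good samples is contaminated n_contaminations times. Uniform
    magnitudes in [low, high] * sigma and random signs are assumptions.
    """
    rng = random.Random(seed)
    sigmas = [SIGMA0 * math.sqrt(s) for s in lengths_km]
    samples = []
    for _ in range(n_vectors):
        errors = [rng.gauss(0.0, s) for s in sigmas]
        good = [h + e for h, e in zip(h0, errors)]
        for _ in range(n_contaminations):
            indices = sorted(rng.sample(range(len(h0)), n_outliers))
            obs = list(good)
            for i in indices:
                magnitude = rng.uniform(low, high) * sigmas[i]
                sign = rng.choice((-1.0, 1.0))
                # The outlier replaces the random error of observation i.
                obs[i] = h0[i] + sign * magnitude
            samples.append((obs, indices))
    return samples


# Hypothetical error-free height differences (mm) and line lengths (km).
h0 = [2256.0, 2990.0, 5246.0]
lengths_km = [0.85, 1.2, 1.9]
samples = simulate_contaminated_samples(h0, lengths_km)
```

With the default settings the function returns 100 x 100 = 10 000 contaminated samples; each one would then be adjusted and tested, and the share of successful detections gives the MSR of a method.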
The mean values of the outward runs and return runs of this leveling network (Fig. 1) were first analysed with the classical approach, using both Tests for outlier and robust methods (the Danish and Huber methods), to decide whether the observations include outliers or not.
The MSRs and standard deviations of Tests for outlier and robust methods are given in Tables 3 and 4. α is chosen as 0.001 and 0.05 for Baarda's and Pope's tests, respectively. For the Danish and Huber methods $c$ is taken as $1.5\sigma_0$. If the residuals estimated at the last iteration of the Danish and Huber methods are greater than the threshold value, the corresponding observations are considered outliers. The threshold value is chosen as $3\sigma_0$. The magnitude intervals for the outliers are chosen as 3σ to 6σ and 6σ to 12σ.
The MSRs of all methods are very small for the magnitude interval between 3σ and 6σ because taking the mean of the outward run and the return run decreases the magnitude of the outlier. The MSRs of the Danish method are greater than those of the other methods. The reliabilities of the classical approach are not sufficient for the ...

The New Approach for Detection of Outlier
The outward run and the return run of a height difference are independent measurements, so both can be included in the adjustment model. All original observations should be in the adjustment model, so that the smearing effect of the mean operator is removed. The height differences, the outliers and the random errors are exactly the same as in the classical approach. The obtained MSRs and standard deviations of the methods are given in Tables 5 and 6. The MSRs of the new approach are greater than those of the classical approach, with a large improvement for both the small and the large magnitude intervals. Since the original observations are considered in the adjustment model, the smearing effect of estimating the mean value is removed. However, the Baarda, Pope, Danish and Huber methods in the new approach may flag good observations as outliers (Type I error) at rates of 3%, 6%, 12% and 9%, respectively. This is a risk for outlier detection.

CONCLUSION
The observations in geodetic networks are measured repeatedly; their mean values are then calculated and used in the adjustment model. If the observations do not contain any outlier, this causes no problem. If the observations include at least one outlier, the mean value smears the outlier over the other part of the observation. Therefore, the outlier analysis must be based on the original observations, not on their mean values. If the MSRs of the new approach are compared with the MSRs of the classical approach, it is clearly seen that the reliabilities of the new approach are significantly greater than those of the classical approach. However, if the observations do not contain any outlier, the Type I error increases.
Consequently, the original observations of a geodetic network should be preferred for the outlier detection without using any estimator before the network adjustment to obtain more reliable results.

Figure 1 - A leveling network that considers mean values of the outward runs and return runs.

Figure 2 - A leveling network that considers both outward runs and return runs.

Table 3 - MSRs and standard deviations of the Tests for outlier and robust methods between 3σ and 6σ for the classical approach.

Table 4 - MSRs and standard deviations of the Tests for outlier and robust methods between 6σ and 12σ for the classical approach.

Table 5 - MSRs and standard deviations of the Tests for outlier and robust methods between 3σ and 6σ for the new approach.

Table 6 - MSRs and standard deviations of the Tests for outlier and robust methods between 6σ and 12σ for the new approach.