1. INTRODUCTION
Geodetic networks (leveling networks, horizontal control networks or Global Navigation Satellite System (GNSS) networks) are established, and their observations are measured at least twice. For various reasons, such as environmental conditions, operator inattention or the condition of the equipment, outliers may occur in the observations. These outliers can distort the estimated parameters and their variances, and the resulting wrong results may lead to wrong conclusions. Therefore, outliers in the observations must be detected. There are two main approaches to outlier detection in geodesy: tests for outliers (^{BAARDA 1968}; ^{POPE 1976}) and robust methods (^{HUBER 1981}; ^{HAMPEL et al. 1986}; ^{ROUSSEEUW and LEROY 1987}; ^{KOCH 1999}).
Least squares estimation (LSE) plays an important role in both tests for outliers and robust methods. It is very sensitive to deviations from the model assumptions (^{HAMPEL et al. 1986}) and spreads the effects of outliers over all residuals (^{HEKIMOGLU et al. 2011}). Each observation in a geodetic network is measured at least twice, and the mean of the repeated measurements is taken. The mean operator is itself a kind of LSE. The magnitude of an outlier is therefore smeared over the residuals of the other observations, and the outlier detection methods are sometimes unsuccessful. To avoid this effect, outlier analysis must be based on the original (initial) observations. For example, in a leveling network a height difference is measured as an outward run and a return run; these are called the original observations here. Their mean is computed and used for the network adjustment. The outward run h_{oi} and the return run h_{ri} are known to be independent of each other. If either the outward run or the return run is contaminated by an outlier ∆, only half of the outlier appears in the mean value, ((h_{oi} + h_{ri})/2 + ∆/2). Since the outlier ∆ is replaced by ∆/2 in the ordinary adjustment, detecting small outliers becomes more difficult. Therefore, each of the repeated observations should be used in the adjustment model.
To measure the capacity of an outlier detection method, Hekimoglu and Koch (1999, 2000) proposed the Mean Success Rate (MSR), which is the number of successes divided by the total number of experiments in a simulation study. In robust statistics, the breakdown point is also used to measure the global capacity (i.e. reliability) of robust methods (XU 2005; YOUCAI 1995). The MSR has been used in regression analysis (^{HEKIMOGLU and KOCH 1999}; ^{HEKIMOGLU and KOCH 2000}; ^{HEKIMOGLU and BERBER 2003}; ^{HEKIMOGLU 2005}) and in geodetic networks (^{HEKIMOGLU and ERENOGLU 2007}; ^{ERENOGLU and HEKIMOGLU 2010}; ^{HEKIMOGLU et al. 2011}; ^{HEKIMOGLU and ERDOGAN 2013}).
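The MSR definition can be illustrated with a minimal Python sketch. The success rule used here (an experiment succeeds only when exactly the contaminated observations are flagged) and the names `mean_success_rate` and `is_success` are assumptions made for illustration; the three experiments are hypothetical.

```python
def mean_success_rate(outcomes):
    """Return the share of successful experiments, in percent."""
    outcomes = list(outcomes)
    if not outcomes:
        raise ValueError("at least one experiment is required")
    return 100.0 * sum(1 for ok in outcomes if ok) / len(outcomes)


def is_success(detected, true_outliers):
    """Assumed success rule: exactly the contaminated observations are flagged."""
    return set(detected) == set(true_outliers)


# Three hypothetical experiments: (flagged indices, truly contaminated indices).
experiments = [
    ({4}, {4}),      # the contaminated observation was found
    (set(), {4}),    # the outlier was missed
    ({4, 9}, {4}),   # a good observation was flagged as well
]
msr = mean_success_rate(is_success(d, t) for d, t in experiments)
```

Under this rule only the first experiment counts as a success, so one of three experiments succeeds.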
In this study, the effect of taking the mean of the original repeated observations was investigated. A leveling network consisting of outward runs and return runs was simulated. Outliers with different magnitude intervals were added to the observations to obtain contaminated observations. Tests for outliers and robust methods were applied to these contaminated working samples, and the MSRs of the methods were obtained. According to the results, including all original observations in the adjustment model gives more reliable results than using the mean of the original observations in the ordinary adjustment model.
2. LINEAR MODELS
The Gauss-Markov linear model for geodetic networks is given as follows (^{KOCH 1999}):

$$l + v = A\hat{x}, \qquad C_{ll} = \sigma_{0}^{2} P^{-1} \tag{1}$$

$$\hat{x} = (A^{T} P A)^{-1} A^{T} P\, l \tag{2}$$

$$Q_{xx} = (A^{T} P A)^{-1}, \qquad Q_{vv} = P^{-1} - A\, Q_{xx} A^{T}$$

where l is the n×1 vector of observations, x̂ is the u×1 vector of unknown parameters, A is the n×u coefficient matrix, v is the residual vector, P is the diagonal n×n weight matrix, σ_{0}^{2} is the variance of unit weight, C_{ll} is the n×n covariance matrix of the observations, Q_{xx} is the cofactor matrix of the parameter vector, Q_{vv} is the cofactor matrix of the residual vector, n is the number of observations and u is the number of unknown parameters.
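As a minimal sketch of how these quantities fit together, the following Python code adjusts a small leveling loop with the usual weighted least squares solution. The three-line network, its observed values and the helper names (`inverse`, `adjust`) are illustrative assumptions and are not the networks of this study; the residuals follow the convention v = Ax̂ - l.

```python
def transpose(M):
    return [list(row) for row in zip(*M)]


def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def inverse(M):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(M)
    aug = [[float(x) for x in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def adjust(A, weights, l):
    """Weighted least squares: x = (A'PA)^-1 A'Pl, v = Ax - l, Qvv = P^-1 - A Qxx A'."""
    n = len(A)
    P = [[weights[i] if i == j else 0.0 for j in range(n)] for i in range(n)]
    At = transpose(A)
    AtP = matmul(At, P)
    Qxx = inverse(matmul(AtP, A))
    x = [row[0] for row in matmul(Qxx, matmul(AtP, [[li] for li in l]))]
    v = [sum(a * xi for a, xi in zip(row, x)) - li for row, li in zip(A, l)]
    AQAt = matmul(matmul(A, Qxx), At)
    Qvv = [[(1.0 / weights[i] if i == j else 0.0) - AQAt[i][j] for j in range(n)]
           for i in range(n)]
    return x, v, Qxx, Qvv


# Hypothetical loop: point 1 is fixed at height 0, unknowns are H2 and H3 (metres).
# Observations: H2 - H1, H3 - H2, H3 - H1, with a 3 mm loop misclosure.
A = [[1.0, 0.0], [-1.0, 1.0], [0.0, 1.0]]
weights = [1.0, 1.0, 1.0]
l = [2.256, 2.990, 5.249]
x_hat, v, Qxx, Qvv = adjust(A, weights, l)
```

With equal weights the 3 mm misclosure is shared equally by the three lines, and the trace of Q_{vv} equals the redundancy n - u = 1.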
2.1 Tests for Outlier
Outlier detection procedures for geodesy were proposed by ^{Baarda (1968)} and ^{Pope (1976)}. Outliers are assumed to be rare in the observations; they are called "bad" observations, and their expected value is larger than 3σ.
If an observation l_{i} contains an outlier δl_{i}, the hypothesis

$$H_{0}: E(\delta l_{i}) = 0 \quad \text{versus} \quad H_{1}: E(\delta l_{i}) \neq 0$$

is tested. If the a priori variance of unit weight σ_{0}^{2} is known, the normalized residual is computed as the test statistic (the covariance matrix of the observations being diagonal):

$$w_{i} = \frac{|v_{i}|}{\sigma_{0}\sqrt{q_{vvi}}} = \frac{|v_{i}|}{\sigma_{vi}}$$

where q_{vvi} is the i^{th} diagonal element of Q_{vv} and σ_{vi} is the standard deviation of the i^{th} residual. If w_{i} > z_{1-α/2}, the i^{th} observation is considered an outlier. α is generally chosen as 0.001 for Baarda's test (^{BAARDA, 1968}).
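The decision rule for the normalized residual, w_i = |v_i| / (σ_0 √q_vvi) compared with z_{1-α/2}, can be sketched in a few lines of Python. The residuals, cofactors and σ_0 below are hypothetical values chosen for illustration, and `baarda_flags` is an assumed helper name; the critical value is the standard normal quantile from Python's `statistics` module.

```python
from math import sqrt
from statistics import NormalDist


def baarda_flags(v, qvv_diag, sigma0, alpha=0.001):
    """Normalized residuals and their comparison with the two-sided normal quantile."""
    critical = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    w = [abs(vi) / (sigma0 * sqrt(q)) for vi, q in zip(v, qvv_diag)]
    return w, [wi > critical for wi in w], critical


# Hypothetical residuals (mm), diagonal cofactors of Qvv and a priori sigma0 (mm).
v = [0.4, -2.9, 0.7]
qvv_diag = [0.5, 0.6, 0.5]
w, flags, critical = baarda_flags(v, qvv_diag, sigma0=1.0)
```

For α = 0.001 the critical value is about 3.29, so only the second hypothetical observation (w ≈ 3.74) would be flagged.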
If the a priori variance is not known, the studentized residuals are computed using the a posteriori variance s_{0}^{2}. The test statistic of the Pope test (τ-test) is given by (^{POPE, 1976}):

$$\tau_{i} = \frac{|v_{i}|}{s_{0}\sqrt{q_{vvi}}}$$

If the level of significance α refers to all observations, the level of significance for each observation must be α/n, and the i^{th} observation is considered an outlier if

$$\tau_{i} > C_{1-\alpha,n,n-u}$$

where C_{1-α,n,n-u} = τ_{1-α/n,1,n-u-1} and α is generally chosen as 0.05 (^{KOCH 1999}). The Baarda and Pope tests are applied iteratively in geodetic networks. Only the observation with the largest normalized or studentized residual is tested in each cycle of the iteration. If this observation is rejected, it is removed, and the remaining observations are adjusted again. This procedure is repeated until no more outliers are detected (^{SCHWARZ and KOK, 1993}).
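The iterative procedure can be sketched as follows. To stay self-contained, the sketch assumes the simplest possible adjustment model: repeated, equally weighted direct observations of one quantity with known σ_0, for which the estimate is the mean and every residual has cofactor 1 - 1/n. It uses the normalized residual with the normal quantile; the function name and the observed values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, fmean


def iterative_snooping(obs, sigma0, alpha=0.001):
    """Test only the largest normalized residual per cycle; remove it and re-adjust."""
    critical = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    remaining = dict(enumerate(obs))   # original index -> observed value
    rejected = []
    while len(remaining) > 2:          # keep some redundancy in the toy model
        n = len(remaining)
        x_hat = fmean(list(remaining.values()))   # LSE of the single unknown
        q_vv = 1.0 - 1.0 / n                      # residual cofactor for unit weights
        w = {i: abs(x_hat - li) / (sigma0 * sqrt(q_vv)) for i, li in remaining.items()}
        worst = max(w, key=w.get)
        if w[worst] <= critical:
            break                      # the largest residual passes: stop
        rejected.append(worst)
        del remaining[worst]           # remove it, then adjust again
    return rejected, fmean(list(remaining.values()))


# Hypothetical repeated observations (m); the fourth one carries a 12 mm outlier.
observations = [10.001, 9.999, 10.000, 10.012, 10.002, 9.998]
rejected, estimate = iterative_snooping(observations, sigma0=0.001)
```

In the first cycle the outlier pulls the mean to 10.002 m, so the good observation 9.998 also has a normalized residual (about 4.4) above the critical value. Because only the largest residual is tested and the adjustment is repeated, this observation passes in the second cycle.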
2.2 Robust Methods
Robust M-estimation, a generalized form of maximum likelihood estimation, was introduced by Huber (1964). The normal equation system of M-estimation is non-linear; it is solved by iteratively reweighted LSE (^{KOCH 1999}). The M-estimators of Huber and the Danish method were used in this paper (^{KRARUP et al., 1980}; ^{HUBER, 1964}).
where E is the identity matrix, k is the iteration number and c is the tuning constant. For the first iteration, x̂ is estimated from Eq. (2). Then, at each iteration step, the diagonal elements of the weight matrix (W) are updated according to the related weight function W(v_{i}^{k}), and x̂_{k} and v_{k} are recalculated.
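Iteratively reweighted LSE can be sketched in Python under clearly stated assumptions. The model is reduced to a single unknown observed directly (A is a column of ones), so the weighted LSE is a weighted mean. The weight functions are commonly quoted forms of the Huber and Danish weights and may differ in detail from the variants used in this paper; the function names and the sample are hypothetical.

```python
from math import exp


def huber_weight(v, c, sigma):
    """Assumed Huber form: 1 inside c*sigma, c*sigma/|v| outside."""
    a = abs(v)
    return 1.0 if a <= c * sigma else c * sigma / a


def danish_weight(v, c, sigma):
    """Assumed Danish form: 1 inside c*sigma, exp(-(v/(c*sigma))^2) outside."""
    a = abs(v)
    return 1.0 if a <= c * sigma else exp(-(a / (c * sigma)) ** 2)


def irls_location(obs, weight_fn, c=1.5, sigma=1.0, max_iter=50, tol=1e-10):
    """Iteratively reweighted LSE for one directly observed unknown."""
    x_hat = sum(obs) / len(obs)        # first iteration: identity weights (plain LSE)
    weights = [1.0] * len(obs)
    for _ in range(max_iter):
        residuals = [x_hat - li for li in obs]
        weights = [weight_fn(v, c, sigma) for v in residuals]
        new_x = sum(w * li for w, li in zip(weights, obs)) / sum(weights)
        converged = abs(new_x - x_hat) < tol
        x_hat = new_x
        if converged:
            break
    residuals = [x_hat - li for li in obs]
    return x_hat, residuals, weights


# Hypothetical sample (mm); the fifth observation carries a gross error.
sample = [0.9, -0.4, 0.2, -0.7, 12.0, 0.1]
x_huber, res_huber, _ = irls_location(sample, huber_weight)
x_danish, res_danish, _ = irls_location(sample, danish_weight)
# Residuals of the last iteration above 3*sigma are taken as outliers (sigma = 1 mm).
flagged_huber = [i for i, v in enumerate(res_huber) if abs(v) > 3.0]
flagged_danish = [i for i, v in enumerate(res_danish) if abs(v) > 3.0]
```

In this toy sample the Huber weights bound the influence of the gross error, while the Danish weights suppress it almost completely, so the Danish estimate ends close to the mean of the five good observations.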
3. MOTIVATION
The observations of geodetic networks are measured repeatedly, and the mean values of these observations are used both in the network adjustment and for outlier detection. The repeated observations are independent; if one of them contains an outlier, taking the mean reduces its effect, and the reliability of the outlier detection method decreases accordingly. If all original observations are used in the outlier analysis instead of their mean values, more reliable results can be obtained.
3.1 Leveling Network
In a leveling network the height differences are measured as an outward run (h_{oi}) and a return run (h_{ri}). In the outlier detection step, the means of the outward and return runs are used, so the effects of outliers become smaller and may even disappear. Furthermore, LSE smears the effect of an outlier over the residuals of the other observations. If the effect of averaging is eliminated, more reliable results can be obtained. This can be achieved by using the original observations without computing their mean.
To investigate the effect of using all original observations for outlier detection, the two leveling networks given in Figs. 1 and 2 are considered. These networks are based on the same observations. The network given in Fig. 1 contains the mean values of the outward and return runs; the analysis applied to this network is called the "classical approach". The network given in Fig. 2 contains all original observations, i.e. the outward and return runs; the analysis applied to this network is called the "new approach".
Each height difference h_{i} in Fig. 1 is the mean of the outward run (h_{oi}) and the return run (h_{ri}) in Fig. 2, i.e. h_{i} = (h_{oi} + h_{ri})/2. Moreover, h_{oi} and h_{ri} are independent of each other. The outward run, the return run or both may be contaminated. If only the outward run or only the return run is contaminated by an outlier ∆, half of the outlier appears in the mean value (h_{i} + ∆/2).
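The halving of the outlier can be related to the precision of the mean. The short derivation below is a sketch under the assumption that both runs are independent and have the same standard deviation σ_h; the bar notation for the mean is introduced here for illustration only.

```latex
\bar{h}_i = \frac{h_{oi} + h_{ri}}{2}, \qquad
\sigma_{\bar{h}} = \frac{\sigma_h}{\sqrt{2}}
\\[4pt]
\text{outlier in one run: } \Delta
\;\Longrightarrow\;
\text{shift of the mean: } \frac{\Delta}{2}
\\[4pt]
\frac{\Delta/2}{\sigma_{\bar{h}}}
= \frac{\Delta/2}{\sigma_h/\sqrt{2}}
= \frac{\Delta}{\sqrt{2}\,\sigma_h}
\approx 0.71\,\frac{\Delta}{\sigma_h}
```

Relative to the precision of the observation that enters the adjustment, an outlier of 5σ_h in one run therefore appears as a shift of only about 3.5σ in the mean, which brings it close to the usual detection thresholds.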
To demonstrate the reliability of the new approach, the networks given in Figs. 1 and 2 were considered. The heights of the six points were H_{1} = 100.000 m, H_{2} = 102.256 m, H_{3} = 105.246 m, H_{4} = 106.245 m, H_{5} = 104.946 m and H_{6} = 103.486 m, respectively. First the height differences free of random errors, h_{0i} (i = 1, 2, ..., 13), and then the height differences for the outward run (h_{oi}) and the return run (h_{ri}) were computed. To obtain the measured height differences, the random errors (e_{oi} and e_{ri}) were generated from a normal distribution and added to the error-free height differences. The precision was taken as σ_{h} = σ_{o}√S (σ_{o} = 1 mm/√km), where S was the length of the leveling line in km. For the classical approach, the precisions of the means of the original observations were estimated and used. The lengths of the leveling lines in Figs. 1 and 2 varied between 0.85 km and 1.9 km. Thus, the measurements of the height differences (h_{oi}, h_{ri}; i = 1, 2, ..., 13) were computed as

$$h_{oi} = h_{0i} + e_{oi}, \qquad h_{ri} = h_{0i} + e_{ri}$$

The random errors e_{oi} were 0.92, 0.21, -0.65, -1.01, 0.59, -0.47, 0.24, 1.87, -1.55, -1.90, -1.91, -0.64, 0.15 mm, and e_{ri} were 0.60, 1.52, -0.48, 0.10, -0.67, 0.36, -0.89, -0.19, -0.26, 1.58, 0.83, -0.40, -0.69 mm. To generate one contaminated height difference h_{i}, the random error e_{i} was replaced by the outlier dh_{i} as follows:

$$h_{i} = h_{0i} + dh_{i}$$
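The generation of one pair of runs and of one contaminated observation can be sketched in Python. The line length of 1.2 km, the seed and the function names are hypothetical; the error-free height difference is H_2 - H_1 = 2.256 m from the heights given above, and σ_h = σ_o√S with σ_o = 1 mm/√km as in the text.

```python
import random
from math import sqrt

SIGMA0_MM = 1.0   # mm per sqrt(km), as stated in the text


def simulate_runs(h0_m, length_km, rng):
    """Outward and return run: error-free height difference plus normal random errors."""
    sigma_mm = SIGMA0_MM * sqrt(length_km)
    h_o = h0_m + rng.gauss(0.0, sigma_mm) / 1000.0
    h_r = h0_m + rng.gauss(0.0, sigma_mm) / 1000.0
    return h_o, h_r, sigma_mm


def contaminate(h0_m, outlier_mm):
    """Contaminated observation: the random error is replaced by the outlier dh."""
    return h0_m + outlier_mm / 1000.0


rng = random.Random(42)            # hypothetical seed, for repeatability only
h0 = 102.256 - 100.000             # error-free height difference H2 - H1 (m)
h_o, h_r, sigma_mm = simulate_runs(h0, length_km=1.2, rng=rng)

h_o_bad = contaminate(h0, 5.0)     # outward run with a +5 mm outlier
mean_good = (h_o + h_r) / 2.0      # observation of the classical approach
mean_bad = (h_o_bad + h_r) / 2.0   # only half of the outlier reaches the mean
```

In the classical approach only `mean_bad` enters the adjustment, whereas the new approach keeps `h_o_bad` and `h_r` as separate observations.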
In this section the following cases are tested:
I. The observations do not include any outlier.
II. The outward run (h_{o5}) is contaminated with an outlier of +5 mm.
III. The outward run (h_{o7}) is contaminated with an outlier of -10 mm.
IV. One outward run and one return run (h_{o2} and h_{r11}) are each contaminated with an outlier of +10 mm.
V. One outward run and one return run (h_{o8} and h_{r10}) are contaminated with outliers of -20 mm and +1000 mm, respectively.
To compare the new approach with the classical approach, these five cases were analysed. Tables 1 and 2 show the outliers detected by the classical approach and the new approach, respectively.
Table 1. Outliers detected by the classical approach.

Cases \ Method | Baarda | Pope | Danish | Huber |
---|---|---|---|---|
Case I | - | - | - | - |
Case II | - | - | - | - |
Case III | h_{7} | h_{7} | h_{7} | h_{7} |
Case IV | h_{2} | - | h_{2}, h_{11} | h_{2} |
Case V | h_{8}, h_{10} | h_{8}, h_{10} | h_{8}, h_{10} | h_{8}, h_{10} |

Table 2. Outliers detected by the new approach.

Cases \ Method | Baarda | Pope | Danish | Huber |
---|---|---|---|---|
Case I | - | - | - | - |
Case II | h_{o5} | h_{o5} | h_{o5} | h_{o5} |
Case III | h_{o7} | h_{o7} | h_{o7} | h_{o7} |
Case IV | h_{o2}, h_{r11} | h_{o2}, h_{r11} | h_{o2}, h_{r11} | h_{o2}, h_{r11} |
Case V | h_{o8}, h_{r10} | h_{o8}, h_{r10} | h_{o8}, h_{r10} | h_{o8}, h_{r10} |

For the first case, no method detected any outlier; all methods were successful when the observations did not include an outlier.
For the second case, the methods in the classical approach did not detect the outlier, whereas the methods in the new approach detected the outlier (h_{o5}) successfully. In the classical approach the magnitude of the outlier is reduced by averaging, so all methods are unsuccessful.
For the third case, all methods detected the true outlier in both the classical approach and the new approach.
For the fourth case, there were two outliers in the observations. In the classical approach, the Baarda and Huber methods detected only one of them, the Danish method detected both, and Pope's test did not detect any outlier. All methods in the new approach detected both outliers successfully.
For the fifth case, there were two outliers, and the magnitude of one of them was very large. All methods were successful in this case.
4. MONTE CARLO SIMULATION RESULTS
The success of the robust methods and the tests for outliers changes from one sample to another because the random errors differ (^{HEKIMOGLU and KOCH, 1999}; ^{HEKIMOGLU and KOCH, 2000}). Therefore, to assess the reliability of the new approach, 10 000 working samples were simulated and analysed. For the Monte Carlo simulation, the networks given in Figs. 1 and 2 were considered.
4.1 Classical Approach
The random errors, the measurements of the outward and return runs (i.e. the height differences) and the outliers were generated as in the previous section. An outlier was added only to an outward run or a return run. One hundred random error vectors e were generated, and one hundred good samples were obtained by adding only these random errors to the height differences (h_{0}). In addition, each good sample was contaminated 100 times, by one outlier and by two outliers. Thus, 10 000 contaminated samples were obtained for one outlier and for two outliers separately (^{HEKIMOGLU and ERENOGLU, 2007}; ^{ERENOGLU and HEKIMOGLU, 2010}; ^{HEKIMOGLU et al., 2011}; ^{HEKIMOGLU and ERDOGAN, 2013}).
The mean values of the outward and return runs of the leveling network (Fig. 1) were first analyzed with the classical approach, using both tests for outliers (Baarda and Pope) and robust methods (Danish and Huber), to decide whether the observations include outliers.
The MSRs and standard deviations of the tests for outliers and the robust methods are given in Tables 3 and 4. α is chosen as 0.001 and 0.05 for the Baarda and Pope tests, respectively. For the Danish and Huber methods, c is taken as 1.5. If the residuals estimated at the last iteration of the Danish and Huber methods are greater than the threshold value, the corresponding observations are considered outliers. The threshold value is chosen as 3σ_{h}. The magnitude intervals for the outliers are chosen as 3σ - 6σ and 6σ - 12σ.
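The simulation design can be sketched as a small Python harness. Several details are assumptions made for illustration: outlier magnitudes are drawn uniformly from the interval with a random sign, a trial succeeds only when exactly the contaminated observations are flagged, and the reported standard deviation is taken over the success rates of the individual error vectors. The detector is passed in as a function; the toy detector below uses the known error-free values and only stands in for a real adjustment followed by a test or robust method.

```python
import random
from statistics import mean, pstdev


def random_outlier(sigma, low, high, rng):
    """Assumed outlier model: magnitude uniform in [low, high]*sigma, random sign."""
    return rng.choice((-1.0, 1.0)) * rng.uniform(low, high) * sigma


def simulate_msr(true_values, sigmas, detect, n_outliers, low, high,
                 n_error_vectors=100, n_contaminations=100, seed=0):
    """Success rate (%) per error vector; returns its mean and standard deviation."""
    rng = random.Random(seed)
    rates = []
    for _ in range(n_error_vectors):
        errors = [rng.gauss(0.0, s) for s in sigmas]        # one good sample
        successes = 0
        for _ in range(n_contaminations):
            obs = [t + e for t, e in zip(true_values, errors)]
            bad = rng.sample(range(len(obs)), n_outliers)
            for i in bad:                                   # error replaced by outlier
                obs[i] = true_values[i] + random_outlier(sigmas[i], low, high, rng)
            if set(detect(obs)) == set(bad):
                successes += 1
        rates.append(100.0 * successes / n_contaminations)
    return mean(rates), pstdev(rates)


TRUE = [0.0] * 13      # hypothetical error-free values (mm)
SIGMAS = [1.0] * 13    # hypothetical standard deviations (mm)


def toy_detect(obs):
    """Toy oracle detector (assumption): flags |obs - true| > 3 sigma."""
    return [i for i, (o, t, s) in enumerate(zip(obs, TRUE, SIGMAS))
            if abs(o - t) > 3.0 * s]


msr, msr_std = simulate_msr(TRUE, SIGMAS, toy_detect, n_outliers=1, low=3.0, high=6.0)
```

With 100 error vectors and 100 contaminations each, the harness evaluates 10 000 contaminated samples per setting; calling it with `n_outliers=0` gives the rate at which good samples pass without any false detection.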
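Table 3. MSRs and standard deviations (%) of the classical approach for outlier magnitudes between 3σ and 6σ.

Method \ Number of outliers | 0 (%) | 1 (%) | 2 (%) |
---|---|---|---|
Baarda | 100 | 1.1±2.1 | 0 |
Pope | 95 | 12.9±15.5 | 0.3±1.3 |
Danish | 100 | 11.5±6.5 | 1.1±1.7 |
Huber | 100 | 2.0±2.7 | 0.1±0.3 |
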
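Table 4. MSRs and standard deviations (%) of the classical approach for outlier magnitudes between 6σ and 12σ.

Method \ Number of outliers | 1 (%) | 2 (%) |
---|---|---|
Baarda | 62.1±10.9 | 27.3±9.0 |
Pope | 68.1±24.8 | 5.3±6.1 |
Danish | 83.6±8.0 | 62.8±11.1 |
Huber | 68.1±9.4 | 41.8±9.0 |
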
The MSRs of all methods are very small for the magnitude interval between 3σ and 6σ because taking the mean of the outward and return runs reduces the magnitude of the outlier. The MSRs of the Danish method are greater than those of the other methods. The reliability of the classical approach is not sufficient for high-precision estimation; the undetected outliers distort the estimated parameters and their standard deviations.
4.2 The New Approach for Detection of Outlier
The outward and return runs of the height differences are independent measurements, so both can be included in the adjustment model. All original observations should enter the adjustment model so that the smearing effect of the mean operator is removed. The height differences, the outliers and the random errors are exactly the same as in the classical approach. The resulting MSRs and standard deviations of the methods are given in Tables 5 and 6.
Table 5. MSRs and standard deviations (%) of the new approach for outlier magnitudes between 3σ and 6σ.

Method \ Number of outliers | 0 (%) | 1 (%) | 2 (%) |
---|---|---|---|
Baarda | 97 | 78.7±13.2 | 62.2±12.9 |
Pope | 94 | 68.0±23.0 | 32.9±26.6 |
Danish | 88 | 80.4±26.7 | 72.5±24.1 |
Huber | 91 | 81.1±20.4 | 70.5±17.7 |

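Table 6. MSRs and standard deviations (%) of the new approach for outlier magnitudes between 6σ and 12σ.

Method \ Number of outliers | 1 (%) | 2 (%) |
---|---|---|
Baarda | 97.2±14.2 | 97.4±12.4 |
Pope | 94.8±19.3 | 93.7±17.1 |
Danish | 87.9±29.4 | 88.2±27.4 |
Huber | 92.1±22.0 | 91.8±19.9 |
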
The MSRs of the new approach are greater than those of the classical approach, with a large improvement for both the small and the large magnitude interval. Since the original observations are included in the adjustment model, the smearing effect of taking the mean is removed. However, in the new approach the Baarda, Pope, Danish and Huber methods flag good observations as outliers (Type I error) at rates of 3%, 6%, 12% and 9%, respectively. This is a risk for outlier detection.
5. CONCLUSION
The observations in geodetic networks are measured repeatedly; their mean values are then calculated and used in the adjustment model. If the observations do not contain any outlier, this causes no problem. If the observations include at least one outlier, the mean smears the outlier over the other part of the observation. Therefore, the outlier analysis must be based on the original observations, not on their mean values. When the MSRs of the new approach are compared with those of the classical approach, it is clearly seen that the reliability of the new approach is significantly greater than that of the classical approach. However, if the observations do not contain any outlier, the Type I error increases in the new approach. Consequently, the original observations of a geodetic network should be preferred for outlier detection, without applying any estimator before the network adjustment, to obtain more reliable results.