ELIMINATION OF SOME UNKNOWN PARAMETERS AND ITS EFFECT ON OUTLIER DETECTION

Outliers in an observation set adversely affect all the estimated unknown parameters and residuals, which is why outlier detection is of great importance for reliable estimation results. Tests for outliers (e.g., Baarda's and Pope's tests) are frequently used to detect outliers in geodetic applications. To reduce computation time, unknown parameters that are not of interest are sometimes eliminated. In this case, although the estimated unknown parameters and residuals do not change, the cofactor matrix of the residuals and the redundancies of the observations do change. In this study, the effects of eliminating unknown parameters on tests for outliers have been investigated. We prove that the redundancies in the initial functional model (IFM) are smaller than the ones in the reduced functional model (RFM), where elimination is performed. To demonstrate this, a horizontal control network was simulated and many experiments were performed. According to the simulation results, tests for outliers in the IFM are more reliable than the ones in the RFM.


INTRODUCTION
Outlier detection is of great importance in geodetic networks. The quality of the estimated unknown parameters and their standard deviations depends on whether the observation set includes outliers or not. Sometimes, observations may contain one or more outliers. In this case, these outliers must be detected and removed, or the observations remeasured. Tests for outliers are mostly used for outlier detection (BAARDA, 1968; POPE, 1976; KOCH, 1999). The efficacy of the tests for outliers changes depending on the number and the magnitudes of the outliers (HEKIMOGLU and KOCH, 2000). Tests for outliers can detect only one outlier reliably (HEKIMOGLU, 1997; BASELGA, 2007; HEKIMOGLU et al., 2011). If the observations include more than one outlier, the tests for outliers cannot detect them reliably due to the masking or swamping effect, especially when the magnitudes of the multiple outliers are small (HEKIMOGLU, 2000 and 2005).
Until recently, the capacity of computers was limited. Therefore, the number of unknown parameters in geodetic networks had to be kept small, so unknowns that are not directly related to the coordinates were eliminated from the adjustment model; the normal equations could then be inverted by computer. Accordingly, large geodetic networks are divided into sub-networks in the Helmert blocking method, and the coordinates of the points (except the connection points) are eliminated (WOLF, 1968; HÖPCKE, 1980). Furthermore, elimination of unknown parameters, as in bundle block adjustment, is preferred in photogrammetry (MIKHAIL and ACKERMANN, 1976; ALBERTZ and KREILING, 1989; KRAUS, 1997). In the processing of GPS observations, we may want to eliminate the ambiguity unknowns (KING et al., 1987; STRANG and BORRE, 1997; GHILANI and WOLF, 2006); also, constraints in the adjustment model may sometimes be eliminated to reduce the number of parameters (MIKHAIL and ACKERMANN, 1976).
To apply outlier detection, the observations of a geodetic network are adjusted as a free network. First, the unknown parameters that are not related to the coordinates of the points are eliminated so that the unconstrained adjustment model can be interpreted geometrically (NIEMEIER, 2002). A similar problem, the elimination of the orientation parameters, commonly occurs in triangulation networks. In general, unknown parameters that are not of interest are eliminated. In this context, two different functional models arise: (1) the initial functional model (IFM) and (2) the reduced functional model (RFM), where elimination is performed.
Although the estimated unknown parameters and residuals in the RFM are the same as the ones in the IFM, the cofactor matrix of the residuals and the redundancies r_i of the observations are different. In this study, we first clarify this situation. Secondly, we investigate whether the differences among the redundancies affect outlier detection. The following question is also investigated: should outlier detection be applied to the RFM, or to the IFM, where no group of unknowns is eliminated? To compare the reliabilities of the tests for outliers in the RFM and the IFM, the mean success rate (MSR) is used. The MSR was introduced to measure the efficacy of tests for outliers (HEKIMOGLU and KOCH, 2000). The MSR is the number of successful detections divided by the number of experiments; it can also be interpreted as the estimated power of the test (AYDIN, 2011).

ELIMINATION BY PARTITIONING BLOCKS
A general approach for eliminating some (or a group of) unknown parameters is to split the functional model into blocks. The linearized IFM is given as follows:

$$l + v = A x, \qquad \Sigma_{ll} = \sigma_0^2 Q_{ll} = \sigma_0^2 P^{-1}, \qquad (1)$$

where l is the observation vector, v is the residual vector, A is the coefficient (design) matrix of the unknown parameters, x is the unknown vector, \sigma_0^2 is the variance of unit weight, P is the diagonal weight matrix, Q_{ll} is the cofactor matrix of the observations and \Sigma_{ll} is the variance-covariance matrix of the observations. Then the design matrix and the unknown vector given in Eq. (1) are divided into sub-matrices and sub-vectors:

$$A = \begin{bmatrix} A_1 & A_2 \end{bmatrix}, \qquad x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \qquad (2)$$

where the sizes of A_1, A_2, x_1 and x_2 are n×p, n×q, p×1 and q×1, respectively; x_1 contains the main unknown parameters, and x_2 contains the unknown parameters that are to be eliminated. Here n is the number of observations, p is the number of main unknown parameters and q is the number of eliminated unknown parameters. In this case, Eq. (1) can be written as follows:

$$l + v = A_1 x_1 + A_2 x_2. \qquad (3)$$

The stochastic model of Eq. (3) is the same as that of Eq. (1); it does not change (NIEMEIER, 2002). In principle, the normal equations and their right-hand side can be expressed in terms of the blocks:

$$\begin{bmatrix} N_{11} & N_{12} \\ N_{21} & N_{22} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} n_1 \\ n_2 \end{bmatrix}, \qquad (4)$$

with

$$N_{11} = A_1^T P A_1, \quad N_{12} = N_{21}^T = A_1^T P A_2, \quad N_{22} = A_2^T P A_2, \qquad (5)$$

$$n_1 = A_1^T P l, \quad n_2 = A_2^T P l. \qquad (6)$$

Here, if N_{22} is invertible, x_2 in Eq. (4) can be written as a function of x_1:

$$x_2 = N_{22}^{-1} (n_2 - N_{21} x_1). \qquad (7)$$

If Eq. (7) is substituted into the first equation of Eq. (4),

$$(N_{11} - N_{12} N_{22}^{-1} N_{21})\, x_1 = n_1 - N_{12} N_{22}^{-1} n_2. \qquad (8)$$

Eq. (8) can be written briefly as

$$\bar{N} x_1 = \bar{n}, \qquad \bar{N} = N_{11} - N_{12} N_{22}^{-1} N_{21}, \quad \bar{n} = n_1 - N_{12} N_{22}^{-1} n_2. \qquad (9)$$

For the inverse of \bar{N}, Eq. (10) is obtained:

$$\bar{N}^{-1} = (N_{11} - N_{12} N_{22}^{-1} N_{21})^{-1}. \qquad (10)$$

In this way, the solution vector x_1 of the RFM can be obtained from Eq. (9):

$$x_1 = \bar{N}^{-1} \bar{n}. \qquad (11)$$

When the block matrices of the IFM are calculated, the sub-matrices of the cofactor matrix Q are determined. To obtain the sub-matrices, the following equation is usually used:

$$\begin{bmatrix} N_{11} & N_{12} \\ N_{21} & N_{22} \end{bmatrix} \begin{bmatrix} Q_{11} & Q_{12} \\ Q_{21} & Q_{22} \end{bmatrix} = \begin{bmatrix} E & 0 \\ 0 & E \end{bmatrix}, \qquad (12)$$

where E denotes the identity matrix. If the sub-matrix N_{22} is regular, i.e. invertible, the following equations can be obtained (FADDEJEW and FADDEJEWA, 1976):

$$Q_{21} = -N_{22}^{-1} N_{21} Q_{11}, \qquad (13)$$

$$Q_{22} = N_{22}^{-1} + N_{22}^{-1} N_{21} Q_{11} N_{12} N_{22}^{-1}. \qquad (14)$$

Also, if Eq. (14) is considered in Eq. (12):

$$Q_{11} = (N_{11} - N_{12} N_{22}^{-1} N_{21})^{-1}. \qquad (15)$$

By taking advantage of the symmetry property, the following equation can also be written:

$$Q_{12} = Q_{21}^T = -Q_{11} N_{12} N_{22}^{-1}. \qquad (16)$$

If Eq. (10) and Eq. (15) are compared, it can be seen that the cofactor matrix of x_1 in the RFM is equal to the cofactor matrix Q_{11} in the IFM, i.e. \bar{N}^{-1} = Q_{11}; the elimination does not change this cofactor matrix. To determine the unknown vector x_2 in the RFM, x_1 is obtained from Eq. (11) and substituted into Eq. (7). It is also important to determine the residuals in the RFM. If x_2 is obtained from Eq. (7) and substituted into Eq. (3), the residuals can be computed as follows:

$$v = (A_1 - A_2 N_{22}^{-1} N_{21})\, x_1 - (l - A_2 N_{22}^{-1} n_2). \qquad (17)$$

If the terms of this equation are abbreviated as

$$v = \bar{A} x_1 - \bar{l}, \qquad (18)$$

$$\bar{A} = A_1 - A_2 N_{22}^{-1} N_{21}, \qquad (19)$$

$$\bar{l} = l - A_2 N_{22}^{-1} n_2, \qquad (20)$$

Eq. (18) can also be written as

$$v = \bar{A} \bar{N}^{-1} \bar{n} - \bar{l}. \qquad (21)$$

The estimated value of the variance of unit weight does not change:

$$\hat{\sigma}_0^2 = \frac{v^T P v}{n - p - q}. \qquad (22)$$

The equations in this section can be adapted to free network adjustment (HECK, 1975).
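The block elimination above can be checked numerically. The following sketch (a minimal illustration with random matrices, not the simulated network of this study) solves a Gauss-Markov model once with the full normal equations of the IFM and once via the Schur complement of Eqs. (7)-(11), and confirms that the two solutions and the residual vectors of Eq. (18) coincide:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 12, 3, 2                      # observations, kept unknowns, eliminated unknowns
A1 = rng.standard_normal((n, p))        # design block of the main unknowns x1
A2 = rng.standard_normal((n, q))        # design block of the unknowns to eliminate, x2
l = rng.standard_normal(n)              # observation vector
P = np.diag(rng.uniform(0.5, 2.0, n))   # diagonal weight matrix

# full normal equations of the IFM: N x = n_vec
A = np.hstack([A1, A2])
N = A.T @ P @ A
n_vec = A.T @ P @ l
x_full = np.linalg.solve(N, n_vec)
v_full = A @ x_full - l

# block elimination (RFM): Schur complement of N22, Eqs. (7)-(11)
N11, N12 = A1.T @ P @ A1, A1.T @ P @ A2
N21, N22 = N12.T, A2.T @ P @ A2
N_bar = N11 - N12 @ np.linalg.solve(N22, N21)          # Eq. (9)
n_bar = n_vec[:p] - N12 @ np.linalg.solve(N22, n_vec[p:])
x1 = np.linalg.solve(N_bar, n_bar)                     # Eq. (11)
x2 = np.linalg.solve(N22, n_vec[p:] - N21 @ x1)        # Eq. (7)

# reduced design and observations, Eqs. (19)-(20)
A_bar = A1 - A2 @ np.linalg.solve(N22, N21)
l_bar = l - A2 @ np.linalg.solve(N22, n_vec[p:])

assert np.allclose(x_full, np.concatenate([x1, x2]))   # same estimates
assert np.allclose(v_full, A_bar @ x1 - l_bar)         # same residuals, Eq. (18)
```

The check illustrates the statement above: the elimination changes neither the estimates nor the residuals.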

COMPARING THE REDUNDANCIES OF THE OBSERVATIONS IN RFM WITH THE ONES IN IFM
The H and R matrices in the IFM are given as follows:

$$H = A Q A^T P = A (A^T P A)^{-1} A^T P, \qquad (23)$$

$$R = E - H, \qquad (24)$$

where H denotes the hat matrix in statistics and R is the redundancy matrix, whose diagonal elements r_i are the redundancies of the observations. The matrix Q can be written in block form from Eq. (4), and the hat matrix then follows from Eq. (23) as

$$H = \left( A_1 Q_{11} A_1^T + A_1 Q_{12} A_2^T + A_2 Q_{21} A_1^T + A_2 Q_{22} A_2^T \right) P. \qquad (30)$$

For the RFM, the following can be written from Eq. (18):

$$\bar{H} = \bar{A} \bar{N}^{-1} \bar{A}^T P. \qquad (31)$$

Since \bar{A} = A_1 - A_2 N_{22}^{-1} N_{21}, \bar{N}^{-1} = Q_{11}, Q_{12} = -Q_{11} N_{12} N_{22}^{-1} and Q_{21} = -N_{22}^{-1} N_{21} Q_{11}, Eq. (31) can be expanded as

$$\bar{H} = \left( A_1 Q_{11} A_1^T + A_1 Q_{12} A_2^T + A_2 Q_{21} A_1^T + A_2 N_{22}^{-1} N_{21} Q_{11} N_{12} N_{22}^{-1} A_2^T \right) P. \qquad (35)$$

If Eqs. (30) and (35) are compared, using Eq. (14), Eq. (36) is obtained:

$$H - \bar{H} = A_2 N_{22}^{-1} A_2^T P. \qquad (36)$$

Since A_2 N_{22}^{-1} A_2^T is a positive semidefinite quadratic form, its diagonal elements are \ge 0. Therefore h_{ii} is always at least as large as \bar{h}_{ii}, i.e. h_{ii} \ge \bar{h}_{ii}, and consequently r_i \le \bar{r}_i, since r_i = 1 - h_{ii} and \bar{r}_i = 1 - \bar{h}_{ii}.
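Eq. (36) and the resulting ordering of the redundancies can be verified numerically. The sketch below (random matrices and a diagonal weight matrix, our own illustration rather than the simulated network) builds the hat matrices of the IFM and the RFM and checks both the identity and the elementwise inequality r_i <= r̄_i:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 12, 3, 2
A1 = rng.standard_normal((n, p))
A2 = rng.standard_normal((n, q))
P = np.diag(rng.uniform(0.5, 2.0, n))

A = np.hstack([A1, A2])
N22 = A2.T @ P @ A2
H = A @ np.linalg.solve(A.T @ P @ A, A.T @ P)          # hat matrix of the IFM, Eq. (23)

A_bar = A1 - A2 @ np.linalg.solve(N22, A2.T @ P @ A1)  # reduced design matrix, Eq. (19)
H_bar = A_bar @ np.linalg.solve(A_bar.T @ P @ A_bar, A_bar.T @ P)  # hat matrix of the RFM

# Eq. (36): the difference is A2 N22^{-1} A2^T P, a positive semidefinite form times P
assert np.allclose(H - H_bar, A2 @ np.linalg.solve(N22, A2.T) @ P)

r_ifm = 1.0 - np.diag(H)       # redundancies in the IFM
r_rfm = 1.0 - np.diag(H_bar)   # redundancies in the RFM
assert np.all(r_ifm <= r_rfm + 1e-12)
```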

TESTS FOR OUTLIERS
Outlier detection procedures were proposed by Baarda (1968) and Pope (1976) for geodesy. In these procedures, "good" observations originate from the same distribution, which is generally taken to be a normal distribution N(\mu, \sigma^2). Observations that contain outliers are called "bad" observations. Let an observation l_i'' have an outlier \delta l_i, with l_i'' = l_i' + \delta l_i. The hypothesis H_0: \delta l_i = 0 is tested against H_1: \delta l_i \ne 0. If the observations are uncorrelated and the variance \sigma_0^2 is known, the standardized residuals derived from the IFM can be written as

$$w_i = \frac{v_i}{\sigma_0 \sqrt{(Q_{vv})_{ii}}}, \qquad (37a)$$

where (Q_{vv})_{ii} is the i-th diagonal element of the cofactor matrix of the residuals,

$$Q_{vv} = P^{-1} - A Q A^T, \qquad (39)$$

whose diagonal elements for a diagonal weight matrix satisfy

$$(Q_{vv})_{ii} = r_i / p_i. \qquad (40)$$

If |w_i| > u_{1-\alpha/2}, which is the upper \alpha/2 percentage point of the standard normal distribution, the observation l_i'' is accepted as a bad observation, where \alpha is chosen as 0.001. This is called Baarda's method. If there is more than one outlier among the observations, Baarda's method is applied iteratively (BAARDA, 1968).
If the variance is not known a priori, the studentized residual is used for Pope's test:

$$\tau_i = \frac{v_i}{\hat{\sigma}_0 \sqrt{(Q_{vv})_{ii}}}, \qquad (37b)$$

where \hat{\sigma}_0^2 is given in Eq. (22). If the level of significance \alpha refers to all observations, the level for each observation must be

$$\alpha_0 = \alpha / n. \qquad (41)$$

If Eqs. (37a) and (37b) are rewritten by considering Eq. (41), the following test rules are obtained:

$$|w_i| > u_{1-\alpha_0/2}, \qquad (42a)$$

$$|\tau_i| > \tau_{f,\,1-\alpha_0/2}, \qquad (42b)$$

where \alpha is generally chosen as 0.05 or 0.01 (KOCH, 1999) and \tau_{f,1-\alpha_0/2} is the critical value of the \tau distribution with f = n - p - q degrees of freedom.
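As an illustration, Baarda's standardized-residual test and Pope's studentized-residual test can be sketched as follows. This is our own toy example (uncorrelated observations, P = E, a random design, and one planted gross error), not the paper's network; the value 3.29 is the two-sided normal critical point for alpha = 0.001, while the tau critical value is left as a user-supplied constant (here an assumed placeholder, in practice taken from tables):

```python
import numpy as np

def baarda(v, Qvv, sigma0, crit=3.29):
    """Baarda's test: standardized residuals against the normal
    critical value (3.29 for two-sided alpha = 0.001)."""
    w = v / (sigma0 * np.sqrt(np.diag(Qvv)))
    return w, np.abs(w) > crit

def pope(v, Qvv, f, crit):
    """Pope's test: studentized residuals with the a posteriori
    variance of unit weight (P = E assumed); crit is a tau critical value."""
    sigma0_hat = np.sqrt(v @ v / f)
    tau = v / (sigma0_hat * np.sqrt(np.diag(Qvv)))
    return tau, np.abs(tau) > crit

# toy adjustment with one planted gross error (illustrative only)
rng = np.random.default_rng(2)
n, p = 10, 2
A = rng.standard_normal((n, p))
sigma0 = 0.01
l = A @ np.array([1.0, -2.0]) + sigma0 * rng.standard_normal(n)
l[4] += 10 * sigma0                                   # 10-sigma error in observation 5
x_hat = np.linalg.lstsq(A, l, rcond=None)[0]
v = A @ x_hat - l
Qvv = np.eye(n) - A @ np.linalg.solve(A.T @ A, A.T)   # Q_vv = P^{-1} - A Q A^T, P = E
w, bad = baarda(v, Qvv, sigma0)
tau, _ = pope(v, Qvv, n - p, crit=3.0)                # tau critical value: assumed placeholder
```

With this setup the planted error dominates both the standardized and the studentized residuals; note how the a posteriori variance in Pope's test is inflated by the outlier, which is one source of the masking effect mentioned above.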
For the RFM, equations similar to Eqs. (39) and (40) can be written:

$$Q_{\bar{v}\bar{v}} = P^{-1} - \bar{A} \bar{N}^{-1} \bar{A}^T, \qquad (43)$$

$$(Q_{\bar{v}\bar{v}})_{ii} = \bar{r}_i / p_i. \qquad (44)$$

If the above two equations are considered together with Eq. (36), the following relation is obtained:

$$(Q_{\bar{v}\bar{v}})_{ii} - (Q_{vv})_{ii} = (A_2 N_{22}^{-1} A_2^T)_{ii} \ge 0, \qquad (45)$$

and, similar to Eqs. (42a) and (42b), the test rules for the RFM follow:

$$|\bar{w}_i| > u_{1-\alpha_0/2}, \qquad (46a)$$

$$|\bar{\tau}_i| > \tau_{f,\,1-\alpha_0/2}. \qquad (46b)$$

Since (Q_{vv})_{ii} \le (Q_{\bar{v}\bar{v}})_{ii}, the standardized residuals w_i and the studentized residuals \tau_i in the IFM are larger than \bar{w}_i and \bar{\tau}_i in the RFM, respectively. This means that the effect of an outlier with a small magnitude is reflected more strongly on the standardized or studentized residuals in the IFM than in the RFM. Therefore, the MSRs of the tests for outliers in the IFM become larger than the MSRs in the RFM.

Elimination of orientation parameters in triangulation network
The method, dating back to C. F. Gauss, uses the elimination of the orientation parameters in triangulation networks. If we consider only one unknown parameter that has the same coefficient in all residual equations, it can be regarded as a special case of the elimination. For the elimination of only one unknown x_2 (q = 1 in Eq. (2)), the related design matrix can be written as

$$A_2 = \begin{bmatrix} 1 & 1 & \cdots & 1 \end{bmatrix}^T, \qquad (47)$$

where the size of A_2 is n×1 and k is the number of direction observations at one station. If P = E, the reduced design matrix according to Eq. (19) is

$$\bar{A} = A_1 - \frac{1}{k} A_2 \left( A_2^T A_1 \right), \qquad (48)$$

i.e., componentwise,

$$\bar{a}_{ij} = a_{ij} - \frac{1}{k} \sum_{i=1}^{k} a_{ij}, \qquad (49)$$

where A_2^T A_1 contains the column sums of A_1, which must be multiplied by 1/k (NIEMEIER, 2002). The same reduction must be applied to the observation vector; \bar{l} is obtained similarly:

$$\bar{l} = l - \frac{1}{k} A_2 \left( A_2^T l \right), \qquad (50)$$

$$\bar{l}_i = l_i - \frac{1}{k} \sum_{i=1}^{k} l_i, \qquad (51)$$

where the sum of the observations is likewise divided by k and subtracted from each l_i. The mean residual equation that is subtracted from each residual equation satisfies

$$\frac{1}{k} \sum_{i=1}^{k} v_i = 0. \qquad (52)$$

In practice, Eq. (52) is applied for each station of the network at which direction observations are made. The linearized residual equations for a station at which k direction measurements are made can be written with the orientation parameter o as

$$v_i = a_i x_1 - o - l_i, \quad i = 1, \ldots, k, \qquad (53)$$

and the reduced residual equations are obtained as

$$\bar{v}_i = (a_i - \bar{a})\, x_1 - \left( l_i - \frac{1}{k} \sum_{j=1}^{k} l_j \right), \qquad \bar{a} = \frac{1}{k} \sum_{j=1}^{k} a_j. \qquad (54)$$
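The station-wise reduction of Eqs. (48)-(54) amounts to subtracting the station mean from the rows of A_1 and from l. The following sketch (our own construction with a random design and P = E, not the paper's network) performs this centering per station and checks that the reduced solution for x_1 equals the solution of the full model in which one orientation column per station is carried explicitly:

```python
import numpy as np

def reduce_directions(A1, l, station):
    """Eliminate one orientation unknown per station (P = E): subtract the
    station-wise mean row from A1 and the station-wise mean from l."""
    A_bar, l_bar = A1.astype(float).copy(), l.astype(float).copy()
    for s in np.unique(station):
        idx = station == s
        A_bar[idx] -= A1[idx].mean(axis=0)   # 1/k times the column sums of A1
        l_bar[idx] -= l[idx].mean()
    return A_bar, l_bar

rng = np.random.default_rng(3)
n, p = 9, 2
station = np.repeat([0, 1, 2], 3)            # three stations, k = 3 directions each
A1 = rng.standard_normal((n, p))
l = rng.standard_normal(n)

# full model: x1 plus one orientation unknown per station (indicator columns)
A2 = (station[:, None] == np.arange(3)).astype(float)
x_full = np.linalg.lstsq(np.hstack([A1, A2]), l, rcond=None)[0]

# reduced model after elimination of the orientation unknowns
A_bar, l_bar = reduce_directions(A1, l, station)
x1 = np.linalg.lstsq(A_bar, l_bar, rcond=None)[0]

assert np.allclose(x_full[:p], x1)           # same main unknowns as the full model
```

The sign of the orientation coefficient does not matter here: any unknown entering all residual equations of a station with the same constant coefficient is removed by the centering.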

Simulation
In this study, the Monte Carlo simulation technique has been used to demonstrate the case described above. To measure the reliability of the tests for outliers in the IFM and the RFM, a horizontal control network was simulated. For the horizontal control network, the observations, i.e. direction and distance measurements, are computed from the coordinates of the points; they are free of random errors. The random errors are generated from a normal distribution as follows: for the direction measurements, \varepsilon \sim N(0, \sigma^2) with \sigma = 0.3 mgon, and for the distance measurements, \varepsilon \sim N(0, \sigma_S^2) with \sigma_S = 3 mm + 2 \times 10^{-6} S_{ij}, where S_{ij} is the distance between the i-th and j-th points. The random errors are added to the distance and direction measurements, and thus the good observations are obtained. The approximate values of the point coordinates are given in Table 1. The direction measurements and the distance measurements are given in Table 2 and Table 3, respectively. To test the reliabilities of the IFM and the RFM for the tests for outliers, the 6th observation (i.e. the direction from point 2 to point 7) is contaminated by an outlier with a magnitude of -1.255 mgon; the 6th observation given in Table 2 includes this outlier. After adding the outlier to the observation, Baarda's method (which assumes that the a priori variance is known) and Pope's method (which assumes that the a priori variance is unknown) are applied, and the obtained redundancies and the standardized and studentized residuals are presented for the IFM and the RFM in Table 4. In this study, \alpha is chosen as 0.001 for Baarda's test and 0.05 for Pope's test. As can be seen from Table 4, the standardized and studentized residuals of the IFM are larger than the ones of the RFM: although the outlier (the 6th observation) can be detected in the IFM, it cannot be detected in the RFM. However, one sample is not enough to decide that the results of the IFM are more reliable than those of the RFM; therefore, a hundred different contaminated samples are simulated for each of the data sets l. For a hundred different data sets l, a total of 10000 different contaminated samples is obtained. Baarda's test and Pope's test are applied to these contaminated samples. To measure the reliabilities of the tests for outliers, the MSR criterion is used. A test for outliers is regarded as successful when the test statistic can separate the null hypothesis H_0 from the alternative hypothesis H_1 at the significance level \alpha. The mean success rate (MSR) is defined as the number of successes divided by the number of experiments. If a good sample is contaminated by replacing any number of the observations with arbitrary values, a contaminated sample is obtained. Many good samples can be obtained by generating different subsets of random errors; thus, for each good sample, many contaminated samples are generated by replacing any number of the good observations with arbitrary values.
Since a simulation is used to generate the outliers, it is known exactly whether an observation is contaminated or not in advance of the analysis. After applying the outlier detection method, if an observation is identified as an outlier and it corresponds to a truly contaminated observation, the method is regarded as successful; otherwise it is considered unsuccessful (HEKIMOGLU and ERENOGLU, 2007; HEKIMOGLU and KOCH, 2000; ERENOGLU and HEKIMOGLU, 2010; HEKIMOGLU et al., 2011). Owing to the simulation technique, a large number of samples can be generated easily. The MSR is globally the number of successful detections over the number of experiments.
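The MSR computation can be sketched as follows. The snippet below is a simplified stand-in (a random design with P = E rather than the simulated network, Baarda's test only, non-iterative); a run counts as successful only when the largest standardized residual is significant and points at the truly contaminated observation:

```python
import numpy as np

def msr_baarda(A, sigma0, out_idx, out_mag, n_samples=200, crit=3.29, seed=0):
    """Monte Carlo estimate of the mean success rate (MSR) of Baarda's test."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    Qvv = np.eye(n) - A @ np.linalg.solve(A.T @ A, A.T)   # Q_vv, P = E
    q = np.sqrt(np.diag(Qvv))
    success = 0
    for _ in range(n_samples):
        l = sigma0 * rng.standard_normal(n)   # good sample (true values set to zero)
        l[out_idx] += out_mag                 # contaminate one observation
        v = A @ np.linalg.lstsq(A, l, rcond=None)[0] - l
        w = np.abs(v) / (sigma0 * q)
        k = int(np.argmax(w))
        success += (w[k] > crit) and (k == out_idx)
    return success / n_samples

rng = np.random.default_rng(4)
A = rng.standard_normal((10, 2))
rate_large = msr_baarda(A, 1.0, out_idx=3, out_mag=10.0)  # one 10-sigma outlier
rate_none = msr_baarda(A, 1.0, out_idx=3, out_mag=0.0)    # no outlier: type I error rate
```

In this simplified setting a large outlier is detected almost always, while the success rate without an outlier stays near the chosen significance level, mirroring the type I error discussed below.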
The small outliers (whose magnitudes lie between 3σ and 6σ) and the large outliers (whose magnitudes lie between 6σ and 12σ) are generated separately. Both tests for outliers are applied iteratively: only the observation with the largest standardized or studentized residual is tested and, in case it is rejected, it is removed and the remaining observations are adjusted again. In this case, however, a geometric defect of the network may occur. To prevent such a defect, the detected observation is not removed; instead, its weight is reduced for the next iteration step, for example p_i = 0.001 p_i. In this case, the initial approximation of the orientation is estimated by using the weighted arithmetic mean. The orientation parameters are eliminated in the RFM, whereas they are estimated in the IFM. Also, the observations of the network given in Fig. 1 were adjusted as a free network. Tables 5 and 6 contain the MSRs of both Baarda's and Pope's tests for the IFM and the RFM. The MSRs are increased by 11.5% (55.2% - 43.7%) for Baarda's test and 19.0% (45.1% - 26.1%) for Pope's test for one small outlier. Also, the reliability of the IFM for two outliers is higher than that of the RFM. However, the MSRs of the IFM are also larger than those of the RFM when there is no outlier in the observation set; this is the type I error. The increase in the type I error for the IFM is 4% (5% - 1%) for Baarda's test and 2% (2% - 0%) for Pope's test. Hence, the net advantage of the IFM is 7.5% (11.5% - 4%) for Baarda's test and 17% (19% - 2%) for Pope's test. For one large outlier, however, the MSRs of both tests are not increased significantly.
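The down-weighting variant of the iterative testing described above can be sketched like this (again with a random design and initial weights P = E; the loop, the 0.001 factor and Baarda's statistic follow the description, everything else is illustrative):

```python
import numpy as np

def iterative_baarda(A, l, sigma0, crit=3.29, factor=1e-3, max_iter=5):
    """Iterative data snooping: the observation with the largest standardized
    residual is tested; if rejected, it is not removed (removal could create a
    geometric defect) but its weight is multiplied by `factor` (p_i = 0.001 p_i)
    and the adjustment is repeated."""
    n = A.shape[0]
    P = np.eye(n)
    flagged = []
    for _ in range(max_iter):
        N = A.T @ P @ A
        v = A @ np.linalg.solve(N, A.T @ P @ l) - l
        Qvv = np.linalg.inv(P) - A @ np.linalg.solve(N, A.T)  # cofactor of residuals
        w = np.abs(v) / (sigma0 * np.sqrt(np.diag(Qvv)))
        k = int(np.argmax(w))
        if w[k] <= crit:
            break
        flagged.append(k)
        P[k, k] *= factor          # down-weight instead of removing
    return flagged

rng = np.random.default_rng(5)
A = rng.standard_normal((10, 2))
sigma0 = 1.0
l = sigma0 * rng.standard_normal(10)
l[6] += 8 * sigma0                 # one 8-sigma outlier in observation 7
assert iterative_baarda(A, l, sigma0)[0] == 6
```

After the down-weighting, the flagged observation barely influences the solution, so the iteration proceeds on the remaining observations without changing the geometry of the network.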

CONCLUSION
Elimination of unknown parameters in the adjustment model is sometimes preferred to shorten the computation time. Although the estimated unknown parameters, the residuals and the cofactor matrix of the unknown parameters in the IFM are the same as the ones in the RFM, the cofactor matrix of the residuals and the redundancies of the observations are different. The redundancies in the IFM are smaller than the ones in the RFM; this is proved in this study. Since the diagonal elements of the cofactor matrix of the residuals in the RFM are larger than the ones in the IFM, the standardized or studentized residuals in the IFM are larger than the ones in the RFM. Therefore, in some cases of the RFM where the magnitude of an outlier is small, the effect of the outlier does not appear strongly on the residuals, and the outlier cannot be detected.
In this study, two models are considered for the simulation. The orientation parameters are treated as unknowns in the IFM, and they are eliminated in the RFM. To compare the reliabilities of the tests for outliers in the RFM and the IFM, the tests are applied to a simulated horizontal control network; the reliability is measured by the MSR. According to the simulation results, the reliabilities of the tests for outliers in the IFM are higher than the ones in the RFM for small and large outliers, provided that the observation set contains one or two outliers. For this reason, applying the tests for outliers to the IFM should be preferred.
Fig. 1 presents the positions of the points and the observations of the horizontal control network. The simulated network given in Fig. 1 consists of 7 points, where n = 48, u = 21 and the degrees of freedom f = 30. The MSRs for the IFM and the RFM are presented for the same network. All results have been obtained using MATLAB version R2006a. The random errors are generated from a normal distribution N(\mu, \sigma^2), with expected value \mu = 0, by using the random number generator of MATLAB. The good and bad observations are simulated in the same way as in Hekimoglu and Erenoglu (2007), Erenoglu and Hekimoglu (2010) and Hekimoglu et al. (2011): the random error is replaced by the outlier in the related observation, i.e. l'' = l' + \delta l.

Table 1 -
The approximate values of the coordinates of the points shown in Fig. 1.

Table 2 -
The direction measurements and standard deviations.

Table 3 -
The distance measurements and standard deviations.

Table 4 -
The redundancy, standardized and studentized residuals for IFM and RFM.

Table 5 -
The MSRs of IFM and RFM for the magnitudes which lie between 3σ and 6σ.

Table 6 -
The MSRs of IFM and RFM for the magnitudes which lie between 6σ and 12σ.