1. INTRODUCTION

The Gauss-Markov (G-M) model and the least-squares (LS) method are widely used in geodesy. In many applications, however, such as coordinate transformation (Akyilmaz, 2007; Li et al., 2012; Li et al., 2013; Fang, 2014), some elements of the coefficient matrix are themselves observations with stochastic properties. If these properties are ignored, the LS estimates of the unknown parameters are not optimal. The errors-in-variables (EIV) model and the total least-squares (TLS) method, a term introduced by Golub et al. (1980), treat such cases more rigorously than the LS method. Many algorithms exist for computing the TLS estimate (Golub et al., 1980; Schaffrin, 2006) and the weighted TLS (WTLS) estimate (Schaffrin and Wieser, 2008; Shen et al., 2011; Xu et al., 2012; Amiri-Simkooei and Jazaeri, 2012; Mahboub, 2012; Fang, 2013; Jazaeri et al., 2014).

Unfortunately, like the LS estimate, the WTLS estimate is extremely sensitive to outliers in the EIV model. Although outlier detection in the G-M model has been investigated extensively (Baarda, 1968; Pope, 1976; Kok, 1984; Huber, 1981; Hekimoglu, 2005; Gui et al., 1999, 2005a, 2005b, 2007, 2011; Guo et al., 2007; Hekimoglu and Erenoglu, 2009; Lehmann, 2013; Hekimoglu et al., 2014), these methods cannot be applied directly to outliers in the EIV model. Schaffrin and Uzun (2011) generalized the mean-shift method to detect a single outlier located either in the observations or in the coefficient matrix of the EIV model, and its reliability was also analyzed (Schaffrin and Uzun, 2012). Amiri-Simkooei and Jazaeri (2013) applied the data-snooping procedure to identify outliers on the basis of the WTLS method formulated within standard LS theory (Amiri-Simkooei and Jazaeri, 2012). However, their test procedure must be carried out more than once when the same random elements appear at several locations of the coefficient matrix, as in the two-dimensional affine transformation.

The partial EIV model generalizes the EIV model and avoids having to account for the correlations between repeated random elements of the coefficient matrix (Xu et al., 2012). It is therefore better suited to cases in which the coefficient matrix has a structured form. However, test statistics for detecting outliers cannot be derived explicitly from the existing WTLS algorithms. For this reason, this paper develops a new two-step iterative approach that computes the WTLS estimate within the framework of LS theory, so that test statistics for identifying outliers in the partial EIV model can be constructed.

The remainder of the paper is organized as follows. Section 2 proposes a two-step iterative method for the partial EIV model based on LS theory. Section 3 constructs the corresponding *w*-test statistics for detecting outliers when the observations, the coefficient matrix or both are contaminated, and designs an algorithm for detecting outliers in the partial EIV model; if the variance factor is unknown, it is estimated with the least median of squares (LMS) method. Section 4 uses simulated and real data for the two-dimensional affine transformation to verify the validity of the proposed method. Finally, Section 5 presents some concluding remarks.

2. PARTIAL EIV MODEL AND WTLS ESTIMATE

In practice, not all elements of the coefficient matrix are random, and some random elements appear repeatedly at different locations of the matrix, as in coordinate transformation. The correlations between these repeated random elements must then be taken into account. The five rules of Mahboub (2012) can be used to determine the variance-covariance matrix of the coefficient matrix. If, instead, the partial EIV model proposed by Xu et al. (2012) is used, these correlations need not be considered, which reduces the additional burden. The partial EIV model is therefore preferred here. Its functional model is

$$
\begin{aligned}
L &= (X^{T}\otimes I_{n})(h+B\bar{a})+\Delta\\
a &= \bar{a}+e
\end{aligned}
\tag{1}
$$

where $X$ = $t\times 1$ vector of unknown parameters; $L$ = $n\times 1$ vector of observations; $I_n$ = $n\times n$ identity matrix; $h$ = $nt\times 1$ vector consisting of the zero and fixed elements of the coefficient matrix $A$; $B$ = $nt\times s$ known structured matrix; $s$ = number of distinct random elements of $A$; $a$ = $s\times 1$ vector of the observed random elements of $A$; $A = \mathrm{invec}(h+Ba)$; $\bar{a}$ = $s\times 1$ vector of true values of $a$; $e$ = $s\times 1$ vector of random errors of $a$; $\Delta$ = $n\times 1$ vector of random errors of the observations; $\mathrm{invec}$ = operator that transforms an $nt\times 1$ vector into an $n\times t$ matrix; $\otimes$ = Kronecker product operator. The stochastic model is

$$
\begin{bmatrix}\Delta\\ e\end{bmatrix}\sim\left(0,\ \sigma^{2}\begin{bmatrix}Q_{L}&0\\ 0&Q_{a}\end{bmatrix}\right)
\tag{2}
$$

where $Q_L$ = $n\times n$ cofactor matrix of $L$; $Q_a$ = $s\times s$ cofactor matrix of $a$; $\sigma^2$ = unknown variance factor.
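Model (1) rests on the identity $\mathrm{vec}(AX) = (X^{T}\otimes I_n)\,\mathrm{vec}(A)$. The sketch below, which assumes NumPy, implements vec and invec as column-major reshapes and checks that $A = \mathrm{invec}(h+B\bar{a})$ reproduces the right-hand side of model (1) for a small hypothetical matrix whose first column is random and whose second column is fixed at 1. The helper names are illustrative only.

```python
import numpy as np


def vec(M):
    """Stack the columns of M into a single vector (column-major order)."""
    return M.reshape(-1, order="F")


def invec(v, n, t):
    """Inverse of vec: rebuild an n x t matrix from an nt-vector, column by column."""
    return v.reshape((n, t), order="F")


def partial_eiv_mean(X, h, B, a_bar, n):
    """Noise-free right-hand side of model (1): (X^T kron I_n)(h + B a_bar)."""
    K = np.kron(X.reshape(1, -1), np.eye(n))   # n x nt matrix X^T kron I_n
    return K @ (h + B @ a_bar)


# Hypothetical example: n = 3, t = 2, s = 3.
# Column 1 of A holds the random elements, column 2 is the fixed value 1.
n, t = 3, 2
a_bar = np.array([0.5, 1.5, 2.5])
h = np.concatenate([np.zeros(n), np.ones(n)])   # zero/fixed part of vec(A)
B = np.vstack([np.eye(n), np.zeros((n, n))])    # places a_bar into column 1
X = np.array([2.0, -1.0])

A = invec(h + B @ a_bar, n, t)
print(A @ X)                                                     # [0. 2. 4.]
print(np.allclose(A @ X, partial_eiv_mean(X, h, B, a_bar, n)))   # True
print(np.allclose(vec(A), h + B @ a_bar))                        # True
```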

To obtain an outlier detection method suited to the partial EIV model, we propose a two-step iterative method for computing the WTLS estimate. For a given approximate value $X^{(0)}$, model (1) can be transformed as follows:

Model (3) can be further rewritten as

The estimate of $\bar{a}$ can then be derived by the LS principle (Koch, 1999), which gives

The residual vector of a is

Inserting $\hat{\bar{a}}$ into the first equation of model (1) yields

Applying invec, the inverse of the vec operator, we obtain

Model (8) can then be rewritten as follows:

Similarly, based on the LS principle (Koch, 1999), the estimate of $X$ is

and the residual vector of L is

From Equations 7 and 12, the posterior estimate of the variance factor is
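The two-step iteration can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's exact formulation: for fixed $X$, the first step treats $\bar{a}$ as the unknowns of a Gauss-Markov model with observations $[L-(X^{T}\otimes I_n)h;\ a]$ and design matrix $[(X^{T}\otimes I_n)B;\ I_s]$; the second step fixes $\hat{A}=\mathrm{invec}(h+B\hat{\bar{a}})$ and solves weighted LS for $X$. $\Delta$ and $e$ are assumed uncorrelated, and the variance factor is divided by the redundancy $(n+s)-(t+s)=n-t$. The function `two_step_wtls` and its arguments are hypothetical, and NumPy is assumed.

```python
import numpy as np


def two_step_wtls(L, a, h, B, Q_L, Q_a, X0, eps=1e-10, max_iter=500):
    """Alternate the two LS steps until X stops changing (hypothetical helper)."""
    n, s = L.size, a.size
    t = np.asarray(X0).size
    P_L, P_a = np.linalg.inv(Q_L), np.linalg.inv(Q_a)
    P = np.block([[P_L, np.zeros((n, s))], [np.zeros((s, n)), P_a]])
    X = np.asarray(X0, dtype=float)
    for _ in range(max_iter):
        # Step 1: X fixed -> Gauss-Markov model in a_bar with observations [L - K h; a]
        K = np.kron(X.reshape(1, -1), np.eye(n))            # X^T kron I_n
        G = np.vstack([K @ B, np.eye(s)])
        y = np.concatenate([L - K @ h, a])
        a_hat = np.linalg.solve(G.T @ P @ G, G.T @ P @ y)
        # Step 2: a_bar fixed -> A_hat = invec(h + B a_hat), weighted LS for X
        A_hat = (h + B @ a_hat).reshape((n, t), order="F")
        X_new = np.linalg.solve(A_hat.T @ P_L @ A_hat, A_hat.T @ P_L @ L)
        done = np.linalg.norm(X_new - X) < eps
        X = X_new
        if done:
            break
    e_hat = a - a_hat                                        # residuals of a
    d_hat = L - A_hat @ X                                    # residuals of L
    # posterior variance factor with the assumed redundancy n - t
    sigma2 = (d_hat @ P_L @ d_hat + e_hat @ P_a @ e_hat) / (n - t)
    return X, a_hat, e_hat, d_hat, sigma2


# Hypothetical straight-line example L_i = X1 * a_i + X2 with noise-free data.
a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
L = 2.0 * a + 1.0
m = a.size
h = np.concatenate([np.zeros(m), np.ones(m)])
B = np.vstack([np.eye(m), np.zeros((m, m))])
X, a_hat, e_hat, d_hat, sigma2 = two_step_wtls(L, a, h, B, np.eye(m), np.eye(m), X0=[1.5, 0.5])
print(np.round(X, 6))   # [2. 1.]
```

Each step minimizes the same weighted objective over one block of unknowns, so the objective cannot increase from one step to the next; convergence is linear, however, and can be slow when the starting value is poor, which is why the example allows several hundred iterations.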

3. OUTLIER DETECTION PROCEDURE IN PARTIAL EIV MODEL

The data-snooping method proposed by Baarda (1968) is widely used in geodetic data processing to detect outliers (Kok, 1984; Koch, 1999). If the observations or the coefficient matrix of the partial EIV model are contaminated with outliers, the following *w*-test statistics can be constructed from Equation 6 or Equation 11 to detect them:

where the two unit vectors have their $i$th and $j$th elements, respectively, equal to 1, and N(0,1) denotes the standard normal distribution.
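With a known variance factor, the data-snooping *w*-test divides each weighted residual by its standard deviation. The sketch below, which assumes NumPy, computes $w_i = c_i^{T}Pv/(\sigma\sqrt{c_i^{T}PQ_vPc_i})$ for a generic Gauss-Markov model $y=G\beta+v$. It assumes that Equations 14 and 15 take this standard form, applied to the first step ($y=[L-(X^{T}\otimes I_n)h;\ a]$, $G=[(X^{T}\otimes I_n)B;\ I_s]$) for the elements of $a$ and to the second step ($y=L$, $G=\hat{A}$) for the observations, each time treating the other block as fixed, which is an approximation. The function name is hypothetical.

```python
import numpy as np


def w_statistics(G, Q_y, y, sigma):
    """Data-snooping w-statistics for a Gauss-Markov model y = G beta + v.

    w_i = c_i^T P v / (sigma * sqrt(c_i^T P Q_v P c_i)), where c_i is the i-th unit
    vector, P = Q_y^{-1} and Q_v is the cofactor matrix of the LS residuals v.
    """
    P = np.linalg.inv(Q_y)
    N = G.T @ P @ G
    beta = np.linalg.solve(N, G.T @ P @ y)
    v = y - G @ beta                              # LS residuals
    Q_v = Q_y - G @ np.linalg.solve(N, G.T)       # cofactor matrix of v
    return (P @ v) / (sigma * np.sqrt(np.diag(P @ Q_v @ P)))


# Hypothetical straight-line fit with one gross error of 1.0 in observation 5.
x = np.arange(10.0)
G = np.column_stack([x, np.ones_like(x)])
y = 2.0 * x + 1.0
y[5] += 1.0
w = w_statistics(G, np.eye(10), y, sigma=0.1)
print(int(np.argmax(np.abs(w))))   # 5
```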

When the variance factor is unknown, its posterior estimate $\hat{\sigma}^2$ can be used instead (Pope, 1976), which gives

and

where $\tau_n$ denotes the τ distribution with $n$ degrees of freedom. The computation of the τ distribution is described by Baselga (2007) and Guo and Zhao (2012).
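Critical values of the τ distribution can be obtained in several ways. As one possible sketch using only the Python standard library, the code below integrates the τ density after the substitution $\tau=\sqrt{r}\sin\theta$, which gives $P(0<\tau<c)=\frac{\Gamma(r/2)}{\sqrt{\pi}\,\Gamma((r-1)/2)}\int_0^{\arcsin(c/\sqrt{r})}\cos^{r-2}\theta\,d\theta$, and then finds the upper $\alpha/2$ point by bisection. Here $r$ is the number of degrees of freedom; taking it as the redundancy of the adjustment is an assumption, and the function name is hypothetical. An equivalent route is Pope's relation $\tau=\sqrt{r}\,t/\sqrt{r-1+t^{2}}$, with $t$ the Student t quantile for $r-1$ degrees of freedom.

```python
import math


def tau_upper_point(alpha, r, steps=2000):
    """Upper alpha/2 point of Pope's tau distribution with r >= 2 degrees of freedom."""
    if r < 2:
        raise ValueError("r must be at least 2")
    # Normalizing constant so that P(0 < tau < sqrt(r)) = 1/2.
    const = math.exp(math.lgamma(r / 2) - math.lgamma((r - 1) / 2)) / math.sqrt(math.pi)

    def half_cdf(phi):
        """P(0 < tau < sqrt(r) sin(phi)) by the composite Simpson rule on [0, phi]."""
        step = phi / steps
        total = 1.0 + math.cos(phi) ** (r - 2)
        for k in range(1, steps):
            total += (4 if k % 2 else 2) * math.cos(k * step) ** (r - 2)
        return const * total * step / 3

    target = 0.5 - alpha / 2
    lo, hi = 0.0, math.pi / 2
    for _ in range(60):   # bisection on the angle phi
        mid = 0.5 * (lo + hi)
        if half_cdf(mid) < target:
            lo = mid
        else:
            hi = mid
    return math.sqrt(r) * math.sin(0.5 * (lo + hi))


print(round(tau_upper_point(0.05, 3), 4))     # 1.6454, i.e. 0.95 * sqrt(3)
print(round(tau_upper_point(0.05, 1000), 2))  # close to the normal value 1.96
```

For $r=3$ the τ distribution is uniform on $(-\sqrt{3},\sqrt{3})$, which gives an exact check; for large $r$ the critical value approaches the normal value.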

Alternatively, the variance factor can be estimated robustly. Using the least median of squares (LMS) method (Rousseeuw and Leroy, 1987), it may be estimated by

or

Substituting (18) and (19) into the test statistics (14) and (15) gives

and

The advantage of these two test statistics is that they are robust against outliers, which makes them more reliable for outlier detection. Note, however, that they do not strictly follow a normal distribution, and their exact probability distributions are difficult to derive. To simplify the computation of the threshold value used to identify outliers, the upper percentage point of the standard normal distribution is nevertheless used in the identification rule.
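The robust variant can be sketched as follows, assuming the preliminary LMS scale estimate of Rousseeuw and Leroy (1987), $\hat{\sigma}=1.4826\,(1+5/(n-p))\sqrt{\mathrm{med}_i\, r_i^{2}}$ with $p$ the number of parameters, applied to residuals normalized by their prior cofactors. Whether those residuals come from a genuine LMS fit or from the WTLS fit is left open here; the diagonal cofactors `q_y` and `q_v`, the function names and the residual values are hypothetical. As in the text, the threshold is a normal upper percentage point, here $u_{1-\alpha/2}$.

```python
import numpy as np
from statistics import NormalDist


def lms_sigma(std_residuals, n_params):
    """Preliminary LMS scale: 1.4826 * (1 + 5 / (n - p)) * sqrt(median(r_i^2))."""
    r = np.asarray(std_residuals, dtype=float)
    n = r.size
    return 1.4826 * (1.0 + 5.0 / (n - n_params)) * np.sqrt(np.median(r ** 2))


def robust_w(v, q_y, q_v, n_params, alpha=0.05):
    """Robust w-statistics v_i / (sigma_LMS * sqrt(q_v,i)) and their flags."""
    v, q_y, q_v = (np.asarray(z, dtype=float) for z in (v, q_y, q_v))
    sigma = lms_sigma(v / np.sqrt(q_y), n_params)   # residuals normalized by prior cofactors
    w = v / (sigma * np.sqrt(q_v))
    u = NormalDist().inv_cdf(1.0 - alpha / 2.0)     # about 1.96 for alpha = 0.05
    return w, u, np.abs(w) > u


v = np.array([0.01, -0.01] * 4 + [0.01, 1.0])      # hypothetical residuals, last one gross
w, u, flags = robust_w(v, q_y=np.ones(10), q_v=np.full(10, 0.8), n_params=2)
print(flags.nonzero()[0])   # [9]
```

Because the median ignores up to half of the residuals, one gross residual barely changes the LMS scale, so its statistic stands out clearly; with the posterior $\hat{\sigma}^2$, the same residual would inflate the scale estimate and shrink its own statistic.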

The procedure for detecting outliers in the partial EIV model is summarized as follows:

Step 1. Give $a$, $L$, $h$, $B$, $Q_L$ and $Q_e$, and define.

Step 3. For each iteration $k$, compute

Step 6. If the change between successive iterations is smaller than a given tolerance, stop the iteration; otherwise, return to Step 3.

Step 8. According to the data-snooping procedure, for a single outlier, if the *w*-test statistics of both $L_j$ and $a_i$ exceed the threshold $u_\alpha$ in absolute value, the outlier is judged to lie in the observation equation containing the observation $L_j$ and the coefficient matrix element $a_i$. For multiple outliers, if the corresponding test conditions are met, the observation equation containing the observation $L_j$ and the coefficient matrix element $a_i$ is deemed to be contaminated with an outlier. However, one still cannot determine whether the outliers lie in the observations, in the coefficient matrix or in both. Here $u_\alpha$ is the upper α-percentage point of the standard normal distribution.

Step 9. If multiple outliers exist in the observations or the coefficient matrix, Steps 1 to 8 are repeated until all the *w*-test statistics are smaller than the threshold value.
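Steps 8 and 9 amount to a test-and-delete loop. The sketch below is a simplification under stated assumptions: each pass refits the model, takes for every point the largest $|w|$ over its observation and coefficient-matrix components, deletes the whole point (in both coordinate systems, as done in Section 4) when that value exceeds the threshold, and repeats. The callback `w_of`, the `min_points` guard (three points are the minimum for a six-parameter affine model) and the toy statistics are hypothetical.

```python
from typing import Callable, Dict, List, Sequence


def iterative_snooping(points: Sequence[int],
                       w_of: Callable[[List[int]], Dict[int, float]],
                       threshold: float = 1.96,
                       min_points: int = 3) -> List[int]:
    """Delete one suspected point per pass until no |w| exceeds the threshold.

    w_of(kept) must refit the model on the kept points and return, for each kept
    point, the largest |w| over its observation and coefficient-matrix components.
    """
    kept, removed = list(points), []
    while len(kept) > min_points:
        w = w_of(kept)
        worst = max(kept, key=lambda p: abs(w[p]))
        if abs(w[worst]) <= threshold:
            break
        kept.remove(worst)
        removed.append(worst)
    return removed


# Toy stand-in for the refit: points 2 and 4 carry large statistics while present.
def toy_w(kept):
    return {p: 10.0 if p == 2 else 3.0 if p == 4 else 0.5 for p in kept}


print(iterative_snooping(range(1, 11), toy_w))   # [2, 4]
```

Deleting only the single worst point per pass, then refitting, limits the risk that one large outlier masks or smears into the statistics of neighbouring points.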

4. NUMERICAL RESULTS

4.1. Simulated two-dimensional affine transformation

The mathematical model of the two-dimensional affine transformation is expressed as follows:

The data, listed in Table 1, are taken from Amiri-Simkooei and Jazaeri (2013). The example contains ten points in total, so the partial EIV model is

To evaluate the proposed outlier detection method reliably, the following five schemes for adding outliers are considered. The significance level for determining the critical value is set to 0.05, a commonly used value (Gao et al., 1992).
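To make the partial EIV form of the affine example concrete, the sketch below (assuming NumPy) builds $h$ and $B$ for $m$ points and checks that $\mathrm{invec}(h+Ba)$ reproduces the usual design matrix. The six-parameter form $x_t=a_1x_s+a_2y_s+a_3$, $y_t=b_1x_s+b_2y_s+b_3$, the ordering $L=[x_t;\,y_t]$, $a=[x_s;\,y_s]$, the function name and the coordinates are assumptions for illustration, not the values of Table 1.

```python
import numpy as np


def affine_partial_eiv(m):
    """h and B for the six-parameter 2D affine model with m points (n = 2m, t = 6, s = 2m).

    Assumed ordering: L = [x_t; y_t], X = [a1, a2, a3, b1, b2, b3], a = [x_s; y_s];
    rows of A: [x_s, y_s, 1, 0, 0, 0] for x_t, then [0, 0, 0, x_s, y_s, 1] for y_t.
    """
    n, t, s = 2 * m, 6, 2 * m
    h, B, I = np.zeros(n * t), np.zeros((n * t, s)), np.eye(m)
    B[0:m, 0:m] = I                  # column 1: x_s in the upper half
    B[n:n + m, m:s] = I              # column 2: y_s in the upper half
    h[2 * n:2 * n + m] = 1.0         # column 3: fixed 1 in the upper half
    B[3 * n + m:4 * n, 0:m] = I      # column 4: x_s in the lower half
    B[4 * n + m:5 * n, m:s] = I      # column 5: y_s in the lower half
    h[5 * n + m:6 * n] = 1.0         # column 6: fixed 1 in the lower half
    return h, B


m = 10                                                          # ten points, as in Table 1
xs, ys = np.linspace(0.0, 9.0, m), np.linspace(5.0, -4.0, m)   # hypothetical coordinates
h, B = affine_partial_eiv(m)
A = (h + B @ np.concatenate([xs, ys])).reshape((2 * m, 6), order="F")   # invec(h + B a)

zeros, ones = np.zeros(m), np.ones(m)
A_direct = np.vstack([np.column_stack([xs, ys, ones, zeros, zeros, zeros]),
                      np.column_stack([zeros, zeros, zeros, xs, ys, ones])])
print(np.allclose(A, A_direct), B.sum(axis=0).min(), B.sum(axis=0).max())   # True 2.0 2.0
```

Every column of $B$ sums to 2: each start-system coordinate enters $A$ twice but appears only once in $a$, which is precisely the repetition that the partial EIV model handles without modelling correlations.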

Scheme 1: Following Amiri-Simkooei and Jazaeri (2013), an outlier of magnitude 0.1 m, which is 10 times the a priori standard deviation, is added to the $x_s$ component of point 4 in the start system.

Table 2 lists the residuals of the observations and of the random vector $a$, together with the corresponding *w*-test statistics. The absolute residuals of the $x$ components of point 4 in the start and target systems are clearly larger than the others. Moreover, both *w*-test statistics of these components exceed the threshold value $u_{0.975} = 1.96$. We therefore conclude that there is an outlier in the $x$ component of point 4 in the start system, the target system or both, which agrees with the simulated case. However, the exact location of the outlier cannot be determined.
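The threshold 1.96 corresponds to a two-sided test at the significance level α = 0.05, that is, the upper α/2 point of N(0,1). A short check with the Python standard library:

```python
from statistics import NormalDist

alpha = 0.05                                  # significance level used in Section 4.1
u = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # two-sided: upper alpha/2 point, u_0.975
print(f"{u:.2f}")                             # 1.96
```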

Scheme 2: An outlier of magnitude 0.1 m is added to both components of point 4 in the start system.

Table 3 shows the residuals and *w*-test statistics. The absolute residuals of the $x$ components of point 4 in both coordinate systems are larger than the others.

In particular, both *w*-test statistics for the $x$ component of point 4 exceed the threshold value 1.96, and the *w*-test statistic for the $y_t$ component of point 4 in the target system also exceeds 1.96. Although the *w*-test statistic for the $y_s$ component of point 4 in the start system is below 1.96, the absolute values of these *w*-test statistics and of the corresponding residuals are large. Both components of point 4 are therefore considered to be contaminated with outliers. However, the specific locations of these outliers cannot be distinguished.

Scheme 3: Outliers of magnitude 0.1 m are added to the $x_s$ component of point 4 in the start system and the $y_t$ component of point 4 in the target system.

The residuals and *w*-test statistics are listed in Table 4. The test statistics in Table 4 satisfy the identification criterion for the $x$ component of point 4, indicating that this component is possibly contaminated with an outlier. Although the absolute residual of the $y_t$ component of point 4 in the target system is small, the corresponding *w*-test statistic and the absolute residual of the $y_s$ component of point 4 in the start system indicate an outlier in the $y$ component.

Scheme 4: Outliers of magnitude 0.1 m are added to the $y$ component of point 4 in both the start and target systems.

The results are presented in Table 5. The two *w*-test statistics for the $y$ component of point 4 exceed the threshold, whereas the absolute values of all other *w*-test statistics are smaller than 1.96. This indicates that only the $y$ component of point 4 contains an outlier, which is consistent with the simulated case. After point 4 is deleted from both coordinate systems, the new residuals and *w*-test statistics are obtained, as listed in Table 6. All the *w*-test statistics are now smaller than the threshold value 1.96, demonstrating that the remaining observations are free of outliers.

Schemes 1 to 4 only consider outliers located at the same point in the two systems. In practice, multiple outliers may occur at different points in a two-dimensional coordinate transformation. Scheme 5 is therefore used to assess the efficiency of the proposed procedure for detecting multiple outliers in the partial EIV model.

Scheme 5: Two outliers of magnitude 0.1 m are added to the $x_s$ component of point 2 in the start system and the $y_t$ component of point 4 in the target system, respectively.

The detailed residuals and *w*-test statistics are listed in Table 7. The *w*-test statistics of the $x$ component of point 2 indicate that this component contains an outlier, and those of the $y$ component of point 4 indicate that it is probably contaminated with an outlier as well. Because the outliers may lie at different locations, point 2 is first deleted from both coordinate systems. The resulting residuals and *w*-test statistics are given in Table 8. According to the identification criterion, there is an outlier in the $y$ component of point 4 in the start system, the target system or both, so point 4 is also deleted from both coordinate systems. After these outlying observations are removed, the new residuals and *w*-test statistics, presented in Table 9, indicate that no outliers remain in the observations of either coordinate system.

4.2. Real data about map rectification

This example concerns map rectification using the 2D affine transformation. The map shown in Figure 1 has a scale of 1:500.

Ten common points with known theoretical coordinates are selected, and their coordinates are sampled on the distorted map. The sampled and theoretical coordinates of the common points, listed in Table 10, are treated as the coordinates in the start and target systems, respectively.

The transformation parameters are estimated from the common points with the 2D affine transformation. Applying the proposed algorithm yields the residuals and *w*-test statistics of the observations and of the random vector *a* shown in Table 11. Because the *w*-test statistics of point 7 satisfy the identification criterion, point 7 is suspected to be an outlier and is deleted. The new residuals and *w*-test statistics are given in Table 12. Based on the *w*-test statistics of point 9 in the target system and the criterion for identifying outliers in Section 3, no further outliers are present in the observations, so the only outlier has been identified. The transformation parameters are then estimated by the WTLS method, and the results are presented in Table 13. To check the reliability of the proposed method, fifteen non-common points are used to evaluate its performance, and the root mean square error (RMSE) is used to assess the influence of the outlier on the coordinates. The RMSE is 0.00892 for the data-snooping procedure but 0.032786 for the WTLS method applied with the outlier retained, because the transformation parameters estimated by the WTLS method are then distorted by the outlier.
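The check-point evaluation can be sketched as follows. Since the RMSE convention is not specified in the text, the point-wise form $\sqrt{\frac{1}{m}\sum_k(\Delta x_k^{2}+\Delta y_k^{2})}$ and the parameter ordering $x_t=a_1x_s+a_2y_s+a_3$, $y_t=b_1x_s+b_2y_s+b_3$ are assumptions; the function name and the data are hypothetical, and NumPy is assumed.

```python
import numpy as np


def check_point_rmse(X, xs, ys, xt, yt):
    """Point RMSE sqrt(mean(dx^2 + dy^2)) of transformed check points.

    X = [a1, a2, a3, b1, b2, b3] with x_t = a1 x_s + a2 y_s + a3 and
    y_t = b1 x_s + b2 y_s + b3 (assumed ordering).
    """
    a1, a2, a3, b1, b2, b3 = X
    dx = a1 * xs + a2 * ys + a3 - xt   # x discrepancy at each check point
    dy = b1 * xs + b2 * ys + b3 - yt   # y discrepancy at each check point
    return float(np.sqrt(np.mean(dx ** 2 + dy ** 2)))


# Hypothetical check: identity parameters, fifteen points each off by (0.003, -0.004).
xs = np.linspace(0.0, 14.0, 15)
ys = np.linspace(3.0, 17.0, 15)
print(round(check_point_rmse([1, 0, 0, 0, 1, 0], xs, ys, xs + 0.003, ys - 0.004), 6))   # 0.005
```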


5. CONCLUSIONS

The WTLS estimate of the partial EIV model may be strongly influenced by outliers. The aim of this paper is to develop an approach to detecting outliers in the partial EIV model. First, we propose a two-step iterative method of computing the WTLS estimate of the partial EIV model based on standard LS theory. The corresponding *w*-test statistics are then constructed to detect outliers when the observations, the coefficient matrix or both are contaminated. If the variance factor is unknown, it may be estimated by the LMS method. Using the proposed two-step iterative method, an algorithm for detecting outliers in the partial EIV model is designed. The numerical results for the two-dimensional affine transformation show that, for a single outlier, the proposed procedure needs to be implemented only once, whereas the previous approach requires repeated tests. For multiple outliers, a step-by-step repeated test is suggested. However, we still cannot determine whether the outliers lie in the observations, the coefficient matrix or both, which remains an open problem for future work.