1. Introduction

With the advent of artificial satellite technology, activities involving geodetic measurements have undergone a revolution in recent years, owing to the ability of the Global Navigation Satellite System (GNSS) to produce point positions both quickly and accurately. This has motivated a revision of the characteristics of the Brazilian Geodetic System, resulting in the implementation of the Brazilian Network for Continuous Monitoring (RBMC) and, subsequently, the establishment of SIRGAS2000.

Based on this approach, this paper analyzes the coordinates obtained from GNSS data processing, together with the statistical values from the adjustment procedures applied to obtain the final coordinates, in order to classify the GNSS measurements in terms of reliability. The accuracy values were determined from the differences between the computed coordinates and the official ones made available by IBGE. The data processing and adjustments were based on GNSS measurements collected during selected periods of 2006, 2007 and 2008, considering the baseline geometries, the distances between stations, the seasons of the year, the occurrence of solar activity and the use of control points (horizontal and vertical). The results were classified into four levels using the Decision Tree technique. As a result, a reliability classification table is proposed, so that the information can be used in different land surveying projects. In addition, this paper proposes a methodology to evaluate the quality and accuracy of short and long baselines in order to comply with the requirements of Brazilian Law Nº 10.267/2001 (LEI Nº 10.267, 2001), which concerns the georeferencing of rural lands. According to this law, owners of rural areas are required to submit a georeferenced plan of their properties to INCRA, attesting that the property does not overlap other rural lands or protected areas such as environmental conservation areas, indigenous reserves or Maroon community lands.

2. GNSS

The basic principle of GNSS technology is the determination of the travel time of the signal transmitted by artificial satellites to a terrestrial receiver antenna, from which the distance (vector) between the receiver antenna and each satellite antenna is derived. Based on baseline computations, it is possible to calculate the coordinates of the receiver antenna in the same coordinate system as a known terrestrial point, taken as the reference in the data processing.

The coordinates obtained from data collected by GNSS receivers are subject to a series of influences from the environment through which the electromagnetic waves pass before reaching the receiver antenna. As a result, the processed coordinates are degraded in precision and accuracy. In this work, these influences are classified into three groups: (1) uncertainties arising in signal emission, (2) uncertainties arising in signal propagation and reception and (3) uncertainties arising in data processing and adjustment.

2.1 Uncertainties arising in signal emission

These uncertainties are related to the geometric distribution of the satellites above the observer's horizon and to the quality of the ephemerides transmitted by the satellites. The geometry can be evaluated by the so-called DOP (Dilution of Precision) factors; the PDOP (Position Dilution of Precision) factor is the one considered in this study.
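As an illustration of how PDOP reflects satellite geometry, the following is a minimal sketch, assuming the satellite positions are given as azimuth and elevation angles in a local east-north-up frame. The function names and the sample sky configurations are hypothetical and are not taken from the processing software used in this study, which reports PDOP directly.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular geometry matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def pdop(sat_az_el_deg):
    """PDOP from (azimuth, elevation) pairs in degrees; needs at least 4 satellites."""
    rows = []
    for az, el in sat_az_el_deg:
        az, el = math.radians(az), math.radians(el)
        # Unit line-of-sight vector (east, north, up) plus the receiver clock column.
        rows.append([math.cos(el) * math.sin(az),
                     math.cos(el) * math.cos(az),
                     math.sin(el),
                     1.0])
    # Normal matrix N = G^T G and cofactor matrix Q = N^-1.
    normal = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    q = invert(normal)
    # PDOP uses only the three position terms of the diagonal.
    return math.sqrt(q[0][0] + q[1][1] + q[2][2])


# Hypothetical sky configurations: one well spread, one clustered.
spread = [(0, 90), (0, 20), (120, 20), (240, 20)]
clustered = [(0, 50), (20, 70), (40, 50), (20, 30)]
print(round(pdop(spread), 2), round(pdop(clustered), 2))
```

Satellites spread over the sky give a small PDOP, whereas satellites clustered in one region of the sky give a large one, which is why PDOP is used here as a geometry indicator.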

Regarding the quality of the ephemerides, there are two classes: broadcast ephemerides, transmitted by the satellites, and precise ephemerides, provided by various civil organizations. The broadcast ephemerides provide, through the navigation message, the orbital elements needed to compute the coordinates of each satellite, as well as the coefficients for the satellite clock corrections. The precise ephemerides are derived from calculations performed by research centers, under contracts with the International GNSS Service (IGS), from the orbital data collected. Consequently, results processed with precise ephemerides are more accurate than those processed with broadcast ones. Broadcast ephemerides are recommended only for computing coordinates over short baselines, while precise ephemerides are recommended for long baselines. In this study, only IGS precise ephemerides were used, downloaded from http://igscb.jpl.nasa.gov/components/prods.html (accessed on 01/31/2011); they result from the combination of GNSS data collected by several research centers. These data normally become available online at least a week after data collection.

2.2 Uncertainties arising in signal propagation and reception

This topic is related to all disturbances that act on the wave front during its propagation from the satellite antennas to the receiver antenna.

Tropospheric refraction

The troposphere is a layer of the atmosphere extending from the earth's surface up to 50 km altitude, composed mostly of gases, primarily nitrogen (N2), oxygen (O2), carbon dioxide (CO2) and argon (Ar), together with water vapor and neutral particles. The propagated signal is directly influenced by the water vapor, atmospheric pressure and temperature in this layer. The effects of the troposphere are considerable in GNSS positioning, especially in the determination of the geometric height. Some GNSS software packages include models for computing such effects by estimating the tropospheric zenith delay.

Ionospheric refraction

The ionosphere is a region of the atmosphere that extends in several layers from about 50 km to 1,000 km above the earth. These layers are composed of ionized gases that interact with their constituent elements. The ionization depends on the molecules of the gases, on their physical properties and on the intensity of the radiation. As this radiation approaches the earth's surface, it encounters denser layers, which increases the density of free electrons through a photochemical process. Simultaneously, particle transport forms a distribution of layers with different densities. As a result, the free electrons in the ionosphere cause a delay in the pseudorange measurements and an equivalent advance of the carrier phase. The behavior of the ionosphere is a function of the time of day, the time of year, the latitude and longitude of the observation points and solar activity.

Solar activities

The uncertainty due to the ionosphere is directly proportional to the Total Electron Content (TEC). The TEC varies regularly in time and space with the sunspot cycle, the time of year, the time of day and the geographic location, among other factors. However, the TEC can undergo abrupt changes due to, for instance, intense solar flares. Such events can occur in two ways: (1) unexpectedly, in which case they are recorded through the systematic mapping of the solar surface carried out by terrestrial observatories, and (2) expectedly, following the cycle of solar activity, which official research centers have monitored for more than a century. This monitoring has shown that solar activity follows a regular, repetitive cycle lasting approximately 11 years (see Figure 1).
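The proportionality between the ionospheric effect and the TEC can be illustrated with the standard first-order approximation, in which the range error in metres is about 40.3·TEC/f², with TEC in electrons per square metre along the signal path and f the carrier frequency in hertz. The following is a minimal sketch of that relation; the function name and the sample TEC values are assumptions made for illustration and are not measurements from this study.

```python
# GPS carrier frequencies in hertz.
L1_HZ = 1575.42e6
L2_HZ = 1227.60e6


def iono_delay_m(tec_tecu, freq_hz):
    """First-order ionospheric range error in metres.

    tec_tecu is the TEC along the signal path in TEC units
    (1 TECU = 1e16 electrons per square metre). The code pseudorange is
    delayed by this amount and the carrier phase is advanced by the same
    amount.
    """
    return 40.3 * tec_tecu * 1e16 / freq_hz ** 2


# Hypothetical quiet and disturbed conditions.
for tec in (10, 50):
    print(tec, round(iono_delay_m(tec, L1_HZ), 2), round(iono_delay_m(tec, L2_HZ), 2))
```

Because the effect depends on the frequency, dual-frequency receivers can combine observations on both carriers to remove most of the first-order term.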

Cycle slips

Cycle slips can occur due to obstructions near the receiver antenna, such as buildings, trees, bridges and mountains. They can also result from abrupt changes in atmospheric conditions, interference from other radio sources and problems in the receiver software or hardware. When cycle slips are detected, the affected observations can be corrected or discarded during the data processing.

Multipath

Multipath, which can be described as "signal reflection", occurs when the wave front propagating between the satellite antennas and the receiver antenna encounters a discontinuity in the refractive index. Thus, in some circumstances the receiver antenna receives, besides the signal arriving directly from the satellites, signals reflected from nearby surfaces, such as buildings, cars, trees, bodies of water and fences. The effect of multipath on the code observables is larger than on the carrier phase, and to reduce it, users must choose carefully the places where the receiver antennas are installed.

Ambiguity

GNSS receivers measure the fractional part of the carrier phase and start counting cycles as soon as the receiver is turned on. The ambiguity can be understood as the unknown number of whole cycles at the first observation, and it is a parameter determined in the adjustment together with other parameters, such as the clock errors. It is inherent in the carrier phase measurement and depends on the quality of the receiver, the receiver antenna and the satellite signal. In any long data collection session, cycle slips can be expected to occur. The estimated ambiguities can be real numbers (float or free solution) or can be fixed to integers (fixed solution). However, fixing the ambiguities to integers is not a simple task, and the solution can be affected by ionospheric effects, tropospheric refraction, multipath, the geometry and number of satellites above the horizon of the receiver antenna, and the length of the observation session.

2.3 Uncertainties arising in data processing and adjustment

The factors considered in this paper that influence the accuracy of the coordinates of a point are: (a) the distance and geometry among the RBMC stations, (b) the quality of the data and (c) the horizontal and vertical accuracies obtained from the adjustment of the processed data.

(a) Distance and geometry between RBMC stations

In this study, all analyses are based on baselines from 300 km up to 1,400 km in length. Distances shorter than 300 km were considered short baselines owing to the high quality of the RBMC data. Regarding the geometry of the RBMC stations, two different configurations were used: one with approximately equal distances between stations and another with different distances.

(b) Adjustment of observations by the method of least squares and quality analysis using the chi-square test (χ2)

Adjustment of observations is a mathematical procedure for solving an overdetermined system of equations based on the principle of least squares (Amiri-Simkooei et al., 2009). Besides providing a unique solution, the adjustment is an important tool for the statistical analysis of the results, including precision estimates and statistical tests for quality control. In this paper, the quality control of the adjustment was carried out using the Chi-Square Test (χ2).
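To make the procedure concrete, the following is a minimal sketch of a least squares adjustment and the associated chi-square test for the simplest overdetermined case: one unknown coordinate observed several times with different standard deviations. The function names, the sample observations and the 95% two-sided bounds for 2 degrees of freedom (taken from standard chi-square tables) are assumptions for illustration; they are not the settings of the software used in this study.

```python
def adjust_single_unknown(observations, sigmas, sigma0_prior=1.0):
    """Weighted least squares estimate of one unknown from redundant observations.

    Returns the adjusted value, the weighted sum of squared residuals (vTPv),
    the degrees of freedom and the a posteriori variance factor.
    """
    weights = [sigma0_prior ** 2 / s ** 2 for s in sigmas]
    x_hat = sum(w * l for w, l in zip(weights, observations)) / sum(weights)
    residuals = [x_hat - l for l in observations]
    vtpv = sum(w * v * v for w, v in zip(weights, residuals))
    dof = len(observations) - 1  # n observations minus one unknown
    return x_hat, vtpv, dof, vtpv / dof


def chi_square_test(vtpv, lower, upper, sigma0_prior=1.0):
    """Accept the adjustment if vTPv / sigma0^2 lies between the tabulated bounds."""
    statistic = vtpv / sigma0_prior ** 2
    return lower <= statistic <= upper


# Hypothetical observations of the same coordinate (metres) and their sigmas.
obs = [100.012, 100.018, 100.009]
sig = [0.005, 0.005, 0.010]
x_hat, vtpv, dof, s0_post = adjust_single_unknown(obs, sig)

# Tabulated chi-square bounds for 2 degrees of freedom, 95% two-sided (assumed level).
LOWER_2DOF, UPPER_2DOF = 0.0506, 7.378
print(round(x_hat, 4), round(vtpv, 2), chi_square_test(vtpv, LOWER_2DOF, UPPER_2DOF))
```

A statistic above the upper bound suggests that the residuals are larger than the assumed observation precision allows, for example because of an outlier or overly optimistic standard deviations.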

3. Brazilian continuous monitoring network (RBMC)

The RBMC aims to provide a geodetic reference infrastructure based on GNSS technology, facilitating the use of the system by end users and ensuring the quality of the results. It is worth mentioning that the RBMC is also the main connection with the global reference systems. It is composed of GNSS reference stations with known coordinates, which eliminates the need to occupy a geodetic point with a receiver during measurement campaigns. Moreover, the RBMC stations are equipped with high-performance receivers, which provide high-quality observables and, consequently, reliable data processing results.

4. Decision tree learning (DTL)

Decision trees are powerful and popular tools for classification and prediction (Moussas, 2006). Their attractiveness lies in the fact that, in contrast to neural networks, decision trees represent rules that can readily be expressed so that humans can understand them, or even be used directly in a database access language such as SQL, so that records falling into a particular category can be retrieved.

In some applications, the accuracy of a classification or prediction is the only thing that matters. In such situations, we do not necessarily care how or why the model works. In other situations, the ability to explain the reason for a decision is crucial. A variety of algorithms for building decision trees share this desirable quality of interpretability.

A decision tree can be used as a model for sequential decision problems under uncertainty. It describes graphically the decisions that can be made, the events that may occur, and the outcomes associated with combinations of decisions and events. Probabilities are assigned to the events, and values are determined for each outcome. A major goal of the analysis is to determine the best decisions. The idea of DTL is that percepts are used not only to act, but also to improve the ability of the agent to act in the future. Learning takes place to the extent that the agent observes its interactions with the world and its own internal decision-making process.

DTL is an example of inductive learning, i.e. learning that forms a hypothesis from particular cases in order to reach general conclusions. It is very useful for implementing expert systems and for classification problems. DTL takes as input an object described by a set of attributes and returns a decision, which is the predicted output value for that input. This technique was used in this paper to create the reliability classification table (see section 7).
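The way a decision tree algorithm chooses which attribute to test first can be illustrated with information gain, a splitting criterion used by classical algorithms such as ID3. The following is a minimal sketch of that criterion only; the attribute names and the four records are invented for illustration and are not data from this study, and the sketch does not claim to reproduce the algorithm applied by the classification software.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(records, attribute, target):
    """Reduction in entropy of `target` obtained by splitting on `attribute`."""
    base = entropy([r[target] for r in records])
    remainder = 0.0
    for value in {r[attribute] for r in records}:
        subset = [r[target] for r in records if r[attribute] == value]
        remainder += len(subset) / len(records) * entropy(subset)
    return base - remainder


# Invented records: each processing run has two attributes and a class label.
records = [
    {"pdop": "low", "geometry": "good", "cls": "A"},
    {"pdop": "low", "geometry": "bad", "cls": "A"},
    {"pdop": "high", "geometry": "good", "cls": "R"},
    {"pdop": "high", "geometry": "bad", "cls": "R"},
]
for attribute in ("pdop", "geometry"):
    print(attribute, information_gain(records, attribute, "cls"))
```

In this toy data set the class is fully determined by the "pdop" attribute, so it has the highest gain and would be placed at the root of the tree.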

5. Material and methods

As already mentioned, the data processing has been performed considering the geometry of the baselines, the distances between RBMC stations, the season of the year, the significant occurrence of solar activities and the use of horizontal and vertical control points. To analyse the influence of the network geometry, the data processing has been performed taking into account two cases: (1) stations with similar distances and (2) stations with different distances.

For the case of good geometry and long baselines, the stations used were CUIB, BRAZ and PPTE. For the case of long distances and bad geometry, the chosen stations were CUIB, BRAZ and MCLA. In both cases, CUIB was treated as the "unknown" station in the processing. Figure 2 illustrates these two situations.

The stations used for shorter and similar distances were VARG, VICO and RIOD, and those used for shorter but different distances were VARG, RIOD and UBER. The station VARG was treated as "unknown". These stations were chosen in order to obtain the required geometry and because of the greater quantity and quality of data available for them. Figure 3 shows these two situations.

Regarding the time span of this research, four days of data were considered for each of the four seasons of the year (starting approximately 20 days after the beginning of the season). The days were selected according to the availability of RBMC data on the IBGE website, and the data processing was carried out using the period of data provided by IBGE for each day (in most cases, 24 hours). The days selected were (considering the southern hemisphere):

• FALL (03/21 to 06/21): April 11 to 14 (Julian days 101 to 104 in 2006 and 2007; 102 to 105 in 2008, a leap year);

• WINTER (06/21 to 09/23): July 12 to 15 (Julian days 193 to 196 in 2006 and 2007; 194 to 197 in 2008, a leap year);

• SPRING (09/23 to 12/21): October 13 to 16 (Julian days 286 to 289 in 2006 and 2007; 287 to 290 in 2008, a leap year);

• SUMMER (12/21 to 03/21): January 11 to 14 (Julian days 11 to 14 in 2006).
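The day numbers in the list above, including the one-day shift in the leap year 2008, can be checked with a short calculation. The following is a minimal sketch using the Python standard library; the function name is an assumption made for illustration.

```python
from datetime import date


def day_of_year(year, month, day):
    """Day number within the year (1 = January 1), as used to name daily GNSS files."""
    return date(year, month, day).timetuple().tm_yday


# First day of each selected period, for a common year and for the leap year 2008.
for month, day in ((4, 11), (7, 12), (10, 13)):
    print(month, day, day_of_year(2006, month, day), day_of_year(2008, month, day))
```

From March onwards, every date in 2008 falls one day later in the count than the same date in 2006 or 2007 because of February 29.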

The software used in the data processing was Topcon Tools, version 7.1, with the SIRGAS2000 parameters and precise ephemerides obtained from the website http://igscb.jpl.nasa.gov/components/prods.html (accessed on 1/31/2011). After the data processing, the resulting coordinates were separated and sorted by DTL in the WEKA software (Waikato Environment for Knowledge Analysis), version 3.5, according to the previously established criteria.

For the adjustment of the observations, two known stations have been used for each case: VICO and RIOD for the first, UBER and RIOD for the second, BRAZ and PPTE for the third and BRAZ and MCLA for the fourth. The website used to obtain information regarding solar activities was: www.sec.noaa.gov (accessed 1/31/2011). The ionospheric conditions have been evaluated by the Klobuchar model.

To evaluate, with a higher degree of confidence, the correlation between the precision and the accuracy of the coordinates obtained in the data processing, this paper also considered the accuracy of the equipment used at the RBMC stations. The devices differ in accuracy, so this factor was taken into account in each classification. Information about the equipment and the stations is listed in Table 1.

6. Processing results

The data processing was performed using the following parameters:

• Reference System: SIRGAS2000;

• Type of coordinates: latitude, longitude and ellipsoidal height;

• Coefficient of refraction: 0.14;

• Elevation mask: 15°;

• Adjustment by Least Squares Method;

• Confidence level of Adjustment: 68%;

• Number of set points: 3;

• Number of control points: 2;

• Number of vectors: 2;

• Adjustment rejection criterion: Chi-square (UWE).

For each chosen day of the year, four distinct processing runs were carried out, taking into account the distance and the geometry among the stations.

All resulting coordinates were transformed to UTM coordinates in different projection zones, depending on the station position. All coordinate values refer to SIRGAS2000.
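The dependence of the UTM projection zone on the station position can be sketched as follows, using the standard 6° zone layout in which zone 1 starts at longitude 180° W. The function names and the sample longitudes are assumptions made for illustration; the longitudes are round values and are not the official coordinates of any RBMC station.

```python
import math


def utm_zone(lon_deg):
    """UTM zone number (1 to 60) for a longitude in degrees, east positive."""
    # The modulo maps longitude +180 back to zone 1.
    return int(math.floor((lon_deg + 180.0) / 6.0)) % 60 + 1


def central_meridian(zone):
    """Longitude of the central meridian of a UTM zone, in degrees."""
    return zone * 6 - 183


# Illustrative longitudes in the range covered by Brazil.
for lon in (-43.2, -47.9, -56.1):
    zone = utm_zone(lon)
    print(lon, zone, central_meridian(zone))
```

Stations falling in different zones have their plane coordinates referred to different central meridians, so UTM coordinates from different zones should not be differenced directly.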

Case 1: Similar distances among stations considering baselines shorter than 300 km

For this case, the following RBMC stations have been used: VARG, VICO and RIOD. VICO and RIOD stations have been considered as control points and VARG station was the unknown point (see Table 2).

Case 2: Different distances among stations considering at least one baseline shorter than 300 km

For this case, the following RBMC stations have been used: RIOD, UBER and VARG. RIOD and UBER stations have been considered as control points and VARG station was the unknown point (see Table 3).

Case 3: Similar distances among stations and baselines greater than 700 km

For this case, the following RBMC stations have been used: BRAZ, PPTE and CUIB. BRAZ and PPTE stations have been considered as control points and CUIB station was the unknown point (see Table 4).

Case 4: Large distances and different baseline lengths, considering at least one baseline greater than 700 km

For this case, the following RBMC stations have been used: BRAZ, MCLA and CUIB. BRAZ and MCLA stations have been considered as control points and CUIB station was the unknown point (see Table 5).

7. Classification and results analysis

The parameters used to classify the results were:

• When the discrepancy between the coordinates approved by IBGE and the coordinates estimated in the data processing exceeded 10 cm, the data were immediately rejected. This value was adopted based on the Technical Standard for Georeferencing of Rural Properties, 2nd edition (INCRA, 2010);

• A "float solution" of the ambiguity was accepted only for baselines greater than 300 km. The classification took the following into account: if only one baseline did not produce the expected ambiguity solution, the processing was assigned to a lower class, but if two or three baselines did not produce the expected ambiguity solution, the processing was rejected. This decision was based on previous research (MENZORI, 2005);

• For classification "A", the PDOP value had to be smaller than 3. If a PDOP value was between 3 and 6, the related baseline was automatically assigned to a lower class. With two or three PDOP values between 3 and 6, or with only one value above 6, the data processing was rejected;

• To estimate the limit values of the horizontal and vertical accuracies, the accuracy of each receiver on each baseline and its time of operation during the data collection were taken into account.
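The first three criteria above can be sketched as a simple rule-based classifier. This is a minimal sketch under stated assumptions and not the classification actually implemented in this work: the function name and data layout are hypothetical, "lower class" is assumed to mean a demotion of one step in the order A, B, C, an unexpected ambiguity solution is assumed to be a float solution on a baseline of 300 km or less, and the receiver-dependent accuracy limits of the fourth criterion are omitted because their values are not stated in the text.

```python
CLASSES = ["A", "B", "C"]  # "R" (rejected) is returned separately


def classify(discrepancy_m, baselines):
    """Classify one processing run from its coordinate discrepancy and baselines.

    baselines: list of dicts with keys "length_km", "solution" ("fixed" or
    "float") and "pdop". Returns "A", "B", "C" or "R".
    """
    # Criterion 1: discrepancy with the official coordinates above 10 cm.
    if discrepancy_m > 0.10:
        return "R"

    # Criterion 2: float solutions are accepted only above 300 km (assumed reading).
    unexpected = sum(1 for b in baselines
                     if b["solution"] == "float" and b["length_km"] <= 300.0)
    if unexpected >= 2:
        return "R"

    # Criterion 3: PDOP below 3 for class A, 3 to 6 demotes, above 6 rejects.
    pdop_mid = sum(1 for b in baselines if 3.0 <= b["pdop"] <= 6.0)
    pdop_high = sum(1 for b in baselines if b["pdop"] > 6.0)
    if pdop_high >= 1 or pdop_mid >= 2:
        return "R"

    # Assumption: each remaining failure demotes the run by one class.
    return CLASSES[unexpected + pdop_mid]


# Hypothetical run with two short baselines, one of them with PDOP of 4.
run = [{"length_km": 200.0, "solution": "fixed", "pdop": 2.1},
       {"length_km": 250.0, "solution": "fixed", "pdop": 4.0}]
print(classify(0.03, run))
```

Writing the criteria as explicit rules makes it easier to compare them with the tree induced from the processed results.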

The results of all data processing have been divided according to the criteria presented in Figure 4.

Table 6 presents the proposed classification, which has been considered in the DTL for the analysis of the adjusted results of the coordinates.

Classification by WEKA

Figure 5 shows the database structure generated with the WEKA software for the classification proposed in this paper.

The header of Figure 5 shows the stations (VARG, VICO and RIOD) and the processing year, which in this experiment was 2006. The following lines contain the qualifying attributes used for the classification. The software interface, as well as the graph generated for this experiment, can be seen in Figure 6.

The classification results generated by WEKA software can be viewed in Figure 7 (referring to the four cases described previously). In the figure, the reader can see the amount of data processing classified as A, B, C or R, for each year.

8. Analysis and results

The results were classified according to the behavior of the factors generated by the data processing and adjustment, considering the direct influence of solar activity, baseline lengths and the geometric configuration of the RBMC stations.

For all baseline lengths and all geometric configurations of the stations used in this study, the results for the months with high solar activity behaved very similarly to the results for the months with no atypical sunspot activity. For baselines shorter than 100 km, the results of both cases are also very similar, as indicated by Menzori (2005).

Regarding the baseline lengths, a large number of processing runs for baselines greater than 700 km were either rejected or classified as class B or C, while those for baselines shorter than 300 km were classified as class A.

The ambiguity solution behaved as expected, except for some baselines shorter than 300 km, for which the ambiguities were not fixed. For baseline lengths up to 800 km, for which all solutions were expected to be float, the solution was fixed in some cases. These cases require further analysis: the solution may be good from a statistical point of view, yet the coordinates of the points, and hence the computed baseline lengths, may contain significant errors.

The baseline length does not directly influence the ambiguity solution; however, in conjunction with the geometric configuration of the RBMC stations, it can be critical for the accuracy of the horizontal and vertical coordinates. For long baselines and bad network geometry, the accuracies were within the thresholds set by the equipment manufacturers. The accuracies were almost twice as good when the baselines had a maximum length of 300 km and a good geometric configuration (similar distances between stations, i.e. an approximately equilateral triangle), making it clear that the latter situation yields better quality in the processed results.

Another relevant factor is related to the PDOP values. For baselines below 300 km, almost all PDOP values were less than 3. For baselines greater than 1,000 km, and considering triangles with different lengths, the vast majority of PDOP values were greater than 3, which degrades the quality of the results.

It is noteworthy that few results were rejected due to discrepancies between the processed coordinates and those approved by IBGE. All rejected cases occurred where the distances between the RBMC stations were very different. Where the baseline lengths were similar, even for long baselines, no such problem occurred. The importance of the geometry of the reference station network is therefore apparent.

9. Analysis of the classification

The processing runs were classified according to five criteria: (1) discrepancy between the calculated coordinates and those approved by IBGE, (2) ambiguity solution, (3) PDOP value, (4) horizontal accuracies and (5) vertical accuracies. Thus, it can be stated that:

• Regarding the coordinate discrepancies, the only processing runs rejected were those with different distances between RBMC stations, i.e. with bad geometry. These discrepancies usually appeared in the East coordinate;

• Regarding the PDOP, most of the bad PDOP values tended to occur for baselines longer than 1,300 km with different distances between RBMC stations (bad geometry). The variation of the PDOP was small in most of the processing runs;

• Regarding the ambiguity, a float solution was expected in almost all processing runs. However, some surprising results were obtained, such as fixed ambiguity solutions for baselines greater than 700 km. These cases must be interpreted carefully, since the coordinates obtained from the data processing may contain considerable errors when compared with their real values. A fixed ambiguity solution for long baselines does not ensure good quality of the coordinates;

• Regarding the horizontal and vertical accuracies, the values found can be accepted according to the manufacturers' specifications for those distances. However, for long baselines with bad geometry among the stations, the values were in several cases worse than specified. It is important to highlight that, even with baselines greater than 1,000 km, the accuracies were in some cases better than 20 cm.

10. Accuracy classification table

The criteria adopted in this paper to classify the results proved to be reliable. After analyzing the results, the authors propose Table 7 for future investigation of this matter.

11. Conclusion

The accuracy of coordinates resulting from GNSS data processing and adjustment is difficult to determine in day-to-day work. This article proposes a reliability classification table based on different factors, such as baseline lengths, PDOP values and the geometric configuration of the reference stations. As stated in the article, the statistical factors generated in the data processing and adjustment cannot be considered individually as reliability indicators. Even the ambiguity solution cannot always determine the measurement quality, since a fixed solution can be reached even for baseline lengths greater than 300 km. However, the reliability of such a result is doubtful, which demonstrates once more the importance of a proper choice of the control points used in the coordinate determination.

The article demonstrates that modeling the results of GNSS data processing allows them to be classified in the form of a reliability table. It is therefore recommended that the presented classification table be used as a parameter for analyzing the quality of planned land surveying works. In particular, this recommendation may be applied when planning land surveys of rural properties in Brazil that must comply with Law 10.267/2001, known as INCRA's law. Based on the results obtained in this research, a reliability level of class A or B may qualify for INCRA's certification.