AN EXPERIMENTAL DESIGN APPROACH ON GEOREFERENCING

Georeferencing is one of the most important stages of digitizing analogue maps. It is affected by many factors, such as the scale and resolution of the maps and the number of control points. In this study, four of these factors were investigated using a 2^4 factorial design in two-dimensional georeferencing of cadastral maps. A factorial design determines whether the selected factors have main and/or interaction effects on a response variable. Map scale, resolution of the raster map, the number of control points and the coordinate transformation method were selected as experimental factors. Then, the main effects of these factors and the interactions between them were investigated. The results were statistically analyzed using analysis of variance (ANOVA), and a regression model was suggested to account for the significant main and interaction effects of the factors. It was observed that the two-dimensional georeferencing of maps was affected by each of the selected experimental factors and by the interaction between the map scale and the coordinate transformation method.


INTRODUCTION
Maps are used extensively, particularly in geography and related fields. The cadastre is one of the most important application areas of maps. It refers to the systematically organized parcels of land in a particular area and contains information about individual land parcels and properties. This information is obtained through maps and land registers; the maps show the shape, size, and location of the land parcels on the ground, while the ownership, rights, area, and other information are kept in the land registers (ALI et al., 2012). In Turkey, cadastral works are undertaken by the General Directorate of Land Registry and Cadastre (GDLRC). Over time, different types of cadastral maps were produced in accordance with technological developments and legal amendments (DEMIR and CORUHLU, 2008a; 2008b). The production of approximately 98% of the cadastral maps covering the whole of Turkey was completed by 2012 (GDLRC, 2012).
Cadastral works in Turkey began in 1924 with locally based projects; after the 1950s the process was accelerated by the use of photogrammetry (EKIN et al., 2012). Up to the 1990s, over three hundred thousand cadastral maps were produced in non-digital (analogue) format. The production method, coordinate system, scale, and base type of these maps differed. Table 1 shows the distribution of maps by production method. Today, there is a need to convert the accumulated analogue maps into digital format. In most cases, many individual maps exist at various scales and levels of detail, and it is either impossible or too expensive to physically re-survey the features in order to create a new digital map (GIELSDORF et al., 2003). All the existing analogue maps need to be digitized in order to be compatible with Spatial Information Systems (SIS) and to provide appropriate land registry maps to other public and private organizations. There are several ways of digitizing analogue maps, the most commonly used being "raster to vector transformation". The basic steps of this method are scanning, georeferencing, and digitizing the existing maps. The current study focuses on the georeferencing of cadastral maps.
Georeferencing is the process of assigning locations to geographical objects within a geographic frame of reference and determining the geometric relations between the image data and the real world (LEGAT, 2006). Georeferencing is used both to establish the relation between raster images and coordinates, and to determine the spatial location of other geographical features. It is also used in the determination of the exterior orientation parameters of remote sensing (RS) and sensor data at the time of recording, and in the restitution of the scene from the image data (SKALOUD and LEGAT, 2008; LI and BRIGGS, 2012). However, the current study does not include RS and sensor data.
Georeferencing of a scanned map depends on many factors, such as the scale of the map, the resolution of the image, the number of control points (NCP), the chosen coordinate transformation method (CTM), the map production method, and some uncontrolled factors such as user experience and working conditions. With the exception of the uncontrolled factors, whether these factors have an individual or joint effect on georeferencing can be investigated using experimental design (COMLEKCI, 2003). Uncontrolled factors cannot be defined, so they cannot be investigated using experimental design. Experimental design is used in many fields, and the number of problem-solving and process improvement studies that utilize it has increased in recent years. A full factorial experimental design examines every possible combination of factors at the tested levels (GEORGE et al., 2005). The effects of two-level factors can be explained by a 2^n (n: number of factors) factorial design, which is a special application of the full factorial design of experiments (NAVIDI, 2008; MONTGOMERY et al., 2001). Some concepts used in experimental design are explained below:
(a) Main Effect: The change in the response variable that occurs as the level of an experimental factor changes.
(b) Interaction: The situation where the effect of a factor on the response variable depends on the level of another factor.
(c) Response Variable (Performance Criterion/Dependent Variable): An output which is measured or observed.
(d) Factor: A controllable independent variable which is thought to have an impact on the response variable.
(e) Level: A specific value or setting of a factor.
Georeferencing of scanned maps includes some two-level factors, so a 2^n factorial design can be used to investigate the effects of selected factors on georeferencing as a response variable. In the current study, the factorial design approach was adopted in order to calculate the effects of the following four factors in two-dimensional georeferencing: the scale of the map, the resolution of the scanned image, the NCP and the CTM. The Root Mean Square Error (RMSE) was used as the response variable (performance criterion). The main aim of the current study was to investigate whether the selected four factors have an effect, individually and/or jointly, on georeferencing by using a two-level 2^4 factorial design.
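As a minimal sketch of what a two-level full factorial design looks like in practice, the following Python snippet enumerates the 2^4 = 16 combinations of coded levels (-1 for low, +1 for high). The letters A to D follow the labels the paper later assigns to the factors; the function name `full_factorial` is an illustrative assumption and is not part of the software used in the study.

```python
from itertools import product

# Factor labels as used later in the paper:
# A: map scale, B: resolution, C: number of control points, D: transformation method
FACTORS = ("A", "B", "C", "D")

def full_factorial(factors):
    """Return every combination of coded levels (-1, +1) as a list of dicts."""
    return [dict(zip(factors, levels))
            for levels in product((-1, +1), repeat=len(factors))]

design = full_factorial(FACTORS)
for run_number, run in enumerate(design, start=1):
    print(run_number, run)
```

Each factor appears at its low and high level in exactly half of the runs, which is what allows the main effects and interactions to be estimated independently of one another.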

MATERIALS AND METHODS
In the first stage of the current study, analogue cadastral maps with different scales were obtained from the Local Cadastral Directorate in Samsun, Turkey. The selected maps were checked to ensure that they were undeformed and in good condition, in order to avoid uncontrollable errors which might affect the experimental study. These maps were produced in the same projection system (the UTM projection system) in accordance with the Regulation for Producing Large Scale Maps in Turkey. The analogue cadastral maps were scanned at different resolutions and the scanned images were loaded into the GIS software. Georeferencing of the scanned maps was carried out using NetCAD 5.0, a mapping and GIS software package frequently used by surveyors, urban and regional planners, and public and private organizations in Turkey.

Experimental Procedure and Selected Factors
Experimental design analysis offers a structured and organized analysis capability, and it is often used by researchers to examine the quality of products based on certain essential design factors and levels during the manufacturing process. Analysis of variance (ANOVA) explores the influence of factors on the overall system performance and the interconnections between the selected factors and levels (CHENG et al., 2012). ANOVA involves testing a null hypothesis against an alternative hypothesis. The null hypothesis states that no variation exists between variables, or that a single variable is no different from zero. It is presumed to be true until statistical evidence rejects it in favour of the alternative hypothesis. A statistically significant result, in which the probability (p-value) is less than the significance level, justifies the rejection of the null hypothesis. The significance level of this study was taken as 0.05 (STIGLER, 2008). In the current study, the results of the experimental data were studied and interpreted using the statistical software MINITAB (Version 16) of Minitab, Inc., USA. The georeferencing of a map is affected by individual or multiple factors. In the current experimental study, the map scale, the resolution of the map, the NCP and the CTM were evaluated.

Map Scale
Map scale is an important factor in digitizing because it gives the ratio by which real-world features are reduced on the map. The map scale directly affects the accuracy of data obtained from the map. Large scale maps such as 1/1000 and 1/2000 are used for showing individual parcels and buildings in detail because they cover a small area of land, and they present more accurate data than small scale maps, which cover large areas of land. Cadastral maps and their accuracy are very important because they display the shape, size, location and also the ownership rights of the land parcels on the ground. The scale of a cadastral map is determined according to the density of parcels and buildings in the study area. In Turkey, cadastral maps have been produced at different scales: 43.5% of the cadastral maps are at 1/1000, 27.9% are at 1/2000, 20.4% are at 1/5000 and 8.2% are at other scales (GDLRC, 2012). In the current study, 1/1000 and 1/2000 scale maps were used.

Resolution of the Scanned Maps
As mentioned above, analogue cadastral maps must be digitized in order to comply with Spatial Information Systems. Scanning is a method that converts an analogue map into an image file in raster format. The resolution of the scanned image is an important property. Resolution represents the fineness of detail in the image, which translates to the pixel size. There is a limit to how far one can zoom into any digital image: eventually the limit of the pixel size is reached, and beyond this point the image gets progressively blurrier (HILL, 2006). Common scanning resolutions are between 100 and 400 DPI (KATONA and HUDRA, 1999; UNITED NATIONS, 2000). The resolution of the scanned image is related to the cell size. The resolution of raster data is the inverse of the cell size, which means that a smaller cell size provides higher resolution. A smaller cell size also provides higher feature and spatial accuracy. However, processing is slower and the file size is larger (CONGALTON, 1997; ESRI, 2013). The resolution of a scanned image is also important for georeferencing, since more clearly defined details on the image allow better vectorization for maps of the same scale.
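The link between scanning resolution, map scale and pixel size can be made concrete with a short calculation. The sketch below assumes only that one inch equals 0.0254 m, so that one pixel of a map scanned at a given DPI covers (scale denominator × 0.0254 / DPI) metres on the ground. The DPI values in the loop are the end points of the common range quoted above and are not necessarily the levels used in the experiment; the function name is illustrative.

```python
INCH_IN_METRES = 0.0254

def ground_pixel_size(scale_denominator, dpi):
    """Ground distance in metres covered by one pixel of a scanned map."""
    return scale_denominator * INCH_IN_METRES / dpi

for scale_denominator in (1000, 2000):
    for dpi in (100, 400):
        size = ground_pixel_size(scale_denominator, dpi)
        print(f"1/{scale_denominator} at {dpi} DPI: {size:.4f} m per pixel")
```

For example, a 1/1000 map scanned at 400 DPI has a ground pixel size of 0.0635 m, while the same map scanned at 100 DPI has a pixel size four times larger.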
Resolution is a term closely associated with scale (HILL, 2006). Large scale maps are more accurate than small scale maps, so a small scale map scanned at a high resolution is still not as accurate as a large scale map.

Control Points
Control points are used to determine the relationship between the screen coordinate system and the map coordinate system in georeferencing. The number of control points and their locations are important in the coordinate transformation because they enter the RMSE calculation (GHILANI and WOLF, 2006). The geometrical distribution of control points plays an important role in estimating the transformation parameters (EKIN et al., 2012; TAN et al., 2013). They should be evenly distributed over the map for the quality of the transformation and the RMSE. If the control points are well distributed, the absorptions can be kept at minimal levels; thus, more appropriate results for the datum parameters can be obtained (KUTOĞLU and AYAN, 2006). In the current study, two levels, 4 and 12, were used to discover the effects of the number of control points. Grid ticks in the raster image were used as the control points. The selected grid ticks were evenly distributed over the map and equally spaced in the north-south and east-west directions.
Analogue maps can contain errors resulting from the method of production and measurement, and these maps become deformed over time. Accordingly, the control point coordinates obtained from scanned cadastral maps will also contain some errors. The number of control points must be greater than the minimum required number in order to increase the sensitivity of the solution. If the number of measurements is greater than the number of unknown parameters, an adjustment solution should be implemented to obtain a unique set of unknown parameters (WANG, 1992; GHILANI and WOLF, 2006). In the current study, the least squares adjustment method was used.
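The amount of redundancy available to the adjustment can be counted directly. The sketch below assumes that each control point contributes two observation equations (one per coordinate) and that the two-dimensional similarity and affine transformations have 4 and 6 unknown parameters respectively; the dictionary and function names are illustrative.

```python
# Number of unknown parameters of the two-dimensional transformations.
PARAMETERS = {"similarity": 4, "affine": 6}

def redundancy(n_points, method):
    """Degrees of freedom: two equations per control point minus the unknowns."""
    return 2 * n_points - PARAMETERS[method]

for n_points in (4, 12):
    for method in PARAMETERS:
        print(f"{n_points} control points, {method}: "
              f"redundancy = {redundancy(n_points, method)}")
```

With 4 control points the affine transformation leaves only 2 redundant equations, whereas 12 control points leave 18, so the two levels of the NCP factor differ considerably in how strongly the solution is over-determined.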

Coordinate Transformation Method
In geodesy and photogrammetry applications, transformation between two coordinate systems is frequently used. The size of the transformation area affects both the transformation accuracy and the choice of method. Therefore, different transformation accuracies can be obtained using different methods for regional or local areas (EKIN et al., 2012). In the current two-level experimental study, the similarity transformation method and the affine transformation method were used as the two levels of the CTM factor.
In the similarity coordinate transformation method, the relationship between the two coordinate systems is defined by two translations of the coordinate origin, one rotation, and one scale factor common to both coordinate axes. In this method, the geometrical similarity of shapes is maintained. The edges of regular geometrical shapes shrink or expand at the same rate and the absolute value of the angles remains unchanged (TURGUT and INAL, 2003). In the affine coordinate transformation method, the relationship between the two coordinate systems is defined by two translations of the coordinate origin, two rotations, and two scale factors along the coordinate axes. The affine transformation method is frequently used in photogrammetry and cartography, because film, paper and other map bases deform differently along the two axes (TURGUT and INAL, 2003).
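The two transformation models can be sketched as least squares fits in plain Python. This is an illustration under stated assumptions, not the procedure implemented in NetCAD: the similarity model is written as X = a·x − b·y + tx, Y = b·x + a·y + ty, the affine model as X = a1·x + a2·y + a0, Y = b1·x + b2·y + b0, and the RMSE is taken as the square root of the mean squared point displacement, which is one of several conventions in use. The control point coordinates are made-up values with slightly different scales along the two axes.

```python
from math import sqrt

def _centre(values):
    """Return the mean of `values` and the values reduced to that mean."""
    mean = sum(values) / len(values)
    return mean, [v - mean for v in values]

def _centred(src, dst):
    mx, xc = _centre([p[0] for p in src])
    my, yc = _centre([p[1] for p in src])
    mX, Xc = _centre([p[0] for p in dst])
    mY, Yc = _centre([p[1] for p in dst])
    return (mx, my, mX, mY), (xc, yc, Xc, Yc)

def fit_similarity(src, dst):
    """Least squares 4-parameter fit: X = a*x - b*y + tx, Y = b*x + a*y + ty."""
    (mx, my, mX, mY), (xc, yc, Xc, Yc) = _centred(src, dst)
    denom = sum(x * x + y * y for x, y in zip(xc, yc))
    a = sum(x * X + y * Y for x, y, X, Y in zip(xc, yc, Xc, Yc)) / denom
    b = sum(x * Y - y * X for x, y, X, Y in zip(xc, yc, Xc, Yc)) / denom
    tx = mX - a * mx + b * my
    ty = mY - b * mx - a * my
    return lambda x, y: (a * x - b * y + tx, b * x + a * y + ty)

def fit_affine(src, dst):
    """Least squares 6-parameter fit: X = a1*x + a2*y + a0, Y = b1*x + b2*y + b0."""
    (mx, my, mX, mY), (xc, yc, Xc, Yc) = _centred(src, dst)
    sxx = sum(x * x for x in xc)
    syy = sum(y * y for y in yc)
    sxy = sum(x * y for x, y in zip(xc, yc))
    det = sxx * syy - sxy * sxy

    def solve(tc):
        # Normal equations of the centred problem, solved with Cramer's rule.
        sxt = sum(x * t for x, t in zip(xc, tc))
        syt = sum(y * t for y, t in zip(yc, tc))
        return (sxt * syy - syt * sxy) / det, (syt * sxx - sxt * sxy) / det

    a1, a2 = solve(Xc)
    b1, b2 = solve(Yc)
    a0 = mX - a1 * mx - a2 * my
    b0 = mY - b1 * mx - b2 * my
    return lambda x, y: (a1 * x + a2 * y + a0, b1 * x + b2 * y + b0)

def rmse(transform, src, dst):
    """Root mean square of the point displacements after transformation."""
    total = 0.0
    for (x, y), (X, Y) in zip(src, dst):
        tx, ty = transform(x, y)
        total += (X - tx) ** 2 + (Y - ty) ** 2
    return sqrt(total / len(src))

# Hypothetical control points: image coordinates in pixels, map coordinates
# in metres, with a different scale along each axis (0.0635 and 0.0640 m/pixel).
src = [(0.0, 0.0), (1000.0, 0.0), (1000.0, 1000.0), (0.0, 1000.0)]
dst = [(500000.0 + 0.0635 * x, 4500000.0 + 0.0640 * y) for x, y in src]

for name, fit in (("similarity", fit_similarity), ("affine", fit_affine)):
    print(name, rmse(fit(src, dst), src, dst))
```

Because the made-up data are stretched differently along the two axes, the affine fit reproduces the control points almost exactly, while the similarity fit, which has a single scale factor, leaves a residual. This mirrors the reason given above for preferring the affine method on deformed map bases.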

Determination of Factor Levels
In experimental design analysis, the identification of the factors and factor levels under investigation requires a theoretical understanding of georeferencing of the scanned map. In this study, four factors were examined at two levels, high (+) and low (-), resulting in a 2^4 full factorial design. The design determines which factors have important effects on a response, as well as how the effect of one factor varies with the levels of the other factors (KAVAK, 2009). The measured outcome is the RMSE.
The level selection for each factor was carried out on the basis of preliminary trials. The experiments were executed in random order to evaluate experimental errors. The controllable experimental factors and their high and low levels are presented in Table 2.
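A randomized run order of this kind could be generated as follows. The sketch assumes complete randomization across all runs and two replicates of each factor combination; the text does not state whether the randomization was carried out within or across replicates, and the function name and seed are illustrative.

```python
import random
from itertools import product

def randomized_runs(n_factors=4, replicates=2, seed=None):
    """Return (replicate, coded levels) pairs for all runs in random order."""
    base = list(product((-1, +1), repeat=n_factors))
    runs = [(rep, levels)
            for rep in range(1, replicates + 1)
            for levels in base]
    random.Random(seed).shuffle(runs)  # fixed seed gives a reproducible order
    return runs

runs = randomized_runs(seed=42)
for order, (rep, levels) in enumerate(runs, start=1):
    print(order, rep, levels)
```

Running the experiments in such an order spreads the influence of uncontrolled conditions over all factor combinations instead of letting it coincide with the level of one factor.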

RESULTS
In the current study, 16 experiments with two replications (a total of 32 experiments) were conducted, covering all possible combinations of the factor levels for georeferencing. All experiments were performed by the author. Each experiment consisted of a coordinate transformation, with the factor levels changed from one experiment to the next. In the 2^4 factorial design, the response data (RMSE) were obtained from 16 factorial experiments (2^4 = 2 × 2 × 2 × 2 = 16) with two replications, as presented in Table 3. The results of the experimental data were studied and interpreted using the Minitab 16 statistical software. The null hypothesis, stating that the main effects and interactions are equal to zero, was tested using the F-test. In Table 4, p-values smaller than 0.05 indicate that the corresponding effects and interactions are not equal to zero at the 5% significance level. In other words, the RMSE basically depends on the main effects of the Map Scale (A), the Resolution (B), the Number of Control Points (C) and the Coordinate Transformation Method (D), and also on the A x D interaction.
In Table 4, the column labeled "F" presents the F-values of the F-test and the column labeled "p" presents the corresponding p-values. A similar result can be obtained from the normal probability graph of standardized effects (Figure 1). The aim of this graph is to determine the statistical significance of both the main and interaction effects. The insignificant effects are located along the reference line, while the significant effects lie farther from the line. According to Figure 1, the most significant factor is A, because it lies farthest from the reference line; its effect is negative. It is followed by D, AxD, C and B, in that order.
The second stage of the analysis was carried out after removing the terms which had no effect on the RMSE. Table 5 shows the revised estimated effects and coefficients for the RMSE. In this step, only the terms found to be significant at the 5% level are given. In Table 5, the column labeled T presents the values of Student's t statistic and the column labeled P presents the p-values. Examining a model is important for predicting responses in any designed experiment (ISMAIL et al., 2008). The coefficients in such a model show the impact of each effect on the response. A negative sign for a given parameter indicates that the response decreases with an increase in the value of the parameter. This means that the "+1 level" of factor A decreases the measurement error. Since the problem lies in the minimization of the RMSE, the factor levels must be set at A (+1) (Scale, 1/1000), B (+1) (Resolution, 400 DPI), C (-1) (4 control points) and D (+1) (affine transformation method). The magnitude and sign of each coefficient indicate the importance and the direction of the corresponding factor's influence on the response variable.
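The relationship between estimated effects and regression coefficients in a two-level design can be sketched as follows. An effect is the difference between the mean response at the +1 level and at the -1 level of a term, and the corresponding coefficient in the coded regression model is half of that effect. The response values below are generated from placeholder numbers chosen only for illustration; they are not the values in Table 3 or Table 5, and the function names are assumptions.

```python
from itertools import product
from math import prod

NAMES = "ABCD"
design = list(product((-1, +1), repeat=4))  # coded levels of A, B, C, D

def effect(design, response, term):
    """Mean response where the term's contrast is +1 minus mean where it is -1."""
    cols = [NAMES.index(ch) for ch in term]
    contrast = [prod(run[c] for c in cols) for run in design]
    plus = [y for s, y in zip(contrast, response) if s > 0]
    minus = [y for s, y in zip(contrast, response) if s < 0]
    return sum(plus) / len(plus) - sum(minus) / len(minus)

def placeholder_rmse(a, b, c, d):
    # Hypothetical coded model, NOT the model fitted in the study.
    return 0.40 - 0.15 * a - 0.02 * b + 0.03 * c - 0.08 * d + 0.05 * a * d

response = [placeholder_rmse(*run) for run in design]

for term in ("A", "B", "C", "D", "AD", "AB"):
    e = effect(design, response, term)
    print(f"{term:>2}: effect = {e:+.3f}, coefficient = {e / 2:+.3f}")
```

In this sketch the recovered coefficients equal the placeholder coefficients and the AB interaction, which was not built into the placeholder model, comes out as zero. With real, noisy measurements the insignificant terms would be small but non-zero, which is why the F-test and the normal probability graph are needed to separate them from the significant ones.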
The coefficient of determination (R^2) shows the extent to which the variability of the response variable is explained by the terms in the model. Because the adjusted R^2 takes into consideration the number of predictors in the model, it is generally used to compare models with different numbers of predictors and thus indicates how well a model fits. According to the results of the analysis, approximately 74% of the variability in the RMSE is explained by the factors included in the experimental design.
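For reference, the usual definition of the adjusted R^2 can be written as a one-line function. The sketch assumes the standard formula 1 − (1 − R^2)(n − 1)/(n − p − 1), with n observations and p model terms excluding the constant; the R^2 value passed in the example is a hypothetical input and not a result of the study.

```python
def adjusted_r2(r2, n_obs, n_terms):
    """Adjusted R^2 for a model with `n_terms` predictors (constant excluded)."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_terms - 1)

# 32 runs and 5 retained terms (A, B, C, D and AxD); 0.80 is a made-up R^2.
print(adjusted_r2(0.80, n_obs=32, n_terms=5))
```

The adjustment always lowers the value relative to R^2 when at least one term is present, and the penalty grows as more terms are added for the same number of runs.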
Figure 2 presents the main effect graphs of the factors. If the slope of a line is close to zero, the magnitude of the corresponding main effect is small. In the current study, factors A and D have larger main effects than the others. Each level of the factors affects the RMSE differently. The high (+1) level of factors A, B and D and the low (-1) level of factor C result in a lower RMSE.
Interaction refers to the situation in which the effect of a factor on the performance criterion depends on the level of another factor. The AxD (Scale x CTM) interaction, which was found to be significant, was analyzed in the multi-vari chart in Figure 3. It is clearly observed that the effect of scale (A) on the RMSE was significantly different when CTM (D) was at its low level. The best combination is the high levels of factor A and factor D.
A residual is the difference between an observation and its predicted value according to the statistical model being studied. Experimental design is based on the assumption that residuals are distributed normally and independently. The residual graphs presented in Figure 4 were used to check the validity of this assumption. Since the residuals lie approximately along a straight line and no pattern, such as sequences of positive and negative residuals, is observed, it was concluded that the residuals are normally and independently distributed.

CONCLUSIONS
The main aim of this study was to investigate the main effects and interaction effects of selected operational factors on georeferencing using full factorial design experiments. Experimental design makes it possible to avoid traditional one-factor-at-a-time experiments. For this purpose, a 2^4 full factorial design was employed to evaluate the main and interaction effects of map scale, resolution, NCP and CTM. The RMSE was selected as the response variable of the experimental design. Across the 32 experiments, the minimum and maximum values of the RMSE were 0.063 m and 0.7910 m. The results were statistically analyzed using Student's t-test, analysis of variance, and the F-test to determine whether the selected factors have a main or interaction effect on the response variable in two-dimensional georeferencing.
A few conclusions can be drawn and summarized as follows:
• All factors have a main effect on the response variable at the 5% significance level.
• Among the two-way interactions, only the Scale x CTM interaction was significant at the 5% significance level.
• The three-way and four-way interactions have no effect on the response variable at the 5% significance level.
In this study, the high levels of scale, resolution and CTM, together with the low level of NCP, minimized the RMSE value in georeferencing. The factorial design results showed that the optimum solution includes a 1/1000 scale, 400 DPI resolution, 4 evenly distributed control points, and the affine transformation method.
GIS users have to transform analogue maps into digital form, so analogue maps must be scanned and georeferenced. The georeferencing of a scanned map is affected by many factors, as mentioned above. The effects of the factor levels were determined using a full factorial experimental design in order to reach a better RMSE value in georeferencing. Finally, it can be concluded that experimental design studies can be used for geomatics engineering applications. Further investigations should focus on other georeferencing applications using RS and sensor data.

Table 2 - The levels of experimental factors.

Table 4 - Analysis of variance for RMSE.

Table 5 - Estimated effects and coefficients for RMSE.