Quality of automatic geocoding tools: a study using addresses from hospital record files in Temuco, Chile

Automatic geocoding methods have become popular in recent years, facilitating the study of the association between health outcomes and place of residence. However, relatively few studies have evaluated geocoding quality, and most of them were performed in the US and Europe. This article aims to compare the quality of three automatic online geocoding tools against a reference method. A subsample of 300 handwritten addresses from hospital records was geocoded using Bing, Google Earth, and Google Maps. Match rates were higher (> 80%) for Google Maps and Google Earth compared with Bing. However, positional accuracy was better for Bing, with a larger proportion (> 70%) of addresses with positional errors below 20m. Generally, the performance of each method did not vary by socioeconomic status. Overall, the methods showed an acceptable but heterogeneous performance, which may be a warning against the use of automatic methods without assessing quality in other municipalities, particularly in Chile and Latin America.

Geographic Mapping; Residence Characteristics; Spatial Analysis

Correspondence: P. Ruiz-Rudolph, Universidad de Chile. Independencia 939, Independencia, Santiago, Chile. pabloruizr@uchile.cl
1 Universidad de Talca, Talca, Chile.
2 Universidad Andrés Bello, Viña del Mar, Chile.
3 Universidad de la Frontera, Temuco, Chile.
4 Universidad de Chile, Santiago, Chile.
5 Universitat Jaume I, Castellón, España.
6 University of Birmingham, Birmingham, U.K.

This article is published in Open Access under the Creative Commons Attribution license, which allows use, distribution, and reproduction in any medium, without restrictions, as long as the original work is correctly cited.

Quinteros ME et al. Cad. Saúde Pública 2022; 38(1):e00288920

Introduction
Knowing the spatial distribution of certain attributes, health determinants, or conditions of individuals or populations has helped researchers and policy-makers to monitor and understand important relationships between public health and people’s environments 1. In recent decades, Geographic Information Systems (GIS) have been increasingly used in environmental 2,3,4, nutritional 5,6, and social epidemiological studies 7, as well as in public health research and practice 2,8,9,10,11. Thus, the transformation of a written address into spatial information, i.e., geocoding, is essential and has become an important methodology to locate people and services, among others 4,8,11,12.
Address geocoding describes the process of spatially locating an address by finding the coordinates that best fit its physical location on a map 3,9,10,11,13. Geocoders are the service providers that receive the query address, process the geocoding task, and output the resulting coordinates. Recently, several online applications that include address geocoding have become widely available, such as Bing Maps (https://www.bing.com/maps/), Google Maps (https://www.google.com/maps/), and OpenStreetMap (https://www.openstreetmap.org/) 11,14. In general, a set of addresses is queried automatically and the results are retrieved, including metadata indicators of quality along with the coordinates 7,10,11,12,13. Geocoders and other online tools may vary in both match rate, i.e., the proportion of addresses found in a given study, and accuracy, i.e., how close to the real location the queried address is placed. These quality estimates are essential for public health research, since differences in match rates across locations and/or spatial displacements of the addresses may bias the study results 4,9,12,13,15. Geocoders use different databases and algorithms; therefore, the quality of geocoding is expected to differ among them.
Many recent studies have attempted to assess the quality of geocoder services in different settings 2,9,10,11,13. To date, most studies of geocoding quality have been conducted in North America and Europe, where differences in quality were identified. Only recently has a report of this nature emerged from Brazil 14, and no other studies are known for Chile or other Latin American countries, although the substrate for geocoding in the region could differ greatly. There is considerable interest in studying spatial health determinants in the region 16,17, and relying on quality estimates or recommendations from international studies may introduce biases, as well as under- or overestimation of health effects in the local population, which ultimately might affect the success of local public health policies 17. For these reasons, the quality of geocoding in Latin American municipalities needs attention. We developed this study as part of a larger research project on the association between air pollution and pregnancy outcomes in a cohort of women in Temuco, Chile, in which residential addresses of pregnant women, obtained from handwritten hospital records, were used to spatially estimate air pollution exposures. This article aims to determine the quality of three automatic online geocoding tools by comparing them with a reference method in a random subsample of the addresses.


Study site
Temuco and Padre Las Casas are neighboring cities separated by the Cautín river (Figure 1). Together they form a conurbation known as Temuco, located at 38°44' S and 72°35' W in the Araucanía Region in southern Chile. Temuco was founded at the end of the 19th century and is the most populated city in the region, with a surface area of 464km² and a population of 290,000 inhabitants 18. Padre Las Casas was founded in 1995 and has a surface area of 400km² and 80,000 inhabitants 18. Most of the population of Temuco (93%) lives in urban areas, whereas in Padre Las Casas a larger proportion (40%) resides in rural zones, which also have a larger share of indigenous people 19. The main economic activities of the region are agriculture and services. The region is the poorest in Chile, with 17% of the population living below the poverty line 20.
Figure 1 note: the white line represents the limit between the municipalities of Temuco and Padre Las Casas.

Study design and data collection
Addresses for testing the geocoders were drawn from a retrospective pregnancy cohort study including 15,500 childbirths at the Dr. Hernan Henriquez Aravena Hospital in Temuco, the reference health center for the municipality, from 2009 to 2015. Maternal sociodemographic characteristics, obstetric variables, and newborn variables were collected from hospital records. The main study was approved by the Araucanía Sur Local Ethics Committee with nationwide accreditation (Servicio de Salud Araucanía Sur, Resolución Exenta n. 1,179, March 6, 2014). The study attempted to link air pollution estimates from a spatiotemporal model with maternal information. To achieve this, handwritten addresses from hospital records were automatically converted into spatial points.
To evaluate the quality of three different automatic geocoders, the geocoding results from a subsample of addresses were compared with a reference method. A total of 300 handwritten addresses were randomly selected from the cohort database, stratified by municipality to ensure that each municipality was adequately represented in the subsample (200 selected in Temuco and 100 in Padre Las Casas). The number of addresses evaluated was limited to 300 because it was a feasible number to process using the reference method, allowed adequate comparison, and was similar to the numbers used in previous studies 2,7,21. Addresses were limited to urban areas within the municipalities using the same inclusion criteria as the larger study. To ensure strict confidentiality, a geocoding team was established inside the hospital and an identification number was assigned to each address. Thus, a reduced database was generated for geocoding, including only the identification number and address, with no other personal data available.
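The stratified draw described above can be illustrated with a minimal sketch. The record layout (an identification number paired with a municipality), the toy cohort, and the fixed seed are assumptions made for this example and do not reproduce the study's actual procedure or data.

```python
import random

def stratified_sample(records, quotas, seed=2018):
    """Draw a fixed number of records per stratum, without replacement.

    records: list of (address_id, municipality) tuples (hypothetical layout).
    quotas:  dict mapping municipality -> number of addresses to draw.
    """
    rng = random.Random(seed)  # fixed seed (an assumption) so the draw is reproducible
    sample = []
    for municipality, n in quotas.items():
        stratum = [r for r in records if r[1] == municipality]
        if len(stratum) < n:
            raise ValueError(f"only {len(stratum)} records in {municipality}")
        sample.extend(rng.sample(stratum, n))
    return sample

# Toy cohort standing in for the real database (identification numbers only).
cohort = [(i, "Temuco") for i in range(1000)] + \
         [(i, "Padre Las Casas") for i in range(1000, 1400)]

subsample = stratified_sample(cohort, {"Temuco": 200, "Padre Las Casas": 100})
print(len(subsample))
```

Fixing the quota per municipality, rather than sampling proportionally, is what guarantees that the smaller municipality is adequately represented.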

Reference method
The reference method consisted of manually geocoding all addresses in the subsample. This was conducted by a trained technician who did not belong to the hospital or the research team and was blinded to the address source. The process was conducted in two steps. In the first step, addresses were located using Google Street View (https://streetview.gosur.com/), ensuring that the actual street name and number were observed on the screen. Then, the point was placed in the middle of the sidewalk in front of the household. If the address was not found with Google Street View, the technician personally explored the area until the address was identified and then used a Global Positioning System (GPS) receiver (Garmin 60CSx, Garmin Ltd.; http://garmin.com/) to obtain the address coordinates. Due to the high accuracy of the GPS (4.2 and 5.3m for the Easting and Northing coordinates, respectively) 22, no differential correction was employed in this study. Both systems located the points using the WGS84 coordinate reference system. All referencing was completed in September 2018. As we used two different techniques to build the reference method, we explored the differences between them by locating in Street View the addresses whose positions had already been obtained by GPS. Figure S1 and Table S1 (Supplementary material: http://cadernos.ensp.fiocruz.br/static//arquivo/suppl-e00288920_6701.pdf) show a small positional error between both techniques, with 90% of the points within an error of 20m in both municipalities.

Automated geocoding services
The three geocoding methods used were Bing, Google Earth, and Google Maps. Both Bing and Google Earth were queried using code written in R (http://www.r-project.org), while Google Maps geocoding was implemented using GIS software (https://www.qgis.org/). The output included metadata and quality indicators in addition to the coordinates. A query may return more than one solution, and some solutions may be erroneous (i.e., in other cities or even in other countries). To select the result most likely to be the actual address and to filter out inadequate ones, quality criteria were established for each geocoder using the returned indicators.
(a) Bing. Addresses were automatically supplied to Bing Maps using code written in R. A typical query was "street name + street numbering, city, Chile". Six criteria were established based on the metadata: (i) confidence must be "high"; (ii) entity type must be "address" or "roadblock"; (iii) accuracy must be "rooftop" or "interpolation"; (iv) match code must be "good"; (v) the city must match the one in the record ("Temuco" or "Padre Las Casas"); and (vi) a street number must be found. An address was considered found and selected when all six criteria were met. Usually, only one result matched the six criteria.
(b) Google Earth. The process was similar to Bing except that the platform used was Google Earth and the criteria were adjusted as follows. Five criteria were used: (i) one component must be a "route"; (ii) another component must be "street number"; (iii) the found city must match the one in the record; (iv) type of point must be "rooftop" or "range-interpolated"; and (v) result must match the city. As with Bing, usually, only one result satisfied the five criteria.
(c) Google Maps. Addresses were loaded into Google Maps in batch mode using the MMQGIS plugin of QGIS. The plugin takes an attribute table in CSV format with the addresses (street number, street, city, and country) and returns a geocoded point layer. Three criteria were used to accept a geocoded result: (i) accuracy must be "rooftop" or "interpolated range"; (ii) address type must be "street" and "house number"; and (iii) district must be "Temuco" or "Padre Las Casas". Google Maps provided only one result per query.
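As an illustration of how such criteria can be applied, the following minimal sketch filters candidate results with the six conditions listed for Bing. The dictionary keys (`confidence`, `entity_type`, `accuracy`, `match_code`, `city`, `street_number`) are hypothetical names chosen for this example, not the field names of any real geocoding response, and the candidate records are invented.

```python
def _norm(value):
    """Lower-case and trim a metadata value so comparisons ignore case."""
    return str(value or "").strip().lower()

def passes_bing_criteria(result, expected_city):
    """Return True when a candidate meets all six criteria listed for Bing.

    `result` is a dict with hypothetical keys; a real response would first
    have to be mapped onto these names.
    """
    return (
        _norm(result.get("confidence")) == "high"
        and _norm(result.get("entity_type")) in {"address", "roadblock"}
        and _norm(result.get("accuracy")) in {"rooftop", "interpolation"}
        and _norm(result.get("match_code")) == "good"
        and _norm(result.get("city")) == _norm(expected_city)
        and bool(_norm(result.get("street_number")))
    )

def select_result(candidates, expected_city):
    """Return the first candidate passing all criteria, or None (no match)."""
    for candidate in candidates:
        if passes_bing_criteria(candidate, expected_city):
            return candidate
    return None

# Invented candidates: the first is placed in the wrong city and is rejected.
candidates = [
    {"confidence": "High", "entity_type": "Address", "accuracy": "Rooftop",
     "match_code": "Good", "city": "Santiago", "street_number": "123"},
    {"confidence": "High", "entity_type": "Address", "accuracy": "Interpolation",
     "match_code": "Good", "city": "Temuco", "street_number": "123"},
]
chosen = select_result(candidates, "Temuco")
print(chosen["accuracy"] if chosen else "not matched")
```

An address for which `select_result` returns `None` would count as unmatched when computing the match rate. The criteria for the other two geocoders could be expressed the same way with their own sets of accepted values.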
All automatic geocoding presented here was initially performed in September 2018, to be comparable to the reference method. The searches were repeated at a later date, yielding results similar to those obtained in 2018.

Data analysis
First, the match rate of the reference method was calculated by dividing the number of geocoded addresses by the total number of submitted addresses 23,24,25. Then, match rates of the automatic methods were estimated using only the addresses previously found by the reference method. To compare the accuracy of the results, the positional error was calculated as the Euclidean distance, in meters, between the result of each automated tool and the reference method. To do this, all locations were first projected to the UTM zone 18H south coordinate system. The positional error was characterized and compared using descriptive statistics (mean, median, standard deviation, and percentiles) and by plotting the cumulative frequency distribution of positional errors.
The outcome was also analyzed by socioeconomic status. ADIMARK 26 is an instrument commonly used in Chile to evaluate socioeconomic status. It divides the population into five groups (ABC1, C2, C3, D, and E) according to income and purchasing power, with ABC1 being the highest-income group and E the lowest. The variable was calculated for each block in the cities based on household-level data from the 2002 Census of the Chilean National Institute of Statistics, which included the education level of the head of the household and possession of assets. To facilitate the analyses, blocks were grouped into high (ABC1), medium (C2+C3), and low socioeconomic status (D+E), and each address was assigned the status of its block.
Figure 1a shows the spatial distribution of the addresses that could be located by the reference method. Notably, these addresses were spread across both cities, although sectors with higher socioeconomic status were less represented (Figure 1b). This occurred because the hospital attends approximately 80% of the cities' childbirths, mostly among mothers of low and medium socioeconomic status.
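The match rate and positional error defined in the data analysis can be sketched as follows. This is a minimal illustration that assumes the coordinates have already been projected to UTM (easting, northing, in meters); the projection step itself is not shown, and the coordinates and error values are invented for the example.

```python
import math
import statistics

def match_rate(n_matched, n_submitted):
    """Proportion of submitted addresses that were geocoded."""
    return n_matched / n_submitted

def positional_error(point, reference):
    """Euclidean distance in meters between two (easting, northing) pairs.

    Both points must already be in the same projected coordinate system.
    """
    return math.hypot(point[0] - reference[0], point[1] - reference[1])

def summarize(errors):
    """Descriptive statistics of a list of positional errors (meters)."""
    cuts = statistics.quantiles(errors, n=100, method="inclusive")  # 99 cut points
    return {
        "mean": statistics.mean(errors),
        "median": statistics.median(errors),
        "sd": statistics.stdev(errors),
        "p90": cuts[89],  # 90th percentile
        "p98": cuts[97],  # 98th percentile
    }

# Invented projected coordinates: geocoder result versus reference point.
error = positional_error((700003.0, 5710004.0), (700000.0, 5710000.0))
print(error)  # 5.0 (a 3-4-5 triangle)

# Invented error values, only to show the shape of the summary.
summary = summarize([3.0, 5.0, 8.0, 12.0, 15.0, 19.0, 25.0, 60.0, 150.0, 1200.0])
print(summary["median"])
```

Computing the distance on projected coordinates, as the text describes, avoids treating degrees of latitude and longitude as if they were equal lengths.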
Performance of the reference method
Of the 300 addresses, 90% were successfully found by the reference method (Table 1). Geocoding using the reference method required approximately 24 hours of the technician's time, compared to automatic methods that were executed in a few minutes. Most addresses were found in the initial step using Google Street View (63%), with rates slightly higher in Temuco than in Padre Las Casas. For the addresses not found, the main reason reported by the technician was that the address number did not match the actual street numbering. Furthermore, four addresses recorded as Padre Las Casas in the clinical record were found in Temuco. This emphasizes the initial difficulties faced when working with transcribed handwritten addresses.

Performance of the automated methods
• Match rate
Table 2 shows the match rates of the three automated methods compared to the reference one. We observed large significant differences in the match rates between methods and between cities (Table 1). We also observed better overall performance for Google Maps, with rates above 90% for both cities, followed by Google Earth, with rates above 80% for both cities, with statistical differences between methods (Table 3). Finally, Bing had rates above 80% only for Temuco and no matches in Padre Las Casas. Considering socioeconomic status, we found large and significant differences in the match rates between the methods for addresses in the low and medium socioeconomic status groups (Tables 4 and 5), whereas we found no differences in match rates when comparing socioeconomic status within each method.

• Positional errors
Table 6 shows the distribution of positional errors for the three methods. We found significant differences among methods (Table 7). Overall, Bing showed the lowest positional error, with a higher proportion of observations in the smallest range (88% with errors < 20m) and a lower proportion in the largest range (1% with errors ≥ 100m) than the other methods, for which about 70% of errors were < 20m and 6%-10% were ≥ 100m. We observed significant differences between Bing and Google Earth and between Bing and Google Maps (Table 7), but not between Google Earth and Google Maps. This was more evident when inspecting the plot of the cumulative distribution of positional errors (Figure 2), in which it was clear that Bing performed best, followed by Google Earth and Google Maps. Moreover, Table 8 shows some very large errors (> 1,000m) observed for some cases (p98) in Google Earth and Google Maps. When analyzing each city separately, the trends in Temuco were similar to the overall results, whereas Bing presented no matches in Padre Las Casas and the performance of the other two methods was slightly worse than in Temuco.
Finally, we found significant differences when considering socioeconomic status (Tables 9 and 10). Bing showed lower positional error in low socioeconomic status, with a higher proportion (92%) of the observations with positional errors in the smallest range, i.e., < 20m, compared with the other socioeconomic status groups. We observed no significant differences between methods by socioeconomic status (Table 10).
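The proportions of positional errors below 20m and at or above 100m, and the cumulative distribution used to compare methods, can be computed as in the following minimal sketch. The thresholds follow the text, while the function names and the error values are assumptions invented for illustration.

```python
def error_band_shares(errors, small=20.0, large=100.0):
    """Share of positional errors below `small`, in between, and at or above `large`."""
    n = len(errors)
    below = sum(1 for e in errors if e < small)
    above = sum(1 for e in errors if e >= large)
    return {
        "small": below / n,
        "middle": (n - below - above) / n,
        "large": above / n,
    }

def ecdf(errors):
    """Empirical cumulative distribution as (error, cumulative proportion) pairs."""
    ordered = sorted(errors)
    n = len(ordered)
    return [(e, (i + 1) / n) for i, e in enumerate(ordered)]

# Invented positional errors in meters for one hypothetical geocoder.
errors = [5.0, 12.0, 19.9, 20.0, 60.0, 150.0, 1200.0, 3.0, 8.0, 15.0]
shares = error_band_shares(errors)
curve = ecdf(errors)
print(shares)
print(curve[-1])
```

Plotting the pairs returned by `ecdf` for each method on the same axes gives a comparison of the kind shown in a cumulative distribution figure, where a curve that rises faster indicates smaller errors.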

Discussion
Our results reveal that the quality of geocoding methods varies greatly in terms of match rate and accuracy. Concerning match rates, Google Earth and Google Maps performed well compared to Bing, which completely failed in one of the studied areas (Padre Las Casas). However, Bing presented a much lower positional error once the address was found, with even better performance in lower socioeconomic status. Researchers have been studying the quality of geocoding methods for some years 15,23,27,28,29, but only recently have automated methods been evaluated. Considering five of the most recent studies 2,7,9,21,24, the match rate observed in our study was on the higher end (above 90%) compared to what other authors found, at least for some of the methods. Similarly, all methods in our study had positional errors mostly in the smaller range (i.e., less than 20m), particularly Bing, with only some excursions above 100m. Google Maps and Google Earth, on the other hand, presented relatively larger positional errors more frequently, consistent with other international studies. Only a fraction (1%-5%) of the addresses presented large errors, similar to previous reports. Surprisingly, the performance of a given method did not vary with socioeconomic status. The results from this study are far from being generalizable to other cities in Chile or Latin America. Rather, the study warns against the massive use of automated methods without knowing the quality of the outputs, which may differ greatly among cities or neighborhoods, potentially leading to biases 15. Locally, it seems advisable to automatically geocode the addresses using all three methods following a tiered protocol based on the quality criteria previously established. For Temuco, the protocol suggests geocoding the addresses automatically with Bing first, then with Google Earth, and finally with Google Maps. For Padre Las Casas, we propose starting with Google Earth, followed by Google Maps. We estimate that this procedure would ensure an overall 93% match rate, with about 80% of positional errors below 20m and fewer than 5% above 100m, thus minimizing biases.
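The tiered protocol proposed above can be sketched in a few lines. This is an illustration under stated assumptions, not the study's implementation: each geocoder is represented by a hypothetical callable that returns coordinates only when its own quality criteria are met and `None` otherwise, and the addresses and coordinates in the example are invented.

```python
# Order of geocoders per city, as proposed in the text.
TIERS = {
    "Temuco": ["bing", "google_earth", "google_maps"],
    "Padre Las Casas": ["google_earth", "google_maps"],
}

def geocode_tiered(address, city, geocoders, tiers=TIERS):
    """Try each geocoder in the city's order and stop at the first accepted result.

    geocoders: dict mapping a tier name to a callable (address, city) that
               returns (lat, lon) if its quality criteria pass, else None.
    Returns (tier_name, coords), or (None, None) when no tier succeeds.
    """
    for name in tiers[city]:
        coords = geocoders[name](address, city)
        if coords is not None:
            return name, coords
    return None, None

# Stub geocoders for demonstration; they record which services were called.
calls = []

def make_stub(name, known):
    def stub(address, city):
        calls.append(name)
        return known.get((address, city))
    return stub

geocoders = {
    "bing": make_stub("bing", {}),
    "google_earth": make_stub(
        "google_earth", {("Calle Ejemplo 123", "Padre Las Casas"): (-38.77, -72.60)}),
    "google_maps": make_stub(
        "google_maps", {("Calle Ejemplo 456", "Temuco"): (-38.74, -72.59)}),
}

source, coords = geocode_tiered("Calle Ejemplo 456", "Temuco", geocoders)
print(source, coords)
```

Recording which tier produced each coordinate, as the returned name does here, would allow later analyses to account for the different positional errors of each method.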
Table 9. Distribution of positional error for the different methods according to socioeconomic status. Municipalities of Temuco
Regarding limitations, we recognize that this subsample was drawn from a population of pregnant women seeking services at a public hospital; therefore, it is likely that women of higher socioeconomic status are less represented. We can speculate that these women are likely to live in well-established neighborhoods of higher socioeconomic status and that, given the commercial interest of the geocoding engines, the quality of the geocoding of their addresses might not differ much from the one reported here. Another limitation is that the reference method was derived from two complementary methods: GPS, which could be called a "gold standard", and Google Street View. These methods could have different inherent errors 7, but we emphasize that the supervised process should keep errors relatively small and thus should allow estimating whether the automated geocoding methods locate address coordinates within a smaller (below 20m) or larger (more than 100m) range of errors.
As strengths of this study, we can mention that it is the first of its kind in Chile and the second in South America. It is also linked to a real-world health study that uses handwritten addresses, a challenge that many research teams face. Furthermore, it was performed in Temuco, a mid-sized regional capital, and not in Santiago, the capital city, where approximately 40% of Chile's population resides. A study conducted in Santiago would likely have yielded results that are not comparable to those of small or medium-sized cities in Chile.
Overall, the methods showed an acceptable but heterogeneous performance, consistent with other international studies. If the methods are combined in a tiered protocol, the geocoding results may present adequate quality to perform health studies in Temuco and Padre Las Casas. The heterogeneity of the performance warns against using the automatic methods without assessing quality in other cities in Chile and Latin America.