Abstract:
Toponyms play a crucial role in the identification and singularisation of geographic features. While traditional sources include gazetteers, official records, and historical maps, collaborative mapping platforms such as OpenStreetMap (OSM) offer a dynamic alternative by capturing local knowledge. However, validating the existence of OSM toponyms through external and up-to-date sources remains a challenge. This study proposes an automated framework for validating OSM toponyms using street-level imagery (SLI). The methodology integrates advanced computer vision and artificial intelligence techniques, combining the YOLOv11 model for text region prediction with the Keras-OCR framework for text recognition. Textual evidence extracted from SLI platforms, Mapillary and Google Street View (GSV), was analysed and compared to OSM toponyms using the Index of Collaborative Toponym Validation by Accumulated Evidence (ICTVAE), a metric designed to balance similarity and coverage in validation scores. The results reveal that SLI is a viable source for confirming the existence of OSM toponyms, with variations depending on image quality, visibility, and contextual factors. The proposed ICTVAE index effectively consolidates accumulated evidence from multiple detections, mitigating issues related to incomplete or partial recognitions. This approach provides a practical and scalable solution for validating collaborative toponyms, especially in regions where authoritative datasets are limited or unavailable.
Keywords:
Collaborative mapping; OpenStreetMap; Toponyms; Artificial Intelligence; Mapillary; Google Street View
1. Introduction
Place names, or toponyms, are essential for labelling, signifying, and identifying geographical elements, enabling the recognition of specific places as unique entities compared to others. Traditional sources for obtaining toponyms include oral tradition, historical records, official documents, old maps, atlases, dictionaries of geographical names (gazetteers), road signs and direct field collection (Ganieva, 2021). In recent years, collaborative mapping platforms such as OpenStreetMap (OSM) have emerged as an important source with great semantic potential for obtaining geographic information, as the mapping carried out by ordinary users is directly related to the local knowledge of the contributors, providing a dynamic and participatory approach to mapping (Barron, Neis and Zipf, 2014; Goodchild, 2007).
Collaborative toponyms, such as those found on OSM, have become an alternative source of this type of geographic data, utilizing the local knowledge of contributors to enhance the accuracy, detail, and fidelity of topographic mapping (Ardanuy and Sporleder, 2017; Machado et al., 2022; Perdana and Ostermann, 2018). This approach is particularly valuable in areas where traditional methods, such as field surveys and authoritative databases, are constrained by limited resources, incomplete coverage, or slow update cycles (Machado et al., 2022). Moreover, the traditional process of field surveys to collect toponyms, as well as the entire topographic maps production, is costly, time-consuming, and often requires specialised labour and extensive logistical resources (Kent and Hopfstock, 2018; Olteanu-Raimond et al., 2017). These challenges are particularly pronounced in developing countries, especially those with vast territories and diverse cultural contexts, such as Brazil, where conventional field survey methods remain largely outdated and insufficient to meet the dynamic demands of toponyms in geospatial databases (Martins Junior et al., 2016).
A limited number of studies have examined the integration of OSM toponyms with authoritative datasets to assess their quality and potential applications. The research of Ursini and Samo (2023) highlights the linguistic and cross-cultural richness embedded in OSM toponyms. It also points out that their effectiveness varies significantly depending on the geographic context and the availability of external validation sources. Similarly, Antoniou, Touya and Raimond (2016) analysed the volatility of OSM toponyms and their potential integration into gazetteers, underscoring the importance of consistent tagging practices and user contributions to ensuring data stability. Perdana and Ostermann (2018) highlight the potential of leveraging community knowledge to enhance the collection and validation of toponyms. Additionally, citizen science approaches further complement the efforts by engaging local contributors in data collection and validation, bridging gaps between collaborative and authoritative datasets, as demonstrated in studies comparing OpenStreetMap and GeoNames in regions like Kenya (Daniel and Mátyás, 2022).
Despite this potential, the reliability of OSM toponyms often depends on their alignment with external data sources that can confirm the existence and accuracy of such names (Yamashita et al., 2022). However, external data sources and authoritative databases are not consistently available, particularly for remote or underrepresented regions. In the case of Brazil, a significant portion of the territory still requires updates to authoritative cartographic databases, while many areas remain entirely unmapped at larger scales (Silva and Camboim, 2021). Limitations in the availability and update frequency of official mapping are exacerbated by institutional, technical and financial constraints that prevent comprehensive coverage of geographic information across the country. In addition, disparities in the level of detail between urban and rural areas create challenges in maintaining homogeneous toponymic datasets, as many remote locations lack formal documentation (Camboim, Bravo and Sluter, 2015).
In this context, new approaches, such as the use of street-level imagery (SLI), offer promising alternatives for validating and complementing OSM toponyms. SLI, sourced from platforms like Google Street View (GSV) and Mapillary, provides high-resolution visual data that can support the extraction and verification of place names in urban and rural areas. Advances in computing resources, including the ability to store and process large volumes of data, combined with innovations in computer vision and artificial intelligence techniques, have made SLI a critical tool for geospatial data collection and decision-making processes (Biljecki and Ito, 2021). GSV is the most well-known SLI platform due to its extensive global coverage and user-friendly interfaces, including integration with Google Maps and APIs for automated access to its imagery (Anguelov et al., 2010; Liang, Zhao and Biljecki, 2023). However, the commercial nature of GSV introduces licensing restrictions, prompting the rise of crowdsourced platforms such as Mapillary and KartaView, which operate under open licenses like CC BY-SA 4.0, encouraging free use and broader application development (Hou and Biljecki, 2022; Ma et al., 2020).
Among these platforms, Mapillary stands out as a versatile and accessible option for geospatial applications, including the study of toponyms. It relies on images contributed by users, captured via smartphones, tablets, or 360° cameras, which are then processed to create datasets for object detection and annotation (Biljecki and Ito, 2021; Mapillary, 2025). Mapillary’s intuitive interface, API availability, and image processing capabilities enable the semantic segmentation of imagery, allowing for targeted searches for features such as road signs or license plates (Neuhold et al., 2017). Such capabilities make it a valuable resource for extracting toponymic information not easily discernible in aerial or satellite imagery. However, the collaborative nature of Mapillary introduces challenges, including variations in image quality, non-standardized acquisition methods, and uneven geographic coverage, particularly in remote or sparsely populated regions (Leon and Quinn, 2019). Despite these challenges, the platform offers a substantial volume of free, high-quality imagery, making it a significant advancement for geospatial data collection and a potential source for toponym validation.
Therefore, this research seeks to address a literature gap by investigating whether external data sources, such as SLI, can confirm the existence of collaborative toponyms provided by OSM. Based on the hypothesis that SLI can serve as a reliable external data source for validating OSM toponyms, particularly in scenarios where authoritative datasets are unavailable or limited, this study proposes a novel framework that combines AI-based techniques for retrieving place names from SLI and the subsequent OSM toponyms validation.
This work significantly extends our preliminary study (Nunes and Camboim, 2024) by incorporating additional analyses, expanded datasets, and a comprehensive validation of the proposed methodology. The current approach integrates the YOLOv11 model for text region detection in SLI, Keras-OCR for text extraction, and both Levenshtein similarity metrics and the proposed aggregate index, the ICTVAE, to evaluate the textual correspondence between SLI-extracted toponyms and OSM data.
By adopting the proposed approach, the study also aims to contribute to the broader field by developing and demonstrating a hybrid methodology that can verify the existence of collaborative mapping data. Furthermore, the proposed method evaluates the viability of SLI as a practical and alternative validation framework for toponymic data across diverse geographic contexts.
2. Methods
The proposed research framework consists of four main steps (Figure 1):
- Data Retrieval: Points of Interest (POIs) containing toponyms were extracted from OSM based on predefined tags (Buildings and Amenities). This process involved querying the OSM API to retrieve the objects and their associated collaborative place names. The SLI was collected from two sources: the Mapillary API and the Google Street View Static API.
- Training, Validation, and Testing of the YOLOv11 Model: To detect text in SLI, a YOLOv11 model was trained, validated and tested using the Street View Text (SVT) dataset, which consists of 350 annotated images from Google Street View, often containing business signage with high variability (Kai Wang, 2012).
- SLI Text Prediction: The best-trained YOLOv11 model was applied to detect text in the street-level imagery. Subsequently, the Keras-OCR framework was applied to extract the corresponding textual information.
- Toponyms Correspondence Analysis: The extracted texts from SLI were compared to OSM toponyms using Levenshtein similarity at the subterm level. To consolidate multiple detections, the Index of Collaborative Toponym Validation by Accumulated Evidence (ICTVAE) was developed, combining weighted similarity scores and subterm coverage to provide a comprehensive validation measure for each case study.
2.1 Data retrieval
As a preliminary stage of the proposed methodology, the study area was defined, considering the following criteria: the presence of POIs mapped on the OSM platform, where contributors informed toponyms; and the availability of SLI from both platforms analysed (Mapillary and Google Street View). Therefore, a total of 12 case studies were selected from the Água Verde neighbourhood, located in the city of Curitiba, Paraná, Brazil’s southern region (Figure 2).
The analyses were conducted for the OSM tags “Building” and “Amenity”, which were selected due to their importance in identifying urban points of interest and the potential for filled-in toponyms. These tags provide a detailed overview of the urban infrastructures and amenities available in the study area and include the following classes of features: School, Restaurant, Pharmacy, Pub, Supermarket, Banks, Place of worship, Museum and Hospital.
Following the delineation of the study area and the selection of tags, a request was submitted to the OSM API (OSM API, 2024) to obtain the toponyms associated with the POIs. Python scripts were implemented to interact with the API and extract the desired data. The selection was based on specific key-value combinations, such as ‘building=school’, ‘building=hospital’, ‘building=public’, ‘amenity=pharmacy’, ‘amenity=bank’, ‘amenity=restaurant’, ‘amenity=pub’, ‘shop=supermarket’, ‘amenity=place_of_worship’ and ‘tourism=museum’. Only features containing the ‘name=*’ tag were retained to ensure the presence of toponyms. This algorithm also filtered and manipulated the data to isolate the toponyms, as shown in Table 1.
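For illustration, such key-value filters can be expressed as an Overpass QL query built from Python. The endpoint, bounding box, and tag list below are illustrative assumptions, not the exact requests used in this study:

```python
# Sketch: build an Overpass QL query for OSM features that carry both a
# predefined key-value tag and a 'name=*' tag (i.e., a contributed toponym).

def build_overpass_query(bbox, tag_filters):
    """bbox = (south, west, north, east); tag_filters = [(key, value), ...]."""
    s, w, n, e = bbox
    parts = []
    for key, value in tag_filters:
        # The ["name"] clause keeps only features where contributors
        # filled in a toponym, mirroring the filtering described above.
        parts.append(f'node["{key}"="{value}"]["name"]({s},{w},{n},{e});')
        parts.append(f'way["{key}"="{value}"]["name"]({s},{w},{n},{e});')
    return "[out:json];(" + "".join(parts) + ");out center tags;"

query = build_overpass_query(
    (-25.46, -49.30, -25.43, -49.27),  # approximate study-area extent (assumed)
    [("amenity", "pharmacy"), ("building", "school")],
)
```

The resulting string would be sent to an Overpass endpoint via an HTTP POST request; the JSON response then contains the geometry and the `name` tag of each matching feature.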
Once the toponyms were obtained, street-level imagery retrieval was carried out using two main APIs: the Mapillary API and the Google Street View Static API. For each POI, Python-based algorithms automated the requests based on its spatial location. A search radius of 50 metres was applied to define a bounding box around each POI, which served as a spatial filter ensuring that only street-level images spatially associated with the selected POIs were retrieved. This radius was defined based on preliminary tests to balance proximity and image availability.
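The 50-metre spatial filter can be approximated with a simple small-area conversion from metres to degrees; the helper below is a sketch, not the study's exact implementation:

```python
import math

def bbox_around(lat, lon, radius_m=50.0):
    """Approximate a square bounding box of `radius_m` metres around a POI.

    Uses the small-area approximation: one degree of latitude spans roughly
    111,320 m, and one degree of longitude shrinks with cos(latitude).
    This is adequate for a 50 m search radius at mid-latitudes.
    Returns (south, west, north, east) in decimal degrees.
    """
    dlat = radius_m / 111_320.0
    dlon = radius_m / (111_320.0 * math.cos(math.radians(lat)))
    return (lat - dlat, lon - dlon, lat + dlat, lon + dlon)

# Hypothetical POI coordinates in the study area
south, west, north, east = bbox_around(-25.447, -49.283)
```

The resulting box is then passed as the spatial filter parameter of each imagery request.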
After the authentication process, a total of 375 images, mostly from 2019, were downloaded from the Mapillary API. Similarly, 86 GSV images were obtained from the Google Street View Static API using equivalent parameters, covering the years 2019, 2022, and 2024 (Table 1). Although GSV provides 360-degree panoramic images, the Street View Static API returns a single perspective frame based on predefined viewing parameters. Therefore, only perspective images were used in this study from both platforms to ensure a consistent basis for comparison. In all cases, images were retrieved alongside relevant metadata, such as the date of image capture and geographical coordinates.
2.2 Training, Validation, and Testing of the YOLOv11 Model
The next step involved the training, validation, and testing of the YOLOv11 (You Only Look Once, version 11) model, the latest iteration of the YOLO family. This deep learning-based model extends beyond standard object detection to encompass multiple tasks, including image segmentation, classification, pose estimation, and oriented object detection. YOLOv11 enhances feature extraction by refining its core components and optimising training strategies, resulting in improved accuracy and processing efficiency for real-time applications. The model retains a fully convolutional neural network (CNN) architecture, optimised for detecting multiple objects within a single image frame, making it particularly suitable for object detection tasks in urban environments (Jocher and Qiu, 2024; Redmon et al., 2016).
Compared to previous versions of YOLO, YOLOv11 introduces multiple architectural improvements aimed at enhancing detection performance, computational efficiency, and generalisation across diverse datasets. It is available in five variants: YOLOv11n, YOLOv11s, YOLOv11m, YOLOv11l, and YOLOv11x, differentiated by model size and computational cost while preserving the same fundamental architecture. YOLOv11n, the smallest and fastest, is optimised for low-latency applications, while YOLOv11x, the largest, offers higher detection accuracy at the cost of increased computational demand (Jocher and Qiu, 2024).
In this study, the YOLOv11x model was used due to its superior ability to detect small and complex objects, although it requires considerable computational resources. The parameter settings included a learning rate of 0.01, a momentum of 0.95, a batch size of 16, an input image size of 640×640 pixels, and 200 training epochs, using the Stochastic Gradient Descent (SGD) optimiser. The training also included data augmentation techniques such as HSV adjustments, scaling, translation, flipping, and mosaic.
To train the model, the public Street View Text (SVT) dataset was selected due to its relevance for detecting textual information in street-level images, particularly for our purpose of detecting place names in urban environments. This dataset consists of 350 annotated images from Google Street View, providing a structured reference for supervised learning (Kai Wang, 2012). Following the download and preprocessing of the dataset, the data were divided into 70% for training, 20% for validation, and 10% for testing, ensuring a balanced and effective model evaluation. The YOLOv11x model was then trained with the above parameters, using these restructured subsets of the original SVT dataset for training and evaluation, as well as for inference on the unseen test images.
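The 70/20/10 partition can be reproduced with a small seeded shuffle; the file names and seed below are illustrative, not the study's actual data:

```python
import random

def split_dataset(items, ratios=(0.7, 0.2, 0.1), seed=42):
    """Shuffle and split image paths into train/validation/test subsets.

    The 70/20/10 proportions follow the split applied to the SVT images;
    a fixed seed keeps the partition reproducible across runs.
    """
    assert abs(sum(ratios) - 1.0) < 1e-9
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = round(len(items) * ratios[0])
    n_val = round(len(items) * ratios[1])
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

train, val, test = split_dataset([f"img_{i:03d}.jpg" for i in range(350)])
# 350 images -> 245 train, 70 validation, 35 test
```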
2.3 Street-level imagery Text Prediction
Once the YOLOv11x model was trained and optimised (i.e., the best weights were obtained), it was applied to predict textual elements in the street-level images from Mapillary and Google Street View. This stage aimed to identify potential place names embedded in urban signage. The model processed each image by generating bounding boxes around regions of interest where text was likely to be present. YOLO-based approaches have been adopted for text detection due to their efficiency in identifying structured patterns in complex visual environments (Chaitra et al., 2022; Liu et al., 2024).
Following text region (bounding box) prediction in the SLI, Keras-OCR, a deep learning-based optical character recognition (OCR) framework, was applied to extract the textual content from the detected bounding boxes. This framework has been recognised for its high accuracy in extracting text from natural scene images, leveraging convolutional recurrent neural networks (CRNN) for robust character recognition (Alrasheed et al., 2021; Shi, Bai and Yao, 2017). The extracted textual data were then structured and organised for the subsequent textual correspondence analysis, ensuring consistency when comparing names extracted from SLI with OpenStreetMap (OSM) toponyms. The integration of YOLOv11x for detection (prediction) and Keras-OCR for recognition (extraction) constitutes a hybrid approach that improves precision by narrowing the scope of OCR processing to relevant regions, thereby reducing false positives.
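To illustrate how detection narrows the scope of OCR, the helper below crops predicted boxes from an image array before recognition; the (x1, y1, x2, y2) pixel box format and the padding value are assumptions for this sketch:

```python
import numpy as np

def crop_detections(image, boxes, pad=4):
    """Crop detector-predicted text regions from an image before OCR.

    `image` is an (H, W, C) array and `boxes` a list of (x1, y1, x2, y2)
    pixel coordinates. A small padding margin is kept around each box so
    characters touching the box edge are not clipped. The cropped patches
    would then be passed to the Keras-OCR recogniser; restricting OCR to
    these regions is what reduces false positives in the hybrid pipeline.
    """
    h, w = image.shape[:2]
    crops = []
    for x1, y1, x2, y2 in boxes:
        x1, y1 = max(0, int(x1) - pad), max(0, int(y1) - pad)
        x2, y2 = min(w, int(x2) + pad), min(h, int(y2) + pad)
        crops.append(image[y1:y2, x1:x2])
    return crops

# Dummy image and a single hypothetical detection
img = np.zeros((480, 640, 3), dtype=np.uint8)
patches = crop_detections(img, [(100, 50, 220, 90)])
```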
2.4 Toponyms Correspondence Analysis
The textual correspondence between the toponyms extracted from SLI and those available in OSM was assessed using a combination of similarity metrics to ensure a robust validation process. Prior to computing textual similarity, minimal pre-processing was applied to standardise both the OSM toponyms and the SLI-extracted texts: converting all text to lowercase, removing accents and special characters, and eliminating numbers and symbols, to ensure consistency during comparison. An additional pre-processing step was applied to the GSV-extracted texts only, to remove copyright notices.
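A minimal sketch of this normalisation, assuming Latin-script input, could look as follows:

```python
import re
import unicodedata

def normalise(text):
    """Pre-process a toponym or OCR string for comparison.

    Mirrors the steps described above: lowercase, strip accents, and drop
    digits, punctuation, and other symbols, keeping only letters and spaces.
    """
    text = text.lower()
    # Decompose accented characters and discard the combining marks
    text = unicodedata.normalize("NFKD", text)
    text = "".join(c for c in text if not unicodedata.combining(c))
    text = re.sub(r"[^a-z\s]", " ", text)   # remove numbers and symbols
    return re.sub(r"\s+", " ", text).strip()

normalise("Bek's Bar")  # -> "bek s bar"
```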
The primary metric used was the Levenshtein distance, a widely used string similarity measure. It quantifies the minimum number of single-character operations (insertions, deletions, or substitutions) required to transform one text sequence into another (Levenshtein, 1966), and has been extensively used in text-matching tasks, particularly in scenarios involving noisy data or spelling variations (Navarro, 2001). The similarity between the extracted texts and the OSM toponyms was expressed as a percentage, as shown in Equation (1):

Similarity(s1, s2) = (1 - dLev(s1, s2) / max(len(s1), len(s2))) × 100        (1)

Where dLev represents the Levenshtein distance between the two text sequences, s1 and s2 correspond to the compared strings, and len denotes the length of each sequence. This baseline measure enables the assessment of how closely the SLI-extracted toponyms align with those recorded in OSM.
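The percentage similarity can be implemented directly; normalising by the longer string's length is assumed here, a common convention for this measure:

```python
def levenshtein(s1, s2):
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    if len(s1) < len(s2):
        s1, s2 = s2, s1
    prev = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, 1):
        curr = [i]
        for j, c2 in enumerate(s2, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (c1 != c2)))   # substitution
        prev = curr
    return prev[-1]

def similarity_pct(s1, s2):
    """Percentage similarity between two strings, as a value in [0, 100]."""
    if not s1 and not s2:
        return 100.0
    return (1 - levenshtein(s1, s2) / max(len(s1), len(s2))) * 100

similarity_pct("panvel", "panvel")    # -> 100.0
similarity_pct("bradesco", "bradesc") # one deletion over 8 chars -> 87.5
```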
While Levenshtein similarity is effective for string-matching, its direct application in toponym validation is limited. Character-level comparisons alone may not capture the semantic relevance of multi-term place names. To address this, a novel validation metric referred to as the Index of Collaborative Toponyms Validation by Accumulated Evidence (ICTVAE), is proposed. The ICTVAE is designed to validate the existence of collaboratively mapped toponyms in OSM, leveraging textual evidence detected in street-level imagery (SLI). The ICTVAE consolidates two complementary components of evidence:
- The quality of the textual correspondence, measured by the Levenshtein similarity between subterms of the OSM toponym and the corresponding text detected in the SLI; and
- The coverage of the toponym, which assesses the proportion of subterms that exhibit any relevant detection within the imagery.
The ICTVAE is formalised as Equation (2):

ICTVAE = ( Σ w_i × S_i / Σ w_i ) × ( D / n )        (2)

Where S_i is the Levenshtein percentage similarity (Equation 1) of subterm i, based on its best correspondence among the detected texts in the SLI; w_i is the weight of subterm i, proportional to its length (i.e., number of characters); n is the total number of subterms that compose the OSM toponym; and D is the number of subterms detected with any degree of similarity (S_i > 0).
This formulation ensures a balanced assessment that considers not only how well the detected texts match the individual components of the OSM toponym (weighted similarity), but also the proportion of those components that were identified (coverage factor). The first term of the ICTVAE equation captures the weighted average similarity, giving greater importance to longer subterms that typically convey more semantic information. The second term represents the coverage ratio, which reflects the completeness of the detection by evaluating the proportion of subterms with any positive evidence of detection. By multiplying these terms, the ICTVAE penalises incomplete detections while rewarding higher-quality matches. This enables a more robust validation strategy, as it avoids overreliance on isolated high-similarity matches and prevents dilution by low-quality or incomplete detections.
The proposed index provides a quantitative score ranging from 0 to 100, where higher values indicate stronger accumulated evidence supporting the existence of a given OSM toponym in the real world, as observed through SLI. Crucially, the ICTVAE is not intended to measure the quality or correctness of the toponym’s spelling or its semantic appropriateness, but rather to validate its existence based on observable evidence captured in the imagery.
For example, an OSM toponym consisting of six subterms (e.g., “Colégio Estadual Barão do Rio Branco”) may exhibit varying degrees of similarity between its components and those detected in the SLI. The ICTVAE will evaluate the similarity of each subterm, weigh it according to its length, and calculate an overall validation score by incorporating both the quality of the matches and the completeness of subterm detection. Thus, even if only a subset of subterms is found, the Index aggregates these similarities and their coverage to produce the score.
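The computation described above can be sketched as follows; whitespace tokenisation of subterms and the best-match search over all detected strings are implementation assumptions of this sketch:

```python
def levenshtein(a, b):
    """Edit distance used to score each subterm match."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]

def ictvae(osm_toponym, detected_texts):
    """Sketch of the ICTVAE score (0-100) for one OSM toponym.

    Each subterm of the toponym is matched against every detected string,
    keeping its best percentage similarity S_i; weights w_i are the subterm
    lengths. The weighted mean similarity is multiplied by the coverage
    ratio D/n (the share of subterms with any positive evidence).
    """
    subterms = osm_toponym.lower().split()
    if not subterms:
        return 0.0
    weights, sims = [], []
    for term in subterms:
        best = 0.0
        for text in detected_texts:
            for token in text.lower().split():
                d = levenshtein(term, token)
                best = max(best, (1 - d / max(len(term), len(token))) * 100)
        weights.append(len(term))
        sims.append(best)
    d_count = sum(1 for s in sims if s > 0)
    weighted = sum(w * s for w, s in zip(weights, sims)) / sum(weights)
    return weighted * (d_count / len(subterms))
```

For instance, a toponym whose subterms are all found verbatim scores 100, while a two-subterm toponym with only one perfect match is penalised both by the weighted mean and by the 1/2 coverage factor.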
This validation strategy strengthens the evidence-based assessment of collaboratively mapped features in OSM, as it leverages multiple independent detections to support the presence of mapped features in user-generated geospatial databases. The approach is consistent with previous research recommending the combination of multiple similarity measures for robust text-matching applications (Cohen, Ravikumar and Fienberg, 2003). The subsequent section presents the findings and critical discussion of the proposed validation framework.
3 Results and discussions
The YOLOv11x model was trained on the SVT dataset using a structured approach that included a validation dataset to monitor performance and prevent overfitting. As shown in Figure 3, the training and validation loss curves followed a consistent downward trend, indicating effective convergence. The primary losses (box loss for localisation error, classification loss, and distribution focal loss) decreased significantly during the early training epochs, stabilising from around epoch 50 and reaching a steady state by approximately epoch 125. This trend suggests that the model learns effectively during the early training phase and refines its performance in later epochs. The use of data augmentation techniques contributed to improved generalisation, ensuring robust performance across different urban signage scenarios.
Figure 3. Training and validation loss curves for (i) box loss, (ii) classification loss, and (iii) distribution focal loss, and performance metrics (iv) precision, (v) recall, and (vi) mean average precision (mAP) over training epochs.
During validation, the model showed a balance between precision (the proportion of correctly identified textual instances among all positive predictions) and recall (the proportion of actual textual instances correctly detected). The trends in precision and recall indicate that while the model is effective in detecting textual instances, there is a trade-off between maintaining high recall and minimising false positives. The validation results closely match the final test performance, reinforcing the model’s reliability.
The final evaluation was conducted on the test dataset of 36 images containing a total of 84 text instances. The model achieved a precision of 76.62%, meaning that most of the predicted text elements were correct. The recall was 70.24%, meaning that most of the actual textual elements in the test dataset were successfully detected, although some were missed. The mAP@50 (mean Average Precision at an Intersection over Union [IoU] threshold of 0.5) was 72.80%, reflecting the overall accuracy of the inferences. However, the mAP@50-95 (which averages over multiple IoU thresholds) was lower, at 45.98%, underscoring the difficulty of maintaining consistent detection across varying levels of overlap. Figure 4 shows examples of the inference results on the SVT test dataset, illustrating the model's ability to detect and localise text in a variety of urban scenes. While the results demonstrate the model's strong detection capabilities, some challenges persist in identifying small, occluded, or low-contrast text on signage.
To further illustrate the performance of the YOLOv11x model in detecting textual elements from street-level imagery, Figure 5 shows examples of predicted text regions. These examples highlight the model’s ability to accurately localise and extract text from urban signage, including street names, business signs, and other relevant toponyms. The bounding boxes indicate the detected text regions, while the extracted textual content is displayed for comparison. As observed, the model effectively identifies clear and well-structured text, although challenges remain in cases involving occlusions, low resolution, or complex backgrounds.
Examples of text prediction regions (red bounding boxes) on the SLI, using the proposed hybrid approach of YOLOv11 + Keras OCR
Examples of case studies with OSM toponyms and the corresponding text extracted from the SLI platforms, Mapillary and Google Street View imagery, respectively.
The results of the assessment of the evidence for the existence of OSM toponyms are presented in Table 2. For each case study, the table shows the validation scores calculated using the proposed Index of Collaborative Toponym Validation by Accumulated Evidence (ICTVAE).
The results of the ICTVAE index computation for the Mapillary and GSV datasets reveal distinct patterns of validation success across the selected case studies (Table 2). These variations highlight the relevance of the proposed approach and the challenges of validating OSM toponyms using SLI. Although no rigid threshold was defined to interpret ICTVAE scores, a validation flag was provided for guidance: scores equal to or higher than 50% were considered sufficient to indicate substantial textual evidence from SLI supporting the existence of the corresponding OSM toponym. However, lower percentages do not necessarily imply the absence of a toponym in the real world, but rather limited textual evidence in the available imagery; interpretation thus remains case-dependent. Using this criterion, 9 of the 12 case studies were successfully validated with substantial evidence from either Mapillary or GSV imagery.
In the Mapillary dataset, high ICTVAE scores were achieved in cases where the captured signs were clear and fully visible. For example, Case 06 (“Bek’s Bar”) and Case 09 (“Bradesco”) achieved ICTVAE scores of 100.0%, indicating complete subterm recognition with high similarity and full coverage. These results suggest that when textual information is prominently displayed and unobstructed in the signage, SLI from crowdsourced platforms such as Mapillary can effectively validate the presence of OSM toponyms in the real world.
Other noteworthy cases include Case 04 (“Panvel”) and Case 08 (“Itaú”), which yielded ICTVAE scores of 83.3% and 80.0%, respectively. These results demonstrate that, even in urban settings with potential visual noise, relevant toponymic information can still be reliably extracted through the methodology proposed in this work, validating the existence of OSM toponyms.
On the other hand, lower validation scores were observed in cases of partial occlusion and limited signage visibility in the imagery (e.g., 21.8% in Case 11 and 47.4% in Case 07), highlighting the challenges of OCR performance in dynamic outdoor environments. Similar difficulties were reported by Sun et al. (2023), who identified text occlusion and low-resolution imagery as significant barriers to extracting building attributes from SLI.
For Case 05 (“Tiki Taka Gastrobar”) and Case 10 (“Paróquia Santuário Sagrado Coração de Jesus”), no text was extracted from the Mapillary dataset, resulting in an ICTVAE score of zero. This absence of textual evidence shows that the validation framework depends on signage being detected and recognised in the SLI, and demonstrates the impact of factors such as sign positioning, crowdsourced image quality, and environmental conditions on validating OSM toponyms from SLI.
The GSV dataset revealed complementary patterns. High ICTVAE values were recorded in Cases 03 (“Batatiba”) and 04 (“Panvel”), both achieving 100.0% validation, as well as in Case 08 (“Itaú”), further reinforcing the role of SLI in successful validation outcomes. However, some cases showed notably lower scores: for example, Case 09 (“Bradesco”) had an ICTVAE of 12.5%, and Case 01 (“Colégio Estadual Barão do Rio Branco”) scored 6.5%. These cases illustrate the sensitivity of the ICTVAE index to variations in SLI quality and to the spatial positioning of signage relative to the camera’s field of view. Notably, in Case 09, a field verification confirmed that the “Bradesco” branch no longer existed at the time of GSV image capture (2023); the residual score stems from other textual elements with similar words detected in the vicinity. This highlights the importance of incorporating image metadata into the validation framework, as adopted in this work, since temporal discrepancies can affect the reliability of OSM toponym validation.
In Case Study 02, the ICTVAE scores for Mapillary and GSV were identical (52.94%), despite the different nature of the image capture environments. Mapillary typically provides a perspective view, while GSV often includes a wider field of view, capturing multiple elements beyond the primary target, which could have complicated the correspondence with the OSM toponym. Nevertheless, this had no significant impact on the validation score: both datasets provided sufficient evidence to support the existence of the OSM toponym (“Restaurante Yamato”). This demonstrates the robustness of the proposed index and of the combined YOLOv11 + Keras-OCR approach for detecting and extracting relevant textual information from SLI in complex urban scenarios.
These results demonstrate that SLI, when processed using the hybrid YOLOv11 + Keras OCR approach and the proposed ICTVAE score, provides a robust framework for validating the existence of collaborative toponyms in OSM. The ICTVAE index, by combining weighted similarity scores with subterm coverage factors, ensures that even partial but meaningful evidence contributes to the validation process. This balanced methodology increases reliability and reduces the likelihood of false negatives, confirming its suitability for supporting the validation of OSM toponyms by SLI, even in challenging scenarios.
4. Conclusions
This research proposes an automated framework for validating collaborative toponyms from OpenStreetMap (OSM) using street-level imagery, combined with artificial intelligence and computer vision techniques. The hybrid approach, which integrates the YOLOv11 model trained on the SVT dataset for text region prediction with Keras-OCR for optical character recognition, demonstrated its effectiveness in detecting and extracting relevant textual evidence from SLI. These detections were then evaluated using the Index of Collaborative Toponym Validation by Accumulated Evidence (ICTVAE), a novel metric specifically designed to consolidate Levenshtein similarity and coverage ratio into a balanced and objective validation score. All code and data from this study are available as open source on GitHub, enabling reproducibility and refinement.
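As a concrete illustration of the two-stage design, detected text regions are cropped before being handed to the OCR stage, which is what focuses recognition on relevant signage. The sketch below is based on the public `ultralytics` and `keras-ocr` APIs; the weight filename is a hypothetical placeholder, not the authors' exact artefact.

```python
import numpy as np

def crop_regions(image: np.ndarray,
                 boxes: list[tuple[int, int, int, int]]) -> list[np.ndarray]:
    """Crop detector-predicted text regions (x1, y1, x2, y2) from an image
    so that the OCR stage only sees candidate signage, not the whole scene."""
    return [image[y1:y2, x1:x2] for (x1, y1, x2, y2) in boxes]

# Sketch of the full two-stage pipeline (assuming the `ultralytics` and
# `keras-ocr` packages; "yolov11_svt.pt" is a hypothetical weight file):
#
#   from ultralytics import YOLO
#   import keras_ocr
#
#   detector = YOLO("yolov11_svt.pt")        # text-region detector
#   ocr = keras_ocr.pipeline.Pipeline()      # pretrained recogniser
#
#   result = detector(image)[0]
#   boxes = [tuple(map(int, b.xyxy[0])) for b in result.boxes]
#   crops = crop_regions(image, boxes)
#   words = [w for crop in crops for w, _ in ocr.recognize([crop])[0]]
```

The recognised words would then feed the ICTVAE comparison against the OSM `name` tag.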
The results confirm the potential of SLI as an external data source to support the real-world existence of OSM toponyms, even in the absence of authoritative datasets. The ICTVAE index proved to be a robust validation metric, effectively accounting for both textual correspondence and the completeness of toponym subterm detection. Higher ICTVAE scores were observed in cases where signage was visible, unobstructed and captured at adequate image resolution. The combined use of YOLOv11 and Keras-OCR further improved the accuracy of the text recognition process by focusing on relevant regions of interest, reducing false positives and improving detection in complex urban scenarios.
When comparing the two SLI platforms, distinct characteristics emerged. Mapillary provided a greater number of images for the case studies, often capturing multiple perspectives. However, the heterogeneous nature of its crowdsourced imagery, including inconsistencies in camera quality, resolution and capture standards, resulted in greater variability in validation results. Despite these challenges, Mapillary’s open licence (CC BY-SA 4.0) and wealth of imagery make it a valuable complementary source for OSM validation. In contrast, GSV provided standardised 360° imagery with higher resolution and more consistent temporal coverage. These factors contributed to higher similarity scores in some cases. However, the wide field of view often introduced visual noise, making it difficult to detect and match specific toponyms. Additionally, the proprietary nature of GSV imagery imposes licensing restrictions that limit its broader applicability in collaborative mapping validation workflows.
The findings also underlined key challenges in using SLI for toponym validation. Visual obstructions, occlusions, suboptimal lighting conditions and the variability of image capture perspectives, particularly in crowdsourced datasets such as Mapillary, remain significant barriers to reliable text extraction. However, the substantial number of crowdsourced images and the proposed ICTVAE index mitigate some of these problems by aggregating evidence across multiple detections and rewarding partially complete but meaningful matches. Field verification, as demonstrated in Case 09 (“Bradesco”), underscores the importance of integrating temporal metadata from SLI to account for potential discrepancies resulting from outdated images.
Future research directions include:
- Enriching the training datasets by incorporating more representative Mapillary samples. While the SVT dataset used in this study provided a good training base, it was originally generated only from GSV imagery, which may limit generalisation to crowdsourced imagery platforms such as Mapillary.
- Optimising hyperparameters to further enhance the performance of the text detection and recognition models.
- Integrating semantic similarity measures to refine the textual matching process, potentially reducing false positives and improving correspondence accuracy.
- Expanding the proposed methodology to additional OSM tags and geographic regions with available SLI, thereby assessing scalability and broader applicability.
Furthermore, the integration of semantic validation layers, as suggested by Sun et al. (2023), could complement the evidence-based approach adopted in this study. This would enrich the validation framework and support more comprehensive quality assessments of user-generated geospatial data.
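As a minimal illustration of the semantic-matching direction mentioned above, a token-overlap score could run alongside the character-level comparison. The Jaccard stand-in below is a deliberate simplification: a genuinely semantic measure would rely on word or sentence embeddings rather than exact token overlap.

```python
def jaccard_tokens(a: str, b: str) -> float:
    """Token-set Jaccard overlap in [0, 1]: a lightweight stand-in showing
    how a meaning-aware score could complement the edit-distance-based
    ICTVAE; an embedding-based similarity would replace this in practice."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

# Shared tokens "rio" and "branco" yield a partial, order-insensitive match:
print(jaccard_tokens("Colégio Estadual Barão do Rio Branco", "Rio Branco"))
```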
In conclusion, this work shows that the combined use of advanced AI models and SLI provides a practical and scalable solution for validating collaborative toponyms in OSM. In particular, the ICTVAE index provides an objective, evidence-based measure of validation, reinforcing the role of SLI as a viable alternative to traditional field surveys, especially in regions where authoritative datasets are scarce.
ACKNOWLEDGMENT
We thank the Postgraduate Programme in Geodetic Sciences of the Federal University of Paraná for its support. We would also like to thank Kauê de Moraes Vestena for supporting the development of the algorithms for the Mapillary API data requests and the Levenshtein distance.
REFERENCES
- Alrasheed, N. et al. (2021). Evaluation of Deep Learning Techniques for Content Extraction in Spanish Colonial Notary Records. In: Proceedings of the 3rd Workshop on Structuring and Understanding of Multimedia heritAge Contents (SUMAC’21). New York, NY, USA: Association for Computing Machinery, pp. 23-30. doi:10.1145/3475720.3484443.
- Anguelov, D. et al. (2010). Google Street View: Capturing the World at Street Level. Computer, 43(6), pp. 32-38. doi:10.1109/MC.2010.170.
- Antoniou, V., Touya, G. and Raimond, A.-M. (2016). Quality analysis of the Parisian OSM toponyms evolution. In: European Handbook of Crowdsourced Geographic Information. London: Ubiquity Press, pp. 97-112. Available at: https://www.jstor.org/stable/j.ctv3t5r09.12 [Accessed 22 January 2024].
- Ardanuy, M. C. and Sporleder, C. (2017). Toponym disambiguation in historical documents using semantic and geographic features. In: Proceedings of the 2nd International Conference on Digital Access to Textual Cultural Heritage, pp. 175-180. doi:10.1145/3078081.3078099.
- Barron, C., Neis, P. and Zipf, A. (2014). A Comprehensive Framework for Intrinsic OpenStreetMap Quality Analysis. Transactions in GIS, 18(6), pp. 877-895. doi:10.1111/tgis.12073.
- Biljecki, F. and Ito, K. (2021). Street view imagery in urban analytics and GIS: A review. Landscape and Urban Planning, 215, p. 104217. doi:10.1016/j.landurbplan.2021.104217.
- Camboim, S. P., Bravo, J. V. M. and Sluter, C. R. (2015). An investigation into the completeness of, and updates to, the OpenStreetMap data in a heterogeneous area in Brazil. ISPRS International Journal of Geo-Information, 3(4), pp. 1366-1388.
- Chaitra, Y. L. et al. (2022). An Impact of YOLOv5 on Text Detection and Recognition System using TesseractOCR in Images/Video Frames. In: 2022 IEEE International Conference on Data Science and Information System (ICDSIS), pp. 1-6. doi:10.1109/ICDSIS55133.2022.9915927.
- Cohen, W. W., Ravikumar, P. and Fienberg, S. E. (2003). A Comparison of String Distance Metrics for Name-Matching Tasks. In: Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pp. 73-78.
- Daniel, N. and Mátyás, G. (2022). Citizen science characterization of meanings of toponyms of Kenya: a shared heritage. GeoJournal. doi:10.1007/s10708-022-10640-5.
- Ganieva, G. (2021). Some Techniques Used to Collect Toponyms. Mental Enlightenment Scientific-Methodological Journal, 2021(2), pp. 160-172. doi:10.51348/tziuj2021218.
- Goodchild, M. F. (2007). Citizens as sensors: the world of volunteered geography. GeoJournal, 69(4), pp. 211-221. doi:10.1007/s10708-007-9111-y.
- Hou, Y. and Biljecki, F. (2022). A comprehensive framework for evaluating the quality of street view imagery. International Journal of Applied Earth Observation and Geoinformation, 115, p. 103094. doi:10.1016/j.jag.2022.103094.
- Jocher, G. and Qiu, J. (2024). Ultralytics YOLO11. doi:10.5281/zenodo.7347926.
- Kent, A. J. and Hopfstock, A. (2018). Topographic Mapping: Past, Present and Future. The Cartographic Journal, 55(4), pp. 305-308. doi:10.1080/00087041.2018.1576973.
- Leon, L. F. A. and Quinn, S. (2019). The value of crowdsourced street-level imagery: examining the shifting property regimes of OpenStreetCam and Mapillary. GeoJournal, 84(2), pp. 395-414. doi:10.1007/s10708-018-9865-4.
- Levenshtein, V. (1966). Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady, 10(8), pp. 707-710.
- Liang, X., Zhao, T. and Biljecki, F. (2023). Revealing spatio-temporal evolution of urban visual environments with street view imagery. Landscape and Urban Planning, 237, p. 104802. doi:10.1016/j.landurbplan.2023.104802.
- Liu, Y. et al. (2024). YOLOv5ST: A Lightweight and Fast Scene Text Detector. Computers, Materials & Continua, 79(1), pp. 909-926. doi:10.32604/cmc.2024.047901.
- Ma, D. et al. (2020). The State of Mapillary: An Exploratory Analysis. ISPRS International Journal of Geo-Information, 9(1), p. 10. doi:10.3390/ijgi9010010.
- Machado, A. A. et al. (2022). Informação geográfica voluntária: o potencial das ferramentas colaborativas para a aquisição de nomes geográficos. Revista Brasileira de Geografia, 66(2), pp. 239-253. doi:10.21579/issn.2526-0375_2021_n2_239-253.
- Mapillary. (2025). Mapillary. Available at: https://www.mapillary.com/ [Accessed 16 January 2025].
- Martins Junior, O. G. et al. (2016). Informação geográfica voluntária no processo de reambulação. Boletim de Ciências Geodésicas, 22, pp. 613-629. doi:10.1590/S1982-21702016000400035.
- Neuhold, G. et al. (2017). The Mapillary Vistas Dataset for Semantic Understanding of Street Scenes. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 4990-4999.
- Nunes, D. M. and Camboim, S. P. (2024). Potencial e Desafios do Uso de Imagens ao Nível de Rua Como Fonte de Topônimos: Uma Abordagem Utilizando Técnicas de Inteligência Artificial. In: Anais do XIII Colóquio Brasileiro de Ciências Geodésicas, Curitiba, PR, Brazil, pp. 1-3. doi:10.5281/zenodo.14153332.
- Olteanu-Raimond, A.-M. et al. (2017). The Scale of VGI in Map Production: A Perspective on European National Mapping Agencies. Transactions in GIS, 21(1), pp. 74-90. doi:10.1111/tgis.12189.
- OSM API. (2024). OSM API v0.6. Available at: https://osmapi.metaodi.ch/osmapi.html [Accessed 15 December 2024].
- Perdana, A. P. and Ostermann, F. O. (2018). A Citizen Science Approach for Collecting Toponyms. ISPRS International Journal of Geo-Information, 7(6), p. 222. doi:10.3390/ijgi7060222.
- Redmon, J. et al. (2016). You Only Look Once: Unified, Real-Time Object Detection. Available at: http://arxiv.org/abs/1506.02640 [Accessed 28 October 2024].
- Shi, B., Bai, X. and Yao, C. (2017). An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(11), pp. 2298-2304. doi:10.1109/TPAMI.2016.2646371.
- Silva, L. S. L. and Camboim, S. P. (2021). Authoritative cartography in Brazil and collaborative mapping platforms: Challenges and proposals for data integration. Boletim de Ciências Geodésicas, 27. doi:10.1590/s1982-21702021000100003.
- Ursini, F.-A. and Samo, G. (2023). Extracting Toponyms from OpenStreetMap: A Cross-Linguistic Perspective. In: CEUR Workshop Proceedings, Dublin, Ireland.
- Wang, K. (2012). The Street View Text Dataset. Available at: http://www.iapr-tc11.org/mediawiki/index.php/The_Street_View_Text_Dataset [Accessed 25 July 2024].
- Yamashita, J. et al. (2022). Quality assessment of volunteered geographic information for outdoor activities: an analysis of OpenStreetMap data for names of peaks in Japan. Geo-spatial Information Science, pp. 1-13. doi:10.1080/10095020.2022.2085188.
DATA AVAILABILITY
The codes and data that support the findings of this study are openly available in a GitHub repository at https://github.com/darlanmnunes/PhD_Thesis_Step2_OSM_Toponyms
Publication Dates
- Publication in this collection: 14 Nov 2025
- Date of issue: 2025
History
- Received: 31 Mar 2025
- Accepted: 10 Sept 2025