Detection of rural roads from planet images using convolutional neural networks

Crato, Jorgiana Kamila Teixeira do; Jijón-Palma, Mário Ernesto; Centeno, Jorge António Silva

doi:10.1590/s1982-21702025000100004

Abstract:

Updating maps in Brazil is hindered by considerable obstacles, primarily to the high costs associated with it and the difficulty of accessing the regions. Moreover, the accelerated rate of environmental transformation, particularly in rural settings, represents an additional challenge. This study proposes to use high-resolution satellite images from Planet constellation, in conjunction with artificial intelligence, specifically UNet, to automatically identify rural roads in the metropolitan region of Curitiba, Paraná, Brazil. The objective is to identify the optimal parameters for automating the detection of rural roads. The UNet, with its distinctive U-shaped architecture, is highly effective in segmenting and detecting targets while simultaneously preserving the feature maps in each convolution. In this study, the network was trained on satellite images containing rural roads, resulting in segmented maps with an encouraging 91.95% accuracy in road detection. Nevertheless, further improvement is possible, as evidenced by the method’s precision of 75.83% and F1-Score of 69.07%. These outcomes indicate the possibility of enhancement through the expansion of the training dataset, thereby better addressing the network’s recognition constraints. One potential avenue for optimizing detection using the methodology would be the incorporation of supplementary training samples, which could potentially mitigate the network’s recognition limitations.

Keywords:
UNet; Planet Imagery; Convolutional Neural Network; Rural Road Detection

1. Introduction

It is irrefutable that roads serve a pivotal function in facilitating transportation and enhancing the quality of life. They provide access to essential services, including public safety and healthcare, for communities in remote locations. Furthermore, the expansion of public services, such as water and energy supply, is typically planned along roads, thereby contributing directly to socio-economic development.

In countries with extensive territories, such as Brazil, the continuous updating of road maps represents a significant challenge, particularly in rural areas, due to the vast distances and the lack of infrastructure and access to them. The extensive territorial expanse and the considerable financial outlay associated with mapping render this an inherently time-consuming and costly process. It is of the utmost importance that road network are accurately mapped and consistently updated in order to maintain the infrastructure, allow management bodies to plan expansions and restructuring, and promote economic expansion in the regions they serve.

In this context, the research is based on the application of satellite images with sufficient resolution to identify and extract information about roads in rural areas. Additionally, it considers methods based on artificial intelligence, such as convolutional networks, which are used to identify objects in images in a manner similar to that of human perception. This allows for the development of methods to map rural road networks using these satellite images and artificial intelligence algorithms with precision.

Historically, the delineation of the road network was conducted through visual analysis of the images, entailing the manual digitization of the road axis on a computer screen (Chambon, 2011; Cheng et al., 2014). Although this approach is laborious, it allows for an accurate estimate of the road network, as the human observer can identify issues such as occlusions by vegetation or shadows, as well as color variations caused by the presence of different materials (Silva; Centeno, 2010). Nevertheless, the automation of this task has been the focus of significant research, driven by the emergence of computational advances that enable the development of methods that emulate aspects of human reasoning. These include neural networks, support vector machines (SVM), random forests, and deep learning techniques. Moreover, convolutional networks can be employed for the purpose of mapping roads in images.

As stated by Zhang, Zhou, and Lin (2018), the FCN (Fully Convolutional Network) method was initially developed for the semantic segmentation of images based on the VGG (Visual Geometry Group) model due to its favorable performance and high efficiency. As observed by Zhang, Zhang, and Du (2016), each convolution layer is capable of not only identifying the object of interest, but also determining its spatial location and contour details. The method employs deep learning for end-to-end training, integrating it with conditional random fields and subjecting it to post-processing to enhance segmentation performance (Dung, Anh, 2019). In order to successfully detect roads in images, it is imperative that image segmentation be performed as part of the process.

Furthermore, target segmentation can be accomplished through the use of a Generative Adversarial Network (GAN). The GAN model is a convolutional neural network (CNN) that incorporates segmentation as a component of its overall processing. An illustrative example of the application of the GAN model to the generation and discrimination of road detection data can be seen in the work of Abdollahi et al. (2020b). The prevailing trend in this field indicates an increasing reliance on convolutional networks, given their ability to analyze regions of an image in search of distinctive patterns associated with road sections.

In light of these considerations, the UNet network was selected for its capacity to segment images and discern both global and local features, as well as its high degree of accuracy in segmentation tasks. This is particularly evident when working with a limited number of training samples and generating precise segments, as demonstrated by Ronneberg, Fischer, and Brox (2015).

The UNet network was initially developed for use in the field of biomedicine (Ronneberg, Fischer, and Brox, 2015), but has since been applied to other domains, including remote sensing (Senadeera, 2021). In this context, the network was used in conjunction with semantic neural networks and transfer learning, which are inherent to the architectural design of UNet, to achieve highly accurate extraction of roads from satellite images.

The UNet employs an end-to-end learning approach that encompasses all parameters, including those pertaining to the decoding phase. In this phase, spatial information and features are integrated through convolution and concatenation operations. The pooling operations not only render the learned features invariant to image transformations but also expand the receptive field to incorporate more context information implicitly (Yang, Rottensteiner, and Heipke, 2019). The training samples are presented to the UNet as input in the form of cutouts. The samples are processed in the encoder phase, with the execution of the convolution functions organized in sequence.

In this context, this research proposes the detection of the rural road network in the metropolitan region of Curitiba, Paraná, using images from the Planet constellation, given the absence of rural areas in the municipality of Curitiba. To this end, artificial intelligence methodologies were employed, specifically the UNet neural network. The deployment of the UNet in conjunction with these images facilitates the automated identification and mapping of roads in rural areas with high precision, offering an efficient alternative to traditional manual delimitation methods.

2. Material and Method

The input image was selected considering the typical width of rural roads and the spatial resolution of available remote sensing images. It was decided to use images from the Planet constellation, which were made available by the Norwegian Environmental Protection Initiative (NICFI) via the website planet.com/explorer/. The spatial resolution of the Planet images is 4.7m.

Access to Planet images is contingent upon the user’s level of access. This is divided into three categories, for this search is used the level 1 - Open Access to the Planet Platform. Allows users to view images that have undergone atmospheric and geometric correction, with a spatial resolution between 3 and 5 meters (NICFI, 2022).

The images were selected based on the following criteria: they were obtained by the Planet Scope sensor, with a spatial resolution of 4.7 meters, and 0% cloud cover, and were acquired on May 21, 2022. The images have 4,096 x 4,096 pixels, comprising four bands (RGB and NIR) with 8 bits radiometric resolution. Additionally, the data are available at a monthly temporal resolution.

Figure 1:
Location map of the study area. A) Presents the study area, containing the scenes used in this study in relation to the municipality of Curitiba. B) Shows the location of the study area in the state of Paraná. C) Position of the state of Paraná in Brazil and in South America.

The present study was conducted in a rural area within the metropolitan region of Curitiba, located in southern Brazil (see Figure 1). Twelve scenes from the Planet images were selected for this study, covering an area of approximately 5,000 km². Figure 1A illustrates the rural area used in this study in green, as indicated by the satellite images, bordering the urban area of Curitiba, shown in grey with a black outline in the central region of the map. The images encompass the municipality of Curitiba and parts of the metropolitan region of Paraná. The study concentrated on a rural area, necessitating the exclusion of more urbanised regions. As the objective was to evaluate the capacity of a convolutional network to process high-spatial-resolution images in the visible range, the NIR band was excluded from the analysis. Subsequently, the same methodology is anticipated to be employed to analyse RGB aerial images obtained by Unmanned Aerial Vehicles (UAVs).

The Google Colab platform was used with the python language interpreter to develop the necessary programs. Google Colab, or Colaboratory, is an online platform that allows writing and executing codes in Python online, with free access to virtual Graphics Processing Units (GPUs). Although three types of GPUs are available for use in the Colab environment, (T100, A100 and T4), only the basic GPU (T4) is available free. In this study, the T4 GPU access was used, with 0.18 processing cells per hour to perform processing. Despite the limitations of using the GPU in the Colab environment, it has great applicability in the field of machine learning, as it includes many machine-learning libraries in Python. Therefore, it was used to implement the programming routines of this project. taking advantage of the main numerical and image data manipulation libraries NumPy, MatPlotLib, PiL, along with the machine learning libraries TensorFlow and Keras, and the pre-trained segmentation model library Segmentation Model.

2.1 Method

The methodology of this research was structured in three stages: pre-processing, processing, and post-processing. In the initial stage, the original images were divided into smaller sections in order to prepare the image samples. Subsequently, labels were generated, designating the pixels as background, object and the outline of the roads. The samples were then organized into designated directories.

In the second stage, the processing stage, the test and training directory was employed as the input for the network. Following the generation of the training model, the validation directory was employed, resulting in the detection of roads. In the final stage, post-processing, the results were validated using statistical metrics and visual evaluation, with the objective of comparing the field truth with the results obtained (Figure 2). These steps are described below.

Figure 2:
The following flowchart illustrates the methodology employed to develop this research project. It comprises three principal stages: A) Identify in red is the pre-processing (acquisition, sample preparation and separating sets). B) Identify in blue is processing using UNet (Training, test e validation of model). C) Identify in green is the post-processing (validation by statistical metrics).

2.1.1 Preprocessing

The initial stage of the process entails a visual examination of the image, with the purpose of generating a ground truth map, which will be utilized in the subsequent stages. It was necessary to manually vectored the rural roads in the QGIS software version 3.22.14, relying exclusively on visual analysis to identify them in the image. This step was of paramount importance for the generation of the training masks and for ensuring the compatibility of the results with the ground truth map.

In the second phase of the process, the pixels were labeled using QGIS software version 3.22.14. To take account of any spatial inaccuracies in the tagged image, each pixel in the reference image was assigned a value according to its specific location. Figure 3 illustrates this stage of the process. Figure 3A shows part of the input image in RGB format, while Figure 3B illustrates the binary image generated through visual analysis. To delimit the pixels, a scale based on a pre-existing network model was used, which served as the basis for the development of this research project. The visible part of the road (the body of the object) was designated as “1”, the background as “2” and an area surrounding the roads (a buffer) was assigned the value “3” (Figure 3C). The buffer was selected with a radius of two pixels around the road, as a larger radius could incorporate information that is not pertinent to the description of the road itself, while a smaller radius might not encompass roads of greater thickness. This made it possible to consider potential minor variations in the edges of the roads.

Figure 3:
The following examples illustrate the creation of cutouts in PLANET images and their respective visual identifications for the road. A) A cutout containing a sample of the original image that was used as input in UNet. B) A cutout of the image in which pixels have been marked exclusively for the roads, corresponding to the region that was delimited in the original sample. and C) A mask indicating the labeling of The objects present in the sample, based on the original cutout, are distinguished by color. The object is separated by color, with the body of the object identified in “white”, the background in “black”, and the outline of the roads identified in “gray”.

Subsequently, samples containing road segments were extracted from the RGB image and the ground truth labeled image. The samples were obtained automatically. For this purpose, the labeled image was scanned looking for pixels labeled as road. Once a pixel was identified as belonging to a road, a square region around it is extracted from both the RGB image and the labeled image (see Figure 4). Following an investigation into the optimal neighborhood size, it was determined that a size of 41x41 pixels was the most effective. Window sizes smaller than the optimal 41x41 pixels are insufficient for adequately describing the diverse shapes present in the road network, including intersections and forks. To avoid redundancy, the scan was performed with a 41 lines and columns step. The samples were then labeled to create pairs of cutouts, namely an RGB image and a labeled image, as illustrated in Figure 4.

Figure 4:
The following samples have been prepared for submission to UNet for evaluation. A) The image contains the primary route of the road in a rural setting. B) The selected area, marked with pixels.

The available samples were then divided into two sets: one for training and testing and the other for validation. The training set was obtained from 10 scenes out of 12 scenes acquired on May 21, 2022. The validation samples were obtained from the remaining images, ensuring that the same images were not used for both training and validation.

2.1.2 Processing

A UNet network was adapted to analyze the training samples and compute relevant features to detect roads. The architecture of the UNet network is characterized by its “U” shaped structure, like the principle of the auto encoder (Figure 5). It has two main arms, composed of convolutional layers of different sizes. On the left arm, the encoding step, 3x3 convolutional layers use the ReLU activation function to compute spatial features, followed by 2x2 up convolutional operations that compress and store the feature maps. This grants spatial invariance of the feature maps and increases the receptive field (RONNEBERGER, FISCHER and BROX, 2015).

Figure 5:
Example of the UNet Network applied to low-resolution 32x32 pixel images.

During training, samples are processed in the encoder arm, which uses convolution layers to compute several feature maps. As the size of the image is progressively reduced, a compressed description in the central node. This encoded representation is decoded on the right arm, where the deconvolution layers amplify the spatial information using the 2*2 max pooling convolution function and perform the concatenation of the feature maps obtained in the coding process, reconstructing a segmented image based on the input features. UNet is widely used for image segmentation, generating a segmented image as output, not a copy of the input. It operates end-to-end, combining spatial information and the concatenated feature maps. Intermediary connections allow information transfer from low to high levels, which significantly improves the quality of segmentation, as highlighted by Senadeera (2021).

The architecture of the network that was used in the experiments included 13 convolutional layers, seven in the coding phase and six in the decoding phase. The training was carried out over 50 epochs. Two series of experiments were performed, varying the number of neurons. In the first test, Test 1, 16 neurons were used, and in the second experiment, Test 2, 32 neurons.

In the training and testing phase, the network was fed with input samples containing parts of roads and the corresponding labeled image. The training set consists of 9760 samples, where 70% of the samples were used to adjust the parameters of the net and the remaining samples (30%) were used as test samples.

Training is an iterative process where the desired samples and outputs are presented to the network, which produces an output based on its internal parameters, or weights. During training, the network parameters are adjusted to produce an output that is similar to the desired labeled image from a given input (RGB clipping). By comparing the output with the desired output (labeled clipping), the committed error is estimated and based on this error, the internal parameters of the network (weights) are adjusted, using the back propagation algorithm. This process is repeated until the error is minimized, when the network is said to have been trained. The back propagation algorithm adjusts the weights of the neurons, propagating the error throughout the entire network (SENADEERA, 2021), backwards, starting at the output layer. After training, the network architecture and corresponding weights are stored and can be used to analyze new samples.

The validation of the net was performed with a new set of 1783 samples not used during training. These predictions were statistically compared with the ground truth, and quality evaluation was performed using the evaluation metrics proposed by Xu et al. (2019): recall (R), precision (P), F1-score, Accuracy, and mIoU. For this purpose, and according to equations 1-5, the number of true positives (TP), false positives (FP), and false negatives (FN) was computed.

Equation 1 indicates the accuracy of the model. This equation compares the number of correctly identified samples to the total number of total analyzed samples, thereby providing a measure of the agreement between the predicted mask and the vectored mask. This metric considers only true positive cases.

A c = \frac{T P}{n}

(1)

Precision, as presented in Equation 2. Is of paramount importance in evaluating the model’s capacity has accurately identify samples. This is achieved by calculating the proportion of correctly classified samples in relation to the total number of samples identified as “road.” This metric reflects accuracy from the perspective of the data producer, emphasizing the precision with which roads are identified.

P = \frac{T P}{T P + F P}

(2)

As indicated in Equation 3, Recall provides a complementary assessment of the model’s sensitivity in accurately identifying samples belonging to the “road” class.

R = \frac{T P}{T P + F N}

(3)

The F1 score (Equation 4) provides an overall assessment of accuracy, which is of particular importance when working with unbalanced datasets, where the number of marked pixels differs significantly from the number of background pixels. This is evidenced by the significant discrepancy between the number of pixels marked as “road” and the number of background pixels. By calculating the harmonic mean between recall and precision, this metric ensures accurate evaluation in contexts where there are inherent difficulties.

F_{1} s c o r e = 2 * \frac{R * P}{R + P}

(4)

Ultimately, the IoU metric, also referred to as the Jaccard index, as demonstrated in Equation 5, provides a comprehensive evaluation of the degree of overlap between the predicted mask and the vectored mask. This metric reveals the degree of pixel-to-pixel similarity between the images, thus providing valuable information on the accuracy of rural road detection. These metrics collectively provide a comprehensive assessment of the model’s effectiveness in this task.

I o U = \frac{T P}{T P + F P + F N}

(5)

3. Results and Discussion

Once the network has undergone training, it can be employed to analyze new samples and identify any roads within them. Figure 6 illustrates the resulting output. The RGB image (Figure 6A), which contains a portion of a road, was provided as input to the network, and the resulting output is shown in Figure 6B. It can be posited that the road has been correctly identified. Figure 6C illustrates the result superimposed on the original image.

Figure 6:
Example of the application of the computed network for road detection A) original RGB sample; B) Computed output, where white represents road pixels and black the background; and C) Overlay of the result on the original image, considering the buffer around the road.

Two series of tests were performed, one with 16 neurons and another with 32. Comparing the quality metrics obtained in the tests, it was noted that increasing the number of neurons increased the processing time. Initially, processing took about four hours, increasing to six hours in the second test, as shown in Table 1. Despite maintaining the other parameters constant (batch size, Epochs), the number of neurons influence the computational effort (computational time increased almost 2 hours), with modest gains in the quality statistics.

Thumbnail

Table 1:
Results for first series of tests using Colab platform, 9760 training samples and 1783 test samples, 50 training epochs.

The benefits were notably limited: a gain of only 0.10% in accuracy, 0.03% in precision, and 0.08% in F1-Score. Even for metrics where the improvement was a little more significant, such as Recall, which showed an increase of 2.21%, and mIoU, with an improvement of 1.07%, the gain is still considered low.

Considering the quality metrics, the accuracy lies above 90%, which is relatively high. Nevertheless, the F1-score and mIoU are not as high. The high values of the accuracy are explained because this metric considers only the true positives in relation to all the labeled pixels. This excludes false detected road pixels.

Low mIoU values were expected, as the detected road pixels do not perfectly overlap the digitized example. This error is not crucial, because the road width varies along the image and the digitized road has constant width.

An example of the result of the first series of experiments is shown in Figure 7, which includes A) the original image used in the validation overlaid with the result and B) the result obtained by the network using 16 neurons. The identification of the main silhouette of the road is evident.

Figure 7:
Result of road detection using 16 neurons. A) Result (red) overlaid on the original image. B) Mask Binary resulting image.

Figure 8 displays several examples of the performance of the UNet using 16 and 32 neurons. Column A displays the input RGB sample; B corresponds to the ground truth mask, with white indicating the center of the road and gray indicating the edge of the road, determined by a buffer of 2 pixels; C displays the result using 16 neurons; and D using 32 neurons.

These examples allow a better interpretation of the results, beyond the values of the quality metrics. In example 1, the detection perfectly matched the ground truth, proving optimal agreement. The network was able to reproduce the manual digitizing product. Furthermore, the Network correctly identified pixels that were not present in the ground truth, as evidenced in example 2, which is highlighted in green.

However, some challenges were observed, such as the presence of dense vegetation or the lack of clarity regarding the continuity of the road (as highlighted in yellow in example 3), which causes failures. Such failures can be explained by the lack of contrast between the road and the background, as no bare ground is visible.

Figure 8:
Examples of the performance of the UNet: A) RGB sample. B) Ground truth. C) Result using 16 neurons. d) Result using 32 neurons.

Another example of failure is delimited in orange in example 4. In this case, the road was not detected, and the possible explanation is the lack of contextual information, as the error occurs on the border of the image, and the lack of contrast, as there is an urban area with similar color around the road. The same factors influenced the result displayed as example 5, where the contrast between the road and the background is low.

The two experiments did not produce the same results. As displayed in example 6, using 16 neurons produced better results than using 32, when considering the continuity of the road network. On the other hand, the geometry of the detected roads is not correct. Using 32 neurons produces conservative results, which produce other kinds of errors, losing parts of the roads.

Despite the failures and challenges faced, it was noted that the output of the network largely coincides with the ground truth, and it became evident that using 16 neurons can produce better results in some aspects.

Additionally, an experiment was conducted to assess the enhancement in gain that could be achieved by augmenting the network with an additional 64 neurons. This was based on the premise that an increase in the number of neurons would lead to a more intricate network, facilitating a greater number of connections between operations within each layer. However, the experiment was terminated due to the excessive computational demands, as indicated by the Colab environment.

Colab is a virtual environment that employs cloud-processing tools, including virtual graphics processing units (GPUs). It is a commercial platform designed to offer a programming environment, but with limitations for free use. In this case, when attempting to utilize a greater number of neurons and/or layers, the platform generated a message indicating that the use of the kernel for processing had exceeded the available memory. This prevented the increase in the number of neurons, as the server resources were insufficient to accommodate the request.

3.1 Discussion

Following the analysis of the results of this study, it was determined that the accuracy rate exceeded 90%, a value considered high for road detection by the neural network, based on the optimal performance observed in the second test. It is important to note that the focus of this study is on rural roads, whereas the existing literature contains a multitude of studies on road detection that prioritize highways as the object of analysis. While rural roads and highways exhibit certain similarities in terms of geometric shape, highways are complex and well-defined engineering works with distinctive characteristics, particularly in comparison to rural roads.

In consideration of the initial data set and the characteristics of the target, a correlation was established between the quality of the network and the findings of other studies. A review of the literature reveals that highways possess characteristics that are seldom observed on rural roads. For instance, the paving material differs: highways are paved with asphalt or concrete, whereas rural roads are typically composed of compacted soil.

In the study by Abdollahi et al. (2020a), the F1-Score metric of a convolutional neural network (CNN) for road detection was 0.5320. In comparison, this study obtained an F1-Score of 69.07%, which indicates that the detection is relatively adequate despite the additional difficulties imposed by the characteristics of rural roads. Nevertheless, Lourenço et al. (2023) attained an F1-Score of 0.7960 with UNet in one of their assessments, indicating that augmenting the data volume and conducting a comprehensive analysis to assess overfitting could potentially enhance this metric.

In Senadera’s (2021) study, the accuracy for highways reached 95%, with 72% accuracy achieved using a convolutional neural network (CNN). In this study, the accuracy was 91.69%, representing a slight decrease of approximately 4% compared to the previous results, while the precision was 75.83%, exhibiting a modest increase of around 4% in comparison to the earlier findings. These discrepancies can be attributed to specific characteristics, such as the width of the roads. Highways have lanes with a regulated width of approximately 4 meters, while rural roads tend to be narrower, with variable width and, in some cases, only a lane of around 3 meters in places that are difficult to access.

In this study, the UNet model was used in its most basic form, resulting in an mIoU of 55.01%. This validates the segmentation by demonstrating a correlation between the model’s predictions and the actual field observations. In enhanced UNet models, such as the C-UNet proposed by Hou et al. (2021), which employs road segmentation through a combination of a standard UNet and a UNet with multiscale dilated convolution, the mIoU attained was 59.99%. Ghandorh et al. (2022) reported mIoUs of 77.34% and 75.44%, respectively, for segmentation masks using UNet and SegNet. Despite the smaller number of parameters in the SegNet model, this simplicity results in a reduction in performance.

As recommended in the literature, the F1-Score and mIoU metrics were employed to evaluate the quality of road detection using artificial intelligence, yielding values of 69.07% and 55.01%, respectively. These indices indicate that the neural network still requires improvement. One potential avenue for enhancing these outcomes would be to conduct further experiments with a more expansive data set.

These findings suggest that, with respect to the current model, there is scope for advancement in the categorization of rural roads, underscoring the necessity of investigating more resilient and comprehensive variants of neural networks.

4. Conclusion

This paper evaluates the use of a UNet convolutional network to detect and map rural roads in RGB images. For this purpose, data from Planet Constellation were used and tests were performed in the Google Colab environment. The use of UNet convolutional networks proved to be feasible for the detection of rural roads in remote sensing images with a resolution of 5m. First, it was concluded that a region of approximately 41x41 pixels (equivalent to 205mx205m) is sufficient to describe the different shapes of the road network. The experiments proved that UNet can detect roads in 41x41 pixel samples, and the combination of these results shows that the results are accurate. Some errors still occur, especially when trees overlap the road (occlusion) and when the contrast between the road and the background is low. Errors can also occur in shaded areas. The accuracy values chosen in the experiments are above 90%, which can be considered high. However, the network still needs improvement. Through the experiments, we found that the UNet architecture with a batch size of 32, 50 epochs, and initially 16 neurons is satisfactory. It performs detection with relatively good analysis, reaching 91.85% accuracy. Increasing the number of neurons to 32 did not significantly improve the results (accuracy of 91.95%).

The success of the proposed network depends on the availability of a large number of training samples, including as many possible variations of the shapes of a road in the image, reflecting the different appearances of roads. For this work, about nine thousand samples of road sections were used. However, there is still a need for more samples, since the straight sections and indexes had the highest success rate. Problems can be expected if the sections are winding or if there are other elements that hide the roads.

The use of the free Google Colab environment proved to be viable for the purpose of detecting roads in the images of the planet, but more complex networks increased the computational effort too much, which blocked new experiments. Depending on the GPU used, increasing the complexity of the network may exceed the amount of memory available on the platform.

4.1. Recommendation for future work

An interesting extension of this study would be to apply the trained model in different geographic scenarios, considering variations in topography, land use, and environment. This would allow for a more comprehensive evaluation of performance in different contexts. It is also recommended to study the improvement of image contrast using the band fusion method with the panchromatic band for satellites with better spatial resolution and to complement this with the vegetation index test.

It is also recommended that this type of network be evaluated with other data sets, such as aerial or orbital imagery with better spatial resolution than that used in this study. Samples of road sections from other locations and regions reflect other common situations on the Earth’s surface for a more comprehensive analysis.

ACKNOWLEDGEMENT

This study was only possible thanks to the availability of images belonging to the PLANET constellation, managed by the Norwegian Environmental Protection Initiative (NCFI) program. With the support of the Coordination for the Improvement of Higher Education Personnel (CAPES).

REFERENCES

Abdollahi, A., Pradhan, B. and Alamri, A.M. (2020a) ‘VNet: An End-To-End Fully Convolutional Neural Network for Road Extraction from High-Resolution Remote Sensing Data’, IEEE Access, 8. Available at: <DOI: 10.1109/ACCESS.2020.3026658> [Accessed 24 April 2023]
» https://doi.org/10.1109/ACCESS.2020.3026658
Abdollahi, A. Pradhan, B., Shukla, N., Chakraborty, S. and Alamri, A. (2020b) ‘Deep Learning Approaches Applied To Remote Sensing Datasets for Road Extraction: A State-Of-The-Art Review’, Remote Sensing, 12, 9.
Chambon, S. (2011) ‘Detection of Points of Interest for Geodesic Contours: Application on Road Images for Crack Detection’, Proceedings of the International Conference on Computer Vision Theory and Application, pp. 210-213.
Cheng, G., wang, Y., Gong, Y.. Zhu, F. and Pan, C. (2014) ‘Urban Road Extraction Via Graph Cuts Based Probability Propagation’, IEEE International Conference on Image Processing, 1, pp. 5072-5076.
Dung, C.V. and Anh, L.D. (2019) ‘Autonomous Concrete Crack Detection Using Deep Fully Convolutional Neural Network’, Automation in Construction, 99, pp. 52-58.
Ghandorh, H., Boulila, W., Massod, S., Koubaa, A., Ahmed, F. and Ahmad, J. (2022) ‘Semantic Segmentation And Edge Detection-Approach For Road Detection In Very High Resolution Satellite Images’, Remote Sens, 14, pp. 613. Available at: <https://doi.org/10.3390/Rs14030613> [Accessed 14 July 2023]
» https://doi.org/https://doi.org/10.3390/Rs14030613
Hou, Y., Liu, Z., Zhang, T. and Li, Y. (2021) ‘C-UNet: Complement Unet for Remote Sensing Road Extraction’, Sensors 21, pp. 2153. Available at: <https://doi.org/10.3390/S21062153> [Accessed 10 May 2023]
» https://doi.org/https://doi.org/10.3390/S21062153
Lourenço, M., Estina, D., Oliveira, H., Oliveira, L. and Mora, A. (2023) ‘Automatic Rural Road Centerline Detection and Extraction from Aerial Images for a Forest Fire Decision Support System’, 15, pp. 271. Available at: <https://doi.org/10.3390/Rs15010271> [Accessed 10 May 2023]
» https://doi.org/https://doi.org/10.3390/Rs15010271
NICFI. (2022) Norway’s International Climate and Forests Initiative Satellite Data Program. Available at: <Available at: https://www.planet.com/nicfi/ > [Accessed 02 May 2022]
» https://www.planet.com/nicfi/
Ronneberger, O., Fischer, P. and Brox, T. (2019) UNet: Convolutional Networks for Biomedical Image Segmentation. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Available at: <Available at: https://arxiv.org/abs/1505.04597 > [Accessed 25 July 2023]
» https://arxiv.org/abs/1505.04597
Senadeera, K.U.H.P.T.S. (2021) ‘Automation of Road Feature Extraction from High Resolution Images’. Available at: <Available at: http://hdl.handle.net/10362/113905 > [Accessed 27 January 2023]
» http://hdl.handle.net/10362/113905
Silva, C.R. and Centeno, J.A.S. (2010) ‘Semi-Automatic Extraction Of Side Roads Based On Genetic Algorithms’, Acta Scientiarum - Technology, 32, 2, pp. 137-145.
Xu, M., Wu, J., Wang, H. and Cao, M. (2019) ‘Anomaly Detection in Road Networks Using Sliding-Window Tensor Factorization’, IEEE Transactions on Intelligent Transportation Systems, 20, 12, pp. 4704-4713. Available at: <https://DOI.org/10.1109/TITS.2019.2941649> [Accessed 27 January 2023]
» https://doi.org/https://DOI.org/10.1109/TITS.2019.2941649
Yang, C. Rottensteiner, F. and Heipke, C. (2019) ‘Towards Better Classification Of Land Cover And Land Use Based On Convolutional Neural Networks’, International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences - ISPRS Archives, 42, 2, pp. 139 - 146.
Zhang, L. Zhang, L. and Du, B. (2016) ‘Deep Learning for Remote Sensing Data: A Technical Tutorial on the State Of The Art’, IEEE Geoscience and Remote Sensing Magazine, 4, 2, pp. 22 - 40.
Zhang, X., Zhou, X. and Lin, M. (2018) ‘Shuffle net: An Extremely Efficient Convolutional Neural Network for Mobile Devices’, IEEE Xplore, pp. 6848 - 6856.

Publication Dates

Publication in this collection
09 May 2025
Date of issue
2025

History

Received
14 May 2024
Accepted
02 Apr 2025

This is an open-access article distributed under the terms of the Creative Commons Attribution License

[1] Abdollahi, A., Pradhan, B. and Alamri, A.M. (2020a) ‘VNet: An End-To-End Fully Convolutional Neural Network for Road Extraction from High-Resolution Remote Sensing Data’, IEEE Access, 8. Available at: <DOI: 10.1109/ACCESS.2020.3026658> [Accessed 24 April 2023]
» https://doi.org/10.1109/ACCESS.2020.3026658

[2] Abdollahi, A. Pradhan, B., Shukla, N., Chakraborty, S. and Alamri, A. (2020b) ‘Deep Learning Approaches Applied To Remote Sensing Datasets for Road Extraction: A State-Of-The-Art Review’, Remote Sensing, 12, 9.

[3] Chambon, S. (2011) ‘Detection of Points of Interest for Geodesic Contours: Application on Road Images for Crack Detection’, Proceedings of the International Conference on Computer Vision Theory and Application, pp. 210-213.

[4] Cheng, G., wang, Y., Gong, Y.. Zhu, F. and Pan, C. (2014) ‘Urban Road Extraction Via Graph Cuts Based Probability Propagation’, IEEE International Conference on Image Processing, 1, pp. 5072-5076.

[5] Dung, C.V. and Anh, L.D. (2019) ‘Autonomous Concrete Crack Detection Using Deep Fully Convolutional Neural Network’, Automation in Construction, 99, pp. 52-58.

[6] Ghandorh, H., Boulila, W., Massod, S., Koubaa, A., Ahmed, F. and Ahmad, J. (2022) ‘Semantic Segmentation And Edge Detection-Approach For Road Detection In Very High Resolution Satellite Images’, Remote Sens, 14, pp. 613. Available at: <https://doi.org/10.3390/Rs14030613> [Accessed 14 July 2023]
» https://doi.org/https://doi.org/10.3390/Rs14030613

[7] Hou, Y., Liu, Z., Zhang, T. and Li, Y. (2021) ‘C-UNet: Complement Unet for Remote Sensing Road Extraction’, Sensors 21, pp. 2153. Available at: <https://doi.org/10.3390/S21062153> [Accessed 10 May 2023]
» https://doi.org/https://doi.org/10.3390/S21062153

[8] Lourenço, M., Estina, D., Oliveira, H., Oliveira, L. and Mora, A. (2023) ‘Automatic Rural Road Centerline Detection and Extraction from Aerial Images for a Forest Fire Decision Support System’, 15, pp. 271. Available at: <https://doi.org/10.3390/Rs15010271> [Accessed 10 May 2023]
» https://doi.org/https://doi.org/10.3390/Rs15010271

[9] NICFI. (2022) Norway’s International Climate and Forests Initiative Satellite Data Program. Available at: <Available at: https://www.planet.com/nicfi/ > [Accessed 02 May 2022]
» https://www.planet.com/nicfi/

[10] Ronneberger, O., Fischer, P. and Brox, T. (2019) UNet: Convolutional Networks for Biomedical Image Segmentation. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Available at: <Available at: https://arxiv.org/abs/1505.04597 > [Accessed 25 July 2023]
» https://arxiv.org/abs/1505.04597

[11] Senadeera, K.U.H.P.T.S. (2021) ‘Automation of Road Feature Extraction from High Resolution Images’. Available at: <Available at: http://hdl.handle.net/10362/113905 > [Accessed 27 January 2023]
» http://hdl.handle.net/10362/113905

[12] Silva, C.R. and Centeno, J.A.S. (2010) ‘Semi-Automatic Extraction Of Side Roads Based On Genetic Algorithms’, Acta Scientiarum - Technology, 32, 2, pp. 137-145.

[13] Xu, M., Wu, J., Wang, H. and Cao, M. (2019) ‘Anomaly Detection in Road Networks Using Sliding-Window Tensor Factorization’, IEEE Transactions on Intelligent Transportation Systems, 20, 12, pp. 4704-4713. Available at: <https://DOI.org/10.1109/TITS.2019.2941649> [Accessed 27 January 2023]
» https://doi.org/https://DOI.org/10.1109/TITS.2019.2941649

[14] Yang, C. Rottensteiner, F. and Heipke, C. (2019) ‘Towards Better Classification Of Land Cover And Land Use Based On Convolutional Neural Networks’, International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences - ISPRS Archives, 42, 2, pp. 139 - 146.

[15] Zhang, L. Zhang, L. and Du, B. (2016) ‘Deep Learning for Remote Sensing Data: A Technical Tutorial on the State Of The Art’, IEEE Geoscience and Remote Sensing Magazine, 4, 2, pp. 22 - 40.

[16] Zhang, X., Zhou, X. and Lin, M. (2018) ‘Shuffle net: An Extremely Efficient Convolutional Neural Network for Mobile Devices’, IEEE Xplore, pp. 6848 - 6856.

Parameters	Test 1	Test 2
Epochs	50	50
Batch Size	32	32
Neurons	16	32
Accuracy	91.85%	91.95%
Precision	75.80%	75.83%
Recall	63.71%	65.92%
F1-Score	68.99%	69.07%
mIoU	53.94%	55.01%
Time	04h:53min	06:08min