LAND-USE AND LAND-COVER MAPPING USING A COMBINATION OF RADAR AND OPTICAL SENSORS IN RORAIMA – BRAZIL

Land-use and land-cover (LULC) are important environmental properties of the Earth's surface. Satellite platforms and state-of-the-art algorithms enable the mapping of large areas, but they still need to be improved. This study aimed to compare free- and open-access images from radar and optical sensors, using the Google Earth Engine™ (GEE) for supervised classification of LULC in five municipalities of Roraima State, Brazil. Sentinel-1 (S1) scenes were used along with Landsat 8 (L8) and Sentinel-2 (S2) ones, resulting in five classification approaches: S1 (SD), L8 (ODL), S2 (ODS), S1+L8 (SODL), and S1+S2 (SODS), with the ALOS World 3D dataset (DEM ≈ 30 m) as an auxiliary input. Accuracy was assessed by an error matrix. The SD approach was significantly different (P ≤ 0.01) from the others, with a mean F1-score of 0.80. ODL and ODS had barely perceptible differences (P ≤ 0.1), showing F1-scores of 0.95 and 0.92, respectively. When comparing ODL (F1=0.95) and SODL (F1=0.95), no difference was found (P > 0.1). However, comparing ODS (F1=0.92) and SODS (F1=0.94), there was a significant classification improvement (P ≤ 0.05). In short, the SODL and SODS approaches had the best pixel-based supervised classification performance.


INTRODUCTION
Land-cover defines the physical condition and characteristics of the biotic component at the local, regional, national, or continental scale. It also shows how human needs and actions have modified the environment, giving it uses different from those of its original condition and suitability (Chu, 2020; Hassan et al., 2016).
Understanding land-use and land-cover (LULC) changes of large areas in a joint and dynamic manner is essential to plan urban spaces, assess environmental impacts from anthropic intervention or natural disasters (Gomes et al., 2019; Ishihara & Tadono, 2017), evaluate expansion of agricultural frontiers (Bacarji et al., 2021; Ferreira et al., 2021; Souza et al., 2020), and manage water resources and soil (Lopes et al., 2020; Silva et al., 2021), among many other applications.
LULC can be mapped directly in the field at several scales, adding information with different precision levels. However, this method is excessively demanding in labor, time, and financial resources, which makes it practically unfeasible for large surfaces.
Technological advances in recent decades and the availability of satellite platforms have allowed mapping and monitoring landscape components such as relief, vegetation cover, water resources, and land use (Pandey et al., 2021). Among its advantages, state-of-the-art orbital sensor imagery has multi-temporal availability, great spatial coverage, and fast availability at reduced or even no cost, making LULC mapping easier and cheaper.
There are now data processing workflows integrated with machine learning classification algorithms, which allow land cover to be accurately mapped over large areas using multi-temporal imagery (Floreano & De Moraes, 2021; Ghayour et al., 2021; Gupta et al., 2021; Talukdar et al., 2020). Such advances have helped in creating operational monitoring programs for LULC in many countries.
The Brazilian territory has continental dimensions, with 8,510,345.538 km² (IBGE, 2021). It also has great landscape diversity, making it broadly representative of tropical ecosystems. Therefore, to map LULC at a national or regional scale, the Biome to which the area belongs should be considered, demanding the use of specific classes for each ecological unit.
In Brazil, many studies have focused on obtaining basic information for monitoring spatial-temporal changes at the Biome scale. One of the most recent and complete is the MapBiomas collection, which contains 35 years (1985-2019) of annual data at a 30-m spatial resolution. Its accuracy has substantially improved in terms of the Biomes and new agricultural cover types. MapBiomas is based on optical images, specifically from the Landsat 5/7/8 platforms.
Although the overall average accuracy of LULC time series has improved considerably, these methods still need further development to improve temporal resolution in cloudy areas, such as the state of Roraima in the Brazilian Legal Amazon. Of the fifteen municipalities in Roraima State, Alto Alegre, Boa Vista, Cantá, Normandia, and Bonfim stand out for their importance in agricultural activity (IBGE, 2009; IBGE, 2021); yet, all of them have high cloudiness throughout the year.
A combination of radar with optical imaging has shown promising results, as SAR can penetrate clouds and cirrus clouds (Carneiro et al., 2020; Hirschmugl et al., 2020; Rao et al., 2021). Platforms such as Sentinel-1 (radar) and Sentinel-2 (optical) are a new data-rich source that can be used in LULC mapping to monitor crop phenology, as well as to detect cultivated and irrigated plots, with high spatial and temporal resolutions (Gutiérrez-Vélez & DeFries, 2013; Kou et al., 2015; Tian et al., 2019).
This study aimed to compare free- and open-access images from radar and optical sensors, hosted on the Google Earth Engine™ (GEE) cloud computing platform, for supervised classification of LULC in five municipalities of Roraima State, Brazil.

MATERIAL AND METHODS

Study Area Description
Roraima State is the northernmost state of Brazil and extends from latitude 1°35'11'' S to 5°16'20'' N (Figure 1). It occupies an area of 223,644.527 km², and more than 80% of its territory is in the northern hemisphere (IBGE, 2021). Roraima has a diverse phyto-physiognomy, with forests being the largest portion, followed by savanna zones (known as Cerrado), a peculiar characteristic of the Amazon in the northeast and center-northeast, occupying 17% of the state (IBGE, 2009).
Regarding hydrography, the Branco River is the main tributary of the Negro River and is formed from the confluence of the Uraricoera and Tacutu rivers. The rivers Mucajaí, Água Boa, Univini, Catrimani, and Xeruini are the main tributaries on the right bank and the Anauá River on the left bank. The Branco River basin virtually covers the whole state of Roraima; however, three other basins are also important. Those are of the Jauaperi and Jufari rivers, also tributaries of the Negro River, and the Jatapu, a tributary of the Amazon River (IBGE, 2009).
According to Köppen's classification, the main climate is divided into three groups: Aw (tropical savanna climate), Am (tropical monsoon climate, between April and September with a peak in June), and Af (tropical humid climate, between March and August with a peak in May). The local rainfall regime is split into two well-defined periods, a rainy season (May-August) and a dry period (September-April), with an annual rainfall of 1,925 ± 339.7 mm (Barni et al., 2020).
Land-use and land-cover mapping using a combination of radar and optical sensors in Roraima – Brazil. Engenharia Agrícola, Jaboticabal, v.42, n.2, e20210142, 2022.

Sensors, orbital datasets, and preprocessing
In this study, we used a collection of images from three sensors (Table 1). One is a synthetic aperture radar (SAR) operating in the C-band (5.405 GHz central frequency, corresponding to a 5.55-cm wavelength), while the other two are optical sensors, the Multispectral Instrument (MSI) and the Operational Land Imager (OLI). The Sentinel-1 satellite constellation provides the SAR images, and the Sentinel-2 and Landsat 8 satellites provide the optical images. The Digital Elevation Model (DEM) known as "ALOS World 3D-30m (1 arcsec)" was used as an auxiliary dataset (Tadono et al., 2016). All orbital data were acquired and processed using the GEE cloud computing platform (https://earthengine.google.com), which allows handling and processing large amounts of data with high computing power (Gorelick et al., 2017). The images cover the study area from 2018 to 2019. Table 2 summarizes the satellite specifications and the main characteristics of the dataset available for the study area.

Sentinel-1
A total of 1,407 Sentinel-1 (S1A/S1B) images were analyzed. These are high temporal and spatial resolution products in Interferometric Wide Swath (IW) and Level-1 Ground Range Detected High Resolution (GRDH) modes, with incidence angles from 30.74° to 46.19° (Table 2). These scenes are projected on a regular 10 m × 10 m grid, with a reference spatial resolution of 20 m × 22 m (range and azimuth directions, respectively).
GEE pre-processing includes applying an orbit file, GRD border noise removal, thermal noise removal, radiometric calibration to radar brightness (σ°), and Range-Doppler terrain correction to geocode the scenes to Universal Transverse Mercator (UTM) coordinates on the WGS-84 ellipsoid (UTM-WGS84). A speckle filter (circular kernel with a smoothing radius of 50) was then applied to the mosaic to remove backscatter noise (the salt-and-pepper effect characteristic of SAR scenes) (ESA, 2021; Mullissa et al., 2021).
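The focal-mean speckle filtering step can be sketched outside GEE as a plain array operation. The snippet below is a minimal stand-in for a circular focal-mean kernel, assuming the radius is given in pixels (the paper's radius of 50 refers to the GEE kernel); the function and parameter names are illustrative.

```python
import numpy as np
from scipy import ndimage

def focal_mean_speckle_filter(backscatter_db, radius_px):
    """Smooth SAR backscatter with a circular focal mean.

    A simple stand-in for a GEE focal_mean with a 'circle' kernel;
    the radius here is in pixels (an assumption for illustration).
    """
    # Build a boolean circular kernel of the given pixel radius
    y, x = np.ogrid[-radius_px:radius_px + 1, -radius_px:radius_px + 1]
    kernel = ((x**2 + y**2) <= radius_px**2).astype(float)
    kernel /= kernel.sum()  # normalize so the output is a mean
    # Each pixel becomes the mean of its circular neighborhood
    return ndimage.convolve(backscatter_db, kernel, mode="nearest")
```

Averaging the neighborhood suppresses the multiplicative speckle noise at the cost of some spatial detail, which is the usual trade-off of this filter family.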

Sentinel-2
For the Sentinel-2 (S2A/S2B) dataset, a total of 2,891 images over the study area were available on the GEE platform for 2018-2019. Of these, 382 met the criterion of less than 30% cloudiness and were processed; they have spatial resolutions of 10 and 20 m and a spectral resolution of 10 bands (Table 2).
Level-2A products were used as the source. These are Bottom-Of-Atmosphere (BOA) reflectance products, radiometrically and geometrically corrected, including orthorectification with the Sentinel-2 Toolbox (ESA, 2021).
Moreover, a cloud-masking function known as the cloud probability collection, or S2cloudless (Ke et al., 2017), was applied. This algorithm, built with the sentinel2-cloud-detector library, is based on a highly efficient gradient boosting decision tree (LightGBM). Data from 10 Sentinel-2 bands (B1-B5 and B8-B12) were resampled by bilinear interpolation to 10-m resolution before applying the algorithm. This preprocessing removed clouds, cirrus clouds, and shadows, using a cloud scoring algorithm to mask polluted pixels as described in ESA (2021), and the scenes were projected to UTM-WGS84 coordinates.
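The probability-based masking itself reduces to thresholding the per-pixel cloud-probability band. The snippet below is an illustrative stand-in, not the S2cloudless implementation; the 50% threshold and the function name are assumptions, since the paper does not report the threshold used.

```python
import numpy as np

# Hypothetical threshold: cloud-probability products are commonly
# thresholded around 40-65%; 50 is an illustrative choice.
CLOUD_PROB_THRESHOLD = 50

def mask_clouds(reflectance, cloud_probability, threshold=CLOUD_PROB_THRESHOLD):
    """Set cloudy pixels to NaN using an S2cloudless-style probability band.

    reflectance: (bands, rows, cols) surface-reflectance stack
    cloud_probability: (rows, cols) per-pixel cloud probability in percent
    """
    cloudy = cloud_probability > threshold
    masked = reflectance.astype(float).copy()
    masked[:, cloudy] = np.nan  # drop polluted pixels from every band
    return masked
```

Masked (NaN) pixels are then simply excluded when the multi-temporal mosaic is composited.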

Landsat 8
A series of 601 Landsat-8 OLI images (Table 2) was evaluated before selecting those least affected by clouds (filtering), resulting in 278 scenes. All of these belong to the Landsat-8 Tier 1 SR (T1) collection, which was accessed on GEE for analysis. The T1 collection contains the highest quality Level-1 Precision Terrain (L1TP) data, considered suitable for time series analysis. This dataset comprises atmospherically corrected surface reflectance that has been orthorectified, geo-registered (terrain corrected), and projected to UTM-WGS84 coordinates.
A per-pixel mask function, derived from the CFmask algorithm for this dataset of the T1 collection, was applied to handle the presence of clouds, cloud shadows, and cirrus clouds (Foga et al., 2017).
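CFmask results reach the user as per-pixel QA bit flags, so masking reduces to bitwise tests. The sketch below assumes the Landsat Collection 2 QA_PIXEL bit layout (cirrus = bit 2, cloud = bit 3, cloud shadow = bit 4); the paper's Tier 1 SR collection may expose a differently laid-out QA band, so treat the bit positions as assumptions.

```python
def cfmask_clear(qa_pixel):
    """Return True for pixels free of cloud, cloud shadow, and cirrus.

    Bit positions follow the Landsat Collection 2 QA_PIXEL convention
    (an assumption; check the collection's own documentation).
    Works on plain ints and on NumPy arrays alike.
    """
    CIRRUS, CLOUD, SHADOW = 1 << 2, 1 << 3, 1 << 4
    polluted = qa_pixel & (CIRRUS | CLOUD | SHADOW)
    return polluted == 0
```

Applied per pixel, this keeps only observations where none of the three contamination flags is set.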
Regarding sensor selection and the orbital datasets used, five supervised pixel-based classification approaches were established to achieve the research goal (Table 3).

Reference classification (MapBiomas collection) and land-use and -cover classes used
Land-use and -cover was mapped using the Collection 5-derived product for the period 1985-2019, updated in August 2020 (https://mapbiomas.org/), with a 30-m spatial resolution. This first approximation was used to select the classifier training areas and define the LULC classes, resulting in 10 classes (Table 4): Forest Formation, Forest Plantation, Non-Forest Formation in Wetland, Grassland Formation, Pasture, Agriculture, Beach (sandbanks), Savanna, Urban Infrastructure, and Water.
A total of 36 field inspections (ground truth) were carried out in the five municipalities during the two years studied, identifying in situ areas corresponding to each selected class. Geographical coordinates were taken at these sites using an Android cell phone application (As Minhas Coordenadas GPS™). Spectral signatures of the actual cover were obtained from these sites (training), along with subsequent verification of classification accuracy (validation). A selection criterion was applied to training areas under the following conditions: a) easy-access areas, preferably interurban roads, rural roads, and, as a last resort, internal property access; b) areas with the presence of at least three of the classes to be classified; c) if the area was being cultivated, preferably with crops representative of those produced in the state, that is, grains and/or cereals.
In areas of difficult access, whether due to road limitations or to circulation restrictions under Covid-19 pandemic regulations, land covers were identified using false-color and true-color composites of Sentinel-2 imagery at 10-m spatial resolution, together with the researchers' knowledge of the study area and high-resolution images available from the Google Earth© and Bing Maps© online services.
To avoid overfitting, the data were pseudo-randomly split into training-validation proportions of 60-40%, 70-30%, and 80-20%; the best model fit was observed with the 70-30% proportion. After validation, the statistics required to validate and measure classification reliability were generated. The training areas, selected as polygon vectors, were uniformly distributed over the collected images so as to represent each class to be classified.
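The pseudo-random split can be sketched as a stratified shuffle, so that every class stays represented in both subsets. This is an illustrative reconstruction, not the paper's exact procedure; the function name and seed are assumptions.

```python
import numpy as np

def stratified_split(labels, train_frac=0.7, seed=42):
    """Pseudo-randomly split sample indices per class (e.g. 70-30%).

    Returns (train_idx, val_idx). Splitting within each class keeps
    all LULC classes represented in both sets. The seed is illustrative.
    """
    rng = np.random.default_rng(seed)
    train_idx, val_idx = [], []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)                       # shuffle this class's samples
        cut = int(round(train_frac * idx.size))
        train_idx.extend(idx[:cut])
        val_idx.extend(idx[cut:])
    return np.array(train_idx), np.array(val_idx)
```

Changing `train_frac` to 0.6 or 0.8 reproduces the other two proportions the authors compared.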
The classification approaches were statistically compared in pairs with a Sign Test, using the BSDA package version 1.2.0 in the R environment (Mangiafico, 2016). The p-value associated with F1 was obtained, i.e., the probability that the observed F1 differences would arise under the null hypothesis (H0: the median of the paired F1 differences in the population from which the approach samples were drawn is equal to zero).
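The pairwise Sign Test reduces to a binomial test on the signs of the paired per-class F1 differences. The sketch below mirrors what R's BSDA sign test computes, here with SciPy; it illustrates the statistic rather than reproducing the authors' script.

```python
from scipy.stats import binomtest

def paired_sign_test(f1_a, f1_b):
    """Two-sided sign test on paired per-class F1 scores.

    Under H0 the median of the paired differences is zero, so positive
    and negative differences are equally likely (p = 0.5). Ties are
    dropped, as is conventional for the sign test.
    """
    diffs = [a - b for a, b in zip(f1_a, f1_b) if a != b]
    n_pos = sum(d > 0 for d in diffs)
    return binomtest(n_pos, n=len(diffs), p=0.5,
                     alternative="two-sided").pvalue
```

For example, if one approach beats the other on all ten classes, the two-sided p-value is 2/2^10 ≈ 0.002, comfortably below the 0.01 level used for the SD comparisons.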

Classification algorithm
The non-parametric machine-learning Random Forest classifier was used for the supervised classification (Breiman, 2001), a widely used high-precision classifier. Its robustness lies in the fact that it combines the results of multiple decision trees, each built from a random subset of the training set. The algorithm classifies each pixel individually, producing the final classification by majority voting. In other words, each decision tree is generated from a different subset of training samples in each iteration, building multiple uncorrelated trees, which may not contain all classes in every tree (Gareth et al., 2013).
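The majority-vote behavior described above can be illustrated with a generic Random Forest implementation. The sketch below uses scikit-learn as a stand-in for GEE's classifier; the synthetic data, number of bands, and tree count are assumptions, since the paper does not report its hyperparameters.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the training set: each row is one pixel's
# feature vector (e.g. VH, VV, optical bands, NDVI, elev, slope) and
# each label one of the ten LULC classes. All values are illustrative.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(300, 6))       # 6 hypothetical bands
y_train = rng.integers(0, 10, size=300)   # 10 LULC classes

rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)

# Per-pixel prediction by majority vote across the decision trees
predicted = rf.predict(X_train)
# Variable importance: the per-band ranking discussed in the Results
importance = rf.feature_importances_
```

In scikit-learn the importances are normalized to sum to 1, whereas GEE reports unnormalized scores (such as the 1,852.7 quoted in the Results); the ranking, not the scale, is what matters.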

RESULTS AND DISCUSSION
In all classification approaches except SD, the elev (elevation) band derived from the DEM had the highest variable-importance score, the measure the algorithm uses to rank inputs in the classification (Behnamian et al., 2017). In the SD scenario, by contrast, slope was more important than elevation, with scores of 1,852.7 (S1/A) and 1,435.5 (S1/B), respectively (Figure 2 [C, D]). This suggests that, in the case of radar, discrimination between the evaluated classes is driven by backscatter (VH, VV) and terrain slope rather than by elevation.
The foregoing demonstrates the great utility of incorporating a DEM into a supervised classification. In this regard, Hurskainen et al. (2019) found that geospatial datasets used as classification aids, such as topography, soil characteristics, settlement patterns, watercourses, and vegetation phenology, can significantly increase the discriminatory potential of LULC categories, improving classifications in the context of complex heterogeneous landscapes.

Spectral behavior of the land-use and -cover classes
Sentinel-1
Figure 2 [A, B] shows the backscatter values in decibels (dB) for the thematic categories identified using the SAR collection. In general, the Water and Beach classes showed the lowest intensity among the ten categories for both polarizations and both orbits. For Water, the backscatter values in the Descending and Ascending orbits were -29.78 dB and -29.80 dB for the VH band, and -21.10 dB and -16.45 dB for the VV band, respectively. For Beach, the intensities were -27.47 dB (VH) and -17.45 dB (VV) for S1/A, and -27.47 dB (VH) and -15.07 dB (VV) for S1/B. Such low backscatter arises because these land-cover classes are flat, smooth (specular) surfaces that bounce the SAR signal away from the radar antenna (ESA, 2021).
In contrast, the highest backscatter corresponded to the Forest Formation class, with -13.24 dB and -13.54 dB (VH) and -9.31 dB and -7.18 dB (VV), and to the Forest Plantation class, with -15.06 dB and -15.76 dB (VH) and -9.31 dB and -10.01 dB (VV), in the two orbits, respectively. These surfaces behave as rough targets, producing volume scattering.
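The backscatter values quoted above are in decibels, which relate to the linear sigma-nought by a factor of 10·log10. A minimal conversion sketch (function names are illustrative):

```python
import math

def to_db(sigma0_linear):
    """Convert linear radar backscatter (sigma-nought) to decibels."""
    return 10.0 * math.log10(sigma0_linear)

def to_linear(sigma0_db):
    """Inverse conversion, e.g. needed before averaging backscatter."""
    return 10.0 ** (sigma0_db / 10.0)
```

For example, the roughly -30 dB recorded over Water corresponds to a linear σ° of about 0.001, an order of magnitude weaker than the forest returns near -13 dB.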
As has been reported by Carneiro et al. (2020), Hirschmugl et al. (2020), and Rao et al. (2021), the classes Water, Forest Formation, Forest Plantation, and Grassland can be easily distinguished using the VH and VV polarizations individually. Conversely, the classes Pasture, Agriculture, and Savanna are not easily differentiated, as the average range of values among them is narrower for both VH and VV polarization.
The combined revisit time of the COPERNICUS/S1_GRD image collection (S1/A and S1/B) over the study area was between 1 and 3 days, which is a characteristic of this constellation in its passage over latitudes close to the equator (ESA, 2021). Moreover, the backscatter values obtained in the two analyzed orbits were highly coincident, which is why we decided to use only the Descending orbit (S1/A) to combine with the optical sensors onboard Landsat-8 and Sentinel-2 in the classification.
Landsat 8
Figure 3 [A] displays the spectral profile of the L8 bands and NDVI used, while Figure 3 [B] shows the importance ranking of each band in obtaining the ten cover classes. The visible bands (RGB) express the lowest relative pixel values, and differentiating cover classes using these bands alone is difficult because they are spectrally similar. Among them, B03 is the least important according to the RF variable importance, with a value of 167.2 (dimensionless). Even so, these bands are useful in identifying classes made up of plant covers because they are related to chlorophyll and other pigments present in leaves (Pelletier et al., 2016).
Although band B10 (thermal) shows a high importance value (231.7), it does not discriminate well between the classified categories. The NIR, SWIR1, and SWIR2 bands have great potential for spectral differentiation between classes, with importance values of 222.5, 214.1, and 196.3, respectively. The NDVI also had good discrimination potential for the categories studied, with an importance value of 212. The visible (1, 2, 3), NIR, and SWIR (1, 2) bands yield reflectance measurements that allow optimal differentiation between vegetation-related classes (Hernandez et al., 2020; Phiri et al., 2020).
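NDVI itself is a simple per-pixel band ratio; a minimal sketch follows (for L8 OLI the inputs would be bands B5 and B4, for S2 MSI bands B8 and B4):

```python
import numpy as np

def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red), computed per pixel.

    The errstate guard avoids warnings on zero-sum pixels
    (e.g. fill values), which come out as NaN.
    """
    nir = nir.astype(float)
    red = red.astype(float)
    with np.errstate(divide="ignore", invalid="ignore"):
        return (nir - red) / (nir + red)
```

Dense green vegetation pushes the index toward +1, while water and bare surfaces sit near or below zero, which is why the band helps separate the vegetation-related classes.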

Sentinel-2
NDVI showed an importance value of 253.8 in the classification, having a greater effect in differentiating the Water, Forest Formation, Forest Plantation, Beach, and Urban classes, as reported by Da Silva et al. (2020). Comparing the Landsat 8 and Sentinel-2 results, the latter had better spectral behavior in the classification, likely influenced by its narrower bandwidths and better spatial resolution. The classes related to forest vegetation, lower native vegetation, agriculture, urban infrastructure, and water were clearly defined with the NIR and SWIR channels of S2. Mandanici & Bitelli (2016) reported similar results.

Table 5 summarizes the accuracy metrics derived from the RF classification approaches. The overall evaluation of each approach is expressed by the OA and Kappa index, while the UA, PA, F1, CE, and OE metrics give more detail for each classified class.

Classification accuracy
In the SD approach, OA was 97.18% and the Kappa index 0.88 (dimensionless). For ODL and ODS, these metrics were 99.56% and 0.98, and 99.33% and 0.97, respectively. The radar and optical classifications showed a notable difference, whereas the difference between ODL and ODS was practically imperceptible. When evaluating how adding S1 improved the L8 and S2 classifications, SODL had an OA of 99.71% and a Kappa index of 0.99, while for SODS these values were 99.53% (OA) and 0.98 (Kappa). This represents OA increases of 0.15 and 0.20 percentage points for SODL and SODS, respectively, and a Kappa increase of 0.01 for both.
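All the metrics discussed here derive from the error matrix. The sketch below computes OA, Kappa, and the per-class metrics, assuming the common rows-as-reference, columns-as-classification orientation (the paper does not state which convention its matrices use).

```python
import numpy as np

def accuracy_metrics(cm):
    """OA, Kappa, and per-class UA/PA/F1/CE/OE from an error matrix.

    cm[i, j] counts validation pixels of true class i predicted as
    class j (rows = reference, columns = classification - an assumed
    convention).
    """
    cm = cm.astype(float)
    total = cm.sum()
    oa = np.trace(cm) / total                        # overall accuracy
    # Kappa: agreement beyond chance, from the row/column marginals
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total**2
    kappa = (oa - pe) / (1.0 - pe)
    ua = np.diag(cm) / cm.sum(axis=0)                # user's accuracy
    pa = np.diag(cm) / cm.sum(axis=1)                # producer's accuracy
    f1 = 2 * ua * pa / (ua + pa)                     # per-class F1
    ce, oe = 1.0 - ua, 1.0 - pa                      # commission/omission
    return oa, kappa, ua, pa, f1, ce, oe
```

Note the complementarity used throughout this section: CE = 1 - UA and OE = 1 - PA, so a drop in CE for a class is exactly a rise in its user's accuracy.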
At the overall level, the advantage of adding radar images to the optical ones was not evident. However, when examining some classes in detail, the difference became more noticeable. The classes with the highest CE and OE were Urban (9) and Wetland (3).
In the ODL scenario, the Urban class had an OE of 18.68% from pixels assigned to other classes, whereas in SODL the OE was reduced to 10.81%, improving the classification of urban infrastructure. A similar situation occurred in the ODS and SODS scenarios: the Urban class had a CE of 52.27% for ODS, falling to 44.80% in SODS.
Regarding the Wetland class, the CE in ODL was 15% when predicting pixels in this class, while in SODL it was higher (17.02%). Likewise, the Wetland class showed a CE of 16.95% for ODS, increasing to 28.95% in SODS. The best-classified classes, both with S2 and L8 alone and with S1 added, were Forest Formation and Beach, with CE values below 0.01%. In the Water class, no OE was observed in the ODS, SODL, and SODS scenarios.

Table 6 displays the sign test applied to F1, which allowed us to verify statistical differences among the approaches. The SD approach was significantly different (p ≤ 0.01) from the others, reflecting its worst performance in the classification.
ODL and ODS, both optical, showed barely perceptible differences (p ≤ 0.1). For ODL, the combination of L8 with S1 (SODL) had no significant effect (p > 0.1); under these conditions, classification did not improve when images from these platforms were combined. However, for ODS, the combination of S2 and S1 (SODS) improved classification significantly (p ≤ 0.05). Comparing the SODL and SODS approaches, the combination of S2 and S1 had better classification performance than that of L8 and S1, the two being different from each other (p ≤ 0.05).

Maps of land-use and -cover obtained with the RF classifier
Although classifications are quantitatively evaluated, the final product must also undergo a qualitative analysis by the specialist who produced it and by its intended users. Another influencing criterion is how well those involved in the work know the study area. Figure 5 shows the maps of LULC types classified at the pixel level, with a 30-m spatial resolution.
Clip S2 4-3-2/A corresponds to an area with a predominance of forest plantation (Acacia mangium), whose borders are covered with savanna and grasslands. In clip S2 4-3-2/B, agricultural areas with five center pivots and other crops, along with a watercourse (the Cauamé River), are clearly seen. Clip S2 4-3-2/C is mainly composed of forest vegetation, crop areas, water (the Branco River), sandbanks, savanna, and grasslands. Lastly, clip S2 4-3-2/D mostly shows the urban layout of Boa Vista, the capital of Roraima State, flanked by the Branco River.
Both the SODL and SODS approaches were qualitatively the best at defining all 10 evaluated cover classes. However, ODL tended to misclassify exposed land and savannas as urban areas. With ODS, urban areas were classified as savanna, with only paved roads highlighted. Lastly, SD showed a high degree of classification confusion, hindering proper differentiation of the land classes.

CONCLUSIONS
Our results indicate that the fusion of multitemporal SAR and optical data increases overall accuracy for LULC mapping compared to the use of temporal optical and SAR data independently. The LULC classes Water, Forest Formation, Forest Plantation, and Grassland are easily differentiated with the use of SAR. The RGB bands of optical sensors are highly sensitive to the spectral response of the vegetation, Water, and Urban classes.
The incorporation of SAR images into classification with optical time series increases the possibility of classifying areas highly affected by clouds over long periods, as is the case of the Brazilian Legal Amazon. This is due to variations in backscattering among covers that differ in geometry, roughness, and moisture content, the main physical factors inherent in any surface object or target detected by radar.
High spatial resolution optical images, such as those from Sentinel-2, are useful for classifying areas of difficult access, at least at a first class level, though limited by the presence of clouds. Conversely, SAR images allow classifying LULC categories of particular interest, such as water bodies, flooded areas, or urban infrastructure.
The Google Earth Engine™ cloud computing platform allows the analysis of many free- and open-access images from optical and radar sensors, yielding products for each of the approaches studied. Overall, for the LULC mapping carried out with the method presented in this study, the SODL and SODS approaches showed the best visual results in the final product.