Comparison between oil spill images and look-alikes: an evaluation of SAR-derived observations of the 2019 oil spill incident along Brazilian waters

Three SAR-derived observations of dark surface patches along the Northeastern Brazilian coastline in 2019 were misreported in the Brazilian media as oil spills. In fact, these observations were false positives, or look-alikes. This paper therefore evaluates these look-alike classes technically by analyzing image attributes found to be helpful for identifying ocean targets, including oil spills, rain cells, biofilms, and low wind conditions. We use image augmentation to extend our dataset and build the probability density function curves. The processing includes image segmentation, optimal attribute extraction, and classification with random forest classifiers. Our results are compared with the outputs of the open-source oil spill detection system and patch classifier methodology called “RIOSS.” Analysis of the feature probability density functions based on optimal attributes is promising, since we could capture most of the false positive targets in the three SAR images reported in 2019. The only exception was the biofilm slick observed on October 28th, which RIOSS mistakenly classified as a low wind region with oil spots. This pitfall is acceptable at this project stage, since we had only five biogenic film samples to train the algorithm.

INTRODUCTION
Vasconcelos et al. (2020), using a bibliometric analysis with a systematic review, revisited the relevant literature on oil spill detection and mapping from the last fifty years. Their results show that oil detection at sea has evolved significantly in recent decades, an evolution closely tied to technological advances in detection and remote sensing data acquisition methods. Among the countries that contributed most to this field of science, China, Norway, the United States, and Canada stood out as the largest producers and disseminators of information.
Spaceborne Synthetic Aperture Radar (SAR) is more efficient for monitoring oil spills and natural seeps over the ocean's surface than other commonly used satellite oceanography techniques, such as optical remote sensing, thermal infrared imaging, and passive microwave sensors (Fingas & Brown 2018, Alpers et al. 2017, Chaturvedi et al. 2020). Owing to its sensor characteristics, it is widely regarded as an excellent sensor for monitoring large maritime areas at regular intervals, day and night, and in foul weather. SAR transmits electromagnetic pulses that "illuminate" the ocean surface and receives a backscattered echo described by Bragg scattering theory. In calm (rougher) seas, most of the transmitted energy is reflected away from (towards) the radar, and the signal backscattered towards it is very low (high). This backscattered pulse is known as the "radar echo." The lower (higher) the echo, the darker (brighter) a target appears on a SAR image. Oil films damp the capillary and short gravity waves responsible for Bragg scattering, which is why oil spills appear as "dark spots" in radar imagery. Nevertheless, distinguishing actual oil areas from look-alikes in SAR images is still challenging (Fingas & Brown 2018, Alpers et al. 2017), as these dark formations can be associated with oil spills, biogenic films, rain cells, low wind areas, and ship wakes (Singha et al. 2012, Chen et al. 2017, Alpers et al. 2016, Eldhuset 1996, Di Carro et al. 2018, Espedal & Wahl 1999). As mentioned before, oil spills appear as dark spots in SAR images because their viscous surface film damps the Bragg waves at the ocean's surface and thus reduces the radar echo. According to Fingas & Brown (2018) and Alpers et al. (2017), remote sensing is nowadays an essential component of oil spill response.
Users expect a spill's extent and location to be mapped so that countermeasures can be implemented to minimize pollution and ships responsible for illegal discharges can be held accountable. Indeed, separating real oil spills from false positives, or look-alikes, is still a hot topic in satellite oil detection (Brekke & Solberg 2005, Svejkovsky et al. 2016).
Biogenic films behave similarly: they also damp the Bragg waves, diminishing the radar echo. These films are natural surface slicks produced by fish or plankton activity (Alpers & Hühnerfuss 1988, 1989). Indeed, according to Alpers et al. (2017), the most challenging task is to separate radar signatures caused by mineral oil spills from those caused by biogenic surface films. Rain also leaves footprints on the sea surface that sometimes become visible on SAR images (Alpers et al. 2016). Heavy rain often occurs in the form of rain cells, especially in the tropics and subtropics. They can be seen on satellite images as bright patchy areas of increased sea surface roughness, caused by the stronger sea surface winds associated with downdrafts. However, rain cells are not always associated with downdrafts. They can also show dark patches in their centers, where rain-induced turbulence at the ocean's surface has damped the Bragg waves, i.e., the short surface waves responsible for radar backscattering (Melsheimer et al. 1998).
As with oil spills and biogenic films, low wind areas also produce look-alikes in radar imagery when the wind speed falls below the 1.5 m/s limit (Fingas & Brown 2018, Alpers et al. 2017). Gade et al. (1998) performed a series of controlled slick experiments with multifrequency/multipolarization SAR in three bands (L-, C-, and X-band) and found that, at high wind speeds (8-12 m/s), the ratio of the radar backscatter from a slick-free water surface to that from a slick-covered one (the damping ratio) is smaller than at low-to-moderate wind speeds (4-7 m/s).
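The damping ratio above is a ratio of linear backscatter intensities, whereas the SAR values in this paper are quoted in dB. The minimal sketch below converts between the two; the input values are hypothetical, and in dB the ratio reduces to a simple difference between the slick-free and slick-covered levels.

```python
import numpy as np


def db_to_linear(sigma0_db):
    """Convert backscatter (sigma0) from decibels to linear power units."""
    return 10.0 ** (np.asarray(sigma0_db, dtype=float) / 10.0)


def damping_ratio(sigma0_clean_db, sigma0_slick_db):
    """Damping ratio = slick-free backscatter / slick-covered backscatter.

    Inputs are sigma0 values in dB. Returns the ratio in linear units and in dB
    (the dB form equals clean_db - slick_db).
    """
    ratio = db_to_linear(sigma0_clean_db) / db_to_linear(sigma0_slick_db)
    return ratio, 10.0 * np.log10(ratio)


# Hypothetical values, not measurements from this study or from Gade et al. (1998)
ratio, ratio_db = damping_ratio(-15.0, -25.0)
print(f"damping ratio: {ratio:.1f} ({ratio_db:.1f} dB)")
```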
According to Reed & Milgram (2002), SAR observations of ship wakes usually show a dark trailing centerline region, a bright V-shaped feature aligned at some angle to the ship's path, and, sometimes, either the transverse or the diverging waves of the Kelvin-wave pattern. The dark region is usually associated with relatively low radar backscatter, whereas the bright line suggests a region of enhanced radar echo (Eldhuset 1996).
Therefore, classifying actual oil areas and separating them from look-alikes in SAR images remains a challenging task. Indeed, three SAR-derived observations of dark surface patches along the Northeastern Brazilian coastline, from the period when viscous crude oil reached over 200 locations along the coast, were erroneously reported in the Brazilian media (print and digital) as oil spills. To our knowledge, these are the only three satellite images displayed in the media. They were recorded during the austral winter and spring of 2019 (19/07/2019, 11/10/2019, and 28/10/2019). Unfortunately, these observations were false positives, or look-alikes. The purpose of this paper is to analyze these three look-alikes using an oil spill detection methodology called the "RIOSS algorithm," proposed by Conceição et al. (2021). Knowledge about SAR-derived ocean targets is needed before press releases and mainstream short communications are issued.

Satellite Data
Synthetic Aperture Radar (SAR) image data from the Sentinel-1A and 1B satellites were acquired from the European Space Agency (ESA) through the Copernicus Open Access Hub portal (https://scihub.copernicus.eu/). Sentinel-1, the first mission of the Copernicus Program satellite constellation, is on a sun-synchronous, near-polar (98.18°) orbit with a 12-day repeat cycle completing 175 orbits per cycle. Its SAR operates in the C-band, provides images in all weather, day or night, and has a spatial resolution down to 5 m and a swath of up to 400 km. The first satellite, Sentinel-1A, was launched on April 3rd, 2014, and Sentinel-1B on April 25th, 2016. For further details, the reader should refer to https://sentinel.esa.int/web/sentinel/missions/sentinel-1.
These SAR images are available in Single Look Complex (SLC) format, with a 5 m × 20 m spatial resolution (range × azimuth) and a 250 km swath. They are Level-1 products, geo-referenced using the satellite orbit and attitude data and provided in zero-Doppler slant-range geometry. Level-1 SLC data, which are complex images carrying phase and amplitude, lack several corrections and pre-processing steps needed for the following analyses. Therefore, the pre-processing of the SAR data included orbit correction, radiometric calibration to sigma nought (σ0), deburst, and multilooking. After multilooking, the processed images have a nominal resolution of 20 m × 20 m. These steps were performed with the SNAP software (Sentinel toolboxes) before the computational analyses. During the development of the RIOSS code (Conceição et al. 2021), the analyses were centered on radar backscattering values; however, the authors intend to move on to interferometric and polarimetric analyses soon. For this reason, we chose to use SLC data from the first steps of the code.
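The calibration and multilook steps above were done in SNAP. As a rough illustration of what they compute, the sketch below assumes the usual Sentinel-1 Level-1 calibration relation σ0 = |DN|²/A², where A is the sigmaNought look-up table value (here a hypothetical constant array already interpolated to the image grid), and then averages intensity over non-overlapping windows. Orbit correction and deburst are not shown, and the look numbers are illustrative, not the SNAP settings used in this study.

```python
import numpy as np


def calibrate_sigma0(slc, sigma_lut):
    """Sentinel-1 style calibration to linear sigma0: |DN|^2 / A_sigma^2.

    slc: complex SLC samples (DN); sigma_lut: sigmaNought LUT values,
    assumed here to be already interpolated to the same shape as slc.
    """
    return np.abs(slc) ** 2 / sigma_lut ** 2


def multilook(intensity, az_looks, rg_looks):
    """Average intensity over non-overlapping az_looks x rg_looks windows."""
    rows = intensity.shape[0] // az_looks * az_looks
    cols = intensity.shape[1] // rg_looks * rg_looks
    trimmed = intensity[:rows, :cols]  # drop edge pixels that do not fill a window
    return trimmed.reshape(rows // az_looks, az_looks,
                           cols // rg_looks, rg_looks).mean(axis=(1, 3))


def to_db(linear, floor=1e-10):
    """Convert linear sigma0 to dB, flooring zeros to avoid log(0)."""
    return 10.0 * np.log10(np.maximum(linear, floor))


rng = np.random.default_rng(0)
slc = rng.normal(size=(8, 16)) + 1j * rng.normal(size=(8, 16))  # fake SLC burst
lut = np.full(slc.shape, 500.0)                                 # hypothetical LUT
sigma0_db = to_db(multilook(calibrate_sigma0(slc, lut), az_looks=1, rg_looks=4))
print(sigma0_db.shape)
```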
For training the RIOSS random forest model, described by Conceição et al. (2021), 39 Sentinel-1 SLC SAR images were used. A total of 1,138 sub-image blocks from this dataset were analyzed according to the seven target classes and the seven attributes, and the distribution of each selected attribute was expressed as a probability density function (PDF). In addition, three Sentinel-1 SAR images were downloaded for feature analysis and further classification with the RIOSS algorithm. All three are Sentinel-1A SLC images recorded in Interferometric Wide (IW) swath mode during the austral winter and spring of 2019: a) October 11th, 2019 (11/10/2019), recorded at 07h 54min 49s UTC; b) July 19th, 2019 (19/07/2019), recorded at 07h 53min 30s UTC; and c) October 28th, 2019 (28/10/2019), recorded at 08h 04min 30s UTC. Their geographic positions are displayed in Figure 1.

Computational Analyses
The methodology used here is based on Solberg & Solberg (1996) and Conceição et al. (2021). The former used three main steps for detecting oil spills (image segmentation, attribute extraction, and classification), while the latter combines that methodology with an optimized feature space and Random Forest Classifiers (RFC) based on decision trees (e.g., Pal 2005, Gislason et al. 2006). A Decision Tree Classifier (DTC) is the building block of random forest classification models. A DTC partitions the feature space into class domains. It takes a feature vector as input at its root node, where the tree splits into two branches by testing whether one of the vector's features exceeds a threshold learned during training. The input vector keeps walking through branches and nodes until it reaches a leaf, or final node, where it is assigned one of the predefined classes. The idea behind RFC models is to grow many trees, each trained on a random subset of the dataset, with each tree node allowed to choose its splitting feature only from a random subset of m of the selected attributes. The forest output is taken as the mode of its trees' results. Because a single DTC is a weak classifier with typically high variance, combining many of them in an RFC usually yields robust, lower-variance outputs. For a detailed discussion of the DTC and RFC methodologies, the reader should refer to Conceição et al. (2021).
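The tree walk and majority vote described above can be sketched in plain Python. The three trees below are hand-built and hypothetical (feature names borrowed from the attribute list, thresholds invented); training, bootstrap sampling, and the random choice of m candidate features per node are not shown. Note that scikit-learn's RandomForestClassifier averages the trees' class probabilities rather than taking a strict hard-vote mode, which usually, but not always, gives the same answer.

```python
from collections import Counter

# Hypothetical hand-built trees. Internal node: (feature, threshold, left, right);
# a string is a leaf holding the predicted class.
TREE_A = ("bgmean", -20.0, ("gradmean", 0.5, "low wind", "oil"), "sea")
TREE_B = ("gradmean", 0.6, "low wind", "oil")
TREE_C = ("bgmean", -26.0, "low wind", ("gradmean", 1.0, "biofilm", "oil"))


def predict_tree(node, features):
    """Walk from the root to a leaf, going right when the feature exceeds the threshold."""
    while isinstance(node, tuple):
        name, threshold, left, right = node
        node = right if features[name] > threshold else left
    return node


def predict_forest(trees, features):
    """Hard-vote forest output: the mode of the individual tree predictions."""
    votes = Counter(predict_tree(tree, features) for tree in trees)
    return votes.most_common(1)[0][0]


sample = {"bgmean": -24.0, "gradmean": 0.8}  # hypothetical block attributes
print([predict_tree(t, sample) for t in (TREE_A, TREE_B, TREE_C)])  # ['oil', 'oil', 'biofilm']
print(predict_forest([TREE_A, TREE_B, TREE_C], sample))              # oil
```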
Due to the limited number of Sentinel-1 images containing targets of interest, we used image augmentation. This technique is widely used in machine learning to provide a larger sample size: by multiplying the training data sets, it can lead to more accurate models. It transforms the dataset images through spatial variations (Singh et al. 2011), and the results can be used to train or test an algorithm, maximizing what is learned from each new sample. In practice, each image was rotated by 15°, -30°, 60°, and -75°, which, together with the original, nearly quintupled our training and testing dataset.
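A minimal sketch of the rotation-based augmentation, assuming nearest-neighbour resampling about the image centre with the output kept at the input size; the helper names are hypothetical and this is not the RIOSS code. Corners that rotate in from outside the original frame are filled with NaN so they can be masked before attribute extraction.

```python
import numpy as np

ANGLES_DEG = (15, -30, 60, -75)  # rotation angles quoted in the text


def rotate_nearest(image, angle_deg, fill=np.nan):
    """Rotate a 2-D array about its centre (nearest neighbour, same output shape)."""
    rows, cols = image.shape
    cy, cx = (rows - 1) / 2.0, (cols - 1) / 2.0
    theta = np.deg2rad(angle_deg)
    yy, xx = np.indices((rows, cols), dtype=float)
    # Inverse mapping: for each output pixel, find the source pixel it comes from.
    ys = cy + (yy - cy) * np.cos(theta) - (xx - cx) * np.sin(theta)
    xs = cx + (yy - cy) * np.sin(theta) + (xx - cx) * np.cos(theta)
    yi, xi = np.rint(ys).astype(int), np.rint(xs).astype(int)
    valid = (yi >= 0) & (yi < rows) & (xi >= 0) & (xi < cols)
    out = np.full((rows, cols), fill, dtype=float)
    out[valid] = image[yi[valid], xi[valid]]
    return out


def augment(image, angles=ANGLES_DEG):
    """Return the original image plus one rotated copy per angle (5 views by default)."""
    return [image.astype(float)] + [rotate_nearest(image, a) for a in angles]


views = augment(np.random.default_rng(0).normal(-20.0, 2.0, size=(64, 64)))
print(len(views))  # 5
```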
The classification algorithm, developed by Conceição et al. (2021), was trained to identify seven target classes: oil spill, biofilms, rain cells, low wind conditions, ocean's surface, ships, and land cover. The authors named this methodology the "Radar Image Oil Spill Seeker (RIOSS)." The code is written in Python and can be downloaded from GitHub (https://github.com/los-ufba/rioss). For further information on the RIOSS algorithm, the reader should refer to Conceição et al. (2021).
Before classification, the image was split into blocks, and a mixed adaptive threshold segmentation was applied to identify possible dark targets in the SAR images. First, image segmentation was performed with an adaptive threshold, as described by Mera et al. (2012) for separating oil and ocean pixels on SAR images, based on the backscattering level estimated over a 512 × 512 pixel sub-image window. A 512-pixel window was defined as the ideal average size for imaging the primary targets (oil spills and look-alikes) analyzed in the Sentinel-1 images. The result was a 512 × 512 pixel Boolean mask that separates dark-spotted pixels from the background ocean. Then, a set of image features was computed for each block and provided as input to a random forest model tuned to classify that block. According to Conceição et al. (2021), 42 features were initially extracted and grouped into five main categories: shape, complexity, statistics, gradient-based, and texture elements. These attributes were later reduced to 11 to eliminate data redundancy and unnecessary computational cost. They are: fractal dimension based on the pseudo-spectral density function (psdfd), lacunarity (bclac), gradient mean (gradmean), mean, skewness (skew), Shannon entropy (entropy), kurtosis (kurt), the segmentation mask's Shannon entropy (segentropy), the segmentation mask's energy (segener), and background mean (bgmean). The eleventh attribute was the foreground mean (fgmean) in the seven-class model and complexity (complex) in the oil detector. These attributes were then estimated and compared with the usual response of the analyzed phenomena using their PDFs, computed from our augmented image dataset (Figure 2) via kernel density estimation (KDE). According to Conceição et al. (2021), these were the most representative attributes, showing clear separation among the seven selected target classes. As the objective of this work was to show how the PDFs of such relevant attributes allow us to infer the phenomena present in the analyzed images, we selected the PDFs that give the best insights for the analysis discussed here. The reader should refer to their paper for a detailed description of the RIOSS algorithm. Finally, we compared our analyses with the RIOSS model outputs, which used the entire optimized feature space.
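The block-level processing above can be sketched as follows. The threshold rule (block mean minus k standard deviations) is a simple stand-in assumed here, not the adaptive threshold of Mera et al. (2012), and the definitions of the mask entropy and energy (computed from the dark/bright pixel proportions) are likewise assumptions; only a subset of the 11 attributes is shown.

```python
import numpy as np


def segment_block(block_db, k=1.0):
    """Boolean dark-spot mask for one block (True = candidate dark pixel).

    Stand-in rule (an assumption, not Mera et al.'s exact threshold):
    pixels darker than mean - k * std of the block are flagged.
    """
    threshold = block_db.mean() - k * block_db.std()
    return block_db < threshold


def mask_entropy_energy(mask):
    """Shannon entropy (bits) and energy of the dark/bright pixel proportions."""
    p_dark = mask.mean()
    p = np.array([p_dark, 1.0 - p_dark])
    p = p[p > 0]  # 0 * log(0) is taken as 0
    return float(-(p * np.log2(p)).sum()), float((p ** 2).sum())


def block_features(block_db, k=1.0):
    """Subset of the attributes named in the text, under assumed definitions."""
    block_db = np.asarray(block_db, dtype=float)
    mask = segment_block(block_db, k)
    seg_entropy, seg_energy = mask_entropy_energy(mask)
    std = block_db.std()
    centred = block_db - block_db.mean()
    skew = (centred ** 3).mean() / std ** 3 if std > 0 else 0.0
    kurt = (centred ** 4).mean() / std ** 4 - 3.0 if std > 0 else 0.0  # excess kurtosis
    return {
        "mean": float(block_db.mean()),
        "skew": float(skew),
        "kurt": float(kurt),
        "fgmean": float(block_db[mask].mean()) if mask.any() else float("nan"),
        "bgmean": float(block_db[~mask].mean()) if (~mask).any() else float("nan"),
        "segentropy": seg_entropy,
        "segener": seg_energy,
    }


def iter_blocks(image, block=512):
    """Yield non-overlapping block x block tiles (edges that do not fit are dropped)."""
    for r in range(0, image.shape[0] - block + 1, block):
        for c in range(0, image.shape[1] - block + 1, block):
            yield image[r:r + block, c:c + block]
```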
According to Conceição et al. (2021), the Scikit-learn Python machine learning library (Pedregosa et al. 2011) was used to create and train the decision tree and random forest models. Their random forest model had 60 decision trees with a maximum depth of seven to avoid overfitting. For the seven-class (oil, biofilm, rain, wind, sea, terrain, and ship) image labeling problem, the authors reported accuracies of 79% and 85% for the decision tree and random forest models, respectively. For the oil detector (a two-class problem), the decision tree reached 86% precision, which increased to 93% with the random forest model using just the 11 features.
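The reported configuration (60 trees, maximum depth of seven) maps directly onto scikit-learn's RandomForestClassifier. The sketch below trains it on synthetic two-class data with 11 features standing in for the oil detector's feature table; the data are random, so the resulting scores will not reproduce the values reported by Conceição et al. (2021).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic stand-in for an 11-feature table: label 1 = oil, 0 = not oil.
n_samples, n_features = 600, 11
y = rng.integers(0, 2, size=n_samples)
X = rng.normal(size=(n_samples, n_features)) + y[:, None]  # oil rows shifted by +1

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# 60 trees with a maximum depth of 7, as reported for RIOSS.
model = RandomForestClassifier(n_estimators=60, max_depth=7, random_state=0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)  # TP / (TP + FP) for label 1
print(f"accuracy={accuracy:.2f} precision={precision:.2f}")
```

Swapping in a real feature table only requires replacing X and y; the model settings stay the same.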

Overall Performance
The seven attributes contributed relevantly to the separation between target classes, especially when analyzed together (Figure 4).
Although the first three panels of Figure 4 (a to c) showed a tendency to separate oil-related surface signature peaks (red, except in Figure 4c) from biofilm (green), rain cells (light blue), low wind conditions (grey), and ocean's surface (royal blue), these last four classes could not easily be separated from each other.
Although the last four panels (Figures 4d to 4g) gave different attribute values depending on the choice of window size (not shown here), the normalized Shannon entropy (NSE) showed that ship and land cover could be easily distinguished from the remaining target classes. Although the NSE should range from 0 to 1, Figure 4d spans -1 to +1 because infinite computed NSE values were replaced here by -1 (land cover and ship curves). Both targets usually had higher backscatter (i.e., a stronger echo) than their ocean counterparts. In our case, this proved to be a valuable parameter for ship detection far from land. This kind of correlation had been previously reported by Shirvany et al. Like complexity, the fractal dimension (FD) separated oil, land cover, and ship signatures from the other four target curves (Figures 4e and f). Based on these two attributes, ocean's surface images were often easy to separate from the other classes (Figures 4e and f).
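A minimal sketch of a normalized Shannon entropy and of the replacement of non-finite values by -1 described above. The histogram-based definition and the normalization by log2 of the number of bins are assumptions; the text does not state how infinite values arise in RIOSS, so the replacement is simply applied to any non-finite result.

```python
import numpy as np


def normalized_shannon_entropy(values, n_bins=64):
    """Histogram entropy in bits divided by its maximum, log2(n_bins), so 0 <= NSE <= 1."""
    hist, _ = np.histogram(np.ravel(values), bins=n_bins)
    p = hist[hist > 0] / hist.sum()
    return float(-(p * np.log2(p)).sum() / np.log2(n_bins))


def replace_non_finite(nse_values, fill=-1.0):
    """Replace inf/NaN entries by `fill` (-1 in the text), keeping finite ones."""
    nse_values = np.asarray(nse_values, dtype=float)
    return np.where(np.isfinite(nse_values), nse_values, fill)


block = np.random.default_rng(0).normal(-20.0, 2.0, size=(64, 64))  # synthetic block
print(replace_non_finite([normalized_shannon_entropy(block), np.inf]))
```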
The FD attribute played an excellent role in class separation: its PDF values were spread homogeneously across the various target classes. The clear separation among oil, low wind conditions, rain cells, and biofilm values made FD the attribute best suited to the objectives proposed here, even without a detailed analysis of the σ0 backscatter coefficient. Furthermore, it initially allowed us to identify a potential oil spill in each SAR image. The attributes in Figures 4a, b, and c, together with complexity, were essential for separating oil spill images from the other classes.
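One common way to estimate a spectral fractal dimension, assumed here for illustration (the exact psdfd computation in RIOSS may differ), is to fit the radially averaged power spectrum P(k) ∝ k^(-β) and use D = (8 - β)/2 for a 2-D surface. The sketch below also builds a synthetic random-phase surface with a known β so the estimate can be checked; real SAR blocks would usually be windowed first to limit edge leakage.

```python
import numpy as np


def spectral_fractal_dimension(image):
    """Estimate D = (8 - beta) / 2 from a log-log fit of P(k) ~ k**(-beta)."""
    img = np.asarray(image, dtype=float)
    img = img - img.mean()  # remove the DC component
    power = np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2
    rows, cols = img.shape
    yy, xx = np.indices(img.shape)
    radius = np.rint(np.hypot(yy - rows // 2, xx - cols // 2)).astype(int)
    # Mean power inside each integer-radius annulus around zero frequency.
    sums = np.bincount(radius.ravel(), weights=power.ravel())
    counts = np.bincount(radius.ravel())
    k = np.arange(1, min(rows, cols) // 2)  # skip DC, stay inside the inscribed circle
    radial_power = sums[k] / counts[k]
    slope, _ = np.polyfit(np.log(k), np.log(radial_power), 1)
    beta = -slope
    return (8.0 - beta) / 2.0


def synthetic_surface(n, beta, seed=0):
    """Random-phase surface whose power spectrum falls off as k**(-beta)."""
    rng = np.random.default_rng(seed)
    k = np.hypot(np.fft.fftfreq(n)[:, None], np.fft.fftfreq(n)[None, :])
    k[0, 0] = 1.0  # avoid 0**negative at DC
    spectrum = k ** (-beta / 2.0) * np.exp(2j * np.pi * rng.random((n, n)))
    return np.real(np.fft.ifft2(spectrum))


surface = synthetic_surface(256, beta=2.6)  # expected D = (8 - 2.6) / 2 = 2.7
print(round(spectral_fractal_dimension(surface), 2))
```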
Low wind conditions (grey curve) and biofilms (green curve) were often confused, especially when these attributes were used to detect look-alikes (Figures 4b, c, e, and f). However, their signatures were well apart when the absolute gradient mean (AGM) was applied (Figure 4g). This attribute performed satisfactorily and identified most of the analyzed look-alike classes quite effectively.
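The absolute gradient mean can be sketched as the mean magnitude of the local backscatter gradient over a block. Taking the magnitude sqrt(gx² + gy²) from central differences is an assumption; RIOSS may define the gradient differently.

```python
import numpy as np


def absolute_gradient_mean(block_db):
    """Mean gradient magnitude of a backscatter block (dB per pixel)."""
    gy, gx = np.gradient(np.asarray(block_db, dtype=float))  # d/drow, d/dcol
    return float(np.mean(np.hypot(gx, gy)))


ramp = np.tile(2.0 * np.arange(6), (4, 1))  # rises 2 dB per pixel along x
flat = np.full((4, 6), -22.0)               # uniform dark patch
print(absolute_gradient_mean(ramp), absolute_gradient_mean(flat))  # 2.0 0.0
```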

The three case studies
The three SAR images used as case studies (19/07/2019, 11/10/2019, and 28/10/2019) are marked in Figure 4 as vertical black lines, each representing the mean attribute value over the corresponding scene. For each dark-spotted region observed in the original case study image, a subset was extracted and randomly subsampled into 512 × 512 pixel blocks; the attribute was extracted from each block and then averaged over all blocks, giving one value per test case.
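The block subsampling and averaging described above can be sketched as follows. The number of random blocks per subset (n_blocks) is not given in the text, so the default here is hypothetical, and attribute_fn stands for any per-block attribute.

```python
import numpy as np


def mean_attribute(subset, attribute_fn, block=512, n_blocks=20, seed=0):
    """Average an attribute over randomly positioned block x block windows of a subset."""
    rng = np.random.default_rng(seed)
    rows, cols = subset.shape
    if rows < block or cols < block:
        raise ValueError("subset is smaller than one block")
    values = []
    for _ in range(n_blocks):
        r = rng.integers(0, rows - block + 1)  # random top-left corner
        c = rng.integers(0, cols - block + 1)
        values.append(attribute_fn(subset[r:r + block, c:c + block]))
    return float(np.mean(values))


subset = np.random.default_rng(1).normal(-22.0, 1.0, size=(1200, 1500))  # synthetic
print(round(mean_attribute(subset, np.mean), 1))
```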
For the 28/10/2019 SAR image, the three averaged values (mean, spot mean, and background mean of σ0 [in dB], Figures 4a, b, and c; vertical black dotted line, ~ -22 dB) showed relatively high probability density values for the biofilm class (green curve). The ocean's surface (royal blue), rain cells (light blue), low wind conditions (grey), and ship (magenta) curves also showed relatively high probability density values at this point, all detached from the oil spill signature (red curve). Not only did the biofilm peak (~ -22 dB) differ from the oil crest (-27 to -25 dB), but the separation between them was also sizeable, approximately 2 to 5 dB.
The same behavior was observed for the rain cells in the 11/10/2019 SAR image (vertical black line, ~ -18 dB). The rain cell (light blue), ocean surface (royal blue), and low wind condition (grey) curves stood out from the remaining four classes, and their PDF values and shapes were clearly detached from the oil spill signatures (Figures 4a, b, and c). One striking result was the relative importance of the biofilm signature (green curve), especially in Figures 4b and c, although, compared with these three target classes, it showed a relatively low probability density at this test case value (~ -18 dB).
Analogous to the previous image, the 19/07/2019 SAR image (vertical black line, ~ -19 dB) also showed the relative importance of biofilms in Figures 4b and c, with slightly higher PDF values for the biofilm signature than for low wind conditions. Although we did not scrutinize this result, we believe it was due to the ship wake signature (seen as a dark angled straight line).
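Reading a class likelihood off the KDE curves at a case-study value, as done with the vertical lines in Figure 4, can be sketched with a Gaussian KDE. The class samples below are synthetic, with means loosely chosen near the dB levels quoted in the text, and the bandwidth uses the normal-reference rule of thumb 1.06·σ·n^(-1/5); neither is taken from the RIOSS training data.

```python
import numpy as np


def gaussian_kde_pdf(samples, x, bandwidth=None):
    """1-D Gaussian kernel density estimate of `samples`, evaluated at points `x`."""
    samples = np.asarray(samples, dtype=float)
    x = np.atleast_1d(np.asarray(x, dtype=float))
    n = samples.size
    if bandwidth is None:
        bandwidth = 1.06 * samples.std(ddof=1) * n ** (-1 / 5)  # normal-reference rule
    z = (x[:, None] - samples[None, :]) / bandwidth
    return np.exp(-0.5 * z ** 2).sum(axis=1) / (n * bandwidth * np.sqrt(2 * np.pi))


def rank_classes(class_samples, value):
    """Sort classes by their estimated density at `value`, highest first."""
    densities = {name: float(gaussian_kde_pdf(s, value)[0]) for name, s in class_samples.items()}
    return sorted(densities.items(), key=lambda kv: kv[1], reverse=True)


rng = np.random.default_rng(1)
classes = {  # hypothetical mean-backscatter samples per class, in dB
    "oil": rng.normal(-26.0, 1.0, 300),
    "biofilm": rng.normal(-22.0, 1.0, 300),
    "rain": rng.normal(-18.0, 1.5, 300),
}
ranking = rank_classes(classes, -22.0)  # value marked by a case-study line
print(ranking[0][0])
```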
Moreover, according to Alpers et al. (2016), the C-band rain signature can be either enhanced or reduced relative to its background. Enhancement is caused by increased Bragg-wave backscattering due to raindrops and to the downdraft winds usually associated with rain cells, whereas reduction is caused by attenuation of the Bragg waves by the turbulence generated by raindrops impinging onto the sea surface.
The complexity and FD attributes could not identify the correct target class for each image (Figures 4e and f). Although complexity did quite well for the 11/10/2019 SAR image, it failed to handle the biofilm signature adequately (Figure 4e), yielding a biofilm probability density higher than the observed conditions warranted (i.e., presence of rain cells and low wind conditions). The same behavior was observed for the 19/07/2019 image. On the other hand, FD seemed to capture the low wind conditions in these two images and the ship in the last image.
In contrast, the AGM analysis gave better results than complexity and FD. It seemed to capture the low wind conditions in the first two images (11/10/2019 and 19/07/2019) and a ship in the second and third images well. However, it could not capture the biofilm signature in the third image (28/10/2019).
Figure 5 displays, for the three SAR test case images, the oil probability maps produced by the RIOSS oil detector and the classification maps produced by its seven-class model, based on our seven pre-selected target classes. Except for the October 28th, 2019 SAR image (Figure 5c), the oil probability and classification maps closely reproduced the radar scenes observed in the images.
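Maps like those in Figure 5 can be assembled by scoring each block and binning the scores. The sketch below assumes non-overlapping blocks and a generic, hypothetical score_fn (in practice this could wrap a fitted scikit-learn classifier's predict_proba applied to the block's features); the bin edges 0.2, 0.4, and 0.6 mirror the probability ranges quoted in the text.

```python
import numpy as np


def block_probability_map(image, score_fn, block=512):
    """One score per non-overlapping block x block tile (edge remainders are dropped)."""
    n_r, n_c = image.shape[0] // block, image.shape[1] // block
    prob = np.full((n_r, n_c), np.nan)
    for i in range(n_r):
        for j in range(n_c):
            tile = image[i * block:(i + 1) * block, j * block:(j + 1) * block]
            prob[i, j] = score_fn(tile)
    return prob


def probability_classes(prob, edges=(0.2, 0.4, 0.6)):
    """Bin probabilities: 0 -> < 0.2, 1 -> [0.2, 0.4), 2 -> [0.4, 0.6), 3 -> >= 0.6."""
    return np.digitize(prob, edges)


img = np.arange(24, dtype=float).reshape(4, 6)          # toy "image"
pmap = block_probability_map(img, lambda t: t.mean() / 24.0, block=2)
print(pmap.shape, probability_classes(pmap))
```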
For the first image (11/10/2019, Figure 5a), the overall oil probability for the whole picture was below 0.2, except close to 11.4° S, 35.4° W, where two patches with probabilities between 0.2 and 0.4 could be seen. Values of about 0.2 were also seen in the upper right corner of the georeferenced image. Comparison between the actual SAR image (σ0, in dB) and its corresponding classification map showed good agreement, capturing the three observed rain cells and the low wind conditions associated with these meteorological features. Although we did not investigate how sensitive the RIOSS algorithm is to variable wind speeds, according to Alpers et al. (2016), low-wind-like signatures can prevail where the Bragg waves on the sea surface are damped by turbulence generated by raindrops impinging onto it. They also argue that whether Bragg scattering is enhanced or reduced depends on rain rate, wind speed, incidence angle, and raindrop distribution.
The July 19th, 2019 image (Figure 5b) showed an overall oil probability of less than 0.2 for the whole picture. As in the previous case, the SAR image and the classification map were in good agreement. The RIOSS algorithm detected the ship's presence (magenta, right panel) and the low wind conditions prevailing over the image.
However, the algorithm misinterpreted the ship wake as low wind. This could be because the ship wake, seen as a dark angled straight line, covered a small area compared with the low wind region, making it difficult for the algorithm to capture accurately. Indeed, none of the 1,138 sub-image blocks used in this study contained ship wakes to train the RIOSS algorithm.
For the October 28th, 2019 image (Figure 5c), even though the algorithm assigned low oil probability values (< 0.2) to the inner part of the biofilm patch, it failed to do so at its outer limits. As a result, the discontinuity between the biofilm and the ocean's surface was mistakenly marked with high oil probabilities (> 0.6). Three factors are likely responsible for this misclassification. First, biofilms are the closest oil spill look-alikes in radar images, as seen in the PCA cross-plots in Conceição et al. (2021), so some confusion is expected; moreover, the authors state that the precision metric was used to select the classification models and that the model was intentionally biased toward false positives rather than false negatives (the most dangerous case). Second, their findings indicate that gradient-dependent features were among the most important attributes for classifying oil spills, and these features are especially anomalous at the borders of dark spots. Third, the algorithm classified the biofilm's inner portion as "low wind conditions" and its surroundings as discontinuous oil patches. On the other hand, the algorithm did register the presence of ships in this SAR image (Figure 5c, right panel, magenta square).
This study was carried out with the first version of the code, using only five biogenic film images. Increasing the number of training images should reduce false positive classifications for the biofilm class. However, according to Fiscella et al. (2000), manual inspection is essential in classifying oil spills and look-alikes. So, even though probability curves can separate false positives from ocean oil spill targets, the mathematical algorithms are susceptible to prediction errors, which underlines the importance of a technical team to verify positive target responses in marine environments. Indeed, we are now working on a more robust version of the RIOSS code under the CNPQ/MCTI 06/2020 call (Research and Development for Coping with Oil Spills on the Brazilian Coast, Ciências do Mar Program, grant #440852/2020-0).

CONCLUSIONS
This paper investigates the capability of detecting and classifying look-alikes on Sentinel-1A SAR images. Three SAR-derived observations of dark surface patches along the Northeastern Brazilian coastline, reported in the Brazilian media (print and digital) as oil spill pictures, were used as test cases. These look-alikes were evaluated and tested using an oil spill detection methodology called "the RIOSS algorithm." Our analyses based on optimal attributes are promising, since we could capture most of the false positive targets in the three SAR images reported in 2019. The only exception was the biofilm slick observed on October 28th, which the RIOSS algorithm mistakenly classified as a low wind region with oil spots. However, this pitfall is acceptable at this project stage, since we had only five biogenic film samples to train the algorithm. Further studies will be conducted by: 1) increasing the number of target classes, especially for biofilm slicks; 2) allowing the analysis of archived SAR images from satellites other than the Copernicus Program constellation to increase the accuracy of the model; and 3) implementing an operational web-based oil spill warning framework for end-users and decision-makers. These future steps are part of an ongoing project recently funded by the Brazilian Navy, the National Council for Scientific and Technological Development (CNPQ), and the Ministry of Science, Technology, and Innovation (MCTI), call CNPQ/MCTI 06/2020 (Research and Development for Coping with Oil Spills on the Brazilian Coast, Ciências do Mar Program, grant #440852/2020-0).