A COMPARISON OF HAZE REMOVAL ALGORITHMS AND THEIR IMPACTS ON CLASSIFICATION ACCURACY FOR LANDSAT IMAGERY

Haze considerably degrades the spectral response of Landsat images acquired over humid areas, which limits the use of their visible and near-infrared bands. A variety of haze removal algorithms have been proposed to correct the illumination effects caused by haze contamination. The purpose of this study was to compare two major algorithms, the improved homomorphic filtering (HF) and the virtual cloud point (VCP), in terms of their effectiveness in handling spatially varying haze contamination, and to evaluate the impact of haze removal on land cover classification. A case study using a large set of Landsat TM images acquired under clear and hazy conditions in the most humid areas of China showed that both haze removal algorithms perform well on haze-contaminated Landsat images. The VCP results were more similar to the reference images than the HF results. Moreover, Landsat images processed with VCP haze removal yielded higher classification accuracy than images without haze removal, especially in cloud-contaminated areas.


Introduction
Cloud and haze mask the land surface and change the brightness values of pixels to varying degrees. This causes many problems in remote sensing, including spurious errors in land cover classification and false detection of land cover change, especially where large areas are covered by haze (thin cloud) (Zhu and Woodcock, 2012). Because of variations in climate and weather, it is difficult to obtain satellite images that are completely free of cloud and haze, which significantly inhibits their application in humid areas (Asner, 2001).
Spatial filtering can remove the low-frequency layer in which haze is distributed in the frequency domain. Within this approach, the HF has been widely used to remove haze from large areas: it combines frequency filtering and grayscale transformation to discriminate haze from the background, and then removes the influence of haze based on the illumination-reflectance model. However, the limitations of the selected filter structure and cut-off frequency cause some useful information to be discarded. To overcome these limitations, many improved haze removal algorithms based on the HF have been presented (Cai et al., 2011), which can deal effectively with the spatial variation of haze contamination by automatically adjusting the filter cut-off frequency.
Unlike spatial filtering, spectral transformation removes haze contamination using spectral information. The tasseled cap (TC) transformation is an orthogonal linear transformation that removes haze by discarding its fourth component, which is considered to contain thin cloud and other noise (Richter, 1996). HOT is an advanced form of the TC transformation that uses only the blue and red bands for haze detection; after density slicing of the HOT image, dark object subtraction (DOS) is applied to each band of each slice (Zhang, 2002). More recently, to retrieve the digital number (DN) values contaminated by spatially varying, ever-present haze, an improved technique that removes haze with the virtual cloud point (VCP) algorithm was presented by He and Liu (He et al., 2010; Liu et al., 2011).
The improved HF and VCP are two of the most frequently used haze removal approaches. Most previous studies focused on the development of haze removal algorithms, whereas less research has examined the effects of these two algorithms on land cover classification. The relationship between haze removal and image classification accuracy therefore still needs further exploration.
Land cover, which describes the composition and characteristics of land surface elements, is a key environmental index and is important for ecosystem assessment and management for policy purposes (Cihlar, 2000). Recently, land cover succession and climatic anomalies driven by global change have highlighted the role of ecosystems in soil retention, water flow regulation, water quality security, and climate change mitigation and adaptation. Landscape analysis and ecological modeling at the regional scale, which represent the relationships between natural conditions and human impacts across time and space, are therefore particularly important, and they frequently rely on land cover data. The use of land cover data to answer ecological questions could be greatly increased by the effective removal of haze from satellite images (Hughes and Hayes, 2014).
Whether haze removal can effectively improve classification accuracy, and how to improve it further, still needs to be explored in the remote sensing community. Thus, the objective of this study was to assess the effectiveness of two different haze removal algorithms (improved HF and VCP) and their impacts on land cover classification.

HF algorithm
Images usually consist of light reflected from objects. Under the illumination-reflectance model (Liu and Hunt, 1984), an image can be characterized by two components: the illumination component, which is the amount of light from the source incident on the scene, and the reflectance component, which is the amount of light reflected by objects in the scene (Sousa et al., 2011). When surfaces are covered by haze or thin cloud, the illumination component represents the solar radiation reflected by the haze and the reflectance component represents the solar radiation reflected by the planetary surface. Denoting the observed image by f(x, y), the simplified model is as follows (Fan and Zhang, 2011; Adelmann, 1998):
f(x, y) = fi(x, y) · fr(x, y)
where fi(x, y) is the illumination component, whose spectrum is concentrated in the low-frequency part, so fi(x, y) can be regarded as the cloud distribution function; fr(x, y) is the reflectance component, which is a function of surface features and whose spectrum is concentrated in the high-frequency part. Removing cloud contamination therefore amounts to removing fi(x, y) and recovering fr(x, y). High-pass filtering reduces the contribution of fi(x, y) by attenuating the low-frequency components of the image, and it is carried out in the frequency domain. To this end, the image is first transformed from the spatial domain to the frequency domain with the Fourier transform. A Butterworth high-pass filter, chosen as the homomorphic filter (HF), is then used to extract the high-frequency component, which carries information on surface features, and to filter out the low-frequency component, which contains the cloud information (Cai et al., 2011):
H(u, v) = 1 / (1 + k [D0 / D(u, v)]^(2n))
where D(u, v) is the distance from the point (u, v) to the origin of the frequency plane, n is the order of the filter, and D0 is the cut-off frequency. Traditionally, D0 is a fixed value. Because different cloud thicknesses result in different illumination and reflectance components, D0 should vary with the thickness of the cloud cover. The constant k is obtained from the condition H(u, v) = 1/√2 at D(u, v) = D0, which gives k = 0.414; here n = 1.
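The filtering step described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function names are hypothetical, the logarithm and exponential pair is the usual homomorphic step that turns the product of illumination and reflectance into a sum and is assumed here, and the adaptive choice of D0 is left to the caller.

```python
import numpy as np


def butterworth_highpass(shape, d0, k=0.414, n=1):
    """Butterworth high-pass transfer function H = 1 / (1 + k * (D0 / D)**(2n))."""
    rows, cols = shape
    # Integer frequency indices laid out the same way as np.fft.fft2 output.
    u = np.fft.fftfreq(rows) * rows
    v = np.fft.fftfreq(cols) * cols
    d = np.sqrt(u[:, None] ** 2 + v[None, :] ** 2)
    h = np.zeros(shape, dtype=float)
    nonzero = d > 0  # H tends to 0 as D tends to 0, so the DC term stays 0
    h[nonzero] = 1.0 / (1.0 + k * (d0 / d[nonzero]) ** (2 * n))
    return h


def homomorphic_filter(band, d0):
    """High-pass filter one band in the log domain (assumed homomorphic step)."""
    log_img = np.log1p(np.asarray(band, dtype=float))
    spectrum = np.fft.fft2(log_img)
    filtered = np.fft.ifft2(spectrum * butterworth_highpass(log_img.shape, d0)).real
    return np.expm1(filtered)
```

Because H is zero at the zero-frequency term, the output has its mean log-brightness removed and would need rescaling before it is compared with the original DN values.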
Reducing brightness values by high-pass filtering is the basic principle of haze removal with the HF. However, the DN values in cloudless areas are modified as well. A simple remedy is to use the DN value of the original image as the judgment condition for per-pixel replacement. The DN values of the original image and the processed image are denoted f(i, j) and g(i, j), respectively, and the DN value after pixel replacement is denoted F(i, j).
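The replacement rule itself is not reproduced in the text, so the following is only one plausible reading. It is sketched under the assumption that pixels whose original DN exceeds a hypothetical threshold are treated as hazy and take the filtered value g(i, j), while all other pixels keep their original value f(i, j). Both the threshold and the direction of the comparison are assumptions.

```python
import numpy as np


def replace_clear_pixels(original, filtered, haze_threshold):
    """Return F: filtered DN where the original DN suggests haze, else original DN.

    haze_threshold is a hypothetical parameter; the source does not state
    the actual judgment condition.
    """
    original = np.asarray(original)
    filtered = np.asarray(filtered)
    return np.where(original > haze_threshold, filtered, original)
```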

VCP algorithm
The visible bands for most land surfaces under clear-sky conditions are highly correlated, and the spectral response to haze differs between the blue and red wavelengths. Based on these properties, the Haze-Optimized Transformation (HOT) for Landsat data was proposed by Zhang et al. (2002). However, HOT relies on the assumption that the blue and red bands are highly correlated, and it is not particularly robust because this assumption does not always hold (Liu et al., 2011). Liu et al. (2011) therefore proposed the 'Background Suppressed Haze Thickness Index' (BSHTI) for haze removal, which not only reflects cloud thickness but also suppresses background noise. Derived from an optimal statistical analysis, BSHTI reflects haze thickness in good accordance with visual perception. However, shortcomings remain, since some land cover types produce statistically abnormal values that can cause severe bias. Haze perfection algorithms were therefore designed to correct the spurious values (overestimated or underestimated haze thickness index) that the spatial information of some land cover types introduces into haze detection (He et al., 2010). The robust algorithm proposed by Planchon and Darboux (2002) to detect and fill sinks can solve the low HOT problem, and mathematical morphological operations (Liu et al., 2011) were used to solve the high BSHTI problem.
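As an illustration of the index that BSHTI builds on, the HOT is usually written as HOT = B1 sin θ - B3 cos θ, where B1 and B3 are the blue and red bands and θ is the slope angle of the "clear line" fitted to haze-free pixels in blue-red space. The sketch below assumes that a mask of clear pixels is already available; the function name and the mask argument are hypothetical, and BSHTI itself is not reproduced because its coefficients are not given here.

```python
import numpy as np


def hot_index(blue, red, clear_mask):
    """Haze-Optimized Transformation from a clear line fitted on clear pixels."""
    blue = np.asarray(blue, dtype=float)
    red = np.asarray(red, dtype=float)
    # Clear line: red = slope * blue + intercept, fitted on clear pixels only.
    slope, _intercept = np.polyfit(blue[clear_mask], red[clear_mask], 1)
    theta = np.arctan(slope)
    # Signed distance from the clear-line direction; grows as haze lifts blue.
    return blue * np.sin(theta) - red * np.cos(theta)
```

Pixels lying on a clear line through the origin get a value near zero, while pixels whose blue response is raised relative to red get positive values.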
There are three algorithms with good performance in haze removal: DOS, histogram matching (HM), and VCP (He et al., 2010; Liu et al., 2011). All of them are applied to each band separately after density slicing of the haze image. Considering our image characteristics (different land cover compositions and cloud conditions), we selected the VCP to remove haze. Following the detailed VCP algorithm, the point of intersection (BSHTIVCP, DNVCP) can be found for each band. All pixels (BSHTI, DN) are then projected onto the vertical line BSHTI = 0 to obtain the dehazed image, which can be calculated as follows: where yr is the DN value of a pixel after haze removal, y is the original DN value of the pixel before haze removal, x is the BSHTI value of the pixel, and xVCP and yVCP are the BSHTI value and DN value of the VCP, respectively.
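The projection can be sketched from its geometric description. The formula used below is derived here, not copied from the source equation: each pixel (x, y) is moved along the straight line joining it to the VCP until it meets BSHTI = 0, which gives yr = yVCP - xVCP (y - yVCP) / (x - xVCP). The function name is hypothetical, and finding the VCP itself is assumed to have been done beforehand.

```python
import numpy as np


def vcp_dehaze(dn, bshti, x_vcp, y_vcp):
    """Project pixels (bshti, dn) through the VCP onto the line BSHTI = 0.

    Assumes no pixel has bshti equal to x_vcp (that would divide by zero).
    """
    dn = np.asarray(dn, dtype=float)
    bshti = np.asarray(bshti, dtype=float)
    return y_vcp - x_vcp * (dn - y_vcp) / (bshti - x_vcp)
```

A pixel that already lies on BSHTI = 0 is left unchanged, and all pixels on the same line through the VCP map to the same dehazed DN.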

Materials
The study area is located between 104°55'-110°73' E and 27°96'-32°69' N; it forms part of the Sichuan Basin and covers the entire Chongqing region. The Sichuan Basin in southwestern China is a topographically well-defined, rhomboid-shaped basin, bounded on all sides by mountains and drained by the Yangtze River and a number of major tributaries (figure 1). For most of the year the area experiences very humid conditions under a subtropical monsoon climate. With its low altitude and dense river network, the area is subject to strong evaporation and high air humidity. Because the surrounding high mountains block the water vapor from spreading out, it continues to condense into cloud and rain within the basin.
As one of the most important ecological function zones, the study area is located in the upper reaches of the Three Gorges Reservoir Area (TGRA), which hosts the largest water conservation project in the world. The status of the landscape structure and ecological function here has a direct impact on the TGRA. Eight case areas (figure 1) were selected because haze-contaminated images and corresponding high-quality reference images (free of haze contamination) were available for them. Moreover, the heterogeneity and fragmentation of the landscape make these areas among the most difficult in the world to characterize accurately. These areas therefore served as important case studies for testing haze removal algorithms and their impacts on classification.
With a long history of widespread use, Landsat data have become one of the most valuable datasets available for mapping land cover (Cohen et al., 1998; Watmougha et al., 2011). Although image data are now available for free, which removes one of the constraints highlighted by Foody (2002), users of Landsat data still face a problem: when an image from a specific date is required for subsequent analysis, there may be no option but to select a cloud-affected image and search for a proper method to identify and remove the contamination (Annemarie, 2012). How to remove or reduce the influence of thin cloud during preprocessing is therefore a barrier for related research. In this paper, 16 Landsat TM scenes (8 original images and 8 reference images) were collected, as listed in Table 1.
*Note: The original image, as the haze-contaminated input data in each study area, was used for the haze removal algorithms; the reference image, as the clear comparable data, was used for evaluating the effect of the haze removal algorithms. All Landsat TM images were radiometrically and geometrically corrected. Landsat data collections from the United States Geological Survey (USGS) are available by order at http://glovis.usgs.gov/.
The study area consists of diverse land cover classes, including man-made and natural types. Referring to the land use and land cover classification system (Anderson et al., 1976), a total of seven land cover classes were identified in the study area to facilitate accuracy assessment (see Table 2 for class descriptions) (Center for Advanced Spatial Technologies, 2007). The choice of these classes was guided by (i) the objective of the research and (ii) the expected degree of accuracy in image classification.

Methods
The flow chart of the method is shown in figure 2. Because the image degradation caused by haze contamination is mainly distributed in the visible and near-infrared bands (1, 2, 3 and 4), with little impact on the mid-infrared bands (5 and 7), bands 1, 2, 3 and 4 of the Landsat data were used as input for haze removal. A spectral match was performed between each pair of corresponding bands in the original and reference images to minimize the negative effects of inter-annual landscape variability. To classify all seven land cover classes, the maximum likelihood classification (MLC) method was used, which is acknowledged as one of the most efficient parametric methods for image classification (Kozak et al., 2008; Bayarsaikhan et al., 2009). In this research, pixel-based supervised maximum likelihood classification was performed in ENVI v.4.8. A post-classification smoothing step with a 3×3 moving window was applied across the images to improve classification accuracy (Krishna Bahadur, 2009).
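The principle behind maximum likelihood classification can be sketched as follows. This is not the ENVI implementation; it is a minimal Gaussian version that assumes equal prior probabilities for all classes, and the function names are hypothetical. Each class is modeled by the mean vector and covariance matrix of its training pixels, and each pixel is assigned to the class with the highest log-likelihood.

```python
import numpy as np


def train_mlc(samples, labels):
    """Estimate a mean vector and covariance matrix for every class."""
    stats = {}
    for c in np.unique(labels):
        x = samples[labels == c]
        stats[int(c)] = (x.mean(axis=0), np.cov(x, rowvar=False))
    return stats


def classify_mlc(pixels, stats):
    """Assign each pixel (row of band values) to the most likely class."""
    classes = sorted(stats)
    scores = []
    for c in classes:
        mean, cov = stats[c]
        diff = pixels - mean
        inv = np.linalg.inv(cov)
        _sign, logdet = np.linalg.slogdet(cov)
        # Squared Mahalanobis distance of every pixel to the class mean.
        maha = np.einsum("ij,jk,ik->i", diff, inv, diff)
        # Gaussian log-likelihood up to a constant shared by all classes.
        scores.append(-0.5 * (logdet + maha))
    best = np.argmax(np.stack(scores, axis=1), axis=1)
    return np.array(classes)[best]
```

Here `samples` and `pixels` are arrays of shape (number of pixels, number of bands), so an image would first be reshaped from (rows, columns, bands) to that layout.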
Capturing the complexity of the real Chinese landscape, with its intra-annual variability, requires a large number of exemplars, so all collected sample sites should be used during processing. The impact of training set variability could be minimized by using a stratified random sampling method. To assess classification accuracy, a confusion matrix (or error matrix), which has been the core of analysis and estimation procedures for accuracy assessment (Stehman and Raymond, 1998), was applied. Accuracy indexes including the overall accuracy and the Kappa coefficient were calculated. The Kappa coefficient improves on overall accuracy by expressing the proportionate reduction in error generated by a classifier compared with the error of a completely random classification.
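Both indexes follow directly from the confusion matrix. The sketch below uses the standard definitions: overall accuracy is the share of samples on the diagonal, and Kappa is (po - pe) / (1 - pe), where po is the overall accuracy and pe is the agreement expected by chance from the row and column totals. The function name is hypothetical and the matrix in the usage note is made up for illustration.

```python
def accuracy_and_kappa(matrix):
    """Return (overall accuracy, Kappa) for a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Chance agreement from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa
```

For an illustrative two-class matrix [[40, 10], [10, 40]], the overall accuracy is 0.8, the chance agreement is 0.5, and Kappa is 0.6.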
To evaluate the impact of haze removal on classification (both the addition of transformed features and the reduction of inputs), a series of experiments was devised in which different combinations of bands were tested as input to the ML classifier. As shown in Table 3, the first experiment used the visible and near-infrared bands as inputs; it was treated as the direct effect of haze removal on classification. In the second experiment, the near-infrared and mid-infrared bands were used as inputs to determine the classification accuracy with the reduction of bands 1, 2, 3 and 4, which caused lower image quality. In the third experiment, all Landsat bands were used to test the impact of combining bands after haze removal with unprocessed bands. In the subsequent experiments (cases 4-6), the Landsat data (with or without haze removal) were augmented with an additional feature, the normalized difference vegetation index (NDVI), to produce an expanded dataset for each study area.
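Building the inputs for these experiments can be sketched as follows. NDVI uses its standard definition, (NIR - Red) / (NIR + Red), with Landsat TM band 4 as near-infrared and band 3 as red. The helper names and the dictionary layout keyed by TM band number are assumptions made for illustration; the exact band lists for each case are those of Table 3 and are passed in by the caller.

```python
import numpy as np


def ndvi(nir, red):
    """Normalized difference vegetation index, 0 where NIR + Red is 0."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    denom = nir + red
    out = np.zeros_like(denom)
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out


def build_features(bands, band_ids, with_ndvi=False):
    """Stack the selected TM bands, optionally with NDVI, along the last axis."""
    layers = [np.asarray(bands[b], dtype=float) for b in band_ids]
    if with_ndvi:
        layers.append(ndvi(bands[4], bands[3]))
    return np.stack(layers, axis=-1)
```

For example, `build_features(bands, (1, 2, 3, 4), with_ndvi=True)` yields a five-layer input made of the visible and near-infrared bands plus NDVI.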

Table 3: The objectives for each combination of Landsat images and NDVI tested in each study area

Haze removal visual examination
A simple assessment of haze reduction was made by visually comparing the images before and after haze removal with the reference images. Four typical areas were selected to illustrate the performance of haze removal.
Inspection of the results (figure 3) shows that most of the haze, including some thin cloud, was removed well in both processed results (HF and VCP), and the differences were obvious, particularly over areas 3 and 6. Note that the colors of the corrected images look distorted relative to the original images, largely because of the image enhancement applied for visualization. Compared with the HF results, the VCP results visually appeared to remove more thin cloud and to be more similar to the reference images. The spectral and textural features of the VCP results were mostly in accordance with the reference data, despite some noticeably over-processed parts.

Haze removal statistical evaluation
To rigorously assess the effectiveness of the algorithms in recovering the true spectral information under haze, a common quantitative method is to compare the image in the overlapping region with a clear image (the spectrally matched reference image). The statistical results for the haze image (before and after haze removal) against the reference image are shown in figure 4 and figure 5. After haze removal, the squared correlation coefficient (R2) between the de-hazed image and the clear image exceeded 0.25 (HF) and 0.40 (VCP) for each band, which is much better than before haze removal. The de-hazing techniques thus clearly improve hazy Landsat TM images. However, the two algorithms differ: the R2 for each band is greater with the VCP than with the HF, implying that the VCP is superior to the HF.
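The statistic used in this comparison can be computed per band from paired samples. The sketch below is a minimal illustration with a hypothetical function name; it takes the DN values of the test image (hazy or de-hazed) and of the reference image at the same sample locations and returns the squared Pearson correlation coefficient.

```python
import numpy as np


def r_squared(test_dn, reference_dn):
    """Squared Pearson correlation between paired DN samples of one band."""
    r = np.corrcoef(np.ravel(test_dn), np.ravel(reference_dn))[0, 1]
    return r ** 2
```

Calling it once on the hazy band and once on the de-hazed band, each against the reference band, gives the before and after values that are compared in the text.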

Classification accuracy assessment
In the past, developments in haze removal algorithms commonly failed to reach practical application. A further step beyond developing a new method is to demonstrate the benefits of the improvement, especially for land cover classification, which has been a hot research topic for a variety of applications (Townshend, 1994). To assess the effect of the haze removal algorithms on classification accuracy, and the impact of data quality and quantity on classifier performance, classification accuracy before and after haze removal was evaluated with multiple criteria through a series of experiments. Classifications of the clear images were also provided as references for comparison.
The classification results reveal a major trend in classifier performance after haze removal (figure 6). Both the HF and VCP outperformed the cloud image in terms of overall accuracy over the whole area. The results for the two haze removal approaches (62.5%-74.3%) are higher than those for the cloud image (62.0%-69.5%). However, classification after VCP yielded higher accuracy than classification after HF. Among the six classification experiments for VCP, the band combination that includes all Landsat data and NDVI (case 6) gave the most promising results, achieving an overall accuracy of 74.3% across the eight study areas, which is similar to the result for the reference image (74.7%). The experiments containing NDVI achieved higher accuracy for vegetation because NDVI responds most strongly to vegetation and least to bare soil and built-up areas (Krishna Bahadur, 2009). These results showed that higher rates of accuracy could be achieved by haze removal approaches.
Given the high degree of variability in the scope and extent of the cloud effect, and the complexity of land cover in each study area, the low accuracy rates for the cloud image are expected.
Bol. Ciênc. Geod., sec. Artigos, Curitiba, v. 23, no 1, p. 55-71, jan-mar, 2017.
The classification results were assessed further to understand how each algorithm performed in the cloud-contaminated area. The assumption here is that the haze-contaminated bands add variability that makes these areas more difficult to classify. Evaluating these results therefore offers a more specific view of classification performance than the whole-area results. Not surprisingly, the results from the cloud area corroborate those from the whole area: the HF and VCP performed better than the cloud image. The overall accuracies in the cloud area for the HF and VCP (52.0%-69.0%) were markedly higher than those for the cloud image (49.4%-62.3%), with some added variability indicated by the larger error bars for all feature combinations.
A similar pattern was found for the overall Kappa statistic in the whole area: the classification of the VCP haze-removed image had an overall Kappa of 0.55-0.64, the clear image classification 0.58-0.64, the classification of the HF haze-removed image 0.49-0.60, and the original image classification 0.50-0.58. These results revealed that, in general, the Landsat TM image with VCP haze removal improves classification accuracy effectively in comparison with the image without haze removal, especially in the cloud-contaminated area, whereas the HF results were only slightly better than those for the cloud image. The VCP-processed images with the combination of all Landsat data and NDVI gave the best classification, with an overall accuracy of 74.3% and a Kappa coefficient of 0.64 for the whole of the study areas.

Classification visual examination
The classification results (case 6) of the images before and after haze removal are shown in figure 7. A visual comparison of the resulting land cover maps shows some differences between the classifications. The results indicate that the VCP algorithm provides the best results: the size and shape of each land cover feature are in good agreement with the morphology of the reference image of each area. Moreover, some noticeable misclassifications can be found in the results for the original image (cloud image). Visual comparison with the classifications of the reference image shows that, because heavy haze reduced image quality, land cover was incorrectly classified when the original image was used without haze removal.
On the other hand, after haze removal based on the HF and VCP, the land cover types were correctly classified in the same areas. The classification results after VCP haze removal were similar to those of the reference image, although some differences remained. The different acquisition time of the reference image may be the major reason for this discrepancy. Figure 6 also reveals that the HF did not achieve map results as usable as those of the VCP, but it still improved the accuracy of classification.

Conclusions
In this paper, we compared two major haze removal algorithms (improved HF and VCP) to determine which is better at removing or weakening the masking influence of haze in humid areas, and to assess their impacts on land cover classification. To evaluate the impacts of haze removal on classification, a series of experiments (cases 1-6) was devised in which different combinations of bands were tested as input to the ML classifier. The results suggest that both haze removal algorithms perform well on haze-contaminated Landsat images, although their results differ to some extent because the underlying principles differ. Furthermore, the classification results reveal that both the HF and VCP outperformed the cloud image in terms of overall accuracy. After haze removal, the spectral and textural signatures of the haze area for individual classes indicated quite satisfactory separability of the classes. It is concluded that applying the VCP before classification to images with different levels of haze contamination can improve the overall classification accuracy. Because this research aimed at comparing MLC mapping results before and after haze removal, classification with other classifiers was not discussed, and this aspect still needs further exploration. Nevertheless, it is expected that the processing methods used in this study will provide theoretical support for other research, and that a stronger understanding of haze impacts on classification in humid areas will improve classification accuracy, which will assist ecosystem assessment, management and policy purposes in China and other parts of the world.

Figure 1: The location of each case study area.

Table 2: Description of land cover classes
Sample collection is essential for classification training and validation, since all further exploration is based on the sample data. In this research, sample data were selected in the laboratory for each case study area through visual interpretation of Landsat data and Google Earth VHR images, and through on-the-ground visits to each location during June 2010 and August 2011. All sampled points were double-checked visually using both Google Earth and the Landsat TM images. All training samples located in clear regions (areas identified as free of haze contamination through visual interpretation of the original image) were used as input data for classification.

Figure 3: A series of panels showing the performance of haze removal based on the HF and VCP for four typical areas (bands 4, 3, 2 of Landsat TM): Area 1 (top row), Area 3 (second row), Area 6 (third row), and Area 7 (bottom row). For reference, Landsat images after linear regression (second column) are shown.

Figure 4: Scatter plot for the cloud image (before and after haze removal based on HF) versus the reference image using 100 paired polygon samples in the overlapping region. DN, Digital Number.

Figure 5: Scatter plot for the cloud image (before and after haze removal based on VCP) versus the reference image using 100 paired polygon samples in the overlapping region. DN, Digital Number.

Figure 6: Classification accuracy assessment for each combination of Landsat imagery and transformed features (NDVI) using the maximum likelihood classifier. Classification accuracy in the whole area (top row) and in the cloud-contaminated area (bottom row). For these tests, the results from the eight case study areas were averaged to estimate mean overall accuracy and class accuracy. The value shown on each bar is the Kappa coefficient.

Figure 7: A series of panels showing the performance of land cover classification after removing the impacts of haze in four typical areas: Area 1, thin cloud contaminated area in the west plain (top row); Area 3, serious cloud contaminated area in the south mountain (second row); Area 6, heavy cloud contaminated area in the west mountain (third row); Area 7, moderate cloud contaminated area in the south hill (bottom row). These maps were produced using all Landsat bands and NDVI (case 6) as input to the maximum likelihood classifier. *Note: the cross symbols in the cloud images mark misclassified areas.