Detection of foliar diseases using image processing techniques

Submitted on September 9, 2019 and accepted on January 14, 2020. This work is part of the research project funded by Colciencias with code 511-3-024-15. Universidad Tecnológica de Pereira, Facultad de Ingenierías, Pereira, Risaralda, Colombia. lepamplona@utp.edu.co; afcalvo@utp.edu.co; abejarano@utp.edu.co *Corresponding author: afcalvo@utp.edu.co


INTRODUCTION
Fruit production is one of the activities with the greatest potential (Lasprilla, 2011; FAO, 2014). However, agricultural production is limited by the development of different diseases, which generate significant harvest losses (Sankaran et al., 2010). Solanum lycopersicum, commonly known as tomato, is susceptible to late blight, which is caused by Phytophthora infestans. This disease is the most devastating for the crop, and its rate of development depends on the thermal floor (altitudinal zone): it develops fastest when the temperature is between 15 °C and 22 °C and the relative humidity is greater than 80% (Juárez-Becerra et al., 2010; Intagri, 2015).
Another important fruit is banana, which is affected by black Sigatoka and yellow Sigatoka, caused by the fungi Mycosphaerella fijiensis and Mycosphaerella musicola, respectively. These foliar diseases develop mainly in plantain and banana crops worldwide. Black Sigatoka affects the leaf area of the plant and therefore lowers the quality of the product; it is very common in warmer environments and at altitudes below 1600 meters above sea level. In addition, infections caused by this disease lead to premature ripening of the fruit (Juárez-Becerra et al., 2010). Detecting these symptoms efficiently has therefore become an important factor in crop maintenance, allowing the farmer to reduce the risk of disease and sustain the production and quality of the crop (Finagro, 2014).
The traditional methods used to diagnose diseases are based mainly on direct observation by a professional in the area, which makes the process slow, wasteful and subject to human error (Guzmán et al., 2009; Caracol Radio, 2018). In addition, the difficult access to many crops means that there is not always adequate supervision. Other alternatives are molecular techniques (fluorescent in situ hybridization, polymerase chain reaction, DNA strands), but these are slow and expensive processes that require specialized equipment and trained personnel.
Considering that the symptoms can be identified visually, computer vision methodologies offer an opportunity to develop automatic systems for disease detection without human intervention. In this way, the farmer has the opportunity to prevent the spread of different diseases in the crop (Sankaran et al., 2010; Al-Hiary et al., 2011). Therefore, this work proposes the use of digital image processing techniques to detect the percentage of area affected by late blight (Phytophthora infestans) in tomato plants. The main contributions of the developed methodology are the following:
• Manual labeling of tomato leaves with late blight disease from the Plant Village database.
• Detection of the percentage of affected area by means of color analysis.
• Comparison of the method proposed in this article with the one developed in Sharma et al. (2017).
• Detection of black Sigatoka and yellow Sigatoka on banana leaves with the proposed methodology, considering that these diseases have color characteristics similar to those of late blight. The purpose of this test is to check the generality of the method for detecting other types of diseases.
This article is divided into several sections. First, existing disease detection methods are reviewed. Then the methodology is introduced, describing each of the stages that compose the segmentation of the background and the disease, together with the calculation of the percentage of affected leaf area. Finally, the experimental results are shown along with their evaluation.

State of the art
The methodologies that have been proposed for the detection of diseases using digital image processing techniques follow two approaches. The first is color thresholding for the segmentation of the background and the disease. The second is machine learning, where features are extracted from color spaces such as RGB, HSV and LAB to segment, train and classify the different types of diseases that occur in plants (Huang, 2007; Meunkaewjinda et al., 2008; Wiwart et al., 2009; Baghel & Jain, 2016; Patel & Dewangan, 2017).
For the background segmentation, the RGB color space (G channel and the difference of the G and B channels) and the HSI color space (S channel) have been used, with thresholds established in a heuristic, adaptive or statistical way (Hlaing & Zaw, 2017; Sharma et al., 2017; Singh & Misra, 2017). However, no error metric is used to evaluate the segmentation result, since the evaluation is generally qualitative. In Camargo & Smith (2009) two metrics are proposed to measure segmentation results, but the focus of that work is to identify the type of disease with an SVM and not its severity percentage. For the detection of the disease, the H channel from HSI and HSV and the I3 channel from I1I2I3 have been used, thresholding by means of an intensity histogram, the Otsu method or the channel average. These approaches obtain an accuracy of up to 96% compared to other channels, such as CR and A from the YCbCr and LAB color spaces respectively, with a low computational cost (Chaudhary et al., 2012; Sharma et al., 2017). However, these proposals focus only on the detection of the leaf and the type of disease, not on the affected area. In addition, they lack statistical relevance, considering that a maximum of 25% of the images in the database are used for any of the processed fruits, such as potato and tomato.
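As an illustration of the Otsu thresholding mentioned above, the following is a minimal sketch in plain Python, not the implementation used in any of the cited works. It assumes the channel has already been quantized to integer levels 0-255; the function name `otsu_threshold` and the sample values are hypothetical.

```python
def otsu_threshold(values, levels=256):
    """Return the threshold t that maximizes the between-class variance.

    `values` is a flat list of integer channel intensities in [0, levels).
    Pixels with intensity <= t form one class, the rest form the other.
    """
    hist = [0] * levels
    for v in values:
        hist[v] += 1

    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))

    w0 = 0          # number of pixels in the lower class
    sum0 = 0        # sum of intensities in the lower class
    best_t, best_var = 0, -1.0
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue            # lower class still empty
        w1 = total - w0
        if w1 == 0:
            break               # upper class empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mu0 - mu1) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


# Hypothetical channel with a dark group and a bright group of pixels.
channel = [10, 12, 11, 200, 205, 198]
t = otsu_threshold(channel)
mask = [int(v > t) for v in channel]   # 1 marks the bright class
```

The threshold is chosen per image from its own histogram, which is why this family of methods is described as adaptive rather than heuristic.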
Other approaches implement machine learning algorithms to detect diseases in plants, with different procedures for feature extraction. Among these are the Color Co-occurrence method for an ANN and a multiclass SVM, the SIFT transform, the normalization of the RGB color space, and the color difference in LAB for a multiclass SVM. K-means has also been used with 3 or 4 clusters corresponding to the leaf, the disease and the background in the RGB color space (Chaudhary et al., 2012; Arakeri et al., 2015; Sharma et al., 2017). These approaches obtain efficiencies of up to 99.35%, but with a high computational cost and low statistical relevance due to the small number of images evaluated. In contrast, Mohanty et al. (2016) worked with 54306 images to train a CNN that identifies 14 crop species and 26 diseases; however, the accuracy of this model is substantially reduced when the conditions of the images differ from those of the training phase. Although it is important to identify the type of disease, epidemiological studies also need to follow the progress of diseases over time in plant populations by means of mathematical models that allow the farmer to take the most appropriate control measures according to the state of the leaf. Among these models is the estimation of the affected foliage area, given as a percentage, to determine the rate of development of the disease (Escalante & Farrera, 2004; Patil & Bodhe, 2011).

Methodology
The methodology proposed in this article is summarized in the diagram shown in Figure 1 and is divided into three sequential stages.
First, the Plant Village database, which contains healthy and diseased tomato leaves, is used. Second, the background and the disease are segmented. As a last stage, the percentage of affected area in the plant is determined. For the validation of the method, the error of the detected area referred to the image size is calculated and compared with one of the methods documented in the state of the art; in addition, this procedure is evaluated with another color space.
Rev. Ceres, Viçosa, v. 67, n.2, p. 100-110, mar/apr, 2020

Database
For this work, we used the Plant Village database available in Mohanty (2016), which has three versions of the images captured for the 14 fruits with different diseases:
• In color.
• In grayscale.
• Segmented (elimination of the background).
The images have a high resolution, were captured in uncontrolled conditions and were labeled by expert plant pathologists. For more details about the database, consult Hughes & Salathe (2016). In this case, we worked with tomato leaves, both healthy (1592 images) and diseased by Phytophthora infestans (1909 images).

Regions of Interest
The segmentation of both the background and the disease must be validated at each of these stages. For this purpose, the area of interest (leaf and disease) was manually selected with a digital image editor, which reports the number of pixels selected. Figure 2 shows the process of selecting the regions of interest using the GIMP software tools.
In this case, 100 images from the database (50 healthy and 50 diseased) were randomly selected to determine the area of the leaf. In addition, the 1909 images corresponding to diseased tomato leaves were used to find the number of pixels corresponding to the diseased area of the leaf. This procedure was necessary to obtain an error metric for the proposed algorithm in the segmentation of the background and the disease, as well as in the detection of the percentage of affected area. Table 1 summarizes some of the selected images along with the number of pixels found.

Segmentation of the disease
For the segmentation of the disease, it was first necessary to develop an algorithm to detect the leaf and remove the background. A Gaussian filter is applied to smooth the abrupt changes present in the background due to variations in lighting and texture. Morphological erosion and dilation operations are then carried out to complete the regions of interest in the image, and the image is transformed to other color spaces to eliminate the background.
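The role of dilation and erosion in completing a region of interest can be sketched as follows. This is a minimal illustration in plain Python on a binary mask stored as nested lists; the 3x3 structuring element and the function names `dilate`, `erode` and `close_holes` are assumptions for the example, and the source does not state the element size it used.

```python
NEIGHBOURS = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]


def dilate(mask):
    """3x3 binary dilation: a pixel becomes 1 if any neighbour is 1."""
    h, w = len(mask), len(mask[0])
    return [[int(any(0 <= y + dy < h and 0 <= x + dx < w and mask[y + dy][x + dx]
                     for dy, dx in NEIGHBOURS))
             for x in range(w)] for y in range(h)]


def erode(mask):
    """3x3 binary erosion: a pixel stays 1 only if all neighbours are 1.

    Pixels outside the image are treated as 0, so border pixels are removed.
    """
    h, w = len(mask), len(mask[0])
    return [[int(all(0 <= y + dy < h and 0 <= x + dx < w and mask[y + dy][x + dx]
                     for dy, dx in NEIGHBOURS))
             for x in range(w)] for y in range(h)]


def close_holes(mask):
    """Morphological closing (dilation followed by erosion) fills small gaps."""
    return erode(dilate(mask))


# Hypothetical leaf mask with a one-pixel hole in the middle.
leaf = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
filled = close_holes(leaf)   # the centre pixel is now part of the leaf
```

Applying dilation before erosion fills holes inside the leaf without growing its outline, which is the behaviour wanted when the mask is later used to count leaf pixels.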
For the selection of the color space and channel, the color models mentioned in the state of the art were considered, and a manual segmentation of the background was carried out on the 100 images selected from the database (Table 1). The G and B channels of the RGB color space and the H and S channels of the HSV color space were tested for the background, and the CR channel of YCbCr, the A channel of LAB, the I channel of YIQ and the T channel of TSL were tested for the disease.
In Figure 3, it is observed that the H and S channels of the HSV color space detect a greater percentage of the area (86% to 94%) across the different variations of lighting and texture in the background, while the I and T channels exceed 95% of detected area.
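To make the channel selection concrete, the sketch below extracts the H and S channels (HSV) and the I channel (YIQ) for a single pixel with Python's standard `colorsys` module. It is only an illustration: the helper name `pixel_channels` and the sample colors are hypothetical, `colorsys` uses its own YIQ coefficients, and the TSL conversion is not covered because the standard library does not provide it.

```python
import colorsys


def pixel_channels(r, g, b):
    """Return the H, S (HSV) and I (YIQ) values of an 8-bit RGB pixel."""
    rf, gf, bf = r / 255.0, g / 255.0, b / 255.0
    h, s, _v = colorsys.rgb_to_hsv(rf, gf, bf)
    _y, i, _q = colorsys.rgb_to_yiq(rf, gf, bf)
    return {"H": h, "S": s, "I": i}


# Hypothetical sample colors: a healthy green and a lesion-like brown.
healthy = pixel_channels(0, 255, 0)
lesion = pixel_channels(139, 69, 19)

# The I channel runs from cyan (negative) to orange (positive), so a
# brown lesion scores higher than green tissue on this axis.
print(healthy["I"] < lesion["I"])
```

This separation along the orange-cyan axis is one plausible reason why the I channel performs well for brownish lesions on green leaves.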

Extraction of the background
The steps to extract the plant from the background are shown below.
Figure 4 shows the result of the algorithm. To validate this method, the error referred to the size of the image was calculated from the data in Table 1 and the result of the algorithm by means of Equation 1:
$$\text{Referred error} = \frac{|S_M - S_A|}{T_I} \times 100\% \qquad (1)$$
Where:
S_M: Segmentation of the leaf manually (number of pixels).
S_A: Segmentation of the leaf by the algorithm (number of pixels).
T_I: Image size (number of pixels).
To calculate S_A, Equation 2 was implemented.
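The validation step can be sketched as follows. The sketch assumes that the error is the absolute difference between the manual and automatic pixel counts, expressed as a percentage of the image size, and that S_A is obtained by counting the non-zero pixels of the binary leaf mask; both are assumptions made for illustration, and the function names and numbers are hypothetical.

```python
def count_pixels(mask):
    """Count the non-zero pixels of a binary mask (assumed form of S_A)."""
    return sum(1 for row in mask for p in row if p)


def referred_error(s_manual, s_algorithm, image_size):
    """Error between manual and automatic segmentation, as a percentage
    of the total number of pixels in the image."""
    if image_size <= 0:
        raise ValueError("image_size must be positive")
    return abs(s_manual - s_algorithm) / image_size * 100.0


# Hypothetical 256 x 256 image: 30000 leaf pixels labeled by hand,
# 29000 found by the algorithm.
error = referred_error(30000, 29000, 256 * 256)
print(round(error, 2))  # 1.53
```

Normalizing by the image size rather than by the leaf size keeps the metric comparable between images with leaves of very different sizes.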

Detection of the disease
Once the background has been removed, it is necessary to identify the healthy and diseased part of the leaf, as indicated in the following algorithm:

// Removal of the healthy part of the leaf
Input: Image (IRGB)
Output: Image with only the diseased part of the leaf (Ips)
Start
End
Figure 5b shows the disease after the healthy part of the leaf has been eliminated. However, the algorithm has problems when the leaves contain whitish areas or shadows caused by variations in illumination, whose shades are similar to those of the disease. Therefore, the following algorithm was developed to eliminate these areas, as shown in Figure 5c.

// Detection of the disease
Input: Image with only the diseased part of the leaf (Ips)
Output: Image with the detection of the disease (Ie), Segmented image with the disease (Is)
Start
End
Table 3: Methodology limitations in the detection of the disease with a) Proposed method, b) K-means with LAB, c) K-means with HSV

Table 4: Error in the detection of the percentage of affected area
Method                                        Error (%)
1. Proposed method                            4.32 ± 5.44
2. K-means with LAB (Sharma et al., 2017)     7.72 ± 7.24
3. K-means with HSV                           10.36 ± 7.60

Table 5: Computation time
Method                                        Time
1. Proposed method                            0.03 ± 0.01
2. K-means with LAB (Sharma et al., 2017)     5.15 ± 0.28
3. K-means with HSV                           4.91 ± 0.21

to this with respect to the color space used; these algorithms were implemented in Matlab® on a computer with an Intel Core i7 processor and 16 GB of RAM to establish the computation time required to measure the percentage of affected area. In the third stage, this methodology is validated by detecting other diseases whose color characteristics are similar to those of Phytophthora infestans.

Background segmentation
In this case, the comparison is made with three state-of-the-art approaches in which the background of the image is eliminated. The first approach, shown in Figure 6a, uses the images with the segmented background that the Plant Village database provides for all tomato leaves, both healthy and diseased. These are available in Mohanty (2016). In the second approach, shown in Figure 6b, the background extraction is performed with the G and B channels of RGB. This method was proposed by Hlaing & Zaw (2017). In the third approach, shown in Figure 6c, the G and B channels of RGB and the S channel of HSI are used. This method was proposed by Sharma et al. (2017).

Detection of the disease
In order to validate the proposed methodology in the detection of the percentage of diseased area of the leaf, the work developed in Arakeri et al. (2015) was selected. In this approach, the K-means classifier is used with 3 clusters that represent the background, the healthy and diseased part of the leaf, respectively, with LAB's A and B channels, and the Euclidean norm to calculate the distance between the centroids and the pixel. In addition, a variation of this method is proposed using HSV's H and S channels, taking into account that this color model has been used to segment regions of interest obtaining good efficiency (Chen et al., 2008).
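The K-means comparison described above can be sketched in plain Python as follows. This is a minimal illustration rather than the implementation used in the article: each pixel is assumed to be represented by two channel values (for example, A and B of LAB, or H and S of HSV), the initial centroids are passed in explicitly to keep the example deterministic, and the function name `kmeans` and the sample data are hypothetical.

```python
import math


def kmeans(points, centroids, iterations=20):
    """Cluster 2-D feature points using Euclidean distance to the centroids.

    Returns the cluster label of each point and the final centroids.
    """
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean norm.
        labels = [min(range(len(centroids)),
                      key=lambda k: math.dist(p, centroids[k]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                updated.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                updated.append(old)   # keep an empty cluster where it was
        if updated == centroids:
            break                     # converged
        centroids = updated
    return labels, centroids


# Hypothetical two-channel features for background, healthy and diseased pixels.
pixels = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0), (10.0, 0.2)]
labels, centers = kmeans(pixels, [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)])
print(labels)  # [0, 0, 1, 1, 2, 2]
```

Because every pixel is compared against every centroid at each iteration, the cost grows with the image resolution, which is consistent with the longer computation times reported for the K-means variants.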

Disease Segmentation
Finally, this methodology was tested on banana leaves with black Sigatoka and yellow Sigatoka. Black Sigatoka initially manifests as thin reddish-brown vertical stripes on 2-4 leaves; as the lesion progresses, the stripes turn dark brown surrounded by a yellow halo. Yellow Sigatoka is very common in cool environments and at altitudes higher than 1600 meters above sea level. It manifests initially as vertical yellow stripes. As the lesion progresses, the color becomes more vivid and more visible, and then changes to brown surrounded by a black border with a grey centre (Pérez, 2012). Considering that the color characteristics of these two diseases are similar to those of Phytophthora infestans, it was decided to apply the method proposed in this work, for which it was necessary to build a database consisting of 465 images that were captured by visiting

To validate this method, the error referred to the size of the image was calculated taking into account the manual segmentation of the area corresponding to the disease, for which Equation 3 was used.

$$\text{Referred error} = \frac{|AE - areaf|}{TI} \times 100\% \qquad (3)$$
Where:
AE: Number of pixels corresponding to the leaf disease obtained manually.
areaf: Number of pixels corresponding to the disease on the leaf obtained by the algorithm.
TI: Image size (number of pixels).

Evaluation and validation procedure
For the evaluation and validation process, three stages are established. In the first stage, the proposed method is compared with other state-of-the-art approaches to quantify the efficiency of each in the segmentation of the background. In the second stage, the procedure presented in Arakeri et al. (2015) is implemented, along with a variant

Percentage of Area affected by the disease
To determine the percentage of area affected by the disease, the following algorithm is implemented using the segmentation of the leaf and the disease.

// Percentage of area affected by the disease
Input: Quantity of pixels of the segmentation of the leaf by the algorithm (SA), Segmented image with the disease (Is)
Output: Percentage of area affected (Por), Quantity of pixels corresponding to the disease (areaf)
Start
End

different plantain crops in the Risaralda region. In Figure 7, the database is shown organized according to the percentage of stage associated with the black Sigatoka and yellow Sigatoka severity scale, as shown below:
− Percentage of stages 1 and 2: Low Severity.
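Returning to the percentage-of-area algorithm in this section, its inputs and outputs suggest the following minimal sketch. It assumes that areaf is the count of non-zero pixels in the segmented disease image Is and that Por is the ratio of areaf to SA expressed as a percentage; the function name and the sample mask are hypothetical.

```python
def affected_percentage(sa, disease_mask):
    """Return (Por, areaf) for a leaf of `sa` pixels and a binary disease mask.

    sa           -- number of leaf pixels found by the segmentation (SA)
    disease_mask -- binary image of the segmented disease (Is), nested lists
    """
    if sa <= 0:
        raise ValueError("the leaf area SA must be positive")
    areaf = sum(1 for row in disease_mask for p in row if p)
    por = areaf / sa * 100.0
    return por, areaf


# Hypothetical disease mask with 6 diseased pixels on a leaf of 24 pixels.
mask = [
    [0, 0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
por, areaf = affected_percentage(24, mask)
print(por, areaf)  # 25.0 6
```

Dividing by the leaf area rather than by the image size gives a severity value that does not depend on how much background the photograph contains.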

RESULTS AND DISCUSSION
This section shows the results of the developed methodology and compares them with other state-of-the-art approaches.

Background segmentation
Next, the methods described in the methodology for the segmentation of the background are compared. Figure 8 shows the results obtained using the images selected from Table 1, where a correct segmentation is observed. However, these methodologies have certain limitations when there are dark shadows on the leaf or when the background has a shade similar to that of the disease, as shown in Figure 8. These problems appear when the illumination varies strongly across the images of the database.
Table 2 shows the error in the segmentation of the background for each of the methods evaluated, where it is evident that methods 1 and 2 have a better performance. However, for the second case, which corresponds to the database, the methodology used to create the threshold is not evident. It is important to keep in mind that, because these images have a high resolution, any region that is not segmented contributes significantly to the error.

Detection of the percentage of affected area
This section presents the results of the implemented algorithm for the detection of the disease and its comparison with K-means with LAB and K-means with HSV. Note that Figure 9 shows the need to use the two color spaces (YIQ and TSL) to eliminate the parts that do not correspond to the disease. Table 3 shows the results of the implemented algorithm and of the two K-means variants. A better K-means result is seen when the A and B channels of LAB are used than when the H and S channels of HSV are used for the same leaves. All methods have difficulties when the background has disease-like shades (brown), so it is advisable to capture the images against a background other than grey, since this color can vary significantly depending on the illumination. However, the proposed methodology discriminates this type of information better than the K-means classifier. On the other hand, K-means with the H and S channels presents a greater error in the segmentation of the disease. Table 4 shows the errors obtained in determining the percentage of affected area on the leaf. Note that all three strategies have good efficiency.
However, the method proposed in this article has a low computational cost compared to the other strategies implemented, as evidenced in Table 5.
Table 6 shows the results of the detection of the diseases for each of the severities, where it is observed that, for each of the stages, it is possible to detect the stripes with their respective color variation at the site of the lesion.

CONCLUSIONS
A methodology was presented that allows the detection of Phytophthora infestans in tomato leaves and of black Sigatoka and yellow Sigatoka in banana leaves, and that determines the percentage of area affected by the lesion, obtaining reliable results with an error of less than 5% on the Plant Village database. In the exploratory tests on banana leaves, the proposed algorithm satisfactorily detected the variations of the disease, evidencing the versatility of the method to achieve different detections from visual information of the leaf. With this information it is possible to determine the percentage of severity of the plant, since the leaf area and the percentage of affected area are known, and to apply preventive, corrective or curative measures according to this percentage. Many of the works developed for the detection of diseases use the A and B channels of LAB and the H channel of HSV. In this case, it was decided to use other color spaces, YIQ and TSL, obtaining reliable results in the detection of the disease at a low computational cost compared to the other state-of-the-art methods.
On the other hand, this methodology was evaluated on another disease with similar color characteristics, obtaining satisfactory results in the detection of the affected areas. It is important to note that, although the proposal has its limitations, a thorough test of the algorithm was performed, for which it was essential to label the disease in order to verify the efficiency of the developed algorithm. This method is a diagnostic support tool that allows the user to follow the evolution of the disease automatically. This is of great importance, since the current proliferation of technology allows farmers to use this type of tool to improve the productivity of their crops.

NOMENCLATURE
The nomenclature used in the document is provided here for quick reference.
LAB - Luminance (L), color range from green to red (A), color range from blue to yellow (B) (Gonzalez et al., 2009).
YCbCr - Luminance (Y), blue color-difference component (CB), red color-difference component (CR) (Gonzalez et al., 2009).
YIQ - Luminance (Y), chrominance (I and Q), covering the color components from orange to cyan and from green to magenta, respectively (Gonzalez et al., 2009).
SIFT - Scale-invariant feature transform.
CNN - Deep convolutional neural network.
rgb2hsv - Convert from RGB color space to HSV.
rgb2tsl - Convert from RGB color space to TSL.
rgb2yiq - Convert from RGB color space to YIQ.
σ - Standard deviation.