USE OF DIGITAL IMAGES TO CLASSIFY LEAF PHOSPHORUS CONTENT IN GRAPE TOMATOES

Leaf chemical analysis is one of the ways to assess plant development, but it is expensive and time-consuming. Variation in leaf nutrient content modifies the proportion of light reflected and absorbed by plants at different wavelengths. Relating the color reflected by the leaves to their phosphorus (P) content and using these data as input to an artificial neural network (ANN) can be an alternative for determining P content. For this, it is necessary to establish which colors are most correlated with the different nutrients. Therefore, the phosphorus content in tomato leaves was evaluated in this study under five treatments (0, 25, 50, 75, and 100% of the P dose). Different vegetation indices were also evaluated using images of mini-tomato leaves through a principal component analysis to determine which ones would be suitable as inputs to an ANN (multilayer perceptron).


INTRODUCTION
Among the nutrients needed for plant activities, such as protein production, productivity, and fruit quality, phosphorus (P) is one of the most extracted by plants (Kumar et al., 2015). Phosphorus plays an important role in the initial development of tomatoes and contributes to increasing the commercial fruit quality (Nowaki et al., 2017). Leaf chemical analysis is one of the most commonly used methods to monitor the nutritional status of plants, but the response is often slow, hindering decision-making for efficient management (Meiqing et al., 2016). Alternatively, the light reflectance of leaves at different wavelengths can be evaluated. In the case of phosphorus, some authors have identified useful band regions, which range from the visible to the shortwave infrared (SWIR) (Li et al., 2018). The visible region is related to stress due to P deficiency, highlighting the bands in the blue, red, and red-edge regions (Stein et al., 2014). This stress causes some changes in the leaves, such as a higher presence of anthocyanin and a resulting decrease in chlorophyll (Marschner, 2013).
Thus, the use of computer vision with RGB (Red, Green, and Blue) images associated with artificial neural networks arises as an alternative for evaluating and classifying the nutritional status of plants. The advantages of this method are its lower cost (compared to optical sensors), greater accuracy (compared to visual assessment), and shorter time (compared to chemical analysis) (Pir, 2016). Several vegetation indices have been described in the literature in recent years, such as NDVI (Normalized Difference Vegetation Index) (Rouse et al., 1973), MPRI (Modified Photochemical Reflectance Index) (Yang et al., 2008), and DGCI (Dark Green Color Index) (Karcher & Richardson, 2003).
Some authors have conducted studies on the subject to highlight the importance of vegetation indices in crop evaluation. Oliveira et al. (2019) evaluated different indices in the relationship between nitrogen doses and yield in tomato crops. Siedliska et al. (2021) used hyperspectral data from three different crops, while Yanli et al. (2015) and Peng-Tao et al. (2018) evaluated orange and rubber tree leaves, respectively, to classify the amount of phosphorus. Neural networks have been applied to classify nutrients in the leaves of several crops, such as palm (Jayaselan et al., 2018) and tobacco (Backhaus et al., 2011). Furthermore, Barbedo (2019) published a review on this topic, covering several studies using images and machine learning.
Therefore, this study aimed to evaluate, through digitized images of grape tomato leaves, which colors and vegetation indices have a higher correlation with leaf phosphorus contents, so that these contents can be classified using an artificial neural network.

MATERIAL AND METHODS
Study area, crop, and substrate
The experiment was conducted in a 117 m² greenhouse built with 10-mm transparent alveolar polycarbonate and equipped with a pad-and-fan evaporative air cooling system. The facility belongs to the Laboratory of Production Technology and Plant Health of the Department of Biosystems Engineering, located at the Faculty of Animal Science and Food Engineering (FZEA) of the University of São Paulo (USP), Pirassununga, State of São Paulo, Brazil. This city is located in the Midwest region of the State of São Paulo, at latitude 21°59′46″ South and longitude 47°25′33″ West, at an altitude of 627 meters above sea level. The climate is classified as Cwa (subtropical climate) according to the Köppen and Geiger classification.
The mini-tomato cultivar Red Sugar (TPC-1430) from the company Agristar was used in the experiment. The seedlings were produced in the greenhouse; the seeds were sown on December 19, 2017, in 128-cell polyethylene trays filled with a granular coconut fiber substrate with pH = 5.5-6.2, electrical conductivity (EC) = 0.5 ± 0.6 dS m⁻¹, and density = 150 kg m⁻³, with the nutritional composition shown in Table 1. The substrate was hydrated at a proportion of 35 L of water to 31 kg of substrate, which decompressed it. After emergence, the seedlings were irrigated manually with a nutrient solution with the following composition: calcium nitrate (0.513 g L⁻¹), potassium sulfate (0.419 g L⁻¹), magnesium sulfate (0.389 g L⁻¹), and micronutrients (44 mg L⁻¹). Irrigation was performed one to three times a day, according to weather conditions and substrate moisture. Before transplanting, the substrate in each pot was saturated with the same nutrient solution used to irrigate the seedlings. The electrical conductivity of the solution drained from the pot was measured after 24 hours, and the substrate was considered ideal for planting when the value was below 3 dS m⁻¹. The transplanting was conducted on January 30, 2018, that is, at 40 days after sowing (DAS), when the seedlings had four fully expanded leaves.

Experimental design, irrigation, and nutrient solution
A randomized block design was used in a 5 × 4 factorial scheme, with five phosphorus (P2O5) doses (0, 25, 50, 75, and 100% of the recommended dose) in the nutrient solution, four evaluations in subplots (22, 36, 50, and 64 days after transplanting, DAT), and four replications. Each plot consisted of four black plastic bags of 6 liters each filled with 4.3 liters of the same substrate used for seedling production, a volume that guaranteed a water holding capacity of 2.18 liters. In total, 160 plants were grown in a double-stem system. The plants were divided into four rows with 40 pots each, with a spacing of 0.4 m between plants and 0.6 m between rows, reaching a density of 4.1 plants m⁻².
The plants were drip irrigated using an automated system with a flow rate of 8 L h⁻¹, with each dripper tube providing 2 L h⁻¹ per plant. The pH was measured using a strip and the electrical conductivity was obtained using a portable conductivity meter (waterproof Pen mS/cm Tester), both at each preparation of the solution. Initially, irrigation was performed once a day at 9:30 am, with an irrigation pulse of four minutes, totaling an applied volume of 120 mL. The nutrient solution was prepared in the water tank for a volume of 400 liters, according to Cunha et al. (2014).
The necessary phosphorus doses were applied manually according to the treatments proposed in the experiment (0, 25, 50, 75, and 100% P2O5), with the 100% P2O5 dose consisting of 114 g per 1000 L up to 22 DAT and 260 g per 1000 L until the end of the experiment (64 DAT). The amount applied weekly was calculated as a function of other nutrients applied via irrigation.

Images, vegetation indices, and nutritional diagnosis
Tomato leaf images were taken at 22, 36, 50, and 64 DAT to calculate the vegetation indices. The 4th fully expanded leaf from the apex was sampled, as determined by Malavolta et al. (1989), with four leaves being collected from each plot. Both the upper and lower faces of the leaves were scanned on a flatbed scanner against a white card stock background, resulting in 300 dpi images saved in TIFF format. A total of 80 samples were collected on each date, that is, four samples per plot, but the value of each index was obtained from the average of the samples per plot. Thus, 20 values of each index were obtained on each date, totaling n = 80 at the end of the collections.
After acquisition, the images were loaded into software developed in MATLAB (https://url.gratis/zRt6Cz) to obtain their RGB and HSV component values and calculate the vegetation indices of interest. The indices were MPRI, DGCI, Rn (normalized red), Gn (normalized green), and Bn (normalized blue), calculated for both faces of the leaves.
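As an illustration of how these indices can be derived from the color components, the sketch below uses the definitions commonly given in the literature: Rn = R/(R+G+B) and likewise for Gn and Bn, MPRI = (G − R)/(G + R), and DGCI = [(H − 60)/60 + (1 − S) + (1 − B)]/3, with hue in degrees. It is written in Python rather than in the MATLAB software used in the study. The function names and the white-background threshold are assumptions made for this example and do not describe the authors' implementation.

```python
import colorsys
import math


def mean_leaf_rgb(pixels, white_threshold=240):
    """Average the RGB values of leaf pixels, skipping the white background.

    `pixels` is an iterable of (r, g, b) tuples with 8-bit channels.
    The threshold rule is a hypothetical segmentation, not the authors' method:
    a pixel is treated as background when all channels are >= white_threshold.
    """
    leaf = [p for p in pixels if min(p) < white_threshold]
    if not leaf:
        raise ValueError("no leaf pixels found")
    n = len(leaf)
    return tuple(sum(p[c] for p in leaf) / n for c in range(3))


def color_indices(r, g, b):
    """Return Rn, Gn, Bn, MPRI, and DGCI for mean 8-bit RGB values."""
    total = r + g + b
    if total == 0:
        raise ValueError("normalized indices are undefined for a black pixel")
    rn, gn, bn = r / total, g / total, b / total
    # MPRI is undefined when both red and green are zero.
    mpri = (g - r) / (g + r) if (g + r) > 0 else math.nan
    # colorsys returns hue, saturation, and value (brightness) in [0, 1].
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    dgci = ((h * 360 - 60) / 60 + (1 - s) + (1 - v)) / 3
    return {"Rn": rn, "Gn": gn, "Bn": bn, "MPRI": mpri, "DGCI": dgci}


if __name__ == "__main__":
    # Hypothetical pixels: one white background pixel and two leaf pixels.
    scan = [(255, 255, 255), (60, 120, 40), (40, 100, 60)]
    print(color_indices(*mean_leaf_rgb(scan)))
```

In this sketch the indices are computed once per leaf face from the mean color, so the upper and lower faces would each be passed through `color_indices` separately.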
The leaves used for image acquisition were stored in a paper bag and oven-dried at 65 °C for 72 hours. Subsequently, the samples were ground and sent for analysis at the Laboratory of Analysis of the Agricultural Sciences sector of FZEA/USP.

Data analysis
Pearson's correlation was used to assess the colors most related to the phosphorus content in the leaves, according to the following criteria: perfect correlation (r = 1), strong correlation (r > 0.75), moderate correlation (r > 0.5), weak correlation (r < 0.5), and non-existent correlation (r = 0) (Nogueira et al., 2010). The indices Rn, Gn, and Bn were evaluated at this phase of the analysis. A principal component analysis (PCA) was performed using the statistical software R to determine the vegetation indices that best indicated the P variation in the leaves. Principal component analysis is a multivariate exploratory tool that can reveal anomalous samples, relationships between measured variables, and relationships or groups between samples (Lyra et al., 2010).
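The two analyses described above can be sketched in a few lines. The example below is in Python, not the R workflow used in the study, and makes several assumptions: the strength criteria are applied to the absolute value of r, the boundary value |r| = 0.5 is assigned to "weak", and the PCA is run on the correlation matrix (standardized variables), with only the first component extracted by power iteration. All function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have the same length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def correlation_strength(r):
    """Label |r| using the criteria listed in the text."""
    a = abs(r)
    if math.isclose(a, 1.0):
        return "perfect"
    if a > 0.75:
        return "strong"
    if a > 0.5:
        return "moderate"
    if a > 0:
        return "weak"  # assumption: |r| = 0.5 exactly also falls here
    return "non-existent"


def first_principal_component(columns, iterations=200):
    """Loadings and explained-variance share of the first component.

    `columns` holds one list of values per index (for example Gn, Bn, DGCI).
    Power iteration on the correlation matrix; the sign of the loadings
    is arbitrary, as in any PCA.
    """
    p = len(columns)
    corr = [[1.0 if i == j else pearson_r(columns[i], columns[j])
             for j in range(p)] for i in range(p)]
    # Uneven start vector to avoid starting orthogonal to the solution.
    v = [1.0 / (i + 1) for i in range(p)]
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0:
            raise ValueError("start vector is orthogonal to the component")
        v = [c / norm for c in w]
    eigenvalue = sum(v[i] * sum(corr[i][j] * v[j] for j in range(p))
                     for i in range(p))
    return v, eigenvalue / p
```

With these criteria, for instance, a coefficient of −0.514 is labeled "moderate" because only its magnitude is considered.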
The phosphorus contents were classified into distinct categories after the indices were determined. An artificial neural network (ANN) was developed using the indices as input data. The backpropagation multilayer perceptron (MLP) ANN was developed in MATLAB (https://url.gratis/8VLyIa). The intermediate layer had ten neurons, and the Levenberg-Marquardt training function (trainlm) was used, with performance evaluated by the mean squared error (MSE). Two-thirds (n = 53) of the data sample was used for network training, and 20 random samples, distributed throughout the four evaluation periods (22, 36, 50, and 64 DAT), were selected for the network validation stage. The hit rate was used as the parameter for evaluating the efficiency of the artificial neural network and was calculated using the following equation:
Hit rate (%) = [Vp / (Vp + Fp + Fn)] × 100
where Vp is the number of true positives, Fp is the number of false positives, and Fn is the number of false negatives. True positives are the ANN outputs classified within the category to which the real data belong, false positives are the outputs classified below this category, and false negatives are the outputs classified as category II when the real data belong to category I. The P content categories were established according to Table 2. The same steps were performed with the values of P contents in the leaves obtained in the laboratory.
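A minimal sketch of the sample split and the hit-rate calculation is given below in Python. It assumes that the hit rate is the share of true positives among all classified samples, Vp / (Vp + Fp + Fn) × 100, and that categories are coded as integers (1 for category I, 2 for category II). The simple random split does not enforce an even distribution across evaluation dates. The function names, the seed, and the example labels are hypothetical.

```python
import random


def split_samples(n_total=80, n_train=53, n_validation=20, seed=0):
    """Randomly pick training indices and, from the rest, validation indices."""
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)
    train = indices[:n_train]
    validation = indices[n_train:n_train + n_validation]
    return train, validation


def hit_rate(actual, predicted):
    """Return (hit rate in %, Vp, Fp, Fn) for integer category labels.

    Vp: predicted category equals the actual one.
    Fp: predicted category is below the actual one.
    Fn: predicted category is above the actual one.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("label lists must be non-empty and of equal length")
    vp = sum(1 for a, p in zip(actual, predicted) if p == a)
    fp = sum(1 for a, p in zip(actual, predicted) if p < a)
    fn = sum(1 for a, p in zip(actual, predicted) if p > a)
    return 100.0 * vp / (vp + fp + fn), vp, fp, fn


if __name__ == "__main__":
    # Hypothetical labels: 20 samples, two of them classified one category low.
    actual = [2] * 20
    predicted = [2] * 18 + [1] * 2
    print(hit_rate(actual, predicted))
```

For the hypothetical labels in the example, 18 of the 20 samples are classified correctly and two fall one category below, with no false negatives.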

RESULTS AND DISCUSSION
The statistical analysis of nutrient contents in the leaves showed that only phosphorus (P) and manganese (Mn) varied significantly with the doses (Figure 1). Thus, the change in leaf color can be explained only by variations in phosphorus content. Pearson's correlation (Figure 2) showed that the green (upper and lower faces) and blue (lower face) colors had a moderate correlation with phosphorus contents in the leaf (0.628, 0.594, and −0.514, respectively). A reduction in the phosphorus content in leaves also leads to a decrease in chlorophyll efficiency because phosphorus participates in energy production (ATP from ADP) for photosynthesis (Geiger & Servaites, 1994). Thus, adequate phosphorus contents lead to a higher green reflectance (wavelength of 550 nm). This factor explains the higher correlation of normalized green with phosphorus contents. Figure 3 shows that lower phosphorus doses also had lower chlorophyll values, reducing the green color of the leaves. Costa et al. (2019) also observed a quadratic behavior of chlorophyll contents regarding the nitrogen variation in vine leaves. Mulla (2013) also observed that normalized green is concentrated in absorption areas where chlorophyll and other pigments (such as anthocyanin) act, a fact that explains the changes in this color with the variation in P contents. Moreover, the reflection in the blue range increases in the leaf regions where less phosphorus is found due to the higher presence of anthocyanin. Figure 4 shows the contribution of each index obtained by PCA. Both leaf faces had a high value, as indicated by the eigenvectors. Liu et al. (2015) observed that the upper leaf face gave a better result for indicating N and P contents in citrus leaves.
According to Karcher & Richardson (2003), the higher the DGCI value, the darker the green tone measured by this index. Therefore, the positive eigenvector of this index indicates that P variation in tomato leaves more intensely alters the green color saturation. Healthy plants, whose chlorophyll values were not changed by deficient P contents (Figure 3), have darker green leaves. The opposite occurs in plants where P decreases, which increases the amount of anthocyanins in the leaves and leads to a higher presence of blue relative to green. This can be observed in the Bn index, which measures the amount of the blue band reflected by leaves relative to the total RGB. The negative Bn value observed in the second principal component is explained by a decrease in the amount of P and the consequent purple color of the leaves. The purple color has a higher value of reflected blue in its composition, which explains why a decrease in the amount of P in the leaves causes an increase in the Bn index. Bands reflected by plants in the visible range are more commonly associated with P deficiency, especially in the blue, red, and red-edge regions (Stein et al., 2014). This explains why the Bn index had a high negative correlation with the P data in the analyzed leaves for both Pearson's correlation and PCA. Samples with lower P contents were distributed in the same quadrant (by PCA) as the Bn_U and Bn_L indices (Figure 5). Those with higher nutrient contents (mostly samples 1 to 20) are in the same quadrants as the DGCI_L, Gn_L, and Gn_U indices, which may indicate that a higher presence of P leads to a greener leaf than in those with lower P contents. Sun et al. (2018) evaluated macronutrient deficiency in rice and observed a predominance of light green color, while dark green leaves could be observed in P-deficient plants, in which case the leaves changed from green to purplish gray.
These authors evaluated different vegetation indices and demonstrated that, using the first most expanded leaf of rice, the normalized green index is among those most related to phosphorus deficiency, whereas the normalized green and MPRI are among the most related in the second leaf. However, only DGCI is related to the phosphorus content when using the third most expanded leaf. The indices cited by these authors are in line with those used in the present study, showing a higher relationship with P contents. Rice plants tend to form anthocyanin and may turn red or purplish in the sheaths (Chen et al., 2014). These authors also found that rice leaves are narrower, erect, and have dark green spots under P deficiency, which differs from the result found in the present study for tomatoes, in which a higher P content in the leaves had a positive relationship (by PCA) with DGCI, that is, darker leaves.
Therefore, the neural network was trained with these four indices. The neural network had a hit rate (in classifying the samples within the defined categories) of 80.77% in training using 2/3 (53) of the 80 samples. This rate was obtained after 10 iterations of the system, with a performance of 0.24 (mean squared error). The hit rate at this phase became higher with the data obtained after 36 DAT, which indicates that an analysis of leaves from this period of development would be sufficient for P classification in the leaves. The hit rate was 69% at 22 DAT and 76.9% at 36 and 50 DAT. Twenty random samples were selected from the remaining 1/3 of the dataset to validate the neural network. The hit rate at this phase was 90% for classification in the desired categories, with a performance of 1.45 (MSE). Christensen et al. (2004) worked with a neural network to predict P and N contents in barley leaves using leaf reflectance data instead of vegetation indices and obtained hit rates of 74% (P) and 81% (N). Yanli et al. (2015) used hyperspectral images and obtained better results in the prediction of phosphorus rate using the adaxial face of orange leaves. Jayaselan et al. (2018) used a neural network to classify the nitrogen and potassium contents in palm leaves and obtained an overall hit rate of 85.3% for both elements. Aboukarima et al. (2020) applied a neural network to recognize five varieties of faba bean and obtained a hit rate of 77.5%. The hit rate obtained in the present study is, therefore, at a level close to that obtained by other authors who used neural network classifiers.
The P content in the leaves predicted by the ANN relative to the measured values (Figure 6) showed a mean percentage error of 64.3% at the validation phase of the neural network. Figure 7 shows the percentage error variation in each sample used in the validation of the neural network. Among these samples, the error ranged from 6 to 413%. Four out of the 20 samples had errors with a variation from the actual value above 100%, three between 50 and 100%, and 13 below 50% (six having a variation below 6%). A larger number of samples for the neural network training phase can contribute to a decrease in the observed error.
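The per-sample error summary reported above can be reproduced with a short calculation. The Python sketch below assumes that the percentage error is the absolute difference between the predicted and measured P content, divided by the measured content; the text does not state the exact formula. The function names and the example values are hypothetical and are not data from the study.

```python
def percentage_errors(measured, predicted):
    """Absolute percentage error of each prediction relative to the measured value."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    return [abs(p - m) / m * 100.0 for m, p in zip(measured, predicted)]


def summarize_errors(errors):
    """Mean error and sample counts in the three ranges used in the text."""
    return {
        "mean": sum(errors) / len(errors),
        "below_50": sum(1 for e in errors if e < 50),
        "from_50_to_100": sum(1 for e in errors if 50 <= e <= 100),
        "above_100": sum(1 for e in errors if e > 100),
    }


if __name__ == "__main__":
    # Hypothetical P contents, for illustration only.
    measured = [2.0, 4.0, 1.0]
    predicted = [3.0, 4.0, 3.5]
    print(summarize_errors(percentage_errors(measured, predicted)))
```

Because each error is divided by the measured value, samples with very low P content can produce large percentage errors even when the absolute difference is small.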
Regarding the classification error of the ANN (and not only the error in obtaining the P content), about 10% of the samples were classified in a category below the actual value. In other words, both results were false positives for deficient nutrition if category II is considered the ideal range of nutrients. No false negatives were observed, that is, no sample from category I was classified within category II, and the same occurred at the training phase of the network.
According to Withers et al. (2018), the use of phosphate fertilizers in Brazil is twice the demand of plants. This supply above demand is due to soil characteristics, in addition to the lack of leaf analysis. Thus, obtaining leaf analysis instantly using images and artificial neural networks can assist in faster handling and increase the efficiency in the use of this type of fertilization. The results presented in this study also indicate that the technique can be applied to other nutrients, making it a commercially viable alternative for leaf analysis.

CONCLUSIONS
The analysis of the colors reflected in the visible range showed that the green and blue colors had a moderate correlation with phosphorus (P) content in leaves. This finding was corroborated by the results of the principal component analysis with the vegetation indices, which indicated that the DGCI index has a positive relationship with P content, while the opposite occurs with the Bn index. The developed neural network had a hit rate similar and, in some cases, superior to that of other studies with the same purpose of predicting or classifying nutrient contents in plant leaves. This indicates that the neural network, using the mentioned indices as input, can be a way to replace the current analysis in obtaining the P content in leaves. The results also indicate that using both leaf sides to measure the nutritional content increases the efficiency of the classification of P content.

ACKNOWLEDGMENTS
To the company Agristar, for the partnership and seed donation; to AMAFIBRA, for the coconut fiber donation. This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001.