# ABSTRACT

Manual phenotyping of papaya (Carica papaya L.) for breeding purposes limits the number of plants that can be evaluated and hampers the selection of superior genotypes. This study aimed to validate two methodologies: phenotyping of morpho-agronomic plant traits using image analysis, and phenotyping of fruit traits using image processing. In plants of the ‘THB’ variety and the ‘UENF/Caliman-01’ hybrid, two images (A and B) were analyzed to estimate the number of commercial and irregularly shaped fruits. Image A was also used to estimate plant height, stem diameter and first fruit insertion height. In ‘THB’ fruits, the largest and smallest diameters, length, and volume were estimated using a caliper and image processing (IP). Volume was obtained by water column displacement (WCD) and by the ellipsoid approximation (EA) expression. Correlations above 0.85 between manual and image measurements were obtained for all traits. The averages of the morpho-agronomic traits estimated from images were similar to the averages measured manually. In addition, the errors of the proposed methodologies were low compared to manual phenotyping. The Bland-Altman approach indicated agreement between the volume measured by WCD and the volume estimated by EA using caliper and IP dimensions. The strong association obtained between volume and fruit weight suggests the use of regression to estimate this trait. Thus, image-based phenotyping is expected to allow experiments to be expanded while maintaining accuracy, providing greater genetic gains in the selection of superior genotypes.

Keywords:
Carica papaya; phenomics; digital image processing

# Introduction

Although Brazil is the second largest producer of papaya (Carica papaya L.) (FAO, 2015), there is still limited genetic variability in commercial plantations (Oliveira et al., 2010), making it necessary to increase the supply of cultivars through the selection of superior genotypes. This fruit tree has a relatively long reproductive cycle, in which repeated measurements must be taken on each individual under different environmental conditions. Therefore, experiments require several replications, a large number of treatments, and rigor and precision in selection methods. Currently, manual phenotyping for traits of interest is labor intensive and time consuming. Thus, manual phenotyping limits the assessment of large numbers of individuals, and may decrease experimental precision and reduce the chances of selecting superior genotypes and of obtaining genetic gain (Roscher et al., 2014).

In order to improve the efficiency of phenotyping for traits of interest, methodologies based on image analysis and processing have been developed. The application of these methodologies is expected to allow high-throughput phenotyping, increasing the number of assessed genotypes, improving the acquisition and analysis of data and minimizing experimental error (Li et al., 2014; Roscher et al., 2014). A methodology based on digital image analysis was validated by Ferreira et al. (2012) to measure plant height, stem base diameter and diameter at breast height in Australian red cedar (Toona ciliata). The authors obtained averages similar to those of measurements performed manually. Digital image processing algorithms have been validated with a high degree of accuracy to estimate the number of commercial fruits in apple (Aggelopoulou et al., 2011), mango (Payne et al., 2013) and grapevine (Roscher et al., 2014). Likewise, processing algorithms have been validated to estimate the length, diameter and volume of fruit in watermelon (Koc, 2007), orange (Khojastehnazhand et al., 2009) and cantaloupe (Rashidi et al., 2009). Accordingly, this study aimed to validate two methodologies: phenotyping of morpho-agronomic plant traits using image analysis, and phenotyping of fruit traits using image processing.

# Materials and Methods

## Genetic material

The experiment was conducted in Linhares, in the state of Espírito Santo, Brazil (19°06′ and 19°18′ S, 39°45′ and 40°19′ W, altitude 30 m). One hundred and fifty plants of the ‘UENF/Caliman-01’ hybrid and 150 plants of the ‘THB’ variety were randomly selected from two commercial plots.

## Phenotyping of the morpho-agronomic plant traits

Each plant used in this experiment was identified to facilitate comparison between the proposed and manual methodologies. The plants were then phenotyped manually. Plant height (PH) and first fruit insertion height (FFIH) were measured with a measuring tape and expressed in centimeters; stem diameter (SD) was measured at 20 cm from the ground using a digital caliper and expressed in millimeters. The number of commercial fruits (NCF) and the number of irregularly shaped (deformed) fruits (NDF) were also evaluated. In the fruit counting, only fruits with a defined shape were considered, and the last fruit evaluated was marked to facilitate the digital counting. In this study, pentandric, carpelloid and bananoid fruits were classified as irregularly shaped.

For digital phenotyping, a conventional camera was used. Each plant was photographed from two positions: one perpendicular to the plant with respect to the axis of the row (Image A) (Figures 1A and C), and the other from the opposite side of the same plant (Image B) (Figures 1B and D). The pictures were taken at a distance of 2.5 m from the plant row. Image A was used to measure PH, FFIH and SD (Figure 1A), and both images (Image A + Image B) were used to estimate NCF and NDF (Figure 1A, B, C and D).

Figure 1
Images used to estimate morpho-agronomic plant traits. (A and B) Images used in the ‘UENF/Caliman-01’ hybrid; (C and D) images used in the ‘THB’ variety. (A and C) Image A, perpendicular to the plant considering the axis of the row; (B and D) Image B, taken from the opposite side of the same plant. Image A was used to measure plant height (PH), first fruit insertion height (FFIH) and stem diameter (SD); both images (Image A + Image B) were used to estimate the number of commercial fruits (NCF) and the number of deformed fruits (NDF). The orange arrows show the mark used to identify the last fruit evaluated.

The images were analyzed using the public domain image-processing program ImageJ. With each photographed plant, a ruler was placed as a reference for calibration with the software's Set Scale function. The NCF and NDF traits were estimated using the Cell Counter plugin, which is part of ImageJ.
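The calibration step described above amounts to a simple proportion between a reference object of known length and its length in pixels. The following is a minimal sketch of that idea, not the ImageJ implementation; the function name and the numeric values are hypothetical and chosen only for illustration.

```python
def pixels_to_length(pixel_distance, ref_pixels, ref_length):
    """Convert a distance in pixels to real units using a reference of known length.

    ref_pixels: length of the reference object (e.g. the ruler) in pixels.
    ref_length: real length of the same reference object (e.g. in cm).
    """
    if ref_pixels <= 0:
        raise ValueError("reference length in pixels must be positive")
    scale = ref_length / ref_pixels  # real units per pixel
    return pixel_distance * scale


# Hypothetical example: a 100 cm ruler spans 400 px, the plant spans 720 px.
plant_height_cm = pixels_to_length(720, ref_pixels=400, ref_length=100.0)
print(plant_height_cm)
```

This proportion only holds when the reference and the measured object lie at roughly the same distance from the camera, which is presumably why the ruler was photographed together with each plant.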

## Phenotyping of the morpho-agronomic fruit traits

For this study, 50 fruits of the ‘THB’ variety were selected, in which the traits fruit length (FL), largest fruit diameter (D1F), and smallest fruit diameter (D2F) were measured. The measurements were taken with a digital caliper (DC) and expressed in millimeters. The fruits were weighed using an analytical balance. Fruit volume was determined by the water column displacement (WCD) method: each fruit was immersed in a 10 L graduated volumetric container holding a known initial volume of water, and the fruit volume was obtained as the difference between the final and initial volumes.

The image processing (IP) system consisted of a box measuring 50 × 60 × 60 cm, with the inner walls covered with white cardboard and illuminated with two 20 W PL bulbs; a webcam placed on top of the box; and a laptop running ImageJ. Each fruit was placed in the center of the camera's field of view and two RGB color images were captured, the second after manually turning the fruit 90° around its longitudinal axis. A measuring tape was placed on the fruit surface and calibration was performed using the Set Scale function of ImageJ. The original image of each fruit was converted to an eight-bit grayscale image. The region of interest in the grayscale image was then segmented by thresholding with the Otsu algorithm. Grayscale values lower than 144 were converted to 0 (black) and values greater than 144 were converted to 250 (white), yielding a binary image for each fruit. The Canny filter was used to detect the edges in each image. The image-processing flowchart is illustrated in Figure 2A, B, C and D. The number of pixels representing the length and the width of each fruit was measured in the binarized image (Figure 2D).
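The thresholding step can be illustrated with a small, dependency-free sketch of Otsu's method applied to a flat list of eight-bit grayscale values. This is not the ImageJ implementation used in the study; the function names are hypothetical, and the white value is left as a parameter (255, the usual eight-bit maximum, is assumed as the default here, whereas the text above reports 250).

```python
def otsu_threshold(pixels):
    """Return the 8-bit threshold that maximizes the between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(value * count for value, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0  # pixel count and intensity sum of the class <= t
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue  # no pixels in the lower class yet
        w_fg = total - w_bg
        if w_fg == 0:
            break  # all pixels are in the lower class
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        between_var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def binarize(pixels, threshold, white=255):
    """Map values <= threshold to 0 (black) and all other values to `white`."""
    return [0 if p <= threshold else white for p in pixels]


# Hypothetical image: 50 dark pixels and 50 bright pixels.
gray = [10] * 50 + [200] * 50
t = otsu_threshold(gray)
binary = binarize(gray, t)
```

With a clearly bimodal histogram such as this one, any threshold between the two modes separates the classes equally well; the sketch returns the lowest such value.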

Figure 2
Digital image processing. A) Original RGB color image of papaya, B) eight-bit grayscale image, C) two-bit binary image and D) the outline image.

The parameters major and minor of ImageJ were used to validate the measurements of FL, D1F and D2F: the parameter major was compared to the FL trait, and the parameter minor was compared to the D1F and D2F traits. Major and minor 1 were obtained from the first image, and minor 2 was measured in the image obtained after rotating the fruit 90° around its longitudinal axis.

In order to estimate the volume from the fruit dimensions obtained via digital image and those obtained using a caliper, each papaya fruit was treated as a uniform ellipsoid. Thus, fruit volume was estimated from the length (FL), the largest diameter (D1F) and the smallest diameter (D2F) using the ellipsoid approximation (EA) equation (Koc, 2007):

(1) $v = \pi \left( \frac{FL \times D1F \times D2F}{6} \right)$
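Equation 1 can be evaluated directly from the three fruit dimensions. The sketch below assumes the dimensions are given in millimeters, as in the caliper measurements, and converts the result to milliliters so that it is comparable with WCD volumes; the function name and the unit conversion are illustrative assumptions, not part of the original procedure.

```python
import math


def ellipsoid_volume_ml(fl_mm, d1f_mm, d2f_mm):
    """Ellipsoid approximation: v = pi * FL * D1F * D2F / 6.

    Inputs are in millimeters; the result is returned in milliliters.
    """
    volume_mm3 = math.pi * fl_mm * d1f_mm * d2f_mm / 6.0
    return volume_mm3 / 1000.0  # 1 mL = 1 cm^3 = 1000 mm^3


# Sanity check with a sphere of 100 mm diameter (FL = D1F = D2F).
print(round(ellipsoid_volume_ml(100.0, 100.0, 100.0), 1))
```

For a sphere the expression reduces to the familiar $\pi d^3 / 6$, which gives a quick way to check the implementation.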

## Statistical Analysis

Statistical analysis was performed using SAS Studio software (SAS Institute, Cary, NC, USA).

In order to verify whether the phenotyping methodologies of morpho-agronomic traits assisted by digital images statistically differ from manual methodology, Student's t test was applied to assess differences in population averages for paired data (same population: manual phenotyping and phenotyping via digital image). In addition, confidence intervals were constructed for the average difference of methodologies (paired data).
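The paired comparison described above can be sketched with the Python standard library. The study itself used SAS Studio; the function names, the example data and the critical value below are illustrative assumptions. The critical t value is passed in by the caller (2.776 is the two-sided 95 % value for 4 degrees of freedom), since the standard library does not provide the t distribution.

```python
import math
import statistics


def paired_t(manual, image):
    """Return mean difference, standard error, t statistic and degrees of freedom."""
    if len(manual) != len(image):
        raise ValueError("paired samples must have the same length")
    diffs = [m - i for m, i in zip(manual, image)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # assumes the differences are not all equal
    return mean_diff, se, mean_diff / se, n - 1


def confidence_interval(mean_diff, se, t_crit):
    """Confidence interval for the mean paired difference."""
    return mean_diff - t_crit * se, mean_diff + t_crit * se


# Hypothetical plant heights (cm) for five plants.
manual = [180, 175, 190, 185, 170]
image = [179, 176, 189, 186, 170]
mean_diff, se, t_stat, df = paired_t(manual, image)
low, high = confidence_interval(mean_diff, se, t_crit=2.776)
```

If the interval contains zero, the two methodologies are not distinguishable at the chosen confidence level, which is the criterion applied in the Results.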

The average relative error of the digital image methodologies was calculated according to the following equation (Zhang, 2000):

(2) $\mathrm{Error}\,(\%) = \left| \frac{\bar{x}_m - \bar{x}_i}{\bar{x}_m} \right| \times 100,$

where $\bar{x}_m$ is the average of the trait obtained using the manual methodology and $\bar{x}_i$ is the average of the trait obtained via digital image.
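Equation 2 translates into a one-line function. The sketch below is illustrative; the function name and the example averages are hypothetical.

```python
def relative_error_pct(mean_manual, mean_image):
    """Average relative error (%) of the image-based mean against the manual mean."""
    if mean_manual == 0:
        raise ValueError("manual mean must be non-zero")
    return abs((mean_manual - mean_image) / mean_manual) * 100.0


# Hypothetical averages: 200.0 cm measured manually, 198.0 cm from the image.
error = relative_error_pct(200.0, 198.0)
```

Because the error is computed from the two averages rather than plant by plant, over- and underestimates on individual plants can cancel out, so it should be read together with the correlations and confidence intervals.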

Pearson correlations between the manual measurements and those obtained through the digital images were calculated for the traits assessed.
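For reference, the Pearson correlation between paired manual and image measurements can be computed as follows. This is a minimal sketch with hypothetical names and data, not the SAS procedure used in the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # assumes neither sequence is constant


# Hypothetical fruit counts for four plants: manual versus image-based.
r = pearson_r([20, 25, 31, 18], [19, 26, 30, 18])
```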

Since this study aims to validate a phenotyping methodology to be applied to breeding, a simulation was performed to establish the minimum number of plants that should be used in the experimental plots, based on the results observed in the validation of the proposed methodology. The simulation was performed for the number of commercial fruits (NCF) due to its agronomic importance. The relative average difference of NCF was calculated for different numbers of plants, with NCF estimated from two images (Image A + Image B) and from a single image with the count doubled (Image A × 2 or Image B × 2). In each simulation, samples of size k (1, 2, 4, 5, 8, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140 and 150) were generated.
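The text does not detail the resampling scheme, so the following is only one plausible sketch of such a simulation. It assumes that, for each plot size k, plants are drawn at random without replacement, the relative difference between the manual and image-based means is computed for each draw, and the results are averaged over repetitions. The function name, the number of repetitions and the seed are all assumptions.

```python
import random


def simulate_relative_difference(manual, image, k, n_rep=1000, seed=0):
    """Average relative difference (%) between manual and image means for plots of k plants."""
    rng = random.Random(seed)
    indices = range(len(manual))
    total = 0.0
    for _ in range(n_rep):
        sample = rng.sample(indices, k)  # k distinct plants per simulated plot
        mean_manual = sum(manual[i] for i in sample) / k
        mean_image = sum(image[i] for i in sample) / k
        # Assumes the sampled manual mean is non-zero.
        total += abs(mean_manual - mean_image) / mean_manual * 100.0
    return total / n_rep


# Hypothetical counts for three plants; with k = 3 every draw uses all plants.
manual_counts = [10, 20, 30]
image_counts = [12, 20, 31]
diff = simulate_relative_difference(manual_counts, image_counts, k=3)
```

Running the function over the list of k values and plotting the results would give a curve of the kind shown in Figure 3.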

The Bland and Altman (1999) approach was adopted to assess and plot the agreement between the papaya fruit volume measured by WCD and that estimated by EA. This approach is used to analyze the agreement between two different methods that measure the same variable in the same unit of measurement. Thus, it was possible to decide whether the differences between the measurements of the two methods were acceptable and whether the methods could be considered equivalent.
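The core quantities of a Bland-Altman analysis are the average difference between the two methods, the standard deviation of the differences, and the 95 % limits of agreement. A minimal sketch follows; the function name and the example volumes are hypothetical.

```python
import statistics


def bland_altman(method_a, method_b):
    """Return average difference, SD of differences, and lower/upper 95 % limits of agreement."""
    if len(method_a) != len(method_b):
        raise ValueError("paired measurements must have the same length")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    avg_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)
    lower = avg_diff - 1.96 * sd_diff
    upper = avg_diff + 1.96 * sd_diff
    return avg_diff, sd_diff, lower, upper


# Hypothetical fruit volumes (mL): ellipsoid approximation versus water displacement.
ea_volumes = [100.0, 102.0, 98.0, 101.0]
wcd_volumes = [99.0, 100.0, 99.0, 100.0]
avg_diff, sd_diff, lower, upper = bland_altman(ea_volumes, wcd_volumes)
```

In the accompanying plot, each difference is drawn against the mean of the two measurements, with horizontal lines at the average difference and at the two limits.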

# Results and Discussion

## Phenotyping of the morpho-agronomic plant traits

Table 1 shows the statistics obtained from manual and image-based phenotyping of the morpho-agronomic plant traits measured in the ‘THB’ variety and the ‘UENF/Caliman-01’ hybrid. For the two genotypes, similar averages were obtained for plant height (PH), stem diameter (SD), and first fruit insertion height (FFIH) (p > 0.05). The confidence intervals constructed for the average difference between methodologies led to the same conclusion (Table 1): because all intervals contain zero, the averages of these traits estimated by the two methodologies do not differ. These results indicate that trait phenotyping assisted by digital images can be used to replace the traditional methodology. Digital image analysis has also been used in other crops to evaluate traits of interest. For example, Ferreira et al. (2012) found no significant differences between the averages of morphological traits in Australian red cedar measured manually and those measured by digital image analysis. In turn, Sritarapipat et al. (2014) and Miller et al. (2015) validated digital image processing algorithms to estimate the plant height of rice and of Tilia platyphyllos, Acer campestre, Acer rubrum and Juglans regia, respectively, obtaining results similar to those found using the traditional methodology.

Table 1
Estimates of average, average difference confidence interval (CI), average relative error of the methodology assisted by digital images (Error %) and Pearson correlations (r) for the traits plant height (PH), first fruit insertion height (FFIH), stem diameter (SD), number of commercial fruits (NCF) and number of irregularly shaped fruits (NDF), measured in the ‘UENF/Caliman-01’ hybrid and the ‘THB’ variety using manual and image analysis methodologies.

The average relative error of the methodology assisted by digital images was zero for the PH and FFIH traits in the ‘UENF/Caliman-01’ hybrid and the ‘THB’ variety (Table 1). For the SD trait, the average relative error was 1 % for both ‘UENF/Caliman-01’ and ‘THB’ (Table 1). These errors are lower than those reported for diameter estimation by Shimizu et al. (2014), who measured this trait in Japanese cedars using an image analysis and processing algorithm and found a minimum error of 2 % and a maximum of 7 %.

These results show that the phenotyping of morphological papaya traits based on digital image analysis enables accurate estimations. The image analysis errors may be due to the irregular architecture of certain plants. For example, cylindrical stems can be measured more accurately than irregularly shaped stems. The correlations between the manual measurements and those obtained by digital image analysis for the morphological traits were high, ranging from 0.97 to 0.98 in both genotypes, showing good consistency between the two phenotyping methodologies.

As for the NCF and NDF traits, when two images were used (Image A + Image B) there were no differences (p > 0.05) between the averages. Furthermore, the confidence intervals for the average difference between methodologies contained zero, confirming equality between the averages (Table 1). Thus, the digital image analysis methodology could be used for the phenotyping of these traits.

The average relative error for the NCF trait was low: 1 % for the ‘UENF/Caliman-01’ hybrid and 0 % for the ‘THB’ variety. For NDF, however, the average relative error was 2 % for the hybrid and 5 % for the variety (Table 1).

High correlations were obtained between the NCF values estimated using two images and those obtained manually: 0.98 for the hybrid and 0.97 for the variety. For NDF, the correlation was 0.87 in both genotypes. These correlations are greater than those reported for other fruit trees using phenotyping methodologies based on digital images. For example, Payne et al. (2013) developed an algorithm for estimating the number of fruits in mango trees, obtaining a correlation of 0.74 compared to manual counting. In apple trees, Zhou et al. (2012) validated an algorithm used to estimate the production of light green and red fruits, and found correlations of 0.80 and 0.85 compared to manual counting. In grapevines, Roscher et al. (2014) developed an algorithm for estimating the number and size of berries at different development stages. The algorithm identified the berries with an average difference in diameter of 1 mm and a correlation of 0.88 compared to measurements taken using a caliper.

In order to determine how many images are needed for phenotyping the NCF and NDF traits, the A and B images were compared to assess the symmetry between the photographed sides. For this, the number of fruits counted in each image was multiplied by two (Image A × 2 and Image B × 2) and the estimated averages were compared to those obtained manually by the t test for paired data. For NCF, the averages obtained using Image A × 2 or Image B × 2 showed no differences (p > 0.05) from the averages obtained by manual measurements in both genotypes. Furthermore, the average relative error was low and close to that calculated with the two images in the ‘THB’ variety and the ‘UENF/Caliman-01’ hybrid. These results indicate that the number of commercial fruits can be obtained using a single image without loss of experimental precision.

Although these estimation errors are low in both the ‘THB’ variety and the ‘UENF/Caliman-01’ hybrid, there are differences between them, with greater accuracy in the variety than in the hybrid. This may be due to greater fruit occlusion in the hybrid. In some cases, a fruit is not completely visible because leaves, branches, or other fruits obstruct the view, or because several fruits are growing on the same node. Thus, occlusion can decrease the visible area of the fruits and make their detection in the image difficult. Fruit occlusion has been widely discussed in studies of fruit detection through methodologies based on digital images (Aggelopoulou et al., 2011; Dorj et al., 2013; Payne et al., 2013; Roscher et al., 2014).

The correlations between NCF obtained manually and NCF estimated using a single image (Image A × 2 and Image B × 2) were high, showing that consistency between methodologies can be maintained using a single image. This consistency is very important for yield prediction in commercial plantations, since commercial fruit counts obtained from digital images can be used to feed mathematical models of yield prediction and generate useful information to improve harvest, post-harvest, packinghouse, and commercialization activities (Aggelopoulou et al., 2011).

In the case of irregularly shaped fruits, in the ‘UENF/Caliman-01’ hybrid the averages obtained using a single image (Image A × 2 or Image B × 2) showed no differences (p > 0.05) from the averages obtained by manual measurements. However, in the ‘THB’ variety the averages obtained using a single image did differ (p ≤ 0.05) from the averages obtained by manual measurements. Furthermore, the correlations between the manual estimates and those obtained using a single image (Image A × 2 or Image B × 2) were lower than those found using two images, ranging from 0.70 to 0.79, which shows that consistency decreased when using a single image. The average relative error also increased when a single image was used, considerably so in the variety: in the ‘UENF/Caliman-01’ hybrid the error was 2 % (Image A × 2) and 3 % (Image B × 2), whereas for the ‘THB’ variety it was 14 % (Image A × 2) and 23 % (Image B × 2). The NDF trait in the hybrid may therefore be estimated using a single image, indicating that the photographed sides are symmetrical. In the ‘THB’ variety, however, the photographed sides differed, showing that they are not symmetrical. These differences may be due to greater variation in this trait in the variety. Another possible explanation is that in ‘THB’ it is more difficult to identify the portion of the fruit that exhibits the anomaly. For example, carpelloid fruits can be confused with commercial fruits, whereas pentandric and bananoid fruits are easier to identify due to their characteristic shape. However, a trained and experienced evaluator can identify most of the irregularly shaped fruits from the image, thereby reducing errors attributable to the methodology.

The results of the simulation of the minimum number of plants that should be used in the experimental plots for estimating the NCF trait are shown in Figure 3A and B. For the ‘UENF/Caliman-01’ hybrid, using two images and one plant, the relative difference was 5 %; this difference decreased as the sample size increased. Taking this into consideration, experimental plots for papaya may consist of a minimum of two plants and a maximum of 10 plants; these results show that the digital image analysis methodology can be used to estimate this trait in experimental plots. In the simulation performed using a single image, the relative difference also decreased as the sample size increased, but it was higher than that obtained using two images. Using one plant, the difference was 22 % with Image A × 2 and 9 % with Image B × 2. With two plants, the relative difference was 15 % using Image A × 2 and 9 % using Image B × 2; and with four plants, it was 1 % using Image A × 2 and 4 % using Image B × 2.

Figure 3
Relationship between plot size (K plants per plot) and average relative difference of the estimation of the number of commercial fruits (NCF) using: two images (A + B) and an image (Image A × 2 and Image B × 2). A) The result of the ‘UENF/Caliman-01’ hybrid; B) The result of the ‘THB’ variety.

For the ‘THB’ variety, using two images and one plant, the relative difference was 9 %, which decreased as the sample size increased. When a single image was used, the relative difference also decreased as the sample size increased, but it was higher than that obtained using two images. Using one plant, the difference was 22 % with Image A × 2 and 9 % with Image B × 2. With two plants, the difference was 14 % using Image A × 2 and 10 % using Image B × 2. With four plants, the difference was 4 % using Image A × 2 and 6 % using Image B × 2.

These results suggest that in experiments with small plots (k = 2, 3), NCF should be estimated with two images, whereas in experiments with larger plots (k ≥ 4) this trait can be assessed using one image.

## Phenotyping of the morpho-agronomic fruit traits

The averages of the FL, D1F and D2F traits measured with a caliper showed no differences (p > 0.05) from the averages of the parameters major, minor 1 and minor 2, respectively (Table 2). These results indicate that the parameters major and minor are equivalent to the length and diameters of the papaya fruits; thus, they could be used to estimate these traits by digital image processing (IP). The estimation error using digital images was zero for all three traits, showing that this method can estimate them with a high degree of accuracy. Several authors have reported success in estimating the diameter and length of fruits using digital images. For instance, Koc (2007) validated these traits in watermelon with the use of digital images, obtaining averages similar to the measurements taken with a caliper. Similarly, Khojastehnazhand et al. (2009), when evaluating oranges, and Rashidi et al. (2009), when evaluating cantaloupe, found that the fruit averages obtained with IP were similar to those measured with a caliper.

Table 2
Estimates of average, average difference confidence interval (CI), average relative error of image processing (Error %) for traits: The fruit length (FL), the largest fruit diameter (D1F) and the smallest fruit diameter (D2F), measured in the ‘THB’ variety using digital caliper (DC) and image processing (IP).

The correlation between the volume measured by WCD and the volume estimated by EA using the dimensions obtained with a digital caliper was high (r = 0.96) (Figure 4A). According to the Bland and Altman (1999) approach, these methods are in agreement, which means that the volume of papaya fruits may be estimated using EA with the fruit dimensions obtained with a caliper. Agreement was obtained when only the smallest diameter and the length of the fruit were used in EA. The average difference (AD) between the volume estimated using the dimensions obtained with a digital caliper and the volume measured by WCD was 2.73 mL (Figure 4B). The standard deviation (SDev) of the volume differences was 22.17 mL. The volume of the papaya fruit measured by WCD and the volume estimated with a caliper were similar (p > 0.05) (Table 3). Ninety-five percent of the volume differences are expected to lie between AD − 1.96 SDev and AD + 1.96 SDev, known as the limits of agreement (Bland and Altman, 1999). The limits of agreement of the differences between the volume calculated by WCD and by EA were −40.71 and 46.2 mL (Figure 4B). That is, the volumes estimated by EA can be up to 40.71 mL lower or 46.2 mL higher than the volumes measured by WCD. The errors in this methodology may be due to the use of an approximation expression, which is based on the assumption that the fruits have a perfectly ellipsoid shape. Therefore, irregularly shaped fruits increase the errors of this methodology. This was evidenced in the use of EA, in which the volume was validated only when the smallest fruit diameter was used.
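As a check on the arithmetic, the limits of agreement for the caliper-based estimate can be recomputed from the reported AD and SDev. The small discrepancies with the reported limits are presumably due to rounding of the summary values.

$$
\begin{aligned}
1.96 \times \mathrm{SDev} &= 1.96 \times 22.17 \approx 43.45\ \mathrm{mL} \\
\mathrm{AD} - 1.96\,\mathrm{SDev} &\approx 2.73 - 43.45 = -40.72\ \mathrm{mL} \\
\mathrm{AD} + 1.96\,\mathrm{SDev} &\approx 2.73 + 43.45 = 46.18\ \mathrm{mL}
\end{aligned}
$$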

Figure 4
A) Correlation between estimated volume values by water column displacement (WCD) method and the expression of ellipsoid approximation (EA) using dimensions obtained with digital caliper (DC); B) Bland-Altman plot for the comparison of papaya volumes measured with water column displacement (WCD) method and the expression of ellipsoid approximation (EA) using dimensions obtained with digital caliper (DC); outer lines indicate the 95 % limits (upper and lower) of agreement and the center line shows the average difference.

Table 3
Estimates of average, average difference confidence interval (CI), average relative error of image processing (Error %) to the volume of papaya fruit, measured in the ‘THB’ variety using the water column displacement (WCD) method, and the expression of ellipsoid approximation using measurements performed with digital caliper (DC) and image processing (IP).

The correlation between the volume calculated by WCD and by EA using the dimensions obtained by IP was also high (r = 0.95) (Figure 5A). Agreement between the two methods according to the Bland and Altman (1999) approach was likewise obtained when only the smallest diameter and the length of the fruit were used in the approximation expression. The average difference between the volume estimated using IP and the volume obtained by WCD was AD = 6 mL (Figure 5B), and the standard deviation of the volume differences was SDev = 29.78 mL. The volume measured by WCD and the volume estimated by IP were similar (p > 0.05) (Table 3). The 95 % limits of agreement of the differences between the two methods were −64.4 and 52.4 mL (Figure 5B). Thus, the volumes obtained by EA may be up to 64.4 mL lower or 52.4 mL higher than the volumes obtained by WCD.

Figure 5
A) Correlation between volume values estimated by the water column displacement (WCD) method and by the expression of ellipsoid approximation (EA) using dimensions obtained with image processing (IP); B) Bland-Altman plot for the comparison of papaya volumes measured with the water column displacement (WCD) method and the expression of ellipsoid approximation (EA) using dimensions obtained with image processing (IP); outer lines indicate the 95 % limits (upper and lower) of agreement and the center line shows the average difference.

The Bland and Altman (1999) approach has been used to validate fruit volume and area estimates obtained by digital image analysis and processing. It has been used, for example, to validate volume estimates of watermelon (Koc, 2007) and orange (Khojastehnazhand et al., 2009), and surface area estimates of zucchini (Arjenaki et al., 2012).

Errors in the image-based methodology can also be due to the use of EA, as was evidenced for the volumes estimated with a caliper. The accuracy of the expression of ellipsoid approximation decreased as the irregularity of the fruit shape increased. For this reason, the fruit volume could be validated only when its smallest diameter and length were used.
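To make the role of the fruit dimensions concrete, the sketch below computes an ellipsoid volume from the length and diameters. It is an illustration under stated assumptions: it uses the standard geometric formula V = (π/6)·L·d₁·d₂, which may differ in form from the expression defined in the methods of this study, and the function name and example dimensions are hypothetical.

```python
import math

def ellipsoid_volume(length_cm, diameter_cm, second_diameter_cm=None):
    """Volume (mL) of an ellipsoid given its length and diameters (cm).

    If only one diameter is supplied, it is used for both transverse
    axes, mirroring the case where only the smallest diameter and the
    length enter the approximation. 1 cm^3 equals 1 mL.
    """
    if second_diameter_cm is None:
        second_diameter_cm = diameter_cm
    return math.pi / 6.0 * length_cm * diameter_cm * second_diameter_cm

# Hypothetical fruit: 14 cm long, smallest diameter 8 cm, largest 9 cm
print(f"smallest diameter only: {ellipsoid_volume(14.0, 8.0):.1f} mL")
print(f"both diameters:         {ellipsoid_volume(14.0, 8.0, 9.0):.1f} mL")
```

Using the largest diameter as well always yields a larger volume, so the choice of diameters directly shifts the estimate for irregularly shaped fruits.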

The accuracy of the fruit volume estimated from the dimensions obtained by IP was lower than that observed using the dimensions obtained with a caliper. This problem can be attributed to the use of a single camera. Although the distance between the camera and the measurement box was constant, the distance between the fruit surface and the camera decreased with increasing fruit size, resulting in an overestimation of the volume estimated by IP. Koc (2007) reported the same problem in watermelon and also attributed image estimation errors to the change in distance as fruit size increased. According to Khojastehnazhand et al. (2009), this problem can be solved by using several cameras and developing more complex algorithms, so that a 3D fruit silhouette can be reconstructed. Recently, more accurate methods, such as Monte Carlo simulation, have been used to estimate the volume of irregularly shaped fruit (Siswantoro et al., 2014).
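The distance effect can be illustrated with a simple pinhole-camera sketch. All of the geometry here is an assumption made for illustration and is not taken from the study: the image scale is supposed to be calibrated at the floor of the measurement box, the widest fruit cross-section is supposed to lie half a fruit thickness above the floor, and the 50 cm camera height is hypothetical.

```python
def apparent_scale(camera_to_floor_cm, fruit_thickness_cm):
    """Linear magnification of the fruit outline relative to the floor plane.

    Under a pinhole model, an object at distance z appears larger than the
    same object at the calibration distance z0 by the factor z0 / z.
    """
    plane_distance = camera_to_floor_cm - fruit_thickness_cm / 2.0
    if plane_distance <= 0:
        raise ValueError("fruit cross-section must lie in front of the camera")
    return camera_to_floor_cm / plane_distance

CAMERA_HEIGHT_CM = 50.0  # hypothetical value, not reported in the text

for thickness in (6.0, 8.0, 10.0):
    s = apparent_scale(CAMERA_HEIGHT_CM, thickness)
    # Volume scales with the product of three linear dimensions
    print(f"thickness {thickness:4.1f} cm: length x{s:.3f}, volume x{s**3:.3f}")
```

Under these assumptions the magnification grows with fruit thickness, which is consistent with the larger overestimation described for bigger fruits.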

Fruit mass is a very important trait for papaya breeding because of its relationship with yield and its use as a fruit quality standard. Fruit mass is also used to monitor fruit development in the field, to predict yield, to estimate fertilization and irrigation levels, and to plan packinghouse activities, transport and commercialization (Sabliov et al., 2002; Koc, 2007). Fruit mass can be determined from volume if the fruit density is known. In this study, fruit volume estimated by EA using measurements taken with a caliper and by IP was related to fruit mass. The results show a strong association between volume and mass for both methods, with R² = 0.94 for measurements taken with a caliper and R² = 0.87 for those taken by IP (Figure 6A and B). From the regression analysis, the following equations were obtained:

(3) $M = 0.85\,Vol_{DC} + 5.12$

and

(4) $M = 0.72\,Vol_{IP} + 59.43,$

where: $M$ is the mass (g), $Vol_{DC}$ is the volume estimated by using a digital caliper (mL) and $Vol_{IP}$ is the volume estimated by using digital image processing (mL).
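Equations (3) and (4) translate directly into code. The sketch below only restates the published coefficients; the function names and the 500 mL example volume are hypothetical.

```python
def mass_from_caliper_volume(vol_dc_ml):
    """Fruit mass (g) from the EA volume based on caliper dimensions, Eq. (3)."""
    return 0.85 * vol_dc_ml + 5.12

def mass_from_image_volume(vol_ip_ml):
    """Fruit mass (g) from the EA volume based on image processing, Eq. (4)."""
    return 0.72 * vol_ip_ml + 59.43

volume_ml = 500.0  # hypothetical EA volume
print(f"caliper-based estimate: {mass_from_caliper_volume(volume_ml):.2f} g")
print(f"image-based estimate:   {mass_from_image_volume(volume_ml):.2f} g")
```

Because the two equations have different slopes and intercepts, each should be applied only to volumes obtained with the matching measurement method.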

Figure 6
Regression between the volumes calculated by the expression of ellipsoid approximation (EA) and the fruit mass. A) Volume estimated using dimensions obtained with a digital caliper (DC); B) Volume estimated using dimensions obtained with image processing (IP).

Thus, these simple equations can be used to estimate the mass of papaya fruit from the volume estimate obtained by EA using a caliper or IP. Similarly, Khojastehnazhand et al. (2009) observed a strong association (R² = 0.93) between mass and volume in orange fruits. These authors also obtained a regression equation to estimate fruit weight from the volume obtained by IP.

Methodologies based on digital images are promising tools that can assist the phenotyping of traits of interest in the papaya crop. Images are captured with a conventional camera, which is simple, inexpensive, and easily handled and transported. In addition, the images can be stored on a computer for later analysis. All these features reduce the labor and time spent on field and laboratory measurements, and improve the phenotyping process. For example, in this study, two observers spent on average 96.8 s per plant to measure the morpho-agronomic traits manually, whereas the same observers spent 15.8 s to capture two images per plant and 30 s for image analysis. In other words, the proposed image-based phenotyping is about twice as fast (45.8 s versus 96.8 s per plant). The expectation is that image-based phenotyping methodologies will allow experiments to be expanded and phenotypic evaluation to be performed quickly and precisely, contributing to increases in the selection differential and the heritability coefficient, with a direct effect on genetic gain. When applied to breeding populations, precise quantification of the phenotype increases the proportion of variance due to genetic effects and the genetic gain in the selection of superior genotypes (Honsdorf et al., 2014; Parent et al., 2015; Pauli et al., 2016). Thus, image-based phenotyping can be used at several stages of papaya breeding programs, such as germplasm evaluation, inbred line development and yield trial evaluations in general. It can also facilitate genome-wide selection (GWS), genome-wide association studies (GWAS) and marker-assisted selection (MAS).
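The time comparison above amounts to a short calculation, sketched here with the per-plant times reported in the text. The plants-per-hour figures are derived for illustration only and assume uninterrupted work; the variable names are hypothetical.

```python
MANUAL_S = 96.8    # manual measurement per plant (s), from the text
CAPTURE_S = 15.8   # capturing two images per plant (s), from the text
ANALYSIS_S = 30.0  # image analysis per plant (s), from the text

image_based_s = CAPTURE_S + ANALYSIS_S
speedup = MANUAL_S / image_based_s

print(f"image-based time per plant: {image_based_s:.1f} s")
print(f"speed-up over manual phenotyping: {speedup:.2f}x")
print(f"plants per hour, manual: {3600 / MANUAL_S:.0f}")
print(f"plants per hour, image-based: {3600 / image_based_s:.0f}")
```

Since image capture is the only step that must be done in the field, the time spent per plant in the field drops further than the overall ratio suggests.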

# Acknowledgements

The authors are grateful to the Caliman Agrícola S/A and the Brazilian National Council for Scientific and Technological Development (CNPq) for financial support. We are also grateful to the Coordination for the Improvement of Higher Level Personnel (CAPES) for scholarships granted to the students.

# References

• Aggelopoulou, A.D.; Bochtis, D.; Fountas, S.; Swain, K.C.; Gemtos, T.A.; Nanos, G.D. 2011. Yield prediction in apple orchards based on image processing. Precision Agriculture 12: 448-56.
• Arjenaki, O.O.; Asad, M.M.; Parviz, A.M. 2012. A new method for estimating surface area of cylindrical fruits (zucchini) using digital image processing. Australian Journal of Crop Science 6: 1332-1336.
• Bland, J.M.; Altman, D.G. 1999. Measuring agreement in method comparison studies. Statistical Methods in Medical Research 8: 135-160.
• Dorj, U.; Malrey, L.; Sangsub, H. 2013. A comparative study on tangerine detection, counting and yield estimation algorithm. International Journal of Security and Its Applications 7: 405-412.
• Ferreira, R.T.; Viana, A.P.; Barroso, D.G.; Resende, M.D.V.; Amaral Júnior, A.T. 2012. Toona ciliata genotype selection with the use of individual BLUP with repeated measures. Scientia Agricola 69: 210-216.
• Food and Agriculture Organization [FAO]. 2015. Browse data. Available at: http://faostat3.fao.org/browse/Q/QC/E [Accessed Dec 10, 2015]
• Honsdorf, N.; March, T.M.; Berger, B.; Tester, M.; Pillen, K. 2014. High-throughput phenotyping to detect drought tolerance QTL in wild barley introgression lines. PLOS One 9: e97047.
• Khojastehnazhand, M.; Omid, M.; Tabatabaeefar, A. 2009. Determination of orange volume and surface area using image processing technique. International Agrophysics 2000: 237-242.
• Koc, A.B. 2007. Determination of watermelon volume using ellipsoid approximation and image processing. Postharvest Biology and Technology 45: 366-371.
• Li, L.; Qin, Z.; Danfeng, H. 2014. A review of imaging techniques for plant phenotyping. Sensors 14: 20078-20111.
• Miller, J.; Morgenroth, J.; Gomez, C. 2015. 3D modelling of individual trees using handheld camera: accuracy of height, diameter and volume estimates. Urban Forestry and Urban Greening 14: 932-940.
• Oliveira, E.J.; Amorim, V.B.O.; Matos, E.L.S.; Costa, J.L.; Castellen, M.S.; Padua, J.G.; Dantas, J.L.L. 2010. Polymorphism of micro satellite markers in papaya (Carica papaya L.). Plant Molecular Biology Reporter 28: 519-530.
• Pauli, D.; Sanchez-Andrade, P.; Carmo-Silva, E.; Gazave, E.; French, A.N.; Heun, J.; Hunsaker, D.J.; Lipka, A.E.; Setter, T.L.; Strand, R.J.; Thorp, K.R.; Wang, S.; White, J.W.; Gore, M.A. 2016. Field-based high-throughput plant phenotyping reveals the temporal patterns of quantitative trait loci associated with stress-responsive traits in cotton. G3: 865-879.
• Parent, B.; Shahinnia, F.; Maphosa, L.; Berger, B.; Rabie, H.; Ken, H.; Kovalchuk, A.; Langridge, P.; Fleury, D. 2015. Combining field performance with controlled environment plant imaging to identify the genetic control of growth and transpiration underlying yield response to water-deficit stress in wheat. Journal of Experimental Botany 66: 5481-5492.
• Payne, A.B.; Walsh, K.B.; Subedi, P.P.; Darvis, P.P. 2013. Estimation of mango crop yield using image analysis: segmentation method. Computers and Electronics in Agriculture 91: 57-64.
• Rashidi, M.; Gholami, M.; Abbassi, S. 2009. Cantaloupe volume determination through image processing. Journal of Agricultural Science and Technology: 623-631.
• Roscher, R.; Herzog, K.; Kunkel, A.; Kicherer, A.; Töpfer, R.; Förstner, W. 2014. Automated image analysis framework for high-throughput determination of grapevine berry sizes using conditional random fields. Computers and Electronics in Agriculture 100: 148-158.
• Sabliov, C.M.; Boldor, D.; Keener, K.M.; Farkas, B.E. 2002. Image processing method to determine surface area and volume of axi-symmetric agricultural products. International Journal of Food Properties 5: 641-653.
• Shimizu, A.; Yamada, S.; Arita, Y. 2014. Diameter measurements of the upper parts of trees using an ultra-telephoto digital photography system. Open Journal of Forestry 4: 316-326.
• Siswantoro, J.; Prabuwono, A.S.; Abdullah, A.; Idrus, B. 2014. Monte Carlo method with heuristic adjustment for irregularly shaped food product volume measurement. The Scientific World Journal 2014: 1-10.
• Sritarapipat, T.; Rakwatin, P.; Kasetkasem, T. 2014. Automatic rice crop height measurement using a field server and digital image processing. Sensors 14: 900-926.
• Zhang, Z. 2000. A flexible new technique for camera calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 22: 1330-1334.
• Zhou, R.; Damerow, L.; Sun, Y.; Blanke, M.M. 2012. Using colour features of cv. ‘Gala’ apple fruits in an orchard in image processing to predict yield. Precision Agriculture 13: 568-580.

# Publication Dates

• Publication in this collection
Jul-Aug 2017