Papaya recombinant inbred lines selection by image-based phenotyping

The selection of superior Carica papaya L. genotypes depends on the availability of genetic variability and on a favorable, simultaneous response of the genotypes for the traits of greatest interest. However, manual phenotyping (MP) is labor-intensive, time-consuming and expensive. The aim of the current study was to assess the efficiency of image-based phenotyping (IBP) in estimating genetic parameters and in selecting F4 recombinant inbred lines. Genetic parameters and genetic values were estimated by the REML/BLUP procedure, and combined selection was carried out using a selection index based on standardized genetic values. Most traits assessed through IBP showed experimental coefficients of variation similar to those found through MP. Both methodologies yielded genetic parameters of similar magnitude, indicating substantial genetic variability between lines for the traits assessed in this study. The same superior lines were indicated by both methodologies, and substantial genetic gains were obtained from the selected lines for all traits. IBP performance was similar to that of MP with respect to the estimates of breeding-relevant traits such as number of commercial fruits and yield. Thus, IBP provided efficient phenotypic assessment, as well as selective accuracy in capturing genetic variability and genetic gains, when compared to MP. Since IBP is far less dependent on labor, it is expected to be incorporated into the routine of papaya breeding programs as a way of increasing the number of assessed lines and, consequently, increasing genetic gains.


Introduction
Papaya (Carica papaya L.) is one of the most economically important fruits in the tropical and subtropical regions of the world. In addition, it is widely known for its nutritional benefits and pharmacological properties (Oliveira and Vitória, 2011). However, commercial fields consist of a small number of cultivars, which leads to limited genetic variability (Dias et al., 2011). The development of cultivars depends on the availability of genetic variability and on the simultaneous and favorable response of the genotypes for most traits of agronomic importance. The phenotypic assessment of fruit tree genotypes requires intensive labor and is based mainly on traits such as fruit yield and quality. Conventional phenotyping methodologies have low throughput; they are laborious, time-consuming, expensive and, most of the time, destructive (Rahaman et al., 2015). Thus, phenotypic assessment constrains selection strategies, and methodologies that efficiently collect, store and analyze data need to be developed (Merk et al., 2012).
The trait measuring process must be reliable and consistent to allow for assessing phenotypic differences and improving selection. Recently, the introduction of phenotyping methodologies based on digital images has allowed for assessing phenotypic values with high resolution, accuracy, and on a large scale (Honsdorf et al., 2014; Parent et al., 2015; Pauli et al., 2016). Accurate phenotypic quantification applied to breeding populations has increased the proportion of variance attributable to genetic effects in many traits, as well as genetic gains.
Recently, plant breeders have begun to consider genetic value estimation as a selection criterion (Heffner et al., 2009). High quality genetic assessment procedures rely on the estimation of variance components through the Restricted Maximum Likelihood (REML) method, as well as on the estimation of breeding values through the Best Linear Unbiased Predictor (BLUP) method, which uses mixed modeling to provide more accurate estimates and predictions of genetic parameters and breeding values (Resende et al., 2006). These procedures have been successfully used to select superior papaya genotypes (Oliveira et al., 2012; Pinto et al., 2013; Ramos et al., 2014).
Thus, the aim of the current study was to assess the efficiency of the image-based phenotyping methodology in estimating genetic parameters and selecting F4 recombinant inbred lines.

Study location and plant material
The experiment was conducted in Linhares, Espírito Santo, Brazil (19º06' and 19º18' S, 39º45' and 40º19' W, altitude 30 m). A randomized complete block design with six replications was used, with 23 F4 recombinant lines and one plant per plot (single-tree plot, STP). The lines were derived from a cross between two parents from the 'Formosa' heterotic group and were advanced through generations by self-fertilization. Two assessments were performed, at 9 and 12 months after the transplanting of seedlings (MAT).

The phenotyping of morpho-agronomic traits using conventional methodology
Manual phenotyping (MP) was used to assess in each plant the following traits: plant height (PH), which was expressed as m and measured with a measuring tape; stem diameter (SD), which was expressed as mm and measured with a digital caliper; number of commercial fruits (NCF); number of deformed fruits (NDF); and number of fruitless leaf axils (FLLA). Plant production (PROD) was obtained by multiplying NCF by the mass of commercial fruits. The fruits were weighed using an analytical balance. Only fruits showing a defined shape were taken into consideration in fruit counting; the last fruit assessed was marked to facilitate the counting conducted according to the methodology based on digital images.

Image capturing
Image-based phenotyping (IBP) used a semiprofessional digital camera to take pictures of each plant from two different positions. The first position was perpendicular to the plant, based on the axis of the row (Image A) (Figure 1A); the second position was the opposite side of the same plant (Image B). The pictures were taken at a distance of 2.5 m from the plant in the row. Image A was used to measure PH (expressed in m) and SD (expressed in mm), whereas both images (Image A + Image B) were used to estimate NCF, NDF and FLLA. In order to determine how many images are needed to phenotype NCF, NDF and FLLA, images A and B were compared to assess the symmetry between the photographed sides. For this, the number of fruits and fruitless leaf axils obtained in each image was multiplied by two (Image A × 2 and Image B × 2) to estimate the genetic parameters.

Image analysis
The images were analyzed by the public domain ImageJ software program. A ruler was used as a reference measure in each plant photographed in order to calibrate the dimensions of the image through the 'set scale' function of the software program. PH and SD traits were measured after calibration using the 'straight line selection' tool. The NCF, FLLA and NDF traits were estimated using the plugin Cell Counter. The PROD was estimated using the same mass of fruit considered for quantifying this trait by manual phenotyping.
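The calibration step described above amounts to converting pixel distances into real units by means of a reference object of known length. The following is a minimal sketch of that arithmetic, not of ImageJ itself; the function names, pixel coordinates and the 400 px ruler reading are hypothetical values chosen for illustration.

```python
import math

def pixels_per_unit(ruler_px: float, ruler_len: float) -> float:
    """Scale factor (pixels per real-world unit) from a reference ruler."""
    if ruler_px <= 0 or ruler_len <= 0:
        raise ValueError("ruler measurements must be positive")
    return ruler_px / ruler_len

def line_length(p1, p2, scale: float) -> float:
    """Length of a straight line between two pixel coordinates, in real units."""
    return math.dist(p1, p2) / scale

# Hypothetical image: a 1 m ruler spans 400 px in the photograph.
scale = pixels_per_unit(ruler_px=400.0, ruler_len=1.0)

# Hypothetical plant height: line drawn from the stem base to the apex.
plant_height_m = line_length((512, 1800), (512, 900), scale)
print(f"PH = {plant_height_m:.2f} m")
```

Because the scale is derived per image, the ruler must lie approximately in the same plane as the plant for the conversion to hold.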

Statistical analysis
The statistical analysis of NCF, NDF, FLLA and PROD took the sum of the two assessment periods (9 MAT + 12 MAT) into consideration. On the other hand, the statistical analyses of PH and SD took the mean of the two assessment periods into consideration. NCF, FLLA and PROD data were subjected to a $\sqrt{x}$-type transformation and NDF data to a $\sqrt{x + 0.5}$-type transformation.
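The data preparation described above can be sketched as follows. This is an illustrative example only: it assumes the transformations are of the square-root type (√x and √(x + 0.5)), and the function names and the numerical values are hypothetical.

```python
import math

def combine_counts(at_9_mat: float, at_12_mat: float) -> float:
    """Count-type traits (NCF, NDF, FLLA, PROD): sum of the two assessments."""
    return at_9_mat + at_12_mat

def combine_measures(at_9_mat: float, at_12_mat: float) -> float:
    """Size-type traits (PH, SD): mean of the two assessments."""
    return (at_9_mat + at_12_mat) / 2

def sqrt_transform(x: float, offset: float = 0.0) -> float:
    """Square-root transformation; offset=0.5 is used for traits with zeros."""
    return math.sqrt(x + offset)

# Hypothetical plant: 30 commercial fruits at 9 MAT and 34 at 12 MAT.
ncf_total = combine_counts(30, 34)
ncf_transformed = sqrt_transform(ncf_total)

# Hypothetical plant with no deformed fruits in either assessment.
ndf_transformed = sqrt_transform(combine_counts(0, 0), offset=0.5)

# Hypothetical plant height of 1.9 m at 9 MAT and 2.3 m at 12 MAT.
ph_mean = combine_measures(1.9, 2.3)
```

The 0.5 offset keeps plants with zero deformed fruits away from the point where the square-root curve is steepest.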
The genetic analysis of the traits measured by both manual and image-based phenotyping used the mixed model methodology and the REML/BLUP procedure. Variance components and genetic parameters were estimated by the REML method, whereas the genetic values were predicted by BLUP according to the statistical model $y = Xb + Zg + e$, where: y is the data vector; b is the vector of replication effects (assumed to be fixed); g is the vector of the genetic effects of lines (assumed to be random); and e is the vector of errors (random). The capital letters X and Z represent the incidence matrices of these effects. The distribution and structure of means and variances are given by: $y \mid b, V \sim N(Xb, V)$; $g \mid A, \sigma_g^2 \sim N(0, A\sigma_g^2)$; $e \mid \sigma_e^2 \sim N(0, I\sigma_e^2)$. For the random effects, $\mathrm{Cov}(g, e') = 0$. The variance structure of the model is given by $V = ZA\sigma_g^2 Z' + I\sigma_e^2$, where A is the genetic relationship matrix involving all individuals, whose elements are functions of identity by descent probabilities.
The mixed model equations used to estimate the fixed effects and to predict the random effects by the BLUP procedure, as presented by Resende (2002), are given by: $\begin{bmatrix} X'X & X'Z \\ Z'X & Z'Z + A^{-1}\lambda \end{bmatrix} \begin{bmatrix} \hat{b} \\ \hat{g} \end{bmatrix} = \begin{bmatrix} X'y \\ Z'y \end{bmatrix}$, where $\lambda = \sigma_e^2 / \sigma_g^2 = (1 - h^2)/h^2$; $h^2 = \sigma_g^2 / (\sigma_g^2 + \sigma_e^2)$ is the individual narrow sense heritability in the block; $\sigma_g^2$ is the additive genetic variance; and $\sigma_e^2$ is the residual variance (environmental + non-additive).
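To make the BLUP step concrete, the sketch below solves the mixed model equations in their standard Henderson form, with blocks as fixed effects and lines as random effects. It is an illustration under stated assumptions and not the SAS analysis used in the study: the heritability is taken as known rather than estimated by REML, the relationship matrix A is set to the identity (unrelated lines), and the data, with 3 lines in 2 blocks, are invented.

```python
import numpy as np

def solve_mme(y, X, Z, A, h2):
    """Solve Henderson's mixed model equations.

    Returns (b_hat, g_hat): estimated fixed effects and predicted random effects.
    """
    lam = (1.0 - h2) / h2                      # lambda = sigma_e^2 / sigma_g^2
    A_inv = np.linalg.inv(A)
    lhs = np.block([
        [X.T @ X, X.T @ Z],
        [Z.T @ X, Z.T @ Z + A_inv * lam],
    ])
    rhs = np.concatenate([X.T @ y, Z.T @ y])
    solution = np.linalg.solve(lhs, rhs)
    n_fixed = X.shape[1]
    return solution[:n_fixed], solution[n_fixed:]

# Invented data: observations ordered as block 1 (lines 1-3), block 2 (lines 1-3).
y = np.array([10.0, 12.0, 14.0, 12.0, 14.0, 16.0])
X = np.kron(np.eye(2), np.ones((3, 1)))        # block incidence matrix (6 x 2)
Z = np.kron(np.ones((2, 1)), np.eye(3))        # line incidence matrix (6 x 3)
A = np.eye(3)                                  # assumption: unrelated lines

b_hat, g_hat = solve_mme(y, X, Z, A, h2=0.5)
print("block effects:", b_hat)
print("predicted line effects:", g_hat)
```

In this balanced case the predicted line effects are the deviations of the line means from the overall mean, shrunk by r/(r + λ), where r is the number of blocks; lower heritability gives stronger shrinkage.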
In order to compare the performance of the two methodologies, the estimates of variances and genetic parameters were obtained as follows. Phenotypic variance of the mean of lines: $\hat{\sigma}_{\bar{F}}^2 = \hat{\sigma}_g^2 + \hat{\sigma}_e^2 / r$, where r is the number of replications. The mixed model analysis was performed using the MIXED procedure of the SAS Studio 3.5 statistical software program. The combined selection to identify the superior lines was carried out using the index based on standardized genetic values developed for the selection of papaya lines, according to Silva et al. (2008) and Ramos et al. (2014). The procedures required for the construction of this index were run using the MIXED procedure of the SAS software. A selection intensity of 35 % was applied to indicate the eight superior lines. In addition, the genetic gains obtained from the selection of superior lines indicated by each methodology were estimated using the estimator $\hat{G}_s = (\hat{y}_s - \mu_0)\,\hat{h}_m^2$, where: $\hat{G}_s$ = genetic gain; $\hat{y}_s - \mu_0$ = selection differential; $\hat{h}_m^2$ = heritability of the mean of the lines.
Table 1 shows the estimates of the variance components and genetic parameters of the assessed traits. The experimental coefficients of variation (CVe) estimated for most of the traits using image-based phenotyping (IBP) were similar to those found when MP was used. The CVe ranged from 10 to 28 % when the IBP methodology was used, whereas the MP methodology yielded coefficients ranging from 9 to 25 %. The CVe values estimated for PH and SD were low, indicating a high degree of experimental precision. Precision in the phenotypic assessment of the morphological traits of plants using methodologies based on digital images has been reported in several economically important crops such as barley (Chen et al., 2014), Australian cedar (Shimizu et al., 2014) and rice (Sritarapipat et al., 2014).
However, the CVe values of the remaining traits were moderate, and the highest value was found for NDF (25 and 28 % estimated by MP and IBP, respectively). Moderate and high CVe values for NCF, FLLA, PROD and NDF have been reported in studies that assessed papaya lines in the field (Ramos et al., 2014). High CVe values indicate a low degree of experimental precision and may be associated with the great variation presented by these traits between lines. Another factor that may have contributed to the moderate values was the drought in the region in the last three years, which affected plant development and led to plant loss and a consequent reduction in the number of experimental units. Studies on the effect of soil water deficit on papaya plants have indicated reductions in stomatal conductance, leading to a decrease in photosynthesis and, therefore, reductions in both the production and quality of fruit (Campostrini et al., 2010). According to Ferrão et al. (2008), high CVe values may be associated with the long cycle of the crop, with the large size of the experiments, with sampling errors, with different responses of the genotypes to the stress caused by high temperatures and drought, as well as with the different responses of the genotypes to pests, disease, wind and pruning.

Estimates of variance components and genetic parameters
With regard to the genotypic coefficient of variation (CVg), which expresses the amount of genetic variation as a percentage, the two phenotyping methodologies showed similar values for most of the assessed traits. The CVg ranged from 7 to 20 % when the IBP methodology was used and from 8 to 21 % when the MP methodology was used. The IBP methodology resulted in slightly higher values for NDF, FLLA and PROD and in slightly lower values for SD and NCF. Low values were estimated for PH and SD and moderate values for FLLA, NCF, NDF and PROD by both methodologies. Thus, according to the CVg values, the performance of the IBP methodology was comparable to that of MP in terms of capturing genetic variability between lines. The high values for traits relevant to papaya breeding, such as NCF, PROD and NDF, indicated that it is possible to select highly productive lines, as well as lines showing a small number of deformed fruits.
Papaya selection by image analysis Sci. Agric. v.75, n.3, p.208-215, May/June 2018
The relative coefficient of variation (CVr), which is the ratio between CVg and CVe, indicates to what extent the existing variation results from genetic causes and measures the accuracy of the inferences that can be drawn from phenotypic assessments. CVr values ranged from 0.68 (NDF) to 0.91 (PROD) when the IBP methodology was applied and from 0.68 (NDF) to 0.95 (PROD) when the MP methodology was used. Values above unity provide inferences with a high or very high degree of accuracy and precision (Resende and Duarte, 2007). The current study found no value equal to unity in the assessed traits. However, PH, SD, NCF and PROD did show values close to unity in both phenotyping methodologies, indicating a favorable condition for the selection of superior lines. As for NDF and FLLA, more accurate methods should be used to select superior lines.
The quality of genotypic assessment should preferably be inferred based on accuracy, because this parameter refers to the correlation between the true genotypic value of the lines and the value predicted from the information obtained in the field experiment. In the current study, the two phenotyping methodologies showed a similar degree of accuracy in most assessed traits, with values ranging from 0.84 to 0.90, which are considered high and show that IBP allows for reliable inferences of genotypic means.
With respect to the heritability of the mean of the lines ($\hat{h}_m^2$), IBP yielded values equal to those estimated through MP for PH and PROD (0.78 and 0.81, respectively), and similar values for NCF (IBP: 0.78 and MP: 0.77). On the other hand, MP was slightly more efficient in capturing the genetic variability of the lines for SD (0.81) and FLLA (0.74) than IBP (0.75 and 0.71, respectively). Although IBP presented lower values than MP for these traits, they were very close. Thus, there is no reason for rejecting its use in papaya breeding programs. According to the results, PROD showed the highest degree of genetic variability, whereas NDF and FLLA showed the lowest in both methodologies. Although the population used herein was generated from a cross between two parents belonging to the same heterotic group, it presented substantial genetic variability in most traits, mainly in NCF and PROD. These traits have great economic importance among those assessed in the current study. This genetic variability may be due to the inbred nature of the lines assessed, which resulted from the advance of three generations through self-fertilization (F4). Consequently, the increased genetic variance between lines made them more genetically distant from each other. According to Hallauer et al. (2010), the expected variance between F4 lines is $\sigma_{G_{F_4}}^2 = \frac{3}{2}\sigma_A^2 + \frac{3}{16}\sigma_D^2$, where $\sigma_A^2$ is the additive genetic variance and $\sigma_D^2$ is the dominance genetic variance.
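The relationship between the three coefficients can be illustrated with a short calculation. The sketch assumes the usual definitions, in which each coefficient of variation is the square root of the corresponding variance expressed as a percentage of the trait mean; the variance components and the mean below are invented and do not come from Table 1.

```python
import math

def cv_percent(variance: float, mean: float) -> float:
    """Coefficient of variation (%) from a variance component and the trait mean."""
    return 100.0 * math.sqrt(variance) / mean

def relative_cv(cv_g: float, cv_e: float) -> float:
    """CVr: ratio between the genotypic and the experimental coefficients."""
    return cv_g / cv_e

# Invented values for one trait.
var_g, var_e, trait_mean = 4.0, 6.25, 20.0

cv_g = cv_percent(var_g, trait_mean)   # genotypic CV
cv_e = cv_percent(var_e, trait_mean)   # experimental CV
cv_r = relative_cv(cv_g, cv_e)
print(f"CVg = {cv_g:.1f} %, CVe = {cv_e:.1f} %, CVr = {cv_r:.2f}")
```

Since the trait mean cancels in the ratio, CVr depends only on the two variance components, which is why it can be compared across traits measured in different units.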
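The accuracy values can be related to the heritability of line means by a simple calculation. The sketch below assumes the common definition in which selective accuracy is the square root of the heritability of the mean of the lines, and in which that heritability is the genetic variance divided by the phenotypic variance of line means; the variance components are invented, while 0.81 and 0.71 are heritability values reported in this study.

```python
import math

def heritability_of_line_means(var_g: float, var_e: float, n_reps: int) -> float:
    """h2 of line means: genetic variance over phenotypic variance of the means."""
    return var_g / (var_g + var_e / n_reps)

def selective_accuracy(h2_mean: float) -> float:
    """Accuracy taken as the square root of the heritability of line means."""
    return math.sqrt(h2_mean)

# Invented variance components with the six replications used in the trial.
h2_mean = heritability_of_line_means(var_g=4.0, var_e=6.25, n_reps=6)
print(f"h2 of line means = {h2_mean:.2f}, accuracy = {selective_accuracy(h2_mean):.2f}")

# Heritabilities of 0.81 and 0.71 correspond to accuracies of about 0.90 and 0.84.
for h2 in (0.81, 0.71):
    print(h2, round(selective_accuracy(h2), 2))
```

Under this assumption, the reported accuracy range of 0.84 to 0.90 is consistent with the heritabilities of line means between 0.71 and 0.81.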
Thus, most of this variance results from the additive variance component, which indicates that heritability is mainly additive, which increases the chances of obtaining greater gains in the selection of such lines.
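The weight of the additive component can be checked numerically. The sketch below reads the expression attributed to Hallauer et al. (2010) as (3/2) of the additive variance plus (3/16) of the dominance variance, and assumes, purely for illustration, that the two variances are equal.

```python
from fractions import Fraction

def variance_among_f4_lines(var_a, var_d):
    """Expected genetic variance among F4 lines: (3/2) var_A + (3/16) var_D."""
    return Fraction(3, 2) * var_a + Fraction(3, 16) * var_d

def additive_share(var_a, var_d):
    """Fraction of the variance among F4 lines that is additive."""
    return Fraction(3, 2) * var_a / variance_among_f4_lines(var_a, var_d)

# Illustrative assumption: additive and dominance variances of equal size.
share = additive_share(1, 1)
print(share, float(share))
```

Even with dominance variance as large as the additive variance, roughly eight ninths of the variance among lines would be additive under this expression.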
The estimates of variance and genetic parameters for traits NCF, NDF, FLLA and PROD using an image and multiplying by two (Image A × 2, Image B × 2) are shown in Table 1. Thus, the CVe magnitude values estimated in Image A × 2 ranged from 17 % (NCF) to 35 % (NDF), whereas those estimated in Image B × 2 from 19 % (NCF) to 34 % (NDF). The CVg obtained through Image A × 2 showed magnitude values ranging from 14 % (FLLA) to 21 % (PROD), whereas that obtained in Image B × 2 showed magnitude values ranging from 11 % (FLLA) to 22 % (NDF). The CVr obtained in Image A × 2 showed magnitude values ranging from 0.51 (NDF) to 0.95 (PROD), whereas that obtained in Image B × 2 showed magnitude values ranging from 0.61 (FLLA) to 0.80 (PROD). The accuracy obtained by Image A × 2 ranged from 0.77 (NDF) to 0.90 (PROD), whereas that obtained by image B × 2 from 0.81 (NDF) to 0.87 (PROD).
The $\hat{h}_m^2$ obtained in Image A × 2 ranged from 0.59 (NDF) to 0.82 (PROD), whereas that obtained in Image B × 2 ranged from 0.64 (FLLA) to 0.76 (PROD). Thus, the parameter estimates obtained from a single photographed side showed that the two sides of these lines were not symmetrical; therefore, it was necessary to use the two images to assess count-dependent traits. The asymmetry observed herein may be associated with the genetic nature of the lines, which were the third generation obtained through self-fertilization. It can be inferred that there is still genetic variability within lines, although this variation was not taken into consideration given the STP experimental conditions. Another possible cause may lie in the variation in the arrangement or insertion of fruits and fruitless leaf axils within a single plant. Such variation would lead to differences between the photographed sides, i.e., one side would show a larger number of fruits, thus leading to inconsistent results. This variation may also be due to the drought in the region, with the implications mentioned previously. Thus, based on the results found herein, both the analysis of the selection index and the estimation of genetic gains were conducted by taking the sum of the two images (Image A + Image B) into consideration in order to compare the phenotyping methodologies.
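The comparison between a single doubled image and the sum of the two images can be illustrated per plant. The sketch below is a simplified, hypothetical check at the level of raw counts, whereas the study compared the genetic parameters estimated from each data set; the counts and function names are invented.

```python
def doubled_estimate(count_one_side: int) -> int:
    """Whole-plant estimate from a single image (Image A x 2 or Image B x 2)."""
    return 2 * count_one_side

def asymmetry(count_a: int, count_b: int) -> float:
    """Relative difference between the two sides (0 = perfectly symmetrical)."""
    total = count_a + count_b
    return 0.0 if total == 0 else abs(count_a - count_b) / total

# Invented fruit counts (Image A, Image B) for three plants.
plants = [(12, 18), (20, 20), (9, 15)]

for count_a, count_b in plants:
    print(
        f"A x 2 = {doubled_estimate(count_a):2d}, "
        f"B x 2 = {doubled_estimate(count_b):2d}, "
        f"A + B = {count_a + count_b:2d}, "
        f"asymmetry = {asymmetry(count_a, count_b):.2f}"
    )
```

When the asymmetry is not close to zero, doubling either side misstates the whole-plant count, which is the situation that led to the use of Image A + Image B.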

Combined selection and estimation of genetic gains
Overall, the selection index used for the combined selection consistently ranked the lines based on all the assessed traits. The same superior lines were selected by the two phenotyping methodologies (Tables 2 and 3). In addition, the index also showed good consistency in the selection of lines based on PROD and NCF, which were considered the most economically important traits. Based on this index, eight superior lines were selected. Considering the mean of the selected lines, the genetic gains across traits ranged from 29 % (PROD) to -8 % (FLLA) when the MP methodology was used and from 27 % (PROD) to -10 % (FLLA) when the IBP methodology was applied. Both methodologies were consistent in the gain estimate of each trait. The highest mean gain was obtained in PROD, whereas the lowest one was in SD. It is worth emphasizing that these gains were positive for PROD, NCF and SD and negative for NDF and FLLA, since the goal was to reduce the NDF and FLLA means.
The IBP methodology led to genetic gains greater than those of the MP methodology in FLLA and SD, as well as to similar genetic gains in NCF. However, the MP methodology led to the greatest genetic gains in PROD and NDF. The eight lines selected through the MP methodology showed gains ranging from -2 % to 100 % for PROD, from 0 % to 42 % for NCF, from -42 % to 36 % for NDF, from -21 % to 14 % for FLLA, and from -1 % to 13 % for SD (Table 2). On the other hand, the gains resulting from the application of the IBP methodology ranged from -1 % to 103 % for PROD, from -3 % to 43 % for NCF, from -48 % to 44 % for NDF, from -27 % to 14 % for FLLA, and from -2 % to 9 % for SD (Table 3). The negative sign for PROD was obtained through the selection of line 22 by the two methodologies and can be explained by the fruit mass, since this trait is the product of the number of commercial fruits and the mean mass of fruits. Line 22 presented fruits with a mean mass of 470 g, and it was the least selected line. However, this line showed significant and positive gains in NCF, which justified its selection. The opposite happened to line 13, which presented a mean mass of 890 g and was one of the lines that was selected the most. The positive sign for NDF was found in certain selected lines because a number of productive plants also produced deformed fruits, which increased the mean of this trait.
The differences in the genetic gain estimates between the phenotyping methodologies stem from differences between the means and the heritability coefficients. This fact is associated with the peculiarities of each methodology. For example, with respect to the number of fruits, at times a given fruit is not completely visible because leaves, branches or other fruits prevent it from being seen or because more fruits may be growing in the same node. Thus, the occlusion of fruits may minimize their visible area and hinder their recognition in the image. Errors resulting from the occlusion of fruits have been addressed in studies that estimate the number of fruits using methodologies based on digital images (Payne et al., 2013; Roscher et al., 2014). In addition, errors in the recognition of deformed fruits may result from the difficulty of identifying the part of the fruit exhibiting the anomaly. For example, carpelloid fruits may be mistaken for commercial fruits. The identification of pentandric fruits is easier because of the characteristic shape of these fruits. However, a trained and experienced evaluator may identify most of the deformed fruits in the image and help reduce the methodology error. On the other hand, the manual counting of papaya fruits demands intensive labor. The evaluator must go around each plant or, in many cases, use a ladder in order to perform the counting. Thus, manual assessment, mainly in productive plants or in experiments comprising large numbers of treatments, is laborious and induces the appraiser to make counting errors, since it is tiring and difficult to accomplish.
Thus, these peculiarities associated with each methodology, as well as the experimental conditions and genetic structure of the lines, are able to explain the small differences between the means and heritability coefficients and the consequent genetic gain estimates obtained by each methodology. Direct selection based on genetic values obtained through BLUP has been successfully applied to fruit species since it explores the genetic variation of individuals free of environmental effects. However, the combined selection was identified as an appropriate strategy in papaya breeding, although it showed lower gains than those found in direct selection. This happens because of the great expectation of success in future generations and because the combined selection simultaneously takes into consideration traits that are favorable and unfavorable to the papaya crop (Pinto et al., 2013; Ramos et al., 2014). The two phenotyping methodologies showed significant genetic variability between lines as far as the assessed traits are concerned. Eight superior lines were selected for the advance of generations and selection cycles. It is worth emphasizing that this selection should be done between lines due to the evolution of genetic variance across generations derived from self-fertilization. In addition, it is worth pointing out that the genetic parameter estimates, as well as the efficiency of the index in the selection of superior lines, are both inherent to F4 lines and to the experimental conditions set in the current study.
Table 2 - Genetic gains (Gain) and new predicted averages in five traits crucial to papaya breeding for the lines selected by index using manual phenotyping considering the sum of the two evaluation seasons (9 and 12 months after the transplanting) in Linhares, Espírito Santo, Brazil (2016).
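The logic of a combined selection based on standardized genetic values can be sketched as follows. This is a simplified illustration and not the index of Silva et al. (2008) as implemented in SAS: the weights (+1 for traits to be increased, -1 for traits to be reduced), the two traits and the predicted genetic values of four lines are all hypothetical.

```python
from statistics import mean, pstdev

def standardize(values):
    """Center and scale predicted genetic values to zero mean and unit variance."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        raise ValueError("trait shows no variation between lines")
    return [(v - m) / s for v in values]

def selection_index(genetic_values, weights):
    """Weighted sum of standardized genetic values, one index value per line."""
    n_lines = len(next(iter(genetic_values.values())))
    index = [0.0] * n_lines
    for trait, values in genetic_values.items():
        for i, z in enumerate(standardize(values)):
            index[i] += weights[trait] * z
    return index

def select_top(index, k):
    """Positions of the k lines with the highest index values."""
    return sorted(range(len(index)), key=lambda i: index[i], reverse=True)[:k]

# Hypothetical predicted genetic values for four lines.
genetic_values = {"PROD": [30.0, 20.0, 25.0, 15.0], "NDF": [2.0, 5.0, 3.0, 6.0]}
weights = {"PROD": 1.0, "NDF": -1.0}   # hypothetical: raise PROD, lower NDF

index = selection_index(genetic_values, weights)
selected = select_top(index, k=2)
print(selected)

# With 23 lines, a 35 % selection intensity corresponds to eight lines.
n_selected = round(0.35 * 23)
```

Standardizing each trait before summing prevents traits measured on larger scales from dominating the ranking.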
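The gain estimates and the effect of fruit mass on PROD can be reproduced with simple arithmetic. The sketch below takes the genetic gain as the selection differential multiplied by the heritability of line means and expresses it as a percentage of the overall mean, which is an assumption about how the percentages were obtained. The means and fruit counts are invented; only the fruit masses of 470 g (line 22) and 890 g (line 13) come from the text.

```python
def genetic_gain(selected_mean: float, overall_mean: float, h2_mean: float) -> float:
    """Gain = selection differential x heritability of the mean of the lines."""
    return (selected_mean - overall_mean) * h2_mean

def gain_percent(gain: float, overall_mean: float) -> float:
    """Gain expressed as a percentage of the overall mean (assumed convention)."""
    return 100.0 * gain / overall_mean

def plant_production_kg(ncf: int, mean_fruit_mass_g: float) -> float:
    """PROD as number of commercial fruits x mean fruit mass, in kg per plant."""
    return ncf * mean_fruit_mass_g / 1000.0

# Invented means: selected lines average 26, all lines average 20, h2 = 0.81.
gain = genetic_gain(selected_mean=26.0, overall_mean=20.0, h2_mean=0.81)
print(f"gain = {gain:.2f} ({gain_percent(gain, 20.0):.1f} %)")

# Invented fruit counts combined with the fruit masses reported in the text.
prod_line_22 = plant_production_kg(ncf=40, mean_fruit_mass_g=470)
prod_line_13 = plant_production_kg(ncf=30, mean_fruit_mass_g=890)
print(prod_line_22, prod_line_13)
```

In this invented case the line with more fruits still produces less, which mirrors how a line can show a positive gain in NCF together with a negative gain in PROD.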
IBP is an efficient phenotypic analysis instrument as regards selective precision and accuracy in the capturing of genetic variability and the gains obtained from the selection of superior lines, when compared to MP methodology. Furthermore, IBP methodology can be easily adopted, since the images are captured using an inexpensive, easily handled and transported conventional camera. In addition, the images can be stored in a computer for later analysis. It reduces both the labor and time spent on field measurements, and thus improves phenotypic assessment. As was evidenced in the present study, the two appraisers used 100 s, on average, to estimate traits in the MP methodology, whereas the same appraisers used 16 s to take two pictures per plant and 30 s to analyze the images. In addition, IBP methodology has the advantage that the time used to capture the images does not depend on the number of fruits or on the plant height; thus, it is faster than the MP methodology because the latter takes longer to be applied to productive and/or tall plants and is, therefore, more laborious. Thus, IBP methodology is expected to expand the size of the experiments, to make fast and accurate phenotypic assessments, as well as to help increase both the selection differential and the heritability coefficient, and, hence, lead to direct effects on genetic gains. It is also expected to be used at different stages of papaya breeding programs such as the germplasm assessment, the development of inbred lines, the assessment of yield competition trials, genome-wide selection studies (GWS), genome-wide association studies (GWAS) and marker-assisted selection (MAS).
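The reported times can be scaled to a whole assessment to show the size of the saving. The sketch assumes that the 100 s, 16 s and 30 s figures are per-plant averages and that all 138 plots (23 lines × 6 replications, one plant per plot) are present, which ignores the plant losses mentioned earlier.

```python
MP_SECONDS_PER_PLANT = 100          # manual phenotyping, reported average
IBP_CAPTURE_SECONDS = 16            # two pictures per plant
IBP_ANALYSIS_SECONDS = 30           # image analysis per plant

def total_minutes(seconds_per_plant: float, n_plants: int) -> float:
    """Total time for one assessment of the trial, in minutes."""
    return seconds_per_plant * n_plants / 60.0

n_plants = 23 * 6                   # assumption: no missing plots
ibp_seconds_per_plant = IBP_CAPTURE_SECONDS + IBP_ANALYSIS_SECONDS

mp_minutes = total_minutes(MP_SECONDS_PER_PLANT, n_plants)
ibp_minutes = total_minutes(ibp_seconds_per_plant, n_plants)
relative_saving = 1 - ibp_seconds_per_plant / MP_SECONDS_PER_PLANT

print(f"MP: {mp_minutes:.1f} min, IBP: {ibp_minutes:.1f} min, "
      f"saving: {100 * relative_saving:.0f} %")
```

Only the 16 s of image capture has to be spent in the field; the analysis time can be deferred, so the reduction in field time is larger than the overall saving.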