Phenomics approaches: genetic diversity and variance components in a S2 guava family by seed traits

Guava is of great economic importance in Brazil, and the development of new cultivars from inbred lines has been a promising option. The objective of this work was to evaluate S2 families of Psidium guajava using seed attributes. Several traits of seed physiological quality were studied, and digital phenotyping was performed for seed geometry, texture and color traits. The variables were analyzed simultaneously using the Ward-Modified Location Model (MLM) method, and individual analyses of variance were carried out to estimate the genetic parameters of the population. More than one group of divergent genotypes was formed; the geometry traits had the greatest impact on the discrimination of the genotypes, and a high phenotypic correlation was observed with the germination variables and dry matter weight. High heritabilities were found for the variables related to seed quality, indicating potential for success in selecting vigorous genotypes. The Ward-MLM method is a useful tool to detect genetic diversity among genotypes of inbred guava. Thus, the most divergent genotypes with high germination potential can be recommended for future crosses or self-pollinated to obtain new lines in the guava breeding program.


INTRODUCTION
Self-pollination was obtained by protecting the flowers, which were covered before anthesis. Buds were identified, and the fruits were later protected with a raschel bag.
After harvesting, the fruits were stored in a cold chamber until the time of seed removal. Auxiliary cutting and pulp removal tools were used to pulp the fruits and sieves were used to remove all mucilage and fiber. The seeds were rubbed over a steel mesh sieve under running water.
The removed seeds were left to dry at room temperature for 48 h, on paper towel in containers. Twenty-four hours after the start of the drying process, the seeds were turned over to dry evenly.

Seed physiological quality
Several seed physiological quality traits were evaluated for the different families obtained:
Moisture: determined by the oven method, 24 h at 105 ± 3 °C (Brazil 2009).
1000-seed weight: obtained according to the Rules for Seed Testing (RAS) (Brazil 2009).
Germination test: four replicates of 50 seeds from each of the 42 genotypes were used. The test was set up on paper rolls. The germination chambers were regulated to an alternating temperature of 35-25 °C, with a photoperiod of 8 h of light and 16 h of dark, respectively. The first evaluation was performed on the 10th day and the last one on the 35th day, when the percentages of normal seedlings, abnormal seedlings and nongerminated seeds were recorded.
Shoot length (SL) and radicle length (RL): four paper rolls were set up with ten seeds each and, at the end of the 35th day, the seedlings were measured with a graduated ruler.
Dry matter weight (DMW): after the seedling lengths were measured, the seedlings were sectioned to separate the shoots from the radicles and placed in paper bags, which were then oven-dried at 65 °C for 72 h. After drying, the samples were weighed on an analytical scale to determine the dry matter of shoots (g) and roots (g).
Germination speed index (GSI): seeds that produced shoots with a length of 1.0 cm, as defined in preliminary tests, were counted on alternate days. This variable was calculated by the formula proposed by Maguire (1962).
Tetrazolium test: the seeds that did not germinate in the germination test were subjected to the tetrazolium test using a 0.1% solution. The seeds were cut longitudinally and kept for 4 h in the dark, immersed in the tetrazolium solution, at 30 °C (Masetto et al. 2009). Seeds that did not change color or that had deteriorated were considered dead.
Accelerated aging test: the seeds were placed uniformly on an aluminum screen inside a germination box with 40 mL of water at the bottom. Subsequently, the germination boxes were kept at 41 °C for 48 h. After this procedure, the germination test was performed as described previously.
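The germination speed index of Maguire (1962) is commonly written as the sum, over all counts, of the number of seeds newly germinated at each count divided by the number of days elapsed since sowing. The following is a minimal sketch of that calculation; the function name, the counting days and the seed counts are hypothetical illustrations and are not data from this study.

```python
def germination_speed_index(new_counts, days):
    """Maguire-type index: sum of (newly germinated seeds / days after sowing).

    new_counts[i] is the number of seeds that germinated between the previous
    count and count i (not the cumulative total); days[i] is the day of count i.
    """
    if len(new_counts) != len(days):
        raise ValueError("new_counts and days must have the same length")
    if any(d <= 0 for d in days):
        raise ValueError("days after sowing must be positive")
    return sum(n / d for n, d in zip(new_counts, days))


# Hypothetical replicate of 50 seeds counted on alternate days
days = [10, 12, 14, 16]
new_counts = [5, 10, 20, 8]

gsi = germination_speed_index(new_counts, days)
germination_percentage = 100 * sum(new_counts) / 50
```

Because early germination is divided by a smaller number of days, two lots with the same final germination percentage can differ in GSI, which is why the index is read as a vigor measure.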

Digital phenotyping of seeds
For the digital phenotyping of traits related to seed shape, color and texture, four replicates of 50 seeds of each genotype were analyzed. The GroundEye Mini instrument was used to capture and analyze the seed images. Fig. 1 illustrates some of the geometric traits evaluated. From the digital image analysis, the equipment generated spreadsheets of observations comprising 51 color traits, 48 geometry traits, 192 histogram traits and 43 texture traits.

Genetic dissimilarity in S2 families of P. guajava
The SAS program (SAS Institute 2003) was used to calculate the genetic distance between the genotypes. Observations were analyzed simultaneously using the Ward-MLM method to form the groups. Ward's clustering was performed using Gower's distance matrix. The ideal number of groups was defined according to the pseudo-F and pseudo-T² criteria. The difference in canonical variables was examined graphically, and the distance was determined based on the dissimilarity between the groups (Amaral Júnior et al. 2010).
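To illustrate the distance on which Ward's clustering was based, the sketch below computes Gower's distance for purely quantitative variables, where each variable contributes the absolute difference between two genotypes divided by the range of that variable. This is only an illustration under stated assumptions: the analysis in this study was run in SAS, the function name and the three genotypes are hypothetical, and the handling of qualitative variables, the Ward step and the MLM step are not shown.

```python
def gower_distance_matrix(rows):
    """Gower distance for quantitative variables only.

    rows: list of equal-length tuples, one tuple of trait values per genotype.
    Each trait contributes |x_i - x_j| / range(trait); contributions are averaged.
    Traits with zero range carry no information and are skipped.
    """
    n, p = len(rows), len(rows[0])
    ranges = [max(r[k] for r in rows) - min(r[k] for r in rows) for k in range(p)]
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            parts = [abs(rows[i][k] - rows[j][k]) / ranges[k]
                     for k in range(p) if ranges[k] > 0]
            d = sum(parts) / len(parts) if parts else 0.0
            dist[i][j] = dist[j][i] = d
    return dist


# Hypothetical genotypes: (seed area, germination %, 1000-seed weight)
genotypes = [(10.0, 90.0, 8.0), (12.0, 80.0, 9.0), (14.0, 70.0, 10.0)]
dist = gower_distance_matrix(genotypes)
```

Range scaling keeps every trait on a 0 to 1 scale, so variables measured in different units (mm², %, g) contribute equally to the dissimilarity.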

Estimation of genetic components
To check for genetic variability among the evaluated genotypes, a randomized block design was adopted, individual analyses of variance were performed and the genetic parameters of the population were estimated. The Genes program (Cruz 2016) was used for the individual analyses, applying the following statistical model (Eq. 1):

$$Y_{ij} = \mu + G_i + B_j + \varepsilon_{ij} \qquad (1)$$

where $Y_{ij}$ = observation referring to family i in replicate j; $\mu$ = overall constant; $G_i$ = effect of family i, i = 1, 2, ..., g ($G_i \sim NID[0, \sigma^2_g]$); $B_j$ = effect of block j, j = 1, 2, ..., r ($B_j \sim NID[0, \sigma^2_b]$); and $\varepsilon_{ij}$ = experimental error ($\varepsilon_{ij} \sim NID[0, \sigma^2]$).
Based on the mean squares obtained from the analysis of variance, the variance components associated with the genetic effects of the statistical model were estimated: the component of genetic variance (Eq. 2), the component of phenotypic variance (Eq. 3), the mean-based heritability coefficient (Eq. 4), the experimental coefficient of variation (Eq. 5) and the variation index (Eq. 6).
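Equations 2 to 6 are not reproduced in the text. As a hedged sketch, the estimators usually applied to a randomized block design with selection on family means are: genetic variance = (MSG − MSE)/r, phenotypic variance of means = MSG/r, heritability = genetic variance / phenotypic variance, CVe = 100·√MSE / mean, and variation index = CVg/CVe. These forms are an assumption and may differ in detail from those used by the authors; the function name and the input values are hypothetical.

```python
import math


def genetic_parameters(ms_genotype, ms_error, grand_mean, n_reps):
    """Variance components from ANOVA mean squares (assumed standard estimators)."""
    # A negative estimate is truncated at zero, a common convention.
    var_g = max((ms_genotype - ms_error) / n_reps, 0.0)   # genetic variance
    var_p = ms_genotype / n_reps                          # phenotypic variance of means
    h2 = 100 * var_g / var_p                              # mean-based heritability (%)
    cv_e = 100 * math.sqrt(ms_error) / grand_mean         # experimental CV (%)
    cv_g = 100 * math.sqrt(var_g) / grand_mean            # genetic CV (%)
    iv = cv_g / cv_e                                      # variation index
    return {"var_g": var_g, "var_p": var_p, "h2": h2, "cv_e": cv_e, "iv": iv}


# Hypothetical mean squares for one trait evaluated with four replicates
params = genetic_parameters(ms_genotype=58.0, ms_error=1.0, grand_mean=10.0, n_reps=4)
```

Under these estimators the heritability depends only on the ratio of the two mean squares, whereas the coefficient of variation also depends on the trait mean, which is why traits with means close to zero can combine a very high CV with a high heritability.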
When significant by the F-test, the means of the genotypes for the studied traits were grouped by the Scott-Knott test at the 5% probability level using the Genes computer program (Cruz 2016).
Phenotypic correlation coefficients (r F ) were calculated between pairs of traits, and their significance was evaluated by the t test at 5 and 1% probability.
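The expression for the correlation coefficient is not reproduced in the text. A minimal sketch, assuming the usual Pearson form (covariance divided by the product of the standard deviations) and the usual t statistic with n − 2 degrees of freedom, is given below; the function names and the five data pairs are hypothetical, and the comparison with the tabulated critical t value is left out.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long sequences of trait means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t value for H0: correlation = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# Hypothetical genotype means for two traits
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

r = pearson_r(x, y)
t = t_statistic(r, len(x))
```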

Genetic dissimilarity in S2 families of P. guajava
The ideal number of groups was defined from the log-likelihood function of the MLM model, according to the pseudo-F and pseudo-T² criteria combined with the likelihood profile associated with the likelihood ratio test (SAS Institute 2003). Four groups were formed. Within the groups, the genotypes that most resembled each other were determined, as shown in Fig. 2 and Table 1.
Campos et al. (2013) studied the genetic divergence between 138 guava genotypes obtained from controlled biparental crosses. Eight groups were determined by the Ward-MLM method, based on morphological, agronomic and physicochemical analyses. As stated by Gonçalves et al. (2009), the number of groups can vary according to the species, the number of accessions and the number and type of descriptors. In the present case, the evaluated families originate from the population studied by Campos et al. (2013), from which individuals were selected and self-pollinated. Therefore, low genetic diversity is observed for the evaluated set of traits, which can be explained by the self-pollination process to which this group of plants was subjected.
Group 1 was composed of 16 genotypes, representing the largest cluster. Group 2 comprised only seven genotypes, and groups 3 and 4 contained 12 genotypes each.
These individuals also exhibited higher germination means under these conditions; therefore, the genotypes of group 1 can be considered the most vigorous. It is known that the genetic constitution of the evaluated individuals influences seed vigor and germination. Furthermore, seed germination behavior can vary widely according to the type of substrate, physicochemical factors, aeration, structure and water-holding capacity, among others (Fanti and Perez 1999). In the accelerated aging test, seeds are subjected to stressful conditions of high temperature and humidity before being taken to the germination test (Marcos Filho 1999). It should be noted that the S1 genotypes, which were self-pollinated to obtain these individuals, underwent an evaluation of agronomic attributes (Ambrósio et al. 2021) and were selected with the aim of fixing superior agronomic traits.
Group 2 showed the highest means for germination percentage (GP) and for the histogram variable HSL: hue: minimum index (H3). The genotypes of group 2 can be considered those that would perform best in terms of germination in the field. However, this group exhibited the lowest mean for 1000-seed weight. Germination may have been more effective in smaller seeds because of their greater contact surface with the substrate and, consequently, greater water absorption. Water is the factor that exerts the greatest influence on germination, since tissues are rehydrated upon its absorption. The consequence is the intensification of respiration and of all the metabolic activities that provide energy and nutrients for the resumption of embryonic growth. With the entry of water, the seed also enlarges, facilitating the rupture of the coat and, thus, the emergence of the root hypocotyl (Carvalho and Nakagawa 2012).
Group 4 had the highest means for shoot length (SL) and 1000-seed weight (TSW), but the lowest GP. Seeds are considered reserve organs, as they contain all the material necessary for the formation of future plants. In general, larger or denser seeds were better nourished during their development and possess larger amounts of reserves. However, in certain situations, these larger seeds may not be the most vigorous (Carvalho and Nakagawa 2012).
Krause et al. (2017) evaluated 61 genotypes of S1 families of P. guajava L. by digital seed analysis, using six clustering strategies, and three groups were formed by the Ward-MLM method in all of them. This can be explained by the fact that the genotypes originated from inbred guava families in the first generation of self-pollination, which was aimed at uniformity of the genotypes under evaluation, as reflected in the formation of few groups.
Of the 31 variables used to determine genetic variability, some traits were of greater importance based on canonical variable 1. According to the data in Table 1, the two color variables, C3 and C2, with values of 0.827 and 0.823, respectively, were those that most contributed to the formation of the groups, followed by H12, H14, H9 and T14, with respective values of 0.790313, 0.786644, 0.783764 and 0.776867.
Contrary to this study, Fachi et al. (2019) and Krause et al. (2017) found a greater contribution to the analysis of genetic divergence, using the Ward-MLM method, in seed geometry traits. However, in the present study, the geometry variables were also predominant for the discrimination of the genotypes under study.
Thus, the sole use of the GroundEye instrument would be enough to determine the formation of groups in the study of genetic variability in self-pollinated guava seeds of the S 1 family. Accordingly, this could eliminate the need for physiological analyses, which demand time and financial resources, given the ease of use of digital analysis and its efficiency. However, the seed physiological quality aspects under evaluation cannot be neglected, as they are important criteria for selecting superior genotypes. Venora et al. (2007) described image analysis as a fast method that requires less time to obtain data, in addition to being non-destructive and easily reproduced. Nevertheless, for greater knowledge of field conditions and applications in breeding programs, physiological variables are indispensable.
According to Cruz et al. (2014), when the first two canonical variables account for over 80% of the total variation, the interpretation of the variability between accessions can be considered satisfactory. In the two-dimensional plot (Fig. 2), canonical variable 1 explained 82.03% and canonical variable 2 explained 13.08% of the total variation. Together, the two explained 95.11% of the total variation by the Ward-MLM method.
In the study developed by Paiva et al. (2014), the first two canonical variables obtained by the Ward-MLM method explained 91.16% of the total variation. Krause et al. (2017) found, in the sum of the first two canonical variables, 100% of the explanation of the distances between the groups in the six clustering strategies.
In the study of Campos et al. (2013), the first two canonical variables did not achieve a satisfactory result, as only 61.79% of variation was explained. Thus, a third variable had to be introduced that represented 19.50% of the variation. In this way, the graphical representation was three-dimensional and the sum of the three variables resulted in 81.30% of the explanation, unlike the results obtained in the current study, as there was no need for the third variable. This may be due to the type of population used in the study, which were full-sib families with greater genetic segregation within the families.
Considering a cross between the more vigorous groups, the cross of groups 2 and 4 would be indicated, as they showed higher means for germination, 1000-seed weight and SL. Group 3 was the most distant from all the others, but did not show the highest mean for any of the analyzed physiological traits. Thus, this study makes it possible to indicate new crosses to obtain new segregating populations, as well as new individuals for further stages of self-pollination.
The possibilities of group formation and the potential of the geometric variables to discriminate the studied genotypes are illustrated in Fig. 3, which shows the groups formed by the Scott-Knott test for each variable under study. More groups were formed for the DMW and circumference (C) variables (16 and 15 groups, respectively). For the convex circumference (CC) variable, 12 groups were formed. For maximum (MaD) and minimum (MiD) diameter, 11 groups were formed. Lastly, for corrected diameter (CD), convex area (CA) and area (A), 10 groups were formed. This underscores the importance of the seed shape variables in detecting the existing diversity. When only these variables are considered, the large number of groups formed provides information about the differences of genetic origin in the seeds of the studied S2 guava families.
Note. AG = area geometry; CAG = convex area geometry; GC = geometry circularity; CDG = contained diameter geometry; GMD = geometry maximum diameter; MDG = minimum diameter geometry; GSF = geometry sphericity of form; PG = perimeter geometry; CPG = convex perimeter geometry; GSC = geometry solidity contour; GPAA = germination percentage in the accelerated aging test; DSAA = percentage of dead seeds in the accelerated aging test; GSI = germination speed index; GP = germination percentage; DSG = percentage of dead seeds in the germination test; DMW = dry matter weight; ADP = aerial part dry mass; RL = root length; SL = shoot length.

Estimates of genetic parameters and possibilities of gains with selection
Tables 2 and 3 show the estimates of heritability coefficients for the traits associated with the physiological quality of guava seeds. All estimates were of high magnitude. The highest heritability value, 99.84%, was found for TSW.
For the physiological variables (Table 2), the estimated heritability coefficients were 93.32% for GPAA, 94.65% for percentage of dead seeds in the accelerated aging test (DSAA), 98.26% for the GSI, 98.35% for the GP in the germination test and 97.93% for the percentage of dead seeds in the germination test (DSG). It is worth mentioning that the coefficients of variation of the DSAA and DSG variables were higher than 30% and thus considered very high. The data for these variables contain many zeros, since most of the genotypes showed a high germination percentage. As the coefficient of variation indicates the precision of the experiment, it is clear that the observations are not homogeneous in nature, which translates into a higher coefficient.
The lowest heritability value, 87.91%, was found for SL. The values for RL and SDM were 93.19 and 95.51%, respectively. Similar results were described by Martins et al. (2016), who evaluated 62 lines of soybean seeds and found 93.20% heritability in the accelerated aging test and 96.1% in the germination test. Maia et al. (2011) evaluated 94 lines of common bean and found 81.7% heritability for the dry matter trait. In turnip seeds, the heritability values were 90.2% for germination and 91.1% for accelerated aging. Martins et al. (2014) studied seeds from 50 half-sib progenies of 'Brasília'-type carrots and obtained heritability coefficients of 91.5% for germination, 99.76% for 1000-seed weight and 90.40% for accelerated aging.

Table 2. Summary of the individual analysis of variance and some genetic parameters for traits assessed in S2 seeds of guava.
Note. GPAA = germination percentage in the accelerated aging test; DSAA = percentage of dead seeds in the accelerated aging test; GSI = germination speed index; GP = germination percentage; DSG = percentage of dead seeds in the germination test; **, * and ns = significant at 1 and 5% probability and not significant, respectively, by the F-test; CV = coefficient of variation; MSD = minimum significant difference; IV = variation index; h² = heritability.

Cardoso et al. (2009) estimated the genetic parameters of 30 seed genotypes from a papaya germplasm bank and found a heritability coefficient of 95.62% and a variation index of 15.76 for 1000-seed weight. For the germination variable, heritability was 81.18% and the variation index was 1.46. The heritability values found for root length and seedling dry matter were 86.18 and 90.95%, respectively, in line with the good variation indices of 1.7 for root length and 2.24 for seedling dry matter. The GSI results, in turn, were not satisfactory, as heritability was 31.07% and the variation index was 0.47.
This was not observed in this study, where the GSI variable reached a heritability value of 98.29% and a variation index of 3.79. A contributing factor to the divergence of values was the use of different species.
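The heritability and variation index reported for GSI can be cross-checked against each other. Assuming that the variation index is the ratio CVg/CVe and that heritability is estimated on a family-mean basis with r replicates (both are assumptions about the estimators, not statements from the source), the two quantities are linked by h² = r·Iv² / (1 + r·Iv²). The sketch below applies this relation to the reported Iv of 3.79 with the four replicates described in the methods; the function name is hypothetical.

```python
def heritability_from_variation_index(iv, n_reps):
    """Mean-based heritability (%) implied by a variation index Iv = CVg / CVe.

    Follows from h2 = var_g / (var_g + var_e / r) with Iv**2 = var_g / var_e.
    """
    k = n_reps * iv ** 2
    return 100 * k / (1 + k)


# Reported variation index for GSI and the four replicates used in the tests
h2_gsi = heritability_from_variation_index(3.79, 4)
```

Under these assumptions the implied heritability is about 98.29%, which agrees with the value reported for GSI in this paragraph.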

Rajan et al. (2012) evaluated 50 accessions of guava and showed heritability values ranging from 49 to 98%. The estimated variables were fruit weight, fruit height, fruit diameter, fruit firmness, seed weight per fruit, number of seeds per fruit, seed weight/fruit weight ratio, 1000-seed weight and seed hardness. This reaffirms the importance of the study of heritability for the species P. guajava L. in breeding programs, since different genotypes can have opposite results.
The variation index (Iv) evaluates how much environmental variance predominates over genetic variance. When greater than or equal to unity, this index indicates that the genetic component is little influenced by the environment (Cavalcante et al. 2012). The results obtained (Tables 2 and 3) were higher than 1, revealing little environmental influence and high heritability estimates, which indicates a favorable situation for selecting these traits and obtaining greater gains. However, it is important to emphasize that the estimated genetic parameters must not be extrapolated to other populations or environmental conditions, because they are specific to the population under study.

Table 3. Summary of the individual analysis of variance and some genetic parameters for traits assessed in S2 seeds of guava.
Note. SL = shoot length (cm); RL = root length; SDM = dry mass of the aerial part; WTS = weight of a thousand seeds; **, * and ns = significant at 1 and 5% probability and not significant, respectively, by the F-test; CV = coefficient of variation; MSD = minimum significant difference; VI = variation index; h² = heritability.

C.C.A. Silva et al.

Phenotypic correlation between physical and physiological attributes of seeds
Knowing the correlation between two traits is extremely important for the improvement of any species. This information can facilitate selection for traits that are difficult to measure and to identify, and which may have low heritability. The phenotypic correlation is estimated directly from phenotypic measurements, which have genetic and environmental causes. In this case, digital phenotyping of seeds can be extremely relevant, since seed attributes can be observed early and may be highly correlated with variables related to seed physiological quality, which take longer to phenotype.
Phenotypic correlation analysis revealed significant positive and negative correlations between the variables. According to Leal et al. (2012), highly correlated variables should not be interpreted in isolation.
Some of the geometric variables extracted by the GroundEye instrument showed high positive correlations. The area variable had the greatest correlation with CA, which can be explained by the only slightly irregular shape of the seeds. Area also had a strong positive correlation with CD (0.848**), MaD (0.984**), MiD (0.953**), C (0.995**) and CC (0.996**) (Table 4). These values show that a larger area in guava seeds is associated with higher values of CD, MaD, MiD, C and CC.
The 1000-seed weight obtained correlation coefficients of 0.865**, 0.862**, 0.851**, 0.860**, 0.859** and 0.866** with A, CA, MaD, MiD, C and CC, respectively. The 1000-seed weight is influenced by the area and length measurements (Table 4). Based on these correlations, it is suggested that obtaining the geometry results from the digital analysis equipment would also indicate selection gains through indirect responses for heavier/larger seeds. This means the DMW variable, whose measurement is more difficult, could be evaluated with the results of the variables of A, CA, MaD, MiD, C, CC. Therefore, after further adjustments to digital phenotyping, measurement protocols can be indicated in the future for the rapid assessment of genetic potential in different genotypes through digital seed traits, allowing, for instance, digital phenotyping in large populations under study.
Contour solidity is a variable that is more sensitive to the presence of long and thin branches. However, guava seeds have more spherical features. This explains the high negative correlation (-0.888**) between contour solidity and the sphericity of the shape (Table 4).
The DSAA variable obtained a highly negative correlation of -0.929** with GPAA. As with DSAA and GPAA, the correlation coefficient between DSG and GP (-0.880**) was high and negative, which was an expected result (Table 4).
Germination percentage showed a high positive correlation of 0.906** with GSI, indicating that a faster activation of the embryo's growth through metabolic activities and, consequently, a faster rupture of the integument will lead to higher GPs in the studied guava genotypes (Table 4). Mengarda et al. (2016) estimated Pearson's correlation to determine the quality of seeds and plants of the F1 and F2 generations of the UENF/Caliman 01 papaya hybrid. The sugar and lipid contents showed a positive correlation with GSI. These results indicate the metabolic contribution to the GSI and GP values, suggesting good nutrition of the papaya seeds, which can be associated with the results of other traits.
The use of correlation can facilitate the development of more complex assessments. Rajan et al. (2008) evaluated 68 guava genotypes collected from different sources and kept in the germplasm bank of the Central Institute for Subtropical Horticulture of Lucknow. The authors performed correlation analysis in their study and demonstrated its relevance for the discovery of favorable traits in the species P. guajava L., mainly in the seeds. According to their results, genetic gains can be obtained through correlated responses using seed variables.

Table 4. Phenotypic correlations between the geometry traits and the physiological analyses in S2 seeds of guava.
Note. AG = area geometry; CAG = convex area geometry; GC = geometry circularity; CDG = contained diameter geometry; GMD = geometry maximum diameter; MDG = minimum diameter geometry; GSF = geometry sphericity of form; PG = perimeter geometry; CPG = convex perimeter geometry; GSC = geometry solidity contour; GPAA = germination percentage in the accelerated aging test; DSAA = percentage of dead seeds in the accelerated aging test; GSI = germination speed index; GP = germination percentage; DSG = percentage of dead seeds in the germination test; SL = shoot length (cm); RL = root length; DMW = dry matter weight; WTS = weight of a thousand seeds.

CONCLUSION
The high heritability values observed for the traits related to seed physiological quality make it possible to select superior individuals with high germination potential.
The Ward-MLM strategy was efficient for discriminating groups, demonstrating that the simultaneous analysis of physical and physiological variables of seeds is feasible and allows greater efficiency in the knowledge of the divergence between the guava genotypes.
Therefore, the guava breeding program can be conducted in two ways: the most divergent genotypes with high germination potential can be recommended for future crosses or self-pollinated to obtain new lines.

DATA AVAILABILITY STATEMENT
No datasets were generated or analyzed during the current study.