Selection of desirable characters for papaya genetic improvement programs associated with hydric and thermal stress

HIGHLIGHTS: Young seedlings provide efficient data for identifying promising characteristics for the evaluation of papaya varieties. With only two components, it was possible to explain 70% of the variability for both cultivars. Principal component analysis (PCA) helps identify desirable characteristics of seedlings, thereby reducing costs and time.

ABSTRACT: Papaya cultivation is widespread in Brazil, particularly in the states of Bahia and Espírito Santo, where most commercial plantations are concentrated. Owing to the economic and social importance of papaya, the present study aimed to determine the explanatory variables between the genotypes of two cultivars, Golden (from the Solo group) and the Tainung Nº 1 hybrid (from the Formosa group), cultivated under high temperatures and hydric stress. The genotypes with more desirable agronomic characteristics were identified for use in plant genetic improvement programs. Principal component analysis (PCA) was applied to select the desirable genotypes for the Golden and Tainung Nº 1 cultivars based on two groups of variables: for group 1, plant height, stem diameter, leaf length, leaf width, and leaf number were analyzed, whereas for group 2, leaf, root, and stem dry mass, dry and fresh mass of 10 discs, fresh mass, and stem and root fresh mass were analyzed. When exposed to hydric and thermal stress, the Tainung Nº 1 cultivar outperformed the Golden cultivar for the evaluated characteristics selected for use in genetic improvement programs.

Growth analyses are used to explain differences in plant growth due to genetic or environmental changes (Rodrigues et al., 2019) and can be associated with physiological indices to identify morphophysiological conditions influenced by biotic and abiotic factors (Fontes et al., 2005). Multivariate statistical analyses are important for understanding the genetic relationships between morphophysiological characteristics and genetic materials and the variability of germplasm collections. Principal component analysis (PCA) is used to reveal patterns for germplasm characterization and to eliminate variables with a smaller contribution (Suman et al., 2019), thereby facilitating the selection of important components and the assessment of existing genetic diversity.
Owing to the economic and social importance of papaya, the present study aimed to determine the explanatory variables between the genotypes of two cultivars, Golden from the Solo group and the Tainung Nº 1 hybrid from the Formosa group, which were cultivated under high temperatures and hydric stress, to determine which of them has more desirable agronomic characteristics for use in plant breeding programs.

Material and Methods
Data were obtained from experiments conducted in a greenhouse at Embrapa Mandioca e Fruticultura Tropical (CNPMF), located in Cruz das Almas, Bahia, Brazil (Figure 1), at the coordinates 12º 40' 19" S, 39º 06' 23" W and an altitude of 225 m. According to the Köppen-Geiger climate classification, the climate of this region is type Af, with an annual average temperature of 24.5 °C, relative humidity of 80%, and annual average precipitation of 1250 mm (AGRITEMPO, 2018).
Two abiotic stress conditions were simulated in the greenhouse, excessively high temperature and low water availability, which could interfere with plant development and result in the loss of part of the plantation. The average external temperature, obtained from daily temperature data provided by the CNPMF Meteorological Station, was 25 °C, while the average internal temperature, obtained from data collected during the study period via the Testo datalogger system, was 34 °C. The average temperature difference between both environments was approximately 10 °C, according to Aiello et al. (2018).
After germination, the adopted water regime was 80% of the field capacity until 45 days after emergence (DAE). After this period, the water regime was reduced to 40% of the field capacity. Irrigation was performed based on the mass of evaporated water, which was estimated by weighing randomly chosen pots.
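As a minimal sketch of how irrigation by pot weighing could be computed, the example below estimates the water to add so that a pot returns to a target fraction of field capacity. The function name, the example masses, and the assumption that each pot has a known dry mass and a known mass at field capacity are illustrative and are not reported in the study.

```python
def water_to_add(current_mass_g, dry_mass_g, field_capacity_mass_g, target_fraction):
    """Return the grams of water needed to reach target_fraction of field capacity."""
    if not 0.0 <= target_fraction <= 1.0:
        raise ValueError("target_fraction must be between 0 and 1")
    # Water held by the pot at field capacity (hypothetical per-pot calibration).
    water_at_fc = field_capacity_mass_g - dry_mass_g
    target_mass = dry_mass_g + target_fraction * water_at_fc
    # A pot that is already above the target is simply not irrigated.
    return max(0.0, target_mass - current_mass_g)


# Hypothetical pot: 2000 g dry, 2500 g at field capacity, 2300 g when weighed.
print(water_to_add(2300, 2000, 2500, 0.80))  # 80% regime
print(water_to_add(2300, 2000, 2500, 0.40))  # 40% regime
```

Under these invented masses, the pot needs water under the 80% regime but none under the 40% regime, which is how a reduced regime translates into less frequent irrigation.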
Seedlings of the Golden cultivar from the Solo group and those of the Tainung Nº 1 hybrid from the Formosa group were grown in individual polyethylene bags, to which a 2:1 mixture of bovine manure and topsoil was added. Three seeds were placed per bag for the Solo group, and two seeds were placed per bag for the Formosa group. The seeds were watered thrice a day until germination. Thinning was performed after germination, leaving only the most vigorous seedling in each container. The experiment was conducted using a completely randomized design.
The analyzed variables were categorized into two study groups (1 and 2). For group 1, 100 plants were used as the basic unit. For each plant, the following variables were analyzed: plant height (HE, cm), stem diameter (DIAM, mm), leaf length (LL, cm), leaf width (LW, cm), and leaf number (NL); measurements were performed with the aid of a caliper and a tape measure. For group 2, variables were evaluated after collecting plants at 15, 30, 45, 60, and 75 DAE, and 10 plants from each cultivar (Golden and Tainung Nº 1) were randomly selected. For each plant in group 2, the following variables were analyzed: dry mass of leaf (DML, g per plant), dry mass of root (DMR, g per plant), dry mass of stem (DMS, g per plant), dry mass of 10 discs (DM10D, g per plant), fresh mass of 10 discs (FM10D, g per plant), fresh mass (FM, g per plant), stem fresh mass (SFM, g per plant), and root fresh mass (RFM, g per plant).
For the group 2 variables, evaluations were performed at 15 DAE and subsequently every 15 days, totaling five evaluations: 15DAE (day 15), 30DAE (day 30), 45DAE (day 45), 60DAE (day 60), and 75DAE (day 75). The evaluations were performed between February and April 2022.
Statistical analysis of the genotypes was performed using principal component analysis (PCA), an exploratory multivariate technique, with R statistical software (R Core Team, 2021). The group 1 dataset contained continuous quantitative variables (plant height, stem diameter, leaf length, and leaf width) and one discrete quantitative variable (number of leaves); therefore, p = 5 variables. The number of genotypes, a = 2, was also defined, as was the total number of experimental units for the two genotypes. Thus, a data matrix, X, with dimensions 1593 × 5 was obtained.
The correlation matrix of the variables was used for the PCA so that the results were not influenced by the magnitude of the variables' units.
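To illustrate why working with the correlation matrix removes the influence of measurement units, the sketch below compares the covariance and the Pearson correlation of two variables before and after a change of units. The numbers are invented for illustration and are not data from this study; the helper names are hypothetical.

```python
from math import sqrt


def covariance(x, y):
    """Sample covariance of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def correlation(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(x, y) / sqrt(covariance(x, x) * covariance(y, y))


# Invented values: plant height in cm and stem diameter in mm.
height_cm = [10.0, 12.5, 15.0, 18.0, 21.5]
diam_mm = [3.0, 3.4, 4.1, 4.6, 5.5]

# The same heights expressed in mm instead of cm.
height_mm = [h * 10.0 for h in height_cm]

# Covariance changes with the unit (here by a factor of 10).
print(covariance(height_cm, diam_mm), covariance(height_mm, diam_mm))
# Correlation is unchanged by the rescaling.
print(correlation(height_cm, diam_mm), correlation(height_mm, diam_mm))
```

Because a PCA on the covariance matrix would weight a variable more heavily simply for being recorded in a smaller unit, the correlation matrix is the usual choice when variables such as cm, mm, and counts are mixed.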
The group 2 dataset contained only continuous quantitative variables, with p = 8 variables. The number of genotypes, a = 2, was also defined, as was the total number of experimental units for the two genotypes. A data matrix, X, with dimensions 100 × 8 was obtained, and the correlation matrix of the variables was used for the PCA so that the results were not influenced by the magnitude of the variables' units.

Results and Discussion
First, PCA was performed for the genotypes separately, and the first two principal components of the growth variables explained 95.0% of the variability in Golden and 93.0% of the variability in Tainung Nº 1 (Table 1). For the analyzed variables, PC1 contributed 78.1% for Golden and 71.7% for Tainung Nº 1, whereas PC2 contributed 16.8% for Golden and 21.4% for Tainung Nº 1 (Table 1).
Correlation values equal to or greater than 0.5 were considered relevant (Tobar-Tosse et al., 2015). The group 1 variables exhibited similar behaviors for the two genetic materials. For PC1, all variables had values greater than 0.5, except for NL in Tainung Nº 1 (0.15); for Golden, NL was close to the relevance threshold (0.51). For PC2, in both genetic materials, only NL had discriminatory power (-0.86 for Golden and -0.98 for Tainung Nº 1), with negative values, while the other variables had values lower than 0.5 (Table 1).
The group 1 discriminatory variables were HE, DIAM, LL, and LW (Table 1). Nobre et al. (2021) considered height an important agronomic characteristic in the genetic improvement of papaya, with low heights being desirable because small-sized genotypes facilitate harvesting and cultural practices. According to Conceição et al. (2021), stem diameter is an important characteristic for papaya yield, as plants with the largest stem diameter have a lower tendency to lodge and consequently produce more fruits and are more productive (Vettorazzi et al., 2020); furthermore, they found relevant values of 19% for this variable. Good coverage is also a desirable characteristic for the breeding of papaya plants, as large leaf areas allow for better photosynthetic assimilation, protect plants against burns (Dantas et al., 2013), and contribute to higher rates of dry matter accumulation (Zhou et al., 2020). Melo et al. (2007) and Salles et al. (2019) verified that papaya seedlings with greater height and larger diameter resulted in a greater amount of phytomass, consequently producing greater resistance and quality, and could be introduced in the field. Because hydric and thermal stresses compromise the growth and yield of papaya, the variables HE, DIAM, LL, and LW, which had similar values, are extremely important characteristics for growth rates.
Regarding the group 2 variables, for the Golden cultivar, all variables in PC1 had values higher than the relevance threshold (0.5), except for DM10D, which had a value of 0.37. For Tainung Nº 1, all variables had values greater than 0.5, except for FM10D and DM10D. For PC2, except for FM10D and DM10D (0.68 and 0.86, respectively), all variables had values lower than 0.5 for both the Golden and Tainung Nº 1 cultivars (Table 2).
The group 2 discriminatory variables were FM, FM10D, RFM, DML, and DMS (Table 2), indicating that plant growth may be related to the capacity of the photosynthetic system to produce organic matter for biomass accumulation (Tang et al., 2020).
According to Silva et al. (2019), variables that have the same sign act directly, i.e., an increase in the value of one variable corresponds to an increase in the value of the other. Conversely, when variables have opposite signs, they act inversely, i.e., when the value of one variable increases, the value of the other decreases. Thus, all analyzed variables (groups 1 and 2) acted directly on PC1 in both genetic materials, except for FM10D. For PC2, this direct action was verified for the group 1 variables HE, DIAM, and LL in Golden, and HE and DIAM in Tainung Nº 1; and for the group 2 variables FM, FM10D, DM10D, DML, DMR, and DMA in Golden, and FM, SFM, and DML in Tainung Nº 1.
According to Barbero et al. (2013), growth data accurately estimate the causes of variations in growth patterns of plants that are genetically different or when they are in different environments. Growth analyses attempt to explore the functional and structural differences within the same species and favor the selection of genetically superior materials with desirable characteristics within genetic programs.
For the two genetic materials, the positive correlations observed in Table 1 are responsible for the discrimination of the variables located to the right of PC1. For the Golden genotype, PC1 had a greater contribution from HE, DIAM, and LL. There were strong linear correlations between HE and DIAM, between LL and LW, and between HE and DIAM with LL and LW; the analysis also suggests that NL is not linearly correlated with any other variable under study, except for LW, for which a moderate correlation was found (Figure 1A). For the Tainung Nº 1 cultivar, similar results were observed: the most explanatory variables were HE, DIAM, and LL, strong correlations between HE and DIAM, LL, and LW were observed, and NL was not linearly correlated with the others (Figure 1B).
For the Golden cultivar, the highest values of HE, DIAM, LL, and LW were recorded at the 50DAE, 57DAE, and 64DAE evaluations, whereas NL presented higher values at 43DAE. A clear superiority was found for the 8th evaluation (64DAE), which was expected because these variables increased over time until growth stabilized (Figure 1A).
For the Tainung Nº 1 hybrid, 50DAE, 57DAE, and 64DAE had higher values for HE, DIAM, LL, and LW, with the highest values recorded at 64DAE (8th evaluation), followed by 50DAE (6th evaluation) and 57DAE (7th evaluation). For NL, the highest values were recorded at 43DAE, the 5th evaluation (Figure 1B). Therefore, the 8th evaluation of Golden was more representative than the 8th evaluation of Tainung Nº 1.
Regarding the positive correlations found in Table 2 for the group 2 variables in both genetic materials, the variables facing to the right in PC1 made the greatest contribution, with strong linear correlations among them, except for FM10D (to the left on the graph) and DM10D (closest to the origin). These results suggest that all variables are important for the Golden cultivar. For DM10D, a weak linear correlation with the other variables was recorded, whereas for FM10D, a negative and moderate correlation with the other variables was observed (Figure 2A). Tainung Nº 1 had similar results because all variables contributed greatly, except for FM10D (on the left of the graph), DM10D, and DMR (closer to the origin). For DM10D, there was a weak linear correlation with the other variables, whereas for FM10D, a weak and negative linear correlation with the other variables was recorded (Figure 2B).
Because the later evaluations had the highest values for all investigated variables, a joint analysis was performed on the last treatments of each cultivar (64DAE Golden and 64DAE Tainung Nº 1) to determine the best genotype. Analyzing the growth variables of both genotypes simultaneously, the first two principal components corresponded to 78% of the variance contained in the original variables; PC1 contributed 58.6%, while PC2 contributed 19.7%. For the group 2 variables, the first two principal components corresponded to 83% of the variance contained in the original variables; PC1 contributed 68.9%, and PC2 contributed 14.6% (Table 3).
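The cumulative contributions quoted above can be checked with simple arithmetic. The sketch below uses only the percentages reported in the text; the function name is a hypothetical helper introduced for illustration.

```python
def cumulative_contribution(contributions):
    """Return the running total of per-component contributions (in %)."""
    totals, running = [], 0.0
    for value in contributions:
        running += value
        totals.append(round(running, 1))
    return totals


# Per-component contributions (%) reported for the joint analysis.
group1 = [58.6, 19.7]  # PC1 and PC2, growth (group 1) variables
group2 = [68.9, 14.6]  # PC1 and PC2, group 2 variables

print(cumulative_contribution(group1))
print(cumulative_contribution(group2))
```

The sums are 78.3% for group 1 and 83.5% for group 2, in line with the 78% and 83% stated for the first two components.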
Only two principal components were required for both groups of variables; although the variance explained by the first two principal components did not reach 80% for the group 1 variables, the value was very close, and the third component did not have relevant values. In PC1 for group 1, all variables had values greater than 0.5, except for NL (-0.22); for PC2, only NL had discriminatory power (-0.97), with a negative value, while the other variables presented values lower than 0.5. For group 2, all variables had values greater than 0.5 in PC1, while for PC2, FM10D and DM10D had (negative) values greater than 0.5 in magnitude (Table 3).
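The 0.5 relevance criterion can be written as a simple filter on the correlations between variables and components. The sketch below applies it to the NL values quoted above for the joint analysis; comparing the absolute value against the threshold is an assumption, made because negative values such as -0.97 are treated as discriminatory in the text.

```python
RELEVANCE_THRESHOLD = 0.5  # cutoff adopted in the text (Tobar-Tosse et al., 2015)


def is_relevant(loading, threshold=RELEVANCE_THRESHOLD):
    """Assumed rule: a loading is relevant when its magnitude reaches the threshold."""
    return abs(loading) >= threshold


# NL correlations with PC1 and PC2 reported for the joint analysis of group 1.
nl_loadings = {"PC1": -0.22, "PC2": -0.97}

for component, value in nl_loadings.items():
    print(component, value, is_relevant(value))
```

Under this rule, NL is not relevant for PC1 but is relevant for PC2, which matches the interpretation given for Table 3.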
From the positive correlations observed in Table 3, for both genetic materials, it was possible to determine that the group 1 variables facing to the right in PC1, specifically HE, DIAM, LL, and LW, had the greatest contribution and exhibited strong linear correlations between HE and DIAM and between LL and LW. Regarding NL, a negative linear correlation with the other variables was suggested. There was a slight superiority of Tainung Nº 1 in relation to Golden owing to a greater dispersion of data in PC1 (Figure 3A). For the group 2 variables, there was a greater discriminatory power of FM, SFM, RFM, DML, and DMS, among which strong linear correlations were observed. Although FM10D and DM10D presented a linear correlation with each other, they showed no discriminatory power among the group 2 variables. There was a clear superiority of the Tainung Nº 1 hybrid compared to the Golden cultivar since it had greater scores in PC1 (Figure 3B). This superiority is related to the hybrid vigor of the Tainung Nº 1 cultivar, as it presents high quality and yield.
Sani et al. (2022), in attempting to optimize the extraction of antibacterial agents from Carica papaya seeds, obtained only 46% of the dataset variability explained by the first two components together. Lieb et al. (2018) used PCA to explore the differences between several cultivars in Costa Rica, with the first two principal components of the model explaining 47.45% of the total variance between the samples. To test the effects of the application of 1-methylcyclopropene, with and without ethylene, on the ripening and production of volatile compounds from Golden papaya, Façanha et al. (2019) selected two components that, together, explained most of the sample variability, corresponding to 57.72%.
Results obtained by Ruas et al. (2022) demonstrated that hydric stress in the Aliança papaya genotype promoted a reduction in growth, as observed by a reduction in leaf area, stem diameter, and plant height. Salinas et al. (2021) pointed out that temperatures below 20 ºC or above 35 ºC may induce a reduction in productivity and fruit quality, in addition to poor flower formation.
In a papaya selection program, Santa-Catarina et al. (2020), aiming to distinguish between phenotypic and variability characteristics of 222 individuals of a base population, needed five components to explain 76% of the total variability. In all analyses performed in the present study, components 1 and 2 explained more than 78% of the sampled variability; these results demonstrated a high correlation between the observed characteristics and the highest factor loadings in the PCA (Figures 1, 2 and 3).
The use of statistical techniques in several research areas is extremely important for guiding researchers' decision-making. In this study, abiotic stress conditions were simulated in a controlled environment to evaluate the behavior of two papaya genotypes, and the differences between them were analyzed using the principal components technique from data with low experimental error. This technique was shown to be a promising tool for the characterization of germplasm banks, helping to identify important and desirable seedling characteristics, discarding characters of little contribution, reducing the time and costs of breeding programs, and consequently contributing to the genetic variability and breeding of papaya.

Conclusions
1. The explanatory variables in the first group were plant height, stem diameter, and leaf length and width, while those in the second group were leaf fresh mass, stem fresh mass, root dry mass, leaf dry mass, and stem dry mass.
2. The Tainung Nº 1 cultivar of the Formosa group showed superiority compared to the Golden cultivar, suggesting that it has greater potential under hydric and thermal stress conditions and can be used in plant breeding programs to obtain resistant cultivars and hybrids and to expand the genetic base.

Table 3. First and second principal components (PC1 and PC2, respectively) of variables of groups 1 and 2 with cultivar Golden (Solo group) and hybrid Tainung Nº 1 (Formosa group)
Correlation values between the variables and the components, the contribution of each principal component, and the total variance for each principal component are presented in percentage. HE - Plant height; DIAM - Stem diameter; LL - Leaf length; LW - Leaf width; NL - Number of leaves per plant

Table 1. First and second principal components (PC1 and PC2, respectively) of group 1 of variables with two genetic materials: the cultivar Golden (Solo group) and the hybrid Tainung Nº 1 (Formosa group)
Table 2. First and second principal components (PC1 and PC2, respectively) of group 2 of variables with two genetic materials: the cultivar Golden (Solo group) and the hybrid Tainung Nº 1 (Formosa group)
Correlation values between the variables and the components, the contribution of each principal component, and the total variance for each principal component are presented in percentage. FM - Fresh mass; FM10D - Fresh mass of 10 discs; SFM - Stem fresh mass; RFM - Root fresh mass; DM10D - Dry mass of 10 discs; DML - Dry mass of leaves; DMR - Dry mass of root; DMS - Dry mass of stem

Figure 1. Biplot showing the projection of the variables in the first two principal components (PC) for group 1 of variables of the cultivar Golden (Solo group) (A) and of the hybrid Tainung Nº 1 (Formosa group) (B)
Figure 2. Biplot showing the projection of the variables in the first two principal components (PC) for group 2 of variables of the cultivar Golden (Solo group) (A) and of the hybrid Tainung Nº 1 (Formosa group) (B)

Figure 3. Biplot showing the projection of variables in the first two principal components (PC) of group 1 of variables by genotype (A) and group 2 of variables by genotype (B)