Selection of BC 1 F 3 populations of Santa Cruz type dwarf tomato plant by computational intelligence techniques

The aim of this study was to estimate genetic divergence and select BC 1 F 3 populations of dwarf tomato plant within the Santa Cruz segment by computational intelligence techniques. The experiment was conducted in a greenhouse in the Vegetable Crop Experimental Station of the Universidade Federal de Uberlândia (UFU), Monte Carmelo, MG, Brazil. A randomized block experimental design was used with 17 treatments and four replications. The genetic material evaluated comprised thirteen dwarf tomato plant populations obtained by a backcross and two self-fertilizations, plus both parents (recurrent and donor), and two commercial check varieties. The traits evaluated were mean fruit weight (MFW), soluble solids content (SSC), fruit diameter (FD), fruit length (FL), fruit shape (FS), pulp thickness (PT), number of locules (NL), distance between internodes, and acylsugar, β-carotene, and lycopene content. The data were analyzed by means testing, and genetic divergence was measured using Mahalanobis generalized distance by the unweighted pair group method with arithmetic mean (UPGMA) and through computational intelligence using Kohonen self-organizing maps (SOM). Genetic dissimilarity in relation to the donor parent could be confirmed through both methodologies. However, the SOM was able to detect differences and organize the similarities among the populations in a more consistent manner, resulting in a larger number of groups. In addition, the computational intelligence techniques allow the weight of each variable in formation of the groups to be ascertained. Among the BC 1 F 3 populations, UFU-SC#3 and UFU-SC#5 stood out for agronomic traits, and UFU-SC#10 and UFU-SC#11 stood out for quality parameters.


INTRODUCTION
The tomato plant (Solanum lycopersicum) has considerable socioeconomic relevance and is one of the vegetable crops most extensively grown throughout the world (Gerszberg and Hnatuszko-Konka 2017). In 2018, worldwide production was approximately 244 million tons, occupying around 5.8 million hectares (FAO 2019). In Brazil, tomatoes are classified into five groups: Minitomate, Salada, Caqui, Santa Cruz and Saladete (Alvarenga 2013). Outside Brazil, these tomato groups are known as cherry or grape (minitomatoes), round, beefsteak, chonto, and saladette or roma, respectively (Finzi et al. 2020). Among the different segments, hybrids belonging to the Santa Cruz group have fruit with the highest post-harvest durability, together with high yield potential and superior organoleptic traits compared to conventional long shelf-life tomatoes (Shirahige et al. 2010).
The high cost of production and high susceptibility to biotic and abiotic stresses make tomato growing a potentially high-risk operation (Finzi et al. 2020). Some proposals for increasing the yield and quality of the tomato fruit to ensure profitability are changes in plant spacing (Wamser et al. 2012), evaluation of different harvest times (Maciel et al. 2015a), and the use of stimulants (Seleguine et al. 2016). In addition, there are reports of the benefits of reduction in internode space through plant breeding, resulting in more compact plants and, consequently, a larger number of tomato bunches per linear meter of stem/vine (Finzi et al. 2017a). Obtaining more compact plants that provide higher yield (Finzi et al. 2020) should be one of the aims of tomato plant breeding programs. The success of breeding programs depends on the existence of genetic variability (Santos et al. 2019). For the purpose of measuring this genetic dissimilarity, various techniques have been used to handle the data during evaluations of dwarf tomato plants (Finzi et al. 2017a; Finzi et al. 2020; Maciel et al. 2015b). However, one of the main obstacles has been the difficulty of discovering which statistical method is most suitable for evaluating multiple traits in the germplasm of dwarf tomato plants. One alternative would be the use of computational intelligence techniques through artificial neural networks (ANN), which have proven to be promising in different areas of breeding (Santos et al. 2019).
Therefore, the aim of this study was to estimate genetic dissimilarity and select BC 1 F 3 populations of dwarf tomato within the Santa Cruz segment by computational intelligence techniques.

MATERIAL AND METHODS
The experiment was conducted from October 2019 to March 2020 at the Vegetable Crop Experimental Station, Monte Carmelo, MG, Brazil (18°42'43.19"S, 47°29'55.8"W, and altitude of 873 masl). The plants were grown in an arch type greenhouse (7 × 21 m), with a ceiling height of 4 m, covered with a 150-micron transparent polyethylene film with an ultraviolet radiation inhibitor additive and lateral curtains of a white anti-aphid screen.
The genetic material evaluated consisted of 13 dwarf tomato plant populations obtained from two self-fertilizations of a previous backcross (BC 1 F 3 ) after hybridization of a pre-commercial homozygote line with Santa Cruz type fruit (UFU-TOM-Mother-2) crossed with the dwarf line UFU MC TOM1 (Maciel et al. 2015b), both parents, and the Santa Cruz commercial check varieties Kada and Santa Clara, for a total of 17 treatments. The wild accession Solanum pennellii present in this study was used only for comparison of the variable related to resistance to arthropod pests. The BC 1 F 3 populations and the parents are from the tomato germplasm bank of UFU. The commercial check varieties and the recurrent parent are characterized by their indeterminate growth habit and red Santa Cruz type fruit. In contrast, UFU MC TOM1 is a homozygote line for dwarf plant size with an indeterminate growth habit and oblong fruit of the mini-tomato type (Finzi et al. 2017b;Maciel et al. 2015b), which was used as a donor parent. Since expression of the dwarf phenotype is of recessive and monogenic origin (Maciel et al. 2015b), backcrosses were performed for transfer of the recessive allele.
Seeds were sown in polystyrene trays (200 cell) on October 3, 2019. Seedlings were transplanted 36 days after sowing (DAS) in 5 L capacity plastic pots. Coconut fiber based commercial substrate was used both in the trays and in the pots. Throughout the time of the experiment, crop treatments were performed as recommended for the tomato crop grown in a protected environment (Alvarenga 2013). The recurrent parent and the commercial check varieties were oriented vertically with two stems in a cord training system.
A randomized block experimental design was used, with 17 treatments and four replications. The experimental plots consisted of six plants distributed in double rows at a spacing of 0.3 × 0.3 m. A spacing of 0.8 m was used between the double rows, for a total of 360 plants.
Tomatoes were harvested weekly from January 3 to March 6, 2020, for a total of ten harvests. The fruit from each experimental plot was harvested at the full maturity stage, and the following agronomic traits were evaluated: mean fruit weight (MFW) (g), the ratio between the weight of the tomato fruit and the number of tomatoes harvested from the plot; total soluble solids content (SSC) (°Brix), measured after harvest with a portable digital refractometer (Atago PAL-1 3810); fruit diameter (FD) (cm), obtained with the aid of a ruler after cutting the fruit vertically in the middle, measuring its horizontal length/diameter; fruit length (FL) (cm), obtained with the aid of a ruler after cutting the fruit vertically in the middle, measuring its vertical length; fruit shape (FS), obtained by the ratio between FD and FL (FD/FL), with the recurrent parent and the commercial check varieties used as references of the Santa Cruz segment to allow classification of the fruit; pulp thickness (PT) (cm), obtained with the aid of a ruler after cutting the fruit vertically in the middle, measuring the length between the fruit peel and the beginning of the locule; number of locules (NL) (locules·fruit⁻¹), counted after cutting the fruit horizontally in the middle; internode length (IL) (cm), obtained as plant height divided by number of nodes, in two plants at the center of the plot; and acylsugar content (AA) (nmol·cm⁻² of leaf area), obtained at 75 DAS, using a sample composed of eight leaf disks (equivalent to 4.2 cm²) from each plant of the plot. The disks were collected from leaflets from the upper third of the plants and placed in test tubes. Extraction and quantification followed the method adapted by Maciel and Silva (2014).
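The three derived traits described above (MFW, FS, and IL) are simple ratios. The following minimal sketch illustrates them; the function names and the numeric inputs are hypothetical and are not values from the experiment.

```python
def mean_fruit_weight(total_weight_g: float, n_fruits: int) -> float:
    """MFW (g): total fruit weight of the plot divided by the number of fruits."""
    return total_weight_g / n_fruits


def fruit_shape(fd_cm: float, fl_cm: float) -> float:
    """FS (dimensionless): fruit diameter divided by fruit length (FD/FL)."""
    return fd_cm / fl_cm


def internode_length(plant_height_cm: float, n_nodes: int) -> float:
    """IL (cm): plant height divided by the number of nodes."""
    return plant_height_cm / n_nodes


# Hypothetical plot values, for illustration only.
mfw = mean_fruit_weight(500.0, 10)   # 50.0 g
fs = fruit_shape(4.0, 5.0)           # 0.8
il = internode_length(90.0, 20)      # 4.5 cm
```

An FS value below 1 under this definition indicates a fruit that is longer than it is wide.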
Regarding nutritional traits, the β-carotene and lycopene contents were extracted and quantified according to the method adapted from Nagata and Yamashita (1992), Rodriguez-Amaya (2001), Rodriguez-Amaya and Kimura (2004). The fruit pulp was ground and then 1 g of it was placed in a glass vial containing 3 mL of 80% acetone. The samples were kept in the dark at a temperature of 8 °C for 48 h. Absorbances were then obtained by the spectrophotometry method using the wavelength of 450 nm for β-carotene and 470 nm for lycopene.
After presuppositions were validated by analyses of normality (Kolmogorov-Smirnov test), homogeneity (O'Neill and Mathews test), and additivity (Tukey's non-additivity test), data were transformed by √x for MFW and Log x + 1.0 for NL, and the true values of these variables were tabulated.
The data were analyzed by genetic dissimilarity obtained by the matrix of Mahalanobis generalized distance and represented by a dendrogram obtained by the unweighted pair group method with arithmetic mean (UPGMA) hierarchical method. Validation of grouping by the UPGMA method was determined by the cophenetic correlation coefficient (CCC).
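The analyses in this study were run in GENES; the sketch below only illustrates the same sequence of steps (Mahalanobis distance, UPGMA, cophenetic correlation) with SciPy. The genotype means and the residual covariance matrix are hypothetical placeholders, and the use of the residual covariance matrix and of the squared distance (D²) are assumptions based on common practice in genetic divergence studies.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, fcluster, linkage
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)

# Hypothetical data: 6 genotypes x 3 traits (NOT the values from this study).
means = rng.normal(size=(6, 3))

# Hypothetical residual covariance matrix among the 3 traits (assumption).
residual_cov = np.array([
    [1.0, 0.2, 0.0],
    [0.2, 1.0, 0.1],
    [0.0, 0.1, 1.0],
])
inv_cov = np.linalg.inv(residual_cov)

# pdist returns the Mahalanobis distance; squaring gives the generalized D^2.
d2 = pdist(means, metric="mahalanobis", VI=inv_cov) ** 2

# UPGMA corresponds to average linkage on the condensed distance matrix.
tree = linkage(d2, method="average")

# Cophenetic correlation coefficient (CCC) between the tree and the distances.
ccc, _ = cophenet(tree, d2)

# Illustrative cut into at most 4 groups (the study used a 10% cutoff instead).
groups = fcluster(tree, t=4, criterion="maxclust")
```

A CCC close to 1 indicates that the dendrogram preserves the original pairwise distances well.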
Computational intelligence was used to determine the similarity among the populations by means of a class of neural networks, using Kohonen self-organizing maps (SOM). Self-organizing maps learning is basically achieved in three stages. Initially, synaptic weights are attributed to the different neurons, and then a competition process occurs. The set of genetic values of each genotype is allocated to the neuron that best represents it (winning neuron). The comparison phase begins with this allocation, in which the winning neuron determines the approximation of the other neurons according to similarity. Finally, the neurons establish which will be the neighbor neurons and pass to the adaptation phase, characterized by weight adjustment for each variable.
Network training used 5000 epochs, with three neurons in each dimension, radius equal to one, allocated in nine organizational neurons (three rows and three columns), topology with a hexagonal neighborhood, feedforward network architecture with an input layer (means) and an output neuron, and Euclidean distance type activation function. All analyses were performed in the GENES software, integrated with the R and Matlab software (Cruz 2016).
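The competition and adaptation stages described above can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not the GENES implementation: it uses a rectangular grid instead of the hexagonal neighborhood, a linearly decaying learning rate (hypothetical parameter `lr0`), and random placeholder data in place of the genotype means.

```python
import numpy as np


def train_som(data, rows=3, cols=3, epochs=5000, radius=1.0, lr0=0.5, seed=0):
    """Train a small SOM; returns a (rows*cols, n_traits) weight matrix."""
    rng = np.random.default_rng(seed)
    n_samples, n_traits = data.shape
    # Initial synaptic weights for each neuron.
    weights = rng.normal(size=(rows * cols, n_traits))
    # Grid position of each neuron (rectangular grid: a simplification).
    coords = np.array(
        [(r, c) for r in range(rows) for c in range(cols)], dtype=float
    )
    for epoch in range(epochs):
        lr = lr0 * (1.0 - epoch / epochs)  # assumed linear decay
        for i in rng.permutation(n_samples):
            x = data[i]
            # Competition: winning neuron = smallest Euclidean distance.
            bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
            # Neighbors: neurons within `radius` of the winner on the grid.
            near = np.linalg.norm(coords - coords[bmu], axis=1) <= radius
            # Adaptation: move the winner and its neighbors toward the input.
            weights[near] += lr * (x - weights[near])
    return weights


def assign_classes(data, weights):
    """Index (0-based) of the winning neuron for each row of `data`."""
    dists = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)
    return np.argmin(dists, axis=1)


# Hypothetical standardized means: 17 genotypes x 4 traits (placeholder data).
data = np.random.default_rng(1).normal(size=(17, 4))
weights = train_som(data, epochs=200)  # fewer epochs to keep the demo fast
classes = assign_classes(data, weights)
```

Genotypes that share a winning neuron form one group, and neurons that win no genotype correspond to empty classes.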

RESULTS AND DISCUSSION
The dwarf tomato BC 1 F 3 populations differed from the parents and commercial check varieties for the agronomic traits related to tomato fruit (Table 1) by the F test (α = 0.05), except for the FS trait, and most of them were similar to the Kada and Santa Clara commercial check varieties of the Santa Cruz segment. The agronomic traits had a coefficient of genotypic determination (h²) greater than 70% and a CVg/CVe ratio greater than 0.90, showing their reliability during the selection process. The present study highlights the progress achieved from carrying out the first backcross and two generations of self-fertilization of superior BC 1 F 3 populations of dwarf tomato plants of the Santa Cruz group.
Mean fruit weight is of considerable importance in the tomato crop and is directly connected with fruit quality and yield (Souza et al. 2012). Separate evaluation of this trait showed a marked mean increase in the BC 1 F 3 populations in relation to the donor parent, especially in the UFU-SC#3 and UFU-SC#5 populations, with increases of 727.91 and 681.70%, respectively.
Fruit with greater PT is generally firmer and, consequently, more resistant to physical damage during transport and less subject to deformation (Siddiqui et al. 2015). In this study, PT in all the BC 1 F 3 populations was superior to that of the donor parent, with a mean increase of 136%. Another trait associated with fruit firmness is the NL; the smaller the NL, the firmer the fruit (Siddiqui et al. 2015). The populations UFU-SC#1, UFU-SC#3, UFU-SC#4, UFU-SC#7, UFU-SC#9, UFU-SC#11, and UFU-SC#12 had a smaller NL than the commercial check varieties and the recurrent parent. Fruit size in the present study was represented by FD and FL. A mean increase of 96.28% was found for FD and 29.10% for FL. Fruit size directly affects the tomato grower; tomatoes of reduced size are difficult to place on the market in a competitive manner (Nascimento et al. 2013).
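Increases such as these are expressed relative to the donor parent. A minimal sketch of the calculation follows, assuming the percentage is computed against the donor mean; the two weights used are hypothetical and were chosen only to reproduce the order of magnitude, not taken from Table 1.

```python
def percent_increase(value: float, reference: float) -> float:
    """Relative increase of `value` over `reference`, in percent."""
    return (value - reference) / reference * 100.0


# Hypothetical means (g): donor parent = 10.0, population = 82.791.
increase = percent_increase(82.791, 10.0)  # about 727.91 %
```

An increase above 700% therefore means the population mean is more than eight times the donor mean.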
Among the dwarf BC 1 F 3 populations evaluated, UFU-SC#3 and UFU-SC#5 showed prominent results in all the traits considered up to this point (Fig. 1). Finzi et al. (2020) also reported success in obtaining BC 1 F 2 populations of dwarf tomato in the Salada segment, corroborating the results of the present study.
Tomato breeding programs focusing on in natura consumption have adopted the strategy of introgression of alleles not only for fruit traits, but also for insect resistance. Various studies have confirmed that genotypes with high allelochemical contents in the leaves, such as acylsugars and 2-tridecanone, confer resistance to whitefly and other arthropod pests (Andrade et al. 2017;Oliveira et al. 2012;Neiva et al. 2019;Silva et al. 2018).
In the present study, the AA of the UFU-SC#1 (40.79 nmol·cm⁻²) and UFU-SC#2 (42.78 nmol·cm⁻²) populations did not differ from that of the wild species Solanum pennellii (41.49 nmol·cm⁻²), commonly used as a donor parent of pest resistance alleles. These populations had higher content than that found in either of the parents, the cultivar Santa Clara, the Kada variety, and the other BC 1 F 3 populations. These results show the potential of the dwarf BC 1 F 3 populations with high AA as sources of resistance to arthropod pests.
In relation to IL, all the dwarf BC 1 F 3 populations had values lower than those observed in the recurrent parent, cultivar Santa Clara, and Kada variety, which exhibited IL greater than or equal to 4.50 cm. Among the BC 1 F 3 populations, UFU-SC#4, UFU-SC#9, UFU-SC#10, and UFU-SC#12 stood out with shorter ILs. Tomato breeding aiming to obtain tomato plant varieties with reduced IL and, consequently, better plant architecture is a future market trend (Sun et al. 2019). Panthee and Gardner (2013) and Finzi et al. (2017a) reported obtaining tomato plants with compact architecture that resulted in higher yield. Regarding SSC, β-carotene content (CC), and lycopene content (LC), the dwarf tomato BC 1 F 3 populations differed from the donor parent, recurrent parent, Kada variety, and cultivar Santa Clara only for SSC and LC ( Table 2).
No marked differences among the genotypes were found regarding content of β-carotene, an important nutrient that acts in prevention of diverse diseases (Baldet et al. 2014). However, for content of lycopene, a carotenoid whose main source is the tomato fruit and that acts in protection against cardiovascular diseases and diverse types of cancer (Salvia-Trujillo et al. 2016), the dwarf populations UFU-SC#4 (2.58 mg·100 mg⁻¹), UFU-SC#8 (2.94 mg·100 mg⁻¹), UFU-SC#10 (2.87 mg·100 mg⁻¹), and UFU-SC#11 (2.58 mg·100 mg⁻¹) had higher values. Thus, the dwarf BC 1 F 3 populations UFU-SC#8, UFU-SC#10, and UFU-SC#11 showed promise regarding fruit quality parameters because they exhibited SSC and LC superior to those found in the commercial check varieties, and did not differ from them regarding β-carotene content. Two more backcrosses are suggested for the next steps, just as performed by Gonçalves Neto et al. (2010), resulting in dwarf lines with commercial standard fruit, and later, hybrids belonging to the Santa Cruz group coming from such lines, such as obtained by Finzi et al. (2017a). Thus, it is important to use selection strategies that allow superior dwarf populations to be obtained, as well as the use of different measures of dissimilarity (Araújo et al. 2016).
The dissimilarity estimated by the Mahalanobis generalized distance among the dwarf plants ranged from 5.32 (UFU-SC#11 and UFU-SC#13) to 655.07 (donor parent and UFU-SC#3), showing the genetic diversity among the dwarf populations (data not shown). Selection of lines with the greatest divergence possible is recommended for breeding programs (Cruz et al. 2012). The dendrogram obtained by the UPGMA method (Fig. 2) and the Kohonen SOM network model (Fig. 3) were used for visualization of this dissimilarity.
For the UPGMA grouping method, the cutoff in the dendrogram was performed considering 10% of genetic variability, which allowed division of the genotypes into four different groups, a criterion defined considering the abrupt change of level (Cruz et al. 2012).
Group I was composed of all the dwarf BC 1 F 3 populations, group II consisted of the donor parent, group III was composed of the recurrent parent and the Kada variety, and group IV consisted of only the cultivar Santa Clara. The success of the first backcross can be confirmed from the separation of the donor parent and the dwarf BC 1 F 3 populations, due to improvement in agronomic performance of the BC 1 F 3 populations. Maciel et al. (2018) compared different methods of multivariate analysis for evaluation of genetic dissimilarity in tomato and found the effectiveness of this method in separation of groups.
In addition to the statistical methods traditionally applied in separation of groups, such as the Tocher optimization method and the UPGMA, ANN techniques have recently been used for classification of genotypes. The Kohonen SOM figure prominently in evaluation of genetic dissimilarity. The nonlinear structure of this technique allows detection of more complex traits in the dataset (Oliveira et al. 2020). The map is organized in a topological structure that reflects the similarity among the genotypes under study (Oliveira et al. 2020). Thus, neurons near each other have similar populations, more divergent populations are represented by neurons in the more extreme regions, and the intermediate populations constitute the center of the map.
Using the SOM method, with the nine neurons arranged in three rows and three columns, the 17 populations were classified and eight classes were filled, forming eight groups (Fig. 3). Neuron classes one and three (row I column I and row I column III, respectively) were constituted by four populations; one population was allocated to each of classes two, five, seven, and nine (row I column II, row II column II, row III column I, and row III column III, respectively), standing out for their dissimilarity in relation to the others. Three populations were allocated to class four (row II column I); class six (row II column III) was composed of two populations; and classification of populations did not occur for class eight (row III column II). This indicates high divergence among the populations belonging to classes seven and nine, since neighbor neurons indicate proximity between materials.
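The class numbering used above runs row by row across the 3 × 3 map. The short sketch below (helper name `class_to_grid` is hypothetical) maps a class number to its row and column and tallies the occupancy reported in the text.

```python
ROWS, COLS = 3, 3


def class_to_grid(k: int) -> tuple[int, int]:
    """Map class number k (1..9, row-major) to 1-based (row, column)."""
    row, col = divmod(k - 1, COLS)
    return row + 1, col + 1


# Number of populations per neuron class, as reported in the text.
occupancy = {1: 4, 2: 1, 3: 4, 4: 3, 5: 1, 6: 2, 7: 1, 8: 0, 9: 1}

total_populations = sum(occupancy.values())               # 17
filled_classes = sum(1 for n in occupancy.values() if n)  # 8
empty_classes = [class_to_grid(k) for k, n in occupancy.items() if n == 0]
```

The single empty class falls at row III, column II, between classes seven and nine.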
The classification of the tomato populations in the classes and the constitution of eight filled groups through the Kohonen map can be visualized in Fig. 4. Group I was composed of the dwarf BC 1 F 3 populations UFU-SC#7, UFU-SC#9, UFU-SC#12, and UFU-SC#13; group II of the UFU-SC#5 population; group III of the populations UFU-SC#1, UFU-SC#2, UFU-SC#4, and UFU-SC#10; group IV of the populations UFU-SC#6, UFU-SC#8, and UFU-SC#11; group V of the UFU-SC#3 population; group VI of the cultivar Santa Clara and Kada variety; group VII of the donor parent; and group VIII of the recurrent parent. Thus, as observed by the UPGMA method, all the dwarf BC 1 F 3 populations differed from the donor parent; they were allocated in different groups. This reaffirms the fact that the first backcross led to increases in the BC 1 F 3 populations, recovering part of the genetic constitution of the recurrent parent. Nevertheless, using UPGMA, all the dwarf BC 1 F 3 populations were allocated in a single group, whereas for SOM, the BC 1 F 3 populations were distributed among groups I, II, III, IV, and V.
Among the BC 1 F 3 populations, UFU-SC#3 and UFU-SC#5 can be highlighted; by the SOM method, they were allocated to groups V and II, respectively. The separation of these populations from the others is explainable by their better performance for MFW and PT, with the UFU-SC#3 population also exhibiting greater FL (Table 1). Thus, the SOM method was more efficient for evaluation of genetic dissimilarity and selection of individuals with superior traits for obtaining dwarf tomato lines. In addition, the efficacy of computational intelligence is supported by its ability to show the extreme genetic dissimilarity between the donor parent (group VII) and the recurrent parent (group VIII). Not only are they found in different groups, but they also have an empty class between them, which indicates the large distinction between these populations (Fig. 3).
Another difference between the methods was in regard to the allocation of the recurrent parent, Kada variety, and the cultivar Santa Clara. By the UPGMA method, the Kada variety and the recurrent parent were allocated in a single group and the cultivar Santa Clara in an isolated group. By the Kohonen method, the Kada variety and the cultivar Santa Clara were allocated in a single group and the recurrent parent in an isolated group. The latter grouping is more consistent with what is observed by the Scott-Knott test (Table 1), in which the Kada variety and the cultivar Santa Clara were similar regarding MFW, FD, FS, PT, NL, IL, and AA.
These differences can be explained by the high simulation ability of the neural networks, which broaden the input data and estimate new values, with different synaptic weights for each neuron, organizing the groups by order of proximity (Oliveira et al. 2020). Santos et al. (2019), using SOM for evaluation of genetic divergence, concluded that the method was efficient for evaluation of genetic diversity in rice genotypes.
From the maps of the weights and association of each input variable (traits evaluated) with the neurons of the output layer (Fig. 5), the influence that each variable exercises on the network neurons can be observed. The lighter colors indicate greater weights and, therefore, greater importance. In contrast, dark colors represent lower importance of a given trait for the neuron. In addition, neurons with similar color patterns indicate similar responses; that is, by means of analysis of the color pattern, inferences can be made regarding correlation among traits.
Note. MFW: mean fruit weight (g); FL: fruit length (cm); FD: fruit diameter (cm); FS: fruit shape; PT: pulp thickness (cm); NL: number of locules (locules·fruit⁻¹); IL: internode length (cm); AA: acylsugar content (nmol·cm⁻² of leaf area); SSC: soluble solids content (°Brix); CC: β-carotene content (mg·100 mg⁻¹); LC: lycopene content (mg·100 mg⁻¹).
Analysis of color intensity showed that MFW, FL, and FD, just as NL and IL, exhibited correlation with each other, because they have the same pattern of similarity among the populations, represented by colors. According to Ribeiro et al. (2016), knowledge of the correlation among traits provides information regarding the degree of association among them, in which selection of one trait can change the response of another.
In general, MFW, FL, FD, PT, NL, and IL did not contribute to the formation of the neuron that clustered the donor parent (group VII), which suggests that they were the traits that least contributed to classification of this population. In addition, the classes of neurons that formed groups I, II, and III were characterized mainly by FD, PT, and AA, respectively. The MFW and FL characteristics were important for characterizing group V (UFU-SC#3). Representation of groups VI (Kada variety and cultivar Santa Clara) and VII (donor parent) in lighter colors for most of the traits confirms their importance for evaluation of dissimilarity among these populations that are characterized by highly contrasting phenotypes.
Both methods (UPGMA and SOM) confirmed the genetic dissimilarity in relation to the donor parent. However, SOM was able to detect differences and organize the similarities among the populations in a more consistent manner, resulting in a larger number of groups compared to the UPGMA method. Regarding agronomic traits among the BC 1 F 3 populations, UFU-SC#3 and UFU-SC#5 stood out, which were actually allocated to separate groups when using SOM. In relation to quality parameters, the UFU-SC#10 and UFU-SC#11 populations stood out. In addition, analysis of color intensity indicated that all the traits were important for quantifying dissimilarity among the populations.
The Kohonen self-organizing map, which uses computational intelligence, proved to be more suitable for classifying and clustering the dwarf tomato plant populations.
Computational intelligence is a promising approach that is easy to use and highly flexible, contributing efficiently to the study of genetic dissimilarity and selection strategies in plant breeding programs.

DATA AVAILABILITY STATEMENT
All datasets were generated or analyzed in the current study.