Genetic variability in genotypes of safflower via SSR molecular marker

ABSTRACT The safflower is an oleaginous plant belonging to the Asteraceae family. It is used as a raw material for various purposes. These plants are valued for the quality and quantity of the oil they produce, and thus, studying their genetic variability using markers is necessary for identifying genetic resources for breeding programs. Therefore, we evaluated the genetic variability of safflower genotypes using Simple Sequence Repeat (SSR) molecular markers. The study was conducted at the State University of Mato Grosso “Carlos Alberto Reyes Maldonado”, Campus of Cáceres-MT. In total, 121 safflower genotypes from the germplasm collection were evaluated using 21 SSR markers. The programs GenAlEx 6.5, GENES, and Structure were used to analyze the data. We identified 158 alleles at 21 loci among the genotypes. The expected heterozygosity (He) was high (0.551 - 0.804), but the observed heterozygosity (Ho) was low (0.000 - 0.502), and the inbreeding coefficient (F) was positive at all loci and in all populations, with an overall average of 0.958. The genetic differentiation (FST) among populations was low, with an average of 0.010, suggesting weak population structure. The modified Tocher clustering and the UPGMA hierarchical clustering yielded 19 and 15 distinct groups, respectively. The analysis of genetic structure revealed two genetic groups with little admixture between them. The evaluated safflower genotypes showed genetic variability, and these genetically different variants might be used in breeding programs to obtain cultivars adapted to Brazil.


INTRODUCTION
Carthamus tinctorius L. has been cultivated and used for more than 4,000 years (Moura et al., 2015). It is an oilseed crop of the Asteraceae family grown mainly for its oil, which is used for human consumption (Queiroga; Girão; Albuquerque, 2021) and to make lubricants, biofuels, soaps, varnishes, and animal feed (Golkar, 2014; Kumar et al., 2016).
Safflower is an important oilseed crop worldwide (Kim et al., 2016; Sharifi; Namvar, 2017). It is cultivated in more than 60 countries; in 2017, global production was around 734,000 tons from approximately 725,000 hectares, and Turkey, Mexico, and China were the largest producers, with yields of 1,826, 1,565, and 1,429 kg ha⁻¹, respectively (Food and Agriculture Organization of the United Nations, FAO, 2019).
In Brazil, the crop has attracted the attention of researchers and industry because of the quantity and quality of the oil it produces (Silveira et al., 2017; FAO, 2019). However, studies on the crop, particularly on genetic improvement, are limited. Further research can help select genotypes adapted to specific regions, which can increase crop yield (Singh; Nimbkar, 2016).
Evaluating genetic variability using markers is necessary for exploiting genetic resources in plant breeding programs (Saadaoui et al., 2017). Determining the genetic variability of a breeding collection with SSR (Simple Sequence Repeat) molecular markers helps identify genotypes with desirable characteristics for developing new cultivars (Kiran et al., 2017). Golkar and Mokhtari (2018) used SSRs to evaluate the genetic variability and structure of safflower genotypes. Ambreen et al. (2018) performed association mapping for important agronomic traits in the core collection of safflower (Carthamus tinctorius L.) using microsatellite markers and found marker-trait associations, which can facilitate marker-assisted breeding and the identification of genetic determinants of trait variability. Hassani et al. (2020a) characterized the morphology, genetic diversity, and population structure of a safflower (Carthamus tinctorius L.) mini-core collection using SRAP and SSR markers and found high genetic diversity in the germplasm through agromorphological and molecular analyses. The same group (Hassani et al., 2020b) conducted a deep analysis of the genomic diversity, population structure, and linkage disequilibrium of safflower (Carthamus tinctorius L.) across Africa and Europe. Using NGS data generated by DArTseq technology, they obtained results consistent with their hypothesis that safflower domestication started somewhere west of the Fertile Crescent and then expanded across Africa and Europe.
The use of SSR markers in safflower studies can provide information for the genetic improvement of the crop. These markers can be used to determine genetic variability and population structure, both of which are important for using the genetic diversity of safflower populations effectively.
Therefore, in this study, we used SSR molecular markers to estimate the genetic variability of 121 safflower genotypes from the germplasm collection of the Laboratory of Genetic Resources & Biotechnology (LRG&B) of the State University of Mato Grosso "Carlos Alberto Reyes Maldonado" (UNEMAT), Campus of Cáceres, Mato Grosso, Brazil.

MATERIAL AND METHODS
The study was conducted under controlled temperature and humidity conditions at the Laboratory of Genetic Resources & Biotechnology (LRG&B) and in its greenhouse, both associated with the Department of Agronomy of the State University of Mato Grosso "Carlos Alberto Reyes Maldonado" (UNEMAT), University City, Campus of Cáceres, Mato Grosso, located at latitude 16°07'66" and longitude 57°65'29".
For DNA sampling, the 121 safflower genotypes were sown in the LRG&B greenhouse in 500 mL plastic cups containing commercial substrate, with two seeds per cup and three replicates per genotype. The cups were irrigated twice a day until leaf tissue was collected, between eight and ten days after sowing, when the second pair of true leaves emerged.
We evaluated 121 genotypes from 10 populations, which included varieties from Bangladesh, Canada, Kazakhstan, China, Ethiopia, the USA, India, Iran, Pakistan, and Turkey.These populations were grouped into six regions: South Asia (India and Pakistan), Middle East (Iran and Turkey), North America (Canada and USA), East Asia (China and Bangladesh), Central Asia (Kazakhstan), and East Africa (Ethiopia) (Table 1).
The leaves were plucked with tweezers, taking care to prevent contamination, and the samples were stored in zip-lock bags in an ultra-freezer at -80 °C until DNA extraction.
The leaf tissue was macerated in a TissueLyser for 10 min, and DNA was extracted with the Wizard® Genomic DNA Purification Kit (Promega, USA) following the manufacturer's instructions. The DNA was amplified with primers for 21 SSR loci (Table 2), described by Mokhtari et al. (2018) and Kiran et al. (2017), which were used to represent the genetic variability of the safflower genotypes.
Following the protocol of Williams et al. (1990), PCR was performed in a Perkin Elmer model 9600 thermocycler with the following program: initial denaturation at 94 °C for 5 min; 35 cycles of denaturation at 94 °C for 30 s, annealing for 30 s at the primer-specific temperature (Table 2), and extension at 72 °C for 30 s; and a final extension at 72 °C for 5 min. The PCR products were stored at 4 °C until further analysis. The amplicons were stained with GelRed, mixed with 6X BlueJuice loading buffer, and separated on a 3% agarose gel in Tris-borate-EDTA (1%) buffer at 60 V for 4 h. The gel was then photographed with the LPix Image photo-documentation system, version 2.7 (Locus Biotecnologia).
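The thermal profile above can be written down as a small, checkable configuration. This is a minimal sketch: the `Step` type and the function names are hypothetical, the annealing temperature must be taken from Table 2 for each primer (55 °C in the usage line is only a placeholder), and thermocycler ramp times are ignored, so the total is a lower bound on the run length.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float  # target block temperature in °C
    seconds: int   # hold time at that temperature


def ssr_pcr_program(annealing_c: float, cycles: int = 35) -> List[Step]:
    """PCR profile as described in the text; annealing_c is primer-specific."""
    cycle = [
        Step("denaturation", 94.0, 30),
        Step("annealing", annealing_c, 30),
        Step("extension", 72.0, 30),
    ]
    return (
        [Step("initial denaturation", 94.0, 5 * 60)]
        + cycle * cycles
        + [Step("final extension", 72.0, 5 * 60)]
    )


def hold_minutes(program: List[Step]) -> float:
    """Sum of hold times in minutes (ramp times are not included)."""
    return sum(step.seconds for step in program) / 60


if __name__ == "__main__":
    program = ssr_pcr_program(annealing_c=55.0)  # 55 °C is a placeholder value
    print(len(program), hold_minutes(program))
```

With these settings, the hold times alone add up to 62.5 min per run (5 min + 35 × 90 s + 5 min).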
The genetic diversity of 110 safflower genotypes was evaluated. The remaining 11 genotypes were excluded because there were fewer than two samples per region, below the minimum required for analysis in GenAlEx 6.5 (Excoffier; Laval; Schneider, 2005). The allele frequencies, number of alleles, average observed heterozygosity (Ho), average expected heterozygosity (He), and inbreeding coefficient (F) were estimated. Population genetic structure (FST; Wright, 1949) was estimated with the same program.
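For a single codominant SSR locus, the number of alleles, Ho, and He follow directly from the scored genotypes. The function below is a minimal sketch, assuming genotypes are stored as pairs of allele sizes with missing data already removed; He is computed as 1 − Σp² (the uncorrected form), and the function name is hypothetical rather than part of GenAlEx.

```python
from collections import Counter


def locus_diversity(genotypes):
    """Number of alleles (Na), observed (Ho) and expected (He) heterozygosity at one locus.

    genotypes: list of (allele1, allele2) tuples, one per diploid individual.
    """
    n = len(genotypes)
    counts = Counter(allele for pair in genotypes for allele in pair)
    total = 2 * n  # allele copies in a diploid sample
    freqs = [c / total for c in counts.values()]
    na = len(counts)
    ho = sum(1 for a, b in genotypes if a != b) / n  # share of heterozygous individuals
    he = 1 - sum(p * p for p in freqs)               # 1 - sum of squared allele frequencies
    return {"Na": na, "Ho": ho, "He": he}


if __name__ == "__main__":
    # Hypothetical fragment sizes (bp) for four individuals at one locus
    sample = [(150, 150), (150, 152), (152, 152), (154, 154)]
    print(locus_diversity(sample))  # Na = 3, Ho = 0.25, He = 0.65625
```

In this toy sample, a single heterozygote gives Ho = 0.25, while three alleles at intermediate frequencies give He ≈ 0.66, the same Ho < He pattern described for the safflower loci.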
Analysis of molecular variance (AMOVA) was performed to partition genetic diversity among and within populations and among individuals, following Excoffier, Smouse, and Quattro (1992), and significance was tested with 1,000 permutations at a 95% confidence interval.
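The permutation logic of AMOVA can be illustrated with a single hierarchical level. The sketch below is a simplified version of the Excoffier, Smouse, and Quattro (1992) approach, assuming a matrix of squared genetic distances between individuals and only two levels (among and within populations); the regional and individual levels used in GenAlEx are not reproduced, and the function names are hypothetical. The P-value is the proportion of random relabellings of individuals that give a PhiST at least as large as the observed one.

```python
import random


def _ssd(d2, idx):
    """Sum of squared deviations: sum of d^2 over all ordered pairs in idx, divided by 2n."""
    idx = list(idx)
    return sum(d2[i][j] for i in idx for j in idx) / (2 * len(idx))


def phi_st(d2, labels):
    """Share of molecular variance among populations for a one-level AMOVA."""
    n = len(labels)
    groups = {}
    for i, g in enumerate(labels):
        groups.setdefault(g, []).append(i)
    p = len(groups)  # requires at least two populations and n > p
    ssd_total = _ssd(d2, range(n))
    ssd_within = sum(_ssd(d2, idx) for idx in groups.values())
    ms_among = (ssd_total - ssd_within) / (p - 1)
    ms_within = ssd_within / (n - p)
    # n0: average population size, corrected for unequal sample sizes
    n0 = (n - sum(len(idx) ** 2 for idx in groups.values()) / n) / (p - 1)
    var_among = (ms_among - ms_within) / n0
    total = var_among + ms_within
    return var_among / total if total > 0 else 0.0


def amova_permutation_test(d2, labels, n_perm=1000, seed=1):
    """Observed PhiST and a permutation P-value (individuals shuffled among populations)."""
    rng = random.Random(seed)
    observed = phi_st(d2, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if phi_st(d2, shuffled) >= observed - 1e-12:  # tolerance for float round-off
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical one-dimensional "genotypes": two well-separated populations
    values = [0.0] * 10 + [10.0] * 10
    labels = ["A"] * 10 + ["B"] * 10
    d2 = [[(x - y) ** 2 for y in values] for x in values]
    print(amova_permutation_test(d2, labels))
```

Because the two hypothetical populations do not overlap, PhiST equals 1 and almost no random relabelling reaches it, so the P-value is very small; weak structure, as in the safflower data, gives PhiST close to zero and large P-values.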
The dissimilarity matrix based on the Jaccard index was used for clustering by Tocher's optimization method and by the UPGMA hierarchical method in the GENES software (Cruz, 2013). Bayesian cluster analysis was performed with the Structure software (Pritchard et al., 2000) to determine the number of groups (K).
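The Jaccard dissimilarity compares allele (band) presence between two genotypes while ignoring shared absences: d = 1 − a/(a + b + c), where a counts alleles present in both and b + c those present in only one. A minimal sketch follows, assuming the SSR data have been recoded as one 0/1 column per allele; GENES performs this calculation internally, and the function name here is hypothetical.

```python
from itertools import combinations


def jaccard_dissimilarity(bands):
    """Pairwise Jaccard dissimilarity, 1 - a / (a + b + c), for 0/1 allele scores.

    bands: one list per genotype with one 0/1 entry per scored allele (band).
    a = alleles present in both genotypes; b + c = alleles present in only one.
    Shared absences are ignored by the Jaccard index.
    """
    n = len(bands)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        a = sum(1 for x, y in zip(bands[i], bands[j]) if x and y)
        b_plus_c = sum(1 for x, y in zip(bands[i], bands[j]) if x != y)
        union = a + b_plus_c
        # assumption: two genotypes with no scored bands are treated as identical
        d[i][j] = d[j][i] = (1 - a / union) if union else 0.0
    return d


if __name__ == "__main__":
    # Hypothetical presence/absence scores for three genotypes at four alleles
    scores = [[1, 1, 0, 0], [1, 0, 1, 0], [0, 0, 0, 1]]
    for row in jaccard_dissimilarity(scores):
        print([round(v, 3) for v in row])
```

Ignoring shared absences means that two genotypes are not made to look similar merely because both lack the same rare alleles.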

RESULTS AND DISCUSSION
All 21 SSR markers used in this study were polymorphic (100% polymorphism). The number of alleles, observed (Ho) and expected (He) heterozygosity, and inbreeding coefficient (F) (Table 3) were used to characterize the genetic diversity of the 110 Carthamus tinctorius L. genotypes.
In total, 158 alleles were detected among the genotypes at the 21 loci. The number of alleles per locus ranged from six (CT6, CT12, CT13, and CT19) to 11 (CT26), with an average of approximately 7.5 (158/21) alleles per locus (Table 3). Kiran et al. (2017) evaluated the genetic divergence of 148 safflower genotypes using 48 SSR markers and found 2 to 15 alleles per locus, a wider range than in this study; however, their average of four alleles per locus was lower than ours. Mokhtari et al. (2018) studied the genetic divergence of 103 safflower genotypes using 32 SSR markers and found fewer alleles than in our study, ranging from two to four, with an average of three alleles per locus.
The number of alleles is an important parameter for determining the genetic diversity of populations before they are used in breeding programs. As the number of alleles in a population increases, so does its diversity, which increases the chance of identifying favorable genotypic combinations. However, this parameter is strongly influenced by the number of genotypes evaluated and increases with sample size (Petit; Mousadik; Pons, 1998). This may explain our results: the number of alleles found in this study was lower than that reported by Kiran et al. (2017) and higher than that reported by Mokhtari et al. (2018), probably because the number of genotypes evaluated differed among the studies.
The expected heterozygosity (He) in our study was high (0.551 to 0.804), with an average of 0.718. This average was higher than the values reported by Lee et al. (2014) and Bahmankar, Nabati, and Dehdari (2017), which were 0.386 and 0.537, respectively, indicating that genetic diversity was lower in those studies.
The observed heterozygosity (Ho) was low (0.000 to 0.502), with an average of 0.035. Ambreen et al. (2018), who evaluated 124 safflower genotypes with 93 SSR primers, also found a low Ho (0.112). The low observed heterozygosity is probably due to the predominance of sexual reproduction through self-pollination, which increases homozygosity.
The inbreeding coefficient (F) was positive at all loci and in all populations, with an average of 0.958, as expected for autogamous plants.
The high positive F values reflect the excess of He over Ho at each locus and in each population, indicating that inbreeding was prevalent. F measures the deficiency or excess of heterozygous genotypes in a population; it is interpreted as the probability that two alleles are identical by descent, and the coefficient can range from -1 to 1. Negative values indicate more heterozygotes than expected, positive values indicate more homozygotes than expected, and values near zero indicate random mating.
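In GenAlEx, F is obtained for each locus and population from the two heterozygosities. Substituting the overall averages reported above gives a value of the same magnitude as the reported mean of 0.958; the two need not match exactly, because the average of per-locus F values is not the same as F computed from the average He and Ho.

$$
F = \frac{H_e - H_o}{H_e} = 1 - \frac{H_o}{H_e},
\qquad
1 - \frac{0.035}{0.718} \approx 0.951
$$

F = 1 when no heterozygotes are observed (Ho = 0) and becomes negative when Ho exceeds He.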
The AMOVA (analysis of molecular variance) showed that 91% of the total genetic variation occurred within populations, 5% between populations, and 4% between individuals; the variation between regions was 0% (Table 4).
Overall, the average genetic differentiation between populations (FST) was low (0.010), suggesting weak population structure. According to Weir (1996), a low FST indicates similar allele frequencies across populations, whereas a high FST indicates different allele frequencies among populations. Kiran et al. (2017) found similar results for 148 safflower genotypes: AMOVA attributed 85% of the genetic variation to individuals within populations and 15% to differences between populations, also indicating weak population structure.
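Weir's interpretation can be made concrete with the classical definition FST = (HT − HS)/HT, where HS is the mean expected heterozygosity within populations and HT is the expected heterozygosity computed from allele frequencies averaged over populations. The sketch below uses that textbook form with populations weighted equally; it is an assumption that this matches the estimator in GenAlEx, which may apply sample-size corrections, and the function name is hypothetical.

```python
def fst(pop_freqs):
    """F_ST = (H_T - H_S) / H_T at one locus, with populations weighted equally.

    pop_freqs: one dict per population mapping allele -> frequency (summing to 1).
    """
    k = len(pop_freqs)
    alleles = set().union(*pop_freqs)
    # H_S: mean expected heterozygosity within populations
    h_s = sum(1 - sum(p * p for p in f.values()) for f in pop_freqs) / k
    # H_T: expected heterozygosity from allele frequencies averaged over populations
    mean_p = [sum(f.get(a, 0.0) for f in pop_freqs) / k for a in alleles]
    h_t = 1 - sum(p * p for p in mean_p)
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


if __name__ == "__main__":
    similar = [{"A": 0.9, "B": 0.1}, {"A": 0.7, "B": 0.3}]  # hypothetical frequencies
    print(round(fst(similar), 4))  # 0.0625
```

Identical allele frequencies give FST = 0 and populations fixed for different alleles give FST = 1, so a value of 0.010 places the safflower populations very close to the first case.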

The modified Tocher method, based on the Jaccard dissimilarity index, clustered the genotypes into 19 groups using the 21 SSR markers (Table 5). Group I included the largest number of genotypes (14; 12.74% of the total evaluated), from the populations of India and Pakistan (South Asia), Iran and Turkey (Middle East), China (East Asia), and Kazakhstan (Central Asia).
Groups II and III consisted of 11 genotypes each (10% of the total evaluated). These genotypes came from the populations of Pakistan and India (South Asia), Iran (Middle East), Ethiopia (East Africa), Bangladesh and China (East Asia), and the USA (North America).
Groups VI, VII, XIII, XIV, and XVI together included 33 genotypes (29.99%), from the populations of India (South Asia), the USA and Canada (North America), China (East Asia), Turkey and Iran (Middle East), and Ethiopia (East Africa).
Groups IV, V, VIII, and IX consisted of six genotypes each (5.45%), from the populations of China (East Asia), the USA (North America), India (South Asia), Turkey and Iran (Middle East), and Bangladesh (East Asia). Four groups (X, XII, XV, and XIII) had three genotypes each (2.73%), from the populations of Kazakhstan (Central Asia), China (East Asia), the USA (North America), India and Pakistan (South Asia), Turkey (Middle East), and Ethiopia (East Africa).
Most groups formed by the modified Tocher method included individuals from different populations and regions of the world, indicating little genetic differentiation associated with geographic origin. Reis et al. (2015), Araújo et al. (2019), and Hassani et al. (2020b) also found a weak association between genetic diversity and collection site when evaluating characteristics of interest. Thus, geographic origin is a poor indicator of genetic diversity and does not necessarily reflect greater genetic distance, as observed in this study.
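The logic of Tocher's optimization can be sketched with the original rule as described by Cruz: the clustering threshold θ is the largest of the nearest-neighbor distances; each group starts from the closest pair of unassigned genotypes and repeatedly absorbs the genotype with the smallest summed distance to the group, as long as that sum divided by the current group size does not exceed θ. The modified Tocher method used in this study updates the criterion as groups are formed and is not reproduced here; the function name is hypothetical, and leaving the remaining genotypes as singletons when the closest remaining pair already exceeds θ is an assumption of this sketch.

```python
from itertools import combinations


def tocher(d):
    """Original Tocher optimization on a symmetric distance matrix d (list of lists).

    Returns a list of groups, each a list of genotype indices.
    """
    n = len(d)
    # theta: the largest of the nearest-neighbor distances
    theta = max(min(d[i][j] for j in range(n) if j != i) for i in range(n))
    unassigned = list(range(n))
    groups = []
    while len(unassigned) > 1:
        # each group starts from the closest pair of still-unassigned genotypes
        i, j = min(combinations(unassigned, 2), key=lambda pair: d[pair[0]][pair[1]])
        if d[i][j] > theta:
            break  # assumption: remaining genotypes become singleton groups
        group = [i, j]
        unassigned.remove(i)
        unassigned.remove(j)
        while unassigned:
            # candidate with the smallest summed distance to the current group
            k = min(unassigned, key=lambda c: sum(d[c][m] for m in group))
            if sum(d[k][m] for m in group) / len(group) > theta:
                break  # adding k would exceed the clustering criterion
            group.append(k)
            unassigned.remove(k)
        groups.append(group)
    groups.extend([k] for k in unassigned)
    return groups


if __name__ == "__main__":
    # Hypothetical genotypes placed on a line; the distance is the absolute difference
    positions = [0, 1, 2, 10, 11, 30]
    d = [[abs(a - b) for b in positions] for a in positions]
    print(tocher(d))  # [[0, 1, 2, 3, 4], [5]]
```

In the example, the isolated genotype at position 30 sets θ = 19, so the other five genotypes form one group and the outlier remains alone, the same kind of single-genotype group discussed below for the safflower data.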
Table 5: Clusters generated by the modified Tocher optimization method based on the dissimilarity among the 110 genotypes of Carthamus tinctorius L.

Group    Genotypes                                                       Percentage of genotypes
I        20, 21, 22, 23, 24, 25, 26, 27, 42, 43, 45, 58, 61, and 82      12.74
II       11, 12, 13, 14, 15, 16, 17, 40, 41, 52

Groups XI and XVII consisted of only two genotypes each (1.82%), from the populations of India/Pakistan (South Asia) and China (East Asia) (Group XI) and Iran (Middle East) (Group XVII). Group XIX consisted of a single genotype (0.91%), genotype 56 from the population of Iran (Middle East), which was therefore the most divergent of all evaluated genotypes and should be considered for plant breeding programs. Cordeiro et al. (2020) likewise reported groups containing only one genotype with the modified Tocher method.
Based on the dendrogram obtained by the UPGMA hierarchical method, with a significant cut at 90%, the genotypes were divided into 15 groups (Figure 1). The largest groups were Group I (23 genotypes), Group IV (11 genotypes), and Group IX (10 genotypes), which together contained genotypes from India and Pakistan (South Asia), Iran and Turkey (Middle East), Kazakhstan (Central Asia), China (East Asia), the USA (North America), and Ethiopia (East Africa). Nine groups had four to nine genotypes each: Group VIII (nine genotypes), Group V (eight), Groups VI, X, and XIV (seven each), Group III (six), Group XI (five), and Groups II and XIII (four each). These genotypes belonged to populations from Turkey and Iran (Middle East), India and Pakistan (South Asia), the USA and Canada (North America), China and Bangladesh (East Asia), and Ethiopia (East Africa).
Groups VII and XII consisted of two genotypes each, and Group XV of a single genotype, which was therefore the most divergent relative to the other groups. Genotypes that form single-member groups are the most divergent and can be used in breeding programs (Rotili et al., 2012). We found no geographic structure in the clustering, since genotypes from the same population or region were allocated to different groups.
The consistency of the dendrogram was evaluated with the cophenetic correlation coefficient (CCC) proposed by Sokal and Rohlf (1962), which measures the correlation between the distances recovered from the dendrogram and those in the original distance matrix. The t-test for the CCC of the UPGMA dendrogram was significant (P ≤ 0.01), and the coefficient (r ≥ 0.62) indicated moderate agreement between the grouping pattern and the original dissimilarities among genotypes. Lira et al. (2021) reported similar results for 124 safflower genotypes, with a CCC of 0.70. Correa et al. (2020) evaluated the phenotypic dissimilarity of nine genotypes of sunflower, which also belongs to the Asteraceae family, and found a CCC of 0.65 with a significant t-test (P < 0.01).
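Both steps, building the UPGMA dendrogram and checking it with the cophenetic correlation, can be sketched in a few lines. In this illustrative version (function names hypothetical), the cophenetic distance between two genotypes is the average-linkage distance at which they first join the same cluster, and the CCC is the Pearson correlation between those values and the original dissimilarities over all pairs. The t-test that GENES reports for the CCC is not included.

```python
import math
from itertools import combinations


def _key(x, y):
    return (x, y) if x < y else (y, x)


def upgma_cophenetic(d):
    """UPGMA (average linkage) on a symmetric matrix d; returns cophenetic distances."""
    n = len(d)
    members = {i: [i] for i in range(n)}
    dist = {(i, j): d[i][j] for i, j in combinations(range(n), 2)}
    coph = [[0.0] * n for _ in range(n)]
    new_id = n
    while len(members) > 1:
        (a, b), height = min(dist.items(), key=lambda kv: kv[1])
        for x in members[a]:
            for y in members[b]:
                coph[x][y] = coph[y][x] = height
        na, nb = len(members[a]), len(members[b])
        for c in members:
            if c not in (a, b):
                # size-weighted average of the distances from the two merged clusters
                dist[_key(new_id, c)] = (na * dist[_key(a, c)] + nb * dist[_key(b, c)]) / (na + nb)
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        members[new_id] = members.pop(a) + members.pop(b)
        new_id += 1
    return coph


def cophenetic_correlation(d):
    """Pearson correlation between original and cophenetic distances (all pairs i < j)."""
    coph = upgma_cophenetic(d)
    pairs = list(combinations(range(len(d)), 2))
    x = [d[i][j] for i, j in pairs]
    y = [coph[i][j] for i, j in pairs]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    d = [[0, 2, 6, 6], [2, 0, 6, 6], [6, 6, 0, 4], [6, 6, 4, 0]]
    print(cophenetic_correlation(d))  # ultrametric input, so the CCC is 1.0
```

An ultrametric input matrix, such as the four-genotype example, is reproduced exactly by UPGMA and gives CCC = 1; real marker data are not ultrametric, so lower values such as the r ≥ 0.62 reported here are expected.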
Both grouping methods (modified Tocher and UPGMA) produced partly similar groupings. Groups X and XVII formed by the modified Tocher method were similar to Groups XII and VII formed by the UPGMA method, and the genotypes in Tocher Groups I and XIX and part of Tocher Group III were all allocated to UPGMA Group I. The remaining groups did not show similar genotype compositions. The Tocher method formed more groups than the hierarchical UPGMA method. Oliveira et al. (2019) found similar results with these methods: both UPGMA and Tocher efficiently categorized the genotypes, although they formed different numbers of groups.
Using the Bayesian clustering implemented in the Structure software and the Delta K method proposed by Evanno, Regnaut, and Goudet (2005), we identified the structure of the evaluated genotypes. Delta K showed a single peak at K = 2, which was therefore taken as the most likely number of groups.
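The Delta K statistic of Evanno, Regnaut, and Goudet (2005) is ΔK = mean|L(K+1) − 2L(K) + L(K−1)| / sd[L(K)], where L(K) is the log probability of the data, ln P(D), from replicate Structure runs at each K. A minimal sketch follows; pairing replicate runs by their order when taking second differences is an assumption of this sketch, and the input values are hypothetical rather than from this study.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp):
    """Delta K for every K that has both K - 1 and K + 1 results.

    lnp: dict mapping K -> list of ln P(D) values from replicate runs
    (the same number of runs, at least two, for every K).
    """
    delta = {}
    for k in sorted(lnp):
        if k - 1 not in lnp or k + 1 not in lnp:
            continue
        # |L''(K)| = |L(K+1) - 2 L(K) + L(K-1)|, taken run by run
        second = [abs(up - 2 * mid + down)
                  for up, mid, down in zip(lnp[k + 1], lnp[k], lnp[k - 1])]
        sd = stdev(lnp[k])
        delta[k] = mean(second) / sd if sd > 0 else float("inf")
    return delta


if __name__ == "__main__":
    runs = {  # hypothetical ln P(D) values, three runs per K
        1: [-1000.0, -1001.0, -999.0],
        2: [-800.0, -802.0, -801.0],
        3: [-790.0, -795.0, -792.0],
        4: [-785.0, -790.0, -788.0],
    }
    delta = evanno_delta_k(runs)
    print(delta, max(delta, key=delta.get))  # the peak is at K = 2
```

Delta K is undefined for the smallest and largest K tested, which is why Structure runs are usually carried out over a range wider than the expected number of groups.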
The population structure analysis (Figure 2) placed the safflower genotypes into two groups, consistent with the Delta K variation graph.
Group I consisted of 58 genotypes belonging to populations from Ethiopia (East Africa), Turkey and Iran (Middle East), India and Pakistan (South Asia), Kazakhstan (Central Asia), China (East Asia), and the USA (North America).Group II consisted of 63 genotypes belonging to populations from Turkey and Iran (Middle East), Bangladesh (East Asia), India (South Asia), China (East Asia), and the USA and Canada (North America).Similar results were reported by Mokhtari et al. (2018), who evaluated the genetic diversity and genetic structure of the population of Carthamus tinctorius L. using SSR markers.In that study, the genotypes could be divided into two groups.
Introgression was inferred for genotypes whose bars in the bar plot contain more than one color (Figure 2). The average proportion of introgressed fragments was approximately 13%, and these genotypes showed membership in both groups. Both groups showed introgression, with the greatest intensity in genotype 78 of Group I. Overall, introgression in the safflower genotypes was low: only 11 genotypes showed introgression from individuals of other populations.
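Reading the bar plot amounts to inspecting each genotype's membership coefficients (q) across the K groups. The sketch below assigns each genotype to the group with its largest q and flags it as admixed when that q falls below a threshold. The 0.8 cut-off and the function name are hypothetical, since the criterion used in this study is not stated, and the mean "foreign" ancestry (1 − max q averaged over genotypes) is only one possible way to express an average introgressed proportion.

```python
def summarize_admixture(q_matrix, threshold=0.8):
    """Assign genotypes to Structure groups and flag admixed ones.

    q_matrix: one list of membership coefficients (q) per genotype, summing to 1.
    threshold: hypothetical cut-off; a genotype whose largest q is below it is flagged.
    Returns (group index per genotype, indices of admixed genotypes,
             mean share of ancestry assigned to groups other than the main one).
    """
    assignments, admixed = [], []
    for idx, q in enumerate(q_matrix):
        best = max(range(len(q)), key=q.__getitem__)  # group with the largest q
        assignments.append(best)
        if q[best] < threshold:
            admixed.append(idx)
    mean_foreign = sum(1 - max(q) for q in q_matrix) / len(q_matrix)
    return assignments, admixed, mean_foreign


if __name__ == "__main__":
    q = [[0.95, 0.05], [0.10, 0.90], [0.60, 0.40]]  # hypothetical K = 2 output
    print(summarize_admixture(q))
```

Changing the threshold changes how many genotypes are counted as introgressed, so the count of 11 genotypes should be read together with the criterion used to define admixture.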
A low rate of gene introgression was reported by Asfaw, Blair, and Almekinders (2009), who conducted genetic studies on landrace beans from Ethiopia and Kenya.Delfini et al. (2021) found an introgression of approximately 55% by analyzing the genetic diversity, population structure, and linkage disequilibrium (LD) in common bean accessions.Fisseha et al. (2016) studied accessions of beans from Ethiopia and reported high introgression (58%).
Principal coordinates analysis (PCoA) was used to examine the spatial distribution of the ten populations and of the 110 safflower genotypes within them. The first two coordinates explained only 5.74% of the total variation among the accessions, with coordinates 1 and 2 explaining 2.99% and 2.75%, respectively (Figure 3).
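In PCoA, the percentage explained by each axis is its eigenvalue divided by the sum of the positive eigenvalues of the double-centered distance matrix. The sketch below is classical (Gower) PCoA written with NumPy; GenAlEx offers its own PCoA options, so the exact percentages it reports may be computed differently, and the function name is hypothetical. Low values such as the 2.99% and 2.75% reported here mean that no single axis summarizes much of the variation among genotypes.

```python
import numpy as np


def pcoa(d):
    """Classical principal coordinates analysis of a symmetric distance matrix.

    Returns (coordinates on the axes with positive eigenvalues,
             percentage of variation explained by each of those axes).
    """
    d = np.asarray(d, dtype=float)
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    b = -0.5 * j @ (d ** 2) @ j           # double-centered matrix (Gower)
    eigval, eigvec = np.linalg.eigh(b)    # eigh returns ascending eigenvalues
    order = np.argsort(eigval)[::-1]
    eigval, eigvec = eigval[order], eigvec[:, order]
    keep = eigval > 1e-10 * eigval[0]     # drop zero and negative eigenvalues
    eigval, eigvec = eigval[keep], eigvec[:, keep]
    coords = eigvec * np.sqrt(eigval)     # scale each axis by sqrt(eigenvalue)
    explained = 100 * eigval / eigval.sum()
    return coords, explained


if __name__ == "__main__":
    # Hypothetical genotypes on a line: a single axis should explain all variation
    positions = [0.0, 1.0, 2.0, 3.0]
    d = [[abs(a - b) for b in positions] for a in positions]
    coords, explained = pcoa(d)
    print(explained)
```

The signs of the coordinates are arbitrary, so plots from different programs may appear mirrored along an axis without any change in the relationships among genotypes.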
Comparing the Bayesian analysis performed in Structure (Figure 2) with the UPGMA method (Figure 1) and the PCoA (Figure 3) showed consistent results across methods: eight genotypes (6, 20, 21, 22, 23, 24, 26, and 30) from the populations of Pakistan, Kazakhstan, and India (South and Central Asia) were placed in the same groups by all three analyses.

Figure 1: A dendrogram of the grouping of 110 safflower genotypes constructed by the UPGMA method and based on the dissimilarity estimated from molecular characteristics.

Figure 2: Population structure of 121 Carthamus tinctorius L. genotypes from 10 populations and six regions, based on 21 SSR markers (K = 2). Each vertical bar represents one genotype, and its colored segments show the proportion of membership in each group.

Table 1: Information on the 121 genotypes of Carthamus tinctorius L.

Table 2: Details of the 21 SSR molecular markers used to identify the molecular variability of the 121 genotypes of Carthamus tinctorius L.

Table 3: Estimation of the genetic diversity of the 110 genotypes of Carthamus tinctorius L. obtained from 21 SSR markers*.

Table 4: Analysis of molecular variance (AMOVA) of Carthamus tinctorius L. genotypes from 10 populations belonging to six distinct geographic regions.