Genealogy and genetic base of Brazilian cotton cultivars

Abstract Cotton is an essential worldwide commodity, and Brazil ranks fourth in world production and second in exports. The present work aimed to gather the genealogies and estimate the genetic base of 109 Brazilian cotton cultivars released from 1932 to 2021. Twelve of the 68 ancestors identified contributed 52.03% of the genes, and 33% of the cultivars resulted from direct selection from landraces or old cultivars, evidencing a narrow genetic base. Over time, especially after 1990, the increases in the number of cultivars released and in the number of ancestors added to the breeding programs were driven by the need to shorten the cultivars' cycle for coexistence with pests and to make them suitable for mechanized cultivation in the new agricultural frontiers. However, Brazilian cotton cultivars still have a narrow genetic base, with an important participation of the ancestors CNPA SRI3, CKR 100W, C-77 and Express.


INTRODUCTION
Cotton is one of the main commodities in the world, and Brazil stands out in world production (ICAC 2022). In the 2018/19, 2019/20 and 2020/21 crop seasons, the average Brazilian production was 6.658 million tons (ABRAPA 2022), of which around 17.5% was destined for export, mainly to China (Coêlho 2021). Regarding yield, the average in the same period was 3,597 kg/ha (ABRAPA 2022). However, there are Brazilian cultivars with a yield potential above 4,000 kg/ha (Suassuna et al. 2020).
Until the 1980s, the cultivation of semi-perennial cotton cultivars predominated in Brazil, mainly in the semi-arid regions of the Northeast and the north of Minas Gerais State (AMIPA 2022), essentially without agricultural mechanization. However, after the boll weevil (Anthonomus grandis B.) was introduced into Brazil in that decade, the cultivation of semi-perennial cultivars became unfeasible, mainly because it was difficult to interrupt the pest's cycle. In addition, these cultivars were unsuitable for mechanized cultivation.
Therefore, the release of annual short-cycle cultivars suitable for mechanized cultivation boosted the displacement of cotton cultivation from the Northeast to São Paulo and Paraná States. Subsequently, due to the increase in the prices of arable land in São Paulo and Paraná and to competition from other crops, Brazilian cotton cultivation reached the Cerrado, where over 80% of Brazilian cotton is currently produced, mainly in Mato Grosso and Goiás States. In this context, genetic breeding has contributed to the development of cultivars suited to the new agricultural frontiers and tolerant to pests and diseases, enabling coexistence with phytosanitary problems and increasing yield and fiber quality (AMIPA 2022).
PLM Cruz et al.
The genetic improvement of cotton in Brazil began in 1924 at the Campinas Agronomic Institute (IAC), initially aiming to develop high-yielding cultivars. However, with the emergence of Fusarium wilt in cotton plantations in São Paulo State in 1957, the program began to emphasize the development of cultivars resistant to this disease (Cavaleri et al. 1967). Since then, other cotton breeding programs have been created in institutions such as the Brazilian Agricultural Research Corporation (EMBRAPA), the Paraná Agronomic Institute (IAPAR), and the Minas Gerais Agricultural Research Corporation (EPAMIG). In 1997, Law no. 9.456, which instituted the Brazilian Plant Variety Protection Act, attracted private companies, such as Deltapine Land, Stoneville International, Bayer Seeds and Syngenta, to establish cotton breeding programs. In addition to conducting studies on cotton genetic breeding, the Monsanto Company acquired Deltapine and MDM cotton seeds, and the Mato Grosso Cotton Institute (IMA Mt) acquired the cotton programs of Coodetec and LD Melhoramento (Vidal Neto and Freire 2013).
Despite the genetic gains achieved by Brazilian cotton breeding programs, a study of the genetic base covering all cultivars and their genealogies had not yet been carried out. According to Silva et al. (1999), information about genealogies is often published in restricted-circulation documents or not published at all, and is therefore inaccessible to breeders. Pedigree information can be used to evaluate the contribution of various gene pools to current cultivars (Bowman et al. 1996) and to monitor changes in genetic vulnerability over time (Van Esbroeck et al. 1999). In the United States, concerns about the cotton genetic base have been reported since the end of the twentieth century (Bowman et al. 1996). Van Esbroeck and Bowman (1998) and Zenh (2014) showed a narrowing genetic base in US cotton cultivars, which restricts the improvement of lint yield and fiber quality in breeding programs. In Pakistan, Saleem et al. (2020) also observed a very narrow genetic base when studying 44 in-use cotton cultivars, which was attributed to continuous breeding among varieties of the same origin. According to Gingle et al. (2006), strong selection pressure for high yield and earliness was the principal cause of the genetic narrowing that exposes cotton cultivars around the world to vulnerability risks.
Until now, no study involving the genetic base of Brazilian cotton cultivars released over time has been conducted. Bertini et al. (2005), assessing the genetic diversity of 30 Brazilian cotton cultivars by microsatellites and pedigree, concluded that the cultivars descended from few parents and suggested the introduction of new alleles into their gene pool. Given the importance of genetic diversity and of a wide genetic base to breeding programs, the present work aimed to gather the genealogies and quantify the genetic base of Brazilian cotton cultivars released commercially from 1932 to 2021.

MATERIAL AND METHODS
Commercial cotton cultivars developed in Brazil from 1932 to 2021 were listed (Table S1). The genealogy of each cultivar was traced back as far as possible, to the oldest ancestors that could be found. For that, several sources were consulted, such as Cavaleri and Gridi-Papp (1993), Bertini et al. (2006), Bowman et al. (2006), and Carvalho (2008), as well as cultivar release folders, information from the websites of the institutions responsible for cultivar development, and unpublished documents. Most cultivars developed by private companies were not included in the pedigrees because these companies do not provide genealogies or records of crosses.
The genitors were structured in pedigree charts following conventional international standards, with the female parents on the left and the male parents on the right of their offspring. This notation preserves the identity of the female ancestors, which transmit the cytoplasmic genes. Backcrosses were represented by the symbol "*" followed by the Arabic numeral corresponding to the number of times the genotype was used as a recurrent parent. Genotypes and cultivars derived from mutation or direct selection were designated by a vertical stroke (|). Malécot's Coefficient of Parentage (COP) was used to estimate the relative genetic contribution (RGC) of each ancestor, and the ancestors were ranked in decreasing order of RGC. The accumulated genetic contribution (AGC) was estimated from the successive sum of the ancestral RGCs. The frequency of each ancestor in the pedigree (FAP) was estimated as the number of cultivars with that ancestor in their genealogy relative to the total number of cultivars analyzed. The number of ancestors that constitute each cultivar (NAC) and the average number of ancestors per cultivar (ANAC) were also estimated. These parameters were obtained for all cultivars and for the released cultivars subdivided into three periods (1932-1962, 1963-1992 and 1993-2021). In addition, the number of ancestors corresponding to 50% and 90% of the genetic base (NAGB) and the ratio between the number of ancestors and the number of cultivars (NA/NC) per period were computed. The methodology followed Wysmierski and Vello (2013) and Rabelo et al. (2015).
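The parameters defined above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the usual coefficient-of-parentage simplification that each parent of a cross contributes half of the genes and that a direct selection or mutant inherits everything from its single source. The pedigree, the genotype names and the function names are hypothetical and serve only as an illustration.

```python
from collections import defaultdict

# Hypothetical toy pedigree (names are illustrative, not taken from Table S1).
# Two parents = cross (female, male); one parent = direct selection or mutation.
# Genotypes that do not appear as keys are treated as ancestors.
PEDIGREE = {
    "Line1": ("AncA", "AncB"),
    "Cult1": ("Line1", "AncC"),
    "Cult2": ("AncA",),
    "Cult3": ("Cult1", "Cult2"),
}
CULTIVARS = ["Cult1", "Cult2", "Cult3"]


def ancestor_contributions(genotype, pedigree):
    """Return {ancestor: expected fraction of genes} for one genotype."""
    parents = pedigree.get(genotype)
    if not parents:  # no recorded parents, so this genotype is an ancestor
        return {genotype: 1.0}
    shares = defaultdict(float)
    for parent in parents:
        for anc, frac in ancestor_contributions(parent, pedigree).items():
            # Assumption: parents contribute equally (1/2 each in a cross).
            shares[anc] += frac / len(parents)
    return dict(shares)


def genetic_base(cultivars, pedigree):
    """Return rows of (ancestor, RGC %, AGC %, FAP %) and NAC per cultivar."""
    per_cultivar = {c: ancestor_contributions(c, pedigree) for c in cultivars}
    n = len(cultivars)
    rgc = defaultdict(float)
    fap = defaultdict(float)
    for shares in per_cultivar.values():
        for anc, frac in shares.items():
            rgc[anc] += 100 * frac / n  # mean contribution over all cultivars
            fap[anc] += 100 / n         # share of cultivars carrying the ancestor
    rows, agc = [], 0.0
    for anc in sorted(rgc, key=rgc.get, reverse=True):
        agc += rgc[anc]                 # running sum of the ranked RGCs
        rows.append((anc, rgc[anc], agc, fap[anc]))
    nac = {c: len(shares) for c, shares in per_cultivar.items()}
    return rows, nac


rows, nac = genetic_base(CULTIVARS, PEDIGREE)
anac = sum(nac.values()) / len(nac)     # average number of ancestors per cultivar
for anc, r, a, f in rows:
    print(f"{anc}: RGC={r:.2f}% AGC={a:.2f}% FAP={f:.2f}%")
```

A backcross written with "*2" can be encoded in this toy structure as two nested crosses with the recurrent parent, which gives that parent a contribution of 75% under the same assumption.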

RESULTS AND DISCUSSION
Two hundred thirty-six Brazilian cotton cultivars released from 1932 to 2021 were identified (Table S1). In the three periods analyzed (1932-1962, 1963-1992 and 1993-2021), 13, 28 and 195 cultivars were developed, respectively, of which 69 cultivars released from 1993 to 2021 were transgenic. The pedigrees shown in Figures 1, 2 and 3 summarize the origin of 109 cotton cultivars released in Brazil and the origin of elite genotypes used in breeding programs in the mentioned period. The other 127 cultivars, mainly those developed by private companies, were not included in the figures or in the genetic base estimates because their genealogies or records of crosses were not published or were unavailable. Placing the female parents on the left side of the pedigrees permits the identification of the origin of the cytoplasmic genes, which are transmitted only by them. It should be highlighted that until the 1970s there was no major concern with registering or publishing the pedigrees of the cultivars developed by breeding programs. After the epidemic caused by the fungus Helminthosporium maydis on corn in the United States in 1970, the vulnerability arising from the narrow genetic base of crops became a concern (Nass 2001). Nowadays, restrictions on such publications occur in private companies for reasons of economic competition.
Pedigrees are very important in plant breeding because they permit estimating the genetic distance among cultivars, evaluating the contribution of various gene pools to the current cultivars (Bowman et al. 1996), and monitoring changes in genetic vulnerability over time (Van Esbroeck and Bowman 1998) and in the genetic makeup of commercial cultivars (Bowman and Gutiérrez 2003). Pedigrees also permit the identification of genetically dissimilar parents, which have the potential to generate new variability for genetic breeding, and help to avoid or minimize crosses that contribute to the narrowing of the genetic base.
In the development of the 109 cultivars analyzed, 68 different ancestors were used (Table 1). The ancestors CNPA SRI3 and CKR 100W had the highest relative genetic contribution (RGC), 10.43% and 6.70%, respectively. The accumulated genetic contribution (AGC) showed that the first 12 ancestors alone contributed 52.03% of the genes (Table 1). These values indicate that the genetic base of Brazilian cotton cultivars is narrow, as reported by Bertini et al. (2005) when studying 30 in-use cultivars. A narrow genetic base in cotton cultivars has also been reported in the USA (Zenh 2014) and Pakistan (Saleem et al. 2020). In all these cases, the authors warned that narrowing the genetic base can hinder genetic gain for traits of economic importance and expose the crops to the risk of vulnerability to pests and diseases. Regarding the frequency of ancestors in the pedigree (FAP), the Express ancestor was the most frequent, participating in 33.90% of the cultivars, while Jonh Cotton Polycross, FOX, 8 to Acala Lines, AXTE, D&PLD, 5242 A and 2128 R reached 30.27%. Therefore, these ancestors should not be used as parents in breeding programs that aim to widen the genetic base of Brazilian cotton cultivars.
Table 2. Relative genetic contribution (RGC), accumulated genetic contribution (AGC) and percentage frequency of each ancestor in the pedigree (FAP) for the main ancestors of Brazilian cotton cultivars released in the three periods (1932-1962, 1963-1992, 1993-2021)
The number of ancestors per cultivar (NAC) ranged from 1 to 15, and the average number of ancestors per cultivar (ANAC) ranged from 1.25 to 6.48 in the periods analyzed (Table 3). The greatest NAC was found in BRS 336. This cultivar originated from a tri-parental cross of the cotton cultivars Chaco 520, BRS Itaúba, and Delta Opal. Chaco 520 is an early plant with good fiber quality, and BRS Itaúba also has good fiber and resistance to the main diseases that occur in Brazil. The ancestor Delta Opal is used for overall plant conformation and yield (Morello et al. 2012). Notably, 33 cultivars had only one ancestor; therefore, they were developed by direct selection from landraces or old cultivars. The ratio between the number of ancestors and the number of cultivars (NA/NC) was only 0.624. The number of ancestors corresponding to 50% and 90% of the genetic base (NAGB) was low (12 and 45, respectively), mainly in the first two periods, although it increased in the last period studied (Table 3). These results also demonstrate how narrow the genetic base of the Brazilian cotton cultivars is.
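The NA/NC ratio and the NAGB counts reported above follow from simple arithmetic, sketched here in Python. Only the totals 68 and 109 come from the text; the RGC list is hypothetical, the function name is illustrative, and counting an ancestor as soon as the cumulative sum reaches the threshold (rather than strictly exceeds it) is an assumption.

```python
def ancestors_needed(rgc_percent, threshold):
    """Count the top-ranked ancestors whose cumulative RGC reaches threshold (%)."""
    total = 0.0
    for count, value in enumerate(sorted(rgc_percent, reverse=True), start=1):
        total += value
        if total >= threshold:
            return count
    return None  # the threshold is never reached


# Hypothetical RGC values (%) for six ancestors; they sum to 100.
rgc = [40.0, 25.0, 15.0, 10.0, 6.0, 4.0]
nagb_50 = ancestors_needed(rgc, 50)  # 40 + 25 = 65, so two ancestors suffice
nagb_90 = ancestors_needed(rgc, 90)  # 40 + 25 + 15 + 10 = 90, so four ancestors

# NA/NC from the totals reported in the text: 68 ancestors, 109 cultivars.
na_nc = 68 / 109
print(nagb_50, nagb_90, round(na_nc, 3))
```

A low NAGB relative to the total number of ancestors means that a few ancestors account for most of the genes, which is the pattern the text describes for the full set (12 of 68 ancestors covering half of the genetic base).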
From 1932 to 1962, the Express ancestor had the highest RGC (33.33%) and FAP (33.33%) in Brazilian cotton cultivars. Three of the eight ancestors (Express, Texas Bigboll, and Stoneville 2B) contributed 75% of the genes, as shown in Table 2. They originated from Gossypium hirsutum lines from the United States, introduced into Brazil by the IAC cotton improvement program (Penna 2005).
In the second period (1963-1992), the number of cultivars and ancestors increased considerably, with 23 cultivars released, but the participation of the 28 ancestors was low (Table 3). The C-74, CKR 100 W, Sel. Lone Star and Deltapine cultivars accounted for 50.55% of the genes in the genealogies, and the CKR 100 W and Sel. Lone Star ancestors had the highest FAP values, 39.13% (Table 2). During this period, the genetic breeding of cotton in Brazil was carried out by the IAC, EMBRAPA, IAPAR, and EPAMIG, with the participation of public universities. Genotype exchange among institutions was frequent, including introductions and materials from hybridizations between parents with complementary traits, mainly aimed at better fiber quality and disease resistance.
However, most breeding programs were resized after the introduction of the cotton boll weevil into Brazil, which made the commercial cultivation of semi-perennial cotton unfeasible. Shortening the crop cycle and selecting traits that delay infestation became priorities for coexistence with the pest. In addition, the mechanized cultivation that predominates in the new agricultural frontiers (Southeast and Central-West regions of Brazil) demanded the incorporation of other parents for genetic recombination to generate cultivars appropriate for the new demand.
The number of cultivars released in Brazil between 1993 and 2021 was 195.This large number is due to the "maturity" of Brazilian breeding programs in public institutions, the participation of private companies, encouraged by the Cultivar Protection Law enacted in 1997, and the increase in exports, elevating cotton to the status of a world agricultural commodity.As most cultivars were developed by private companies that do not disclose genealogies, only 75 of the 195 were included in this study.
It should be emphasized that many cultivars developed by private companies, including the transgenic ones, originated from selected genotypes that have the same origins as cultivars developed by public companies, considering that until then private companies did not invest in developing improved germplasm. For most of these transgenic cultivars, events conferring traits such as disease, pest and herbicide resistance were introduced into elite genotypes or cultivars derived from traditional ancestors. Therefore, such cultivars are likely to have contributed to the further narrowing of the Brazilian cotton genetic base.
The ancestors with the highest RGC were CNPA SRI3 (13.85%), GH-11-9-75 (5.06%), CKR 100 W (5.00%), FOX (4.72%), and Jonh Cotton Polycross (4.70%) (Table 2). These ancestors were included as parents in the period from 1963 to 1992. Regarding the AGC, 12 ancestors contributed 53.52% of the genes. Eight ancestors showed identical FAP values (43.24%). Despite the greater genetic diversity added in this period, the use of parents remained concentrated. The results from the different periods (Table 4) show a tendency to widen the Brazilian cotton genetic base through the increase in the number of cultivars and ancestors over time. This tendency is also supported by the increase in the mean number of ancestors per cultivar and the decrease in the AGC of the top four ancestors.
The genetic base of Brazilian cotton cultivars was considered narrow, mainly in the first two periods studied (1932-1962 and 1963-1992), which could have made it difficult to obtain yield gains in genetic breeding programs. In the period from 1993 to 2021, the diversity of the cultivars released commercially was broader than that of those developed in the two previous periods. This breadth certainly favored the development of new cultivars suitable for the new agricultural frontiers and for mechanization, as well as some yield increase.
Therefore, it is necessary to concentrate efforts on including new parents in crosses in order to expand the genetic base of the cotton crop in Brazil. It is recommended to avoid choosing parents of the same origin for genetic recombination, so as not to restrict selection gains for traits of agronomic interest and to avoid possible risks of vulnerability to pests and diseases. In new genetic recombinations, as far as possible, genitors originating from the ancestors with the highest relative genetic contribution to the cultivars released after 1992, namely CNPA SRI3, GH-11-9-75 and CKR 100W, should be avoided.

Figure 1. Pedigree of Brazilian cotton cultivars released commercially from 1932 to 2021. Boxes filled in green correspond to cultivars, names in bold correspond to ancestors and indicate continuation of the genealogy in the same figure.

Figure 2. Pedigree of Brazilian cotton cultivars released commercially from 1932 to 2021. Boxes filled in green correspond to cultivars, names in bold correspond to ancestors and indicate continuation of the genealogy in the same figure.

Figure 3. Pedigree of Brazilian cotton cultivars released commercially from 1932 to 2021. Boxes filled in green correspond to cultivars, names in bold correspond to ancestors and indicate continuation of the genealogy in the same figure.

Table 1. Relative genetic contribution (RGC), accumulated genetic contribution (AGC) and frequency of each ancestor in the pedigree (FAP) of Brazilian cotton cultivars released from 1932 to 2021

Table 3. Analysis of the three periods, indicating the number of cultivars and ancestors in each one, the average number of ancestors per cultivar (ANAC), the accumulated genetic contribution of the top 4 ancestors (AGC 4), the number of ancestors corresponding to 50% and 90% of the genetic base (NAGB) and the ratio between the number of ancestors and the number of cultivars (NA/NC) in each period