Exact inbreeding coefficient of maize synthetic varieties derived from mixed lines and single crosses

The published formula for the inbreeding coefficient (IC) of a synthetic variety (SV) derived from a mixture of S single crosses and L lines ($FSyn_{L,SC}$) in Zea mays appears to overestimate inbreeding. Therefore, our goal was to derive the exact and general $FSyn_{L,SC}$. For the development of this SV ($Syn_{L,SC}$), an even number of L + 2S unrelated lines whose IC was F (0 < F < 1) was considered. We show that the exact and general formula is $FSyn_{L,SC} = (2L + S)(1 + F)/[4(L + S)^2]$, and that the ICs and genotypic means of two additional synthetics developed from the same lines (one derived from (L + 2S)/2 single crosses and the other derived from the L + 2S lines) must be equal. Among these three SVs, $Syn_{L,SC}$ shows the largest IC.


INTRODUCTION
In a finite population whose individuals reproduce by panmixia, mating between relatives will likely occur in subsequent generations, resulting in a reduction of vigor (Falconer and Mackay 2001). For this reason, formulae have been derived to calculate inbreeding coefficients (ICs) and to predict the performance of a crop variety called a synthetic variety (SV), or simply a synthetic.
Synthetic varieties of some cultivated species such as maize (Zea mays L.) are usually made with lines. However, they can also be generated with single crosses, three-way line crosses, double crosses, and mixtures of these hybrids (Márquez-Sánchez 2011). Each resulting synthetic will have a particular genotypic array, IC, and genotypic mean.
The mating systems among individuals are classified into two types: preferential and random or panmictic. In this paper, inbreeding (mating among related individuals) in the context of random mating will be addressed, because it involves the reproduction of synthetics. In an applied context, SV inbreeding is important because it is related to the genotypic mean of traits of economic importance, such as maize grain yield (Busbice 1970) and onion (Allium cepa L.) seed production (Magalhães et al. 2020). Malécot (1948) defined the inbreeding coefficient (IC) as the probability that two genes at the same locus are identical by descent. Two genes are considered identical by descent when both are copies of the same ancestral gene. In this context, the base population is the ancestor generation, which has no genes identical by descent and an IC equal to zero.
D Arellano-Suarez et al.
Cultivated species that mate randomly, such as onion and maize, facilitate the formation of synthetic varieties derived from several parents, each represented by m plants. According to Wricke and Weber (1986), synthetics show genetic variability, adaptability to diverse environments, and stability in their corresponding genotypic arrays across generations. At the same time, randomness, a finite number of individuals (particularly when m is small), and genetic variability among the plants representing each parent might cause inbreeding and genetic erosion.
Several recent studies have investigated the adaptability and performance of synthetic varieties. Farid et al. (2019) found two adaptive synthetics whose grain yield under drought conditions was at least 6.0 t ha⁻¹. Badu-Apraku et al. (2018) reported a genetic gain of 423 kg ha⁻¹ cycle⁻¹ in a SV after five cycles of S₁ family selection for resistance to Striga hermonthica (Delile) Benth. This synthetic underwent three selection cycles for grain yield under drought. The International Maize and Wheat Improvement Center (CIMMYT) releases "open pollinated varieties" for grain yield potential and biotic and abiotic stress tolerance for areas prone to environmental stresses (Masuka et al. 2017). These varieties are synthetics whose parent lines are a byproduct of the hybrid parental line development pipeline.
Maize synthetics, besides being used as improved varieties in developing countries, are also used in breeding programs (Saboor et al. 2018). For instance, Oliveira et al. (2016) studied methods to identify genotypes or populations that ensure success when crossed; they used a diallel whose parents were eight synthetic varieties to study combining abilities and best linear unbiased predictions (BLUP). Articles on other methods to estimate the combining abilities of potential parents of a SV or a hybrid have also been published: Genet et al. (2017) used testers, and Afekhai et al. (2017) based this estimation on the North Carolina mating design II. Furthermore, recurrent selection has been used to improve the performance of SVs (Baktash 2016). Saito et al. (2018), considering adaptability and stability, found five corn lines suitable for producing dent synthetics resistant to gray leaf spot and northern leaf blight.
An economic advantage of using a SV is that farmers can produce their own seed without changing the genotypic array, provided there is no pollen contamination (Masuka et al. 2017). Furthermore, using hybrids instead of lines as synthetic parents can save resources (Márquez-Sánchez 2011). It might be assumed that a synthetic derived from hybrids is the same as the one that would be developed from the parental lines of those hybrids. However, this is not necessarily the case. With hybrids, the demand for resources should be lower. For example, with 16 lines it is possible to develop $\binom{16}{2}, \binom{16}{3}, \ldots, \binom{16}{16}$ synthetic varieties whose parents are 2, 3, ..., 16 lines, respectively, and the grain yields of the possible $\sum_{i=2}^{16}\binom{16}{i} = 65519$ synthetics can be predicted from data obtained by evaluating the complete 16 x 16 diallel formed with the 16 lines. Furthermore, with 8 single crosses involving the 16 lines, the $\sum_{i=2}^{8}\binom{8}{i} = 247$ possible synthetics from two or more parents (single crosses) can be predicted from the resulting 8 x 8 diallel, which is 75% smaller than the 16 x 16 diallel. With 4 double crosses involving the 16 lines, prediction is also possible, but only for the $\sum_{i=2}^{4}\binom{4}{i} = 11$ synthetics from two or more double crosses. Moreover, many of the best synthetics might not be predicted in this manner. Therefore, other alternatives must be explored.
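The counts quoted in this paragraph follow from summing binomial coefficients. The short Python sketch below reproduces them; the function names `count_synthetics` and `diallel_cells` are illustrative choices made here, not part of the original work, and the size of a diallel is assumed to be measured by its number of cells.

```python
from math import comb


def count_synthetics(n_parents: int, min_parents: int = 2) -> int:
    """Number of distinct synthetics that use at least `min_parents`
    of the `n_parents` available parents."""
    return sum(comb(n_parents, k) for k in range(min_parents, n_parents + 1))


def diallel_cells(n_parents: int) -> int:
    """Number of cells in a complete n x n diallel table (assumed size measure)."""
    return n_parents * n_parents


print(count_synthetics(16))  # 16 lines as parents -> 65519
print(count_synthetics(8))   # 8 single crosses as parents -> 247
# Relative reduction of the 8 x 8 diallel with respect to the 16 x 16 one
print(1 - diallel_cells(8) / diallel_cells(16))  # -> 0.75
```

The same function can be called with any other number of parents to see how quickly the set of predictable synthetics shrinks as parents are grouped into hybrids.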
Some resources would be saved by using a synthetic whose parents are a mixture of S single crosses and L lines ($Syn_{L,SC}$) instead of using only parental lines. In addition, if L > 0 and S > 0, more synthetics could be predicted than when their parents are only single crosses or double crosses. Sahagún-Castellanos et al. (2013) developed a formula for the IC of a $Syn_{L,SC}$. In this case, the IC of all involved lines was called F (0 < F < 1). However, this formula contains an apparent overestimation of the coancestry among individuals representing each single cross. This overestimation would affect both the IC of the synthetic formed with only single crosses ($FSyn_{SC}$) and the IC of $Syn_{L,SC}$ ($FSyn_{L,SC}$). The hypothesis of this study is that the intraparental coancestry was not derived accurately in previous studies. The main objective was to derive the exact and general IC (lines not necessarily pure) for synthetic varieties whose parents are L lines and S single crosses.

MATERIAL AND METHODS
The study was based on the one-locus model of a population of a diploid species whose individuals mate randomly. To develop a SV whose parents are L lines and S single crosses, an even number of L + 2S unrelated initial lines whose IC was F (0 < F < 1) was considered. The S single crosses were then considered to be formed from 2S of these lines. Two additional synthetic varieties developed from the same L + 2S lines were also considered. Their parents were: 1) the initial L + 2S lines ($Syn_L$), and 2) (L + 2S)/2 single crosses developed from all L + 2S lines ($Syn_{SC}$). In the three cases, each parent (line or single cross) was represented by m plants (Figure 1).
The IC of the $Syn_{L,SC}$ synthetic developed by the random mating of L lines and S single crosses was determined following the approach of Malécot (1948), using the concepts of the genotypic array and the gametic array of a panmictic population. According to Kempthorne (1969), if the frequency of the $A_i$ gene (i = 1, 2, ..., a) in a panmictic population is $p_i$, the gametic (GAA) and genotypic (GEA) arrays are expressed as:

$$GAA = \sum_{i=1}^{a} p_i A_i, \qquad GEA = \left(\sum_{i=1}^{a} p_i A_i\right)^2 = \sum_{i=1}^{a}\sum_{j=1}^{a} p_i p_j A_i A_j$$

Classical probability was used to determine the IC of a SV. We assumed the existence of equally likely and mutually exclusive events, and that the IC is the probability that two random genes, carried by two separate gametes (one male and one female), are identical by descent. $FSyn_{L,SC}$ is composed of contributions from two sources: self-fertilizations and intraparental crosses. To calculate the contribution of a source, the probability of identity by descent is multiplied by the number of the corresponding genotypes in the genotypic array divided by the total number of genotypes. Thus, the IC of the synthetic is the sum of both contributions. This is equivalent to what occurs in the GEA when $A_iA_j$ is replaced by $P(A_i \equiv A_j)$, i.e. the probability that $A_i$ and $A_j$ are identical by descent.
The random mating of the m(L + S) plants that generate the $Syn_{L,SC}$ synthetic implies random mating among the m plants of each parent and any other parental group. Therefore, the derivation of the $FSyn_{L,SC}$ formula was based on the inbreeding contributions of each line and each single cross.
Regarding the IC of the progeny of a single cross, the two parental lines were treated as two virtual populations represented by $A_1A_2$ and $B_1B_2$, such that $P(A_1 \equiv A_2) = P(B_1 \equiv B_2) = F$.
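As an illustration of this approach, the sketch below enumerates the genotypic array of the random-mating progeny of one single cross and replaces each gene pair by its probability of identity by descent. It is a minimal example under stated assumptions, not the authors' procedure: the gene labels, the helper `p_ibd`, and the function name `single_cross_ic` are hypothetical, each of the four genes is assumed to have gametic frequency 1/4, a gene is assumed identical by descent with a copy of itself with probability 1, and genes from the two unrelated lines are assumed never identical by descent.

```python
from fractions import Fraction
from itertools import product


def single_cross_ic(F):
    """Probability that two random genes drawn from the random-mating
    progeny of the single cross A1A2 x B1B2 are identical by descent."""
    genes = ["A1", "A2", "B1", "B2"]
    freq = {g: Fraction(1, 4) for g in genes}  # assumed gametic array

    def p_ibd(g, h):
        if g == h:
            return Fraction(1)   # two copies of the same gene
        if g[0] == h[0]:
            return F             # different genes of the same parental line
        return Fraction(0)       # genes from two unrelated lines

    # Genotypic array under panmixia: sum of p_i * p_j * P(A_i == A_j)
    return sum(freq[g] * freq[h] * p_ibd(g, h)
               for g, h in product(genes, repeat=2))


for F in (Fraction(0), Fraction(1, 2), Fraction(1)):
    print(F, single_cross_ic(F))
```

Under these assumptions the enumeration evaluates to (1 + F)/4 for any F, for example 3/8 when F = 1/2.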

RESULTS AND DISCUSSION
The single cross $A_1A_2 \times B_1B_2$ produces progeny whose genotypic array ($GEA_{SC}$) is:

$$GEA_{SC} = \tfrac{1}{4}\left(A_1B_1 + A_1B_2 + A_2B_1 + A_2B_2\right) \tag{1}$$

The set of m plants that represents this single cross can be visualized as the result of taking a random sample (with replacement) of size m from a population formed by the four genotypes of this $GEA_{SC}$ (Equation 1). The random [...]

The inbreeding coefficient of the population produced by the random mating of m plants representing a single cross ($F_{SC(m)}$) can also be expressed in terms of m and the two inbreeding sources: the m self-fertilizations and the m(m - 1) intraparental crosses, whose average inbreeding coefficients are 1/2 and $\Gamma_{0,W}$ (the average coancestry among individuals within a parent), respectively. In addition: 1) each self-fertilization and each cross produces progeny whose genotypes can be visualized as the 4 possible outcomes that result from uniting one random male gene and one random female gene, and 2) the total number of possible results from the random mating of m plants is $4m + 4m(m - 1) = 4m^2$. Thus, based on these considerations and the definition of the inbreeding coefficient:

$$F_{SC(m)} = \frac{4m(1/2) + 4m(m-1)\,\Gamma_{0,W}}{4m^2} = \frac{1 + 2(m-1)\,\Gamma_{0,W}}{2m} \tag{5}$$

Since the IC of the random-mating progeny of the single cross is $F_{SC} = (1 + F)/4$ (Equation 4) and $F_{SC} = F_{SC(m)}$ (Equation 5):

$$\frac{1+F}{4} = \frac{1 + 2(m-1)\,\Gamma_{0,W}}{2m}$$

Then, solving for $\Gamma_{0,W}$:

$$\Gamma_{0,W} = \frac{m(1+F) - 2}{4(m-1)} \tag{6}$$

It is evident that $\Gamma_{0,W}$ depends directly on F. Regarding the relationship between $\Gamma_{0,W}$ and F, consider that the cross of two individuals produces progeny whose inbreeding coefficient is directly related to the level of coancestry of these two individuals. In a panmictic population, the inbreeding level of the progeny and the coancestry of their parents are equal.
By substituting the result of Equation 6 into Equation 5, we obtain:

$$F_{SC(m)} = \frac{1}{2m} + \frac{m(1+F) - 2}{4m} = \frac{1+F}{4} \tag{7}$$

The proportion of $F_{SC(m)}$ due to self-fertilization ($F_{SC,X}$), according to Equation 7, is:

$$F_{SC,X} = \frac{4m(1/2)}{4m^2} \tag{8}$$

This reduces to the following equation:

$$F_{SC,X} = \frac{1}{2m} \tag{9}$$

According to Equation 9, there is an inverse relationship between $F_{SC,X}$ and m. That is, when m increases, the importance of self-fertilization as an inbreeding source decreases. Notably, $F_{SC,X}$ does not depend on the inbreeding coefficient of the lines, because the self-pollination of a plant formed by the cross between two unrelated lines of any inbreeding level will always produce progeny in which the expected frequency of genotypes with two identical genes is 1/2. Regarding the coancestry among individuals within a parent, its contribution to $F_{SC(m)}$ ($F_{SC,\Gamma_{0,W}}$), based on Equations 5 and 6, is:

$$F_{SC,\Gamma_{0,W}} = \frac{(m-1)\,\Gamma_{0,W}}{m} = \frac{m(1+F) - 2}{4m} \tag{10}$$

As expected, according to Equation 10, $F_{SC,\Gamma_{0,W}}$ is directly related to F and m. If the initial lines were pure (F = 1), then $F_{SC,\Gamma_{0,W}} = (m - 1)/(2m)$, which depends only on m. Note that for this particular case, $F_{SC,\Gamma_{0,W}}$ is greater than $F_{SC,X}$ only when m > 2.
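The interplay between the two inbreeding sources can be checked numerically. The sketch below is illustrative only and assumes the expressions implied by the text: a selfing contribution of 1/(2m), an average within-parent coancestry of [m(1 + F) - 2]/[4(m - 1)] obtained by equating the two expressions for the IC of a single cross, and a coancestry contribution equal to that coancestry weighted by (m - 1)/m. All function names are hypothetical.

```python
from fractions import Fraction


def within_coancestry(F, m):
    """Assumed average coancestry among distinct plants of one single cross (m > 1)."""
    return (m * (1 + F) - 2) / Fraction(4 * (m - 1))


def selfing_contribution(m):
    """m selfings out of m^2 matings, each with identity probability 1/2."""
    return Fraction(1, 2 * m)


def coancestry_contribution(F, m):
    """m(m - 1) crosses out of m^2 matings, weighted by the within-parent coancestry."""
    return Fraction(m - 1, m) * within_coancestry(F, m)


F = Fraction(1, 2)
for m in (2, 5, 50, 500):
    x = selfing_contribution(m)
    g = coancestry_contribution(F, m)
    # The two contributions move in opposite directions as m changes,
    # while their sum stays at (1 + F)/4.
    print(m, x, g, x + g)
```

With pure lines (F = 1) the assumed coancestry equals 1/2 for every m > 1, and the coancestry contribution exceeds the selfing contribution only when m > 2.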
An equation for $FSyn_{SC}$ was previously derived (Márquez-Sánchez 2011). This equation, however, is restricted to the case of pure lines (F = 1). Its results coincide with ours for that case. For the general case where 0 < F < 1, Sahagún-Castellanos et al. (2013) considered that $\Gamma_{0,W} = (1 + F)/2$, which is not possible for a population formed by m representatives of a single cross between two unrelated lines. In this case, the maximum value for a coancestry is 1/2, which is the coancestry of a plant with itself (via self-fertilization) or between two individuals that have the same genotype.
As previously stated, the genotypic mean of a SV trait such as grain yield in maize has an inverse relationship with inbreeding (Busbice 1970). According to Equation 9, for inbreeding by self-fertilization in a synthetic developed from single crosses, as m increases the IC decreases and therefore the genotypic mean increases. An opposite effect of the same magnitude is observed in $F_{SC,\Gamma_{0,W}}$ (Equation 10), since it can be expressed as $(1 + F)/4 - 1/(2m)$. This implies that the expected genotypic mean does not depend on m.

Consider now the random mating of the mS representatives of the S single crosses. The resulting synthetic ($Syn_{SC}$) is composed of the progeny of the S(S - 1) crosses between parents and of the intraparental crosses within each of the S parents. Since the inbreeding coefficients of these two types of crosses are zero (because the single crosses are unrelated) and $F_{SC(m)}$ (Equation 7), respectively, the IC of $Syn_{SC}$ ($FSyn_{SC}$) is equal to:

$$FSyn_{SC} = \frac{S(S-1)(0) + S\,F_{SC(m)}}{S^2} = \frac{F_{SC(m)}}{S} \tag{11}$$

Or, in compact form (Equation 4):

$$FSyn_{SC} = \frac{1+F}{4S} \tag{12}$$
The synthetic developed by randomly mating L lines, each represented by m plants ($Syn_L$), is a population whose IC is:

$$FSyn_L = \frac{1+F}{2L} \tag{13}$$

The contribution by self-pollination to $FSyn_L$ ($F_{L,X}$) is:

$$F_{L,X} = \frac{1+F}{2mL} \tag{14}$$

The contribution by intraparental coancestry to $FSyn_L$ ($F_{L,\Gamma_{0,W}}$) is:

$$F_{L,\Gamma_{0,W}} = \frac{(m-1)(1+F)}{2mL} \tag{15}$$

Equations 14 and 15 show that the changes that m causes in the contributions by self-pollination and intraparental coancestry offset each other; that is, m does not affect $FSyn_L$, as evidenced by its reduced formula (Equation 13).
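If the parents of a synthetic are L lines and S single crosses ($Syn_{L,SC}$), the IC of this synthetic ($FSyn_{L,SC}$) must be composed of the contributions from self-pollination and from intraparental crosses of both lines and single crosses (four terms). Based on Equations 11 and 13:

$$FSyn_{L,SC} = \frac{1}{(L+S)^2}\left[\frac{L(1+F)}{2m} + \frac{L(m-1)(1+F)}{2m} + \frac{S}{2m} + \frac{S\,[m(1+F) - 2]}{4m}\right] \tag{16}$$

Or in reduced form:

$$FSyn_{L,SC} = \frac{(2L+S)(1+F)}{4(L+S)^2} \tag{17}$$

This is the exact and general IC of $Syn_{L,SC}$ (0 < F < 1).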
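The reduced formula, (2L + S)(1 + F)/[4(L + S)^2], can be evaluated and cross-checked with a few lines of Python. This is a sketch under stated assumptions rather than the authors' code: the function names are hypothetical, and the cross-check assumes that matings within a line and within a single cross have identity-by-descent probabilities (1 + F)/2 and (1 + F)/4, respectively, that matings between different parents contribute zero, and that each parent is involved in a fraction 1/(L + S)^2 of all matings with itself.

```python
from fractions import Fraction


def f_syn_mixed(L, S, F):
    """Reduced formula for the IC of a synthetic from L lines and S single crosses."""
    return (2 * L + S) * (1 + F) / Fraction(4 * (L + S) ** 2)


def f_syn_by_parent(L, S, F):
    """Same quantity built parent by parent (assumed within-parent probabilities)."""
    within_line = (1 + F) / Fraction(2)
    within_single_cross = (1 + F) / Fraction(4)
    return (L * within_line + S * within_single_cross) / Fraction((L + S) ** 2)


F = Fraction(1, 2)
print(f_syn_mixed(2, 2, F))   # mixture of 2 lines and 2 single crosses
print(f_syn_mixed(4, 0, F))   # S = 0: only lines, equals (1 + F)/(2L)
print(f_syn_mixed(0, 3, F))   # L = 0: only single crosses, equals (1 + F)/(4S)
```

Exact fractions are used so that the two routes can be compared with equality rather than with a floating-point tolerance.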

From Equation 17:

a) If S = 0, $FSyn_{L,SC} = FSyn_L$. Therefore, the IC of a synthetic whose parents are L lines is:

$$FSyn_L = \frac{1+F}{2L} \tag{18}$$

b) If L = 0, $FSyn_{L,SC} = FSyn_{SC}$. Thus, the IC of a synthetic whose parents are S single crosses is:

$$FSyn_{SC} = \frac{1+F}{4S} \tag{19}$$

Rodríguez-Pérez et al. (2019) derived the IC of a synthetic developed from the random mating of d double crosses, each represented by a "large" number of plants ($FSyn_{DC}$). They found that $FSyn_{DC} = (1 + F)/(8d)$, as expected from the ICs for $Syn_L$ (Equation 18) and for $Syn_{SC}$ (Equation 19).
The independence between the inbreeding coefficients of the synthetic varieties and m in Equations 17 to 19 occurs because the gene frequencies are not altered when m is increased or decreased. Therefore, neither the genotypic arrays of the synthetics nor their inbreeding coefficients or genotypic means should change.
Consider now that if the initial lines were pure (F = 1), Equation 17 reduces to:

$$FSyn_{L,SC} = \frac{2L+S}{2(L+S)^2} \tag{20}$$

Equations 18 and 19 were generated directly from Equations 12 and 13. These equations imply that for one synthetic formed with L' = L + 2S lines and another with S' = L'/2 single crosses that involve the same L' lines, the inbreeding coefficients ($FSyn_{L'}$ and $FSyn_{SC'}$) must be equal: $FSyn_{L'} = FSyn_{SC'} = (1 + F)/(2L')$. This occurs because the gene frequencies involved in the two parental sets are the same in both cases. If from all these L + 2S lines a $Syn_{L,SC}$ with S > 1 and L > 1 is generated, the gene frequencies are unbalanced and $FSyn_{L,SC}$ must be larger than $FSyn_{L'}$ and $FSyn_{SC'}$. Similarly, Ibarra-Sánchez et al. (2019) found that the IC of a synthetic developed from t = L'/3 three-way line hybrids ($Syn_T$) is larger than $FSyn_{L'}$ and $FSyn_{SC'}$.
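The comparison among the three synthetics built from the same L + 2S lines can be made concrete with a small sketch. It assumes the closed forms (1 + F)/(2L') for the synthetic from L' lines, (1 + F)/(4S') for the synthetic from S' single crosses, and (2L + S)(1 + F)/[4(L + S)^2] for the mixture; the function name `compare_synthetics` is hypothetical.

```python
from fractions import Fraction


def compare_synthetics(L, S, F):
    """Return the ICs of the three synthetics developed from the same
    L + 2S initial lines: (only lines, only single crosses, mixture)."""
    n_lines = L + 2 * S
    if n_lines % 2 != 0:
        raise ValueError("L + 2S must be even to pair all lines into single crosses")
    f_lines = (1 + F) / Fraction(2 * n_lines)
    f_single_crosses = (1 + F) / Fraction(4 * (n_lines // 2))
    f_mixture = (2 * L + S) * (1 + F) / Fraction(4 * (L + S) ** 2)
    return f_lines, f_single_crosses, f_mixture


f_l, f_sc, f_mix = compare_synthetics(L=2, S=2, F=Fraction(1, 2))
print(f_l, f_sc, f_mix)   # 1/8 1/8 9/64
print(f_mix - f_l)        # excess inbreeding of the mixture: 1/64
```

Algebraically, the excess of the mixture over the other two synthetics works out to (1 + F)LS/[4(L + S)^2 (L + 2S)], which is positive whenever both L and S are positive and vanishes when either is zero.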
Regarding the number of parents, its inverse relationship with $FSyn_L$ and $FSyn_{SC}$ (Table 1 and Equations 17-19) occurs because each increase in the number of parents increases the number of interparental crosses (which do not contribute to inbreeding) in a greater proportion than the number of self-pollinations and intraparental crosses, which provide genotypes formed by two genes identical by descent.
The direct relationship between F and the IC of the synthetics is evident in Equations 17 to 19. When F increases, the contributions to the inbreeding of the synthetic by self-fertilization and intraparental crosses also increase, while the interparental crosses never contribute to the IC because the parents are not related. Equations 17 to 22 are directly interpretable. Table 1 shows the inbreeding coefficients for 88 synthetic varieties formed with 22 combinations of lines and single crosses, each with four levels of inbreeding in the initial parental lines.
As expected from Equation 17, Table 1 shows that for each mixture of L lines and S single crosses, the inbreeding coefficients are directly and linearly related to F. Moreover, in synthetic varieties developed with the same number of initial lines (6, 8, 10, or 12), for each of the four F values (0.00, 0.50, 0.75 and 1.00), the ICs of the synthetic varieties formed only with lines or only with single crosses are equal. This occurs because the genes and their frequencies are the same in both cases. This homogeneity of gene frequencies, together with random mating, maximizes the formation of genotypes containing two genes from two different lines. However, as L approaches S, the resulting inbreeding coefficients of the synthetics are larger because the variability of the gene frequencies of the L + 2S lines involved in the development of $Syn_{L,SC}$ is increased. This larger gene frequency variance favors the increase of the frequency of genotypes formed by two genes identical by descent, which are the genes whose frequencies are the largest (those of the lines).
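A table with the same layout as the one described can be regenerated from the closed formula (2L + S)(1 + F)/[4(L + S)^2]. The sketch below is an assumption-laden illustration: the exact combinations used in Table 1 are not listed in the text, so the code enumerates every mixture with an even number of lines L for 6, 8, 10, and 12 initial lines, which happens to give 22 combinations and 88 synthetics. Names such as `table_rows` are hypothetical.

```python
from fractions import Fraction

F_LEVELS = [Fraction(0), Fraction(1, 2), Fraction(3, 4), Fraction(1)]
INITIAL_LINES = [6, 8, 10, 12]


def f_syn_mixed(L, S, F):
    """Closed formula for the IC of a synthetic from L lines and S single crosses."""
    return (2 * L + S) * (1 + F) / Fraction(4 * (L + S) ** 2)


def table_rows():
    """One row per (initial lines, L, S, F); L is assumed to take even values only."""
    rows = []
    for n in INITIAL_LINES:
        for L in range(n, -1, -2):
            S = (n - L) // 2
            for F in F_LEVELS:
                rows.append((n, L, S, F, f_syn_mixed(L, S, F)))
    return rows


rows = table_rows()
print(len(rows))  # number of synthetics enumerated under these assumptions
for n, L, S, F, ic in rows:
    if n == 12 and F == Fraction(1, 2):
        print(f"lines={n} L={L} S={S} F={float(F):.2f} IC={float(ic):.4f}")
```

For 12 initial lines the printed ICs start and end at the same value (only lines, only single crosses) and peak at the mixture with L = S = 4, in line with the pattern described above.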
In summary, although the three synthetics have the same genes, the gene frequencies are the same only in $Syn_{SC}$ and $Syn_L$. Therefore, a) $Syn_{SC} = Syn_L \neq Syn_{L,SC}$, b) $FSyn_{SC} = FSyn_L < FSyn_{L,SC}$, and c) $Syn_{L,SC}$ must have the smallest genotypic mean (Busbice 1970).