Genetic approaches for studying transgene inheritance and genetic recombination in three successive generations of transformed tobacco
Tizaoui and Kchouk

Transgene integration into plant genomes is a complex process accompanied by molecular rearrangements. Classic methods that are normally used to study transgenic population genetics are generally inadequate for assessing such integration. Two major characteristics of transgenic populations are that a transgenic genome may harbor many copies of the transgene and that molecular rearrangements can create an unstable transgenic locus. In this work, we examined the segregation of T1, T2 and T3 transgenic tobacco progenies. Since transfer DNA (T-DNA) contains the NptII selectable marker gene that confers resistance to kanamycin, we used this characteristic in developing a method to estimate the number of functional inserts integrated into the genome. This approach was based on calculation of the theoretical segregation ratios in successive generations. Mendelian ratios of 3:1, 15:1 and 63:1 were confirmed for five transformation events whereas six transformation events yielded non-segregating progenies, a finding that raised questions about causal factors. A second approach based on a maximum likelihood method was performed to estimate recombination frequencies between linked inserts. Recombination estimates varied among transformation events and over generations. Some transgenic loci were unstable and evolved continuously to segregate independently in the T3 generation. Recombination and amplification of the transgene and filler DNA yielded additional transformed genotypes.


Introduction
Plant transformation mediated by Agrobacterium tumefaciens has become the most widely used method for introducing foreign genes into plant cells. This method yields a high level of perfect transgenic loci with complete conservation of the host genome (Pawlowski and Somers, 1996). The mechanisms involved in the integration of transfer DNA (T-DNA) are still not well characterized, although integration is considered to occur by illegitimate recombination (Tinland, 1996; Gorbunova and Levy, 1997; Salomon and Puchta, 1998; Britt, 1999; Brunaud et al., 2002; Van Attikum and Hooykaas, 2003). The structure of transgenic loci depends on genomic factors and not on the method by which the transgene is transferred to the genome (Somers and Makarevitch, 2004). The complexity of integration mechanisms leads to transgenic loci consisting of two or more copies of the transgene (De Neve et al., 1997; Takano et al., 1997). Frequently, copies of the transgene are arranged in the same direction and separated by filler DNA (Krizkova and Hrouda, 1998). Integration is often associated with complex rearrangements including deletions, filler DNA, inversions and duplication of the original inserted sequence (Jones et al., 1994; Zhu et al., 2010).
Intra-chromosomal recombination occurs during meiosis and mitosis, with spontaneous recombination normally being a rare event (10^-6 to 10^-5 events per cell division). Embryogenic cells have the highest recombination ability, with an average of 3 × 10^-5 recombination events per genome (Yang et al., 2010). The frequency of recombination can be strongly increased by T-DNA integrations, which cause double-strand breaks (Gorbunova and Levy, 1999; Wehrkamp-Richter et al., 2009), or by other factors related to stress (Dong, 2004). Recombination between copies of the transgene has been reported for transgenic loci in various plant species (Jones et al., 1985; Eckes et al., 1986; Christou et al., 1989; Tovar and Lichtenstein, 1992; Choffens et al., 2001).
The objective of this study was to examine the number of functional inserts, the mode of transgene inheritance and the recombination frequencies of linked inserts in the first three generations of transgenic tobacco lines.

Transgenic material
Nicotiana tabacum L. plants were genetically transformed with Agrobacterium tumefaciens by using the leaf disk method. The T-DNA consisted of a traditional cassette made up of the neomycin phosphotransferase selection marker gene (NptII) driven by the nopaline synthase promoter and the β-glucuronidase reporter gene (GUS) under control of the CaMV 35S promoter (Sanders et al., 1987). Only the expression and inheritance of NptII, which confers resistance to aminoglycoside antibiotics, were studied. Transformed plants were regenerated on selective medium containing kanamycin and were grown in a chamber under carefully controlled conditions. After three weeks, transformed plants were transferred to larger pots and kept in a greenhouse with no pollinating insects or wind in order to avoid pollen dispersal and hybridization. At maturity, seeds from each plant (T0) were collected in a single tube and preserved in bottles with silica gel to prevent moisture uptake and the loss of germination capacity (S. Thaminy, unpublished data). Seeds from each plant (line T0) were cultured on medium containing MS salts (Murashige and Skoog, 1962) and 100 mg of kanamycin/L in Petri dishes; this concentration of antibiotic was sufficient to select transformed plants (Klein et al., 1988; Tavazza et al., 1988; Misra, 1989). In this assay, non-transformed seedlings turned brown and died while transformed seedlings survived and grew normally. T1 transgenic plants (n = 25-30) were transferred to larger pots and kept in a greenhouse. At maturity, seeds from each line were collected and mixed to form the "bulk" for the T2 generation. Subsequently, 50-100 seeds of the bulk were cultured on selective medium and the T2 resistant plants (n = 25-30) were transferred to soil. At maturity, seeds from each transformed plant were harvested and tested for resistance to kanamycin. Table 1 summarizes the experimental protocol.

Segregation analysis
Segregation analysis was done using the χ² test, in which observed values were compared to theoretical values corresponding to the integration of one or more copies of the transgene.
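As an illustration only, the goodness-of-fit comparison described above can be sketched in Python. The sketch assumes that a T0 plant is hemizygous for n independently segregating, fully expressed inserts, so that selfing gives a sensitive fraction of (1/4)^n (ratios of 3:1, 15:1 and 63:1). The 5% significance level, the function names and the seedling counts in the usage note are assumptions made for this example, not values taken from the study.

```python
# Critical value of the chi-square distribution with 1 degree of freedom
# at alpha = 0.05 (assumed significance level; the text does not state one).
CHI2_CRIT_DF1 = 3.841


def expected_fractions(n_inserts):
    """Expected (resistant, sensitive) fractions after selfing a plant
    hemizygous for n independent dominant inserts (hypothetical helper)."""
    sensitive = 0.25 ** n_inserts
    return 1.0 - sensitive, sensitive


def chi_square(resistant, sensitive, n_inserts):
    """Chi-square statistic for observed counts against the n-insert ratio."""
    total = resistant + sensitive
    p_res, p_sen = expected_fractions(n_inserts)
    exp_res, exp_sen = total * p_res, total * p_sen
    return ((resistant - exp_res) ** 2 / exp_res
            + (sensitive - exp_sen) ** 2 / exp_sen)


def compatible_hypotheses(resistant, sensitive, max_inserts=3):
    """Insert numbers whose expected ratio is not rejected at the 5% level."""
    return [n for n in range(1, max_inserts + 1)
            if chi_square(resistant, sensitive, n) < CHI2_CRIT_DF1]
```

With made-up counts, `compatible_hypotheses(150, 10)` retains only the two-insert (15:1) hypothesis, whereas `compatible_hypotheses(75, 25)` retains only the single-insert (3:1) hypothesis.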

Estimation of recombination frequencies
Recombination frequencies were estimated using a genetic approach derived from the maximum likelihood method. Suppose that there are two linked inserts in the host genome, with both insertions (genes) having two allelic forms (K1^R and K1^S for the first gene and K2^R and K2^S for the second gene). K^R indicates the presence of a functional insert whereas K^S indicates the absence of the insert. The two transgenes are inserted in cis or in trans. The distance separating the physically linked inserts is defined as d and r is the recombination frequency. When r ≥ 50% the two transgenes segregate independently and when r < 50% the two transgenes recombine with a frequency r. For two linked inserts in cis, the expected phenotypic frequencies are written as functions of r, where n1 is the frequency of sensitive seedlings, n2 is the frequency of resistant seedlings and n1 + n2 = 1. The probability of observing n1 or n2 follows a multinomial law, and the distance d between the two inserts is defined from r (the corresponding expressions are given in Table 2). For two linked inserts in trans, the recombination frequency was estimated as described above for insertion in cis, but with the expressions for n1 and n2 adapted to the trans configuration. Similarly, for cases involving three transgenes with two inserts linked in cis or trans, the recombination frequency was estimated based on the principles outlined above; Table 2 summarizes the calculations involved in this analysis.
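The authors' equations are not reproduced in this text, so the following Python sketch is a reconstruction under stated assumptions rather than their method. It assumes a selfed plant hemizygous for two linked inserts, in which a seedling is sensitive only if both gametes carry no insert. In cis, the insert-free gamete is a parental type with frequency (1 - r)/2, so n1 = ((1 - r)/2)^2; in trans it is a recombinant type with frequency r/2, so n1 = (r/2)^2. With only two phenotypic classes, the maximum likelihood estimate of n1 is the observed sensitive fraction, and the estimate of r follows by inverting these expressions. The function name and the truncation to the interval [0, 0.5] are choices made for this example.

```python
import math


def estimate_recombination(sensitive, resistant, phase="cis"):
    """Estimate r from seedling counts for two linked inserts
    (hypothetical helper; see the assumptions in the text above)."""
    total = sensitive + resistant
    if total <= 0:
        raise ValueError("at least one seedling must be scored")
    if phase not in ("cis", "trans"):
        raise ValueError("phase must be 'cis' or 'trans'")
    n1 = sensitive / total          # observed frequency of sensitive seedlings
    root = math.sqrt(n1)            # estimated frequency of insert-free gametes
    r = 1.0 - 2.0 * root if phase == "cis" else 2.0 * root
    # r is only interpretable between 0 (complete linkage) and 0.5 (independence).
    return min(max(r, 0.0), 0.5)
```

Under these assumptions, a 3:1 progeny gives r = 0 in cis (the two inserts behave as one locus) and a 15:1 progeny gives r = 0.5 (independent segregation), which matches the way the two limiting cases are interpreted in the results.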

Theoretical considerations
Segregation ratios depend on the number of functional inserts integrated into T0 plants. The greater the number of copies of independently segregating inserts in the genome, the greater the probability of obtaining K^R gametes and, consequently, the higher the ratio of kanamycin-resistant plants. The marker gene NptII is considered a dominant trait. Self-pollinated tobacco plants were grown in a greenhouse to avoid inter-crosses. In this model, there are two variables: generation (T_1, T_2, ..., T_n, T_{n+1}) and the number of inserts (I_1, I_2, ..., I_n, I_{n+1}). Theoretical values in transgenic populations were calculated from the following quantities: XT_n is the number of possible genotypes or zygotes in generation T_n, XT_{n+1} is the number of possible genotypes or zygotes in generation T_{n+1} and y is the number of possible gametes, which depends on the number of insertions. Generation varies while y is stable.
yI_n is the number of possible gametes in the case of n inserts and yI_{n+1} is the number of possible gametes in the case of n + 1 inserts.
Equation (9) is applied in the case of one insert, where xT_{n+1} is the number of sensitive seedlings (K^S//K^S) in generation T_{n+1}, xT_1 is the number of sensitive seedlings (K^S//K^S) in the case of one insert in T1, xT_n is the number of sensitive seedlings (K^S//K^S) in generation T_n and XT_1 is the total number of possible zygotes in T1. This equation can be extended to the case of two, three, ..., I_n, I_{n+1} inserts, where xI_{n+1} is the number of sensitive seedlings (K^S//K^S) in the case of n + 1 inserts and xI_1 is the number of sensitive seedlings in the case of one insert. When the progenies of each T0 line were tested on selective medium, sensitive homozygous seedlings (K^S//K^S) died. This selection was taken into account when calculating the theoretical segregation ratios (Table 3). When the number of inserts was greater than one copy, Eq. (11) was used to calculate the theoretical number of kanamycin-sensitive homozygous plants in each generation. Sensitive homozygous plants unable to grow on selective medium did not participate in reproduction in the next generation and were excluded from calculations. The populations tended to be homozygous at equilibrium.
In Eq. (11), xT_{n+1} is the number of sensitive seedlings (K^S//K^S) in generation T_{n+1} with selection against (K^S//K^S) in generation T_n.
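The effect of selection on the theoretical ratios can be illustrated with a short Python sketch. It is not the authors' Eq. (9) to (11), which are not reproduced here. It assumes a single fully expressed insert, a hemizygous T0 plant, strict selfing, equal fertility of all resistant plants, bulked seed in each generation and complete removal of K^S//K^S seedlings before reproduction. The function names are hypothetical.

```python
from fractions import Fraction


def sensitive_fraction_by_generation(generations):
    """Expected K^S//K^S fraction in T1..Tn for one insert under bulk
    selfing with removal of sensitive seedlings (illustrative model)."""
    het = Fraction(1)                # T0 parent is hemizygous (K^R//K^S)
    fractions_out = []
    for _ in range(generations):
        # Selfing: K^R//K^R parents breed true; hemizygotes give 1:2:1.
        hom_res = (1 - het) + het / 4
        new_het = het / 2
        sensitive = het / 4
        fractions_out.append(sensitive)
        # Selection: only resistant seedlings become parents of the next bulk.
        het = new_het / (hom_res + new_het)
    return fractions_out


def t1_ratio(n_inserts):
    """Resistant:sensitive ratio in T1 for n independent inserts."""
    return 4 ** n_inserts - 1, 1
```

Under these assumptions, `sensitive_fraction_by_generation(3)` gives 1/4, 1/6 and 1/10, i.e. resistant:sensitive ratios of 3:1, 5:1 and 9:1, so the bulk drifts toward homozygosity as stated above. `t1_ratio` reproduces the 3:1, 15:1 and 63:1 ratios for one, two and three independent inserts.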

Identification of transgenic plants
Eleven self-pollinated transgenic tobacco lines considered as separate transformation events were analyzed (Table 4). Kanamycin-resistant plants had green leaves, were well rooted and developed on selective medium. In contrast, kanamycin-sensitive plants had yellow leaves and weak rooting with delayed growth; these plants died early at the two-leaf stage. The frequency of kanamycin-resistant plants varied among progenies. Lines L1 and L7 had homogenous progenies with a kanamycin-resistant phenotype in T1 and T2 generations. There was no segregation in either generation, possibly because the parent plants were homozygous for one or more copies of the transgene.
Table 2 (legend). d: distance between two linked inserts; r: frequency of recombination between two linked inserts; n1: frequency of sensitive seedlings; n2: frequency of resistant seedlings.
Line L2 progeny was homogenous for kanamycin resistance in T1 and heterogeneous in T2. The segregation in T2 did not reflect the hypothesis of a homozygous locus. This finding indicated that the genome harbored many copies of the transgene since the frequency of resistant plants depends on the number of expressed inserts.
Lines L3r, L14r and L16r had both kanamycin-resistant and kanamycin-sensitive plants in T1, but in T2 all progenies were kanamycin-resistant. The increase in transgene expression may have resulted from the amplification of one or more copies of the original insert or, alternatively, the transgenic locus may have become homozygous in T2.
Lines L4, L4r, L6, L17 and L17r had heterogeneous progenies in both generations and the respective parents may have been heterozygous for one or more copies of the transgene. Segregating progenies are the primary material for studying transgene inheritance and recombination.
Inter-transformant variability is attributable to the fact that each transgenic line is the result of a single, separate transformation event, with one or more NptII transgenes being inserted at a single locus, or inserted independently at segregating loci. Consequently, screening a transgenic line based only on transgene expression is insufficient; it is more convenient to screen for plants with a single copy of the transgene.

Segregation analysis
Hypotheses of segregation can be tested only for heterogeneous progenies (Table 5). The progenies of L2 were homogenous with regard to the resistant phenotype in T1. The number of individuals screened was much greater in T2, in which a sensitive phenotype was observed. The χ² test was significant for one, two and three inserts, suggesting that this line may harbor more than three copies of the transgene.
Lines L3r, L14r and L16r had heterogeneous progenies in T1. The χ² test suggested the presence of more than one insert for lines L3r and L16r, whereas for L14r, this test was significant for one, two and three inserts. The corresponding progenies became homogenous and had a kanamycin-resistant phenotype in T2, making it impossible to test the hypothesis of segregation in this generation. The instability of these transgenic lines may be attributable to amplification of the original transgenic loci or other complex rearrangements.
Lines L4, L4r, L6, L17 and L17r had heterogeneous progenies in T1 and T2. Line L4r had a stable 3:1 segregation ratio in both generations, in agreement with the presence of a single functional insert. For line L4, the hypothesis of a single insert was confirmed in T1 but not verified in T2. A 15:1 segregation ratio was confirmed for L6 in T1 and T2 and the hypothesis of three inserts was also confirmed in T2. For lines L17 and L17r, the χ² test was not significant for a single insert in T1 and not significant for two inserts in T2. The frequency of resistant individuals increased in T2, possibly as a result of recombination and amplification.

Genetic recombination analysis
When inserts were physically linked (Figure 1), the distance between the two inserts and their position (cis or trans) need to be considered. Only distances between 0 and 0.5 Morgans were considered (Table 6). This analysis in lines L3r and L16r confirmed the hypothesis of a single insert. The estimated recombination frequencies between two linked inserts in cis were 32% for L3r and 20% for L16r. Linked inserts were not sufficiently far apart to segregate independently, which explains why the χ² test was not significant for a single copy of the transgene. For L14r, the χ² test was not significant for one, two and three transgene copies. The estimated recombination frequency between two linked inserts in cis was 26%.
Figure 1 (legend). K1^R and K2^R indicate transgenic loci whereas K1^S and K2^S indicate absence of the transgene on the chromosome. [K^R] and [K^S] are the kanamycin-resistant and kanamycin-sensitive phenotypes, respectively. r: recombination frequency between the linked inserts. Gametes T0 are gametes produced by T0 transformed plants. When r ≥ 0.5, the two transgenes segregate independently, while for r < 0.5 the two transgenes recombine with a frequency (r) equal to the distance separating the physically linked inserts.
For L2 in T2, the χ² test was highly significant for the presence of two copies and significant for three inserts. Based on estimated recombination frequencies, we concluded that L2 harbored three inserts, of which two were linked in trans and recombined with a frequency of 40%.
T1 of line L4 confirmed the hypothesis of one copy, with a recombination frequency of 0%. Hypotheses of one and two independent inserts were not confirmed in T2 and the estimated recombination frequency between two linked inserts in cis was very low (4%). For line L4r, the χ² test confirmed the hypothesis of a single insert in T1 and T2 and the recombination frequencies were not considered in these cases.
For line L6, the hypothesis of two inserts was confirmed in T1 and T2, both of which had high recombination frequencies (46% in T1 and 49% in T2), indicating that the two inserts segregated independently.
For line L17, in which the hypothesis of a single insert was confirmed in T1, the recombination frequency was low (7%). In T2, the hypothesis of two inserts was confirmed with a recombination frequency of 43%. These results suggested the presence of two tightly linked inserts in cis in T1, with a low recombination frequency. The transgenic locus evolved in T2 and inserts were far enough apart to recombine with a high frequency.
For line L17r in T1, two linked inserts in cis recombined with a frequency of 24%, which explained why the χ² test was not significant for the presence of a single insert. The recombination frequency increased sufficiently (to 42%) to allow independent segregation in T2.
The findings described above indicated that L4, L17 and L17r carried unstable transgenic loci. Since filler DNA is a property of complex integration and amplification was detected at sites of insertion, we hypothesized that linked inserts were far apart because filler DNA may have been amplified. Similarly, the original copy of the transgene could have been amplified or duplicated. Molecular approaches could be used to confirm these hypotheses.

T3 generation analysis and screening for good transformation events
For each line with heterogeneous progenies in T1 and T2 we separately analyzed six T3 progenies obtained by self-pollination of T2 individuals (Table 7). Lines L4-x-1, L4-x-2 and L4-x-3 had heterogeneous progenies with a ratio of 15:1, indicating the segregation of two independent inserts. The corresponding recombination frequencies were high (44%, 40% and 46%, respectively). These lines might be heterozygous with two copies of the transgene. Two progenies, originating from L4-x-5 and L4-x-6, were homogenous for the kanamycin-resistant phenotype, indicating that the corresponding parents might be homozygous for one or two copies of the transgene. For line L4-x-4, the presence of one, two and three copies of the transgene was not confirmed and the recombination frequency between two linked inserts in cis was 27%. Line L4 was possibly heterozygous for two tightly linked inserts in cis in T1, with recombination frequencies varying between 27% and 46%. The original parent may also have been heterozygous for a single insert which was amplified in T3.
Two lines (L6-x-1 and L6-x-3) had heterogeneous progenies, which confirmed the hypothesis of two independently segregating transgenic loci. We rejected the possibility of two linked inserts in trans because the hypothesis of two independent inserts was confirmed in T1 and T2. The hypothesis of three inserts, two of which were linked in cis, was also confirmed; in this case, the recombination frequencies were 38% (L6-x-1) and 35% (L6-x-3). Four lines (L6-x-2, L6-x-4, L6-x-5 and L6-x-6) had homogenous progenies with a kanamycin-resistant phenotype; these lines may be homozygous for one, two or three copies of the transgene.
T3 confirmed the stability of line L4r; three lines (L4r-x-1, L4r-x-2, L4r-x-1) had segregation ratios indicative of the presence of a single functional transgenic locus. Lines L4r-x-3, L4r-x-4 and L4r-x-6 were homozygous for one copy of the transgene. From these three progenies, we screened homozygous lines for a single copy of the transgene with stable and acceptable transgene expression. T3 progenies showed Mendelian inheritance of the transgene and confirmed the hypotheses for T1 and T2.
Line L17r-x had only one heterogeneous progeny (L17r-x-5) and the estimated recombination frequency between two linked inserts in cis was 31%. Lines L17r-x-1, L17r-x-2, L17r-x-4 and L17r-x-6 were probably homozygous for two copies of the transgene because their progenies were homogenous for kanamycin resistance. Interestingly, for the kanamycin-sensitive progeny of L17r-x-3 the genotype was probably homozygous (K^S//K^S). Sensitive progeny would be expected with a probability of 1:4 in the case of one copy and 1:16 in the case of two copies. For the genetic engineer who desires high-level expression of the transgene, the best approach would be to screen homozygous lines for two copies of the transgene in homogenous kanamycin-resistant progenies of lines L4, L6, L17 and L17r. In addition, lines that yielded homogenous resistant progenies in T1 and/or T2 (L1, L7, L3r, L14r and L16r) may have more than three copies of the transgene.

Discussion
The analysis of transgenic segregating progenies based on the two approaches described here provided additional information concerning the transgenic population. The major findings of this study agreed with those of previous reports. Genetic analysis confirmed high inter-transformant variability. Indeed, expression levels can vary considerably among plants transformed with the same construct (Hobbs et al., 1990; Peach and Velten, 1991) and in most cases, this expression does not correlate with the copy number (Mlynarova et al., 1991; Hobbs et al., 1993). The copy numbers of transgenic and rearranged fragments are often highly variable, possibly because one or more transgenes can occur at any site.
The two approaches described here were useful for confirming hypotheses regarding the number of insert copies (one insert, two independent or linked inserts, three independent inserts or three inserts of which two were linked) but were unsuitable for non-segregating progenies. Lines with homogenous kanamycin-resistant progenies in T1, T2 and T3 may harbor many copies of the transgene. The best hypothesis for explaining non-segregating progenies is that each sister chromatid possesses a functional transgene. Kohli et al. (1998) reported that the first integrated site acts as a hot spot to integrate more copies of the transgene. This can result in multiple T-DNA insertions (De Neve et al., 1997), with single transgene insertions occurring at a low frequency (Huang et al., 2001). T-DNA acts as an endogenous stimulus that activates the cellular machinery (Fagard and Vaucheret, 2000). As a result, a previously stable genome can become particularly reactive in response to newly inserted transgenes, depending on the extent of inter-genic reactions (Jones et al., 1985; Gheysen et al., 1987; Mayerhofer et al., 1991; Petrov, 1997; Drews and Yadegari, 2002; Brunaud et al., 2002; Van Attikum and Hooykaas, 2003).
Inter-transformant variability was accompanied by variation within the transformed line; transgene expression in most of the lines was unstable and increased across generations. Enhanced transgene expression can be explained by amplification or duplication of the original transgene loci. Since amplification and duplication are frequent events during the repair of double-strand breaks (Spencer et al., 1992; Cannell et al., 1999; Cucu et al., 2002), the number of transgene copies increases in the host genome. This explains why the progenies of lines L3r, L14r and L16r became homogenous for the kanamycin-resistant phenotype in T2 and T3. This observation agrees with the finding of Yong et al. (2006), who reported that homozygous transgenic progeny plants were obtained in T2. In meiotic cells, a copy of the transgene on one chromatid can be passed to the allelic position on the opposite homologue so that the transformed line becomes homozygous for the transgene. Moreover, for self-pollinating species, all loci become homozygous at equilibrium. Upon selfing, the epigenetically silenced loci may segregate, thereby restoring expression of the trans-silenced locus (Khaitová et al., 2011).
The modified maximum likelihood method used here showed that there was frequent crossing-over between linked inserts. Crossing-over occurs naturally in plants and its major role is to generate new genetic combinations; this phenomenon is observed at meiosis and during mitosis between sister chromatids (Gal et al., 1991; Gorbunova and Levy, 1999). The frequency of crossing-over increases in response to endogenous and exogenous stimuli such as transgenes newly integrated into the genome. Filler DNA, which has been observed in complex transgenic loci (Gheysen et al., 1987; Krizkova and Hrouda, 1998; Brunaud et al., 2002; Theuns et al., 2002; Somers and Makarevitch, 2004), may be amplified such that inserts that were previously tightly linked at the same transgenic site become sufficiently separated from each other physically to allow detectable crossing-over.
Several studies have shown that transgene integration sites exhibit different levels of structural complexity ranging from the simple integration of two apparently contiguous transgene copies to tightly linked clusters of multiple copies of transgenes interspersed with host DNA (Svitashev et al., 2000). Epistatic interaction between different loci and/or allelic interaction within a single locus also occur (Matzke and Matzke, 1995; Nap et al., 1997). In the present study, only line L4r was stable, with a 3:1 segregation ratio, indicating the presence of a single functional transgenic locus. This line represented a good transformation event since the stability of transgene expression is a challenge for genetic engineering. However, such analyses should not be limited to the first or second generation.
The results described here showed that transgene inheritance followed Mendelian laws. Mendelian segregation has not been verified for most transformed lines because of transgene instability. The instability of transgenic loci may reflect complex rearrangements, especially amplification of the transgene and filler DNA. Amplification can increase the recombination frequencies, leading to more transformed genotypes. The scenario of transgene introduction may reflect what happened in the history of gene movement among relatives in land races or through horizontal gene transfer (Parrott, 2010).
The genetic approaches developed in this work were efficient because they allowed us to address fundamental and practical issues: (1) they allowed us to screen for stable genetic transformation events that are desirable for breeding programs, (2) they provided insights into the evolution and variation of transgenic loci in early generations (T1, T2 and T3) and (3) they facilitated the study of transgene inheritance. However, future investigations should use molecular analyses such as quantitative PCR to quantify transgenes in the host genome.