Auxin independent1 (Axi1) as an endogenous gene for copy number determination in transgenic tobacco

Determining the copy number is important because it can greatly influence the expression level and genetic stability of transgenes. This study aimed to establish a methodology that can estimate the transgene copy number in tobacco (Nicotiana tabacum) using the Auxin independent1 (Axi1) gene as an endogenous control. Tobacco cv. Petit Havana plants were transformed via Agrobacterium tumefaciens with a pMCG1005 vector that contained Bar as a selectable marker gene. Oligonucleotide efficiency was determined and a qPCR using the 2^(-ΔCt) and 2^(-ΔΔCt) methods was performed. Bar was the target gene and Axi1 was the endogenous control in five transgenic tobacco events. The results showed that the Axi1 gene was donated by the maternal parent N. sylvestris when interspecific hybridization occurred between N. sylvestris and N. tomentosiformis. The copy number results agreed with the segregation ratios for the Bar gene in T1 plants, which confirmed that Axi1 is a single-copy gene that can be used as an endogenous control.


INTRODUCTION
Genetic engineering incorporates new genes into transgenic plants from completely unrelated species, which can be transmitted to their descendants (Low et al. 2018, Nazir et al. 2019). In 1983, the first transgenic tobacco (Nicotiana tabacum) plants containing a gene coding for resistance against the antibiotic kanamycin were generated (Fraley et al. 1983). Since then, several proteins have been expressed in transgenic tobacco and other plants (Jube and Borthakur 2007, Burnett and Burnett 2019, Moon et al. 2020). Tobacco is a highly productive, non-food crop that is cultivated in about 120 countries throughout the world. Furthermore, it is considered to be a model plant for scientific research because of its well-established regeneration and transformation methodologies, and the existence of a sequenced genome (Edwards et al. 2017).
The genetic transformation of model plants is key to validating gene function. Transgenic plants can be used to characterize single genes and reconstruct gene networks that control plant biochemistry, physiology, and morphology under different conditions (Kochetov and Shumny 2017). Molecular characterization is necessary after transformation because copy number and the insertion site can influence gene stability and the expression levels of the transgene (Gadaleta et al. 2011). Transgenic (T0) plants are hemizygous for the gene of interest and, if they inherit a dominant trait and follow Mendelian inheritance, they will produce homozygous and hemizygous plants after segregation. Out of all the generated plants, only one quarter of the T1 generation are homozygous plants that can be used to improve crops. Therefore, it is important to find simple, reliable, and high-throughput techniques that can detect the zygosity of transgenes (Passricha et al. 2016). Southern blotting was used for a long time to determine copy numbers, but it is a laborious and time-consuming method. It requires considerable amounts of DNA from samples and, in some cases, hazardous radioisotopes. Real-time PCR was identified as a fast, sensitive, and accurate technique that could be used to determine the number of inserted copies. However, to ensure that this method produces accurate results, it is essential to have a single-copy gene as a reference, such as Alcohol dehydrogenase 1 (Adh1) in maize (Ingham et al. 2001) and a gene encoding the I/Y protein of a mobile element (hmg I/Y) in Brassica napus (Masek et al. 2000, Weng et al. 2005). Few genes in tobacco have been characterized as potential single-copy reference genes. Šubr et al. (2006) were the first to use the Auxin independent1 (Axi1) gene to determine the copy number in transgenic plants, which had been transformed with the P3 gene of Potato virus A (PVA), an aphid-transmitted potyvirus (Nováková et al. 2005). However, they were unable to prove that Axi1 was a single-copy gene. The Axi1 gene was isolated from a DNA-tagged plant mutant collection that aimed to isolate genes involved in auxin action. When Axi1 expression is interrupted, the protoplasts gain the ability to grow in culture in the absence of auxin or at high auxin concentrations, which is uncommon in wild-type protoplasts (Walden et al. 1994).
SS Lopes et al.
Tobacco is an allotetraploid and its large genome (4.5 Gb) contains a high proportion of repetitive sequences (> 70%). The species N. tabacum (2n = 4x = 48) evolved from the interspecific hybridization of the ancestors Nicotiana sylvestris (2n = 24, maternal donor) and Nicotiana tomentosiformis (2n = 24, paternal donor) about 400,000-800,000 years ago (Clarkson et al. 2017). The results from this study showed that Axi1 is a single-copy gene and that it came from the maternal donor N. sylvestris. We also established a methodology to estimate transgene copy number in transgenic tobacco using real-time PCR and the 2^(-ΔΔCt) method.

Cloning the Axi1 gene
Oligonucleotides were designed and used to clone a 727 bp fragment from the Axi1 gene (GenBank: X80301) (Axi1CloneF: 5'-AGATGCAGTTGTTGCAGCTC-3' and Axi1CloneR: 5'-TCAGATGCAAGGCAACAAAG-3'). The PCR reaction solution contained 40 ng tobacco genomic DNA (20 ng µL⁻¹), 0.5 mM MgCl2, 0.5 µM of each primer, and 1× Promega GoTaq® Colorless Master Mix in a final volume of 20 µL. The reaction solutions were heated to 94 °C for 2 min, followed by 35 cycles of 94 °C for 20 s, 61 °C for 20 s, and 72 °C for 1 min. The final step was 72 °C for 5 min. The PCR product was run on a 1% agarose gel and visualized with GelRed (Biotium, Hayward, CA, USA). The band from the fragment of interest was isolated from the gel and purified using a QIAquick Gel Extraction Kit (Qiagen, Hilden, Germany) and then cloned into the pGEM®-T Easy Vector System (Promega Corporation, Madison, WI, USA). Escherichia coli DH5α cells were transformed using the freeze-thaw method and the plasmid DNA from the recombinant clones was extracted using a Wizard® Plus SV Minipreps DNA Purification System kit (Promega Corporation, Madison, WI, USA). The plasmid DNA was quantified using a NanoDrop 1000 spectrophotometer (Thermo Fisher Scientific, Foster City, CA, USA). All procedures were performed according to the manufacturers' instructions.

Determination of the Axi1 gene copy number in the tobacco genome
The copy number of the Axi1 gene in the tobacco genome was determined using a standard four-point curve obtained from a 1:10 serial dilution of plasmid DNA containing 100 000, 10 000, 1000, and 100 copies of the Axi1 gene. The copy number for the plasmid (starting amount) was calculated using the formula:
\[ \text{Copy number}\ (\mu\text{L}^{-1}) = \frac{\text{mass (g)}}{\text{molecular mass}} \times 6.022 \times 10^{23} \]
The Ct values were plotted against the log of the copy number and the standard curve was generated from a fitted linear regression of the plotted points. The PCR amplification efficiency (E) was calculated from the slope of the fitted line using the equation in Rasmussen (2001). The copy number of the Axi1 gene in the tobacco genome was determined by taking 87 ng (10 000 copies of the genome; 2C content = 8.7 pg) from three independent genomic DNA samples. The results were then interpolated using the standard curve.
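The copy-number arithmetic in this subsection can be illustrated with a short Python sketch. It is not the authors' code. The average mass of 650 g mol⁻¹ per base pair, the 3015 bp length assumed for the empty pGEM-T Easy vector, and the function names are assumptions introduced here for illustration; only the 727 bp insert, the 8.7 pg 2C content, and the 87 ng input come from the text.

```python
AVOGADRO = 6.022e23          # molecules per mole, as in the formula above
MASS_PER_BP = 650.0          # g/mol per base pair (ASSUMED average for dsDNA)
VECTOR_BP = 3015             # ASSUMED length of the empty vector; check the vector map
INSERT_BP = 727              # cloned Axi1 fragment (from the text)


def copies_from_mass(mass_g, length_bp, mass_per_bp=MASS_PER_BP):
    """Number of double-stranded DNA molecules in `mass_g` grams."""
    molecular_mass = length_bp * mass_per_bp   # g/mol
    return mass_g / molecular_mass * AVOGADRO


def genome_copies(dna_ng, genome_content_pg):
    """Genome equivalents in `dna_ng` nanograms of genomic DNA."""
    return dna_ng * 1000.0 / genome_content_pg   # 1 ng = 1000 pg


# Copies of the recombinant plasmid in 1 ng of plasmid DNA (hypothetical amount).
plasmid_copies_per_ng = copies_from_mass(1e-9, VECTOR_BP + INSERT_BP)

# Genome equivalents in the 87 ng of tobacco DNA used per reaction.
tobacco_genomes = genome_copies(87.0, 8.7)
```

With these inputs, 87 ng of genomic DNA corresponds to about 10 000 genome equivalents, which is why the genomic samples are compared with the 10 000-copy point of the standard curve.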
The PCR reactions contained plasmid DNA from a 1:10 serial dilution (100 000, 10 000, 1000, and 100 copies of the Axi1 gene), and there were three biological replicates and three technical replicates. The reactions also contained 87 ng genomic DNA, 1× Fast SYBR® Green Master Mix (Applied Biosystems, Thermo Fisher Scientific), and 1 µM of each primer (Axi1Forward: 5'-GCCGTCCTTGTAGTTCCAAA-3' and Axi1Reverse: 5'-AGCGGTCGACATCAAAAATC-3') (Šubr et al. 2006) in a final volume of 10 µL. The Axi1 primers were located on exons five (forward) and six (reverse), and they amplified an 85 bp section of DNA. A 7500 Fast Real-Time PCR System (Applied Biosystems, Thermo Fisher Scientific) was used for the PCR and the amplification conditions were 95 °C for 20 s, followed by 40 cycles of 95 °C for 3 s and 60 °C for 30 s, as recommended by the manufacturer.
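The standard-curve procedure described above (regress Ct on log10 copy number, derive the efficiency from the slope, then interpolate unknowns) can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' analysis: the Ct values are invented, the function names are hypothetical, and the efficiency is computed with the commonly used form E = 10^(-1/slope) − 1, which is assumed here because the equation itself is not reproduced in the text.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def efficiency(slope):
    """Amplification efficiency from the slope (ASSUMED form: 10^(-1/slope) - 1)."""
    return 10 ** (-1.0 / slope) - 1.0


def copies_from_ct(ct, slope, intercept):
    """Interpolate a copy number from a Ct value using the fitted curve."""
    return 10 ** ((ct - intercept) / slope)


# Four-point plasmid standard (copy numbers from the text, Ct values INVENTED).
STANDARD_COPIES = [100_000, 10_000, 1_000, 100]
STANDARD_CT = [18.0, 21.5, 25.0, 28.5]

slope, intercept = fit_line([math.log10(c) for c in STANDARD_COPIES], STANDARD_CT)
amplification_efficiency = efficiency(slope)

# Interpolate a hypothetical genomic DNA sample with Ct = 21.5.
genomic_axi1_copies = copies_from_ct(21.5, slope, intercept)
```

If the interpolated value for 87 ng of genomic DNA is close to the roughly 10 000 genome equivalents loaded, this is consistent with one Axi1 copy per 2C genome equivalent as defined in the text.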

Tobacco transformation with the pMCG1005 vector
Agrobacterium tumefaciens strain EHA101 harboring a pMCG1005 binary vector (McGinnis et al. 2005) that expressed the Bar gene driven by the 4× 35S promoter and the nopaline synthase gene terminator was used to transform axenic Nicotiana tabacum cv. Petit Havana plants following the protocol developed by Horsch et al. (1985). Tobacco leaf disks were placed in a Petri dish that was lined with tissue. Then 20 mL Agrobacterium tumefaciens culture (grown at 28 °C, 250 rpm) was added and the samples were left to culture overnight. After culturing, the leaf disks were removed and any excess bacterial culture was blotted from the tissue using sterile filter paper. The leaf disks were placed onto infection medium (4.3 g L⁻¹ MS salts mixture, 3% sucrose, 0.8% agar, 0.4 mg L⁻¹ thiamine-HCl, 100 mg L⁻¹ myo-inositol, 5 µM kinetin, and 13.5 µM 2,4-dichlorophenoxyacetic acid) and incubated for 48 hours at 26 °C in a lit growth chamber. Unless otherwise stated, all of the chemicals were obtained from the Sigma Corporation (Sigma, São Paulo, Brazil). The transformed tissues were selected on shoot induction medium (4.3 g L⁻¹ MS salts, 3% sucrose, 0.8% agar, 1 mL L⁻¹ 1000× MS vitamins stock (1000× MS stock solution: thiamine HCl 1 g L⁻¹, nicotinic acid 0.5 g L⁻¹, pyridoxine HCl 0.5 g L⁻¹, glycine 2 g L⁻¹, and myo-inositol 100 g L⁻¹), 0.5 µM α-naphthaleneacetic acid, 4 µM 6-benzylaminopurine, 100 mg L⁻¹ tioxin, and 3 mg L⁻¹ phosphinothricin (PPT)). The explants were transferred to new selection medium on a weekly basis and the shoots regenerated within three to four weeks. The transformed shoots rooted in two to three weeks and were maintained on 50 mL of shoot induction medium without phytohormones, but supplemented with 100 mg L⁻¹ tioxin and 1 mg L⁻¹ PPT, in Magenta GA-7 containers (Sigma) under continuous light at 26 ± 2 °C. The plants were then transferred from the culture medium to soil and grown in the greenhouse.

Copy number quantification of the transgenic tobacco events
The copy number was calculated for the five pMCG1005 events (T0) using the formula 2^(Ct reference − Ct transgene), which does not require a single-copy reference event (Zhang et al. 2015). After this analysis, a single-copy sample was selected as a reference to quantify the relative copy number using the 2^(-ΔΔCt) method (Livak and Schmittgen 2001).
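The two calculations named in this subsection can be sketched in Python as follows. This is a minimal illustration, not the authors' script: the Ct values and function names are hypothetical, and both formulas assume that the Bar and Axi1 amplicons amplify with similar, near-100% efficiency.

```python
def ratio_delta_ct(ct_reference, ct_transgene):
    """Transgene-to-reference ratio: 2^(Ct reference - Ct transgene)."""
    return 2.0 ** (ct_reference - ct_transgene)


def ratio_delta_delta_ct(ct_reference, ct_transgene,
                         ct_reference_cal, ct_transgene_cal):
    """Copy number relative to a calibrator sample: 2^(-ddCt).

    dCt  = Ct transgene - Ct reference (computed for sample and calibrator)
    ddCt = dCt(sample) - dCt(calibrator)
    """
    delta_ct_sample = ct_transgene - ct_reference
    delta_ct_calibrator = ct_transgene_cal - ct_reference_cal
    return 2.0 ** (-(delta_ct_sample - delta_ct_calibrator))


# HYPOTHETICAL Ct values for one T0 event: Bar crosses the threshold one
# cycle later than Axi1, i.e. half as many Bar templates as Axi1 templates.
event_ratio = ratio_delta_ct(ct_reference=22.0, ct_transgene=23.0)

# The same event compared with a HYPOTHETICAL single-copy calibrator event.
event_vs_calibrator = ratio_delta_delta_ct(22.0, 23.0, 22.5, 23.5)
```

A ratio near 0.5 from the first function is what a hemizygous single-copy insert gives when the reference gene is present on both homologues, and a value near 1.0 from the second function indicates the same copy number as the calibrator.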

Segregation test in transgenic tobacco plants
Twenty seeds from each of the five transgenic tobacco T1 pMCG1005 events were germinated in MS medium (Murashige and Skoog 1962) without the selection agent. The genomic DNA was extracted from the seedlings at two and a half months after germination, according to Saghai-Maroof et al. (1984). The DNA was quantified by a NanoDrop ND-1000 spectrophotometer (Thermo Fisher Scientific) and an integrity check was carried out using electrophoresis with 1.0% agarose gel stained with GelRed (Biotium).
The presence of the Bar gene was confirmed by PCR using 60 ng of genomic DNA, 0.5 μM of each primer (35SxÔmegaF: 5'-ACTATCCTTCGCAAGACCCTTC-3' and BarR: 5'-CTTCAGCAGGTGGGTGTAGAG-3'), 1× reaction buffer, 0.6 mM of dNTP, 1 U Invitrogen Platinum Taq DNA polymerase, and 5% DMSO in a final volume of 20 µL. The cycling conditions were 94 °C for 2 min, followed by 35 cycles of 94 °C for 20 s, 60 °C for 20 s, and 72 °C for 1 min, with a final extension of 72 °C for 5 min.
A χ² test was performed to compare the observed segregation of the transgene with the expected proportions.

RESULTS AND DISCUSSION
Tobacco is a model plant that is often used in plant-microorganism interaction and gene function studies (Clarkson et al. 2017, Chen et al. 2018, Bai et al. 2019). Real-time PCR was used to estimate the copy number in transgenic tobacco plants using Axi1 as the endogenous gene and Bar as the target gene. Quantitative PCR is based on a comparison between the target gene and the single-copy gene in each sample. Therefore, having a characterized single-copy gene is essential to ensure experimental accuracy. In this study, the 2^(-ΔCt) method was used to select a reference sample that could be utilized in large-scale experiments because it does not require a reference sample. After the first round of analysis, a single-copy event was selected as the reference sample for the 2^(-ΔΔCt) method, which was then used to estimate the copy number of the Bar gene in the transgenic tobacco genome. Finally, a Mendelian segregation analysis was carried out to check the real-time PCR results and to confirm that there was only one Bar gene insertion.
The 2^(-ΔΔCt) method requires an amplification efficiency of between 90 and 100%, and an equivalent efficiency between the endogenous control and the target gene. Therefore, the amplification efficiency was tested and the results showed a 93% efficiency rate. This meant that the Bar oligonucleotides could be used in the real-time PCR. Furthermore, the dissociation or melting curve showed the presence of a single product (Figure 1).
The standard curve was created using plasmids with increasing concentrations of the Axi1 gene fragment (727 bp) that ranged from 100 to 100 000 copies in the tobacco genomic DNA. The tobacco genomic DNA was quantified as having between 7000 and 12 650 copies after interpolating the results from the 10 000 copies point of the standard curve (Figure 1B). This result indicated that there was only one copy of the Axi1 gene in the tobacco genome and that it was inherited from only one parent.
Figure 1. Dissociation curve when Axi1 primers (A) were used. The standard curve for the Axi1 gene cloned in a plasmid with 100 000, 10 000, 1000, and 100 copies is in red and the Axi1 gene in the N. tabacum genome is in blue. There were three biological and three technical replicates (Nt 1-3, rep 1-3).
We generated seven transgenic events (T0), but only five events produced enough seeds to perform the experiments. Initially, the 2^(-ΔCt) method was performed, using Axi1 as the endogenous gene and Bar as the target gene in the T0 generation of five tobacco transgenic events, to identify a single-copy sample because we did not have a characterized single-copy sample for tobacco. After the first round of analysis, a reference sample (single copy) was selected and used to calculate the number of copies in the five events. The 2^(-ΔΔCt) method was used to estimate the copy number of the Bar gene in the transgenic tobacco genome and then a segregation test was performed (Figure 2). All five analyzed events were considered to be single-copy events (Table 1). The values 0.52 and 0.5 were interpreted as a copy number of 1 because the transgene was hemizygous (Tt) in the T0 generation. Although all DNA samples were diluted to 20 ng μL⁻¹ and the concentration was verified by the NanoDrop method and on an agarose gel, which produced similar results to the 2^(-ΔΔCt) method, there was some variation in the pipetting and DNA quality that resulted in different 2^(-ΔΔCt) results and consequently a difference in the copy numbers, which ranged from 0.5 to 1.0. As expected, the wild-type tobacco did not show amplification of the Bar gene (Table 1). Transgenic single-copy events are desirable because they follow a Mendelian segregation pattern and the possibility of gene silencing is lower (Collier et al. 2017, Tark-Dame et al. 2018). Our results corroborated those of Šubr et al. (2006), who also used the Axi1 gene as an endogenous control to determine the copy number for the Potato virus A (PVA) P3 gene in transgenic tobacco.

Once it had been confirmed that there was only one copy of the Axi1 gene in the N. tabacum genome, a PCR was performed with N. sylvestris as the maternal parent (S-genome) and N. tomentosiformis as the paternal parent (T-genome). The PCR only amplified the Axi1 gene fragment (975 bp) in the maternal parent (Figure 3).
Nicotiana (Solanaceae) contains approximately 75 species, 48% of which are allotetraploids (36 species) that are classified into 12 sections. These allotetraploids are mainly distributed in the Americas, except for one section that is found outside the Americas (Knapp et al. 2004). Nicotiana tabacum (2n = 48, SSTT) was originally derived from interspecific hybridization between N. sylvestris (2n = 24, SS) and N. tomentosiformis (2n = 24, TT) followed by chromosome doubling. These events occurred less than 1 Ma ago at an early stage in the diploidization process (Clarkson et al. 2010). Concerted evolution of the rDNA has been documented in N. tabacum. The process led to rDNA loci being overwritten within a few generations by one dominant progenitor copy, which was usually maternal, and it is possible that the same could have happened to the Axi1 loci.
The N. tabacum genome is highly similar to its parents' genomes. However, there has been a reduction in its genome, with greater losses of repetitive DNA sequences from the T-genome than from the S-genome, showing a preferential loss of repetitive sequences from the paternal parent at the genome level (Renny-Byfield et al. 2011). The loss of sequences from the T-genome also affected the Axi1 gene, which is only present in the S-genome. Studies to identify mutations in the N gene, which is responsible for resistance to mosaic virus and is present on the N-chromosome, also showed that this chromosome is more closely related to N. sylvestris and is probably derived from the S-genome (Chen et al. 2018), as shown here for the Axi1 gene. Genome downsizing is a widespread biological response to polyploidization that is also illustrated by the Xyloglucan endotransglucosylase/hydrolase (XTH) gene family, which has a higher number of NtXTH genes in N. tabacum than in either of the ancestral donors, but fewer than the sum of the two donors (Wang et al. 2018).
Antibiotic or herbicide selection is a conventional method that is used to differentiate between hemizygous and homozygous lines if the transgene is a dominant trait and follows the Mendelian first law of segregation (Passricha et al. 2016, Tarafdar et al. 2019). To validate the data obtained for the copy number by real-time PCR, a segregation test was performed on the five independent pMCG1005 events. For event pMCG1005 Ev. 3, 12 seedlings out of 19 were positive for the Bar gene and 7 were negative, which was similar to event pMCG1005 Ev. 4, where 13 out of 20 seedlings were positive for the Bar gene and 7 were negative, and event pMCG1005 Ev. 6, where 14 out of 20 seedlings were positive for the Bar gene and 6 were negative. However, for event pMCG1005 Ev. 7, 18 out of 20 seedlings were positive for the Bar gene and 2 were negative, and for event pMCG1005 Ev. 10, 16 out of 20 seedlings were positive for the Bar gene and 4 were negative (Table 1). The segregation pattern obtained in these events was the one expected for monogenic characteristics (3:1) and this was confirmed by the χ² test at the 5% significance level (Table 1). Events that had copy numbers between 0.5 and 1.0 may have had more than one copy integrated into the same locus. In this case, the copies would be inherited together and show a 3:1 distribution in the segregation test at T1. Tizaoui and Kchouk (2012) reported a tobacco transgenic event that showed two tightly linked inserts in cis at T1 with a low recombination frequency. By the T2 generation, the transgenic locus had evolved and the inserts were far enough apart to recombine with a high frequency. Transgenes are sexually inherited as a dominant trait with 3:1 Mendelian inheritance when present as a single copy in the host genome (Low et al. 2018), which is usually functional (Parrott 2010).
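The segregation counts reported above can be checked against the 3:1 expectation with a short Python sketch. This is an illustration rather than the authors' analysis: it uses a plain Pearson χ² statistic with one degree of freedom and no continuity correction (an assumption, since the text does not say whether a correction was applied), and the function and variable names are hypothetical.

```python
import math

CRITICAL_VALUE = 3.841   # chi-square critical value for df = 1 at alpha = 0.05


def chi_square_3_to_1(positive, negative):
    """Pearson chi-square test of observed counts against a 3:1 ratio."""
    total = positive + negative
    expected_positive = 0.75 * total
    expected_negative = 0.25 * total
    chi2 = ((positive - expected_positive) ** 2 / expected_positive
            + (negative - expected_negative) ** 2 / expected_negative)
    # For one degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Bar-positive and Bar-negative T1 seedlings per event, as reported in the text.
EVENTS = {
    "Ev. 3": (12, 7),
    "Ev. 4": (13, 7),
    "Ev. 6": (14, 6),
    "Ev. 7": (18, 2),
    "Ev. 10": (16, 4),
}

results = {name: chi_square_3_to_1(*counts) for name, counts in EVENTS.items()}

for name, (chi2, p_value) in results.items():
    verdict = "consistent with 3:1" if chi2 < CRITICAL_VALUE else "deviates from 3:1"
    print(f"{name}: chi2 = {chi2:.2f}, p = {p_value:.3f} ({verdict})")
```

With 19 or 20 seedlings per event the test has limited power, so a non-significant result is compatible with a single locus but cannot by itself exclude other segregation ratios.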
A Southern blot analysis is traditionally used to determine transgene copy number and is still a very useful technique when working with species that do not have an available, fully annotated genome sequence. While reliable, this classic molecular biology technique is laborious and time-consuming, requires large amounts of starting plant material, which is difficult to obtain from tobacco seedlings, for example, and may involve the use of hazardous radioisotopes (Giancaspro et al. 2017, Nazir et al. 2019). Furthermore, it may fail to detect the exact number of transgene copies when these have been rearranged during integration into the host genome, which may cause changes in and/or the loss of relevant restriction sites (Gadaleta et al. 2011, Giancaspro et al. 2017). Real-time PCR analyses are being used on many crops and model species to detect and characterize transgene locus structures. The determination of the transgenic locus number through real-time PCR overcomes the problems linked to phenotypic segregation analysis (i.e., lack of detectable expression even when the transgenes are present) and it is possible to analyze hundreds of samples per day. These two advantages mean that real-time PCR is an efficient method for estimating gene copy numbers (Gadaleta et al. 2011, Thu et al. 2016, Giancaspro et al. 2017). Real-time PCR is a fast and sensitive method, but its precision depends on a reliable single-copy reference gene, which was confirmed by the results from this study.
To date, there have been no genes characterized and validated in tobacco that could be used as a single-copy reference. However, this study has shown that Axi1 is a single-copy gene in N. tabacum, and that it came from its maternal donor N. sylvestris. The results also showed that it can be used as a reference gene for copy number determination by real-time PCR and the 2^(-ΔΔCt) method, and consequently for transgenic characterization.