Identification of reference genes for expression analysis by real‐time quantitative PCR in drought‐stressed soybean

The objective of this work was to validate, by quantitative PCR in real time (RT-qPCR), genes to be used as reference in studies of gene expression in soybean in drought-stressed trials. Four genes commonly used in soybean were evaluated: Gmβ‐actin, GmGAPDH, GmLectin and GmRNAr18S. Total RNA was extracted from six samples: three from roots in a hydroponic system with different drought intensities (0, 25, 50, 75 and 100 minutes of water stress), and three from leaves of plants grown in sand with different soil moistures (15, 5 and 2.5% gravimetric humidity). The raw cycle threshold (Ct) data were analyzed, and the efficiency of each primer was calculated for an overall analysis of the Ct range among the different samples. The GeNorm application was used to evaluate the best reference gene, according to its stability. The GmGAPDH was the least stable gene, with the highest mean values of expression stability (M), and the most stable genes, with the lowest M values, were the Gmβ‐actin and GmRNAr18S, when both root and leaves samples were tested. These genes can be used in RT-qPCR as reference gene for expression analysis.


Introduction
Soybean has been the subject of many studies carried out to understand and quantify the processes that interfere with crop production in challenging environments (Popp et al., 2003).Biotechnological approaches to create new cultivars, with characteristics that help minimize losses in production and improve the understanding of the genetic basis of adaptive responses to environmental stress, will facilitate the improvement of drought tolerance (Talamè et al., 2007).
Gene expression studies in plants include precise quantification of mRNAs expressed in various situations, such as the effects of high temperature at different developmental stages, and in different tissues and cells.The quantification of mRNAs is usually achieved by northern blotting or by ribonuclease protection assay (RPA) (Suzuki et al., 2000).These methods are less precise than RT-qPCR (Bustin, 2000), which has emerged as an important technique to compare the expression profiles of target genes in several species, tissues and treatments, and also to validate high-throughput gene-expression profiles (Crismani et al., 2006).
One of the methodologies used to determine expression levels by RT-qPCR compares the gene of interest with reference genes, whose expression does not change under different experimental conditions.Statistical analysis methods have been developed to identify the best reference genes for a certain organism or experimental condition (Vandesompele et al., 2002;Pfaffl et al., 2004).The use of reference genes without prior verification of their expression stability can lead to an inaccurate interpretation of the data and generate incorrect results.A reference gene should have stable expression in different organs, developmental stages, and environments.According to previous works on the best reference genes for transcription normalization in plants, the most reliable are those constitutively expressed and involved in basic cellular processes, such as protein and sugar metabolism and maintenance of cell structure (Cruz et al., 2009).
A large-scale comparative analysis of the most stable genes of Arabidopsis thaliana has shown that the best reference genes are those related to the ubiquitin degradation process, such as poly-ubiquitin, ubiquitin-conjugating enzymes and ubiquitin ligases (Czechowski et al., 2005).In the RT-qPCR expression-profile analysis of suitable reference genes for a hybrid poplar (Populus spp.) and vitis (Vitis vinifera) (Reid et al., 2006), genes for tubulin and actin were stably expressed and considered the most reliable.Jain et al. (2006) showed that the best genes in various tissue samples from rice (Oryza sativa) were those encoding ubiquitin 5 and elongation factor-1 alpha.Genes that encode the proteins GAPDH (Wall & Edwards, 2002), β-actin (Kreuzer et al., 1999), tubulin (Brunner et al., 2004) or RNAr18S have been used for other species (Bhatia et al., 2002).
The objective of this work was to validate, by quantitative PCR in real time (RT-qPCR), genes to be used as reference in studies of gene expression in soybean in drought-stressed trials.

Materials and Methods
Soybean cultivar BR16, sensitive to drought (Oya et al., 2004), was grown hydroponically with nutrient solution (Hewitt, 1963) in a randomized block experimental design, with three 10-plant biological triplicates.The plants were grown for three weeks in a greenhouse (day, 30ºC±2ºC; night, 25ºC±2ºC), with relative humidity (RH) near 50%.At the V 5 development stage, drought was obtained by withdrawing the nutrient solution.Root samples were collected immediately (TH 0 ) and after 25, 50, 75 and 100 minutes of stress, and placed in liquid nitrogen for later molecular analysis.The samples were segregated into bulk 1, containing the treatments 25 and 50 min (TH 1 ), and bulk 2 containing the treatments 75 and 100 min (TH 2 ).
An experiment in sand culture was also carried out using the BR16 genotype.Two stress conditions were applied in addition to a control group (TS 0 ), close to field capacity, at 15% GH (gravimetric humidity): one group was moderately stressed at 5% GH (TS 1 ) and another was severely stressed at 2.5% GH (TS 2 ), according to Jones (2007).Each group consisted of six plants, for each treatment, sown in 10 L pots filled with sand, irrigated with nutrient solution (Hewitt, 1963), placed in greenhouse (day, 30 o C±2 o C; night, 22±2 o C; 40±5% RH), in a randomized block design.Plants were grown for 45 days, which corresponded to the full-flowering stage (R 2 ), under normal conditions (15% GH); then, water shortage was induced by withholding irrigation until the sand-moisture level reached 5% GH or 2.5% GH at the grain-filling stage (R 5 ).The control group was kept at 15% GH until the end of the experiment (TS 0 ).The pots were weighed daily to maintain the programmed moisture status.The third plant trifoliate from six plants, for each treatment, were collected and placed in liquid nitrogen for later molecular analysis.
Specific primers for the reference and target genes were designed from expressed sequence tag (EST) regions in soybean (Table 1); these sequences were obtained at the GenBank (National Center for Biotechnology Information, 2001).The reference genes were: Gmβ-actin, a structural component of the cytoskeleton of cells; GmGAPDH, glyceraldehyde-3-phosphate dehydrogenase; GmRNAr18S, a ribosomal gene; and GmLectin, a gene present in low copy numbers in the haploid genome of soybean.The soybean target gene AF514908.1 is a drought-responsive transcription factor.All primers were designed with the Primer Express Program version 3.0 (Applied Biosystems, São Paulo, SP, Brazil), and were adjusted to a melting temperature (Tm) varying between 59 and 62°C, with fragments in the 75-155 bp range.The primers were screened for hairpins, dimer formation, and target specificity by BlastN (Blast, 2011), against the nucleotide databank.Primer pairs were tested for specificity by RT-PCR and also by RT-qPCR, followed by a dissociation curve and agarose-gel electrophoresis.
To pinpoint the best reference genes in soybean, which are responsive to drought, reference genes from other species were used for a BlastX search against a Glycine max EST library, constructed from roots submitted to drought.Specific primers were designed and tested for amplification efficiency, to be used as references (Table 1).
Total RNA was extracted from all tissue samples (leaves and roots) in each experiment using trizol reagent (Invitrogen, São Paulo, SP, Brazil), according to the manufacturer's instructions.RNA quality analysis and quantification were performed by agarose-gel analysis and a NanoDrop ND-1000 spectrophotometer (Thermo Scientific, São Paulo, SP, Brazil) measurement, respectively.This procedure was crucial to guarantee the same amount of starting material and equivalent efficiencies of cDNA synthesis from total RNA samples.The protocol used to synthesize cDNA was obtained from Schenk et al. (2003), using the reverse-transcriptase Superscript II First Strand Synthesis Kit (Invitrogen, Carlsbad, CA, USA).To prepare the cDNA, 1 µg total RNA was mixed with random primers (5 ng µL -1 ).
Polymerase chain reactions were performed in 96-well plates using SYBR Green (Invitrogen, Carlsbad, CA, USA) to detect dsDNA (double strand) synthesis.
Reactions were done in 25-μL volumes containing PCR buffer with 1.5 mmol L -1 MgCl 2 , 0.1 mmol L -1 dNTPs, 0.25 U Taq Platinum, 0.1 × SYBR Green (Invitrogen, Carlsbad, CA, USA), 0.2 µmol L -1 final concentration for the GmRNAr18S, GmGAPDH and target gene, 0.15 µmol L -1 for Gmβ-actin and GmLectin, and 10 μL sscDNA (corresponding to 5 ng of total RNA).Aliquots from the ssDNA (single strand) samples were used with all primer sets in two separate experiments.Two biological replicates, for each sample, were used for real-time PCR analysis, and three technical replicates were analyzed for each biological replicate.
Reactions were run in a 7300 RT-qPCR thermocycler (Applied Biosystems, Foster, CA, USA) using the following cycling parameters: 50°C for 2 min, 95°C for 10 min, 45 cycles at 95°C for 2 min, 62°C for 30 seconds and 72°C for 30 seconds; the data were collected in the last (extension) phase.No-template controls (NTCs) were included for each primer pair.Dissociation curves for each amplicon and agarose gel were analyzed to verify the specificity of each amplification reaction; the curves were obtained by heating the amplicons from 60 to 95°C with readings at one Celsius degree intervals.After the PCR reaction, raw fluorescence data generated in sequence detection was used for the calculation of primer amplification efficiency and cycle threshold (Ct) determination.It accounts for each PCR exponential curve, making it possible to have accurate values for the quantification of RT-qPCR.
The calculation of primer amplification efficiency and Ct determinations were achieved using the miner algorithm (Zhao & Fernald, 2005), corresponding to the formula E = [10 -1/slope ], used to calculate the reaction efficiency.The calibration curve was established by the Ct and the log of cDNA dilutions, and the results were  et al., 2002), varying the gene used as reference, in order to verify the expression stability between tissues analyzes (leaf and root), stress treatments (drought and control), genotype (BR16) and type of experiment (hydroponic and sand).
The GeNorm v. 3.4 software (Pattyn et al., 2003) was used for the analysis of gene-expression stability and rank; the best reference gene for soybean was selected by the GeNorm application (Vandesompele et al., 2002).This application calculates a gene stability value (M) and a normalization factor (NF) based on the geometric mean of the expression values of the set of control genes tested.The program considers two different factors to analyze gene-expression stability: the average expression stability (M) and pairwise variation (V).The lower the M value, the more stably expressed the gene is.The GeNorm application enables the exclusion of the most unstable gene to recalculate the M value.A Microsoft Excel file with the raw expression Ct values, generated with the miner algorithm, for each tested gene, in the two different samples, was first analyzed with the Statistica software (StatSoft, 2005) and then transferred into the expression stability program GeNorm.This application defines the most stable genes, by calculating the mean pairwise variation between a particular gene and all others used in an experiment, and determines M values.The highest M value corresponds to the least stable expression in a set of samples.The normalization factor (NF) is defined by the M values of the most stable genes, which establishes the minimum number of reference genes required for an accurate calculation of the relative expression of a target gene.The ideal number is given by the inclusion of a certain number of genes in the NF calculation, until there is no significant contribution to an additional gene.

Results and Discussion
The amplification efficiency for the four tested primer pairs varied from 1.49 to 1.95, which are the expected values for compared genes.Each primer system showed differences in amplification efficiency; the REST software adjusts the expression results according to the efficiency value using the formula: The range and distribution of the Ct values allow for a visualization of the least variable genes, among the samples (Figure 1).The most stable genes, Gmβ-actin and GmRNAr18S, have the narrowest Ct range and the smallest deviation from the Ct median.The Ct values of the candidate reference genes, in all samples, were within 16.63 and 35.85 cycles, showing a wide range of variation between them.
The Ct values indicate the detection sensitivity of cDNA, showing variations in the reaction, with Ct values varying from three to four cycles within each sample, in triplicate.This may represent a combination of the intrinsic variability of the RT-qPCR protocol and biological variation.Technical variation may be attributed to uncontrollable factors, such as errors in total mRNA estimates (spectrophotometric errors and impurities absorbing at 260 nm) and the variation in efficiency in different reverse-transcription reactions.
The target gene normalized with the GmRNAr18S and Gmβ-actin references showed the lowest expression levels in the leaves, in both treatments, but in T 1 showed the highest expression levels for all reference genes (Table 2) in the root.Depending on the expression levels of the tested genes, it is possible to choose reference genes with similar expression.Since there were no differences among samples within each organ, the Gmβ-actin and GmRNAr18S genes were considered the best references for this trial, because the RQ values of the target gene normalized with these two reference genes remained constant (Table 2).
Primer amplification efficiency for each reaction with the reference genes was determined from the slope (inclination value of the straight line, bx) of the exponential phase of amplification by RT-qPCR, within an experiment designed to compare organ-specific differences in gene expression (Table 1).The GmRNAr18S gene was used as the primary standard for RT-qPCR.Based on the efficiency data, the reaction with the GmRNAr18S gene could be used as the primary reference to estimate relative gene expression.Efficiency and sensitivity determination of the reaction primer showed acceptable values for the Gmβ-actin and GmRNAr18S genes.
A reference gene should be expressed at relatively constant levels in different tissues and at all developmental stages, and should be unaffected by experimental conditions (Bustin, 2000).Based on the obtained data, the RQ values normalized with the Gmβ-actin and GmRNAr18S genes were more consistent across organs and treatments than those normalized with GmLectin and GmGAPDH (Table 2).
The GAPDH gene, which encodes an enzyme in the glycolytic path, is present in most cell types and has been used as an endogenous control in RT-qPCR experiments in animal and human samples (Wall & Edwards, 2002).Studies in mammalian systems have suggested that the utility of this gene is limited as an endogenous control, due to expression variation in some tissues and to variability resulting from experimental treatment (Giulietti et al., 2001).The gene for GAPDH should be used with caution because its expression can be more abundant in proliferating cells (Bustin, 2002).
Out of the four genes used for analysis, only the GmGAPDH showed an M value higher than the cutoff established by GeNorm (M<1.5), and three of them (GmLectin, GmRNAr18S and Gmβ-actin) showed the lowest M values, well-suited numbers for reference genes (Mamo et al., 2007) (Table 3 and Figure 2).After exclusion of the least stable gene, GmGAPDH, there  (1) TH0, root samples collected immediately after withdraw of the nutrient solution; TH1, after 25 and 50 min; and TH2, after 75 and 100 min.

Figure 2.
Average expression stability values (M) of remaining reference genes, in different systems, using GeNorm.Plotted from the least stable to the most stable genes.
were decreases in the M values of the other genes and changes in the M values of the unstable genes.Although the difference in the M values of the GmLectin gene was higher in comparison to the GmRNAr18S and Gmβ-actin genes, its inclusion as a reference gene in RT-qPCR is recommended because of its high expression values, which is important whenever tested genes are highly expressed.The use of a reference gene for RT-qPCR in rice showed expression consistency in comparison to the GAPDH, β-actin and β-tubulin genes, at various seedling-development stages and after UV radiation (Kim et al., 2003).The high abundance of mRNA requires higher dilutions of cDNA to allow direct comparison with the target gene amplification.These stock dilutions require more work and introduce more error, leading to greater inaccuracy in RT-qPCR analyses (Iskandar et al., 2004).Consequently, it is advantageous to have a reference gene that is expressed at levels similar to those of the genes of interest, and to choose treatments that have little effect on the reference gene (Suzuki et al., 2000).
The Gmlectin gene did not provide results that justify its use as reference, although it showed acceptable M values.This gene was present in a single copy in the soybean genome, but it is not necessarily constitutive.If its expression shows variations, relative quantification would be difficult (Brunner et al., 2004;Czechowski et al., 2005).However, it could still be used as a reference to provide the number of copies of the genome.
Pairwise variation of the reference genes suggested that Gmβ-actin and GmGAPDH were the more stable and least stable ones, respectively.Nevertheless, there were only slight differences in the stabilities of the other genes (Figure 3).The second and third best reference genes were GmRNAr18S and GmLectin, with a 0.72 difference in M values (Figure 2).Using others rare transcripts as references, such as transcription factors, may cause problems in normalizing RT-qPCR experiments.Gene expression for β-actin or tubulin usually depends on plant developmental stage (Diaz-Camino et al., 2005).
When all four genes were analyzed with pairwise variation, there was no significant difference in the V numbers; however, there was an increase in instability with the addition of the GmGAPDH gene (V3/4) (Figure 3), which is relatively unstable, as shown by the Ct distribution in Figure 1.The optimal cutoff V number, according to Vandersompele (2002), should be around 0.15, but other works using this application have shown a higher V number (Kuijk et al, 2007), depending on the number of genes and types of sample tested.In soybean, the addition of a fourth gene did not contribute significantly to stability, in the comparison of both tissues.Only two reference genes, Gmβ-actin and GmRNAr18S, should be used for RT-qPCR experiments on roots and leaves of the studied accession, when submitted to drought.

Conclusions
1. RQ values normalized with the Gmβ-actin and GmRNAr18S genes have lower variations in organs and treatments.
2. Gmβ-actin and GmRNAr18S are the most stable genes, based on their transcriptional profiles and GeNorm analysis.
3. Gmβ-actin and GmRNAr18S can be used as reference genes in soybean under water stress.
Figure 3. Pairwise variation (V) of the selected reference genes in hidroponic or sand systems.Calculated on GeNorm, from the most stable gene to the least stable gene, according to the M value; V2/3, pairwise variation between the two most stable genes (GmRNAr18S and Gmβ-actin), plus the third most stable gene (GmLectin); V3/4, addition of the fourth most stable gene (GmGAPDH) in each of the systems.

Figure 1 .
Figure 1.Box-whisker showing the cycle threshold (Ct) variation (number of cycles) of each candidate reference gene among the samples.The median quartiles, and the minimum and maximum Ct values of the four samples were calculated in the Statistica software (StatSoft, 2005).

Table 1 .
Gene description, primer sequences and efficiency of the selected ESTs.by the Sequence Detection v. 1.2.2 software (Perkin Elmer do Brasil, São Paulo, SP, Brazil).Cycle threshold data were analyzed by the Relative Expression Software Toll (REST) version 2.0.7 (Pfaffl analyzed

Table 3 .
Normalization factor for treatments, and M values for reference genes.