Comparison of RAPD, RFLP, AFLP and SSR markers for diversity studies in tropical maize inbred lines

To compare their relative efficiencies as markers and to find the most suitable marker for maize diversity studies, we evaluated 18 tropical maize inbred lines using four types of molecular marker loci: 774 amplified fragment length polymorphisms (AFLPs), 262 random amplified polymorphic DNAs (RAPDs), 185 restriction fragment length polymorphisms (RFLPs) and 68 simple sequence repeats (SSRs). For estimating genetic distance, the AFLP and RFLP markers gave the most highly correlated results, with a correlation coefficient of r = 0.87. Bootstrap analysis was used to evaluate the number of loci needed for each marker, and the coefficients of variation (CV) revealed skewed distributions. The dominant markers (AFLP and RAPD) had small CV values, indicating a skewed distribution, while the codominant markers gave high CV values. The use of the maximum values of the genetic distance CVs within each sample size was efficient in determining the number of loci needed to obtain a maximum CV of 10%. The number of RFLP and AFLP loci used was enough to give CV values below 5%, while the SSR and RAPD loci gave higher CV values. For all the markers except RAPD, genetic distance correlated with single-cross performance and heterosis, which showed that these markers could be useful in predicting single-cross performance and heterosis in intrapopulation crosses for broad-based populations. Our results suggest that AFLP is the best-suited molecular assay for fingerprinting and assessing genetic relationships among tropical maize inbred lines with high accuracy.


Introduction
The past limitations associated with pedigree data and morphological, physiological and cytological markers for assessing genetic diversity in cultivated and wild plant species have largely been circumvented by the development of DNA markers such as restriction fragment length polymorphisms (RFLPs; Botstein et al., 1980), random amplified polymorphic DNAs (RAPDs; Williams et al., 1990), amplified fragment length polymorphisms (AFLPs; Zabeau and Vos, 1993) and simple sequence repeats (SSRs, microsatellites; Tautz, 1989). However, these molecular markers have technical differences in terms of cost, speed, amount of DNA needed, technical labor, degrees of polymorphism, precision of genetic distance estimates and the statistical power of tests.
Although the discrimination power of RFLPs in diversity studies has been well documented (Smith et al., 1990; Dudley et al., 1991; Messmer et al., 1993; Benchimol et al., 2000), the limitations related to the routine use of RFLPs stimulated studies with other types of molecular markers such as RAPDs, which are simpler to use and do not require the use of radioactive materials (Williams et al., 1990). The RAPD technology is well suited to DNA fingerprinting (dos Santos et al., 1994; Thormann et al., 1994) although it does suffer from a certain lack of reproducibility due to mismatch annealing (Neale and Harry, 1994; Demeke et al., 1997; Karp et al., 1997).
Microsatellites (SSRs) occur frequently in most eukaryote genomes and can be very informative, multi-allelic and reproducible (Vos et al., 1995; Senior and Heun, 1993), and have been suggested as a way to overcome the limitations associated with RFLP and RAPD. The application of SSR techniques to plants depends on the availability of suitable microsatellite markers, which have been developed for species such as soybean (Rongwen et al., 1995), rice (Zhao and Kochert, 1993), maize (Taramino and Tingey, 1996) and the common bean (Yu et al., 2000). Morgante and Olivieri (1993) stated that in soybean the amount of information given by SSR loci in relation to a comparable number of RFLP loci is reflected in the estimated number of alleles (4.25 per locus for SSR as opposed to 2.15 per locus for RFLP). Wu and Tanksley (1993) stated that the heterozygosity of SSRs is seven to ten times higher than that of RFLPs.
The AFLP technique is more laborious and time consuming than RAPD methods but is also more reliable. It detects a large number of polymorphic bands in a single lane rather than high levels of polymorphism at each locus, as is the case for SSR methods. Although this lower sensitivity in detecting informative genotypic classes might be associated with the inability to distinguish heterozygotes from homozygotes because AFLPs are scored as binary data, Gerber et al. (2000) suggest that the high numbers of polymorphic loci revealed by AFLP methods counterbalance the loss of information resulting from dominance, while Garcia-Mas et al. (2000) showed that AFLPs had higher efficiency in detecting polymorphism than either RAPD or RFLP markers. It is also known that the AFLP technique has lower initial costs and is more transferable across species than SSR methods. Techniques based on AFLPs have been applied to genome mapping (Zimnoch-Guzowska et al., 2000), DNA fingerprinting (Powell et al., 1996), genetic diversity studies (Russell et al., 1997) and parentage analysis (Gerber et al., 2000; Lima et al., 2001).
Comparisons of different DNA markers for diversity studies in maize (Hahn et al., 1995; Smith et al., 1997; Ajmone Marsan et al., 1998; Pejic et al., 1998), barley (Russell et al., 1997), wheat (Bohn et al., 1999), cruciferous species (dos Santos et al., 1994; Thormann et al., 1994), potato (McGregor et al., 2000), sorghum (Yang et al., 1996) and rice (Davierwala et al., 2000) have tried to evaluate the relative efficiencies of the different techniques available. However, in the case of maize, tropical and temperate populations differ from each other because tropical populations usually originate from composites with higher genetic variability and, most of the time, it is difficult to allocate tropical composites to well-defined heterotic groups by phenotypic evaluation. Due to this uniqueness, molecular markers have been very useful in genetic evaluations and assignment of tropical maize inbred lines to heterotic groups.
The objectives of the study described in this paper were to: i) compare the level of information provided by RFLP, RAPD, SSR and AFLP markers for estimating genetic similarities in tropical maize inbred lines; ii) evaluate the minimum number of loci of each marker needed to accurately represent genetic distance between inbred lines; iii) compare the genetic distances (GD) obtained with the different marker systems; and iv) compare the usefulness of these four markers in predicting single-cross hybrid performance by means of genetic distance estimates.

Plant material and DNA isolation
Eighteen selected S3 inbred lines from two divergent tropical maize populations (eight from BR-105 and ten from BR-106) previously had their genetic distances surveyed using four different marker systems (Lanza et al., 1997; Benchimol et al., 2000; Barbosa et al., 2003). The BR-105 population is an early-maturing synthetic with orange flint kernels, while the BR-106 population is an early-maturing composite with yellow dent kernels. Both populations have shown high levels of heterosis when crossed and were assigned to distinct heterotic groups by Naspolini Fº et al. (1981) and Souza Jr. et al. (1993). Detailed descriptions of these populations are given in Lanza et al. (1997) and Rezende and Souza Jr. (2000).
Total genomic DNA was extracted from a bulk of five-week-old leaf tissue taken from 16 plants of each line and purified by the method of Hoisington et al. (1994).

Molecular analysis
How the RAPD data were obtained, and the data themselves, are described in Lanza et al. (1997). Thirty-two primers showing reproducible polymorphism were selected and used for scoring the 18 inbred lines. In the RAPD analysis, each band was considered as one locus. How the RFLP data were obtained, and the data themselves, are described in Benchimol et al. (2000). Briefly, a total of 185 clone-enzyme combinations were analyzed, the maize genome being saturated (20 cM intervals) with at least one RFLP probe selected by its map location on each chromosome. Each enzyme-probe combination (EPC) was considered a locus and each unique RFLP banding pattern a distinct variant. Barbosa et al. (2003) describe how the AFLP and SSR profiles were obtained and also give the data produced. For the AFLP method, 20 primer combinations were used and scored as binary data (1 or 0), with each band being considered a locus. For the SSR method, 68 polymorphic primers were used, with the binary data being converted into a genotypic matrix which was used to identify alleles and their respective loci.

Data analysis
Both dominant markers (RAPD and AFLP) were used to calculate the genetic distances between the 18 inbred lines using the complement of Jaccard's similarity coefficient (Jaccard, 1908), which takes into account the presence or absence of bands. In this method, co-occurrences are divided by the total number of evaluated loci (excluding the negative co-occurrences) and thus can be interpreted as the proportion of coincidences in relation to the total number of evaluated loci. Jaccard similarities were calculated using version 2.0j of the NTSYS-PC computer package (Exeter Software, NY; Rohlf, 1997). The genetic distances for the codominant markers (RFLP and SSR) were calculated using the modified Rogers' distance (MRD; Goodman and Stuber, 1983), which is based on the allele frequencies at each locus and expresses the amount of genetic diversity present at each locus or allele. Calculations were made using version 1.3 of the TFPGA software (Miller, 1997).
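The two distance measures described above can be illustrated with a minimal Python sketch. It is not the NTSYS-PC or TFPGA implementation; the function names and input layouts are hypothetical, and the modified Rogers' distance is written in its commonly cited form, the square root of the summed squared allele frequency differences divided by twice the number of loci, which is assumed here rather than taken from the text.

```python
import numpy as np

def jaccard_distance(x, y):
    """Complement of the Jaccard similarity for two 0/1 band vectors."""
    x = np.asarray(x, dtype=bool)
    y = np.asarray(y, dtype=bool)
    shared = np.sum(x & y)   # bands present in both lines
    either = np.sum(x | y)   # bands present in at least one line (0-0 matches excluded)
    if either == 0:
        raise ValueError("no band is present in either line")
    return float(1.0 - shared / either)

def modified_rogers_distance(p, q):
    """Modified Rogers' distance (assumed form) between two lines.

    p and q each hold one allele frequency vector per locus.
    """
    m = len(p)  # number of loci
    total = sum(
        np.sum((np.asarray(pi, dtype=float) - np.asarray(qi, dtype=float)) ** 2)
        for pi, qi in zip(p, q)
    )
    return float(np.sqrt(total / (2.0 * m)))
```

For fully homozygous inbred lines each frequency vector contains a single 1, so two lines that differ at every locus have a modified Rogers' distance of 1 under this form.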
Pearson's correlation coefficient was calculated for the genetic distances, single-cross performance and heterosis as previously described by Benchimol et al. (2000). The information content of each marker system was calculated for each marker and locus using the polymorphism information content (PIC) (Lynch and Walsh, 1998), which provides an estimate of the discriminating power of a locus by taking into account not only the number of alleles that are expressed but also their relative frequencies. Calculations were made using the following formula: $PIC = 1 - \sum_{i} f_i^2$, where $f_i$ is the frequency of the $i$-th allele.
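As an illustration of the PIC calculation, the following Python sketch computes the value for one locus from the alleles (or banding patterns) observed across a set of lines. It assumes the commonly used form in which PIC is one minus the sum of the squared allele frequencies; the function name and input format are hypothetical.

```python
from collections import Counter

def pic(alleles):
    """PIC of one locus, assuming PIC = 1 - sum of squared allele frequencies.

    `alleles` lists the allele (or banding pattern) observed in each line.
    """
    counts = Counter(alleles)
    n = sum(counts.values())
    return 1.0 - sum((c / n) ** 2 for c in counts.values())
```

Under this form a locus with two equally frequent alleles has a PIC of 0.5, and the value rises as more alleles with similar frequencies are observed.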
Bootstrap analysis was used to verify if the number of polymorphic loci evaluated was high enough to provide accurate genetic distance estimates (King et al., 1993; Halldén et al., 1994). To determine the sampling variance of the genetic distances produced by the different molecular data sets we performed bootstrap analysis using a decreasing number of loci (for codominant markers) or bands (for dominant markers). For each specific number of loci or bands used, the polymorphic markers were submitted to 500 random samplings with replacement (bootstrap samples) and genetic distances were obtained for each bootstrap sample (Tivang et al., 1994). Each band visualized on the gel was considered to be the re-sampling unit for dominant markers because for these markers each band is related to one locus. For codominant markers each band corresponds to an allele, and therefore the bootstrap was applied across loci.
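The resampling scheme for dominant markers can be sketched in Python as follows. This is only an illustration under stated assumptions, not the SAS program used in the study: the function name, the seed handling and the use of the Jaccard complement inside the loop are choices made for the example.

```python
import numpy as np

def bootstrap_cv(bands_a, bands_b, n_bands, n_boot=500, seed=0):
    """CV (%) of the Jaccard distance between two lines over bootstrap samples.

    bands_a, bands_b: 0/1 band scores of the two lines over all polymorphic bands.
    n_bands: number of bands drawn (with replacement) in each bootstrap sample.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(bands_a, dtype=bool)
    b = np.asarray(bands_b, dtype=bool)
    distances = np.empty(n_boot)
    for k in range(n_boot):
        idx = rng.integers(0, a.size, size=n_bands)  # resample bands with replacement
        shared = np.sum(a[idx] & b[idx])
        either = np.sum(a[idx] | b[idx])
        # A sample with no band present in either line has no defined distance.
        distances[k] = 1.0 - shared / either if either else np.nan
    return float(100.0 * np.nanstd(distances, ddof=1) / np.nanmean(distances))
```

Calling the function with decreasing values of `n_bands` for every pair of lines gives one CV per pair and sample size. For codominant markers the same loop would resample whole loci rather than individual bands.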
The coefficient of variation (CV) for all 500 genetic distances across the bootstrap samples was estimated for each specific number of loci or bands sampled. A computer program for performing these analyses was written using the 'RANUNI' function of the SAS system (Version 8.0; SAS Institute, 1999). For each marker system (AFLP, RAPD, RFLP and SSR) an exponential function was fitted to estimate the number of loci needed to obtain a 10% CV. We used the median and maximum CV values to evaluate the accuracy of the genetic distance estimates because, although the mean CV is often used in the literature, caution is needed when dealing with molecular marker data, for which there is no assurance that the CV values are distributed symmetrically.
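A minimal Python sketch of the curve-fitting step is given below. The exact functional form used in the study is not stated in the text, so the sketch assumes a two-parameter exponential, CV = a·exp(b·n), fitted by least squares on the logarithm of the CV; the function names are hypothetical.

```python
import numpy as np

def fit_exponential(n_loci, cv):
    """Fit cv = a * exp(b * n) by linear least squares on log(cv)."""
    n = np.asarray(n_loci, dtype=float)
    log_cv = np.log(np.asarray(cv, dtype=float))
    b, log_a = np.polyfit(n, log_cv, 1)  # slope first, then intercept
    return float(np.exp(log_a)), float(b)

def loci_for_target_cv(a, b, target=10.0):
    """Number of loci at which the fitted curve reaches the target CV (%)."""
    return float(np.log(target / a) / b)
```

The inputs would be the mean, median or maximum CV observed at each sample size, giving one fitted curve and one required number of loci per criterion.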

Levels of polymorphism
All 18 maize inbred lines studied here had previously been investigated using the four different marker systems (RAPD: Lanza et al., 1997; RFLP: Benchimol et al., 2000; AFLP and SSR: Barbosa et al., 2003). The estimated means and ranges of the genetic distances and the level of polymorphism produced by each of the marker systems for the possible combinations of crosses between the BR-105 and BR-106 lines are summarized in Tables 1 and 2. In the work of Lanza et al. (1997), Benchimol et al. (2000) and Barbosa et al. (2003) the total number of assays ranged from 20 primer combinations for the AFLP method to 185 probe/enzyme combinations for the RFLP method, with the total number of polymorphic bands ranging from 200 for SSR to 973 for RFLP (Table 2). Since the RAPD and AFLP markers are dominant they can only express the theoretical maximum of two alleles per locus, whereas the codominant RFLP and SSR markers can express different numbers of alleles per locus.

Polymorphism information content
The RFLP and SSR polymorphism information content (PIC) means were higher than the RAPD and AFLP means (Figure 1). Differences in the distribution profiles also occurred between dominant and codominant markers, with dominant markers having higher standard deviations than codominant markers. The differences between minimum and maximum PIC values were lower for RFLP and SSR than for AFLP and RAPD. The RFLP markers gave the highest mean PIC value for all loci (PIC = 0.96) and the SSR markers the second highest (PIC = 0.89), with the dominant RAPD (PIC = 0.75) and AFLP (PIC = 0.73) markers having mean PIC values of almost the same magnitude.

Correlations between genetic distances measured with different markers
The highest Pearson correlation value (Figure 2) was that between the AFLP and RFLP genetic distances (r = 0.87) and it seems that these two markers are the most similar type of markers in terms of the magnitude of the genetic distances produced. The SSR and AFLP markers produced the second highest correlation value (r = 0.78), followed by SSR and RFLP (r = 0.71), RFLP and RAPD (r = 0.50) and RAPD and AFLP (r = 0.48), with the SSR and RAPD markers having the lowest value (r = 0.33). The RAPD markers were clearly the most distinct type of marker because the correlation values involving this marker were equal to or lower than 0.5, while the other markers showed tight association patterns between each other.
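A comparison of this kind can be sketched in Python as the Pearson correlation between the pairwise distances produced by two marker systems. The function name and the square-matrix input layout are assumptions made for the example; each pair of lines is counted once by taking only the upper triangle of each matrix.

```python
import numpy as np

def distance_correlation(d1, d2):
    """Pearson r between two symmetric distance matrices, one value per pair of lines."""
    d1 = np.asarray(d1, dtype=float)
    d2 = np.asarray(d2, dtype=float)
    if d1.shape != d2.shape:
        raise ValueError("distance matrices must have the same shape")
    upper = np.triu_indices_from(d1, k=1)  # exclude the diagonal and duplicate pairs
    return float(np.corrcoef(d1[upper], d2[upper])[0, 1])
```

With 18 inbred lines each matrix contributes 153 pairwise distances to the correlation.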

Bootstrap analysis
As expected, the magnitude of the coefficient of variation (CV) values decreased as the number of polymorphic loci (bands) evaluated increased. Within each sample (i.e. the number of loci examined for each marker system) the distribution of the genetic distance CVs was skewed for all the systems examined, with dominant markers tending to have lower CV values and be skewed to the left while codominant markers had higher CV values and tended to be skewed to the right (Figure 3). Because the mean is not a good indicator of central tendency for skewed data, we calculated the minimum number of loci necessary for an accurate representation of the genetic distances by fitting, for each marker, an exponential function to the mean, median and maximum CV values of the genetic distances obtained by bootstrap sampling. The results of this analysis are given in the boxplots shown in Figure 3. From the fitted functions we calculated the following: the sample size (number of loci or bands) required so that 50% of the genetic distance values had CV values of less than 10% (n_median); the sample size needed so that no genetic distance value had a CV of more than 10% (n_maximum); and the sample size required for all genetic distances to have an average CV of 10% (n_mean) (Figure 3). The results obtained based on the adjusted functions (except for the mean CV) shown in Figure 3 are presented in Table 3.

Correlation of genetic distance with F 1 grain yield and heterosis
The correlations between genetic distances and grain yield (Table 4) showed a similar pattern for the RFLP, SSR and AFLP markers, correlation being high (0.82 to 0.91, significant at p = 0.01) for intrapopulation BR-106 crosses, moderate (0.39 (not significant) to 0.52, significant at p = 0.05) for intrapopulation BR-105 crosses and low (0.16 to 0.29, not significant) for the interpopulation BR-105 x BR-106 crosses. For the RAPD markers correlation was moderate for intrapopulation BR-106 crosses (0.56, significant at p = 0.01) and intrapopulation BR-105 crosses (0.60, significant at p = 0.05) but low (0.16, not statistically significant) for the interpopulation BR-105 x BR-106 crosses (Table 4). Similar patterns were observed for genetic distance and heterosis.

Discussion
Similar levels of genetic distance estimates were obtained using the RAPD, AFLP and SSR markers. The highest genetic distance values occurred with crosses between inbred lines from different heterotic groups (BR-105 x BR-106), these results agreeing with the high level of heterosis exhibited when these populations are intercrossed (Naspolini Fº et al., 1981; Souza Jr. et al., 1993). Although similar average genetic distance values were obtained for the BR-105 and BR-106 intrapopulation crosses, the BR-106 crosses showed the widest range of genetic distances with all of the four different markers assayed, probably because of the broader genetic base of the BR-106 population. Brazilian breeding programs have exploited the genetic diversity of the BR-106 population and demonstrated that high performance cultivars can be obtained from this population (Gerage et al., 1988, 1989). The correlation coefficient values between genetic distance and hybrid performance for the four markers assayed were similar to the correlation values between genetic distance and heterosis, not only for the inter or intrapopulation crosses but also for all crosses combined.
The RFLP assay reflects restriction size variation spread across the genome, and the use of RFLP markers resulted in the greatest average number of alleles per locus as compared to the other marker systems tested. We found that estimates of polymorphism information content (PIC) based on RFLP measures had the lowest standard deviations and were the most informative. As expected, the PIC distributions revealed that, in terms of genetic distance, dominant markers had lower levels of polymorphism as compared to codominant markers. However, we also found that SSR markers gave a more heterogeneous distribution for individual PIC values than RFLP markers, although this might have been due to the low number of polymorphic loci evaluated for this marker (Barbosa et al., 2003). Although the AFLP markers gave the lowest mean PIC value they provided a similar degree of polymorphism information content to that provided by the RAPD markers, which agrees with the results published by Becker et al. (1995), Russell et al. (1997) and Pejic et al. (1998).
Comparisons of the genetic distances generated by different molecular markers in diversity studies have been reported by several authors (Hahn et al., 1995; Russell et al., 1997; Yang et al., 1996) and have revealed only moderate agreement between genetic distance estimates made using RFLP and RAPD markers. Pejic et al. (1998) compared different molecular markers to assess the genetic similarities between maize inbred lines and found great differences in the RAPD similarity clustering pattern. The results obtained in our study showed high agreement between RFLP and AFLP genetic distance estimates, such estimates having also been highly correlated in other studies (Russell et al., 1997; Melchinger et al., 1998). Indeed, we found that the RFLP and AFLP markers produced sufficient numbers of polymorphic bands to produce reliable genetic distance estimates with high correlations between these two marker systems, the similarity between the results being explainable by the fact that they are similar techniques based on restriction site changes.
Although the SSR and RAPD markers did not result in sufficient numbers of polymorphic bands to produce a mean CV of 10% (Figure 3), it is possible that additional bands would lead to lower CV values and increase the reliability of genetic distance estimates. Even though the CV values were not low enough to indicate a high level of precision, the SSR markers produced high, and the RAPD markers moderate, correlations between the genetic distance estimates and hybrid performance and heterosis for the BR-106 intrapopulation crosses. Our results point to the need to adopt different strategies for selecting markers and choosing an upper number of SSR and RAPD markers.
An average CV value of 10% is often cited as being necessary to achieve precise genetic distance estimates (dos Santos et al., 1994; Halldén et al., 1994; Thormann et al., 1994; Tivang et al., 1994). Because the boxplots for each of our groups of samples were skewed we used the mean, median and maximum CV values to determine the adequate number of polymorphic loci needed for acceptable precision. The boxplots (Figure 3) show what happens when the genetic distance CV values, which are different for dominant and codominant markers, are high. The choice of the appropriate number of polymorphic loci required for a reliable estimation of genetic distance is influenced by the criteria used, and it appears that the maximum and median CV values are the best choice for evaluating the precision of the genetic distance estimates based on molecular marker data sets. From the analysis of our data, the maximum CV value appears to be, in most cases, the best guarantee for producing reliable estimates of genetic distance. For dominant markers, where the distribution is skewed towards lower genetic distance CV values, the use of mean or median CV values may lead to errors because some of the genetic distance values will not fall within the required level of precision. For codominant markers, however, the distribution of values within each sample is skewed towards the higher values and it appears that mean or median CV values should be appropriate. We also found extremely high (almost 100%) coefficients of determination for the adjusted equations for both codominant and dominant markers, indicating that extrapolation to outlying points could be done. Thormann et al. (1994) reported that the number of bands required for a mean CV of 10% was 327 for RAPD and 294 for genomic RFLPs to estimate genetic relationships within and between cruciferous species. Pejic et al. (1998) performed a bootstrap procedure to evaluate the variation in the genetic similarities between temperate maize inbred lines across different marker systems and suggested that 150 bands were sufficient for reliable estimates of genetic similarities. Our data indicate that when measuring genetic distances 229 RAPD bands would be required to achieve a median CV value of 10% while 526 RFLP bands would be needed for a maximum CV of 10%. Our AFLP and RFLP genetic distance data appeared to have less dispersion, with only 185 RFLP loci being needed to produce a median CV of less than 5%.
Our results indicate that, apart from the RAPD markers, the other DNA marker systems provided consistent information for diversity studies on tropical maize populations and produced genetic distance estimates which were in good agreement. The RFLP system appears to be the most robust marker assay in terms of the amount of polymorphism surveyed, although, in practice, it is still a very laborious technique. The SSR markers were promising in terms of the polymorphism and information content revealed, but may involve some additional initial costs associated with primer development. The results also suggest that the number of loci evaluated should be increased.
Our results suggest that AFLP markers are the best choice for evaluating diversity and assessing the genetic relationships between tropical maize inbred lines with high accuracy. The AFLP system gives good levels of precision in its genetic distance estimates and in the prediction of single crosses. AFLP results also correlate highly with those obtained using the RFLP system, and the technique is a fast and reliable system capable of supporting a multiplex approach that does not require prior knowledge of the DNA sequence.

Figure 1 - Distribution of polymorphism information content (PIC) data for different maize crosses. The data were obtained using random amplified polymorphic DNA (RAPD), restriction fragment length polymorphism (RFLP), amplified fragment length polymorphism (AFLP) and simple sequence repeat (SSR) markers. SD = standard deviation.

Figure 3 - Boxplots showing sampling variation of the genetic distance (GD) values between different maize inbred lines using bootstrap analysis across different marker systems. Key: a, c = dominant markers; b, d = codominant markers; x = number of loci; y_max = exponential function adjusted relative to the highest coefficient of variation (CV) value; y_median = exponential function adjusted relative to the median CV value; RAPD = random amplified polymorphic DNA; RFLP = restriction fragment length polymorphism; AFLP = amplified fragment length polymorphism; SSR = simple sequence repeat markers.

Table 1 - Mean and range of the genetic distance values for different maize crosses calculated using data from random amplified polymorphic DNA (RAPD), restriction fragment length polymorphism (RFLP), amplified fragment length polymorphism (AFLP) and simple sequence repeat (SSR) markers. The Jaccard similarity coefficient was used for dominant markers and the modified Rogers' distance for codominant markers.

Table 3 - Sample size (number of loci or bands) required so that genetic distance values will have the specified coefficient of variation values.

Table 4 - Pearson correlation coefficient (r) between genetic distance (GD) and F1 grain yield and heterosis for different maize crosses within and between heterotic groups as calculated using data from random amplified polymorphic DNA (RAPD), restriction fragment length polymorphism (RFLP), amplified fragment length polymorphism (AFLP) and simple sequence repeat (SSR) markers.