Demarcation of informative chromosomes in tropical sweet corn inbred lines using microsatellite DNA markers

A study of genetic variation among 10 pairs of chromosomes extracted from 13 tropical sweet corn inbred lines, using 99 microsatellite markers, revealed a wide range of genetic diversity. Allelic richness and the number of effective alleles per chromosome ranged from 2.78 to 4.33 and 1.96 to 3.47, respectively, with respective mean values of 3.62 and 2.73. According to Shannon's information index (I) and Nei's gene diversity coefficient (Nei), Chromosome 10 was the most informative chromosome (I = 1.311 and Nei = 0.703), while Chromosome 2 was the least informative (I = 0.762 and Nei = 0.456). Based on linkage disequilibrium (LD) measurements for loci less than 50 cM apart on the same chromosome, all loci on Chromosomes 1, 6 and 7 were in equilibrium. Even so, there was a high proportion of genetic variation in Chromosomes 4, 5, 8, 9 and 10, revealing their appropriateness for use in genetic diversity investigations among tropical sweet corn lines. Chromosome 4, with the highest number of loci in linkage disequilibrium, was considered the best for marker-phenotype association and QTL mapping, followed by Chromosomes 5, 8, 9 and 10.


Introduction
Sweet corn (Zea mays L. ssp. saccharata, 2n = 20), with a thin pericarp layer on the caryopsis, is consumed at the immature grain stage of endosperm development. It can be grown in a wide range of environments, provided its water requirements are met (Kashiani et al., 2011). Due to its high economic value, production in tropical areas is increasing (Saleh et al., 2010). The sugary (su) or sweet gene on Chromosome 4 either prevents or retards the normal conversion of sugar into starch during endosperm development, hence the sweet taste.
The study of variation among chromosomes, besides being a quick way of detecting genes linked to molecular markers, makes it easy to define the degree of genetic relationships among inbred lines. The use of these molecular markers in the breeding process also helps breeding goals to be reached efficiently, with less reliance on field assaying by inoculation. Over the last two decades, innumerable molecular markers have been developed for almost all the major crop species (Barcaccia, 2010). Microsatellites, also known as simple sequence repeats (SSRs), short tandem repeats (STRs), and sequence-tagged microsatellite sites (STMS), are tandemly repeated units of short nucleotide motifs, 1 to 6 bp long. Di-, tri- and tetra-nucleotide repeats are widely distributed throughout the genomes of plants and animals (Tautz and Renz, 1984; Perseguini et al., 2011). Microsatellite technology is reportedly useful for evaluating genetic diversity and phylogenetic relationships in plant species (Naeem et al., 2011), and microsatellite loci have proven efficient as genetic markers for these purposes. To date, SSRs have been used with corn for mapping (Taramino and Tingey, 1996), genetic fingerprinting (Senior et al., 1998), and assessing genetic diversity among inbred lines (Liu et al., 2003). Nonetheless, chromosomes with a higher locus contribution to genetic variation for important traits among tropical sweet corn lines have not yet been demarcated from those with numerous housekeeping genes. greatly contribute to its breeding efficiency. Currently, information on many of these microsatellites, and their respective locations on the chromosomes, is available in databases. Nonetheless, chromosomes with numerous polymorphic microsatellite loci, capable of detecting high genetic variability, have not yet been distinguished. Moreover, no information is available, from any kind of molecular marker, on the characterization of the 10 specific chromosome pairs.
Although karyotypic data are available, there is an obvious lack of knowledge on inter- and intra-chromosomal variation. This obliges researchers to select microsatellite loci throughout the genome for genetic diversity studies and QTL investigation, making such work time-consuming and costly. Hence, the main objective of the present study was to demarcate informative chromosomes with the highest inter- and intra-chromosomal variation among tropical sweet corn inbred lines, for further marker-assisted breeding work, genetic diversity studies and QTL investigation.

Material and Methods
Through an extensive breeding program carried out at Universiti Putra Malaysia, a set of homozygous inbred lines developed from various tropical source populations was obtained after eight generations of self-pollination and selection. Among these, 13 lines, originally developed from Malaysian, Indonesian, Hawaiian, Taiwanese and Thai source populations, were selected for investigation of innate chromosomal variation. Twenty seeds from each line were germinated in jiffy cups. The seedlings were grown to the two-leaf stage, whereupon genomic DNA was extracted from the young leaves of 10 seedlings per line using a DNeasy® Plant Mini Kit (QIAGEN®), according to the manufacturer's instructions, but with minor modifications to the washing steps.
Based on their polymorphism information content (PIC) and QTL information as previously reported, 105 microsatellite regions distributed throughout the corn genome were chosen from the maize genome database (MaizeGDB). Six of the 105 microsatellite primer pairs failed to amplify, even when tested at different annealing temperatures. The remaining 99, as well as their respective locations on the ten corn chromosomes, are shown in Figure 1. Amplifications were carried out in 20 µL PCR reactions containing 5 µL (25-30 ng) of genomic DNA, 1.5 µL of 10x PCR buffer, 1.5 µL of 25 mM MgCl2, 0.3 µL of dNTP mix (10 mM each of dATP, dGTP, dCTP, dTTP), 0.2 unit of Taq polymerase (all from QIAGEN®, Taq DNA Polymerase Kit), 1.8 µL (4 pmol/µL) of each primer (F and R primers), and 9.7 µL of distilled water. PCR amplification was performed in an Eppendorf Mastercycler Gradient thermal cycler (Eppendorf Scientific, Westbury, NY), using 96-well plates. Amplification conditions with the touchdown thermal cycling protocol were 95°C for 1 min, 30 cycles of 94°C for 1 min, a 67°C annealing temperature decreasing by 0.4°C per cycle for 2 min and 72°C for 2 min, and a terminal extension step at 72°C for 1 h. Following amplification, 10 µL of amplified DNA were mixed with 5 µL of a formamide loading buffer, and then loaded onto 4% (w/v) MetaPhor agarose 36-cm well-to-read gels with 1x TBE buffer. Electrophoresis was performed at 100 volts for 5 min, followed by 55 volts for approximately 4 h, until the bromophenol blue band of the loading dye had moved forward by 10 cm. After staining with ethidium bromide, all the gels were visualized under UV light with an AlphaEase® FC Imaging System (Alpha Innotech Corporation, CA). The ChemiImager™ Gel Doc imaging system (Alpha Innotech Corporation, CA) was used to record the gels as JPEG files for counterchecking.
Fragment sizes were estimated based on GeneRuler™ 25 and 50 bp DNA Ladders (Fermentas).
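The touchdown annealing schedule described above can be sketched as a short calculation. This is only an illustration, assuming that the first cycle anneals at 67°C and that the 0.4°C decrement is applied from the second cycle onward; the function name `touchdown_schedule` is hypothetical and is not part of the original protocol.

```python
def touchdown_schedule(start_c=67.0, step_c=0.4, cycles=30):
    """Return the annealing temperature (in deg C) for each PCR cycle.

    Assumption: cycle 1 anneals at start_c, and each later cycle is
    step_c lower than the one before it.
    """
    return [round(start_c - step_c * i, 1) for i in range(cycles)]


if __name__ == "__main__":
    temps = touchdown_schedule()
    # Print the first and last annealing temperatures of the 30-cycle run.
    print(f"cycle 1: {temps[0]} C, cycle {len(temps)}: {temps[-1]} C")
```

Under these assumptions the annealing temperature falls from 67.0°C in the first cycle to 55.4°C in the thirtieth.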
Descriptive statistics for each chromosome, including average number of alleles (n_a), number of effective alleles (n_e) (Kimura and Crow, 1964), observed and expected homozygosity (Hom_o and Hom_e, respectively), observed and expected heterozygosity (H_o and H_e, respectively) (Levene, 1949; Nei, 1987), average heterozygosity (H̄), Shannon's information index (I) (Lewontin, 1972), Nei's expected heterozygosity (Nei, 1973), gene flow (Nm) (Nei, 1987), and coefficient of inbreeding (F) (Lukas and Donald, 2002), were all estimated using the Population Genetic Analysis software (POPGENE) version 1.3.1 (Yeh et al., 1999). F-statistics (F_IS, F_ST and F_IT) for each chromosome were estimated from variance components based on Wright (1978). Classical (D), standardized (D'), and conventional (r²) linkage disequilibrium (LD) coefficients for any pair of alleles amplified at two loci on the same chromosome were estimated based on Lewontin and Kojima (1960), using the Arlequin suite version 3.5 software (Excoffier and Lischer, 2010).
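For readers unfamiliar with these indices, the following minimal sketch shows how three of them are commonly computed for a single locus from allele frequencies p_i: the number of effective alleles, n_e = 1/Σp_i²; Shannon's information index, I = -Σ p_i ln p_i; and Nei's (1973) gene diversity, 1 - Σp_i². It is an illustration of the textbook formulas, not a reproduction of POPGENE's implementation; the function names and the example alleles are hypothetical.

```python
import math
from collections import Counter


def allele_frequencies(alleles):
    """Map each allele label to its relative frequency in the sample."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def locus_diversity(alleles):
    """Return (n_a, n_e, I, Nei) for one locus.

    n_a : number of observed alleles
    n_e : effective number of alleles, 1 / sum(p_i^2)
    I   : Shannon's information index, -sum(p_i * ln p_i)
    Nei : Nei's (1973) gene diversity, 1 - sum(p_i^2)
    """
    freqs = list(allele_frequencies(alleles).values())
    sum_sq = sum(p * p for p in freqs)
    n_a = len(freqs)
    n_e = 1.0 / sum_sq
    shannon = -sum(p * math.log(p) for p in freqs)
    nei = 1.0 - sum_sq
    return n_a, n_e, shannon, nei


if __name__ == "__main__":
    # Hypothetical locus: four alleles at equal frequency.
    print(locus_diversity(["a", "b", "c", "d"]))
```

With four equally frequent alleles, n_e equals n_a (4), I equals ln 4, and Nei's diversity is 0.75; unequal frequencies lower all three values, which is why n_e is smaller than n_a in the chromosome-level estimates.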

Results
There was a wide range of genetic variation among the 10 pairs of chromosomes obtained from the inbred lines evaluated. Allelic richness and the number of effective alleles per chromosome ranged from 2.78 to 4.33 and 1.96 to 3.47, respectively, with respective mean values of 3.62 and 2.73 (Table 1). The combination of high homozygosity (Hom_o = 0.9396) and very low heterozygosity (H_o = 0.0604) was a sure indication of Hardy-Weinberg disequilibrium arising from heterozygosity loss during selfing and selection. Heterozygote locus deficiency was also revealed by the high and negative proportion of total chromosomal inbreeding, as an outcome of inbreeding among loci within chromosomes (F_IS = -0.6666), and the high coefficient of inbreeding (F = 0.8985). Chromosome 7 was found to be the most homozygous among the lines studied (Hom_o = 0.9715), and Chromosome 3 the least (Hom_o = 0.8928). Chromosome 7 possessed the lowest rate of gene flow (Nm = 0.0094) of all (Table 1).
According to Shannon's information index (I) and Nei's gene diversity, Chromosome 10 was the most informative (I = 1.3111 and Nei = 0.7033), and Chromosome 2 the least variable (I = 0.7616 and Nei = 0.4562). The proportion of total chromosomal inbreeding due to variation among chromosomes, and that due to both inbreeding within the chromosome and variation among chromosomes, indicate the wide range of genetic diversity among the 10 pairs (F_ST = 0.9385 and F_IT = 0.8972). This was also shown by the high I and Nei estimates, with mean values of 1.0503 and 0.5850, respectively. Chromosomes 4, 5, 8, 9 and 10, with the highest number of alleles per locus, number of effective alleles, Shannon's information index and Nei's heterozygosity coefficient, were identified as the informative chromosomes in the inbred lines, making them appropriate for these specific genetic diversity studies.
A total of 446 pairs of alleles were in linkage disequilibrium, of which only 50 were less than 50 cM apart on the same chromosome (Table 2). Chromosome 4 bore the highest number of pairs in LD (122), whereas Chromosome 6 had only six. Considering only loci less than 50 cM apart, all were in linkage equilibrium on Chromosomes 1, 6 and 7. The pairs in linkage disequilibrium with the shortest map distance from each other were bnlg1208 and dupssr10 (1.80 cM), umc1230 and bnlg1520 (2.35 cM), phi080 and dupssr14 (3.53 cM), bnlg1152 and bnlg1607 (9.86 cM), bnlg2162 and umc1086 (10.47 cM), and umc1165 and umc1227 (17.48 cM), located on Chromosomes 5, 2, 8, 8, 4 and 2, respectively (Table 2).
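The three F-statistics reported above are linked by Wright's standard identity, (1 - F_IT) = (1 - F_IS)(1 - F_ST). The sketch below uses that identity as a simple consistency check on the reported mean values. It is an illustration only: the function name is hypothetical, and the estimates in this study came from variance components, so exact agreement is not expected.

```python
def fit_from_fis_fst(f_is, f_st):
    """Wright's identity: (1 - F_IT) = (1 - F_IS) * (1 - F_ST)."""
    return 1.0 - (1.0 - f_is) * (1.0 - f_st)


if __name__ == "__main__":
    # Mean values reported in the Results.
    f_it = fit_from_fis_fst(f_is=-0.6666, f_st=0.9385)
    print(f"F_IT implied by F_IS and F_ST: {f_it:.4f} (reported: 0.8972)")
```

The implied value is approximately 0.8975, which agrees with the reported F_IT of 0.8972 to within rounding.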
The highest conventional measure of linkage disequilibrium between pairs of alleles at two loci was in umc1165-4 and umc1227-4, followed by umc1109-4 and bnlg1337-4, bnlg1444-4 and umc1086-4, umc1532-4 and bnlg1337-3, bnlg244-6 and umc1033-6, and umc1319-5 and phi063-5 (all with r² = 0.999, χ² = 26.00, p ≤ 0.001), located on Chromosomes 2, 4, 4, 4, 9 and 10, respectively. All were less than 50 cM apart. Thus, linkage between loci on Chromosome 4 was stronger than on the others. In general, Chromosome 4, with the highest number of pairs of alleles at two loci in linkage disequilibrium (122), the highest number of such pairs located less than 50 cM apart (14), and the highest overall mean of the conventional measure of linkage disequilibrium between pairs of alleles at two loci less than 50 cM apart (r² = 0.44), is a highly desirable candidate for further QTL analysis.
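The LD coefficients named in the Methods can be illustrated with the usual two-allele definitions: D = p_AB - p_A p_B, D' = D / D_max, and r² = D² / [p_A(1 - p_A) p_B(1 - p_B)]. The sketch below also computes the commonly used test statistic χ² = N r², where N is the number of gene copies sampled. These are textbook formulas, not Arlequin's code; the function name and the example frequencies are hypothetical, and N = 26 (13 lines × 2) is an assumption made for illustration.

```python
def ld_coefficients(p_ab, p_a, p_b, n_gametes):
    """Return (D, D', r2, chi2) for allele A at one locus and B at another.

    p_ab      : frequency of the A-B haplotype
    p_a, p_b  : frequencies of alleles A and B
    n_gametes : number of gene copies sampled (assumed for chi2 = N * r2)
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("allele frequencies must lie strictly between 0 and 1")
    d = p_ab - p_a * p_b
    # D_max depends on the sign of D (Lewontin's standardization).
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    chi2 = n_gametes * r2
    return d, d_prime, r2, chi2


if __name__ == "__main__":
    # Hypothetical, perfectly associated alleles: A and B always occur together.
    p = 6 / 26
    d, d_prime, r2, chi2 = ld_coefficients(p_ab=p, p_a=p, p_b=p, n_gametes=26)
    print(f"D={d:.4f}  D'={d_prime:.3f}  r2={r2:.3f}  chi2={chi2:.2f}")
```

For a perfectly associated pair, r² approaches 1 and, under the N = 26 assumption, χ² approaches 26, which is in line with the r² = 0.999 and χ² = 26.00 reported for the most strongly associated pairs.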

Discussion
Previous studies have shown that corn contains numerous (Senior and Heun, 1993; Senior et al., 1996, 1998) and highly polymorphic microsatellites, even among small samples of corn inbreds (Taramino and Tingey, 1996). This could be due to their capacity to detect codominantly inherited length polymorphisms of repetitive DNA sequences, and thus discriminate between large numbers of alleles (Matsuoka et al., 2002; Perseguini et al., 2011). Variation among the 10 sweet corn chromosomes, obtained from the inbred lines investigated here, could be an expeditious basis for determining the degree of genetic relationships among different lines, or for detecting genes linked to microsatellite loci. Chromosomes with high variation among inbred lines could be employed in future genetic diversity research, in which a low number of amplified loci could reveal high diversity. The highest genetic variation among the inbred lines was in Chromosomes 10, 8, 5, 9 and 4. Mohammadi et al. (2008) reported that Chromosomes 1, 6 and 9, with 4, 4 and 3 informative markers, respectively, contributed significantly to total variation in yield in corn inbred lines. Chromosomes with a high Shannon's information index (I) and a high Nei's heterozygosity coefficient (Nei) were found to be responsible for a high proportion of total genetic variation, hence their utility in discriminating inbred lines quickly and at a lower cost. Nevertheless, the high I and Nei values obtained from a particular chromosome might be due to locus heterozygosity, rather than the heterozygosity of corresponding chromosomes coming from different inbred lines. Chromosomes 9, 8, 4 and 10, with over 90% homozygosity, were found to have the highest number of alleles per locus, number of effective alleles, and Shannon's information index, as well as a high Nei's heterozygosity coefficient, indicating their capacity to discriminate among the inbred lines studied. Zhang et al. (1994, 1996) investigated the effectiveness of chromosomal variation based on informative markers, by improving the correlations between molecular marker divergence and hybrid performance. Their conclusion was that the relationship between the effectiveness of chromosomal variation based on molecular marker heterozygosity and heterosis is variable, depending on the genetic material used in the study, germplasm diversity, and the complexity of the genetic basis of heterosis. Crop yields and their components are complex characters, controlled by multiple genes and environmental factors.
Table notes: LD = linkage disequilibrium; D cM = difference between two alleles in cM; D = classical linkage disequilibrium coefficient measuring deviation from random association between alleles at different loci; D' = standardized linkage disequilibrium coefficient; r² = conventional measure of linkage disequilibrium between pairs of alleles at two loci; χ² = chi-square value. ** and * = significant at p ≤ 0.01 and p ≤ 0.05, respectively.
Linkage disequilibrium (LD), the non-random association of alleles at closely linked loci, is used to infer the location of genes coding for traits by virtue of their correlated appearance with surrounding markers (Zavattari et al., 2000). In contrast to animals and humans, little information is available on linkage disequilibrium in crop species, with most research concentrating on Arabidopsis thaliana and field corn (Flint-Garcia et al., 2003; Stich et al., 2005, 2006). Since homogeneity was predominant, analysis was concentrated on linkage disequilibrium among the inbred lines, and not within a single line or group. The high incidence of linkage disequilibrium (446 pairs of loci in LD) implied that the extent of LD between microsatellite markers could possibly facilitate the detection of marker-phenotype associations in a genome scan. This value, although in agreement with the results of Liu et al. (2003), was considerably higher than that reported by Remington et al. (2001), and lower than that reported by Stich et al. (2005). The discrepancy between the present results and those mentioned above can presumably be attributed to marker density and the number of inbred lines evaluated. The marker density used in this study, although much higher than that reported by Remington et al. (2001), was approximately equal to that used by both Liu et al. (2003) and Stich et al. (2005). However, the number of inbred lines used in the current study was lower than that used by Remington et al. (2001), Liu et al. (2003) and Stich et al. (2005).
It has been shown theoretically that selection acting on a monogenic trait generates LD around the gene. Furthermore, if selection involves an oligogenic or polygenic trait, LD is generated not only between linked genes, but also between unlinked genes coding for the trait. During development of the inbred lines used here, selection took place simultaneously for several traits with high heritability estimates and significant correlations with yield, and could consequently also have generated LD between the genes influencing different traits. Therefore, the high proportion of LD observed in this study was probably generated by selection. It has been suggested that selection, relatedness, population stratification and genetic drift are important forces generating and conserving LD between pairs of microsatellite markers distributed genome-wide (Stich et al., 2005). Furthermore, increasing physical distance between pairs of markers was found to decrease LD, due to crossing-over (Talbert and Henikoff, 2010). However, no correlations could be found between physical distance and LD in corn Centromere 2, which had been fully sequenced (Wolfgruber et al., 2009), possibly because the conversion of one marker ordinarily has no effect on the coinheritance of its neighbours, and because crossing-over is suppressed around centromeres.
In order to avoid disequilibrium between loci influencing several traits, only LD between loci less than 50 cM apart on the same chromosome was considered. Under these conditions, 66.7% of Chromosome 4 loci were in LD, whereas only 18.2% of Chromosome 3 and 5 loci were. Furthermore, Chromosome 4 was found to have the highest number of linkages among loci separated by either more or less than 50 cM (144 and 14 pairs in LD, respectively), indicating the high potential utility of this chromosome for investigating marker-phenotype association and QTL. The presence of QTLs for corn agronomic traits on Chromosome 4 has been amply reported (Sourdille et al., 1996; Lubberstedt et al., 1997; Ribaut et al., 1997; Khairallah et al., 1998; Yousef and Juvik, 2002; Juvik et al., 2003; Messmer et al., 2009). Furthermore, since they possess a considerable number of loci in disequilibrium (9, 7, 6 and 6, respectively), Chromosomes 8, 10, 9 and 5 were also identified as candidate chromosomes for further QTL studies. Based on the latest information from MaizeGDB (March, 2011), 768 out of 1716 QTL reported for corn agronomic traits were found to be on Chromosomes 4, 5, 8, 9 and 10 (114, 179, 189, 171 and 115, respectively). Among the QTL found on Chromosomes 4, 5, 8, 9 and 10, 357 (46.48%) were found to have contributed to yield (Ho et al., 2002; Landi et al., 2002; Sibov et al., 2003; Moreau et al., 2004; Messmer et al., 2009; Hu et al., 2010a,b), indicating that, in the present study, the microsatellites in LD on these same chromosomes are useful for detecting yield-marker association for further marker-assisted selection of inbred lines and their F1 single-cross progenies.
Finally, microsatellites were found to be informative markers for revealing chromosomal variation among the tropical sweet corn inbred lines studied here. Chromosomes 4, 5, 8, 9 and 10 were found to possess a high proportion of specific genetic variation, revealing their appropriateness for genetic diversity studies. Chromosome 4, with the highest number of loci in linkage disequilibrium, is the most appropriate for marker-phenotype association and QTL mapping for yield and yield-related traits, with Chromosomes 5, 8, 9 and 10 as runners-up.