Identification of RAPD and SCAR markers associated with yield traits in the Indian tropical tasar silkworm Antheraea mylitta drury

The tropical tasar silkworm, Antheraea mylitta, is a semi-domesticated vanya silk-producing insect of high economic importance. To date, no molecular marker associated with cocoon and shell weights has been identified in this species. In this report, we identified a randomly amplified polymorphic DNA (RAPD) marker and examined its inheritance, and also developed a stable diagnostic sequence-characterized amplified region (SCAR) marker. Silkworms were divided into groups with high (HCSW) and low (LCSW) cocoon and shell weights, and the F2 progeny of a cross between these two groups were obtained. DNA from these silkworms was screened by PCR using 34 random primers and the resulting RAPD fragments were used for cluster analysis and discriminant function analysis (DFA). The clustering pattern in a UPGMA-based dendogram and DFA clearly distinguished the HCSW and LCSW groups. Multiple regression analysis identified five markers associated with cocoon and shell weights. The marker OPW16905 bp showed the most significant association with cocoon and shell weights, and its inheritance was confirmed in F2 progeny. Cloning and sequencing of this 905 bp fragment showed 88% identity between its 134 nucleotides and the Bmc-1/Yamato-like retroposon of A. mylitta. This marker was further converted into a diagnostic SCAR marker (SCOPW 16826 bp). The SCAR marker developed here may be useful in identifying the right parental stock of tasar silk-worms for high cocoon and shell weights in breeding programs designed to enhance the productivity of tasar silk.


Introduction
The rearing of silkworm species of the families Bombycidae and Saturniidae is an age old practice that is an important part of the rural economy in India. The tropical tasar silkworm (Antheraea mylitta) produces tasar silk, otherwise known as vanya silk, that is famous for its luster, durability and uniqueness. Tasar silk fetches excellent price on the Indian domestic market and abroad. However, this silkworm occurs mainly in the wild, with only two races (Daba and Sukinda) being exploited for commercial egg production; both of these races are reared under semidomestic conditions. Since this silkworm is heterogeneous (Suryanarayana and Srivastava, 2005), with a high level of heterozygosity (Kar et al., 2005), the development of stable heterotic breeds or lines is difficult. Although conventional breeding techniques have been used to develop some breeds, none of them is suitable for widespread use by farmers. On the other hand, the need to improve the production of tasar silk and provide employment for poor farmers requires the use of highly productive breeds of this silkworm. In this setting, the identification of molecular markers associated with yield traits is of immense importance.
In recent years, several types of molecular markers have been introduced into crop and animal improvement programs. Randomly amplified polymorphic DNA (RAPD) is a technique that has been used to construct a linkage map of the silkworm Bombyx mori (Williams et al., 1990;Proboon et al., 1995) and to generate a good number of morpho-biochemical markers (Tazima, 1964;Doira, 1978;Shi et al., 1995;Nagaraju and Singh, 1997). In contrast, random fragment length polymorphism (RFLP) has been used to map food preference genes in B. mori while inter-simple sequence repeat (ISSR) markers have been used to establish the inheritance of yield traits (Keisuke et al., 2007). Multiple regression analysis (MRA) has been the tool of choice for identifying markers associated with the main yield traits such as cocoon weight and shell weight (Chatterjee and Pradeep, 2003;Chatterjee and Mohandas, 2003). RAPD markers have also been used for linkage analysis of some phenotypical characters in plants, such as fruit skin color (Inoue et al., 2006). Rapid progress has also been made in the development of RFLP-based quantitative trait loci (QTL) mapping for the improvement of crop species such as pepper and tomato (McCouch et al., 1997). DNA markers such as RFLP and RAPD linked to a specific yield trait have been widely explored and successfully used to improve target traits in crop plants and livestock (Tanksley et al., 1989).
RAPD and sequence characterized amplified region (SCAR) markers have been used to identify ecoraces and estimate the genetic variability in tasar silkworms (Saha and Kundu, 2006;Saha et al., 2008). The marked diversity and variation in these silkworms (Kar et al., 2005) has made use of bulk segregation analysis to identify markers associated with yield traits a challenging task (Martin et al., 1991;Michaelmore et al., 1991). To date, no molecular studies have attempted to identify yield trait-specific markers in A. mylitta, although the availability of such markers would facilitate selection during directional molecular breeding and would improve our understanding of the molecular genetic phenomenon behind yield parameters and their proper genetic manipulation. In this report, we describe the first identification and inheritance pattern of an RAPD marker, along with the development of a stable diagnostic SCAR marker associated with two important yield parameters (cocoon weight and shell weight) in ecorace Daba of A. mylitta.

Tasar silkworm
Ecorace Daba of A. mylitta was used in this study. Two experimental groups were formed during diapause based on cocoon weight and shell weight. The high cocoon and shell weight group (HCSW) consisted of individuals with a cocoon weight > 11 g and a shell weight > 2 g whereas the low cocoon and shell weight group (LCSW) consisted of individuals with a cocoon weight <10 g and shell weight < 1 g. In each group, care was taken to maintain the male:female ratio (55:45) that is characteristic of random populations of tasar silkworm. Initially, the size of each group was limited to 1000 and there were significant differences in the mean values for cocoon weight and shell weight (p < 0.001, Student's t-test). HCSW x LCSW crosses were undertaken during the first crop (July to August), also known as the seed crop. The larvae of both parental groups and crosses were reared on Terminalia arjuna using standard rearing practices. In the subsequent generation, the offspring were allowed to mate among themselves. In the second crop (September to November), also known as the commercial crop or the diapausing generation under Ranchi conditions, the parental combinations and F 2 progeny of the HCSW x LCSW cross were obtained. In the parental HCSW group the average cocoon weight was 14.52 ± 0.87 g and the shell weight was 2.53 ± 0.38 g. Similarly, in the parental LCSW group the cocoon and shell weights were 9.31 ± 0.35 g and 0.85 ± 0.23 g, respectively.
In the F 2 generation, the cocoon and shell weights were 11.52 ± 0.27 g and 1.53 ± 0.31 g, respectively. Random samples of these cocoons were used to obtain genomic DNA for molecular analysis.

DNA isolation and PCR for RAPD
Genomic DNA was isolated from the fat body tissue of individual pupae by a standard method (Sambrook et al., 2003). Briefly, 0.2 g of fat bodies from individual pupae was ground in liquid nitrogen using a prechilled mortar and pestle and suspended in 10 mM Tris-HCl, pH 8.0, containing 100 mM NaCl, 25 mM EDTA and 0.5% SDS. The suspension was then digested with proteinase K (100 mg/mL) at 50°C overnight and extracted with phenol:chloroform:isoamyl alcohol (25:24:1, v/v). RNA was removed by treatment with RNase A (100 mg/mL) at 37°C for 30 min and DNA was precipitated with 2.5 volumes of ethanol. Finally, the DNA pellet was dissolved in TE buffer (10 mM Tris-HCl, 1 mM EDTA, pH 8.0) and quantified spectrophotometrically. The quality of the purified DNA was analyzed by electrophoresis in a 0.8% agarose gel and diluted to 5 ng/mL prior to use in the RAPD experiments.
DNA isolated from individual pupae of the HCSW and LCSW groups and F 2 progenies was initially screened by PCR with 34 random primers at the optimum annealing temperature and MgCl 2 concentration indicated in Table 1. RAPD-PCR for each DNA sample was done in triplicate, as described by Williams et al. (1990). The 25 mL reaction mixture contained 1X PCR buffer (Bioline), a variable concentration of MgCl 2 (2.0-2.5 mM), 100 mM dNTP mix, 0.2 mM primer, 0.75 U of Taq polymerase (Bioline) and genomic DNA (25 ng). The PCR was done using the following conditions: initial denaturation at 94°C for 3 min, followed by 45 cycles of denaturation at 94°C for 1 min, annealing at 36-42°C for 1 min, and extension at 72°C for 2 min, with a final extension at 72°C for 7 min. The amplified products were electrophoresed in a 1.5% agarose gel, stained with ethidium bromide, visualized with a UV transilluminator and photographed with a Kodak EDAS 290 gel documentation system.

Cloning and sequencing of an HCSW-specific RAPD fragment
An amplicon (~900 bp) specific for the HCSW group (amplified by the primer OPW16) was excised from the agarose gel and eluted with a QIAquick gel extraction kit (QIAGEN) according to the manufacturer's protocol. The eluted DNA was cloned into the vector pTZ57R/T (MBI Fermentas), transformed in XL Blue E. coli cells and the recombinant clones were selected on LB agar plates containing ampicillin and tetracycline. Plasmid DNA was isolated from these recombinant bacteria and analyzed by digestion with EcoRI and HindIII. Plasmids harboring inserts of the proper size were sequenced using M13 forward and reverse 744 Dutta et al.
sequencing primers with a cycle sequencing kit (Big Dye R Terminator 3.1, ABI) in an automated DNA sequencer (ABI Model 3100). The sequence was analyzed with the Sequencher program (Genecodes Corp.) and BLAST searches were used to identify homology with other sequences.

Development of diagnostic SCAR markers
A pair of specific forward and reverse primers (AmSCAR-F 5-TAGTGAAGGCGGCGTAGAGTAAGG GAG-3 and (AmSCAR-R 5-ACACTACACAAAGTACC AGCGACCCG-3), designed based on the sequence of the HCSW fragment (OPW 16 905 bp ) identified in the previous section, were used to amplify specific bands of genomic DNA from all individuals in the HCSW, LCSW and F 2 groups in order to convert the corresponding RAPD marker into a single locus sequence-characterized amplified region (SCAR) marker. PCR was done in a 25 mL reaction mixture containing 25 ng of genomic DNA, 2 mM MgCl 2 , 1X PCR buffer (Bioline), 100 mM dNTP mixture, 0.2 mM primers and 1 U of Taq polymerase (Bioline) under the following conditions: initial denaturation at 94°C for 3 min, followed by 35 cycles of denaturation at 94°C for 1 min, annealing at RAPD and SCAR markers for yield traits in A. mylitta 745 65°C for 1 min and extension at 72°C for 2 min, with a final extension at 72°C for 7 min. The PCR products were separated by electrophoresis on a 1% agarose gel, stained with ethidium bromide, visualized with a UV transilluminator and photographed. This SCAR marker was then cloned, sequenced and aligned with the sequence of OPW16 905 bp using ClustalW2 (EMBL-EBI).

Statistical analysis of RAPD data
The reproducible RAPD bands generated by 34 random primers were scored for presence (1) and absence (0) in a binary fashion. Similarity coefficients (S) between isolates were calculated using the formula S = 2Nxy/(Nx + Ny), where Nx and Ny are the number of fragments amplified in samples X and Y, respectively, and Nxy is the number of bands shared by the two isolates (Nei and Li, 1979). Similarity coefficients were converted to genetic distances (D) using the equation: D = 1 -S. A genetic distance matrix was used to construct a dendrogram by the unweighted pair group method with arithmetic averages (UPGMA) using POPGENE 1.32 software (Yeh and Boyle, 1997). The RAPD data were subjected to bootstrap analysis with 1000 replications by using the program WINBOOT and selecting the Dice similarity coefficient (Yap and Nelson, 1996), which is the same as the similarity coefficient of Nei and Li (1979). Discriminant function analysis (DFA) was done with the SPSS/PC+ 11.5 program (M. J. Norusis, SPSS Inc., Chicago). The squared Mahalanobis distance was used to test the group centroid. The Mahalanobis distance is the distance between a case and the centroid (mean of all cases in the group) for each group of the dependent variable in the attributed space (n -dimensional space defined by n variables) (Af and Clark, 1984). For DFA, the dependent variable was cocoon weight and four arbitrary groups were used (two each from the low and high cocoon weight groups). The range of cocoon weights for groups 1, 2, 3 and 4 was 7.0-8.5 g, 8.6-9.9 g, 11.0-12.5 g and > 12.6 g, respectively.
The association between RAPD markers and cocoon weight was estimated by stepwise multiple regression analysis in which cocoon weight was treated as the dependent variable and the RAPD markers were treated as independent variables. The analysis was based on the model: Y = a + b 1 m 1 + b 2 m 2 + + b j m j + + b n m n + d + e, which related the variation in the dependent variable (Y = accession means for pupation rate) to a linear function for the set of independent variables m j , the latter representing the RAPD markers. In this equation, bj corresponds to the partial regression coefficients that specified the empirical relationship between Y and m j , d represents the accession residual that is left after regression, and e is the random error of Y that includes environmental variation. To select independent variables for the regression equation, F-values with probabilities of 0.045 and 0.099 were used. R 2 denotes the square of R, the multiple regression value. Selected mark-ers were also tested with linear curve fitting using linear models.

RAPD-PCR of genomic DNA
The screening of genomic DNA from randomly selected tasar silkworms from two weight groups (HCSW and LCSW) and F2 progenies using 34 random primers (Table 1) yielded several reproducible amplicons. The average number of amplicons produced per DNA sample was 2-7 per primer, with sizes that ranged from 300 to 2500 bp. The percentage of polymorphism was 79% and the degree of polymorphism appeared to be independent of the number of markers generated by a particular primer. Figure 1 shows the UPGMA-based cluster analysis of 18 silkworm samples. The dendrogram separated the HCSW (samples 10 to 18) and LCSW (samples 1 to 9) groups into two major clusters with a significant bootstrap value of 92. In the HCSW group, samples 10, 11, 16, 12, 14 and 13 formed a sub-cluster while samples 15, 18 and 17 746 Dutta et al. formed another sub-cluster. Similarly, in the LCSW group, samples 3, 8, 4 and 1 formed a sub-cluster and samples 5, 6, 7, 9 and 2 formed another sub-cluster.
For DFA, the dependent variable was cocoon weight and four arbitrary groups were defined (two each from the low and high cocoon weight groups). The mean range of cocoon weight for groups 1, 2, 3 and 4 was 7.0-8.5 g, 8.6-9.9 g, 11.0-12.5 g and >12.6 g, respectively.
A matrix plot of discriminant function 1 versus function 2 revealed separation of the weight groups (Figure 2). The HCSW group occupied a position from 2.0 to 9.0 on the x-axis and from -3.0 to 2.1 on the y-axis of the plot. In the dendogram, this group also formed clusters clearly separated from the LCSW group. In the LCSW group, samples 2, 5, 6, 7 and 9 occupied a position from -5.8 to -8.0 on the x-axis and from 2.0 to 5.0 on the y-axis. In the dendrogram, this group also formed a major cluster. In addition, samples 1, 3, 4 and 8 of the LCSW group occupied positions from -5.0 to -9.0 on the x-axis and from -5.9 to -3.2 on the y-axis. In the dendrogram, these samples also formed a separate cluster. DFA identified two functions, the first of which explained 78.9% of the variance. The canonical correlation value was estimated to be 0.991 (Wilk's l = 0.001; c 2 = 87.255, p <0.001). The results of DFA thus supported the UPGMA-based cluster analysis.

Association of RAPD markers with quantitative traits
Stepwise multiple regression analysis (MRA) was used to assess the relationship (association) of DNA markers with yield parameters such as cocoon weight and shell weight. In MRA, each variable is entered sequentially and its value assessed. If adding the variable contributes to the model then it is retained and all other variables in the model are re-tested to determine whether they still contribute to the success of the model. In this case, R 2 is interpreted for the overall relationship at the step or model when the last statistically significant variable was entered. Variables that no longer make a significant contribution are removed. This method ensures that the smallest possible set of predictor variables is included in the steps or models. Table 2 provides the summary statistics of MRA for five identified markers. In the first model, MRA identified OPW16-2 which contributes to 61.2% of the total variance (R = 0.782, p <0.0001); the standardized b coefficient was also very high (0.757). In the next step or model 2, an additional marker band (OPW18-5) that contributed a further 16.1% of variance was identified. Subsequent steps identified three additional markers (OPC9-3, HAP03-6 and OPAJ08-4) and the final step (model 5) accounted for 93.7% of the variance (R 2 = 0.937). The high regression coefficients (R) with significant F values substantiated the strength of the association of these markers with cocoon weight and shell weight. The primer OPW16 consistently amplied a single, intense band of~900 bp (designated as OPW16 905 bp ) along with other bands of different sizes in all individuals of the HCSW group. . The scatter diagram of the MRA plot ( Figure 3) showed a significant, positive linear relationship between the OPW16-2 marker (i.e., OPW16 905 bp ) and the samples of the HCSW and LCSW groups.

RAPD fragment
Electrophoretic analysis of the PCR products amplified by primer OPW 16 revealed 152 amplicons of 13 sizes. Of these amplicons, 72 belonged to the HCSW group, 51 to the LCSW group and 29 to F2 progeny (HCSW x LCSW) (Figure 4). The amplicon sizes varied from 500 to <2500 bp and the observed polymorphism in the HCSW and LCSW groups and F2 progeny was 86.1%, 84.3% and 44.8%, respectively (Table 3). A single, intense band of~900 bp was consistently amplified in all individuals of the HCSW group and in 50% of the F 2 progeny (HCSW x LCSW) but in none of the LCSW group, except for sample 19, in which there was non-specific amplification of a band with a similar size (not amplified by the SCAR primer). Cloning and sequencing of this HCSW-specific RAPD fragment showed that it consisted of 905 nucleotides containing the OPW16 sequence at its 5' and 3' ends (GenBank accession number: JQ710731). BLAST searches of the nucleotide sequences in GenBank showed that 134 nucleotides of this fragment (nucleotides 628 to 762) shared 88% identity with the A. mylitta Bmc1/Yamato-like retroposon sequence, with an E value of 3e-34.

Conversion of OPW 16 905 bp into a SCAR marker
To generate a stable HCSW-specific diagnostic SCAR marker, two primers (AmSCAR-Forward and AmSCAR-Reverse) for PCR were synthesized based on the OPW16 905 bp sequence. As shown in Figure 5, a single specific band of~800 bp was amplified only in the HCSW group and in 50% of the F 2 progeny, thus indicating inheritance of this marker. Cloning and sequencing of this SCAR marker (826 bp) revealed a complete match with the original sequence of the OPW16 905 bp marker. The lack of this specific amplicon in the LCSW group indicated the efficacy of this marker in distinguishing the HCSW group from the others.

Discussion
Among commercial silk producing lepidopterans, A. mylitta has a special status, not only for its unique silk but also because of its contribution to the rural economy in India. In addition to its common use as a natural filament, two proteins produced by this silkworm (fibroin and sericin) have recently been shown to be useful biomaterials as a matrix for culturing cells (Acharya et al., 2009). However, silk production by this silkworm is hampered by a lack of high yield breeds or hybrids, hence the need to increase productivity through modern molecular tools. 748 Dutta et al.  The use of molecular markers to select parents and hybrids with desirable traits at the early stages of breeding programs is gradually increasing. Virk et al. (1996) strongly advocated the use of a regression model to identify the best genotype within a segregating population or in a pool of land races/varieties. In the silkworm B. mori, MRA has been used to identify molecular markers associated with yield traits (Sethuraman et al., 2002;Chatterjee and Pradeep, 2003;Chatterjee and Mohandas, 2003;Mohandas et al., 2004).
RAPD analysis provides wider coverage of the genome than microsatellite analysis, particularly for species such as A. mylitta, for which there is still no complete genome map (Gu et al., 1998). Jain et al. (2010) reviewed the use of RAPD and SCAR markers for analyzing insect-plant interactions, insect-pathogen interactions and studies of genetic diversity. In the present work, genomic DNA from selected individuals of the HCSW and LCSW groups of ecorace Daba and their F2 progeny (HCSW x LCSW) were amplified using random primers. The high level of polymorphism and heterozygosity observed in these samples corroborated the findings of previous studies (Kar et al., 2005;Saha and Kundu, 2006;Saha et al., 2008), and the HCSW group formed a separate cluster in the UPGMA dendogram.
In the wild, A. mylitta rarely undergoes inbreeding and yield traits such as cocoon weight, larval weight and shell weight are likely to be under polygenic control (Shibukawa et al., 1986;Rao et al., 1991). Consequently, bulk segregation analysis (Michaelmore et al., 1991) may not be ideal for detecting marker genes in this highly heterogeneous insect species. However, the selection of a group of markers, together with statistical approaches such as MRA and DFA, may be useful in explaining the polygenic control of these characters. For this reason, we used MRA and DFA to identify markers in selected individual pupae that differed in cocoon weight and shell weight. DFA showed that the distribution of the HCSW group differed from that of the LCSW group. In addition, OPW16-2 and four other associated RAPD markers (OPW18-5, OPC9-3, HAP03-6 and OPAJ08-4) that showed a significant positive association with cocoon weight and shell weight were identified by MRA. High regression coefficients (R) with significant F values further substantiated the association of these markers with cocoon weight and shell weight. Together, these findings indicate that all of these RAPD markers can be used to screen parental stock for breeding purposes.
A specific marker (OPW16 905 bp ) was identified and showed a positive, significant association with the HCSW group, as confirmed by MRA and DFA. The inheritance was also confirmed in F 2 progeny. Cloning and sequencing of this marker and subsequent BLAST analysis revealed that this sequence shared 88% identity with the Bmc-1/Yamato-like retroposon sequence of A. mylitta. The presence of this RAPD fragment was further verified by developing a specific diagnostic SCAR marker. Only a single band was amplified in all individuals of the HCSW group and in 50% of the F 2 progeny, indicating that this marker may be located in a single locus. SCAR markers are high fidelity molecular DNA markers (Michaelmore et al., 1991;Nair et al., 1995Nair et al., , 1996 that have a vital role in determining the configuration of RAPD markers; the codominance of SCAR markers overcomes some of the limitations of RAPD markers. SCAR markers have been developed to detect Helicoverpa armigera (Agusti et al., 1999) and the RAPD and SCAR markers for yield traits in A. mylitta 749