Exploiting a wheat EST database to assess genetic diversity

Expressed sequence tag (EST) markers have been used to distinguish varieties and assess genetic diversity in wheat (Triticum aestivum). In this study, 1549 ESTs from wheat infested with yellow rust were used to examine the genetic diversity of six susceptible and resistant wheat cultivars. The aim of using these cultivars was to improve the competitiveness of public wheat breeding programs through the intensive use of modern, particularly marker-assisted, selection technologies. The F2 individuals derived from cultivar crosses were screened for resistance to yellow rust at the seedling stage in greenhouses and at the adult stage in the field to identify DNA markers genetically linked to resistance. Of the 1549 ESTs, 560 were assembled into 136 contigs and the remaining 989 were retained as singletons. BlastX search results showed that 39 (29%) contigs and 96 (10%) singletons were homologous to wheat genes. The database-matched contigs and singletons were assigned to eight functional groups related to protein synthesis, photosynthesis, metabolism and energy, stress proteins, transporter proteins, protein breakdown and recycling, cell growth and division, and reactive oxygen scavengers. PCR analyses with primers based on the contigs and singletons showed that the most polymorphic functional categories were photosynthesis (contigs) and metabolism and energy (singletons). EST analysis revealed considerable genetic variability among the Turkish wheat cultivars resistant and susceptible to yellow rust disease and allowed calculation of the mean genetic distance between cultivars, with the greatest similarity (0.725) being between Harmankaya99 and Sönmez2001, and the lowest (0.622) between Aytin98 and Izgi01.


Introduction
Wheat (Triticum aestivum L.) is one of the most important crops in the world and is grown in all agricultural regions of Turkey. The total area cultivated worldwide and in Turkey is 210 and 9.4 million ha, respectively (Zeybek and Yigit, 2004). The allohexaploid wheat genome (2n = 6x = 42) is one of the largest among crop species, with a haploid size of 16 billion bp (Bennett and Leitch, 1995), and its genetics and genome organization have been extensively studied using molecular markers (Yu et al., 2004; Ercan et al., 2010; Akfirat-Senturk et al., 2010).
In recent years, expressed sequence tags (ESTs) have become a valuable tool for genomic analyses and are currently the most widely used approach for sequencing plant genomes, both in terms of the number of sequences and total nucleotide counts (Rudd, 2003). EST analysis provides a simple strategy for studying the transcribed regions of genomes, and renders complex, highly redundant genomes such as that of wheat amenable to large-scale analysis. The number of ESTs and cDNA sequences in public databases such as GenBank has increased exponentially in the past few years, and EST-based markers have been used to distinguish varieties and assess genetic diversity in wheat (Kantety et al., 2002; Leigh et al., 2003).
Yellow rust, a destructive disease of wheat caused by the biotrophic fungus Puccinia striiformis f. sp. tritici (Chen, 2005), is the most frequent and important cereal disease in Turkey, where it causes grain yield losses of 40%-60% and lowers the quality of cereal products (Zeybek and Yigit, 2004). In this study, an EST database for yellow rust-infested wheat was used, in conjunction with a multivariate statistical package (MVSP v.3.1), to assess the genetic diversity of yellow rust-resistant and susceptible wheat genotypes. For this, EST sequences were assembled into longer contiguous sequences (contigs) using Vector NTI 10.0 software. When ESTs are used for systematics, difficulties related to sequencing errors and the determination of orthology can be minimized by using several reads to assemble contigs and EST clusters for each region (Parkinson et al., 2002; Torre et al., 2006). The knowledge gained about the genetic constitution and relationships of genotypes using this approach should prove useful in the optimization of wheat breeding programs.

Plant material and evaluations
Six homozygous bread wheat genotypes (three yellow rust-resistant cultivars: PI178383, Izgi01 and Sönmez2001, and three yellow rust-susceptible cultivars: Harmankaya99, ES14 and Aytin98) were obtained from the Anatolian Agricultural Research Institute, Eskisehir, Turkey. The resistance of the parental cultivars and the F2 generation was tested in greenhouses by applying uredospores. Two weeks after inoculation, the infection was scored on a scale of 0-9 (McNeal et al., 1971), with scores of 0-6 indicating a low infection and 7-9 indicating a high infection. The disease score for PI178383, Izgi01 and Sönmez2001 was 0 while that of Harmankaya99, ES14 and Aytin98 was 8, thus confirming the resistance and susceptibility of the parental genotypes.
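The scoring rule described above can be sketched in a few lines of Python. This is only an illustration: the function name `infection_class` is hypothetical, and the scores in the dictionary are the parental scores reported in the text.

```python
def infection_class(score):
    """Classify a 0-9 yellow rust score: 0-6 is 'low', 7-9 is 'high'."""
    if not 0 <= score <= 9:
        raise ValueError("score must lie on the 0-9 scale")
    return "low" if score <= 6 else "high"


# Parental disease scores as reported in the text.
parent_scores = {
    "PI178383": 0, "Izgi01": 0, "Sönmez2001": 0,
    "Harmankaya99": 8, "ES14": 8, "Aytin98": 8,
}

# Map every cultivar to its infection class.
parent_classes = {name: infection_class(s) for name, s in parent_scores.items()}
```

With these scores, the three resistant parents fall in the low class and the three susceptible parents in the high class.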

Analysis of wheat yellow rust ESTs
ESTs from a yellow rust-infected wheat cDNA library (TA117G1X) were selected from the GrainGenes website and processed by means of VecScreen database searches to remove undesired vector fragments from the sequences. The Contig Express module of Vector NTI 10.0 (InforMax, Bethesda, MD, USA), which assembles small fragments in text or chromatogram format into contigs, was used to construct contigs from the EST sequences (Lu and Moriyama, 2004). ESTs that remained unassembled were treated as singletons. The EST sequences were aligned and analyzed with ClustalW v.1.82 to identify conserved domains. Functional annotation was done using the BlastX algorithm of the Basic Local Alignment Search Tool (Altschul et al., 1990). PCR primers for the contigs and singletons selected for further characterization were designed with Primer Premier 5.0 and Primer 3.0 software (Figure 1). EST-derived contig and singleton primers were used to assess the genetic diversity of the six wheat genotypes.

PCR analyses of contigs and singletons
Total genomic DNA was extracted from the leaves of resistant and susceptible plants using the method of Wei-ning and Langridge (1991) as modified by Song and Henry (1995). Genomic DNA amplifications with sense and antisense primers were done using a PTC-100 MJ thermocycler (MJ Research, Watertown, MA) in a 25 µL reaction volume. Each reaction contained 1X Taq buffer (MBI Fermentas, Germany), 2.5 mM MgCl2 (MBI Fermentas), 0.2 mM dNTP (MBI Fermentas), 400 nM of forward primer, 400 nM of reverse primer, 0.625 U of Taq polymerase/mL (MBI Fermentas) and 100 ng of genomic DNA. The thermal cycling parameters were: 3 min at 94°C (initial denaturation), 37 cycles of 1 min at 94°C, 1 min at 40-58°C (depending on the annealing temperature of the primer pair) and 1 min at 72°C, followed by a final extension at 72°C for 10 min. PCR products were separated in 2% agarose gels, stained with ethidium bromide and examined under UV light.
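As a rough check on the cycling program described above, the total programmed hold time can be worked out as follows. This is a minimal sketch: the function name `program_minutes` is hypothetical, and the result ignores the ramping time between temperatures, which depends on the instrument.

```python
def program_minutes(initial_denaturation, cycles, steps_per_cycle, final_extension):
    """Sum the programmed hold times (in minutes) of a PCR program."""
    return initial_denaturation + cycles * sum(steps_per_cycle) + final_extension


# 3 min initial denaturation, 37 cycles of 1 + 1 + 1 min, 10 min final extension.
total_minutes = program_minutes(3, 37, (1, 1, 1), 10)
```

Under these assumptions the holds add up to 124 min, so a run takes a little over two hours before ramping is counted.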

Genetic similarity estimation and cluster analyses
Each contig and singleton band was scored as present (1) or absent (0) in each cultivar and the scores were entered into a binary matrix as discrete variables. Only distinct, reproducible, well-resolved fragments were scored and the data were analyzed using MVSP 3.1 software (Kovach, 1999). This software package was also used to calculate Jaccard (1908) similarity coefficients, from which a dendrogram was constructed using a neighbour-joining algorithm.
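The Jaccard coefficient for two cultivars is the number of bands present in both divided by the number of bands present in at least one of them; bands absent from both are ignored. The following minimal Python sketch illustrates the calculation on a small binary matrix. It is not the MVSP implementation, the band profiles are hypothetical placeholders rather than data from this study, and the dendrogram construction step is not reproduced.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two equal-length 0/1 band profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same number of bands")
    shared = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    # Two profiles with no bands at all are given a similarity of 0.
    return shared / either if either else 0.0


def similarity_matrix(profiles):
    """Return {(name1, name2): similarity} for every pair of cultivars."""
    names = list(profiles)
    sim = {(n, n): 1.0 for n in names}
    for p, q in combinations(names, 2):
        sim[(p, q)] = sim[(q, p)] = jaccard(profiles[p], profiles[q])
    return sim


# Hypothetical band scores (1 = present, 0 = absent); NOT data from this study.
profiles = {
    "cultivar_A": [1, 1, 0, 1, 0, 1],
    "cultivar_B": [1, 0, 0, 1, 1, 1],
    "cultivar_C": [0, 1, 1, 0, 0, 1],
}
sim = similarity_matrix(profiles)
```

In this toy example, cultivar_A and cultivar_B share three of the five bands present in either profile, giving a similarity of 0.6.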

Results
Assembly of contigs and BLAST analysis
Table 1 summarizes the characteristics of the database used in this analysis. A total of 1549 ESTs were selected from a yellow rust-infested wheat cDNA library (TA117G1X) and used to assemble 136 contigs. The number of individual ESTs belonging to each contig ranged from 2 to 57. Singletons were derived from unassembled ESTs and accounted for 72.63% of ESTs. Tables 2 and 3 show the results of the NCBI database searches done using the contig and singleton sequences. The BlastX searches revealed that 39 contigs (29%) were homologous to wheat genes (Figure 2). Contigs 3, 4, 11, 13, 16 and 112 did not match any organism. Contig 77 matched a sequence of unknown function (data not shown) while other contigs (71%) showed homology to genes of known function. The BlastX search also showed that 96 singletons (10%) were homologous to wheat genes (Figure 3), whereas 147 singletons (14%) did not match any organism and had no functional annotation (data not shown). The 39 contigs and 96 singletons that matched wheat proteins were assigned to eight functional groups that included protein synthesis, photosynthesis, metabolism and energy, stress proteins, transporter proteins, protein breakdown and recycling, cell growth and division, and reactive oxygen scavengers. Photosynthesis was the major functional category of contigs, with nine proteins (22%), whereas cell growth and division was the smallest, with one protein (3%) (Figure 2). Metabolism was the major functional category of singletons, with 37 proteins (38%), whereas protein breakdown and recycling and cell growth and division were the smallest functional categories, with three proteins (3%) (Figure 3). Tables 4 and 5 show the sense and antisense primers used to assess the genetic diversity of wheat cultivars; these primers were designed based on the contig and singleton sequences that were homologous to wheat genes.
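The proportions of wheat-homologous sequences follow directly from the reported counts and can be checked with a short calculation. The sketch below is illustrative only: the helper `percent` and the variable names are hypothetical, and the counts are those given in the abstract and in this section.

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


total_ests = 1549        # ESTs selected from library TA117G1X
assembled_ests = 560     # ESTs assembled into contigs (from the abstract)
contigs = 136
singletons = total_ests - assembled_ests  # ESTs left unassembled

contigs_wheat = 39       # contigs with a BlastX match to wheat genes
singletons_wheat = 96    # singletons with a BlastX match to wheat genes

summary = {
    "singletons": singletons,
    "contigs_wheat_pct": percent(contigs_wheat, contigs),
    "singletons_wheat_pct": percent(singletons_wheat, singletons),
    "mean_ests_per_contig": round(assembled_ests / contigs, 1),
}
```

The computed shares, 28.7% of contigs and 9.7% of singletons, round to the 29% and 10% reported in the text.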

EST-derived contig and singleton polymorphisms
PCR analyses with the contig and singleton primers showed that the most polymorphic functional categories were photosynthesis (30%) and metabolism and energy (46%) for contigs and singletons, respectively (Figures 4 and 5). Of the 39 contig and 92 singleton primers used to characterize the genetic diversity of the six wheat genotypes, 14 contig and 48 singleton primers were polymorphic in susceptible and resistant wheat cultivars. Table 6 summarizes the mean genetic distance and genetic identity between the cultivars as determined by MVSP 3.1. Pairwise within-group distances ranged from 0 to 0.725, with the highest similarity (0.725) occurring between Harmankaya99 and Sönmez2001 and the lowest (0.622) between Aytin98 and Izgi01. Figure 6 shows the dendrogram based on the similarity index (Jaccard's coefficient) of the six cultivars. Two main clusters were observed, the first of which included cultivars Aytin98 and ES14 while the second was divided into two subclusters, the first of which comprised PI178383 while the second contained Izgi01, Sönmez2001 and Harmankaya99. The latter subcluster consisted of a group containing Izgi01 and another containing Sönmez2001 and Harmankaya99. The construction of this dendrogram dem-