Sequence characterization of hypervariable regions in the soybean genome: leucine-rich repeats and simple sequence repeats

Everaldo G. de Barros¹, Scott Tingey² and J. Antoni Rafalski²

¹ Núcleo de Biotecnologia Aplicada à Agropecuaria, Universidade Federal de Viçosa, DBG/BIOAGRO, 36571-000 Viçosa, MG, Brasil. Send correspondence to E.G.B. Fax: +55-31-899-2864. E-mail: ebarros@mail.ufv.br
² DuPont Agricultural Biotechnology Genomics, Delaware Technology Park, Suite 200, 1 Innovation Way, PO Box 6104, Newark, DE 19714-6104, USA.

Abstract
The genetic basis of cultivated soybean is rather narrow. This observation has been confirmed by analysis of agronomic traits among different genotypes, and more recently by the use of molecular markers. During the construction of an RFLP soybean map (Glycine soja x Glycine max) the two progenitors were analyzed with over 2,000 probes, of which 25% were polymorphic. Among the probes that revealed polymorphisms, a small proportion, about 0.5%, hybridized to regions that were highly polymorphic. Here we report the sequencing and analysis of five of these probes. Three of the five contain segments that encode leucine-rich repeat (LRR) sequences homologous to known disease resistance genes in plants. Two other probes are relatively AT-rich and contain segments of (A)n/(T)n. DNA segments corresponding to one of the probes (A45-10) were amplified from nine soybean genotypes. Partial sequencing of these amplicons suggests that deletions and/or insertions are responsible for the extensive polymorphism observed. We propose that genes encoding LRR proteins and simple sequence repeat regions prone to slippage are some of the most hypervariable regions of the soybean genome.

INTRODUCTION
Studying the genetic diversity among and within plant species is not only informative from an evolutionary point of view but also useful for breeding purposes. Soybean (Glycine max L. Merrill) is an autogamous species (2n = 40) with 1.81 × 10⁹ nucleotide pairs in its genome, distributed in repetitive (60%) and non-repetitive (40%) sequences (Goldberg, 1978). In spite of the extensive variability of the species, several studies have demonstrated the low diversity of cultivated soybeans (Delannay et al., 1983; Hiromoto and Vello, 1986; Abdelnoor et al., 1995; Powell et al., 1996b). In the United States, 88% of the cultivars used in the northern part of the country and 70% of those used in the South derive from only 10 ancestors (Delannay et al., 1983). In Brazil, 80% of the cultivars recommended for 1983/84 introduction were derived from nine ancestors (Hiromoto and Vello, 1986).
More recent molecular marker data suggest that the limited genetic diversity of cultivated soybean (G. max) is due not only to selection during the breeding process but also to its domestication from G. soja (Morgante et al., 1994; Powell et al., 1996b).
Several genetic maps have been constructed for soybean, using populations derived from interspecific (Shoemaker and Olson, 1993) as well as intraspecific crosses (Lark et al., 1993). We constructed an RFLP map for soybean (G. soja x G. max) with low copy number probes generated from a PstI soybean genomic library (Rafalski and Tingey, 1993). Among more than 2,000 probes analyzed, only a few (0.5%) were extremely polymorphic when tested in different soybean lines. In this work, we characterized five of these probes in order to understand the genetic basis of the polymorphism revealed by them.

MATERIAL AND METHODS
Each DNA fragment, cloned in the vector pBluescript (Stratagene), was sequenced with dye primer chemistry (ABI 373A, Perkin-Elmer) using T3 and T7 primers. The sequencing was completed with dye terminator chemistry using custom-designed oligonucleotide primers. The sequences were deposited in GenBank under accession numbers AF215727, AF215728, AF215729, AF217488, and AF217489. They were searched for open reading frames (ORFs) and compared with sequences contained in GenBank release 109 using BLAST (Altschul et al., 1997).

DNA hybridization analysis
DNA samples from leaves of six different soybean genotypes (Bonus, PI 81.762, PI 416.937, N85-2176, PI 153.293 and PI 230.970) were extracted (Murray and Thompson, 1980) and digested with the restriction enzymes BamHI, EcoRI, EcoRV, HindIII and PstI (10 units per µg of DNA). The DNA fragments were separated on a 0.7% agarose gel (10 µg per lane), transferred to a nylon membrane (Hybond N+, Amersham; Sambrook et al., 1989), and probed with plasmids A1-10, A2-08, A45-10, A53-09 or A75-10 labeled with ³²P or with the GeneImages (Amersham) random primer labeling system. Labeling, washing and detection were performed according to the instructions contained in the GeneImages Kit (Amersham) or according to standard RFLP protocols (Rafalski et al., 1996).
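The digestion step described above can be mimicked in silico to see how a single sequence change that creates or destroys a recognition site alters the fragment pattern detected by a probe. The following is a minimal sketch, not the procedure used in this work. It assumes a linear sequence and a complete single-enzyme digest; the function name `digest` and the toy sequence are hypothetical. The recognition sites and cut offsets are the standard ones for these five enzymes, and because all five sites are palindromic, searching one strand is sufficient.

```python
# Recognition site and cut offset (bases from the start of the site, top strand).
ENZYMES = {
    "BamHI": ("GGATCC", 1),    # G^GATCC
    "EcoRI": ("GAATTC", 1),    # G^AATTC
    "EcoRV": ("GATATC", 3),    # GAT^ATC
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
    "PstI": ("CTGCAG", 5),     # CTGCA^G
}


def digest(seq, enzyme):
    """Return fragment lengths from a complete digest of a linear sequence."""
    site, offset = ENZYMES[enzyme]
    seq = seq.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Hypothetical toy sequence with two EcoRI sites and no BamHI site.
toy = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAATTC" + "TT"
print(digest(toy, "EcoRI"))  # three fragments
print(digest(toy, "BamHI"))  # uncut: one fragment
```

An insertion or deletion between two sites changes a fragment length without changing the number of fragments, which is the kind of variation an RFLP probe reveals.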

RESULTS AND DISCUSSION
Five soybean clones isolated from a PstI genomic library and used as RFLP probes detected multiple polymorphic regions in the soybean genome. We considered a probe hypervariable if it detected multiple polymorphisms among the six soybean lines tested. Most RFLP probes are either monomorphic (75%) or detect two allelic variants among the soybean lines tested (Powell et al., 1996a). Figure 1 shows the hybridization patterns obtained with the hypervariable probe A45-10.
It is well known that the soybean genome is relatively monomorphic due to the narrow genetic base of cultivated soybean and possibly due to the domestication process from G. soja (Morgante et al., 1994; Powell et al., 1996b). Therefore, identification of hypervariable regions is of considerable interest.
The five clones were sequenced and analyzed for sequence homology to known genes. As the genomic library from which the clones were isolated was a PstI library, there was a high probability that they would map to transcriptionally active regions of the genome (Keim and Shoemaker, 1988). Two of the clones (A1-10 and A2-08) did not show significant homology to any sequences in GenBank. We also searched DuPont's collection of over 140,000 soybean ESTs and did not find significant sequence homology. One of these clones is AT-rich (A2-08, 74% AT) and both clones contain regions of simple sequence repeats (SSRs). A1-10 contains an (A)12 motif, and A2-08 contains three (A/T)8 motifs and seven (A/T)7 motifs. SSRs have been found to be highly polymorphic (Powell et al., 1995; Powell et al., 1996b). We conclude that SSR-like sequences are likely to contribute to the hypervariability of A1-10- and A2-08-homologous loci in soybean.
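The two sequence features reported for A1-10 and A2-08, AT content and mononucleotide runs, are simple to compute. The sketch below is illustrative only and is not the analysis pipeline used here. It assumes that an (A/T)n motif means an uninterrupted run of n identical A or T residues; the function names and the toy sequence are hypothetical.

```python
import re


def at_content(seq):
    """Percentage of A and T residues in a DNA sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("A") + seq.count("T")) / len(seq)


def homopolymer_runs(seq, bases="AT", min_len=7):
    """Return (start, base, length) for each run of one base at least min_len long."""
    # ([AT])\1{6,} matches a base followed by at least six more copies of itself.
    pattern = re.compile(r"([%s])\1{%d,}" % (bases, min_len - 1))
    return [(m.start(), m.group(1), len(m.group(0)))
            for m in pattern.finditer(seq.upper())]


# Hypothetical toy sequence: an (A)12 run, a (T)8 run and a sub-threshold (A)6 run.
toy = "GC" + "A" * 12 + "GCGC" + "T" * 8 + "C" + "A" * 6
print(homopolymer_runs(toy))
print(round(at_content(toy), 1))
```

Runs shorter than `min_len` are ignored, so the threshold decides which tracts are counted as SSR-like.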
Analysis of clones A45-10, A53-09 and A75-10 revealed the presence of ORFs. Comparison of predicted amino acid sequences in all reading frames to sequence databases revealed homology to proteins coded by known disease resistance genes in plants (Table I).
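Translation in all reading frames can be sketched as follows. This is a minimal illustration under stated assumptions, not the software used in this work: it uses the standard genetic code, reports the longest stop-free stretch rather than requiring a start codon (a genomic fragment may begin inside a coding region), and all function names are hypothetical.

```python
from itertools import product

# Standard genetic code, built in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons; unknown codons become 'X', stops become '*'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def longest_open_stretch(seq):
    """Return (frame, peptide) for the longest stop-free stretch in six frames."""
    best = (None, "")
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for offset in range(3):
            for piece in translate(s[offset:]).split("*"):
                if len(piece) > len(best[1]):
                    best = (f"{strand}{offset + 1}", piece)
    return best


print(translate("ATGGCCTAA"))
```

The resulting peptides are what would then be compared with protein databases.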
The most noticeable feature of all the putative amino acid sequences, especially the one derived from clone A45-10, is the presence of leucine-rich repeats (LRRs). The protein sequence resulting from conceptual translation of the ORF present in this clone was arranged in a pattern of imperfect LRRs (Figure 2; Kajava, 1998). These repeats correspond to protein structural elements that are thought to be associated with protein-protein interactions (Kajava, 1998). Proteins coded by plant disease resistance genes frequently contain LRRs. They normally fall into two classes: those with extracytoplasmic LRRs with the 24-amino acid consensus LxxLxxLxxLxLxxNxLxGxIPxx, and those with cytoplasmic LRRs with the 23- or 24-amino acid consensus LxxLxxLxxLxLxx(N/C/T)x(x)LxxIPxx (Jones and Jones, 1997).
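The two consensus sequences can be turned into search patterns by reading x as any residue, (N/C/T) as a choice among residues, and (x) as an optional residue. The sketch below is illustrative only. It matches the consensus strictly and will therefore miss the imperfect repeats that are typical of real LRR proteins, including those of A45-10; the function names and the toy peptide are hypothetical.

```python
import re

EXTRACYTOPLASMIC = "LxxLxxLxxLxLxxNxLxGxIPxx"
CYTOPLASMIC = "LxxLxxLxxLxLxx(N/C/T)x(x)LxxIPxx"


def consensus_to_regex(consensus):
    """Translate the consensus notation used in the text into a regular expression."""
    pattern = consensus.replace("(x)", ".?")          # optional residue
    pattern = re.sub(r"\(([A-Z/]+)\)",                # (N/C/T) -> [NCT]
                     lambda m: "[" + m.group(1).replace("/", "") + "]",
                     pattern)
    return pattern.replace("x", ".")                  # x -> any residue


def find_repeats(protein, consensus):
    """Return (start, matched_text) for every match, including overlapping ones."""
    regex = re.compile("(?=(%s))" % consensus_to_regex(consensus))
    return [(m.start(), m.group(1)) for m in regex.finditer(protein.upper())]


# Hypothetical toy peptide: two perfect extracytoplasmic-type repeats.
unit = "LAALAALAALALAANALAGAIPAA"
toy = "MK" + unit + unit
print(consensus_to_regex(CYTOPLASMIC))
print([start for start, _ in find_repeats(toy, EXTRACYTOPLASMIC)])
```

A more tolerant search would allow other aliphatic residues at the leucine positions or score partial matches, which is closer to how imperfect repeats are recognized in practice.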
Disease resistance genes are expected to evolve much more rapidly than, for example, genes of central metabolism, because of the selection pressure from the pathogen (Michelmore and Meyers, 1998). In fact, several rice disease resistance-like genes do not cross-hybridize to maize genomic DNA (Tarchini, R., unpublished observations), while genes encoding metabolic enzymes from corn and rice cross-hybridize (Chen et al., 1997). A45-10, A53-09 and A75-10 may in fact be disease resistance genes, explaining the high variability of RFLP patterns when they are used as hybridization probes.
To understand the molecular nature of the polymorphism, DNA segments homologous to A45-10 were isolated. To this end, oligonucleotide primers corresponding to A45-10 were designed and PCR was performed using DNA from nine soybean cultivars and PIs. Between one and three amplification products per cultivar were produced. Individual amplification products were cloned and sequenced (data not shown). Comparison of the DNA sequences revealed numerous insertions/deletions and single nucleotide changes among clones of different sizes from the same PCR. The interpretation of this result is complicated by the difficulty of assigning allelic relationships among multiple amplification products. The hybridization results (Figure 1) indicated that A45-10 was a member of a moderate-size family of related sequences, and several family members were represented among the amplification products. In the case of two G. soja accessions, PI 440.913B and PI 81.762, only one amplification product was identified from each accession. These were assumed to be allelic and compared. Several small deletions of 1-13 bp explain the size differences between these PCR products. Many single nucleotide changes are also present. Such sequence variants could be the result of selection acting upon products of intragenic unequal crossing over, or of replication errors caused by the repetitive nature of these sequences. This is in agreement with mechanisms proposed recently (Meyers et al., 1998; Michelmore and Meyers, 1998).
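Once two putative alleles have been aligned, the kind of comparison described above reduces to counting gap runs and mismatched columns. The sketch below is a minimal illustration and not the method used in this work. It assumes the two sequences have already been aligned by an external program and padded with "-" to equal length; the function name and the toy alignment are hypothetical.

```python
def summarize_alignment(aligned_a, aligned_b):
    """Return (substitutions, indel_lengths) for two gapped, equal-length sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    substitutions = 0
    indel_lengths = []
    run = 0  # length of the gap run currently being read
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            run += 1
            continue
        if run:
            indel_lengths.append(run)
            run = 0
        if a != b:
            substitutions += 1
    if run:  # alignment ends inside a gap
        indel_lengths.append(run)
    return substitutions, indel_lengths


# Hypothetical toy alignment: one substitution, a 3-bp indel and a 1-bp indel.
allele_1 = "ATGCC---TTAGC"
allele_2 = "ATGTCGGATTA-C"
print(summarize_alignment(allele_1, allele_2))
```

In this simplified version, adjacent gap columns are merged into a single run regardless of which sequence carries the gap, so the counts are only as reliable as the alignment supplied.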
One EST, sls1c.pk010.j1, isolated from soybean infected with Sclerotinia sclerotiorum, has 83% similarity at the nucleotide level to clone A75-10. This cDNA was similar to TMV resistance protein N (Table I), which suggests that A75-10 is also likely to represent a disease resistance gene, although we do not have direct evidence for its transcriptional activity.
No ESTs corresponding to A45-10 were identified in our collection. Nevertheless, highly significant homologies to LRR-containing disease resistance genes were identified throughout the length of the ORF (Table I).
Map-based cloning, transposon tagging, and PCR amplification of conserved regions have been used to clone a great number of disease resistance genes and disease resistance gene analogs in the past few years (Martin et al., 1993; Kanazin et al., 1996; Yu et al., 1996; Liester et al., 1996, 1998). As we demonstrated here, sequences related to these genes explain some of the polymorphism present in the soybean genome. It is particularly striking that of the five most highly polymorphic probes studied here, three contain LRRs. Therefore, the use of disease resistance gene homologs, and LRRs in particular, as probes for genetic mapping and especially for fingerprinting of accessions and cultivars may provide two significant benefits. These probes may reveal frequent polymorphisms, and these polymorphisms are likely to be related to agronomically relevant disease resistance phenotypes. Similarly, searching for LRR-containing disease resistance gene homologs, by degenerate PCR or other methods, provides an approach to the isolation of highly polymorphic mapping probes.

ACKNOWLEDGMENTS
We thank Sylvia Stack and Maureen Dolan for DNA sequencing, Mike Hanafey for the development of EST databases, and Blake Meyers, Michele Morgante and Renato Tarchini for discussion of disease resistance genes and comments on the manuscript. Everaldo G. de Barros was the recipient of a fellowship from CAPES.

Figure 2 - The amino acid sequence of clone A45-10 arranged in the pattern of leucine-rich repeats.