Mining plant genome browsers as a means for efficient connection of physical, genetic and cytogenetic mapping: An example using soybean

Physical maps are important tools to uncover general chromosome structure as well as to compare different plant lineages and species, helping to elucidate genome structure, evolution and possibilities regarding synteny and colinearity. The increasing production of sequence data has opened an opportunity to link information from mapping studies to the underlying sequences. Genome browsers are invaluable platforms that provide access to these sequences, including tools for genome analysis, allowing the integration of multivariate information, and thus aiding to explain the emergence of complex genomes. The present work presents a tutorial regarding the use of genome browsers to develop targeted physical mapping, providing also a general overview and examples about the possibilities regarding the use of Fluorescent In Situ Hybridization (FISH) using bacterial artificial chromosomes (BAC), simple sequence repeats (SSR) and rDNA probes, highlighting the potential of such studies for map integration and comparative genetics. As a case study, the available genome of soybean was accessed to show how the physical and in silico distribution of such sequences may be compared at different levels. Such evaluations may also be complemented by the identification of sequences beyond the detection level of cytological methods, here using members of the aquaporin gene family as an example. The proposed approach highlights the complementation power of the combination of molecular cytogenetics and computational approaches for the anchoring of coding or repetitive sequences in plant genomes using available genome browsers, helping in the determination of sequence location, arrangement and number of repeats, and also filling gaps found in computational pseudochromosome assemblies.


Introduction
Scientific advances in the field of genomics have been promising for crop improvement in quality, productivity and resistance against pathogens, meeting the demands for food, fiber and biofuels. Such an interest has led to the production of large quantities of biological data from diverse sources. The continuous increase in the amount of available data on genomes and gene expression studies requires efficient storage, organization and data analysis. So the next logical step is to develop various graphical user interfaces or genome browsers, which provide logical access to data flows that otherwise would be unintelligible (Sen et al., 2010). According to the Entrez Genome Project, in 2009 more than 150 projects related to the Viridiplantae genomes were initiated, including several species of agronomic, industrial and biotechnological interest, emphasizing the importance of bioinformatics platforms for the promotion of comparative genomics of model plants so as to enable us to understand the biological properties of each species, as well as accelerating gene discovery and functional analysis.
In this scenario, several genome browsers were developed, especially dedicated to generate information on cultivated and model plants. Gramene, for example, is a free online tool for genome comparison, providing a total of 15 genomes, including those of Oryza sativa (cv. japonica PlantGDB provides access to sequences, as well as to a variety of tools for analysis and comparison of genomes, providing chromosome-based genome browsers (xGDB) for 14 plant species with completely or partially sequenced genomes (Duvick et al., 2008). Additional sources of information are Phytozome, which currently provides genome browsers for 22 plant species, including the legumes soybean (Glycine max) and Medicago truncatula, and LIS (Legume Information System) that comprises data on 18 legume species.
To facilitate gene and genome annotation, and to understand the organization, structure and evolution of genes and genomes, we carried out a set of procedures so as to optimize the use of the information deposited in plant genome browsers for cytogenetic and physical mapping of selected genes or genome regions. We also present a practical example of how to anchor Bacterial Artificial Chromosomes (BACs) and repetitive sequences in the soybean genome, integrating in silico and in situ approaches, as well as an example of how a careful study of gene families (e.g. aquaporins) may aid in characterizing and explaining the emergence of complexity in plant genomes.

Applications and Uses of Plant Genome Browsers (PGBs)
The information on complete genome sequences allows us to derive important sets of genomic features, including the identification of protein-coding and non-coding genes, regulatory elements, gene families and repetitive sequences, such as the Simple Sequence Repeats (SSR). Among other applications, this set of features has become the raw material for the integration of multivariate information such as "omics" data. Alignments are often used to explore/describe gene structure and the distribution of gene families in complete genomes (Soares-Cavalcanti et al., 2012), as well as the conservation of syntenic structures among chromosomes of different species, allowing for the evolutionary history reconstruction of genes and genomes through comparative structural and functional genomic approaches (McClean et al., 2010).
Notably, plant genomes contain large amounts of repetitive elements (RE), which refer to a broad and heterogeneous group of genetic elements that are often degenerate and inserted in each other. Mobile elements, simple sequence repeats (e.g. micro-, mini- and satellite) and gene families with high numbers of repeating units (e.g. rDNA and histones) are the main RE groups (Spannagl et al., 2007). These RE groups are present in most of the unanchored sequence scaffolds after plant genome assembly, as for instance in the case of the SoyBase platform (Schmutz et al., 2010). The FISH (Fluorescent In Situ Hybridization) procedure could be a good strategy to identify these blocks, which are frequently localized in heterochromatic regions (Cuadrado and Jouve, 2007). This strategy emphasizes the power of complementation that may result from combining molecular cytogenetics and computational approaches to anchor repetitive sequences in plant genomes with available genome browsers, in order to determine their location, arrangement and number of repeats, filling gaps found in computational pseudochromosome assemblies.
FISH-based cytogenetic maps developed using BAC clones as probes are often associated with genetic and contig maps (Cheng et al., 2001; Findley et al., 2010), and may be useful during whole genome sequencing projects, helping to evaluate the size of the putative remaining gaps. Given the low correlation observed between physical distances (measured in micrometers) and genetic distances (based on the recombination frequency), the integration of cytogenetic and genetic maps has allowed the identification of possible distortions in physical distances found in linkage maps (Kao et al., 2006). Recently, a cytogenetic map of the common bean was built by FISH with 43 available anchoring points (BACs) between the genetic and the cytogenetic maps. Their comparison confirmed the suppression of recombination in extended pericentromeric chromosome regions, indicating that suppression of recombination correlates with the presence of prominent pericentromeric heterochromatic blocks, and is responsible for the distortions of the inferred distances (Fonsêca et al., 2010).
Bioinformatics platforms and associated databases are essential for the emergence of effective approaches that make the best use of genomic resources, including their integration. Genetic maps, often constructed by independent research groups for several plant species, allow the relative position of markers linked to heritable traits to be defined. When compared to physical maps, genetic maps provide a means to link these heritable traits to the underlying genomic sequence variation (Lim et al., 2007). They also allow the investigation of homologies among different genomes in the same species (allopolyploidy) or different species, observing colinearity (e.g. conservation of gene order) or synteny (e.g. conservation of linkage) among them (Hougaard et al., 2008), both at macro and micro levels (Kevei et al., 2005). The former focuses on the genome as a whole, examining large regions (e.g. linkage groups) by comparison of genes or chromosome segments based on genetic, physical or cytogenetic maps of different species (Mandáková and Lysak, 2008; McClean et al., 2010), while the latter focuses directly on smaller, but continuous, completely sequenced genomic regions (David et al., 2009).
Genome browsers are flexible platforms that allow blast searches, and also searches for pseudochromosomes, organism names, contig IDs, clone accession numbers, GenBank accession numbers, gene symbols, genetic markers, or any other term indexed in the database. Recent innovations in search platforms based on the various "omics" and the development of new applications provided essential research resources for various plant species. As these become available for ever more species, and when combined with wet lab experiments, they will aid in integrating biological data from diverse sources. With worldwide efforts directed towards the structural and functional characterization of its genome, soybean is at the forefront of legume genomics, with a robust infrastructure in information technology that is critical to understand the biology of this and other legumes. The final application of these resources and information reflects the efforts to elucidate the genetic background of given agronomic traits, with important implications for plant breeding.

A Practical Example Using Soybean
Previous studies demonstrated that the soybean genome (probably of polyploid origin) has undergone multiple whole genome duplications, genome diploidization, as well as chromosomal rearrangements (Shoemaker et al., 2006), thus making it one of the most complex plant genomes currently investigated. Hence, multiple copies (or blocks) of DNA sequences were identified in more than two chromosomes. On average, 61.4% of the homologous genes are present in blocks involving only two chromosomes, 5.63% are spread over three chromosomes, and 21.53% in four (Schmutz et al., 2010).
Soybean (2n = 40 chromosomes) was the first legume to be completely sequenced, serving as a reference for more than 20,000 legume species and helping to understand the mechanism of biological fixation of atmospheric nitrogen by symbiosis. The soybean genome was sequenced using the shotgun strategy, covering 950 Mb of sequence. Most of the genome sequences were assembled into 20 pseudochromosomes (Glyma 1.01), grouping 397 sequence scaffolds in ordered positions within the 20 soybean linkage groups. An additional 17.7 Mb were recognized in 1,148 sequence scaffolds that were left unassembled, consisting mainly of repetitive DNA and fewer than 450 predicted genes (Schmutz et al., 2010). The scaffold positions were identified by means of extensive genetic maps, including 4,991 single nucleotide polymorphisms (SNPs) and 874 simple sequence repeats (SSRs) (Song et al., 2004; Choi et al., 2007; Hyten et al., 2010a,b).
Using a combination of full-length cDNA, EST, homology and ab initio methods, 46,430 protein-coding loci were identified in the soybean genome with a high confidence level, and another 20,000 loci were predicted with a low confidence level. From the first group of genes, 12,253 gene families (34,073 genes) could be identified with one or more sequences in other angiosperms, as well as 283 legume-specific gene families and 741 soybean-specific gene families, reflecting an ancient but continuous process of duplication and genetic divergence (Schmutz et al., 2010).

Anchoring gene families in physical maps
On a microscale, the genomic distribution pattern of gene family members has served to assist in the inference of the processes that generated the observed genome complexity (Di et al., 2010). As an example we used the aquaporin gene family, because aquaporins are a ubiquitous protein family and have important physiological roles.
Aquaporins constitute a set of small transmembrane proteins that facilitate the transport of water and small solutes. The first plant aquaporin was identified in soybean root nodules. Later, their presence was verified in many species of Viridiplantae, recognizing four main aquaporin types that reflect their size and subcellular localization (Chaumont et al., 2001, 2005; Kaldenhoff and Fischer, 2006; Kruse et al., 2006; Maeshima and Ishikawa, 2008). Aquaporins are abundant, diverse and widely distributed in plant genomes. Arabidopsis presents 35 aquaporin coding genes spread throughout the five chromosomes of a genome that is believed to be one of the simplest among plants (Chaumont et al., 2005; Ishikawa et al., 2005; Zhao et al., 2008). Although the first aquaporin was described in soybean, there are no studies on the abundance, diversity and distribution of aquaporins in this legume.
For the study of aquaporins in the soybean genome, we chose four Arabidopsis protein sequences as probes, representing each of the four subfamilies of aquaporins: Plasma Membrane Intrinsic Protein (PIP1.4; acc. NP_567178.1), Tonoplast Intrinsic Protein (TIP1-1; acc. P25818.1), Nodulin26-like Intrinsic Protein (NIP4-2; acc. NP_198598.1) and Small and Basic Intrinsic Protein (SIP2-1; acc. NP_191254.1). Using these as query sequences, a tBLASTn search was conducted in the EST sequence database of GENOSOJA. At this stage, we adopted a cutoff e-value of e-05 for acceptance of putative aquaporin homologs in soybean.
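The e-value cutoff described above can be illustrated with a minimal sketch. It assumes the search results are available as tab-separated lines in the common 12-column BLAST tabular layout (e-value in the 11th column); the output format actually provided by GENOSOJA is not stated in the text, and the EST identifiers and scores below are hypothetical.

```python
# Minimal sketch: keep only hits at or below the e-value cutoff.
# Assumption: 12-column tab-separated BLAST tabular output
# (query, subject, %identity, length, mismatches, gap opens,
#  q.start, q.end, s.start, s.end, e-value, bit score).
EVALUE_CUTOFF = 1e-5

def filter_hits(lines, cutoff=EVALUE_CUTOFF):
    """Return (query, subject, evalue) tuples with evalue <= cutoff."""
    kept = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        evalue = float(fields[10])
        if evalue <= cutoff:
            kept.append((fields[0], fields[1], evalue))
    return kept

# Hypothetical example lines (identifiers and values are invented).
example = [
    "PIP1.4\tEST_0001\t78.5\t250\t50\t2\t1\t250\t10\t759\t3e-40\t160",
    "PIP1.4\tEST_0002\t35.0\t60\t39\t0\t5\t64\t1\t180\t0.002\t38.1",
    "SIP2-1\tEST_0003\t62.1\t210\t80\t1\t3\t212\t4\t633\t1e-05\t95.0",
]
hits = filter_hits(example)
for query, subject, evalue in hits:
    print(query, subject, evalue)
```

Hits that pass this filter are only candidates; the subsequent similarity, ORF and domain checks decide whether they are retained as aquaporin homologs.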
Subsequently, sequential analyses were performed to determine the identity of these putative homologs expressed in soybean, through recognition of similarities with known proteins using the BLASTx algorithm, conceptual translation using the ORF finder program, and evaluation of conserved domains using the rpsBLAST algorithm. After identifying the expressed homologs, the next step consisted of anchoring these transcripts in the soybean genome browser available at the SoyBase web server. For this purpose, such transcripts were entered as queries in a BLASTn search. The conceptually translated protein sequences were also used as queries in a tBLASTn search in order to discover possible new aquaporin loci not represented in the available soybean EST pool. Finally, a megaBLAST search was carried out using the nucleotide sequences of all loci in order to determine the most closely related genes, thus reflecting the relationship among the chromosomal regions harboring aquaporin genes (Figure 1).
The initial search for aquaporin homologs in soybean expressed sequences recovered 102 candidates. However, these sequences were anchored in only 64 loci in the soybean genome. This may be indicative of alternative processing of primary transcripts, but may also reflect noise introduced during the assembly of the available ESTs. The proteins obtained by conceptual translation of the loci, when compared with the genome through the tBLASTn tool, revealed 36 new loci, totaling 100 aquaporin genes in the soybean genome. This number is approximately three times higher than that reported for Arabidopsis and rice (Johanson et al., 2001; Sakurai et al., 2005), and is the largest number of aquaporins observed in a plant species to date. The increase in the number of aquaporin coding genes has been attributed to segmental and whole genome duplications (Liu et al., 2009). These processes can also be invoked to explain the number and distribution of aquaporins in the soybean genome. For example, pseudochromosomes 10 and 20 (Gm10 and Gm20) share four colinearly preserved aquaporin genes at the distal regions of the long chromosome arm, which are inverted only in relation to the extremity (Figure 1). This observation is consistent with the syntenic relationship between Gm10 and Gm20 (Schmutz et al., 2010), and among these and chromosome 7 (Pv7) of Phaseolus vulgaris (McClean et al., 2010). Another striking example is a shared tandem duplication, found either intact or with the loss of one of the genes from the tandem array. The first case was observed between Gm5 and Gm8, as well as between Gm7 and Gm8 (Figure 1), again in agreement with previous observations (McClean et al., 2010) based on an overall evaluation of diverse gene families. The latter can be seen in the distal regions of the long chromosome arm of Gm3 and Gm19, which are colinearly conserved, except for the absence of one of the SIP genes in Gm3 (Figure 1).
A general prevalence of aquaporin genes in distal positions is also evident. These are just some of the events denoted in Figure 1. In general, the number and distribution of aquaporins corroborate previous suggestions of the octoploid nature of soybean (Shultz et al., 2006). The panel depicted by the analysis suggests that this gene family is a good candidate to determine the time elapsed after polyploidization of soybean from the putative diploid ancestor(s), especially when sister genomes are added to the comparison (Schranz and Mitchell-Olds, 2006).

Comparative mapping between genetic, physical and cytogenetic maps
With the development of the SoyBase platform, comparative analysis of genetic and physical maps through contigs (distances measured in base pairs) with cytogenetic maps has made map integration even more informative, allowing not only a deeper analysis of both repetitive and single copy DNA sequences, but also the rapid and efficient identification of synteny between different taxonomic groups. Below are alternative ways of using SoyBase for the analysis and selection of both repetitive and single-copy DNA sequences for cytogenetic mapping in soybean.

In silico selection of BACs for FISH
BAC inserts are capable of carrying up to 500 kb of genomic DNA, with typical sizes ranging from 80 to 200 kb, containing from highly repetitive DNA sequences to single copy DNA (Peterson et al., 2000). Accordingly, BACs containing markers linked to disease resistance genes, for example, can be directly selected from the genome browsers for subsequent acquisition and use as FISH probes, allowing in situ localization of the markers and also potentially contributing to the recognition of possible distortions between maps. Another point is the identification of chromosomes in a cell and the association with their respective linkage groups and/or pseudochromosomes, as recently elucidated for soybean (Findley et al., 2010).
As an example, we present the analysis and selection of BAC Gm_WBc0102N16 (102N16) and BAC Gm_WBc0088G15 (88G15) regarding Gm16 (linkage group J) on the SoyBase web server (Figure 2). Both BACs presented interesting characteristics like QTL (Quantitative Trait Loci) associated with drought tolerance or plant height/yield or height of plant (102N16) and increasing yield (88G15) (BARC SSR markers at SoyBase) (Table 1, Figure 2b). Another important point is the selection of BACs with high exon density, because BACs from regions with lower exon densities are more likely to carry repetitive DNA sequences, which can promote in situ hybridization at different sites, preventing their exact location in the karyotype. BACs with high exon density, lacking repetitive regions, can be selected through a heat map (Figure 2a') that consists of 100 kbp segments differentiated by a color intensity gradient representing exon density (including all splice variants). The BACs were also selected by the number of Glyma1 gene models (Figure 2a' and Table 2), as well as by presenting aligned sequences from other legumes (Figure 2a''), the presence of a given molecular marker (Figure 2c) or synteny with Medicago truncatula (Figure 2d). Additionally, some regions of genome duplication in soybean could be observed (Figure 2e).
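The exon-density heat map used for BAC selection can be approximated offline with a short sketch. It assumes exon coordinates are available as 1-based inclusive, non-overlapping (start, end) pairs on one pseudochromosome; the function name, the coordinates and the handling of overlapping splice variants are illustrative assumptions, not the procedure used by SoyBase.

```python
# Minimal sketch: fraction of each 100 kbp window covered by exons.
# Assumption: exons are 1-based, inclusive and non-overlapping; overlapping
# splice variants would need to be merged first to avoid double counting.
WINDOW = 100_000

def exon_density(exons, chrom_length, window=WINDOW):
    """Return a list with the exon-covered fraction of each window."""
    n_windows = (chrom_length + window - 1) // window
    covered = [0] * n_windows
    for start, end in exons:
        pos = start
        while pos <= end:
            idx = (pos - 1) // window          # window holding this position
            win_end = (idx + 1) * window       # last base of that window
            seg_end = min(end, win_end)
            covered[idx] += seg_end - pos + 1  # bases of this exon in window
            pos = seg_end + 1
    # The last window may be shorter than `window`; it is still divided by
    # the full window size here for simplicity.
    return [c / window for c in covered]

# Hypothetical exon coordinates on a 300 kbp region.
exons = [(1, 1_000), (99_501, 100_500), (250_001, 260_000)]
density = exon_density(exons, chrom_length=300_000)
print(density)
```

Windows with the highest values would correspond to the most intense cells of the heat map, and BACs falling in them would be the preferred candidates for single-site FISH signals.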

Evaluation of SSR oligonucleotides in the soybean genome
As a case study, we report the distribution of the SSR sequence (AAC)5 in soybean, as assessed by in silico analysis of repetitive sequences in SoyBase and compared with the FISH results. SSRs (microsatellites) consist of small repeat units (1-6 bp) distributed in tandem throughout the genome. They are found within structural genes or other repetitive sequences, as well as associated with heterochromatic regions (Heslop-Harrison, 2000; Cuadrado and Jouve, 2010). Rapid SSR evolution has led to genome-specific, species-specific and even chromosome-specific distribution patterns (Begum et al., 2009). The frequency and distribution of different SSR oligonucleotide motifs have been the subject of intense investigation, especially in some partially or completely sequenced genomes, as in P. vulgaris (Schlueter et al., 2008) and G. max (Hyten et al., 2010a), aiming to understand the genomic organization of different species.
However, large SSR blocks are difficult to detect by in silico analysis, as they are observed as numerous short overlapping repeat units. FISH can more easily identify these blocks as in situ marking sites, often located in heterochromatic regions (Cuadrado and Jouve, 2007).
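To make the notion of a tandem SSR block concrete, the sketch below scans a sequence for perfect runs of a motif repeated at least a given number of times. It is a simplified, exact-match illustration on one strand only and is not the alignment-based screening used in this work; the function name and the test sequence are hypothetical.

```python
import re

def find_ssr(sequence, motif="AAC", min_repeats=5):
    """Return (start, end, n_repeats) for perfect tandem runs of `motif`.

    Coordinates are 1-based and inclusive. Only exact repeats on the given
    strand are reported; degenerate repeats are not detected.
    """
    pattern = re.compile("(?:%s){%d,}" % (re.escape(motif), min_repeats))
    results = []
    for match in pattern.finditer(sequence.upper()):
        length = match.end() - match.start()
        results.append((match.start() + 1, match.end(), length // len(motif)))
    return results

# Hypothetical sequence: one run of six AAC units and one run of three.
sequence = "GGT" + "AAC" * 6 + "TTGCA" + "AAC" * 3 + "G"
print(find_ssr(sequence))
```

Only the six-unit run is reported here, since the three-unit run falls below the minimum of five repeats. Scanning the reverse strand would require repeating the search with the reverse-complement motif (GTT).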
With this in mind, we performed an in silico screening of (AAC)5 in the unmasked soybean genome using the following parameters in the soybean genome browser at Phytozome: comparison matrix BLOSUM62, e-value of 0.1 or less and low complexity filter off (Figure 3a). The oligonucleotide (AAC)5 was used as the probe, with 77% pairing identity as a cut-off parameter (similar to FISH stringency). Due to the repetitive nature of the probe, the BLASTn alignment created an artifact of sliding windows in continuous regions (Figure 3a). The alignment page was therefore processed by a macro scripted in UltraEdit (Figure 3b), resulting in a formatted Microsoft Excel table that enabled the size and limits of the matching region to be calculated in bp (base pairs) by subtracting the initial from the final alignment position for each region (Figure 3c). This information pointed to sequence alignment distribution over 15 soybean pseudochromosomes, with no matches for Gm2, Gm3, Gm12, Gm14 and Gm18. The aligned regions were then examined in the SoyBase genome browser for associated genes, intragenomic duplications and synteny with other species (see Table 3). A schematic representation of the in silico mapping on soybean pseudochromosomes was constructed using as size parameter the soybean pseudochromosome lengths available on the SoyBase web server, which range from 37.4 to 62.31 Mb. Considering a ratio of 1 Mb to 1 mm, the oligonucleotide repetitions were individually positioned along the pseudochromosomes (Figure 4).
The in silico mapping of the (AAC)5 microsatellite in soybean showed the presence of 32 sites, with sizes varying from 26 to 81 bp, located in regions of high to moderate gene density, sometimes associated with genes, and only one site in a region without genes. Four out of the 32 sites represented two overlapping repeat units each (Figure 4).

FISH protocol using BACs (102N16 and 88G15) and synthetic oligonucleotide SSR (AAC)5 as probes
BAC probes
BAC clones were selected as previously described and ordered from the G. max genomic library at the University of Arizona (USA) (www.genome.arizona.edu/orders). In this study, we used two soybean BACs belonging to linkage group J (BACs 102N16 and 88G15, Gm16).
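The collapsing of sliding-window alignments into single matching regions, and their placement on a scaled ideogram, can be sketched as follows. This is a minimal Python stand-in for the text-editor macro and spreadsheet step, assuming each alignment has already been reduced to a (pseudochromosome, start, end) triple; the coordinates in the example are hypothetical.

```python
# Minimal sketch: merge overlapping alignments per pseudochromosome and
# compute each region's size as final minus initial position, as in the text.
def merge_hits(hits):
    """hits: iterable of (chrom, start, end). Returns (chrom, start, end, size)."""
    # Normalize so that start <= end (minus-strand hits may be reversed).
    ordered = sorted((c, min(s, e), max(s, e)) for c, s, e in hits)
    regions = []
    for chrom, start, end in ordered:
        if regions and regions[-1][0] == chrom and start <= regions[-1][2]:
            prev = regions[-1]
            regions[-1] = (chrom, prev[1], max(prev[2], end))  # extend region
        else:
            regions.append((chrom, start, end))
    return [(c, s, e, e - s) for c, s, e in regions]

def bp_to_mm(position_bp):
    """Ideogram position under the 1 Mb to 1 mm drawing scale."""
    return position_bp / 1_000_000

# Hypothetical sliding-window hits.
hits = [
    ("Gm01", 100, 115), ("Gm01", 103, 118), ("Gm01", 106, 126),
    ("Gm05", 5000, 5015), ("Gm01", 900, 880),
]
regions = merge_hits(hits)
for region in regions:
    print(region)
```

In this example the three overlapping windows on Gm01 collapse into one 26 bp region, while the isolated hits remain separate regions.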
BAC DNA was isolated using the Qiagen Plasmid Mini kit protocol (Qiagen), with some adaptations. The probes were labeled by nick translation with Cy3-11-dUTP (Amersham) following manufacturer's instructions.
(AAC)5 synthetic oligonucleotide and 45S rDNA probes
The synthetic oligonucleotide (AAC)5 was indirectly labeled with digoxigenin-11-dUTP by the end labeling method (DIG Oligonucleotide 3'-End Labeling Kit, 2nd generation, Roche) according to the manufacturer's instructions. R2, a plasmid with a 6.5 kb fragment of the 18S-5.8S-25S rDNA repeat unit from A. thaliana L. (Wanzenböck et al., 1997), was isolated as described above, labeled by nick translation with biotin-16-dUTP and used as a probe for Gm13 identification.

FISH
For both probe types, cytological preparations were produced as described by Carvalho and Saraiva (1993), with some adaptations. For the FISH procedure, slides were pretreated as described by Pedrosa et al. (2003). Chromosomes were denatured in 70% formamide in 2x SSC at 70°C for 7 min and then dehydrated for 5 min in each concentration of an ice-cold ethanol series (70% and 100%).
Probe denaturation, post-hybridization washes and detection were performed according to Heslop-Harrison et al. (1991), except for the stringent wash, which was performed with 0.1x SSC at 42°C. Probes labeled with digoxigenin-11-dUTP were detected using sheep anti-digoxigenin-FITC (Roche) and amplified with anti-sheep-FITC (Sigma), in 1% (w/v) BSA. Biotin probes were detected using mouse anti-biotin (Dako) and amplified with rabbit anti-mouse TRITC conjugate (Dako) in 1% (w/v) BSA. All preparations were counter-stained and mounted with 2 µg/mL DAPI in Vectashield (Vector). Cells were analyzed on a Leica DMLB microscope and images of the best cells were captured on a Leica DFC 340FX camera, using Leica CW 4000 software. All images were optimized for contrast and brightness. For the superimposed images, the DAPI staining image was converted to grayscale, while the BACs 88G15 and 102N16 were artificially colored in yellow and orange, respectively. Images were superimposed using the lighten tool. All these processes were done using Adobe Photoshop CS4 (Adobe Systems Incorporated) (Figure 5).

Comparison of cytogenetic maps with in silico analysis
The in silico selected BACs 88G15 and 102N16 were each mapped in situ as a single signal in Gm16. BAC 102N16 was located at the subterminal region of the short chromosome arm, while BAC 88G15 aligned at the intercalary region of the long chromosome arm (Figure 5a). The chromosome size was measured (2.84 µm), as well as the exact location and site size, using the Micromeasure program, enabling us to determine the physical distance between these markers (1.5 µm or 53% of the total chromosome length), which was represented by a chromosome-specific ideogram (Figure 6a). The positions of the cytogenetic markers were explored in a comparative analysis with the contig physical map, constructed by in silico analysis, and integrated with the available soybean genetic map, revealing some divergence. Comparing in situ and in silico results, the observed discrepancies may be related either to the heterochromatin condensation behavior in mitotic metaphase chromosomes, or to the impossibility of computationally determining the position of the remaining non-anchored 17.7 Mb scaffolds in the soybean physical map (Schmutz et al., 2010). Moreover, comparing the in situ analysis to the linkage map, it appears that Satt622 and Satt405, located in BACs 88G15 and 102N16, respectively, are at a genetic distance corresponding to 33 cM between markers (or 36.5% of the J linkage group) of the soybean genetic map, indicating a distortion between the cytogenetic and genetic distances. Such distortions have recently been observed in comparative map analyses for P. vulgaris (Fonsêca et al., 2010) and Oryza sativa (Cheng et al., 2001), and are attributed to the suppression of recombination events in pericentromeric regions.
Regarding the SSR oligonucleotide (AAC)5, a comparative in silico and in situ analysis of its location showed that of the 31 sites observed in silico, 20 were found outside the pericentromeric region (Figure 4). Moreover, the FISH analysis revealed different (AAC)5 hybridization sites scattered throughout most chromosomes, especially in the proximal regions of two chromosome arms (Figure 5b). Such information raised the hypothesis that FISH has also shown sites associated with heterochromatic regions, not revealed by the in silico analysis because of their absence in the assembled pseudochromosomes, due to the fact that the SoyBase platform excluded a fraction of the constituent scaffolds that remained non-anchored (Schmutz et al., 2010). The absence of such repetitive regions may be justified by technical difficulties in their clustering/assembling using bioinformatic tools. Besides, many genome projects face the difficulties of sequencing microsatellite rich regions, due to DNA polymerase slippage during PCR, causing variation and sometimes the "compression point" effect (Liepelt et al., 2005). Thus, the identified discrepancies support the idea that in silico and in situ analyses are complementary to each other, facilitating a better understanding of the physical structure and genomic organization, mainly regarding repetitive DNA rich regions. An in silico and in situ comparative analysis for chromosome 13 carrying the 45S rDNA further supports our findings (Figure 6b).
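The comparison between cytogenetic and genetic distances reported above can be reproduced as simple arithmetic. The sketch below uses only the values given in the text (chromosome length 2.84, marker separation 1.5 in the same unit, 33 cM, 36.5% of the linkage group); the function name and the derived quantities (implied linkage group length, ratio of fractions) are illustrative additions rather than results reported by the authors.

```python
# Minimal sketch: relative physical vs. relative genetic distance between
# the two BAC-anchored markers. Length units cancel in the physical fraction.
def compare_distances(chrom_length, fish_distance, genetic_cM, genetic_fraction):
    physical_fraction = fish_distance / chrom_length
    # Linkage group length implied by "33 cM is 36.5% of the group".
    implied_group_cM = genetic_cM / genetic_fraction
    # Values above 1 mean the markers are relatively farther apart on the
    # chromosome than on the linkage map.
    ratio = physical_fraction / genetic_fraction
    return physical_fraction, implied_group_cM, ratio

physical_fraction, implied_group_cM, ratio = compare_distances(
    chrom_length=2.84, fish_distance=1.5, genetic_cM=33.0, genetic_fraction=0.365
)
print(f"physical fraction: {physical_fraction:.1%}")
print(f"implied linkage group length: {implied_group_cM:.1f} cM")
print(f"physical/genetic fraction ratio: {ratio:.2f}")
```

Under these inputs the markers span about 53% of the chromosome but only 36.5% of the linkage group, which is the distortion discussed in the text.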

Synteny with other crops
From a macrosyntenic point of view, a broad conservation of genome macrostructure is observed among legumes, especially within the galegoid clade, also highlighting inferred chromosomal rearrangements that may justify the variation in chromosome number between these species (Choi et al., 2004). Recently, synteny mapping between common bean and soybean (phaseoloid legumes) revealed 55 syntenic blocks of shared loci, with a mean size of 32 cM and seven loci on average. By comparing the location of these blocks, it is very clear that nearly all segments of the common bean genome mapped to two segments of the soybean genome (McClean et al., 2010).
More recently, the integration of genetic and cytogenetic maps with sequencing data has provided a greater number of marks and information about genome organization and evolution, facilitating a better understanding of chromosome homeologies and macrosynteny conservation among species. Using SoyBase, it was possible to identify alignments and synteny among soybean pseudochromosomes, as well as among soybean and other legume chromosomes. For instance, the BACs used in the present work (88G15 and 102N16) have homologies with other legumes. BAC 88G15 aligned to sequences of Cajanus cajan, Chamaecrista fasciculata, P. vulgaris, Medicago truncatula and Vigna unguiculata, whereas 102N16 aligned to all the aforementioned species, as well as to Glycine soja, Lotus japonicus, Pisum sativum and Lupinus albus (Table 1). Regarding synteny, 88G15 and 102N16 were syntenic to M. truncatula chromosomes Mt5 and Mt8, while 102N16 was syntenic to Mt8. Table 1 shows the synteny (duplications) of those BACs to other soybean chromosomes. Recently, an association between soybean cytogenetic and physical maps was successfully conducted (Findley et al., 2010), enabling not only a comparative study between soybean and G. soja, but also the simultaneous identification of 20 chromosome pairs in soybean mitotic preparations, as well as the establishment of the relationship with their pseudochromosomes.
To date, no investigation on the conservation of chromosome position and colinearity has been made available for legume species regarding aquaporin coding genes. A recent physical mapping of wheat aquaporin genes confirmed many orthologous relationships between wheat and rice and/or barley aquaporin genes, many of which were conserved in the syntenic genome areas (Forrest and Bhave, 2010). Our data is the first to explore this gene family within the soybean genome, raising evidence of past intense duplication events in soybean, followed by genome reorganization that retained most of the new aquaporin coding genes. Given that most soybean chromosome regions correspond to two or more chromosome segments from P. vulgaris, it is likely that some of the aquaporin coding genes are conserved in the syntenic regions of both organisms.