Linkage disequilibrium and diversity for three genomic regions in Azoreans and mainland Portuguese

Studies of linkage disequilibrium (LD) across genomes and populations have been undertaken in recent years, mainly to improve the mapping of genes underlying complex traits. Here, we characterize the patterns of genetic diversity at HLA loci and evaluate the extent of LD (D') in three genomic regions: Xq13.3, NRY and HLA. In addition, we examine the distribution of DXS1225-DXS8082 haplotype diversity in Azoreans and mainland Portuguese. Allele distributions show that the São Miguel population is genetically very diverse; haplotype analysis revealed 100% discriminatory power for X- and Y-chromosome markers and 94.3% for HLA markers. Standardized multiallelic D' values in all three genomic regions are lower than 0.33, suggesting that there is no extensive LD in the São Miguel population. The distribution of DXS1225-DXS8082 haplotypes shows no significant differences among the populations studied (the Azorean geographical groups, the Azores archipelago as a whole and mainland Portugal). Moreover, in these populations, as in other European populations, the most frequent DXS1225-DXS8082 haplotype is 210-219. Even though São Miguel islanders and Azoreans do not constitute isolated populations and show LD only over very short physical distances, certain characteristics, such as the absence of genetic structure, a shared environment and the possibility of constructing extensive pedigrees from church and civil records, offer an opportunity to dissect the genetic background of complex diseases in these populations.


Introduction
Linkage disequilibrium (LD), the nonrandom association of alleles at different loci, varies across populations and genomic regions, and even between pairs of closely spaced markers. Some of the factors that generate variation in LD, such as genetic drift and admixture, are population specific; others, such as recombination rate, gene conversion and natural selection, are specific to genomic regions (Shifman et al., 2003). For these reasons, studies of LD extent and population structure are a good starting point for investigating complex traits (Angius et al., 2002).
The Azores is a Portuguese archipelago of nine islands distributed among three geographical groups: the Eastern group, with two islands (São Miguel and Santa Maria); the Central group, with five islands (Terceira, Pico, Faial, São Jorge and Graciosa); and the Western group, with Flores and Corvo. The Portuguese explorers who discovered the archipelago in 1427 only began settling it in 1439, through a long and difficult process. Historical records report that the Portuguese crown was compelled to cede land and privileges, not only to Portuguese but also to foreigners, in order to attract people to the islands. The Azorean population therefore received a significant contribution from people of non-Portuguese genetic backgrounds, including individuals of Flemish, Spanish, French, Italian, German, Scottish and Jewish origin, as well as Moorish prisoners and black slaves from Guinea, Cape Verde and São Tomé (Guill, 1993). The first islands to be settled were Santa Maria and São Miguel; the last were Flores and Corvo, at the beginning of the 16th century, the latter being populated mainly by individuals from the other islands.
Several studies (… et al., 2005; Pacheco et al., 2005; Spínola et al., 2005a; Branco et al., 2006, 2008a, 2008b, 2008c) have been undertaken with the aim of characterizing the genetic pool of the Azoreans. These studies report high genetic variability and heterogeneity in the Azorean population, which can be explained by the history of the islands' settlement. Laan et al. (2005) proposed that DXS1225-DXS8082 haplotype diversity is an efficient marker of population genetic history, owing to the low recombination rate between these markers. Therefore, to uncover possible differences between mainland Portuguese and Azoreans not observed in previous work, we re-analysed the data published in Branco et al. (2008a) for the three Azorean geographical groups, as well as for the Azores archipelago as a whole and mainland Portugal. In addition, the present study, based on an analysis of the Xq13.3, non-recombining portion of the Y-chromosome (NRY) and HLA regions in São Miguel islanders, mainly aims to answer the following questions: What is the allelic distribution of HLA class I and II loci in this island population, and does it reveal a genetic structure? Does the extent of LD vary considerably among these three genomic regions?

Population samples and genotyping
The sample set consisted of healthy blood donors living on São Miguel Island, obtained from the "anonymous" Azorean DNA bank located in the main hospital of the Azores archipelago, Portugal. LD on the X and Y chromosomes was assessed in males only (189 and 149 individuals, respectively), whereas the analysis of the HLA region included 106 individuals of both sexes (8 females and 98 males). The Xq13.3 region was analyzed as described in Branco et al. (2008a), using eight microsatellite markers (DXS983, DXS1066, DXS986, DXS8092, DXS8082, DXS1225, DXS8037 and DXS995) spanning approximately 6.9 centiMorgans (cM), or 20.9 megabases (Mb). In addition, DXS1225-DXS8082 haplotype frequencies were estimated in the three Azorean geographical groups (Western, Central and Eastern), as well as in the Azores archipelago as a whole and in mainland Portugal, using a total of 527 individuals (450 islanders and 97 mainlanders), the same individuals previously studied by these authors (Branco et al., 2008a).
The typing of 7 Y-STRs in 172 males is described in Pacheco et al. (2005). HLA class I (-A, -B and -Cw) and class II (-DRB1, -DQB1, -DPA1 and -DPB1) loci were genotyped in 106 individuals by PCR-SSP using the Olerup SSP system (GenoVision Inc.), according to the manufacturer's instructions. PCR products were separated by electrophoresis on a 4% agarose gel stained with SYBR® Green and visualized, and HLA alleles were then assigned using the Helmberg-SCORE ("Sequence Compilation and Rearrangement Evaluation for Research only") software, version 3.320T (Olerup SSP AB, Saltsjöbaden, Sweden). We also typed two dinucleotide STRs located in the HLA region, D6S265 and TNFa (Branco et al., 2008b).

Statistical analysis
Allele and DXS1225-DXS8082 haplotype frequencies were calculated by direct counting. Average gene diversity was estimated with Arlequin software. HLA haplotypes were estimated from multilocus genotype data of unknown gametic phase with the expectation-maximization (EM) algorithm, an iterative procedure implemented in Arlequin v3.0. The standardized multiallelic disequilibrium coefficient, D', was computed with the Haploxt application of the GOLD software package, which calculates disequilibrium statistics from haplotype data. The average D' for each genomic region was computed as the arithmetic mean of the values obtained for all marker pairs. Because the extent of the three regions varied widely, a second analysis restricted to a smaller distance (around 3 Mb) was undertaken, to give further insight into LD patterns over short distances and to check the robustness of the conclusions.
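The EM step can be illustrated with a minimal two-locus sketch. It is not Arlequin's implementation, which handles many loci at once; it assumes unphased diploid genotypes at two multiallelic loci and alternates between splitting each individual across its possible phases in proportion to current haplotype frequencies (E-step) and re-estimating frequencies from those expected counts (M-step). The function name `em_two_locus` and its input format are hypothetical.

```python
from collections import defaultdict


# Hypothetical sketch of two-locus EM haplotype estimation (not Arlequin's code).
def em_two_locus(genotypes, n_iter=500, tol=1e-10):
    """Estimate two-locus haplotype frequencies from unphased diploid genotypes.

    genotypes: list of ((a1, a2), (b1, b2)) tuples giving the two alleles
    observed at locus A and at locus B for each individual. Alleles must be
    mutually comparable (e.g. all ints or all strings) so phases can be sorted.
    Returns a dict mapping (allele_A, allele_B) haplotypes to frequencies.
    """
    # Distinct phase resolutions compatible with each genotype: one if the
    # individual is homozygous at either locus, two if doubly heterozygous.
    resolutions = []
    for (a1, a2), (b1, b2) in genotypes:
        phases = {
            tuple(sorted([(a1, b1), (a2, b2)])),
            tuple(sorted([(a1, b2), (a2, b1)])),
        }
        resolutions.append(list(phases))

    # Start from equal frequencies for every haplotype that could occur.
    haplotypes = {h for res in resolutions for pair in res for h in pair}
    freq = {h: 1.0 / len(haplotypes) for h in haplotypes}
    n_chromosomes = 2 * len(genotypes)

    for _ in range(n_iter):
        # E-step: weight each phase by the probability of its haplotype pair
        # under current frequencies (2pq for distinct haplotypes, p^2 otherwise).
        counts = defaultdict(float)
        for res in resolutions:
            weights = [freq[h1] * freq[h2] * (1 if h1 == h2 else 2) for h1, h2 in res]
            total = sum(weights)
            for (h1, h2), w in zip(res, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: new frequency = expected haplotype count / number of chromosomes.
        new_freq = {h: counts[h] / n_chromosomes for h in haplotypes}
        converged = max(abs(new_freq[h] - freq[h]) for h in haplotypes) < tol
        freq = new_freq
        if converged:
            break
    return freq
```

Only doubly heterozygous individuals are ambiguous; everyone else contributes known haplotypes, which is what lets the iteration settle on the most likely phase for the ambiguous ones.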

HLA diversity in São Miguel Island
Analysis of HLA alleles in 106 individuals from São Miguel Island (Table 1) showed that gene diversity across HLA loci ranged from 0.821 (-DQB1 and -DPA1) to 0.934 (-B). The overall gene diversity across HLA loci for São Miguel islanders was 0.843 (Table 2). A comparison of HLA allele frequencies among São Miguel, mainland Portugal and other European populations revealed no statistically significant differences (GST = 0.03; data not shown).
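The two summary statistics above can be sketched as follows, assuming Nei's unbiased gene diversity for a single locus and Nei's GST computed from population allele frequencies without sample-size correction. Arlequin's exact estimators and the procedure used for the GST comparison may differ; the function names are hypothetical.

```python
from collections import Counter


def gene_diversity(alleles):
    """Nei's unbiased gene diversity at one locus.

    alleles: list of allele copies (2 per diploid individual, 1 per hemizygous male).
    H = n / (n - 1) * (1 - sum(p_i^2)), with n the number of copies sampled.
    """
    n = len(alleles)
    freqs = [c / n for c in Counter(alleles).values()]
    return n / (n - 1) * (1 - sum(f * f for f in freqs))


def g_st(allele_freqs_by_pop):
    """Nei's G_ST = (H_T - H_S) / H_T from per-population allele-frequency dicts.

    Assumption: equal population weights and no sample-size correction.
    Undefined (division by zero) if the locus is monomorphic overall.
    """
    k = len(allele_freqs_by_pop)
    alleles = set().union(*allele_freqs_by_pop)
    # H_S: mean within-population expected heterozygosity.
    h_s = sum(1 - sum(f ** 2 for f in pop.values()) for pop in allele_freqs_by_pop) / k
    # H_T: expected heterozygosity of the pooled (mean) allele frequencies.
    pooled = {a: sum(pop.get(a, 0.0) for pop in allele_freqs_by_pop) / k for a in alleles}
    h_t = 1 - sum(f ** 2 for f in pooled.values())
    return (h_t - h_s) / h_t
```

A GST near 0, as reported here, means almost all diversity lies within rather than between the populations compared.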

Linkage disequilibrium in São Miguel Island
Because LD varies among genomic regions within the same population, we investigated its extent in the Xq13.3, NRY and HLA regions of the São Miguel Island population. The number of haplotypes, genetic diversity and average D' values are shown in Table 2. Taken together, the diversity results show that the São Miguel Island population is very diverse. The NRY region showed the smallest number of haplotypes, a quantity that can itself influence the value of D'. Since LD is generated by evolutionary processes, it is important to assess LD patterns on both sex chromosomes and autosomes. Both the NRY and HLA regions show lower LD than Xq13.3 (Table 2); of these two, LD is higher in the NRY than in the HLA region.
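For readers unfamiliar with the multiallelic statistic, the sketch below computes Hedrick's standardized D', the frequency-weighted average of |D'ij| over all allele pairs, from phased two-marker haplotypes (as obtained directly from hemizygous males for X and Y markers). We assume, without having checked the GOLD source, that this corresponds to the multiallelic D' reported by Haploxt; the function name is hypothetical.

```python
from collections import Counter


def multiallelic_d_prime(haplotypes):
    """Hedrick's multiallelic D' from a list of (allele_A, allele_B) haplotypes.

    D' = sum_ij p_i * q_j * |D_ij / D_max,ij|, where D_ij = P_ij - p_i * q_j.
    Assumed (not verified) to match the statistic reported by Haploxt.
    """
    n = len(haplotypes)
    hap_freq = {h: c / n for h, c in Counter(haplotypes).items()}
    p = {a: c / n for a, c in Counter(a for a, _ in haplotypes).items()}
    q = {b: c / n for b, c in Counter(b for _, b in haplotypes).items()}

    total = 0.0
    for a, pa in p.items():
        for b, qb in q.items():
            d = hap_freq.get((a, b), 0.0) - pa * qb
            # D_max depends on the sign of D (Lewontin's normalisation).
            if d > 0:
                d_max = min(pa * (1 - qb), (1 - pa) * qb)
            elif d < 0:
                d_max = min(pa * qb, (1 - pa) * (1 - qb))
            else:
                continue  # no disequilibrium for this allele pair
            total += pa * qb * abs(d) / d_max
    return total
```

D' tends to be inflated when many alleles are rare relative to sample size, one reason why the number of haplotypes can influence its value.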
When LD patterns were analysed over shorter distances (~3 Mb), we observed the same trend. However, average D' values increased for both Xq13.3 and the NRY, whereas the value for HLA decreased, because the ~3 Mb cut-off is smaller than the distance evaluated previously. The average D' over ~3 Mb (0.247; Table 2) was not statistically different from the value observed over the larger genetic distance (0.243; ~21 Mb). Figure 1a plots average D' against physical distance. We detected a decrease in LD values over short distances (< 4 Mb) in all regions. As expected, the highest value on the X chromosome (> 0.5) corresponds to the DXS1225-DXS8082 pair, the most closely spaced of all the markers. To compare LD values over the shortest distance studied (~3 Mb), we analysed D' for the three genomic regions (Figure 1b), which showed the same tendency. We also added trend lines (grey lines; Figures 1a and 1b) to describe how D' values are distributed in these genomic regions; they clearly show that D' decreases with physical distance. We did not fit a trend line for the HLA loci in Figure 1a, however, because it would decline sharply toward negative values. We also noticed that, in both the NRY and Xq13.3 regions, D' values levelled off with increasing physical distance (grey lines; Figures 1a and 1b), probably owing to reduced recombination in the Xq13.3 region and its absence in the NRY.
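The regional averages and the short-distance reanalysis amount to an arithmetic mean of pairwise D' values, optionally restricted to pairs within a distance cut-off, and the grey trend lines can be approximated by an ordinary least-squares fit of D' on distance. The sketch below assumes a linear trend line (the text does not state which form was fitted) and uses hypothetical function names; the input is a list of (distance in Mb, D') tuples, one per marker pair.

```python
from statistics import mean


def average_d_prime(pairs, max_distance_mb=None):
    """Arithmetic mean of pairwise D', optionally only for pairs <= cut-off.

    pairs: list of (distance_mb, d_prime) tuples, one per marker pair.
    """
    values = [d for dist, d in pairs
              if max_distance_mb is None or dist <= max_distance_mb]
    return mean(values)


def linear_trend(pairs):
    """Least-squares line D' = intercept + slope * distance (assumed linear form)."""
    xs = [dist for dist, _ in pairs]
    ys = [d for _, d in pairs]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope
```

A linear fit extrapolates freely, which is consistent with the observation that an HLA trend line would run into negative D' values.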

DXS1225-DXS8082 haplotype analysis in Azorean and mainland Portuguese populations
Several studies (Latini et al., 2004; Bellis et al., 2008) have demonstrated a strong association between the markers DXS1225 and DXS8082 (Xq13.3), mainly as a result of their short physical distance (162 kb). Laan et al. (2005) proposed this haplotype as a good marker of population genetic history, owing to its low recombination rate. We therefore analysed the distribution of DXS1225-DXS8082 haplotypes (Table 3) in Azorean and mainland Portuguese populations, in order to detect significant differences between islanders and mainlanders. The most frequent haplotype in both the Azorean and mainland Portuguese populations was 210-219, followed by 192-229. We identified a total of 52 different DXS1225-DXS8082 haplotypes, of which only 16 had a frequency of ≥3% (Table 3). Given the sample distribution, this ≥3% criterion reveals certain differences between the Azorean and mainland populations. For instance, at this threshold, haplotypes 192-219 and 214-219 were observed only in the Western group, 212-219 only in the Central group, and 198-225 only in the Eastern group. Table 3 also shows that only three haplotypes are common to the whole Azorean population and to all three geographical groups: 210-219 (the most frequent), 192-229 (the second most frequent) and 192-231 (absent in mainland Portugal). Nine of the 16 haplotypes were present in the Azores archipelago population. Including those with a frequency of ≥1% (data not shown), 17 of the 52 haplotypes were identified; the remaining 35 do not reach 1%. Under the ≥3% criterion, mainland Portugal shows 8 of its 27 different haplotypes (Table 3), all of which have a frequency of ≥1% (data not shown).
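The direct-counting and threshold logic behind Table 3 can be sketched as follows. Each male contributes one X haplotype, so frequencies are simple proportions. Haplotypes are represented as strings such as "210-219". The group names, the function names and the reading of "observed only in" as "at or above the threshold in one group and in no other" are all hypothetical.

```python
from collections import Counter


def haplotype_frequencies(haplotypes):
    """Direct-count frequencies; each hemizygous male contributes one haplotype."""
    n = len(haplotypes)
    return {h: c / n for h, c in Counter(haplotypes).items()}


def frequent(freqs, threshold=0.03):
    """Haplotypes at or above the frequency threshold (e.g. 3%)."""
    return {h for h, f in freqs.items() if f >= threshold}


def shared_by_all(groups, threshold=0.03):
    """Haplotypes reaching the threshold in every group.

    groups: dict mapping a group name to its list of observed haplotypes.
    """
    sets = [frequent(haplotype_frequencies(h), threshold) for h in groups.values()]
    return set.intersection(*sets)


def private_to(groups, name, threshold=0.03):
    """Haplotypes reaching the threshold in `name` but in no other group."""
    own = frequent(haplotype_frequencies(groups[name]), threshold)
    others = set().union(*(frequent(haplotype_frequencies(h), threshold)
                           for g, h in groups.items() if g != name))
    return own - others
```

Because the threshold is applied per group, a haplotype can appear "private" simply because it falls just below 3% elsewhere, which is why the ≥1% counts are reported alongside.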

Discussion
The evolution of populations depends on several mechanisms, such as migration, genetic drift, selection and mutation, all of which affect the patterns of diversity of neutral and disease variants. Consequently, measuring diversity at neutral markers allows us to infer how these processes shape the overall genetic signature of a population, and also has implications for the distribution of disease, since non-neutral loci may be subject to the same evolutionary forces. In general, the data corroborate previous studies (Branco et al., 2006, 2008a, 2008b, 2008c), in which Azoreans and São Miguel islanders showed higher genetic diversity than mainland Portugal and other European populations. This may be a direct consequence of the settlement of the Azores, with a major contribution from mainland Portuguese and, to a lesser extent, Flemish, Spaniards, French, Italians, […]
In addition, the same authors (Spínola et al., 2005a) question the identification of the paternal lineage N3, specific to Asians and northern Europeans (Helgason et al., 2000; Rosser et al., 2000), since they found no supporting evidence at HLA loci. No historical records are known to report the presence of Asians or Mongolians in the archipelago. Nevertheless, the HLA data revealed the haplotype A*02 B*44 DRB1*04 at a frequency of 1.42%. This haplotype, possibly of oriental origin, has previously been described in the Azores (Bruges-Armas et al., 1999). This genetic contribution was probably introduced during the expansion of trade navigation between Europe, America and Asia in the 16th and 17th centuries, when the Azores played a strategic role owing to their geographic position. Meyer et al. (2006) investigated LD among all HLA loci in around 40 populations worldwide and reported significant LD values.
Although 13 of the 36 HLA marker pairs showed significant pairwise LD (p < 0.01; data not shown), the present results do not indicate strong LD in this region (D' values < 0.3). LD among Y-linked alleles is generally accepted to be substantially greater than among X-linked markers, since the Y chromosome has only one-quarter of the autosomal effective population size (compared with three-quarters for the X chromosome). This expectation is confirmed by the data obtained here. Nevertheless, the highest peak in Figures 1a and 1b corresponds to the DYS392-DYS385 pair. This was unexpected, since this region does not recombine. We hypothesise that this observation may reflect stochastic processes, such as random sampling, or point mutations accumulated during the evolution of populations, given that these markers are farther apart (~1.75 Mb) than the pair with the lowest value (DYS389I-DYS389II; ~0.25 kb).
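The "one-quarter" figure follows from counting gene copies in an idealized population of N breeding individuals with a 1:1 sex ratio. This is an assumption; unequal sex ratios and variance in reproductive success shift these values.

```latex
\begin{aligned}
\text{autosomal copies} &= 2N, \qquad
\text{X copies} = 2\cdot\tfrac{N}{2} + \tfrac{N}{2} = \tfrac{3N}{2}, \qquad
\text{Y copies} = \tfrac{N}{2},\\[4pt]
\frac{N_e^{(Y)}}{N_e^{(A)}} &= \frac{N/2}{2N} = \frac{1}{4}, \qquad
\frac{N_e^{(X)}}{N_e^{(A)}} = \frac{3N/2}{2N} = \frac{3}{4}, \qquad
\frac{N_e^{(Y)}}{N_e^{(X)}} = \frac{1}{3}.
\end{aligned}
```

The smaller Y effective size implies stronger drift, which, together with the lack of recombination, is why greater LD is expected among Y-linked markers.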
The study of DXS1225-DXS8082 haplotype diversity in Azorean and mainland Portuguese populations has helped clarify how these populations are related to one another. The X chromosome is an important tool for historical research: because males carry a single copy, their haplotypes can be determined directly. This feature allows the extent of LD to be determined accurately and also makes it possible to infer population "maternal lineages". Because this chromosome recombines, however, direct maternal lineages cannot be obtained. Nonetheless, by studying DXS1225-DXS8082, which has a very low probability of recombination (the markers are 162 kb apart), we could reach some interesting conclusions. Using the ≥1% frequency criterion proposed by Laan et al. (2005), we found that most haplotypes were present in all the populations evaluated, suggesting that there are no differences among them. Moreover, the 210-219 haplotype, reported by Laan et al. (2005) as the most widely represented in Europe, was also the most frequent in this study. These results confirm previous work showing a strong similarity between Azoreans and other Europeans (Branco et al., 2006, 2008a, 2008b, 2008c). There is some controversy regarding the amount of LD useful for mapping studies. According to Abecasis et al. (2001), D' = 0.33, which corresponds to a 10-fold increase in the required sample size, is commonly taken as the minimum usable amount of LD. Reich et al. (2001), on the other hand, consider D' > 0.5 to be useful. Although both of these D' values are based on SNP (single nucleotide polymorphism) markers, Schulze et al. (2002), comparing SNPs and microsatellites, reported the same D' values for both marker types.
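Applying the two published thresholds to a region means comparing D' values against 0.33 (Abecasis et al., 2001) and 0.5 (Reich et al., 2001). The hypothetical helper below reports whether a region's average D' exceeds each threshold and how many individual marker pairs do. It illustrates how a region can fall below both thresholds on average while a single tightly linked pair, such as DXS1225-DXS8082, exceeds 0.5. The use of strict ">" comparisons is an assumption.

```python
from statistics import mean

# Thresholds as cited in the text; strict ">" comparison is an assumption.
THRESHOLDS = {"Abecasis et al. (2001)": 0.33, "Reich et al. (2001)": 0.5}


def ld_summary(d_prime_values):
    """Compare a region's pairwise D' values with each usable-LD threshold.

    Returns, per threshold, whether the regional mean exceeds it and how many
    individual marker pairs exceed it.
    """
    avg = mean(d_prime_values)
    return {
        source: {
            "mean_exceeds": avg > t,
            "pairs_exceeding": sum(d > t for d in d_prime_values),
        }
        for source, t in THRESHOLDS.items()
    }
```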
In the present case, none of the samples studied showed average D' values higher than 0.33 (or 0.5), indicating no extensive LD in the São Miguel population. These results are corroborated by those of Service et al. (2006) and Branco et al. (2008a), in which the Azoreans showed the lowest LD values compared with isolated and outbred populations. Even though the Azoreans are not an isolated population, certain characteristics offer an opportunity to dissect the genetic background of complex diseases in these populations: the absence of genetic structure, a shared environment and the possibility of constructing extensive pedigrees from church and civil records. The absence of structure reduces the risk of spurious genetic associations in complex disease studies. A shared environment allows better control over external factors that may influence the development of a complex disease. Finally, extensive pedigrees allow reliable, statistically significant linkage studies. In summary, the overall data imply that identifying identical-by-descent (IBD) regions surrounding disease susceptibility genes or other complex trait loci in the São Miguel population, and in Azoreans generally, will require very high marker density, and data from the HapMap project (The International HapMap Consortium, 2007) will certainly increase the power of IBD mapping.