New evidence for balancing selection at the HLA-G locus in South Amerindians.

HLA-G is a non-classical HLA (Human Leukocyte Antigen) molecule characterized by limited tissue distribution under normal physiological conditions and low variability at both DNA and protein levels. Several studies suggest that HLA-G could play a role, as an immunoregulatory molecule, in situations as diverse as transplantation, cancer, viral infections and inflammatory diseases. A total of 237 individuals from 21 South American tribes, whose languages belong to nine different linguistic families, were studied in relation to the 14 bp insertion/deletion polymorphism at the HLA-G gene. A consistent excess of heterozygosity (in seven of the nine samples) was obtained when samples were classified by language. Our data provide evidence for balancing selection acting at the HLA-G 14 bp INDEL region. Enhanced fetal survival in a pathogen-rich environment may account for these findings.


Introduction
HLA-G is a non-classical HLA (Human Leukocyte Antigen) molecule characterized by limited tissue distribution under normal physiological conditions and low polymorphism at both DNA and protein levels. HLA-G presents unique characteristics as compared to its classical HLA counterparts, such as the expression of multiple isoforms generated by alternative splicing, which can be membrane-bound (G1-G4) or secreted (G5-G7). Moreover, it is the only HLA molecule capable of forming dimers. Unlike the classical HLA-I A, B and C molecules, HLA-G seems to be involved in immune modulation rather than in antigen presentation. It has been shown that HLA-G plays a major role in immunosuppression, interacting with cells of the immune system and suppressing the immune response by different mechanisms. These mechanisms include the inhibition of the cytotoxic activity of T lymphocytes (CTL) and NK cells, and the protection from NK-mediated anti-tumor immunity of class I-negative or allogeneic tumors (Riteau et al., 2001) and even of tumor cells that express ligands for NK activating receptors such as MICA (Menier et al., 2002; Rouas-Freiss et al., 2003). It was also shown that the HLA-G molecule can inhibit CD4+ T cell alloproliferative responses (Riteau et al., 1999) and the proliferation of T and peripheral blood NK cells (Bahri et al., 2006; LeMaoult et al., 2007), and can act on antigen presenting cells (APC) by inhibiting their maturation and function (Horuzsko et al., 2001). Additionally, HLA-G may exert long-term immunotolerogenic effects through the generation of suppressor cells (reviewed by Carosella et al., 2008), and even cells which do not transcribe HLA-G may temporarily become HLA-G+ and suppressive through intercellular uptake of HLA-G-containing membrane patches, a mechanism called trogocytosis.
On the other hand, it was postulated that HLA-G can activate uterine NK cells, leading to the production of pro-inflammatory and pro-angiogenic factors, which would be important for placental development (Rajagopalan et al., 2006). HLA-G expression was first described in the cytotrophoblast, and therefore the first studies on this molecule concentrated on its role during pregnancy. However, as soon as the first immunotolerogenic features of this molecule became known, the scientific community began to turn its attention to other immunological situations in which HLA-G could play a role, such as transplantation, cancer, viral infections, and inflammatory diseases (reviewed by Veit et al., 2010). Scientific interest in HLA-G has continuously increased since the description of the molecule in 1987, as evidenced by the fact that half of the 1352 entries for the term "HLA-G" in PubMed (as of July 2011) were generated in the last six years. At the Universidade Federal do Rio Grande do Sul, the Immunogenetics Laboratory of the Genetics Department is developing research on the variability of the HLA-G gene, with special focus on autoimmune diseases but also covering infectious diseases and materno-fetal tolerance (Vianna et al., 2007; Veit et al., 2008; Cordero et al., 2009; Consiglio et al., 2011).
The HLA-G gene is located in the Major Histocompatibility Complex (MHC) region, which comprises a collection of genes on the short arm of chromosome 6 (6p21.3). As previously mentioned, HLA-G presents limited polymorphism as compared to classical HLA molecules: only 47 alleles, encoding 15 different proteins, have been described to date, as compared to 1698, 2271 and 1213 alleles found at the A, B and C loci, respectively (IMGT/HLA Database). Moreover, this limited variability is distributed along the three alpha domains, while in classical HLA molecules it is concentrated around the peptide binding groove.
Another striking difference between HLA-G and other HLA genes is its unique promoter region. While in classical HLA genes the promoter elements are located within 220 bp upstream of the ATG start codon, the HLA-G regulatory elements are located in a region that spans 1.5 kb upstream of its start codon. Many typical HLA promoter elements are deleted or modified in the HLA-G promoter region, rendering HLA-G expression unresponsive to classical HLA stimulatory factors such as NF-κB, IRF1 and CIITA. Another remarkable characteristic of the HLA-G promoter is its relatively high variability. To date, more than 29 SNPs have been identified in this region, many of them located near known regulatory elements. These polymorphisms may have an important impact on HLA-G expression, as reported by previous studies (Ober et al., 2003). Moreover, haplotype analyses pointed to the existence of two lineages which may be under balancing selection. These lineages probably present different promoter activity patterns (Tan et al., 2005).
The 3' untranslated region (3'UTR) also seems to play an important role in HLA-G expression, mainly through post-transcriptional regulatory mechanisms. Castelli et al. (2010) described eight different haplotypes in a Brazilian population. In silico analysis of this region has identified numerous putative microRNA binding sites, which may influence HLA-G expression depending on the allele and/or biological context. Also, according to the latest information, eleven polymorphic positions have been identified at the HLA-G 3'UTR, many of which overlap putative microRNA binding sites. Among the 3'UTR polymorphisms, a 14 bp insertion/deletion (INDEL; rs1704) has been extensively studied due to its potential involvement in alternative splicing processes and post-transcriptional regulation. Transcripts with the 14 nucleotide sequence (ins) can undergo an additional splicing step which removes 92 bases from the region in which this sequence is located. This deletion is thought to influence mRNA stability since, following actinomycin treatment, the HLA-G transcripts lacking these 92 bases were shown to be more stable than the "complete" mRNAs in placental cells (Rousseau et al., 2003). On the other hand, several studies have repeatedly reported the association of the ins allele with lower soluble HLA-G levels, and even the lack of detectable HLA-G expression in the plasma of homozygotes for the ins allele (Hviid et al., 2004; Rizzo et al., 2005, 2006, 2008). Interestingly, it was also described that the 14 bp INDEL lies in a region which is a putative binding site for many microRNAs (Castelli et al., 2009), and regulation through microRNA binding was also postulated to contribute to the control of HLA-G expression. Nevertheless, it is evident that the net effect of microRNAs on HLA-G expression will be very difficult to assess, as microRNA profiles may vary substantially among tissues and biological states.
Mendes-Junior et al. (2007) hypothesized that, since the insertion allele is associated with recurrent miscarriages and other pregnancy complications, its frequency might be rather low in isolated populations such as the Amazonian Indians, and this was indeed observed in their study. However, their data did not support a departure of the 14 bp variant frequencies from neutrality in the studied populations. Here we present new data on Amerindian populations that provide evidence for balancing selection acting at the HLA-G 14 bp INDEL region.

Subjects
We obtained the frequencies of the allelic variants of the 14 bp polymorphism in 237 South Amerindians from 21 different tribes and nine different linguistic families, classified according to Campbell (1997) (Table 1). Their territories range from 3°45' N to 39°20' S, and from 50°10' W to 69°30' W. The samples were collected over five decades by members of our group and were appropriately stored for different types of testing. Informed oral consent was obtained from all participants, since they were illiterate. Consent was obtained according to the Helsinki Declaration, and this procedure was approved by the Brazilian National Ethics Commission (CONEP Resolution 123/98).

Polymerase chain reaction (PCR) amplification of exon 8 of the HLA-G gene and genotyping
Genotyping was performed by PCR as previously described (Hviid et al., 2002). Briefly, 100 ng of genomic DNA was amplified in a 25 µL reaction, with final concentrations as follows: 1X PCR buffer, 0.2 mM dNTP, 1.5 mM MgCl2, 0.75 U Taq DNA polymerase, and 10 pmol of each primer (GE14HLAG: 5'-

Statistical analysis
Allele frequencies and observed heterozygosity (Ho) were computed by the direct counting method, and adherence of phenotypic proportions to expectations under Hardy-Weinberg equilibrium was tested by the complete enumeration method using the GENEPOP 3.4 software (Rousset, 2008). Expected heterozygosity values (He) were estimated with the ARLEQUIN 3.0 software (Excoffier et al., 2007). Departure from selective neutrality was tested by Slatkin's implementation of the Ewens-Watterson homozygosity neutrality test (Slatkin, 1994), and the calculations were carried out using the PyPop software (Lancaster et al., 2003).
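As an illustration of the quantities described above, the following minimal Python sketch computes allele frequencies by direct counting, Ho, He, and an exact Hardy-Weinberg probability test for a biallelic locus. It is not the GENEPOP or ARLEQUIN code: the function names are hypothetical, the He estimator is assumed to be gene diversity with the n/(n-1) small-sample correction, and the P-value is assumed to be the sum of the probabilities of all genotype configurations no more probable than the observed one. The genotype counts in the usage lines are invented for illustration and are not taken from Table 2.

```python
from fractions import Fraction
from math import factorial


def summarize_locus(n_del_del, n_del_ins, n_ins_ins):
    """Allele frequencies by direct counting, observed and expected heterozygosity."""
    n_individuals = n_del_del + n_del_ins + n_ins_ins
    n_copies = 2 * n_individuals  # each diploid individual carries two gene copies
    p_del = (2 * n_del_del + n_del_ins) / n_copies
    p_ins = 1.0 - p_del
    ho = n_del_ins / n_individuals
    # Assumption: unbiased gene diversity, n/(n-1) * (1 - sum of squared frequencies)
    he = n_copies / (n_copies - 1) * (1.0 - p_del**2 - p_ins**2)
    return p_del, p_ins, ho, he


def het_probability(n_het, n_a, n_b):
    """Probability of n_het heterozygotes given allele counts n_a and n_b under HWE."""
    n_individuals = (n_a + n_b) // 2
    n_aa = (n_a - n_het) // 2
    n_bb = (n_b - n_het) // 2
    numerator = factorial(n_individuals) * 2**n_het * factorial(n_a) * factorial(n_b)
    denominator = (factorial(n_aa) * factorial(n_het) * factorial(n_bb)
                   * factorial(2 * n_individuals))
    return Fraction(numerator, denominator)


def hwe_exact_p(n_aa, n_ab, n_bb):
    """Exact test: enumerate every possible heterozygote count for the observed alleles."""
    n_a = 2 * n_aa + n_ab
    n_b = 2 * n_bb + n_ab
    observed = het_probability(n_ab, n_a, n_b)
    p_value = Fraction(0)
    # The heterozygote count must have the same parity as the allele counts
    for n_het in range(n_a % 2, min(n_a, n_b) + 1, 2):
        prob = het_probability(n_het, n_a, n_b)
        if prob <= observed:
            p_value += prob
    return float(p_value)


# Hypothetical sample: 4 del/del, 14 del/ins, 2 ins/ins individuals
p_del, p_ins, ho, he = summarize_locus(4, 14, 2)
print(f"p(del)={p_del:.3f} p(ins)={p_ins:.3f} Ho={ho:.3f} He={he:.3f}")
print(f"HWE exact P={hwe_exact_p(4, 14, 2):.4f}")
```

Exact rational arithmetic (`Fraction`) is used so that ties between configuration probabilities are compared without floating-point error; for the sample sizes in this study the cost is negligible.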

Results and Discussion
Due to the relatively low number of individuals studied per tribe, we opted for grouping tribes according to their linguistic group, since it has previously been suggested that, among South American Indians, populations whose members speak the same language are genetically homogeneous and may be viewed as the ultimate evolutionary unit (Fagundes et al., 2002). The 14 bp deletion allele was the most common one in seven of the nine linguistic groups, while in the other two (Karib and Chapacura) the 14 bp insertion prevailed. Two Amazonian linguistic samples (Tupi and Mura) deviated significantly from Hardy-Weinberg expectations, both with an excess of heterozygotes (Table 2). Seven of the nine linguistic group samples also presented higher observed heterozygote frequencies (Ho) than those expected (He), and four of these groups (Mapugungu, Mura, Tupi and Zamuco) presented Ho values above 0.6. To investigate whether natural selection could be reflected in these values, the Ewens-Watterson test of neutrality was performed. All linguistic groups presented negative normalized F values (observed homozygosity lower than that expected under neutrality), with P-values ranging from 0.0024 to 0.1958 (Table 2). Four of them reached statistical significance, deviating from the neutral evolution hypothesis; curiously, three of these were members of the Macro-Tupi linguistic family (Kariri Tupi, Tupi Mondé and Tupi), and the fourth was the Karib (Table 2). This finding is remarkable, considering that Macro-Tupi encompasses a wide group of languages spoken by tribes which are widely distributed in the Brazilian Amazon. Mendes-Junior et al. (2007) studied seven different Amazonian tribes and none presented a genotype distribution that departed from neutrality, in spite of the fact that 14 of the 16 villages (87.5%) presented negative F values. It is worth noting that there is no overlap between the tribes or linguistic groups of the two studies.
Our investigation is the first to report a significant deviation from the neutral hypothesis at the 14 bp locus in Amerindians, but both studies point in the same direction. To check whether the significant deviations observed for the HLA-G polymorphism were in any way biased by the limited number of individuals representing each linguistic group, the same analyses were performed on data already available for two apparently neutral markers (haptoglobin and acid phosphatase) in a subset of the individuals tested for HLA-G polymorphisms. These two biallelic markers were chosen because data were available for a representative number of individuals (188 and 171 individuals, respectively). As expected, no significant deviation was observed for acid phosphatase, and for haptoglobin a deviation was observed only among the Zamuco (data not shown, available on request). The occurrence of balancing selection at the HLA-G promoter region has previously been described (Tan et al., 2005). Considering that HLA-G expression and levels have already been related to different conditions (either physiological or pathological) and that this molecule, depending on the context, may be deleterious or advantageous, balancing selection at this locus seems to be a plausible possibility. Recently, Castelli et al. (2011) made a comprehensive review of the HLA-G gene polymorphism and haplotypes in a Brazilian urban cohort, showing high linkage disequilibrium along the whole length of the gene. In the same work, the authors revealed evidence for balancing selection acting on the regulatory regions only (5' and 3' UTRs) and on the HLA-G locus as a whole. We cannot rule out that the evidence of balancing selection observed in our data could be due to a hitchhiking effect caused by linkage disequilibrium between the 14 bp locus and the HLA-G promoter region.
Nevertheless, the compelling evidence for the functionality of the 14 bp insertion in alternative splicing and its potential role in post-transcriptional regulation by microRNA binding lead us to believe that the 14 bp INDEL might also be an adaptive factor, influencing HLA-G expression patterns, and that it is probably related to the survival of heterozygous fetuses due to resistance to pathogens (Mendes-Junior et al., 2007). In conclusion, our data corroborate the evidence of balancing selection at the HLA-G gene, highlighting important regulatory roles of this molecule in the immune system.
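To make the neutrality statistic discussed above more concrete, the sketch below shows how a normalized F value can arise for a biallelic locus such as the 14 bp INDEL. It is a simplified illustration, not the PyPop or Slatkin implementation: the function name is hypothetical, and the null distribution is assumed to be the Ewens sampling formula conditioned on observing two alleles, under which a configuration with i and n - i copies has weight proportional to 1/(i(n - i)), halved when both counts are equal. The reported tail probability is assumed to be the lower tail, P(F ≤ F observed). The allele counts in the usage lines are invented.

```python
from fractions import Fraction
from math import sqrt


def ewens_watterson_biallelic(count_a, count_b):
    """Observed F, neutral expectation, normalized F and lower-tail P for two alleles."""
    n = count_a + count_b
    if min(count_a, count_b) < 1 or n < 4:
        raise ValueError("need two observed alleles and at least four gene copies")
    # Observed homozygosity: sum of squared allele frequencies
    f_obs = Fraction(count_a**2 + count_b**2, n**2)

    # Enumerate every unordered two-allele configuration {i, n - i}
    configs = []
    for i in range(1, n // 2 + 1):
        j = n - i
        weight = Fraction(1, i * j)
        if i == j:
            weight /= 2  # two alleles with identical counts: divide by 2!
        configs.append((Fraction(i * i + j * j, n * n), weight))

    total = sum(w for _, w in configs)
    f_exp = sum(f * w for f, w in configs) / total
    variance = sum((f - f_exp) ** 2 * w for f, w in configs) / total
    # Normalized deviate: negative values point toward balancing selection
    f_nd = float(f_obs - f_exp) / sqrt(float(variance))
    p_lower = float(sum(w for f, w in configs if f <= f_obs) / total)
    return float(f_obs), float(f_exp), f_nd, p_lower


# Hypothetical sample of 40 gene copies: 22 deletion and 18 insertion alleles
f_obs, f_exp, f_nd, p_lower = ewens_watterson_biallelic(22, 18)
print(f"F obs={f_obs:.4f} F exp={f_exp:.4f} Fnd={f_nd:.3f} P(F<=obs)={p_lower:.4f}")
```

With only two alleles the set of possible configurations is small, so the attainable P-values are coarse; this is one reason why the comparison with apparently neutral biallelic markers in the same individuals is informative.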