Uniparental genetic markers in South Amerindians

A comprehensive review of uniparental systems in South Amerindians was undertaken. Variability in the Y-chromosome haplogroups was assessed in 68 populations and 1,814 individuals, whereas that of Y-STR markers was assessed in 29 populations and 590 subjects. Variability in the mitochondrial DNA (mtDNA) haplogroups was examined in 108 populations and 6,697 persons, and sequencing studies used either the complete mtDNA genome or the highly variable segments 1 and 2. The diversity of the markers made it difficult to establish a general picture of Y-chromosome variability in the populations studied. However, haplogroup Q1a3a* was almost always the most prevalent, whereas Q1a3* occurred equally in all regions, which suggested its prevalence among the early colonizers. The STR allele frequencies were used to derive a possible ancient Native American Q-clade chromosome haplotype, and five of six STR loci showed significant geographic variation. Geographic and linguistic factors moderately influenced the mtDNA distributions (6% and 7%, respectively), and mtDNA haplogroups A and D correlated positively and negatively, respectively, with latitude. The data analyzed here provide rich material for understanding the biological history of South Amerindians and can serve as a basis for comparative studies involving other types of data, such as cultural data.


Introduction
Native Americans have been the subject of a large number of population genetic studies because of particular characteristics: (a) some groups among them had, until recently, a hunter-gatherer way of living with only incipient agriculture, typical of our ancestors, (b) they show considerable interpopulation but low intrapopulation variability, and (c) because they had no writing until recently, there are no written records of their history other than those of non-Amerindian colonizers. Biological studies can therefore be used to investigate their past.
The first genetic studies examined the variability in blood groups and proteins and have been summarized in Salzano and Callegari-Jacques (1988) and Crawford (1998). The advent of modern molecular biology, which allows direct, detailed DNA analysis, has opened new possibilities for investigating these populations.
DNA studies can basically be divided into two groups: those involving autosomal markers and those involving uniparental (Y-chromosome, mitochondrial DNA) markers. The latter are important because they can provide a clear-cut pattern of historical events that is not clouded by recombination. For Amerindians, reviews dealing with these markers are neither numerous nor comprehensive. For the Y-chromosome, Bortolini et al. (2003) considered 438 individuals from 23 Southern and one Northern Amerindian populations who were screened for eight single nucleotide polymorphisms (SNPs) and six short tandem repeat/microsatellite (STR) loci, and Zegura et al. (2004) studied 63 binary polymorphisms and 10 STR regions in 2,344 persons from 15 Northern and three Southern Amerindian groups. Only a few recent studies have used all known SNPs necessary to identify the major Native American Y-haplogroups and their sublineages in Amerindian populations (Geppert et al., 2011; Jota et al., 2011; Bisso-Machado et al., 2011).
The most recent mtDNA reviews were published four years ago and involved sequence variability in the hypervariable region 1 (Hunley et al., 2007; Lewis Jr et al., 2007). Schurr and Sherry (2004), on the other hand, associated data from Y-chromosome markers with mitochondrial DNA (mtDNA) results, providing a good picture of the information available at the time. No general review considering both data sets has been published since then. Here, Y-chromosome haplogroup variability is reviewed for 68 populations and 1,814 individuals, and specific information on Y-STR markers for 29 populations and 590 subjects is given. The haplogroup mtDNA data included 108 populations involving a total of 6,697 persons. Geographic and linguistic factors that may have influenced this variation were carefully considered, leading to a global overview of the genetic pattern associated with these markers in South Amerindians. Information on mtDNA sequencing studies is also supplied.

Materials and Methods
The data used in this review were obtained from 17 primary surveys of the Y-chromosome and 66 primary surveys of mtDNA. These studies were retrieved through PubMed and by searching the reference lists of the corresponding papers. Haplogroup frequencies were obtained by direct counting. Intra- and inter-populational diversity was calculated with AMOVA (Weir and Cockerham, 1984; Excoffier et al., 1992; Weir, 1996) using Arlequin 3.5.1.2 software (Excoffier and Lischer, 2010). AMOVA was also used to estimate the level of differentiation between and within 17 pre-defined language categories and 7 geographical categories. The distribution patterns of the mtDNA haplogroup frequencies were established by generating isoline maps using IDRISI 16.0 software (IDRISI Taiga) (Eastman, 2006). Spearman's correlation coefficients were calculated with PASW Statistics 18 software. Average heterozygosity (ah) was calculated with Arlequin 3.5.1.2 software.

Results and Discussion
Table 1 gives the distribution of the Q and non-Q chromosomes (defined by a set of SNPs), as well as linguistic and geographical information for the samples considered. The samples were distributed from latitude 11° North to 45° South and longitude 46° to 76° West, with the individuals involved speaking 23 languages. Sample sizes varied widely, from 1 to 151 individuals. Twenty-two of the samples involved fewer than 10 persons. Unfortunately, there is no standardization of the number of SNPs studied, and in most cases only the M242 and M3 markers (which define the Asian/Native American paragroup Q* and its autochthonous Native American sublineage Q1a3a*, respectively; Pena et al., 1995; Bortolini et al., 2003; Seielstad et al., 2003) were investigated. This fact precludes a complete, precise view of the distribution of Q1a3a sublineages and other Q clade chromosomes in South America. For this reason, the information in Table 1 was limited to the frequencies of the Q and non-Q lineages only.
Note that non-Q chromosomes (which, for the reasons given above, could not be assigned to sublineages) were identified in about 50% of the tribal groups. For some of these populations admixture with non-Indians is known and could be the source of these non-Q chromosomes (for example, Mapuche and Guarani; Marrero et al., 2007; Bailliet et al., 2009; Blanco-Verea et al., 2010). Overall, the numbers presented in Table 1 indicate a higher presence of non-Q lineages in southern populations than in those of the northern/Amazonian region, probably because of greater admixture with non-Indians in the former than in the latter. However, for some isolated groups such as the Yanomámi, it is unlikely that admixture explains the findings. In these cases other causes are more probable, such as the presence of unknown autochthonous lineages and/or known Q lineages whose defining markers were not tested. Despite the great variation in the number of Y-SNPs used in these studies, Figure 1 illustrates some of the trends that were observed. The autochthonous Native American Q1a3a* is almost always the most prevalent, whereas its sublineages (Q1a3a1, Q1a3a2, Q1a3a3 and Q1a3a4) seem to have more restricted geographical distributions. The second most prevalent, Q1a3*, appears to occur equally in all regions, suggesting its presence among the first settlers of South America. The other known Q clade chromosomes (Q1*, Q1a*, Q1a1, Q1a2, Q1a4, Q1a5, Q1a6 and Q1b) have not yet been identified in South America. Only one non-Q chromosome (C3*) of probable native origin has been described in northwest South Amerindian populations (Figure 1; Geppert et al., 2011).
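The direct-counting step behind the Q versus non-Q frequencies can be illustrated with a short sketch. It is not the procedure of any particular survey: the function name, the rule that any label beginning with "Q" belongs to the Q clade, and the example haplogroup calls are all assumptions made for illustration only.

```python
from collections import Counter


def q_clade_frequencies(calls):
    """Count Q and non-Q chromosomes and return (count, relative frequency).

    `calls` holds one haplogroup label per typed chromosome. Treating every
    label that starts with "Q" as Q clade is an assumption of this sketch;
    all other labels are pooled as non-Q.
    """
    counts = Counter("Q" if label.startswith("Q") else "non-Q" for label in calls)
    total = sum(counts.values())
    return {
        group: (counts[group], counts[group] / total if total else 0.0)
        for group in ("Q", "non-Q")
    }


# Hypothetical calls, not data from any surveyed population.
example_calls = ["Q1a3a*", "Q1a3a*", "Q1a3*", "C3*", "R1b", "Q1a3a1", "Q1a3a*", "Q1a3a*"]
result = q_clade_frequencies(example_calls)
print(result)
```

Pooling everything that is not Q mirrors the limitation described above: when only M242 and M3 are typed, finer assignments are not available, so a two-category count is the most that can be compared across studies.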
Evolutionary and demographic scenarios mediated by men in Native American populations have also been evaluated using Y microsatellite markers (Y-STRs), which have a much faster evolutionary rate than SNPs. Y-STRs allow the retrieval of population and chromosome evolutionary histories. For example, STR data have been used to estimate that the mutations that gave rise to the Q1a3a1 and Q1a3a4 sublineages occurred 7,972 ± 2,916 and 5,280 ± 1,330 years ago, probably in northwest South America and the Andean region, respectively (Bortolini et al., 2003; Jota et al., 2011).
Table 2 shows the STR allele frequencies observed in 29 South Amerindian populations, based only on Q clade chromosomes. In this compilation, we considered only studies containing information on the allele frequencies for each population individually. There was considerable variation in the number of samples tested in each study, the number of tribes, and the number of individuals per tribe. Depending on the locus considered, the number of alleles observed ranged from one to eight, with some of them appearing in only one study while others were present in almost all populations. Based on the most prevalent alleles per locus we reconstructed a probable haplotype of the ancient Native American Q-clade chromosome (ANAQC) as: [...]4(DYS390)-10(DYS391)-14(DYS392)-13(DYS393)-14(DYS437)-11(DYS438)-12(DYS439)-20(DYS448)-15(DYS456)-16(DYS458)-22(DYS635). Using this information and additional data for these loci (except DYS388) reported in the Y Chromosome Haplotype Reference Database we found no matches in 36,448 haplotypes (245 populations). Although we found no complete identity with our estimated ANAQC, three one-step neighbor haplotypes were encountered, two in individuals with an admixed ancestry living in Latin American countries and one in a Native American individual (Kaqchiquel).
[Table notes: populations are arranged according to latitude; language classification follows Lewis (2009); for marked entries the original language is extinct; the Diaguita originally spoke Kakán, which became extinct and was replaced by Quechua.]
[Sources listed with the accompanying figure and tables: Table 1, plus Santos et al. (1995), Underhill et al. (1997, 2001), Karafet et al. (1997, 1999, 2008), Carvalho-Silva et al. (1999), Vallinoto et al. (1999), Bortolini et al. (2002), The Y-Chromosome Consortium (2002) and Geppert et al. (2011); Bortolini et al. (2003); Demarchi and Mitchell (2004).]
Table 3 shows the results of the molecular analysis of variance for populations structured by language or geography based on the data in Table 2. The estimates were calculated for each STR locus because testing heterogeneity prevented haplotype identification. As expected, most of the diversity was attributable to intrapopulation variation, with one exception (DYS437) that was explained by the fixation of allele 14 in 40% of the populations, whereas only allele 8 was found in the Wichí. In contrast, significant variation among subdivisions was detected for only six loci (DYS389I, DYS391, DYS392, DYS393, DYS437 and DYS456), and in five of these six it was attributable to geography. There was also considerable inter-population/within-subdivision variability (significant in 28 of 30 evaluations), with the average percentage being 16% for geography and 21% for language.
Table 4 summarizes the information on sequencing studies of mitochondrial DNA. The mtDNA genome of representative individuals from 35 populations has been entirely sequenced, as reported in six publications (Ingman et al., 2000; Kivisild et al., 2006; Tamm et al., 2007; Fagundes et al., 2008; Perego et al., 2009, 2010). However, the analyses performed did not consider the relationships within South Amerindians and were mostly concerned with interethnic or interhaplogroup comparisons.
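Returning to the ANAQC reconstruction described above, the two steps involved (taking the most prevalent allele at each locus, then looking for one-step neighbors) can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually applied to Table 2 or to the reference database: the function names, the tie-breaking rule, the locus labels `L1` to `L3` and the toy haplotypes are all hypothetical, and a one-step neighbor is taken here to mean a haplotype differing from the reference by a single repeat unit at exactly one locus.

```python
from collections import Counter


def modal_haplotype(haplotypes):
    """Return the most frequent allele (repeat count) at each locus.

    `haplotypes` is a list of dicts mapping locus name -> repeat count.
    Ties are broken toward the smaller repeat count, an arbitrary choice
    made only for this sketch.
    """
    loci = sorted({locus for hap in haplotypes for locus in hap})
    modal = {}
    for locus in loci:
        counts = Counter(hap[locus] for hap in haplotypes if locus in hap)
        modal[locus] = min(counts, key=lambda allele: (-counts[allele], allele))
    return modal


def is_one_step_neighbor(candidate, reference):
    """True if `candidate` differs from `reference` by one repeat at one locus.

    The candidate must be typed for every locus in the reference.
    """
    if any(locus not in candidate for locus in reference):
        return False
    total_steps = sum(abs(candidate[locus] - reference[locus]) for locus in reference)
    return total_steps == 1


# Toy haplotypes with hypothetical loci, not data from Table 2.
toy = [
    {"L1": 10, "L2": 14, "L3": 13},
    {"L1": 10, "L2": 14, "L3": 13},
    {"L1": 11, "L2": 14, "L3": 13},
    {"L1": 10, "L2": 15, "L3": 12},
]
reference = modal_haplotype(toy)
neighbors = [hap for hap in toy if is_one_step_neighbor(hap, reference)]
print(reference, neighbors)
```

A haplotype built locus by locus in this way need not occur in any real individual, which is consistent with the absence of exact matches reported above.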
Based on 86 complete Amerindian genomes, Fagundes et al. (2008) concluded that the prehistoric colonization of the Americas involved a single founding population, with an initial differentiation from Asia occurring in Beringia that ended around 19,000-23,000 years ago, with a moderate bottleneck. Expansion into the New World would have occurred about 18,000 years ago. An extensive 5.76 kb analysis by Dornelles et al. (2005) established that haplogroup X is not present in extant South American Indians.

Genetics of South Amerindians
The most extensive set of data involves the highly variable segment 1 (HVS-I), which has been studied in 92 populations and reported in 30 papers; surveys that have included the HVS-II region are much less common (10 articles) (Table 4). For HVS-I, Merriwether et al. (2000) provided an excellent example of how intrapopulation variability in the Yanomámi could be interpreted in a historical and demographic context and related to other Amerindian and Asian data. They studied 129 Yanomámi sequences from individuals in eight villages and compared their haplotypes with those of Asian and other New World populations, for a total of 482 unique haplotypes. Interestingly, the pairwise inter-population gene flow estimates were lower between some pairs of Yanomámi villages than between them and four other South Amerindian groups.
With regard to intrapopulation variability, as measured by θk, Fuselli et al. (2003) and Corella et al. (2007) reported extensive variation for 14 and 27 Central and [...]
To explore the mtDNA data further we compiled the prevalences of haplogroups A-D for 109 populations, for a total of 6,697 individuals distributed between latitude 11° North and 54° South, and longitude 46° to 78° West (Table 5). Sample sizes varied widely, from only one subject tested (Jebero) up to 491 (Yanomámi). The haplogroup frequencies reported in 52 articles also varied widely. The presence of mtDNA genomes of probable non-Amerindian origin was rare in all regions and populations, in contrast to the Y-SNP data (Table 1). Asymmetrical sex-mediated admixture was common during the first centuries of South American colonization, and involved mostly European men and Amerindian/African women. The main consequence of this historical contact was the formation of mestizos and the present-day national societies; the former are characterized by a composite genome, with the majority of Y-chromosomes being of European origin, while their mtDNA derives from Amerindian or African sources (Bortolini et al., 1999; Alves-Silva et al., 2000; Carvalho-Silva et al., 2001; Salzano and Bortolini, 2002). Asymmetrical mating could also explain the introduction of non-Amerindian Y-chromosomes into the tribes, while the autochthonous mtDNA genomes were preserved. However, the admixture dynamics are probably different from those observed in urban groups, since they normally involve Amerindian women who live on reservations and men who live near the border of the reservations. In this situation, the children normally remain with their mothers. This phenomenon has been described for Guarani Indians (Marrero et al., 2007), but the data presented here indicate that it could be much more common than previously thought.
[Table 5: mtDNA haplogroup frequencies, language, location and sources for each population. Surviving entries, with the number in parentheses as printed: Siona (12); Tucano (17); Mapuche (314), 5, 23, 32, 36, 4, Araucanian, 39°10'-41°20' S, 68°37'-70°22' W; Huilliche (207); Kawéskar (Alacaluf).]
Table 6 summarizes the influence of geography. In the seven regions that were defined, 74% of the variation occurred within populations, 6% among geographic divisions and 20% among populations within divisions. To analyze this variability further, the frequencies of haplogroups A to D were plotted separately, as shown in Figure 2. High frequencies of haplogroup C were observed in specific regions along the northwestern portion of the continent, with additional high-frequency spots in southern Brazil and northern Argentina. The prevalences of haplogroups B and D showed a clear east-west separation, while for haplogroup A there were three main high-prevalence nuclei in the north, center and south of the continent. Spearman's correlation coefficient between haplogroup frequencies and latitude yielded a positive value (0.27; p < 0.01) for haplogroup A, with a corresponding negative one (-0.25; p < 0.01) for haplogroup D. The coefficients for haplogroups B and C were not significant.
Table 7 summarizes the influence of language. Sixteen main language groups were considered, plus a composite set of "others". The AMOVA results indicated that 73% of the haplogroup prevalence variability occurred within populations, with 7% of it being attributable to languages. However, there was considerable heterogeneity (20%) within the language categories established. Overall, the variability was similar to that obtained for geography.
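The rank correlation between haplogroup frequency and latitude reported above can be illustrated with a small pure-Python sketch. The coefficients in the text were obtained with PASW Statistics; the code below only shows what Spearman's coefficient computes (the Pearson correlation of rank-transformed values, with average ranks for ties). The function names, the convention that northern latitudes are positive, and the example numbers are assumptions, not values from Table 5.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's coefficient: Pearson correlation of the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical populations: latitude in degrees (north positive, an assumed
# convention) and the frequency of one haplogroup in each population.
latitudes = [10, 0, -10, -20, -30]
frequencies = [0.5, 0.4, 0.3, 0.2, 0.1]
rho = spearman_rho(latitudes, frequencies)
print(rho)
```

Because only ranks enter the calculation, the coefficient captures monotonic trends across the continent without assuming that frequency changes linearly with latitude. The sign of the result depends on the latitude convention chosen.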

Conclusion
South Amerindians have been extensively studied with regard to Y-chromosome markers and, especially, mtDNA markers. In agreement with studies from other regions, the vast majority of the mtDNA variability (73%-74%) is intrapopulational. Geographical and linguistic factors influenced the patterns of mtDNA diversity to a similar extent, while geography was apparently more important than language in explaining the data for the Y-chromosome Q-clade STRs. Additional factors that may have influenced these results include distinct male and female migration patterns, as well as cultural and other characteristics. The fact that most studies have dealt with small populations, in which genetic drift may be important, could also have influenced the results.