Genetic differences between Chibcha and Non-Chibcha speaking tribes based on mitochondrial DNA (mtDNA) haplogroups from 21 Amerindian tribes from Colombia

We analyzed the frequency of four mitochondrial DNA haplogroups in 424 individuals from 21 Colombian Amerindian tribes. Our results showed a high degree of mtDNA diversity and genetic heterogeneity. Haplogroups A and C were frequent in most of the populations studied. The distribution of the four haplogroups differed between Amerindian populations in the north and in the south of the country: haplogroup A was more frequent among tribes in northern Colombia, while haplogroup D was more frequent among tribes in the south. Haplogroups A, C and D show clinal trends in Colombia and in South America in general. Populations belonging to the Chibcha linguistic family of Colombia and nearby countries showed strong genetic differentiation from the other populations tested, corroborating previous findings. Genetically, the Ingano, Paez and Guambiano populations are more closely related to other groups of southeastern Colombia, as also inferred from other genetic markers and from archeological data. Strong evidence for a correspondence between geographical and linguistic classification was found, consistent with the view that close proximity facilitates gene flow and the exchange of customs, knowledge and language elements between groups.


Introduction
Studies of genetic variation among human populations are of great value for understanding genetic structure, migration routes and possible genetic relationships among continental populations, and mitochondrial DNA (mtDNA) analysis has frequently been put to such use in American populations (Schurr et al., 1990; Torroni et al., 1992, 1993a,b, 1994; Horai et al., 1993; Bailliet et al., 1994; Merriwether et al., 1994; Santos et al., 1994a,b; Bianchi et al., 1995; Lorenz and Smith, 1996; Merriwether and Ferrell, 1996; Bonatto and Salzano, 1997; Bisso-Machado et al., 2012). Although it traces only the maternal lineage (Giles et al., 1980), the mitochondrial genome is extremely useful for reconstructing genetic histories because of its rapid rate of mutation (Brown et al., 1979) and its lack of recombination and repair mechanisms. Most mtDNA polymorphisms are single nucleotide substitutions, but insertions and deletions have also been described (Brown et al., 1980; Cann and Wilson, 1983; Cann et al., 1984; Wallace et al., 1985; Horai et al., 1993; Torroni et al., 1992, 1993a,b, 1994; Howell and Smejkal, 2000). By revealing the geographic distribution of mitochondrial haplogroups, such studies have helped to clarify the migration patterns of human populations throughout history and across all continents (Fernandez-Dominguez, 2005). some primarily North American populations (Eshleman et al., 2003), but is absent in South America (Dornelles et al., 2005).
Colombia has great cultural and genetic diversity. Its indigenous population comprises 89 ethnic groups, estimated to represent 1.83% of the total population (Arango and Sánchez, 2006). Under the theory that the Americas were peopled by migration from northeast Asia across the Bering Strait, followed by migration through Central America into South America (Turner, 1984; Greenberg et al., 1986; Dillehay and Meltzer, 1991), present-day Colombia, at the northern tip of South America, was an obligatory passage for people migrating to the Southern Cone.
In Colombia, several mtDNA studies of indigenous communities have been carried out (Mesa et al., 2000; Keyeux et al., 2002; Rodas et al., 2002; Torres et al., 2006; Melton et al., 2007; Rondon et al., 2007). In this study, we analyzed 424 individuals from 21 Amerindian populations to determine their genetic structure and the relationships among them in light of geographical and historical information, as well as their linguistic and genetic relationships with other tribes of the Americas.

Samples
We analyzed 424 blood samples from individuals unrelated through the maternal line, belonging to 21 Amerindian tribes of Colombia (Table 1). Blood samples were collected between 1989 and 1992 after informed consent had been obtained; informed consent included the approval of each tribal Chief or Governor. The linguistic affiliation of each tribe is shown in Table 1. No Ge-Pano-Carib-speaking tribes were included in this study.

DNA extraction and mtDNA haplogroup analysis
DNA was extracted using the salting-out method (Gustincich et al., 1991) with the DNA Wizard Genomic DNA Extraction Kit (Promega Corporation, Madison WI), following the manufacturer's recommendations.
Four regions of the human mtDNA defining haplogroups A, B, C and D were amplified by PCR with primers described elsewhere (Parra et al., 1998). Each 25 µL amplification reaction contained 2.5 µL of DNA, 1.25 µL of each primer set (10 nmol/mL), 2.0 µL of dNTPs (10 mM) and 0.125 µL of Taq DNA polymerase (Promega Corporation, Madison WI), together with 1.5 µL of MgCl2 (25 mM) for haplogroup A or 2.0 µL of MgCl2 (25 mM) for the other haplogroups.
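As a quick check on the recipe, each component's working concentration follows from C_final = C_stock × V_added / V_total. The Python sketch below applies this to the volumes listed above, assuming they are in microlitres in a 25 µL reaction and that 10 nmol/mL equals 10 µM; the component labels are illustrative, and buffer and water (not listed in the text) would make up the remaining volume.

```python
# Working concentration after dilution: C_final = C_stock * V_added / V_total.
# Assumption: all volumes are in microlitres (uL) and the reaction volume is 25 uL.

REACTION_VOLUME_UL = 25.0


def final_concentration(stock: float, added_ul: float,
                        total_ul: float = REACTION_VOLUME_UL) -> float:
    """Return the final concentration, in the same unit as `stock`."""
    if total_ul <= 0 or not 0 <= added_ul <= total_ul:
        raise ValueError("expected 0 <= added_ul <= total_ul and total_ul > 0")
    return stock * added_ul / total_ul


# label: (stock concentration, unit, volume added in uL); labels are illustrative
COMPONENTS = {
    "primer set (10 nmol/mL = 10 uM)": (10.0, "uM", 1.25),
    "dNTPs": (10.0, "mM", 2.0),
    "MgCl2, haplogroup A": (25.0, "mM", 1.5),
    "MgCl2, haplogroups B-D": (25.0, "mM", 2.0),
}

if __name__ == "__main__":
    for label, (stock, unit, volume) in COMPONENTS.items():
        print(f"{label}: {final_concentration(stock, volume):.2f} {unit}")
```

Under these assumptions the reaction contains 0.50 µM of each primer set, 0.80 mM dNTPs, and 1.5 mM (haplogroup A) or 2.0 mM (haplogroups B to D) MgCl2.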
Amplification consisted of an initial denaturation at 94°C for 5 min, followed by 34 cycles of denaturation at 94°C for 30 s, annealing at 50°C for 30 s (haplogroups B and D) or at 55°C for 30 s (haplogroups A and C) and extension at 72°C for 30 s, with a final extension at 72°C for 5 min. The amplification products were checked by electrophoresis in a 2% Nusieve/Seakem agarose gel stained with ethidium bromide and photographed under UV light. Aliquots of 15 µL of the amplified products for haplogroups A, C and D were digested with restriction enzymes for 3 h at 37°C, whereas haplogroup B was analyzed by electrophoresis alone. The digestion products were separated by electrophoresis in a 3% Nusieve/Seakem gel and processed as described above.
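The scoring step can be sketched as a simple decision rule. The marker sites are not listed in this section (the primers are described by Parra et al., 1998), so the sketch assumes the markers conventionally used for the four founding Native American haplogroups: a HaeIII site gain at np 663 (A), the 9-bp COII/tRNA-Lys intergenic deletion (B), an AluI site gain at np 13262 (C) and an AluI site loss at np 5176 (D). The `MarkerCalls` class and `assign_haplogroup` function are hypothetical names; samples with none of the four markers are labelled "E", as in this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MarkerCalls:
    """Per-sample marker calls (hypothetical structure; assumed standard markers)."""
    haeiii_663_gain: bool   # haplogroup A: HaeIII site gain at np 663
    deletion_9bp: bool      # haplogroup B: 9-bp COII/tRNA-Lys intergenic deletion
    alui_13262_gain: bool   # haplogroup C: AluI site gain at np 13262
    alui_5176_loss: bool    # haplogroup D: AluI site loss at np 5176


def assign_haplogroup(calls: MarkerCalls) -> str:
    """Map marker calls to A-D; return 'E' if none is present."""
    markers = [
        ("A", calls.haeiii_663_gain),
        ("B", calls.deletion_9bp),
        ("C", calls.alui_13262_gain),
        ("D", calls.alui_5176_loss),
    ]
    hits = [haplogroup for haplogroup, present in markers if present]
    if not hits:
        return "E"
    if len(hits) > 1:
        # Co-occurring founder markers are unexpected; flag for re-typing rather than guess.
        return "ambiguous:" + "+".join(hits)
    return hits[0]


if __name__ == "__main__":
    print(assign_haplogroup(MarkerCalls(True, False, False, False)))
    print(assign_haplogroup(MarkerCalls(False, False, False, False)))
```

Returning a flag for conflicting calls, instead of picking one haplogroup, keeps genotyping errors from being silently counted as real lineages.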
In addition, we calculated the degree of genetic differentiation among subpopulations (G_ST) relative to the genetic diversity of the total population. An analysis of molecular variance (AMOVA) was carried out in Arlequin (Excoffier et al., 2005), using linguistic classification or geographical location as grouping criteria. In the first analysis, we grouped tribes by linguistic classification and tested whether differences could be attributed to membership of the Chibcha linguistic family. In the second analysis, we grouped tribes by geographic location (tribes in the north; tribes in the east, i.e., the Orinoquian/Amazonian region; and tribes in the west, i.e., the Pacific region). We also tested whether the Andes mountain range was a factor in genetic differentiation (Table 2).
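Gene (haplotype) diversity and G_ST can both be computed from haplogroup counts. The sketch below uses Nei's unbiased diversity, h = n/(n − 1)(1 − Σ p_i²), and Nei's G_ST = (H_T − H_S)/H_T, where H_S is the mean within-population diversity and H_T the diversity of the haplogroup frequencies averaged over populations. It is a minimal illustration under those assumptions; Arlequin's estimators may use different weighting or sample-size corrections, and the counts in the usage lines are hypothetical.

```python
from typing import Mapping, Sequence


def gene_diversity(counts: Mapping[str, int]) -> float:
    """Nei's unbiased gene diversity: h = n/(n-1) * (1 - sum(p_i^2))."""
    n = sum(counts.values())
    if n < 2:
        raise ValueError("need at least two individuals")
    sum_p2 = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_p2)


def nei_gst(populations: Sequence[Mapping[str, int]]) -> float:
    """Nei's G_ST = (H_T - H_S) / H_T, weighting each population equally."""
    if not populations:
        raise ValueError("need at least one population")
    freqs = []
    for counts in populations:
        n = sum(counts.values())
        freqs.append({hg: c / n for hg, c in counts.items()})
    # H_S: mean within-population diversity (no sample-size correction)
    h_s = sum(1.0 - sum(p * p for p in f.values()) for f in freqs) / len(freqs)
    # H_T: diversity of the haplogroup frequencies averaged over populations
    labels = set().union(*freqs)
    mean_p = [sum(f.get(hg, 0.0) for f in freqs) / len(freqs) for hg in labels]
    h_t = 1.0 - sum(p * p for p in mean_p)
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


if __name__ == "__main__":
    # Hypothetical counts: 20 of 21 individuals share one haplogroup.
    print(round(gene_diversity({"A": 20, "C": 1}), 4))
    print(round(nei_gst([{"A": 8, "C": 2}, {"A": 3, "C": 5, "D": 2}]), 3))
```

For example, a hypothetical sample of 21 individuals in which 20 share one haplogroup gives h ≈ 0.0952, the same value the Results report for the Chimila.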
Finally, we compared matrices of genetic distance (F_ST values), geographical distance (in km, computed with the AMIGLOBE program; Collard, 2006) and linguistic distance based on Ruhlen's classification (Ruhlen, 1987) to assess possible relationships among these three variables. This was done with the aid of Arlequin, V3.1 software (Excoffier et
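The text does not state how AMIGLOBE computes distances. One common way to build a geographical distance matrix from tribe coordinates is the haversine great-circle formula on a spherical Earth; the sketch below assumes that approach, a mean radius of 6371 km and coordinates in decimal degrees. The function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

EARTH_RADIUS_KM = 6371.0  # mean radius; assumes a spherical Earth


def great_circle_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # min() guards against rounding pushing the argument slightly above 1
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def distance_matrix(coords: Sequence[Tuple[float, float]]) -> List[List[float]]:
    """Symmetric matrix of pairwise distances for (lat, lon) pairs."""
    return [[great_circle_km(*p, *q) for q in coords] for p in coords]
```

Each tribe's location would be supplied as a (latitude, longitude) pair; the resulting matrix can then be compared with genetic and linguistic distance matrices.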

Results
Mitochondrial DNA haplogroup frequencies for the 424 individuals from 21 Amerindian tribes of Colombia are shown in Table 1 and Figure 1. Haplogroup A was the most frequent, with an average frequency of 31% (131/424 individuals), followed by haplogroup C with 30.4% (129/424), haplogroup B with 22.4% (95/424) and haplogroup D with 13.4% (57/424). The 12 individuals (2.8%) who did not carry any of the four founder haplogroups were classified as haplogroup E. At least two of the four mitochondrial haplogroups were present in each of the 21 populations studied, and haplogroup frequencies within populations ranged from 2.1% to 95.2%.
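The overall frequencies above follow directly from the counts. The short sketch below recomputes them and confirms that the five categories account for all 424 individuals; note that 131/424 is 30.9%, which the text rounds to 31%.

```python
# Overall haplogroup counts reported in the text ("E" = none of A-D).
COUNTS = {"A": 131, "C": 129, "B": 95, "D": 57, "E": 12}


def percentages(counts: dict) -> dict:
    """Percentage of the total for each category."""
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}


if __name__ == "__main__":
    print(sum(COUNTS.values()))  # 424
    for label, pct in percentages(COUNTS).items():
        print(f"{label}: {pct:.1f}%")
```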
Genetic diversity values are shown in Table 1. The lowest genetic diversity was found in the Chimila tribe (h = 0.0952) and the highest in the Piapoco (h = 0.8929). The average diversity for all populations studied was h = 0.7447 (n = 424).
Figure 2 shows the distribution of mtDNA haplogroup frequencies in four geographical groups: Caribbean (northern region), Amazonian (southern region), Pacific (western region) and Orinoquian (eastern region). Data from other studies (Torroni et al., 1994; Kolman and Bermingham, 1997; Merriwether et al., 1997; Mesa et al., 2000; Keyeux et al., 2002; Briceño et al., 2003; Torres et al., 2006; Melton et al., 2007; Barreto et al., 2008) were also included in this analysis. The distribution of mtDNA haplogroups among Amerindian tribes of Colombia showed a marked clinal pattern. Haplogroup A frequency was higher in the northern region of Colombia (50%) and decreased to 20% in the southern region, whereas haplogroup C frequency was lowest in the north and highest in the south. Haplogroup D followed a similar pattern: it was almost absent in northern Colombia and reached its highest value (25%) in the south. Haplogroup B was more frequent in the west and declined towards the east and south.
We constructed a UPGMA tree based on F_ST genetic distances that includes other Amerindian populations from Central and South America (Figure 3). One cluster included the Kogui, Arhuaco and Chimila tribes of Colombia and the Teribe, Guaymi and Guataso Chibcha-speaking tribes of Central America, all of which are characterized by high frequencies of haplogroup A. The Arsario were an exception: none of the individuals tested from this Chibcha-speaking tribe carried haplogroup A. The remaining Colombian tribes clustered together with other South American Amerindian tribes that do not belong to the Chibcha linguistic family.
The Guambiano, Paez and Ingano tribes were grouped within this cluster, reflecting their relationship to these non-Chibcha Amerindian populations.
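UPGMA (average-linkage) clustering repeatedly merges the two closest clusters, places the new node at half their distance, and sets the distance from the merged cluster to every other cluster as the size-weighted mean of the two original distances. The sketch below implements that rule for a symmetric distance matrix such as pairwise F_ST and returns a Newick string. The software actually used to draw Figure 3 is not stated here, and the population names and distances in the usage line are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def upgma(labels: Sequence[str], dist: Sequence[Sequence[float]]) -> str:
    """Build a UPGMA tree from a symmetric distance matrix; return Newick text."""
    if not labels:
        raise ValueError("need at least one taxon")
    # cluster id -> (Newick subtree, number of taxa, node height)
    clusters: Dict[int, Tuple[str, int, float]] = {
        i: (name, 1, 0.0) for i, name in enumerate(labels)
    }
    # distances keyed by (smaller id, larger id)
    d: Dict[Tuple[int, int], float] = {
        (i, j): float(dist[i][j])
        for i in range(len(labels)) for j in range(i + 1, len(labels))
    }
    next_id = len(labels)
    while len(clusters) > 1:
        (a, b), d_ab = min(d.items(), key=lambda item: item[1])
        tree_a, size_a, h_a = clusters.pop(a)
        tree_b, size_b, h_b = clusters.pop(b)
        height = d_ab / 2.0  # ultrametric tree: the node sits halfway
        merged = f"({tree_a}:{height - h_a:.4f},{tree_b}:{height - h_b:.4f})"
        del d[(a, b)]
        for k in clusters:
            # size-weighted average of the distances to the two merged clusters
            d_ak = d.pop((min(a, k), max(a, k)))
            d_bk = d.pop((min(b, k), max(b, k)))
            d[(k, next_id)] = (size_a * d_ak + size_b * d_bk) / (size_a + size_b)
        clusters[next_id] = (merged, size_a + size_b, height)
        next_id += 1
    tree = next(iter(clusters.values()))[0]
    return tree + ";"


if __name__ == "__main__":
    print(upgma(["Pop1", "Pop2", "Pop3"],
                [[0.0, 0.1, 0.4], [0.1, 0.0, 0.4], [0.4, 0.4, 0.0]]))
```

Because UPGMA assumes a constant rate of divergence, branch lengths from F_ST distances are best read as relative similarity rather than as time.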
We performed a non-metric multidimensional scaling analysis based on the mtDNA haplogroups identified (Figure 4). Here we included the results for other Amerindian populations (Table S1), for populations of African descent from Colombia (Nuqui, Guangui and Providencia) described by Rodas et al. (2002), and for African populations used as an outgroup (Chen et al., 2000). Most of the Amerindian tribes cluster together, owing to the heterogeneous presence of the four mtDNA haplogroups among them. However, the Chibcha-speaking tribes cluster more tightly because of their high frequency of haplogroup A and low frequencies of haplogroups C and D. The African-descent populations from Colombia lie between the Amerindian populations and the African outgroup, reflecting admixture that introduced some of the four Amerindian mtDNA haplogroups into these populations.
The AMOVA based on linguistic affiliation tested for differences between tribes that do and do not belong to the Chibcha linguistic family (Table 2). The Guambiano and Paez tribes were excluded because their languages have not yet been classified. Of the total variation, 69% was found within populations and 21% was attributable to whether or not a tribe belonged to the Chibcha linguistic family (p < 0.001). A second AMOVA, based on geographical location, detected no significant differences when tribes were grouped according to the side of the Andes on which they are located (p = 0.150). Significant differences were found, however, between tribes residing in northern Colombia (most of them Chibcha-speaking) and those of the Pacific and Orinoquian/Amazonian regions (p = 0.013) (Table 2).
Finally, Mantel tests were used to evaluate the relationships among genetic, linguistic and geographical distances. There was a strong correlation between linguistic and geographical distances and a weaker correlation between genetic and geographical distances, but no correlation between genetic and linguistic distances (Table 3).
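The percentages above come from a hierarchical AMOVA that partitions variation among groups, among populations within groups and within populations. The sketch below implements the standard two-level variance-component estimates for haplogroup data, assuming a squared distance of 1 between different haplogroups and 0 otherwise (Arlequin also accepts other distance definitions). It omits the permutation tests Arlequin uses for p-values, and the `Design` layout and function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Mapping

# group -> population -> haplogroup -> count
Design = Mapping[str, Mapping[str, Mapping[str, int]]]


def _ssd(counts: Mapping[str, int]) -> float:
    """(1/2m) * sum over ordered pairs of delta^2, with delta^2 = 1 for different haplogroups."""
    m = sum(counts.values())
    return (m * m - sum(c * c for c in counts.values())) / (2 * m) if m else 0.0


def amova_two_level(design: Design) -> Dict[str, float]:
    """Percent of variance among groups, among populations within groups, within populations."""
    sizes = {(g, p): sum(c.values()) for g, pops in design.items() for p, c in pops.items()}
    G, P, N = len(design), len(sizes), sum(sizes.values())
    if G < 2 or P <= G or N <= P:
        raise ValueError("need >= 2 groups, more populations than groups, and N > P")
    group_size = {g: sum(sum(c.values()) for c in pops.values()) for g, pops in design.items()}

    total_counts: Counter = Counter()
    ssd_groups = 0.0  # sum over groups of the SSD within each pooled group
    for pops in design.values():
        pooled: Counter = Counter()
        for c in pops.values():
            pooled.update(c)
        total_counts.update(pooled)
        ssd_groups += _ssd(pooled)
    ssd_within = sum(_ssd(c) for pops in design.values() for c in pops.values())
    ssd_total = _ssd(total_counts)

    # mean squared deviations for the three levels
    msd_ag = (ssd_total - ssd_groups) / (G - 1)
    msd_ap = (ssd_groups - ssd_within) / (P - G)
    msd_wp = ssd_within / (N - P)

    # coefficients of the expected mean squares
    s = sum(n_p * n_p / group_size[g] for (g, _), n_p in sizes.items())
    n = (N - s) / (P - G)
    n1 = (s - sum(v * v for v in sizes.values()) / N) / (G - 1)
    n2 = (N - sum(v * v for v in group_size.values()) / N) / (G - 1)

    var_c = msd_wp                                   # within populations
    var_b = (msd_ap - var_c) / n                     # among populations within groups
    var_a = (msd_ag - var_c - n1 * var_b) / n2       # among groups
    total = var_a + var_b + var_c
    if total == 0:
        raise ValueError("no variation in the data")
    return {
        "among groups": 100 * var_a / total,
        "among populations within groups": 100 * var_b / total,
        "within populations": 100 * var_c / total,
    }
```

Variance components estimated this way can come out slightly negative when there is no real structure at a level; such values are usually read as zero.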
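A Mantel test correlates the off-diagonal entries of two distance matrices and assesses significance by permuting the rows and columns of one matrix together, since pairwise entries are not independent. The sketch below shows the standard permutation version with Pearson's r and a one-sided p-value for a positive association; Arlequin's implementation details and the number of permutations used for Table 3 are not given here, so the defaults and function names are assumptions.

```python
import random
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def _upper(m: Matrix, order: Sequence[int]) -> List[float]:
    """Upper-triangle entries of m after reordering rows and columns by `order`."""
    n = len(order)
    return [m[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def _pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mantel(a: Matrix, b: Matrix, permutations: int = 999, seed: int = 0) -> Tuple[float, float]:
    """Return (r, one-sided p) for a positive association between matrices a and b."""
    n = len(a)
    if n < 3 or len(b) != n:
        raise ValueError("need two square matrices of the same size, n >= 3")
    identity = list(range(n))
    x = _upper(a, identity)
    r_obs = _pearson(x, _upper(b, identity))
    rng = random.Random(seed)
    order = identity[:]
    at_least_as_large = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        rng.shuffle(order)  # permute rows and columns of b jointly
        if _pearson(x, _upper(b, order)) >= r_obs - 1e-12:
            at_least_as_large += 1
    return r_obs, at_least_as_large / (permutations + 1)
```

With only a handful of tribes the number of distinct permutations is small, so the smallest attainable p-value is limited by the matrix size rather than by the `permutations` setting.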

Discussion
This study adds to previous work on the distribution of mtDNA haplogroups in Colombian Amerindian populations. Haplogroup A was the most frequent, with an average frequency of 31% (131/424 individuals), followed by haplogroup C with 30.4% (129/424), haplogroup B with 22.4% (95/424) and haplogroup D with 13.4% (57/424).
Previous studies of Colombian Amerindian populations have shown high frequencies of haplogroups A and C and lower frequencies of haplogroup D (Torres et al., 2006; Melton et al., 2007; Rondon et al., 2007), and our results agree with those reports. However, the average haplogroup D frequency we found (13.4%) was higher than those previously published for Colombian Amerindian populations: 6.6% by Keyeux et al. (2002) and 9.95% by Torres et al. (2006). These differences may reflect the different populations chosen in the three studies, or even differences between samples of the same population. For example, Keyeux et al. (2002) found no haplogroup D in the Paez tribe, whereas we found it in 33% of the Paez individuals. Similar discrepancies occurred for the other haplogroups: in our study none of the Arsario carried haplogroup A (although only 8 individuals were tested), whereas Melton et al. (2007) reported haplogroup A in 68% of the Arsario individuals they tested. These results point to even greater genetic heterogeneity within populations than has been described before.
Only 12 of the 424 individuals (2.8%) carried none of the four founder mtDNA haplogroups. These individuals may carry unrecognized founder lineages (Bailliet et al., 1994), reflect recent admixture (Torroni et al., 1993a) or carry a reversal of a haplogroup-defining mutation. The second possibility could apply to the Wayuu, Arsario and Paez tribes, in which admixture has been documented with blood groups and HLA class II genes (Yunis et al., 1994, 2001). The third possibility, termed the haplogroup C revertant, is common in populations of the Colombian Orinoquian and Amazonian basins (Torres et al., 2006). This may be the case for the Piapoco tribe in our study, which showed a 25% frequency of non-A-D haplogroups; Torres et al. (2006) previously found a high frequency (59%) of the revertant C haplogroup in this tribe. The same scenario is possible for the Piartapuyo (12.5%), Tuyuca (16%) and Guanana (10%) tribes, which live close together in the northern Amazonian region of Colombia and show low genetic admixture based on Y-STR haplotypes (Campo D and JJY, unpublished data) and HLA class II genes (unpublished data).
The Amerindian tribes considered in this study showed a high degree of genetic heterogeneity (Table 2) and diversity (similar to or greater than that of populations throughout South America), as has been described before (Santos et al., 1994a; Batista et al., 1995; Kolman et al., 1995; Ward et al., 1996; Bonatto and Salzano, 1997; Mesa et al., 2000; Keyeux et al., 2002).
Genetic diversity values were higher among the Tucano-Equatorial-speaking tribes (0.60 to 0.80), whereas the Chibchan-speaking groups showed lower values (0.09 to 0.50). These results are consistent with those reported for Chibcha-speaking tribes from Central and South America, including Colombia (Torroni et al., 1994; Kolman et al., 1995; Keyeux et al., 2002). The higher diversity of Amazonian populations may result from gene flow between these populations, as has been shown for other genetic markers such as the Y chromosome (Mesa et al., 2000). Alternatively, it could result from fission, fragmentation and founder effects (Cavalli-Sforza et al., 1992).
The population with the lowest genetic diversity (and the highest frequency of haplogroup A) was the Chimila (h = 0.0952). The low diversity of this population has been reported by others using different genetic markers and is probably due to inbreeding (unpublished data).
The high genetic diversity found in our study and others indicates that bottleneck events are unlikely to have taken place during the early Amerindian settlement of South America. However, Amerindian populations of northern Colombia that belong to the Chibcha linguistic family clearly differ from non-Chibcha-speaking tribes, as has been described before with nuclear genetic markers (Yunis et al., 1994, 2001). Previous studies have shown that Amerindian populations of northern Colombia are close to Central American tribes and to North American Amerindian populations (Stone and Stoneking, 1993; Lorenz and Smith, 1996; O'Rourke et al., 2000; Melton et al., 2007). Our results further support the view that Chibcha-speaking tribes of Central and South America differentiated genetically from non-Chibcha-speaking tribes before entering South America.
There were marked clinal patterns for mtDNA haplogroup distribution among Amerindian tribes of Colombia. When populations were grouped according to their geographical location (northern-Caribbean; southern-Amazonian, western-Pacific and eastern-Orinoquian) (see Figure 2), haplogroup A frequency was high in the northern part of Colombia (50% frequency) but decreased to 20% in the southern part of the country. Haplogroup C frequency was lower in the north and had its highest value in the south. Similarly, haplogroup D was almost absent in the north, but had the highest value in the southern part of the country (25%). Haplogroup B was more frequent in the west and had decreasing frequencies towards the east and south. These clinal patterns are similar to those described earlier (Torroni et al., 1994;Lalueza-Fox, 1996;Lorenz and Smith, 1996;Lalueza et al., 1997;Keyeux et al., 2002;Bisso-Machado et al., 2012).
A UPGMA tree constructed from the tribes analyzed in this study, plus several Amerindian populations from Central and South America described elsewhere, showed a cluster of Chibcha-speaking tribes (Chimila, Arhuaco, Kogui, Teribe and Guaymi) that are genetically distant from the other Amerindian tribes analyzed. The second cluster comprises the remaining tribes, including the Guambiano and Paez. The results for these two tribes, whose languages are currently unclassified, further support a genetic relationship to the Tucano-Equatorial or Andean linguistic families rather than to the Chibcha linguistic family, in which they had previously been classified. Similar results have been obtained with HLA genes (Yunis et al., 2001). Some authors have postulated that the Paez originated in the Amazonian region and migrated northeast to their present location before the Spanish conquest (Arboleda, 1993). Recently, archeological findings in the western Amazonian region of Colombia have provided further support for an Amazonian ancestral origin of the Guambiano and Paez tribes.
The AMOVA showed that a significant proportion of the variation (21%; p < 0.001) was attributable to whether or not a tribe belonged to the Chibcha linguistic family (Table 2). In contrast, our results do not support the hypothesis that the Andes mountain range acted as a factor of differentiation for the Amerindian tribes studied.
The strong genetic differentiation between Chibcha- and non-Chibcha-speaking tribes is likely due to the high frequency of haplogroup A among the Chibcha-speaking populations. Similar results were obtained in the past with the major histocompatibility complex and other genetic markers (Yunis et al., 1994, 2001). The correlation analysis between geographical, linguistic and genetic data (Table 3) showed the highest correlation for the linguistic-geographical pair, followed by the genetic-geographical pair. These results can be explained by the fact that many populations belonging to the same linguistic family are also geographically close, which makes it difficult to infer a linguistic-genetic relationship from mtDNA haplogroups alone. Closely related Amerindian tribes are also geographically close, which facilitates gene flow and the exchange of customs, knowledge and languages. Both geographic and linguistic factors are associated with genetic differentiation in the Amerindian populations analyzed in Colombia. As has been found for other Amerindian tribes, these three parameters have evolved together in a historical and strongly correlated fashion.