Effect of Race, Genetic Population Structure, and Genetic Models in Two-locus Association Studies: Clustering of Functional Renin-angiotensin System Gene Variants in Hypertension Association Studies

Previous genetic association studies have overlooked the potential for biased results when analyzing different population structures in ethnically diverse populations. The purpose of the present study was to quantify this bias in two-locus association studies conducted on an admixtured urban population. We studied the genetic structure distribution of angiotensin-converting enzyme insertion/deletion (ACE I/D) and angiotensinogen methionine/threonine (M/T) polymorphisms in 382 subjects from three subgroups in a highly admixtured urban population. Group I included 150 white subjects; group II, 142 mulatto subjects, and group III, 90 black subjects. We conducted sample size simulation studies using these data in different genetic models of gene action and interaction and used genetic distance calculation algorithms to help determine the population structure for the studied loci. Our results showed a statistically different population structure distribution of both ACE I/D (P = 0.02, OR = 1.56, 95% CI = 1.05-2.33 for the D allele, white versus black subgroup) and angiotensinogen M/T polymorphism (P = 0.007, OR = 1.71, 95% CI = 1.14-2.58 for the T allele, white versus black subgroup). The sample size needed to detect a given genotypic association with a particular phenotype is predicted to differ among subgroups when two-locus association studies are conducted in admixtured populations. In addition, the postulated genetic model is also a major determinant of the power to detect any association in a given sample size. The present simulation study helped to demonstrate the complex interrelation among ethnicity, power of the association, and the postulated genetic model of action of a particular allele in the context of clustering studies. This information is essential for the correct planning and interpretation of future association studies conducted on this population.


Introduction
Previous genetic association studies have overlooked the potential for biased results when analyzing different population structures in ethnically diverse populations. Particularly, studies with putative genetic risk factors for hypertension have been criticized by many investigators because their results have not been consistent (1).
Variants of the renin-angiotensin system (RAS) genes have the potential to reveal certain aspects of the genesis of some cardiovascular diseases. Two of these allelic variants have been undoubtedly associated with particular intermediate phenotypes that could help explain the RAS role in some of the pathophysiological derangements observed in cardiovascular homeostasis in a number of diseases (2,3).
A diallelic polymorphism in the angiotensin-converting enzyme (ACE) gene, characterized by a deletion (D) or insertion (I) in the 16th intron of the gene, has been unequivocally associated with differences in plasma ACE levels (4). However, its role in predicting hypertension and other cardiovascular diseases (e.g., myocardial infarction, cardiac hypertrophy) (4-10) remains unclear.
Additionally, a methionine→threonine variant at position 235 (M235T) of the angiotensinogen (ATG) gene has been considered a marker for primary hypertension and elevated plasma ATG concentrations (3). Other studies, however, have found no association between the M235T ATG variant and a number of cardiovascular phenotypes (11).
The hypothesis that a synergistic effect of both RAS genetic variants present in the same individual would increase cardiovascular risk has also been contemplated in a small number of studies and, again, no consensus has been obtained (12,13).
Finally, few studies have contemplated the relationship between the statistical power of association studies and sample size for different genetic models. Much criticism has been directed at the real value of not only positive, but also negative results (1). These facts have raised great concern about the real role of these variants in predicting the occurrence, or the outcomes, of cardiovascular disease in a particular population. Different ethnic allele frequencies and genetic structures among the studied populations, and the proposed genetic model of action of these genes in the genesis of a particular phenotype have been considered as biases in many of the studies to date (14,15).
No study to date has quantified this bias. In addition, the distribution of the RAS genetic polymorphisms in relation to population structure has been overlooked in previous investigations (1). To quantify this bias, we studied the distribution of RAS genetic polymorphisms and population genetic structure in three subgroups from a highly admixtured urban population. Sample size simulations were conducted using these data in different genetic models of gene action and interaction. These data can have important applications in the planning and interpretation of genetic association studies in our area and other similar urban areas worldwide.

Study population
The study population comprised 382 consecutive unrelated men and women admitted to the Blood Donation Center of the Fundação Pró-Sangue, São Paulo University Medical School. The population was divided into three groups according to ethnic morphological criteria. Group I included 150 white subjects; group II, 142 mulatto subjects, and group III, 90 black subjects. Mulatto, in this study, was defined as an individual with a known family history of admixture between black and white populations. The ethnic morphological subgroup classification was based on phenotype pigmentation of the abdomen, hair color, type and conformation of the nose and lips, and family history, as determined and agreed upon by two examining physicians (16). It should be noted that our subgroup stratification was based on morphological criteria and not on a true ethnic group stratification. Ethnic group stratification is defined by a group of genetic, linguistic and cultural characteristics. In particular, it has been shown that in the Brazilian population morphological criteria are not highly concordant with ancestry (17,18). Venous blood was obtained for genomic DNA extraction.

Assessment of the angiotensin-converting enzyme gene polymorphism genotype
Five-milliliter blood samples were drawn into tubes containing EDTA. The ACE I/D polymorphism was determined using a three-primer system (19), which minimizes the mistyping (20) that can occur with a two-primer system. The polymerase chain reaction (PCR) products were visualized by electrophoresis on 3% agarose gel with ethidium bromide and stored in digital form.

Assessment of the angiotensinogen gene polymorphism genotype
The M235T variant of the ATG gene was detected by a PCR method. Primers were selected according to Russ et al. (21). The PCR parameters were described by Kiema et al. (22). Three microliters of unpurified PCR product was diluted to 10 µl in recommended restriction buffer containing 5 U of Tth111I and digested at 65°C overnight. The PCR products were visualized by electrophoresis on 3% agarose gel with ethidium bromide and stored in digital form.

Statistical analysis
Data were analyzed by the chi-square test and ANOVA using the statistical program GraphPad Prism (version 2.0, GraphPad Software Inc., San Diego, CA, USA). The Hardy-Weinberg equilibrium for the distribution of genotypes was estimated using the chi-square test. Allele and genotype frequencies were compared using the chi-square test and the chi-square test for linear trends available in EpiInfo (version 6.0). Genetic structure analyses, including population pairwise FST values, sample heterozygosity and the population pairwise differentiation test, were conducted using the statistical program Arlequin (version 1.1, Genetics and Biometry Laboratory, University of Geneva, Switzerland) for population genetic data analysis (23). Sample size calculations were conducted using EpiInfo (version 6.0). Different mechanisms of action of a particular allele were modeled by grouping individuals with one or two particular genotypes according to the chosen model: in the dominant model, both homozygous and heterozygous individuals harboring the allele conferring increased risk were considered to be at increased risk of developing the studied phenotype; in the recessive model, only individuals homozygous for the allele conferring increased risk were considered to be at increased risk. Values of P<0.05 were considered to be statistically significant.
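The two procedures described above, the chi-square test for Hardy-Weinberg equilibrium and the grouping of genotypes under a dominant or recessive model, can be sketched as follows. This is a minimal illustration and not the EpiInfo or GraphPad implementation; the function names are hypothetical, and the genotype counts in the usage lines are invented for demonstration because the per-genotype counts of Table 2 are not reproduced in the text.

```python
from math import erfc, sqrt


def hardy_weinberg_chi2(n_risk_hom, n_het, n_other_hom):
    """Chi-square goodness-of-fit test (1 df) against Hardy-Weinberg proportions."""
    n = n_risk_hom + n_het + n_other_hom
    p = (2 * n_risk_hom + n_het) / (2 * n)  # risk-allele frequency from counts
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_risk_hom, n_het, n_other_hom)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


def at_risk_fraction(n_risk_hom, n_het, n_other_hom, model):
    """Fraction of subjects classified as 'at risk' under a genetic model."""
    n = n_risk_hom + n_het + n_other_hom
    if model == "dominant":   # carriers of at least one risk allele
        return (n_risk_hom + n_het) / n
    if model == "recessive":  # homozygotes for the risk allele only
        return n_risk_hom / n
    raise ValueError("model must be 'dominant' or 'recessive'")


# Hypothetical DD, DI, II counts for a group of 150 subjects (illustrative only).
chi2, p_value = hardy_weinberg_chi2(45, 73, 32)
print(f"chi2 = {chi2:.3f}, P = {p_value:.3f}")
print(at_risk_fraction(45, 73, 32, "dominant"),
      at_risk_fraction(45, 73, 32, "recessive"))
```

The same genotype counts yield very different "exposed" fractions depending on the model, which is why the postulated model feeds directly into the sample size calculations.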

Population demographics
Population demographics are summarized in Table 1. The mean age for each subgroup was 32.4 ± 0.8 years for the white population, 30.6 ± 0.7 for the mulatto population, and 30.3 ± 0.8 for the black population. ANOVA did not reveal any relationship between the parameters studied and age in any of the populations. While there was approximately the same number of men in each group, the black population was primarily male, with 81% (73/90) being men. Groups I and II had approximately equal numbers of men and women, with 50% (75/150) and 51% (73/142) males, respectively.

Allele, genotype and association frequencies
Table 2 displays the allele and genotype frequencies of the ACE I/D and the ATG M/T gene variants. The genotype frequencies were consistent with Hardy-Weinberg equilibrium.
There were no sex differences in allele or genotype frequencies in any of the three groups. Therefore, male and female data were pooled and analyzed together as the same group.
Allele frequencies for the ACE gene variants differed significantly between groups: 54.3% in the Caucasian population (group I), 65.0% in the black population (group III), and 57.0% in the mulatto population (group II) for the D allele (P = 0.02, odds ratio (OR) = 1.56, for white versus black groups). ATG gene variants also showed a statistically different distribution among ethnic groups: 57.0% in group I, 56.0% in group II, and 69.4% in group III for the T allele (P = 0.007, OR = 1.71, for white versus black groups). When clustering both genotype frequencies a higher frequency of a postulated worse genotypic association (DD + TT) was evident in group III, the black population (11.3% in group I as compared to 21.1% in group III, P = 0.04, chi-square = 4.22) (Figure 1). There were no significant differences between groups (one-way ANOVA).
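As a worked check, the allelic odds ratios quoted above can be recovered from the reported allele frequencies alone. The sketch below also computes an approximate 95% confidence interval by the Woolf (log) method; the allele counts used for that step are reconstructed by rounding frequency × 2N (300 white and 180 black chromosomes), which is an assumption, so the interval is expected to be close to, but not identical with, the published one. Function names are hypothetical.

```python
from math import exp, log, sqrt


def allelic_odds_ratio(freq_reference, freq_comparison):
    """Odds ratio of carrying the allele in the comparison vs. the reference group."""
    odds_ref = freq_reference / (1.0 - freq_reference)
    odds_cmp = freq_comparison / (1.0 - freq_comparison)
    return odds_cmp / odds_ref


def woolf_interval(a, b, c, d, z=1.96):
    """OR and Woolf CI for a 2x2 table: a, b = allele present/absent in the
    comparison group; c, d = allele present/absent in the reference group."""
    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Reported frequencies: D allele 54.3% (white) vs 65.0% (black),
# T allele 57.0% (white) vs 69.4% (black).
or_d = allelic_odds_ratio(0.543, 0.650)
or_t = allelic_odds_ratio(0.570, 0.694)
print(f"OR (D allele) = {or_d:.2f}, OR (T allele) = {or_t:.2f}")

# Reconstructed (rounded, assumed) D/I allele counts: black 117/63, white 163/137.
print(woolf_interval(117, 63, 163, 137))
```

Both point estimates agree with the reported values of 1.56 and 1.71 to two decimal places.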


Sample heterozygosity
Sample heterozygosity, the proportion of heterozygote individuals for at least one of the studied polymorphisms, was 0.70 in group I, 0.73 in group II and 0.67 in group III.
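For orientation, the expected value of this quantity can be approximated from the allele frequencies, assuming Hardy-Weinberg proportions at each locus and independence between the two loci. Both assumptions, and the function name, are introduced here for illustration; this is not the procedure Arlequin applies to the observed genotypes.

```python
def expected_two_locus_heterozygosity(freq_d, freq_t):
    """Expected proportion of individuals heterozygous at one locus or both,
    assuming Hardy-Weinberg proportions and independent loci."""
    het_ace = 2 * freq_d * (1 - freq_d)
    het_atg = 2 * freq_t * (1 - freq_t)
    return 1 - (1 - het_ace) * (1 - het_atg)


# Reported D and T allele frequencies per subgroup.
groups = {"white": (0.543, 0.570), "mulatto": (0.570, 0.560), "black": (0.650, 0.694)}
for name, (freq_d, freq_t) in groups.items():
    print(name, round(expected_two_locus_heterozygosity(freq_d, freq_t), 2))
```

Under these assumptions the expected values fall within a few hundredths of the observed 0.70, 0.73 and 0.67, and the black subgroup again shows the lowest value.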

Genetic structure analysis
Population pairwise FST values were significantly different only between the white (group I) and black (group III) subgroups (P = 0.03 for group I versus group III, P = 0.83 for group I versus group II, and P = 0.07 for group III versus group II).
Finally, genotype frequency analysis by a population pairwise differentiation test using 10,000 Markov chain steps showed significant difference between all populations in the sample (probability (P) of nondifferentiation = 0.048). When pairs of populations were tested there was a significant difference between the mulatto and black populations (P = 0.03) and the black and Caucasian populations (P = 0.03). No difference was observed when comparing the Caucasian and the mulatto populations. These data imply that the populations studied differ in their genotypic components.

Sample size simulations
To quantify the possible role of ethnicity and population structure in association studies using these genes we conducted sample size simulations for both genes. Additionally, we used different genetic models (dominant/recessive) to study the influence of a particular model on the sample size and the design of the study. These data are shown in Tables 2 and 3. As illustrated in Figures 2 and 3, sample size was strongly dependent on relative risk of the variant for the development of the studied phenotype, ethnicity and genetic model of action, particularly when two-locus studies are considered. Sample size calculations used confidence intervals of 95%, power of 80% and a 1:1 case-control ratio.
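The dependence of sample size on ethnicity and genetic model can be illustrated with a standard unmatched case-control formula (Fleiss, without continuity correction) at 95% confidence, 80% power and a 1:1 ratio. This is a sketch under stated assumptions, not a reproduction of the EpiInfo calculation: the exposed fraction among controls is derived from the reported allele frequencies assuming Hardy-Weinberg proportions, two-locus exposure assumes independent loci, and all function names are hypothetical. The absolute numbers therefore differ from those in Table 3, which were based on the observed genotype frequencies; only the direction of the differences is expected to agree.

```python
from math import ceil, sqrt
from statistics import NormalDist


def exposed_fraction(risk_allele_freq, model):
    """Fraction of controls classified as exposed, assuming Hardy-Weinberg."""
    if model == "dominant":   # at least one copy of the risk allele
        return 1.0 - (1.0 - risk_allele_freq) ** 2
    if model == "recessive":  # two copies of the risk allele
        return risk_allele_freq ** 2
    raise ValueError("model must be 'dominant' or 'recessive'")


def cases_needed(p0, odds_ratio, alpha=0.05, power=0.80):
    """Cases required (with an equal number of controls) to detect odds_ratio
    when a fraction p0 of controls is exposed (Fleiss, uncorrected)."""
    p1 = odds_ratio * p0 / (1.0 + p0 * (odds_ratio - 1.0))  # exposure in cases
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p0 + p1) / 2.0
    numerator = (z_alpha * sqrt(2.0 * p_bar * (1.0 - p_bar))
                 + z_beta * sqrt(p0 * (1.0 - p0) + p1 * (1.0 - p1))) ** 2
    return ceil(numerator / (p1 - p0) ** 2)


FREQ = {"white": {"D": 0.543, "T": 0.570}, "black": {"D": 0.650, "T": 0.694}}

for group, f in FREQ.items():
    single_t = cases_needed(exposed_fraction(f["T"], "dominant"), 2.0)
    single_d = cases_needed(exposed_fraction(f["D"], "recessive"), 1.5)
    # Two-locus exposure: both conditions must hold (loci assumed independent).
    dom_dom = exposed_fraction(f["D"], "dominant") * exposed_fraction(f["T"], "dominant")
    rec_dom = exposed_fraction(f["D"], "recessive") * exposed_fraction(f["T"], "dominant")
    print(group, single_t, single_d,
          cases_needed(dom_dom, 1.5), cases_needed(rec_dom, 1.5))
```

Under these assumptions the dominant models require more black than white cases, while the models with a recessive ACE D allele require more white than black cases, which is the same pattern of reversal reported in the simulations.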

Discussion
Association studies are increasingly being used in the genetic dissection of complex diseases. This approach has become an alternative to linkage studies, which have limited power to detect genes of modest effect (24). Despite the greater power of genetic association studies, there are several limitations to their application in the study of complex traits and diseases.
One of the major concerns is the little knowledge generally available about the allelic distribution among populations of different ethnic origins. It has been argued that differences among these populations could explain the contradictory findings of some of these studies (14). Indeed, the ACE I/D polymorphism is one of the best-known examples.
Another major concern about the validity of genetic association studies is related to the small amount of knowledge about the molecular biology of the variants studied. It is especially noteworthy that analysis and interpretation remain primarily based on a particular assumption about the mode of action (i.e., dominant or recessive) of the genetic polymorphism being studied.
The data presented here establish the distribution of RAS genetic polymorphisms and the population genetic structure in three ethnic subgroups from a highly admixtured urban population. On this basis, it is interesting to note the different genotypic distribution of these markers among the studied groups (Figure 1). Whereas the observed distributions could be responsible for possible selection bias in association studies conducted on this population, the clustering of "worse" variants was more prominent in the black than in both the Caucasian and the mulatto populations. These findings may be related to a higher frequency and a worse outcome of hypertension and cardiovascular disease observed in black populations (25). The hypothesis of an association between these variants and the development or the severity of hypertension or another cardiovascular phenotype in this group should be tested in large, population-based studies. In addition, the use of population genetic algorithms as presented here can help investigators to determine the genetic differences between two populations, especially when one is working with a multi-locus model.
The use of sample size calculation models is of help in the quantification of this selection bias. Based on the data from this study, for example, to detect a 2.0 OR association of the ATG T allele with a particular phenotype in a dominant model of action it would be necessary to study 250 subjects with the phenotype (and the same number of control individuals) if one is studying white subjects. However, if studying black subjects, it would be necessary to study 680 subjects with the same phenotype. Conversely, if one is working with the hypothesis of an association of the ACE D allele in a recessive model of action, to identify a 1.5 OR relationship with a particular phenotype, 461 subjects would be necessary in a white population, but only 403 in a black population. Thus, ethnicity should be considered a potential bias in unmatched studies, especially when one is working with the hypothesis of a dominant model of action.
Most important, however, is the use of this approach when working with two-locus models. As shown in this study, sample size is highly dependent on both ethnicity of the studied population and genetic model of action of the genes involved in the analysis. For instance, 685 black subjects would be required in a study in which the ACE D allele and the ATG T allele both have a dominant effect on the phenotype studied. Only 525 subjects would be required in a white population. This number would be completely different if one were working with a model in which the ACE D allele has a recessive action and the ATG T allele has a dominant one. In this case, for the same 1.5 OR, one would need 539 Caucasian subjects as opposed to only 407 black subjects with the phenotype of interest. It should also be noted that epistatic effects, which can substantially add complexity to the model, were not considered in these simulations.
Matched control groups (e.g., sib-pair studies) remain the best choice to avoid the ethnicity bias (24). However, it should be noted that the proposed genetic model of action of a particular combination of alleles can add a much stronger bias than
Our data highlight the importance of elements that should be taken into consideration in association studies in admixtured urban populations. The increasing knowledge about population genetics and genetic variance among different populations will add important elements to the analysis of complex diseases and traits. Similarly, knowledge of gene action and interaction should be exercised whenever interpreting and analyzing such studies.

Figure 1. Genotypic distribution of the genetic variants among the subgroups. As shown, the clustering of "worse" variants (DD/TT) was more prominent in the black than in both the white and mulatto populations. DD, DI, II = angiotensin-converting enzyme I/D gene variants. MM, MT, TT = angiotensinogen M/T gene variants.

Figure 2. Effect of ethnicity and odds ratio on the recommended sample size of an association study conducted on this population using dominant (A) or recessive (B) models of action of the genetic variant studied (i.e., increased risk of the DD or the TT genotype in a recessive model, or the presence of the D or T allele in a dominant model). ACE = angiotensin-converting enzyme; ATG = angiotensinogen. Open circles = white subjects; closed circles = mulatto subjects; triangles = black subjects.

Figure 3. The effect of ethnicity and genetic model on the recommended sample size for a fixed odds ratio (2.0) for the respective genotypic association. As shown, sample size is strongly dependent on relative risk of the variant for the development of the studied phenotype, ethnicity and genetic model of action, particularly when clustering studies are considered. ACE = angiotensin-converting enzyme; ATG = angiotensinogen.

Table 2. Allele and genotype frequency distribution of the ACE I/D and ATG M/T polymorphisms among the studied groups.
ACE I/D = angiotensin-converting enzyme insertion/deletion polymorphism; ATG M/T = angiotensinogen methionine/threonine polymorphism; DD, DI, II = ACE I/D gene variants; MM, MT, TT = ATG M/T gene variants; OR = odds ratio. P value (overall) refers to the comparison among the three subgroups. P value for trend refers to all groups (presented OR in relation to white DD genotype frequency) (chi-square test for linear trends).

Table 3. Sample size calculations considering the effect of ethnicity, odds ratio and genetic model for both genes individually (A,B) and when clustering studies are performed (C).