Polymorphisms associated with the risk of lung cancer in a healthy Mexican Mestizo population: Application of the additive model for cancer

Lung cancer is the leading cause of cancer mortality in Mexico and worldwide. In the past decade, the number of lung cancer cases in young people has increased, which suggests an important role for genetic background in the etiology of this disease. In this study, we genetically characterized 16 polymorphisms in 12 low penetrance genes (AhR, CYP1A1, CYP2E1, EPHX1, GSTM1, GSTT1, GSTP1, XRCC1, ERCC2, MGMT, CCND1 and TP53) in 382 healthy Mexican Mestizos as a first step toward elucidating the genetic structure of this population and identifying high-risk individuals. All of the genotypes analyzed were in Hardy-Weinberg equilibrium, but different degrees of linkage were observed for polymorphisms in the CYP1A1 and EPHX1 genes. The genetic variability of this population was distributed in six clusters defined by their genetic characteristics. The use of a polygenic model to assess the additive effect of low penetrance risk alleles identified combinations of risk genotypes that could be useful in predicting a predisposition to lung cancer. The individual calculated risk value (iCRV), an estimate of the level of genetic susceptibility, ranged from 1 to 16, with a higher iCRV indicating a greater genetic susceptibility to lung cancer.


Introduction
Lung cancer (LC) is the leading cause of death from malignant neoplasms worldwide. In Mexico, LC was responsible for 11.45% of deaths from malignant neoplasms from 1998 to 2002, and a prospective study indicated that LC mortality will be even higher in the future (Ruíz-Godoy et al., 2007). Lung cancer is a serious public health problem (Proctor, 2001). The World Health Organization (WHO) estimates that in 2030 the number of deaths attributable to tobacco consumption will be 100 million, accompanied by an increased incidence of LC (Xie and Minna, 2008).
In some populations, a significant association between gene polymorphisms and the risk of LC has been established, while in other populations no association has been found, probably because of the low frequency of the polymorphisms in those populations. For example, the CYP1A1 rs1048943 (Ile462Val) polymorphism is associated with a risk of LC in Asian (Korean, Chinese and Japanese) populations and in Chilean (Quinoñes et al., 2001), Mexican (Gallegos-Arreola et al., 2008) and African-American (Taioli et al., 1998) populations, whereas this association has not been confirmed in American and European Caucasians, possibly because the frequency of the CYP1A1 rs1048943 polymorphism in these populations is < 2% (Hung et al., 2003). In contrast, the GSTM1 deletion polymorphism, which has a frequency of 0.35-0.58 in Asians, European and American Caucasians and Africans (Mohr et al., 2003), is significantly associated with LC in Caucasians (Hung et al., 2003) and Asians. The ERCC2 rs1799793 (Asp312Asn), ERCC2 rs13181 (Lys751Gln) and MGMT rs12917 (Leu84Phe) polymorphisms show no consistent association with LC across populations. However, XRCC1 rs3213245 (-77 T > C) was associated with a risk of LC in three case-control studies, and ERCC2 rs13181 was associated with a risk of LC in a meta-analysis of 18 case-control reports (Vineis et al., 2009).
Because these polymorphisms modify the function of the encoded proteins, it has been suggested that polymorphic variants may alter the metabolic and detoxification pathways of carcinogenic compounds, thereby predisposing individuals who carry them to develop cancer (Nebert and Dalton, 2006). Indeed, polymorphisms in specific genes can modulate the formation of DNA-carcinogen adducts (Ketelslegers et al., 2006), which favor the generation of mutations leading to LC, particularly in smokers (Lodovici et al., 2004).
Recently, a polygenic cancer model was proposed that considers genetic susceptibility to cancer as a global mechanism, with susceptibility being defined by low risk alleles in multiple candidate genes (Pharoah et al., 2002; Dong et al., 2008). Susceptibility to LC may be caused by low penetrance (low risk) genes with high frequencies in the general population. In this context, susceptibility to LC is determined by the combination of multiple low risk alleles in an individual, with each allele contributing only slightly to the overall cancer risk, as proposed by Fletcher and Houlston (2010) in their polygenic additive model. Because this model allows high risk individuals to be identified, it may be useful for preventing LC or detecting it at an early stage, thereby significantly reducing LC-related mortality and the costs associated with the diagnosis and treatment of this disease.
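One common way to formalize the idea that each low risk allele contributes only slightly is a log-additive (multiplicative) model, in which every risk allele adds a fixed increment, log(OR), to the log-odds of disease. The sketch below assumes that formalization; the per-allele odds ratios are hypothetical illustrative values, not estimates from this or any cited study, and the function name `combined_odds_ratio` is likewise hypothetical.

```python
import math

# Hypothetical per-allele odds ratios for four low-penetrance loci
# (illustrative values only, not estimates from this study).
PER_ALLELE_OR = [1.2, 1.2, 1.1, 1.3]


def combined_odds_ratio(risk_allele_counts, per_allele_or=PER_ALLELE_OR):
    """Combine per-locus effects under a log-additive model.

    Each risk allele adds log(OR_i) to the log-odds, so the overall odds ratio
    relative to someone with zero risk alleles is the product of OR_i ** n_i.
    """
    if len(risk_allele_counts) != len(per_allele_or):
        raise ValueError("one risk allele count per locus is required")
    log_or = 0.0
    for n, odds_ratio in zip(risk_allele_counts, per_allele_or):
        if n not in (0, 1, 2):
            raise ValueError("risk allele counts must be 0, 1 or 2")
        log_or += n * math.log(odds_ratio)
    return math.exp(log_or)


print(round(combined_odds_ratio([1, 0, 0, 0]), 3))  # 1.2 (one risk allele)
print(round(combined_odds_ratio([2, 2, 2, 2]), 3))  # 4.24 (all eight risk alleles)
```

With these hypothetical values a single risk allele barely changes the odds, whereas carrying all eight risk alleles multiplies them by about 4.24, which illustrates how individually weak effects can accumulate.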
The first step in any molecular epidemiology study in which ethnicity plays an important role is the characterization of the general healthy population, since this provides the benchmark for further analysis. In this study, we investigated 16 polymorphisms in 12 low penetrance genes in a healthy Mexican Mestizo population. These genes encode proteins involved in the metabolic pathways of some environmental and tobacco smoke carcinogens, and their polymorphisms reportedly produce functional alterations that are associated with the risk of developing LC. The association between these polymorphisms and the risk of cancer was assessed using the polygenic additive model for cancer.

Subjects
The research protocol was approved by the Committee of Bioethics of the Instituto de Investigaciones Biomédicas of the Universidad Nacional Autónoma de México, and the Hospital "20 de Noviembre" ISSSTE gave permission to use buffy coats from blood bank samples as a source of DNA. The study included 382 unrelated, healthy Mexican Mestizo individuals whose parents and grandparents were born in Mexico. After providing informed consent, the subjects answered a questionnaire that included information on their age, gender, smoking status and lifestyle.

Statistical analyses
The statistical package GenePop version 4.0.10 (http://genepop.curtin.edu.au) was used to assess whether the genotypes of each gene were in Hardy-Weinberg equilibrium and to determine the degree of linkage between the EPHX1 and CYP1A1 gene polymorphisms. Conglomerate and hierarchical clustering analyses were used to determine the genetic variability of the sample. The estimated cancer risk and the genotypic and allelic frequencies were determined using the statistical package JMP version 8.
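GenePop tests Hardy-Weinberg equilibrium with exact tests. As a simpler illustration of what such a test checks, the sketch below swaps in the classical Pearson chi-square goodness-of-fit test (1 degree of freedom) for a biallelic locus; it is not GenePop's procedure, and the function name and genotype counts are hypothetical. When expected genotype counts are small, an exact test is preferable.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-square test of Hardy-Weinberg equilibrium at a biallelic locus.

    Returns (frequency of allele A, chi-square statistic, p-value with 1 df).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p                        # frequency of allele B
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)  # HWE expectations
    # Classes with zero expectation (monomorphic locus) contribute nothing.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return p, chi2, p_value


print(hwe_chi_square(100, 200, 100))  # (0.5, 0.0, 1.0): exactly at equilibrium
print(hwe_chi_square(200, 0, 200))    # no heterozygotes: chi2 = 400.0, p-value near 0
```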

Results
Of the 382 Mexican Mestizo individuals studied, 29% were women and 71% were men. Ages ranged from 18 to 80 years (mean age: 39.2 ± 12.1 years for men and 41.5 ± 13.1 years for women), and 48% of the subjects were smokers.
The following polymorphisms were studied in candidate genes: rs603965 and TP53 rs1042522. The genotypic and allelic frequencies of these polymorphisms are shown in Table 1. All of the genotype distributions tested were in Hardy-Weinberg equilibrium. GSTM1 and GSTT1 were not tested for Hardy-Weinberg equilibrium because the methodology did not discriminate between heterozygous and homozygous positive genotypes.
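The reason GSTM1 and GSTT1 cannot be tested can be made concrete. A presence/absence assay yields only two phenotype classes (null/null versus at least one functional copy), so the null allele frequency can be estimated only by assuming Hardy-Weinberg equilibrium, using q = sqrt(f_null), and the fitted model then reproduces the observed counts exactly, leaving no degrees of freedom for a test. The sketch below illustrates this with hypothetical counts; the function name is hypothetical.

```python
import math


def null_allele_fit(n_null, n_present):
    """Fit HWE to presence/absence data for a deletion polymorphism.

    Only null/null homozygotes are distinguishable; +/+ and +/null individuals
    are pooled as 'present'. Returns (estimated null allele frequency,
    expected number of 'present' individuals under HWE).
    """
    n = n_null + n_present
    f_null = n_null / n          # observed frequency of the null genotype
    q = math.sqrt(f_null)        # null allele frequency, ASSUMING HWE
    p = 1 - q                    # functional allele frequency
    expected_present = n * (p * p + 2 * p * q)
    return q, expected_present


q, expected_present = null_allele_fit(25, 75)  # hypothetical counts
print(q, expected_present)  # 0.5 75.0: the fit reproduces the observed 75 exactly
```

Because p² + 2pq = 1 - q² = 1 - f_null for any data set, the expected count always equals the observed count, which is why these loci carry no information for a Hardy-Weinberg test.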
Polymorphisms in the same gene, or in genes acting in the same pathway, may have synergistic or antagonistic effects, as in the case of CYP1A1 and epoxide hydrolase 1 (EPHX1). This is functionally significant because CYP1A1 acts together with EPHX1 to convert polyaromatic hydrocarbons into highly toxic, mutagenic and carcinogenic epoxides.
Linkage analysis of the four CYP1A1 and two EPHX1 polymorphisms showed that the CYP1A1 rs1799814 and CYP1A1 rs1800031 genotypes were linked in all cases (probability of 1; p < 0.001), whereas the CYP1A1 rs4646903 and CYP1A1 rs1048943 polymorphisms were not linked (probability of 0; p < 0.001). EPHX1 rs1051740 and EPHX1 rs2234922 were only weakly linked (probability of 0.03254; p = 0.0025). Combinations of CYP1A1 polymorphisms are shown in Table 2.
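GenePop's linkage option performs a test of genotypic disequilibrium between pairs of loci. A common complementary summary of the strength of association when haplotype phase is unknown is the squared correlation between allele dosages (0, 1 or 2 copies of a given allele) at two loci, which approximates the linkage disequilibrium statistic r². The sketch below swaps in that measure; it is not the statistic reported above, and the function name and dosage vectors are hypothetical.

```python
def genotype_r2(dosages_1, dosages_2):
    """Squared Pearson correlation between allele dosages (0, 1, 2) at two loci.

    A phase-free approximation to the r^2 measure of linkage disequilibrium.
    Returns NaN if either locus is monomorphic in the sample.
    """
    n = len(dosages_1)
    if n != len(dosages_2) or n < 2:
        raise ValueError("need two equal-length dosage lists with n >= 2")
    mean_1 = sum(dosages_1) / n
    mean_2 = sum(dosages_2) / n
    cov = sum((a - mean_1) * (b - mean_2) for a, b in zip(dosages_1, dosages_2))
    var_1 = sum((a - mean_1) ** 2 for a in dosages_1)
    var_2 = sum((b - mean_2) ** 2 for b in dosages_2)
    if var_1 == 0 or var_2 == 0:
        return float("nan")  # no variation at one locus: r^2 is undefined
    return cov * cov / (var_1 * var_2)


# Hypothetical dosages in four individuals.
print(round(genotype_r2([0, 2, 0, 2], [0, 2, 0, 2]), 6))  # 1.0: always co-occur
print(genotype_r2([0, 2, 0, 2], [0, 0, 2, 2]))            # 0.0: no association
```

When one allele is rare, the variance terms are small and the estimate becomes unstable, so linkage results for loci dominated by the wild type allele should be read with caution.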
Conglomerate analysis was used to determine the genetic variability and the possible grouping of the individuals analyzed. We identified six groups that clustered according to their genetic characteristics, although there was considerable heterogeneity (Figure 1). Additionally, hierarchical cluster analysis of the genotypes showed that CYP1A1 rs1800031 and CYP1A1 rs1799814 polymorphisms clustered together, whereas CYP1A1 rs4646903 and CYP1A1 rs1048943 were close to each other but clustered separately (Figure 2). The EPHX1 rs1051740 and EPHX1 rs2234922 polymorphisms were widely separated, as also indicated by linkage analysis.
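The hierarchical clustering of loci can be illustrated with a small agglomerative procedure. The distance metric and linkage rule used for this analysis are not stated here, so the sketch below assumes a simple mismatch distance (the fraction of individuals whose allele dosages differ at two loci) and average linkage; the locus labels and dosages are hypothetical. A distance of this kind is sensitive to allele frequency, because two loci that are both mostly wild type will rarely mismatch.

```python
from itertools import combinations


def mismatch_distance(x, y):
    """Fraction of individuals whose allele dosages (0, 1, 2) differ at two loci."""
    if len(x) != len(y) or not x:
        raise ValueError("need two non-empty dosage lists of equal length")
    return sum(1 for a, b in zip(x, y) if a != b) / len(x)


def average_linkage(loci):
    """Agglomerative clustering of loci with average linkage.

    `loci` maps a locus label to its dosage vector (same individuals, same order).
    Returns the merge history as (cluster_a, cluster_b, distance) tuples.
    """
    dist = {}
    for a, b in combinations(loci, 2):
        dist[(a, b)] = dist[(b, a)] = mismatch_distance(loci[a], loci[b])
    clusters = [(name,) for name in loci]
    merges = []
    while len(clusters) > 1:
        best = None
        for c1, c2 in combinations(clusters, 2):
            # Average pairwise distance between members of the two clusters.
            d = sum(dist[(a, b)] for a in c1 for b in c2) / (len(c1) * len(c2))
            if best is None or d < best[2]:
                best = (c1, c2, d)
        c1, c2, d = best
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)
        merges.append((c1, c2, d))
    return merges


# Hypothetical dosages for four loci in eight individuals.
loci = {
    "locus_A": [0, 1, 2, 0, 1, 2, 0, 1],
    "locus_B": [0, 1, 2, 0, 1, 2, 0, 1],  # same pattern as locus_A
    "locus_C": [2, 0, 1, 1, 0, 2, 1, 0],
    "locus_D": [2, 0, 1, 1, 0, 2, 1, 1],  # differs from locus_C in one individual
}
for c1, c2, d in average_linkage(loci):
    print(c1, "+", c2, "at distance", round(d, 3))
```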
To estimate the theoretical levels of susceptibility to LC, a risk matrix was generated using a log-additive model in which each locus was scored 0 for homozygous genotypes that produced no risk, 1 for heterozygous genotypes (medium risk) and 2 for homozygous genotypes that produced changes in protein activity (high risk). The individual calculated risk value (iCRV) was obtained by summing these scores across all loci. The iCRVs ranged from 1 to 16, although the number of individuals decreased markedly at iCRV ≤ 5 and ≥ 12 (Figure 3).
548 Lung cancer risk in Mexicans
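The scoring rule above amounts to counting risk alleles at each locus and summing over loci. The sketch below assumes genotypes are recorded as allele pairs and that the risk allele at each locus is known; the locus names and alleles are hypothetical placeholders rather than the study's actual coding, and presence/absence loci such as GSTM1 and GSTT1, whose scoring is not described here, are left out.

```python
# Hypothetical loci and their risk alleles (placeholders, not study data).
RISK_ALLELES = {"locus_1": "G", "locus_2": "C", "locus_3": "T"}


def locus_score(genotype, risk_allele):
    """0 = homozygous non-risk, 1 = heterozygous, 2 = homozygous for the risk allele."""
    if len(genotype) != 2:
        raise ValueError("expected a diploid genotype such as ('A', 'G')")
    return sum(1 for allele in genotype if allele == risk_allele)


def icrv(individual, risk_alleles=RISK_ALLELES):
    """Individual calculated risk value: sum of the per-locus scores."""
    return sum(locus_score(individual[locus], risk)
               for locus, risk in risk_alleles.items())


person = {"locus_1": ("A", "G"), "locus_2": ("C", "C"), "locus_3": ("A", "A")}
print(icrv(person))  # 1 + 2 + 0 = 3
```

In this scheme, n scored loci give a possible range of 0 to 2n.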

Discussion
Cancer is a polygenic disease, the risk of which may be related to the presence of low-penetrance genes that have additive effects. In this study, we examined the frequency of some polymorphisms possibly related to the risk of developing lung cancer in a sample of healthy Mexican Mestizos. These polymorphisms may be useful biomarkers of genetic susceptibility to lung cancer in specific populations.
We have previously reported on the high frequency of the CYP1A1 rs1048943 polymorphism in Mexican Mestizos (Pérez-Morales et al., 2008), a polymorphism that is highly represented in Amerindians (Kvitko et al., 2000). In some populations, the CYP1A1 rs1048943 polymorphism has been associated with a risk of LC (Quinoñes et al., 2001;Lee et al., 2008;Shah et al., 2008), although not all studies have found such an association, especially when the frequency of this polymorphism is very low, as in European and American Caucasian populations (Hung et al., 2003). However, in some populations the CYP1A1 rs1048943 polymorphism has a significant influence on the risk of developing LC because of the additive effect of other polymorphisms.
Although the CYP1A1 rs1799814 and CYP1A1 rs1800031 genotypes were found to be linked, the linkage analysis may have been affected by the high frequency of the wild type allele at these loci in our population. On the other hand, in contrast to a previous report (Hayashi et al., 1991), the CYP1A1 rs4646903 and CYP1A1 rs1048943 polymorphisms were not linked. This observation is functionally significant because the CYP1A1 rs1048943 polymorphism increases the activity of the CYP1A1 enzyme, resulting in more efficient generation of reactive metabolites, whereas the CYP1A1 rs4646903 polymorphism increases CYP1A1 mRNA levels. Hence, the combination of these alleles could increase the risk of LC (Yoon et al., 2008). Some studies have associated the CYP1A1 rs1799814 polymorphism with a risk of LC (Gallegos-Arreola et al., 2008), whereas CYP1A1 rs1800031 is reportedly specific to African populations and is not associated with a risk of LC (Taioli et al., 1998).
The proportion of the population carrying both the EPHX1 rs1051740 and EPHX1 rs2234922 alleles was very small. In these individuals, the EPHX1 rs1051740 Tyr113 allele apparently offered no protection against LC, in contrast to previous observations (Voho et al., 2006). On the other hand, the EPHX1 rs2234922 139Arg variant increases the activity of the encoded enzyme (Hassett et al., 1994), which could counteract the low activity of the EPHX1 113His variant (Salam et al., 2007). For example, an individual carrying the CYP1A1 rs4646903, CYP1A1 rs1048943, EPHX1 rs1051740 and EPHX1 rs2234922 polymorphisms would be expected to produce reactive metabolites more efficiently and consequently to have a higher risk of developing LC.
A conglomerate analysis of the genetic variability of the population studied here revealed six groups that clustered according to their genetic characteristics (Figure 1). In clusters 2 and 4, the CYP1A1 rs1800031 polymorphism was quite separate from the rest of the genotypes, although these two clusters consisted of only two subjects each, so the probability of finding individuals belonging to them in a given population is very low. In clusters 2, 3, 4 and 5, AhR rs2066853, EPHX1 rs2234922, GSTT1 null, ERCC2 rs13181 and CYP1A1 rs1799814 were over-represented relative to the other polymorphisms analyzed, but these clusters contained fewer individuals than clusters 1 and 6; there was no predominant genotype in the latter two clusters because each individual carried a different combination of polymorphic alleles across loci. Despite this clustering, our findings indicate that the population studied was highly heterogeneous, as is characteristic of Mexican Mestizos.
Hierarchical cluster analysis showed that the genes were grouped according to their frequency. This analysis also revealed that CYP1A1 rs1800031 and CYP1A1 rs1799814 occurred together, in agreement with the linkage analysis, whereas CYP1A1 rs4646903 and CYP1A1 rs1048943 were found together in some individuals but were not linked (as indicated by the separation of their branches). Overall, this analysis showed that a portion of the population carried both risk alleles, although they were not linked. However, we have not determined whether these polymorphisms are in a cis or trans position, which could affect the linkage results.
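The cis/trans ambiguity arises because genotyping without phase information cannot tell which alleles are carried together on the same chromosome. For an individual heterozygous at two loci, two haplotype configurations are consistent with the same genotype data, as the sketch below enumerates; the allele labels are hypothetical, with "G" and "C" standing in for the two risk alleles.

```python
def possible_phases(genotype_1, genotype_2):
    """Return the distinct haplotype pairs consistent with unphased genotypes.

    Each genotype is an (allele, allele) tuple; a haplotype is
    (allele at locus 1, allele at locus 2).
    """
    a1, a2 = genotype_1
    b1, b2 = genotype_2
    phases = set()
    # Keep the locus-2 order fixed and try both placements of the locus-1 alleles.
    for x, y in ((a1, a2), (a2, a1)):
        pair = tuple(sorted([(x, b1), (y, b2)]))  # unordered pair of haplotypes
        phases.add(pair)
    return sorted(phases)


# Hypothetical double heterozygote.
for phase in possible_phases(("A", "G"), ("T", "C")):
    print(phase)
```

This prints `(('A', 'C'), ('G', 'T'))`, where the risk alleles sit on different haplotypes (trans), and `(('A', 'T'), ('G', 'C'))`, where they sit on the same haplotype (cis). An individual heterozygous at only one of the loci yields a single configuration.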
Our analysis revealed individuals with a high susceptibility to LC based on the presence of risk genotypes, although different combinations of risk genotypes may confer varying degrees of susceptibility when combined with other components, such as environmental factors.
If the risk of developing LC is attributable to the interaction of several low-penetrance genes that exert an additive effect, then it should be possible to detect individuals with a high number of risk alleles and a greater genetic susceptibility to LC. We estimated the levels of susceptibility and found that the iCRV ranged from 1 to 16, with a marked decrease in the number of individuals with an iCRV ≥ 12. Application of the polygenic model (Fletcher and Houlston, 2010) (Figure 3) yielded an approximately normal distribution of risk alleles, with low and high risk individuals at the extremes of the distribution. Theoretically, high-risk individuals should be more susceptible to LC, but the important role of the genotype × environment interactions to which individuals are exposed cannot be excluded.
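Why intermediate iCRVs dominate and extreme values are rare can be checked with a short calculation. If each locus is in Hardy-Weinberg equilibrium with risk allele frequency p, its score is 0, 1 or 2 with probabilities (1 - p)², 2p(1 - p) and p²; if loci are also independent, the iCRV distribution is the convolution of these per-locus distributions, with mean 2Σp and variance Σ2p(1 - p), and it approaches a bell shape as loci are added. Independence is an assumption that linked loci, such as CYP1A1 rs1799814 and rs1800031, violate, and the allele frequencies below are hypothetical.

```python
def icrv_distribution(risk_allele_freqs):
    """Exact iCRV distribution assuming HWE at each locus and independent loci.

    Returns a list whose k-th entry is P(iCRV = k).
    """
    dist = [1.0]  # before any locus is added, the score is 0 with certainty
    for p in risk_allele_freqs:
        locus = [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]  # P(score = 0, 1, 2)
        new = [0.0] * (len(dist) + 2)
        for total, prob in enumerate(dist):
            for score, q in enumerate(locus):
                new[total + score] += prob * q
        dist = new
    return dist


# Hypothetical risk allele frequencies for ten loci.
freqs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.5, 0.4, 0.3, 0.2, 0.1]
dist = icrv_distribution(freqs)
mean = sum(k * pr for k, pr in enumerate(dist))
print(f"mean iCRV = {mean:.2f}")  # 6.00, i.e. 2 * sum(freqs)
for k, pr in enumerate(dist):
    print(f"iCRV {k:2d}: {pr:.4f}")
```

With these ten hypothetical loci the probabilities peak near the mean and fall off toward both ends of the 0 to 20 range, so only a small fraction of individuals is expected at the extremes.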
Based on the polygenic model of cancer, which takes into account the additive effect of multiple risk alleles, the high risk genotypes identified in this study involved genes participating in phase I and II metabolism, DNA repair, oxidative stress and cell cycle regulation. All of these gene groups need to be considered when analyzing the efficiency of reactive metabolite generation, DNA-adduct repair, damage persistence and cellular responses such as cell-cycle arrest or apoptosis. However, as shown here, only a small minority of individuals actually carry a large number of risk alleles. Although studies of other populations have associated specific polymorphisms with a risk of LC, these relationships have not always been confirmed, possibly because of pleiotropic or epistatic effects of one genotype on another within the same population.
Since LC is a multifactorial, polygenic disease, it is incorrect to attribute susceptibility to LC to a single gene or group of related genes. In this context, determining the genetic background of a healthy population, as done here for a Mexican Mestizo population, can provide a sound basis for subsequent studies of the association between these risk genotypes and LC. In such studies, the iCRV would be expected to be higher in patients than in healthy controls.