The ERBB2 gene polymorphisms rs2643194, rs2934971, and rs1058808 are associated with increased risk of gastric cancer

Gastric cancer (GC) is the third most lethal type of cancer worldwide. Single nucleotide polymorphisms (SNPs) in regulatory sites or coding regions can modify the expression of genes involved in gastric carcinogenesis, such as ERBB2, which encodes the tyrosine-kinase receptor HER-2. The aim of this work was to analyze the association of the polymorphisms rs2643194, rs2517951, rs2643195, rs2934971, and rs1058808 with GC, as they have not yet been analyzed in GC patients, and to report their frequency in the general Mexican population (GMP). We studied genomic DNA from subjects with GC (n=74), gastric inflammatory diseases (GID, n=76 control subjects), and GMP (n=102). Genotypes were obtained by real-time PCR and DNA sequencing. The risks for GC were estimated as odds ratios (OR) using the Cochran-Armitage trend test and multinomial logistic regression. Increased risk for GC was observed under the dominant inheritance model for the rs2643194 TT or CT genotypes with an OR of 2.75 (95%CI 1.12−6.75, P=0.023); the rs2934971 TT or GT genotypes with an OR of 2.41 (95%CI 1.01−5.76, P=0.043), and the rs1058808 GG or CG genotypes with an OR of 2.21 (95%CI 1.00−4.87, P=0.046). The SNPs rs2643194, rs2934971, and rs1058808 of the ERBB2 gene were associated with increased risk for GC.


Introduction
Gastric cancer (GC) is the fifth most common type of cancer worldwide, and is the third most lethal type of cancer (1). Chronic gastritis, the inflammation of the mucosal layer of the stomach, is the first step of the multistep cascade of the onset of GC (non-atrophic chronic gastritis, multifocal atrophic gastritis, intestinal metaplasia, dysplasia, and cancer) related to Helicobacter pylori infection (2). Environmental and biological factors such as diet, smoking, toxic exposure, and H. pylori and Epstein-Barr virus infections are involved in all of these disease etiologies. Gastric carcinogenesis can be caused by the accumulation of genetic and epigenetic abnormalities, which modify the expression of different types of genes important for cell regulation, such as oncogenes and those encoding DNA repair molecules, tumor suppressor factors, and cell growth and cell adhesion molecules (3).
ERBB2 is a gene frequently altered in different types of cancer. It is located at chromosomal region 17q12 and has 32 exons. It encodes the human epidermal growth factor receptor 2 (HER-2), which consists of 1255 residues, has a molecular weight of 137.9 kDa, and belongs to a family of tyrosine kinase receptors. Structural and functional alterations of the HER-2 protein have been reported in different stages of carcinogenesis, such as initiation, promotion, and progression. HER-2 amplification and overexpression were first reported in breast cancer and were significantly associated with poor prognosis. Additional studies have shown that HER-2 is also present in other malignancies, including colorectal, ovarian, prostate, and lung cancers, and, in particular, gastric cancer and gastroesophageal cancer (4).

Material and Methods
We studied three groups of subjects: 1) GC group: 74 patients with gastric adenocarcinoma (diffuse type n=35, intestinal type n=26, mixed type n=6, undetermined type n=4), whose diagnoses were made by a pathologist through analysis of gastric tumor biopsies obtained by endoscopy. 2) GID (gastric inflammatory diseases) group: as controls, we studied 76 subjects with chronic gastritis, chronic atrophic gastritis, or intestinal metaplasia, in whom malignancy was ruled out by a pathologist through analysis of gastric biopsies. Both the GC and GID groups were recruited from the gastroenterology departments of four hospitals belonging to the Instituto Mexicano del Seguro Social in the city of Guadalajara, Jalisco. 3) GMP group: 102 healthy, unrelated adults recruited from the blood bank of the Centro Medico Nacional de Occidente, Instituto Mexicano del Seguro Social.
The study was approved by the local Committee of Health Research and Ethics of the Centro de Investigación Biomédica de Occidente, Instituto Mexicano del Seguro Social (CLIEIS-1305). All subjects provided written informed consent.
Genomic DNA was extracted by the salting-out method (12) from a 5-mL sample of peripheral blood. Genotyping of the SNPs rs2643194, rs2517951, and rs2643195 was done by polymerase chain reaction (PCR) amplification with the forward (5'-AAGCATGGCGTCCACA-3') and reverse (5'-CATCGGGATGTTAGGATCA-3') primers. The PCR was performed in a 25-µL reaction mixture comprising 200 ng genomic DNA, 5 pM each primer, 0.5 U Taq polymerase (Invitrogen, USA), 1× PCR buffer, 1.5 mM MgCl2, and 2.0 mM deoxynucleotide triphosphate mix (dNTP Set; Vivantis, Malaysia). The PCR conditions were 94°C for 4 min, followed by 30 cycles of 94°C for 30 s, 55°C for 30 s, and 72°C for 30 s, with a final extension at 72°C for 3 min, carried out in a 2720 Thermal Cycler (Applied Biosystems, USA). The resulting fragment (118 bp in length) was purified with Centrisep columns and then sequenced using the Big Dye Terminator Kit v1.1 on an ABI-310 DNA Sequencer (Applied Biosystems). The rs2934971 and rs1058808 genotypes were obtained by real-time PCR using Custom TaqMan® SNP Genotyping Assays on an ABI-PRISM 7000 Sequence Detection System, both supplied by Applied Biosystems, following the supplier's recommendations.

Statistical analysis
The genotype and allele frequencies of each SNP, as well as the Hardy-Weinberg equilibrium (HWE), were estimated. The chi-squared test was used to compare the distribution of genotypes using Arlequin v. 3.01 software (http://cmpg.unibe.ch/software/arlequin35/). The genotypic and allelic frequencies of the five polymorphisms observed in the GMP were compared with those reported for six world populations in the 1000 Genomes Project (1KGP), including a population with Mexican ancestry (MXL) (13). Odds ratios (ORs) were calculated using the Cochran-Armitage test under classical inheritance patterns, and each polymorphism was tested under the dominant, recessive, and codominant models (14). Individual results are reported as the OR and 95% confidence interval (CI). P<0.05 was considered statistically significant. We also performed a multinomial logistic regression test in SPSS v22 software (IBM, USA) to evaluate each genotype in the three groups simultaneously.
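As an illustration of how an OR and its 95% CI can be obtained under the dominant model, the following minimal sketch collapses heterozygotes and variant homozygotes into a single carrier category and applies the standard log-odds (Woolf) interval. This is not the Cochran-Armitage or multinomial regression procedure used in the study; the function name `dominant_odds_ratio` and the genotype counts in the usage lines are hypothetical and are not taken from the study data.

```python
import math


def dominant_odds_ratio(cases, controls, z=1.96):
    """Odds ratio and Woolf 95% CI under a dominant model.

    cases, controls: tuples of genotype counts in the order
    (wild-type homozygote, heterozygote, variant homozygote).
    Carriers (heterozygote + variant homozygote) are compared
    with wild-type homozygotes.
    """
    a = cases[1] + cases[2]        # carriers among cases
    b = cases[0]                   # non-carriers among cases
    c = controls[1] + controls[2]  # carriers among controls
    d = controls[0]                # non-carriers among controls
    if min(a, b, c, d) == 0:
        raise ValueError("a zero cell makes the Woolf interval undefined")

    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only (not the study data).
example_cases = (10, 40, 24)
example_controls = (20, 36, 20)
or_value, ci_low, ci_high = dominant_odds_ratio(example_cases, example_controls)
print(f"OR={or_value:.2f}, 95% CI {ci_low:.2f}-{ci_high:.2f}")
```

An interval whose lower bound exceeds 1 would correspond to a nominally significant increase in risk at the 0.05 level.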
Haplotypes were established with the five SNPs according to their location in the gene from the 5' to the 3' position (rs2643194 C>T, rs2517951 C>T, rs2643195 A>G, rs2934971 G>T, and rs1058808 C>G). Linkage disequilibrium (LD) analysis was performed with the haplotype data. D' values between 0 and 0.5 were considered a low level of linkage, between 0.5 and 0.75 moderate, and greater than 0.75 high LD. r² values >0.33 were considered acceptable for correlation between sites, along with a P value of <0.05. The Arlequin v. 3.01 software was used.
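For reference, the two LD measures used here can be computed for a pair of biallelic sites from one haplotype frequency and the two allele frequencies. The sketch below uses the standard definitions of D, D', and r²; it is not the Arlequin implementation, the function names are hypothetical, and assigning a D' of exactly 0.5 to the moderate class is an assumption, since the thresholds above leave that boundary open.

```python
def pairwise_ld(p_ab, p_a, p_b):
    """Return (D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1
          and allele B at locus 2
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    # D is normalised by its maximum attainable magnitude given the
    # allele frequencies; the bound depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = (d * d) / denom if denom > 0 else 0.0
    return d_prime, r_squared


def classify_d_prime(d_prime):
    """Apply the low / moderate / high thresholds described in the text."""
    if d_prime > 0.75:
        return "high"
    if d_prime >= 0.5:  # assumption: 0.5 itself counts as moderate
        return "moderate"
    return "low"


# Hypothetical frequencies, for illustration only (not the study data).
d_prime, r_squared = pairwise_ld(p_ab=0.45, p_a=0.5, p_b=0.5)
print(classify_d_prime(d_prime), round(d_prime, 3), round(r_squared, 3))
```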

Results
Demographic data of the three groups are reported in Table 1. Significant differences were observed for age between the GC group and the GID group (P=0.01). Also, there was a greater number of men in the GC group (1.7 to 1 proportion). On the contrary, in the GID group there was a greater number of women (1.5 to 1 proportion) (P=0.003).

Genotypic and allelic frequencies
The genotypic frequencies of all five SNPs in the three groups showed that heterozygous genotypes had the highest frequencies (range: 0.45-0.58), and that the mutated genotypes had higher frequencies than the wild-type genotypes (ranges: 0.24-0.39 vs 0.07-0.31, respectively). We also observed a higher frequency of mutated than wild-type alleles in terms of allelic frequencies of the five SNPs except for GID (controls) in the rs2517951, rs2643195, and rs1058808 polymorphisms (Supplementary Table S1). HWE was observed in all groups for all SNPs (P>0.05).
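A check of HWE for a single biallelic SNP can be sketched as a chi-squared goodness-of-fit test with one degree of freedom, comparing observed genotype counts with those expected from the estimated allele frequencies. This is only one common way of testing HWE; the text does not state which test Arlequin applied, so the choice of test, the function name `hwe_chi_square`, and the counts in the usage lines are assumptions for illustration.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-squared goodness-of-fit test for Hardy-Weinberg equilibrium.

    n_aa, n_ab, n_bb: observed counts of the two homozygous genotypes
    and the heterozygous genotype (n_ab) for one biallelic SNP.
    Returns (chi2, p_value) with 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p                        # frequency of allele B
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)

    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-squared distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical genotype counts, for illustration only (not the study data).
chi2, p_value = hwe_chi_square(20, 52, 30)
print(f"chi2={chi2:.3f}, P={p_value:.3f}")
```

A P value above 0.05 is read, as in the text, as no detectable departure from equilibrium.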
The distribution of allelic and genotypic frequencies for each SNP was compared between the three studied groups; significant differences were observed in allelic frequencies between GC and GID groups of the rs2643194 (P=0.03) and rs2934971 (P=0.02) polymorphisms, as well as in allelic frequencies between GID and GMP groups of the rs2517951 (P=0.03), rs2643195 (P=0.03), rs2934971 (P=0.01), and rs1058808 (P=0.01), and in the genotypic frequencies for rs2934971 (P=0.005) and rs1058808 (P=0.01) (Supplementary Table S1).

Haplotypes
Haplotype analysis showed 11 of the 32 possible combinations in the three studied groups (Table 2). The GC and GMP groups had nine haplotypes each, while the GID group had five. The two most frequent haplotypes in the three groups were TTGTG (>46.7%) and CCAGC (>34.7%), while the remaining combinations were observed at frequencies less than 5%. A comparison of the distribution of haplotypes between the GMP vs the GID and GC groups did not show any significant differences (Table 2); however, the wild haplotype "CCAGC" was observed in a higher proportion in the GID group (P=0.02); the risk analysis for this haplotype showed an OR of 0.57 (95%CI 0.36-0.92, P=0.02).

Linkage disequilibrium (LD)
In the linkage analysis of the five sites, rs2643194, rs2517951, rs2643195, rs2934971, and rs1058808, the results showed statistically significant LD for the 10 pairs of loci in each of the three groups studied (D'>0.9415, r²>0.6649, and P<0.0001).

Comparison of frequencies in the general Mexican population with other populations
In general, the distribution of genotypic and allelic frequencies for all five polymorphisms in GMP was different from that in the East Asian population (P>0.05) while it was similar to the American population (P>0.05) (Table 3).

Discussion
ERBB2 is a gene frequently altered in different types of cancer. In gastric cancer, the alterations reported included mutations (n=34 different), and amplification and overexpression (up to 22.1% and 53.4% of cases, respectively); the latter are associated with poor patient survival (15). In this work, we present the results of the  (16), as well as in females, non-drinkers and non-smokers lung cancer Korean patients (10). Also, this SNP was associated with risk of prostate cancer in a Chinese population (17). rs2934971, also known as -1985 G>T, showed association with GC under a dominant model, since the mutated (TT) and heterozygote (GT) genotypes showed increased risk for GC (OR 2.41, P=0.043); however, this SNP was not related to breast or lung cancer in Korean patients (7,10). rs1058808 (c.3418 C>G), also known as P1170A, is a missense variation located in exon 31 of the ERBB2 gene, which results in a proline to alanine substitution in a C-terminal intracellular regulatory domain. This SNP showed an increased risk for gastric cancer with the mutated (GG) or heterozygote (CG) genotype (OR 2.21, P=0.046). Similarly, this SNP was previously associated with an increased risk for rectal cancer in TP53 positive tumors in American patients (OR 1.7, P=0.03) (18). As is known, the HER2 receptor lacks a ligand binding domain; therefore, it must form heterodimers with other receptors such as EGFR, HER3, or HER4 to carry out its function in signaling pathways such as mitogen-activated protein kinase (MAPK) and phosphatidylinositol-3 kinase (PI-3K) (4). In silico analysis with PolyPhen-2 software (http://genetics.bwh.harvard.edu/pph2/) showed a score of 0.953, indicating that P1170A is a damaging variant, which can impact the structure and function of the HER-2 protein.
Thus, this polymorphism, located in a region that encodes the intracellular regulatory domain, could modify the signaling pathways and consequently be involved in gastric carcinogenesis. Although our results suggest that these three polymorphisms could have a role in gastric carcinogenesis, additional studies are required to establish whether they modulate gene expression and, consequently, result in an imbalance between cell proliferation and apoptosis.
The SNPs rs2517951 and rs2643195 were not associated with GC in our population. However, some authors have observed an association of the rs2643195 SNP with protection against prostate cancer in Chinese patients (17), as well as of the rs2517951 SNP with an increased risk of endometrial cancer in Korean women with obesity (9).
Regarding the high LD observed in our study for the five SNPs (D'>0.941 and r²>0.665), other authors have also reported high LD within this genomic region, which covers 33.9 kb, spanning the entire coding region of the ERBB2 gene, with D' values >0.7 (5), >0.88 (7), and >0.89 (8). Neither haplotype was associated with a significant risk for GC; no author has found an association of these haplotypes with breast cancer (5,7,8).
The similarities in the frequencies for the rs2643194, rs2517951, rs2643195, rs2934971, and rs1058808 SNPs observed among the GMP and European, American, South Asian, and African populations (Table 3) are due to the high variability in the genetic component of the Mexican Mestizo population, which is an admixture of Spanish, Native American, and African genes (19). Finally, the SNPs rs2643194, rs2934971, and rs1058808 of the ERBB2 gene were associated with increased risk for GC in the Mexican population. It would be interesting to study these markers and their role in such diseases in other populations in greater depth, and to determine the expression levels of the HER2 receptor in this group of alterations.
