Abstract
This genetic association study including 120 patients with type 2 diabetes mellitus (T2DM) and 166 non-diabetic individuals aimed to investigate the association of polymorphisms in the genes GSTM1 and GSTT1 (gene deletion), GSTP1 (rs1695), ACE (rs4646994), ACE2 (rs2285666), VEGF-A (rs28357093), and MTHFR (rs1801133) with the development of T2DM in the population of Goiás, Brazil. Additionally, the combined effects of these polymorphisms and the possible differences between sexes in susceptibility to the disease were evaluated. Finally, machine learning models were integrated to select the main risk characteristics for the T2DM diagnosis. Risk associations were found for the GSTT1-null genotype in the non-stratified sample and in females, and for the mutant C allele of the VEGF-A rs28357093 polymorphism in the non-stratified sample. Furthermore, an association of heterozygous (AG) and mutant (GG) GSTP1 genotypes was observed when combined with GSTT1-null. Machine learning approaches corroborated the results found. Therefore, these results suggest that GSTT1 and GSTP1 polymorphisms may contribute to T2DM susceptibility in a Brazilian sample.
T2DM; Genetic polymorphisms; Sex differences; Machine learning
Introduction
Type 2 diabetes mellitus (T2DM) is a complex and progressive metabolic disease characterized by high plasma glucose levels (1). Approximately 537 million individuals worldwide are diagnosed with diabetes mellitus (DM), and it is estimated that the number of people affected by DM will rise to 643 million by 2030 and to 783 million by 2045. In Brazil, 15.7 million diabetics were registered, placing the country sixth in the world ranking by number of people with diabetes (2).
Characterized by relative insulin deficiency and/or insulin resistance, T2DM can cause micro- and macrovascular complications. In addition, environmental, immunological, and genetic factors contribute significantly to the pathogenesis of the disease, the clinical course, and the manifestation of complications (1). Epidemiological studies also show sex differences in risk factors, clinical manifestations, prevention, diagnosis, response to treatment, and complications of DM. Therefore, sex differences need to be taken into account when conducting and reporting research on T2DM and in health planning, contributing to the development of specific interventions for both sexes (3).
The etiopathogenesis of T2DM is also frequently related to oxidative stress, a condition characterized by an imbalance between the production of free radicals and their correct detoxification (4). The GSTM1 (1p13.3) (NCBI Gene ID 2944) and GSTT1 (22q11.2) (NCBI Gene ID 2952) genes have a complete deletion polymorphism that inactivates these enzymes. The absence of these enzymes can cause a deficit in the cellular detoxification process, making tissues more susceptible to oxidative stress (5,6). In contrast, the GSTP1 gene (11q13.2) (NCBI Gene ID 2950) has a single nucleotide polymorphism (SNP) called A313G (rs1695), characterized by the exchange of adenine for a guanine in codon 105 of exon 5 (7).
Hyperglycemia also acts as an excitatory stimulus for the renin-angiotensin system (RAS). Polymorphisms in RAS components have been evaluated in genetic association studies related to T2DM and its complications, due to the action of these enzymes in inhibiting the insulin signaling pathway and glucose metabolism, in addition to the induction of oxidative stress, causing endothelial damage, inflammation, and vascular remodeling (8).
The ACE gene (17q23.3) (NCBI Gene ID 1636) has an insertion/deletion (I/D) polymorphism (rs4646994) characterized by the presence (I) or absence (D) of an Alu segment with 287 bp in intron 16 (9), while the ACE2 gene (Xp22.2) (NCBI Gene ID 59272) presents the SNP G8790A (rs2285666) in intron 3 (10). Studies have shown an association between this polymorphism and left ventricular hypertrophy, myocardial infarction, coronary disease, and hypertension in patients with metabolic syndrome (11-13).
Furthermore, studies have revealed increased plasma levels of vascular endothelial growth factor A (VEGF-A) in patients with DM, especially those with microvascular complications (14). In the VEGF-A gene (6p21.1) (NCBI Gene ID 7422), the SNP A-141C (rs28357093) stands out, characterized by the replacement of an adenine by a cytosine in the position -141 in the promoter region of the gene (15,16), which may interfere with the processes of angiogenesis, vasculogenesis, and vascular permeability.
Additionally, due to its role in cellular metabolism, the folate cycle, also known as one-carbon metabolism, has been considered an important research target on T2DM (17). This cycle includes several interconnected metabolic pathways and is responsible for homocysteine metabolism and DNA methylation (18). The MTHFR gene (1p36.6) (NCBI Gene ID 4524) encodes the MTHFR enzyme, responsible for the conversion of 5,10-methylenetetrahydrofolate (5,10-MTHF) into 5-methyltetrahydrofolate (5-MTHF), the form of circulating folate and the main methyl radical donor for homocysteine remethylation (19). The C677T SNP (rs1801133) in this gene causes the replacement of the amino acid alanine by valine at position 222 (20).
These underlying pathophysiological mechanisms, such as inflammation, endothelial dysfunction, and oxidative stress, are observed in the pathogenesis of T2DM. Pancreatic β cells are sensitive to free radicals, due to reduced levels of antioxidant compounds, such as glutathione peroxidase, catalase, and superoxide dismutase (21). Thus, researchers seek to understand the role of genetic polymorphisms in genes related to the oxidative stress pathway in the susceptibility and development of T2DM. Oxidative stress has been evaluated in its relationship with T2DM as a unifier of several cellular damage pathways in a hyperglycemic state (4), allowing the development of studies that evaluate several genes related to this condition.
Therefore, this study was designed to investigate the association of polymorphisms in the genes GSTM1 (gene deletion), GSTT1 (gene deletion), GSTP1 (rs1695), ACE (rs4646994), ACE2 (rs2285666), VEGF-A (rs28357093), and MTHFR (rs1801133) and their combined effects on the development of T2DM in the population of Goiás, Brazil. In addition, a possible difference between sexes in susceptibility to the disease was evaluated. Finally, machine learning models were integrated to select and classify the main risk characteristics for T2DM diagnosis in the sample.
Material and Methods
Subjects
A total of 120 patients diagnosed with T2DM treated at the Clinical Hospital of the Faculty of Medicine of the Federal University of Goiás (UFG), Brazil, were selected as the case group. The control group consisted of 166 individuals without a diagnosis of DM selected from the Clinical Analysis and Health Education Laboratory (LACES) at UFG, Goiás, Brazil. Inclusion criteria were established according to the Strengthening the Reporting of Genetic Association Studies (STREGA) guidelines for improved reporting of genetic association studies (22).
The selection criteria for the study groups were: a) Inclusion criteria: individuals aged between 30 and 90 years who underwent periodic clinical and laboratory monitoring during the data collection period; b) Exclusion criteria: patients who did not undergo laboratory monitoring, individuals who did not agree to participate in the study, and patients with DM (for the control group).
Clinical and biochemical data of the patients were collected from available medical records. Additional information on life habits, occupational history, general health conditions, previous diseases, and other anamnesis data was obtained through a questionnaire. Patients who reported smoking for more than a year before the diagnosis were considered smokers, and alcohol intake was recorded for those reporting regular consumption of alcoholic beverages.
This study was conducted following the guidelines of the Research Ethics Committee of the UFG (No. 1952011) and the ethical principles for medical research involving human subjects of the World Medical Association Declaration of Helsinki. All participants provided written informed consent.
DNA extraction and quantification
Peripheral blood samples were collected in tubes containing heparin and stored at -80°C. DNA extraction was performed using the PureLink™ Genomic DNA Mini Kit (Invitrogen, USA) following the manufacturer's suggested protocol. The genomic material was evaluated and quantified with a NanoDrop™ ND-1000 spectrophotometer (ThermoFisher®, USA).
Genotyping of polymorphisms in the GSTM1, GSTT1, and ACE genes
The genotyping of the polymorphisms in the GSTM1, GSTT1, and ACE genes was performed by Multiplex Real-Time Polymerase Chain Reaction (qPCR) using the fluorophore SYBR® Green I (Sso Advanced™ Universal SYBR® Green Supermix, Bio-Rad, USA), with discrimination of null/present and insertion/deletion genotypes by the analysis of the melting curves generated after amplification.
For the analysis of the polymorphisms in GSTM1 and GSTT1, the co-amplification of the RH92600 region, a microsatellite region used as an endogenous control of the reaction, was also performed. The primers and cycling protocols used for the amplification of GSTM1/GSTT1 and ACE were previously suggested (23,24).
Genotyping of polymorphisms in the GSTP1, MTHFR, VEGF-A, and ACE2 genes
Genotyping of polymorphisms in the GSTP1, MTHFR, VEGF-A, and ACE2 genes was performed using polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP). The PCR amplification products of the above genes were digested with the restriction enzymes Alw26I (BsmAI), HinfI, HhaI, and AluI, respectively.
Subsequently, the digested fragments were visualized on an 8-15% polyacrylamide gel and stained with a 4-g/L silver nitrate solution. The primers and cycling protocols used for the above genes were reported by Harries et al. (25), Keku et al. (26), Holt et al. (15), and Benjafield et al. (27), respectively. The respective genotypes identified after enzymatic restriction are described in Table 1.
Statistical analysis
The statistical analysis was conducted in three stages: the first involved a descriptive and comparative analysis of demographic and clinical variables in relation to T2DM; the second applied genetic models (codominant, dominant, recessive, and overdominant) through binomial logistic regression; and the third applied supervised machine learning models.
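The four inheritance models named above amount to four ways of recoding the same genotype before it enters the regression. The following is a minimal sketch of that recoding for a biallelic SNP; the function name `encode_genotype` and the 0/1/2 coding are illustrative assumptions, not the authors' code.

```python
# Hypothetical helper (not from the study): recode one biallelic genotype
# under the four inheritance models used in genetic association analysis.
def encode_genotype(genotype: str, ref: str, alt: str) -> dict:
    """Return the category of a genotype such as "AG" under each model.

    ref is the wild-type allele and alt is the variant (mutant) allele.
    """
    if len(genotype) != 2 or any(a not in (ref, alt) for a in genotype):
        raise ValueError(f"unexpected genotype: {genotype!r}")
    n_alt = sum(1 for allele in genotype if allele == alt)
    return {
        # Codominant: three separate categories (0, 1, or 2 variant alleles).
        "codominant": n_alt,
        # Dominant: carriers of at least one variant allele vs. wild homozygotes.
        "dominant": int(n_alt >= 1),
        # Recessive: variant homozygotes vs. all other genotypes.
        "recessive": int(n_alt == 2),
        # Overdominant: heterozygotes vs. both homozygotes pooled.
        "overdominant": int(n_alt == 1),
    }


# Example with GSTP1 rs1695 (A = wild-type allele, G = variant allele).
for g in ("AA", "AG", "GG"):
    print(g, encode_genotype(g, ref="A", alt="G"))
```

Under this coding, the dominant model contrasts AA with AG+GG and the overdominant model contrasts AG with AA+GG, which matches the genotype groupings reported for GSTP1 in the combined analysis.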
In the first stage, multivariate principal component analysis (PCA) was used to evaluate the profile and interrelationship between the control and case groups. Subsequently, Student's t-test was used for quantitative variables, and Fisher's exact test was used for qualitative variables to assess differences in demographic and clinical variables between the case and control groups. In this first stage, a significance level of 0.05 was considered.
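As an illustration of the test applied to qualitative variables, the sketch below computes a two-sided Fisher's exact P-value for a 2x2 table using only the standard library. The function name and the example counts are hypothetical; the study's own analysis was run in R and Python and its code is not shown in the text.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one.
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_obs * (1 + 1e-9)  # small tolerance for floating-point ties
    )


# Hypothetical counts: rows = case/control, columns = exposed/unexposed.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(f"P = {p_value:.4f}")
```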
In the second stage, the genotypes were classified under the codominant, dominant, recessive, and overdominant models and analyzed by binary logistic regression (Equation 1), in which π(x) is the probability of success (belonging to the case group, coded 1, rather than the control group, coded 0) given the value of the explanatory variable X (genotype), and β0 and βi are the parameters of the logistic regression:

$$\pi(x) = \frac{e^{\beta_0 + \beta_i X}}{1 + e^{\beta_0 + \beta_i X}} \qquad (1)$$

Subsequently, using the estimated parameters, the odds ratio (OR) (Equation 2) and the confidence intervals (CI) with a confidence level of 95% (Equation 3) were calculated:

$$OR = e^{\beta_i} \qquad (2)$$

$$OR_{LL} = e^{\beta_i - Z_{\alpha/2}\,\sigma_{\beta_i}}, \qquad OR_{UL} = e^{\beta_i + Z_{\alpha/2}\,\sigma_{\beta_i}} \qquad (3)$$

in which OR_LL is the lower limit of the OR, OR_UL is the upper limit of the OR, Z is the value of the standard normal distribution, α is the significance level, and σ_βi is the standard deviation of the estimated parameter βi.
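The relation between a logistic regression coefficient, the OR, and its 95% confidence limits can be sketched as follows. This is an illustration under stated assumptions: the function names and the counts are hypothetical, and the coefficient is derived from a 2x2 table, which for a single binary predictor gives the same estimate and standard error as fitting the logistic model.

```python
from math import exp, log, sqrt

Z_975 = 1.959964  # standard normal quantile for a two-sided 95% interval


def beta_from_2x2(case_exp: int, case_unexp: int, ctrl_exp: int, ctrl_unexp: int):
    """Log odds ratio and its standard error (Woolf's formula) from a 2x2 table."""
    beta = log((case_exp * ctrl_unexp) / (case_unexp * ctrl_exp))
    se = sqrt(1 / case_exp + 1 / case_unexp + 1 / ctrl_exp + 1 / ctrl_unexp)
    return beta, se


def odds_ratio_with_ci(beta: float, se: float, z: float = Z_975):
    """Return (OR, lower limit, upper limit) from a coefficient and its SE."""
    return exp(beta), exp(beta - z * se), exp(beta + z * se)


# Hypothetical counts (not taken from the study's tables):
# 20 exposed and 80 unexposed cases, 10 exposed and 90 unexposed controls.
beta, se = beta_from_2x2(20, 80, 10, 90)
odds_ratio, lower, upper = odds_ratio_with_ci(beta, se)
print(f"OR={odds_ratio:.2f}, 95% CI={lower:.2f}-{upper:.2f}")
```

An interval that includes 1 indicates that the association is not significant at the 5% level, which is how the reported OR and CI pairs can be read.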
Furthermore, for model diagnostics, the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) were used according to Burnham and Anderson (28). Subsequently, the data were stratified by sex and the genetic models were fitted again using logistic regression.
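For reference, both criteria are simple functions of the maximized log-likelihood, the number of estimated parameters, and (for BIC) the sample size; lower values indicate a better trade-off between fit and complexity. The sketch below uses the standard definitions; the log-likelihood values are hypothetical, and only the sample size of 286 (120 cases plus 166 controls) comes from the study.

```python
from math import log


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian Information Criterion: k ln(n) - 2 ln(L)."""
    return n_params * log(n_obs) - 2 * log_likelihood


# Hypothetical log-likelihoods for two inheritance models fitted to n = 286.
candidates = {
    "codominant": {"loglik": -188.4, "k": 3},  # intercept + two genotype terms
    "dominant": {"loglik": -189.0, "k": 2},    # intercept + one genotype term
}
for name, fit in candidates.items():
    print(
        name,
        round(aic(fit["loglik"], fit["k"]), 2),
        round(bic(fit["loglik"], fit["k"], 286), 2),
    )
```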
Additionally, to perform an initial screening of the interactions between all the SNPs evaluated in the study with T2DM, the chi-squared test was applied with a significance level of 0.10. Subsequently, all SNPs with a P-value less than 0.10 were subjected to a binary logistic regression analysis with a significance level of 0.05, since this approach has greater statistical power compared to the chi-squared association test. Therefore, the results were interpreted based on the logistic regression analysis.
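The two-step screening described above can be sketched as a filter on chi-squared P-values. The example below is a simplification that assumes 2x2 tables (one degree of freedom, no continuity correction); the marker names and counts are hypothetical, and genotype tables with three categories would need the chi-squared distribution with more degrees of freedom.

```python
from math import erfc, sqrt


def chi2_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-squared statistic and P-value for [[a, b], [c, d]] (1 df)."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-squared distribution with 1 degree of freedom.
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value


def screen(tables: dict, threshold: float = 0.10) -> list:
    """Return the markers whose screening P-value is below the threshold."""
    return [name for name, cells in tables.items() if chi2_2x2(*cells)[1] < threshold]


# Hypothetical tables: (case carriers, case non-carriers,
#                       control carriers, control non-carriers).
tables = {
    "marker_A": (30, 70, 15, 85),
    "marker_B": (50, 50, 48, 52),
}
print(screen(tables))  # markers passed on to logistic regression
```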
In the third stage, the data were modeled with supervised machine learning to identify variables that helped in the diagnosis of T2DM. Initially, the database was randomly split into training (70.00%) and testing (30.00%) sets, and then the training data were used for machine learning, using the following models: logistic regression (LR), classification and regression tree (CART), K-nearest neighbors (KNN), support vector machine (SVM), and random forest (RF) (29-32). Additionally, to avoid possible sample selection biases in the training set, cross-validation was performed. To verify the importance of each covariate in the database for the models, the permutation test with accuracy score was used. Subsequently, the supervised models were evaluated in two complementary ways: by calculating accuracy, precision, and recall from the confusion matrix and by estimating the area under the curve (AUC) of the receiver operating characteristic (ROC) curve. Thus, for selecting the best models, accuracy, precision, recall, and AUC values of the ROC curve close to 1 were evaluated.
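The split and the confusion-matrix metrics described above can be sketched without any external library. The function names, the random seed, and the example labels are assumptions made for illustration; the exact split and software calls used in the study are not given in the text.

```python
import random


def train_test_split_indices(n: int, test_fraction: float = 0.30, seed: int = 0):
    """Randomly split row indices 0..n-1 into training and testing sets."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]


def classification_metrics(y_true, y_pred, positive=1) -> dict:
    """Accuracy, precision, and recall computed from the confusion matrix."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# 286 participants (120 cases + 166 controls) split 70/30.
train_idx, test_idx = train_test_split_indices(286)
print(len(train_idx), len(test_idx))

# Hypothetical labels and predictions (1 = case, 0 = control).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
print(classification_metrics(y_true, y_pred))
```

Precision answers how many predicted cases are true cases, while recall answers how many true cases were detected, so a model can score well on one and poorly on the other.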
All analyses were performed with the aid of spreadsheets, using R software version 4.0.2 and Python version 3.12.2.
Results
Clinical data
A total of 120 T2DM patients and 166 individuals without DM were evaluated considering demographic, clinical, and laboratory characteristics (Tables 2 and 3). The mean age of the case and control groups was 60.47±9.87 and 57.89±9.91 years, respectively, with a significant difference between groups (P=0.03). Both groups were predominantly composed of women; however, there was no statistical difference for this variable (P=0.06) (Table 2).
Table 2. Demographic and clinical characterization of diabetic (case) and non-diabetic (control) groups.
Furthermore, alcohol intake and smoking showed no statistical difference between groups (P=0.77 and P=0.17, respectively) (Table 2). Statistically significant differences (P<0.05) between the groups were found for fasting glycemia, cholesterol, triglycerides, high-density lipoprotein cholesterol (HDL-C), low-density lipoprotein cholesterol (LDL-C), very low-density lipoprotein cholesterol (VLDL-C), body mass index (BMI), and blood pressure (Table 3).
Table 3. Comparison of clinical parameters between diabetic (case) and non-diabetic (control) groups.
Genetic analysis
The PCA revealed that component 1 (PC1) explains 32.70% of the data variability, while component 2 (PC2) explains 13.30%, so that together components 1 and 2 explain 46.00% of the variability of the data set. Furthermore, the case and control groups did not present a relevant contrast, since the limits of the clusters of each group overlapped. Thus, this small variation between groups allowed using PCA as a covariate and resizing the data (Figure 1). However, there was greater variability in the group of patients with T2DM when compared to the control group, that is, the control group (without T2DM) presented greater homogeneity in clinical and demographic data compared to the case group (Figure 1). Therefore, this heterogeneity in the case group may be related to T2DM, as in individual comparisons between groups, statistical differences were observed for fasting glycemia, cholesterol, triglycerides, HDL-C, LDL-C, VLDL-C, systolic pressure, diastolic pressure, BMI, and creatinine (Table 3).
The identification of polymorphisms in the GSTM1, GSTT1, and ACE genes by qPCR and in GSTP1, MTHFR, VEGF-A, and ACE2 by PCR-RFLP is detailed in Supplementary Figures S1 to S6. The genotype and allele frequencies for the different models of inheritance in the case and control groups are presented in Table 4. The GSTT1-null genotype was associated with a 3.16-fold increased risk of developing T2DM (P=0.000267), and the frequency of the mutant C allele of VEGF-A rs28357093 differed significantly between groups (P=0.048).
Table 4. Genotypic and allele frequencies of the investigated polymorphisms and the association between diabetic (case) and non-diabetic (control) groups, using models of inheritance.
Moreover, a reduced risk of disease development was observed with the presence of the AG genotype of the GSTP1 rs1695 polymorphism in the codominant (OR=0.533, CI=0.299-0.940, P=0.030) and overdominant models (OR=0.516, CI=0.300-0.881, P=0.016), and AA of the ACE2 rs2285666 in the recessive model (OR=0.371, CI=0.130-0.919, P=0.042).
Supplementary Table S1 shows the genotype and allele frequencies for the different models of inheritance in both groups stratified by sex. Women with the GSTT1-null genotype demonstrated a 3.66-fold increased risk of developing T2DM (P=0.001). Additionally, a reduced risk of developing the disease in women was observed with the presence of the AG genotype of the GSTP1 rs1695 polymorphism in the codominant (OR=0.473, CI=0.23-0.94, P=0.035) and overdominant models (OR=0.479, CI=0.24-0.91, P=0.027).
The selection of polymorphisms in different genes for combined analysis is shown in Supplementary Table S2. Polymorphisms in GSTP1, ACE2, and GSTT1 in the codominant model, GSTP1 and GSTT1 in the dominant and overdominant models, and polymorphisms in ACE2 and GSTT1 in the recessive model were selected.
After screening, the genotypic combinations were analyzed with the codominant, dominant, recessive, and overdominant models of inheritance (Table 5). In the codominant model, the combination of GSTP1-wild (AA) and GSTT1-null genotypes revealed a significant association with the development of T2DM (OR=3.70, CI=1.23-13.83, P=0.02). A risk association was also found for the ACE2-wild (GG) and GSTT1-null genotypic combination (OR=4.48, CI=1.67-14.30, P<0.01) (Table 5).
Table 5. Frequency distribution of genotypic combinations and risk analysis in diabetic and non-diabetic groups according to inheritance models.
In the dominant model, risk associations were demonstrated in the genotypic combinations GSTP1-wild (AA) and GSTT1-null (OR=2.56, CI=1.06-6.54, P=0.04) and GSTP1-heterozygous+mutant (AG+GG) and GSTT1-null (OR=5.76, CI=1.96-21.17, P<0.01). In the recessive model, the genotypic combination of ACE2-wild (GG) with GSTT1-null (OR=3.57, CI=1.06-16.24, P=0.05) or present (OR=9.30, CI=2.45-46.34, P<0.01) demonstrated an association with the development of T2DM (Table 5).
The overdominant model showed association in all combinations performed: GSTP1-wild+mutant (AA+GG) and GSTT1-null (OR=3.22, CI=1.13-10.09, P=0.03), GSTP1-heterozygous (AG) and GSTT1-present (OR=1.96, CI=1.07-3.64, P=0.02), and GSTP1-heterozygous (AG) and GSTT1-null (OR=5.06, CI=2.01-14.11, P<0.01) (Table 5).
Machine learning approaches
Seeking an approach to predict T2DM from explanatory variables, five machine learning models were tested (LR, CART, KNN, SVM, and RF). There are few studies on the application of machine learning in the biological and medical areas, and the available work shows great divergence in the models used. Therefore, we used five widely applied supervised machine learning models to screen for the best predictive model, a common approach for machine learning model selection.
The selection of the best model was based on the accuracy, precision, recall, and AUC parameters of the ROC curve. The AUC of the ROC curve revealed the ability of the models to distinguish classes (case and control). Models with 100% wrong predictions have an AUC=0, models with 100% correct predictions have an AUC=1, and an AUC of 0.5 corresponds to chance-level discrimination. On this basis, we confirmed that the CART and RF models had the best fit to the training data in all inheritance models (Figure 2).
Figure 2. Receiver operating characteristic (ROC) curve plots of the ability to distinguish classes. A, codominant model of inheritance; B, dominant model of inheritance; C, recessive model of inheritance; D, overdominant model of inheritance. LR: logistic regression; CART: classification and regression tree; KNN: K-nearest neighbors; SVM: support vector machine; RF: random forest.
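The AUC has a direct probabilistic reading that may help interpret the ROC plots: it equals the probability that a randomly chosen case receives a higher predicted score than a randomly chosen control, with ties counted as one half. The sketch below computes it that way; the function name and the scores are hypothetical and are not outputs of the study's models.

```python
def roc_auc(y_true, scores) -> float:
    """AUC as the fraction of (case, control) pairs ranked correctly."""
    cases = [s for t, s in zip(y_true, scores) if t == 1]
    controls = [s for t, s in zip(y_true, scores) if t == 0]
    if not cases or not controls:
        raise ValueError("both classes must be present")
    # A pair counts 1 when the case scores higher, 0.5 when the scores tie.
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


# Hypothetical predicted probabilities of being a case.
y_true = [0, 0, 1, 1]
print(roc_auc(y_true, [0.1, 0.2, 0.8, 0.9]))   # perfect separation
print(roc_auc(y_true, [0.9, 0.8, 0.2, 0.1]))   # every pair ranked wrongly
print(roc_auc(y_true, [0.5, 0.5, 0.5, 0.5]))   # no discrimination
```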
The training phase revealed the importance of each explanatory variable within the models used (LR, CART, KNN, SVM, and RF) (Figures 3 and 4, and Supplementary Figures S7 to S9); the fasting glycemia variable was excluded to avoid confounding bias, since it is itself a diagnostic criterion for T2DM. Table 6 shows the performance of the models based on accuracy, precision, and recall in the training data. Values closer to 1 indicate the most appropriate model.
Figure 3. Permutation test with the classification and regression tree (CART) model for each inheritance model. A, codominant; B, dominant; C, recessive; D, overdominant. ACE: angiotensin converting enzyme; BMI: body mass index; Cod: codominant; Col: cholesterol; Cre: creatinine; Dom: dominant; GSTM1: glutathione S-transferase mu 1; GSTP1: glutathione S-transferase pi 1; GSTT1: glutathione S-transferase theta 1; HDL: high-density lipoprotein; ID: insertion/deletion; LDL: low-density lipoprotein; MTHFR: methylenetetrahydrofolate reductase; Over: overdominant; PD: diastolic pressure; PS: systolic pressure; Rec: recessive; TG: triglycerides; VEGF: vascular endothelial growth factor; VLDL: very low-density lipoprotein.
Figure 4. Permutation test with the random forest (RF) model for each inheritance model. A, codominant; B, dominant; C, recessive; D, overdominant. ACE: angiotensin converting enzyme; BMI: body mass index; Cod: codominant; Col: cholesterol; Cre: creatinine; Dom: dominant; GSTM1: glutathione S-transferase mu 1; GSTP1: glutathione S-transferase pi 1; GSTT1: glutathione S-transferase theta 1; HDL: high-density lipoprotein; ID: insertion/deletion; LDL: low-density lipoprotein; MTHFR: methylenetetrahydrofolate reductase; Over: overdominant; PD: diastolic pressure; PS: systolic pressure; Rec: recessive; TG: triglycerides; VEGF: vascular endothelial growth factor; VLDL: very low-density lipoprotein.
Table 6. Performance evaluation of machine learning models based on accuracy, precision, and recall in the training data for each inheritance model.
Therefore, the CART and RF models showed the best performance in the training data in all inheritance models (Table 6). However, caution is needed with closely fitted models, since they can fit the training data very well yet predict new values poorly (overfitting). Figures 3 and 4 show the permutation test performed for each inheritance model with the CART and RF models. The y-axis of the graphs shows the variables used and the x-axis measures the importance of each variable in the model.
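The idea behind a permutation test of variable importance can be sketched as follows: each variable is shuffled in turn, which breaks its link with the outcome, and the resulting drop in accuracy is taken as that variable's importance. This is a minimal illustration with a toy rule standing in for a fitted CART or RF model; the function names, the data, and the number of repeats are assumptions, not the study's implementation.

```python
import random


def accuracy(model, X, y) -> float:
    """Fraction of rows for which the model predicts the true label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats: int = 20, seed: int = 0) -> list:
    """Mean drop in accuracy when each column of X is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the data with only column j permuted.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy stand-in for a trained classifier: it uses only the first variable.
def toy_model(row) -> int:
    return int(row[0] > 0.5)


# Hypothetical data: column 0 is informative, column 1 is ignored by the model.
X = [[0.9, 3], [0.8, 7], [0.7, 1], [0.6, 5], [0.4, 2], [0.3, 8], [0.2, 6], [0.1, 4]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
print(permutation_importance(toy_model, X, y))
```

A variable the model never uses gets an importance of exactly zero, while shuffling a variable the model relies on lowers accuracy, which is what the horizontal bars in the permutation plots express.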
In the CART model, the main variables found were blood pressure, HDL-C, and triglycerides in the codominant inheritance model; blood pressure, HDL-C, and cholesterol in the dominant inheritance model; blood pressure, triglycerides, and HDL-C in the recessive inheritance model; and systolic pressure, VLDL-C, triglycerides, and HDL-C in the overdominant inheritance model (Figure 3). In the RF model, the main variables identified were blood pressure, HDL-C, and creatinine in all inheritance models (Figure 4).
In the testing phase, each model was validated with the accuracy, precision, and recall parameters in each inheritance model (Table 7). For accuracy, the CART and RF models were the most adequate in all inheritance models; for precision, the LR and RF models received the best scores; and for recall, the SVM, KNN, and RF models had the best fit to the data. The RF model had the best fit to the data set in general, showing higher values in all evaluated parameters.
Table 7. Performance evaluation of machine learning models based on accuracy, precision, and recall in the test data for each inheritance model.
Discussion
T2DM is a polygenic and multifactorial disease diagnosed in 90-95% of diabetics, being considered the most common form of diabetes in individuals over 45 years of age (1). Analyses revealed significant differences between groups for age, fasting glycemia, cholesterol, triglycerides, HDL-C, LDL-C, VLDL-C, BMI, and blood pressure (Tables 2 and 3).
These results agreed with scientific reports, which consider advanced age, obesity, sedentary lifestyle, and presence of metabolic syndrome components, such as arterial hypertension and dyslipidemia, as main risk factors for the development of T2DM (1). Thus, the differences found may reflect the degree of metabolic decompensation observed in T2DM patients (4).
In this study, genetic analysis revealed that the GSTT1-null genotype is associated with a statistically significant 3.16-fold higher risk of T2DM (Table 4). The combined analysis of the genotypes (Table 5) corroborated the previously found associations. The GSTT1-null genotype was associated with disease development in all inheritance models. Additionally, the risk association found in the genotypic combinations GSTT1-null + GSTP1-heterozygote+mutant (AG+GG) in the dominant model, and GSTT1-null + GSTP1-heterozygote (AG) in the overdominant model was highlighted. Previous studies have also found an association of GSTT1 and GSTP1 polymorphisms with the development of T2DM both individually and in combination (33-35).
Glutathione S-transferases (GSTs) conjugate reduced glutathione with reactive compounds. Consequently, these compounds become more hydrophilic and are excreted. Probably due to the action of GSTs in cellular detoxification, individuals with deletion of these genes may have reduced antioxidant defenses. The enzymes of this family also act in other intracellular mechanisms, such as cell replication, modulation of signaling pathways, apoptosis, and drug resistance (5). Meta-analyses report the GSTT1 polymorphism as a risk factor for T2DM and its complications (33,34). Furthermore, the presence of this gene is reported to be a protective factor against the development of the disease (21).
A significant difference was also found for the mutant C allele of VEGF-A rs28357093 polymorphism (Table 4). This is the first study to evaluate this SNP in the pathogenesis of T2DM and there are no studies evaluating the effect of the SNP on gene expression and protein synthesis. However, according to the literature, the control of VEGF-A expression is essential for the maintenance of pancreatic islet vessels and for glucose homeostasis (36).
VEGF-A signaling from β cells to endothelial cells maintains the endocrine vasculature, while the opposite signaling acts on pancreatic development, insulin secretion, β cell proliferation, and adequacy of cell mass to metabolic changes (36). Studies have reported increased levels of VEGF-A in T2DM patients and a risk association of this gene with the development of microvascular complications (14).
Additionally, the analysis demonstrated a protective association of the AG genotype of the GSTP1 rs1695 polymorphism in the codominant and overdominant models and of the AA genotype of ACE2 rs2285666 in the recessive model (Table 4). In contrast to this study, the heterozygous (AG) genotype of the rs1695 polymorphism in the GSTP1 gene (35) and the mutant (AA) genotype of the ACE2 rs2285666 polymorphism (37) were related to a higher risk of developing T2DM. There were no significant associations with other analyzed variants.
Genetic analyses were also performed with the sample stratified by sex. Results showed an increased risk in women with the GSTT1-null genotype, and a decreased risk with the AG genotype of the GSTP1 rs1695 polymorphism in the codominant and overdominant models of inheritance (Supplementary Table S1), in agreement with the associations found in the analyses with both sexes. We evaluated 83 women in the case group and 96 in the control group, so caution should be taken when interpreting the results due to the small sample and the higher frequency of women in both groups compared to men.
Rao et al. (38) reported that the combination of GSTM1-null and GSTT1-null genotypes confers a higher risk of developing T2DM in women than in men, and that diabetic women had lower levels of GST activity. Associations between insulin resistance biomarkers, metabolic syndrome, inflammation, and endothelial dysfunction were also described more often in women, since diabetic women are subject to greater changes in coagulation, inflammation, and vascular function (3).
Although medical interest in these specific differences between sexes is increasing, the underlying mechanisms that influence this differentiation are not fully elucidated. Therefore, the need to include a perspective of differences between sexes in conducting and reporting research on T2DM and in health planning is evident, helping in the elaboration of specific interventions for men and women (1,3).
Polymorphisms in genes of the GST family affect antioxidant defenses, favoring the onset of oxidative stress with increased production of free radicals (39). Therefore, these genetic variants seem promising for predicting the development of T2DM, and additional studies are needed to elucidate the molecular mechanisms underlying these polymorphisms and evaluate their interactions in T2DM and disease complications.
Additionally, clinical and genetic data were used to apply supervised machine learning, which consists of algorithms that allow the computer to learn from the available data. From the cross-validation (training and testing stage) the machine's performance is evaluated, measuring its effectiveness in predicting results based on the rules it learned in the training stage. These models are a powerful alternative to conventional statistical tests, providing greater flexibility in describing complex data (40). In this study, the method was used to determine which genetic and/or environmental variables lead to increased susceptibility of an individual to T2DM.
The performance evaluation revealed that the CART and RF models fitted the training data set best, and the RF model fitted the test data set best (Tables 6 and 7). In the CART model, the variables blood pressure, HDL-C, VLDL-C, triglycerides, and cholesterol were the most relevant for predicting the phenotype. In the RF model, the variables blood pressure, HDL-C, LDL-C, and creatinine stood out. Thus, supervised machine learning approaches corroborate the clinical data found in this study.
The use of machine learning methods to predict diagnoses is increasing due to their applicability to different areas of health care. However, as this is a new approach, there is a lack of studies on the application of these models to predict human diseases, highlighting the importance of new reproducible studies in the general population to optimize the methods to predict the outcome in different populations.
Therefore, understanding the genetic factors associated with the development of T2DM in the Brazilian population can elucidate the role of molecular mechanisms in the susceptibility to the disease. The importance of these studies in Brazil is also highlighted for building knowledge about Brazilian genetics, which is greatly influenced by the mixture of ancestries not found in other populations. Furthermore, this study sought to contribute to the identification of possible biomarkers of susceptibility to T2DM, which may inform prognostic criteria and specific health interventions for both sexes.
Supplementary Material
Supplementary Tables S1 and S2 and Supplementary Figures S1 to S9 are available as a PDF file.
Acknowledgments
The authors would like to thank all individuals who voluntarily participated in this study and the Clinical Hospital of the Faculty of Medicine of the Federal University of Goiás (UFG) for providing samples for this study.
Funding
This work was supported by Fundação de Amparo à Pesquisa no Estado de Goiás (FAPEG; DOCFIX grant number: 201510267000195 to A.A.S. Reis) and Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq; grant number: 448905/2014-0 to A.A.S. Reis), Brazil. K.F. Santos received financial support from Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES; Finance Code 001).
References
1 American Diabetes Association. Classification and Diagnosis of Diabetes: Standards of Medical Care in Diabetes-2018. Diabetes Care 2018; 41: S13-S27, doi: 10.2337/dc18-S002.
2 International Diabetes Federation (IDF). IDF Diabetes Atlas. 10th ed, Brussels, 2021. Available at: <https://www.diabetesatlas.org>
3 Day S, Wu W, Mason R, Rochon PA. Measuring the data gap: inclusion of sex and gender reporting in diabetes research. Res Integr Peer Rev 2019; 4: 9, doi: 10.1186/s41073-019-0068-4.
4 Yaribeygi H, Sathyapalan T, Atkin SL, Sahebkar A. Molecular mechanisms linking oxidative stress and Diabetes Mellitus. Oxid Med Cell Longev 2020; 2020: 8609213, doi: 10.1155/2020/8609213.
5 Allocati N, Masulli M, Di Ilio C, Federici L. Glutathione transferases: substrates, inhibitors and pro-drugs in cancer and neurodegenerative diseases. Oncogenesis 2018; 7: 8, doi: 10.1038/s41389-017-0025-3.
6 Suthar PC, Purkait P, Uttaravalli K, Sarkar BN, Ameta R, Sikdar M. Glutathione S-transferase M1 and T1 null genotype frequency distribution among four tribal populations of western India. J Genet 2018; 97: 11-24, doi: 10.1007/s12041-018-0888-x.
7 Wang M, Li Y, Lin L, Song G, Deng T. GSTM1 null genotype and GSTP1 ILE105Val polymorphism are associated with Alzheimer's disease: a meta-analysis. Mol Neurobiol 2016; 53: 1355-1364, doi: 10.1007/s12035-015-9092-7.
8 Hsiao CF, Sheu WWH, Hung YJ, Lin MW, Curb D, Ranadex K, et al. The effects of the renin-angiotensin-aldosterone system gene polymorphisms on insulin resistance in hypertensive families. J Renin-Angiotensin-Aldosterone Syst 2012; 13: 446-454, doi: 10.1177/1470320312438790.
9 Rigat B, Hubert C, Alhenc-Gelas F, Cambien F, Corvol P, Soubrier F. An insertion/deletion polymorphism in the angiotensin I-converting enzyme gene accounting for half the variance of serum enzyme levels. J Clin Invest 1990; 86: 1343-1346, doi: 10.1172/JCI114844.
10 Rahimi Z. The role of renin angiotensin aldosterone system genes in diabetic nephropathy. Can J Diabetes 2016; 40: 178-183, doi: 10.1016/j.jcjd.2015.08.016.
11 Lieb W, Graf J, Götz A, König IR, Mayer B, Fischer M, et al. Association of angiotensin-converting enzyme 2 (ACE2) gene polymorphisms with parameters of left ventricular hypertrophy in men: results of the MONICA Augsburg echocardiographic substudy. J Mol Med (Berl) 2006; 84: 88-96, doi: 10.1007/s00109-005-0718-5.
12 Yang W, Huang W, Su S, Li B, Zhao W, Chen S, et al. Association study of ACE2 (angiotensin I-converting enzyme 2) gene polymorphisms with coronary heart disease and myocardial infarction in a Chinese Han population. Clin Sci (Lond) 2006; 111: 333-340, doi: 10.1042/CS20060020.
13 Zhong J, Yan Z, Liu D, Ni Y, Zhao Z, Zhu S, et al. Association of angiotensin-converting enzyme 2 gene A/G polymorphism and elevated blood pressure in Chinese patients with metabolic syndrome. J Lab Clin Med 2006; 147: 91-95, doi: 10.1016/j.lab.2005.10.001.
14 Zhang Q, Fang W, Ma L, Wang ZD, Yang YM, Lu YQ. VEGF levels in plasma in relation to metabolic control, inflammation, and microvascular complications in type-2 diabetes. Medicine (Baltimore) 2018; 97: e0415, doi: 10.1097/MD.0000000000010415.
15 Holt RCL, Ralph AS, Webb NJA, Watson CJ, Clark AGB, Mathieson PW, et al. Steroid-sensitive nephrotic syndrome and vascular endothelial growth factor gene polymorphisms. Eur J Immunogenet 2003; 30: 1-3, doi: 10.1046/j.1365-2370.2003.00360.x.
16 da Costa CCP, de Lima NS, Bento DCP, Santos RS, Reis AAS. A strong association between VEGF-A rs28357093 and amyotrophic lateral sclerosis: a Brazilian genetic study. Mol Biol Rep 2022; 49: 9129-9133, doi: 10.1007/s11033-022-07647-z.
17 Fekih-Mrissa N, Mrad M, Ibrahim H, Akremi I, Sayeh A, Jaidane A, et al. Methylenetetrahydrofolate Reductase (MTHFR) (C677T and A1298C) polymorphisms and vascular complications in patients with type 2 diabetes. Can J Diabetes 2017; 41: 366-371, doi: 10.1016/j.jcjd.2016.11.007.
18 de Lima NS, da Costa CCP, Assunção LP, Santos KF, Bento DCP, Reis AAS, et al. One-carbon metabolism pathway genes and their non-association with the development of amyotrophic lateral sclerosis. J Cell Biochem 2022; 123: 620-627, doi: 10.1002/jcb.30208.
19 Cheng J, Tao F, Liu Y, Venners SA, Hsu YH, Jiang S, et al. Associations of methylenetetrahydrofolate reductase C677T genotype with blood pressure levels in Chinese population with essential hypertension. Clin Exp Hypertens 2018; 40: 207-212, doi: 10.1080/10641963.2017.1281937.
20 Li A, Shi Y, Xu L, Zhang Y, Zhao H, Li Q, et al. A possible synergistic effect of MTHFR C677T polymorphism on homocysteine level variations increased risk for ischemic stroke. Medicine (Baltimore) 2017; 96: e9300, doi: 10.1097/MD.0000000000009300.
21 Amer MA, Ghattas MH, Abo-Elmatty DM, Abou-El-Ela SH. Influence of glutathione S-transferase polymorphisms on type-2 diabetes mellitus risk. Genet Mol Res 2011; 10: 3722-3730, doi: 10.4238/2011.October.31.14.
22 Little J, Higgins JPT, Ioannidis JPA, Moher D, Gagnon F, von Elm E, et al. STrengthening the REporting of genetic association studies (STREGA)- An extension of the STROBE statement. Genet Epidemiol 2009; 33: 581-598, doi: 10.1002/gepi.20410.
23 Lin MH, Tseng CH, Tseng CC, Huang CH, Chong CK, Tseng CP. Real-time PCR for rapid genotyping of angiotensin-converting enzyme insertion/deletion polymorphism. Clin Biochem 2001; 34: 661-666, doi: 10.1016/S0009-9120(01)00281-8.
24 Santos KF, Azevedo RM, Bento DCP, Santos RS, Reis AAS. No association between GSTM1 and GSTT1 deletion polymorphisms and Amyotrophic Lateral Sclerosis: a genetic study in Brazilian patients. Meta Gene 2021; 30: 100979, doi: 10.1016/j.mgene.2021.100979.
25 Harries LW, Stubbins MJ, Forman D, Howard GC, Wolf CR. Identification of genetic polymorphisms at the glutathione S-transferase Pi locus and association with susceptibility to bladder, testicular and prostate cancer. Carcinogenesis 1997; 18: 641-644, doi: 10.1093/carcin/18.4.641.
26 Keku T, Millikan R, Worley K, Winkel S, Eaton A, Biscocho L, et al. 5,10-Methylenetetrahydrofolate reductase codon 677 and 1298 polymorphisms and colon cancer in African Americans and whites. Cancer Epidemiol Biomarkers Prev 2002; 11: 1611-1621.
27 Benjafield AV, Wang WYS, Morris BJ. No association of angiotensin-converting enzyme 2 gene (ACE2) polymorphisms with essential hypertension. Am J Hypertens 2004; 17: 624-628, doi: 10.1016/j.amjhyper.2004.02.022.
28 Burnham KP, Anderson DR. Multimodel inference: understanding AIC and BIC in model selection. Sociological Methods Res 2004; 33: 261-304, doi: 10.1177/0049124104268644.
29 Sen PC, Hajra M, Ghosh M. Supervised classification algorithms in machine learning: a survey and review. In: Mandal J, Bhattacharya D (Editors). Emerging Technology in Modelling and Graphics. Advances in Intelligent Systems and Computing 2020, 937. Springer, Singapore, doi: 10.1007/978-981-13-7403-6_11.
30 Mohamed S, Ashraf R, Ghanem A, Sakr M, Mohamed R. Supervised machine learning techniques: A comparison. 2022;<https://www.researchgate.net/publication/363870735>.
31 López OAM, López AM, Crossa J. Multivariate Statistical Machine Learning Methods for Genomic Prediction. Springer Cham 2022, doi: 10.1007/978-3-030-89010-0.
32 Chafai N, Hayah I, Houaga I, Badaoui B. A review of machine learning models applied to genomic prediction in animal breeding. Front Genet 2023; 14: 1150596, doi: 10.3389/fgene.2023.1150596.
33 Nath S, Das S, Bhowmik A, Ghosh SK, Choudhury Y. The GSTM1 and GSTT1 null genotypes increase the risk for type 2 diabetes mellitus and the subsequent development of diabetic complications: a meta-analysis. Curr Diabetes Rev 2019; 15: 31-43, doi: 10.2174/1573399814666171215120228.
34 Liu LS, Wang D, Tang R, Wang Q, Zheng L, Wei J, et al. Individual and combined effects of the GSTM1, GSTT1, and GSTP1 polymorphisms on type 2 diabetes mellitus risk: a systematic review and meta-analysis. Front Genet 2022; 13: 959291, doi: 10.3389/fgene.2022.959291.
35 Mergani A, Mansour AA, Askar T, Zahran RN, Mustafa AM, Mohammed MA, et al. Glutathione S-transferase Pi-Ile 105 Val polymorphism and susceptibility to T2DM in population from Turabah region of Saudi Arabia. Biochem Genet 2016; 54: 544-551, doi: 10.1007/s10528-016-9740-2.
36 Staels W, Heremans Y, Heimberg H, De Leu N. VEGF-A and blood vessels: a beta cell perspective. Diabetologia 2019; 62: 1961-1968, doi: 10.1007/s00125-019-4969-z.
37 Younas H, Ijaz T, Choudhry N. Investigation of angiotensin-1 converting enzyme 2 gene (G8790A) polymorphism in patients of type 2 diabetes mellitus with diabetic nephropathy in Pakistani population. PLoS One 2022; 17: e0264038, doi: 10.1371/journal.pone.0264038.
38 Rao DK, Shaik NA, Imran A, Murthy DK, Ganti E, Chinta C, et al. Variations in the GST activity are associated with single and combinations of GST genotypes in both male and female diabetic patients. Mol Biol Rep 2014; 41: 841-848, doi: 10.1007/s11033-013-2924-5.
39 Cuevas S, Villar VAM, Jose PA. Genetic polymorphisms associated with reactive oxygen species and blood pressure regulation. Pharmacogenomics J 2019; 19: 315-336, doi: 10.1038/s41397-019-0082-4.
40 McKinney BA, Reif DM, Ritchie MD, Moore JH. Machine learning for detecting gene-gene interactions. Appl Bioinformatics 2006; 5: 77-88, doi: 10.2165/00822942-200605020-00002.
Publication Dates
Publication in this collection: 02 Dec 2024
Date of issue: 2024
History
Received: 5 Mar 2024
Accepted: 26 Sept 2024








