Hardy-Weinberg Equilibrium in different mitochondrial haplogroups of four genes associated with neuroprotection and neurodegeneration

ABSTRACT Background: Malfunctioning or damaged mitochondria result in altered energy metabolism, redox equilibrium, and cellular dynamics, and are central to the pathogenesis of neurological disorders such as Alzheimer’s disease, Parkinson’s disease, Huntington’s disease and Amyotrophic Lateral Sclerosis. Therefore, it is of utmost importance to identify mitochondrial genetic susceptibility markers for neurodegenerative diseases. Potential markers include the respiratory chain enzymes Riboflavin kinase (RFK), Flavin adenine dinucleotide synthetase (FAD), Succinate dehydrogenase B subunit (SDHB), and Cytochrome C1 (CYC1). These enzymes are associated with neuroprotection and neurodegeneration. Objective: To test whether variants in the genes RFK, FAD, SDHB and CYC1 deviate from Hardy-Weinberg Equilibrium (HWE) in different human mitochondrial haplogroups. Methods: Sequence variants in the genes RFK, FAD, SDHB and CYC1 of 2,504 non-affected individuals of the 1,000 genomes project were used for mitochondrial haplogroup assessment and HWE calculations in different mitochondrial haplogroups. Results: We show that RFK variants deviate from HWE in haplogroups G, H, L, V and W, FAD variants in haplogroups B, J, L, U, and C, SDHB variants in haplogroups C, W, and A, and CYC1 variants in haplogroups B, L, U, D, and T. HWE deviation indicates the action of selective pressures and genetic drift. Conclusions: HWE deviation of particular variants in relation to global population HWE could be, at least in part, associated with the differential susceptibility of specific populations and ethnicities to neurodegenerative diseases. Our data might contribute to the epidemiology of neurodegenerative diseases and to diagnostic/prognostic methods for them.

Several studies have shown that mitochondria play an important role in the pathogenesis of neurodegenerative diseases 1 . Robust evidence suggests that structural changes in mitochondria, including increased mitochondrial fragmentation and decreased mitochondrial fusion, are critical factors associated with mitochondrial dysfunction and with the death of several types of neurons, including motor neurons, in ALS 2 .
Human populations can be classified into mitochondrial haplogroups based on specific single nucleotide variants (SNVs) scattered along the mitochondrial genome. Mutations in mitochondrial DNA are very common, so the distribution of mutant mitochondria during mitosis affects mitochondrial function in the daughter cells, generating variability in mitochondrial function 3 .
Common complex diseases have been associated with specific mitochondrial haplogroups. There is strong evidence that mitochondrial DNA variation plays a role in the development and progression of complex human diseases. Mitochondrial genetic variations have been widely recognized in the context of genome-wide association studies (GWAS), confirming the need for studies that investigate the function of mitochondrial DNA variation in human health and disease 4 .
Studies of mitochondrial haplogroups can help identify pathways underlying susceptibility to a specific disease or an altered response to a drug or to environmental factors. Haplogroups are characterized according to the mitochondrial mutation rate, mainly by mutations located in the control region of the mitochondrial genome. Therefore, diseases that are affected by mitochondrial dysfunction may behave differently depending on the patient's haplogroup 5 .
Another factor that can alter genetic characteristics, and consequently the function of a gene in relation to the incidence of a disease, is the microevolution process involving nuclear genes and/or rare mitochondrial DNA 6 .
Most mitochondrial diseases are identified through a compromised mitochondrial respiratory chain caused by quantitative and qualitative defects of mitochondrial DNA and nuclear DNA, which are inherited according to the rules of mitochondrial genetics and Mendelian inheritance, respectively 7 . These diseases are classified genetically into three types: sporadic (mitochondrial DNA duplications or deletions); maternally inherited (typically point mutations in mitochondrial DNA); and Mendelian (typically nuclear DNA defects) 8 .
Previous studies have shown that enzymes of the mitochondrial respiratory chain complexes, such as Riboflavin kinase (RFK), Flavin adenine dinucleotide synthetase (FAD), Succinate dehydrogenase B subunit (SDHB) and Cytochrome C1 (CYC1), are involved in neuroprotection mechanisms that ensure neuron survival. Quantitative or qualitative defects in the genes of these enzymes lead to neurodegeneration 9,10,11,12,13 . In addition, in a study of patients with Amyotrophic Lateral Sclerosis (ALS), Lin and collaborators used quantitative real-time polymerase chain reaction (PCR) to show that mRNA expression levels of RFK, FAD, SDHB and CYC1 are significantly lower than in controls, confirming quantitative effects of these enzymes, especially in ALS patients 14 .
Considering that these enzymes may be associated with neurodegenerative diseases, we combined the genotypic information on variants of these enzymes with the mitochondrial haplogroups of normal, non-affected individuals. This allowed us to identify the differential distribution of these variants across mitochondrial haplogroups.

Genome sequence data
The 2,504 genome sequences used in this project were from the 1,000 genomes project (https://www.internationalgenome.org/) 15,16,17,18 . This sample consists of 26 populations distributed worldwide and grouped into five super-populations (African, American, East Asian, European and South Asian) 15 . Although the level of kinship in the sample is small because of its global scale, some authors have reported that some level of inbreeding is detectable with the FSuite program 19 . The 1,000 genomes project consisted of three phases, phase 1 being the pilot. This phase contained sequences derived from trios (Father, Mother, Proband). The 1,000 genomes project guidelines ensured that a significant proportion of samples should be from unrelated individuals: https://www.internationalgenome.org/sample_collection_principles.
Mitochondrial haplogroups are groups of mitochondrial genome types that share a genealogical relatedness and are defined by certain sets of mutations (http://www.phylotree.org/) 20 . The consent for subjects included in the sampled populations followed the project guidelines (https://www.internationalgenome.org/sample_collection_principles/), and the consent form template is available at: https://www.internationalgenome.org/sites/1000genomes.org/files/docs/Informed%20Consent%20Form%20Template.pdf. The Ensembl Genome Browser was used to access the sequencing information and single nucleotide variants of 2,504 normal, non-affected individuals from the 1,000 genomes project 21 . The genes selected for the analysis were Riboflavin kinase (RFK), Flavin adenine dinucleotide synthetase (FAD), Succinate dehydrogenase B subunit (SDHB), and Cytochrome C1 (CYC1). Variants were filtered by minor allele frequency (MAF) > 0.1. The SNVs with the highest global allele frequency for each individual gene were selected for HWE analysis. The SNVs analyzed are identified by their dbSNP database ID 22 . Information on each individual SNV can be retrieved by searching dbSNP with its "rs" ID number (https://www.ncbi.nlm.nih.gov/snp/).
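The filtering and selection step described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the record layout, the function name `select_variants`, and the `top_n` parameter are assumptions introduced for illustration, and the variant names and frequencies are placeholders rather than study data. The sketch also assumes that ranking by MAF stands in for "highest global allele frequency".

```python
from collections import defaultdict

MAF_THRESHOLD = 0.1  # from the text: variants were kept when MAF > 0.1


def select_variants(records, top_n=1):
    """Keep variants above the MAF threshold, then rank them per gene.

    `records` is a list of dicts with keys "gene", "variant" and "maf"
    (a hypothetical layout). `top_n` is an assumed parameter: the text
    does not state how many variants per gene were retained.
    """
    by_gene = defaultdict(list)
    for rec in records:
        if rec["maf"] > MAF_THRESHOLD:
            by_gene[rec["gene"]].append(rec)
    return {
        gene: sorted(recs, key=lambda r: r["maf"], reverse=True)[:top_n]
        for gene, recs in by_gene.items()
    }


# Placeholder records: names and frequencies are illustrative only.
example = [
    {"gene": "RFK", "variant": "var1", "maf": 0.32},
    {"gene": "RFK", "variant": "var2", "maf": 0.04},
    {"gene": "SDHB", "variant": "var3", "maf": 0.18},
    {"gene": "SDHB", "variant": "var4", "maf": 0.41},
]
selected = select_variants(example, top_n=1)
```

With these placeholder records, `var2` is dropped by the threshold and the highest-frequency remaining variant is kept for each gene.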

Hardy-Weinberg Equilibrium Calculation
Deviation from Hardy-Weinberg equilibrium was assessed with Pearson's chi-square goodness-of-fit test, with one degree of freedom and the significance level set at 0.05 (α=0.05) 23, 24 . For accuracy, we used an online Hardy-Weinberg calculator to compute the chi-square index: https://www.easycalculation.com/health/hardy-weinberg-equilibrium-calculator.php. The critical value is 3.841 for one degree of freedom and α=0.05. If the chi-square value for a haplogroup is greater than the critical value, the genotype frequencies of the variant deviate from equilibrium.
The null hypothesis of this test states that there is no difference between observed and expected genotype counts, that is, the variant is in HWE; the alternative hypothesis states that observed and expected counts differ significantly. When the chi-square index is greater than the critical (table) value, the null hypothesis is rejected, and the further the chi-square index exceeds the critical value, the stronger the evidence against the null hypothesis. Therefore, for a particular haplogroup, deviation from HWE suggests a relationship between the variant and that mitochondrial haplogroup, such that variants that deviate in a haplogroup are distributed at higher frequencies in the corresponding haplogroup populations and, consequently, these populations are more susceptible to neurodegeneration. We describe these haplogroups as "haplogroups at risk".
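For reference, the standard form of this test for a biallelic variant can be written out as follows. The symbols (genotype counts n_AA, n_Aa, n_aa, allele frequencies p and q) are notation introduced here for illustration and do not come from the source. One degree of freedom remains because there are three genotype classes, minus one, minus one allele frequency estimated from the data.

```latex
\begin{align*}
n &= n_{AA} + n_{Aa} + n_{aa}, \qquad
p = \frac{2\,n_{AA} + n_{Aa}}{2n}, \qquad q = 1 - p,\\
E_{AA} &= p^{2} n, \qquad E_{Aa} = 2pq\,n, \qquad E_{aa} = q^{2} n,\\
\chi^{2} &= \sum_{g \in \{AA,\,Aa,\,aa\}} \frac{(n_{g} - E_{g})^{2}}{E_{g}},
\qquad \text{deviation from HWE if } \chi^{2} > 3.841 .
\end{align*}
```

As an illustrative calculation with made-up counts, n_AA = 40, n_Aa = 20 and n_aa = 40 give p = q = 0.5 and expected counts of 25, 50 and 25, so χ² = 9 + 18 + 9 = 36, well above the critical value.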
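The decision rule described above can be sketched in a few lines of Python. This is a minimal illustration of the standard Pearson goodness-of-fit calculation, not a reproduction of the online calculator used in the study; the function name `hwe_chi_square` and the result tuple are assumptions introduced here.

```python
from collections import namedtuple

CRITICAL_VALUE = 3.841  # chi-square critical value, 1 degree of freedom, alpha = 0.05

HWEResult = namedtuple("HWEResult", "p q expected chi_square deviates")


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson goodness-of-fit test of genotype counts against HWE expectations.

    n_aa, n_ab, n_bb are the observed counts of the two homozygotes and
    the heterozygote for a biallelic variant (hypothetical argument names).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of the first allele
    q = 1.0 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    # Classes with zero expected count are skipped to avoid division by zero.
    chi_square = sum(
        (o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0
    )
    return HWEResult(p, q, expected, chi_square, chi_square > CRITICAL_VALUE)
```

For instance, `hwe_chi_square(25, 50, 25)` matches the expected counts exactly and yields a statistic of 0, so the null hypothesis is not rejected.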

RESULTS
The results presented here concern variants in the genes RFK, FAD, SDHB and CYC1 that were selected for highest global allele frequency. The analysis showed 36 variants that are not in HWE in the genes RFK, FAD, SDHB and CYC1. These four genes are associated with neuroprotection mechanisms.
Here we tested whether differences in allele dynamics, as revealed by HWE, might suggest a differential distribution of neuroprotection in different mitochondrial haplogroups. The chi-square critical value for the HWE test was 3.841 (one gene, two alleles, one degree of freedom, α=0.05). Variants with a chi-square value higher than this threshold were considered a risk factor for neurodegeneration.
The mitochondrial haplogroups were defined for the 1,000 genomes project database from the mitochondrial genomes using Haplogrep 2 25 and were fully consistent with a parallel published inference 26 .
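To run the HWE test per haplogroup, genotypes first have to be tallied within each haplogroup. The sketch below shows one way this could be done in Python. It is an assumption-laden illustration: the input layout, the function names, and the reduction of a detailed label (for example "H1a1") to its leading letter are simplifications introduced here, not the authors' procedure or the Haplogrep 2 output format.

```python
from collections import Counter, defaultdict


def macro_haplogroup(label):
    """Reduce a detailed label such as 'H1a1' to its leading letter ('H').

    Simplification: multi-letter clusters (for example 'HV') would be
    collapsed onto their first letter by this rule.
    """
    return label.strip()[0].upper()


def genotype_counts_by_haplogroup(samples):
    """Tally genotypes per macro-haplogroup.

    `samples` is an iterable of (haplogroup_label, genotype) pairs with
    genotype coded as 'AA', 'AB' or 'BB' (a hypothetical encoding).
    Returns {haplogroup: (n_AA, n_AB, n_BB)}.
    """
    counts = defaultdict(Counter)
    for label, genotype in samples:
        counts[macro_haplogroup(label)][genotype] += 1
    return {hg: (c["AA"], c["AB"], c["BB"]) for hg, c in counts.items()}
```

Each resulting triple of counts can then be passed to a chi-square goodness-of-fit test against Hardy-Weinberg expectations, one haplogroup at a time.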
There are nine variants in disequilibrium in the RFK gene in seven mitochondrial haplogroups, as described in Table 1. Haplogroups G, V and W appeared more susceptible to neurodegeneration than the other haplogroups at risk.
In the FAD gene, there are eight variants in disequilibrium in five mitochondrial haplogroups (Table 2).
Haplogroups B and L show chi-square values significantly higher than the other haplogroups. Haplogroup J showed risk for neurodegeneration for all variants in disequilibrium. The SDHB gene showed eight variants in disequilibrium in seven mitochondrial haplogroups (Table 3).
Haplogroups A and W appeared more susceptible to neurodegeneration than other haplogroups. The CYC1 gene showed nine variants in disequilibrium. The deletion (A/-) in variant rs60547285 showed a greater disequilibrium in 20 mitochondrial haplogroups (Table 4).
The data used here for HWE calculations were obtained from a global sample, and therefore small local changes might not reach the necessary significance in the HWE test. The 1,000 genomes project dataset contains samples that can be considered as close as possible to a random population, although in phase 1 (pilot) some trios were included. Because of the global nature of the sample, collected from 26 worldwide locations on five continents, it is likely that minor local population effects do not interfere with the global HWE measured here.

DISCUSSION
The accumulation of mutations in mitochondrial DNA leads to oxidative damage, energy reduction, and increased Reactive Oxygen Species (ROS) production. This can cause disease or negatively affect longevity in individuals or in a population that shares the same genotype of mitochondrial DNA. Perhaps the opposite may also be true for different polymorphisms, which may be protective 27 .
Several studies correlate mitochondrial haplogroups with disease susceptibility or protective roles. In 114 healthy Spanish men, Marcuello et al. 28 found that haplogroup J was associated with reduced Electron Transport Chain (ETC) efficiency and decreased Adenosine Triphosphate (ATP) and ROS production, possibly explaining the accumulation of this variant in elderly people.
Van der Walt et al. 29 observed that haplogroups J and K reduced the incidence of Parkinson's disease (PD) by 50%. Another analysis revealed that the SNP 9055A in ATP6 (which defines haplogroup K) reduced the risk in women, and the SNP 13708A in ND5 was protective in individuals over 70 years (haplogroup J) 29 . The UKJT group was associated with a 22% reduction in risk for PD but not with the risk of Alzheimer's disease (AD), confirming that the association with PD was disease specific and not a general effect observed in all neurodegenerative diseases 30 . In a large cohort of 620 Italian patients with PD, it was found that haplogroup K was associated with a lower risk of PD 31 .
In relation to Friedreich's ataxia (FA), Giacchetti et al. 32 studied 99 patients with FA and 48 control subjects, all from southern Italy, showing that patients with haplogroup U had a delay of five years in the onset of the disease and a lower rate of cardiomyopathy. However, no significant difference was found in the frequency distribution of haplogroups between patients and controls 32 .
In a study conducted by Mancuso and collaborators 33 , fifty-one patients with Huntington's disease (HD), a trinucleotide expansion disorder caused by a CAG expansion in the IT15 gene, were compared with 181 controls. The frequency of haplogroups and clusters of haplogroups did not differ between the two groups, and no correlation was observed with sex, age of onset, or status of the disease 33 .
To better clarify the involvement of mitochondria in the pathogenesis of AD, several studies have examined the possible association between mitochondrial haplogroups and susceptibility to the disease. European mitochondrial haplogroups may be related to longevity 35,36 as well as to neurodegeneration, the risk of AD, and therefore death in Caucasians. In an article by Chagnon et al. 37 , haplogroup T is described as under-represented in AD, whereas haplogroup J is over-represented. Van der Walt et al. 38 showed that men classified as haplogroup U had a significant increase in the risk of AD, whereas women with the same haplogroup showed a significant decrease in risk. To assess the relationship between haplogroup and AD in an Iranian population, the two hypervariable segments of mitochondrial DNA were sequenced in 30 patients with AD and 100 control subjects 39 . Haplogroups H and U were found to be significantly more abundant in AD patients, suggesting that these two haplogroups may act synergistically to increase the penetrance of AD 39 . Studying an Italian sample, Carrieri et al. 40 hypothesized that haplogroups K and U may act by neutralizing the effect of the major risk factor allele for AD, apolipoprotein E. However, this association was not confirmed in another study 41 . An association between AD and haplogroups G2a, B4c1 and N9b1 was described in 96 Japanese AD patients 42 . A study conducted in a Polish population found that the HV cluster is significantly associated with the risk for AD, although no evidence was reported for the involvement of haplogroups U, K, J or T in the risk for AD 43 .
To investigate whether specific genetic polymorphisms within mitochondrial DNA could act as susceptibility factors and contribute to the clinical expression of sporadic amyotrophic lateral sclerosis (ALS), mitochondrial haplogroups were analyzed in 222 patients with sporadic ALS of clear Italian origin and 151 matched controls 44 . The frequency of haplogroup I was lower in ALS cases than in controls. Age of onset, severity and the neurological system involved in the disease were not associated with haplogroups. In a comparison designed to test what makes haplogroup I different from the other haplogroups tested, a highly significant difference was found in alleles 16391A and 10034C. The authors concluded that mitochondrial DNA polymorphisms may contribute to motor neuron degeneration, possibly interacting with unknown genetic or environmental factors. However, this finding was not confirmed by a study of a large British cohort of 504 patients with ALS and 493 controls, which reported no evidence that mitochondrial DNA haplogroups contribute to the risk of developing ALS 45 .
The conflicting findings for several neurodegenerative diseases may stem from analyses restricted to mitochondrial DNA. Other studies have shown a reciprocal interaction between mitochondrial DNA and the nuclear genome. Gene expression uses a high proportion of cellular energy, and the protein content of mitochondria is responsible for ATP production through oxidative phosphorylation. Depending on their genome content, mitochondria may alter the expression profile of nuclear genes differently, because a large amount of the mitochondrial DNA sequence varies between human populations 46 and, over evolutionary time, mitochondrial-nuclear interactions become highly specific 47 . On the other hand, more than 99% of mitochondrial proteins are encoded by nuclear genes, so the deleterious or protective effect of mitochondria also reflects the importance of cytosolic proteins 48 . Therefore, the expression of nuclear genes may vary depending on the mitochondrial haplogroup; conversely, if a nuclear gene carries genetic alterations in its variants due to the process of microevolution, it can affect mitochondrial function differently. Considering these interactions and that, in general, point mutations in enzyme variants may affect the stabilization of the enzyme transition state and consequently reduce total catalytic activity 49 , one can conclude that the expression of the nuclear-encoded components of the mitochondrial respiratory chain complexes may vary depending on the mitochondrial haplogroup, and that mutations in these nuclear genes may in turn affect final ATP production differently across haplogroup populations. In our work, we found variants that were out of equilibrium in relation to specific haplogroups. The corresponding haplogroups were considered "haplogroups at risk", which in most cases were clusters of identical haplogroups.
According to the analysis of outcome measures, variants of the RFK gene, including rs2501928, rs2490579, rs2501923 and rs10716702, deviate from HWE in haplogroups G, V and W. Other variants of this gene shared at least some of these haplogroups, for example rs2501925 (G, H, L, V) and rs11447646 (V, W). Some variants deviated from HWE in entirely different haplogroups, such as rs11144870 (C, D, L), while others, including rs60002266 and rs7210880, did not deviate in any haplogroup.
In the case of CYC1, variants rs11780874 and rs13255347 deviated from equilibrium in the set (D, L, U), and variants rs13254954, rs78222232, rs11541475, rs12550729 and rs144925641 deviated in a single haplogroup (B), while the insertion rs11433813 deviated from equilibrium in haplogroups B, L, R and T. Interestingly, variant rs60547285 deviated from equilibrium in almost all haplogroups except I and Y, probably because of the very small sample size of these two haplogroups.
Therefore, according to the results described here, our hypothesis is that departure from equilibrium in relation to a specific haplogroup results in a reciprocal interaction between the mitochondrial genome and nuclear genes. This is related to the modification of the expression of variants through the mitochondrial genome and to the presence of point mutations in the deviant variants, which leads to a reduction in the catalytic activity of the enzymes. All these interactions cause carriers of the deviant variants in different haplogroup populations to be susceptible to neurodegeneration at different levels.
The main objective of this study was to identify variants of the enzymes RFK, FAD, SDHB and CYC1 which, theoretically and according to our calculations and statistical analysis, may have a greater distribution in haplogroup populations and confer increased susceptibility to neurodegenerative diseases. Because the study was based on the probability of occurrence of events, the results obtained should be compared with those seen in clinical trials. Another critical point of the study is the small size of some haplogroup populations, which may affect the accuracy of the HWE calculation: the larger the study population, the more reliable the results.
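The sample-size concern can be made concrete with a small Python sketch that computes the genotype counts expected under HWE for a given haplogroup size. The threshold of five expected observations per class is a commonly cited rule of thumb for chi-square tests, not a criterion stated in the source, and the function names and example sizes are assumptions for illustration.

```python
def expected_hwe_counts(n, q):
    """Expected counts (major homozygote, heterozygote, minor homozygote)
    under HWE for n individuals and minor allele frequency q."""
    p = 1.0 - q
    return (p * p * n, 2 * p * q * n, q * q * n)


def min_expected_ok(n, q, min_expected=5.0):
    """True when every expected genotype count reaches the rule-of-thumb
    minimum (an assumed threshold, adjustable via `min_expected`)."""
    return min(expected_hwe_counts(n, q)) >= min_expected
```

For example, a hypothetical haplogroup of 20 individuals with a minor allele frequency of 0.1 has an expected minor-homozygote count of about 0.2, so a single such individual can dominate the chi-square statistic, whereas the same frequency in the full sample of 2,504 individuals gives an expected count of about 25.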
In this study, we identified a differential distribution of variants in four genes encoding mitochondrial respiratory chain enzymes. These four genes (RFK, FAD, SDHB, CYC1) differ significantly in HWE statistics in different haplogroups. Departure from HWE in specific mitochondrial haplogroups can be explained by: (a) differential natural selection, (b) genetic drift due to small population size, (c) extensive migration and (d) significant non-random mating. Departure from HWE in some haplogroups and not in others suggests that specific haplogroups within a population could be more susceptible to neurodegeneration associated with AD, PD, HD and ALS, and that this susceptibility could increase with the chi-square index of each specific haplogroup.