Epistasis of polymorphisms related to the articular cartilage extracellular matrix in knee osteoarthritis: Analysis-based multifactor dimensionality reduction

Abstract Osteoarthritis (OA) is a complex disease with a multifactorial etiology. The genetic component is one of the main associated factors, resulting from interactions between genes and environmental factors. The aim of this study was to identify gene-gene interactions (epistasis) of the articular cartilage extracellular matrix (ECM) in knee OA. Ninety-two knee OA patients and 147 healthy individuals were included. Participants were genotyped in order to evaluate nine variants of eight genes associated with ECM metabolism using the OpenArray technology. Epistasis was analyzed using the multifactor dimensionality reduction (MDR) method. The MDR analysis showed significant gene-gene interactions between MMP3 (rs679620) and COL3A1 (rs1800255), and between COL3A1 (rs1800255) and VEGFA (rs699947) polymorphisms, with information gain values of 3.21% and 2.34%, respectively. Furthermore, in our study we found interactions in high-risk genotypes of the HIF1AN, MMP3 and COL3A1 genes; the most representative were [AA+CC+GA], [AA+CT+GA] and [AA+CT+GG], respectively; and low-risk genotypes [AA+CC+GG], [GG+TT+GA] and [AA+TT+GA], respectively. Knowing the interactions of these polymorphisms involved in articular cartilage ECM metabolism could provide a new tool to identify individuals at high risk of developing knee OA.


Introduction
To date, knee osteoarthritis (OA) prevails as the main cause of physical disability in senior adults. It is well known that age, gender, overweight, and genetic predisposition are key factors for its development (Arden and Nevitt, 2006;Peláez-Ballestas et al., 2011;Hernández-Cáceres et al., 2015). Articular cartilage, synovial membrane, and subchondral bone are tissues affected by OA; however, other tissues, such as meniscus and ligaments are compromised too. Progressive degeneration of articular cartilage, the subsequent decrease in joint space and osteophyte formation, as well as proteoglycans and collagen loss, extracellular matrix (ECM) mineralization, and hypertrophic differentiation of chondrocytes represent the hallmark of OA (De Filippis et al., 2004;Bertrand et al., 2010;Kraus et al., 2015).
Healthy articular cartilage is a hyaline, viscoelastic, avascular, aneural, and alymphatic tissue. It is composed of an ECM rich in collagens (types II, IX and XI), proteoglycans (aggrecan), and water. The cellular part consists of chondrocytes only (Sophia et al., 2009). As a result of being an avascular tissue, oxygen and nutrients are lower than in other tissues, creating a hypoxic environment (0.5 -10% of oxygen, or 4 -70 mm Hg, respectively, which also depends of the zone in tissue, either deep or superficial) (Pfander et al., 2006;Oswald et al., 2008;Ströbel et al., 2010). Despite the extreme conditions of the cartilage, hypoxia is necessary in metabolic processes, such as chondrogenesis ECM synthesis and degradation. It also modulates the subchondral bone angiogenesis through the vascular endothelial growth factor A (VEGFA). These processes, and the proper maintenance of general cartilage homeostasis, are coordinated by trophic and disorganized with a ECM synthesis increase, as a way to compensate for the null ECM generated favoring osteophyte formation (Martel-Pelletier et al., 2008). In advanced life stages, or during the development of joint diseases, such as OA, the delicate balance between ECM synthesis and degradation is heavily affected by factors responsible for articular cartilage degradation, among which stand out the proinflammatory cytokines (IL-1b, IL-6, IL-8, TNF-a), chemokines, metalloproteinases 3,9 and 13), and aggrecanases (ADAMTS 4, 5 and 9) (Mariani et al., 2014). In addition to these molecules, EGFR ligands are able to stimulate the expression of genes related to cartilage catabolism and, on the other hand, inhibit the catabolic activity of chondrocytes (Long et al., 2015). Although VEGFA plays a beneficial role for the articular cartilage, during OA it is actively involved in remodeling the ECM and forming osteophytes (Hamilton et al., 2016). On the other hand, the concentration of nitric oxide (ON) generated by nitric oxide synthase (NOS) regulates MMP synthesis. At low concentrations it inhibits production, but at high concentrations, it activates them, favoring ECM degradation. It also promotes free radical production and induces chondrocyte apoptosis (Abramson, 2008a,b).
From a genetic standpoint, several studies suggest associations between single-nucleotide polymorphisms (SNP) and knee OA (Fernández-Moreno et al., 2008;van Meurs and Uitterlinden, 2012;Fernández-Torres et al., 2017;Ozcan et al., 2017). Nevertheless, most of them were assessed individually, in contrast to joint assessments through genegene interactions (epistasis), which could provide more information regarding their role. Recently, Wang et al. (2016) published a meta-analysis associating different SNPs with knee, hip, and hand OA risk, and suggesting that genetic predisposition is a key factor for OA development. Another relevant aspect to be considered is the fact that SNP distribution in this kind of studies is greatly affected by ethnicity and geographic location.
Several studies related to the analysis of gene-gene interactions in complex diseases have only been conducted using logistic regression models, linkage disequilibrium, and Hardy-Weinberg equilibrium tests, all of which have limitations Mechanic et al., 2008). Therefore, identification and characterization of gene-gene and geneenvironment interactions have been limited primarily due to a lack of powerful statistical methods, and particularly because of small sample sizes, which has been a challenge for geneticists. In this sense, the multifactor dimensionality reduction (MDR) method does not require a model as such, given that no genetic models are assumed, neither is it parametric, as no parameters are estimated.
Dimensionality is defined as the interactions between genetic or environmental factors, which increase exponentially as these factors increase. MDR reduces the dimensionality of multilocus data to improve the ability to detect genetic combinations that confer disease risk. MDR pools genotypes into 'high-risk' and 'low-risk' or 'response' and non-response' groups in order to reduce multidimensional data into only one dimension (dichotomous variable). Because it is a non-parametric method, no hypothesis concerning the value of any statistical parameter is made. Additionally, MDR was designed to detect gene-gene or gene-environment interactions in datasets with categorical independent variables, such as SNP and other sequence variations (insertions, deletions, etc.), as well as environmental data that can be represented as categorical variables. The endpoint, or dependent variable, must be dichotomous, such as case/control for studies of human disease. Pharmacogenomics data can also be analyzed with MDR, in terms of response/non-response' or`toxicity/no toxicity'. MDR is appropriate for any data type with two distinct clinical endpoints (Motsinger and Ritchie, 2006;Moore and Andrews, 2015).
For studies with more than two factors, the steps of the MDR method are repeated for each possible model size (two-factor, three-factor, etc.), if computationally feasible. The result is a set of models, one for each model size considered. From this set, the model with the combination of loci and/or discrete environmental factors that maximize the cross validation consistency and minimize the prediction error is selected. Cross-validation consistency is a measure of the number of times an MDR model is identified in each possible 90 per cent of the subjects. When cross-validation consistency is maximal for one model and prediction error is minimal for another model, statistical parsimony is used to choose the best model. On the other hand, a balanced accuracy function is used when the ratio of cases to controls is different than one (Motsinger and Ritchie, 2006;Moore and Andrews, 2015).
The MDR method is based on an algorithm that generates the characteristics or attributes created by a new variable or attribute. It has also developed an entropy-based method for a better statistical interpretation of the results Cho et al., 2004;Gui et al., 2011;Su et al., 2015;Kohli et al., 2016). Previously, Su et al. (2015) performed an epistasis study between TGF-b and Smad3 polymorphisms and OA patients, revealing that it can affect the development of OA. Recently, our work group published a study of epistasis between polymorphisms related to oxidative stress and OA patients, and the results have shown a strong interaction between ADIPOQ (rs1501299) and PON1 (rs662) polymorphisms (Fernández-Torres et al., 2019), suggesting an important role in the OA pathogenesis; both studies were carried out using the MDR method. The generalized MDR (GMDR) method is an extension from MDR. It allows an adjustment for discrete and quantitative covariables and can be applied to both dichotomous and continuous phenotypes in several study designs based on population. Additionally, a command sequence includes p-value assessment (Lou et al., 2007;Chen et al., 2011).
Interactions between multiple polymorphisms of different genes could be the foundation of the genetic origin of OA. Therefore, this study is focused on evaluating whether interactions between several genetic variants of articular car-tilage ECM are associated with knee OA in a Mexican population.

Subjects
Two hundred thirty-nine unrelated individuals of Mexican ancestry and geographically matched were included in this case-control study. Ninety-two of them were knee OA patients of both genders, recruited from the Rheumatology Service of the Instituto Nacional de Rehabilitación "Luis Guillermo Ibarra Ibarra" (INR-LGII). Knee OA was diagnosed under the guidelines of the American College of Rheumatology (Altman et al., 1986); the clinical exam and the X-ray evaluation were performed by a rheumatologist and a radiologist, respectively. OA severity was evaluated using the Kellgren-Lawrence radiological scale (K-L) (Kellgren and Lawrence, 1957). Patients with K-L ³ 2 were included, and those with other etiologies causing knee diseases, such as inflammatory arthritis (rheumatoid arthritis or any other autoimmune disease), post-traumatic arthritis, post-septic arthritis, poliomyelitis, or skeletal dysplasia, were excluded. One hundred forty-seven healthy employees of the INR-LGII of both genders, with no signs or symptoms of knee OA, OA family history, and no other type of arthritis or joint painful condition were recruited as controls from the departments of Human Communications and Human Resources, as well as the cleaning staff. They were invited to participate in the study through the proper authority. Additionally, all evaluated controls showed a K-L grade equal to or lower than one. All participants were at least 40 years old and signed an informed consent. Data on age, gender, body mass index (BMI), and birth place was collected through an interview. This study was conducted under the criteria set forth in the Declaration of Helsinki and was approved by the INR-LGII Ethics and Research Committee (CONBIO-ETICA-09CEI-031-20171207).
Given that the Mexican population structure is composed of a mix of Amerindian, European, and African ancestors, a set of nine ancestry-informative markers (AIMs) was used to evaluate whether any resulting association could be a confounder due to the population stratification during the data adjustment step (Bonilla et al., 2004;Choundry et al., 2006) (Table S1). DNA was isolated from peripheral blood in EDTA tubes, using a commercial kit (QIAmp 96 DNA Blood Kit, Qiagen, Hilden, Germany). The samples were genotyped through the OpenArray technology on a QuantStudio 12K flex equipment (Thermo Fisher Scientific). Genomic DNA from the samples was adjusted to 50 ng/mL; 2.5 mL of DNA were mixed with 2.5 mL of TaqMan OpenArray Genotyping Master Mix (Thermo Fisher Scientific) in 384-well trays. The mixtures were loaded onto OpenArray genotyping trays Epistasis in knee osteoarthritis 3

Statistical analysis
The clinical variables were evaluated with Student's t-test or Fisher's exact test, when appropriate, and values were expressed as mean ± SD. Gene and allele frequencies of all polymorphisms were calculated and compared between cases and controls using Fisher's exacts test. In order to control the global false positive rate, only SNPs with a statistically significant p-value in Fisher's exact test were considered in the multivariate analysis. Associations of each SNP with OA risk were assessed with logistic regression models adjusted by age, gender, BMI, and ancestry, taking into account a codominant inheritance model for the SNP. The Bonferroni's test was used to determine the significance level to correct multiple test errors, which taking into account the nine selected SNPs, an adjusted p-value < 0.005 (0.05/9) was considered statistically significant. Hardy-Weinberg equilibrium (HWE) in controls was calculated using the chi-squared test.
We used the STRUCTURE software v2.3.4 (Pritchard Lab, Stanford University, USA) to evaluate the effect of population stratification on the associations found of each population k (k = 3) with the genotypes of the nine AIMs mentioned above (Table S1). This information was included in the logistic regression models to adjust the associations found between the studied polymorphism and OA by individual mix.
The effect of the associated polymorphisms on groups stratified by gender and age (early OA, <50 years; late OA, >50 years) was analyzed (Table S2). All statistical analyses were performed with the STATA v14.0 (StataCorp, Texas, USA) statistical package considering an a = 0.05 significance level.
In order to study the epistasis, we used the MDR v3.0.2 or GMDR v0.9 statistical packages with Ritchie's algorithm . To corroborate the interaction model proposed by the MDR method, a logistic regression model was carried out, which included the SNPs suggested by the MDR. Subsequently, the p-value of the model, the adjusted r 2 , as well as the p-values for the ORs of the SNPs, were evaluated (Table S3).

Results
Characteristics of the study population Table 2 shows the demographic and clinical characteristics of cases (n = 92) and controls (n = 147). Cases were significantly older than controls (47.2 ± 12.4 vs. 40.9 ± 12.0 years, p = 0.0001). Gender ratio was similar among the study groups, with women being predominant in both. BMI showed overweight in OA patients, whereas the control group showed normal weight, albeit on the upper threshold. The majority of cases and controls were from Mexico City (77.2% and 84.3%, respectively).

Genetic and allelic frequencies of the SNPs studied in OA patients and controls
After adjusting for age, gender, BMI and ancestry, there were no significant differences between the genotype or allele frequencies of the analyzed polymorphisms in the study groups. Genotype distributions were compatible with HWE in controls, except for the HIF1AN (rs11292) polymorphism (Table 3).

Polymorphisms distribution in groups stratified by gender and age groups
The polymorphism distributions stratified by gender and age are shown in Table S2. In men, the GA, CA, and GA heterozygote genotypes of the COL3A1 (rs1800255), VEGFA (rs699947), and EGF (rs4444903) polymorphisms, respectively, were associated with a decreased OA risk (OR = 0.07, 95% CI = 0.00 -0.84, p = 0.03; OR = 0.04, 95% CI = 0.00 -0.85, p = 0.03; OR = 0.10, 95% CI = 0.01 -0.94, p = 0.04, respectively). Regarding the analysis stratified by age groups, the CT heterozygote genotype of the MMP3 (rs679620) polymorphism was associated with an increased risk of developing OA among the £ 50 year-old group (OR = 3.03, 95% CI = 1.07 -8.93, p = 0.03), whereas the G minor 4 Fernández-Torres et al.  allele of the HIF1AN (rs11292) polymorphism was associated with protection in the same age group (OR = 0.44, 95% CI = 0.22 -0.88, p = 0.02). The AA homozygote genotype of the VEGFA (rs699947) polymorphism showed an association with protection for the over 50 year-old age group (OR = 0.06, 95% CI = 0.00 -0.86, p = 0.03). Table 4 summarizes the results of the MDR analysis, which assesses all possible combinations of the associated polymorphisms stratified by gender and age. According to the MDR analysis, the best model included the MMP3 (rs679620), COL3A1 (rs1800255), and HIF1AN (rs11292) polymorphisms. This model had a balanced accuracy test of 0.6434, a consistency of cross-validation of 10/10, and an interaction p-value = 0.0010. Additionally, p-values from logistic regression models were included. Figure 1 shows the interaction map of the studied polymorphisms, based on entropy measures among individual variables. A strong interaction effect was observed between MMP3 (rs679620) and COL3A1 (rs1800255) polymorphisms, and between COL3A1 (rs1800255) and VEGFA (rs699947) polymorphisms, with information gain values of 3.21% and 2.34%, respectively. Significance value was also confirmed through logistic regression models, with an interaction p-value = 0.036. Additionally, in our model we found interactions in high-risk genotypes of the HIF1AN, MMP3 and COL3A1 genes, the most representative were

Discussion
In this study, we applied the MDR computation method to assess the epistasis of candidate genes related to articular cartilage ECM in knee OA patients in a Mexican population. Our main findings revealed important gene-gene interactions between the MMP3 (rs679620), VEGFA (rs699947), and COL3A1 (rs1800255) polymorphisms.
The importance of analyzing MMP-3 (stromelysin-1) is that it is a zinc-dependent endopeptidase that catalytically degrades several collagen and non-collagen proteins of the basement membrane and the ECM. It is considered an essential regulator of ECM homeostasis in healthy and damaged musculoskeletal soft tissue (Gibbon et al., 2017). The MMP3 gene has several genetic polymorphisms that could considerably affect ECM degradation and remodeling. Cases and controls studies evaluating different SNPs within the MMP3 gene have been conducted, including the rs679620 variant, yet none has been associated with OA risk (Guo et al., 2017;Tong et al., 2017). The non-association of this polymorphism that we observed in our work is consistent with previous reports, and it was unexpected, considering its functional role and the active involvement of this metalloprotease in ECM degradation in OA. This phenome-6 Fernández-Torres et al.  non has been reported in other pathologies, such as rheumatoid arthritis (Rodriguez-Lopez et al., 2006). However, when we stratified by age group, we observed that carriers of the CT heterozygote genotype showed a significant association with OA risk in people younger than 50 years old, which might suggest the possibility that the polymorphism could be associated with a more aggressive disease phenotype or more rapid progression. On the other hand, it has been proved that an increase in the VEGF protein level is associated with OA progression. Apparently, VEGF is involved in specific OA processes, including cartilage degeneration, osteophyte formation, subchondral bone cysts, sclerosis, synovitis, and pain. Moreover, inhibition of VEGF signaling decreases OA progression (Mariani et al., 2014;Hamilton et al., 2016). Genetically, there is little evidence of the association of VEGFA polymorphisms in OA pathogenesis. Sánchez-Enríquez et al. (2008) assessed the -460T/C (rs833061) and +405C/G (rs2010963) polymorphisms of VEGFA in knee arthrosis patients and compared them to healthy controls, but they did not find a significant association. In our study, we evaluated the rs699947 variant, and even though distribution was similar in cases and controls, we did find a genotype-phenotype association when stratifying by gender and age groups. We observed that the CA heterozygote genotype provides protection against OA development, but only among men, and that the AA homozygote genotype provides protection exclusively to individuals 50 years old or older. This variant is located in a promoter region of the VEGFA gene, but since its functional role in OA pathogenesis is not yet known, further research is needed.
Substitution of type-II collagen with type-III collagen in the articular cartilage ECM is a common feature of OA. Studies based on mRNA analysis prove that type-III collagen expression is associated with type-II collagen expres-sion, which suggests a metabolic reaction of the chondrocytes to type-III collagen build-up in articular cartilage areas, probably as a response to mechanical injury or other damage to the ECM (Hosseininia et al., 2016). To date, there is no scientific evidence of an association between polymorphisms of the COL3A1 gene with knee OA. Stepien-Slodkowska et al. (2015) evaluated the rs1800255 variant among Polish patients with an anterior cruciate ligament injury and observed that the distribution of the AA homozygote genotype was higher in comparison to the control group (p = 0.0087). In our study, we evaluated the same variant among knee OA patients, and we found that the GA heterozygote genotype is significantly associated with protection, but just among men. This variant causes a change in Ala531Thr, but its role at the articular cartilage level is unknown. There might be a defect in procollagen III synthesis, resulting in a structural modification of the protein.
By using MDR analysis, we found evidence of two strong interactions with significant value between three out of nine variants of the eight candidate genes involved in ECM degradation that we assessed. Likewise, we could identify interactions between the rs11292 (HIF1AN), rs679620 (MMP3) and rs1800255 (COL3A1) polymorphisms, generating high and low risk haplotypes between OA patients. On the other hand, when we performed the interaction analysis using logistic regression models, we did not detect any interaction. One of the virtues of the MDR method is that it does not require very large sample sizes to detect interactions. However, logistic regression models require a very large "N" to detect statistical significance. Also, polymorphisms that are poorly represented or of low frequency are difficult to detect, and interactions between the polymorphisms can only be evaluated by pairs.
It is well known that OA pathogenesis is multifactorial, and its complexity is primarily due to its polygenic na-Epistasis in knee osteoarthritis 7 Figure 2 -Distribution of high-risk and low-risk genotypes in the best three-locus model. The distribution shows high-risk (dark shading) and low-risk (light shading) genotypes associated with knee OA in the three-locus interaction detected by MDR analysis. The percentage of osteoarthritic subjects (left black bar in boxes) and control subjects (right hatched bar in boxes) is shown for each three-locus genotype combination. Boxes were labeled as high-risk if the ratio of the percentage of cases to controls met or exceeded the threshold of 1.0. Boxes were labeled as low-risk if the threshold was not exceeded. Based on the pattern of high-risk and low-risk genotypes, this three-locus model is evidence of gene to gene interaction.
ture. Given this polygenic nature, it has been difficult to prove gene-gene interactions associated with knee OA. To date, only four studies have reported gene-gene interactions in knee OA susceptibility. The first of these studies evaluated the TGF-b/Smad3 signaling pathway using the MDR method (Su et al., 2015). Other two focused on the Wnt/bcatenin signaling pathway and on the uric acid transporters polymorphisms, using logistic regression models (Fernández-Torres et al., 2018a,b). The most recent one published by our working group, identified interactions between polymorphisms of the ADIPOQ and PON1 genes, using the MDR method (Fernández-Torres et al., 2019). The effect magnitude of any polymorphism is likely to be overlooked if the genes are individually examined, without considering potential interactions with other genes, especially those in related pathways.
Despite the results and the advantages offered by the MDR method, it is important to emphasize some aspects. Unlike classical genetic analysis, our main approach highlights the importance of evaluating in an integral manner the effect of genetic variants in knee OA. However, we are aware of certain limitations of our study, the first being that our sample size is limited. However, we believe that after performing a multivariate analysis and a rigorous selection of our patients and controls, the presented data reinforce the biological plausibility of the SNPs in the OA. Second, our association study was limited to one population, so more studies in different populations are needed to support our findings, as well as to evaluate the functionality of the associated SNPs and be able to show evidence of whether they have a causal effect or not. Finally, there are more variants of the same gene that were not analyzed, as well as other genes of the ECM that were not considered and whose impact on OA development is unknown. Therefore, these preliminary results must be interpreted with caution, regarding our observation of a non-association between some polymorphisms and knee OA.
In summary, we evaluated nine polymorphisms associated with components of the articular cartilage ECM in knee OA patients in a Mexican population using the MDR and GMDR methods. The rs679620 (MMP3), rs699947 (VEGFA), and rs 1800255 (COL3A1) polymorphisms showed epistasis. Knowing the interactions involved in ECM metabolism could provide a new tool to identify individuals at high risk of developing knee OA. Further studies are needed to establish the mechanisms of interaction between these polymorphisms and their effects on knee OA susceptibility. This work highlights the importance to open the possibility of evaluating other pathways involved in ECM metabolism, such as the bone morphogenetic protein (BMP), fibroblast growth factor (FGF), hypoxia-inducible factor (HIF), nuclear factor kappa-B (NF-kB), mitogenactivated protein kinase (MAPK), and hedgehog (Hh) signaling pathways.