Predicting pathway cross-talks in ankylosing spondylitis through investigating the interactions among pathways

Given that the pathogenesis of ankylosing spondylitis (AS) remains unclear, the aim of this study was to detect the potentially functional pathway cross-talk in AS to further reveal the pathogenesis of this disease. Using microarray profile of AS and biological pathways as study objects, Monte Carlo cross-validation method was used to identify the significant pathway cross-talks. In the process of Monte Carlo cross-validation, all steps were iterated 50 times. For each run, detection of differentially expressed genes (DEGs) between two groups was conducted. The extraction of the potential disrupted pathways enriched by DEGs was then implemented. Subsequently, we established a discriminating score (DS) for each pathway pair according to the distribution of gene expression levels. After that, we utilized random forest (RF) classification model to screen out the top 10 paired pathways with the highest area under the curve (AUCs), which was computed using 10-fold cross-validation approach. After 50 bootstrap, the best pairs of pathways were identified. According to their AUC values, the pair of pathways, antigen presentation pathway and fMLP signaling in neutrophils, achieved the best AUC value of 1.000, which indicated that this pathway cross-talk could distinguish AS patients from normal subjects. Moreover, the paired pathways of SAPK/JNK signaling and mitochondrial dysfunction were involved in 5 bootstraps. Two paired pathways (antigen presentation pathway and fMLP signaling in neutrophil, as well as SAPK/JNK signaling and mitochondrial dysfunction) can accurately distinguish AS and control samples. These paired pathways may be helpful to identify patients with AS for early intervention.


Introduction
Ankylosing spondylitis (AS) is a chronic inflammation disorder that attacks sacroiliac joints and spine (1) with a 0.3% incidence rate in the Asian population (2). AS causes severe back pain, stiffness and new bone formation, and results in progressive joint ankylosis further decreasing quality of life (3). Unfortunately, the disease condition, including disease activity, progression and prognosis, are very hard to define in AS (4). Until now, the underlying molecular processes driving the AS progress are still unclear. Consequently, investigation on the pathogenesis of AS is urgently needed.
In recent years, genetic-associated research has detected several new genes related to AS. Of note, some of these genes seem specific for AS, but others have pleiotropic associations (5,6). Moreover, these studies offer little information concerning changes in gene activity during the progression of the disease. Fortunately, gene expression profiling provides a ''snapshot'' of cellular activity, supplying information on molecular mechanisms mediating disease changes, and can produce diagnostic gene sets. A number of recent studies have defined transcriptional profiles generated from peripheral blood mononuclear cells for AS. GSE25101 is one of the microarray profiles of AS that was reported by Pimentel-Santos et al. (7), who identified a set of differentially expressed genes (DEGs), which were highly connected with AS, partially regulating the inflammatory process and joint destruction. In 2015, Zhao et al. (8) used the same microarray profile of GSE25101 to predict the potential AS-related genes, such as RPL17, MRPL22, PSMA6 and PSMA4. In 2015, also using the same data, Shi et al. (9) identified 284 DEGs correlated with AS (such as MYH9, BCL11B and CD4), and detected the pathway for immune response regulation. Nevertheless, so far, most studies that assessed the genetics of AS focused on a single gene or a single pathway. However, pathway cross-talk is frequently neglected.
In general, different pathways interact with each other, and an abnormity in one pathway may influence the activities of many other related pathways. Understanding pathway interactions might be beneficial to explore the pathogenesis of AS. No reliable method is used to quantify the cross-talks for pathway pairs (10). However, integrating DEGs information and pathway information with Monte Carlo cross-validation has been proposed to quantify the cross-talk between pathways pairs (11). Monte Carlo cross-validation provided by Shao (12) has been demonstrated to decrease the risk of overfitting the model, and has been used to evaluate the prediction ability of the selected model.
Thus, in the current study, to explore the pathogenesis of AS, we undertook the microarray data analysis of AS to identify the significant pathways considering the functional dependency among pathways using Monte Carlo crossvalidation. We believe that our results may contribute to a better understanding of the molecular processes driving AS progression.

Material and Methods
Using microarray profile as well as biological pathways as study objects, Monte Carlo cross-validation method was used to identify the significant pathways. In this process, all steps were iterated 50 times. For each run, we implemented differential expression analysis (DEA), pathways enrichment analysis, calculation of a discriminating score (DS), and random forest (RF) classification. After 50 runs, the top 10 pathway pairs with the best area under the curve (AUC) values were identified (significant paired pathways). The flow chart of the analysis is shown in Figure 1.

Microarray data
The expression profile of GSE25101 deposited by Pimentel-Santos et al. (7) was downloaded from Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo/), based on the GPL6947 platform of Illumina Human HT-12 V3.0 expression BeadChips. In the GSE25101, there were 16 active AS patients and 16 normal controls with matched gender and age. Included AS patients had Bath Ankylosing Spondylitis Disease Activity Index (BASDAI) scores 44 and Bath Ankylosing Spondylitis Functional Index (BASFI) scores 44. The raw probe annotation files were obtained for subsequent analysis.

Differential expression analysis
The raw probe sets were pre-processed by means of robust multiarray averaging method. Then, the probes were mapped to the genomics, and the human gene symbols were obtained. Afterwards, Bayesian approach (13) was utilized to detect DEGs. With the goal of avoiding the multiple testing bias, which might cause false positive results, the Benjamini-Hochberg method (14) was used for multiple correction, and false discovery rate (FDR) distribution for each gene was obtained. Eventually, |log2FC| 41 and FDR o0.05 were set as the cut-off criteria to extract DEGs between AS and control samples.

Pathway enrichment analysis
To extract a cluster of pathways significantly enriched by DEGs, pathway enrichment analysis from DEGs was conducted. First, a total of 589 biological pathways were derived from the Ingenuity Pathways Analysis (IPA) software (http://www.ingenuity.com/) (15). Then, we assessed the enrichment effect based on Fisher's exact test, aiming to place DEGs in an IPA pathway and to screen out the pathways responsible for coordinating their activity. The pathways with Po0.01 were identified. Afterwards, the raw P values were corrected for multiple testing through Benjamini-Hochberg procedure (14). In the current study, the pathways with FDR o0.05 were considered as differential pathways.

DS for pathway cross-talk
The DS is often used to compare the expression levels in the samples with amplification and the samples without amplification (16). The DS indicates the relationships between pathway pairs. Therefore, we calculated the DS to further analyze the pathway cross-talks in our study. In detail, we computed a DS by comparing the expression levels of each pathway pair involved in DEGs in each sample based on the formula used in a previous study by Orsetti et al. (16). A larger DS denotes a higher difference in activity of the pairs.

Selecting the best pathway pairs
RF is a powerful classifier utilized to handle two issues of variable selection, and it has become a standard analysis tool in bioinformatics (17). Thus, in the present work, we employed a RF model on the pathway pairs to estimate the classification performance of this method according to AUC index. The 10-fold cross-validation method was utilized to calculate AUC based on mtry and ntree parameters. The 'mtry', is the number of variables randomly sampled as candidates at each split, and is equal to sqrt(p), where p is the variable count in the data matrix. The 'ntree', is the count of trees grown, amounted to 500. Subsequently, we sorted all AUC values in descending order, and the top 10 pathway pairs were extracted.
As reported, during the validation analysis, the ratio of 6 to 4 is a commonly used distribution proportion (18). Thus, the Monte Carlo cross-validation method randomly assigned 60% of the original microarray profile to make up for the training set and assigned the remaining 40% to form the testing data. This procedure was iterated 50 times, randomly producing new training and testing sets each time. For each run, we used a training set to extract DEGs, the pathways enriched by DEGs, and the DS values for the top 10 pathway pairs with the best AUCs between AS and control groups; we utilized the testing data to validate the top 10 pathway pairs. At the end of the 50 runs, the top 10 pathway pairs sorted in descending rank were extracted in 50 bootstraps. Eventually, the list of the top 10 pathway pairs ranked for all 50 bootstraps were considered as the significant ones.

DEGs identification and pathway enrichment analysis
After robust microarray analysis, 11,586 genes were identified for differential expression analysis. According to |log2FC| 41 and FDR o 0.05, 19 genes were found to be differentially expressed between AS and control samples, as listed in Table 1. Moreover, 46 significant pathways enriched by DEGs were detected. Table 2 displays the names of significant pathways enriched from DEGs, the corresponding FDR values, count of genes for each pathway, and count of common genes between DEGs and genes in the pathways. Then, we computed a DS via comparing the expression levels of each pair of pathways involved in DEGs in each sample. The distribution of DS values is shown in Supplementary Table S1. The DS values of 4 pathway cross-talks were higher than 0.1, which included the role of NFAT in regulation of the immune response/prostanoid biosynthesis (DS=0.212938), the role

Identifying the best pathway pairs
With the goal of assessing the classification ability of this approach, we used 10-fold cross-validation to calculate the AUC values for pathway pairs using the RF model. Each pathway pair was then ranked based on its corresponding AUC value. Of note, there were 35 pairs of pathways with AUC not less than 0.800. It is known through the literature that AUC 40.7 is determined as good, and an AUC of 1.0 suggests a perfect classification (19). Greater AUCs indicate better disease classification, that is, a stronger correlation between the pathways and the given disease. Hence, to better understand the molecular mechanisms of AS, we focused more in the top 10 pathway pairs, as reported by Colaprico et al. (11). Table 3 demonstrates the top 10 paired pathways with the best classification performance for AS and control samples for all 50 runs. The pair 'antigen presentation pathway' and 'fMLP signaling in neutrophils' had the best AUC value of 1.0. Moreover, the pair 'activation of IRF by cytosolic pattern recognition receptors' and glucocorticoid receptor signaling' revealed a good classification ability with a 0.995 AUC. Similar performance was observed in the pair 'T helper cell differentiation' and 'CXCR4 signaling' with AUC of 0.992.
After this, the top 10 pairs of pathways were extracted based on the occurrence frequency in the 50 bootstraps X3. According to this procedure, the pair 'SAPK/JNK signaling' and 'mitochondrial dysfunction' were involved in 5 bootstraps, the pair 'mitochondrial dysfunction' and 'G beta gamma signaling' appeared in 4 runs, and the pair 'mitochondrial dysfunction' and 'G protein signaling mediated by Tubby' also appeared in 4 runs. Importantly, among these top 10 pathway pairs, the 'mitochondrial dysfunction' interacted with 6 different pathways. Specific information is shown in Table 4.

Discussion
AS, as a common rheumatic disorder, leads to inflammatory back pain, thereby reducing the quality of life (20). The potential molecular mechanism of AS remains unclear. In recent years, gene expression profiles have been widely used to identify disease-related biomarkers (21,22), several of them having similar functions, however reproducibility is poor. In this condition, these biomarkers may not have precise classification ability. With the goal of solving this challenge, extraction of biological pathways involved in a given phenotype is a key process. Pathway-based biosignatures are more reproducible and frequently obtain better classification ability than single gene biomarkers (19). However, currently, most approaches regard the pathways to be independent, not considering the interactions between them, called ''cross-talk'' (23). The cross-talks among pathways indicate the regulatory interaction among different pathways. Of note, detection of cross-talks among pathways better reveal the pathway functions and contribute more to the understanding of the synergistic effects on cellular processes, compared with individual pathways (24). Furthermore, several reports demonstrated the potential function of pathway cross-talks in therapeutic strategies (25,26). Although there are many merits of pathway cross-talk in disease treatment, pathways amount of cross-talk interactions have not been completely studied. Most importantly, no available technique can quantify the cross-talks for pathway pairs (10). Integrating DEGs information and the pathway information with Monte Carlo cross-validation has been proposed to quantify the cross-talk between pathways pairs (11). Consequently, in the present work, Monte Carlo cross-validation analysis was employed to uncover the best paired pathways that could distinguish between AS and control samples. We found 35 paired pathways with AUCs not less than 0.800, after evaluating the top 10 paired pathways. Thus, the pathogenesis of AS may be related with the expression alterations of these paired pathways. The pathway pair 'antigen presentation pathway' and 'fMLP signaling in neutrophils' got the best AUC value of 1.000, which indicated that this pathway cross-talk could distinguish AS patients from the normal subjects. As reported, exogenous antigens are presented by major histocompatibility complex (MHC) class I molecules (27). Significantly, MHC class I molecules have been suggested to play important roles in immune surveillance by binding to CD 8+ T cells, which act in concert towards antigen processing as well as antigen presentation machinery (28,29). HLA-27 and ERAP1 are central members in the antigen presentation machinery, which have been shown to contribute to the AS risk (30). HLA-27 can regulate the migration of neutrophil and neutrophils exert key functions in the innate immune response (31). A previous study published by Biasi et al. exhibited an increased response to fMLP by circulating neutrophils in AS patients (31). Accordingly, the cross-talk between antigen presentation pathway and fMLP signaling in neutrophils might be strongly correlated with the etiology of AS, probably via regulating the immune response.
Bone formation (for example, syndesmophytes) is a common feature of AS (32). Furthermore, bone formation and development depend on the balance between osteoblasts-mediated bone formation and osteoclasts-induced bone resorption, and this bone homeostasis is disrupted in an inflammation environment (33,34). TNF-a, as a key proinflammatory cytokine, is responsible for the inflammationrelated bone loss, and TNF-a can suppress BMP-mediated osteoblastogenesis through activating the SAPK/JNK pathway (35). Furthermore, endoplasmic reticulum (ER) stress can provide the links with the inflammatory responses, and result in the activation of JNK by reactive oxygen species (ROS) (36). Furthermore, mitochondria can contribute to the production of ROS (37). As documented, ROS attacks are directed primarily towards the polyunsaturated fatty acids of the membrane lipids, inducing lipid peroxidation, which further results in the disorganization of cell structure and function (38). Additionally, ROS have been indicated to be possible mediators of tissue damage, which is related to AS (39). Therefore, we infer that the alteration of the pathway pair 'SAPK/JNK signaling' and 'mitochondrial dysfunction' might induce AS onset and progression by affecting inflammatory and oxidative metabolism as mentioned above.
Nevertheless, several study limitations must be noted. First, the sample size was rather small. Second, this was a Table 4. Top 10 pathway pairs with occurrence number not less than 3.

Pathway pairs
Total occurrence number (1a) EIF2 signaling (1b) LXR/RXR activation preliminary study of mechanisms underlying AS and results were achieved based on in silico analysis without validation in animal models or patient tissues. Thus, these pathway pairs should be further investigated using western blotting or PCR-based experiments to reveal the pathway changes in AS.
In conclusion, our analysis provided new knowledge for AS and identified several bio-signatures for this disease.
Based on our results, the detected pathway cross-talks might be helpful to identify patients with AS for early intervention. However, these paired pathways call for future functional studies.