Diagnostic prediction model for levodopa-induced dyskinesia in Parkinson’s disease

Abstract Background: There are currently no methods to predict the development of levodopa-induced dyskinesia (LID), a frequent complication of Parkinson's disease (PD) treatment. Clinical predictors and single nucleotide polymorphisms (SNPs) have been associated with LID in PD. Objective: To investigate the association of clinical and genetic variables with LID and to develop a diagnostic prediction model for LID in PD. Methods: We studied 430 PD patients using levodopa. The presence of LID was defined as an MDS-UPDRS Part IV score ≥1 on item 4.1. Using logistic regression models, we tested the association of specific clinical variables and seven SNPs with the development of LID. Results: Regarding clinical variables, age of PD onset, disease duration, initial motor symptom and use of dopaminergic agonists were associated with LID. Only the CC genotype of the ADORA2A rs2298383 SNP was associated with LID after adjustment. We developed two diagnostic prediction models with reasonable accuracy, but we suggest that the clinical prediction model be used. This prediction model has an area under the curve of 0.817 (95% confidence interval [95%CI] 0.77–0.85) and no significant lack of fit (Hosmer-Lemeshow goodness-of-fit test p=0.61). Conclusion: Predicted probability of LID can be estimated with reasonable accuracy using a diagnostic clinical prediction model which combines age of PD onset, disease duration, initial motor symptom and use of dopaminergic agonists.

Parkinson's disease (PD) is a complex progressive disease associated with many clinical problems throughout its natural history, with increasing disability and functional dependence 1 . Motor fluctuations and levodopa-induced dyskinesias (LID) associated with levodopa therapy are usually the most relevant clinical problems in the intermediate phase of the disease. LID are involuntary movements affecting facial, cervical, and limb muscles, usually associated with the peak plasma concentration of levodopa 2 . Hospital-based prospective studies reported LID prevalence from 33 to 51.2% after five years of levodopa therapy 3,4 ; however, a recent community-based prospective study described a LID prevalence of 12.7% after five years of levodopa therapy 5 . LID may increase health care costs and negatively impact the quality of life of patients with PD 1,6-8 .
Many studies have explored the potential association of LID with distinct predictors. These include clinical features related to PD, such as age of PD onset, disease duration, levodopa therapy duration, levodopa daily dosage and Hoehn & Yahr stage, as well as demographic and environmental factors, such as gender, weight and coffee consumption, among others 9 . Associations between LID and genetic variations, mainly single nucleotide polymorphisms (SNPs), have been studied over the last decades, with conflicting results. Nonetheless, some studies have associated variants in several genes with LID, including COMT, MAOB, DRD2/ANKK1, DRD3, DAT1, BDNF and ADORA2A [10][11][12][13][14] .
Despite its clinical significance, there are no precise methods to predict the risk of developing LID. Clinical trials of new drugs to prevent LID could benefit from prediction tools to select patients at higher risk of developing LID. Furthermore, there is a current worldwide effort toward research on precision medicine, and decision-making about medical therapy in PD, particularly regarding the prevention of LID, seems well suited to being tailored using prediction models 15 . A recent prospective study evaluated the risk of LID onset using an earlier prediction model, with modest accuracy 16 .
We aimed to assess clinical and genetic data as predictors of LID in a sample of patients with PD and to explore whether these data could be used as a predictive tool for LID in patients with PD taking levodopa.

Study design and subjects
We conducted a cross-sectional study with epidemiological and clinical data from Brazilian patients with PD to identify clinical and genetic predictors associated with LID, as part of the Latin American Research Consortium on the Genetics of Parkinson's disease (LARGE-PD). We included patients followed at two Movement Disorders Clinics in Brazil (Ribeirão Preto Medical School and Universidade Federal de São Paulo).
We subsequently included in the analysis another Brazilian cohort, which had been assessed using similar procedures (for further information on this sample, see Rieck et al. 14 ). We enrolled patients between May 2007 and February 2014. All patients met the UK Parkinson's Disease Society Brain Bank clinical diagnostic criteria for PD 17 . We excluded patients if (1) they were not taking levodopa at the time of study evaluation, or (2) there were missing data about LID. The study was approved by the Ethics Committee of Hospital das Clínicas de Ribeirão Preto and all participants provided written informed consent.

Evaluations
We examined all patients using a standardized assessment, comprising (1) clinical evaluation by movement disorders specialists, which included the original version of the UPDRS 18 and the International Parkinson and Movement Disorder Society-Unified Parkinson's Disease Rating Scale (MDS-UPDRS) 19 , and (2) peripheral blood collection for DNA extraction. For patients assessed with the original version of the UPDRS, we converted the Part III motor examination score to the equivalent MDS-UPDRS Part III score, as previously described 20 .
Other clinical data included demographics, age of PD onset, disease duration, levodopa therapy duration, PD medications and levodopa equivalent daily doses (LEDD) 21 , and original Hoehn & Yahr stage 22 .
We defined the presence of LID by a score ≥1 on item 4.1 of the MDS-UPDRS Part IV (time spent with dyskinesias) and confirmed this finding by reviewing medical charts. The initial motor symptom was dichotomized as tremor or other symptoms (rigidity, bradykinesia, gait difficulty and postural instability), based on a detailed description of each patient's initial dominant symptom and confirmed by a review of medical charts. We classified clinical phenotypes at evaluation as tremor dominant, postural instability/gait difficulty (PIGD), and indeterminate, based on specific items from MDS-UPDRS Parts II and III, as described previously 23,24 .
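The two dichotomizations described above can be written down as a minimal sketch. The function names, the symptom labels and the 1 = tremor coding are illustrative assumptions; the text specifies only the ≥1 cut-off on item 4.1 and the tremor versus other-symptom split.

```python
# Symptom labels are illustrative; the text only names these categories.
NON_TREMOR = {"rigidity", "bradykinesia", "gait difficulty", "postural instability"}


def has_lid(item_4_1_score: int) -> bool:
    """LID is present when MDS-UPDRS item 4.1 (time spent with dyskinesias) is >= 1."""
    if not 0 <= item_4_1_score <= 4:
        raise ValueError("MDS-UPDRS item scores range from 0 to 4")
    return item_4_1_score >= 1


def tremor_onset(initial_symptom: str) -> int:
    """Return 1 if tremor was the initial dominant motor symptom, 0 otherwise.

    The 1 = tremor coding is an assumption made for this sketch.
    """
    symptom = initial_symptom.strip().lower()
    if symptom == "tremor":
        return 1
    if symptom in NON_TREMOR:
        return 0
    raise ValueError(f"unrecognized initial symptom: {initial_symptom!r}")
```

Rejecting unknown labels instead of silently coding them as 0 keeps chart-review inconsistencies visible before modelling.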

Genetic data
Genomic DNA was extracted from peripheral blood using standard methods. All patients were genotyped for the following genes and SNPs: COMT (rs4680), MAOB (rs1799836), DRD2/ANKK1 (Taq 1A, rs1800497), DRD3 (rs6280), DAT1 (rs393795), BDNF (rs6265) and ADORA2A (rs2298383). We selected these SNPs because they were the ones most strongly associated with LID in a systematic literature search of MEDLINE, EMBASE and Web of Science (from inception to June 2017), using the following search strings: MEDLINE: levodopa AND dyskinesia AND polymorphism; EMBASE: levodopa AND dyskinesia AND polymorphism NOT 'tardive dyskinesia'; Web of Science: TS=(levodopa AND polymorphism AND dyskinesia) NOT TS="tardive dyskinesia" [10][11][12][13][14] . We performed genotyping at LARGE-PD's coordinating site in Seattle, WA (University of Washington/VA Puget Sound Health Care System) using TaqMan SNP genotyping assays (Applied Biosystems, CA, USA) on a 7900HT Sequence Detection System (ABI Prism, Applied Biosystems, CA, USA). Genotyping of patients enrolled at one center (Porto Alegre) had already been described in previous publications 11,14 and was subsequently added to the present study. These analyses were performed with a methodology similar to that used at LARGE-PD's coordinating site, for the following SNPs: rs4680, rs1799836, rs1800497, rs6280, rs6265 and rs2298383.

Statistical analysis
We fitted multivariate logistic regression models with the presence of LID as the dependent variable, using clinical data, genetic data (genotyped SNPs), or both (mixed models). For the clinical predictors model, we selected only independent variables with p<0.1 in the univariate analyses of the presence of LID. We entered these selected variables into a multivariable logistic regression analysis with a stepwise backward selection strategy. We used the Akaike Information Criterion as a stopping rule 25 . Variables related to levodopa therapy (levodopa therapy duration, levodopa dose at evaluation, LEDD) were excluded from the multivariate analysis, because levodopa use is a necessary condition for the outcome (LID) and could distort the models. Genetic predictors with p<0.1 in univariate analysis were tested using multivariate logistic regression adjusted for gender, age of PD onset, and levodopa therapy duration. For the mixed model, we included only the main independent variables selected in the clinical and genetic models.
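Backward selection with the Akaike Information Criterion as a stopping rule can be sketched as follows. This is an illustration under stated assumptions, not the SPSS procedure used in the study: `fit_loglik` is a hypothetical callback that fits a logistic model with the given predictors and returns its maximized log-likelihood, the univariate p<0.1 screening step is not shown, and the log-likelihoods in the toy example are invented solely to exercise the loop.

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion, 2k - 2 ln(L); lower is better."""
    return 2 * n_params - 2 * log_likelihood


def backward_select(
    candidates: Iterable[str],
    fit_loglik: Callable[[FrozenSet[str]], float],
) -> Tuple[List[str], float]:
    """Remove one variable at a time for as long as a removal lowers the AIC."""
    current = frozenset(candidates)
    best_aic = aic(fit_loglik(current), len(current) + 1)  # +1 for the intercept
    while current:
        # Score every model obtained by removing a single variable.
        trials = [
            (aic(fit_loglik(current - {v}), len(current)), v) for v in sorted(current)
        ]
        trial_aic, dropped = min(trials)
        if trial_aic >= best_aic:  # stopping rule: no removal improves the AIC
            break
        current, best_aic = current - {dropped}, trial_aic
    return sorted(current), best_aic


def toy_loglik(predictors: FrozenSet[str]) -> float:
    """Hypothetical log-likelihoods, invented for illustration only."""
    gains = {"age_onset": 10.0, "duration": 8.0, "noise": 0.2}
    return -100.0 + sum(gains[name] for name in predictors)


selected, best = backward_select(["age_onset", "duration", "noise"], toy_loglik)
# selected == ["age_onset", "duration"]; the "noise" variable is dropped
```

Passing the fitting routine as a callback keeps the selection logic independent of the statistical package that estimates the model.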
For internal validation of the final model, we applied a bootstrap resampling procedure with 1,000 repetitions. Discrimination of the models was quantified by the area under the receiver operating characteristic (ROC) curve (AUC). Calibration of the models was assessed with the Hosmer-Lemeshow goodness-of-fit test and visualized with calibration plots, in which we estimated the calibration curve by local regression (LOESS). All analyses were performed using SPSS for Windows, version 23.0 (SPSS Inc., Chicago, USA).
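As a rough illustration of the two performance measures named above, the sketch below computes the AUC as a rank statistic and a Hosmer-Lemeshow statistic over equal-sized risk groups. It is not the SPSS implementation: the grouping ignores ties in predicted risk, predicted probabilities are assumed to lie strictly between 0 and 1, and the p-value uses a closed form that holds only for an even number of degrees of freedom (8 with the conventional 10 groups).

```python
import math
from typing import Sequence, Tuple


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Probability that a random case is ranked above a random non-case (ties count 1/2)."""
    cases = [p for y, p in zip(y_true, y_prob) if y == 1]
    controls = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    wins = sum(1.0 if c > k else 0.5 if c == k else 0.0 for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability; this closed form is valid only for even df."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(half**i / math.factorial(i) for i in range(df // 2))


def hosmer_lemeshow(
    y_true: Sequence[int], y_prob: Sequence[float], groups: int = 10
) -> Tuple[float, float]:
    """Hosmer-Lemeshow statistic and p-value (df = groups - 2)."""
    pairs = sorted(zip(y_prob, y_true))  # order subjects by predicted risk
    n = len(pairs)
    if n < groups:
        raise ValueError("need at least one subject per group")
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        expected = sum(p for p, _ in chunk)  # expected number of cases
        observed = sum(y for _, y in chunk)  # observed number of cases
        # (O - E)^2 / E summed over cases and non-cases in this group.
        stat += (observed - expected) ** 2 / expected
        stat += (observed - expected) ** 2 / (size - expected)
    return stat, chi2_sf_even_df(stat, groups - 2)
```

A large Hosmer-Lemeshow p-value indicates no significant lack of fit, while the AUC summarizes discrimination independently of calibration.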

Clinical and genetic characteristics
We recruited a total of 525 Brazilian patients with PD for this study. Of these, 430 patients fulfilled the inclusion and exclusion criteria. We did not perform genotyping in 16 patients (five with LID and 11 without LID) due to problems with DNA extraction, and we did not genotype any patients from one center (Universidade Federal do Rio Grande do Sul, n=233) for the DAT1 SNP.
Regarding clinical variables, there were no missing data for gender, age at PD onset, age at evaluation and disease duration. Data were missing for 22 patients for levodopa therapy duration, 20 patients for initial motor symptom, 17 patients for clinical motor phenotype definition, 14 patients for LEDD, 10 patients for MDS-UPDRS Part III scores, 10 patients for amantadine use, 10 patients for MAO-B inhibitor use, 10 patients for COMT inhibitor use, six patients for dopaminergic agonist use, and four patients for Hoehn & Yahr stage.
Regarding genetic variables, DRD2/ANKK1 SNP genotype data were missing in 45 patients, BDNF SNP genotype data were missing in 37 patients, DRD3 SNP genotype data were missing in 22 patients, COMT and ADORA2A SNP genotype data were missing in 17 patients, and MAOB SNP genotype data were missing in 19 patients. DAT1 SNP genotyping was only performed in 186 patients.
Clinical characteristics of patients according to the presence of LID are shown in Table 1. Patients with LID had an earlier onset of PD, longer disease and levodopa therapy duration, higher LEDD, increased disease severity, a higher proportion of non-tremor symptoms at PD onset, and a higher proportion of PIGD clinical phenotype at evaluation. Moreover, patients with LID used dopaminergic agonists, amantadine, and COMT inhibitors more frequently.
All genotyped SNPs were in Hardy-Weinberg equilibrium. There were no significant differences in genotypic distribution or allelic frequency between PD patients with and without LID for any of the analyzed SNPs, except for the genotypic and allelic frequencies of ADORA2A SNP rs2298383 (Table 2).
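A Hardy-Weinberg check of this kind is commonly done with a chi-square goodness-of-fit test on genotype counts. The text does not state which test was used, so the sketch below is an assumption, and the function name and arguments are hypothetical.

```python
import math
from typing import Tuple


def hardy_weinberg_test(n_aa: int, n_ab: int, n_bb: int) -> Tuple[float, float]:
    """Chi-square goodness-of-fit test (1 df) of genotype counts against HWE."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Classes with zero expected count are skipped to avoid division by zero.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square upper tail for 1 df
    return chi2, p_value
```

A p-value above the chosen threshold (often 0.05) is read as no evidence of departure from equilibrium; with small genotype counts an exact test is usually preferred over this approximation.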

Analysis of association between clinical and genetic variables with LID
All independent variables included in the univariate and multivariate analyses are presented in Table 3. Multivariate analysis of clinical variables showed that age of PD onset, disease duration, initial motor symptom, and use of dopaminergic agonists were associated with LID (Table 3). There was no multicollinearity between independent variables.
Regarding genetic variables, only the ADORA2A SNP was selected for multivariate analysis, and it was associated with LID (CC genotype) after adjustment for gender, age of PD onset, and levodopa therapy duration (Table 3). There was no multicollinearity between independent variables. We subsequently dichotomized the ADORA2A SNP genotyping data (0=genotypes TT and TC; 1=genotype CC) to simplify the multivariate analysis.
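The dichotomization of the ADORA2A genotype stated above (0 for TT and TC, 1 for CC) amounts to a recessive coding of the C allele. A minimal sketch follows; the function name and the accepted input spellings are assumptions.

```python
def adora2a_cc(genotype: str) -> int:
    """Recessive coding for rs2298383: 1 for genotype CC, 0 for TT or TC."""
    # Normalize case, separators and allele order ("t/c", "CT" and "TC" are equivalent).
    alleles = "".join(sorted(genotype.strip().upper().replace("/", "")))
    if alleles == "CC":
        return 1
    if alleles in {"CT", "TT"}:
        return 0
    raise ValueError(f"unexpected rs2298383 genotype: {genotype!r}")
```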
We built a multivariate model with the presence of LID as the dependent variable and the main clinical variables (age of PD onset, disease duration, initial motor symptom and use of dopaminergic agonists) as independent variables, as described above. For this clinical model, the AUC was 0.815 (95% confidence interval [95%CI] 0.787-0.85) (Figure 1A). The mixed model comprised three of the four independent variables in the clinical model (age of PD onset, disease duration, initial motor symptom; use of dopaminergic agonists was not significant in this model, with p=0.075) plus ADORA2A rs2298383 SNP genotype, showing an AUC of 0.817 (95%CI 0.77-0.85) (Figure 1B). Based on AUC, both models [...] Figures 1C and 1D. We submitted the models to a bootstrapping procedure (internal validation), and the results were not significantly different from the original multivariate results (Table 4).
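The resampling idea behind bootstrap internal validation can be illustrated with a percentile confidence interval for the AUC. This is a simplified sketch: it resamples the predictions of an already fitted model rather than re-estimating the regression in each resample, as a full validation would do, and the function names, seed and defaults are assumptions.

```python
import random
from typing import List, Sequence, Tuple


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Probability that a random case is ranked above a random non-case (ties count 1/2)."""
    cases = [p for y, p in zip(y_true, y_prob) if y == 1]
    controls = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if c > k else 0.5 if c == k else 0.0 for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def bootstrap_auc_ci(
    y_true: Sequence[int],
    y_prob: Sequence[float],
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap confidence interval for the AUC."""
    if len(set(y_true)) < 2:
        raise ValueError("both outcome classes are required")
    rng = random.Random(seed)
    n = len(y_true)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # sample subjects with replacement
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain a single class
            stats.append(auc(ys, [y_prob[i] for i in idx]))
    stats.sort()
    cut = int(n_boot * alpha / 2)
    return stats[cut], stats[n_boot - 1 - cut]
```

Fixing the seed makes the interval reproducible; different seeds give slightly different bounds.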
To allow individualized calculations of the predictive probability of LID in PD patients, we used the following regression formulae:
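A logistic regression formula of this kind turns predictor values into a probability through the logistic function. The sketch below shows the general form only: the coefficient values are not given in this text, so the caller must supply them, and the predictor names and zero placeholders in the usage lines are hypothetical.

```python
import math
from typing import Mapping


def predicted_probability(
    intercept: float,
    coefficients: Mapping[str, float],
    values: Mapping[str, float],
) -> float:
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    missing = set(coefficients) - set(values)
    if missing:
        raise ValueError(f"missing predictor values: {sorted(missing)}")
    linear = intercept + sum(b * values[name] for name, b in coefficients.items())
    # Numerically stable evaluation of the logistic function.
    if linear >= 0:
        return 1.0 / (1.0 + math.exp(-linear))
    e = math.exp(linear)
    return e / (1.0 + e)


# Hypothetical predictor names; the zeros are placeholders, NOT fitted coefficients.
names = ["age_onset", "disease_duration", "tremor_onset", "dopamine_agonist"]
placeholder_coefs = dict.fromkeys(names, 0.0)
patient = {"age_onset": 55, "disease_duration": 8, "tremor_onset": 1, "dopamine_agonist": 0}
prob = predicted_probability(0.0, placeholder_coefs, patient)  # 0.5 with all-zero placeholders
```

Replacing the placeholders with the intercept and coefficients of a fitted model yields an individual predicted probability of LID.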

DISCUSSION
This cross-sectional, hospital-based study explored the association of clinical and genetic variables with LID in PD and found that age of PD onset, disease duration, initial motor symptom, use of dopaminergic agonists and ADORA2A rs2298383 SNP genotype were associated with this motor complication. Furthermore, these data allowed the construction of two diagnostic prediction models for LID in PD (clinical and mixed), with good performance, discrimination, and calibration.
All these clinical predictors had been previously associated with LID, mainly age (age of PD onset or age at evaluation). However, the most reliable clinical predictor in our model was the initial motor symptom. We found 70% lower odds of LID among patients with tremor as the initial motor symptom compared to patients with other motor symptoms at PD onset. Previous studies showed that patients with tremor as the initial motor symptom had a lower risk of developing LID [26][27][28] , and two other recent studies reported that a tremor-dominant motor phenotype at evaluation was also associated with a lower risk of LID 29,30 .
The ADORA2A gene is located on chromosome 22 (22q11.23) and encodes the adenosine A2A receptor, which stimulates adenylate cyclase and is predominantly expressed in the basal ganglia 14,31 . The adenosine A2A receptor has been associated with LID in PD. Its expression is upregulated in the striatum and external globus pallidus in a rodent model of LID and in post-mortem tissue from PD patients with LID 32,33 . Studies in animal models showed that treatment with adenosine A2A receptor antagonists could improve motor symptoms without increasing dyskinetic movements; however, new drugs such as istradefylline, preladenant or tozadenant had no considerable impact on LID in PD patients 34 . The ADORA2A SNP rs2298383 is located in intron 1, a potential promoter region that can regulate the function of the adenosine A2A receptor 14 , and its association with LID was first described in Jewish Israeli PD patients 35 . Many patients enrolled in our study (about 52%) had been recruited in a previous study in Southern Brazil, which also detected an association between ADORA2A SNP rs2298383 and LID 14 .
Regarding prediction models for LID in PD, Schapira et al. 36 described a prognostic prediction model to estimate the risk of developing LID within 3.25 years, based on patients enrolled in a clinical trial (STRIDE-PD). The prediction model included age, daily levodopa dose per weight, UPDRS Part II score and gender as variables, with modest discrimination (C-statistic 0.697) 36 .
The inclusion of SNPs as predictors of LID in PD was explored in a recent study, which did not find any association between a genetic risk score and dyskinesia 37 . The genetic risk score was based on a previous paper, which identified 28 independent risk SNPs for PD (none of these SNPs were analyzed in our study) 38 . We believe genetic variations can be very useful as predictors of LID, but more data from larger populations are needed.
Our primary purpose was to produce a diagnostic prediction tool that could be used in future clinical trials to select patients based on a stratification of their probability of developing LID. Negative results in clinical trials on LID are common, even when there is robust evidence of efficacy in animal models. One possible reason may be the absence of methods to select subgroups of patients more suitable for new drugs to prevent the onset of dyskinesias or to reduce them. Considering that SNP genotyping is not widely available in clinical practice, and since the accuracy, discrimination, and calibration of both prediction models (clinical and mixed) were similar, we suggest that the clinical prediction model may be more useful. The clinical prediction model has four parameters, which can be easily assessed.
We presented both models as regression formulae because this is one of the simplest formats and can be turned into a dedicated program or an online calculator. In addition, despite being developed for clinical research, our diagnostic prediction tool can be used in clinical practice, in a shared decision-making setting, helping neurologists and patients to decide on therapeutic options, mainly levodopa titration. Our clinical prediction model for the development of LID is already available as a free mobile application (DysKalc®) in the Google Play online store for Android operating system users.
Our study has several limitations. With a cross-sectional design, we cannot estimate the risk of developing LID over a specific period, meaning our prediction model has no prognostic properties. However, our data allowed the development of another type of prediction model, the diagnostic prediction model, which is designed to estimate the probability of an underlying outcome (in our model, LID) in the present, not in the future, based on predictors (independent variables) that make a subject suspected of having a specific condition 25 . Nevertheless, this approach must be replicated in a prospective cohort. We could not perform external validation, an essential step to support the generalizability of a prediction model to subjects other than those analyzed. Our results on the association between ADORA2A SNP rs2298383 and LID must also be replicated in other samples.
As to the strengths of our study, our total sample was composed of three independent samples from distinct centers in Brazil, increasing the generalizability of our models. The sample size provided adequate statistical power, according to the "rule of thumb" of 10 subjects per candidate predictor. Selecting independent variables in the multivariable analysis based on Akaike's Information Criterion reduced the risk of overfitting and spurious predictors. Another strength is the internal validation (the reproducibility of a developed model) with bootstrapping, one of the most reliable techniques for this purpose.
In conclusion, we developed two diagnostic prediction models (clinical and mixed) to estimate the probability of LID in PD, with reasonable accuracy, discrimination, and calibration. Of these two models, we suggest the use of the clinical prediction model, because its predictors are more easily assessed. Further replication in prospective cohorts and external validation are needed.